
FROM TOP MIT LECTURERS

MASTER

PROBABILITY
LECTURE NOTES - QUIZZES WITH SOLUTIONS

EXCLUSIVE EDITION

WWW.AYOUBB.COM
ADVANCED PROBABILITY & STATISTICS

What to expect?

- Advanced Probability topics (Graduate Level)


- Summaries and Quizzes with solutions
- Exercises and Problems with short solutions
- Mid-term exams with solutions

CONTENT

Chapter I - Probabilistic models and axioms


Chapter II - Conditioning and Bayes' rule
Chapter III - Counting
Chapter IV - Discrete Random Variables
Chapter V - Continuous Random Variables
Chapter VI - Further Topics on Random Variables
Chapter VII - Bayesian inference
Chapter VIII - Limit theorems and classical statistics
Chapter IX - Bernoulli and Poisson processes

Chapter I

1. Probabilistic
Models and
Axioms

Probabilistic Models
Elements of a Probabilistic Model
The sample space
The sample space Ω is the set of all possible outcomes of an experiment.

The probability law assigns to a set A of possible outcomes (also called an


event) a nonnegative number P(A) (called the probability of A) that
encodes our knowledge or belief about the collective “likelihood” of the
elements of A.

Figure 1. Two rolls of a tetrahedral die, Discrete example: Sample space vs. sequential
description

Figure 2. Sample space: Continuous example

Exercise 1:
For the experiment of flipping a coin, and for each one of the following
choices, determine whether we have a legitimate sample space:

Ω = {Heads and it is raining, Heads and it is not raining, Tails} Yes|No ?

Ω = {Heads and it is raining, Tails and it is not raining, Tails} Yes|No ?


Yes, No

Exercise 2:
Paul checks the weather forecast. If the forecast is good, Paul will go out
for a walk. If the forecast is bad, then Paul will either stay home or go out. If
he goes out, he might either remember or forget his umbrella. In the tree
diagram below, identify the leaf that corresponds to the event that the
forecast is bad and Paul stays home.

Probability axioms
1. (Nonnegativity) 𝑃(𝐴)≥0, for every event A.
2. (Additivity) If A and B are two disjoint events, then the probability
of their union satisfies

𝑃(𝐴∪𝐵) = 𝑃(𝐴) + 𝑃(𝐵)

More generally, if the sample space has an infinite number of elements


and 𝐴1, 𝐴2, … is a sequence of disjoint events, then the probability of
their union satisfies 𝑃(𝐴1 ∪ 𝐴2∪⋯) = 𝑃(𝐴1) + 𝑃(𝐴2) + ⋯.

3. (Normalization) The probability of the entire sample space Ω is


equal to 1, that is, 𝑃(Ω) = 1.

Exercise 3:
Let A and B be events on the same sample space, with P(A)=0.6
and P(B)=0.7. Can these two events be disjoint?
No

Discrete Probability Law


If the sample space consists of a finite number of possible outcomes, then
the probability law is specified by the probabilities of the events that
consist of a single element. In particular, the probability of any event
{𝑠1, 𝑠2, …, 𝑠𝑛} is the sum of the probabilities of its elements:

𝑃({𝑠1, 𝑠2, …, 𝑠𝑛}) = 𝑃(𝑠1) + 𝑃(𝑠2) + ⋯ + 𝑃(𝑠𝑛)

Discrete Uniform Probability Law


If the sample space consists of n possible outcomes which are equally
likely (i.e., all single-element events have the same probability), then the
probability of any event A is given by:
P(A) = (number of elements of A) / n

Some Properties of Probability Laws


Consider a probability law, and let A, B, and C be events:

(a) If 𝐴⊂𝐵, then 𝑃(𝐴)≤𝑃(𝐵).

(b) 𝑃(𝐴∪𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴∩𝐵)

(c) 𝑃(𝐴∪𝐵)≤𝑃(𝐴) + 𝑃(𝐵)


(d) P(A∪B∪C) = P(A) + P(A^c∩B) + P(A^c∩B^c∩C)

Exercise 4:
Let A, B, and C be disjoint subsets of the sample space. For each one of the
following statements, determine whether it is true or false. Note: "False"
means "not guaranteed to be true."

a) P(A) + P(A^c) + P(B) = P(A ∪ A^c ∪ B)

b) P(A) + P(B) ≤ 1

c) P(A^c) + P(B) ≤ 1

d) P(A∪B∪C) ≥ P(A∪B)
False, True, False, True

Exercise 5:
Let A, B, and C be subsets of the sample space, not necessarily disjoint. For
each one of the following statements, determine whether it is true or
false. Note: “False" means “not guaranteed to be true."
a) P((A∩B) ∪ (C∩A^c)) ≤ P(A∪B∪C)

b) P(A∪B∪C) = P(A∩C^c) + P(C) + P(B∩A^c∩C^c)
True, True

Exercise 6:
Consider the same model of two rolls of a tetrahedral die, with all 16
outcomes equally likely. Find the probability of the following events:

a) The value in the first roll is strictly larger than the value in the
second roll.
b) The sum of the values obtained in the two rolls is an even number.
0.375, 0.5
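
A quick way to confirm answers like these is to enumerate the 16 equally likely outcomes directly, in the spirit of the discrete uniform law. A minimal Python sketch (written for this note, not part of the original exercise):

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely outcomes of two rolls of a 4-sided die.
outcomes = list(product(range(1, 5), repeat=2))
n = len(outcomes)

# (a) the first roll is strictly larger than the second
p_a = Fraction(sum(1 for x, y in outcomes if x > y), n)
# (b) the sum of the two rolls is even
p_b = Fraction(sum(1 for x, y in outcomes if (x + y) % 2 == 0), n)

print(p_a, float(p_a))  # 3/8  0.375
print(p_b, float(p_b))  # 1/2  0.5
```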

Exercise 7:
Consider a sample space that is the rectangular region [0,1] × [0,2] i.e., the
set of all pairs (x,y) that satisfy 0≤x≤1 and 0≤y≤2. Consider a “uniform"
probability law, under which the probability of an event is half of the area of
the event. Find the probability of the following events:

a) The two components x and y have the same values.


b) The value, x, of the first component is larger than or equal to the
value, y, of the second component.

c) The value of x^2 is larger than or equal to the value of y.


0, 0.25, 0.166

Exercise 8:
Let the sample space be the set of positive integers and suppose that
P(n) = 1/2^n, for n = 1, 2, …. Find the probability of the set {3, 6, 9, …}, that is,
of the set of positive integers that are multiples of 3.
0.1428
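
The event is {3, 6, 9, …}, so its probability is the geometric series ∑_{k≥1} (1/2)^(3k) = (1/8)/(1 − 1/8) = 1/7 ≈ 0.1428. A short numerical check (a sketch; the infinite sum is truncated):

```python
from fractions import Fraction

# Closed form: sum_{k>=1} (1/2)**(3k) = (1/8) / (1 - 1/8)
exact = Fraction(1, 8) / (1 - Fraction(1, 8))
# Truncated version of the same series
approx = sum(0.5 ** n for n in range(3, 300, 3))
print(exact, approx)  # 1/7  0.14285714...
```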

Exercise 9:
Let the sample space be the set of all positive integers. Is it possible to
have a “uniform" probability law, that is, a probability law that assigns the
same probability c to each positive integer?
No


Exercise 10:
Countably infinite sample space

Let the sample space be the two-dimensional plane. For any real number x,
let Ax be the subset of the plane that consists of all points of the vertical
line through the point (x,0), i.e., Ax = {(x,y) : y ∈ ℝ}.

a) Do the axioms of probability theory imply that the probability of the


union of the sets Ax (which is the whole plane) is equal to the sum of
the probabilities P(Ax)?
b) Do the axioms of probability theory imply that

P(A1 ∪ A2 ∪ ⋯) = ∑_{x=1}^{∞} P(Ax) ?

(In other words, we consider only those lines for which the x coordinate is
a positive integer.)
No, Yes

Problem 1:
Venn diagrams

In this problem, you are given descriptions in words of certain events (e.g.,
"at least one of the events A,B,C occurs"). For each one of these
descriptions, identify the correct symbolic description in terms of A,B,C
from Events E1-E7 below. Also identify the correct description in terms of
regions (i.e., subsets of the sample space Ω) as depicted in the Venn
diagram below. (For example, Region 1 is the part of A outside of B and C.)

Symbolic descriptions:

E1: A∩B∩C
E2: (A∩B∩C)^c
E3: A∩B∩C^c
E4: B ∪ (B^c∩C^c)
E5: A^c∩B^c∩C^c
E6: (A∩B) ∪ (A∩C) ∪ (B∩C)
E7: (A∩B^c∩C^c) ∪ (A^c∩B∩C^c) ∪ (A^c∩B^c∩C)
a) At least two of the events A, B, C occur.
b) At most two of the events A, B, C occur.

c) None of the events A, B, C occurs.


d) All three events A, B, C occur.
e) Exactly one of the events A, B, C occurs.
f) Events A and B occur, but C does not occur.
g) Either (i) event B occurs, or (ii) neither B nor C occurs.
E6 R2-4-5-6, E2 R1-2-3-5-6-7-8, E5 R8, E1 R4, E7 R1-3-7, E3 R2, E4 R1-2-3-4-6-8

Problem 2:
Set operations and probabilities

Find the value of P(A ∪ (B^c ∪ C^c)^c) for each of the following cases:

1. The events A, B, C are disjoint events and P(A) = 2/5.

P(A ∪ (B^c ∪ C^c)^c) = ?

2. The events A and C are disjoint, and P(A) = 1/2 and P(B∩C) = 1/4.

P(A ∪ (B^c ∪ C^c)^c) = ?

3. P(A^c ∩ (B^c ∪ C^c)) = 0.7.

P(A ∪ (B^c ∪ C^c)^c) = ?

0.4, 0.75, 0.3

Problem 3:
Three tosses of a fair coin

You flip a fair coin (i.e., the probability of obtaining Heads is 1/2) three times.
Assume that all sequences of coin flip results, of length 3, are equally likely.
Determine the probability of each of the following events.

1. {HHH}: 3 Heads
2. {HTH}: the sequence Heads, Tails, Heads
3. Any sequence with 2 Heads and 1 Tail (in any order):
4. Any sequence in which the number of Heads is greater than or
equal to the number of Tails:
0.125, 0.125, 0.375, 0.5

Problem 4:
Parking lot problem

Mary and Tom park their cars in an empty parking lot with n≥2 consecutive
parking spaces (i.e, n spaces in a row, where only one car fits in each space).
Mary and Tom pick parking spaces at random; of course, they must each
choose a different space. (All pairs of distinct parking spaces are equally
likely.) What is the probability that there is at most one empty parking
space between them?
(4n − 6) / (n^2 − n)
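
The closed form can be checked against brute-force enumeration for small n ("at most one empty space between them" means the chosen spaces differ by 1 or 2). A sketch, with n used only as a test value:

```python
from fractions import Fraction
from itertools import permutations

def p_at_most_one_gap(n):
    """P(the two distinct spaces differ by at most 2), spaces chosen uniformly at random."""
    pairs = list(permutations(range(1, n + 1), 2))      # all ordered pairs of distinct spaces
    good = sum(1 for a, b in pairs if abs(a - b) <= 2)  # adjacent, or one empty space between
    return Fraction(good, len(pairs))

for n in range(2, 8):
    print(n, p_at_most_one_gap(n), Fraction(4 * n - 6, n * n - n))  # the two columns agree
```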

Problem 5:
Probabilities on a continuous sample space

Alice and Bob each choose at random a real number between zero and
one. We assume that the pair of numbers is chosen according to the
uniform probability law on the unit square, so that the probability of an
event is equal to its area.

We define the following events:

A = {The magnitude of the difference (for any two real numbers x


and y, the value |x−y|) of the two numbers is greater than 1/3}

B = {At least one of the numbers is greater than 1/4}

C = {The sum of the two numbers is 1}

D = {Alice's number is greater than 1/4}

Find the following probabilities:

P(A)=?

P(B)=?

P(A∩B)=?

P(C)=?

P(D)?

P(A∩D)=?
0.4444, 15/16, 0.4444, 0, 0.75, 0.30903

Problem 6:
Upper and lower bounds on the probability of intersection

Given two events A,B with P(A)=3/4 and P(B)=1/3, what is the smallest
possible value of P(A∩B)? The largest? That is, find a and b such that,

𝑎≤𝑃(𝐴∩𝐵)≤𝑏

holds and any value in the closed interval [a,b] is possible.


0.083, 1/3

Chapter II

2. Conditioning and
Bayes' rule

2.1. Conditional probabilities


2.1.1. Properties of Conditional Probability
The conditional probability of an event A, given an event B with P(B) > 0, is
defined by:
P(A∣B) = P(A∩B) / P(B)

and specifies a new (conditional) probability law on the same sample space
Ω. In particular, all properties of probability laws remain valid for conditional
probability laws.

Conditional probabilities can also be viewed as a probability law on a new


universe B, because all of the conditional probability is concentrated on B.

If the possible outcomes are finitely many and equally likely, then:
P(A∣B) = (number of elements of A∩B) / (number of elements of B)

Exercise 1:
Are the following statements true or false?

1. If Ω is finite and we have a discrete uniform probability law, and


if B≠∅, then the conditional probability law on B, given
that B occurred, is also discrete uniform.
2. If Ω is finite and we have a discrete uniform probability law, and
if B≠∅, then the conditional probability law on Ω, given
that B occurred, is also discrete uniform.

True, False

Exercise 2:
Conditional probabilities in a continuous model

Let the sample space be the unit square, Ω=[0,1]2, and let the probability of
a set be the area of the set. Let A be the set of points (x,y)∈[0,1]2 for
which y≤x. Let B be the set of points for which x≤1/2. Find P(A∣B).
¼

Exercise 3:
The multiplication rule

Are the following statements true or false? (Assume that all conditioning
events have positive probability.)
1. P(A∩B∩C^c) = P(A∩B) P(C^c ∣ A∩B)

2. P(A∩B∩C^c) = P(A) P(C^c ∣ A) P(B ∣ A∩C^c)

3. P(A∩B∩C^c) = P(A) P(C^c ∣ A) P(B ∣ A∩C^c)

4. P(A∩B ∣ C) = P(A ∣ C) P(B ∣ A∩C)
True, True, True, True

2.2. Total Probability Theorem


𝐿𝑒𝑡 𝐴1, …, 𝐴𝑛 be disjoint events that form a partition of the sample space
(each possible outcome is included in exactly one of the events 𝐴1, …, 𝐴𝑛)
and assume that P(Ai) > 0, for all i. Then, for any event B, we have:

P(B) = P(A1∩B) + ⋯ + P(An∩B)
     = P(A1) P(B∣A1) + ⋯ + P(An) P(B∣An)

Exercise 4:
We have an infinite collection of biased coins, indexed by the positive
integers. Coin i has probability 2^(−i) of being selected. A flip of coin i results
in Heads with probability 3^(−i). We select a coin and flip it. What is the
probability that the result is Heads? The geometric sum formula may be
useful here:

∑_{i=1}^{∞} α^i = α/(1−α), when |α| < 1

The probability that the result is Heads is: ?


1/5
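
By the total probability theorem, P(Heads) = ∑_{i≥1} 2^(−i)·3^(−i) = ∑_{i≥1} 6^(−i) = (1/6)/(1 − 1/6) = 1/5. A minimal check of this computation:

```python
from fractions import Fraction

# Closed form via the geometric sum formula with alpha = 1/6
exact = Fraction(1, 6) / (1 - Fraction(1, 6))
# Truncated version of  sum_i P(coin i selected) * P(Heads | coin i)
approx = sum((0.5 ** i) * ((1 / 3) ** i) for i in range(1, 60))
print(exact, approx)  # 1/5  0.2
```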

2.3. Bayes’ rule


P(Ai∣B) = P(Ai∩B) / P(B) = P(Ai) P(B∣Ai) / P(B) = P(Ai) P(B∣Ai) / ∑_j P(Aj) P(B∣Aj)

Exercise 5:
A test for a certain rare disease is assumed to be correct 95% of the time: if
a person has the disease, the test result is positive with probability 0.95,
and if the person does not have the disease, the test result is negative with
probability 0.95. A person drawn at random from a certain population has
probability 0.001 of having the disease.

1. Find the probability that a random person tests positive. (This


answer will require an accuracy of 4 decimal places.)
2. Given that the person just tested positive, what is the probability he
actually has the disease?
0.0509, 0.01866
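
Both numbers come from one application of the total probability theorem followed by Bayes' rule; a minimal sketch of the computation:

```python
p_disease = 0.001
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05  # the test is wrong 5% of the time on a healthy person

# Total probability theorem: P(test positive)
p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy
# Bayes' rule: P(disease | test positive)
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos

print(round(p_pos, 4), round(p_disease_given_pos, 5))  # 0.0509  0.01866
```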

2.4. Independence
Two events A and B are said to be independent if:

𝑃(𝐴∩𝐵) = 𝑃(𝐴)𝑃(𝐵)

If in addition, P(B) > 0, independence is equivalent to the condition:

P(A∣B) = P(A)

If A and B are independent, so are A and Bc.

Two events A and B are said to be conditionally independent, given another


event C with P(C) > 0, if:
P(A∩B∣C) = P(A∣C) P(B∣C)

If in addition, P(B ∩ C) > 0, conditional independence is equivalent to the


condition:

P(A∣B∩C) = P(A∣C)

Independence does not imply conditional independence, and vice versa.

Exercise 6:
We have a peculiar coin. When tossed twice, the first toss results in Heads
with probability 1/2. However, the second toss always yields the same
result as the first toss. Thus, the only possible outcomes for a sequence of
2 tosses are HH and TT, and both have equal probabilities. Are the two
events A={Heads in the first toss} and B={Heads in the second toss}
independent?
No, they are not independent

Let A be an event, a subset of the sample space Ω.


Are A and Ω independent?
Yes they’re independent

When is an event A independent of itself? Choose one of the following


possible answers:

1. Always
2. If and only if P(A)=0
3. If and only if P(A)=1
4. If and only if P(A) is either 0 or 1
4

Suppose that A and B are independent events. Are Ac and Bc independent?


Yes they’re independent

Exercise 7:
Conditional independence

Suppose that A and B are conditionally independent given C. Suppose


that P(C)>0 and P(C^c)>0.

1. Are A and Bc guaranteed to be conditionally independent given C?


2. Are A and B guaranteed to be conditionally independent given Cc?

Yes, No

Suppose that A, B, C, and D are independent. Use intuitive reasoning (not a


mathematical proof) to answer the following.

1. Is it guaranteed that A∩C is independent from Bc∩D?


2. Is it guaranteed that A∩Bc∩D is independent from Bc∪Dc?
Yes, No

2.5. Definition of Independence of


Several Events
We say that the events 𝐴1, 𝐴2, …, 𝐴𝑛 are independent if :

P(⋂_{i∈S} Ai) = ∏_{i∈S} P(Ai) for every subset S of {1, 2, . . . , n}

Exercise 8:
Reliability

Suppose that each unit of a system is up with probability 2/3 and down with


probability 1/3. Different units are independent. For each one of the
systems shown below, calculate the probability that the whole system is up
(that is, that there exists a path from the left end to the right end,
consisting entirely of units that are up).

1. What is the probability that the following system is up?

2. What is the probability that the following system is up?



16/27, 22/27

Problem 1:
Two five-sided dice

You roll two five-sided dice. The sides of each die are numbered from 1 to
5. The dice are “fair" (all sides are equally likely), and the two die rolls are
independent.

Part (a): Event A is “the total is 10" (i.e., the sum of the results of the two die
rolls is 10).

1. Is event A independent of the event “at least one of the dice


resulted in a 5"?
2. Is event A independent of the event “at least one of the dice
resulted in a 1"?

Part (b): Event B is “the total is 8."

1. Is event B independent of getting “doubles" (i.e., both dice resulting


in the same number)?
2. Given that the total was 8, what is the probability that at least one of
the dice resulted in a 3?
No, No, No, 2/3

Problem 2:
A reliability problem

Consider the communication network shown in the figure below and


suppose that each link can fail with probability p. Assume that failures of
different links are independent.

1. Assume that p=1/3. Find the probability that there exists a path


from A to B along which no link has failed. (Give a numerical answer.)
2. Given that exactly one link in the network has failed, find the
probability that there exists a path from A to B along which no link
has failed. (Give a numerical answer.)
112/243, 64/81

Problem 3:
Oscar's lost dog in the forest

Oscar has lost his dog in either forest A (with probability 0.4) or in forest B
(with probability 0.6).

If the dog is in forest A and Oscar spends a day searching for it in forest A,
the conditional probability that he will find the dog that day is 0.25.
Similarly, if the dog is in forest B and Oscar spends a day looking for it
there, he will find the dog that day with probability 0.15.

The dog cannot go from one forest to the other. Oscar can search only in
the daytime, and he can travel from one forest to the other only overnight.

The dog is alive during day 0, when Oscar loses it, and during day 1, when
Oscar starts searching. It is alive during day 2 with probability 2/3. In
general, for n≥1, if the dog is alive during day n−1, then the probability it is
alive during day n is 2/(n+1). The dog can only die overnight. Oscar stops
searching as soon as he finds his dog, either alive or dead.

1. In which forest should Oscar look on the first day of the search to
maximize the probability he finds his dog that day?
2. Oscar looked in forest A on the first day but didn't find his dog.
What is the probability that the dog is in forest A?
3. Oscar flips a fair coin to determine where to look on the first day
and finds the dog on the first day. What is the probability that he
looked in forest A?

4. Oscar decides to look in forest A for the first two days. What is the
probability that he finds his dog alive for the first time on the
second day?
5. Oscar decides to look in forest A for the first two days. Given that
he did not find his dog on the first day, find the probability that he
does not find his dog dead on the second day.
6. Oscar finally finds his dog on the fourth day of the search. He
looked in forest A for the first 3 days and in forest B on the fourth
day. Given this information, what is the probability that he found his
dog alive?
Forest A, 0.3333, 0.5263, 0.05, 0.9722, 0.1333

Problem 4:
Serap and her umbrella

Before leaving for work, Serap checks the weather report in order to
decide whether to carry an umbrella. On any given day, with
probability 0.2 the forecast is “rain" and with probability 0.8 the forecast is
“no rain". If the forecast is “rain", the probability of actually having rain on
that day is 0.8. On the other hand, if the forecast is “no rain", the probability
of actually raining is 0.1.

1. One day, Serap missed the forecast and it rained. What is the
probability that the forecast was “rain"?
2. Serap misses the morning forecast with probability 0.2 on any day
in the year. If she misses the forecast, Serap will flip a fair coin to
decide whether to carry an umbrella. (We assume that the result of
the coin flip is independent from the forecast and the weather.) On
any day she sees the forecast, if it says “rain" she will always carry
an umbrella, and if it says “no rain" she will not carry an umbrella.
Let U be the event that “Serap is carrying an umbrella", and let N be
the event that the forecast is “no rain". Are
events U and N independent?
3. Serap is carrying an umbrella and it is not raining. What is the
probability that she saw the forecast?
0.67, No, 8/27

Chapter III

3. Counting

3.1. The counting principle


The Counting Principle Consider a process that consists of r stages.
Suppose that: (a) There are n1 possible results at the first stage.

(b) For every possible result at the first stage, there are n2 possible results
at the second stage.

(c) More generally, for any sequence of possible results at the first i − 1
stages, there are ni possible results at the ith stage. Then, the total number
of possible results of the r-stage process is

n1 · n2 ⋯ nr

Exercise 1:
You are given the set of letters {A, B, C, D, E}.

How many three-letter strings (i.e., sequences of 3 letters) can be made


out of these letters if each letter can be used only once? (In this and
subsequent questions, your answer should be a number. Do not enter ‘!' or
combinations in your answer.)

How many subsets does the set {A, B, C, D, E} have?

How many five-letter strings can be made if we require that each letter
appears exactly once and the letters A and B are next to each other, as
either “AB" or “BA"? (Hint: Think of a sequential way of producing such a
string.)
60, 32, 48
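
These counts come from the counting principle (5·4·3 = 60, 2^5 = 32, and 2·4! = 48, treating "AB"/"BA" as a single block), and can be confirmed by brute-force enumeration; a short sketch:

```python
from itertools import permutations

letters = "ABCDE"

# 3-letter strings with no repeated letter: 5 * 4 * 3 = 60
three_letter = list(permutations(letters, 3))
# Subsets of a 5-element set: 2**5 = 32
num_subsets = 2 ** len(letters)
# 5-letter strings using each letter once, with A and B adjacent: 2 * 4! = 48
adjacent_ab = [p for p in permutations(letters) if abs(p.index("A") - p.index("B")) == 1]

print(len(three_letter), num_subsets, len(adjacent_ab))  # 60 32 48
```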

Exercise: Counting
You are given the set of letters {A, B, C, D, E}. What is the probability that in
a random five-letter string (in which each letter appears exactly once, and
with all such strings equally likely) the letters A and B are next to each
other? The answer to a previous exercise may also be useful here. (In this

and subsequent questions, your answer should be a number. Do not enter


‘!' or combinations in your answer.)
2/5

3.2. Summary of Counting Results


• Permutations of n objects: 𝑛!

• k-permutations of n objects: 𝑛! /(𝑛 − 𝑘)!


• Combinations of k out of n objects: (n choose k) = n! / (k! (n−k)!)

• Partitions of n objects into r groups, with the ith group having ni objects:

(n choose n1, n2, …, nr) = n! / (n1! n2! ⋯ nr!)

Exercise: Counting committees


We start with a pool of n people. A chaired committee consists
of k≥1 members, out of whom one member is designated as the
chairperson. The expression k·(n choose k) can be interpreted as the number of
possible chaired committees with k members. This is because we have
(n choose k) choices for the k members, and once the members are chosen, there
are then k choices for the chairperson. Thus,

c = ∑_{k=1}^{n} k·(n choose k)

is the total number of possible chaired committees of any size.

Find the value of c (as a function of n) by thinking about a different way of
forming a chaired committee: first choose the chairperson, then choose
the other members of the committee. The answer is of the form
c = α + n^β · 2^(γn+δ)
What are the values of α, β, γ, and δ?
0, 1, 1, -1
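
In other words, the exercise is the identity ∑_{k=1}^{n} k·(n choose k) = n·2^(n−1): counting the committee first and then its chair must equal choosing the chair first (n ways) and then any subset of the remaining n−1 people. A quick check for small n (a sketch):

```python
from math import comb

for n in range(1, 10):
    lhs = sum(k * comb(n, k) for k in range(1, n + 1))  # choose the committee, then its chair
    rhs = n * 2 ** (n - 1)                              # choose the chair, then any subset of the rest
    assert lhs == rhs
    print(n, lhs, rhs)
```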

Exercise: Binomial probabilities


Recall that the probability of obtaining k Heads in n independent coin
tosses is (n choose k) p^k (1−p)^(n−k), where p is the probability of Heads for any
given coin toss.

Find the value of ∑_{k=0}^{n} (n choose k) p^k (1−p)^(n−k)
1

Exercise: Coin tossing


Use the second method in the preceding segment to find the probability
that the 6th toss out of a total of 10 tosses is Heads, given that there are
exactly 2 Heads out of the 10 tosses. As in the preceding segment,
continue to assume that all coin tosses are independent and that each coin
toss has the same fixed probability of Heads. (In this and subsequent
questions, your answer should be a number. Do not enter ‘!' or
combinations in your answer.)
0.2

Exercise: Counting partitions


We have 9 distinct items and three persons. Alice is to get 2 items, Bob is
to get 3 items, and Charlie is to get 4 items.
As just discussed, this can be done in a! / (b! 3! 4!) ways. Find a and b.

A different way of generating the desired partition is as follows. We first
choose 2 items to give to Alice. This can be done in (c choose d) different ways.
Find c and d. (There are 2 possible values of d that are correct. Enter the
smaller value.)

Having given 2 items to Alice, we now give 3 items to Bob. This can be
done in (e choose f) ways. Find e and f. (There are 2 possible values of f that are
correct. Enter the smaller value.)
9, 2, 9, 2, 7, 3

Problem 1. Customers arriving at a restaurant


Six customers enter a three-floor restaurant. Each customer decides on
which floor to have dinner. Assume that the decisions of different
customers are independent, and that for each customer, each floor is
equally likely. Find the probability that exactly one customer dines on the
first floor.
64/243

Problem 2. A three-sided die, part 1


The newest invention of the 6.431x staff is a three-sided die. On any roll of
this die, the result is 1 with probability 1/2, 2 with probability 1/4, and 3 with
probability 1/4.

Consider a sequence of six independent rolls of this die.

Find the probability that exactly two of the rolls result in a 3.

a) (6 choose 2) (1/4)^2 (3/4)^4
b) (6 choose 2) (1/4)^2
c) (6 choose 2) (1/4)^2 (6 choose 4) (3/4)^(6−4)
d) (6 choose 2) (1/4)^4 (3/4)^2
4

Problem 2. A three-sided die, part 2


Given that exactly two of the six rolls resulted in a 1, find the probability
that the first roll resulted in a 1.
Note: Your answer should be a number. Do not enter "!" or combinations in
your answer.

We are told that exactly three of the rolls resulted in a 1 and exactly three
rolls resulted in a 2. Given this information, find the probability that the six
rolls resulted in the sequence (1,2,1,2,1,2).
Note: Your answer should be a number. Do not enter "!" or combinations in
your answer.

The conditional probability that exactly k rolls resulted in a 3, given that at


least one roll resulted in a 3, is of the form:
[(c3 choose k) (1/c2)^k (c1/c2)^(c3−k)] / [1 − (c1/c2)^c3], for k = 1, 2, …, 6

Find the values of the constants c1, c2, and c3:


1/3, 1/20, 3, 4, 6

Problem 3. Forming a committee


Out of five men and five women, we form a committee consisting of four
different people. Assuming that each committee of size four is equally
likely, find the probabilities of the following events:

The committee consists of two men and two women.

The committee has more women than men.

The committee has at least one man.

For the remainder of the problem, assume that Alice and Bob are among
the ten people being considered.

Both Alice and Bob are members of the committee.


10/21, 11/42, 41/42, 2/15

Problem 4. Proving binomial identities via counting


Binomial identities (i.e., identities involving binomial coefficients) can often
be proved via a counting interpretation. For each of the binomial identities
given below, select the counting problem that can be used to prove it.

Hint: You may find it useful to review the lecture exercise on counting


committees before attempting the problem.

(You need to answer all 4 questions before you can submit.)

1. n·(2n choose n) = 2n·(2n−1 choose n−1)

2. (2n choose n) = ∑_{i=0}^{n} (n choose i)^2 = ∑_{i=0}^{n} (n choose i)(n choose n−i)

3. 2^(2n) = ∑_{i=0}^{2n} (2n choose i)

4. n·2^(n−1) = ∑_{i=0}^{n} (n choose i)·i

a) In a group of 2n people, consisting of n boys and n girls, we want to


select a committee of n people. In how many ways can this be
done?
b) How many subsets does a set with 2n elements have?
c) Out of n people, we want to form a committee consisting of a chair
and other members. We allow the committee size to be any integer
in the range 1,2,…,n. How many choices do we have in selecting a
committee-chair combination?

d) Out of 2n people, we want to choose a committee of n people, one


of whom will be its chair. In how many different ways can this be
done?
c, a, b, d

Problem 5. Hats in a box


Each one of n persons, indexed by 1,2,…,n, has a clean hat and throws it
into a box. The persons then pick hats from the box, at random. Every
assignment of the hats to the persons is equally likely. In an equivalent
model, each person picks a hat, one at a time, in the order of their index,
with each one of the remaining hats being equally likely to be picked. Find
the probability of the following events.

(You need to answer all 5 questions before you can submit.)

Every person gets his or her own hat back.


a) 1/n!
b) 1/(n+1)!
c) 1/n
d) 1/(n+1)

Each one of persons 1,…,m gets his or her own hat back, where 1≤m≤n.


a) (n+m)! / n!
b) (n−m)! / n!
c) n! / (n+m)!
d) m! / n!

Each one of persons 1,…,m gets back a hat belonging to one of the


last m persons (persons n−m+1,…,n), where 1≤m≤n.
a) 1 / (n choose m)
b) m / (n choose m)
c) (n−m) / (n choose m)
d) n / (n choose m)

Now assume, in addition, that every hat thrown into the box has
probability p of getting dirty (independently of what happens to the other
hats or who has dropped or picked it up). Find the probability that:

Persons 1,…,m will pick up clean hats.


a) (1−p)^(n−m)
b) m(1−p)^m
c) (1−p)^m
d) m(1−p)^(n−m)

Exactly m persons will pick up clean hats.


a) [(n choose m) / n!] (1−p)^m p^(n−m)
b) (1−p)^m p^(n−m)
c) (n choose m) (1−p)^(n−m) p^m
d) (n choose m) (1−p)^m p^(n−m)
a, b, a, c, d

Chapter IV

4. Discrete
Random
Variables
4.1. Main Concepts Related to Random Variables
Starting with a probabilistic model of an experiment:

• A random variable is a real-valued function of the outcome of the


experiment.

• A function of a random variable defines another random variable.

• We can associate with each random variable certain “averages” of


interest, such as the mean and the variance.

• A random variable can be conditioned on an event or on another random


variable.

• There is a notion of independence of a random variable from an event or


from another random variable

4.2. Concepts Related to Discrete Random


Variables
Starting with a probabilistic model of an experiment:

• A discrete random variable is a real-valued function of the outcome of the


experiment that can take a finite or countably infinite number of values.

• A discrete random variable has an associated probability mass function


(PMF), which gives the probability of each numerical value that the random
variable can take.

• A function of a discrete random variable defines another discrete random


variable, whose PMF can be obtained from the PMF of the original random
variable.

4.3. Calculation of the PMF of a Random Variable


X
For each possible value x of X:

1. Collect all the possible outcomes that give rise to the event {X = x}.

2. Add their probabilities to obtain pX(x).

Exercise: PMF calculation


As in the previous lecture clip, consider the same example of two rolls of a
4-sided die, with all 16 outcomes equally likely. As before, let X be the
result of the first roll and Y be the result of the second roll. Define W=XY.
Find the numerical values of pW(4) and pW(5).
3/16, 0
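
Enumerating the 16 equally likely outcomes gives the whole PMF of W = XY at once; p_W(4) and p_W(5) are then read off. A minimal sketch:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Two rolls of a 4-sided die, all 16 outcomes equally likely; W = X * Y.
counts = Counter(x * y for x, y in product(range(1, 5), repeat=2))
pmf_W = {w: Fraction(c, 16) for w, c in counts.items()}

print(pmf_W.get(4, Fraction(0)))  # 3/16, from the outcomes (1,4), (2,2), (4,1)
print(pmf_W.get(5, Fraction(0)))  # 0, since 5 is not a product of two numbers in {1,...,4}
```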

Exercise: Random variables versus numbers


Let X be a random variable that takes integer values, with PMF pX(x).
Let Y be another integer-valued random variable and let y be a number.

a) Is pX(y) a random variable or a number?


b) Is pX(Y) a random variable or a number?
Number, RV

Exercise: Indicator variables


Let A and B be two events (subsets of the same sample space Ω), with
nonempty intersection. Let IA and IB be the associated indicator random
variables.

For each of the two cases below, indicate a statement that is true.

a) 𝐼𝐴 + 𝐼𝐵:
b) 𝐼𝐴 · 𝐼𝐵:

Is not an indicator random variable of any event, is the indicator random variable of A∩B.



Exercise: The binomial PMF


You roll a fair six-sided die (all 6 of the possible results of a die roll are
equally likely) 5 times, independently. Let X be the number of times that
the roll results in 2 or 3. Find the numerical values of the following.

𝑝𝑋(2. 5) =?

𝑝𝑋(1) =?

0, 80/243

Exercise: Geometric random variables


Let X be a geometric random variable with parameter p. Find the
probability that X≥10. Express your answer in terms of p

𝑃(𝑋≥10) =?
(1 − p)^9

4.4. Expectation
We define the expected value (also called the expectation or the mean) of
a random variable X, with PMF pX, by

E[X] = ∑_x x pX(x)

4.5. Expected Value Rule for Functions of


Random Variables
Let X be a random variable with PMF pX, and let g(X) be a function of X.
Then, the expected value of the random variable g(X) is given by

E[g(X)] = ∑_x g(x) pX(x)

Exercise: Expectation calculation


The PMF of the random variable Y satisfies
pY(−1) = 1/6, pY(2) = 2/6, pY(5) = 3/6, and pY(y) = 0 for all other values y. The
expected value of Y is:
3

Exercise: Random variables with bounded range


Suppose a random variable X can take any value in the interval [−1,2] and a
random variable Y can take any value in the interval [−2,3].

a) The random variable X−Y can take any value in an interval [a,b]. Find the


values of a and b.

b) Can the expected value of X+Y be equal to 6?


-4, 4, No

Exercise: The expected value rule


Let X be a uniform random variable on the range {−1,0,1,2}. Let Y=X4. Use
the expected value rule to calculate E[Y].
9/2

Exercise: Linearity of expectations


The random variable X is known to satisfy E[X]=2 and E[X2]=7. Find the
expected value of 8−X and of (X−3)(X+3).

𝐸[8 − 𝑋] =?

𝐸[(𝑋 − 3)(𝑋 + 3)] =?


6, -2

4.6. Variance
The variance var(X) of a random variable X is defined by
var(X) = E[(X − E[X])^2]

and can be calculated as

var(X) = ∑_x (x − E[X])^2 pX(x)

It is always nonnegative. Its square root is denoted by σX and is called the


standard deviation

Exercise: Variance calculation


Suppose that Var(X)=2. The variance of 2−3X is:

Var(2 − 3X) = ?

18

Exercise: Variance properties


Is it always true that E[X^2] ≥ (E[X])^2 ?
Yes

4.7. Mean and Variance of a Linear Function of a


Random Variable
Let X be a random variable and let Y = aX + b, where a and b are given
scalars. Then
E[Y] = aE[X] + b,  var(Y) = a^2 var(X)

4.8. Variance in Terms of Moments Expression


var(X) = E[X^2] − (E[X])^2

Exercise: Variance of the uniform


Suppose that the random variable X takes values in the
set {0,2,4,6,…,2n} (the even integers between 0 and 2n, inclusive), with each
value having the same probability. What is the variance
of X? Hint: Consider the random variable Y=X/2 and recall that the variance
of a uniform random variable on the set {0,1,…,n} is equal to n(n+2)/12.

Var(X)=?
n(n+2)/3

Exercise: Conditional variance


In the last example, we saw that the conditional distribution of X, which
was a uniform over a smaller range (and in some sense, less uncertain), had
a smaller variance, i.e., Var(X∣A)≤Var(X). Here is an example where this is not
true. Let Y be uniform on {0,1,2} and let B be the event that Y belongs
to {0,2}.

a) What is the variance of Y?


b) What is the conditional variance Var(Y∣B)?
2/3, 1

Exercise: Total expectation calculation


We have two coins, A and B. For each toss of coin A, we obtain Heads with
probability 1/2; for each toss of coin B, we obtain Heads with
probability 1/3. All tosses of the same coin are independent. We select a
coin at random, where the probability of selecting coin A is 1/4, and then
toss it until Heads is obtained for the first time.

The expected number of tosses until the first Heads is:


11/4

Exercise: Memorylessness of the geometric


Let X be a geometric random variable, and assume that Var(X)=5.

a) What is the conditional variance Var(X−4∣X>4)?


b) What is the conditional variance Var(X−8∣X>4)?
5, 5

4.9. Summary of Facts About Joint PMFs


Let X and Y be random variables associated with the same experiment.

• The joint PMF pX,Y of X and Y is defined by

𝑝𝑋,𝑌(𝑥, 𝑦) = 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦)

• The marginal PMFs of X and Y can be obtained from the joint PMF, using
the formulas

pX(x) = ∑_y pX,Y(x, y),   pY(y) = ∑_x pX,Y(x, y)

• A function g(X, Y ) of X and Y defines another random variable, and

E[g(X, Y)] = ∑_x ∑_y g(x, y) pX,Y(x, y)

If g is linear, of the form aX + bY + c, we have

𝐸[𝑎𝑋 + 𝑏𝑌 + 𝑐] = 𝑎𝐸[𝑋] + 𝑏𝐸[𝑌] + 𝑐

• The above have natural extensions to the case where more than two
random variables are involved.

Exercise: Joint PMF calculation


The random variable V takes values in the set {0,1} and the random
variable W takes values in the set {0,1,2}. Their joint PMF is of the form

𝑝𝑉,𝑊(𝑣, 𝑤) = 𝑐⋅(𝑣 + 𝑤)

where c is some constant, for v and w in their respective ranges, and is zero


everywhere else.

a) Find the value of c

b) Find pV(1)
1/9, 2/3
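
The constant c comes from normalization (the six values of v + w must sum to 1/c), and p_V(1) is a marginal of the joint PMF. A sketch of the arithmetic:

```python
from fractions import Fraction

V_vals, W_vals = [0, 1], [0, 1, 2]

# Normalization: c * sum of (v + w) over the whole range must equal 1.
total = sum(v + w for v in V_vals for w in W_vals)   # = 9
c = Fraction(1, total)

# Marginal PMF of V at v = 1: sum the joint PMF over w.
p_V1 = sum(c * (1 + w) for w in W_vals)              # = 6/9 = 2/3
print(c, p_V1)  # 1/9  2/3
```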

Exercise: Expected value rule


Let X and Y be discrete random variables. For each one of the formulas
below, state whether it is true or false.

a) E[X^2] = ∑_x x pX(x^2)

b) E[X^2] = ∑_x x^2 pX(x)

c) E[X^2] = ∑_x x^2 pX∣Y(x∣y)

d) E[X^2] = ∑_x x^2 pX,Y(x, y)

e) E[X^2] = ∑_x ∑_y x^2 pX,Y(x, y)

f) E[X^2] = ∑_z z p_{X^2}(z)

False, True, False, False, True, True

Exercise: Linearity of expectations drill


Suppose that E[Xi]=i for every i. Then,

E[X1 + 2X2 − 3X3] = ?

-4

Exercise: Using linearity of expectations


We have two coins, A and B. For each toss of coin A, we obtain Heads with
probability 1/2; for each toss of coin B, we obtain Heads with probability 1/3.
All tosses of the same coin are independent.

We toss coin A until Heads is obtained for the first time. We then toss coin
B until Heads is obtained for the first time with coin B.

The expected value of the total number of tosses is: 


5

4.10. Summary of Facts About Conditional PMFs


Let X and Y be random variables associated with the same experiment.

• Conditional PMFs are similar to ordinary PMFs, but pertain to a universe


where the conditioning event is known to have occurred.

• The conditional PMF of X given an event A with P(A) > 0, is defined by

pX∣A(x) = P(X = x ∣ A)

and satisfies

∑_x pX∣A(x) = 1

• If A1, . . . , An are disjoint events that form a partition of the sample space,
with P(Ai) > 0 for all i, then
pX(x) = ∑_{i=1}^{n} P(Ai) pX∣Ai(x)

(This is a special case of the total probability theorem.) Furthermore, for


any event B, with P(Ai ∩ B) > 0 for all i, we have
pX∣B(x) = ∑_{i=1}^{n} P(Ai∣B) pX∣Ai∩B(x)

• The conditional PMF of X given Y = y is related to the joint PMF by

pX,Y(x, y) = pY(y) pX∣Y(x∣y)

• The conditional PMF of X given Y can be used to calculate the marginal


PMF of X through the formula

pX(x) = ∑_y pY(y) pX∣Y(x∣y)

• There are natural extensions of the above involving more than two
random variables.

Exercise: Conditional PMFs


For each of the formulas below, state whether it is true or false.

a) pX,Y,Z(x, y, z) = pY(y) pZ∣Y(z∣y) pX∣Y,Z(x∣y, z)

b) pX,Y∣Z(x, y∣z) = pX(x) pY∣Z(y∣z)

c) pX,Y∣Z(x, y∣z) = pX∣Z(x∣z) pY∣X,Z(y∣x, z)

d) ∑_x pX,Y∣Z(x, y∣z) = 1

e) ∑_x ∑_y pX,Y∣Z(x, y∣z) = 1

f) pX,Y∣Z(x, y∣z) = pX,Y,Z(x, y, z) / pZ(z)

g) pX∣Y,Z(x∣y, z) = pX,Y,Z(x, y, z) / pY,Z(y, z)

True, False, True, False, True, True, True

4.11. Summary of Facts About Conditional


Expectations
Let X and Y be random variables associated with the same experiment.

• The conditional expectation of X given an event A with P(A) > 0, is defined
by

E[X∣A] = ∑_x x pX∣A(x)

For a function g(X), we have

E[g(X)∣A] = ∑_x g(x) pX∣A(x)

• The conditional expectation of X given a value y of Y is defined by

E[X∣Y = y] = ∑_x x pX∣Y(x∣y)

• If A1, . . . , An are disjoint events that form a partition of the sample space,
with P(Ai) > 0 for all i, then

E[X] = ∑_{i=1}^{n} P(Ai) E[X∣Ai]

Furthermore, for any event B with P(Ai ∩ B) > 0 for all i, we have

E[X∣B] = ∑_{i=1}^{n} P(Ai∣B) E[X∣Ai∩B]

• We have

E[X] = ∑_y pY(y) E[X∣Y = y]

Exercise: The expected value rule with conditioning


For each of the formulas below, state whether it is true or false.

1) E[g(X,Y)∣Y = 2] = ∑_x g(x, y) pX,Y(x, y)

2) E[g(X,Y)∣Y = 2] = ∑_x g(x, y) pX,Y(x, 2)

3) E[g(X,Y)∣Y = 2] = ∑_x g(x, 2) pX,Y(x, 2)

4) E[g(X,Y)∣Y = 2] = ∑_x g(x, 2) pX∣Y(x∣2)

5) E[g(X,Y)∣Y = 2] = ∑_x g(x, 2) pX,Y(x, 2) / pY(2)

6) E[g(X,Y)∣Y = 2] = ∑_x ∑_y g(x, y) pX,Y∣Y(x, y∣2)

False, False, False, True, True, True

4.12. Summary of Facts About Independent


Random Variables
Let A be an event, with P(A) > 0, and let X and Y be random variables
associated with the same experiment.

• X is independent of the event A if

𝑝𝑋∣𝐴(𝑥) = 𝑝𝑋(𝑥) for all x,

that is, if for all x, the events {X = x} and A are independent.

• X and Y are independent if for all pairs (x, y), the events {X = x} and {Y = y}
are independent, or equivalently

𝑝𝑋,𝑌(𝑥, 𝑦) = 𝑝𝑋(𝑥)𝑝𝑌(𝑦) for all x, y.

• If X and Y are independent random variables, then

𝐸[𝑋𝑌] = 𝐸[𝑋]𝐸[𝑌]

Furthermore, for any functions g and h, the random variables g(X) and h(Y )
are independent, and we have

𝐸[𝑔(𝑋)ℎ(𝑌)] = 𝐸[𝑔(𝑋)]𝐸[ℎ(𝑌)]

• If X and Y are independent, then

var(X + Y) = var(X) + var(Y)

Exercise: Independence
Let X, Y, and Z be discrete random variables.

a) Suppose that Z is identically equal to 3, i.e., P(Z=3)=1. Is X guaranteed to


be independent of Z?

b) Would either of the following be an appropriate definition of


independence of the pair (X,Y) from Z?

𝑝𝑋,𝑌,𝑍(𝑥, 𝑦, 𝑧) = 𝑝𝑋(𝑥)𝑝𝑌(𝑦)𝑝𝑍(𝑧)  for all x,y,z



𝑝𝑋,𝑌,𝑍(𝑥, 𝑦, 𝑧) = 𝑝𝑋,𝑌(𝑥, 𝑦)𝑝𝑍(𝑧)  for all x,y,z

c) Suppose that X,Y,Z are independent. Is it true that X and Y are


independent?

d) Suppose that X,Y,Z are independent. Is it true that (X,Y) is independent


from Z?
Yes, No, Yes, Yes, Yes

Exercise: A criterion for independence


Suppose that the conditional PMF of X, given Y=y, is the same for
every y for which pY(y)>0. Is this enough to guarantee independence?
Yes

Exercise: Independence and expectations


Let X and Y be independent positive discrete random variables. For each of
the following statements, determine whether it is true (that is, always true)
or false (that is, not guaranteed to be always true).
1. E[X/Y] = E[X] / E[Y]

2. E[X/Y] = E[X] · E[1/Y]
False, True

Exercise: Independence and variances


The pair of random variables (X,Y) is equally likely to take any of the four
pairs of values (0,1), (1,0), (−1,0), (0,−1). Note that X and Y each have zero
mean.

a) Find E[XY].

b) For this pair of random variables (X,Y), is it true


that Var(X+Y)=Var(X)+Var(Y)?

c) We know that if X and Y are independent, then Var(X+Y)=Var(X)+Var(Y).


Is the converse true? That is, does the condition Var(X+Y)=Var(X)+Var(Y)
imply independence?
0, Yes, No

Exercise: The hat problem


Consider the hat problem, with n=10. What is the expected value of X3X6X7?
1/720

Problem 1. Tosses of a biased coin


Consider 10 independent tosses of a biased coin with the probability of
Heads at each toss equal to p, where 0<p<1.

Let A be the event that there are 6 Heads in the first 8 tosses. Let B be the
event that the 9th toss results in Heads.

Find P(B∣A) and express it in terms of p

Find the probability that there are 3 Heads in the first 4 tosses and 2 Heads
in the last 3 tosses. Express your answer in terms of p using standard
notation. Remember not to use ! or combinations in your answer.

Given that there were 4 Heads in the first 7 tosses, find the probability that
the 2nd Heads occurred at the 4th toss. Give a numerical answer.

We are interested in calculating the probability that there are 5 Heads in


the first 6 tosses and 3 Heads in the last 5 tosses. Give the exact numerical
values of a, b, c, d that would match the answer a·p^7·(1−p)^3 + b·p^c·(1−p)^d
p, 12·p^5·(1−p)^2, 9/35, 30, 4, 8, 2

Problem 2. Three-sided dice


We have two fair three-sided dice, indexed by i=1,2. Each die has sides
labelled 1, 2, and 3. We roll the two dice independently, one roll for each die.
For i=1,2, let the random variable Xi represent the result of the ith die, so
that Xi is uniformly distributed over the set {1,2,3}. Define X=X2−X1

Calculate the numerical values of following probabilities, as well as the


expected value and variance of X:

P(X=0)=?

P(X=1)=?

P(X=−2)=?

P(X=3)=?

E[X]=?

Var(X)=?

Let Y=X2. Calculate the following probabilities:

P(Y=0)=?

P(Y=1)=?

P(Y=2)=?
1/3, 2/9, 1/9, 0, 0, 4/3, 1/3, 4/9, 0

Problem 3. PMF, expectation, and variance


The random variables X and Y have the joint PMF
pX,Y(x, y) = c·(x + y)^2 if x∈{1, 2, 4} and y∈{1, 3}, and 0 otherwise

All answers in this problem should be numerical.

Find the value of the constant c.

Find P(Y<X).

Find P(Y=X).

Find the following probabilities.

P(X=1)=?

P(X=2)=?

P(X=3)=?

P(X=4)=?

Find the expectations E[X] and E[XY].

Find the variance of X.


1/128, 0.648, 0.03125, 20/128, 34/128, 0, 74/128, 2.97, 7.15, 1.47

Problem 4. Joint PMF


The joint PMF, pX,Y(x,y), of the random variables X and Y is given by the
following table:

Find the value of the constant c.

Find pX(1).

Consider the random variable Z = X^2·Y^3. Find E[Z∣Y=−1].

Conditioned on the event that Y≠0, are X and Y independent?

Find the conditional variance of Y given that X=0.


1/28, ½, -1.7143, Yes, 0.89

Problem 5. Indicator variables


Consider a sequence of n+1 independent tosses of a biased coin, at
times k=0,1,2,…,n. On each toss, the probability of Heads is p, and the
probability of Tails is 1−p.

A reward of one unit is given at time k, for k∈{1,2,…,n}, if the toss at


time k resulted in Tails and the toss at time k−1 resulted in Heads.
Otherwise, no reward is given at time k.

Let R be the sum of the rewards collected at times 1,2,…,n.

We will find E[R] and Var(R) by carrying out a sequence of steps. Express


your answers below in terms of p and/or n.

We first work towards finding E[R].

Let Ik denote the reward (possibly 0) given at time k, for k∈{1,2,…,n}.


Find E[Ik].

Using the answer to part 1, find E[R].

The variance calculation is more involved because the random


variables I1,I2,…,In are not independent. We begin by computing the
following values.
If k∈{1,2,…,n}, then E[Ik^2] = ?

If k∈{1,2,…,n−1}, then E[Ik·Ik+1] = ?

If k≥1, ℓ≥2, and k+ℓ≤n, then E[Ik·Ik+ℓ] = ?

Using the results above, calculate the numerical value of Var(R), assuming


that p=3/4, n=10.
p·(1−p), n·p·(1−p), p·(1−p), 0, p^2·(1−p)^2, 0.89
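
Putting these moments together, Var(R) = E[R^2] − (E[R])^2, where E[R^2] collects the n diagonal terms, the n−1 adjacent pairs (which contribute 0), and the remaining non-adjacent pairs. A sketch of the exact computation for p = 3/4, n = 10:

```python
from fractions import Fraction

p, n = Fraction(3, 4), 10
q = p * (1 - p)                                  # E[I_k] = E[I_k**2] = p(1-p)

ER = n * q                                       # E[R]
adjacent = n - 1                                 # pairs (k, k+1): E[I_k I_{k+1}] = 0
non_adjacent = n * (n - 1) // 2 - adjacent       # pairs at distance >= 2
ER2 = n * q + 2 * non_adjacent * (p * (1 - p)) ** 2

print(ER2 - ER ** 2, float(ER2 - ER ** 2))       # 57/64  0.890625
```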

Problem 6. True or False


For each of the following statements, state whether it is true (meaning,
always true) or false (meaning, not always true):

Let X and Y be two binomial random variables.

(a) If X and Y are independent, then X+Y is also a binomial random


variable.
(b) If X and Y have the same parameters, n and p, then X+Y is a binomial
random variable.
(c) If X and Y have the same parameter, and are independent, then X+Y
is a binomial random variable.

Suppose that, E[X]=0. Then, X=0.

Suppose that, E[X2]=0. Then, P(X=0)=1.

Let X be a random variable. Then, E[X2]≥E[X].

Suppose that, X is a random variable, taking positive integer values, which


satisfies E[(X−6)2]=0. Then, pX(4)=pX(5).

Suppose that E[X]≥0. Then, X≥0 with probability 1, i.e., P(X≥0)=1.


False, False, True, False, True, False, True, False

Mid-term exam
True or False
Let A, B, and C be events associated with the same probabilistic model (i.e.,
subsets of a common sample space), and assume that P(C)>0.

For each one of the following statements, decide whether the statement is
True (always true), or False (not always true).

a) Suppose that A⊂C. Then, P(A∣C)≥P(A).


b) Suppose that A⊂B. Then, P(A∣C)≤P(B∣C).
c) Suppose that P(A)≤P(B). Then, P(A∣C)≤P(B∣C).

d) Suppose that A⊂C, B⊂C, and P(A)≤P(B). Then, P(A∣C)≤P(B∣C).


True, True, False, True

A Drunk Person at the Theater


There are n people in line, indexed by i=1,…,n, to enter a theater
with n seats one by one. However, the first person (i=1) in the line is drunk.
This person has lost her ticket and decides to take a random seat instead
of her assigned seat. That is, the drunk person decides to take any one of
the seats 1 to n with equal probability. Every other person i=2,…,n that
enters afterwards is sober and will take his assigned seat (seat i) unless his
seat i is already taken, in which case he will take a random seat chosen
uniformly from the remaining seats.

Suppose that n=3. What is the probability that person 2 takes seat 2?

(Enter a fraction or a decimal accurate to at least 3 decimal places.)

Suppose that n=5. What is the probability that person 3 takes seat 3?

(Enter a fraction or a decimal accurate to at least 3 decimal places.)


2/3, ¾
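
These values can be sanity-checked by simulating the seating process exactly as described (the helper below is written only for illustration; the estimates are approximate):

```python
import random

def seat_probability(n, target, trials=100_000):
    """Estimate P(person `target` ends up in seat `target`) for the drunk-person model."""
    hits = 0
    for _ in range(trials):
        free = set(range(1, n + 1))
        for person in range(1, n + 1):
            if person == 1 or person not in free:      # drunk, or own seat already taken
                choice = random.choice(sorted(free))   # uniform over the remaining seats
            else:
                choice = person                        # sober person takes own seat
            if person == target:
                hits += (choice == target)
            free.remove(choice)
    return hits / trials

print(seat_probability(3, 2))  # ≈ 2/3
print(seat_probability(5, 3))  # ≈ 3/4
```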

Expectation 1
Compute E(X) for the following random variable X:

X=Number of tosses until getting 4 (including the last toss) by tossing a fair
10-sided die.
10

Expectation 2
Compute E(X) for the following random variable X:

X=Number of tosses until all 10 numbers are seen (including the last toss)
by tossing a fair 10-sided die.

To answer this, we will use induction and follow the steps below:

Let E(i) be the expected number of additional tosses until all 10 numbers


are seen (including the last toss) given i distinct numbers have already
been seen.

Find E(10).

Write down a relation between E(i) and E(i+1). Answer by finding the


function f(i) in the formula below.

For i=0,1,…,9

𝐸(𝑖) = 𝐸(𝑖 + 1) + 𝑓(𝑖)

where f(i)= ?

Finally, using the results above, find E[X].

(Enter an answer accurate to at least 1 decimal place.)


0, 10/(10−i), 29.29
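
Unrolling the recursion from E(10) = 0 gives E[X] = E(0) = ∑_{i=0}^{9} 10/(10−i) = 10·(1/1 + 1/2 + ⋯ + 1/10). A sketch of the arithmetic:

```python
# E(i) = E(i+1) + 10/(10 - i), with E(10) = 0, so E[X] = E(0).
E = 0.0
for i in range(9, -1, -1):   # i = 9, 8, ..., 0
    E += 10 / (10 - i)
print(round(E, 2))           # 29.29
```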

Conditional Independence 1
Suppose that we have a box that contains two coins:

A fair coin: P(H)=P(T)=0.5.

A two-headed coin: P(H)=1.

A coin is chosen at random from the box, i.e. either coin is chosen with
probability 1/2, and tossed twice. Conditioned on the identity of the coin,
the two tosses are independent.

Define the following events:

Event A: first coin toss is H.

Event B: second coin toss is H.

Event C: two coin tosses result in HH.

Event D: the fair coin is chosen.

For the following statements, decide whether they are true or false.

a) A and B are independent.
b) A and C are independent.
c) A and B are independent given D.
d) A and C are independent given D.
False, False, True, False

Conditional Independence 2
Suppose three random variables X, Y, Z have a joint distribution
PX,Y,Z(x, y, z) = PX(x) PZ∣X(z∣x) PY∣Z(y∣z)

Then, X and Y are independent given Z.

Suppose random variables X and Y are independent given Z, then the joint


distribution must be of the form

𝑃𝑋,𝑌,𝑍(𝑥, 𝑦, 𝑧) = ℎ(𝑥, 𝑧)𝑔(𝑦, 𝑧)

where h,g are some functions.


True, True

Variance of Difference of Indicators


Let A be an event, and let IA be the associated indicator random variable
(IA is 1 if A occurs, and zero if A does not occur). Similarly, let IB be the
indicator of another event, B. Suppose that P(A)=p, P(B)=q, and P(A∩B)=r

Find the variance of IA−IB, in terms of p, q, r.

Var(IA − IB) = ?

p − 2r + q − (p − q)^2

For all problems on this page, use the following setup:

Let N be a positive integer random variable with PMF of the form


pN(n) = (1/2)·n·2^(−n),  n = 1, 2, …

Once we see the numerical value of N, we then draw a random


variable K whose (conditional) PMF is uniform on the set {1,2,…,2n}.

Joint PMF
Write down an expression for the joint PMF pN,K(n, k).

For n=1,2,… and k=1,2,…,2n:

1 / (4·2^n)

Marginal Distribution
Find the marginal PMF pK(k) as a function of k. For simplicity, provide the
answer only for the case when k is an even number. (The formula for
when k is odd would be slightly different, and you do not need to provide it).
Hint: You may find the following helpful: ∑_{i=0}^{∞} r^i = 1/(1−r), for 0 < r < 1.

For k=2,4,6,… :

pK(k) = ?

1 / 2^((k/2)+1)

Discrete PMFs
Let A be the event that K is even. Find P(A|N=n) and P(A).

P(A∣N=n)=?

P(A)=?
½, ½

Independence 2
Is the event A independent of N?
Yes

Chapter V

5. Continuous
Random
Variables
5.1. Summary of PDF Properties
Let X be a continuous random variable with PDF fX.

• fX(x) ≥ 0 for all x, and ∫_{−∞}^{∞} fX(x) dx = 1.

• If δ is very small, then P(X ∈ [x, x + δ]) ≈ fX(x)·δ.

• For any subset B of the real line

𝑃(𝑋∈𝐵) = ∫ 𝑓𝑋(𝑥)𝑑𝑥
𝐵

Exercise: PDFs
Let X be a continuous random variable with a PDF of the form

fX(x) = c(1 − x) if x ∈ [0, 1], and 0 otherwise

Find the following values.

c=?

P(X=1/2)=?

P(X ∈ {1/k : k integer, k≥2}) = ?

P(X≤1/2)=?
2, 0, 0, ¾

Exercise: Piecewise constant PDF


Consider a piecewise constant PDF of the form

fX(x) = 2c if 0 ≤ x ≤ 1,  c if 1 < x ≤ 3,  0 otherwise

c=?

P(1/2≤X≤3/2)=?
¼, 3/8

Exercise: Uniform PDF


Let X be uniform on the interval [1,3]. Suppose that 1<a<b<3. Then

𝑃(𝑎≤𝑋≤𝑏) =?

𝐸[𝑋] =?

E[X^3] = ?

(b−a)/2, 2, 10

5.2. Expectation of a Continuous Random Variable


and its Properties
Let X be a continuous random variable with PDF fX.

• The expectation of X is defined by:

E[X] = ∫_{−∞}^{∞} x fX(x) dx

• The expected value rule for a function g(X) has the form

E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx

• The variance of X is defined by

var(X) = E[(X − E[X])^2] = ∫_{−∞}^{∞} (x − E[X])^2 fX(x) dx

• We have

0 ≤ var(X) = E[X^2] − (E[X])^2

• If Y = aX + b, where a and b are given scalars, then

E[Y] = aE[X] + b,  var(Y) = a^2 var(X)

Properties of a CDF

The CDF FX of a random variable X is defined by

𝐹𝑋(𝑥) = 𝑃(𝑋≤𝑥) , 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑥

and has the following properties.

• FX is monotonically non decreasing:

If 𝑥≤𝑦, 𝑡ℎ𝑒𝑛 𝐹𝑋(𝑥)≤𝐹𝑋(𝑦)

• FX (x) tends to 0 as x → −∞, and to 1 as x → ∞.

• If X is discrete, then FX (x) is a piecewise constant function of x.

• If X is continuous, then FX(x) is a continuous function of x.

• If X is discrete and takes integer values, the PMF and the CDF can be
obtained from each other by summing or differencing:
FX(k) = ∑_{i=−∞}^{k} pX(i),   pX(k) = P(X≤k) − P(X≤k−1) = FX(k) − FX(k−1)

for all integers k.

• If X is continuous, the PDF and the CDF can be obtained from each other
by integration or differentiation:
FX(x) = ∫_{−∞}^{x} fX(t) dt,   fX(x) = dFX/dx (x)

(The second equality is valid for those x at which the PDF is continuous.)

Exercise: Exponential PDF


Let X be an exponential random variable with parameter λ=2. Find the
values of the following. Use 'e' for the base of the natural logarithm
E[(3X + 1)^2] = ?

𝑃(1≤𝑋≤2) =?
17/2, 0.12
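
These follow from E[X] = 1/λ and E[X^2] = 2/λ^2 for an exponential: E[(3X+1)^2] = 9E[X^2] + 6E[X] + 1 = 9/2 + 3 + 1 = 17/2, and P(1 ≤ X ≤ 2) = e^(−2) − e^(−4) ≈ 0.12. A numerical check by direct integration of the PDF (a sketch):

```python
import math

lam = 2.0
f = lambda x: lam * math.exp(-lam * x)        # exponential PDF with parameter 2

def integrate(g, a, b, steps=100_000):
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h   # midpoint rule

print(integrate(lambda x: (3 * x + 1) ** 2 * f(x), 0, 50))       # ≈ 8.5 = 17/2
print(integrate(f, 1, 2), math.exp(-2) - math.exp(-4))           # ≈ 0.1170
```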

Exercise: Exponential CDF


Let X be an exponential random variable with parameter 2.

Find the CDF of X. Express your answer in terms of x.

𝐹𝑜𝑟 𝑥≤0, 𝐹𝑋(𝑥) =?

For x > 0, FX(x) = ?

0, 1 − e^(−2x)

5.3. Normal Random Variables


Normality is Preserved by Linear Transformations

If X is a normal random variable with mean µ and variance σ^2, and if a ≠ 0
and b are scalars, then the random variable Y = aX + b is also normal, with mean
and variance

E[Y] = aµ + b,  var(Y) = a^2 σ^2

CDF Calculation for a Normal Random Variable

For a normal random variable X with mean µ and variance σ2, we use a
two-step procedure.

(a) “Standardize” X, i.e., subtract µ and divide by σ to obtain a standard


normal random variable Y .

(b) Read the CDF value from the standard normal table:

P(X ≤ x) = P((X − µ)/σ ≤ (x − µ)/σ) = P(Y ≤ (x − µ)/σ) = Φ((x − µ)/σ)

The standard normal table. The entries in this table provide the numerical values of Φ(y) =
P(Y ≤ y), where Y is a standard normal random variable, for y between 0 and 3.49. For
example, to find Φ(1.71), we look at the row corresponding to 1.7 and the column
corresponding to 0.01, so that Φ(1.71) = .9564. When y is negative, the value of Φ(y) can be
found using the formula Φ(y) = 1 − Φ(−y)

Exercise: Normal random variables


Choose the correct answer below.
According to our conventions, a normal random variable X∼N(μ,σ2) is a
continuous random variable

a) always.
b) if and only if σ≠0.
c) if and only if μ≠0 and σ≠0.
b

Exercise: Using the normal tables


Let X be a normal random variable with mean 4 and variance 9.

Use the normal table to find the following probabilities, to an accuracy of 4


decimal places.

P(X≤5.2)=?

P(X≥2.8)=?

P(X≤2.2)=?
0.6554, 0.6554, 0.2743

Exercise: A conditional PDF


Suppose that X has a PDF of the form
fX(x) = 1/x², if x ≥ 1;  0, otherwise

For any x>2, the conditional PDF of X, given the event X>2 is


2/x²

Exercise: Memorylessness of the exponential


Let X be an exponential random variable with parameter λ.

a) The probability that X>5 is


a) λe^{−5λ}
b) e^{−5λ}
c) none of the above

b) The probability that X>5 given that X>2 is


a) λe^{−5λ}
b) e^{−5λ}
c) λe^{−3λ}
d) e^{−3λ}
e) None of the above

c) Given that X>2, and for a small δ>0, the probability that 4≤X≤4+2δ is


approximately

a) λδ
b) 2λδ
c) δe^{−4λ}
d) λδe^{−4λ}
e) λδe^{−2λ}
f) 2λδe^{−2λ}
g) None of the above
b, d, f

5.4. Conditional PDF Given an Event


• The conditional PDF fX|A of a continuous random variable X, given an event A with P(A) > 0, satisfies

P(X ∈ B | A) = ∫_B fX|A(x) dx

• If A is a subset of the real line with P(X ∈ A) > 0, then

fX|{X∈A}(x) = fX(x)/P(X ∈ A) if x ∈ A, and 0 otherwise

• Let A1, A2, . . . , An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i. Then,

fX(x) = Σ_{i=1}^{n} P(Ai) fX|Ai(x)

(a version of the total probability theorem).

5.5. Conditional PDF Given a Random Variable


Let X and Y be jointly continuous random variables with joint PDF fX,Y.

• The joint, marginal, and conditional PDFs are related to each other by the formulas

fX,Y(x, y) = fY(y) fX|Y(x | y),   fX(x) = ∫_{−∞}^{∞} fY(y) fX|Y(x | y) dy

The conditional PDF fX|Y(x | y) is defined only for those y for which fY(y) > 0.

• For any subset A of the real line, we have

P(X ∈ A | Y = y) = ∫_A fX|Y(x | y) dx

Exercise: Total probability theorem II


On any given day, mail gets delivered by either Alice or Bob. If Alice delivers
it, which happens with probability 1/4, she does so at a time that is
uniformly distributed between 9 and 11. If Bob delivers it, which happens
with probability 3/4, he does so at a time that is uniformly distributed
between 10 and 12. The PDF of the time X that mail gets delivered satisfies

𝑓𝑋(9. 5) =?

𝑓𝑋(10. 5) =?

1/8, ½
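These two values come straight from the total probability theorem for PDFs, fX(x) = P(Alice)·fX|Alice(x) + P(Bob)·fX|Bob(x); a small plain-Python sketch:

def uniform_pdf(x, a, b):
    # density of a uniform distribution on [a, b]
    return 1.0 / (b - a) if a <= x <= b else 0.0

def f_X(x):
    # mixture: Alice w.p. 1/4 delivers Uniform(9, 11), Bob w.p. 3/4 delivers Uniform(10, 12)
    return 0.25 * uniform_pdf(x, 9, 11) + 0.75 * uniform_pdf(x, 10, 12)

print(f_X(9.5))    # 0.25 * 0.5 = 0.125
print(f_X(10.5))   # 0.25 * 0.5 + 0.75 * 0.5 = 0.5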

Exercise: A mixed random variable


A lightbulb is installed. With probability 1/3, it burns out immediately when
it is first installed. With probability 2/3, it burns out after an amount of time
that is uniformly distributed on [0,3]. The expected value of the time until
the lightbulb burns out is
1

Exercise: Jointly continuous r.v.'s


The random variables X and Y are continuous. Is this enough information to
determine the value of P(X² = e^{3Y})?

The random variables X and Y are jointly continuous. Is this enough


information to determine the value of P(X² = e^{3Y})?
No, Yes

 From joint PDFs to probabilities


a) The probability of the event that 0≤Y≤X≤1 is of the form

∫_a^b ( ∫_c^d fX,Y(x, y) dx ) dy

Find the values of a, b, c, d. Each one of your answers should be one of the
following: 0, x, y, or 1.
0, 1, y, 1

Finding a marginal PDF


The random variables X and Y are described by a uniform joint PDF of the
form
fX,Y(x, y) = 2/3 on the set {0≤x≤1, 0≤y≤1, y≤x}
𝑓𝑋(0. 5) =?

From joint PDFs to the marginal


For each one of the following formulas, identify those that are always true.
All integrals are meant to be from −∞ to ∞.

a) 𝑓𝑋,𝑍(𝑎, 𝑏) = ∫𝑓𝑋,𝑌,𝑍(𝑎, 𝑏, 𝑐)𝑑𝑎


b) 𝑓𝑋,𝑍(𝑎, 𝑐) = ∫𝑓𝑋,𝑌,𝑍(𝑎, 𝑏, 𝑐)𝑑𝑏
c) 𝑓𝑋,𝑍(𝑎, 𝑏) = ∫𝑓𝑋,𝑌,𝑍(𝑎, 𝑏, 𝑐)𝑑𝑐
d) 𝑓𝑌(𝑎) = ∭𝑓𝑈,𝑉,𝑋,𝑌(𝑎, 𝑏, 𝑐, 𝑠)𝑑𝑏𝑑𝑐𝑑𝑠
e) 𝑓𝑌(𝑎) = ∭𝑓𝑈,𝑉,𝑋,𝑌(𝑠, 𝑐, 𝑏, 𝑎)𝑑𝑏𝑑𝑐𝑑𝑠

No, Yes, No, No, Yes

Exercise: Joint CDFs


Is it always true that if x < x′, then FX,Y(x, y) ≤ FX,Y(x′, y)?

Suppose that the random variables X and Y are jointly continuous and take values on the unit square, i.e., 0≤x≤1 and 0≤y≤1. Is FX,Y(x, y) = (x + 2y)²/9 a legitimate joint CDF? Hint: Consider FX,Y(0,1).

As above, suppose that the random variables X and Y are jointly


continuous and take values on the unit square, i.e., 0≤x≤1 and 0≤y≤1.
The joint CDF on that set is of the form xy(x+y)/2. Find an expression for
the joint PDF which is valid for (x,y) in the unit square. Enter an algebraic
function of x and y.
Yes, No, x+y

Exercise: Conditional PDF


The random variables X and Y are jointly continuous, with a joint PDF of the
form

𝑓𝑋,𝑌(𝑥, 𝑦) = {𝑐𝑥𝑦, 𝑖𝑓0≤𝑥≤𝑦≤1 0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

where c is a normalizing constant.

a) Is it true that fX|Y(2|0.5) is equal to zero?

b) Is it true that fX|Y(0.5|2) is equal to zero?


Yes, No

For x∈[0,0.5], the conditional PDF fX|Y(x|0.5) is of the form axb. Find a and b.


Your answers should be numbers.
8, 1

5.6. Summary of Facts About Conditional Expectations

Let X and Y be jointly continuous random variables, and let A be an event with P(A) > 0.

• Definitions: The conditional expectation of X given the event A is defined by

E[X | A] = ∫_{−∞}^{∞} x fX|A(x) dx

The conditional expectation of X given that Y = y is defined by

E[X | Y = y] = ∫_{−∞}^{∞} x fX|Y(x | y) dx

• The expected value rule: For a function g(X), we have

E[g(X) | A] = ∫_{−∞}^{∞} g(x) fX|A(x) dx

and

E[g(X) | Y = y] = ∫_{−∞}^{∞} g(x) fX|Y(x | y) dx

• Total expectation theorem: Let A1, A2, . . . , An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0 for all i. Then,

E[X] = Σ_{i=1}^{n} P(Ai) E[X | Ai]

Similarly,

E[X] = ∫_{−∞}^{∞} E[X | Y = y] fY(y) dy

• There are natural analogs for the case of functions of several random variables. For example,

E[g(X, Y) | Y = y] = ∫ g(x, y) fX|Y(x | y) dx

and

E[g(X, Y)] = ∫ E[g(X, Y) | Y = y] fY(y) dy

5.7. Expected value rule and total expectation theorem

Let X, Y, and Z be jointly continuous random variables. Assume that all conditional PDFs and expectations are well defined. E.g., when conditioning on X=x, assume that x is such that fX(x)>0. For each one of the following formulas, state whether it is true for all choices of the function g or false (i.e., not true for all choices of g).

1. 𝐸[𝑋 = 𝑥] = ∫𝑔(𝑦)𝑓𝑌∣𝑋(𝑥)𝑑𝑦

2. 𝐸[𝑋 = 𝑥] = ∫𝑔(𝑦)𝑓𝑌∣𝑋(𝑥)𝑑𝑦

3. 𝐸[𝑔(𝑌)] = ∫𝐸[𝑍 = 𝑧]𝑓𝑍(𝑧)𝑑𝑧

4. 𝐸[𝑋 = 𝑥, 𝑍 = 𝑧] = ∫𝑔(𝑦)𝑓𝑌∣𝑋,𝑍(𝑥, 𝑧)𝑑𝑦

5. 𝐸[𝑋 = 𝑥] = ∫𝐸[𝑋 = 𝑥, 𝑍 = 𝑧]𝑓𝑍∣𝑋(𝑥)𝑑𝑧

6. 𝐸[𝑌 = 𝑦] = 𝐸[𝑌 = 𝑦]

7. 𝐸[𝑌 = 𝑦] = 𝐸[𝑔(𝑋, 𝑦)]

8. 𝐸[𝑌 = 𝑦] = ∫𝑔(𝑥, 𝑧)𝑓𝑋,𝑍∣𝑌(𝑦)𝑑𝑦



True, False, True, True, True, True, False, False

5.8. Independence of Continuous Random Variables
Let X and Y be jointly continuous random variables.

• X and Y are independent if

𝑓𝑋,𝑌(𝑥, 𝑦) = 𝑓𝑋(𝑥)𝑓𝑌(𝑦), 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑥, 𝑦

• If X and Y are independent, then

𝐸[𝑋𝑌] = 𝐸[𝑋]𝐸[𝑌]

Furthermore, for any functions g and h, the random variables g(X) and h(Y )
are independent, and we have

𝐸[𝑔(𝑋)ℎ(𝑌)] = 𝐸[𝑔(𝑋)]𝐸[ℎ(𝑌)]

• If X and Y are independent, then

var(X + Y) = var(X) + var(Y)

Definition of independence
Suppose that X and Y are independent, with a joint PDF that is uniform on a
certain set S: fX,Y(x,y) is constant on S, and zero otherwise. The set S

a) must be a square.
b) must be a set of the form {(x,y): x∈A, y∈B} (known as the Cartesian product of two sets A and B).
c) can be any set.
b

Exercise: Independence and expectations II


Let X, Y, Z be independent jointly continuous random variables, and
let g, h, r be some functions. For each one of the following formulas, state
whether it is true for all choices of the functions g, h, and r, or false (i.e., not
true for all choices of these functions). Do not attempt formal derivations;
use an intuitive argument.

1. 𝐸[𝑔(𝑋, 𝑌)ℎ(𝑍)] = 𝐸[𝑔(𝑋, 𝑌)] · 𝐸[ℎ(𝑍)]

2. 𝐸[𝑔(𝑋, 𝑌)ℎ(𝑌, 𝑍)] = 𝐸[𝑔(𝑋, 𝑌)] · 𝐸[ℎ(𝑌, 𝑍)]



3. 𝐸[𝑔(𝑋)𝑟(𝑌)ℎ(𝑍)] = 𝐸[𝑔(𝑋)] · 𝐸[𝑟(𝑌)] · 𝐸[ℎ(𝑍)]


True, False, True

Exercise: Independence and CDFs


a) Suppose that X and Y are independent. Is it true that their joint CDF
satisfies 𝐹𝑋,𝑌(𝑥, 𝑦) = 𝐹𝑋(𝑥)𝐹𝑌(𝑦) for all x and y?

b) Suppose FX,Y(x, y) = FX(x)FY(y), for all x and y. Is it true that X and Y are independent? Recall fX,Y(x, y) = (∂²/∂x∂y) FX,Y(x, y).

Yes, Yes

Exercise: Stick-breaking
Consider the same stick-breaking problem as in the previous clip, and
let ℓ=1 Recall that fX,Y(x,y)=1/x when 0≤y≤x≤1.

a) Conditioned on Y=2/3, the conditional PDF of X is nonzero when a≤x≤b.


Find a and b.

b) On the range found in part (a), the conditional PDF fX|Y(x|2/3) is of the


form cxd for some constants c and d. Find d
2/3, 1, -1

Exercise: Independent normals


The random variables X and Y have a joint PDF of the form

fX,Y(x, y) = c · exp{ −(1/2)(4x² − 8x + y² − 6y + 13) }
E[X]=?

Var(X)?

E[Y]=?

Var(Y)=?
1, ¼, 3, 1

5.9. Bayes’ Rule Relations for Random Variables


Let X and Y be two random variables.

• If X and Y are discrete, we have for all x, y with pX(x)≠0, pY(y)≠0,

pX(x) pY|X(y | x) = pY(y) pX|Y(x | y)

and the terms on the two sides in this relation are both equal to

pX,Y(x, y)

• If X is discrete and Y is continuous, we have for all x, y with pX(x)≠0, fY(y)≠0,

pX(x) fY|X(y | x) = fY(y) pX|Y(x | y)

and the terms on the two sides in this relation are both equal to

lim_{δ→0} P(X = x, y ≤ Y ≤ y + δ)/δ

• If X and Y are continuous, we have for all x, y with fX(x)≠0, fY(y)≠0,

fX(x) fY|X(y | x) = fY(y) fX|Y(x | y)

and the terms on the two sides in this relation are both equal to

lim_{δ→0} P(x ≤ X ≤ x + δ, y ≤ Y ≤ y + δ)/δ²

Exercise: The discrete Bayes rule


The bias of a coin (i.e., the probability of Heads) can take three possible
values, 1/4, 1/2, or 3/4, and is modeled as a discrete random variable Q with
PMF
pQ(q) = 1/6 if q = 1/4;  2/6 if q = 1/2;  3/6 if q = 3/4;  0 otherwise

Let K be the total number of Heads in two independent tosses of the coin.
Find pQ|K(3/4|2).
3/8

5.10. Summary of Results for Special Random Variables
Continuous Uniform Over [a, b]:

fX(x) = 1/(b−a) if a ≤ x ≤ b, and 0 otherwise

E[X] = (a + b)/2,   var(X) = (b − a)²/12

Exponential with Parameter λ:

fX(x) = λe^{−λx} if x ≥ 0, and 0 otherwise

FX(x) = 1 − e^{−λx} if x ≥ 0, and 0 otherwise

E[X] = 1/λ,   var(X) = 1/λ²

Normal with Parameters µ and σ² > 0:

fX(x) = (1/(√(2π) σ)) e^{−(x−µ)²/2σ²},   E[X] = µ,   var(X) = σ²

Exercise: Discrete unknown, continuous measurement


Let K be a discrete random variable that can take the values 1, 2, and 3, all
with equal probability. Suppose that X takes values in [0,1] and that for x in
that interval we have
fX|K(x | k) = 1 if k = 1;  2x if k = 2;  3x² if k = 3

Find the probability that K=1, given that X=1/2.


4/11
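The computation behind this answer is the discrete-unknown/continuous-observation form of Bayes' rule; a plain-Python sketch:

# prior over k = 1, 2, 3 and conditional densities f_{X|K}(x|k) evaluated at x = 1/2
prior = {1: 1/3, 2: 1/3, 3: 1/3}
x = 0.5
likelihood = {1: 1.0, 2: 2 * x, 3: 3 * x ** 2}

evidence = sum(prior[k] * likelihood[k] for k in prior)
posterior = {k: prior[k] * likelihood[k] / evidence for k in prior}
print(posterior[1])   # 4/11, about 0.364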

Exercise: Inference of the bias of a coin


The random variable K is geometric with a parameter which is itself a
uniform random variable Q on [0,1]. Find the value fQ|K(0.5|1) of the
conditional PDF of Q, given that K=1. Hint: Use the result in the last
segment.
1

5.11. Problems
Problem 1. Normal random variables
Let X and Y be two normal random variables, with means 0 and 3,
respectively, and variances 1 and 16, respectively. Find the following, using
the standard normal table.

P(X>−1)= ?

P(X≤−2)= ?

Let V=(4−Y)/3. Find the mean and the variance of V.

E[V]=?

Var(V)=?

P(−2<Y≤2)=?
0.8413, 0.0228, 1/3, 16/9, 0.2957

Problem 2. CDF
For each one of the following figures, identify whether it is a valid CDF. The
value of the CDF at points of discontinuity is indicated with a small solid
circle.

(Figures 1 to 4 show the four candidate CDF plots; they are not reproduced here. For the first plot, assume that the function never goes above 0.6.)
Not a valid CDF, Not a valid CDF, Valid CDF, Not a valid CDF

Problem 3. A joint PDF given by a simple formula


The random variables X and Y are distributed according to the joint PDF
fX,Y(x, y) = ax² if 1 ≤ x ≤ 2 and 0 ≤ y ≤ x, and 0 otherwise

Find the constant a.

Determine the marginal PDF fY(y).


(Your answer can be either numerical or an algebraic function of y).
Useful fact: You may find the following fact useful: ∫_a^b x² dx = (b³ − a³)/3

If 0≤y≤1:

𝑓𝑌(𝑦) =?

If 1<y≤2:

𝑓𝑌(𝑦) =?

Determine the conditional expectation of 1/(X²Y), given that Y = 5/4.

E[1/(X²Y) | Y = 5/4] = ?

4/15,  28/45,  (4/45)·(8 − y³),  0.2976

Problem 4. Sophia's vacation


Sophia is vacationing in Monte Carlo. On any given night, she
takes X dollars to the casino and returns with Y dollars. The random
variable X has the PDF shown in the figure. Conditional on X=x, the
continuous random variable Y is uniformly distributed between zero
and 3x.

If 0<x<40 and 0<y<3x:

𝑓𝑋,𝑌(𝑥, 𝑦) =?

If y<0 or y>3x:

𝑓𝑋,𝑌(𝑥, 𝑦) =?

On any particular night, Sophia makes a profit Z=Y−X dollars. Find the


probability that Sophia makes a positive profit, that is, find P(Z>0).

P(Z>0)=?

Find the PDF of Z. Express your answers in terms of z using standard


notation.

Hint : start by finding 𝑓𝑍∣𝑋(𝑧∣𝑥)

If −40<z<0:

𝑓𝑍(𝑧) =?

If 0<z<80:

𝑓𝑍(𝑧) =?

If z<−40 or z>80:

𝑓𝑍(𝑧) =?

What is E[Z]?
1/2400,  0,  2/3,  (40 + z)/2400,  (80 − z)/4800,  0,  40/3

Problem 5. True or False


Determine whether each of the following statement is true (i.e., always
true) or false (i.e., not always true).

1. Let X be a random variable that takes values between 0 and c only, for some c ≥ 0, so that P(0≤X≤c) = 1. Then, Var(X) ≤ c²/4.

2. Let X and Y be continuous random variables. If X ∼ N(µ, σ²) (i.e., normal with mean µ and variance σ²), Y = aX + b, and a > 0, then Y ∼ N(aµ + b, aσ²).

3. The expected value of a non-negative continuous random variable X, which is defined by E[X] = ∫_0^∞ x fX(x) dx, also satisfies

E[X] = ∫_0^∞ P(X > t) dt

True, False, True

Problem 6. Bayes' rule


Let K be a discrete random variable with PMF
pK(k) = 1/4 if k = 1;  1/2 if k = 2;  1/4 if k = 3;  0 otherwise

Conditional on K=1,2, or 3, random variable Y is exponentially distributed


with parameter 1, 1/2, or 1/3, respectively.

Using Bayes' rule, find the conditional PMF pK∣Y(k∣y). Which of the following
is the correct expression for pK∣Y(2|y), when y≥0?
a) e^{−y/2} / ( e^{−y} + e^{−y/2} + (1/3)e^{−y/3} )

b) e^{−y} / ( e^{−y} + e^{−y/2} + (1/3)e^{−y/3} )

c) e^{−y/2} / ( (1/3)e^{−y} + (1/3)e^{−y/2} + (1/3)e^{−y/3} )

d) e^{−y/3} / ( e^{−y} + e^{−y/2} + (1/3)e^{−y/3} )

a

Problem 7. A joint PDF on a triangular region


This figure below describes the joint PDF of the random variables X and Y.
These random variables take values in [0,2] and [0,1], respectively. At x=1,
the value of the joint PDF is 1/2.

Are X and Y independent?

Find 𝑓𝑋(𝑥):

● If 0<x≤1
● If 1<x<2
● If x<0 or x≥2

Find 𝑓𝑌∣𝑋(𝑦∣0. 5):

● If y<0 or y>1/2
● If 1/2<x<1
● If 1<x<3/2
● If x<1/2 or x>3/2

Let R=XY and let A be the event that {X<0.5}. Find E[R|A].


No,  x/2,  3 − 3x/2,  0,  2,  0,  ½,  3/2,  0,  1/16

Chapter VI

6. Further Topics
Random
Variables
6.1. Calculation of the PDF of a Function Y = g(X)
of a Continuous Random Variable X
Calculate the CDF FY of Y using the formula

FY(y) = P(g(X) ≤ y) = ∫_{{x : g(x) ≤ y}} fX(x) dx

Differentiate to obtain the PDF of Y:

fY(y) = dFY/dy (y)

6.2. The PDF of a Linear Function of a Random Variable
Let X be a continuous random variable with PDF fX, and let

𝑌 = 𝑎𝑋 + 𝑏

where a and b are scalars, with 𝑎≠0. 𝑇ℎ𝑒𝑛

fY(y) = (1/|a|) fX((y − b)/a)
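A Monte Carlo sanity check of this formula (a sketch assuming NumPy; the particular choice X exponential with parameter 1 and Y = 2X + 3 is ours):

import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 3.0
x = rng.exponential(scale=1.0, size=1_000_000)
y = a * x + b

# empirical density of Y on a few bins vs. the formula (1/|a|) f_X((y-b)/a)
bins = np.linspace(3.0, 9.0, 25)
hist, edges = np.histogram(y, bins=bins, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
formula = (1.0 / abs(a)) * np.exp(-(centers - b) / a)   # f_X(u) = e^{-u} for u >= 0
print(np.max(np.abs(hist - formula)))                   # small (Monte Carlo error only)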

Exercise: Linear functions of discrete r.v.'s


The random variables X and Y obey a linear relation of the
form Y=aX+b and have the PMFs shown in the diagram. Find the values
of a and b.

-1, 5

Exercise: Linear functions of continuous r.v.'s


(a) Let X be an exponential random variable and let Y=aX+b. The random
variable Y is exponential if and only if (choose one of the following
statements):

a) always.
b) a≠0.
c) a≠0 and b=0
d) a>0
e) a>0 and b=0
f) a=1
e

(b) Let X be a continuous random variable, uniformly distributed on some


interval, and let Y=aX+b. The random variable Y will be a continuous
random variable with a uniform distribution if and only if (choose one of the
following statements):

a) always.
b) a≠0.
c) a≠0 and b=0
d) a>0
e) a>0 and b=0
b

Exercise: PDF of a general function


The random variable X has a PDF of the form
fX(x) = 1/x², for x ≥ 1;  0, otherwise

Let Y=X². For y≥1, the PDF of Y takes the form fY(y) = a/y^b. Find the values
of a and b.

½, 3/2

6.3. PDF Formula for a Strictly Monotonic Function of a Continuous Random Variable
Suppose that g is strictly monotonic and that for some function h and all x
in the range of X we have

𝑦 = 𝑔(𝑥) if and only if 𝑥 = ℎ(𝑦)

Assume that h is differentiable. Then, the PDF of Y in the region where fY(y) > 0 is given by

fY(y) = fX(h(y)) |dh/dy (y)|

Exercise: Using the formula for the monotonic case


The random variable X is exponential with parameter λ=1. The random
variable Y is defined by Y=g(X)=1/(1+X).

a) The inverse function h, for which h(g(x))=x, is of the form ayb+c. Find a, b,


and c.
b) For y ∈ (0, 1], the PDF of Y is of the form fY(y) = y^a e^{(b/y)+c}. Find a, b, and c.

a = 1, b = −1, c = −1;  a = −2, b = −1, c = 1

Exercise: Nonmonotonic functions


Suppose that X is a continuous random variable and that Y=X4. Then,
for y≥0, we have
fY(y) = a y^b fX(−c y^d) + a y^b fX(c y^d)
for some a, b, d, and some c>0. Find a, b, c, and d.
¼, -3/4, 1, ¼

Exercise: A function of multiple r.v.'s


Suppose that X and Y are described by a joint PDF which is uniform inside the unit circle, that is, the set of points that satisfy x² + y² ≤ 1. In particular, the joint PDF takes the value 1/π on the unit circle. Let Z = √(X² + Y²), which is the distance of the outcome (X,Y) from the origin. The PDF of Z, for z∈[0,1], takes the form fZ(z) = a z^b. Find a and b.

2, 1

Exercise: Discrete convolution


The random variables X and Y are independent and have the PMFs shown
in this diagram.

The probability that X+Y=6 is:


¼

Exercise: Continuous convolution


When calculating the convolution of two PDFs, one must be careful to use
the appropriate limits of integration. Suppose that X and Y are nonnegative
random variables. In particular, fX(x) is equal to some positive
function hX(x) for x≥0 and is zero for x<0. Similarly, fY(y) is equal to some
positive function hY(y) for y≥0, and is zero for y<0. Then, the convolution integral ∫_{−∞}^{∞} fX(x) fY(z − x) dx is of the form

∫_a^b hX(x) hY(z − x) dx

for suitable choices of a and b determined by z. Fix some z≥0. Find a and b.


(Your answer can be an algebraic function of z.)
0, z
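The same integral can be evaluated numerically; a sketch assuming NumPy, with both densities taken to be exponential with parameter 1 (our choice), for which the exact convolution is the Erlang density z·e^{−z}:

import numpy as np

def h(u):
    # common PDF of X and Y for u >= 0 (exponential with lambda = 1)
    return np.exp(-u)

z = 2.0
x = np.linspace(0.0, z, 20001)            # integration limits a = 0, b = z
f_Z = np.trapz(h(x) * h(z - x), x)
print(f_Z, z * np.exp(-z))                # numerical vs exact, both about 0.2707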

Exercise: Sum of normals


Let X and Y be independent normal random variables.

a) Is 2X−4 always normal?
b) Is 3X−4Y always normal?

c) Is X2+Y always normal?
True, True, False

6.4. Covariance and Correlation


The covariance of X and Y is given by

cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]

• If cov(X, Y) = 0, we say that X and Y are uncorrelated.

• If X and Y are independent, they are uncorrelated. The converse is not always true.

• We have

var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)

• The correlation coefficient ρ(X, Y) of two random variables X and Y with positive variances is defined by

ρ(X, Y) = cov(X, Y) / (σX σY)

and satisfies

−1 ≤ ρ(X, Y) ≤ 1

Exercise: Covariance calculation


Suppose that X, Y, and Z are independent random variables with unit
variance. Furthermore, E[X]=0 and E[Y]=E[Z]=2. Then, 𝑐𝑜𝑣⁡(𝑋𝑌, 𝑋𝑍) = ?
4
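A Monte Carlo check of this value (a sketch assuming NumPy; the specific distributions below are our choice, since only the stated means and variances matter):

import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000
X = rng.normal(0.0, 1.0, n)      # mean 0, variance 1
Y = rng.normal(2.0, 1.0, n)      # mean 2, variance 1
Z = rng.normal(2.0, 1.0, n)      # mean 2, variance 1

U, V = X * Y, X * Z
cov = np.mean(U * V) - np.mean(U) * np.mean(V)
print(cov)                        # close to 4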

Exercise: Covariance properties


a) Is it true that Cov(X,Y)=Cov(Y,X)?

b) Find the value of a in the relation Cov(2X,−3Y+2)=a⋅Cov(X,Y).

c) Suppose that X, Y, and Z are independent, with a common variance of 5.


Then, cov(2X + Y, 3X − 4Z) = ?
True, -6, 30

Exercise: The variance of a sum


The random variables X1,…,X8 satisfy E[Xi]=1 and Var(Xi)=4 for i=1,2,…,8. Also, for i≠j, E[XiXj] = 3. Then,

var(X1 + ⋯ + X8) = ?

144

Exercise: Correlation coefficient


It is known that for a standard normal random variable X, we have E[X³] = 0, E[X⁴] = 3, E[X⁵] = 0, E[X⁶] = 15. Find the correlation coefficient between X and X³. Enter your answer as a number.
0.77459

Exercise: Correlation properties


As in the preceding example, let Z, V, and W be independent random
variables with mean 0 and variance 1, and let X=Z+V and Y=Z+W. We have
found that ρ(X,Y)=1/2.

a) It follows that:

ρ(𝑋, − 𝑌) =?

ρ(− 𝑋, − 𝑌) =?

b) Suppose that X and Y are measured in dollars. Let X′ and Y′ be the same random variables, but measured in cents, so that X′=100X and Y′=100Y. Then,

ρ(X′, Y′) = ?

c) Suppose now that X̃ = 3Z + 3V + 3 and Ỹ = −2Z − 2W. Then

ρ(X̃, Ỹ) = ?
d) Suppose now that the variance of Z is replaced by a very large number.
Then

ρ(X,Y) is close to ?

e) Alternatively, suppose that the variance of Z is close to zero. Then

ρ(X,Y) is close to?



- ½, ½, ½, - ½, 1, 0

6.5. Conditional Expectation and Variance


Law of Iterated Expectations:

E[E[X | Y]] = E[X]

Law of Total Variance:

var(X) = E[var(X | Y)] + var(E[X | Y])

Properties of the Conditional Expectation and Variance

• E[X | Y = y] is a number whose value depends on y.

• E[X | Y] is a function of the random variable Y, hence a random variable. Its value is E[X | Y = y] whenever the value of Y is y.

• E[E[X | Y]] = E[X] (law of iterated expectations).

• E[X | Y = y] may be viewed as an estimate of X given Y = y. The corresponding error E[X | Y] − X is a zero mean random variable that is uncorrelated with E[X | Y].

• var(X | Y) is a random variable whose value is var(X | Y = y) whenever the value of Y is y.

• var(X) = E[var(X | Y)] + var(E[X | Y]) (law of total variance)

Exercise: Conditional expectation


Let X and Y be zero-mean independent random variables. Which one of the
following statements is correct? Hint: You can take for granted the intuitive
fact that E[X|X=x]=x.

a) E[X+Y∣X]=0.
b) E[X+Y∣X]=x.
c) E[X+Y∣X]=X.
d) E[X+Y∣X]=X+Y.
c

Exercise: Iterated expectations


In this exercise, do not attempt formal mathematical derivations, which
would actually involve some subtle issues when we go beyond discrete
random variables. Rather, use your understanding of the concepts

involved. For each one of the statements below, indicate whether it is true
or false.

(a) The law of iterated expectations tells us that E[E[X|Y]]=E[X]. Suppose


that we want apply this law in a conditional universe, given another random
variable Z, in order to evaluate E[X|Z]. Then:

E[E[X|Y,Z]|Z]=E[X|Z]

E[E[X|Y]|Z]=E[X|Z]

E[E[X|Y,Z]]=E[X|Z]

(b) Determine whether each of the following statements about the


quantity E[g(X,Y)|Y,Z] is true or false.

The quantity E[g(X,Y)|Y,Z] is:

a) a random variable
b) a number
c) a function of (X,Y)
d) a function of (Y,Z)
e) a function of Z only
True, True, False, True, False, False, True, False

Exercise: Conditional expectation example


The random variable Q is uniform on [0,1]. Conditioned on Q=q, the random
variable X is Bernoulli with parameter q. Then, E[X|Q] is equal to:

a) q
b) Q
c) 1−q
d) 1−Q
b

Exercise: Conditional variance II


The random variable Q is uniform on [0,1]. Conditioned on Q=q, the random
variable X is Bernoulli with parameter q.

(a) The conditional variance, Var(X|Q), is equal to:

a) ¼

b) q(1−q)
c) Q(1−Q)
d) q²
e) Q²

(b) Recall that a uniform random variable on [0,1] has a variance of 1/12 and


also satisfies E[Q²]=1/3. Then:

Var(E[X | Q]) = ?

E[Var(X | Q)] = ?
c, 1/12, 1/6

Exercise: Conditional variance definition


For each one of the following statements, indicate whether it is true or
false.

(a) If X=Y (i.e., the two random variables always take the same values),
then Var(X|Y)=0.

(b) If X=Y (the two random variables always take the same values),
then Var(X|Y)=Var(X).

(c) If Y takes on the value y, then the random variable Var(X|Y) takes the value

E[(X − E[X | Y = y])² | Y = y]

(d) If Y takes on the value y, then the random variable Var(X|Y) takes the value

E[(X − E[X | Y])² | Y = y]

(e) If Y takes on the value y, then the random variable Var(X|Y) takes the value

E[(X − E[X])² | Y = y]
True, False, True, True, False

Exercise: Sections of a class


A class consists of three sections with 10 students each. The mean quiz
scores in each section were 40, 50, 60, respectively. We pick a student,
uniformly at random. Let X be the score of the selected student, and

let Y be the number of his/her section. The quantity Var(X|Y=y) turned out


to be equal to 5y for each section (y=1,2,3).

(a) The random variable E[X|Y] has:

a mean of: ?

a variance of: ?

(b) E[Var(X|Y)]= ?

(c) Var(X)=?
50, 66.666, 10, 76.66
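The law of total variance can also be checked by simulation. The per-section score distribution is not specified, so the sketch below (assuming NumPy) uses a normal distribution with the stated conditional means and variances, which is all that the law of total variance depends on:

import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
Y = rng.integers(1, 4, n)                 # section number, uniform over {1, 2, 3}
mean_given_Y = 30 + 10 * Y                # E[X | Y] = 40, 50, 60
var_given_Y = 5 * Y                       # Var(X | Y) = 5y, as stated

# the per-section distribution is an assumption; only its mean and variance matter here
X = rng.normal(mean_given_Y, np.sqrt(var_given_Y))

print(np.var(X))   # close to E[Var(X|Y)] + Var(E[X|Y]) = 10 + 66.67 = 76.67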

Exercise: Second generation offspring


Every person has a random number of children, drawn from a common
distribution with mean 3 and variance 2. The numbers of children of each
person are independent. Let M be the number of grandchildren of a certain
person. Then:

E[M]=?

Var(M)=?
9, 24
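The formulas for random sums can be checked by simulation. The child-count distribution is not specified beyond its mean and variance, so the sketch below (assuming NumPy) uses Binomial(9, 1/3), one distribution with mean 3 and variance 2:

import numpy as np

rng = np.random.default_rng(3)
trials = 200_000
M = np.empty(trials)
for t in range(trials):
    # Binomial(9, 1/3) is an assumed stand-in with mean 3 and variance 2
    children = rng.binomial(9, 1/3)                    # N: number of children
    M[t] = rng.binomial(9, 1/3, size=children).sum()   # grandchildren: sum of N i.i.d. copies

print(M.mean(), M.var())   # about 9 and about 24 = E[N]·Var + (E[children])^2·Var(N)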

Problems
Problem 1. The PDF of the logarithm of X
Let X be a non-negative random variable. Find the PDF of the random variable Y = ln X for each of the following cases:

For general fX, fY(y)=

a) fX(e^y) e^y

b) fX(e^y) / e^y

c) fX(ln y) / y

d) none of the above

When fX(x) = 1/4, if 2 < x ≤ 6;  0, otherwise

we have 𝑓𝑌(𝑦) = {𝑔(𝑦), 𝑖𝑓 𝑎 < 𝑦≤𝑏 0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒



Give a formula for g(y) and the values of a and b using standard notation

g(y)= ?

a=?

b=?

When 𝑓𝑋(𝑥) = {2(𝑥 − 1), 𝑖𝑓 1 < 𝑥≤2 0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

We have 𝑓𝑌(𝑦) = {𝑔(𝑦), 𝑖𝑓𝑎 < 𝑦≤𝑏 0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

Give a formula for g(y), and the values of a and b, using standard notation.

g(y)= ?

a=?

b=?
a,  e^y/4,  0.6931,  1.791,  2·e^{2y} − 2·e^y,  0,  0.6931

Problem 2. Functions of the standard normal


The random variable X has a standard normal distribution. Find the PDF of
the random variable Y, where:

Y = 5X − 7

a) fY(y) = 5 fX((y − 7)/5)

b) fY(y) = (1/5) fX((y + 7)/5)

c) fY(y) = 5 fX((y + 7)/5)

d) fY(y) = (1/5) fX((y − 7)/5)

Y = X² − 2X. For y ≥ −1:

a) [fX(1 + √(y+1)) − fX(1 − √(y+1))] / (2√(y−1))

b) [fX(1 + √(y+1)) − fX(1 − √(y+1))] / (2√(y+1))

c) [fX(1 + √(y+1)) + fX(1 − √(y+1))] / (2√(y+1))

d) [fX(1 + √(y+1)) + fX(1 − √(y+1))] / (2√(y+1) − 2√(y−1))

b, c

Problem 3. The PDF of the maximum


Let X and Y be independent random variables, each uniformly distributed
on the interval [0,1].

Let Z=max{X,Y}. Find the PDF of Z. Express your answer in terms of z using
standard notation.

For 0<z<1:

𝑓𝑍(𝑧) =?

Let Z=max{2X,Y}. Find the PDF of Z. Express your answer in terms


of z using standard notation.

For 0<z<1:

𝑓𝑍(𝑧) =?

For 1<z<2:

𝑓𝑍(𝑧) =?

2⋅z, z, ½
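The derivation multiplies the CDFs, FZ(z) = P(X ≤ z)P(Y ≤ z); a quick Monte Carlo check (assuming NumPy):

import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
X = rng.uniform(0, 1, n)
Y = rng.uniform(0, 1, n)

Z = np.maximum(X, Y)
for z in (0.25, 0.5, 0.9):
    # F_Z(z) = P(X <= z) P(Y <= z) = z^2, so f_Z(z) = 2z
    print(z, (Z <= z).mean(), z ** 2)

Z2 = np.maximum(2 * X, Y)
print((Z2 <= 0.5).mean(), 0.5 ** 2 / 2)   # F_{Z2}(z) = (z/2)·z for 0 < z < 1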

Problem 4. Convolution calculations


Let the discrete random variable X be uniform on {0,1,2} and let the discrete random variable Y be uniform on {3,4}. Assume that X and Y are
independent. Find the PMF of X+Y using convolution. Determine the values
of the constants a, b, c, and d that appear in the following specification of
the PMF.

𝑝𝑋+𝑌(𝑧) = {𝑎, 𝑧 = 3 𝑏, 𝑧 = 4 𝑐, 𝑧 = 5 𝑑, 𝑧 = 6 0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒

Find a, b, c, d

Let the random variable X be uniform on [0,2] and the random variable Y be


uniform on [3,4]. (Note that in this case, X and Y are continuous random
variables.) Assume that X and Y are independent. Let Z=X+Y. Find the PDF
of Z using convolution. The following figure shows a plot of this PDF.
Determine the values of a, b, c, d, and e.

1/6, 1/3, 1/3, 1/6, ½, 3, 4, 5, 6
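The PMF of the sum is the convolution of the two PMFs, which np.convolve computes directly (a sketch assuming NumPy):

import numpy as np

pX = np.array([1/3, 1/3, 1/3])   # X uniform on {0, 1, 2}
pY = np.array([1/2, 1/2])        # Y uniform on {3, 4}

pZ = np.convolve(pX, pY)         # PMF of Z = X + Y
# support starts at 0 + 3 = 3
for z, p in zip(range(3, 3 + len(pZ)), pZ):
    print(z, p)                  # 3: 1/6, 4: 1/3, 5: 1/3, 6: 1/6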

Problem 5. Covariance of the multinomial


Consider n independent rolls of a k-sided fair die with k≥2: the sides of the
die are labelled 1,2,…,k, and each side has probability 1/k. Let the random
variable Xi denote the number of rolls that result in side i. Thus, the random
vector (X1,…,Xk) has a multinomial distribution.

1. Which of the following statements is correct? Try to answer without


doing any calculations.

a) X1 and X2 are uncorrelated.
b) X1 and X2 are positively correlated.
c) X1 and X2 are negatively correlated.

Find the covariance, cov(X1,X2), of X1 and X2. Express your answer as a


function of n and k using standard notation. Hint: Use indicator variables to
encode the result of each roll.

(𝑋1, 𝑋2) =?

 Suppose now that the die is biased, with a probability pi≠0 that the result of
any given die roll is i, for i=1,2,…,k. We still consider n independent rolls of
this biased die and define Xi to be the number of rolls that result in side i.

Generalize your answer to part 2: Find cov(X1,X2) for this case of a biased


die. Express your answer as a function of n,k,p1,p2 using standard notation.
c,  −n/k²,  −n·p1·p2
𝑘

Problem 6. Correlation coefficients


Consider random variables X, Y and Z, which are assumed to be pairwise
uncorrelated (i.e., X and Y are uncorrelated, X and Z are uncorrelated,
and Y and Z are uncorrelated). Suppose that

E[X] = E[Y] = E[Z] = 0,   E[X²] = E[Y²] = E[Z²] = 1
Find the correlation coefficients ρ(X−Y,X+Y), ρ(X+Y,Y+Z), and ρ(X,Y+Z).
0, 0.5, 0

Problem 7. Sum of a random number of r.v.'s


A fair coin is flipped independently until the first Heads is observed. Let the
random variable K be the number of tosses until the first Heads is
observed plus 1. For example, if we see TTTHTH, so that the first head is
observed after 4 tosses, then K=4+1=5. For k=1,2,…,K, let Xk be a continuous
random variable that is uniform over the interval [0,5]. The Xk are
independent of one another and of the coin flips. Let X = Σ_{k=1}^{K} Xk. Find the
mean and variance of X. You may use the fact that the mean and variance
of a geometric random variable with parameter p are 1/p and (1−p)/p2,
respectively.

E[X]=?

Var(X)=?
7.5, 18.75

Chapter VII

7. Bayesian
Inference
7.1. Major Terms, Problems, and Methods in this
Chapter
•Bayesian statistics treats unknown parameters as random variables with
known prior distributions.

• In parameter estimation, we want to generate estimates that are close to


the true values of the parameters in some probabilistic sense.

• In hypothesis testing, the unknown parameter takes one of a finite


number of values, corresponding to competing hypotheses; we want to
choose one of the hypotheses, aiming to achieve a small probability of
error.

• Principal Bayesian inference methods:

(a) Maximum a posteriori probability (MAP) rule: Out of the possible


parameter values/hypotheses, select one with maximum
conditional/posterior probability given the data (Section 8.2).

(b) Least mean squares (LMS) estimation: Select an estimator/function of


the data that minimizes the mean squared error between the parameter
and its estimate (Section 8.3).

(c) Linear least mean squares estimation: Select an estimator which is a


linear function of the data and minimizes the mean squared error between
the parameter and its estimate (Section 8.4). This may result in higher
mean squared error, but requires simple calculations, based only on the
means, variances, and covariances of the random variables involved.

7.2. Summary of Bayesian Inference


• We start with a prior distribution pΘ or fΘ for the unknown random variable
Θ.

• We have a model pX|Θ or fX|Θ of the observation vector X.

• After observing the value x of X, we form the posterior distribution of Θ,


using the appropriate version of Bayes’ rule.

7.3. Point estimate


• An estimator is a random variable of the form Θ̂ = g(X) for some function g. Different choices of g correspond to different estimators.

• An estimate is the value θ̂ of an estimator, as determined by the realized value x of the observation X.

• Once the value x of X is observed, the Maximum a Posteriori Probability (MAP) estimator sets the estimate θ̂ to a value that maximizes the posterior distribution over all possible values of θ.

• Once the value x of X is observed, the Conditional Expectation (LMS) estimator sets the estimate θ̂ to E[Θ | X = x].

7.4. The MAP Rule for Hypothesis Testing


• Given the observation value x, the MAP rule selects a hypothesis Hi for
which the value of the posterior probability P(Θ = θi | X = x) is largest.

• Equivalently, it selects a hypothesis Hi for which pΘ(θi)pX|Θ(x | θi) (if X is


discrete) or pΘ(θi)fX|Θ(x | θi) (if X is continuous) is largest.

• The MAP rule minimizes the probability of selecting an incorrect


hypothesis for any observation value x, as well as the probability of error
over all decision rules.

Exercise: Hypothesis testing versus estimation


For each one of the following situations, state whether it corresponds to a
hypothesis testing or estimation problem.

A grocery store was robbed yesterday morning. The police have


determined that the robber was one of the five customers who visited a
nearby bank earlier that morning. For those customers, the police know
their identity as well as the time that they visited the bank. The police want
to:

(a) Guess the time at which the grocery store was robbed.

(b) Guess the identity of the robber.



(c) Guess the gender of the robber.

(d) Guess the weight of the robber.


Estimation, Hypothesis testing, Hypothesis testing, Hypothesis testing

Exercise: Estimates and estimators


Valerie wants to find an estimator for an unknown random variable Θ. She
can observe a random variable X whose distribution satisfies E[X2∣Θ]=Θ.
She goes ahead and observes that X took a numerical value of 5. She then
estimates Θ as the square of the observed value, namely, 25.

For each of the following questions, choose the most appropriate answer.

1) X2 is an

2) 25 is an

3) X3+2 is another (not necessarily good)


Estimator, Estimate, Estimator

7.5. The Four Versions of Bayes’ Rule


• Θ discrete, X discrete:

pΘ|X(θ | x) = pΘ(θ) pX|Θ(x | θ) / Σ_{θ′} pΘ(θ′) pX|Θ(x | θ′)

• Θ discrete, X continuous:

pΘ|X(θ | x) = pΘ(θ) fX|Θ(x | θ) / Σ_{θ′} pΘ(θ′) fX|Θ(x | θ′)

• Θ continuous, X discrete:

fΘ|X(θ | x) = fΘ(θ) pX|Θ(x | θ) / ∫ fΘ(θ′) pX|Θ(x | θ′) dθ′

• Θ continuous, X continuous:

fΘ|X(θ | x) = fΘ(θ) fX|Θ(x | θ) / ∫ fΘ(θ′) fX|Θ(x | θ′) dθ′

7.6. The Maximum a Posteriori Probability (MAP) Rule

• Given the observation value x, the MAP rule selects a value θ̂ that maximizes over θ the posterior distribution pΘ|X(θ | x) (if Θ is discrete) or fΘ|X(θ | x) (if Θ is continuous).

• Equivalently, it selects θ̂ that maximizes over θ:

pΘ(θ) pX|Θ(x | θ)  (if Θ and X are discrete),

pΘ(θ) fX|Θ(x | θ)  (if Θ is discrete and X is continuous),

fΘ(θ) pX|Θ(x | θ)  (if Θ is continuous and X is discrete),

fΘ(θ) fX|Θ(x | θ)  (if Θ and X are continuous).

• If Θ takes only a finite number of values, the MAP rule minimizes (over all decision rules) the probability of selecting an incorrect hypothesis. This is true for both the unconditional probability of error and the conditional one, given any observation value x.

Exercise: The posterior of a coin's bias


Let Θ be a continuous random variable that represents the unknown bias
(i.e., the probability of Heads) of a coin.

a) The prior PDF fΘ for the bias of a coin is of the form


fΘ(θ) = aθ⁹(1 − θ), for θ∈[0, 1]

where a is a normalizing constant. This indicates a prior belief that the


bias Θ of the coin is

b) We flip the coin 10 times independently and observe 1 Heads and 9 Tails.
The posterior PDF of Θ will be of the form cθ^m (1 − θ)^n, where c is a normalizing constant and where m = ?, n = ?
High, 10, 10

Exercise: Discrete unknowns


Let Θ1 and Θ2 be some unobserved Bernoulli random variables and let X be
an observation. Conditional on X=x, the posterior joint PMF of Θ1 and Θ2 is
given by

pΘ1,Θ2|X(θ1, θ2 | x) = 0.26 if θ1=0, θ2=0;  0.26 if θ1=0, θ2=1;  0.21 if θ1=1, θ2=0;  0.27 if θ1=1, θ2=1

We can view this as a hypothesis testing problem where we choose


between four alternative hypotheses: the four possible values of (Θ1,Θ2).

a) What is the estimate of (Θ1,Θ2) provided by the MAP rule?


^ ^
b) Once you calculate the estimate (θ 1, θ 2) 𝑜𝑓 (Θ1, Θ2), you may report the
^
first component, θ 1, as your estimate of Θ1. With this procedure, your
estimate of Θ1 will be

c) What is the probability that Θ1 is estimated incorrectly (the probability of


error) when you use the procedure in part (b)?

d) What is the MAP estimate of Θ1 based on X, that is, the one that
maximizes 𝑝Θ ∣𝑋(θ1∣𝑥)??
1

e) The moral of this example is that an estimate of Θ1 obtained by


identifying the maximum of the joint PMF of all unknown random variables
is …

the MAP estimate of Θ1.


(1,1), 1, 0.52, 0, can be different

Exercise: Discrete unknown and continuous observation


Similar to the last example, suppose that X=Θ+W, where Θ is equally likely
to take the values −1 and 1, and where W is standard normal noise,
^ ^ ^
independent of Θ. We use the estimator Θ, with Θ=1 if X>0 and Θ
=−1 otherwise. (This is actually the MAP estimator for this problem.)

a) Let us assume that the true value of Θ is 1. In this case, our estimator
makes an error if and only if W has a low (negative) value. The conditional
^
probability of error given the true value of Θ is 1, that is, 𝑃(Θ ≠1∣Θ = 1), is
equal to:

a) Φ(−1)
b) Φ(0)
c) Φ(1)

where Φ is the standard normal CDF.



b) For this problem, the overall probability of error is easiest found using
the formula
^ ^
a) 𝑃(Θ ≠ Θ) = ∫𝑃(Θ ≠ Θ∣𝑋 = 𝑥)𝑓𝑋(𝑥)𝑑𝑥
^ ^
b) 𝑃(Θ ≠ Θ) = ∑ 𝑃(Θ ≠θ∣Θ = θ)𝑝Θ(θ)
θ

a, b

Exercise: Continuous unknown and observation


Let Θ and X be jointly continuous nonnegative random variables. A particular value x of X is observed and it turns out that fΘ|X(θ | x) = 2e^{−2θ}, for θ ≥ 0.

The following facts may be useful: for an exponential random


variable Y with parameter λ, we have E[Y]=1/λ and Var(Y)=1/λ2.

a) The LMS estimate (conditional expectation) of Θ is

b) The conditional mean squared error E[(Θ − Θ̂_LMS)² | X = x] is

c) The MAP estimate of Θ is

d) The conditional mean squared error E[(Θ − Θ̂_MAP)² | X = x] is

0.5, 0.25, 0, 0.5

Exercise: Moments of the Beta distribution


Suppose that Θ takes values in [0,1] and its PDF is of the form
fΘ(θ) = aθ(1 − θ)², for θ∈[0, 1]

where a is a normalizing constant.

Use the formula

∫_0^1 θ^α (1 − θ)^β dθ = α!β! / (α + β + 1)!

to find the following:

a) a=?

b) E[Θ²] = ?
12, 0.2

Exercise: Recognizing normal PDFs


The random variable X has a PDF of the form

fX(x) = c e^{−4x² − 24x + 30}

where c is a normalizing constant. Then,

a) E[X]=?

b) Var(X)=?
-3, 0.125

Exercise: Normal unknown and additive noise


As in the last video, let X=Θ+W, where Θ and W are independent normal
random variables and W has mean zero

a) Assume that W has positive variance. Are X and W independent?

b) Find the MAP estimator of Θ based on X if Θ∼N(1,1) and W∼N(0,1), and


evaluate the corresponding estimate if X=2.
^
θ = ?

c) Find the MAP estimator of Θ based on X if Θ∼N(0,1) and W∼N(0,4), and


evaluate the corresponding estimate if X=2.
^
θ = ?

d) For this part of the problem, suppose instead that X=2Θ+3W,


where Θ and W are standard normal random variables. Find the MAP
estimator of Θ based on X under this model and evaluate the
corresponding estimate if X=2.
No, 1.5, 0.4, 0.3077
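For the X = Θ + W parts, the posterior is normal and the MAP estimate is the precision-weighted average of the prior mean and the observation; a small plain-Python sketch reproducing the two estimates above (part d, with X = 2Θ + 3W, needs a different formula and is not covered here):

def map_estimate(x, prior_mean, prior_var, noise_var):
    # posterior of Theta given X = Theta + W is normal; its mean (= MAP estimate)
    # is the precision-weighted average of the prior mean and the observation
    w = (1 / prior_var) / (1 / prior_var + 1 / noise_var)
    return w * prior_mean + (1 - w) * x

print(map_estimate(2, prior_mean=1, prior_var=1, noise_var=1))   # 1.5
print(map_estimate(2, prior_mean=0, prior_var=1, noise_var=4))   # 0.4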

Exercise: Multiple observations


Consider a model involving multiple observations of the form Xi = ci·Θ + Wi, i = 1, 2, …, n, where Θ, W1,…,Wn are independent (not necessarily normal) random variables and the ci's are known nonzero constants. Assume that Θ has positive variance.

a) Are the random variables Xi, i=1,2,…,n, independent?

b) Are the random variables Xi, i=1,2,…,n, conditionally independent


given Θ?
No, Yes

Exercise: Multiple observations, more general model


Suppose that X1=Θ+W1 and X2=2Θ+W2, where Θ,W1,W2 are independent
standard normal random variables. If the values that we observe happen to
be X1=−1 and X2=1, then the MAP estimate of Θ is
0.167

7.7. Key Facts About Least Mean Squares


Estimation
• In the absence of any observations, E[(Θ − θ̂)²] is minimized when θ̂ = E[Θ]:

E[(Θ − E[Θ])²] ≤ E[(Θ − θ̂)²], for all θ̂

• For any given value x of X, E[(Θ − θ̂)² | X = x] is minimized when θ̂ = E[Θ | X = x]:

E[(Θ − E[Θ | X = x])² | X = x] ≤ E[(Θ − θ̂)² | X = x], for all θ̂

• Out of all estimators g(X) of Θ based on X, the mean squared estimation error E[(Θ − g(X))²] is minimized when g(X) = E[Θ | X]:

E[(Θ − E[Θ | X])²] ≤ E[(Θ − g(X))²], for all estimators g(X)

Exercise: The mean-squared error


In this exercise we want to understand a little better the formula

1 / ( Σ_{i=0}^{n} 1/σi² )

for the mean squared error by considering two alternative scenarios.

In the first scenario, Θ∼N(0,1) and we observe X=Θ+W, where W∼N(0,1) is


independent of Θ.

In the second scenario, the prior information on Θ is extremely inaccurate: Θ ∼ N(0, σ0²), where σ0² is so large that it can be treated as infinite. But in this second scenario we obtain two observations of the form Xi = Θ + Wi, where the Wi are standard normals, independent of each other and of Θ.

The mean squared error is

a) smaller in the first scenario.


b) smaller in the second scenario.
c) the same in both scenarios.
c

7.8. Properties of the Estimation Error

• The estimation error Θ̃ is unbiased, i.e., it has zero unconditional and conditional mean:

E[Θ̃] = 0,   E[Θ̃ | X = x] = 0, for all x

• The estimation error Θ̃ is uncorrelated with the estimate Θ̂:

cov(Θ̂, Θ̃) = 0

• The variance of Θ can be decomposed as

var(Θ) = var(Θ̂) + var(Θ̃)

7.9. Linear LMS Estimation Formulas

• The linear LMS estimator Θ̂ of Θ based on X is

Θ̂ = E[Θ] + (cov(Θ, X)/var(X)) (X − E[X]) = E[Θ] + ρ (σΘ/σX) (X − E[X])

where

ρ = cov(Θ, X) / (σΘ σX)

is the correlation coefficient.



• The resulting mean squared estimation error is equal to

(1 − ρ²) σΘ²
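The coefficients of the linear LMS estimator only require first and second moments; a small plain-Python sketch, using as an example Θ and W independent with unit variances, E[Θ] = 1, E[W] = 2, and X = Θ + W (so cov(Θ,X) = 1, var(X) = 2, E[X] = 3):

def llms_coefficients(mean_theta, mean_x, var_x, cov_theta_x):
    # Theta_hat = a X + b with a = cov(Theta, X)/var(X) and b = E[Theta] - a E[X]
    a = cov_theta_x / var_x
    b = mean_theta - a * mean_x
    return a, b

a, b = llms_coefficients(mean_theta=1, mean_x=3, var_x=2, cov_theta_x=1)
print(a, b)   # 0.5, -0.5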

Exercise: The effect of a stronger signal


For the model X = Θ + W, and under the usual independence and normality assumptions for Θ and W, the mean squared error of the LMS estimator is

1 / ( (1/σ0²) + (1/σ1²) )

where σ0² and σ1² are the variances of Θ and W, respectively.

Suppose now that we change the observation model to Y=3Θ+W. In some sense the “signal” has a stronger presence, relative to the noise term W, and we should expect to obtain a smaller mean squared error. Suppose σ0² = σ1² = 1. The mean squared error of the original model X=Θ+W is then 1/2. In contrast, the mean squared error of the new model Y=3Θ+W is?
0.1

Exercise: Multiple observations and unknowns


Let Θ1, Θ2, W1, and W2 be independent standard normal random variables. We obtain two observations,

X1 = Θ1 + W1,   X2 = Θ1 + Θ2 + W2

Find the MAP estimate θ̂ = (θ̂1, θ̂2) of (Θ1, Θ2) if we observe that X1=1, X2=3. (You will have to solve a system of two linear equations.)

θ̂1 = ?,  θ̂2 = ?

1, 1

7.10. Problems
Problem 1. Defective Coin
A defective coin minting machine produces coins whose probability of
Heads is a random variable Q with PDF
fQ(q) = 5q⁴ if q∈[0,1], and 0 otherwise

A coin produced by this machine is tossed repeatedly, with successive


tosses assumed to be independent. Let A be the event that the first toss of
this coin results in Heads, and let B be the event that the second toss of
this coin results in Heads.

P(A)=?

Find the conditional PDF of Q given event A. Express your answer in terms
of q using standard notation.

P(B∣A)=?
5/6,  6q⁵,  6/7

Problem 2. Hypothesis test between two coins


Alice has two coins. The probability of Heads for the first coin is 1/4, and
the probability of Heads for the second is 3/4. Other than this difference,
the coins are indistinguishable. Alice chooses one of the coins at random
and sends it to Bob. The random selection used by Alice to pick the coin to
send to Bob is such that the first coin has a probability p of being selected.
Assume that 0<p<1. Bob tries to guess which of the two coins he received
by tossing it 3 times in a row and observing the outcome. Assume that for
any particular coin, all tosses of that coin are independent.

Given that Bob observed k Heads out of the 3 tosses (where k=0,1,2,3),


what is the conditional probability that he received the first coin?
a) 3^k·p / (3^(3−k)·p + 3^k·(1−p))

b) p / 3^(3−k)

c) 3^(3−k)·p / (3^(3−k)·p + 3^k·(1−p))

d) 3^(3−k)·(1−p) / (3^(3−k)·p + 3^k·(1−p))

We define an error to have occurred if Bob decides that he received one


coin from Alice, but he actually received the other coin. He decides that he
received the first coin when the number of Heads, k, that he observes on
the 3 tosses satisfies a certain condition. When one of the following
conditions is used, Bob will minimize the probability of error. Choose the
correct threshold condition.
a) k ≤ 3/2 + (1/2)·log3(p/(1−p))
b) k ≥ 3/2 + (1/2)·log3(p/(1−p))
c) k ≤ (1/2)·log3(p/(1−p))
d) k ≥ (1/2)·log3(p/(1−p))
e) None of the above

For this part, assume that p=3/4.

What is the probability that Bob will guess the coin correctly using the
decision rule from part 2?

Suppose instead that Bob tries to guess which coin he received without
tossing it. He still guesses the coin in order to minimize the probability of
error. What is the probability that Bob will guess the coin correctly under
this scenario?

Bob uses the decision rule of Part 2. If p is small, then Bob will always
decide in favor of the second coin, ignoring the results of the three tosses.
The range of such p's is [0,t). Find t.
c, a, 0.8437, 0.75, 0.035

Problem 3. Hypothesis test with a continuous observation


Let Θ be a Bernoulli random variable that indicates which one of two
hypotheses is true, and let P(Θ=1) =p. Under the hypothesis Θ=0, the
random variable X has a normal distribution with mean 0, and variance 1.
Under the alternative hypothesis Θ=1, X has a normal distribution with
mean 2 and variance 1.

Consider the MAP rule for deciding between the two hypotheses, given
that X=x.

Suppose for this part of the problem that p=2/3. The MAP rule can choose
in favor of the hypothesis Θ=1 if and only if x≥c1. Find the value of c1.

For this part, assume again that p=2/3. Find the conditional probability of
error for the MAP decision rule, given that the hypothesis Θ=0 is true.

P(error | Θ = 0) = ?

Find the overall (unconditional) probability of error associated with the


MAP rule for p=1/2.
0.6534, 0.26, 0.16

Problem 4. Trajectory estimation, Part I


The vertical coordinate ("height") of an object in free fall is described by an
equation of the form
x(t) = θ0 + θ1·t + θ2·t²

where θ0, θ1, 𝑎𝑛𝑑 θ2 are some parameters and t stands for time. At certain
times t1,…,tn, we make noisy observations Y1,…,Yn, respectively, of the
height of the object. Based on these observations, we would like to
estimate the object's vertical trajectory.

We consider the special case where there is only one unknown parameter.
We assume that θ0 (the height of the object at time zero) is a known
constant. We also assume that θ2 (which is related to the acceleration of
the object) is known. We view θ1 as the realized value of a continuous
random variable Θ1. The observed height at time ti is Yi = θ0 + Θ1·ti + θ2·ti² + Wi, i = 1, …, n, where Wi models the observation noise. We assume that Θ1 ∼ N(0, 1), W1, …, Wn ∼ N(0, σ²), and that all these random variables are independent.

In this case, finding the MAP estimate of Θ1 involves the minimization of


θ1² + (1/σ²) Σ_{i=1}^{n} (yi − θ0 − θ1·ti − θ2·ti²)²

with respect to θ1

Carry out this minimization and choose the correct formula for the MAP
estimate, θ̂1, from the options below.

a) θ̂1 = Σ_{i=1}^{n} ti·(yi − θ0 − θ2·ti²) / σ²

b) θ̂1 = Σ_{i=1}^{n} ti·(yi − θ0 − θ2·ti²) / (σ² + Σ_{i=1}^{n} ti²)

c) θ̂1 = Σ_{i=1}^{n} ti·(yi − θ0 − θ2·ti²) / (σ² + Σ_{i=1}^{n} θ2·ti²)

d) none of the above


The formula for θ̂1 can be used to define the MAP estimator, Θ̂1 (a random variable), as a function of t1,…,tn and the random variables Y1,…,Yn. Identify whether the following statement is true:

The MAP estimator Θ̂1 has a normal distribution?

Let σ=1 and consider the special case of only two observations (n=2). Write down a formula for the mean squared error E[(Θ̂1 − Θ1)²], as a function of t1 and t2.

E[(Θ̂1 − Θ1)²] = ?

Consider the "experimental design" problem of choosing when to make


measurements. Under the assumptions of the previous part, and under the
constraints 0≤t1, t2≤10, find the values of t1 and t2 that minimize the mean
squared error associated with the MAP estimator.

t1 = ?

t2 = ?

b,  true,  1/(1 + t1² + t2²),  10,  10

Problem 5. Hypothesis test between two normal


Conditioned on the result of an unbiased coin flip, the random
variables T1,T2,…,Tn are independent and identically distributed, each drawn
from a common normal distribution with mean zero. If the result of the coin
flip is Heads, this normal distribution has variance 1; otherwise, it has
variance 4. Based on the observed values t1,t2,…,tn, we use the MAP rule to
decide whether the normal distribution from which they were drawn has
variance 1 or variance 4. The MAP rule decides that the underlying normal
distribution has variance 1 if and only if

| c1 Σ_{i=1}^{n} ti² + c2 Σ_{i=1}^{n} ti | < 1

Find the values of c1≥0 and c2≥0 such that this is true. Express your answer
in terms of n, and use "ln" to denote the natural logarithm function, as in
"ln(3)".
3/(8·n·ln(2)),  0

7.11. Least mean squares (LMS) estimation


Exercise: LMS estimation
Let Θ be the bias of a coin, i.e., the probability of Heads at each toss. We
assume that Θ is uniformly distributed on [0,1]. Let K be the number of
Heads in 9 independent tosses.

By performing some fancy and very precise measurements on the


structure of that particular coin, we determine that Θ=1/3. Find the LMS
estimate of K based on Θ.
3

Exercise: LMS estimation error


As in the previous exercise, let Θ be the bias of a coin, i.e., the probability of
Heads at each toss. We assume that Θ is uniformly distributed on [0,1].
Let K be the number of Heads in 9 independent tosses. We have seen that
the LMS estimate of K is E[K∣Θ=θ]=nθ.
a) Find the conditional mean squared error E[(K − E[K | Θ])² | Θ = θ] if θ = 1/3.
b) Find the overall mean squared error of this estimation procedure.
2, 1.5

Exercise: LMS example


The random variables Θ and X are described by a joint PDF which is
uniform on the triangular set defined by the constraints 0≤x≤1, 0≤θ≤x. Find
the LMS estimate of Θ given that X=x, for xx in the range [0,1]. Express your
answer in terms of x
x/2

Exercise: Mean squared error


As in an earlier exercise, we assume that the random variables Θ and X are
described by a joint PDF which is uniform on the triangular set defined by
the constraints 0≤x≤1, 0≤θ≤x.

a) Find an expression for the conditional mean squared error of the LMS
estimator given that X=x, valid for x∈[0,1]. Express your answer in terms
of x

b) Find the (unconditional) mean squared error of the LMS estimator. 


x²/12,  0.042

Exercise: Multidimensional challenges


Suppose that fΘ and fX|Θ are described by simple closed-form formulas.
Suppose that Θ is one-dimensional but X is high-dimensional.

a) Suppose that a specific value x of the random variable X has been


observed. Is it true that the calculation of the LMS estimate will always
involve only ordinary integrals (integrals with respect to only one variable)?

b) Is it true that the calculation of the mean squared error of the LMS
estimator will always involve only ordinary integrals (integrals with respect
to only one variable)?
Yes, No

Exercise: Theoretical properties


Let Θ̂ be an estimator of a random variable Θ, and let Θ̃ = Θ̂ − Θ be the estimation error.

a) In this part of the problem, let Θ̂ be specifically the LMS estimator of Θ. We have seen that for the case of the LMS estimator, E[Θ̃ | X = x] = 0 for every x. Is it also true that E[Θ̃ | Θ = θ] = 0 for all θ? Equivalently, is it true that E[Θ̂ | Θ = θ] = θ for all θ?

b) In this part of the problem, Θ̂ is no longer necessarily the LMS estimator of Θ. Is the property Var(Θ) = Var(Θ̂) + Var(Θ̃) true for every estimator Θ̂?
No, No

Exercise: LMS and LLMS


Suppose that the random variables Θ and X are not independent, but E[Θ | X = x] = 3 for all x. Then the LLMS estimator of Θ based on X is of the form aX + b, with a = ? and b = ?
0, 3

Exercise: LLMS without a constant term


Suppose that instead of estimators of the form aX + b, we consider estimators of the form Θ̂ = aX and ask for the value of a that minimizes the mean squared error. Mimic the derivation you have just seen and find the optimal value of a. Your answer should be an algebraic expression involving some of the constants b, c, d, where b = E[Θ²], c = E[ΘX], d = E[X²].
c/d

Exercise: LLMS drill


Suppose that Θ and W are independent, both with variance 1, and that X = Θ + W. Furthermore, E[Θ] = 1 and E[W] = 2. The LLMS estimator Θ̂ = aX + b has

a=?, b=?

Hint: Remember the formula Cov(X+Y,Z)=Cov(X,Z)+Cov(Y,Z).


½, -1/2

Exercise: Possible values of the estimates


Suppose that the random variable Θ takes values in the interval [0,1].

a) Is it true that the LMS estimator is guaranteed to take values only in the
interval [0,1]?

b) Is it true that the LLMS estimator is guaranteed to take values only in


the interval [0,1]?
Yes, No

Exercise: Comparison for the coin problem


Recall that the MAP estimator for the problem of estimating the bias of a
coin is X/n, which is different from the LLMS estimator (X+1)/(n+2). How do
they compare in terms of mean squared error (MSE)?

a) MAP has a smaller MSE.


b) LLMS has a smaller MSE.
c) They have the same MSE.
b

Exercise: LLMS with multiple observations


Suppose that Θ, X1, and X2 have zero means. Furthermore,

var(X1) = var(X2) = var(Θ) = 4

and

cov(Θ, X1) = cov(Θ, X2) = cov(X1, X2) = 1

The LLMS estimator of Θ based on X1 and X2 is of the form Θ̂ = a1·X1 + a2·X2 + b. Find the coefficients a1, a2, and b. Hint: To find b, recall the argument we used for the case of a single observation.
0.2, 0.2, 0
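With several observations, the coefficients solve the normal equations cov(X)·a = cov(X, Θ); a sketch assuming NumPy, using the moments of this exercise:

import numpy as np

# second moments from the exercise (zero means)
cov_X = np.array([[4.0, 1.0],
                  [1.0, 4.0]])        # covariance matrix of (X1, X2)
cov_X_theta = np.array([1.0, 1.0])    # cov(Xi, Theta)

a = np.linalg.solve(cov_X, cov_X_theta)   # normal equations: cov_X a = cov(X, Theta)
b = 0.0 - a @ np.zeros(2)                 # b = E[Theta] - a' E[X] = 0 here
print(a, b)                               # [0.2, 0.2], 0.0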

Exercise: Choice of representations


We wish to estimate an unknown quantity Θ. Our measuring equipment
produces an observation of the form X=Θ3+W, where W is a noise term
which is small relative to the range of Θ. Which type of linear estimator is
preferable in such a situation?
a) Θ̂ = aX + b
b) Θ̂ = aX³ + b
c) Θ̂ = aX^(1/3) + b
c

7.12. Problems
Problem 1. Determining the type of a lightbulb
The lifetime of a type-A bulb is exponentially distributed with parameter λ.
The lifetime of a type-B bulb is exponentially distributed with parameter μ,
where μ>λ>0. You have a box full of lightbulbs of the same type, and you
would like to know whether they are of type A or B. Assume an a
priori probability of 1/4 that the box contains type-B lightbulbs.

Assume that μ≥3λ. You observe the value t1 of the lifetime, T1, of a lightbulb.


A MAP decision rule decides that the lightbulb is of type A if and only
if t1≥α. Find α, and express your answer in terms of μ and λ. Use ‘mu" and
‘lambda" and ‘ln" to denote μ,λ, and the natural logarithm function,
respectively.

α=?

Assume again that μ≥3λ. What is the probability of error of the MAP


decision rule?
a) (1/4)e^{−µα} + (3/4)(1 − e^{−λα})

b) (3/4)e^{−µα} + (1/4)(1 − e^{−λα})

c) (1/4)(1 − e^{−µα}) + (3/4)e^{−λα}

d) (3/4)(1 − e^{−µα}) + (1/4)e^{−λα}

Assume that λ=3 and μ=4. Find the LMS estimate of T2, the lifetime of


another lightbulb from the same box, based on observing T1=2. Assume
that conditioned on the bulb type, bulb lifetimes are independent. (For this
part, you will need a calculator. Provide an answer with an accuracy of
three decimal places.)

LMS estimate of T2: ?


α = (1/(μ−λ))·ln(μ/(3·λ)), a, 0.328
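For the last part, the LMS estimate is E[T2 ∣ T1 = 2] = P(A ∣ T1 = 2)/λ + P(B ∣ T1 = 2)/μ, since conditioned on the bulb type the lifetimes are independent exponentials. A short numerical sketch of this computation, using the priors 3/4 and 1/4 from the problem:

import math

lam, mu, t1 = 3.0, 4.0, 2.0
num_A = 0.75 * lam * math.exp(-lam * t1)   # prior(A) * f_{T1|A}(t1)
num_B = 0.25 * mu * math.exp(-mu * t1)     # prior(B) * f_{T1|B}(t1)
p_A = num_A / (num_A + num_B)              # posterior probability of type A
t2_hat = p_A / lam + (1 - p_A) / mu        # E[T2 | T1 = 2]
print(round(t2_hat, 4))                    # about 0.3286, consistent with the 0.328 quoted above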

Problem 2. Estimating the parameter of a geometric r.v.


We have k coins. The probability of Heads is the same for each coin and is
the realized value q of a random variable Q that is uniformly distributed
on [0,1]. We assume that conditioned on Q=q, all coin tosses are
independent. Let Ti be the number of tosses of the ith coin until that coin

results in Heads for the first time, for i=1,2,…,k (Ti includes the toss that
results in the first Heads.)

You may find the following integral useful: For any non-negative
integers k and m,
∫₀¹ q^k (1 − q)^m dq = k! m! / (k + m + 1)!

Find the PMF of T1. (Express your answer in terms of t using standard
notation.)

For t=1,2,…

p_T1(t) = ?

Find the least mean squares (LMS) estimate of Q based on the observed
value, t, of T1. (Express your answer in terms of t using standard notation.)

E[Q ∣ T1 = t] = ?

We flip each of the k coins until they result in Heads for the first time.
Compute the maximum a posteriori (MAP) estimate q̂ of Q given the
number of tosses needed, T1=t1,…,Tk=tk, for each coin. Choose the correct
expression for q̂.

a) q̂ = (k − 1) / (t1 + ⋯ + tk)

b) q̂ = k / (t1 + ⋯ + tk)

c) q̂ = (k + 1) / (t1 + ⋯ + tk)

d) none of the above


1/(t·(t+1)), 2/(t+2), b

Problem 3. LLMS estimation


Let X=U+W with E[U]=m, Var(U)=v, E[W]=0, and Var(W)=h. Assume
that U and W are independent.

The LLMS estimator of U based on X is of the form Û = aX + b.
Find a and b. Express your answers in terms of m, v, and h using standard
notation.

We now further assume that U and W are normal random variables and


then construct Û_LMS, the LMS estimator of U based on X, under this
additional assumption. Would Û_LMS be identical to Û, the LLMS estimator
developed without the additional normality assumption in Part 1?
v/(v+h), m·h/(v+h), Yes

Problem 4. LLMS estimation with random sums


Let N be a random variable with mean E[N]=m, and Var(N)=v; let A1,A2,… be a
sequence of i.i.d random variables, all independent of N, with mean 1 and
variance 1; let B1,B2,… be another sequence of i.i.d. random variables, all
independent of N and of A1,A2,… also with mean 1 and variance 1. Let 
A = ∑_{i=1}^{N} Ai  and  B = ∑_{i=1}^{N} Bi.

Find the following expectations using the law of iterated expectations.


Express each answer in terms of m and v, using standard notation.

E[AB]=?

E[NA]=?
Let N̂ = c1·A + c2 be the LLMS estimator of N given A. Find c1 and c2 in
terms of m and v

c1=?

c2=?
m² + v, m² + v, v/(m+v), m²/(m+v)
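These expressions can be checked by simulation together with the law of iterated expectations. The sketch below uses our own test case (not part of the problem): N Poisson with mean 5, so that m = v = 5, and exponential(1) terms for the Ai and Bi:

import numpy as np

rng = np.random.default_rng(0)
m = 5                                   # with N ~ Poisson(5): E[N] = Var(N) = 5
N = rng.poisson(m, 100_000)
# A and B are sums of N i.i.d. terms with mean 1 and variance 1 (exponential(1) here)
A = np.array([rng.exponential(1.0, k).sum() for k in N])
B = np.array([rng.exponential(1.0, k).sum() for k in N])
print(np.mean(A * B))                            # approx m**2 + v = 30
c1 = np.cov(N, A)[0, 1] / np.var(A, ddof=1)      # Cov(N, A) / Var(A) = v / (m + v)
c2 = N.mean() - c1 * A.mean()                    # E[N] - c1 * E[A] = m**2 / (m + v)
print(c1, c2)                                    # approx 0.5 and 2.5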

Problem 5. Estimating the parameter of a uniform r.v.


The random variable X is uniformly distributed over the interval [θ,2θ]. The
parameter θ is unknown and is modeled as the value of a continuous
random variable Θ, uniformly distributed between zero and one.

Given an observation x of X, find the posterior distribution of Θ. Express


your answers below in terms of θ and x. Use "theta" to denote θ and "ln" to
denote the natural logarithm function.

For 0≤x≤1 and x/2≤θ≤x:

f_Θ∣X(θ ∣ x) = ?

Find the MAP estimate of Θ based on the observation X=x and assuming


that 0≤x≤1. Express your answer in terms of x.

For 0≤x≤1:
θ̂_MAP(x) = ?

Find the LMS estimate of Θ based on the observation X=x and assuming


that 0≤x≤1. Express your answer in terms of x.

For 0≤x≤1:
θ̂_LMS(x) = ?

Find the linear LMS estimate θ̂_LLMS of Θ based on the observation X=x.
Specifically, θ̂_LLMS is of the form c1 + c2·x. Find c1 and c2.

1/(θ·ln(2)), x/2, x/(2·ln(2)), 0.065, 0.581
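The LLMS coefficients follow from c2 = Cov(Θ,X)/Var(X) and c1 = E[Θ] − c2·E[X], using E[X ∣ Θ] = 3Θ/2 and Var(X ∣ Θ) = Θ²/12. A short sketch of the arithmetic only:

# Moments for Theta ~ Uniform(0,1) and X | Theta = theta ~ Uniform(theta, 2*theta)
E_theta = 1 / 2
E_x = (3 / 2) * E_theta                           # E[X] = 3/4
cov_xt = (3 / 2) * (1 / 3) - E_x * E_theta        # E[X*Theta] - E[X]E[Theta] = 1/8
var_x = (1 / 12) * (1 / 3) + (9 / 4) * (1 / 12)   # E[Var(X|Theta)] + Var(E[X|Theta])
c2 = cov_xt / var_x
c1 = E_theta - c2 * E_x
print(round(c1, 3), round(c2, 3))                 # 0.065 and 0.581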

Exam
Problem 1(a)
Suppose that X, Y, and Z are independent, with E[X]=E[Y]=E[Z]=2,
and E[X2]=E[Y2]=E[Z2]=5.

Find cov(XY,XZ).

(Enter a numerical answer.)

Problem 1(b)


Let X be a standard normal random variable. Another random variable is
determined as follows. We flip a fair coin (independent from X). In case of
Heads, we let Y=X. In case of Tails, we let Y=−X.

Is Y normal? Justify your answer.



Compute Cov(X,Y).

Are X and Y independent?

Problem 1(c)


Find P(X+Y≤0).
4, yes, 0, no, ¾

Problem 2
We are given a stick that extends from 0 to x. Its length, x, is the realization
of an exponential random variable X, with mean 1. We break that stick at a
point Y that is uniformly distributed over the interval [0,x].

Write down the (fully specified) joint PDF fX,Y(x,y)of X and Y.

𝐹𝑜𝑟 0 < 𝑦≤𝑥: 𝑓𝑋,𝑌(𝑥, 𝑦) = ?

Var(E[Y ∣ X]) = ?

We do not observe the value of X, but are told that Y=2.2. Find the MAP
estimate of X based on Y=2.2.
e^(−x)/x, 1/4, 2.2

Problem 3
Let X be uniform on [0,1/2]. Find the PDF f_Y(y) of Y = X/(1 − X).

𝐹𝑜𝑟 0≤𝑦≤1 𝑓𝑌(𝑦) =?

2/(1 + y)²

Problem 4(a)
A random variable X is generated as follows. We flip a coin. With
probability p, the result is Heads, and then X is generated according to a
PDF fX|H which is uniform on [0,1]. With probability 1−p the result is Tails, and
then X is generated according to a PDF fX|T of the form

𝑓𝑋∣𝑇(𝑥) = 2𝑥, 𝑖𝑓 𝑥∈[0, 1]

(The PDF is zero everywhere else.)

What is the (unconditional) PDF fX(x) of X?



For 0≤x≤1:

𝑓𝑋(𝑥) =?

Calculate E[X]

Problem 4(b)
We now wish to estimate the result of the coin toss, based on the value
of X.

Find P(Tails∣X=1/4).

The MAP rule decides in favor of Heads if X<a and in favor of Tails if X>a.
What is a?
2·x + p·(1 − 2·x), (4 − p)/6, (1 − p)/(1 + p), p/(2·(1 − p))

Problem 5
Let X and Y be independent positive random variables. Let Z=X/Y. In what
follows, all occurrences of x, y, z are assumed to be positive numbers.

Suppose that X and Y are discrete, with known PMFs, pX and pY. Then,

p_Z∣Y(z ∣ y) = p_X(?)

What is the argument in the place of the question mark?

Suppose that X and Y are continuous, with known PDFs, fX and fY. Provide a


formula, analogous to the one in part (a), for fZ|Y(z|y) in terms of fX. That is,
find A and B in the formula below.

f_Z∣Y(z ∣ y) = A·f_X(B)

Which of the following is a formula for fZ(z)?



a) f_Z(z) = ∫₀^∞ f_Y,Z(y, z) dy

b) f_Z(z) = ∫₀^∞ f_Y,Z(y, z) dz

c) f_Z(z) = ∫₀^∞ f_Y(y) f_Z,Y(z, y) dy

d) f_Z(z) = ∫₀^∞ f_Y(y) f_Z∣Y(z ∣ y) dy

e) f_Z(z) = ∫₀^∞ f_Y(y) f_X(yz) dy

f) f_Z(z) = ∫₀^∞ y f_Y(y) f_X(yz) dy

z·y, y, z·y, a d f

Chapter VIII

8. Limit theorems
and classical
statistics
8.1. Markov Inequality
If a random variable X can only take nonnegative values, then
P(X ≥ a) ≤ E[X]/a,  for all a > 0

Exercise: Markov inequality


Let Z be a nonnegative random variable that satisfies E[Z4]=4. Apply the
Markov inequality to the random variable Z4 to find the tightest possible
(given the available information) upper bound on P(Z≥2).
¼

8.2. Chebyshev Inequality


If X is a random variable with mean μ and variance σ², then

P(|X − μ| ≥ c) ≤ σ²/c²,  for all c > 0

Exercise: Chebyshev inequality


Let Z be normal with zero mean and variance equal to 4. For this case, the
Chebyshev inequality yields:

𝑃(|𝑍|≥4)≤?
¼

Exercise: Chebyshev versus Markov


Let X be a random variable with zero mean and finite variance. The Markov
inequality applied to |X| yields
P(|X| ≥ a) ≤ E[|X|]/a

whereas the Chebyshev inequality yields


P(|X| ≥ a) ≤ E[X²]/a²

a) Is it true that the Chebyshev inequality is stronger (i.e., the upper bound
is smaller) than the Markov inequality, when a is very large?

b) Is it true that the Chebyshev inequality is always stronger (i.e., the upper
bound is smaller) than the Markov inequality?
Yes, No

Exercise: Sample mean bounds


if the Xi are i.i.d. with mean μ and variance σ2, and if Mn=(X1+⋯+Xn)/n, then
we have an inequality of the form
P(|Mn − μ| ≥ ϵ) ≤ aσ²/n

for a suitable value of a.

a) If ϵ=0.1, then the value of a is:

b) If we change ϵ=0.1 to ϵ=0.1/k, for k≥1 (i.e., if we are interested in k times


higher accuracy), how should we change n so that the value of the upper
bound does not change from the value calculated in part (a)?

n should

a) stay the same


b) increase by a factor of k
c) increase by a factor of k2
d) decrease by a factor of k
e) none of the above
100, c

Exercise: Polling
We saw that if we want to have a probability of at least 95% that the poll
results are within 1 percentage point of the truth, Chebyshev's inequality
recommends a sample size of n = 50,000. This is very large
compared to what is done in practice. Newspaper polls use smaller sample
sizes for various reasons. For each of the following, decide whether it is a
valid reason.

In the real world,

a) the accuracy requirements are looser.

b) the Chebyshev bound is too conservative.

c) the people sampled are all different, so their answers are not identically
distributed.

d) the people sampled do not have independent opinions.


Yes, Yes, No, No

8.3. Convergence of a Deterministic Sequence


Let a1, a2, . . . be a sequence of real numbers, and let a be another real
number. We say that the sequence an converges to a, or lim_{n→∞} an = a, if for
every ϵ > 0 there exists some n0 such that

|𝑎𝑛 − 𝑎|≤ϵ , 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑛≥𝑛0


8.4. Convergence in Probability
Let Y1, Y2, . . . be a sequence of random variables (not necessarily
independent), and let a be a real number. We say that the sequence Yn
converges to a in probability, if for every ϵ > 0, we have

lim_{n→∞} P(|Yn − a| ≥ ϵ) = 0

Exercise: Convergence in probability


a) Suppose that Xn is an exponential random variable with parameter λ=n.
Does the sequence {Xn} converge in probability?

b) Suppose that Xn is an exponential random variable with parameter λ=1/n.


Does the sequence {Xn} converge in probability?

c) Suppose that the random variables in the sequence {Xn} are


independent, and that the sequence converges to some number a, in
probability. Let {Yn} be another sequence of random variables that are
dependent, but where each Yn has the same distribution (CDF) as Xn. Is it
necessarily true that the sequence {Yn} converges to a in probability?
Yes, No, Yes

8.5. The Weak Law of Large Numbers


Let X1,X2, . . . be independent identically distributed random variables with
mean μ. For every ϵ > 0, we have

P(|Mn − μ| ≥ ϵ) = P(|(X1 + ⋯ + Xn)/n − μ| ≥ ϵ) → 0,  as n → ∞
8.6. The Central Limit Theorem
Let X1,X2, . . . be a sequence of independent identically distributed random
variables with common mean μ and variance σ², and define

Zn = (X1 + ⋯ + Xn − nμ) / (σ√n)

Then, the CDF of Zn converges to the standard normal CDF

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^(−x²/2) dx

in the sense that

lim_{n→∞} P(Zn ≤ z) = Φ(z), for every z.

Exercise: CLT
Let Xn be i.i.d. random variables with mean zero and variance σ2.
Let Sn=X1+⋯+Xn. Let Φ stand for the standard normal CDF. According to
the central limit theorem, and as n→∞, P(Sn ≤ 2σ√n) converges to Φ(a),
where:

a= ?

Furthermore, P(Sn≤0) converges to: ?


2, ½

Exercise: CLT applicability


Consider the class average in an exam in a few different settings. In all
cases, assume that we have a large class consisting of equally
well-prepared students. Think about the assumptions behind the central
limit theorem, and choose the most appropriate response under the given
description of the different settings.

1. Consider the class average in an exam of a fixed difficulty.

a) The class average is approximately normal


b) The class average is not approximately normal because the student
scores are strongly dependent
c) The class average is not approximately normal because the student
scores are not identically distributed

2. Consider the class average in an exam that is equally likely to be very


easy or very hard.

a) The class average is approximately normal


b) The class average is not approximately normal because the student
scores are strongly dependent
c) The class average is not approximately normal because the student
scores are not identically distributed

3. Consider the class average if the class is split into two equal-size
sections. One section gets an easy exam and the other section gets a hard
exam.

a) The class average is approximately normal


b) The class average is not approximately normal because the student
scores are strongly dependent
c) The class average is not approximately normal because the student
scores are not identically distributed

4. Consider the class average if every student is (randomly and


independently) given either an easy or a hard exam.

a) The class average is approximately normal


b) The class average is not approximately normal because the student
scores are strongly dependent
c) The class average is not approximately normal because the student
scores are not identically distributed
a, b, a, a

Exercise: CLT practice


The random variables Xi are i.i.d. with mean 2 and standard deviation equal
to 3. Assume that the Xi are nonnegative. Let Sn=X1+⋯+Xn.

Use the CLT to find good approximations to the following quantities. You
may want to refer to the normal table. In parts (a) and (b), give answers with
4 decimal digits.

( )
𝑎)𝑃 𝑆100≤245 ≈?

b) We let N (a random variable) be the first value of n for


which Sn exceeds 119.

𝑃(𝑁 > 49)≈?

c) What is the largest possible value of n for which we have P(Sn≤128)≈0.5?


0.9332, 0.8413, 64
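These numbers come directly from the normal CDF; a sketch of the computation using scipy (Φ is norm.cdf):

from scipy.stats import norm

# (a) S_100 has mean 100*2 = 200 and standard deviation 3*sqrt(100) = 30
print(norm.cdf((245 - 200) / 30))     # about 0.9332
# (b) {N > 49} is the same event as {S_49 <= 119}; S_49 has mean 98 and std 21
print(norm.cdf((119 - 98) / 21))      # about 0.8413
# (c) P(S_n <= 128) is about 1/2 when the mean 2n equals 128, i.e., n = 64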

Exercise: CLT for the binomial


Let X be binomial with parameters n=49 and p=1/10.

The mean of X is:?

The standard deviation of X is:?

The CLT, together with the 1/2-correction, suggests that 𝑃(𝑋 = 6)≈ ?


4.9, 2.1, 0.1623
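With the 1/2-correction, P(X = 6) is approximated by Φ((6.5 − 4.9)/2.1) − Φ((5.5 − 4.9)/2.1). A sketch of the computation:

import math
from scipy.stats import norm

n, p = 49, 0.1
mu = n * p                                # 4.9
sigma = math.sqrt(n * p * (1 - p))        # 2.1
approx = norm.cdf((6.5 - mu) / sigma) - norm.cdf((5.5 - mu) / sigma)
print(approx)   # about 0.1645; the 0.1623 above comes from z-values rounded to two decimals in the normal table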

8.7. An introduction to classical statistics


• Classical statistics treats unknown parameters as constants to be
determined. A separate probabilistic model is assumed for each possible
value of the unknown parameter.

• In parameter estimation, we want to generate estimates that are nearly


correct under any possible value of the unknown parameter.

• In hypothesis testing, the unknown parameter takes a finite number m of


values (m ≥ 2), corresponding to competing hypotheses; we want to

choose one of the hypotheses, aiming to achieve a small probability of


error under any of the possible hypotheses.

• In significance testing, we want to accept or reject a single hypothesis,


while keeping the probability of false rejection suitably small.

• Principal classical inference methods in this chapter:

(a) Maximum likelihood (ML) estimation: Select the parameter that makes
the observed data “most likely,” i.e., maximizes the probability of obtaining
the data at hand.

(b) Linear regression: Find the linear relation that matches best a set of
data pairs, in the sense that it minimizes the sum of the squares of the
discrepancies between the model and the data

(c) Likelihood ratio test: Given two hypotheses, select one based on the
ratio of their “likelihoods,” so that certain error probabilities are suitably
small.

(d) Significance testing: Given a hypothesis, reject it if and only if the


observed data falls within a certain rejection region. This region is specially
designed to keep the probability of false rejection below some threshold.

8.8. Terminology Regarding Estimators


Let Θ̂n be an estimator of an unknown parameter θ, that is, a function of n
observations X1, . . . , Xn whose distribution depends on θ.

• The estimation error, denoted by Θ̃n, is defined by Θ̃n = Θ̂n − θ.

• The bias of the estimator, denoted by b_θ(Θ̂n), is the expected value of the
estimation error:

b_θ(Θ̂n) = E_θ[Θ̂n] − θ

The expected value, the variance, and the bias of Θ̂n depend on θ, while
the estimation error depends in addition on the observations X1, . . . , Xn.

• We call Θ̂n unbiased if E_θ[Θ̂n] = θ, for every possible value of θ.

• We call Θ̂n asymptotically unbiased if lim_{n→∞} E_θ[Θ̂n] = θ, for every possible
value of θ.

• We call Θ̂n consistent if the sequence Θ̂n converges to the true value of
the parameter θ, in probability, for every possible value of θ.

Exercise: Estimator properties


We estimate the unknown mean θ of a random variable X (where X has a
finite and positive variance) by forming the sample mean Mn=(X1+⋯+Xn)/n
of n i.i.d. samples Xi and then forming the estimator
Θ̂n = Mn + 1/n

Is this estimator unbiased?

Is this estimator consistent?


Consider now a different estimator, Θ̂n = X1, which ignores all but the first
measurement.

Is this estimator unbiased?

Is this estimator consistent?


No, Yes, Yes, No

8.9. Maximum Likelihood Estimation


We are given the realization x = (x1, . . . , xn) of a random vector X = (X1, . . . ,Xn),
distributed according to a PMF p_X(x; θ) or PDF f_X(x; θ).

• The maximum likelihood (ML) estimate is a value of θ that maximizes the


likelihood function 𝑝𝑋(𝑥; θ) 𝑜𝑟 𝑓𝑋(𝑥; θ), over all θ.

• The ML estimate of a one-to-one function h(θ) of θ is h(θ̂n), where θ̂n is
the ML estimate of θ (the invariance principle).

• When the random variables Xi are i.i.d., and under some mild additional
assumptions, each component of the ML estimator is consistent and
asymptotically normal.

8.10. Estimates of the Mean and Variance of a


Random Variable
Let the observations X1, . . . ,Xn be i.i.d., with mean θ and variance v that are
unknown.

• The sample mean


𝑋1+⋯+𝑋𝑛
𝑀𝑛 = 𝑛

is an unbiased estimator of θ, and its mean squared error is v/n.

• Two variance estimators are


Sn² = (1/n) ∑_{i=1}^{n} (Xi − Mn)²,    Ŝn² = (1/(n−1)) ∑_{i=1}^{n} (Xi − Mn)²

• The estimator Sn² coincides with the ML estimator if the Xi are normal. It is
biased but asymptotically unbiased. The estimator Ŝn² is unbiased. For
large n, the two variance estimators essentially coincide.
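In numpy these two estimators correspond to the ddof argument of var; a minimal illustration (the data values below are our own, chosen only for the example):

import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
s2_biased = np.var(x, ddof=0)     # (1/n)     * sum of squared deviations from the mean
s2_unbiased = np.var(x, ddof=1)   # (1/(n-1)) * sum of squared deviations from the mean
print(s2_biased, s2_unbiased)     # 4.0 and about 4.571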

8.11. Confidence Intervals


A confidence interval for a scalar unknown parameter θ is an interval
whose endpoints Θ̂n⁻ and Θ̂n⁺ bracket θ with a given high probability.

• Θ̂n⁻ and Θ̂n⁺ are random variables that depend on the observations X1, . . . , Xn.

• A 1 − α confidence interval is one that satisfies

P_θ(Θ̂n⁻ ≤ θ ≤ Θ̂n⁺) ≥ 1 − α

for all possible values of θ.

Exercise: Confidence interval interpretation


Every day, I try to estimate an unknown parameter using a fresh data set. I
look at the data and then I use some formulas to calculate a 70%
confidence interval, [Θ̂⁻, Θ̂⁺], based on the day's data.

Are the following statements accurate?

Over the next 100 days, I expect that the unknown parameter will be inside
the confidence interval about 70 times.

If today's confidence interval is [0.41,0.47], there is probability 70% that


the unknown parameter is inside this confidence interval.

Out of 100 days on which the confidence interval happens to


be [0.41,0.47], I expect that the unknown parameter will be inside the
confidence interval about 70 times.

Today, I decided to use a Bayesian approach, by viewing the unknown


parameter, denoted by Θ, as a continuous random variable and assuming a
prior PDF for Θ. I observe a specific value x, calculate the posterior fΘ|X(⋅|x),
and find out that
∫_{0.41}^{0.47} f_Θ∣X(θ ∣ x) dθ = 0.70

Am I allowed to say that there is probability 70% that the unknown


parameter is inside the (Bayesian) confidence interval [0.41,0.47]?
Yes, No, No, Yes

Exercise: A simple CI
Let θ be an unknown parameter, and let X be uniform on the
interval [θ−0.5,θ+0.5].

Is [X−2,X+2] an 80% confidence interval?

I form a confidence interval of the form [X−a,X+a]. What is the narrowest


confidence interval of this type (i.e., what is the smallest possible choice
of a) if I want to have an 80% confidence interval?
Yes, 0.40

Exercise: CI's via the CLT


The sample mean estimate Θ̂ of the mean of a random variable with
variance 1, based on 100 samples, happened to be 22. The 80%
confidence interval provided by the CLT is of the form [a,b], with:

a =?

b =?

Your answers should include at least 2 decimal digits.

You may want to refer to the normal table (below). For your reference, if we


had 95% instead of 80%, the confidence interval would be of the form
[Θ̂ − 1.96σ/√n, Θ̂ + 1.96σ/√n]
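For an 80% interval, the 1.96 above is replaced by the 0.9 quantile of the standard normal, about 1.2816. A sketch of the computation under the stated numbers (estimate 22, σ = 1, n = 100), giving endpoints of roughly 21.87 and 22.13:

from scipy.stats import norm

theta_hat, sigma, n = 22.0, 1.0, 100
z = norm.ppf(0.9)                       # about 1.2816 (leaves 10% in each tail)
half_width = z * sigma / n ** 0.5
print(theta_hat - half_width, theta_hat + half_width)   # about 21.87 and 22.13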

Exercise: Natural estimators


The random variables Xi are i.i.d. and satisfy E[Xi²] = θ. Use a natural
estimator to calculate an estimate of θ based on the
values X1=1, X2=3, X3=−1, X4=2, X5=0.

In order to calculate confidence intervals around your estimator, you need


information on the variance of your estimator. This variance is determined
by E[Xi²] and E[Xi^a] for some other power a. What is the value of a?

If you do not have any prior knowledge about the value of E[Xi^a], can you
estimate it based on the available data?
3, 4, Yes

Exercise: ML estimation
Let K be a Poisson random variable with parameter λ: its PMF is
p_K(k; λ) = λ^k e^(−λ) / k!,  for k = 0, 1, 2, …

What is the ML estimate of λ based on a single observation K=k? (Your


answer should be an algebraic function of k using standard notation.)
k

Problems
Problem 1. Convergence in probability
For each of the following sequences, determine whether it converges in
probability to a constant. If it does, enter the value of the limit. If it does
not, enter the number “999".

Let X1,X2,… be independent continuous random variables, each uniformly


distributed between −1 and 1.
Let Ui = (X1 + X2 + ⋯ + Xi)/i, i = 1, 2, … What value does the sequence Ui converge to in
probability? (If it does not converge, enter the number “999". Similarly, in
all below.)

Let Σi = X1 + X2 + ⋯ + Xi, i = 1, 2, … What value does the
sequence Σi converge to in probability?
Let Ii = 1 if Xi ≥ 1/2, and Ii = 0 otherwise, and define

Si = (I1 + I2 + ⋯ + Ii)/i

What value does the sequence Si converge to, in probability?

𝐿𝑒𝑡 𝑊𝑖 = 𝑚𝑎𝑥{𝑋1, …, 𝑋𝑖}, 𝑖 = 1, 2, … What value does the


sequence Wi converge to in probability?

Let Vi = X1 · X2 ⋯ Xi, i = 1, 2, … What value does the sequence Vi converge
to in probability?

Let X1, X2, … be independent identically distributed random variables with
E[Xi] = 2 and Var(Xi) = 9, and let Yi = Xi / i².

What value does the sequence Yi converge to in probability?


Let An = (1/n) ∑_{i=1}^{n} Yi. What value does the sequence An converge to in probability?

Let Zi = (1/3)·Xi + (2/3)·Xi+1 for i = 1, 2, …, and let Mn = (1/n) ∑_{i=1}^{n} Zi for n = 1, 2, …

What value does the sequence Mn converge to in probability?


0, 999, 0.25, 1, 0, 0, 0, 2

Problem 2. Find the limits


Let Sn be the number of successes in n independent Bernoulli trials, where
the probability of success at each trial is 1/3. Provide a numerical value, to a
precision of 3 decimal places, for each of the following limits. You may
want to refer to the standard normal table.

lim_{n→∞} P(n/3 − 10 ≤ Sn ≤ n/3 + 10) = ?

lim_{n→∞} P(n/3 − n/6 ≤ Sn ≤ n/3 + n/6) = ?

lim_{n→∞} P(n/3 − √(2n)/5 ≤ Sn ≤ n/3 + √(2n)/5) = ?
0, 1, 0.45

Problem 3. The sample mean


Let X be a continuous random variable. We know that it takes values
between 0 and 6, but we do not know its distribution or its mean and
variance, although we know that its variance is at most 4. We are
interested in estimating the mean of X, which we denote by h. To
estimate h, we take n i.i.d. samples X1,…,Xn, which all have the same
distribution as X, and compute the sample mean
H = (1/n) ∑_{i=1}^{n} Xi

Express your answers for this part in terms of h and n using standard


notation.

E[H]=?

Given the available information, the smallest upper bound for Var(H) that


we can assert/guarantee is:

Var(H)≤?

Calculate the smallest possible value of n such that the standard deviation
of H is guaranteed to be at most 0.01.

This minimum value of n is:?

We would like to be at least 96% sure that our estimate is within 0.02 of


the true mean h. Using the Chebyshev inequality, calculate the minimum
value of n that will achieve this.

This minimum value of n is:?

Suppose now that X is uniformly distributed on [h−3,h+3], for some


unknown h. Using the Central Limit Theorem, identify the most appropriate
expression for a 95% confidence interval for h. You may want to refer to
the normal table.

a) [H − 1.96·3/√n, H + 1.96·3/√n]

b) [H − 1.96/√(3n), H + 1.96/√(3n)]

c) [H − 1.96·√3/√n, H + 1.96·√3/√n]

d) [H − 1.96·3/n, H + 1.96·3/n]

h, 4/n, 40000, 250000, c

Problem 4. Airline overbooking


For any given flight, an airline tries to sell as many tickets as possible.
Suppose that on average, 20% of ticket holders fail to show up, all
independent of one another. Knowing this, an airline will sell more tickets
than there are seats available (i.e., overbook the flight) and hope that there
is a sufficient number of ticket holders who do not show up, to
compensate for its overbooking. Using the Central Limit Theorem,
determine n, the maximum number of tickets an airline can sell on a flight
with 400 seats so that it can be approximately 99% confident that all
ticket holders who do show up will be able to board the plane. Use the de
Moivre-Laplace 1/2-correction in your calculations. Hint: You may have to
solve numerically a quadratic equation.
475
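One way to carry out this calculation is to search numerically for the largest n for which the normal approximation (with the 1/2-correction) of the probability that at most 400 ticket holders show up is still at least 0.99. A sketch:

import math
from scipy.stats import norm

def prob_all_board(n, seats=400, p_show=0.8):
    # Normal approximation, with 1/2-correction, of P(Binomial(n, p_show) <= seats)
    mu = n * p_show
    sigma = math.sqrt(n * p_show * (1 - p_show))
    return norm.cdf((seats + 0.5 - mu) / sigma)

n = 400
while prob_all_board(n + 1) >= 0.99:
    n += 1
print(n)   # 475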

Problem 5. Maximum likelihood estimation


The random variables X1,X2,…,Xn are continuous, independent, and
distributed according to the Erlang PDF
f_X(x) = λ³ x² e^(−λx) / 2,  for x ≥ 0

where λ is an unknown parameter. Find the maximum likelihood estimate


of λ, based on observed values x1,x2,…,xn. Express your answer as a
function of n and s where s=x1+x2+…xn.
λ̂_ML = ?

3·n/s
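The estimate 3n/s comes from maximizing the log-likelihood ∑ᵢ (3 ln λ + 2 ln xᵢ − λxᵢ − ln 2); setting the derivative 3n/λ − s to zero gives λ = 3n/s. A numerical sketch comparing the closed form with a direct maximization on simulated data (the true λ = 2.5 and the sample size are our own choices):

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.gamma(shape=3, scale=1 / 2.5, size=1000)   # Erlang of order 3 with rate 2.5

def neg_log_lik(lam):
    return -np.sum(3 * np.log(lam) + 2 * np.log(x) - lam * x - np.log(2))

res = minimize_scalar(neg_log_lik, bounds=(0.01, 20), method="bounded")
print(res.x, 3 * len(x) / x.sum())   # numerical maximizer vs. closed form 3n/s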

Problem 6. Tossing a triple of coins


We have a red coin, for which P(Heads)=0.4, a green coin, for
which P(Heads)=0.5, and a yellow coin, for which P(Heads)=0.6. The flips of
the same or of different coins are independent. For each of the following
situations, determine whether the random variable N can be approximated
by a normal.

If yes, enter the mean and variance of N. If not, enter 0 in both of the
corresponding answer boxes.

Let N be the number of Heads in 300 tosses of the red coin.



mean: ?

variance: ?

Let N be the number of Heads in 300 tosses. At each toss, one of the three
coins is selected at random (either choice is equally likely), and
independently from everything else.

mean: ?

variance: ?

Let N be the number of Heads in 100 tosses of the red coin, followed by
100 tosses of the green coin, followed by 100 tosses of the yellow coin (for
a total of 300 tosses).

mean: ?

variance: ?

We select one of the three coins at random: each coin is equally likely to be
selected. We then toss the selected coin 300 times, independently, and
let N be the number of Heads.

mean: ?

variance: ?
120 72, 150 75, 150 73, 0 0

Chapter IX

9. Bernoulli and
Poisson
processes
9.1. BERNOULLI PROCESS
Some Random Variables Associated with the Bernoulli Process and their
Properties

• The binomial with parameters p and n. This is the number S of successes


in n independent trials. Its PMF, mean, and variance are
p_S(k) = (n choose k) p^k (1 − p)^(n−k),  k = 0, 1, …, n

E[S] = np,  Var(S) = np(1 − p)

The geometric with parameter p. This is the number T of trials up to (and


including) the first success. Its PMF, mean, and variance are
p_T(t) = (1 − p)^(t−1) p,  t = 1, 2, …

E[T] = 1/p,  Var(T) = (1 − p)/p²

Exercise: The Bernoulli process


Let X1,X2,… be a Bernoulli process. We will define some new sequences of
random variables and inquire whether they form a Bernoulli process.

Let Yn=X2n. Is the sequence Yn a Bernoulli process?

Let Un=Xn+1. Is the sequence Un a Bernoulli process?

Let Vn=Xn+Xn+1. Is the sequence Vn a Bernoulli process?

Let Wn=(−1)nXn. Is the sequence Wn a Bernoulli process?



Yes, Yes, No, No

Independence Properties of the Bernoulli Process


• For any given time n, the sequence of random variables Xn+1,Xn+2, . . .(the
future of the process) is also a Bernoulli process, and is independent from
X1, . . . ,Xn (the past of the process).

• Let n be a given time and let T be the time of the first success after time
n. Then, T − n has a geometric distribution with parameter p, and is
independent of the random variables X1, . . . ,Xn.

Alternative Description of the Bernoulli Process


1. Start with a sequence of independent geometric random variables T1, T2,
. . ., with common parameter p, and let these stand for the interarrival
times.

2. Record a success (or arrival) at times T1, T1 + T2, T1 + T2 + T3, etc.

Properties of the kth Arrival Time


• The kth arrival time is equal to the sum of the first k interarrival times

𝑌𝑘 = 𝑇1 + 𝑇2 + ⋯ + 𝑇𝑘

and the latter are independent geometric random variables with common
parameter p.

• The mean and variance of Yk are given by

E[Yk] = E[T1] + ⋯ + E[Tk] = k/p

Var(Yk) = Var(T1) + ⋯ + Var(Tk) = k(1 − p)/p²

• The PMF of Yk is given by


p_Yk(t) = (t−1 choose k−1) p^k (1 − p)^(t−k),  t = k, k + 1, …

and is known as the Pascal PMF of order k.
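A quick simulation check of these formulas, using the interarrival description of the Bernoulli process (the parameter values are our own):

import numpy as np

rng = np.random.default_rng(0)
p, k = 0.3, 4
# Y_k is the sum of k independent geometric(p) interarrival times
T = rng.geometric(p, size=(100_000, k))
Y = T.sum(axis=1)
print(Y.mean(), Y.var())   # approx k/p = 13.33 and k(1-p)/p^2 = 31.1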

Exercise: Time until the first failure


Let the sequence Xn, n=1,2,3,…, be a Bernoulli process with parameter
P(Xn=1)=p for all n≥1. Let U be the time when a value of 0 is first observed:
U=min{n:Xn=0} Then, the random variable U is:

a) Geometric with parameter p



b) Geometric with parameter 1−p


c) None of the above
b

Exercise: Fresh start


Consider a Bernoulli process, with a “1" considered a success and a “0"
considered a failure. Determine whether the process starts fresh right
after each of the following random times:

a) The time of the kth failure


b) The first time that a failure follows a success
c) The first time at which we have a failure that will be followed by a
success
Yes Yes No

Exercise: More on fresh start


Consider a Bernoulli process with parameter p=1/3. Let T1 be the time of
the first success and let T1+T2 be the time of the second success. We are
told that the results of the two slots that follow the first success are
failures, so that XT1+1=XT1+2=0. What is the conditional expectation of the
second interarrival time, T2, given this information? (Recall that the
expectation of a geometric random variable with parameter p is equal
to 1/p.)
5

Exercise: Busy periods


Consider the same setting as in the last video. After the first busy period
ends (with an idle slot), there will be a subsequent busy period, which starts
with a busy slot, and lasts as long as the slots are busy. Is it true that the
length of the second busy period is geometric?
Yes

Exercise: A variation on merging


We start with two independent Bernoulli processes, Xn and Yn, with
parameters p and q, respectively. We form a new process Zn by recording
an arrival in a given time slot if and only if both of the original processes
record an arrival in that same time slot. Mathematically, Zn=XnYn.

The new process Zn is also Bernoulli with parameter ?



Suppose that the two Bernoulli processes Xn and Yn are dependent. We


still assume, however, that the pairs (Xn,Yn) are independent. E.g., (X1,Y1) is
independent from (X2,Y2), etc. Is the process Zn guaranteed to be
Bernoulli?
pq, no

Exercise: Splitting
For each exam, Ariadne studies with probability 1/2 and does not study
with probability 1/2, independently of any other exams. On any exam for
which she has not studied, she still has a 0.20 probability of passing,
independently of whatever happens on other exams. What is the expected
number of total exams taken until she has had 3 exams for which she did
not study but which she still passed?
30

9.2. Poisson Process


Poisson Approximation to the Binomial
• A Poisson random variable Z with parameter λ takes nonnegative integer
values and is described by the PMF
p_Z(k) = e^(−λ) λ^k / k!,  k = 0, 1, 2, …

Its mean and variance are given by

E[Z] = λ,  Var(Z) = λ

• For any fixed nonnegative integer k, the binomial probability


p_S(k) = (n! / ((n−k)! k!)) · p^k (1 − p)^(n−k)

converges to pZ(k), when we take the limit as n → ∞ and p = λ/n, while


keeping λ constant.

• In general, the Poisson PMF is a good approximation to the binomial as


long as λ = np, n is very large, and p is very small.

Definition of the Poisson Process


An arrival process is called a Poisson process with rate λ if it has the
following properties:

(a) (Time-homogeneity) The probability P(k, τ) of k arrivals is the same for all
intervals of the same length τ.

(b) (Independence) The number of arrivals during a particular interval is


independent of the history of arrivals outside this interval.

(c) (Small interval probabilities) The probabilities P(k, τ) satisfy

P(0, τ) = 1 − λτ + o(τ),   P(1, τ) = λτ + o1(τ),   P(k, τ) = ok(τ), for k = 2, 3, …

Here, o(τ) and ok(τ) are functions of τ that satisfy

lim_{τ→0} o(τ)/τ = 0,   lim_{τ→0} ok(τ)/τ = 0

Poisson process definition


Consider a Poisson process with rate λ=4, and let N(t) be the number of
arrivals during the time interval [0,t].

Suppose that you have recorded this process in a movie and that you play
this movie at twice the speed. The process that you will be seeing in the
sped-up movie satisfies the following (pick one of the answers):

a) is a Poisson process with rate 2


b) is a Poisson process with rate 4
c) is a Poisson process with rate 8
d) is not a Poisson process
c

Poisson models
For each one of the following situations, state whether a Poisson model is a
plausible model over the specified time frame.

a) The process of arrivals of passengers to the baggage claim section


of an airport
b) The process of order arrivals at an online retailer between 3:00 and
3:15 pm
c) The process of order arrivals at a local pizza delivery shop over the
course of a day
No, Yes, No

Poisson practice
Consider a Poisson arrival process with rate λ per hour. To simplify
notation, we let a=P(0,1), b=P(1,1), and c=P(2,1), where P(k,1) is the
probability of exactly k arrivals over an hour-long time interval.

What is the probability that we will have “at most one arrival between
10:00 and 11:00 and exactly two arrivals between 10:00 and 12:00"? Your
answer should be an algebraic function of a, b, and c
a·c + b²

Exercise: Describing events


Events related to the Poisson process can be often described in two
equivalent ways: in terms of numbers of arrivals during certain intervals or
in terms of arrival times. The first description involves discrete random
variables, the second continuous random variables. Let N(t) be the number
of arrivals during the time interval [0,t] in a Poisson process. Let Yk be the
time of the kth arrival.

a) The event {N(5)>1} is equivalent to the event {Yk≤b}, for suitable b and k. Find b and k.

b) The event {2<Y3≤Y4≤5} is equivalent to the event {N(2)≤a and N(5)≥b}. Find a and b.
5, 2, 2 4

Random Variables Associated with the Poisson Process


and their Properties
• The Poisson with parameter λτ. This is the number Nτ of arrivals in a
Poisson process with rate λ, over an interval of length τ. Its PMF, mean, and
variance are
p_Nτ(k) = P(k, τ) = e^(−λτ) (λτ)^k / k!,  k = 0, 1, …    E[Nτ] = λτ,  Var(Nτ) = λτ

• The exponential with parameter λ. This is the time T until the first arrival.
Its PDF, mean, and variance are
f_T(t) = λe^(−λt), t ≥ 0,    E[T] = 1/λ,  Var(T) = 1/λ²

Independence Properties of the Poisson Process


• For any given time t > 0, the history of the process after time t is also a
Poisson process, and is independent from the history of the process until
time t.

• Let t be a given time and let 𝑇 be the time of the first arrival after time t.
Then, 𝑇 −t has an exponential distribution with parameter λ, and is
independent of the history of the process until time t.

Alternative Description of the Poisson Process


1. Start with a sequence of independent exponential random variables T1,
T2,. . ., with common parameter λ, and let these represent the interarrival
times.

2. Record an arrival at times T1, T1 + T2, T1 + T2 + T3, etc.

Properties of the kth Arrival Time


• The kth arrival time is equal to the sum of the first k interarrival times

𝑌𝑘 = 𝑇1 + 𝑇2 + ⋯ + 𝑇𝑘

and the latter are independent exponential random variables with common
parameter λ.

• The mean and variance of Yk are given by

E[Yk] = E[T1] + ⋯ + E[Tk] = k/λ,    Var(Yk) = Var(T1) + ⋯ + Var(Tk) = k/λ²

• The PDF of Yk is given by


f_Yk(y) = λ^k y^(k−1) e^(−λy) / (k−1)!,  y ≥ 0

and is known as the Erlang PDF of order k.

Properties of Sums of a Random Number of Random


Variables
Let N,X1,X2, . . . be independent random variables, where N takes
nonnegative integer values. Let Y = X1 + ⋯ + XN for positive values of N,
and let Y = 0 when N = 0.

• If Xi is Bernoulli with parameter p, and N is binomial with parameters m


and q, then Y is binomial with parameters m and pq.

• If Xi is Bernoulli with parameter p, and N is Poisson with parameterλ, then


Y is Poisson with parameter λp.

• If Xi is geometric with parameter p, and N is geometric with parameter q,


then Y is geometric with parameter pq.

• If Xi is exponential with parameter λ, and N is geometric with parameter q,


then Y is exponential with parameter λq.

Exercise: Erlang r.v.'s


Let X and Y be independent Erlang random variables with common
parameter λ and of order m and n, respectively. Is the random
variable X+Y Erlang? If yes, enter below its order in terms of m and n.
m+n

Exercise: The time of the kth arrival


Let Yk be the time of the kth arrival in a Poisson process with parameter
λ=1. In particular, E[Yk]=k. Is it true that P(Yk≥k)=1/2 for any finite k?

Is it true that lim_{k→∞} P(Yk≥k) = 1/2?

No, Yes

Exercise: Bank tellers


When you enter your bank, you find that there are only two tellers, both
busy serving other customers, and that there are no other customers in
line. Assume that the service times for you and for each of the customers
being served are independent identically distributed exponential random
variables. Also assume that after a service completion, the next customer
in line immediately begins to be served. What is the probability that you will
be the last to leave? Hint: Think of the situation at the time that you start
getting served.
½

9.3. Problems
Problem 1. Marie gives away children toys
Marie distributes toys for toddlers. She makes visits to households and
gives away one toy only on visits for which the door is answered and a
toddler is in residence. On any visit, the probability of the door being
answered is 3/4, and the probability that there is a toddler in residence is
1/3. Assume that the events “Door answered" and “Toddler in residence"
are independent and also that events related to different households are
independent.

a) What is the probability that she has not distributed any toys by the
end of her second visit?
b) What is the probability that she gives away the first toy on her
fourth visit?
c) Given that she has given away her second toy on her fifth visit, what
is the conditional probability that she will give away her third toy on
her eighth visit?
d) What is the probability that she will give away the second toy on her
fourth visit?
e) Given that she has not given away her second toy by her third visit,
what is the conditional probability that she will give away her
second toy on her fifth visit?
f) We will say that Marie “needs a new supply"" immediately after the
visit on which she gives away her last toy. If she starts out with
three toys, what is the probability that she completes at least five
visits before she needs a new supply?
g) If she starts out with exactly six toys, what is the expected value of
the number of houses with toddlers that Marie visits without
leaving any toys (because the door was not answered) before she
needs a new supply?
0.5625, 0.105469, 0.140625, 0.105469, 0.1250, 0.95, 2
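Each visit results in a toy with probability p = (3/4)·(1/3) = 1/4, independently across visits, so parts (a), (b), and (d) are geometric/Pascal calculations. A short sketch of a few of them:

from math import comb

p = (3 / 4) * (1 / 3)    # probability that a visit results in a toy being given away
q = 1 - p
print(q ** 2)                          # (a) no toy in the first two visits: 0.5625
print(q ** 3 * p)                      # (b) first toy on the fourth visit: 0.105469
print(comb(3, 1) * p ** 2 * q ** 2)    # (d) second toy on the fourth visit (Pascal of order 2): 0.105469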

Problem 2. Three engines


Suppose that we have three engines, which we turn on at time 0. Each
engine will eventually fail, and we model each engine's lifetime as
exponentially distributed with parameter λ. The lifetimes of different
engines are independent. One of the engines will fail first, followed by the
second, and followed by the last. Let T1 be the time of the first failure, T2 be
the time of the second failure, and T3 be the time of the third failure. For
answers involving algebraic expressions, enter "lambda" for λ and use
"exp()" for exponentials. Follow standard notation.

a) Determine the PDF of T1.


For t>0
b) Let X=T2−T1. Determine the conditional PDF 𝑓𝑋∣𝑇 (𝑥∣𝑡).
1

For x,t>0,
c) Is X independent of T1?
d) Let Y=T3−T2. Find the PDF of 𝑓𝑌∣𝑇 (𝑦∣𝑇2)
2

For y,t>0,
e) Is Y independent of T2?

f) Find the PDF fT3(t) for t≥0.


For t≥0,
g) Find E[T3].
(3·λ)·e^(−3·λ·t), (2·λ)·e^(−2·λ·x), yes, λ·exp(−λ·y), yes, 3·λ·exp(−λ·t)·(1 − exp(−λ·t))², 1/(3·λ) + 1/(2·λ) + 1/λ

Problem 3. Shuttles
In parts 1, 3, 4, and 5 below, your answers will be algebraic expressions.
Enter "lambda" for λλ and "mu" for μμ. Follow standard notation.

Shuttles bound for Boston depart from New York every hour on the hour
(e.g., at exactly one o"clock, two o'clock, etc.). Passengers arrive at the
departure gate in New York according to a Poisson process with rate λ per
hour. What is the expected number of passengers on any given shuttle?
(Assume that everyone who arrives between two successive shuttle
departures boards the shuttle immediately following his/her arrival. That is,
shuttles are big enough to accommodate all arriving passengers.)

For this and for the remaining parts of this problem, suppose that the
shuttles are not operating on a deterministic schedule. Rather, their
interdeparture times are independent and exponentially distributed with
common parameter μ per hour. Furthermore, shuttle departures are
independent of the process of passenger arrivals. Is the sequence of
shuttle departures a Poisson process?

Let us say that an "event" occurs whenever a passenger arrives or a shuttle


departs. What is the expected number of "events" that occur in any
one-hour interval?

If a passenger arrives at the gate and sees 2λ people waiting (assume


that 2λ is an integer), what is his/her expected waiting time until the next
shuttle departs?

Find the PMF, pN(n), of the number, N, of people on any given shuttle.


Assume that λ=20 and μ=2.
For n≥0,
λ, Yes, λ + μ, 1/μ, 2·20^n / 22^(n+1)

Problem 4. Ships
All ships travel at the same speed through a wide canal. Each ship
takes t days to traverse the length of the canal. Eastbound ships (i.e., ships

traveling east) arrive as a Poisson process with an arrival rate of λE ships


per day. Westbound ships (i.e., ships traveling west) arrive as an
independent Poisson process with an arrival rate of λW ships per day. A
pointer at some location in the canal is always pointing in the direction of
travel of the most recent ship to pass it.

In each part below, your answers will be algebraic expressions in terms


of λE,λW,x,t,v and/or k.

For Parts 1 and 2, suppose that the pointer is currently pointing west.

What is the probability that the next ship to pass will be westbound?

Determine the PDF, fX(x), of the remaining time, X, until the pointer changes
direction.

For x≥0, fX(x)=?

For the remaining parts of this problem, we make no assumptions about


the current direction of the pointer.

What is the probability that an eastbound ship does not pass by any
westbound ships during its eastward journey through the canal?

Starting at an arbitrary time, we monitor the cross-section of the canal at


some fixed location along its length. Let V be the amount of time that will
have elapsed (since we began monitoring) by the time we observe our
seventh eastbound ship. Find the PDF of V.
For v≥0,

fV(v)=?

What is the probability that the next ship to arrive causes a change in the
direction of the pointer?

If we begin monitoring a fixed cross-section of the canal at an arbitrary


time, determine the probability mass function pK(k) for K, the total number
of ships we observe up to and including the seventh eastbound ship we
see. The answer will be of the form p_K(k) = (a choose 6)·b, for suitable algebraic
expressions in place of a and b. Find a, b.
λW/(λE + λW), λE·e^(−λE·x), e^(−λW·2·t), λE⁷·v⁶·e^(−λE·v)/720, 2·λW·λE/(λW + λE)², a = k−1, b = λE⁷·λW^(k−7)/(λE + λW)^k

Problem 5. Arrivals during overlapping time intervals


Consider a Poisson process with rate λ. Let N be the number of arrivals
in (0,t] and M be the number of arrivals in (0,t+s], where t>0,s≥0.

In each part below, your answers will be algebraic expressions in terms


of λ,t,s,m and/or n. Enter "lambda" for λ and use "exp()" for exponentials.
Do not use "fac()" or "!" for factorials. Follow standard notation.

For 0≤n≤m, the conditional PMF 𝑝𝑀∣𝑁(𝑚∣𝑛) of M given N is of the form a/b! for


suitable algebraic expressions in place of a and b. Find a, b, E[NM].
exp(−λ·s)·(λ·s)^(m−n), m−n, (λ·t)·(λ·s) + (λ·t)² + (λ·t)

Problem 6. Random incidence under Erlang interarrivals


A factory produces copper wires. A blue mark is placed on a very long
length of copper. The wire is then cut into pieces. The lengths of different
pieces are independent, and the length of each piece is distributed
according to the same PDF fX(x). Let R be length of the piece including the
blue mark. Determine the expected value of R in each of the following
cases.

In each part below, express your answer in terms of μ using standard


notation. Enter "mu" for μ.

Suppose that :
f_X(x) = μe^(−μx) for x ≥ 0,  and 0 for x < 0

E[R]=?
f_X(x) = μ⁴ x³ e^(−μx) / 6 for x ≥ 0,  and 0 for x < 0

E[R]=?
2/μ, 5/μ

Problem 7. Sampling families


We are given the following statistics about the number of children in the
families of a small village.

There are 100 families: 10 families have no children, 40 families have 1


child each, 30 families have 2 children each, 10 families have 3 each, and
10 families have 4 each.

If you pick a family at random (each family in the village being equally likely
to be picked), what is the expected number of children in that family?

If you pick a child at random (each child in the village being equally likely to
be picked), what is the expected number of children in that child's family
(including the picked child)?

Generalize your approach from part 2: Suppose that a fraction pk of the


families have k children each. Let K be the number of children in a randomly
selected family, and let a=E[K] and b=E[K2]. Let W be the number of children
in the family of a randomly chosen child. Express E[W] in terms
of a and b using standard notation.
1.7, 410/170, b/a

Problem 8. Poisson fun


Based on your understanding of the Poisson process, determine the
numerical values of a and b in the following expression.
∫_t^∞ (λ⁶ τ⁵ e^(−λτ) / 5!) dτ = ∑_{k=a}^{b} (λt)^k e^(−λt) / k!

Find a, b.
0, 5
