
6.041 Probabilistic Systems Analysis / 6.431 Applied Probability

• Staff:
  – Lecturer: John Tsitsiklis
• Pick up and read course information handout
• Turn in recitation and tutorial scheduling form
  (last sheet of course information handout)
• Pick up copy of slides
• Text: Introduction to Probability, 2nd Edition,
  D. P. Bertsekas and J. N. Tsitsiklis, Athena Scientific, 2008
  Read the text!

Coursework
• Quiz 1 (October 12, 12:05-12:55pm): 17%
• Quiz 2 (November 2, 7:30-9:30pm): 30%
• Final exam (scheduled by registrar): 40%
• Weekly homework (best 9 of 10): 10%
• Attendance/participation/enthusiasm in recitations/tutorials: 3%
• Collaboration policy described in course info handout

LECTURE 1

• Readings: Sections 1.1, 1.2

Lecture outline
• Probability as a mathematical framework
  for reasoning about uncertainty
• Probabilistic models
  – sample space
  – probability law
• Axioms of probability
• Simple examples

Sample space Ω
• “List” (set) of possible outcomes
• List must be:
  – Mutually exclusive
  – Collectively exhaustive
• Art: to be at the “right” granularity
Sample space: Discrete example
• Two rolls of a tetrahedral die
  – Sample space vs. sequential description
  (figures: 4×4 grid of outcomes with X = First roll, Y = Second roll,
   and the equivalent sequential tree with leaves (1,1), (1,2), . . . , (4,4))

Sample space: Continuous example
• Ω = {(x, y) | 0 ≤ x, y ≤ 1}
  (figure: the unit square in the (x, y) plane)

Probability axioms
• Event: a subset of the sample space
• Probability is assigned to events

Axioms:
1. Nonnegativity: P(A) ≥ 0
2. Normalization: P(Ω) = 1
3. Additivity: If A ∩ B = Ø, then P(A ∪ B) = P(A) + P(B)

• P({s1, s2, . . . , sk }) = P({s1}) + · · · + P({sk })
  = P(s1) + · · · + P(sk )
• Axiom 3 needs strengthening
• Do weird sets have probabilities?

Probability law: Example with finite sample space
• Two rolls of a tetrahedral die (X = First roll, Y = Second roll)
• Let every possible outcome have probability 1/16
  – P((X, Y ) is (1,1) or (1,2)) =
  – P({X = 1}) =
  – P(X + Y is odd) =
  – P(min(X, Y ) = 2) =
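A quick way to check the blanks above is to enumerate the 16 equally likely outcomes directly. The sketch below is not part of the original slides; it just does the counting in Python.

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely outcomes of two tetrahedral-die rolls.
outcomes = list(product(range(1, 5), repeat=2))

def prob(event):
    """Probability of an event under the discrete uniform law."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

print(prob(lambda o: o in [(1, 1), (1, 2)]))      # 1/8
print(prob(lambda o: o[0] == 1))                  # 1/4
print(prob(lambda o: (o[0] + o[1]) % 2 == 1))     # 1/2
print(prob(lambda o: min(o) == 2))                # 5/16
```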

Discrete uniform law
• Let all outcomes be equally likely
• Then,
  P(A) = (number of elements of A) / (total number of sample points)
• Computing probabilities ≡ counting
• Defines fair coins, fair dice, well-shuffled decks

Continuous uniform law
• Two “random” numbers in [0, 1]
  (figure: the unit square)
• Uniform law: Probability = Area
  – P(X + Y ≤ 1/2) = ?
  – P( (X, Y ) = (0.5, 0.3) ) =

Probability law: Example with countably infinite sample space

• Sample space: {1, 2, . . .}
  – We are given P(n) = 2⁻ⁿ, n = 1, 2, . . .
  – Find P(outcome is even)
  (figure: bar plot of the PMF, with p = 1/2, 1/4, 1/8, 1/16, . . . at 1, 2, 3, 4, . . .)

  P({2, 4, 6, . . .}) = P(2) + P(4) + · · · = 1/2² + 1/2⁴ + 1/2⁶ + · · · = 1/3

• Countable additivity axiom (needed for this calculation):
  If A1, A2, . . . are disjoint events, then:
  P(A1 ∪ A2 ∪ · · · ) = P(A1) + P(A2) + · · ·

Remember!
• Turn in recitation/tutorial scheduling form now
• Tutorials start next week

LECTURE 2

• Readings: Sections 1.3-1.4

Lecture outline
• Review
• Conditional probability
• Three important tools:
  – Multiplication rule
  – Total probability theorem
  – Bayes’ rule

Review of probability models
• Sample space Ω
  – Mutually exclusive
  – Collectively exhaustive
  – Right granularity
• Event: Subset of the sample space
• Allocation of probabilities to events
  1. P(A) ≥ 0
  2. P(Ω) = 1
  3. If A ∩ B = Ø, then P(A ∪ B) = P(A) + P(B)
  3’. If A1, A2, . . . are disjoint events, then:
      P(A1 ∪ A2 ∪ · · · ) = P(A1) + P(A2) + · · ·
• Problem solving:
  – Specify sample space
  – Define probability law
  – Identify event of interest
  – Calculate...

Conditional probability
(figure: Venn diagram of events A and B)
• P(A | B) = probability of A, given that B occurred
  – B is our new universe
• Definition: Assuming P(B) ≠ 0,
  P(A | B) = P(A ∩ B) / P(B)
  P(A | B) undefined if P(B) = 0

Die roll example
(figure: 4×4 grid of outcomes, X = First roll, Y = Second roll)
• Let B be the event: min(X, Y ) = 2
• Let M = max(X, Y )
• P(M = 1 | B) =
• P(M = 2 | B) =
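These conditional probabilities can again be checked by brute-force enumeration over the 16 outcomes; a small sketch, not from the slides:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 5), repeat=2))   # two tetrahedral rolls

def cond_prob(event_a, event_b):
    """P(A | B) computed by counting, assuming equally likely outcomes."""
    b = [o for o in outcomes if event_b(o)]
    return Fraction(sum(1 for o in b if event_a(o)), len(b))

B = lambda o: min(o) == 2
print(cond_prob(lambda o: max(o) == 1, B))   # 0   (M = 1 is impossible given min = 2)
print(cond_prob(lambda o: max(o) == 2, B))   # 1/5
```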

Multiplication rule

  P(A ∩ B ∩ C) = P(A) · P(B | A) · P(C | A ∩ B)

(figure: tree with branches A, A ∩ B, A ∩ B ∩ C, labeled by P(A), P(B | A), P(C | A ∩ B),
 and the complementary branches through Bᶜ and Cᶜ)

Models based on conditional probabilities
• Event A: Airplane is flying above
  Event B: Something registers on radar screen
  (figure: two-stage tree with
   P(A) = 0.05, P(Aᶜ) = 0.95,
   P(B | A) = 0.99, P(Bᶜ | A) = 0.01,
   P(B | Aᶜ) = 0.10, P(Bᶜ | Aᶜ) = 0.90)

  P(A ∩ B) =

  P(B) =

  P(A | B) =
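For reference, a sketch (not from the slides) that fills in these three blanks numerically using the tree probabilities above:

```python
# Radar example: A = airplane present, B = radar registers.
p_A = 0.05
p_B_given_A = 0.99
p_B_given_notA = 0.10

p_A_and_B = p_A * p_B_given_A                          # multiplication rule
p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_notA   # total probability theorem
p_A_given_B = p_A_and_B / p_B                          # Bayes' rule

print(p_A_and_B)     # 0.0495
print(p_B)           # 0.1445
print(p_A_given_B)   # ~0.3426
```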

Total probability theorem
• Divide and conquer
• Partition of sample space into A1, A2, A3
• Have P(B | Ai), for every i
  (figure: partition A1, A2, A3 of Ω, with event B overlapping all three)
• One way of computing P(B):
  P(B) = P(A1)P(B | A1)
       + P(A2)P(B | A2)
       + P(A3)P(B | A3)

Bayes’ rule
• “Prior” probabilities P(Ai)
  – initial “beliefs”
• We know P(B | Ai) for each i
• Wish to compute P(Ai | B)
  – revise “beliefs”, given that B occurred

  P(Ai | B) = P(Ai ∩ B) / P(B)
            = P(Ai)P(B | Ai) / P(B)
            = P(Ai)P(B | Ai) / Σj P(Aj )P(B | Aj )

LECTURE 3

• Readings: Section 1.5

Lecture outline
• Review
• Independence of two events
• Independence of a collection of events

Review
• P(A | B) = P(A ∩ B) / P(B), assuming P(B) > 0
• Multiplication rule:
  P(A ∩ B) = P(B) · P(A | B) = P(A) · P(B | A)
• Total probability theorem:
  P(B) = P(A)P(B | A) + P(Aᶜ)P(B | Aᶜ)
• Bayes rule:
  P(Ai | B) = P(Ai)P(B | Ai) / P(B)

Models based on conditional probabilities
• 3 tosses of a biased coin:
  P(H) = p, P(T ) = 1 − p
  (figure: three-stage tree with branch probabilities p and 1 − p,
   leaves HHH, HHT, HTH, HTT, THH, THT, TTH, TTT)

  P(T HT ) =

  P(1 head) =

  P(first toss is H | 1 head) =
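The three blanks can be filled by enumerating the 8 leaves of the tree; a sketch, not part of the slides, with the bias p set to an arbitrary illustrative value:

```python
from itertools import product

p = 0.3   # any bias 0 < p < 1; chosen here just for illustration

def seq_prob(seq):
    """P(sequence) = p^{#H} (1-p)^{#T} for independent biased tosses."""
    return p ** seq.count('H') * (1 - p) ** seq.count('T')

seqs = [''.join(s) for s in product('HT', repeat=3)]

p_THT = seq_prob('THT')                                           # p(1-p)^2
p_one_head = sum(seq_prob(s) for s in seqs if s.count('H') == 1)  # 3p(1-p)^2
p_first_H_and_one_head = seq_prob('HTT')                          # p(1-p)^2
print(p_THT, p_one_head, p_first_H_and_one_head / p_one_head)     # last value is 1/3
```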

Independence of two events
• “Defn:” P(B | A) = P(B)
  – “occurrence of A provides no information about B’s occurrence”
• Recall that P(A ∩ B) = P(A) · P(B | A)
• Defn: P(A ∩ B) = P(A) · P(B)
  – Symmetric with respect to A and B
  – applies even if P(A) = 0
  – implies P(A | B) = P(A)

Conditioning may affect independence
• Conditional independence, given C,
  is defined as independence under probability law P( · | C)
• Assume A and B are independent
  (figure: events A and B intersecting only inside C)
• If we are told that C occurred, are A and B independent?
Conditioning may affect independence
• Two unfair coins, A and B:
  P(H | coin A) = 0.9, P(H | coin B) = 0.1
  choose either coin with equal probability
  (figure: tree, choose coin A or coin B with probability 0.5 each,
   then repeated tosses with head probability 0.9 or 0.1)
• Once we know it is coin A, are tosses independent?
• If we do not know which coin it is, are tosses independent?
  – Compare:
    P(toss 11 = H)
    P(toss 11 = H | first 10 tosses are heads)

Independence of a collection of events
• Intuitive definition:
  Information on some of the events tells us nothing
  about probabilities related to the remaining events
  – E.g.: P(A1 ∩ (A2ᶜ ∪ A3) | A5 ∩ A6ᶜ) = P(A1 ∩ (A2ᶜ ∪ A3))
• Mathematical definition:
  Events A1, A2, . . . , An are called independent if:
  P(Ai ∩ Aj ∩ · · · ∩ Aq ) = P(Ai)P(Aj ) · · · P(Aq )
  for any distinct indices i, j, . . . , q (chosen from {1, . . . , n})

Independence vs. pairwise independence
• Two independent fair coin tosses
  – A: First toss is H
  – B: Second toss is H
  – P(A) = P(B) = 1/2
  (sample space: HH, HT, TH, TT)
  – C: First and second toss give same result
  – P(C) =
  – P(C ∩ A) =
  – P(A ∩ B ∩ C) =
  – P(C | A ∩ B) =
• Pairwise independence does not imply independence
  (checked numerically in the sketch below)

The king’s sibling
• The king comes from a family of two children.
  What is the probability that his sibling is female?
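Returning to the pairwise-independence example above, the four blanks can be checked by enumerating the four equally likely outcomes; a sketch, not part of the slides:

```python
from fractions import Fraction

outcomes = ['HH', 'HT', 'TH', 'TT']      # two fair coin tosses, equally likely

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 'H'
B = lambda o: o[1] == 'H'
C = lambda o: o[0] == o[1]

print(prob(C))                                  # 1/2
print(prob(lambda o: C(o) and A(o)))            # 1/4 = P(C)P(A): pairwise independent
print(prob(lambda o: A(o) and B(o) and C(o)))   # 1/4, but P(A)P(B)P(C) = 1/8
print(prob(lambda o: A(o) and B(o) and C(o)) / prob(lambda o: A(o) and B(o)))  # P(C | A∩B) = 1
```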

LECTURE 4

• Readings: Section 1.6

Lecture outline
• Principles of counting
• Many examples
  – permutations
  – k-permutations
  – combinations
  – partitions
• Binomial probabilities

Discrete uniform law
• Let all sample points be equally likely
• Then,
  P(A) = (number of elements of A) / (total number of sample points) = |A| / |Ω|
• Just count. . .

Basic counting principle
• r stages
• ni choices at stage i
• Number of choices is: n1n2 · · · nr
• Number of license plates with 3 letters and 4 digits =
• . . . if repetition is prohibited =
• Permutations: Number of ways of ordering n elements is:
• Number of subsets of {1, . . . , n} =

Example
• Probability that six rolls of a six-sided die all give different numbers?
  – Number of outcomes that make the event happen:
  – Number of elements in the sample space:
  – Answer:  (see the sketch below)
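A sketch, not from the slides, that fills in the counting blanks for the license plates and the six-different-rolls example:

```python
from math import factorial, perm

# License plates with 3 letters and 4 digits.
plates = 26**3 * 10**4                    # repetition allowed
plates_norep = perm(26, 3) * perm(10, 4)  # repetition prohibited

# Six rolls of a six-sided die, all different.
favorable = factorial(6)                  # orderings of the six faces
total = 6**6                              # all outcome sequences
print(plates, plates_norep, favorable / total)   # last value ~0.0154
```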

Combinations
• (n choose k): number of k-element subsets of a given n-element set
• Two ways of constructing an ordered sequence of k distinct items:
  – Choose the k items one at a time:
    n(n−1) · · · (n−k+1) = n! / (n − k)!  choices
  – Choose k items, then order them (k! possible orders)
• Hence:
  (n choose k) · k! = n! / (n − k)!
  (n choose k) = n! / (k!(n − k)!)

  Σ (k=0 to n) (n choose k) =

Binomial probabilities
• n independent coin tosses
  – P(H) = p
• P(HT T HHH) =
• P(sequence) = p^(# heads) (1 − p)^(# tails)
• P(k heads) = Σ (over k−head seq.) P(seq.)
             = (# of k−head seqs.) · p^k (1 − p)^(n−k)
             = (n choose k) p^k (1 − p)^(n−k)
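A sketch, not part of the slides, that evaluates the binomial probability formula and checks it against direct enumeration of sequences (the values of n, k, p are illustrative):

```python
from math import comb
from itertools import product

n, k, p = 6, 4, 0.3   # illustrative values

# Formula: P(k heads) = C(n, k) p^k (1-p)^(n-k)
formula = comb(n, k) * p**k * (1 - p)**(n - k)

# Direct enumeration over all 2^n sequences.
enum = sum(
    p**s.count('H') * (1 - p)**s.count('T')
    for s in (''.join(t) for t in product('HT', repeat=n))
    if s.count('H') == k
)
print(formula, enum)   # the two numbers agree
```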

Coin tossing problem
• Event B: 3 out of 10 tosses were “heads”.
  – Given that B occurred, what is the (conditional) probability
    that the first 2 tosses were heads?
• All outcomes in set B are equally likely:
  probability p^3 (1 − p)^7
  – Conditional probability law is uniform
• Number of outcomes in B:
• Out of the outcomes in B, how many start with HH?
• Answer:

Partitions
• 52-card deck, dealt to 4 players
• Find P(each gets an ace)
• Outcome: a partition of the 52 cards
  – number of outcomes: 52! / (13! 13! 13! 13!)
• Count number of ways of distributing the four aces: 4 · 3 · 2
• Count number of ways of dealing the remaining 48 cards:
  48! / (12! 12! 12! 12!)
• Answer:
  [ 4 · 3 · 2 · 48! / (12! 12! 12! 12!) ] / [ 52! / (13! 13! 13! 13!) ]
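A sketch, not part of the slides, evaluating the partitions answer, both by the counting formula above and by Monte Carlo dealing:

```python
from math import factorial
import random

# Exact counting answer: P(each player gets exactly one ace).
num = 4 * 3 * 2 * factorial(48) // (factorial(12) ** 4)
den = factorial(52) // (factorial(13) ** 4)
exact = num / den
print(exact)   # ~0.1055

# Monte Carlo check: deal 52 cards into 4 hands of 13 and look at the aces.
random.seed(0)
deck = ['A'] * 4 + ['x'] * 48
hits = 0
trials = 100_000
for _ in range(trials):
    random.shuffle(deck)
    hands = [deck[13 * i:13 * (i + 1)] for i in range(4)]
    if all(hand.count('A') == 1 for hand in hands):
        hits += 1
print(hits / trials)   # close to the exact value
```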

LECTURE 5

• Readings: Sections 2.1-2.3, start 2.4

Lecture outline
• Random variables
• Probability mass function (PMF)
• Expectation
• Variance

Random variables
• An assignment of a value (number) to every possible outcome
• Mathematically: A function from the sample space Ω to the real numbers
  – discrete or continuous values
• Can have several random variables defined on the same sample space
• Notation:
  – random variable X
  – numerical value x

Probability mass function (PMF)
• (“probability law”, “probability distribution” of X)
• Notation:
  pX (x) = P(X = x)
         = P({ω ∈ Ω s.t. X(ω) = x})
• pX (x) ≥ 0,   Σx pX (x) = 1
• Example: X = number of coin tosses until first head
  – assume independent tosses, P(H) = p > 0
  pX (k) = P(X = k)
         = P(T T · · · T H)
         = (1 − p)^(k−1) p,   k = 1, 2, . . .
  – geometric PMF

How to compute a PMF pX (x)
  – collect all possible outcomes for which X is equal to x
  – add their probabilities
  – repeat for all x
• Example: Two independent rolls of a fair tetrahedral die
  F : outcome of first throw
  S: outcome of second throw
  X = min(F, S)
  (figure: 4×4 grid of outcomes, F = First roll, S = Second roll)
  pX (2) =
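A sketch, not part of the slides, that computes the PMF of X = min(F, S) by collecting outcomes, as described above:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

outcomes = list(product(range(1, 5), repeat=2))   # (F, S), each outcome has probability 1/16

counts = Counter(min(f, s) for f, s in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}
print(pmf)        # {1: 7/16, 2: 5/16, 3: 3/16, 4: 1/16}
print(pmf[2])     # pX(2) = 5/16
```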

Binomial PMF
• X: number of heads in n independent coin tosses
• P(H) = p
• Let n = 4:
  pX (2) = P(HHT T ) + P(HT HT ) + P(HT T H)
         + P(T HHT ) + P(T HT H) + P(T T HH)
         = 6 p^2 (1 − p)^2
         = (4 choose 2) p^2 (1 − p)^2
• In general:
  pX (k) = (n choose k) p^k (1 − p)^(n−k),   k = 0, 1, . . . , n

Expectation
• Definition:
  E[X] = Σx x pX (x)
• Interpretations:
  – Center of gravity of PMF
  – Average in large number of repetitions of the experiment
    (to be substantiated later in this course)
• Example: Uniform on 0, 1, . . . , n
  (figure: PMF with pX(x) = 1/(n+1) for x = 0, 1, . . . , n)
  E[X] = 0 × 1/(n+1) + 1 × 1/(n+1) + · · · + n × 1/(n+1) =

Properties of expectations
• Let X be a r.v. and let Y = g(X)
  – Hard: E[Y ] = Σy y pY (y)
  – Easy: E[Y ] = Σx g(x) pX (x)
• Caution: In general, E[g(X)] ≠ g(E[X])

Properties: If α, β are constants, then:
• E[α] =
• E[αX] =
• E[αX + β] =

Variance
• Recall: E[g(X)] = Σx g(x) pX (x)
• Second moment: E[X^2] = Σx x^2 pX (x)
• Variance
  var(X) = E[(X − E[X])^2]
         = Σx (x − E[X])^2 pX (x)
         = E[X^2] − (E[X])^2

Properties:
• var(X) ≥ 0
• var(αX + β) = α^2 var(X)

LECTURE 6

• Readings: Sections 2.4-2.6

Lecture outline
• Review: PMF, expectation, variance
• Conditional PMF
• Geometric PMF
• Total expectation theorem
• Joint PMF of two random variables

Review
• Random variable X: function from sample space to the real numbers
• PMF (for discrete random variables):
  pX (x) = P(X = x)
• Expectation:
  E[X] = Σx x pX (x)
  E[g(X)] = Σx g(x) pX (x)
  E[αX + β] = αE[X] + β
• E[X − E[X]] =
• var(X) = E[(X − E[X])^2]
         = Σx (x − E[X])^2 pX (x)
         = E[X^2] − (E[X])^2
• Standard deviation: σX = √var(X)

Random speed
• Traverse a 200 mile distance at constant but random speed V
  (PMF: pV (1) = 1/2, pV (200) = 1/2)
• d = 200,  T = t(V ) = 200/V
• E[V ] =
• var(V ) =
• σV =

Average speed vs. average time
• Traverse a 200 mile distance at constant but random speed V
  (PMF: pV (1) = 1/2, pV (200) = 1/2)
• time in hours = T = t(V ) =
• E[T ] = E[t(V )] = Σv t(v) pV (v) =
• E[T V ] = 200 ≠ E[T ] · E[V ]
• E[200/V ] = E[T ] ≠ 200/E[V ]
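A sketch, not part of the slides, working out the random-speed blanks; it also shows concretely that E[200/V] differs from 200/E[V]:

```python
from fractions import Fraction

# Speed V is 1 mph or 200 mph, each with probability 1/2.
pmf_V = {1: Fraction(1, 2), 200: Fraction(1, 2)}

E_V = sum(v * p for v, p in pmf_V.items())                 # 201/2 mph
var_V = sum(v**2 * p for v, p in pmf_V.items()) - E_V**2
E_T = sum(Fraction(200, v) * p for v, p in pmf_V.items())  # E[200/V] = 201/2 hours

print(E_V, var_V)
print(E_T, Fraction(200, 1) / E_V)   # 201/2 hours vs. 400/201 hours: very different
```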

Conditional PMF and expectation
• pX|A(x) = P(X = x | A)
• E[X | A] = Σx x pX|A(x)
• Example (figure: an example PMF pX(x) on x = 1, 2, 3, 4)
  Let A = {X ≥ 2}
  pX|A(x) =
  E[X | A] =

Geometric PMF
• X: number of independent coin tosses until first head
  pX (k) = (1 − p)^(k−1) p,   k = 1, 2, . . .
  E[X] = Σ (k=1 to ∞) k pX (k) = Σ (k=1 to ∞) k (1 − p)^(k−1) p
• Memoryless property: Given that X > 2,
  the r.v. X − 2 has same geometric PMF
  (figures: pX(k), the conditional PMF pX|X>2(k) starting at k = 3,
   and pX−2|X>2(k), which has the same shape as pX(k))

Total Expectation theorem
• Partition of sample space into disjoint events A1, A2, . . . , An
  (figure: partition A1, A2, A3 of Ω, with event B overlapping all three)
  P(B) = P(A1)P(B | A1) + · · · + P(An)P(B | An)
  pX (x) = P(A1)pX|A1(x) + · · · + P(An)pX|An(x)
  E[X] = P(A1)E[X | A1] + · · · + P(An)E[X | An]
• Geometric example:
  A1 : {X = 1},  A2 : {X > 1}
  E[X] = P(X = 1)E[X | X = 1] + P(X > 1)E[X | X > 1]
• Solve to get E[X] = 1/p

Joint PMFs
• pX,Y (x, y) = P(X = x and Y = y)
  (figure: table of joint PMF values over x = 1, . . . , 4 and y = 1, . . . , 4,
   entries in twentieths: row y = 4: 1, 2, 2;  row y = 3: 2, 4, 1, 2;
   row y = 2: 1, 3, 1;  row y = 1: 1)
• Σx Σy pX,Y (x, y) =
• pX (x) = Σy pX,Y (x, y)
• pX|Y (x | y) = P(X = x | Y = y) = pX,Y (x, y) / pY (y)
• Σx pX|Y (x | y) =

LECTURE 7

• Readings: Finish Chapter 2

Lecture outline
• Multiple random variables
  – Joint PMF
  – Conditioning
  – Independence
• More on expectations
• Binomial distribution revisited
• A hat problem

Review
  pX (x) = P(X = x)
  pX,Y (x, y) = P(X = x, Y = y)
  pX|Y (x | y) = P(X = x | Y = y)
  pX (x) = Σy pX,Y (x, y)
  pX,Y (x, y) = pX (x) pY |X (y | x)

Independent random variables
  pX,Y,Z (x, y, z) = pX (x) pY |X (y | x) pZ|X,Y (z | x, y)
• Random variables X, Y , Z are independent if:
  pX,Y,Z (x, y, z) = pX (x) · pY (y) · pZ (z)   for all x, y, z
  (figure: the same joint PMF table as in Lecture 6, entries in twentieths)
• Independent?
• What if we condition on X ≤ 2 and Y ≥ 3?

Expectations
  E[X] = Σx x pX (x)
  E[g(X, Y )] = Σx Σy g(x, y) pX,Y (x, y)
• In general: E[g(X, Y )] ≠ g(E[X], E[Y ])
• E[αX + β] = αE[X] + β
• E[X + Y + Z] = E[X] + E[Y ] + E[Z]
• If X, Y are independent:
  – E[XY ] = E[X]E[Y ]
  – E[g(X)h(Y )] = E[g(X)] · E[h(Y )]

Variances
• Var(aX) = a^2 Var(X)
• Var(X + a) = Var(X)
• Let Z = X + Y . If X, Y are independent:
  Var(X + Y ) = Var(X) + Var(Y )
• Examples:
  – If X = Y , Var(X + Y ) =
  – If X = −Y , Var(X + Y ) =
  – If X, Y indep., and Z = X − 3Y , Var(Z) =

Binomial mean and variance
• X = # of successes in n independent trials
  – probability of success p
  E[X] = Σ (k=0 to n) k (n choose k) p^k (1 − p)^(n−k)
• Xi = 1, if success in trial i;  0, otherwise
• E[Xi] =
• E[X] =
• Var(Xi) =
• Var(X) =

The hat problem
• n people throw their hats in a box and then pick one at random.
  – X: number of people who get their own hat
  – Find E[X]
• Xi = 1, if i selects own hat;  0, otherwise.
• X = X1 + X2 + · · · + Xn
• P(Xi = 1) =
• E[Xi] =
• Are the Xi independent?
• E[X] =

Variance in the hat problem
• Var(X) = E[X^2] − (E[X])^2 = E[X^2] − 1
  X^2 = Σi Xi^2 + Σ ((i,j): i≠j) Xi Xj
• E[Xi^2] =
• P(X1X2 = 1) = P(X1 = 1) · P(X2 = 1 | X1 = 1) =
• E[X^2] =
• Var(X) =
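A sketch, not part of the slides, that simulates the hat problem; it is consistent with the classical answers E[X] = 1 and Var(X) = 1:

```python
import random

random.seed(1)
n = 10
trials = 200_000

samples = []
for _ in range(trials):
    perm = list(range(n))
    random.shuffle(perm)                      # a uniformly random assignment of hats
    matches = sum(1 for i, j in enumerate(perm) if i == j)
    samples.append(matches)

mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, var)   # both close to 1
```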

LECTURE 8

• Readings: Sections 3.1-3.3

Lecture outline
• Probability density functions
• Cumulative distribution functions
• Normal random variables

Continuous r.v.’s and pdf’s
• A continuous r.v. is described by a probability density function fX
  (figure: PDF fX(x), with the event {a < X < b} as the area under fX between a and b)

  P(a ≤ X ≤ b) = ∫ (a to b) fX (x) dx

  ∫ (−∞ to ∞) fX (x) dx = 1

  P(x ≤ X ≤ x + δ) = ∫ (x to x+δ) fX (s) ds ≈ fX (x) · δ

  P(X ∈ B) = ∫ (over B) fX (x) dx,   for “nice” sets B

Means and variances
• E[X] = ∫ (−∞ to ∞) x fX (x) dx
• E[g(X)] = ∫ (−∞ to ∞) g(x) fX (x) dx
• var(X) = σX^2 = ∫ (−∞ to ∞) (x − E[X])^2 fX (x) dx

• Continuous Uniform r.v. on [a, b]:
  fX (x) = 1/(b − a),   a ≤ x ≤ b
• E[X] =
• σX^2 = ∫ (a to b) (x − (a + b)/2)^2 · 1/(b − a) dx = (b − a)^2 / 12

Cumulative distribution function (CDF)

  FX (x) = P(X ≤ x) = ∫ (−∞ to x) fX (t) dt

  (figure: a PDF on [a, b] and its CDF)

• Also for discrete r.v.’s:
  FX (x) = P(X ≤ x) = Σ (k ≤ x) pX (k)
  (figure: a PMF with values 1/6, 2/6, 3/6 on the points 1, 2, 4,
   and the corresponding staircase CDF)
Mixed distributions
• Schematic drawing of a combination of a PDF and a PMF
  (figure: a density on [0, 1] together with a point mass of 1/2 at x = 1/2)
• The corresponding CDF:
  FX (x) = P(X ≤ x)
  (figure: CDF rising to 1/4, jumping to 3/4 at the point mass, then rising to 1)

Gaussian (normal) PDF
• Standard normal N (0, 1):
  fX (x) = (1/√(2π)) e^(−x^2/2)
  (figures: standard normal PDF and CDF)
  E[X] = 0,  var(X) = 1
• General normal N (µ, σ^2):
  fX (x) = (1/(σ√(2π))) e^(−(x−µ)^2/2σ^2)
• It turns out that:
  E[X] = µ and Var(X) = σ^2.
• Let Y = aX + b
  – Then: E[Y ] =      Var(Y ) =
  – Fact: Y ∼ N (aµ + b, a^2σ^2)

Calculating normal probabilities
• No closed form available for CDF
  – but there are tables (for standard normal)
• If X ∼ N (µ, σ^2), then (X − µ)/σ ∼ N (0, 1)
• If X ∼ N (2, 16):
  P(X ≤ 3) = P( (X − 2)/4 ≤ (3 − 2)/4 ) = CDF(0.25)

The constellation of concepts
  pX (x)        fX (x)
  FX (x)
  E[X], var(X)
  pX,Y (x, y)   fX,Y (x, y)
  pX|Y (x | y)  fX|Y (x | y)

Standard normal CDF table Φ(z):

       .00   .01   .02   .03   .04   .05   .06   .07   .08   .09
0.0  .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
0.1  .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
0.2  .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
0.3  .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
0.4  .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
0.5  .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
0.6  .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
0.7  .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
0.8  .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
0.9  .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
1.0  .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
1.1  .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
1.2  .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
1.3  .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
1.4  .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
1.5  .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
1.6  .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545
1.7  .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633
1.8  .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706
1.9  .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
2.0  .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1  .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
2.2  .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
2.3  .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
2.4  .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936
2.5  .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952
2.6  .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
2.7  .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
2.8  .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981
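A sketch, not part of the slides, that reproduces table entries with math.erf and evaluates the N(2, 16) example:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF, Phi(z) = P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(Phi(0.25), 4))   # 0.5987, matching the table entry in row 0.2, column .05
# X ~ N(2, 16), i.e., mu = 2, sigma = 4:
print(Phi((3 - 2) / 4))      # P(X <= 3) = Phi(0.25)
```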
LECTURE 9

• Readings: Sections 3.4-3.5

Outline
• PDF review
• Multiple random variables
  – conditioning
  – independence
• Examples

Continuous r.v.’s and pdf’s
  (figure: PDF fX(x), with the event {a < X < b} as the area under fX between a and b)
  P(a ≤ X ≤ b) = ∫ (a to b) fX (x) dx
• P(x ≤ X ≤ x + δ) ≈ fX (x) · δ
• E[g(X)] = ∫ (−∞ to ∞) g(x) fX (x) dx

Summary of concepts
  pX (x)        fX (x)
  FX (x)
  E[X] = Σx x pX (x)        E[X] = ∫ x fX (x) dx
  var(X)
  pX,Y (x, y)   fX,Y (x, y)
  pX|A(x)       fX|A(x)
  pX|Y (x | y)  fX|Y (x | y)

Joint PDF fX,Y (x, y)

  P((X, Y ) ∈ S) = ∫∫ (over S) fX,Y (x, y) dx dy

• Interpretation:
  P(x ≤ X ≤ x + δ, y ≤ Y ≤ y + δ) ≈ fX,Y (x, y) · δ^2
• Expectations:
  E[g(X, Y )] = ∫∫ g(x, y) fX,Y (x, y) dx dy
• From the joint to the marginal:
  fX (x) · δ ≈ P(x ≤ X ≤ x + δ) =
• X and Y are called independent if
  fX,Y (x, y) = fX (x) fY (y),   for all x, y

Buffon’s needle
• Parallel lines at distance d
  Needle of length ℓ (assume ℓ < d)
• Find P(needle intersects one of the lines)
  (figure: needle at angle θ to the lines, midpoint at distance x from the nearest line)
• X ∈ [0, d/2]: distance of needle midpoint to nearest line
• Model: X, Θ uniform, independent
  fX,Θ(x, θ) = 4/(πd),   0 ≤ x ≤ d/2, 0 ≤ θ ≤ π/2
• Intersect if X ≤ (ℓ/2) sin Θ

  P(X ≤ (ℓ/2) sin Θ) = ∫∫ (x ≤ (ℓ/2) sin θ) fX (x) fΘ(θ) dx dθ
                     = (4/(πd)) ∫ (0 to π/2) ∫ (0 to (ℓ/2) sin θ) dx dθ
                     = (4/(πd)) ∫ (0 to π/2) (ℓ/2) sin θ dθ
                     = 2ℓ/(πd)
Conditioning
• Recall
  P(x ≤ X ≤ x + δ) ≈ fX (x) · δ
• By analogy, would like:
  P(x ≤ X ≤ x + δ | Y ≈ y) ≈ fX|Y (x | y) · δ
• This leads us to the definition:
  fX|Y (x | y) = fX,Y (x, y) / fY (y),   if fY (y) > 0
• For given y, conditional PDF is a (normalized) “section” of the joint PDF
  (figure: “Joint, Marginal and Conditional Densities”: a slice through the density
   surface for fixed x; the area of the slice is the height of the marginal density at x,
   and renormalizing slices for fixed x gives conditional densities for Y given X = x.
   Image by MIT OpenCourseWare, adapted from Probability, by J. Pitman, 1999.)
• If independent, fX,Y = fX fY , we obtain
  fX|Y (x | y) = fX (x)

Stick-breaking example
• Break a stick of length ℓ twice:
  break at X: uniform in [0, ℓ];
  break again at Y , uniform in [0, X]
  (figures: fX(x) and fY |X(y | x))

  fX,Y (x, y) = fX (x) fY |X (y | x) = 1/(ℓx)   on the set 0 ≤ y ≤ x ≤ ℓ

  fY (y) = ∫ fX,Y (x, y) dx = ∫ (y to ℓ) 1/(ℓx) dx = (1/ℓ) log(ℓ/y),   0 ≤ y ≤ ℓ

  E[Y ] = ∫ (0 to ℓ) y fY (y) dy = ∫ (0 to ℓ) (y/ℓ) log(ℓ/y) dy = ℓ/4

  E[Y | X = x] = ∫ y fY |X (y | X = x) dy =
LECTURE 10
Continuous Bayes rule; Derived distributions

• Readings: Section 3.6; start Section 4.1

Review
  pX (x)        fX (x)
  pX,Y (x, y)   fX,Y (x, y)
  pX|Y (x | y) = pX,Y (x, y)/pY (y)        fX|Y (x | y) = fX,Y (x, y)/fY (y)
  pX (x) = Σy pX,Y (x, y)                  fX (x) = ∫ (−∞ to ∞) fX,Y (x, y) dy
  FX (x) = P(X ≤ x)
  E[X], var(X)

The Bayes variations
• Discrete X, discrete Y :
  pX|Y (x | y) = pX,Y (x, y)/pY (y) = pX (x) pY |X (y | x) / pY (y)
  pY (y) = Σx pX (x) pY |X (y | x)
  Example:
  – X = 1, 0: airplane present/not present
  – Y = 1, 0: something did/did not register on radar
• Continuous counterpart:
  fX|Y (x | y) = fX,Y (x, y)/fY (y) = fX (x) fY |X (y | x) / fY (y)
  fY (y) = ∫ fX (x) fY |X (y | x) dx
  Example: X: some signal; “prior” fX (x)
  Y : noisy version of X
  fY |X (y | x): model of the noise
• Discrete X, Continuous Y :
  pX|Y (x | y) = pX (x) fY |X (y | x) / fY (y)
  fY (y) = Σx pX (x) fY |X (y | x)
  Example:
  – X: a discrete signal; “prior” pX (x)
  – Y : noisy version of X
  – fY |X (y | x): continuous noise model
• Continuous X, Discrete Y :
  fX|Y (x | y) = fX (x) pY |X (y | x) / pY (y)
  pY (y) = ∫ fX (x) pY |X (y | x) dx
  Example:
  – X: a continuous signal; “prior” fX (x) (e.g., intensity of light beam)
  – Y : discrete r.v. affected by X (e.g., photon count)
  – pY |X (y | x): model of the discrete r.v.

What is a derived distribution
• It is a PMF or PDF of a function of one or more random variables
  with known probability law. E.g.:
  (figure: fX,Y (x, y) = 1 on the unit square)
  – Obtaining the PDF for g(X, Y ) = Y /X involves deriving a distribution.
    Note: g(X, Y ) is a random variable

When not to find them
• Don’t need PDF for g(X, Y ) if only want to compute expected value:
  E[g(X, Y )] = ∫∫ g(x, y) fX,Y (x, y) dx dy
How to find them
• Discrete case
  – Obtain probability mass for each possible value of Y = g(X):
    pY (y) = P(g(X) = y) = Σ (x: g(x)=y) pX (x)
  (figure: mapping from values x to values y = g(x))

The continuous case
• Two-step procedure:
  – Get CDF of Y :  FY (y) = P(Y ≤ y)
  – Differentiate to get
    fY (y) = dFY/dy (y)

Example
• X: uniform on [0, 2]
• Find PDF of Y = X^3
• Solution:
  FY (y) = P(Y ≤ y) = P(X^3 ≤ y) = P(X ≤ y^(1/3)) = (1/2) y^(1/3)
  fY (y) = dFY/dy (y) = 1/(6 y^(2/3))

Example
• Joan is driving from Boston to New York.
  Her speed is uniformly distributed between 30 and 60 mph.
  What is the distribution of the duration of the trip?
• Let T (V ) = 200/V .
• Find fT (t)
  (figure: fV (v) = 1/30 for 30 ≤ v ≤ 60)

The pdf of Y = aX + b

  fY (y) = (1/|a|) fX ((y − b)/a)

  (figure: for Y = 2X + 5, a PDF fX supported on [−1, 2] maps to f2X on [−2, 4]
   and then to f2X+5 on [3, 9])

• Use this to check that if X is normal,
  then Y = aX + b is also normal.
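A sketch, not part of the slides, that checks the Y = X³ derivation numerically and the linear rule fY(y) = (1/|a|) fX((y−b)/a) by simulation (the window width dy is an arbitrary small number):

```python
import random

random.seed(0)
trials = 1_000_000

# Y = X^3 with X uniform on [0, 2]; derived PDF is f_Y(y) = 1 / (6 y^(2/3)).
samples = [random.uniform(0, 2) ** 3 for _ in range(trials)]
y0, dy = 1.0, 0.01
empirical = sum(y0 <= y <= y0 + dy for y in samples) / (trials * dy)
print(empirical, 1 / (6 * y0 ** (2 / 3)))   # both near 1/6

# Y = 2X + 5 with X uniform on [0, 2]: f_Y(y) = (1/2) f_X((y-5)/2) = 1/4 on [5, 9].
samples = [2 * random.uniform(0, 2) + 5 for _ in range(trials)]
y0 = 6.0
empirical = sum(y0 <= y <= y0 + dy for y in samples) / (trials * dy)
print(empirical, 0.25)
```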

LECTURE 11
Derived distributions; convolution; covariance and correlation

• Readings: Finish Section 4.1; Section 4.2

A general formula
• Let Y = g(X), with g strictly monotonic.
  (figure: y = g(x); the interval [x, x + δ] maps to an interval of length δ|(dg/dx)(x)| around y)
• Event x ≤ X ≤ x + δ is the same as
  g(x) ≤ Y ≤ g(x + δ)
  or (approximately)
  g(x) ≤ Y ≤ g(x) + δ|(dg/dx)(x)|
• Hence,
  δ fX (x) = δ fY (y) |dg/dx (x)|,   where y = g(x)

Example
• fX,Y (x, y) = 1 on the unit square
  Find the PDF of Z = g(X, Y ) = Y /X
  FZ (z) =   (for z ≤ 1)
  FZ (z) =   (for z ≥ 1)

The distribution of X + Y : the discrete case
• W = X + Y ; X, Y independent
  (figure: the line x + y = w through the points (0,3), (1,2), (2,1), (3,0))
  pW (w) = P(X + Y = w)
         = Σx P(X = x)P(Y = w − x)
         = Σx pX (x) pY (w − x)
• Mechanics:
  – Put the pmf’s on top of each other
  – Flip the pmf of Y
  – Shift the flipped pmf by w (to the right if w > 0)
  – Cross-multiply and add

The continuous case
• W = X + Y ; X, Y independent
  (figure: the line x + y = w in the (x, y) plane)
• fW |X (w | x) = fY (w − x)
• fW,X (w, x) = fX (x) fW |X (w | x) = fX (x) fY (w − x)
• fW (w) = ∫ (−∞ to ∞) fX (x) fY (w − x) dx
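A sketch, not part of the slides, of the convolution "mechanics" for two discrete PMFs (the example PMFs are an illustrative choice):

```python
from collections import defaultdict

def convolve(pmf_x, pmf_y):
    """PMF of W = X + Y for independent discrete X, Y."""
    pmf_w = defaultdict(float)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            pmf_w[x + y] += px * py    # cross-multiply and add
    return dict(pmf_w)

# Example: two fair four-sided dice.
die = {k: 0.25 for k in range(1, 5)}
print(convolve(die, die))   # triangular PMF on 2..8
```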

Two independent normal r.v.’s
• X ∼ N (µx, σx^2), Y ∼ N (µy , σy^2), independent
  fX,Y (x, y) = fX (x) fY (y)
             = (1/(2πσxσy)) exp( −(x − µx)^2/2σx^2 − (y − µy )^2/2σy^2 )
• PDF is constant on the ellipse where
  (x − µx)^2/2σx^2 + (y − µy )^2/2σy^2
  is constant
• Ellipse is a circle when σx = σy

The sum of independent normal r.v.’s
• X ∼ N (0, σx^2), Y ∼ N (0, σy^2), independent
• Let W = X + Y
  fW (w) = ∫ (−∞ to ∞) fX (x) fY (w − x) dx
         = (1/(2πσxσy)) ∫ (−∞ to ∞) e^(−x^2/2σx^2) e^(−(w−x)^2/2σy^2) dx
  (algebra)  = c e^(−γw^2)
• Conclusion: W is normal
  – mean = 0, variance = σx^2 + σy^2
  – same argument for nonzero mean case

Covariance
• cov(X, Y ) = E[ (X − E[X]) · (Y − E[Y ]) ]
• Zero-mean case: cov(X, Y ) = E[XY ]
  (figures: scatter plots of (x, y) samples with positive and negative covariance)
• cov(X, Y ) = E[XY ] − E[X]E[Y ]
• var( Σ (i=1 to n) Xi ) = Σ (i=1 to n) var(Xi) + Σ ((i,j): i≠j) cov(Xi, Xj )
• independent ⇒ cov(X, Y ) = 0   (converse is not true)

Correlation coefficient
• Dimensionless version of covariance:
  ρ = E[ ((X − E[X])/σX) · ((Y − E[Y ])/σY) ] = cov(X, Y ) / (σX σY )
• −1 ≤ ρ ≤ 1
• |ρ| = 1 ⇔ (X − E[X]) = c(Y − E[Y ])   (linearly related)
• Independent ⇒ ρ = 0   (converse is not true)

LECTURE 12

• Readings: Section 4.3; parts of Section 4.5
  (mean and variance only; no transforms)

Lecture outline
• Conditional expectation
  – Law of iterated expectations
  – Law of total variance
• Sum of a random number of independent r.v.’s
  – mean, variance

Conditional expectations
• Given the value y of a r.v. Y :
  E[X | Y = y] = Σx x pX|Y (x | y)
  (integral in continuous case)
• Stick example: stick of length ℓ;
  break at uniformly chosen point Y ;
  break again at uniformly chosen point X
• E[X | Y = y] = y/2   (number)
• E[X | Y ] = Y /2   (r.v.)
• Law of iterated expectations:
  E[E[X | Y ]] = Σy E[X | Y = y] pY (y) = E[X]
• In stick example:
  E[X] = E[E[X | Y ]] = E[Y /2] = ℓ/4

var(X | Y ) and its expectation
• var(X | Y = y) = E[ (X − E[X | Y = y])^2 | Y = y ]
• var(X | Y ): a r.v. with value var(X | Y = y) when Y = y
• Law of total variance:
  var(X) = E[var(X | Y )] + var(E[X | Y ])
• Proof:
  (a) Recall: var(X) = E[X^2] − (E[X])^2
  (b) var(X | Y ) = E[X^2 | Y ] − (E[X | Y ])^2
  (c) E[var(X | Y )] = E[X^2] − E[ (E[X | Y ])^2 ]
  (d) var(E[X | Y ]) = E[ (E[X | Y ])^2 ] − (E[X])^2
  Sum of right-hand sides of (c), (d):
  E[X^2] − (E[X])^2 = var(X)

Section means and variances
• Two sections: y = 1 (10 students); y = 2 (20 students)
  y = 1:  (1/10) Σ (i=1 to 10) xi = 90        y = 2:  (1/20) Σ (i=11 to 30) xi = 60
  E[X] = (1/30) Σ (i=1 to 30) xi = (90 · 10 + 60 · 20)/30 = 70
  E[X | Y = 1] = 90,   E[X | Y = 2] = 60
  E[X | Y ] = 90 w.p. 1/3;  60 w.p. 2/3
  E[E[X | Y ]] = (1/3) · 90 + (2/3) · 60 = 70 = E[X]
  var(E[X | Y ]) = (1/3)(90 − 70)^2 + (2/3)(60 − 70)^2 = 600/3 = 200

Section means and variances (ctd.)
  (1/10) Σ (i=1 to 10) (xi − 90)^2 = 10        (1/20) Σ (i=11 to 30) (xi − 60)^2 = 20
  var(X | Y = 1) = 10,   var(X | Y = 2) = 20
  var(X | Y ) = 10 w.p. 1/3;  20 w.p. 2/3
  E[var(X | Y )] = (1/3) · 10 + (2/3) · 20 = 50/3

  var(X) = E[var(X | Y )] + var(E[X | Y ]) = 50/3 + 200
         = (average variability within sections)
         + (variability between sections)

Example
  var(X) = E[var(X | Y )] + var(E[X | Y ])
  (figure: fX(x) equal to 2/3 on the region labeled Y = 1 and 1/3 on the region labeled Y = 2)
  E[X | Y = 1] =        E[X | Y = 2] =
  var(X | Y = 1) =      var(X | Y = 2) =
  E[X] =
  var(E[X | Y ]) =

Sum of a random number of independent r.v.’s
• N : number of stores visited
  (N is a nonnegative integer r.v.)
• Xi: money spent in store i
  – Xi assumed i.i.d.
  – independent of N
• Let Y = X1 + · · · + XN
  E[Y | N = n] = E[X1 + X2 + · · · + Xn | N = n]
               = E[X1 + X2 + · · · + Xn]
               = E[X1] + E[X2] + · · · + E[Xn]
               = n E[X]
• E[Y | N ] = N E[X]
  E[Y ] = E[E[Y | N ]] = E[N E[X]] = E[N ] E[X]

Variance of a sum of a random number of independent r.v.’s
• var(Y ) = E[var(Y | N )] + var(E[Y | N ])
• E[Y | N ] = N E[X]
  var(E[Y | N ]) = (E[X])^2 var(N )
• var(Y | N = n) = n var(X)
  var(Y | N ) = N var(X)
  E[var(Y | N )] = E[N ] var(X)

  var(Y ) = E[var(Y | N )] + var(E[Y | N ]) = E[N ] var(X) + (E[X])^2 var(N )
LECTURE 13
The Bernoulli process

• Readings: Section 6.1

Lecture outline
• Definition of Bernoulli process
• Random processes
• Basic properties of Bernoulli process
• Distribution of interarrival times
• The time of the kth success
• Merging and splitting

The Bernoulli process
• A sequence of independent Bernoulli trials
• At each trial, i:
  – P(success) = P(Xi = 1) = p
  – P(failure) = P(Xi = 0) = 1 − p
• Examples:
  – Sequence of lottery wins/losses
  – Sequence of ups and downs of the Dow Jones
  – Arrivals (each second) to a bank
  – Arrivals (at each time slot) to server

Random processes
• First view: sequence of random variables X1, X2, . . .
  – E[Xt] =
  – Var(Xt) =
• Second view: what is the right sample space?
  – P(Xt = 1 for all t) =
• Random processes we will study:
  – Bernoulli process (memoryless, discrete time)
  – Poisson process (memoryless, continuous time)
  – Markov chains (with memory/dependence across time)

Number of successes S in n time slots
• P(S = k) =
• E[S] =
• Var(S) =

Interarrival times
• T1: number of trials until first success
  – P(T1 = t) =
  – Memoryless property
  – E[T1] =
  – Var(T1) =
• If you buy a lottery ticket every day, what is the distribution
  of the length of the first string of losing days?

Time of the kth arrival
• Given that first arrival was at time t (i.e., T1 = t):
  additional time, T2, until next arrival
  – has the same (geometric) distribution
  – independent of T1
• Yk : number of trials to kth success
  – E[Yk ] =
  – Var(Yk ) =
  – P(Yk = t) =
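A sketch, not part of the slides, simulating a Bernoulli process and checking the interarrival-time facts empirically (T1 geometric with mean 1/p, Yk with mean k/p); the values of p and k are illustrative:

```python
import random

random.seed(0)
p, k, trials = 0.25, 3, 100_000

t1_samples, yk_samples = [], []
for _ in range(trials):
    t, successes, t1 = 0, 0, None
    while successes < k:
        t += 1
        if random.random() < p:        # a success in this slot
            successes += 1
            if t1 is None:
                t1 = t
    t1_samples.append(t1)
    yk_samples.append(t)

print(sum(t1_samples) / trials, 1 / p)   # E[T1] = 1/p = 4
print(sum(yk_samples) / trials, k / p)   # E[Yk] = k/p = 12
```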

Splitting and Merging of Bernoulli Processes

Starting with a Bernoulli process in which there is a probability p of an arrival
at each time, consider splitting it as follows. Whenever there is an arrival, we
choose to either keep it (with probability q), or to discard it (with probability
1 − q); see Fig. 6.3. Assume that the decisions to keep or discard are independent
for different arrivals. If we focus on the process of arrivals that are kept, we see
that it is a Bernoulli process: in each time slot, there is a probability pq of a
kept arrival, independent of what happens in other slots. For the same reason,
the process of discarded arrivals is also a Bernoulli process, with a probability
of a discarded arrival at each time slot equal to p(1 − q).

(Figure 6.3: Splitting of a Bernoulli(p) process, using independent coin flips
 with keep probability q and discard probability 1 − q, yields Bernoulli processes.)

In a reverse situation, we start with two independent Bernoulli processes
(with parameters p and q, respectively) and merge them into a single process,
as follows. An arrival is recorded in the merged process if and only if there
is an arrival in at least one of the two original processes. This happens with
probability p + q − pq [one minus the probability (1 − p)(1 − q) of no arrival in
either process]. Since different time slots in either of the original processes are
independent, different slots in the merged process are also independent. Thus,
the merged process is Bernoulli, with success probability p + q − pq at each time
step; see Fig. 6.4.

(Figure 6.4: Merging of independent Bernoulli(p) and Bernoulli(q) processes
 yields a Bernoulli(p + q − pq) process; collisions are counted as one arrival.)

Splitting and merging of Bernoulli (or other) arrival processes arises in
many contexts. For example, a two-machine work center may see a stream of
arriving parts to be processed and split them by sending each part to a randomly
chosen machine. Conversely, a machine may be faced with arrivals of different
types that can be merged into a single arrival stream.

The Poisson Approximation to the Binomial

The number of successes in n independent Bernoulli trials is a binomial random
variable with parameters n and p, and its mean is np. In this subsection, we
concentrate on the special case where n is large but p is small, so that the mean
np has a moderate value. A situation of this type arises when one passes from
discrete to continuous time, a theme to be picked up in the next section. For
some examples, think of the number of airplane accidents on any given day:
there is a large number n of trials (airplane flights), but each one has a very
small probability p of being involved in an accident. Or think of counting the
number of typos in a book: there is a large number of words, but a very small
probability of misspelling any single one.

Mathematically, we can address situations of this kind, by letting n grow
while simultaneously decreasing p, in a manner that keeps the product np at a
constant value λ. In the limit, it turns out that the formula for the binomial PMF
simplifies to the Poisson PMF. A precise statement is provided next, together
with a reminder of some of the properties of the Poisson PMF that were derived
in Chapter 2.

Poisson Approximation to the Binomial

• A Poisson random variable Z with parameter λ takes nonnegative
  integer values and is described by the PMF
  pZ (k) = e^(−λ) λ^k / k!,   k = 0, 1, 2, . . . .
  Its mean and variance are given by
  E[Z] = λ,   var(Z) = λ.
LECTURE 19
Limit theorems – I

• Readings: Sections 5.1-5.3; start Section 5.4

• X1, . . . , Xn i.i.d.
  Mn = (X1 + · · · + Xn)/n
  What happens as n → ∞?
• Why bother?
• A tool: Chebyshev’s inequality
• Convergence “in probability”
• Convergence of Mn (weak law of large numbers)

Chebyshev’s inequality
• Random variable X (with finite mean µ and variance σ^2)
  σ^2 = ∫ (x − µ)^2 fX (x) dx
      ≥ ∫ (over |x − µ| ≥ c) (x − µ)^2 fX (x) dx
      ≥ c^2 · P(|X − µ| ≥ c)

  P(|X − µ| ≥ c) ≤ σ^2 / c^2

  P(|X − µ| ≥ kσ) ≤ 1 / k^2

Deterministic limits
• Sequence an, number a
• an converges to a:  lim (n→∞) an = a
  “an eventually gets and stays (arbitrarily) close to a”
• For every ε > 0, there exists n0, such that for every n ≥ n0,
  we have |an − a| ≤ ε.

Convergence “in probability”
• Sequence of random variables Yn
• converges in probability to a number a:
  “(almost all) of the PMF/PDF of Yn eventually gets concentrated
  (arbitrarily) close to a”
• For every ε > 0,
  lim (n→∞) P(|Yn − a| ≥ ε) = 0
  (figure: PMF of Yn with mass 1 − 1/n at 0 and mass 1/n at n)
  Does Yn converge?
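A sketch, not part of the slides, comparing the Chebyshev bound with the exact tail probability for an exponential random variable (an illustrative choice of distribution):

```python
import math

# X ~ exponential(1): mu = 1, sigma^2 = 1.
mu, sigma2 = 1.0, 1.0

for k in (2, 3, 5):
    c = k * math.sqrt(sigma2)
    exact = math.exp(-(mu + c))          # P(X - mu >= c); the left tail is empty since c > mu
    chebyshev = sigma2 / c**2            # P(|X - mu| >= c) <= sigma^2 / c^2
    print(k, exact, chebyshev)           # the bound holds but is loose
```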

Convergence of the sample mean (Weak law of large numbers)
• X1, X2, . . . i.i.d., finite mean µ and variance σ^2
  Mn = (X1 + · · · + Xn)/n
• E[Mn] =
• Var(Mn) =
• Use Chebyshev’s inequality:
  P(|Mn − µ| ≥ ε) ≤ Var(Mn)/ε^2 = σ^2/(n ε^2)
• Mn converges in probability to µ

The pollster’s problem
• f : fraction of population that “. . . ”
• ith (randomly selected) person polled:
  Xi = 1, if yes;  0, if no.
• Mn = (X1 + · · · + Xn)/n = fraction of “yes” in our sample
• Goal: 95% confidence of ≤ 1% error
  P(|Mn − f | ≥ .01) ≤ .05
• Use Chebyshev’s inequality:
  P(|Mn − f | ≥ .01) ≤ σMn^2/(0.01)^2 = σx^2/(n (0.01)^2) ≤ 1/(4n (0.01)^2)
• If n = 50,000, then P(|Mn − f | ≥ .01) ≤ .05   (conservative)

Different scalings of Mn
• X1, . . . , Xn i.i.d., finite variance σ^2
• Look at three variants of their sum:
  – Sn = X1 + · · · + Xn          variance n σ^2
  – Mn = Sn/n                     variance σ^2/n
    converges “in probability” to E[X] (WLLN)
  – Sn/√n                         constant variance σ^2
    Asymptotic shape?

The central limit theorem
• “Standardized” Sn = X1 + · · · + Xn:
  Zn = (Sn − E[Sn])/σSn = (Sn − nE[X])/(√n σ)
  – zero mean
  – unit variance
• Let Z be a standard normal r.v. (zero mean, unit variance)
• Theorem: For every c:
  P(Zn ≤ c) → P(Z ≤ c)
• P(Z ≤ c) is the standard normal CDF, Φ(c), available from the normal tables
LECTURE 20
THE CENTRAL LIMIT THEOREM

• Readings: Section 5.4

• X1, . . . , Xn i.i.d., finite variance σ^2
• “Standardized” Sn = X1 + · · · + Xn:
  Zn = (Sn − E[Sn])/σSn = (Sn − nE[X])/(√n σ)
  – E[Zn] = 0, var(Zn) = 1
• Let Z be a standard normal r.v. (zero mean, unit variance)
• Theorem: For every c:
  P(Zn ≤ c) → P(Z ≤ c)
• P(Z ≤ c) is the standard normal CDF, Φ(c), available from the normal tables

Usefulness
• universal; only means, variances matter
• accurate computational shortcut
• justification of normal models

What exactly does it say?
• CDF of Zn converges to normal CDF
  – not a statement about convergence of PDFs or PMFs

Normal approximation
• Treat Zn as if normal
  – also treat Sn as if normal

Can we use it when n is “moderate”?
• Yes, but no nice theorems to this effect
• Symmetry helps a lot
  (figure: PMFs of Sn for n = 2, 4, 8, 16, 32, looking increasingly bell-shaped,
   and the CDF for n = 32)

The pollster’s problem using the CLT
• f : fraction of population that “. . . ”
• ith (randomly selected) person polled:
  Xi = 1, if yes;  0, if no.
• Mn = (X1 + · · · + Xn)/n
• Suppose we want:
  P(|Mn − f | ≥ .01) ≤ .05
• Event of interest: |Mn − f | ≥ .01
  |X1 + · · · + Xn − nf | / n ≥ .01
  ⟺ |X1 + · · · + Xn − nf | / (√n σ) ≥ .01 √n / σ

  P(|Mn − f | ≥ .01) ≈ P(|Z| ≥ .01 √n / σ)
                     ≤ P(|Z| ≥ .02 √n)
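A sketch, not part of the slides, using the normal approximation to find the sample size n at which P(|Z| ≥ .02√n) drops to .05, for comparison with the Chebyshev-based n = 50,000:

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def tail(n):
    """P(|Z| >= .02 sqrt(n)) under the normal approximation (worst case sigma = 1/2)."""
    return 2 * (1 - Phi(0.02 * sqrt(n)))

n = 1
while tail(n) > 0.05:
    n += 1
print(n, tail(n))   # roughly n = 9,604, far below the Chebyshev bound of 50,000
```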

Apply to binomial
• Fix p, where 0 < p < 1
• Xi: Bernoulli(p)
• Sn = X1 + · · · + Xn: Binomial(n, p)
  – mean np, variance np(1 − p)
• CDF of (Sn − np)/√(np(1 − p)) −→ standard normal

Example
• n = 36, p = 0.5; find P(Sn ≤ 21)
• Exact answer:
  Σ (k=0 to 21) (36 choose k) (1/2)^36 = 0.8785

The 1/2 correction for binomial approximation
• P(Sn ≤ 21) = P(Sn < 22), because Sn is integer
• Compromise: consider P(Sn ≤ 21.5)
  (figure: PMF bars at 18, 19, 20, 21, 22, with the cut between 21 and 22)

De Moivre–Laplace CLT (for binomial)
• When the 1/2 correction is used, CLT can also approximate the binomial p.m.f.
  (not just the binomial CDF)
  P(Sn = 19) = P(18.5 ≤ Sn ≤ 19.5)
  18.5 ≤ Sn ≤ 19.5  ⟺  (18.5 − 18)/3 ≤ (Sn − 18)/3 ≤ (19.5 − 18)/3
                    ⟺  0.17 ≤ Zn ≤ 0.5
  P(Sn = 19) ≈ P(0.17 ≤ Z ≤ 0.5)
             = P(Z ≤ 0.5) − P(Z ≤ 0.17)
             = 0.6915 − 0.5675
             = 0.124
• Exact answer:
  (36 choose 19) (1/2)^36 = 0.1251

Poisson vs. normal approximations of the binomial
• Poisson arrivals during unit interval equals:
  sum of n (independent) Poisson arrivals during n intervals of length 1/n
  – Let n → ∞, apply CLT (??)
  – Poisson = normal (????)
• Binomial(n, p)
  – p fixed, n → ∞: normal
  – np fixed, n → ∞, p → 0: Poisson
• p = 1/100, n = 100: Poisson
• p = 1/10, n = 500: normal
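A sketch, not part of the slides, reproducing the n = 36, p = 0.5 numbers and comparing the normal approximation with the 1/2 correction against the exact binomial values:

```python
from math import comb, erf, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 36, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 18 and 3

exact_cdf_21 = sum(comb(n, k) for k in range(22)) / 2**n
approx_cdf_21 = Phi((21.5 - mu) / sigma)           # with the 1/2 correction
print(exact_cdf_21, approx_cdf_21)                 # 0.8785 vs ~0.878

exact_pmf_19 = comb(n, 19) / 2**n
approx_pmf_19 = Phi((19.5 - mu) / sigma) - Phi((18.5 - mu) / sigma)
print(exact_pmf_19, approx_pmf_19)                 # 0.1251 vs ~0.124
```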
