LECTURE 1 Probability models and axioms
• Probabilistic models
– sample space
– probability law
• Axioms of probability
• Simple examples
Sample space: Discrete example — two rolls of a 4-sided die [figure: 4 × 4 grid of outcomes, X = first roll]
Sample space: Continuous example — a point in the unit square [figure]
Discrete uniform law
• Let all outcomes be equally likely
• Then,
  P(A) = (number of elements of A) / (total number of sample points)
• Computing probabilities ≡ counting
• Defines fair coins, fair dice, well-shuffled decks

Continuous uniform law
• Two “random” numbers in [0, 1] [figure: the unit square]
• Uniform law: Probability = Area
  – P(X + Y ≤ 1/2) = ?
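For illustration, a small Python sketch of the continuous uniform law, assuming X and Y are independent and uniform on [0, 1]: it estimates P(X + Y ≤ 1/2) by simulation and compares it with the area of the triangle below the line x + y = 1/2, which is 1/8.

```python
import random

def estimate(num_samples=200_000):
    """Monte Carlo estimate of P(X + Y <= 1/2) for X, Y uniform on [0, 1]."""
    hits = sum(1 for _ in range(num_samples)
               if random.random() + random.random() <= 0.5)
    return hits / num_samples

print("simulation:", estimate())      # close to 0.125
print("exact area:", 0.5 * 0.5 / 2)   # triangle with legs 1/2 -> 1/8
```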
• Turn in recitation/tutorial scheduling form now
P({2, 4, 6, . . .}) = P(2) + P(4) + · · · = 1/2² + 1/2⁴ + 1/2⁶ + · · · = 1/3
LECTURE 2 Review of probability models
• Problem solving:
– Specify sample space
– Define probability law
– Identify event of interest
– Calculate...
[Figure: 4 × 4 grid of outcomes for two rolls of a 4-sided die (X = first roll, Y = second roll), with events A and B marked]

• P(A | B) = probability of A, given that B occurred
  – B is our new universe
• P(M = 2 | B) =
Models based on conditional probabilities

Multiplication rule:
  P(A ∩ B ∩ C) = P(A) · P(B | A) · P(C | A ∩ B)

[Figure: tree diagram with branches A / Aᶜ, then B / Bᶜ, then C / Cᶜ; branch probabilities P(A), P(B | A), P(Bᶜ | A), P(C | A ∩ B), . . ., leading to leaves A ∩ B ∩ C, A ∩ Bᶜ ∩ C, A ∩ Bᶜ ∩ Cᶜ, . . .]

Radar example:
• Event A: Airplane is flying above
• Event B: Something registers on radar screen
• P(A) = 0.05, P(Aᶜ) = 0.95
• P(B | A) = 0.99, P(Bᶜ | A) = 0.01
• P(B | Aᶜ) = 0.10

P(A ∩ B) =
P(B) =
P(A | B) =
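As an illustration, a short Python sketch that fills in the three blanks of the radar example using the multiplication rule, the total probability theorem, and Bayes' rule:

```python
# Radar example: A = plane flying above, B = radar registers something.
p_A = 0.05           # P(A)
p_B_given_A = 0.99   # P(B | A)
p_B_given_Ac = 0.10  # P(B | A^c)

p_A_and_B = p_A * p_B_given_A                  # multiplication rule
p_B = p_A_and_B + (1 - p_A) * p_B_given_Ac     # total probability theorem
p_A_given_B = p_A_and_B / p_B                  # Bayes' rule

print(f"P(A and B) = {p_A_and_B:.4f}")    # 0.0495
print(f"P(B)       = {p_B:.4f}")          # 0.1445
print(f"P(A | B)   = {p_A_given_B:.4f}")  # ~0.3426
```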
LECTURE 3 Models based on conditional probabilities

• Readings: Section 1.5
• Review
• Independence of two events

• P(A | B) = P(A ∩ B) / P(B), assuming P(B) > 0
• Multiplication rule:
  P(A ∩ B) = P(B) · P(A | B) = P(A) · P(B | A)

• 3 tosses of a biased coin: P(H) = p, P(T) = 1 − p
[Figure: tree diagram for the three tosses, with branch probabilities p and 1 − p and leaves HHH, HTT, THH, THT, TTH, TTT, . . .]
Conditioning may affect independence
[Figure: the four equally likely outcomes HH, HT, TH, TT of two coin tosses]

Independence of a collection of events
LECTURE 4

Lecture outline
• Principles of counting
• Many examples
  – permutations
  – k-permutations
  – combinations
  – partitions
• Binomial probabilities

Discrete uniform law
• Then,
  P(A) = (number of elements of A) / (total number of sample points) = |A| / |Ω|
• Just count. . .
• Number of elements in the sample space:
• . . . if repetition is prohibited =
Combinations
• (n choose k): number of k-element subsets of a given n-element set
• Two ways of constructing an ordered sequence of k distinct items:
  – Choose the k items one at a time:
    n(n − 1) · · · (n − k + 1) = n! / (n − k)!  choices
  – Choose k items, then order them (k! possible orders)
• Hence:
  (n choose k) · k! = n! / (n − k)!
  (n choose k) = n! / (k! (n − k)!)
• Σ_{k=0}^{n} (n choose k) =

Binomial probabilities
• n independent coin tosses
  – P(H) = p
• P(HTTHHH) =
• P(sequence) = p^(# heads) (1 − p)^(# tails)
• P(k heads) = Σ over k-head sequences of P(seq.)
             = (# of k-head seqs.) · p^k (1 − p)^(n−k)
             = (n choose k) p^k (1 − p)^(n−k)
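For illustration, a minimal Python sketch of the binomial probability formula (the parameters n = 6 and p = 0.3 are arbitrary): it evaluates (n choose k) p^k (1 − p)^(n−k), checks that the PMF sums to 1, and checks that the binomial coefficients sum to 2^n.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k heads in n independent tosses with P(H) = p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 6, 0.3
print(binomial_pmf(2, n, p))                              # P(2 heads)
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))   # sums to 1
print(sum(comb(n, k) for k in range(n + 1)), 2**n)        # both equal 2**n
```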
LECTURE 5 Random variables

• Notation:
  – random variable X
  – numerical value x
• F = First roll
• Geometric PMF
  pX(2) =
Binomial PMF
• X: number of heads in n independent coin tosses
• P(H) = p
• Let n = 4:
  pX(2) = P(HHTT) + P(HTHT) + P(HTTH) + P(THHT) + P(THTH) + P(TTHH)
• In general:
  pX(k) = (n choose k) p^k (1 − p)^(n−k),  k = 0, 1, . . . , n

Expectation
• Definition: E[X] = Σ_x x pX(x)
• Interpretations:
  – Center of gravity of PMF
  – Average in large number of repetitions of the experiment
    (to be substantiated later in this course)
• Example: uniform PMF with value 1/(n + 1) at each of x = 0, 1, . . . , n
  E[X] = 0 · 1/(n+1) + 1 · 1/(n+1) + · · · + n · 1/(n+1) =

Variance
  var(X) = Σ_x (x − E[X])² pX(x)
• E[α] =
• E[αX] =
• Properties: var(X) ≥ 0
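For illustration, a small Python sketch of E[X] and var(X) as weighted sums over a PMF (given as a dictionary), applied to the uniform PMF on {0, 1, . . . , n} from the slide; the mean comes out to n/2 and the variance to n(n + 2)/12.

```python
def mean(pmf):
    """E[X] = sum_x x * p_X(x), with pmf given as {value: probability}."""
    return sum(x * px for x, px in pmf.items())

def variance(pmf):
    """var(X) = sum_x (x - E[X])**2 * p_X(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * px for x, px in pmf.items())

n = 4
uniform = {k: 1 / (n + 1) for k in range(n + 1)}  # uniform PMF on 0, 1, ..., n
print(mean(uniform))       # n/2 = 2.0
print(variance(uniform))   # n(n+2)/12 = 2.0
```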
LECTURE 6 Review

• Readings: Sections 2.4-2.6
• Random variable X: function from sample space to the real numbers

  var(X) = E[(X − E[X])²]
         = Σ_x (x − E[X])² pX(x)
         = E[X²] − (E[X])²

  Standard deviation: σX = √var(X)

• Traverse a 200 mile distance at constant but random speed V
  pV(v): probability 1/2 at v = 1 and probability 1/2 at v = 200
• σV =
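As an illustration, a Python sketch for the random-speed example: it computes E[V], σV, and the expected travel time E[200/V], and shows that this differs from 200/E[V] (an instance of E[g(X)] ≠ g(E[X])).

```python
# Speed V is 1 mph or 200 mph, each with probability 1/2 (as in the slide).
pmf_V = {1: 0.5, 200: 0.5}

E_V = sum(v * p for v, p in pmf_V.items())
var_V = sum((v - E_V) ** 2 * p for v, p in pmf_V.items())
sigma_V = var_V ** 0.5

E_T = sum((200 / v) * p for v, p in pmf_V.items())  # expected travel time

print(E_V)        # 100.5
print(sigma_V)    # 99.5
print(E_T)        # 100.5 hours
print(200 / E_V)  # ~1.99 hours: E[200/V] != 200/E[V]
```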
Conditional PMF and expectation
• pX|A(x) = P(X = x | A)
• E[X | A] = Σ_x x pX|A(x)

[Figure: a PMF pX(x) on x = 1, 2, 3, 4]

• Let A = {X ≥ 2}
  pX|A(x) =
  E[X | A] =

Geometric PMF
• X: number of independent coin tosses until first head
  pX(k) = (1 − p)^(k−1) p,  k = 1, 2, . . .
  E[X] = Σ_{k=1}^{∞} k pX(k) = Σ_{k=1}^{∞} k (1 − p)^(k−1) p

[Figures: the geometric PMF pX(k), with values p, p(1 − p), p(1 − p)², . . ., and the conditional PMF pX−2|X>2(k)]
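For illustration, a Python sketch of the geometric PMF (with an arbitrary p = 0.3): it checks numerically (by truncating the infinite sums) that E[X] ≈ 1/p, and that the conditional PMF of X − 2 given X > 2 matches the original geometric PMF, which is the memorylessness property the figure above illustrates.

```python
p = 0.3

def geom(k):
    """p_X(k) = (1-p)**(k-1) * p, for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

# E[X] = sum k p_X(k): truncate the series far enough out.
E_X = sum(k * geom(k) for k in range(1, 200))
print(E_X, 1 / p)   # both ~3.333

# Memorylessness: P(X - 2 = k | X > 2) equals p_X(k).
P_X_gt_2 = sum(geom(k) for k in range(3, 200))
for k in range(1, 5):
    cond = geom(k + 2) / P_X_gt_2
    print(k, cond, geom(k))   # the two columns match
```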
LECTURE 7 Review

• A hat problem

• E[X] = Σ_x x pX(x)
• pX,Y,Z(x, y, z) = pX(x) pY|X(y | x) pZ|X,Y(z | x, y)
• Random variables X, Y, Z are independent if:
  pX,Y,Z(x, y, z) = pX(x) · pY(y) · pZ(z), for all x, y, z
• E[g(X, Y)] = Σ_x Σ_y g(x, y) pX,Y(x, y)
• In general: E[g(X, Y)] ≠ g(E[X], E[Y])
• E[αX + β] = αE[X] + β
• E[X + Y + Z] = E[X] + E[Y] + E[Z]

[Figure: joint PMF table of X and Y, with entries such as 1/20, 2/20, 4/20]
• What if we condition on X ≤ 2 and Y ≥ 3?
Variances
• Examples:
  – If X = Y, var(X + Y) =
  – If X = −Y, var(X + Y) =
  – If X, Y indep., and Z = X − 3Y, var(Z) =

Binomial mean and variance
• E[Xi] =
• E[X] =
• var(Xi) =
• var(X) =

The hat problem
• n people throw their hats in a box and then pick one at random.
  – X: number of people who get their own hat
  – Find E[X]
• Xi = 1, if i selects own hat; 0, otherwise.
• X = X1 + X2 + · · · + Xn
• P(Xi = 1) =
• E[Xi] =
• E[X] =

• var(X) = E[X²] − (E[X])² = E[X²] − 1
  X² = Σ_i Xi² + Σ_{i,j: i≠j} Xi Xj
• E[Xi²] =
• P(X1X2 = 1) = P(X1 = 1) · P(X2 = 1 | X1 = 1) =
• var(X) =
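For illustration, a small Monte Carlo sketch of the hat problem: it simulates the experiment many times and reports the empirical mean and variance of X, both of which come out close to 1 (for any n ≥ 2).

```python
import random

def hat_experiment(n):
    """n people pick hats uniformly at random; return how many get their own."""
    perm = list(range(n))
    random.shuffle(perm)
    return sum(1 for person, hat in enumerate(perm) if person == hat)

n, trials = 10, 100_000
samples = [hat_experiment(n) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, var)   # both close to 1
```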
LECTURE 8 Continuous r.v.’s and pdf’s

Lecture outline
• Probability density functions

[Figure: sample space mapped to a PDF fX(x), with the event {a ≤ X ≤ b} shaded]

  P(a ≤ X ≤ b) = ∫_a^b fX(x) dx
  ∫_{−∞}^{∞} fX(x) dx = 1
  P(x ≤ X ≤ x + δ) = ∫_x^{x+δ} fX(s) ds ≈ fX(x) · δ
  P(X ∈ B) = ∫_B fX(x) dx, for “nice” sets B

Uniform PDF on [a, b]:
• E[X] =
• σX² = ∫_a^b (x − (a + b)/2)² · (1/(b − a)) dx = (b − a)² / 12

[Figures: uniform PDF on [a, b]; a piecewise-constant PDF with values 1/6 and 2/6]
Mixed distributions
[Figure: a distribution on [0, 1] with both a density part and a point mass at 1/2]
• The corresponding CDF: FX(x) = P(X ≤ x)
[Figure: the CDF]

Gaussian (normal) PDF
[Figure: standard normal PDF and CDF]
• Standard normal: E[X] = 0, var(X) = 1
• General normal N(µ, σ²):
  fX(x) = (1 / (σ √2π)) e^(−(x−µ)²/2σ²)
Outline
• PDF review
• From the joint to the marginal

[Figure: PDF fX(x) with the event {a < X < b} shaded]

• From the joint to the marginal:
  fX(x) · δ ≈ P(x ≤ X ≤ x + δ) =
• X and Y are called independent if
  fX,Y(x, y) = fX(x) fY(y), for all x, y

Needle example (needle of length ℓ dropped on a floor with parallel lines d apart):
• The needle intersects a line if X ≤ (ℓ/2) sin Θ
  P(X ≤ (ℓ/2) sin Θ) = ∫∫_{x ≤ (ℓ/2) sin θ} fX(x) fΘ(θ) dx dθ
    = (4/(πd)) ∫_0^{π/2} ∫_0^{(ℓ/2) sin θ} dx dθ
    = (4/(πd)) ∫_0^{π/2} (ℓ/2) sin θ dθ
    = 2ℓ/(πd)
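For illustration, a Monte Carlo sketch of the needle calculation above, assuming ℓ ≤ d, with X the distance from the needle's midpoint to the nearest line and Θ the acute angle to the lines; the estimate should agree with 2ℓ/(πd).

```python
import math
import random

def needle(ell, d, trials=200_000):
    """Estimate the probability that a needle of length ell (<= d) crosses a line,
    with parallel lines a distance d apart."""
    hits = 0
    for _ in range(trials):
        x = random.uniform(0, d / 2)            # distance from midpoint to nearest line
        theta = random.uniform(0, math.pi / 2)  # acute angle with the lines
        if x <= (ell / 2) * math.sin(theta):
            hits += 1
    return hits / trials

ell, d = 1.0, 2.0
print(needle(ell, d))            # simulation
print(2 * ell / (math.pi * d))   # exact: 2*ell/(pi*d) ~ 0.3183
```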
Conditioning
• Recall:
  P(x ≤ X ≤ x + δ | Y ≈ y) ≈ fX|Y(x | y) · δ

Stick-breaking example
• Break a stick of length ℓ twice:
  break at X: uniform in [0, ℓ];
  break again at Y, uniform in [0, X]
  fX,Y(x, y) = 1/(ℓx),  0 ≤ y ≤ x ≤ ℓ

[Figures: fX(x) on [0, ℓ] and fY|X(y | x) on [0, x]]
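For illustration, a short simulation of the stick-breaking experiment; the empirical mean of Y agrees with E[Y] = E[E[Y | X]] = E[X/2] = ℓ/4, the value quoted in Lecture 12 below.

```python
import random

ell = 1.0
trials = 200_000
total = 0.0
for _ in range(trials):
    x = random.uniform(0, ell)   # first break: X uniform on [0, ell]
    y = random.uniform(0, x)     # second break: Y uniform on [0, X]
    total += y

print(total / trials)   # close to ell / 4
print(ell / 4)
```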
LECTURE 10 The Bayes variations

Continuous X, discrete Y:
  fX|Y(x | y) = fX(x) pY|X(y | x) / pY(y)
  pY(y) = ∫ fX(x) pY|X(y | x) dx

Example:
• X: a continuous signal; “prior” fX(x) (e.g., intensity of light beam)
• Y: discrete r.v. affected by X (e.g., photon count)
• pY|X(y | x): model of the discrete r.v.

• Obtaining the PDF for g(X, Y) = Y/X involves deriving a distribution.
  Note: g(X, Y) is a random variable

When not to find them
• Don’t need the PDF of g(X, Y) if we only want to compute an expected value:
  E[g(X, Y)] = ∫∫ g(x, y) fX,Y(x, y) dx dy
How to find them — The continuous case

[Figure: a function y = g(x) mapping x-values to y-values]

Example
• X: uniform on [0, 2]
• Find PDF of Y = X³
• Solution:
  fY(y) = dFY/dy (y) = 1 / (6 y^(2/3))

• Linear case, Y = aX + b:
  fY(y) = (1/|a|) fX((y − b)/a)
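For illustration, a Python check of the CDF-then-differentiate method for the example above: with X uniform on [0, 2], FY(y) = P(X ≤ y^(1/3)) = y^(1/3)/2 on [0, 8], and differentiating gives the 1/(6 y^(2/3)) on the slide. The sketch compares the empirical CDF of simulated values of X³ with that formula.

```python
import random

# X uniform on [0, 2]; Y = X**3 takes values in [0, 8].
samples = [random.uniform(0, 2) ** 3 for _ in range(200_000)]

for y in [0.5, 1.0, 4.0, 8.0]:
    empirical = sum(1 for s in samples if s <= y) / len(samples)
    exact = y ** (1 / 3) / 2   # F_Y(y) = P(X <= y^(1/3)) = y^(1/3) / 2
    print(y, round(empirical, 3), round(exact, 3))

# Differentiating F_Y gives f_Y(y) = 1 / (6 * y**(2/3)) on (0, 8].
```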
LECTURE 11 A general formula

• Find the PDF of Z = g(X, Y) = Y/X
  FZ(z) = , for z ≤ 1
  FZ(z) = , for z ≥ 1

• Hence, with y = g(x):
  δ fX(x) = δ fY(y) · |dg/dx (x)|

• W = X + Y; X, Y independent (discrete case):
  pW(w) = P(X + Y = w) = Σ_x P(X = x) P(Y = w − x) = Σ_x pX(x) pY(w − x)

  [Figure: the points (0, 3), (1, 2), (2, 1), (3, 0) on the line x + y = w]

• Mechanics:
  – Put the PMFs on top of each other
  – Flip the PMF of Y
  – Shift the flipped PMF by w (to the right if w > 0)
  – Cross-multiply and add

• W = X + Y; X, Y independent (continuous case):
  fW|X(w | x) = fY(w − x)
  fW,X(w, x) = fX(x) fW|X(w | x) = fX(x) fY(w − x)
  fW(w) = ∫_{−∞}^{∞} fX(x) fY(w − x) dx

  [Figure: the line x + y = w in the (x, y) plane]
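For illustration, a minimal Python sketch of the discrete convolution formula pW(w) = Σ_x pX(x) pY(w − x); the two fair dice used here are an arbitrary choice, not from the slide.

```python
from collections import defaultdict

def convolve(p_x, p_y):
    """PMF of W = X + Y for independent X, Y, with PMFs given as dicts."""
    p_w = defaultdict(float)
    for x, px in p_x.items():
        for y, py in p_y.items():
            p_w[x + y] += px * py   # p_W(w) = sum_x p_X(x) p_Y(w - x)
    return dict(p_w)

die = {k: 1 / 6 for k in range(1, 7)}   # a fair six-sided die
print(convolve(die, die))               # PMF of the sum of two dice
```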
Two independent normal r.v.’s / The sum of independent normal r.v.’s

[Figure: scatter plots of samples of two independent normal r.v.’s, with standard deviations σX and σY]

• cov(X, Y) = E[XY] − E[X]E[Y]
• independent ⇒ cov(X, Y) = 0 (converse is not true)
• var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} var(Xi) + 2 Σ_{i<j} cov(Xi, Xj)
• |ρ| = 1 ⇔ (X − E[X]) = c(Y − E[Y]) (linearly related)
• independent ⇒ ρ = 0 (converse is not true)
LECTURE 12 Conditional expectations

• In stick example:
  E[Y] = E[E[Y | X]] = E[X/2] = ℓ/4

Proof:
  (d) var(E[X | Y]) = E[(E[X | Y])²] − (E[X])²
  Sum of right-hand sides of (c), (d): E[X²] − (E[X])² = var(X)

Section means example:
• E[X | Y = 1] = 90, E[X | Y = 2] = 60, with weights 1/3 and 2/3, so E[X] = 70
  var(E[X | Y]) = (1/3)(90 − 70)² + (2/3)(60 − 70)² = 600/3 = 200
Section means and variances (ctd.)

  var(X) = E[var(X | Y)] + var(E[X | Y])

• Two sections, with 10 and 20 students:
  (1/10) Σ_{i=1}^{10} (xi − 90)² = 10    (1/20) Σ_{i=11}^{30} (xi − 60)² = 20
  var(X | Y = 1) = 10    var(X | Y = 2) = 20
  var(X | Y) = 10 w.p. 1/3, and 20 w.p. 2/3
  E[var(X | Y)] = (1/3) · 10 + (2/3) · 20 = 50/3

Example
[Figure: piecewise-constant fX(x) with values 1/3 and 2/3, Y = 1 on the first piece, Y = 2 on the second]
  E[X | Y = 1] =     E[X | Y = 2] =
  var(X | Y = 1) =   var(X | Y = 2) =

Sum of a random number of random variables:
  E[Y | N = n] = E[X1 + X2 + · · · + Xn | N = n]
               = E[X1 + X2 + · · · + Xn]
               = E[X1] + E[X2] + · · · + E[Xn]
               = n E[X]
• E[Y | N] = N E[X]
  E[Y] = E[E[Y | N]]
       = E[N E[X]]
       = E[N] E[X]
  var(Y) = E[var(Y | N)] + var(E[Y | N])
         = E[N] var(X) + (E[X])² var(N)
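For illustration, a simulation sketch of the random-sum formulas above, with N and the Xi chosen arbitrarily (N uniform on {1, . . . , 4}, Xi uniform on [0, 1]); the empirical mean and variance of Y should match E[N]E[X] and E[N]var(X) + (E[X])²var(N).

```python
import random

def sample_Y():
    """Y = X_1 + ... + X_N with N uniform on {1,...,4}, X_i uniform on [0,1]."""
    n = random.randint(1, 4)
    return sum(random.random() for _ in range(n))

trials = 200_000
ys = [sample_Y() for _ in range(trials)]
mean = sum(ys) / trials
var = sum((y - mean) ** 2 for y in ys) / trials

E_N, var_N = 2.5, 15 / 12   # uniform on {1, 2, 3, 4}
E_X, var_X = 0.5, 1 / 12    # uniform on [0, 1]
print(mean, E_N * E_X)                       # E[Y] = E[N] E[X]
print(var, E_N * var_X + E_X**2 * var_N)     # law of total variance
```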
LECTURE 13 The Bernoulli process
Interarrival times
• T1: number of trials until first success
  – P(T1 = t) =
• If you buy a lottery ticket every day, what is the distribution of the length of the first string of losing days?
• Given that first arrival was at time t, i.e., T1 = t: additional time, T2, until next arrival

Time of the kth arrival
  – E[Yk] =
  – Var(Yk) =
  – P(Yk = t) =

[Figure 6.4: Merging of independent Bernoulli processes]

Poisson PMF:
  pZ(k) = e^(−λ) λ^k / k!,  k = 0, 1, 2, . . .
Its mean and variance are given by
  E[Z] = λ,  var(Z) = λ.
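For illustration, a quick numeric check (with an arbitrary λ = 3.5) that the Poisson PMF above has mean and variance both equal to λ; the infinite sums are truncated where the terms are negligible.

```python
from math import exp, factorial

lam = 3.5

def pmf(k):
    """Poisson PMF: e^{-lam} lam^k / k!"""
    return exp(-lam) * lam**k / factorial(k)

ks = range(0, 100)   # truncate the infinite sums
mean = sum(k * pmf(k) for k in ks)
var = sum((k - mean) ** 2 * pmf(k) for k in ks)
print(mean, var)     # both ~ lam = 3.5
```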
LECTURE 19 Limit theorems – I

• Readings: Sections 5.1-5.3; start Section 5.4

• X1, . . . , Xn i.i.d.
  Mn = (X1 + · · · + Xn) / n
  What happens as n → ∞?
• Why bother?
• A tool: Chebyshev’s inequality

Chebyshev’s inequality
• Random variable X (with finite mean µ and variance σ²)
  σ² = ∫ (x − µ)² fX(x) dx
     ≥ ∫_{|x−µ| ≥ c} (x − µ)² fX(x) dx
     ≥ c² · P(|X − µ| ≥ c)

  P(|X − µ| ≥ c) ≤ σ² / c²

[Figure: PMF of a random variable Yn, placing probability 1/n at n]
• Does Yn converge?
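For illustration, a small numeric comparison of the Chebyshev bound σ²/c² with the exact tail probability, using an exponential(1) random variable as an arbitrary example (mean and variance both 1); it shows how conservative the bound can be.

```python
import math

# Exponential r.v. with rate 1: mean mu = 1, variance sigma^2 = 1.
mu, sigma2 = 1.0, 1.0

for c in [1.5, 2.0, 3.0]:
    # For c >= 1 the left tail {X <= mu - c} is empty, since X >= 0,
    # so P(|X - mu| >= c) = P(X >= mu + c) = e^{-(mu + c)}.
    exact = math.exp(-(mu + c))
    chebyshev = sigma2 / c**2
    print(c, round(exact, 4), round(chebyshev, 4))   # exact <= bound
```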
Convergence of the sample mean
(Weak law of large numbers)
• X1, X2, . . . i.i.d.; finite mean µ and variance σ²
  Mn = (X1 + · · · + Xn) / n
• E[Mn] =

The pollster’s problem
• f: fraction of population that “. . . ”
• ith (randomly selected) person polled:
  Xi = 1, if yes; 0, if no.
• Mn = (X1 + · · · + Xn)/n: fraction of “yes” in our sample
• If n = 50,000, then P(|Mn − f| ≥ .01) ≤ .05
  (conservative)
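For illustration, a Python sketch that reproduces the n = 50,000 guarantee via Chebyshev's inequality, using the fact that var(Mn) = σ²/n and σ² = f(1 − f) ≤ 1/4 whatever f is.

```python
# Chebyshev: P(|Mn - f| >= eps) <= var(Mn) / eps^2 = sigma^2 / (n * eps^2),
# and sigma^2 = f(1 - f) <= 1/4 for any f.
eps = 0.01
sigma2_max = 0.25

def chebyshev_bound(n):
    return sigma2_max / (n * eps**2)

print(chebyshev_bound(50_000))   # 0.05: the conservative guarantee on the slide
print(chebyshev_bound(10_000))   # 0.25: ten thousand polls guarantee much less
```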
LECTURE 20 THE CENTRAL LIMIT THEOREM

• Readings: Section 5.4
• X1, . . . , Xn i.i.d., finite variance σ²
• “Standardized” Sn = X1 + · · · + Xn:
  Zn = (Sn − E[Sn]) / σ_Sn = (Sn − n E[X]) / (√n σ)
  – E[Zn] = 0, var(Zn) = 1

Usefulness
• universal; only means, variances matter
• accurate computational shortcut
• justification of normal models

What exactly does it say?
• CDF of Zn converges to normal CDF
  – not a statement about convergence of PDFs or PMFs
[Figures: PMFs of Sn for n = 2, 4, 8, 16, 32, approaching the normal bell shape]

The pollster’s problem using the CLT
• f: fraction of population that “. . .”
• Mn = (X1 + · · · + Xn)/n
• Suppose we want:
  P(|Mn − f| ≥ .01) ≤ .05
• The event |X1 + · · · + Xn − nf| / n ≥ .01 is the same as
  |X1 + · · · + Xn − nf| / (√n σ) ≥ .01 √n / σ
• So,
  P(|Mn − f| ≥ .01) ≈ P(|Z| ≥ .01 √n / σ) ≤ P(|Z| ≥ .02 √n)
  (since σ = √(f(1 − f)) ≤ 1/2)
Apply to binomial
• Fix p, where 0 < p < 1
• Xi: Bernoulli(p)
• CDF of (Sn − np) / √(np(1 − p))  −→  standard normal

Example
• n = 36, p = 0.5; find P(Sn ≤ 21)
• Exact answer:
  Σ_{k=0}^{21} (36 choose k) (1/2)³⁶ = 0.8785

The 1/2 correction for binomial approximation
• P(Sn ≤ 21) = P(Sn < 22), because Sn is integer
  [Figure: number line 18, 19, 20, 21, 22, with the midpoint 21.5 marked]
• P(Sn = 19): approximation with the 1/2 correction ≈ 0.124
• Exact answer:
  (36 choose 19) (1/2)³⁶ = 0.1251
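For illustration, a Python check of this example using math.erf for the standard normal CDF: it computes the exact binomial values and the normal approximations with the 1/2 correction.

```python
from math import comb, erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 36, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 18 and 3

# Since p = 1/2, every sequence has probability 0.5**36 = p**n.
exact_le_21 = sum(comb(n, k) * p**n for k in range(22))
approx_le_21 = Phi((21.5 - mu) / sigma)    # 1/2 correction: use 21.5
print(exact_le_21, approx_le_21)           # 0.8785 vs ~0.878

exact_eq_19 = comb(n, 19) * p**n
approx_eq_19 = Phi((19.5 - mu) / sigma) - Phi((18.5 - mu) / sigma)
print(exact_eq_19, approx_eq_19)           # 0.1251 vs ~0.125 (0.124 with table rounding)
```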