18.600: Lecture 25
Conditional expectation
Scott Sheffield
MIT
Outline
Conditional probability distributions
Conditional expectation
Interpretation and examples
Recall: conditional probability distributions
• It all starts with the definition of conditional probability:
  $P(A|B) = P(AB)/P(B)$.
• If X and Y are jointly discrete random variables, we can use
  this to define a probability mass function for X given Y = y.
• That is, we write
  $p_{X|Y}(x|y) = P\{X = x \mid Y = y\} = \frac{p(x,y)}{p_Y(y)}$.
• In words: first restrict the sample space to pairs (x, y) with the
  given y value. Then divide the original mass function by $p_Y(y)$ to
  obtain a probability mass function on the restricted space.
• We do something similar when X and Y are continuous
  random variables. In that case we write
  $f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)}$.
• It is often useful to think of sampling (X, Y) as a two-stage
  process. First sample Y from its marginal distribution, obtaining
  Y = y for some particular y. Then sample X from its conditional
  distribution given Y = y.
• The marginal law of X is a weighted average of the conditional laws:
  $p_X(x) = \sum_y p_Y(y)\, p_{X|Y}(x|y)$.
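A minimal Python sketch of the recipe above, assuming a small hypothetical joint pmf stored as a dictionary `joint` that maps (x, y) to p(x, y); the helper names are illustrative, not from the lecture. It builds $p_{X|Y}(\cdot|y)$ by restricting to pairs with the given y and dividing by $p_Y(y)$, then checks that the marginal law of X equals the $p_Y$-weighted average of the conditional laws.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical joint pmf p(x, y), chosen only for illustration.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(1, 8),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 2),
}

def marginal_y(joint):
    """p_Y(y) = sum over x of p(x, y)."""
    p_y = defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_y[y] += p
    return dict(p_y)

def marginal_x(joint):
    """p_X(x) = sum over y of p(x, y)."""
    p_x = defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_x[x] += p
    return dict(p_x)

def conditional_x_given_y(joint, y):
    """p_{X|Y}(x|y): restrict to pairs with this y, then divide by p_Y(y)."""
    p_y = marginal_y(joint)[y]
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

# Marginal law of X rebuilt as the p_Y-weighted average of the conditional laws.
mixture = defaultdict(Fraction)
for y, w in marginal_y(joint).items():
    for x, q in conditional_x_given_y(joint, y).items():
        mixture[x] += w * q

print(conditional_x_given_y(joint, 1))     # {0: Fraction(1, 3), 1: Fraction(2, 3)}
print(dict(mixture) == marginal_x(joint))  # True
```

Exact `Fraction` arithmetic is used so the equality check is exact rather than up to floating-point error.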
Example
• Let X be the value of one die roll, Y the value of a second die roll,
  and write Z = X + Y.
• What is the probability distribution for X given that Y = 5?
• Answer: uniform on {1, 2, 3, 4, 5, 6}.
• What is the probability distribution for Z given that Y = 5?
• Answer: uniform on {6, 7, 8, 9, 10, 11}.
• What is the probability distribution for Y given that Z = 5?
• Answer: uniform on {1, 2, 3, 4}.
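These three answers can be checked by brute-force enumeration of the 36 equally likely outcomes of two fair dice. A minimal Python sketch; the helper `conditional_law` and the index constants X, Y, Z are illustrative names, not from the lecture. Conditioning here just means keeping the outcomes that satisfy the condition and renormalizing the counts.

```python
from fractions import Fraction
from collections import Counter
from itertools import product

# All 36 equally likely outcomes (x, y, z) with z = x + y.
outcomes = [(x, y, x + y) for x, y in product(range(1, 7), repeat=2)]
X, Y, Z = 0, 1, 2  # positions within each outcome tuple

def conditional_law(target, given, value):
    """Law of outcome[target] given outcome[given] == value."""
    restricted = [o for o in outcomes if o[given] == value]
    counts = Counter(o[target] for o in restricted)
    return {v: Fraction(c, len(restricted)) for v, c in sorted(counts.items())}

print(conditional_law(X, Y, 5))  # mass 1/6 on each of 1..6
print(conditional_law(Z, Y, 5))  # mass 1/6 on each of 6..11
print(conditional_law(Y, Z, 5))  # mass 1/4 on each of 1..4
```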
Conditional expectation
• Now, what do we mean by $E[X|Y = y]$? This should just be
  the expectation of X in the conditional probability measure
  for X given that Y = y.
• We can write this as
  $E[X|Y = y] = \sum_x x P\{X = x \mid Y = y\} = \sum_x x\, p_{X|Y}(x|y)$.
• We can make sense of this in the continuum setting as well.
• In the continuum setting we had $f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)}$. So
  $E[X|Y = y] = \int_{-\infty}^{\infty} x \frac{f(x,y)}{f_Y(y)}\,dx$.
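To see the continuum formula in action, here is a minimal numerical sketch, assuming a hypothetical joint density $f(x,y) = x + y$ on the unit square (it integrates to 1, and $f_Y(y) = y + 1/2$). For this density the formula gives $E[X|Y = y] = \frac{1/3 + y/2}{y + 1/2}$; the code approximates both integrals with a simple midpoint rule (an illustrative choice, not a method from the lecture) and compares with that closed form.

```python
def f(x, y):
    """Hypothetical joint density on [0, 1]^2; it integrates to 1."""
    return x + y

def midpoint(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def cond_expectation_x(y):
    """Numerical E[X | Y = y] = integral of x f(x, y) / f_Y(y) dx."""
    f_y = midpoint(lambda x: f(x, y), 0.0, 1.0)              # marginal f_Y(y)
    return midpoint(lambda x: x * f(x, y) / f_y, 0.0, 1.0)

def exact(y):
    """Closed form for this particular density."""
    return (1 / 3 + y / 2) / (y + 1 / 2)

for y in (0.0, 0.5, 1.0):
    print(y, cond_expectation_x(y), exact(y))
```

At y = 0 the conditional density is 2x on [0, 1], so both columns should be close to 2/3.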
Example
• Let X be the value of one die roll, Y the value of a second die roll,
  and write Z = X + Y.
• What is $E[X|Y = 5]$?
• What is $E[Z|Y = 5]$?
• What is $E[Y|Z = 5]$?
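One way to check these by brute force: under the conditional measure every remaining outcome is equally likely, so each conditional expectation is a plain average over the restricted outcomes. A minimal Python sketch; `cond_expectation` and the index constants are illustrative names, not from the lecture.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: all 36 outcomes (x, y, z) with z = x + y are equally likely.
outcomes = [(x, y, x + y) for x, y in product(range(1, 7), repeat=2)]
X, Y, Z = 0, 1, 2  # positions within each outcome tuple

def cond_expectation(target, given, value):
    """E[target | given = value] as an average over the restricted outcomes."""
    restricted = [o for o in outcomes if o[given] == value]
    return Fraction(sum(o[target] for o in restricted), len(restricted))

print(cond_expectation(X, Y, 5))  # 7/2
print(cond_expectation(Z, Y, 5))  # 17/2
print(cond_expectation(Y, Z, 5))  # 5/2
```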
Conditional expectation as a random variable
• We can think of $E[X|Y]$ as a function of the random variable Y.
  When Y = y it takes the value $E[X|Y = y]$.
• So $E[X|Y]$ is itself a random variable. It happens to depend
  only on the value of Y.
• Thinking of $E[X|Y]$ as a random variable, we can ask what its
  expectation is. What is $E[E[X|Y]]$?
• Very useful fact: $E[E[X|Y]] = E[X]$.
• In words: what you expect to expect X to be after learning Y
  is the same as what you now expect X to be.
• Proof in the discrete case:
  $E[X|Y = y] = \sum_x x P\{X = x \mid Y = y\} = \sum_x x \frac{p(x,y)}{p_Y(y)}$.
• Recall that, in general, $E[g(Y)] = \sum_y p_Y(y) g(y)$.
• Applying this with $g(y) = E[X|Y = y]$:
  $E[E[X|Y]] = \sum_y p_Y(y) \sum_x x \frac{p(x,y)}{p_Y(y)} = \sum_x \sum_y p(x,y)\, x = E[X]$.
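The proof above can be mirrored numerically. A minimal Python sketch, assuming a small hypothetical joint pmf (not from the lecture): it computes $E[X|Y = y]$ for each y (one number per y, so a function of Y), averages those values against $p_Y$, and compares with $E[X]$ computed directly.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical joint pmf p(x, y), for illustration only (probabilities sum to 1).
joint = {
    (1, 0): Fraction(1, 6), (4, 0): Fraction(1, 6),
    (2, 1): Fraction(1, 3), (7, 1): Fraction(1, 12), (0, 1): Fraction(1, 4),
}

# Marginal p_Y(y).
p_y = defaultdict(Fraction)
for (x, y), p in joint.items():
    p_y[y] += p

# E[X | Y = y] = sum_x x p(x, y) / p_Y(y), one value per y.
cond_exp = defaultdict(Fraction)
for (x, y), p in joint.items():
    cond_exp[y] += x * p / p_y[y]

tower = sum(p_y[y] * cond_exp[y] for y in p_y)      # E[E[X|Y]] = sum_y p_Y(y) g(y)
direct = sum(x * p for (x, y), p in joint.items())  # E[X]
print(tower, direct)  # 25/12 25/12
```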
Conditional variance
• Definition:
  $\mathrm{Var}(X|Y) = E[(X - E[X|Y])^2 \mid Y] = E[X^2 - E[X|Y]^2 \mid Y]$.
• $\mathrm{Var}(X|Y)$ is a random variable that depends on Y. It is the
  variance of X in the conditional distribution for X given Y.
• Note $E[\mathrm{Var}(X|Y)] = E[E[X^2|Y]] - E[E[X|Y]^2] = E[X^2] - E[E[X|Y]^2]$.
• If we subtract $E[X]^2$ from the first term and the equal quantity
  $E[E[X|Y]]^2$ from the second (which leaves the difference unchanged),
  the RHS becomes $\mathrm{Var}(X) - \mathrm{Var}(E[X|Y])$, which implies the following:
• Useful fact: $\mathrm{Var}(X) = \mathrm{Var}(E[X|Y]) + E[\mathrm{Var}(X|Y)]$.
• One can discover X in two stages: first sample Y from its
  marginal distribution and compute $E[X|Y]$, then sample X from its
  conditional distribution given the Y value.
• The above fact breaks the variance into two parts, corresponding to
  these two stages.
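A minimal exact check of the variance decomposition, reusing the two-dice setting from earlier: here S = (first roll) + (second roll) plays the role of X and the second roll plays the role of Y (a choice made for illustration). Since $E[S|Y] = 7/2 + Y$ and $\mathrm{Var}(S|Y)$ is the variance of one die, both parts should equal 35/12 and sum to $\mathrm{Var}(S) = 35/6$.

```python
from fractions import Fraction
from itertools import product

# Outcomes (s, y): s = sum of two fair dice, y = second roll; each has probability 1/36.
outcomes = [(d1 + d2, d2) for d1, d2 in product(range(1, 7), repeat=2)]
prob = Fraction(1, len(outcomes))

def E(f):
    """Expectation of f(s, y) over the 36 equally likely outcomes."""
    return sum(prob * f(s, y) for s, y in outcomes)

def cond_mean(y):
    """E[S | Y = y]."""
    vals = [s for s, yy in outcomes if yy == y]
    return Fraction(sum(vals), len(vals))

def cond_var(y):
    """Var(S | Y = y) = E[S^2 | Y = y] - E[S | Y = y]^2."""
    vals = [s for s, yy in outcomes if yy == y]
    return Fraction(sum(v * v for v in vals), len(vals)) - cond_mean(y) ** 2

var_s = E(lambda s, y: s * s) - E(lambda s, y: s) ** 2  # Var(S)
var_of_cond_mean = E(lambda s, y: cond_mean(y) ** 2) - E(lambda s, y: cond_mean(y)) ** 2
mean_of_cond_var = E(lambda s, y: cond_var(y))
print(var_s, var_of_cond_mean, mean_of_cond_var)  # 35/6 35/12 35/12
```

The first stage (learning the second roll) and the second stage (learning the first roll) contribute equally here, as symmetry suggests.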
Example
• Let X be a random variable of variance $\sigma_X^2$ and Y an
  independent random variable of variance $\sigma_Y^2$, and write
  Z = X + Y. Assume E[X] = E[Y] = 0.
• What are the covariances Cov(X, Y) and Cov(X, Z)?
• How about the correlation coefficients $\rho(X, Y)$ and $\rho(X, Z)$?
• What is $E[Z|X]$? And how about $\mathrm{Var}(Z|X)$?
• Both of these values are functions of X. The former is just X.
  The latter happens to be a constant-valued function of X, i.e., it
  happens not to actually depend on X. We have
  $\mathrm{Var}(Z|X) = \sigma_Y^2$.
• Can we check the formula
  $\mathrm{Var}(Z) = \mathrm{Var}(E[Z|X]) + E[\mathrm{Var}(Z|X)]$ in this case?
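One worked pass through these questions, as a short derivation. It uses only independence, E[X] = E[Y] = 0, and the facts above, and assumes both variances are positive so that the correlation coefficients are defined:

```latex
\begin{align*}
\mathrm{Cov}(X,Y) &= E[XY] - E[X]E[Y] = E[X]E[Y] - E[X]E[Y] = 0, \\
\mathrm{Cov}(X,Z) &= \mathrm{Cov}(X,X) + \mathrm{Cov}(X,Y) = \sigma_X^2, \\
\rho(X,Y) &= 0, \qquad
\rho(X,Z) = \frac{\sigma_X^2}{\sigma_X \sqrt{\sigma_X^2 + \sigma_Y^2}}
          = \frac{\sigma_X}{\sqrt{\sigma_X^2 + \sigma_Y^2}}, \\
E[Z \mid X] &= X + E[Y \mid X] = X + E[Y] = X, \\
\mathrm{Var}(E[Z \mid X]) + E[\mathrm{Var}(Z \mid X)]
  &= \mathrm{Var}(X) + \sigma_Y^2 = \sigma_X^2 + \sigma_Y^2 = \mathrm{Var}(Z).
\end{align*}
```

The last line confirms the variance decomposition in this case; the final equality uses independence of X and Y.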
Interpretation
• We sometimes think of the expectation E[Y] as a best guess or
  best predictor of the value of Y.
• It is best in the sense that, among all constants m, the
  expectation $E[(Y - m)^2]$ is minimized when m = E[Y].
• But what if we allow non-constant predictors? What if the
  predictor is allowed to depend on the value of a random
  variable X that we can observe directly?
• Let g(x) be such a function. Then $E[(Y - g(X))^2]$ is
  minimized when $g(X) = E[Y|X]$.
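A small exact illustration of the best-predictor property, again with two fair dice (an illustrative setup): we observe the first roll X and predict the sum S. The conditional expectation $E[S|X] = X + 7/2$ should achieve mean squared error 35/12, beating both the best constant predictor $E[S] = 7$ (error $\mathrm{Var}(S) = 35/6$) and a shifted version of $E[S|X]$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Outcomes (x, s): x = first roll (observed), s = sum of both rolls; each has probability 1/36.
outcomes = [(x, x + d) for x, d in product(range(1, 7), repeat=2)]
prob = Fraction(1, 36)

def mse(g):
    """Mean squared error E[(S - g(X))^2] of the predictor g."""
    return sum(prob * (s - g(x)) ** 2 for x, s in outcomes)

def cond_mean(x):
    """E[S | X = x], averaging over the outcomes with this x."""
    vals = [s for xx, s in outcomes if xx == x]
    return Fraction(sum(vals), len(vals))

best = mse(cond_mean)                                   # g(X) = E[S|X] = X + 7/2
constant = mse(lambda x: Fraction(7))                   # best constant, m = E[S] = 7
shifted = mse(lambda x: cond_mean(x) + Fraction(1, 2))  # a perturbed predictor
print(best, constant, shifted)  # 35/12 35/6 19/6
```

Shifting the predictor by a constant c adds exactly $c^2$ to the error, which is one way to see why no shift of $E[S|X]$ can do better.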
Examples
• Toss 100 coins. What's the conditional expectation of the
  number of heads given that there are k heads among the first
  fifty tosses?
• k + 25
• What's the conditional expectation of the number of aces in a
  five-card poker hand given that the first two cards in the hand
  are aces?
• 2 + 3 · 2/50
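Both answers can be confirmed with exact sums. A minimal Python sketch with hypothetical helper names and default parameters matching the two examples: for the coins, the last 50 tosses are independent of the first 50 and contribute a Binomial(50, 1/2) count; for the aces, the remaining 3 cards are drawn from 50 cards containing 2 aces, so the number of additional aces is hypergeometric.

```python
from fractions import Fraction
from math import comb

def coins_expected_heads(k, n_total=100, n_first=50):
    """E[# heads in n_total fair tosses | k heads among the first n_first]."""
    n_rest = n_total - n_first
    # Remaining tosses are independent of the first ones: Binomial(n_rest, 1/2).
    rest = sum(Fraction(j * comb(n_rest, j), 2 ** n_rest) for j in range(n_rest + 1))
    return k + rest

def aces_expected(deck=52, aces=4, hand=5, seen_aces=2):
    """E[# aces in the hand | the first seen_aces cards are aces] (hypergeometric)."""
    left, left_aces, draws = deck - seen_aces, aces - seen_aces, hand - seen_aces
    total = comb(left, draws)
    rest = sum(Fraction(j * comb(left_aces, j) * comb(left - left_aces, draws - j), total)
               for j in range(min(left_aces, draws) + 1))
    return seen_aces + rest

print(coins_expected_heads(30))  # 55
print(aces_expected())           # 53/25, i.e. 2 + 3 * 2/50
```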