Conditional distributions

Conditional distributions in general are rather abstract. When the random variables in question
are discrete (µ = counting measure), however, things are quite simple; the reason is that events
where the value of the random variable is fixed have positive probability, so the ordinary
conditional probability formula involving ratios can be applied.
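To make the discrete case concrete, here is a minimal Python sketch (the particular joint pmf is just an illustrative choice, and numpy is assumed available) showing that the conditional pmf is the ordinary ratio of joint to marginal:

import numpy as np

# Illustrative joint pmf of (X, Y), with X in {0, 1} (rows) and Y in {0, 1, 2} (columns)
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.15, 0.30, 0.15]])

p_x = p_xy.sum(axis=1)                 # marginal pmf of X
p_y_given_x = p_xy / p_x[:, None]      # P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)

print(p_y_given_x.sum(axis=1))         # each row is a genuine pmf, so it sums to 1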
When one or more of the random variables in question are continuous (dominated by Lebesgue
measure), then more care must be taken. Suppose random variables X and Y have a joint
distribution with density function pX,Y (x, y), with respect to some dominating (product)
measure µ×ν. Then the corresponding marginal distributions have densities with respect to µ and
ν, respectively, given by
pX(x) = ∫ pX,Y(x, y) dν(y)   and   pY(y) = ∫ pX,Y(x, y) dµ(x).
Moreover, the conditional distribution of Y , given X = x, also has a density with respect to ν,
and is given by the ratio
pY|X(y | x) = pX,Y(x, y) / pX(x).
As a function of x, for given y, this is clearly µ-measurable since the joint and marginal densities
are measurable. Also, for a given x, pY |X(y | x) defines a probability measure Qx, called the
conditional distribution of Y , given X = x, through the integral
Qx(B) = ∫_B pY|X(y | x) dν(y).
That is, pY|X(y | x) is the Radon–Nikodym derivative of the conditional distribution Qx. For
our purposes, the conditional distribution can always be defined through its conditional density,
though, in general, a conditional density may not exist even if the conditional distribution Qx
does. There are real cases where the most general definition of conditional distribution
(Keener 2010, Sec. 6.2) is required, e.g., in the proof of the Neyman–Fisher factorization
theorem and in the proof of the general Bayes theorem. I should also mention that conditional
distributions are not unique: the conditional density can be redefined arbitrarily on a set of
ν-measure zero without affecting the integral that defines Qx(B) above. We will not dwell on
this point here, but students should be aware of the subtleties of conditional distributions; the
Wikipedia page on the Borel paradox gives a clear explanation of these difficulties, along with
references, e.g., to Jaynes (2003), Chapter 15.
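As a sanity check on the density-ratio formula, here is a short Python sketch (using a hypothetical joint density p(x, y) = x + y on the unit square, with scipy assumed available) that computes the marginal by numerical integration and confirms that the resulting conditional density integrates to one:

from scipy.integrate import quad

# Hypothetical joint density on the unit square: p_{X,Y}(x, y) = x + y
p_xy = lambda x, y: x + y

def p_x(x):
    # marginal density: integrate the joint density over y
    return quad(lambda y: p_xy(x, y), 0.0, 1.0)[0]

def p_y_given_x(y, x):
    # conditional density as the ratio p_{X,Y}(x, y) / p_X(x)
    return p_xy(x, y) / p_x(x)

# For any fixed x, the conditional density integrates to 1 in y
print(quad(lambda y: p_y_given_x(y, 0.3), 0.0, 1.0)[0])   # ~1.0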
Given a conditional distribution with density pY|X(y | x), we can define conditional probabilities
and expectations. That is,
P(Y ∈ B | X = x) = ∫_B pY|X(y | x) dν(y).
Here I use the more standard notation for conditional probability. The law of total probability
then allows us to write
P(Y ∈ B) = ∫ P(Y ∈ B | X = x) pX(x) dµ(x),
in other words, marginal probabilities for Y may be obtained by taking expectation of the
conditional probabilities. More generally, for any ν-integrable function ϕ, we may write the
conditional expectation
E{ϕ(Y) | X = x} = ∫ ϕ(y) pY|X(y | x) dν(y).
We may evaluate the above expectation for any x, so we actually have defined a (µ-measurable)
function, say, g(x) = E(Y | X = x); here I took ϕ(y) = y for simplicity. Now, g(X) is a random
variable, to be denoted by E(Y | X), and we can ask about its mean, variance, etc. The
corresponding version of the law of total probability for conditional expectations is
E(Y ) = E{E(Y | X)}. (1.6)
This formula is called smoothing in Keener (2010) but I would probably call it a law of iterated
expectation. This is actually a very powerful result that can simplify lots of calculations; Keener
(2010) uses this a lot. There are versions of iterated expectation for higher moments, e.g.,
V(Y ) = V{E(Y | X)} + E{V(Y | X)}, (1.7)
C(X, Y ) = E{C(X, Y | Z)} + C{E(X | Z), E(Y | Z)}, (1.8)
where V(Y | X) is the conditional variance, i.e., the variance of Y relative to its conditional
distribution and, similarly, C(X, Y | Z) is the conditional covariance of X and Y .
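Formulas (1.6) and (1.7) are easy to check by simulation. Below is a minimal Python sketch, using a hypothetical Poisson–gamma hierarchy (X ~ Gamma(2, 1) and Y | X = x ~ Poisson(x)), for which E(Y | X) = V(Y | X) = X:

import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Hypothetical hierarchical model: X ~ Gamma(2, 1), Y | X = x ~ Poisson(x)
x = rng.gamma(shape=2.0, scale=1.0, size=n)
y = rng.poisson(x)

# (1.6): E(Y) = E{E(Y | X)} = E(X) = 2
print(y.mean(), x.mean())

# (1.7): V(Y) = V{E(Y | X)} + E{V(Y | X)} = V(X) + E(X) = 2 + 2 = 4
print(y.var(), x.var() + x.mean())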
As a final word about conditional distributions, it is worth mentioning that conditional
distributions are particularly useful in the specification of complex models. Indeed, it can be
difficult to specify a meaningful joint distribution for a collection of random variables in a given
application. However, it is often possible to write down a series of conditional distributions that,
together, specify a meaningful joint distribution. That is, we can simplify the modeling step by
working with several lower-dimensional conditional distributions. This is particularly useful for
specifying prior distributions for unknown parameters in a Bayesian analysis; we will discuss
this more later.
Jensen’s inequality
Convex sets and functions appear quite frequently in statistics and probability, so it helps to see
some applications. The first result, relating the expectation of a convex function to the function
of the expectation, should be familiar.
Theorem 1.6 (Jensen’s inequality). Suppose ϕ is a convex function on an open interval 𝒳 ⊆ R,
and X is a random variable taking values in 𝒳. Then
ϕ[E(X)] ≤ E[ϕ(X)].
If ϕ is strictly convex, then equality holds if and only if X is constant.
Proof. Fix any point x0 in 𝒳. Since ϕ is convex, there exists a linear function
ℓ(x) = c(x − x0) + ϕ(x0), through the point (x0, ϕ(x0)), such that ℓ(x) ≤ ϕ(x) for all x. To prove
the claim, take x0 = E(X), and note that
ϕ(X) ≥ c[X − E(X)] + ϕ[E(X)].
Taking expectations on both sides gives the result.
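A quick Monte Carlo sanity check of Theorem 1.6, with ϕ(z) = e^z and a uniform X (both choices are arbitrary, and numpy is assumed available):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, size=10**6)   # any random variable on an open interval will do

# Convex ϕ(z) = e^z: ϕ[E(X)] should not exceed E[ϕ(X)]
print(np.exp(x.mean()), np.mean(np.exp(x)))   # ~2.72 versus ~3.19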
Jensen’s inequality can be used to confirm, for example, that E(1/X) ≥ 1/E(X) and
E(log X) ≤ log E(X) for a positive random variable X, and that E(X²) ≥ {E(X)}² in general. An
interesting consequence is the following.
Example 1.7 (Kullback–Leibler divergence). Let f and g be two probability density functions
dominated by a σ-finite measure µ. The Kullback–Leibler divergence of g from f is defined as
Ef{log[f(X)/g(X)]} = ∫ log(f/g) f dµ.
It follows from Jensen’s inequality that
Ef{log[f(X)/g(X)]} = −Ef{log[g(X)/f(X)]}
≥ −log Ef[g(X)/f(X)]
= −log ∫ (g/f) f dµ = −log ∫ g dµ = −log 1 = 0.
That is, the Kullback–Leibler divergence is non-negative for all f and g. Moreover, it equals zero
if and only if f = g (µ-almost everywhere). Therefore, the Kullback–Leibler divergence acts like
a distance measure between two density functions. While it is not a metric in the mathematical
sense, it has a lot of statistical applications. See Exercise 23.
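A numerical illustration of the non-negativity just derived: the sketch below (assuming scipy is available; the two normal densities are an arbitrary choice) computes ∫ log(f/g) f dµ by quadrature:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl(f, g, lo=-10.0, hi=10.0):
    # Kullback–Leibler divergence ∫ log(f/g) f dµ, by numerical integration
    return quad(lambda x: f(x) * np.log(f(x) / g(x)), lo, hi)[0]

f = norm(0, 1).pdf
g = norm(1, 1).pdf

print(kl(f, g))   # ~0.5, strictly positive since f ≠ g
print(kl(f, f))   # ~0, since the two densities coincide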
Example 1.8 (Another proof of Cauchy–Schwarz). Recall that f² and g² are µ-measurable
functions. If ∫ g² dµ is infinite, then there is nothing to prove, so suppose otherwise. Then
p = g² / ∫ g² dµ is a probability density with respect to µ. Moreover,
(∫ fg dµ / ∫ g² dµ)² = (∫ (f/g) p dµ)² ≤ ∫ (f/g)² p dµ = ∫ f² dµ / ∫ g² dµ,
where the inequality follows from Theorem 1.6 applied to the convex function x ↦ x².
Rearranging terms, one gets
(∫ fg dµ)² ≤ ∫ f² dµ · ∫ g² dµ,
which is the desired result.
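The inequality itself is easy to verify numerically for particular choices of f and g; here is a small sketch with µ taken to be Lebesgue measure on [0, 1] and two arbitrary functions (scipy assumed available):

import numpy as np
from scipy.integrate import quad

# Two arbitrary square-integrable functions on [0, 1], with µ = Lebesgue measure
f = lambda x: np.sin(3 * x) + 1.0
g = lambda x: np.exp(-x)

lhs = quad(lambda x: f(x) * g(x), 0, 1)[0] ** 2
rhs = quad(lambda x: f(x) ** 2, 0, 1)[0] * quad(lambda x: g(x) ** 2, 0, 1)[0]
print(lhs, rhs, lhs <= rhs)   # (∫ fg dµ)² ≤ ∫ f² dµ · ∫ g² dµ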
Another application of convexity and Jensen’s inequality will come up in the decision-theoretic
context to be discussed later. In particular, when the loss function is convex, it will follow from
Jensen’s inequality that randomized decision rules are inadmissible and, hence, can be ignored.
A concentration inequality
We know that sample means of iid random variables, for large sample sizes, will “concentrate”
around the population mean. A concentration inequality gives a bound on the probability that the
sample mean is outside a neighborhood of the population mean. Chebyshev’s inequality
(Exercise 25) is one example of a concentration inequality and, often, these tools are the key to
proving limit theorems and even some finite-sample results in statistics and machine learning.
Here we prove a famous but relatively simple concentration inequality for sums of independent
bounded random variables. By “bounded random variables” we mean Xi such that P(ai ≤ Xi ≤
bi) = 1. For one thing, boundedness implies existence of moment generating functions. We start
with a simple result for one bounded random variable with mean zero; the proof uses some
properties of convex functions. Portions of what follows are based on notes prepared by Larry
Wasserman.
Lemma 1.1. Let X be a random variable with mean zero, bounded within the interval [a, b].
Then the moment generating function MX(t) = E(e^{tX}) satisfies
MX(t) ≤ e^{t²(b−a)²/8}.
Proof. Write X = (1 − W)a + Wb, where W = (X − a)/(b − a) ∈ [0, 1]. The function z ↦ e^{tz} is
convex, so we get
e^{tX} ≤ (1 − W)e^{ta} + W e^{tb}.
Taking expectations, and using the fact that E(X) = 0 implies E(W) = −a/(b − a), gives
MX(t) ≤ (b/(b − a)) e^{ta} + (−a/(b − a)) e^{tb}.
The right-hand side can be rewritten as e^{h(ζ)}, where
ζ = t(b − a) > 0,  h(z) = −cz + log(1 − c + ce^z),  c = −a/(b − a) ∈ (0, 1).
Obviously, h(0) = 0; similarly, h′(z) = −c + ce^z/(1 − c + ce^z), so h′(0) = 0. Also,
h″(z) = c(1 − c)e^z / (1 − c + ce^z)²,  h‴(z) = c(1 − c)e^z(1 − c − ce^z) / (1 − c + ce^z)³.
It is easy to verify that h‴(z) = 0 iff z = log[(1 − c)/c]. Plugging this value of z into h″ gives
1/4, and this is the global maximum, so h″(z) ≤ 1/4 for all z. Now, a second-order Taylor
expansion of h(ζ) around 0 gives, for some z0 ∈ (0, ζ),
h(ζ) = h(0) + h′(0)ζ + h″(z0) ζ²/2 ≤ ζ²/8 = t²(b − a)²/8.
Plug this bound in to get MX(t) ≤ e^{h(ζ)} ≤ e^{t²(b−a)²/8}.
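The bound in Lemma 1.1 can be checked numerically. The sketch below uses X uniform on [−1, 1] (so E(X) = 0 and (b − a)²/8 = 1/2) and compares a Monte Carlo estimate of MX(t) with e^{t²/2} over a few values of t; the particular distribution and t values are arbitrary:

import numpy as np

rng = np.random.default_rng(2)
a, b = -1.0, 1.0
x = rng.uniform(a, b, size=10**6)        # mean-zero, bounded in [a, b]

for t in [0.5, 1.0, 2.0, 5.0]:
    mgf = np.mean(np.exp(t * x))         # Monte Carlo estimate of M_X(t)
    bound = np.exp(t**2 * (b - a)**2 / 8)
    print(t, mgf, bound, mgf <= bound)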
Lemma 1.2 (Chernoff). For any random variable X, P(X > ε) ≤ inf_{t>0} e^{−tε} E(e^{tX}).
Proof. See Exercise 26.
Now we are ready for the main result, Hoeffding’s inequality. The proof combines the results in
the two previous lemmas.
Theorem 1.7 (Hoeffding’s inequality). Let Y1, Y2, . . . be independent random variables, with
P(a ≤ Yi ≤ b) = 1 and mean µ. Then
P(|Ȳn − µ| > ε) ≤ 2e^{−2nε²/(b−a)²}.
Proof. We can take µ = 0, without loss of generality, by working with Xi = Yi − µ. Of course, Xi
is still bounded, and the length of the bounding interval is still b − a. Write
P(|X̄n| > ε) = P(X̄n > ε) + P(−X̄n > ε).
Start with the first term on the right-hand side. Using Lemma 1.2,
P(X̄n > ε) = P(X1 + · · · + Xn > nε) ≤ inf_{t>0} e^{−tnε} MX(t)^n,
where MX(t) is the moment generating function of X1. By Lemma 1.1, we have
P(X̄n > ε) ≤ inf_{t>0} e^{−tnε} e^{nt²(b−a)²/8}.
The minimizer, over t > 0, of the right-hand side is t = 4ε/(b − a)², so we get
P(X̄n > ε) ≤ e^{−2nε²/(b−a)²}.
To complete the proof, apply the same argument to P(−X̄n > ε), obtain the same bound as
above, then sum the two bounds together.
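To see how conservative the bound is, here is a small simulation (Bernoulli(0.5) samples with n = 100 and ε = 0.1; all of these choices are arbitrary) comparing the empirical tail frequency of |Ȳn − µ| with the Hoeffding bound:

import numpy as np

rng = np.random.default_rng(3)
n, eps, reps = 100, 0.1, 10**5
mu, a, b = 0.5, 0.0, 1.0

# reps independent sample means of n iid Bernoulli(0.5) variables, bounded in [0, 1]
y_bar = rng.binomial(n, mu, size=reps) / n

empirical = np.mean(np.abs(y_bar - mu) > eps)
bound = 2 * np.exp(-2 * n * eps**2 / (b - a)**2)
print(empirical, bound)   # roughly 0.035 versus the bound 2e^{-2} ≈ 0.27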
There are lots of other kinds of concentration inequalities, most of which are more general than
Hoeffding’s inequality above. Exercise 28 walks you through a concentration inequality for
normal random variables and a corresponding strong law. Modern work on concentration
inequalities deals with more advanced kinds of random quantities, e.g., random functions or
stochastic processes. The next subsection gives a special case of such a result.
