Lecture 06: Basic Sample Generation
Monte Carlo Methods and Applications (CMU)
Floating Point—Representable Numbers (2D)
[Figure: representable floating point values x = ±a·2^b]
Errors in Floating Point
• Sequences of arithmetic operations can lose information/accuracy
• Can also go out of range
• Many common misconceptions, e.g.:
  – “dividing by small numbers is bad for accuracy”
[Figure: a running total, and the smallest number representable with the same exponent as the running total]
Key idea: pile up “little bits” until they’re big enough to add to the sum.
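For concreteness, here is a minimal sketch of this idea—compensated (Kahan) summation—in Python; the variable names are ours, not from the slide:

def kahan_sum(values):
    total = 0.0
    c = 0.0                   # compensation: the "little bits" lost so far
    for x in values:
        y = x - c             # re-inject the previously lost low-order bits
        t = total + y         # big + small: low-order bits of y may be lost...
        c = (t - total) - y   # ...but can be recovered algebraically
        total = t
    return total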
Harmonic Series—revisited
• Let’s try our harmonic series again, this time with compensated summation
• This time, the partial sum keeps increasing, as expected 😮‍💨

  N     harmonic series sum
  1     1
  10    2.92897
  100   5.18738
  1k    7.48548
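As an illustrative check of the table, one might run the kahan_sum sketch above on the harmonic series:

for N in [1, 10, 100, 1000]:
    print(N, kahan_sum(1.0/k for k in range(1, N + 1)))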
Parallel Implementation—Challenges
• At first glance, Monte Carlo seems “embarrassingly parallel”: just ask each thread/core/node/etc. to evaluate its fair share of the terms in the sum.
• Several problems, though:
  1. (Easier.) The partial sums have to be collected and aggregated into the final sum, which means there’s some communication overhead.
  2. (Harder.) Each summand may take a very different amount of work, or use divergent memory accesses, making load balancing and cache management challenging.
  3. (Also harder.) Distributing MCMC doesn’t help with mixing time. 😞
• Some well-established solutions for, e.g., parallel sum/parallel prefix sum… the other problems are much harder! (And problem-specific.)
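A minimal sketch of the “fair share” strategy, using Python threads purely for illustration (a real implementation would use processes, MPI, or a GPU; f, n, and workers are illustrative names):

from concurrent.futures import ThreadPoolExecutor

def parallel_sum(f, n, workers=4):
    chunk = (n + workers - 1) // workers
    def partial(lo):                        # each worker's fair share of terms
        return sum(f(i) for i in range(lo, min(lo + chunk, n)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial, range(0, n, chunk))
        return sum(partials)                # aggregation = the communication overhead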
Random Number Generation
Random Number Generation—Overview
• Our most basic task will be to sample uniform random numbers
– canonically on interval [0,1]; can be transformed to other distributions
• Lots of questions:
– what does “random” really mean?
– do random numbers really exist?
– how can we simulate numbers that appear random?
– how do we measure/test quality of random numbers?
– what features of randomness are important for Monte Carlo?
• Will try to at least get a brief glimpse at some answers!
How do you generate (good) random numbers?
*…but if you like this kind of thing, there’s a great Abel Prize lecture by Avi Wigderson
Good vs. Bad Random Numbers
• Does the following list of numbers seem like a “good” or “bad” random sequence for use in Monte Carlo methods?
  – 2, 5, 8, 31, 32, 36, 54, 56, 59, 60, 68, 71, 72, 78, 82, 89
• How about this one—is this a “better” or “worse” sequence?
  – 31, 36, 8, 5, 82, 89, 72, 60, 54, 2, 71, 32, 56, 78, 68, 59
• If we use these numbers to generate sample points for Monte Carlo integration, does order even matter?
  – or just the distribution over the sample space?
Key idea: “quality” of random numbers depends on what you want to use them for!
Good vs. Bad Random Numbers
Are these “good” random numbers / points?
How can you tell which ones are best suited for Monte Carlo?
“Deterministic random numbers…?”
See: Vigna, “It Is High Time We Let Go Of The Mersenne Twister” (2019)
Permuted Congruential Generator
• A permuted congruential generator (PCG) is the most recent hope for a “good” PRNG
• Complete implementation (C++) available at pcg-random.org
• In Python: numpy.random.PCG64
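For instance, a quick sketch using NumPy’s PCG64-backed generator (the seed here is arbitrary):

import numpy as np

rng = np.random.Generator(np.random.PCG64(seed=42))
print(rng.random(4))           # four uniform floats in [0, 1)
print(rng.integers(0, 10, 4))  # four uniform integers in [0, 10)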
Comparison of PRNGs
(From https://www.pcg-random.org/)
“Please give me 64 two-dimensional points sampled from the uniform distribution on the unit square.”
Stratified Random Numbers
• We discussed stratified sampling as a method of variance reduction
• Easy win: why not “bake in” (1D) stratification to the random number generator itself?
• Caveat: to avoid bias, must always request & use numbers in batches of a known size
• Strategy: “fill in the largest remaining gap.” How do we do this algorithmically?
  – e.g., fill order: 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15
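A minimal sketch of 1D stratification baked into the generator, assuming samples really are requested in batches of known size n:

import random

def stratified_batch(n):
    # one uniform sample per stratum [i/n, (i+1)/n), shuffled so that
    # consumers can't rely on any particular ordering
    samples = [(i + random.random()) / n for i in range(n)]
    random.shuffle(samples)
    return samples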
Radical Inverse
• Many low-discrepancy sequences are built up using the radical inverse ϕ_b(i), also known (more or less…) as the van der Corput sequence
  – pick a base b, “reflect” the digits of integer i across the decimal point
  – more precisely, if i = ∑_{k=0}^∞ i_k b^k, then ϕ_b(i) := ∑_{k=0}^∞ i_k b^{−1−k}
• Example. Consider integer i = 6. In base 2 (binary), i has digits 110, i.e., two plus four. The radical inverse in binary is 0.011. In decimal, then, the base 2 radical inverse of 6 is one fourth plus one eighth, or 1/4 + 1/8 = 3/8.
[Figure: three point sets on [0,1], labeled a=2, b=1; a=3, b=2; a=5, b=3]
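A minimal sketch of the radical inverse, peeling off the digits of i in base b and mirroring them across the decimal point:

def radical_inverse(i, b):
    inv, scale = 0.0, 1.0 / b
    while i > 0:
        inv += (i % b) * scale   # digit i_k contributes i_k * b^(-1-k)
        i //= b
        scale /= b
    return inv

assert radical_inverse(6, 2) == 0.375  # binary 110 -> 0.011 = 3/8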
Hammersley Sequence
• The Hammersley sequence also builds low-discrepancy sample points x_1, …, x_n from the radical inverse
  – in particular, x_{i1} := (i − 1)/n, and x_{ij} := ϕ_{p_{j−1}}(i − 1) for the remaining j, where p_j is the jth prime number
  – achieves lower discrepancy than Halton
  – …but requires that the total number of samples n be fixed in advance
[Figure: Halton vs. Hammersley point sets]
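A minimal 2D sketch, using radical_inverse from above (higher dimensions would append radical inverses in successive prime bases):

def hammersley_2d(n):
    # first coordinate is equispaced; second is the base-2 radical inverse
    return [((i - 1) / n, radical_inverse(i - 1, 2)) for i in range(1, n + 1)]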
Special Distributions
Special Distributions—Overview
• Many continuous distributions beyond uniform come up repeatedly:
– normal, exponential, …
– or uniform distribution on other domains (sphere, simplex, …)
• Well-known formulas for sampling from these
• Basic approach:
– sample uniform random numbers
– apply closed-form expression that “warps” uniform to nonuniform
• Can more generally apply warping to any source of randomness
– stratified, low-discrepancy, …
• Also need to sample discrete distributions (integers, permutations…)
Sampling Integers
• Q: Suppose I have a random number generator rand() that provides integers uniformly sampled from the interval [0, M]. How can I get a random integer in the range [0, a), where a < M?
• A: Just compute rand()%a. [wrong]
• No! The “obvious” strategy doesn’t sample uniformly.
• Q: Why not? :-)
• A: Consider the case M = 10, a = 7: 0%7 = 0, 1%7 = 1, …, 6%7 = 6, but 7%7 = 0, 8%7 = 1, 9%7 = 2 — so the outcomes 0, 1, 2 appear twice as often!
• Q: How can we get it right? (two-word answer)
• A: Rejection sampling. [right] (Or just call a (trusted!) library function.)

# uniformly sample integer from the range [0,a-1],
# assuming rand() gives uniform values in [0,M]
def uniformRand( a ):
    m = a * floor(M/a)   # reject values above m, so the rest split evenly into groups of size a
    while True:
        x = rand()
        if x < m:
            return x % a
Sampling Real Values
• As you might imagine, you also have to
think carefully about numerical issues
when sampling other distributions (e.g.,
floating point values in [0,1]).
• Lots of stuff written on how to do it
right…
• Rule of thumb: trust, but verify!
“The conventional method for generating pseudorandom floating point values is to divide a
pseudorandom integer by a floating-point constant. The problems with this approach are (1) the
possible outcomes are only about 7% of representable floating point values (2) subsequent
transformations may not yield the expected distributions (e.g., applying a log transformation to a
uniform distribution does not yield an exponential distribution).”
Sampling a Random Permutation
• For many algorithms, need to sample random permutations
  – e.g., Latin hypercube sampling
• Recall that a permutation of a discrete set Ω is a bijection η : Ω → Ω
  – intuitively, a “shuffling” of the elements
[Figure: a permutation η of Ω = {1, 2, 3, 4, 5}]
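The standard way to draw a uniformly random permutation is the Fisher–Yates shuffle; a minimal sketch (not from the slides):

import random

def random_permutation(n):
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = random.randint(0, i)            # uniform over {0, ..., i}
        perm[i], perm[j] = perm[j], perm[i]
    return perm                             # each of the n! orderings equally likely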
Sampling from the Normal Distribution
The Box-Muller transformation provides a simple way to warp uniformly-distributed samples (u, v) ∈ [0,1]² into normally-distributed ones:

from math import sqrt, log, cos, pi
from random import uniform

def SampleNormal( μ, σ ):
    u = uniform(0,1) * 2*pi               # random angle in [0, 2π)
    v = uniform(0,1)                      # (in practice, guard against v = 0)
    return σ*sqrt(-2*log(v))*cos(u) + μ   # standard normal, scaled & shifted

The 2D Gaussian p(x, y) = (1/2π) e^{−(x²+y²)/2} has marginal ∫_{−∞}^∞ p(x, y) dx = (1/√(2π)) e^{−y²/2}; to sample from a d-dimensional Gaussian distribution, just sample X_1, …, X_d ∼ 𝒩(0,1) independently (easily generalizes to n dimensions).
Sampling the Unit Circle
• Plenty of other standard / simple distributions we need to sample from
• Q: Easy example: how do we draw samples uniformly from the unit circle S¹ := {x ∈ ℝ² : ∥x∥ = 1}?
• A: Just sample a value u uniformly from [-1,1], then evaluate (cos(πu), sin(πu))
• Q: How do you sample from [-1,1]? :-)
• A: Sample [0,1], multiply by 2, subtract 1
Sampling a 2D disk
• How do we get uniform samples on the unit disk D² := {x ∈ ℝ² : ∥x∥ ≤ 1}?
• First idea: uniformly sample r, u from [0,1], then plug in to r(cos(2πu), sin(2πu))
• Doesn’t work—nonuniform!
  – too many samples near the origin…
• Simple modification (uniform): √u (cos(2πv), sin(2πv)) for u, v ∼ U[0,1]
  – we’ll derive this formula later!
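A minimal sketch contrasting the two formulas (helper names are ours):

from math import sqrt, cos, sin, pi
from random import random

def sample_disk_wrong():
    r, u = random(), random()
    return (r*cos(2*pi*u), r*sin(2*pi*u))   # clusters near the origin

def sample_disk_uniform():
    u, v = random(), random()
    r = sqrt(u)                             # area-correct radius
    return (r*cos(2*pi*v), r*sin(2*pi*v))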
Sampling a d-Dimensional Sphere
• Q: How do we uniformly sample the d-dimensional sphere S^d := {x ∈ ℝ^{d+1} : ∥x∥ = 1}?
  – Hint: since the L2 norm √(x_1² + ⋯ + x_{d+1}²) doesn’t depend on the choice of coordinate system, a Gaussian e^{−∥x∥²} exhibits spherical symmetry
• A: Just generate d+1 normally-distributed coordinates x_1, …, x_{d+1} ∼ 𝒩(0,1) and normalize them (y := x/∥x∥).
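A minimal sketch of this recipe:

import math, random

def sample_sphere(d):
    # d+1 independent standard normals, projected onto the unit sphere
    x = [random.gauss(0.0, 1.0) for _ in range(d + 1)]
    norm = math.sqrt(sum(xi*xi for xi in x))
    return [xi / norm for xi in x]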
Sampling a d-Dimensional Ball
• First sample a point y uniformly from the d-dimensional sphere
• Then draw a value a uniformly from [0,1]
• A uniform point on the ball is then a^{1/d} y
• Food for thought: can you derive this formula (using the techniques we’ll see later)?
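A minimal sketch, using sample_sphere from above:

import random

def sample_ball(d):
    y = sample_sphere(d - 1)        # uniform point on the (d-1)-sphere in R^d
    a = random.random()
    return [(a ** (1.0/d)) * yi for yi in y]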
Sampling a Random Rotation
• In general, the normal distribution is a starting point for sampling many other basic distributions
• E.g., to sample a d-dimensional rotation matrix Q (Q^⊤Q = I, det(Q) > 0):
  – sample a d × d matrix A where each entry is sampled from the standard (1D) normal distribution
  – compute the spectral decomposition A = QΛQ^⊤
  – then Q is a matrix uniformly sampled* from the rotation group
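A hedged sketch of a closely related construction: the QR factorization of a Gaussian matrix, with a standard sign correction, also yields uniformly distributed rotations (this is an alternative to the spectral approach described above, not the slide’s exact method):

import numpy as np

def random_rotation(d, rng=None):
    rng = rng or np.random.default_rng()
    A = rng.standard_normal((d, d))    # iid N(0,1) entries
    Q, R = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(R))        # fix column signs: uniform over O(d)
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]             # restrict to rotations, det(Q) = +1
    return Q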
Lots of neat tricks we can play to sample basic distributions…
…but just like integrals, there are only so many distributions we can sample in closed form.
How can we handle more general distributions?
[Figure: a general density p(x)]
Discrete Distributions
Sampling Discrete Distributions
• Consider a discrete random variable X with outcomes x_1, …, x_n, and associated probabilities p_1 + ⋯ + p_n = 1
• How do we construct an algorithmic procedure that models this random variable?
• I.e., the probability of sampling x_i is p_i
  – repeatedly running the procedure N times yields each outcome x_i with fraction approaching p_i as N → ∞
• Several methods
  – inversion, rejection, resampling, alias table…
[Figure: bar chart of probabilities 0.29, 0.13, 0.14, 0.11, 0.11, …]
Review: Cumulative Distribution
For any discrete random variable X, the cumulative distribution function P(x) is the monotonically increasing function given by P(x) := Pr[X ≤ x]; each cumulative probability P_k is just the sum of the first k probabilities p_i.
[Figure: CDF of the sum of two dice, stepping from 0 up to 1 over outcomes 2–12, with ticks at 1/4, 1/2, 3/4, 1]
Computing the CDF
• Want to compute P_k := ∑_{i=1}^k p_i for k ∈ 1, …, n
• Q: What’s the asymptotic cost of evaluating all n sums directly?
• A: Have to sum one term for the 1st sum, two terms for the 2nd…
  – 1 + 2 + ⋯ + n = (n² + n)/2 ⟹ O(n²)
• Q: Can you do it faster, in O(n) time?
• A: Sure, just store the “partial sums”
  – P_1 = p_1, P_2 = P_1 + p_2, …
  – known as cumulative sum/prefix sum/scan
  – note: a parallel scan can build the CDF in O(log n) time
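A minimal sketch: an O(n) prefix-sum construction, plus O(log n) discrete sampling by binary search on the CDF (a preview of the inversion method below):

from bisect import bisect_left
from random import random

def build_cdf(p):
    P, total = [], 0.0
    for pk in p:
        total += pk            # P_k = P_{k-1} + p_k  (prefix sum / scan)
        P.append(total)
    return P

def sample_discrete(P):
    return bisect_left(P, random())   # smallest k with P_k >= u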
Alias Table
• Can get O(1) amortized sampling via an alias table
  – Key idea: since the area covered by region k is proportional to p_k, we can just sample uniformly from this rectangle.
• Construct via “Robin Hood” algorithm:
  – first compute the mean probability p̄ := (p_1 + ⋯ + p_n)/n
  – anybody above p̄ is “rich”, anyone below is “poor”, anyone equal is “fair”
  – iterate over events from left-to-right
  – then just rob from the rich, give to the poor…
  – keep track of who you robbed from
  – note: the rich can become poor!
• To sample from the alias table:
  – sample index k uniformly from 1, …, n
  – use a biased coin flip to pick the event in the kth column
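A hedged sketch of construction and sampling (a variant of the Robin Hood scheme above, in the style of Vose’s method; the list-based bookkeeping is ours):

import random

def build_alias(p):
    n = len(p)
    scaled = [pi * n for pi in p]              # compare against the mean 1/n
    prob, alias = [1.0] * n, list(range(n))    # leftover columns stay "fair"
    poor = [i for i in range(n) if scaled[i] < 1.0]
    rich = [i for i in range(n) if scaled[i] >= 1.0]
    while poor and rich:
        i, j = poor.pop(), rich.pop()          # rob rich j to fill poor column i
        prob[i], alias[i] = scaled[i], j
        scaled[j] -= 1.0 - scaled[i]           # note: the rich can become poor!
        (poor if scaled[j] < 1.0 else rich).append(j)
    return prob, alias

def sample_alias(prob, alias):
    k = random.randrange(len(prob))                       # uniform column
    return k if random.random() < prob[k] else alias[k]   # biased coin flip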
Inversion
Inversion Method—Overview
• How do we get formulas for special distributions (normal, etc.)?
• Often derived via a continuous version of the inversion method
• Inversion method for continuous distributions:
  – start with the probability distribution p(x) we want to sample from
  – construct the associated cumulative distribution P(x) via integration
  – find an explicit form for the inverse CDF, P^{−1}(y)
  – apply P^{−1} to uniformly-distributed samples
• Basic idea is to “warp” uniform samples into nonuniform ones
Inversion Method—1D
• More explicitly, consider a probability density p : [a, b] → ℝ_{≥0}
• The corresponding cumulative distribution function is given by P(x) := ∫_a^x p(y) dy
• The random variable Y := P^{−1}(U) for U ∼ U[0,1] then has probability density p
  – i.e., draw samples uniformly, then apply the inverse CDF
(Quite rare to be able to do everything in closed form!)
Inversion Method—Example
Consider the probability density p(x) := (π/2) sin(πx) on x ∈ [0,1]

Step I: integrate the PDF to get the CDF
P(x) = ∫_0^x (π/2) sin(πy) dy = [−(1/2) cos(πy)]_0^x = (1/2)(1 − cos(πx))

Step II: invert the CDF
y = (1/2)(1 − cos(πx)) ⟺ cos(πx) = 1 − 2y ⟺ x = arccos(1 − 2y)/π

Now just plug in random values y ∼ U[0,1].
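A minimal sketch of the resulting sampler:

from math import acos, pi
from random import random

def sample_sin_density():
    y = random()                    # y ~ U[0,1]
    return acos(1.0 - 2.0*y) / pi   # x = P^{-1}(y), distributed as (π/2) sin(πx)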
Inversion Method—2D
Can extend the inversion method to multiple dimensions, via the multivariable CDF. E.g., to sample from a two-dimensional region Ω ⊂ ℝ²:

Step I: define a map ϕ : [0,1]² → Ω to the region
Step II: measure the change in density ρ(u, v) := det Jϕ(u, v)
Step III: integrate ρ to get the marginal & conditional CDFs F(u) and G_u(v)
Step IV: invert the distributions f(u) := F^{−1}(u), g(u, v) := G_u^{−1}(v)

Q: Does this story remind you of anything?
A: It’s what we ideally want to do with importance sampling! Maybe gives a sense of why that’s hard to do…
Rejection Sampling
Rejection Sampling—Overview
• Inversion method is great when it works…
– …can’t always invert & integrate a distribution!
• In general, will need more universal techniques
• Rejection sampling
– stupid but simple
– essentially just (repeatedly) “guess and check”
– example of a Las Vegas algorithm: always yields a correct result, but run time is nondeterministic
Rejection Sampling—Region
• Suppose we want to rejection sample a region Ω ⊂ ℝ²
  – only requirement: must be able to answer the query “is a point x inside Ω?”
• Let A ⊃ Ω be a region that we know how to (efficiently) sample uniformly
  – e.g., a rectangle bounding Ω
• Until we find a point x ∈ Ω:
  – draw a point x ∼ U_A
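A minimal sketch of this loop (the function parameters are illustrative):

def rejection_sample(sample_A, inside_Omega):
    while True:
        x = sample_A()         # uniform on the bounding region A
        if inside_Omega(x):
            return x           # accepted samples are uniform on Ω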
Example—Unit Disk
• Suppose our region is the unit disk Ω := {x ∈ ℝ² : ∥x∥ ≤ 1}, bounded by the box A = [−1,1]² with corner (1,1)
• Q: If I have a function rand() that gives uniform values in [0,1], how do I uniformly sample the (smallest) box around Ω?
• A: Evaluate u,v = [2*rand()-1, 2*rand()-1]
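Putting it together with rejection_sample from above:

from random import random as rand

def sample_box():
    return (2*rand() - 1, 2*rand() - 1)        # uniform on [-1,1]^2

def inside_disk(p):
    return p[0]*p[0] + p[1]*p[1] <= 1.0

x = rejection_sample(sample_box, inside_disk)  # uniform on the unit disk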
Analysis of Rejection Sampling
• How many trials should we expect rejection sampling to take on average?
  – each trial costs us more PRNG invocations
• Q: What’s the probability p_1 that we accept the 1st sample?
• A: p_1 = |Ω| / |A|
• Q: What’s the probability we accept the 2nd sample?
• A: p_2 = (1 − p_1) p_1
• Q: What’s the probability we accept the kth sample?
• A: p_k = (1 − p_1)^{k−1} p_1
• Q: What’s the expected number of trials?
• A: ∑_{k=1}^∞ k p_k = ∑_{k=1}^∞ k (1 − p_1)^{k−1} p_1 = 1/p_1
  – e.g., for the unit disk: 1/(π/4) ≈ 1.27
Disk—Rejection Sampling vs. Inversion
• Earlier, saw how to sample from the disk using inversion
• Which approach is better: rejection or inversion?
• Rejection
  – evaluate x² + y² multiple times (~1.27 on average)
  – call PRNG multiple times (~2 × 1.27 = 2.54 on average)
• Inversion
  – evaluate square root, cosine, and sine once
  – call PRNG twice (radius and angle)
• Winner: …depends on a lot of factors!
  – speed of PRNG, cost of special functions, multithreading…
• Universal rule: profile, profile, profile!
  – i.e., which gives best results for the actual application?
Rejection Sampling the n-Dimensional Ball
• Still, overall, rejection sampling is doomed by volume in high dimensions: the acceptance probability |ball|/|cube| shrinks rapidly toward zero as the dimension n grows

Resampling
• Step III. Draw N samples X_i from the discrete distribution with events Y_i and normalized weights W_i := w_i / ∑_j w_j.
Summary
Takeaway—Monte Carlo Integration
If you forget everything else from this class, remember this: