
STAT 535: Chapter 5:

More Conjugate Priors

David B. Hitchcock
E-Mail: hitchcock@stat.sc.edu

Spring 2022



Why are Conjugate Priors Nice?

▶ Recall that a conjugate prior is a prior which (along with the data model) produces a posterior distribution that has the same functional form as the prior (but with new, updated parameter values).
▶ In the Beta-binomial setup, the beta prior was conjugate because the posterior was also a beta distribution.
▶ Conjugate priors are nice because
  1. we can typically derive the posterior without needing any difficult computation;
  2. it is typically easy to understand the respective contributions of the prior information and the data information to the posterior.
▶ We will now examine a couple of other Bayesian models with conjugate priors.



The Poisson Distribution

▶ Recall that the Poisson distribution is a common model for count data: data whose possible values are the nonnegative integers 0, 1, 2, . . ..
▶ The Poisson distribution is indexed by a parameter λ > 0, and (given λ) the pmf of a Poisson random variable Y | λ is:

  f(y | λ) = λ^y e^{−λ} / y!

▶ If our data consist of a random sample of n such counts, then the likelihood function is the joint density function f(y₁ | λ) f(y₂ | λ) · · · f(yₙ | λ), since Y₁, Y₂, . . . , Yₙ are independent.



Choice of Prior
▶ When our data model is Poisson, what is a good choice for the prior for the parameter λ?
▶ Since λ > 0, we should use as a prior some distribution whose support is (0, ∞).
▶ The Gamma distribution is a good choice for the prior, since its support is (0, ∞).
▶ Note that the parameterization of the Gamma distribution that we will use in this class is different from the one in the STAT 511 course.
▶ We will consider a Gamma pdf with a shape parameter s and a rate parameter r:

  f(λ) = [r^s / Γ(s)] λ^{s−1} e^{−rλ},  λ > 0.

▶ Note that the rate parameter is the reciprocal of the scale parameter used in the other parameterization.
The Gamma/Poisson Bayesian Model
▶ If our data Y₁, . . . , Yₙ are iid Poisson(λ), then a Gamma(s, r) prior on λ is a conjugate prior.

Likelihood:

  L(λ | y) = ∏_{i=1}^n [ e^{−λ} λ^{yᵢ} / yᵢ! ] = e^{−nλ} λ^{Σ yᵢ} / ∏_{i=1}^n (yᵢ!)

Prior:

  f(λ) = [r^s / Γ(s)] λ^{s−1} e^{−rλ},  λ > 0.

⇒ Posterior:

  f(λ | y) ∝ λ^{Σ yᵢ + s − 1} e^{−(n+r)λ},  λ > 0.

⇒ f(λ | y) is Gamma(Σ yᵢ + s, n + r). (Conjugate!)
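Carrying out this update in R is simple arithmetic; here is a minimal sketch (the helper name gamma_poisson_update is mine, not from the slides):

  # Sketch: conjugate Gamma-Poisson update.
  # Posterior shape = prior shape + sum of the counts;
  # posterior rate  = prior rate  + sample size.
  gamma_poisson_update <- function(y, s, r) {
    c(shape = s + sum(y), rate = r + length(y))
  }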



Properties of the Gamma (Mean)

▶ Under this shape/rate parameterization, the mean of the Gamma(s, r) prior distribution is

  E(λ) = s / r

▶ Based on our prior beliefs, we would choose appropriate values of the hyperparameters s and r.
▶ Similarly, the mean of the Gamma(Σ yᵢ + s, n + r) posterior distribution is

  E(λ | y) = (Σ yᵢ + s) / (n + r)

▶ This posterior mean could be used as a Bayesian estimator of λ.



Properties of the Gamma (Variance)

▶ If we have a good guess of the prior mean of λ, how can we specifically choose which s and r to use in our prior?
▶ Under this shape/rate parameterization, the variance of the Gamma(s, r) prior distribution is

  Var(λ) = s / r²

▶ The prior variance (and standard deviation) can guide our choices of s and r.
▶ Plotting the potential prior using the plot_gamma function in the bayesrules package can also be helpful in choosing the prior.
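For example, here is a base-R sketch that overlays a few candidate priors sharing the same mean s/r (the particular shape/rate pairs are illustrative guesses, not the course's choices; the plot_gamma function mentioned above is a packaged alternative):

  # Overlay three Gamma priors that all have mean s/r = 5.
  curve(dgamma(x, shape = 5,  rate = 1), from = 0, to = 15,
        xlab = expression(lambda), ylab = "prior density")
  curve(dgamma(x, shape = 10, rate = 2), add = TRUE, lty = 2)
  curve(dgamma(x, shape = 20, rate = 4), add = TRUE, lty = 3)
  legend("topright", c("Gamma(5,1)", "Gamma(10,2)", "Gamma(20,4)"), lty = 1:3)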



The Posterior Mean in the Gamma/Poisson Bayesian Model

▶ The posterior mean is:

  λ̂_B = (Σ yᵢ + s) / (n + r)
      = Σ yᵢ / (n + r) + s / (n + r)
      = [n / (n + r)] (Σ yᵢ / n) + [r / (n + r)] (s / r)

▶ Again, the data get weighted more heavily as n → ∞.



Example: Fraud Risk Phone Calls

▶ The textbook gives an example using data on fraud risk phone calls per day, which can be modeled with a Poisson distribution.
▶ The parameter of interest is λ, the mean number of fraud risk calls per day.
▶ Prior belief: The mean number of such calls per day is around 5.
▶ So let's choose s and r so that s/r = 5.
▶ Also, we believe that λ is very likely between 2 and 7.
▶ Let's try to plot a few possible priors that have s/r = 5 (see R examples).



Example: Fraud Risk Phone Calls

▶ The choice of s = 10 and r = 2 seems to reflect our prior beliefs.
▶ We collect n = 4 counts as our data: 6, 2, 2, 1 (so Σ yᵢ = 11 and ȳ = 2.75).
▶ So our posterior is Gamma(Σ yᵢ + s, n + r) = Gamma(11 + 10, 4 + 2) = Gamma(21, 6).
▶ A Bayesian estimate of λ is thus the posterior mean, 21/6 = 3.5.
▶ See R plots to see how the data have updated our prior beliefs; a quick numeric check follows below.
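Checking this update in base R (all numbers from the slide):

  y <- c(6, 2, 2, 1)                 # observed daily counts, n = 4
  s <- 10; r <- 2                    # Gamma(10, 2) prior
  s + sum(y)                         # posterior shape: 10 + 11 = 21
  r + length(y)                      # posterior rate:  2 + 4  = 6
  (s + sum(y)) / (r + length(y))     # posterior mean: 21/6 = 3.5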



Bayesian Inference: Posterior Intervals

▶ Simple values like the posterior mean E[θ | y] and posterior variance var[θ | y] can be useful in learning about θ.
▶ Quantiles of p(θ | y) (especially the posterior median) can also be a useful summary of θ.
▶ The ideal summary of θ is an interval (or region) with a certain probability of containing θ.
▶ Note that a classical (frequentist) confidence interval does not exactly have this interpretation.



Bayesian Credible Intervals

▶ A credible interval (or in general, a credible set) is the Bayesian analogue of a confidence interval.
▶ A 100(1 − α)% credible set C is a subset of Θ such that

  ∫_C p(θ | y) dθ = 1 − α.

▶ If the parameter space Θ is discrete, a sum replaces the integral.



Quantile-Based Intervals

▶ If θ*_L is the α/2 posterior quantile for θ, and θ*_U is the (1 − α/2) posterior quantile for θ, then (θ*_L, θ*_U) is a 100(1 − α)% credible interval for θ.

Note: P[θ < θ*_L | y] = α/2 and P[θ > θ*_U | y] = α/2.

⇒ P{θ ∈ (θ*_L, θ*_U) | y} = 1 − P{θ ∉ (θ*_L, θ*_U) | y}
                           = 1 − (P[θ < θ*_L | y] + P[θ > θ*_U | y])
                           = 1 − α.
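In R, a quantile-based interval is just a pair of posterior quantiles. For the Gamma(21, 6) posterior from the fraud-call example:

  qgamma(c(0.025, 0.975), shape = 21, rate = 6)
  # roughly (2.17, 5.15), as in the figure on the next slide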



Quantile-Based Intervals
Picture: the Gamma(21, 6) posterior density f(λ | y), with area 0.025 in each tail and area 0.95 between the two quantiles.

Figure: Between 2.17 and 5.15 is posterior probability 0.95.



Example 2: Quantile-Based Interval

▶ Consider 10 flips of a coin having P{Heads} = θ.
▶ Suppose we observe 2 “heads”.
▶ We model the count of heads as binomial:

  p(y | θ) = (10 choose y) θ^y (1 − θ)^{10−y},  y = 0, 1, . . . , 10.

▶ Let's use a uniform prior for θ: p(θ) = 1, 0 ≤ θ ≤ 1.



Example 2: Quantile-Based Interval

▶ Then the posterior is:

  p(θ | y) ∝ p(θ) L(θ | y)
           = (1) (10 choose y) θ^y (1 − θ)^{10−y}
           ∝ θ^y (1 − θ)^{10−y},  0 ≤ θ ≤ 1.

▶ This is a beta distribution for θ with parameters y + 1 and 10 − y + 1.
▶ Since y = 2 here, p(θ | y = 2) is beta(3, 9).
▶ The 0.025 and 0.975 quantiles of a beta(3, 9) are (0.0602, 0.5178), which gives a 95% credible interval for θ.
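The same computation in R:

  qbeta(c(0.025, 0.975), shape1 = 3, shape2 = 9)
  # approximately (0.0602, 0.5178)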



HPD Intervals / Regions

▶ The equal-tail credible interval approach is ideal when the posterior distribution is symmetric.
▶ But what if p(θ | y) is skewed?

Picture: a skewed posterior density π(θ), plotted for θ from 0 to 15.



HPD Intervals / Regions

▶ Note that values of θ around 1 have much higher posterior density than values around 7.5.
▶ Yet 7.5 is in the equal-tails interval and 1 is not!
▶ A better approach here is to create our interval of θ-values having the Highest Posterior Density (HPD).



HPD Intervals / Regions

Defn: A 100(1 − α)% HPD region for θ is a subset C ⊆ Θ defined by

  C = {θ : p(θ | y) ≥ k},

where k is the largest number such that

  ∫_{θ : p(θ|y) ≥ k} p(θ | y) dθ = 1 − α.

▶ The value k can be thought of as a horizontal line placed over the posterior density whose intersection(s) with the posterior define regions with probability 1 − α.
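For a unimodal posterior, the HPD interval is equivalently the shortest interval with probability 1 − α. A hedged R sketch of that idea (the helper name and the search via optimize() are mine, not the course's code), using the Gamma(21, 6) posterior:

  # Search over the left-tail probability p for the shortest interval with
  # coverage `conf`; for a unimodal density this is the HPD interval.
  hpd_gamma <- function(shape, rate, conf = 0.90) {
    width <- function(p) qgamma(p + conf, shape, rate) - qgamma(p, shape, rate)
    p_opt <- optimize(width, interval = c(0, 1 - conf))$minimum
    qgamma(c(p_opt, p_opt + conf), shape, rate)
  }
  hpd_gamma(21, 6, conf = 0.90)   # close to (2.25, 4.72), as on a later slide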



HPD Intervals / Regions

Picture: (90% HPD Interval) the Gamma(21, 6) posterior density, with area 0.90 under the curve between the interval endpoints.

⇒ P{θ*_L < θ < θ*_U} = 0.90.

The values between θ*_L = 2.25 and θ*_U = 4.72 here have the highest posterior density.



HPD Intervals / Regions

▶ The HPD region will be an interval when the posterior is unimodal.
▶ If the posterior is multimodal, the HPD region might be a discontiguous set.

Picture: a bimodal posterior density f(λ | y), plotted for λ from 0 to 8.

▶ The set {θ : θ ∈ (2.85, 4.1) ∪ (6.0, 7.25)} is the HPD region for θ here.



Example 1 Revisited: HPD Interval

▶ See the course web page for finding an HPD interval in R for λ in the fraud risk call example.
▶ A 90% quantile-based credible interval for λ is (2.345, 4.844).
▶ Also note the hpd function in the TeachingDemos package in R (see the sketch below).
▶ See code for Example 2 (coin-flipping data) in R.
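A minimal usage sketch, assuming the TeachingDemos interface in which hpd takes the posterior quantile function plus its parameters:

  library(TeachingDemos)
  hpd(qgamma, shape = 21, rate = 6, conf = 0.90)   # 90% HPD interval for λ
  hpd(qbeta, shape1 = 3, shape2 = 9)               # 95% HPD for θ (Example 2)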



The Normal-Normal Model

▶ Why is it so common to model data using a normal distribution?
▶ Approximately normally distributed quantities appear often in nature.
▶ The CLT tells us any variable that is basically a sum of independent components should be approximately normal.
▶ Note ȳ and S² are independent when sampling from a normal population, so if beliefs about the mean are independent of beliefs about the variance, a normal model may be appropriate.



Why Normal Models?

▶ The normal model is analytically convenient (exponential family; sufficient statistics ȳ and S²).
▶ Inference about the population mean based on a normal model will be correct as n → ∞ even if the data are truly non-normal.
▶ When we assume a normal likelihood, we can get a wide class of posterior distributions by using different priors.



A Conjugate analysis with Normal Data (variance known)

▶ Simple situation: Assume data Y₁, . . . , Yₙ are iid N(µ, σ²), with µ unknown and σ² known.
▶ We will make inference about µ.
▶ The likelihood is

  L(µ | y) = ∏_{i=1}^n (2πσ²)^{−1/2} e^{−(Yᵢ−µ)²/(2σ²)}

▶ The parameter of interest µ can take values from −∞ to ∞.
▶ A conjugate prior for µ is µ ∼ N(δ, τ²):

  p(µ) = (2πτ²)^{−1/2} e^{−(µ−δ)²/(2τ²)}



A Conjugate analysis with Normal Data (variance known)

So the posterior is:

  p(µ | y) ∝ L(µ | y) p(µ)
           ∝ [ ∏_{i=1}^n e^{−(Yᵢ−µ)²/(2σ²)} ] e^{−(µ−δ)²/(2τ²)}
           = exp{ −(1/2) [ (1/σ²) Σ_{i=1}^n (Yᵢ − µ)² + (1/τ²)(µ − δ)² ] }
           = exp{ −(1/2) [ (1/σ²) Σ_{i=1}^n (Yᵢ² − 2Yᵢµ + µ²) + (1/τ²)(µ² − 2µδ + δ²) ] }



A Conjugate analysis with Normal Data (variance known)

So the posterior is:

  p(µ | y) ∝ exp{ −(1/(2σ²τ²)) [ τ² Σ Yᵢ² − 2τ²µnȳ + nµ²τ² + σ²µ² − 2σ²µδ + σ²δ² ] }
           = exp{ −(1/(2σ²τ²)) [ µ²(σ² + nτ²) − 2µ(δσ² + τ²nȳ) + δ²σ² + τ² Σ Yᵢ² ] }
           ∝ exp{ −(1/2) [ µ² (1/τ² + n/σ²) − 2µ (δ/τ² + nȳ/σ²) + k ] }

(where k is some constant)



A Conjugate analysis with Normal Data (variance known)

Hence

  p(µ | y) ∝ exp{ −(1/2)(1/τ² + n/σ²) [ µ² − 2µ (δ/τ² + nȳ/σ²) / (1/τ² + n/σ²) + k ] }
           ∝ exp{ −(1/2)(1/τ² + n/σ²) [ µ − (δ/τ² + nȳ/σ²) / (1/τ² + n/σ²) ]² }



A Conjugate analysis with Normal Data (variance known)

▶ Hence the posterior for µ is simply a normal distribution with mean

  (δ/τ² + nȳ/σ²) / (1/τ² + n/σ²)

and variance

  (1/τ² + n/σ²)^{−1} = σ²τ² / (σ² + nτ²)

▶ The precision is the reciprocal of the variance.
▶ Here, 1/τ² is the prior precision . . .
▶ n/σ² is the data precision . . .
▶ . . . and 1/τ² + n/σ² is the posterior precision.



A Conjugate analysis with Normal Data (variance known)

▶ Note the posterior mean E[µ | y] is simply

  [ (1/τ²) / (1/τ² + n/σ²) ] δ + [ (n/σ²) / (1/τ² + n/σ²) ] ȳ,

a combination of the prior mean and the sample mean.
▶ If the prior is highly precise, the weight is large on δ.
▶ If the data are highly precise (e.g., when n is large), the weight is large on ȳ.
▶ Clearly as n → ∞, E[µ | y] ≈ ȳ and var[µ | y] ≈ σ²/n if we choose a large prior variance τ².
▶ This implies that for τ² large and n large, Bayesian and frequentist inference about µ will be nearly identical.
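These formulas translate directly into R; a minimal sketch (the helper name is mine, not from the slides):

  # Posterior of µ when σ² is known: N(delta, tau2) prior combined with data y.
  normal_known_var_posterior <- function(y, sigma2, delta, tau2) {
    n <- length(y)
    prec <- 1 / tau2 + n / sigma2                       # posterior precision
    c(mean = (delta / tau2 + n * mean(y) / sigma2) / prec,
      var  = 1 / prec)
  }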



A Conjugate analysis with Normal Data (mean known)
▶ Now suppose Y₁, . . . , Yₙ are iid N(µ, σ²) with µ known and σ² unknown.
▶ We will make inference about σ².
▶ Our likelihood is

  L(σ² | y) ∝ (σ²)^{−n/2} exp{ −(1/(2σ²)) Σ_{i=1}^n (Yᵢ − µ)² }

▶ Let W denote the sufficient statistic (1/n) Σ (Yᵢ − µ)².
▶ The conjugate prior for σ² is the inverse gamma distribution.
▶ If a r.v. Y ∼ gamma, then 1/Y ∼ inverse gamma (IG).
▶ The prior for σ² is

  p(σ²) = [β^α / Γ(α)] (σ²)^{−(α+1)} e^{−β/σ²}  for σ² > 0,

where α > 0, β > 0.
A Conjugate analysis with Normal Data (mean known)
▶ Note the prior mean and variance are

  E(σ²) = β / (α − 1), provided that α > 1
  var(σ²) = β² / [(α − 1)²(α − 2)], provided that α > 2

▶ So the posterior for σ² is:

  p(σ² | y) ∝ L(σ² | y) p(σ²)
            ∝ (σ²)^{−n/2} e^{−nw/(2σ²)} (σ²)^{−(α+1)} e^{−β/σ²}
            = (σ²)^{−(α + n/2 + 1)} e^{−(β + (n/2)w)/σ²}

▶ Hence the posterior is clearly an IG(α + n/2, β + (n/2)w) distribution, where w = (1/n) Σ (Yᵢ − µ)². Conjugate!
A Conjugate analysis with Normal Data (mean known)

▶ How to choose the prior parameters α and β?
▶ Note

  α = [E(σ²)]² / var(σ²) + 2  and  β = E(σ²) ( [E(σ²)]² / var(σ²) + 1 ),

so we could make guesses about E(σ²) and var(σ²) and use these to determine α and β.
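A small R sketch of this moment matching (the helper name and the numeric guesses are illustrative only, not from the course):

  # Convert guesses for E(σ²) and var(σ²) into IG(α, β) hyperparameters.
  ig_params_from_guesses <- function(mean_guess, var_guess) {
    c(alpha = mean_guess^2 / var_guess + 2,
      beta  = mean_guess * (mean_guess^2 / var_guess + 1))
  }
  ig_params_from_guesses(0.02, 0.01^2)   # illustrative guesses only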



A Model for Normal Data (mean and variance both unknown)

▶ When Y₁, . . . , Yₙ are iid N(µ, σ²) with both µ, σ² unknown, the conjugate prior for the mean explicitly depends on the variance:

  p(σ²) ∝ (σ²)^{−(α+1)} e^{−β/σ²}
  p(µ | σ²) ∝ (σ²)^{−1/2} exp{ −(µ − δ)² / (2σ²/s₀) }

▶ The prior parameter s₀ measures the analyst's confidence in the prior specification.
▶ When s₀ is large, we strongly believe in our prior.


A Model for Normal Data (mean and variance both unknown)

The joint posterior for (µ, σ²) is:

  p(µ, σ² | y) ∝ L(µ, σ² | y) p(σ²) p(µ | σ²)
               ∝ (σ²)^{−α − n/2 − 3/2} exp{ −β/σ² − (1/(2σ²)) Σ_{i=1}^n (Yᵢ − µ)² − (µ − δ)²/(2σ²/s₀) }
               = (σ²)^{−α − n/2 − 3/2} exp{ −β/σ² − (1/(2σ²))(Σ Yᵢ² − 2nȳµ + nµ²) − (1/(2σ²/s₀))(µ² − 2µδ + δ²) }
               = [ (σ²)^{−α − n/2 − 1/2} exp{ −β/σ² − (1/(2σ²))(Σ Yᵢ² − nȳ²) } ]
                 × [ (σ²)^{−1} exp{ −(1/(2σ²)) [ (n + s₀)µ² − 2(nȳ + δs₀)µ + (nȳ² + s₀δ²) ] } ]

Note the second part is simply a normal kernel for µ.



A Model for Normal Data (mean and variance both unknown)

▶ To get the posterior for σ², we integrate out µ:

  p(σ² | y) = ∫_{−∞}^{∞} p(µ, σ² | y) dµ
            ∝ (σ²)^{−α − n/2 − 1/2} exp{ −(1/σ²) [ β + (1/2)(Σ Yᵢ² − nȳ²) ] }

since the second piece (which depends on µ) just integrates to a normalizing constant.
▶ Hence, since −α − n/2 − 1/2 = −(α + n/2 − 1/2) − 1, we see the posterior for σ² is inverse gamma:

  σ² | y ∼ IG( α + n/2 − 1/2, β + (1/2) Σ (Yᵢ − ȳ)² )



A Model for Normal Data (mean and variance both unknown)

▶ Note that

  p(µ | σ², y) = p(µ, σ² | y) / p(σ² | y)

▶ After lots of cancellation,

  p(µ | σ², y) ∝ σ^{−2} exp{ −(1/(2σ²)) [ (n + s₀)µ² − 2(nȳ + δs₀)µ + (nȳ² + s₀δ²) ] }
              = σ^{−2} exp{ −(1/(2σ²/(n + s₀))) [ µ² − 2 ((nȳ + δs₀)/(n + s₀)) µ + (nȳ² + s₀δ²)/(n + s₀) ] }

▶ Clearly p(µ | σ², y) is normal:

  µ | σ², y ∼ N( (nȳ + δs₀)/(n + s₀), σ²/(n + s₀) )


A Model for Normal Data (mean and variance both unknown)

▶ Note as s₀ → 0, µ | σ², y is approximately N(ȳ, σ²/n).
▶ Note also the conditional posterior mean is

  [n / (n + s₀)] ȳ + [s₀ / (n + s₀)] δ.

▶ The relative sizes of n and s₀ determine the weighting of the sample mean ȳ and the prior mean δ.



A Model for Normal Data (mean and variance both unknown)

The marginal posterior for µ is:

  p(µ | y) = ∫₀^∞ p(µ, σ² | y) dσ²
           = ∫₀^∞ (σ²)^{−α − n/2 − 3/2} exp{ −[2β + (s₀ + n)(µ − δ)²] / (2σ²) } dσ²

Letting A = 2β + (s₀ + n)(µ − δ)² and z = A/(2σ²), so that σ² = A/(2z) and dσ² = −A/(2z²) dz,



A Model for Normal Data (mean and variance both unknown)

  p(µ | y) ∝ ∫₀^∞ (A/(2z))^{−α − n/2 − 3/2} (A/(2z²)) e^{−z} dz
           = ∫₀^∞ (A/(2z))^{−α − n/2 − 1/2} (1/z) e^{−z} dz
           ∝ A^{−α − n/2 − 1/2} ∫₀^∞ z^{α + n/2 + 1/2 − 1} e^{−z} dz

This integrand is the kernel of a gamma density, and thus the integral is a constant. So


A Model for Normal Data (mean and variance both unknown)

  p(µ | y) ∝ A^{−α − n/2 − 1/2}
           = [ 2β + (s₀ + n)(µ − δ)² ]^{−(2α+n+1)/2}
           ∝ [ 1 + (s₀ + n)(µ − δ)² / (2β) ]^{−(2α+n+1)/2},

which is a scaled and shifted t kernel with center δ and n + 2α degrees of freedom.



Example 1: Midge Data

▶ Example 1: Y₁, . . . , Y₉ are a random sample of midge wing lengths (in mm). Assume the Yᵢ's are iid N(µ, σ²).
▶ Example 1(a): If we know σ² = 0.01, make inference about µ. (See R example.)
▶ A Bayesian point estimate for the population mean midge wing length is the posterior mean, 1.806 mm.
▶ A 95% credible interval for µ is (1.741, 1.871), so with posterior probability 0.95, the population mean midge wing length is between 1.741 and 1.871 mm.



Example 1: Midge Data

▶ Example 1(b): Make inference about µ and σ², both unknown (see R example).
▶ This requires choosing the hyperparameters α and β of the inverse gamma prior on σ².
▶ 95% credible interval for σ²: (0.012, 0.028), with posterior median 0.0188.
▶ To approximate the posterior distribution for µ, we will randomly generate many values from the posterior distribution of σ².
▶ Then we will generate many values from the posterior of µ, given each respective generated value of σ² (see the sketch after this list).
▶ 95% credible interval for µ: (1.727, 1.90), with posterior median 1.81 mm.
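A hedged R sketch of that two-step Monte Carlo (every numeric setting below is a placeholder, not the course's actual prior or data):

  set.seed(1)
  n <- 9; ybar <- 1.8                          # placeholder data summaries
  alpha_post <- 7.5; beta_post <- 0.1          # placeholder IG posterior for σ²
  s0 <- 1; delta <- 1.9                        # placeholder prior settings for µ
  # Step 1: draw σ² from IG(alpha_post, beta_post) via the gamma reciprocal.
  sigma2_draws <- 1 / rgamma(10000, shape = alpha_post, rate = beta_post)
  # Step 2: draw µ given each σ² from its normal conditional posterior.
  mu_draws <- rnorm(10000,
                    mean = (n * ybar + delta * s0) / (n + s0),
                    sd   = sqrt(sigma2_draws / (n + s0)))
  quantile(mu_draws, c(0.025, 0.5, 0.975))     # posterior summary for µ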



Example 2: Brain Data

▶ The textbook has an example of Bayesian inference about the mean hippocampal volume of the brain in a population of college football players who have a history of concussions.
▶ Example 2: Y₁, . . . , Y₂₅ are a random sample of hippocampal volumes (in cm³) of such football players. Assume the Yᵢ's are iid N(µ, σ²).
▶ Example 2(a): If we know σ = 0.5 ⇒ σ² = 0.25, make inference about µ. We assume a N(6.5, 0.4²) prior on µ.
▶ The posterior mean is 5.78 cm³. With posterior probability 0.95, the mean hippocampal volume of the brains for the population of concussed players is between 5.59 and 5.97 cm³.

