STA732
Statistical Inference
Lecture 09: Bayesian estimation
Yuansi Chen
Spring 2023
Duke University
https://www2.stat.duke.edu/courses/Spring23/sta732.01/
1
Recap from Lecture 08
1. Construct minimum risk equivariant (MRE) estimator via
conditioning on maximal invariant statistics
2. Pitman estimator of location
3. MRE for location is unbiased under squared error loss
4. MRE usually admissible
2
Where we are
• We have finished the first approach of arguing for “the best”
estimator in point estimation: by restricting to a small set of
estimators
• Unbiased estimators
• Equivariant estimators
• We begin the second approach: global measure of optimality
• average risk
• minimax risk
3
Goal of Lecture 09
1. Bayes risk, Bayes estimator
2. Examples
3. Bayes estimators are usually biased
4. Bayes estimators are usually admissible
Chap. 7 in Keener or Chap. 4 in Lehmann and Casella
4
Bayes risk, Bayes estimator
Recall the components of a decision problem
• Data 𝑋
• Model family P = {𝑃𝜃 ∶ 𝜃 ∈ Ω}, a collection of probability
distributions on the sample space
• Loss function 𝐿: 𝐿(𝜃, 𝑑) measures the loss incurred by the
decision 𝑑 when the true parameter is 𝜃
• Risk function 𝑅, 𝑅(𝜃, 𝛿) = 𝔼𝜃 [𝐿(𝜃, 𝛿)]
5
The frequentist motivation of the Bayesian setup
Motivation
It is in general hard to find a uniformly minimum risk estimator.
Oftentimes, we have risks that cross. This difficulty will not arise if
the performance is measured via a single number.
Def. Bayes risk
The Bayes risk is the average-case risk, i.e. the risk integrated w.r.t. some
measure Λ on Ω, called the prior.
Remark
For now, assume Λ(Ω) = 1 (Λ is a probability measure). Later we might
deal with improper priors.
6
Bayes risk
𝑅Bayes(Λ, 𝛿) = ∫_Ω 𝑅(𝜃, 𝛿) 𝑑Λ(𝜃) = 𝔼[𝑅(Θ, 𝛿)],
where Θ is a random variable with distribution Λ. By the tower property,
𝔼[𝑅(Θ, 𝛿)] = 𝔼[𝔼[𝐿(Θ, 𝛿(𝑋)) ∣ 𝑋]].
Both 𝑋 and Θ are considered random.
The frequentist understanding: average risk makes sense without believing the
parameter is random
7
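A minimal numerical sketch of the Bayes risk as an average over both Θ and 𝑋. The Binomial–Beta model and the posterior-mean estimator used here are borrowed from later in the lecture; the specific numbers are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta = 10, 2.0, 2.0

def delta(x):
    # posterior-mean estimator for Binomial(n, theta) data with a Beta(alpha, beta) prior
    return (alpha + x) / (alpha + beta + n)

# Monte Carlo approximation of R_Bayes(Lambda, delta) under squared error loss:
# draw Theta ~ Lambda, then X | Theta = theta ~ P_theta, and average the loss.
theta = rng.beta(alpha, beta, size=100_000)
x = rng.binomial(n, theta)
print(np.mean((delta(x) - theta) ** 2))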
Bayes estimator
An estimator 𝛿 which minimizes the average risk 𝑅Bayes (Λ, ⋅) is a
Bayes estimator.
8
Construct Bayes estimator
Thm 7.1 in Keener
Suppose Θ ∼ Λ, 𝑋 ∣ Θ = 𝜃 ∼ 𝑃𝜃 , and 𝐿(𝜃, 𝑑) ≥ 0 for all 𝜃 ∈ Ω and
all 𝑑. If
• 𝔼[𝐿(Θ, 𝛿0 )] < ∞ for some 𝛿0
• for a.e. 𝑥, there exists a 𝛿Λ (𝑥) minimizing
𝔼[𝐿(Θ, 𝑑) ∣ 𝑋 = 𝑥]
with respect to 𝑑
Then 𝛿Λ is a Bayes estimator.
In words: a Bayes estimator can be found by minimizing the posterior
expected loss 𝔼[𝐿(Θ, 𝑑) ∣ 𝑋 = 𝑥] over 𝑑, one 𝑥 at a time
9
proof of Thm 7.1
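A rough sketch of the argument (see Keener for the measure-theoretic details): for any estimator 𝛿 with finite Bayes risk,
𝑅Bayes(Λ, 𝛿) = 𝔼[𝔼[𝐿(Θ, 𝛿(𝑋)) ∣ 𝑋]] ≥ 𝔼[𝔼[𝐿(Θ, 𝛿Λ(𝑋)) ∣ 𝑋]] = 𝑅Bayes(Λ, 𝛿Λ),
since 𝛿Λ(𝑥) minimizes the inner conditional expectation for a.e. 𝑥; the assumption 𝔼[𝐿(Θ, 𝛿0(𝑋))] < ∞ ensures the Bayes risk of 𝛿Λ is finite, so 𝛿Λ genuinely attains the minimum.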
10
Posterior
Def. Posterior
The conditional distribution of Θ given 𝑋, written ℒ(Θ ∣ 𝑋), is
called the posterior distribution.
Remark
• Λ is usually interpreted as prior belief about Θ before seeing
the data
• ℒ(Θ ∣ 𝑋) is the belief after seeing the data
11
Posterior calculation with density
Suppose prior density 𝜆(𝜃), likelihood 𝑝𝜃 (𝑥), then the posterior
density is
𝜆(𝜃 ∣ 𝑥) = 𝜆(𝜃)𝑝𝜃(𝑥) / 𝑞(𝑥),
where 𝑞(𝑥) = ∫_Ω 𝜆(𝜃)𝑝𝜃(𝑥) 𝑑𝜃 is the marginal density of 𝑋.
Then the Bayes estimator has the form
𝛿Λ(𝑥) = arg min_𝑑 ∫_Ω 𝐿(𝜃, 𝑑) 𝜆(𝜃 ∣ 𝑥) 𝑑𝜃
12
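A minimal numerical sketch of this recipe; the Binomial likelihood, Beta(2, 2) prior, and observed value are assumptions for illustration only. It computes 𝜆(𝜃 ∣ 𝑥) on a grid and minimizes the posterior expected loss over 𝑑 for the observed 𝑥.

import numpy as np
from scipy import stats

n, x = 10, 7
grid = np.linspace(1e-4, 1 - 1e-4, 2001)      # grid over Omega = (0, 1)
dx = grid[1] - grid[0]
prior = stats.beta.pdf(grid, 2, 2)            # lambda(theta)
lik = stats.binom.pmf(x, n, grid)             # p_theta(x)
post = prior * lik
post /= post.sum() * dx                       # lambda(theta | x) = lambda(theta) p_theta(x) / q(x)

# Bayes estimator under squared error loss: minimize the posterior expected loss over d.
post_loss = [((grid - d) ** 2 * post).sum() * dx for d in grid]
d_bayes = grid[int(np.argmin(post_loss))]
print(d_bayes, (grid * post).sum() * dx)      # numerical minimizer vs. the posterior mean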
Posterior mean is Bayes estimator for squared error loss
Suppose 𝐿(𝜃, 𝑑) = (𝑔(𝜃) − 𝑑)². Then the Bayes estimator is the
posterior mean 𝛿Λ(𝑥) = 𝔼[𝑔(Θ) ∣ 𝑋 = 𝑥].
proof:
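A sketch of the standard completion-of-squares argument, assuming 𝔼[𝑔(Θ)² ∣ 𝑋 = 𝑥] < ∞: write 𝑚(𝑥) = 𝔼[𝑔(Θ) ∣ 𝑋 = 𝑥]. For any 𝑑,
𝔼[(𝑔(Θ) − 𝑑)² ∣ 𝑋 = 𝑥] = 𝔼[(𝑔(Θ) − 𝑚(𝑥))² ∣ 𝑋 = 𝑥] + (𝑚(𝑥) − 𝑑)²,
because the cross term 2(𝑚(𝑥) − 𝑑) 𝔼[𝑔(Θ) − 𝑚(𝑥) ∣ 𝑋 = 𝑥] vanishes. The right-hand side is minimized at 𝑑 = 𝑚(𝑥), so by Thm 7.1 the posterior mean is a Bayes estimator.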
13
Examples
Binomial model with Beta prior
Suppose 𝑋 ∣ Θ = 𝜃 ∼ Binomial(𝑛, 𝜃) with density (𝑛 choose 𝑥) 𝜃^𝑥(1 − 𝜃)^(𝑛−𝑥),
and Θ ∼ Beta(𝛼, 𝛽) with density [Γ(𝛼 + 𝛽)/(Γ(𝛼)Γ(𝛽))] 𝜃^(𝛼−1)(1 − 𝜃)^(𝛽−1). Find the
Bayes estimator under squared error loss.
14
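For reference, a sketch of the answer: the posterior density is proportional to 𝜃^(𝛼+𝑥−1)(1 − 𝜃)^(𝛽+𝑛−𝑥−1), so Θ ∣ 𝑋 = 𝑥 ∼ Beta(𝛼 + 𝑥, 𝛽 + 𝑛 − 𝑥) and the Bayes estimator under squared error loss is the posterior mean
𝛿Λ(𝑥) = (𝛼 + 𝑥)/(𝛼 + 𝛽 + 𝑛),
a convex combination of the prior mean 𝛼/(𝛼 + 𝛽) and the MLE 𝑥/𝑛.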
Weighted squared error loss
Suppose 𝐿(𝜃, 𝑑) = 𝑤(𝜃)(𝑔(𝜃) − 𝑑)². Find a Bayes estimator.
15
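For reference, a sketch: minimizing the posterior expected loss 𝔼[𝑤(Θ)(𝑔(Θ) − 𝑑)² ∣ 𝑋 = 𝑥] over 𝑑 (set the derivative in 𝑑 to zero) gives
𝛿Λ(𝑥) = 𝔼[𝑤(Θ)𝑔(Θ) ∣ 𝑋 = 𝑥] / 𝔼[𝑤(Θ) ∣ 𝑋 = 𝑥],
provided 0 < 𝔼[𝑤(Θ) ∣ 𝑋 = 𝑥] < ∞: a 𝑤-weighted posterior mean of 𝑔(Θ).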
Normal mean estimation
𝑋 ∣ Θ = 𝜃 ∼ 𝒩(𝜃, 𝜎²), Θ ∼ 𝒩(𝜇, 𝜏²).
Find the Bayes estimator of the mean 𝜃 under squared error loss.
What if we have 𝑛 i.i.d. data points?
16
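For reference, a sketch: completing the square in 𝜃 shows Θ ∣ 𝑋 = 𝑥 ∼ 𝒩((𝜏²𝑥 + 𝜎²𝜇)/(𝜎² + 𝜏²), 𝜎²𝜏²/(𝜎² + 𝜏²)), so the Bayes estimator is the posterior mean, which shrinks 𝑥 toward the prior mean 𝜇. With 𝑛 i.i.d. observations, the sample mean 𝑋̄ ∼ 𝒩(𝜃, 𝜎²/𝑛) is sufficient, so the same formula applies with 𝑥 replaced by 𝑋̄ and 𝜎² by 𝜎²/𝑛.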
Binary classification
Suppose the parameter space Ω = {0, 1}.
ℙ(𝑋 = 𝑥 ∣ Θ = 0) = 𝑓0 (𝑥) and ℙ(𝑋 = 𝑥 ∣ Θ = 1) = 𝑓1 (𝑥). The
prior is 𝜋(1) = 𝑝, 𝜋(0) = 1 − 𝑝.
Determine a Bayes estimator under the 0-1 loss
𝐿(𝜃, 𝑑) = 0 if 𝑑 = 𝜃, and 𝐿(𝜃, 𝑑) = 1 if 𝑑 ≠ 𝜃.
17
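For reference, a sketch: under 0-1 loss the posterior expected loss of deciding 𝑑 = 1 is ℙ(Θ = 0 ∣ 𝑋 = 𝑥) ∝ (1 − 𝑝)𝑓0(𝑥), and of 𝑑 = 0 is ℙ(Θ = 1 ∣ 𝑋 = 𝑥) ∝ 𝑝𝑓1(𝑥), so a Bayes rule decides 1 exactly when 𝑝𝑓1(𝑥) > (1 − 𝑝)𝑓0(𝑥) (ties may be broken either way): a likelihood-ratio classifier. A minimal code sketch follows; the Gaussian class-conditional densities and the prior weight are assumptions for illustration only.

import numpy as np
from scipy import stats

def bayes_classify(x, p, f0, f1):
    # Bayes rule under 0-1 loss: decide 1 iff the posterior puts more mass on 1,
    # i.e. iff p * f1(x) > (1 - p) * f0(x).
    return (p * f1(x) > (1 - p) * f0(x)).astype(int)

f0 = lambda x: stats.norm.pdf(x, loc=0.0, scale=1.0)   # density of X given Theta = 0
f1 = lambda x: stats.norm.pdf(x, loc=2.0, scale=1.0)   # density of X given Theta = 1
print(bayes_classify(np.array([0.5, 1.5, 2.5]), p=0.3, f0=f0, f1=f1))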
Bayes estimators are usually biased
An unbiased estimator under squared error loss is typically not Bayes
Thm Lehmann Casella 4.2.3
If 𝛿 is unbiased for 𝑔(𝜃) with 𝑅Bayes(Λ, 𝛿) < ∞, then 𝛿 is not Bayes
under squared error loss unless its average risk is zero:
𝔼[(𝛿(𝑋) − 𝑔(Θ))²] = 0.
18
proof:
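A sketch of the standard argument: suppose 𝛿 is both unbiased for 𝑔(𝜃) and Bayes under squared error loss, so that 𝛿(𝑋) = 𝔼[𝑔(Θ) ∣ 𝑋] a.s. Conditioning on Θ and using unbiasedness gives 𝔼[𝛿(𝑋)𝑔(Θ)] = 𝔼[𝑔(Θ)²]; conditioning on 𝑋 instead gives 𝔼[𝛿(𝑋)𝑔(Θ)] = 𝔼[𝛿(𝑋)²]. Hence
𝔼[(𝛿(𝑋) − 𝑔(Θ))²] = 𝔼[𝛿(𝑋)²] − 2𝔼[𝛿(𝑋)𝑔(Θ)] + 𝔼[𝑔(Θ)²] = 0.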
19
Bayes estimators are usually admissible
Uniqueness of Bayes estimator under strictly convex loss
Thm. Lehmann Casella 4.1.4
Let 𝑄 be the marginal distribution of 𝑋, i.e.,
𝑄(𝐸) = ∫_Ω 𝑃𝜃(𝐸) 𝑑Λ(𝜃). Suppose 𝐿(𝜃, 𝑑) is strictly convex in 𝑑. If
1. 𝑅Bayes (Λ, 𝛿Λ ) < ∞,
2. 𝑄(𝐸) = 0 implies 𝑃𝜃 (𝐸) = 0, ∀𝜃,
then the Bayes estimator 𝛿Λ is unique (a.e. with respect to 𝑃𝜃 for all
𝜃).
20
proof: Use the following lemma
Lem. Lehmann Casella exercise 1.7.26
Let 𝜙 be a strictly convex function over an interval 𝐼. If there exists a
value 𝑎0 ∈ 𝐼 minimizing 𝜙, then 𝑎0 is unique.
21
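A sketch: if 𝑎0 ≠ 𝑎1 both minimized 𝜙, strict convexity would give 𝜙((𝑎0 + 𝑎1)/2) < (𝜙(𝑎0) + 𝜙(𝑎1))/2 = min 𝜙, a contradiction. Applied to the posterior expected loss, which is strictly convex in 𝑑, this makes the minimizer 𝛿Λ(𝑥) unique for a.e. 𝑥 (a.e. 𝑄); condition 2 then upgrades a.e.-𝑄 statements to a.e.-𝑃𝜃 for every 𝜃.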
A unique Bayes estimator is admissible
Thm. Lehmann Casella 5.2.4
A unique Bayes estimator (a.s. for all 𝑃𝜃 ) is admissible.
22
proof:
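A sketch: suppose 𝛿′ satisfies 𝑅(𝜃, 𝛿′) ≤ 𝑅(𝜃, 𝛿Λ) for all 𝜃. Integrating against Λ gives 𝑅Bayes(Λ, 𝛿′) ≤ 𝑅Bayes(Λ, 𝛿Λ), so 𝛿′ is also Bayes; by uniqueness, 𝛿′ = 𝛿Λ a.e. 𝑃𝜃 for every 𝜃, hence 𝑅(𝜃, 𝛿′) = 𝑅(𝜃, 𝛿Λ) for all 𝜃 and 𝛿′ cannot strictly improve on 𝛿Λ anywhere. So 𝛿Λ is admissible.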
23
Summary
• A Bayes estimator is defined as a minimizer of the average risk,
where the average is taken over a prior on 𝜃.
• A Bayes estimator can be constructed by minimizing the posterior
expected loss, one 𝑥 at a time.
• Bayes estimators are usually biased under squared error loss.
• Unique Bayes estimators are admissible; strictly convex losses
usually give uniqueness.
24
What is next?
• Where do priors come from?
• Pros and cons of Bayes
25
Thank you
26