Chapter 2 Bayesian Statistics 1
1. Bayesian Paradigm
⋅ Frequentist (Fisherian) : the parameter θ is an unknown but fixed constant; inference is based on the sampling distribution of the data.
⋅ Bayesian : the parameter θ is treated as a random variable with a prior distribution π(θ); inference is based on the posterior distribution π(θ|x) ∝ f(x|θ)π(θ).
- Just as the prior distribution π(θ) reflects beliefs about θ before observing the data, the posterior distribution π(θ|x) reflects the updated beliefs about θ after observing the sample.
- In other words, the posterior distribution combines the prior beliefs about θ with the information about θ contained in the sample to give a composite picture of the final beliefs about θ.
- Since π(θ|x) ∝ f(x|θ)π(θ), the choice of the prior pdf is very important for Bayesian inference.
⋅ f(x_{n+1}|x) = ∫ f(x_{n+1}|θ) π(θ|x) dθ : the posterior predictive pdf of X_{n+1} given x = (x₁, …, xₙ)
⋅ This holds if X_{n+1} and X = (X₁, …, Xₙ) are conditionally independent when θ is given.
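These identities can be checked numerically. Below is a minimal sketch (not from the notes) that computes a posterior and a posterior predictive probability on a grid, using posterior ∝ likelihood × prior; the Beta(2, 2) prior and the data (7 successes in 10 trials) are illustrative assumptions.

```python
import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 999)        # grid over the parameter space
prior = stats.beta.pdf(theta, 2, 2)           # assumed prior: Beta(2, 2)
x, n = 7, 10                                  # assumed data: 7 successes in 10 trials
like = stats.binom.pmf(x, n, theta)           # likelihood f(x|θ) on the grid

unnorm = like * prior                         # f(x|θ) π(θ)
posterior = unnorm / np.trapz(unnorm, theta)  # normalize numerically

# posterior predictive P(X_{n+1} = 1 | x) = ∫ θ π(θ|x) dθ
print(np.trapz(theta * posterior, theta))
```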
2. Conjugate Priors
If the prior pdf and the posterior pdf belong to the same class of pdf’s, the prior is called a conjugate prior for that likelihood.
-----------------------------------------------------
Sample Distribution (Likelihood)      Conjugate Prior
-----------------------------------------------------
X ∼ B(n, θ)                           θ ∼ Beta(a, b)
X ∼ Poisson(θ)                        θ ∼ Gamma(a, b)
X ∼ N(θ, σ²), σ² is known             θ ∼ N(μ, τ²)
X ∼ N(μ, σ²), μ is known              1/σ² ∼ Gamma(a, b)
X ∼ Gamma(ν, θ), ν is known           θ ∼ Gamma(a, b)
-----------------------------------------------------
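As a hedged illustration of how the table is used in practice, here are two of its rows written as closed-form hyperparameter updates (the helper names and numbers are mine, not from the notes):

```python
def beta_binomial_update(a, b, x, n):
    """Binomial likelihood, Beta(a, b) prior -> Beta(a + x, b + n - x) posterior."""
    return a + x, b + n - x

def gamma_poisson_update(a, b, xs):
    """Poisson likelihood, Gamma(a, b) prior -> Gamma(a + Σx, b + n) posterior."""
    return a + sum(xs), b + len(xs)

print(beta_binomial_update(1, 1, 7, 10))             # -> (8, 4)
print(gamma_poisson_update(2.0, 1.0, [3, 0, 2, 4]))  # -> (11.0, 5.0)
```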
◎ Beta Distribution : θ ∼ Beta(a, b)
π(θ) = [Γ(a + b)/(Γ(a)Γ(b))] θ^{a−1}(1 − θ)^{b−1}, 0 < θ < 1
E(θ) = a/(a + b), Var(θ) = ab/[(a + b)²(a + b + 1)]
▪ Example 2.1 (female births) : X₁, …, Xₙ iid Bernoulli(θ), where θ is the probability of a female birth, so f(x|θ) = θ^{Σxᵢ}(1 − θ)^{n−Σxᵢ}
▪ (conjugate) Prior of θ : θ ∼ Beta(a, b)
▪ The posterior distribution: π(θ|x) ∝ θ^{a+Σxᵢ−1}(1 − θ)^{b+n−Σxᵢ−1}, i.e. θ|x ∼ Beta(a + Σxᵢ, b + n − Σxᵢ)
▪ The predictive probability: If we are interested in the probability of a new female birth (X_{n+1} = 1),
then the predictive probability is as follows, and we call it the Law of Succession:
P(X_{n+1} = 1 | x) = E(θ|x) = (a + Σxᵢ)/(a + b + n), which equals (Σxᵢ + 1)/(n + 2) under the uniform prior a = b = 1.
Posterior Mean: E(θ|x) = (a + Σxᵢ)/(a + b + n)
(Posterior mean is the Bayes estimator under the squared error loss.)
Note: In this case, the MLE Σxᵢ/n and the posterior mean are very close when n is large, as sketched below.
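A minimal sketch of the Law of Succession against the MLE (the counts are made up):

```python
def law_of_succession(successes, n, a=1, b=1):
    """Posterior predictive P(X_{n+1}=1 | x) = (a + Σx)/(a + b + n)."""
    return (a + successes) / (a + b + n)

s, n = 48, 100
print(law_of_succession(s, n))  # (48 + 1)/(100 + 2) ≈ 0.480
print(s / n)                    # MLE = 0.480; close for moderate n
```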
Example 2.2 Binomial Distribution
▪ Likelihood function : f(x|θ) = C(n, x) θˣ(1 − θ)^{n−x}, x = 0, 1, …, n
▪ Posterior : π(θ|x) ∝ θ^{a+x−1}(1 − θ)^{b+n−x−1} ⇒ θ|x ∼ Beta(a + x, b + n − x)
▪ Posterior mean : Since the mean of Beta(α, β) is α/(α + β),
E(θ|x) = (a + x)/(a + b + n), a weighted average of the prior mean a/(a + b) and the MLE x/n.
▪ The predictive probability of a future observation Z ∼ B(m, θ):
P(Z = z | x) = ∫₀¹ f(z|θ) π(θ|x) dθ = C(m, z) B(a + x + z, b + n − x + m − z)/B(a + x, b + n − x)
(the beta-binomial pmf), since Z & X are conditionally independent given θ.
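A sketch of this beta-binomial predictive pmf (the posterior Beta(8, 4) and m = 5 are illustrative assumptions, not values from the notes):

```python
from math import comb
import numpy as np
from scipy.special import betaln

def beta_binom_pmf(z, m, a_post, b_post):
    """P(Z = z | x) = C(m, z) B(a'+z, b'+m-z) / B(a', b'), computed on the log scale."""
    return comb(m, z) * np.exp(betaln(a_post + z, b_post + m - z)
                               - betaln(a_post, b_post))

# assumed posterior Beta(8, 4) (e.g. a = b = 1, x = 7, n = 10); predict m = 5 new trials
pmf = [beta_binom_pmf(z, 5, 8, 4) for z in range(6)]
print(pmf, sum(pmf))   # the pmf sums to 1
```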
Example 2.3 Placenta previa
▪ Data : X = 437 female births out of n = 980 placenta previa births; X|θ ∼ B(980, θ)
▪ Conjugate Prior of θ : θ ∼ Beta(1, 1) = U(0, 1)
▪ Posterior : θ|x ∼ Beta(438, 544)
⇒ posterior mean 0.446 , posterior s.d. 0.016
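The quoted summaries can be reproduced with scipy (a check, not part of the notes):

```python
from scipy import stats

post = stats.beta(438, 544)     # posterior under the uniform prior
print(round(post.mean(), 3))    # 0.446
print(round(post.std(), 3))     # 0.016
```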
Example 2.4 Normal Distribution with known variance
▪ Likelihood function : f(x|θ) = ∏ᵢ (2πσ²)^{−1/2} exp{−(xᵢ − θ)²/(2σ²)} ∝ exp{−n(θ − x̄)²/(2σ²)},
where x̄ = (1/n) Σᵢ xᵢ
▪ (Conjugate) Prior : θ ∼ N(μ, τ²)
▪ Posterior : π(θ|x) ∝ exp{−n(θ − x̄)²/(2σ²) − (θ − μ)²/(2τ²)} ∝ exp{−(θ − μₙ)²/(2τₙ²)} ← terms free from θ are absorbed into the normalizing constant
i.e. θ|x ∼ N(μₙ, τₙ²), where μₙ = (n x̄/σ² + μ/τ²)/(n/σ² + 1/τ²) and τₙ² = (n/σ² + 1/τ²)^{−1}.
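The update above is a precision-weighted average, which is easy to package as a small helper (a sketch; the function name and the numbers are mine):

```python
def normal_known_var_posterior(xbar, n, sigma2, mu0, tau2):
    """Posterior mean and variance of θ | x for N(θ, σ²) data and N(μ, τ²) prior:
    a precision-weighted average of the prior mean and the sample mean."""
    prec = n / sigma2 + 1.0 / tau2
    mean = (n * xbar / sigma2 + mu0 / tau2) / prec
    return mean, 1.0 / prec

print(normal_known_var_posterior(xbar=1.2, n=25, sigma2=4.0, mu0=0.0, tau2=1.0))
```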
▪ The predictive probability of X_{n+1} :
f(x_{n+1}|x) = ∫ f(x_{n+1}|θ) π(θ|x) dθ, since X_{n+1} & X are conditionally independent given θ
∝ ∫_{−∞}^{∞} exp{−(x_{n+1} − θ)²/(2σ²)} exp{−(θ − μₙ)²/(2τₙ²)} dθ.
Completing the square in θ,
(x_{n+1} − θ)²/σ² + (θ − μₙ)²/τₙ² = (1/σ² + 1/τₙ²)(θ − c)² + (x_{n+1} − μₙ)²/(σ² + τₙ²),
where c = (x_{n+1}/σ² + μₙ/τₙ²)/(1/σ² + 1/τₙ²). Therefore,
f(x_{n+1}|x) ∝ exp{−(x_{n+1} − μₙ)²/(2(σ² + τₙ²))} ∫_{−∞}^{∞} exp{−(1/σ² + 1/τₙ²)(θ − c)²/2} dθ
∝ exp{−(x_{n+1} − μₙ)²/(2(σ² + τₙ²))},
i.e. X_{n+1}|x ∼ N(μₙ, σ² + τₙ²).
Example 2.5 The IQ test
A child is given an intelligence test. Assume that the test result X ∼ N(θ, 100), where θ is the true IQ
level of the child, as measured by the test. In other words, if the child were to take a large number of
independent similar tests, his average score would be about θ. Assume also that, in the population as a
whole, θ ∼ N(100, 225).
Thus, if a child scores 115 on this test (x = 115), the posterior distribution is as follows:
θ | x = 115 ∼ N(μ₁, τ₁²) ≡ N(110.39, 69.23), i.e. posterior mean 110.39 and posterior s.d. ≈ 8.32. ▦
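Checking the IQ example numerically (a sketch; the prior N(100, 225) and σ² = 100 are as reconstructed above):

```python
def normal_posterior(x, sigma2, mu0, tau2):
    """Posterior mean and variance for a single observation x ~ N(θ, σ²), θ ~ N(μ₀, τ²)."""
    prec = 1 / sigma2 + 1 / tau2
    return (x / sigma2 + mu0 / tau2) / prec, 1 / prec

mean, var = normal_posterior(115, 100, 100, 225)
print(mean, var)    # ≈ 110.39, 69.23
# predictive variance of a retest score would be σ² + posterior variance
print(100 + var)    # ≈ 169.23
```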
▪ Inverse Gamma Distribution: X ∼ Gamma(a, b) ⇒ Y = 1/X ∼ IG(a, b)
▪ pdf: f(y) = [bᵃ/Γ(a)] y^{−(a+1)} exp(−b/y), 0 < y < ∞
▪ Mean and Variance: E(Y) = b/(a − 1) (a > 1), Var(Y) = b²/[(a − 1)²(a − 2)] (a > 2)
Example 2.6 Normal Distribution with known mean
▪ Likelihood function : f(x|σ²) ∝ (σ²)^{−n/2} exp{−Σᵢ(xᵢ − μ)²/(2σ²)} (μ is known)
▪ (Conjugate) Prior :
σ² ∼ IG(a, b)
(a, b are known)
Thus, the posterior distribution is
σ²|x ∼ IG(a + n/2, b + Σᵢ(xᵢ − μ)²/2),
with posterior mean
E(σ²|x) = [b + Σᵢ(xᵢ − μ)²/2]/(a + n/2 − 1). ▦
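A small simulation check of this example (my own sketch; the data and hyperparameters are made up). Note that scipy’s invgamma with shape a and scale b matches the IG(a, b) parametrization used here.

```python
import numpy as np
from scipy import stats

def sigma2_posterior(xs, mu, a, b):
    """IG(a, b) prior on σ² with known mean μ -> IG(a + n/2, b + Σ(xᵢ-μ)²/2)."""
    xs = np.asarray(xs)
    return a + len(xs) / 2, b + np.sum((xs - mu) ** 2) / 2

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 2.0, size=50)          # true σ² = 4, known μ = 0
a_post, b_post = sigma2_posterior(xs, 0.0, a=2.0, b=2.0)
post = stats.invgamma(a_post, scale=b_post)
print(post.mean())                          # posterior mean of σ², near 4
```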
▪ Likelihood function (Poisson case) : for X₁, …, Xₙ iid Poisson(θ), f(x|θ) ∝ θ^{Σxᵢ} e^{−nθ}, with conjugate prior θ ∼ Gamma(a, b)
▪ Posterior :
π(θ|x) ∝ θ^{a+Σxᵢ−1} e^{−(b+n)θ} ⇒ θ|x ∼ Gamma(a + Σxᵢ, b + n). ▦
3. Noninformative Priors (Objective Priors)
When no (or minimal) prior information is available, we need a prior distribution which is
relatively “flat” relative to the sample information (likelihood function). Such a prior is called a noninformative prior.
- Laplace Prior
- Jeffreys’ Prior
- Reference Prior
(e.g.) For θ ∼ N(μ, τ²), −∞ < θ < ∞, let τ² → ∞; the resulting flat limit can be used as a noninformative prior.
3.1 Improper prior
: A prior π(θ) is said to be improper if ∫ π(θ) dθ = ∞.
- (e.g.) A uniform prior distribution on the real line, π(θ) ∝ 1 for −∞ < θ < ∞, is an improper prior.
- Improper priors are often used in Bayesian inference, since they usually yield noninformative priors
and proper posterior distributions.
- However, improper prior distributions can lead to posterior impropriety (an improper posterior distribution).
- To determine whether a posterior distribution is proper, you need to check that the unnormalized
posterior integrates to a finite value.
- If an improper prior distribution leads to an improper posterior distribution, inference based on
that posterior is invalid; see the numerical check sketched below.
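A rough numerical propriety check (my own sketch, not from the notes): integrate the unnormalized posterior and see whether the result is finite. The Poisson model with improper prior π(θ) ∝ 1/θ is used as the test case.

```python
import numpy as np
from scipy import integrate

def unnorm_post(theta, n, sum_x):
    # Poisson likelihood times the improper prior π(θ) ∝ 1/θ
    return theta ** (sum_x - 1) * np.exp(-n * theta)

val, err = integrate.quad(unnorm_post, 0, np.inf, args=(5, 7))
print(val)   # finite when Σx > 0 -> proper posterior
# with Σx = 0 the integrand behaves like 1/θ near 0, and the integral diverges
```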
Example 3.1 Normal Distribution with known variance
▪ X₁, …, Xₙ ∼ N(θ, σ²) (σ² is known), with noninformative prior π(θ) ∝ 1 (improper)
▪ Posterior : π(θ|x) ∝ exp{−n(θ − x̄)²/(2σ²)}
⇒ θ|x ∼ N(x̄, σ²/n), and it is a proper posterior distribution. ▦
Example 3.2 Poisson Distribution
▪ X₁, …, Xₙ ∼ Poisson(θ)
▪ Noninformative Prior : π(θ) ∝ 1/θ (improper)
▪ Posterior :
π(θ|x) ∝ θ^{Σxᵢ−1} e^{−nθ}
⇒ θ|x ∼ Gamma(Σxᵢ, n), provided Σxᵢ > 0,
and it is a proper posterior distribution.
Also, we know that π(θ) ∝ 1/θ is a limiting form of the conjugate Gamma(a, b) prior (a, b → 0). ▦
Example 3.3 Binomial Distribution
▪ Likelihood : f(x|θ) = C(n, x) θˣ(1 − θ)^{n−x}
▪ Noninformative Prior (Haldane) : π(θ) ∝ θ^{−1}(1 − θ)^{−1}
∫₀¹ θ^{−1}(1 − θ)^{−1} dθ = ∞ ⇒ improper prior
▪ Posterior :
π(θ|x) ∝ θ^{x−1}(1 − θ)^{n−x−1} ⇒ θ|x ∼ Beta(x, n − x), proper provided 0 < x < n
Example 3.4 Normal Distribution with known mean
▪ Likelihood : f(x|σ²) ∝ (σ²)^{−n/2} exp{−Σᵢ(xᵢ − μ)²/(2σ²)} (μ is known)
▪ Noninformative Prior : π(σ²) ∝ 1/σ² (improper)
▪ Posterior :
π(σ²|x) ∝ (σ²)^{−n/2−1} exp{−Σᵢ(xᵢ − μ)²/(2σ²)}
⇒ σ²|x ∼ IG(n/2, Σᵢ(xᵢ − μ)²/2), a proper posterior distribution
Example 3.5 An improper posterior (Poisson with Σxᵢ = 0)
Consider the Poisson model of Example 3.2 with prior π(θ) ∝ 1/θ when every observation equals zero (Σxᵢ = 0).
▪ Posterior : π(θ|x) ∝ θ^{−1} e^{−nθ}
Let u = nθ, then du = n dθ. Thus θ^{−1} dθ = u^{−1} du and
∫₀^∞ θ^{−1} e^{−nθ} dθ = ∫₀^∞ u^{−1} e^{−u} du = ∫₀^1 u^{−1} e^{−u} du + ∫₁^∞ u^{−1} e^{−u} du.
For the first term, e^{−u} ≈ 1 near u = 0, so
∫₀^1 u^{−1} e^{−u} du ≥ e^{−1} ∫₀^1 u^{−1} du = ∞.
Also, u ≥ 1 ⇔ u^{−1} ≤ 1, so for the second term
∫₁^∞ u^{−1} e^{−u} du ≤ ∫₁^∞ e^{−u} du = e^{−1} < ∞
⇒ ∫₀^∞ π(θ|x) dθ = ∞ [improper posterior!!]
▪ Jeffreys’ Prior
- It is proportional to the square root of the determinant of the Fisher information:
π(θ) ∝ |I(θ)|^{1/2}
- For the one-parameter case, it has the invariance property under one-to-one transformations.
I(θ) = E[(∂/∂θ log f(X|θ))²] = −E[∂²/∂θ² log f(X|θ)]
⇒ π(θ) ∝ √I(θ),
where I(θ) is the (expected) Fisher information.
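A symbolic sketch of the recipe (my choice of model, a single Bernoulli observation; not from the notes): compute −E[∂² log f/∂θ²] and take the square root.

```python
import sympy as sp

theta, x = sp.symbols('theta x', positive=True)
logf = x * sp.log(theta) + (1 - x) * sp.log(1 - theta)   # Bernoulli log-pmf
d2 = sp.diff(logf, theta, 2)
# d2 is linear in x, so substituting x = θ (i.e. E[X] = θ) takes the expectation
info = sp.simplify(-d2.subs(x, theta))
print(sp.sqrt(info))   # 1/sqrt(θ(1-θ)), i.e. the Beta(1/2, 1/2) kernel
```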
Example 3.6 Normal Distribution with known variance
▪ X ∼ N(θ, σ²) (σ² is known)
f(x|θ) = (2πσ²)^{−1/2} exp{−(x − θ)²/(2σ²)}
∂²/∂θ² log f(x|θ) = −1/σ² ⇒ I(θ) = 1/σ² [free from θ]
▪ Jeffreys' Prior : π(θ) ∝ √I(θ) ∝ 1 (flat, improper)
Example 3.7 Normal Distribution with known mean
▪ X ∼ N(μ, σ²) (μ is known)
f(x|σ²) = (2πσ²)^{−1/2} exp{−(x − μ)²/(2σ²)}
∂/∂σ² log f = −1/(2σ²) + (x − μ)²/(2σ⁴) ⇒ I(σ²) = 1/(2σ⁴)
▪ Jeffreys' Prior : π(σ²) ∝ √(1/(2σ⁴)) ∝ 1/σ². Also, by invariance, π(σ) ∝ 1/σ.
Example 3.8 Binomial Distribution
▪ X ∼ B(n, θ), 0 ≤ θ ≤ 1, f(x|θ) = C(n, x) θˣ(1 − θ)^{n−x}
log f(x|θ) = x log θ + (n − x) log(1 − θ) + const ⇒ I(θ) = n/[θ(1 − θ)]
▪ Jeffreys' Prior : π(θ) ∝ θ^{−1/2}(1 − θ)^{−1/2}, i.e. θ ∼ Beta(1/2, 1/2) (a proper prior)
Example 3.9 Poisson Distribution
▪ X ∼ Poisson(θ), f(x|θ) = θˣ e^{−θ}/x!
log f(x|θ) = x log θ − θ − log x! ⇒ I(θ) = 1/θ
▪ Jeffreys' Prior : π(θ) ∝ θ^{−1/2} (improper)
Example 3.10 Exponential Distribution
▪ X₁, …, Xₙ ∼ Exp(θ), f(x|θ) = θ exp(−θx), x > 0
log f(x|θ) = log θ − θx
⇒ ∂²/∂θ² log f(x|θ) = −1/θ², so I(θ) = 1/θ²
▪ Jeffreys' Prior : π(θ) ∝ √I(θ) = 1/θ (improper)
▪ Posterior :
π(θ|x) ∝ θ^{n−1} exp(−θ Σxᵢ)
Let u = θ Σxᵢ, then du = Σxᵢ dθ and
∫₀^∞ θ^{n−1} e^{−θΣxᵢ} dθ = (Σxᵢ)^{−n} ∫₀^∞ u^{n−1} e^{−u} du = Γ(n)/(Σxᵢ)ⁿ < ∞ [proper]
⇒ θ|x ∼ Gamma(n, Σxᵢ)
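A quick check of the exponential case (my own sketch; the data are simulated): under π(θ) ∝ 1/θ the posterior Gamma(n, Σxᵢ) has mean n/Σxᵢ, which coincides with the MLE of the rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
xs = rng.exponential(scale=1 / 2.5, size=40)   # true rate θ = 2.5
n, sx = len(xs), xs.sum()

post = stats.gamma(a=n, scale=1 / sx)          # Gamma(n, rate Σx)
print(post.mean(), n / sx)                     # posterior mean equals the MLE here
```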
▪ We should find the marginal posterior pdf for the parameter(s) of interest.
(i) find the joint posterior pdf for all parameters;
(ii) after integrating out the nuisance parameter(s), one can get the marginal posterior pdf
for the parameter(s) of interest.
▦ multi-parameter case : θ = (θ₁, θ₂)
: θ₁ (scalar or vector) is the parameter(s) of interest and
θ₂ (scalar or vector) is the nuisance parameter(s)
⇒ π(θ₁|x) = ∫ π(θ₁, θ₂|x) dθ₂
▪ X₁, …, Xₙ ∼ N(μ, σ²), both μ and σ² unknown
▪ the parameter vector: θ = (μ, σ²)
▪ the likelihood function: f(x|μ, σ²) ∝ (σ²)^{−n/2} exp{−Σᵢ(xᵢ − μ)²/(2σ²)}
▪ Fisher Information Matrix :
log f(x|μ, σ²) = −(1/2) log(2πσ²) − (x − μ)²/(2σ²)
∂²/∂μ² log f = −1/σ² ⇒ −E[∂²/∂μ² log f] = 1/σ²
∂²/∂μ∂σ² log f = −(x − μ)/σ⁴ ⇒ −E[∂²/∂μ∂σ² log f] = 0
∂²/∂(σ²)² log f = 1/(2σ⁴) − (x − μ)²/σ⁶ ⇒ −E[∂²/∂(σ²)² log f] = 1/(2σ⁴)
Thus, I(μ, σ²) = diag(1/σ², 1/(2σ⁴)), and the Jeffreys prior is
π(μ, σ²) ∝ |I(μ, σ²)|^{1/2} ∝ (σ²)^{−3/2}
▦ Jeffreys’ Independent Prior
▪ Jeffreys considered the prior π(μ) ∝ 1 for μ and the prior π(σ²) ∝ 1/σ² for σ², respectively.
▪ Jeffreys considered μ and σ² to be independent a priori,
i.e. π(μ, σ²) = π(μ)π(σ²) ∝ 1/σ².
▪ Posterior : π(μ, σ²|x) ∝ (σ²)^{−(n/2+1)} exp{−Σᵢ(xᵢ − μ)²/(2σ²)}
▪ If μ is the parameter of interest, one can get the marginal posterior pdf of μ as follows:
π(μ|x) ∝ ∫₀^∞ (σ²)^{−(n/2+1)} exp{−Σᵢ(xᵢ − μ)²/(2σ²)} dσ²
Useful Fact
∫₀^∞ x^{−(a+1)} exp(−b/x) dx = Γ(a)/bᵃ ∝ b^{−a} (a > 0, b > 0)
- Using the fact, since Σᵢ(xᵢ − μ)² = (n − 1)s² + n(μ − x̄)²,
in this case a = n/2 and b = [(n − 1)s² + n(μ − x̄)²]/2, so
π(μ|x) ∝ [(n − 1)s² + n(μ − x̄)²]^{−n/2}
∝ [1 + t²/(n − 1)]^{−((n−1)+1)/2},
i.e. t = (μ − x̄)/(s/√n) | x ∼ t_{n−1}, where s² = Σᵢ(xᵢ − x̄)²/(n − 1).
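A Monte-Carlo check of this t result (my own sketch; the data are simulated): draw σ² | x from IG((n−1)/2, (n−1)s²/2), then μ | σ², x from N(x̄, σ²/n), and compare the standardized draws with t_{n−1}.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
xs = rng.normal(5.0, 3.0, size=12)
n, xbar, s2 = len(xs), xs.mean(), xs.var(ddof=1)

# σ² | x ~ IG((n-1)/2, (n-1)s²/2), then μ | σ², x ~ N(x̄, σ²/n)
sig2 = stats.invgamma.rvs((n - 1) / 2, scale=(n - 1) * s2 / 2,
                          size=100_000, random_state=rng)
mu = rng.normal(xbar, np.sqrt(sig2 / n))
t = (mu - xbar) / np.sqrt(s2 / n)
print(stats.kstest(t, 't', args=(n - 1,)).pvalue)   # a large p-value is expected
```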
[Remark: Another way to find the marginal posterior pdf of μ]
Let φ = 1/σ², then dσ² = −φ^{−2} dφ, and the inner integral becomes a Gamma integral in φ; using the
normalizing constant of the Gamma distribution, one can get the same result.
▪ If σ² is the parameter of interest, one can get the marginal posterior pdf of σ² as follows:
π(σ²|x) ∝ ∫_{−∞}^{∞} (σ²)^{−(n/2+1)} exp{−[(n − 1)s² + n(μ − x̄)²]/(2σ²)} dμ
∝ (σ²)^{−(n/2+1)} exp{−(n − 1)s²/(2σ²)} ∫_{−∞}^{∞} exp{−n(μ − x̄)²/(2σ²)} dμ
∝ (σ²)^{−(n/2+1)} exp{−(n − 1)s²/(2σ²)} · (σ²)^{1/2}
∝ (σ²)^{−((n−1)/2+1)} exp{−(n − 1)s²/(2σ²)}
⇒ the marginal posterior distribution of σ² : σ²|x ∼ IG((n − 1)/2, (n − 1)s²/2),
equivalently (n − 1)s²/σ² | x ∼ χ²_{n−1}.
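The equivalence between the IG form and the scaled chi-square form can be confirmed numerically (my own sketch; n and s² are made up):

```python
import numpy as np
from scipy import stats

n, s2 = 12, 9.0
post = stats.invgamma((n - 1) / 2, scale=(n - 1) * s2 / 2)   # σ² | x

q = 0.9
c = post.ppf(q)
# P(σ² ≤ c) = P((n-1)s²/σ² ≥ (n-1)s²/c) links the two parametrizations
print(1 - stats.chi2.cdf((n - 1) * s2 / c, df=n - 1))        # ≈ 0.9
```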