What is Bayesian Statistics?
Alireza Akhondi-Asl
MSICU Center For Outcomes
Department of Anesthesiology, Critical Care and Pain Medicine
Learning Objectives
• What is Bayes’ Rule?
• How are beliefs updated in Bayesian statistics?
• What are the differences from frequentist statistics?
Probability Interpretations
• Frequentist
• Subjective
Frequentist Probability
• Relative frequency of an event in the long run:
$$P(e) = \lim_{n \to \infty} \frac{\#\,\text{times } e \text{ happened}}{n}$$
Subjective Probability
It is an “inside the head” probability:
• How strongly do you believe that a patient is going to survive?
• The probability of the Democrats winning the 2024 US presidential election.
Sometimes it is hard to quantify our belief:
• Think about a fair bet.
• Compare with other events that have clear probabilities.
We should be coherent.
Conditional probability
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
[Venn diagram: sample space S with overlapping events A and B]
Law of total probability
If $A_1, A_2, \ldots, A_n$ is a partition of the sample space, then for any event B we have:
$$P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)$$
[Venn diagram: sample space S partitioned into $A_1, \ldots, A_4$, with event B overlapping each part]
Bayes’ Rule
For events A and B in a sample space S:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Combined with the law of total probability, for a partition $A_1, \ldots, A_n$:
$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{j} P(B \mid A_j)\,P(A_j)}$$
[Venn diagrams: events A and B in S; the partition $A_1, \ldots, A_4$ overlapping B]
Medical Test
• A certain disease affects about 1 out of 1000 people in a population:
  • P(D) = 0.001
  • P(¬D) = 0.999
• There is a test to check whether a person has the disease. The test has very high sensitivity and specificity. In particular, we know that:
  • P(T+ | D) = 0.98
  • P(T+ | ¬D) = 0.01
Medical Test
If you test positive for this disease, what
are the chances that you have the disease?
A) 98 percent
B) Less than 10 percent
Medical Test
$$P(D \mid T{+}) = \frac{P(T{+} \mid D)\,P(D)}{P(T{+})}$$
Medical Test
$$P(D \mid T{+}) = \frac{P(T{+} \mid D)\,P(D)}{P(T{+} \mid D)\,P(D) + P(T{+} \mid \neg D)\,P(\neg D)}$$
Medical Test
$$P(D \mid T{+}) = \frac{0.98 \times 0.001}{0.98 \times 0.001 + 0.01 \times 0.999} \approx 0.089$$
The test updates your chance of having the disease from 0.001 to 0.089.
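This calculation is easy to check numerically. A minimal Python sketch (the function name and structure are ours; the numbers are from the slides):

```python
def posterior_prob(prior, sensitivity, false_positive_rate):
    """P(D | T+) via Bayes' rule for a single positive test."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)  # P(T+)
    return sensitivity * prior / evidence

# P(D) = 0.001, P(T+ | D) = 0.98, P(T+ | not D) = 0.01
print(posterior_prob(0.001, 0.98, 0.01))  # ~0.089
```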
Medical Test
If you test positive for this disease, what are the chances that you have the disease?
A) 98 percent
B) Less than 10 percent ← correct
Richard Royall’s Three Questions
• What does the present evidence tell?
• What should we believe?
• What should we do?
Medical Test Paradox
• A second independent test with the same accuracy is done
and it is positive again. What are the chances that you have
the disease?
• A) More than 90 percent
• B) Less than 10 percent
Medical Test Paradox
For the second test, the prior is the posterior from the first test:
• P(D) = 0.089
• P(¬D) = 0.911
$$P(D \mid T{+}) = \frac{P(T{+} \mid D)\,P(D)}{P(T{+} \mid D)\,P(D) + P(T{+} \mid \neg D)\,P(\neg D)} = \frac{0.98 \times 0.089}{0.98 \times 0.089 + 0.01 \times 0.911} \approx 0.906$$
Medical Test Paradox
• A second independent test with the same accuracy is done and it is positive again. What are the chances that you have the disease?
• A) More than 90 percent ← correct
• B) Less than 10 percent
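A sketch of the repeated update in Python, reusing the single-test function from above (yesterday’s posterior becomes today’s prior):

```python
def posterior_prob(prior, sensitivity, false_positive_rate):
    """P(D | T+) via Bayes' rule for a single positive test."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)  # P(T+)
    return sensitivity * prior / evidence

belief = 0.001  # prevalence: P(D) before any test
for test in (1, 2):
    belief = posterior_prob(belief, 0.98, 0.01)  # posterior becomes the new prior
    print(f"after positive test {test}: P(D) = {belief:.3f}")
# after positive test 1: P(D) = 0.089
# after positive test 2: P(D) = 0.906
```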
Statistical Analysis
• Frequentist
• Bayesian
• Likelihoodist
Frequentist
• The most popular method for statistical inference.
• Probabilities are long-run relative frequencies from repeated experiments.
• Parameters are fixed but unknown constants; we cannot make any probability statement about the parameters.
• Data are assumed to be random: randomness is due to sampling from a fixed population, and the uncertainty is due to sampling variation.
Frequentist
• Everything is built on the likelihood $P(\text{Data} \mid \theta)$:
  • Maximum Likelihood Estimation
  • P-values: $P(\text{Data} \mid \theta = \theta_0)$
  • Confidence intervals, effect size
• No probability statement about $\theta$.
Bayesian
• Probability is interpreted as “degree of subjective belief”.
  • The events do not need to be repeatable.
  • Epistemic uncertainty.
• We don’t know the value of the parameters, and therefore we consider them to be random variables.
  • Parameters are probabilistic in nature.
• Since the data have been observed, they are fixed.
• We update our prior belief based on the observed data; the updated belief is called the posterior belief.
  • We use Bayes’ rule to calculate the posterior.
Bayesian
• Update our belief in a parameter using new evidence or data.
• Based on Bayes’ rule:
$$\underbrace{P(\theta \mid \text{Data})}_{\text{Posterior}} \;=\; \frac{\overbrace{P(\text{Data} \mid \theta)}^{\text{Likelihood}}\;\overbrace{P(\theta)}^{\text{Prior}}}{\underbrace{P(\text{Data})}_{\text{Evidence}}}$$
• Up to a normalizing constant:
$$P(\theta \mid \text{Data}) \;\propto\; P(\text{Data} \mid \theta)\,P(\theta)$$
• Pipeline: Prior + Data + Model → Posterior
Posterior Distribution
Summarizes everything we know:
• Prediction: posterior predictive distribution
• Hypothesis testing: region of practical equivalence
• Model comparison
Example
• A new treatment approach is proposed. We would like to infer the success rate of this treatment.
• We observe the treatment outcomes of N patients.
Likelihood
• Since the outcome is binary and the samples are independent, for a fixed number of trials N we can use the binomial distribution as our data-generating model:
$$p(\text{Data} \mid \theta) = p(E \mid N, \theta) = \binom{N}{E}\,\theta^{E}\,(1-\theta)^{N-E}$$
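As a sketch, this likelihood can be evaluated with SciPy; the grid search for the maximum-likelihood estimate is our illustration (N = 6, E = 5 are the numbers used on the next slide):

```python
import numpy as np
from scipy.stats import binom

N, E = 6, 5                           # trials and successes (next slide's numbers)
theta = np.linspace(0, 1, 1001)       # grid of candidate success rates
likelihood = binom.pmf(E, N, theta)   # p(E | N, theta) for each candidate
print(theta[np.argmax(likelihood)])   # MLE = E/N ~ 0.833
```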
Example: Frequentist
Our null hypothesis is that $\theta_0 = 0.5$.
• N = 6, E = 15... • N = 6, E = 5: $\hat{\theta} = 0.833$, CI: 0.36–1.0
• N = 18, E = 15: $\hat{\theta} = 0.833$, CI: 0.59–0.96
[Plots: estimates and confidence intervals over $\theta$]
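A sketch of this frequentist analysis, assuming SciPy ≥ 1.7 (which provides scipy.stats.binomtest); the exact CI method may differ slightly from the one behind the slide’s numbers:

```python
from scipy.stats import binomtest

for N, E in [(6, 5), (18, 15)]:
    result = binomtest(E, N, p=0.5)    # H0: theta_0 = 0.5
    ci = result.proportion_ci(confidence_level=0.95)
    print(f"N={N}, E={E}: theta_hat={E / N:.3f}, "
          f"p-value={result.pvalue:.3f}, CI=({ci.low:.2f}, {ci.high:.2f})")
```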
Example: Bayesian
• Let’s assume that we believe the success rate is around 50%. This is our prior belief before observing any data.
• We update our belief after observing each outcome.
[Plots: the prior, then the posterior after each outcome O1 = Y, O2 = Y, O3 = N, O4 = Y, O5 = Y, O6 = Y (O1..6 = YYNYYY), ending with a higher-resolution grid]
“Today's posterior is tomorrow's prior” — Lindley
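A minimal grid-approximation sketch of this sequential updating; the triangular prior peaked at 0.5 is our stand-in for the slide’s “around 50%” belief:

```python
import numpy as np

theta = np.linspace(0, 1, 1001)         # grid over the success rate
prior = 1 - 2 * np.abs(theta - 0.5)     # peaked at 0.5 (our illustrative choice)
prior /= prior.sum()

for outcome in "YYNYYY":                # O1..O6 from the slides
    likelihood = theta if outcome == "Y" else 1 - theta
    posterior = likelihood * prior
    posterior /= posterior.sum()        # normalize on the grid
    prior = posterior                   # today's posterior is tomorrow's prior

print(theta[np.argmax(posterior)])      # posterior mode after all six outcomes
```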
Conjugate Prior
• We need a distribution to describe our prior belief such that the posterior has a closed-form distribution.
• The beta distribution is an excellent option for parameters in the range [0, 1]:
  • It is the conjugate prior for the binomial distribution.
  • Beta prior + binomial likelihood = beta posterior.
$$p(\theta \mid a, b) = \text{beta}(a, b) \propto \theta^{a-1}(1-\theta)^{b-1}, \quad 0 \le \theta \le 1$$
• Mean $= \frac{a}{a+b}$; Mode $= \frac{a-1}{a+b-2}$
• With N samples and E events:
$$p(\theta \mid \text{data}) = \text{beta}(a + E,\; b + N - E)$$
• $a + b$ is the effective sample size of the prior.
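The conjugate update in code, a sketch using scipy.stats.beta; the prior beta(2, 2) is our illustrative choice of a belief centered at 0.5:

```python
from scipy.stats import beta

a, b = 2, 2    # prior beta(a, b), centered at 0.5 (illustrative choice)
N, E = 6, 5    # observed trials and successes

posterior = beta(a + E, b + N - E)               # closed-form posterior
print("posterior mean:", posterior.mean())       # (a + E) / (a + b + N) = 0.7
print("95% credible interval:", posterior.interval(0.95))
```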
Beta Distribution
[Figure: example beta densities — beta(1.5, 1.5), beta(9, 9), beta(4.2, 13.8), beta(13.8, 4.2), beta(1.0, 1.0); from Kruschke, John K. (2014): Doing Bayesian Data Analysis]
Example: Stopping Rules
N = 24 and E = 7.
• If N is fixed, the likelihood is binomial.
• If E is fixed, the likelihood is negative binomial.
[Plots: posteriors under a uniform prior for fixed N and fixed E]
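A sketch contrasting the two stopping rules. Both likelihoods are proportional to $\theta^{E}(1-\theta)^{N-E}$, differing only by a constant that does not involve $\theta$, so under the same prior the grid posteriors coincide:

```python
import numpy as np
from scipy.stats import binom, nbinom

N, E = 24, 7
theta = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(theta)                  # uniform prior

lik_fixed_N = binom.pmf(E, N, theta)         # N fixed: binomial
lik_fixed_E = nbinom.pmf(N - E, E, theta)    # E fixed: negative binomial
                                             # (N - E failures before the E-th success)
for lik in (lik_fixed_N, lik_fixed_E):
    post = lik * prior
    post /= post.sum()
    print(theta[np.argmax(post)])            # same posterior mode both times
```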
Problems with Bayesian Inference
Subjectivity
• The most serious objection to Bayesian statistics.
• Two observers/researchers can arrive at different conclusions from the same statistical model with different priors.
The denominator is hard to calculate
$$P(\theta \mid \text{Data}) = \frac{P(\text{Data} \mid \theta)\,P(\theta)}{P(\text{Data})} = \frac{P(\text{Data} \mid \theta)\,P(\theta)}{\int_{\theta'} P(\text{Data} \mid \theta')\,p(\theta')\,d\theta'}$$
• In some cases we can use conjugate priors, but in many cases we cannot.
• If the number of parameters is small, we can use grid approximation.
• However, even with a moderate number of parameters, grid approximation is not practical.
Sampling from the Posterior
• Markov chain Monte Carlo (MCMC)
  • Metropolis–Hastings
  • Gibbs sampling (JAGS, BUGS)
• Hamiltonian Monte Carlo (HMC)
  • Stan
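A minimal Metropolis–Hastings sketch for the beta–binomial example above (uniform prior, N = 24, E = 7); the random-walk proposal scale 0.1 and the burn-in length are arbitrary tuning choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, E = 24, 7

def log_unnorm_posterior(theta):
    """log(likelihood x uniform prior), up to an additive constant."""
    if not 0 < theta < 1:
        return -np.inf
    return E * np.log(theta) + (N - E) * np.log(1 - theta)

theta, samples = 0.5, []
for _ in range(20_000):
    proposal = theta + rng.normal(0, 0.1)    # random-walk proposal
    if np.log(rng.uniform()) < log_unnorm_posterior(proposal) - log_unnorm_posterior(theta):
        theta = proposal                     # accept; otherwise keep current theta
    samples.append(theta)

print(np.mean(samples[5_000:]))              # ~ posterior mean 8/26 ~ 0.31
```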
Principle of Indifference
• “If you are completely ignorant about which of a set of exclusive and exhaustive propositions is true, then you should assign them equal probabilities that sum to one.”
Sober, Elliott (2008): Evidence and Evolution: The Logic Behind the Science. Cambridge University Press.
Bayesian Inference Violates the Principle of Indifference
• Uniform prior: we believe all $0 \le \theta \le 1$ have the same prior probability.
• We might think that this prior is “uninformative”.
• But change the metric from the probability $\theta$ to the odds $q = \frac{\theta}{1-\theta}$: the implied prior on $q$ is no longer uniform.
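A quick numeric sketch of this point: push uniform draws of θ through the odds transform and the implied distribution of q is far from flat.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 1, 100_000)  # "indifferent" prior on the probability scale
q = theta / (1 - theta)             # the same belief expressed as odds

# Equal-width bins on the odds scale carry very unequal probability mass:
counts, _ = np.histogram(q, bins=[0, 1, 2, 3, 4, np.inf])
print(counts / counts.sum())        # ~ [0.50, 0.17, 0.08, 0.05, 0.20]
```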
How to set the prior?
• Weakly informative priors
• Informative priors
  • Prior studies
  • Moment matching
  • Expert knowledge
• Objective priors
  • Jeffreys prior
  • Reference prior
Jeffreys Prior
• Jeffreys proposed an “objective” prior that is invariant under monotone transformations of the parameter.
• Based on Fisher information.
• It is not uninformative.
• For example, for the binomial distribution, the Jeffreys prior is beta(0.5, 0.5).
[Plot: the Jeffreys prior beta(0.5, 0.5)]
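A short derivation sketch of that binomial result (a standard textbook fact; $I(\theta)$ denotes the Fisher information):
$$p(\theta) \propto \sqrt{I(\theta)}, \qquad I(\theta) = \frac{N}{\theta(1-\theta)} \;\Rightarrow\; p(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2} = \text{beta}(0.5,\, 0.5)$$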
Reading Suggestions
• Kruschke, John K. (2014): Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press. (Some of the simulations were based on code from this book.)
• Lambert, Ben (2018): A Student's Guide to Bayesian Statistics. 1st ed. Los Angeles: SAGE.
• McElreath, Richard (2020): Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Taylor and Francis CRC Press.
• Sober, Elliott (2008): Evidence and Evolution: The Logic Behind the Science. Cambridge, UK: Cambridge University Press.
Conclusions
• Bayesian statistics is a very flexible approach.
• We update our beliefs after observing data.
• It allows natural probability statements about the parameters.
• Bayesian inference violates the principle of indifference.
https://xkcd.com/1132/
Thank you!