T4 Probability
Probability
1. All definitions agree on the algebraic and arithmetic procedures that must be
followed; hence, the definition does not influence the outcome.
The frequentist approach is based on the notion of statistical regularity; i.e., in the
long run, over replicates, the cumulative relative frequency of an event (E) stabilizes.
The best way to illustrate this is with an example experiment that we run many
times, measuring the cumulative relative frequency (crf). The crf is simply the
relative frequency computed cumulatively over some number of replicates of
samples, each with a sample space S.
Suppose we have a treatment for high blood pressure. The event, E, we are
interested in is successfully controlling the blood pressure. So, we want to be able to
make a prediction about the probability that a patient treated in the future will have
blood pressure under control, P(E). To estimate this probability we conduct an
experiment that is replicated over time in months. The data are presented in the
table below.
The crf values down the rightmost column fluctuate the most in the beginning, but
rapidly stabilize. Statistical regularity is the stabilization of the crf in the face of
individual fluctuations from month to month in the relative frequency of E.
We can get an idea of this by using an example with “nearly infinite” replications.
[Figure: cumulative relative frequency (y-axis, 0 to 1) plotted against the number of replicates (x-axis, 0 to 10,000); after early fluctuation the crf stabilizes.]
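Statistical regularity is easy to reproduce by simulation. The sketch below is a minimal Python illustration; the success rate of 0.7 and the seed are arbitrary assumptions, not values from the blood-pressure example.

```python
import random

random.seed(1)

p_true = 0.7          # assumed long-run success rate (illustrative only)
n_replicates = 10000  # "nearly infinite" number of replicates
successes = 0

for i in range(1, n_replicates + 1):
    successes += random.random() < p_true   # one Bernoulli trial: event E occurs or not
    if i in (10, 100, 1000, 10000):
        # cumulative relative frequency after i replicates
        print(f"n = {i:>5}  crf = {successes / i:.3f}")
```

The early crf values wander, while the later ones settle close to the underlying rate, which is the stabilization described above.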
For all probability models to give consistent results about the outcomes of future
events they need to obey four simple axioms (Kolmogorov 1933).
Probability axioms:
1. For any event E, 0 ≤ P(E) ≤ 1.
2. The probability of the entire sample space S is 1; i.e., P(S) = 1.
3. When events E and F are disjoint (they cannot occur together),
P(E or F) = P(E) + P(F).
Product rule:
The product rule applies when two events E1 and E2 are independent. E1 and
E2 are independent if the occurrence or non-occurrence of E1 does not change
the probability of E2 [and vice versa]. For independent events,
P(E1 and E2) = P(E1) × P(E2). For example, for two tosses of a fair coin,
P(head on toss 1 and head on toss 2) = 0.5 × 0.5 = 0.25. [A further statistical
definition requires the use of the multiplication theorem.] A small enumeration of
this example is sketched below.
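As a concrete check of axiom 3 and the product rule, one can enumerate the sample space of two independent tosses of a fair coin; a minimal Python sketch:

```python
from itertools import product

# Sample space for two independent tosses of a fair coin; each outcome has probability 1/4.
S = list(product("HT", repeat=2))
p = {outcome: 0.25 for outcome in S}

# Axiom 3 (disjoint events): P(two heads or two tails) = P(HH) + P(TT)
print(p[("H", "H")] + p[("T", "T")])   # 0.5

# Product rule (independence): P(H on toss 1 and H on toss 2) = P(H) * P(H)
print(0.5 * 0.5, p[("H", "H")])        # 0.25 0.25
```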
The binomial model gives the probability of observing k successes in n independent
trials, each with probability of success p:
P = \binom{n}{k} p^{k} (1-p)^{n-k}, \qquad \text{where } \binom{n}{k} = \frac{n!}{k!\,(n-k)!}
If we had a fair coin we could predict the probability of specific outcomes (e.g., 1
head & 1 tail in two tosses) by setting the p parameter equal to 0.5. Note that the
model does not require this. In the case of the coin toss, we are interested in a
conditional probability; i.e., what is the probability of obtaining, say, 5 heads given a
fair coin (p = 0.5) and 12 tosses, or P(k=5 | p=0.5, n=12).
CASE 1: PROBABILITY.
Let’s continue to use the familiar coin tossing experiment to examine this inversion.
P = \binom{n}{k} \left(\tfrac{1}{2}\right)^{k} \left(\tfrac{1}{2}\right)^{n-k}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}
The question is the same: “If I toss a fair coin 12 times, what is the probability that I
will obtain 5 heads and 7 tails?”
The answer comes directly from the above formula where n = 12, and k = 5. The
probability of such a future event is 0.193359.
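A minimal check of this arithmetic with Python's standard library:

```python
from math import comb

n, k, p = 12, 5, 0.5
# Binomial probability P(k heads | p, n) = C(n, k) * p^k * (1 - p)^(n - k)
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(prob)   # 0.193359375
```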
From the probability perspective we can look at the distribution of all possible
outcomes.
This is the distribution of mutually exclusive outcomes that comprise the set of all
possible outcomes under the model where p = 0.5. Remember probability axiom 2,
where P(S) = 1; the probabilities of the outcomes (i.e., 0 to 12 heads) sum to 1.
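Evaluating the same formula for every outcome from 0 to 12 heads illustrates axiom 2; a small sketch:

```python
from math import comb

n, p = 12, 0.5
# Probability of each mutually exclusive outcome: 0, 1, ..., 12 heads
dist = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
print(sum(dist.values()))   # 1.0  (axiom 2: P(S) = 1)
```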
CASE 2: LIKELIHOOD.
The second question is: “What is the probability that my coin is fair if I tossed it 12
times and observed 5 heads and 7 tails?”
We have inverted the problem. In the previous case (1) we were interested in the
probability of a future outcome given that my coin is fair. In this case (2) we are
interested in the probability that my coin is fair, given a particular outcome.
So, in the likelihood framework we have inverted the question such that the
hypothesis (H) is variable, and the outcome (let’s call it the data, D) is constant.
A problem: What we want to measure is P(H|D). The problem is that we can’t work
with the probability of a hypothesis, only the relative frequencies of outcomes. The
solution comes from the knowledge that there is a relationship between P(H|D) and
P(D|H):
P(H|D) = αP(D|H), where α is an unknown constant.
The likelihood of the hypothesis given the data, L(H|D), is proportional to the
probability of the data given the hypothesis, P(D|H). As long as we stick to
comparing hypotheses on the same data and probability model, the constant remains
the same, and we can compare the likelihood scores. We cannot make comparisons
on different data using likelihoods.
Let’s use the binomial model to look at the application of probability as compared
with likelihood.
PROBABILITIES                                    Data
                                    D1: 1H & 1T       D2: 2H
Hypotheses   H1: p(H) = 1/4            0.375            0.0625
             H2: p(H) = 1/2            0.5              0.25
Following the probability axioms, and as we saw in the binomial distribution above,
given a single hypothesis (i.e., H2: p(H) = 0.5), the different outcomes can be
summed. For example, P(D1 or D2|H2) = P(D1|H2) + P(D2|H2), a well-known
result, with all possible outcomes summing to 1. However, we cannot use the
addition axiom over different hypotheses H1 and H2; i.e., P(D1|H1 or D2|H2) ≠
P(D1|H1) + P(D2|H2).
LIKELIHOODS                                      Data
                                    D1: 1H & 1T       D2: 2H
Hypotheses   H1: p(H) = 1/4         α1 × 0.375        α2 × 0.0625
             H2: p(H) = 1/2         α1 × 0.5          α2 × 0.25
Under likelihood we can work with different hypotheses as long as we stick to the
same dataset. Take the likelihoods of H1 and H2 under D1. We can infer that
H1 is ¾ as likely as H2 (0.375/0.5 = 0.75). Note that when working with likelihoods,
we compute the probabilities and drop the constant for convenience. The likelihoods
do not sum to 1 because the probability terms are for the same outcome drawn from
different distributions [probabilities for the total set of outcomes S in the same
distribution sum to 1].
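The cells in both tables come from the same binomial calculation with n = 2 tosses. A small sketch reproducing them, together with the likelihood ratio of H1 versus H2 under D1 (the shared constant α1 cancels):

```python
from math import comb

def binom(k, n, p):
    """P(k heads | p, n) under the binomial model."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two tosses (n = 2); D1 = 1 head & 1 tail (k = 1), D2 = 2 heads (k = 2)
for name, p in [("H1: p(H) = 1/4", 0.25), ("H2: p(H) = 1/2", 0.5)]:
    print(name, binom(1, 2, p), binom(2, 2, p))
# H1: p(H) = 1/4  0.375  0.0625
# H2: p(H) = 1/2  0.5    0.25

# Likelihood ratio of H1 vs H2 on the same data D1 (constant alpha cancels)
print(binom(1, 2, 0.25) / binom(1, 2, 0.5))   # 0.75
```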
Let’s use likelihood to follow through on our question of the probability that the coin
is fair given 12 tosses with 5 heads and 7 tails. As always, our tosses are
independent.
Perhaps there is an alternative hypothesis, i.e., one where p ≠ 0.5, that has a higher
likelihood. To explore this possibility we take the binomial formula as our likelihood
function and evaluate the resulting likelihoods with respect to various values of p and
the given data. The results can be plotted as a curve; this curve is sometimes called
the likelihood surface. The curve for our data (12,5) is shown below.
[Figure: likelihood plotted against values of p from 0 to 1 for the data n = 12, k = 5; the curve peaks at the ML estimate of p = 0.42.]
IMPORTANT NOTE: It looks like a distribution, but don’t be fooled: the area under the
curve does not sum to 1. The curve reflects the probability of the same data under
different values of p (a parameter of the model), and these are not mutually
exclusive outcomes within a single set of all the possible outcomes.
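A grid evaluation of the likelihood function reproduces the curve and its peak; a minimal sketch (the grid step of 0.01 is an arbitrary choice):

```python
from math import comb

n, k = 12, 5   # the observed data: 5 heads in 12 tosses

def likelihood(p):
    """L(p | data), proportional to P(data | p); the constant alpha is dropped."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

grid = [i / 100 for i in range(101)]   # candidate values of p from 0 to 1
ml_p = max(grid, key=likelihood)
print(ml_p)   # 0.42 on this grid, close to the exact MLE of k/n = 5/12 ≈ 0.4167
# The likelihood values across the grid do not sum to 1: different values of p
# are not mutually exclusive outcomes drawn from a single sample space.
```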