Lecture 7 Bayesian Models - Merged
Bayesian Models
Study area
Area: 144 sq km
Unit Cell Size: 1 sq km
[Evidence and deposit layers (legend): H – High / L – Low; Px – Proximal / Ds – Distal (distance to fault layer); D – Deposit / ND – Non deposit (deposit layer, used as training data)]
Probabilistic approach:
Class of pixel (1,1) given high arsenic and proximal to fault
Probability & Statistics
[Diagram: Population → Sampling → Sample / Data; sample Statistics → Inferencing → population Parameters]
Frequency Distributions
Example – Uranium in groundwater
• Water samples were taken from 36 locations in Powai as part of a study to determine the natural variation of total dissolved solids in the area.
[Map: U concentrations in groundwater samples across the study area]
Frequency Distribution
[Histogram: frequency of measured concentration values]
Frequency distributions = probability distributions (when idealized and fitted to mathematical functions)
Probability: the “frequentist” approach
• probability should be assessed in purely objective terms
• no room for subjectivity on the part of individual researchers
• knowledge about probabilities comes from the relative frequency of a large number of trials
– this is a good model for coin tossing
– not so useful for predicting complex problems, where many of the factors are unknown…e.g., the stock market
Frequentist: "The probability of a coin landing heads is 50% because we
observed it in many trials."
Bayesian: "I believe the coin is fair based on prior evidence, and I’ll update
my belief as I see new data."
Probability: the Bayesian approach
• Bayes Theorem
– Thomas Bayes
– 18th century English clergyman
• Probability as a degree of belief: .5 = even odds; .1 = 1 chance out of 10
Probability
“something-has-to-happen rule”:
– The probability of the set of all possible outcomes of a
trial must be 1.
– P(S) = 1
(S represents set of all possible outcomes.)
CAUTION: are the outcomes equally likely?
Just because there are two outcomes does not mean they are 50-50.
In a desert: P(Rain) is nearly 1%.
$$\frac{1}{\binom{49}{6}} = \frac{1}{\;49!/(43!\,6!)\;} = \frac{1}{13{,}983{,}816} \approx 7.2 \times 10^{-8}$$
("49 choose 6": out of 49 numbers, this is the number of distinct combinations of 6.)
The probability function (note, sums to 1.0):
x                    p(x)
+$2,000,000 (win)    7.2 × 10⁻⁸
−$1 (lose)           0.999999928
Expected Value
E(X) = P(win) × $2,000,000 + P(lose) × (−$1.00)
     = 7.2 × 10⁻⁸ × 2.0 × 10⁶ + 0.999999928 × (−1) = 0.144 − 0.999999928 ≈ −$0.86
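As a quick check of this arithmetic, a minimal Python sketch using the same figures (the $2,000,000 prize and $1 stake are the values quoted above):

```python
from math import comb

# Chance of matching all 6 numbers drawn from 49: 1 / C(49, 6)
p_win = 1 / comb(49, 6)        # 1 / 13,983,816 ~= 7.2e-8
p_lose = 1 - p_win             # ~= 0.999999928

# Expected value of a $1 ticket with a $2,000,000 prize
expected_value = p_win * 2_000_000 + p_lose * (-1)
print(p_win, expected_value)   # ~7.2e-08, ~ -0.86
```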
I’m 90% sure Tesla will drop tomorrow—Elon’s tweet seemed off
Rules of probability:
addition rule
Definition: events that have no outcomes in common (and thus cannot occur together) are called mutually exclusive.
For mutually exclusive events A and B: P(A or B) = P(A) + P(B).
• P(A|B)=Prob of A, given B
Conditional probability (cont.)
• P(B|A) = P(A&B)/P(A)
Independence
With notation for conditional probabilities, we can now
formalize the definition of independence
• events A and B are independent whenever
P(B|A) = P(B)
$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{P(d \mid h)\,P(h) + P(d \mid \lnot h)\,P(\lnot h)}$$
Does the patient have cancer or not?
• A patient takes a lab test and the result comes back positive. It is
known that the test returns a correct positive result in 98% of the
cases and a correct negative result in 97% of the cases.
Furthermore, only 0.008 of the entire population has this disease.
P(cancer | +) = P(+ | cancer) P(cancer) / [P(+ | cancer) P(cancer) + P(+ | no cancer) P(no cancer)]
= (0.98 × 0.008) / (0.98 × 0.008 + 0.03 × 0.992) = 0.208511
Probability of the test being positive when there was no cancer
= 1 − probability of the test being negative when there was no cancer = 1 − 0.97 = 0.03
P(no cancer | +) = 1 − 0.208511 = 0.791489
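A minimal Python sketch of the same Bayes' theorem calculation (all numbers come from the example above):

```python
# Values given in the example
p_cancer = 0.008                  # prior: P(cancer)
p_pos_given_cancer = 0.98         # P(+ | cancer)
p_neg_given_no_cancer = 0.97      # P(- | no cancer)
p_pos_given_no_cancer = 1 - p_neg_given_no_cancer   # 0.03

# Bayes' theorem: P(cancer | positive test)
evidence = (p_pos_given_cancer * p_cancer
            + p_pos_given_no_cancer * (1 - p_cancer))
p_cancer_given_pos = p_pos_given_cancer * p_cancer / evidence

print(round(p_cancer_given_pos, 6))       # 0.208511
print(round(1 - p_cancer_given_pos, 6))   # 0.791489
```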
Choosing Hypotheses
• Maximum Likelihood hypothesis:
$$h_{ML} = \arg\max_{h \in H} P(d \mid h)$$
[Map: 10 km × 10 km study area showing target deposits and geological feature B1]
Objective: To estimate the probability of occurrence of D in each unit cell of the study area
Approach: Use BAYES’ THEOREM for updating the prior probability of the occurrence of
mineral deposit to posterior probability based on the conditional probabilities (or
weights of evidence) of the geological features.
Weights of evidence model
Step 1: Calculation of prior probability
[Map: 10 km × 10 km study area (S) divided into 1 km × 1 km unit cells, with target deposits marked]
• The probability of the occurrence of the targeted mineral deposit type when no other geological information about the area is available or considered:
$$P(D) = \frac{n(D)}{n(S)}$$
where n(D) is the number of unit cells containing a deposit and n(S) is the total number of unit cells in the study area.
$$P(D \mid B) = \frac{P(D \cap B)}{P(B)} = P(D)\,\frac{P(B \mid D)}{P(B)}$$
$$P(D \mid \bar{B}) = \frac{P(D \cap \bar{B})}{P(\bar{B})} = P(D)\,\frac{P(\bar{B} \mid D)}{P(\bar{B})}$$
$$W^{+} = \log_e \frac{P(B \mid D)}{P(B \mid \bar{D})}\,;\qquad W^{-} = \log_e \frac{P(\bar{B} \mid D)}{P(\bar{B} \mid \bar{D})}$$
Step 3: Calculation of weights of evidence
$$W^{+} = \log_e \frac{P(B \mid D)}{P(B \mid \bar{D})}\,;\qquad W^{-} = \log_e \frac{P(\bar{B} \mid D)}{P(\bar{B} \mid \bar{D})}$$
[Venn diagram: evidence layer B1 and deposits D within the study area S]
$$P(D) = \frac{n(D)}{n(S)}$$
$$P(B \mid D) = \frac{n(B \cap D)}{n(D)}$$
$$P(B \mid \bar{D}) = \frac{n(B \cap \bar{D})}{n(\bar{D})}$$
$$P(\bar{B} \mid D) = \frac{n(\bar{B} \cap D)}{n(D)} = \frac{n(D) - n(D \cap B)}{n(D)}$$
$$P(\bar{B} \mid \bar{D}) = \frac{n(\bar{B} \cap \bar{D})}{n(\bar{D})} = \frac{n(S) - n(B) - n(D) + n(B \cap D)}{n(\bar{D})}$$
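The counting formulas above translate directly into code. A minimal sketch (the function name is illustrative, not from the lecture):

```python
import math

def weights_of_evidence(n_S, n_D, n_B, n_B_and_D):
    """W+, W- and Contrast for a binary evidence layer B and deposits D.

    n_S       : total number of unit cells in the study area
    n_D       : number of unit cells containing a deposit
    n_B       : number of unit cells where feature B is present
    n_B_and_D : number of deposit cells where B is present
    """
    p_B_given_D       = n_B_and_D / n_D
    p_B_given_notD    = (n_B - n_B_and_D) / (n_S - n_D)
    p_notB_given_D    = (n_D - n_B_and_D) / n_D
    p_notB_given_notD = (n_S - n_B - n_D + n_B_and_D) / (n_S - n_D)

    w_plus  = math.log(p_B_given_D / p_B_given_notD)
    w_minus = math.log(p_notB_given_D / p_notB_given_notD)
    return w_plus, w_minus, w_plus - w_minus   # contrast C = W+ - W-
```

For instance, with n_S = 100, n_D = 10, n_B_and_D = 4 and a hypothetical n_B = 16 (the actual extent of B1 is only shown on the exercise map), this returns W⁺ ≈ 1.10 and W⁻ ≈ −0.37, close to the B1 values quoted in the exercise below.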
Exercise
Study area S: 10 km × 10 km; unit cell size = 1 sq km; each deposit occupies 1 unit cell.
[Map: study area S showing geological features B1 and B2 and the deposit locations]
Calculate the weights of evidence (W⁺ and W⁻) and Contrast values for B1 and B2, using the formulas from Step 3.
For B1: P(B1|D) = n(B1 ∩ D) / n(D) = 4/10; W⁺(B1) = 1.09888, W⁻(B1) = −0.3678.
Contrast (C) measures the net strength of spatial association between the
geological feature and mineral deposits
Contrast = W+ – W-
Combining the evidence layers:
Loge(O{D|B1, B2}) = Loge(O{D}) + W±(B1) + W±(B2)
Converting the posterior odds back to a probability: P = O/(1+O) = 0.0705/1.0705 = 0.0658
[Posterior probabilities for the four B1/B2 combinations: 0.2968, 0.0854, 0.2342, 0.0658]
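A minimal sketch of this combination step, assuming the evidence layers are conditionally independent given the deposits (the standard weights-of-evidence assumption); the function name is illustrative:

```python
import math

def posterior_probability(prior_prob, weights):
    """Combine the prior P(D) with the W+/W- weights of the evidence layers."""
    log_odds = math.log(prior_prob / (1 - prior_prob)) + sum(weights)
    odds = math.exp(log_odds)
    return odds / (1 + odds)   # P = O / (1 + O), e.g. 0.0705 / 1.0705 = 0.0658
```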
Bayesian Networks
& Classifiers
Probabilistic Classification
• Establishing a probabilistic model for classification
– Discriminative model
P(C | X), where C = c_1, …, c_L and X = (X_1, …, X_n)
[Diagram: discriminative probabilistic classifier mapping an input x = (x_1, x_2, …, x_n) to P(C | x)]
Probabilistic Classification
• Establishing a probabilistic model for classification (cont.)
– Generative model
P(X | C), where C = c_1, …, c_L and X = (X_1, …, X_n)
[Diagram: generative probabilistic model of x = (x_1, x_2, …, x_n) for each class C]
The Joint Probability Distribution
• Joint probabilities can be between any number of variables, e.g. P(A = true, B = true, C = true)
• For each combination of variables, we need to say how probable that combination is
• The probabilities of these combinations need to sum to 1
• Once you have the joint probability distribution, you can calculate any probability involving A, B, and C

A      B      C      P(A,B,C)
false  false  false  0.1
false  false  true   0.2
false  true   false  0.05
false  true   true   0.05
true   false  false  0.3
true   false  true   0.1
true   true   false  0.05
true   true   true   0.15
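A sketch of how any probability involving A, B and C can be read off the joint table above (the dictionary keys encode the (A, B, C) truth values):

```python
# Joint distribution P(A, B, C) from the table above, keyed by (A, B, C)
joint = {
    (False, False, False): 0.10, (False, False, True): 0.20,
    (False, True,  False): 0.05, (False, True,  True): 0.05,
    (True,  False, False): 0.30, (True,  False, True): 0.10,
    (True,  True,  False): 0.05, (True,  True,  True): 0.15,
}

# Marginal P(A=true): sum over all rows with A=true
p_A = sum(p for (a, b, c), p in joint.items() if a)            # 0.6

# Conditional P(B=true | A=true) = P(A=true, B=true) / P(A=true)
p_AB = sum(p for (a, b, c), p in joint.items() if a and b)     # 0.2
print(p_A, p_AB / p_A)                                         # 0.6, 0.333...
```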
Independence
Variables A and B are independent if any of the following hold:
• P(A,B) = P(A) P(B)
• P(A | B) = P(A)
• P(B | A) = P(B)
Conditional Independence
Variables A and B are conditionally independent given C if any of the
following hold:
• P(A, B | C) = P(A | C) P(B | C)
• P(A | B, C) = P(A | C)
• P(B | A, C) = P(B | C)
A Bayesian Network
Suppose there are four binary variables: A, B, C, D such that
• A is independent => P(A|B,C,D) = P(A) => A does not have parents
A Bayesian Network
A Bayesian network is made up of:
1. A Directed Acyclic Graph (DAG)
2. Parameters: a conditional probability distribution table for each node

[DAG: A → B; B → C; B → D]

A      P(A)
false  0.6
true   0.4

A      B      P(B|A)
false  false  0.01
false  true   0.99
true   false  0.7
true   true   0.3

B      C      P(C|B)
false  false  0.4
false  true   0.6
true   false  0.9
true   true   0.1

B      D      P(D|B)
false  false  0.02
false  true   0.98
true   false  0.05
true   true   0.95
Node parameters
Conditional probability distribution for C given B

B      C      P(C|B)
false  false  0.4
false  true   0.6
true   false  0.9
true   true   0.1

For a given combination of values of the parents (B in this example), the entries for the child variable must add up to 1,
e.g. P(C=true | B=false) + P(C=false | B=false) = 1
Bayesian Networks
Two important properties:
1. Encodes the conditional dependence relationships
between the variables in the graph structure
2. Is a compact representation of the joint probability
distribution over all variables
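To make property 2 concrete, a sketch that recovers a joint probability from the CPTs of the example network above via the chain rule P(A,B,C,D) = P(A)·P(B|A)·P(C|B)·P(D|B) (7 independent parameters instead of the 15 a full joint table over four binary variables would need):

```python
# CPTs from the example network (A -> B, B -> C, B -> D)
P_A = {True: 0.4, False: 0.6}
P_B_given_A = {(True, True): 0.3,   (True, False): 0.7,
               (False, True): 0.99, (False, False): 0.01}   # key: (A, B)
P_C_given_B = {(True, True): 0.1,   (True, False): 0.9,
               (False, True): 0.6,  (False, False): 0.4}    # key: (B, C)
P_D_given_B = {(True, True): 0.95,  (True, False): 0.05,
               (False, True): 0.98, (False, False): 0.02}   # key: (B, D)

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) via the chain rule for this DAG."""
    return P_A[a] * P_B_given_A[(a, b)] * P_C_given_B[(b, c)] * P_D_given_B[(b, d)]

print(joint(True, False, True, False))   # 0.4 * 0.7 * 0.6 * 0.02 = 0.00336
```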
Bayesian Classifier
• One binary variable at the core of the network, called the class variable
• The class variable can have as many child variables (called attribute variables) as needed
• The class variable has no parent
• All attribute variables have the class variable as their parent
• The attribute variables can have more than one parent variable.
Bayesian Classifier
D is the class variable; A, B and C are attribute variables
[Network diagram: class variable D with attribute variables A, B and C as its children, and a conditional probability table for each node, e.g. P(D) and P(A|D)]
PlayTennis: Bayesian Network
[Network diagram: class variable PlayTennis with attribute nodes Outlook, Temperature, Humidity and Wind]
Naïve Bayesian Classifier
• Bayes classification
P(C|X) ∝ P(X|C)P(C) = P(X_1, …, X_n | C) P(C)
Difficulty: learning the joint probability P(X_1, …, X_n | C)
• Naïve Bayes classification
– Assumption: all input features are conditionally independent given the class!
P(X_1, X_2, …, X_n | C) = P(X_1 | C) P(X_2 | C) ⋯ P(X_n | C)
Naïve Bayesian Classifier
Learning phase: given a training set S,
For each target value c_i (c_i = c_1, c_0):
  P̂(C = c_i) ← estimate P(C = c_i) with examples in S;
  For every feature value x_jk of each feature X_j (j = 1, …, F; k = 1, …, N_j):
    P̂(X_j = x_jk | C = c_i) ← estimate P(X_j = x_jk | C = c_i) with examples in S;
Test phase: given an unknown instance X' = (a_1, …, a_n), assign the label c_1 if
[P̂(a_1 | c_1) ⋯ P̂(a_n | c_1)] P̂(c_1) > [P̂(a_1 | c_0) ⋯ P̂(a_n | c_0)] P̂(c_0), otherwise assign c_0.
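A minimal Python sketch of these two phases for discrete features (plain frequency counts, no smoothing; function and variable names are illustrative, not from the lecture):

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples, labels):
    """Learning phase: estimate P(C) and P(X_j = x | C) by relative frequency."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}

    # cond[(j, value, c)] approximates P(X_j = value | C = c)
    cond = defaultdict(float)
    pair_counts = Counter((j, v, c) for x, c in zip(examples, labels)
                          for j, v in enumerate(x))
    for (j, v, c), cnt in pair_counts.items():
        cond[(j, v, c)] = cnt / class_counts[c]
    return priors, cond

def classify(x, priors, cond):
    """Test phase: pick the class maximizing P(c) * prod_j P(x_j | c)."""
    best, best_score = None, -1.0
    for c, p_c in priors.items():
        score = p_c
        for j, v in enumerate(x):
            score *= cond[(j, v, c)]
        if score > best_score:
            best, best_score = c, score
    return best
```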
Example: PlayTennis
[Network diagram: class variable PlayTennis (yes/No) with attribute nodes Outlook (Sunny, Overcast), Temperature (Hot, Mild, Cool), Humidity (High, Normal, Low) and Wind (Strong, Weak)]
Example
• Test Phase
– Given a new instance, predict its label:
x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables estimated in the learning phase:
P(Outlook=Sunny|Play=Yes) = 2/9        P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9     P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9        P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9          P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14                     P(Play=No) = 5/14
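Completing the test phase, a quick check that multiplies out the looked-up values (the numbers are exactly those listed above):

```python
# Naive Bayes scores for x' = (Sunny, Cool, High, Strong)
score_yes = 9/14 * 2/9 * 3/9 * 3/9 * 3/9    # P(Yes) * prod P(a_i | Yes) ~= 0.0053
score_no  = 5/14 * 3/5 * 1/5 * 4/5 * 3/5    # P(No)  * prod P(a_i | No)  ~= 0.0206

print("Yes" if score_yes > score_no else "No")   # -> "No": predicts PlayTennis = No
```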
PlayTennis Problem
With numeric data
Temperatures on previous 14 days and Jones’s playing history
Yes: 25.2, 19.3, 18.5, 21.7, 20.1, 24.3, 22.8, 23.1, 19.
No: 27.3, 30.1, 17.4, 29.5, 15.1
Naïve Bayesian Classifier
Numeric (floating point) data
• Algorithm: Continuous-valued Features
– Infinitely many possible values for a feature
– Conditional probability often modeled with the normal distribution:
$$\hat{P}(X_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\!\left(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\right)$$
μ_ji : mean (average) of the feature values X_j of examples for which C = c_i
σ_ji : standard deviation of the feature values X_j of examples for which C = c_i
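A small sketch of this estimate in Python, fitted to the 'No' class temperatures listed above (the query temperature 22.0 is an arbitrary illustration, and the sample standard deviation with n−1 is assumed):

```python
import math

def gaussian_likelihood(x, values):
    """P_hat(X_j = x | C = c_i) under a normal model fitted to `values`
    (sample mean and sample standard deviation)."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / (len(values) - 1))
    return (math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
            / (math.sqrt(2 * math.pi) * sigma))

# 'No' class temperatures from the slide above; 22.0 is an arbitrary query value
temps_no = [27.3, 30.1, 17.4, 29.5, 15.1]
print(gaussian_likelihood(22.0, temps_no))
```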