BAYESIAN LEARNING

MODULE: 4

Introduction
• Bayesian reasoning provides a probabilistic approach to learning and inference.

• It is based on the assumption that the quantities of interest are governed by probability distributions, and that optimal decisions can be made by reasoning about these probabilities together with observed data.

Usefulness of Bayesian Learning

Bayesian learning methods are relevant to our study of machine learning for two different reasons:

• First, Bayesian learning algorithms that calculate explicit probabilities for hypotheses, such as the naive Bayes classifier, are among the most practical approaches to certain types of learning problems.

• They provide a useful perspective for understanding many learning algorithms that do not explicitly manipulate probabilities.

Features of Bayesian learning
1. Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct.
   – This provides a more flexible approach to learning than algorithms that completely eliminate a hypothesis if it is found to be inconsistent with any single example.

2. Prior knowledge can be combined with observed data to determine the final probability of a hypothesis. In Bayesian learning, prior knowledge is provided by asserting
   (1) a prior probability for each candidate hypothesis, and
   (2) a probability distribution over the observed data for each possible hypothesis.

Features of Bayesian learning
3. Bayesian methods can accommodate hypotheses that make probabilistic predictions (e.g., hypotheses such as "this pneumonia patient has a 93% chance of complete recovery").

4. New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities.

5. Even in cases where Bayesian methods prove computationally intractable, they can provide a standard of optimal decision making against which other practical methods can be measured.

Bayes Theorem in Machine Learning
• In machine learning we are often interested in determining the best
hypothesis from some space H, given the observed training data D.

• One way to specify what we mean by the best hypothesis is to say that we
demand the most probable hypothesis, given the data D plus any initial
knowledge about the prior probabilities of the various hypotheses in H.

• Bayes theorem provides a direct method for calculating such probabilities.

Prior probability
• We shall write P(h) to denote the initial probability that hypothesis h holds,
before we have observed the training data.

• P(h) is often called the prior probability of h and may reflect any
background knowledge we have about the chance that h is a correct
hypothesis.

• If we have no such prior knowledge, then we might simply assign the same prior probability to each candidate hypothesis.

• Similarly, we will write P(D) to denote the prior probability that training data
D will be observed (i.e., the probability of D given no knowledge about which
hypothesis holds).

Probability
• P(D|h) denotes the probability of observing data D given some world in which hypothesis h holds.

• In machine learning problems, we are interested in the probability P(h|D) that h holds given the observed training data D.

• P(h|D) is called the posterior probability of h, because it reflects our confidence that h holds after we have seen the training data D.

• Notice the posterior probability P(h|D) reflects the influence of the training data D, in contrast to the prior probability P(h), which is independent of D.
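
These quantities are tied together by Bayes theorem, which gives the posterior in terms of the likelihood and the prior:

$$ P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)} $$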

Example: let h be the hypothesis that a customer will buy a computer, and let D describe the customer (35 years old, fair credit rating, income of $40,000).

▪ P(h|D): Probability that the customer will buy a computer given that we know his age, credit rating and income. (Posterior probability of h)

▪ P(h): Probability that the customer will buy a computer regardless of age, credit rating and income. (Prior probability of h)

▪ P(D|h): Probability that the customer is 35 years old, has a fair credit rating and earns $40,000, given that he has bought our computer. (Likelihood of D given h)

▪ P(D): Probability that a person from our set of customers is 35 years old, has a fair credit rating and earns $40,000. (Prior probability of D)

Maximum a posteriori (MAP) hypothesis
• In many learning scenarios, the learner considers some set of candidate hypotheses H and is interested in finding the most probable hypothesis h ∈ H given the observed data D (or at least one of the maximally probable hypotheses, if there are several).

• Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis.

• We can determine the MAP hypotheses by using Bayes theorem to calculate the posterior probability of each candidate hypothesis.

• P(D) is a constant independent of h, so it can be dropped when maximizing the posterior over hypotheses.

• P(D|h) is called the likelihood of D given h, and any hypothesis that maximizes P(D|h) is a maximum likelihood (ML) hypothesis.
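
In symbols, the MAP and maximum likelihood hypotheses are given by the standard definitions:

$$ h_{MAP} \equiv \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\, P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\, P(h) $$

$$ h_{ML} \equiv \arg\max_{h \in H} P(D \mid h) $$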
Example of the Bayes rule
• Consider a medical diagnosis problem in which there are two alternative hypotheses:
  (1) that the patient has a particular form of cancer, and
  (2) that the patient does not.

• The available data come from a particular laboratory test with two possible outcomes: + (positive) and - (negative).

• We have prior knowledge that over the entire population only .008 of people have this disease.

• Furthermore, the lab test is only an imperfect indicator of the disease: it returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. In other cases, the test returns the opposite result.
Computing probabilities
• The situation can be summarized by the following probabilities:
– P(cancer) = .008, P(¬cancer) =.992
– P(+|cancer) = .98, P(-|cancer) = .02
– P(+|¬cancer) = .03, P(-|¬cancer) = .97
• Suppose we now observe a new patient for whom the lab test returns a
positive result.
• Should we diagnose the patient as having cancer or not?
• The maximum a posteriori hypothesis can be found by comparing P(D|h)P(h) for each hypothesis:
  – P(+|cancer)P(cancer) = (.98)(.008) = .0078
  – P(+|¬cancer)P(¬cancer) = (.03)(.992) = .0298
• Thus, h_MAP = ¬cancer.
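
A minimal Python sketch of this calculation (the variable names are mine, not from the slides). Normalizing by P(+) also gives the exact posterior P(cancer|+) ≈ 0.21, so even a positive test leaves ¬cancer the more probable hypothesis:

```python
# Priors and test characteristics from the example
p_cancer = 0.008
p_not_cancer = 0.992
p_pos_given_cancer = 0.98
p_pos_given_not_cancer = 0.03

# Unnormalized posteriors P(+|h) * P(h) for a positive test result
score_cancer = p_pos_given_cancer * p_cancer              # 0.00784
score_not_cancer = p_pos_given_not_cancer * p_not_cancer  # 0.02976

# MAP hypothesis: the larger unnormalized posterior wins
h_map = "cancer" if score_cancer > score_not_cancer else "not cancer"

# Dividing by P(+) = sum of both scores gives the exact posterior
p_cancer_given_pos = score_cancer / (score_cancer + score_not_cancer)

print(h_map)                          # not cancer
print(round(p_cancer_given_pos, 3))   # 0.209
```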
Naive Bayes Classifier

Naïve Bayes Classifier
Along with decision trees, neural networks, and nearest-neighbor methods, it is one of the most practical learning methods.

When to use
• Moderate or large training set available
• Attributes that describe instances are conditionally
independent given classification

Successful applications:
• Diagnosis
• Classifying text documents
Naïve Bayes Classifier
• What can we do if our data d has several attributes?
• Naïve Bayes assumption: Attributes that describe data instances are
conditionally independent given the classification hypothesis

$$ P(d \mid h) = P(a_1, \ldots, a_T \mid h) = \prod_{t=1}^{T} P(a_t \mid h) $$
– it is a simplifying assumption; obviously it may be violated in reality
– in spite of that, it works well in practice

• The Bayesian classifier that uses the Naïve Bayes assumption and computes the MAP hypothesis is called the Naïve Bayes classifier.
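
Combining this assumption with the MAP decision rule gives the usual Naïve Bayes prediction:

$$ v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{t=1}^{T} P(a_t \mid v_j) $$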

An Illustrative Example of NBC

Example 2:

Given the data for symptoms and whether the patient has flu or not, classify the following test case:
x = (chills = Y, runny nose = N, headache = Mild, fever = Y)

■ P(Flu = Y) = 5/8
■ P(Flu = N) = 3/8
■ P(chills = Y | Y) = 3/5
■ P(chills = Y | N) = 1/3
■ P(runny nose = N | Y) = 1/5
■ P(runny nose = N | N) = 2/3
■ P(headache = Mild | Y) = 2/5
■ P(headache = Mild | N) = 1/3
■ P(fever = Y | Y) = 4/5
■ P(fever = Y | N) = 1/3

■ P(Yes|x) ∝ [P(chills=Y|Y) P(runny nose=N|Y) P(headache=Mild|Y) P(fever=Y|Y)] P(Flu=Y)
  = 3/5 * 1/5 * 2/5 * 4/5 * 5/8 = 0.024 (maximum value, hence the predicted class)

■ P(No|x) ∝ [P(chills=Y|N) P(runny nose=N|N) P(headache=Mild|N) P(fever=Y|N)] P(Flu=N)
  = 1/3 * 2/3 * 1/3 * 1/3 * 3/8 ≈ 0.0093
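
A small Python sketch of this computation (the dictionary layout and names are my own, not taken from the slides):

```python
# Class priors and per-attribute conditional probabilities from the example
priors = {"Yes": 5/8, "No": 3/8}
cond = {
    "Yes": {"chills=Y": 3/5, "runny nose=N": 1/5, "headache=Mild": 2/5, "fever=Y": 4/5},
    "No":  {"chills=Y": 1/3, "runny nose=N": 2/3, "headache=Mild": 1/3, "fever=Y": 1/3},
}

x = ["chills=Y", "runny nose=N", "headache=Mild", "fever=Y"]

# Unnormalized posterior for each class: prior times product of conditionals
scores = {}
for label in priors:
    score = priors[label]
    for attribute_value in x:
        score *= cond[label][attribute_value]
    scores[label] = score

print(scores)                        # {'Yes': 0.024, 'No': ~0.0093}
print(max(scores, key=scores.get))   # Yes
```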
Zero probability error
❑ There is a chance that the probability of a hypothesis becomes zero because one of the conditional probabilities in the product has a zero count in the numerator.

❑ When this zero is multiplied by the other probabilities, the final probability becomes zero.

❑ This can be avoided by applying a smoothing technique called Laplace correction: if there are zero instances of a particular feature value, just add one to the count, which will not make much of a difference (a standard form of this correction is shown below).

❑ Ex: if a probability would be 0/400, it can be changed to 1/400, which will not make much of a difference.
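
One common way to write this add-one (Laplace) correction for an attribute with k possible values is:

$$ P(a_i \mid v_j) = \frac{n_c + 1}{n + k} $$

where n is the number of training examples belonging to class vj and nc is the number of those examples having attribute value ai. The simpler version above (0/400 → 1/400) adjusts only the numerator.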
Applications of Naïve Bayes Algorithms

■ Naïve Bayes is fast and thus can be used for making real-time predictions
■ It can predict the probabilities of multiple classes of the target variable
■ It can be used for text classification, spam filtering and sentiment analysis
■ Naïve Bayes and collaborative filtering together can help build recommendation systems that filter unseen information and predict whether a user would like a given resource or not
Bayes Optimal Classifier
• So far we have considered the question "what is the most probable hypothesis given the training data?"

• In fact, the question that is often of most significance is the closely related question:
  – what is the most probable classification of the new instance given the training data?

• Although it may seem that this second question can be answered by simply applying the MAP hypothesis to the new instance, in fact it is possible to do better.
Bayes Optimal Classifier
• To develop some intuitions, consider a hypothesis space containing three hypotheses, h1, h2, and h3.

• Suppose that the posterior probabilities of these hypotheses given the training data are 0.4, 0.3, and 0.3 respectively.

• Thus, h1 is the MAP hypothesis.

• Suppose a new instance x is encountered, which is classified positive by h1, but negative by h2 and h3.

• Taking all hypotheses into account, the probability that x is positive is 0.4 (the probability associated with h1), and the probability that it is negative is therefore 0.6.

• The most probable classification (negative) in this case is different from the classification generated by the MAP hypothesis.
Bayes Optimal Classifier
• In general, the most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities.

• If the possible classification of the new example can take on any value vj from some set V, then the probability P(vj|D) that the correct classification for the new instance is vj is just:

$$ P(v_j \mid D) = \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D) $$

• The optimal classification of the new instance is the value vj for which P(vj|D) is maximum:

$$ \arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D) $$
Bayes Optimal Classifier Example
Bayes optimal classification:

$$ \arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D) $$

Example:
P(h1|D) = .4,  P(-|h1) = 0,  P(+|h1) = 1
P(h2|D) = .3,  P(-|h2) = 1,  P(+|h2) = 0
P(h3|D) = .3,  P(-|h3) = 1,  P(+|h3) = 0

therefore

$$ \sum_{h_i \in H} P(+ \mid h_i)\, P(h_i \mid D) = .4 \qquad \sum_{h_i \in H} P(- \mid h_i)\, P(h_i \mid D) = .6 $$

and

$$ \arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D) = - $$
Bayes Optimal Classifier Example
Bayes optimal classification:

$$ \arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D) $$

Example:
P(h1|D) = .4,  P(-|h1) = 0,  P(+|h1) = 1
P(h2|D) = .3,  P(-|h2) = 0,  P(+|h2) = 1
P(h3|D) = .3,  P(-|h3) = 1,  P(+|h3) = 0

therefore

$$ \sum_{h_i \in H} P(+ \mid h_i)\, P(h_i \mid D) = .7 \qquad \sum_{h_i \in H} P(- \mid h_i)\, P(h_i \mid D) = .3 $$

and

$$ \arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D) = + $$
Gaussian Naïve Bayes for Continuous Attributes

The normal (Gaussian) distribution formula is used to compute the class-conditional probability of each continuous attribute.
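
For a continuous attribute value x in a class with mean x̄ and standard deviation S, the density used is the familiar normal pdf:

$$ P(x \mid c) = \frac{1}{\sqrt{2\pi}\, S} \exp\!\left( -\frac{(x - \bar{x})^2}{2 S^2} \right) $$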
Gaussian Naïve Bayes for Continuous Attributes
We calculate the mean (x̄) and standard deviation (S) of the attributes PSA and Age for the Cancer class.

Gaussian Naïve Bayes for Continuous Attributes
We calculate the mean (x̄) and standard deviation (S) of the attributes PSA and Age for the Healthy class.
Gaussian Naïve Bayes for Continuous Attributes

Test case: predict the class for a new patient, given that patient's PSA and Age values.

The predicted class is Cancer.
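
A minimal Python sketch of the Gaussian Naïve Bayes computation; the class statistics and test values below are illustrative assumptions for demonstration, not the figures from this example:

```python
import math

def gaussian_pdf(x, mean, std):
    """Normal density P(x | class) for one continuous attribute."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

# Assumed per-class statistics for PSA and Age, plus class priors;
# replace with the means/standard deviations computed from the actual table.
stats = {
    "Cancer":  {"PSA": (6.0, 1.5), "Age": (68.0, 6.0), "prior": 0.5},
    "Healthy": {"PSA": (2.0, 1.0), "Age": (55.0, 8.0), "prior": 0.5},
}

test = {"PSA": 5.5, "Age": 70.0}  # assumed test-case values

scores = {}
for label, s in stats.items():
    score = s["prior"]
    for attr, value in test.items():
        mean, std = s[attr]
        score *= gaussian_pdf(value, mean, std)
    scores[label] = score

print(max(scores, key=scores.get))  # class with the largest score
```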
