Bayesian Theory & Naïve Bayes Classifiers
Course 4232: Machine Learning
Dept. of Computer Science
Faculty of Science and Technology
Lecturer No: Week No: 03 Semester: Summer 21-22
Instructor: Dr. M M Manjurul Islam
Bayesian Classifier
A statistical classifier: performs probabilistic prediction, i.e., predicts class
membership probabilities
Foundation: Based on Bayes’ Theorem.
Performance: A basic Bayesian classifier, the naïve Bayesian classifier, has
performance comparable to decision tree and selected neural network
classifiers
Standard: Even when Bayesian methods are computationally intractable,
they can provide a standard of optimal decision making against which
other methods can be measured
Bayes’ Theorem: Basics
Total Probability Theorem: P(B) = Σ_{i=1}^{M} P(B|Ai) P(Ai)
Bayes’ Theorem: P(H|X) = P(X|H) P(H) / P(X)
Let X be a data sample (“evidence”): class label is unknown
Let H be a hypothesis that X belongs to class C
Classification is to determine P(H|X), (i.e., the posterior probability): the probability that
the hypothesis holds given the observed data sample X
P(H) (prior probability): the initial probability
E.g., X will buy computer, regardless of age, income, …
P(X): probability that sample data is observed
P(X|H) (likelihood): the probability of observing the sample X, given that the
hypothesis holds
E.g., Given that X will buy computer, the prob. that X is 31..40, medium income
Prediction Based on Bayes’ Theorem
Given training data X, posteriori probability of a hypothesis H, P(H|X),
follows the Bayes’ theorem
P(H|X) = P(X|H) P(H) / P(X)
Informally, this can be viewed as
posterior = likelihood × prior / evidence
Predicts X belongs to Ci iff the probability P(Ci|X) is the highest among
all the P(Ck|X) for all the k classes
Practical difficulty: It requires initial knowledge of many probabilities,
involving significant computational cost
Classification Is to Derive the Maximum Posteriori
Let D be a training set of tuples and their associated class labels, and each
tuple is represented by an n-D attribute vector X = (x1, x2, …, xn)
Suppose there are m classes C1, C2, …, Cm.
Classification is to derive the maximum posteriori, i.e., the maximal
P(Ci|X)
This can be derived from Bayes’ theorem
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
Since P(X) is constant for all classes, only the numerator P(X|Ci) P(Ci)
needs to be maximized
Naïve Bayes Classifier
A simplified assumption: attributes are conditionally independent (i.e.,
no dependence relation between attributes):
P(X|Ci) = ∏_{k=1}^{n} P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
This greatly reduces the computation cost: Only counts the class
distribution
If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk for
Ak divided by |Ci, D| (# of tuples of Ci in D)
If Ak is continuous-valued, P(xk|Ci) is usually computed based on a
Gaussian distribution with mean μ and standard deviation σ:
g(x, μ, σ) = (1 / (√(2π) σ)) · exp( −(x − μ)² / (2σ²) )
and P(xk|Ci) = g(xk, μCi, σCi), where μCi and σCi are estimated from the tuples of class Ci
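As an illustration of how the continuous case could be evaluated, here is a minimal Python sketch; the attribute value, mean, and standard deviation below are assumed purely for illustration and are not taken from the lecture:

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) used for a continuous attribute."""
    return (1.0 / (math.sqrt(2 * math.pi) * sigma)) * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Assumed per-class mean and standard deviation for one continuous
# attribute (e.g., age), estimated from the class-Ci tuples in D.
mu_ci, sigma_ci = 38.0, 7.5
p_xk_given_ci = gaussian(35.0, mu_ci, sigma_ci)   # P(xk|Ci) for xk = 35

# Under the naive assumption, P(X|Ci) is the product of such per-attribute terms.
print(round(p_xk_given_ci, 4))   # ~0.049 with the assumed parameters
```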
Bayesian Methods
Learning and classification methods based on probability
theory.
Bayes theorem plays a critical role in probabilistic learning
and classification.
Uses prior probability of each category given no information
about an item.
Categorization produces a posterior probability distribution
over the possible categories given a description of an item.
Does patient have cancer or not?
A patient takes a lab test and the result comes back positive. It is known
that the test returns a correct positive result in only 99% of the cases and a
correct negative result in only 95% of the cases. Furthermore, only 0.03 of
the entire population has this disease.
1. What is the probability that this patient has cancer?
2. What is the probability that he does not have cancer?
3. What is the diagnosis?
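One way to set up the calculation these questions ask for is sketched below, reading the stated figures as P(+ | cancer) = 0.99, P(− | no cancer) = 0.95 (hence P(+ | no cancer) = 0.05), and a prior P(cancer) = 0.03:

```python
# Bayes' theorem applied to the lab-test example.
p_cancer = 0.03               # prior: fraction of the population with the disease
p_pos_given_cancer = 0.99     # correct positive rate
p_pos_given_healthy = 0.05    # false positive rate = 1 - correct negative rate

# Evidence P(+) via the total probability theorem.
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_cancer_given_pos, 3))       # ~0.38: probability of cancer given +
print(round(1 - p_cancer_given_pos, 3))   # ~0.62: no cancer is the more probable diagnosis
```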
Maximum A Posteriori
Based on Bayes’ theorem, we can compute the Maximum A
Posteriori (MAP) hypothesis for the data
We are interested in the best hypothesis for some space H given
observed training data D.
hMAP = argmax_{h∈H} P(h|D)
     = argmax_{h∈H} P(D|h) P(h) / P(D)
     = argmax_{h∈H} P(D|h) P(h)
H: the set of all hypotheses.
Note that we can drop P(D) as the probability of the data is constant
(and independent of the hypothesis).
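A minimal sketch of MAP selection over a small hypothesis space; the three hypotheses, their priors, and their likelihoods are assumed purely for illustration:

```python
# Hypothetical hypothesis space H with assumed priors P(h) and likelihoods P(D|h).
hypotheses = {
    "h1": {"prior": 0.6, "likelihood": 0.10},
    "h2": {"prior": 0.3, "likelihood": 0.40},
    "h3": {"prior": 0.1, "likelihood": 0.90},
}

# h_MAP = argmax_h P(D|h) P(h); P(D) is dropped since it is constant over H.
h_map = max(hypotheses, key=lambda h: hypotheses[h]["likelihood"] * hypotheses[h]["prior"])
print(h_map)   # "h2": 0.3 * 0.4 = 0.12 beats 0.06 (h1) and 0.09 (h3)
```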
Maximum Likelihood
Now assume that all hypotheses are equally probable a priori,
i.e., P(hi) = P(hj) for all hi, hj in H.
This is called assuming a uniform prior. It simplifies computing
the posterior:
hML = argmax_{h∈H} P(D|h)
This hypothesis is called the maximum likelihood hypothesis.
Desirable Properties of Bayes Classifier
Incrementality: with each training example, the prior and the
likelihood can be updated dynamically: flexible and robust to
errors.
Combines prior knowledge and observed data: the prior probability of
a hypothesis is multiplied by the probability of the observed data
given that hypothesis (the likelihood)
Probabilistic hypothesis: outputs not only a classification, but a
probability distribution over all classes
Bayes Classifiers
Assumption: the training set consists of instances of the different classes cj,
each described as a conjunction of attribute values
Task: classify a new instance d, described by a tuple of attribute values,
into one of the classes cj ∈ C
Key idea: assign the most probable class cMAP using Bayes
Theorem.
cMAP = argmax_{cj∈C} P(cj | x1, x2, …, xn)
     = argmax_{cj∈C} P(x1, x2, …, xn | cj) P(cj) / P(x1, x2, …, xn)
     = argmax_{cj∈C} P(x1, x2, …, xn | cj) P(cj)
Parameters estimation
P(cj)
Can be estimated from the frequency of classes in the training
examples.
P(x1,x2,…,xn|cj)
O(|X|^n · |C|) parameters
Could only be estimated if a very, very large number of training
examples was available.
Independence Assumption: attribute values are conditionally
independent given the target value: naïve Bayes.
P(x1, x2, …, xn | cj) = ∏_i P(xi | cj)
cNB = argmax_{cj∈C} P(cj) ∏_i P(xi | cj)
Properties
Estimating P(xi|cj) instead of P(x1, x2, …, xn|cj) greatly
reduces the number of parameters (and the data sparseness).
The learning step in Naïve Bayes consists of estimating P(cj)
and P(xi|cj) based on the frequencies in the training data.
An unseen instance is classified by computing the class that
maximizes the posterior.
When conditional independence is satisfied, Naïve Bayes
corresponds to MAP classification.
Naïve Bayes Classifier: Training Dataset
Class:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’
Data to be classified:
X = (age <= 30, Income = medium, Student = yes, Credit_rating = Fair)

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31…40    high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31…40    low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31…40    medium   no        excellent       yes
31…40    high     yes       fair            yes
>40      medium   no        excellent       no
Naïve Bayes Classifier: An Example
P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357
Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
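The figures above can be reproduced with a short Python sketch that counts the frequencies directly from the 14 training tuples (a sketch for checking the arithmetic, not code from the lecture):

```python
from collections import Counter, defaultdict

# The 14 training tuples: (age, income, student, credit_rating, buys_computer).
data = [
    ("<=30", "high", "no", "fair", "no"),          ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"),      (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),          (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),         (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"),     (">40", "medium", "no", "excellent", "no"),
]
attrs = ["age", "income", "student", "credit_rating"]

class_counts = Counter(row[-1] for row in data)      # numerators for P(Ci)
cond_counts = defaultdict(Counter)                   # numerators for P(xk|Ci)
for row in data:
    c = row[-1]
    for a, v in zip(attrs, row[:-1]):
        cond_counts[(a, c)][v] += 1

X = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}

for c in class_counts:                               # P(X|Ci) * P(Ci) for each class
    score = class_counts[c] / len(data)
    for a in attrs:
        score *= cond_counts[(a, c)][X[a]] / class_counts[c]
    print(c, round(score, 3))   # yes: ~0.028, no: ~0.007 -> predict buys_computer = "yes"
```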
What Is a Discriminant Function?
For a classification problem, we define for each class a function
gi(x), i = 1, …, K
such that we choose Ci if
gi(x) = max_k gk(x)
Discriminant Functions
Choose Ci if gi(x) = max_k gk(x), with discriminants gi(x), i = 1, …, K
The discriminant can be taken as
gi(x) = P(Ci | x), or
gi(x) = p(x | Ci) P(Ci)
This defines K decision regions R1, …, RK, where
Ri = { x | gi(x) = max_k gk(x) }
K=2 Classes
Dichotomizer (K=2) vs Polychotomizer (K>2)
g(x) = g1(x) – g2(x)
Choose C1 if g(x) > 0, otherwise choose C2
Log odds: log [ P(C1 | x) / P(C2 | x) ]
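A small sketch of the two-class decision; the posterior values are assumed only to illustrate the sign test on g(x) and the equivalent log-odds test:

```python
import math

# Assumed posteriors for a single instance x (illustrative values only).
p_c1_given_x = 0.7
p_c2_given_x = 0.3

# Dichotomizer: g(x) = g1(x) - g2(x) with gi(x) = P(Ci|x).
g = p_c1_given_x - p_c2_given_x
choice = "C1" if g > 0 else "C2"

# Equivalent test on the log odds: choose C1 if log P(C1|x)/P(C2|x) > 0.
log_odds = math.log(p_c1_given_x / p_c2_given_x)
print(choice, round(log_odds, 3))   # C1, ~0.847
```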
Naïve Bayes Classifier: Comments
Advantages
Easy to implement
Good results obtained in most of the cases
Disadvantages
Assumption: class conditional independence, therefore loss of accuracy
Practically, dependencies exist among variables
E.g., in hospital data, a patient’s profile (age, family history, etc.),
symptoms (fever, cough, etc.), and diseases (lung cancer, diabetes, etc.)
are not independent; such dependencies cannot be modeled by a Naïve Bayes Classifier
How to deal with these dependencies? Bayesian Belief Networks (Chapter
9)
Textbook/ Reference Materials
Introduction to Machine Learning by Ethem Alpaydin
Machine Learning: An Algorithmic Perspective by
Stephen Marsland
Pattern Recognition and Machine Learning by
Christopher M. Bishop