Machine Learning

Classification Methods
Bayesian Classification, Nearest
Neighbor, Ensemble Methods
Bayesian Classification: Why?

• A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
• Foundation: based on Bayes' Theorem
• Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers
• Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data

Bayes’ Rule
Understanding Bayes' rule:

$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{P(d)}$$

where $d$ is the data and $h$ is the hypothesis (model). Rearranging gives

$$P(h \mid d)\,P(d) = P(d \mid h)\,P(h) = P(d, h),$$

i.e., the same joint probability appears on both sides.

Who is who in Bayes' rule:
• $P(h)$: prior belief (probability of hypothesis h before seeing any data)
• $P(d \mid h)$: likelihood (probability of the data if the hypothesis h is true)
• $P(d) = \sum_h P(d \mid h)\,P(h)$: data evidence (marginal probability of the data)
• $P(h \mid d)$: posterior (probability of hypothesis h after having seen the data d)

Example of Bayes Theorem
 Given:
 A doctor knows that meningitis causes stiff neck 50% of the time
 Prior probability of any patient having meningitis is 1/50,000
 Prior probability of any patient having stiff neck is 1/20

• If a patient has a stiff neck, what is the probability that he/she has meningitis?

$$P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$$
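As a quick check, the same calculation can be written out in a few lines of Python (the numbers come from the slide above):

```python
# Bayes' rule for the meningitis example: P(M|S) = P(S|M) * P(M) / P(S)
p_s_given_m = 0.5        # P(stiff neck | meningitis)
p_m = 1 / 50000          # prior P(meningitis)
p_s = 1 / 20             # prior P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)       # 0.0002
```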
Choosing Hypotheses
• Maximum Likelihood hypothesis: $h_{ML} = \arg\max_{h \in H} P(d \mid h)$
• Generally we want the most probable hypothesis given the training data. This is the maximum a posteriori (MAP) hypothesis: $h_{MAP} = \arg\max_{h \in H} P(h \mid d)$
• Useful observation: the MAP hypothesis does not depend on the denominator P(d)
Bayesian Classifiers
• Consider each attribute and the class label as random variables
• Given a record with attributes (A1, A2, …, An), the goal is to predict class C
  - Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An)
• Can we estimate P(C | A1, A2, …, An) directly from data?
Bayesian Classifiers
• Approach:
  - Compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes' theorem:

$$P(C \mid A_1 A_2 \cdots A_n) = \frac{P(A_1 A_2 \cdots A_n \mid C)\,P(C)}{P(A_1 A_2 \cdots A_n)}$$

  - Choose the value of C that maximizes P(C | A1, A2, …, An)
  - This is equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C)
• How to estimate P(A1, A2, …, An | C)?

Naïve Bayes Classifier
• Assume independence among the attributes Ai when the class is given:
  - P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)
  - We can estimate P(Ai | Cj) for all Ai and Cj
  - This is a simplifying assumption which may be violated in reality
• The Bayesian classifier that uses the naïve Bayes assumption and computes the MAP hypothesis is called the naïve Bayes classifier:

$$c_{\text{Naive Bayes}} = \arg\max_{c} P(c)\,P(\mathbf{x} \mid c) = \arg\max_{c} P(c) \prod_i P(a_i \mid c)$$
How to Estimate Probabilities
from Data?
• Class prior: P(C) = Nc/N
  - e.g., P(No) = 7/10, P(Yes) = 3/10
• For discrete attributes: P(Ai | Ck) = |Aik| / Nc
  - where |Aik| is the number of instances having attribute value Ai and belonging to class Ck
• Examples:
  - P(Status=Married | No) = 4/7
  - P(Refund=Yes | Yes) = 0
How to Estimate Probabilities
from Data?
• For continuous attributes:
  - Discretize the range into bins
    - one ordinal attribute per bin
    - violates the independence assumption
  - Two-way split: (A < v) or (A > v)
    - choose only one of the two splits as the new attribute
  - Probability density estimation:
    - assume the attribute follows a normal distribution
    - use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)
    - once the probability distribution is known, use it to estimate the conditional probability P(Ai | c)
How to Estimate Probabilities from
Data?
• Normal distribution:

$$P(A_i \mid c_j) = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}} \, e^{-\frac{(A_i - \mu_{ij})^2}{2\sigma_{ij}^2}}$$

  - one for each (Ai, cj) pair
• For (Income, Class=No):
  - sample mean = 110
  - sample variance = 2975

$$P(\text{Income}=120 \mid \text{No}) = \frac{1}{\sqrt{2\pi}\,(54.54)} \, e^{-\frac{(120-110)^2}{2(2975)}} = 0.0072$$
Naïve Bayesian Classifier:
Training Dataset
Class:
  C1: buys_computer = 'yes'
  C2: buys_computer = 'no'

New data to classify:
  X = (age = '<=30', income = 'medium', student = 'yes', credit_rating = 'fair')

age     income  student credit_rating  buys_computer
<=30    high    no      fair           no
<=30    high    no      excellent      no
31…40   high    no      fair           yes
>40     medium  no      fair           yes
>40     low     yes     fair           yes
>40     low     yes     excellent      no
31…40   low     yes     excellent      yes
<=30    medium  no      fair           no
<=30    low     yes     fair           yes
>40     medium  yes     fair           yes
<=30    medium  yes     excellent      yes
31…40   medium  no      excellent      yes
31…40   high    yes     fair           yes
>40     medium  no      excellent      no
Naïve Bayesian Classifier:
An Example
• Given X = (age = youth, income = medium, student = yes, credit_rating = fair)
• Maximize P(X|Ci)P(Ci), for i = 1, 2
• First step: compute the prior probability P(Ci) of each class from the training tuples:
  P(buys_computer = yes) = 9/14 = 0.643
  P(buys_computer = no) = 5/14 = 0.357
Naïve Bayesian Classifier:
An Example
• Given X = (age = youth, income = medium, student = yes, credit_rating = fair)
• Maximize P(X|Ci)P(Ci), for i = 1, 2
• Second step: compute P(X|Ci)
  P(age = youth | buys_computer = yes) = 2/9 = 0.222
  P(income = medium | buys_computer = yes) = 4/9 = 0.444
  P(student = yes | buys_computer = yes) = 6/9 = 0.667
  P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667

  P(X | buys_computer = yes) = P(age = youth | yes) × P(income = medium | yes) × P(student = yes | yes) × P(credit_rating = fair | yes)
                             = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
Naïve Bayesian Classifier:
An Example
• Given X = (age = youth, income = medium, student = yes, credit_rating = fair)
• Maximize P(X|Ci)P(Ci), for i = 1, 2
• Second step: compute P(X|Ci)
  P(age = youth | buys_computer = no) = 3/5 = 0.600
  P(income = medium | buys_computer = no) = 2/5 = 0.400
  P(student = yes | buys_computer = no) = 1/5 = 0.200
  P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400

  P(X | buys_computer = no) = P(age = youth | no) × P(income = medium | no) × P(student = yes | no) × P(credit_rating = fair | no)
                            = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
Naïve Bayesian Classifier:
An Example
• Given X = (age = youth, income = medium, student = yes, credit_rating = fair)
• Maximize P(X|Ci)P(Ci), for i = 1, 2
• We have computed in the first and second steps:
  P(buys_computer = yes) = 9/14 = 0.643
  P(buys_computer = no) = 5/14 = 0.357
  P(X | buys_computer = yes) = 0.044
  P(X | buys_computer = no) = 0.019
• Third step: compute P(X|Ci)P(Ci) for each class
  P(X | buys_computer = yes) P(buys_computer = yes) = 0.044 × 0.643 = 0.028
  P(X | buys_computer = no) P(buys_computer = no) = 0.019 × 0.357 = 0.007
• The naïve Bayesian classifier therefore predicts that X belongs to class "buys_computer = yes"
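The whole worked example can be reproduced with a short Python sketch over the training table above (the tuple encoding and variable names are our own choices):

```python
from collections import Counter

# (age, income, student, credit_rating, buys_computer) rows from the training table
data = [
    ("<=30", "high", "no", "fair", "no"),    ("<=30", "high", "no", "excellent", "no"),
    ("31...40", "high", "no", "fair", "yes"), (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),    (">40", "low", "yes", "excellent", "no"),
    ("31...40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),   (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31...40", "medium", "no", "excellent", "yes"),
    ("31...40", "high", "yes", "fair", "yes"), (">40", "medium", "no", "excellent", "no"),
]

x = ("<=30", "medium", "yes", "fair")        # the new tuple X

class_counts = Counter(row[-1] for row in data)
scores = {}
for c, n_c in class_counts.items():
    score = n_c / len(data)                  # prior P(C)
    for i, value in enumerate(x):            # product of P(Ai | C)
        n_ic = sum(1 for row in data if row[i] == value and row[-1] == c)
        score *= n_ic / n_c
    scores[c] = score

print(scores)                                # {'no': ~0.007, 'yes': ~0.028}
print(max(scores, key=scores.get))           # 'yes'
```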
Example
Training set: (the Refund / Marital Status / Income table shown on the slide)

Given a test record:
  X = (Refund = No, Marital Status = Married, Income = 120K)
Example of Naïve Bayes Classifier
Given a test record:
  X = (Refund = No, Marital Status = Married, Income = 120K)

• P(X | Class=No) = P(Refund=No | Class=No) × P(Married | Class=No) × P(Income=120K | Class=No)
                  = 4/7 × 4/7 × 0.0072 = 0.0024

• P(X | Class=Yes) = P(Refund=No | Class=Yes) × P(Married | Class=Yes) × P(Income=120K | Class=Yes)
                   = 1 × 0 × 1.2 × 10⁻⁹ = 0

• Since P(X|No)P(No) > P(X|Yes)P(Yes), we have P(No|X) > P(Yes|X)
  => Class = No
Avoiding the 0-Probability Problem

• If one of the conditional probabilities is zero, the entire expression becomes zero
• Probability estimation alternatives:

$$\text{Original: } P(A_i \mid C) = \frac{N_{ic}}{N_c}$$

$$\text{Laplace: } P(A_i \mid C) = \frac{N_{ic} + 1}{N_c + c} \qquad c\text{: number of classes}$$

$$\text{m-estimate: } P(A_i \mid C) = \frac{N_{ic} + m\,p}{N_c + m} \qquad p\text{: prior probability, } m\text{: parameter}$$
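A minimal sketch of the three estimators above (the function names are ours, and the example counts at the bottom are made up purely for illustration):

```python
def original_estimate(n_ic, n_c):
    return n_ic / n_c

def laplace_estimate(n_ic, n_c, c):
    # add-one smoothing: never returns zero
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, m, p):
    # blends the observed frequency with a prior p, weighted by m
    return (n_ic + m * p) / (n_c + m)

# a zero count no longer collapses the whole product to zero:
print(original_estimate(0, 7))          # 0.0
print(laplace_estimate(0, 7, c=2))      # ~0.111
print(m_estimate(0, 7, m=3, p=0.5))     # 0.15
```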
Naïve Bayes (Summary)
• Advantages
  - Robust to isolated noise points
  - Handles missing values by ignoring the instance during probability estimate calculations
  - Robust to irrelevant attributes
• Disadvantages
  - Assumes class-conditional independence, which may cause loss of accuracy
  - The independence assumption may not hold for some attributes; in practice, dependencies exist among variables
  - In that case, use other techniques such as Bayesian Belief Networks (BBN)
Remember

• Bayes' rule can be turned into a classifier
• Maximum a posteriori (MAP) hypothesis estimation incorporates prior knowledge; maximum likelihood (ML) estimation doesn't
• The naive Bayes classifier is a simple but effective Bayesian classifier for vector data (i.e., data with several attributes) that assumes the attributes are independent given the class
• Bayesian classification is a generative approach to classification
Classification Paradigms
• In fact, we can categorize three fundamental approaches to classification:
  - Generative models: model p(x|Ck) and P(Ck) separately and use Bayes' theorem to find the posterior probabilities P(Ck|x)
    - e.g., Naive Bayes, Gaussian Mixture Models, Hidden Markov Models, …
  - Discriminative models: determine P(Ck|x) directly and use it in the decision
    - e.g., Linear discriminant analysis, SVMs, NNs, …
  - Discriminant functions: find a function f that maps x onto a class label directly, without calculating probabilities
Slide from B.Yanik
Bayesian Belief Networks
• A Bayesian belief network allows a subset of the variables to be conditionally independent
• A graphical model of causal relationships
  - Represents dependencies among the variables
  - Gives a specification of the joint probability distribution
• Structure:
  - Nodes: random variables
  - Links: dependencies
  - X and Y are the parents of Z, and Y is the parent of P
  - There is no dependency between Z and P
  - The graph has no loops or cycles

Bayesian Belief Network: An Example

• Example network over the variables FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea (FamilyHistory and Smoker are the parents of LungCancer)

• The conditional probability table (CPT) for the variable LungCancer:

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
  LC     0.8       0.5        0.7        0.1
  ~LC    0.2       0.5        0.3        0.9

• The CPT gives the conditional probability for each possible combination of values of the node's parents

• Derivation of the probability of a particular combination of values x1, …, xn from the CPTs:

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{Parents}(Y_i))$$

  where xi is the value taken by variable Yi
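A hedged sketch of this factorization. Only the LungCancer CPT is given on the slide; the FamilyHistory and Smoker priors below are placeholder numbers used purely to illustrate the chain-rule product:

```python
# P(x1, ..., xn) = product over i of P(xi | Parents(Yi))
p_fh = 0.10                      # hypothetical P(FamilyHistory = true)
p_s = 0.30                       # hypothetical P(Smoker = true)
p_lc_given = {                   # CPT for LungCancer, taken from the slide
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}

# e.g. P(FH=true, S=true, LC=true) = P(FH) * P(S) * P(LC | FH, S)
joint = p_fh * p_s * p_lc_given[(True, True)]
print(joint)                     # 0.024 with the placeholder priors
```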
Training Bayesian Networks
 Several scenarios:
 Given both the network structure and all variables observable:
learn only the CPTs
 Network structure known, some hidden variables: gradient
descent (greedy hill-climbing) method, analogous to neural
network learning
 Network structure unknown, all variables observable: search
through the model space to reconstruct network topology
 Unknown structure, all hidden variables: No good algorithms
known for this purpose

 Ref. D. Heckerman: Bayesian networks for data mining

Lazy Learners
 The classification algorithms presented before are eager
learners
 Construct a model before receiving new tuples to classify
 Learned models are ready and eager to classify previously
unseen tuples
 Lazy learners
 The learner waits till the last minute before doing any model
construction
 In order to classify a given test tuple
 Store training tuples
 Wait for test tuples
 Perform generalization based on similarity between test and the
stored training tuples
Lazy vs Eager
Eager learners:
  - Do a lot of work on the training data
  - Do less work when test tuples are presented

Lazy learners:
  - Do less work on the training data
  - Do more work when test tuples are presented
Basic k-Nearest Neighbor Classification
• Given training data (x1, y1), …, (xN, yN)
• Define a distance metric D(xi, xj) between points in the input space
  - e.g., Euclidean distance, weighted Euclidean, Mahalanobis distance, TF-IDF, etc.
• Training method:
  - Save the training examples
• At prediction time:
  - Find the k training examples (x1, y1), …, (xk, yk) that are closest to the test example x under the distance D
  - Predict the most frequent class among those yi's
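A minimal NumPy sketch of this procedure, assuming Euclidean distance and a plain majority vote (the toy data is made up):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Predict the majority class among the k nearest training points."""
    distances = np.linalg.norm(X_train - x_test, axis=1)   # Euclidean distance D
    nearest = np.argsort(distances)[:k]                    # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

# toy example (made-up points)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))   # "A"
```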
Nearest-Neighbor Classifiers
• Requires three things:
  - The set of stored records
  - A distance metric to compute the distance between records
  - The value of k, the number of nearest neighbors to retrieve
• To classify an unknown record:
  - Compute its distance to the training records
  - Identify the k nearest neighbors
  - Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
K-Nearest Neighbor Model

• Classification:

$$\hat{y} = \text{most common class in } \{y_1, \ldots, y_K\}$$

• Regression:

$$\hat{y} = \frac{1}{K} \sum_{k=1}^{K} y_k$$
K-Nearest Neighbor Model: Weighted
by Distance
• Classification:

$$\hat{y} = \text{most common class in the weighted set } \{D(\mathbf{x}, \mathbf{x}_1)\,y_1, \ldots, D(\mathbf{x}, \mathbf{x}_K)\,y_K\}$$

• Regression:

$$\hat{y} = \frac{\sum_{k=1}^{K} D(\mathbf{x}, \mathbf{x}_k)\,y_k}{\sum_{k=1}^{K} D(\mathbf{x}, \mathbf{x}_k)}$$
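A small sketch of the distance-weighted variant for regression. The slide writes the weights abstractly as D(x, xk); the sketch exposes the weight function as a parameter and uses inverse distance as one common concrete choice (the toy data is made up):

```python
import numpy as np

def weighted_knn_regress(X_train, y_train, x_test, k=3,
                         weight=lambda d: 1.0 / (d + 1e-12)):
    """Weighted average of the k nearest targets; `weight` maps distances to weights."""
    distances = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(distances)[:k]
    w = weight(distances[nearest])
    return np.sum(w * y_train[nearest]) / np.sum(w)

# toy data: y roughly equals the sum of the coordinates
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y_train = np.array([0.0, 1.0, 1.0, 4.0])
print(weighted_knn_regress(X_train, y_train, np.array([0.1, 0.1]), k=3))
```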
Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]

• The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
The decision boundary implemented
by 3NN
The boundary is always the perpendicular bisector
of the line between two points (Voronoi tessellation)

Slide by Hinton
Nearest Neighbor Classification…

• Choosing the value of k:
  - If k is too small, the classifier is sensitive to noise points
  - If k is too large, the neighborhood may include points from other classes
Determining the value of k
 In typical applications k is in units or tens rather than in
hundreds or thousands
 Higher values of k provide smoothing that reduces the
risk of overfitting due to noise in the training data
 Value of k can be chosen based on error rate measures
• We should also avoid over-smoothing: do not choose k = n, where n is the total number of tuples in the training data set
Determining the value of k
• Given training examples (x1, y1), …, (xN, yN)
• Use N-fold cross-validation
  - Search over k = 1, 2, 3, …, Kmax; choose the search limit Kmax based on compute constraints
  - Calculate the average error for each k:
    - calculate the predicted class ŷi for each training point (xi, yi), i = 1, …, N (using all other points to build the model)
    - average the error over all training examples
  - Pick the k that minimizes the cross-validation error
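A sketch of this selection loop, assuming scikit-learn is available. The slide describes N-fold (leave-one-out style) cross-validation; the fold count is exposed as a parameter here, and X, y are placeholders for the training examples:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def choose_k(X, y, k_max=20, n_folds=5):
    """Return the k in 1..k_max with the lowest cross-validated error."""
    errors = {}
    for k in range(1, k_max + 1):
        scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=n_folds)
        errors[k] = 1.0 - scores.mean()          # average error over the folds
    return min(errors, key=errors.get)

# usage sketch: best_k = choose_k(X, y, k_max=20)
```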
Example

Example from J. Gamper


Choosing k

Slide from J. Gamper


Nearest neighbor Classification…
• k-NN classifiers are lazy learners
  - They do not build models explicitly
  - Unlike eager learners such as decision tree induction and rule-based systems
• Advantage: no training time
• Disadvantages:
  - Testing time can be long; classifying unknown records is relatively expensive
  - Curse of dimensionality: easily fooled in high-dimensional spaces
    - dimensionality reduction techniques are often used
Ensemble Methods

• One of the eager methods => builds a model over the training set
• Construct a set of classifiers from the training data
• Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers
General Idea
Original training data D

Step 1: Create multiple data sets D1, D2, …, Dt-1, Dt
Step 2: Build multiple classifiers C1, C2, …, Ct-1, Ct
Step 3: Combine the classifiers into C*
Why does it work?

• Suppose there are 25 base classifiers
  - Each classifier has error rate ε = 0.35
  - Assume the classifiers are independent
• Probability that the ensemble classifier makes a wrong prediction (i.e., at least 13 of the 25 vote wrongly):

$$\sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1 - \varepsilon)^{25-i} = 0.06$$

Examples of Ensemble Methods

• How to generate an ensemble of classifiers?
  - Bagging
  - Boosting
  - Random Forests
Bagging: Bootstrap AGGregatING
• Bootstrap: data resampling
  - Generate multiple training sets by resampling the original training data with replacement
  - The data sets have different "specious" patterns
• Sampling with replacement: each example has probability 1 − (1 − 1/n)^n (≈ 0.632 for large n) of appearing in a given bootstrap sample

  Original Data     1  2  3  4  5  6  7  8  9  10
  Bagging (Round 1) 7  8  10 8  2  5  10 10 5  9
  Bagging (Round 2) 1  4  9  1  2  3  2  7  3  2
  Bagging (Round 3) 1  8  5  10 5  5  9  6  3  7

• Build a classifier on each bootstrap sample
  - The specious patterns will not correlate across samples
  - The underlying true pattern will be common to many of them
• Combine the classifiers: label new test examples by a majority vote among the classifiers
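A minimal sketch of the bagging loop; the base learner and the data are placeholders (scikit-learn also provides this scheme as BaggingClassifier):

```python
import numpy as np
from collections import Counter

def bagging_predict(X_train, y_train, x_test, base_learner, n_rounds=10, rng=None):
    """Train one classifier per bootstrap sample and combine them by majority vote."""
    rng = rng or np.random.default_rng(0)
    n = len(X_train)
    votes = []
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)            # sample n indices with replacement
        model = base_learner().fit(X_train[idx], y_train[idx])
        votes.append(model.predict([x_test])[0])
    return Counter(votes).most_common(1)[0][0]      # majority vote

# usage sketch: bagging_predict(X, y, x, base_learner=lambda: DecisionTreeClassifier())
```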
Boosting

• An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records
• Initially, all N records are assigned equal weights
• Unlike bagging, the weights may change at the end of each boosting round
• The final classifier is the weighted combination of the weak classifiers
Boosting

• Records that are wrongly classified will have their weights increased
• Records that are classified correctly will have their weights decreased

  Original Data      1  2  3  4  5  6  7  8  9  10
  Boosting (Round 1) 7  3  2  8  7  9  4  10 6  3
  Boosting (Round 2) 5  4  9  4  2  5  1  7  4  2
  Boosting (Round 3) 4  4  8  10 4  5  4  6  3  4

• Example 4 is hard to classify
• Its weight is increased, so it is more likely to be chosen again in subsequent rounds
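A sketch of this weight-update idea. The slide does not spell out a particular formula, so this uses an AdaBoost-style update as one concrete instance (the labels at the bottom are made up):

```python
import numpy as np

def reweight(weights, y_true, y_pred):
    """One boosting round of the reweighting idea above (AdaBoost-style update)."""
    wrong = (y_true != y_pred)
    err = np.sum(weights[wrong]) / np.sum(weights)   # weighted error of this round
    alpha = 0.5 * np.log((1 - err) / err)            # vote weight of this weak classifier
    new_w = weights * np.exp(np.where(wrong, alpha, -alpha))
    return new_w / new_w.sum(), alpha                # renormalize so weights sum to 1

# initially, all N records get equal weight 1/N
w = np.full(10, 0.1)
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 0, 0, 1, 1, 1])   # records 3 and 10 are misclassified
w, alpha = reweight(w, y_true, y_pred)
print(w)   # the two misclassified records now carry larger weights
```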
