NAÏVE BAYES ALGORITHM
BAYES THEOREM
Bayes theorem helps to calculate the probability of one event occurring, under uncertain knowledge, when another event has already occurred.
Bayes theorem is derived from conditional probability.
Conditional probability: if A and B are two dependent events, then P(A|B) is the probability of occurrence of A given that B has already occurred.
Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
P(A|B) is the posterior probability: the updated probability of the hypothesis after considering the evidence.
P(B|A) is the likelihood: the probability of the evidence when the hypothesis is true.
P(A) is the class prior probability: the probability of the hypothesis before considering the evidence.
P(B) is the predictor prior probability: the probability of the evidence under any consideration.
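The formula is simple enough to check numerically. Below is a minimal Python sketch of the theorem; the probabilities in the usage example are made-up values for illustration only.

def bayes_posterior(likelihood, prior, evidence):
    """Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Illustrative (assumed) numbers: P(B|A) = 0.8, P(A) = 0.3, P(B) = 0.5
print(bayes_posterior(0.8, 0.3, 0.5))  # 0.48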
NAÏVE BAYES CLASSIFIER
Naïve Bayes is a classification algorithm that works based on Bayes theorem.
Naive Bayes is called naive because it assumes that each input variable is independent of the others given the class, so P(x1, x2, ..., xn | y) = P(x1|y) * P(x2|y) * ... * P(xn|y).
The working of the Naïve Bayes classifier can be understood with the help of the example below.
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we follow the steps below:
Convert the given dataset into frequency tables.
Generate likelihood tables by finding the probabilities of the given features.
Now, use Bayes theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Day   Outlook    Temp   Humidity   Wind     Play tennis
D1    Rainy      Hot    High       Weak     No
D2    Rainy      Hot    High       Strong   No
D3    Overcast   Hot    High       Weak     Yes
D4    Sunny      Mild   High       Weak     Yes
D5    Sunny      Cool   Normal     Weak     Yes
D6    Sunny      Cool   Normal     Strong   No
D7    Overcast   Cool   Normal     Strong   Yes
D8    Rainy      Mild   High       Weak     No
D9    Rainy      Cool   Normal     Weak     Yes
D10   Sunny      Mild   Normal     Weak     Yes
D11   Rainy      Mild   Normal     Strong   Yes
D12   Overcast   Mild   High       Strong   Yes
D13   Overcast   Hot    Normal     Weak     Yes
D14   Sunny      Mild   High       Strong   No
Step 1: frequency tables (Play tennis totals: Yes = 9, No = 5)

Outlook     Yes   No
Sunny        3     2
Overcast     4     0
Rainy        2     3

Temp        Yes   No
Hot          2     2
Mild         4     2
Cool         3     1

Humidity    Yes   No
High         3     4
Normal       6     1

Wind        Yes   No
Weak         6     2
Strong       3     3
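Step 1 can be reproduced in a few lines of Python. The sketch below is one possible way to build the frequency tables, counting (feature, value, class) triples over the dataset above with collections.Counter.

from collections import Counter

# Each row is (Outlook, Temp, Humidity, Wind, Play tennis).
dataset = [
    ("Rainy", "Hot", "High", "Weak", "No"),
    ("Rainy", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "Yes"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Sunny", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Rainy", "Mild", "High", "Weak", "No"),
    ("Rainy", "Cool", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "High", "Strong", "No"),
]

features = ["Outlook", "Temp", "Humidity", "Wind"]

# Count how often each (feature, value, class) triple appears.
freq = Counter()
for *values, label in dataset:
    for name, value in zip(features, values):
        freq[(name, value, label)] += 1

print(freq[("Outlook", "Sunny", "Yes")])  # 3
print(freq[("Outlook", "Sunny", "No")])   # 2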
Step 2: likelihood tables

Outlook     Yes    No
Sunny       3/9    2/5
Overcast    4/9    0/5
Rainy       2/9    3/5

Temp        Yes    No
Hot         2/9    2/5
Mild        4/9    2/5
Cool        3/9    1/5

Humidity    Yes    No
High        3/9    4/5
Normal      6/9    1/5

Wind        Yes    No
Weak        6/9    2/5
Strong      3/9    3/5
Step 3: calculating the posterior probability

Here P(yes) = 9/14 ≈ 0.64, P(no) = 5/14 ≈ 0.36, and P(sunny) = 5/14 ≈ 0.36.

P(yes | sunny) = P(sunny | yes) * P(yes) / P(sunny)
               = 0.33 * 0.64 / 0.36 = 0.60

P(no | sunny) = P(sunny | no) * P(no) / P(sunny)
              = 0.40 * 0.36 / 0.36 = 0.40

The posterior probability of yes is greater than the posterior probability of no, so the player can play tennis.
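Steps 2 and 3 reduce to a few divisions once the counts are in hand. A minimal sketch, with the class totals (9 Yes, 5 No) and the Sunny counts hard-coded from the tables above:

# Class totals and overall size, from the frequency tables.
n_yes, n_no, n_total = 9, 5, 14

p_sunny_given_yes = 3 / n_yes                  # 3/9  ~ 0.33
p_sunny_given_no = 2 / n_no                    # 2/5  = 0.40
p_yes, p_no = n_yes / n_total, n_no / n_total  # 9/14, 5/14
p_sunny = 5 / n_total                          # 5/14 ~ 0.36

# Bayes theorem for each class.
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny  # 0.60
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny     # 0.40

print("Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")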
Example 2: Naïve Bayes for filtering spam

D1: send us your password - spam
D2: send us your review - ham
D3: review your password - ham
D4: review us - spam
D5: send your password - spam
D6: send us your account - spam

New mail: review us now
P(spam) = 4/6    P(ham) = 2/6

Word       Spam   Ham
password   2/4    1/2
review     1/4    2/2
send       3/4    1/2
us         3/4    1/2
your       3/4    2/2
account    1/4    0/2
Since the word "now" does not appear in any training mail, it is ignored in the calculation.

P(spam | review us) = P(review|spam) * P(us|spam) * P(spam) / (P(review) * P(us))
                    = (1/4 * 3/4 * 4/6) / (3/6 * 4/6)
                    = 0.375
P(ham | review us) = P(review|ham) * P(us|ham) * P(ham) / (P(review) * P(us))
                   = (2/2 * 1/2 * 2/6) / (3/6 * 4/6)
                   = 0.5 > 0.375

So the mail is classified as ham.
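The same comparison can be scripted. A minimal sketch, with the priors, word likelihoods, and word evidence probabilities taken from the tables above (the unseen word "now" is dropped, as in the worked example):

p_prior = {"spam": 4 / 6, "ham": 2 / 6}

# P(word | class) from the likelihood table.
likelihood = {
    "spam": {"review": 1 / 4, "us": 3 / 4},
    "ham":  {"review": 2 / 2, "us": 1 / 2},
}

# P(word) over all six training mails.
evidence = {"review": 3 / 6, "us": 4 / 6}

def posterior(cls, words):
    """Posterior score for one class, multiplying word by word."""
    score = p_prior[cls]
    for w in words:
        score *= likelihood[cls][w] / evidence[w]
    return score

words = ["review", "us"]  # "now" is not in the vocabulary, so skip it
spam_score = posterior("spam", words)  # 0.375
ham_score = posterior("ham", words)    # 0.5
print("spam" if spam_score > ham_score else "ham")  # ham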
Why KNN and Linear Regression are poor choices for filtering spam
K-Nearest Neighbors (KNN) and Linear Regression are
typically poor choices for filtering spam for several reasons:
1. KNN's inefficiency:
KNN is a non-parametric algorithm that classifies data points
based on their similarity to the k-nearest neighbors. In the
context of spam filtering, this would involve comparing
incoming email messages to a database of labeled examples.
However, for spam filtering, the feature space can be high-dimensional, and the number of training examples (emails) can be very large. Calculating the distance from each new email to every stored training example can be computationally expensive and slow, making it impractical for real-time spam filtering.
2. Lack of interpretability:
KNN does not provide insights into why a particular
decision was made. It cannot explain why a given email
was classified as spam or not, which is crucial for spam
filter users who want to understand and trust the system's
decisions.
3. Sensitivity to irrelevant features:
KNN relies on the entire feature space, and it may not
perform well when irrelevant features are present in the
data. In spam filtering, many features could be irrelevant
or even detrimental to the classification task.
4. Linearity in Linear Regression:
Linear Regression is designed for regression tasks,
where the goal is to predict a continuous target
variable. Spam filtering is a binary classification
problem (spam or not spam). Using Linear Regression
in such cases may lead to suboptimal results because it
models a linear relationship between features and the
target, which doesn't capture the complex, non-linear
patterns in spam emails.
5. Assumptions of Linear Regression:
Linear Regression assumes that the relationship
between the features and the target variable is linear
and that the residuals (the errors) are normally
distributed and homoscedastic. These assumptions
may not hold in the case of spam filtering, where the
relationship between email features and spam status
can be highly non-linear, and the data may not meet
these assumptions.
6. Outliers and noise:
Both KNN and Linear Regression can be sensitive to
outliers and noise in the data. In spam filtering, there
may be a considerable amount of noisy or mislabeled
data, and these methods may not handle such data
well.