Naive Bayes
Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem. It is not a single algorithm but a family of algorithms that all share a common principle: every pair of features being classified is independent of each other, given the class. One of the simplest and most effective classification algorithms, the Naive Bayes classifier aids in the rapid development of machine learning models with fast prediction capabilities.
The Naive Bayes algorithm is used for classification problems and is especially popular in text classification, where the data is high-dimensional (each word represents one feature). Typical applications include spam filtering, sentiment detection and rating classification. The main advantage of Naive Bayes is its speed: it is fast to train, and making predictions is easy even with high-dimensional data. The model predicts the probability that an instance belongs to a class, given a set of feature values. It is a probabilistic classifier, and it is called "naive" because it assumes that the presence of one feature in the model is independent of the presence of any other feature. In other words, each feature contributes to the prediction with no relation to the others. In the real world, this condition is rarely satisfied. The algorithm uses Bayes' theorem for both training and prediction.
The “Naive” part of the name indicates the simplifying assumption made by the Naïve Bayes
classifier. The classifier assumes that the features used to describe an observation are conditionally
independent, given the class label. The “Bayes” part of the name refers to Reverend Thomas Bayes,
an 18th-century statistician and theologian who formulated Bayes’ theorem.
Consider a fictional dataset that describes the weather conditions for playing a game of golf. Given the weather conditions, each tuple classifies the conditions as fit ("Yes") or unfit ("No") for playing golf. Here is a tabular representation of our dataset.
The dataset is divided into two parts, namely, feature matrix and the response vector.
Feature matrix contains all the vectors(rows) of dataset in which each vector consists of the
value of dependent features. In above dataset, features are ‘Outlook’, ‘Temperature’,
‘Humidity’ and ‘Windy’.
Response vector contains the value of class variable(prediction or output) for each row of
feature matrix. In above dataset, the class variable name is ‘Play golf’.
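This split can be sketched in a few lines of Python. The rows below are illustrative only (the first one, Rainy/Hot/High/no wind → No, is the row referenced later in the text; the other two are placeholder examples, not the full dataset):

```python
# Each row: (Outlook, Temperature, Humidity, Windy) -> Play golf
# Only a few illustrative rows, not the full 14-row dataset.
dataset = [
    ("Rainy",    "Hot",  "High",   False, "No"),
    ("Overcast", "Hot",  "High",   False, "Yes"),
    ("Sunny",    "Cool", "Normal", False, "Yes"),
]

# Feature matrix: every column except the last.
X = [row[:-1] for row in dataset]
# Response vector: the class variable 'Play golf'.
y = [row[-1] for row in dataset]

print(X[0])  # ('Rainy', 'Hot', 'High', False)
print(y[0])  # No
```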
The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome. More specifically:
Feature independence: The features of the data are conditionally independent of each
other, given the class label.
Features are equally important: All features are assumed to contribute equally to the
prediction of the class label.
No missing data: The data should not contain any missing values.
The assumptions made by Naive Bayes are not generally correct in real-world situations. In fact, the independence assumption is never exactly correct, but it often works well in practice. Now, before moving to the formula for Naive Bayes, it is important to know Bayes' theorem.
Bayes’ Theorem
Bayes’ Theorem finds the probability of an event occurring given the probability of another event
that has already occurred. Bayes’ theorem is stated mathematically as the following equation:
P(A∣B) = P(B∣A) P(A) / P(B)
Basically, we are trying to find the probability of event A, given that event B is true. Event B is also termed the evidence.
P(A) is the prior probability of A, i.e. the probability of the event before the evidence is seen. The evidence is an attribute value of an unknown instance (here, event B).
P(A∣B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
P(B∣A) is the likelihood, i.e. the probability of the evidence given that the hypothesis A is true.
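As a quick numeric illustration of the theorem (the probability values below are made up for this example; they are not taken from the golf dataset):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Illustrative numbers only, not from the golf dataset.
p_a = 0.01         # prior P(A)
p_b_given_a = 0.9  # likelihood P(B|A)
p_b = 0.05         # evidence P(B)

p_a_given_b = p_b_given_a * p_a / p_b  # posterior P(A|B)
print(p_a_given_b)  # ≈ 0.18
```

Note how a strong likelihood (0.9) still yields a modest posterior (0.18) when the prior is small, a direct consequence of the formula.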
Now, with regard to our dataset, we can apply Bayes' theorem in the following way:
P(y∣X) = P(X∣y) P(y) / P(X)
where y is the class variable and X is a dependent feature vector (of size n):
X = (x1, x2, x3, …, xn)
Just to be clear, an example of a feature vector and corresponding class variable is the first row of the dataset: X = (Rainy, Hot, High, False) and y = No.
So basically, P(y∣X) here means the probability of "Not playing golf" given that the weather conditions are "Rainy outlook", "Hot temperature", "High humidity" and "No wind".
We assume that no pair of features are dependent. For example, the temperature being
‘Hot’ has nothing to do with the humidity or the outlook being ‘Rainy’ has no effect on the
winds. Hence, the features are assumed to be independent.
Secondly, each feature is given the same weight(or importance). For example, knowing only
temperature and humidity alone can’t predict the outcome accurately. None of the
attributes is irrelevant and assumed to be contributing equally to the outcome.
Now, it's time to put the naive assumption into Bayes' theorem: independence among the features. So now, we split the evidence into its independent parts. If two events A and B are independent, then:
P(A, B) = P(A) P(B)
Hence, we reach the result:
P(y∣x1, …, xn) = P(x1∣y) P(x2∣y) … P(xn∣y) P(y) / (P(x1) P(x2) … P(xn))
Now, as the denominator remains constant for a given input, we can remove that term:
P(y∣x1, …, xn) ∝ P(y) ∏i=1 to n P(xi∣y)
Now, we need to create a classifier model. For this, we find the probability of given set of inputs for
all possible values of the class variable y and pick up the output with maximum probability. This can
be expressed mathematically as:
y = argmax_y P(y) ∏i=1 to n P(xi∣y)
So, finally, we are left with the task of calculating P(y) and P(xi∣y).
Please note that P(y) is also called the class probability and P(xi∣y) is called the conditional probability.
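The argmax rule above can be sketched as a small function. The probability tables are passed in as plain dictionaries; the values used in the usage example are toy numbers for a single "Outlook" feature, not estimates from the golf dataset:

```python
# y_hat = argmax_y P(y) * prod_i P(x_i | y)
def predict(x, priors, likelihoods):
    """x: list of feature values; priors: {class: P(y)};
    likelihoods: {class: [dict mapping value -> P(x_i | y), one per feature]}."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(x):
            # Unseen values get probability 0 in this simple sketch.
            score *= likelihoods[c][i].get(value, 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy tables with a single feature, for illustration only.
priors = {"Yes": 0.6, "No": 0.4}
likelihoods = {
    "Yes": [{"Sunny": 0.3, "Rainy": 0.7}],
    "No":  [{"Sunny": 0.6, "Rainy": 0.4}],
}
print(predict(["Sunny"], priors, likelihoods))  # No  (0.4*0.6 > 0.6*0.3)
```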
The different naive Bayes classifiers differ mainly by the assumptions they make regarding the
distribution of P(xi∣y).
Let us try to apply the above formula manually on our weather dataset. For this, we need to do some
precomputations on our dataset.
We need to find P(xi∣yj) for each xi in X and yj in y. These calculations are demonstrated in tables 1-4, which give the frequency and likelihood of each feature value within each class. For example, the probability of playing golf given that the temperature is cool, i.e. P(temp. = cool | play golf = Yes) = 3/9.
Also, we need to find the class probabilities P(y), which have been calculated in table 5. For example, P(play golf = Yes) = 9/14.
So now, we are done with our pre-computations and the classifier is ready!
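The pre-computations can be sketched end to end in Python. The 14 rows below are a commonly used version of the play-golf dataset; since the table itself is not reproduced above, treat the exact rows as an assumption. The sketch counts class and feature-value frequencies, then classifies a new instance:

```python
from collections import Counter, defaultdict

# A commonly used version of the 14-row play-golf dataset (assumed rows).
data = [
    ("Rainy","Hot","High","False","No"),      ("Rainy","Hot","High","True","No"),
    ("Overcast","Hot","High","False","Yes"),  ("Sunny","Mild","High","False","Yes"),
    ("Sunny","Cool","Normal","False","Yes"),  ("Sunny","Cool","Normal","True","No"),
    ("Overcast","Cool","Normal","True","Yes"),("Rainy","Mild","High","False","No"),
    ("Rainy","Cool","Normal","False","Yes"),  ("Sunny","Mild","Normal","False","Yes"),
    ("Rainy","Mild","Normal","True","Yes"),   ("Overcast","Mild","High","True","Yes"),
    ("Overcast","Hot","Normal","False","Yes"),("Sunny","Mild","High","True","No"),
]

# Class probabilities P(y): table 5.
class_counts = Counter(row[-1] for row in data)
priors = {c: class_counts[c] / len(data) for c in class_counts}

# Conditional counts for P(x_i | y): tables 1-4.
cond = defaultdict(Counter)  # key (feature_index, class) -> Counter of values
for row in data:
    for i, value in enumerate(row[:-1]):
        cond[(i, row[-1])][value] += 1

def p(value, i, c):
    """P(x_i = value | y = c), estimated by counting."""
    return cond[(i, c)][value] / class_counts[c]

def predict(x):
    """argmax_y P(y) * prod_i P(x_i | y)."""
    scores = {c: priors[c] for c in priors}
    for c in priors:
        for i, value in enumerate(x):
            scores[c] *= p(value, i, c)
    return max(scores, key=scores.get)

print(priors["Yes"])        # 9/14 ≈ 0.643, matching table 5
print(p("Cool", 1, "Yes"))  # 3/9  ≈ 0.333, matching the example above
print(predict(("Sunny", "Hot", "Normal", "False")))  # Yes
```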
Now, suppose we want to classify a new instance: today = (Sunny outlook, Hot temperature, Normal humidity, No wind). Then:
P(Yes∣today) = P(Sunny Outlook∣Yes) P(Hot Temperature∣Yes) P(Normal Humidity∣Yes) P(No Wind∣Yes) P(Yes) / P(today)
P(No∣today) = P(Sunny Outlook∣No) P(Hot Temperature∣No) P(Normal Humidity∣No) P(No Wind∣No) P(No) / P(today)
Since P(today) is common to both probabilities, we can ignore it and find proportional probabilities as:
P(Yes∣today) ∝ P(Sunny Outlook∣Yes) P(Hot Temperature∣Yes) P(Normal Humidity∣Yes) P(No Wind∣Yes) P(Yes) ≈ 0.02116
and
P(No∣today) ∝ P(Sunny Outlook∣No) P(Hot Temperature∣No) P(Normal Humidity∣No) P(No Wind∣No) P(No) ≈ 0.0068
Now, since
P(Yes∣today) + P(No∣today) = 1
these numbers can be converted into probabilities by making their sum equal to 1 (normalization):
P(Yes∣today) = 0.02116 / (0.02116 + 0.0068) ≈ 0.76
and
P(No∣today) = 0.0068 / (0.02116 + 0.0068) ≈ 0.24
Since
P(Yes∣today) > P(No∣today)
the prediction is that golf would be played today, i.e. "Yes".
The method that we discussed above is applicable to discrete data. In the case of continuous data, we need to make some assumptions regarding the distribution of values of each feature; for example, Gaussian Naive Bayes assumes that each continuous feature follows a normal distribution within each class. As noted earlier, the different naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of P(xi∣y).
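A minimal sketch of the Gaussian variant for a single continuous feature, using made-up training values (the temperatures below are illustrative, not from the golf dataset):

```python
import math

# Gaussian Naive Bayes sketch for one continuous feature.
# Training values are made up for illustration.
samples = {"Yes": [20.0, 22.0, 24.0], "No": [30.0, 32.0, 34.0]}
priors = {"Yes": 0.5, "No": 0.5}

def gaussian_pdf(x, mean, var):
    """Normal density used as the likelihood P(x | y)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(values):
    """Estimate per-class mean and variance from the training values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

params = {c: fit(v) for c, v in samples.items()}

def predict(x):
    """argmax_y P(y) * N(x; mean_y, var_y)."""
    scores = {c: priors[c] * gaussian_pdf(x, *params[c]) for c in priors}
    return max(scores, key=scores.get)

print(predict(21.0))  # Yes  (close to the "Yes" class mean of 22)
print(predict(33.0))  # No
```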