NAÏVE BAYES ALGORITHM
BAYES THEOREM
Bayes theorem helps to calculate the probability of one event occurring, under uncertain knowledge, when another event has already occurred.
Bayes theorem is derived from conditional probability.
Conditional probability: if A and B are two dependent events, then P(A|B) is the probability of occurrence of A given that B has already occurred.
Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
P(A|B) is the posterior probability: the updated probability of the hypothesis after considering the evidence.
P(B|A) is the likelihood: the probability of the evidence when the hypothesis is true.
P(A) is the class prior probability: the probability of the hypothesis before considering the evidence.
P(B) is the predictor prior probability: the probability of the evidence under any consideration.
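The formula is simple enough to check numerically. Below is a minimal Python sketch of the theorem; the probabilities in the usage example are made-up values for illustration only.

def bayes_posterior(likelihood, prior, evidence):
    """Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Illustrative (assumed) numbers: P(B|A) = 0.8, P(A) = 0.3, P(B) = 0.5
print(bayes_posterior(0.8, 0.3, 0.5))  # 0.48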
NAÏVE BAYES CLASSIFIER
Naïve Bayes is a classification algorithm that works based on Bayes theorem.
Naive Bayes is called naive because it assumes that each input variable is independent of the others given the class, so P(x1, x2, ..., xn | y) = P(x1|y) * P(x2|y) * ... * P(xn|y).
The working of the Naïve Bayes classifier can be understood with the help of the example below.
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we follow the steps below:
Convert the given dataset into frequency tables.
Generate likelihood tables by finding the probabilities of the given features.
Now, use Bayes theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Day   Outlook    Temp   Humidity   Wind     Play tennis
D1    Rainy      Hot    High       Weak     No
D2    Rainy      Hot    High       Strong   No
D3    Overcast   Hot    High       Weak     Yes
D4    Sunny      Mild   High       Weak     Yes
D5    Sunny      Cool   Normal     Weak     Yes
D6    Sunny      Cool   Normal     Strong   No
D7    Overcast   Cool   Normal     Strong   Yes
D8    Rainy      Mild   High       Weak     No
D9    Rainy      Cool   Normal     Weak     Yes
D10   Sunny      Mild   Normal     Weak     Yes
D11   Rainy      Mild   Normal     Strong   Yes
D12   Overcast   Mild   High       Strong   Yes
D13   Overcast   Hot    Normal     Weak     Yes
D14   Sunny      Mild   High       Strong   No
Step 1: frequency tables (Play tennis totals: Yes = 9, No = 5)

Outlook     Yes   No
Sunny        3     2
Overcast     4     0
Rainy        2     3

Temp        Yes   No
Hot          2     2
Mild         4     2
Cool         3     1

Humidity    Yes   No
High         3     4
Normal       6     1

Wind        Yes   No
Weak         6     2
Strong       3     3
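Step 1 can be reproduced in a few lines of Python. The sketch below is one possible way to build the frequency tables, counting (feature, value, class) triples over the dataset above with collections.Counter.

from collections import Counter

# Each row is (Outlook, Temp, Humidity, Wind, Play tennis).
dataset = [
    ("Rainy", "Hot", "High", "Weak", "No"),
    ("Rainy", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "Yes"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Sunny", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Rainy", "Mild", "High", "Weak", "No"),
    ("Rainy", "Cool", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "High", "Strong", "No"),
]

features = ["Outlook", "Temp", "Humidity", "Wind"]

# Count how often each (feature, value, class) triple appears.
freq = Counter()
for *values, label in dataset:
    for name, value in zip(features, values):
        freq[(name, value, label)] += 1

print(freq[("Outlook", "Sunny", "Yes")])  # 3
print(freq[("Outlook", "Sunny", "No")])   # 2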
Step 2: likelihood tables

Outlook     Yes    No
Sunny       3/9    2/5
Overcast    4/9    0/5
Rainy       2/9    3/5

Temp        Yes    No
Hot         2/9    2/5
Mild        4/9    2/5
Cool        3/9    1/5

Humidity    Yes    No
High        3/9    4/5
Normal      6/9    1/5

Wind        Yes    No
Weak        6/9    2/5
Strong      3/9    3/5
Step 3: calculating the posterior probability

Here P(yes) = 9/14 ≈ 0.64, P(no) = 5/14 ≈ 0.36, and P(sunny) = 5/14 ≈ 0.36.

P(yes | sunny) = P(sunny | yes) * P(yes) / P(sunny)
               = 0.33 * 0.64 / 0.36 = 0.60

P(no | sunny) = P(sunny | no) * P(no) / P(sunny)
              = 0.40 * 0.36 / 0.36 = 0.40

The posterior probability of yes is greater than the posterior probability of no, so the player can play tennis.
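Steps 2 and 3 reduce to a few divisions once the counts are in hand. A minimal sketch, with the class totals (9 Yes, 5 No) and the Sunny counts hard-coded from the tables above:

# Class totals and overall size, from the frequency tables.
n_yes, n_no, n_total = 9, 5, 14

p_sunny_given_yes = 3 / n_yes                  # 3/9  ~ 0.33
p_sunny_given_no = 2 / n_no                    # 2/5  = 0.40
p_yes, p_no = n_yes / n_total, n_no / n_total  # 9/14, 5/14
p_sunny = 5 / n_total                          # 5/14 ~ 0.36

# Bayes theorem for each class.
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny  # 0.60
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny     # 0.40

print("Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")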
Example 2: Naïve Bayes for filtering spam

D1: send us your password - spam
D2: send us your review - ham
D3: review your password - ham
D4: review us - spam
D5: send your password - spam
D6: send us your account - spam

New mail: review us now
P(spam) = 4/6    P(ham) = 2/6

Word       Spam   Ham
password   2/4    1/2
review     1/4    2/2
send       3/4    1/2
us         3/4    1/2
your       3/4    2/2
account    1/4    0/2
Since the word "now" does not appear in any training mail, it is ignored in the calculation.

P(spam | review us) = P(review|spam) * P(us|spam) * P(spam) / (P(review) * P(us))
                    = (1/4 * 3/4 * 4/6) / (3/6 * 4/6)
                    = 0.375
P(ham | review us) = P(review|ham) * P(us|ham) * P(ham) / (P(review) * P(us))
                   = (2/2 * 1/2 * 2/6) / (3/6 * 4/6)
                   = 0.5 > 0.375

So the mail is classified as ham.
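The same comparison can be scripted. A minimal sketch, with the priors, word likelihoods, and word evidence probabilities taken from the tables above (the unseen word "now" is dropped, as in the worked example):

p_prior = {"spam": 4 / 6, "ham": 2 / 6}

# P(word | class) from the likelihood table.
likelihood = {
    "spam": {"review": 1 / 4, "us": 3 / 4},
    "ham":  {"review": 2 / 2, "us": 1 / 2},
}

# P(word) over all six training mails.
evidence = {"review": 3 / 6, "us": 4 / 6}

def posterior(cls, words):
    """Posterior score for one class, multiplying word by word."""
    score = p_prior[cls]
    for w in words:
        score *= likelihood[cls][w] / evidence[w]
    return score

words = ["review", "us"]  # "now" is not in the vocabulary, so skip it
spam_score = posterior("spam", words)  # 0.375
ham_score = posterior("ham", words)    # 0.5
print("spam" if spam_score > ham_score else "ham")  # ham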
Why KNN and Linear Regression are poor choices for filtering spam
K-Nearest Neighbors (KNN) and Linear Regression are
typically poor choices for filtering spam for several reasons:
1. KNN's inefficiency:
KNN is a non-parametric algorithm that classifies data points
based on their similarity to the k-nearest neighbors. In the
context of spam filtering, this would involve comparing
incoming email messages to a database of labeled examples.
However, for spam filtering, the feature space can be high-dimensional, and the number of training examples (emails) can be very large. Calculating the distance from each new email to every stored training example can be computationally expensive and slow, making it impractical for real-time spam filtering.
2. Lack of interpretability:
KNN does not provide insights into why a particular
decision was made. It cannot explain why a given email
was classified as spam or not, which is crucial for spam
filter users who want to understand and trust the system's
decisions.
3. Sensitivity to irrelevant features:
KNN relies on the entire feature space, and it may not
perform well when irrelevant features are present in the
data. In spam filtering, many features could be irrelevant
or even detrimental to the classification task.
4. Linearity in Linear Regression:
Linear Regression is designed for regression tasks,
where the goal is to predict a continuous target
variable. Spam filtering is a binary classification
problem (spam or not spam). Using Linear Regression
in such cases may lead to suboptimal results because it
models a linear relationship between features and the
target, which doesn't capture the complex, non-linear
patterns in spam emails.
5. Assumptions of Linear Regression:
Linear Regression assumes that the relationship
between the features and the target variable is linear
and that the residuals (the errors) are normally
distributed and homoscedastic. These assumptions
may not hold in the case of spam filtering, where the
relationship between email features and spam status
can be highly non-linear, and the data may not meet
these assumptions.
6. Outliers and noise:
Both KNN and Linear Regression can be sensitive to
outliers and noise in the data. In spam filtering, there
may be a considerable amount of noisy or mislabeled
data, and these methods may not handle such data
well.