Naïve Bayesian Classifier and K-Means Clustering
Student's Name
Institutional Affiliation
Professor's Name
Course
Date
Naïve Bayesian Classifier and K-Means Clustering
Part 1: Naïve Bayesian Classifier
1. Concept Explanation
The Naïve Bayesian Classifier is a probabilistic machine learning algorithm used for classification tasks. It is based on Bayes' theorem together with the assumption that features are conditionally independent of one another once the class label is known. Despite this simplifying assumption, the Naïve Bayesian Classifier performs effectively in applications such as spam detection, sentiment analysis, and medical diagnosis.
Assumptions:
1. Conditional Independence – Features are assumed to be independent of one another given the class label.
2. Equal Importance of Features – Each feature contributes equally to the classification.
3. Prior Probabilities Are Used – The model relies on prior knowledge (base rates of
classes).
Mathematically, Bayes' theorem is given by:

P(C|X) = P(X|C) P(C) / P(X)

Where:
P(C|X) is the posterior probability of class C given feature set X.
P(X|C) is the likelihood of feature set X given class C.
P(C) is the prior probability of class C.
P(X) is the marginal probability of feature set X.
For multiple features X = (X1, X2, ..., Xn), the Naïve Bayes assumption simplifies the posterior to:

P(C|X) = P(C) · P(X1|C) · P(X2|C) · … · P(Xn|C) / P(X)
This allows for efficient computation in classification problems.
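As a minimal sketch of how this factorization is used in practice, the unnormalized posterior for each class can be computed as the prior times the per-feature likelihoods. The probability values below are illustrative and anticipate the spam example worked out later in this paper; since P(X) is identical for every class, comparing these unnormalized scores is enough to classify.

# Minimal sketch: unnormalized Naive Bayes posterior for one class.
# prior = P(C); likelihoods = [P(X1|C), ..., P(Xn|C)] (illustrative values below).
def unnormalized_posterior(prior, likelihoods):
    score = prior
    for p in likelihoods:
        score *= p
    return score

spam_score = unnormalized_posterior(0.6, [0.8, 0.6, 0.6, 0.2])    # ≈ 0.0346
ham_score = unnormalized_posterior(0.4, [0.25, 0.5, 0.75, 0.75])  # ≈ 0.0281
print("Spam" if spam_score > ham_score else "Not Spam")           # Spam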
2. Example with Explanation
Application: Spam Email Detection
Spam detection involves categorizing emails into spam and valid messages (ham). The
primary purpose is to develop a predictive model for identifying spam emails based on word
frequency patterns and additional characteristics.
Classification Objective
The goal is to determine the probability of an email being spam given a set of observed
words. This is achieved using the Naïve Bayes classifier, which assumes that the presence of
each word in the email is independent of the others, given the class label.
3. Sample Problem & Solution
Dataset
Consider a small dataset of emails with the presence (1) or absence (0) of specific keywords:
Email ID   "Free"   "Win"   "Money"   "Offer"   Spam (1 = Yes, 0 = No)
1          1        1       0         1         1
2          0        1       1         0         0
3          1        1       1         1         1
4          0        0       1         0         0
5          1        0       1         1         1
We classify a new email with: ("Free"=1, "Win"=1, "Money"=1, "Offer"=0).
Step-by-Step Calculation using Bayes' Theorem

Calculate Priors:
P(Spam) = 3/5 = 0.6
P(Not Spam) = 2/5 = 0.4

Calculate Likelihoods:
Counting directly from the table gives several zero likelihoods (for example, every spam email has "Offer" = 1, so P(Offer=0|Spam) = 0/3, and no legitimate email contains "Free", so P(Free=1|Not Spam) = 0/2), which would force both posteriors to zero. Laplace (add-one) smoothing avoids this: add 1 to each count and 2 to each class total. This is also the default behaviour of the BernoulliNB classifier used below.

For Spam (3 emails):
P(Free=1|Spam) = (3+1)/(3+2) = 0.80
P(Win=1|Spam) = (2+1)/(3+2) = 0.60
P(Money=1|Spam) = (2+1)/(3+2) = 0.60
P(Offer=0|Spam) = (0+1)/(3+2) = 0.20

For Not Spam (2 emails):
P(Free=1|Not Spam) = (0+1)/(2+2) = 0.25
P(Win=1|Not Spam) = (1+1)/(2+2) = 0.50
P(Money=1|Not Spam) = (2+1)/(2+2) = 0.75
P(Offer=0|Not Spam) = (2+1)/(2+2) = 0.75

Compute Posteriors (P(X) is common to both classes, so it can be dropped):
P(Spam|X) ∝ 0.6 × (0.80 × 0.60 × 0.60 × 0.20) ≈ 0.0346
P(Not Spam|X) ∝ 0.4 × (0.25 × 0.50 × 0.75 × 0.75) ≈ 0.0281

Since P(Spam|X) > P(Not Spam|X), the email is classified as Spam.
Python Code for Naïve Bayes Implementation
from sklearn.naive_bayes import BernoulliNB
import numpy as np
# Training dataset: rows are emails, columns are the binary features
# ("Free", "Win", "Money", "Offer")
X_train = np.array([[1, 1, 0, 1], [0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 1, 0], [1, 0, 1, 1]])
y_train = np.array([1, 0, 1, 0, 1])  # 1 = Spam, 0 = Not Spam

# New email sample: "Free" = 1, "Win" = 1, "Money" = 1, "Offer" = 0
X_test = np.array([[1, 1, 1, 0]])

# Model training (BernoulliNB applies Laplace smoothing with alpha=1 by default)
nb_model = BernoulliNB()
nb_model.fit(X_train, y_train)

# Prediction
prediction = nb_model.predict(X_test)
print("Prediction:", "Spam" if prediction[0] == 1 else "Not Spam")
Output:
Prediction: Spam
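As a quick sanity check, the model's predict_proba output should agree with the hand-computed smoothed posteriors once they are normalized to sum to one:

# Class probabilities for the test email, ordered [Not Spam, Spam];
# 0.0281 and 0.0346 normalized give roughly [0.45, 0.55]
print(nb_model.predict_proba(X_test))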
Part 2: K-Means Clustering
1. Concept Explanation
Clustering is an unsupervised machine learning technique that groups data points according to their shared features. Because clustering algorithms work without predefined categories, they detect natural groupings in the data. Clustering serves multiple purposes, including market segmentation, anomaly detection, image processing, and biological data analysis.
Definition of K-Means Clustering
K-Means Clustering partitions a dataset into K clusters, assigning each data point to the cluster whose centroid is nearest. It is commonly used in marketing to segment customers based on spending behavior and income levels, which allows businesses to target specific customer groups with personalized promotions.
K-Means follows three main steps:
1. Centroid Selection:
o Randomly select K initial centroids from the dataset.
2. Cluster Assignment:
o Assign each data point to the nearest centroid based on the Euclidean distance.
3. Centroid Updating:
o Compute the new centroid of each cluster as the mean of all points assigned to it.
o Repeat the assignment and update steps until the centroids no longer change significantly (convergence); a short NumPy sketch of one such iteration follows the formulas below.
The centroid of a cluster is mathematically represented as:
Ck = (1/n) Σ xi  (sum over the n points xi in cluster k)

where:
Ck is the centroid of cluster k,
xi represents the data points in cluster k,
n is the number of points in the cluster.
The Euclidean distance used for assigning points to clusters is:

d(x, Ck) = √( Σ (xj − Ckj)² )  (sum over features j = 1, …, m)

where:
x is a data point,
Ck is the cluster centroid,
m is the number of features.
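The following NumPy sketch puts the two formulas together for a single iteration. The three points and centroids are taken from the customer example below; the code is an illustration of one assignment-and-update step, not a full K-Means implementation.

import numpy as np

# Three sample points and the three initial centroids from the example below
X = np.array([[15.0, 39.0], [16.0, 81.0], [17.0, 6.0]])
centroids = np.array([[15.0, 39.0], [24.0, 94.0], [40.0, 8.0]])

# Cluster assignment: Euclidean distance from every point to every centroid
distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = np.argmin(distances, axis=1)  # index of the nearest centroid per point

# Centroid update: mean of the points assigned to each cluster
# (a cluster that received no points keeps its previous centroid)
new_centroids = np.array([
    X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
    for k in range(len(centroids))
])
print(labels)          # [0 1 2]
print(new_centroids)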
2. Customer Segmentation
The marketing industry uses K-Means Clustering to divide customers by their purchasing behavior and financial capability. Businesses use the resulting segments to deliver advertisements tailored to distinct customer groups.
3. Sample Problem & Solution
Customer ID   Annual Income ($1000s)   Spending Score (1-100)
1             15                       39
2             16                       81
3             17                       6
4             18                       77
5             20                       40
6             24                       94
7             25                       3
8             30                       73
9             35                       92
10            40                       8
Step 1: Initial Centroid Selection
Randomly selecting K=3 centroids:
C1 (Low Income, Moderate Spending): (15, 39)
C2 (Middle Income, High Spending): (24, 94)
C3 (High Income, Low Spending): (40, 8)
Step 2: Cluster Assignment (Iteration 1)
Compute the Euclidean distance from each customer to the three centroids, and assign the customer to the nearest one.

Example Calculation for Customer 1 (15, 39)
Distance to C1 (15, 39):
d1 = √((15 − 15)² + (39 − 39)²) = 0
Distance to C2 (24, 94):
d2 = √((15 − 24)² + (39 − 94)²) ≈ 55.7
Distance to C3 (40, 8):
d3 = √((15 − 40)² + (39 − 8)²) ≈ 39.8
Since d1 is the smallest, Customer 1 is assigned to Cluster 1.

Repeating this for all customers gives Cluster 1 = {1, 5}, Cluster 2 = {2, 4, 6, 8, 9}, and Cluster 3 = {3, 7, 10}.
Step 3: Centroid Update
Example for Cluster 1 (Customers 1 and 5):
New centroid:
C1 = ((15 + 20)/2, (39 + 40)/2) = (17.5, 39.5)
After updating all three centroids in this way, reassigning the points leaves every cluster membership unchanged, so the algorithm has converged.
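These hand computations are easy to double-check with NumPy; the two print statements below reproduce the distance and centroid values above:

import numpy as np

# Distances from Customer 1 (15, 39) to the three initial centroids
centroids = np.array([[15, 39], [24, 94], [40, 8]])
print(np.linalg.norm(np.array([15, 39]) - centroids, axis=1))  # ≈ [ 0.  55.7  39.8]

# Updated centroid of Cluster 1 (Customers 1 and 5)
print(np.array([[15, 39], [20, 40]]).mean(axis=0))  # [17.5 39.5]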
Final Cluster Assignments

Customer ID   Annual Income ($1000s)   Spending Score (1-100)   Final Cluster
1             15                       39                       1
2             16                       81                       2
3             17                       6                        3
4             18                       77                       2
5             20                       40                       1
6             24                       94                       2
7             25                       3                        3
8             30                       73                       2
9             35                       92                       2
10            40                       8                        3
Python Code for K-Means Implementation

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Dataset: Customer Income & Spending Score
X = np.array([
    [15, 39], [16, 81], [17, 6], [18, 77], [20, 40],
    [24, 94], [25, 3], [30, 73], [35, 92], [40, 8]
])

# Apply K-Means Clustering (n_init restarts guard against poor initial centroids)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X)
# Cluster assignments
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
# Plot the clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', marker='o', edgecolor='k')
plt.scatter(centroids[:, 0], centroids[:, 1], s=300, c='red', marker='X', label='Centroids')
plt.xlabel('Annual Income ($1000s)')
plt.ylabel('Spending Score (1-100)')
plt.title('K-Means Customer Segmentation')
plt.legend()
plt.show()
# Print cluster assignments
print("Final Cluster Assignments:", labels)
Output
[Figure: scatter plot of the ten customers colored by cluster, with the three centroids marked by red X symbols, titled "K-Means Customer Segmentation". The script also prints the final cluster assignments.]