Optimizing Spam Filtering With Machine Learning

The document discusses optimizing spam filtering through machine learning. It describes how early spam filters worked by watching for specific words but were ineffective. More advanced Bayesian and heuristic filters now learn user preferences to identify spam by word patterns and frequencies. Machine learning methods use example-based learning to classify incoming emails based on similarity to stored spam examples. The goal is to develop more reliable anti-spam filters as spam volume surges.

Uploaded by

Pavin Pavin

1 INTRODUCTION

1.1 Overview

A spam filter is a program used to detect unsolicited,
unwanted and virus-infected emails and prevent those
messages from getting to a user's inbox. Like other types
of filtering programs, a spam filter looks for specific
criteria on which to base its judgments.

Internet service providers (ISPs), free online email
services and businesses use email spam filtering tools
to minimize the risk of distributing spam.
For example, one of the simplest and earliest versions of
spam filtering, like the one that was used by Microsoft's
Hotmail, was set to watch out for particular words in the
subject lines of messages. An email was excluded from
the user's inbox whenever the filter recognized one of
the specified words.

This method is not especially effective: it often blocks
perfectly legitimate messages (false positives) while
letting actual spam messages through (false negatives).
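The weakness of word matching can be seen in a minimal sketch. The blocklist and the sample subject lines below are invented for illustration and are not taken from any real filter.

```python
# Hypothetical blocklist; the words are illustrative assumptions only.
BLOCKED_WORDS = {"free", "winner", "prize"}


def is_blocked(subject: str) -> bool:
    """Return True if any blocked word appears in the subject line."""
    words = {w.strip(".,!?:;").lower() for w in subject.split()}
    return not BLOCKED_WORDS.isdisjoint(words)


# Spam that the filter catches.
print(is_blocked("You are a WINNER!"))
# A legitimate message that is blocked anyway (false positive).
print(is_blocked("Free parking for the team offsite"))
# Spam that avoids the listed words and gets through (false negative).
print(is_blocked("Cheap meds, no prescription"))
```

A fixed word list cannot tell the context in which a word is used, which is why the later approaches learn from examples instead.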

More sophisticated programs, such as Bayesian filters
and other heuristic filters, identify spam messages by
recognizing suspicious word patterns or word frequencies.
They do this by learning the user's preferences from
the emails the user marks as spam. The spam software then
creates rules and applies them to future emails that
target the user's inbox.
For example, whenever users mark emails from a specific
sender as spam, the filter recognizes the pattern
and automatically moves future emails from that sender to
the spam folder.
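To make the word-frequency idea concrete, here is a small naive Bayes sketch in plain Python. It is an illustration under simplifying assumptions (whitespace tokenization, Laplace smoothing, a four-message training set invented for the example), not the filter used by any particular product.

```python
import math
from collections import Counter


def train(messages):
    """Count words per class. messages is a list of (text, label) pairs,
    where label is 'spam' or 'ham' (ham = legitimate mail)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the higher log-probability score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior of the class.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training examples, standing in for mail the user has marked.
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training)
print(classify("free prize inside", model))
print(classify("agenda for the meeting", model))
```

Each time the user marks another message, the counts change, so the same code adapts to that user's mail without anyone writing new rules by hand.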

ISPs apply spam filters to both inbound and outbound emails.
However, small to medium enterprises usually focus on
inbound filters to protect their network. There are also many
different spam filtering solutions available. They can be
hosted in the cloud, hosted on servers or integrated into
email software, such as Microsoft Outlook.

In machine learning, some spam filtering approaches use
instance-based (memory-based) learning methods to identify
and classify incoming spam emails based on their resemblance
to stored training examples of spam emails.
The upsurge in the volume of unwanted emails, called
spam, has created an intense need for the development
of more dependable and robust antispam filters.
Machine learning methods have recently been used to
successfully detect and filter spam emails. We present
a systematic review of some of the popular machine
learning based email spam filtering approaches.
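Instance-based classification, as described at the start of this passage, can be sketched as a k-nearest-neighbour vote. The similarity measure (Jaccard overlap of word sets), the value of k and the stored examples are all assumptions chosen for illustration.

```python
def tokens(text):
    """Represent a message as the set of its lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two messages have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_classify(text, examples, k=3):
    """Label a message by majority vote of its k most similar stored examples."""
    query = tokens(text)
    ranked = sorted(
        examples,
        key=lambda ex: jaccard(query, tokens(ex[0])),
        reverse=True,
    )
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Invented stored examples; a real filter would keep many more.
stored = [
    ("win a free prize now", "spam"),
    ("claim your free prize today", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
    ("project status report attached", "ham"),
]
print(knn_classify("claim a free prize", stored))
print(knn_classify("monday team meeting", stored))
```

No model is built in advance: all the work happens at classification time, by comparing the new message against what is stored in memory.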

1.2 Purpose

An email spam filter is a necessity for every individual
and organization that uses email on a regular basis.
An average person receives roughly 100-120 emails a day,
of which about 80% are spam. Keeping communications
flowing smoothly therefore requires a reliable email
spam filter.
2 PROBLEM DEFINITION & DESIGN THINKING
2.1 Empathy Map
2.2 Ideation & Brainstorming Map
3 RESULT
4 ADVANTAGES & DISADVANTAGES

Advantages

Security:
Of all the emails an individual receives throughout
the day, the possibility of a phishing attack or cyber threat
is never zero. An email spam filter reduces this security
risk, because the user receives only emails that have
passed various spam checks. Moreover, email spam filters
discard malicious, malware-carrying and virus-infected
emails, protecting the user.

Time-Saving:
Recall the emailing statistics from Section 1.2. Having to
pick out the 20% of important emails from the average 80%
of clutter is time-consuming.
This is of even greater concern when these statistics are
applied to an organization’s email communications. By
separating the important emails from the junk and sending
the junk to the spam folder, an email spam filter saves the
user time and keeps business communications going by
streamlining the user's inbox.

Increased Productivity:
Along with saving time, email spam filters increase
the user's productivity by keeping unwanted emails away.
With some types of email spam filters, users can set
their own criteria for what counts as spam. By keeping
away emails that might distract employees or waste their
time, these filters keep employees' inboxes clean and
support increased productivity.
Disadvantages

The biggest disadvantage of using an email filter is that
legitimate messages may be identified as spam through
a mistake of the algorithm used.
According to Steven Scott, a Bayesian specialist, even with
the very best spam filters on the market you can still end
up with messages being improperly labeled.

While missing out on important emails is a nuisance, we
need to remember that you can also miss those same emails
if you receive a lot of spam. How can you see
that message from the boss among hundreds of emails
arriving every single day? You can be highly attentive and
still miss some emails.
5 APPLICATIONS

Spam sent to a private email account can cause havoc throughout
the system. It has created many problems in business
life, such as occupying network bandwidth and the space
in users’ mailboxes. Research has been conducted in this
area to resolve this issue, and spam detection systems (SDS)
have been developed to monitor spammers and filter
email activity by identifying patterns in email messages,
thus improving the ability to detect spam.

Both the knowledge filtering and the guideline filtering
strategies are used to detect spam. Both have advantages
and disadvantages, but neither is effective against all threats.
The guideline detection method works well for identifying
known spam patterns but not new ones. In comparison,
the knowledge detection strategy is effective at finding new
messages, but it has a low detection rate and a high percentage
of false positives. As such, our study introduces a new method.
Most investigations into spam detection in the literature have
focused on the knowledge detection strategy since it seemed
more promising.

Spam filters are applied to both inbound email (email entering
the network) and outbound email (email leaving the network).
ISPs use both methods to protect their customers. SMBs typically
focus on inbound filters.

There are many spam filtering solutions available. They can be
hosted in the “cloud,” on computer servers, or integrated into
email software such as Microsoft Outlook.
6 CONCLUSION

Reviewing the results against the hypothesis, the design
of a meta spam filter makes sense and is well grounded.
Although the approach deals with existing spam filters
and an existing e-mail corpus, the methodology described
above can also be applied to additional filters.
Studies of Bayesian networks have provided a sound basis for
the creation of a meta spam filter.
7 FUTURE SCOPE

This work proposes a model for improving recognition
of malicious spam in email. Our model will employ a
novel dataset intended for the process of feature selection,
and then validate the set of chosen features using three
classifiers established in spam detection: Support Vector
Machine, Naïve Bayes, and Multilayer Perceptron. Feature
selection is expected to improve training time as well as
accuracy for the classifiers.
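The text does not say which feature selection method is intended. As one possible illustration, the sketch below ranks words with the chi-square statistic and keeps the top k; the statistic, the tiny corpus and the function names are assumptions, and the three classifiers are not shown.

```python
from collections import Counter


def chi_square_scores(docs):
    """Score each word by how strongly its presence depends on the label.
    docs is a list of (text, label) pairs with label 'spam' or 'ham'."""
    n = len(docs)
    n_spam = sum(1 for _, label in docs if label == "spam")
    n_ham = n - n_spam
    spam_df, ham_df = Counter(), Counter()  # document frequencies per class
    for text, label in docs:
        words = set(text.lower().split())
        (spam_df if label == "spam" else ham_df).update(words)
    scores = {}
    for word in set(spam_df) | set(ham_df):
        a, b = spam_df[word], ham_df[word]   # docs containing the word
        c, d = n_spam - a, n_ham - b         # docs without the word
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[word] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = chi_square_scores(docs)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Invented corpus for illustration only.
docs = [
    ("free prize now", "spam"),
    ("free money now", "spam"),
    ("free offer today", "spam"),
    ("meeting today", "ham"),
    ("project meeting notes", "ham"),
    ("lunch today", "ham"),
]
print(select_features(docs, 3))
```

Words such as "today", which occur in both classes, score low and are dropped, so each classifier trains on fewer and more informative features.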
8 APPENDIX

A. Source Code

2 Data Collection & Preparation

2.1 Collect The Dataset

2.2 Importing The Libraries
2.3 Read The Dataset
2.4 Data Preparation

2.5 Handling Missing Values

Rename The Column
2.6 Handling Categorical Values
2.7 Handling Imbalanced Data
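The code for this step is not reproduced in the text. One common way to handle class imbalance is random oversampling of the minority class; the sketch below assumes that technique and uses invented rows, so it may differ from what the project actually did.

```python
import random
from collections import Counter


def oversample(rows, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest one.
    rows is a list of (text, label) pairs."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Invented, deliberately imbalanced data: 5 ham rows and 2 spam rows.
rows = [("ham message %d" % i, "ham") for i in range(5)]
rows += [("spam message %d" % i, "spam") for i in range(2)]
balanced = oversample(rows)
print(Counter(label for _, label in balanced))
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the test set.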
2.8 Cleaning The Text Data
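The code for this step is not reproduced in the text. A typical cleaning pass lowercases the message, strips digits and punctuation, and removes very common words; the stop-word list and the function name below are assumptions made for this sketch.

```python
import re

# Small illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "for", "you", "your"}


def clean_text(text: str) -> str:
    """Lowercase, keep letters only, and drop stop words."""
    text = text.lower()
    # Replace every character that is not a letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)


print(clean_text("WIN a FREE prize!!! Call 0800-123 to claim your reward."))
# prints: win free prize call claim reward
```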
3 Exploratory Data Analysis

3.1 Descriptive Statistics

3.2 Visual Analysis
3.3 Univariate Analysis
4 Model Building

4.1 Training The Model In Multiple Algorithms

Splitting data into train and test
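The code for this step is not reproduced in the text. The sketch below shows one way to hold out part of the data for testing; the 80/20 ratio, the fixed seed and the function name `split_rows` are assumptions.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = [("message %d" % i, "spam" if i % 4 == 0 else "ham") for i in range(10)]
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
# prints: 8 2
```

Keeping the test rows out of training is what makes the later accuracy figures an estimate of performance on unseen mail.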

4.2 Compare The Model
5 Performance Testing & Hyperparameter Tuning

5.1 Testing Model With Multiple Evaluation Metrics
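The metrics used in this step are not listed in the text. Accuracy, precision, recall and F1 are the usual choices for spam classification, so the sketch below computes those from true and predicted labels; the label vectors are invented.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented labels: one spam message is missed, one ham message is flagged.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
print(evaluate(y_true, y_pred))
```

Precision drops when legitimate mail is flagged (false positives) and recall drops when spam slips through (false negatives), so the two together describe the trade-off discussed in Section 4.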

5.2 Compare The Model

5.3 Comparing Model Accuracy Before & After Applying

Hyperparameter Tuning
6 Model Deployment

6.1 Save The Best Model
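The code for this step is not reproduced in the text. A common way to persist a trained Python model is the standard `pickle` module; the sketch below assumes that approach, and the dictionary is only a stand-in for a real trained model.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model (hypothetical contents).
model = {"vocab": ["free", "prize", "meeting"], "threshold": 0.5}

# Hypothetical file name, written to the system temp directory.
path = os.path.join(tempfile.gettempdir(), "spam_model.pkl")

# Save the model to disk.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back, as the web application would do at start-up.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
# prints: True
```

Pickle files can execute code when loaded, so only files created by the project itself should be loaded this way.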

6.2 Integrate With Web Framework

6.3 Building Html Pages

6.4 Build Python Code
6.5 Run The Web Application
