
Optimizing Spam Filtering With Machine Learning

The document discusses optimizing spam filtering through machine learning. It describes how early spam filters worked by watching for specific words but were ineffective. More advanced Bayesian and heuristic filters now learn user preferences to identify spam by word patterns and frequencies. Machine learning methods use example-based learning to classify incoming emails based on similarity to stored spam examples. The goal is to develop more reliable anti-spam filters as spam volume surges.

Uploaded by

Pavin Pavin
Copyright © All Rights Reserved

Optimizing Spam Filtering With Machine Learning


1 INTRODUCTION

1.1 Overview

A spam filter is a program used to detect unsolicited, unwanted and virus-infected emails and to prevent those messages from getting to a user's inbox. Like other types of filtering programs, a spam filter looks for specific criteria on which to base its judgments.

Internet service providers (ISPs), free online email services and businesses use email spam filtering tools to minimize the risk of distributing spam. For example, one of the simplest and earliest versions of spam filtering, such as the one used by Microsoft's Hotmail, watched for particular words in the subject lines of messages. An email was excluded from the user's inbox whenever the filter recognized one of the specified words.

This method is not especially effective: it often blocks perfectly legitimate messages (false positives) while letting actual spam messages through (false negatives).
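
The early subject-line filter described above can be sketched in a few lines of Python. This is only an illustration: `BLOCKED_WORDS` and `is_spam_by_subject` are hypothetical names, and the blocked words are examples, not the list Hotmail actually used. The usage lines show both weaknesses at once: a legitimate message is blocked and a spam message gets through.

```python
# Hypothetical blocklist: illustrative words, not the list any real filter used.
BLOCKED_WORDS = {"free", "winner", "prize", "lottery"}


def is_spam_by_subject(subject: str, blocked_words: set[str] = BLOCKED_WORDS) -> bool:
    """Return True if any blocked word appears in the subject line."""
    # Lowercase each word and strip surrounding punctuation so "WINNER!" matches "winner".
    words = {w.strip(".,!?:;'\"").lower() for w in subject.split()}
    return not words.isdisjoint(blocked_words)


if __name__ == "__main__":
    print(is_spam_by_subject("You are a WINNER! Claim your prize"))  # True: spam caught
    print(is_spam_by_subject("Free lunch in the cafeteria today"))   # True: false positive
    print(is_spam_by_subject("Cheap meds, no prescription"))         # False: spam slips through
```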

More sophisticated programs, such as Bayesian filters and other heuristic filters, identify spam messages by recognizing suspicious word patterns or word frequencies. They do this by learning the user's preferences from the emails the user marks as spam. The spam software then creates rules and applies them to future emails that target the user's inbox.
For example, whenever users mark emails from a specific sender as spam, the Bayesian filter recognizes the pattern and automatically moves future emails from that sender to the spam folder.
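
To make the word-frequency idea concrete, here is a minimal multinomial naive Bayes filter written with only the Python standard library. It is a sketch under simplifying assumptions (whitespace tokenization, add-one smoothing, at least one training message per class), and `NaiveBayesSpamFilter` is a hypothetical name, not the implementation of any particular product. Each call to `train` stands in for the user marking a message as spam or keeping it as legitimate ("ham"); the training messages are invented.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over whitespace-separated words, with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record one labeled message; `label` must be "spam" or "ham"."""
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Return P(spam | text); assumes at least one message of each class was trained."""
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # log P(label) + sum of log P(word | label); logs avoid underflow on long emails.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Normalize the two log scores into a probability (softmax over two classes).
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


if __name__ == "__main__":
    nb = NaiveBayesSpamFilter()
    nb.train("win a free prize now", "spam")
    nb.train("claim your free lottery prize", "spam")
    nb.train("meeting moved to monday", "ham")
    nb.train("please review the project report", "ham")
    print(nb.spam_probability("free prize inside") > 0.5)          # True
    print(nb.spam_probability("project meeting on monday") < 0.5)  # True
```

Add-one smoothing keeps a single unseen word from forcing a class probability to zero.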

ISPs apply spam filters to both inbound and outbound emails. However, small to medium enterprises usually focus on inbound filters to protect their network. There are also many different spam filtering solutions available. They can be hosted in the cloud, hosted on servers or integrated into email software, such as Microsoft Outlook.

In machine learning, some spam filtering approaches use instance-based, or memory-based, learning methods to identify and classify incoming emails based on their resemblance to stored training examples of spam and legitimate emails.
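
Instance-based learning can be illustrated with a k-nearest-neighbors classifier: nothing is fitted in advance, and each new email is compared directly with the stored, labeled examples. The sketch below assumes bag-of-words vectors and cosine similarity; `bag_of_words`, `cosine_similarity` and `knn_classify` are hypothetical names, and the six training emails are made up.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Represent a message as word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text: str, examples: list[tuple[str, str]], k: int = 3) -> str:
    """Label `text` by majority vote among the k most similar stored examples."""
    query = bag_of_words(text)
    # Memory-based learning: every stored example is compared at prediction time.
    ranked = sorted(examples, key=lambda ex: cosine_similarity(query, bag_of_words(ex[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    examples = [
        ("win a free prize now", "spam"),
        ("cheap loans approved instantly", "spam"),
        ("free prize claim today", "spam"),
        ("lunch meeting moved to friday", "ham"),
        ("draft of the project report attached", "ham"),
        ("project meeting notes", "ham"),
    ]
    print(knn_classify("claim your free prize", examples))           # spam
    print(knn_classify("notes from the project meeting", examples))  # ham
```

Because every stored example is compared at query time, prediction cost grows with the size of the training set, which is the main practical trade-off of memory-based methods.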
The surge in the volume of unwanted emails, called spam, has created an intense need for more dependable and robust anti-spam filters. Recently, machine learning methods have been used to successfully detect and filter spam emails. We present a systematic review of some of the popular machine-learning-based email spam filtering approaches.

1.2 Purpose

An email spam filter is a necessity for every individual and organization that uses email on a regular basis. An average person receives roughly 100-120 emails a day, of which an average of 80% are spam. At its very root, keeping your communications flowing smoothly requires a reliable email spam filter.
2 PROBLEM DEFINITION & DESIGN THINKING
2.1 Empathy Map
2.2 Ideation & Brainstorming Map
3 RESULT
4 ADVANTAGES & DISADVANTAGES

Advantages

Security:
Of all the emails an individual receives throughout the day, the possibility that one of them is a phishing attack or other cyber threat is never zero. Email spam filters reduce this security risk, since the user receives only emails that have passed various spam checks. Moreover, these filters screen out malicious and virus-infected emails, including those carrying malware, and so protect the user's security.

Time-Saving:
Recall the email statistics from Section 1.2 (Purpose). Picking the 20% of important emails out of the 80% clutter by hand is time-consuming.
This becomes an even greater concern when these statistics are applied to an organization's email communications. By keeping important emails in the inbox and sending junk emails to the spam folder, an email spam filter saves the user time and keeps business communications going by streamlining the inbox.

Increased Productivity:

Along the lines of the time-saving benefit, email spam filters increase the user's productivity by keeping unwanted emails away. In certain cases, users can also set up their own criteria for separating spam from legitimate email. By keeping away emails that might distract employees or waste their time, spam filters keep employees' inboxes clean and help increase productivity.
Disadvantage

The biggest disadvantage of using an email filter is that legitimate messages may be identified as spam because of a mistake by the algorithm that is used. According to Steven Scott, a Bayesian specialist, even the very best spam filters on the market can still label some messages improperly.

While missing important emails because of a filter is a nuisance, keep in mind that you can also miss the same emails without a filter if you receive a lot of spam. How can you spot that message from your boss among hundreds of emails arriving every single day? Even a highly attentive reader can miss some of them.
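
The mislabeling discussed in this section can be quantified. The helper below is a hypothetical, standard-library-only sketch: it counts true/false positives and negatives (label 1 = spam, 0 = legitimate) and reports the false positive rate (legitimate mail sent to spam) next to the false negative rate (spam reaching the inbox), precision and recall. The labels in the usage lines are invented for illustration.

```python
def error_rates(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Error rates for a spam filter; label 1 = spam, 0 = legitimate (ham)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        # Share of legitimate emails wrongly moved to the spam folder.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # Share of spam emails that still reach the inbox.
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
        # Of the emails flagged as spam, how many really were spam.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the real spam, how much was caught.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Invented example labels: 4 spam and 6 legitimate emails.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    for name, value in error_rates(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```
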
5 APPLICATIONS

Spam sent to private email accounts can cause havoc throughout a system. Nowadays, it creates many problems in business life, such as consuming network bandwidth and space in users' mailboxes. Research in this area has produced spam detection systems (SDS) that monitor spammers and filter email activity by identifying patterns in email messages, thus improving spam detection.

Both the knowledge filtering and the guideline filtering strategies are used to detect spam. Each has advantages and disadvantages, but neither is effective against all threats. The guideline detection method works well for identifying recognized communications but not spam. In comparison, the knowledge detection strategy is effective at finding new messages, but it has a low detection rate and a high percentage of false positives. As such, our study introduces a new method. Most investigations into spam detection in the literature have focused on the knowledge detection strategy because it seemed more promising.

Spam filters are applied to both inbound email (email entering the network) and outbound email (email leaving the network). ISPs use both methods to protect their customers. Small and medium-sized businesses (SMBs) typically focus on inbound filters.

There are many spam filtering solutions available. They can be hosted in the “cloud,” on computer servers, or integrated into email software such as Microsoft Outlook.
6 CONCLUSION

Reviewing the results of the hypothesis, it can be said that the design of a meta spam filter makes sense and is well grounded. Although the concept was developed with existing spam filters and an existing e-mail corpus, the methodology described above can also be applied to additional filters. Studies of Bayesian networks have provided a good basis for the creation of a meta spam filter.
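
The conclusion does not say how a meta spam filter combines the filters it builds on. One simple possibility, assumed here purely for illustration, is a weighted average of each base filter's spam score compared against a threshold; the Bayesian-network approach mentioned above is not shown. `meta_filter`, `keyword_score` and `link_score` are hypothetical names, and the weights are arbitrary.

```python
from typing import Callable

# A base filter maps an email's text to a spam score in [0, 1].
BaseFilter = Callable[[str], float]


def meta_filter(text: str, filters: list[tuple[BaseFilter, float]], threshold: float = 0.5) -> bool:
    """Combine several spam filters into one decision via a weighted average of their scores."""
    total_weight = sum(weight for _, weight in filters)
    combined = sum(weight * f(text) for f, weight in filters) / total_weight
    return combined >= threshold


def keyword_score(text: str) -> float:
    """Hypothetical base filter: 1.0 if a suspicious keyword appears, else 0.0."""
    return 1.0 if "prize" in text.lower() else 0.0


def link_score(text: str) -> float:
    """Hypothetical base filter: more links means more suspicious, capped at 1.0."""
    return min(text.lower().count("http") / 3, 1.0)


if __name__ == "__main__":
    filters = [(keyword_score, 2.0), (link_score, 1.0)]
    print(meta_filter("Claim your prize at http://a http://b", filters))  # True
    print(meta_filter("See the agenda at http://intranet", filters))     # False
```

In practice the weights would be learned from labeled data rather than set by hand.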
7 FUTURE SCOPE

This work proposes a model for improving the recognition of malicious spam in email. Our model will use a novel dataset intended for feature selection, and will then validate the set of selected features using three classifiers commonly used in spam detection: Support Vector Machine, Naïve Bayes, and Multilayer Perceptron. Feature selection is expected to improve both training time and accuracy for the classifiers.
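
The future-scope paragraph does not name a feature-selection criterion. As an assumption, the sketch below scores each word with the chi-square statistic between word presence and the spam label, a common choice for text classification, and keeps the top `k` words; the selected vocabulary could then be passed to the three classifiers (not shown). `chi_square_scores`, `select_top_k` and the toy corpus are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs: list[str], labels: list[int]) -> dict[str, float]:
    """Chi-square statistic between each word's presence and the label (1 = spam, 0 = ham)."""
    n = len(docs)
    n_spam = sum(labels)
    doc_sets = [set(d.lower().split()) for d in docs]
    # Number of documents containing each word, overall and within spam.
    df_all = Counter(w for s in doc_sets for w in s)
    df_spam = Counter(w for s, y in zip(doc_sets, labels) if y == 1 for w in s)
    scores = {}
    for word, with_word in df_all.items():
        a = df_spam[word]          # spam docs containing the word
        b = with_word - a          # ham docs containing the word
        c = n_spam - a             # spam docs without the word
        d = (n - n_spam) - b       # ham docs without the word
        # chi2 = N (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[word] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(docs: list[str], labels: list[int], k: int) -> list[str]:
    """Return the k words most associated with either class."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    docs = ["free prize now", "free money", "meeting today", "project meeting"]
    labels = [1, 1, 0, 0]
    print(select_top_k(docs, labels, 2))  # "free" and "meeting" score highest
```
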
8 APPENDIX

A. Source Code

2 Data Collection & Preparation

2.1 Collect The Dataset


2.2 Importing The Libraries
2.3 Read The Dataset
2.4 Data Preparation

2.5 Handling Missing Values


Rename The Column
2.6 Handling Categorical Values
2.7 Handling Imbalanced Data
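
The project's own code for this step is not included in this document. A common option, assumed here, is random oversampling: minority-class examples are duplicated at random until the classes are balanced. `random_oversample` is a hypothetical helper. Apply it to the training split only, so that duplicated emails do not leak into the test set.

```python
import random


def random_oversample(texts: list[str], labels: list[int], seed: int = 0) -> tuple[list[str], list[int]]:
    """Duplicate random minority-class examples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class: dict[int, list[str]] = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample with replacement to fill the gap between this class and the largest class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


if __name__ == "__main__":
    texts, labels = random_oversample(["h1", "h2", "h3", "h4", "s1"], [0, 0, 0, 0, 1])
    print(labels.count(0), labels.count(1))  # 4 4
```
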
2.8 Cleaning The Text Data
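
The original text-cleaning code is not reproduced here. A typical cleaning pass lowercases the text, removes non-letter characters and drops stop words; the sketch below does exactly that, using a deliberately tiny, hypothetical stop-word list and a hypothetical `clean_text` function.

```python
import re

# Tiny illustrative stop-word list; real projects usually use a much fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "your", "you"}


def clean_text(text: str) -> str:
    """Lowercase, keep only letters, drop stop words and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # replace digits, punctuation and symbols with spaces
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_text("WIN a FREE iPhone!!! Call 0800-123 now."))  # win free iphone call now
```
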
3 Exploratory Data Analysis

3.1 Descriptive Statistics


3.2 Visual Analysis
3.3 Univariate Analysis
4 Model Building

4.1 Training The Model In Multiple Algorithms

Splitting data into train and test
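
A hand-rolled split is shown as a stand-in for this step; `split_train_test` is a hypothetical name, and the 80/20 ratio and seed are assumptions. A fixed seed keeps the split reproducible. For imbalanced spam data, a stratified split (splitting each class separately) is often preferable.

```python
import random


def split_train_test(items: list, test_size: float = 0.2, seed: int = 42) -> tuple[list, list]:
    """Shuffle a copy of the data and hold out the last `test_size` fraction for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Items would normally be (text, label) pairs; integers keep the demo short.
    train, test = split_train_test(list(range(10)))
    print(len(train), len(test))  # 8 2
```
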


4.2 Compare The Model
5 Performance Testing & Hyperparameter Tuning

5.1 Testing Model With Multiple Evaluation Metrics

5.2 Compare The Model

5.3 Comparing Model Accuracy Before & After Applying Hyperparameter Tuning
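
One way to compare accuracy before and after tuning is an exhaustive grid search over candidate settings, scored on a validation set (not the test set, so the final test accuracy stays an honest estimate). The sketch assumes the tuned setting is the spam-probability decision threshold; `grid_search` is a generic hypothetical helper, and the probabilities and labels in the usage block are invented.

```python
from itertools import product
from typing import Callable


def grid_search(score_fn: Callable[..., float], param_grid: dict[str, list]) -> tuple[dict, float]:
    """Try every combination in `param_grid` and return the one with the highest score."""
    names = list(param_grid)
    best_params, best_score = {}, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)  # e.g. validation accuracy for these settings
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Invented validation data: model spam probabilities and true labels (1 = spam).
    probs = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
    labels = [1, 1, 1, 0, 1, 0, 0, 0]

    def accuracy(threshold: float) -> float:
        preds = [1 if p >= threshold else 0 for p in probs]
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    baseline = accuracy(threshold=0.5)
    best, tuned = grid_search(accuracy, {"threshold": [0.3, 0.5, 0.6, 0.7]})
    print(f"before: {baseline:.3f}  after: {tuned:.3f}  best: {best}")
    # before: 0.750  after: 0.875  best: {'threshold': 0.6}
```
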
6 Model Deployment

6.1 Save The Best Model
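
Saving the chosen model lets the web application load it without retraining. A minimal sketch with the standard `pickle` module follows; `save_model`, `load_model` and the file name are hypothetical. Only unpickle files from trusted sources, since loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model: object, path: Path) -> None:
    """Serialize any picklable model object to `path`."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: Path) -> object:
    """Load a model saved by save_model; only use with files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Stand-in "model": any picklable object (e.g. a trained classifier) works the same way.
    best_model = {"name": "naive_bayes", "threshold": 0.6}
    model_path = Path(tempfile.mkdtemp()) / "spam_model.pkl"
    save_model(best_model, model_path)
    print(load_model(model_path) == best_model)  # True
```
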

6.2 Integrate With Web Framework

6.3 Building HTML Pages


6.4 Build Python Code
6.5 Run The Web Application
