© July 2022| IJIRT | Volume 9 Issue 2 | ISSN: 2349-6002
Paper on Spam Email Detection with Classification Using Machine Learning
Naresh Vinod Wankhade¹, Dr. Ranjit R. Keole², Prof. Tushar R. Mahore³
¹ME (Computer Science and Engineering) Second Year, Dr.RGIT&R, Amravati, India
²Head of the Department, Information Technology, HVPM’s CET, Amravati, India
³Head of the Department, Computer Science & Engineering, DRGIT&R, Amravati, India
Abstract- The increasing volume of unsolicited bulk e-mail (also known as spam) has generated a
need for reliable anti-spam filters. Machine learning techniques are nowadays used to filter spam
e-mail automatically, with a very high success rate. In this paper we review some of the most
popular machine learning methods (Bayesian classification, k-NN, ANNs, SVMs, artificial immune
systems and rough sets) and their applicability to the problem of spam e-mail classification.
Descriptions of the algorithms are presented, along with a comparison of their performance on the
SpamAssassin spam corpus. Electronic mail has eased communication for many organizations as well
as individuals. Spammers exploit it for fraudulent gain by sending unsolicited emails. This
article aims to present a method for detecting spam emails with machine learning algorithms that
are optimized with bio-inspired methods. A literature review is carried out to explore the
efficient methods applied on different datasets to achieve good results. Extensive research was
done to implement machine learning models using Naïve Bayes, Support Vector Machine, Random
Forest, Decision Tree and Multi-Layer Perceptron on seven different email datasets, along with
feature extraction and pre-processing. Bio-inspired algorithms such as Particle Swarm Optimization
and Genetic Algorithm were implemented to optimize the performance of the classifiers. Multinomial
Naïve Bayes with Genetic Algorithm performed the best overall. The comparison of our results with
other machine learning and bio-inspired models, to show the most suitable model, is also discussed.

Index Terms- ANN, Data Extraction, IP Filtration, Machine Learning, URL

I. INTRODUCTION

Recently, unsolicited commercial/bulk e-mail, also known as spam, has become a big problem on the
internet. Spam is a waste of time, storage space and communication bandwidth. The problem of spam
e-mail has been increasing for years. According to recent statistics, 40% of all emails are spam,
which amounts to about 15.4 billion emails per day and costs internet users about $355 million per
year. Automatic e-mail filtering seems to be the most effective method for countering spam at the
moment, and a tight competition between spammers and spam-filtering methods is going on. Only
several years ago, most spam could be reliably dealt with by blocking e-mails coming from certain
addresses or filtering out messages with certain subject lines. Spammers then began to use several
tricky methods to overcome these filters, such as using random sender addresses and/or appending
random characters to the beginning or the end of the message subject line [11]. Knowledge
engineering and machine learning are the two general approaches used in e-mail filtering. In the
knowledge engineering approach, a set of rules has to be specified according to which emails are
categorized as spam or ham. Such a set of rules should be created either by the user of the
filter or by some other authority (e.g., the software company that provides a particular
rule-based spam-filtering tool). This method does not show promising results, because the rules
must be constantly updated and maintained, which is a waste of time and is not convenient for most
users. The machine learning approach is more efficient than the knowledge engineering approach; it
does not require specifying any rules [4]. Instead, it uses a set of training samples, which are
pre-classified e-mail messages. A specific algorithm is then used to learn the classification
rules from these e-mail messages. The machine learning approach has been widely studied, and there
are many algorithms that can be used in e-mail filtering. They include Naïve Bayes, support vector
machines, Neural
Networks, K-nearest neighbor, rough sets and the artificial immune system.
The proposed system will help to enhance user security by checking email in advance. The
evolutionary mechanism first checks the content of the mail, which is passed through various
machine learning techniques. The proposed methodology also performs various checks on the links,
which helps to enhance security. It will handle cyber security attacks by stopping them at entry.

2. EXISTING SYSTEM & ALGORITHM

There are some research works that apply machine learning methods to e-mail classification.
Muhammad N. Marsono, M. Watheq El-Kharashi and Fayez Gebali [2] demonstrated that naïve Bayes
e-mail content classification could be adapted for layer-3 processing, without the need for
reassembly. Suggestions on redetecting e-mail packets on spam control middle boxes to support
timely spam detection at receiving e-mail servers were presented. M. N. Marsono, M. W.
El-Kharashi and F. Gebali [1] presented a hardware architecture of a naïve Bayes inference engine
for spam control using two-class e-mail classification. It can classify more than 117 million
features per second given a stream of probabilities as inputs. This work can be extended to
investigate proactive spam handling schemes on receiving e-mail servers and spam throttling on
network gateways. Y. Tang, S. Krasser, Y. He, W. Yang and D. Alperovitch [3] proposed a system
that uses an SVM for classification. The system extracts email sender behavior data based on
global sending distribution, analyzes it and assigns a trust value to each IP address sending
email messages. The experimental results show that the SVM classifier is effective, accurate and
much faster than the Random Forests (RF) classifier. Yoo, S., Yang, Y., Lin, F., and Moon [11]
developed a personalized email prioritization (PEP) method that focuses on the analysis of
personal social networks to capture user groups and to obtain rich features that represent social
roles from the viewpoint of a particular user. They also developed a supervised classification
framework for modeling personal priorities over email messages and for predicting importance
levels for new messages. Guzella, Mota-Santos, J.Q. Uch, and W.M. Caminhas [4] proposed an
immune-inspired model, named innate and adaptive artificial immune system (IA-AIS), and applied it
to the problem of identifying unsolicited bulk e-mail messages (SPAM).

3. PROPOSED METHODOLOGY

As observed above, it is necessary to propose a mechanism that cross-verifies mail content by
filtering both the content and the links of a shared email. Spam mails most probably contain a
malicious link, so URL classification or parsing needs to be worked out. The proposed system
therefore analyzes the URL data as well as the mail content.

[Flow chart: Start -> Retrieve Email -> Apply Content Filter -> Mail Content / Extracted URL ->
Data Extraction, Parsing and Extraction, Content Analysis -> Apply Sentimental Analysis ->
Is Malicious Content? (Yes: Mark Spam Mail; No) -> Finish]

Fig. 1 - Flow chart of proposed methodology for spam email detection

The diagram above represents the flow chart of the proposed methodology, in which mails are
given as input to the system. Content extraction is performed on the mail content, followed by a
step that breaks it into links and data. The mail is then filtered in various respects, such as
content filtration, which counts the malicious words and reports them appropriately. First, the
link and data classification is worked out; then the data is processed with sentimental analysis,
in which the various keywords are compared and evaluated. Next comes the IP check, in which the
sender's email id is retrieved and evaluated. This process is followed by result evaluation. At
the end, the spam email detection is concluded.

Architecture

Fig. 2 - Execution of spam email detection

The diagram above represents the architecture of spam email execution, in which the first step is
content filtration, URL extraction and separation of the data. After the link-based evaluation is
done, the content of the mail is compared with existing keywords and IPs, so that spam email
detection can be carried out.

Implementation

The spam email detection system has the following stages in order to detect spam email. Admin has
access to all contents in the spam email detection system. A new user has to register using a
username, email id, contact details and a password chosen by the user. After registering, the user
can access his account by logging in with the email id as username and the password. A registered
user (Sender) can send mail to another registered user (Receiver) by selecting the appropriate
email id. At the receiver end, each mail has to go through the stages discussed in the system
design. The implementation of all stages is as follows.

Steps in Spam Email Detection System

Admin Login
Admin has access to all contents in the spam email detection system and can make certain changes.
The following screenshot shows the login window for admin.

New Registration Window
A new user has to register using a username, email id, contact details and a password chosen by
the user. The user has to remember all credentials in order to access the account under user
login. In this case, the email id shall be used as the username. The screenshot for New
Registration is shown below.

User Login Window
A registered user has to use the email id as username to log into the account. Once login is done,
the registered user (Sender) can send mail to another registered user (Receiver) using the
appropriate email id.

Spam Email Detection System
All registered users can be accessed by admin. Admin can make certain changes to spam mail and
users, can remove spam and can add training. However, the fields are confined to username, contact
number, email id (which is later used as username for login) and password. In this system one user
can have the same username with different email ids. In this case, the email id acts as a primary
key. Duplication of email ids is strictly restricted in the Spam Email Detection System.

Spam Detection Mail Window
After logging into the user account, registered users can make certain changes to spam mail and
users, can remove spam and can add training. However, the fields are confined to username, contact
number, email id (which is later used as username for login) and password. In this system one user
can have the same username with different email ids. In this case, the email id acts as a primary
key. After receiving mail, the user can check the mail body; if he finds an inappropriate word,
then he can
report that mail as spam, or otherwise as non-spam. Under this mechanism, the system performs
actions like Mail Splitting, Content Extraction, URL Filtration and IP Extraction to detect spam.

Result Analysis
We pass certain email content to both the existing and the proposed system, in which all mails
that are passed are non-spam; the detections made by each system are shown below.

[Chart: "Precision Existing Vs Proposed"; vertical axis from 0 to 0.3; category: Recall;
series: By Existing, Proposed]

Fig. 3 - Precision Existing Vs Proposed System

4. CONCLUSION

Spam mail classification is a major area of concern these days, as it helps in the detection of
unwanted emails and threats. Nowadays, most researchers are working in this area in order to find
the best classifier for detecting spam mails. A filter with high accuracy is therefore required to
filter unwanted or spam mails. In this paper we focused on finding the best classifier for spam
mail classification using data mining techniques. We applied various classification algorithms to
the given input data set and checked the results. From this study we find that a classifier works
well when a feature selection approach is embedded in the classification process; that is, the
accuracy improves drastically when classifiers are applied to the reduced data set instead of the
entire data set. Since the proposed spam classification is done on all parameters, such as IP,
previous history, and the content of the shared URL and data, the proposed mechanism will help
considerably to improve spam mail detection.

REFERENCES

[1] Shukor Bin Abd Razak, Ahmad Fahrulrazie Bin Mohamad, “Identification of Spam Email Based on
Information from Email Header”, 13th International Conference on Intelligent Systems Design and
Applications (ISDA), 2013.
[2] Mohammed Reza Parsei, Mohammed Salehi, “E-Mail Spam Detection Based on Part of Speech
Tagging”, 2nd International Conference on Knowledge Based Engineering and Innovation (KBEI), 2015.
[3] Sunil B. Rathod, Tareek M. Pattewar, “Content Based Spam Detection in Email using Bayesian
Classifier”, presented at the IEEE ICCSP 2015 conference.
[4] Aakash Atul Alurkar, Sourabh Bharat Ranade, Shreeya Vijay Joshi, Siddhesh Sanjay Ranade,
Piyush A. Sonewa, Parikshit N. Mahalle, Arvind V. Deshpande, “A Proposed Data Science Approach for
Email Spam Classification using Machine Learning Techniques”, 2017.
[5] Kriti Agarwal, Tarun Kumar, “Email Spam Detection using integrated approach of Naïve Bayes and
Particle Swarm Optimization”, Proceedings of the Second International Conference on Intelligent
Computing and Control Systems (ICICCS), 2018.
[6] Cihan Varol, Hezha M. Tareq Abdulhadi, “Comparison of String-Matching Algorithms on Spam Email
Detection”, International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism,
Dec 2018.
[7] Duan, Lixin, Dong Xu, and Ivor Wai-Hung Tsang. "Domain adaptation from multiple sources: A
domain dependent regularization approach." IEEE Transactions on Neural Networks and Learning
Systems 23.3 (2012).
[8] Anitha, PU & Rao, Chakunta & T. Sireesha. (2013). A Survey On: E-mail Spam Messages and
Bayesian Approach for Spam Filtering. International Journal of Advanced Engineering and Global
Technology (IJAEGT). 1. 124-136.
[9] Attenberg, J., Weinberger, K., Dasgupta, A., Smola, A., & Zinkevich, M. (2009, July).
Collaborative email-spam filtering with the hashing trick. In Proceedings of the Sixth Conference
on Email and Anti-Spam.
[10] Awad, W. A., & ELseuofi, S. M. (2011). Machine learning methods for spam e-mail classification.
International Journal of Computer Science &
Information Technology (IJCSIT), 3(1), 173-184.
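
As a supplementary illustration of the receiver-side stages named under Implementation (Mail Splitting, Content Extraction, URL Filtration and IP Extraction), the following is a minimal Python sketch of how such checks might be chained. It is not the authors' implementation. The keyword list, the blocked hosts and IP addresses, the threshold of two keyword hits, the rule that any single failed check marks the mail as spam, and all function names are assumptions made for this example; the sentimental analysis step is not covered.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import urlsplit

# Hypothetical lists for illustration only; the paper does not publish its own.
SPAM_KEYWORDS = {"lottery", "winner", "free", "prize", "urgent"}
BLOCKED_HOSTS = {"malicious.example"}
BLOCKED_IPS = {"203.0.113.7"}

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[a-z']+")


@dataclass
class MailReport:
    urls: list = field(default_factory=list)
    keyword_hits: int = 0
    bad_url: bool = False
    bad_ip: bool = False
    is_spam: bool = False


def split_mail(body):
    """Mail splitting: separate the URLs from the remaining text."""
    urls = URL_PATTERN.findall(body)
    text = URL_PATTERN.sub(" ", body)
    return text, urls


def count_keyword_hits(text):
    """Content filtration: count words that appear in the keyword list."""
    words = WORD_PATTERN.findall(text.lower())
    return sum(1 for word in words if word in SPAM_KEYWORDS)


def host_of(url):
    """URL filtration helper: return the lower-cased host part of a URL."""
    return (urlsplit(url).hostname or "").rstrip(".")


def check_mail(body, sender_ip, keyword_threshold=2):
    """Run the content, URL and IP checks and combine them with a simple OR rule."""
    text, urls = split_mail(body)
    hits = count_keyword_hits(text)
    bad_url = any(host_of(url) in BLOCKED_HOSTS for url in urls)
    bad_ip = sender_ip in BLOCKED_IPS
    is_spam = bad_url or bad_ip or hits >= keyword_threshold
    return MailReport(urls, hits, bad_url, bad_ip, is_spam)
```

Calling `check_mail(body, sender_ip)` returns a `MailReport` whose `is_spam` field holds the final decision, while the other fields record which check triggered it. A trained classifier such as Naïve Bayes could replace the keyword count without changing the surrounding structure.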
IJIRT 156181 INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN TECHNOLOGY 1059