Intelligent phishing website detection
system using fuzzy techniques.
Item Type
Conference paper
Authors
Aburrous, Maher R.; Hossain, M. Alamgir; Thabatah, F.; Dahal,
Keshav P.
Citation
Aburrous, M., Hossain, M. A., Thabatah, F. and Dahal, K. P.
(2008). Intelligent phishing website detection system using fuzzy
techniques. In: Proceedings of the 3rd International Conference
on Information & Communication Technologies: From Theory to
Applications (ICCTA'08). New York: IEEE.
Publisher
IEEE
Rights
Copyright © [2008] IEEE. Reprinted from the Proceedings of
the International Conference on Information & Communication
Technologies: From Theory to Applications (ICCTA'08).
This material is posted here with permission of the IEEE.
Such permission of the IEEE does not in any way imply IEEE
endorsement of any of the University of Bradford's products or
services. Internal or personal use of this material is permitted.
However, permission to reprint/republish this material for
advertising or promotional purposes or for creating new
collective works for resale or redistribution must be obtained
from the IEEE by writing to pubspermissions@ ieee.org. By
choosing to view this document, you agree to all provisions of the
copyright laws protecting it.
Download date
17/06/2020 18:38:00
Link to Item
http://hdl.handle.net/10454/2640
The University of Bradford Institutional
Repository
http://bradscholars.brad.ac.uk
This work is made available online in accordance with publisher policies. Please
refer to the repository record for this item and our Policy Document available from
the repository home page for further information.
To see the final version of this work please visit the publisher’s website. Where
available access to the published online version may require a subscription.
Author(s): Aburrous, M., Hossain, M. A., Thabatah, F. and Dahal, K. P.
Title: Intelligent phishing website detection system using fuzzy techniques.
Publication year: 2008
Conference title: International Conference on Information & Communication
Technologies (ICCTA'08).
Link to original publication:
http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=4529902&isYear=2008
Citation: Aburrous, M., Hossain, M. A., Thabatah, F. and Dahal, K. P. (2008).
Intelligent phishing website detection system using fuzzy techniques. In:
Proceedings of the 3rd International Conference on Information &
Communication Technologies: From Theory to Applications (ICCTA'08). New
York: IEEE.
Copyright statement: Copyright © [2008] IEEE. Reprinted from the Proceedings
of the International Conference on Information & Communication Technologies:
From Theory to Applications (ICCTA'08). This material is posted here with
permission of the IEEE. Such permission of the IEEE does not in any way imply
IEEE endorsement of any of the University of Bradford's products or services.
Internal or personal use of this material is permitted. However, permission to
reprint/republish this material for advertising or promotional purposes or for
creating new collective works for resale or redistribution must be obtained from
the IEEE by writing to pubspermissions@ ieee.org. By choosing to view this
document, you agree to all provisions of the copyright laws protecting it.
Intelligent Phishing Website Detection System using
Fuzzy Techniques
Maher Aburrous
M. A. Hossain
Dept. of Computing
University of Bradford
Bradford, UK
mrmaburr@bradford.ac.uk
Fadi Thabatah
Dept. of Computing
University of Bradford
Bradford, UK
m.a.hossain1@bradford.ac.uk
MIS Department
Philadelphia University
Amman, Jordan
ffayez@philadelphia.edu.jo
instant messaging site, which will take the user to hostile
phishing websites [10].
Phishing website is a very complicate and complex issue to
understand and to analyze, since it is joining technical and
social problem with each other for which there is no known
single silver bullet to solve it entirely. The motivation behind
my study is to create a resilient and effective method that uses
fuzzy logic to quantify and qualify all the website phishing
characteristics and factors in order to detect phishing websites
to assess whether phishing activity is taking place or not.
The paper is organized as follows: Section 2 presents the
literature review and related work and Section 3 shows the
theory and methodology of the website phishing risk
assessment model with its system design and implementation.
Section 4 reveals the experiments and results of the fuzzy
phishing website risk assessment model and then conclusions
and future work are given in Section 5.
Abstract- Phishing websites are forged web pages that are
created by malicious people to mimic web pages of real websites
and it attempts to defraud people of their personal information.
Detecting and identifying Phishing websites is really a complex
and dynamic problem involving many factors and criteria, and
because of the subjective considerations and the ambiguities
involved in the detection, Fuzzy Logic model can be an effective
tool in assessing and identifying phishing websites than any other
traditional tool since it offers a more natural way of dealing with
quality factors rather than exact values. In this paper, we present
novel approach to overcome the ‘fuzziness’ in traditional website
phishing risk assessment and propose an intelligent resilient and
effective model for detecting phishing websites. The proposed
model is based on FL operators which is used to characterize the
website phishing factors and indicators as fuzzy variables and
produces six measures and criteria’s of website phishing attack
dimensions with a layer structure. Our experimental results
showed the significance and importance of the phishing website
criteria (URL & Domain Identity) represented by layer one, and
the variety influence of the phishing characteristic layers on the
final phishing website rate.
II. LITERATURE REVIEW AND RELATED WORK
A. Literature Review
Phishing website is a recent problem, nevertheless due to
its huge impact on the financial and on-line retailing sectors
and since preventing such attacks is an important step towards
defending against website phishing attacks, there are several
promising approaches to this problem and a comprehensive
collection of related works. In this section, we briefly survey
existing anti-phishing solutions and list of the related works.
One approach is to stop phishing at the email level [3], since
most current phishing attacks use broadcast email (spam) to
lure victims to a phishing website [21]. Another approach is to
use security toolbars. The phishing filter in IE7 [19] is a
toolbar approach with more features such as blocking the
user’s activity with a detected phishing site. A third approach
is to visually differentiate the phishing sites from the spoofed
legitimate sites. Dynamic Security Skins [5] proposes to use a
randomly generated visual hash to customize the browser
window or web form elements to indicate the successfully
authenticated sites. A fourth approach is two-factor
authentication, which ensures that the user not only knows a
secret but also presents a security token [6]. However, this
approach is a server-side solution. Phishing can still happen at
sites that do not support two-factor authentication. Sensitive
information that is not related to a specific site, e.g., credit
Keywords- Phishing; Fuzzy Logic; risk assessment; phishing
website criteria
I.
Keshav Dahal
Dept. of Computing,
University of Bradford
Bradford, UK
k.p.dahal@bradford.ac.uk
INTRODUCTION
PHISHING websites are forged web pages that are created
by malicious people to mimic web pages of real websites.
Most of these kinds of Web pages have high visual similarities
to scam their victims. Some of these kinds of Web pages look
exactly like the real ones. Unwary Internet users may be easily
deceived by this kind of scam. Victims of phishing Web pages
may expose their bank account, password, credit card number,
or other important information to the phishing Web page
owners. The impact is the breach of information security
through the compromise of confidential data and the victims
may finally suffer losses of money or other kinds. Phishing is
a relatively new Internet crime in comparison with other
forms, e.g., virus and hacking. More and more phishing Web
pages have been found in recent years in an accelerative way
[7]. The word phishing from the phrase “website phishing” is
a variation on the word “fishing.” The idea is that bait is
thrown out with the hopes that a user will grab it and bite into
it just like the fish. In most cases, bait is either an e-mail or an
1
card information and SSN, cannot be protected by this
approach either [22].
Many industrial antiphishing products use toolbars in Web
browsers, but some researchers have shown that security tool
bars don’t effectively prevent phishing attacks. In [5] Rachna
Dhamija and Doug Tygar proposed a scheme that uses a
cryptographic identity-verification method that lets remote
Web servers prove their identities. However, the proposal
requires changes to the entire Web infrastructure (both servers
and clients), so it can succeed only if the entire industry
supports it. Reference [13] also proposed a tool to model and
describes phishing by visualizing and quantifying a given
site’s threat, but this method still wouldn’t provide an
antiphishing solution. Another approach is using certification,
e.g., (microsoft.com/mscorp/twc/privacy/spam), [14], [15],
[17], [1]. A recent and particularly promising solution [8]
proposes to combine the technique of standard certificates
with a visual indication of correct certification; a sitedependent logo indicating that the certificate was valid, would
be displayed in a trusted credentials area of the browser. A
variant of web credential is to use a database or list published
by a trusted party, where known phishing web sites are
blacklisted. For example Netcraft antiphishing toolbar
http://toolbar.netcraft.com/ prevents phishing attacks by using
a centralized blacklist of current phishing URLs. Other
Examples include Websense, McAfee’s anti–phishing filter,
Netcraft anti-phishing system, Cloudmark SafetyBar,
Microsoft Phishing Filter [16]. The weakness of this approach
is its poor scalability and its timeliness. Note that phishing
sites are cheap and easy to build and their average lifetime is
only a few days.
APWG provides a solution directory at (Anti-Phishing
Working Group) [2] which contains most of the major
antiphishing companies in the world. However, an automatic
antiphishing method is seldom reported. The typical
technologies of antiphishing from the User Interface aspect are
done by [5] and [22]. They proposed methods that need Web
page creators to follow certain rules to create Web pages,
either by adding dynamic skin to Web pages or adding
sensitive information location attributes to HTML code.
However, it is difficult to convince all Web page creators to
follow the rules [7]. In [12], [7], [13], the DOM-based [20]
visual similarity of Web pages is oriented, and the concept of
visual approach to phishing detection was first introduced.
Through this approach, a phishing Web page can be detected
and reported in an automatic way rather than involving too
many human efforts. Their method first decomposes the Web
pages (in HTML) into salient (visually distinguishable) block
regions. The visual similarity between two Web pages is then
evaluated in three metrics: block level similarity, layout
similarity, and overall style similarity, which are based on the
matching of the salient block regions [7].
for a phisher to defraud people is to make the phishing Web
pages similar to their targets.
Actually, there are many characteristics and factors that can
distinguish the original legitimate website from the forged
faked phishing website like Spelling errors, Long URL
address and Abnormal DNS record [16]. The full list is shown
in table I which will be used later on our analysis and
methodology study.
C. Why using Fuzzy Logic?
Fuzzy logic has been used for decades in the engineering
sciences to embed expert input into computer models for a
broad range of applications. It offers a promising alternative
for measuring operational risks [18]. The fuzzy logic approach
provides more information to help risk managers effectively
manage assessing and ranking website phishing risks than the
current qualitative approaches as the risks are quantified based
on a combination of historical data and expert input. The
advantage of the fuzzy approach is that it enables processing
of vaguely defined variables, and variables whose
relationships cannot be defined by mathematical relationships
[23]. Fuzzy logic can incorporate expert human judgment to
define those variable and their relationships.
III. THEORY & METHODOLOGY
A. The Phishing Website Risk Assessment Model
1) Fuzzification
The approach described here is to apply fuzzy logic
modeling to assess website phishing risk on the 27
characteristics and factors which stamp the forged website.
The essential advantage offered by fuzzy logic techniques is
the use of linguistic variables to represent Key Phishing
Characteristic Indicators and relating website phishing
probability. In this step, linguistic descriptors such as High,
Low, Medium, for example, are assigned to a range of values
for each Key Phishing Characteristic Indicators. Valid ranges
of the inputs are considered and divided into classes, or fuzzy
sets. For example, length of URL address can range from
‘low’ to ‘high’ with other values in between. We cannot
specify clear boundaries between classes. The degree of
belongingness of the values of the variables to any selected
class is called the degree of membership; Membership
function is designed for each Phishing characteristic indicator,
which is a curve that defines how each point in the input space
is mapped to a membership value (or degree of membership)
between [0, 1]. Linguistic values are assigned for each
Phishing indicator as Low, Moderate, and High while for
Phishing website risk rate as Very legitimate, Legitimate,
Suspicious, Phishy, and Very phishy (triangular and
trapezoidal membership function). For each input their values
ranges from 0 to 10 while for output, ranges from 0 to 100.
An example of the linguistic descriptors used to represent one
of the Key Phishing Characteristic Indicators (URL Address
B. Main Characteristics of phishing websites.
Evolving with the antiphishing techniques, various
phishing techniques and more complicated and hard-to-detect
methods are used by phishers. The most straightforward way
2
Long) and a plot of the fuzzy membership functions are
shown in figure 1. The fuzzy representation more closely
matches human cognition, thereby facilitating expert input and
more reliably representing experts’ understanding of
underlying dynamics [4].
number of components for each criterion, five components for
URL & Domain Identity, Source Code & Java script, Page
Style & Contents, Web Address Bar, four components for
Security & Encryption and three components for Social
Human Factor. Therefore, there are twenty seven
components in total.
There are three layers on this website phishing fuzzy model as
shown in figure 2. The first layer contains only URL &
Domain Identity criteria with a weight equal to 0.3 for its
importance; the second layer contains Security & Encryption
criteria and Source Code & Java script criteria with a weight
equal to 0.2 each; the third layer contains Page Style &
Contents criteria, Web Address Bar criteria And Social
Human Factor criteria with a weight equal to 0.1 each. The
six criteria have been prioritized according to their importance
using weights as concluded from the Website phishing
experiments, case studies, Anti phishing tools analysis, web
surveys, phishing quizzes, detailed questionnaire and phishing
expert’s feedback.
Figure 1. Input variable for Long URL Address component
The same approach is used to calibrate the other 26 Key
Phishing Characteristic Indicators.
2) Rule Evaluation.
Having specified the risk of website phishing and its Key
Phishing Characteristic Indicators, the logical next step is to
specify how the website phishing probability varies as a
function of the Key Phishing Characteristic Indicators. Experts
provide fuzzy rules in the form of if…then statements that
relate website phishing probability to various levels of Key
Phishing Characteristic Indicators based on their knowledge
and experience.
Website phishing experiments, Anti phishing tools analysis,
web surveys, phishing quizzes and detailed questionnaire to
assess factors, which collectively characterise the website
phishing. A detailed checklist table is based on the types of
phishing source and style, and weights assigned to them
according to their effectiveness and influence.
Using the IP Address
Abnormal Request URL
Abnormal URL of Anchor
URL & Domain Identity
Abnormal DNS record
Abnormal URL
Anomalous SSL certificate
Conflicting certification authority
Abnormal Cookie
Security & Encryption
Inconsistent Distinguished Names (DN)
Layer Two
Redirect pages
Straddling attack
Pharming Attack
Source Code Java script
Using onMouseOver to hide the Link
3) Aggregation of the rule outputs.
Server Form Handler (SFH)
Website Phishing
Rate
This is the process of unification of the outputs of all rules.
Combining the membership functions of all the rules
consequents previously scaled into single fuzzy sets (output).
4)
Spelling errors
Copying website
Using forms with “Submit ” button
Page Style & Contents
Using Pop-Ups windows
Defuzzification.
Disabling Right-Click
This is the process of transforming a fuzzy output of a
fuzzy inference system into a crisp output. Fuzziness helps to
evaluate the rules, but the final output this system has to be a
crisp number. The input for the defuzzification process is the
aggregate output fuzzy set and the output is a number. This
step was done using Centroid technique because it is most
commonly used method of defuzzification.
The output is website phishing risk rate and is defined in fuzzy
sets like ‘very phishy’ to ‘very legitimate’. The fuzzy output
set is then defuzzified to arrive at a scalar value.
Long URL address
Replacing similar characters for URL
Adding a prefix or suffix
Web Address Bar
Layer Three
Using the @ Symbol to Confuse
Using Hexadecimal Character Codes
Emphasis on security and response
Public generic salutation
Social Human Factor
Buying Time to Access Accounts
Structure of the fuzzy inference overall system to evaluate website phishing rate
Figure 2. Structure of the fuzzy inference overall system to evaluate website
phishing rate.
C. System Design
The design based on multi-level fuzzy approach for risk
analysis [23]. Website phishing detection rate is performed
based on six criteria: URL & Domain Identity, Security &
Encryption, Source Code & Java script, Page Style &
Contents, Web Address Bar And Social Human Factor as
shown in Table I, which also shows that there are different
Website Phishing Rating = 0.3 * URL & Domain Identity
crisp [First layer] + ((0.2 * Security & Encryption crisp)+(0.2
* Source Code & Java script crisp)) [Second layer] + ((0.1 *
Page Style & Contents crisp) +(0.1 * Web Address Bar crisp)
+ (0.1 * Social Human Factor crisp)) [Third layer]
3
Therefore, the rule base 1 contains (35) = 243 entries. The
output of rule base 1 is one of the website phishing rate fuzzy
sets (Genuine, Doubtful or Fraud) representing URL &
Domain Identity criteria phishing risk rate. A sample of the
structure and the entries of the rule base 1 for layer 1 are
shown in Table II. The system structure for URL & Domain
Identity criteria is the joining of its five components (Using
the IP Address, Abnormal Request URL, Abnormal URL of
Anchor, Abnormal DNS record and Abnormal URL), which
produces the URL & Domain Identity criteria (Layer one).
TABLE I. COMPONENTS AND LAYERS OF WEBSITE PHISHING
CRITERIA.
Criteria
Component
Layer No.
N
URL &
1 Using the IP Address
Domain
Layer One
2 Abnormal Request URL
Identity
3 Abnormal URL of Anchor
4 Abnormal DNS record
(Weight = 0.3)
Sub weight =
5 Abnormal URL
0.3
Security &
Encryption
(Weight = 0.2)
Source Code
& Java script
(Weight = 0.2)
1
2
3
4
1
2
3
4
5
Page Style &
Contents
(Weight =0.1)
Web Address
Bar
(Weight = 0.1)
Social Human
Factor
(Weight = 0.1)
1
2
3
4
5
1
2
3
4
5
1
2
3
Using SSL certificate
Certificationauthority
Abnormal Cookie
Distinguished Names
Certificate(DN)
Redirect pages
Straddling attack
Pharming Attack
Using onMouseOver to hide the
Link
Server Form Handler (SFH)
Spelling errors
Copying website
Using forms with “Submit” button
Using Pop-Ups windows
Disabling Right-Click
Long URL address
Replacing similar characters for
URL
Adding a prefix or suffix
Using the @ Symbol to Confuse
Using Hexadecimal Character
Codes
Much emphasis on security and
response
Public generic salutation
Buying Time to Access Accounts
Total Weight
Layer Two
2) The Rule Base for layer 2.
In Layer 2, there are two inputs, which are (Security &
Encryption and Source Code & Java script) and one output.
The system structure for Security & Encryption criteria is the
joining of its four components (Using SSL certificate,
Certification authority, Abnormal Cookie and Distinguished
Names Certificate(DN)) using Rule base 1, which produces
Security & Encryption criteria. The system structure for
Source Code & Java script criteria is the joining of its five
components (Redirect pages, Straddling attack, Pharming
Attack, Using onMouseOver to hide the Link and Server Form
Handler (SFH)) using Rule base 1, which produces Source
Code & Java script criteria. The structure and the entries of the
rule base for layer 2 are illustrated in Table III. The system
structure for layer 2 is the combination of two website
phishing criteria (Security & Encryption and Source Code &
Java script), which produces rule base 2. The rule base
contains (32) = 9 entries and the output of rule base 2 is one of
the website phishing rate fuzzy sets (Legal, Uncertain or Fake)
representing Layer Two criteria phishing risk rate.
Sub weight =
0.4
Layer Three
Sub weight =
0.3
1
TABLE III. THE RULE BASE 2 STRUCTURE AND ENTRIES FOR
LAYER TWO
Security &
Source Code &
Phishing Risk
Encryption
Java script
(Layer Two)
Rule
D. The Rule Base
1) The Rule Base1 for layer 1.
The rule base has five input parameters and one output and
contains all the “IF-THEN” rules of the system. For each entry
of the rule base, each component is assumed to be one of three
values and each criterion has five components.
1
2
3
4
5
6
7
8
9
Rule #
(comp. 1)
Using the IP
Address
(comp. 2)
Abnormal Req.
URL
(comp. 3)
Abnormal URL
Anchor
(comp. 4)
Abnormal DNS
record
(comp. 5)
Abnormal URL
URL & Domain
Identity Criteria
Phishing Risk
(Layer one)
TABLE II. SAMPLE OF THE RULE BASE 1 STRUCTURE AND
ENTRIES FOR URL & DOMAIN IDENTITY CRITERIA
1
2
3
4
5
6
7
8
9
10
Low
Low
Low
Low
Low
Low
Mod.
high
Low
high
Low
Low
Low
Low
Low
Mod.
Low
Mod.
high
Mod.
Low
Low
Low
Low
Mod.
Mod.
high
Low
Low
high
Low
Low
Mod.
Mod.
Mod.
Low
Mod.
Low
Low
high
Low
Mod.
Mod.
high
high
high
high
Low
Mod.
Low
Genuine
Genuine
Doubtful
Doubtful
Fraud
Fraud
Fraud
Doubtful
Doubtful
Fraud
Genuine
Genuine
Genuine
Doubtful
Doubtful
Doubtful
Fraud
Fraud
Fraud
Genuine
Doubtful
Fraud
Genuine
Doubtful
Fraud
Genuine
Doubtful
Fraud
Legal
Legal
Uncertain
Legal
Uncertain
Uncertain
Uncertain
Uncertain
Fake
3) The Rule Base for layer 3.
In Layer 3, there are three inputs, which are: the Page
Style & Contents, Web Address Bar and Social Human Factor
which is the output from layer 3, and one output. The system
structure for Page Style & Contents criteria is the joining of its
five components (Spelling errors, Copying website, Using
forms with “Submit” button, Using Pop-Ups windows and
Disabling Right-Click) using Rule base 1, which produces
Page Style & Contents criteria. The system structure for Web
Address Bar criteria is the joining of its five components
(Long URL address, Replacing similar characters for URL,
Adding a prefix or suffix, Using the @ Symbol to Confuse and
Using Hexadecimal Character Codes) using Rule base 1,
4
Rule
which produces Web Address Bar criteria. The system
structure for Social Human Factor criteria is the joining of its
three components (Much emphasis on security and response,
Public generic salutation and Buying Time to Access
Accounts) using Rule base 1, which produces Social Human
Factor criteria.
A sample of the structure and the entries of the rule base for
layer 3 are shown in Table IV. The system structure for layer 3
is the combination of Page Style & Contents, Web Address
Bar and Social Human Factor, which produces rule base 3.
The rule base contains (33) = 27 entries and the output of rule
base 3 is one of the website phishing rate fuzzy sets (Legal,
Uncertain or Fake) representing Layer Three criteria phishing
risk rate.
19
20
21
22
23
24
25
26
27
TABLE IV. THE RULE BASE 3 STRUCTURE AND ENTRIES FOR
LAYER THREE
Page Style
Web
Social Human
Phishing Risk
& Contents
Address Bar
Factor
(Layer Three)
1
2
3
4
5
6
7
8
9
Genuine
Genuine
Genuine
Doubtful
Doubtful
Doubtful
Fraud
Fraud
Fraud
Genuine
Doubtful
Fraud
Genuine
Doubtful
Fraud
Genuine
Doubtful
Fraud
Genuine
Fraud
Fraud
Genuine
Doubtful
Doubtful
Genuine
Doubtful
Fraud
Fraud
Fraud
Fraud
Fraud
Fraud
Fraud
Fraud
Fraud
Fraud
Legal
Legal
Legal
Uncertain
Uncertain
Uncertain
Fake
Fake
Fake
Legal
Uncertain
Fake
Legal
Uncertain
Fake
Legal
Uncertain
Fake
Suspicious
Suspicious
Phishy
Suspicious
Suspicious
Phishy
Phishy
Very Phishy
Very Phishy
Figure 3. Three-dimensional plots for final phishing rate
Legal
Uncertain
Fake
Legal
Uncertain
Uncertain
Legal
Uncertain
Fake
IV. EXPERIMENTS AND RESULTS
Clipping method [9] is used in aggregating the
consequences and the aggregated surface of the rule
evaluation is defuzzified using Mamdani method [11] to find
the Center Of Gravity (COG). Centroid defuzzification
technique shown in equation (1) can be expressed as where x*
is the defuzzified output, µi(x) is the aggregated membership
function and x is the output variable.
4) The Rule Base for final website phishing rate.
In the website phishing rule base last phase, there are three
inputs, which are: layer one, layer two and layer three, and one
output which is the rate of the phishing website. The structure
and the entries of the rule base for website phishing rate are
shown in Table V. The system structure for is the combination
of layer one, layer two and layer three, which produces final
website phishing rule base. The three dimensional plots of this
structure is shown in Figure 3 using MATLAB. The rule base
contains (33) = 27 entries and the output of final website
phishing rule base is one of the final output fuzzy sets (Very
Legitimate, Legitimate, Suspicious, Phishy or Very Phishy)
representing final phishing website rate.
------------------ (1)
The proposed intelligent Phishing website detection system
has been implemented in MATLAB 6.5. The results of some
input combinations are listed in Tables VI, VII and VIII.
The final phishing website risk rating will be balanced (50%)
representing a “suspicious website”, when the Layer one
(URL & Domain Identity) of the phishing website risk criteria
has 10 input values which indicate High phishing indicator
and all other layers have the value of zero inputs as shown in
Table VI. Same result can be made when all phishing website
risk criteria’s representing by the three layers have middle (5)
input values which indicate Mod. phishing indicator. These
results shows the significance and importance of the phishing
website criteria (URL & Domain Identity) represented by
layer one especially when compared to the other criteria’s and
layers. Table VII shows that when the Layer one and Layer
two of the phishing website risk criteria has middle (5) input
values which indicate Mod. phishing indicator and other third
Layer has the value of 10 input values which indicate High
phishing indicator, the final phishing website risk rating will
be reasonably high (65%) representing a “phishy website”,
which means that there is a Good guarantee that the website is
forged phishy website. This result clearly shows that even if
some of the website phishing characteristics or layers are not
very clear or not definite, the website can still be phishy and
forged and users should be aware when dealing with it
especially when other phishing characteristics or layers are
obvious and clear.
Rule
TABLE V. THE WEBSITE PHISHING RATE RULE BASE STRUCTURE
AND ENTRIES FOR FINAL PHISHING RATE
URL &
Layer Two
Layer
Final Website
Domain
Three
Phishing Rate
Identity
1
Genuine
Legal
Legal
Very Legitimate
2
Genuine
Legal
Uncertain
Legitimate
3
Genuine
Legal
Fake
Suspicious
4
Genuine
Uncertain
Legal
Suspicious
5
Genuine
Uncertain
Uncertain
Phishy
6
Genuine
Uncertain
Fake
Phishy
7
Genuine
Fake
Legal
Suspicious
8
Genuine
Fake
Uncertain
Phishy
9
Genuine
Fake
Fake
Very Phishy
10
Doubtful
Legal
Legal
Legitimate
11
Doubtful
Legal
Uncertain
Suspicious
12
Doubtful
Legal
Fake
Phishy
13
Doubtful
Uncertain
Legal
Suspicious
14
Doubtful
Uncertain
Uncertain
Suspicious
15
Doubtful
Uncertain
Fake
Phishy
16
Doubtful
Fake
Legal
Phishy
17
Doubtful
Fake
Uncertain
Phishy
18
Doubtful
Fake
Fake
Very Phishy
5
Comp
TABLE VI. FIVE HIGHEST (10) FOR LAYER ONE AND ALL OTHERS
LOWEST (0).
Layer
One
URL &
Domain
10
10
10
10
10
1
2
3
4
5
Layer Two
Security
Source
&
Code &
Encrypt
Java
0
0
0
0
0
0
0
0
0
Page
Style &
Contents
Layer Three
Web
Address
Bar
0
0
0
0
0
0
0
0
0
0
Social
Human
Factor
0
0
0
but it can be safe and secured especially when other phishing
characteristics or layers are not noticeable, visible or
detectable. As a future work we will propose and develop a
prototype intelligent website phishing detection system by
using Fuzzy Data Mining algorithms and techniques. The
approach will look for deviations from stored patterns of
normal phishing behavior and for previously described
patterns of behavior that is likely to indicate phishing.
%
Website
Phishing
Rating
50%
REFERENCES
Comp
TABLE VII. FIVE MIDDLE (5) INPUTS FOR LAYER ONE AND LAYER
TWO AND HIGHEST (10) INPUTS FOR LAYER THREE.
Layer
One
URL &
Domain
5
5
5
5
5
1
2
3
4
5
Layer Two
Security
Source
&
Code &
Encrypt
Java
5
5
5
5
5
5
5
5
5
Page
Style &
Contents
Layer Three
Web
Address
Bar
10
10
10
10
10
10
10
10
10
10
Social
Human
Factor
10
10
10
[1]
[2]
%
Website
Phishing
Rating
[3]
65%
[4]
Comp
TABLE VIII. FIVE MIDDLE (5) INPUTS FOR LAYER ONE AND ALL
OTHERS LOWEST (0) INPUTS.
1
2
3
4
5
Layer
One
URL &
Domain
5
5
5
5
5
Layer Two
Security
Source
&
Code &
Encrypt
Java
0
0
0
0
0
0
0
0
0
Layer Three
Page
Web
Style &
Address
Contents
Bar
0
0
0
0
0
0
0
0
0
0
Social
Human
Factor
0
0
0
[5]
%
Website
Phishing
Rating
[6]
[7]
35%
[8]
[9]
Table VIII show that when the Layer one of the phishing
website risk criteria (URL & Domain Identity) has middle (5)
input values which indicate Mod. phishing indicator and all
other Layers has the value of zero input values which indicate
Low phishing indicator, the final phishing website risk rating
will be reasonably low (35%) representing a “legitimate
website”, which means that there is Good guarantee that the
website is legitimate website. This result clearly shows that
even if some of the website phishing characteristics or layers
are noticed or observed, that does not mean at all that the
website is phishy or forged, but it can be safe and secured
especially when other phishing characteristics or layers are not
noticeable, visible or detectable. The results also indicates that
the worst website phishing rate (all three layers have 10 input
value) equals 86.2% representing “Very Phishy Website” and
the best website phishing rate (all three layers have 0 input
value) is 13.8% representing “Very Legitimate Website”
rather than a full range, i.e. 0 to 100, because of the
fuzzification process
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
V. CONCLUSION AND FUTURE WORK
[19]
The fuzzy website phishing model showed the significance
and importance of the phishing website criteria (URL &
Domain Identity) represented by layer one, and also showed
that even if some of the website phishing characteristics or
layers are not very clear or not definite, the website can still be
phishy especially when other phishing characteristics or layers
are obvious and clear. On the other hand even if some of the
website phishing characteristics or layers are noticed or
observed, that does not mean at all that the website is phishy,
[20]
[21]
[22]
[23]
6
WholeSecurity Web Caller-ID, www.wholesecurity.com.
Anti-Phishing Working Group. Phishing Activity Trends Report,
http://antiphishing.org/reports/apwg_report_DEC2005_FINAL.pdf,
December 2005.
B. Adida, S. Hohenberger and R. Rivest, “Lightweight Encryption for
Email,” USENIX Steps to Reducing Unwanted Traffic on the Internet
Workshop (SRUTI), 2005.
S. M. Bridges and R. B. Vaughn, “fuzzy data mining and genetic
algorithms applied to intrusion detection,” Department of Computer
Science Mississippi State University, White Paper, 2001.
R. Dhamija and J.D. Tygar, “The Battle against Phishing: Dynamic
Security Skins,” Proc. Symp. Usable Privacy and Security, 2005.
FDIC., “Putting an End to Account-Hijacking Identity Theft,”
http://www.fdic.gov/consumers/consumer/identity_theft.pdf, 2004.
A. Y. Fu, L. Wenyin and X. Deng, “ Detecting Phishing Web Pages with
Visual Similarity Assessment Based on Earth Mover’s Distance ,” IEEE
transactions on dependable and secure computing, pp. 301-311 vol. 3,
no. 4, 2006.
A. Herzberg and A. Gbara, “Protecting Naive Web Users,” Draft of July
18, 2004.
C. Y. Ho, B. W. Ling and J. D. Reiss, "Fuzzy Impulsive Control of
High-Order Interpolative Low-Pass Sigma–Delta Modulators," IEEE
Transactions on Circuits and Systems—I: Regular Papers, pp. 2224 2233 Vol. 53, No. 10, October 2006.
L. James, “Phishing Exposed,” Tech Target Article sponsored by:
Sunbelt software, searchexchange.com, 2006.
M. Liu, D. Chen and C. Wu. "The continuity of Mamdani method,"
International Conference on Machine Learning and Cybernetics,
Page(s): 1680 - 1682 vol.3, 2002.
W. Liu, G. Huang, X. Liu, M. Zhang, and X. Deng, “Phishing Web Page
Detection,” Proc. Eighth Int’l Conf. Documents Analysis and
Recognition, pp. 560-564, 2005.
W. Liu, X. Deng, G. Huang and A. Y. Fu, “An Antiphishing Strategy
Based on Visual Similarity Assessment,” Published by the IEEE
Computer Society 1089-7801/06, INTERNET COMPUTING IEEE, pp.
58-65, 2006.
Microsoft Corp, “Microsoft Phishing Filter: A New Approach to
Building Trust in E-Commerce Content,” White Paper, 2005.
S. Olsen, “AOL tests caller ID for e-mail,” CNET News.com, January
22, 2004.
Y. Pan and X. Ding, “Anomaly BasedWeb Phishing Page Detection,”
Proceedings of the 22nd Annual Computer Security Applications
Conference (ACSAC'06), Computer Society, pp. 381-392, 2006.
J. C. Perez, “Yahoo airs antispam initiative,” ComputerWeekly.com,
December 8, 2003.
S. Shah, “Measuring Operational Risks using Fuzzy Logic Modeling,
“Article, Towers Perrin, JULY 2003.
T. Sharif, “Phishing Filter in IE7,”
http://blogs.msdn.com/ie/archive/2005/09/09/463204.aspx, 2006.
L. Wood, “Document Object Model Level 1 Specification,”
http://www.w3.org, 2005.
M. Wu, R. C. Miller and S. L. Garfinkel , “Do Security Toolbars
Actually Prevent Phishing Attacks?,” CHI April 2006.
M. Wu, R. C. Miller and G. Little, “Web Wallet: Preventing Phishing
Attacks by Revealing User Intentions,” MIT Computer Science and
Artificial Intelligence Lab, 2006.
K.P. Dahal, Z. Hussain and M.A. Hossain," Loan risk analyzer based on
fuzzy logic," Proceedings of IEEE Int. Conference on
E-Technology & E-Commerce, IEEE Comp. Society Press, pp 363-366.