
Unsupervised Network Intrusion Detection Systems For Zero-Day Fast-Spreading Attacks and Botnets

This paper presents a novel unsupervised Network Intrusion Detection System (NIDS) designed to detect zero-day fast-spreading attacks and botnets in high-speed networks. It utilizes two engines: one for real-time detection of various attacks and another for identifying botmasters from DDoS traffic. The proposed system leverages unsupervised machine learning techniques, particularly clustering algorithms, to enhance detection rates without prior knowledge of attack signatures.

Article in International Journal of Digital Content Technology and its Applications · March 2016


All content following this page was uploaded by Payam Vahdani Amoli on 21 April 2016.



Unsupervised Network Intrusion Detection Systems for Zero-Day Fast-Spreading Attacks and Botnets
Payam Vahdani Amoli, Timo Hamalainen, Gil David, Mikhail Zolotukhin, Mahsa Mirzamohammad

Unsupervised Network Intrusion Detection Systems for Zero-Day Fast-Spreading Attacks and Botnets

Payam Vahdani Amoli 1, Timo Hamalainen 2, Gil David 3, Mikhail Zolotukhin 4, Mahsa Mirzamohammad 5
1 (Corresponding Author), 2, 3, 4 Department of Mathematical Information Technology,
Faculty of Information Technology, Jyväskylä University, Jyväskylä, Finland
5 Department of Computer Science and Information Systems,
Faculty of Information Technology, Jyväskylä University, Jyväskylä, Finland
pavahdan@jyu.fi, timo.t.hamalainen@jyu.fi, gil.david@jyu.fi, mikhail.m.zolotukhin@jyu.fi,
mamirzam@student.jyu.fi

Abstract
Zero-day attacks in high-speed networks are increasingly common. Because most network intrusions
damage the network by spreading fast, real-time monitoring, processing and intrusion detection are
now among the key features of a NIDS. Unsupervised machine-learning techniques are commonly applied
in NIDS to detect unknown and complex attacks in the network. This paper proposes a novel
unsupervised NIDS for high-speed networks, which detects network intrusions without any prior
knowledge via two separate engines. The first engine detects fast-spreading DOS, probe and DDOS
attacks (e.g. POD, SMURF, Mail-bomb, SSH-process-table, UDP Storm, port scanning, network scanning)
in real time to prevent the paralysis of both the network and the victims. The second engine looks
for a potential internal botnet (bots or botmaster) whenever the monitored network is flooded with
DDOS attack traffic, in order to prevent internal DDOS attacks in the future.

Keywords: NIDS, Unsupervised, Clustering, Real Time, Botnet

1. Introduction

Networks are exposed to an increasing number of security threats; a Network Intrusion
Detection System (NIDS) has therefore become a necessary supplement to every large network
infrastructure. A NIDS monitors and analyzes the behavior of networks to detect unauthorized activity.
Typically, when the NIDS detects an intrusion it sends a report to the administrator; it also
informs the other nodes inside the network so that they terminate their sessions with the attacker(s).
Generally, a NIDS uses one of two methods to detect intrusions in the network: signature-based or
anomaly-based. A signature-based NIDS raises an alarm if a pre-defined pattern (signature) of an
attack matches the current behavior of the network. Signatures for known attacks must be prepared
by security experts, which is both costly and time-consuming. Signature-based NIDS have high
detection rates for well-known attacks; however, they fail to detect known intrusions with small
variations to their signatures. In addition, signature-based methods are not able to detect
zero-day attacks. In anomaly-based methods, a NIDS needs to observe network data to become trained
or adapted to the normal (common) and abnormal behavior of the network and nodes. After the
observation phase, if the behavior of the network exceeds the threshold, the NIDS considers those
actions abnormal and sends a report to the network administrator. The ability to detect zero-day
attacks is an advantage of anomaly-based NIDS; however, finding the best method of training and
dealing with a high rate of false alarms are the main issues of this method [1], [2], [3].
In general the NIDS detects an intrusion by inspecting bytes, packets or network flows. “A flow is
defined as a set of IP packets passing an observation point in the network during a certain time interval.
All packets belonging to a particular flow have a set of common properties” [4]. Based on the results of
our previously proposed models and those of other researchers, monitoring network flows enhances the
detection rate of complex attacks and decreases false alarms [1], [5], [6], [7], [8], [9], [10]. As network
attacks may occur in several stages or via a lengthy communication, inspecting the packet’s payload or
counting the number of transferred bytes may not provide sufficient information for their detection.
Sampling network traffic [1] is one of the main solutions to reduce the resource requirement and
computation time of analyzing the packet’s payload; however, it increases the probability of losing
anomalous data (data related to intrusions) and leads the NIDS to produce a high level of false
negatives. Besides the choice of intrusion detection algorithm, pre-processing the input data
improves the detection rate. Since network flows describe the behavior of the network and its nodes
from a broader perspective, this more explicit view of the data results in a higher detection rate
of network attacks. As the data volume of network flows is only 0.1 per cent of that of the packet
payload [1], real-time detection is practical and implementing the NIDS in a high-speed network is
feasible. Intrusions in encrypted communication go undetected (false negatives) in payload-based
NIDS because the packets’ payload is inaccessible; however, monitoring and inspecting encrypted
communication in the form of network flows provides useful information to the NIDS [11], [12], [13].

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 10, Number 2, March 2016

Besides detecting fast-spreading, zero-day attacks (e.g. DOS, DDOS and probes), detecting
complex attacks such as a botnet is another challenge for anomaly-based NIDS (A-NIDS). A botnet is
a collection of bots (robots, slaves), which are connected to the botmaster through a command and
control channel (C&C) and wait for commands from the botmaster. The detection techniques for
A-NIDS can be categorized as statistical-based, knowledge-based and machine-learning based [5]. Since
statistical-based techniques (probabilistic approaches) rely on statistics alone and do not correlate
alarms, the rate of false alarms increases during complex attacks. On the other hand, a
knowledge-based (scenario-based or expert system) NIDS needs to observe specific steps in order to
detect attacks. However, complex attacks change their structure automatically to become undetectable
by a scenario-based NIDS. In addition, since the rules must be constructed manually by human experts,
it is both difficult and time-consuming to develop high-quality knowledge. Machine-learning techniques
model the input data automatically to predict or make decisions based on new data. The
self-learning capability of machine-learning techniques in A-NIDS increases the detection rate of
zero-day and complex attacks [3], [14], [15], [16].
In general there are three types of machine-learning technique: supervised, semi-supervised and
unsupervised. In supervised machine-learning techniques, the engine needs to be trained on a labeled
data set in order to create models for future prediction or decision-making; however, labeling
network traffic must be carried out by security experts, which is both costly and time-consuming.
Semi-supervised machine-learning techniques are trained on small amounts of
labeled data and large amounts of unlabeled data to create a model of the normal and abnormal
behaviors in the network. Unsupervised machine-learning techniques infer the hidden structure of
unlabeled data without any prior knowledge. Clustering is one of the main approaches in
unsupervised machine learning; it detects noise or abnormal behavior by grouping
patterns (data) into one or more clusters according to their resemblance [17], [18].
The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [19] is a data-clustering
technique that finds a number of clusters starting from the estimated density distribution of
the corresponding samples. DBSCAN requires two parameters to cluster data: α, the minimum
number of points required to form a cluster, and β, the acceptable distance between neighboring
points. Based on our previous experiment [10] and those of others [20], [21], DBSCAN appears to
achieve a high detection rate of network intrusions without any prior knowledge, as a result of its
ability to find clusters of any size and arbitrary shape.
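To make the roles of α and β concrete, the following is a minimal pure-Python sketch of the textbook DBSCAN procedure on two-dimensional points. It is an illustration only, not the implementation used in this paper: the function names `region_query` and `dbscan` are hypothetical, plain Euclidean distance is assumed, and the brute-force neighbor search is quadratic, so it suits small examples only.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, min_pts, eps):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    min_pts plays the role of alpha and eps the role of beta in the text.
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels

# Hypothetical (outbound, inbound) flow counts per machine.
machines = [(10, 12), (11, 10), (12, 11), (10, 10), (11, 12), (250, 5)]
labels = dbscan(machines, min_pts=3, eps=3.0)
```

In this toy input the five machines with similar flow counts form one cluster and the machine at (250, 5) is labeled as noise, which is the property the proposed NIDS relies on.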
In this paper we propose and implement a novel, real-time unsupervised NIDS, which detects
network intrusions by monitoring and inspecting the behavior of the network in normal or encrypted
communications via two separate engines. The first engine monitors the behavior of the
network through an automated self-adaptive threshold to detect significant network traffic changes that
may be caused by intrusions. Whenever the threshold raises an alarm, the engine clusters the current
behavior of the network to detect fast-spreading network intrusions in real time. As most
DDOS attacks are orchestrated from a botnet, whenever the first engine classifies an attack as DDOS the
second engine correlates the previous network traffic of the DDOS attackers and compares it with the
rest of the network history to find the potential botmaster. The paper is organized as follows. In
Section 2, we discuss the related work. In Section 3, we describe in detail our approach to detecting
different types of network intrusion via two separate engines. The experiments and performance
evaluations are presented in Section 4. Finally, we summarize the paper and outline our future
research plans in Section 5.

2. Related Works

Recently, instead of checking payloads, researchers have analyzed different features of network
flows to improve the detection rate of intrusions. For instance, the authors of [22] obtained a
detection accuracy of 95 per cent by training their model on 15 custom network features of the
DARPA 99 data set. Faster processing of the network’s behavior, which was shown in [23], is another
remarkable advantage of flow-based NIDS, since network flows store the behavior of the network in
only 0.1 per cent of the volume of the packets’ payload. In [24] network flows were used to reduce
the computational complexity of the NIDS. Furthermore, since the payload of encrypted packets is not
available, many researchers have suggested and implemented flow-based NIDS to monitor the behavior of
encrypted communication. For instance, the authors of [25] compared different machine-learning
algorithms for classifying the network flows of encrypted communication in SSH and Skype traffic.
Unsupervised machine-learning algorithms have been applied in NIDS to increase the detection rate
of zero-day attacks while decreasing false alarms in imbalanced traffic, without any prior
knowledge. In [26] different unsupervised techniques were used to detect anomalies in mobile ad
hoc networks; that study indicated that k-means and c-means performed best, while k-means
required fewer resources. In [27] the accuracy of unsupervised fuzzy c-means clustering
(FCM) for network intrusion detection was improved using particle swarm optimization (PSO)
for a supervisory control and data acquisition (SCADA) system. In [28] a real-time NIDS was proposed
to detect known and zero-day intrusions by applying several neural networks such as Adaptive
Resonance Theory (ART) and Self-Organizing Map (SOM). Similarly, [29] proposed an unsupervised
NIDS that achieves a high detection rate using different clustering algorithms such as sub-space
clustering and density-based clustering. Several researchers have applied clustering algorithms to
network flows to detect different types of network intrusion in normal and encrypted communication,
such as DOS, scanning, DDOS and botnets [11], [12], [13], [30], [31], [32], [33], [34], [36].

3. Proposed Model

Based on our proposed model in [37], which was partially implemented in [10], we divided the
process of intrusion detection between two separate engines. The first engine checks the behavior of
the network in real time to detect fast-spreading DOS, DDOS, scanning and worm propagation (e.g. ping
of death, SMURF, port scanning), while the second engine (botnet detection engine) needs more time
to observe sufficient information in order to find the potential botmaster. Figure 1 shows the general
architecture of the proposed model.
In our proposed model, the NIDS uses live network flows (from routers, e.g. Cisco [43]) as the input.
Transferring network flows to the NIDS is an efficient way of monitoring the network, whereas
transferring packets through port mirroring produces a high volume of network traffic and causes a
bottleneck during network attacks.


[Figure 1: block diagram, reconstructed from its labels. Routers send live network traffic (network
flows) to both engines. First Engine: the Threshold Determiner monitors the live network flows and,
on an alarm, passes the network traffic (network flows) to Clustering Engine 1, which reports the
attack details to the admin. Second Engine: Clustering Engine 2 receives the network traffic history
(network flows) and the IP addresses of the attackers, and reports the botmaster IP to the admin.]

Figure 1. Architecture of the NIDS

3.1 First Engine, Dynamic Self-Adaptable Threshold Determiner

The first engine monitors the behavior of the network through an automated self-adaptive threshold
to detect, in real time, significant changes that may be caused by intrusions. To prevent a high rate
of false alarms in imbalanced network traffic, the threshold adapts to the current status of
the network to determine the normal expected volume of network traffic in the near future (the next
second).
Table 1 shows how the NIDS observes the previous behavior of the network to determine the
threshold, T, for the current traffic. First, it collects the last N minutes of network traffic so
that it observes enough traffic to derive the threshold. The collected traffic is
divided into four equal windows (W1 contains the first quarter of the network traffic history). To find
the ratio of new network flows per machine, Xj, the total number of new network flows is divided by
the total number of live sources and destinations per second. A high number of network flows can be
expected when the number of online users is large; however, the same number is abnormal
if the number of live machines in the network is small. Since the number of network users rises and
falls over time (as a time series), monitoring the ratio of outbound and inbound network flows per
machine gives more accurate data about the current status of the network.
To calculate µ, a parameter showing the volume of network traffic, Xj is standardized by
logarithm (log) to decrease the probability of missing small intrusions while a larger intrusion is
taking place. In addition, to determine an accurate threshold, the NIDS needs to learn and adapt to
the past behavior changes of the network by calculating the standard deviation (σ) of Xj in each
window.

Table 1. Network Traffic Threshold

Last N minutes of traffic for learning    Gap          Current traffic
W1    W2    W3    W4                      ε seconds    T = (Max(σi) + Max(µ)) * γ,  i = 1, 2, 3, 4
σ1    σ2    σ3    σ4
µ = Xj Log Xj,  j = 1…(N*60)

Adding the maximum deviation of the network traffic to the maximum volume of network traffic
gives the highest expected volume of network traffic in the near future. Since in a real network
normal traffic can itself reach this highest expected volume, the administrator can multiply it by γ
to obtain an accurate final threshold. Fast network intrusions consume the total bandwidth of the
network in 1–3 seconds; it is therefore suggested to leave a small time gap, ε, between the learning
window and the current traffic to minimize the effect of the intrusion’s traffic on the threshold
calculation.
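The threshold of Table 1 can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, µ is read literally from Table 1 as Xj multiplied by log Xj with a base-10 logarithm (the base is not stated in the text), σ is taken as the population standard deviation, and the defaults γ = 5, N = 5 minutes and ε = 5 seconds are simply the values reported in the experiments of this paper.

```python
import math
import statistics

def flows_per_machine(new_flows, live_machines):
    """X_j: new flows in one second divided by live sources and destinations."""
    return new_flows / live_machines if live_machines else 0.0

def learning_window(x_series, n_minutes=5, gap_seconds=5):
    """Last N minutes of X_j, skipping the most recent gap of epsilon seconds."""
    usable = x_series[:len(x_series) - gap_seconds]
    return usable[-n_minutes * 60:]

def threshold(x_history, gamma=5.0):
    """T = (max_i sigma_i + max_j mu_j) * gamma over four equal windows.

    Samples left over when the history length is not a multiple of four
    are ignored for the window deviations.
    """
    quarter = len(x_history) // 4
    if quarter < 1:
        raise ValueError("need at least four samples")
    windows = [x_history[k * quarter:(k + 1) * quarter] for k in range(4)]
    sigmas = [statistics.pstdev(w) for w in windows]
    # Assumption: mu_j = X_j * log10(X_j), the literal reading of Table 1.
    mus = [x * math.log10(x) for x in x_history if x > 0]
    return (max(sigmas) + max(mus, default=0.0)) * gamma

def is_suspicious(current_x, t):
    """Alarm when the current ratio divided by the threshold exceeds one."""
    return t > 0 and current_x / t > 1
```

A caller would recompute `threshold(learning_window(series))` every second and hand the traffic to the clustering engine whenever `is_suspicious` returns true.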
The vast amount of network traffic in high-speed networks significantly increases the possibility of
losing the signs of an attack in small subnets. To overcome this issue, the threshold determiner
takes its input at four different network sizes (network prefixes /0, /8, /16 and /24).
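One way to obtain per-prefix inputs is to count flows per network at each of the four prefix lengths, as in the sketch below. It uses Python's standard `ipaddress` module; the function name `flows_per_prefix`, the use of source addresses and the IPv4-only input are assumptions made for the example.

```python
import ipaddress
from collections import Counter

PREFIXES = (0, 8, 16, 24)

def flows_per_prefix(source_ips, prefixes=PREFIXES):
    """Count flows per network for each prefix length (IPv4 addresses assumed)."""
    counts = {p: Counter() for p in prefixes}
    for ip in source_ips:
        for p in prefixes:
            # strict=False masks the host bits, e.g. 10.1.2.3/24 -> 10.1.2.0/24
            net = ipaddress.ip_network(f"{ip}/{p}", strict=False)
            counts[p][str(net)] += 1
    return counts

counts = flows_per_prefix(["10.1.2.3", "10.1.2.9", "10.1.7.1", "192.168.0.5"])
```

Each counter can then feed its own threshold, so a burst confined to one /24 subnet is not averaged away by the traffic of the whole network.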

3.2 First Engine, Clustering Engine (fast-spreading network attacks detection module)

Whenever the volume of network flows passes the threshold, the NIDS uses DBSCAN to cluster
the number of inbound and outbound network flows of each machine to find the attacker(s). To
increase the accuracy of detection, the network traffic is clustered in two phases: training
(self-learning) and detection.
During the training phase, the NIDS clusters the clean network traffic transmitted before the
threshold raised the alarm in order to obtain the most accurate distance for the detection phase. In
the training phase α is set to a small fraction of the total number of data points and β to the mean
of the Euclidean and Mahalanobis distances [38] of all points. In addition to the advantages of
using the Euclidean distance in DBSCAN clustering, taking the density of the points into account via
the Mahalanobis distance increases the accuracy of the distance measurement. Although we assume
attack-free input data during the training phase, β may still not be sufficient to put all the
points into clusters. Therefore, the NIDS determines the smallest distance between the clusters and
all of the noise points, Δ, which is the supplement added to β in order to obtain a more accurate
distance parameter for the detection phase. In other words, during the training phase, β+Δ is the
minimum distance that puts all the points inside the clusters. Since the NIDS adapts to the normal
behavior of the network during the training phase, the possibility of obtaining accurate intrusion
detection is increased.
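The text does not spell out how the two distances are combined, so the sketch below shows one possible reading: for every pair of points, average the Euclidean and the Mahalanobis distance, then take the mean over all pairs as β. The function names, the restriction to two dimensions (outbound and inbound flows per machine) and the use of the population covariance are assumptions of this example.

```python
import math
from itertools import combinations

def covariance_2d(points):
    """Population covariance of 2-D points as (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, sxy, syy

def mahalanobis(p, q, cov):
    """Mahalanobis distance between p and q for a 2x2 covariance matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("singular covariance matrix")
    dx, dy = p[0] - q[0], p[1] - q[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(d2, 0.0))

def estimate_beta(points):
    """Assumed reading: mean over all pairs of (Euclidean + Mahalanobis) / 2."""
    cov = covariance_2d(points)
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("need at least two points")
    total = sum((math.dist(p, q) + mahalanobis(p, q, cov)) / 2 for p, q in pairs)
    return total / len(pairs)
```

Because the Mahalanobis term rescales each axis by the spread of the data, machines that differ along a direction in which traffic normally varies little are pushed further apart than the Euclidean distance alone would suggest.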
Afterwards, to find the outliers (anomalies), the NIDS clusters the suspicious network traffic, with
α set to a small fraction of the total amount of data and β+Δ as the acceptable distance. Since
network intrusions such as DOS, DDOS, scanning, spamming and fast-spreading worms generate large
numbers of network flows in a short period of time, it is possible to detect their behavior as noise
(outliers). Upon the detection of intrusions, the first engine sends the details of the attack to the
administrator, including the number of machines, the ports and other data useful for classifying the
attack. Algorithm 1 presents the pseudocode of the clustering process that the first engine uses to
detect fast-spreading network attacks.

Algorithm 1. Pseudocode of the First Engine, Clustering Engine (fast-spreading network attacks detection module)
1: CNT = # of inbound & outbound network flows before the suspicious traffic (Clean Network Traffic)
2: SNT = # of inbound & outbound network flows during the suspicious traffic (Suspicious Network Traffic)

// Total Traffic
3: TNT = CNT + SNT

4: InitializeAlpha(CNT)
5: InitializeBeta(CNT)
6: (Noises, Clusters) = DBSCAN(CNT, α, β)

7: Δ = FindSmallestDistance(Noises, Clusters)

8: InitializeAlpha(TNT)
9: Attacker and Victim IP = DBSCAN(TNT, α, β+Δ)
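The two phases of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the authors' code: all names are hypothetical; only the Euclidean distance is used; the sketch computes only the DBSCAN noise set (points that are neither core points nor within the distance of a core point) instead of full cluster labels; Δ is read as the smallest distance that reaches every training noise point from some clustered point; TNT is treated as the concatenation of the clean and suspicious per-machine points; and α is passed as a fraction of the data.

```python
import math

def noise_indices(points, min_pts, eps):
    """DBSCAN noise: neither a core point nor within eps of a core point."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
    core = {i for i in range(len(points)) if len(neighbors(i)) >= min_pts}
    return [i for i in range(len(points))
            if i not in core and not any(j in core for j in neighbors(i))]

def find_delta(points, noise):
    """Assumed reading of Delta: reach every training noise point from a cluster."""
    noise_set = set(noise)
    clustered = [p for i, p in enumerate(points) if i not in noise_set]
    if not noise or not clustered:
        return 0.0
    return max(min(math.dist(points[i], c) for c in clustered) for i in noise)

def detect(clean, suspicious, alpha_fraction, beta):
    """Train on clean traffic, then return the outliers of the total traffic."""
    # Training phase: cluster the clean traffic and derive Delta.
    min_pts = max(2, round(alpha_fraction * len(clean)))
    delta = find_delta(clean, noise_indices(clean, min_pts, beta))
    # Detection phase: cluster clean + suspicious traffic with beta + Delta.
    total = clean + suspicious
    min_pts = max(2, round(alpha_fraction * len(total)))
    outliers = [total[i] for i in noise_indices(total, min_pts, beta + delta)]
    return outliers, delta

# Hypothetical (outbound, inbound) flow counts per machine.
clean = [(x, y) for x in (10, 11, 12) for y in (10, 11, 12)] + [(16, 16)]
suspicious = [(300, 2), (5, 280)]
outliers, delta = detect(clean, suspicious, alpha_fraction=0.3, beta=1.5)
```

In this toy data the busy but legitimate machine at (16, 16) is noise during training, which widens the distance by Δ, so in the detection phase only the two machines with extreme flow counts remain outliers.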


3.3 Second Engine (botnet detection module)

The aim of the second engine is to detect an internal botnet (bots or botmaster), since it
may cause another DDOS attack against the monitored network in the future. As most DDOS
attacks are orchestrated from a botnet [40], the second engine analyzes and clusters the network
traffic that preceded the DDOS attack to find the potential botnet, provided that any of its nodes
(bots or botmaster) has an internal IP address. In general, the life cycle of a botnet consists of
five stages: initial infection, secondary injection, connection, malicious activity and maintenance.
As a result of this life cycle, the communication between the C&C and the bots, in both centralized
and decentralized botnets, has a specific behavioral structure that differs from the rest of the
network traffic. Based on our previous research [39], and that of
others [30], [31], [32], [33], [34], [36], by clustering different features of the network traffic
before the DDOS attack, it is possible to detect the uniqueness of botnet communication (the
communication between bots and botmaster) as noise (outliers) relative to the rest of the network
traffic. Whenever the first engine classifies an attack as DDOS, it sends the IP addresses of the
attackers (potential bots) to the second engine to analyze the possibility of finding the botmaster.
In general, the C&C communicates with the bots several times during the bots’ life. For this reason,
the second engine creates a list of the IPs that communicated with the attackers before the DDOS
attack and considers them to be potential botmasters. Afterwards, different features of all the
transmitted network flows are clustered by DBSCAN, with α set to a small fraction of the total amount
of data and β to the mean of the Euclidean and Mahalanobis distances of all points.
Table 2 shows the features of network flows that are clustered by DBSCAN. To reduce the
complexity and processing time, the NIDS uses sub-space clustering to reduce the dimensions of the
data during clustering. Clustering different features of the network traffic (before the DDOS attack)
provides a picture of the normal activities in the network, and the outliers can be considered the
communications that do not follow the most common behavior of the network. Since the communication
between the C&C and the bots differs from the rest of the network, there is a high chance of finding
the flows between the C&C and the bots among the outliers. Finally, whenever the second engine finds
that an IP address from the list of potential C&C servers matches the IP address of an outlier, it
considers that IP address to be the C&C and reports it to the administrator. Furthermore, the
administrator may block all communication with the botmaster to cut the botnet off at its source.
Algorithm 2 presents the pseudocode of the clustering process that the second engine uses to detect
the botmaster after a DDOS attack.

Table 2. Network Flow Features for Botnet Detection

Features
• Duration                 • Average Latency of Response from Source
• Number of Packets        • Average Latency of Response from Destination
• Smallest Packet Size     • Average Size of Packet from Source
• Largest Packet Size      • Average Size of Packet from Destination

Algorithm 2. Pseudocode of the Second Engine (botnet detection module)

// Network flows with 8 features based on Table 2
1: NF{F1:F8} = Network flows before the DDOS attack

// Loading the IP addresses of the DDOS attackers (Bots)
2: Bots = CallDDOSAttackers()

// IPs which communicated with Bots previously
3: EventualBotMaster{} = CommunicatedMachine(NF, Bots)

4: InitializeAlpha()
5: InitializeBeta()
6: Noises = DBSCAN(NFSubSpaceClustering{F1:F8}, α, β)
7: ReportBotMasterIP(If (Noises{} ∩ EventualBotMaster{}))
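The bookkeeping part of Algorithm 2 (collecting the machines that communicated with the bots and intersecting them with the clustering outliers) can be sketched as below. The clustering itself is left out: the outlier flows are assumed to come from a separate DBSCAN run over the Table 2 features. The `Flow` type and the function names are hypothetical, and the sketch assumes that a candidate must have exchanged flows with every reported bot and that bot-to-bot flows are ignored.

```python
from collections import defaultdict
from typing import NamedTuple

class Flow(NamedTuple):
    src: str
    dst: str

def candidate_botmasters(flows, bots):
    """IPs outside the bot set that exchanged flows with every bot."""
    bots = set(bots)
    contacted = defaultdict(set)   # peer IP -> bots it exchanged flows with
    for f in flows:
        if f.src in bots and f.dst not in bots:
            contacted[f.dst].add(f.src)
        elif f.dst in bots and f.src not in bots:
            contacted[f.src].add(f.dst)
    return {ip for ip, seen in contacted.items() if seen == bots}

def report_botmasters(flows, bots, outlier_flows):
    """Candidates that also appear as an endpoint of an outlier flow."""
    outlier_ips = {ip for f in outlier_flows for ip in (f.src, f.dst)}
    return sorted(candidate_botmasters(flows, bots) & outlier_ips)

# Hypothetical flow history before the DDOS attack.
history = [
    Flow("10.0.0.5", "10.0.0.99"),
    Flow("10.0.0.6", "10.0.0.99"),
    Flow("10.0.0.5", "8.8.8.8"),
    Flow("10.0.0.7", "10.0.0.8"),
]
bots = ["10.0.0.5", "10.0.0.6"]
suspects = report_botmasters(history, bots, outlier_flows=[history[0], history[3]])
```

Requiring contact with every bot keeps popular services that only some bots used out of the candidate list; relaxing that condition to a minimum number of bots would be a natural variation.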


4. Experimental Results

We evaluated our proposed model using two well-known and widely used data sets: DARPA [41]
and ISCX [42]. Fast network intrusions of the DOS (e.g. POD, SMURF, Mail-bomb, SSH-process-table,
UDP Storm), probe (e.g. port scanning–port sweep, network scanning–IP sweep) and DDOS types were
extracted from a DARPA traffic sample to check the performance and reliability of the first engine,
since these types of attack still occur frequently [35], [44], [45], [46].
In addition, to evaluate the performance of the second engine we used a botnet traffic sample from the
ISCX data set and merged it with a DDOS traffic sample from the DARPA data set. The imbalanced
network traffic in the DARPA and ISCX data sets created a high rate of false alarms for the threshold,
since both data sets use traffic from real users and autonomous programs as their background traffic.
To reduce the number of false alarms, we used an automated system to test a range of values
(e.g. two to fifteen) for γ. During the experimental phase, when γ was set to a small value (e.g.
two or three), any small deviation in network traffic pushed the first engine to cluster network
traffic while the network was not under attack. On the other hand, when γ was set to a large value
(e.g. ten to fifteen) the threshold was not able to detect suspicious network traffic. After
comparing the results of the automated testing, we found that γ = 5 gave the best performance for
intrusion detection. We also tested different window sizes (one to fifteen minutes) for N to enhance
the accuracy of the threshold. Small window sizes (e.g. one minute) decreased the volume of traffic
observed by the threshold, which resulted in a high false alarm rate. On the other hand, calculating
the parameters over a large window (fifteen minutes) was time-consuming and did not improve the
performance of the threshold significantly. In our experiment, N = 5 minutes provided the optimal
amount of observed traffic for predicting the threshold. In addition, to prevent suspicious traffic
from influencing the threshold prediction, we set ε to five seconds to create sufficient space
between the learning window and the current traffic.
Figure 2 shows the different steps of intrusion detection by the proposed NIDS. We chose three
types of fast intrusion (1–1, 1–M, M–1) to demonstrate the different phases of intrusion detection in
this figure. As shown in Figure 2 (A, B, C), the number of network flows (inbound or outbound)
is divided by the dynamic, self-adaptable threshold. During fast network
intrusions, such as port scanning (probes), ping of death (POD-DOS) and distributed denial of service
(DDOS), the resulting ratio is greater than one because of the high number of network flows. In port
scanning, which can be categorized as a one-to-one (1–1) attack, a single attacker sends several
requests to different ports of the victim to identify the services running on the victim’s machine.
In the DARPA data set, the ping of death (POD) can be categorized as a one-to-many (1–M) type of
attack, where one attacker sends a high number of oversized pings to several systems in the network
to crash or disable their services. A DDOS attack can be categorized as a many-to-one (M–1) attack
type, since it sends a large volume of requests to a single victim through a high number of
compromised systems (potentially bots) to cause an immediate crash of the victim and paralysis of the
network. One of the novelties of our proposed model is an efficient threshold determiner that is
automated, self-adapting and capable of detecting suspicious behavior during fast network intrusions
(DOS, DDOS and scanning) in real time.
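The three attack shapes used in Figure 2 can be told apart by counting distinct sources and destinations among the flows attributed to an attack. The helper below is a hypothetical illustration of that idea, not a component of the proposed NIDS; flows are given as (source, destination) pairs and the returned labels use plain hyphens.

```python
def attack_pattern(attack_flows):
    """Classify attack flows as '1-1', '1-M', 'M-1' or 'M-M'.

    attack_flows is a non-empty list of (source_ip, destination_ip) pairs.
    """
    if not attack_flows:
        raise ValueError("no flows to classify")
    sources = {src for src, _ in attack_flows}
    targets = {dst for _, dst in attack_flows}
    many_sources = len(sources) > 1
    many_targets = len(targets) > 1
    if many_sources and many_targets:
        return "M-M"
    if many_sources:
        return "M-1"   # e.g. DDOS: many compromised machines, one victim
    if many_targets:
        return "1-M"   # e.g. POD in the DARPA data set: one attacker, several victims
    return "1-1"       # e.g. port scanning of a single victim
```

For example, flows from two different bots to the same victim yield "M-1", the pattern the first engine would forward to the second engine as a likely DDOS attack.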
Figure 2 (D, E, F) shows the training (self-learning) phase of the first engine, which is engaged after
the threshold’s alarm. We tested different fractions of the data (1% to 40%) for α during our
experiment. Small fractions (e.g. 1% to 7%) may mislead the clustering engine into grouping abnormal
behavior into a cluster; however, large fractions (e.g. 30% to 40%) push the system to accept a larger
acceptable distance (β) during clustering and cause a higher false alarm rate. After
trying different values for α, we conclude that setting α to 10 to 15 per cent of the total data in
both the self-learning and detection phases is the best choice, giving the best
detection rate with the lowest computation time. As previously mentioned, since β is not a sufficient
distance to place all the normal network flows into clusters in the self-learning phase, Δ is obtained
to include those network flows as normal during the detection phase. Afterwards, in the detection
phase, the suspicious network traffic is clustered with the parameters obtained during the training
phase to find the outliers and mark them as attackers or victims. As shown in Figure 2 (G, H, I), the
NIDS considers all the previous clean network traffic to be normal; however, the attackers’ and
victims’ IP addresses are marked as outliers during the detection phase. The novelty of the proposed
clustering method in the first engine is that it learns and adapts to the previous, normal state of
the network in order to find, with a high accuracy rate, the attacker or victim in the recent
suspicious network traffic.

[Figure 2, panels (A) to (H): plots not reproduced; panels (A) to (C) have no surviving titles.
Panels (D), (E), (F): Self-Training Phase During Port Scanning Attack (Portsweep), During Ping of
Death Attack (POD) and During DDOS Attack. Legend: Normal Machines (Core Point), Normal Machines
(Density Reachable), Normal Machine (High Traffic Machines).
Panels (G), (H): Detection Phase During Port Scanning Attack (Portsweep) and During Ping of Death
Attack (POD). Legend: Normal Machine (Core Point), Normal Machine (With High Traffic),
Attacker / Victim (Noise).
Axes of panels (D) to (H): Outbound Flows Per IP (horizontal), Inbound Flows Per IP (vertical).]

Figure 2(A, B, C, D, E, F, G, H). Unsupervised Detection of Network Attacks


[Figure 2 (I, J): two scatter plots. "Detection Phase During DDOS Attack" plots Inbound Flows Per IP against
Outbound Flows Per IP (both axes scaled by 10^4), with the legend Normal Machine (Core Point), Normal Machine
(With High Traffic), Attacker / Victim (Noise). "Botmaster Detection in Botnet" plots Latency of Respond From
Destination against Duration of Network Flows, with the legend Normal Network Flows, Normal Network Flows
(Border Points), Suspicious Network Flows.]

Figure 2(I, J). Unsupervised Detection of Network Attacks

To detect the botmaster, the second engine first creates a list of IPs that
communicated with all of the DDOS attackers (potential bots) during the last hour. Since bots
communicate with the botmaster several times, the IPs in this list can be treated as
eventual botmasters. As shown in Table 3, during our experiment two machines, with the IP addresses and
port numbers “131.202.241.200:80” and “192.168.2.112:6667”, communicated with all of
the DDOS attackers during the last hour of network traffic. Afterwards, the second engine clustered
different features (from Table 2) of the last hour’s network flows to find the abnormal flows in the
network. During the clustering phase we set α to 10 per cent of the total traffic, as we assume that botnet
traffic (traffic between the botmaster and the bots) should not be high in volume, since botmasters aim
to remain undetected. Since communication between the bots and the botmaster has a unique pattern that
differs from the rest of the network, these flows can be found as outliers during clustering. During
clustering, all of the network flow features between the HTTP server and the bots (DDOS attackers)
were placed in the cluster; however, network features such as “duration” and “average latency of
response from server” between the IRC server and the bots were abnormal and recognized as outliers.
In the end, the NIDS reported the IRC server to be the botmaster of the detected botnet. Since, in our
proposed model, the second engine clusters the network flows between each eventual botmaster and the bots
and finds their abnormalities, it is possible to find the number of botmasters in a decentralized
botnet.
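
The first step of the second engine, listing the IPs that communicated with every detected bot, amounts to a set comparison. The following Python sketch is illustrative only: the function name, the flow records and the choice to ignore bot-to-bot traffic are assumptions, and the addresses are borrowed from Table 3 purely as sample values.

```python
from collections import defaultdict

def candidate_botmasters(flows, bots):
    """Return the non-bot IPs that exchanged at least one flow with every bot.

    flows: iterable of (source_ip, destination_ip) pairs from the last hour.
    bots:  set of IPs flagged as DDOS attackers by the first engine.
    """
    contacted = defaultdict(set)   # peer IP -> bots it has talked to
    for src, dst in flows:
        if src in bots and dst not in bots:
            contacted[dst].add(src)
        elif dst in bots and src not in bots:
            contacted[src].add(dst)
    return {peer for peer, seen in contacted.items() if seen == bots}

# Hypothetical flow records; only the addresses come from Table 3.
bots = {"192.168.1.103", "192.168.1.105", "192.168.2.109"}
flows = [
    ("192.168.1.103", "131.202.241.200"),
    ("192.168.1.105", "131.202.241.200"),
    ("192.168.2.109", "131.202.241.200"),
    ("192.168.2.112", "192.168.1.103"),
    ("192.168.2.112", "192.168.1.105"),
    ("192.168.2.109", "192.168.2.112"),
    ("192.168.1.103", "10.0.0.7"),      # contacted by one bot only
]

candidates = candidate_botmasters(flows, bots)
print(sorted(candidates))
```

Each candidate returned by such a step would then go through the clustering stage, where only the flows with abnormal features single out the actual botmaster.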

Table 3. Botnet Detail

Botmaster       Bots            Eventual botmaster
192.168.2.112   192.168.1.103   Eventual botmaster 1:
                192.168.1.105   IP: 131.202.241.200 Port: 80 (HTTP)
                192.168.2.109   All features of network flows were normal
                192.168.2.110   Eventual botmaster 2:
                192.168.2.113   IP: 192.168.2.112 Port: 6667 (IRC)
                192.168.4.118   Duration of network flows and average latency of response
                192.168.4.120   from server were considered as noise during clustering

To compare and evaluate our proposed model, we also implemented DBSCAN-based outlier
detection and k-means-based outlier detection, two well-known, previously used approaches for
the unsupervised detection of network intrusions. During the implementation phase we used 10-fold
cross-validation to train and test the DBSCAN and k-means algorithms. Table 4 shows the average
performance of the DBSCAN-based outlier detection, the k-means-based outlier detection and our
proposed model. Since our proposed model monitors the network’s behavior through the
dynamic and self-adaptive threshold, it does not cluster the total traffic to find the anomalous machine;
for this reason, the computational burden of clustering the total network traffic via DBSCAN-based and
k-means-based outlier detection was 26 to 30 times greater than that of our proposed model.
The main drawbacks of most unsupervised NIDS are complexity and computation time. For
instance, the authors of [29] had complexity issues when analyzing the network data and suggested
applying parallel processing to overcome them. Another issue in [29] was that the suspicious
network traffic was clustered with parameters (α, β) obtained from randomly selected samples of the
suspicious traffic windows. Suspicious network traffic may be full of intrusions, so there is a high
chance of deriving the parameters from intrusions during sampling. In contrast, one of the novelties of our
proposed model is that α and β are obtained from a clean traffic sample (the traffic windows for the
self-learning phase) in order to minimize the effect of attacks. To reach the goal of real-time detection we
used the automated threshold to point out anomalous behavior in the network. Since the first engine
clusters the anomalous traffic window instead of the total traffic, we decreased the computation time
significantly. For the first engine, the computation time of the self-learning and detection phases was
less than five seconds as a result of analyzing the suspicious traffic window instead of the total
network history. This makes our proposed model practical for real-time intrusion detection in
real networks, whereas previously proposed real-time unsupervised NIDS, such as [28], used
misuse-based and anomaly-based detection techniques to classify the behavior of the network. Applying this
idea resulted in a 100 per cent detection rate on zero-day attacks with 3.61 per cent false alarms.
Since the second engine is only deployed after DDOS attacks, we detected the botmaster with the least
processing for these types of complex attack.

Table 4. Performance Evaluation

                      Our proposed   DBSCAN-based        K-means-based
                      model          outlier detection   outlier detection
False positive rate   3.61%          8.88%               12.92%
True negative rate    96.39%         91.12%              89.08%
Accuracy              98.39%         86.10%              82.07%
Recall                100%           81.63%              80.31%
Precision             98.12%         91.14%              89.38%
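
For reference, the measures in Table 4 can be derived from the four confusion-matrix counts. The short Python sketch below assumes the standard definitions of these measures; the function name and the counts are hypothetical and do not reproduce the experiment.

```python
def evaluate(tp, fp, tn, fn):
    """Standard detection measures from confusion-matrix counts.

    tp: attacks flagged as attacks      fp: normal traffic flagged as attacks
    tn: normal traffic left unflagged   fn: attacks that were missed
    """
    return {
        "false_positive_rate": fp / (fp + tn),
        "true_negative_rate": tn / (fp + tn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }

# Hypothetical counts: every attack is caught, five normal machines are misflagged.
metrics = evaluate(tp=50, fp=5, tn=95, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these definitions the false positive rate and the true negative rate always sum to 100 per cent, and a recall of 100 per cent means that no attack was missed.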

5. Conclusions and Future Work

In this paper we have presented a new, real-time unsupervised NIDS, which detects zero-day attacks
without any prior knowledge. We used a dynamic and self-adaptive threshold to detect unexpected
behavior in the network, which decreases the computation time of the clustering process during the normal
state of the network. Standardizing the input data via a logarithm (log) and monitoring subnets of different
sizes through the threshold increase the performance of the NIDS. In addition, dividing the process
of intrusion detection into multistage engines decreases the computation time, which enables
real-time intrusion detection for fast-spreading network attacks. Since, in the first engine,
DBSCAN trains itself with the previous clean network traffic, we reached a 100 per cent detection rate
with 3.61 per cent false alarms during our experiment. In response to the increasing rate of DDOS
attacks via botnets, we implemented the second engine to trace the traffic of bots in order to detect
the botmaster in centralized and decentralized models under different protocols (HTTP, IRC and P2P).
To evaluate our proposed model we used two publicly available and well-known data sets to validate the
detection process. Future work will focus mainly on detecting more types of complex attack and
analyzing network traffic while users are grouped into different behavioral classes. In addition, since
distinguishing flash crowds from DDOS attacks is another main challenge for NIDS, we will also
focus on solving this issue in our future work.

Acknowledgments

We wish to acknowledge the sponsorship of COMAS (Doctoral Program in Computing and
Mathematical Sciences) by the University of Jyväskylä, Finland, which has made it possible to
undertake this research.

References

[1] Sperotto, A., Schaffrath, G., Sadre, R., Morariu, C., Pras, A., Stiller, B.: An overview of IP flow-
based intrusion detection. Communications Surveys & Tutorials , vol. 12, no. 3, pp. 343--356,
IEEE (2010)
[2] Engen, V.: Machine learning for network based intrusion detection: an investigation into
discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural
network classifier ensembles from imbalanced data. PhD Thesis, Bournemouth University (2010)


[3] García-Teodoro, P., Díaz-Verdejo, J., Maciá-Fernández, G., Vázquez, E.: Anomaly-based network
intrusion detection: Techniques, systems and challenges. Computers & Security, vol. 28, Issues 1–2,
pp. 18--28 (2009)
[4] Claise, B.: Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of
IP Traffic Flow Information. RFC 5101 (Proposed Standard), [Online]. Available:
http://www.ietf.org/rfc/rfc5101.txt, Jun. 2015
[5] Vahdani Amoli, P., Ghobadi, A.R., Taherzadeh, G., Karimi, R., Maham, S.: New Detection
Technique Using Correlation of Network Flows For NIDS. Proceedings of the 2011 International
Conference on Security Management, SAM 2011, Las Vegas, Nevada, USA (2011)
[6] Lakhina, A., Crovella, M., Diot, C.: Characterization of network-wide anomalies in traffic flows.
Proc. of the 4th ACM SIGCOMM conference on Internet measurement, pp. 201–206, ACM, New
York, USA (2004)
[7] Tedesco, G., Aickelin, U.: An Immune Inspired Network Intrusion Detection System Utilising
Correlation Context. Proceedings of the Workshop on Artificial Immune Systems and Immune
System Modelling (AISB '06), Bristol (2006)
[8] Peng, T., Leckie, C., Ramamohanarao, K.: Proactively Detecting Distributed Denial of Service
Attacks Using Source IP Address Monitoring. Proceedings of the Third International IFIP-TC6
Networking Conference (Networking 2004), pp. 771--782 (2004)
[9] Mark, A.L., Crovella, M., Diot, C.: Characterization of Network-Wide Anomalies in Traffic Flows.
IMC '04 Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, pp. 201--
206, New York, NY, USA (2004)
[10] Hosseinpour, F., Vahdani Amoli, P., Farahnakian, F., Plosila, J., Hämäläinen, T.: Artificial
Immune System Based Intrusion Detection: Innate Immunity using an Unsupervised Learning
Approach. International Journal of Digital Content Technology and its Applications (JDCTA),
Volume 8, Number 5, pp. 1--12 (2014)
[11] Koch, R., Rodosek, G.D.: Security System for Encrypted Environments (S2E2). RAID 2010,
LNCS, vol. 6306, pp. 505--507, Springer, Heidelberg (2010)
[12] Koch, R., Rodosek, G.D.: Command Evaluation in Encrypted Remote Sessions. Network and
System Security (NSS), 2010 4th International Conference on, pp. 299--305, 1-3 Sept.
(2010)
[13] Augustin, M., Balaz, A.: Intrusion detection with early recognition of encrypted application.
Intelligent Engineering Systems (INES), 2011 15th IEEE International Conference on,
pp. 245--247 (2011)
[14] Alserhani, F., Akhlaq, M., Awan, I.U., Cullen, A.J., Mirchandani, P.: MARS: Multi-stage Attack
Recognition System. Advanced Information Networking and Applications (AINA), 2010 24th
IEEE International Conference on, pp. 753--759 (2010)
[15] Sap, M.N.M., Abdullah, A.H., Srinoy, S., Chimphle, S., Chimphle, W.: Anomaly Intrusion
Detection Using Fuzzy Clustering Methods. Jurnal Teknologi Maklumat, FSKSM, UTM,
vol. 18, pp. 25--32 (2006)
[16] Fries, T.P.: A Fuzzy-Genetic Approach to Network Intrusion Detection. Proceedings of the 2008
GECCO conference companion on Genetic and evolutionary computation, Atlanta, GA, USA, pp.
2141--2146 (2008)
[17] Nguyen, T.T.T., Armitage, G.: A survey of techniques for internet traffic classification using
machine learning. Communications Surveys & Tutorials, IEEE , vol.10, no.4, pp. 56--76 (2008)
[18] Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K.: An effective unsupervised network anomaly
detection method. In Proceedings of the International Conference on Advances in Computing,
Communications and Informatics (ICACCI '12). ACM, pp. 533--539, New York, NY, USA
(2012)
[19] Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in
large spatial databases with noise. Proceedings of the Second International Conference on
Knowledge Discovery and Data Mining (KDD-96), AAAI Press, pp. 226--231 (1996)
[20] Erman, J., Arlitt, M., Mahanti, A.: Traffic classification using clustering algorithms. MineNet '06:
Proceedings of the 2006 SIGCOMM workshop on Mining network data, ACM, pp.
281--286 (2006)


[21] Ghourabi, A., Abbes, T., Bouhoula, A.: Data analyzer based on data mining for honeypot router.
In Computer Systems and Applications (AICCSA), 2010 IEEE/ACS International Conference on,
pp. 1--6., IEEE (2010)
[22] Lu, W., Ghorbani, A.A.: Network anomaly detection based on wavelet analysis, EURASIP Journal
on Advances in Signal Processing, vol 2009, pp. 4--10 (2009)
[23] Chinchani, R., Berg, E.V.D.: A Fast Static Analysis Approach to Detect Exploit Code Inside
Network Flows, In Proceedings of the 8th Symposium on Recent Advances in Intrusion Detection
(RAID). LNCS, vol. 3858, pp. 284–308, Springer, Heidelberg (2006)
[24] Hong, W., Zhenghu, G., Qing, G., Baosheng, W.: Detection Network Anomalies Based on Packet
and Flow Analysis, Seventh International Conference on Networking, 2008. ICN 2008,
pp. 497--502 (2008)
[25] Alshammari, R., Zincir-Heywood, A.N.: Machine learning based encrypted traffic classification:
Identifying SSH and Skype, IEEE Symposium on Computational Intelligence for Security and
Defense Applications 2009 (CISDA 2009), pp. 1--8, Ottawa, Canada (2009)
[26] Dang, B.H., Li, W.: Performance Evaluation of Unsupervised Learning Techniques for Intrusion
Detection in Mobile Ad Hoc Networks, Computer and Information Science (Studies in
Computational Intelligence), Volume 566, pp. 71--86, Springer (2015)
[27] Almalawi, A., Tari, Z., Fahad, A., Khalil, I.: A Framework for Improving the Accuracy of
Unsupervised Intrusion Detection for SCADA Systems, 12th IEEE International Conference on
Trust, Security and Privacy in Computing and Communications (TrustCom),
pp. 292--301, Melbourne, Australia (2013)
[28] Amini, M., Jalili, R., Shahriari, H.R.: RT-UNNID: A practical solution to real-time network-based
intrusion detection using unsupervised neural networks, Computers and Security, Elsevier Inc,
vol. 25, Issue 6, pp. 459--468 (2006)
[29] Casas, P., Mazel, J., Owezarski, P.: Unsupervised Network Intrusion Detection Systems: Detecting
the Unknown without Knowledge, Computer Communications, vol.35, Issue 7, pp. 772--783
(2012)
[30] Strayer, W.T., Walsh, R., Livadas, C., Lapsley, D.K.: Detecting Botnets with Tight Command and
Control, Proceedings 31st IEEE Conference on Local Computer Networks, pp. 195--202, Tampa,
USA (2006)
[31] Karasaridis, A., Rexroad, B., Hoeflin, D.: Wide-scale botnet detection and characterization,
Proceedings of the first conference on First Workshop on Hot Topics in Understanding Botnets,
pp. 7--7, Cambridge, USA (2007)
[32] Thonnard, O., Dacier, M.: A strategic analysis of spam botnets operations, In Proceedings of the
8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS '11).
ACM, New York, USA (2011)
[33] Yu, F., Xie, Y., Ke, Q.: SBotMiner: large scale search bot detection, In Proceedings of the third
ACM international conference on Web search and data mining (WSDM '10). ACM, New York,
USA (2010)
[34] Liu, D., Li, Y., Hu, Y., Liang, Z.: P2P-Botnet detection model and algorithms based on network
streams analysis, International Conference on Future Information Technology and Management
Engineering (FITME), vol. 1, pp. 55--58, Changzhou, China (2010)
[35] http://www.symantec.com/connect/forums/port-scan-attack-blocking-3-5-every-minute
[36] Lin, H.C., Chen, C.M., Tzeng, J.Y.: Flow Based Botnet Detection, Innovative Computing,
Information and Control (ICICIC), 2009 Fourth International Conference on,
pp. 1538--1541 (2009)
[37] Vahdani Amoli, P., Hamalainen, T.: A real time unsupervised NIDS for detecting unknown and
encrypted network attacks in high speed network, Proceedings IEEE International Workshop on
Measurements and Networking (M&N), pp. 149--154 (2013)
[38] Mahalanobis, P.C.: On the generalised distance in statistics, Proceedings of the National Institute
of Sciences of India 2 (1) : pp. 49–55 (1936)
[39] Farid Etemad, F., Vahdani Amoli, P.: Real-Time Botnet Command and Control Characterization
at the Host Level, 6th International Symposium on Telecommunication with emphasis on
Information and Communication Technology (IST’2012), Tehran, Iran (2012)
[40] Silva, C.S.C., Silva, R.M.P., Pinto, P.C.G., Salles, R.M.: Botnets: A survey. Computer Networks,
Volume 57, Issue 2, pp. 378--403 (2013)


[41] DARPA dataset, accessed 2014-9-1. [Online]. Available: www.ll.mit.edu
[42] Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A.: Toward developing a systematic approach
to generate benchmark datasets for intrusion detection, Computers & Security, vol. 31, Issue 3,
May 2012, pp. 357--374, ISSN 0167-4048 (2012)
[43] Introduction to Cisco IOS NetFlow - A Technical Overview, [Online]. Available:
http://www.cisco.com/c/en/us/products/collateral/ios-nx-os-software/ios-netflow/prod_white_paper0900aecd80406232.html, Jun. 2015
[45] Ping of death attack detected, [Online]. Available:
http://www.symantec.com/connect/forums/ping-death-attack-detected, Jun. 2015
[47] Preventing Smurf Attack, [Online]. Available:
http://www.symantec.com/connect/forums/preventing-smurf-attack, Jun. 2015
[49] Short, sharp spam attacks aiming to spread Dyre financial malware, [Online]. Available:
http://www.symantec.com/connect/blogs/short-sharp-spam-attacks-aiming-spread-dyre-financial-malware, Jun. 2015
