© [2009] IEEE. Reprinted, with permission, from [Aruna Jamdagni, Zhiyuan Tan, Priyadarsi Nanda, Xiangjian He and
Ren Liu, Intrusion Detection Using Geometrical Structure, 2009, 2009 International Conference on Frontier of
Computer Science and Technology, 2009]. This material is posted here with permission of the IEEE. Such permission
of the IEEE does not in any way imply IEEE endorsement of any of the University of Technology, Sydney's products
or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this
material for advertising or promotional purposes or for creating new collective works for resale or redistribution must
be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to
all provisions of the copyright laws protecting it.
Intrusion Detection Using Geometrical Structure
Aruna Jamdagni1,2, Zhiyuan Tan1, Priyadarsi Nanda1, Xiangjian He1,3 and Ren Liu2
1 Centre for Innovation in IT Services and Applications (iNEXT)
University of Technology, Sydney
Sydney, Australia
{arunaj, thomas, pnanda, sean}@it.uts.edu.au
2 Commonwealth Scientific and Research Organization (CSIRO)
Sydney, Australia
Ren.Liu@csiro.au
3 Lab of Biomedical Information Technology
University of Aizu
Japan
Abstract—We propose a statistical model, namely Geometrical
Structure Anomaly Detection (GSAD), to detect intrusions using
the packet payload in the network. GSAD takes into account
the correlations among the packet payload features arranged
in a geometrical structure. The representation is based on
statistical analysis of Mahalanobis distances among payload
features, which measure the similarity of new data against a
precomputed profile. GSAD calculates a weight factor to determine
anomalies in the payload. Using the 1999 DARPA intrusion
detection evaluation data set, we conduct several tests for
a limited set of attacks on port 80 and port 25. Our approach
establishes and identifies the correlation among packet
payloads in a network.
Keywords-Intrusion Detection; Payload; Geometrical Structure; Mahalanobis Distance; Pattern Recognition
I. INTRODUCTION
The growth of the Internet and local area networks provides
quality and convenience to human life but at the same time
provides a platform for network hackers and criminals.
Internet security is hence becoming an important problem.
The concept of intrusion detection was
introduced in 1980 by J. P. Anderson [2], and has since
become an active field of research. According to the Computer
Emergency Response Team (CERT) [1], 32,956
vulnerabilities were reported from many sources from
1995 until the first quarter of 2007. These vulnerabilities
provide opportunities for attackers to launch attacks on
computer systems and gain access to them. The
goal of an Intrusion Detection System (IDS) is to
characterize attack manifestations so as to positively identify all
true attacks without falsely identifying non-attacks.
Intrusion Detection Systems are components designed to
detect intrusions and to prevent a system from being
compromised. There are three major types of Intrusion
Detection Systems. An anomaly detection system creates a
model of normal behavior and flags suspicious behavior or
any deviation from the normal behavior. The main strength
of anomaly detection is the ability to recognize novel attacks,
and its major weakness is that it is susceptible to false
positive alarms. A signature-based system, or misuse detection
system, uses a knowledge base to directly recognize the
signatures of intrusion attempts. This technique can be evaded
by a slight variation of the attack signature and cannot detect
an unknown attack. Snort and Bro are popular examples of
signature-based intrusion detection systems [5, 7] used
commercially. A specification-based system [6] relies solely
on the frequency of the input data based on system calls or
protocols such as IP, TCP and UDP. The strength of this
technique is that it is computationally light and does not
require the maintenance of many types of parameters or
activity profiles. However, its weakness is
that it needs detailed design to avoid missing attack types.
In this paper, we present a new model, called
Geometrical Structure Anomaly Detection (GSAD), based on
a pattern recognition technique used in image processing.
The structure of the paper is as follows. Section 2
describes related work in the field of anomaly detection.
Section 3 briefly describes methods used in intrusion
detection. In Section 4 we discuss our proposed model.
Section 5 describes the implementation of the model and
Section 6 presents conclusions and future work.
II. RELATED WORKS
Misuse detection systems, or signature-based systems,
rely on signatures of known attacks or pre-defined rules to
match and identify known attacks. Presently in industry,
rule-based network intrusion detection systems such as Snort [5]
and Bro [7] are the most popular. These systems use signatures
or fingerprints to identify known attacks, but they cannot
recognize novel attacks such as zero-day attacks and
mutation attacks. A zero-day attack is a computer threat that tries to
exploit an unknown vulnerability in a computer application. In
a mutation attack, known instances of attacks are transformed
into distinct instances which have the same power of
exploitation. Since the attack signature differs from the stored
known signature due to the transformations, such attacks are less
likely to be detected by signature-based systems.
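A minimal sketch of why exact signature matching can miss a mutated attack. The signature string and the percent-encoding mutation are illustrative assumptions chosen for this example; they are not rules taken from Snort or Bro.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical stored signature for a known attack (illustrative only).
SIGNATURE = b"GET ..//.."

def matches_signature(payload: bytes) -> bool:
    """Naive misuse detection: flag the payload if it contains the signature."""
    return SIGNATURE in payload

known = b"GET ..//.. HTTP/1.0\r\n"
# Mutated instance: equivalent after URL decoding, but different bytes on the wire.
mutated = b"GET %2e%2e//%2e%2e HTTP/1.0\r\n"

print(matches_signature(known))                      # known instance
print(matches_signature(mutated))                    # mutated instance
print(matches_signature(unquote_to_bytes(mutated)))  # mutated instance after decoding
```

The mutated request is only matched once it has been normalised, which is the gap that anomaly detection tries to close without relying on stored signatures.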
Anomaly detection systems model the normal profile of
system behaviour, and any deviation from this behaviour is
identified as a possible attack.
There are two kinds of anomaly-based detection systems. One is
based on a specification (or a set of rules) of what is regarded
as good or normal behaviour, which depends on human expertise,
and the other learns the behaviour of the system under
normal operation automatically. Anomaly detection systems
such as PAYL [14], SPADE [8], NIDES [9], PHAD [10],
ALAD [11] and NATE [12] compute (statistical) models of
normal network traffic and generate alarms when there is a
large deviation from the normal model. These
systems use different algorithms to model normal
network traffic behaviour and different techniques to extract
features from the available audit data. SPADE, ALAD and NIDES use
source and destination IP addresses and ports and TCP
connections in the development of their models, while PHAD uses
34 features extracted from the packet header fields of
Ethernet, IP, TCP, UDP, and ICMP packets. For these
systems the detection rate is good for protocol-based attacks
but poor for application-based attacks, as these systems
ignore the payload contents.
NATE and PHAD use the first 48 bytes of each packet, starting
from the IP header, as statistical features and can include at
most 8 bytes of the network packet payload. ALAD models
incoming TCP requests and uses the first word or token of each input line
out of 1000 application payloads as a feature for the HTTP and
SMTP protocols.
Kruegel et al. [13] describe a service-specific intrusion
detection system. They use the type, length and payload
distribution of the request as features and use a chi-square
test to calculate the anomaly score of a new service request.
They group the 256 ASCII
characters into six segments: 0, 1-3, 4-6, 7-11, 12-15, and
16-255, and compute one single distribution model of these
six segments. Ke Wang and Salvatore J. Stolfo [14]
developed a full byte distribution model conditioned on the
length of payloads and use the Mahalanobis distance to calculate
the anomaly score. They also introduced the concept of
automatic clustering of centroids to increase the accuracy
and reduce the resource consumption. In contrast, we
propose a novel approach, the GSAD model for
packet payloads. Each network connection between a pair of
hosts is viewed as an object in an image (to be
recognized through image processing), and each image is
viewed as a pattern to be classified into the normal or
anomalous traffic class based upon the given information
about the connections. This model includes the correlation
between various payload features and increases the detection
accuracy. We use a Mahalanobis Distance Map to calculate the
difference between normal traffic and new network
traffic. We use the DARPA 1999 IDS dataset [15, 16, 21] as
a benchmark to evaluate the robustness of our algorithms.
This dataset is not without its critics. McHugh [17] pointed
out that the DARPA/MIT Lincoln Laboratory IDS test used
generated data, but the MIT researchers never did any tests to
show that the generated data was representative of real
data. Furthermore, they did not conduct tests to verify that
their attacks were representative of real attacks. A detailed
description of our model is given in Section 4.
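As one concrete reading of the six-segment scheme attributed to [13] earlier in this section, the sketch below bins byte frequencies into six segments and scores a request with a chi-square statistic. It assumes that the segment indices refer to byte values ranked by descending frequency and that the last segment ends at rank 255; both points, and all function names, are assumptions made for illustration rather than details confirmed by the text.

```python
from collections import Counter

# Segment boundaries over frequency ranks (0-based); the last segment is
# capped at rank 255 because there are 256 possible byte values.
SEGMENTS = [(0, 0), (1, 3), (4, 6), (7, 11), (12, 15), (16, 255)]

def segment_distribution(payload: bytes) -> list[float]:
    """Sort the 256 relative byte frequencies (descending) and sum them per segment.

    Assumes a non-empty payload.
    """
    counts = Counter(payload)
    total = len(payload)
    freqs = sorted((counts.get(b, 0) / total for b in range(256)), reverse=True)
    return [sum(freqs[lo:hi + 1]) for lo, hi in SEGMENTS]

def chi_square(observed: list[float], expected: list[float], n: int) -> float:
    """Pearson chi-square statistic between observed and expected segment counts."""
    score = 0.0
    for o, e in zip(observed, expected):
        if e > 0:  # skip empty expected segments to avoid division by zero
            score += (o * n - e * n) ** 2 / (e * n)
    return score

# Made-up payloads for illustration only.
normal = b"GET /index.html HTTP/1.0\r\nHost: example.org\r\n\r\n"
attack = b"GET " + b"/" * 146

expected = segment_distribution(normal)
print(chi_square(segment_distribution(normal), expected, len(normal)))
print(chi_square(segment_distribution(attack), expected, len(attack)))
```

A request whose byte distribution is dominated by a single character concentrates almost all of its mass in the first segment, so its score is far from that of the request used as the reference distribution.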
III. INTRUSION DETECTION METHODS
Various supervised and unsupervised algorithms used by
researchers for intrusion detection with varying degree of
accuracy are reviewed in [3, 4]. Some of them are
summarized here in brief.
Statistical Method: Statistical methods are commonly
used for pattern recognition. The IDS observes a set of
normal behaviours and calculates one or more statistics
identified by a person or some other portion of the IDS as
significant. It can provide accurate information about
malicious activities which occur over a long period of time,
but it is hard to determine thresholds that balance the
likelihood of false positive alarms against the likelihood of
false negative alarms.
Artificial Neural Networks: One or more data sources
are used to train the neural net to recognise normal
behaviour. The neural net then identifies behaviour which
does not match its training experience. It is a data clustering
method based on distance measurement. This approach
applies biological concepts to machines to recognise patterns.
It requires minimal prior knowledge and, with enough
layers and neurons, can create any complex decision region.
Data Clustering: Data clustering is a technique for
finding patterns in unlabelled data with many dimensions. It is
an unsupervised method. It can learn from and detect
intrusions in the audit data without explicit descriptions of
various attack classes.
Immune systems: This approach mimics natural immunology as
observed in biology. Several models exist, such as negative
selection, the immune network model and clonal selection. Cells
can sense not only the evidence for antigen presence, but
also danger signals.
Decision Tree: A decision tree can be used to show possible
consequences for particular occurrences where there are
conditional probabilities for each occurrence. Decision trees
perform efficiently with a large amount of data.
Fuzzy Logic: Fuzzy logic is a set of rules, concepts and
approaches designed to handle vagueness and imprecision. A
set of rules can be created to describe a relationship between
input variables and output variables, which may indicate
whether an intrusion has occurred. It uses a membership
function to evaluate the degree of truthfulness.
The GSAD model, however, uses a statistical intrusion detection
method to identify abnormal behaviour in the network.
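Because GSAD follows the statistical approach, the threshold trade-off noted for statistical methods applies to it directly. The sketch below counts false positives and false negatives for a few thresholds under a "score greater than threshold raises an alarm" rule. The scores are made-up numbers for illustration, not measurements, and the function name is hypothetical.

```python
def error_counts(normal_scores, attack_scores, threshold):
    """Return (false_positives, false_negatives) for a 'score > threshold' alarm rule."""
    false_positives = sum(1 for s in normal_scores if s > threshold)
    false_negatives = sum(1 for s in attack_scores if s <= threshold)
    return false_positives, false_negatives

# Made-up anomaly scores, for illustration only.
normal_scores = [0.2, 0.4, 0.5, 0.9, 1.1]
attack_scores = [0.8, 1.5, 2.3, 3.0]

for threshold in (0.3, 1.0, 2.0):
    fp, fn = error_counts(normal_scores, attack_scores, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Raising the threshold lowers the number of false alarms at the cost of more missed attacks, and lowering it does the opposite.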
IV. GEOMETRICAL STRUCTURE BASED IDS
In this section, we give a comprehensive introduction
to GSAD, which brings geometrical structure into
payload-based anomaly detection. This IDS is based on a
statistical analysis of Mahalanobis Distance Maps among
characters appearing in network traffic and distinguishes
abnormal traffic from normal traffic by their patterns. The
architecture of GSAD is shown in Fig. 1.
In Fig. 1, a solid arrow indicates data flow
inside GSAD. The GSAD architecture contains the
following five components:
Payload feature classifier: This component is used in the
network traffic payload classification phase. The network
traffic data are grouped into various categories using
Wireshark, based on four conditions: size of
payload, destination address, service and direction of traffic
flow. The source of the network traffic can be a real network
or collected tcpdump files.
Payload feature analyst: The payload feature analyst is
the first key constituent of the Geometrical Structure Payload
Model (GSPM). It is responsible for payload feature analysis
using statistical analysis approaches and prepares raw data
for the following analysis phase.
Payload geometrical structure model: This is the second
key constituent of GSPM. The payload geometrical structure
model is developed using a statistical method for anomaly
detection based on the Mahalanobis Distance Map. Its source
data are prepared by the payload feature analyst.
Attack recognizer: This part of GSAD handles the
recognition of attacks in the input network traffic. It
compares each incoming packet with the normal and abnormal
payload geometrical structure models, and then outputs a
score which is the criterion for whether or not to generate an alarm.
Acknowledge/Communication: In this module, an attack
alarm is generated and reported to the administrator if the
score of a packet is larger than the threshold. Otherwise,
the packet is considered normal.
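The grouping performed by the payload feature classifier can be sketched as follows. The Packet record, its field names and the sample addresses are hypothetical and do not correspond to a Wireshark export format; the destination port stands in for the service.

```python
from collections import defaultdict
from typing import NamedTuple

class Packet(NamedTuple):
    # Hypothetical record used only for this illustration.
    payload: bytes
    dst_addr: str
    dst_port: int   # stands in for the service (80 = HTTP, 25 = SMTP)
    inbound: bool   # direction of traffic flow

def group_key(pkt: Packet) -> tuple:
    """Key built from the four classification conditions."""
    return (len(pkt.payload), pkt.dst_addr, pkt.dst_port, pkt.inbound)

def classify(packets):
    """Group packets so that one model can later be built per group."""
    groups = defaultdict(list)
    for pkt in packets:
        groups[group_key(pkt)].append(pkt)
    return dict(groups)

packets = [
    Packet(b"GET /a HTTP/1.0", "192.0.2.10", 80, True),
    Packet(b"GET /b HTTP/1.0", "192.0.2.10", 80, True),
    Packet(b"HELO example", "192.0.2.10", 25, True),
]
groups = classify(packets)
for key, members in groups.items():
    print(key, len(members))
```

Keying on the exact payload length keeps each group homogeneous, at the cost of producing many groups when payload lengths vary.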
A. GSAD Model Characteristics
The GSAD intrusion detection system uses pattern
recognition techniques. They give the system the ability to
detect anomalies without prior knowledge of an
attack. Similar to other anomaly detection systems, GSAD
models the normal behavior of the network traffic rather than
malicious behavior. The most significant
contribution of GSAD is the integration of geometrical
structures and payload-based anomaly detection,
which has not been considered in other related research.
There are two models involved in our GSAD system,
namely the 1-gram payload model [14] and the geometrical
structure model [13, 19].
1) One-gram Payload Model: The 1-gram payload model
is a payload-based statistical model. The content of network
packets is the object analysed by the 1-gram payload model,
which calculates the average frequency of each ASCII
character (0-255). It does not take network packet header
features into account. However, the average frequency alone is
not the most appropriate feature for characterizing
network behaviors, because the same average frequency
can be obtained both from widely varying character
frequencies and from steady character frequencies.
Therefore, further criteria are needed to interpret the
behaviors of varying network traffic. They are the mean value
and standard deviation of each byte’s frequency.
These criteria are all derived from the ASCII character
frequencies. So, when building the 1-gram payload model,
the feature vector is the constituent that must be
calculated first. For a payload, the feature vector is the
set of relative frequencies, that is, the ratios of the occurrences
of each ASCII character to the total number of characters appearing in the
payload. In general, each feature vector can be represented as
in (1).
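Written out under the definition above, a feature vector can take the following form. The symbols are notational assumptions introduced here for illustration: c_i is the number of occurrences of byte value i in the payload, L is the total number of characters in the payload, and f_i is the resulting relative frequency.

```latex
\mathbf{x} = \left[ f_0, f_1, \ldots, f_{255} \right], \qquad
f_i = \frac{c_i}{L}, \qquad
\sum_{i=0}^{255} f_i = 1
```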
Then, given a set of feature vectors, we can compute the
mean value and standard deviation of each byte’s frequency.
Here, we assume that there is a network traffic dataset with n
network packets. The mean value and standard deviation of
each byte’s frequency are described as (2) and (3),
respectively.
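One consistent way to write the mean value and standard deviation of each byte's frequency is given below. It assumes that f_{i,k} denotes the relative frequency of byte value i in the k-th of the n packets, and it uses a 1/n normalisation; whether 1/n or 1/(n-1) is intended is an assumption here.

```latex
\bar{f}_i = \frac{1}{n} \sum_{k=1}^{n} f_{i,k}, \qquad
\sigma_i = \sqrt{ \frac{1}{n} \sum_{k=1}^{n} \left( f_{i,k} - \bar{f}_i \right)^2 }, \qquad
0 \le i \le 255
```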
Figure 1. GSAD architecture
The mean value and standard deviation vectors
are stored in a model M. Because the network traffic
dataset consists of traffic generated by various network
services, we need to classify network traffic based
on the following features: size of payload, destination
address, service and direction of traffic flow. The models
are developed according to this group of features.
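The 1-gram profile described in this subsection (a relative-frequency feature vector per payload, then a per-byte mean and standard deviation stored in a model M) can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function names and the dictionary layout of the model are hypothetical, the 1/n normalisation of the standard deviation is an assumption, and payloads are assumed to be non-empty.

```python
import math

def feature_vector(payload: bytes) -> list[float]:
    """Relative frequency of each byte value 0-255 in one payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload)
    return [c / total for c in counts]

def build_model(payloads: list[bytes]) -> dict:
    """Per-byte mean and (population) standard deviation over a set of payloads."""
    vectors = [feature_vector(p) for p in payloads]
    n = len(vectors)
    mean = [sum(v[i] for v in vectors) / n for i in range(256)]
    std = [math.sqrt(sum((v[i] - mean[i]) ** 2 for v in vectors) / n)
           for i in range(256)]
    return {"mean": mean, "std": std}

# Toy payloads for illustration; one model would be built per traffic group.
payloads = [b"aab", b"abb", b"abc"]
model = build_model(payloads)
print(model["mean"][ord("a")], model["std"][ord("a")])
```

In practice one such model would be built for each group of traffic sharing the same payload size, destination address, service and direction.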
2) Geometrical Structure Model: The Geometrical
Structure Model (GSM) is a pattern recognition technique
used to detect similarity between the normal behavior and
the new input traffic. Although this model has been adopted
in research on human detection, it is still a new concept
in intrusion detection. In this subsection, we explain
the practical application of the geometrical
structure model in payload-based anomaly detection. The
model takes into account the correlations among different
features (256 ASCII characters). For each network
packet, there is a feature vector defined by (1). From it we
compute the average value of the features in the 1-gram model
and the covariance value of each feature.
In order to investigate the relationship among the
characters, we then compute the Mahalanobis distance
between every two characters.
Based on the above calculation, the Mahalanobis Distance
Map (MDM) of a network packet is constructed as in (8).
The above basic formulas are used in the GSM model to
process a large amount of sample network traffic with
normal behaviors. The distance maps of normal behaviors
for each group of network traffic are calculated by (8).
For simplicity, let us consider one group of network traffic
with m normal packets. The distance maps of these normal
packets are computed, and the averages and variances
for all elements (i, j) of the distance maps, where 0 ≤ i, j ≤ 255,
are computed by (9) and (10). The averages and variances
are all kept in a model Mnor for further evaluation.
In the attack recognition phase, an input network packet
undergoes the same preprocessing procedure to construct
its Mahalanobis distance map Dobj.
Then, a calculation is conducted to estimate the
Mahalanobis distance between the two distributions of Dobj and
the model Mnor, which yields a weight w.
If the weight w is larger than a threshold, we determine
that the input network packet is an intrusion.
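The GSM steps described in this subsection (a distance map per packet, element-wise averages and variances over normal packets, and a weight compared with a threshold) can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation. The per-feature covariance is taken as the squared deviation from the average feature value, the pairwise distance as the squared feature difference divided by the sum of the two covariances, the variance uses a 1/m normalisation, and the weight sums squared deviations from the normal averages scaled by the normal variances. The constant EPS and all function names are hypothetical.

```python
EPS = 1e-9  # assumed guard against division by zero; not part of the model text

def feature_vector(payload: bytes) -> list[float]:
    """Relative frequency of each byte value 0-255 (non-empty payload assumed)."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    return [c / len(payload) for c in counts]

def distance_map(payload: bytes) -> list[list[float]]:
    """256 x 256 map of pairwise distances between byte-frequency features."""
    f = feature_vector(payload)
    mu = sum(f) / 256                    # average value of the features
    cov = [(x - mu) ** 2 for x in f]     # assumed per-feature covariance
    dmap = []
    for i in range(256):
        row = []
        for j in range(256):
            denom = cov[i] + cov[j]
            row.append((f[i] - f[j]) ** 2 / denom if denom > 0 else 0.0)
        dmap.append(row)
    return dmap

def build_profile(normal_payloads):
    """Element-wise mean and variance of the distance maps of m normal packets."""
    maps = [distance_map(p) for p in normal_payloads]
    m = len(maps)
    mean = [[sum(d[i][j] for d in maps) / m for j in range(256)]
            for i in range(256)]
    var = [[sum((d[i][j] - mean[i][j]) ** 2 for d in maps) / m
            for j in range(256)] for i in range(256)]
    return mean, var

def weight(payload: bytes, profile) -> float:
    """Assumed weight: variance-scaled squared deviation from the normal averages."""
    mean, var = profile
    dobj = distance_map(payload)
    return sum((dobj[i][j] - mean[i][j]) ** 2 / (var[i][j] + EPS)
               for i in range(256) for j in range(256))

# Made-up payloads for illustration only.
normal = [b"GET /index.html HTTP/1.0\r\n",
          b"GET /about.html HTTP/1.0\r\n",
          b"GET /news.html HTTP/1.0\r\n"]
profile = build_profile(normal)
w_normal = weight(normal[0], profile)
w_attack = weight(b"GET " + b"/" * 146, profile)
print(w_normal, w_attack)
```

A packet would be reported as an intrusion when its weight exceeds a threshold chosen from the weights observed on normal traffic; how that threshold is set is left open here.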
V. EXPERIMENTAL ANALYSIS AND RESULTS
We tested the GSAD model on the 1999 DARPA IDS data
set [16, 21], which is considered a standard data set for
evaluating intrusion detection systems. In our experiment we
assumed that the number of attacks is very small
compared with the amount of normal traffic. We considered
inbound TCP traffic only. The experiment was conducted to
identify the crashiis, back, and mailbomb attacks
using 150 bytes of packet payload.
A. Analysis and Result
The 1999 DARPA IDS data set was collected at MIT
Lincoln Labs to evaluate intrusion detection systems. The entire
network traffic was recorded in tcpdump format. The data set consists
of three weeks of training data and two weeks of
testing data. In the training data there are two weeks of
attack-free data and one week of data with labelled attacks.
These attacks are grouped into five classes: scan or probe,
DoS, R2L, U2R and data.
In this experiment we used the inside network traffic data
(week 1, week 2 and week 3), which was captured between
the router and the victims. We used Wireshark for payload
analysis and applied filters to select inbound TCP traffic
for the HTTP and SMTP services with a payload length of
150 bytes.
We trained the GSAD model on the DARPA dataset
using week 1 and week 3 (attack free), then evaluated the
model on week 2, which contains 43 instances of 15 different
attacks. Tests were done on three types of attacks:
crashiis, back and mailbomb. For port 80, the attacks are
often malformed HTTP requests and are very different from
normal requests. For instance, crashiis sends the request “GET
..//..”, apache2 sends a request with many repeated “User-Agent:sioux\r\n” lines, and back sends an HTTP request “GET
///////////….” with more than 6000 slashes, which causes some
versions of the Apache web server to consume excessive CPU
time. For port 25, the mailbomb attack floods a user with
thousands of junk emails. These attacks are easy to identify
using the GSAD model, which shows a great difference between
the behaviour of these attacks and the behaviour
of normal network traffic for these services.
Fig. 2 (a) and (c) show the attack free and attack
character relative frequencies, Fig. 2 (b) and (d) show the
attack free and attack Mahalanobis Distance Map for crashiis
attack. Fig. 3 (a) and (c) show the attack free and attack
character relative frequencies, Fig. 3 (b) and (d) show the
attack free and attack Mahalanobis Distance Map for back
attack.
From Fig. 2 and Fig. 3, we can see that the character
relative frequencies and Mahalanobis Distance Maps of the
attack packets are very different from those of the normal packets,
which provides strong evidence to distinguish attacks
from normal packets. The character relative frequencies of
attack packets in the two figures reveal the different behaviours
of the crashiis and back attacks. For the crashiis
attack, the “.” character has the highest frequency and the
other characters share even frequencies. By contrast, the
statistical tendency of the back attack is totally different and
matches its signature closely: around 98 per cent of the
characters in the attack packets are “/”.
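As a rough check of the proportion quoted for the back attack, the sketch below builds a synthetic request and computes its character relative frequencies over the first 150 payload bytes. The request layout ("GET " followed only by slashes) and the slash count of 6000 are assumptions made for illustration; real attack packets carry additional bytes.

```python
from collections import Counter

# Synthetic stand-in for a 'back' request, truncated to the 150 payload
# bytes used in the experiments.
payload = (b"GET " + b"/" * 6000)[:150]

counts = Counter(payload)
rel_freq = {chr(b): c / len(payload) for b, c in counts.items()}
top_char, top_freq = max(rel_freq.items(), key=lambda kv: kv[1])

# 146 of the 150 bytes are '/', the remaining 4 are "GET ".
print(top_char, round(top_freq, 3))
```

Under these assumptions the slash accounts for roughly 97 per cent of the payload, which is in line with the proportion reported above.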
These experimental results illustrate the
good performance of our GSAD model in detecting the crashiis,
back and mailbomb attacks. This is clearly
seen in the geometrical structure models, which
capture the correlation among the 256 ASCII characters. Both
pairs of behaviour models in Fig. 2 and Fig. 3 show
dissimilar states between the attack-free and attack packets,
which can be taken as the sign of an intrusion.
Figure 2. Relative frequencies of characters, (a) attack free and (c) attack, and Mahalanobis distance map, (b) attack free and (d) attack, for the crashiis attack.
Figure 3. Relative frequencies of characters, (a) attack free and (c) attack, and Mahalanobis distance map, (b) attack free and (d) attack, for the back attack.
VI. CONCLUSIONS
In this paper we present an approach for network
intrusion detection based on the geometrical structure of
anomalous payloads. The key features are the computation of a
byte distribution model and a geometrical structure model for
normal traffic, conditioned on service type and payload
length. A weight factor is used to measure the similarity
between a new incoming packet's payload and its
corresponding model using the Mahalanobis Distance Map
(MDM). If the weight is greater than the threshold, the
incoming packet is considered an attack packet. The
experiments on the crashiis, back and
mailbomb attacks show good results.
In our future work we aim to evaluate the performance of
our model and validate our results. We also plan to test this
model on the 1999 DARPA IDS dataset for variable-length
payloads, protocols and services.
REFERENCES
[1] CERT, "CERT Statistics", http://www.cert.org/stats/#notes, 2007.
[2] J. P. Anderson, "Computer Security Threat monitoring and
surveillance", Technical report, JP Anderson Co., Ft. Washington,
Pennsylvania, Apr 1980.
[3] P. Ning and S. Jajodia, "Intrusion Detection Techniques", in H. Bidgoli
(Ed.), The Internet Encyclopedia, John Wiley & Sons, 2003.
[4] A. Patcha and J. M. Park, "An overview of anomaly detection
techniques: existing solutions and latest technological trends",
Computer networks, 2007.
[5] Snort: The open source network intrusion detection system
[6] P. Uppuluri and R. Sekar, "Experiences with Specification-Based
Intrusion Detection System", In Recent Advances in Intrusion
Detection: 4th International Symposium, RAID 2001 Davis, CA,
USA, October 10-12, 2001, Proceedings 2001, pp. 172.
[7] Vern Paxson, “Bro: a system for detecting network intruders in real-time”, Computer Networks (Amsterdam, Netherlands: 1999), 31(23-24):2435–2463, 1999.
[8] J. Hoagland, SPADE, Silicon Defense,
http://www.silicondefence.com/software/spice, 2000.
[9] H. S. Javits and A. Valdes, "The NIDES statistical component:
Description and justification", Technical report, SRI International,
computer Science Laboratory, 1993.
[10] M. Mahoney and P. Chan, “Learning non stationary models of normal
network traffic for detecting novel attacks”, In Proc. SIGKDD 2002,
pp. 376–385, 2002.
[11] M. Mahoney, “Network traffic anomaly detection based on packet
bytes”, In Proc. ACM-SAC, Melbourne FL, pp. 346– 350, 2003.
[12] C. Taylor and J. Alves-foss, "NATE-Network Analysis of Anomaly
Traffic Events, A Low-Cost approach", New Security Paradigms
Workshop, 2001.
[13] Christopher Krügel, Thomas Toth, and Engin Kirda, “Service specific
anomaly detection for network intrusion detection”, In Proceedings of
the 2002 ACM symposium on Applied computing, pp. 201–208,
2002.
[14] Ke Wang and S. Stolfo, “Anomalous payload-based network
intrusion detection”, In Recent Advances in Intrusion Detection,
RAID, pages 203–222, September 2004.
[15] R. Lippmann, “The 1999 DARPA offline intrusion detection
evaluation”, In Recent Advances in Intrusion Detection, Third
International Workshop, RAID 2000, 2-4 Oct. 2000, Toulouse, France
(Berlin, Germany, 2000), H. Debar, L. Me, and S. Wu, Eds., Springer-Verlag, pp. 162-182.
[16] R. Lippmann, J. Haines, K. Dass, "Analysis and results Of the 1999
DARPA offline Intrusion detection evaluation”, In Computer
Networks, 34(4), pp. 579-595, 2000.
[17] J. McHugh, "The 1998 Lincoln Laboratory IDS evaluation-a critique",
In Recent Advances in Intrusion Detection, Third International
Workshop, RAID 2000, 2-4 Oct. 2000, Toulouse, France (Berlin,
Germany, 2000), H. Debar, L. Me, and S. Wu, Eds., Springer-Verlag,
pp. 145-161.
[18] TCPDUMP and LIBPCAP Project: http://www.tcpdump.org/
[19] X. He, J. Li, Y. Chen and W. Jia, "Local Binary Patterns with
Mahalanobis Distance Maps for Human Detection", IEEE Congress
on Image and Signal Processing, pp. 520-524, 2008
[20] Akira Utsumi and Nobuji Tetsutani, "Human Detection using
Geometrical Pixel Value Structures", In Proceeding of 5th
International Conference on Automatic Face and Gesture Recognition
(FGR '02), pp. 34-39, 2002.
[21] http://www.ll.mit.edu/IST/ideval/dex.html