Detecting Denial-of-Service And Network Probe Attacks Using Principal
Component Analysis
Khaled Labib and V. Rao Vemuri
Department of Applied Science
University of California, Davis
U.S.A.
kmlabib@ucdavis.edu and rvemuri@ucdavis.edu
Abstract
Intrusion detection complements prevention mechanisms, such as firewalls, cryptography, and authentication, by capturing intrusions into an information system while they are acting on it. This study presents an
analysis of a method proposed for anomaly
detection. The method uses a multivariate
statistical method called Principal Component
Analysis to detect selected Denial-of-Service and
Network Probe attacks using the 1998 DARPA
Intrusion Detection data set. The Principal
Components are calculated for both attack and
normal traffic, and the loading values of the
various feature vector components are analyzed
with respect to the Principal Components. The
variance and standard deviation of the Principal
Components are calculated and analyzed. A brief
introduction to Principal Component Analysis
and the merits of using it for detecting the
selected intrusions are discussed. A method for
identifying an attack based on the Principal
Component Analysis results is proposed. The
results obtained using a proposed criterion for
detecting the selected intrusions show that a
detection rate of 100% can be achieved using
this method. Bi-Plots are used as a graphical means of summarizing the statistics collected from the analyzed data.
I. Introduction
With the growing rate of interconnections among
computer systems, network security is becoming
a major challenge. In order to meet this
challenge, Intrusion Detection Systems (IDS) are
being designed to protect the availability,
confidentiality and integrity of critical networked
information systems. Automated detection and
immediate reporting of intrusion events are
required in order to provide a timely response to
attacks.
Early in the research into IDS, two major approaches emerged, known as anomaly detection and signature detection. The former relies on flagging behaviors that are abnormal, and the latter on flagging behaviors that are close to some previously defined pattern signature of a known intrusion [1]. This paper describes a
network-based anomaly detection method for
detecting Denial of Service and Network Probe
attacks.
The detection of intrusions or system abuses
presupposes the existence of a model [2]. In
signature detection, also referred to as misuse
detection, the known attack patterns are modeled
through the construction of a library of attack
signatures. Incoming patterns that match an
element of the library are labeled as attacks. If
only exact matching is allowed, misuse detectors
operate with no false alarms. By allowing some
tolerance in attack matching, there is a risk of
false alarms, but the detector is expected to be
able to detect certain classes of unknown attacks
that do not deviate much from the attacks listed
in the library. Such attacks are called
neighboring attacks.
In anomaly detection, the normal behavior of the
system is modeled. Incoming patterns that
deviate substantially from normal behavior are
labeled as attacks. The premise that malicious
activity is a subset of anomalous activity implies
that the abnormal patterns can be utilized to
indicate attacks. The presence of false alarms is
expected in this case in exchange for the hope of
detecting unknown attacks, which may be
substantially different from neighboring attacks.
These are called novel attacks.
Detecting novel attacks while keeping acceptably low rates of false alarm is possibly the most challenging and important problem in Intrusion Detection.
IDSs may also be characterized by scope, as
either network-based or host-based. The key
difference between network-based and host-
based IDSs is that a network-based IDS,
although run on a single host, is responsible for
an entire network, or some network segment,
while a host-based IDS is only responsible for
the host on which it resides [3].
In this study, a method for detecting selected
types of network intrusions is presented. The
selected intrusions represent two classes of
attacks; namely Denial of Service attacks and
Network Probe attacks. The method uses
Principal Component Analysis (PCA) to reduce
the dimensionality of the feature vectors to
enable better visualization and analysis of the
data. The data for both normal and attack types
are extracted from the 1998 DARPA Intrusion
Detection Evaluation data sets [4]. Portions of
the data sets are processed to create a new
database of feature vectors. These feature vectors
represent the Internet Protocol (IP) header of the
packets. The feature vectors are analyzed using
PCA and various statistics are generated during
this process, including the principal components,
their standard deviations and the loading of each
feature on the principal components. Bi-plots are
used to represent a graphical summary of these
statistics. Based on the generated statistics, a
method is proposed to detect intrusions with
relatively low false alarm rates.
The rest of the paper is organized as follows: Section II discusses related work in intrusion detection using multivariate statistical approaches, with emphasis on those using PCA.
Section III provides an introduction to PCA and
its applicability to the field of intrusion
detection. Section IV describes Denial of Service
and Network Probe attacks with emphasis on the
attacks selected for this study. Section V details
the process of data collection and preprocessing
and the creation of feature vectors. It also
describes how the various statistics are generated
using PCA results. Section VI discusses the
results obtained using this method and suggests a
method of detecting intrusions using these
results. False alarm rates are also discussed here.
Finally, Section VII provides a conclusion of the work presented in this paper and recommendations for future work.
II. Related Work
IDS research has been ongoing for the past 15 years, producing a number of viable systems, some of which have become profitable commercial ventures [5].
There are a number of research projects that
focus on using statistical approaches for anomaly
detection.
Ye et al. [6], [7] discuss probabilistic techniques for intrusion detection, including decision trees, Hotelling's T² test, the chi-square multivariate test and Markov chains. These tests are applied to
audit data to investigate the frequency property
and the ordering property of the data.
Taylor et al [8], [9] present a method for
detecting network intrusions that addresses the
problem of monitoring high speed network
traffic and the time constraints on administrators
for managing network security. They use multivariate statistical techniques, namely Cluster Analysis and PCA, to find groups in the observed data.
DuMouchel et al [10] discuss a method for
detecting unauthorized users masquerading as a
registered user by comparing in real time the
sequence of commands given by each user to a
profile of the user’s past behavior. They use a
Principal Component Regression model to
reduce the dimensionality of the test statistics.
Staniford-Chen et al [11] address the problem of
tracing intruders who obscure their identity by
logging through a chain of multiple machines.
They use PCA to infer the best choice of
thumbprinting parameters from data. They
introduce thumbprints, which are short
summaries of the content of a connection.
Shah et al [3] study how fuzzy data mining
concepts can cooperate in synergy to perform
Distributed Intrusion Detection. They describe
attacks using a semantically rich language,
reason over them and subsequently classify them
as instances of an attack of a specific type. They
use PCA to reduce the dimensionality of the
collected data.
III. Principal Component Analysis
Principal Component Analysis [12] is a well-established technique for dimensionality reduction and multivariate analysis. Examples of
its many applications include data compression,
image processing, visualization, exploratory data
analysis, pattern recognition, and time series
prediction. A complete discussion of PCA can be
found in several textbooks [13], [14]. The
popularity of PCA comes from three important
properties. First, it is the optimal (in terms of
mean squared error) linear scheme for
compressing a set of high dimensional vectors
into a set of lower dimensional vectors and then
reconstructing the original set. Second, the
model parameters can be computed directly from
the data - for example by diagonalizing the
sample covariance matrix. Third, compression
and decompression are easy operations to
perform given the model parameters - they
require only matrix multiplication.
A multi-dimensional hyper-space is often
difficult to visualize. The main objectives of
unsupervised learning methods are to reduce
dimensionality, scoring all observations based on
a composite index and clustering similar
observations together based on multivariate
attributes. Summarizing multivariate attributes
by two or three variables that can be displayed
graphically with minimal loss of information is
useful in knowledge discovery. Because it is
hard to visualize multi-dimensional space, PCA
is mainly used to reduce the dimensionality of d
multivariate attributes into two or three
dimensions.
PCA summarizes the variation in correlated
multivariate attributes to a set of non-correlated
components, each of which is a particular linear
combination of the original variables. The
extracted non-correlated components are called
Principal Components (PC) and are estimated
from the eigenvectors of the covariance matrix of
the original variables. Therefore, the objective of PCA is to achieve parsimony: to reduce dimensionality by extracting the smallest number of components that account for most of the variation in the original multivariate data, and to summarize the data with little loss of information.
In PCA, the extraction of PCs can be made using either the original multivariate data set or the covariance matrix, if the original data set is not available. In deriving PCs, the correlation matrix may be used instead of the covariance matrix when different variables in the data set are measured in different units or have different variances. Using the correlation matrix is equivalent to standardizing the variables to zero mean and unit standard deviation.
The PCA model can be represented by:

$$u_{m \times 1} = W_{m \times d}\, x_{d \times 1}$$

where $u$, an m-dimensional vector, is a projection of $x$, the original d-dimensional data vector ($m \ll d$).

It can be shown [12] that the m projection vectors that maximize the variance of $u$, called the principal axes, are given by the eigenvectors $e_1, e_2, \ldots, e_m$ of the data set's covariance matrix $S$, corresponding to the m largest non-zero eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$.

The data set's covariance matrix $S$ can be found as:

$$S = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T$$

where $\mu$ is the mean vector of $x$. The eigenvectors $e_i$ can be found by solving the set of equations:

$$(S - \lambda_i I)\, e_i = 0, \qquad i = 1, 2, \ldots, d$$

where $\lambda_i$ are the eigenvalues of $S$. After calculating the eigenvectors, they are sorted by the magnitude of the corresponding eigenvalues, and the m vectors with the largest eigenvalues are chosen. The PCA projection matrix is then calculated as:

$$W = E^T$$

where $E$ has the m eigenvectors as its columns.
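For concreteness, these steps can be sketched in a few lines of Python with NumPy (an illustration only; the study itself used S-Plus, and the function below is hypothetical):

```python
import numpy as np

def pca_projection(X, m):
    """Project the d-dimensional rows of X onto the m principal axes.

    Follows the steps above: estimate the covariance matrix S, solve
    for its eigenvectors, sort them by descending eigenvalue, and form
    the projection matrix W = E^T from the top m eigenvectors.
    """
    mu = X.mean(axis=0)                # mean vector of x
    Xc = X - mu                        # center the observations
    S = (Xc.T @ Xc) / (len(X) - 1)     # sample covariance matrix S
    lam, vecs = np.linalg.eigh(S)      # eigenpairs (S is symmetric)
    order = np.argsort(lam)[::-1]      # sort by descending eigenvalue
    E = vecs[:, order[:m]]             # top-m eigenvectors as columns of E
    W = E.T                            # projection matrix W = E^T
    U = Xc @ W.T                       # u = W x for every observation
    return U, W, lam[order]

# Example: project 300 twelve-dimensional vectors onto the first 2 PCs.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
U, W, lam = pca_projection(X, m=2)
print(U.shape)  # (300, 2)
```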
One of the motives behind the selection of PCA for the detection of network traffic anomalies is its ability to operate on the input feature space directly, without the need to transform the data into another output space, as is the case with other self-learning techniques. For example, in Self-Organizing Maps [15], the transformation of a high-dimensional input space to a low-dimensional output space takes place through the iterative process of training the map and adjusting the weight vectors. The initial weight vectors are typically selected randomly, which makes finding the best initial weights a trial-and-error process. In PCA, dimensionality reduction is achieved by calculating the first few principal components, which represent the highest variance in the components of the input feature vector, without the need to perform any transformations on the input space. The input data is analyzed within its own input space, and the results of the transformation are deterministic and do not depend on initial conditions.
IV. Denial of Service and Probe Attacks
In a Denial of Service (DoS) attack, the attacker
makes some computing or memory resource too
busy, or too full, to handle legitimate users’
requests. But before an attacker launches an
attack on a given site, the attacker typically
probes the victim’s network or host by searching
these networks and hosts for open ports. This is
done using a sweeping process across the
different hosts on a network and within a single
host for services that are up by probing the open
ports. This is referred to as Probe Attacks.
Table 1 summarizes the types of attacks used in this study.

Table 1: Description of DoS and Probe attacks

Attack Name   Attack Description
Smurf         Denial of Service ICMP echo reply flood
Neptune       SYN flood Denial of Service on one or more ports
IPsweep       Surveillance sweep performing either a port sweep or ping on multiple host addresses
Portsweep     Surveillance sweep through many ports to determine which services are supported on a single host
Smurf attacks, also known as directed broadcast
attacks, are a popular form of DoS packet floods.
Smurf attacks rely on directed broadcast to create
a flood of traffic for a victim. The attacker sends
a ping packet to the broadcast address for some
network on the Internet that will accept and
respond to directed broadcast messages, known
as the Smurf amplifier. The attacker uses a
spoofed source address of the victim. If there are
30 hosts connected to the Smurf amplifier, the
attacker can cause 30 packets to be sent to the
victim by sending a single packet to the Smurf
amplifier [16].
Neptune attacks can make memory resources too full for a victim by sending TCP packets requesting to initiate a TCP session. Each such packet is part of the three-way handshake that is needed to establish a TCP connection between two hosts. The SYN flag on the packet is set, indicating that a new connection is to be established. The packet includes a spoofed source address, so that the victim is unable to finish the handshake but has already allocated an amount of system memory for the connection. After many of these packets have been sent, the victim eventually runs out of memory resources.
IPsweep and Portsweep, as their names suggest, sweep through the IP addresses of a victim network and the port numbers of a victim host, respectively, looking for open ports that could potentially be used later in an attack.
V. Data Collection and Preprocessing
The 1998 DARPA Intrusion Detection data sets
were used as the source of all traffic patterns in
this study. The training data set includes traffic
collected over a period of seven weeks and
contains traces of many types of network attacks
as well as normal network traffic.
This data set has been widely used in intrusion detection research, including the comparative evaluation of many IDSs.
McHugh [17] presents a critical review of the
design and execution of this data set.
Approach
Attack traces were identified using the time stamps published on the DARPA project web site. Data sets were preprocessed to create feature vectors that were used to extract the principal components and other statistics. The feature vector chosen has the following format:

SIPx  SPort  DIPx  DPort  Prot  PLen

where
• SIPx = Source IP address nibble, where x = [1-4]. Four nibbles constitute the full source IP address
• SPort = Source Port number
• DIPx = Destination IP address nibble, where x = [1-4]. Four nibbles constitute the full destination IP address
• DPort = Destination Port number
• Prot = Protocol type: TCP, UDP or ICMP
• PLen = Packet length in bytes

This format represents the IP packet header information. Each feature vector has 12 components. The IP source and destination addresses are broken down to their network and host addresses to enable the analysis of all types of network addresses. A sketch of how such a vector might be assembled is shown below.
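The helper below is hypothetical (the paper does not specify how the protocol field was encoded numerically; the standard IP protocol numbers are assumed here), but it illustrates the 12-component layout:

```python
def packet_to_feature_vector(src_ip, src_port, dst_ip, dst_port, prot, plen):
    """Assemble the 12-component feature vector:
    SIP1-4, SPort, DIP1-4, DPort, Prot, PLen."""
    prot_codes = {"ICMP": 1, "TCP": 6, "UDP": 17}  # standard IP protocol numbers
    sip = [int(b) for b in src_ip.split(".")]      # SIP1..SIP4
    dip = [int(b) for b in dst_ip.split(".")]      # DIP1..DIP4
    return sip + [src_port] + dip + [dst_port, prot_codes[prot], plen]

# A hypothetical telnet packet:
vec = packet_to_feature_vector("172.16.112.50", 1755, "192.168.1.30", 23, "TCP", 60)
print(len(vec), vec)  # 12 components
```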
One of the motives for creating smaller data sets to represent the feature vectors is to study the effectiveness of this method for real-time applications. Real-time processing of network traffic mandates small databases that are built dynamically from the real-time traffic presented at the network interface.
With each packet header represented by a 12-dimensional feature vector, it is difficult to
view this high-dimensional vector graphically
and be able to extract the relationships between
its various features. It is equally difficult to
extract the relationship between the many
vectors in a set. Therefore, the goal of using
PCA is to reduce the dimensionality of the
feature vector by extracting the PCs and using
the first and second components to represent
most of the variance in the data. It is also
important to be able to graphically depict the
relationship between the various feature vector
components and the calculated PCs, to see which
of the features affect the PCs most. This
graphical representation would enable better
visualization of the summary of the relationships
in the data set. This visualization is achieved
using Bi-Plots.
Seven data sets were created, each containing 300 feature vectors as described above. Four data sets represented the four attack types shown in Table 1, one data set per attack type. The three remaining data sets represent different portions of normal network traffic taken from different weeks of the DARPA data sets. This allows for variations of normal traffic to be accounted for in the experiment.

PCA was performed on all data sets, with each feature vector represented by its 12 components. An exploratory analysis and statistical modeling tool called S-Plus [18] was used to generate the required statistics for this study. The following statistics were generated for each data set (see the sketch after this list):
• Standard deviation of each component
• Proportion of variance for each component
• Cumulative proportion of variance across all components
• Loading value of each feature on all individual components
• A Bi-Plot representing the loading of the different features on the first and second components
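The same statistics can be reproduced outside S-Plus; the following is a minimal NumPy sketch, assuming the conventional computations that a princomp-style routine reports:

```python
import numpy as np

def pca_statistics(X):
    """Return, per principal component: standard deviation, proportion
    of variance, cumulative proportion, and the feature loadings."""
    S = np.cov(X, rowvar=False)        # 12 x 12 covariance matrix
    lam, vecs = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]      # descending eigenvalue order
    lam, loadings = lam[order], vecs[:, order]
    sdev = np.sqrt(lam)                # standard deviation of each PC
    prop = lam / lam.sum()             # proportion of variance
    cum = np.cumsum(prop)              # cumulative proportion
    return sdev, prop, cum, loadings   # loadings[i, j]: feature i on PC j

# For a 300 x 12 data set:
# sdev, prop, cum, L = pca_statistics(dataset)
# print(cum[1])  # proportion of variance captured by the first two PCs
```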
VI. Results
The principal component loadings are the
coefficients of the principal components
transformation. They provide a convenient
summary of the influence of the original
variables on the principal components, and thus a
useful basis for interpretation of data. A large
coefficient (in absolute value) corresponds to a
high loading, while a coefficient near zero has a
low loading [19].
The variance and standard deviation of a random
variable are measures of dispersion. The variance
is the average value of the squared deviation
from the variable’s mean, and the standard
deviation is the square root of the variance.
If $X$ is a discrete random variable with density function $f_X(x)$ and mean $\mu_X$, the variance $\sigma_X^2$ is given by the weighted sum:

$$\sigma_X^2 = \sum_{i=1}^{n} (x_i - \mu_X)^2 f_X(x_i)$$
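For example, for a toy variable taking the values 1, 2 and 3 with probabilities 0.2, 0.5 and 0.3, the formula gives µ = 2.1, σ² = 0.49 and σ = 0.7, which a few lines of Python confirm:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])    # values of the discrete variable X
f = np.array([0.2, 0.5, 0.3])    # density f_X(x); the weights sum to 1
mu = np.sum(x * f)               # mean: 2.1
var = np.sum((x - mu) ** 2 * f)  # weighted sum from the formula: 0.49
print(mu, var, np.sqrt(var))     # standard deviation: 0.7
```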
Figure 1 shows the loading and variance of the first and second principal components for all data sets. Normal 1, 2 and 3 represent three randomly chosen data sets of normal traffic. IPsweep, Neptune, Portsweep and Smurf represent the data sets for these attacks.

[Figure 1: Component Loading and Variance. Bar chart of Comp. 1 loading, Comp. 2 loading, Comp. 1 variance, Comp. 2 variance and cumulative proportion of variance for the Normal 1, Normal 2, Normal 3, IPsweep, Neptune, Portsweep and Smurf data sets.]
In the results above, the first two principal components consistently had their highest absolute loading values from the SPort and DPort features across all data sets, except for Smurf, for which the highest variance was due to the source IP address components. This reflects the high variance in both source and destination port numbers. Port numbers in TCP connections vary from 0 to 65535 and represent the different network services offered within the TCP protocol.

Note that the loading values for the first and second principal components in the three normal data sets are equal, with a value of 0.7. This represents the balance in variance in the packets flowing between a client and a server with respect to the source and destination ports. In TCP, the data and acknowledgement packets regularly flow between the client and the server, each using a designated TCP port number for the duration of the session.

In Smurf attacks, attackers utilize floods of Internet Control Message Protocol (ICMP) echo reply packets to attack a server, using amplifying stations and broadcast addressing to amplify the attack. The packets seen by the server appear to be coming from many different IP addresses but to one source port. Therefore, 99% of the variance for this data set is represented by the first four principal components, whose loading values are associated with SIP1, SIP2, SIP3 and SIP4 instead of the source and destination ports as in the other attacks.

In Neptune attacks, a flood of SYN packets is sent to one or more ports of the server machine, but from many clients with, typically, non-existing (spoofed) IP addresses. The packets seen by the server appear to be coming from many different IP addresses with different source port numbers. This is represented by the irregularity in both the loading and the variance of the principal components.

In IPsweep attacks, one or more machines (IPs) sweep through a list of server machines looking for open ports that can later be utilized in an attack, while in Portsweep attacks, one machine sweeps through all ports of a single server machine looking for open ports. In both cases, there is an irregular use of port numbers that causes the variance in the principal components to vary, with an associated irregularity in the loading values.

For the four attack data sets, note that the loading values for the first and second principal components are not equal, possibly representing the imbalance in variance in the packets flowing between a client and a server with respect to the source and destination port numbers.

Figure 2 shows the standard deviation of the first and second principal components for all data sets. In the case of IPsweep and Portsweep attacks, the standard deviations of the source and destination port numbers are almost identical. This is due to the similarity in how source and destination port numbers are utilized in these attacks. In Neptune attacks, the source and destination ports vary differently, with the source port having the highest variance. In Smurf attacks, the first two components, namely SIP1 and SIP2, represent only a portion of the variance and have relatively small standard deviation values.

[Figure 2: Standard Deviation Values for the first 2 PCs. Bar chart of Comp. 1 and Comp. 2 standard deviation for the Normal 1, Normal 2, Normal 3, IPsweep, Neptune, Portsweep and Smurf data sets.]
With these results, it is possible to use the loading values of the features on the first and second principal components to identify an attack. For normal traffic, the loading values appear to be similar, while during an attack the loading values of the first two principal components differ significantly. A threshold value could be used to make such a distinction. In addition, the decision could be further enhanced using the standard deviation values of the first and second components: whenever these values differ significantly, an additional data point is obtained regarding the possibility of an attack.

Table 2 shows the results of a possible criterion C for the detection of an attack based on the loading values. This criterion is represented by the following equation:

$$C = \left| (l_1 - l_2) \cdot p_v \cdot 100 \right|$$

where $l_1$ and $l_2$ are the loading values for the first and second principal components, and $p_v$ is the cumulative proportion of variance for the first and second principal components.
Table 2: Attack Criteria calculation

Data Set    Comp. 1 Loading   Comp. 2 Loading   Cum. Prop. of Variance   Attack Criteria
Normal 1    0.707             0.707             0.998                    0.00
Normal 2    0.705             0.709             0.999                    0.40
Normal 3    0.708             0.706             0.997                    0.20
IP Sweep    0.617             0.787             0.998                    16.97
Neptune     0.723             0.690             0.999                    3.30
Port Sweep  0.221             0.974             0.998                    75.15
Smurf       0.981             0.139             0.705                    59.36
If a threshold value of C = 1 is used, a 100% detection rate is achieved on the above data sets using the selected criterion. A direct encoding of this rule is sketched below.
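The following sketch applies the criterion, with the loading and cumulative-variance values taken from Table 2:

```python
def attack_criterion(l1, l2, pv):
    """C = |(l1 - l2) * pv * 100| from the equation above."""
    return abs((l1 - l2) * pv * 100)

THRESHOLD = 1.0  # the threshold value proposed in the text

def is_attack(l1, l2, pv):
    return attack_criterion(l1, l2, pv) > THRESHOLD

# Values from Table 2:
print(is_attack(0.708, 0.706, 0.997))  # Normal 3: C ~ 0.20 -> False
print(is_attack(0.221, 0.974, 0.998))  # Port Sweep: C ~ 75.15 -> True
```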
In addition to the calculation of the attack criterion, Bi-Plots could be utilized to visually interpret the loading values of the principal components and to see which features had the highest loading on a given principal component.
A Bi-Plot allows the representation of both the original variables and the transformed observations on the principal component axes. By showing the transformed observations, the data can be easily interpreted in terms of the principal components. By showing the variables, the relationships between those variables and the principal components can be viewed graphically.

Figure 3 shows two sample Bi-Plots generated for the Normal 1 and Portsweep data sets.
Figure 3: Bi-Plots for Normal 1 (top) and
Portsweep (bottom) data sets
Interpreting the Bi-Plot is straightforward: the x-axis represents the scores for the first principal component, and the y-axis represents the scores for the second principal component. The original variables are represented by arrows, which graphically indicate the proportion of the original variance explained by the first two principal components. The direction of an arrow indicates its relative loadings on the first and second principal components.
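Such a plot can be reproduced with standard tools; the sketch below (matplotlib assumed, not part of the original S-Plus workflow) overlays the score cloud with one labeled arrow per feature:

```python
import matplotlib.pyplot as plt
import numpy as np

def biplot(scores, loadings, feature_names):
    """Draw a minimal Bi-Plot: PC scores as points, loadings as arrows.

    scores:   n x 2 array of projections onto the first two PCs.
    loadings: d x 2 array of feature loadings on Comp.1 and Comp.2,
              rescaled so the arrows are visible against the scores.
    """
    fig, ax = plt.subplots()
    ax.scatter(scores[:, 0], scores[:, 1], s=8, alpha=0.5)
    scale = np.abs(scores).max() / np.abs(loadings).max()
    for name, (lx, ly) in zip(feature_names, loadings):
        ax.arrow(0.0, 0.0, lx * scale, ly * scale, head_width=0.01 * scale)
        ax.annotate(name, (lx * scale, ly * scale))
    ax.set_xlabel("Comp.1")
    ax.set_ylabel("Comp.2")
    return fig
```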
VII. Conclusion and Future Work
This study presents a method for detecting
Denial-of-Service attacks and Network Probe
attacks using Principal Component Analysis as a
multivariate statistical tool. The study described
the nature of these attacks, introduced Principal
Component Analysis and discussed the merits of
using it for detecting intrusions. The study
presented the approach used to extract the
Principal Components and the related statistics.
It also discussed the results obtained from using a proposed criterion for detecting the subject intrusions; this criterion yielded a 100% detection rate on the selected data sets. The study also presented a graphical method, based on Bi-Plots, for interpreting the results obtained. Future work includes testing this model in a real-time environment in which network traffic is collected, processed and analyzed for intrusions dynamically. This may involve using a more comprehensive criterion that accounts for other statistics, including the standard deviation values of the Principal Components. In addition, an enhancement may be added to utilize Bi-Plots for visual interpretation of data in real time. In this case the entire DARPA data sets will be used to qualify the results.
References
1. Axelsson S., "Intrusion Detection Systems: A Survey and Taxonomy". Technical Report 99-15, Dept. of Computer Engineering, Chalmers University of Technology, Goteborg, Sweden, March 2000.
2. Cabrera J., Ravichandran B., Mehra R., "Statistical Traffic Modeling for Network Intrusion Detection".
3. Shah H., Undercoffer J., Joshi A., "Fuzzy Clustering for Intrusion Detection".
4. DARPA Intrusion Detection Evaluation Project: http://www.ll.mit.edu/IST/ideval/
5. Allen J. et al., "State of the Practice: Intrusion Detection Technologies". Carnegie Mellon, SEI, Tech. Report CMU/SEI-99-TR-028, ESC-99-028, January 2000.
6. Ye N., Li X., Chen Q., Emran S., Xu M., "Probabilistic Techniques for Intrusion Detection Based on Computer Audit Data". IEEE Transactions on Systems, Man and Cybernetics – Part A: Systems and Humans, Vol. 31, No. 4, July 2001.
7. Ye N., Emran S., Chen Q., Vilbert S., "Multivariate Statistical Analysis of Audit Trails for Host-Based Intrusion Detection". IEEE Transactions on Computers, Vol. 51, No. 7, July 2002.
8. Taylor C., Alves-Foss J., "NATE: Network Analysis of Anomalous Traffic Events, a Low-Cost Approach". NSPW'01, September 10-13, 2001, Cloudcroft, New Mexico, U.S.A.
9. Taylor C., Alves-Foss J., "An Empirical Analysis of NATE – Network Analysis of Anomalous Traffic Events". New Security Paradigms Workshop '02, September 23-26, 2002, Virginia Beach, Virginia.
10. DuMouchel W., Schonlau M., "A Comparison of Test Statistics for Computer Intrusion Detection Based on Principal Component Regression of Transition Probabilities".
11. Staniford-Chen S., Heberlein L.T., "Holding Intruders Accountable on the Internet".
12. Hotelling H., "Analysis of a Complex of Statistical Variables into Principal Components". Journal of Educational Psychology, 24:417-441, 1933.
13. Duda R., Hart P., Stork D., "Pattern Classification". Second Edition, John Wiley & Sons, Inc., 2001.
14. Haykin S., "Neural Networks: A Comprehensive Foundation". Second Edition, Prentice Hall Inc., 1999.
15. Kohonen T., "Self-Organizing Maps". New York, Springer-Verlag, 1995.
16. Skoudis E., "Counter Hack: A Step-by-Step Guide to Computer Attacks and Effective Defenses". Prentice Hall Inc., 2002.
17. McHugh J., "Testing Intrusion Detection Systems: A Critique of the 1998 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory". ACM Transactions on Information and System Security, Vol. 3, No. 4, November 2000, Pages 262-294.
18. http://www.insightful.com/
19. S-Plus: Guide to Statistics, Volume 2. Insightful Corporation, 2001.