
Detecting Denial-of-Service and Network Probe Attacks Using Principal Component Analysis

Khaled Labib and V. Rao Vemuri
Department of Applied Science, University of California, Davis, U.S.A.
kmlabib@ucdavis.edu and rvemuri@ucdavis.edu

Abstract

Intrusion detection complements prevention mechanisms, such as firewalls, cryptography, and authentication, to capture intrusions into an information system while they are acting on it. This study presents an analysis of a method proposed for anomaly detection. The method uses a multivariate statistical technique called Principal Component Analysis to detect selected Denial-of-Service and Network Probe attacks using the 1998 DARPA Intrusion Detection data set. The Principal Components are calculated for both attack and normal traffic, and the loading values of the various feature vector components are analyzed with respect to the Principal Components. The variance and standard deviation of the Principal Components are calculated and analyzed. A brief introduction to Principal Component Analysis and the merits of using it for detecting the selected intrusions are given. A method for identifying an attack based on the Principal Component Analysis results is proposed. The results obtained using the proposed detection criterion show that a detection rate of 100% can be achieved on the selected data sets. Bi-Plots are used as a graphical means of summarizing the statistics collected from the analyzed data.

I. Introduction

With the growing rate of interconnection among computer systems, network security is becoming a major challenge. To meet this challenge, Intrusion Detection Systems (IDS) are being designed to protect the availability, confidentiality and integrity of critical networked information systems. Automated detection and immediate reporting of intrusion events are required in order to provide a timely response to attacks. Early in IDS research, two major approaches emerged: anomaly detection and signature detection. The former flags behaviors that are abnormal; the latter flags behaviors that are close to some previously defined pattern signature of a known intrusion [1]. This paper describes a network-based anomaly detection method for detecting Denial-of-Service and Network Probe attacks.

The detection of intrusions or system abuses presupposes the existence of a model [2]. In signature detection, also referred to as misuse detection, the known attack patterns are modeled through the construction of a library of attack signatures. Incoming patterns that match an element of the library are labeled as attacks. If only exact matching is allowed, misuse detectors operate with no false alarms. By allowing some tolerance in attack matching, there is a risk of false alarms, but the detector is expected to be able to detect certain classes of unknown attacks that do not deviate much from the attacks listed in the library. Such attacks are called neighboring attacks. In anomaly detection, the normal behavior of the system is modeled. Incoming patterns that deviate substantially from normal behavior are labeled as attacks. The premise that malicious activity is a subset of anomalous activity implies that abnormal patterns can be used to indicate attacks. False alarms are expected in this case, in exchange for the hope of detecting unknown attacks, which may be substantially different from neighboring attacks.
Such attacks are called novel attacks. Detecting novel attacks while keeping acceptably low false alarm rates is possibly the most challenging and important problem in Intrusion Detection.

IDSs may also be characterized by scope, as either network-based or host-based. The key difference between the two is that a network-based IDS, although run on a single host, is responsible for an entire network or some network segment, while a host-based IDS is responsible only for the host on which it resides [3].

In this study, a method for detecting selected types of network intrusions is presented. The selected intrusions represent two classes of attacks, namely Denial-of-Service attacks and Network Probe attacks. The method uses Principal Component Analysis (PCA) to reduce the dimensionality of the feature vectors, enabling better visualization and analysis of the data. The data for both normal and attack traffic are extracted from the 1998 DARPA Intrusion Detection Evaluation data sets [4]. Portions of the data sets are processed to create a new database of feature vectors representing the Internet Protocol (IP) headers of the packets. The feature vectors are analyzed using PCA, and various statistics are generated in the process, including the principal components, their standard deviations, and the loading of each feature on the principal components. Bi-Plots are used to give a graphical summary of these statistics. Based on the generated statistics, a method is proposed to detect intrusions with relatively low false alarm rates.

The rest of the paper is organized as follows: Section II discusses related work in intrusion detection using multivariate statistical approaches, with emphasis on those using PCA. Section III provides an introduction to PCA and its applicability to the field of intrusion detection. Section IV describes Denial-of-Service and Network Probe attacks, with emphasis on the attacks selected for this study. Section V details the process of data collection and preprocessing and the creation of feature vectors; it also describes how the various statistics are generated from the PCA results. Section VI discusses the results obtained using this method and suggests a way of detecting intrusions from these results; false alarm rates are also discussed there. Finally, Section VII provides conclusions and recommendations for future work.

II. Related Work

IDS research has been ongoing for the past 15 years, producing a number of viable systems, some of which have become profitable commercial ventures [5]. A number of research projects focus on using statistical approaches for anomaly detection. Ye et al. [6], [7] discuss probabilistic techniques of intrusion detection, including decision trees, Hotelling's T2 test, the chi-square multivariate test and Markov Chains. These tests are applied to audit data to investigate the frequency property and the ordering property of the data. Taylor et al. [8], [9] present a method for detecting network intrusions that addresses the problem of monitoring high-speed network traffic and the time constraints on administrators for managing network security. They use multivariate statistical techniques, namely Cluster Analysis and PCA, to find groups in the observed data.
DuMouchel et al. [10] discuss a method for detecting unauthorized users masquerading as a registered user by comparing, in real time, the sequence of commands given by each user to a profile of the user's past behavior. They use a Principal Component Regression model to reduce the dimensionality of the test statistics. Staniford-Chen et al. [11] address the problem of tracing intruders who obscure their identity by logging through a chain of multiple machines. They introduce thumbprints, which are short summaries of the content of a connection, and use PCA to infer the best choice of thumbprinting parameters from data. Shah et al. [3] study how fuzzy data mining concepts can cooperate in synergy to perform Distributed Intrusion Detection. They describe attacks using a semantically rich language, reason over them and subsequently classify them as instances of an attack of a specific type, using PCA to reduce the dimensionality of the collected data.

III. Principal Component Analysis

Principal Component Analysis [12] is a well-established technique for dimensionality reduction and multivariate analysis. Examples of its many applications include data compression, image processing, visualization, exploratory data analysis, pattern recognition, and time series prediction. A complete discussion of PCA can be found in several textbooks [13], [14]. The popularity of PCA comes from three important properties. First, it is the optimal (in terms of mean squared error) linear scheme for compressing a set of high-dimensional vectors into a set of lower-dimensional vectors and then reconstructing the original set. Second, the model parameters can be computed directly from the data, for example by diagonalizing the sample covariance matrix. Third, compression and decompression are easy operations to perform given the model parameters: they require only matrix multiplication.

A multi-dimensional hyperspace is often difficult to visualize. The main objectives of unsupervised learning methods are to reduce dimensionality, to score all observations based on a composite index, and to cluster similar observations together based on multivariate attributes. Summarizing multivariate attributes by two or three variables that can be displayed graphically with minimal loss of information is useful in knowledge discovery. Because it is hard to visualize a multi-dimensional space, PCA is mainly used to reduce the dimensionality of d multivariate attributes into two or three dimensions.

PCA summarizes the variation in correlated multivariate attributes into a set of uncorrelated components, each of which is a particular linear combination of the original variables. The extracted uncorrelated components are called Principal Components (PC) and are estimated from the eigenvectors of the covariance matrix of the original variables. The objective of PCA is therefore to achieve parsimony and reduce dimensionality by extracting the smallest number of components that account for most of the variation in the original multivariate data, and to summarize the data with little loss of information. In PCA, the PCs can be extracted either from the original multivariate data set or from the covariance matrix, if the original data set is not available. In deriving the PCs, the correlation matrix may be used instead of the covariance matrix when different variables in the data set are measured in different units or have different variances.
Using the correlation matrix is equivalent to standardizing the variables to zero mean and unit standard deviation. The PCA model can be represented by:

$u_{m \times 1} = W_{m \times d} \, x_{d \times 1}$

where u, an m-dimensional vector, is the projection of x, the original d-dimensional data vector (m << d). It can be shown [12] that the m projection vectors that maximize the variance of u, called the principal axes, are given by the eigenvectors $e_1, e_2, \ldots, e_m$ of the data set's covariance matrix S, corresponding to the m largest non-zero eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$. The covariance matrix S can be found as:

$S = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T$

where $\mu$ is the mean vector of x. The eigenvectors $e_i$ can be found by solving the set of equations:

$(S - \lambda_i I) \, e_i = 0, \quad i = 1, 2, \ldots, d$

where $\lambda_i$ are the eigenvalues of S. After calculating the eigenvectors, they are sorted by the magnitude of the corresponding eigenvalues, and the m vectors with the largest eigenvalues are chosen. The PCA projection matrix is then calculated as:

$W = E^T$

where E has the m eigenvectors as its columns.

One of the motives behind the selection of PCA for the detection of network traffic anomalies is its ability to operate on the input feature vector space directly, without the need to transform the data into another output space, as is required by other self-learning techniques. For example, in Self-Organizing Maps [15], the transformation of a high-dimensional input space to a low-dimensional output space takes place through the iterative process of training the map and adjusting the weight vectors. The initial weight vectors are typically selected randomly, which makes the process of selecting the best initial weight vectors one of trial and error. In PCA, dimensionality reduction is achieved by calculating the first few principal components, representing the highest variance in the components of the input feature vector, without performing any transformation on the input space. The input data is analyzed within its own input space, and the results of the transformation are deterministic and do not rely on initial conditions. The sketch below illustrates these steps.
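To make these steps concrete, the following is a minimal sketch of the computation just described, written in NumPy rather than the S-Plus environment used in the paper. The random data matrix is a stand-in for a real feature-vector data set; function names are ours.

```python
import numpy as np

def pca_fit(X, m):
    """Compute W = E^T from the m largest-eigenvalue eigenvectors of S."""
    # Sample covariance matrix S = 1/(n-1) * sum_i (x_i - mu)(x_i - mu)^T
    S = np.cov(X, rowvar=False)
    # Solve (S - lambda_i I) e_i = 0; eigh handles the symmetric matrix S
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:m]   # indices of the m largest eigenvalues
    E = eigvecs[:, order]                   # principal axes e_1 ... e_m as columns
    return E.T, eigvals[order]              # W is m x d

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))              # placeholder for one 300-vector data set
mu = X.mean(axis=0)
W, lam = pca_fit(X, m=2)
U = (X - mu) @ W.T                          # u = W x: compression to 300 x 2 scores
X_hat = U @ W + mu                          # decompression is again a matrix product
```

Note that both the compression step and the reconstruction step are single matrix products, which is the third property of PCA mentioned above.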
IV. Denial of Service and Probe Attacks

In a Denial-of-Service (DoS) attack, the attacker makes some computing or memory resource too busy, or too full, to handle legitimate users' requests. Before launching an attack on a given site, the attacker typically probes the victim's network or host by sweeping across the different hosts on a network, and within a single host, looking for services that are up by probing the open ports. These are referred to as Probe attacks. Table 1 summarizes the types of attacks used in this study.

Table 1: Description of DoS and Probe attacks

  Attack Name   Attack Description
  Smurf         Denial-of-Service ICMP echo reply flood
  Neptune       SYN flood Denial-of-Service on one or more ports
  IPsweep       Surveillance sweep performing either a port sweep or ping on multiple host addresses
  Portsweep     Surveillance sweep through many ports to determine which services are supported on a single host

Smurf attacks, also known as directed broadcast attacks, are a popular form of DoS packet floods. They rely on directed broadcast to create a flood of traffic for a victim. The attacker sends a ping packet, with a spoofed source address belonging to the victim, to the broadcast address of some network on the Internet that will accept and respond to directed broadcast messages; such a network is known as the Smurf amplifier. If there are 30 hosts connected to the Smurf amplifier, the attacker can cause 30 packets to be sent to the victim by sending a single packet to the amplifier [16].

Neptune attacks can make memory resources too full for a victim by sending a TCP packet requesting to initiate a TCP session. This packet is part of the three-way handshake needed to establish a TCP connection between two hosts, and its SYN flag is set to indicate that a new connection is to be established. The packet includes a spoofed source address, such that the victim is never able to finish the handshake but has already allocated an amount of system memory for the connection. After receiving many such packets, the victim eventually runs out of memory resources.

IPsweep and Portsweep, as their names suggest, sweep through IP addresses and port numbers of a victim network and host, respectively, looking for open ports that could potentially be used later in an attack.

V. Data Collection and Preprocessing

The 1998 DARPA Intrusion Detection data sets were used as the source of all traffic patterns in this study. The training data set comprises traffic collected over a period of seven weeks and contains traces of many types of network attacks as well as normal network traffic. This data set has been widely used in Intrusion Detection research and in the comparative evaluation of many IDSs. McHugh [17] presents a critical review of the design and execution of this data set.

Approach

Attack traces were identified using the time stamps published on the DARPA project web site. The data sets were preprocessed to create the feature vectors used to extract the principal components and other statistics. The feature vector chosen has the following format:

  SIPx SPort DIPx DPort Prot PLen

where
• SIPx = Source IP address nibble, where x = [1-4]; four nibbles constitute the full source IP address
• SPort = Source Port number
• DIPx = Destination IP address nibble, where x = [1-4]; four nibbles constitute the full destination IP address
• DPort = Destination Port number
• Prot = Protocol type: TCP, UDP or ICMP
• PLen = Packet length in bytes

This format represents the IP packet header information. Each feature vector has 12 components. The IP source and destination addresses are broken down into their network and host address components to enable the analysis of all types of network addresses. A sketch of this extraction step appears below.
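As an illustration of this preprocessing step, the sketch below builds one 12-component feature vector from the header fields of a single packet. The paper does not publish its preprocessing code, so the input record layout and the numeric protocol encoding used here are assumptions made for the example.

```python
PROTOCOL_CODES = {"tcp": 6, "udp": 17, "icmp": 1}   # assumed encoding: IP protocol numbers

def make_feature_vector(src_ip, src_port, dst_ip, dst_port, prot, plen):
    """[SIP1..SIP4, SPort, DIP1..DIP4, DPort, Prot, PLen] for one packet header."""
    sip = [int(part) for part in src_ip.split(".")]   # four source address nibbles
    dip = [int(part) for part in dst_ip.split(".")]   # four destination address nibbles
    return sip + [src_port] + dip + [dst_port, PROTOCOL_CODES[prot], plen]

# One TCP packet becomes one row of a 300-row data set
vec = make_feature_vector("172.16.114.50", 1755, "192.168.1.30", 80, "tcp", 576)
assert len(vec) == 12
```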
One of the motives for creating smaller data sets to represent the feature vectors is to study the effectiveness of this method for real-time applications. Real-time processing of network traffic mandates the creation of small databases that are dynamically built from the live traffic presented at the network interface.

Seven data sets were created, each containing 300 feature vectors as described above. Four data sets represent the four attack types shown in Table 1, one each. The three remaining data sets represent different portions of normal network traffic across different weeks of the DARPA data sets, allowing variations of normal traffic to be accounted for in the experiment.

With each packet header represented by a 12-dimensional feature vector, it is difficult to view this high-dimensional vector graphically and to extract the relationships among its various features. It is equally difficult to extract the relationships among the many vectors in a set. Therefore, the goal of using PCA is to reduce the dimensionality of the feature vector by extracting the PCs and using the first and second components to represent most of the variance in the data. It is also important to be able to depict graphically the relationship between the various feature vector components and the calculated PCs, to see which features affect the PCs most. This graphical representation enables better visualization of the summary of the relationships in the data set, and is achieved using Bi-Plots.

PCA was performed on all data sets, with each feature vector represented by its 12 components. An exploratory analysis and statistical modeling tool called S-Plus [18] was used to generate the required statistics for this study. The following statistics were generated for each data set (a sketch reproducing these quantities follows the list):
• Standard deviation of each component
• Proportion of variance of each component
• Cumulative proportion of variance across all components
• Loading value of each feature on all individual components
• A Bi-Plot representing the loading of the different features on the first and second components
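The following sketch derives the same numeric summaries from the eigen-decomposition of one data set's covariance matrix; NumPy stands in for S-Plus here, and the random matrix stands in for a real 300 x 12 data set.

```python
import numpy as np

FEATURES = ["SIP1", "SIP2", "SIP3", "SIP4", "SPort",
            "DIP1", "DIP2", "DIP3", "DIP4", "DPort", "Prot", "PLen"]

X = np.random.default_rng(1).normal(size=(300, 12))   # placeholder data set
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]                     # descending eigenvalue order
eigvals, loadings = eigvals[order], eigvecs[:, order]

std_dev = np.sqrt(eigvals)               # standard deviation of each component
prop_var = eigvals / eigvals.sum()       # proportion of variance of each component
cum_prop = np.cumsum(prop_var)           # cumulative proportion of variance

print("first two std devs:", std_dev[:2], " cum. prop. of variance:", cum_prop[1])

# Loading of each feature on the first two principal components
for name, (l1, l2) in zip(FEATURES, loadings[:, :2]):
    print(f"{name:5s}  Comp.1 {l1:+.3f}  Comp.2 {l2:+.3f}")
```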
VI. Results

The principal component loadings are the coefficients of the principal components transformation. They provide a convenient summary of the influence of the original variables on the principal components, and thus a useful basis for the interpretation of data. A large coefficient (in absolute value) corresponds to a high loading, while a coefficient near zero has a low loading [19].

The variance and standard deviation of a random variable are measures of dispersion. The variance is the average value of the squared deviation from the variable's mean, and the standard deviation is the square root of the variance. If X is a discrete random variable with density function $f_X(x)$ and mean $\mu_X$, the variance $\sigma_X^2$ is given by the weighted sum:

$\sigma_X^2 = \sum_{i=1}^{n} (x_i - \mu_X)^2 f_X(x_i)$

Figure 1 shows the loading and variance of the first and second principal components for all data sets. Normal 1, 2 and 3 represent three randomly chosen data sets of normal traffic; IPsweep, Neptune, Portsweep and Smurf represent the data sets for these attacks.

[Figure 1: Component Loading and Variance. Bar chart of the Comp. 1 and Comp. 2 loadings, variances, and cumulative proportion of variance for each of the seven data sets.]

Note that the loading values of the first and second principal components for the three normal data sets are equal, with a value of 0.7. This represents the balance in variance of the packets flowing between a client and a server with respect to the source and destination ports. In TCP, data and acknowledgement packets regularly flow between the client and the server, each using a designated TCP port number for the duration of the session.

For the four attack data sets, note that the loading values of the first and second principal components are not equal, possibly representing the imbalance in variance of the packets flowing between a client and a server with respect to the source and destination port numbers.

In the results above, the first two principal components consistently had their highest absolute-value loadings from the SPort and DPort features across all data sets. This reflects the high variance in both source and destination port numbers for all data sets, except for Smurf, for which the highest variance was due to the source IP address components. Port numbers in TCP connections vary from 0 to 65535 and represent the different network services offered within the TCP protocol.

In Neptune attacks, a flood of SYN packets is sent to one or more ports of the server machine, but from many clients with, typically, non-existent (spoofed) IP addresses. The packets seen by the server appear to come from many different IP addresses with different source port numbers. This is reflected in the irregularity in both the loading and the variance of the principal components.

In IPsweep attacks, one or more machines (IPs) sweep through a list of server machines looking for open ports that can later be utilized in an attack, while in Portsweep attacks one machine sweeps through all ports of a single server machine looking for open ports. In both cases, there is an irregular use of port numbers that causes the variance of the principal components to vary, with an associated irregularity in the loading values.

In Smurf attacks, attackers utilize floods of Internet Control Message Protocol (ICMP) echo reply packets to attack a server, using amplifying stations and broadcast addressing to amplify the attack. The packets seen by the server appear to come from many different IP addresses but to one source port. Therefore, 99% of the variance for this data set is represented by the first four principal components, whose highest loading values are associated with SIP1, SIP2, SIP3 and SIP4, instead of the source and destination ports as in the previous attacks.

Figure 2 shows the standard deviation of the first and second principal components for all data sets. In the case of IPsweep and Portsweep attacks, the standard deviations of the source and destination port numbers are almost similar. This is due to the similar way in which source and destination port numbers are utilized in these attacks. In Neptune attacks, the source and destination ports vary differently, with the source port having the highest variance. In Smurf attacks, the first two components, namely SIP1 and SIP2, represent only a portion of the variance and have relatively small standard deviation values.

[Figure 2: Standard Deviation Values for the first 2 PCs. Bar chart of the Comp. 1 and Comp. 2 standard deviations for each of the seven data sets.]

With these results, it is possible to use the loading values of the features on the first and second principal components to identify an attack. For normal traffic, the loading values appear to be similar, while during an attack the loading values differ significantly for the first two principal components. A threshold value could be used to make such a distinction. In addition, the decision could be further strengthened using the standard deviation values of the first and second components: whenever these values differ significantly, an additional data point is obtained regarding the possibility of an attack.

Table 2 shows the results of a possible criterion C for the detection of an attack based on the loading values. This criterion is represented by the following equation:

$C = \left| (l_1 - l_2) \, p_v \times 100 \right|$

where $l_1$ and $l_2$ are the loading values of the first and second principal components, and $p_v$ is the cumulative proportion of variance of the first and second principal components.

Table 2: Attack Criteria calculation

  Attack Data Set   Comp. 1 Loading   Comp. 2 Loading   Cum. Prop. of Variance   Attack Criteria
  Normal 1          0.707             0.707             0.999                    0.00
  Normal 2          0.709             0.705             0.998                    0.40
  Normal 3          0.708             0.706             0.997                    0.20
  IP Sweep          0.617             0.787             0.998                    16.97
  Neptune           0.723             0.690             0.999                    3.30
  Port Sweep        0.221             0.974             0.998                    75.15
  Smurf             0.981             0.139             0.705                    59.36

If a threshold value of C = 1 is used with the above data sets, a 100% detection rate is achieved using the selected criterion. A sketch of this computation follows.
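The small sketch below applies the criterion to the Table 2 values; the threshold C = 1 reproduces the reported separation of the four attack data sets from the three normal ones.

```python
def attack_criterion(l1, l2, pv):
    """C = |(l1 - l2) * pv * 100| from the first two PC loadings."""
    return abs((l1 - l2) * pv * 100)

# (Comp. 1 loading, Comp. 2 loading, cumulative proportion of variance) from Table 2
TABLE_2 = {
    "Normal 1":   (0.707, 0.707, 0.999),
    "Normal 2":   (0.709, 0.705, 0.998),
    "Normal 3":   (0.708, 0.706, 0.997),
    "IP Sweep":   (0.617, 0.787, 0.998),
    "Neptune":    (0.723, 0.690, 0.999),
    "Port Sweep": (0.221, 0.974, 0.998),
    "Smurf":      (0.981, 0.139, 0.705),
}

THRESHOLD = 1.0
for name, (l1, l2, pv) in TABLE_2.items():
    c = attack_criterion(l1, l2, pv)
    print(f"{name:10s}  C = {c:6.2f}  ->  {'attack' if c > THRESHOLD else 'normal'}")
```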
In addition to the calculation of the attack criterion, Bi-Plots can be used to visually interpret the loading values of the principal components and to see which features had the highest loading on a given principal component. A Bi-Plot represents both the original variables and the transformed observations on the principal component axes. By showing the transformed observations, the data can easily be interpreted in terms of the principal components; by showing the variables, the relationships between those variables and the principal components can be viewed graphically.

Figure 3 shows two sample Bi-Plots generated for the Normal 1 and Portsweep data sets.

[Figure 3: Bi-Plots for Normal 1 (top) and Portsweep (bottom) data sets, showing the observation scores on the first two principal components overlaid with arrows for the twelve features (SrcIP1-4, SrcPort, DstIP1-4, DstPort, Protocol, PacketLength).]

Interpreting the Bi-Plot is straightforward: the x-axis represents the scores of the first principal component, and the y-axis represents the scores of the second principal component. The original variables are represented by arrows, which graphically indicate the proportion of the original variance explained by the first two principal components. The direction of an arrow indicates the relative loadings of that variable on the first and second principal components. A sketch of this construction is given below.
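The Bi-Plots in the paper were produced with S-Plus; the following is a rough Matplotlib equivalent that overlays the observation scores with loading arrows for the twelve features. The random data and the arrow-scaling choice are assumptions made for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

FEATURES = ["SIP1", "SIP2", "SIP3", "SIP4", "SPort",
            "DIP1", "DIP2", "DIP3", "DIP4", "DPort", "Prot", "PLen"]

X = np.random.default_rng(2).normal(size=(300, 12))   # placeholder data set
mu = X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1][:2]
L = eigvecs[:, order]                                 # 12 x 2 loading matrix
scores = (X - mu) @ L                                 # observation scores

fig, ax = plt.subplots()
ax.scatter(scores[:, 0], scores[:, 1], s=8, alpha=0.4)   # transformed observations
scale = np.abs(scores).max()                             # stretch arrows to the score range
for name, (l1, l2) in zip(FEATURES, L):
    ax.arrow(0, 0, l1 * scale, l2 * scale, head_width=scale * 0.02, color="red")
    ax.annotate(name, (l1 * scale, l2 * scale))
ax.set_xlabel("Comp. 1")
ax.set_ylabel("Comp. 2")
plt.show()
```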
VII. Conclusion and Future Work

This study presented a method for detecting Denial-of-Service and Network Probe attacks using Principal Component Analysis as a multivariate statistical tool. The study described the nature of these attacks, introduced Principal Component Analysis, and discussed the merits of using it for detecting intrusions. It presented the approach used to extract the Principal Components and the related statistics, and it discussed the results obtained using a proposed criterion for detecting the subject intrusions; on the data sets studied, this criterion yields a 100% detection rate. The study also presented a graphical method for interpreting the results based on Bi-Plots.

Future work includes extending this model to work in a real-time environment, in which network traffic is collected, processed and analyzed for intrusions dynamically. This may involve using a more comprehensive criterion that accounts for other statistics, including the standard deviation values of the Principal Components. In addition, an enhancement may be added to utilize Bi-Plots for visual interpretation of data in real time. In that case, the entire DARPA data sets will be used to qualify the results.

References

1. Axelsson S., "Intrusion Detection Systems: A Survey and Taxonomy". Technical Report 99-15, Dept. of Computer Engineering, Chalmers University of Technology, Goteborg, Sweden, March 2000.
2. Cabrera J., Ravichandran B., Mehra R., "Statistical Traffic Modeling for Network Intrusion Detection".
3. Shah H., Undercoffer J., Joshi A., "Fuzzy Clustering for Intrusion Detection".
4. DARPA Intrusion Detection Evaluation Project: http://www.ll.mit.edu/IST/ideval/
5. Allen J. et al., "State of the Practice: Intrusion Detection Technologies". Carnegie Mellon University, Software Engineering Institute, Technical Report CMU/SEI-99-TR-028, ESC-99-028, January 2000.
6. Ye N., Li X., Chen Q., Emran S., Xu M., "Probabilistic Techniques for Intrusion Detection Based on Computer Audit Data". IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, Vol. 31, No. 4, July 2001.
7. Ye N., Emran S., Chen Q., Vilbert S., "Multivariate Statistical Analysis of Audit Trails for Host-Based Intrusion Detection". IEEE Transactions on Computers, Vol. 51, No. 7, July 2002.
8. Taylor C., Alves-Foss J., "NATE: Network Analysis of Anomalous Traffic Events, a Low-Cost Approach". New Security Paradigms Workshop '01, September 10-13, 2001, Cloudcroft, New Mexico, U.S.A.
9. Taylor C., Alves-Foss J., "An Empirical Analysis of NATE: Network Analysis of Anomalous Traffic Events". New Security Paradigms Workshop '02, September 23-26, 2002, Virginia Beach, Virginia.
10. DuMouchel W., Schonlau M., "A Comparison of Test Statistics for Computer Intrusion Detection Based on Principal Component Regression of Transition Probabilities".
11. Staniford-Chen S., Heberlein L.T., "Holding Intruders Accountable on the Internet".
12. Hotelling H., "Analysis of a Complex of Statistical Variables into Principal Components". Journal of Educational Psychology, 24:417-441, 1933.
13. Duda R., Hart P., Stork D., "Pattern Classification". Second Edition, John Wiley & Sons, Inc., 2001.
14. Haykin S., "Neural Networks: A Comprehensive Foundation". Second Edition, Prentice Hall Inc., 1999.
15. Kohonen T., "Self-Organizing Maps". Springer-Verlag, New York, 1995.
16. Skoudis E., "Counter Hack: A Step-by-Step Guide to Computer Attacks and Effective Defenses". Prentice Hall Inc., 2002.
17. McHugh J., "Testing Intrusion Detection Systems: A Critique of the 1998 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory". ACM Transactions on Information and System Security, Vol. 3, No. 4, November 2000, pp. 262-294.
18. S-Plus, Insightful Corporation: http://www.insightful.com/
19. S-Plus: Guide to Statistics, Volume 2. Insightful Corporation, 2001.