Machine Learning for Misuse-Based Network Intrusion Detection: Overview, Unified Evaluation and Feature Choice Comparison Framework

Article in IEEE Access · January 2021
DOI: 10.1109/ACCESS.2021.3075066



Received March 10, 2021, accepted March 22, 2021, date of publication April 22, 2021, date of current version May 4, 2021.
LAURENS LE JEUNE 1,2, TOON GOEDEMÉ 2, AND NELE MENTENS 1,3 (Senior Member, IEEE)
1 ES&S and imec-COSIC, Department of Electrical Engineering (ESAT), KU Leuven, 3000 Leuven, Belgium
2 EAVISE, PSI, Department of Electrical Engineering (ESAT), KU Leuven, 3000 Leuven, Belgium
3 Leiden Institute of Advanced Computer Science (LIACS), Leiden University, 2311 Leiden, The Netherlands

Corresponding author: Laurens Le Jeune (laurens.lejeune@kuleuven.be)


This work was supported in part by the COllective Research NETworking (CORNET) and funded by VLAIO under
Grant HBC.2018.0491, and in part by the CyberSecurity Research Flanders under Grant VR20192203.

ABSTRACT Network intrusion detection systems are essential for the protection of advanced communication networks. Originally, these systems were hard-coded to identify specific signatures, patterns and rule violations; now artificial intelligence and machine learning algorithms provide promising alternatives. However, in the literature, various outdated datasets as well as a plethora of different evaluation metrics are used to prove algorithm efficacy. To enable a global comparison, this study compiles algorithms for different configurations to create common ground and proposes two new evaluation metrics. These metrics, the detection score and the identification score, together reliably present the performance of a network intrusion detection system to allow for practical comparison on a large scale. Additionally, we present a workflow to process raw packet flows into input features for machine learning. This framework quickly implements different algorithms for the various datasets and allows systematic performance comparison between those algorithms. Our experimental results, matching and surpassing the state-of-the-art, indicate the potential of this approach. As raw traffic input features are much easier and cheaper to extract when compared to traditional features, they show promise for application in real-time deep learning-based systems.

INDEX TERMS Intrusion detection, machine learning, neural networks, security.

The associate editor coordinating the review of this manuscript and approving it for publication was Vicente Alarcon-Aquino.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021

I. INTRODUCTION
Today, more and more devices are connected to the internet. Cisco forecasts that by 2023 there will be 29.3 billion devices connected to the internet [1]. As the attack surface increases, the need for security rises. For example, in 2015 a massive brute force attack [2] on Alibaba resulted in the potential compromise of 21 million user accounts. In 2016, Internet-of-Things (IoT) devices infected with the Mirai botnet were used in a large Distributed Denial of Service (DDoS) attack against Domain Name System provider Dyn, resulting in the unavailability of many major internet platforms such as Spotify, Twitter and Netflix¹ [3].

¹ https://splinternews.com/here-are-the-sites-you-cant-access-because-someone-took-1793863079

One important link in the chain of protection against attacks is the intrusion detection system (IDS), responsible for identifying malicious activity and attacks [4]. Network intrusion detection systems then aim to detect attacks by investigating network traffic. While they historically functioned through hard-coded rules, more and more research is being conducted to investigate the application of machine learning (ML). Academic research proposes many different network intrusion detection techniques, also comparing against other techniques. However, the plethora of publicly available and potentially outdated datasets, the differences between those datasets, the variety of evaluation methods and the occasional unclear reporting of proposed techniques significantly complicate making a fair comparison. This paper aims at solving this issue and sets out a workflow that is used to run existing solutions on relevant datasets and that is made open source such that it can easily be applied to future solutions. Our contribution is four-fold:
• We give an overview of the most frequently used datasets and we summarize the pros and cons of each dataset (Sect. IV).

• We present existing evaluation methods and their drawbacks, and we propose newly derived unifying metrics for fair and reliable comparison of network intrusion detection performance (Sect. V).
• We provide a profound overview of ML techniques in the literature for network intrusion detection, with a focus on recent deep learning (DL) approaches, and we quantitatively compare and discuss these techniques based on results reported in related work as well as our own recalculations (Sect. VI) using our proposed metrics.
• We propose a workflow that allows for the use of raw network traffic in machine learning, as raw traffic-based features are more suitable for real-time application when compared to traditional machine learning features for network intrusion detection (Sect. VII). The promising experimental results for various datasets and algorithms are comparable to the state-of-the-art.

Before expounding on these contributions, we will first provide background information on intrusion detection systems (Sect. II) as well as compare our work against other, related work (Sect. III).

II. BACKGROUND
Intrusion detection systems are systems that are able to detect malicious behaviour. This section inspects network intrusion detection as one of multiple intrusion detection applications, as well as different approaches to actually build a network intrusion detection system.

A. INTRUSION DETECTION SYSTEMS
One of the first mentions of detecting malicious activity was made in 1980 by J.P. Anderson [5], who outlines the required components for what is now known as an IDS. In 1987 D. Denning introduced IDES (Intrusion-Detection Expert System), which was the foundation for many subsequent IDSs [6]. Currently, various IDSs are used for various goals. Host-based intrusion detection systems (HIDS), for example, aim to detect intrusions in a specific host [4], such as a computer or a server. Notably, this not only comprises network-based intrusions, but includes unauthorized use of the host. A HIDS example is given in [7], in which the anomalous use of applications is detected by monitoring system calls. Although this technique helps to identify intrusions via a specific host, it provides no insight into intrusions in other parts of the network.

By contrast, network-based IDSs (NIDS) detect network traffic intrusions [4]. Rather than keeping track of one host, NIDSs monitor network attacks, by inspecting network traffic to detect malicious communication through the flows and generated features of the network packets.

Collaborative IDSs (CIDS) consist of multiple IDS nodes that exchange information in a centralized, decentralized or distributed manner [8]. This way, a large network can be better protected against exceedingly distributed attacks [9]. One of the earlier mentions of such a system is given in [10]. Building what effectively is an intrusion detection network (IDN) allows the different IDS nodes to communicate and pass on useful information. This, however, introduces a new security risk: malicious or compromised IDS nodes can try to give false feedback, or forgo giving necessary feedback. Therefore, CIDS research involves verifying the trustworthiness of a node, e.g. [9], [11], [12].

Recently, a new branch of IDS research has emerged, investigating the security of the Internet-of-Things (IoT). IoT nodes require IDS technology different from regular networks, for three reasons: the limited resources of IoT nodes, their specific network topologies and their new communication protocols [13]. Some examples are [14], [15].

Another increasingly important domain concerns Wireless Sensor Networks (WSN). These networks are characterized by infrastructure absence, wireless links, limited physical protection, a lack of central management and limited resources as defined in [16]. And while some WSN solutions are being evaluated on NIDS datasets [17]–[20], WSN environments are intrinsically different from the traditional NIDS environment we consider in this paper, with a fixed, wired infrastructure and abundant resources.

In practice, IDS implementations can be combinations of the different types of IDSs. Reference [11] for example uses different HIDS nodes in a CIDS system, where the HIDS nodes communicate to improve detection accuracy. IDSs that combine HIDS and NIDS technology are sometimes called hybrid IDS [21], [22]. Note that hybrid IDS can also denote an IDS that combines misuse-based and anomaly-based intrusion detection [14], [23], [24]. This will be more thoroughly investigated in Sect. II-B. In this paper, we focus on network-based intrusion detection systems.

B. MISUSE-BASED OR ANOMALY-BASED
Generally, the functioning of any IDS can be described as being either misuse-based or anomaly-based. Misuse-based² intrusion detection, also known as knowledge-based [4], in principle simply means that the IDS knows what certain attacks look like, and that it detects attacks based on that knowledge. Therefore, misuse-based intrusion detection algorithms obtain low false positive rates (see Sect. V) when inspecting network traffic. Moreover, they can effectively detect known attacks and label them accordingly, which facilitates following up on the detection. There is however one glaring weakness of misuse-based intrusion detection systems: their inability to detect unknown and zero-day attacks. Since they are conditioned to detect known attacks, other attacks that do not share similarities with them will go unnoticed.

The counterpart to misuse-based intrusion detection is anomaly-based intrusion detection. Anomaly-based or behaviour-based intrusion detection creates a model of what

² Traditionally, misuse-based approaches were signature-based, matching traffic against known attack patterns. In this paper, we include supervised machine learning solutions in this category, as they are trained to recognize specific attacks.
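
The contrast drawn in Sect. II-B can be illustrated with a minimal Python sketch. It is not taken from any surveyed system: the signature table, the packet-size profile and the function names (`misuse_detect`, `fit_normal_profile`, `anomaly_detect`) are hypothetical choices made purely for illustration. The misuse-based detector labels traffic that matches a known pattern, while the anomaly-based detector only flags deviations from a model of normal traffic.

```python
from statistics import mean, stdev

# Hypothetical signature database: payload substrings mapped to attack labels.
SIGNATURES = {
    b"/etc/passwd": "path-traversal",
    b"' OR '1'='1": "sql-injection",
}


def misuse_detect(payload: bytes):
    """Return the label of the first matching known signature, else None."""
    for pattern, label in SIGNATURES.items():
        if pattern in payload:
            return label
    return None


def fit_normal_profile(packet_sizes):
    """Model 'normal' traffic as the mean and standard deviation of packet sizes."""
    return mean(packet_sizes), stdev(packet_sizes)


def anomaly_detect(size, profile, k=3.0):
    """Flag a packet whose size deviates more than k standard deviations from normal."""
    mu, sigma = profile
    return abs(size - mu) > k * sigma
```

In this toy setting, a payload carrying an unseen exploit yields `None` from `misuse_detect`, which mirrors the zero-day weakness of misuse-based detection, whereas `anomaly_detect` may still flag it without being able to name the attack.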


is supposed to be normal network traffic. Intrusions can be defined as traffic with significant deviations from that expected behaviour. This means that rather than detecting very specific attacks, anomaly-based intrusion detection catches abnormal traffic. Because abnormal traffic does not automatically correspond to an attack, anomaly-based intrusion detection is characterized by high false positive rates. However, as anomaly-based intrusion detection systems are able to detect unknown and zero-day attacks, they remain relevant for research.

Note that in some literature, (network) anomaly detection is a term used to designate both misuse-based as well as behaviour-based intrusion detection. It is therefore advisable to always verify what exactly is meant by anomaly detection. In this paper we mainly concentrate on misuse-based detection, although our metrics can also be applied to anomaly-based techniques.

III. RELATED WORK
Since much research has gone into network intrusion detection, many different techniques and algorithms have been proposed over the years. As a result, there is a need to categorize and compare different approaches to get a better understanding of the field. In this section, we go over some survey papers, ordered by year, in order to investigate what exists and to state our contribution. The focus in this regard lies on more recent work, as machine learning and especially deep learning are fast evolving research fields.

Bhuyan et al. [25] present a categorization of network anomaly detection methods and systems, encompassing statistical, classification-based, knowledge-based, soft-computing, clustering-based, ensemble-based, fusion-based and hybrid approaches. Additionally, they consider several tools such as nmap or Wireshark that are useful in network anomaly detection, and they discuss the evaluation of detection systems. Finally, they conclude by providing recommendations for and stating challenges of network anomaly detection.

Ahmed et al. [26] consider different methods for anomaly detection, namely classification, statistical, information theory-based and clustering-based approaches. Note that their anomaly detection techniques also include misuse-based algorithms such as Support Vector Machines and rule-based approaches. Additionally, they discuss IDS datasets, and evaluate the anomaly detection methods according to their computational complexity, their output format and their attack priority. However, as this evaluation appears to be limited to DARPA/KDDCup attacks, its applicability with regard to more recent datasets is limited.

In [27], Buczak et al. examine machine learning and data mining techniques for intrusion detection. These techniques include artificial neural networks, (fuzzy) association rules, Bayesian Networks, clustering, decision trees, ensemble learning, evolutionary computation, hidden Markov models, inductive learning, naive Bayes, sequential pattern mining and support vector machines. For each of these techniques, they consider both misuse-based detection as well as anomaly detection. Next, they provide an overview of the computational complexity and streaming capabilities of each technique. By commenting on IDS performance, on the difficulty of comparing different detection methods and on the (re)trainability of models, as well as by providing some recommendations, they conclude their work.

In [28], the authors provide an overview of machine learning and deep learning techniques that are being used in cybersecurity, with a focus on network intrusion detection. They consider Support Vector Machines, k-Nearest Neighbour, Decision Trees, Deep Belief Networks, Recurrent Neural Networks and finally Convolutional Neural Networks. In this overview, they observe three problems: the relative lack of benchmark datasets, the non-uniformity of evaluation metrics leading to difficult comparison and finally the insufficient attention to algorithm efficiency. Moreover, they also note some trends in intrusion detection research, namely the study of hybrid models, the opportunities and challenges deep learning poses, the increased number of papers comparing different algorithms as well as their practicability, and the promise for new benchmark datasets.

Boutaba et al. [23] present a survey on the use of machine learning for different networking aspects, among which they include network security. Distinguishing between misuse-based, anomaly-based, Deep and Reinforcement Learning-based and hybrid intrusion detection, they consider 36 approaches, mainly evaluated on the KDDCup1999 and the NSL-KDD datasets. Overall, they observe a need for more recent datasets, the lack of anomaly-based detection systems in real implementations, insufficient real-time implementations and a general lack of systems fulfilling other specific requirements. Finally, they conclude with a perspective of ML for networking in general. Interestingly, they also note a need for real-world data instead of synthetic datasets as well as a need for standard evaluation metrics to accommodate easier comparison.

Berman et al. [29] present a survey portraying the application of deep learning in different cybersecurity problems, among which network intrusion detection is covered. In their overview of network intrusion detection, they consider recurrent neural networks, convolutional neural networks, deep neural networks, deep belief networks, autoencoders and other algorithms. They remark that, among all security problems, restricted Boltzmann machines, autoencoders and recurrent neural networks comprised the most popular approaches. Moreover, they state that the intrusion detection ability of a system depends strongly on both the number of classes as well as the kind of attack, in addition to being subject to the benign-malicious ratio in the training data. Finally, they consider the impact of false alarms as well as missed attacks, and draw attention to the fact that adversaries might try to actively circumvent protection measures.

Mahdavifar et al. [30] investigate the application of deep learning for a number of cybersecurity tasks, namely for malware detection, intrusion detection, phishing detection, spam


detection and website defacement detection. For intrusion detection, they mainly examine papers that employ generative deep learning technology, as opposed to discriminative or hybrid deep learning approaches. Based on the surveyed papers, the authors also present a generic framework showing the use of deep learning for cybersecurity. Moreover, they note the potential of semi-supervised learning,³ as the field presents a lot of unlabelled data. Besides some other learning-related remarks, they finally conclude by pointing out that deep learning should only be used in fields where complex non-linear models are required, if sufficient data is available.

Chaabouni et al. [31] present an overview of NIDS-based IoT security, considering IoT threats, public datasets and tools as well as current open-source NIDSs. Additionally, they also consider machine learning-based approaches which have been validated for general network intrusion detection datasets. One of their major remarks is the need for an IoT IDS dataset that is representative of the real world, with more attention to the semantic relation between detection performance and the learning process.

In [32], Ring et al. provide a substantial overview of network intrusion detection datasets, focussing on 15 properties in the analysis. Their work can serve as a guideline in selecting suitable public datasets for a specific goal, as they provide both a simplified overview as well as a large in-depth table and discussion. Moreover, they also consider other sources for data, namely data repositories and traffic generators. Finally, they draw some conclusions that are relevant for any other research involving NIDS datasets. For example, they discuss the unlikeliness of ever creating a perfect dataset, and recommend using multiple datasets for evaluation. Thus, while not providing insight regarding performance of specific algorithms or approaches, their investigation does yield pertinent knowledge.

Ferrag et al. [33] provide insight into deep learning techniques for cybersecurity purposes. They discuss 35 public datasets, classifying them into one of seven categories: network traffic-based, electrical network-based, internet traffic-based, virtual private network-based, android apps-based, IoT traffic-based and internet-connected devices-based datasets. Besides considering existing deep learning-based intrusion detection algorithms, they also train and evaluate several deep learning models for the CSE-CIC-IDS2018 and the Bot-IoT datasets. The detection performance for these deep neural network, recurrent neural network, convolutional neural network, restricted Boltzmann machine, deep belief network, deep Boltzmann machine and deep autoencoder models appears to be comparable.

In comparison to other surveys, in this paper we focus on a number of novel aspects. Firstly, we tackle the issue of comparing between different approaches, potentially across different datasets, which is a challenge in network intrusion detection research [23], [27], [28], [34]. We introduce two metrics that can be calculated using standard measures and allow for fairer and easier comparison between datasets. Moreover, we also study the application of machine learning-based network intrusion detection in constrained, real-time environments, comparing different techniques for different datasets. We further propose a workflow to generate features in such a setting, and run experiments to validate this workflow as well as to compare against the state-of-the-art.

IV. DATASETS
Whenever developing machine learning algorithms, it is paramount to have a dataset at one’s disposal. This dataset allows for training an algorithm and/or for evaluating the performance of that algorithm. In this section we discuss the publicly available datasets that are relevant for our research, as also listed in Table 1. These datasets usually contain both feature files as well as raw traffic files. While the feature files provide manually selected features and labelling, the raw traffic files provide the unlabelled binary traffic packets from which those features were extracted. Generally, the format of such raw traffic files is PCAP (Packet CAPture) or PCAPNG (PCAP Next Generation).

A. DARPA1998
One of the oldest important intrusion detection datasets is the 1998 DARPA Intrusion Detection Evaluation Dataset (DARPA1998) [35], [36] generated by MIT Lincoln Laboratory. For this dataset, the network traffic of over 50 air force bases during 4 months was examined and then recreated in a simulation. By including attacks corresponding to the DoS, R2L, U2R and Probe categories, the dataset simulates intrusion in the network. Concretely, 7 weeks of simulation were used to generate training data while the final 2 weeks provided test data. Interestingly, these test data contain some attack types (for example mailbomb, UDP Storm or httptunnel) that are not present in the training data. This allows for testing whether the NIDS can detect attacks it has not encountered before. This dataset has been criticized however, as John McHugh already did in 2000 [37]. While somewhat outdated in some regards (the traffic flows mentioned are much smaller than today), the critiques remain relevant. One critique is that not enough proof is given that testing on the synthesised data is representative of the result in real-world traffic. Other critiques of McHugh include the taxonomy and distribution of the attacks or the analysis of the results. Moreover, [38] detected simulation artifacts in multiple attributes of network traffic, such as too few different TTL values, high regularity for TCP SYN packets or source address predictability. Additionally, [39] indicate even more issues in the DARPA1998 dataset. Besides once again noting that the synthetic origin of the data can prove to be a problem, they argue that tcpdump (the traffic collector used) might drop packets in heavy traffic. As no examination of dropped packets was undertaken, it is impossible to account for these packets. Furthermore, they point out the lack of

³ In semi-supervised learning, the training set comprises a small subset of labelled samples and a large subset of unlabelled samples.
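
The raw traffic files mentioned at the start of Sect. IV can be read with a few lines of Python. The sketch below assumes the classic PCAP layout (a 24-byte global header followed by 16-byte record headers with microsecond timestamps) and does not handle PCAPNG or the nanosecond PCAP variant. The fixed-length byte vector produced by `to_feature_vector` is a hypothetical illustration of a raw-traffic feature and should not be read as the workflow of Sect. VII; both function names are assumptions introduced here.

```python
import struct


def read_pcap_packets(data: bytes):
    """Yield (timestamp_seconds, raw_bytes) for each packet in a classic PCAP byte string."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":    # file written on a little-endian machine
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":  # file written on a big-endian machine
        endian = ">"
    else:
        raise ValueError("unsupported capture format (only classic PCAP is handled)")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: seconds, microseconds, captured length, original length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        yield ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]
        offset += incl_len


def to_feature_vector(packet: bytes, length: int = 64):
    """Truncate or zero-pad raw packet bytes to a fixed length, scaled to [0, 1]."""
    padded = packet[:length].ljust(length, b"\x00")
    return [b / 255 for b in padded]
```

Because no protocol parsing or flow statistics are involved, such byte-level features are cheap to obtain, which is the property that makes raw traffic attractive for real-time use.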


exact definitions of the attacks. Not every probing action of a host can immediately be seen as an attack, for example. However, in DARPA1998, there are no clear specifications of when for example a probe becomes an attack [39]. As the first real intrusion detection dataset, DARPA1998 is still often used or referred to, in combination with its successor KDDCup1999.

B. KDDCup 1999
The KDDCup1999 dataset (Knowledge Discovery and Data Mining Tools Competition) [40], or KDD99 for short, was created by taking a version of DARPA1998. This version was then used as the dataset for the KDD99 competition. KDD99 is a very important benchmark in the intrusion detection community, still being used for the evaluation of algorithms in 2019 [41]. This highlights one of the most important critiques on KDD99 nowadays: it is severely outdated [27], [34]. Attacks that were relevant 20 years ago might not be relevant anymore. Moreover, many new attacks have been developed since (for example, DDoS attacks are not featured at all in KDD99). This critique of course also holds for DARPA1998, from which the KDD99 data is derived. Likewise, as KDD99 is based on DARPA1998, some of the critiques of DARPA1998 also hold for KDD99. However, the critiques of [38] are not applicable to KDD99, as the features used in KDD99 remain unaffected [39]. The unbalanced character of the attacks becomes apparent in [42]. While dividing the dataset into 10 separate parts, the authors noticed that 4 parts only contained smurf attacks, and that one part only contained neptune attacks (both DoS attacks). The presence of so many issues led to the development of a better version of KDD99, namely NSL-KDD.

C. NSL-KDD
In [39], various problems of KDD99 are discussed by Tavallaee et al. Firstly, they note that 78% of the records in the training set, as well as 75% of the records in the test set are duplicated. They argue that this causes a model to become biased towards the more frequent records in the dataset. Secondly, when they trained 21 ML models on the data, every model classified 98% of the training and 86% of the test data correctly. Tavallaee et al. argue that this signifies that the dataset is fairly easy to classify. For this reason, it is hard to actually compare different IDSs on the same dataset, as they all obtain high scores. Therefore, the authors in [39] present a new dataset, NSL-KDD. To create this new dataset, they first removed redundant records from KDD99. To address the ease of classification, they also created a specific, difficult subset of the original dataset. By including mostly packets that were poorly classified with the test classifiers, this subset represents the packets that are hard to classify. Often, it is denoted as NSL-KDD21 or NSL-KDD-21, since no packets that were correctly classified by all 21 classifiers are included. The full NSL-KDD dataset (containing all packets) is also known as NSL-KDD+ or NSL-KDDC+.

It is important to note that, since it is still based on DARPA1998, the network traffic featured in NSL-KDD remains outdated and less relevant for modern networks. NSL-KDD is the final iteration of the datasets based on DARPA1998.

D. ISCX2012
In [43], the improved generation of intrusion detection datasets is investigated. The authors aim to dynamically generate datasets from specific profiles that simulate agents’ traffic behaviours. This way, it is possible to regenerate certain configurations or to create different scenarios. ISCX2012 is the result of generating such a dataset by using α- and β-profiles in a testbed of physical network devices and hosts. α-profiles represent human-designed attacks while β-profiles represent the mathematically modelled behaviour of the users of the Information Security Centre of Excellence (ISCX). The α-profiles comprised 4 large attack scenarios that were executed during the generation, namely infiltrating the network from the inside, HTTP DoS attacks using Slowloris, DDoS attacks using an IRC Botnet and brute force SSH attacks.

E. UNSW-NB15
Motivated by the growing deprecation of KDD99 and NSL-KDD, University of New South Wales researchers in [44] created a new dataset striving to improve on the shortcomings of those benchmark datasets. By generating both normal as well as malicious traffic using the IXIA PerfectStorm tool⁴ they created the UNSW-NB15 dataset. They used the Bro [45]⁵ and Argus⁶ systems to extract features from the packet capture files and additionally calculated more in-depth features with their own algorithms. The nearly 2 million packet flows present normal traffic and malicious traffic corresponding to nine attack classes: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms. The same authors statistically analyze the dataset in [46], determining that, when splitting the dataset into training and test sets, both sets are similarly distributed and share statistical properties. Moreover, they indicate that the features are correlated and that UNSW-NB15 is more complex than KDD99 by comparing classification results on both datasets.

Results in both other work [47] as well as our experiments appear to suggest that UNSW-NB15 also is more complex to classify than both ISCX2012 and CICIDS2017, when comparing performances of the same approach for those datasets.

F. CICIDS2017
In [48], the authors list eleven characteristics that are essential for a valid intrusion detection dataset: attack diversity, anonymity, available protocols, complete capture, complete

⁴ https://www.ixiacom.com/products/perfectstorm
⁵ Formerly Bro IDS, now known as Zeek: https://zeek.org/
⁶ https://openargus.org/
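
The two construction steps described for NSL-KDD in Sect. IV-C (removing redundant records, then keeping only records that are not classified correctly by every reference classifier) can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure of [39]: records are assumed to be hashable `(features, label)` tuples, classifiers are assumed to be plain callables, and the names `deduplicate` and `hard_subset` are hypothetical.

```python
def deduplicate(records):
    """Drop exact duplicate records while keeping first-seen order."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


def hard_subset(records, classifiers):
    """Keep records that at least one reference classifier labels incorrectly."""
    return [
        (features, label)
        for features, label in records
        if not all(clf(features) == label for clf in classifiers)
    ]
```

With 21 reference classifiers, `hard_subset` would exclude exactly those records that all 21 classify correctly, which is the property attributed to the NSL-KDD-21 subset in the text.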


TABLE 1. Different network intrusion detection datasets.

interaction, complete network configuration, complete traffic, feature set heterogeneity, labelling and metadata. Moreover, they created a modern dataset complying with those characteristics, the Canadian Institute for Cybersecurity Intrusion Detection Evaluation Dataset (CICIDS2017). By setting up 10 benign user agents that generate realistic traffic data, and simultaneously attacking the network from 4 attack hosts, they simulated a dataset spanning one workweek. For each day a number of attacks were applied, resulting in attack data for brute force attacks, heartbleed attacks, botnets, DoS attacks, DDoS attacks, web attacks and infiltration attacks. Globally, this results in 15 attack classes corresponding to these 6 categories.

While this course of action provides a modern and diverse dataset, there are still some shortcomings. Researchers in [49] identify 4 issues for CICIDS2017. Firstly, they argue that, as the dataset is divided over 8 files, it is difficult to work with. Secondly, the dataset contains a huge amount of data, with 3,119,345 instances across those files. Thirdly, 288,805 instances in CICIDS2017 miss either a class label or other information. Finally, the dataset has serious issues with class imbalance, with the heartbleed attack containing as few as 11 instances while there are over 2 million benign instances. The first three issues can be dealt with easily, by respectively aggregating the dataset files, sampling the dataset and removing instances with missing information. For the class imbalance, the authors suggest combining minority attack classes belonging to the same attack category. The three different web attacks can for example be combined into one web attack class. While this provides relief for some attack classes, it still does not help in the case of the infiltration class with only 36 instances.

In this section, we discussed the most-used datasets in the field that are relevant for this paper. Among these datasets, UNSW-NB15 and CICIDS2017 are most recommended for use in [32], as they contain a wide range of modern attack scenarios. ISCX2012 is also suitable, but features somewhat older network traffic, which has to be kept in mind.

V. EVALUATION METHODS
If different intrusion detection algorithms and implementations are to be compared, it is important this happens on a common ground. Besides an understanding of the different datasets, this also requires evaluation metrics that are usable for every algorithm with each dataset. In the literature, we observe that each author makes their own choice from the plethora of metrics that can be used to evaluate NIDSs: accuracy, precision, detection rate, false positive rate, F-measure, etc. Typically, they are used in two scenarios: the binary scenario and the multiclass (n-ary) scenario. While metrics such as the precision are very straightforward to calculate in a binary scenario, the multiclass scenario requires metric values for the different classes to be aggregated. In practice, this can be done using macro averaging (denoted with M), micro averaging (denoted with µ) and weighted averaging (denoted with w). Macro averaging considers every class equally, while both micro averaging and weighted averaging favor large classes. We present a more in-depth overview and formulas of these evaluation metrics in appendices A-A and A-B.

As will be discussed in detail in Sect. VI-D, one issue in the field of network intrusion detection is that many authors report their results in various ways, complicating comparison between different works. Therefore, we propose two metrics that reliably evaluate the overall performance of an intrusion detection system. These metrics, the detection score and identification score, are ideally used in tandem.

A. DETECTION SCORE
The main task of any intrusion detection system is to reliably detect intrusion. In the most basic sense, this comes down to triggering an alarm whenever an attack is detected. In the case of network intrusion detection with a normal class and an attack class, we define the detection score (DS) as the F-measure with the attacks as positive class: DS = F1. This is the case for anomaly-based approaches, as these generally identify attacks in normal traffic without elaborating into different attack classes. However, in the case of multiple attack classes, this calculation is no longer straightforward. Rather than aggregating individual F-measures per class, we propose a method to map the multiclass confusion matrix to a binary class confusion matrix, which allows for the use of the binary F-measure as DS. This mapping is demonstrated in Fig. 1, with more information about confusion matrices in appendix A-C. Specific to this approach is that any predicted attack that corresponds with any true attack is considered


a true positive, even if the attack classes differ. We reason that any detected legitimate attack is beneficial, even if the attack is initially detected for the wrong reasons. Thus, the DS allows for assessing the base performance of an NIDS, namely its ability to detect attacks. In our implementation, we consider the F-measure to account for both the precision and detection rate of an NIDS. The detection rate outlines the ability to not miss any attacks, while the precision serves to limit false alarms. Both properties are required for a real-life NIDS, as it should neither miss attacks nor produce too many false alarms. While it is possible to use either metric instead of the F-measure in the calculation of the DS, we argue that both should be kept in mind concurrently. In the scenario that either precision or recall is more important than the other, the Fβ-measure can be used instead. We will only consider β = 1, where precision and recall are equally important.

In order to evaluate the actual identification of (attack) classes, we also propose the identification score.

FIGURE 1. Mapping of attack classes to binary.

B. IDENTIFICATION SCORE
The second task of an NIDS is the classification of detected attacks into different attack classes, as correctly identifying attacks helps in formulating a response. However, network intrusion detection datasets suffer from class imbalance: minority classes are underrepresented in the dataset, while majority classes are overrepresented. On the one hand, the results for the majority classes will dictate the overall outcome for weighted averages. This provides a reliable overview of the overall performance, but might mask poor performance for a minority class. On the other hand, as macro averages do consider minority classes, they can mask a good overall performance in the case of poorly detected minority classes. For network intrusion detection, both the overall performance as well as the performance per class are important. Therefore, only using one instead of both aggregation approaches could unreliably display performance. As also discussed in Sect. VI-D, most other works only report weighted averages, forgoing the macro averages. Therefore, we propose the identification score (IS), which we define as the harmonic mean of the weighted average F-measure and the macro average F-measure:

$$\mathrm{IS} = H(F1_w, F1_M) = \frac{2 \cdot F1_w \cdot F1_M}{F1_w + F1_M} \tag{1}$$

Since the IS is a harmonic mean, it will penalize small values of either F1w or F1M, only resulting in a high value if both inputs are also high. It will lie between 0 and 1, but a value of for example 0.9 already indicates that at best, both averages F1w and F1M have a value of 0.9. However, for unbalanced classes, one average may be significantly higher than 0.9. In such a case, the other value must be significantly lower than 0.9. Smaller values for the IS can therefore indicate that either one average is very small, or that both values are equally small. Both situations are undesirable.

The IS is not the only metric that aims to handle class imbalance. For example, Receiver Operating Characteristic (ROC) or Precision-Recall (PR) curves with an Area Under Curve (AUC) value [50] use thresholding to evaluate the performance on a dataset. According to [51], the Matthews Correlation Coefficient (MCC) is the least biased metric for binary classification that takes into account both classification successes and errors. In principle, training a model implies minimizing the misclassification cost [52]. For IDSs in particular, this cost encompasses not only the impact of intrusion, but also operational costs and the hostility of the environment [53]. It is however hard to find this cost, as it depends on uncertain parameters. When comparing against such other evaluation approaches, there is a clear difference with the IS. First, unlike for ROC or PR, no thresholding is required in the calculation. This is valuable in a setting where not all data to apply thresholding are present, for example when comparing results of various authors that did not provide their models. Moreover, comparing many different curves in one or more graphs can quickly become cluttered, complicating the comparison process. MCC similarly depends heavily on a confusion matrix to be computed, which is not always provided by authors. It also incorporates the number of true negatives (normal traffic that does not raise alarms), which is not really relevant to the performance of network intrusion detection systems. The advantage of the IS, when compared to other metrics, is that it can easily be deduced when the F-measures for each individual class, as well as the class weights, are known. These values are among the most reported evaluation metrics.7 Moreover, as mentioned in Sect. V-A, the F-measure also accounts for both the false alarms (in the precision) and missed attacks (in the recall).

VI. STATE-OF-THE-ART ALGORITHMS
Many different methods are used to perform network intrusion detection on specialized datasets, of which some datasets are discussed in Sect. IV. This paper however does not strive to exhaustively list every network intrusion detection technique, as that would be nearly impossible, but rather to provide an overview of the current state-of-the-art for ML approaches. This overview is further divided into two major segments: traditional machine learning and deep learning. While both are important and relevant, the main focus will

7 The overall accuracy similarly is very often reported, but far less insightful for unbalanced classification.
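The detection score and identification score defined in this section can be illustrated with a short Python sketch. It assumes a confusion matrix stored as a list of rows (true class) by columns (predicted class) with the normal class at a known index; the function names and the toy matrix are hypothetical and not taken from the paper. The binary mapping follows the description given for Fig. 1, where any predicted attack on any true attack counts as a true positive, and class weights are taken as each class's share of the true samples.

```python
def f1(tp, fp, fn):
    """Binary F-measure from counts; taken as 0 when tp, fp and fn are all 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def detection_score(cm, normal=0):
    """DS: collapse all attack classes into one positive class, then take F1.

    cm[i][j] counts samples of true class i that were predicted as class j.
    """
    attacks = [k for k in range(len(cm)) if k != normal]
    tp = sum(cm[i][j] for i in attacks for j in attacks)  # attack flagged as any attack
    fn = sum(cm[i][normal] for i in attacks)              # attack missed
    fp = sum(cm[normal][j] for j in attacks)              # false alarm
    return f1(tp, fp, fn)


def identification_score(cm):
    """IS: harmonic mean of the weighted-average and macro-average F-measure."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    per_class, weights = [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[i][k] for i in range(n)) - tp
        per_class.append(f1(tp, fp, fn))
        weights.append(sum(cm[k]) / total)  # share of true samples in class k
    f1_macro = sum(per_class) / n
    f1_weighted = sum(w * f for w, f in zip(weights, per_class))
    if f1_macro + f1_weighted == 0:
        return 0.0
    return 2 * f1_weighted * f1_macro / (f1_weighted + f1_macro)


# Toy 3-class matrix: class 0 = normal, classes 1 and 2 = attacks (made-up counts).
cm = [
    [90, 5, 5],
    [2, 6, 2],
    [1, 3, 1],
]
ds = detection_score(cm)
is_ = identification_score(cm)
```

In this toy matrix the two attack classes are frequently confused with each other. The DS does not penalize that confusion, because both count as detected attacks, whereas the IS is pulled down by the low per-class F-measures of the minority classes.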

lie on deep learning. Tables 2 and 3 give the performance of the algorithms that are discussed in the remainder of this section, for the datasets introduced in Sect. IV. The tables report performance following two methods: the first provides the accuracy and weighted averages for precision, detection rate (DR), false positive rate (FPR) and F-measure as originally reported, if applicable. Secondly, whenever possible, we calculated the performance following our proposed metrics: DS and IS. In order to visualize the effect F-measures have on the IS, we also provide their weighted and macro averages. Before discussing different machine learning techniques however, we will first consider what features are used in network intrusion detection problems in the next section.

A. FEATURES
Feature selection and extraction is essential for any machine learning task. For network intrusion detection, proposed techniques can roughly be divided into two groups relating to feature usage: one group starts from provided features, while the other extracts its own features. The first group uses dataset-specific features that were extracted at the dataset’s conception and manually selected by experts. Most methods using these features simply apply normalization for numerical features and one-hot encoding for the categorical features. For example, the 41 features provided in KDD99 are usually preprocessed into 122-dimensional input vectors. If the preprocessing for a specific study is very similar to this process, we will not discuss it here. However, the second group of research encompasses all other approaches that significantly modify the provided features or extract completely new features. If for example a study uses raw traffic header bytes as features, that will be discussed here.

B. TRADITIONAL MACHINE LEARNING METHODS
In this section, we will discuss traditional ML-based NIDSs, namely Support Vector Machines, Decision Trees and Random Forests, Extreme Learning Machines, Restricted Boltzmann Machines and some other approaches. We strive to give a very general overview of what is being done.

1) SUPPORT VECTOR MACHINES
Even to date, one of the most prevalent ML-based intrusion detection techniques is the Support Vector Machine (SVM). SVMs are classifiers that work by finding the hyperplane separating classes with a maximal margin [86].

In 2003, researchers explored the use of Robust SVMs to classify process and session data from the DARPA1998 dataset [87]. Starting from the observation that supposedly clean datasets usually contain some noise consisting of malicious traffic, they strove to design a robust classifier trainable on noisy data. By modifying the constraints and objective function of the SVM, they obtained a recall of 100% with an FPR of 8%, where the reference SVM was unable to even reach a recall of 100% without having the FPR reach 100% as well. The authors in [63] describe another way to deal with dataset issues, proposing a modified K-means approach to generate a high-quality dataset from KDD99. By using a distance threshold, they add a cluster centroid to the set whenever a sample's distance to every other centroid exceeds this threshold. Furthermore, they construct a multi-level classifier, detecting a traffic class at each level. Notably, they use an Extreme Learning Machine (ELM) rather than an SVM to detect Probe attacks, as they argue an ELM is better suited for that task. With every other classifier being an SVM, they achieved state-of-the-art results. In [81], the authors use SVM and Multilayer Perceptron classifiers on UNSW-NB15, while employing the feature dimensionality reduction approaches Principal Component Analysis (PCA) and a chi-squared test. They achieve both an accuracy and an F-measure above 90%, surpassing other work on the same dataset.

In [88], the authors strive to implement a stable and accurate incremental SVM learning scheme. This allows for manageable retraining in real-time to counteract a growing training set and training time. By using reserved sets with weighted non-support vectors, they can retrain the classifier without having to use the entire training set. Moreover, they also modify the RBF kernel to further reduce training and testing time and increase reliability.

2) DECISION TREES AND RANDOM FORESTS
Decision trees give structure to a set of rules, derived from and using the input features for, among other things, classification [89]. Random forests (RF) then combine many different decision trees for the same problem and take an ensemble of their results to achieve a more robust classifier [90]. In [54], the RF approach is used to construct a misuse-based classifier, an anomaly detector and a hybrid approach evaluated on KDD99. For the misuse-based approach, an RF is used as a classifier to differentiate between normal traffic and intrusion, resulting in a reported error rate of 7.07%. By using the proximity of samples of traffic in the RF, outliers can be detected for anomaly detection with a reported DR of 65% and FPR of 1%. Finally, by running all traffic unclear to the misuse classifier through the anomaly detector, the authors propose a hybrid IDS approach with an overall DR of 94.7%.

As NIDS datasets suffer from class imbalance, Wei Zong et al. [83] propose a two-stage approach for network intrusion detection. After over-sampling the minority classes and down-sampling the majority classes, they first classify input data into a minority class or other. This other class is then classified into the different majority classes in the second step. Both classifiers are RF, but can be exchanged for other techniques. After testing on UNSW-NB15, they achieve a result comparable to the best result in the analysis of that dataset [46].

Combining ensemble methods and DL, the authors in [65] aim to achieve high detection performance with a low training time. They apply both Multi-Grained Traversing as well as Cascade Forest in an effort to surpass the work they compare against, which succeeds.
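The control flow of the two-stage scheme described for [83] can be sketched independently of the classifier type. The sketch below assumes two already-trained stage models exposed as plain callables; the names `TwoStageClassifier`, `stage_one`, `stage_two`, the `OTHER` sentinel and the threshold rules standing in for random forests are all hypothetical. It only illustrates how the stages are chained, not the authors' implementation, and the resampling of the training data is omitted.

```python
OTHER = "other"  # hypothetical sentinel meaning "not a minority class"


class TwoStageClassifier:
    """Stage 1 returns a minority label or OTHER; stage 2 resolves OTHER
    into one of the majority classes."""

    def __init__(self, stage_one, stage_two):
        self.stage_one = stage_one  # callable: sample -> minority label or OTHER
        self.stage_two = stage_two  # callable: sample -> majority label

    def predict(self, samples):
        labels = []
        for x in samples:
            first = self.stage_one(x)
            # Only samples that stage 1 could not assign reach stage 2.
            labels.append(self.stage_two(x) if first == OTHER else first)
        return labels


# Toy stand-ins for the two trained models (both stages are RF in [83]).
def stage_one(x):
    return "worms" if x["bytes"] > 9000 else OTHER


def stage_two(x):
    return "dos" if x["rate"] > 100 else "normal"


clf = TwoStageClassifier(stage_one, stage_two)
predictions = clf.predict([
    {"bytes": 9500, "rate": 5},
    {"bytes": 200, "rate": 500},
    {"bytes": 200, "rate": 3},
])
```

Because the stages are passed in as callables, either one could be swapped for another technique without touching the chaining logic, in line with the remark that the RF classifiers can be exchanged.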


TABLE 2. State-of-the-art results for ML-based NIDSs. Values in italics are originally reported rather than recalculated. F1w and F1M in binary scenarios
are only relevant for IS calculation. The best value for each calculated metric is in bold.

3) EXTREME LEARNING MACHINES
Extreme Learning Machines (ELM) are single hidden layer feedforward neural networks that use randomly generated input weights and calculated output weights. They excel in their high training speed while providing adequate generalization [91]. Yuanlong Yu et al. [61] aim to improve


TABLE 3. Continued: State-of-the-art results for ML-based NIDSs.

unbalanced multiclass classification by designing a cascaded scheme of binary ELM classifiers, in which each ELM is trained on KDD99 to detect one class. While retaining high performance on the majority classes, they succeed in improving the classification of the minority classes.

Yi Yu et al. [67] propose a Dual Adaptive Regularized Online Sequential ELM (DA-ROS-ELM) to allow for real-time training of the output weights. They report a high accuracy while maintaining a high training speed.

4) RESTRICTED BOLTZMANN MACHINES
Restricted Boltzmann Machines (RBM) are generative models consisting of a visible and a hidden layer [92]. They can be used for both unsupervised learning as well as classification. The authors in [80] implemented an RBM that is trained and evaluated on a balanced subset of ISCX2012, achieving an accuracy and a DR of about 88%.

In [93], a Discriminative RBM is used to perform classification with an accuracy of about 84.65% on KDD99.

5) OTHER APPROACHES
Traditional ML is not limited to SVMs, decision trees and neural networks. For example, Genetic Algorithms (GA) are usually used in combination with other approaches to improve their performance: in [41], [94] genetic algorithms are used to improve SVM performance, while the authors of [66] improve the functionality of an MLP.

Rather than using DL, in [73] Manuel Lopez-Martin et al. employ shallow neural networks for intrusion detection. They apply three different kernel approximation methods to project the input features to a higher dimension, in order to be able to linearly separate the classes in the neural network.

Other, older techniques include Hidden Markov Models [95], k-Nearest Neighbours [96] and clustering [97].

C. DEEP LEARNING METHODS
DL-based NIDSs, as opposed to regular ML approaches, use a multitude of subsequent hidden network layers to perform complex calculations. These networks can range from regular multilayer perceptrons to convolutional neural networks and (auto)encoders. Each of these architectures has its own strengths and weaknesses regarding the detection of intrusions in network data. In the following sections we discuss the different architectures and relevant techniques proposed in research.

1) MLP
Multilayer Perceptrons (MLP) can be regarded as a baseline neural network structure. While many more complex or elaborate techniques are used, MLPs remain relevant and are used in NIDSs. Liu Zhiqiang et al. [82] use a 10-layer MLP for binary classification of the traffic of UNSW-NB15. With an accuracy of 99.5% and FPR of 0.47%, they obtain very high results. The authors of [55] achieved similarly high results by using a 4-layer MLP on KDD99, achieving an accuracy of 99.08%. However, they do not report whether this concerns binary or multiclass classification.

Chunlin Lu et al. [66] apply a genetic algorithm along with a Dempster-Shafer (DS) decision fusion in an effort to better leverage the potential of back-propagated neural networks. After removing 12 features with little influence from the KDD99 dataset, they group the remaining features in three subsets that are used to train a neural network with one hidden layer. By fusing the outputs of these three networks with


the DS-model, the resulting GABP-model (Genetic Algorithm BackPropagation) achieves state-of-the-art results.

In [47] an extensive study is conducted, developing a DNN that can be used for NIDS and HIDS and is tested on KDD99, NSL-KDD, UNSW-NB15, WSN-DS [98], CICIDS2017 and Kyoto [99]. By comparing different configurations of a DNN for these different datasets and against other ML approaches, the authors provide an insightful overview of performance. It must be noted however that not the entirety of CICIDS2017 has been included in the research, as classes such as Infiltration appear to be excluded.

MLPs are also often used as baseline comparison for a new technology, for example in [67], [69].

2) CNN
Starting from the inception of AlexNet [100] in 2012, Convolutional Neural Networks (CNN) have been used in diverse settings for diverse purposes. Some important domains include image recognition [101], object detection [102] and Natural Language Processing [103]. Their success in these domains can be partially attributed to their ability to interpret spatial characteristics. Therefore, CNNs have also been applied in intrusion detection applications in recent years.

Generally, convolution layers for the processing of traffic data are 1-dimensional or 2-dimensional. In [71], 1D convolutional layers are used to perform binary classification on data from NSL-KDD, MAWILab [104] and Kyoto [99]. Kwon et al. test three different configurations using either one, two or three 1D convolutional layers as well as max pooling layers and fully connected layers. From their experiments, they conclude that the shallow model with only one convolution layer consistently outperforms the other models. With F-measures of about 76% for NSL-KDDTest+, about 63% for NSL-KDDTest21 and a little over 60% for MAWILab, it does not outperform other algorithms for the same datasets. For Kyoto, their results fluctuate heavily and are hard to draw conclusions from. Another example of 1D convolution can be found in [76], where the authors use a custom 1D CNN to classify the entirety of NSL-KDD. With high FPRs and relatively low accuracies and DRs, the system however fails to distinguish itself from similar technology.

Besides 1D, CNNs can also use 2D convolution on input data. This means that features are turned into 2D images that are classified. How these images are generated varies between different implementations. In [105], the 41 NSL-KDD features are turned into binary vectors. By using 10-bit vectors for the discretized numerical features and n-bit vectors for categorical features, where n is the number of categories, an 8-bit grayscale 8 × 8 image can be created. Using ResNet50 and GoogLeNet, they reach accuracies of 80% on both NSL-KDD+ as well as NSL-KDD21. However, as they obtain false positive rates of respectively 99.81% and 100%, their results for NSL-KDD21 are not very useful. For this dataset, their implementations simply report each data sample as an attack. Researchers in [70] argue that using this grayscale encoding provides numerical features with 1.25 pixels (10 bits, 8 bits per pixel), while 2-bit categorical features only account for 0.25 pixels in the image. Moreover, they also argue that discretizing numerical features in 10 values provides too rough an estimation of their actual value. Therefore, they propose RGB-like encoding where numerical features range between 1 and 2^24 − 1. The categorical features are one-hot encoded by using 0x000000 and 0xffffff instead of 0 and 1. In their experiments, they report a considerable improvement when compared to the grayscale encoding technique, obtaining F-measures of about 82% on NSL-KDD+, about 75% on NSL-KDD21, about 86% on UNSW-NB15 and about 82% on CICIDS2017. We are unable to provide calculated metrics for more insight, as insufficient information is given in their article.

Another approach to using KDD99 features is given in [58], where the authors use VGGNet to classify 11 × 11 images derived from those features. This approach results in an accuracy of 98.34%, on par with other techniques proposed on the same dataset.

The authors of [85] take the image generation one step further by no longer using precalculated features, but by instead using fragments of the raw packets themselves. In their experiments, they evaluate three different methods of feature generation by using packet headers, payloads or a combination of both. Header features for example are a string of the first 50 bytes of 5 subsequent packets in a network flow, separated by a 0x00, resulting in a 16 × 16 grayscale image. Correspondingly, payload features contain the 51st to the 100th byte of 5 subsequent packets, generating different 16 × 16 images. Unlike the header and payload features, the combination generates 22 × 22 images by concatenating the first 96 bytes of 5 subsequent packets and adding 0x00s accordingly. Note that packets belong to a flow if they share their source and destination addresses, their source and destination ports and the protocol used. Besides their novel feature generation, they also apply a specific Parallel Cross-Convolutional neural Network (PCCN) to better classify the highly imbalanced classes of CICIDS2017. While reporting results for each attack class, they only differentiate between those attacks as they do not include normal traffic. Therefore, it is impossible to draw definite conclusions about the algorithm’s ability to detect attacks in normal network traffic. We have implemented this algorithm for three datasets with normal traffic included, and present the results in Sect. VII-B.

In [106], researchers examine the possibilities of transfer learning for intrusion detection. Concretely, the authors first train a base CNN on UNSW-NB15, which then remains fixed to train a second CNN that is added onto the first. The aggregate of both CNNs is then trained on NSL-KDD and tested for NSL-KDD+ and NSL-KDD21.

3) RNN, GRU AND LSTM
Recurrent Neural Networks (RNN), Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM) have successfully been applied in areas such as Natural Language


Processing [107], traffic forecasting [108] or remaining use- features with a J4.8 decision tree. While using all features,
ful life estimation [109]. LSTM [110] and GRU [111] are an accuracy of 93.82% is obtained. Important to note how-
alternative gate units that can replace the standard gate unit ever, are the very low DR values for attacks U2R and R2L in
of RNN. They share the ability to take the temporal aspect this model. Reducing the number of features to respectively
of data into account: data samples are processed based on 8 or 4 then results in accuracies of 93.69% and 93.72%.
preceding samples [112]. Consequently, their use for intru- Although the overall accuracy is maintained, the detection of
sion detection has also been investigated. More specifically, minority classes U2R and R2L is increasingly inhibited. Con-
as they are able to inspect time series of variable length, sequently, even if it is proven that KDD99 can be classified
they might be able to identify attacks based on their rela- using a very small number of features, the performance for
tion to the previously transmitted packets. One very early minority classes needs to be considered.
example of this is [113], where a RNN analyzes audits in Another research focusing on the use of LSTMs for
a HIDS in 1992. For NIDSs, early uses of RNN include KDD99 is [118], where the authors use a small subset of the
[56], [59] where specific RNN models are used to classify dataset consisting of 1000 normal samples and 300 samples
the KDD99 dataset. [56] uses, among others, Jordan/Elman for each attack except the very underrepresented U2R. This
recurrent networks [114], [115] to train on on a very small approach results in acceptable detection of normal and DoS
subset of KDD99. With a 91.91% overall detection rate, attacks, but only a limited number of the other attacks is
a corresponding false alarm rate of 8.08% and comparable detected. The overall DR of 98.96% is comparable to the
results for MLP and PCA-based approaches on binary classi- state-of-the-art, but the FPR of 7.78% remains too high.
fication, the algorithm does not quite meet the requirements
for real-life implementation. 4) CNN + RNN
The authors of [59] use a partially connected RNN archi- With CNNs exploring the spatial characteristics and RNNs
tecture, employing KDD99 for training and testing. They considering the temporal aspect of traffic data, a combina-
report a DR of 94.1% while maintaining a FPR of 0.38% tion of both methods has great potential. In [57], CNNs are
on multiclass classification, outperforming similar research used to generate feature maps from the base 41 features of
at the time. More recently, Chuanlong et al. employed a KDD99. After a maxpooling operation, these new features
RNN to perform classification on the NSL-KDD dataset [68]. are then presented as input features to the recurrent model.
By testing out a number of different network configurations, With either a RNN, GRU or LSTM as recurrent network,
they obtained to optimal accuracies. Concretely, they show the ideal configuration is sought. For binary classification,
binary classification accuracies of 83.28% for NSL-KDD+ and 68.55% for KDD21, and multiclass accuracies of 81.29% for NSL-KDD+ and 64.67% for NSL-KDD21. As the results for NSL-KDD+ hover only slightly above 80%, it is clear that there is still room for improvement for this approach.

The basic RNN is not the only recurrent architecture used in intrusion detection research; the more advanced LSTM option has also shown its potential. For example, in [116] an LSTM Sequence to Sequence (LSTM-Seq2Seq) model is used for NSL-KDD and Kyoto-Honeypot data. This model consists of an encoder that turns an input sequence into a fixed-length context vector, and a decoder that analyzes the context vector to give the probability of the input sequence being either benign or an attack. As the deep version of the model, featuring multiple LSTM layers, has an F-measure of over 99% for NSL-KDD and 100.0% for Kyoto, it appears to be very effective. It greatly outperforms the other algorithms tested in the study, namely a Fully Connected Network and a Variational Autoencoder. However, no confusion matrices or source code are provided to analyze these results more deeply.

Another example of the potential of LSTM is given in [117], once again for NSL-KDD. Among a large selection of ML algorithms, LSTM surpasses its competition by obtaining accuracies of 89% on KDDTest+ and 83% on KDDTest21. The authors of [60] experimented with many different configuration settings for the use of LSTM for KDD99, testing both the use of all features as well as the possibility of reducing the number of

the CNN with 2 LSTM layers, an accuracy of 99.7% and F-measure of 99.8% outperforms other solutions. In multiclass classification, the CNN with 3 LSTM layers obtains the highest accuracy of 98.7%. It is however not possible to give any weighted averages, as the weight of each class is not clearly specified. More extensive research is conducted in [79], in which HierArchical Spatial-Temporal features are used to create an IDS (HAST-IDS). HAST-I and HAST-II are defined as configurations that process either an entire flow or a limited number of packets in a flow, and are used to detect intrusions in DARPA1998 and ISCX2012. HAST-I uses a CNN to classify m × n flow images created by concatenating the first n one-hot encoded m-dimensional bytes of a flow. Similarly, HAST-II generates r × p × q images by concatenating the first q one-hot encoded p-dimensional bytes of a packet and repeating this step for r packets. However, after applying a CNN to generate features, HAST-II uses an LSTM to temporally analyze the sequence of packets, as opposed to only using a CNN to analyze the entire flow. After determining the ideal values for n, q and r, the authors in [79] achieve strong results for accuracy, DR and FPR. The results for DARPA1998 are not discussed here for two reasons: HAST-IDS would be the only relevant investigation to use the dataset, and HAST-II does not use DARPA1998 as r is 1 or 2 for over 63% of flows. For ISCX2012, HAST-I achieves an accuracy of 99.69%, a DR of 96.91% and an FPR of 0.22%. Similarly, the results for HAST-II show an equivalent detection performance with a near negligible FPR of 0.02%.
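To make the image construction used by HAST-I and HAST-II more concrete, the following minimal sketch builds such images from raw bytes. It is not the implementation of [79]: the function names are hypothetical, the one-hot dimension m = p = 256 (one position per possible byte value) and the zero-padding of short flows or missing packets are assumptions made only for illustration.

```python
def flow_image(flow_bytes, n, m=256):
    """Build an m x n image from the first n bytes of a flow (HAST-I style).

    Column j holds the one-hot encoding of byte j. Flows shorter than n
    bytes leave the remaining columns all-zero (an assumption).
    """
    image = [[0] * n for _ in range(m)]
    for col, value in enumerate(flow_bytes[:n]):
        image[value][col] = 1
    return image


def packet_images(packets, r, q, p=256):
    """Build r images of size p x q, one per packet (HAST-II style).

    Only the first q bytes of each of the first r packets are used.
    Missing packets are replaced by all-zero images (an assumption).
    """
    images = [flow_image(packet, q, p) for packet in packets[:r]]
    while len(images) < r:
        images.append([[0] * q for _ in range(p)])
    return images


if __name__ == "__main__":
    img = flow_image(bytes([0x45, 0x00, 0x00, 0x3C]), n=6)
    print(len(img), len(img[0]))          # rows (m) and columns (n)
    seq = packet_images([b"\x45\x00", b"\x16"], r=3, q=2)
    print(len(seq), len(seq[0]), len(seq[0][0]))
```

In this sketch a CNN would consume a single `flow_image`, while the list returned by `packet_images` would be processed image by image before a sequence model such as an LSTM combines the per-packet features.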


5) DBN

Deep Belief Networks can be viewed as stacks of RBMs that evade the disadvantages of backpropagation in DL by using unsupervised learning [119]. In [75], DBNs are used for NSL-KDD to achieve an accuracy of 97.45%. Similarly, the authors of [62] apply a DBN with logistic regression to detect intrusions in KDD99. By removing duplicate samples from the training set and designing a 4-layer network, they achieve an accuracy of 97.9%.

Yang et al. [77] modify the density peak clustering algorithm (DPCA) [120] to better suit complex intrusion detection datasets. After dividing the training data into clusters, they train a DBN on each cluster and calculate a fuzzy membership matrix for each test sample related to the clusters. They then use this membership to aggregate the outputs of the different DBNs for a test sample. The resulting Modified Density Peak Clustering Algorithm and Deep Belief Networks (MDPCA-DBN) approach is tested on NSL-KDD and UNSW-NB15, yielding high results. Calculating the same metrics on the confusion matrices they provide produces somewhat lower values.

6) AUTOENCODERS

Autoencoders are DL networks that allow for unsupervised learning. Generally, they consist of two parts: an encoder and a decoder. The encoder first encodes the input data, typically to a lower dimension. This encoded data is then decoded back to its original dimension, the result of which should be equal to the original input data. In the field of intrusion detection, autoencoders are usually combined with other mechanisms to improve detection. The autoencoders then serve as dimensionality reduction devices that generate features that can be more efficiently processed by the following classifier. For example, in [72] a sparse autoencoder (SAE) is trained on NSL-KDD features in order to improve both the classification accuracy and the processing time of the SVM intrusion detector. The resulting detection performance is on par with other approaches for NSL-KDD, but lacks a reported FPR. While the autoencoder in this set-up represents DL, the SVM is in practice still a traditional ML technique.

In [84] this principle is extended, using either an SAE or Principal Component Analysis (PCA) to reduce the features' dimensions before classification. Among the Random Forest (RF), Bayesian Network (BN), Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) classifiers, the RF classifier obtained the most promising results. After also applying Uniform Distribution Based Balancing (UDBB) to combat class imbalance, the authors reach near perfect detection performance for CICIDS2017.

Researchers in [78] investigate the use of transfer learning for their SAE, implementing it as the neural network in their Deep neural network and adaptive Self-Taught-based Transfer Learning (DST-TL) approach. They use layers trained on power forecasting data of European wind farms and argue that NSL-KDD data has the same time-series-based and unpredictable nature. While they obtain results comparable to the state-of-the-art, the transfer-learning approach is only a slight improvement compared to the same architecture trained from scratch.

In [74], researchers try to improve the classification of network traffic by incorporating Stacked AutoEncoders for feature encoding. Each stacked autoencoder consists of 3 autoencoders that are trained on normal traffic. By then taking the error vector of a stacked autoencoder, they obtain a measure of the deviation from normal traffic behaviour. Multiple error vectors from different stacked autoencoders are then assembled as the channels of a 1D-image that is classified using a CNN.

Instead of using stacked regular autoencoders, in [64] the authors define a Non-symmetric Deep Autoencoder (NDAE) in which only the encoding phase is utilised. Using this structure to reduce the dimensionality, they performed RF classification on KDD99 and NSL-KDD, obtaining state-of-the-art results.

Another application of autoencoders is given in [121], where Stacked Denoising Autoencoders (SDA) are used to learn useful features from traffic data. More specifically, they first extract session-based features from the raw network traffic. These features include session metadata and 983 payload bytes, and are used as input for the SDA. However, during training these input data are first corrupted by setting some units to zero, in order to more robustly process noisy data. The authors then combine normal traffic of ISCX2012 with botnet attacks from CTU-13 [122] to train the system, obtaining accuracies of 99.48% on binary classification and 98.11% in a multiclass scenario, surpassing other configurations they explored. However, as no further information is given regarding the merging of two different datasets, it is hard to draw further conclusions.

Besides these examples, some authors also use autoencoders in comparison with other algorithms, such as in [116], where the LSTM-Seq2Seq model surpassed all other models, including the autoencoders. Similarly, the autoencoder-based designs in [117] are surpassed by LSTM and CNN alternatives.

D. DISCUSSION

In order to be able to compare approaches, Tables 2 and 3 present the results for the KDD99, NSL-KDD+, NSL-KDD21, ISCX-2012, UNSW-NB15 and CICIDS2017 datasets. Following these results, we can make a number of observations.

Firstly, the need for standardized result reporting becomes apparent. We observe that every author uses different evaluation measures to report performance. Moreover, when reporting multiclass results, researchers use various averaging approaches. While globally the weighted average appears to be most prevalent, some authors calculate this in a different way. This can be seen in how some reported metrics deviate from our calculated metrics, for example for [65] or [77]. Moreover, some authors only provide very limited reporting regarding the different metrics, often neglecting to report the precision, DR, FPR or F-measure. While they sometimes alleviate this by explicitly reporting other metrics such as a Receiver Operating Characteristic curve (ROC-curve), that is not always the case. If for example the provided confusion matrices for NSL-KDD21 in [69] are disregarded, the FPRs of practically 100%, which result in very poor ISs, do not stand out. Transparently reporting these metrics allows for a better performance assessment.

Secondly, some trends are visible in the data. Following the DS values, the data suggest that binary classification is generally more accurate than multiclass classification. Moreover, trends regarding dataset complexity can be suggested. Clearly, NSL-KDD21 is significantly harder than NSL-KDD+. CICIDS2017 appears to be reasonably manageable, while UNSW-NB15 and KDD99 remain challenging even for very recent techniques.

Thirdly, the issue of dataset imbalance remains important. While not directly clear in weighted averages, metrics such as the average F-measure and the IS reveal that not all classes are well classified, even in the case of otherwise high results. Upon closer inspection for specific architectures, this appears to be caused by attacks such as U2R and R2L in KDD99. These attacks are so underrepresented that systems fail to accurately learn their pattern or behaviour. Although the authors all report very high (> 90%) scores on accuracy and DR, our IS measure clearly provides a more nuanced view.

Fourthly, even very recent research is still only being validated on datasets such as KDD99 and NSL-KDD. As argued, these flawed and outdated datasets are a very poor representation of contemporary network traffic. The relatively limited amount of research on more recent datasets such as UNSW-NB15 and CICIDS2017 in comparison underlines this issue.

Finally, our proposed evaluation metrics allow for a more straightforward comparison between results, requiring only 1 or 2 values to be investigated while giving a clear overview of the performance of the different techniques. Take for example the results reported in [59] or [60]: While it is possible to investigate the results of individual papers by looking up specific graphs or confusion matrices, this is hardly practical on a larger scale. For large scale comparisons, it is necessary to use only one or a few metrics that can be applied to each method. When using the metrics provided by the authors, their results appear to be very good, while in practice their methods performed poorly for certain classes, as demonstrated in their low identification scores. This trend can be seen throughout the tables: Using only reported metrics conceals certain issues, while the combination of DS and IS indicates whenever something is amiss. However, identifying the root of such issues will most likely require investigating the original work. As such, our proposed metrics are excellent tools for large-scale comparisons of different work, but they do not replace individual metrics that assess specific properties and can be consulted in a follow-up investigation.

When we observe the trend of our unified measures over the last years, we see that network intrusion detection has made clear and significant progress to date; however, evident challenges remain for the future. These challenges are in line with the challenges first formulated in [34], and remain relevant for practical applications. The important challenges we address in the following section are the difficulty of feature extraction and comparison across datasets.

VII. WORKFLOW AND COMPARATIVE EXPERIMENTS

Currently, most of the machine learning-based NIDS research is conducted on complete datasets using precomputed features. This approach, however, is not representative of real-time environments. In an on-line, high-speed networking environment, an NIDS needs to be able to inspect incoming traffic at high rates, lest it causes congestion or does not inspect all traffic. A system might therefore not have ample time to capture an entire flow for analysis, which is necessary to compute some of the required features. For example, in the UNSW-NB15 features, sload indicates the number of source bits per second, and ct_flow_http_mthd tracks the number of Hypertext Transfer Protocol (HTTP) methods in the traffic. Tracking such features in real-time is problematic for a number of reasons:

1) Memory efficiency: As it is unknown when the flow will end, it is unknown how long the NIDS needs to keep specific data in memory. If many different flows are being transmitted simultaneously, the system would need to keep many features in memory, exceeding the available memory capacity or occupying memory that is necessary in other parts of the detection pipeline.
2) Calculation overhead: Many features need to be calculated from incoming data flows, requiring time and resources not going to the actual detection process. For example, most of the 48 features in UNSW-NB15 require counters, timers or application layer interpretation of data.
3) Detection delay: Following the principle of keeping a flow in memory until it has been completely transmitted would also imply delaying detection until that point in time. For flows that are spread over a longer period,⁸ it takes too much time before an attack is detected.

⁸ Some flows in UNSW-NB15 are spread over more than 2 hours.

An attacker could easily abuse the system by starting many flows and never ending them to throttle IDS resources, or by having an attack flow that never stops in order to avoid detection. In practice, one workaround is to set (timing) thresholds that determine when to process an ongoing flow. This would however introduce noise in the detection process, as this process would be based on the original, non-thresholded features instead of their thresholded counterparts. While methods might exist to implement an NIDS based on those traditional features, in this section we explore a different approach: Raw network traffic-based features.

For this approach, raw network packets and flows are used as ML features, rather than their derived characteristics. One example of this is the work presented in [85], where the first n bytes of 5 consecutive packets in the same flow are used to form a square image. As this image is directly fed into a CNN, the only feature selections are the number of packet bytes and the number of packets to include. Clearly, using this approach minimizes feature extraction overhead. Instead of keeping track of many potentially long lasting flows, the first n bytes of incoming packets are used instead. This introduces another advantage, namely that these features are dataset agnostic: They can be extracted from any network traffic, thus allowing for evaluation on different datasets. In contrast, traditional features are dataset-dependent, complicating inter-dataset comparisons.

Starting from this observation, we built a workflow⁹ to facilitate the comparison of raw traffic-based network intrusion detection algorithms. This workflow features the use of different recent and publicly available datasets, namely ISCX2012, UNSW-NB15 and CICIDS2017. Moreover, we implemented novel algorithms such as HAST-II [79] and PCCN [85] for each of these datasets. Both algorithms boast high reported performance, for ISCX2012 and CICIDS2017 respectively. Implementing them in our experiments allows for easy comparison on three different datasets. This section will discuss the functioning of the workflow and present results of the comparison between different configurations of raw traffic-based network intrusion detection algorithms.

⁹ Software code publicly available at https://gitlab.com/EAVISE/raw-traffic-nids

A. PROPOSED WORKFLOW

Our workflow is inspired by the workflow presented in [85], but is more elaborate and generic. It also uses other methods to extract the relevant data, for example custom Python code instead of tshark¹⁰ for parsing, and will be made fully available online. In order to be manageable, the workflow is divided into 3 large steps: Network traffic processing to extract and label flows, feature extraction and intrusion detection, as visualized in Fig. 2a.

¹⁰ https://www.wireshark.org/docs/man-pages/tshark.html

Typically, raw dataset traffic is stored inside PCAP and PCAPNG files¹¹ that first need to be decoded before they can be used. Decoding PCAP files yields a chronological sequence of packets with additional metadata such as timestamps. Extracted packets can then be sorted according to their IP source and destination addresses as well as their TCP/UDP source and destination ports and their protocol number. These 5 parameters, sometimes referred to as 5-tuple, constitute a flow identifier (flow ID).

¹¹ Not the case for KDD99 and NSL-KDD, which is why these datasets cannot be included in this workflow and the described experiments.

After sorting all packets according to their flow ID, they need to be labelled. The datasets used in this workflow provided labelling information in a unidirectional fashion for flows based on their flow ID. If a flow ID is represented by (A, B, x, y, p), where A and B are source and destination addresses, x and y are source and destination ports and p is the transport layer protocol, the unidirectionally labelled flows comprise both the (A, B, x, y, p) as well as the (B, A, y, x, p) traffic. As different flows may have identical flow IDs, in the case that they were transmitted at different points in time, it is required to distinguish between packets of different flows with the same flow ID. By only selecting the first k packets for a flow ID, where k is the number of forward or backward packets specified for each flow, we differentiate between different flows. These k packets are then removed from the tree, assuring that the next packets correspond to the next flow. This approach depends on the chronological order of packets in PCAP(NG) files. Note that for some flows, the indicated number of forward and backward packets differs from the actual captured packets. Similarly, some flow IDs featured in the labelling information are not featured at all in captured traffic files. Both inconsistencies might introduce some noise in the dataset.

Once all eligible¹² individual flows have been labelled, the required features for each flow can be extracted in the second step of the workflow. While this procedure is specific for each feature extraction strategy, all extraction strategies utilize the packet bytes in a flow. For both PCCN [85] and HAST-II [79], these bytes are structured in 2D images, with HAST-II also creating sequences of such images.

¹² Not all traffic flows in a PCAP(NG) file are labelled, as some TCP/IP flows remain unlabelled. All other traffic, such as for example Internet Control Message Protocol (ICMP) packets, is disregarded and not even included in the sorting and labelling phase.

Finally, after acquiring the features, the corresponding ML algorithms can be trained, validated and tested in the final step. For PCCN this algorithm is a CNN while for HAST-II a CNN and an LSTM are combined.

B. EXPERIMENTS AND RESULTS

We implemented the proposed workflow, and extracted the required features from the raw flow data as described for PCCN [85] and HAST-II [79]. All code is written in Python, using PyTorch for the implementation of the ML algorithms. The train/validation/test split of the dataset was 75%/15%/10%. The initial learning rate of each experiment was 0.01 or 0.001, which was divided by 10 every time the loss stagnated for 10 epochs. Every experiment used a batch size of 256, and ran for at least 35 epochs, depending on how fast it converged. The specific hyperparameters for each experiment can be retrieved from https://gitlab.com/EAVISE/raw-traffic-nids. In an effort to reduce class imbalance, a constraint was introduced during training. This constraint limits the maximum number of training samples to use per class, in order to get a more even distribution during training. No changes were made in the testing distribution. Besides introducing these constraints, we also aggregated the samples of the three separate web attacks in CICIDS2017 to further combat class imbalance. The results following these experiments are presented in Table 4 and visualized in Fig. 2b.
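The per-class training constraint and the learning-rate schedule described in the experimental set-up can be sketched as follows. This is an illustration under stated assumptions, not the code of the published repository: the names `cap_samples_per_class` and `PlateauDecay` are hypothetical, the random choice of which samples to keep is an assumption (the text only fixes the maximum per class), and "stagnated" is interpreted here as "no new minimum of the loss for 10 consecutive epochs".

```python
import random
from collections import defaultdict


def cap_samples_per_class(samples, labels, max_per_class, seed=0):
    """Keep at most max_per_class training samples for every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    kept = []
    for indices in by_class.values():
        if len(indices) > max_per_class:
            # Assumption: the retained samples are drawn uniformly at random.
            indices = rng.sample(indices, max_per_class)
        kept.extend(indices)
    kept.sort()  # preserve the original sample order
    return [samples[i] for i in kept], [labels[i] for i in kept]


class PlateauDecay:
    """Divide the learning rate by 10 when the loss has not improved
    for `patience` consecutive epochs."""

    def __init__(self, lr, patience=10):
        self.lr = lr
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, loss):
        if loss < self.best:
            self.best = loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lr /= 10
                self.stale_epochs = 0
        return self.lr


if __name__ == "__main__":
    labels = ["benign"] * 1000 + ["dos"] * 50 + ["web"] * 5
    samples = list(range(len(labels)))
    _, capped = cap_samples_per_class(samples, labels, max_per_class=100)
    print({c: capped.count(c) for c in set(capped)})
```

Only the training split would be passed through `cap_samples_per_class`; the test distribution is left unchanged, in line with the description above.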


FIGURE 2. Proposed workflow for raw traffic-based feature extraction (left) and experimental results as well as other published results
(right).
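The flow extraction and labelling step of the workflow in Fig. 2a can be illustrated with a short sketch. It assumes, purely for illustration, that decoded packets are dictionaries with a `flow_id` 5-tuple and a `time` field and that each label record gives the number of forward and backward packets of a flow; these names are hypothetical, and a dictionary of queues stands in for the tree structure mentioned in the text.

```python
from collections import defaultdict, deque


def reverse_flow_id(flow_id):
    """Turn (A, B, x, y, p) into (B, A, y, x, p)."""
    src, dst, sport, dport, proto = flow_id
    return (dst, src, dport, sport, proto)


def index_packets(packets):
    """Group chronologically ordered packets by their directional 5-tuple."""
    queues = defaultdict(deque)
    for packet in packets:
        queues[packet["flow_id"]].append(packet)
    return queues


def _pop_first(queue, k):
    """Remove and return up to k packets from the front of a queue."""
    return [queue.popleft() for _ in range(min(k, len(queue)))]


def take_flow(queues, flow_id, n_forward, n_backward):
    """Collect the packets of one labelled flow in both directions.

    The selected packets are removed, so that the next label with the same
    flow ID receives the packets of the next flow. This relies on the
    packets having been indexed in chronological order.
    """
    forward = _pop_first(queues[flow_id], n_forward)
    backward = _pop_first(queues[reverse_flow_id(flow_id)], n_backward)
    return sorted(forward + backward, key=lambda packet: packet["time"])


if __name__ == "__main__":
    fid = ("10.0.0.1", "10.0.0.2", 1234, 80, 6)
    rid = reverse_flow_id(fid)
    packets = [
        {"flow_id": fid, "time": 0.0},
        {"flow_id": rid, "time": 0.1},
        {"flow_id": fid, "time": 0.2},
        {"flow_id": fid, "time": 60.0},  # a later flow reusing the flow ID
    ]
    queues = index_packets(packets)
    print(len(take_flow(queues, fid, n_forward=2, n_backward=1)))
    print(len(take_flow(queues, fid, n_forward=1, n_backward=0)))
```

If a label announces more packets than were captured, this sketch simply returns the packets that are available, which mirrors the kind of labelling inconsistency noted in the workflow description.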

TABLE 4. Experimental results for each dataset, feature extraction strategy and ML network. The best value for each metric is emboldened. Constraint
indicates the maximum number of samples per class during training.

C. DISCUSSION

From the results, it is clear that using the PCCN approach yields the highest results, with both the header-based as well as the header and payload-based approaches (HePa) reaching very high DS and IS values. HePa appears to be about 1% better regarding the IS when compared to the header approach for CICIDS2017 and UNSW-NB15. This however is not the case for ISCX2012, as both approaches are very near 100%. Furthermore, the HAST-II approach obtains significantly worse results in comparison, as for unconstrained training these models classify everything as normal traffic. While applying constraints alleviates this issue somewhat, the PCCN-based approaches still excel. This can be partially attributed to the number of features generated in the HAST-II approach. HAST-II only uses the first 6 packets in a flow, while all PCCN-based approaches use all available flow packets. Concretely, the number of HAST-II features in a dataset is proportional to the number of flows in that dataset, while the number of PCCN features is proportional to the number of packets. Therefore, while both approaches generate a very large amount of benign features, only PCCN approaches generate sufficient attack features. Besides these observations, it is also clear that the results for UNSW-NB15 are significantly lower than those obtained for CICIDS2017 and ISCX2012. As this can also be observed in [47] and in the comparative Tables 2 and 3, UNSW-NB15 is more challenging, and thus of higher academic interest. When comparing the experimental results against the corresponding results in Tables 2 and 3, it is clear that raw traffic-based network intrusion detection is able to compete with detection methods based on traditional features. However, the raw traffic-based approach also raises some concerns: The first n bytes of packets in a flow may not contain the information that is necessary for reliable detection. Moreover, it might be possible that different datasets (or networking environments) are different in such a way that they require different features. Ideally, it might be possible to combine both raw traffic-based as well as certain traditional features into one approach, in an effort to provide more information in a format that can be efficiently extracted. More research is necessary to investigate this avenue.

VIII. CONCLUSION

In this paper we provide an overview of the state of the art of ML-based network intrusion detection, and discuss a number of issues within this area. These issues include the non-standardized reporting of results as well as the outdatedness and imbalance of most network intrusion detection datasets. We propose the detection score and identification score metrics as reliable methods of fair evaluation, and strongly encourage their use in future work. We also applied these metrics in our comparison of both existing work as well as our own experiments and show that they are able to accurately display NIDS performance. Finally, we extended the work of previous research by implementing previously published raw traffic-based detection approaches in a new workflow for multiple datasets to compare against other research. These experiments show promising results, and provide more insight into the ability of ML algorithms for real-time network intrusion detection purposes.

Future work on NIDSs should consider the challenges we discuss in this paper. This may include further investigation of the use of alternative features, as well as the ability of an NIDS to work in a real-world scenario, not only for synthesized datasets. Addressing these issues will significantly contribute to the adoption of ML-based network intrusion detection.

APPENDIX A
EVALUATION METRICS BACKGROUND

In this appendix, we introduce the metrics that are commonly used for network intrusion detection evaluation (A-A). Moreover, we also consider the averaging options in a multiclass scenario (A-B). Lastly, we concisely describe the function and construction of a confusion matrix.

A. BINARY CLASSIFICATION

For a dataset of arbitrary size, each element belongs to a specific class. The set of classes can either be binary or n-ary. In the case of binary classification, an element is either benign or an attack. Usually, the attack is denoted as positive, while the benign class is negative. Considering this, a true positive (tp) is an element of a positive class that is predicted to be positive by an algorithm. Similarly, true negatives (tn) are elements of a negative class that are correctly predicted to be negative. In contrast, false positives (fp) or false negatives (fn) are elements that are respectively negative or positive, but are wrongly predicted to be their opposites. By letting tp, tn, fp and fn denote the number of corresponding elements after a series of classifications or predictions, five metrics can be defined.

The accuracy simply is the fraction of elements that was predicted correctly:

$$\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \tag{2}$$

Precision, sometimes also denoted as Positive Predictive Value, signifies the fraction of elements that were correctly predicted of all elements predicted to be attacks.

$$\text{Precision} = \frac{tp}{tp + fp} \tag{3}$$

While precision reflects how many of the elements classified as positive are actually positive, the detection rate (DR) indicates how many attacks were detected, as shown in Eq. 4. In the literature, this metric is also referred to as recall, sensitivity or true positive rate.

$$\text{DR} = \frac{tp}{tp + fn} \tag{4}$$

In contrast, the false positive rate (FPR) shows the fraction of benign elements that were wrongly flagged as attacks in the same scenario. It is defined in Eq. 5.

$$\text{FPR} = \frac{fp}{tn + fp} \tag{5}$$

Finally, the F-measure, F1-score or F-score provides the harmonic mean of the precision and the recall. This measure penalizes low values of either precision or recall, requiring both to be high to result in a high output value.

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$

The F-measure can also be generalized to the Fβ-measure, where the β value can be used to tweak the weight of either the precision or the recall relative to the other metric.

$$F_\beta = \left(1 + \beta^2\right) \frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}} \tag{7}$$

In Eq. 7, the weight of the precision is altered.

B. n-ARY CLASSIFICATION

While these metrics are straightforward in a binary situation, caution is required in scenarios where attacks are represented in multiple classes. As the metrics are calculated for each class individually by considering each other class as negative, a number of values is obtained for every metric. These values then have to be aggregated into a single value reflecting the performance of the classification task for that metric.
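As a worked illustration of the binary metrics and of the per-class, one-vs-rest computation described above, the following sketch derives tp, tn, fp and fn for each class from a confusion matrix and aggregates the per-class precision using the micro, macro and weighted averaging approaches considered in this appendix. The function names are hypothetical, the matrix is indexed as `confusion[actual][predicted]`, and returning 0.0 for an empty denominator is an assumption made to keep the example total.

```python
def _ratio(numerator, denominator):
    """Safe division; an undefined metric is reported as 0.0 (assumption)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, DR, FPR and F1 as in Eqs. 2 to 6."""
    precision = _ratio(tp, tp + fp)
    dr = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "dr": dr,
        "fpr": _ratio(fp, tn + fp),
        "f1": _ratio(2 * precision * dr, precision + dr),
    }


def one_vs_rest_counts(confusion, cls):
    """Return (tp, tn, fp, fn) for one class, all other classes negative."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                 # actual cls, predicted other
    fp = sum(row[cls] for row in confusion) - tp  # actual other, predicted cls
    tn = total - tp - fn - fp
    return tp, tn, fp, fn


def averaged_precision(confusion):
    """Micro, macro and weighted average of the per-class precision."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    counts = [one_vs_rest_counts(confusion, i) for i in range(n_classes)]
    per_class = [binary_metrics(*c)["precision"] for c in counts]
    support = [sum(row) for row in confusion]     # samples per actual class
    return {
        "micro": _ratio(sum(c[0] for c in counts),
                        sum(c[0] + c[2] for c in counts)),
        "macro": sum(per_class) / n_classes,
        "weighted": sum(p * n for p, n in zip(per_class, support)) / total,
    }


if __name__ == "__main__":
    # Rows: actual class, columns: predicted class (normal, DoS, probe).
    confusion = [[8, 2, 0],
                 [1, 4, 0],
                 [0, 1, 4]]
    print(binary_metrics(*one_vs_rest_counts(confusion, 1)))
    print(averaged_precision(confusion))
```

The same aggregation can be applied to the DR or the F-measure by replacing the `"precision"` key in `averaged_precision`, although the micro average then needs the corresponding sums of counts.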


Commonly, three averaging approaches serve this purpose, [7] S. Wunderlich, M. Ring, D. Landes, and A. Hotho, ‘‘Comparison of system
namely micro-averaging (denoted with µ), macro-averaging call representations for intrusion detection,’’ in Proc. Int. Joint Conf., 12th
Int. Conf. Comput. Intell. Secur. Inf. Syst. (CISIS), 10th Int. Conf. Eur.
(denoted with M) and weighted averaging (denoted with w) Transnational Educ. (ICEUTE), F. M. Álvarez, A. T. Lora, J. A. S. Muñoz,
Consider the precision for binary classification as given in H. Quintián, and E. Corchado, Eds. Cham, Switzerland: Springer, 2020,
Eq. 3, used in a multiclass scenario with l classes, including pp. 14–24, doi: 10.1007/978-3-030-20005-3_2.
[8] E. Vasilomanolakis, S. Karuppayah, M. Mühlhäuser, and M. Fischer, ‘‘Tax-
normal traffic. In micro-averaging, the resulting Precisionµ onomy and survey of collaborative intrusion detection,’’ ACM Comput.
consists of the sum of the tp per class divided by the sum of Surv., vol. 47, no. 4, pp. 1–33, Jul. 2015, doi: 10.1145/2716260.
all tp and fp per class [123]. [9] C. Duma, M. Karresand, N. Shahmehri, and G. Caronni, ‘‘A trust-aware,
P2P-based overlay for intrusion detection,’’ in Proc. 17th Int. Conf.
Pl Database Expert Syst. Appl. (DEXA), 2006, pp. 692–697.
tpi
Precisionµ = Pl i=1 (8) [10] G. B. White, E. A. Fisch, and U. W. Pooch, ‘‘Cooperating security man-
agers: A peer-based intrusion detection system,’’ IEEE Netw., vol. 10, no. 1,
i=1 (tpi + fpi )
pp. 20–23, Jan. 1996.
The macro averaged precision then simply is the arithmetic [11] C. J. Fung, O. Baysal, J. Zhang, I. Aib, and R. Boutaba, ‘‘Trust man-
agement for host-based collaborative intrusion detection,’’ in Manag-
mean of all precision values: ing Large-Scale Service Deployment, F. De Turck, W. Kellerer, and
Pl tpi G. Kormentzas, Eds. Berlin, Germany: Springer, 2008, pp. 109–122.
i=1 tpi +fpi [12] W. Li, W. Meng, Y. Wang, J. Han, and J. Li, ‘‘Towards securing
PrecisionM = (9) challenge-based collaborative intrusion detection networks via message
l verification,’’ in Information Security Practice and Experience, C. Su and
Finally, for a dataset of N samples, where ni is the number H. Kikuchi, Eds. Cham, Switzerland: Springer, 2018, pp. 313–328.
[13] B. B. Zarpelão, R. S Miani, C. T. Kawakani, and S. C. de Alvarenga,
of samples for class i, the weighted average of the precision is: ‘‘A survey of intrusion detection in Internet of Things,’’ J. Netw. Comput.
l Appl., vol. 84, pp. 25–37, Apr. 2017. [Online]. Available: http://www.
X tpi ni sciencedirect.com/science/article/pii/S1084804517300802
Precisionw = · (10) [14] S. Raza, L. Wallgren, and T. Voigt, ‘‘SVELTE: Real-time intru-
tpi + fpi N sion detection in the Internet of Things,’’ Ad Hoc Netw., vol. 11,
i=1
no. 8, pp. 2661–2674, Nov. 2013. [Online]. Available: http://www.
sciencedirect.com/science/article/pii/S1570870513001005
[15] T. D. Nguyen, S. Marchal, M. Miettinen, H. Fereidooni, N. Asokan, and A.-R. Sadeghi, “DÏoT: A federated self-learning anomaly detection system for IoT,” 2018, arXiv:1804.07474. [Online]. Available: https://arxiv.org/abs/1804.07474
[16] K. Khan, A. Mehmood, S. Khan, M. A. Khan, Z. Iqbal, and W. K. Mashwani, “A survey on intrusion detection and prevention in wireless ad-hoc networks,” J. Syst. Archit., vol. 105, May 2020, Art. no. 101701. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1383762119305089
[17] B. Riyaz and S. Ganapathy, “A deep learning approach for effective intrusion detection in wireless networks using CNN,” Soft Comput., vol. 24, no. 22, pp. 17265–17278, Nov. 2020, doi: 10.1007/s00500-020-05017-0.
[18] R. Vijayanand and D. Devaraj, “A novel feature selection method using whale optimization algorithm and genetic operators for intrusion detection system in wireless mesh network,” IEEE Access, vol. 8, pp. 56847–56854, 2020.
[19] M. Safaldin, M. Otair, and L. Abualigah, “Improved binary gray wolf optimizer and SVM for intrusion detection system in wireless sensor networks,” J. Ambient Intell. Humanized Comput., vol. 12, no. 2, pp. 1559–1576, Jun. 2020, doi: 10.1007/s12652-020-02228-z.
[20] S. M. Kasongo and Y. Sun, “A deep learning method with wrapper based feature extraction for wireless intrusion detection system,” Comput. Secur., vol. 92, May 2020, Art. no. 101752. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167404820300365

While weighted and micro averages are biased towards larger classes, macro averages consider each class to be equal [123]. From this point, we will only consider weighted and macro averages, as micro averages are very similar to weighted averages but appear less often in NIDS literature. While we only provide the equations for the precision, we can similarly average recall and F-measure results.

C. CONFUSION MATRIX
Confusion matrices are a useful tool to present and evaluate the classification performance of any classification algorithm. Each column in the square matrix presents the number of samples that were predicted as members of a corresponding class. Each row then describes to what actual class those samples belonged. Ideally, the matrix is a diagonal matrix, which means that all samples are predicted correctly. A confusion matrix structure is given in Fig. 1, where we demonstrate how to map the multiclass scenario to a binary scenario as described in Sect. V.
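As a minimal sketch of the appendix material on averaging and confusion matrices, the following Python snippet builds a confusion matrix with actual classes as rows and predicted classes as columns, derives the per-class precision from it, and then computes the macro, weighted, and micro averages. The class names, the toy labels, and all function names are hypothetical and chosen only for illustration. The averages follow the standard definitions and are assumed, not confirmed, to match the equations given for the precision. The `to_binary` helper shows one possible multiclass-to-binary mapping (one class against all others), which is an assumption and may differ from the mapping described in Sect. V.

```python
def confusion_matrix(actual, predicted, classes):
    """Build a square matrix: rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def per_class_precision(matrix):
    """Precision of class i: diagonal entry divided by the sum of column i."""
    n = len(matrix)
    precisions = []
    for i in range(n):
        predicted_as_i = sum(matrix[r][i] for r in range(n))
        precisions.append(matrix[i][i] / predicted_as_i if predicted_as_i else 0.0)
    return precisions


def averaged_precision(matrix):
    """Return macro, weighted, and micro averaged precision (standard definitions)."""
    n = len(matrix)
    precisions = per_class_precision(matrix)
    support = [sum(row) for row in matrix]  # number of actual samples per class
    total = sum(support)
    return {
        # Macro: every class counts equally, regardless of its size.
        "macro": sum(precisions) / n,
        # Weighted: each class contributes proportionally to its support.
        "weighted": sum(s * p for s, p in zip(support, precisions)) / total,
        # Micro: pool all decisions first, then compute a single precision.
        "micro": sum(matrix[i][i] for i in range(n)) / total,
    }


def to_binary(matrix, positive):
    """Collapse to [[TP, FN], [FP, TN]] with class index `positive` against the rest.

    Assumption: a one-vs-rest mapping, used here purely as an illustration.
    """
    n = len(matrix)
    tp = matrix[positive][positive]
    fn = sum(matrix[positive]) - tp
    fp = sum(matrix[r][positive] for r in range(n)) - tp
    tn = sum(map(sum, matrix)) - tp - fn - fp
    return [[tp, fn], [fp, tn]]


# Hypothetical, imbalanced toy data: 8 normal, 3 dos, and 1 probe sample.
classes = ["normal", "dos", "probe"]
actual = ["normal"] * 8 + ["dos"] * 3 + ["probe"]
predicted = ["normal"] * 7 + ["dos"] + ["dos", "normal", "normal"] + ["probe"]

matrix = confusion_matrix(actual, predicted, classes)
averages = averaged_precision(matrix)
binary_dos = to_binary(matrix, classes.index("dos"))

for name, value in averages.items():
    print(f"{name} precision: {value:.4f}")
print("dos vs. rest:", binary_dos)
```

In this toy data, the small but perfectly classified `probe` class lifts the macro average above the weighted and micro averages, which are dominated by the larger `normal` class.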
REFERENCES
[1] (Mar. 2020). Cisco Annual Internet Report—Cisco Annual Internet Report (2018–2023) White Paper—Cisco. [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html
[2] (Feb. 2016). Massive Brute-Force Attack on Alibaba Affects Millions. [Online]. Available: https://www.infosecurity-magazine.com/news/massive-bruteforce-attack-on
[3] (Oct. 2016). Dyn Analysis Summary of Friday October 21 Attack. [Online]. Available: https://web.archive.org/web/20200620203923/ and https://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/
[4] H. Debar, “An introduction to intrusion-detection systems,” in Proc. Connect, 2002, pp. 1–18.
[5] J. P. Anderson, “Computer security threat monitoring and surveillance,” James P. Anderson Co., Washington, DC, USA, Tech. Rep., 1980.
[6] D. E. Denning, “An intrusion-detection model,” IEEE Trans. Softw. Eng., vol. SE-13, no. 2, pp. 222–232, Feb. 1987.
[21] S. Anwar, J. M. Zain, M. F. Zolkipli, Z. Inayat, S. Khan, B. Anthony, and V. Chang, “From intrusion detection to an intrusion response system: Fundamentals, requirements, and future directions,” Algorithms, vol. 10, no. 2, p. 39, Mar. 2017.
[22] K. Kim, M. E. Aminanto, and H. C. Tanuwidjaja, Network Intrusion Detection Using Deep Learning (Springer Briefs on Cyber Security Systems and Networks). Singapore: Springer, 2018. [Online]. Available: http://www.springer.com/series/15797 and http://link.springer.com/10.1007/978-981-13-1444-5
[23] R. Boutaba, M. A. Salahuddin, N. Limam, S. Ayoubi, N. Shahriar, F. Estrada-Solano, and O. M. Caicedo, “A comprehensive survey on machine learning for networking: Evolution, applications and research opportunities,” J. Internet Services Appl., vol. 9, no. 1, pp. 1–99, Dec. 2018.
[24] Z. Chiba, N. Abghour, K. Moussaid, A. El Omri, and M. Rida, “New anomaly network intrusion detection system in cloud environment based on optimized back propagation neural network using improved genetic algorithm,” Int. J. Commun. Netw. Inf. Secur., vol. 11, pp. 61–84, Apr. 2019.


[25] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “Network anomaly detection: Methods, systems and tools,” IEEE Commun. Surveys Tuts., vol. 16, no. 1, pp. 303–336, 1st Quart., 2014.
[26] M. Ahmed, A. N. Mahmood, and J. Hu, “A survey of network anomaly detection techniques,” J. Netw. Comput. Appl., vol. 60, pp. 19–31, Jan. 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1084804515002891
[27] A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2nd Quart., 2016.
[28] Y. Xin, L. Kong, Z. Liu, Y. Chen, Y. Li, H. Zhu, M. Gao, H. Hou, and C. Wang, “Machine learning and deep learning methods for cybersecurity,” IEEE Access, vol. 6, pp. 35365–35381, 2018.
[29] D. Berman, A. Buczak, J. Chavis, and C. Corbett, “A survey of deep learning methods for cyber security,” Information, vol. 10, no. 4, p. 122, Apr. 2019, doi: 10.3390/info10040122.
[30] S. Mahdavifar and A. A. Ghorbani, “Application of deep learning to cybersecurity: A survey,” Neurocomputing, vol. 347, pp. 149–176, Jun. 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231219302954
[31] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, and P. Faruki, “Network intrusion detection for IoT security based on learning techniques,” IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2671–2701, 3rd Quart., 2019.
[32] M. Ring, S. Wunderlich, D. Scheuring, D. Landes, and A. Hotho, “A survey of network-based intrusion detection data sets,” Comput. Secur., vol. 86, pp. 147–167, Sep. 2019, doi: 10.1016/j.cose.2019.06.005.
[33] M. A. Ferrag, L. Maglaras, S. Moschoyiannis, and H. Janicke, “Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study,” J. Inf. Secur. Appl., vol. 50, Feb. 2020, Art. no. 102419. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2214212619305046
[34] R. Sommer and V. Paxson, “Outside the closed world: On using machine learning for network intrusion detection,” in Proc. IEEE Symp. Secur. Privacy, Los Alamitos, CA, USA, May 2010, pp. 305–316.
[35] 1998 DARPA Intrusion Detection Evaluation Dataset. Accessed: Jan. 5, 2021. [Online]. Available: https://www.ll.mit.edu/r-d/datasets/1998-darpa-intrusion-detection-evaluation-dataset
[36] R. Lippmann, R. Cunningham, D. Fried, I. Graf, K. Kendall, S. Webster, and M. Zissman, “Results of the DARPA 1998 offline intrusion detection evaluation,” in Proc. 2nd Int. Workshop Recent Adv. Intrusion Detection, Jan. 1999, pp. 1–29. [Online]. Available: http://www.raid-symposium.org/raid99/
[37] J. McHugh, “Testing intrusion detection systems: A critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln laboratory,” ACM Trans. Inf. Syst. Secur., vol. 3, no. 4, pp. 262–294, Nov. 2000, doi: 10.1145/382912.382923.
[38] M. V. Mahoney and P. K. Chan, “An analysis of the 1999 DARPA/Lincoln laboratory evaluation data for network anomaly detection,” in Recent Advances in Intrusion Detection, G. Vigna, C. Kruegel, and E. Jonsson, Eds. Berlin, Germany: Springer, 2003, pp. 220–237.
[39] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl., Jul. 2009, pp. 1–6.
[40] KDD Cup 1999 Data. Accessed: Jan. 5, 2021. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[41] H. Xu, Q. Cao, H. Fu, C. Fu, H. Chen, and J. Su, “Application of support vector machine model based on an improved elephant herding optimization algorithm in network intrusion detection,” in Artificial Intelligence, K. Knight, C. Zhang, G. Holmes, and M.-L. Zhang, Eds. Singapore: Springer, 2019, pp. 283–295.
[42] L. Portnoy, E. Eskin, and S. Stolfo, “Intrusion detection with unlabeled data using clustering,” in Proc. ACM CSS Workshop Data Mining Appl. Secur. (DMSA), 2001, pp. 5–8.
[43] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, “Toward developing a systematic approach to generate benchmark datasets for intrusion detection,” Comput. Secur., vol. 31, no. 3, pp. 357–374, May 2012. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167404811001672
[44] N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” in Proc. Mil. Commun. Inf. Syst. Conf. (MilCIS), Nov. 2015, pp. 1–6.
[45] V. Paxson, “Bro: A system for detecting network intruders in real-time,” Comput. Netw., vol. 31, nos. 23–24, pp. 2435–2463, Dec. 1999. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1389128699001127
[46] N. Moustafa and J. Slay, “The evaluation of network anomaly detection systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set,” Inf. Secur. J., Global Perspective, vol. 25, nos. 1–3, pp. 18–31, Apr. 2016, doi: 10.1080/19393555.2015.1125974.
[47] R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran, A. Al-Nemrat, and S. Venkatraman, “Deep learning approach for intelligent intrusion detection system,” IEEE Access, vol. 7, pp. 41525–41550, 2019.
[48] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization,” in Proc. 4th Int. Conf. Inf. Syst. Secur. Privacy (ICISSP), Portugal, Jan. 2018, pp. 108–116.
[49] R. Panigrahi and S. Borah, “A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems,” Int. J. Eng. Technol., vol. 7, no. 3, pp. 479–482, 2018.
[50] P. Branco, L. Torgo, and R. P. Ribeiro, “A survey of predictive modeling on imbalanced domains,” ACM Comput. Surv., vol. 49, no. 2, pp. 1–50, Nov. 2016, doi: 10.1145/2907070.
[51] A. Luque, A. Carrasco, A. Martín, and A. de las Heras, “The impact of class imbalance in classification performance metrics based on the binary confusion matrix,” Pattern Recognit., vol. 91, pp. 216–231, Jul. 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0031320319300950
[52] A. P. Bradley, “The use of the area under the ROC curve in the evaluation of machine learning algorithms,” Pattern Recognit., vol. 30, no. 7, pp. 1145–1159, Jul. 1997. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0031320396001422
[53] A. A. Cardenas, J. S. Baras, and K. Seamon, “A framework for the evaluation of intrusion detection systems,” in Proc. IEEE Symp. Secur. Privacy (S&P), May 2006, p. 15.
[54] J. Zhang, M. Zulkernine, and A. Haque, “Random-forests-based network intrusion detection systems,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 38, no. 5, pp. 649–659, Sep. 2008.
[55] J. Kim, N. Shin, S. Y. Jo, and S. H. Kim, “Method of intrusion detection using deep neural network,” in Proc. IEEE Int. Conf. Big Data Smart Comput. (BigComp), Feb. 2017, pp. 313–316.
[56] R. Beghdad, “Training all the KDD data set to classify and detect attacks,” Neural Netw. World, vol. 17, pp. 81–91, Jun. 2007.
[57] R. Vinayakumar, K. P. Soman, and P. Poornachandran, “Applying convolutional neural network for network intrusion detection,” in Proc. Int. Conf. Adv. Comput., Commun. Informat. (ICACCI), Sep. 2017, pp. 1222–1228.
[58] L. Zhang, M. Li, X. Wang, and Y. Huang, “An improved network intrusion detection based on deep neural network,” IOP Conf. Ser., Mater. Sci. Eng., vol. 563, Aug. 2019, Art. no. 052019, doi: 10.1088/1757-899x/563/5/052019.
[59] M. Sheikhan, Z. Jadidi, and A. Farrokhi, “Intrusion detection using reduced-size RNN based on feature grouping,” Neural Comput. Appl., vol. 21, no. 6, pp. 1185–1190, Sep. 2012, doi: 10.1007/s00521-010-0487-0.
[60] R. C. Staudemeyer, “Applying long short-term memory recurrent neural networks to intrusion detection,” South Afr. Comput. J., vol. 56, pp. 136–154, Jul. 2015.
[61] Y. Yu, Z. Ye, X. Zheng, and C. Rong, “An efficient cascaded method for network intrusion detection based on extreme learning machines,” J. Supercomput., vol. 74, no. 11, pp. 5797–5812, Nov. 2018.
[62] K. Alrawashdeh and C. Purdy, “Toward an online anomaly intrusion detection system based on deep learning,” in Proc. 15th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Dec. 2016, pp. 195–200.
[63] W. L. Al-Yaseen, Z. A. Othman, and M. Z. A. Nazri, “Multi-level hybrid support vector machine and extreme learning machine based on modified K-means for intrusion detection system,” Expert Syst. Appl., vol. 67, pp. 296–303, Jan. 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0957417416305310
[64] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,” IEEE Trans. Emerg. Topics Comput. Intell., vol. 2, no. 1, pp. 41–50, Feb. 2018.
[65] B. Hu, J. Wang, Y. Zhu, and T. Yang, “Dynamic deep forest: An ensemble classification method for network intrusion detection,” Electronics, vol. 8, no. 9, p. 968, Aug. 2019, doi: 10.3390/electronics8090968.
[66] C. Lu, L. Zhai, T. Liu, and N. Li, “Network intrusion detection based on neural networks and D-S evidence,” in Image and Video Technology—PSIVT 2015 Workshops, F. Huang and A. Sugimoto, Eds. Cham, Switzerland: Springer, 2016, pp. 332–343.


[67] Y. Yu, S. Kang, and H. Qiu, “A new network intrusion detection algorithm: DA-ROS-ELM,” IEEJ Trans. Electr. Electron. Eng., vol. 13, no. 4, pp. 602–612, Apr. 2018. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/tee.22606
[68] C. Yin, Y. Zhu, J. Fei, and X. He, “A deep learning approach for intrusion detection using recurrent neural networks,” IEEE Access, vol. 5, pp. 21954–21961, 2017.
[69] Z. Li, Z. Qin, K. Huang, X. Yang, and S. Ye, “Intrusion detection using convolutional neural networks for representation learning,” in Neural Information Processing, D. Liu, S. Xie, Y. Li, D. Zhao, and E.-S. M. El-Alfy, Eds. Cham, Switzerland: Springer, 2017, pp. 858–866.
[70] T. Kim, S. C. Suh, H. Kim, J. Kim, and J. Kim, “An encoding technique for CNN-based network anomaly detection,” in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2018, pp. 2960–2965.
[71] D. Kwon, K. Natarajan, S. C. Suh, H. Kim, and J. Kim, “An empirical study on network anomaly detection using convolutional neural networks,” in Proc. IEEE 38th Int. Conf. Distrib. Comput. Syst. (ICDCS), Jul. 2018, pp. 1595–1598.
[72] M. Al-Qatf, Y. Lasheng, M. Al-Habib, and K. Al-Sabahi, “Deep learning approach combining sparse autoencoder with SVM for network intrusion detection,” IEEE Access, vol. 6, pp. 52843–52856, 2018.
[73] M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and J. Lloret, “Shallow neural network with kernel approximation for prediction problems in highly demanding data networks,” Expert Syst. Appl., vol. 124, pp. 196–208, Jun. 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0957417419300843
[74] N. Chouhan, A. Khan, and H.-U.-R. Khan, “Network anomaly detection using channel boosted and residual learning based deep convolutional neural network,” Appl. Soft Comput., vol. 83, Oct. 2019, Art. no. 105612. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1568494619303928
[75] M. Z. Alom, V. Bontupalli, and T. M. Taha, “Intrusion detection using deep belief networks,” in Proc. Nat. Aerosp. Electron. Conf. (NAECON), Jun. 2015, pp. 339–344.
[76] Y. Ding and Y. Zhai, “Intrusion detection system for NSL-KDD dataset using convolutional neural networks,” in Proc. 2nd Int. Conf. Comput. Sci. Artif. Intell. (CSAI). New York, NY, USA: Association for Computing Machinery, 2018, pp. 81–85, doi: 10.1145/3297156.3297230.
[77] Y. Yang, K. Zheng, C. Wu, X. Niu, and Y. Yang, “Building an effective intrusion detection system using the modified density peak clustering algorithm and deep belief networks,” Appl. Sci., vol. 9, no. 2, p. 238, Jan. 2019, doi: 10.3390/app9020238.
[78] A. S. Qureshi, A. Khan, N. Shamim, and M. H. Durad, “Intrusion detection using deep sparse auto-encoder and self-taught learning,” Neural Comput. Appl., vol. 32, pp. 1–13, Mar. 2019.
[79] W. Wang, Y. Sheng, J. Wang, X. Zeng, X. Ye, Y. Huang, and M. Zhu, “HAST-IDS: Learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection,” IEEE Access, vol. 6, pp. 1792–1806, 2018.
[80] T. Aldwairi, D. Perera, and M. A. Novotny, “An evaluation of the performance of restricted Boltzmann machines as a model for anomaly network intrusion detection,” Comput. Netw., vol. 144, pp. 111–119, Oct. 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1389128618306005
[81] N. Aboueata, S. Alrasbi, A. Erbad, A. Kassler, and D. Bhamare, “Supervised machine learning techniques for efficient network intrusion detection,” in Proc. 28th Int. Conf. Comput. Commun. Netw. (ICCCN), Jul. 2019, pp. 1–8.
[82] L. Zhiqiang, G. Mohi-Ud-Din, L. Bing, L. Jianchao, Z. Ye, and L. Zhijun, “Modeling network intrusion detection system using feed-forward neural network using UNSW-NB15 dataset,” in Proc. IEEE 7th Int. Conf. Smart Energy Grid Eng. (SEGE), Aug. 2019, pp. 299–303.
[83] W. Zong, Y.-W. Chow, and W. Susilo, “A two-stage classifier approach for network intrusion detection,” in Information Security Practice and Experience, C. Su and H. Kikuchi, Eds. Cham, Switzerland: Springer, 2018, pp. 329–340.
[84] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, “Features dimensionality reduction approaches for machine learning based network intrusion detection,” Electronics, vol. 8, no. 3, p. 322, Mar. 2019.
[85] Y. Zhang, X. Chen, D. Guo, M. Song, Y. Teng, and X. Wang, “PCCN: Parallel cross convolutional neural network for abnormal network traffic flows detection in multi-class imbalanced network traffic flows,” IEEE Access, vol. 7, pp. 119904–119916, 2019.
[86] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995, doi: 10.1007/BF00994018.
[87] W. Hu, Y. Liao, and V. R. Vemuri, “Robust support vector machines for anomaly detection in computer security,” in Proc. Int. Conf. Mach. Learn. Appl. (ICMLA), M. A. Wani, K. J. Cios, and K. Hafeez, Eds., Los Angeles, CA, USA, Jun. 2003, pp. 168–174.
[88] Y. Yi, J. Wu, and W. Xu, “Incremental SVM based on reserved set for network intrusion detection,” Expert Syst. Appl., vol. 38, no. 6, pp. 7698–7707, Jun. 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0957417410015046
[89] S. Murthy, “Automatic construction of decision trees from data: A multi-disciplinary survey,” Data Mining Knowl. Discovery, vol. 2, pp. 345–389, Mar. 2000.
[90] T. K. Ho, “Random decision forests,” in Proc. 3rd Int. Conf. Document Anal. Recognit., vol. 1, Aug. 1995, pp. 278–282.
[91] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: Theory and applications,” Neurocomputing, vol. 70, nos. 1–3, pp. 489–501, Dec. 2006. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231206000385
[92] G. E. Hinton, “A practical guide to training restricted Boltzmann machines,” in Neural Networks: Tricks of the Trade. Berlin, Germany: Springer, 2012, pp. 599–619, doi: 10.1007/978-3-642-35289-8_32.
[93] U. Fiore, F. Palmieri, A. Castiglione, and A. De Santis, “Network anomaly detection with the restricted Boltzmann machine,” Neurocomputing, vol. 122, pp. 13–23, Dec. 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231213005547
[94] M. R. G. Raman, N. Somu, K. Kirthivasan, R. Liscano, and V. S. S. Sriram, “An efficient intrusion detection system based on hypergraph–genetic algorithm for parameter optimization and feature selection in support vector machine,” Knowl.-Based Syst., vol. 134, pp. 1–12, Oct. 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0950705117303209
[95] R. Jain and N. S. Abouzakhar, “Hidden Markov model based anomaly intrusion detection,” in Proc. Int. Conf. Internet Technol. Secured Trans., Dec. 2012, pp. 528–533.
[96] W.-C. Lin, S.-W. Ke, and C.-F. Tsai, “CANN: An intrusion detection system based on combining cluster centers and nearest neighbors,” Knowl.-Based Syst., vol. 78, pp. 13–21, Apr. 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0950705115000167
[97] K. Leung and C. Leckie, “Unsupervised anomaly detection in network intrusion detection using clusters,” in Proc. ACSC, 2005, pp. 1–10.
[98] I. Almomani, B. A. Kasasbeh, and M. Al-Akhras, “WSN-DS: A dataset for intrusion detection systems in wireless sensor networks,” J. Sensors, vol. 2016, pp. 4731953:1–4731953:16, Aug. 2016.
[99] J. Song, H. Takakura, Y. Okabe, M. Eto, D. Inoue, and K. Nakao, “Statistical analysis of honeypot data and building of Kyoto 2006+ dataset for NIDS evaluation,” in Proc. 1st Workshop Building Anal. Datasets Gathering Exper. Returns Secur. (BADGERS). New York, NY, USA: Association for Computing Machinery, 2011, pp. 29–36, doi: 10.1145/1978672.1978676.
[100] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017.
[101] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[102] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788.
[103] Y. Kim, “Convolutional neural networks for sentence classification,” in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP). Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1746–1751. [Online]. Available: https://www.aclweb.org/anthology/D14-1181
[104] R. Fontugne, P. Borgnat, P. Abry, and K. Fukuda, “MAWILab: Combining diverse anomaly detectors for automated anomaly labeling and performance benchmarking,” in Proc. 6th Int. Conf. Co-NEXT. New York, NY, USA: Association for Computing Machinery, 2010, pp. 1–12, doi: 10.1145/1921168.1921179.


[105] Z. Li, Z. Qin, K. Huang, X. Yang, and S. Ye, “Intrusion detection using convolutional neural networks for representation learning,” in Neural Information Processing, D. Liu, S. Xie, Y. Li, D. Zhao, and E.-S. M. El-Alfy, Eds. Cham, Switzerland: Springer, 2017, pp. 858–866.
[106] P. Wu, H. Guo, and R. Buckland, “A transfer learning approach for network intrusion detection,” in Proc. IEEE 4th Int. Conf. Big Data Anal. (ICBDA), Mar. 2019, pp. 281–285.
[107] H. Palangi, L. Deng, Y. Shen, J. Gao, X. He, J. Chen, X. Song, and R. Ward, “Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 24, no. 4, pp. 694–707, Apr. 2016.
[108] Z. Zhao, W. Chen, X. Wu, P. C. Y. Chen, and J. Liu, “LSTM network: A deep learning approach for short-term traffic forecast,” IET Intell. Transp. Syst., vol. 11, no. 2, pp. 68–75, Mar. 2017.
[109] Y. Wu, M. Yuan, S. Dong, L. Lin, and Y. Liu, “Remaining useful life estimation of engineered systems using vanilla LSTM neural networks,” Neurocomputing, vol. 275, pp. 167–179, Jan. 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231217309505
[110] S. Hochreiter and J. J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[111] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.
[112] I. Sutskever, J. Martens, and G. Hinton, “Generating text with recurrent neural networks,” in Proc. 28th Int. Conf. Mach. Learn. (ICML). Madison, WI, USA: Omnipress, 2011, pp. 1017–1024.
[113] H. Debar, M. Becker, and D. Siboni, “A neural network component for an intrusion detection system,” in Proc. IEEE Comput. Soc. Symp. Res. Secur. Privacy, May 1992, pp. 240–250.
[114] M. I. Jordan, “Serial order: A parallel distributed processing approach,” in Neural-Network Models of Cognition (Advances in Psychology), vol. 121, J. W. Donahoe and V. P. Dorsel, Eds. North-Holland, 1997, ch. 25, pp. 471–495. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0166411597801112, doi: 10.1016/S0166-4115(97)80111-2.
[115] J. L. Elman, “Finding structure in time,” Cognit. Sci., vol. 14, no. 2, pp. 179–211, Mar. 1990. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog1402_1
[116] R. K. Malaiya, D. Kwon, J. Kim, S. C. Suh, H. Kim, and I. Kim, “An empirical evaluation of deep learning for network anomaly detection,” in Proc. Int. Conf. Comput., Netw. Commun. (ICNC), Mar. 2018, pp. 893–898.
[117] S. Naseer, Y. Saleem, S. Khalid, M. K. Bashir, J. Han, M. M. Iqbal, and K. Han, “Enhanced network anomaly detection based on deep neural networks,” IEEE Access, vol. 6, pp. 48231–48246, 2018.
[118] J. Kim, J. Kim, H. L. Thi Thu, and H. Kim, “Long short term memory recurrent neural network classifier for intrusion detection,” in Proc. Int. Conf. Platform Technol. Service (PlatCon), Feb. 2016, pp. 1–5.
[119] G. Hinton, Deep Belief Nets. Boston, MA, USA: Springer, 2010, pp. 267–269, doi: 10.1007/978-0-387-30164-8_208.
[120] A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, Jun. 2014.
[121] Y. Yu, J. Long, and Z. Cai, “Session-based network intrusion detection using a deep learning architecture,” in Modeling Decisions for Artificial Intelligence, V. Torra, Y. Narukawa, A. Honda, and S. Inoue, Eds. Cham, Switzerland: Springer, 2017, pp. 144–155.
[122] S. García, M. Grill, J. Stiborek, and A. Zunino, “An empirical comparison of botnet detection methods,” Comput. Secur., vol. 45, pp. 100–123, Sep. 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167404814000923
[123] M. Sokolova and G. Lapalme, “A systematic analysis of performance measures for classification tasks,” Inf. Process. Manage., vol. 45, no. 4, pp. 427–437, Jul. 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0306457309000259

LAURENS LE JEUNE received the B.Sc. and M.Sc. degrees in electronics and ICT engineering technology in a joint program of KU Leuven, Leuven, Belgium, and Hasselt University, Diepenbeek, Belgium, in 2018 and 2019, respectively. He is currently pursuing the Ph.D. degree in engineering technology with KU Leuven. For his master’s thesis, he investigated the classification of camera trap wildlife footage for biodiversity research. In his Ph.D. research, he works for the Embedded Systems and Security (ES&S) as well as the Embedded and Artificially Intelligent Vision Engineering (EAVISE) research groups. He is also investigating the application of deep learning technology for hardware-accelerated real-time network intrusion detection. His research interests include machine learning and deep learning, FPGAs, network security, and intrusion detection.

TOON GOEDEMÉ studied electrical engineering at KU Leuven. He received the Ph.D. degree in vision-based topological navigation from KU Leuven, in December 2006, under the guidance of Prof. L. Van Gool and T. Tuytelaars. Afterwards, he started teaching at the Technical University De Nayer, Sint-Katelijne-Waver, where he founded his research group Embedded and Artificially Intelligent Vision Engineering (EAVISE), in 2008. Nowadays, his group is integrated in the KU Leuven and consists of three professors (Joost Vennekens, Patrick Vandewalle, and himself), four postdocs and about 20 researchers, playing a vital role in the transfer of computer vision and AI know-how from academic research towards the industry. Since 2014, he has been an Associate Professor with KU Leuven. He is the (co)author of more than 190 international publications and was a project leader of more than 75 industrially co-founded research projects. Together with his team, he won several awards, such as the Best Paper Award at Embedded Vision Workshop CVPR 2015, the Best Demo Award at BNAIC 2015, the Best Paper Award at CGVCVIP 2016, the Willy Asselman Award for research achievements in 2016, and the Best Paper Award at Embedded Vision Workshop ECCV 2020. He is also an Associate Editor of the IET Computer Vision journal and the MDPI Journal of Imaging.

NELE MENTENS (Senior Member, IEEE) received the master’s and Ph.D. degrees from KU Leuven, in 2003 and 2007, respectively. She was a Visiting Researcher with Ruhr University Bochum, in 2013, and with EPFL, in 2017. She is currently a Professor with Leiden University and KU Leuven. She is the (co)author in over 100 publications in international journals, conferences, and books. She was/is the PI in around 20 finished and ongoing research projects with national and international funding. Her research interests include the domains of configurable computing for security, hardware acceleration of network security applications, and security in constrained environments. She serves as a program committee member for renowned international conferences on security and hardware design, such as NDSS, Usenix Security Symposium, CHES, DAC, DATE, FPL, and ESWEEK. She was the General Co-Chair of FPL, in 2017, the Program Chair of EWME and PROOFS, in 2018, and the Program Chair of FPL and CARDIS, in 2020. She also serves as an Associate Editor for IEEE Transactions on Information Forensics and Security and IEEE Circuits and Systems Magazine.
