

Article in IEEE Access · January 2021



Received March 10, 2021, accepted March 22, 2021, date of publication April 22, 2021, date of current version May 4, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3075066

Machine Learning for Misuse-Based Network Intrusion Detection: Overview, Unified Evaluation and Feature Choice Comparison Framework
LAURENS LE JEUNE¹,², TOON GOEDEMÉ², AND NELE MENTENS¹,³ (Senior Member, IEEE)
¹ES&S and imec-COSIC, Department of Electrical Engineering (ESAT), KU Leuven, 3000 Leuven, Belgium
²EAVISE, PSI, Department of Electrical Engineering (ESAT), KU Leuven, 3000 Leuven, Belgium
³Leiden Institute of Advanced Computer Science (LIACS), Leiden University, 2311 Leiden, The Netherlands

Corresponding author: Laurens Le Jeune (laurens.lejeune@kuleuven.be)

This work was supported in part by the COllective Research NETworking (CORNET) and funded by VLAIO under
Grant HBC.2018.0491, and in part by the CyberSecurity Research Flanders under Grant VR20192203.

ABSTRACT Network intrusion detection systems are essential for the protection of advanced communication networks. Originally, these systems were hard-coded to identify specific signatures, patterns and rule violations; now, artificial intelligence and machine learning algorithms provide promising alternatives. However, the literature uses various outdated datasets as well as a plethora of different evaluation metrics to prove algorithm efficacy. To enable a global comparison, this study compiles algorithms for different configurations to create common ground and proposes two new evaluation metrics. These metrics, the detection score and the identification score, together reliably present the performance of a network intrusion detection system and allow for practical comparison on a large scale. Additionally, we present a workflow to process raw packet flows into input features for machine learning. This framework quickly implements different algorithms for the various datasets and allows systematic performance comparison between those algorithms. Our experimental results, which match and surpass the state of the art, indicate the potential of this approach. As raw traffic input features are much easier and cheaper to extract than traditional features, they show promise for application in real-time deep learning-based systems.

INDEX TERMS Intrusion detection, machine learning, neural networks, security.
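The abstract names a workflow that turns raw packet flows into machine learning input features; the paper's own procedure is described in Sect. VII. The Python sketch below is not that procedure. It is a minimal illustration under our own assumptions of why raw-traffic features are cheap to extract. Packets are grouped into flows by their 5-tuple. The first few bytes of the first few packets are then copied into a fixed-length vector, with no protocol parsing. The function names (group_into_flows, flow_to_features) and the defaults of 4 packets and 64 bytes per packet are hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical flow key: the classic 5-tuple (src IP, dst IP, src port, dst port, protocol).
FlowKey = Tuple[str, str, int, int, str]


def group_into_flows(packets: List[Tuple[FlowKey, bytes]]) -> Dict[FlowKey, List[bytes]]:
    """Group raw packets by 5-tuple, keeping arrival order within each flow.

    Simplification (assumption): flows are unidirectional here; a real pipeline
    might merge both directions of a connection into a single flow.
    """
    flows: Dict[FlowKey, List[bytes]] = defaultdict(list)
    for key, raw in packets:
        flows[key].append(raw)
    return dict(flows)


def flow_to_features(raw_packets: List[bytes], n_packets: int = 4, n_bytes: int = 64) -> List[float]:
    """Turn one flow into a fixed-length vector of n_packets * n_bytes values in [0, 1].

    Each of the first n_packets packets is truncated or zero-padded to n_bytes;
    missing packets become all-zero rows. Byte values 0..255 are scaled to [0, 1].
    """
    features: List[float] = []
    for i in range(n_packets):
        raw = raw_packets[i] if i < len(raw_packets) else b""
        row = raw[:n_bytes].ljust(n_bytes, b"\x00")  # truncate, then zero-pad
        features.extend(byte / 255.0 for byte in row)
    return features


if __name__ == "__main__":
    key: FlowKey = ("10.0.0.1", "10.0.0.2", 51515, 80, "TCP")
    flows = group_into_flows([(key, b"\x45\x00\x00\x3c"), (key, b"GET / HTTP/1.1\r\n")])
    vector = flow_to_features(flows[key])
    print(len(vector))  # 256 = 4 packets x 64 bytes
```

Every flow maps to a vector of the same length, so such inputs could be fed directly to a generic classifier or neural network without hand-crafted, protocol-specific statistics. The packet and byte budgets in this sketch are arbitrary and would need tuning per dataset.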

I. INTRODUCTION
Today, more and more devices are connected to the internet. Cisco forecasts that by 2023 there will be 29.3 billion devices connected to the internet [1]. As the attack surface grows, so does the need for security. For example, in 2015 a massive brute-force attack [2] on Alibaba resulted in the potential compromise of 21 million user accounts. In 2016, Internet-of-Things (IoT) devices infected with the Mirai botnet were used in a large Distributed Denial of Service (DDoS) attack against Domain Name System provider Dyn, resulting in the unavailability of many major internet platforms such as Spotify, Twitter and Netflix¹ [3].

¹ https://splinternews.com/here-are-the-sites-you-cant-access-because-someone-took-1793863079

One important link in the chain of protection against attacks is the intrusion detection system (IDS), responsible for identifying malicious activity and attacks [4]. Network intrusion detection systems aim to detect attacks by investigating network traffic. While they historically functioned through hard-coded rules, more and more research investigates the application of machine learning (ML). Academic research proposes many different network intrusion detection techniques and compares them against other techniques. However, the plethora of publicly available and potentially outdated datasets, the differences between those datasets, the variety of evaluation methods and the occasionally unclear reporting of proposed techniques significantly complicate a fair comparison. This paper aims to solve this issue and sets out a workflow that is used to run existing solutions on relevant datasets and that is made open source so that it can easily be applied to future solutions. Our contribution is four-fold:
• We give an overview of the most frequently used datasets and summarize the pros and cons of each dataset (Sect. IV).

The associate editor coordinating the review of this manuscript and approving it for publication was Vicente Alarcon-Aquino.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021 63995
L. Le Jeune et al.: Machine Learning for Misuse-Based Network Intrusion Detection

• We present existing evaluation methods and their drawbacks, and we propose newly derived unifying metrics for fair and reliable comparison of network intrusion detection performance (Sect. V).
• We provide an in-depth overview of ML techniques in the literature for network intrusion detection, with a focus on recent deep learning (DL) approaches, and we quantitatively compare and discuss these techniques based on results reported in related work as well as our own recalculations (Sect. VI) using our proposed metrics.
• We propose a workflow that allows for the use of raw network traffic in machine learning, as raw traffic-based features are more suitable for real-time application than traditional machine learning features for network intrusion detection (Sect. VII). The promising experimental results for various datasets and algorithms are comparable to the state of the art.

Before expounding on these contributions, we first provide background information on intrusion detection systems (Sect. II) and compare our work against related work (Sect. III).

II. BACKGROUND
Intrusion detection systems are systems that detect malicious behaviour. This section discusses network intrusion detection as one of several intrusion detection applications, as well as different approaches to building a network intrusion detection system.

A. INTRUSION DETECTION SYSTEMS
One of the first mentions of detecting malicious activity was made in 1980 by J.P. Anderson [5], who outlines the required components for what is now known as an IDS. In 1987, D. Denning introduced IDES (Intrusion-Detection Expert System), which was the foundation for many subsequent IDSs [6]. Today, different IDSs serve different goals. Host-based intrusion detection systems (HIDS), for example, aim to detect intrusions in a specific host [4], such as a computer or a server. Notably, this comprises not only network-based intrusions but also unauthorized use of the host. A HIDS example is given in [7], in which the anomalous use of applications is detected by monitoring system calls. Although this technique helps to identify intrusions via a specific host, it provides no insight into intrusions in other parts of the network.
By contrast, network-based IDSs (NIDS) detect network traffic intrusions [4]. Rather than keeping track of one host, NIDSs monitor network attacks by inspecting network traffic, detecting malicious communication through the flows and the generated features of the network packets.
Collaborative IDSs (CIDS) consist of multiple IDS nodes that exchange information in a centralized, decentralized or distributed manner [8]. This way, a large network can be better protected against highly distributed attacks [9]. One of the earlier mentions of such a system is given in [10]. Building what effectively is an intrusion detection network (IDN) allows the different IDS nodes to communicate and pass on useful information. However, this introduces a new security risk: malicious or compromised IDS nodes can give false feedback or withhold necessary feedback. Therefore, CIDS research involves verifying the trustworthiness of a node, e.g. [9], [11], [12].
Recently, a new branch of IDS research has emerged that investigates the security of the Internet-of-Things (IoT). IoT nodes require IDS technology different from that of regular networks, for three reasons: the limited resources of IoT nodes, their specific network topologies and their new communication protocols [13]. Some examples are [14], [15].
Another increasingly important domain concerns Wireless Sensor Networks (WSN). As defined in [16], these networks are characterized by the absence of infrastructure, wireless links, limited physical protection, a lack of central management and limited resources. While some WSN solutions are evaluated on NIDS datasets [17]–[20], WSN environments are intrinsically different from the traditional NIDS environment we consider in this paper, which has a fixed, wired infrastructure and abundant resources.
In practice, IDS implementations can combine the different types of IDSs. Reference [11], for example, uses different HIDS nodes in a CIDS system, where the HIDS nodes communicate to improve detection accuracy. IDSs that combine HIDS and NIDS technology are sometimes called hybrid IDS [21], [22]. Note that hybrid IDS can also denote an IDS that combines misuse-based and anomaly-based intrusion detection [14], [23], [24]. This is investigated more thoroughly in Sect. II-B. In this paper, we focus on network-based intrusion detection systems.

B. MISUSE-BASED OR ANOMALY-BASED
Generally, the functioning of any IDS can be described as either misuse-based or anomaly-based. Misuse-based² intrusion detection, also known as knowledge-based [4], in principle simply means that the IDS knows what certain attacks look like and detects attacks based on that knowledge. Therefore, misuse-based intrusion detection algorithms obtain low false positive rates (see Sect. V) when inspecting network traffic. Moreover, they can effectively detect known attacks and label them accordingly, which facilitates following up on the detection. There is, however, one glaring weakness of misuse-based intrusion detection systems: their inability to detect unknown and zero-day attacks. Since they are conditioned to detect known attacks, other attacks that do not share similarities with them go unnoticed.

² Traditionally, misuse-based approaches were signature-based, matching traffic against known attack patterns. In this paper, we include supervised machine learning solutions in this category, as they are trained to recognize specific attacks.

The counterpart to misuse-based intrusion detection is anomaly-based intrusion detection. Anomaly-based or behaviour-based intrusion detection creates a model of what