A Survey On Network Intrusion Detection
ABSTRACT
Network security is any activity designed to protect the usability and integrity of a network and its data. It
includes both hardware and software technologies. Effective network security manages access to the network:
it targets a variety of threats and stops them from entering or spreading on the network. Network security
combines multiple layers of defenses at the edge and in the network, and each layer implements
policies and controls. Authorized users gain access to network resources, but malicious actors are blocked from
carrying out exploits and threats. Networks are of two types, wired and wireless. The common
vulnerability that exists in both is unauthorized access to the network: an
attacker can connect a device to a network through an unsecured hub or switch port. In this regard, wireless
networks are considered less secure than wired networks, because a wireless network can be accessed without any
physical connection. Network security is a big topic and is growing into a high-profile Information Technology
(IT) specialty area. Security-related websites are tremendously popular with savvy Internet users, and the
popularity of security-related certifications has expanded. Once-esoteric security measures such as biometric
identification and authentication have become commonplace in corporate America. Even so, many organizations
still implement security measures in an almost haphazard way, with no well-thought-out plan for making all the
parts fit together. Computer security involves many aspects, from protection of the physical equipment to
protection of the electronic bits and bytes that make up the information that resides on the network.
Keywords : Unauthorized Access, Savvy Internet Users, Intrusion Detection System, NIDS, HIDS
IJSRSET1848160 | Received : 01 May 2018 | Accepted 11 May 2018 | May-June-2018 [(4) 8 : 595-613]
K. Veena et al. Int J S Res Sci. Engg. Tech. 2018 May-June;4(8) : 595-613
forecast that by 2020, the amount of data in existence will top 44 ZB [4]. The traffic capacity of modern
networks has drastically increased to accommodate the volume of traffic observed. Many modern backbone
links now operate at wire speeds of 100 Gbps or more. To contextualise the issue, a 100 Gbps link is
capable of handling 148,809,524 packets per second [5]. A NIDS would therefore need to complete the
analysis of a packet within 6.72 ns to operate at wire speed. Providing NIDS at such a speed is difficult,
and ensuring satisfactory levels of accuracy, effectiveness and efficiency also presents a significant
challenge.
2) Accuracy - To maintain the aforementioned levels of accuracy, existing techniques cannot be relied upon.
Greater levels of granularity, depth and contextual understanding are therefore required to provide a
more holistic and accurate view. This comes with various financial, computational and time costs.
3) Diversity - Recent years have seen an increase in the number of new or customised protocols being
utilised in modern networks. This can be partially attributed to the number of devices with network
and/or Internet connectivity. As a result, it is becoming increasingly difficult to differentiate
between normal and abnormal traffic and behaviours.
4) Dynamics - Given the diversity and flexibility of modern networks, their behaviour is dynamic and
difficult to predict. This makes it difficult to establish a reliable behavioural norm. It also raises
concerns as to the lifespan of learning models.
5) Low-frequency attacks - These types of attacks have often thwarted previous anomaly detection
techniques, including artificial intelligence approaches. The problem stems from imbalances in
the training dataset, meaning that NIDS offer weaker detection precision when faced with these types of
low-frequency attacks.
6) Adaptability - Modern networks have adopted many new technologies to reduce their reliance on
static technologies and management styles. As a result, dynamic technologies such as containerisation,
virtualisation and Software Defined Networks are in more widespread use. NIDSs will need to
be able to adapt to the usage of such technologies and the side effects they bring about.

1.2 DEEP LEARNING

Deep learning is an advanced sub-field of machine learning, which moves Machine Learning closer to
Artificial Intelligence. It facilitates the modelling of complex relationships and concepts [6] using
multiple levels of representation. Supervised and unsupervised learning algorithms are used to construct
successively higher levels of abstraction, defined using the output features from lower levels [7].

1) Auto-Encoder: A popular technique currently utilised within deep learning research is the
auto-encoder, which is utilised by our proposed solution. An auto-encoder is an unsupervised neural
network-based feature extraction algorithm, which learns the best parameters required to reconstruct its
output as close to its input as possible. One of its desirable characteristics is the capability to provide
a more powerful and non-linear generalisation than Principal Component Analysis (PCA). This is achieved by
applying back propagation and setting the target values to be equal to the inputs. In other words, it is
trying to learn an approximation to the identity function. An auto-encoder typically has an input
layer, an output layer (with the same dimension as the input layer) and a hidden layer. This hidden layer
normally has a smaller dimension than that of the input (known as an undercomplete or sparse
auto-encoder). Most researchers [8]–[10] use auto-encoders as a non-linear transformation to discover
interesting data structures, by imposing other constraints on the network, and compare the results with
those of PCA (a linear transformation). These methods are based on the encoder-decoder paradigm. The input
is first transformed into a typically lower-dimensional space (encoder) and then expanded to reproduce the
initial data (decoder). Once a layer is trained, its code is fed to the next, to better model highly
non-linear dependencies in the input.
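
The encoder-decoder paradigm described above can be illustrated with a minimal sketch. The Python below is not the implementation used by any of the works surveyed. It assumes a single tanh hidden layer, a linear decoder, a squared-error loss and per-record gradient descent, and the helper names (make_toy_records, encode, decode, train_autoencoder) and the toy four-feature records are hypothetical. The training target is set equal to the input and the hidden layer is smaller than the input, so the network has to learn a compressed code.

```python
import math
import random


def make_toy_records(n=50, seed=1):
    """Hypothetical 4-feature records that really depend on only 2 factors."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        records.append([a, b, (a + b) / 2, (a - b) / 2])
    return records


def encode(x, params):
    """Encoder: map an input record to its lower-dimensional hidden code."""
    W1, b1, _, _ = params
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]


def decode(h, params):
    """Decoder: expand a hidden code back to the input dimension."""
    _, _, W2, b2 = params
    return [sum(w * hj for w, hj in zip(row, h)) + b
            for row, b in zip(W2, b2)]


def train_autoencoder(data, hidden_dim=2, epochs=200, lr=0.02, seed=0):
    """Per-record gradient descent with back propagation; target = input."""
    rng = random.Random(seed)
    d = len(data[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(d)]
    b2 = [0.0] * d
    params = (W1, b1, W2, b2)
    history = []  # mean per-record squared reconstruction error, one per epoch
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = encode(x, params)
            x_hat = decode(h, params)
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            total += sum(e * e for e in err)
            # Gradient of 0.5 * sum(err^2), pushed back through tanh.
            grad_h = [sum(err[i] * W2[i][j] for i in range(d)) * (1 - h[j] ** 2)
                      for j in range(hidden_dim)]
            for i in range(d):
                for j in range(hidden_dim):
                    W2[i][j] -= lr * err[i] * h[j]
                b2[i] -= lr * err[i]
            for j in range(hidden_dim):
                for k in range(d):
                    W1[j][k] -= lr * grad_h[j] * x[k]
                b1[j] -= lr * grad_h[j]
        history.append(total / len(data))
    return params, history


if __name__ == "__main__":
    records = make_toy_records()
    params, history = train_autoencoder(records)
    print("first epoch error:", history[0], "last epoch error:", history[-1])
    print("code for first record:", encode(records[0], params))
```

The two-element list returned by encode plays the role of the compressed feature vector that a classifier, or a further auto-encoder layer, would take as its input.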
This paradigm focuses on reducing the dimensionality of input data. A special layer, the code layer [9],
at the centre of the deep auto-encoder structure is used to reduce dimensionality. The code layer is used
as a compressed feature vector for classification or for combination within a stacked auto-encoder [8]. The
hidden layer is used to create a lower-dimensionality version of high-dimensionality data (known as
encoding). By reducing the dimensionality, the auto-encoder is forced to capture the most prominent
features of the data distribution. In an ideal scenario, the data features generated by the auto-encoder
will provide a better representation of the data points than the raw data itself. The aim of the
auto-encoder is to try to learn the function shown in (1).

$h_{W,b}(x) \approx x$    (1)

where h = non-linear hypothesis using the parameters W and b,
W = weighting,
b = bias,
x = given data.

The learning process is described as a reconstruction error minimisation function, as shown in (2).

$L(x, d(f(x)))$    (2)

where L = loss function penalising d(f(x)) for being dissimilar to x,
d is a decoding function and
f is an encoding function.

2) Stacked Auto-Encoder: Unlike a simple auto-encoder, a deep auto-encoder is composed of two
symmetrical deep-belief networks, which typically have four or five shallow layers for encoding, and a
second set of four or five layers for decoding. The work by Hinton and Salakhutdinov [9] has produced
promising results by implementing a deep learning algorithm to convert high dimensional data to low
dimensional data by utilising a deep auto-encoder. Deep learning can be applied to auto-encoders,
whereby the hidden layers are the simple concepts and multiple hidden layers are used to provide depth,
in a technique known as a stacked auto-encoder. This increased depth can reduce computational costs and
the amount of required training data, as well as yielding greater degrees of accuracy [6]. The output
from each hidden layer is used as the input for a progressively higher level. Hence, the first layer of a
stacked auto-encoder usually learns first-order features in the raw input. The second layer usually learns
second-order features relating to patterns in the appearance of the first-order features. Subsequent
higher layers learn higher-order features. An illustrative example of a stacked auto-encoder is
shown in Fig. 2. Here, the superscript numbers refer to the hidden layer identity and the subscript
numbers signify the dimension for that layer.

The findings from our literature review have shown that despite the high detection accuracies being
achieved, there is still room for improvement. Weaknesses include the reliance on human operators,
long training times, inconsistent or average accuracy levels and the heavy modification of datasets (e.g.
balancing or profiling). The area is still in its infancy, with most researchers still experimenting with
combinations of various algorithms (e.g. training, optimisation, activation and classification) and
layering approaches to produce the most accurate and efficient solution for a specific dataset. Hence, we
believe the model and work presented in this paper will be able to make a valid contribution to the
current pool of knowledge.

II. LITERATURE SURVEY

Deep learning is garnering significant interest and its application is being investigated within many
research domains, such as healthcare [11], [12]; automotive design [13], [14]; manufacturing [15] and
law enforcement [16], [17]. There are also several existing works within the domain of NIDS.

Comparison Deep Learning Method to Traditional Methods Using for Network Intrusion Detection
Dong and Wang undertook a literary and experimental comparison between specific traditional NIDS
techniques and deep learning methods. Deep learning has gained prominence due to the potential it holds
for machine learning, and its techniques have been applied in many fields, such as pattern recognition and
classification. Intrusion detection gathers and analyses data from monitored security events to assess the
state of the network. Many traditional machine learning methods have been put forward for intrusion
detection, but their detection performance and accuracy still need to be improved. The work discusses
different methods that have been used to classify network traffic, applies them to an open data set and
experiments with them to find the best approach to intrusion detection.

Deep Learning in Network Security

Traffic identification is a key component in network security, since it raises the red flag in case of
intrusion into the network. Networks have relied on traditional methods of detection that are increasingly
becoming ineffective due to the commensurate increase in data. Traditional approaches include port-based
identification (for instance, of standard HTTP traffic), which is failing to perform as envisaged because
fewer protocols follow the standard port assignments. Another approach is the signature-based method, which
relies on the payload data and has been used in several applications.
Detecting intrusion is always the challenge of differentiating between normal and abnormal data on the
system. The detection approach should define the characteristics of malicious data on the system. Further,
the technique should be able to design a classification system that accurately differentiates between the
two sets of information, that is, genuine and malicious data. One such system is the dimensionality
reduction technique, which uses automatic encoding approaches to calculate the distance between nodes
within the network. Importantly, the technique works on the assumption that the normality of the data is
determined by the consistency of the distance between the nodes. As such, a longer distance between nodes
is indicative of abnormal information, thus acting as a pointer to the presence of malicious data. In
relation to this, two distance measures are used: the Manhattan distance, which is the total of the
absolute differences across the dimensions, and the Euclidean distance, which is essentially the length of
the vector being assessed.

Deep Learning through Analysis of Data Patterns in Network Security

The growth of the information technology field has created a need for newer and better methods of
analyzing how computer systems operate. Several methods in machine learning exist that try to
investigate the principles behind the devices. The field of deep learning is dynamic due to the
development of new techniques in several sub-branches, including image recognition, computer
security and speech recognition. The classical methods of deep learning used in network security
are increasingly failing to detect intrusions into network systems due to the commensurate increase in
data production. As such, big data analysis using deep belief systems is the latest innovation that tries
to study information patterns with a view to detecting unauthorized entry into computer networks.
The use of deep learning has in recent times gained prominence due to its effectiveness in evaluating
network security. It has enabled the exhaustive and conclusive assessment of network security. Traditional
methods of network security are increasingly failing to function effectively due to the increased volume of
data being processed. Deep learning has revolutionized the evaluation of challenges in network security.
The system uses several approaches to detect abnormalities in the system that include
anomaly detection and traffic identification. The system faces certain limitations, including the integrity
of the data used to generate inputs and outputs. Similarly, new methods of deep learning are gaining
traction due to demand for faster and more efficient data assessment. Deep belief and deep coding
techniques have enabled the analysis of large data sets and deeper system analysis respectively.

Deep Learning and Its Applications to Machine Health Monitoring: A Survey

Deep learning (DL) has become a rapidly growing research direction, redefining state-of-the-art
performance in a wide range of areas such as object recognition, image segmentation, speech recognition
and machine translation. In modern manufacturing systems, data-driven machine health monitoring is
gaining in popularity due to the widespread deployment of low-cost sensors and their connection
to the Internet. Deep learning provides useful tools for processing and analyzing these big machinery data.
The main purpose of the survey is to review and summarize the emerging research work on deep
learning for machine health monitoring. The applications of deep learning in machine health
monitoring systems are reviewed mainly from the following aspects: Autoencoder (AE) and its variants,
Restricted Boltzmann Machines and their variants, including Deep Belief Network (DBN) and Deep
Boltzmann Machines (DBM), Convolutional Neural Networks (CNN) and Recurrent Neural Networks
(RNN). Finally, some new trends in DL-based machine health monitoring methods are discussed.

Deep Learning
As a feed-forward neural network, an auto-encoder consists of two phases, the encoder and the decoder.
The encoder takes an input x and transforms it to a hidden representation h via a non-linear mapping as
follows:
$h = \phi(Wx + b)$
where $\phi$ is a non-linear activation function. Then, the decoder maps the hidden representation back to
the original representation in a similar way.

Model parameters are optimized to minimize the reconstruction error between $z = f_\theta(x)$ and x. One
commonly adopted measure for the average reconstruction error over a collection of N data
samples is the squared error, which defines the corresponding optimization problem. The hidden
representation h can be regarded as a more abstract and meaningful representation of data sample x. The
hidden size is set to be larger than the input size in this AE, a choice that has been verified
empirically. To prevent the learned transformation from being the identity and to regularize the
auto-encoder, a sparsity constraint is imposed on the hidden units.

Addition of Denoising: Different from a conventional AE, a denoising AE takes a corrupted version of the
data as input and is trained to reconstruct (denoise) the clean input x from its corrupted sample
$\tilde{x}$. The most commonly adopted noise is dropout noise (binary masking noise), which randomly sets a
fraction of the input features to zero. This variant of AE is the denoising auto-encoder (DA), which can
learn a more robust representation and is prevented from learning the identity transformation.

Stacking Structure

Several DAs can be stacked together to form a deep network and learn high-level representations by
feeding the outputs of the l-th layer as inputs to the (l + 1)-th layer. Training is done greedily, one
layer at a time.

Convolutional Neural Network
Convolutional neural networks (CNNs) were first proposed by LeCun for image processing and feature two key
properties: spatially shared weights and spatial pooling. CNN models have shown their success in various
computer vision applications where input data is usually 2D data. CNN
has also been introduced to address sequential data, including in Natural Language Processing and Speech
Recognition. CNN aims to learn abstract features by alternating and stacking convolutional kernels and
pooling operations. In a CNN, the convolutional layers (convolutional kernels) convolve multiple local
filters with the raw input data and generate invariant local features, and the subsequent pooling layers
extract the most significant features, with a fixed length, over sliding windows of the raw input data.
2D-CNNs have been illustrated extensively in previous research compared with 1D-CNNs, so only the
mathematical details behind the 1D-CNN are given in the survey.

The survey gives a systematic overview of the state-of-the-art DL-based machine health monitoring systems
(MHMS). Deep learning, as a subfield of machine learning, is serving as a bridge between big machinery
data and data-driven MHMS. Deep learning has been applied in various machine health monitoring
tasks within the past four years. The proposed DL-based MHMS are summarized according to four categories
of DL architecture: Auto-encoder models, Restricted Boltzmann Machine models,
Convolutional Neural Networks and Recurrent Neural Networks. Since the momentum of research on DL-based
MHMS is growing fast, the capabilities of these DL techniques, especially representation learning for
complex machinery data and target prediction for various machine health monitoring tasks, can be conveyed
to readers. It can be found that DL-based MHMS do not require extensive human labor and expert knowledge,
i.e., the end-to-end structure is able to map raw machinery data to targets. Therefore, the application
of deep learning models is not restricted to specific kinds of machines, and they can be a general
solution to address machine health monitoring problems.

Toward an Online Anomaly Intrusion Detection System Based on Deep Learning
Alrawashdeh and Purdy proposed using an RBM with one hidden layer to perform unsupervised
feature reduction.

The progress in intrusion detection has been steady but slow for the past twenty years. The biggest
challenge is to detect new attacks in real time. A deep learning approach for anomaly detection using a
Restricted Boltzmann Machine (RBM) and a deep belief network is implemented. Our method uses a
one-hidden-layer RBM to perform unsupervised feature reduction. The resultant weights from this RBM are
passed to another RBM, producing a deep belief network. The pre-trained weights are passed into a
fine-tuning layer consisting of a Logistic Regression (LR) classifier with multi-class soft-max. The deep
learning architecture was implemented in C++ in Microsoft Visual Studio 2013, using the DARPA KDDCUP'99
dataset to evaluate its performance. The proposed architecture outperforms previous deep learning methods
implemented by Li and Salama in both detection speed and accuracy. We achieve a detection rate of
97.9% on the total 10% KDDCUP'99 test dataset. By improving the training process of the simulation, a low
false negative rate of 2.47% is produced. Although the deficiencies in the KDDCUP'99 dataset are well
understood, it still presents machine learning approaches for predicting attacks with a reasonable
challenge. Future work will include application of the machine learning strategy to larger and more
challenging datasets, which include larger classes of attacks.

The weights are passed to another RBM to produce a DBN. The pre-trained weights are passed into a
fine-tuning layer consisting of a Logistic Regression classifier (trained with 10 epochs) with multi-class
soft-max. The proposed solution was evaluated using the KDD Cup '99 dataset. The authors claimed a
detection rate of 97.90% and a false negative rate of 2.47%. This is an improvement over other results.

Method of intrusion detection using deep neural network
The work by Kim et al. [19] aspired to specifically target advanced persistent threats. An artificial
intelligence (AI) intrusion detection system using a
deep neural network (DNN) was investigated and tested with the KDD Cup 99 dataset in response to
ever-evolving network attacks. First, the data were preprocessed through data transformation and
normalization for input to the DNN model. The DNN algorithm was applied to the data refined through
preprocessing to create a learning model, and the entire KDD Cup 99 dataset was used to verify it. Finally,
the accuracy, detection rate and false alarm rate were calculated to ascertain the detection efficacy of
the DNN model, which was found to generate good results for intrusion detection.

The proposed system was a Deep Neural Network (DNN) using 100 hidden units, combined with the
Rectified Linear Unit activation function and the ADAM optimiser. The approach was implemented on
a GPU using TensorFlow and evaluated using the KDD data set. The authors claimed an average
accuracy rate of 99% and concluded that both RNN and Long Short-Term Memory (LSTM) models are
needed for improving future defences.

A Deep Learning Approach for Network Intrusion Detection System

A Network Intrusion Detection System (NIDS) helps system administrators to detect network security
breaches in their organizations. However, many challenges arise while developing a flexible and
efficient NIDS for unforeseen and unpredictable attacks. We propose a deep learning based approach
for developing such an efficient and flexible NIDS. Self-taught Learning (STL), a deep learning based
technique, is used on NSL-KDD, a benchmark dataset for network intrusion.

Self-Taught Learning
Self-taught Learning (STL) is a deep learning approach that consists of two stages for the
classification. First, a good feature representation is learnt from a large collection of unlabeled data
$x_u$, termed Unsupervised Feature Learning (UFL). In the second stage, this learnt representation is
applied to labeled data $x_l$ and used for the classification task. Although the unlabeled and labeled data
may come from different distributions, there must be relevance among them. There are different approaches
used for UFL, such as Sparse Autoencoder, Restricted Boltzmann Machine (RBM), K-Means Clustering and
Gaussian Mixtures. Sparse autoencoder based feature learning is used for the work due to its relatively
easier implementation and good performance. A sparse autoencoder is a neural network that consists of an
input, a hidden and an output layer. The input and output layers contain N nodes and the hidden layer
contains K nodes.

NSL-KDD Dataset
NSL-KDD, an improved and reduced version of the KDD Cup 99 dataset [20], is used. The KDD Cup
dataset was prepared using the network traffic captured by the 1998 DARPA IDS evaluation program
[22]. The network traffic includes normal traffic and different kinds of attack traffic, such as DoS,
Probing, user-to-root (U2R) and remote-to-local (R2L). The network traffic for training was collected for
seven weeks, followed by two weeks of traffic collection for testing, in raw tcpdump format. The test data
contains many attacks that were not injected during the training data collection phase, to make the
intrusion detection task realistic. It is believed that most of the novel attacks can be derived from the
known attacks. Finally, the training and test data were processed into datasets of five million and two
million TCP/IP connection records respectively.

The KDD Cup dataset has been widely used as a benchmark dataset for many years in the evaluation
of NIDS. One of the major drawbacks with the dataset is that it contains an enormous amount of redundant
records in both the training and test data. It was observed that almost 78% and 75% of the records are
redundant in the training and test datasets respectively. The resulting redundancy makes the learning
algorithms biased towards the frequent attack records
and leads to poor classification results for the infrequent and harmful records. The training and test
data were classified with minimum accuracies of 98% and 86% respectively using a very simple machine
learning algorithm. This made the comparison task difficult for various IDSs based on different learning
algorithms. NSL-KDD was proposed to overcome the limitations of the KDD Cup dataset. The dataset is derived
from the KDD Cup dataset and improves on it in two ways. First, it eliminated all the
redundant records from the training and test data. Second, it partitioned all the records in the KDD Cup
dataset into various difficulty levels based on the number of learning algorithms that can correctly
classify the records. Further, it selects the records by random sampling of distinct records from different
difficulty levels in a fraction that is inversely proportional to their fractions in the distinct records.
This multi-step processing of the KDD Cup dataset made the total record statistics reasonable in the
NSL-KDD dataset.

A deep learning based approach was developed for an efficient and flexible NIDS. A sparse auto-encoder and
soft-max regression based NIDS was implemented. The benchmark network intrusion dataset NSL-KDD
was used to evaluate anomaly detection accuracy. We observed that the proposed NIDS performed very well
compared to previously implemented NIDSs for normal/anomaly detection when evaluated on the test
data. The performance can be further enhanced by applying techniques such as the Stacked Auto-encoder, an
extension of the sparse auto-encoder in deep belief nets for unsupervised feature learning, and NB-Tree,
Random Tree, or J48 for further classification. It was noted that the latter techniques performed well when
applied directly on the dataset.

Accelerated deep neural networks for enhanced Intrusion Detection System
The enhanced intrusion detection method uses 41 features, and its DNN has 3 hidden layers (2
auto-encoders and 1 soft-max). The results obtained were mixed, and those focusing on fewer classes were
more accurate than those with more classes.

Network-based communication has become more vulnerable to outsider and insider attacks in recent days due
to its widespread application in many fields. An Intrusion Detection System (IDS), a software application
or hardware, is a security mechanism that is able to monitor network traffic and find abnormal activities
in the network. Machine learning techniques, which have an important role in detecting attacks, have
mostly been used in the development of IDS. Due to the huge increase in network traffic and in the
different types of attacks, monitoring each and every packet in the network traffic is time-consuming and
computationally intensive. Deep learning acts as a powerful tool by which thorough packet inspection and
attack identification are possible. The parallel computing capabilities of the neural network enable the
Deep Neural Network (DNN) to look through the network traffic effectively, with accelerated performance.
An accelerated DNN architecture is developed to identify the abnormalities in the network data. The
NSL-KDD dataset is used to compute the training time and to analyze the effectiveness of the detection
mechanism.

Analyzing flow-based anomaly intrusion detection using Replicator Neural Networks
Flow-based anomaly detection is an unsupervised method to learn models of normal network flows. The
method uses Replicator Neural Networks, auto-encoders and the dropout concept from deep learning.
The exact accuracy of the proposed method is not fully disclosed. Defending key
network infrastructure, such as Internet backbone links or the communication channels of critical
infrastructure, is still challenging. The inherently complex nature and the quantity of network data impede
the detection of attacks in real-world settings. Features of network flows, characterized by
their entropy, together with an extended version of the original Replicator Neural Network (RNN) and
deep learning techniques, are used to learn models of normality.
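
As a rough illustration of entropy-based flow features, the sketch below computes the Shannon entropy of a few fields over a window of flow records. The field names (src_ip, dst_ip, dst_port) and the function names are assumptions made for this example, since the exact feature set of the surveyed work is not given here, and the Replicator Neural Network that would consume these features is not shown.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flow_entropy_features(flows, fields=("src_ip", "dst_ip", "dst_port")):
    """Entropy of each field over one time window of flow records (dicts).

    The field names are hypothetical; any categorical flow attribute works.
    """
    return {f: shannon_entropy([flow[f] for flow in flows]) for f in fields}


# Toy windows: ordinary web traffic versus a port scan from one source.
normal_window = [
    {"src_ip": f"10.0.0.{i % 4}", "dst_ip": "10.0.1.1", "dst_port": 443}
    for i in range(8)
]
scan_window = [
    {"src_ip": "10.0.0.9", "dst_ip": "10.0.1.1", "dst_port": 1000 + i}
    for i in range(8)
]

if __name__ == "__main__":
    print(flow_entropy_features(normal_window))
    print(flow_entropy_features(scan_window))
```

A scan that touches many distinct ports yields a higher dst_port entropy than a window of traffic concentrated on one service, which is the kind of shift a model of normality could pick up.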
The combination allows us to apply anomaly-based intrusion detection on arbitrarily large
amounts of data and large networks. The approach is unsupervised and requires no labeled data. It also
accurately detects network-wide anomalies without presuming that the training data is completely free of
attacks. The evaluation of the intrusion detection method on real network data indicates that it can
accurately detect resource exhaustion attacks and network profiling techniques of varying intensities.
The developed method is efficient because a normality model can be learned by training an RNN
within only a few seconds.

Deep learning approach for Network Intrusion Detection in Software Defined Networking
The method monitors network flow data. The paper lacked details about its exact algorithms
but does present an evaluation using the NSL-KDD dataset, which the authors claim gave an accuracy of
75.75% using six basic features.

Software Defined Networking (SDN) has recently emerged to become one of the promising solutions for
the future Internet. With the logical centralization of controllers and a global network overview, SDN
brings us a chance to strengthen our network security. However, SDN also brings us a dangerous increase in
potential threats. A deep learning approach for flow-based anomaly detection in an SDN
environment is applied. A Deep Neural Network (DNN) model is built for an intrusion detection system and
trained with the NSL-KDD Dataset. Six basic features (that can be easily obtained
in an SDN environment) are taken from the forty-one features of the NSL-KDD Dataset. Through experiments,
we confirm that the deep learning approach shows strong potential to be used for flow-based anomaly
detection in SDN environments.

Intrusion Detection System Using Deep Neural Network for In-Vehicle Network Security
A novel intrusion detection system (IDS) using a deep neural network (DNN) is proposed to enhance the
security of the in-vehicular network. The parameters building the DNN structure are trained with
probability-based feature vectors that are extracted from the in-vehicular network packets. For a given
packet, the DNN provides the probability of each class, discriminating normal and attack packets, and
thus the sensor can identify any malicious attack on the vehicle. Compared to the traditional artificial
neural network applied to the IDS, the proposed technique adopts recent advances in deep learning studies,
such as initializing the parameters through the unsupervised pre-training of deep belief networks (DBN),
thereby improving the detection accuracy. It is demonstrated with experimental results that the proposed
technique can provide a real-time response to the attack with a significantly improved detection ratio in
a controller area network (CAN) bus.

Proposed Intrusion Detection System with Deep Neural Network Structure
The intrusion detection system considers a general type of attack scenario where malicious data
packets are injected into an in-vehicle CAN bus. In-vehicular networks are accessed from mobile
communication links such as 3G, 4G and WiFi, or from a self-diagnostic tool such as On-Board Diagnostics
paired with the driver's mobile device. The intrusion detection system monitors the CAN packets broadcast
on the bus and determines whether an attack is taking place.

CAN Packet Feature
The CAN feature is an abstract representation of a CAN packet. The feature is designed with
computational efficiency in mind. It is extracted directly from the bit stream of a CAN packet, so that
decoding is not necessary during the extraction. The occurrences of bit-symbols in a data packet are taken
into account. The DATA field chosen includes 64 bit positions (= 8 bytes) in the CAN syntax, and the
probability distributions of the bit-symbols over these positions are investigated.

Training the Deep Neural Network Structure
The learning mechanism of the proposed DNN structure to classify a normal packet and an attack
packet is explained. An input layer, multiple hidden layers, and an output layer are used. The
feature vector is fed to the input nodes of the structure. Each node computes an output with a
rectified linear unit (ReLU) activation function, and linear combinations of the outputs are
passed to the next hidden layer.

Attack Detection
The class of a testing CAN packet is predicted in the detection phase. The output is computed with
the trained weight parameters and the feature set extracted from the testing CAN packet, as in
training. The classifier outputs the logistic value 0 or 1, indicating whether the sample is a
normal packet or an attack packet respectively.

The work presents an efficient intrusion detection system (IDS) based on a deep neural network
(DNN) for the security of in-vehicular networks. The parameters of the DNN are trained with
probability-based feature vectors extracted from the in-vehicular network packets, using the
unsupervised pre-training method of deep belief networks followed by the conventional stochastic
gradient descent method. The DNN provides the probability of each class to discriminate normal and
hacking packets, and thus the system can identify a malicious attack on the vehicle. A novel
feature vector was proposed, comprising the mode information and the value information extracted
from the network packets, and it was used efficiently in training and testing. Experimental
results demonstrated that the proposed technique could provide a real-time response to the attack
with a detection ratio of about 98% on average when the number of layers, and hence the
computational complexity, is modestly small.

Shallow and Deep Networks Intrusion Detection System: A Taxonomy and Survey
Intrusion detection has attracted considerable interest from researchers and industry. The
community, after many years of research, still faces the problem of building reliable and
efficient IDS that are capable of handling large quantities of data with changing patterns in
real-time situations. The work presented in this manuscript classifies intrusion detection systems
(IDS). Moreover, a taxonomy and survey of shallow and deep networks intrusion detection systems is
presented, based on previous and current works. The taxonomy and survey review machine learning
techniques and their performance in detecting anomalies. Feature selection, which influences the
effectiveness of machine learning (ML) IDS, is discussed to explain its role in the classification
and training phases of ML IDS. A discussion of the false and true positive alarm rates is
presented to help researchers model reliable and efficient machine learning based intrusion
detection systems.

Anomaly Based Detection
Anomaly Based Detection is a behavior-based intrusion detection approach. It observes changes in
normal activity within a system by building a profile of the system being monitored. The profile
is generated over a period of time during which the system is established to have behaved
normally. One advantage is that it offers the ability to detect attacks which are new to the
system.

Self-learning: The self-learning system operates by example, with a baseline set for normal
operation. This is achieved by building a model of the underlying processes from the observed
system traffic accumulated over a period of time. Self-learning systems are sub-divided into two
main categories: time series models and machine learning.

Programmed: A programmed model is one in which the system needs either a user or an external
person to teach it to detect changes in behavior. The user decides the extent of abnormal behavior
in the system that flags an intrusion threat. The programmed models are grouped into categories
that include threshold, simple rule-based and statistical models.
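
The threshold category of programmed anomaly detection can be illustrated with a minimal sketch.
It assumes a single feature (packet size), a profile made of the mean and standard deviation of
that feature during normal operation, and a cutoff of three standard deviations. The function
names build_profile and is_anomalous, the sample values and the cutoff are hypothetical choices
for illustration and are not taken from any of the surveyed works.

```python
from statistics import mean, stdev


def build_profile(normal_sizes):
    """Summarise packet sizes observed while the system behaved normally."""
    return mean(normal_sizes), stdev(normal_sizes)


def is_anomalous(size, profile, k=3.0):
    """Flag a packet whose size deviates from the profile mean by more than k standard deviations."""
    mu, sigma = profile
    return abs(size - mu) > k * sigma


# Hypothetical packet sizes (bytes) collected during a period of normal behaviour.
profile = build_profile([500, 520, 480, 510, 490, 505, 495])

print(is_anomalous(502, profile))   # close to the learned profile
print(is_anomalous(1500, profile))  # far outside the learned profile
```

The cutoff k is the value the user of a programmed model would set: a smaller k flags more traffic
and raises the false alarm rate, while a larger k lets more deviations pass unnoticed.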
Signature Based Detection defines a set of rules used to match patterns in the network traffic. If
a match is detected, it raises an alarm. It has the advantage of detecting attacks with a low
false positive ratio. Its drawback is that it can detect only attacks known to the database.
Signature based detection systems are programmed with distinct decision rules. The rules set for
detection are coded in a straightforward manner to detect intrusion.

State modeling is the encoding of attacks as a number of different states in a finite automaton.
Each of these states has to be observed in the traffic profile for the activity to be considered
an intrusion. It occurs in sub-classes, as time series models do. One sub-class is state
transition, proposed by Porras et al., which uses a state transition diagram to represent an
intrusion. The approach models an intrusion as a series of state transitions, which are described
as signature actions and state descriptions.

String matching is a process of knowledge acquisition, just like an expert system, but it takes a
different approach to exploiting the knowledge. It deals with matching the patterns in the audit
events generated by the attack but is not involved in the decision making process. This technique
has been used effectively as an IDS.

Machine Learning Techniques
Machine Learning (ML) can provide IDS methods to detect current, new and subtle attacks without
extensive human-based training or intervention. It is defined as a set of methods that can
automatically detect patterns to predict future data trends. A large number of machine learning
techniques exist; the fundamental operation of all of them relies upon optimal feature selection.
The features are the metrics which will be used to detect patterns and trends. If one feature of a
network is the packet size, machine learning techniques may monitor the packet size over time and
generate distributions from which conclusions may be drawn regarding an intrusion.

Shallow and deep networks intrusion detection systems have gained considerable interest
commercially and amongst the research community. With growing data sizes, intrusion detection
systems should be able to handle noisy data with high detection accuracy and high computational
speed. The work gives an overview of the general classification of intrusion detection systems and
a taxonomy covering recent and past works. The taxonomy gives a clear description of intrusion
detection systems and their complexity. Current studies of deep learning intrusion detection
systems have been reviewed to help address the challenges of this technique, which is still in its
early stages in intrusion detection. The scope of the work covers classifying intrusion detection
systems and reviewing the various methods of detecting anomalies and their performance, based on
past and recent works, revealing the advantages and disadvantages of each of them.

The focus of the described experiments on shallow and deep networks is to compare the performance
of these learning algorithms. The experiments demonstrated that deep networks significantly
outperformed the shallow network in the detection of attacks. CNN has not been exploited in the
field of intrusion detection but has proven to be a good classifier. DBN is also new in this
field, and experimental works are still in progress to determine the reliability of these learning
algorithms in detecting attacks. Signature based techniques have been in use commercially but have
not been able to detect all types of attacks, especially when the IDS signature list does not
contain the right signature.

A Deep Learning Based DDoS Detection System in Software-Defined Networking (SDN)
Distributed Denial of Service (DDoS) is one of the most prevalent attacks in an organizational
network infrastructure. A deep learning based multi-vector DDoS detection system is used in a
software-defined
network (SDN) environment. SDN provides flexibility to program network devices for different
objectives and eliminates the need for third-party vendor-specific hardware. The system was
implemented as a network application on top of an SDN controller. Deep learning was used for
feature reduction of a large set of features derived from network traffic headers. The system was
evaluated on different performance metrics by applying it to traffic traces collected from
different scenarios. High accuracy with a low false-positive rate for attack detection was
observed in the proposed system.

Software-Defined Networking (SDN)
SDN architecture decouples the control plane and data plane from network devices, also termed
'switches', and makes them simple packet forwarding elements. The decoupling of control logic and
its unification in a centralized controller offers several advantages compared to the current
network architecture, which integrates both planes tightly. Administrators can implement policies
from a single point, i.e. the controller, and observe their effects on the entire network, which
makes management simple and less error-prone and enhances security. Switches become generic and
vendor-agnostic. Applications that run inside a controller can program these switches for
different purposes, such as layer 2/3 switch, firewall, IDS or load balancer, using the API that
the controller offers to them.

Stacked Auto encoder (SAE)
Stacked sparse auto-encoders and a soft-max classifier are used for unsupervised feature learning
and classification respectively. A sparse auto-encoder is a neural network that consists of three
layers, in which the input and output layers contain M nodes and the hidden layer contains N
nodes. The M nodes at the input represent a record with M features, i.e., X = {x1, x2, ..., xM}.
The output layer is made an identity function of the input layer for training purposes, i.e.,
X̂ = X.

The work proposes a deep learning based DDoS detection system for multi-vector attack detection in
an SDN environment. The proposed system identifies the individual DDoS attack class with an
accuracy of 95.65%. It classifies the traffic into normal and attack classes with an accuracy of
99.82% and a very low false-positive rate compared to other works. Future enhancements are to
reduce the controller's bottleneck and to implement a NIDS that can detect different kinds of
network attacks in addition to DDoS attacks.

A deep learning-based RNNs model for automatic security audit of short messages
Traditional text classification methods usually follow this process: first, a sentence is
considered as a bag of words (BOW), then it is transformed into a sentence feature vector, which
can be classified by methods such as maximum entropy (ME), Naive Bayes (NB), support vector
machines (SVM) and so on. These methods do not give an ideal result. The most important reason is
that the semantic relations between words are very important for text categorization, but the
traditional methods cannot capture them. Sentiment classification, as a special case of text
classification, is a binary classification (positive or negative). Inspired by sentiment analysis,
a novel deep learning-based recurrent neural networks (RNNs) model for automatic security audit of
short messages from prisons is used to classify short messages (secure and insecure). The features
of short messages are extracted by word2vec, which captures word order information, and each
sentence is mapped to a feature vector. In particular, words with similar meaning are mapped to a
similar position in the vector space and then classified by RNNs. RNNs are now widely used, and
their network structure means that they can easily process sequence data. Short messages are
preprocessed to extract typical features from existing secure and insecure short messages via
word2vec, and the short messages are classified through RNNs which accept a fixed-sized vector as
input and produce a fixed-sized vector
as output. The experimental results show that the RNNs model achieves an average accuracy of
92.7%, which is higher than that of SVM.

An automatic security auditing tool for short messages (SMS) is developed. The application is
based upon the RNN model. The authors claimed that their evaluations resulted in an accuracy rate
of 92.7%, thus improving on existing classification methods (e.g. SVM and Naive Bayes).

A deep learning approach for detecting malicious JavaScript code
Malicious JavaScript code in web pages on the Internet is an emergent security issue because of
its universality and potentially severe impact. Because of its obfuscation and complexity,
detecting it has a considerable cost. Over the last few years, several machine learning-based
detection approaches have been proposed; most of them use shallow discriminating models with
features that are constructed with artificial rules. However, with the advent of the big data era
for information transmission, the existing methods can no longer satisfy actual needs. A new deep
learning framework for detection of malicious JavaScript code is proposed, which achieves the
highest detection accuracy compared with the control group. The architecture is composed of a
sparse random projection, a deep learning model and logistic regression. Stacked denoising
auto-encoders were used to extract high-level features from JavaScript code; logistic regression
was used as a classifier to distinguish between malicious and benign JavaScript code.

Deep learning
Deep learning is an emerging research field of machine learning; it attempts to hierarchically
learn high-level representations of data with deep neural networks. Each layer of the network is
first pre-trained layer-wise via unsupervised learning, and then the entire network is fine-tuned
in a supervised manner. In this manner, high-hierarchy features can be learned from low-hierarchy
ones, and the appropriate features can be applied to pattern classification in the end. According
to the universal approximation theorem of neural networks, deep models have a better ability to
represent nonlinear functions than shallow ones; therefore, deep models can achieve better results
on large-scale training data. Furthermore, from a feature recognition and classification point of
view, the deep learning framework incorporates a feature extractor and a classifier into one
framework, which can automatically learn feature representations (often from unlabeled data), thus
avoiding substantial effort to manually design features. Typical deep neural networks include
convolutional neural networks, stacked auto-encoders, stacked denoising auto-encoders (SdA), deep
Boltzmann machines and deep belief networks. In this research, in order to learn more useful
features with unsupervised pre-training, the focus is on SdAs for two reasons. First, the SdA is
suitable for text classification, especially when the input is in binary form. Second, according
to the results described by Vincent et al., the SdA yields better results than other unsupervised
methods when the input is binary high-dimensional data.

Denoising auto-encoder (dA)
The dA is an extension of a classical auto-encoder. In real applications, problems such as missing
data and data noise make the theoretical method impractical. In order to force the hidden layer to
learn more robust features, the auto-encoder is trained to reconstruct the input from corrupted
input data. To convert the auto-encoder into a dA, a stochastic noise (i.e., a stochastic
corruption step) is introduced into the input layer. The dA thus reconstructs the inputs from a
noisy version of them by minimizing the reconstruction loss. A dA first encodes x to a hidden
representation y through a deterministic mapping, where y ∈ R^h, and then decodes representation y
back into a reconstruction z of the same shape as x through a similar
transformation, where z ∈ R^d. Hence, the dA attempts to reconstruct the original inputs x from
the corrupted values.

Stacked denoising auto-encoders (SdA)
An SdA model is created by integrating multiple dAs to form a deep network. Every hidden layer in
the network is trained as a dA by minimizing its reconstruction loss through unsupervised
pre-training, and the output of a dA on one layer is taken as the input of the next. More
precisely, the first hidden layer is trained as a dA with JavaScript code vectors as input, and
once its training is finished, the output of the first hidden layer is used as the input of the
second hidden layer. Similarly, once the k-th hidden layer is trained, the (k + 1)-th layer can be
trained using the output of the k-th hidden layer as its input to compute the latent
representation. Multiple dAs can be stacked hierarchically. To use the SdA model for malicious
JavaScript code detection, a logistic regression classifier is applied to the output of the last
hidden layer to distinguish between malicious and benign JavaScript code. The parameters are then
adjusted throughout the entire network by making use of the target class labels.

The method uses deep features extracted by the SdA for classification, to detect JavaScript on
webpages as either malicious or not. Experimental results indicated that features extracted by the
SdAs were useful for pattern classification and that the SdA helped to improve the accuracy of
both the logistic regression and the SVM classifiers. When compared with other feature extraction
methods such as principal component analysis, independent component analysis and factor analysis
(FA), the SdA has the highest classification accuracy. The proposed SdA-LR model was verified to
have higher statistical evaluations than the existing methods. The experimental results showed
that building the deep architecture network with three layers of auto-encoders and 250 hidden
units in each layer was the optimal choice for JavaScript code classification. The limitation of
the SdA-LR is the long training time; however, the fast testing time makes up for this deficiency.
The classifier has some other potential drawbacks. One obvious drawback is that the classifier is
likely to identify a small number of good JavaScript codes as potentially bad. JavaScript codes in
some websites have been packed, both to compress and to obfuscate them; the reason for doing this
is to reduce the size of the initial JavaScript code and to make it more difficult to find out
what happens in the code and to steal the source code. Benign packed JavaScript codes are the most
likely to be categorized as malicious by the classifier. Namely, benign packed JavaScript code
might yield false positives to a large extent.

Deep4MalDroid: A Deep Learning Framework for Android Malware Detection Based on Linux Kernel
System Call Graphs
Deep4MalDroid is a commercial Android malware detection framework. The method involves the use of
stacked auto-encoders, with the best accuracy resulting from 3 layers. 10-fold cross validation
was used, showing that, in comparison to shallow learning, the approach offers improved detection
performance.

With the explosive growth of Android malware and the damage it causes to smart phone users (e.g.,
stealing user credentials, resource abuse), Android malware detection is one of the cyber security
topics of great interest. Currently, the most significant line of defense against Android malware
is anti-malware software products, such as Norton, Lookout and Comodo Mobile Security, which
mainly use the signature-based method to recognize threats. However, malware attackers
increasingly employ techniques such as repackaging and obfuscation to bypass signatures and defeat
attempts to analyze their inner mechanisms.

The increasing sophistication of Android malware calls for new defensive techniques that are
harder to
evade and are capable of protecting users against novel threats. A novel dynamic analysis method
named Component Traversal was proposed that can automatically execute the code routines of each
given Android application (app) as completely as possible. Based on the extracted Linux kernel
system calls, weighted directed graphs are constructed, and a deep learning framework resting on
the graph-based features is then applied to detect newly unknown Android malware.

A comprehensive experimental study on a real sample collection from Comodo Cloud Security Center
is performed to compare various malware detection approaches. Promising experimental results
demonstrate that the proposed method outperforms other alternative Android malware detection
techniques. The developed system Deep4MalDroid has also been integrated into a commercial Android
anti-malware software.

Deep Neural Network Self-training Based on Unsupervised Learning and Dropout
In supervised learning methods, a large amount of labeled data is necessary to find reliable
classification boundaries when training a classifier. However, it is hard to obtain a large amount
of labeled data in practice, and obtaining labels is time-consuming and costly. Although unlabeled
data is comparatively more plentiful than labeled data, most supervised learning methods are not
designed to exploit unlabeled data. Self-training is one of the semi-supervised learning methods;
it alternately repeats training a base classifier and labelling unlabeled data in the training
set. Most self-training methods have adopted confidence measures to select confidently labeled
examples, because high confidence usually implies low error. A major difficulty of self-training
is error amplification.

If the classifier misclassifies some examples and the misclassified examples are included in the
labeled training set, the next classifier may learn improper classification boundaries and
generate more misclassified examples. Base classifiers are built with a small labeled dataset, so
it is hard for them to achieve good generalization performance. Even with an improved training
procedure and better classifier performance, errors are inevitable, so corrections of self-labeled
data are necessary to avoid error amplification in the following classifiers. A deep neural
network based approach was proposed for alleviating the problems of self-training by combining
three schemes: pre-training, dropout and error forgetting. When combinations of these schemes are
applied to various datasets, a classifier trained using this approach shows better performance
than a classifier trained using common self-training.

In self-training, error amplification is a major problem. If the current base classifier mislabels
some examples, the mislabelled examples may provide inaccurate information to the base classifier
in the next phase. The next base classifier may then learn the inaccurate information and generate
more mislabelled examples. The error of the base classifier is reinforced because of error
viscosity. To reduce error amplification, the base classifier needs the ability to learn accurate
but generalized class boundaries even from small training data. Misclassified examples are
inevitable during the labelling process, so a filtering step is needed to remove errors in the
classifier or the training dataset. A deep neural network based approach for self-training was
used, adopting pre-training, dropout and error forgetting.

The self-training scheme adopts unsupervised pre-training based on restricted Boltzmann machine
learning methods together with dropout. The target of the approach is to alleviate error
amplification, which is the major problem of self-training. The classification performance of
self-training is improved on various datasets by applying the combination of the above schemes, as
is shown and confirmed in
the experimental results. The combination of error forgetting and example re-evaluation shows
performance degradation in the experiments. It is assumed that the degradation is caused by
over-regularization from this combination.

III. LIMITATIONS
3.1 5-CLASS KDD CUP '99 CLASSIFICATION

The results of the KDD Cup '99 dataset evaluation showed that the model is able to offer an
average accuracy of 97.85%. More specifically, the results show that the accuracy is better than
or comparable in 3 out of 5 classes. It is noted that the results for the “Remote to local” and
“User to root” attack classes are anomalous. The stacked NDAE model requires greater amounts of
data to learn from. Unfortunately, due to the smaller amount of training data available for these
classes, the results achieved are less stable. It is evident from the performance analysis that
the model can offer improved precision, recall and F-score, especially for larger classes.
Furthermore, the proposed model managed to produce these comparable performance results whilst
consistently reducing the required training time by an average of 97.72%.

3.2 5-CLASS NSL-KDD CLASSIFICATION
The results on the NSL-KDD dataset show that, across all of the measures, the model yields a
superior level of performance in 3 of the 5 classes. The model offered a total accuracy rate of
85.42%, which improves upon the DBN model by just under 5%. It also offered a 4.84% reduction in
the false alarm rate. The results also re-emphasize the point made above, that the model does not
handle smaller classes as well. Another important factor is that the time required to train the
model is drastically reduced, yielding an average time saving of 78.19% against DBN. This is of
critical importance, particularly for application in a NIDS.

3.3 13-CLASS NSL-KDD CLASSIFICATION
The results from the 13-class classification evaluation demonstrate that the model was able to
offer a 3.8% improvement on its own accuracy simply by using a more granular dataset. This
supports the claim that the model is able to work more effectively with larger and more complex
datasets. Furthermore, the larger dataset gives a better insight into the weaknesses of the model.
As can be seen from the results, there is a direct correlation between the size of the training
dataset for each label and the accuracy/error rates. The smaller classes yield lower levels of
accuracy using our model, whereas the larger classes yield consistently high rates across all of
the performance measures.

3.4 COMPARISON WITH RELATED WORKS
The results from our stacked NDAE model are compared against the results obtained from similar
deep learning based NIDSs. In [26], the authors claim their 5-class classification of the NSL-KDD
dataset produced an F-score of 75.76%. Their recall and precision results are not listed, but the
bar charts show them to be around 69% and 83% respectively. The proposed model has produced
superior results, offering an F-score of 87.37%, recall of 85.42% and precision of 100.00%. Tang
et al. [23] claim that their Deep Neural Network (DNN) approach achieved an accuracy of 75.75%
when performing a 5-class classification of the NSL-KDD dataset. This result is lower than our
achieved accuracy of 85.42%. Whilst classifying the KDD Cup '99 dataset, Kim et al. [37] claim
they have achieved an accuracy of 96.93%. Gao et al. [38] claim their deep learning DBN model
achieved an accuracy of 93.49%. Both of these results are lower than the 97.85% accomplished by
the model. The comparisons show that the proposed model's results are very promising when compared
to other current deep learning-based methods.

IV. CONCLUSION & FUTURE WORK

There are several problems faced by existing NIDS techniques, and a novel NDAE method was proposed
for unsupervised feature learning. A novel classification model was constructed from stacked NDAEs
and the RF classification algorithm. The proposed model was implemented in TensorFlow, and
extensive evaluations of its capabilities were performed. The evaluations utilized the benchmark
KDD Cup '99 and NSL-KDD datasets and achieved very promising results. The results have
demonstrated that the approach offers high levels of accuracy, precision and recall together with
reduced training time. The stacked NDAE model was compared against the mainstream DBN technique.
These comparisons have demonstrated that the model offers up to a 5% improvement in accuracy and a
training time reduction of up to 98.81%. Unlike most previous work, the capabilities of the model
were evaluated on both benchmark datasets, revealing a consistent level of classification
accuracy. Although the model has achieved the above promising results, it is acknowledged that it
is not perfect and there is further room for improvement. The first avenue of exploration for
future work is to assess and extend the capability of the model to handle zero-day attacks. A
further item of future work is to expand upon the existing evaluations by utilizing real-world
backbone network traffic to demonstrate the merits of the extended model.

V. REFERENCES

1. B. Dong and X. Wang, "Comparison deep learning method to traditional methods using for network
intrusion detection," in Proc. 8th IEEE Int. Conf. Commun. Softw. Netw., Beijing, China, Jun.
2016, pp. 581-585.
2. R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao, "Deep learning and its applications
to machine health monitoring: A survey," Submitted to IEEE Trans. Neural Netw. Learn. Syst., 2016.
[Online]. Available: http://arxiv.org/abs/1612.07640
3. S. Hou, A. Saas, L. Chen, and Y. Ye, "Deep4MalDroid: A Deep learning framework for android
malware detection based on linux kernel system call graphs," in Proc. IEEE/WIC/ACM Int. Conf. Web
Intell. Workshops, Omaha, NE, USA, Oct. 2016, pp. 104-111.
4. IDC, "Executive summary: Data growth, business opportunities, and the IT imperatives. The
digital universe of opportunities: Rich data and the increasing value of the internet of things,"
IDC, Framingham, MA, USA, Tech. Rep. IDC_1672, 2014. [Online]. Available:
https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm
5. Juniper Networks, "Juniper Networks, How many packets per second per port are needed to achieve
Wire-Speed?," 2015. [Online]. Available:
https://kb.juniper.net/InfoCenter/index?page=content&id=KB14737
6. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[Online]. Available: http://www.deeplearningbook.org
7. L. Deng, "Deep learning: Methods and applications," Found. Trends Signal Process., vol. 7,
no. 3/4, pp. 197-387, Aug. 2014.
8. P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising
autoencoders: Learning useful representations in a deep network with a local denoising criterion,"
J. Mach. Learn. Res., vol. 11, pp. 3371-3408, 2010.
9. G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural
networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
10. Y. Wang, H. Yao, and S. Zhao, "Auto-encoder based dimensionality reduction," Neurocomputing,
vol. 184, pp. 232-242, 2016.
11. Z. Liang, G. Zhang, J. X. Huang, and Q. V. Hu, "Deep learning for healthcare decision making
with EMRs," in Proc. IEEE Int. Conf. Bioinformat. Biomed., Nov. 2014, pp. 556-559.
12. S. P. Shashikumar, A. J. Shah, Q. Li, G. D. Clifford, and S. Nemati, "A deep learning approach
to monitoring and detecting atrial fibrillation using wearable technology," in Proc. IEEE EMBS