SOMK Based Network Traffic Classification For Zero-Day Application Detection

Presented by,
Steffy Benny
TVE17ECMT16
Guided by,
Dr. Ciza Thomas
02-07-2019
Introduction

● A network traffic classifier identifies the different applications that exist in a system
● It helps QoS control mechanisms to properly prioritize applications across limited bandwidth
● It also helps in implementing proper security policies over a network

Traffic Classification Methods

• Port based: traditional; checks port numbers against the IANA assignments
• Payload based: relies on the application's signature in the payload
• Flow-statistics based: uses flow features from the packet's header, so data confidentiality is preserved

Problem and Motivation
The problem with available application detection techniques is the difficulty of identification due to changing ports, traffic encryption and zero-day applications.
Flow statistics-based methods are used to overcome these obfuscation techniques: changing ports defeat port-based methods, and encrypted data transmission defeats payload-based methods.
However, zero-day traffic is misclassified as a known class, decreasing the overall accuracy.
Methodology
Build a classifier using a flow statistics-based semi-supervised method.

An SOMK based compound classifier that can:
● Classify known classes with high overall accuracy
● Classify zero-day applications into unknown classes

System Model

[Figure: system model. Zero-day discovery and training: labeled and unlabeled data → data pre-processing → SOMK clustering → cluster identification. Classification: test data → (N+1) traffic classification → classified output. A system update step feeds new flows back into training.]
Data pre-processing
i. Sampling: random sampling was done to reduce the large dataset
ii. Feature selection: since irrelevant features can negatively affect the performance, only selected features were used

Correlation-based feature selection
● Examines the usefulness of individual features
● High scores are assigned to attributes that are highly correlated with the class and have low intercorrelation with each other
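
The scoring idea above can be sketched in a few lines. This is only an illustration, not the implementation used in the presentation: it assumes the commonly cited CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), uses Pearson correlation as the correlation measure, encodes the class as a number, and searches greedily. The function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cfs_merit(columns, labels, subset):
    """Assumed merit: high feature-class correlation, low feature-feature correlation."""
    k = len(subset)
    r_cf = mean(abs(pearson(columns[f], labels)) for f in subset)
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = mean(abs(pearson(columns[a], columns[b])) for a, b in pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(columns, labels):
    """Forward selection: add the feature that raises the merit most, stop when none does."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        score, feat = max((cfs_merit(columns, labels, selected + [f]), f) for f in remaining)
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best

# Hypothetical toy data: one informative feature and one noisy feature.
labels = [0, 0, 0, 1, 1, 1]
columns = {"good": [1, 2, 3, 7, 8, 9], "noise": [5, 1, 4, 2, 6, 3]}
selected, best = greedy_cfs(columns, labels)
```

On this toy data the noisy feature lowers the merit when added, so the search keeps only the informative one.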
DATASET
Dataset used: Moore dataset
● A real-world trace captured by a high-performance network monitor
● Each object represents a single flow of TCP packets between client and server
● Each flow has 248 features
● 19 features were selected by the correlation-based method

Selected Features (19)
No.            Feature
1              Server port
10             Minimum of bytes in (Ethernet) packet
19             Median of total bytes in IP packet
28, 29, 30     Third quartile, maximum, variance of control bytes in packet
83, 84         Minimum segment size (client→server) & (server→client)
95, 96         Initial window-bytes (client→server) & (server→client)
97, 98         Initial window-packets (client→server) & (server→client)
153, 174       Minimum number of bytes in (Ethernet) packet (client→server) & (server→client)
160, 181       Minimum number of total bytes in IP packet (client→server) & (server→client)
172, 193       Maximum of control bytes in packet (client→server) & (server→client)
194            Variance of control bytes in packet (server→client)

SOM Clustering
A self-organizing map (SOM) is a type of artificial neural network that is trained using unsupervised learning to produce a two-dimensional representation of the input space.

● The result of training the SOM is a lattice of neurons representing possible clusters
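
A minimal SOM training loop, for illustration only. The slides do not give the training parameters, so the lattice size, learning rate, linearly decaying Gaussian neighbourhood, and the toy flows below are all assumptions; the function names are hypothetical.

```python
import math
import random

def best_matching_unit(weights, x):
    """Lattice coordinate of the node whose weight vector is nearest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2  # assumed initial neighbourhood radius
    # One weight vector per lattice node, initialised from random samples.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            decay = 1 - step / total
            lr = lr0 * decay
            sigma = sigma0 * decay + 1e-3
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighbourhood measured on the lattice, not in input space.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights

# Hypothetical two-feature flows forming two well-separated groups.
flows = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
som = train_som(flows, rows=3, cols=3)
node_a = best_matching_unit(som, flows[0])
node_b = best_matching_unit(som, flows[3])
```

After training, flows from the two groups map to different lattice nodes, which is what makes the lattice usable as a set of candidate clusters.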


SOM Clustering
● In a self-organizing map (SOM), the Euclidean distance between neighboring neurons is depicted in a grayscale image.
● Light colours depict closely spaced node vectors and darker colours indicate more widely separated node vectors
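
The distance image described above can be computed as follows. This is a sketch under assumptions: it averages the distance to the four adjacent lattice neighbours (other neighbourhoods are possible), and the weight values are made up for the example.

```python
import math

def u_matrix(weights, rows, cols):
    """Mean Euclidean distance from each node's weight vector to its adjacent lattice nodes."""
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [math.dist(weights[(r, c)], weights[(nr, nc)])
                     for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= nr < rows and 0 <= nc < cols]
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u

# Hypothetical 1x4 lattice: two tight pairs of nodes separated by a wide gap.
weights = {(0, 0): [0.0, 0.0], (0, 1): [0.1, 0.0],
           (0, 2): [5.0, 0.0], (0, 3): [5.1, 0.0]}
u = u_matrix(weights, rows=1, cols=4)
```

Small values would be drawn light (closely spaced node vectors) and large values dark (a boundary between clusters); here the two middle nodes get the largest values because they sit on either side of the gap.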
SOMK Method
A combination of KMeans clustering and Self Organisation Maps
● Node values of all the flows can be obtained from SOM
● These values are appended to the original dataset as a new feature
● This new dataset is then clustered using KMeans
[Figure: pre-processed training set → SOM → KMeans → clustered applications]
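
The three steps above can be sketched as below. The slides do not define "node value", so this example assumes it is the index of the best-matching SOM node; the SOM weights are taken as already trained, and the k-means uses a naive deterministic initialisation for reproducibility. All names and data are hypothetical.

```python
import math

def bmu_index(som_weights, x):
    """Index of the SOM node closest to flow x (assumed meaning of 'node value')."""
    return min(range(len(som_weights)), key=lambda j: math.dist(som_weights[j], x))

def kmeans(points, k, iters=50):
    centroids = [list(p) for p in points[:k]]  # naive init: first k points (assumption)
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: math.dist(centroids[j], p))
                      for p in points]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

def somk(flows, som_weights, k):
    # Append the SOM node value to each flow as an extra feature, then cluster.
    augmented = [list(x) + [float(bmu_index(som_weights, x))] for x in flows]
    return kmeans(augmented, k)

# Hypothetical flows and a hypothetical pre-trained two-node SOM.
flows = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.2], [5.0, 5.0], [5.2, 5.1], [5.1, 5.2]]
som_weights = [[0.0, 0.0], [5.0, 5.0]]
assign, centroids = somk(flows, som_weights, k=2)
```

In practice the original features would normally be scaled before clustering so that the appended node value neither dominates nor vanishes in the distance computation; that step is omitted here.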

Cluster Identification
• Cluster merging is introduced to improve the accuracy of the proposed system
• Clusters are identified with the majority label
• Clusters with the same label are merged together as a single major cluster
• Clusters without any label are identified as unknown applications
[Figure: example clusters C1, C2 and C3 with labeled flows, and an Unknown cluster]
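
The identification and merging rules above can be illustrated as follows. The representation is an assumption: each flow carries a cluster id and either a class label or `None` when unlabeled. The function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

UNKNOWN = "unknown"

def identify_clusters(cluster_ids, labels):
    """cluster_ids[i] is flow i's cluster; labels[i] is its class, or None if unlabeled."""
    votes = defaultdict(Counter)
    for cid, lab in zip(cluster_ids, labels):
        counter = votes[cid]  # creates an entry even for clusters with no labeled flow
        if lab is not None:
            counter[lab] += 1
    # Majority label per cluster; clusters without any labeled flow become unknown.
    cluster_label = {cid: (cnt.most_common(1)[0][0] if cnt else UNKNOWN)
                     for cid, cnt in votes.items()}
    # Clusters sharing a label are merged by grouping flow indices under that label.
    merged = defaultdict(list)
    for i, cid in enumerate(cluster_ids):
        merged[cluster_label[cid]].append(i)
    return cluster_label, dict(merged)

# Hypothetical example: four clusters, partially labeled flows.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 3, 3]
labels = ["web", "web", None, "web", None, "mail", "mail", None, None]
cluster_label, merged = identify_clusters(cluster_ids, labels)
```

Clusters 0 and 1 both receive the majority label "web" and are merged, while cluster 3 contains no labeled flow and is treated as unknown.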

(N+1)-Class Traffic Classification

• A new (N+1)-class classifier is trained, i.e. N known classes and one unknown class
• All the zero-day classes are categorized into a generic unknown class
• Thus, N labeled clusters and the unknown cluster obtained from clustering are given as the training set.
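
To make the (N+1)-class idea concrete, the sketch below trains on N known labels plus a generic "unknown" label. The slides do not name the classifier, so a nearest-centroid rule is used purely as a stand-in; the function names and data are hypothetical.

```python
import math
from collections import defaultdict

def train_nearest_centroid(flows, labels):
    """One centroid per class; 'unknown' is handled like any other class."""
    groups = defaultdict(list)
    for x, y in zip(flows, labels):
        groups[y].append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}

def classify(model, x):
    """Label of the class whose centroid is nearest to flow x."""
    return min(model, key=lambda y: math.dist(model[y], x))

# Hypothetical training set: N = 2 known classes plus the unknown cluster.
train_flows = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [10.0, 0.0], [10.2, 0.1]]
train_labels = ["web", "web", "mail", "mail", "unknown", "unknown"]
model = train_nearest_centroid(train_flows, train_labels)

predictions = [classify(model, x) for x in ([0.1, 0.1], [4.9, 5.3], [9.8, 0.3])]
```

Because the unknown cluster is part of the training set, a test flow resembling zero-day traffic is assigned to "unknown" instead of being forced into one of the known classes.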
System Update
● With unknown discovery and (N+1) class classification, zero-day traffic has been identified
● By performing clustering on the obtained unknown cluster and then by manual inspection, we can identify a new application
● System update is done by including flows of this new application into the training data
Results
[Figure: overall accuracy]

Conclusions and future works
CONCLUSIONS:
● Results show that unknown applications can significantly
affect the classification accuracy of supervised methods
● With unknown discovery and (N+1) class classification,
zero-day traffic has been identified

FUTURE WORKS:
● Raw data capture to improve the real time classification
● Better feature selection algorithms for improved accuracy
● Weighted sampling techniques to obtain better training flows
