SOMK Based Network Traffic Classification For Zero-Day Application Detection

Presented by: Steffy Benny
TVE17ECMT16
Guided by: Dr. Ciza Thomas
02-07-2019
Introduction
● A network traffic classifier identifies the different applications that exist in a system
● It helps QoS control mechanisms prioritize applications properly across limited bandwidth
● It also helps in implementing proper security policies over a network
Traffic Classification Methods
Port based
• Traditional; checks port numbers against IANA assignments
Payload based
• Relies on the application's signature in the payload
Flow-statistics based
• To ensure data confidentiality
• Uses flow features from the packet's header
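As a rough illustration of the port-based method, the sketch below looks up a flow's server port in a small table. The table `WELL_KNOWN_PORTS` and the function `classify_by_port` are hypothetical names introduced here; only a handful of well-known IANA assignments are listed.

```python
# Hypothetical lookup table with a few well-known IANA port assignments.
WELL_KNOWN_PORTS = {
    21: "ftp",
    22: "ssh",
    25: "smtp",
    53: "dns",
    80: "http",
    443: "https",
}

def classify_by_port(server_port):
    """Return the application registered for the port, or 'unknown'."""
    return WELL_KNOWN_PORTS.get(server_port, "unknown")

print(classify_by_port(443))    # a registered port
print(classify_by_port(50000))  # an application on a changed or dynamic port
```

An application that moves to an unregistered port falls straight through to "unknown", which is the weakness that motivates the other two methods.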
Problem and Motivation
Available application detection techniques have difficulty identifying
applications because of changing ports, traffic encryption and zero-day
applications.
Flow statistics-based methods overcome two obfuscation techniques:
changing ports, which defeat port-based methods, and encrypted data
transmission, which defeats payload-based methods.
However, zero-day traffic is misclassified as a known class,
decreasing the overall accuracy.
Methodology
Build a classifier using a flow statistics-based semi-supervised method
An SOMK based Compound Classifier that can:
• Classify known classes with high overall accuracy
• Classify zero-day applications into unknown classes
System Model
Zero-day discovery and training: Labeled and Unlabeled Data → Data pre-processing → SOMK Clustering → Cluster Identification
Test Data → Traffic Classification (N+1) → Classified Output
System Update
Data pre-processing
i. Sampling: random sampling was done to reduce the large dataset
ii. Feature selection: since irrelevant features can negatively affect
performance, only selected features were used
● Correlation-based feature selection
• Examines the usefulness of individual features
• High scores are assigned to attributes that are highly correlated with the
class and have low intercorrelation with each other
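The scoring idea can be sketched with the commonly used CFS merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of a k-feature subset. The slides do not state which correlation measure was used; absolute Pearson correlation is an assumption here, and the toy columns are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cfs_merit(columns, target):
    """CFS merit of a feature subset (assumption: absolute Pearson correlation)."""
    k = len(columns)
    r_cf = sum(abs(pearson(c, target)) for c in columns) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = (sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Toy data (invented): class labels and two candidate feature columns.
target = [0, 0, 1, 1, 0, 1]
f_good = [1.0, 1.2, 3.1, 2.9, 0.8, 3.3]   # follows the class closely
f_weak = [5.0, 1.0, 4.0, 2.0, 3.0, 6.0]   # only loosely related to the class

print(cfs_merit([f_good], target))
print(cfs_merit([f_good, f_weak], target))
```

In this toy case, adding the weakly correlated column lowers the merit, so a search over subsets would keep `f_good` alone.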
DATASET
Dataset used: Moore's dataset
● A real-world trace captured by a high-performance network monitor
● Each object represents a single flow of TCP packets between
client and server
● Each flow has 248 features
● 19 features were selected by the correlation-based method
Selected Features (19)
1            Server port
10           Minimum of bytes in (Ethernet) packet
19           Median of total bytes in IP packet
28, 29, 30   Third quartile, maximum, variance of control bytes in packet
83, 84       Minimum segment size (client→server) & (server→client)
95, 96       Initial window-bytes (client→server) & (server→client)
97, 98       Initial window-packets (client→server) & (server→client)
153, 174     Minimum number of bytes in (Ethernet) packet (client→server) & (server→client)
160, 181     Minimum number of total bytes in IP packet (client→server) & (server→client)
172, 193     Maximum of control bytes in packet (client→server) & (server→client)
194          Variance of control bytes in packet (server→client)
SOM Clustering
A self-organizing map (SOM) is a type of artificial neural network that is
trained using unsupervised learning to produce a two-dimensional
representation of the input space.
● The result of training the SOM is a lattice of neurons representing possible clusters

SOM Clustering
● In a self-organizing map (SOM), the Euclidean distance between
neighboring neurons is depicted as a grayscale image.
● Light colours depict closely spaced node vectors and darker
colours indicate more widely separated node vectors
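The two SOM slides can be tied together with a minimal sketch: an online SOM trained on toy two-feature flows, followed by the neighbor-distance map that the grayscale image visualizes. The grid size, learning-rate schedule, Gaussian neighborhood and toy data are all assumptions made for illustration, not the settings used in this work; features are assumed to be scaled to [0, 1].

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid coordinate of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

def train_som(data, rows, cols, epochs=30, lr0=0.5, seed=0):
    """Train a rows x cols SOM and return {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                      # linearly decaying learning rate
        sigma = max(sigma0 * decay, 0.5)      # shrinking neighborhood radius
        for x in rng.sample(data, len(data)):  # shuffled pass over the flows
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

def u_matrix(weights, rows, cols):
    """Mean Euclidean distance from each node to its 4-connected grid neighbors."""
    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if (r + dr, c + dc) in weights]
            dists = [math.dist(weights[(r, c)], weights[n]) for n in nbrs]
            umat[r][c] = sum(dists) / len(dists) if dists else 0.0
    return umat

# Toy data (invented): two well-separated groups of two-feature flows.
rng = random.Random(1)
group_a = [[0.1 + rng.gauss(0, 0.02), 0.1 + rng.gauss(0, 0.02)] for _ in range(10)]
group_b = [[0.9 + rng.gauss(0, 0.02), 0.9 + rng.gauss(0, 0.02)] for _ in range(10)]

som = train_som(group_a + group_b, rows=3, cols=3)
umat = u_matrix(som, 3, 3)
print(best_matching_unit(som, [0.1, 0.1]), best_matching_unit(som, [0.9, 0.9]))
```

Small values in `umat` correspond to the light regions of the image (closely spaced node vectors) and large values to the dark boundaries between candidate clusters.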
SOMK Method
A combination of KMeans clustering and Self-Organizing Maps
● Node values of all the flows can be obtained from SOM
● These values are appended to the original dataset as a new feature
● This new dataset is then clustered using KMeans
[Flow diagram: Pre-processed training set → SOM → KMEANS → Clustered applications]
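The three steps above can be sketched as follows, assuming the SOM node of each flow has already been computed and is stored as a flat node index. The list `som_node_ids`, the toy flows and the plain Lloyd-style `kmeans` helper are hypothetical stand-ins, not the implementation used in this work.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:   # assignments stable: converged
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:            # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

# Toy pre-processed flows (invented) and their SOM node indices (assumed given).
flows = [[0.10, 0.20], [0.15, 0.22], [0.90, 0.80], [0.88, 0.85]]
som_node_ids = [0, 0, 8, 8]

# Append the SOM node value to each flow as a new feature, then cluster.
augmented = [flow + [float(node)] for flow, node in zip(flows, som_node_ids)]
assign, centroids = kmeans(augmented, k=2)
print(assign)
```

Because the raw node index has a larger range than the scaled features, it dominates the distance in this sketch; whether and how the appended value should be scaled is a design choice the slides leave open.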
Cluster Identification
• Cluster merging is introduced to improve the accuracy of the proposed system
• Clusters are identified with the majority label
• Clusters with same labels are merged together as a single major cluster
• Clusters without any label are identified as unknown applications
[Figure: example clusters C1, C2, C3 and an Unknown cluster]
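A minimal sketch of these rules, assuming each flow carries a cluster index and either a class label or `None` when unlabeled. The function names and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict

def identify_clusters(cluster_ids, labels):
    """Map each cluster to its majority label, or 'unknown' if it has no labeled flow."""
    votes = defaultdict(Counter)
    for cid, label in zip(cluster_ids, labels):
        votes[cid]  # touch the key so clusters with no labeled flows are still recorded
        if label is not None:
            votes[cid][label] += 1
    return {cid: (count.most_common(1)[0][0] if count else "unknown")
            for cid, count in votes.items()}

def merge_by_label(cluster_ids, cluster_label):
    """Merge clusters sharing a label; returns {label: [flow indices]}."""
    merged = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        merged[cluster_label[cid]].append(idx)
    return dict(merged)

# Toy example (invented): 8 flows in 4 clusters, some flows unlabeled.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 3]
labels = ["http", "http", None, "dns", None, None, None, "http"]

cluster_label = identify_clusters(cluster_ids, labels)
merged = merge_by_label(cluster_ids, cluster_label)
print(cluster_label)
print(merged)
```

Clusters 0 and 3 both carry the majority label "http" and are merged, while cluster 2 contains no labeled flow and becomes the unknown cluster.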
(N+1)-Class Traffic Classification
• A new (N+1)-class classifier is trained, i.e. N known classes and one unknown class
• All the zero-day classes are categorized into a generic unknown class
• Thus, the N labeled clusters and the unknown cluster obtained from clustering are given as the training set.
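To make the (N+1)-class idea concrete, the sketch below trains on N = 2 known classes plus a generic "unknown" class. The slides do not name the classifier; a nearest-centroid rule is swapped in here purely for brevity, and the flows, labels and function names are invented.

```python
import math
from collections import defaultdict

def train_centroids(flows, labels):
    """Stand-in classifier: one mean vector per class, including 'unknown'."""
    groups = defaultdict(list)
    for flow, label in zip(flows, labels):
        groups[label].append(flow)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def classify(centroids, flow):
    """Assign the flow to the class with the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(centroids[label], flow))

# Training set built from the labeled clusters plus the unknown cluster (toy values).
train_flows = [[0.1, 0.1], [0.2, 0.1], [0.9, 0.9], [0.8, 0.9], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["http", "http", "dns", "dns", "unknown", "unknown"]

model = train_centroids(train_flows, train_labels)
print(classify(model, [0.1, 0.2]))
print(classify(model, [0.15, 0.85]))
```

Without the extra class, the second test flow would have to be assigned to "http" or "dns"; with it, the flow can be reported as unknown.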
System Update
● With unknown discovery and (N+1)-class classification,
zero-day traffic is identified
● By clustering the obtained unknown cluster and then inspecting it
manually, a new application can be identified
● The system is updated by including flows of this new
application in the training data
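The update step amounts to extending the training set with the newly identified flows, after which the classifier is retrained with one more known class. A minimal sketch, with a hypothetical helper and an invented label `"new-app"` standing in for the result of manual inspection:

```python
def update_training_set(train_flows, train_labels, new_flows, new_label):
    """Return a training set that also contains the newly identified application."""
    return train_flows + new_flows, train_labels + [new_label] * len(new_flows)

# Existing training data with N = 2 known classes (toy values).
train_flows = [[0.1, 0.1], [0.9, 0.9]]
train_labels = ["http", "dns"]

# Flows taken from the unknown cluster and named by manual inspection.
inspected_flows = [[0.1, 0.9], [0.2, 0.8]]

train_flows, train_labels = update_training_set(
    train_flows, train_labels, inspected_flows, "new-app")
print(sorted(set(train_labels)))
```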
Results
[Figure: Overall accuracy]
Conclusions and Future Work
CONCLUSIONS:
● Results show that unknown applications can significantly
affect the classification accuracy of supervised methods
● With unknown discovery and (N+1)-class classification,
zero-day traffic has been identified
FUTURE WORK:
● Raw data capture to improve real-time classification
● Better feature selection algorithms for improved accuracy
● Weighted sampling techniques to obtain better training flows
