Learning Representations of Network Traffic Using Deep Neural Networks for Network Anomaly Detection: A Perspective towards Oil and Gas IT Infrastructures
- Figure 1: ISA-95 hierarchical view of components [9].
- Figure 2: Methodology of deep representation learning for anomaly detection adopted by the current study.
- Figure 3: Architecture of the implemented CNN for representation learning.
- Figure 4: Algorithm-wise RoC curves and Area under Curve for ELM variants, kNN, SVM and Decision Tree algorithms.
- Figure 5: Algorithm-wise RoC curves and Area under Curve for MLP, QDA, boosting and averaging classification algorithms.
- Figure 6: Representation-wise RoC curves and Area under Curve scores for all 12 ML algorithms.
- Figure 7: Algorithm-wise Precision-Recall curves and mAP for ELM variants, kNN, SVM and Decision Tree algorithms.
- Figure 8: Algorithm-wise Precision-Recall curves and mAP for MLP, QDA, boosting and averaging classification algorithms.
- Figure 9: Representation-wise Precision-Recall curves and mean Average Precision scores for all 12 ML algorithms.
- Figure 10: Algorithm-wise test accuracy scores for all 12 ML algorithms.
- Figure 11: Representation-wise test accuracy scores for all 12 ML algorithms.
- Figure 12: Algorithm-wise F1-measure scores for all 12 ML algorithms.
- Figure 13: Representation-wise F1-measure scores for all 12 ML algorithms.
- Figure 14: Model training times of DNN models (on GPU) for learning representations of ISCX 2012.
- Figure 15: Model training times of conventional ML algorithms for anomaly detection using ISCX 2012.
Abstract
1. Introduction
- Creating handcrafted representations from large datasets is resource-intensive and laborious because it requires low-level sensing, preprocessing and feature extraction; this is especially true for unstructured data,
- Identifying and selecting optimal features from the large feature pool of preprocessed data is time-consuming and requires expert domain knowledge,
- Handcrafted representations make it difficult to "scale up" activity recognition to complex, high-level behaviors (e.g., second-long, minute-long or longer).
2. Related Works
3. Materials and Methods
- Autoencoders (Denoising, Convolutional)
- RNN with LSTM cells (LSTM)
- Convolutional Neural Networks (CNN)
- Initial dataset preprocessing
- Secondary dataset preprocessing
- DNN design and optimization
- Deep feature extraction using trained DNN models
- Extraction of Handcrafted features
- Conventional ML model training using deep and handcrafted features
- Evaluation and Comparison
3.1. Dataset Sampling and Preprocessing
3.1.1. Initial Dataset Preprocessing
3.1.2. Secondary Dataset Preprocessing
3.2. Deep Neural Networks for Learning Representations
3.2.1. Convolutional Neural Network
3.2.2. Convolutional Autoencoder
3.2.3. Denoising Autoencoder
3.2.4. Recurrent Neural Network with LSTM Cells
3.2.5. Hyperparameter Optimization
3.2.6. Deep Feature Extraction Using Trained DNN Models
- Extract the first 7500 bytes from each aggregated application payload for the 200,083 chosen flow records of ISCX 2012.
- Reshape each input according to the requirements of the DNN to be employed for model training.
- Divide the input into a training set of 160,000 records and a validation set of 40,083 records.
- Train each deep model on dataset I using backpropagation.
- Propagate dataset I through each trained model and retrieve the outputs generated at its penultimate layer. The output is a matrix of shape n × q, where n is the number of samples and q is the number of learned features. For the current study, q is fixed at 40 and n is 200,083, which means each learned representation is a 200,083 × 40 matrix stored in a NumPy array.
- For each learned representation, do the following (a minimal sketch of this loop is given after the list):
  - (a) Divide the representation into a trainset and a testset.
  - (b) Train all 12 conventional ML algorithms on the trainset and evaluate them on the testset.
  - (c) Calculate model evaluation metrics on the results of the evaluation.
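As referenced in the last step above, the extraction-and-evaluation loop can be scripted compactly. The following is a minimal sketch, assuming Keras model objects and scikit-learn classifiers; the function names, the 80/20 split and the kNN stand-in for the full set of 12 algorithms are illustrative assumptions, not the study's exact code:

```python
# Minimal sketch of the representation-extraction and evaluation loop;
# model/layer handles and the 80/20 split are illustrative assumptions.
from tensorflow.keras.models import Model
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score

def extract_penultimate_features(trained_model, X):
    """Propagate inputs through a trained DNN and return the activations
    of its penultimate layer as an (n, q) feature matrix."""
    extractor = Model(inputs=trained_model.input,
                      outputs=trained_model.layers[-2].output)
    return extractor.predict(X, batch_size=256)

def evaluate_representation(features, labels):
    """Split one learned representation into train/test sets and score a
    conventional classifier on it (kNN shown; the study repeats this for
    all 12 algorithms and all reported metrics)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.2, random_state=42)
    clf = KNeighborsClassifier(n_neighbors=15, algorithm='ball_tree')
    clf.fit(X_tr, y_tr)
    preds = clf.predict(X_te)
    return accuracy_score(y_te, preds), f1_score(y_te, preds)
```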
3.3. Handcrafted Representation
3.4. Conventional ML Model Training Using Network Data Representations
Algorithm 1: Model Generation from Conventional ML Algorithms.
Evaluation and Comparison of Conventional Anomaly Detectors
4. Experimental Setup
- CPU: Quad Core Intel Xeon E-1650
- RAM: 16 GB
- GPU: NVIDIA GTX 1070
5. Results and Discussion
- True Positive (TP): a prediction is counted as a TP if an anomaly is predicted as an anomaly,
- False Positive (FP): a prediction is counted as an FP if a normal record is predicted as an anomaly,
- True Negative (TN): a prediction is counted as a TN if a normal record is predicted as normal,
- False Negative (FN): a prediction is counted as an FN if an anomaly is predicted as normal.
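All metrics reported below follow directly from these four counts, with anomalies taken as the positive class. A minimal sketch of the derivations:

```python
# Minimal sketch: metrics used in this section, derived from the
# TP/FP/TN/FN counts defined above (anomaly = positive class).
def confusion_metrics(tp, fp, tn, fn):
    accuracy  = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # fraction of flagged records that are true anomalies
    recall    = tp / (tp + fn)   # true positive rate (RoC y-axis)
    fpr       = fp / (fp + tn)   # false positive rate (RoC x-axis)
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, fpr, f1
```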
5.1. Receiver Operating Characteristics (RoC) and Area under RoC
5.1.1. RoC and AuC Algorithm-Wise Performance
5.1.2. RoC and AuC Representation-Wise Performance
5.2. Precision-Recall Curve and Mean Average Precision
5.2.1. PR-Curve and mAP Algorithm-Wise Performance Evaluation
5.2.2. PR-Curve and mAP Representation-Wise Performance Evaluation
5.3. Accuracy
5.4. F1-Measure
5.5. Timing Information
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- BP Energy Economics. BP Energy Outlook; Report; British Petroleum: London, UK, 2019.
- Colwill, C. Human factors in information security: The insider threat—Who can you trust these days? Inf. Secur. Tech. Rep. 2009, 14, 186–196.
- Ali, R.F.; Dominic, P.; Karunakaran, P.K. Information Security Policy and Compliance in Oil and Gas Organizations—A Pilot Study. Solid State Technol. 2020, 63, 1275–1282.
- Ali, R.F.; Dominic, P.; Ali, K. Organizational governance, social bonds and information security policy compliance: A perspective towards oil and gas employees. Sustainability 2020, 12, 8576.
- Lu, H.; Huang, K.; Azimi, M.; Guo, L. Blockchain technology in the oil and gas industry: A review of applications, opportunities, challenges, and risks. IEEE Access 2019, 7, 41426–41444.
- Wueest, C. Targeted Attacks against the Energy Sector; Symantec Security Response: Mountain View, CA, USA, 2014.
- Lu, H.; Guo, L.; Azimi, M.; Huang, K. Oil and Gas 4.0 era: A systematic review and outlook. Comput. Ind. 2019, 111, 68–90.
- International Society of Automation. Enterprise-Control System Integration; Part 3: Activity Models of Manufacturing Operations Management; ISA: Durham, NC, USA, 2005.
- Foehr, M.; Vollmar, J.; Calà, A.; Leitão, P.; Karnouskos, S.; Colombo, A.W. Engineering of Next Generation Cyber-Physical Automation System Architectures. In Multi-Disciplinary Engineering for Cyber-Physical Production Systems; Biffl, S., Lüder, A., Gerhard, D., Eds.; Springer: Cham, Switzerland, 2017; pp. 185–206.
- Si, W.; Li, J.H.; Huang, X.J. Features Extraction Based on Deep Analysis of Network Packets in Industrial Control Systems; Springer: Singapore, 2020; Volume 595, pp. 524–529.
- Thudumu, S.; Branch, P.; Jin, J.; Singh, J. A comprehensive survey of anomaly detection techniques for high dimensional big data. J. Big Data 2020, 7, 42.
- Kurniabudi, K.; Purnama, B.; Sharipuddin, S.; Darmawijoyo, D.; Stiawan, D.; Samsuryadi, S.; Heryanto, A.; Budiarto, R. Network anomaly detection research: A survey. Indones. J. Electr. Eng. Inform. 2019, 7, 37–50.
- Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A Detailed Analysis of the KDD CUP 99 Data Set. In Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, Piscataway, NJ, USA, 7–10 September 2009; pp. 53–58.
- Zhu, X.; Goldberg, A.B. Introduction to Semi-Supervised Learning. Synth. Lect. Artif. Intell. Mach. Learn. 2009, 3.
- Luo, M.; Wang, L.; Zhang, H.; Chen, J. A Research on Intrusion Detection Based on Unsupervised Clustering and Support Vector Machine; Springer: Berlin/Heidelberg, Germany, 2003; pp. 325–336.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Naseer, S.; Saleem, Y.; Khalid, S.; Bashir, M.K.; Han, J.; Iqbal, M.M.; Han, K. Enhanced Network Anomaly Detection Based on Deep Neural Networks. IEEE Access 2018, 6, 48231–48246.
- Naseer, S.; Saleem, Y. Enhanced Network Intrusion Detection using Deep Convolutional Neural Networks. TIIS 2018, 12, 5159–5178.
- Hamamoto, A.H.; Carvalho, L.F.; Sampaio, L.D.H.; Abrão, T.; Proença, M.L., Jr. Network anomaly detection system using genetic algorithm and fuzzy logic. Expert Syst. Appl. 2018, 92, 390–402.
- Song, J.; Zhao, W.; Liu, Q.; Wang, X. Hybrid feature selection for supporting lightweight intrusion detection systems. J. Phys. Conf. Ser. 2017, 887, 012031.
- Lashkari, A.H.; Gil, G.D.; Mamun, M.S.I.; Ghorbani, A.A. Characterization of Tor Traffic using Time based Features. In Proceedings of the 3rd International Conference on Information Systems Security and Privacy, Porto, Portugal, 19–21 February 2017; pp. 253–262.
- Shiravi, A.; Shiravi, H.; Tavallaee, M.; Ghorbani, A.A. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput. Secur. 2012, 31, 357–374.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25; Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2012; pp. 1097–1105.
- Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 2012, 29, 82–97.
- Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 3104–3112.
- Hodo, E.; Bellekens, X.; Hamilton, A.; Tachtatzis, C.; Atkinson, R. Shallow and Deep Networks Intrusion Detection System: A Taxonomy and Survey. arXiv 2017, arXiv:1701.02145.
- Ghorbani, A.A.; Lu, W.; Tavallaee, M. Network Intrusion Detection and Prevention; Advances in Information Security; Springer: Boston, MA, USA, 2010; Volume 47.
- Gao, N.; Gao, L.; Gao, Q.; Wang, H. An Intrusion Detection Model Based on Deep Belief Networks. In Proceedings of the 2014 Second International Conference on Advanced Cloud and Big Data, Huangshan, China, 20–22 November 2014; pp. 247–252.
- Staudemeyer, R.C.; Omlin, C.W. Evaluating performance of long short-term memory recurrent neural networks on intrusion detection data. In Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference, East London, South Africa, 7–9 October 2013; pp. 218–224.
- Tuor, A.; Kaplan, S.; Hutchinson, B.; Nichols, N.; Robinson, S. Deep learning for unsupervised insider threat detection in structured cybersecurity data streams. arXiv 2017, arXiv:1710.00811.
- Wang, Z. The Applications of Deep Learning on Traffic Identification; BlackHat: Las Vegas, NV, USA, 2015.
- Ashfaq, R.A.R.; Wang, X.Z.; Huang, J.Z.; Abbas, H.; He, Y.L. Fuzziness based semi-supervised learning approach for intrusion detection system. Inf. Sci. 2017, 378, 484–497.
- Du, M.; Li, F.; Zheng, G.; Srikumar, V. DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning; ACM Press: New York, NY, USA, 2017; pp. 1285–1298.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Arnaldo, I.; Cuesta-Infante, A.; Arun, A.; Lam, M.; Bassias, C.; Veeramachaneni, K. Learning Representations for Log Data in Cybersecurity. In Cyber Security Cryptography and Machine Learning; Dolev, S., Lodha, S., Eds.; Springer: Cham, Switzerland, 2017; Volume 10332, pp. 250–268.
- Jolliffe, I.T. Principal Component Analysis. In Encyclopedia of Statistics in Behavioral Science; Wiley: Chichester, UK, 2005.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
- Karpathy, A. Connecting Images and Natural Language. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 2016.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Masci, J.; Meier, U.; Cireşan, D.; Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In Proceedings of Artificial Neural Networks and Machine Learning—ICANN 2011, Espoo, Finland, 14–17 June 2011; pp. 52–59.
- Zeiler, M.D. ADADELTA: An Adaptive Learning Rate Method. arXiv 2012, arXiv:1212.5701.
- Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.A. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408.
- Bengio, Y.; Simard, P.; Frasconi, P. Learning Long-Term Dependencies with Gradient Descent is Difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166.
- Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
- McGinnis, W. BaseN Encoding and Grid Search in Categorical Variables. Available online: http://www.willmcginnis.com/2016/12/18/basen-encoding-grid-search-category_encoders/ (accessed on 10 May 2018).
- McGinnis, W. Beyond One-Hot: An Exploration of Categorical Variables. Available online: http://www.willmcginnis.com/2015/11/29/beyond-one-hot-an-exploration-of-categorical-variables/ (accessed on 20 May 2018).
- UCLA Statistical Consulting Group. Contrast Coding Systems for Categorical Variables. Available online: https://stats.idre.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis-2/ (accessed on 7 June 2018).
- Zhang, O. Strategies to Encode Categorical Variables with Many Categories. Available online: https://towardsdatascience.com/smarter-ways-to-encode-categorical-data-for-machine-learning-part-1-of-3-6dca2f71b159 (accessed on 27 June 2018).
- Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
- Fernández-Navarro, F.; Hervás-Martínez, C.; Sánchez-Monedero, J.; Gutiérrez, P.A. MELM-GRBF: A modified version of the extreme learning machine for generalized radial basis function neural networks. Neurocomputing 2011, 74, 2502–2510.
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. arXiv 2015, arXiv:1603.04467.
- Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
Layer Name | Output Shape | No. of Trainable Parameters
---|---|---
R0: Reshape layer (7500 to 50 × 50 × 3 conversion) | (Batch-size,50,50,3) | 0
C1: Conv layer with 3 × 3 kernels and 6 feature maps | (Batch-size,48,48,6) | (3 × 3 × 3 + 1) × 6 = 168
S2: Subsample with 2 × 2 non-overlapping kernel | (Batch-size,24,24,6) | 0
C3: Conv layer with 3 × 3 kernels and 16 feature maps | (Batch-size,22,22,16) | (3 × 3 × 6 + 1) × 16 = 880
S4: Subsample with 2 × 2 non-overlapping kernel | (Batch-size,11,11,16) | 0
C5: Conv layer with 3 × 3 kernels and 64 feature maps | (Batch-size,9,9,64) | (3 × 3 × 16 + 1) × 64 = 9280
S6: Subsample with 2 × 2 non-overlapping kernel | (Batch-size,4,4,64) | 0
D7: Dropout layer with 0.25 drop probability | (Batch-size,4,4,64) | 0
C8: Conv layer with 2 × 2 kernels and 120 feature maps | (Batch-size,3,3,120) | (2 × 2 × 64 + 1) × 120 = 30,840
S9: Subsample with 2 × 2 non-overlapping kernel | (Batch-size,1,1,120) | 0
Model Flattening | (Batch-size,120) | 0
D11: Dropout layer with 0.5 drop probability | (Batch-size,120) | 0
FC12: Feature Extraction: Fully connected layer | (Batch-size,120) | (120 + 1) × 120 = 14,520
Output | (Batch-size,1) | (120 + 1) × 1 = 121
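A minimal Keras sketch of the CNN topology above, assuming the reported hyperparameters (ReLU convolutions, tanh fully connected layer, Adam with binary cross-entropy); the sigmoid output unit is an assumption implied by the binary objective:

```python
# Sketch of the tabulated CNN; activations and the sigmoid output
# are assumptions consistent with the reported hyperparameters.
from tensorflow.keras import layers, models

def build_cnn():
    m = models.Sequential([
        layers.Reshape((50, 50, 3), input_shape=(7500,)),   # R0
        layers.Conv2D(6, (3, 3), activation='relu'),        # C1 -> 48x48x6
        layers.MaxPooling2D((2, 2)),                        # S2 -> 24x24x6
        layers.Conv2D(16, (3, 3), activation='relu'),       # C3 -> 22x22x16
        layers.MaxPooling2D((2, 2)),                        # S4 -> 11x11x16
        layers.Conv2D(64, (3, 3), activation='relu'),       # C5 -> 9x9x64
        layers.MaxPooling2D((2, 2)),                        # S6 -> 4x4x64
        layers.Dropout(0.25),                               # D7
        layers.Conv2D(120, (2, 2), activation='relu'),      # C8 -> 3x3x120
        layers.MaxPooling2D((2, 2)),                        # S9 -> 1x1x120
        layers.Flatten(),                                   # -> 120
        layers.Dropout(0.5),                                # D11
        layers.Dense(120, activation='tanh'),               # FC12: features
        layers.Dense(1, activation='sigmoid'),              # Output
    ])
    m.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
    return m
```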
Layer Name | Output Shape | No. of Trainable Parameters
---|---|---
Input with Reshape | (Batch-size,50,50,3) | 0
C1: Conv layer with 3 × 3 kernels and 8 feature maps | (Batch-size,48,48,8) | (3 × 3 × 3 + 1) × 8 = 224
S2: Subsample with 2 × 2 non-overlapping kernel | (Batch-size,24,24,8) | 0
C3: Conv layer with 3 × 3 kernels and 16 feature maps | (Batch-size,24,24,16) | (3 × 3 × 8 + 1) × 16 = 1168
S4: Subsample with 2 × 2 non-overlapping kernel | (Batch-size,12,12,16) | 0
C5: Conv layer with 3 × 3 kernels and 16 feature maps | (Batch-size,12,12,16) | (3 × 3 × 16 + 1) × 16 = 2320
S6A: Subsample with 2 × 2 non-overlapping kernel | (Batch-size,6,6,16) | 0
C7: Conv layer with 3 × 3 kernels and 16 feature maps | (Batch-size,6,6,16) | (3 × 3 × 16 + 1) × 16 = 2320
S6B: Subsample with 2 × 2 non-overlapping kernel | (Batch-size,3,3,16) | 0
Model Flattening | (Batch-size,144) | 0
FC8: Bottleneck: Fully connected layer | (Batch-size,40) | (144 + 1) × 40 = 5800
FC9: Fully connected layer | (Batch-size,144) | (40 + 1) × 144 = 5904
R10: Reshape layer (144 to 3 × 3 × 16 conversion) | (Batch-size,3,3,16) | 0
C11: Conv layer with 3 × 3 kernels and 16 feature maps | (Batch-size,3,3,16) | (3 × 3 × 16 + 1) × 16 = 2320
U12: Upsample with 2 × 2 non-overlapping kernel | (Batch-size,6,6,16) | 0
C13: Conv layer with 3 × 3 kernels and 16 feature maps | (Batch-size,6,6,16) | (3 × 3 × 16 + 1) × 16 = 2320
U14: Upsample with 2 × 2 non-overlapping kernel | (Batch-size,12,12,16) | 0
CT15: Transposed-Conv layer with 3 × 3 kernels, (2,2) strides and 16 feature maps | (Batch-size,25,25,16) | (3 × 3 × 16 + 1) × 16 = 2320
CT16: Transposed-Conv layer with 3 × 3 kernels, (2,2) strides and 3 feature maps | (Batch-size,50,50,3) | (3 × 3 × 16 + 1) × 3 = 435
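A corresponding sketch of the convolutional autoencoder above; the padding modes are inferred from the listed output shapes, and the ReLU convolution activations follow the reported hyperparameters. After training, the 40-unit bottleneck (FC8) supplies the learned representation:

```python
# Sketch of the tabulated convolutional autoencoder; padding modes are
# inferred from the output shapes and are assumptions.
from tensorflow.keras import layers, models

def build_conv_autoencoder():
    m = models.Sequential([
        layers.Reshape((50, 50, 3), input_shape=(7500,)),
        layers.Conv2D(8, (3, 3), activation='relu'),                   # C1 -> 48x48x8
        layers.MaxPooling2D((2, 2)),                                   # S2 -> 24x24x8
        layers.Conv2D(16, (3, 3), padding='same', activation='relu'),  # C3
        layers.MaxPooling2D((2, 2)),                                   # S4 -> 12x12x16
        layers.Conv2D(16, (3, 3), padding='same', activation='relu'),  # C5
        layers.MaxPooling2D((2, 2)),                                   # S6A -> 6x6x16
        layers.Conv2D(16, (3, 3), padding='same', activation='relu'),  # C7
        layers.MaxPooling2D((2, 2)),                                   # S6B -> 3x3x16
        layers.Flatten(),                                              # -> 144
        layers.Dense(40, name='bottleneck'),                           # FC8: features
        layers.Dense(144),                                             # FC9
        layers.Reshape((3, 3, 16)),                                    # R10
        layers.Conv2D(16, (3, 3), padding='same', activation='relu'),  # C11
        layers.UpSampling2D((2, 2)),                                   # U12 -> 6x6x16
        layers.Conv2D(16, (3, 3), padding='same', activation='relu'),  # C13
        layers.UpSampling2D((2, 2)),                                   # U14 -> 12x12x16
        layers.Conv2DTranspose(16, (3, 3), strides=(2, 2),
                               activation='relu'),                     # CT15 -> 25x25x16
        layers.Conv2DTranspose(3, (3, 3), strides=(2, 2),
                               padding='same'),                        # CT16 -> 50x50x3
    ])
    m.compile(optimizer='adadelta', loss='mse')
    return m
```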
Layer Name | Output Shape | No. of Trainable Parameters
---|---|---
Input with Reshape | (Batch-size,50,50,3) | 0
N0: Gaussian Noise layer with 0.25 impurity | (Batch-size,50,50,3) | 0
C1: Conv layer with 5 × 5 kernels and 6 feature maps | (Batch-size,46,46,6) | (5 × 5 × 3 + 1) × 6 = 456
S2: Subsample with 4 × 4 non-overlapping kernel | (Batch-size,11,11,6) | 0
Model Flattening | (Batch-size,726) | 0
FC3: Bottleneck: Fully connected layer | (Batch-size,40) | (726 + 1) × 40 = 29,080
FC4: Fully connected layer | (Batch-size,726) | (40 + 1) × 726 = 29,766
R5: Reshape layer (726 to 11 × 11 × 6 conversion) | (Batch-size,11,11,6) | 0
CT6: Transposed-Conv layer with 5 × 5 kernels, (2,2) strides and 6 feature maps | (Batch-size,25,25,6) | (5 × 5 × 6 + 1) × 6 = 906
CT7: Transposed-Conv layer with 5 × 5 kernels, (2,2) strides and 3 feature maps | (Batch-size,50,50,3) | (5 × 5 × 6 + 1) × 3 = 453
Layer Name | Output Shape | No. of Trainable Parameters
---|---|---
Input with Reshape | (Batch-size,25,300) | 0
L1: LSTM layer with 128 units | (Batch-size,25,128) | 219,648
Model Flattening | (Batch-size,3200) | 0
D2: Dropout layer | (Batch-size,3200) | 0
FC3: Fully connected layer | (Batch-size,128) | (3200 + 1) × 128 = 409,728
FC4: Feature Extraction: Fully connected layer | (Batch-size,40) | (128 + 1) × 40 = 5160
Output | (Batch-size,1) | (40 + 1) × 1 = 41
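As a worked check of the table's figures: an LSTM layer has four gates, each with one weight per input dimension, one per recurrent unit, and a bias, so

params(LSTM) = 4 × [(input_dim + units + 1) × units] = 4 × [(300 + 128 + 1) × 128] = 4 × 54,912 = 219,648,

and the fully connected layer FC3 similarly yields (3200 + 1) × 128 = 409,728 trainable parameters.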
DNN | Optimizer | Objective | Neuron | Misc | Bias | Learn-Rate | Valid-Acc
---|---|---|---|---|---|---|---
CNN | Adam | Binary cross-entropy | Conv: ReLU, FC: tanh | Init: He-uniform | Yes | 0.01 | 0.9937
LSTM | Adadelta | Binary cross-entropy | FC: tanh | LSTM cells: 128 | NA | 0.05 | 0.9920
ConvAE | Adadelta | MSE | Conv: ReLU | Init: Glorot-uniform | Yes | 0.01 | 0.6991
DenoiseAE | Adadelta | MSE | Conv: ReLU, FC: ReLU | Noise: 0.25 | Yes | 0.001 | 0.5860
Quantitative Features | Symbolic Features
---|---
Duration | application name
Bytes per second | direction
Min/Max/Avg/Std packet inter-arrival times | dstTCPflagDescription
Packets per second | srcTCPflagDescription
Min/Max/Avg/Std idle time | dstIP
Min/Max/Avg/Std active time | srcIP
Min/Max/Avg/Std inter-arrival times of received packets | startDateTime
Min/Max/Avg/Std inter-arrival times of sent packets | transport protocol
srcPort | endDateTime
dstPort |
Encoding Scheme | Dimensionality | Average Score | Score StDev | Training Time (Seconds) |
---|---|---|---|---|
BinaryEncoder | 61 | 0.9900 | 0.001789 | 45.247959 |
BackwardDifference | 11,136 | 0.9096 | 0.066437 | 90.317814 |
HelmertEncoder | 11,136 | 0.8928 | 0.064478 | 78.638483 |
HashingEncoder | 8 | 0.9304 | 0.003611 | 8.252981 |
OrdinalEncoder | 9 | 0.9938 | 0.007909 | 44.707021 |
OneHotEncoder | 11,145 | 0.9968 | 0.001600 | 206.595544 |
SumEncoder | 11,136 | 0.9962 | 0.001939 | 79.257670 |
BaseNEncoder | 61 | 0.9910 | 0.001414 | 65.117447 |
LeaveOneOutEncoder | 9 | 0.9994 | 0.000800 | 44.738180 |
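A sketch of how such an encoder comparison can be scripted with the category_encoders package (which implements the schemes listed above); the encoder subset, scoring classifier and cross-validation settings here are illustrative assumptions, not the study's exact protocol:

```python
# Illustrative benchmark of categorical encoding schemes, as compared in
# the table above; classifier and CV settings are assumptions.
import time
import category_encoders as ce
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

ENCODERS = {
    'BinaryEncoder': ce.BinaryEncoder(),
    'HashingEncoder': ce.HashingEncoder(),
    'OrdinalEncoder': ce.OrdinalEncoder(),
    'OneHotEncoder': ce.OneHotEncoder(),
    'LeaveOneOutEncoder': ce.LeaveOneOutEncoder(),
}

def benchmark_encoders(X_symbolic, y):
    """Encode the symbolic features with each scheme, then score a
    classifier with 5-fold cross-validation, recording dimensionality,
    mean/std score and wall-clock time."""
    for name, encoder in ENCODERS.items():
        start = time.time()
        X_enc = encoder.fit_transform(X_symbolic, y)   # LeaveOneOut needs y
        scores = cross_val_score(RandomForestClassifier(), X_enc, y, cv=5)
        print(f'{name}: dims={X_enc.shape[1]}, avg={scores.mean():.4f}, '
              f'std={scores.std():.4f}, time={time.time() - start:.2f}s')
```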
Sr. No. | Classification Algorithm | Hyper-Parameters |
---|---|---|
1 | Decision-Tree | Max-Depth = 10, Split Quality Measure = “entropy”, Max features considered for each best split = 40 |
2 | Random-Forest | Max-Depth = 5, No. of Estimators = 10, Split Quality Measure = “gini”, Max features considered for each best split = |
3 | QDA | Priors = None, Regularization Parameter = , Rank Estimation Threshold = |
4 | Multilayer Perceptron | Hidden layer Units = 100, Activation = , Solver = SGD, Penalty = , Learning rate = adaptive, epochs = 200
5 | Nearest Neighbor | Neighbors = 15, Algorithm = Ball-Tree, Leaf-size = 30, Distance-Metric = Euclidean |
6 | SVM | Kernel = RBF, Gamma = , epochs = 2500, Length scale = 1, Length scale bounds = (1 × 10, 1 × 10) |
7 | Extremely Randomized Trees | Max-Depth = Not Set, No. of Estimators = 30, Split Quality Measure = “gini”, Max features considered for each best split = , Out-of-bag Sampling = enabled, Bootstrap-Sampling = enabled |
8 | Adaboost | Base-estimator = J48, No. of Estimators = 50 |
9 | Gradient Boost | No. of Estimators = 100, max-leaf-nodes = 20, max-depth = 90, random-state = 24, min-samples-split = 5, learning-rate = |
10 | ELM MLP | Hidden Layer Units = 256, activation = , Hidden-Layer = MLP, epochs = 15 |
11 | ELM RBF | Hidden Layer Units = 256, activation = , Hidden Layer = , epochs = 15 |
12 | ELM Generalized | Hidden Layer Units = 256, activation = [52], epochs = 15 |
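Most of the listed classifiers map directly onto scikit-learn estimators. The following sketch instantiates a few of them with the tabulated hyperparameters; parameters elided in the table are left at library defaults, and the ELM variants (e.g., [51,52]) come from a separate implementation and are omitted here:

```python
# Sketch instantiations for some of the tabulated classifiers; values
# elided in the table stay at scikit-learn defaults.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

CLASSIFIERS = {
    'Decision-Tree': DecisionTreeClassifier(
        criterion='entropy', max_depth=10, max_features=40),
    'Random-Forest': RandomForestClassifier(
        n_estimators=10, max_depth=5, criterion='gini'),
    'kNN': KNeighborsClassifier(
        n_neighbors=15, algorithm='ball_tree', leaf_size=30,
        metric='euclidean'),
    'SVM': SVC(kernel='rbf'),
    # AdaBoost's default base estimator is a decision tree, the closest
    # scikit-learn analogue of the J48 learner named in the table:
    'Adaboost': AdaBoostClassifier(n_estimators=50),
    'Gradient-Boost': GradientBoostingClassifier(
        n_estimators=100, max_leaf_nodes=20, max_depth=90,
        random_state=24, min_samples_split=5),
}
```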
Algorithm-Name | Mean-AuC | Algorithm-Name | Mean-AuC |
---|---|---|---|
ELM-MLP | Random-Forest | ||
ELM-RBF | MLP | ||
ELM-GRBF | Adaboost | ||
k-NN | QDA | ||
SVM | Gradient Boost | ||
Decision-Tree | ExtraTree |
Algorithm-Name | Mean mAP Score | Algorithm-Name | Mean mAP Score |
---|---|---|---|
ELM-MLP | Random-Forest | ||
ELM-RBF | MLP | ||
ELM-GRBF | Adaboost | ||
k-NN | QDA | ||
SVM | Gradient Boost | ||
Decision-Tree | ExtraTree |