Machine Learning Based Classification Accuracy of Encrypted Service Channels: Analysis of Various Factors

Ali Safari Khatouni ORCID: orcid.org/0000-0002-6435-6933¹,
Nabil Seddigh²,
Biswajit Nandy² &
Nur Zincir-Heywood³

Abstract

Visibility into network traffic is a key requirement for security and network monitoring tools. Recent trends in the evolution of Internet traffic make it difficult for traditional traffic analysis methods to accurately classify Internet traffic, including Voice over IP (VoIP), text messaging, video, and audio services, among others. A key aspect of this trend is the rising level of encrypted multiple service channels, whose payload is opaque to middleboxes in the network. In such scenarios, traditional approaches such as Deep Packet Inspection (DPI) or examination of port numbers cannot achieve the required classification accuracy. This work investigates Machine Learning-based network traffic classifiers as a means of accurately classifying encrypted multiple service channels. The study (i) proposes and evaluates two machine learning-based frameworks for multiple service channel analysis; (ii) undertakes feature engineering to identify the minimum number of features required to obtain high accuracy while reducing the effects of over-fitting; (iii) explores the portability and robustness of the frameworks' trained models under different network conditions: location, time, and volume; and (iv) collects and analyzes a large-scale dataset including nine classes of services for benchmarking purposes.
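To make point (ii) of the abstract concrete, the following is a minimal sketch of one common way to search for a small feature set: greedy forward selection scored on held-out data. It is not the authors' procedure. The synthetic "flows", the class names, the nearest-centroid scorer, and the `min_gain` threshold are all assumptions chosen only to keep the example self-contained.

```python
import random

def make_flows(n_per_class, seed=0):
    """Toy data (an assumption, not the paper's dataset): features 0 and 1
    carry class information, features 2 and 3 are pure noise."""
    rng = random.Random(seed)
    means = {"voip": (0.0, 0.0), "video": (4.0, 0.0), "chat": (0.0, 4.0)}
    rows = []
    for label, (m0, m1) in means.items():
        for _ in range(n_per_class):
            features = [rng.gauss(m0, 1.0), rng.gauss(m1, 1.0),
                        rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
            rows.append((features, label))
    rng.shuffle(rows)
    return rows

def centroid_accuracy(train, test, feats):
    """Accuracy of a nearest-centroid classifier restricted to `feats`."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
    hits = 0
    for x, y in test:
        pred = min(centroids, key=lambda c: sum(
            (x[f] - centroids[c][i]) ** 2 for i, f in enumerate(feats)))
        hits += pred == y
    return hits / len(test)

def forward_select(train, valid, n_features, min_gain=0.02):
    """Add the feature that most improves validation accuracy; stop when
    the best remaining feature improves it by less than `min_gain`."""
    selected, best = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        acc, f = max((centroid_accuracy(train, valid, selected + [f]), f)
                     for f in candidates)
        if acc - best < min_gain:
            break
        selected.append(f)
        best = acc
    return selected, best

rows = make_flows(300)
train, valid = rows[:600], rows[600:]   # simple hold-out split
selected, best = forward_select(train, valid, n_features=4)
print("selected feature indices:", selected, "validation accuracy:", round(best, 3))
```

Scoring candidates on data held out from training is what ties the search to over-fitting: a feature that only helps on the training set yields no validation gain and is left out.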

Notes

We abbreviate the classifier names in Table 7 for the sake of brevity. We consider Random Forest (RF), Decision Tree (DT), Complement Naive Bayes (CNB), Multinomial Naive Bayes (MNB), K-Nearest Neighbors (KNN), Bernoulli Naive Bayes (BNB), Linear Support-Vector Machine (LSVM), Classifier using Ridge Regression (RR), NearestCentroid (NC), Support-Vector Machine (SVM), Passive-Aggressive (PR), Perceptron (P), and Linear Models with Stochastic Gradient Descent (LSGD).
https://scikit-learn.org/stable/index.html.
The detailed results are omitted for the sake of brevity.
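As a reading aid for the abbreviations in the first note, the sketch below pairs each one with a scikit-learn estimator class of the matching name. The pairing is an assumption based on the names alone; the notes do not state which classes or hyperparameters the authors used. `CLASSIFIERS` and `build` are hypothetical helpers introduced for illustration.

```python
import importlib

# Abbreviation -> (module, class name) in scikit-learn.
# Assumed pairing: inferred from the classifier names in the note,
# not confirmed by the source.
CLASSIFIERS = {
    "RF":   ("sklearn.ensemble", "RandomForestClassifier"),
    "DT":   ("sklearn.tree", "DecisionTreeClassifier"),
    "CNB":  ("sklearn.naive_bayes", "ComplementNB"),
    "MNB":  ("sklearn.naive_bayes", "MultinomialNB"),
    "KNN":  ("sklearn.neighbors", "KNeighborsClassifier"),
    "BNB":  ("sklearn.naive_bayes", "BernoulliNB"),
    "LSVM": ("sklearn.svm", "LinearSVC"),
    "RR":   ("sklearn.linear_model", "RidgeClassifier"),
    "NC":   ("sklearn.neighbors", "NearestCentroid"),
    "SVM":  ("sklearn.svm", "SVC"),
    "PR":   ("sklearn.linear_model", "PassiveAggressiveClassifier"),
    "P":    ("sklearn.linear_model", "Perceptron"),
    "LSGD": ("sklearn.linear_model", "SGDClassifier"),
}

def build(abbrev, **params):
    """Instantiate the estimator for an abbreviation.
    The import happens here, so scikit-learn is only needed when called."""
    module_name, class_name = CLASSIFIERS[abbrev]
    module = importlib.import_module(module_name)
    return getattr(module, class_name)(**params)
```

With scikit-learn installed, `build("RF", n_estimators=100)` would return an unfitted random forest. The table itself can be inspected without the library.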

Acknowledgements

This research is supported by the Mitacs (IT11704) and Solana Networks funding program. The research is conducted as part of the Dalhousie NIMS Lab at: https://projects.cs.dal.ca/projectx/.

Author information

Authors and Affiliations

Western University, London, Canada
Ali Safari Khatouni
Solana Networks, Ottawa, Canada
Nabil Seddigh & Biswajit Nandy
Dalhousie University, Halifax, Canada
Nur Zincir-Heywood

Corresponding author

Correspondence to Ali Safari Khatouni.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Safari Khatouni, A., Seddigh, N., Nandy, B. et al. Machine Learning Based Classification Accuracy of Encrypted Service Channels: Analysis of Various Factors. J Netw Syst Manage 29, 8 (2021). https://doi.org/10.1007/s10922-020-09566-5

Received: 11 March 2020
Revised: 19 September 2020
Accepted: 25 September 2020
Published: 31 October 2020
DOI: https://doi.org/10.1007/s10922-020-09566-5
