Abstract
Visibility into network traffic is a key requirement for security and network monitoring tools. Recent trends in the evolution of Internet traffic make it difficult for traditional traffic analysis methods to accurately classify services such as Voice over IP (VoIP), text messaging, video, and audio, among others. A key aspect of this trend is the growing use of encrypted multiple service channels, whose payload is opaque to middleboxes in the network. In such scenarios, traditional approaches such as Deep Packet Inspection (DPI) or examination of port numbers cannot achieve the required classification accuracy. This work investigates machine learning-based network traffic classifiers as a means of accurately classifying encrypted multiple service channels. The study (i) proposes and evaluates two machine learning-based frameworks for the analysis of multiple service channels; (ii) undertakes feature engineering to identify the minimum number of features required to obtain high accuracy while reducing the effects of over-fitting; (iii) explores the portability and robustness of the frameworks' trained models under different network conditions: location, time, and volume; and (iv) collects and analyzes a large-scale dataset covering nine classes of services, for benchmarking purposes.
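To illustrate the kind of payload-independent features that such classifiers can rely on when traffic is encrypted, the following is a minimal sketch in Python. The `Packet` record, the `flow_features` function, and the particular statistics chosen are hypothetical and serve only as an illustration; they are not the feature set or the flow exporters used in the study.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Packet:
    """Metadata observable without decrypting the payload (hypothetical record)."""
    timestamp: float  # seconds since the start of the capture
    size: int         # packet size in bytes
    outbound: bool    # True if the packet was sent by the client


def flow_features(packets):
    """Summarize one flow with a few simple, payload-independent statistics."""
    if not packets:
        raise ValueError("flow has no packets")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    sizes = [p.size for p in ordered]
    # Inter-arrival times between consecutive packets of the flow.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    return {
        "packet_count": len(ordered),
        "total_bytes": sum(sizes),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "duration": ordered[-1].timestamp - ordered[0].timestamp,
        "outbound_ratio": sum(p.outbound for p in ordered) / len(ordered),
    }


# Example flow: four packets described only by time, size, and direction.
flow = [
    Packet(0.00, 120, True),
    Packet(0.02, 1400, False),
    Packet(0.04, 1400, False),
    Packet(0.10, 80, True),
]
features = flow_features(flow)
```

A dictionary like `features` could then be turned into one row of a training matrix for a classifier. Reducing the number of such columns while preserving accuracy is the kind of question that item (ii) of the abstract addresses.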
Notes
We abbreviate the classifier names in Table 7 for the sake of brevity. We consider Random Forest (RF), Decision Tree (DT), Complement Naive Bayes (CNB), Multinomial Naive Bayes (MNB), K-Nearest Neighbors (KNN), Bernoulli Naive Bayes (BNB), Linear Support-Vector Machine (LSVM), Classifier using Ridge Regression (RR), NearestCentroid (NC), Support-Vector Machine (SVM), Passive-Aggressive (PR), Perceptron (P), and Linear Models with Stochastic Gradient Descent (LSGD).
The detailed results are omitted for the sake of brevity.
Acknowledgements
This research is supported by the Mitacs (IT11704) and Solana Networks funding program. The research is conducted as part of the Dalhousie NIMS Lab at: https://projects.cs.dal.ca/projectx/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Safari Khatouni, A., Seddigh, N., Nandy, B. et al. Machine Learning Based Classification Accuracy of Encrypted Service Channels: Analysis of Various Factors. J Netw Syst Manage 29, 8 (2021). https://doi.org/10.1007/s10922-020-09566-5
DOI: https://doi.org/10.1007/s10922-020-09566-5