PREIUD: An Industrial Control Protocols Reverse Engineering Tool Based on Unsupervised Learning and Deep Neural Network Methods
Figure 1. The system architecture of PRE. Statistics for output types are reflected in the word cloud.
Figure 2. The processing steps of PREIUD.
Figure 3. An example of the BVE algorithm's voting process.
Figure 4. The overall architecture of the BiLSTM-AM-CRF model.
Figure 5. The PLC simulation platform used in the experiment.
Figure 6. Accuracy of feature extraction.
Figure 7. Conciseness of feature extraction.
Figure 8. Coverage of feature extraction.
Figure 9. Performance comparison of three protocol reverse tools.
Abstract
1. Introduction
- We propose a novel unsupervised learning approach, the bootstrap voting expert (BVE) algorithm, to address the challenge of protocol feature extraction. Our experimental results demonstrate that this method outperforms several commonly used unsupervised feature extraction algorithms in accurately inferring field boundaries.
- The proposed tool for industrial control protocol (ICP) reverse engineering, PREIUD, uses a deep neural network model for protocol format and semantic inference. The model combines a bidirectional long short-term memory network, an attention mechanism, and a conditional random field (BiLSTM-AM-CRF), which enables it to learn potential dependencies between protocol fields and improves the accuracy of ICP reversal. Notably, we introduce the concept of sequence tagging into protocol reverse engineering, which further improves the accuracy and interpretability of the reversal results.
- Unlike most prior work on protocol reverse engineering, we generated the sample datasets for our experiments from traffic collected on an offensive and defensive exercise platform based on actual industrial scenarios. The platform comprises control systems from various leading manufacturers and a diverse range of industrial control protocols. Compared with approaches that rely solely on static datasets or simulated traffic, this provides a more authentic representation of industrial traffic characteristics. It also addresses the problem of low test coverage, thereby enhancing the credibility of the experimental findings and the generalizability of the model.
- We employed a multidimensional quantitative evaluation method, based on fuzzy comprehensive evaluation, to compare the performance of two advanced protocol reverse tools (MSERA and Discoverer) against PREIUD. The experimental results indicate that PREIUD is more effective and practical for the reverse analysis of industrial control protocols.
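
To make the idea of vote-based field boundary inference concrete, the following is a minimal sketch of the plain Voting Experts scheme (Cohen et al.) applied to byte sequences. It is not the bootstrap variant used by PREIUD, and it is a simplification under stated assumptions: the function names, the window size, and the vote threshold are all hypothetical, and the frequency expert uses standardized n-gram frequency rather than internal entropy.

```python
from collections import Counter, defaultdict
from math import log2
from statistics import mean, pstdev


def build_stats(messages, max_n):
    """Count every n-gram (n <= max_n) and the byte values that follow it."""
    freq = Counter()
    followers = defaultdict(Counter)
    for msg in messages:
        for i in range(len(msg)):
            for n in range(1, max_n + 1):
                if i + n > len(msg):
                    break
                gram = msg[i:i + n]
                freq[gram] += 1
                if i + n < len(msg):
                    followers[gram][msg[i + n]] += 1
    return freq, followers


def entropy(counter):
    """Shannon entropy of a count distribution (0.0 if it is empty)."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counter.values())


def standardize(values):
    """z-score each value among n-grams of the same length."""
    by_len = defaultdict(list)
    for gram, v in values.items():
        by_len[len(gram)].append(v)
    stats = {n: (mean(vs), pstdev(vs) or 1.0) for n, vs in by_len.items()}
    return {g: (v - stats[len(g)][0]) / stats[len(g)][1]
            for g, v in values.items()}


def voting_experts(messages, window=3, threshold=2):
    """Split each message where the two experts' votes peak."""
    freq, followers = build_stats(messages, window)
    z_freq = standardize(dict(freq))
    z_ent = standardize({g: entropy(followers[g]) for g in freq})
    segmented = []
    for msg in messages:
        votes = [0] * (len(msg) + 1)
        for i in range(len(msg) - window + 1):
            win = msg[i:i + window]
            splits = range(1, window)
            # Frequency expert: prefer a split into two frequent chunks.
            k_f = max(splits, key=lambda k: z_freq[win[:k]] + z_freq[win[k:]])
            # Entropy expert: prefer a split after an unpredictable prefix.
            k_e = max(splits, key=lambda k: z_ent[win[:k]])
            votes[i + k_f] += 1
            votes[i + k_e] += 1
        # Cut at local vote maxima that reach the threshold.
        cuts = [p for p in range(1, len(msg))
                if votes[p] >= threshold
                and votes[p] > votes[p - 1] and votes[p] >= votes[p + 1]]
        bounds = [0] + cuts + [len(msg)]
        segmented.append([msg[a:b] for a, b in zip(bounds, bounds[1:])])
    return segmented


# Hypothetical toy traffic: three 12-byte requests that differ in a few bytes.
messages = [
    bytes.fromhex("001500000006ff0401f40064"),
    bytes.fromhex("001600000006ff0401f40064"),
    bytes.fromhex("001700000006ff0300000002"),
]
segments = voting_experts(messages)
```

On a corpus this small the inferred boundaries are not meaningful; the sketch only shows how votes accumulate over a sliding window and how peaks become candidate field boundaries.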
2. Related Work
3. Materials and Methods
3.1. Data Collection and Preprocessing
3.2. Protocol Field Segmentation and Feature Extraction
3.3. Protocol Format and Semantic Inference Model
3.3.1. Embedding Layer
3.3.2. BiLSTM Layer
3.3.3. Attention Layer
3.3.4. CRF Layer
4. Experiment and Evaluation
4.1. Datasets
4.2. Evaluation of Feature Extraction
- Accuracy of feature extraction: we compared the first s inferred key fields with prior knowledge of the corresponding protocols, to judge whether each method could accurately extract the features of the target protocol. No points were awarded if a specific keyword was omitted or a fixed field was wrongly split.
- Conciseness of feature extraction: we computed the ratio of the number of extracted top s feature words to the number of all key fields of the corresponding protocol. An overly conservative segmentation strategy produced redundant protocol features; a lower ratio therefore indicated less redundancy in the protocol format.
- Coverage of feature extraction: we computed the proportion of the entire protocol information covered by the first s key fields. Higher coverage reflected more comprehensive feature extraction.
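
The three criteria above can be read as simple ratios. The sketch below is one possible reading, not the scoring procedure used in the experiments: the function name `feature_metrics`, the exact-match rule, and the toy inputs are assumptions made for illustration.

```python
def feature_metrics(extracted, key_fields, s):
    """Return (accuracy, conciseness, coverage) for the top-s extracted features.

    Assumed reading: a feature counts as correct only if it exactly matches
    a known key field of the protocol.
    """
    if not key_fields:
        raise ValueError("key_fields must not be empty")
    top = extracted[:s]
    hits = [f for f in top if f in key_fields]
    # Share of the top-s features that match a known key field.
    accuracy = len(hits) / len(top) if top else 0.0
    # Number of extracted features relative to the number of key fields.
    conciseness = len(top) / len(key_fields)
    # Share of the known key fields recovered by the top-s features.
    coverage = len(set(hits)) / len(key_fields)
    return accuracy, conciseness, coverage


# Hypothetical example: four extracted features, four known key fields.
extracted = [b"\x00\x00", b"\xff", b"\x04", b"\x12\x34"]
key_fields = {b"\x00\x00", b"\xff", b"\x04", b"\x03"}
accuracy, conciseness, coverage = feature_metrics(extracted, key_fields, s=4)
```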
4.3. Evaluation of Format and Semantic Inference Model
4.3.1. Model Training Setting
4.3.2. Results of Format and Semantic Inference
4.3.3. Evaluation and Comparison Analysis
- Precision: the proportion of samples predicted as a particular type that actually belonged to that type. A higher precision score meant that the model produced fewer false positives.
- Recall: the proportion of samples of a particular type that were correctly identified. A higher recall score meant that the model missed fewer positive samples.
- F1-score: the harmonic mean of precision and recall, which combines both into a single measure. A higher F1-score meant that the model performed well in both precision and recall.
- Conciseness: described the complexity of the model. A simpler model was more interpretable and easier to understand. We computed the ratio of the number of protocol states inferred by PRE to the number of sample protocol types: the higher the ratio, the higher the redundancy. A ratio of less than 1 indicated that the semantic inference was incomplete, and that the item could not be scored.
- Efficiency: described the speed and scalability of the model. A more efficient model could handle larger datasets and make predictions faster. We counted the time interval required to complete the reverse work of the protocol for a fixed-size data stream (1kb): the shorter the time interval, the higher the efficiency.
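
Precision, recall, and F1-score follow the standard definitions, so they can be computed directly from per-field tags. The following is a minimal sketch; the function names and the toy tag lists are hypothetical. The last line checks the F1-score against the S7comm precision and recall reported for PRE in the results table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf1(true_tags, pred_tags, label):
    """Precision, recall and F1 for one field type, from aligned tag lists."""
    pairs = list(zip(true_tags, pred_tags))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical tags for six fields; one function-code field is mislabeled.
true_tags = ["ID", "ID", "LEN", "FC", "DATA", "DATA"]
pred_tags = ["ID", "ID", "LEN", "DATA", "DATA", "DATA"]
precision, recall, f1 = prf1(true_tags, pred_tags, "DATA")

# Consistency check with the reported S7comm values (86.8 % and 84.2 %).
s7comm_f1 = round(f1_score(86.8, 84.2), 1)
```

With these inputs, `DATA` has two true positives, one false positive, and no false negatives, and `s7comm_f1` rounds to the 85.5 listed in the results table.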
5. Discussion
6. Conclusions
7. Patents
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- ICS-CERT 2021 Annual Vulnerability Coordination Report. Available online: https://www.cisa.gov/uscert/ics/alerts (accessed on 14 January 2022).
- Narayan, J.; Shukla, S.K.; Clancy, T.C. A survey of automatic protocol reverse engineering tools. ACM Comput. Surv. (CSUR) 2015, 48, 1–26.
- Aldallal, A. Toward Efficient Intrusion Detection System Using Hybrid Deep Learning Approach. Symmetry 2022, 14, 1916.
- Luo, J.Z.; Shan, C.; Cai, J.; Liu, Y. IoT Application-Layer Protocol Vulnerability Detection using Reverse Engineering. Symmetry 2018, 10, 561.
- Alomari, E.S.; Nuiaa, R.R.; Alyasseri, Z.A.A.; Mohammed, H.J.; Sani, N.S.; Esa, M.I.; Musawi, B.A. Malware Detection Using Deep Learning and Correlation-Based Feature Selection. Symmetry 2023, 15, 123.
- Galloway, B.; Hancke, G.P. Introduction to industrial control networks. IEEE Commun. Surv. Tutor. 2012, 15, 860–880.
- Sija, B.D.; Goo, Y.H.; Shim, K.S.; Hasanova, H.; Kim, M.S. A survey of automatic protocol reverse engineering approaches, methods, and tools on the inputs and outputs view. Secur. Commun. Netw. 2018, 2018, 8370341.
- Kowsari, K.; Jafari Meimandi, K.; Heidarysafa, M.; Mendu, S.; Barnes, L.; Brown, D. Text classification algorithms: A survey. Information 2019, 10, 150.
- Xiao, M.M.; Luo, Y.P. Automatic protocol reverse engineering using grammatical inference. J. Intell. Fuzzy Syst. 2017, 32, 3585–3594.
- Meng, F.; Zhang, C.; Wu, G. Protocol reverse based on hierarchical clustering and probability alignment from network traces. In Proceedings of the 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA), Shanghai, China, 9–12 March 2018; pp. 443–447.
- Kleber, S.; van der Heijden, R.W.; Kargl, F. Message type identification of binary network protocols using continuous segment similarity. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 2243–2252.
- Yang, C.; Fu, C.; Qian, Y.; Hong, Y.; Feng, G.; Han, L. Deep learning-based reverse method of binary protocol. In Proceedings of the International Conference on Security and Privacy in Digital Economy, Quzhou, China, 30 October–1 November 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 606–624.
- Wang, Y.; Bai, B.; Hei, X.; Zhu, L.; Ji, W. An unknown protocol syntax analysis method based on convolutional neural network. Trans. Emerg. Telecommun. Technol. 2021, 32, e3922.
- Kiechle, V.; Börsig, M.; Nitzsche, S.; Baumgart, I.; Becker, J. PREUNN: Protocol Reverse Engineering using Neural Networks. In Proceedings of the ICISSP, Online Streaming, 9–11 February 2022; pp. 345–356.
- Wang, R.; Shi, Y.; Ding, J. Reverse Engineering of Industrial Control Protocol By XGBoost with V-gram. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; pp. 172–176.
- Wang, X.; Lv, K.; Li, B. IPART: An automatic protocol reverse engineering tool based on global voting expert for industrial protocols. Int. J. Parallel Emergent Distrib. Syst. 2020, 35, 376–395.
- Zhang, Z.; Zhang, Z.; Lee, P.P.; Liu, Y.; Xie, G. ProWord: An unsupervised approach to protocol feature word extraction. In Proceedings of the IEEE INFOCOM 2014-IEEE Conference on Computer Communications, Toronto, ON, Canada, 27 April–2 May 2014; pp. 1393–1401.
- Cohen, P.; Adams, N.; Heeringa, B. Voting experts: An unsupervised algorithm for segmenting sequences. Intell. Data Anal. 2007, 11, 607–625.
- Hewlett, D.; Cohen, P. Bootstrap voting experts. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence, Hainan, China, 25–26 April 2009.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. arXiv 2013, arXiv:1310.4546.
- Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45.
- Jang, B.; Kim, M.; Harerimana, G.; Kang, S.u.; Kim, J.W. Bi-LSTM model to increase accuracy in text classification: Combining Word2vec CNN and attention mechanism. Appl. Sci. 2020, 10, 5841.
- Liu, G.; Guo, J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 2019, 337, 325–338.
- Zheng, S.; Jayasumana, S.; Romera-Paredes, B.; Vineet, V.; Su, Z.; Du, D.; Huang, C.; Torr, P.H. Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1529–1537.
- Lou, H.L. Implementing the Viterbi algorithm. IEEE Signal Process. Mag. 1995, 12, 42–52.
- Zong, X.; Zhang, J.; He, K. An Offensive and Defensive Exercise Platform for Industrial Control System Network Information Security. J. Shenyang Univ. Chem. Technol. 2021, 36, 296–304.
- Li, H.; Shuai, B.; Wang, J.; Tang, C. Protocol reverse engineering using LDA and association analysis. In Proceedings of the 2015 11th International Conference on Computational Intelligence and Security (CIS), Shenzhen, China, 19–20 December 2015; pp. 312–316.
- Wang, Y.; Yun, X.; Shafiq, M.Z.; Wang, L.; Liu, A.X.; Zhang, Z.; Yao, D.; Zhang, Y.; Guo, L. A semantics aware approach to automated reverse engineering unknown protocols. In Proceedings of the 2012 20th IEEE International Conference on Network Protocols (ICNP), Austin, TX, USA, 30 October–2 November 2012; pp. 1–10.
- Lopes, R.H.; Reid, I.; Hobson, P.R. The Two-Dimensional Kolmogorov-Smirnov Test. In Proceedings of the Xi International Workshop on Advanced Computing & Analysis Techniques in Physics Research, Amsterdam, The Netherlands, 23–27 April 2007.
- Zhang, Z. Improved adam optimizer for deep neural networks. In Proceedings of the 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), Banff, AB, Canada, 4–6 June 2018; pp. 1–2.
- Huang, Y.; Shu, H.; Kang, F.; Guang, Y. Protocol Reverse-Engineering Methods and Tools: A Survey. Comput. Commun. 2022, 182, 238–254.
- Wang, Q.; Sun, Z.; Wang, Z.; Ye, S.; Su, Z.; Chen, H.; Hu, C. A Practical Format and Semantic Reverse Analysis Approach for Industrial Control Protocols. Secur. Commun. Netw. 2021, 2021, 6690988.
- Cui, W.; Kannan, J.; Wang, H.J. Discoverer: Automatic Protocol Reverse Engineering from Network Traces. In Proceedings of the USENIX Security Symposium; USENIX Association: Berkeley, CA, USA, 2007; pp. 1–14.
- Bossert, G.; Guihéry, F.; Hiet, G. Towards automated protocol reverse engineering using semantic information. In Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security, Kyoto, Japan, 4–6 June 2014; pp. 51–62.
- Meng, F.; Liu, Y.; Zhang, C.; Li, T.; Yue, Y. Inferring protocol state machine for binary communication protocol. In Proceedings of the 2014 IEEE Workshop on Advanced Research and Technology in Industry Applications (WARTIA), Ottawa, ON, Canada, 29–30 September 2014; pp. 870–874.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized autoregressive pretraining for language understanding. arXiv 2019, arXiv:1906.08237.
- Hu, Z.; Shi, J.; Huang, Y.; Xiong, J.; Bu, X. GANFuzz: A GAN-based industrial network protocol fuzzing framework. In Proceedings of the 15th ACM International Conference on Computing Frontiers, Ischia, Italy, 8–10 May 2018; pp. 138–145.

Tool or Author | Protocol Type | Feature Extraction | Format Inference | Features |
---|---|---|---|---|
Meng F | Binary | Hierarchical clustering | Probabilistic alignment | Efficient but moderately accurate inference |
NEMETYL | Binary | DBSCAN clustering | Hirschberg alignment | Field dissimilarity considered, but limited test protocols |
Yang C | Binary | Sequence coding | LSTM-FCN | Deep learning and field encoding proposed for reverse engineering |
Wang R | Binary (ICPs) | Progressive multi-sequence clustering | XGBoost | Effective SIEMENS protocol reversal, but low test coverage |
IPART | Binary (ICPs) | Extended voting expert | Global voting expert algorithm | Able to reverse Modbus, IEC104, and Ethernet |
PREIUD | Binary (ICPs) | Bootstrap voting expert | BiLSTM-AM-CRF | Combines unsupervised learning and deep neural networks for efficient reversal of most ICPs |

Protocol | Source | Type | Flow |
---|---|---|---|
S7comm | PLC: SIEMENS S7-300, SIEMENS S7-1200 | request, response, Read, Write, upload, download, run, stop | 42,920 |
Fins | PLC: OMRON CP1L | Read, multiple read, Transfer, write, upload, download, run, stop | 5464 |
Modbus/TCP | PLC: Rockwell MicroLogix 1400, Emerson CPE100; Application: pymodbus | Read and write registers/coils, report slave, unknown function | 35,324 |
Ethernet/IP | PLC: AB CompactLogix L30ER, MITSUBISHI FX5U32M | Send/reserved data | 16,782 |
DNP3 | Application: Gec-dnp3 | File_read, file_list_directory, full_exchange | 4538 |
IEC104 | Packet: lib60870 | U-format | 5381 |

Parameter | Value |
---|---|
Field embedding size | 50 |
Feature embedding size | 50 |
Size of LSTM hidden unit | 200 |
Mini-batch size | 64 |
Learning rate | 0.001 |
Dropout rate | 0.3 |
Time steps | 100 |

Offset tag | Field sequence | Wireshark results | Semantic tag |
---|---|---|---|
00-01 | 00 15 | Transaction ID | Transaction ID |
02-03 | 00 00 | Protocol ID | Protocol: Modbus/TCP |
04-05 | 00 06 | Length | Length |
06 | ff | Unit ID | Unit ID |
07 | 04 | Function code: read | Function code: read |
08-09 | 01 f4 | Reference number | Data |
10-11 | 00 64 | Word count | Data |

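
The field sequence in the table above is a standard Modbus/TCP request, so its Wireshark labels can be reproduced with a few lines of Python. The sketch below is only an illustration of the ground truth that the semantic tags are compared against; the `LAYOUT` list and the variable names are hypothetical, while the byte layout follows the Modbus/TCP framing (big-endian MBAP header followed by the function code and data).

```python
import struct

# The 12-byte field sequence from the table above.
frame = bytes.fromhex("00 15 00 00 00 06 ff 04 01 f4 00 64")

# Big-endian layout: three 16-bit MBAP fields, two single bytes,
# then two 16-bit words carried as request data.
LAYOUT = [
    ("transaction_id", "H"),
    ("protocol_id", "H"),
    ("length", "H"),
    ("unit_id", "B"),
    ("function_code", "B"),
    ("reference_number", "H"),
    ("word_count", "H"),
]

fmt = ">" + "".join(code for _, code in LAYOUT)
values = struct.unpack(fmt, frame)
fields = dict(zip((name for name, _ in LAYOUT), values))

# One tag per byte offset, the shape of label a sequence tagger predicts.
offset_tags = []
for name, code in LAYOUT:
    offset_tags.extend([name] * struct.calcsize(">" + code))

for name, value in fields.items():
    print(f"{name:17s} {value}")
```

Function code 0x04 is Read Input Registers in the Modbus specification, which matches the `read` label, and the length value 6 counts the bytes that follow the length field.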
Evaluation Factor | Precision | Recall | F1-Score | Conciseness | Efficiency |
---|---|---|---|---|---|
Weight | 0.3 | 0.25 | 0.25 | 0.1 | 0.1 |
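
The weights above combine the five evaluation factors into a single score. The sketch below shows only the weighted-average step, assuming each factor has already been normalized to the range 0 to 1. The membership functions and normalization used in the fuzzy comprehensive evaluation are not given here, so this sketch is not expected to reproduce the combined scores reported for the three tools; the function name and the sample scores are hypothetical.

```python
# Factor weights taken from the table above.
WEIGHTS = {
    "precision": 0.30,
    "recall": 0.25,
    "f1": 0.25,
    "conciseness": 0.10,
    "efficiency": 0.10,
}


def combined_score(scores, weights=WEIGHTS):
    """Weighted average of normalized factor scores (each in [0, 1])."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same factors")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical normalized scores for one tool.
example = {
    "precision": 0.8,
    "recall": 0.8,
    "f1": 0.8,
    "conciseness": 0.5,
    "efficiency": 0.5,
}
score = combined_score(example)
```

For the hypothetical scores above, the three accuracy factors contribute 0.64 and the remaining two contribute 0.10, giving 0.74.
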

Protocol | Precision (%), PRE | Precision (%), MSE | Precision (%), DIS | Recall (%), PRE | Recall (%), MSE | Recall (%), DIS | F1-Score (%), PRE | F1-Score (%), MSE | F1-Score (%), DIS | Conciseness, PRE | Conciseness, MSE | Conciseness, DIS | Efficiency, PRE | Efficiency, MSE | Efficiency, DIS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S7comm | 86.8 | 81.2 | 55.6 | 84.2 | 83.8 | 67.3 | 85.5 | 82.5 | 60.9 | 2.5 | 2 | 0.88 | 0.3 | 0.15 | 0.1 |
Fins | 81.2 | 69.4 | 43.8 | 80.2 | 73.5 | 52.8 | 80.7 | 71.4 | 47.9 | 3 | 2.63 | 0.67 | 0.3 | 0.1 | 0.1 |
Modbus | 91.4 | 82.4 | 83.7 | 92.5 | 85.6 | 86.2 | 91.9 | 84.0 | 84.9 | 1.5 | 1.33 | 1.2 | 0.3 | 0.1 | 0.1 |
Ethernet | 88.2 | 81.1 | 76.8 | 89.6 | 78.2 | 74.5 | 88.9 | 79.6 | 75.6 | 1.33 | 1.5 | 1.33 | 0.3 | 0.1 | 0.1 |
DNP3 | 84.1 | 79.4 | 82.5 | 83.3 | 82.5 | 81.9 | 83.7 | 80.9 | 82.2 | 1.67 | 0.88 | 1.12 | 0.3 | 0.15 | 0.1 |
IEC104 | 79.3 | 73.5 | 76.1 | 80.6 | 78.2 | 80.4 | 79.9 | 75.8 | 78.2 | 2 | 1.5 | 0.3 | 0.15 | 0.1 | 0.1 |
Total(%) | 85.2 | 77.8 | 69.8 | 85.1 | 80.3 | 73.9 | 85.1 | 79.0 | 71.6 | 0.17 | 0.33 | 0.66 | 0 | 0.5 | 0.83 |
Tool | PREIUD | MSERA | Discoverer |
---|---|---|---|
Combined score | 70.15 | 64.35 | 60.2 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ning, B.; Zong, X.; He, K.; Lian, L. PREIUD: An Industrial Control Protocols Reverse Engineering Tool Based on Unsupervised Learning and Deep Neural Network Methods. Symmetry 2023, 15, 706. https://doi.org/10.3390/sym15030706