CN119675922B - Malicious control instruction detection method and system based on CNN-LSTM hybrid model

Info

Publication number: CN119675922B
Application number: CN202411727815.XA
Authority: CN
Inventors: 杨舒钧; 宋进良; 李桐; 任帅; 陈得丰; 张浩明; 雷振江; 夏天; 杨超; 刘卓林; 李广翱; 扬爽; 李欢; 肖楠; 孙茜; 朱紫煜; 刘东阳; 黄博南
Original assignee: State Grid Liaoning Electric Power Co Ltd; Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd; State Grid Corp of China SGCC
Current assignee: State Grid Liaoning Electric Power Co Ltd; Electric Power Research Institute of State Grid Liaoning Electric Power Co Ltd; State Grid Corp of China SGCC
Priority date: 2024-11-28
Filing date: 2024-11-28
Publication date: 2025-10-03
Anticipated expiration: 2044-11-28
Also published as: CN119675922A

Abstract

The invention provides a malicious control instruction detection method and system based on a CNN-LSTM hybrid model. The method is applied to a power distribution network communication network. First, the open-source Cuckoo Sandbox is used to extract the API call sequences generated by the execution behavior of malicious control instructions in power distribution network operation software, and a training data set is constructed. Next, a CNN-LSTM hybrid model is established: part of the API call sequence is extracted as input data, a loss function is constructed, and a Sigmoid function for binary classification is applied to obtain a classification result indicating whether a control instruction is malicious. The effectiveness of the proposed method on malicious control instructions is then demonstrated with a confusion matrix, and metrics such as precision, recall, accuracy and F1 score are compared against those of conventional CNN and LSTM models to show the performance advantage of the CNN-LSTM hybrid model. Finally, the hyperparameters are tuned to determine the optimal CNN-LSTM hybrid model. The invention can detect malicious control instructions more efficiently and accurately.

Description

Malicious control instruction detection method and system based on CNN-LSTM hybrid model

Technical Field

The invention relates to the technical field of detection of malicious control instructions in an active power distribution network communication network, in particular to a method and a system for detecting malicious control instructions based on a CNN-LSTM hybrid model.

Background

In modern society, distribution networks play a vital role as the critical infrastructure that delivers electrical energy from power plants to user terminals. First, the distribution network is the basis for securing the supply of electricity: almost all production and daily-life activities in modern society depend on the power supply. Second, the distribution network is a key link in power transmission and therefore essential to socioeconomic operation. Any disturbance to the distribution network may interrupt the power supply, halt production and cause economic losses, so the safety and stability of the distribution network are important for sustainable socioeconomic development. In addition, the distribution network is an important guarantee of public safety. Overload, short circuit or damage of distribution equipment can cause fires, explosions and other safety accidents that seriously threaten lives and property. Ensuring that the power distribution system operates in a safe and stable state is therefore key to ensuring public safety.

PLC (Power Line Communication) technology in power distribution networks is divided into wired and wireless technologies for distributing power and transmitting data over different frequency ranges. Malicious control instructions are one of the major causes of slow or unavailable communications, erroneous or altered instructions, faults, and observed system anomalies. Malicious control instructions may cause abnormal operation of the power distribution equipment or paralysis of the system, and may even pose a serious threat to public safety. Therefore, with the rapid spread of novel malicious control instructions, an effective technical means is needed for detecting malicious control instructions in the power distribution network communication network.

An IDS (Intrusion Detection System) is one of the solutions for handling security attacks on the distribution network and protecting the system from unauthorized access by malicious control instructions. However, existing IDS systems face two problems. On the one hand, attack samples are rare and changeable, and the distribution of positive and negative samples is extremely unbalanced, so models are difficult to learn and train. On the other hand, as the number of control instructions grows, more and more permissions are needed, and to avoid detection by the system's built-in protection and antivirus software, malicious control instructions have begun to request the same series of permissions as benign control instructions. Malicious control instructions are therefore difficult to identify, and a feasible, rapid identification and detection algorithm is required to identify, prevent and cope with their influence, so as to ensure the safe and stable operation of the power distribution network, guarantee the reliability of power supply, and maintain social order and public safety.

With the development of artificial intelligence, deep learning has opened up a new research direction for the network security of distribution-network-side communication networks. Detecting and classifying malicious control instructions is essentially a classification problem, and deep learning performs well on a wide variety of classification tasks. After the CNN (Convolutional Neural Network) model, widely used in image processing, was proposed, some researchers drew inspiration from NLP (Natural Language Processing) methods and proposed a multi-layer convolutional neural network model in which assembly code produced by a decompiler is treated as text data and the instruction set of the assembly code is used as a dictionary to train the model. However, this method requires very expensive program analysis. Other researchers have proposed a malicious control instruction classification method named MCFT-CNN (Malware Classification with Fine-Tune Convolution Neural Networks) based on a fine-tuned convolutional neural network model. Model parameters already trained on the ImageNet dataset are used for transfer learning to classify malicious control instructions into their corresponding families. However, the use of pre-trained parameters can make the model too bulky and complex, which affects detection efficiency.

Disclosure of Invention

To address the shortcomings of existing malicious control instruction detection technology, the invention provides a method and a system for detecting malicious control instructions based on a CNN-LSTM hybrid model. The open-source Cuckoo Sandbox is used to extract the API (Application Programming Interface) call sequences generated by the execution behavior of malicious control instructions in power distribution network operation software, and the model is trained on a large data set of API sequences called by malicious and benign control instructions. Then, a CNN-LSTM (LSTM: Long Short-Term Memory network) model structure is established, part of an API call sequence is extracted as input data, a Sigmoid function for binary classification is applied to obtain a classification result, and whether a control instruction is malicious is detected. Finally, a confusion matrix shows that the proposed method can detect malicious control instructions more efficiently and accurately.

The invention adopts the following technical scheme.

The first aspect of the invention provides a malicious control instruction detection method based on a CNN-LSTM hybrid model, which is applied to a power distribution network communication network and comprises the following steps:

acquiring a plurality of API call sequences generated when a control instruction to be detected is executed in the distribution network communication network software;

encrypting the API call sequences into hash values by using a hash algorithm to obtain a plurality of encrypted API call sequences;

inputting the plurality of encrypted API call sequences into a pre-built malicious control instruction detection model, and outputting whether the control instruction to be detected is a malicious control instruction, wherein the malicious control instruction detection model is a hybrid model built based on CNN and LSTM and is obtained through machine learning training using a plurality of groups of data; the plurality of groups of data comprise first-class data and second-class data; each group of data in the first-class data comprises an encrypted API call sequence generated when a malicious control instruction is executed in the distribution network communication network software; and each group of data in the second-class data comprises an encrypted API call sequence generated when a benign control instruction is executed in the distribution network communication network software.

Optionally, each of the plurality of API call sequences includes the first 100 non-repeated consecutive API calls associated with the parent process that calls the API.

Optionally, constructing the hybrid model based on CNN and LSTM includes:

Selecting an API call sequence from a plurality of groups of data as an input sequence;

Constructing a CNN model, inputting the input sequence into the CNN model, and processing it sequentially through the convolutional layer and the pooling layer;

Constructing an LSTM model, inputting the output of the pooling layer into the LSTM model, and inputting the output of the LSTM model into the fully connected layer;

An output layer is established and used for outputting characteristic data related to the malicious control instruction, wherein the characteristic data comprises a network communication destination of the control instruction, a target path operated by the control instruction and a behavior mode of the control instruction;

and establishing a classification layer, inputting the characteristic data output by the output layer into the classification layer, and outputting whether the control instruction to be detected is a malicious control instruction by the classification layer to complete the establishment of the hybrid model based on the CNN and the LSTM.

Optionally, the CNN model further includes:

an embedding layer, used for performing dimension reduction on the input sequence;

and a batch normalization layer, connected to the embedding layer and used for normalizing the data output by the embedding layer to a preset range before inputting the normalized data to the convolution layer.

Optionally, the convolution layers include a first convolution layer and a second convolution layer.

Optionally, the first convolution layer has 64 filters and the second convolution layer has 128 filters.
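As a rough illustration of the layer stack described above, the following sketch traces how the shape of one 100-call input sequence might change from layer to layer. Only the sequence length (100) and the filter counts (64 and 128) come from the text; the embedding width, kernel size, pooling size, LSTM width, fully connected width, and the use of unpadded stride-1 convolutions are all assumptions chosen for illustration.

```python
def conv1d(shape, filters, kernel_size):
    """Unpadded 1-D convolution, stride 1: (steps, channels) -> (steps - k + 1, filters)."""
    steps, _ = shape
    return (steps - kernel_size + 1, filters)


def max_pool1d(shape, pool_size):
    """Non-overlapping pooling along the time axis."""
    steps, channels = shape
    return (steps // pool_size, channels)


def trace_shapes(seq_len=100, embed_dim=8, kernel_size=3,
                 pool_size=2, lstm_units=32, dense_units=16):
    """Return [(layer name, output shape)] for a single input sequence.

    embed_dim, kernel_size, pool_size, lstm_units and dense_units are
    hypothetical values, not taken from the source.
    """
    trace = []
    shape = (seq_len, embed_dim)              # embedding: one vector per API call
    trace.append(("embedding", shape))
    trace.append(("batch_norm", shape))       # normalization keeps the shape
    shape = conv1d(shape, 64, kernel_size)    # first convolution layer, 64 filters
    trace.append(("conv_1", shape))
    shape = conv1d(shape, 128, kernel_size)   # second convolution layer, 128 filters
    trace.append(("conv_2", shape))
    shape = max_pool1d(shape, pool_size)
    trace.append(("pooling", shape))
    trace.append(("lstm", (lstm_units,)))     # assumed: only the last hidden state is kept
    trace.append(("fully_connected", (dense_units,)))
    trace.append(("classification", (1,)))    # single Sigmoid probability
    return trace


for name, shape in trace_shapes():
    print(f"{name:16s} {shape}")
```

With these assumed settings the time axis shrinks from 100 to 48 steps before the LSTM; other kernel or pooling sizes would give different intermediate shapes.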

Optionally, in the classification layer, a Sigmoid function is used to output a probability that the control instruction to be detected is a malicious control instruction according to the feature data.
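A minimal sketch of the Sigmoid step in the classification layer is given below. The function names and the 0.5 decision threshold are assumptions; the text only states that a Sigmoid function outputs the probability that the instruction is malicious.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability in (0, 1), avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def is_malicious(score, threshold=0.5):
    """Hypothetical decision rule: flag the instruction when the probability
    reaches the (assumed) threshold."""
    return sigmoid(score) >= threshold


print(sigmoid(0.0))        # 0.5
print(is_malicious(2.0))   # True
print(is_malicious(-2.0))  # False
```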

Optionally, the method further comprises:

Evaluating the performance of the malicious control instruction detection model to obtain an evaluation result;

And adjusting parameters of the malicious control instruction detection model according to the loss function and the evaluation result, wherein the parameters comprise at least one of pooling rate, layer number, channel number and hidden neuron number.

Optionally, evaluating the performance of the malicious control instruction detection model includes:

Performing a calculation on the output of the malicious control instruction detection model using a confusion matrix, wherein the rows of the confusion matrix represent the actual category and the columns represent the category output by the malicious control instruction detection model; the confusion matrix comprises four cells, each corresponding to a preset category; the preset categories comprise true positive, false positive, false negative and true negative; and the value in each cell represents the number of times the malicious control instruction detection model predicts the actual category as that preset category;

And calculating an evaluation result according to the confusion matrix.
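The evaluation step can be sketched as follows, using the standard definitions of precision, recall, accuracy and F1 score computed from the four confusion matrix cells. The function names and the convention that label 1 means "malicious" are assumptions for illustration.

```python
def confusion_counts(actual, predicted):
    """Count the four confusion matrix cells; label 1 = malicious, 0 = benign (assumed)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1          # malicious, predicted malicious
        elif not a and p:
            fp += 1          # benign, predicted malicious
        elif a and not p:
            fn += 1          # malicious, predicted benign
        else:
            tn += 1          # benign, predicted benign
    return tp, fp, fn, tn


def evaluate(tp, fp, fn, tn):
    """Derive the usual metrics; a metric is 0.0 when its denominator is zero."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


counts = confusion_counts([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1])
print(counts, evaluate(*counts))
```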

Optionally, adjusting parameters of the malicious control instruction detection model according to the loss function and the evaluation result includes:

calculating a comprehensive index value according to the loss function and the evaluation result and the following formula;

Q = α × e^(ln L) + β × (1 − e^(ln Z))

where Q is the comprehensive index value; L is the loss value of the malicious control instruction detection model, obtained through the loss function; Z is the evaluation result of the performance of the malicious control instruction detection model; and α and β are non-negative weight coefficients;

And when Q is greater than the tuning threshold, tuning parameters of the malicious control instruction detection model.
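The comprehensive index can be sketched as below. Because e^(ln x) = x for any x > 0, the formula is numerically equal to α × L + β × (1 − Z) whenever L and Z are positive. The weight values and the tuning threshold used here are hypothetical; the text does not specify them.

```python
import math


def composite_index(loss, score, alpha=0.5, beta=0.5):
    """Q = alpha * e^(ln L) + beta * (1 - e^(ln Z)).

    loss (L) and score (Z) must be positive for the logarithm to be defined;
    alpha and beta are non-negative weights (values here are assumed).
    """
    if loss <= 0 or score <= 0:
        raise ValueError("L and Z must be positive for ln to be defined")
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    return (alpha * math.exp(math.log(loss))
            + beta * (1.0 - math.exp(math.log(score))))


def needs_tuning(loss, score, threshold, alpha=0.5, beta=0.5):
    """Tune the model parameters when Q exceeds the (assumed) tuning threshold."""
    return composite_index(loss, score, alpha, beta) > threshold


print(composite_index(0.2, 0.9))              # about 0.15
print(needs_tuning(0.2, 0.9, threshold=0.1))  # True
```

A high loss or a low evaluation score both push Q upward, so a single threshold covers either kind of degradation.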

A second aspect of the present invention provides a malicious control instruction detection system, comprising:

The acquisition module is used for acquiring a plurality of API call sequences generated when the control instruction to be detected is executed in the distribution network communication network software;

The encryption module is used for encrypting the API call sequences into hash values by using a hash algorithm to obtain a plurality of encrypted API call sequences;

The detection module is used for inputting the plurality of encrypted API call sequences into a pre-built malicious control instruction detection model and outputting whether the control instruction to be detected is a malicious control instruction, wherein the malicious control instruction detection model is a hybrid model built based on CNN and LSTM and is obtained through machine learning training using a plurality of groups of data; the plurality of groups of data comprise first-class data and second-class data; each group of data in the first-class data comprises an encrypted API call sequence generated when a malicious control instruction is executed in the distribution network communication network software; and each group of data in the second-class data comprises an encrypted API call sequence generated when a benign control instruction is executed in the distribution network communication network software.

A third aspect of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program implements the above-mentioned method for detecting malicious control instructions based on a CNN-LSTM hybrid model when loaded into the processor.

A fourth aspect of the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the above-described method for detecting malicious control instructions based on a CNN-LSTM hybrid model.

Compared with the prior art, the invention has the beneficial effects that at least:

Existing malicious control instruction detection methods suffer from the problem that malicious control instruction samples are rare and changeable and that the distribution of positive and negative samples is extremely unbalanced, which makes models difficult to learn and train and makes malicious control instructions difficult to identify. The detection method provided by the invention identifies the API call sequence as the key information for recognizing malicious control instructions. Since these data are generated by the behavior of the control instructions themselves, they are difficult to obfuscate. The method therefore uses the API call sequences extracted from Cuckoo Sandbox reports as the data set for effective malicious control instruction detection, which addresses the problems of scarce training samples and overfitting in the original methods.

Compared with existing methods, the invention provides a detection method based on a CNN-LSTM hybrid model. The CNN-LSTM hybrid model can process different parts of the input data simultaneously, which facilitates the simultaneous extraction and integration of data features. It can capture information about various aspects of the data generated by control instruction execution, significantly improves computing efficiency, and speeds up model training and inference. Specifically, compared with the other models, the accuracy of the CNN-LSTM model is improved by 24% and 22%, respectively, and the F1 score is improved by at least 16% and 14%, respectively. These results demonstrate the effectiveness of the model proposed by embodiments of the present invention in enhancing the detection capability for malicious control instructions.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:

FIG. 1 is a schematic diagram of the Cuckoo Sandbox according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for detecting malicious control instructions according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a CNN-LSTM hybrid model according to an embodiment of the present invention;

FIG. 4 is a schematic flow chart of constructing a CNN-LSTM hybrid model according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the result of verifying the effectiveness of the method using a confusion matrix according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of the result of determining the final optimal model through hyperparameter selection according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. The described embodiments of the application are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art without making any inventive effort, are within the scope of the present application.

As shown in FIG. 1, the implementation environment of the present invention mainly comprises three parts: the analysis virtual machines, a virtual network and a Cuckoo host. The training samples are extracted on the analysis virtual machines from the malicious control instructions in the power distribution network communication network software. The analysis virtual machines provide an isolated environment in which the behavior of the software can be analyzed more safely and accurately without affecting the security of the host system. The virtual network is an isolated network used mainly for running the analysis virtual machines; it connects each analysis virtual machine to the Cuckoo host within the isolated environment. An analysis virtual machine creates a new environment when running a sample instruction, i.e. the control instruction to be detected, and reports the behavior of the sample instruction to the Cuckoo host. The Cuckoo host is connected to the Internet and is responsible for management and analysis, dumping traffic and generating reports.

Based on the implementation environment of FIG. 1, and with reference to FIG. 2, embodiment 1 of the present invention provides a method for detecting malicious control instructions based on a CNN-LSTM hybrid model, which is applied to a power distribution network communication network. The method includes the following steps:

S1, acquiring a plurality of API call sequences generated when a control instruction to be detected is executed in the distribution network communication network software.

Specifically, the plurality of API call sequences are extracted by the open-source Cuckoo Sandbox from the API calls generated by the execution behavior of control instructions in the power distribution network software.

S1 specifically comprises:

S1.1, establishing, through the Cuckoo Sandbox, an isolated virtual network used mainly for running the analysis virtual machines, connecting each analysis virtual machine to the Cuckoo host within the isolated environment, and finally connecting the Cuckoo host to the Internet.

S1.2, extracting a dynamic data API call sequence generated after control instruction execution, and generating a report, wherein the report is specifically shown as follows:

An analysis report is first obtained from Cuckoo Sandbox or another sandbox environment. Sandbox reports are typically generated in JSON, XML or HTML format and contain detailed behavioral data of the software being analyzed. The report content is then read and parsed with a parsing tool appropriate to the report format (e.g., the json module of Python). The API call data may be located under the "behavior" or "api_calls" fields; for example, a Cuckoo Sandbox report may include a "calls" list in which all API calls are recorded. The API call data is then traversed, and the first 100 non-repeated consecutive API calls may be selected for extraction as required. The extracted API call sequence is processed (e.g., mapped to unique values or encoded) and stored in an appropriate data structure (e.g., a list or database). Finally, the extracted API call sequence is exported to a file or database for subsequent analysis or model training.
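The extraction described above might be sketched as follows. The layout assumed here (a "behavior" object holding a "processes" list, each process with a "calls" list whose entries carry an "api" name) is an assumption about the JSON report and should be checked against the actual report. The phrase "non-repeated consecutive" is interpreted, also as an assumption, as collapsing immediately repeated calls, and the first process that has calls is treated as the parent process.

```python
import json


def first_api_calls(report, limit=100):
    """Return up to `limit` API names from the first process that has calls,
    skipping a call when it repeats the one immediately before it.

    Field names ("behavior", "processes", "calls", "api") are assumed.
    """
    processes = report.get("behavior", {}).get("processes", [])
    for proc in processes:
        calls = proc.get("calls", [])
        if not calls:
            continue                      # skip processes without recorded calls
        names = []
        for call in calls:
            api = call.get("api")
            if api is None or (names and names[-1] == api):
                continue                  # drop missing names and immediate repeats
            names.append(api)
            if len(names) == limit:
                break
        return names
    return []


# Hypothetical miniature report used only to exercise the function.
sample = json.loads("""
{"behavior": {"processes": [
  {"pid": 1, "calls": [{"api": "NtOpenFile"}, {"api": "NtOpenFile"},
                       {"api": "NtReadFile"}, {"api": "NtOpenFile"}]}
]}}
""")
print(first_api_calls(sample))
```

In practice the report would be loaded from the sandbox's report file with `json.load` rather than from an inline string.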

Malicious control instructions typically have specific behavior patterns and characteristics and can be identified by analyzing dynamic data and API call sequences. Dynamic data refers to the real-time data generated while the software executes control instructions, such as file operations, network communications and registry access; an API call sequence is the sequence of API functions called by the software while executing control instructions. The dynamic data and the API call sequence are closely related: the API call sequence reflects the functions and behavior that the software uses during execution of the control instructions, while the dynamic data provides more specific details and context information.

Malicious control instructions typically use specific API functions to perform malicious activities, such as reading sensitive information, modifying system settings, self-propagating, etc., and by analyzing the API call sequences of the malicious control instructions, patterns and features of these malicious behaviors can be identified. Meanwhile, the dynamic data can also provide more detailed information, such as network communication destination of malicious control instructions, target paths of control instruction operation and the like, so that further analysis and judgment of malicious properties of the control instructions executed by the software are facilitated.

The analysis virtual machine extracts the API call sequence of a control instruction: it creates a new environment each time a sample is run and reports the sample's behavior back to the Cuckoo host, which performs analysis and management, dumps traffic, and generates a report. The data used in the embodiment of the invention comprise 42797 API call sequences of malicious control instructions and 1079 API call sequences of benign control instructions. Each API call sequence consists of the first 100 non-repeated consecutive API calls associated with a parent process, is extracted from the "calls" elements of the Cuckoo Sandbox report, and is generated in a real sandbox environment.

S2, encrypting the API call sequences into hash values by using a hash algorithm to obtain a plurality of encrypted API call sequences.

S2, encrypting an API call sequence of a malicious or benign control instruction into a string of 32-byte hash values by utilizing a hash algorithm, wherein the specific steps are as follows:

① Input data: the API call sequence of a malicious or benign control instruction is used as the input data of the hash calculation.

② Data blocking: the embodiment of the invention needs to process the API call sequences generated by the execution behaviors of 43876 control instructions, so the data is first divided into blocks of fixed size. This improves processing efficiency and suits hash algorithms that cannot process the whole data at one time.

③ Initializing the hash value: a hash value needs to be initialized before the calculation starts. The length of this hash value is determined by the design of the hash algorithm; the embodiment of the invention employs the SHA-3-256 hash algorithm, whose hash value length is 256 bits, i.e., 32 bytes. The hash value length directly affects the security of the hash algorithm and the likelihood of a hash collision: in general, the longer the hash value, the lower the probability of a hash collision and the more difficult the hash is to forge.

④ Processing a data block: the data block is converted into a specific format (hexadecimal in the embodiment of the invention), and the hash operation is then carried out on the converted data block.

⑤ Updating the hash value: the hash operation result of each data block is combined with the previous hash value, and the hash value is updated.

⑥ Repeating: steps ④ and ⑤ are repeated until all data blocks have been processed.

⑦ Outputting the hash value: after all data blocks have been processed, the resulting hash value is the hash calculation result of the input data. Table 1 is an API call sequence dataset showing the hash values obtained for the API call sequences of malicious or benign control instructions.

TABLE 1

Hash value	t_0	t_1	t_2	...	t_99
071e8c3f8922e186e57548cd4c703a5d	112	274	158	...	71
70b78f7bdf9c484913d10365348ed2ba	286	172	117	...	135
b68abd064e975e1c6d5f25e748663076	16	110	240	...	112
72049be7bd30ea61297ea624ae198067	82	208	187	...	302
164b56522eb24164184460f8523ed7e2	82	240	117	...	35
56ae1459ba61a14eb119982d6ec793d7	82	240	117	...	117
654139d715abcf7ecdddbef5a84f224b	82	240	117	...	141
078c964e7be4819a06974c6f292a4857	112	274	158	...	71

Columns t_0 through t_99 in Table 1 record the first 100 non-repeated consecutive API calls associated with a particular parent process during dynamic analysis. Each API call represents a specific operation in the program execution process, such as file reading and writing, a network request, or memory allocation.
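Steps ① to ⑦ of S2 can be sketched with the incremental interface of Python's hashlib, which initializes a SHA-3-256 state, absorbs the data block by block, and outputs a 32-byte digest. How the API call sequence is serialized before hashing is not specified in the text; the comma-separated hexadecimal encoding and the 64-byte block size below are assumptions.

```python
import hashlib


def hash_api_sequence(api_ids, block_size=64):
    """Hash a sequence of integer API identifiers with SHA-3-256.

    The serialization (comma-separated hexadecimal text) and block_size
    are illustrative assumptions, not taken from the source.
    """
    # Steps 1 and 4: take the sequence as input and convert it to hexadecimal text.
    payload = ",".join(format(i, "x") for i in api_ids).encode("ascii")
    # Step 3: initialize the hash state (256-bit output).
    hasher = hashlib.sha3_256()
    # Steps 2, 5 and 6: feed fixed-size blocks until all data is processed.
    for start in range(0, len(payload), block_size):
        hasher.update(payload[start:start + block_size])
    # Step 7: output the final hash value.
    return hasher.digest()


digest = hash_api_sequence([112, 274, 158, 71])
print(len(digest), digest.hex())
```

Feeding the data in blocks gives the same digest as hashing it in one call, so the block size affects only memory use, not the result.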

S3, inputting the plurality of encrypted API call sequences into a pre-built malicious control instruction detection model, and outputting whether the control instruction to be detected is a malicious control instruction. The malicious control instruction detection model is a hybrid model built based on CNN and LSTM and is obtained through machine learning training using a plurality of groups of data. The plurality of groups of data comprise first-class data and second-class data: each group of data in the first-class data comprises an API call sequence generated when a malicious control instruction is executed in the distribution network communication network software, and each group of data in the second-class data comprises an API call sequence generated when a benign control instruction is executed in the distribution network communication network software.

In S3, the malicious control instruction detection model is trained through machine learning using the plurality of groups of data, which are obtained from a data set; specifically, an automated malware analysis system extracts training samples from malicious control instructions in the power distribution network communication network software.

Cuckoo Sandbox is an open-source automated malware analysis system and a classic sandbox analysis tool. It can provide, within a few seconds, detailed analysis results summarizing what happens when a file is executed in an isolated environment. Malicious control instructions in the distribution network software are simulated in Cuckoo Sandbox to obtain original malicious samples, which are placed into the data set.

In S3, the samples for model training are divided into two classes: malicious samples and benign samples. The first-class data are malicious samples, i.e. instructions or commands used by a hacker or malicious attacker to manipulate, interfere with or destroy the operation of the power distribution network; such samples are harmful and may threaten the security of the power distribution network communication network and the stable operation of the power distribution network. The second-class data are benign samples, i.e. samples without malicious control commands or behavior, in which the power distribution network issues benign operation instructions, for example to maintain its optimal power distribution strategy or to recover accurately and rapidly after a fault. Such samples are safe and reliable and do not cause any harm to the system.

Specifically, malicious control instructions include, but are not limited to, the following:

Power-off instruction: a hacker may send an instruction to remotely control the power distribution equipment to cut off the power supply, causing a full or partial power outage and affecting users' normal electricity use.

Power equipment overload instruction: an attacker may send an instruction to overload power equipment, causing the equipment to overheat, short-circuit or be damaged, which may in turn lead to safety accidents such as fire or explosion.

Malicious data tampering instruction: a hacker may send an instruction to tamper with power distribution system data, such as power meter readings or power load information, causing inaccurate data records or misleading operation and maintenance personnel into making erroneous decisions.

Remote control instruction: an attacker may use malicious instructions to remotely control devices of the power distribution system, such as circuit breakers and switches, resulting in abnormal operation of the devices or paralysis or destruction of the power distribution system.

Denial-of-service instruction: a hacker may send instructions that prevent the power distribution system from functioning properly, for example by sending large amounts of malicious traffic or attacking the communication network of the power distribution system, rendering its services unavailable or severely degraded.

Transformer damage instruction: an attacker may send an instruction to overload or damage a transformer, thereby affecting the stability of the power supply or causing a power failure.

Security protocol attack instruction: a hacker may send an instruction to attack the communication protocol or security mechanism of the power distribution system, for example by intercepting, tampering with or forging communication data, in order to gain control over the system or manipulate its behavior, resulting in system paralysis and possibly even a serious threat to public safety.

Constructing a hybrid model based on CNN and LSTM in S3 comprises:

s3.1, selecting an API call sequence from a plurality of groups of data as an input sequence.

Specifically, an API call sequence is selected from the plurality of groups of data as the input sequence, and an input layer is established. Each sample, that is, the parent process of the called API sequence, is encrypted into a hash value by a hash algorithm based on its behavior. The call sequence consists of the first 100 non-repeated consecutive API calls associated with the parent process and reflects the behavioral characteristics of the control instruction. The hash value generated by the hash algorithm may be considered a "fingerprint" or "identity" of the parent process as it executes in the sandbox: it is generated from the specific behavior of the process and can uniquely identify the API call pattern of the process. These non-repeated consecutive API calls are mapped to corresponding values (each API has a unique value) and placed in the indices t_0 to t_99.

In this way, by mapping the sequence of API calls to unique values, the calls can be converted to a numeric format for subsequent machine learning processing. Each value corresponds to a particular API and is effectively representative of the characteristics of that API. Also, by limiting the API call sequence to the first 100 calls, it is ensured that the dimensions of the input data are consistent, which is very important for training of deep learning models. The CNN model requires a fixed-size input, so using the indices of t_0 through t_99 can ensure that the shape of the input data meets the requirements of the model.
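
By way of illustration only, the mapping described above can be sketched as follows. The sketch assumes that "non-repeated consecutive" means that runs of the same call are collapsed, that short sequences are padded with 0, and that the vocabulary shown is hypothetical; none of these details is specified in the text.

```python
from itertools import groupby

def encode_api_calls(calls, vocab, length=100, pad_id=0):
    """Map API names to integer ids and keep the first `length` calls
    after collapsing consecutive repeats (assumed interpretation)."""
    # Collapse runs of the same call, e.g. A A B A -> A B A.
    deduped = [name for name, _ in groupby(calls)]
    ids = [vocab[name] for name in deduped[:length]]
    # Pad short sequences so every sample fills t_0..t_99 (assumption).
    return ids + [pad_id] * (length - len(ids))

# Hypothetical vocabulary; ids start at 1 so that 0 can serve as padding.
vocab = {"NtOpenFile": 1, "NtReadFile": 2, "NtClose": 3}
calls = ["NtOpenFile", "NtReadFile", "NtReadFile", "NtClose"]

row = encode_api_calls(calls, vocab)
print(row[:5], len(row))  # [1, 2, 3, 0, 0] 100
```

Every encoded row has exactly 100 entries, which is what gives the model a fixed-size input.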

S3.2, constructing a CNN model, wherein the CNN model comprises an embedded layer, a batch normalization layer, a convolution layer and a pooling layer, inputting an input sequence into the CNN model, and sequentially carrying out processing of the embedded layer, the batch normalization layer, the convolution layer and the pooling layer, wherein the embedded layer carries out dimension reduction processing on an API call sequence, and the batch normalization layer normalizes data output by the embedded layer to the same range;

S3.3, constructing an LSTM model, inputting the output of the pooling layer into the LSTM model, and inputting the output of the LSTM model into the fully connected layer.

The CNN model can process unstructured information, extract features from the input sequence and improve problem-solving efficiency.

As shown in fig. 3 and 4 in combination, the parameter settings of each layer of the CNN model are as follows:

The first layer is the embedding layer. The vocabulary size is set to 307, where the vocabulary is the mapping that translates each call in the API call sequence into its corresponding unique value. The dimension of the output word vector is set to 8, meaning that the embedding layer reduces each position to 8 dimensions. Finally, the length of the input sequence is set to 100, which is the number of API calls in each sequence. Thus, in the embedding layer, the input data is reduced from the original 100×307 to 100×8. In this way, the overall model reduces the risk of overfitting during training.
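
As a minimal sketch of this dimension reduction, an embedding can be viewed as a lookup table with one 8-dimensional row per API id. The random initial values below are placeholders for learned weights, and the function name `embed` is hypothetical.

```python
import random

VOCAB_SIZE = 307   # number of distinct API ids (from the text)
EMBED_DIM = 8      # dimension of the output word vector (from the text)
SEQ_LEN = 100      # length of the input sequence (from the text)

random.seed(0)
# Embedding table: one row of EMBED_DIM values per API id.
# Random values stand in for weights that training would learn.
table = [[random.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)]
         for _ in range(VOCAB_SIZE)]

def embed(sequence):
    """Replace each API id with its row of the embedding table."""
    return [table[api_id] for api_id in sequence]

sequence = [random.randrange(VOCAB_SIZE) for _ in range(SEQ_LEN)]
embedded = embed(sequence)

print(len(embedded), len(embedded[0]))  # 100 8
print(VOCAB_SIZE * EMBED_DIM)           # 2456 table entries
```

Each of the 100 positions is thus represented by 8 values instead of a 307-way one-hot vector.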

The second layer is a batch normalization layer, and the main purpose of adding the batch normalization layer is to save resources, accelerate learning efficiency, enhance generalization capability of the model and prevent overfitting to a certain extent. This layer will not change the dimension of the input data and will remain the same dimension as the embedded layer.

The third layer is a convolution layer. Its main function is to perform the convolution operation on the input matrix and to filter out, from the first 100 non-repeated consecutive API calls associated with the parent process, the API call data irrelevant to malicious control instructions. In this way the embodiment of the invention can judge whether data is suitable for training in the network. Not all of the first 100 non-repeated consecutive API calls have a decisive effect on determining whether the control instruction of the software is malicious; some of them may have no significant impact. Feeding these API calls into the model as input data adds computational complexity without providing significant benefit. Therefore, a receptive field is used in the CNN to avoid extracting features from all of the data.

A receptive field refers to a sensing area of neurons of a layer in a network on input data. For a convolutional layer, the receptive field R can be calculated by the following formula:

R=R_prev+(k-1)×S

Where R_prev is the size of the receptive field of the previous layer; k is the size of the current convolution kernel, which may be 3, representing a 3*3 convolution kernel; and S is the stride with which the convolution kernel moves over the input.
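
As a worked example of the formula above, the following sketch evaluates R for a single convolution layer. The starting value R_prev = 1 (one input position) and the stride S = 1 are assumptions chosen for illustration; the text does not state them.

```python
def receptive_field(r_prev, k, s):
    """R = R_prev + (k - 1) * S, exactly as written in the text."""
    return r_prev + (k - 1) * s

r_input = 1  # assumed: a single input position sees only itself
r_conv = receptive_field(r_input, k=3, s=1)
print(r_conv)  # 3
```

Under these assumptions each output of the convolution layer depends on 3 neighbouring API calls rather than on the whole sequence.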

The fourth layer is the max pooling layer, which uses a one-dimensional max pooling method. The pool size is 2, and the stride uses the default value, which equals the pool size. The input to this layer is the output of the convolution layer, with a data shape of 100×32. During pooling, the maximum value is extracted from each window of size 2, reducing the feature map size from 100×32 to 50×32.
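
The effect of one-dimensional max pooling with pool size 2 and stride 2 can be sketched as follows. The toy input values are arbitrary; only the 100×32 input shape is taken from the text. Pooling acts along the sequence axis, so the number of channels is unchanged.

```python
def max_pool_1d(feature_map, pool_size=2):
    """Non-overlapping 1-D max pooling over the sequence axis.
    feature_map is a list of steps, each a list of channel values."""
    pooled = []
    for start in range(0, len(feature_map) - pool_size + 1, pool_size):
        window = feature_map[start:start + pool_size]
        # Keep the maximum of each channel within the window.
        pooled.append([max(step[c] for step in window)
                       for c in range(len(window[0]))])
    return pooled

# Toy input: 100 steps x 32 channels with arbitrary values.
feature_map = [[(t * 7 + c) % 11 for c in range(32)] for t in range(100)]
pooled = max_pool_1d(feature_map)

print(len(pooled), len(pooled[0]))  # 50 32
```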

And S3.4, establishing an output layer, wherein the output layer is used for outputting characteristic data related to the malicious control instruction. The characteristic data includes a network communication destination of the control instruction, a target path of the control instruction operation, and a behavior pattern of the control instruction.

Specifically, the output of the maximized pooling layer of the CNN model is input into the LSTM model, and the specific operation steps are as follows:

The batch-normalized data is input into the one-dimensional convolution layer for training and learning, the convolution output is collected, and the collected data is input into the LSTM layer for training and learning. Each batch of data contains 512 samples, and the return-sequence option is set to FALSE, indicating that only the hidden state value of the last time step is needed. Furthermore, the dropout value is set to 0.2, meaning that a randomly selected 20% of the neurons are dropped in each training iteration, which probabilistically affects the LSTM neuron inputs and recurrent connections during forward propagation and weight updating. This effectively avoids overfitting and improves the performance of the model. In effect, a part of the neurons is randomly ignored, which saves computational resources and prevents or alleviates overfitting.

For example, if one behavior feature in the API call sequence is file reading and writing, such as reading the intelligent terminal unit, and this behavior is unrelated to malicious control instructions, the hybrid model based on CNN and LSTM filters out the behavior feature of reading the intelligent terminal unit, so that the behavior features related to malicious control instructions are obtained.
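
The dropout value of 0.2 can be illustrated with the sketch below. It uses a common formulation (inverted dropout, in which surviving values are rescaled by 1/(1 - rate)); the text does not say which formulation the embodiment uses, so this choice is an assumption.

```python
import random

def dropout(values, rate=0.2, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate`
    during training and rescale the survivors by 1 / (1 - rate)."""
    if not training:
        return list(values)  # dropout is disabled at inference time
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.2, rng=rng)

# Roughly 20% of the values are zeroed on each call.
zero_fraction = dropped.count(0.0) / len(dropped)
print(round(zero_fraction, 2))
```

A different random subset is dropped in every training iteration, which is what discourages the network from relying on any single neuron.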

It should be noted that the CNN portion is responsible for extracting spatial features in the input data, for example, identifying a specific calling mode or feature, which may be related to malicious behavior. The LSTM part is responsible for capturing the time characteristics in the input data, can remember the sequence of the API call sequence, understand the context, capture the time change trend of the characteristics, and further extract the data characteristics related to the malicious instruction, so that the malicious property of the control instruction is better judged.

And S3.5, establishing a classification layer, inputting the characteristic data output by the output layer into the classification layer, outputting whether the control instruction to be detected is a malicious control instruction or not by the classification layer, and completing the establishment of the hybrid model based on the CNN and the LSTM.

Specifically, the output result is mapped between 0 and 1 using the Sigmoid function, 1 representing a malicious control instruction, and 0 representing a benign control instruction.
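
A minimal sketch of this mapping is given below. The decision threshold of 0.5 is an assumption; the text only states that outputs lie between 0 and 1, with 1 representing a malicious and 0 a benign control instruction.

```python
import math

def sigmoid(z):
    """Map a real-valued score to the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)

def classify(z, threshold=0.5):
    """Return 1 (malicious) or 0 (benign); the threshold is assumed."""
    return 1 if sigmoid(z) >= threshold else 0

print(sigmoid(0.0))                   # 0.5
print(classify(2.3), classify(-1.7))  # 1 0
```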

The classification layer includes a classifier and a regressor, and the loss function is generally composed of two parts, namely classification loss and positioning loss. The classification loss trains the classifier to identify the class of the detected control instruction, while the positioning loss trains the regressor to determine the location or time in the sequence of a particular API call, or the magnitude of certain API call frequencies. The loss function used in the embodiments of the present invention is the binary cross entropy loss function, which is mainly used for binary classification problems. For binary classification, the observation represents the true class of the sample, which is either malicious or benign. Regardless of the actual observed values, they are encoded as y ∈ {0,1}. The label therefore follows a Bernoulli distribution, which has only one parameter θ:

P_θ(y=1)=θ (1)

P_θ(y=0)=1-θ (2)

In the formulas, θ is the predicted value of the model for the i-th sample, that is, the probability with which the model predicts that the label value of the i-th sample is 1. Combining the two equations into one gives

P_θ(y)=θ^y×(1-θ)^(1-y) (3)

Where y represents the true tag value, i.e., the actual value, y is 1 for the malicious sample, and y is 0 for the benign sample. The method can better reflect the difference between the predicted value of the model and the label, so that the accuracy of model prediction can be better evaluated. Furthermore, binary cross entropy loss functions are relatively easy to optimize and converge.

Assume the dataset is D={(x₁,y₁),(x₂,y₂),...,(x_N,y_N)} and that the observed data points are all i.i.d. (independent and identically distributed). The log-likelihood function of the observations is then equal to

l(θ)=Σ_{i=1}^{N}[y_i×ln θ+(1-y_i)×ln(1-θ)] (4)

Wherein θ is the parameter of the model, that is, the probability with which the model predicts that the label value of the i-th sample is 1; y_i is the binary label value (0 or 1) of the i-th sample; the likelihood function l(θ) is the objective function; and N is the number of samples. Adding a negative sign in front of it turns it into a loss function. Comparing the above equation with the cross entropy formula shows that this loss function is the cross entropy H_y(θ) of y_i and θ. The parameters of the model are updated by minimizing the loss function.
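
As a sketch of the loss described above, the negative Bernoulli log-likelihood can be computed from the labels and the predicted probabilities. Averaging over the N samples and clipping the probabilities away from 0 and 1 are assumptions added for numerical convenience; they are not stated in the text.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean negative log-likelihood of Bernoulli labels.
    y_true holds 0/1 labels, y_prob holds predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

labels = [1, 0, 1, 1]
confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.95])
uncertain = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])
print(confident < uncertain)  # True
```

Predictions of 0.5 for every sample give a loss of ln 2, and the loss falls toward 0 as the predicted probabilities approach the true labels.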

The loss function selected by the embodiment of the invention is a Sigmoid loss function. And after the Sigmoid loss function processing, obtaining a classification result.

And S4, evaluating the performance of the malicious control instruction detection model.

S4 specifically comprises calculating a confusion matrix from the output of the malicious control instruction detection model, wherein each row of the confusion matrix represents an actual category and each column represents a category output by the malicious control instruction detection model. The confusion matrix comprises four cells, each corresponding to one preset category; the preset categories are true positive, false positive, false negative and true negative. The value in each cell is the number of times the malicious control instruction detection model predicted samples of the actual category as the corresponding preset category, and the evaluation result is calculated from the confusion matrix.

A true positive is a positive sample that the model predicts as positive; a false positive is a negative sample that the model predicts as positive; a false negative is a positive sample that the model predicts as negative; and a true negative is a negative sample that the model predicts as negative. The confusion matrix thus presents more intuitively where the predicted values agree with the true values and where they do not, which helps to further improve prediction capability.
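
The four cells of the confusion matrix can be counted as in the sketch below, which treats label 1 (malicious) as the positive class. The example labels and predictions are invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN with 1 = malicious (positive), 0 = benign."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1:
            counts["TP" if predicted == 1 else "FN"] += 1
        else:
            counts["FP" if predicted == 1 else "TN"] += 1
    return counts

# Invented labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 1}
```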

Optionally, calculating the accuracy, recall, precision and F1 score according to the confusion matrix, and calculating to obtain the evaluation result through the accuracy, recall, precision and F1 score.

Precision refers to the ratio of the number of targets correctly identified by the model to the total number of targets identified by the model. Precision reflects the ability of the model to distinguish targets: the higher the precision, the lower the probability of false detection. The specific formula is as follows:

Precision=TP/(TP+FP) (5)

Where TP refers to true positive, FP refers to false positive, TN refers to true negative, and FN refers to false negative.

Recall refers to the ratio of the number of targets that the model correctly recognizes to the number of targets that are actually present. The recall rate reflects the coverage of the model over the targets: the higher the recall, the higher the probability that a target is correctly recognized. The specific formula is as follows:

Recall=TP/(TP+FN) (6)

Accuracy is the proportion of all samples that the model classifies correctly, and the F1 score is the harmonic mean of the precision and recall of the classification results. By evaluating the index values, the user can make more accurate decisions and determine where improvements are needed, for example by adjusting the hyper-parameters of the hybrid model. Formulas (7) and (8) give the calculation of the accuracy and the F1 score, respectively:

Accuracy=(TP+TN)/(TP+TN+FP+FN) (7)

F1=2×Precision×Recall/(Precision+Recall) (8)

The embodiment of the invention uses multiple evaluation indexes, such as accuracy, precision and F1 score, to evaluate the performance of the model. By evaluating the index values, the performance of different malicious control instruction detection methods can be compared and analyzed more objectively.

Further, an evaluation result is obtained through calculation of accuracy, recall, precision and F1 score according to the following formula:

Z＝ξ₁×ln(Precision+1)+ξ₂×ln(Recall+1)+ξ₃×ln(Accuracy+1)+ξ₄×ln(F1+1)(9)

Wherein Z is the performance evaluation result, Precision represents the precision, Recall represents the recall, Accuracy represents the accuracy, F1 represents the F1 score, and ξ₁, ξ₂, ξ₃, ξ₄ are the weights of the corresponding indexes, satisfying ξ₁+ξ₂+ξ₃+ξ₄=1. The embodiment of the invention attaches equal importance to the precision, recall, accuracy and F1 score, so each weight is set to 0.25.

Referring to fig. 5, the test set consists of 284 benign control instruction data instances. Of these, 208 were accurately detected and 75 were misclassified. In addition, there are 10686 malicious control instruction data instances, 10665 of which are correctly identified, and only 21 are misclassified. The method provided by the embodiment of the invention can improve the detection efficiency of the malicious control instruction.
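
By way of a worked example, the four indexes and the evaluation result Z of formula (9) can be computed from the counts reported above. Treating malicious as the positive class, and therefore reading the counts as TP=10665, FN=21, TN=208 and FP=75, is an assumption about how fig. 5 maps to the confusion matrix cells. The sketch does not guard against empty classes (division by zero).

```python
import math

def metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

def evaluation_score(precision, recall, accuracy, f1,
                     weights=(0.25, 0.25, 0.25, 0.25)):
    """Z = sum of weight_i * ln(index_i + 1), as in formula (9)."""
    indexes = (precision, recall, accuracy, f1)
    return sum(w * math.log(m + 1.0) for w, m in zip(weights, indexes))

# Counts as read from the text (assumed mapping to TP/FP/FN/TN).
precision, recall, accuracy, f1 = metrics(tp=10665, fp=75, fn=21, tn=208)
z = evaluation_score(precision, recall, accuracy, f1)
print(round(precision, 4), round(recall, 4), round(accuracy, 4), round(f1, 4))
print(round(z, 4))
```

Because every index lies in [0, 1], Z with weights summing to 1 lies between 0 and ln 2 (about 0.693), and a perfect classifier reaches the upper bound.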

And S5, adjusting parameters of the feature extraction model according to the loss function and the evaluation result, wherein the parameters comprise at least one of pooling rate, layer number, channel number and hidden neuron number.

When the embodiment of the invention trains the hybrid model, the hash value in the first column is removed from the API call sequences of Table 1, and the label in the last column is added to each API call sequence. The label takes only the values 0 and 1, where 0 is a benign sample and 1 is a malicious sample. The required data are the columns t_0 to t_99. For each row, the values at these indices are extracted together with the label of the sample to construct the dataset.

80% of the rows in the data set are selected as the training set of the hybrid model, and the remaining 20% of the rows are selected as the verification set. In the experiment, one epoch means that the model has learned from all training samples in the training data set once. The embodiment of the invention trains each model for 140 epochs and records the F1 score of each epoch to finally determine the optimal model.
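
The 80/20 division of rows can be sketched as follows. Shuffling before the split and the fixed seed are assumptions, and the toy rows merely mimic the layout of 100 values plus a label.

```python
import random

def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows (assumed) and split them into two parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy rows: (values for t_0..t_99, label); contents are placeholders.
rows = [([i % 307] * 100, i % 2) for i in range(1000)]
train, val = split_rows(rows)
print(len(train), len(val))  # 800 200
```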

Specifically, adjusting parameters of the malicious control instruction detection model according to the loss function and the evaluation result includes the following contents:

Q=α×e^(ln L)+β×(1-e^(ln Z)) (10)

Wherein Q represents the comprehensive index value, L is the loss value of the malicious control instruction detection model, obtained through the loss function calculation, Z is the evaluation result of the malicious control instruction detection model performance, and α and β are weight coefficients, both non-negative.

Optionally, the tuning threshold is 0.45, and the weights lie in the range [0,1] and satisfy α+β=1. The invention attaches equal importance to the loss function and the performance evaluation result, so α=0.5 and β=0.5 are set.

Furthermore, when Q is smaller than 0.45, the malicious control instruction detection model meets the performance requirement; tuning may nevertheless continue so that Q becomes smaller and better parameters are obtained.

It should be noted that, a person skilled in the art may set the weight value and the tuning threshold according to practical applications, and the specific numerical values of the weight value and the tuning threshold are not limited in the embodiment of the present invention.
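
The comprehensive index of formula (10) and the comparison against the tuning threshold can be sketched as below. The loss and Z values passed in are hypothetical. Since e^(ln x) equals x for x greater than 0, the expression reduces to α×L+β×(1-Z) for positive L and Z.

```python
import math

def composite_index(loss, z, alpha=0.5, beta=0.5):
    """Q = alpha * e^(ln L) + beta * (1 - e^(ln Z)); needs L > 0, Z > 0."""
    return (alpha * math.exp(math.log(loss))
            + beta * (1.0 - math.exp(math.log(z))))

def needs_tuning(q, threshold=0.45):
    """True when Q exceeds the tuning threshold."""
    return q > threshold

# Hypothetical loss and evaluation result, for illustration only.
q = composite_index(loss=0.2, z=0.6)
print(round(q, 6))
print(needs_tuning(q))  # False
```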

More specifically, a back propagation algorithm is used to calculate the gradient of the loss function with respect to each parameter. Back propagation applies the chain rule: the gradient of the loss value with respect to the output is passed back layer by layer until the gradient of every parameter is obtained. A gradient descent method then updates the parameters of the model according to the calculated gradients:

θ←θ-η×∇_θ L

where θ is a model parameter, η is the learning rate, and ∇_θ L is the gradient of the loss function with respect to the parameter.
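
The gradient descent update (each parameter moves against its gradient, scaled by the learning rate η) can be illustrated on a deliberately small stand-in model: a single sigmoid unit with one weight and one bias, trained with binary cross entropy. This is not the CNN-LSTM model of the embodiment; the data, the learning rate and the names `w`, `b` and `sgd_step` are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, b, x, y, lr=0.1):
    """One gradient descent update for a one-weight sigmoid unit."""
    p = sigmoid(w * x + b)
    # Chain rule for binary cross entropy with a sigmoid output:
    # dL/dw = (p - y) * x and dL/db = (p - y).
    grad_w = (p - y) * x
    grad_b = p - y
    return w - lr * grad_w, b - lr * grad_b

# Toy data: positive inputs are labelled 1, negative inputs 0.
data = [(2.0, 1), (-1.5, 0), (1.0, 1), (-2.0, 0)]
w, b = 0.0, 0.0
for _ in range(200):          # 200 passes over the toy data
    for x, y in data:
        w, b = sgd_step(w, b, x, y)

print(w > 0)  # True
```

After training, the weight is positive, so larger inputs receive a higher predicted probability of label 1.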

Further, when Q is greater than 0.45, tuning parameters of the malicious control instruction detection model includes:

The pooling rate is adjusted: a. Different pooling rates (2×2, 3×3, etc.) are tried. A larger pooling rate reduces the size of the feature map faster but may lose important information. b. Different pooling layer positions are tested by adding pooling layers after different convolution layers and observing the effect on model performance.

The number of layers is adjusted: the number of layers is increased gradually from a small number (for example, from 2 layers to 5 layers).

The number of channels is adjusted: the number of channels determines the feature extraction capability of each layer, and increasing it can improve the expressive capability of the model. The number of channels in the convolution layer is increased gradually (from 32 to 64 and 128).

The number of hidden neurons is adjusted: a. The number of hidden neurons is increased gradually from smaller values (32, 64) to larger values (128, 256). b. Different activation functions (e.g., ReLU, Leaky ReLU, tanh) are tried in the hidden layer to increase the nonlinear expressive capability of the model. The number of neurons in the hidden layer determines the complexity and learning ability of the model.

A learning curve is drawn as shown in fig. 6: indexes such as training and verification loss and accuracy are recorded and plotted, which helps to analyze whether the model is over-fitted or under-fitted.

The pooling rate, number of layers, number of channels and number of hidden neurons are adjusted in combination to find the optimal network structure. For example, the number of channels per layer may be appropriately reduced while the number of layers is increased, in order to maintain computational efficiency. Fig. 6 shows the loss and accuracy during training. The larger the parameters, the more computational resources are consumed, while the gain in F1 score becomes smaller and performance may even degrade.
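
One possible way to organise the combined adjustment is an exhaustive search over the candidate values mentioned above, keeping the configuration with the smallest comprehensive index Q. The text does not prescribe a search strategy, so the grid search is an assumption, and `fake_q` is a hypothetical stand-in for training the model and computing Q.

```python
from itertools import product

# Candidate values taken from the examples given in the text.
search_space = {
    "pool_size": [2, 3],
    "num_layers": [2, 3, 4, 5],
    "channels": [32, 64, 128],
    "hidden_units": [32, 64, 128, 256],
}

def candidate_configs(space):
    """Yield every combination of the candidate values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def pick_best(configs, score_fn):
    """Return the configuration with the smallest score (Q)."""
    return min(configs, key=score_fn)

def fake_q(config):
    """Hypothetical stand-in for 'train the model and compute Q'."""
    return (abs(config["channels"] - 64) / 128
            + abs(config["hidden_units"] - 128) / 256
            + 0.01 * config["num_layers"]
            + 0.01 * config["pool_size"])

configs = list(candidate_configs(search_space))
best = pick_best(configs, fake_q)
print(len(configs))  # 96
print(best)
```

In practice each evaluation requires a full training run, so the number of combinations directly determines the computational cost of tuning.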

The experimental results show that the CNN-LSTM model achieves better evaluation indexes in malicious control instruction detection than the other two detection models. In terms of accuracy, the CNN-LSTM model improved by 24% and 22% compared to the CNN and LSTM models, respectively. In terms of F1 score, the CNN-LSTM model was at least 16% and 14% higher than the other models, respectively. The comparison results for the different models are shown in Table 2. This means that the proposed CNN-LSTM hybrid model performs better in malicious control instruction detection.

TABLE 2

Embodiment 2 of the present invention provides a malicious control instruction detection system, which runs the method for detecting malicious control instructions based on the CNN-LSTM hybrid model described in embodiment 1, where the system includes:

Embodiment 3 of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program implements the method for detecting a malicious control instruction based on the CNN-LSTM hybrid model of embodiment 1 when loaded into the processor.

Embodiment 4 of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for detecting malicious control instructions based on the CNN-LSTM hybrid model according to embodiment 1.

A method for detecting malicious control instructions in a power distribution network communication network is provided, and the high efficiency of the method is verified through data analysis and comparison.

Existing malicious control instruction detection methods suffer from the problem that, because malicious control instruction samples are rare and changeable and the distribution of positive and negative samples is extremely unbalanced, the model is difficult to learn and train, which makes it difficult to identify malicious control instructions. The detection method provided by the invention identifies the key information for recognizing malicious control instructions, namely the API call sequence. Since these data are generated by the behavior of the control instructions themselves, they are difficult to obfuscate. Therefore, the invention uses the API call sequences extracted from the cuckoo sandbox report as the data set for effective malicious control instruction detection, which solves the problems of too few training samples and overfitting in the original methods.

Compared with the existing methods, the invention provides a detection method based on the CNN-LSTM hybrid model. The CNN-LSTM hybrid model can process different parts of the input data simultaneously, facilitating the simultaneous extraction and integration of data features. Information about various aspects of the data generated by control instruction execution can be captured, computing efficiency is significantly improved, and the model training and inference processes are accelerated. Specifically, the accuracy of the CNN-LSTM hybrid model was improved by 24% and 22%, respectively, and the F1 score was improved by at least 16% and 14%, respectively, as compared to the other models. These results demonstrate the effectiveness of the model proposed by embodiments of the present invention in enhancing the detection capability for malicious control instructions.

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiments of the present invention.

The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.

The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.

The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.

The computer program instructions for performing the operations of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which can execute the computer readable program instructions.

Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the specific embodiments of the present invention without departing from the spirit and scope of the present invention, and any modifications and equivalents are intended to be included in the scope of the claims of the present invention.

Claims

1. A method for detecting malicious control instructions based on a CNN-LSTM hybrid model, applied to a distribution network communication network, characterized by comprising:

Obtain multiple API call sequences generated when the control instruction to be detected is executed in the distribution network communication network software;

Use a hash algorithm to encrypt multiple API call sequences into hash values to obtain multiple encrypted API call sequences;

Input multiple encrypted API call sequences into a pre-built malicious control instruction detection model, and output whether the control instruction to be detected is a malicious control instruction; the malicious control instruction detection model is a hybrid model constructed based on CNN and LSTM, and the malicious control instruction detection model is obtained through machine learning training using multiple sets of data, and the multiple sets of data include a first type of data and a second type of data, each set of data in the first type of data includes an encrypted API call sequence of an API call sequence generated when a malicious control instruction is executed in the distribution network communication network software, and each set of data in the second type of data includes an encrypted API call sequence of an API call sequence generated when a benign control instruction is executed in the distribution network communication network software;

Building a hybrid model based on CNN and LSTM involves:

Select an API call sequence from the encrypted API call sequence as an input sequence;

Construct a CNN model, which includes convolutional layers and pooling layers. Input the input sequence into the CNN model and process it through the convolutional layers and pooling layers in turn.

Build an LSTM model, input the output of the pooling layer into the LSTM model, and input the output of the LSTM model into the fully connected layer;

Establish an output layer for outputting feature data related to malicious control instructions, the feature data including the network communication destination of the control instruction, the target path on which the control instruction operates, and the behavior pattern of the control instruction;

Establish a classification layer, and input the feature data output by the output layer into the classification layer; the classification layer outputs whether the control instruction to be detected is a malicious control instruction, thereby completing the construction of the hybrid model based on CNN and LSTM.
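The layer order recited in claim 1 can be illustrated with a minimal shape-tracing sketch. It does not train or run a neural network; it only tracks how the dimensions of one input sequence change from layer to layer. All sizes (32 filters, kernel size 3, pool size 2, 64 hidden units, 16 fully connected units, 3 feature outputs) and all function names are hypothetical values chosen for illustration and are not taken from the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    steps: int     # length of the time axis
    channels: int  # feature channels per time step


def conv1d(s, filters, kernel_size):
    # Unpadded 1-D convolution with stride 1.
    return Shape(s.steps - kernel_size + 1, filters)


def max_pool1d(s, pool_size):
    # Non-overlapping pooling; a trailing partial window is dropped.
    return Shape(s.steps // pool_size, s.channels)


def lstm(s, hidden_units):
    # Only the final hidden state is kept, so the time axis collapses to 1.
    return Shape(1, hidden_units)


def dense(s, units):
    # A fully connected layer changes only the channel count.
    return Shape(s.steps, units)


def trace_hybrid_model(seq_len=100):
    """Return (layer name, output shape) pairs in the order given in claim 1."""
    s = Shape(seq_len, 1)  # assumption: one hashed API call per time step
    trace = [("input", s)]
    layers = [
        ("convolution", lambda x: conv1d(x, filters=32, kernel_size=3)),
        ("pooling", lambda x: max_pool1d(x, pool_size=2)),
        ("lstm", lambda x: lstm(x, hidden_units=64)),
        ("fully_connected", lambda x: dense(x, units=16)),
        ("output", lambda x: dense(x, units=3)),          # hypothetical: 3 feature values
        ("classification", lambda x: dense(x, units=1)),  # single malicious/benign score
    ]
    for name, layer in layers:
        s = layer(s)
        trace.append((name, s))
    return trace


if __name__ == "__main__":
    for name, shape in trace_hybrid_model():
        print(f"{name:16s} steps={shape.steps:3d} channels={shape.channels}")
```

Under these assumed sizes, a 100-step input shrinks to 49 steps after pooling, and the LSTM reduces it to a single vector that the later layers map to one classification score.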

2. The malicious control instruction detection method based on the CNN-LSTM hybrid model according to claim 1, characterized in that:

The plurality of API call sequences includes the first 100 non-repeating consecutive API call sequences associated with a parent process of the called API.
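One possible reading of the selection in claim 2 can be sketched as follows. The sketch assumes that "non-repeating consecutive" means that immediately repeated calls are collapsed into one, and that selection stops once 100 calls have been kept; the claim does not define the term further, so this interpretation, the function name, and the sample API names are assumptions.

```python
def first_non_repeating_calls(calls, limit=100):
    """Keep the first `limit` calls, skipping any call equal to the one just kept."""
    selected = []
    for call in calls:
        if selected and selected[-1] == call:
            continue  # same call repeated back to back: skip it
        selected.append(call)
        if len(selected) == limit:
            break
    return selected


if __name__ == "__main__":
    raw = ["NtOpenFile", "NtOpenFile", "NtReadFile", "NtReadFile", "NtOpenFile"]
    print(first_non_repeating_calls(raw))
```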

3. The malicious control instruction detection method based on the CNN-LSTM hybrid model according to claim 1, characterized in that:

The CNN model also includes:

Embedding layer, used to reduce the dimensionality of the input sequence;

A batch normalization layer, connected to the embedding layer, used to normalize the data output by the embedding layer to a preset range and to input the normalized data to the convolutional layer.
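The normalization step of claim 3 can be illustrated with a minimal sketch of standard batch normalization, which rescales each feature column of a batch to approximately zero mean and unit variance. The claim only states that the data is normalized to a preset range, so the use of this particular formula is an assumption; the learned scale and shift parameters of a full batch normalization layer are omitted, and the function name is hypothetical.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Normalize each column of `batch` (a list of equal-length rows)."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)
    ]
    # eps keeps the division stable when a column has zero variance.
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dims)]
        for row in batch
    ]


if __name__ == "__main__":
    embedded = [[1.0, 10.0], [3.0, 30.0]]  # illustrative embedding outputs
    print(batch_normalize(embedded))
```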

4. The malicious control instruction detection method based on the CNN-LSTM hybrid model according to claim 1, characterized in that:

In the classification layer, the Sigmoid function is used to output the probability that the control instruction to be detected is a malicious control instruction based on the feature data.
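The Sigmoid step of claim 4 can be sketched as follows. The sketch maps a single real-valued score to a probability and then applies a decision threshold; the threshold of 0.5 and the function names are assumptions, since the claim only specifies that a probability is output.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)


def classify(score, threshold=0.5):
    """Return (probability of being malicious, decision) for one score."""
    probability = sigmoid(score)
    return probability, probability >= threshold


if __name__ == "__main__":
    for score in (-2.0, 0.0, 2.0):
        print(score, classify(score))
```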

5. The malicious control instruction detection method based on the CNN-LSTM hybrid model according to claim 4, characterized in that:

The method further comprises:

Evaluate the performance of the malicious control instruction detection model and obtain the evaluation results;

Parameters of the malicious control instruction detection model are adjusted according to the loss function and the evaluation results, wherein the parameters include at least one of the following: pooling rate, number of layers, number of channels, and number of hidden neurons.

6. The malicious control instruction detection method based on the CNN-LSTM hybrid model according to claim 5, characterized in that:

Evaluating the performance of the malicious control instruction detection model comprises:

A confusion matrix is computed from the output of the malicious control instruction detection model, where the rows of the confusion matrix represent the actual categories and the columns represent the categories output by the malicious control instruction detection model. The confusion matrix includes four cells, each cell corresponding to a preset category, and the preset categories include true positive, false positive, false negative, and true negative. The value in each cell is the number of outputs of the malicious control instruction detection model that fall into the corresponding preset category;

Calculate the evaluation results based on the confusion matrix.
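The evaluation of claim 6 can be sketched as follows. The sketch counts the four cells of the confusion matrix and derives precision, recall, accuracy and F1 score, the indexes named in the abstract; treating "malicious" as the positive class, the function names, and the sample labels are assumptions.

```python
def confusion_counts(actual, predicted):
    """Count (TP, FP, FN, TN), with a truthy label meaning 'malicious'."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif not a and p:
            fp += 1
        elif a and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluation_results(tp, fp, fn, tn):
    """Derive the standard metrics; empty denominators yield 0.0."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    actual = [1, 1, 1, 0, 0, 0, 0, 1]     # illustrative ground truth
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]  # illustrative model output
    print(evaluation_results(*confusion_counts(actual, predicted)))
```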

7. The malicious control instruction detection method based on the CNN-LSTM hybrid model according to claim 5 or 6, characterized in that:

Adjusting the parameters of the malicious control instruction detection model according to the loss function and the evaluation results comprises:

The comprehensive index value is calculated from the loss function and the evaluation results according to the following formula:

$$Q = \alpha \times e^{\ln L} + \beta \times \left(1 - e^{\ln Z}\right)$$

Where Q represents the comprehensive index value; L is the loss value of the malicious control instruction detection model, calculated by the loss function; Z is the evaluation result of the performance of the malicious control instruction detection model; and α and β are non-negative weight coefficients;

When Q is greater than the tuning threshold, the parameters of the malicious control instruction detection model are tuned.
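The tuning criterion of claim 7 can be sketched as follows. The sketch evaluates the comprehensive index Q as written in the claim and compares it with a threshold. For positive L and Z, e^(ln L) equals L and e^(ln Z) equals Z, so the expression reduces to αL + β(1 − Z). The weight values, the threshold, the function names, and the assumption that Z is a score in (0, 1] such as accuracy or F1 are illustrative and not specified in the claim.

```python
import math


def comprehensive_index(loss, score, alpha=0.5, beta=0.5):
    """Q = alpha * e^(ln L) + beta * (1 - e^(ln Z)), defined for L, Z > 0."""
    if loss <= 0 or score <= 0:
        raise ValueError("ln is undefined for non-positive loss or score")
    if alpha < 0 or beta < 0:
        raise ValueError("weight coefficients must be non-negative")
    return alpha * math.exp(math.log(loss)) + beta * (1 - math.exp(math.log(score)))


def needs_tuning(loss, score, threshold, alpha=0.5, beta=0.5):
    """Parameters are tuned when Q exceeds the tuning threshold."""
    return comprehensive_index(loss, score, alpha, beta) > threshold


if __name__ == "__main__":
    q = comprehensive_index(loss=0.2, score=0.9)
    print(q, needs_tuning(loss=0.2, score=0.9, threshold=0.1))
```

A high loss or a low evaluation score both raise Q, so either one can trigger another round of parameter tuning.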

8. A malicious control instruction detection system using the malicious control instruction detection method based on the CNN-LSTM hybrid model according to any one of claims 1 to 7, characterized by comprising:

An acquisition module, used to acquire multiple API call sequences generated when the control instruction to be detected is executed in the distribution network communication network software;

An encryption module, used to encrypt multiple API call sequences into hash values using a hash algorithm to obtain multiple encrypted API call sequences;

A detection module, used to input multiple encrypted API call sequences into a pre-built malicious control instruction detection model, and output whether the control instruction to be detected is a malicious control instruction; the malicious control instruction detection model is a hybrid model constructed based on CNN and LSTM, and the malicious control instruction detection model is obtained through machine learning training using multiple groups of data, and the multiple groups of data include a first category of data and a second category of data. Each group of data in the first category of data includes an API call sequence generated when a malicious control instruction is executed in the distribution network communication network software, and each group of data in the second category of data includes an API call sequence generated when a benign control instruction is executed in the distribution network communication network software.
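The encryption module recited above can be illustrated with a minimal sketch that maps each API call name to a fixed-width integer derived from a hash digest. The claims do not name a hash algorithm, so the use of SHA-256, the truncation to 4 bytes, the function names, and the sample API names are assumptions made for illustration.

```python
import hashlib


def hash_api_call(name, digest_bytes=4):
    """Map one API call name to an integer taken from its SHA-256 digest."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Truncating keeps values small enough to index an embedding table,
    # at the cost of a small collision probability.
    return int.from_bytes(digest[:digest_bytes], "big")


def hash_sequence(calls):
    """Hash every call in a sequence, preserving order."""
    return [hash_api_call(call) for call in calls]


if __name__ == "__main__":
    sequence = ["NtCreateFile", "NtWriteFile", "NtCreateFile"]
    print(hash_sequence(sequence))
```

The same call name always produces the same value, so the order and repetition structure of the original sequence is preserved for the detection model.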

9. An electronic device comprising a processor and a storage medium; characterized in that:

The storage medium is used to store instructions;

The processor is configured to operate according to the instructions to execute the steps of the malicious control instruction detection method based on the CNN-LSTM hybrid model according to any one of claims 1 to 7.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that when the program is executed by a processor, the steps of the malicious control instruction detection method based on the CNN-LSTM hybrid model described in any one of claims 1 to 7 are implemented.