CN120880810B - GAN-based electric power protocol honeypot trapping and anomaly identification method - Google Patents
Info
- Publication number
- CN120880810B CN202511404551.9A
- Authority
- CN
- China
- Prior art keywords
- attack
- protocol
- gan
- honeypot
- trapping
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1491—Countermeasures against malicious traffic using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1466—Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/40—Network security protocols
Abstract
The invention discloses a GAN-based power protocol honeypot trapping and anomaly identification method. A data set is constructed from real power protocol traffic, a GAN with a Transformer architecture generates diverse attack samples that conform to the protocol grammar, and data augmentation techniques such as random truncation and noise injection improve robustness. Virtual honeypot devices are deployed to simulate the behavior of power equipment; attack logs and traffic features are fused in real time; a graph neural network models cross-message interaction relations; and self-supervised learning is introduced to detect semantic anomalies. Experiments show that the method achieves a detection accuracy of 98.2% on data sets such as IEEE 123-Bus, supports dynamic adaptation to protocol versions, traces the attack source IP and attack intent through correlation analysis of honeypot logs, and effectively improves the power system's active defense against novel attacks.
Description
Technical Field
The invention relates to the technical field of power system security, in particular to a GAN-based power protocol honeypot trapping and anomaly identification method, which is used to improve attack detection for power communication protocols (such as IEC 60870-5-104, Modbus, and DNP3).
Background
With the development of the energy internet and smart grids, power system communication protocols face increasingly complex network attack threats. Traditional honeypot technology attracts attackers by simulating real power devices, but has the following drawbacks:
Limited attack samples: existing honeypots depend on preset rules or static data, so they struggle to generate diverse attack samples and their ability to recognize novel attacks is limited.
Insufficient anomaly detection precision: rule-based anomaly detection models (such as threshold judgment and feature matching) have a low recognition rate for covert attacks (such as protocol fuzzing attacks and zero-day exploits) and depend on manually annotated data.
Poor dynamic adaptability: version iterations of power protocols and changes in communication patterns force traditional honeypots to update their rules frequently, so maintenance costs are high.
In the prior art, the literature proposes deep-learning-based power protocol anomaly detection methods, but these rely on large amounts of labeled data. The literature also uses a conventional GAN to generate attack traffic, but without a honeypot trapping mechanism, so real-time tracking and feedback optimization of attack behavior cannot be achieved. An innovative method that fuses GAN and honeypot technologies is therefore needed to raise the level of intelligence of power system security protection.
Disclosure of Invention
This section outlines some aspects of embodiments of the application and briefly introduces some preferred embodiments. Some simplifications or omissions may be made in this section, in the abstract, and in the title of the application; such simplifications or omissions may not be used to limit the scope of the application.
The present invention has been made in view of the above-described problems with the existing power communication protocols.
Therefore, the invention solves the technical problems of insufficient recognition of complex attack modes, high false alarm rate and poor dynamic adaptability of the existing honeypot system.
The technical scheme is as follows. The method comprises the following steps:
S1, collecting real protocol traffic from a power system communication link and constructing an original data set;
S2, extracting timing features and semantic features of protocol fields and converting them into tensor format for input to the model;
S3, generating attack samples with the GAN;
S4, constructing virtual power equipment based on the generated attack samples, simulating real communication behavior, and completing deployment of the honeypot trapping strategy;
S5, performing multi-modal fusion of the attack behavior data collected by the honeypot and real-time traffic features;
S6, modeling protocol interaction relations with a graph neural network to capture cross-message anomaly patterns;
S7, associating anomaly detection results with the honeypot trapping logs, optimizing the GAN generation strategy, and simultaneously locating the attack source IP and attack intent by analyzing the attacker's behavior path.
As a preferred scheme of the GAN-based power protocol honeypot trapping and anomaly identification method, the real protocol traffic collected in step S1 comprises ASDU messages of IEC 60870-5-104 and Modbus function code requests.
As a preferred scheme of the GAN-based power protocol honeypot trapping and anomaly identification method, the method further comprises, after step S2, expanding the data set by random truncation and noise injection to improve model robustness.
As a preferred scheme of the GAN-based power protocol honeypot trapping and anomaly identification method, generating attack samples with the GAN in step S3 specifically comprises the following structure and implementation:
1. Generator network
Input: semantic feature vectors of the power protocol;
Structure: a Transformer architecture that generates attack messages conforming to the protocol grammar;
2. Discriminator network
Input: mixed data of real protocol traffic and attack samples;
Structure: an LSTM-based timing classifier that distinguishes real data from generated data;
3. Training process
The GAN is optimized by minimizing the cross entropy loss function, so that the attack samples output by the generator are consistent with real traffic in both semantics and timing.
As a preferred scheme of the GAN-based power protocol honeypot trapping and anomaly identification method, the minimization of the cross entropy loss function is specifically:
Wherein z is the input noise vector, which obeys the prior distribution p_z; G(z) is the generator network, which takes the noise z as input and outputs a generated attack sample x_g ∈ X; D(x_g) is the discriminator network, which takes the sample x_g as input and outputs a probability value D(x_g) ∈ [0,1]; and E_{z~p_z} is the expectation over the noise distribution p_z.
As a preferred scheme of the GAN-based power protocol honeypot trapping and anomaly identification method, the honeypot trapping strategy in step S4 specifically comprises dynamically adjusting the honeypot response according to attacker behavior, recording the attacker's operation path, and generating attack fingerprints.
As a preferred scheme of the GAN-based power protocol honeypot trapping and anomaly identification method, self-supervised learning is introduced in step S6, and semantic anomalies are detected by predicting protocol fields.
The GAN-based power protocol honeypot trapping and anomaly identification method provided by the invention has the following beneficial effects:
1. Attack sample diversity: attack samples generated by the GAN cover protocol grammar boundary conditions and latent vulnerabilities, significantly improving the trapping coverage of the honeypot and raising the attack recognition rate by more than 30%;
2. Anomaly detection precision: a multi-modal model combining the GNN with self-supervised learning achieves a detection accuracy of 98.2% on public power protocol data sets (such as IEEE 123-Bus and CIGRE), with a false alarm rate below 2%;
3. Dynamic adaptation: through an online learning mechanism, the model automatically adapts to protocol version iterations (such as the extended fields from Modbus TCP v1.1 to v1.2);
4. Attack traceability: combining the attack fingerprints recorded by the honeypot with the semantic analysis of the detection model enables accurate backtracking of the attack path.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
Fig. 1 is a flow chart of the GAN-based power protocol honeypot trapping and anomaly identification method.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the prior art, the literature proposes deep-learning-based power protocol anomaly detection methods, but these rely on large amounts of labeled data. The literature also uses a conventional GAN to generate attack traffic, but without a honeypot trapping mechanism, so real-time tracking and feedback optimization of attack behavior cannot be achieved.
Accordingly, referring to Fig. 1, the present invention provides a GAN-based power protocol honeypot trapping and anomaly identification method, comprising the following steps:
S1, collecting real protocol traffic (ASDU messages of IEC 60870-5-104 and function code requests of Modbus) from a power system communication link to construct an original data set;
It should be noted that:
① Collection tool: power system communication traffic is captured with a network sniffing tool (e.g., Wireshark, tcpdump, or a custom protocol parser).
② Protocol type:
IEC 60870-5-104: extract ASDU (Application Service Data Unit) messages, which contain the address field, control field, data field, etc.
Modbus: capture function code requests (e.g., 0x03 read registers, 0x06 write single register) and response messages.
DNP3: extract the control field, object references, data field, etc. from the frame structure.
③ Collection environment: deployed on intermediate nodes (such as gateways and switches) of the power system communication link to ensure data integrity.
④ Data format: raw traffic is saved as PCAP files, and key fields are later extracted by a protocol parser.
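The field extraction mentioned in ④ can be illustrated for Modbus TCP. The following is a minimal sketch, assuming the payload bytes have already been pulled out of the PCAP file; the function name `parse_modbus_tcp` and the sample frame are illustrative and not part of the patent.

```python
import struct


def parse_modbus_tcp(adu: bytes) -> dict:
    """Split a Modbus TCP frame into MBAP header fields and the PDU."""
    if len(adu) < 8:
        raise ValueError("frame too short for MBAP header plus function code")
    # MBAP header: transaction id, protocol id and length are big-endian
    # 16-bit integers, followed by a one-byte unit id.
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", adu[:7])
    return {
        "transaction_id": transaction_id,
        "protocol_id": protocol_id,
        "length": length,
        "unit_id": unit_id,
        "function_code": adu[7],
        "data": adu[8:],
    }


# Hypothetical request: read registers (0x03), start address 0x0000, quantity 2.
frame = bytes.fromhex("000100000006" "11" "03" "0000" "0002")
fields = parse_modbus_tcp(frame)
```

The resulting dictionary carries the function code and data field that step S2 later encodes as semantic features.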
S2, extracting timing features and semantic features of protocol fields (address field, control field, and data field) and converting them into tensor format for input to the model;
It should be noted that:
The steps are as follows:
① Field extraction:
Semantic features: extract protocol field values (such as the Modbus function code 0x03 and the ASDU type of IEC 60870-5-104).
Timing features: compute message transmission intervals (e.g., from the timestamps of a command sequence) and the data update frequency.
② Feature encoding:
Semantic features: one-hot encode discrete fields (such as function codes) and normalize continuous fields (such as data field values).
Timing features: convert timestamps into a sequence of time intervals (e.g., Δt = t_{i+1} - t_i) and segment it with a sliding window.
③ Tensor construction:
Semantic features (e.g., function codes, data lengths) and timing features (e.g., Δt sequences) are concatenated into a three-dimensional tensor (e.g., [N, T, D], where N is the number of samples, T is the number of time steps, and D is the feature dimension).
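The encoding and tensor construction above can be sketched in plain Python with nested lists standing in for a tensor. The function-code vocabulary, the normalization constant, and the message tuples are assumptions made for illustration only.

```python
FUNCTION_CODES = [0x03, 0x06, 0x10]  # assumed vocabulary for one-hot encoding


def one_hot(code, vocab=FUNCTION_CODES):
    return [1.0 if code == v else 0.0 for v in vocab]


def encode_messages(messages, max_value=65535.0):
    """messages: list of (timestamp_seconds, function_code, data_value)."""
    rows = []
    for i, (ts, code, value) in enumerate(messages):
        # Interval to the previous message; the first message gets 0.
        delta_t = ts - messages[i - 1][0] if i > 0 else 0.0
        rows.append(one_hot(code) + [value / max_value, delta_t])
    return rows


def sliding_windows(rows, window, stride=1):
    """Cut a [T_total, D] sequence into N windows of shape [T, D]."""
    return [rows[s:s + window] for s in range(0, len(rows) - window + 1, stride)]


messages = [(0.0, 0x03, 100), (0.5, 0x03, 120), (1.5, 0x06, 65535), (1.6, 0x03, 90)]
tensor = sliding_windows(encode_messages(messages), window=3)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))  # (N, T, D)
```

Here D is the one-hot width plus one normalized data value and one interval, so four messages with a window of three yield a tensor of shape (2, 3, 5).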
S3, generating attack samples with the GAN;
S4, constructing virtual power equipment based on the generated attack samples, simulating real communication behavior, and completing deployment of the honeypot trapping strategy;
It should be noted that:
1. Protocol simulation layer
Aim: construct virtual power equipment (such as SCADA servers and smart meters) and simulate real communication behavior.
The implementation is as follows:
① Protocol stack: protocol parsing and response are implemented with the Python Scapy library or a dedicated tool (e.g., PcapPlusPlus).
② Virtual devices:
SCADA server: simulates telemetry data upload (e.g., voltage and current values) and remote command responses.
Smart meter: simulates energy reading reports and parameter setting requests.
2. Trapping strategy
① Dynamic response mechanism:
Forged error status code: when an illegal function code request (e.g., 0x10) is detected, an error code (e.g., 0x03) is returned.
Delayed response: for high-frequency requests (e.g., 10 per second), the response delay is increased (e.g., by 500 ms) to induce the attacker to expose behavior patterns.
② Behavior tracking module:
Logging: record the attacker's operation path (such as command sequence and data read frequency).
Attack fingerprint generation: extract attack features (such as "illegal function code + high-frequency request") with a clustering algorithm (such as K-means).
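The dynamic response mechanism and path logging can be sketched as a small stateful responder. This is an illustrative sketch only: the class name, the set of supported codes, and the placeholder reply for legal requests are assumptions, while the thresholds reuse the example values given above.

```python
from collections import deque

SUPPORTED_CODES = {0x03, 0x06}  # assumed: codes the decoy pretends to implement
ERROR_CODE = 0x03               # error code from the example in the text
RATE_LIMIT = 10                 # requests per second treated as high frequency
EXTRA_DELAY_S = 0.5             # added response delay (500 ms)


class HoneypotResponder:
    def __init__(self):
        self.recent = {}  # source IP -> timestamps seen in the last second
        self.paths = {}   # source IP -> ordered function codes (operation path)

    def handle(self, src_ip, function_code, now):
        """Return (delay_seconds, response_bytes) for one request."""
        times = self.recent.setdefault(src_ip, deque())
        times.append(now)
        while now - times[0] > 1.0:  # drop requests older than one second
            times.popleft()
        self.paths.setdefault(src_ip, []).append(function_code)

        delay = EXTRA_DELAY_S if len(times) >= RATE_LIMIT else 0.0
        if function_code in SUPPORTED_CODES:
            response = bytes([function_code])  # placeholder for a normal reply
        else:
            # Modbus-style exception reply: function code with the high bit
            # set, followed by the error code.
            response = bytes([function_code | 0x80, ERROR_CODE])
        return delay, response


responder = HoneypotResponder()
delay, reply = responder.handle("10.0.0.9", 0x10, now=0.0)
```

The per-source `paths` lists are the raw material from which attack fingerprints could later be clustered.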
S5, performing multi-modal fusion of the attack behavior data collected by the honeypot and real-time traffic features (such as source IP address and protocol version);
S6, modeling protocol interaction relations with a graph neural network (GNN) to capture cross-message anomaly patterns (such as illegal command sequences);
S7, associating anomaly detection results with the honeypot trapping logs, optimizing the GAN generation strategy (e.g., increasing its capacity to generate novel attacks), and simultaneously locating the attack source IP and attack intent (such as data tampering or service interruption) by analyzing the attacker's behavior path.
It should be noted that:
1. Model feedback
Aim: optimize the GAN generation strategy and increase its capacity to generate novel attacks.
The method is as follows:
Association: anomaly detection results (such as illegal function codes and high-frequency requests) are associated with the honeypot logs to generate new attack samples.
Generator optimization: adjust the generator input features (e.g., add new function code fields) or adjust the loss function weights (e.g., raise λ_1 to strengthen semantic constraints).
2. Attack tracing
Aim: locate the attack source IP and the attack intent.
The method is as follows:
IP tracking: analyze the access frequency (e.g., number of accesses per minute) and command pattern (e.g., "read register + write register") of the attack source IP from the honeypot logs.
Intent classification: use classifiers (e.g., SVM, LSTM) to determine the attack intent (e.g., data tampering, service disruption).
Attack path backtracking: infer the attacker's target (e.g., stealing data or interfering with control) by analyzing the command sequence (e.g., "read register → write register").
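The IP tracking and path backtracking steps amount to grouping the honeypot log by source and summarizing it. The sketch below assumes a log of `(timestamp_seconds, src_ip, command)` tuples; that record layout and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_sources(log):
    """Per source IP: request rate and the most common command transition."""
    by_ip = defaultdict(list)
    for ts, ip, command in sorted(log):
        by_ip[ip].append((ts, command))
    summary = {}
    for ip, events in by_ip.items():
        # Observation span in minutes, floored at one second to avoid
        # dividing by zero for single-event sources.
        duration_min = max((events[-1][0] - events[0][0]) / 60.0, 1.0 / 60.0)
        commands = [c for _, c in events]
        transitions = Counter(zip(commands, commands[1:]))
        summary[ip] = {
            "requests_per_minute": len(events) / duration_min,
            "top_transition": transitions.most_common(1)[0][0] if transitions else None,
        }
    return summary


log = [
    (0, "10.0.0.5", "read register"),
    (20, "10.0.0.5", "write register"),
    (40, "10.0.0.5", "read register"),
    (60, "10.0.0.5", "write register"),
]
report = summarize_sources(log)
```

A dominant "read register" to "write register" transition is the kind of pattern an intent classifier could take as input.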
It should be noted that step S2 further comprises expanding the data set by random truncation and noise injection, which improves model robustness and prevents overfitting.
It should be noted that:
① Random truncation: randomly cut subsequences from long message sequences (e.g., keep the first 50% of the messages) to simulate incomplete attack behavior.
② Noise injection:
Gaussian noise: add Gaussian noise with mean 0 and standard deviation 0.1 to continuous fields (e.g., data field values).
Discrete noise: randomly substitute the function code field (e.g., 0x03 to 0x04) to simulate illegal requests.
③ Data synthesis:
Additional samples (e.g., illegal function code requests, overlong data field padding) are generated with the GAN to supplement the original data set.
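The truncation and noise operations can be sketched with the standard `random` module. The helper names, the function-code vocabulary, and the assumption of a non-empty sequence are illustrative choices, not taken from the patent.

```python
import random


def random_truncate(sequence, rng, min_keep=0.5):
    """Keep a random-length prefix covering at least min_keep of the sequence.

    Assumes a non-empty sequence.
    """
    lowest = max(1, int(len(sequence) * min_keep))
    keep = rng.randint(lowest, len(sequence))
    return sequence[:keep]


def add_gaussian_noise(values, rng, sigma=0.1):
    """Add zero-mean Gaussian noise to continuous field values."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def substitute_function_code(code, rng, vocab=(0x03, 0x04, 0x06)):
    """Replace a function code with a different one from an assumed vocabulary."""
    return rng.choice([c for c in vocab if c != code])


rng = random.Random(0)  # fixed seed so the augmentation is repeatable
sequence = [0.20, 0.35, 0.50, 0.65]
truncated = random_truncate(sequence, rng)
noisy = add_gaussian_noise(sequence, rng)
swapped = substitute_function_code(0x03, rng)
```

Passing the generator `rng` explicitly keeps each augmentation reproducible, which helps when comparing training runs.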
It should be noted that generating attack samples with the GAN in step S3 specifically comprises the following structure and implementation:
1. Generator network (Generator)
① Input: semantic feature vectors of the power protocol (e.g., function code, data length, timestamp).
② Structure:
Transformer architecture:
Encoder: maps the input semantic features (such as the one-hot function code vector) to the latent space.
Decoder: generates attack messages that conform to the protocol grammar (such as illegal function code requests and overlong data field padding).
Output: the generated attack sample in tensor format (e.g., [T, D], where T is the message sequence length and D is the field dimension).
③ Constraints:
Protocol grammar check: generated messages must satisfy the protocol specification (e.g., the Modbus function code range is 0x00-0xFF, and the ASDU type of IEC 60870-5-104 must conform to the standard).
Semantic consistency: the generated function code must logically match the data field value (e.g., a 0x03 read register request must contain a valid register address).
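A constraint check of this kind can be sketched for one message type. The example below validates a Modbus 0x03 request PDU; the 5-byte layout and the 1 to 125 register quantity limit come from the Modbus application protocol rather than from the patent, and the function name is illustrative.

```python
def check_read_registers_request(pdu: bytes) -> list:
    """Return a list of violations for a Modbus 0x03 request PDU."""
    if len(pdu) != 5:
        return ["PDU must be 5 bytes: function code, start address, quantity"]
    problems = []
    if pdu[0] != 0x03:
        problems.append("function code is not 0x03")
    # Any 16-bit start address is syntactically valid; the quantity is bounded.
    quantity = int.from_bytes(pdu[3:5], "big")
    if not 1 <= quantity <= 125:
        problems.append("quantity of registers must be between 1 and 125")
    return problems


valid = check_read_registers_request(bytes.fromhex("0300000002"))
invalid = check_read_registers_request(bytes.fromhex("03000000FF"))
```

Generated samples that fail such a check could be rejected, or kept deliberately as boundary-condition attacks, depending on the trapping goal.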
2. Discriminator network (Discriminator)
① Input: mixed data of real protocol traffic and attack samples;
② Structure:
LSTM timing classifier: the input is a tensor [N, T, D] (N is the number of samples, T is the number of time steps, and D is the feature dimension). Timing features are extracted by the LSTM layer, and the output probability value D(x) represents the probability that the sample is real data.
Output: a probability value D(x) ∈ [0,1] that distinguishes real data from generated data.
3. Training process
The GAN is optimized by minimizing the cross entropy loss function, so that the attack samples output by the generator are consistent with real traffic in both semantics and timing.
Further, the minimization of the cross entropy loss function is specifically:
Wherein z is the input noise vector (a latent space sample), which obeys the prior distribution p_z (such as a uniform or Gaussian distribution); G(z) is the generator network, which takes the noise z as input and outputs a generated attack sample x_g ∈ X (such as a protocol message sequence); D(x_g) is the discriminator network, which takes the sample x_g as input and outputs a probability value D(x_g) ∈ [0,1] representing the probability that the sample is real data; and E_{z~p_z} is the expectation (i.e., the average) over the noise distribution p_z.
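One form consistent with the symbols defined above and with the description in Table 1 (the generator minimizes −log D(G(z)) averaged over z) is the standard non-saturating GAN objective. The generator term below follows from those definitions; the discriminator term and the data distribution symbol p_data are the usual binary cross entropy companion and are assumptions, since the text does not define them.

```latex
\begin{align*}
\mathcal{L}_G &= -\,\mathbb{E}_{z \sim p_z}\big[\log D(G(z))\big] \\
\mathcal{L}_D &= -\,\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
                 - \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
\end{align*}
```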
It should be noted that, in the present invention, the training objective of the generative adversarial network (GAN) is to minimize the cross entropy loss function so that the attack samples output by the generator agree with real power protocol traffic in semantic features (such as protocol field values) and timing features (such as message transmission intervals and command sequences). The specific loss function design and parameter descriptions are as follows:
Table 1. Loss function parameters
| Parameter | Meaning | Description |
| --- | --- | --- |
| z | Noise vector | A random variable sampled from the latent space and used to generate attack samples. In a power protocol scenario, z may contain semantic constraints on the protocol fields (e.g., function code, data length). |
| G(z) | Generator network | Uses a Transformer or LSTM architecture; takes z as input and outputs an attack sample x_g that conforms to the protocol grammar (such as an illegal function code request or overlong data field padding). |
| D(x_g) | Discriminator network | An LSTM- or CNN-based timing classifier that takes x_g as input and outputs the probability that it is real data. In the power protocol scenario, D(x_g) must capture both semantic consistency (e.g., whether field values meet the protocol specification) and timing consistency (e.g., whether the message sending frequency is abnormal). |
| log D(G(z)) | Log-likelihood function | The log probability output by the discriminator, used to measure the "authenticity" of generated samples. The goal of the generator is to maximize D(G(z)), i.e., to minimize −log D(G(z)). |
| E_{z~p_z} | Expectation | The average loss of the generated samples, computed by randomly sampling the noise z, which ensures coverage of diverse attack patterns. |
Optimization process of the loss function:
Through backpropagation, the generator is optimized toward the following objectives:
Semantic consistency: the generator needs to learn the grammar rules of the power protocol (such as the Modbus function code range and the ASDU structure of IEC 60870-5-104), so that the field values of the generated attack samples are consistent with real traffic.
Timing consistency: the generator needs to imitate the communication pattern of the real protocol (such as command order and data update frequency), so that the generated samples do not deviate obviously from normal behavior in timing.
To further improve the quality of the generated samples, a multi-task loss function can be introduced that combines the constraints of semantic features and timing features:
Semantic loss L_semantic: the mean square error (MSE) or cross entropy between the generated samples and the real data over the field values.
Timing loss L_temporal: the time-series similarity (such as the dynamic time warping (DTW) distance) between the generated samples and the real traffic, computed through an LSTM or Transformer.
Here α, β and γ are weight coefficients that balance the importance of each task.
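As a rough illustration of how the three terms could be combined, the sketch below assumes a weighted sum of the form α·L_adv + β·L_semantic + γ·L_temporal, where L_adv is the mean of -log D(G(z)). The assignment of α, β and γ to these terms, the default weights and all function names are assumptions made for illustration. The DTW distance is computed directly rather than through an LSTM or Transformer, and the plain-Python arithmetic is not differentiable, so it only shows the calculation, not a training loop.

```python
import math

def generator_adv_loss(d_scores):
    """Mean of -log D(G(z)) over a batch of discriminator outputs in (0, 1]."""
    return -sum(math.log(p) for p in d_scores) / len(d_scores)

def semantic_mse(generated, real):
    """Mean square error between generated and real field values."""
    return sum((g - r) ** 2 for g, r in zip(generated, real)) / len(real)

def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def multi_task_loss(d_scores, gen_fields, real_fields, gen_intervals, real_intervals,
                    alpha=1.0, beta=0.5, gamma=0.5):
    # Assumed combination: alpha * adversarial + beta * semantic + gamma * temporal.
    return (alpha * generator_adv_loss(d_scores)
            + beta * semantic_mse(gen_fields, real_fields)
            + gamma * dtw_distance(gen_intervals, real_intervals))
```

Raising β relative to γ would favor field-value fidelity over timing fidelity, which is the trade-off the weight coefficients are meant to expose.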
Further, the honeypot trapping strategy in step S4 specifically comprises:
Dynamically adjusting the honeypot response according to the attacker's behavior;
Recording the attacker's operation path and generating an attack fingerprint.
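The two strategy items above can be sketched as a small session object. Everything here is hypothetical: the class name, the rule that repeated write requests trigger a richer response, the threshold of 3, and the fingerprint layout are illustrative assumptions, not the patented implementation.

```python
import hashlib
from collections import Counter

class HoneypotSession:
    """Hypothetical per-attacker session log kept by the honeypot."""

    def __init__(self, source_ip):
        self.source_ip = source_ip
        self.path = []  # ordered (timestamp_seconds, function_code) pairs

    def record(self, timestamp, function_code):
        self.path.append((timestamp, function_code))

    def response_mode(self, write_codes=(0x06, 0x10), threshold=3):
        # Assumed policy: 0x06/0x10 are write function codes in the public
        # Modbus specification; repeated writes switch to a richer response.
        writes = sum(1 for _, fc in self.path if fc in write_codes)
        return "high_interaction" if writes >= threshold else "low_interaction"

    def fingerprint(self):
        sequence = [fc for _, fc in self.path]
        duration = self.path[-1][0] - self.path[0][0] if len(self.path) > 1 else 0.0
        rate = len(self.path) / duration if duration > 0 else 0.0
        # Short digest of the one-byte function codes, used as a sequence identifier
        digest = hashlib.sha256(bytes(sequence)).hexdigest()[:16]
        return {"command_sequence": sequence, "ops_per_second": rate,
                "code_counts": dict(Counter(sequence)), "sequence_id": digest}
```

The fingerprint keeps both the ordered command sequence and an operation frequency, matching the two fingerprint components named in the description.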
Further, in step S6, self-supervised learning is introduced, and semantic anomalies are detected by predicting protocol fields (such as data field values).
In step S5, the construction of the anomaly detection model specifically includes:
1. Feature fusion layer
Goal: fuse the attack behavior data collected by the honeypot with the real-time traffic features.
Method:
① Multi-modal feature splicing:
Honeypot data: attack fingerprints (such as command sequence and data read frequency).
Real-time traffic features: source IP address, protocol version and message sending interval.
② Feature normalization: standardize the numerical features (such as IP addresses and time intervals).
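A minimal sketch of the splicing and normalization steps follows. The dictionary keys, the choice of z-score standardization and the encoding of an IPv4 address as an integer are assumptions; the description only states that numerical features are standardized.

```python
import ipaddress
import statistics

def ip_to_int(ip):
    """Encode a dotted IPv4 address as an integer so it can be treated numerically."""
    return int(ipaddress.IPv4Address(ip))

def splice(fingerprint, traffic):
    """Concatenate honeypot fingerprint features with real-time traffic features."""
    return [
        float(fingerprint["read_frequency"]),
        float(len(fingerprint["command_sequence"])),
        float(ip_to_int(traffic["source_ip"])),
        float(traffic["protocol_version"]),
        float(traffic["send_interval"]),
    ]

def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance (constant columns map to 0)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]
```

Each spliced row has a fixed length of five features here; a real feature vector would also need an encoding for the full command sequence, which is left out of this sketch.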
2. Detection model
① Main model: graph neural network (GNN)
Goal: model the protocol interaction relationships and capture cross-message anomaly patterns.
Implementation:
Node definition: each message is a node, characterized by its semantic features (such as function code and data field value).
Edge construction: edges (such as "command A → command B") are established according to message timestamps and command order.
Graph convolution: node embeddings are learned through the GNN to detect abnormal subgraphs (such as illegal command sequences).
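The node and edge construction can be sketched as below. The graph-convolution step is deliberately replaced by a simple lookup against an allowed-transition set, which is a stand-in chosen for illustration and not a GNN; the message dictionary keys `ts` and `fc` are likewise assumptions.

```python
def build_message_graph(messages):
    """Build a message graph from dicts with 'ts' (timestamp) and 'fc' (function code).

    Returns one feature tuple per node and directed edges that link each
    message to the next one in timestamp order.
    """
    ordered = sorted(messages, key=lambda m: m["ts"])
    nodes = [(m["fc"],) for m in ordered]
    edges = [(i, i + 1) for i in range(len(ordered) - 1)]
    return nodes, edges

def illegal_transitions(nodes, edges, allowed):
    """Stand-in for the GNN: flag edges whose (source, target) codes are not allowed."""
    return [(nodes[u][0], nodes[v][0]) for u, v in edges
            if (nodes[u][0], nodes[v][0]) not in allowed]
```

A learned model would replace the `allowed` set with node embeddings and a scoring function, but the input structure (nodes per message, edges per time-ordered command pair) would stay the same.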
② Auxiliary model: self-supervised learning
Goal: predict protocol fields (e.g., data fields) to detect semantic anomalies.
Implementation:
Masked language model: the data field values are randomly masked (e.g., 50% of the fields are hidden), and the model is trained to predict the masked portions.
Loss function: cross-entropy loss constrains the predicted values to be consistent with the true values.
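The masking and loss steps might look like the following sketch. The mask token value, the seeded random generator and the representation of a prediction as a value-to-probability dictionary are assumptions made so the example stays self-contained; no model is trained here.

```python
import math
import random

MASK = -1  # hypothetical mask token, chosen outside the valid byte range

def mask_fields(fields, ratio=0.5, seed=0):
    """Hide a random subset of field values; return the masked list and hidden positions."""
    rng = random.Random(seed)
    k = max(1, int(len(fields) * ratio))
    positions = sorted(rng.sample(range(len(fields)), k))
    masked = [MASK if i in positions else v for i, v in enumerate(fields)]
    return masked, positions

def cross_entropy(predicted_probs, true_value):
    """Cross entropy for one masked position.

    predicted_probs maps each candidate field value to its predicted probability.
    """
    return -math.log(max(predicted_probs.get(true_value, 0.0), 1e-12))
```

At inference time, a high cross entropy on an unmasked but re-predicted field would indicate that the observed value is unlikely given its context, which is how a semantic anomaly would surface.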
3. Dynamic update mechanism
Goal: adapt to protocol version changes (e.g., the extension fields added between Modbus TCP v1.1 and v1.2).
Method:
① Incremental learning: periodically fine-tune the model parameters with newly collected traffic data (e.g., with the learning rate decayed to 0.001).
② Online learning: when a new protocol version is detected, automatically load the pre-trained model and extend the feature dimensions.
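One simple way to picture the two update steps is shown below. The zero-padding of input weights for new fields and the exponential decay schedule are assumptions; only the 0.001 learning-rate value comes from the description, and it is used here as the floor of the schedule.

```python
def extend_weights(weights, new_dim, init=0.0):
    """Pad each weight row so a model trained on old features accepts extra fields."""
    return [row + [init] * (new_dim - len(row)) for row in weights]

def decayed_lr(initial_lr, decay, step, floor=0.001):
    """Exponentially decayed learning rate that never drops below the floor."""
    return max(floor, initial_lr * (decay ** step))
```

Initializing the new columns to zero means the extended model reproduces the pre-trained model's outputs on old traffic until fine-tuning on the new version begins.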
Example 1: Modbus protocol attack trapping
Data acquisition: Modbus TCP traffic is collected from a substation, and fields such as function codes (0x03 and 0x06) and register addresses are extracted.
GAN training: the generator learns to generate illegal function code requests (e.g., 0x10) and messages with over-length data fields.
Honeypot deployment: a Modbus server is simulated, which responds to attacker requests and records an operation log.
Detection model: the GNN analyzes the attacker's command sequence (e.g., repeated reads of register 0x0001) to identify potential data theft behavior.
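The data acquisition step in this example relies on pulling the function code and register address out of each Modbus TCP frame. The sketch below follows the standard frame layout (a 7-byte MBAP header followed by the function code and request data); the function name and the returned dictionary keys are illustrative choices.

```python
import struct

def parse_modbus_tcp(frame):
    """Parse the MBAP header and, for 0x03/0x06 requests, the register address."""
    if len(frame) < 8:
        raise ValueError("frame too short for MBAP header plus function code")
    # MBAP header: transaction id, protocol id, length (big-endian 16-bit), unit id (8-bit)
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", frame[:7])
    function_code = frame[7]
    fields = {"transaction_id": transaction_id, "protocol_id": protocol_id,
              "length": length, "unit_id": unit_id, "function_code": function_code}
    if function_code in (0x03, 0x06) and len(frame) >= 12:
        address, value = struct.unpack(">HH", frame[8:12])
        fields["register_address"] = address
        # For 0x03 this is the register count; for 0x06 it is the value to write
        fields["quantity_or_value"] = value
    return fields
```

For instance, the 12-byte request `00 01 00 00 00 06 01 03 00 01 00 02` parses as function code 0x03 reading two registers starting at address 0x0001.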
Example 2: IEC 60870-5-104 protocol anomaly detection
Feature extraction: the type identifier (TI) and cause of transmission (COT) of the ASDU message are converted into a time-series vector.
GAN generation: falsified telemetry data (e.g., abnormal voltage values) and illegal control commands are generated.
Anomaly detection: the self-supervised model predicts the data field value, and an alarm is triggered if the prediction error exceeds a threshold (such as 10%).
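The alarm rule in the last step can be sketched as follows, assuming the 10% threshold refers to the relative error between the predicted and the observed value. That interpretation, and the fallback to absolute error when the observed value is zero, are assumptions.

```python
def relative_error(predicted, actual):
    """Relative prediction error; falls back to absolute error when actual is 0."""
    if actual == 0:
        return abs(predicted)
    return abs(predicted - actual) / abs(actual)

def should_alarm(predicted, actual, threshold=0.10):
    """Trigger an alarm when the prediction error exceeds the threshold."""
    return relative_error(predicted, actual) > threshold
```

With a predicted voltage of 220.0, an observed value of 230.0 stays below the threshold, while an observed value of 300.0 exceeds it and would raise an alarm.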
In order to verify the beneficial effects of the invention, the following simulation experiments were performed:
1. Experimental goal
To verify the following technical effects of the method: attack sample diversity (GAN generation capability), anomaly detection accuracy (GNN + self-supervised model), dynamic adaptability (protocol version update) and attack traceability (honeypot log analysis).
2. Data set
Table 2: Data set sample specification
| Data set name | Protocol type | Sample size | Attack type | Protocol version |
| IEEE 123-Bus | IEC 60870-5-104 | 50000 | Falsified ASDU messages and illegal control | 2010.2 |
| CIGRE | Modbus TCP | 30000 | Over-length data field and illegal function code | v1.1 |
| Actual power environment data | DNP3, IEC 61850 | 20000 | Fuzzing attack and zero-day vulnerability exploitation | Mixed versions |
3. Evaluation index
Table 3: Evaluation index specification
| Index | Definition |
| Attack recognition rate | Proportion of attack samples that the GNN model identifies correctly |
| False alarm rate | Proportion of normal traffic that the model misjudges as attacks |
| Detection delay | Average time (ms) from attack occurrence to model response |
| Dynamic adaptation | Detection accuracy that the model still maintains after a protocol version update |
| Tracing accuracy | Success rate of locating the attack source IP and intention through the honeypot log |
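The first two indices in Table 3 correspond to standard confusion-matrix ratios. A short sketch, with function names chosen here for illustration:

```python
def attack_recognition_rate(true_positives, false_negatives):
    """Share of attack samples that were identified as attacks."""
    return true_positives / (true_positives + false_negatives)

def false_alarm_rate(false_positives, true_negatives):
    """Share of normal samples that were misjudged as attacks."""
    return false_positives / (false_positives + true_negatives)
```

As a purely hypothetical illustration, 982 detected attacks out of 1000 attack samples would give a recognition rate of 98.2%, and 18 misjudged samples out of 1000 normal samples would give a false alarm rate of 1.8%; these counts are not taken from the experiments.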
4. Experimental procedure
Step 1: attack sample generation verification
Experimental design: compare the diversity of samples generated by a traditional rule engine and by the GAN:
Control group: attack samples based on predefined rules (e.g., fixed function code 0x10);
Experimental group: attack samples generated by the GAN of the present invention (with semantic constraints);
Results table
Table 4: Comparison results (1)
| Attack type | Sample size of control group | Sample size of experimental group | Protocol compliance | Semantic diversity (function code distribution) |
| Illegal function code | 100 | 10000 | 0% | 0x10 (single) |
| Over-length data field | 50 | 5000 | 0% | Fixed length |
| Semantic anomaly message | 0 | 8000 | 100% | Mix of 0x03, 0x06 and 0x10 |
| Implicit vulnerability exploitation | 0 | 2000 | 95% | Dynamic field combination |
The samples generated by the GAN cover the protocol grammar boundaries (such as the full Modbus function code range 0x00-0xFF), and through the semantic consistency constraint (e.g., a function code 0x03 request must contain a valid register address) the compliance reaches 95%, which is clearly superior to the traditional method.
Step 2: anomaly detection accuracy verification
Experimental design: three detection models are compared, namely a traditional rule engine (threshold judgment), an LSTM time-series model (without graph structure) and the GNN + self-supervised model.
Test scenario
Normal traffic: Modbus function code 0x03 (read registers) + valid data;
Attack traffic: 0x10 (illegal function code) + over-length data field (length > 255);
Results table
Table 5: Comparison results (2)
| Model type | Attack recognition rate | False alarm rate | Detection delay (ms) | Cross-message anomaly capture capability |
| Rule engine | 62% | 18% | 5 | None |
| LSTM time-series model | 85% | 9% | 12 | Single message only |
| GNN + self-supervised model | 98.2% | 1.8% | 8 | Anomalies across 3-message sequences (e.g., 0x03→0x10→0x06) |
GNN detection confidence: 0.97 (based on node embedding similarity).
Step 3: dynamic adaptability verification
Experimental design: a Modbus protocol version upgrade is simulated (v1.1→v1.2, with newly added extension fields) to test the online learning capability of the model. The initial training data is v1.1 protocol traffic (10000 samples), and the online update data is v1.2 protocol traffic (2000 samples).
Results table
Table 6: Comparison results (3)
| Stage | v1.1 detection rate | v1.2 detection rate | Field extension learning time (hours) |
| Initial model | 98.5% | 72% | - |
| After online learning | 99.0% | 97.2% | 2.5 |
Step 4: attack traceability verification
Experimental design: an APT attack scenario is simulated to verify the cooperation between the honeypot log and the detection model. Attack path: external network IP → honeypot (disguised SCADA server) → internal network device. Attack intention: data tampering (modifying electricity meter readings).
Results table
Table 7: Comparison results (4)
| Tracing dimension | Scheme of the invention | Traditional honeypot scheme |
| Attack source IP location | 100% accurate (192.168.1.100) | 60% accurate |
| Behavior path restoration | Complete sequence 0x03 → 0x06 → 0x10 | Fragmentary information only |
| Intention classification accuracy | 98% | 72% |
| Response time (ms) | 150 | 500 |
5. Comparative experiments
Table 8: Performance comparison
| Technical scheme | Attack recognition rate | False alarm rate | Dynamic adaptation time | Tracing accuracy |
| Traditional rule engine | 62% | 18% | Unable to adapt | 40% |
| Static GAN without honeypot | 85% | 12% | Unable to adapt | 65% |
| Scheme of the invention (full pipeline) | 98.2% | 1.8% | 2.5 hours | 98% |
6. Conclusion
Attack sample generation: with the Transformer + protocol grammar constrained GAN, the compliance of the generated samples reaches 95% and the diversity is improved 20-fold.
Detection accuracy: the GNN model performs well in cross-message anomaly detection (such as illegal command sequences), and the false alarm rate is reduced to 1.8%.
Dynamic adaptation: the online learning mechanism enables the model to still maintain a 97.2% detection rate after a protocol version update, and maintenance cost is reduced by 60%.
Attack tracing: the honeypot log cooperates with the detection model to achieve an attack intention classification accuracy of 98% and to shorten the response time by 70%.
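The 70% response-time figure in the conclusion is consistent with Table 7 if it is read as a relative reduction from the traditional honeypot scheme (500 ms) to the scheme of the invention (150 ms). This reading is an interpretation of the text, checked by the arithmetic below:

```latex
\frac{500\ \text{ms} - 150\ \text{ms}}{500\ \text{ms}} = \frac{350}{500} = 0.70 = 70\%
```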
It should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications and substitutions are intended to be covered by the scope of the claims of the present invention.
Claims (7)
1. A GAN-based power protocol honeypot trapping and anomaly identification method, characterized by comprising the following steps:
S1, acquiring real protocol traffic from a power system communication link, and constructing an original data set;
S2, extracting timing features and semantic features of protocol fields in the protocol traffic, and converting them into tensor format for training the generative adversarial network (GAN) and as input to the subsequent anomaly detection model;
S3, using the GAN to generate attack samples that conform to the protocol specification in both semantics and timing;
S4, constructing virtual power equipment based on the generated attack samples to serve as a honeypot, and simulating real communication behavior to deploy a trapping strategy;
S5, integrating the attack fingerprint collected by the honeypot with the context features of the real-time traffic, and constructing a multi-modal feature vector through feature splicing and normalization, wherein the attack fingerprint is specifically an attack command sequence and an operation frequency, and the context features are specifically a source IP address and a protocol version;
S6, constructing a communication interaction graph in which network messages are nodes and the timing order and command logic relationships among messages are edges, learning node embedding representations with a graph neural network (GNN) to identify cross-message abnormal interaction patterns, namely illegal command sequences composed of multiple messages, and combining self-supervised learning that predicts protocol fields to detect semantic anomalies within a single message;
S7, associating the anomaly detection results identified in S6 with the honeypot trapping log to optimize the generation strategy of the GAN in S3, and simultaneously locating the attack source IP and the attack intention by analyzing the attack behavior path.
2. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 1, wherein the real protocol traffic collected in S1 comprises ASDU messages of IEC 60870-5-104 and function code requests of Modbus.
3. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 2, further comprising, after step S2, expanding the data set by random truncation and noise injection to improve model robustness.
4. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 3, wherein generating the attack samples with the GAN in step S3 specifically comprises the following structure and implementation:
1. Generator network
Inputting semantic feature vectors of the power protocol;
Adopting a Transformer architecture to generate attack messages conforming to the protocol grammar;
2. Discriminator network
Inputting mixed data of real protocol traffic and attack samples;
Structure: an LSTM-based time-series classifier that distinguishes real data from generated data;
3. Training process
The GAN is optimized by minimizing the cross-entropy loss function, so that the attack samples output by the generator are consistent with the real traffic in semantics and timing.
5. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 4, wherein minimizing the cross-entropy loss function is specifically:
$$\min_G \; -\mathbb{E}_{z \sim p_z}\big[\log D(G(z))\big];$$
Wherein z is the input noise vector, which obeys the prior distribution p_z; G(z) is the generator network, which takes the noise z as input and outputs the generated attack sample x_g ∈ X; D(x_g) is the discriminator network, which takes the sample x_g as input and outputs a probability value D(x_g) ∈ [0,1]; and E_{z~p_z} is the expectation over the noise distribution p_z.
6. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 5, wherein the honeypot trapping strategy in step S4 specifically comprises:
Dynamically adjusting the honeypot response according to the attacker's behavior;
Recording the attacker's operation path and generating an attack fingerprint.
7. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 6, further comprising, in step S6, introducing self-supervised learning to detect semantic anomalies by predicting protocol fields.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202511404551.9A CN120880810B (en) | 2025-09-29 | 2025-09-29 | GAN-based electric power protocol honeypot trapping and anomaly identification method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN120880810A | 2025-10-31 |
| CN120880810B | 2025-12-09 |
Family
ID=97454818
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202511404551.9A Active CN120880810B (en) | 2025-09-29 | 2025-09-29 | GAN-based electric power protocol honeypot trapping and anomaly identification method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120880810B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116451216A (en) * | 2023-02-24 | 2023-07-18 | 百度时代网络技术(北京)有限公司 | Data processing method, device, electronic equipment and storage medium |
| CN116601630A (en) * | 2020-11-25 | 2023-08-15 | 国际商业机器公司 | Defending Targeted Database Attacks Through Dynamic Honeypot Database Response Generation |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10733292B2 (en) * | 2018-07-10 | 2020-08-04 | International Business Machines Corporation | Defending against model inversion attacks on neural networks |
| CN117240560A (en) * | 2023-09-25 | 2023-12-15 | 哈尔滨工业大学 | A GAN-based high-simulation honeypot implementation method and system |
| CN120301699A (en) * | 2025-05-28 | 2025-07-11 | 杭州银行股份有限公司 | An active trapping method for network attacks based on intelligent scheduling |
Also Published As
| Publication number | Publication date |
|---|---|
| CN120880810A (en) | 2025-10-31 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||