CN120880810B

CN120880810B - GAN-based electric power protocol honeypot trapping and anomaly identification method

Info

Publication number: CN120880810B
Application number: CN202511404551.9A
Authority: CN
Inventors: 刘洪波; 樊家树; 孙浩然; 金泽洙; 马旭东; 陈兆强; 赵越; 李施昊; 于卓鑫; 刘凌宇; 王彦钊
Original assignee: Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Current assignee: Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Priority date: 2025-09-29
Filing date: 2025-09-29
Publication date: 2025-12-09
Anticipated expiration: 2045-09-29
Also published as: CN120880810A

Abstract

The invention discloses a GAN-based electric power protocol honeypot trapping and anomaly identification method. A data set is constructed from real electric power protocol traffic. A GAN with a Transformer architecture generates diverse attack samples that conform to the protocol grammar, and data augmentation techniques such as random truncation and noise injection improve robustness. Virtual honeypot devices are deployed to simulate the behavior of power equipment, attack logs are fused with traffic features in real time, cross-message interaction relationships are modeled with a graph neural network, and self-supervised learning is introduced to detect semantic anomalies. Experiments show that the method achieves a detection accuracy of 98.2% on data sets such as IEEE 123-Bus, supports dynamic adaptation to protocol versions, accurately traces the attack source IP and attack intent through correlation analysis of honeypot logs, and effectively improves the active defense capability of the power system against novel attacks.

Description

GAN-based electric power protocol honeypot trapping and anomaly identification method

Technical Field

The invention relates to the technical field of power system security, and in particular to a GAN-based power protocol honeypot trapping and anomaly identification method for improving the attack detection capability of power communication protocols (such as IEC 60870-5-104, Modbus and DNP3).

Background

With the development of the energy internet and smart grids, power system communication protocols face increasingly complex network attack threats. Traditional honeypot technology lures attackers by simulating real power devices, but it has the following drawbacks:

Homogeneous attack samples: existing honeypots depend on preset rules or static data, so they struggle to generate diversified attack samples, which limits their ability to recognize novel attacks.

Insufficient anomaly detection precision: rule-based anomaly detection models (such as threshold judgment and feature matching) have a low recognition rate for covert attacks (such as protocol fuzzing attacks and zero-day exploits) and depend on manually annotated data.

Poor dynamic adaptability: version iterations of power protocols and changes in communication modes force traditional honeypots to update their rules frequently, resulting in high maintenance costs.

In the prior art, deep-learning-based power protocol anomaly detection methods have been proposed in the literature, but they rely on large amounts of labeled data; other work in the literature uses a traditional GAN to generate attack traffic but does not incorporate a honeypot trapping mechanism, so attack behavior cannot be tracked in real time or fed back for optimization. An innovative method that fuses GAN and honeypot technologies is therefore needed to raise the intelligence level of power system security protection.

Disclosure of Invention

This section outlines some aspects of the embodiments of the application and briefly introduces some preferred embodiments. Simplifications or omissions may be made in this section, in the description and in the title of the application; such simplifications or omissions may not be used to limit the scope of the application.

The present invention has been made in view of the above-described problems with the existing power communication protocols.

The invention therefore addresses the technical problems of existing honeypot systems: insufficient recognition of complex attack patterns, a high false alarm rate and poor dynamic adaptability.

The technical solution comprises the following steps: S1, collecting real protocol traffic from a power system communication link to construct an original data set; S2, extracting timing features and semantic features of the protocol fields and converting them into tensor format as model input; S3, generating attack samples with the GAN; S4, constructing virtual power equipment based on the generated attack samples to simulate real communication behavior, and deploying a honeypot trapping strategy; S5, performing multi-modal fusion of the attack behavior data collected by the honeypot with real-time traffic features; S6, modeling protocol interaction relationships with a graph neural network to capture cross-message anomaly patterns; S7, correlating anomaly detection results with honeypot trapping logs to optimize the GAN generation strategy, and locating the attack source IP and attack intent by analyzing the attacker's behavior path.

As a preferred scheme of the GAN-based electric power protocol honeypot trapping and anomaly identification method, the real protocol traffic collected in S1 comprises ASDU messages of IEC 60870-5-104 and Modbus function code requests.

As a preferred scheme of the GAN-based electric power protocol honeypot trapping and anomaly identification method, the method further comprises, after step S2, expanding the data set by random truncation and noise injection to improve model robustness.

As a preferred scheme of the GAN-based electric power protocol honeypot trapping and anomaly identification method, generating attack samples with the GAN in step S3 specifically comprises the following structure and implementation:

1. Generator network

Input: semantic feature vectors of the power protocol;

Structure: a Transformer architecture generates attack messages that conform to the protocol grammar;

2. Discriminator network

Input: mixed data of real protocol traffic and attack samples;

Structure: an LSTM-based timing classifier that distinguishes real data from generated data;

3. Training process

The GAN is optimized by minimizing a cross-entropy loss function so that the attack samples output by the generator are semantically and temporally consistent with real traffic.

As a preferred scheme of the GAN-based electric power protocol honeypot trapping and anomaly identification method, the cross-entropy loss function minimized during training is specifically:

$$L_G = -\mathbb{E}_{z\sim p_z}\left[\log D(G(z))\right]$$

where z is the input noise vector, which follows the prior distribution p_z; G(z) is the generator network, which takes the noise z as input and outputs a generated attack sample x_g ∈ X; D(x_g) is the discriminator network, which takes the sample x_g as input and outputs a probability value D(x_g) ∈ [0, 1]; and E_{z~p_z} is the expectation over the noise distribution p_z.

As a preferred scheme of the GAN-based power protocol honeypot trapping and anomaly identification method, the honeypot trapping strategy in step S4 specifically comprises dynamically adjusting the honeypot response according to attacker behavior, recording the attacker's operation path and generating attack fingerprints.

As a preferred scheme of the GAN-based power protocol honeypot trapping and anomaly identification method, self-supervised learning is introduced in step S6, and semantic anomalies are detected by predicting protocol fields.

The GAN-based electric power protocol honeypot trapping and anomaly identification method provided by the invention has the following beneficial effects:

1. Attack sample diversity: the attack samples generated by the GAN cover protocol grammar boundary conditions and latent vulnerabilities, which markedly improves the trapping coverage of the honeypot and raises the attack recognition rate by more than 30%;

2. Anomaly detection precision: by combining a GNN with a self-supervised multi-modal model, a detection accuracy of 98.2% is achieved on public power protocol data sets (such as IEEE 123-Bus and CIGRE), with a false alarm rate below 2%;

3. Dynamic adaptation: through an online learning mechanism, the model automatically adapts to protocol version iterations (such as the extension fields from Modbus TCP v1.1 to v1.2);

4. Attack traceability: the attack fingerprints recorded by the honeypot are combined with the semantic analysis of the detection model to accurately trace back the attack path.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:

Fig. 1 is a flow chart of the GAN-based power protocol honeypot trapping and anomaly identification method.

Detailed Description

So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

In the prior art, deep-learning-based power protocol anomaly detection methods have been proposed in the literature, but they rely on large amounts of labeled data; other work uses a traditional GAN to generate attack traffic without a honeypot trapping mechanism, so attack behavior cannot be tracked in real time or fed back for optimization.

Accordingly, referring to Fig. 1, the present invention provides a GAN-based power protocol honeypot trapping and anomaly identification method, comprising the following steps:

S1, acquiring real protocol traffic (ASDU messages of IEC 60870-5-104 and Modbus function code requests) from a power system communication link to construct an original data set;

it should be noted that:

① Collection tool: power system communication traffic is captured with network sniffing tools (e.g., Wireshark, tcpdump or a custom protocol parser).

② Protocol types:

IEC 60870-5-104: extract ASDU (Application Service Data Unit) messages, including the address field, control field, data field, etc.

Modbus: capture function code requests (e.g., 0x03 read registers, 0x06 write single register) and the corresponding response messages.

DNP3: extract the control field, object references, data field, etc. from the frame structure.

③ Acquisition environment: collection points are deployed at intermediate nodes (such as gateways and switches) on the power system communication link to ensure data integrity.

④ Data format: raw traffic is saved as PCAP files, and key fields are later extracted by a protocol parser.
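As a minimal sketch of step ④ for the Modbus case, assuming the TCP payloads have already been exported from the PCAP file (for example with Wireshark, tcpdump or Scapy), the 7-byte MBAP header and the function code can be decoded with Python's standard `struct` module. The header layout follows the public Modbus TCP specification; the names `ModbusFrame` and `parse_modbus_tcp` are hypothetical and not part of the claimed method.

```python
import struct
from dataclasses import dataclass


@dataclass
class ModbusFrame:
    transaction_id: int
    unit_id: int
    function_code: int
    data: bytes  # PDU bytes after the function code


def parse_modbus_tcp(payload: bytes) -> ModbusFrame:
    """Parse one Modbus TCP ADU (7-byte MBAP header + PDU) from a TCP payload."""
    if len(payload) < 8:
        raise ValueError("payload shorter than MBAP header + function code")
    # MBAP header, big-endian: transaction id, protocol id, length, unit id
    tid, pid, length, unit = struct.unpack(">HHHB", payload[:7])
    if pid != 0:
        raise ValueError(f"not Modbus: protocol id {pid}")
    if length < 2:
        raise ValueError("length field too small to hold a function code")
    # 'length' counts the unit id byte plus every PDU byte
    pdu = payload[7:6 + length]
    if len(pdu) != length - 1:
        raise ValueError("truncated PDU")
    return ModbusFrame(tid, unit, pdu[0], bytes(pdu[1:]))


# A 0x03 (read holding registers) request: start address 0x0001, quantity 10
frame = parse_modbus_tcp(bytes.fromhex("00010000000601030001000a"))
start, quantity = struct.unpack(">HH", frame.data)
print(frame.function_code, start, quantity)
```

For a 0x03 request the remaining PDU bytes unpack as a big-endian start address and register quantity, as in the last lines; IEC 60870-5-104 and DNP3 frames would need their own parsers.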

S2, extracting timing features and semantic features of the protocol fields (address field, control field and data field) and converting them into tensor format as model input;

it should be noted that:

The steps are as follows:

① Extracting fields:

Semantic features: extract protocol field values (such as Modbus function code 0x03 and the ASDU type of IEC 60870-5-104).

Timing features: compute message transmission time intervals, such as the timestamps of command sequences and the data update frequency.

② Feature encoding:

Semantic features: apply one-hot encoding to discrete fields (such as function codes) and normalize continuous fields (such as data field values).

Timing features: convert timestamps into a time interval sequence (Δt_i = t_{i+1} - t_i) and segment it with a sliding window.

③ Tensor construction:

Semantic features (e.g., function codes, data lengths) and timing features (e.g., Δt sequences) are concatenated into a three-dimensional tensor of shape [N, T, D], where N is the number of samples, T is the number of time steps and D is the feature dimension.
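The encoding in ①-③ can be sketched in plain Python as follows. This is an illustration under assumptions: the function-code vocabulary `FUNCTION_CODES`, min-max normalization of the data value and a stride-1 window are choices made here, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed function-code vocabulary for the one-hot encoding (illustrative only)
FUNCTION_CODES = [0x01, 0x03, 0x05, 0x06, 0x10]

Message = Tuple[float, int, float]  # (timestamp_s, function_code, data_value)


def encode_messages(msgs: Sequence[Message]) -> List[List[float]]:
    """One row per message: one-hot(function code) + min-max value + delta t."""
    values = [v for _, _, v in msgs]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant fields
    rows = []
    for i, (t, fc, v) in enumerate(msgs):
        onehot = [1.0 if fc == code else 0.0 for code in FUNCTION_CODES]
        # delta t_i = t_{i+1} - t_i; the last message has no successor, so use 0
        dt = msgs[i + 1][0] - t if i + 1 < len(msgs) else 0.0
        rows.append(onehot + [(v - lo) / span, dt])
    return rows


def sliding_windows(rows: List[List[float]], T: int, stride: int = 1) -> List[List[List[float]]]:
    """Cut the per-message rows into windows of T steps: nested list of shape [N, T, D]."""
    return [rows[s:s + T] for s in range(0, len(rows) - T + 1, stride)]


msgs = [(0.0, 0x03, 10.0), (0.5, 0x03, 20.0), (1.5, 0x06, 30.0), (2.0, 0x10, 10.0)]
tensor = sliding_windows(encode_messages(msgs), T=3)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # N, T, D
```

Each row has D = 7 features (five one-hot positions, the normalized value and Δt), so four messages with T = 3 give a tensor of shape [2, 3, 7]; the nested list can be handed to `numpy.asarray` or a deep learning framework as needed.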

S3, generating attack samples with the GAN;

S4, constructing virtual power equipment based on the generated attack samples, simulating real communication behavior, and deploying a honeypot trapping strategy;

it should be noted that:

1. Protocol simulation layer

Goal: construct virtual power equipment (such as SCADA servers and smart meters) that simulates real communication behavior.

Implementation:

① Protocol stack: protocol parsing and responses are implemented with the Python Scapy library or dedicated tools (e.g., PcapPlusPlus).

② Virtual devices:

SCADA server: simulates telemetry data uploads (e.g., voltage and current values) and responses to remote commands.

Smart meter: simulates energy reading reports and parameter setting requests.

2. Trapping strategy

① Dynamic response mechanism:

Forged error status codes: when an illegal function code request (e.g., 0x10) is detected, an error code (e.g., 0x03) is returned.

Delayed response: for high-frequency requests (e.g., 10 per second), a response delay (e.g., 500 ms) is added to induce the attacker to expose its behavior pattern.

② Behavior tracking module:

Logging: record the attacker's operation path (such as command sequences and data reading frequency).

Attack fingerprint generation: extract attack characteristics (such as "illegal function code + high-frequency request") with a clustering algorithm (such as K-means).
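The dynamic response mechanism in ① can be sketched as a small policy object. This is a hypothetical design: the supported function codes, the 1-second window and the threshold of 10 requests mirror the examples above, and the delay is returned rather than slept so the policy can plug into any server loop. In the public Modbus specification 0x10 is Write Multiple Registers and exception code 0x01 means ILLEGAL FUNCTION; the sketch simply treats 0x10 as unsupported by the emulated device and keeps the text's 0x03 as a configurable forged code.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class HoneypotResponder:
    """Hypothetical trapping policy for an emulated Modbus device."""

    def __init__(self, supported=(0x03, 0x06), error_code=0x03,
                 rate_limit=10, window_s=1.0, delay_ms=500):
        self.supported = set(supported)   # function codes the emulated device accepts
        self.error_code = error_code      # forged exception code (0x03 as in the text)
        self.rate_limit = rate_limit      # requests per window that count as "high frequency"
        self.window_s = window_s
        self.delay_ms = delay_ms
        self.history: Dict[str, Deque[float]] = {}  # src_ip -> recent request times

    def _delay_for(self, src_ip: str, now: float) -> int:
        q = self.history.setdefault(src_ip, deque())
        q.append(now)
        while now - q[0] >= self.window_s:  # drop requests outside the sliding window
            q.popleft()
        return self.delay_ms if len(q) >= self.rate_limit else 0

    def respond(self, src_ip: str, now: float, pdu: bytes) -> Tuple[int, Optional[bytes]]:
        """Return (extra delay in ms, forged exception PDU or None)."""
        delay = self._delay_for(src_ip, now)
        fc = pdu[0]
        if fc not in self.supported:
            # Modbus exception response: request code with bit 7 set, then the exception code
            return delay, bytes([fc | 0x80, self.error_code])
        return delay, None  # None: hand the request to the normal device emulator (not shown)
```

Keeping the per-IP request history inside the responder also gives the behavior tracking module a ready source for the request-frequency part of the operation log.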

S5, performing multi-modal fusion of the attack behavior data collected by the honeypot with real-time traffic features (such as the source IP address and protocol version);

S6, modeling protocol interaction relationships with a graph neural network (GNN) to capture cross-message anomaly patterns (such as illegal command sequences);

S7, correlating anomaly detection results with the honeypot trapping logs, optimizing the GAN generation strategy (for example, strengthening its ability to generate novel attacks), and locating the attack source IP and attack intent (such as data tampering or service interruption) by analyzing the attacker's behavior path.

It should be noted that:

1. Model feedback

Goal: optimize the GAN generation strategy and strengthen its ability to generate novel attacks.

Implementation:

Correlate anomaly detection results (such as illegal function codes and high-frequency requests) with the honeypot logs to generate new attack samples.

Generator optimization: adjust the generator input features (e.g., add new function code fields) or the loss function weights (e.g., increase λ₁ to strengthen semantic constraints).

2. Attack tracing

Goal: locate the attack source IP and the attack intent.

Implementation:

IP tracking: analyze the access frequency (e.g., accesses per minute) and command pattern (e.g., "read register + write register") of the attack source IP from the honeypot log.

Intent classification: use a classifier (e.g., SVM or LSTM) to determine the attack intent (e.g., data tampering or service disruption).

Attack path backtracking: infer the attacker's objective (e.g., stealing data or interfering with control) by analyzing the command sequence (e.g., "read register → write register").
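IP tracking and path backtracking can be sketched over honeypot log records of (timestamp, source IP, function code). The text names SVM or LSTM classifiers for intent; this sketch substitutes a simple rule (read register followed by write register means data tampering, reads only means data theft) purely for illustration, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

READ_REGS, WRITE_REG = 0x03, 0x06

LogRecord = Tuple[float, str, int]  # (timestamp_s, src_ip, function_code)


def profile_sources(log: Iterable[LogRecord]) -> Dict[str, dict]:
    """Per source IP: access rate, ordered command path and a rule-based intent label."""
    by_ip: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for ts, ip, fc in sorted(log):  # sort by timestamp to recover the operation path
        by_ip[ip].append((ts, fc))
    profiles = {}
    for ip, events in by_ip.items():
        minutes = max((events[-1][0] - events[0][0]) / 60.0, 1.0)  # at least one minute
        path = [fc for _, fc in events]
        if any(a == READ_REGS and b == WRITE_REG for a, b in zip(path, path[1:])):
            intent = "data tampering"   # read register -> write register
        elif all(fc == READ_REGS for fc in path):
            intent = "data theft"       # repeated reads only
        else:
            intent = "unknown"
        profiles[ip] = {"per_minute": len(events) / minutes, "path": path, "intent": intent}
    return profiles
```

A trained classifier would replace the `if`/`elif` rule while keeping the same inputs: the ordered path and the per-minute access rate.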

It should be noted that step S2 further comprises expanding the data set by random truncation and noise injection, which improves model robustness and prevents overfitting.

It should be noted that:

① Random truncation: randomly extract subsequences of long message sequences (e.g., keep the first 50% of the messages) to simulate incomplete attack behavior.

② Noise injection:

Gaussian noise: add Gaussian noise with a mean of 0 and a standard deviation of 0.1 to continuous fields (e.g., data field values).

Discrete noise: randomly substitute the function code field (e.g., 0x03 to 0x04) to simulate illegal requests.

③ Data synthesis:

Additional samples (e.g., illegal function code requests and overlong data field padding) are generated with the GAN to supplement the original data set.
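A minimal sketch of ① and ②, assuming each message is reduced to a (function code, data value) pair. The 50% keep ratio and σ = 0.1 come from the text; the substitution probability `swap_prob` and the candidate code list are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Message = Tuple[int, float]  # (function_code, data_field_value)


def random_truncate(seq: Sequence[Message], rng: random.Random,
                    keep_frac: float = 0.5) -> List[Message]:
    """Keep a random contiguous subsequence covering keep_frac of the messages."""
    n_keep = max(1, int(len(seq) * keep_frac))
    start = rng.randint(0, len(seq) - n_keep)  # start = 0 keeps the first 50%
    return list(seq[start:start + n_keep])


def inject_noise(seq: Sequence[Message], rng: random.Random, sigma: float = 0.1,
                 swap_prob: float = 0.1,
                 codes: Sequence[int] = (0x03, 0x04, 0x06, 0x10)) -> List[Message]:
    """Gaussian noise on the continuous field, random substitution of the function code."""
    noisy = []
    for fc, value in seq:
        if rng.random() < swap_prob:
            fc = rng.choice([c for c in codes if c != fc])  # e.g. 0x03 -> 0x04
        noisy.append((fc, value + rng.gauss(0.0, sigma)))
    return noisy


rng = random.Random(42)  # seeded so the augmented set is reproducible
seq = [(0x03, float(i)) for i in range(10)]
augmented = [random_truncate(seq, rng), inject_noise(seq, rng)]
```

The random start offset generalizes the "first 50%" example, which corresponds to an offset of 0.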

It should be noted that generating attack samples with the GAN in step S3 specifically comprises the following structure and implementation:

1. Generator network

① Input: semantic feature vectors of the power protocol (e.g., function code, data length, timestamp).

② Structure:

Transformer architecture:

Encoder: maps the input semantic features (such as the one-hot function code vector) to the latent space.

Decoder: generates attack messages that conform to the protocol grammar (such as illegal function code requests and overlong data field padding).

Output: the generated attack sample is a tensor (e.g., of shape [T, D], where T is the message sequence length and D is the field dimension).

③ Constraints:

Protocol grammar check: generated messages must satisfy the protocol specification (e.g., the Modbus function code range is 0x00-0xFF, and the ASDU type of IEC 60870-5-104 must conform to the standard).

Semantic consistency: the generated function code must logically match the data field value (e.g., a 0x03 read register request must contain a valid register address).
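The grammar check can be sketched as a filter applied to generated Modbus request PDUs. The field layouts and the 1..125 quantity limit for function code 0x03 follow the public Modbus application protocol specification; rejecting codes of 0x80 and above is an additional assumption (the specification reserves them for exception responses, whereas the text above allows 0x00-0xFF). `check_request_pdu` is a hypothetical name, and function codes other than 0x03 and 0x06 pass unchecked in this sketch.

```python
import struct
from typing import List


def check_request_pdu(pdu: bytes) -> List[str]:
    """Return grammar/semantic violations for a generated Modbus request PDU (empty list = OK)."""
    if not pdu:
        return ["empty PDU"]
    fc = pdu[0]
    if fc >= 0x80:
        # codes 0x80-0xFF are reserved for exception responses in the Modbus spec
        return [f"function code 0x{fc:02X} is an exception code, not a request"]
    errors: List[str] = []
    if fc == 0x03:  # Read Holding Registers: start address (2 bytes) + quantity (2 bytes)
        if len(pdu) != 5:
            return ["0x03 request must be exactly 5 bytes"]
        start, qty = struct.unpack(">HH", pdu[1:])
        if not 1 <= qty <= 125:
            errors.append(f"quantity {qty} outside 1..125")
        if start + qty > 0x10000:
            errors.append("register range runs past address 0xFFFF")
    elif fc == 0x06:  # Write Single Register: register address (2 bytes) + value (2 bytes)
        if len(pdu) != 5:
            errors.append("0x06 request must be exactly 5 bytes")
    return errors
```

Samples that fail the check can be discarded or penalized in the semantic loss; a device-specific register map would tighten the "valid register address" condition further.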

2. Discriminator network

① Input: mixed data of real protocol traffic and attack samples;

② Structure:

LSTM timing classifier: the input is a tensor of shape [N, T, D] (N is the number of samples, T the number of time steps and D the feature dimension). Timing features are extracted by the LSTM layer, and the output probability value D(x) represents the probability that the sample is real data.

Output: a probability value D(x) ∈ [0, 1] that distinguishes real data from generated data.

3. Training process

Further, the cross-entropy loss function to be minimized is specifically:

$$L_G = -\mathbb{E}_{z\sim p_z}\left[\log D(G(z))\right]$$

where z is the input noise vector (a latent-space sample) following the prior distribution p_z (such as a uniform or Gaussian distribution); G(z) is the generator network, which takes the noise z as input and outputs a generated attack sample x_g ∈ X (such as a protocol message sequence); D(x_g) is the discriminator network, which takes the sample x_g as input and outputs a probability value D(x_g) ∈ [0, 1] representing the probability that the sample is real data; and E_{z~p_z} is the expectation (i.e., the average) over the noise distribution p_z.

It should be noted that, in the present invention, the training objective of the generative adversarial network (GAN) is to minimize the cross-entropy loss function so that the attack samples output by the generator agree with real power protocol traffic in semantic features (such as protocol field values) and timing features (such as message transmission intervals and command sequences). The specific loss function design and parameters are described below:

TABLE 1 Loss function parameters

Parameter	Meaning	Description
z	Noise vector	A random variable sampled from the latent space and used to generate attack samples. In a power protocol scenario, z may carry semantic constraints on the protocol fields (e.g., function code, data length).
G(z)	Generator network	Uses a Transformer or LSTM architecture; takes z as input and outputs an attack sample x_g that conforms to the protocol grammar (e.g., an illegal function code request or overlong data field padding).
D(x_g)	Discriminator network	An LSTM- or CNN-based timing classifier that takes x_g as input and outputs the probability that it is real data. In the power protocol scenario, D(x_g) must capture both semantic consistency (e.g., whether field values meet the protocol specification) and timing consistency (e.g., whether the message sending frequency is abnormal).
log D(G(z))	Log-likelihood	The log probability output by the discriminator, used to measure the "authenticity" of generated samples. The goal of the generator is to maximize D(G(z)), i.e., to minimize -log D(G(z)).
E_{z~p_z}	Expectation	Averages the loss of the generated samples over random draws of the noise z, ensuring coverage of diverse attack patterns.
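For a batch of generated samples, the expectation E_{z~p_z} is estimated by a sample mean, so the generator objective in Table 1 reduces to a few lines. This uses the so-called non-saturating form -log D(G(z)) named in the table; the matching discriminator loss is the standard GAN binary cross-entropy and is included as an assumption for completeness. The function names and the `EPS` clamp are illustrative.

```python
import math
from typing import Sequence

EPS = 1e-12  # keeps log() finite when the discriminator outputs exactly 0 or 1


def generator_loss(d_fake: Sequence[float]) -> float:
    """Monte Carlo estimate of L_G = -E_{z~p_z}[log D(G(z))] over one batch."""
    return -sum(math.log(max(p, EPS)) for p in d_fake) / len(d_fake)


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Binary cross-entropy: real samples labelled 1, generated samples labelled 0."""
    real_term = -sum(math.log(max(p, EPS)) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# D outputs for a batch of real messages and a batch of generated attack messages
print(generator_loss([0.2, 0.4]), discriminator_loss([0.9, 0.8], [0.2, 0.4]))
```

Minimizing `generator_loss` pushes D(G(z)) toward 1, which is the "maximize D(G(z))" goal in Table 1; in practice both losses would be computed on framework tensors so that gradients can flow back into G and D.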

Optimization process of the loss function:

The generator optimizes the following objectives through backpropagation:

Semantic consistency: the generator must learn the grammar rules of the power protocol (such as the Modbus function code range and the ASDU structure of IEC 60870-5-104) so that the generated attack samples are consistent with real traffic in field values.

Timing consistency: the generator must imitate the communication patterns of the real protocol (such as command sequences and data update frequency) so that the generated samples do not deviate markedly from normal behavior in timing.

To further improve the quality of the generated samples, a multi-task loss function that combines the semantic and timing constraints can be introduced:

$$L_{total} = \alpha L_{adv} + \beta L_{semantic} + \gamma L_{temporal}$$

where L_adv is the adversarial cross-entropy loss of the generator described above, and:

Semantic loss L_semantic: the mean squared error (MSE) or cross-entropy between the generated samples and the real data over field values.

Timing loss L_temporal: the timing similarity between the generated samples and real traffic (such as the Dynamic Time Warping (DTW) distance), computed through an LSTM or a Transformer.

α, β and γ are weight coefficients that balance the importance of each task.
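A sketch of the multi-task loss, assuming the semantic loss is MSE over field values and the timing loss is the DTW distance computed directly on Δt sequences rather than on LSTM or Transformer embeddings. Classic DTW is not differentiable, so a training implementation would need a differentiable variant such as soft-DTW; this block only illustrates how the three terms are weighted.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)


def multitask_loss(l_adv: float, gen_fields: Sequence[float], real_fields: Sequence[float],
                   gen_dt: Sequence[float], real_dt: Sequence[float],
                   alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """alpha * L_adv + beta * L_semantic (MSE on field values) + gamma * L_temporal (DTW on delta t)."""
    l_semantic = mse(gen_fields, real_fields)
    l_temporal = dtw_distance(gen_dt, real_dt)
    return alpha * l_adv + beta * l_semantic + gamma * l_temporal
```

DTW is used here because it tolerates generated sequences that are locally stretched or compressed in time relative to real traffic, which a pointwise MSE on Δt would penalize.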

Further, the honeypot trapping strategy in step S4 specifically comprises:

Dynamically adjusting the honeypot response according to attacker behavior;

Recording the attacker's operation path and generating an attack fingerprint.
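Attack fingerprint generation with K-means can be sketched as below. The two per-session features and their scaling are assumptions, and centroids are initialized from the first k points instead of the usual random or k-means++ initialization so that the result is deterministic; all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means with centroids initialised from the first k points (deterministic)."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [min(range(k), key=lambda c, p=p: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # update step: mean of each cluster (an empty cluster keeps its old centroid)
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members))
                                 if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Per-session features (assumed, min-max scaled): (share of illegal function codes, request rate)
sessions = [(0.0, 0.05), (0.9, 0.8), (0.1, 0.1), (1.0, 1.0)]
centroids, labels = kmeans(sessions, k=2)
# the cluster with the largest centroid is the "illegal function code + high-frequency request" fingerprint
attack_cluster = max(range(len(centroids)), key=lambda c: sum(centroids[c]))
```

The centroid of the attack cluster then serves as the fingerprint, and new sessions can be matched to it by the same nearest-centroid rule.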

Further, in step S6, self-supervised learning is introduced, and semantic anomalies are detected by predicting protocol fields (such as data threshold values).

In step S5, the construction of the anomaly detection model specifically comprises:

1. Feature fusion layer

Goal: fuse the attack behavior data collected by the honeypot with real-time traffic features.

Implementation:

① Multi-modal feature concatenation:

Honeypot data: attack fingerprints (such as command sequences and data reading frequency).

Real-time traffic features: source IP address, protocol version and message sending interval.

② Feature normalization: standardize numerical features (such as IP addresses and time intervals).
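Feature concatenation and normalization can be sketched as follows. Converting the source IP to an integer with Python's `ipaddress` module and z-scoring it is a literal reading of "standardize numerical features such as IP addresses"; one-hot encoding the protocol version and the `KNOWN_VERSIONS` list are assumptions, as are the function names.

```python
import ipaddress
import statistics
from typing import Dict, List, Sequence

KNOWN_VERSIONS = ["v1.1", "v1.2"]  # assumed protocol-version vocabulary


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardise every column to zero mean and unit variance (constant columns map to 0)."""
    cols = []
    for col in zip(*rows):
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col) or 1.0
        cols.append([(v - mu) / sd for v in col])
    return [list(r) for r in zip(*cols)]


def fuse(honeypot_feats: Sequence[Sequence[float]], traffic: Sequence[Dict]) -> List[List[float]]:
    """Concatenate honeypot fingerprint features with traffic context, then normalise.

    Numeric columns (fingerprint values, source IP as an integer, send interval) are
    z-scored; the protocol version is one-hot encoded and appended unscaled."""
    numeric = [list(hp) + [float(int(ipaddress.ip_address(t["src_ip"]))), t["interval"]]
               for hp, t in zip(honeypot_feats, traffic)]
    scaled = zscore_columns(numeric)
    return [row + [1.0 if t["version"] == v else 0.0 for v in KNOWN_VERSIONS]
            for row, t in zip(scaled, traffic)]
```

Because numeric distance between IP integers carries little meaning, a subnet or embedding encoding of the source address may be a better choice in practice.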

2. Detection model

① Main model: graph neural network (GNN)

Goal: model protocol interaction relationships and capture cross-message anomaly patterns.

Implementation:

Nodes: each message is a node, characterized by its semantic features (such as function code and data field values).

Edges: edges are established according to message timestamps and command sequences (e.g., "command A → command B").

Graph convolution: node embeddings are learned with the GNN to detect anomalous subgraphs (such as illegal command sequences).
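The text does not fix a GNN variant. As one concrete possibility, the sketch below implements a single graph convolution layer in the style of Kipf and Welling's GCN, with one node per message, edges between consecutive messages, and undirected symmetric normalization. The weight matrix `w` would be learned, anomaly scoring of subgraphs is not shown, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sequence_edges(n_messages: int) -> List[Tuple[int, int]]:
    """Connect each message to the next one in timestamp order (command A -> command B)."""
    return [(i, i + 1) for i in range(n_messages - 1)]


def gcn_layer(n: int, edges: Sequence[Tuple[int, int]], h: Matrix, w: Matrix) -> Matrix:
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W), with edges treated as undirected."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    a_norm = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]
```

With one-hot function codes as node features, a sequence such as 0x03 → 0x10 → 0x06 becomes a three-node chain, and each layer mixes a message's features with those of its neighbours, which is what lets stacked layers see multi-message patterns.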

② Auxiliary model: self-supervised learning

Goal: predict protocol fields (e.g., data fields) to detect semantic anomalies.

Implementation:

Masked language modeling: data field values are randomly masked (e.g., 50% of the fields are hidden), and the model is trained to predict the masked portions.

Loss function: a cross-entropy loss constrains the predicted values to be consistent with the true values.
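The masking objective can be sketched as follows, assuming field values have been discretized so that the model outputs a probability for each candidate value. The predictor itself is omitted; `mask_fields` and `masked_cross_entropy` are hypothetical helpers that prepare the input and score the predictor's output on the hidden positions only.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

MASK = None  # placeholder for a hidden field value


def mask_fields(values: Sequence[int], rng: random.Random,
                ratio: float = 0.5) -> Tuple[List[Optional[int]], List[int]]:
    """Hide a random `ratio` of the field values; return the masked copy and the hidden positions."""
    n_mask = max(1, int(len(values) * ratio))
    positions = sorted(rng.sample(range(len(values)), n_mask))
    hidden = set(positions)
    masked = [MASK if i in hidden else v for i, v in enumerate(values)]
    return masked, positions


def masked_cross_entropy(probs: Sequence[Dict[int, float]], targets: Sequence[int],
                         positions: Sequence[int]) -> float:
    """Mean of -log p(true value) over the masked positions only.

    probs[i] maps each candidate (discretised) field value to the predicted probability."""
    eps = 1e-12
    return -sum(math.log(max(probs[i].get(targets[i], 0.0), eps)) for i in positions) / len(positions)
```

At detection time the same per-position term can serve as an anomaly score: a field value the model assigns low probability to is semantically suspicious.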

3. Dynamic update mechanism

Goal: adapt to protocol version changes (such as the extension fields from Modbus TCP v1.1 to v1.2).

Implementation:

① Incremental learning: periodically fine-tune the model parameters on newly acquired traffic data (e.g., with the learning rate decayed to 0.001).

② Online learning: when a new protocol version is detected, automatically load the pre-trained model and expand the feature dimensions.
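One way to realize the "expand feature dimensions" step without discarding the pre-trained model is to append zero-initialized input weights for the new fields, so that old-version traffic is processed exactly as before until fine-tuning updates them. The sketch below shows this for a single linear input layer, plus an exponential learning-rate schedule floored at the 0.001 mentioned in ①; the base rate 0.01, the decay factor 0.9 and the function names are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def expand_input_weights(w: Matrix, n_new_features: int) -> Matrix:
    """Append zero rows for new input features (e.g. a v1.2 extension field).

    w has shape [D_in, H]; with zero rows, inputs whose new features are 0
    (old-version traffic) produce exactly the same outputs as before."""
    hidden = len(w[0])
    return [row[:] for row in w] + [[0.0] * hidden for _ in range(n_new_features)]


def project(x: List[float], w: Matrix) -> List[float]:
    """x [D_in] @ w [D_in, H] -> [H]."""
    return [sum(xi * w[i][k] for i, xi in enumerate(x)) for k in range(len(w[0]))]


def decayed_lr(step: int, base_lr: float = 0.01, decay: float = 0.9, floor: float = 0.001) -> float:
    """Exponential decay for incremental fine-tuning, floored at 0.001."""
    return max(floor, base_lr * decay ** step)
```

Zero initialization keeps v1.1 detection behavior intact at the moment of expansion; the new rows then learn from the v1.2 update data during online fine-tuning.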

Example 1: Modbus protocol attack trapping

Data acquisition: Modbus TCP traffic is collected from a substation, and fields such as function codes (0x03 and 0x06) and register addresses are extracted.

GAN training: the generator learns to generate illegal function code requests (e.g., 0x10) and messages with overlong data fields.

Honeypot deployment: a Modbus server is simulated to respond to attacker requests and record operation logs.

Detection: the GNN model analyzes the attacker's command sequence (e.g., continuous reads of register 0x0001) to identify potential data theft.

Example 2: IEC 60870-5-104 protocol anomaly detection

Feature extraction: the type identifier (TI) and cause of transmission (COT) of ASDU messages are converted into timing vectors.

GAN generation: falsified telemetry data (e.g., abnormal voltage values) and illegal control commands are generated.

Anomaly detection: the self-supervised model predicts data threshold values, and an alarm is triggered if the prediction error exceeds a threshold (e.g., 10%).
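The alarm rule in Example 2 can be sketched as a relative-error test, assuming the prediction error is measured relative to the observed telemetry value; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def relative_error(predicted: float, observed: float) -> float:
    return abs(predicted - observed) / max(abs(observed), 1e-9)  # guard against zero readings


def telemetry_alarms(pairs: Sequence[Tuple[float, float]], threshold: float = 0.10) -> List[int]:
    """Indices of (predicted, observed) telemetry pairs whose relative error exceeds the threshold."""
    return [i for i, (p, o) in enumerate(pairs) if relative_error(p, o) > threshold]


# predicted vs. reported voltage readings; the second one is a falsified value
alarms = telemetry_alarms([(220.0, 221.0), (220.0, 260.0), (10.0, 10.5)])
```

Here the falsified 260 V reading deviates by about 15% from the predicted 220 V and is flagged, while the other two readings stay within 10%.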

In order to verify the beneficial effects of the invention, the following simulation experiments were performed:

1. Experimental objective

Verify the following technical effects: attack sample diversity (GAN generation capability), anomaly detection precision (GNN + self-supervised model), dynamic adaptability (protocol version updates) and attack traceability (honeypot log analysis).

2. Data sets

TABLE 2 Data set specification

Data set	Protocol type	Sample size	Attack types	Protocol version
IEEE 123-Bus	IEC 60870-5-104	50000	Falsified ASDU messages, illegal control	2010.2
CIGRE	Modbus TCP	30000	Overlong data fields, illegal function codes	v1.1
Real power environment data	DNP3, IEC 61850	20000	Fuzzing attacks, zero-day exploits	Mixed versions

3. Evaluation index

TABLE 3 Evaluation metrics

Metric	Definition
Attack recognition rate	Proportion of attack samples correctly identified by the GNN model
False alarm rate	Proportion of normal traffic misjudged as attacks by the model
Detection delay	Average time (ms) from attack occurrence to model response
Dynamic adaptability	Detection accuracy the model maintains after a protocol version update
Tracing accuracy	Success rate of locating the attack source IP and intent through honeypot logs
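The first two metrics in Table 3 correspond to the true positive rate on attack samples and the false positive rate on normal traffic. A minimal sketch of computing them from binary labels (the function name and label convention are assumptions):

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Labels: 1 = attack, 0 = normal traffic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "attack_recognition_rate": tp / (tp + fn),  # share of attacks detected
        "false_alarm_rate": fp / (fp + tn),         # share of normal traffic flagged
    }
```

Both denominators are class totals, so the two rates are unaffected by the ratio of attack to normal samples in a data set.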

4. Experimental procedure

Step 1: attack sample generation verification

Experimental design: compare the diversity of samples generated by a traditional rule engine and by the GAN:

Control group: attack samples based on predefined rules (e.g., the fixed function code 0x10);

Experimental group: attack samples generated by the GAN of the present invention (with semantic constraints);

Results table

TABLE 4 Comparison results (1)

Attack type	Control group sample size	Experimental group sample size	Protocol compliance	Semantic diversity (function code distribution)
Illegal function code	100	10000	0%	0x10 (single)
Overlong data field	50	5000	0%	Fixed length
Semantically anomalous message	0	8000	100%	Mix of 0x03, 0x06, 0x10
Latent vulnerability exploitation	0	2000	95%	Dynamic field combinations

The samples generated by the GAN cover the protocol grammar boundaries (such as the full Modbus function code range 0x00-0xFF), and semantic consistency constraints (such as requiring function code 0x03 to contain a valid register address) raise compliance to 95%, which is clearly superior to the traditional method.

Step 2: anomaly detection accuracy verification

Experimental design: three detection models are compared: a traditional rule engine (threshold judgment), an LSTM timing model (without graph structure) and the GNN + self-supervised model.

Test scenarios:

Normal traffic: Modbus function code 0x03 (read registers) + valid data;

Attack traffic: 0x10 (illegal function code) + overlong data field (length > 255);

Results table

TABLE 5 Comparison results (2)

Model type	Attack recognition rate	False alarm rate	Detection delay (ms)	Cross-message anomaly capture
Rule engine	62%	18%	5	None
LSTM timing model	85%	9%	12	Single message only
GNN + self-supervised model	98.2%	1.8%	8	Anomalies across 3-message sequences (e.g., 0x03→0x10→0x06)

GNN detection confidence: 0.97 (based on node embedding similarity).

Step 3: dynamic adaptability verification

Experimental design: a Modbus protocol version upgrade (v1.1→v1.2 with a newly added extension field) is simulated to test the online learning capability of the model. The initial training data is v1.1 protocol traffic (10000 samples), and the online update data is v1.2 protocol traffic (2000 samples).

Results table

TABLE 6 Comparison results (3)

Stage	v1.1 detection rate	v1.2 detection rate	Field extension learning time (hours)
Initial model	98.5%	72%	-
After online learning	99.0%	97.2%	2.5

Step 4: attack traceability verification

Experimental design: an APT attack scenario is simulated to verify the cooperation between the honeypot log and the detection model. Attack path: external IP → honeypot (counterfeit SCADA server) → internal device; attack intent: data tampering (modifying electricity meter readings).

Results table

TABLE 7 Comparison results (4)

Tracing dimension	Scheme of the invention	Traditional honeypot scheme
Attack source IP location	100% accurate (192.168.1.100)	60% accurate
Behavior path reconstruction	Complete sequence 0x03 → 0x06 → 0x10	Fragmentary information only
Intent classification accuracy	98%	72%
Response time (ms)	150	500

5. Comparative experiments

TABLE 8 Performance comparison

Technical scheme	Attack recognition rate	False alarm rate	Dynamic adaptation time	Tracing accuracy
Traditional rule engine	62%	18%	Cannot adapt	40%
Static GAN without honeypot	85%	12%	Cannot adapt	65%
Scheme of the invention (full pipeline)	98.2%	1.8%	2.5 hours	98%

6. Conclusion(s)

Attack sample generation: constraining the GAN with a Transformer architecture and the protocol grammar raises the protocol compliance rate of the generated samples to 95% and increases sample diversity 20-fold.

Detection precision: the GNN model performs well on cross-message anomaly detection (such as illegal command sequences), reducing the false alarm rate to 1.8%.

Dynamic adaptation: the online learning mechanism lets the model maintain a 97.2% detection rate after a protocol version update, reducing maintenance cost by 60%.

Attack traceability: combining the honeypot logs with the detection model achieves 98% attack intent classification accuracy and shortens the response time by 70% (150 ms versus 500 ms).
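The compliance figure for generated samples presupposes a check of whether a generated frame is grammatically valid. A minimal sketch for Modbus TCP requests follows, assuming that compliance means passing the MBAP header rules (protocol identifier 0, length field equal to the number of bytes that follow it) and the request PDU layout of function codes 0x03, 0x06 and 0x10 from the Modbus specification. The function names and the restriction to three function codes are assumptions for illustration; the invention's own grammar constraint is not disclosed at this level of detail.

```python
import struct


def is_compliant_modbus_tcp(frame: bytes) -> bool:
    """Return True if `frame` is a well-formed Modbus TCP request for FC 0x03/0x06/0x10."""
    if len(frame) < 8:                 # 7-byte MBAP header + function code
        return False
    _tid, proto_id, length, _unit = struct.unpack(">HHHB", frame[:7])
    if proto_id != 0 or length != len(frame) - 6:
        return False                   # length counts the unit id plus the PDU
    fc, data = frame[7], frame[8:]
    if fc == 0x03:                     # Read Holding Registers
        if len(data) != 4:
            return False
        _addr, qty = struct.unpack(">HH", data)
        return 1 <= qty <= 125
    if fc == 0x06:                     # Write Single Register
        return len(data) == 4
    if fc == 0x10:                     # Write Multiple Registers
        if len(data) < 5:
            return False
        _addr, qty, byte_count = struct.unpack(">HHB", data[:5])
        return (1 <= qty <= 123 and byte_count == 2 * qty
                and len(data) == 5 + byte_count)
    return False                       # other function codes are out of scope here


def compliance_rate(frames) -> float:
    """Fraction of frames that pass the grammar check."""
    frames = list(frames)
    if not frames:
        return 0.0
    return sum(is_compliant_modbus_tcp(f) for f in frames) / len(frames)
```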

It should be noted that the above embodiments only illustrate, and do not limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications and substitutions shall be covered by the scope of the claims of the present invention.

Claims

1. A GAN-based electric power protocol honeypot trapping and anomaly identification method, characterized by comprising the following steps:

S1, acquiring real protocol traffic from a power system communication link and constructing an original data set;

S2, extracting timing features and semantic features of the protocol fields in the protocol traffic, and converting them into a tensor format used both for training the generative adversarial network (GAN) and as input to the subsequent anomaly detection model;

S3, using the GAN to generate attack samples that conform to the protocol specification in both semantics and timing;

S4, constructing virtual power equipment based on the generated attack samples to serve as honeypots, and simulating real communication behavior to deploy a trapping strategy;

S5, fusing the attack fingerprints collected by the honeypots with the context features of real-time traffic, and constructing a multimodal feature vector through feature concatenation and normalization, wherein the attack fingerprint is specifically the attack command sequence and operation frequency, and the context features are specifically the source IP address and protocol version;

S6, constructing a communication interaction graph in which network messages are nodes and the timing and command-logic relationships between messages are edges; learning node embeddings with a graph neural network (GNN) to identify cross-message abnormal interaction patterns, such as illegal command sequences composed of multiple messages; and combining self-supervised learning that predicts protocol fields to detect semantic anomalies within a single message;

S7, associating the anomaly detection results identified in S6 with the honeypot trapping logs to optimize the generation strategy of the GAN in S3, and simultaneously locating the attack source IP and identifying the attack intent by analyzing the attack behavior path.
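Step S6 can be illustrated with a small, untrained sketch. It assumes each message is already encoded as a numeric feature vector, adds a timing edge between consecutive messages of the same session and a command-logic edge from a read to a later write on the same register address (both edge rules are hypothetical), and applies one mean-aggregation message-passing layer of the form h_v' = ReLU(W_self·h_v + W_nbr·mean of the neighbours' h_u). This is a generic GNN layer rather than the specific network of the invention, and `Msg`, `build_graph`, `matvec` and `gnn_layer` are illustrative names; in practice the weight matrices would be learned.

```python
from dataclasses import dataclass

READS = {0x03}           # Modbus Read Holding Registers
WRITES = {0x06, 0x10}    # Modbus Write Single / Multiple Registers


@dataclass
class Msg:
    session: str       # hypothetical session identifier
    ts: float          # timestamp
    func_code: int     # Modbus function code
    address: int       # register address
    feat: list         # per-message feature vector (assumed precomputed)


def build_graph(msgs):
    """Return an undirected adjacency list over message indices."""
    order = sorted(range(len(msgs)), key=lambda i: msgs[i].ts)
    adj = {i: set() for i in range(len(msgs))}
    last_in_session = {}
    for pos, i in enumerate(order):
        m = msgs[i]
        # Timing edge: link to the previous message in the same session.
        prev = last_in_session.get(m.session)
        if prev is not None:
            adj[i].add(prev)
            adj[prev].add(i)
        last_in_session[m.session] = i
        # Command-logic edge: an earlier read of the same address -> this write.
        if m.func_code in WRITES:
            for j in order[:pos]:
                if msgs[j].func_code in READS and msgs[j].address == m.address:
                    adj[i].add(j)
                    adj[j].add(i)
    return adj


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def gnn_layer(adj, H, W_self, W_nbr):
    """One mean-aggregation layer: h_v' = ReLU(W_self h_v + W_nbr mean(h_u))."""
    dim = len(H[0])
    out = []
    for v in range(len(H)):
        nbrs = adj[v]
        if nbrs:
            agg = [sum(H[u][k] for u in nbrs) / len(nbrs) for k in range(dim)]
        else:
            agg = [0.0] * dim  # isolated node: no neighbour message
        z = [a + b for a, b in zip(matvec(W_self, H[v]), matvec(W_nbr, agg))]
        out.append([max(0.0, x) for x in z])
    return out
```

Stacking several such layers lets a node's embedding reflect multi-message patterns, which is what allows an illegal command sequence to be detected even when every individual message is well formed.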

2. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 1, wherein the real protocol traffic collected in S1 comprises ASDU messages of IEC 60870-5-104 and function code requests of Modbus.
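For IEC 60870-5-104 traffic, the per-message fields used in S2 can be read directly from the APCI and the fixed part of the ASDU header. The sketch below follows the standard framing: start byte 0x68, one length octet, four control octets, and, for I-format frames, an ASDU header with a type identifier, a variable structure qualifier, a two-octet cause of transmission and a two-octet little-endian common address. The returned dictionary layout and the function name are assumptions for illustration, and information objects are not decoded.

```python
def parse_iec104_apdu(frame: bytes) -> dict:
    """Decode the APCI and ASDU header of one IEC 60870-5-104 APDU."""
    if len(frame) < 6 or frame[0] != 0x68:
        raise ValueError("not an IEC 104 APDU")
    apdu_len = frame[1]                  # octets following the length field
    if apdu_len != len(frame) - 2 or apdu_len > 253:
        raise ValueError("length field mismatch")
    c1, c2, c3, c4 = frame[2:6]
    if (c1 & 0x01) == 0:                 # I-format: bit 0 of octet 1 is 0
        info = {
            "format": "I",
            "send_seq": (c1 >> 1) | (c2 << 7),   # 15-bit N(S)
            "recv_seq": (c3 >> 1) | (c4 << 7),   # 15-bit N(R)
        }
    elif (c1 & 0x03) == 0x01:            # S-format: supervisory acknowledgement
        return {"format": "S", "recv_seq": (c3 >> 1) | (c4 << 7)}
    else:                                # U-format: STARTDT / STOPDT / TESTFR
        return {"format": "U", "control": c1}
    asdu = frame[6:]
    if len(asdu) < 6:
        raise ValueError("truncated ASDU header")
    info.update({
        "type_id": asdu[0],
        "sequence": bool(asdu[1] & 0x80),            # SQ bit
        "num_objects": asdu[1] & 0x7F,
        "cause": asdu[2] & 0x3F,                     # cause of transmission
        "negative": bool(asdu[2] & 0x40),            # P/N bit
        "test": bool(asdu[2] & 0x80),                # T bit
        "originator": asdu[3],
        "common_address": asdu[4] | (asdu[5] << 8),  # little-endian
    })
    return info
```

For example, a single command (type identifier 45) sent with cause of transmission 6 (activation) decodes to `type_id` 45 and `cause` 6; such header fields are natural inputs to the semantic features of S2.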

3. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 2, further comprising, after step S2, expanding the data set by random truncation and noise injection to improve model robustness.
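The data expansion of claim 3 can be sketched as two byte-level transforms applied to captured frames. The keep ratio, the per-byte replacement probability, and the choice to protect a fixed-length header (here the 7-byte Modbus TCP MBAP header) from noise are illustrative assumptions, not parameters disclosed in the claim; a seeded `random.Random` keeps the augmentation reproducible.

```python
import random


def random_truncate(frame: bytes, rng: random.Random, min_keep: float = 0.5) -> bytes:
    """Keep a random-length prefix of the frame (at least `min_keep` of it)."""
    if not frame:
        return frame
    lo = max(1, int(len(frame) * min_keep))
    return frame[: rng.randint(lo, len(frame))]


def inject_noise(frame: bytes, rng: random.Random, p: float = 0.05, protect: int = 0) -> bytes:
    """With probability p, replace each byte after the first `protect` bytes by a random byte."""
    out = bytearray(frame)
    for i in range(protect, len(out)):
        if rng.random() < p:
            out[i] = rng.randrange(256)
    return bytes(out)


def augment(frames, copies: int = 2, seed: int = 0, protect: int = 7):
    """Return the originals plus `copies` truncated-and-noised variants of each frame."""
    rng = random.Random(seed)
    originals = list(frames)
    result = list(originals)
    for f in originals:
        for _ in range(copies):
            result.append(inject_noise(random_truncate(f, rng), rng, protect=protect))
    return result
```

Truncation imitates incomplete captures and noise imitates corrupted payloads, so a detector trained on the expanded set is less sensitive to both.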

4. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 3, wherein the GAN in step S3 generates the attack samples using the following structure and implementation:

1. Generator network

Input: semantic feature vectors of the power protocol;

2. Discriminator network

Input: a mixture of real protocol traffic and generated attack samples;

3. Training process

5. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 4, wherein the cross-entropy loss function to be minimized is specifically:

;

6. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 5, wherein the honeypot trapping strategy in step S4 specifically comprises:

dynamically adjusting the honeypot's responses according to the attacker's behavior;

and recording the attacker's operation path and generating an attack fingerprint.
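The attack fingerprint of claim 6 and the fusion step S5 (command sequence and operation frequency from the honeypot, source IP and protocol version from live traffic, joined by concatenation and normalization) can be sketched as follows. The encodings are assumptions for illustration: function-code bigram counts stand in for the command sequence, operations per second for the operation frequency, the IPv4 address is split into its four octets, the protocol version is one-hot encoded over a known list, and min-max normalization is applied per column across a batch. The vocabularies and function names are hypothetical.

```python
import ipaddress
from itertools import product

FUNC_CODES = [0x03, 0x06, 0x10]                   # hypothetical code vocabulary
BIGRAMS = list(product(FUNC_CODES, FUNC_CODES))   # ordered code pairs
VERSIONS = ["v1.1", "v1.2"]                       # known protocol versions


def attack_fingerprint(func_codes, duration_s):
    """Bigram counts of the command sequence plus operations per second."""
    counts = [0.0] * len(BIGRAMS)
    for pair in zip(func_codes, func_codes[1:]):
        if pair in BIGRAMS:
            counts[BIGRAMS.index(pair)] += 1.0
    rate = len(func_codes) / duration_s if duration_s > 0 else 0.0
    return counts + [rate]


def context_features(src_ip, version):
    """IPv4 octets as numbers plus a one-hot protocol version."""
    octets = [float(b) for b in ipaddress.IPv4Address(src_ip).packed]
    one_hot = [1.0 if version == v else 0.0 for v in VERSIONS]
    return octets + one_hot


def fuse(rows):
    """Concatenate fingerprint and context, then min-max normalise each column."""
    vecs = [attack_fingerprint(c, d) + context_features(ip, ver)
            for c, d, ip, ver in rows]
    cols = list(zip(*vecs))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(v, lows, highs)]
        for v in vecs
    ]
```

Normalizing after concatenation keeps large-valued columns, such as IP octets, from dominating the count and rate features when the fused vector is fed to a model.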

7. The GAN-based power protocol honeypot trapping and anomaly identification method of claim 6, further comprising, in step S6, introducing self-supervised learning to detect semantic anomalies by predicting protocol fields.
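The self-supervised step of claim 7 trains a predictor for a protocol field from the rest of the message and flags messages whose actual value is improbable. The sketch below replaces the neural predictor with a Laplace-smoothed frequency table, an assumption made for brevity: it learns P(field value | context) from unlabeled normal traffic, needing no attack labels, and scores a message by the negative log-likelihood of the observed value. The context field (function code), the masked field (quantity), the vocabulary size and the class name are hypothetical choices.

```python
import math
from collections import Counter, defaultdict


class MaskedFieldModel:
    """Count-based stand-in for a masked-field predictor: P(target | context)."""

    def __init__(self, context_field, target_field, alpha=1.0, vocab_size=256):
        self.ctx = context_field
        self.tgt = target_field
        self.alpha = alpha            # Laplace smoothing constant
        self.vocab = vocab_size       # assumed number of possible target values
        self.counts = defaultdict(Counter)

    def fit(self, messages):
        # Self-supervision: the training "label" is simply a field hidden from the input.
        for m in messages:
            self.counts[m[self.ctx]][m[self.tgt]] += 1
        return self

    def prob(self, m):
        # Use .get so that scoring an unseen context does not mutate the table.
        c = self.counts.get(m[self.ctx], Counter())
        total = sum(c.values())
        return (c[m[self.tgt]] + self.alpha) / (total + self.alpha * self.vocab)

    def anomaly_score(self, m):
        # Negative log-likelihood of the observed field value.
        return -math.log(self.prob(m))
```

A message whose field value is well formed but rarely seen in its context, such as an unusual register quantity for a given function code, receives a high score; this is the single-message semantic anomaly that complements the cross-message GNN detection of S6.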