CN112202762B

CN112202762B - Game defense strategy optimization method and system for sensing edge cloud intelligent jamming attack

Info

Publication number: CN112202762B
Application number: CN202011039611.9A
Authority: CN
Inventors: 刘建华; 沈士根; 方朝曦; 黄龙军
Original assignee: University of Shaoxing
Current assignee: University of Shaoxing
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2022-07-08
Anticipated expiration: 2040-09-28
Also published as: CN112202762A

Abstract

The invention discloses a game defense strategy optimization method and system for intelligent jamming attacks in a sensing edge cloud. The method comprises the following steps: (1) acquiring the transmission power vector that an initial set of sensing device cluster head nodes allocates to the computing task; (2) calculating, according to a Stackelberg model, the power allocation vector of the intelligent jamming attacker when the attacker's game utility is maximized; (3) calculating, according to the Stackelberg model, the transmission power allocation vector of the sensing device cluster head node when the Nash equilibrium point is reached; (4) determining the decision configuration variables. The system comprises an initialization module, an intelligent jamming attacker prediction module, a defense strategy decision module and a configuration module. The invention can effectively defend against an intelligent jamming attacker with learning ability and provides a defense method against intelligent jamming attacks.

Description

Game defense strategy optimization method and system for sensing edge cloud intelligent jamming attack

Technical Field

The invention belongs to the technical field of the Internet of Things and, more particularly, relates to a game defense strategy optimization method and system for intelligent jamming attacks on a sensing edge cloud.

Background Art

The computing tasks of sensing devices can be offloaded to edge service nodes through service access points or base station nodes, which greatly reduces the resource consumption of the sensing devices and improves the quality of service for users. However, in an open environment, the offloading of computing tasks from sensing devices is vulnerable to intelligent jamming attacks.

A sensing edge cloud system integrates sensing, control, communication and computing capabilities and is widely used in the industrial Internet. In such a system, the edge service nodes on the edge side respond to requests from sensing device nodes over an open wireless environment and receive computing tasks from them. Given the complex wireless communication characteristics at the edge, intelligent jamming attacks on the offloading of computing tasks between sensing devices and edge service nodes, especially attacks on delay-sensitive computing tasks, degrade edge computing performance or cause task offloading to fail. Secure communication between the sensing devices and the sensing device cluster head nodes serving the edge therefore faces great challenges.

The sensing device cluster head node acts as the defender. Because it is difficult for the defender to capture the channel gain of the intelligent jamming attacker, especially when the attacker launches DDoS (distributed denial of service) attacks on multiple channels, the more channels are attacked, the higher the computational cost the defender must bear to obtain an optimized defense strategy. S. Bhattacharya et al. formalized a zero-sum pursuit-evasion game to compute optimal strategies against jamming attacks on UAVs (unmanned aerial vehicles) (Game-theoretic analysis of an aerial jamming attack on a UAV communication network [C]). L. Xiao et al. considered intelligent UAV jamming attackers that can choose the attack type, such as jamming, eavesdropping or spoofing, and defended against these attacks with a power allocation strategy based on reinforcement learning (User-Centric View of Unmanned Aerial Vehicle Transmission Against Smart Attacks). Y. Xu et al. considered incomplete channel state information and used a Bayesian Stackelberg game to study the competitive interaction between UAV users and jamming attackers (A One-Leader Multi-Follower Bayesian-Stackelberg Game for Anti-Jamming Transmission in UAV Communication Networks [J]). In the scheme of Y. Xu et al., the defender uses a Stackelberg anti-jamming game to evaluate how observation errors about the intelligent jamming attack affect defense performance, and obtains a Nash equilibrium (A One-Leader Multi-Follower Bayesian-Stackelberg Game for Anti-Jamming Transmission in UAV Communication Networks).

The shortcomings of these schemes are as follows:

(1) The proposed methods give limited consideration to incomplete channel state information, which makes choosing the optimal strategy complex for the game participants. When the attack and defense strategies of the two participants change, the proposed methods also provide no fast inference capability for selecting a defense strategy.

(2) Although the proposed solutions design learning-based defense strategies, they do not consider how to defend against intelligent jamming attackers that themselves have learning ability.

(3) When an intelligent jamming attacker launches multi-channel attacks on computing task offloading, the proposed solutions must solve complex problems, which greatly reduces defense performance.

Summary of the Invention

In view of the above defects of, or needs for improvement in, the prior art, the present invention provides a game defense strategy optimization method and system for intelligent jamming attacks on a sensing edge cloud. Its purpose is to predict the attack strategy of the intelligent jamming attacker according to a game model and to intelligently optimize the defense strategy against the predicted attack strategy, thereby solving the technical problems that existing schemes cannot cope with incomplete channel state information, cannot defend against intelligent jamming attackers with learning ability, or are too complex to be usable.

To achieve the above object, according to one aspect of the present invention, a game defense strategy optimization method for a sensing edge cloud under intelligent jamming attack is provided, comprising the following steps:

(1) Obtain the initial transmission power vector P that the set of sensing device cluster head nodes allocates to the computing task: P = (P_1, P_2, ..., P_m), where m is the number of channel resources available to the sensing device cluster head node;

(2) According to the Stackelberg model, with the sensing device cluster head node as the leader and the intelligent jamming attacker as the follower, and based on the channel gain vector of the n sensing device cluster head node channels attacked by the intelligent jamming attacker, calculate the power allocation vector J_NN of the attacker over the n attacked channels that maximizes the attacker's game utility, and take it as the attacker's power allocation strategy;

(3) According to the Stackelberg model, on the premise that the intelligent jamming attacker adopts the power allocation strategy of step (2), and based on the channel gain vector of the m channels of the sensing device cluster head node, calculate the transmission power allocation vector P_MM of the cluster head node over the m available channels that maximizes the cluster head node's game utility and thereby reaches the Nash equilibrium point, and take it as the cluster head node's power allocation strategy;

(4) According to the power allocation strategy P_MM of the sensing device cluster head node obtained in step (3), determine the decision configuration variables and perform task offloading.

Preferably, in the game defense strategy optimization method for a sensing edge cloud under intelligent jamming attack, the game utility of the intelligent jamming attacker in step (2) is calculated as follows:

where n is the total number of channels attacked by the intelligent jamming attacker; P is the transmission power vector that the set of sensing device cluster head nodes allocates to the computing task; J = (J_1, J_2, ..., J_n) is the transmission power vector of the intelligent jamming attacker; a_{s,i} is the usage state of the i-th channel of the s-th sensing device cluster head node, with a_{s,i} = 1 when that channel is used to offload the computing task and a_{s,i} = 0 otherwise; h_{s,i} is the channel gain of the uplink computing task offloading link on the i-th channel of the s-th sensing device cluster head node; P_i is the transmission power of the sensing device cluster head node on the i-th channel; n_{0,i} is the noise power of the i-th channel; h_{J,i} is the channel gain of the intelligent jamming attacker on the i-th channel; J_i is the transmission power of the intelligent jamming attacker on the i-th channel; and γ is the jamming cost per unit of jamming power of the intelligent jamming attacker.
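The utility formula itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes a form that is common in anti-jamming Stackelberg games: the attacker's utility is the negative of the summed signal-to-interference-plus-noise ratio on the offloading channels, minus a linear jamming cost. The function name `attacker_utility` and this functional form are assumptions, not the patent's formula; only the symbols come from the definitions above.

```python
from typing import Sequence


def attacker_utility(
    a: Sequence[int],      # a_{s,i}: 1 if channel i carries the offloaded task, else 0
    h_s: Sequence[float],  # h_{s,i}: uplink channel gain of the cluster head node
    p: Sequence[float],    # P_i: cluster head transmission power on channel i
    n0: Sequence[float],   # n_{0,i}: noise power on channel i
    h_j: Sequence[float],  # h_{J,i}: attacker channel gain on channel i
    j: Sequence[float],    # J_i: attacker jamming power on channel i
    gamma: float,          # cost per unit of jamming power
) -> float:
    """ASSUMED form: U_J = -sum_i a*h_s*P / (n0 + h_J*J) - gamma * sum_i J."""
    sinr_sum = sum(
        a_i * hs_i * p_i / (n0_i + hj_i * j_i)
        for a_i, hs_i, p_i, n0_i, hj_i, j_i in zip(a, h_s, p, n0, h_j, j)
    )
    return -sinr_sum - gamma * sum(j)
```

Under this assumed form, raising J_i lowers the defender's ratio on channel i but costs the attacker γ per unit of power, which is the trade-off the attacker's optimization resolves.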

Preferably, in the game defense strategy optimization method for a sensing edge cloud under intelligent jamming attack, the calculation in step (2) of the intelligent jamming attacker's power allocation strategy that maximizes the attacker's game utility uses a deep neural network to build an intelligent attack model, which predicts the attacker's power allocation strategy from the channel gain vector H_{s,i} = (h_{s,1}, h_{s,2}, ..., h_{s,n}) of the n sensing device cluster head node channels attacked by the attacker.

Preferably, in the game defense strategy optimization method for a sensing edge cloud under intelligent jamming attack, maximizing the game utility of the intelligent jamming attacker in step (2) is written as:

where J_max is the maximum transmission power of the intelligent jamming attacker and is a constant.

The intelligent attack model built with the deep neural network comprises, connected in sequence, an input layer, a normalization layer, a fully connected layer, a data shaping layer, a convolution module, a pooling layer group and an output layer;

The input layer feeds the channel gain vector H_{s,i} = (h_{s,1}, h_{s,2}, ..., h_{s,n}) of the n sensing device cluster head node channels attacked by the intelligent jamming attacker to the normalization layer;

The normalization layer normalizes the channel gain vector H_{s,i} = (h_{s,1}, h_{s,2}, ..., h_{s,n}) of the sensing device cluster head node to obtain the normalized channel gain vector, which is passed through the fully connected layer to the data shaping layer;

The data shaping layer reshapes the output of the fully connected layer into a two-dimensional matrix, which is input to the convolutional layer.

The convolution module comprises two groups of convolutional layers connected through the ReLU (rectified linear unit) function, each group comprising n convolution kernels. The reshaped, normalized channel gains of the sensing device cluster head node pass through the corresponding kernel for a first convolution, then through the activation function, then through the corresponding kernel for a second convolution, and again through the activation function before being output to the pooling layer group;

The pooling layer group comprises n parallel pooling layers and fully connected layers;

The output layer outputs the intelligent jamming attacker's transmission power vector (J_{N,1}, J_{N,2}, ..., J_{N,n}) at the Nash equilibrium point of the Stackelberg game, which is the attacker's power allocation strategy.

Preferably, in the game defense strategy optimization method for a sensing edge cloud under intelligent jamming attack, the intelligent attack model built with the deep neural network is trained as follows:

The multi-channel training weight vector is randomly initialized and the model is trained by gradient descent, with the weights adjusted step by step through backpropagation; the intelligent jamming attacker obtains the transmission power of the sensing device cluster head node through interaction. The loss function of the intelligent attack model is expressed as follows:

where α_{J,i} is the weight coefficient of the loss function and (1 - α_{J,i}) tanh(|J_i - J_max|) is the regularization term, through which the intelligent jamming attacker's power constraint takes part in training.

The weight update equation for training the intelligent attack model is:

where θ_J denotes the learning rate.
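The loss and update formulas are not reproduced in this text, so the following is only a minimal sketch of the two pieces the description does state: the regularization term (1 - α) tanh(|J_i - J_max|) and a gradient-descent update with a learning rate. The helper names `power_regularizer` and `gradient_step` are hypothetical, and the plain update w ← w - θ ∂L/∂w is an assumed textbook form rather than the patent's exact equation.

```python
import math
from typing import List, Sequence


def power_regularizer(j_i: float, j_max: float, alpha: float) -> float:
    """Regularization term (1 - alpha) * tanh(|J_i - J_max|) named in the text."""
    return (1.0 - alpha) * math.tanh(abs(j_i - j_max))


def gradient_step(
    weights: Sequence[float], grads: Sequence[float], lr: float
) -> List[float]:
    """One plain gradient-descent update (ASSUMED form): w <- w - lr * dL/dw."""
    return [w - lr * g for w, g in zip(weights, grads)]
```

Because tanh is bounded by 1, the term contributes at most (1 - α) to the loss however far J_i lies from J_max, which keeps the constraint from dominating the utility part of the loss.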

Preferably, in the game defense strategy optimization method for a sensing edge cloud under intelligent jamming attack, the game utility of the sensing device cluster head node in step (3) is calculated as follows:

where m is the total number of channel resources available to the sensing device cluster head node; P is the transmission power vector that the set of sensing device cluster head nodes allocates to the computing task; J = (J_1, J_2, ..., J_n) is the transmission power vector of the intelligent jamming attacker; a_{s,i} is the usage state of the i-th channel of the s-th sensing device cluster head node, with a_{s,i} = 1 when that channel is used to offload the computing task and a_{s,i} = 0 otherwise; h_{s,i} is the channel gain of the uplink computing task offloading link on the i-th channel of the s-th sensing device cluster head node; P_i is the transmission power of the sensing device cluster head node on the i-th channel; n_{0,i} is the noise power of the i-th channel; h_{J,i} is the channel gain of the intelligent jamming attacker on the i-th channel; J_i is the transmission power of the intelligent jamming attacker on the i-th channel; and λ is the transmission cost per unit of transmission power of the sensing device cluster head node.
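The defender's utility formula is likewise missing from this text. As a hedged sketch, the function below assumes the mirror image of a typical jammer utility: the summed signal-to-interference-plus-noise ratio over the m channels minus a linear transmission cost λ per unit of power. The name `defender_utility`, this functional form, and the convention that J_i = 0 on channels the attacker does not jam are all assumptions for illustration.

```python
from typing import Sequence


def defender_utility(
    a: Sequence[int],      # a_{s,i}: 1 if channel i carries the offloaded task, else 0
    h_s: Sequence[float],  # h_{s,i}: uplink channel gain on channel i
    p: Sequence[float],    # P_i: cluster head transmission power on channel i
    n0: Sequence[float],   # n_{0,i}: noise power on channel i
    h_j: Sequence[float],  # h_{J,i}: attacker channel gain on channel i
    j: Sequence[float],    # J_i: jamming power on channel i (0 if not attacked)
    lam: float,            # lambda: cost per unit of transmission power
) -> float:
    """ASSUMED form: U_s = sum_i a*h_s*P / (n0 + h_J*J) - lambda * sum_i P."""
    sinr_sum = sum(
        a_i * hs_i * p_i / (n0_i + hj_i * j_i)
        for a_i, hs_i, p_i, n0_i, hj_i, j_i in zip(a, h_s, p, n0, h_j, j)
    )
    return sinr_sum - lam * sum(p)
```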

Preferably, in the game defense strategy optimization method for a sensing edge cloud under intelligent jamming attack, the calculation of the sensing device cluster head node's transmission power allocation strategy that maximizes its game utility and thereby reaches the Nash equilibrium point uses a deep neural network to build an intelligent defense model, from which the power allocation strategy of the sensing device cluster head node is obtained.

Preferably, in the game defense strategy optimization method for a sensing edge cloud under intelligent jamming attack, maximizing the game utility of the sensing device cluster head node in step (3) is written as:

where P_max is the maximum transmission power of the sensing device cluster head node and is a constant.

The intelligent defense model built with the deep neural network comprises, connected in sequence, an input layer, a normalization layer, a fully connected layer, a data shaping layer, a convolution module, a pooling layer group and an output layer;

The input layer feeds the channel gain vector H_{s,i} = (h_{s,1}, h_{s,2}, ..., h_{s,m}) of the m channels of the sensing device cluster head node to the normalization layer;

The normalization layer normalizes the channel gain vector H_{s,i} = (h_{s,1}, h_{s,2}, ..., h_{s,m}) of the sensing device cluster head node to obtain the normalized channel gain vector, which is passed through the fully connected layer to the data shaping layer;

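The data shaping layer reshapes the output of the fully connected layer into a two-dimensional matrix, which is input to the convolutional layer.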

The convolution module comprises two groups of convolutional layers connected through the ReLU (rectified linear unit) function, each group comprising m convolution kernels. The reshaped, normalized channel gains of the sensing device cluster head node pass through the corresponding kernel for a first convolution, then through the activation function, then through the corresponding kernel for a second convolution, and again through the activation function before being output to the pooling layer group;

The pooling layer group comprises m parallel pooling layers and fully connected layers;

The output layer outputs the transmission power vector P_MM = (P_{M,1}, P_{M,2}, ..., P_{M,m}) of the sensing device cluster head node at the Nash equilibrium point of the Stackelberg game, which is the cluster head node's power allocation strategy.

Preferably, in the game defense strategy optimization method for a sensing edge cloud under intelligent jamming attack, the intelligent defense model in step (3) is trained as follows:

The multi-channel training weight vector is randomly initialized and the model is trained by gradient descent, with the weights adjusted step by step through backpropagation; the sensing device cluster head node obtains the transmission power of the intelligent jamming attacker through interaction. The loss function of the intelligent defense model is expressed as follows:

where α_s is the weight coefficient of the loss function, which balances the influence of the constraint on the training process, and (1 - α_s) tanh(|P - P_max|) is the regularization term, through which the power constraint of the sensing device cluster head node takes part in training.

Taking the partial derivative of the loss function with respect to the power gives:

The weight update equation for training the intelligent defense model is:

where θ_s denotes the learning rate.

According to another aspect of the present invention, a game defense strategy optimization system for a sensing edge cloud under intelligent jamming attack is provided, comprising an initialization module, an intelligent jamming attacker prediction module, a defense strategy decision module and a configuration module;

The initialization module obtains the initial transmission power vector P = (P_1, P_2, ..., P_m) that the set of sensing device cluster head nodes allocates to the computing task and submits it to the intelligent jamming attacker prediction module, where m is the number of channel resources available to the sensing device cluster head node;

The intelligent jamming attacker prediction module, according to the Stackelberg model, with the sensing device cluster head node as the leader and the intelligent jamming attacker as the follower, and based on the channel gain vector of the n sensing device cluster head node channels attacked by the intelligent jamming attacker, calculates the power allocation vector J_NN of the attacker over the n attacked channels that maximizes the attacker's game utility, and submits it, as the attacker's power allocation strategy, to the defense strategy decision module;

The defense strategy decision module, according to the Stackelberg model, on the premise that the intelligent jamming attacker adopts that power allocation strategy, and based on the channel gain vector of the m channels of the sensing device cluster head node, calculates the transmission power allocation vector of the cluster head node over the m available channels that maximizes the cluster head node's game utility and thereby reaches the Nash equilibrium point, and submits it, as the cluster head node's power allocation strategy, to the configuration module;

The configuration module determines the decision configuration variables according to the power allocation strategy P_MM of the sensing device cluster head node and performs task offloading.

In general, compared with the prior art, the above technical solutions conceived by the present invention can achieve the following beneficial effects:

In a sensing edge cloud environment, the present invention addresses the problem of secure task offloading between sensing devices and edge service nodes. Based on a Stackelberg game, the resource-rich sensing device cluster head node learns to optimize its power allocation strategy and to defend against intelligent jamming attacks. The invention can effectively defend against intelligent jamming attackers with learning ability and provides a defense method against intelligent jamming attacks.

In a preferred scheme, the Stackelberg game strategy is learned with DNNs (deep neural networks), with the sensing device cluster head node in the leader role and the intelligent jamming attacker in the follower role. First, the intelligent jamming attacker obtains the transmission power allocation strategy of the sensing device cluster head node, learns its optimal power allocation strategy from its own channel gains, and maximizes its game utility. Second, the sensing device cluster head node obtains the attacker's power allocation strategy, learns its optimal power allocation strategy from its own channel gains, and maximizes its game utility, so that it can make effective defense decisions even when information about the attacker's power allocation strategy is incomplete.
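The follower's side of this leader-follower interaction can be illustrated without a neural network. The sketch below is not the patent's DNN method: it assumes the attacker's utility has the form U_J = -Σ h_{s,i} P_i / (n_{0,i} + h_{J,i} J_i) - γ Σ J_i, which is concave in each J_i, and solves the first-order condition in closed form. The function name `follower_best_response`, the assumed utility, and the simplification of the J_max budget to a per-channel cap `j_cap` are all assumptions. It requires γ > 0 and h_{J,i} > 0.

```python
import math
from typing import List, Sequence


def follower_best_response(
    h_s: Sequence[float],  # h_{s,i}: defender channel gain on channel i
    p: Sequence[float],    # P_i: leader (cluster head) power on channel i
    n0: Sequence[float],   # n_{0,i}: noise power on channel i
    h_j: Sequence[float],  # h_{J,i}: attacker channel gain on channel i (> 0)
    gamma: float,          # cost per unit of jamming power (> 0)
    j_cap: float,          # ASSUMED per-channel cap standing in for J_max
) -> List[float]:
    """Per-channel jamming power that maximizes the assumed attacker utility."""
    response = []
    for hs_i, p_i, n0_i, hj_i in zip(h_s, p, n0, h_j):
        # Setting d/dJ [-hs*P/(n0 + hj*J) - gamma*J] = 0 gives
        # n0 + hj*J = sqrt(hs*P*hj/gamma).
        j_star = (math.sqrt(hs_i * p_i * hj_i / gamma) - n0_i) / hj_i
        # Clip to the feasible range [0, j_cap].
        response.append(min(max(j_star, 0.0), j_cap))
    return response
```

In this simplified setting the attacker leaves a channel alone whenever the leader's received power is too weak to justify the jamming cost, which is the kind of response the leader anticipates when choosing P.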

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of the sensing edge cloud addressed by the present invention;

FIG. 2 is a schematic structural diagram of the intelligent attack model provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of the single-channel intelligent attack model provided by an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of the intelligent defense model provided by an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of the single-channel intelligent defense model provided by an embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.

In the present invention, the sensing edge cloud system consists of sensing device nodes, sensing device cluster head nodes, edge service nodes and intelligent jamming attackers, where an edge service node consists of an AP (access point) and a micro cloud service, as shown in FIG. 1. Under intelligent jamming attack, a sensing device node offloads its computing tasks to the edge service node through the sensing device cluster head node, and the computing power of the cluster head node is used to implement the defense strategy against the intelligent jamming attacker with learning ability.

The game defense strategy optimization method for a sensing edge cloud under intelligent jamming attack provided by the present invention includes the following steps:

(2) According to the Stackelberg model, with the sensing device cluster head node as the leader and the intelligent jamming attacker as the follower, and based on the channel gain vector of the n sensing device cluster head node channels attacked by the intelligent jamming attacker, calculate the power allocation vector J_NN of the attacker over the n attacked channels that maximizes the attacker's game utility, and take it as the attacker's power allocation strategy.

The game utility of the intelligent jamming attacker is calculated as follows:

To calculate the attacker's power allocation strategy that maximizes its game utility, a deep neural network is preferably used to build an intelligent attack model, which predicts the attacker's power allocation strategy from the channel gain vector H_{s,i} = (h_{s,1}, h_{s,2}, ..., h_{s,n}) of the n sensing device cluster head node channels attacked by the attacker; the details are as follows:

Maximizing the game utility of the intelligent jamming attacker is written as:

where J_max is the maximum transmission power of the intelligent jamming attacker and is a constant.

The intelligent attack model built with the deep neural network, whose structure is shown in FIG. 2, comprises, connected in sequence, an input layer, a normalization layer, a fully connected layer, a data shaping layer, a convolution module, a pooling layer group and an output layer;

The normalization layer normalizes the channel gain vector H_s,i = (h_s,1, h_s,2, ..., h_s,n) of the sensing device cluster head nodes to obtain the normalized channel gain vector, which is passed through the fully connected layer to the data shaping layer. The calculation standardizes the gains to mean 0 and variance 1, where E(·) denotes the expectation and D(·) denotes the variance.
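The normalization step can be illustrated with a minimal sketch. It assumes the usual standardization (subtract the expectation, divide by the square root of the variance), which is consistent with the stated mean of 0 and variance of 1; the function name `normalize_gains` and the sample gains are hypothetical and not taken from the patent.

```python
import math

def normalize_gains(gains):
    """Standardize a channel gain vector to mean 0 and variance 1."""
    n = len(gains)
    mean = sum(gains) / n                           # E(H)
    var = sum((g - mean) ** 2 for g in gains) / n   # D(H), population variance
    if var == 0.0:
        # All gains are equal: there is no spread to scale by, so return zeros.
        return [0.0] * n
    std = math.sqrt(var)
    return [(g - mean) / std for g in gains]

# Hypothetical channel gains h_s,1 .. h_s,4
h = [0.2, 0.5, 0.9, 0.4]
h_norm = normalize_gains(h)
```

The zero-variance branch is a design choice of this sketch; the text does not say how a constant gain vector should be handled.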

The data shaping layer reshapes the output of the fully connected layer into two-dimensional matrices, which are input to the convolution module.

The first convolution operation uses a 3×3 convolution kernel with a stride of 1 to extract the gain information of the n channels attacked by the intelligent jamming attacker. The kernel entries are the weights, and the results for the n channels form the output vector of the first convolution operation.

After the first convolution operation, the ReLU function is used as the activation function to produce the activated output vector.

The second convolution operation also uses a 3×3 convolution kernel with a stride of 1 and produces the output vector of the second convolution operation.

After the second convolution operation, the ReLU function is again used as the activation function to produce the activated output vector.
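The two convolution and activation stages can be sketched as follows. This is an illustration only: the 6×6 input size, the kernel values, and the function names are assumptions, and the convolution is computed without padding (as cross-correlation, the convention of most deep learning libraries), which the text does not specify.

```python
def conv2d_valid(image, kernel):
    """2-D convolution with stride 1 and no padding (cross-correlation form)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(matrix):
    """Element-wise ReLU: negative entries become 0."""
    return [[max(0.0, v) for v in row] for row in matrix]

# Hypothetical 6x6 reshaped gain matrix and two hypothetical 3x3 kernels
x = [[float((r * 6 + c) % 5 - 2) for c in range(6)] for r in range(6)]
k1 = [[0.1, 0.0, -0.1], [0.2, 0.0, -0.2], [0.1, 0.0, -0.1]]
k2 = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]

y1 = relu(conv2d_valid(x, k1))    # first convolution + ReLU: 6x6 -> 4x4
y2 = relu(conv2d_valid(y1, k2))   # second convolution + ReLU: 4x4 -> 2x2
```

Without padding, each 3×3 convolution with a stride of 1 shrinks the matrix by two rows and two columns.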

The pooling layer group comprises n parallel pooling layers and fully connected layers. Preferably, to speed up training, max pooling layers are used: a 2×2 sliding window with a stride of 1 is applied to obtain the output vector of the max pooling layer.
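A minimal sketch of max pooling with a 2×2 window and a stride of 1 is given below; the function name and the sample feature map are hypothetical.

```python
def max_pool_2x2_stride1(matrix):
    """Max pooling with a 2x2 window moved one cell at a time."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r][c], matrix[r][c + 1],
                matrix[r + 1][c], matrix[r + 1][c + 1])
            for c in range(cols - 1)
        ]
        for r in range(rows - 1)
    ]

# Hypothetical 3x3 feature map
feature_map = [
    [1.0, 3.0, 2.0],
    [4.0, 0.0, 5.0],
    [2.0, 6.0, 1.0],
]
pooled = max_pool_2x2_stride1(feature_map)
# pooled == [[4.0, 5.0], [6.0, 6.0]]
```

With a stride of 1 the output loses only one row and one column, so pooling here mainly smooths the feature map rather than downsampling it aggressively.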

The output layer, preferably a Sigmoid function, outputs the intelligent jamming attacker's transmission power vector (J_N,1, J_N,2, ..., J_N,n) at the Nash equilibrium point of the Stackelberg game; this vector is the power allocation strategy of the intelligent jamming attacker.

The intelligent attack model built with the deep neural network is trained as follows:

The multi-channel training weight vector is randomly initialized, and the model is trained against a loss function that maximizes the attacker's game utility. In the loss function, α_J,i denotes the weight coefficient and (1-α_J,i)tanh(|J_i-J_max|) is the regularization term, through which the power constraint of the intelligent jamming attacker takes part in training.

Taking the partial derivative of the loss function with respect to the power gives the gradient used in the weight update, in which θ_J denotes the learning rate.
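The regularization term and a gradient-descent weight update can be sketched as follows. Only the term (1-α)tanh(|J_i-J_max|) and the learning rate θ_J come from the text; the function names, the treatment of the derivative at J_i = J_max, and the plain update rule w ← w − θ·∂L/∂w are assumptions of this sketch, since the full loss function is not reproduced here.

```python
import math

def power_regularizer(j, j_max, alpha):
    """(1 - alpha) * tanh(|J_i - J_max|), the regularization term named in the text."""
    return (1.0 - alpha) * math.tanh(abs(j - j_max))

def power_regularizer_grad(j, j_max, alpha):
    """Derivative of the regularizer with respect to J_i (taken as 0 at J_i == J_max)."""
    d = j - j_max
    if d == 0.0:
        return 0.0
    sign = 1.0 if d > 0 else -1.0
    return (1.0 - alpha) * sign * (1.0 - math.tanh(abs(d)) ** 2)

def sgd_step(weight, grad, theta):
    """One gradient-descent update (assumed form): w <- w - theta * dL/dw."""
    return weight - theta * grad

# Hypothetical values: power below the maximum, so the gradient is negative
g = power_regularizer_grad(0.3, 1.0, 0.7)
w_next = sgd_step(1.0, 0.5, 0.1)
```

The regularizer is smallest when J_i equals J_max and grows as the power moves away from J_max in either direction.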

After the intelligent attack model has been trained, it is used to predict the transmission power strategy of the intelligent jamming attacker: given the channel gain vector of the sensing device cluster head nodes on the n attacked channels as input, the model outputs the attacker's power allocation strategy J_NN = (J_N,1, J_N,2, ..., J_N,n).

(3) According to the Stackelberg model, and on the premise that the intelligent jamming attacker adopts the power allocation strategy of step (2), calculate, from the channel gain vector of the m channels of the sensing device cluster head node, the transmission power allocation vector over the m available channels that maximizes the game utility of the sensing device cluster head node and thereby reaches the Nash equilibrium point; this vector is taken as the power allocation strategy of the sensing device cluster head node;

The game utility of the sensing device cluster head node is calculated as follows:

To find the transmission power allocation strategy that maximizes this game utility and reaches the Nash equilibrium point, a deep neural network is preferably used to build an intelligent defense model, which yields the power allocation strategy of the sensing device cluster head node. The details are as follows:

The maximization of the game utility of the sensing device cluster head node is written as:

The intelligent defense model built with the deep neural network has the structure shown in Figure 4 and comprises, connected in sequence, an input layer, a normalization layer, a fully connected layer, a data shaping layer, a convolution module, a pooling layer group, and an output layer;

The normalization layer normalizes the channel gain vector H_s,i = (h_s,1, h_s,2, ..., h_s,m) of the sensing device cluster head node to obtain the normalized channel gain vector, which is passed through the fully connected layer to the data shaping layer.

The data shaping layer reshapes the output of the fully connected layer into two-dimensional matrices, which are input to the convolution module.

The first convolution operation uses a 3×3 convolution kernel with a stride of 1 to extract the gain information of the m channels. The kernel entries are the weights, and the results for the m channels form the output vector of the first convolution operation.

After the first convolution operation, the ReLU function is used as the activation function to produce the activated output vector.

The second convolution operation produces the output vector of the second convolution operation.

After the second convolution operation, the ReLU function is again used as the activation function to produce the activated output vector.

The pooling layer group comprises m parallel pooling layers and fully connected layers. Preferably, to speed up training, max pooling layers are used: a 2×2 sliding window with a stride of 1 is applied to obtain the output vector of the max pooling layer.

The output layer, preferably a Sigmoid function, outputs the transmission power vector P_MM = (P_M,1, P_M,2, ..., P_M,m) of the sensing device cluster head node at the Nash equilibrium point of the Stackelberg game; this vector is the power allocation strategy of the sensing device cluster head node.

The intelligent defense model is trained as follows:

The multi-channel training weight vector is randomly initialized, and the model is trained against a loss function that maximizes the game utility of the sensing device cluster head node.

Taking the partial derivative of the loss function with respect to the power gives the gradient used in the weight update, in which θ_s denotes the learning rate.

After the intelligent defense model has been trained, it is used to decide the transmission power strategy of the sensing device cluster head node: given the channel gain vector of the m sensing device nodes as input, the model outputs the power allocation strategy P_MM of the sensing device cluster head node.

The game defense strategy optimization system for intelligent jamming attacks in the sensing edge cloud provided by the present invention comprises an initialization module, an intelligent jamming attacker prediction module, a defense strategy decision module, and a configuration module;

The configuration module is configured to determine the decision configuration variable according to the power allocation strategy P_MM of the sensing device cluster head node and to perform task offloading.

Embodiments are described below:

When a sensing device cluster head node offloads delay-sensitive computing tasks, the intelligent jamming attacker increases the delay and energy consumption of offloading and reduces both the reliability of the channel and the offloading capacity for the computing tasks. Let the decision configuration variable for computing task offloading be A = {a_s,i | s ∈ M, i ∈ E}, where M denotes the set of sensing device cluster head nodes and E denotes the set of channels of the sensing device cluster head nodes. If cluster head node s uses channel i to offload computing tasks, then a_s,i = 1; otherwise a_s,i = 0. Therefore, for a single channel i, the computing task offloading capacity of cluster head node s is:

As the defender against jamming attacks, sensing device cluster head node s selects a transmission power with which to offload its computing tasks. The game utility of cluster head node s is therefore:

where P denotes the transmission power of cluster head node s and J denotes the transmission power of the intelligent jamming attacker. The per-unit power costs of the cluster head node and the attacker are λ and γ, respectively. n₀ denotes the noise power, h_s,i denotes the channel gain of the uplink computing task offloading link, and h_J,i denotes the channel gain of the intelligent jamming attacker.

The intelligent jamming attacker selects a power with which to jam the offloading of the delay-sensitive computing tasks of cluster head node s. The game utility of the intelligent jamming attacker is therefore:
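To make the roles of the symbols concrete, the following sketch evaluates a capacity and the two utilities for one channel. The formulas are assumptions, not the patent's own expressions: the capacity is taken as a Shannon-style rate with the jamming power added to the noise, the cluster head utility as capacity minus λP, and the attacker utility as negative capacity minus γJ. All function names and numeric values are hypothetical.

```python
import math

def offload_capacity(a, p, j, h_s, h_j, n0):
    """ASSUMED capacity on one channel: a * log2(1 + P*h_s / (n0 + J*h_J))."""
    sinr = (p * h_s) / (n0 + j * h_j)
    return a * math.log2(1.0 + sinr)

def utility_cluster_head(p, j, h_s, h_j, n0, lam, a=1):
    """ASSUMED defender utility: capacity minus the power cost lam * P."""
    return offload_capacity(a, p, j, h_s, h_j, n0) - lam * p

def utility_jammer(p, j, h_s, h_j, n0, gamma, a=1):
    """ASSUMED attacker utility: negative capacity minus the power cost gamma * J."""
    return -offload_capacity(a, p, j, h_s, h_j, n0) - gamma * j

# Hypothetical single-channel values
u_s_clean = utility_cluster_head(p=1.0, j=0.0, h_s=1.0, h_j=1.0, n0=1.0, lam=0.5)
u_s_jammed = utility_cluster_head(p=1.0, j=1.0, h_s=1.0, h_j=1.0, n0=1.0, lam=0.5)
u_j = utility_jammer(p=1.0, j=1.0, h_s=1.0, h_j=1.0, n0=1.0, gamma=0.5)
```

Under these assumptions, raising J lowers the defender's utility through the capacity term while costing the attacker γ per unit of power, which is the trade-off the game captures.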

When the sensing device cluster head node has m available channel resources, the intelligent jamming attacker launches a multi-channel attack, causing the computing tasks offloaded on multiple channels to fail in transmission. Let P_i and J_i denote the transmission power allocated to channel i by the cluster head node and by the attacker, respectively. Let P = (P₁, P₂, ..., P_m) denote the transmission power vector of the cluster head node and J = (J₁, J₂, ..., J_n) the transmission power vector of the attacker. The transmission powers of the cluster head node and of the attacker each satisfy a power constraint.

With m channels, the game utility of the sensing device cluster head node is:

In multi-channel mode, the intelligent jamming attacker jams n channels, and its game utility is:

For attacks by an intelligent jamming attacker with learning ability, the present invention models the anti-jamming power allocation problem as a DNN-based Stackelberg game. The sensing device cluster head node and the intelligent jamming attacker are the players. The cluster head node is the leader and allocates its transmission power first; the attacker is the follower and jams the computing task offloading of the cluster head node. The optimal power allocation strategy of each player is obtained by DNN inference. The invention designs the defense strategy against an attacker that can learn: through game interaction the attacker obtains the transmission power of the cluster head node and, based on its own channel gain, infers the power allocation strategy that maximizes its game utility. Likewise, the cluster head node, as the defender, obtains the attacker's power allocation strategy through game interaction and, based on its own channel gain, infers the power allocation strategy for computing task offloading.
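The leader-follower order of play can be illustrated by backward induction on a single channel. This sketch replaces the DNN inference described above with an exhaustive grid search, purely for illustration, and it assumes Shannon-style utilities (capacity minus λP for the leader, negative capacity minus γJ for the follower). All function names and numbers are hypothetical.

```python
import math

def capacity(p, j, h_s, h_j, n0):
    # ASSUMED capacity: jamming power adds to the noise floor.
    return math.log2(1.0 + p * h_s / (n0 + j * h_j))

def u_leader(p, j, h_s, h_j, n0, lam):
    return capacity(p, j, h_s, h_j, n0) - lam * p       # assumed defender utility

def u_follower(p, j, h_s, h_j, n0, gamma):
    return -capacity(p, j, h_s, h_j, n0) - gamma * j    # assumed attacker utility

def grid(max_value, steps):
    """Evenly spaced candidate powers from 0 to max_value."""
    return [max_value * k / steps for k in range(steps + 1)]

def follower_best_response(p, j_grid, h_s, h_j, n0, gamma):
    """The follower observes P and picks the J that maximizes its own utility."""
    return max(j_grid, key=lambda j: u_follower(p, j, h_s, h_j, n0, gamma))

def stackelberg_equilibrium(p_grid, j_grid, h_s, h_j, n0, lam, gamma):
    """The leader anticipates the follower's reaction to each candidate P."""
    best = None
    for p in p_grid:
        j = follower_best_response(p, j_grid, h_s, h_j, n0, gamma)
        u = u_leader(p, j, h_s, h_j, n0, lam)
        if best is None or u > best[2]:
            best = (p, j, u)
    return best

p_star, j_star, u_star = stackelberg_equilibrium(
    grid(2.0, 40), grid(2.0, 40), h_s=1.0, h_j=0.8, n0=0.1, lam=0.3, gamma=0.3
)
```

Grid search scales poorly with the number of channels, which is one motivation for inferring both strategies with trained networks instead.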

(1) Obtain the initial transmission power vector P = (P₁, P₂, ..., P_m) that the set of sensing device cluster head nodes allocates to the computing tasks, where m is the number of channel resources available to the sensing device cluster head node;

When the intelligent jamming attacker launches a multi-channel attack on the offloading link, the maximization of the attacker's game utility can be formalized as follows:

This embodiment builds a multi-channel intelligent attack model network, MJnet, to maximize the attacker's game utility in a multi-channel attack. MJnet is also trained to infer the attacker's optimal attack strategy vector in multi-channel mode.

The structure of MJnet is shown in Figure 2.

MJnet processes the attacker's channel gains from input to output in the following steps:

① Input: the channel gains of the attacker's multi-channel attack are normalized to a standard normal distribution with mean 0 and variance 1, and the normalized multi-channel gain vector is the input to MJnet.

② Data shaping: the input is reshaped into n two-dimensional matrices that store the channel gains.

③ The reshaped data is input to the convolution layer, where n 3×3 convolution kernels with a stride of 1 perform the first convolution operation and extract the gain information of the attacker's n channels.

④ The first convolution layer produces one output per channel, with the kernel entries as the weights; together these outputs form the output vector of the first convolution.

⑤ The output of the first convolution layer is passed through ReLU as the activation function, giving one ReLU output per channel and, together, a multi-output vector.

⑥ A second convolution operation is then performed, again giving one output per channel and a multi-output vector.

⑦ ReLU is used again as the activation function, giving one ReLU output per channel and a multi-output vector.

⑧ To speed up training, a max pooling layer with a 2×2 sliding window and a stride of 1 is applied to obtain the output vector of the max pooling layer.

⑨ Finally, the Sigmoid function is applied to each single output, and the multi-output vector is (J_N,1, J_N,2, ..., J_N,n).

The MJnet training and inference process is as follows:

The intelligent jamming attacker infers its multi-channel attack strategy through MJnet. The attacker first randomly initializes the multi-channel training weight vector.

After initialization, MJnet is trained by stochastic gradient descent, with the weights adjusted step by step through backpropagation so that the attacker's game utility is maximized. During training, the attacker obtains the transmission power of the sensing device cluster head node through game interaction. The loss function with which the attacker maximizes its game utility in a multi-channel attack is expressed as follows:

where α_J,i denotes the weight coefficient of the loss function and (1-α_J,i)tanh(|J_i-J_max|) is the regularization term, through which the attacker's power constraint takes part in training.

Taking the partial derivative of the loss function with respect to the power J_i gives:

The weight update equation for training MJnet is:

where θ_J denotes the learning rate. After training, the trained MJnet is used to infer the transmission power strategy of the attacker's multi-channel attack: given the attacker's multi-channel gain vector as input, MJnet outputs the optimized multi-channel attack power strategy vector.

When m = n = 1, that is, in single-channel attack mode, the maximization of the attacker's game utility simplifies to:

MJnet then reduces to a single-layer network, SJnet, which is trained to learn and infer the attacker's response strategy in single-channel attack mode so as to maximize the attacker's game utility.

The structure of SJnet is shown in Figure 3: the network model consists of normalization, a fully connected layer, data shaping, convolution layers, a pooling layer, and so on.

SJnet processes the attacker's channel gain from input to output in the following steps:

① To speed up the convergence of the attacker's game strategy learning and to ensure that the input channel gains are identically distributed, the input channel gains are normalized to a standard normal distribution with mean 0 and variance 1, where E(·) denotes the expectation and D(·) denotes the variance used in the normalization. The normalized channel gain vector of the attacker is input to the fully connected layer of SJnet.

② Data shaping: the output of the fully connected layer is reshaped into a two-dimensional matrix.

③ The reshaped data is input to the convolution layer, where a 3×3 convolution kernel with a stride of 1 performs the first convolution operation and extracts the key variations in the attacker's channel gain.

④ Convolution layer 1 computes its output with the kernel entries as the weights.

⑤ The output of convolution layer 1 is passed through ReLU as the activation function to give the ReLU output.

⑥ A second convolution operation is then performed to give the output of convolution layer 2.

⑦ ReLU is used again as the activation function to give the ReLU output.

⑧ To speed up training, a max pooling layer with a 2×2 sliding window and a stride of 1 is applied to obtain the output of the max pooling layer.

⑨ A fully connected layer stretches the output of the max pooling layer into an n×1 vector.

⑩ Finally, the Sigmoid function is applied to produce the output of SJnet.
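The ten steps above can be traced end to end with a small forward pass. This is a sketch under several assumptions that the text leaves open: the fully connected layer has 36 units and no bias, the data shaping step produces a 6×6 matrix, the convolutions use no padding, and the final fully connected layer of step ⑨ is reduced to a simple flatten. The weights are random placeholders, and every name below is hypothetical.

```python
import math
import random

def normalize(v):
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    std = math.sqrt(var) if var > 0 else 1.0
    return [(x - mean) / std for x in v]

def dense(v, weights):
    """Fully connected layer without bias: one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def reshape(v, rows, cols):
    return [v[r * cols:(r + 1) * cols] for r in range(rows)]

def conv3x3(m, k):
    """3x3 convolution, stride 1, no padding (square input assumed)."""
    size = len(m) - 2
    return [[sum(m[r + i][c + j] * k[i][j] for i in range(3) for j in range(3))
             for c in range(size)] for r in range(size)]

def relu(m):
    return [[max(0.0, x) for x in row] for row in m]

def max_pool(m):
    """2x2 max pooling, stride 1 (square input assumed)."""
    size = len(m) - 1
    return [[max(m[r][c], m[r][c + 1], m[r + 1][c], m[r + 1][c + 1])
             for c in range(size)] for r in range(size)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sjnet_forward(gains, w_fc, k1, k2):
    x = normalize(gains)                       # step 1
    x = reshape(dense(x, w_fc), 6, 6)          # step 2: 36 units -> 6x6 (assumed size)
    x = relu(conv3x3(x, k1))                   # steps 3-5: 6x6 -> 4x4
    x = relu(conv3x3(x, k2))                   # steps 6-7: 4x4 -> 2x2
    x = max_pool(x)                            # step 8: 2x2 -> 1x1
    flat = [v for row in x for v in row]       # step 9, simplified to a flatten
    return [sigmoid(v) for v in flat]          # step 10

rng = random.Random(0)                         # fixed seed for repeatability
n = 4
w_fc = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(36)]
k1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
k2 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
j_out = sjnet_forward([0.2, 0.5, 0.9, 0.4], w_fc, k1, k2)
```

Because the Sigmoid follows a ReLU and a max pooling stage, its input is never negative in this sketch, so the output lies between 0.5 and 1. The text does not state how the Sigmoid output is scaled to a physical power level, so no scaling by J_max is applied here.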

The SJnet training and inference process is as follows:

The intelligent jamming attacker infers its attack strategy through SJnet. The attacker first randomly initializes the training weights.

After initialization, SJnet is trained by stochastic gradient descent, with the weights adjusted step by step through backpropagation so that the attacker's game utility is maximized. During training, the attacker obtains the transmission power of the sensing device cluster head node through game interaction. The loss function with which the attacker maximizes its game utility is expressed as follows:

where α_J denotes the weight coefficient of the loss function. The second term is the regularization term, through which the attacker's jamming power constraint takes part in training.

Taking the partial derivative of the loss function with respect to the power J gives:

The weight update equation for training SJnet is:

where θ_J denotes the learning rate. After training, the trained SJnet is used to infer the attacker's transmission power strategy: given the attacker's channel gain vector as input, SJnet outputs the optimized single-channel attack power strategy.
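The training loop can be illustrated on a toy model with a single weight. Everything beyond the learning rate θ_J, the weight coefficient α_J, and the tanh regularization term is an assumption of this sketch: the attacker utility is taken as negative Shannon-style capacity minus γJ, the loss as −α_J times that utility plus the regularizer, the power as J = J_max·sigmoid(w), and the gradient is estimated by central differences with full-batch gradient descent rather than backpropagation with stochastic gradient descent.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def jammer_utility(p, j, h_s, h_j, n0, gamma):
    # ASSUMED form: negative offloading capacity minus the attacker's power cost.
    return -math.log2(1.0 + p * h_s / (n0 + j * h_j)) - gamma * j

def loss(w, p, h_s, h_j, n0, gamma, j_max, alpha):
    j = j_max * sigmoid(w)                        # assumed mapping from weight to power
    utility_term = -alpha * jammer_utility(p, j, h_s, h_j, n0, gamma)
    regularizer = (1.0 - alpha) * math.tanh(abs(j - j_max))
    return utility_term + regularizer

def train(w, theta_j, steps, **params):
    """Gradient descent with a numerical gradient: w <- w - theta_J * dL/dw."""
    history = [loss(w, **params)]
    eps = 1e-6
    for _ in range(steps):
        grad = (loss(w + eps, **params) - loss(w - eps, **params)) / (2 * eps)
        w = w - theta_j * grad
        history.append(loss(w, **params))
    return w, history

# Hypothetical game parameters
params = dict(p=1.0, h_s=1.0, h_j=0.8, n0=0.1, gamma=0.3, j_max=2.0, alpha=0.7)
w_trained, history = train(w=0.0, theta_j=0.1, steps=200, **params)
```

Tracking `history` makes it easy to check that the loss decreases, which is the practical sign that the weights are moving toward a higher attacker utility under the assumed loss.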

The sensing device cluster head node infers its defense strategy through a deep neural network. When the intelligent jamming attacker launches a multi-channel attack on the offloading link, the maximization of the game utility of the sensing device cluster head node can be formalized as follows:

For multi-channel attack mode, a deep neural network defense model, MSnet, is built to maximize the game utility of the sensing device cluster head node. MSnet is also trained to infer the optimal defense strategy vector of the cluster head node in multi-channel attack mode. The structure of MSnet is shown in Figure 4.

MSnet processes the channel gains of the sensing device cluster head node in multi-channel attack mode from input to output in the following steps:

① Input: the multi-channel gain vector of the sensing device node, normalized to mean 0 and variance 1.

② Data shaping: the input is reshaped into m two-dimensional matrices, one for each of the m channels.

③ The reshaped data is input to the convolution layer, where m 3×3 convolution kernels with a stride of 1 perform the first convolution operation and extract the key variations in the multi-channel gains of the sensing device cluster head node.

④ The convolution layer produces one output per channel, with the kernel entries as the weights; together these outputs form the output vector.

⑤ The output of the convolution layer is passed through ReLU as the activation function, giving one ReLU output per channel and, together, a multi-output vector.

⑥ A second convolution operation is then performed with m 3×3 convolution kernels with a stride of 1, giving one output per channel and a multi-output vector.

⑦ ReLU is used again as the activation function, giving one ReLU output per channel and a multi-output vector.

⑧ To speed up training, a max pooling layer with a 2×2 sliding window and a stride of 1 is applied, giving one max pooling output per channel and a multi-output vector.

⑨ Finally, the Sigmoid function is applied to each single output, and the multi-output vector obtained is (P_M,1, P_M,2, ..., P_M,m).

The MSnet training and inference process is as follows:

The sensing device cluster head node infers its defense strategy through MSnet. The cluster head node first randomly initializes the training weight vector.

After initialization, MSnet is trained by stochastic gradient descent, with the weights adjusted step by step through backpropagation so that the game utility of the cluster head node is maximized. During training, the cluster head node obtains the transmission power of the intelligent jamming attacker through game interaction. In multi-channel attack mode, the loss function with which the cluster head node maximizes its game utility is expressed as follows:

where α_s,i denotes the weight coefficient of the loss function, which balances the influence of the constraint on training, and (1-α_s,i)tanh(|P_i-P_max|) is the regularization term, through which the power constraint of the cluster head node takes part in training the weights of MSnet.

Taking the partial derivative of the loss function with respect to the power gives:

The weight update equation for training MSnet is:

where θ_s denotes the learning rate. After training, the trained MSnet is used to infer the transmission power strategy vector that the cluster head node uses for defense: in multi-channel attack mode, given the channel gain vector of the cluster head node as input, MSnet outputs the optimized power strategy vector. Stochastic gradient descent brings the DNN to convergence, and the Nash equilibrium point of the DNN-based Stackelberg game is (J_NN, P_MM).

At this point, in multi-channel attack mode, the utilities of the intelligent jamming attacker and of the sensing device cluster head node are both maximized.

When m = n = 1, that is, in single-channel attack mode, the maximization of the game utility of the sensing device cluster head node simplifies to:

MSnet then reduces to a single-layer network, SSnet, which is trained to learn and infer the game strategy of the cluster head node in single-channel attack mode so as to maximize the game utility of the cluster head node.

The structure of SSnet is shown in Figure 5: the network model consists of normalization, a fully connected layer, data shaping, convolution layers, a pooling layer, and so on.

SSnet processes the channel gain of the sensing device cluster head node from input to output in the following steps:

① To speed up the convergence of the cluster head node's game strategy learning and to ensure that the channel gains of the cluster head node are identically distributed, the input channel gains are normalized to a standard normal distribution with mean 0 and variance 1, where E(·) denotes the expectation and D(·) denotes the variance used in the normalization. The normalized channel gain vector of the cluster head node is input to the fully connected layer of SSnet.

② Data shaping: the output of the fully connected layer is reshaped into a two-dimensional matrix.

③ The reshaped data is input to the convolution layer, where a 3×3 convolution kernel with a stride of 1 performs the first convolution operation and extracts the key variations in the channel gain of the cluster head node.

④ Convolution layer 1 computes its output with the kernel entries as the weights.

⑤ The output of convolution layer 1 is passed through ReLU as the activation function to give the ReLU output.

⑥ A second convolution operation is then performed to give the output of convolution layer 2.

⑦ ReLU is used again as the activation function to give the ReLU output.

⑧ A max pooling layer is applied to obtain the output of the max pooling layer.

⑨ A fully connected layer stretches the output of the max pooling layer into an m×1 vector.

⑩ Finally, the Sigmoid function is applied to produce the output of SSnet.

SSnet训练及推理过程如下：The SSnet training and inference process is as follows:

传感设备簇头节点通过SSnet推理其防御策略。所以，传感设备簇头节点首先随机初始化训练权值

初始化完成后，使用随机梯度下降法训练SSnet，逐步反向传播调整权值，使得传感设备簇头节点的博弈效用最大。其中，传感设备簇头节点通过交互获得智能干扰攻击者的传输功率。传感设备簇头节点最大化其博弈效用的损失函数表示如下：The sensing device cluster head node infers its defense strategy through SSnet. Therefore, the cluster head node of the sensing device first randomly initializes the training weights

After the initialization is completed, the SSnet is trained using the stochastic gradient descent method, and the weights are adjusted by back-propagation step by step, so that the game utility of the cluster head node of the sensing device is maximized. Among them, the cluster head node of the sensing device obtains the transmission power of the intelligent interference attacker through interaction. The loss function of the sensor device cluster head node to maximize its game utility is expressed as follows:

其中，α_s表示损失函数权重系数，平衡约束对训练过程的影响，(1-α_s)tanh(|P-P_max|)为正则化项，传感设备簇头节点的功率约束参与训练SSnet的权值。Among them, α _s represents the weight coefficient of the loss function, the impact of the balance constraint on the training process, (1-α _s )tanh(|PP _max |) is the regularization term, and the power constraint of the sensor device cluster head node participates in the weight of training SSnet value.

通过对损失函数中的功率P求偏导，得到：By taking the partial derivative of the power P in the loss function, we get:

训练SSnet的权值更新方程为：The weight update equation for training SSnet is:

where θ_s is the learning rate. After SSnet training is complete, the trained SSnet is used to infer the transmission power strategy that the sensing device cluster head node uses for defense. When the channel gain vector of the sensing device cluster head node is input,

SSnet outputs the optimized power strategy P_M. Stochastic gradient descent brings the DNN to convergence, and the Nash equilibrium point of the DNN-based Stackelberg game is (J_N, P_M). At this point, in the single-channel attack mode, the utilities of the intelligent jamming attacker and the sensing device cluster head node are both maximized.
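The training procedure described above can be sketched in a heavily simplified form. The sketch makes several assumptions that are not taken from the original text: the utility is assumed to be a log2(1 + SINR) capacity minus a linear power cost, the loss is assumed to combine the negated utility (weighted by α_s) with the regularization term, the whole network is collapsed to a single trainable weight `w` whose Sigmoid output is scaled to (0, P_max), and a finite-difference gradient with plain gradient descent stands in for back-propagation with stochastic gradient descent. All constants are hypothetical.

```python
import math

P_MAX = 2.0    # hypothetical maximum transmission power P_max
ALPHA_S = 0.8  # hypothetical loss weight alpha_s
LAM = 0.1      # hypothetical cost per unit transmission power


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def power(w):
    """Sigmoid output scaled into the open interval (0, P_MAX)."""
    return P_MAX * sigmoid(w)


def loss(w, j, h_s=1.0, h_j=1.0, n0=0.1):
    """ASSUMED loss: -alpha_s * utility + (1 - alpha_s) * tanh(|P - P_max|)."""
    p = power(w)
    utility = math.log2(1.0 + h_s * p / (n0 + h_j * j)) - LAM * p  # ASSUMED form
    return -ALPHA_S * utility + (1.0 - ALPHA_S) * math.tanh(abs(p - P_MAX))


def train(j, theta_s=0.5, steps=200, eps=1e-6):
    """Gradient descent on w with learning rate theta_s."""
    w = 0.0
    for _ in range(steps):
        # Central finite difference replaces back-propagation in this sketch.
        grad = (loss(w + eps, j) - loss(w - eps, j)) / (2.0 * eps)
        w -= theta_s * grad  # weight update: w <- w - theta_s * dL/dw
    return w


w_trained = train(j=1.0)  # j: jamming power observed through interaction
print(power(w_trained), loss(w_trained, j=1.0))
```

After training, inference is just a call to `power(w_trained)`, which mirrors the text's point that the defender obtains its power strategy from a forward pass rather than by re-solving the optimization problem.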

4) According to the power allocation strategy P_MM of the sensor device cluster head node obtained in step (3), determine the decision configuration variable and perform task offloading.

In this embodiment, in a sensing edge cloud environment, a low-complexity and accurate defense is achieved when an intelligent jamming attacker with learning ability attacks the computing-task offloading link of a sensing device. Specifically, this is done through the following schemes:

For the sensing device computing-task offloading scenario, this embodiment establishes a computing-task offloading capacity model for sensing device cluster head nodes with sufficient resources, and designs a defense strategy against attacks on the offloading process by an intelligent jamming attacker with learning ability.

In particular, for the single-channel attack mode, this embodiment formulates the optimization problems in which the intelligent jamming attacker and the sensing device cluster head node each maximize their game utility. It establishes the deep neural network model SJnet, with which the intelligent jamming attacker optimizes its game strategy in the single-channel attack mode, and the deep neural network model SSnet, with which the sensing device cluster head node optimizes its game defense strategy. SJnet and SSnet are each trained with the goal of maximizing the utility of the corresponding game participant, so that when the intelligent jamming attacker changes its single-channel attack strategy, the sensing device cluster head node can use SSnet inference to quickly obtain the optimal power allocation strategy for defending against the intelligent jamming attack.
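The single-channel game structure can be illustrated without neural networks by a brute-force search, which is the computation that SJnet and SSnet are meant to approximate quickly. The sketch below is not the method of this embodiment. It assumes, hypothetically, that the offloading capacity is log2(1 + SINR), that the attacker's utility is the negated capacity minus a cost γ per unit jamming power, and that the defender's utility is the capacity minus a cost λ per unit transmission power. Following the leader and follower roles of the Stackelberg model, the follower (attacker) best-responds to each candidate power of the leader (cluster head node), and the leader then picks the power that maximizes its own utility given that response.

```python
import math


def capacity(p, j, h_s=1.0, h_j=1.0, n0=0.1):
    """ASSUMED offloading capacity: log2(1 + SINR) on a single channel."""
    return math.log2(1.0 + h_s * p / (n0 + h_j * j))


def attacker_utility(p, j, gamma=0.2):
    """ASSUMED form: reduce capacity, pay gamma per unit jamming power."""
    return -capacity(p, j) - gamma * j


def defender_utility(p, j, lam=0.1):
    """ASSUMED form: capacity minus lam per unit transmission power."""
    return capacity(p, j) - lam * p


def grid(upper, steps=200):
    """Evenly spaced candidate powers in [0, upper]."""
    return [upper * k / steps for k in range(steps + 1)]


def best_response_j(p, j_max):
    """Follower: jamming power that maximizes the attacker's utility for p."""
    return max(grid(j_max), key=lambda j: attacker_utility(p, j))


def stackelberg(p_max, j_max):
    """Leader: choose p anticipating the follower's best response."""
    p_star = max(grid(p_max),
                 key=lambda p: defender_utility(p, best_response_j(p, j_max)))
    return p_star, best_response_j(p_star, j_max)


p_star, j_star = stackelberg(p_max=2.0, j_max=2.0)
print(p_star, j_star)
```

The nested search costs one inner sweep per leader candidate, which is the kind of repeated optimization that a trained network replaces with a single forward pass.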

More generally, for the multi-channel attack mode, the optimization problems in which the intelligent jamming attacker and the sensing device cluster head node each maximize their game utility are likewise formulated. The deep neural network model MJnet, with which the intelligent jamming attacker optimizes its multi-channel attack game strategy, and the deep neural network model MSnet, with which the sensing device cluster head node optimizes its multi-channel defense strategy, are established. MJnet and MSnet are each trained with the goal of maximizing the game utility of the intelligent jamming attacker and of the sensing device cluster head node, respectively, in the multi-channel mode, so that when the intelligent jamming attacker changes its multi-channel attack strategy, the sensing device cluster head node can use MSnet inference to quickly obtain the optimal power allocation strategy vector for defending against the multi-channel attack.
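To make the multi-channel utilities concrete, the following sketch evaluates them for given power vectors. The utility formulas themselves are not reproduced in this text, so the form used here is an assumption: a sum over channels of a_{s,i}·log2(1 + h_{s,i}P_i / (n_{0,i} + h_{J,i}J_i)), minus λ per unit transmission power for the cluster head node, and the negated sum minus γ per unit jamming power for the attacker. The symbols follow the definitions given in the claims; channels that the attacker does not jam are represented by J_i = 0, which is also an assumption.

```python
import math


def channel_rates(P, J, a, h_s, h_J, n0):
    """Per-channel rate a_i * log2(1 + SINR_i); ASSUMED capacity model."""
    return [a_i * math.log2(1.0 + hs_i * p_i / (n0_i + hj_i * j_i))
            for a_i, hs_i, p_i, n0_i, hj_i, j_i in zip(a, h_s, P, n0, h_J, J)]


def defender_utility(P, J, a, h_s, h_J, n0, lam):
    """ASSUMED form: total rate minus lam per unit transmission power."""
    return sum(channel_rates(P, J, a, h_s, h_J, n0)) - lam * sum(P)


def attacker_utility(P, J, a, h_s, h_J, n0, gamma):
    """ASSUMED form: negated total rate minus gamma per unit jamming power."""
    return -sum(channel_rates(P, J, a, h_s, h_J, n0)) - gamma * sum(J)


# Hypothetical two-channel example: only channel 1 is used for offloading.
P = [1.0, 1.0]
J = [0.0, 0.0]
a = [1, 0]
h_s, h_J, n0 = [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]
print(defender_utility(P, J, a, h_s, h_J, n0, lam=0.1))
print(attacker_utility(P, J, a, h_s, h_J, n0, gamma=0.2))
```

Under these assumptions, power spent on a channel with a_{s,i}=0 only adds cost for the cluster head node, which illustrates why the allocation vector rather than the total power is the decision variable.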

Those skilled in the art will readily understand that the above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A game defense strategy optimization method for intelligent jamming attacks in a sensing edge cloud, characterized in that it comprises the following steps:

(1) Obtain the transmission power vector P allocated to the computing task by the initial sensor device cluster head node set: P=(P_1, P_2, ..., P_m), where m is the number of channel resources available to the sensor device cluster head node;

(2) According to the Stackelberg model, with the sensing device cluster head node as the leader and the intelligent jamming attacker as the follower, and according to the channel gain vector of the n sensing device cluster head nodes attacked by the intelligent jamming attacker, calculate the power allocation vector J_NN of the intelligent jamming attacker over the n channels it attacks when the game utility of the intelligent jamming attacker is maximized, as the power allocation strategy of the intelligent jamming attacker; the game utility of the intelligent jamming attacker is calculated as follows:

where n is the total number of channels attacked by the intelligent jamming attacker; P is the transmission power vector allocated to the computing task by the sensor device cluster head node set; J is the transmission power vector of the intelligent jamming attacker, J=(J_1, J_2, ..., J_n); a_{s,i}=1 if the i-th channel of the s-th sensor device cluster head node is used to offload the computing task, otherwise a_{s,i}=0; h_{s,i} is the channel gain of the uplink computing-task offloading link of the s-th sensor device cluster head node on the i-th channel; P_i is the transmission power of the sensor device cluster head node on the i-th channel; n_{0,i} is the noise power of the i-th channel; h_{J,i} is the channel gain of the intelligent jamming attacker on the i-th channel; J_i is the transmission power of the intelligent jamming attacker on the i-th channel; and γ is the jamming attack cost per unit jamming power of the intelligent jamming attacker;

(3) According to the Stackelberg model, on the premise that the intelligent jamming attacker adopts the power allocation strategy of step (2), and according to the channel gain vector of the m channels of the sensing device cluster head node, calculate the transmission power allocation vector of the sensing device cluster head node over the m available channels when its game utility is maximized and the Nash equilibrium point is reached, as the power allocation strategy of the sensing device cluster head node; the game utility of the sensing device cluster head node is calculated as follows:

where m is the total number of channel resources available to the sensor device cluster head node; P is the transmission power vector allocated to the computing task by the sensor device cluster head node set; J is the transmission power vector of the intelligent jamming attacker, J=(J_1, J_2, ..., J_n); a_{s,i}=1 if the i-th channel of the s-th sensor device cluster head node is used to offload the computing task, otherwise a_{s,i}=0; h_{s,i} is the channel gain of the uplink computing-task offloading link of the s-th sensor device cluster head node on the i-th channel; P_i is the transmission power of the sensor device cluster head node on the i-th channel; n_{0,i} is the noise power of the i-th channel; h_{J,i} is the channel gain of the intelligent jamming attacker on the i-th channel; J_i is the transmission power of the intelligent jamming attacker on the i-th channel; and λ is the transmission cost per unit transmission power of the sensing device cluster head node;

(4) According to the power allocation strategy P_MM of the sensor device cluster head node obtained in step (3), determine the decision configuration variable and perform task offloading.

2. The game defense strategy optimization method for intelligent jamming attacks in a sensing edge cloud as claimed in claim 1, wherein, in step (2), the power allocation strategy of the intelligent jamming attacker when its game utility is maximized is calculated by establishing an intelligent attack model with a deep neural network and predicting the power allocation strategy of the intelligent jamming attacker according to the channel gain vector H_{s,i}=(h_{s,1}, h_{s,2}, ..., h_{s,n}) of the n sensor device cluster head nodes attacked by the intelligent jamming attacker.

3. The game defense strategy optimization method for intelligent jamming attacks in a sensing edge cloud as claimed in claim 2, characterized in that maximizing the game utility of the intelligent jamming attacker in step (2) is expressed as:

where J_max is the maximum transmission power of the intelligent jamming attacker;

The intelligent attack model established by the deep neural network includes an input layer, a normalization layer, a fully connected layer, a data shaping layer, a convolution module, a pooling layer group, and an output layer that are sequentially connected;

The input layer is used to input the channel gain vector H_{s,i}=(h_{s,1}, h_{s,2}, ..., h_{s,n}) of the n sensor device cluster head nodes attacked by the intelligent jamming attacker to the normalization layer;

The normalization layer normalizes the channel gain vector H_{s,i}=(h_{s,1}, h_{s,2}, ..., h_{s,n}) of the sensor device cluster head node to obtain the normalized sensor device cluster head node channel gain vector, which is input to the data shaping layer through the fully connected layer;

The data shaping layer is used to reshape the output of the fully connected layer into a two-dimensional matrix, reshape(h_{s,i}), and input it to the convolution module;

The convolution module includes two groups of convolution layers, each followed by the ReLU (rectified linear unit) activation function, and each group of convolution layers includes n convolution kernels; the reshaped, normalized channel gains of the sensor device cluster head node undergo a first convolution operation with the corresponding convolution kernels and pass through the activation function, then undergo a second convolution operation with the corresponding convolution kernels and, after the activation function, are output to the pooling layer group;

The pooling layer group includes n parallel pooling layers and fully connected layers;

The output layer is used to output the transmission power vector (J_{N,1}, J_{N,2}, ..., J_{N,n}) of the intelligent jamming attacker at the Nash equilibrium point of the Stackelberg game, which is the power allocation strategy of the intelligent jamming attacker.

4. The game defense strategy optimization method for intelligent jamming attacks in a sensing edge cloud as claimed in claim 3, characterized in that the intelligent attack model established with a deep neural network is obtained by training according to the following method:

Randomly initialize the multi-channel training weight vector

The weight update equation for training the intelligent attack model is:

where θ_J is the learning rate.

5. The game defense strategy optimization method for intelligent jamming attacks in a sensing edge cloud as claimed in claim 1, wherein the transmission power allocation strategy of the sensing device cluster head node when its game utility is maximized so as to reach the Nash equilibrium point is calculated by establishing an intelligent defense model with a deep neural network, from which the power allocation strategy of the sensing device cluster head node is obtained.

6. The game defense strategy optimization method for intelligent jamming attacks in a sensing edge cloud as claimed in claim 5, characterized in that maximizing the game utility of the sensing device cluster head node in step (3) is expressed as:

where P_max is the maximum transmission power of the sensor device cluster head node;

The intelligent defense model established by using a deep neural network includes an input layer, a normalization layer, a fully connected layer, a data shaping layer, a convolution module, a pooling layer group, and an output layer that are sequentially connected;

The input layer is used to input the channel gain vector H_{s,i}=(h_{s,1}, h_{s,2}, ..., h_{s,m}) of the m channels of the sensor device cluster head node to the normalization layer;

The normalization layer normalizes the channel gain vector H_{s,i}=(h_{s,1}, h_{s,2}, ..., h_{s,m}) of the sensor device cluster head node to obtain the normalized sensor device cluster head node channel gain vector, which is input to the data shaping layer through the fully connected layer;

The convolution module includes two groups of convolution layers, each followed by the ReLU (rectified linear unit) activation function, and each group of convolution layers includes m convolution kernels; the reshaped, normalized channel gains of the sensor device cluster head node undergo a first convolution operation with the corresponding convolution kernels and pass through the activation function, then undergo a second convolution operation with the corresponding convolution kernels and, after the activation function, are output to the pooling layer group;

The pooling layer group includes m parallel pooling layers and fully connected layers;

The output layer is used to output the power vector P_MM=(P_{M,1}, P_{M,2}, ..., P_{M,m}) of the sensor device cluster head node at the Nash equilibrium point of the Stackelberg game, which is the power allocation strategy of the sensor device cluster head node.

7. The game defense strategy optimization method for intelligent jamming attacks in a sensing edge cloud as claimed in claim 6, characterized in that the intelligent defense model in step (3) is obtained by training according to the following method:

Randomly initialize the multi-channel training weight vector

where α_s is the weight coefficient of the loss function, which balances the influence of the constraint on the training process; (1-α_s)tanh(|P-P_max|) is the regularization term, through which the power constraint of the sensor device cluster head node participates in the training;

By taking the partial derivative of the power in the loss function, we get:

The weight update equation for training the intelligent defense model is:

where θ_s is the learning rate.

8. A game defense strategy optimization system for intelligent jamming attacks in a sensing edge cloud, characterized in that it comprises an initialization module, an intelligent jamming attacker prediction module, a defense strategy decision module, and a configuration module;

The initialization module is used to obtain the transmission power vector P allocated to the computing task by the initial sensor device cluster head node set, P=(P_1, P_2, ..., P_m), and submit it to the intelligent jamming attacker prediction module, where m is the number of channel resources available to the sensing device cluster head node;

The intelligent jamming attacker prediction module is used to calculate, according to the Stackelberg model with the sensing device cluster head node as the leader and the intelligent jamming attacker as the follower, and according to the channel gain vector of the n sensing device cluster head nodes attacked by the intelligent jamming attacker, the power allocation vector J_NN of the intelligent jamming attacker over the n channels it attacks when the game utility of the intelligent jamming attacker is maximized, as the power allocation strategy of the intelligent jamming attacker, and to submit it to the defense strategy decision module; the game utility of the intelligent jamming attacker is calculated as follows:

The defense strategy decision module is used to calculate, according to the Stackelberg model, on the premise that the intelligent jamming attacker adopts its power allocation strategy, and according to the channel gain vector of the m channels of the sensing device cluster head node, the transmission power allocation vector of the sensing device cluster head node over the m available channels when its game utility is maximized and the Nash equilibrium point is reached, as the power allocation strategy of the sensing device cluster head node, and to submit it to the configuration module; the game utility of the sensing device cluster head node is calculated as follows:

The configuration module is used to determine the decision configuration variable according to the power allocation strategy P_MM of the sensor device cluster head node and to perform task offloading.