
CN115550031A - Network defense method, control device and storage medium - Google Patents

Network defense method, control device and storage medium

Info

Publication number
CN115550031A
CN115550031A
Authority
CN
China
Prior art keywords: attack, defense, strategy, terminal, target working
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211192687.4A
Other languages
Chinese (zh)
Inventor
王超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202211192687.4A
Publication of CN115550031A
Legal status: Pending

Classifications

    • H: Electricity
    • H04: Electric communication technique
    • H04L: Transmission of digital information, e.g. telegraphic communication
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/20: Network architectures or network communication protocols for managing network security; network security policies in general
    • H04L 63/205: Involving negotiation or determination of the one or more network security mechanisms to be used, e.g. by negotiation between the client and the server or between peers or by selection according to the capabilities of the entities involved
    • H04L 63/14: Network architectures or network communication protocols for detecting or protecting against malicious traffic
    • H04L 63/1441: Countermeasures against malicious traffic
    • H04L 63/1491: Countermeasures against malicious traffic using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract



This application provides a network defense method, a control device and a storage medium. After the control device establishes an attack-defense game model, it repeatedly obtains the attack strategy information of an attacking terminal attacking a target working terminal, as observed by the control device, together with the prior probability of the attacking terminal's identity type; determines, from the attack strategy information and the prior probability, the optimal defense strategy and the optimal attack strategy against the target working terminal, and executes the optimal defense strategy; and computes a Markov learning function value from the optimal defense strategy and the attack strategy information. The loop stops when the Markov learning function value indicates that the target working terminal is no longer in a safe state. Because the control device judges the attack state of the target working terminal from attack strategy information gathered continuously over time, it can detect covert, persistent attacks and protect the target working terminal more effectively.


Description

Network defense method, control device and storage medium

Technical Field

Embodiments of the present application relate to the technical field of network security, and in particular to a network defense method, a control device, and a storage medium.

Background Art

As the era of the Internet of Everything arrives, the security problems of all kinds of terminal devices grow increasingly serious, and existing network defense architectures based on passive-scanning intrusion detection are stretched thin against current network attacks. In one existing method, a security network device extracts effective relationship paths from the interaction relationships among items to be detected (for example, IP addresses, interface calls, and program executions) in the data under inspection, and performs anomaly detection on those paths to obtain a threat detection result for the target host, where an effective relationship path comprises at least two detection items and the interaction relationships among them. However, such a network security device can hardly identify unknown types of network attacks that are highly covert, large-scale, and intelligent.

Summary of the Invention

The present application provides a network defense method, a control device, and a storage medium, to solve the technical problem that covert, persistent attack operations are hard for a security network device to detect, so that timely defense cannot be achieved.

In a first aspect, the present application provides a network defense method. The method is applied to a control device located in a target system, the target system comprising at least one working terminal and a honeynet cluster, and the method comprises:

establishing an attack-defense game model, where the attack-defense game model comprises multiple game participants and, for each game participant, a strategy space, a signal space, a prior probability, a posterior probability, and a payoff;

repeatedly obtaining the attack strategy information of an attacking terminal attacking a target working terminal, as observed by the control device, and the prior probability of the attacking terminal's identity type;

determining, according to the attack strategy information and the prior probability, an optimal defense strategy and the attacking terminal's optimal attack strategy against that optimal defense strategy, and executing the optimal defense strategy;

calculating a Markov learning function value according to the optimal defense strategy and the attack strategy information; and

determining, according to the Markov learning function value, whether the target working terminal is in a safe state, and terminating the loop when the target working terminal is not in a safe state.
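Put together, the steps of the first aspect form an observe, decide, defend, learn loop. A minimal sketch in Python; every name here (observe_attack, choose_strategies, and so on) is a hypothetical stand-in for the steps above, not anything defined in the patent:

```python
# Hypothetical sketch of the claimed defense loop; all callables are
# illustrative stand-ins for the patent's steps.

def defense_loop(observe_attack, prior, choose_strategies,
                 execute_defense, q_update, safety_threshold):
    """Repeat: observe attack info, pick the optimal defense and the
    attacker's best response, execute the defense, update the Markov
    learning value; stop once the terminal is no longer safe."""
    q_value = 0.0  # Markov learning function value in the initial state
    while True:
        attack_info = observe_attack()                     # observe attack strategy info
        defense, attack = choose_strategies(attack_info, prior)  # optimal strategies
        execute_defense(defense)                           # execute the optimal defense
        q_value = q_update(q_value, defense, attack_info)  # Markov learning update
        if q_value > safety_threshold:                     # terminal not in a safe state
            break                                          # terminate the loop
    return q_value
```

Injecting the steps as callables keeps the sketch independent of any concrete game model or intrusion-detection backend.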

In the above technical solution, after establishing the attack-defense game model, the control device obtains, over at least one iteration of the loop, the prior probability and the attack strategy information it observes. In each iteration it determines the control device's optimal defense strategy against the attacking terminal and the attacking terminal's optimal attack strategy against that defense, and uses the Markov learning function to judge whether the target working terminal is endangered by the current round of attacks, terminating the loop once the terminal is determined not to be in a safe state. The control device thereby monitors the multiple attacks carried out by the attacking terminal and accumulates the harm those attacks cause to the target system, so that covert, persistent attack operations become detectable. As a result, once the control device determines that the target working terminal is not in a safe state, it can protect that terminal in time, effectively safeguarding its security.

Optionally, determining the optimal defense strategy and the optimal attack strategy against the target working terminal according to the attack strategy information and the prior probability specifically comprises:

calculating the posterior probability of the attacking terminal's identity type according to the attack strategy information and the prior probability;

determining the attacking terminal's identity type according to the posterior probability;

determining a system cost and a type cost according to the attack strategy information and the identity type;

calculating the attacking terminal's attacker payoff and the target system's defender payoff according to the system cost, the type cost, the attack strategy information, and the posterior probability; and

determining the optimal defense strategy and the optimal attack strategy against the target working terminal according to the posterior probability and the defender payoff.
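The first two optional steps are a Bayes update of the identity-type distribution followed by picking the most probable type. A sketch under the assumption that the game model supplies a likelihood P(observed attack signal | identity type); all of the numbers below are illustrative, not from the patent:

```python
def posterior_identity(prior, likelihood, signal):
    """Bayes update of the identity-type distribution after observing an
    attack signal. prior: {type: P(type)};
    likelihood: {type: {signal: P(signal | type)}}."""
    joint = {t: prior[t] * likelihood[t][signal] for t in prior}
    total = sum(joint.values())
    return {t: p / total for t, p in joint.items()}

# Illustrative numbers: an aggressive scan is far likelier to come from
# an attacker than from a legitimate user.
prior = {"User": 0.7, "Attacker": 0.3}
likelihood = {"User": {"scan": 0.05, "normal": 0.95},
              "Attacker": {"scan": 0.60, "normal": 0.40}}
posterior = posterior_identity(prior, likelihood, "scan")
identity = max(posterior, key=posterior.get)  # most probable identity type
```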

Optionally, determining the system cost and the type cost according to the attack strategy information and the identity type specifically comprises:

querying the attack strategy information in a system cost mapping table to obtain the system cost corresponding to the attack strategy information; and

querying the attack strategy information and the identity type in a type cost mapping table to obtain the type cost corresponding to the attack strategy information.
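Both lookups are plain table queries: the system cost is keyed by the attack strategy alone, the type cost by the (attack strategy, identity type) pair. A sketch with placeholder table contents, since the patent does not specify them:

```python
# Illustrative mapping tables; the patent does not specify their contents.
system_cost_table = {"ddos": 8.0, "scan": 1.5}
type_cost_table = {("ddos", "Attacker"): 5.0, ("ddos", "User"): 0.5,
                   ("scan", "Attacker"): 2.0, ("scan", "User"): 0.2}

def lookup_costs(attack_strategy, identity_type):
    """Return (system cost, type cost) for the observed attack strategy
    and the inferred identity type."""
    system_cost = system_cost_table[attack_strategy]
    type_cost = type_cost_table[(attack_strategy, identity_type)]
    return system_cost, type_cost
```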

Optionally, the game participants' strategy spaces include a defense strategy space containing at least one defense strategy, and calculating the Markov learning function value according to the optimal defense strategy and the attack strategy information specifically comprises:

calculating the attack-defense return value of the target working terminal in the current state according to the optimal defense strategy and the attack strategy information;

obtaining the cumulative return value of the target working terminal from its initial state to the current state, and the Markov learning function value of the target working terminal in the current state;

calculating the maximum cumulative return value of the target working terminal in the next state according to the attacking terminal's optimal attack strategy against the optimal defense strategy, all of the defense strategies, the cumulative return value, and the attack-defense return value; and

calculating the Markov learning function value of the target working terminal in the state following the current state, according to the attack-defense return value in the current state, the maximum cumulative return value in the next state, and the Markov learning function value in the current state.

Optionally, the game participants' strategy spaces further include an attack strategy space containing at least one attack strategy, and calculating the attack-defense return value of the target working terminal in the current state according to the optimal defense strategy and the attack strategy information specifically comprises:

calculating the attack-defense return value of the target working terminal in the current state according to the optimal defense strategy, the attack strategy information, and an attack-defense return value calculation formula,

where the attack-defense return value calculation formula specifically comprises:

Figure BDA0003870099590000031

where sp denotes the p-th state after the current state of the target working terminal; Re(fx(sp)) denotes the attack-defense return value of target working terminal x in state p; i denotes a defense identifier in the defense strategy space and j an attack identifier in the attack strategy space; i=0 means that target working terminal x is not defended in state p, and i≠0 means that it is defended by the i-th defense strategy in state p; j=0 means that target working terminal x is not attacked in state p, and j≠0 means that it is attacked by the j-th attack strategy in state p; P denotes the protection value; Di denotes the return value when target working terminal x is protected by the i-th defense strategy; δi denotes a return coefficient; D denotes the attack return value when the target working terminal is attacked without the protection of a defense strategy; and Pkij denotes the return value when the game-theoretic model adopts the i-th protection strategy to resist the j-th attack strategy.

Optionally, calculating the maximum cumulative return value of the target working terminal in the next state according to the attacking terminal's optimal attack strategy against the optimal defense strategy, all of the defense strategies, the cumulative return value, and the attack-defense return value specifically comprises:

calculating, according to the attacking terminal's optimal attack strategy against the optimal defense strategy, each of the defense strategies, and the attack-defense return value formula, the attack-defense return value in the next state of the target working terminal as adjusted by each defense strategy;

adding each such attack-defense return value to the cumulative return value, to compute the cumulative return value in the next state of the target working terminal as adjusted by each defense strategy; and

determining the maximum among the next-state cumulative return values corresponding to the defense strategies as the maximum cumulative return value.
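The three steps above amount to the following: evaluate the next-state return for every candidate defense against the attacker's best response, add the running cumulative return, and keep the maximum. A sketch with an illustrative reward table; the reward function's signature is a hypothetical stand-in for the formula in the patent figure:

```python
def max_cumulative_return(defense_strategies, best_attack, cumulative, reward):
    """Maximum next-state cumulative return over all candidate defenses.
    reward(defense, attack) stands in for the attack-defense return formula."""
    return max(cumulative + reward(d, best_attack) for d in defense_strategies)

# Illustrative next-state returns of each defense against the attacker's
# best response to the current optimal defense.
reward_table = {("firewall", "ddos"): -2.0,
                ("honeypot", "ddos"): 3.0,
                ("none", "ddos"): -6.0}
best = max_cumulative_return(["firewall", "honeypot", "none"], "ddos", 10.0,
                             lambda d, a: reward_table[(d, a)])
```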

Optionally, calculating the Markov learning function value of the target working terminal in the state following the current state, according to the attack-defense return value in the current state, the maximum cumulative return value in the next state, and the Markov learning function value in the current state, specifically comprises:

calculating the Markov learning function value of the target working terminal in the state following the current state according to the attack-defense return value in the current state, the maximum cumulative return value in the next state, the Markov learning function value in the current state, and a learning function update formula, where the learning function update formula specifically comprises:

Figure BDA0003870099590000041

Figure BDA0003870099590000042

where Figure BDA0003870099590000043 denotes the Markov learning function value of the target working terminal in the state following the current state; Figure BDA0003870099590000044 denotes the Markov learning function value of the target working terminal in the current state; α denotes the learning-rate parameter and γ the discount-rate parameter; Figure BDA0003870099590000045 denotes the maximum cumulative return value in the next state; s0 denotes the initial state of the target working terminal, and Re(fx(s0)) the attack-defense return value corresponding to the initial state; s1 denotes the state following s0, and Re(fx(s1)) the attack-defense return value corresponding to state s1; sp denotes the current state of the terminal, and Re(fx(sp)) the attack-defense return value corresponding to the current state; and sp+1 denotes the state following the current state of the target working terminal, with Re(fx(sp+1)) the attack-defense return value corresponding to that next state.
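In this copy the update formula itself survives only as image placeholders, but the inputs listed (current learning value, current return, next-state maximum cumulative return, learning rate α, discount rate γ) match a conventional Q-learning-style update. A sketch under that assumption; the exact form in the patent figure may differ:

```python
def markov_q_update(q_current, reward_current, max_cumulative_next,
                    alpha=0.1, gamma=0.9):
    """Assumed Q-learning-style update: blend the current learning value
    with the observed return plus the discounted best next-state
    cumulative return. alpha is the learning rate, gamma the discount rate."""
    return (1 - alpha) * q_current + alpha * (reward_current
                                              + gamma * max_cumulative_next)
```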

Optionally, determining whether the target working terminal is in a safe state according to the Markov learning function value specifically comprises:

determining that the target working terminal is in the safe state when the Markov learning function value is less than or equal to a safety threshold; and

determining that the target working terminal is not in the safe state when the Markov learning function value is greater than the safety threshold.
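The safety decision is then a plain threshold test on the learning-function value:

```python
def is_safe(q_value, safety_threshold):
    """Safe while the Markov learning function value stays at or below the
    safety threshold; otherwise the loop terminates."""
    return q_value <= safety_threshold
```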

In a second aspect, the present application provides a control device, comprising a processor and a memory communicatively connected to the processor, where

the memory stores computer instructions, and

the processor, when executing the computer instructions, implements the network defense method of the first aspect.

In a third aspect, the present application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the network defense method of the first aspect.

The present application provides a network defense method, a control device, and a storage medium. The control device establishes an attack-defense game model and then repeatedly obtains the attack strategy information of the attacking terminal attacking the target working terminal, as observed by the control device, together with the prior probability of the attacking terminal's identity type; after each such observation it determines, from the attack strategy information and the prior probability, the optimal defense strategy and the optimal attack strategy against the target working terminal, executes the optimal defense strategy, and computes a Markov learning function value from the optimal defense strategy and the attack strategy information, stopping the loop when the Markov learning function value shows that the target working terminal is not in a safe state. In this way the control device determines the attack state of the target working terminal from the persistent attack strategy information gathered for it, which helps it identify covert attacks from continuously obtained information and better protect the target working terminal.

Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain its principles.

FIG. 1 is a diagram of an application scenario of the network defense method provided according to an exemplary embodiment of the present application;

FIG. 2 is a schematic flowchart of the network defense method provided according to an exemplary embodiment of the present application;

FIG. 3 is a schematic diagram of the calculation flow of the Markov learning function value provided according to an exemplary embodiment of the present application;

FIG. 4 is a schematic structural diagram of a network defense apparatus provided according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a control device provided according to an embodiment of the present application.

The above drawings show specific embodiments of the present application, which are described in more detail below. The drawings and the accompanying text are not intended to limit the scope of the inventive concept in any way, but to illustrate the concept of the application for those skilled in the art by reference to specific embodiments.

Detailed Description

Exemplary embodiments are described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.

As the era of the Internet of Everything arrives, the security problems of all kinds of terminal devices grow increasingly serious, and existing network defense architectures based on passive-scanning intrusion detection are stretched thin against current network attacks. In one existing method, a security network device extracts effective relationship paths from the interaction relationships among items to be detected (for example, IP addresses, interface calls, and program executions) in the data under inspection, and performs anomaly detection on those paths to obtain a threat detection result for the target host, where an effective relationship path comprises at least two detection items and the interaction relationships among them. However, such a network security device can hardly identify unknown types of network attacks that are highly covert, large-scale, and intelligent.

The present application provides a network defense method, a control device, and a storage medium, aiming to solve the technical problem that covert, persistent attack operations are hard for a security network device to detect, so that timely defense cannot be achieved. The technical concept of the present application is as follows: the control device establishes an attack-defense game model and repeatedly obtains the attack strategy information of the attacking terminal attacking the target working terminal, as observed by the control device, together with the prior probability of the attacking terminal's identity type; after each such observation it determines, from the attack strategy information and the prior probability, the optimal defense strategy and the optimal attack strategy against the target working terminal, executes the optimal defense strategy, and computes a Markov learning function value from the optimal defense strategy and the attack strategy information. The loop stops when the Markov learning function value shows that the target working terminal is not in a safe state, at which point the working terminal's requests are forwarded elsewhere so that the target working terminal is no longer attacked. The control device thus determines the attack state of the target working terminal from persistently gathered attack strategy information, which helps it identify covert attacks and better protect the target working terminal.

FIG. 1 is a diagram of an application scenario of the network defense method provided according to an exemplary embodiment of the present application. As shown in FIG. 1, the scenario includes an attacking terminal 106, a routing device 105, a first OpenFlow switch 102, a second OpenFlow switch 103, a third OpenFlow switch 104, a control device 101, an intrusion detection device 108, an SDN controller 107, a work area 109, and a honeynet cluster 111. The work area 109 includes multiple working terminals, and the honeynet cluster 111 includes multiple honeynet virtual terminals, each of which runs a system that virtually mirrors the working terminals in the work area 109. The attacking terminal 106 is connected to the routing device 105; the routing device 105 is connected to the first OpenFlow switch 102; the first OpenFlow switch 102 is connected to the second OpenFlow switch 103, the third OpenFlow switch 104, the control device 101, and the SDN controller 107; the intrusion detection device 108 is connected to the SDN controller 107; the work area 109 is connected to the second OpenFlow switch 103; and the honeynet cluster 111 is connected to the third OpenFlow switch 104.

When the attacking terminal 106 sends an access request to the target working terminal 110 in the work area 109, the request first goes to the routing device 105, which forwards it to the first OpenFlow switch 102. The first OpenFlow switch 102 forwards the access request through the second OpenFlow switch 103 to the target working terminal 110 and, at the same time, sends it via the SDN controller 107 to the intrusion detection device 108. The intrusion detection device 108 runs an intrusion detection system (IDS), which inspects the access request and detects whether it poses a risk of a malicious attack on the target working terminal 110.

If a risk exists, the detection result is fed back to the control device 101, and the control device 101 directs the first OpenFlow switch 102 to forward any access request sent again by the attacking terminal 106 to the target honeynet device 112 in the honeynet cluster 111 that loads a mirror image of the target working terminal 110; the target honeynet device 112 is hidden behind a firewall, and all inbound and outbound data is monitored, captured, and controlled. If no risk exists but the target working terminal 110 determines that the access request belongs to an abnormal operation type, the target working terminal 110 feeds that determination back to the control device 101, which determines the corresponding defense strategy through the attack-defense game model and uses the Markov learning function to determine whether the target working terminal 110 will be safe when it receives the next access request.

More specifically, the control device 101 establishes the attack-defense game model and obtains the attack strategy information in the access requests it observes, the prior probability of the identity type of the attacking terminal 106, the system cost, and the type cost, where the identity type of the attacking terminal 106 is either a user (User) or an attacker (Attacker). From the attack strategy information, the prior probability, the system cost, and the type cost, the control device 101 determines the optimal defense strategy and the attacking terminal 106's optimal attack strategy against that defense, executes the optimal defense strategy, and at the same time computes the Markov learning function value of the target working terminal 110 in the current state from the optimal defense strategy and the attack strategy information, thereby determining whether the target working terminal 110 is in a safe state. If the target working terminal 110 is in a safe state, the control device 101 continues, on the next access request, to determine a defense strategy through the attack-defense game model to protect the terminal; if it is not, the control device 101 directly forwards the next access request to the honeynet cluster 111, protecting the target working terminal 110 from further attacks by the attacking terminal 106.

Based on the above application scenario, some embodiments of the present application are described in detail below with reference to the accompanying drawings. Where there is no conflict between embodiments, the following embodiments and the features in the embodiments may be combined with each other. In addition, the sequence of steps in each of the following method embodiments is only an example rather than a strict limitation.

Fig. 2 is a schematic flowchart of a network defense method provided by the present application according to an exemplary embodiment. As shown in Fig. 2, the network defense method includes:

S201. The control device establishes an attack-defense game model.

The attack-defense game model is built on game theory. In game theory, two or more fully rational decision makers each optimize their own strategy, by anticipating the strategies of the other decision makers, so as to maximize their own payoff.

The attack-defense game model includes multiple game participants, the strategy space of each participant, a signal space, prior probabilities, posterior probabilities, and the payoff of each participant. The model can be expressed as a six-tuple ADGMA = {P, A, S, p, p_post, Reward}, where P = {Actor, Defender} represents the two parties in the game, i.e., the subjects capable of making independent decisions and taking actions in the game; in a network attack, this usually refers to the attacking side and the defending side. In defense against unknown threats, the defender cannot determine the user's identity, while the defender's own identity information is known to all parties. The participants in the model are the Actor and the Defender, where the Actor has two types, Type = {Attacker, User}. When there are multiple Actors among the game participants, the control device assigns a corresponding unique identifier to each Actor, for example: Actor1, Actor2.

A = {A_actor = (m_1, m_2, ..., m_x), A_defender = (a_1, a_2, ..., a_y)} is the strategy space of each participant, representing the set of concrete actions each participant can take; it can be observed by all participants and implicitly carries the participant's type information. When there are multiple Actors among the participants, each Actor corresponds to its own strategy space.

S = {s_1, s_2, ..., s_n} represents the signal space. The information about an Actor's true type conveyed by its behavior is called a "signal", and can be used to guide the judgment and revision of that type.

p represents the prior probability, i.e., the Defender's initial judgment that the Actor's type is Attacker: p(Type = Attacker) = θ and p(Type = User) = 1 − θ.

p_post represents the posterior probability, i.e., the probability obtained after the Defender applies Bayes' rule to the acquired signal to revise the prior probability of its judgment of the Actor's type.

Reward = {R_A, R_D} represents the net payoff that each participant's actions can bring. It is usually related to the value and the cost of the action, and is the fundamental basis on which a participant chooses its actions.

After establishing the attack-defense game model, the control device initializes it: the prior probability is initialized to 0.5, the posterior probability to 0, and the net payoff of each participant to 0.
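As an illustrative sketch only (the class and field names below are hypothetical and not part of the claimed method), the six-tuple ADGMA and the initialization described above can be represented as:

```python
from dataclasses import dataclass, field

@dataclass
class AttackDefenseGameModel:
    """Six-tuple ADGMA = {P, A, S, p, p_post, Reward}."""
    participants: tuple = ("Actor", "Defender")   # P
    actor_strategies: tuple = ()                  # A_actor = (m_1, ..., m_x)
    defender_strategies: tuple = ()               # A_defender = (a_1, ..., a_y)
    signals: list = field(default_factory=list)   # signal space S
    prior_attacker: float = 0.5                   # p(Type = Attacker), initialized to 0.5
    posterior: float = 0.0                        # p_post, initialized to 0
    reward_actor: float = 0.0                     # R_A, initialized to 0
    reward_defender: float = 0.0                  # R_D, initialized to 0

# Initialization as described above: prior 0.5, posterior 0, net payoffs 0.
model = AttackDefenseGameModel()
```

In each round of the game, the computed posterior would overwrite `prior_attacker` so that it serves as the prior for the next round.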

S202. The control device repeatedly performs the step of obtaining the attack strategy information, observed by the control device, of the attacking terminal attacking the target working terminal, and the prior probability of the identity type of the attacking terminal.

Here, the control device is the Defender in the attack-defense game model established in step S201, and the attacking terminal is the Actor in that model.

The prior probability of the identity type of the attacking terminal is the prior probability determined by the control device before the target working terminal receives the current attack operation of the attacking terminal. If the target working terminal is being attacked for the first time, the prior probability is the initialized value, i.e., 0.5; otherwise, the prior probability is the posterior probability of the attacking terminal's identity type determined by the control device the last time the target working terminal was attacked.

When the target working terminal determines that an access request it receives contains an abnormal operation, it feeds the abnormal-status information of the access request back to the control device. In a network attack, the attacking terminal divides an access request carrying attack behavior into multiple steps, with certain causal relationships between the steps. Each step can be regarded as an atomic behavior e, and the atomic behaviors, ordered in time and by causality, compose an effective behavior m, so that m = (e_1, e_2, ..., e_k), where k is the number of atomic behaviors contained in the effective behavior m. When at least one atomic behavior in an effective behavior received by the target working terminal appears in the atomic abnormal behavior table, the effective behavior is determined to be abnormal. In an embodiment, the atomic abnormal behavior table is shown in Table 1.

Table 1 [the table content is provided as an image in the original publication]
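The check against the atomic abnormal behavior table can be sketched as follows (the behavior names below are hypothetical placeholders, since the table content is only available as an image):

```python
# Hypothetical entries standing in for Table 1, which is an image in the original.
ATOMIC_ABNORMAL_BEHAVIORS = {"port_scan", "privilege_escalation", "buffer_overflow"}

def is_effective_behavior_abnormal(effective_behavior):
    """An effective behavior m = (e_1, ..., e_k) is abnormal if at least one
    of its atomic behaviors appears in the atomic abnormal behavior table."""
    return any(e in ATOMIC_ABNORMAL_BEHAVIORS for e in effective_behavior)

print(is_effective_behavior_abnormal(("login", "port_scan")))   # True
print(is_effective_behavior_abnormal(("login", "read_file")))   # False
```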

After determining that the target working terminal has received an attack operation, the control device updates the signal space of the attack-defense game model according to the attack strategy information it has observed.

S203. The control device determines, according to the attack strategy information and the prior probability, the optimal defense strategy and the optimal attack strategy against the target working terminal, and executes the optimal defense strategy.

More specifically, the control device calculates the posterior probability of the identity type of the attacking terminal from the attack strategy information and the prior probability, and then determines the identity type of the attacking terminal from the posterior probability. The control device also calculates the attacker payoff of the attacking terminal and the defender payoff of the target system from the attack strategy information and the identity type, and determines the optimal defense strategy and the optimal attack strategy against the target working terminal from the posterior probability and the defender payoff.

More specifically, the control device uses Bayes' rule to separately calculate the posterior probabilities that the attacking party's type is Attacker or User, compares the two posterior probabilities, determines the type with the larger posterior probability as the identity type of the attacking party, and takes the posterior probability corresponding to that type as the posterior probability of the attacking party. At the same time, this posterior probability replaces the prior probability in the attack-defense game model, so that it serves as the prior probability the next time the target working terminal is attacked.
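A minimal sketch of this Bayesian revision (the signal likelihoods P(signal | type) are assumed inputs, not values given in this document):

```python
def update_type_belief(prior_attacker, likelihood_attacker, likelihood_user):
    """Apply Bayes' rule to the observed signal and return the more probable
    identity type together with its posterior probability."""
    evidence = (prior_attacker * likelihood_attacker
                + (1.0 - prior_attacker) * likelihood_user)
    post_attacker = prior_attacker * likelihood_attacker / evidence
    if post_attacker >= 1.0 - post_attacker:
        return "Attacker", post_attacker
    return "User", 1.0 - post_attacker

# With the uninformed initial prior of 0.5 and a signal far more likely
# to come from an Attacker:
identity, posterior = update_type_belief(0.5, likelihood_attacker=0.8,
                                         likelihood_user=0.2)
print(identity, posterior)  # Attacker 0.8
```

The returned posterior would then overwrite the prior in the model, matching the update described above.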

The control device first determines the system cost and the type cost from the identity type of the attacking party and the attack strategy information. Specifically, the control device looks up the attack strategy information obtained by the target working terminal in a system cost mapping table to obtain the system cost corresponding to the attack strategy information; the system cost represents the impact on the payoffs of the other participants caused by the change of system state after the attacker or the defender carries out an action.

The control device also looks up the attack strategy information and the identity type determined from the posterior probability in a type cost mapping table to obtain the type cost corresponding to the attack strategy information; the type cost represents the cost required for participants of different types to choose an action. Here, the type cost of an Attacker participant satisfies TC_Attacker ∈ [100, 300], and TC_User ∈ [500, 2000].

More specifically, both the system cost mapping table and the type cost mapping table can be found in the attack-defense behavior classification database of MIT Lincoln Laboratory, and are not described in detail here.

The control device then calculates the attacker payoff of the attacking terminal and the defender payoff of the target system from the system cost, the type cost, the attack strategy information and the posterior probability.

More specifically, the attacker's strategy information contains multiple atomic behaviors. The control device looks up each atomic behavior in the atomic abnormal behavior table described in step S202 to determine the self value and the potential value corresponding to each atomic behavior, and calculates the total value of each atomic behavior using the total value formula together with the self value, the potential value and a weight. The total value formula is:

r_0(e) = SV(e) + PV(e) × weight,

where r_0(e) denotes the total value of the atomic behavior e, SV(e) denotes the self value of e, PV(e) denotes the potential value of e, and weight denotes the weight of the potential value. SV(e) is related to factors such as the execution difficulty, the required resource capability and the environmental requirements of the atomic behavior e: the lower the execution difficulty, the higher the resource capability and the lower the environmental requirements, the higher the self value. PV(e) is related to factors such as the degree of public disclosure, the defense maturity and the generality of e: the lower the disclosure, the lower the defense maturity and the higher the generality, the higher the potential value; for example, an atomic behavior exploiting a zero-day vulnerability in common office software has a very high potential attack value. weight ∈ [0, 1] represents the degree to which the potential value of the atomic behavior e takes effect, and is usually related to the level of e. The level of an atomic behavior represents the execution difficulty and effectiveness of the atomic attack (or defense) behavior, and is determined according to the attack-defense classification data of Lincoln Laboratory.

After obtaining the total value of each atomic behavior, the control device sums the total values of all atomic behaviors contained in the attack strategy information to obtain the effective behavior value corresponding to the attack strategy information. The control device then subtracts the system cost and the type cost from the effective behavior value to calculate the attacker payoff of the attacking terminal.
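Under the formulas above, the attacker payoff computation can be sketched as follows (all numeric values are hypothetical examples, not entries from the Lincoln Laboratory data):

```python
def total_value(sv, pv, weight):
    """r_0(e) = SV(e) + PV(e) * weight for a single atomic behavior e."""
    return sv + pv * weight

def attacker_reward(atomic_total_values, system_cost, type_cost):
    """Effective behavior value (the sum of r_0 over all atomic behaviors in
    the attack strategy) minus the system cost and the type cost."""
    return sum(atomic_total_values) - system_cost - type_cost

# Two hypothetical atomic behaviors; a type cost of 150 falls inside the
# TC_Attacker range [100, 300] stated above.
values = [total_value(300.0, 400.0, 0.5), total_value(200.0, 100.0, 0.5)]
print(attacker_reward(values, system_cost=80.0, type_cost=150.0))  # 520.0
```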

The control device obtains all the defense atomic behaviors it can execute from a defense behavior table, and from those defense atomic behaviors constructs defense strategy information tables against the attacker's strategy information, where there may be more than one defense strategy information table. The control device calculates the effective behavior value corresponding to each defense strategy information table, and determines the difference between that effective behavior value and the system cost as the defender payoff corresponding to that defense strategy information table. In an embodiment, the defense behavior table is shown in Table 2:

Table 2 [the table content is provided as an image in the original publication]

The control device calculates the optimal defense strategy from the defender payoff and the posterior probability corresponding to each defense strategy information table, and, from the optimal defense strategy, estimates the optimal attack strategy the attacking terminal will adopt in response. The optimal defense strategy and the optimal attack strategy are obtained by solving the game; the solving process is known in the prior art and is not described in detail here.
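The document leaves the game-solving step to standard methods; as a simplified sketch, under the assumption that the defender simply maximizes its posterior-weighted expected payoff (the strategy names and payoff numbers below are hypothetical):

```python
def best_defense(defender_payoffs, posterior_attacker):
    """defender_payoffs maps strategy -> (payoff_vs_attacker, payoff_vs_user).
    The expected payoff weights the two cases by the posterior belief that
    the Actor's type is Attacker."""
    def expected(strategy):
        vs_attacker, vs_user = defender_payoffs[strategy]
        return (posterior_attacker * vs_attacker
                + (1.0 - posterior_attacker) * vs_user)
    return max(defender_payoffs, key=expected)

payoffs = {"block_ip": (120.0, -50.0), "monitor_only": (30.0, 10.0)}
print(best_defense(payoffs, posterior_attacker=0.8))  # block_ip
```

With a high posterior that the Actor is an Attacker, the aggressive strategy wins despite its cost against a benign User; a full Bayesian-game solution would additionally model the attacker's best response.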

S204. The control device calculates the Markov learning function value from the optimal defense strategy and the attack strategy information.

The Markov learning function value is the cumulative learning function value, calculated with a Markov decision process, from the initial state when the target working terminal was first attacked to the current state.

S205. The control device determines, according to the Markov learning function value, whether the target working terminal is in a safe state.

When the Markov learning function value is less than or equal to a safety threshold, the target working terminal is determined to be in a safe state; when the Markov learning function value is greater than the safety threshold, the target working terminal is determined not to be in a safe state. In an embodiment, the safety threshold is 1000.

The safety threshold represents the value below which, whatever attack strategy information the target working terminal receives the next time it is attacked, no serious threat will be posed to the target working terminal.

When the target working terminal is in a safe state, the method returns to step S202 for the next round of the attack-defense game; otherwise, the method proceeds to step S206.

S206. The control device executes the corresponding security operation.

In an embodiment, when executing the security operation, the control device transfers the access request obtained by the target working terminal to the honeynet cluster, interrupting the attacking terminal's operation on the target working device.

In the above technical solution, the control device establishes an attack-defense game model and repeatedly obtains the attack strategy information, observed by the control device, of the attacking terminal attacking the target working terminal, together with the prior probability of the attacking terminal's identity type. After each such step, the control device determines, from the attack strategy information and the prior probability, the optimal defense strategy and the optimal attack strategy against the target working terminal, executes the optimal defense strategy, and calculates the Markov learning function value from the optimal defense strategy and the attack strategy information, stopping the loop when the Markov learning function value indicates that the target working terminal is not in a safe state. In this way, the control device determines the attack state of the target working terminal from the continuous attack strategy information the terminal obtains, which helps it identify covert attacks from continuously gathered information and thus better protect the target working terminal.

Fig. 3 is a schematic diagram of the calculation flow of the Markov learning function value provided by the present application according to an exemplary embodiment. As shown in Fig. 3, the method includes:

S301. The control device calculates, from the optimal defense strategy and the attack strategy information, the attack-defense return value of the target working terminal in the current state.

The strategy space of the game participants includes a defense strategy space containing at least one defense strategy, and an attack strategy space containing at least one attack strategy.

The control device calculates the attack-defense return value of the target working terminal in the current state from the optimal defense strategy, the attack strategy information and the attack-defense return value formula, where the attack-defense return value formula specifically includes:

[The attack-defense return value formula is provided as an image in the original publication.]

where s_p denotes the current state of the target working terminal; Re(f_x(s_p)) denotes the attack-defense return value of the target working terminal x in the current state; i denotes a defense index in the defense strategy space and j denotes an attack index in the attack strategy space; i = 0 indicates that the target working terminal x is not defended, i ≠ 0 indicates that it is defended by the i-th defense strategy, j = 0 indicates that it is not attacked, and j ≠ 0 indicates that it is attacked by the j-th attack strategy; P denotes the protection value, which in an embodiment is a constant; D_i denotes the return value when the target working terminal x is protected by the i-th defense strategy, and δ_i denotes the return coefficient; D denotes the attack return value when the target working terminal is attacked without the protection of a defense strategy; and Pk_ij denotes the return value when the game model uses the i-th protection strategy to resist the j-th attack strategy.

S302. The control device obtains the cumulative return value of the target working terminal from the initial state to the current state, and the Markov learning function value of the target working terminal in the current state.

The cumulative return value of the target working terminal from the initial state to the current state can be obtained with the return value accumulation formula, which is expressed as:

R_x(s_0 → s_p) = Re(f_x(s_0)) + γ·Re(f_x(s_1)) + γ^2·Re(f_x(s_2)) + ... + γ^p·Re(f_x(s_p)),

where R_x(s_0 → s_p) denotes the cumulative return value of the target working terminal x as it changes from the current state s_0 to the state s_p; Re(f_x(s_0)) denotes the attack-defense return value corresponding to the current state; s_1 denotes the first state of the target working terminal, i.e., the next state into which the target terminal in the current state changes under the influence of the attack strategy and the defense strategy, and Re(f_x(s_1)) denotes the attack-defense return value corresponding to the first state; s_p denotes the p-th future state relative to the current state of the terminal, and Re(f_x(s_p)) denotes the attack-defense return value corresponding to state s_p; γ ∈ [0, 1] denotes the discount rate parameter: when the discount rate is 0, the target system considers only the current return, and when the discount rate is 1, the target system pursues long-term high returns.

The learning rate parameter represents the learning state of the control device during the state changes of the target working terminal when it is attacked. The learning rate parameter α ∈ [0, 1]: a value of 0 means subsequent behavior has no influence on the present, and a value of 1 means the influence of subsequent behavior on the present is fully taken into account.

The Markov learning function value of the target working terminal in the current state is the cumulative return the control device has obtained for the target working device from the initial state, when it first came under attack, up to its current state. If the current state is the initial state of the first attack, the Markov learning function value in the current state is a preset initial value; in an embodiment, the preset initial value is 0. More specifically, when the state of the target working terminal is p, the last attack on the target working device is the process by which its state changed from p − 1 to p.

S303. The control device calculates the maximum cumulative return value of the target working terminal in the next state from the attacking terminal's optimal attack strategy against the optimal defense strategy, all the defense strategies, the cumulative return value and the attack-defense return value.

The control device calculates, from the attacking terminal's optimal attack strategy against the optimal defense strategy, each defense strategy and the attack-defense return value formula described in step S301, the attack-defense return value of the target working terminal in the next state as adjusted by each defense strategy. That is, based on the attack strategy the control device predicts the attacking terminal will adopt, and all the defense strategies the control device could apply against that attack strategy, it calculates the attack-defense return value corresponding to each defense strategy.

After obtaining these attack-defense return values, the control device adds each of them to the cumulative return value obtained in step S302 to calculate the cumulative return value of the target working terminal in the next state as adjusted by each defense strategy. The control device determines the largest of the next-state cumulative return values corresponding to the defense strategies as the maximum cumulative return value.
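This maximization step can be sketched as follows (the candidate next-state return values are hypothetical, one per candidate defense strategy):

```python
def max_next_cumulative(cumulative_so_far, next_returns_per_defense):
    """For each candidate defense strategy, add its predicted next-state
    attack-defense return to the cumulative return so far, then take the
    maximum over all candidate defense strategies."""
    return max(cumulative_so_far + r for r in next_returns_per_defense)

# Cumulative return 30.0 so far, three candidate defense strategies:
print(max_next_cumulative(30.0, [5.0, 12.0, -4.0]))  # 42.0
```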

S304. The control device calculates the Markov learning function value of the target working terminal in the state following the current state from the attack-defense return value in the current state, the maximum cumulative return value in the next state and the Markov learning function value in the current state.

The Markov learning function value of the target working terminal in the state following the current state is calculated from the attack-defense return value in the current state, the maximum cumulative return value in the next state, the Markov learning function value in the current state and the learning function update formula, which is expressed as:

Q_{p+1}(f_x) = Q_p(f_x) + α·[Re(f_x(s_p)) + γ·max R_x(s_0 → s_{p+1}) − Q_p(f_x)],

where Q_{p+1}(f_x) denotes the Markov learning function value of the target working terminal in the state following the current state; Q_p(f_x) denotes the Markov learning function value of the target working terminal in the current state; α denotes the learning rate parameter and γ the discount rate parameter; max R_x(s_0 → s_{p+1}) denotes the maximum cumulative return value in the next state; s_0 denotes the initial state of the target working terminal, and Re(f_x(s_0)) the attack-defense return value corresponding to the initial state; s_1 denotes the first state of the target working terminal, i.e., the state into which the target terminal in the initial state changes under the influence of the attack strategy and the defense strategy, and Re(f_x(s_1)) the attack-defense return value corresponding to the first state; s_p denotes the current state of the terminal and Re(f_x(s_p)) the attack-defense return value corresponding to the current state; s_{p+1} denotes the next state of the target working terminal and Re(f_x(s_{p+1})) the attack-defense return value corresponding to the next state. The initial Markov learning function value is 0.

In the above technical solution, after establishing the attack-defense game model, the control device obtains, in at least one loop iteration, the prior probability and the attack strategy information it observes. In each iteration it determines the control device's optimal defense strategy against the attacking terminal and the attacking terminal's optimal attack strategy against that defense strategy, and uses the Markov learning function to determine whether the target working terminal is endangered under the influence of the current attack strategy, terminating the loop when the target working terminal is determined not to be in a safe state. The control device thus monitors the multiple attacks carried out by the attacking terminal, accumulates the harm those attacks cause to the target system, and perceives covert, persistent attack operations, so that when it determines that the target working terminal is not in a safe state it can protect the target working terminal in time, effectively safeguarding its security.

FIG. 4 is a schematic structural diagram of a network defense device provided by an embodiment of the present application. The network defense device 400 includes an acquisition module 401 and a processing module 402, wherein:

the processing module 402 is configured to establish an attack-defense game model, where the attack-defense game model includes multiple game participants, each game participant's strategy space, a signal space, prior probabilities, posterior probabilities, and each game participant's payoff.

The acquisition module 401 is configured to repeatedly obtain the attack strategy information, observed by the control device, of the attacking terminal attacking the target working terminal, together with the prior probability of the attacking terminal's identity type.

The processing module 402 is further configured to determine, from the attack strategy information and the prior probability, the optimal defense strategy and the attacking terminal's optimal attack strategy against that defense, and to execute the optimal defense strategy.

The processing module 402 is further configured to calculate the Markov learning function value from the optimal defense strategy and the attack strategy information.

The processing module 402 is further configured to determine, from the Markov learning function value, whether the target working terminal is in a safe state, and to terminate the loop when it is not.
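The cooperation between the two modules can be outlined as a loop. The sketch below is illustrative only; all helper callables (`observe`, `optimal_strategies`, `execute`, `update_value`, `is_safe`) are assumed interfaces standing in for the module steps, not names from this application:

```python
def defense_loop(observe, optimal_strategies, execute, update_value, is_safe):
    """Repeat: observe -> pick strategies -> defend -> update value -> check safety."""
    q = 0.0                                   # initial learning function value is 0
    while True:
        attack_info, prior = observe()        # acquisition module 401: observation step
        defense, attack = optimal_strategies(attack_info, prior)
        execute(defense)                      # processing module 402: apply the defense
        q = update_value(q, defense, attack_info)
        if not is_safe(q):                    # terminate the loop once unsafe
            return q
```

The loop terminates only on the unsafe branch, matching the "terminate the loop when the terminal is not in a safe state" behavior described above.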

In an embodiment, the processing module 402 is specifically configured to:

calculate the posterior probability of the attacking terminal's identity type from the attack strategy information and the prior probability;

determine the attacking terminal's identity type from the posterior probability;

determine the system cost and the type cost from the attack strategy information and the identity type;

calculate the attacker payoff of the attacking terminal and the defender payoff of the target system from the system cost, the type cost, the attack strategy information, and the posterior probability;

determine, from the posterior probability and the defender payoff, the optimal defense strategy and the optimal attack strategy against the target working terminal.
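The posterior step above is Bayes' rule over identity types, followed by selecting the defense with the highest expected defender payoff under that posterior. A minimal sketch under stated assumptions: the likelihood table `ATTACK_LIKELIHOOD`, the identity types, and the payoff table are all hypothetical examples, not values from this application:

```python
# P(observed attack signal | identity type) -- hypothetical likelihoods
ATTACK_LIKELIHOOD = {
    "insider":  {"scan": 0.2, "exploit": 0.8},
    "outsider": {"scan": 0.7, "exploit": 0.3},
}

def posterior(prior, signal):
    """Bayes' rule: P(type | signal) is proportional to P(signal | type) * P(type)."""
    unnorm = {t: ATTACK_LIKELIHOOD[t][signal] * p for t, p in prior.items()}
    total = sum(unnorm.values())
    return {t: v / total for t, v in unnorm.items()}

def best_defense(post, defender_payoff, defenses):
    """Pick the defense maximizing expected defender payoff under the posterior."""
    return max(defenses,
               key=lambda d: sum(post[t] * defender_payoff[(d, t)] for t in post))
```

Under a uniform prior and an observed "exploit" signal, the posterior shifts toward whichever identity type makes "exploit" more likely, and the expected-payoff maximization then selects the defense accordingly.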

In an embodiment, the processing module 402 is specifically configured to:

query the attack strategy information in a system cost mapping table to obtain the system cost corresponding to the attack strategy information;

query the attack strategy information and the identity type in a type cost mapping table to obtain the type cost corresponding to the attack strategy information.
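The two mapping tables can be sketched as plain dictionary lookups: the system cost table is keyed by attack strategy alone, while the type cost table is keyed by the (attack strategy, identity type) pair. The table contents below are hypothetical placeholders:

```python
# Hypothetical mapping tables; the real tables' contents are not given in the text.
SYSTEM_COST = {"scan": 1.0, "exploit": 5.0}
TYPE_COST = {("scan", "insider"): 0.5, ("scan", "outsider"): 1.5,
             ("exploit", "insider"): 2.0, ("exploit", "outsider"): 4.0}

def lookup_costs(attack, identity):
    """Return (system cost, type cost) for an attack strategy and identity type."""
    return SYSTEM_COST[attack], TYPE_COST[(attack, identity)]
```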

In an embodiment, the processing module 402 is specifically configured to:

calculate the attack-defense return value of the target working terminal in the current state from the optimal defense strategy and the attack strategy information;

obtain the cumulative return value of the target working terminal from the initial state to the current state and the Markov learning function value of the target working terminal in the current state;

calculate the maximum cumulative return value of the target working terminal in the next state from the attacking terminal's optimal attack strategy against the optimal defense strategy, all defense strategies, the cumulative return value, and the attack-defense return value;

calculate the Markov learning function value of the target working terminal in the state following the current state from the attack-defense return value in the current state, the maximum cumulative return value in the next state, and the Markov learning function value in the current state.

In an embodiment, the processing module 402 is specifically configured to:

calculate the attack-defense return value of the target working terminal in the current state from the optimal defense strategy, the attack strategy information, and the attack-defense return value formula,

where the attack-defense return value formula is:

$$
\mathrm{Re}(f_x(s_p)) =
\begin{cases}
P, & i = 0,\; j = 0\\
\delta_i D_i, & i \neq 0,\; j = 0\\
-D, & i = 0,\; j \neq 0\\
Pk_{ij}, & i \neq 0,\; j \neq 0
\end{cases}
$$

where s_p denotes the current state of the target working terminal; Re(f_x(s_p)) denotes the attack-defense return value of the target working terminal x in the current state; i denotes a defense identifier in the defense strategy space and j an attack identifier in the attack strategy space; i = 0 means the target working terminal x is undefended and i ≠ 0 means it is defended by the i-th defense strategy; j = 0 means the target working terminal x is not under attack and j ≠ 0 means it is attacked by the j-th attack strategy; P denotes the protection value; D_i denotes the return value when the target working terminal x is protected by the i-th defense strategy; δ_i denotes the return coefficient; D denotes the attack return value when the target working terminal is attacked while unprotected by any defense strategy; and Pk_ij denotes the return value when the game-theoretic model uses the i-th protection strategy to resist the j-th attack strategy.
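Given the case analysis in this glossary, the return value can be sketched as a piecewise function of the defense identifier i and attack identifier j. The case arrangement below is one reading of the glossary (the original formula image is not reproduced here), and the sample values used in testing are illustrative:

```python
def attack_defense_return(i, j, P, D, Di, delta, Pk):
    """Piecewise attack-defense return value for defense id i and attack id j."""
    if i == 0 and j == 0:        # neither defended nor attacked: baseline protection value
        return P
    if i != 0 and j == 0:        # defended by strategy i, not attacked: weighted defense reward
        return delta[i] * Di[i]
    if i == 0 and j != 0:        # attacked with no defense in place: attack reward (a loss)
        return -D
    return Pk[(i, j)]            # defense strategy i resists attack strategy j
```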

In an embodiment, the processing module 402 is specifically configured to:

calculate, from the attacking terminal's optimal attack strategy against the optimal defense strategy, each defense strategy, and the attack-defense return value formula, the attack-defense return value in the next state of the target working terminal as adjusted by each defense strategy;

add each attack-defense return value to the cumulative return value to obtain the cumulative return value in the next state of the target working terminal as adjusted by each defense strategy;

determine the maximum of the next-state cumulative return values over all defense strategies as the maximum cumulative return value.
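The three steps above amount to maximizing, over the candidate defense strategies, the sum of the accumulated return so far and the next-state return under the attacker's best response. A minimal sketch, where `next_return(i, j)` stands in for the attack-defense return value formula and the numbers in the test are illustrative:

```python
def max_cumulative_return(defenses, best_attack, cumulative, next_return):
    """Max over defenses i of (cumulative return + next-state return under attack j*)."""
    return max(cumulative + next_return(i, best_attack) for i in defenses)
```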

In an embodiment, the processing module 402 is specifically configured to:

calculate the Markov learning function value of the target working terminal in the state following the current state from the attack-defense return value in the current state, the maximum cumulative return value in the next state, the Markov learning function value in the current state, and the learning function update formula, where the learning function update formula is:

$$
Q_{p+1}(s_{p+1}) = (1-\alpha)\,Q_p(s_p) + \alpha\left[\mathrm{Re}(f_x(s_p)) + \gamma\,\mathrm{MaxRe}(s_{p+1})\right]
$$

$$
\mathrm{MaxRe}(s_{p+1}) = \max_{i}\left[\mathrm{Re}(f_x(s_0)) + \mathrm{Re}(f_x(s_1)) + \cdots + \mathrm{Re}(f_x(s_p)) + \mathrm{Re}(f_x(s_{p+1}))\right]
$$

where Q_{p+1}(s_{p+1}) denotes the Markov learning function value of the target working terminal in the state following the current state; Q_p(s_p) denotes the Markov learning function value of the target working terminal in the current state; α denotes the learning rate parameter and γ the discount rate parameter; MaxRe(s_{p+1}) denotes the maximum cumulative return value in the next state; s_0 denotes the initial state of the target working terminal and Re(f_x(s_0)) the attack-defense return value corresponding to the initial state; s_1 denotes the first state of the target working terminal, i.e., the state into which the terminal in the initial state changes under the influence of the attack and defense strategies, and Re(f_x(s_1)) the attack-defense return value corresponding to the first state; s_p denotes the current state of the terminal and Re(f_x(s_p)) the attack-defense return value corresponding to the current state; and s_{p+1} denotes the next state of the target working terminal and Re(f_x(s_{p+1})) the attack-defense return value corresponding to the next state.
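Assuming the standard Q-learning-style form this glossary suggests (new value = (1 − α) · current value + α · (current return + γ · maximum future return)), the update step can be sketched in one line; the α and γ values in the test are illustrative:

```python
def update_learning_value(q_current, reward, max_future, alpha=0.5, gamma=0.9):
    """One Q-learning-style update of the Markov learning function value."""
    return (1 - alpha) * q_current + alpha * (reward + gamma * max_future)
```

Starting from the stated initial value of 0, repeated application of this update accumulates the observed attack-defense returns, which is what lets the scheme surface slow, persistent attacks.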

In an embodiment, the processing module 402 is specifically configured to:

determine that the target working terminal is in a safe state when the Markov learning function value is less than or equal to a safety threshold;

determine that the target working terminal is not in a safe state when the Markov learning function value is greater than the safety threshold.
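This safety decision is a single threshold comparison on the learning function value; the threshold value below is an illustrative placeholder:

```python
def is_safe(q_value, threshold=10.0):
    """Safe iff the Markov learning function value does not exceed the threshold."""
    return q_value <= threshold
```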

FIG. 5 is a schematic structural diagram of a control device provided by an embodiment of the present application. The control device 500 includes a memory 501 and a processor 502, where the memory 501 stores computer instructions executable by the processor. The memory 501 may include high-speed random access memory (RAM) and may further include non-volatile memory (NVM), for example at least one magnetic disk; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, or the like.

When executing the computer instructions, the processor 502 implements the steps of the network defense method in the above embodiments in which the control device is the execution subject; see the related descriptions in the foregoing method embodiments for details. The processor 502 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the invention may be embodied directly as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.

Optionally, the memory 501 may be independent of, or integrated with, the processor 502. When the memory 501 is provided independently, the control device 500 further includes a bus connecting the memory 501 and the processor 502. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, the buses in the drawings of the present application are not limited to a single bus or a single type of bus.

An embodiment of the present application further provides a computer-readable storage medium storing computer instructions; when a processor executes the computer instructions, the steps of the network defense method in the above embodiments are implemented.

An embodiment of the present application further provides a computer program product including computer instructions; when executed by a processor, the computer instructions implement the steps of the network defense method in the above embodiments.

Other embodiments of the present application will readily occur to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the application indicated by the following claims.

It should be understood that the present application is not limited to the precise constructions described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.

Claims (10)

1. A network defense method, applied to a control device located in a target system, the target system including at least one working terminal and a honeynet cluster, the method comprising: establishing an attack-defense game model, where the attack-defense game model includes multiple game participants, each game participant's strategy space, a signal space, prior probabilities, posterior probabilities, and each game participant's payoff; repeatedly performing: obtaining the attack strategy information, observed by the control device, of an attacking terminal attacking a target working terminal, and the prior probability of the attacking terminal's identity type; determining, from the attack strategy information and the prior probability, an optimal defense strategy and the attacking terminal's optimal attack strategy against the optimal defense strategy, and executing the optimal defense strategy; calculating a Markov learning function value from the optimal defense strategy and the attack strategy information; and determining, from the Markov learning function value, whether the target working terminal is in a safe state, and terminating the loop when the target working terminal is not in a safe state.
2. The method according to claim 1, wherein determining the optimal defense strategy and the optimal attack strategy against the target working terminal from the attack strategy information and the prior probability specifically comprises: calculating the posterior probability of the attacking terminal's identity type from the attack strategy information and the prior probability; determining the attacking terminal's identity type from the posterior probability; determining a system cost and a type cost from the attack strategy information and the identity type; calculating the attacker payoff of the attacking terminal and the defender payoff of the target system from the system cost, the type cost, the attack strategy information, and the posterior probability; and determining, from the posterior probability and the defender payoff, the optimal defense strategy and the optimal attack strategy against the target working terminal.
3. The method according to claim 2, wherein determining the system cost and the type cost from the attack strategy information and the identity type specifically comprises: querying the attack strategy information in a system cost mapping table to obtain the system cost corresponding to the attack strategy information; and querying the attack strategy information and the identity type in a type cost mapping table to obtain the type cost corresponding to the attack strategy information.
4. The method according to claim 1, wherein the game participants' strategy spaces include a defense strategy space containing at least one defense strategy, and calculating the Markov learning function value from the optimal defense strategy and the attack strategy information specifically comprises: calculating the attack-defense return value of the target working terminal in the current state from the optimal defense strategy and the attack strategy information; obtaining the cumulative return value of the target working terminal from the initial state to the current state and the Markov learning function value of the target working terminal in the current state; calculating the maximum cumulative return value of the target working terminal in the next state from the attacking terminal's optimal attack strategy against the optimal defense strategy, all the defense strategies, the cumulative return value, and the attack-defense return value; and calculating the Markov learning function value of the target working terminal in the state following the current state from the attack-defense return value in the current state, the maximum cumulative return value in the next state, and the Markov learning function value in the current state.
5. The method according to claim 4, wherein the game participants' strategy spaces further include an attack strategy space containing at least one attack strategy, and calculating the attack-defense return value of the target working terminal in the current state from the optimal defense strategy and the attack strategy information specifically comprises: calculating the attack-defense return value of the target working terminal in the current state from the optimal defense strategy, the attack strategy information, and the attack-defense return value formula, where the attack-defense return value formula is:
$$
\mathrm{Re}(f_x(s_p)) =
\begin{cases}
P, & i = 0,\; j = 0\\
\delta_i D_i, & i \neq 0,\; j = 0\\
-D, & i = 0,\; j \neq 0\\
Pk_{ij}, & i \neq 0,\; j \neq 0
\end{cases}
$$
where s_p denotes the p-th state following the current state of the target working terminal; Re(f_x(s_p)) denotes the attack-defense return value of the target working terminal x in state p; i denotes a defense identifier in the defense strategy space and j an attack identifier in the attack strategy space; i = 0 means the target working terminal x is undefended in state p and i ≠ 0 means it is defended by the i-th defense strategy in state p; j = 0 means the target working terminal x is not under attack in state p and j ≠ 0 means it is attacked by the j-th attack strategy in state p; P denotes the protection value; D_i denotes the return value when the target working terminal x is protected by the i-th defense strategy; δ_i denotes the return coefficient; D denotes the attack return value when the target working terminal is attacked while unprotected by any defense strategy; and Pk_ij denotes the return value when the game-theoretic model uses the i-th protection strategy to resist the j-th attack strategy.
6. The method according to claim 5, wherein calculating the maximum cumulative return value of the target working terminal in the next state from the attacking terminal's optimal attack strategy against the optimal defense strategy, all the defense strategies, the cumulative return value, and the attack-defense return value specifically comprises: calculating, from the attacking terminal's optimal attack strategy against the optimal defense strategy, each defense strategy, and the attack-defense return value formula, the attack-defense return value in the next state of the target working terminal as adjusted by each defense strategy; adding each attack-defense return value to the cumulative return value to obtain the cumulative return value in the next state of the target working terminal as adjusted by each defense strategy; and determining the maximum of the next-state cumulative return values over all defense strategies as the maximum cumulative return value.
7. The method according to claim 6, wherein calculating the Markov learning function value of the target working terminal in the state following the current state from the attack-defense return value in the current state, the maximum cumulative return value in the next state, and the Markov learning function value in the current state specifically comprises: calculating the Markov learning function value of the target working terminal in the state following the current state from the attack-defense return value in the current state, the maximum cumulative return value in the next state, the Markov learning function value in the current state, and the learning function update formula, where the learning function update formula is:
$$
Q_{p+1}(s_{p+1}) = (1-\alpha)\,Q_p(s_p) + \alpha\left[\mathrm{Re}(f_x(s_p)) + \gamma\,\mathrm{MaxRe}(s_{p+1})\right]
$$

$$
\mathrm{MaxRe}(s_{p+1}) = \max_{i}\left[\mathrm{Re}(f_x(s_0)) + \mathrm{Re}(f_x(s_1)) + \cdots + \mathrm{Re}(f_x(s_p)) + \mathrm{Re}(f_x(s_{p+1}))\right]
$$
where Q_{p+1}(s_{p+1}) denotes the Markov learning function value of the target working terminal in the state following the current state; Q_p(s_p) denotes the Markov learning function value of the target working terminal in the current state; α denotes the learning rate parameter and γ the discount rate parameter; MaxRe(s_{p+1}) denotes the maximum cumulative return value in the next state; s_0 denotes the initial state of the target working terminal and Re(f_x(s_0)) the attack-defense return value corresponding to the initial state; s_1 denotes the state following state s_0 of the target working terminal and Re(f_x(s_1)) the attack-defense return value corresponding to state s_1; s_p denotes the current state of the terminal and Re(f_x(s_p)) the attack-defense return value corresponding to the current state; and s_{p+1} denotes the state following the current state of the target working terminal and Re(f_x(s_{p+1})) the attack-defense return value corresponding to the next state.
8. The method according to claim 1, wherein determining whether the target working terminal is in a safe state from the Markov learning function value specifically comprises: determining that the target working terminal is in the safe state when the Markov learning function value is less than or equal to a safety threshold; and determining that the target working terminal is not in the safe state when the Markov learning function value is greater than the safety threshold.
9. A control device, comprising a processor and a memory communicatively connected to the processor, the memory storing computer instructions, and the processor being configured, when executing the computer instructions, to implement the network defense method according to any one of claims 1 to 8.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the network defense method according to any one of claims 1 to 8.
CN202211192687.4A 2022-09-28 2022-09-28 Network defense method, control device and storage medium Pending CN115550031A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211192687.4A CN115550031A (en) 2022-09-28 2022-09-28 Network defense method, control device and storage medium


Publications (1)

Publication Number Publication Date
CN115550031A true CN115550031A (en) 2022-12-30

Family

ID=84731187


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130318615A1 (en) * 2012-05-23 2013-11-28 International Business Machines Corporation Predicting attacks based on probabilistic game-theory
CN110460572A (en) * 2019-07-06 2019-11-15 中国人民解放军战略支援部队信息工程大学 Method and device for selecting defense strategy of moving target based on Markov signal game
CN113098882A (en) * 2021-04-08 2021-07-09 鹏城实验室 Game theory-based network space mimicry defense method, device, medium and terminal



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination