CN114638356A - Static weight guided deep neural network back door detection method and system

Static weight guided deep neural network back door detection method and system

Info

Publication number
CN114638356A
Authority
CN
China
Prior art keywords
trigger
label
neural network
deep neural
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210177556.2A
Other languages
Chinese (zh)
Other versions
CN114638356B (en)
Inventor
赵磊
李文欣
王琦
刘佩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202210177556.2A priority Critical patent/CN114638356B/en
Publication of CN114638356A publication Critical patent/CN114638356A/en
Application granted granted Critical
Publication of CN114638356B publication Critical patent/CN114638356B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a static-weight-guided deep neural network backdoor detection method and system. A pre-trained neural network model is first subjected to static weight analysis to obtain the suspicious target labels and victim labels of a backdoor attack, forming target-victim label pairs; reverse engineering is then performed using the obtained target-victim label pairs to recover the backdoor trigger; finally, the shape attributes of the reversed trigger and the distribution of the neurons it activates are analyzed to filter out false alarms and obtain the final detection result. The method exploits the advantages of static weight analysis, namely its small computational overhead and its independence from input-sample quality and trigger type, and effectively improves the efficiency, accuracy, and scalability of neural network backdoor detection.

Description

A static-weight-guided deep neural network backdoor detection method and system

Technical Field

The invention belongs to the fields of artificial intelligence and network security, and relates to a deep neural network backdoor detection method, in particular to a static-weight-guided deep neural network backdoor detection method.

Background

In recent years, deep neural networks have achieved strong performance in many fields, such as computer vision, malware detection, and autonomous driving. Since building and deploying a well-performing neural network model requires substantial expert knowledge and computational resources, users generally outsource training to the cloud or download pre-trained models.

However, existing research has demonstrated that neural networks are vulnerable to backdoor attacks, so pre-trained models obtained from third parties may carry serious security risks. In a backdoor attack, the attacker defines a backdoor trigger and specifies the target label and victim labels of the attack. The victim labels may be all labels except the target label, or a few labels chosen by the attacker. During training, the attacker stamps the trigger onto data from the victim labels and relabels it as the target label, implanting a backdoor into the model. At inference time, the backdoored model still classifies clean inputs correctly, but once an input from a victim label contains the attacker-defined trigger, it is classified as the attacker-specified target label.

Defending against neural network backdoor attacks has long been a research hotspot. Given a pre-trained model, the defender needs to determine whether the model contains a malicious backdoor. Defenders typically have no access to the training set or the trigger pattern used by the attacker, and possess only a small set of clean samples for validating model functionality. The main difficulties of backdoor detection are identifying the attacker-specified target-victim label pair and recovering the trigger pattern.

Traditional detection methods usually either blindly try all label pairs or pre-select labels from a few clean input samples, and then recover the trigger pattern for each label pair by reverse engineering. In addition, most traditional methods assume that the backdoor trigger is of the pixel patch type, whereas filter type triggers based on image transformations are another common and easily implemented kind of backdoor attack. Because reverse engineering every label pair is computationally expensive and there is uncertainty in input-sample quality and trigger type, traditional methods typically suffer from high computational complexity, unstable accuracy, and poor scalability.

Summary of the Invention

To improve the efficiency, accuracy, and scalability of deep neural network backdoor detection, the present invention provides a static-weight-guided deep neural network backdoor detection method and system. The input is a deep neural network model to be inspected together with a small set of clean image samples for validating the model's functionality; the output is whether the model contains a backdoor and, if so, the target label, the victim labels, and the trigger pattern of the backdoor attack.

The technical solution adopted by the method of the present invention is a static-weight-guided deep neural network backdoor detection method comprising the following steps:

Step 1: perform static weight analysis on the deep neural network to obtain the suspicious target labels and victim labels of a backdoor attack, forming target-victim label pairs;

The specific implementation of step 1 comprises the following sub-steps:

Step 1.1: extract all weights connecting the last layer of the deep neural network to the output labels. Assuming the network has n output labels, organize the weights connected to each label into a vector, obtaining n weight vectors w_1…w_n;

Step 1.2: for each weight vector, compute its divergence from all other weight vectors, sort the divergences from high to low, and take the target label set D corresponding to the k_d weight vectors with the highest divergence; for each weight vector, compute the sum of all its weights, sort the sums from high to low, and take the target label set S corresponding to the k_s weight vectors with the highest sums; take the union of D and S to obtain the final suspicious target label set T;

Step 1.3: subtract the second-highest divergence obtained in step 1.2 from the highest; if the difference is greater than a threshold θ, the victim labels are taken to be all labels in the model; otherwise, for each suspicious target label t in T, compute its similarity to the other weight vectors, sort the similarities from high to low, and take the k_v labels with the highest similarity as the suspicious victim labels V_t;

Step 1.4: combine the obtained suspicious target label set T with the victim labels V_t of each target label t in T to form the set of target-victim label pairs;

Step 2: use the suspicious target labels and victim labels obtained in step 1, together with the clean image samples, to perform trigger reverse engineering and obtain a reversed trigger;

When the trigger reverse engineering is pixel patch type trigger reverse engineering, judge whether the obtained pixel patch type reversed trigger satisfies the preset success-rate and trigger-size conditions; if the preset conditions are satisfied, execute step 3 below; otherwise, output the detection result that the deep neural network under inspection contains no pixel patch type backdoor;

When the trigger reverse engineering is image filter type trigger reverse engineering, judge whether the obtained image filter type reversed trigger satisfies the preset success-rate condition; if the preset condition is satisfied, output the detection result that the deep neural network under inspection contains an image filter type backdoor; otherwise, output the detection result that it contains no image filter type backdoor;

Step 3: analyze the shape attributes of the pixel patch type reversed trigger obtained in step 2 and the distribution of internal neurons of the inspected deep neural network that it activates, and output the final detection result.

The technical solution adopted by the system of the present invention is a static-weight-guided deep neural network backdoor detection system comprising the following modules:

Module 1 is used to perform static weight analysis on the deep neural network to obtain the suspicious target labels and victim labels of a backdoor attack, forming target-victim label pairs;

Module 1 comprises the following sub-modules:

Module 1.1 is used to extract all weights connecting the last layer of the deep neural network to the output labels; assuming the network has n output labels, the weights connected to each label are organized into a vector, yielding n weight vectors w_1…w_n;

Module 1.2 is used to compute, for each weight vector, its divergence from all other weight vectors, sort the divergences from high to low, and take the target label set D corresponding to the k_d weight vectors with the highest divergence; to compute, for each weight vector, the sum of all its weights, sort the sums from high to low, and take the target label set S corresponding to the k_s weight vectors with the highest sums; and to merge D and S into the final suspicious target label set T;

Module 1.3 is used to subtract the second-highest divergence obtained by module 1.2 from the highest; if the difference is greater than the threshold θ, the victim labels are taken to be all labels in the model; otherwise, for each suspicious target label t in T, its similarity to the other weight vectors is computed, the similarities are sorted from high to low, and the k_v labels with the highest similarity are taken as the suspicious victim labels V_t;

Module 1.4 is used to combine the obtained suspicious target label set T with the victim labels V_t corresponding to each target label t in T into a set of target-victim label pairs;

Module 2 is used to perform trigger reverse engineering using the suspicious target labels and victim labels obtained by module 1, together with the clean image samples, to obtain a reversed trigger;

When the trigger reverse engineering is pixel patch type trigger reverse engineering, judge whether the obtained pixel patch type reversed trigger satisfies the preset success-rate and trigger-size conditions; if the preset conditions are satisfied, execute module 3 below; otherwise, output the detection result that the deep neural network under inspection contains no pixel patch type backdoor;

When the trigger reverse engineering is image filter type trigger reverse engineering, judge whether the obtained image filter type reversed trigger satisfies the preset success-rate condition; if the preset condition is satisfied, output the detection result that the deep neural network under inspection contains an image filter type backdoor; otherwise, output the detection result that it contains no image filter type backdoor;

Module 3 is used to analyze the shape attributes of the obtained pixel patch type trigger and the distribution of internal neurons of the inspected deep neural network that it activates, and to output the final detection result.

Advantages of the present invention:

1. The present invention innovatively uses static weight analysis to identify suspicious backdoor target-victim label pairs in the model under inspection, effectively exploiting the information contained in the model weights. Static analysis of the model weights requires no model execution, so its computational complexity is low; it requires no input samples, so it overcomes the influence of input-sample quantity and quality and achieves stable label identification. Traditional methods, by contrast, blindly try all target-victim label pairs, which is inefficient and impractical, or pre-select labels from a few clean input samples, whose effectiveness is easily limited by sample quantity and quality.

2. The present invention exploits the facts that static weight analysis is unaffected by trigger type and is highly scalable. Guided by the static weight analysis results, it reverse engineers backdoor triggers and can detect and recover both pixel patch type triggers and image filter type triggers. Traditional methods mostly assume that backdoor attacks use pixel patch type triggers and are difficult to extend to the detection and recovery of image filter type triggers.

3. The present invention performs shape-attribute analysis and activated-neuron distribution analysis on the triggers obtained by reverse engineering, effectively reducing the false-alarm rate, so that the invented method detects backdoored models while preserving normal ones, achieving high-precision backdoor detection for deep neural network models.

Description of the Drawings

Fig. 1 is a flowchart of the method according to an embodiment of the present invention.

Fig. 2 is a flowchart of the static weight analysis according to an embodiment of the present invention.

Fig. 3 is a flowchart of pixel patch type trigger reverse engineering according to an embodiment of the present invention.

Fig. 4 is a flowchart of image filter type trigger reverse engineering according to an embodiment of the present invention.

Fig. 5 is a flowchart of pixel patch type trigger analysis according to an embodiment of the present invention.

Detailed Description

To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the examples described here are intended only to illustrate and explain the present invention, not to limit it.

Traditional detection methods usually either blindly try all target-victim label pairs or pre-select labels from a few clean input samples, and then recover the trigger pattern for each label pair by reverse engineering. When many label pairs must be tried, or when poor input-sample quality leads to inaccurate label-pair selection, the efficiency and accuracy of traditional methods suffer; moreover, most traditional methods are applicable only to pixel patch type triggers.

During training, a neural network accomplishes its classification task by updating its weight parameters. For a trained network the weight parameters are fixed and can be regarded as a static attribute of the model. Because implanting a backdoor necessarily tampers with the model's weight parameters, a backdoored model inevitably exhibits anomalies in its weight distribution compared with a clean model. These anomalies are independent of the trigger type and can be used to identify the attacker-specified target-victim label pairs, thereby guiding the backdoor detection process and improving detection efficiency.

Based on the above analysis, the present invention proposes a deep neural network backdoor detection method and system guided by static weight analysis. The invention uses weight-distribution anomalies in the pre-trained neural network to guide backdoor detection, exploiting the small computational overhead of static weight analysis and its independence from input-sample quality and trigger type, and thereby develops an efficient, stable, and scalable neural network backdoor detection method.

Referring to Fig. 1, the static-weight-guided deep neural network backdoor detection method provided by the present invention comprises the following steps:

Step 1: perform static weight analysis on the deep neural network to obtain the suspicious target labels and victim labels of a backdoor attack, forming target-victim label pairs;

Referring to Fig. 2, in this embodiment the specific implementation of step 1 comprises the following sub-steps:

Step 1.1: extract all weights connecting the last layer of the deep neural network to the output labels. Assuming the network has n output labels, organize the weights connected to each label into a vector, obtaining n weight vectors w_1…w_n;

Step 1.2: for each weight vector, compute its divergence from all other weight vectors, sort the divergences from high to low, and take the target label set D corresponding to the k_d weight vectors with the highest divergence; for each weight vector, compute the sum of all its weights, sort the sums from high to low, and take the target label set S corresponding to the k_s weight vectors with the highest sums; take the union of D and S to obtain the final suspicious target label set T;

Step 1.3: to identify the victim labels, subtract the second-highest divergence obtained in step 1.2 from the highest; if the difference is greater than the threshold θ, the victim labels are taken to be all labels in the model; otherwise, for each suspicious target label t in T, compute its similarity to the other weight vectors, sort the similarities from high to low, and take the k_v labels with the highest similarity as the suspicious victim labels V_t;

Step 1.4: after the above three steps, combine the obtained suspicious target label set T with the victim labels V_t of each target label t in T to form the set of target-victim label pairs;

In the experiments of this embodiment, θ=0.1, k_d=3, k_s=2, and k_v=2 (in practice these may also be preset by the defender according to defense requirements), so the set D contains 3 elements, the set S contains 2 elements, and the set T contains 3 to 5 elements. When the gap between the highest and second-highest divergence exceeds 0.1, the victim labels are considered to be all labels in the model and are treated as a whole as the suspicious victim label V_t, giving 3 to 5 target-victim label pairs; otherwise the victim labels are not all labels, a suspicious victim label set V_t is computed for each target label, and the number of target-victim label pairs is 6 to 10.

In this embodiment, the divergence Divergence(l) of the weight vector w_l of label l is computed from the average cosine similarity and is defined as:

$$\cos(w_l, w_i) = \frac{w_l \cdot w_i}{\lVert w_l \rVert_2 \, \lVert w_i \rVert_2}$$

$$\mathrm{Divergence}(l) = 1 - \frac{1}{n-1} \sum_{i=1,\, i \neq l}^{n} \cos(w_l, w_i)$$

In this embodiment, cosine similarity is used to compute the similarity between weight vectors. For a suspicious target label t in T with weight vector w_t, the similarity between the weight vector w_i of label i and w_t is defined as:

$$\mathrm{Similarity}(w_i, w_t) = \frac{w_i \cdot w_t}{\lVert w_i \rVert_2 \, \lVert w_t \rVert_2}$$
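
The label-selection logic of step 1 is compact enough to sketch directly. The following is a minimal illustrative implementation (not the patented code), assuming the final-layer weights are available as a NumPy array W of shape (n, d) with one row per output label, and using the divergence and similarity definitions above; the function name and hyper-parameter defaults are illustrative and mirror the experimental values quoted above.

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def static_weight_analysis(W, k_d=3, k_s=2, k_v=2, theta=0.1):
    """W: (n, d) array; row l holds the last-layer weights feeding output label l."""
    n = W.shape[0]
    # Step 1.2: divergence of a label = 1 - average cosine similarity to the others
    sim = np.array([[cosine(W[l], W[i]) for i in range(n)] for l in range(n)])
    divergence = 1.0 - (sim.sum(axis=1) - 1.0) / (n - 1)   # drop the self-similarity of 1
    D = set(np.argsort(-divergence)[:k_d])                 # top-k_d most divergent labels
    S = set(np.argsort(-W.sum(axis=1))[:k_s])              # top-k_s largest weight sums
    T = D | S                                              # suspicious target labels
    # Step 1.3: the gap between the two highest divergences decides the victim set
    top2 = np.sort(divergence)[-2:]
    pairs = {}
    for t in T:
        if top2[1] - top2[0] > theta:
            victims = [i for i in range(n) if i != t]      # all other labels are victims
        else:
            others = sorted((i for i in range(n) if i != t), key=lambda i: -sim[t][i])
            victims = others[:k_v]                         # top-k_v most similar labels
        pairs[int(t)] = victims
    return pairs
```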

Step 2: use the suspicious target labels and victim labels obtained in step 1, together with a small number of clean image samples, to perform trigger reverse engineering and obtain reversed triggers whose success rate satisfies the preset conditions;

In this embodiment, the trigger types targeted by reverse engineering include pixel patch type triggers and image filter type triggers.

When the trigger reverse engineering is pixel patch type trigger reverse engineering, judge whether the obtained pixel patch type reversed trigger satisfies the preset success-rate and trigger-size conditions; if so, execute step 3 below; otherwise, output the detection result that the deep neural network under inspection contains no pixel patch type backdoor;

When the trigger reverse engineering is image filter type trigger reverse engineering, judge whether the obtained image filter type reversed trigger satisfies the preset success-rate condition; if so, output the detection result that the deep neural network under inspection contains an image filter type backdoor; otherwise, output the detection result that it contains no image filter type backdoor.

In this embodiment, within trigger reverse engineering, the reverse engineering of image filter type triggers comprises defining the general form in which an image filter type trigger transforms an image, and defining an optimization task to solve for the image filter type trigger;

The general form in which an image filter type trigger transforms an image is defined as follows: for a three-channel color image of dimension 3×H×W, two all-ones channels, namely a transparency (alpha) channel and a bias channel, are concatenated to obtain a matrix of dimension 5×H×W, where H and W denote height and width respectively; the filter trigger is defined as a 4×5 two-dimensional matrix, which is multiplied with the 5×H×W matrix to obtain a 4×H×W matrix; the 4×H×W matrix is regarded as an RGBA-format image whose last channel is the alpha channel; using an RGBA-to-RGB conversion, the 4×H×W matrix is reduced to a 3×H×W three-channel color image, finally yielding the image transformed by the filter trigger;

The optimization task defined to solve for the image filter type trigger requires that the image with the filter trigger applied be misclassified by the inspected neural network model into the target label, and that the structural similarity between the filtered image and the original image be as high as possible.

In this embodiment, within trigger reverse engineering, the optimization task for the reverse engineering of pixel patch type triggers is: after the same trigger is applied to victim-label images, they are misclassified by the neural network model as the target label, while the trigger uses as few pixels as possible.

Referring to Fig. 3, for the reverse engineering of pixel patch type triggers, the specific implementation of step 2 comprises the following sub-steps (a code sketch follows the list):

(1) input the set of target-victim label pairs;

(2) take the first target-victim label pair;

(3) simulate the pixel-patch application process and solve the optimization task for this label pair;

(4) generate a pixel patch type reversed trigger and compute the reversal success rate;

(5) judge whether the success rate is greater than the threshold; if yes, execute step (6) below; if not, execute step (9) below;

(6) judge whether the trigger size is smaller than the threshold; if yes, execute step (7) below; if not, execute step (9) below;

(7) the deep neural network may contain a pixel patch type backdoor; further analyze the obtained reversed trigger to judge whether a real backdoor trigger exists; if yes, execute step (8) below; if not, execute step (9) below;

(8) the deep neural network contains a pixel patch type backdoor; output this label pair and the solved reversed trigger; the procedure ends;

(9) judge whether all label pairs have been traversed; if yes, execute step (10) below; if not, take the next target-victim label pair and return to step (3);

(10) output the judgment that the deep neural network has no pixel patch type backdoor.
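
The flow above can be condensed into a short driver loop. This is an illustrative sketch only: optimize_patch_trigger and is_real_backdoor are hypothetical helper names standing in for the optimization task described below and the false-alarm filtering of step 3, and the thresholds mirror the experimental values quoted later (99% success rate, trigger size 350).

```python
def detect_pixel_patch_backdoor(model, pairs, clean_samples,
                                succ_thresh=0.99, size_thresh=350):
    """Iterate over target-victim label pairs; stop at the first confirmed backdoor.
    optimize_patch_trigger and is_real_backdoor are hypothetical helpers."""
    for target, victims in pairs.items():
        P, M, succ = optimize_patch_trigger(model, target, victims, clean_samples)
        if succ > succ_thresh and M.sum() < size_thresh:
            # a suspicious trigger; apply the false-alarm filters of step 3
            if is_real_backdoor(model, P, M, target, clean_samples):
                return {"target": target, "victims": victims, "trigger": (P, M)}
    return None  # no pixel patch type backdoor found
```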

Referring to Fig. 4, for the reverse engineering of image filter type triggers, the specific implementation of step 2 comprises the following sub-steps:

(1) input the set of target labels;

(2) take the first target label;

(3) simulate the image-filter application process and solve the optimization task for each other label;

(4) generate image filter type reversed triggers and compute the average reversal success rate;

(5) judge whether the success rate is greater than the threshold; if yes, execute step (6) below; if not, execute step (7) below;

(6) the deep neural network contains an image filter type backdoor; output this label and the solved reversed triggers; the procedure ends;

(7) judge whether all target labels have been traversed; if yes, execute step (8) below; if not, take the next target label and return to step (3);

(8) output the judgment that the deep neural network has no image filter type backdoor.

In the experiments of the present invention, the dimensions (channels, height, width) of the image samples are (3, 224, 224). When the victim labels are taken to be all labels in the model, 40 clean image samples are used, selected at random from all labels; when the victim labels are not all labels, 10 clean image samples are used, selected from the victim labels. The success-rate threshold for pixel patch type triggers is 99% and the size threshold ||M||_1 is 350 (both may be preset by the defender according to defense requirements); when a reversed trigger's attack success rate exceeds 99% and its size is below 350, it is regarded as a suspicious pixel trigger. The average attack-success-rate threshold for image filter type triggers is 90% (also presettable by the defender); when the average attack success rate of the n-1 reversed triggers exceeds 90%, the model is considered to contain an image filter type backdoor.

In this embodiment, for a pixel patch type trigger, the general form A(x, P, M) in which it is applied to an image is defined as:

$$\hat{x} = A(x, P, M) = (1 - M) \circ x + M \circ P$$

where x denotes the original image, $\hat{x}$ denotes the image after the trigger is applied, and the operator ∘ denotes the Hadamard product. P is a color matrix, the matrix M controls the position and transparency of the trigger, and a pixel patch type trigger is represented by (P, M).
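
In code, applying such a trigger is a single blend operation. A minimal PyTorch sketch, assuming x, P and M are tensors of shape (3, H, W) with the entries of M in [0, 1]:

```python
import torch

def apply_patch_trigger(x, P, M):
    """Blend the color pattern P into x at the positions/opacities given by M."""
    return (1 - M) * x + M * P  # elementwise (Hadamard) products
```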

In this embodiment, for an image filter type trigger, the general form F(x, 𝒯) in which it transforms an image is defined as:

$$\hat{x} = F(x, \mathcal{T}) = \mathrm{rgba2rgb}\big(\mathcal{T} \cdot \mathrm{concatenate}(x, \mathbf{1}, \mathbf{1})\big)$$

where x denotes the original image, $\hat{x}$ denotes the image after the trigger is applied, the operator · denotes matrix multiplication, the operation concatenate denotes matrix concatenation, and the operation rgba2rgb denotes converting a four-channel RGBA image into the three-channel RGB format. The trigger $\mathcal{T}$ is a two-dimensional matrix of dimension 4×5.
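
A minimal PyTorch sketch of this transform follows. The rgba2rgb step is implemented here as alpha-compositing onto a black background, which is one common convention; the patent does not fix a specific RGBA-to-RGB conversion.

```python
import torch

def apply_filter_trigger(x, T):
    """x: (3, H, W) image in [0, 1]; T: (4, 5) filter matrix."""
    _, H, W = x.shape
    ones = torch.ones(2, H, W)                 # all-ones alpha and bias channels
    x5 = torch.cat([x, ones], dim=0)           # (5, H, W)
    y = torch.einsum('cf,fhw->chw', T, x5)     # per-pixel linear map -> (4, H, W) RGBA
    rgb, alpha = y[:3], y[3:4]
    return (rgb * alpha).clamp(0, 1)           # rgba2rgb: composite onto black (assumed)
```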

In this embodiment, the optimization task for a pixel patch type trigger is: after the same trigger is applied to victim-label images, they are misclassified by the neural network model as the target label, while the trigger uses as few pixels as possible. It is defined as:

$$\min_{P, M} \; \sum_{x \in X} \mathcal{L}\big(f(A(x, P, M)), \, y_t\big) + \beta \, \lVert M \rVert_1$$

where f denotes the model under inspection, whose output is the predicted probability; X denotes the set of clean images from the victim labels; $\mathcal{L}$ denotes the cross-entropy loss function; y_t denotes the target label; ||M||_1 denotes the L1 norm of M; and β denotes a constraint parameter.
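
A sketch of this optimization in PyTorch follows. The mask is parameterized through a sigmoid to keep its values in (0, 1), the model is assumed to return logits, and beta, steps and lr are illustrative hyper-parameters rather than values fixed by the patent.

```python
import torch
import torch.nn.functional as F

def reverse_patch_trigger(model, X, y_t, beta=1e-3, steps=500, lr=0.1):
    """X: (B, 3, H, W) clean images from the victim labels; y_t: target label index."""
    _, _, H, W = X.shape
    P = torch.rand(3, H, W, requires_grad=True)    # trigger color pattern
    m = torch.zeros(1, H, W, requires_grad=True)   # unconstrained mask logits
    opt = torch.optim.Adam([P, m], lr=lr)
    target = torch.full((X.shape[0],), y_t, dtype=torch.long)
    for _ in range(steps):
        M = torch.sigmoid(m)                       # mask values in (0, 1)
        x_hat = (1 - M) * X + M * P.clamp(0, 1)    # A(x, P, M)
        loss = F.cross_entropy(model(x_hat), target) + beta * M.sum()  # CE + beta*||M||_1
        opt.zero_grad(); loss.backward(); opt.step()
    return P.detach().clamp(0, 1), torch.sigmoid(m).detach()
```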

In this embodiment, for the image filter type trigger, a reversed trigger is optimized and solved for each model output label except the target label, giving n-1 reversed triggers in total, and the average attack success rate of the n-1 reversed triggers is computed. The rationale is that in a backdoor model attacked with an image filter type trigger the internal decision space is corrupted, so that for the vast majority of labels a single linear transformation can create a path into the decision region of the target label. The optimization task is: after victim-label images are transformed by the filter trigger, they are misclassified by the neural network model as the target label; and since the filtered images remain clearly recognizable, the structural similarity between the filtered image and the original image should be as high as possible. It is defined as:

$$\min_{\mathcal{T}} \; \sum_{x \in X} \mathcal{L}\big(f(F(x, \mathcal{T})), \, y_t\big) + \gamma \, \big(1 - \mathrm{SSIM}(x, F(x, \mathcal{T}))\big)$$

where f denotes the model under inspection, whose output is the predicted probability; X denotes the set of clean images from any label other than the target label; $\mathcal{L}$ denotes the cross-entropy loss function; y_t denotes the target label; SSIM denotes the structural similarity index; and γ denotes a constraint parameter.
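
A corresponding sketch for the filter trigger follows, reusing apply_filter_trigger from the earlier sketch. ssim_fn is assumed to be supplied by the caller (for example from an SSIM library), and gamma, steps and lr are again illustrative.

```python
import torch
import torch.nn.functional as F

def reverse_filter_trigger(model, X, y_t, ssim_fn, gamma=0.1, steps=500, lr=0.01):
    """X: (B, 3, H, W) clean images from one non-target label; y_t: target label index."""
    T = torch.zeros(4, 5)
    T[:3, :3] = torch.eye(3)                  # start from the identity filter:
    T[3, 4] = 1.0                             # RGB passes through, alpha = 1 via bias
    T.requires_grad_(True)
    opt = torch.optim.Adam([T], lr=lr)
    target = torch.full((X.shape[0],), y_t, dtype=torch.long)
    for _ in range(steps):
        x_hat = torch.stack([apply_filter_trigger(x, T) for x in X])
        loss = F.cross_entropy(model(x_hat), target) \
             + gamma * (1 - ssim_fn(x_hat, X))        # CE + gamma*(1 - SSIM)
        opt.zero_grad(); loss.backward(); opt.step()
    return T.detach()
```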

In this embodiment, the optimization tasks are solved with the Adam optimizer to obtain the reversed triggers.

Step 3: analyze the shape attributes of the obtained pixel patch type trigger and the distribution of internal neurons of the inspected deep neural network that it activates, and output the final detection result.

Referring to Fig. 5, in this embodiment the specific implementation of step 3 comprises the following sub-steps:

Step 3.1: for a pixel patch type trigger whose success rate and trigger size satisfy the preset conditions, analyze the sparsity of its pixel distribution; if it is higher than the preset sparsity threshold, the trigger is judged to be an adversarial-perturbation false alarm and the detection result is output that the deep neural network contains no pixel patch type backdoor; otherwise, continue with step 3.2 below;

In this embodiment, the pixel-distribution sparsity of a reversed trigger is computed as the trigger's grid coverage. The whole image is evenly divided into small blocks, each regarded as a grid cell; the grid coverage of the trigger is defined as the proportion of cells containing trigger pixels among all cells. When this proportion exceeds a certain threshold, the trigger is regarded as an adversarial perturbation rather than a backdoor trigger maliciously implanted by the attacker.

For example, in the experiments of the present invention, a 224×224 image is evenly divided into 1024 cells of 7×7 pixels, and the grid-coverage threshold is 10% (in practice this may be set by the defender according to defense requirements); when a reversed trigger's grid coverage exceeds 10%, it is regarded as an adversarial perturbation rather than a backdoor trigger maliciously implanted by the attacker.
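
Grid coverage reduces to a few lines. A minimal sketch, assuming M is the reversed trigger's (H, W) mask and any entry above a small eps counts as a trigger pixel:

```python
import numpy as np

def grid_coverage(M, cell=7, eps=1e-3):
    """Fraction of cells containing any trigger pixel (7x7 cells on 224x224 give 1024)."""
    H, W = M.shape
    covered, total = 0, 0
    for i in range(0, H, cell):
        for j in range(0, W, cell):
            total += 1
            if (M[i:i + cell, j:j + cell] > eps).any():
                covered += 1
    return covered / total
```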

Step 3.2: analyze the similarity between the distribution of internal neurons of the inspected deep neural network that the trigger activates and the distribution of neurons activated by clean image samples of the target label; if it is higher than the preset similarity threshold, the trigger is judged to be a natural-feature false alarm and the detection result is output that the deep neural network contains no pixel patch type backdoor; otherwise, continue with step 3.3 below;

In this embodiment, the distribution of activated internal neurons is computed from the neurons in the penultimate layer of the model, with ReLU as the neuron activation function.

In this embodiment, the similarity of activated-neuron distributions is computed as the similarity NBS between the sets of neurons with the largest activation values, defined as:

$$\mathrm{NBS} = \frac{1}{|\hat{X}|} \sum_{\hat{x} \in \hat{X}} \frac{|N_c \cap N_{\hat{x}}|}{r}$$

where N_c denotes the set of r neurons with the largest activation values on clean samples of the target label, $N_{\hat{x}}$ denotes the set of r neurons with the largest activation values after the trigger is added to a victim-label sample $\hat{x}$, and $\hat{X}$ denotes the set of all victim-label samples with the trigger added.

For example, in the experiments of the present invention, r=30 is used; when the similarity of activated-neuron distributions exceeds 0.5 (in practice this may be preset by the defender according to defense requirements), the reversed trigger is regarded as a natural feature rather than a backdoor trigger maliciously implanted by the attacker.
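
A sketch of the NBS computation follows, assuming a model_features function that returns the penultimate-layer (post-ReLU) activations, and taking N_c from the mean activation over the clean target-label samples, which is one reasonable reading of the definition above.

```python
import torch

def nbs(model_features, clean_target_imgs, triggered_imgs, r=30):
    """Average overlap between the top-r neuron sets of clean target-label samples
    and of trigger-stamped victim-label samples."""
    a_clean = model_features(clean_target_imgs).mean(dim=0)   # (d,) mean activations
    N_c = set(torch.topk(a_clean, r).indices.tolist())
    scores = []
    for x in triggered_imgs:                                  # each x: (3, H, W)
        a = model_features(x.unsqueeze(0)).squeeze(0)
        N_x = set(torch.topk(a, r).indices.tolist())
        scores.append(len(N_c & N_x) / r)
    return sum(scores) / len(scores)
```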

Step 3.3: if a pixel patch type trigger whose success rate and trigger size satisfy the preset conditions survives both of the above checks, it is determined to be a maliciously implanted backdoor trigger, and the detection result is output that the deep neural network contains a pixel patch type backdoor, together with the target label, the victim labels, and the trigger pattern.

The present invention identifies suspicious target-victim label pairs through static weight analysis, performs trigger reverse engineering for the identified label pairs, and recovers the attacker-defined trigger pattern, achieving efficient, stable, and scalable neural network backdoor detection.

It should be understood that the above description of the preferred embodiments is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the present invention. Those of ordinary skill in the art, under the inspiration of the present invention and without departing from the scope protected by the claims, may also make substitutions or variations, all of which fall within the protection scope of the present invention; the claimed protection scope of the present invention shall be subject to the appended claims.

Claims (9)

1. A static weight guided deep neural network backdoor detection method is characterized by comprising the following steps:
step 1: performing static weight analysis on the deep neural network to obtain a suspicious target label and a victim label of backdoor attack to form a target-victim label pair;
the specific implementation of the step 1 comprises the following substeps:
step 1.1: extracting all weights connecting the last layer of the deep neural network with the output labels; assuming the deep neural network has n output labels, organizing the weights connected with each label into vectors to obtain n weight vectors w_1…w_n;
step 1.2: for each weight vector, calculating its divergence from all other weight vectors, sorting the divergences from high to low, and taking the target label set D corresponding to the top k_d weight vectors with the highest divergence; for each weight vector, calculating the sum of all its weights, sorting the sums from high to low, and taking the target label set S corresponding to the top k_s weight vectors with the highest sums; merging D and S to obtain a final suspicious target label set T;
step 1.3: subtracting the second-highest divergence obtained in step 1.2 from the highest; if the obtained difference is greater than a threshold value θ, determining that the victim labels are all labels in the model; otherwise, for each suspicious target label t in T, calculating its similarity to the other weight vectors, sorting the similarities from high to low, and taking the top k_v labels with the highest similarity as the suspicious victim labels V_t;
step 1.4: combining the obtained suspicious target label set T and the victim labels V_t corresponding to each target label t in T into a set of target-victim label pairs;
step 2: performing trigger reverse engineering using the suspicious target labels and victim labels obtained in step 1 together with the clean image samples to obtain a reversed trigger;
when the trigger reverse engineering is the pixel patch type trigger reverse engineering, judging whether the obtained pixel patch type reverse trigger meets preset conditions of success rate and trigger size, and if the preset conditions are met, executing the following step 3; otherwise, outputting a detection result that the deep neural network to be detected does not contain a pixel patch type back door;
when the trigger reverse engineering is the image filter type trigger reverse engineering, judging whether the obtained image filter type reverse trigger meets a success rate preset condition, and if the obtained image filter type reverse trigger meets the preset condition, outputting a detection result that the deep neural network to be detected contains an image filter type back door; otherwise, outputting a detection result that the deep neural network to be detected does not contain an image filter type back door;
step 3: analyzing the shape attributes of the pixel patch type reversed trigger obtained in step 2 and the distribution of internal neurons of the deep neural network under inspection that it activates, and outputting a final detection result.
2. The static weight guided deep neural network back door detection method of claim 1, wherein: in step 1.2, the divergence Divergence(l) of the weight vector w_l of label l is computed using the average cosine similarity and is defined as:

$$\cos(w_l, w_i) = \frac{w_l \cdot w_i}{\lVert w_l \rVert_2 \, \lVert w_i \rVert_2}$$

$$\mathrm{Divergence}(l) = 1 - \frac{1}{n-1} \sum_{i=1,\, i \neq l}^{n} \cos(w_l, w_i)$$
3. The static weight guided deep neural network back door detection method of claim 1, wherein: in step 1.3, the similarity between weight vectors is computed using cosine similarity; for a suspicious target label t in T with weight vector w_t, the similarity between the weight vector w_i of label i and w_t is defined as:

$$\mathrm{Similarity}(w_i, w_t) = \frac{w_i \cdot w_t}{\lVert w_i \rVert_2 \, \lVert w_t \rVert_2}$$
4. The static weight guided deep neural network back door detection method of claim 1, wherein: in step 2, within trigger reverse engineering, the reverse engineering of an image filter type trigger comprises defining a general form in which the image filter type trigger transforms an image and defining an optimization task to solve for the image filter type trigger;
the general form in which the image filter type trigger transforms an image comprises: concatenating two all-ones channels, namely a transparency channel and a bias channel, to a three-channel color image of dimension 3×H×W to obtain a matrix of dimension 5×H×W, wherein H and W denote height and width respectively; defining the filter trigger as a two-dimensional matrix of size 4×5 and multiplying it with the matrix of dimension 5×H×W to obtain a 4×H×W matrix; regarding the 4×H×W matrix as an RGBA-format image whose last channel is the transparency channel; reducing the 4×H×W matrix to a 3×H×W three-channel color image using an RGBA-to-RGB conversion, finally obtaining the image transformed by the filter trigger;
the optimization task defined to solve for the image filter type trigger comprises: the image with the filter trigger applied being misclassified by the neural network model under inspection into the target label, and the structural similarity between the image with the filter trigger applied and the original image being as high as possible.
5. The static weight guided deep neural network back door detection method of claim 1, wherein: in step 2, within trigger reverse engineering, the optimization task for the reverse engineering of a pixel patch type trigger is: after the same trigger is applied to victim-label images, they are misjudged as the target label by the neural network model, and the trigger uses as few pixels as possible.
6. The method of claim 1, wherein the step 2 is implemented by the following steps for reverse engineering of an image filter type trigger:
(1) inputting a target label set;
(2) taking a first target label;
(3) simulating an image filter adding process, and solving an optimization task for each other label;
(4) generating an image filter type reverse trigger, and calculating the average reverse success rate;
(5) judging whether the success rate is greater than a threshold value; if yes, executing the following step (6); if not, executing the following step (7);
(6) the deep neural network contains an image filter type back door, and the label and the solved reversed trigger are output; the procedure ends;
(7) judging whether all target labels have been traversed; if yes, executing the following step (8); if not, taking the next target label and returning to step (3);
(8) outputting the judgment result that the deep neural network has no image filter type back door.
7. The static weight guided deep neural network back door detection method as claimed in claim 1, wherein for the reverse engineering of the pixel patch type trigger, the implementation of step 2 comprises the following sub-steps:
(1) inputting a set of target-victim tag pairs;
(2) taking a first target-victim tag pair;
(3) simulating a pixel patch adding process, and solving an optimization task for the label pair;
(4) generating a pixel patch type reverse trigger, and calculating a reverse success rate;
(5) judging whether the success rate is greater than a threshold value; if yes, executing the following step (6); if not, executing the following step (9);
(6) judging whether the size of the trigger is smaller than a threshold value; if yes, executing the following step (7); if not, executing the following step (9);
(7) the deep neural network may contain a pixel patch type back door; further analyzing the obtained reversed trigger to judge whether a real back door trigger exists; if yes, executing the following step (8); if not, executing the following step (9);
(8) the deep neural network contains a pixel patch type back door, and the label pair and the solved reversed trigger are output; the procedure ends;
(9) judging whether all the label pairs have been traversed; if yes, executing the following step (10); if not, taking the next target-victim label pair and returning to step (3);
(10) outputting the judgment result that the deep neural network has no pixel patch type back door.
8. The method for detecting the backdoor of the static weight guided deep neural network according to any one of claims 1 to 7, wherein the step 3 is implemented by the following sub-steps:
step 3.1: analyzing the sparsity of the pixel distribution of a pixel patch type trigger whose success rate and trigger size satisfy the preset conditions; if the sparsity is higher than a preset sparsity threshold, determining it to be an adversarial-perturbation false alarm and outputting the detection result that the deep neural network contains no pixel patch type back door; otherwise, continuing with the following step 3.2;
step 3.2: analyzing the similarity between the distribution of internal neurons of the inspected deep neural network activated by the trigger and the distribution of neurons activated by clean image samples of the target label; if the similarity is higher than a preset similarity threshold, determining it to be a natural-feature false alarm and outputting the detection result that the deep neural network contains no pixel patch type back door; otherwise, continuing with the following step 3.3;
step 3.3: if a pixel patch type trigger whose success rate and trigger size satisfy the preset conditions survives both of the above checks, determining it to be a maliciously implanted back door trigger and outputting the detection result that the deep neural network contains a pixel patch type back door, together with the target label, the victim label, and the trigger pattern.
9. A deep neural network backdoor detection system guided by static weight is characterized by comprising the following modules:
the module 1 is used for performing static weight analysis on the deep neural network to obtain a suspicious target label and a victim label of backdoor attack to form a target-victim label pair;
module 1 includes the following sub-modules:
a module 1.1, configured to extract all weights of the last layer of the deep neural network connected to the output tags, assuming that the deep neural network has n output tags, organizing the weights connected to each tag into a vector, and obtaining n weight vectors w1…wn
A module 1.2, configured to calculate, for each weight vector, a degree of difference between the weight vector and all other weight vectors, rank the degree of difference from high to low, and take the top k with high degree of differencedA target label set D corresponding to each weight vector; for each weight vector, calculating the sum of all weights, ordering the weight sums from high to low, taking the weight sum highsA target label set S corresponding to each weight vector; merging the D and the S to obtain a final suspicious target label set T;
a module 1.3, configured to subtract the second highest degree of difference obtained by module 1.2 from the highest one; if the resulting gap is greater than a threshold θ, the victim labels are taken to be all labels in the model; otherwise, for each suspicious target label t in T, calculate the similarity between its weight vector and the other weight vectors, sort the similarities from high to low, and take the top kv labels with the highest similarity as the suspicious victim labels Vt;
a module 1.4, configured to form the set of target-victim label pairs from the suspicious target label set T and the victim labels Vt corresponding to each target label t in T (see the sketch after this claim);
a module 2, configured to perform trigger reverse engineering by using the suspicious target labels and victim labels obtained by module 1 together with clean image samples, to obtain a reverse trigger;
when the trigger reverse engineering is pixel patch type trigger reverse engineering, judging whether the obtained pixel patch type reverse trigger meets the preset success rate and trigger size conditions; if the conditions are met, executing the following module 3; otherwise, outputting the detection result that the deep neural network to be detected contains no pixel patch type backdoor;
when the trigger reverse engineering is image filter type trigger reverse engineering, judging whether the obtained image filter type reverse trigger meets the preset success rate condition; if the condition is met, outputting the detection result that the deep neural network to be detected contains an image filter type backdoor; otherwise, outputting the detection result that the deep neural network to be detected contains no image filter type backdoor;
and a module 3, configured to analyze the shape attributes of the obtained pixel patch type trigger and the distribution of the neurons it activates in the deep neural network to be detected, and to output the final detection result.
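To make the static weight analysis of modules 1.1 to 1.4 concrete, here is a hedged Python sketch. The concrete metrics are assumptions: the degree of difference is taken as the mean Euclidean distance to the other weight vectors and the similarity as cosine similarity; k_d, k_s, k_v and θ are illustrative parameters, not values from the claims.

```python
import numpy as np

def static_weight_analysis(W, k_d=3, k_s=3, k_v=3, theta=1.0):
    # W: last-layer weight matrix of shape (n_labels, n_inputs); row i is
    # the weight vector w_i of output label i (module 1.1).
    n = W.shape[0]

    # Module 1.2: degree of difference (assumed: mean Euclidean distance to
    # all other weight vectors) and per-label weight sums.
    dists = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=-1)
    diff = dists.sum(axis=1) / (n - 1)
    D = set(np.argsort(diff)[::-1][:k_d])            # top-k_d most different labels
    S = set(np.argsort(W.sum(axis=1))[::-1][:k_s])   # top-k_s largest weight sums
    T = D | S                                        # suspicious target labels

    # Module 1.3: if the highest degree of difference dominates the second
    # highest by more than theta, every other label is a potential victim;
    # otherwise take the k_v labels most similar (assumed: cosine) to t.
    top1, top2 = np.sort(diff)[::-1][:2]
    normed = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    cos = normed @ normed.T

    pairs = []
    for t in T:
        if top1 - top2 > theta:
            victims = [v for v in range(n) if v != t]
        else:
            victims = [v for v in np.argsort(cos[t])[::-1] if v != t][:k_v]
        pairs.extend((t, v) for v in victims)        # module 1.4: label pairs
    return pairs
```

The mean-distance and cosine choices mirror the difference and similarity roles the claim assigns to the weight vectors; they should be read as one plausible instantiation rather than the patented metrics themselves.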
CN202210177556.2A 2022-02-25 2022-02-25 A static weight-guided deep neural network backdoor detection method and system Active CN114638356B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210177556.2A | 2022-02-25 | 2022-02-25 | A static weight-guided deep neural network backdoor detection method and system

Publications (2)

Publication Number | Publication Date
CN114638356A (en) | 2022-06-17
CN114638356B (en) | 2024-06-28

Family

ID=81947670

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210177556.2A (Active, granted as CN114638356B) | A static weight-guided deep neural network backdoor detection method and system | 2022-02-25 | 2022-02-25

Country Status (1)

Country | Link
CN (1) | CN114638356B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2021042665A1 (en) * | 2019-09-04 | 2021-03-11 | 笵成科技南京有限公司 | DNN-based method for protecting passport against fuzzy attack
CN113269308A (en) * | 2021-05-31 | 2021-08-17 | Beijing Institute of Technology | Clean-label neural network backdoor implantation method based on a universal adversarial trigger

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MO Yujun; HUANG Jie; PAN Yujia: "Design and Implementation of an Active Defense System Based on Network Security Situation Awareness", Journal of Medical Informatics (医学信息学杂志), no. 03, 25 March 2020 (2020-03-25) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115186816A (en) * | 2022-09-08 | 2022-10-14 | 南京逸智网络空间技术创新研究院有限公司 | Backdoor detection method based on decision shortcut search
WO2024051183A1 (en) * | 2022-09-08 | 2024-03-14 | 南京逸智网络空间技术创新研究院有限公司 | Backdoor detection method based on decision shortcut search

Also Published As

Publication number | Publication date
CN114638356B (en) | 2024-06-28

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant