CN114638356A - Static weight-guided deep neural network backdoor detection method and system - Google Patents
Static weight-guided deep neural network backdoor detection method and system
- Publication number: CN114638356A (application CN202210177556.2A)
- Authority
- CN
- China
- Prior art keywords: trigger, label, neural network, deep neural, target
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
Description
Technical Field
The invention belongs to the fields of artificial intelligence and network security, relates to a deep neural network backdoor detection method, and in particular to a static weight-guided deep neural network backdoor detection method.
Background Art
In recent years, deep neural networks have achieved strong performance in many fields, such as computer vision, malware detection, and autonomous driving. Because building and deploying a well-performing neural network model requires substantial expert knowledge and computational resources, users generally outsource training to the cloud or download pre-trained models.
However, existing research has shown that neural networks are vulnerable to backdoor attacks, so pre-trained models obtained from third parties may carry serious security risks. In a backdoor attack, the attacker defines a backdoor trigger and specifies the attack's target label and victim labels. The victim labels may be all labels other than the target label, or several labels chosen by the attacker. During training, the attacker stamps the trigger onto data from the victim labels and relabels it as the target label, thereby implanting a backdoor into the model. In use, the backdoored model still classifies clean inputs correctly, but once an input from a victim label contains the attacker-defined trigger, it is classified as the attacker-specified target label.
Defending against neural network backdoor attacks has long been a research focus. Given a pre-trained model, the defender must determine whether it contains a malicious backdoor. Defenders typically cannot access the training set or the trigger pattern used by the attacker, and possess only a small set of clean samples for validating model functionality. The main difficulties of backdoor detection lie in identifying the attacker-specified target-victim label pairs and recovering the trigger pattern.
Traditional detection methods usually either blindly try all label pairs or pre-select labels from a few clean input samples, and then recover the trigger pattern for each label pair by reverse engineering. Moreover, most traditional methods assume that the backdoor trigger is of the pixel-patch type, whereas filter-type triggers based on image transformations are another common and easily implemented form of backdoor attack. Because reverse engineering every label pair is computationally expensive, and because of uncertainty in input-sample quality and trigger type, traditional methods typically suffer from high computational complexity, unstable accuracy, and poor scalability.
Summary of the Invention
To improve the efficiency, accuracy, and scalability of backdoor detection for deep neural network models, the present invention provides a static weight-guided deep neural network backdoor detection method and system. The input is a deep neural network model to be examined together with a small set of clean image samples for validating model functionality; the output is whether the model contains a backdoor and, if so, the target label, the victim labels, and the trigger pattern of the backdoor attack.
The technical solution adopted by the method of the present invention is a static weight-guided deep neural network backdoor detection method comprising the following steps:
Step 1: Perform static weight analysis on the deep neural network to obtain the suspicious target labels and victim labels of a backdoor attack, forming target-victim label pairs.
The specific implementation of Step 1 comprises the following sub-steps:
Step 1.1: Extract all weights connecting the last layer of the deep neural network to the output labels. Assuming the network has n output labels, organize the weights connected to each label into a vector, yielding n weight vectors w1, …, wn.
Step 1.2: For each weight vector, compute its divergence from all other weight vectors, sort the divergences from high to low, and take the target labels corresponding to the top kd most divergent weight vectors as set D. For each weight vector, compute the sum of all its weights, sort the sums from high to low, and take the target labels corresponding to the top ks weight vectors with the largest sums as set S. The union of D and S gives the final suspicious target label set T.
Step 1.3: Subtract the second-highest divergence obtained in Step 1.2 from the highest; if the difference exceeds a threshold θ, the victim labels are taken to be all labels in the model. Otherwise, for each suspicious target label t in T, compute the similarity between its weight vector and the other weight vectors, sort the similarities from high to low, and take the top kv most similar labels as the suspicious victim labels Vt.
Step 1.4: Combine the suspicious target label set T and, for each target label t in T, its corresponding victim labels Vt into a set of target-victim label pairs.
Step 2: Using the suspicious target and victim labels obtained in Step 1 together with clean image samples, perform trigger reverse engineering to obtain reversed triggers.
When the trigger reverse engineering is pixel-patch trigger reverse engineering, judge whether the resulting pixel-patch reversed trigger satisfies the preset conditions on success rate and trigger size; if so, execute Step 3 below; otherwise, output the detection result that the deep neural network under examination contains no pixel-patch backdoor.
When the trigger reverse engineering is image-filter trigger reverse engineering, judge whether the resulting image-filter reversed trigger satisfies the preset success-rate condition; if so, output the detection result that the deep neural network under examination contains an image-filter backdoor; otherwise, output the detection result that it contains no image-filter backdoor.
Step 3: Analyze the shape attributes of the pixel-patch reversed trigger obtained in Step 2 and the distribution of internal neurons it activates in the deep neural network under examination, and output the final detection result.
The technical solution adopted by the system of the present invention is a static weight-guided deep neural network backdoor detection system comprising the following modules:
Module 1 is configured to perform static weight analysis on the deep neural network to obtain the suspicious target labels and victim labels of a backdoor attack, forming target-victim label pairs.
Module 1 comprises the following sub-modules:
Module 1.1 is configured to extract all weights connecting the last layer of the deep neural network to the output labels; assuming the network has n output labels, the weights connected to each label are organized into a vector, yielding n weight vectors w1, …, wn.
Module 1.2 is configured to compute, for each weight vector, its divergence from all other weight vectors, sort the divergences from high to low, and take the target labels corresponding to the top kd most divergent weight vectors as set D; to compute, for each weight vector, the sum of all its weights, sort the sums from high to low, and take the target labels corresponding to the top ks weight vectors with the largest sums as set S; and to take the union of D and S as the final suspicious target label set T.
Module 1.3 is configured to subtract the second-highest divergence obtained by Module 1.2 from the highest; if the difference exceeds a threshold θ, the victim labels are taken to be all labels in the model; otherwise, for each suspicious target label t in T, the similarity between its weight vector and the other weight vectors is computed, the similarities are sorted from high to low, and the top kv most similar labels are taken as the suspicious victim labels Vt.
Module 1.4 is configured to combine the suspicious target label set T and, for each target label t in T, its corresponding victim labels Vt into a set of target-victim label pairs.
Module 2 is configured to perform trigger reverse engineering using the suspicious target and victim labels obtained by Module 1 together with clean image samples, obtaining reversed triggers.
When the trigger reverse engineering is pixel-patch trigger reverse engineering, it is judged whether the resulting pixel-patch reversed trigger satisfies the preset conditions on success rate and trigger size; if so, Module 3 below is executed; otherwise, the output detection result is that the deep neural network under examination contains no pixel-patch backdoor.
When the trigger reverse engineering is image-filter trigger reverse engineering, it is judged whether the resulting image-filter reversed trigger satisfies the preset success-rate condition; if so, the output detection result is that the deep neural network under examination contains an image-filter backdoor; otherwise, the output detection result is that it contains no image-filter backdoor.
Module 3 is configured to analyze the shape attributes of the resulting pixel-patch trigger and the distribution of internal neurons it activates in the deep neural network under examination, and to output the final detection result.
Advantages of the present invention:
1. The present invention innovatively applies static weight analysis to identify suspicious backdoor target-victim label pairs in the model under examination, effectively exploiting the information carried by the model weights. Static analysis of the weights requires neither model execution, giving low computational complexity, nor input samples, so it overcomes the influence of sample quantity and quality and achieves stable label identification. By contrast, traditional methods either blindly try all target-victim label pairs, which is inefficient and impractical, or pre-select labels from a few clean input samples, whose effectiveness is easily limited by sample quantity and quality.
2. Exploiting the fact that static weight analysis is unaffected by trigger type and scales well, the present invention reverse engineers backdoor triggers under the guidance of the static weight analysis results, and can detect and recover both pixel-patch triggers and image-filter triggers. Traditional methods mostly assume pixel-patch triggers and are difficult to extend to the detection and recovery of image-filter triggers.
3. The present invention performs shape-attribute analysis and activated-neuron distribution analysis on the reverse-engineered triggers, effectively reducing the false positive rate, so that the method detects backdoored models while retaining normal ones, achieving high-precision backdoor detection for deep neural network models.
Brief Description of the Drawings
FIG. 1 is a flowchart of the method according to an embodiment of the present invention.
FIG. 2 is a flowchart of the static weight analysis according to an embodiment of the present invention.
FIG. 3 is a flowchart of pixel-patch trigger reverse engineering according to an embodiment of the present invention.
FIG. 4 is a flowchart of image-filter trigger reverse engineering according to an embodiment of the present invention.
FIG. 5 is a flowchart of pixel-patch trigger analysis according to an embodiment of the present invention.
Detailed Description of the Embodiments
To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the implementation examples described here serve only to illustrate and explain the present invention and are not intended to limit it.
Traditional detection methods usually either blindly try all target-victim label pairs or pre-select labels from a few clean input samples, and then recover the trigger pattern for each label pair by reverse engineering. When many label pairs must be tried, or poor input-sample quality leads to inaccurate label-pair selection, the efficiency and accuracy of traditional methods suffer; moreover, most of them apply only to pixel-patch triggers.
During training, a neural network accomplishes its classification task by updating its weight parameters. Once training is complete, the weights are fixed and can be regarded as a static attribute of the model. Because implanting a backdoor necessarily tampers with the model's weights, a backdoored model inevitably exhibits anomalies in its weight distribution compared with a clean model. These anomalies are independent of the trigger type and can be used to identify the attacker-specified target-victim label pairs, thereby guiding the backdoor detection process and improving detection efficiency.
Based on the above analysis, the present invention proposes a deep neural network backdoor detection method and system guided by static weight analysis. The invention uses weight distribution anomalies in a pre-trained neural network to guide backdoor detection, exploiting the fact that static weight analysis has low computational cost and is unaffected by input-sample quality or trigger type, and thereby arrives at an efficient, stable, and scalable neural network backdoor detection method.
Referring to FIG. 1, the static weight-guided deep neural network backdoor detection method provided by the present invention comprises the following steps:
Step 1: Perform static weight analysis on the deep neural network to obtain the suspicious target labels and victim labels of a backdoor attack, forming target-victim label pairs.
Referring to FIG. 2, in this embodiment the specific implementation of Step 1 comprises the following sub-steps:
Step 1.1: Extract all weights connecting the last layer of the deep neural network to the output labels. Assuming the network has n output labels, organize the weights connected to each label into a vector, yielding n weight vectors w1, …, wn.
Step 1.2: For each weight vector, compute its divergence from all other weight vectors, sort the divergences from high to low, and take the target labels corresponding to the top kd most divergent weight vectors as set D. For each weight vector, compute the sum of all its weights, sort the sums from high to low, and take the target labels corresponding to the top ks weight vectors with the largest sums as set S. The union of D and S gives the final suspicious target label set T.
Step 1.3: To identify the victim labels, subtract the second-highest divergence obtained in Step 1.2 from the highest; if the difference exceeds a threshold θ, the victim labels are taken to be all labels in the model. Otherwise, for each suspicious target label t in T, compute the similarity between its weight vector and the other weight vectors, sort the similarities from high to low, and take the top kv most similar labels as the suspicious victim labels Vt.
Step 1.4: After the three steps above, combine the suspicious target label set T and, for each target label t in T, its corresponding victim labels Vt into a set of target-victim label pairs.
In the experiments of this embodiment, θ = 0.1, kd = 3, ks = 2, and kv = 2 (in a concrete implementation these may also be preset by the defender according to defense requirements); set D then contains 3 elements, set S contains 2 elements, and set T contains 3-5 elements. When the highest divergence exceeds the second-highest by more than 0.1, the victim labels are considered to be all labels in the model, all labels are treated collectively as the suspicious victim labels Vt, and the number of target-victim label pairs is 3-5; otherwise the victim labels are not all labels, suspicious victim labels Vt are computed for each target label, and the number of target-victim label pairs is 6-10.
In this embodiment, the divergence Divergence(l) of label l's weight vector wl is computed from the average cosine similarity between wl and the other weight vectors, i.e., as the mean cosine distance:

$$\mathrm{Divergence}(l) = 1 - \frac{1}{n-1}\sum_{i=1,\ i \neq l}^{n} \cos(w_l, w_i)$$
In this embodiment, cosine similarity is used to compute the similarity between weight vectors. For a suspicious target label t in T with weight vector wt, the similarity between label i's weight vector wi and wt is defined as:

$$\mathrm{Similarity}(i, t) = \cos(w_i, w_t) = \frac{w_i \cdot w_t}{\|w_i\|\,\|w_t\|}$$
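By way of illustration, the following is a minimal sketch of the static weight analysis (Steps 1.1-1.4) in Python/PyTorch. The attribute name `model.fc` for the final fully-connected layer, the helper's signature, and the one-minus-average-cosine-similarity form of the divergence are assumptions of the sketch, not a definitive rendering of the claimed method.

```python
import torch
import torch.nn.functional as F

def static_weight_analysis(model, k_d=3, k_s=2, k_v=2, theta=0.1):
    W = model.fc.weight.detach()            # (n, m): one weight vector per output label
    n = W.size(0)
    # Pairwise cosine similarity between all label weight vectors.
    cos = F.cosine_similarity(W.unsqueeze(1), W.unsqueeze(0), dim=2)
    # Divergence(l): one minus the average similarity to the other labels.
    divergence = 1.0 - (cos.sum(dim=1) - 1.0) / (n - 1)
    sums = W.sum(dim=1)                     # per-label weight sum

    D = set(divergence.topk(k_d).indices.tolist())   # most divergent labels
    S = set(sums.topk(k_s).indices.tolist())         # largest weight sums
    T = D | S                                        # suspicious target labels

    top2 = divergence.topk(2).values
    if top2[0] - top2[1] > theta:
        # Large gap between highest and second-highest divergence:
        # treat all other labels as victims of each suspicious target.
        return {t: [v for v in range(n) if v != t] for t in T}
    victims = {}
    for t in T:
        sim = cos[t].clone()
        sim[t] = float("-inf")              # exclude the target label itself
        victims[t] = sim.topk(k_v).indices.tolist()
    return victims
```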
Step 2: Using the suspicious target and victim labels obtained in Step 1 together with a small number of clean image samples, perform trigger reverse engineering to obtain reversed triggers whose success rate satisfies the preset conditions.
In this embodiment, the trigger types targeted by reverse engineering include pixel-patch triggers and image-filter triggers.
When the trigger reverse engineering is pixel-patch trigger reverse engineering, judge whether the resulting pixel-patch reversed trigger satisfies the preset conditions on success rate and trigger size; if so, execute Step 3 below; otherwise, output the detection result that the deep neural network under examination contains no pixel-patch backdoor.
When the trigger reverse engineering is image-filter trigger reverse engineering, judge whether the resulting image-filter reversed trigger satisfies the preset success-rate condition; if so, output the detection result that the deep neural network under examination contains an image-filter backdoor; otherwise, output the detection result that it contains no image-filter backdoor.
In this embodiment, within trigger reverse engineering, the reverse engineering of image-filter triggers comprises defining the general form in which an image-filter trigger transforms an image, and defining an optimization task to solve for the image-filter trigger.
The general form in which an image-filter trigger transforms an image is defined as follows: for a three-channel color image of dimension 3×H×W, concatenate two all-ones channels, namely a transparency (alpha) channel and a bias channel, to obtain a matrix of dimension 5×H×W, where H and W denote height and width respectively; define the filter trigger as a two-dimensional matrix of size 4×5 and multiply it with the 5×H×W matrix to obtain a 4×H×W matrix; treat the 4×H×W matrix as an RGBA image whose last channel is the alpha channel; and convert the RGBA image back to RGB, restoring the 4×H×W matrix to a 3×H×W three-channel color image, which is the image transformed by the filter trigger.
The optimization task defined to solve for the image-filter trigger requires that images with the filter trigger applied be misclassified by the neural network model under examination into the target label, and that the structural similarity between the filtered image and the original image be as high as possible.
In this embodiment, within trigger reverse engineering, the optimization task for pixel-patch triggers is: after the same trigger is stamped onto victim-label images, they are misclassified by the neural network model as the target label, while the trigger contains as few pixels as possible.
Referring to FIG. 3, for the reverse engineering of pixel-patch triggers, the specific implementation of Step 2 comprises the following sub-steps:
(1) Input the set of target-victim label pairs;
(2) Take the first target-victim label pair;
(3) Simulate the pixel-patch stamping process and solve the optimization task for this label pair;
(4) Generate the pixel-patch reversed trigger and compute the reversal success rate;
(5) Judge whether the success rate exceeds the threshold; if so, execute step (6) below; if not, execute step (9) below;
(6) Judge whether the trigger size is below the threshold; if so, execute step (7) below; if not, execute step (9) below;
(7) The deep neural network may contain a pixel-patch backdoor; further analyze the reversed trigger to judge whether a genuine backdoor trigger exists; if so, execute step (8) below; if not, execute step (9) below;
(8) The deep neural network contains a pixel-patch backdoor; output the label pair and the solved reversed trigger; the procedure ends;
(9) Judge whether all label pairs have been traversed; if so, execute step (10) below; if not, take the next target-victim label pair and return to step (3);
(10) Output the judgment that the deep neural network has no pixel-patch backdoor.
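The flow of steps (1)-(10) can be condensed into a short driver loop. In the sketch below, `reverse_patch_trigger` and `is_genuine_trigger` are hypothetical stand-ins for the optimization of Step 2 and the trigger analysis of Step 3, and the thresholds mirror the experimental values given further below.

```python
def detect_patch_backdoor(model, label_pairs, samples,
                          asr_threshold=0.99, size_threshold=350):
    # label_pairs: iterable of (target, victim_labels) from the static analysis.
    for target, victims in label_pairs:
        M, P, asr = reverse_patch_trigger(model, target, victims, samples)
        if asr > asr_threshold and M.abs().sum() < size_threshold:
            if is_genuine_trigger(model, M, P):   # shape / neuron checks of Step 3
                return target, victims, (M, P)    # backdoor found
    return None                                   # verdict: no pixel-patch backdoor
```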
Referring to FIG. 4, for the reverse engineering of image-filter triggers, the specific implementation of Step 2 comprises the following sub-steps:
(1) Input the set of target labels;
(2) Take the first target label;
(3) Simulate the image-filter application process and solve the optimization task for each other label;
(4) Generate the image-filter reversed triggers and compute the average reversal success rate;
(5) Judge whether the success rate exceeds the threshold; if so, execute step (6) below; if not, execute step (7) below;
(6) The deep neural network contains an image-filter backdoor; output the label and the solved reversed triggers; the procedure ends;
(7) Judge whether all target labels have been traversed; if so, execute step (8) below; if not, take the next target label and return to step (3);
(8) Output the judgment that the deep neural network has no image-filter backdoor.
In the experiments of the present invention, the image samples have dimensions (channels, height, width) = (3, 224, 224). When the victim labels are considered to be all labels in the model, 40 clean image samples are used, chosen randomly from all labels; when the victim labels are not all labels, 10 clean image samples are used, chosen from the victim labels. The success-rate threshold for pixel-patch triggers is 99% and the size threshold on ||M||1 is 350 (both may be preset by the defender according to defense requirements); a reversed trigger with an attack success rate above 99% and size below 350 is considered a suspicious pixel trigger. The average attack-success-rate threshold for image-filter triggers is 90% (likewise presettable by the defender); when the average attack success rate of the n−1 reversed triggers exceeds 90%, the model is considered to contain an image-filter backdoor.
In this embodiment, the general form in which a pixel-patch trigger is stamped onto an image is defined as:

$$\hat{x} = (1 - M) \circ x + M \circ P$$

where x denotes the original image, x̂ denotes the image with the trigger applied, and the operator ∘ denotes the Hadamard product. P is a color matrix, the matrix M controls the trigger's position and transparency, and the pixel-patch trigger is represented by (P, M).
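A direct rendering of this stamping form follows; treating M as a single-channel mask in [0, 1] that is broadcast over the RGB channels is an assumption of the sketch.

```python
def apply_patch(x, M, P):
    # x, P: (3, H, W) image and colour pattern; M: (1, H, W) mask in [0, 1]
    # controlling the trigger's position and transparency.
    return (1 - M) * x + M * P
```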
In this embodiment, the general form in which an image-filter trigger transforms an image is defined as:

$$\hat{x} = \mathrm{rgba2rgb}\big(T \cdot \mathrm{concatenate}(x, \mathbf{1}, \mathbf{1})\big)$$

where x denotes the original image, x̂ denotes the image with the trigger applied, the operator · denotes matrix multiplication, the operation concatenate denotes matrix concatenation along the channel dimension, and the operation rgba2rgb converts a four-channel RGBA image into the three-channel RGB format. The trigger T is a two-dimensional matrix of dimension 4×5.
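A sketch of this transform is given below. Compositing the RGBA result over a white background is one common rgba2rgb convention and is assumed here; the einsum contraction simply applies the 4×5 matrix T at every pixel.

```python
import torch

def apply_filter(x, T):
    # x: (3, H, W) image in [0, 1]; T: (4, 5) filter trigger.
    ones = torch.ones(2, x.size(1), x.size(2))
    x5 = torch.cat([x, ones], dim=0)           # append alpha and bias channels -> (5, H, W)
    rgba = torch.einsum("ij,jhw->ihw", T, x5)  # apply the trigger matrix -> (4, H, W)
    rgb, alpha = rgba[:3], rgba[3:].clamp(0, 1)
    return (alpha * rgb + (1 - alpha)).clamp(0, 1)  # rgba2rgb: composite over white
```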
In this embodiment, the optimization task for pixel-patch triggers is: after the same trigger is stamped onto victim-label images, they are misclassified by the neural network model as the target label, and the trigger contains as few pixels as possible. It is defined as:

$$\min_{M,\,P}\; \frac{1}{|X|}\sum_{x \in X} \mathcal{L}\big(f(\hat{x}),\, y_t\big) + \beta\,\|M\|_1$$

where f denotes the model under examination, whose output is the predicted probability; X denotes the set of clean images from the victim labels; ℒ denotes the cross-entropy loss function; yt denotes the target label; ||M||1 denotes the L1 norm of M; and β is a constraint parameter.
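A minimal sketch of this optimization follows, assuming `model` returns logits suitable for cross-entropy and `loader` yields batches of clean victim-label images; the sigmoid re-parameterization that keeps M in [0, 1] is an implementation assumption.

```python
import torch
import torch.nn.functional as F

def reverse_patch_trigger(model, target, loader, beta=1e-3, steps=500, lr=0.1):
    H = W = 224
    m_raw = torch.zeros(1, H, W, requires_grad=True)  # unconstrained mask parameters
    P = torch.rand(3, H, W, requires_grad=True)       # colour pattern
    opt = torch.optim.Adam([m_raw, P], lr=lr)
    for _ in range(steps):
        for x, _ in loader:
            M = torch.sigmoid(m_raw)                  # keep the mask in [0, 1]
            x_hat = (1 - M) * x + M * P.clamp(0, 1)   # stamp the candidate trigger
            y_t = torch.full((x.size(0),), target, dtype=torch.long)
            loss = F.cross_entropy(model(x_hat), y_t) + beta * M.abs().sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return torch.sigmoid(m_raw).detach(), P.detach().clamp(0, 1)
```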
In this embodiment, for image-filter triggers, one reversed trigger is solved for each model output label other than the target label, giving n−1 reversed triggers in total, and their average attack success rate is computed. The rationale is that in a backdoor model attacked with an image-filter trigger the internal decision space is corrupted, so that for the vast majority of labels a single linear transformation can create a path into the target label's decision region. The optimization task is: after victim-label images are transformed by the filter trigger, they are misclassified by the neural network model as the target label; and since filtered images should remain clearly recognizable, the structural similarity between a filtered image and its original should be as high as possible. It is defined as:

$$\min_{T}\; \frac{1}{|X|}\sum_{x \in X} \mathcal{L}\big(f(\hat{x}),\, y_t\big) - \gamma\,\mathrm{SSIM}(x, \hat{x})$$

where f denotes the model under examination, whose output is the predicted probability; X denotes the set of clean images from any one label other than the target label; ℒ denotes the cross-entropy loss function; yt denotes the target label; SSIM denotes the structural similarity index; and γ is a constraint parameter.
In this embodiment, the optimization tasks are solved with the Adam optimizer to obtain the reversed triggers.
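A corresponding sketch for the filter trigger, reusing `apply_filter` from above and the Adam optimizer as stated; the `pytorch_msssim` package for the differentiable SSIM term is an implementation assumption (any differentiable SSIM would serve).

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim

def reverse_filter_trigger(model, target, loader, gamma=0.1, steps=500, lr=0.01):
    # Start near the identity filter: RGB passthrough with fully opaque alpha.
    T = torch.zeros(4, 5)
    T[0, 0] = T[1, 1] = T[2, 2] = T[3, 3] = 1.0
    T.requires_grad_(True)
    opt = torch.optim.Adam([T], lr=lr)
    for _ in range(steps):
        for x, _ in loader:                  # clean images from one source label
            x_hat = torch.stack([apply_filter(img, T) for img in x])
            y_t = torch.full((x.size(0),), target, dtype=torch.long)
            loss = (F.cross_entropy(model(x_hat), y_t)
                    - gamma * ssim(x_hat, x, data_range=1.0))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return T.detach()
```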
Step 3: Analyze the shape attributes of the resulting pixel-patch trigger and the distribution of internal neurons it activates in the deep neural network under examination, and output the final detection result.
Referring to FIG. 5, in this embodiment the specific implementation of Step 3 comprises the following sub-steps:
Step 3.1: For a pixel-patch trigger whose success rate and size satisfy the preset conditions, analyze the sparsity of its pixel distribution. If the sparsity exceeds the preset threshold, the trigger is deemed an adversarial-perturbation false positive, and the detection result is output as: the deep neural network contains no pixel-patch backdoor. Otherwise, continue with Step 3.2 below.
In this embodiment, the pixel-distribution sparsity of a reversed trigger is measured by its grid coverage: the whole image is evenly divided into small blocks, each regarded as a grid cell, and the trigger's grid coverage is defined as the proportion of grid cells containing trigger pixels among all cells. When this proportion exceeds a certain threshold, the trigger is considered an adversarial perturbation rather than a backdoor trigger maliciously implanted by the attacker.
For example, in the experiments of the present invention, a 224×224 image is evenly divided into 1024 grid cells of size 7×7, and the grid-coverage threshold is 10% (in a concrete implementation it may be set by the defender according to defense requirements). When a reversed trigger's grid coverage exceeds 10%, it is regarded as an adversarial perturbation rather than a maliciously implanted backdoor trigger.
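A sketch of the grid-coverage computation under these settings:

```python
def grid_coverage(M, cell=7, eps=1e-3):
    # M: (1, H, W) reversed trigger mask; H and W must be divisible by `cell`.
    _, H, W = M.shape
    cells = M.reshape(1, H // cell, cell, W // cell, cell)
    occupied = (cells > eps).any(dim=2).any(dim=3)   # which cells contain trigger pixels
    return occupied.float().mean().item()            # fraction of occupied grid cells

# e.g. reject the trigger as an adversarial perturbation if grid_coverage(M) > 0.10
```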
Step 3.2: Analyze the similarity between the distribution of internal neurons the trigger activates in the deep neural network under examination and the distribution activated by clean samples of the target label. If the similarity exceeds the preset threshold, the trigger is deemed a natural-feature false positive, and the detection result is output as: the deep neural network contains no pixel-patch backdoor. Otherwise, continue with Step 3.3 below.
In this embodiment, the activated internal-neuron distribution is computed from the neurons of the model's penultimate layer, with ReLU as the neuron activation function.
In this embodiment, the similarity of activated-neuron distributions is computed as the similarity NBS between the sets of maximally activated neurons, defined as:

$$\mathrm{NBS} = \frac{1}{|\hat{X}|}\sum_{\hat{x} \in \hat{X}} \frac{|N_c \cap N_{\hat{x}}|}{r}$$

where Nc denotes the set of r neurons with the largest activations on clean samples of the target label, N_x̂ denotes the set of r neurons with the largest activations on a victim-label sample x̂ with the trigger applied, and X̂ denotes the set of all victim-label samples with the trigger applied.
For example, in the experiments of the present invention, r = 30; when the activated-neuron similarity exceeds 0.5 (presettable by the defender according to defense requirements in a concrete implementation), the trigger is regarded as a natural feature rather than a backdoor trigger maliciously implanted by the attacker.
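A sketch of this neuron-set comparison, assuming `penultimate` maps a batch of images to the pre-ReLU outputs of the model's penultimate layer, and that the clean reference set Nc is taken from activations averaged over the clean target-label samples.

```python
import torch

def nbs(penultimate, clean_target_x, triggered_x, r=30):
    # Reference set N_c: top-r neurons on clean target-label samples.
    acts_c = torch.relu(penultimate(clean_target_x)).mean(dim=0)
    N_c = set(acts_c.topk(r).indices.tolist())
    overlaps = []
    for x_hat in triggered_x:                # each triggered victim-label sample
        acts = torch.relu(penultimate(x_hat.unsqueeze(0))).squeeze(0)
        N_hat = set(acts.topk(r).indices.tolist())
        overlaps.append(len(N_c & N_hat) / r)
    return sum(overlaps) / len(overlaps)     # average top-r set overlap

# e.g. treat the trigger as a natural feature (false positive) if nbs(...) > 0.5
```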
Step 3.3: If a pixel-patch trigger whose success rate and size satisfy the preset conditions survives the two checks above, it is deemed a maliciously implanted backdoor trigger, and the detection result is output as: the deep neural network contains a pixel-patch backdoor, together with the target label, the victim labels, and the trigger pattern.
The present invention identifies suspicious target-victim label pairs through static weight analysis and reverse engineers triggers for the identified label pairs, recovering the attacker-defined trigger pattern and achieving efficient, stable, and scalable neural network backdoor detection.
It should be understood that the above description of preferred embodiments is relatively detailed and should therefore not be regarded as limiting the scope of patent protection of the present invention. Under the teaching of the present invention, a person of ordinary skill in the art may make substitutions or modifications without departing from the scope protected by the claims, all of which fall within the protection scope of the present invention; the claimed scope of protection shall be determined by the appended claims.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210177556.2A CN114638356B (en) | 2022-02-25 | 2022-02-25 | A static weight-guided deep neural network backdoor detection method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210177556.2A CN114638356B (en) | 2022-02-25 | 2022-02-25 | A static weight-guided deep neural network backdoor detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114638356A | 2022-06-17 |
CN114638356B CN114638356B (en) | 2024-06-28 |
Family
ID=81947670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210177556.2A Active CN114638356B (en) | 2022-02-25 | 2022-02-25 | A static weight-guided deep neural network backdoor detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114638356B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021042665A1 (en) * | 2019-09-04 | 2021-03-11 | 笵成科技南京有限公司 | Dnn-based method for protecting passport against fuzzy attack |
CN113269308A (en) * | 2021-05-31 | 2021-08-17 | 北京理工大学 | Clean label neural network back door implantation method based on universal countermeasure trigger |
Non-Patent Citations (1)
Title |
---|
莫禹钧; 黄捷; 潘愈嘉: "Design and Implementation of an Active Defense System Based on Network Security Situation Awareness" (基于网络安全态势感知的主动防御系统设计与实现), Journal of Medical Informatics (医学信息学杂志), no. 03, 25 March 2020 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115186816A (en) * | 2022-09-08 | 2022-10-14 | 南京逸智网络空间技术创新研究院有限公司 | Back door detection method based on decision shortcut search |
WO2024051183A1 (en) * | 2022-09-08 | 2024-03-14 | 南京逸智网络空间技术创新研究院有限公司 | Backdoor detection method based on decision shortcut search |
Also Published As
Publication number | Publication date |
---|---|
CN114638356B (en) | 2024-06-28 |
Similar Documents
Publication | Title
---|---
Zhang et al. | Adversarial examples: Opportunities and challenges
Bayar et al. | Constrained convolutional neural networks: A new approach towards general purpose image manipulation detection
WO2023070696A1 | Feature manipulation-based attack and defense method for continuous learning ability system
Gragnaniello et al. | Perceptual quality-preserving black-box attack against deep learning image classifiers
CN109543760B | Adversarial sample detection method based on image filter algorithm
Liu et al. | Adversaries or allies? Privacy and deep learning in big data era
CN112738014A | Industrial control flow abnormity detection method and system based on convolution time sequence network
Sun et al. | Can shape structure features improve model robustness under diverse adversarial settings?
Zanddizari et al. | Generating black-box adversarial examples in sparse domain
CN113420289B | Hidden poisoning attack defense method and device for deep learning model
CN114638356B | A static weight-guided deep neural network backdoor detection method and system
CN112613032B | Host intrusion detection method and device based on system call sequence
Liu et al. | Defend Against Adversarial Samples by Using Perceptual Hash.
CN118264448A | A deep learning network intrusion detection model for multi-classification
CN112948237B | Poisoning model testing method, device and system based on neural pathway
CN116305103A | Neural network model backdoor detection method based on confidence coefficient difference
Onoja et al. | Exploring the effectiveness and efficiency of LightGBM algorithm for windows malware detection
Kalidindi et al. | Feature selection and hybrid CNNF deep stacked autoencoder for botnet attack detection in IoT
Cui et al. | A cutting-edge video anomaly detection method using image quality assessment and attention mechanism-based deep learning
De Rose et al. | VINCENT: Cyber-threat detection through vision transformers and knowledge distillation
Han et al. | Research on APT attack detection technology based on DenseNet convolutional neural network
Ledda et al. | Adversarial attacks against uncertainty quantification
Santoso et al. | Malware detection using hybrid autoencoder approach for better security in educational institutions
Luo et al. | Defective Convolutional Networks
Li et al. | Precision strike: Precise backdoor attack with dynamic trigger
Legal Events
Date | Code | Title | Description |
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |