
CN110443162A - A kind of two-part training method for disguised face identification - Google Patents


Info

Publication number
CN110443162A
CN110443162A (application CN201910654611.0A)
Authority
CN
China
Prior art keywords
stage
network
training
layer
disguised
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910654611.0A
Other languages
Chinese (zh)
Other versions
CN110443162B (en)
Inventor
吴晓富
项阳
赵师亮
张索非
颜俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University filed Critical Nanjing Post and Telecommunication University
Priority to CN201910654611.0A priority Critical patent/CN110443162B/en
Publication of CN110443162A publication Critical patent/CN110443162A/en
Application granted granted Critical
Publication of CN110443162B publication Critical patent/CN110443162B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A two-stage training method for disguised face recognition, comprising the following steps: step S1, preprocess the data sets required for training to obtain the sets Set_F and Set_S; step S2, first stage: use Set_F as the training set and train the network with the ArcFace loss function; step S3, remove the last fully connected layer of the network; step S4, second stage: use Set_S as the training set and train the network with the ArcFace loss function. The invention uses a small amount of disguised-face data to shift the model's working domain from generic face recognition to disguised face recognition, and achieves good recognition results on the DFW test benchmark.

Description

A two-stage training method for disguised face recognition

Technical Field

The invention belongs to the technical field of face recognition, and in particular relates to a two-stage training method for disguised face recognition.

Background

Face recognition based on convolutional neural networks has achieved great success in recent years. As one of the more prominent results of this research, recognizing faces with feature vectors obtained through a network mapping works very well and is widely regarded as the current state of the art. As more advanced network architectures, higher-quality datasets, and more refined loss functions keep appearing, the resulting feature vectors become increasingly discriminative: the distance between feature vectors of different people grows, while the distance between feature vectors of the same person shrinks.

Despite the impressive achievements of face recognition, disguised face recognition remains a challenging problem. Once a face is altered by makeup or covered by a hat, a mask, or similar items, recognition becomes far more difficult. On top of that intrinsic difficulty, the overall quality of the datasets that deep learning depends on is also unsatisfactory, which raises the difficulty further. Whereas the general face recognition field keeps producing high-quality results such as FaceNet, SphereFace, and ArcFace, results in disguised face recognition are much scarcer; a relatively recent one is MiRA-Face, obtained on the DFW disguised-face dataset. It uses two-stage training: a network is first trained with a general face recognition method, and PCA is then applied, using the training set provided by DFW, to reduce the dimensionality of the feature vectors and thereby capture some information about disguise. MiRA-Face has two shortcomings: (1) its first stage adopts the method proposed by CosFace, which is no longer the best choice; (2) PCA extracts relatively little information. Together these leave room for improving the algorithm's performance.

Summary of the Invention

The technical problem to be solved by the present invention is to overcome the deficiencies of the prior art and to provide a two-stage training method for disguised face recognition: a base convolutional neural network is first trained with ArcFace, and a Joint loss function is then used to minimize the intra-class distance of the DFW training samples and enlarge the inter-class distance, yielding a good disguised face recognition effect.

The present invention provides a two-stage training method for disguised face recognition, comprising the following steps:

Step S1: preprocess the data sets required for training to obtain the sets Set_F and Set_S;

Step S2, first stage: use Set_F as the training set and train the network with the ArcFace loss function;

Step S3: remove the last fully connected layer of the network;

Step S4, second stage: use Set_S as the training set and train the network with the ArcFace loss function.

As a further technical solution of the present invention, the model used in the first stage consists of a ResNet50IR residual network; an output module made up of a BatchNorm layer, a Dropout layer, a fully connected layer and another BatchNorm layer; a classification module made up of a fully connected layer and a Softmax classification layer; and the ArcFace loss function. The ResNet50IR residual network together with the output module serves as the backbone network for feature extraction.

Further, the ResNet50IR residual network builds on the 50-layer ResNet with a residual unit that is a 6-layer composite of BatchNorm-Convolution-BatchNorm-PReLU-Convolution-BatchNorm; the output size is determined by the stride of the fifth layer (the second convolution). With stride 1 the output has the same size as the input; with stride 2 the output is half the size of the input. The ResNet50IR residual network consists of an Input part and 4 convolution modules containing 3, 4, 14 and 3 residual units respectively, where the first residual unit of each module reduces the output dimension. The Dropout layer of the output module uses parameter 0.5, the fully connected layer outputs a 512-dimensional vector, and a final BatchNorm layer yields the feature vector v.

Further, the feature vector v must be normalized before entering the fully connected layer, so that ||v|| = 1. The dimension of the fully connected layer's weights depends on the number of label classes in the training set: with P classes, the weight matrix W has dimension D*P; when the MS-Celeb-1M dataset is used as the training set, P is 85K and D is the length of the feature vector v, here 512. With the bias b of the fully connected layer set to zero and every column of W normalized, the i-th component of the layer's output is v·w_i = cos θ_i, where w_i is the i-th column of the weight matrix W.

Further, the network is trained with the ArcFace loss function

$$L = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{s\cos(\theta_{y_i,i}+m)}}{e^{s\cos(\theta_{y_i,i}+m)} + \sum_{j\neq y_i} e^{s\cos\theta_{j,i}}}$$

where the hyperparameters s and m are 64 and 0.5 respectively, θ_{j,i} is the angle between the feature vector v_i produced by the i-th input and the weight vector w_j, and y_i is the correct label of v_i.

Further, the second-stage model consists of a feature extraction network and the Joint loss function, where the feature extraction network is the backbone network of the first stage and the Joint loss function is

$$L = \sum_{i=1}^{N}\Big[\big\langle f(x_i^a), f(x_i^n)\big\rangle - \big\langle f(x_i^a), f(x_i^p)\big\rangle + \alpha\Big]_{+} + \lambda\sum_{i=1}^{N}\Big(1 - \big\langle f(x_i^a), f(x_i^p)\big\rangle\Big)$$

where the first part is the Triplet loss and the second part the Pair loss; f(x_i) is the normalized feature vector v_i output by the feature extraction network, ⟨f(x_1), f(x_2)⟩ is the inner product of the feature vectors, i.e. the cosine of the angle between v_1 and v_2, and the parameters α and λ are both positive.

Further, the second-stage training set is the training set of the DFW dataset, and triplets (x_i^a, x_i^p, x_i^n) must be paired before training: first select a Normal image as x_i^a, then select a Validation or Disguised image from the same directory as the Normal image as x_i^p, and finally select an Impersonator image from the same directory as x_i^n.

Further, the images in one directory of the DFW dataset are divided into Normal, Validation, Disguised and Impersonator; Normal, Validation and Disguised show the same person, while an Impersonator is a different, similar-looking person.

The invention uses a small amount of disguised-face data to shift the model's working domain from general face recognition to disguised face recognition, and achieves a good recognition effect on the DFW test benchmark.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the training procedure of the present invention;

Figure 2 is a schematic diagram of the backbone network structure of the present invention;

Figure 3 is a structural diagram of the residual unit of the present invention;

Figure 4 compares different loss functions in stage 2 of the present invention;

Figure 5 shows examples from the DFW dataset;

Figure 6 compares DFW test results.

Detailed Description

Referring to Figure 1, the overall procedure of this embodiment is divided into two stages. The network model used in stage 1 consists of the following four parts: (1) the ResNet50IR residual network; (2) an output module made up of a BatchNorm layer, a Dropout layer, a fully connected layer and another BatchNorm layer; (3) a classification module made up of a fully connected layer and a Softmax classification layer; (4) the ArcFace loss function. Parts (1) and (2) serve as the backbone network for feature extraction; Figure 2 gives the concrete network structure and the dimension of a single output. Stage 1 is analyzed in detail as follows:

(1) ResNet50IR uses the improved residual unit shown in Figure 3 on top of the conventional 50-layer ResNet. The unit is a 6-layer composite of BatchNorm-Convolution-BatchNorm-PReLU-Convolution-BatchNorm. The output size of the whole residual unit is controlled by the stride of the fifth layer (the second convolution): with stride 1 the output has the same size as the input, and with stride 2 the output is half the size of the input.

(2) ResNet50IR consists of 5 parts: an Input part and 4 convolution modules with 3, 4, 14 and 3 residual units respectively. The first residual unit in each module reduces the output dimension (the stride of the second convolutional layer in that unit is set to 2). Figure 2 gives the output dimension of each module for a single input of dimension [112*112*3].

(3) The parameter of the Dropout layer in the output module is set to 0.5, i.e. a random half of the units have their outputs zeroed in that layer, which increases the robustness of the network. The fully connected layer outputs a 512-dimensional vector, and a final BatchNorm layer yields the feature vector v.
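As a concrete illustration of items (1) to (3), the following is a minimal PyTorch sketch of the residual unit, the four modules with 3, 4, 14 and 3 units, and the output module. The channel widths (64/128/256/512), the Input convolution, and the 1x1 shortcut projection are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class IRBlock(nn.Module):
    """BN-Conv-BN-PReLU-Conv-BN residual unit; the stride of the second
    convolution (layer 5 of the composite) controls the output size."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.Conv2d(in_ch, out_ch, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.PReLU(out_ch),
            nn.Conv2d(out_ch, out_ch, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # assumed 1x1 projection so the skip connection matches the new shape
        self.shortcut = nn.Identity() if stride == 1 and in_ch == out_ch else \
            nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                          nn.BatchNorm2d(out_ch))

    def forward(self, x):
        return self.body(x) + self.shortcut(x)

def stage(in_ch, out_ch, n):
    # the first unit of each module halves the spatial size (stride 2)
    return nn.Sequential(IRBlock(in_ch, out_ch, 2),
                         *[IRBlock(out_ch, out_ch) for _ in range(n - 1)])

backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, 1, 1, bias=False),    # assumed Input part
    nn.BatchNorm2d(64), nn.PReLU(64),
    stage(64, 64, 3), stage(64, 128, 4),
    stage(128, 256, 14), stage(256, 512, 3),  # 112 -> 56 -> 28 -> 14 -> 7
    nn.BatchNorm2d(512), nn.Dropout(0.5), nn.Flatten(),
    nn.Linear(512 * 7 * 7, 512),              # output module: FC to 512-d
    nn.BatchNorm1d(512),                      # final BatchNorm gives v
)

v = backbone(torch.randn(2, 3, 112, 112))     # -> [2, 512] feature vectors
```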

(4) Before the feature vector v is fed into the fully connected layer, it is normalized so that ||v|| = 1. The dimension of the fully connected layer's weights depends on the number of label classes in the training set: with P classes, the weight matrix W has dimension D*P (D rows, P columns). When the MS-Celeb-1M dataset is used as the training set, P is 85K and D is the length of the feature vector v, here 512. With the bias b of the fully connected layer set to zero and every column of W normalized, writing the i-th column of W as w_i, the i-th component of the layer's output is:

$$v \cdot w_i = \|v\| \cdot \|w_i\| \cdot \cos\theta_i = \cos\theta_i \qquad (1.1)$$

where θ_i is the angle between the two vectors v and w_i.
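Under these definitions, the normalized, bias-free layer of equation (1.1) can be sketched as follows (shapes as in the text; the random W is only a stand-in):

```python
import torch
import torch.nn.functional as F

D, P = 512, 85000                      # feature length, class count (MS-Celeb-1M)
W = torch.randn(D, P)                  # weight matrix, D rows by P columns

def cosine_logits(v, W):
    v = F.normalize(v, dim=1)          # enforce ||v|| = 1
    W = F.normalize(W, dim=0)          # normalize every column w_i
    return v @ W                       # i-th entry equals cos(theta_i), eq. (1.1)

logits = cosine_logits(torch.randn(8, D), W)   # batch of 8 -> [8, P] cosines
```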

(5) The whole network is trained with the loss function proposed by ArcFace:

$$L = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{s\cos(\theta_{y_i,i}+m)}}{e^{s\cos(\theta_{y_i,i}+m)} + \sum_{j\neq y_i} e^{s\cos\theta_{j,i}}} \qquad (1.2)$$

The hyperparameters s and m are set to 64 and 0.5 respectively; θ_{j,i} is the angle between the feature vector v_i produced by the i-th input and the weight vector w_j, and y_i is the correct label of v_i. Compared with the commonly used Softmax loss function shown in formula (1.3),

$$L = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{v_i \cdot w_{y_i} + b_{y_i}}}{\sum_{j=1}^{P} e^{v_i \cdot w_j + b_j}} \qquad (1.3)$$

formula (1.2) makes the following improvements:

The bias b of the fully connected layer is set to 0, and the feature vector v and the weight vectors w_i are normalized, so that the inner product of v and w_i equals the cosine of the angle between the two vectors, see formula (1.1). Viewing formula (1.2) as a function of the angles θ_{j,i} and taking its gradient shows the following:

the loss decreases fastest when θ_{y_i,i} decreases and θ_{j,i} (j ≠ y_i) increases, so training pushes the feature vector v_i as close as possible to the weight vector w_{y_i} representing its label class while moving it away from the remaining weight vectors w_j, j ≠ y_i, that represent other classes. Feature vectors with the same label therefore cluster in the same region as training proceeds, while feature vectors with different labels are pushed apart in angle; that is, the intra-class distance shrinks and the inter-class distance grows.

Using cos(θ_{y_i,i} + m) in place of cos(θ_{y_i,i}) further shrinks the intra-class distance of the feature vectors: even when θ_{y_i,i} is already small, cos(θ_{y_i,i} + m) remains noticeably below 1, so the loss in formula (1.2) still keeps a relatively large value, and driving it down further requires θ_{y_i,i} to shrink further.

Besides normalizing the feature vector v and the weight vectors w_i, a hyperparameter s is introduced. A larger s makes training easier and the network converge faster; it is usually set to 64. Without s, simply substituting cos(θ_{y_i,i} + m) for cos(θ_{y_i,i}) often makes the model hard to converge in practice. With a large s, the loss on misclassified samples is larger than it would be without s, forcing iteration in the correct direction, while on correctly classified samples the loss is smaller than before, making training converge more easily.
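The following is a minimal sketch of training signal (1.2); the clamp before acos is an assumed numerical-stability detail, and the stand-in cosines merely illustrate the call:

```python
import torch
import torch.nn.functional as F

def arcface_loss(cosines, labels, s=64.0, m=0.5):
    # recover the angles, then add the margin m only at the correct label y_i
    theta = torch.acos(cosines.clamp(-1 + 1e-7, 1 - 1e-7))
    one_hot = F.one_hot(labels, cosines.size(1)).bool()
    theta = torch.where(one_hot, theta + m, theta)
    # rescale by s and apply the usual softmax cross-entropy, giving (1.2)
    return F.cross_entropy(s * torch.cos(theta), labels)

# usage with cosine logits like those from the sketch above
cosines = torch.randn(8, 100).tanh()          # stand-in cosines in (-1, 1)
loss = arcface_loss(cosines, torch.randint(0, 100, (8,)))
```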

The first stage is usually trained on a large face dataset such as VGG2 or MS-Celeb-1M; its purpose is to obtain a feature extraction network (i.e. the network with the final fully connected part removed) that works well on non-disguised faces.

The network model used in stage 2 consists of (1) the feature extraction network and (2) the Joint loss function, where the feature extraction network is the backbone network obtained after stage 1. Stage 2 is analyzed in detail as follows:

(1) The Joint loss function is given by the following formula:

$$L = \sum_{i=1}^{N}\Big[\big\langle f(x_i^a), f(x_i^n)\big\rangle - \big\langle f(x_i^a), f(x_i^p)\big\rangle + \alpha\Big]_{+} + \lambda\sum_{i=1}^{N}\Big(1 - \big\langle f(x_i^a), f(x_i^p)\big\rangle\Big) \qquad (1.5)$$

Formula (1.5) clearly consists of two parts: the first is called the Triplet loss and the second the Pair loss. Here f(x_i) denotes the normalized feature vector v_i output by the feature extraction network, and ⟨f(x_1), f(x_2)⟩ denotes the inner product of the feature vectors, i.e. the cosine of the angle between the two vectors v_1 and v_2. The parameters α and λ are both positive. Training with this loss requires arranging the images into triplets (x_i^a, x_i^p, x_i^n) as input; the pair (x_i^a, x_i^p), called a positive sample pair, must share the same label, while (x_i^a, x_i^n), called a negative sample pair, must have different labels. The Triplet loss forces the distance within a positive pair to be smaller than the distance within a negative pair, with the margin controlled by the parameter α, usually around 0.3. The Pair loss constrains the distance within positive pairs and thus further limits the intra-class distance; it prevents the situation, possible with the Triplet loss alone, where the inter-class distance is controlled but the intra-class distance is not reduced. Figure 4 compares the distribution of angles within positive pairs after training with the Triplet loss alone and with the Joint loss: with the Pair loss added, the angle distribution of positive pairs is clearly better. The parameter λ is usually set to 0.3 or 0.4.
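A minimal sketch of the Joint loss as reconstructed in formula (1.5), with α and λ at the 0.3 values suggested above:

```python
import torch
import torch.nn.functional as F

def joint_loss(a, p, n, alpha=0.3, lam=0.3):
    # normalize features so inner products are cosines of the pair angles
    a, p, n = (F.normalize(x, dim=1) for x in (a, p, n))
    pos = (a * p).sum(1)                    # <f(x_a), f(x_p)>, positive pair
    neg = (a * n).sum(1)                    # <f(x_a), f(x_n)>, negative pair
    triplet = F.relu(neg - pos + alpha)     # keep positives closer than negatives
    pair = lam * (1.0 - pos)                # pull positive pairs together
    return (triplet + pair).mean()

# usage: three batches of backbone features (anchor, positive, negative)
loss = joint_loss(torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 512))
```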

(2) The training set used in stage 2 is the training set of the DFW dataset, and triplets must be paired before training: 1) select a Normal image as x_i^a; 2) select a Validation or Disguised image from the same directory as x_i^p; 3) select an Impersonator image from the same directory as x_i^n. (Note: the images in one directory of the DFW dataset come in 4 kinds: Normal, Validation, Disguised and Impersonator. Normal, Validation and Disguised show the same person, while an Impersonator is a different, similar-looking person; see Figure 5 for examples.)
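As a sketch of this pairing step, the per-identity folder layout and the filename tags below are hypothetical, not DFW's actual naming scheme:

```python
import os
import random

def sample_triplet(identity_dir):
    files = os.listdir(identity_dir)
    def pick(tags):
        return random.choice(
            [f for f in files if any(t in f.lower() for t in tags)])
    anchor   = pick(["normal"])                     # x_a: a Normal image
    positive = pick(["validation", "disguised"])    # x_p: same person
    negative = pick(["impersonator"])               # x_n: look-alike impostor
    return anchor, positive, negative
```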

Through stage-1 training the present invention obtains a feature extraction network for general faces. Stage 2 then addresses the fact that today's disguised-face datasets are generally small: it performs triplet-paired training in the spirit of the Triplet loss and uses the Pair loss to make up for the Triplet loss's shortcomings, completing the migration of the network's working domain to disguised faces. Together, the two treatments give the present invention a considerable improvement in results.

The model of the present invention (stage 1 trained on the MS-Celeb-1M dataset, stage 2 on the DFW training set) was tested on the disguised face recognition test set provided by DFW. At FAR of 1% and 0.1%, the GAR values are: 1) protocol-1: 97.98% and 60.23%; 2) protocol-2: 90.37% and 82.84%; 3) protocol-3: 90.4% and 81.18%. GAR and FAR are explained below:

The DFW test dataset provides a batch of face image pairs. In some of the pairs the two images show the same person; these pairs serve as positive samples, while the remaining pairs belong to different people. The similarity of two images is measured by the distance between their feature vectors, but a single distance alone clearly cannot decide whether the two images show the same person. The common approach is to introduce a threshold: when the distance is below the threshold, the pair is judged positive, otherwise negative.

Given a threshold, the values of TP, TN, FP and FN can be computed:

TP: the number of positive samples correctly identified by the algorithm;

TN: the number of negative samples correctly identified by the algorithm;

FP: the number of negative samples identified as positive;

FN: the number of positive samples identified as negative.

From these values, GAR and FAR are obtained as:

$$\mathrm{GAR} = \frac{TP}{TP + FN}, \qquad \mathrm{FAR} = \frac{FP}{FP + TN}$$
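A sketch of this evaluation, assuming cosine similarity as the pair score (higher means more similar); it sweeps thresholds and reports the best GAR whose FAR stays within the target:

```python
import numpy as np

def gar_at_far(scores, is_same, target_far):
    pos, neg = scores[is_same], scores[~is_same]
    best_gar = 0.0
    for t in np.unique(scores):
        far = (neg >= t).mean()                      # FP / (FP + TN)
        if far <= target_far:
            best_gar = max(best_gar, (pos >= t).mean())  # TP / (TP + FN)
    return best_gar

scores = np.random.uniform(-1, 1, 1000)   # one similarity score per image pair
is_same = np.random.rand(1000) > 0.5      # ground-truth same/different labels
print(gar_at_far(scores, is_same, 0.01))  # GAR at FAR = 1%
```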

Figure 6 compares the model of the present invention with other models; overall, the present invention performs better than most existing algorithms. (Note: the DFW dataset provides three different sets of positive and negative sample pairs, protocol-1, protocol-2 and protocol-3, where protocol-3 is the combination of the first two.)

The foregoing has shown and described the basic principles, main features and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited by the specific embodiments above; the embodiments and the description merely illustrate the principles of the invention, and various changes and improvements can be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of protection of the present invention is defined by the claims and their equivalents.

Claims (8)

1. A two-stage training method for disguised face recognition, characterized by comprising the following steps: step S1, preprocess the data sets required for training to obtain the sets Set_F and Set_S; step S2, first stage: use Set_F as the training set and train the network with the ArcFace loss function; step S3, remove the last fully connected layer of the network; step S4, second stage: use Set_S as the training set and train the network with the ArcFace loss function.

2. The two-stage training method for disguised face recognition according to claim 1, characterized in that the model used in the first stage consists of a ResNet50IR residual network; an output module made up of a BatchNorm layer, a Dropout layer, a fully connected layer and another BatchNorm layer; a classification module made up of a fully connected layer and a Softmax classification layer; and the ArcFace loss function, the ResNet50IR residual network together with the output module serving as the backbone network for feature extraction.

3. The two-stage training method for disguised face recognition according to claim 2, characterized in that the ResNet50IR residual network builds on the 50-layer ResNet with a residual unit that is a 6-layer composite of BatchNorm-Convolution-BatchNorm-PReLU-Convolution-BatchNorm, the output size being determined by the stride of the fifth (convolutional) layer: with stride 1 the output has the same size as the input, and with stride 2 the output is half the size of the input; the ResNet50IR residual network consists of an Input part and 4 convolution modules with 3, 4, 14 and 3 residual units respectively, the first residual unit of each module reducing the output dimension; the Dropout layer of the output module uses parameter 0.5, the fully connected layer outputs a 512-dimensional vector, and a final BatchNorm layer yields the feature vector v.

4. The two-stage training method for disguised face recognition according to claim 3, characterized in that the feature vector v is normalized before entering the fully connected layer so that ||v|| = 1; the dimension of the fully connected layer's weights depends on the number of label classes in the training set: with P classes the weight matrix W has dimension D*P, and when the MS-Celeb-1M dataset is used as the training set, P is 85K and D is the length of the feature vector v, namely 512; with the bias b of the fully connected layer set to zero and every column of W normalized, the i-th component of the layer's output is v·w_i = cos θ_i, where w_i is the i-th column of the weight matrix W.

5. The two-stage training method for disguised face recognition according to claim 1 or 2, characterized in that the network is trained with the ArcFace loss function

$$L = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{s\cos(\theta_{y_i,i}+m)}}{e^{s\cos(\theta_{y_i,i}+m)} + \sum_{j\neq y_i} e^{s\cos\theta_{j,i}}}$$

where the hyperparameters s and m are 64 and 0.5 respectively, θ_{j,i} is the angle between the feature vector v_i produced by the i-th input and the weight vector w_j, and y_i is the correct label of v_i.

6. The two-stage training method for disguised face recognition according to claim 1, characterized in that the second-stage model consists of a feature extraction network and the Joint loss function, the feature extraction network being the backbone network of the first stage and the Joint loss function being

$$L = \sum_{i=1}^{N}\Big[\big\langle f(x_i^a), f(x_i^n)\big\rangle - \big\langle f(x_i^a), f(x_i^p)\big\rangle + \alpha\Big]_{+} + \lambda\sum_{i=1}^{N}\Big(1 - \big\langle f(x_i^a), f(x_i^p)\big\rangle\Big)$$

where the first part is the Triplet loss and the second part the Pair loss; f(x_i) is the normalized feature vector v_i output by the feature extraction network, ⟨f(x_1), f(x_2)⟩ is the inner product of the feature vectors, i.e. the cosine of the angle between v_1 and v_2, and the parameters α and λ are both positive.

7. The two-stage training method for disguised face recognition according to claim 1, characterized in that the second-stage training set is the training set of the DFW dataset and triplets must be paired before training: first select a Normal image as x_i^a, then select a Validation or Disguised image from the same directory as x_i^p, and finally select an Impersonator image from the same directory as x_i^n.

8. The two-stage training method for disguised face recognition according to claim 7, characterized in that the images in one directory of the DFW dataset are divided into Normal, Validation, Disguised and Impersonator, where Normal, Validation and Disguised show the same person and an Impersonator is a different, similar-looking person.
CN201910654611.0A 2019-07-19 2019-07-19 Two-stage training method for disguised face recognition Active CN110443162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910654611.0A CN110443162B (en) 2019-07-19 2019-07-19 Two-stage training method for disguised face recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910654611.0A CN110443162B (en) 2019-07-19 2019-07-19 Two-stage training method for disguised face recognition

Publications (2)

Publication Number Publication Date
CN110443162A true CN110443162A (en) 2019-11-12
CN110443162B CN110443162B (en) 2022-08-30

Family

ID=68430896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910654611.0A Active CN110443162B (en) 2019-07-19 2019-07-19 Two-stage training method for disguised face recognition

Country Status (1)

Country Link
CN (1) CN110443162B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117744A (en) * 2018-07-20 2019-01-01 杭州电子科技大学 A kind of twin neural network training method for face verification
CN109359541A (en) * 2018-09-17 2019-02-19 南京邮电大学 A sketch face recognition method based on deep transfer learning
CN109214360A (en) * 2018-10-15 2019-01-15 北京亮亮视野科技有限公司 A kind of construction method of the human face recognition model based on ParaSoftMax loss function and application
CN109815801A (en) * 2018-12-18 2019-05-28 北京英索科技发展有限公司 Face identification method and device based on deep learning

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401193A (en) * 2020-03-10 2020-07-10 海尔优家智能科技(北京)有限公司 Method and device for obtaining expression recognition model and expression recognition method and device
CN111401193B (en) * 2020-03-10 2023-11-28 海尔优家智能科技(北京)有限公司 Method and device for acquiring expression recognition model, and expression recognition method and device
CN111582354A (en) * 2020-04-30 2020-08-25 中国平安财产保险股份有限公司 Picture identification method, device, equipment and storage medium
CN111860266A (en) * 2020-07-13 2020-10-30 南京理工大学 Disguised face recognition method based on depth features
CN111860266B (en) * 2020-07-13 2022-09-30 南京理工大学 Disguised face recognition method based on deep features
CN112101192A (en) * 2020-09-11 2020-12-18 中国平安人寿保险股份有限公司 Artificial intelligence-based camouflage detection method, device, equipment and medium
CN113205058A (en) * 2021-05-18 2021-08-03 中国科学院计算技术研究所厦门数据智能研究院 Face recognition method for preventing non-living attack
CN113361346A (en) * 2021-05-25 2021-09-07 天津大学 Scale parameter self-adaptive face recognition method for replacing adjustment parameters
CN113361346B (en) * 2021-05-25 2022-12-23 天津大学 Scale parameter self-adaptive face recognition method for replacing adjustment parameters
CN113780461A (en) * 2021-09-23 2021-12-10 中国人民解放军国防科技大学 Robust neural network training method based on feature matching
CN113780461B (en) * 2021-09-23 2022-08-05 中国人民解放军国防科技大学 A Robust Neural Network Training Method Based on Feature Matching

Also Published As

Publication number Publication date
CN110443162B (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN110443162B (en) Two-stage training method for disguised face recognition
CN111507311B (en) Video character recognition method based on multi-mode feature fusion depth network
CN111695469B (en) Hyperspectral image classification method of light-weight depth separable convolution feature fusion network
CN109165566B (en) A Convolutional Neural Network Training Method for Face Recognition Based on a Novel Loss Function
CN108596039B (en) Bimodal emotion recognition method and system based on 3D convolutional neural network
CN104835507B (en) A kind of fusion of multi-mode emotion information and recognition methods gone here and there and combined
CN109063666A (en) The lightweight face identification method and system of convolution are separated based on depth
CN108427921A (en) A kind of face identification method based on convolutional neural networks
CN108182441A (en) Parallel multichannel convolutive neural network, construction method and image characteristic extracting method
CN111489405B (en) Face Sketch Synthesis System Based on Conditional Augmented Generative Adversarial Networks
CN107463920A (en) A kind of face identification method for eliminating partial occlusion thing and influenceing
CN106203533A (en) The degree of depth based on combined training study face verification method
CN111128242B (en) Multi-mode emotion information fusion and identification method based on double-depth network
CN112434655A (en) Gait recognition method based on adaptive confidence map convolution network
CN108985457B (en) Deep neural network structure design method inspired by optimization algorithm
CN116645716A (en) Expression Recognition Method Based on Local Features and Global Features
CN108921019A (en) A kind of gait recognition method based on GEI and TripletLoss-DenseNet
CN106960185B (en) Multi-pose Face Recognition Method Based on Linear Discriminative Deep Belief Network
CN110880010A (en) Visual SLAM closed loop detection algorithm based on convolutional neural network
CN110516533A (en) A Pedestrian Re-Identification Method Based on Depth Metric
CN114429646A (en) Gait recognition method based on deep self-attention transformation network
CN110059593A (en) A kind of human facial expression recognition method based on feedback convolutional neural networks
CN113033345B (en) V2V video face recognition method based on common feature subspace
CN110633631A (en) A Pedestrian Re-Identification Method Based on Component Power Sets and Multi-Scale Features
CN116453548A (en) A Speech Emotion Recognition Method Based on Attention MCNN Combined with Gender Information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant