CN111582057A - Face verification method based on local receptive field

Info

Publication number: CN111582057A
Application number: CN202010310755.7A
Authority: CN
Inventors: 刘昊; 花硕硕; 庞伟; 陆生礼
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2020-04-20
Filing date: 2020-04-20
Publication date: 2020-08-25
Anticipated expiration: 2040-04-20
Also published as: CN111582057B

Abstract

The invention discloses a face verification method based on local receptive fields and belongs to the technical field of computing, calculating or counting. The method comprises the following steps: establishing an external data set and performing data augmentation on its samples; establishing a convolutional neural network whose input is a color picture and whose output is a feature vector for the face region in the picture together with the predicted bounding-box coordinates of the face, where at test time the feature vector of the corresponding region is output according to the position of the predicted box in the picture; and testing the pre-trained convolutional neural network on the test set and fine-tuning it according to the test results. Relying on the translation invariance of deep neural networks, the invention extracts the features of the face region with a single network, so that the receptive field of the feature vector contains just the face. This reduces the noise introduced by background information, preserves the accuracy of face verification, improves the parallelism of network computation and greatly simplifies the training process.

Description

A face verification method based on local receptive field

Technical Field

The invention discloses a face verification method based on local receptive fields. It relates to computer vision technology for face verification and belongs to the technical field of computing, calculating or counting.

Background

Face recognition is an important part of computer vision; its goal is to correctly identify the person shown in a face photo. The current mainstream approach classifies face pictures with a classification neural network. However, a classification network must be designed for a fixed set of categories, and new identities cannot be added after training is complete, which makes it inflexible in practice. We therefore perform face recognition by face verification: a neural network extracts features from a face picture to generate a feature vector for the face, the Euclidean distance between the feature vectors of different faces is computed, and a threshold on that distance decides whether the two faces belong to the same person. To add a new identity, it is only necessary to generate its face features with the network and store them; identification is then performed by comparing the stored features with the feature vector of a newly input sample.

However, current face verification algorithms consist of two steps. The picture first passes through a face detection network to obtain the position coordinates of the face, and the face is cropped out of the picture using these coordinates in order to reduce the noise introduced by the background. The cropped face picture is then fed into a face verification network for the subsequent computation. In other words, face verification requires two networks, so a face detection network and a face verification network must be trained separately, which is inconvenient. Because the two networks are separate, parallelism is also reduced in actual use. Such a scheme is essentially a two-stage face verification algorithm.

A convolutional neural network has translation invariance, so the features of a specified region of a picture can be obtained simply by adjusting the convolution kernel and stride; that is, the receptive field of each such feature corresponds to a specific region of the picture. Using this property, features can be obtained for the face region of the original picture alone. This application proposes a face verification method based on local receptive fields in order to improve the parallelism of the network and reduce the complexity of the training procedure.

Summary of the Invention

The purpose of the invention is to address the deficiencies of the background art described above by providing a face verification method based on local receptive fields. The method uses the receptive fields of the network outputs to obtain the feature vector of the face region, so that a single convolutional neural network performs both face detection and face verification. This improves the parallelism of the detection and verification operations while reducing the noise introduced by the picture background, and solves the technical problems of low parallelism and complex training steps in existing two-stage face verification methods.

To achieve the above purpose, the invention adopts the following technical solution:

A face verification method based on local receptive fields, comprising the following steps:

Step 1: divide a public face verification data set, or a self-collected data set, into a training set, a validation set and a test set;

Step 2: perform data augmentation on the samples in the data set, using at least one of the following: translation, scaling, rotation, flipping;

Step 3: establish a convolutional neural network for face verification based on local receptive fields. The input of the network is a color picture. During training, the outputs are the identity category of the face in the picture and the predicted bounding-box coordinates of the face, and the loss function is softmax loss. During testing, the network outputs the feature vector of the region corresponding to the position of the predicted box in the image;

Step 4: test the convolutional neural network pre-trained in step 3 on the test set, and fine-tune the network according to the test results.

In step 1 above, CASIA-WebFace is used as the face verification training set and LFW as the test set.

In step 1 above, all pictures in the data set are scaled to the input size of the convolutional neural network and then normalized.

In step 3 above, the convolutional neural network for face verification based on local receptive fields consists of two parts: one detects the face region in the input picture, and the other extracts the feature vector of the face region. During training, the network extracts face features for different regions of the input picture and then selects the feature vector corresponding to the face region according to the face detection result. The selected feature vector is passed through a fully connected layer for classification training, and the network outputs the identity category of the face and the predicted bounding-box coordinates of the face. Softmax loss is used as the loss function and the Adam optimizer is used for training. When the accuracy no longer increases, the model is saved, giving the trained convolutional neural network.

In step 4 above, the fully connected layer is removed when the network is tested and used. A test picture is input into the network, and the face feature vector output directly by the network is the identity feature of the face in that picture. The Euclidean distance between the feature vectors of different test pictures is then computed: if the distance is less than a given threshold, the two face pictures are considered to show the same person; otherwise they show different people.
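The distance test described in step 4 can be sketched in a few lines of NumPy. This is only an illustration: the function names, the 128-dimensional toy vectors and the threshold value of 1.0 are assumptions made here, not values given in the patent, which leaves the threshold unspecified.

```python
import numpy as np

def euclidean_distance(feat_a, feat_b):
    """Euclidean (L2) distance between two face feature vectors."""
    a = np.asarray(feat_a, dtype=np.float64)
    b = np.asarray(feat_b, dtype=np.float64)
    return float(np.linalg.norm(a - b))

def is_same_person(feat_a, feat_b, threshold):
    """True when the distance is below the threshold, as in step 4."""
    return euclidean_distance(feat_a, feat_b) < threshold

# Toy 128-dimensional vectors standing in for network outputs (hypothetical).
enrolled = np.zeros(128)
probe_close = np.zeros(128)
probe_close[0] = 0.5          # distance 0.5 from the enrolled vector
probe_far = np.zeros(128)
probe_far[0] = 2.0            # distance 2.0 from the enrolled vector

THRESHOLD = 1.0               # hypothetical; would be tuned on a validation set
match_close = is_same_person(enrolled, probe_close, THRESHOLD)
match_far = is_same_person(enrolled, probe_far, THRESHOLD)
```

Adding a new identity then amounts to storing one more enrolled vector; no retraining is involved.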

In the above convolutional neural network, the last convolutional layer produces an N*N*K feature map, where N*N means that the picture is divided into N*N regions and each region has a feature vector of size 1*1*K. To handle the case where a face lies exactly on the boundary between regions, so that no single region contains the whole face, a large convolution kernel is slid with a stride smaller than the kernel width, so that the local image patches covered by adjacent convolution operations overlap. During training, a fully connected layer is added after the selected feature vector; each selected feature vector is passed through this layer, and softmax loss is used as the loss function to compute the prediction error of the network.

In the above convolutional neural network, while face features are being extracted, a face detection method is also used to determine whether a face is present in the input picture; that is, the position coordinates of the face in the input picture are predicted, and the feature vector corresponding to the face region determined by those coordinates is selected. During training, the face detection loss is added to the face classification loss so that both are trained simultaneously.
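A minimal sketch of such a combined objective follows. The text specifies softmax loss for the identity classification but does not give the form of the detection loss, so the mean squared box error used here is purely an assumption for illustration, as are all function names.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Softmax loss for one sample: negative log-probability of the true class."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()                 # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def box_error(pred_box, true_box):
    """Stand-in detection error: mean squared error on box coordinates.
    ASSUMPTION: the patent does not state which detection loss is used."""
    p = np.asarray(pred_box, dtype=np.float64)
    t = np.asarray(true_box, dtype=np.float64)
    return float(np.mean((p - t) ** 2))

def total_loss(logits, label, pred_box, true_box):
    """Sum of classification loss and detection loss, trained jointly."""
    return softmax_cross_entropy(logits, label) + box_error(pred_box, true_box)

# Two equally likely classes and a perfect box: the loss reduces to ln(2).
example_loss = total_loss([0.0, 0.0], 0, [0, 0, 1, 1], [0, 0, 1, 1])
```

Because the two terms are simply added, one backward pass updates the shared convolutional layers for both tasks, which is what removes the need to train two separate networks.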

By adopting the above technical solution, the invention has the following beneficial effects:

(1) The invention proposes a method that performs face verification with a single deep neural network. The method uses the translation invariance of deep convolutional neural networks to extract feature vectors for different regions of the image, takes the detected face region as the local receptive field of a feature vector, and uses the receptive fields to select among the feature vectors of the different regions. Compared with extracting feature vectors by detecting first and cropping afterwards, this reduces the noise introduced by background information and also reduces repeated computation in later stages.

(2) The invention implements receptive-field-based feature vector selection with a mask matrix, so that a single neural network performs both face detection and verification. Compared with two-stage face verification algorithms, this simplifies the training process and improves the parallelism of computation.

Description of Drawings

FIG. 1 is a structural diagram of the neural network used for face verification in the invention.

FIG. 2 is a schematic diagram of the feature vector mapping of the invention.

Detailed Description

The technical solution of the invention is described in detail below with reference to the accompanying drawings.

The invention provides a face verification method based on local receptive fields, comprising the following steps:

Establishing an external data set: an external data set is built from a public fine-grained classification database published by a research institution, or from self-collected data. For example, CASIA-WebFace can be used as the face verification training set and LFW as the test set. Each picture should carry an identity label indicating which category it belongs to, and also face bounding-box coordinates indicating where the face is in the picture. Faces of as many different identities as possible should be collected, with as many samples per identity as possible, while keeping the number of mislabeled samples in the data set low.

Data augmentation: deep neural networks trained for face verification overfit easily, since the number of training samples is usually far smaller than the number needed; manual data augmentation reduces overfitting. Four augmentation methods are commonly used to enlarge a data set: translation, scaling, rotation and flipping.

Training the model: a convolutional neural network is established. Its input is a color picture; during training its outputs are the identity category of the face in the picture and the predicted bounding-box coordinates of the face, with softmax loss as the loss function; during testing it outputs the feature vector of the region corresponding to the position of the predicted box in the image. The face verification network based on local receptive fields has two parts: one detects the face in the picture, and the other produces the feature vector of the face region. During training, the network extracts face features for different regions of the picture and then selects the feature vector corresponding to the face region according to the face detection result. A fully connected layer is added after the selected feature vector; each selected feature vector passes through this layer, softmax loss is used as the loss function, and the Adam optimizer is used for training. When the accuracy no longer increases, the model is saved, giving the trained convolutional neural network. When the network is tested and used, the fully connected layer is removed and the network directly outputs the feature vector of the face region in the picture. The Euclidean distance between the feature vectors of different pictures is computed: if the distance is less than a given threshold, the two face pictures are considered to show the same person; otherwise they show different people.

The last convolutional layer of the network produces an N*N*K feature map, where N*N means that the picture is divided into N*N regions and each region has a feature vector of size 1*1*K. To handle the case where a face lies exactly on the boundary between regions, so that no single region contains the whole face, a large convolution kernel is used with a stride smaller than the kernel width, so that successive positions of the sliding kernel overlap. During training, a fully connected layer is added after the selected feature vector; each selected feature vector is passed through this layer, and softmax loss is used as the loss function to compute the prediction error of the network.

The technical solution of the invention is described in detail below with reference to the accompanying drawings and a specific embodiment.

The face verification method based on local receptive fields of the invention comprises the following three steps.

Step 1, establishing an external data set: CASIA-WebFace is used as the training set and LFW as the test set. The CASIA-WebFace data set contains 10575 identities, each with multiple photos; the LFW data set contains 5749 identities and 13233 pictures in total. All images are scaled to 250*250*3.

Step 2, data augmentation: a translation operation is applied to the samples of the data set, randomly translating each picture. Deep neural networks overfit easily during training, and random translation places the face at an arbitrary position in the picture, so random translation is used to increase the robustness of the network.
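Random translation of a picture can be sketched as follows. The zero padding of the vacated border, the maximum shift and the helper names are assumptions made for this illustration; the patent does not say how the exposed border is filled or how far pictures are shifted. The face box must be moved by the same offset so that the detection label stays correct.

```python
import numpy as np

def random_translate(image, max_shift, rng):
    """Shift an H*W*C image by a random (dy, dx), filling exposed pixels with 0.
    Requires max_shift to be smaller than both H and W."""
    h, w = image.shape[:2]
    dy = int(rng.integers(-max_shift, max_shift + 1))
    dx = int(rng.integers(-max_shift, max_shift + 1))
    out = np.zeros_like(image)
    # Source rows/cols that remain visible, and where they land in the output.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out, (dy, dx)

def shift_box(box, offset):
    """Move an (x1, y1, x2, y2) face box by the same (dy, dx) offset."""
    x1, y1, x2, y2 = box
    dy, dx = offset
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)

rng = np.random.default_rng(0)
picture = np.ones((250, 250, 3), dtype=np.float32)   # stand-in for a scaled sample
shifted, offset = random_translate(picture, max_shift=40, rng=rng)
new_box = shift_box((100, 90, 150, 160), offset)      # hypothetical face box
```

In practice the shift would also be limited so that the face box stays inside the picture; that check is omitted here for brevity.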

Step 3, establishing the convolutional neural network: the deep neural network used in the invention is shown in FIG. 1. The input to the network is a 250*250*3 color picture, which first passes through a series of convolutional layers to give a 13*13*896 feature map. The network then splits into two branches.

One branch continues extracting image features and generates a feature vector for each region of the picture. In FIG. 1, 3*3 feature vectors of 128 dimensions are generated; that is, the original picture is divided into 9 fixed regions, each corresponding to one 128-dimensional vector. If more feature vectors are generated, the original picture has correspondingly more regions, each of smaller size.

The other branch predicts the position of the face. As shown in FIG. 1, the 13*13*896 feature map is convolved to generate predicted boxes for the face coordinates, together with a classification of each box as face or background. The box with the highest face confidence is taken as the box containing the face. The pixel at the center point of that box is set to 1 and all other pixels are set to 0, giving a 15*15 mask matrix. A max-pooling operation of size 5 and stride 5 then reduces the mask matrix to 3*3, in which the position of the 1 is the position, within the 3*3*128 output, of the feature vector corresponding to the face region. The mask matrix can therefore be used to pick out the feature vector corresponding to the face region.

During training, a fully connected layer is added after the selected feature vector for classification and computation of the softmax loss. Note that for each picture only the feature vector corresponding to the face region contributes to the softmax loss. The loss function of the network is the sum of the softmax loss and the error of the face detection branch. After training is complete, the fully connected layer is removed and the network directly outputs the feature vector of the test picture. For different input samples, the Euclidean distance between their feature vectors is computed: if it is greater than a given threshold, the samples are considered not to show the same person; otherwise they show the same person.
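The mask-based selection can be illustrated with NumPy, using the sizes quoted in the text (a 15*15 mask, max pooling of size 5 and stride 5, and a 3*3*128 feature output). The function names are hypothetical, and the mapping from box coordinates to a cell of the 15*15 grid is not described in the text, so the grid row and column of the box center are taken here as given inputs.

```python
import numpy as np

def build_mask(center_row, center_col, size=15):
    """size*size mask with a single 1 at the center of the chosen face box."""
    mask = np.zeros((size, size), dtype=np.float32)
    mask[center_row, center_col] = 1.0
    return mask

def max_pool(mask, pool=5):
    """Non-overlapping max pooling (window = stride = pool)."""
    n = mask.shape[0] // pool
    blocks = mask[:n * pool, :n * pool].reshape(n, pool, n, pool)
    return blocks.max(axis=(1, 3))

def select_face_feature(features, pooled_mask):
    """Pick the feature vector at the position marked 1 in the pooled mask.
    features has shape (3, 3, K); pooled_mask has shape (3, 3)."""
    return (features * pooled_mask[..., None]).sum(axis=(0, 1))

# Toy feature output: 3*3 regions, each with a 128-dimensional vector.
features = np.arange(3 * 3 * 128, dtype=np.float32).reshape(3, 3, 128)
mask = build_mask(center_row=7, center_col=12)   # hypothetical box center
pooled = max_pool(mask)                          # 1 ends up at region (1, 2)
face_vector = select_face_feature(features, pooled)
```

Because the selection is an elementwise multiply followed by a sum, it can run inside the network as ordinary tensor operations, with no cropping step between detection and feature extraction.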

The invention also provides a way of handling a face located on a region boundary. The mapping between the 9 feature vectors and the corresponding regions of the picture is shown in FIG. 2. If the face lies in the middle of a region, its features are extracted well; if it lies at the boundary between two regions, feature extraction is affected. To mitigate this, the convolutional layer that extracts the feature vectors uses a large kernel with a small stride. The height and width of the kernel are 2/3 of the height and width of the feature map, rounded down. Since the number of output feature vectors is fixed, the stride can be determined from formula (1). In formula (1), feature is the size of the final feature output, which is 3 in FIG. 1; input is the size of the input feature map, which is 11 in FIG. 1; and the chosen kernel size, kernel, is 7. The stride is therefore computed to be 2.
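Formula (1) itself is not reproduced in this text. A plausible reconstruction, assuming the usual output-size relation for a convolution without padding (an assumption, although it agrees with the numbers quoted above), is:

```latex
\[
\text{feature} = \frac{\text{input} - \text{kernel}}{\text{stride}} + 1
\quad\Longrightarrow\quad
\text{stride} = \frac{\text{input} - \text{kernel}}{\text{feature} - 1}
\]
\[
\text{kernel} = \left\lfloor \tfrac{2}{3}\cdot 11 \right\rfloor = 7,
\qquad
\text{stride} = \frac{11 - 7}{3 - 1} = 2
\]
```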

For example, the input feature map of the last layer is 11*11, the convolution kernel is 7*7 with a stride of 2, and the output is 3*3. Because the kernel slides only a small step at a time, successive convolution windows overlap, so the features of a face located at a region boundary can also be extracted well.
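The overlap in this example can be checked with a short sketch. It assumes a convolution without padding, so that output = (input - kernel) / stride + 1; the helper names are illustrative only.

```python
def conv_stride(input_size, kernel_size, feature_size):
    """Stride that yields feature_size outputs for an unpadded convolution."""
    span = input_size - kernel_size
    steps = feature_size - 1
    if steps <= 0 or span % steps != 0:
        raise ValueError("no integer stride gives this output size")
    return span // steps

def window_spans(input_size, kernel_size, stride):
    """Half-open [start, end) range covered by each kernel position along one axis."""
    count = (input_size - kernel_size) // stride + 1
    return [(i * stride, i * stride + kernel_size) for i in range(count)]

input_size = 11
kernel_size = (2 * input_size) // 3        # 2/3 of the feature map, rounded down
stride = conv_stride(input_size, kernel_size, feature_size=3)
spans = window_spans(input_size, kernel_size, stride)
# Number of positions shared by two adjacent windows along one axis.
overlap = spans[0][1] - spans[1][0]
```

Along each axis the three windows cover positions 0 to 6, 2 to 8 and 4 to 10, so adjacent windows share 5 of their 7 positions. A face straddling the nominal boundary between two regions is therefore still largely inside at least one window.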

In summary, the face verification method based on local receptive fields of the invention uses the translation invariance of deep convolutional neural networks to extract features separately for different regions of the image, which reduces the influence of background information on face feature extraction. At the same time, face detection predicts the region containing the face, and the feature vector corresponding to that region is selected for face verification. Compared with two-stage face verification methods, there is no need to first crop the face with a face detection network before verification, which improves the parallelism of face verification; nor do two networks need to be trained, which greatly simplifies training.

The above embodiment only illustrates the technical idea of the invention and does not limit its scope of protection. Any modification made to the technical solution in accordance with the technical idea proposed by the invention falls within the scope of protection of the invention.

Claims

1. A face verification method based on local receptive fields, characterized in that a neural network for performing face verification on an input picture is trained; the neural network detects the face region of the input picture and extracts a feature vector for each region of the input picture; the feature vector of the face region of the input picture is selected from all the extracted feature vectors; the trained neural network is used to extract the feature vectors of the face regions of different test pictures; and when the Euclidean distance between the feature vectors of the face regions of two test pictures is smaller than a threshold, the verification result output is that the two test pictures show the same face.

2. The local receptive field-based face verification method according to claim 1, wherein the neural network detects the face region of the input picture by reading the face prediction box information of the input picture, and, among the prediction boxes whose enclosed region is classified as a face, the region enclosed by the prediction box with the highest confidence is selected as the face region of the input picture.

3. The local receptive field-based face verification method as claimed in claim 1, characterized in that a mask matrix representing the face position is used to screen out the feature vectors of the face region of the input picture from all extracted feature vectors.

4. The method of claim 1, wherein in the course of training the neural network for face verification, the feature vectors of the face regions of the input pictures are input into a fully connected layer to obtain classification results.

5. The method according to claim 1, wherein during training of the neural network for face verification of the input picture, softmax loss is used as the loss function, and the sum of the classification error of the feature vector of the face region of the input picture and the detection error of the face region of the input picture is back-propagated to correct the network parameters.

6. The local receptive field-based face verification method as claimed in claim 1, characterized in that the input picture face region is detected by sliding a large convolution kernel with a convolution step smaller than the convolution kernel width.

7. The local receptive field-based face verification method as claimed in claim 1, wherein the input picture and the test picture are subjected to size scaling, normalization and data augmentation, and the data augmentation includes but is not limited to translation, scaling, rotation and flipping.

8. The method as claimed in claim 3, wherein the mask matrix has an element of 1 corresponding to the face position and the remaining elements are 0.

9. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the face verification method of claim 1.

10. A terminal device, characterized by comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the face verification method of claim 1 when executing the program.