CN111488806A

CN111488806A - Multi-scale face recognition method based on parallel branch neural network

Info

Publication number: CN111488806A
Application number: CN202010220225.3A
Authority: CN
Inventors: 苏寒松; 田曦初; 刘高华
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2020-03-25
Filing date: 2020-03-25
Publication date: 2020-08-04

Abstract

The invention discloses a multi-scale face recognition method based on a parallel-branch neural network, which takes an arbitrary picture as input and outputs the recognized face end to end. Step 1: train a face detection network, which produces the face pictures later fed to the face recognition network. Step 2: input the unlabeled original dataset pictures into the face detection network obtained in step 1, detect the face pictures, and preprocess them. Step 3: label the pictures obtained in step 2 with each person's name and input them into the face recognition network for training. Step 4: fix all parameters of the converged face recognition network so that they no longer change, yielding the final face recognition network, and combine it with the face detection network obtained in step 1 to form an end-to-end face recognition network.

Description

A multi-scale face recognition method based on parallel branched neural network

Technical field

The invention belongs to the field of deep learning within artificial intelligence, and mainly relates to a method for detecting faces and recognizing them at multiple scales using a neural network with a parallel branch structure.

Background art

With the development of computer technology, the field of computer vision has made great progress, and video surveillance is becoming increasingly intelligent. An important task of an intelligent video surveillance system is to detect and recognize faces in video images. Correct detection and recognition of the target is the prerequisite for video surveillance, and its quality affects subsequent operations such as tracking and analysis.

There are many traditional face recognition methods; the most commonly used are the SVM classifier and the KNN (k-nearest neighbors) classifier. An SVM is a linear classifier that maximizes the classification margin: it projects linearly inseparable data into a high-dimensional space where the data become linearly separable, and then, based on a manually specified linear function, finds the separating line (or surface) with the largest margin to classify the data. However, it relies on the engineer's experience, the chosen parameters have a decisive influence on the final result, and it handles multi-class problems poorly. KNN compares the features of a test sample with the corresponding features in the training set and finds the K most similar training samples; the test sample is then assigned the class that occurs most often among those K samples. However, it is computationally expensive, tolerates noisy data poorly, and has low accuracy.

In recent years, the rapid development of deep learning algorithms has allowed them to comprehensively surpass traditional recognition algorithms in computer vision, and their end-to-end nature makes them easy to operate for people with different backgrounds. Deep neural networks are therefore currently the mainstream approach to face detection and recognition. However, most of these algorithms suffer from excessively deep networks, complex structures, huge parameter counts, and long running times. There is a trade-off between detection and recognition accuracy and computational overhead.

Summary of the invention

The method improves on existing deep learning algorithms and uses a lightweight network to address the complex, bloated structures and large, redundant parameter sets common in neural networks. It proposes a neural network structure with parallel branches. Unlike existing neural-network face recognition algorithms, it no longer classifies on the features of a single network layer; instead it uses multi-scale features to learn richer facial features and thereby improve accuracy. In addition, image preprocessing aligns the pictures and converts them to grayscale, which further limits computational overhead, so that good face recognition accuracy is obtained at a small computational cost.

The purpose of the invention is achieved through the following technical solution:

A multi-scale face recognition method based on a parallel-branch neural network, which takes an arbitrary picture as input and outputs the recognized face end to end, specifically comprising the following steps:

Step 1: Train a face detection network, which produces the face pictures that are input into the face recognition network for recognition. This step is divided into the following parts:

Step 101: Annotate the pictures of the training dataset, which includes completely enclosing each face with a rectangular box and marking five point positions on the face: left eye, right eye, nose, left mouth corner, and right mouth corner;

Step 102: Use a lightweight network, namely the MobileNet network, as the face detection network, and input the annotated pictures for training;

Step 103: Fix all parameters of the face detection network once training has converged to obtain the final face detection network, which detects the faces that are input into the face recognition network;

Step 2: Input the unlabeled original dataset pictures into the face detection network obtained in step 1, detect the face pictures, and preprocess them;

Step 3: Label the pictures obtained in step 2 with each person's name and input them into the face recognition network for training;

Step 4: Fix all parameters of the converged face recognition network so that they no longer change, yielding the final face recognition network, and combine it with the face detection network obtained in step 1 to form an end-to-end face recognition network.

Further, step 2 specifically comprises the following steps:

Step 201: Input the original dataset pictures into the face detection network obtained in step 1, and crop and save each detected face picture;

Step 202: Scale the cropped face pictures of different sizes to a uniform 112×112 size, and use a similarity transformation to correct and align faces that are tilted;

Step 203: Convert the aligned, uniformly sized pictures to grayscale.

Further, the face recognition network consists of two parts. The first part uses a lightweight network, namely the MobileNet network, as the base network. The second part appends, after the base network, three levels of parallel-branch feature maps of different sizes, which are used for multi-scale face recognition. The branches of the second part perform feed-forward propagation with ordinary convolution and dilated (atrous) convolution respectively, in order to learn facial features with different emphases.

Compared with the prior art, the technical solution of the invention brings the following beneficial effects:

1. During image preprocessing, the cropped face pictures of different sizes are scaled to a uniform 112×112 size, and faces that are tilted are corrected and aligned with a similarity transformation. This simple mathematical operation simplifies the work of the face recognition network: the network can concentrate on learning the positions of, and relationships among, the five facial key points, without spending additional learning effort on differences in face size and angle.

2. During image preprocessing, the aligned, uniformly sized pictures are converted to grayscale. Because the most important thing in face recognition is learning the relationships among the five key points, color does not improve detection and recognition accuracy but does add computational overhead. Specifically, a typical color image has three primary color components (red, green, and blue), each with 256 possible values, giving 256×256×256 = 16,777,216 colors in total. A grayscale image has no color: all channels take the same value, so only the intensity varies, with 256 levels in total. In this way the subsequent neural network attends to changes in intensity rather than to specific colors, which greatly speeds up computation and also focuses the subsequent network on the useful information, improving accuracy.

3. The method is based on deep learning and selects a lightweight network, avoiding the problems of excessive depth and excessive parameter counts. Detection and recognition are carried out as two independent stages that do not interfere with each other, so that each network is targeted at its specific function. In the face recognition network, parallel network branches are designed to learn different features: one branch uses ordinary convolution and the other uses dilated (atrous) convolution, and both propagate in parallel to recognize the face. Ordinary convolution learns the overall information; dilated convolution enlarges the receptive field and learns more detailed information. Combining overall and detailed information improves face recognition accuracy and lets the network learn richer information. Multi-scale prediction is then used so that the lightweight network keeps its fast running speed while overcoming the poor recognition performance that such networks often exhibit.
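The statement that dilated convolution enlarges the receptive field can be illustrated with a minimal one-dimensional sketch. A kernel of size k with dilation d spans (k − 1)·d + 1 input samples while keeping the same k weights. The function names below are illustrative only and are not part of the patented network, which operates on two-dimensional feature maps.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input samples spanned by a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D cross-correlation with kernel taps spaced `dilation` apart.

    dilation=1 is an ordinary convolution; dilation>1 is a dilated (atrous) one.
    """
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]                               # three weights in both cases
ordinary = conv1d(signal, kernel, dilation=1)    # each output sees 3 neighbouring samples
dilated = conv1d(signal, kernel, dilation=2)     # each output sees a window of 5 samples
```

With the same three weights, the dilated branch covers a window of five samples per output instead of three, which is the sense in which the two branches see the input at different extents.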

Description of the drawings

Figure 1 is a schematic diagram of the face detection network training process;

Figure 2 is a schematic diagram of the picture preprocessing process;

Figure 3 is a schematic diagram of the face recognition network training process;

Figure 4 is a schematic diagram of the complete structure of the face recognition algorithm.

Detailed description

The invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.

The invention provides a multi-scale face recognition method based on a parallel-branch neural network, as follows:

Step 1: Train a face detection network, which produces the face pictures that are input into the face recognition network for recognition. Figure 1 explains step 1 in detail. Specifically:

Step 101. Annotate the pictures of the training dataset, which includes completely enclosing each face with a rectangular box and marking five point positions on the face: left eye, right eye, nose, left mouth corner, and right mouth corner. The annotation tool is LabelImg; after annotation it produces an information file with the .xml suffix for each original picture, in one-to-one correspondence.
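As an illustration of how such an annotation file might be read back, the sketch below parses the bounding boxes from a Pascal VOC style XML document, which is the layout LabelImg writes by default. The sample file content, the label "face", and the function name are hypothetical; the source does not state how the five landmark points are stored, so they are not handled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout (values are made up).
SAMPLE_XML = """<annotation>
  <filename>img_0001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>face</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>260</xmax><ymax>250</ymax></bndbox>
  </object>
</annotation>"""


def read_face_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from one annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(
            int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"),) + coords)
    return filename, boxes


filename, boxes = read_face_boxes(SAMPLE_XML)
```

Each returned tuple pairs the class label with the rectangle that encloses one face, which is the information a detector training loop would consume together with the image.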

Step 102. Use a lightweight network, such as the popular MobileNet, as the face detection network, and input the annotated .xml files together with the pictures for training.

Step 103. Once training has converged, the face detection network can effectively detect faces in pictures. All parameters of all layers are then fixed so that they no longer change, which yields the final face detection network; it detects the faces that are input into the face recognition network.

Step 2: Image preprocessing. Input the unlabeled original dataset pictures into the face detection network obtained in step 1, detect the face pictures, and apply a series of processing operations. Figure 2 explains the image preprocessing of step 2 in detail. Specifically:

Step 201. Input the unlabeled original pictures into the face detection network obtained in step 1, and crop and save each detected face picture.

Step 202. Scale the face pictures of different sizes to a uniform 112×112 size, and then use a similarity transformation to correct and align faces that are tilted. The basic principle of the similarity transformation is as follows:

Let (x, y) be the coordinates in the original image and (x', y') the coordinates in the transformed image; the transformation is expressed by the following formula:

Here, s = 1 preserves orientation and s = -1 reverses it. The transformation has 4 degrees of freedom (1 rotation angle θ, 2 translations t_x and t_y, and 1 scale factor). This simple mathematical operation simplifies the work of the face recognition network: the network can concentrate on learning the positions of, and relationships among, the five facial key points, without spending additional learning effort on differences in face size and angle.
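A minimal sketch of how the four parameters could be estimated from the five landmark points is given below. It assumes the standard orientation-preserving form x' = k(x cos θ − y sin θ) + t_x, y' = k(x sin θ + y cos θ) + t_y, which is the usual textbook parametrization and is not taken from the formula in the original filing. The template coordinates and function names are illustrative assumptions.

```python
import cmath
import math

# Hypothetical target positions of the five landmarks inside a 112x112 crop
# (left eye, right eye, nose, left mouth corner, right mouth corner).
TEMPLATE_112 = [(38.0, 52.0), (74.0, 52.0), (56.0, 72.0), (42.0, 92.0), (70.0, 92.0)]


def fit_similarity(src, dst):
    """Least-squares similarity (scale, theta, tx, ty) mapping src points onto dst.

    Treating each point as a complex number z = x + iy, the transform is
    w = c*z + t with c = scale * exp(i*theta), so the fit is a complex
    linear regression with a closed-form solution.
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    num = sum((wi - w_mean) * (zi - z_mean).conjugate() for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    c = num / den
    t = w_mean - c * z_mean
    return abs(c), cmath.phase(c), t.real, t.imag


def apply_similarity(params, point):
    """Apply (scale, theta, tx, ty) to a single (x, y) point."""
    scale, theta, tx, ty = params
    x, y = point
    return (
        scale * (math.cos(theta) * x - math.sin(theta) * y) + tx,
        scale * (math.sin(theta) * x + math.cos(theta) * y) + ty,
    )


# Simulate a tilted, shifted, enlarged face, then recover the transform back to the template.
tilted = [apply_similarity((1.5, 0.3, 10.0, -4.0), p) for p in TEMPLATE_112]
params = fit_similarity(tilted, TEMPLATE_112)
aligned = [apply_similarity(params, p) for p in tilted]
```

In practice the estimated parameters would be used to warp the whole face image so that its landmarks land on the template positions; the sketch only transforms the landmark coordinates themselves.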

Step 203. Convert the aligned, uniformly sized pictures to grayscale. Because the most important thing in face recognition is learning the relationships among the five key points, color does not improve detection and recognition accuracy but does add computational overhead. Each pixel of a color image is determined by three components, R, G, and B, and each component can take 256 values, so a single pixel can take more than 16 million (256×256×256) colors. A grayscale image is a special color image whose R, G, and B components are equal, so a single pixel can take only 256 values. Converting the image to grayscale before facial expression recognition reduces the amount of computation in subsequent image processing.
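The grayscale step can be sketched as below. The source does not say which conversion is used; the weights 0.299, 0.587, and 0.114 are the common ITU-R BT.601 luma coefficients and are an assumption here, as is the nested-list image representation.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples (0..255) to rows of single 0..255 values.

    Uses BT.601 luma weights; this choice is an assumption, not taken from the source.
    """
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_image
    ]


# A tiny 1x3 example image: white, black, and mid-gray pixels.
tiny = [[(255, 255, 255), (0, 0, 0), (100, 100, 100)]]
gray = to_grayscale(tiny)

# For a 112x112 face crop, the number of values per picture drops by a factor of three.
values_rgb = 112 * 112 * 3
values_gray = 112 * 112
```

Pixels whose three channels are already equal keep their value, which matches the description of a grayscale image as a color image with identical R, G, and B components.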

Step 3: Label the pictures obtained in step 2 with each person's name and input them into the face recognition network for training. Figure 3 explains the face recognition network designed in step 3 in detail. The face recognition network is structured as follows:

The network consists of two parts. The first part uses a lightweight network such as MobileNet as the base network. The second part appends, after the base network, multi-scale feature maps produced by three levels of different convolutional layers arranged in parallel branches, which are used for multi-scale face recognition. Of the two branches, one performs feed-forward propagation with ordinary convolution and the other with dilated (atrous) convolution, so that they learn facial features with different emphases. Ordinary convolution learns the overall information; dilated convolution enlarges the receptive field and learns more detailed information. Combining overall and detailed information improves face recognition accuracy. In particular, small detected faces are also enlarged to 112×112, which inevitably causes information loss; adding the dilated convolution branch greatly improves the recognition accuracy for such small faces. Both branches use downsampling operations to change the size of the feature maps. The sizes used in this example are: feature maps a1 and b1 are 52×52; feature maps a2 and b2 are 26×26; feature maps a3 and b3 are 13×13. Large feature maps favor recognizing small objects, and small feature maps favor recognizing large objects. Predicting from feature maps at several scales gives good recognition of faces that were rescaled from different original sizes, and avoids the situation where one layer's feature map works well only for faces of a particular size. Finally, the name with the highest confidence among the predictions at the three scales is taken as the final face recognition result.
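The final selection rule described above can be sketched as follows. The data layout (one dictionary of name-to-confidence scores per scale), the names, and the scores are hypothetical; the source only states that the name with the highest confidence across the three scales is chosen.

```python
def pick_identity(scale_predictions):
    """Return (name, confidence) with the highest confidence over all scales.

    scale_predictions: list with one dict per scale, mapping name -> confidence.
    """
    best_name, best_conf = None, float("-inf")
    for scores in scale_predictions:
        for name, conf in scores.items():
            if conf > best_conf:
                best_name, best_conf = name, conf
    return best_name, best_conf


# Hypothetical outputs for the 52x52, 26x26, and 13x13 scales.
predictions = [
    {"person_a": 0.61, "person_b": 0.30},
    {"person_a": 0.55, "person_b": 0.42},
    {"person_a": 0.20, "person_b": 0.87},
]
name, confidence = pick_identity(predictions)
```

Taking the maximum over scales, rather than averaging, lets whichever feature-map size suits the face best decide the result, in line with the stated aim of not depending on a single layer.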

Step 4: "Freeze" the parameters of the converged face recognition network, that is, fix all parameters of the network so that they no longer change. This yields the final face recognition network, which is combined with the face detection network obtained in step 1 to form an end-to-end face recognition network.

Figure 4 is a schematic diagram of the complete structure of the end-to-end face recognition algorithm proposed by the invention. After the face detection network and the face recognition network have been obtained from steps 1 and 3, the final end-to-end structure can be described as: (1) input the picture to be examined; (2) the face detection network detects the face and crops it out; (3) scale the cropped face picture to 112×112 and straighten and align the face with a similarity transformation; (4) convert the aligned face to grayscale; (5) input the face picture into the face recognition network for multi-scale recognition and analysis; (6) output the final recognition result.
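The six stages listed above can be sketched as one function that chains them. Every stage is passed in as a callable, and all parameter names and the stub behavior in the usage example are hypothetical; the sketch shows only the order of operations, not the networks themselves.

```python
def recognize(image, detect_faces, crop, align_to_112, to_gray, identify):
    """Run the end-to-end pipeline on one input image.

    detect_faces(image) is assumed to yield (box, landmarks) pairs;
    identify(face) is assumed to return a (name, confidence) pair.
    """
    results = []
    for box, landmarks in detect_faces(image):     # (2) detect each face
        face = crop(image, box)                    # (2) crop it out
        face = align_to_112(face, landmarks)       # (3) resize to 112x112 and align
        face = to_gray(face)                       # (4) grayscale
        results.append(identify(face))             # (5) multi-scale recognition
    return results                                 # (6) final result per face


# Usage with trivial stand-in stages that only record the order of calls.
trace = []
output = recognize(
    "input-picture",                               # (1) picture to be examined
    detect_faces=lambda img: [((0, 0, 10, 10), "five-points")],
    crop=lambda img, box: trace.append("crop") or "face",
    align_to_112=lambda face, lm: trace.append("align") or face,
    to_gray=lambda face: trace.append("gray") or face,
    identify=lambda face: trace.append("identify") or ("person_a", 0.9),
)
```

Passing the stages as arguments keeps detection and recognition as independent components, mirroring the two separately trained and frozen networks described in the text.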

The invention is not limited to the embodiments described above. The above description of specific embodiments is intended to describe and explain the technical solution of the invention; these embodiments are merely illustrative and not restrictive. Without departing from the spirit of the invention and the scope protected by the claims, those of ordinary skill in the art may, under the teaching of the invention, make many specific variations, all of which fall within the protection scope of the invention.

Claims

1. A multi-scale face recognition method based on a parallel-branch neural network, which outputs a recognized face end to end after an arbitrary picture is input, characterized by specifically comprising the following steps:

step one, training a face detection network, which produces the face pictures that are input into the face recognition network for recognition; this step is divided into the following parts:

step 101: annotating the pictures of the training dataset, which includes completely enclosing each face with a rectangular box and marking five point positions on the face: left eye, right eye, nose, left mouth corner, and right mouth corner;

step 102: taking a lightweight network as the face detection network, and inputting the annotated pictures for training;

step 103: fixing all parameters of the face detection network after training has converged to obtain the final face detection network, which detects the faces that are input into the face recognition network;

step two, inputting the unlabeled original dataset pictures into the face detection network obtained in step one, detecting the face pictures, and preprocessing them;

step three, labeling the pictures obtained in step two with each person's name, and inputting them into the face recognition network for training;

and step four, fixing all parameters of the converged face recognition network so that they no longer change, to obtain the final face recognition network, and combining it with the face detection network obtained in step one to form an end-to-end face recognition network.

2. The multi-scale face recognition method based on a parallel-branch neural network as claimed in claim 1, wherein step two specifically comprises the following steps:

step 201: inputting the original dataset pictures into the face detection network obtained in step one, and cropping and saving each detected face picture;

step 202: scaling the cropped face pictures of different sizes to a uniform 112×112 size, and correcting and aligning tilted faces with a similarity transformation;

step 203: converting the aligned, uniformly sized pictures to grayscale.

3. The multi-scale face recognition method based on a parallel-branch neural network as claimed in claim 1, wherein the face recognition network is composed of two parts; the first part uses a lightweight network as the base network, and the second part appends, after the base network, three levels of parallel-branch feature maps of different sizes for multi-scale face recognition; and the branches of the second part perform feed-forward propagation with ordinary convolution and dilated (atrous) convolution respectively, so as to learn facial features with different emphases.

4. The method according to claim 1, wherein the lightweight network is a MobileNet network.