CN106951840A - A Face Feature Point Detection Method - Google Patents
- Publication number
- CN106951840A (application number CN201710138179.0A)
- Authority
- CN
- China
- Prior art keywords
- feature
- detection
- face
- image
- geh
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical Field
The invention belongs to the field of computer vision, and in particular relates to a facial feature point detection method that uses three-dimensional face pose information as an auxiliary constraint on a novel image mode. It has important applications in face recognition, face pose and expression analysis, and face synthesis.
Background
In recent years, with the development of deep learning, convolutional neural networks (CNNs) have achieved strong results in facial feature point detection. A CNN takes the raw face image as input; features obtained through local receptive fields have better expressive power; the weight-sharing structure reduces the number of parameters and thus the complexity of the network model; and downsampling of feature maps, which exploits local image correlation, preserves useful structural information while substantially reducing the amount of data to process. CNNs are therefore widely used for feature extraction from face images.
In 2013, Yi Sun et al. proposed a three-stage cascaded deep convolutional network model for facial feature point detection (Deep Convolutional Network Cascade, DCNN). The first stage takes three different regions of the face image (the whole face, the eyes-and-nose region, and the nose-and-lips region) as input, trains three convolutional neural networks to predict the feature point positions, and fuses the three networks' predictions to obtain a more stable initial detection. The second and third stages extract features around each feature point and train a separate convolutional network per point to refine the localization, detecting five points: left eye center, right eye center, nose tip, left mouth corner, and right mouth corner. However, this method only roughly marks the positions of the eyes, nose, and mouth and cannot represent facial attributes in detail, and the network model is overly complex.
In the same year, Erjin Zhou et al. proposed a four-stage cascaded convolutional network model for detecting 68 facial feature points. Considering that locating the outer facial contour and the inner facial features differs in difficulty, the model detects them separately. Detecting more feature points can represent attributes such as pose and expression in greater detail, but the procedure is complicated, involving 10 different networks that must be trained separately.
In 2015, Zhanpeng Zhang et al. proposed the Tasks-Constrained Deep Convolutional Network (TCDCN). While detecting 5 facial feature points, the model uses 18 auxiliary tasks related to facial attributes as constraints, which strengthens the network's ability to extract features and improves feature point detection accuracy. However, the method only considers face rotation in the horizontal dimension, while in many cases rotation in the other dimensions also affects feature point detection accuracy.
Summary of the Invention
The present invention provides a facial feature point detection method that takes a novel three-channel GEH (Gray-Edge-Hog) image, fused from multiple feature maps, as input, with a three-dimensional pose auxiliary task as constraint information. The invention observes that when the face pose changes, the contour structure of the image changes visibly, so three-dimensional pose information strongly influences the localization of the global facial feature points. Because outer contour points and inner organ points differ in detection difficulty, using the edge information extracted from the face as one image-mode variable reduces the difficulty of detecting outer contour points, while using the HOG feature map extracted from the face image as another image-mode variable clearly reflects the contour structure and more effectively highlights the regions of the facial organs. The invention therefore proposes a convolutional neural network model, jointly trained on facial feature points and facial pose, that takes a novel GEH image fusing gray values, edge information, and HOG features as input and uses three-dimensional pose information as an auxiliary constraint, achieving accurate localization of 68 facial feature points.
To achieve the above object, the present invention adopts the following technical solution:
A facial feature point detection method comprises the following steps:
Step 1: perform face detection, localization, cropping, and three-channel multi-feature-map fusion on the original image to obtain the three-channel GEH image Picture_GEH;
Step 2: take the three-channel GEH image, fused from the three feature maps, as the input of a convolutional neural network and extract facial features, where the facial features comprise facial feature points and the three-dimensional pose; both the feature point detection task and the pose detection task use the least-squares function of the corresponding linear regression problem to design a dual-task loss function;
Step 3: train the network on the dual-task loss function with the gradient backpropagation algorithm, finally learning the facial feature point detection weights and the pose detection weights; at test time, the same facial feature extraction network realizes both facial feature point detection and three-dimensional pose detection.
Preferably, the network feature extraction is performed by 3 convolutional layers, 3 pooling layers, and 2 fully connected layers in alternation;
First, the three-channel GEH image Picture_GEH is taken as the input of the first convolutional layer, and the output feature map y_j is computed as:

$$y_j^{l}=f\Big(\sum_{i}x_i^{l-1}*w_{ij}^{l}+b_j^{l}\Big)$$

where f denotes the convolution operation, l the current layer index, i the number of input feature maps, j the number of output feature maps, w_ij the convolution kernel parameters to be learned, and b_j the bias; w_ij and b_j are initialized with random normal values at the start of the experiment;
Then, based on the results of the convolution stage, the features are fed into the function corresponding to the linear regression problem, and the designed dual-task loss function is:

$$E=\sum_{i=1}^{N}\Big(\big\|l_i^{f}-(W^{f})^{T}x_i\big\|+\sum_{t\in\{Yaw,\,Pitch,\,Roll\}}\lambda_{t}\big\|l_i^{t}-(W^{t})^{T}x_i\big\|\Big)$$

where N is the number of training images; l_i^f is the label of the feature point detection task for the i-th image; l_i^Yaw, l_i^Pitch, l_i^Roll are the labels of the three-dimensional pose detection tasks; x_i is the feature of the i-th image extracted by the convolutional neural network; W^f is the weight of the feature point detection task; W^Yaw, W^Pitch, W^Roll are the weights of the three pose detection tasks; and λ_Yaw, λ_Pitch, λ_Roll are the loss weights of the loss function;
The network is trained with the backpropagation algorithm to obtain the facial feature point detection weights W^f and the pose detection weights W^Yaw, W^Pitch, W^Roll; at test time, the same facial feature extraction network finally yields the feature point detection result (W^f)^T x_i and the three-dimensional pose detection results (W^Yaw)^T x_i, (W^Pitch)^T x_i, (W^Roll)^T x_i.
Preferably, in step 1 the face sub-image obtained by face detection, localization, and cropping is converted to grayscale to obtain the gray feature map G; HOG features are then extracted from the face sub-image to obtain the HOG feature map H; finally, edge features are extracted to obtain the edge feature map E. Using the RGB (Red-Green-Blue) color space as a basis, the feature map variables G (Gray), E (Edge), and H (Hog) are mapped onto the RGB Cartesian color space to generate the novel GEH image Picture_GEH according to the following formula:

$$Picture_{GEH}=(R,G,B),\quad R=G_{Gray},\ G=E_{Edge},\ B=H_{Hog}$$

where the gray values of the gray feature map Gray are mapped to the R (Red) channel of RGB, the values of the HOG feature map are mapped to the B (Blue) channel, and the values of the Edge feature map are mapped to the G (Green) channel.
Brief Description of the Drawings
Figure 1: schematic flowchart of the three-channel feature map generation of the present invention;
Figure 2a: original face image;
Figure 2b: face image after multi-feature-map fusion;
Figure 3: structure of the GEH dual-task convolutional neural network model.
Detailed Description
The present invention provides a facial feature point detection method that uses the pose detection task as a constraint and takes a novel three-channel GEH (Gray-Edge-Hog) image, fused from multiple feature maps, as input. Three-dimensional face pose information has a considerable influence on detecting the global feature points of a face, especially when the pose deflection is large. Meanwhile, adding the HOG features, which reflect the local appearance and shape of the face image, and the edge information extracted by the Sobel edge detection operator can effectively reduce the complexity of contour point detection. The invention therefore extracts the image gray values, edge information, and HOG features to generate a novel three-channel GEH image as input, and performs facial feature point detection with the auxiliary task of three-dimensional pose estimation as constraint information.
The base data used in the present invention come from the 300-W facial feature point detection challenge platform, which comprises four datasets: LFPW, AFW, HELEN, and IBUG. Each image in each dataset is labeled with 68 points, covering the outer contour points and the inner facial features (eyebrows, eyes, nose, mouth). The corresponding three-dimensional pose labels, representing the yaw, pitch, and roll information, were computed from the feature point labels of the 300-W images with the Interface software developed by the Human Perception Laboratory.
The facial feature point detection CNN structure proposed by the present invention contains two tasks: a 68-point facial feature point detection task and a three-dimensional pose detection task. Exploiting the influence of three-dimensional pose information on global feature point detection, the CNN extracts joint features that represent both the feature point positions and the pose orientation of the face, and thereby realizes facial feature point detection.
1. Image preprocessing
Appropriate image preprocessing can remove environmental influences such as weather and illumination from the original image, making its edge and color features more prominent and easing feature extraction by the convolutional neural network. The present invention first performs face detection, localization, cropping, and normalization on the original image, then applies grayscale conversion, HOG feature extraction with visualization, and Sobel edge feature extraction to the cropped image, and fuses the resulting feature maps into the novel GEH image mode, whose three-channel image serves as the network input.
1.1 Face detection, localization, and cropping
To remove interference from the image background, hair, clothing, and similar information with the feature point detection task, the detected face region is enlarged by a fixed proportion on the top, bottom, left, and right according to the face detection and localization result, and the enlarged face image is then cropped and scale-normalized, as in the sketch below.
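A minimal sketch of this cropping step follows; the expansion ratio and the output size are illustrative assumptions, since the text fixes neither:

```python
import cv2

def crop_face(image, box, expand=0.2, out_size=224):
    """Expand a detected face box by a fixed ratio, crop, and scale-normalize.

    image: face photo as a numpy array; box: (x, y, w, h) from a face detector.
    The 0.2 expansion ratio and 224x224 output size are assumptions for illustration.
    """
    x, y, w, h = box
    ih, iw = image.shape[:2]
    # Enlarge the box on all four sides by the given proportion.
    dx, dy = int(w * expand), int(h * expand)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(iw, x + w + dx), min(ih, y + h + dy)
    face = image[y0:y1, x0:x1]
    # Scale normalization to a fixed input size.
    return cv2.resize(face, (out_size, out_size))
```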
1.2 Three-channel multi-feature-map fusion
First, the face sub-image obtained in 1.1 is converted to grayscale to obtain the gray feature map G. Next, HOG features are extracted from the face sub-image; because the dimensions of the raw HOG features differ from those of the face sub-image, the Inverting Visual Features method proposed by Carl Vondrick in 2012 is used to visualize the features, finally yielding a HOG feature map H of the same size as the face sub-image. Finally, edge features are extracted: a 5×5 Sobel operator is applied to the scale-normalized RGB face sub-image to generate the edge feature map E.
Because these three feature maps reflect different, mutually independent attributes of the face image, the three independent variables can jointly form a new image mode. The present invention therefore uses the RGB (Red-Green-Blue) color space as a basis and maps the G (Gray), E (Edge), and H (Hog) feature map variables onto the RGB Cartesian color space to generate the novel GEH image Picture_GEH. The generation formula is as follows:
$$Picture_{GEH}=(R,G,B),\quad R=G_{Gray},\ G=E_{Edge},\ B=H_{Hog}\qquad(1)$$

where the gray values of the gray feature map Gray are mapped to the R (Red) channel of RGB, the values of the HOG feature map are mapped to the B (Blue) channel, and the values of the Edge feature map are mapped to the G (Green) channel.
The GEH three-channel image generation process is shown in Figure 1.
The effect of the GEH three-channel image is shown in Figures 2a and 2b.
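A minimal sketch of the GEH fusion is given below. It substitutes skimage's built-in HOG rendering for the Inverting Visual Features reconstruction and applies the 5×5 Sobel operator to the grayscale rather than the RGB sub-image, so both channels are approximations of the procedure described above:

```python
import cv2
import numpy as np
from skimage.feature import hog

def make_geh(face_rgb):
    """Fuse Gray, Edge, and HOG feature maps into a three-channel GEH image.

    face_rgb: cropped, scale-normalized face image (H, W, 3), RGB order, uint8.
    The HOG visualization here is skimage's rendering, used as a stand-in for
    the Inverting Visual Features method cited in the text.
    """
    gray = cv2.cvtColor(face_rgb, cv2.COLOR_RGB2GRAY)
    # Edge map from a 5x5 Sobel operator, combining x and y gradients.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=5)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=5)
    edge = cv2.normalize(cv2.magnitude(gx, gy), None, 0, 255, cv2.NORM_MINMAX)
    # Same-size HOG visualization (an approximation of the H map).
    _, hog_img = hog(gray, orientations=9, pixels_per_cell=(8, 8),
                     cells_per_block=(2, 2), visualize=True)
    hog_img = cv2.normalize(hog_img.astype(np.float32), None, 0, 255,
                            cv2.NORM_MINMAX)
    # Map Gray -> R, Edge -> G, HOG -> B, per generation formula (1).
    return np.dstack([gray, edge, hog_img]).astype(np.uint8)
```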
2. Dual-task network model for facial feature point detection and three-dimensional pose detection
The CNN structure of the present invention jointly considers the facial feature point detection task and the three-dimensional pose detection task, with feature point detection as the main task and pose detection as the auxiliary task. The network structure is shown in Figure 3.
2.1 Facial feature extraction
The network takes the three-channel GEH image Picture_GEH, fused from the three feature maps, as the input of the convolutional neural network and extracts facial features. Both the feature point detection task and the pose detection task use the least-squares function of the corresponding linear regression problem as the loss function, and the network parameters are trained with the gradient backpropagation algorithm, finally realizing facial feature point detection and three-dimensional pose detection.
In the present invention, network feature extraction is performed by 3 convolutional layers, 3 pooling layers, and 2 fully connected layers in alternation. The convolutional layers extract visual features through convolution over local receptive fields. To preserve the image characteristics of the original image after conversion to the GEH mode as completely as possible, the input size is kept close to that of the original face sub-image: the input of the first convolutional layer (i.e., Picture_GEH) is a 224×224 image, 2.3 times the input size of DCNN and TCDCN, which preserves the integrity of the image and the validity of its features. The convolution kernel sizes are 7×7, 4×4, and 3×3. The output feature map y_j is computed by formula (2):

$$y_j^{l}=f\Big(\sum_{i}x_i^{l-1}*w_{ij}^{l}+b_j^{l}\Big)\qquad(2)$$

where f denotes the convolution operation, l the current layer index, i the number of input feature maps, j the number of output feature maps, w_ij the convolution kernel parameters to be learned, and b_j the bias; w_ij and b_j are initialized with random normal values at the start of the experiment.
The pooling layers use max pooling. Given the characteristics of the GEH image input size described above, the pooling window of the first and third pooling layers is set to 3×3 with stride 3, which reduces the feature dimension more effectively and lowers the training complexity while keeping the extracted features complete; the second pooling layer uses a 2×2 window with stride 2.
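The stated layer sizes can be assembled into the following PyTorch sketch. Only the 224×224 input, the kernel sizes, and the pooling parameters come from the text; the channel widths, the 100-unit hidden layer, and the ReLU activations are illustrative assumptions:

```python
import torch.nn as nn

class GEHDualTaskNet(nn.Module):
    """3 conv + 3 max-pool layers and 2 fully connected layers, per the text."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=7), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=3),   # 224 -> 218 -> 72
            nn.Conv2d(16, 32, kernel_size=4), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),   # 72 -> 69 -> 34
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=3),   # 34 -> 32 -> 10
            nn.Flatten(),
            nn.Linear(64 * 10 * 10, 100), nn.ReLU(),
        )
        # Linear regression heads: W_f and W_Yaw / W_Pitch / W_Roll.
        self.landmarks = nn.Linear(100, 136)  # 68 points x (x, y)
        self.pose = nn.Linear(100, 3)         # yaw, pitch, roll

    def forward(self, geh_image):
        x = self.features(geh_image)
        return self.landmarks(x), self.pose(x)
```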
2.2 Design of the dual-task objective function
Next, based on the results of the convolution stage, the features are fed into the function corresponding to the linear regression problem. In the present invention, both the facial feature point detection problem and the three-dimensional pose detection problem are linear regression problems, and the least-squares function is used as the loss function, as shown in formula (3):
$$f_{loss}=\big\|l-W^{T}x_i\big\|\qquad(3)$$
where l is the label of the regression problem, x_i the feature extracted by the convolutional neural network, and W the weight coefficient of the corresponding linear regression problem.
The main task of the present invention is facial feature point detection. Three-dimensional pose detection serves as the auxiliary task, helping the main task extract features that better reflect the three-dimensional pose and enabling more accurate localization on face images with large pose. The three pose coordinates are denoted Pitch, Yaw, and Roll; taking a right-handed Cartesian coordinate system as an example, Pitch is rotation around the X axis (pitch angle), Yaw is rotation around the Y axis (yaw angle), and Roll is rotation around the Z axis (roll angle). Statistics over the experimental data show that the Yaw pose varies much more than the others, by a factor of 5 to 6 relative to Pitch and Roll, so Yaw has a larger influence than Pitch and Roll; loss weights are therefore set to adjust the influence of the three-dimensional pose on the main face detection task. The dual-task loss function designed by the present invention is shown in formula (4):

$$E=\sum_{i=1}^{N}\Big(\big\|l_i^{f}-(W^{f})^{T}x_i\big\|+\sum_{t\in\{Yaw,\,Pitch,\,Roll\}}\lambda_{t}\big\|l_i^{t}-(W^{t})^{T}x_i\big\|\Big)\qquad(4)$$

where N is the number of training images; l_i^f is the label of the feature point detection task for the i-th image (dimension 136); l_i^Yaw, l_i^Pitch, l_i^Roll are the labels of the three-dimensional pose detection tasks (each of dimension 1); x_i is the feature of the i-th image extracted by the convolutional neural network; W^f is the weight of the feature point detection task; and W^Yaw, W^Pitch, W^Roll are the weights of the three pose detection tasks. λ_Yaw, λ_Pitch, λ_Roll are the loss weights, set to 0.3, 0.1, and 0.1 respectively.
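A minimal PyTorch sketch of formula (4), assuming batched outputs from the two regression heads above; the unsquared norm follows formula (3):

```python
import torch

def dual_task_loss(pred_pts, pred_pose, gt_pts, gt_pose,
                   lambdas=(0.3, 0.1, 0.1)):
    """Dual-task loss of formula (4).

    pred_pts/gt_pts: (N, 136) landmark coordinates; pred_pose/gt_pose: (N, 3)
    yaw, pitch, roll. Loss weights 0.3/0.1/0.1 follow the text.
    """
    # Per-image L2 norm of the landmark residual, as in formula (3).
    point_loss = torch.norm(gt_pts - pred_pts, dim=1).sum()
    pose_loss = 0.0
    for t, lam in enumerate(lambdas):  # 0: yaw, 1: pitch, 2: roll
        # Each pose label is 1-dimensional, so the norm is an absolute value.
        pose_loss = pose_loss + lam * torch.abs(gt_pose[:, t] - pred_pose[:, t]).sum()
    return point_loss + pose_loss
```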
2.3 Network training
The present invention trains the network with the backpropagation algorithm, finally learning the facial feature point detection weights W^f and the pose detection weights W^Yaw, W^Pitch, W^Roll. At test time, the same facial feature extraction network finally yields the feature point detection result (W^f)^T x_i and the three-dimensional pose detection results (W^Yaw)^T x_i, (W^Pitch)^T x_i, (W^Roll)^T x_i.
The above method was verified experimentally and achieved clear improvements. Performance is measured with the mean estimation error metric for facial feature point detection published by Yi Sun et al. at CVPR 2013, which reflects the accuracy and reliability of a feature point localization algorithm.
The mean estimation error is computed as:

$$err=\frac{\sqrt{(x-x')^{2}+(y-y')^{2}}}{l}$$

where (x, y) and (x′, y′) are the true and estimated coordinates of a feature point, and l is the normalization factor of the estimation error. An estimate with error above 10% is counted as a failure.
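A sketch of this metric, assuming l is supplied by the caller (for example the inter-ocular distance, a common choice that the text does not specify):

```python
import numpy as np

def mean_estimation_error(gt, pred, norm):
    """Mean estimation error per Sun et al. (2013).

    gt, pred: (68, 2) arrays of true and estimated landmark coordinates;
    norm: the normalization factor l (assumed here to be a distance such as
    the inter-ocular distance). Errors above 10% are counted as failures.
    """
    err = np.linalg.norm(gt - pred, axis=1) / norm
    failures = (err > 0.10).mean()
    return err.mean(), failures
```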
The experiments use the LFPW database from the 300-W challenge platform, a face database with multiple poses and viewpoints. Because it contains images affected by various poses, expressions, illumination, and other factors, most facial feature point detection methods are validated on this dataset. It contains 811 training images and 224 test images.
The first group of experiments compares the facial feature point detection performance of three networks: the dual-task convolutional neural network with the multi-feature fusion image as input (3feature-D-CNN), the dual-task convolutional neural network with the original image as input (D-CNN), and a conventional convolutional neural network with the original image as input (CNN). The convolutional, pooling, and fully connected layers of the three networks share the same structure; they differ only in input image type, output loss function, and output dimension. The experimental comparison results are as follows:
Table 1: comparison of the three convolutional neural network models
Each column of Table 1 reports test results of a network model on the LFPW data; lower values indicate better feature point detection. The 3feature-D-CNN network proposed by the present invention lowers the mean estimation error by 14.02% and 11.6% relative to the plain CNN and the D-CNN respectively, showing that facial feature point detection with the pose detection task as a constraint and the novel three-channel GEH (Gray-Edge-Hog) fusion image as input is effective.
The second group of experiments compares the three networks on outer contour point detection. The results are shown in Table 2:
Table 2: comparison of the three network models on detecting the outer facial contour
Each column of Table 2 reports contour point detection results on the LFPW data; lower values indicate better detection. Compared with the plain CNN and the D-CNN, the 3feature-D-CNN network proposed by the present invention reduces the mean error of outer contour detection by 21.52% and 4.95% respectively, verifying that the present invention improves outer contour point detection to a certain degree.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710138179.0A CN106951840A (en) | 2017-03-09 | 2017-03-09 | A Face Feature Point Detection Method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710138179.0A CN106951840A (en) | 2017-03-09 | 2017-03-09 | A Face Feature Point Detection Method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106951840A true CN106951840A (en) | 2017-07-14 |
Family
ID=59466826
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710138179.0A Pending CN106951840A (en) | 2017-03-09 | 2017-03-09 | A Face Feature Point Detection Method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106951840A (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463903A (en) * | 2017-08-08 | 2017-12-12 | 北京小米移动软件有限公司 | Face key independent positioning method and device |
CN107527018A (en) * | 2017-07-26 | 2017-12-29 | 湖州师范学院 | Momentum Face Detection Method Based on BP Neural Network |
CN107704813A (en) * | 2017-09-19 | 2018-02-16 | 北京飞搜科技有限公司 | A kind of face vivo identification method and system |
CN107808129A (en) * | 2017-10-17 | 2018-03-16 | 南京理工大学 | A kind of facial multi-characteristic points localization method based on single convolutional neural networks |
CN108269250A (en) * | 2017-12-27 | 2018-07-10 | 武汉烽火众智数字技术有限责任公司 | Method and apparatus based on convolutional neural networks assessment quality of human face image |
CN108596093A (en) * | 2018-04-24 | 2018-09-28 | 北京市商汤科技开发有限公司 | The localization method and device of human face characteristic point |
CN108734139A (en) * | 2018-05-24 | 2018-11-02 | 辽宁工程技术大学 | Feature based merges and the newer correlation filtering tracking of SVD adaptive models |
CN108764248A (en) * | 2018-04-18 | 2018-11-06 | 广州视源电子科技股份有限公司 | Image feature point extraction method and device |
CN109359541A (en) * | 2018-09-17 | 2019-02-19 | 南京邮电大学 | A sketch face recognition method based on deep transfer learning |
CN109753910A (en) * | 2018-12-27 | 2019-05-14 | 北京字节跳动网络技术有限公司 | Crucial point extracting method, the training method of model, device, medium and equipment |
CN109766866A (en) * | 2019-01-22 | 2019-05-17 | 杭州美戴科技有限公司 | A kind of human face characteristic point real-time detection method and detection system based on three-dimensional reconstruction |
CN109934058A (en) * | 2017-12-15 | 2019-06-25 | 北京市商汤科技开发有限公司 | Face image processing process, device, electronic equipment, storage medium and program |
CN110033505A (en) * | 2019-04-16 | 2019-07-19 | 西安电子科技大学 | A kind of human action capture based on deep learning and virtual animation producing method |
CN110047101A (en) * | 2018-01-15 | 2019-07-23 | 北京三星通信技术研究有限公司 | Gestures of object estimation method, the method for obtaining dense depth image, related device |
CN110059707A (en) * | 2019-04-25 | 2019-07-26 | 北京小米移动软件有限公司 | Optimization method, device and the equipment of image characteristic point |
CN110060296A (en) * | 2018-01-18 | 2019-07-26 | 北京三星通信技术研究有限公司 | Estimate method, electronic equipment and the method and apparatus for showing virtual objects of posture |
CN110827394A (en) * | 2018-08-10 | 2020-02-21 | 宏达国际电子股份有限公司 | Facial expression construction method and device and non-transitory computer readable recording medium |
WO2020037898A1 (en) * | 2018-08-23 | 2020-02-27 | 平安科技(深圳)有限公司 | Face feature point detection method and apparatus, computer device, and storage medium |
WO2020093884A1 (en) * | 2018-11-08 | 2020-05-14 | 北京灵汐科技有限公司 | Attribute detection method and device |
CN111507244A (en) * | 2020-04-15 | 2020-08-07 | 阳光保险集团股份有限公司 | BMI detection method and device and electronic equipment |
CN111611917A (en) * | 2020-05-20 | 2020-09-01 | 北京华捷艾米科技有限公司 | Model training method, feature point detection device, feature point detection equipment and storage medium |
CN112417947A (en) * | 2020-09-17 | 2021-02-26 | 重庆紫光华山智安科技有限公司 | Method and device for optimizing key point detection model and detecting face key points |
CN112568992A (en) * | 2020-12-04 | 2021-03-30 | 上海交通大学医学院附属第九人民医院 | Eyelid parameter measuring method, device, equipment and medium based on 3D scanning |
CN113963183A (en) * | 2021-12-22 | 2022-01-21 | 北京的卢深视科技有限公司 | Model training method, face recognition method, electronic device and storage medium |
CN114821098A (en) * | 2022-04-07 | 2022-07-29 | 浙大城市学院 | High-speed pavement damage detection algorithm based on gray gradient fusion characteristics and CNN |
CN115761411A (en) * | 2022-11-24 | 2023-03-07 | 北京的卢铭视科技有限公司 | Model training method, living body detection method, electronic device, and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016026063A1 (en) * | 2014-08-21 | 2016-02-25 | Xiaoou Tang | A method and a system for facial landmark detection based on multi-task |
CN105469041A (en) * | 2015-11-19 | 2016-04-06 | 上海交通大学 | Facial point detection system based on multi-task regularization and layer-by-layer supervision neural networ |
-
2017
- 2017-03-09 CN CN201710138179.0A patent/CN106951840A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016026063A1 (en) * | 2014-08-21 | 2016-02-25 | Xiaoou Tang | A method and a system for facial landmark detection based on multi-task |
CN105469041A (en) * | 2015-11-19 | 2016-04-06 | 上海交通大学 | Facial point detection system based on multi-task regularization and layer-by-layer supervision neural networ |
Non-Patent Citations (4)
Title |
---|
WANLI OUYANG ET AL.: "Joint deep Learning for Pedestrian Detection", 《ICCV 2013》 * |
XIANGXIN ZHU ET AL.: "Face Detection, Pose Estimation, and Landmark Localization in the Wild", 《COMPUTER VISION & PATTERN RECOGNITION》 * |
ZHANPENG ZHANG ET AL.: "Facial Landmark Detection by Deep Multi-task Learning", 《EUROPEAN CONFERENCE ON COMPUTER VISION》 * |
CUI SHAOQI: "Research on Face Pose Estimation Algorithms", 《China Excellent Master's Theses Full-Text Database, Information Science and Technology》 *
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107527018A (en) * | 2017-07-26 | 2017-12-29 | 湖州师范学院 | Momentum Face Detection Method Based on BP Neural Network |
CN107463903B (en) * | 2017-08-08 | 2020-09-04 | 北京小米移动软件有限公司 | Face key point positioning method and device |
CN107463903A (en) * | 2017-08-08 | 2017-12-12 | 北京小米移动软件有限公司 | Face key independent positioning method and device |
CN107704813A (en) * | 2017-09-19 | 2018-02-16 | 北京飞搜科技有限公司 | A kind of face vivo identification method and system |
CN107704813B (en) * | 2017-09-19 | 2020-11-17 | 北京一维大成科技有限公司 | Face living body identification method and system |
CN107808129A (en) * | 2017-10-17 | 2018-03-16 | 南京理工大学 | A kind of facial multi-characteristic points localization method based on single convolutional neural networks |
CN107808129B (en) * | 2017-10-17 | 2021-04-16 | 南京理工大学 | A facial multi-feature point localization method based on a single convolutional neural network |
CN109934058A (en) * | 2017-12-15 | 2019-06-25 | 北京市商汤科技开发有限公司 | Face image processing process, device, electronic equipment, storage medium and program |
CN108269250A (en) * | 2017-12-27 | 2018-07-10 | 武汉烽火众智数字技术有限责任公司 | Method and apparatus based on convolutional neural networks assessment quality of human face image |
CN110047101A (en) * | 2018-01-15 | 2019-07-23 | 北京三星通信技术研究有限公司 | Gestures of object estimation method, the method for obtaining dense depth image, related device |
CN110060296A (en) * | 2018-01-18 | 2019-07-26 | 北京三星通信技术研究有限公司 | Estimate method, electronic equipment and the method and apparatus for showing virtual objects of posture |
CN108764248A (en) * | 2018-04-18 | 2018-11-06 | 广州视源电子科技股份有限公司 | Image feature point extraction method and device |
CN108596093A (en) * | 2018-04-24 | 2018-09-28 | 北京市商汤科技开发有限公司 | The localization method and device of human face characteristic point |
US11314965B2 (en) | 2018-04-24 | 2022-04-26 | Beijing Sensetime Technology Development Co., Ltd. | Method and apparatus for positioning face feature points |
WO2019205605A1 (en) * | 2018-04-24 | 2019-10-31 | 北京市商汤科技开发有限公司 | Facial feature point location method and device |
CN108596093B (en) * | 2018-04-24 | 2021-12-03 | 北京市商汤科技开发有限公司 | Method and device for positioning human face characteristic points |
CN108734139B (en) * | 2018-05-24 | 2021-12-14 | 辽宁工程技术大学 | Correlation Filter Tracking Method Based on Feature Fusion and SVD Adaptive Model Update |
CN108734139A (en) * | 2018-05-24 | 2018-11-02 | 辽宁工程技术大学 | Feature based merges and the newer correlation filtering tracking of SVD adaptive models |
CN110827394B (en) * | 2018-08-10 | 2024-04-02 | 宏达国际电子股份有限公司 | Facial expression construction method, device and non-transitory computer readable recording medium |
CN110827394A (en) * | 2018-08-10 | 2020-02-21 | 宏达国际电子股份有限公司 | Facial expression construction method and device and non-transitory computer readable recording medium |
WO2020037898A1 (en) * | 2018-08-23 | 2020-02-27 | 平安科技(深圳)有限公司 | Face feature point detection method and apparatus, computer device, and storage medium |
CN109359541A (en) * | 2018-09-17 | 2019-02-19 | 南京邮电大学 | A sketch face recognition method based on deep transfer learning |
WO2020093884A1 (en) * | 2018-11-08 | 2020-05-14 | 北京灵汐科技有限公司 | Attribute detection method and device |
CN109753910A (en) * | 2018-12-27 | 2019-05-14 | 北京字节跳动网络技术有限公司 | Crucial point extracting method, the training method of model, device, medium and equipment |
CN109753910B (en) * | 2018-12-27 | 2020-02-21 | 北京字节跳动网络技术有限公司 | Key point extraction method, model training method, device, medium and equipment |
CN109766866A (en) * | 2019-01-22 | 2019-05-17 | 杭州美戴科技有限公司 | A kind of human face characteristic point real-time detection method and detection system based on three-dimensional reconstruction |
CN109766866B (en) * | 2019-01-22 | 2020-09-18 | 杭州美戴科技有限公司 | Face characteristic point real-time detection method and detection system based on three-dimensional reconstruction |
CN110033505A (en) * | 2019-04-16 | 2019-07-19 | 西安电子科技大学 | A kind of human action capture based on deep learning and virtual animation producing method |
CN110059707A (en) * | 2019-04-25 | 2019-07-26 | 北京小米移动软件有限公司 | Optimization method, device and the equipment of image characteristic point |
CN111507244A (en) * | 2020-04-15 | 2020-08-07 | 阳光保险集团股份有限公司 | BMI detection method and device and electronic equipment |
CN111507244B (en) * | 2020-04-15 | 2023-12-08 | 阳光保险集团股份有限公司 | BMI detection method and device and electronic equipment |
CN111611917A (en) * | 2020-05-20 | 2020-09-01 | 北京华捷艾米科技有限公司 | Model training method, feature point detection device, feature point detection equipment and storage medium |
CN112417947B (en) * | 2020-09-17 | 2021-10-26 | 重庆紫光华山智安科技有限公司 | Method and device for optimizing key point detection model and detecting face key points |
CN112417947A (en) * | 2020-09-17 | 2021-02-26 | 重庆紫光华山智安科技有限公司 | Method and device for optimizing key point detection model and detecting face key points |
CN112568992A (en) * | 2020-12-04 | 2021-03-30 | 上海交通大学医学院附属第九人民医院 | Eyelid parameter measuring method, device, equipment and medium based on 3D scanning |
CN113963183A (en) * | 2021-12-22 | 2022-01-21 | 北京的卢深视科技有限公司 | Model training method, face recognition method, electronic device and storage medium |
CN114821098A (en) * | 2022-04-07 | 2022-07-29 | 浙大城市学院 | High-speed pavement damage detection algorithm based on gray gradient fusion characteristics and CNN |
CN115761411A (en) * | 2022-11-24 | 2023-03-07 | 北京的卢铭视科技有限公司 | Model training method, living body detection method, electronic device, and storage medium |
CN115761411B (en) * | 2022-11-24 | 2023-09-01 | 北京的卢铭视科技有限公司 | Model training method, living body detection method, electronic device, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106951840A (en) | A Face Feature Point Detection Method | |
CN108345869B (en) | Driver gesture recognition method based on depth image and virtual data | |
US10380413B2 (en) | System and method for pose-invariant face alignment | |
CN101819628B (en) | Method for performing face recognition by combining rarefaction of shape characteristic | |
CN103824089B (en) | Cascade regression-based face 3D pose recognition method | |
CN111046734B (en) | Multi-modal fusion sight line estimation method based on expansion convolution | |
CN107358648A (en) | Real-time full-automatic high quality three-dimensional facial reconstruction method based on individual facial image | |
CN105868716B (en) | A kind of face identification method based on facial geometric feature | |
CN110543846A (en) | A method of frontalizing multi-pose face images based on generative adversarial networks | |
CN112800903A (en) | Dynamic expression recognition method and system based on space-time diagram convolutional neural network | |
CN108491835A (en) | Binary channels convolutional neural networks towards human facial expression recognition | |
CN101833672B (en) | Sparse Representation Face Recognition Method Based on Constrained Sampling and Shape Features | |
CN113326735B (en) | YOLOv 5-based multi-mode small target detection method | |
CN109859305A (en) | Three-dimensional face modeling, recognition methods and device based on multi-angle two-dimension human face | |
CN111680550B (en) | Emotion information identification method and device, storage medium and computer equipment | |
CN104850825A (en) | Facial image face score calculating method based on convolutional neural network | |
JP2015522200A (en) | Human face feature point positioning method, apparatus, and storage medium | |
CN106778563A (en) | A kind of quick any attitude facial expression recognizing method based on the coherent feature in space | |
WO2022184133A1 (en) | Vision-based facial expression recognition method | |
CN104700076A (en) | Face image virtual sample generating method | |
CN106874850A (en) | One kind is based on three-dimensional face point cloud characteristic point positioning method | |
CN103093237B (en) | A kind of method for detecting human face of structure based model | |
CN110110603A (en) | A kind of multi-modal labiomaney method based on facial physiologic information | |
CN106407958A (en) | Double-layer-cascade-based facial feature detection method | |
Thalhammer et al. | SyDPose: Object detection and pose estimation in cluttered real-world depth images trained using only synthetic data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170714 |