CN106355138A

CN106355138A - Face recognition method based on deep learning and key features extraction

Info

Publication number: CN106355138A
Application number: CN201610682083.6A
Authority: CN
Inventors: 高建彬; 刘婧月
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2016-08-18
Filing date: 2016-08-18
Publication date: 2017-01-25

Abstract

The invention discloses a novel method of detecting and recognizing faces in videos based on deep learning, which comprises the following steps: detecting the face images in the video and accurately extracting the facial image from the background to eliminate the interference of non-facial areas to the detection result; selecting from the face such an area less affected by visual interference conditions and extracting the local feature of a key point within such area; extracting the top facial feature from the selected area via the convolutional neural network to serve along with the local feature as the features that express the face, and reducing the dimensionality of and normalizing such features; measuring the similarities between the features of two facial images in facial pairs, and assessing one by one to select the pair of images that are most identical, thus the recognition result is obtained.The invention has good recognition effect under unconfined conditions and is suitable for application in settings with complicated interference such as postures, expressions, light and shelter.

Description

Face recognition method based on deep learning and key point feature extraction

技术领域technical field

本发明属于计算机视觉研究领域，涉及视频图像处理和机器学习方法，特别是涉及基于深度学习和关键点特征提取的人脸识别方法。The invention belongs to the field of computer vision research and relates to video image processing and machine learning methods, in particular to a face recognition method based on deep learning and key point feature extraction.

背景技术Background technique

随着监控摄像头的广泛应用，人脸识别系统的市场需求也在逐渐扩大。然而，在这些应用中被监控人群大多处于非约束状态，当前的人脸识别产品和人脸识别系统都需要对检测到的人脸具有一定的限定或要求。这些限定条件已成为人脸识别技术推广和应用的主要障碍。存在这些限定条件是因为：在非可控条件下，复杂干扰因素将导致人脸识别精度急剧下降，不能满足应用需求。在非可控条件下，不仅可能存在强光变化、大范围的姿态变化、夸张的表情变化、有意或无意的遮挡、图像分辨率偏低等严重的干扰因素，而且这些因素可能随机组合地出现在视频人脸图像中。这些复杂干扰将导致同一个人的人脸面像呈现巨大差异。这导致在非可控条件下准确地识别人脸非常困难。因此，非限定人脸识别依然是一个非常困难的问题；其识别精度也远不能满足实际应用需求。With the widespread application of surveillance cameras, the market demand for face recognition systems is gradually expanding. However, in these applications, the monitored population is mostly in an unconstrained state, and the current face recognition products and face recognition systems need to have certain restrictions or requirements on the detected faces. These restrictions have become the main obstacles to the promotion and application of face recognition technology. These restrictions exist because: under uncontrollable conditions, complex interference factors will lead to a sharp drop in the accuracy of face recognition, which cannot meet the application requirements. Under uncontrollable conditions, not only may there be serious interference factors such as strong light changes, large-scale posture changes, exaggerated expression changes, intentional or unintentional occlusion, and low image resolution, but these factors may appear in random combinations in video face images. These complex interferences will lead to huge differences in the facial images of the same person. This makes it very difficult to accurately recognize faces under uncontrolled conditions. Therefore, unrestricted face recognition is still a very difficult problem; its recognition accuracy is far from meeting the needs of practical applications.

然而，在非限定人脸识别中，大多数当前工作都着重于减少单张人脸图像包含的干扰，而忽略了“视觉干扰条件差异”这种现象。例如，一些研究者采用图像预处理技术消除图像包含的干扰，如光照规范化技术、图像降噪技术、姿态评估技术等。另外一些研究者通过提取鲁棒的人脸特征表达减少复杂干扰的影响。当一张人脸图像包含强烈干扰时，当前的预处理技术和特征提取技术都无法有效地消除或抑制它们。相应地，当一对人脸图像视觉干扰条件不同时，当前人脸验证技术很难准确地识别它们。However, in unrestricted face recognition, most current work focuses on reducing the noise contained in a single face image, while ignoring the phenomenon of "visual noise condition difference". For example, some researchers use image preprocessing technology to eliminate the interference contained in the image, such as lighting normalization technology, image noise reduction technology, pose evaluation technology, etc. Other researchers reduce the impact of complex interference by extracting robust facial feature representations. When a face image contains strong disturbances, none of the current preprocessing techniques and feature extraction techniques can effectively remove or suppress them. Correspondingly, when a pair of face images have different visual interference conditions, it is difficult for current face verification techniques to identify them accurately.

对于一对人脸图像，当它们视觉干扰条件不同时，选择具有相似干扰条件的局部区域图像对验证该对人脸，可以有效地减少局部强烈干扰的影响。并且，许多研究已表明，仅利用单个人脸部件判别也能得到令人满意的识别率。基于上述发现，如果预定义多个人脸区域，对于输入的待识别人脸图像对，通过评估它们是否存在视觉干扰条件差异，可以选择更可靠的人脸区域图像对。这将有效地减少局部强烈干扰对人脸验证过程的影响。For a pair of face images, when their visual interference conditions are different, selecting a local area image pair with similar interference conditions to verify the pair of faces can effectively reduce the impact of local strong interference. Moreover, many studies have shown that a satisfactory recognition rate can also be obtained by using only a single face part discrimination. Based on the above findings, if multiple face regions are predefined, for the input face image pairs to be recognized, by evaluating whether they have differences in visual interference conditions, a more reliable face region image pair can be selected. This will effectively reduce the impact of local strong interference on the face verification process.

近年来，国内外学者开始把深度学习方法应用在图像识别问题上，并取得了优异的效果。深度学习算法的一个重要的特性就是对训练样本集规模要求比较大，在视频人脸的识别过程，通过一个系统机构可以轻松的生成数万、数十万的样本数据，因此将视频人脸识别与深度学习算法相结合，可以有效的解决深度学习算法样本集规模不足的问题。同时，由于深度卷积神经网络能够提取对光照、表情、姿势等因素鲁棒且互补的特征，通过它来构建图像的本质特征，可以大大的提高非限定条件下视频人脸识别系统结果的准确性。In recent years, scholars at home and abroad have begun to apply deep learning methods to image recognition problems, and have achieved excellent results. An important characteristic of the deep learning algorithm is that the training sample set is relatively large. In the video face recognition process, tens of thousands or hundreds of thousands of sample data can be easily generated through a system mechanism. Therefore, video face recognition Combining with deep learning algorithms, it can effectively solve the problem of insufficient sample set size of deep learning algorithms. At the same time, since the deep convolutional neural network can extract features that are robust and complementary to factors such as illumination, expression, and posture, it can be used to construct the essential features of the image, which can greatly improve the accuracy of the results of the video face recognition system under unrestricted conditions. sex.

发明内容Contents of the invention

针对上述现有技术，本发明目的在于提供基于深度学习和关键点特征提取的人脸识别方法，解决现有技术其深度学习算法样本集规模不足，在非限定条件下视频人脸识别系统抗干扰能力弱、识别精度低且鲁棒性差等技术问题。In view of the above-mentioned prior art, the purpose of the present invention is to provide a face recognition method based on deep learning and key point feature extraction, to solve the problem of insufficient sample set size of its deep learning algorithm in the prior art, and to prevent interference of the video face recognition system under unrestricted conditions Technical problems such as weak ability, low recognition accuracy and poor robustness.

为达到上述目的，本发明采用的技术方案如下：In order to achieve the above object, the technical scheme adopted in the present invention is as follows:

基于深度学习和关键点特征提取的人脸识别方法，包括以下步骤，A face recognition method based on deep learning and key point feature extraction, comprising the following steps,

步骤1、获取视频图像，提取视频图像中的类哈尔特征；Step 1, obtain the video image, and extract the Haar-like feature in the video image;

步骤2、根据类哈尔特征，构建级联的强分类器，再利用强分类器检测出视频图像中的人眼区域图像；Step 2. Construct a cascaded strong classifier according to the Haar-like feature, and then use the strong classifier to detect the human eye area image in the video image;

步骤3、对人眼区域图像对称设置至少7个关键点，再对其进行区域划分，获得局部图像块；Step 3. Symmetrically set at least 7 key points on the image of the human eye area, and then divide it into regions to obtain local image blocks;

步骤4、获取人脸数据库中设置有相同关键点的预处理局部图像块，将其匹配局部图像块获得对应关键点的图像块对，再利用深度卷积神经网络提取出图像块对的特征向量；Step 4. Obtain the preprocessed local image blocks with the same key points in the face database, match them to the local image blocks to obtain image block pairs corresponding to the key points, and then use the deep convolutional neural network to extract the feature vectors of the image block pairs ;

步骤5、计算特征向量的分类器决策分，判断出分类器决策分最高的图像块对，其中属于人脸数据库中的预处理局部图像块所在人脸图像作为识别的输出结果。Step 5. Calculate the classifier decision score of the feature vector, and determine the image block pair with the highest classifier decision score, wherein the face image belonging to the preprocessed local image block in the face database is used as the output result of the recognition.

上述方法中，所述的步骤2，包括如下步骤，In the above method, the step 2 includes the following steps,

步骤2.1、归一化权重，对应类哈尔特征，计算简单分类器的错误率，选择错误率最低的简单分类器作为弱分类器，并更新权重；Step 2.1, normalize the weight, corresponding to Haar-like features, calculate the error rate of the simple classifier, select the simple classifier with the lowest error rate as the weak classifier, and update the weight;

步骤2.2、根据自适应增强算法，对弱分类器进行步骤2.1循环迭代，得出强分类器，并将所有强分类器级联；Step 2.2. According to the self-adaptive enhancement algorithm, the weak classifier is iterated in step 2.1 to obtain a strong classifier, and all the strong classifiers are cascaded;

步骤2.3、利用级联的强分类器检测出视频图像中人眼区域图像。Step 2.3, using cascaded strong classifiers to detect the image of the human eye area in the video image.

上述方法中，所述的步骤3，其中，对人眼区域图像设置23个关键点。In the above method, in step 3, 23 key points are set for the image of the human eye area.

上述方法中，所述的步骤4，包括如下步骤，In the above method, the step 4 includes the following steps,

步骤4.1、获取人脸数据库中设置有相同关键点的预处理局部图像块，将其匹配局部图像块获得对应关键点的图像块对，计算关键点的梯度并提取图像块对关键点的联合规范化局部特征；Step 4.1, obtain the preprocessed local image blocks with the same key points in the face database, match them to the local image blocks to obtain image block pairs corresponding to the key points, calculate the gradient of the key points and extract the joint normalization of the image blocks to the key points local features;

步骤4.2、预设梯度差异阈值条件，剔除不满足梯度差异阈值条件的梯度所属关键点所在图像块对；Step 4.2, preset the gradient difference threshold condition, and eliminate the image block pairs where the key points of the gradient do not meet the gradient difference threshold condition;

步骤4.3、利用深度卷积神经网络提取出剩余的图像块对的高维度特征向量；Step 4.3, using a deep convolutional neural network to extract high-dimensional feature vectors of the remaining image block pairs;

步骤4.4、将高维度特征向量连接对应关键点的联合规范化局部特征，获得特征向量。Step 4.4. Connect the high-dimensional feature vectors to the joint normalized local features corresponding to key points to obtain feature vectors.

上述方法中，所述的步骤5，包括如下步骤，In the above method, the step 5 includes the following steps,

步骤5.1、计算特征向量的余弦距离向量；Step 5.1, calculate the cosine distance vector of the feature vector;

步骤5.2、根据余弦距离向量，训练并求出分类器决策函数；Step 5.2, according to the cosine distance vector, train and obtain the classifier decision function;

步骤5.3、利用分类器决策函数评估图像块对的相似性，判断出分类器决策分最高的图像块对，其中属于人脸数据库中的预处理局部图像块所在人脸图像作为识别的输出结果。Step 5.3: Use the classifier decision function to evaluate the similarity of the image block pairs, and determine the image block pair with the highest classifier decision score, and the face image belonging to the preprocessed local image block in the face database is used as the output result of the recognition.

与现有技术相比，本发明的有益效果：Compared with prior art, the beneficial effect of the present invention:

(1)本发明能够很好地实现对视频中的人脸图像检测，同时能够减弱非人脸图像对检测结果的干扰；(1) The present invention can well realize the face image detection in the video, and can weaken the interference of the non-face image to the detection result at the same time;

(2)本发明用了及与深度学习的特征提取算法，能够得到抗姿态、表情、光照、遮挡等复杂干扰的、更具区分性的、互补的、低维的人脸特征，该特征能够在很大程度上降低人脸识别的错误率；(2) The present invention uses the feature extraction algorithm of deep learning, which can obtain complex interference such as posture, expression, illumination, occlusion, more distinguishing, complementary, low-dimensional human face features, which can Reduce the error rate of face recognition to a great extent;

(3)本发明能根据视觉条件差异，自适应的选择分类器决策分值，极大地降低非限定条件下人脸识别的错误率。(3) The present invention can adaptively select the decision score of the classifier according to the differences in visual conditions, and greatly reduce the error rate of face recognition under unrestricted conditions.

附图说明Description of drawings

图1本发明提供的的视频人脸检测流程图；The video face detection flowchart that Fig. 1 the present invention provides;

图2本发明提供的人脸区域划分及关键点位置示意图；Fig. 2 is a schematic diagram of the division of human face regions and the positions of key points provided by the present invention;

图3本发明的卷积神经网络结构示意图。Fig. 3 is a schematic diagram of the convolutional neural network structure of the present invention.

具体实施方式detailed description

本说明书中公开的所有特征，或公开的所有方法或过程中的步骤，除了互相排斥的特征和/或步骤以外，均可以以任何方式组合。All features disclosed in this specification, or steps in all methods or processes disclosed, may be combined in any manner, except for mutually exclusive features and/or steps.

下面结合附图对本发明做进一步说明：The present invention will be further described below in conjunction with accompanying drawing:

实施例1Example 1

过程1，基于Adaboost自适应增强算法的人脸检测Process 1, face detection based on Adaboost adaptive enhancement algorithm

(1.1)由Adaboost算法训练产生弱分类器的优化组合的具体实现如下，人脸检测流程图参照图1：(1.1) The specific implementation of the optimized combination of weak classifiers generated by Adaboost algorithm training is as follows. The flow chart of face detection is shown in Figure 1:

给定n个训练样本：(x₁,y₁),(x₂,y₂),……(x_n,y_n)其中y_i＝0,1,分别表示人脸区域和非人脸区域，x_i表示24×24的子窗口。Given n training samples: (x ₁ , y ₁ ), (x ₂ , y ₂ ),...(x _n , y _n ) where y _i = 0, 1, representing face area and non-face area respectively , _xi represents a 24×24 sub-window.

初始化样本权值为：The initial sample weights are:

其中m和l分别是负样本和正样本的个数。where m and l are the number of negative samples and positive samples, respectively.

将Haar-Like类哈尔特征转换为弱分类器，弱分类器的定义公式如下：Convert the Haar-Like Haar-like feature into a weak classifier. The definition formula of the weak classifier is as follows:

其中：h_j(x)表示简单分类器的值；θ_j为阈值；p_j表示不等号的方向，只能取±1；f_j(x)表示从大小归一化到24×24的图像块上，提取到的Haar-Like特征值。Among them: h _j (x) represents the value of the simple classifier; θ _j is the threshold; p _j represents the direction of the inequality sign, which can only be ±1; f _j (x) represents the image block normalized from the size to 24×24 , the extracted Haar-Like eigenvalues.

选择T个最优的弱分类器，每个分类器选择方法如下：Select T optimal weak classifiers, each classifier selection method is as follows:

1.归一化权重：1. Normalized weights:

2.每一个特征j训练一个分类器，并计算该分类器的错误率：2. Train a classifier for each feature j, and calculate the error rate of the classifier:

3.选择错误率最低的分类器h_t作为一个弱分类器3. Select the classifier h _t with the lowest error rate as a weak classifier

4.更新权重：4. Update weights:

当样本x_i被正确分类时，e_i＝0，否则e_i＝1,重复上述4步，直到e_i＝0。When the sample x _i is correctly classified, e _i =0, otherwise e _i =1, Repeat the above four steps until e _i =0.

最终把选好的T个弱分类器组合成强分类器：Finally, the selected T weak classifiers are combined into a strong classifier:

其中，级联训练好的若干强分类器，形成级联分类器，用于人脸检测。in, Cascade trained several strong classifiers to form a cascade classifier for face detection.

(1.2)用训练好的级联分类器对视频中的人脸进行检测，选定若干可能为人脸的区域。本发明为了提高人脸检测的准确性，对上述可能的人脸区域，用Haar-Like特征和Adaboost算法，对人眼区域进行检测。若检测到的人脸内部有眼睛区域，则表示该人脸区域是真人脸区域，否则表示该人脸区域是假人脸区域，丢弃该检测结果。(1.2) Use the trained cascade classifier to detect the faces in the video, and select several areas that may be faces. In order to improve the accuracy of human face detection, the present invention uses Haar-Like feature and Adaboost algorithm to detect human eye area for the above possible human face area. If there is an eye area inside the detected face, it means that the face area is a real face area, otherwise it means that the face area is a fake face area, and the detection result is discarded.

过程2，基于异常检测的人脸区域选择方法Process 2, face area selection method based on anomaly detection

把人脸图像分成10个区域R_i,i＝1,…,10，定义人脸图像上的23个关键点，区域划分及关键点位置参见图2。Divide the face image into 10 regions R _i , i=1,...,10, define 23 key points on the face image, see Figure 2 for the region division and key point positions.

(2.1)将人脸数据库中的每个人的标准人脸图像I_a,j,j＝1,…,n分成10个区域，记为R_ai,i＝1,…,10，其中n为数据库中的人数。定位出每个区域内相应的23个关键点，提取每个区域内这些关键点的局部特征。即对数据库中的每个人脸图像，均得到10个区域中的关键点局部特征向量H_a,i，其中i表示每个区域内关键点数目。(2.1) Divide the standard face image I _a,j ,j=1,...,n of each person in the face database into 10 regions, which are recorded as R _ai ,i=1,...,10, where n is the database the number of people in. The corresponding 23 key points in each area are located, and the local features of these key points in each area are extracted. That is, for each face image in the database, local feature vectors H _a,i of key points in 10 regions are obtained, where i represents the number of key points in each region.

(2.2)将检测到的视频人脸图像分成10个区域，记为R_bi,i＝1,…,10。定位出每个区域内相应的23个关键点，提取每个区域内这些的关键点的局部特征H_b,i，其中i表示每个区域内关键点数目。(2.2) Divide the detected video face image into 10 regions, denoted as R _bi , i=1,...,10. The corresponding 23 key points in each area are located, and the local features H _b,i of these key points in each area are extracted, where i represents the number of key points in each area.

(2.3)视频人脸图像和每个数据库中的人脸图像组成人脸对，可得n个人脸对(I_a,i,I_b)。对给定人脸区域R_i上的一对(I_a,i,I_b)，分别得到I_a,i和I_b在P个关键点上的局部特征H_a,i和H_b,i,i＝1,…,p。(2.3) Video face images and face images in each database form face pairs, and n face pairs (I _a,i ,I _b ) can be obtained. For a pair (I _a,i ,I _b ) on a given face area R _i , get the local features H _a,i and H _b,i of I _a,i and I _b on P key points, respectively, i=1,...,p.

联合规范化局部特征提取流程如下：The joint normalization local feature extraction process is as follows:

记以关键点P为中心的局部邻域为N，其包括左上、右上、左下、右下四个块，分别记为B1、B2、B3、B4，每个块包含四个更小的单元块。记N_0.5为将图像缩小一倍之后P点的局部邻域，其与N具有相似的块和单元块。Note that the local neighborhood centered on the key point P is N, which includes four blocks in the upper left, upper right, lower left, and lower right, respectively marked as B1, B2, B3, and B4, and each block contains four smaller unit blocks . Note that N _0.5 is the local neighborhood of point P after the image is doubled, which has similar blocks and unit blocks to N.

计算M(x,y)、θ(x,y)，(x,y)∈N，其中M(x,y)、θ(x,y)分别表示点(x,y)上的梯度幅值和梯度方向，θ(x,y)∈[0,π]；Calculate M(x,y), θ(x,y), (x,y)∈N, where M(x,y), θ(x,y) represent the gradient magnitude on the point (x,y) respectively and gradient direction, θ(x,y)∈[0,π];

之后针对N内每个单元，以幅值M(x,y)为权重执行梯度方向直方图投影，产生16个8位的直方图H_i,i＝1,…,16，每个块内的四个单元执行L2规范化；Afterwards, for each unit in N, the gradient orientation histogram projection is performed with the amplitude M(x, y) as the weight, and 16 8-bit histograms H _i , i=1,...,16 are generated, and the Four units perform L2 normalization;

在N的B1、B2、B3、B4上，分别以幅值M(x,y)为权重执行梯度方向直方图投影，产生4个8位直方图，记作H_i,i＝17,…,20，它们一起执行L2规范化；On B1, B2, B3, and B4 of N, respectively perform gradient direction histogram projection with the amplitude M(x, y) as the weight, and generate four 8-bit histograms, which are recorded as H _i , i=17,..., 20, they perform L2 normalization together;

计算M(x,y)、θ(x,y)，(x,y)∈N_0.5，在N_0.5上执行相同的操作，产生4个8位直方图记作H_i,i＝21,…,24，它们一起执行L2规范化；Calculate M(x,y), θ(x,y), (x,y)∈N _0.5 , perform the same operation on N _0.5 , and generate 4 8-bit histograms marked as H _i ,i=21,… ,24, which together perform L2 normalization;

在N_0.5上以幅值M(x,y)为权重执行梯度方向直方图投影，产生1个8位的直方图，执行L2规范化，记作H₂₅；联合规范化的H_i,i＝1,…,25，形成P点上200维的局部特征H。Perform gradient direction histogram projection on N _0.5 with the amplitude M(x, y) as the weight to generate an 8-bit histogram, perform L2 normalization, denoted as H ₂₅ ; jointly normalized H _i , i=1, …, 25, forming a 200-dimensional local feature H on point P.

(2.4)对所有图像对(I_a,i,I_b)，执行翻转操作得到r_j和表示他们的可靠性估计，其中j＝1,…,10。(2.4) For all image pairs (I _a,i ,I _b ), perform the flip operation to get r _j and Denote their reliability estimates, where j=1,...,10.

每个区域R_i,i＝1,…,10上的可靠性r为：The reliability r on each region R _i , i=1,...,10 is:

其中，p`为满足e_k<1的关键点个数，e_k,k＝1,...,p表示(I_a,i,I_b)在p个关键点的异常差异检测结果，定义如下：e_k＝e_μe_ν Among them, p` is the number of key points satisfying e _k <1, e _k ,k=1,...,p represents the abnormal difference detection results of (I _a,i ,I _b ) at p key points, defined As follows: e _k = e _μ e _ν

其中，in,

k_μ和k_ν为控制参数；d_c和d_χ是余弦距离和χ²距离，表示关键点P的梯度差异。k _μ and k _ν are control parameters; d _c and d _χ are the cosine distance and χ ² distance, which represent the gradient difference of the key point P.

(2.5)对于所有图像对，用如下区域选择方法选择每对图像对中受视觉干扰条件较小的区域(梯度差异阈值条件)：(2.5) For all image pairs, use the following region selection method to select regions with less visual interference conditions in each pair of image pairs (gradient difference threshold condition):

1.如果则 1. If but

2.如果r₁＝1，输出R₁的全脸图像对(I_a,i,1,I_b,1)；2. If r ₁ =1, output the full-face image pair (I _a,i,1 ,I _b,1 ) of R ₁ ;

3.如果r₁<1，但存在r_j＝1，则输出满足r_j＝1的图像对；3. If r ₁ <1, but there is r _j =1, then output the image pair satisfying r _j =1;

4.如果所有r_j<1，则输出r_j最大的5区域个图像对。4. If all r _j < 1, then output the 5 region image pairs with the largest r _j .

过程3，基于深度卷积神经网络的高层人脸特征表达Process 3, high-level face feature expression based on deep convolutional neural network

(3.1)对于每对图像对(I_a,i,I_b)，已选择出受视觉干扰条件较小的区域。把每个图像对中选出的区域缩放至31×39大小，分别输入到卷积神经网络中，输出层得到160维的特征向量，卷积神经网络结构示意参见图3。根据不同的图像对，选出不同数目的160维向量。把这些向量记为F_a＝{F_a1,F_a2,…,F_am}和F_b＝{F_b1,F_b2,…,F_bm}，m为图像对中选择的区域个数，把这m个特征作为描述该图像对的高层人脸特征。(3.1) For each pair of images (I _a,i ,I _b ), regions with less visual disturbance conditions have been selected. The selected area in each image pair is scaled to 31×39, and input to the convolutional neural network respectively, and the output layer obtains a 160-dimensional feature vector. The structure of the convolutional neural network is shown in Figure 3. According to different image pairs, different numbers of 160-dimensional vectors are selected. Denote these vectors as F _a ={F _a1 ,F _a2 ,...,F _am } and F _b ={F _b1 ,F _b2 ,...,F _bm }, m is the number of regions selected in the image pair, take this The m features are used as high-level face features describing the image pair.

(3.2)再从人脸图像对中选出的若干区域上，把该区域包含的所有关键点的局部特征表示为和作为n个局部特征向量，n位该区域内关键点数目。对每个人脸图像对(I_a,i,I_b)均进行此操作。(3.2) On several areas selected from the face image pair, the local features of all the key points contained in the area are expressed as and As n local feature vectors, n bits are the number of key points in the area. This operation is performed for each face image pair (I _a,i ,I _b ).

(3.3)用PCA算法对F和中的各个特征向量进行降维，保留他们90％的信息。并用min-max技术分别对F和中的各个特征向量进行规范化，使他们的特征值保持在[0,1]范围内。(3.3) Use PCA algorithm for F and Dimensionality reduction is performed on each feature vector in , and 90% of their information is retained. And use the min-max technique for F and Each eigenvector in is normalized to keep their eigenvalues in the [0,1] range.

过程4，基于距离度量学习评估人脸相似性Process 4. Evaluating face similarity based on distance metric learning

(4.1)对每个图像对，用如下公式分别计算F和中的各个特征向量的余弦距离：(4.1) For each image pair, calculate F and The cosine distances of the individual eigenvectors in :

得到多个多距离向量，规范化之后记作X_i。A number of multi-distance vectors are obtained, which are denoted as Xi after _{normalization} .

(4.2)计算X_i的分类器决策分s，定义如下(4.2) Calculate the classifier decision score _s of Xi, which is defined as follows

其中φ^b(x)、φ^c(x)和φ^g(x)分别表示三种不同视觉一致性条件下的分类器决策函数。τ₁和τ₂界定如何根据e选择合适的分类器，e表示VCM评估结果。本发明中，他们从训练φ^b(x)、φ^c(x)和φ^g(x)的训练集上获取。where φ ^b (x), φ ^c (x) and φ ^g (x) denote the decision functions of the classifier under three different visual consistency conditions, respectively. τ ₁ and τ ₂ define how to select a suitable classifier according to e, and e represents the VCM evaluation result. In the present invention, they are obtained from the training set for training φ ^b (x), φ ^c (x) and φ ^g (x).

根据上述过程，利用选定的人脸对区域R_j，分别计算X_j，可得该图像对的分类器决策分s_j＝ψ_j(X_j,e_j)。According to the above process, using the selected face pair region R _j to calculate X _j respectively, the classifier decision score s _j =ψ _j (X _j , e _j ) of the image pair can be obtained.

(4.3)把视频图像中的人脸与人脸数据库中的人脸逐一进行相似性评估，选择其中s_j＝ψ_j(X_j,e_j)最高的一对，把数据集中相应的第i幅人脸图像I_a,i所对应的身份作为识别的结果输出。(4.3) Evaluate the similarity between the faces in the video image and the faces in the face database one by one, select the pair with the highest s _j =ψ _j (X _j , e _j ), and put the corresponding i-th face in the data set The identities corresponding to the face images I _{a, i} are output as the recognition results.

以上所述，仅为本发明的具体实施方式，但本发明的保护范围并不局限于此，任何属于本技术领域的技术人员在本发明揭露的技术范围内，可轻易想到的变化或替换，都应涵盖在本发明的保护范围之内。The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto, any changes or substitutions that can be easily imagined by those skilled in the art within the technical scope disclosed in the present invention, All should be covered within the protection scope of the present invention.

Claims

1. A face recognition method based on deep learning and key point feature extraction is characterized by comprising the following steps,

step 1, acquiring a video image, and extracting a haar-like feature in the video image;

step 2, constructing a cascade strong classifier according to the haar-like characteristics, and detecting a human eye region image in the video image by using the strong classifier;

step 3, symmetrically setting at least 7 key points on the eye region image, and then carrying out region division on the eye region image to obtain a local image block;

step 4, acquiring preprocessed local image blocks with the same key points in a face database, matching the preprocessed local image blocks with the preprocessed local image blocks to obtain image block pairs corresponding to the key points, and extracting feature vectors of the image block pairs by using a deep convolution neural network;

and 5, calculating a classifier decision score of the feature vector, and judging an image block pair with the highest classifier decision score, wherein a face image where the preprocessed local image block belongs to the face database is used as an identification output result.

2. The face recognition method based on deep learning and key point feature extraction as claimed in claim 1, wherein the step 2 comprises the following steps,

step 2.1, normalizing the weight, calculating the error rate of the simple classifiers corresponding to the class haar characteristics, selecting the simple classifier with the lowest error rate as a weak classifier, and updating the weight;

step 2.2, according to the self-adaptive enhancement algorithm, carrying out step 2.1 loop iteration on the weak classifiers to obtain strong classifiers, and cascading all the strong classifiers;

and 2.3, detecting the human eye region image in the video image by using the cascaded strong classifiers.

3. The method for recognizing the human face based on the deep learning and the key point feature extraction as claimed in claim 1, wherein in the step 3, 23 key points are set for the human eye region image.

4. The face recognition method based on deep learning and key point feature extraction as claimed in claim 3, wherein the step 4 comprises the following steps,

step 4.1, acquiring preprocessed local image blocks with the same key points in a face database, matching the preprocessed local image blocks with the preprocessed local image blocks to obtain image block pairs corresponding to the key points, calculating the gradient of the key points and extracting the joint normalized local features of the key points of the image block pairs;

step 4.2, presetting a gradient difference threshold condition, and eliminating image block pairs where key points of gradients which do not meet the gradient difference threshold condition belong;

4.3, extracting high-dimensional feature vectors of the remaining image block pairs by using a deep convolutional neural network;

and 4.4, connecting the high-dimensional feature vector with the joint normalized local features of the corresponding key points to obtain the feature vector.

5. The face recognition method based on deep learning and key point feature extraction as claimed in claim 1, wherein the step 5 comprises the following steps,

step 5.1, calculating cosine distance vectors of the feature vectors;

step 5.2, training and solving a decision function of the classifier according to the cosine distance vector;

and 5.3, evaluating the similarity of the image block pairs by utilizing a classifier decision function, and judging the image block pair with the highest classifier decision score, wherein the face image where the preprocessed local image block belongs to the face database is used as an identification output result.