CN105550634B

CN105550634B - Face posture recognition method based on Gabor characteristics and dictionary learning

Info

Publication number: CN105550634B
Application number: CN201510796987.7A
Authority: CN
Inventors: 陈友斌; 廖海斌
Original assignee: Guangdong Micropattern Software Co ltd
Current assignee: Guangdong Micropattern Software Co ltd
Priority date: 2015-11-18
Filing date: 2015-11-18
Publication date: 2019-05-03
Anticipated expiration: 2035-11-18
Also published as: CN105550634A

Abstract

The invention discloses a face gesture recognition method based on Gabor characteristics and dictionary learning, which comprises the following steps of: firstly, discretizing the human face posture into different subspaces, and training a sub dictionary for each subspace by using K-SVD (K-singular value decomposition) to enable the sub dictionary to correspond to a category; then, all the sub-dictionaries are combined into an ultra-complete dictionary; and finally, carrying out attitude classification by adopting a method based on gabor characteristics and sparse representation. In order to improve the robustness of the algorithm, the invention reconstructs an occlusion face dictionary and solves the problem of face occlusion in face gesture recognition. The method can solve the problems of illumination, noise, shielding and the like in the face posture estimation, and quickly and robustly identify the front face, the head raising, the nodding, the left deflection, the left side face, the right deflection and the right side face. The method can be well applied to the fields of safe driving, man-machine interaction, face recognition and the like.

Description

Face pose recognition method based on Gabor feature and dictionary learning

技术领域technical field

本发明属于图像处理、模式识别、计算机视觉和人机交互技术领域，涉及一种人脸姿态识别方法，具体涉及一种基于字典学习与稀疏表示的人脸姿态识别方法。The invention belongs to the technical fields of image processing, pattern recognition, computer vision and human-computer interaction, and relates to a face gesture recognition method, in particular to a face gesture recognition method based on dictionary learning and sparse representation.

背景技术Background technique

人脸姿态估计在智能视频监控、人脸识别、人机交互和虚拟现实领域具有巨大的应用前景。例如，在智能视频监控方面，人脸姿态估计可以应用于驾驶监控系统，通过监控司机的人脸姿态变化来识别司机是否集中注意力开车，避免撞车情况的发生。此外，人脸姿态估计对人脸识别的准确度有很大的影响，许多人脸识别算法对正面人脸图像能够达到很好的识别率，但对于多姿态的非正面人脸图像，它的识别准确率会严重下降，而通过人脸姿态预估计是解决多人脸姿态识别的一种重要途径。Face pose estimation has great application prospects in the fields of intelligent video surveillance, face recognition, human-computer interaction and virtual reality. For example, in the aspect of intelligent video surveillance, face pose estimation can be applied to driving monitoring systems, by monitoring the changes in the driver's face pose to identify whether the driver is paying attention to driving and avoid crashes. In addition, face pose estimation has a great impact on the accuracy of face recognition. Many face recognition algorithms can achieve a good recognition rate for frontal face images, but for multi-pose non-frontal face images, its The recognition accuracy will drop seriously, and face pose pre-estimation is an important way to solve the multi-face pose recognition.

目前现有的人脸姿态检测方法大体上可以分为三类：纹理子空间方法，3D 方法，其它类方法。第一类方法通过基于2D人脸外观的学习方法实现姿态的检测与估计。其中比较典型的有主成份分析(PCA)和线性判别(LDA)等。由于 PCA是一种线性降维方法，而人脸姿态3D旋转变化很大程度上是一种非线性变化。因此学者们使用核主成份分析(KPCA)，流型学习方法解决这种非线性变化问题。但是，核方法和流型学习方法有一个缺陷：随着人脸训练样本增加，它很难分离出身份和姿态。这就意味着，当人脸训练库足够大时，姿态估计的准确率会根据人的不同而变化。第一类方法最大的特点是处理速度快，容易实现，但是需要通过大量样本的训练，对人脸的光照、表情等变化较为敏感，特别是对光照极差的视频人脸图像其准确率下降明显。At present, the existing face pose detection methods can be roughly divided into three categories: texture subspace methods, 3D methods, and other methods. The first category of methods implements pose detection and estimation through 2D face appearance-based learning methods. The typical ones are Principal Component Analysis (PCA) and Linear Discriminant (LDA). Since PCA is a linear dimensionality reduction method, the 3D rotation change of face pose is largely a nonlinear change. Therefore, scholars use Kernel Principal Component Analysis (KPCA) and manifold learning method to solve this nonlinear change problem. However, kernel methods and manifold learning methods have a flaw: it is difficult to separate identity and pose as face training samples increase. This means that when the face training library is large enough, the accuracy of pose estimation will vary from person to person. The biggest feature of the first type of method is that the processing speed is fast and easy to implement, but it needs to be trained through a large number of samples, and it is more sensitive to changes in face illumination and expression, especially for video face images with extremely poor illumination. obvious.

第二类方法认为人脸姿态检测本身就是一个3D问题，只有通过3D信息才能表征人脸姿态的本质特征。因此这类方法往往通过抽取3D特征来表征不同姿态，或者利用不同视角下的多幅图像，在三维空间中重建人脸的3D模型实现姿态的检测。这类方法往往对图像的大小和质量要求很高，并且会花费大量的运算时间。第二类方法通过3D方法能够得到很高准确率，但是实时性不高，同时对视频监控中的超低分辨率和遮挡人脸图像效果不是很好。The second type of method considers that face pose detection itself is a 3D problem, and only 3D information can characterize the essential features of face pose. Therefore, such methods often represent different poses by extracting 3D features, or use multiple images from different perspectives to reconstruct 3D models of faces in three-dimensional space to detect poses. Such methods often require high image size and quality, and spend a lot of computation time. The second type of method can achieve high accuracy through 3D methods, but the real-time performance is not high, and the effect of ultra-low resolution and occluded face images in video surveillance is not very good.

第三类方法是一些非主流方法，只能解决人脸姿态估计中部分问题或只能应用于某些特定场合。例如，Rafael 等人提出多相机的人脸姿态估计方法。为了正确估计人脸姿态，他们方法中需要利用前后左右6个相机拍照的6幅图像进行融合判别。J.Nuevo等人提出块聚类的方法进行人脸姿态估计，取得了不错的效果，但是他们的方法估计的姿态范围有限(只能识别45度范围的姿态变化)。山东大学的陈振学等人提出三角形的人脸姿态估计方法，得到了91％左右的准确率，但是他们的方法只能对人脸绕Y和Z轴偏转有效，而对于绕X轴旋转人脸姿态无效，即对人脸上下旋转情况失效。The third type of methods are some non-mainstream methods, which can only solve some problems in face pose estimation or can only be applied to some specific occasions. For example, Rafael et al. proposed a multi-camera face pose estimation method. In order to correctly estimate the face pose, their method needs to use 6 images taken by 6 front, back, left and right cameras for fusion and discrimination. J.Nuevo et al. proposed a block clustering method for face pose estimation, and achieved good results, but the range of poses estimated by their method was limited (only 45-degree range of pose changes could be recognized). Chen Zhenxue of Shandong University and others proposed a triangular face pose estimation method, and obtained an accuracy rate of about 91%, but their method is only effective for the deflection of the face around the Y and Z axes, but for the rotation of the face around the X axis. Invalid, that is, it is invalid for the rotation of the face up and down.

目前想要计算机具备和人类一样的姿态识别能力还很难，主要原因是光照、噪声、遮挡、分辨率、身份、表情等因素的变化都会对姿态估计的准确性产生巨大的影响，如何消除这些因素的影响是目前亟需解决的问题。At present, it is still difficult for computers to have the same attitude recognition ability as humans. The main reason is that changes in lighting, noise, occlusion, resolution, identity, expression and other factors will have a huge impact on the accuracy of attitude estimation. How to eliminate these The influence of factors is an urgent problem to be solved at present.

发明内容SUMMARY OF THE INVENTION

本发明的目的在于克服以上方法的不足，提出一种基于Gabor特征与字典学习的人脸姿态识别方法，解决人脸姿态识别中的光照、噪声和遮挡等问题，鲁棒性的识别出正面、抬头、点头、左偏转、左侧脸、右偏转和右侧脸。The purpose of the present invention is to overcome the deficiencies of the above methods, and propose a face gesture recognition method based on Gabor feature and dictionary learning, which solves the problems of illumination, noise and occlusion in face gesture recognition, and robustly recognizes frontal, Head up, nod, left deflection, left face, right deflection and right face.

本发明的目的通过下述技术方案实现：The object of the present invention is achieved through the following technical solutions:

一种基于Gabor特征与字典学习的人脸姿态识别方法，包括下列步骤：A face gesture recognition method based on Gabor feature and dictionary learning, comprising the following steps:

对在线输入的待识别人脸姿态图像进行Gabor特征提取，构建Gabor特征向量y；Extract the Gabor feature of the face pose image to be recognized online input, and construct the Gabor feature vector y;

对所述Gabor特征向量y使用姿态完备字典进行线性组合表示，建立稀疏表示模型并求解系数向量，其中Gabor特征向量m为Gabor特征向量维数；The Gabor eigenvector y is represented by a linear combination using an attitude complete dictionary, a sparse representation model is established, and the coefficient vector is solved, where the Gabor eigenvector m is the dimension of Gabor feature vector;

根据上述求解的线性组合的系数向量进行人脸姿态分类识别。The face pose classification and recognition is performed according to the coefficient vector of the linear combination obtained above.

进一步地，所述人脸姿态划分为7个不同姿态类别，分别定义为左偏转、左侧脸、右偏转、右侧脸、正面、抬头和点头，其各对应不同的子空间。Further, the facial posture is divided into 7 different posture categories, which are respectively defined as left deflection, left face, right deflection, right face, frontal, head-up and nodding, which correspond to different subspaces.

进一步地，对所述Gabor特征向量y使用姿态完备字典进行线性组合表示之前，还包括所述姿态完备字典的训练步骤，其中所述姿态完备字典包括对应无遮挡的人脸姿态的第一姿态完备字典D和对应有遮挡的人脸姿态第二姿态完备字典D_e，所述第一姿态完备字典D和所述第二姿态完备字典D_e的训练分别独立完成。Further, before using the posture complete dictionary to express the linear combination of the Gabor feature vector y, the training step of the posture complete dictionary is also included, wherein the posture complete dictionary includes the first posture complete corresponding to the unobstructed face posture. The dictionary D and the second pose complete dictionary D _e corresponding to the occluded face pose, the training of the first pose complete dictionary D and the second pose complete dictionary D _e are completed independently.

进一步地，所述第一姿态完备字典D的训练过程具体如下：Further, the training process of the first posture complete dictionary D is as follows:

分别收集各姿态类别的人脸姿态图像样本，并对所述人脸姿态图像样本进行Gabor滤波处理以及特征提取并向量化组成各姿态类别的人脸姿态Gabor特征训练集；Collect face pose image samples of each pose category respectively, and perform Gabor filtering processing and feature extraction on the face pose image samples and quantify the face pose Gabor feature training set of each pose category;

对每类姿态类别的人脸姿态Gabor特征训练集使用K-SVD进行训练优化分别得出最佳子字典D_i，i＝1，2，…，7；Use K-SVD to perform training optimization on the face pose Gabor feature training set of each pose category to obtain the best sub-dictionary D _i , i=1, 2, ..., 7;

将各类最佳子字典D_i组成第一姿态完备字典D＝[D₁，D₂，…，D₇]。All kinds of optimal sub-dictionaries D _i are formed into a first pose complete dictionary D=[D ₁ , D ₂ , . . . , D ₇ ].

进一步地，所述人脸姿态图像的Gabor特征提取为：Further, the Gabor feature extraction of the face pose image is:

其中，是通过对Gabor滤波系数的模进行ρ次采样而得到的列向量，μ，v为Gabor滤波器的方向与尺度，采用方向与尺度不变的Gabor滤波器描述人脸姿态图像的特征，为人脸姿态图像与Gabor核ψ_μ，v的卷积，Gabor 核定义为：in, is the modulo of the Gabor filter coefficients The column vector obtained by sampling ρ times, μ, v are the direction and scale of the Gabor filter, and the Gabor filter with constant direction and scale is used to describe the characteristics of the face pose image, is the convolution of the face pose image and the Gabor kernel ψ _{μ, v} , and the Gabor kernel is defined as:

其中，z(x，y)表示像素；为小波项，k_v＝k_max/f^v，φ_μ＝πμ/8，σ＝1.5π控制着高斯窗口宽度与波长的比例。Among them, z(x, y) represents the pixel; is the wavelet term, k _v =km _max /f ^v , φ _μ =πμ/8, σ=1.5π controls the ratio of the width of the Gaussian window to the wavelength.

进一步地，所述在线输入的待识别人脸姿态图像为无遮挡的人脸姿态图像时，提取所述待估计人脸姿态图像的Gabor特征向量m为Gabor特征向量维数，将y看成所述第一姿态完备字典的线性组合表示：其中N为字典原子总个数，所述稀疏表示模型即为 Further, when the face pose image to be recognized inputted online is an unoccluded face pose image, extract the Gabor feature vector of the face pose image to be estimated m is the dimension of Gabor feature vector, and y is regarded as the complete dictionary of the first pose The linear combination of : where N is the total number of dictionary atoms, and the sparse representation model is

进一步地，所述稀疏表示模型 Further, the sparse representation model

通过带稀疏约束的最小二乘法进行求解：Solve by least squares with sparsity constraints:

其中，λ为平衡因子，起到平衡重建误差与稀疏性的作用。Among them, λ is a balance factor, which plays a role in balancing reconstruction error and sparsity.

进一步地，所述在线输入的待识别人脸姿态图像为有遮挡的人脸姿态图像时，提取所述待估计人脸姿态图像的Gabor特征向量m为Gabor特征向量维数，将y看成由所述的第一完备字典D和第二姿态完备字典的线性组合表示：Further, when the face pose image to be recognized inputted online is an occluded face pose image, extract the Gabor feature vector of the face pose image to be estimated m is the dimension of the Gabor feature vector, and y is regarded as the first complete dictionary D and the second attitude complete dictionary The linear combination of :

其中无遮挡图像y₀与遮挡误差图像e₀分别可由第一姿态完备字典D和第二姿态完备字典稀疏表示，D_e为正交单位矩阵，所述稀疏表示模型即为 in The unoccluded image y ₀ and the occlusion error image e ₀ can be obtained from the first pose complete dictionary D and the second pose complete dictionary respectively Sparse representation, _De is an orthogonal unit matrix, and the sparse representation model is

进一步地，所述稀疏表示模型将有遮挡的人脸姿态图像识别问题转化成如下的优化问题：Further, the sparse representation model The occluded face pose image recognition problem is transformed into the following optimization problem:

该问题可以通过标准的线性规范方法进行求解。This problem can be solved by standard linear norm methods.

进一步地，所述根据上述求解的线性组合的系数向量进行人脸姿态分类识别具体为通过对所述求解的线性组合的系数向量进行有效性累计，以累计值最大作为判定分类的依据，即Further, the described performing face pose classification and recognition according to the coefficient vector of the linear combination of the above-mentioned solution is specifically by effectively accumulating the coefficient vector of the linear combination of the solution, and taking the maximum accumulated value as the basis for judging the classification, that is,

其中，f(·)是特殊函数，用于将稀疏表示模型的表示系数的负因子置0，是选取函数，用于仅选择稀疏表示模型的表示系数中第i类子空间矩阵所对应的表示系数，并且将其它表示系数置0，p_i(y)是训练样本集合中第i类姿态类别对应的子空间与其对应的稀疏表示模型的表示系数的有效性累计因子。where f( ) is a special function used to set the negative factor of the representation coefficient of the sparse representation model to 0, is the selection function used to select only the representation coefficients of the sparse representation model The representation coefficients corresponding to the i-th subspace matrix in , and other representation coefficients are set to 0, p _i (y) is the subspace corresponding to the i-th pose category in the training sample set and the representation coefficients of its corresponding sparse representation model Effectiveness cumulative factor.

本发明相对于现有技术具有如下的优点及效果：Compared with the prior art, the present invention has the following advantages and effects:

1)目前大多数人脸姿态识别方法只对人脸左右偏转进行了分类，本发明不仅可以对人脸左右偏转进行分类，同还可以进行上下姿态分类。本发明可以在0.3秒左右识别出人脸图像的大致左右上下偏转状态，识别精度在95％以上，可以较好地应用于安全驾驶和人机交互等领域。1) At present, most face gesture recognition methods only classify the left and right deflection of the human face. The present invention can not only classify the left and right deflection of the human face, but also can classify the up and down posture. The invention can identify the roughly left and right up and down deflection states of the face image in about 0.3 seconds, the recognition accuracy is above 95%, and can be well applied to the fields of safe driving and human-computer interaction.

2)由于本发明基于字典学习与稀疏表示原理，所以可以同时解决人脸姿态识别中的光照、噪声、表情和遮挡等难题；2) Since the present invention is based on the principle of dictionary learning and sparse representation, it can simultaneously solve problems such as illumination, noise, expression and occlusion in face gesture recognition;

3)由于Gabor滤波器能从多尺度多方向提取图像的局部特征。因此，本发明采用人脸图像的Gabor特征作为超完备字典的原子向量。此方法提高了人脸姿态识别的稳定性。3) Since the Gabor filter can extract the local features of the image from multi-scale and multi-direction. Therefore, the present invention uses the Gabor feature of the face image as the atomic vector of the overcomplete dictionary. This method improves the stability of face gesture recognition.

附图说明Description of drawings

为了更清楚地说明本发明实施例中的技术方案，下面将对实施例或者现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only for the present invention. In some embodiments, for those of ordinary skill in the art, other drawings can also be obtained according to these drawings without any creative effort.

图1本发明中人脸姿态图像的划分类别示意图；Fig. 1 is a schematic diagram of the classification of face pose images in the present invention;

图2本发明公开的基于Gabor特征与字典学习的人脸姿态识别方法的总流程示意图；2 is a schematic diagram of the general flow of the face gesture recognition method based on Gabor feature and dictionary learning disclosed by the present invention;

图3本发明中人脸姿态图像Gabor特征提取示意图；Fig. 3 is a schematic diagram of Gabor feature extraction of face pose image in the present invention;

图4本发明公开的人脸姿态识别方法中基于稀疏表示的人脸姿态分类示意图。FIG. 4 is a schematic diagram of face pose classification based on sparse representation in the face pose recognition method disclosed in the present invention.

具体实施方式Detailed ways

为使本发明实现的技术手段、创作特征、达成目的与功效易于明白了解，以下参照附图并举实施例对本发明进一步详细说明。应当理解，此处所描述的具体实施例仅仅用以解释本发明，并不用于限定本发明。In order to make the technical means, creative features, achieved objects and effects of the present invention easy to understand and understand, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象，而不是用于描述特定顺序。此外，术语“包括”和“具有”以及它们任何变形，意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元，而是可选地还包括没有列出的步骤或单元，或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。The terms "first", "second", "third" and "fourth" in the description and claims of the present invention and the above drawings are used to distinguish different objects, rather than to describe a specific order. Furthermore, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally also includes For other steps or units inherent to these processes, methods, products or devices.

以下根据实施例分别进行详细说明。The following detailed descriptions are given respectively according to the examples.

实施例Example

本发明实施例中公开的基于Gabor特征与字典学习的人脸姿态识别方法，主要根据字典学习和稀疏表示的思想进行人脸姿态分类。The face gesture recognition method based on Gabor feature and dictionary learning disclosed in the embodiment of the present invention mainly performs face gesture classification according to the idea of dictionary learning and sparse representation.

预先将人脸姿态进行划分，将人脸姿态离散化为不同的子空间，每个字空间对应一种人脸姿态类别。本发明实施例中采用7种不同类姿态类别的划分与其对应子空间的定义，将人脸姿态离散为7个不同子空间，即左侧1、左侧2、右侧1和右侧2、正面、抬头、点头等7个姿态类别，如图1所示，分别定义为左偏转、左侧脸、右偏转、右侧脸、正面、抬头和点头。The face pose is divided in advance, and the face pose is discretized into different subspaces, and each word space corresponds to a face pose category. In the embodiment of the present invention, the division of 7 different gesture categories and the definitions of their corresponding subspaces are used to discretize the face gestures into 7 different subspaces, namely left side 1, left side 2, right side 1 and right side 2, There are 7 gesture categories including frontal, head-up, and nodding, as shown in Figure 1, which are defined as left deflection, left face, right deflection, right face, front, head up, and nod.

本发明实施例中公开的基于Gabor特征与字典学习的人脸姿态识别方法包括姿态字典训练和姿态分类识别两大部分。其中姿态字典训练部分又包括Gabor 特征提取，K-SVD字典优化和构建姿态完备字典等三个步骤；姿态分类识别部分包括Gabor特征提取，稀疏表示模型匹配求解和姿态分类识别等3个步骤。下面逐一对各步骤进行详细说明，基于Gabor特征与字典学习的人脸姿态识别方法的流程如附图2所示：The face gesture recognition method based on Gabor feature and dictionary learning disclosed in the embodiment of the present invention includes two parts: gesture dictionary training and gesture classification and recognition. The pose dictionary training part includes three steps: Gabor feature extraction, K-SVD dictionary optimization and construction of pose complete dictionary; pose classification and recognition part includes Gabor feature extraction, sparse representation model matching solution and pose classification and recognition. Each step is described in detail below. The process of the face gesture recognition method based on Gabor features and dictionary learning is shown in Figure 2:

步骤S1：姿态字典训练Step S1: pose dictionary training

本部分是离线进行。首先，分别收集各姿态类别的人脸姿态图像样本，并对所述人脸姿态图像样本进行Gabor滤波处理以及特征提取并向量化组成各姿态类别的人脸姿态Gabor特征训练集，本实施例中将人脸姿态划分为7个不同子空间，即左侧1、左侧2、右侧1和右侧2、正面、抬头、点头等7个姿态类别，故分别向量化生成7个不同的人脸姿态Gabor特征训练集。然后，采用K-SVD 方法对各类样本集进行训练优化得到各类子字典。最后，将各类子字典融合组成完备姿态字典。This part is done offline. First, collect face pose image samples of each pose category respectively, perform Gabor filtering processing and feature extraction on the face pose image samples, and quantify the face pose Gabor feature training set of each pose category. The face pose is divided into 7 different subspaces, namely left 1, left 2, right 1 and right 2, frontal, head-up, nodding and other 7 pose categories, so 7 different people are vectorized respectively. Face pose Gabor feature training set. Then, the K-SVD method is used to train and optimize various sample sets to obtain various sub-dictionaries. Finally, various sub-dictionaries are fused to form a complete pose dictionary.

其中姿态完备字典包括第一姿态完备字典D和第二姿态完备字典D_e，其中第一姿态完备字典D对应无遮挡的人脸姿态，其中第二姿态完备字典D_e对应遮挡的人脸姿态，又名遮挡腐蚀字典D_e。所述第一姿态完备字典D和所述第二姿态完备字典D_e的训练分别独立完成。The posture complete dictionary includes a first posture complete dictionary D and a second posture complete dictionary D _e , wherein the first posture complete dictionary D corresponds to the unoccluded face posture, and the second posture complete dictionary D _e corresponds to the occluded face posture, Also known as the occlusion erosion dictionary _De . The training of the first posture complete dictionary D and the training of the second posture complete dictionary D _e is completed independently.

具体步骤和实施方式包括：Specific steps and implementations include:

步骤S1a：Gabor特征提取Step S1a: Gabor Feature Extraction

人脸姿态图像的Gabor特征提取为：The Gabor feature extraction of the face pose image is:

其中，是通过对Gabor滤波系数的模进行ρ次采样而得到的列向量，μ，v为Gabor滤波器的方向与尺度，采用方向与尺度不变的Gabor滤波器描述人脸姿态图像的特征。为人脸姿态图像与Gabor核ψ_μ，v的卷积。Gabor 核定义为：in, is the modulo of the Gabor filter coefficients The column vector obtained by sampling ρ times, μ, v are the direction and scale of the Gabor filter, and the Gabor filter with constant direction and scale is used to describe the features of the face pose image. is the convolution of the face pose image with the Gabor kernel ψ _{μ, v} . The Gabor kernel is defined as:

其中，z(x，y)表示像素；为小波项，k_v＝k_max/f^v，φ_μ＝πμ/8，σ＝1.5π控制着高斯窗口宽度与波长的比例。本发明各参数的取值为k_max＝π/2，ρ≈40，μ＝{0，…，7}，v＝{0，…，4}。Gabor特征提取的示例见图3所示。Among them, z(x, y) represents the pixel; is the wavelet term, k _v =km _max /f ^v , φ _μ =πμ/8, σ=1.5π controls the ratio of the width of the Gaussian window to the wavelength. The value of each parameter of the present invention is k _max =π/2, ρ≈40, μ={0,...,7}, v={0,...,4}. An example of Gabor feature extraction is shown in Figure 3.

步骤S1b：K-SVD优化训练Step S1b: K-SVD optimization training

首先，采用上步方法对所有训练样本进行Gabor特征提取组成训练集，然后，对训练集采用K-SVD方法进行优化训练得到姿态完备字典D。K-SVD是一种经典的字典训练算法，依据误差最小原则，对误差项进行SVD分解，选择使误差最小的分解项作为更新的字典原子和对应的原子系数，经过不断的迭代从而得到优化的解。First, use the previous method to extract Gabor features from all training samples to form a training set, and then use the K-SVD method to optimize the training set to obtain a complete pose dictionary D. K-SVD is a classic dictionary training algorithm. According to the principle of minimum error, SVD decomposition is performed on the error term, and the decomposition term with the smallest error is selected as the updated dictionary atom and the corresponding atomic coefficient. After continuous iteration, the optimized algorithm is obtained. untie.

具体过程为：首先，对每类姿态样本集使用K-SVD进行训练优化得出最佳子字典D_i；然后，将各类子字典D_i组成第一姿态完备字典D＝[D₁，D₂，…，D₇]。The specific process is: first, use K-SVD to perform training optimization on each type of pose sample set to obtain the best sub-dictionary _Di ; then, form the first _posture complete dictionary D=[D ₁ , D ₂ , …, _D7 ].

本发明采用K-SVD进行训练优化得出最佳子字典D_i。K-SVD是一种交叉迭代稀疏表示与字典更新的过程。对于具有n_i个样本训练集，K-SVD的目标函数为：The present invention adopts K-SVD to perform training and optimization to obtain the best sub-dictionary D _i . K-SVD is a process that crosses iterative sparse representation and dictionary update. For a training set with n _i samples, the objective function of K-SVD is:

其中，为样本的稀疏系数集，为第i个训练样本y_i的稀疏表示系数向量，k为字典D_i的原子个数。K-SVD算法包括两个过程。第一过程中，固定字典D_i，因此上式目标函数变为求解稀疏表示系数的优化问题，目前有许多种求解方法。第二过程为利用第一过程求解的稀疏系数更新字典D_i。此过程通过逐步更新D_i的每列d_k和X的第i行实现。而d_k，的求解则是通过奇异值分解(SVD)实现。in, is the sparse coefficient set of the sample, is the sparse representation coefficient vector of the _i -th training sample yi, and k is the number of atoms in the dictionary D _i . The K-SVD algorithm consists of two processes. In the first process, the dictionary D _i is fixed, so the objective function above becomes an optimization problem for solving sparse representation coefficients, and there are many solving methods at present. The second process is to update the dictionary D _i using the sparse coefficients solved by the first process. This process works by incrementally updating each column _dk of D _i and the i-th row of X accomplish. and d _k , The solution is achieved by singular value decomposition (SVD).

步骤S1c：构建姿态完备字典Step S1c: Construct a pose complete dictionary

通过步骤1(b)得到各类优化后的子字典D_i，将所有子字典融合组成超完备字典。由于本发明将人脸姿态分为7种，因此人脸姿态完备字典为：其中，m为样本特征维数，n为字典原子总个数。Various optimized sub-dictionaries D _i are obtained through step 1(b), and all sub-dictionaries are fused to form an over-complete dictionary. Because the present invention divides the human face postures into 7 types, the complete dictionary of human face postures is: Among them, m is the sample feature dimension, and n is the total number of dictionary atoms.

步骤2：姿态分类识别Step 2: Pose classification and recognition

本部分是在线过程。首先，对在线输入人脸姿态图像提取其Gabor特征向量。然后，对在线输入的人脸姿态图像的Gabor特征向量使用构建的姿态完备字典进行线性组合表示，建立稀疏表示模型，并采用Lars-Lasso算法进行模型匹配求解。最后，利用求解系数进行姿态分类。具体步骤和实施方式包括：This section is an online process. First, extract the Gabor feature vector of the online input face pose image. Then, the Gabor feature vector of the online input face pose image is represented by linear combination using the constructed pose complete dictionary, and a sparse representation model is established, and the Lars-Lasso algorithm is used to solve the model matching. Finally, pose classification is performed using the solution coefficients. Specific steps and implementations include:

步骤S2a：Gabor特征提取Step S2a: Gabor feature extraction

此部分与步骤S1a相同，对在线输入的人脸姿态图像的Gabor特征向量y进行提取。This part is the same as step S1a, extracting the Gabor feature vector y of the face pose image input online.

步骤S2b：稀疏表示模型匹配与求解Step S2b: sparse representation model matching and solution

对在线输入的人脸姿态图像的Gabor特征向量y提取后，使用构建的姿态完备字典进行线性组合表示，向量化组成人脸姿态Gabor特征训练集。After extracting the Gabor feature vector y of the face pose image input online, the constructed pose complete dictionary is used for linear combination representation, and vectorized to form a face pose Gabor feature training set.

其中姿态完备字典包括第一姿态完备字典D和第二姿态完备字典D_e，其中第一姿态完备字典D对应无遮挡的人脸姿态，其中第二姿态完备字典D_e对应有遮挡的人脸姿态，又名遮挡腐蚀字典D_e。提取待估计人脸姿态图像的Gabor特征向量m为Gabor特征向量维数，将y看成第一姿态完备字典(N 为字典原子总个数)的线性组合表示：The posture complete dictionary includes a first posture complete dictionary D and a second posture complete dictionary D _e , wherein the first posture complete dictionary D corresponds to the face posture without occlusion, and the second posture complete dictionary D _e corresponds to the face posture with occlusion , aka the occlusion erosion dictionary _De . Extract the Gabor feature vector of the face pose image to be estimated m is the dimension of Gabor feature vector, and y is regarded as the complete dictionary of the first pose (N is the total number of dictionary atoms) linear combination representation:

其中是一个非常稀疏的线性组合系数向量。理想情况下，如果测试样本y属于第i类姿态，那么x除了它对应的第i类不为0外，其余的项都为0。根据系数向量x可以得到测试图像y的姿态类别。因此，人脸姿态识别问题转化为解线性方程组的问题。如果m＞N，方程组是超定的，x有唯一的解或无解。如果m＜N，x有多个解。在人脸姿态识别应用中，往往m＜N，即需要求解欠定方程组问题，根据原子追踪、压缩感知与稀疏表示方法的研究成果表明，如果上述欠定方程组的解足够稀疏，就可以由范数正则化的极小化问题求出：in is a very sparse linear combination coefficient vector. Ideally, if the test sample y belongs to the i-th type of pose, then x is 0 except for its corresponding i-th type which is not 0. According to the coefficient vector x, the pose category of the test image y can be obtained. Therefore, the face pose recognition problem is transformed into a problem of solving a system of linear equations. If m>N, the system of equations is overdetermined, and x has a unique solution or no solution. If m<N, x has multiple solutions. In the application of face gesture recognition, m<N is often needed, that is, the problem of underdetermined equations needs to be solved. According to the research results of atomic tracking, compressed sensing and sparse representation methods, it is shown that if the solutions of the above underdetermined equations are sparse enough, it can be Depend on The norm regularization minimization problem is solved:

稀疏表示模型匹配即求解如下线性方程组：Sparse representation model matching is to solve the following system of linear equations:

(无遮挡)或(有遮挡) (unblocked) or (with cover)

上述方程组中只有线性组合的系数向量x(以无遮挡为例)或ω(以有遮挡为例)为未知，因此可以通过带稀疏约束的最小二乘法进行求解：In the above equation system, only the linearly combined coefficient vector x (take no occlusion as an example) or ω (take occlusion as an example) is unknown, so it can be solved by the least square method with sparse constraints:

其中，λ为平衡因子，起到平衡重建误差与稀疏性的作用，本发明取λ＝0.01。上述最小优化模型本发明采用最小角度回归(Least Angle Regression，LAR) 的Lars-Lasso算法进行求解(B.Efron，T.Hastie，I.Johnstone and R.Tibshirani，″Least AngleRegression″，Annals of Statistics，32，407-499，2004)。对于有遮挡情况求解类似。Among them, λ is a balance factor, which plays the role of balancing reconstruction error and sparsity, and λ=0.01 in the present invention. The present invention adopts the Lars-Lasso algorithm of least angle regression (Least Angle Regression, LAR) to solve the above-mentioned minimum optimization model (B.Efron, T.Hastie, I.Johnstone and R.Tibshirani, "Least AngleRegression", Annals of Statistics, 32, 407-499, 2004). The solution is similar for occlusion cases.

步骤S2c：人脸姿态识别Step S2c: face gesture recognition

根据求解的线性组合的系数向量x进行人脸姿态分类识别。理论上应该只与训练样本中的某一类姿态的测试样本的相关密切，其对应的表征系数非零。因此，可以清楚的对该待测姿态进行分类。然而，由于噪声以及建模误差会引起待测样本的部分非相关的类别表征系数出现数值很小的非零元素，给正确分类带来影响。考虑到待测样本在训练样本集合上稀疏表示是基于整体训练集合的，据此考虑将待测样本在训练样本集合上的表示系数进行有效性累计，以累计值最大作为判定分类的依据。The face pose classification and recognition is performed according to the coefficient vector x of the linear combination of the solution. theoretically It should only be closely related to the test sample of a certain type of pose in the training sample, and its corresponding characterization coefficient is non-zero. Therefore, the to-be-measured pose can be clearly classified. However, due to noise and modeling errors, some non-correlated class characterization coefficients of the samples to be tested will appear non-zero elements with small values, which will affect the correct classification. Considering that the sparse representation of the samples to be tested on the training sample set is based on the overall training set, it is considered that the representation coefficients of the samples to be tested on the training sample set are effectively accumulated, and the maximum accumulated value is used as the basis for judging the classification.

根据步骤S2b求解的系数向量x可以得到待估计人脸姿态图像y的姿态类别。如图4所示，测试人脸姿态属于第4类，因此其非0系数主要集中在x₄中，其它类别的系数绝大部分为0。The pose category of the face pose image y to be estimated can be obtained according to the coefficient vector x solved in step S2b. As shown in Figure 4, the test face pose belongs to the fourth category, so its non-zero coefficients are mainly concentrated in x ₄ , and most of the coefficients of other categories are 0.

为了减少噪声与建模误差的影响；同时考虑到待测样本在训练样本集合上稀疏表示是基于整体训练集合的，据此考虑将待测样本在训练样本集合上的表示系数有效性累计，以累计值最大作为判定分类的依据。因此基于稀疏表示的姿态分类为：In order to reduce the influence of noise and modeling errors; at the same time, considering that the sparse representation of the samples to be tested on the training sample set is based on the overall training set, it is considered to accumulate the validity of the representation coefficients of the samples to be tested on the training sample set to obtain The maximum accumulated value is used as the basis for judging the classification. So the poses based on sparse representation are classified as:

其中，f(·)是个特殊函数，表示将稀疏表示系数的负因子置0；是个选取函数，仅选择稀疏表示系数中第i类子空间矩阵所对应的表示系数，并且将其它置0；p_i(y)是训练样本集合中第i类子空间与其对应的稀疏表示系数的有效性累计因子。Among them, f( ) is a special function, which means that the negative factor of the sparse representation coefficient is set to 0; is a selection function that selects only sparse representation coefficients The representation coefficient corresponding to the i-th subspace matrix in , and set the others to 0; p _i (y) is the validity accumulation factor of the i-th subspace and its corresponding sparse representation coefficient in the training sample set.

对应遮挡人脸姿态处理，通过增加遮挡姿态人脸字典的方法解决人脸姿态图像遮挡问题。由于，上述人脸姿态识别方法巧妙地利用了稀疏表示分类方法原理。因此，对人脸光照、噪声、表情和分辨率变化具有鲁棒性。为了解决人脸姿态遮挡问题：首先，依据步骤S1再建立一个遮挡腐蚀字典D_e。然后，依据步骤S2a 及步骤S2b提取待估计人脸姿态图像的Gabor特征向量m为Gabor特征向量维数，将y看成由无遮挡字典D和遮挡腐蚀字典共同线性组合表示 (对应步骤S2b中的字典学习模型)为：Corresponding to occlusion face pose processing, the problem of face pose image occlusion is solved by adding occlusion pose face dictionary. Because, the above face gesture recognition method cleverly utilizes the principle of sparse representation classification method. Therefore, it is robust to face illumination, noise, expression and resolution changes. In order to solve the problem of face pose occlusion: first, establish an occlusion erosion dictionary De according to step _S1 . Then, extract the Gabor feature vector of the face pose image to be estimated according to step S2a and step S2b m is the dimension of the Gabor feature vector, and y is regarded as a combination of the non-occlusion dictionary D and the occlusion erosion dictionary The common linear combination representation (corresponding to the dictionary learning model in step S2b) is:

其中，无遮挡图像y₀与遮挡误差图像e₀分别可由字典D和遮挡字典稀疏表示。D_e为正交单位矩阵。最后，遮挡人脸姿态识别问题转化成如下的优化问题：in, The unoccluded image y ₀ and the occlusion error image e ₀ can be obtained from the dictionary D and the occlusion dictionary respectively Sparse representation. _De is an orthogonal identity matrix. Finally, the occlusion face pose recognition problem is transformed into the following optimization problem:

该问题可以通过标准的线性规范方法进行求解。稀疏表示模型匹配即求解如下线性方程组：This problem can be solved by standard linear norm methods. Sparse representation model matching is to solve the following system of linear equations:

(有遮挡) (with cover)

上述方程组中只有线性组合的系数向量ω(以有遮挡为例)为未知。In the above equation system, only the coefficient vector ω of the linear combination (take occlusion as an example) is unknown.

到此，通过增加遮挡字典D_e的方法成功地解决人脸遮挡问题。So far, the face _occlusion problem is successfully solved by adding the occlusion dictionary De.

上述实施例为本发明较佳的实施方式，但本发明的实施方式并不受上述实施例的限制，其他的任何未背离本发明的精神实质与原理下所作的改变、修饰、替代、组合、简化，均应为等效的置换方式，都包含在本发明的保护范围之内。The above-mentioned embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited by the above-mentioned embodiments, and any other changes, modifications, substitutions, combinations, The simplification should be equivalent replacement manners, which are all included in the protection scope of the present invention.

Claims

1. A face posture recognition method based on Gabor characteristics and dictionary learning is characterized by comprising the following steps:

performing Gabor feature extraction on the online input face posture image to be recognized to construct a Gabor feature vector y;

performing linear combination representation on the Gabor characteristic vector y by using an attitude complete dictionary, establishing a sparse representation model and solving a coefficient vector, wherein the Gabor characteristic vectorm is a Gabor eigenvector dimension;

carrying out face posture classification and identification according to the solved coefficient vector of the linear combination;

before the Gabor feature vector y is subjected to linear combination representation by using a pose complete dictionary, the method also comprises a training step of the pose complete dictionary, wherein the pose complete dictionary comprises a first pose complete dictionary D corresponding to the non-shielded human face pose and a second pose complete dictionary D corresponding to the shielded human face pose_eSaid first well-posed complete dictionary D and said second well-posed complete dictionary D_eThe training is respectively and independently finished;

when the online input face gesture image to be recognized is a shielded face gesture image, extracting a Gabor characteristic vector of the face gesture image to be recognizedm is the dimension of Gabor eigenvector, and y is considered as the complete dictionary from the first postureAnd a second gesture complete dictionaryCommon linear combinations represent:

whereinNon-occluded image y₀And occlusion error image e₀From a first posture-completed dictionary D and a second posture-completed dictionary D respectivelySparse representation, D_eIs an orthogonal identity matrix, and the sparse representation model is

The sparse representation modelThe problem of recognizing the human face posture image with the shielding is converted into the following optimization problem:

(l¹):ω＝argmin||ω||₁s.t.Bω＝y，

the problem is solved by standard linear normative methods.

2. The face pose recognition method based on Gabor feature and dictionary learning of claim 1,

the human face pose is divided into 7 different pose categories, which are respectively defined as left deflection, left side face, right deflection, right side face, front, head raising and nodding, and each of the pose categories corresponds to different subspaces.

3. The face pose recognition method based on Gabor feature and dictionary learning of claim 1, wherein the training process of the first pose complete dictionary D is as follows:

respectively collecting face attitude image samples of all attitude categories, carrying out Gabor filtering processing and feature extraction on the face attitude image samples, and vectorizing to form a face attitude Gabor feature training set of all attitude categories;

training and optimizing a face posture Gabor feature training set of each posture category by using K-SVD to respectively obtain an optimal sub-dictionary D_i，i＝1，2，…，7；

All kinds of optimal sub-dictionaries D_iForm a first pose complete dictionary D ═ D₁,D₂,…,D₇]。

4. The face pose recognition method based on Gabor features and dictionary learning of claim 3, wherein the Gabor features of the face pose image are:

wherein,by modulo of Gabor filter coefficientsColumn vectors obtained by sampling rho times, mu and v are directions and scales of the Gabor filter,for human face pose image and Gabor kernel psi_μ,vThe Gabor kernel is defined as:

wherein z (x, y) represents a pixel;is a wavelet term, k_v＝k_max/f^v，φ_μPi μ/8, and 1.5 pi control the ratio of gaussian window width to wavelength.

5. The method of claim 1, wherein the face pose recognition based on Gabor feature and dictionary learning is performed by performing validity accumulation on the coefficient vectors of the linear combination of the solution, and taking the maximum accumulated value as a basis for determining classification, that is, performing face pose classification recognition based on the coefficient vectors of the linear combination of the solution

Where f (-) is a special function for setting the negative factor of the representation coefficient of the sparse representation model to 0,is a selection function for selecting only the representation coefficients of the sparse representation modelThe representing coefficient corresponding to the i-th type subspace matrix, and setting other representing coefficients to be 0, p_iAnd (y) is an effectiveness accumulation factor of the subspace corresponding to the ith posture category in the training sample set and the corresponding expression coefficient of the sparse expression model.