CN101561710B - A Human-Computer Interaction Method Based on Face Pose Estimation - Google Patents
- Publication number
- CN101561710B (application CN2009101038842A)
- Authority
- CN
- China
- Prior art keywords
- face
- human
- image
- point
- people
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Image Processing (AREA)
Abstract
The present invention relates to a human-computer interaction device and method based on face pose estimation. The method proceeds as follows: a camera captures a sequence of face images; after preprocessing, five feature points (the two eye corners, the two mouth corners, and the nose tip) are extracted automatically; taking one frontal image as a reference, the three deflection angles of the face in any image are estimated from the positions and correspondences of the five feature points; the pose information defines the mouse pointer position and operation mode, generating human-computer interaction information; the device connects to a computer through a USB interface, forming a new kind of visual mouse. As a supplement to traditional human-computer interaction methods, the device suits special user groups (such as people with limb disabilities) and interaction environments (such as multimedia games), and has significant application value.
Description
Technical Field
The present invention relates to a human-computer interaction method, and in particular to a method for building a visual mouse device based on facial feature point localization and pose detection. By means of image analysis and object detection, five feature points (two eye corners, two mouth corners, and the nose tip) are extracted; taking the corresponding feature point positions of one frontal image as a reference, the three deflection angles of the real-time face image are estimated and converted into human-computer interaction information, which is sent to a computer through a USB interface, forming a visual mouse device for human-computer interaction.
Background Art
Human-computer interaction techniques enable effective dialogue between humans and computers through computer input and output devices: the machine provides information and prompts through output or display devices, and the user feeds information back through input devices, answers questions, and so on. Human-computer interaction is an important part of computer user interface design and is closely related to cognitive science, ergonomics, psychology, and other disciplines.
In the field of human-computer interaction, traditional devices such as keyboards, mice, and light pens are in wide use. In some special situations, however, they reach their limits, for example in the complex interface operations of certain multimedia games or in computer use by people with disabilities. There is a pressing need for input devices that do not depend on limb movement to supplement existing interaction methods.
To adapt to different interaction environments and user groups, researchers at home and abroad have extensively studied interaction methods based on vision (images), hearing (speech), and touch (pressure, temperature), some of which have entered preliminary use. Within visual interaction, gaze tracking and sign language recognition are current research hotspots. Gaze tracking senses the user's point of regard through a camera and pupil localization and uses it to drive pointer positioning and related operations, but it has two important drawbacks: first, pupil movement is discontinuous and does not match a continuously moving point of regard well, which reduces tracking accuracy; second, for physiological and psychological reasons eye movement is partly involuntary, so in some cases the gaze position does not reflect the user's actual intent.
To overcome these defects of gaze-tracking interaction, the present invention adopts face pose estimation: a video camera captures face images, image and video processing performs face detection and key point localization, the face pose is estimated from the feature point positions across a multi-frame image sequence, and the pose information drives the interaction. The resulting device suits special user groups (such as people with disabilities) and interaction environments (such as multimedia games), has significant application value, and extends the scope of human-computer interaction.
Summary of the Invention
Since existing human-computer interaction devices cannot fully satisfy every interaction environment and user group, the purpose of the present invention is to provide a human-computer interaction method based on face pose estimation, which uses automatic facial feature point localization, face pose estimation, and related techniques to generate interaction information and connects to a computer through a USB interface, forming a visual mouse device.
The present invention comprises the following steps:
a) A sequence of digital face images is obtained through an optical lens and a CMOS image sensor, and the data are captured through the high-speed digital video channel of a DSP.
b) The digital face images are preprocessed.
① Image denoising. Because image noise points differ markedly from their neighboring pixels in gray value, statistical characteristics, or distribution, filtering can suppress the noise and make the useful information easier to detect and recognize. Nonlinear diffusion is a good denoising method with strong edge-preserving properties, but it requires many iterations and has high computational complexity. In the present invention, the variance of band-shaped regions in four directions around the target point is used to construct an edge map for the nonlinear diffusion denoising algorithm, which reduces the number of iterations, and integral images are used for fast computation.
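The fast-computation claim rests on summed-area tables: once integral images of I and I² are built, the variance of any axis-aligned band rectangle costs a constant number of lookups. A minimal sketch follows; the function names and the caller-supplied band rectangles are illustrative, not taken from the patent.

```python
import numpy as np

def integral_images(img):
    """Summed-area tables of I and I*I, padded with a zero row/column."""
    I = img.astype(np.float64)
    ii  = np.pad(np.cumsum(np.cumsum(I,     axis=0), axis=1), ((1, 0), (1, 0)))
    ii2 = np.pad(np.cumsum(np.cumsum(I * I, axis=0), axis=1), ((1, 0), (1, 0)))
    return ii, ii2

def band_variance(ii, ii2, y0, x0, y1, x1):
    """Variance of the rectangle [y0:y1, x0:x1] in O(1) per query."""
    n  = (y1 - y0) * (x1 - x0)
    s  = ii[y1, x1]  - ii[y0, x1]  - ii[y1, x0]  + ii[y0, x0]
    s2 = ii2[y1, x1] - ii2[y0, x1] - ii2[y1, x0] + ii2[y0, x0]
    return s2 / n - (s / n) ** 2
```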
② Face detection and region partition. Multiple faces are labeled by hand, scaled to the same size, and averaged to obtain a face template. Template matching gives the approximate face position, and prior knowledge of facial structure then divides the face into four regions: left eye, right eye, nose, and mouth.
③ Edge detection. Since all five feature points lie on corners of facial organs, contour or corner information must be extracted to improve localization accuracy. Traditional edge detection methods include the Canny operator, the Sobel operator, morphological methods, and wavelet transforms. The SUSAN corner detector also reflects edge and corner information well, but its result depends on a threshold. The present invention uses a directional filtering method to obtain the edge image.
c) Locate the five feature points of the face (two eye corners, two mouth corners, and the nose tip).
① Locating the eye corners. First, within the eye regions determined above, the denoised image is projected in the horizontal and vertical directions; the valley positions are taken as the two-dimensional coordinates of the eyeball centers, around which rectangular regions are delimited as the exact eye positions. Next, the edge image within each region is binarized and its connected regions are detected; regions with too few connected points are discarded. Finally, the leftmost (rightmost) point of the selected eye region is extracted as the eye corner.
② Locating the mouth corners. Mouth corner localization is also corner detection and resembles eye corner localization, but the gray-level variation of the mouth lacks the high-contrast pupil of the eye, so its contour is indistinct. A Gaussian mixture probability model is first trained on lip color samples; every point in the mouth region is fed into the model to compute its probability, which is normalized and used as new gray-level information; the mouth corners are then extracted by the same method as the eye corners.
③ Locating the nose tip. The nose tip does not lie on a contour, so it is located from its geometric relation to the eye and mouth corners. Since the nose tip sits close to the nostrils and above the line connecting them, a gray projection first searches the nose region for the two nostril positions, and the highlight point above the nostrils is taken as the nose tip. At large face deflection angles the nose tip is prone to localization error, but in the present invention its position only affects the sign of one deflection angle, that is, the direction of rotation; it does not enter the angle magnitude computation, so the method is insensitive to this error.
d) Estimate the three deflection angles of the face.
The three deflection angles are the basis for generating interaction information. Provided the deflection angles are not too large, the face size is relatively constant and centered in the frame, and the illumination is good, the five feature points can be obtained accurately and the three deflection angles computed by the following steps.
① Coordinate transformation. The midpoint of the left and right eye corners is computed and taken as the image origin; all other points are transformed into coordinates relative to this origin. The midpoint of the two mouth corners is then computed and recorded, yielding five feature points in total: the two eye corners, the eye corner midpoint, the nose tip, and the mouth corner midpoint. These five points are the basis for estimating the three 3-D deflection angles of the face.
② Determine a frontal image. With the line through the two eye corners taken as horizontal, the face is frontal when the plane defined by this line and the mouth corner midpoint is perpendicular to the normal of the camera lens; the three deflection angles are then defined to be 0. The five feature points of this frontal image are located, transformed as in step d)①, and their coordinates computed and recorded as reference information for subsequent pose estimation.
③ For a face image captured at any time, the pose must be estimated to generate interaction information. The five feature points are located and their coordinates obtained as in step d)②. Referring to the feature points of the frontal image from step d)②, the coordinates are checked against the geometric constraints; localization results that clearly violate them are discarded and no interaction information is generated.
④ From the coordinates of the five corresponding points in the two images determined by steps d)② and d)③, using the pinhole camera model, some reasonable simplifying assumptions that do not affect the result, and a derivation based on epipolar geometry, the pose of the face in an arbitrary image is computed, giving the three deflection angles.
e) Generate the human-computer interaction information.
The three deflection angles obtained above define the mouse position and operation. Computer-mouse interaction consists of moving the pointer arbitrarily in a two-dimensional plane and single- or double-clicking the left and right buttons. The present invention positions the pointer from two of the angle values and defines the mouse operations by the frame-to-frame jumps of the third angle, thereby generating the interaction information.
f) Connect to a computer for communication.
A USB interface is developed in the device, and a driver is written following the communication protocol of a standard USB mouse to transmit the interaction information to the computer. The computer needs no special supporting program, which lightens the load on the target machine and does not interfere with running complex software.
As a supplement to traditional human-computer interaction methods, the present invention suits special user groups (such as people with limb disabilities) and interaction environments (such as multimedia games), and has significant application value.
Brief Description of the Drawings
Figure 1 is the human-computer interaction information processing flowchart of the present invention.
Figure 2 is the face template used for initial face localization.
Figure 3 shows the frontal face organ template and the positions of the five feature points used in this method.
Figure 4 illustrates how the three deflection angles of the face are defined in the present invention.
Detailed Description
The present invention is further described below through a non-limiting example.
See Figures 1, 2, 3 and 4.
In the present invention, image acquisition, timing generation, and control are implemented with a CPLD device; the image preprocessing, feature point localization, and pose estimation algorithms run on a TI DaVinci processor TMS320DM6446; and the USB interface is implemented with a Cypress controller chip. All algorithms and hardware modules are designed according to the information processing flow, and mouse operation is emulated to realize the human-computer interaction.
The main modules are described below:
(1) Face image preprocessing
I. Image denoising. Owing to the optical system and electronic devices, the image is inevitably corrupted by noise, and denoising is required to improve feature point localization accuracy. The nonlinear diffusion denoising principle is applied, with the gray-level variance of the band-shaped neighborhood regions in four directions around each target point serving as the edge map of the image; the diffusion amount at each pixel is determined by its differences from the eight neighboring pixels and the corresponding directional weight coefficients, which strengthens adaptability, reduces the number of iterations, and speeds up the computation. The iteration takes the form
I_p^(t+1) = I_p^t + λ · Σ_{q∈N8(p)} g(σ_{p,q}) · (I_q^t − I_p^t)
To avoid creating new extremum points during iteration, λ is taken as 0.125; σ_{p,q} denotes the variance of the band-shaped region in one of the four directions, according to the values of p and q, and the gradients are computed by finite differences. The diffusion function g(σ) is a decreasing edge-stopping function, so smoothing is suppressed across strong edges.
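A minimal sketch of one such iteration follows, assuming a per-pixel edge-stopping map g precomputed from the band variances (the patent applies one weight per direction; collapsing them into a single map here is a simplification for brevity).

```python
import numpy as np

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1),   (1, -1), (1, 0),  (1, 1)]

def diffusion_step(I, g, lam=0.125):
    """One nonlinear-diffusion iteration: each pixel moves toward its
    8 neighbours, attenuated by the edge-stopping map g in [0, 1]."""
    I = I.astype(np.float64)
    delta = np.zeros_like(I)
    for dy, dx in NEIGHBOURS:
        neighbour = np.roll(np.roll(I, dy, axis=0), dx, axis=1)
        delta += neighbour - I          # difference to each of the 8 neighbours
    return I + lam * g * delta          # lam = 0.125 keeps extrema bounded
```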
II. Face localization and region partition. Because the invention operates in a special human-computer interaction environment, a number of favorable conditions can be guaranteed: good illumination, a face of fairly uniform size centered in the image, and little background interference, which lightens the face localization burden. One hundred faces are labeled by hand and, after size normalization, averaged to obtain the face template; template matching then extracts the face position. Let a_{i,j} be the gray level of the pixel under test, t_{i,j} the gray level of the template pixel, and E_a, E_t their respective means; the matching coefficient is defined as the zero-mean normalized cross-correlation
m = Σ_{i,j} (a_{i,j} − E_a)(t_{i,j} − E_t) / sqrt( Σ_{i,j} (a_{i,j} − E_a)² · Σ_{i,j} (t_{i,j} − E_t)² )
Regions with m > 0.55 are candidate face regions. If several qualifying regions appear within the detection range, the contiguous face regions are averaged to obtain the face position. The face is then divided into four regions (left eye, right eye, nose, and mouth) using prior knowledge of facial structure.
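The sketch below renders this matching stage under the stated threshold; reading the matching coefficient as zero-mean normalized cross-correlation is our reconstruction, and the stride and helper names are illustrative.

```python
import numpy as np

def ncc(patch, template):
    """Zero-mean normalized cross-correlation, the coefficient m above."""
    a = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((a * a).sum() * (t * t).sum())
    return float((a * t).sum() / denom) if denom > 0 else 0.0

def detect_face(img, template, stride=4, thresh=0.55):
    """Slide the averaged face template; return candidate top-left corners."""
    th, tw = template.shape
    hits = []
    for y in range(0, img.shape[0] - th, stride):
        for x in range(0, img.shape[1] - tw, stride):
            if ncc(img[y:y+th, x:x+tw], template) > thresh:
                hits.append((y, x))
    return hits
```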
III. Edge detection. The eyes, nose, mouth, and other facial organs carry significant contour information, which an edge detection algorithm can extract as the basis for feature point detection. The present invention uses directional filtering: the gradient magnitude and direction at each point are first computed by finite differences; the gradients of the target point and its eight neighbors are then summed as vectors, and the magnitude of the sum becomes the new gray value of the point, yielding the edge image.
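A compact rendering of that directional filter, assuming central differences for the per-pixel gradients:

```python
import numpy as np

def edge_map(I):
    """Directional filtering: per-pixel gradients, vector-summed over each
    3x3 neighbourhood; the magnitude becomes the new gray value."""
    I = I.astype(np.float64)
    gy, gx = np.gradient(I)                      # central-difference gradients
    sx = np.zeros_like(gx); sy = np.zeros_like(gy)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            sx += np.roll(np.roll(gx, dy, 0), dx, 1)
            sy += np.roll(np.roll(gy, dy, 0), dx, 1)
    return np.hypot(sx, sy)
```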
(2) Locate the five feature points of the face (two eye corners, two mouth corners, and the nose tip).
I. Locating the eye corners. First, for the preprocessed image from (1)I and the approximate eye regions from (1)II, horizontal and vertical gray projections are computed; the projection curves are smoothed, and the valley positions are taken as the vertical and horizontal coordinates of the eyeball center, around which a rectangular region is delimited as the eye position. Second, the edge image from (1)III within this region is binarized with a threshold chosen by the maximum between-class variance (Otsu) method. Third, connected regions of the binary image are detected; regions with fewer than 20 connected points are rejected as interference, and the shapes of the candidate regions are verified. Finally, the leftmost (rightmost) point of the selected eye region is extracted; if several such points exist, the lowest one is taken as the eye corner.
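A sketch of the projection and corner-picking steps; the smoothing kernel width is an illustrative choice:

```python
import numpy as np

def eye_center(region):
    """Valley of the smoothed horizontal/vertical gray projections,
    taken as the (row, col) of the eyeball centre."""
    k = np.ones(5) / 5.0                                  # simple smoothing kernel
    col_proj = np.convolve(region.mean(axis=0), k, mode='same')
    row_proj = np.convolve(region.mean(axis=1), k, mode='same')
    return int(np.argmin(row_proj)), int(np.argmin(col_proj))

def corner_point(mask, leftmost=True):
    """Extreme point of a binary connected region; lowest point wins ties."""
    ys, xs = np.nonzero(mask)
    x_sel = xs.min() if leftmost else xs.max()
    cand = ys[xs == x_sel]
    return int(cand.max()), int(x_sel)                    # image y grows downward
```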
II. Locating the mouth corners. The method parallels eye corner localization, except that the gray-level variation of the mouth is less pronounced than that of the eyes, so before edge detection the lip region undergoes a color transformation based on Gaussian mixture models (GMMs). A large number of lip samples are collected by hand; the Cr and Cb chrominance values of all sample points are computed and used as two-dimensional coordinates to build a Gaussian mixture probability model of two Gaussian components, whose parameters are obtained by training on the samples. The color information of every point in the mouth region determined in step (1)II is fed into the model to compute its probability of belonging to the lips; the probabilities are mapped to the range 0-255 as a new gray image, and edges are detected by the method of (1)III. The remaining mouth corner steps are the same as the eye corner extraction.
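A sketch of the lip-probability transform, assuming scikit-learn's GaussianMixture for the EM training (the patent does not name a library):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_lip_model(lip_crcb):
    """Two-component GMM over (Cr, Cb) lip samples of shape (N, 2)."""
    return GaussianMixture(n_components=2, covariance_type='full').fit(lip_crcb)

def lip_gray_image(crcb, gmm):
    """Turn per-pixel lip probability into a 0-255 gray image.
    crcb has shape (h, w, 2)."""
    h, w, _ = crcb.shape
    p = np.exp(gmm.score_samples(crcb.reshape(-1, 2).astype(np.float64)))
    p = (p - p.min()) / (p.max() - p.min() + 1e-12)       # normalize to [0, 1]
    return (p * 255.0).reshape(h, w).astype(np.uint8)
```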
III. Locating the nose tip. The two nostrils are located first. With the distance between the two detected eye corners defined as h, and given the positional relations among the eye corners, mouth corners, and nostrils, horizontal and vertical gray projections are computed inside a rectangle of width h extending from 1.2h to 1.6h below the eye corner line; the valley of the horizontal projection gives the vertical coordinate of the nostrils, and the two valleys of the vertical projection give the horizontal coordinates of the two nostrils. The highlight point within 0.3h above the nostrils is then taken as the nose tip.
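The geometric search, condensed into a sketch; eye corners are given as (row, column) pairs, the search windows follow the ratios stated above, and boundary handling is kept minimal:

```python
import numpy as np

def nose_tip(gray, eye_l, eye_r):
    """Find the nostril row as a dark valley 1.2h..1.6h below the eye line,
    then take the brightest point up to 0.3h above it as the nose tip."""
    h = int(np.hypot(eye_r[1] - eye_l[1], eye_r[0] - eye_l[0]))
    cy, cx = (eye_l[0] + eye_r[0]) // 2, (eye_l[1] + eye_r[1]) // 2
    y0, y1 = cy + int(1.2 * h), cy + int(1.6 * h)
    band = gray[y0:y1, cx - h // 2: cx + h // 2]
    nostril_y = y0 + int(np.argmin(band.mean(axis=1)))    # dark horizontal valley
    top = max(0, nostril_y - int(0.3 * h))
    roi = gray[top:nostril_y, cx - h // 2: cx + h // 2]
    ty, tx = np.unravel_index(int(np.argmax(roi)), roi.shape)
    return top + ty, cx - h // 2 + tx                     # highlight point
```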
(3) Estimate the three deflection angles of the face.
Provided the deflection angles are not too large (within 20°) and the illumination is good, the preceding steps locate the five feature points A through E accurately, and the three deflection angles are computed by the following steps.
I. Coordinate preprocessing. The coordinates of points A and B are averaged to obtain the midpoint F, which is designated the coordinate origin, with the horizontal axis u pointing right and the vertical axis v pointing up; the two-dimensional coordinates of all subsequent image points are transformed into this frame. The coordinates of the two mouth corner points D and E are averaged to obtain the midpoint G, giving, together with the two eye corners, the eye corner midpoint, and the nose tip, the five feature points A, B, C, F, G from which the three deflection angles are computed in the following steps.
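The recentring step as a sketch; the (u, v) convention with v pointing up (image rows negated) follows the definition above:

```python
import numpy as np

def recentre(A, B, C, D, E):
    """Recentre the five detected points, given as (x, y) image coordinates.
    Origin F is the eye-corner midpoint; G is the mouth-corner midpoint.
    Returns A, B, C, F, G in the (u, v) frame used by the estimation."""
    A, B, C, D, E = (np.asarray(p, dtype=np.float64) for p in (A, B, C, D, E))
    F = (A + B) / 2.0
    G = (D + E) / 2.0
    to_uv = lambda P: np.array([P[0] - F[0], F[1] - P[1]])  # flip y so v points up
    return to_uv(A), to_uv(B), to_uv(C), np.zeros(2), to_uv(G)
```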
II. First a frontal image is determined and its three deflection angles α, β, γ are set to 0. Its five feature points A through E are located, and the coordinates of A, B, C, F, G are computed and recorded by method (3)I. This reference image is marked with subscript 1 below, so its points are written A(u_A1, v_A1), B(u_B1, v_B1), C(u_C1, v_C1), F(u_F1, v_F1), G(u_G1, v_G1).
III. For a face image captured at any moment, the five feature points are located first and the coordinates of the five points obtained by the same method as (3)II, marked with subscript 2: A(u_A2, v_A2), B(u_B2, v_B2), C(u_C2, v_C2), F(u_F2, v_F2), G(u_G2, v_G2).
IV. From the coordinates of the five corresponding points in the two images determined by steps (3)II and (3)III, the pose of the face in image 2 is derived mathematically using the pinhole camera model and epipolar geometry, yielding the three deflection angles α, β, γ. The pinhole camera model is
s·m̃ = K[R | T]·M̃
where s is the scale factor, m̃ the homogeneous image coordinates of a point, M̃ its homogeneous space coordinates, K the intrinsic parameter matrix, R the rotation matrix, and T the translation vector. Since only angle estimation is involved, every expression in the algorithm appears as a difference quotient, so the pinhole camera parameters may be assumed to be s = 1, K = diag(1, 1, 1), T = [0, 0, 0]^T. With the shorthand
c = cos γ, d = sin γ, h = sin β
it can be deduced that
γ = arctan(M)
where M is a difference-quotient expression in the coordinates of the five feature points.
Because β carries a sign ambiguity in this computation and cannot be determined uniquely from the four points A, B, F, G, point C is introduced, and the sign of β is decided from the position of C relative to the line through F and G.
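The full derivation is not reproduced in this text, but the in-plane component admits a one-line illustration under a simplifying assumption:

```python
import numpy as np

def roll_from_eye_line(A2, B2):
    """Illustration only: if the motion between the reference and current
    frames were pure in-plane rotation, gamma would be the angle of the
    eye-corner line in the (u, v) frame (horizontal in the reference).
    The patent's gamma = arctan(M) uses a fuller difference-quotient M."""
    du, dv = B2[0] - A2[0], B2[1] - A2[1]
    return float(np.degrees(np.arctan2(dv, du)))
```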
(4) Generate the human-computer interaction information.
The mouse position and operation are defined by the three angles α, β, γ. The pose α = β = 0 maps to the screen center; changes in α move the pointer vertically and changes in β move it horizontally, and an angle of 20° or more in magnitude pins the pointer to the screen edge. Jumps in γ define the mouse operations: with γ positive, an angle change between two consecutive frames of 3° to 8° is a left single click and a change above 8° is a left double click; with γ negative, a change of −3° to −8° is a right single click and a change beyond −8° is a right double click.
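These rules translate directly into code; the sketch below assumes a particular screen size and axis orientation, neither of which is specified in the patent:

```python
def pose_to_mouse(alpha, beta, gamma, prev_gamma, screen_w=1920, screen_h=1080):
    """Map pose angles to a pointer position and a button event."""
    clamp = lambda a: max(-20.0, min(20.0, a))      # >= 20 degrees pins to the edge
    x = int((clamp(beta)  / 20.0 + 1.0) / 2.0 * (screen_w - 1))
    y = int((clamp(alpha) / 20.0 + 1.0) / 2.0 * (screen_h - 1))
    d = gamma - prev_gamma                          # frame-to-frame jump
    event = None
    if gamma > 0:
        event = 'left_double_click' if d > 8.0 else ('left_click' if d >= 3.0 else None)
    elif gamma < 0:
        event = 'right_double_click' if d < -8.0 else ('right_click' if d <= -3.0 else None)
    return x, y, event
```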
(5) Connect to a PC via USB to form the mouse device.
A Cypress chip on the device implements the USB interface, and communication with the PC follows the standard USB mouse protocol. The mouse position and operations generated by the method above replace the information of a conventional mouse and are transmitted to the PC, forming a visual mouse device based on face pose estimation.
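For orientation, this is the packet a standard HID mouse delivers. The sketch shows only the report format, not the Cypress firmware, and differencing absolute pointer positions into relative moves is our assumption about how the device drives a stock mouse driver:

```python
def hid_mouse_report(left, right, dx, dy):
    """Common 3-byte boot-protocol HID mouse report: [buttons, dx, dy].
    Bit 0 of the button byte is the left key, bit 1 the right key;
    dx/dy are signed relative moves clamped to -127..127."""
    clamp = lambda v: max(-127, min(127, int(v)))
    buttons = (1 if left else 0) | (2 if right else 0)
    return bytes([buttons, clamp(dx) & 0xFF, clamp(dy) & 0xFF])
```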
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009101038842A CN101561710B (en) | 2009-05-19 | 2009-05-19 | A Human-Computer Interaction Method Based on Face Pose Estimation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009101038842A CN101561710B (en) | 2009-05-19 | 2009-05-19 | A Human-Computer Interaction Method Based on Face Pose Estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101561710A CN101561710A (en) | 2009-10-21 |
CN101561710B true CN101561710B (en) | 2011-02-09 |
Family
ID=41220526
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2009101038842A Expired - Fee Related CN101561710B (en) | 2009-05-19 | 2009-05-19 | A Human-Computer Interaction Method Based on Face Pose Estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101561710B (en) |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102156537B (en) * | 2010-02-11 | 2016-01-13 | 三星电子株式会社 | A kind of head pose checkout equipment and method |
CN104331152B (en) * | 2010-05-24 | 2017-06-23 | 原相科技股份有限公司 | Three-dimensional image interaction system |
CN102262706B (en) * | 2010-05-24 | 2014-11-05 | 原相科技股份有限公司 | Method for calculating ocular distance |
CN102375533A (en) * | 2010-08-19 | 2012-03-14 | 阳程科技股份有限公司 | Cursor control method |
CN102231093B (en) * | 2011-06-14 | 2013-07-31 | 伍斌 | Screen locating control method and device |
US8803800B2 (en) * | 2011-12-02 | 2014-08-12 | Microsoft Corporation | User interface control based on head orientation |
CN102663354B (en) * | 2012-03-26 | 2014-02-19 | 腾讯科技(深圳)有限公司 | Face calibration method and system thereof |
CN103149939B (en) * | 2013-02-26 | 2015-10-21 | 北京航空航天大学 | A kind of unmanned plane dynamic target tracking of view-based access control model and localization method |
CN103279767A (en) * | 2013-05-10 | 2013-09-04 | 杭州电子科技大学 | Human-machine interaction information generation method based on multi-feature-point combination |
CN103211605B (en) * | 2013-05-14 | 2015-02-18 | 重庆大学 | Psychological testing system and method |
CN103336577B (en) * | 2013-07-04 | 2016-05-18 | 宁波大学 | A kind of mouse control method based on human face expression identification |
CN103472915B (en) * | 2013-08-30 | 2017-09-05 | 深圳Tcl新技术有限公司 | reading control method based on pupil tracking, reading control device and display device |
CN103605466A (en) * | 2013-10-29 | 2014-02-26 | 四川长虹电器股份有限公司 | Facial recognition control terminal based method |
CN103593654B (en) * | 2013-11-13 | 2015-11-04 | 智慧城市系统服务(中国)有限公司 | A kind of method and apparatus of Face detection |
CN103942525A (en) * | 2013-12-27 | 2014-07-23 | 高新兴科技集团股份有限公司 | Real-time face optimal selection method based on video sequence |
CN104780308A (en) * | 2014-01-09 | 2015-07-15 | 联想(北京)有限公司 | Information processing method and electronic device |
CN103793693A (en) * | 2014-02-08 | 2014-05-14 | 厦门美图网科技有限公司 | Method for detecting face turning and facial form optimizing method with method for detecting face turning |
FR3021443B1 (en) * | 2014-05-20 | 2017-10-13 | Essilor Int | METHOD FOR CONSTRUCTING A MODEL OF THE FACE OF AN INDIVIDUAL, METHOD AND DEVICE FOR ANALYZING POSTURE USING SUCH A MODEL |
CN104123000A (en) * | 2014-07-09 | 2014-10-29 | 昆明理工大学 | Non-intrusive mouse pointer control method and system based on facial feature detection |
CN104573657B (en) * | 2015-01-09 | 2018-04-27 | 安徽清新互联信息科技有限公司 | It is a kind of that detection method is driven based on the blind of feature of bowing |
CN107123139A (en) * | 2016-02-25 | 2017-09-01 | 夏立 | 2D to 3D facial reconstruction methods based on opengl |
CN105701786B (en) * | 2016-03-21 | 2019-09-24 | 联想(北京)有限公司 | A kind of image processing method and electronic equipment |
CN106774936B (en) * | 2017-01-10 | 2020-01-07 | 上海木木机器人技术有限公司 | Man-machine interaction method and system |
CN106874861A (en) * | 2017-01-22 | 2017-06-20 | 北京飞搜科技有限公司 | A kind of face antidote and system |
CN106991417A (en) * | 2017-04-25 | 2017-07-28 | 华南理工大学 | A kind of visual projection's interactive system and exchange method based on pattern-recognition |
CN107122054A (en) * | 2017-04-27 | 2017-09-01 | 青岛海信医疗设备股份有限公司 | A kind of detection method and device of face deflection angle and luffing angle |
WO2019000462A1 (en) * | 2017-06-30 | 2019-01-03 | 广东欧珀移动通信有限公司 | Face image processing method and apparatus, storage medium, and electronic device |
CN107424218B (en) * | 2017-07-25 | 2020-11-06 | 成都通甲优博科技有限责任公司 | 3D try-on-based sequence diagram correction method and device |
CN108573218A (en) * | 2018-03-21 | 2018-09-25 | 漳州立达信光电子科技有限公司 | Face data collection method and terminal equipment |
CN111033508B (en) * | 2018-04-25 | 2020-11-20 | 北京嘀嘀无限科技发展有限公司 | A system and method for identifying body movements |
CN108905192A (en) * | 2018-06-01 | 2018-11-30 | 北京市商汤科技开发有限公司 | Information processing method and device, storage medium |
CN109345305A (en) * | 2018-09-28 | 2019-02-15 | 广州凯风科技有限公司 | A kind of elevator electrical screen advertisement improvement analysis method based on face recognition technology |
CN109671108B (en) * | 2018-12-18 | 2020-07-28 | 重庆理工大学 | A Pose Estimation Method for a Single Multi-view Face Image with Arbitrary Rotation in the Plane |
CN110647790A (en) * | 2019-04-26 | 2020-01-03 | 北京七鑫易维信息技术有限公司 | Method and device for determining gazing information |
CN110097021B (en) * | 2019-05-10 | 2022-09-06 | 电子科技大学 | MTCNN-based face pose estimation method |
CN110610171A (en) * | 2019-09-24 | 2019-12-24 | Oppo广东移动通信有限公司 | Image processing method and apparatus, electronic device, computer-readable storage medium |
CN112069993B (en) * | 2020-09-04 | 2024-02-13 | 西安西图之光智能科技有限公司 | Dense face detection method and system based on five-sense organ mask constraint and storage medium |
CN112488032B (en) * | 2020-12-11 | 2022-05-20 | 重庆邮电大学 | Human eye positioning method based on nose and eye structure constraint |
CN114611600A (en) * | 2022-03-09 | 2022-06-10 | 安徽大学 | A three-dimensional pose estimation method for skiers based on self-supervision technology |
CN116308371A (en) * | 2022-09-07 | 2023-06-23 | 中国电信股份有限公司 | A business processing request generation method, device, equipment and medium |
CN119292470A (en) * | 2024-12-10 | 2025-01-10 | 无锡学院 | Intelligent mouse auxiliary control method, device, electronic device and storage medium |
2009-05-19: Application CN2009101038842A filed; granted as patent CN101561710B; status: not active (Expired - Fee Related).
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408651A (en) * | 2018-09-21 | 2019-03-01 | 神思电子技术股份有限公司 | A kind of face retrieval method based on the identification of face face gesture |
CN109408651B (en) * | 2018-09-21 | 2020-09-29 | 神思电子技术股份有限公司 | Face retrieval method based on face gesture recognition |
Also Published As
Publication number | Publication date |
---|---|
CN101561710A (en) | 2009-10-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101561710B (en) | A Human-Computer Interaction Method Based on Face Pose Estimation | |
CN107358648B (en) | Real-time fully automatic high-quality 3D face reconstruction method based on a single face image | |
CN111414798B (en) | Head posture detection method and system based on RGB-D image | |
US10353465B2 (en) | Iris and pupil-based gaze estimation method for head-mounted device | |
CN105389539B (en) | A method and system for 3D gesture pose estimation based on depth data | |
CN113781640B (en) | Three-dimensional face reconstruction model building method based on weak supervision learning and application thereof | |
WO2019237942A1 (en) | Line-of-sight tracking method and apparatus based on structured light, device, and storage medium | |
CN102830793B (en) | Sight tracing and equipment | |
Barmpoutis | Tensor body: Real-time reconstruction of the human body and avatar synthesis from RGB-D | |
TW202025137A (en) | Image processing method and apparatus, electronic device, and computer readable storage medium | |
CN103473801B (en) | A kind of human face expression edit methods based on single camera Yu movement capturing data | |
CN110930374A (en) | Acupoint positioning method based on double-depth camera | |
CN111443804B (en) | Method and system for describing fixation point track based on video analysis | |
CN109614899B (en) | Human body action recognition method based on lie group features and convolutional neural network | |
CN104821010A (en) | Binocular-vision-based real-time extraction method and system for three-dimensional hand information | |
CN113807287B (en) | A 3D structured light face recognition method | |
CN116958420A (en) | A high-precision modeling method for the three-dimensional face of a digital human teacher | |
CN105869166A (en) | Human body action identification method and system based on binocular vision | |
CN116051631A (en) | Light spot labeling method and system | |
CN107358646A (en) | A kind of fatigue detecting system and method based on machine vision | |
CN108256391A (en) | A kind of pupil region localization method based on projecting integral and edge detection | |
Cao et al. | Gaze tracking on any surface with your phone | |
Ma et al. | Research on kinect-based gesture recognition | |
CN112116673B (en) | Method, system and electronic device for generating virtual human body image based on structural similarity under posture guidance | |
CN114202795A (en) | Method for quickly positioning pupils of old people |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20110209 Termination date: 20130519 |