CN101561710B - A Human-Computer Interaction Method Based on Face Pose Estimation - Google Patents
- Publication number
- CN101561710B (application CN2009101038842A)
- Authority
- CN
- China
- Prior art keywords
- face
- human
- image
- point
- people
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Image Processing (AREA)
Abstract
The present invention relates to a human-computer interaction device and method based on face pose estimation. The method proceeds as follows: a camera captures a sequence of face images; after preprocessing, five feature points (the two eye corners, the two mouth corners, and the nose tip) are extracted automatically; taking one frontal image as a reference, the three deflection angles of the face in any image are estimated from the positions and correspondences of the five feature points; the pose information defines the mouse pointer position and operation mode, generating human-computer interaction information; the device connects to a computer through a USB interface, forming a new kind of visual mouse. As a supplement to traditional human-computer interaction methods, the device suits special user groups (such as people with limb disabilities) and interaction environments (such as multimedia games), and has significant application value.
Description
Technical Field
The present invention relates to a human-computer interaction method, and in particular to a method for building a visual mouse device based on facial feature point localization and pose detection. By means of image analysis and object detection, five feature points (two eye corners, two mouth corners, and the nose tip) are extracted; taking the corresponding feature point positions of one frontal image as a reference, the three deflection angles of the real-time face image are estimated and converted into human-computer interaction information, which is sent to a computer through a USB interface, forming a visual mouse device for human-computer interaction.
Background Art
Human-computer interaction techniques enable effective dialogue between humans and computers through computer input and output devices: the machine provides information and prompts through output or display devices, and the user feeds information back through input devices, answers questions, and so on. Human-computer interaction is an important part of computer user interface design and is closely related to cognitive science, ergonomics, psychology, and other disciplines.
In the field of human-computer interaction, traditional devices such as keyboards, mice, and light pens are in wide use. In some special situations, however, they reach their limits, for example in the complex interface operations of certain multimedia games or in computer use by people with disabilities. There is a pressing need for input devices that do not depend on limb movement to supplement existing interaction methods.
To adapt to different interaction environments and user groups, researchers at home and abroad have extensively studied interaction methods based on vision (images), hearing (speech), and touch (pressure, temperature), some of which have entered preliminary use. Within visual interaction, gaze tracking and sign language recognition are current research hotspots. Gaze tracking senses the user's point of regard through a camera and pupil localization and uses it to drive pointer positioning and related operations, but it has two important drawbacks: first, pupil movement is discontinuous and does not match a continuously moving point of regard well, which reduces tracking accuracy; second, for physiological and psychological reasons eye movement is partly involuntary, so in some cases the gaze position does not reflect the user's actual intent.
To overcome these defects of gaze-tracking interaction, the present invention adopts face pose estimation: a video camera captures face images, image and video processing performs face detection and key point localization, the face pose is estimated from the feature point positions across a multi-frame image sequence, and the pose information drives the interaction. The resulting device suits special user groups (such as people with disabilities) and interaction environments (such as multimedia games), has significant application value, and extends the scope of human-computer interaction.
Summary of the Invention
Since existing human-computer interaction devices cannot fully satisfy every interaction environment and user group, the purpose of the present invention is to provide a human-computer interaction method based on face pose estimation, which uses automatic facial feature point localization, face pose estimation, and related techniques to generate interaction information and connects to a computer through a USB interface, forming a visual mouse device.
The present invention comprises the following steps:
a) A sequence of digital face images is obtained through an optical lens and a CMOS image sensor, and the data are captured through the high-speed digital video channel of a DSP.
b) The digital face images are preprocessed.
① Image denoising. Because image noise points differ markedly from their neighboring pixels in gray value, statistical characteristics, or distribution, filtering can suppress the noise and make the useful information easier to detect and recognize. Nonlinear diffusion is a good denoising method with strong edge-preserving properties, but it requires many iterations and has high computational complexity. In the present invention, the variance of band-shaped regions in four directions around the target point is used to construct an edge map for the nonlinear diffusion denoising algorithm, which reduces the number of iterations, and integral images are used for fast computation.
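The fast-computation claim rests on summed-area tables: once integral images of I and I² are built, the variance of any axis-aligned band rectangle costs a constant number of lookups. A minimal sketch follows; the function names and the caller-supplied band rectangles are illustrative, not taken from the patent.

```python
import numpy as np

def integral_images(img):
    """Summed-area tables of I and I*I, padded with a zero row/column."""
    I = img.astype(np.float64)
    ii  = np.pad(np.cumsum(np.cumsum(I,     axis=0), axis=1), ((1, 0), (1, 0)))
    ii2 = np.pad(np.cumsum(np.cumsum(I * I, axis=0), axis=1), ((1, 0), (1, 0)))
    return ii, ii2

def band_variance(ii, ii2, y0, x0, y1, x1):
    """Variance of the rectangle [y0:y1, x0:x1] in O(1) per query."""
    n  = (y1 - y0) * (x1 - x0)
    s  = ii[y1, x1]  - ii[y0, x1]  - ii[y1, x0]  + ii[y0, x0]
    s2 = ii2[y1, x1] - ii2[y0, x1] - ii2[y1, x0] + ii2[y0, x0]
    return s2 / n - (s / n) ** 2
```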
② Face detection and region partition. Multiple faces are labeled by hand, scaled to the same size, and averaged to obtain a face template. Template matching gives the approximate face position, and prior knowledge of facial structure then divides the face into four regions: left eye, right eye, nose, and mouth.
③ Edge detection. Since all five feature points lie on corners of facial organs, contour or corner information must be extracted to improve localization accuracy. Traditional edge detection methods include the Canny operator, the Sobel operator, morphological methods, and wavelet transforms. The SUSAN corner detector also reflects edge and corner information well, but its result depends on a threshold. The present invention uses a directional filtering method to obtain the edge image.
c) Locate the five feature points of the face (two eye corners, two mouth corners, and the nose tip).
① Locating the eye corners. First, within the eye regions determined above, the denoised image is projected in the horizontal and vertical directions; the valley positions are taken as the two-dimensional coordinates of the eyeball centers, around which rectangular regions are delimited as the exact eye positions. Next, the edge image within each region is binarized and its connected regions are detected; regions with too few connected points are discarded. Finally, the leftmost (rightmost) point of the selected eye region is extracted as the eye corner.
② Locating the mouth corners. Mouth corner localization is also corner detection and resembles eye corner localization, but the gray-level variation of the mouth lacks the high-contrast pupil of the eye, so its contour is indistinct. A Gaussian mixture probability model is first trained on lip color samples; every point in the mouth region is fed into the model to compute its probability, which is normalized and used as new gray-level information; the mouth corners are then extracted by the same method as the eye corners.
③ Locating the nose tip. The nose tip does not lie on a contour, so it is located from its geometric relation to the eye and mouth corners. Since the nose tip sits close to the nostrils and above the line connecting them, a gray projection first searches the nose region for the two nostril positions, and the highlight point above the nostrils is taken as the nose tip. At large face deflection angles the nose tip is prone to localization error, but in the present invention its position only affects the sign of one deflection angle, that is, the direction of rotation; it does not enter the angle magnitude computation, so the method is insensitive to this error.
d) Estimate the three deflection angles of the face.
The three deflection angles are the basis for generating interaction information. Provided the deflection angles are not too large, the face size is relatively constant and centered in the frame, and the illumination is good, the five feature points can be obtained accurately and the three deflection angles computed by the following steps.
① Coordinate transformation. The midpoint of the left and right eye corners is computed and taken as the image origin; all other points are transformed into coordinates relative to this origin. The midpoint of the two mouth corners is then computed and recorded, yielding five feature points in total: the two eye corners, the eye corner midpoint, the nose tip, and the mouth corner midpoint. These five points are the basis for estimating the three 3-D deflection angles of the face.
② Determine a frontal image. With the line through the two eye corners taken as horizontal, the face is frontal when the plane defined by this line and the mouth corner midpoint is perpendicular to the normal of the camera lens; the three deflection angles are then defined to be 0. The five feature points of this frontal image are located, transformed as in step d)①, and their coordinates computed and recorded as reference information for subsequent pose estimation.
③ For a face image captured at any time, the pose must be estimated to generate interaction information. The five feature points are located and their coordinates obtained as in step d)②. Referring to the feature points of the frontal image from step d)②, the coordinates are checked against the geometric constraints; localization results that clearly violate them are discarded and no interaction information is generated.
④ From the coordinates of the five corresponding points in the two images determined by steps d)② and d)③, using the pinhole camera model, some reasonable simplifying assumptions that do not affect the result, and a derivation based on epipolar geometry, the pose of the face in an arbitrary image is computed, giving the three deflection angles.
e) Generate the human-computer interaction information.
The three deflection angles obtained above define the mouse position and operation. Computer-mouse interaction consists of moving the pointer arbitrarily in a two-dimensional plane and single- or double-clicking the left and right buttons. The present invention positions the pointer from two of the angle values and defines the mouse operations by the frame-to-frame jumps of the third angle, thereby generating the interaction information.
f) Connect to a computer for communication.
A USB interface is developed in the device, and a driver is written following the communication protocol of a standard USB mouse to transmit the interaction information to the computer. The computer needs no special supporting program, which lightens the load on the target machine and does not interfere with running complex software.
As a supplement to traditional human-computer interaction methods, the present invention suits special user groups (such as people with limb disabilities) and interaction environments (such as multimedia games), and has significant application value.
Brief Description of the Drawings
Figure 1 is the human-computer interaction information processing flowchart of the present invention.
Figure 2 is the face template used for initial face localization.
Figure 3 shows the frontal face organ template and the positions of the five feature points used in this method.
Figure 4 illustrates how the three deflection angles of the face are defined in the present invention.
Detailed Description
The present invention is further described below through a non-limiting example.
See Figures 1, 2, 3 and 4.
In the present invention, image acquisition, timing generation, and control are implemented with a CPLD device; the image preprocessing, feature point localization, and pose estimation algorithms run on a TI DaVinci processor TMS320DM6446; and the USB interface is implemented with a Cypress controller chip. All algorithms and hardware modules are designed according to the information processing flow, and mouse operation is emulated to realize the human-computer interaction.
The main modules are described below:
(1) Face image preprocessing
I. Image denoising. Owing to the optical system and electronic devices, the image is inevitably corrupted by noise, and denoising is required to improve feature point localization accuracy. The nonlinear diffusion denoising principle is applied, with the gray-level variance of the band-shaped neighborhood regions in four directions around each target point serving as the edge map of the image; the diffusion amount at each pixel is determined by its differences from the eight neighboring pixels and the corresponding directional weight coefficients, which strengthens adaptability, reduces the number of iterations, and speeds up the computation. The iteration takes the form
I_p^(t+1) = I_p^t + λ · Σ_{q∈N8(p)} g(σ_{p,q}) · (I_q^t − I_p^t)
To avoid creating new extremum points during iteration, λ is taken as 0.125; σ_{p,q} denotes the variance of the band-shaped region in one of the four directions, according to the values of p and q, and the gradients are computed by finite differences. The diffusion function g(σ) is a decreasing edge-stopping function, so smoothing is suppressed across strong edges.
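A minimal sketch of one such iteration follows, assuming a per-pixel edge-stopping map g precomputed from the band variances (the patent applies one weight per direction; collapsing them into a single map here is a simplification for brevity).

```python
import numpy as np

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1),   (1, -1), (1, 0),  (1, 1)]

def diffusion_step(I, g, lam=0.125):
    """One nonlinear-diffusion iteration: each pixel moves toward its
    8 neighbours, attenuated by the edge-stopping map g in [0, 1]."""
    I = I.astype(np.float64)
    delta = np.zeros_like(I)
    for dy, dx in NEIGHBOURS:
        neighbour = np.roll(np.roll(I, dy, axis=0), dx, axis=1)
        delta += neighbour - I          # difference to each of the 8 neighbours
    return I + lam * g * delta          # lam = 0.125 keeps extrema bounded
```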
II. Face localization and region partition. Because the invention operates in a special human-computer interaction environment, a number of favorable conditions can be guaranteed: good illumination, a face of fairly uniform size centered in the image, and little background interference, which lightens the face localization burden. One hundred faces are labeled by hand and, after size normalization, averaged to obtain the face template; template matching then extracts the face position. Let a_{i,j} be the gray level of the pixel under test, t_{i,j} the gray level of the template pixel, and E_a, E_t their respective means; the matching coefficient is defined as the zero-mean normalized cross-correlation
m = Σ_{i,j} (a_{i,j} − E_a)(t_{i,j} − E_t) / sqrt( Σ_{i,j} (a_{i,j} − E_a)² · Σ_{i,j} (t_{i,j} − E_t)² )
Regions with m > 0.55 are candidate face regions. If several qualifying regions appear within the detection range, the contiguous face regions are averaged to obtain the face position. The face is then divided into four regions (left eye, right eye, nose, and mouth) using prior knowledge of facial structure.
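The sketch below renders this matching stage under the stated threshold; reading the matching coefficient as zero-mean normalized cross-correlation is our reconstruction, and the stride and helper names are illustrative.

```python
import numpy as np

def ncc(patch, template):
    """Zero-mean normalized cross-correlation, the coefficient m above."""
    a = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((a * a).sum() * (t * t).sum())
    return float((a * t).sum() / denom) if denom > 0 else 0.0

def detect_face(img, template, stride=4, thresh=0.55):
    """Slide the averaged face template; return candidate top-left corners."""
    th, tw = template.shape
    hits = []
    for y in range(0, img.shape[0] - th, stride):
        for x in range(0, img.shape[1] - tw, stride):
            if ncc(img[y:y+th, x:x+tw], template) > thresh:
                hits.append((y, x))
    return hits
```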
III. Edge detection. The eyes, nose, mouth, and other facial organs carry significant contour information, which an edge detection algorithm can extract as the basis for feature point detection. The present invention uses directional filtering: the gradient magnitude and direction at each point are first computed by finite differences; the gradients of the target point and its eight neighbors are then summed as vectors, and the magnitude of the sum becomes the new gray value of the point, yielding the edge image.
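A compact rendering of that directional filter, assuming central differences for the per-pixel gradients:

```python
import numpy as np

def edge_map(I):
    """Directional filtering: per-pixel gradients, vector-summed over each
    3x3 neighbourhood; the magnitude becomes the new gray value."""
    I = I.astype(np.float64)
    gy, gx = np.gradient(I)                      # central-difference gradients
    sx = np.zeros_like(gx); sy = np.zeros_like(gy)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            sx += np.roll(np.roll(gx, dy, 0), dx, 1)
            sy += np.roll(np.roll(gy, dy, 0), dx, 1)
    return np.hypot(sx, sy)
```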
(2) Locate the five feature points of the face (two eye corners, two mouth corners, and the nose tip).
I. Locating the eye corners. First, for the preprocessed image from (1)I and the approximate eye regions from (1)II, horizontal and vertical gray projections are computed; the projection curves are smoothed, and the valley positions are taken as the vertical and horizontal coordinates of the eyeball center, around which a rectangular region is delimited as the eye position. Second, the edge image from (1)III within this region is binarized with a threshold chosen by the maximum between-class variance (Otsu) method. Third, connected regions of the binary image are detected; regions with fewer than 20 connected points are rejected as interference, and the shapes of the candidate regions are verified. Finally, the leftmost (rightmost) point of the selected eye region is extracted; if several such points exist, the lowest one is taken as the eye corner.
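A sketch of the projection and corner-picking steps; the smoothing kernel width is an illustrative choice:

```python
import numpy as np

def eye_center(region):
    """Valley of the smoothed horizontal/vertical gray projections,
    taken as the (row, col) of the eyeball centre."""
    k = np.ones(5) / 5.0                                  # simple smoothing kernel
    col_proj = np.convolve(region.mean(axis=0), k, mode='same')
    row_proj = np.convolve(region.mean(axis=1), k, mode='same')
    return int(np.argmin(row_proj)), int(np.argmin(col_proj))

def corner_point(mask, leftmost=True):
    """Extreme point of a binary connected region; lowest point wins ties."""
    ys, xs = np.nonzero(mask)
    x_sel = xs.min() if leftmost else xs.max()
    cand = ys[xs == x_sel]
    return int(cand.max()), int(x_sel)                    # image y grows downward
```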
II. Locating the mouth corners. The method parallels eye corner localization, except that the gray-level variation of the mouth is less pronounced than that of the eyes, so before edge detection the lip region undergoes a color transformation based on Gaussian mixture models (GMMs). A large number of lip samples are collected by hand; the Cr and Cb chrominance values of all sample points are computed and used as two-dimensional coordinates to build a Gaussian mixture probability model of two Gaussian components, whose parameters are obtained by training on the samples. The color information of every point in the mouth region determined in step (1)II is fed into the model to compute its probability of belonging to the lips; the probabilities are mapped to the range 0-255 as a new gray image, and edges are detected by the method of (1)III. The remaining mouth corner steps are the same as the eye corner extraction.
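A sketch of the lip-probability transform, assuming scikit-learn's GaussianMixture for the EM training (the patent does not name a library):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_lip_model(lip_crcb):
    """Two-component GMM over (Cr, Cb) lip samples of shape (N, 2)."""
    return GaussianMixture(n_components=2, covariance_type='full').fit(lip_crcb)

def lip_gray_image(crcb, gmm):
    """Turn per-pixel lip probability into a 0-255 gray image.
    crcb has shape (h, w, 2)."""
    h, w, _ = crcb.shape
    p = np.exp(gmm.score_samples(crcb.reshape(-1, 2).astype(np.float64)))
    p = (p - p.min()) / (p.max() - p.min() + 1e-12)       # normalize to [0, 1]
    return (p * 255.0).reshape(h, w).astype(np.uint8)
```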
III. Locating the nose tip. The two nostrils are located first. With the distance between the two detected eye corners defined as h, and given the positional relations among the eye corners, mouth corners, and nostrils, horizontal and vertical gray projections are computed inside a rectangle of width h extending from 1.2h to 1.6h below the eye corner line; the valley of the horizontal projection gives the vertical coordinate of the nostrils, and the two valleys of the vertical projection give the horizontal coordinates of the two nostrils. The highlight point within 0.3h above the nostrils is then taken as the nose tip.
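The geometric search, condensed into a sketch; eye corners are given as (row, column) pairs, the search windows follow the ratios stated above, and boundary handling is kept minimal:

```python
import numpy as np

def nose_tip(gray, eye_l, eye_r):
    """Find the nostril row as a dark valley 1.2h..1.6h below the eye line,
    then take the brightest point up to 0.3h above it as the nose tip."""
    h = int(np.hypot(eye_r[1] - eye_l[1], eye_r[0] - eye_l[0]))
    cy, cx = (eye_l[0] + eye_r[0]) // 2, (eye_l[1] + eye_r[1]) // 2
    y0, y1 = cy + int(1.2 * h), cy + int(1.6 * h)
    band = gray[y0:y1, cx - h // 2: cx + h // 2]
    nostril_y = y0 + int(np.argmin(band.mean(axis=1)))    # dark horizontal valley
    top = max(0, nostril_y - int(0.3 * h))
    roi = gray[top:nostril_y, cx - h // 2: cx + h // 2]
    ty, tx = np.unravel_index(int(np.argmax(roi)), roi.shape)
    return top + ty, cx - h // 2 + tx                     # highlight point
```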
(3) Estimate the three deflection angles of the face.
Provided the deflection angles are not too large (within 20°) and the illumination is good, the preceding steps locate the five feature points A through E accurately, and the three deflection angles are computed by the following steps.
I. Coordinate preprocessing. The coordinates of points A and B are averaged to obtain the midpoint F, which is designated the coordinate origin, with the horizontal axis u pointing right and the vertical axis v pointing up; the two-dimensional coordinates of all subsequent image points are transformed into this frame. The coordinates of the two mouth corner points D and E are averaged to obtain the midpoint G, giving, together with the two eye corners, the eye corner midpoint, and the nose tip, the five feature points A, B, C, F, G from which the three deflection angles are computed in the following steps.
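The recentring step as a sketch; the (u, v) convention with v pointing up (image rows negated) follows the definition above:

```python
import numpy as np

def recentre(A, B, C, D, E):
    """Recentre the five detected points, given as (x, y) image coordinates.
    Origin F is the eye-corner midpoint; G is the mouth-corner midpoint.
    Returns A, B, C, F, G in the (u, v) frame used by the estimation."""
    A, B, C, D, E = (np.asarray(p, dtype=np.float64) for p in (A, B, C, D, E))
    F = (A + B) / 2.0
    G = (D + E) / 2.0
    to_uv = lambda P: np.array([P[0] - F[0], F[1] - P[1]])  # flip y so v points up
    return to_uv(A), to_uv(B), to_uv(C), np.zeros(2), to_uv(G)
```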
II. First a frontal image is determined and its three deflection angles α, β, γ are set to 0. Its five feature points A through E are located, and the coordinates of A, B, C, F, G are computed and recorded by method (3)I. This reference image is marked with subscript 1 below, so its points are written A(u_A1, v_A1), B(u_B1, v_B1), C(u_C1, v_C1), F(u_F1, v_F1), G(u_G1, v_G1).
III. For a face image captured at any moment, the five feature points are located first and the coordinates of the five points obtained by the same method as (3)II, marked with subscript 2: A(u_A2, v_A2), B(u_B2, v_B2), C(u_C2, v_C2), F(u_F2, v_F2), G(u_G2, v_G2).
IV. From the coordinates of the five corresponding points in the two images determined by steps (3)II and (3)III, the pose of the face in image 2 is derived mathematically using the pinhole camera model and epipolar geometry, yielding the three deflection angles α, β, γ. The pinhole camera model is
s·m̃ = K[R | T]·M̃
where s is the scale factor, m̃ the homogeneous image coordinates of a point, M̃ its homogeneous space coordinates, K the intrinsic parameter matrix, R the rotation matrix, and T the translation vector. Since only angle estimation is involved, every expression in the algorithm appears as a difference quotient, so the pinhole camera parameters may be assumed to be s = 1, K = diag(1, 1, 1), T = [0, 0, 0]^T. With the shorthand
c = cos γ, d = sin γ, h = sin β
it can be deduced that
γ = arctan(M)
where M is a difference-quotient expression in the coordinates of the five feature points.
Because β carries a sign ambiguity in this computation and cannot be determined uniquely from the four points A, B, F, G, point C is introduced, and the sign of β is decided from the position of C relative to the line through F and G.
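The full derivation is not reproduced in this text, but the in-plane component admits a one-line illustration under a simplifying assumption:

```python
import numpy as np

def roll_from_eye_line(A2, B2):
    """Illustration only: if the motion between the reference and current
    frames were pure in-plane rotation, gamma would be the angle of the
    eye-corner line in the (u, v) frame (horizontal in the reference).
    The patent's gamma = arctan(M) uses a fuller difference-quotient M."""
    du, dv = B2[0] - A2[0], B2[1] - A2[1]
    return float(np.degrees(np.arctan2(dv, du)))
```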
(4) Generate the human-computer interaction information.
The mouse position and operation are defined by the three angles α, β, γ. The pose α = β = 0 maps to the screen center; changes in α move the pointer vertically and changes in β move it horizontally, and an angle of 20° or more in magnitude pins the pointer to the screen edge. Jumps in γ define the mouse operations: with γ positive, an angle change between two consecutive frames of 3° to 8° is a left single click and a change above 8° is a left double click; with γ negative, a change of −3° to −8° is a right single click and a change beyond −8° is a right double click.
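These rules translate directly into code; the sketch below assumes a particular screen size and axis orientation, neither of which is specified in the patent:

```python
def pose_to_mouse(alpha, beta, gamma, prev_gamma, screen_w=1920, screen_h=1080):
    """Map pose angles to a pointer position and a button event."""
    clamp = lambda a: max(-20.0, min(20.0, a))      # >= 20 degrees pins to the edge
    x = int((clamp(beta)  / 20.0 + 1.0) / 2.0 * (screen_w - 1))
    y = int((clamp(alpha) / 20.0 + 1.0) / 2.0 * (screen_h - 1))
    d = gamma - prev_gamma                          # frame-to-frame jump
    event = None
    if gamma > 0:
        event = 'left_double_click' if d > 8.0 else ('left_click' if d >= 3.0 else None)
    elif gamma < 0:
        event = 'right_double_click' if d < -8.0 else ('right_click' if d <= -3.0 else None)
    return x, y, event
```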
(5) Connect to a PC via USB to form the mouse device.
A Cypress chip on the device implements the USB interface, and communication with the PC follows the standard USB mouse protocol. The mouse position and operations generated by the method above replace the information of a conventional mouse and are transmitted to the PC, forming a visual mouse device based on face pose estimation.
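For orientation, this is the packet a standard HID mouse delivers. The sketch shows only the report format, not the Cypress firmware, and differencing absolute pointer positions into relative moves is our assumption about how the device drives a stock mouse driver:

```python
def hid_mouse_report(left, right, dx, dy):
    """Common 3-byte boot-protocol HID mouse report: [buttons, dx, dy].
    Bit 0 of the button byte is the left key, bit 1 the right key;
    dx/dy are signed relative moves clamped to -127..127."""
    clamp = lambda v: max(-127, min(127, int(v)))
    buttons = (1 if left else 0) | (2 if right else 0)
    return bytes([buttons, clamp(dx) & 0xFF, clamp(dy) & 0xFF])
```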
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009101038842A CN101561710B (en) | 2009-05-19 | 2009-05-19 | A Human-Computer Interaction Method Based on Face Pose Estimation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009101038842A CN101561710B (en) | 2009-05-19 | 2009-05-19 | A Human-Computer Interaction Method Based on Face Pose Estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101561710A CN101561710A (en) | 2009-10-21 |
CN101561710B true CN101561710B (en) | 2011-02-09 |
Family
ID=41220526
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2009101038842A Expired - Fee Related CN101561710B (en) | 2009-05-19 | 2009-05-19 | A Human-Computer Interaction Method Based on Face Pose Estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101561710B (en) |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102156537B (en) * | 2010-02-11 | 2016-01-13 | 三星电子株式会社 | A kind of head pose checkout equipment and method |
CN104331152B (en) * | 2010-05-24 | 2017-06-23 | 原相科技股份有限公司 | Three-dimensional image interaction system |
CN102262706B (en) * | 2010-05-24 | 2014-11-05 | 原相科技股份有限公司 | Method for calculating ocular distance |
CN102375533A (en) * | 2010-08-19 | 2012-03-14 | 阳程科技股份有限公司 | Cursor control method |
CN102231093B (en) * | 2011-06-14 | 2013-07-31 | 伍斌 | Screen locating control method and device |
US8803800B2 (en) * | 2011-12-02 | 2014-08-12 | Microsoft Corporation | User interface control based on head orientation |
CN102663354B (en) * | 2012-03-26 | 2014-02-19 | 腾讯科技(深圳)有限公司 | Face calibration method and system thereof |
CN103149939B (en) * | 2013-02-26 | 2015-10-21 | 北京航空航天大学 | A kind of unmanned plane dynamic target tracking of view-based access control model and localization method |
CN103279767A (en) * | 2013-05-10 | 2013-09-04 | 杭州电子科技大学 | Human-machine interaction information generation method based on multi-feature-point combination |
CN103211605B (en) * | 2013-05-14 | 2015-02-18 | 重庆大学 | Psychological testing system and method |
CN103336577B (en) * | 2013-07-04 | 2016-05-18 | 宁波大学 | A kind of mouse control method based on human face expression identification |
CN103472915B (en) * | 2013-08-30 | 2017-09-05 | 深圳Tcl新技术有限公司 | reading control method based on pupil tracking, reading control device and display device |
CN103605466A (en) * | 2013-10-29 | 2014-02-26 | 四川长虹电器股份有限公司 | Facial recognition control terminal based method |
CN103593654B (en) * | 2013-11-13 | 2015-11-04 | 智慧城市系统服务(中国)有限公司 | A kind of method and apparatus of Face detection |
CN103942525A (en) * | 2013-12-27 | 2014-07-23 | 高新兴科技集团股份有限公司 | Real-time face optimal selection method based on video sequence |
CN104780308A (en) * | 2014-01-09 | 2015-07-15 | 联想(北京)有限公司 | Information processing method and electronic device |
CN103793693A (en) * | 2014-02-08 | 2014-05-14 | 厦门美图网科技有限公司 | Method for detecting face turning and facial form optimizing method with method for detecting face turning |
FR3021443B1 (en) * | 2014-05-20 | 2017-10-13 | Essilor Int | METHOD FOR CONSTRUCTING A MODEL OF THE FACE OF AN INDIVIDUAL, METHOD AND DEVICE FOR ANALYZING POSTURE USING SUCH A MODEL |
CN104123000A (en) * | 2014-07-09 | 2014-10-29 | 昆明理工大学 | Non-intrusive mouse pointer control method and system based on facial feature detection |
CN104573657B (en) * | 2015-01-09 | 2018-04-27 | 安徽清新互联信息科技有限公司 | It is a kind of that detection method is driven based on the blind of feature of bowing |
CN107123139A (en) * | 2016-02-25 | 2017-09-01 | 夏立 | 2D to 3D facial reconstruction methods based on opengl |
CN105701786B (en) * | 2016-03-21 | 2019-09-24 | 联想(北京)有限公司 | A kind of image processing method and electronic equipment |
CN106774936B (en) * | 2017-01-10 | 2020-01-07 | 上海木木机器人技术有限公司 | Man-machine interaction method and system |
CN106874861A (en) * | 2017-01-22 | 2017-06-20 | 北京飞搜科技有限公司 | A kind of face antidote and system |
CN106991417A (en) * | 2017-04-25 | 2017-07-28 | 华南理工大学 | A kind of visual projection's interactive system and exchange method based on pattern-recognition |
CN107122054A (en) * | 2017-04-27 | 2017-09-01 | 青岛海信医疗设备股份有限公司 | A kind of detection method and device of face deflection angle and luffing angle |
WO2019000462A1 (en) * | 2017-06-30 | 2019-01-03 | 广东欧珀移动通信有限公司 | Face image processing method and apparatus, storage medium, and electronic device |
CN107424218B (en) * | 2017-07-25 | 2020-11-06 | 成都通甲优博科技有限责任公司 | 3D try-on-based sequence diagram correction method and device |
CN108573218A (en) * | 2018-03-21 | 2018-09-25 | 漳州立达信光电子科技有限公司 | Face data collection method and terminal equipment |
CN111033508B (en) * | 2018-04-25 | 2020-11-20 | 北京嘀嘀无限科技发展有限公司 | A system and method for identifying body movements |
CN108905192A (en) * | 2018-06-01 | 2018-11-30 | 北京市商汤科技开发有限公司 | Information processing method and device, storage medium |
CN109345305A (en) * | 2018-09-28 | 2019-02-15 | 广州凯风科技有限公司 | A kind of elevator electrical screen advertisement improvement analysis method based on face recognition technology |
CN109671108B (en) * | 2018-12-18 | 2020-07-28 | 重庆理工大学 | A Pose Estimation Method for a Single Multi-view Face Image with Arbitrary Rotation in the Plane |
CN110647790A (en) * | 2019-04-26 | 2020-01-03 | 北京七鑫易维信息技术有限公司 | Method and device for determining gazing information |
CN110097021B (en) * | 2019-05-10 | 2022-09-06 | 电子科技大学 | MTCNN-based face pose estimation method |
CN110610171A (en) * | 2019-09-24 | 2019-12-24 | Oppo广东移动通信有限公司 | Image processing method and apparatus, electronic device, computer-readable storage medium |
CN112069993B (en) * | 2020-09-04 | 2024-02-13 | 西安西图之光智能科技有限公司 | Dense face detection method and system based on five-sense organ mask constraint and storage medium |
CN112488032B (en) * | 2020-12-11 | 2022-05-20 | 重庆邮电大学 | Human eye positioning method based on nose and eye structure constraint |
CN114611600A (en) * | 2022-03-09 | 2022-06-10 | 安徽大学 | A three-dimensional pose estimation method for skiers based on self-supervision technology |
CN116308371A (en) * | 2022-09-07 | 2023-06-23 | 中国电信股份有限公司 | A business processing request generation method, device, equipment and medium |
CN119292470A (en) * | 2024-12-10 | 2025-01-10 | 无锡学院 | Intelligent mouse auxiliary control method, device, electronic device and storage medium |
2009-05-19: Application CN2009101038842A filed; granted as patent CN101561710B; status: not active (Expired - Fee Related).
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408651A (en) * | 2018-09-21 | 2019-03-01 | 神思电子技术股份有限公司 | A kind of face retrieval method based on the identification of face face gesture |
CN109408651B (en) * | 2018-09-21 | 2020-09-29 | 神思电子技术股份有限公司 | Face retrieval method based on face gesture recognition |
Also Published As
Publication number | Publication date |
---|---|
CN101561710A (en) | 2009-10-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101561710B (en) | A Human-Computer Interaction Method Based on Face Pose Estimation | |
CN107358648B (en) | Real-time fully automatic high-quality 3D face reconstruction method based on a single face image | |
CN111414798B (en) | Head posture detection method and system based on RGB-D image | |
US10353465B2 (en) | Iris and pupil-based gaze estimation method for head-mounted device | |
CN105389539B (en) | A method and system for 3D gesture pose estimation based on depth data | |
CN113781640B (en) | Three-dimensional face reconstruction model building method based on weak supervision learning and application thereof | |
WO2019237942A1 (en) | Line-of-sight tracking method and apparatus based on structured light, device, and storage medium | |
CN102830793B (en) | Sight tracing and equipment | |
Barmpoutis | Tensor body: Real-time reconstruction of the human body and avatar synthesis from RGB-D | |
TW202025137A (en) | Image processing method and apparatus, electronic device, and computer readable storage medium | |
CN103473801B (en) | A kind of human face expression edit methods based on single camera Yu movement capturing data | |
CN110930374A (en) | Acupoint positioning method based on double-depth camera | |
CN111443804B (en) | Method and system for describing fixation point track based on video analysis | |
CN109614899B (en) | Human body action recognition method based on lie group features and convolutional neural network | |
CN104821010A (en) | Binocular-vision-based real-time extraction method and system for three-dimensional hand information | |
CN113807287B (en) | A 3D structured light face recognition method | |
CN116958420A (en) | A high-precision modeling method for the three-dimensional face of a digital human teacher | |
CN105869166A (en) | Human body action identification method and system based on binocular vision | |
CN116051631A (en) | Light spot labeling method and system | |
CN107358646A (en) | A kind of fatigue detecting system and method based on machine vision | |
CN108256391A (en) | A kind of pupil region localization method based on projecting integral and edge detection | |
Cao et al. | Gaze tracking on any surface with your phone | |
Ma et al. | Research on kinect-based gesture recognition | |
CN112116673B (en) | Method, system and electronic device for generating virtual human body image based on structural similarity under posture guidance | |
CN114202795A (en) | Method for quickly positioning pupils of old people |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20110209 Termination date: 20130519 |