CN1156248C - Method for detecting moving human face

Info

Publication number: CN1156248C
Application number: CNB011204281A
Authority: CN (China)
Original language: Chinese (zh)
Other versions: CN1325662A
Inventors: 徐光祐 (Xu Guangyou), 彭振云 (Peng Zhenyun)
Assignee (original and current): Tsinghua University
Application filed by Tsinghua University
Legal status: Expired - Fee Related

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention relates to a method for detecting facial features in moving images. Face images are captured to form a training set; principal component analysis, the Hough transform, and related operations are applied so that the positions and sizes of candidate eyes match those of the eyes in the training-set images; the candidates are projected into a characteristic-eye (eigen-eye) subspace, and the candidate pair with the smallest error between the original eyes and their projection is taken as the detection result. The exact positions of the mouth corners, nostrils, and nose tip are then obtained by integral projection. Compared with existing methods, detection speed is improved by a factor of 225 and accuracy by 1.27 percentage points.

Description

Method for detecting facial features in moving images
Technical field:
The invention relates to a method for detecting facial features in moving images, and belongs to the technical field of computer vision.
Background art:
Existing face feature detection methods operate on still images. In "Robust detection of facial features by generalized symmetry" (Proceedings of the 11th International Conference on Pattern Recognition, 1992, pp. 117-120), D. Reisfeld and Y. Yeshurun proposed a typical still-image face feature detection method. Its principle is as follows: based on the local and global symmetry of the human face, a complex measure of symmetry (the symmetry magnitude) is defined; this measure is then computed for each edge point in the image by iterating an energy function, and the points with maximum symmetry are taken as feature points. The method can detect the pupils and mouth corners in a face with an accuracy of about 95%, and the detection time is about 3 minutes per image. Its main disadvantages are: (1) because prior knowledge of the human face is not fully exploited, the computation cost is large and the detection speed is low, making the method unsuitable for real-time applications such as visual communication and contactless computer operation; (2) since only the information in a single still image is used, the search result can be neither verified nor corrected; (3) only the feature points of a single image can be retrieved, so the method cannot be applied to moving images.
Summary of the invention:
The aim of the invention is to provide a face feature detection method for moving images that quickly and accurately locates the two pupils, two mouth corners, two nostrils, and the nose tip of a face in a moving image, thereby overcoming the low speed and low accuracy of still-image face detection methods. The detected results can be used in applications such as face recognition, visual communication, image coding, and contactless computer operation.
The invention provides a method for detecting facial features in moving images, comprising the following steps:
1. Capture 300 face images of different sexes, ages, poses, and illumination to form a training set, and geometrically calibrate the eyes in the training-set images by a homogeneous transformation so that the sizes and positions of the eyes are exactly consistent across images.
2. Perform principal component analysis on the calibrated eyes in the training-set images to obtain a set of feature vectors, called characteristic eyes (eigen-eyes), which span a characteristic eye subspace.
3. For a face image of a test subject, first obtain several candidate eyes by the Hough transform; geometrically calibrate each pair of candidate eyes with a homogeneous transformation so that their positions and sizes match those of the eyes in the training-set images; then project the candidates into the characteristic eye subspace, and take the candidate pair with the smallest error between the original eyes and their projection as the detection result.
4. Once the eye positions are determined by the above steps, estimate the mouth position from the structural characteristics of the face and obtain the exact positions of the mouth corners by integral projection; then estimate the nose position from the mouth and eye positions and accurately locate the nostrils and nose tip by integral projection.
5. If false detection or missed detection occurs, estimate the positions of the eyes, nose, and mouth in the current frame from the feature points in the previous frame according to motion smoothness and planar motion constraints.
The face detection method of the invention was tested on 50 image sequences with different poses, illumination, image sizes, sexes, ages, and backgrounds. The correct detection rate is 96.27%, and the average detection time is 40 seconds per sequence (each sequence contains 50 frames). Compared with the existing method, detection speed is improved by a factor of 225 and accuracy by 1.27 percentage points.
The invention can detect facial features in moving images in real time with an accuracy of 96.27%, and can be used in the following application fields. (1) Face recognition. Face recognition methods fall into two broad categories, image-based and feature-based. For the former, the feature points obtained by this method can be used to calibrate pose and guide image matching; for the latter, the facial features can be used directly as recognition criteria. (2) Visual communication. The biggest challenge in visual communication is resolving the conflict between channel bandwidth and the large amount of data to be transmitted. With this method, the sending end only needs to transmit a few key-frame images; for non-key frames it detects and transmits only the feature points, from which the receiving end restores the non-key frames. In this way, the required transmission bandwidth can be reduced by several orders of magnitude. (3) Moving image coding. Content-based coding and retrieval methods are becoming the new moving-picture compression standards (e.g., MPEG-4 and MPEG-7). Facial features are important image content, and this method can serve as an effective implementation of and supplement to such coding methods. (4) Contactless computer operation. In many situations, such as a disabled person operating a computer or nuclear reactor control, the user cannot operate the computer with a keyboard or mouse; the computer can instead be controlled by tracking the gaze point of the eyes. The method of the invention detects the facial feature points in real time, and the on-screen position the pupils are fixating is obtained from a three-dimensional geometric model and a calibrated camera model, so that the computer can respond accordingly.
Description of the drawings:
Fig. 1 shows the definition of the mouth region.
Fig. 2 shows the definition of the nose region.
Fig. 3 shows the feature-point distances used in the motion smoothness constraint.
Specific embodiments:
1. Geometric calibration
Capture 100 to 300 face images of different sexes, ages, poses, and illumination to form a training set. Through a homogeneous transformation, geometrically calibrate the eyes in the training-set images so that the sizes and positions of the eyes are exactly consistent across images. In the detection step below, the same geometric calibration is applied to the eyes in the face image under test, so that the relative positions of the two pupils are the same in the training images and the test image.
Assume the original image is I(x, y) and the two pupil positions are known to be $E_L(x_L, y_L)$ and $E_R(x_R, y_R)$, with the line joining the pupils making an angle $\theta$ with the horizontal axis. The image I(x, y) is transformed into I'(x', y') by the homogeneous transformation of Eq. (1), so that the pupils move to $E_{L0}(x_{L0}, y_{L0})$ and $E_{R0}(x_{R0}, y_{R0})$. $E_{L0}$ and $E_{R0}$ are fixed pupil positions with $y_{L0} = y_{R0}$, i.e. the pupil line is parallel to the horizontal axis.
$$[x', y', 1]^T = S\,T\,R\,[x, y, 1]^T \qquad (1)$$
where R, T, and S are a rotation, a translation, and a scale transformation, respectively:
$$R = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (2)$$
$$T = \begin{bmatrix} 1 & 0 & x_{L0} - x_L \\ 0 & 1 & y_{L0} - y_L \\ 0 & 0 & 1 \end{bmatrix} \qquad (3)$$
$$S = \begin{bmatrix} s & 0 & (1-s)\,x_{L0} \\ 0 & s & (1-s)\,y_{L0} \\ 0 & 0 & 1 \end{bmatrix}, \qquad s = \frac{d(E_{L0}, E_{R0})}{d(E_L, E_R)} \qquad (4)$$
where $d(\cdot,\cdot)$ denotes the distance between two points.
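As a concrete illustration of the calibration in Eqs. (1)-(4), the composite transform can be sketched in Python with NumPy. The function name, the exact composition of the translation and scale, and the sample pupil coordinates are illustrative assumptions, not taken verbatim from the patent:

```python
import numpy as np

def calibration_matrix(eL, eR, eL0, eR0):
    """Compose a rotate-translate-scale homogeneous transform that maps
    detected pupils eL, eR onto fixed positions eL0, eR0 (sketch of
    Eqs. 1-4; names and composition details are illustrative)."""
    xL, yL = eL
    xR, yR = eR
    theta = np.arctan2(yR - yL, xR - xL)          # angle of the pupil line
    c, s = np.cos(theta), np.sin(theta)
    # rotation: make the pupil line horizontal
    R = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
    rL = R @ np.array([xL, yL, 1.0])              # left pupil after rotation
    # translation: move the rotated left pupil to its fixed position
    T = np.array([[1.0, 0.0, eL0[0] - rL[0]],
                  [0.0, 1.0, eL0[1] - rL[1]],
                  [0.0, 0.0, 1.0]])
    # scale about the fixed left pupil so pupil distances match
    k = np.hypot(eR0[0] - eL0[0], eR0[1] - eL0[1]) / np.hypot(xR - xL, yR - yL)
    S = np.array([[k, 0.0, (1 - k) * eL0[0]],
                  [0.0, k, (1 - k) * eL0[1]],
                  [0.0, 0.0, 1.0]])
    return S @ T @ R

# sample pupils on a tilted face, mapped to fixed positions (20,20), (60,20)
M = calibration_matrix((40.0, 60.0), (70.0, 100.0), (20.0, 20.0), (60.0, 20.0))
left = M @ np.array([40.0, 60.0, 1.0])
right = M @ np.array([70.0, 100.0, 1.0])
```

After the transform, both pupils land exactly on the fixed positions, so every calibrated eye region has the same geometry.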
2. Acquisition of the characteristic eye subspace
Perform principal component analysis on the calibrated eyes in the training-set images to obtain a set of feature vectors, called characteristic eyes, which span the characteristic eye subspace.
Assume that after calibration the eye region has size w × h = n pixels, so each eye image can be represented as an n-dimensional vector $i \in R^n$. Let the training set be $\{i_1, i_2, \ldots, i_m\}$, $i_k \in R^n$, $k = 1, 2, \ldots, m$.
First, the average image (i.e. average eye) of the training set is found:
$$\mu = \frac{1}{m}\sum_{k=1}^{m} i_k, \qquad \mu \in R^n \qquad (5)$$
Then, the covariance matrix of the training-set samples is calculated:
$$R = \frac{1}{m}\sum_{k=1}^{m}(i_k - \mu)(i_k - \mu)^T = AA^T, \qquad R \in R^{n \times n} \qquad (6)$$
where
$$A = [\,i_1 - \mu,\; i_2 - \mu,\; \ldots,\; i_m - \mu\,], \qquad A \in R^{n \times m} \qquad (7)$$
according to the Singular Value Decomposition (SVD) theorem, it is possible to pass through the matrix ATA∈Rm×mObtaining AA from the set of orthogonal feature vectorsT∈Rn×nOrthogonal feature vector set (u) of1,u2,...,ur). Will (u)1,u2,...,ur) Quadrature obtained after normalizationThe set of feature vectors is still represented as (u)1,u2,...,ur) This is exactly the eigenvector of the covariance matrix R of the training set.
In actual use, only the set of feature vectors (u) for which the following expression holds is taken1,u2,...,u1),
<math> <mrow> <munderover> <mi>&Sigma;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>I</mi> </munderover> <mo>|</mo> <msub> <mi>&lambda;</mi> <mi>i</mi> </msub> <mo>|</mo> <mo>&GreaterEqual;</mo> <mn>0.95</mn> <munderover> <mi>&Sigma;</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>r</mi> </munderover> <mo>|</mo> <msub> <mi>&lambda;</mi> <mi>i</mi> </msub> <mo>|</mo> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>8</mn> <mo>)</mo> </mrow> </mrow> </math>
In algebraic sense, the covariance matrix R of the training set completely expresses all the information of the training set, and R can be used (u)1,u2,...,u1) Complete representation, therefore, if the selected training set includes human eye images in all cases, it can be considered as being composed of (u)1,u2,...,u1) The formed subspace can fully describe the human eye. That is, any human eye can use (u)1,u2,...,u1) Is expressed in linear combinations. We call (u)1,u2,...,u1) For characterizing the eye, it is called as (u)1,u2,...,u1) The constructed subspace is the characteristic eye subspace.
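The eigen-eye construction of Eqs. (5)-(8), including the small-matrix trick of diagonalizing $A^T A$ instead of $AA^T$, can be sketched as follows. The function name and the random stand-in "eye images" are illustrative assumptions:

```python
import numpy as np

def eigen_eyes(images, energy=0.95):
    """Sketch of Eqs. (5)-(8): mean eye, covariance via the m x m matrix
    A^T A, and truncation to the eigenvectors carrying 95% of the energy.
    `images` is an (m, n) array of flattened calibrated eye regions."""
    m, n = images.shape
    mu = images.mean(axis=0)                   # average eye, Eq. (5)
    A = (images - mu).T                        # A in R^{n x m}, Eq. (7)
    lam, v = np.linalg.eigh(A.T @ A)           # eigenpairs of the m x m matrix
    order = np.argsort(lam)[::-1]              # sort eigenvalues descending
    lam, v = lam[order], v[:, order]
    ratio = np.cumsum(np.abs(lam)) / np.sum(np.abs(lam))
    l = int(np.searchsorted(ratio, energy) + 1)  # smallest l meeting Eq. (8)
    U = A @ v[:, :l]                           # lift to eigenvectors of A A^T
    U /= np.linalg.norm(U, axis=0)             # orthonormal eigen-eyes
    return mu, U

rng = np.random.default_rng(0)
imgs = rng.random((10, 50))                    # 10 stand-in eyes of 50 pixels
mu, U = eigen_eyes(imgs)
```

The columns of `U` are the characteristic eyes; any calibrated eye region can then be approximated by its coordinates in this basis.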
Suppose an input image of size w × h is represented as $p \in R^n$. Projecting it into the characteristic eye subspace means writing
$$p = \sum_{i=1}^{l} c_i u_i = U\,(c_1, c_2, \ldots, c_l)^T \qquad (9)$$
Since the columns of U are orthonormal,
$$(c_1, c_2, \ldots, c_l)^T = U^T p \qquad (10)$$
Thus we obtain the projection of p in the characteristic eye subspace:
$$p' = \sum_{i=1}^{l} c_i u_i$$
The difference between p and p' is described by their correlation $\delta(p, p')$:
$$\delta(p, p') = \frac{E(pp') - E(p)\,E(p')}{\sigma(p)\,\sigma(p')} \qquad (11)$$
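The projection of Eqs. (9)-(10) and the correlation score of Eq. (11) can be sketched together. The orthonormal stand-in basis and the function name are illustrative, not from the patent:

```python
import numpy as np

def reconstruction_similarity(p, U):
    """Project candidate eye p onto the eigen-eye basis U (Eq. 10),
    rebuild it (Eq. 9), and score p vs. its reconstruction with the
    normalised correlation delta(p, p') of Eq. (11)."""
    c = U.T @ p                                # subspace coordinates, Eq. (10)
    p_rec = U @ c                              # reconstruction p'
    a, b = p - p.mean(), p_rec - p_rec.mean()  # centred copies
    delta = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return p_rec, delta

# stand-in orthonormal basis; an "eye" lying in the subspace scores delta ~ 1
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((30, 5)))
p_in = U @ np.array([3.0, -1.0, 2.0, 0.5, 1.0])
rec, delta = reconstruction_similarity(p_in, U)
```

A true eye reconstructs almost perfectly and yields $\delta$ near 1, while a non-eye region loses energy outside the subspace and scores lower.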
3. Eye detection
First, several candidate eyes are obtained from the face image of the test subject. Each pair of candidate eyes is geometrically calibrated with a homogeneous transformation so that their positions and sizes match those of the eyes in the training-set images, and then projected into the characteristic eye subspace. Finally, the candidate pair with the smallest error between the original eyes and their projection is taken as the detection result.
Specifically, k candidate pupils $C_1, C_2, \ldots, C_k$ are obtained by the Hough transform, and a complete graph G is constructed with $C_1, C_2, \ldots, C_k$ as nodes. For the edge between $C_i$ and $C_j$, a benefit function B(i, j) is defined as follows:
$$B(i,j) = \big(k_1\,\delta(p_{ij}, p'_{ij}) + k_2\,\gamma(p_{ij}, p'_{ij})\big)\cdot D(i,j)\cdot A(i,j) \qquad (12)$$
where $k_1, k_2 \in [0, 1]$, $k_1 + k_2 = 1.0$; $p_{ij}$ is the eye region cut from the image with $C_i$ as the left pupil and $C_j$ as the right pupil; $p'_{ij}$ is the projection of $p_{ij}$ into the characteristic eye subspace; $\gamma(p_{ij}, p'_{ij})$ is a similarity and symmetry measure; $\delta(p_{ij}, p'_{ij})$ describes the authenticity of the eye (Eq. 11); and D(i, j) and A(i, j) are constraints on the inter-ocular distance and angle.
The pupil pair $(C_l, C_r)$ satisfying the following condition is taken as the correct pupil positions:
$$B(l,r) = \max_{i,j=1,2,\ldots,k} B(i,j) \;\ge\; k_1\,\delta_0 + k_2\,\gamma_0 \qquad (13)$$
where $\gamma_0$ is the similarity and symmetry threshold and $\delta_0$ is the eye-authenticity threshold. If no B(l, r) satisfies Eq. (13), the binarization threshold is increased and the detection is adaptively repeated.
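The pair-selection rule of Eqs. (12)-(13) can be sketched as an exhaustive search over candidate pairs. The scoring callables, weights, and thresholds below are stand-ins for the measures the patent defines, chosen only to make the sketch self-contained:

```python
import itertools

def best_pupil_pair(candidates, delta, gamma, D, A,
                    k1=0.5, k2=0.5, delta0=0.6, gamma0=0.6):
    """Score every ordered pair of candidate pupils with the benefit
    B(i,j) of Eq. (12) and keep the best pair if it clears the
    threshold k1*delta0 + k2*gamma0 of Eq. (13)."""
    best_score, best_pair = float("-inf"), None
    for i, j in itertools.permutations(range(len(candidates)), 2):
        b = (k1 * delta(i, j) + k2 * gamma(i, j)) * D(i, j) * A(i, j)
        if b > best_score:
            best_score, best_pair = b, (i, j)
    if best_score >= k1 * delta0 + k2 * gamma0:
        return best_pair, best_score
    return None, best_score   # caller may relax the binarisation threshold

# three stand-in candidates; the pair (0, 2) is made to score highest
pair, score = best_pupil_pair(
    candidates=[(10, 40), (55, 41), (52, 40)],
    delta=lambda i, j: 0.9 if (i, j) == (0, 2) else 0.3,
    gamma=lambda i, j: 0.8 if (i, j) == (0, 2) else 0.2,
    D=lambda i, j: 1.0,
    A=lambda i, j: 1.0)
```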
4. Mouth and nose detection
(1) Mouth corner detection
First, the mouth region is estimated from the pupil positions using anthropometric data. As shown in Fig. 1, if the two pupils are $C_l$ and $C_r$, the mouth region can be roughly estimated as the parallelogram ABCD. Horizontal and vertical integral projections are computed inside ABCD as follows:
$$H(y) = \sum_{x=AD(y)}^{BC(y)} I(x, y) \qquad (14)$$
$$V(x) = \sum_{y=AB(x)}^{DC(x)} I(x, y) \qquad (15)$$
where y = AB(x) and y = DC(x) are the line equations of AB and DC, and x = AD(y) and x = BC(y) are the line equations of AD and BC. H(y) is computed from the original image, while V(x) combines the vertical gradient map with the original image.
The valley point of histogram H(y) gives the vertical position of the mouth corners, and the two valley points of histogram V(x) on either side of its median give the horizontal positions of the mouth corners; the two mouth corners are thereby located.
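The integral projections of Eqs. (14)-(15) and the valley search can be sketched on a synthetic example. Note the simplifications: the patent sums over a parallelogram and mixes in a gradient map for V(x), while this sketch assumes an axis-aligned rectangle, raw intensities, and the middle column as the "median" split:

```python
import numpy as np

def mouth_corners(region):
    """Toy version of Eqs. (14)-(15): horizontal and vertical integral
    projections of a rectangular mouth region, with the corner
    coordinates read off the projection valleys."""
    H = region.sum(axis=1)                    # H(y), Eq. (14)
    V = region.sum(axis=0)                    # V(x), Eq. (15)
    y = int(np.argmin(H))                     # valley row: the mouth line
    mid = len(V) // 2                         # stand-in for the median split
    xl = int(np.argmin(V[:mid]))              # valley left of the split
    xr = mid + int(np.argmin(V[mid:]))        # valley right of the split
    return (xl, y), (xr, y)

# bright patch (9) with a darker mouth line (3) and darkest corners (0)
img = np.full((7, 11), 9.0)
img[3, 2:9] = 3.0
img[3, 2] = img[3, 8] = 0.0
corner_l, corner_r = mouth_corners(img)
```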
(2) Nostril and nose tip detection
The nostrils are detected as follows:
1) Roughly estimate the nose region from the mouth region (Fig. 2);
2) Obtain the baseline y = $y_n$ of the nose by integral projection;
3) The two nostrils $N_1(x_{n1}, y_n)$ and $N_2(x_{n2}, y_n)$ are the points lying on the baseline y = $y_n$ that satisfy:
$$S(x_{n1}) = \min_{x \in [x_3,\, x_m]} S(x) \qquad (16)$$
$$S(x_{n2}) = \min_{x \in [x_m,\, x_4]} S(x) \qquad (17)$$
where
$$S(x) = \sum_{(x,y) \in \mathrm{Circle}(x,\, y_n,\, r_n)} I'(x, y) \qquad (18)$$
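The circle-sum criterion of Eqs. (16)-(18) can be sketched as a search for the darkest circular window along the nose baseline. The image, radius, and search range below are illustrative stand-ins:

```python
import numpy as np

def nostril_x(image, y_n, r_n, x_range):
    """Sketch of Eqs. (16)-(18): slide a circle of radius r_n along the
    baseline y = y_n and return the x whose circle-summed intensity
    S(x) is smallest, i.e. the darkest spot (a nostril)."""
    dy, dx = np.mgrid[-r_n:r_n + 1, -r_n:r_n + 1]
    mask = dy * dy + dx * dx <= r_n * r_n     # pixels inside the circle
    def S(x):
        patch = image[y_n - r_n:y_n + r_n + 1, x - r_n:x + r_n + 1]
        return patch[mask].sum()
    return min(x_range, key=S)

# bright face patch with one dark 3x3 blob standing in for a nostril
img = np.full((20, 20), 9.0)
img[9:12, 6:9] = 0.0
x_found = nostril_x(img, y_n=10, r_n=2, x_range=range(3, 17))
```

Running this once per interval, $[x_3, x_m]$ and $[x_m, x_4]$, locates the two nostrils.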
5. Verification and correction of facial features in moving images
Features are detected in every frame of the moving image with the method above. If false detection or missed detection occurs, the feature-point positions in the current frame are estimated from the feature points in the previous frame according to motion smoothness and planar motion constraints. The specific procedure is as follows:
1) Starting from frame 1, detect features frame by frame in the manner described above until the variation between the features of 3 successive frames is less than a given threshold. These 3 frames are called reference frames, and their features are considered correct.
2) Given a reference frame, the feature detection steps for its neighboring frame (the target frame) are:
(1) Estimate the feature regions of the target frame from the reference-frame features.
(2) Detect the features of the target frame within the estimated regions, using the method described above.
(3) Verify the detection result with the motion smoothness constraint.
The principle of the smoothness constraint is that between two adjacent frames (the reference frame and the target frame), the head movement amplitude and the change in distance between camera and face are small, so the facial feature points should change little. As shown in Fig. 3, the variation of five distances between feature points across two adjacent images should be below a threshold; otherwise the detection is considered false.
(4) If the detected features do not satisfy the motion smoothness constraint, estimate the facial features of the target frame with a planar motion model.
The two pupils, two mouth corners, and two nostrils of a human face can be considered to lie approximately on a plane, which should satisfy a planar rigid-motion constraint between the two frames. Let $x = (x_1, x_2)$ be a feature point in the reference frame; the corresponding feature point $x' = (x'_1, x'_2)$ in the target frame can be estimated by Eqs. (19) and (20):
$$x'_1 = \frac{a_1 x_1 + a_2 x_2 + a_3}{a_7 x_1 + a_8 x_2 + 1} \qquad (19)$$
$$x'_2 = \frac{a_4 x_1 + a_5 x_2 + a_6}{a_7 x_1 + a_8 x_2 + 1} \qquad (20)$$
where $a_1, \ldots, a_8$ are the planar motion parameters. If 4 corresponding feature points in the reference frame and the target frame are known, $a_1, \ldots, a_8$ can be solved from Eq. (21), each point pair contributing two equations:
$$\begin{bmatrix} x_1 & x_2 & 1 & 0 & 0 & 0 & -x_1 x'_1 & -x_2 x'_1 \\ 0 & 0 & 0 & x_1 & x_2 & 1 & -x_1 x'_2 & -x_2 x'_2 \end{bmatrix} A = \begin{bmatrix} x'_1 \\ x'_2 \end{bmatrix} \qquad (21)$$
$$A = [\,a_1\; a_2\; a_3\; a_4\; a_5\; a_6\; a_7\; a_8\,]^T \qquad (22)$$
Following steps 1-5, 6 corresponding feature points (two pupils, two mouth corners, and two nostrils) are available between two adjacent frames. Choosing 4 of the 6 points gives $\binom{6}{4} = 15$ combinations, and Eq. (21) yields 15 sets of plane parameters $A_1, \ldots, A_{15}$. For each combination, the 6 feature points in the target frame can be estimated with Eqs. (19) and (20). The optimal plane parameters $A_{opt}$ are obtained as:
$$A_{opt} = \{A_i \mid \min_i \mathrm{Err}(A_i)\}, \qquad i = 1, \ldots, 15 \qquad (23)$$
where $\mathrm{Err}(A_i)$ is the estimation error:
$$\mathrm{Err}(A_i) = \max_j \big|\, x_j(A_i) - x^0_j \,\big|, \qquad j = 1, \ldots, 6 \qquad (24)$$
in which $x_j(A_i)$ is the j-th feature point estimated with $A_i$ and $x^0_j$ is the detected feature point.
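Solving Eq. (21) for the eight plane-motion parameters and predicting points with Eqs. (19)-(20) can be sketched as follows; the translation-only correspondences are an illustrative check, not patent data:

```python
import numpy as np

def plane_motion(src, dst):
    """Solve Eq. (21) for a1..a8 from four point correspondences
    (each pair gives two rows), then predict points via Eqs. (19)-(20)."""
    rows, rhs = [], []
    for (x1, x2), (y1, y2) in zip(src, dst):
        rows.append([x1, x2, 1, 0, 0, 0, -x1 * y1, -x2 * y1])
        rows.append([0, 0, 0, x1, x2, 1, -x1 * y2, -x2 * y2])
        rhs += [y1, y2]
    a = np.linalg.solve(np.array(rows, float), np.array(rhs, float))
    def predict(pt):
        x1, x2 = pt
        w = a[6] * x1 + a[7] * x2 + 1.0       # common denominator
        return ((a[0] * x1 + a[1] * x2 + a[2]) / w,
                (a[3] * x1 + a[4] * x2 + a[5]) / w)
    return a, predict

# pure translation by (5, -2): a trivial plane motion used as a sanity check
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dst = [(x + 5.0, y - 2.0) for x, y in src]
a, predict = plane_motion(src, dst)
```

In the verification step, each of the 15 four-point combinations would produce one such parameter set, and Eqs. (23)-(24) pick the one whose predictions best match the detected points.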

Claims (1)

1. A method for detecting facial features in moving images, characterized by comprising the following steps:
(1) capturing 300 face images of different sexes, ages, poses, and illumination to form a training set, and geometrically calibrating the eyes in the training-set images by a homogeneous transformation so that the sizes and positions of the eyes are exactly consistent across images;
(2) performing principal component analysis on the calibrated eyes in the training-set images to obtain a set of feature vectors, called characteristic eyes, which span a characteristic eye subspace;
(3) for a face image of a test subject, first obtaining several candidate eyes by the Hough transform, geometrically calibrating each pair of candidate eyes with a homogeneous transformation so that their positions and sizes match those of the eyes in the training-set images, then projecting the candidates into the characteristic eye subspace, and finally taking the candidate pair with the smallest error between the original eyes and their projection as the detection result;
(4) after the eye positions are determined by the above steps, estimating the mouth position from the structural characteristics of the face and obtaining the exact positions of the mouth corners by integral projection, then estimating the nose position from the mouth and eye positions and accurately locating the nostrils and nose tip by integral projection;
(5) if false detection or missed detection occurs, estimating the positions of the eyes, nose, and mouth in the current frame from the feature points in the previous frame according to motion smoothness and planar motion constraints.
CNB011204281A 2001-07-13 2001-07-13 Method for detecting moving human face Expired - Fee Related CN1156248C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB011204281A CN1156248C (en) 2001-07-13 2001-07-13 Method for detecting moving human face


Publications (2)

Publication Number Publication Date
CN1325662A CN1325662A (en) 2001-12-12
CN1156248C true CN1156248C (en) 2004-07-07

Family

ID=4664123

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB011204281A Expired - Fee Related CN1156248C (en) 2001-07-13 2001-07-13 Method for detecting moving human face

Country Status (1)

Country Link
CN (1) CN1156248C (en)


Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7936902B2 (en) * 2004-11-12 2011-05-03 Omron Corporation Face feature point detection apparatus and feature point detection apparatus
JP2007094906A (en) * 2005-09-29 2007-04-12 Toshiba Corp Feature point detection apparatus and method
CN100347721C (en) * 2006-06-29 2007-11-07 南京大学 Face setting method based on structured light
JP5228307B2 (en) * 2006-10-16 2013-07-03 ソニー株式会社 Display device and display method
CN101169827B (en) * 2007-12-03 2010-06-02 北京中星微电子有限公司 Method and device for tracking characteristic point of image
JP4539729B2 (en) * 2008-02-15 2010-09-08 ソニー株式会社 Image processing apparatus, camera apparatus, image processing method, and program
CN101339606B (en) * 2008-08-14 2011-10-12 北京中星微电子有限公司 Human face critical organ contour characteristic points positioning and tracking method and device
CN101360246B (en) * 2008-09-09 2010-06-02 西南交通大学 Video error concealment method combined with 3D face model
CN102043966B (en) * 2010-12-07 2012-11-28 浙江大学 Face recognition method based on combination of partial principal component analysis (PCA) and attitude estimation
CN102163240A (en) * 2011-05-20 2011-08-24 苏州两江科技有限公司 Method for constructing human face characteristic image index database based on MPEG-7 (Motion Picture Experts Group-7) standard
CN107506682A (en) * 2016-06-14 2017-12-22 掌赢信息科技(上海)有限公司 A kind of man face characteristic point positioning method and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8294776B2 (en) 2006-09-27 2012-10-23 Sony Corporation Imaging apparatus and imaging method
US9179057B2 (en) 2006-09-27 2015-11-03 Sony Corporation Imaging apparatus and imaging method that acquire environment information and information of a scene being recorded

Also Published As

Publication number Publication date
CN1325662A (en) 2001-12-12

Similar Documents

Publication Publication Date Title
Ploumpis et al. Combining 3d morphable models: A large scale face-and-head model
CN108549873B (en) Three-dimensional face recognition method and three-dimensional face recognition system
Chattopadhyay et al. SURDS: Self-supervised attention-guided reconstruction and dual triplet loss for writer independent offline signature verification
US6580810B1 (en) Method of image processing using three facial feature points in three-dimensional head motion tracking
CN100565583C (en) Face feature point detection device, feature point detection device
US7512255B2 (en) Multi-modal face recognition
JP4238542B2 (en) Face orientation estimation apparatus, face orientation estimation method, and face orientation estimation program
CN1156248C (en) Method for detecting moving human face
CN105487665B (en) A kind of intelligent Mobile Service robot control method based on head pose identification
US20060245639A1 (en) Method and system for constructing a 3D representation of a face from a 2D representation
US20160314345A1 (en) System and method for identifying faces in unconstrained media
Martin et al. On the design and evaluation of robust head pose for visual user interfaces: Algorithms, databases, and comparisons
CN1794265A (en) Method and device for distinguishing face expression based on video frequency
CN1781123A (en) System and method for tracking a global shape of an object in motion
CN101964064A (en) Human face comparison method
CN102654903A (en) Face comparison method
CN110603570B (en) Object recognition method, device, system, and program
CN101968846A (en) Face tracking method
CN106600626A (en) Three-dimensional human body movement capturing method and system
WO2022042203A1 (en) Human body key point detection method and apparatus
CN1561499A (en) Head motion estimation from four feature points
CN106096517A (en) A kind of face identification method based on low-rank matrix Yu eigenface
WO2015165227A1 (en) Human face recognition method
CN117095033A (en) A multi-modal point cloud registration method based on image and geometric information guidance
CN119137632A (en) Probabilistic Keypoint Regression with Uncertainty

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee