JP2013171494A

JP2013171494A - Image processing device, method, and program

Info

Publication number: JP2013171494A
Application number: JP2012035995A
Authority: JP
Inventors: Shingo Tsurumi; 辰吾鶴見
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2012-02-22
Filing date: 2012-02-22
Publication date: 2013-09-02

Abstract

[Problem] To obtain a user's viewpoint position more reliably and stably.
[Solution] A detection processing unit detects a user's face and eyes from a captured image in which the user observing a display unit is photographed as the subject. When both the face and the eyes are detected from the captured image, a position calculation processing unit calculates the user's viewpoint position based on the positions of the eyes in the captured image and a statistical interocular distance. At this time, the position calculation processing unit also calculates, from the obtained viewpoint position and the face detection result, relationship information indicating the relationship between the user's actual interocular distance and face width. When only the user's face is detected from the captured image, the position calculation processing unit calculates the user's viewpoint position from the face detection result and the previously calculated relationship information. The present technology can be applied to an image processing device.
[Selected drawing] FIG. 3

Description

The present technology relates to an image processing device, an image processing method, and a program, and more particularly to an image processing device, method, and program capable of obtaining a user's viewpoint position more reliably and stably.

For example, autostereoscopic (glasses-free) display technology is known that controls the display of 3D video on a display based on the position of the user's eyes (viewpoint position) relative to the display (see, for example, Patent Document 1). Such technology requires the distance from the display to the user, that is, the user's viewpoint position in three dimensions.

Accordingly, techniques have been proposed in which the position of the user's eyes is detected from a captured image and the user's viewpoint position is obtained from the detection result.

Patent Document 1: JP 2011-139281 A

However, with the techniques described above, it has been difficult to obtain the user's viewpoint position stably and reliably. For example, in a method that detects the user's eyes from a captured image and obtains the viewpoint position from the detection result, the viewpoint position sometimes cannot be obtained when the user's eyes do not appear in the image, such as when the user looks down.

The present technology has been made in view of such circumstances, and makes it possible to obtain the user's viewpoint position more reliably and stably.

An image processing device according to one aspect of the present technology includes: a face detection unit that detects a user's face from a captured image in which the user is photographed as the subject; an eye detection unit that detects the user's eyes from the captured image; a position calculation unit that, when the user's eyes are detected, calculates the user's position in real space based on the eye detection result; and a relationship information calculation unit that, when the user's eyes are detected, calculates relationship information indicating the relationship between the user's interocular distance and face width based on the calculated position of the user and the face detection result. When the user's face is detected but the user's eyes are not, the position calculation unit calculates the user's position based on the face detection result and the relationship information.

When the user's eyes are detected, the position calculation unit can calculate the user's position based on an average interocular distance of users and the eye detection result.

The image processing device can further include a determination unit that determines at least one of the user's sex, age, or race based on the captured image, and the position calculation unit can calculate the user's position based on the eye detection result and the average interocular distance selected according to the determination result of the determination unit.

The relationship information calculation unit can calculate the user's actual face width as the relationship information.

The image processing device can further include a face identification unit that extracts a feature amount from the region of the user's face detected in the captured image, and a relationship information holding unit that holds the feature amount and the relationship information in association with each other. When the user's face was not detected in the captured image immediately preceding the captured image being processed, and the user's face is detected but the user's eyes are not detected in the captured image being processed, the position calculation unit can calculate the user's position based on the face detection result and the relationship information selected according to the feature amount.

An image processing method or program according to one aspect of the present technology includes the steps of: detecting a user's face from a captured image in which the user is photographed as the subject; detecting the user's eyes from the captured image; when the user's eyes are detected, calculating the user's position in real space based on the eye detection result; when the user's eyes are detected, calculating relationship information indicating the relationship between the user's interocular distance and face width based on the calculated position of the user and the face detection result; and, when the user's face is detected but the user's eyes are not, calculating the user's position based on the face detection result and the relationship information.

In one aspect of the present technology, a user's face is detected from a captured image in which the user is photographed as the subject, and the user's eyes are detected from the captured image. When the user's eyes are detected, the user's position in real space is calculated based on the eye detection result, and relationship information indicating the relationship between the user's interocular distance and face width is calculated based on the calculated position of the user and the face detection result. When the user's face is detected but the user's eyes are not, the user's position is calculated based on the face detection result and the relationship information.

According to one aspect of the present technology, the user's viewpoint position can be obtained more reliably and stably.

FIG. 1 is a diagram explaining calculation of a user's viewpoint position by the face width method.
FIG. 2 is a diagram explaining switching between the face width method and the eye width method.
FIG. 3 is a diagram showing a configuration example of an image processing system.
FIG. 4 is a flowchart explaining viewpoint position calculation processing.
FIG. 5 is a diagram showing a configuration example of a computer.

Hereinafter, embodiments to which the present technology is applied will be described with reference to the drawings.

<First Embodiment>
[Calculation of the viewpoint position]
The present technology relates to head-tracking autostereoscopic display, in which the position of the user's eyes (viewpoint position) relative to a display unit (display) is estimated using, for example, the results of face detection and eye detection, and the display of 3D video (stereoscopic images) on the display unit is controlled based on the estimation result. The present technology is applicable to any case in which various kinds of control, such as image display, are performed based on the user's viewpoint position, but the following description takes as an example the case in which it is used for display control of stereoscopic images.

In the present technology, two methods for obtaining the three-dimensional position of the user's eyes (viewpoint position) relative to the display unit are used, switching between them as appropriate: a method that obtains the viewpoint position using the detection result of the user's face, and a method that obtains the viewpoint position using the detection result of the user's eyes.

When the viewpoint position is obtained from the eye detection result, relationship information indicating the relationship between the user's actual interocular distance and face size is obtained based on that viewpoint position. When the viewpoint position is then obtained using the face detection result, the relationship information is used in addition to the face detection result.

Hereinafter, the method of obtaining the viewpoint position using the detection result of the user's face is also referred to as the face width method, and the method of obtaining the viewpoint position using the detection result of the user's eyes is also referred to as the eye width method.

The calculation of the user's viewpoint position according to the present technology is described below.

First, the method of obtaining the user's viewpoint position using face detection will be described.

For example, as shown in FIG. 1, suppose that a display unit 11 on which images are displayed is placed in three-dimensional space, and that the user observes the display unit 11 facing downward from the upper side of the figure. Suppose also that a photographing unit 12 that photographs the user is placed at the center of the display unit 11, and that the photographing unit 12 captures a captured image PC11 with the user as the subject. In other words, the subject located at the position where the captured image PC11 is drawn in the figure is photographed to obtain the captured image PC11.

In the figure, the horizontal direction is the direction in which the two eyes of the user observing the display unit 11 are aligned, and FIG. 1 shows the display unit 11 and the user as seen from directly above the user.

In FIG. 1, let AH be the angle of view of the photographing unit 12, and let position O be the intersection of the captured image PC11 with a straight line DR11 parallel to the optical axis of the photographing unit 12, that is, the horizontal center of the captured image PC11. For example, if the horizontal angle of view of the photographing unit 12 is 63.24 degrees, the angle of view AH is expressed in radians as AH = 63.24 × π / 180.

If the horizontal width of the captured image PC11 is taken to be 1, the vertical distance IWD in the figure from the photographing unit 12 to position O on the captured image PC11 is obtained by the following equation (1).
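Equation (1) is not reproduced in this text. As a minimal sketch, assuming a pinhole camera model in which the image plane of width 1 spans the full horizontal angle of view AH, the definitions above give IWD = 0.5 / tan(AH / 2). The function name `image_plane_distance` is hypothetical and is not taken from the publication.

```python
import math


def image_plane_distance(ah_rad: float) -> float:
    """Distance IWD from the camera to an image plane of width 1.

    Assumed form of equation (1): half the image width (0.5) subtends
    half the horizontal angle of view (AH / 2) in a pinhole model.
    """
    return 0.5 / math.tan(ah_rad / 2.0)


# Angle of view from the text: 63.24 degrees, converted to radians.
AH = 63.24 * math.pi / 180.0
IWD = image_plane_distance(AH)
```

Because the image width is normalized to 1, IWD is a unitless quantity in the same normalized units as the positions x and x' used later.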

Suppose also that a region FC on the captured image PC11 is the region of the user's face, and that the horizontal width of the region FC in the figure is width. Here, the region FC extends from position x to position x' on the captured image PC11.

Since the horizontal width of the captured image PC11 is 1, if position O is taken to be 0, any horizontal position on the captured image PC11, such as position x or position x', can be expressed as a value in the range -0.5 to 0.5. The width of the user's face region FC on the captured image PC11 is a value between 0 and 1.

Now let φ be the angle from the center of the photographing unit 12 to the left edge of the user's face in the figure, that is, the angle formed by the straight line DR11 and a straight line DR12. Let θ be the angle corresponding to the user's face width, that is, the angle formed by the straight line DR12 and a straight line DR13.

Here, the straight line DR12 connects the photographing unit 12 and position x at the left edge of the region FC in the figure, and the straight line DR13 connects the photographing unit 12 and position x' at the right edge of the region FC in the figure.

The angle φ is obtained from position x of the user's face region FC and the distance IWD described above by the following equation (2).

Therefore, the angle θ corresponding to the user's face width can be obtained by the following equation (3), using the angle φ obtained by equation (2), the distance IWD, position x of the user's face region FC, and the width of the face region FC.
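Equations (2) and (3) are not reproduced in this text. The sketch below shows one form consistent with the quantities named above, assuming that positions increase from left to right in the figure so that x' = x + width, and that φ is a signed angle measured from the optical axis. The function name `face_angles` is hypothetical.

```python
import math


def face_angles(x: float, width: float, iwd: float) -> tuple[float, float]:
    """Return (phi, theta) in radians for a face region on the image plane.

    phi:   signed angle between the optical axis and the ray through the
           left edge x (assumed form of equation (2)).
    theta: angle subtended by the region from x to x + width
           (assumed form of equation (3)).
    """
    phi = math.atan2(x, iwd)
    theta = math.atan2(x + width, iwd) - phi
    return phi, theta
```

For a face centered on position O, φ is negative and θ equals twice its magnitude, which matches the geometry of FIG. 1 as described.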

Once the angle θ corresponding to the user's face width has been obtained in this way, the distance R in real (three-dimensional) space from the display unit 11 (photographing unit 12) to the user's viewpoint position (face) can be obtained from the angle θ and the user's actual face width FW by the following equation (4). Here, the user's viewpoint position is, for example, the midpoint between the user's left and right eyes.

The user's actual face width FW is, for example, about 12.5 cm if the user is a child and about 17.0 cm if the user is an adult. The face width FW is the horizontal width of the user's face in real space, corresponding to the width on the captured image.
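Equation (4) is not reproduced in this text. One plausible form, offered only as an assumption, treats the face as a segment of length FW perpendicular to and centered on the line of sight, which gives R = FW / (2 tan(θ / 2)). For faces far from the optical axis this is an approximation. The function name `distance_from_width` is hypothetical.

```python
import math


def distance_from_width(theta: float, real_width_cm: float) -> float:
    """Distance R (cm) to a segment of known real width that subtends theta.

    Assumed form of equation (4): R = FW / (2 * tan(theta / 2)).
    """
    if theta <= 0.0:
        raise ValueError("theta must be positive")
    return real_width_cm / (2.0 * math.tan(theta / 2.0))


# Adult face width from the text (about 17.0 cm) and an example angle.
R = distance_from_width(theta=0.17, real_width_cm=17.0)
```

With the same angle θ, substituting the child face width of about 12.5 cm yields a proportionally shorter distance, which is why the choice of FW matters for accuracy.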

As described above, when the user's viewpoint position is calculated by the face width method, the three-dimensional viewpoint position (distance R) can be obtained if the angle of view AH of the photographing unit 12, the position x and width of the user's face on the captured image PC11, and the user's actual face width FW are known. As described later, in the face width method the value of the face width FW is obtained using relationship information indicating the relationship between the user's actual interocular distance and face width.

Next, calculation of the user's viewpoint position by the eye width method will be described.

In the eye width method, the user's viewpoint position can be calculated in the same way as in the face width method. Specifically, suppose for example that position x and position x' in FIG. 1 are the positions of the user's right eye and left eye. In this case, if equation (3) is evaluated with position x as the user's right-eye position and width as the distance between the user's left and right eyes on the image, the angle corresponding to the user's eye width is obtained as the angle θ.

From this angle, the distance R from the photographing unit 12 to the user's viewpoint position can be obtained by a calculation similar to equation (4). That is, letting θ' be the angle corresponding to the user's eye width and EW be the user's actual interocular distance, the distance R to the viewpoint position can be obtained by evaluating the following equation (5).

Thus, when the user's viewpoint position is calculated by the eye width method, the three-dimensional viewpoint position (distance R) can be obtained if the angle of view AH of the photographing unit 12, the position of the user's eyes and the interocular distance on the captured image PC11, and the user's actual interocular distance EW are known.

In the following, the position of the user's eye on the captured image PC11 (the right-eye position in the example of FIG. 1) is denoted xe, and the user's interocular distance on the captured image PC11 is denoted HE.
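The eye width method can be sketched end to end from the quantities AH, xe, HE, and EW. Equation (5) is not reproduced in this text, so the sketch assumes the same pinhole geometry and the same perpendicular-segment form as for the face width, R = EW / (2 tan(θ' / 2)). The function name `eye_width_distance` is hypothetical.

```python
import math


def eye_width_distance(ah_rad: float, xe: float, he: float, ew_cm: float) -> float:
    """Distance R (cm) to the viewpoint by the eye width method.

    ah_rad: horizontal angle of view AH in radians.
    xe:     right-eye position on the image, in the range -0.5 to 0.5.
    he:     interocular distance HE on the image (image width = 1).
    ew_cm:  actual interocular distance EW, e.g. a statistical average.
    """
    iwd = 0.5 / math.tan(ah_rad / 2.0)  # assumed form of equation (1)
    theta_e = math.atan2(xe + he, iwd) - math.atan2(xe, iwd)
    if theta_e <= 0.0:
        raise ValueError("he must be positive")
    return ew_cm / (2.0 * math.tan(theta_e / 2.0))  # assumed equation (5)
```

The value of EW is left as a parameter because the text only says that a statistical average is used and does not give a number in this section.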

As described above, calculating the user's viewpoint position by the face width method requires the user's actual face width FW, and calculating it by the eye width method requires the user's actual interocular distance EW.

Two approaches are conceivable for the face width FW and interocular distance EW used in these calculations: using values obtained by directly measuring FW and EW for each individual, or using statistical averages as the values of FW and EW.

When statistical averages are used for FW and EW, the eye width method can obtain the user's viewpoint position more accurately than the face width method, because individual differences in the interocular distance EW are smaller than individual differences in the face width FW.

In addition, once the distance R to the user's viewpoint position has been obtained by either the face width method or the eye width method, the relationship between the actual face width FW and the interocular distance EW can be obtained.

In the present technology, therefore, while both the user's face and eyes are detected from the captured image, the distance R is calculated by the eye width method using a statistical average as the interocular distance EW. When the distance R has been calculated by the eye width method, relationship information indicating the relationship between the actual face width FW and the interocular distance EW is obtained from the calculated distance R, the interocular distance EW, and the face detection result, and is held. For example, the relationship information is each individual's face width FW obtained from the distance R.

In contrast, when only the user's face is detected from the captured image and the user's eyes are not detected, the distance R is calculated by the face width method using the face width FW obtained as the relationship information.
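The way the relationship information ties the two methods together can be sketched as follows. The formulas are assumptions (pinhole model, perpendicular segment of known width), and all function names are hypothetical: the distance R from the eye width method is combined with the face region on the image to recover the individual's face width FW, which is then reused when only the face is detected.

```python
import math


def subtended_angle(x: float, width: float, iwd: float) -> float:
    """Angle subtended at the camera by the image span from x to x + width."""
    return math.atan2(x + width, iwd) - math.atan2(x, iwd)


def distance_from_width(theta: float, real_width_cm: float) -> float:
    """Distance R to a segment of known real width subtending theta (assumed form)."""
    return real_width_cm / (2.0 * math.tan(theta / 2.0))


def face_width_from_distance(r_cm: float, theta_face: float) -> float:
    """Relationship information: face width FW recovered from a known distance R."""
    return 2.0 * r_cm * math.tan(theta_face / 2.0)
```

In a frame where both face and eyes are detected, `distance_from_width` is applied to the eye span with the statistical EW, and `face_width_from_distance` then yields FW. In a later face-only frame, `distance_from_width` is applied to the face span with the held FW.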

Specifically, as shown for example in FIG. 2, the user's viewpoint position (distance R) is calculated while switching between the eye width method and the face width method. In FIG. 2, the horizontal direction represents time. The hatched rectangles indicate sections in which the user's face is detected from the captured image, and the unhatched rectangles indicate sections in which the user's eyes are detected from the captured image.

In the example of FIG. 2, the user's face is detected continuously from section Q1 through section Q3. The user's eyes are detected continuously in section Q1 and section Q3, but are not detected in section Q2.

In sections where the user's face is detected continuously, the detected face is tracked and identified. For example, the face is identified by a feature amount extracted from the face region on the captured image.

Looking at section Q1 in FIG. 2, both the user's face and eyes are detected in section Q1. In this section, therefore, the distance R (viewpoint position) is calculated from the eye detection result by the eye width method, and the face width FW is calculated and held as the relationship information.

More specifically, the held face width FW is, for example, the average of the face widths FW obtained in the individual frames that make up section Q1. The face width FW obtained as the relationship information is held in association with the feature amount extracted from the user's face region on the captured image. This allows the face width FW to be held as relationship information for each individual user.
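The per-frame averaging described above can be sketched as a small record that accumulates face widths for one identified user. The class name, field names, and the representation of the feature amount as a tuple of floats are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FaceWidthRecord:
    """Relationship information (face width FW) held for one identified user."""

    feature: tuple          # hypothetical face feature vector for this user
    total_cm: float = 0.0   # sum of per-frame face widths
    frames: int = 0         # number of frames accumulated

    def add_frame(self, fw_cm: float) -> None:
        """Accumulate the face width obtained in one frame of the section."""
        self.total_cm += fw_cm
        self.frames += 1

    @property
    def mean_fw_cm(self) -> float:
        """Average face width over all accumulated frames."""
        if self.frames == 0:
            raise ValueError("no frames recorded yet")
        return self.total_cm / self.frames


record = FaceWidthRecord(feature=(0.12, 0.80, 0.35))
for fw in (16.8, 17.0, 17.2):
    record.add_frame(fw)
```

Keeping a running sum and count avoids storing every frame's value while still giving the section average that the text describes.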

In section Q2, which follows section Q1, the user's face is detected from the captured image but the user's eyes are not. In section Q2, therefore, the distance R (viewpoint position) is calculated by the face width method, using the face detection result and the face width FW held as relationship information for the detected face.

As in section Q2, depending on conditions such as the user looking down, the user's face may be detected from the captured image while the user's eyes are not.

In such a case, if the viewpoint position has been calculated by the eye width method while the eyes were detected, and is then calculated by the face width method with a statistical face width once the eyes are no longer detected, the viewpoint positions become discontinuous over time or the error in the viewpoint position grows.

In a section such as section Q2 where only the user's face is detected, calculating the viewpoint position by the face width method using the relationship information obtained in advance therefore yields a viewpoint position that is continuous over time and more accurate. In other words, even if there are sections in which the user's eyes are not detected, the user's viewpoint position can be obtained more reliably and stably.

In section Q3, which follows section Q2, the user's face and eyes are detected again. The face tracking result shows that the faces detected in sections Q1 through Q3 belong to the same user. In section Q3, therefore, the method is switched from the face width method used so far to the eye width method, the distance R (viewpoint position) is calculated, and the face width FW is calculated and held as the relationship information. That is, the relationship information is updated.

Between section Q3 and section Q4, neither the user's face nor the user's eyes are detected, so the user's viewpoint position is not calculated in this interval.

After that, the user's face is detected again in section Q4. In section Q4 the user's face is detected but the user's eyes are not, so the distance R (viewpoint position) is calculated by the face width method using the relationship information.

However, because the user's face was not detected in the interval immediately before section Q4, the feature amounts extracted from the faces are used to determine whether the face detected in section Q4 and a face detected in a section earlier than Q4 belong to the same user. If the faces detected in those sections are the same, the relationship information for that face is used to calculate the viewpoint position in section Q4.
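Selecting the relationship information by feature amount after tracking has been interrupted can be sketched as a nearest-match lookup. The text does not specify how feature amounts are compared, so cosine similarity, the threshold value, and the function names below are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0.0 else 0.0


def select_face_width(feature, records, threshold: float = 0.9):
    """Return the held face width FW of the best-matching user, or None.

    records:   list of (feature, fw_cm) pairs held from earlier sections.
    threshold: hypothetical minimum similarity to accept a match.
    """
    best_fw, best_sim = None, threshold
    for held_feature, fw_cm in records:
        sim = cosine_similarity(feature, held_feature)
        if sim >= best_sim:
            best_fw, best_sim = fw_cm, sim
    return best_fw
```

Returning None when no held face is similar enough leaves the caller free to decide what to do for an unknown user, a case this section of the text does not cover.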

In this way, even when tracking of the face region on the captured image is interrupted, comparing feature amounts to determine whether the face belongs to the same user makes it possible to obtain the user's viewpoint position more reliably and stably.

Furthermore, in the section after section Q4, both the user's face and eyes are detected, so in that section the user's viewpoint position is calculated by the eye width method and the relationship information is also updated.
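The switching behavior walked through for FIG. 2 can be summarized as a per-frame decision. The sketch below is a simplified reading of the text. The names are hypothetical, and the handling of a face-only frame for which no relationship information is held is an assumption, since this section does not describe that case.

```python
from enum import Enum


class Method(Enum):
    EYE_WIDTH = "eye width"
    FACE_WIDTH = "face width"
    NONE = "none"


def choose_method(face_detected: bool, eyes_detected: bool,
                  has_relationship_info: bool) -> Method:
    """Pick the viewpoint calculation method for one frame."""
    if face_detected and eyes_detected:
        # Eye width method; the relationship information is also updated.
        return Method.EYE_WIDTH
    if face_detected and has_relationship_info:
        # Face only: reuse the face width FW held for this user.
        return Method.FACE_WIDTH
    # Nothing usable detected (or, by assumption, no held FW for this face).
    return Method.NONE


# Timeline of FIG. 2: Q1, Q2, Q3, the gap, Q4, and the section after Q4.
timeline = [(True, True), (True, False), (True, True),
            (False, False), (True, False), (True, True)]
methods = [choose_method(face, eyes, has_relationship_info=True)
           for face, eyes in timeline]
```

In this reading the relationship information acts as the state carried between sections: it is written whenever the eye width method runs and read whenever only the face is available.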

[Configuration example of the image processing system]
Next, a specific embodiment to which the present technology is applied will be described. FIG. 3 is a diagram showing a configuration example of one embodiment of an image processing system to which the present technology is applied. In FIG. 3, parts corresponding to those in FIG. 1 are given the same reference numerals, and their description is omitted as appropriate.

図３の画像処理システムは、撮影部１２、画像処理装置４１、表示制御部４２、および表示部１１から構成される。この画像処理システムは、例えばユーザの視点位置に応じて表示部１１に表示する立体画像の視差制御を行なうものであり、テレビジョン受像機やパーソナルコンピュータ、携帯電話機などに内蔵されている。 The image processing system in FIG. 3 includes an imaging unit 12, an image processing device 41, a display control unit 42, and a display unit 11. This image processing system performs parallax control of a stereoscopic image displayed on the display unit 11 according to a user's viewpoint position, for example, and is incorporated in a television receiver, a personal computer, a mobile phone, or the like.

撮影部１２は表示部１１の上部中央に設けられており、表示部１１をほぼ正面から観察するユーザを被写体として撮影し、その結果得られた撮影画像を画像処理装置４１に供給する。 The photographing unit 12 is provided in the upper center of the display unit 11, photographs a user who observes the display unit 11 from almost the front as a subject, and supplies a photographed image obtained as a result to the image processing device 41.

The image processing device 41 detects the user's face and eyes from the captured image supplied from the imaging unit 12, calculates the user's viewpoint position based on the detection results, and supplies the viewpoint position to the display control unit 42. The image processing device 41 includes a detection processing unit 51 and a position calculation processing unit 52.

The detection processing unit 51 detects the user's face and eyes from the captured image supplied from the imaging unit 12, and supplies the detection results to the position calculation processing unit 52. The detection processing unit 51 includes a face detection unit 61, an eye detection unit 62, a determination unit 63, and a face identification unit 64.

The face detection unit 61 detects the user's face from the captured image supplied from the imaging unit 12. The eye detection unit 62 detects the user's eyes from the captured image supplied from the imaging unit 12. The determination unit 63 determines the gender, age, and race of the detected user based on the captured image from the imaging unit 12. The determination results, such as the user's gender, are used to specify the user's actual interocular distance EW.

The face identification unit 64 performs face identification by extracting a feature amount from the region of the user's face detected in the captured image.

The face detection result, the eye detection result, the determination results such as the user's gender, and the feature amount of the face region obtained by the face detection unit 61 through the face identification unit 64 are supplied, as appropriate, from the detection processing unit 51 to the position calculation processing unit 52.

The position calculation processing unit 52 calculates the user's viewpoint position in three dimensions (distance R) based on the face detection result and other information supplied from the detection processing unit 51, and supplies it to the display control unit 42. The position calculation processing unit 52 includes a viewpoint position calculation unit 71, a relationship information calculation unit 72, and a relationship information holding unit 73.

The viewpoint position calculation unit 71 calculates the user's viewpoint position based on the face and eye detection results supplied from the detection processing unit 51. In doing so, the viewpoint position calculation unit 71 uses the relationship information held in the relationship information holding unit 73 as necessary.

The relationship information calculation unit 72 calculates relationship information based on the face detection result from the detection processing unit 51 and the viewpoint position calculated by the viewpoint position calculation unit 71, and stores it in the relationship information holding unit 73. The relationship information holding unit 73 holds the relationship information calculated by the relationship information calculation unit 72 in association with the feature amount supplied from the detection processing unit 51.

The display control unit 42 supplies a stereoscopic image acquired from a recording unit or the like (not shown) to the display unit 11 for stereoscopic display. At this time, the display control unit 42 performs parallax control of the stereoscopic image displayed on the display unit 11 according to the user's viewpoint position supplied from the position calculation processing unit 52. Specifically, for example, the display control unit 42 assigns the right-eye image or the left-eye image constituting the stereoscopic image to each display area of the display unit 11 according to the user's viewpoint position, so that the user can view the stereoscopic image properly.

The display unit 11 is a display that shows stereoscopic images by a naked-eye (glasses-free) method, and displays the stereoscopic image supplied from the display control unit 42 under the control of the display control unit 42.

[Description of viewpoint position calculation processing]
When the image processing system of FIG. 3 is instructed to display a stereoscopic image, the imaging unit 12 captures images and sequentially supplies them to the image processing device 41. The image processing device 41 then performs viewpoint position calculation processing and outputs the user's viewpoint position to the display control unit 42. The display control unit 42 performs parallax control according to the viewpoint position from the image processing device 41 and causes the display unit 11 to display the stereoscopic image.

The viewpoint position calculation processing performed by the image processing device 41 is described below with reference to the flowchart of FIG. 4.

In step S11, the face detection unit 61 performs face detection on the captured image supplied from the imaging unit 12 and detects the region of the user's face in the captured image. Any face detection method may be used, for example a method using a classifier.

In step S12, the face detection unit 61 tracks the face region in the captured image based on the face detection result of step S11 and past face detection results. This makes it possible to determine whether the faces detected in the captured images at different times belong to the same user.

In step S13, the eye detection unit 62 performs eye detection on the captured image supplied from the imaging unit 12 and detects the region of the user's eyes in the captured image. In doing so, the eye detection unit 62 uses the face detection result from the face detection unit 61 as necessary and detects the eye region from within the face region of the captured image. Any eye detection method may be used, such as a method using a classifier or a template.

In step S14, the determination unit 63 determines the gender, age, and race of the user detected in the captured image, based on the captured image supplied from the imaging unit 12 and the face detection result from the face detection unit 61.

In step S15, the face identification unit 64 identifies the face detected in the captured image, based on the captured image supplied from the imaging unit 12 and the face detection result from the face detection unit 61. That is, the face identification unit 64 extracts a feature amount from the face region of the captured image. This feature amount may also be used for face tracking as necessary.

In step S16, the detection processing unit 51 determines whether a face has been detected in the captured image. For example, if the user's face was detected in the captured image in step S11, it is determined that a face has been detected.

If it is determined in step S16 that no face has been detected, the processing returns to step S11 and the processing described above is repeated. That is, the captured image of the next frame becomes the processing target, and face detection and the subsequent steps are performed on it.

Note that when no face is detected in the captured image, the processing of steps S12 through S15 is effectively not performed.

If, on the other hand, it is determined in step S16 that a face has been detected, the processing proceeds to step S17. At this time, the detection processing unit 51 supplies to the position calculation processing unit 52, as appropriate, the face detection result, the face tracking result, the eye detection result, the determination results such as gender, and the face feature amount obtained in steps S11 through S15.

In step S17, the position calculation processing unit 52 determines whether the user's eyes have been detected in the captured image, based on the eye detection result supplied from the detection processing unit 51.

If it is determined in step S17 that the eyes have been detected, then in step S18 the viewpoint position calculation unit 71 calculates the three-dimensional position of the user's eyes, that is, the user's viewpoint position in real space (distance R), based on the position of the user's eyes in the captured image and the user's interocular distance EW.

For example, the viewpoint position calculation unit 71 stores in advance an eye width table in which each combination of gender, age, and race is associated with the average (statistical) interocular distance EW of users having that gender, age, and race. From this table, the viewpoint position calculation unit 71 obtains the interocular distance EW specified by the gender, age, and race determination results supplied from the detection processing unit 51. In other words, it reads out the interocular distance EW identified by the determined gender, age, and race.
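The table lookup described above can be sketched as follows. This is only an illustration: the table keys, the millimetre values, the fallback default, and the function name `lookup_interocular_distance` are hypothetical placeholders and do not come from the specification.

```python
# Hypothetical eye width table. The keys and millimetre values are
# placeholders for illustration only, not figures from the specification.
EYE_WIDTH_TABLE_MM = {
    ("female", "adult", "group_a"): 61.0,
    ("male", "adult", "group_a"): 64.0,
    ("female", "child", "group_a"): 55.0,
    ("male", "child", "group_a"): 56.0,
}

# Assumed fallback used when the determined attributes have no table entry.
DEFAULT_EYE_WIDTH_MM = 63.0


def lookup_interocular_distance(gender, age_band, group):
    """Return the statistical interocular distance EW for the given attributes."""
    key = (gender, age_band, group)
    return EYE_WIDTH_TABLE_MM.get(key, DEFAULT_EYE_WIDTH_MM)
```

Falling back to a single default when no entry matches is a design choice of this sketch; the specification itself only states that the table is read using the determination results.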

The viewpoint position calculation unit 71 also performs calculations similar to Equations (1) through (3), based on the position xe of the user's eyes and the interocular distance HE in the captured image, which are supplied from the detection processing unit 51 as the eye detection result, and on the known angle of view AH of the imaging unit 12.

This calculation yields the angle θ' corresponding to the user's eye width. The viewpoint position calculation unit 71 then evaluates Equation (5) above using the obtained angle θ' and the interocular distance EW read from the eye width table, and calculates the distance R from the display unit 11 to the user as the viewpoint position. The user's viewpoint position may be the distance R itself, or the coordinates of the user's viewpoint position in a three-dimensional coordinate space obtained from the distance R.
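As a rough illustration of the eye width method, the sketch below derives the angle θ' from the on-image eye position xe and on-image interocular distance HE, and then the distance R from θ' and EW. Equations (1) through (5) are defined elsewhere in the specification and are not reproduced in this section, so the pinhole-camera geometry used here, the function names, and the parameter `image_width_px` are assumptions rather than the specification's exact formulas.

```python
import math


def subtended_angle(center_px, span_px, image_width_px, fov_rad):
    """Angle subtended at the camera by a horizontal pixel span.

    Assumes an ideal pinhole camera whose horizontal angle of view is fov_rad.
    """
    # Focal length in pixels implied by the horizontal angle of view.
    focal_px = (image_width_px / 2.0) / math.tan(fov_rad / 2.0)
    cx = image_width_px / 2.0
    left = center_px - span_px / 2.0 - cx
    right = center_px + span_px / 2.0 - cx
    return math.atan(right / focal_px) - math.atan(left / focal_px)


def distance_from_width(real_width, angle_rad):
    """Distance to an object of known real width that subtends angle_rad.

    Assumes the object faces the camera and is centred on the line of sight.
    """
    return real_width / (2.0 * math.tan(angle_rad / 2.0))
```

With xe and HE as `center_px` and `span_px`, and EW as `real_width`, the second function returns R in the same unit as EW.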

In this way, by selecting as the interocular distance EW the statistical value, from among those prepared for each gender, age, and race, that matches the gender, age, and race of the user in the captured image, the user's viewpoint position (distance R) can be obtained with higher accuracy.

Although the interocular distance EW has been described as being specified by the determination results for the user's gender, age, and race, the interocular distance EW may instead be specified using at least one of the user's gender, age, and race.

Alternatively, the user's interocular distance EW may be measured directly by a sensor or the like and the measurement result recorded in advance in the viewpoint position calculation unit 71, or an interocular distance EW directly entered or selected by the user may be recorded in advance in the viewpoint position calculation unit 71.

In step S19, the relationship information calculation unit 72 calculates relationship information based on the face detection result from the detection processing unit 51 and the viewpoint position calculated by the viewpoint position calculation unit 71.

Specifically, the relationship information calculation unit 72 evaluates Equations (1) through (3) based on the position x of the user's face in the captured image and the width of the user's face in the captured image (width), which are supplied from the detection processing unit 51 as the face detection result, and on the known angle of view AH of the imaging unit 12.

This calculation yields the angle θ corresponding to the user's face width. From the obtained angle θ and the distance R calculated by the viewpoint position calculation unit 71, the relationship information calculation unit 72 then calculates the user's actual face width FW as the relationship information using Equation (4).
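The relation between θ, R, and FW can be sketched in both directions: deriving FW once R is known from the eye width method, and recovering R from a stored FW when only the face is detected. Equation (4) is not reproduced in this section, so the form FW = 2R·tan(θ/2) below is an assumed pinhole-model relation, and the function names are hypothetical.

```python
import math


def face_width_from_distance(theta_rad, distance):
    """Actual face width FW from the face angle theta and the distance R."""
    return 2.0 * distance * math.tan(theta_rad / 2.0)


def distance_from_face_width(theta_rad, face_width):
    """Distance R from the face angle theta and a stored face width FW."""
    return face_width / (2.0 * math.tan(theta_rad / 2.0))
```

Because the two functions are inverses of each other, a face width learned while the eyes were visible reproduces the same distance later for the same on-image face angle.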

In step S20, the relationship information holding unit 73 holds the relationship information calculated by the relationship information calculation unit 72 in step S19.

More specifically, from among the face widths FW it holds as relationship information, the relationship information holding unit 73 identifies the relationship information whose associated feature amount is most similar to the feature amount supplied from the detection processing unit 51. For example, the feature amount with the shortest distance between feature amounts, such as the difference between them, is regarded as the similar one. When a face is being detected continuously through face tracking, the relationship information obtained for that face is identified.

The relationship information holding unit 73 then updates the relationship information based on the face width FW of the identified relationship information and the face width FW calculated as relationship information in step S19, and holds the updated relationship information.

For example, in updating the relationship information, the average of the face width FW newly calculated in step S19 and several face widths FW obtained in the past is computed, and the resulting average is used as the updated face width FW (relationship information).
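The averaging update can be sketched with a bounded history of past estimates. The use of `collections.deque` with a fixed `maxlen`, and the function name `update_face_width`, are choices of this sketch; the specification only says that the new value is averaged with several past values.

```python
from collections import deque


def update_face_width(history, new_face_width):
    """Append the newest FW estimate and return the mean of the retained ones.

    history is a deque; when it was created with maxlen, appending drops the
    oldest estimate automatically once the deque is full.
    """
    history.append(new_face_width)
    return sum(history) / len(history)
```

For instance, a history created as `deque(maxlen=5)` would average over at most the five most recent estimates.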

Note that, for example, the face width FW may stop being updated once it has been updated a predetermined number of times. Alternatively, the face width FW may be updated only within a section in which the face is detected continuously, with new relationship information being held when a face is newly detected.

In step S21, the position calculation processing unit 52 outputs the user's viewpoint position (distance R) calculated by the viewpoint position calculation unit 71, that is, the three-dimensional eye position, to the display control unit 42. The processing performed in steps S18 through S21 corresponds, for example, to the processing performed in sections Q1 and Q3 of FIG. 2.

In step S22, the position calculation processing unit 52 determines whether to end the viewpoint position calculation processing. For example, when the user instructs that playback of the stereoscopic image be stopped, it is determined that the processing is to end.

If it is determined in step S22 that the processing is not to end, the processing returns to step S11 and the processing described above is repeated.

If it is determined in step S22 that the processing is to end, the viewpoint position calculation processing ends.

If it is determined in step S17 described above that the eyes have not been detected, the processing proceeds to step S23.

In step S23, the viewpoint position calculation unit 71 calculates the three-dimensional position of the user's eyes, that is, the user's viewpoint position in real space (distance R), based on the face detection result and feature amount from the detection processing unit 51 and on the relationship information held in the relationship information holding unit 73.

Specifically, suppose the face tracking result supplied from the detection processing unit 51 indicates that the face has been detected continuously since a section in which both the user's face and eyes were detected in the captured image. In this case, the viewpoint position calculation unit 71 acquires from the relationship information holding unit 73 the relationship information obtained for the continuously detected face. That is, when the frame being processed is, for example, a frame within section Q2 of FIG. 2, the relationship information of the continuously detected face is acquired. The feature amount extracted from the face region by the face identification unit 64 is used here as necessary.

Suppose instead that the face tracking result supplied from the detection processing unit 51 indicates that, between the section in which the user's face and eyes were last detected in the captured image and the captured image being processed, there is a section in which no face was detected. In this case, the viewpoint position calculation unit 71 acquires from the relationship information holding unit 73 the relationship information calculated for the same face as the face detected in the captured image being processed.

For example, when the face feature amount of the last captured image, among those preceding the captured image being processed, in which both the face and eyes were detected is similar to the face feature amount of the captured image being processed, the two faces are regarded as the same face and the relationship information for that face is acquired. Alternatively, the relationship information associated with the feature amount that is most similar to the feature amount extracted by the face identification unit 64 from the face region of the captured image being processed may be acquired from the relationship information holding unit 73.
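The selection of relationship information by feature similarity can be sketched as a nearest-neighbour search. The Euclidean distance, the optional `max_distance` threshold, the list-of-pairs storage, and the function name are assumptions of this sketch; the specification does not fix the feature representation or the distance measure.

```python
import math


def find_relationship_info(query_feature, stored, max_distance=None):
    """Return the FW whose stored feature is nearest to query_feature.

    stored is a list of (feature_vector, face_width) pairs. Returns None when
    nothing is stored or the nearest feature is farther than max_distance.
    """
    best_fw, best_dist = None, math.inf
    for feature, face_width in stored:
        dist = math.dist(query_feature, feature)  # Euclidean distance
        if dist < best_dist:
            best_fw, best_dist = face_width, dist
    if max_distance is not None and best_dist > max_distance:
        return None
    return best_fw
```

The threshold lets a caller treat a poor best match as a different user instead of reusing an unrelated face width.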

The case in which a section with no detected face lies between the section in which the user's face and eyes were last detected and the captured image being processed arises, for example, when the captured image being processed is in section Q4 of FIG. 2. For example, in section Q4 of FIG. 2, when the feature amount extracted from the face in section Q4 and the feature amount extracted from the face in the immediately preceding section Q3 show that the faces are the same, the relationship information of the face detected in section Q3 is acquired.

Having acquired the face width FW as relationship information from the relationship information holding unit 73, the viewpoint position calculation unit 71 calculates the angle θ corresponding to the user's face width.

That is, the viewpoint position calculation unit 71 evaluates Equations (1) through (3) to obtain the angle θ, based on the position x of the user's face in the captured image and the width of the user's face in the captured image (width), which are supplied from the detection processing unit 51 as the face detection result, and on the known angle of view AH of the imaging unit 12. The viewpoint position calculation unit 71 then calculates the user's viewpoint position (distance R) by evaluating Equation (4) with the obtained angle θ and the face width FW held as relationship information.

Once the user's viewpoint position has been calculated in step S23, the processing of steps S21 and S22 is performed, and the viewpoint position calculation processing ends.

As described above, the image processing device 41 calculates the user's viewpoint position (distance R) by switching between the eye width method and the face width method according to the results of face detection and eye detection. When the viewpoint position has been calculated by the eye width method, the image processing device 41 also derives from that result the relationship between the user's actual interocular distance and face width, and holds it as relationship information.
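The per-frame switching between the two methods can be summarized in a small decision function. The function name, the string labels, and the handling of a face-only frame with no stored relationship information are assumptions of this sketch; the step numbers in the comments refer to the flowchart described above.

```python
def select_method(face_detected, eyes_detected, has_relationship_info):
    """Pick the viewpoint calculation method for one captured frame."""
    if not face_detected:
        return "skip"        # step S16: no face, move on to the next frame
    if eyes_detected:
        return "eye_width"   # steps S18 to S20: compute R, then update FW
    if has_relationship_info:
        return "face_width"  # step S23: compute R from the stored FW
    return "skip"            # assumed: nothing stored to fall back on
```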

Thus, by obtaining and holding relationship information when the viewpoint position is calculated by the eye width method, and using that relationship information when the viewpoint position is calculated by the face width method, the user's viewpoint position can be obtained more reliably and stably. As a result, the display control unit 42 can perform more stable and accurate parallax control of the stereoscopic image, which reduces crosstalk in the stereoscopic image displayed on the display unit 11 and suppresses reverse viewing.

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, a program constituting the software is installed on a computer. Here, the computer includes a computer built into dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions when various programs are installed on it.

FIG. 5 is a block diagram illustrating a configuration example of the hardware of a computer that executes the series of processes described above by means of a program.

In the computer, a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, and a RAM (Random Access Memory) 203 are connected to one another by a bus 204.

An input/output interface 205 is also connected to the bus 204. An input unit 206, an output unit 207, a recording unit 208, a communication unit 209, and a drive 210 are connected to the input/output interface 205.

The input unit 206 includes a keyboard, a mouse, a microphone, and the like. The output unit 207 includes a display, a speaker, and the like. The recording unit 208 includes a hard disk, a nonvolatile memory, and the like. The communication unit 209 includes a network interface and the like. The drive 210 drives a removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

In the computer configured as described above, the series of processes described above is performed when the CPU 201 loads, for example, the program recorded in the recording unit 208 into the RAM 203 via the input/output interface 205 and the bus 204 and executes it.

The program executed by the computer (CPU 201) can be provided, for example, by being recorded on the removable medium 211 as packaged media or the like. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.

In the computer, the program can be installed in the recording unit 208 via the input/output interface 205 by loading the removable medium 211 into the drive 210. The program can also be received by the communication unit 209 via a wired or wireless transmission medium and installed in the recording unit 208. Alternatively, the program can be installed in advance in the ROM 202 or the recording unit 208.

The program executed by the computer may be a program whose processes are performed in time series in the order described in this specification, or a program whose processes are performed in parallel or at necessary timing, such as when a call is made.

Embodiments of the present technology are not limited to the embodiment described above, and various modifications can be made without departing from the gist of the present technology.

For example, the present technology can take a cloud computing configuration in which one function is shared and processed jointly by a plurality of devices via a network.

Each step described in the above flowchart can be executed by one device or shared among and executed by a plurality of devices.

Furthermore, when one step includes a plurality of processes, the plurality of processes included in that step can be executed by one device or shared among and executed by a plurality of devices.

Furthermore, the present technology can also be configured as follows.

[1]
An image processing apparatus including:
a face detection unit that detects a user's face from a captured image captured with the user as a subject;
an eye detection unit that detects the user's eyes from the captured image;
a position calculation unit that, when the user's eyes are detected, calculates the position of the user in real space based on the detection result of the user's eyes; and
a relationship information calculation unit that, when the user's eyes are detected, calculates relationship information indicating the relationship between the user's interocular distance and face width, based on the calculated position of the user and the detection result of the user's face,
wherein, when the user's face is detected and the user's eyes are not detected, the position calculation unit calculates the position of the user based on the detection result of the user's face and the relationship information.
[2]
The image processing apparatus according to [1], wherein, when the user's eyes are detected, the position calculation unit calculates the position of the user based on an average user's interocular distance and the detection result of the user's eyes.
[3]
The image processing apparatus according to [2], further including a determination unit that determines at least one of the gender, age, or race of the user based on the captured image,
wherein the position calculation unit calculates the position of the user based on the average user's interocular distance determined by the determination result of the determination unit, and on the detection result of the user's eyes.
[4]
The image processing apparatus according to any one of [1] to [3], wherein the relationship information calculation unit calculates the actual face width of the user as the relationship information.
[5]
The image processing apparatus according to any one of [1] to [4], further including:
a face identification unit that extracts a feature amount from the region of the user's face detected from the captured image; and
a relationship information holding unit that holds the feature amount and the relationship information in association with each other,
wherein, when the user's face is not detected from the captured image immediately preceding the captured image to be processed, and the user's face is detected but the user's eyes are not detected from the captured image to be processed, the position calculation unit calculates the position of the user based on the relationship information selected based on the feature amount and on the detection result of the user's face.
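
As a rough illustration of configurations [1], [2], and [4], the sketch below estimates only the camera-to-user distance (one component of the position in real space) under a simple pinhole camera model. The pinhole model, the focal length, the 63 mm average interocular distance, and all function and class names are assumptions made for illustration; they are not taken from this publication.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]

# Assumed values for illustration only; not taken from the publication.
FOCAL_LENGTH_PX = 800.0        # hypothetical camera focal length, in pixels
AVERAGE_INTEROCULAR_MM = 63.0  # hypothetical statistical interocular distance


def distance_from_eyes(left_eye: Point, right_eye: Point,
                       interocular_mm: float = AVERAGE_INTEROCULAR_MM) -> float:
    """Distance (mm) from the on-image eye gap and an assumed real eye gap."""
    eye_gap_px = math.dist(left_eye, right_eye)
    return FOCAL_LENGTH_PX * interocular_mm / eye_gap_px


def face_width_from_distance(face_width_px: float, distance_mm: float) -> float:
    """Relationship information: the real face width (mm) implied by the distance."""
    return face_width_px * distance_mm / FOCAL_LENGTH_PX


def distance_from_face(face_width_px: float, face_width_mm: float) -> float:
    """Fallback distance (mm) from the on-image face width and the stored width."""
    return FOCAL_LENGTH_PX * face_width_mm / face_width_px


class PositionEstimator:
    """Hypothetical per-frame estimator with a face-only fallback."""

    def __init__(self) -> None:
        self.face_width_mm: Optional[float] = None  # relationship information

    def update(self, face_width_px: Optional[float],
               eyes: Optional[Tuple[Point, Point]]) -> Optional[float]:
        if face_width_px is None:
            return None  # no face detected: nothing to estimate
        if eyes is not None:
            # Face and eyes detected: use the eyes, then refresh the stored width.
            distance = distance_from_eyes(*eyes)
            self.face_width_mm = face_width_from_distance(face_width_px, distance)
            return distance
        if self.face_width_mm is not None:
            # Face only: fall back to the previously calculated face width.
            return distance_from_face(face_width_px, self.face_width_mm)
        return None  # face only, but no relationship information yet
```

In this sketch, a frame with both detections yields a distance and updates the stored face width. A later frame in which only the face is found (for example, because the eyes are occluded) still yields a distance from that stored width.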

41 image processing device, 61 face detection unit, 62 eye detection unit, 63 determination unit, 64 face identification unit, 71 viewpoint position calculation unit, 72 relationship information calculation unit, 73 relationship information holding unit
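
The roles of the face identification unit and the relationship information holding unit in configuration [5] can be sketched as a small store that keeps a face width alongside each face feature vector and returns the entry whose features best match a newly detected face. The Euclidean distance metric, the matching threshold, and the class and method names are hypothetical choices for illustration; the publication does not specify them here.

```python
import math
from typing import List, Optional, Sequence, Tuple


class RelationshipInfoStore:
    """Hypothetical holder mapping face feature vectors to face widths (mm)."""

    def __init__(self, max_feature_distance: float = 0.5) -> None:
        # Each entry pairs a feature vector with its relationship information.
        self._entries: List[Tuple[Tuple[float, ...], float]] = []
        self._max_feature_distance = max_feature_distance  # assumed threshold

    def remember(self, feature: Sequence[float], face_width_mm: float) -> None:
        """Hold a feature amount and its relationship information together."""
        self._entries.append((tuple(feature), face_width_mm))

    def select(self, feature: Sequence[float]) -> Optional[float]:
        """Return the face width of the closest stored face, if close enough."""
        best_width: Optional[float] = None
        best_distance = self._max_feature_distance
        for stored_feature, width in self._entries:
            distance = math.dist(stored_feature, feature)
            if distance <= best_distance:
                best_width, best_distance = width, distance
        return best_width
```

When a face reappears after a frame with no face and its eyes cannot be found, a caller could pass the newly extracted features to `select` and use the returned width in place of a freshly calculated one. If nothing matches closely enough, `select` returns `None` and no position can be estimated from the face alone.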

Claims

An image processing apparatus comprising:
a face detection unit that detects a user's face from a captured image captured with the user as a subject;
an eye detection unit that detects the user's eyes from the captured image;
a position calculation unit that, when the user's eyes are detected, calculates the position of the user in real space based on the detection result of the user's eyes; and
a relationship information calculation unit that, when the user's eyes are detected, calculates relationship information indicating the relationship between the user's interocular distance and face width, based on the calculated position of the user and the detection result of the user's face,
wherein, when the user's face is detected and the user's eyes are not detected, the position calculation unit calculates the position of the user based on the detection result of the user's face and the relationship information.

The image processing apparatus according to claim 1, wherein, when the user's eyes are detected, the position calculation unit calculates the position of the user based on an average user's interocular distance and the detection result of the user's eyes.

The image processing apparatus according to claim 2, further comprising a determination unit that determines at least one of the gender, age, or race of the user based on the captured image,
wherein the position calculation unit calculates the position of the user based on the average user's interocular distance determined by the determination result of the determination unit, and on the detection result of the user's eyes.

The image processing apparatus according to claim 1, wherein the relationship information calculation unit calculates an actual face width of the user as the relationship information.

The image processing apparatus according to claim 1, further comprising:
a face identification unit that extracts a feature amount from the region of the user's face detected from the captured image; and
a relationship information holding unit that holds the feature amount and the relationship information in association with each other,
wherein, when the user's face is not detected from the captured image immediately preceding the captured image to be processed, and the user's face is detected but the user's eyes are not detected from the captured image to be processed, the position calculation unit calculates the position of the user based on the relationship information selected based on the feature amount and on the detection result of the user's face.

An image processing method including the steps of:
detecting a user's face from a captured image captured with the user as a subject;
detecting the user's eyes from the captured image;
when the user's eyes are detected, calculating the position of the user in real space based on the detection result of the user's eyes;
when the user's eyes are detected, calculating relationship information indicating the relationship between the user's interocular distance and face width, based on the calculated position of the user and the detection result of the user's face; and
when the user's face is detected and the user's eyes are not detected, calculating the position of the user based on the detection result of the user's face and the relationship information.

A program that causes a computer to execute a process including the steps of:
detecting a user's face from a captured image captured with the user as a subject;
detecting the user's eyes from the captured image;
when the user's eyes are detected, calculating the position of the user in real space based on the detection result of the user's eyes;
when the user's eyes are detected, calculating relationship information indicating the relationship between the user's interocular distance and face width, based on the calculated position of the user and the detection result of the user's face; and
when the user's face is detected and the user's eyes are not detected, calculating the position of the user based on the detection result of the user's face and the relationship information.