JP2011223284A - Pseudo-stereoscopic image generation device and camera

Info

Publication number: JP2011223284A
Application number: JP2010089977A
Authority: JP
Inventors: Yasuhiko Teranishi (寺西 康彦); Hiroshi Ichimura (市村 洋)
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2010-04-09
Filing date: 2010-04-09
Publication date: 2011-11-04
Anticipated expiration: 2030-04-09
Also published as: JP5304714B2

Abstract

PROBLEM TO BE SOLVED: To provide means for generating, when converting a non-stereoscopic image signal into a pseudo-stereoscopic image signal, a pseudo-stereoscopic image signal that looks natural and close to a real image for any scene of the non-stereoscopic image. SOLUTION: A CPU 108 calculates control signals CTL1, CTL2 and CTL3 for each frame from various input data, such as the position of the range-finding area within the captured image, the estimated distance to the subject, the estimated depth of field, whether the subject is a face, and the size of the face if it is, and supplies these signals as parameters for pseudo-stereoscopic image generation to a 2D3D conversion section 115, which generates a pseudo-stereoscopic image from a non-stereoscopic image. The control signal CTL1 controls the composition ratio of the images of a plurality of basic depth model types. The control signal CTL2 indicates a weighting factor applied only to the R signal component of a decoded image signal "a". The control signal CTL3 includes a parameter representing depth and a parameter representing convergence.

Description

The present invention relates to a pseudo-stereoscopic image generation device and a camera, and more particularly to a pseudo-stereoscopic moving image generation device and a camera that generate a pseudo-stereoscopic image from an ordinary moving image (non-stereoscopic moving image) for which depth information is given neither explicitly nor implicitly, as it would be in a stereo image.

In recent years, many stereoscopic image generation devices have been announced. They can be broadly classified into three types. The first type has separate optical systems for the right eye and the left eye, and records and reproduces the images captured by each. The second type adds to a single main optical system a mechanism for measuring actual depth information, and constructs a stereoscopic image by combining the two-dimensional still or moving image captured by the main optical system (hereinafter, a non-stereoscopic image) with the measured depth information.

The stereoscopic image generation device disclosed in Patent Document 1 measures a distance for each pixel of a two-dimensional (non-stereoscopic) image. Using two cameras or laser light, it measures the distance between the camera and every subject in the frame and obtains distance information for each pixel. Then, to easily generate a depth image and a depth-accuracy image that reflect the photographer's intent at the time of shooting, it calculates a depth value from the distance information for each pixel by means of a non-linear function (mapping table), selecting one of a plurality of mapping tables according to the shooting information. The device also allows the accuracy of the depth information to be high in some parts of the frame and low in others.
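The per-pixel conversion from measured distance to depth value through a selectable non-linear mapping table can be pictured with the following minimal sketch. The table breakpoints, the mode names "portrait" and "landscape", and the use of piecewise-linear interpolation are all assumptions made for illustration; Patent Document 1 is not quoted here and may define its tables differently.

```python
from bisect import bisect_right

# Hypothetical mapping tables: (distance in metres, depth value 0-255) breakpoints.
# The breakpoints and the mode names are illustrative, not taken from Patent Document 1.
MAPPING_TABLES = {
    "portrait": [(0.5, 255), (2.0, 200), (5.0, 80), (50.0, 0)],
    "landscape": [(0.5, 255), (10.0, 180), (100.0, 60), (1000.0, 0)],
}


def distance_to_depth(distance_m, table):
    """Map a measured distance to a depth value by piecewise-linear lookup."""
    xs = [d for d, _ in table]
    if distance_m <= xs[0]:
        return float(table[0][1])
    if distance_m >= xs[-1]:
        return float(table[-1][1])
    i = bisect_right(xs, distance_m)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    t = (distance_m - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


def depth_image(distance_map, shooting_mode):
    """Apply the table selected by the shooting mode to every pixel."""
    table = MAPPING_TABLES[shooting_mode]
    return [[distance_to_depth(d, table) for d in row] for row in distance_map]
```

Selecting the table by shooting mode mirrors the statement that one of several mapping tables is chosen according to the shooting information; a real device would also need the per-pixel distance map, which is exactly the costly measurement the present invention seeks to avoid.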

The third type estimates pseudo depth information from a non-stereoscopic image and constructs a stereoscopic image by combining that pseudo depth information with the non-stereoscopic image. Because the depth information is generated by estimation, this type does not measure a distance for each pixel of the two-dimensional image.

Patent Document 2, filed by the present applicant, discloses a pseudo-stereoscopic image generation device in which the depth information estimated by automatic processing when generating a pseudo-stereoscopic image can be corrected according to the scene shown in each frame of the non-stereoscopic image (hereinafter, the scene).

The pseudo-stereoscopic image generation device disclosed in Patent Document 2 prepares a plurality of images that convey a sense of depth (hereinafter, basic depth model types), calculates the high-frequency component of the luminance signal in one frame of the non-stereoscopic image, and automatically calculates the composition ratio of the basic depth model types from the calculated value. From this composition ratio it estimates depth data that gives the non-stereoscopic image a sense of depth, and it generates a pseudo-stereoscopic image from the non-stereoscopic image and the depth data.

If the composition ratio of the basic depth model types is calculated automatically by the same predetermined method for every non-stereoscopic image, appropriate depth information cannot be obtained for some scenes, and an unnatural pseudo-stereoscopic image may result. Yet it is not realistic for users themselves to adjust the pseudo-stereoscopic algorithm and parameters each time to suit the scene of the non-stereoscopic image. The device described in Patent Document 2 therefore aims to let the producer correct the depth information according to the scene, so as to generate a natural pseudo-stereoscopic image that is closer to reality.

Patent Document 1: JP 2008-141666 A
Patent Document 2: JP 2009-044722 A

However, the stereoscopic image generation device of Patent Document 1 requires distance measuring means using two cameras or laser light to acquire distance information, which makes the device expensive. In addition, when captured information is recorded, depth information for each pixel must be recorded along with the two-dimensional image data, so the recorded data grows large and the cost of storing and transmitting it also becomes a problem.

Patent Document 2, on the other hand, does not disclose a specific method for adjusting the depth information. A person could, for example, actually view the scene and determine the adjustment parameters by hand. However, when a non-stereoscopic image shot by a user with a handheld video camera is to be converted into a pseudo-stereoscopic image, the user becomes the producer and must tune the adjustment parameters manually with the device of Patent Document 2, and making the adjustments needed for a natural pseudo-stereoscopic image closer to reality is very burdensome.

The present invention has been made in view of the above points. Its object is to provide a pseudo-stereoscopic image generation device and a camera that, when converting a non-stereoscopic image signal captured by a video camera into a pseudo-stereoscopic image signal, can generate a natural pseudo-stereoscopic image signal closer to reality whatever the scene of the non-stereoscopic image.

To achieve the above object, a pseudo-stereoscopic image generation device of the present invention comprises: face size information acquiring means that, when the subject is a person, acquires information on the size of the person's face from a non-stereoscopic image signal obtained by photoelectrically converting, with an image sensor, an optical image of the subject formed on the imaging surface of the image sensor through an optical system including a zoom lens, a focus lens and a diaphragm; basic depth model generating means that generates images of a plurality of basic depth model types, each representing a basic scene used to generate a pseudo-stereoscopic image signal from the non-stereoscopic image signal; depth model synthesizing means that, based on a first control signal indicating the composition ratio of the images of the plurality of basic depth model types, synthesizes the images supplied from the basic depth model generating means to generate a depth model image; weighting means that, based on a second control signal indicating a weighting coefficient for the non-stereoscopic image signal, multiplies the non-stereoscopic image signal by that weighting coefficient; depth estimation data generating means that generates depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the depth model image generated by the depth model synthesizing means; texture shifting means that generates a pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data whose depth and convergence have been adjusted by a third control signal indicating depth and convergence; and control signal calculating means that, when calculating and outputting the first to third control signals, uses the face size information to vary the value of at least one of the first to third control signals.
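The chain of means recited above (synthesis of the basic depth models, weighting, depth estimation, texture shift) can be sketched roughly as follows. This is only an illustration under stated assumptions: images are small lists of rows, the weighted signal is combined with the depth model by simple addition, the parallax formula `depth_gain * (depth - convergence)` is hypothetical, and the function names are invented for this sketch. The composition ratios are assumed to have a positive sum.

```python
def blend_depth_models(models, ratios):
    """First control signal: mix the basic depth model images by the given ratios."""
    total = sum(ratios)
    height, width = len(models[0]), len(models[0][0])
    return [
        [sum(r * m[y][x] for m, r in zip(models, ratios)) / total for x in range(width)]
        for y in range(height)
    ]


def estimate_depth(signal, depth_model, weight):
    """Second control signal: weight the image signal, then add the depth model.

    Simple addition is an assumption; the text only says both inputs are used.
    """
    return [
        [weight * s + d for s, d in zip(signal_row, model_row)]
        for signal_row, model_row in zip(signal, depth_model)
    ]


def shift_texture(image, depth, depth_gain, convergence):
    """Third control signal: shift each pixel horizontally according to its depth.

    The parallax formula is hypothetical. Holes left by the shift are filled with
    the last written output pixel to their left (or the row's first input pixel).
    """
    shifted = []
    for img_row, depth_row in zip(image, depth):
        width = len(img_row)
        out = [None] * width
        for x in range(width):
            target = x + round(depth_gain * (depth_row[x] - convergence))
            if 0 <= target < width:
                out[target] = img_row[x]
        last = img_row[0]
        for x in range(width):
            if out[x] is None:
                out[x] = last
            else:
                last = out[x]
        shifted.append(out)
    return shifted
```

In this sketch a pixel whose depth equals the convergence value is not shifted at all, which is one plausible reading of how a convergence parameter sets the plane that appears at screen depth. Overlapping shifted pixels are resolved simply by write order, which a practical implementation would likely refine.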

To achieve the above object, a pseudo-stereoscopic image generation device of the present invention also comprises: lens position acquiring means that acquires the positions of a zoom lens and a focus lens constituting an optical system; subject estimated distance calculating means that calculates an estimated distance from the position of the focus lens to a subject based on the zoom lens position and the focus lens position acquired by the lens position acquiring means; basic depth model generating means that generates images of a plurality of basic depth model types, each representing a basic scene used to generate a pseudo-stereoscopic image signal from a non-stereoscopic image signal obtained by photoelectrically converting, with an image sensor, an optical image of the subject formed on the imaging surface of the image sensor through the optical system; depth model synthesizing means that, based on a first control signal indicating the composition ratio of the images of the plurality of basic depth model types, synthesizes the images supplied from the basic depth model generating means to generate a depth model image; weighting means that, based on a second control signal indicating a weighting coefficient for the non-stereoscopic image signal, multiplies the non-stereoscopic image signal by that weighting coefficient; depth estimation data generating means that generates depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the depth model image generated by the depth model synthesizing means; texture shifting means that generates a pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data whose depth and convergence have been adjusted by a third control signal indicating depth and convergence; and control signal calculating means that, when calculating and outputting the control signals, uses the calculated estimated distance to the subject to vary the value of at least one of the first and third control signals.

To achieve the above object, a pseudo-stereoscopic image generation device of the present invention also comprises: optical system information acquiring means that acquires the position of a zoom lens, the position of a focus lens and the aperture value of a diaphragm constituting an optical system; depth of field calculating means that calculates an estimated depth of field based on the aperture value and the zoom lens position acquired by the optical system information acquiring means; basic depth model generating means that generates images of a plurality of basic depth model types, each representing a basic scene used to generate a pseudo-stereoscopic image signal from a non-stereoscopic image signal obtained by photoelectrically converting, with an image sensor, an optical image of a subject formed on the imaging surface of the image sensor through the optical system; depth model synthesizing means that, based on a first control signal indicating the composition ratio of the images of the plurality of basic depth model types, synthesizes the images supplied from the basic depth model generating means to generate a depth model image; weighting means that, based on a second control signal indicating a weighting coefficient for the non-stereoscopic image signal, multiplies the non-stereoscopic image signal by that weighting coefficient; depth estimation data generating means that generates depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the depth model image generated by the depth model synthesizing means; texture shifting means that generates a pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data whose depth and convergence have been adjusted by a third control signal indicating depth and convergence; and control signal calculating means that, when calculating and outputting the first to third control signals, uses the calculated estimated depth of field to vary the value of at least one of the first to third control signals.
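This section does not state how the estimated depth of field is computed from the aperture value and the zoom lens position. As a point of reference only, the following sketch uses the standard thin-lens depth-of-field approximation from general optics, not a formula taken from this patent. It assumes the zoom lens position has already been converted to a focal length, that a circle-of-confusion diameter is chosen for the sensor, and that a subject distance is available; the default value of `coc_mm` is a placeholder.

```python
import math


def hyperfocal_distance(focal_length_mm, f_number, coc_mm):
    """Hyperfocal distance H = f^2 / (N * c) + f, all lengths in millimetres."""
    return focal_length_mm ** 2 / (f_number * coc_mm) + focal_length_mm


def depth_of_field(focal_length_mm, f_number, subject_mm, coc_mm=0.005):
    """Return (near limit, far limit) of acceptable sharpness in millimetres."""
    f = focal_length_mm
    h = hyperfocal_distance(f, f_number, coc_mm)
    near = subject_mm * (h - f) / (h + subject_mm - 2 * f)
    if subject_mm >= h:
        # Focusing at or beyond the hyperfocal distance keeps everything to infinity sharp.
        far = math.inf
    else:
        far = subject_mm * (h - f) / (h - subject_mm)
    return near, far
```

A longer focal length (telephoto zoom position) or a smaller f-number narrows the span between the two limits, which is the kind of shallow-focus scene for which a control signal calculation might reasonably choose different depth parameters.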

To achieve the above object, a pseudo-stereoscopic image generation device of the present invention also comprises: lens position acquiring means that acquires the positions of a zoom lens and a focus lens constituting an optical system; automatic focus adjusting means that controls the movement of the focus lens in order to find the in-focus position at which the high-frequency component of the non-stereoscopic image signal within a ranging area, which is a predetermined small region of the imaging screen of an image sensor, is at its maximum, the non-stereoscopic image signal being obtained by photoelectrically converting, with the image sensor, an optical image of a subject formed on the imaging surface of the image sensor through the optical system; position data acquiring means that acquires position data of the ranging area within the imaging screen; subject estimated distance calculating means that calculates an estimated distance from the in-focus position of the focus lens to the subject; basic depth model generating means that generates images of a plurality of basic depth model types, each representing a basic scene used to generate a pseudo-stereoscopic image signal from the non-stereoscopic image signal; depth model synthesizing means that, based on a first control signal indicating the composition ratio of the images of the plurality of basic depth model types, synthesizes the images supplied from the basic depth model generating means to generate a depth model image; weighting means that, based on a second control signal indicating a weighting coefficient for the non-stereoscopic image signal, multiplies the non-stereoscopic image signal by that weighting coefficient; depth estimation data generating means that generates depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the depth model image generated by the depth model synthesizing means; texture shifting means that generates a pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data whose depth and convergence have been adjusted by a third control signal indicating depth and convergence; and control signal calculating means that, when calculating and outputting the first to third control signals, varies at least one of the first and third control signals according to the position data of the ranging area and the estimated distance to the subject.
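The automatic focus adjustment recited above, which seeks the focus lens position that maximizes the high-frequency component inside the ranging area, can be illustrated with a minimal sketch. The horizontal-difference filter, the exhaustive scan over candidate positions, and the `capture_luma` callback are assumptions made for this example; an actual camera would typically use a dedicated high-pass filter and an incremental search.

```python
def high_frequency_energy(luma, area):
    """Sum of absolute horizontal differences inside the ranging area.

    area is (top, left, height, width) in pixels; this simple difference filter
    stands in for whatever high-pass filter the camera actually uses.
    """
    top, left, height, width = area
    energy = 0
    for y in range(top, top + height):
        row = luma[y]
        for x in range(left, left + width - 1):
            energy += abs(row[x + 1] - row[x])
    return energy


def find_focus_position(capture_luma, lens_positions, area):
    """Return the lens position whose frame has the largest high-frequency energy.

    capture_luma(position) is a hypothetical callback that moves the focus lens
    and returns the luminance frame captured at that position.
    """
    return max(lens_positions, key=lambda p: high_frequency_energy(capture_luma(p), area))
```

Because the search is restricted to the ranging area, the resulting in-focus position describes the subject located at that area's position in the frame, which is why the ranging area position data and the estimated subject distance are used together.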

To achieve the above object, a pseudo-stereoscopic image generation device of the present invention also comprises: camera shake detecting means that detects camera shake information consisting of information on the magnitude of shaking of an image sensor, which photoelectrically converts an optical image of a subject formed on its imaging surface through an optical system to obtain a non-stereoscopic image signal, and information on the magnitude of panning and tilting; basic depth model generating means that generates images of a plurality of basic depth model types, each representing a basic scene used to generate a pseudo-stereoscopic image signal from the non-stereoscopic image signal obtained by the image sensor; depth model synthesizing means that, based on a first control signal indicating the composition ratio of the images of the plurality of basic depth model types, synthesizes the images supplied from the basic depth model generating means to generate a depth model image; weighting means that, based on a second control signal indicating a weighting coefficient for the non-stereoscopic image signal, multiplies the non-stereoscopic image signal by that weighting coefficient; depth estimation data generating means that generates depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the depth model image generated by the depth model synthesizing means; texture shifting means that generates a pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data whose depth and convergence have been adjusted by a third control signal indicating depth and convergence; and control signal calculating means that, when calculating and outputting the first to third control signals, uses the camera shake information to vary the value of at least one of the first to third control signals.

To achieve the above object, a pseudo-stereoscopic image generation device of the present invention also comprises: roll angle detecting means that detects the roll angle, about the optical axis, of an image sensor that photoelectrically converts an optical image of a subject formed on its imaging surface through an optical system to obtain a non-stereoscopic image signal; basic depth model generating means that generates images of a plurality of basic depth model types, each representing a basic scene used to generate a pseudo-stereoscopic image signal from the non-stereoscopic image signal obtained by the image sensor; depth model synthesizing means that, based on a first control signal indicating the composition ratio of the images of the plurality of basic depth model types, synthesizes the images supplied from the basic depth model generating means to generate a depth model image; weighting means that, based on a second control signal indicating a weighting coefficient for the non-stereoscopic image signal, multiplies the non-stereoscopic image signal by that weighting coefficient; depth estimation data generating means that generates depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the depth model image generated by the depth model synthesizing means; texture shifting means that generates a pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data whose depth and convergence have been adjusted by a third control signal indicating depth and convergence; and control signal calculating means that, when calculating and outputting the first to third control signals, uses the roll angle information detected by the roll angle detecting means to vary the value of at least one of the first and third control signals.

Furthermore, to achieve the above object, a camera of the present invention, having an image sensor that photoelectrically converts an optical image of a subject formed on its imaging surface through an optical system to obtain a non-stereoscopic image signal, comprises: subject estimated distance calculating means that calculates an estimated distance from the camera to the subject; shooting scene information acquiring means that acquires shooting scene information based on the estimated distance and on the signal change between two adjacent frames of the non-stereoscopic image signal output from the image sensor; basic depth model generating means that generates images of a plurality of basic depth model types, each representing a basic scene used to generate a pseudo-stereoscopic image signal from the non-stereoscopic image signal output from the image sensor; depth model synthesizing means that, based on a first control signal indicating the composition ratio of the images of the plurality of basic depth model types, synthesizes the images supplied from the basic depth model generating means to generate a depth model image; weighting means that, based on a second control signal indicating a weighting coefficient for the non-stereoscopic image signal, multiplies the non-stereoscopic image signal by that weighting coefficient; depth estimation data generating means that generates depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the depth model image generated by the depth model synthesizing means; texture shifting means that generates a pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data whose depth and convergence have been adjusted by a third control signal indicating depth and convergence; and control signal calculating means that, when calculating and outputting the first to third control signals, uses the shooting scene information acquired by the shooting scene information acquiring means to vary the value of at least one of the first to third control signals.

According to the present invention, the control signals are corrected using shooting parameters obtained for each frame, so that a pseudo-stereoscopic image signal with an optimal depth model can be generated for the captured non-stereoscopic image signal. This suppresses the discomfort and fatigue of a user viewing the pseudo-stereoscopic image signal and also makes it possible to generate a pseudo-stereoscopic image signal with greater impact and presence.

FIG. 1 is a block diagram of a first embodiment of a video camera provided with the pseudo-stereoscopic image generation device of the present invention.
FIG. 2 is a block diagram of an example of the 2D3D conversion unit in FIG. 1.
FIG. 3 is a block diagram of an example of the depth estimation data generation unit in FIG. 2.
FIG. 4 is a block diagram of an example of the stereo pair generation unit in FIG. 2.
FIG. 5 shows an example of the three-dimensional structure of the image of basic depth model type A.
FIG. 6 shows an example of the three-dimensional structure of the image of basic depth model type B.
FIG. 7 shows an example of the three-dimensional structure of the image of basic depth model type C.
FIG. 8 shows an example of the conditions for determining the composition ratio in the composition ratio determination unit in FIG. 3.
FIG. 9 is a block diagram of an example of the depth model synthesis unit in FIG. 3.
FIG. 10 is a block diagram of a second embodiment of a video camera provided with the pseudo-stereoscopic image generation device of the present invention.
FIG. 11 is a block diagram of a third embodiment of a video camera provided with the pseudo-stereoscopic image generation device of the present invention.

Next, embodiments of the present invention will be described with reference to the drawings.

<First Embodiment>
FIG. 1 shows a block diagram of a first embodiment of a camera provided with the pseudo-stereoscopic image generation device according to the present invention. In the following description of each embodiment, the camera is described as a video camera, but it is not limited to a video camera. The video camera 100 of this embodiment is a handheld video camera having an optical system consisting of a zoom lens 101, a focus lens 102, a diaphragm 103, and an image sensor 104.

The zoom lens 101 is moved by the zoom lens driving unit 105, and the focus lens 102 by the focus lens driving unit 106, independently of each other, toward or away from the imaging surface of the image sensor 104. The aperture value of the diaphragm 103 is controlled by the diaphragm driving unit 107. The image sensor 104 is a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor and has an imaging surface on which a large number of pixels are arranged in a matrix in the row and column directions. The image sensor 104 photoelectrically converts the optical image of the subject formed on the imaging surface and outputs an imaging signal.

The zoom lens driving unit 105 also has a function of detecting the position of the zoom lens 101. Similarly, the focus lens driving unit 106 has a function of detecting the position of the focus lens 102, and the diaphragm driving unit 107 has a function of detecting the aperture amount (aperture value).

In the present invention, the CPU (Central Processing Unit) 108 constitutes the following: face size information acquisition means that acquires size information on the subject's face; lens position acquisition means that acquires the positions of the zoom lens 101 and the focus lens 102; optical system information acquisition means that acquires the position of the zoom lens 101, the position of the focus lens 102, and the aperture value of the diaphragm 103, which make up the optical system; estimated subject distance calculation means that calculates the estimated distance from the in-focus position of the focus lens 102 to the subject; depth of field calculation means that calculates the estimated depth of field; position data acquisition means that acquires position data of the ranging area within the imaging screen; and control signal calculation means that calculates the control signals CTL1 to CTL3. Together with the focus lens driving unit 106, the CPU 108 also constitutes automatic focus adjustment means that moves the focus lens 102 to the in-focus position.

That is, the CPU 108 receives the position of the zoom lens 101 detected by the zoom lens driving unit 105, the position of the focus lens 102 detected by the focus lens driving unit 106, and the aperture amount (aperture value) detected by the diaphragm driving unit 107, and processes these inputs according to the program and tables stored in the memory 109. The CPU 108 also generates the control signals CTL1, CTL2, and CTL3, described later, from the shooting information obtained for each scene of the imaging signal.

The memory 109 stores a numeric table (hereinafter, the "subject distance estimation table") for estimating the distance from the in-focus position of the focus lens 102 to the subject from the combination of the position of the zoom lens 101 and the position of the focus lens 102 in the focused state. The memory 109 also stores a numeric table (hereinafter, the "depth of field table") for estimating the so-called depth of field from the aperture value, the position of the zoom lens 101, and the estimated distance to the subject.

The video camera 100 also has an A/D converter 110 that converts the imaging signal into a digital signal, a signal processing unit 111 that applies predetermined signal processing to the digital signal and outputs image data, an image storage unit 112 that stores the processed image data, an image processing unit 113 that performs image processing such as encoding on the image data, and a recording/playback interface (I/F) unit 114.

The video camera 100 further has a pseudo-stereoscopic image generation unit (hereinafter, the "2D3D conversion unit") 115 and a display I/F unit 116. As described in detail later, the 2D3D conversion unit 115 estimates depth information from the captured image, which is a non-stereoscopic image, based on the control signals CTL1, CTL2, and CTL3 generated by the CPU 108, and generates a pseudo-stereoscopic image that looks almost entirely natural in every scene. The display I/F unit 116 displays the pseudo-stereoscopic image. The 2D3D conversion unit 115 is connected directly to the CPU 108 and is also connected to the CPU 108, together with the image storage unit 112, the image processing unit 113, and the recording/playback I/F unit 114, via a bidirectional image data bus 117.

The video camera 100 also has an operation unit 118. The operation unit 118 includes a touch panel; when the user touches a position on the touch panel with a finger, the corresponding position on the shooting screen is selected as the ranging area. In the video camera 100, the zoom lens 101, the focus lens 102, the diaphragm 103, the image sensor 104, the zoom lens driving unit 105, the focus lens driving unit 106, the diaphragm driving unit 107, the CPU 108, the memory 109, the recording/playback I/F unit 114, and the 2D3D conversion unit 115 constitute the pseudo-stereoscopic image generation device of this embodiment.

The video camera 100 composed of the above components selectively performs the following as appropriate: shooting a subject in a desired shooting mode, recording the captured image data on a recording medium, generating a pseudo-stereoscopic image based on image data reproduced from the recording medium, and displaying the pseudo-stereoscopic image.

First, the operation of the video camera 100 during shooting and during recording of the captured image data on a recording medium will be described. During shooting, incident light from the subject passes through the zoom lens 101, the focus lens 102, and the diaphragm 103 in that order, is formed into an optical image on the imaging surface of the image sensor 104, is photoelectrically converted, and is output as an imaging signal.

The A/D converter 110 converts the imaging signal output from the image sensor 104 into image data, which is a digital signal, and supplies it to the signal processing unit 111. The signal processing unit 111 applies color processing, including pixel interpolation, to the input image data to generate a digital luminance signal (Y) and color difference signals (Cb, Cr), and removes noise from those signals. The image storage unit 112 sequentially stores the digital luminance signal (Y) and color difference signals (Cb, Cr) generated by the signal processing unit 111. The image storage unit 112 holds several frames of these signals.

The image processing unit 113 accesses the image storage unit 112 via the image data bus 117 and encodes the image data stored there using the known MPEG-2 (Moving Picture Experts Group 2) method or MPEG-4 AVC (Moving Picture Experts Group 4 Advanced Video Coding) method.

The encoded image data of the subject obtained by shooting in this way is recorded on a recording medium. That is, the recording/playback I/F unit 114 takes in, via the image data bus 117, the image data obtained when the image processing unit 113 encodes the captured image of the subject, and records that image data on a recording medium (not shown). As described later, however, the recording medium holds not only the encoded image data but also predetermined data multiplexed with it. Examples of the recording medium include a magnetic disk, an optical disc, a semiconductor storage medium, and magnetic tape. The recording medium may be either removable or built in.

Here, the video camera 100 has an automatic focus adjustment (AF) shooting mode. Several AF methods are known; the one used here, the so-called hill-climbing method, takes as the in-focus position the focus lens position at which the high-frequency component of the luminance signal (Y) in a predetermined small area of the screen is at its maximum. In the hill-climbing method, the high-frequency component of the luminance signal obtained from the image sensor 104 (hereinafter, the "focus evaluation value") is stored while the focus lens 102 is moved over the entire ranging range, and the focus lens position corresponding to the maximum stored value is taken as the in-focus position. Once the in-focus position has been found, subsequent shooting handles moving images by driving the focus lens 102 in the vicinity of the in-focus position and finding the maximum of the high-frequency component of the luminance signal.

In this AF shooting mode, the CPU 108 reads, via the image data bus 117, the data of the pixels belonging to the predetermined small area from the one frame of luminance signal (Y) stored in the image storage unit 112. The CPU 108 then obtains the focus evaluation value by calculating the high-frequency component of this pixel data with a digital filter operation. The CPU 108 also instructs the focus lens driving unit 106 to move the focus lens 102.
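The full-range scan of the hill-climbing method can be sketched as follows. This is a minimal illustration under stated assumptions, not the camera's actual firmware: the horizontal difference filter used as the high-frequency measure, the function names, and the `capture_luma` callback are all hypothetical, since the text does not specify the digital filter or the lens interface.

```python
def focus_evaluation(luma, area):
    """Sum of absolute horizontal differences of Y inside the ranging area.

    luma: 2-D list of luminance values; area: (top, left, height, width).
    The difference filter is an assumed stand-in for the digital filter.
    """
    top, left, height, width = area
    total = 0
    for row in luma[top:top + height]:
        window = row[left:left + width]
        total += sum(abs(b - a) for a, b in zip(window, window[1:]))
    return total


def find_focus_position(lens_positions, capture_luma, area):
    """Scan every lens position and return the one with the largest value."""
    best_position, best_value = None, -1
    for position in lens_positions:
        # capture_luma(position) is a hypothetical callback that returns
        # one frame of luminance data with the lens at that position.
        value = focus_evaluation(capture_luma(position), area)
        if value > best_value:
            best_position, best_value = position, value
    return best_position
```

For continuous shooting, the same evaluation could be repeated over only a few lens positions around the last in-focus position rather than over the whole range.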

The predetermined small area mentioned above is called the ranging area. Several methods of determining the ranging area are known. The first method fixes the ranging area at the center of the shooting screen. The second method lets the user place the ranging area at a desired position on the shooting screen. In this method, when the user touches the touch panel of the operation unit 118, a small area of a predetermined size centered on the position on the shooting screen corresponding to the touched position is selected as the ranging area.

The third method tracks the subject that is present, at the moment the user touches the touch panel, in a small area of a predetermined size centered on the position on the shooting screen corresponding to the touched position, and uses the tracked region as the ranging area. For example, when shooting a dog walking around a meadow, touching the position corresponding to the dog on the shooting screen causes the dog's position to be tracked automatically and used as the ranging area, even as the dog moves around afterwards.

The CPU 108 reads the one frame of luminance signal (Y) and color difference signals (Cb, Cr) stored in the image storage unit 112 and uses them to extract the features of the signal in the small area centered on the position on the shooting screen corresponding to the touched position on the touch panel. The color distribution, the shape of the high-frequency component of the luminance signal, and the like are used as features. In the method in which the ranging area tracks a moving subject, the ranging area is determined while the subject is moving by searching, in the luminance signal (Y) and color difference signals (Cb, Cr) of each subsequent frame, for the position with similar features.

The fourth method automatically detects the face of a person in the shooting screen and uses a small area of a predetermined size centered on the position of that face as the ranging area. Means for detecting a person's face are known, so their description is omitted. When several faces are detected, the small area of a predetermined size centered on the position of the face judged to be the main subject is used as the ranging area.

One way to make this judgment is for the user to touch the touch panel, in which case the face corresponding to the touched position is judged to be the face of the main subject. Alternatively, when the features of previously registered persons are stored in the memory 109 by a personal authentication function, the face of a registered person is given priority over that of an unregistered person as the face of the main subject, and when the faces of several registered persons are detected, the face of the main subject is chosen according to the registration priority set by the user at registration.
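The selection rules above might be combined as in the sketch below. The text presents the touch-based and registration-based methods as alternatives; folding them into one function, representing faces by identifiers, and falling back to the first detected face when nobody is registered are all assumptions made for illustration.

```python
def choose_main_face(faces, registration_rank, touched_id=None):
    """Pick the face to treat as the main subject.

    faces: list of identifiers of the detected faces (hypothetical).
    registration_rank: dict mapping a registered identifier to the priority
        set by the user at registration (a smaller number wins).
    touched_id: identifier of the face at the touched position, if any.
    """
    # A face selected by touch takes precedence (assumed ordering).
    if touched_id in faces:
        return touched_id
    # Registered persons are preferred over unregistered ones,
    # and among registered persons the registration priority decides.
    registered = [face for face in faces if face in registration_rank]
    if registered:
        return min(registered, key=registration_rank.get)
    # Fallback when nobody is registered: an assumption, not from the text.
    return faces[0] if faces else None
```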

When the position of a face is used as the ranging area, size information for that face is also acquired. The face size information is, for example, information that associates the on-screen area (number of pixels) of the face of the person detected in the shooting screen with the position (zoom ratio) of the zoom lens 101 at the time of shooting. In the AF shooting mode, the in-focus position for the ranging area changes from moment to moment, and with the tracking and face detection methods the position of the ranging area also changes from moment to moment.

During shooting, the zoom lens 101 can be moved in accordance with the user's operation of the operation unit 118 to enlarge or reduce the shooting screen. In this case, the CPU 108 instructs the zoom lens driving unit 105 to move the zoom lens 101 in response to the zoom lens operation signal input from the operation unit 118.

During shooting, the aperture amount (aperture value) can also be changed in accordance with the user's operation of the operation unit 118. In this case, the CPU 108 instructs the diaphragm driving unit 107 to change the aperture amount of the diaphragm 103 in response to the aperture control signal from the operation unit 118. The CPU 108 may also control the aperture amount automatically according to the brightness of the subject.

The position of the zoom lens 101, the position of the focus lens 102, and the aperture amount of the diaphragm 103 are acquired by the zoom lens driving unit 105, the focus lens driving unit 106, and the diaphragm driving unit 107, respectively, and sent to the CPU 108. Using this information and the subject distance estimation table stored in the memory 109, the CPU 108 estimates the distance from the in-focus position of the focus lens 102 to the subject (that is, it calculates the estimated distance to the subject). The CPU 108 also estimates the depth of field using this information and the depth of field table stored in the memory 109.
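The two table lookups might look like the sketch below. Everything concrete here is hypothetical: the table contents, the quantization of lens positions into integer steps, the nearest-entry lookup, and the distance buckets. The text states only that the lens positions index a numeric table for the distance, and that the aperture value, zoom position, and estimated distance index another for the depth of field.

```python
# Hypothetical subject distance estimation table:
# (zoom step, focus step) -> estimated subject distance in meters.
SUBJECT_DISTANCE_TABLE = {
    (0, 0): 0.5, (0, 1): 1.0, (0, 2): 3.0,
    (1, 0): 1.0, (1, 1): 2.5, (1, 2): 8.0,
}

# Hypothetical depth of field table:
# (aperture value, zoom step) -> list of (distance upper bound, depth in meters).
DEPTH_OF_FIELD_TABLE = {
    (2.8, 0): [(1.0, 0.2), (3.0, 1.5), (float("inf"), 10.0)],
    (2.8, 1): [(1.0, 0.1), (3.0, 0.6), (float("inf"), 4.0)],
}


def estimate_subject_distance(zoom_step, focus_step):
    """Return the entry whose key is closest to the detected lens positions."""
    key = min(
        SUBJECT_DISTANCE_TABLE,
        key=lambda k: (k[0] - zoom_step) ** 2 + (k[1] - focus_step) ** 2,
    )
    return SUBJECT_DISTANCE_TABLE[key]


def estimate_depth_of_field(aperture_value, zoom_step, distance):
    """Return the depth of field for the first bucket that contains distance."""
    for upper_bound, depth in DEPTH_OF_FIELD_TABLE[(aperture_value, zoom_step)]:
        if distance <= upper_bound:
            return depth
```

A real table would be calibrated for the specific lens; interpolating between entries instead of taking the nearest one is another possible design.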

Next, the CPU 108 outputs to the recording/playback I/F unit 114, for each frame, the position data of the ranging area within the shooting screen, the estimated distance to the subject, the estimated depth of field, data indicating whether the subject is a face, and, if it is a face, the face size data. The recording/playback I/F unit 114 multiplexes the image data encoded by the image processing unit 113 with the corresponding data generated by the CPU 108 and records the result on the recording medium (not shown). The method of multiplexing this data with the encoded image data is not particularly limited; for example, the data may be embedded in the encoded image data as user data by a known method.

The position of the ranging area, the estimated distance to the subject, and the estimated depth of field need not be multiplexed every frame; they may be multiplexed every few frames, or only when a change larger than a predetermined threshold has occurred.
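The option of multiplexing only on a large change can be sketched as below. The function name, the use of the estimated distance alone as the monitored value, and the choice to compare against the last recorded value (rather than the previous frame) are assumptions for illustration.

```python
def select_frames_to_multiplex(estimated_distances, threshold):
    """Return the indices of the frames whose data would be multiplexed.

    The first frame is always selected. A later frame is selected only when
    its estimated distance differs from the last selected value by more
    than threshold.
    """
    selected = []
    last_recorded = None
    for index, distance in enumerate(estimated_distances):
        if last_recorded is None or abs(distance - last_recorded) > threshold:
            selected.append(index)
            last_recorded = distance
    return selected
```

With distances of 2.0, 2.1, 2.2, 3.5, and 3.6 m and a threshold of 0.5 m, only frames 0 and 3 would carry updated data.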

Instead of the estimated distance to the subject and the estimated depth of field, the position of the zoom lens 101, the position of the focus lens 102, and the aperture amount of the diaphragm 103 may be encoded. If data corresponding to the subject distance estimation table and the depth of field table is multiplexed and recorded along with them, the playback side can calculate the estimated distance to the subject and the estimated depth of field. If use is limited to recording and playback on the same device, the subject distance estimation table and the depth of field table need not be recorded.

In this embodiment the estimated distance to the subject is obtained by the hill-climbing method, but the invention is not limited to this; a method that emits infrared light and observes the reflected wave, or a so-called phase difference detection method, may be used instead.

Next, the operation of this embodiment when generating a pseudo-stereoscopic image based on image data reproduced from the recording medium will be described.

During playback, the recording/playback I/F unit 114 reads the multiplexed data recorded on the recording medium (not shown) and supplies the encoded image data in it to the image processing unit 113 via the image data bus 117. The recording/playback I/F unit 114 supplies the rest of the multiplexed data, namely the position of the ranging area, the estimated distance to the subject, the estimated depth of field, the data indicating whether the subject is a face, and the face size data in the case of a face, to the CPU 108 via the image data bus 117.

The image processing unit 113 decodes the input encoded image data to obtain a digital decoded luminance signal (Y) and decoded color difference signals (Cb, Cr), and stores them in the image storage unit 112. The image storage unit 112 reads out the stored decoded luminance signal (Y) and decoded color difference signals (Cb, Cr), after operations such as rearranging the frames into time order, and supplies them to the 2D3D conversion unit 115.

Meanwhile, the CPU 108 calculates the control signals CTL1, CTL2, and CTL3 for each frame, as described later, based on the input data (the position of the ranging area, the estimated distance to the subject, the estimated depth of field, the data indicating whether the subject is a face, and the face size data in the case of a face), and supplies them to the 2D3D conversion unit 115. The CPU 108 supplies the control signals CTL1 to CTL3 for a given frame to the 2D3D conversion unit 115 in step with the timing at which that frame of the decoded luminance signal (Y) and decoded color difference signals (Cb, Cr) (hereinafter collectively, the "decoded image signal a") is input to the 2D3D conversion unit 115.

The control signals CTL1 to CTL3 are generated automatically by the CPU 108, frame by frame, when the pseudo-stereoscopic image obtained without control signals would look unnatural. The control signals CTL1 to CTL3 may instead be generated for each field.

The control signal CTL1 controls the composition ratios of the images of the basic depth model types A, B, and C described later, and includes parameters indicating the composition ratio k1 of basic depth model type A, the composition ratio k2 of basic depth model type B, and the composition ratio k3 of basic depth model type C. Because the three composition ratios k1 to k3 always sum to 1, the control signal CTL1 may, for example, carry any two of the three composition ratios, with the remaining one generated inside the 2D3D conversion unit 115 by subtracting the sum of the two transmitted ratios from 1.
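The two-ratio transmission described above can be illustrated with a short sketch. The function names and the choice of k1 and k2 as the transmitted pair are assumptions; the text allows any two of the three ratios to be sent.

```python
def encode_ctl1(k1, k2, k3):
    """Carry only k1 and k2; k3 is implied because the ratios sum to 1."""
    if abs(k1 + k2 + k3 - 1.0) > 1e-9:
        raise ValueError("composition ratios must sum to 1")
    return (k1, k2)


def decode_ctl1(ctl1):
    """Recover all three ratios on the 2D3D conversion side."""
    k1, k2 = ctl1
    k3 = 1.0 - (k1 + k2)  # remaining ratio = 1 minus the transmitted sum
    return (k1, k2, k3)
```

For example, sending (0.5, 0.25) lets the receiver recover k3 = 0.25.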

The control signal CTL2 indicates a weighting coefficient applied only to the red signal (R signal) component of the decoded image signal a, the non-stereoscopic image signal obtained by decoding in the image processing unit 113. The control signal CTL2 makes it possible to keep the pseudo-stereoscopic image from looking unnatural even when the luminance differences in the decoded image signal a are large.

The control signal CTL3 includes a parameter indicating depth and a parameter indicating convergence. Convergence here means arranging that the lines of sight of both eyes of a user viewing the pseudo-stereoscopic image are nearly parallel for distant views and can rotate inward for near views.

In FIG. 1, the 2D3D conversion unit 115 receives as inputs the decoded image signal a supplied from the image storage unit 112 and the control signals CTL1 to CTL3 supplied from the CPU 108. As described later, it generates depth estimation data d based on the decoded image signal a or on the control signals CTL1 and CTL2 and outputs it to the CPU 108, and it generates a pseudo-stereoscopic image signal based on the decoded image signal a and the control signals CTL1 to CTL3.

Next, the configuration and operation of the 2D3D conversion unit 115 will be described in detail.

FIG. 2 is a block diagram of an example of the 2D3D conversion unit 115, FIG. 3 is a block diagram of an example of the depth estimation data generation unit in FIG. 2, and FIG. 4 is a block diagram of an example of the stereo pair generation unit 300 in FIG. 2.

As shown in FIG. 2, the 2D3D conversion unit 115 is composed of a depth estimation data generation unit 200 and a stereo pair generation unit 300. The 2D3D conversion unit 115 outputs the decoded image signal a, the non-stereoscopic image signal supplied from the image storage unit 112, unprocessed as, for example, the right-eye video signal e2, and also supplies it to the depth estimation data generation unit 200 and the stereo pair generation unit 300.

The depth estimation data generation unit 200 generates the depth estimation data d from the control signals CTL1 and CTL2 supplied from the CPU 108 and the decoded image signal a supplied from the image storage unit 112, and supplies it to the CPU 108 and the stereo pair generation unit 300. The stereo pair generation unit 300 generates and outputs the left-eye video signal e1 of the pseudo-stereoscopic image signal from the depth estimation data d, the control signal CTL3 supplied from the CPU 108, and the decoded image signal a supplied from the image storage unit 112.

As shown in FIG. 3, the depth estimation data generation unit 200 is composed of an image input unit 201, a screen-upper high-frequency component evaluation unit 202, a screen-lower high-frequency component evaluation unit 203, a composition ratio determination unit 204, a switch 205, a depth model synthesis unit 206 serving as depth model synthesis means, frame memories 207, 208, and 209 each serving as basic depth model generation means, a control signal determination unit 210, a weighting unit 211 serving as weighting means, and an adding unit 212 serving as depth estimation data generation means that generates the depth estimation data.

The image input unit 201 includes a frame memory. After temporarily storing one frame of the decoded image signal a, which is a non-stereoscopic image signal, it supplies that frame to the screen-top high-frequency component evaluation unit 202 and the screen-bottom high-frequency component evaluation unit 203, and supplies only the red signal (R signal) component of the decoded image signal a to one input terminal of the weighting unit 211. The other input terminal of the weighting unit 211 receives the control signal CTL2 from the control signal determination unit 210, which serves as control signal determination means.

The screen-top high-frequency component evaluation unit 202 obtains the high-frequency component within the region corresponding to roughly the top 20% of the screen in one frame of the decoded image signal a, calculates it as the screen-top high-frequency component evaluation value, and supplies that value to the composition ratio determination unit 204. Likewise, the screen-bottom high-frequency component evaluation unit 203 obtains the high-frequency component within the region corresponding to roughly the bottom 20% of the screen in one frame of the decoded image signal a, calculates it as the screen-bottom high-frequency component evaluation value, and supplies that value to the composition ratio determination unit 204.
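The text does not state how the high-frequency component is measured. As a minimal sketch, assuming the mean absolute difference between horizontally adjacent luminance samples as a stand-in measure, the two evaluation values could be computed as follows. The function names and the list-of-rows frame layout are hypothetical.

```python
def high_freq_score(rows):
    """Mean absolute difference between horizontally adjacent samples.

    This measure is an assumption; the source only says "high-frequency component".
    """
    total = 0
    count = 0
    for row in rows:
        for left, right in zip(row, row[1:]):
            total += abs(right - left)
            count += 1
    return total / count if count else 0.0


def top_bottom_scores(luma, fraction=0.2):
    """Evaluate the top and bottom `fraction` of a frame given as a list of rows."""
    band = max(1, round(len(luma) * fraction))
    return high_freq_score(luma[:band]), high_freq_score(luma[-band:])


# 10-row frame: flat rows at the top, alternating (busy) rows at the bottom.
frame = [[100] * 8 for _ in range(8)] + [[0, 255] * 4 for _ in range(2)]
top_score, bottom_score = top_bottom_scores(frame)
print(top_score, bottom_score)  # 0.0 255.0
```

A flat sky-like top gives a low score and a detailed bottom gives a high one, which is the kind of contrast the composition ratio determination unit 204 works from.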

Meanwhile, the frame memory 207 stores in advance an image of basic depth model type A, the frame memory 208 an image of basic depth model type B, and the frame memory 209 an image of basic depth model type C. Each of these basic depth model type A to C images represents a basic scene structure that serves as the basis for generating depth estimation data from a non-stereoscopic image signal and thereby generating a pseudo-stereoscopic image signal.

Specifically, the basic depth model type A image is a depth model image formed by a concave spherical surface and has the three-dimensional structure shown in FIG. 5. This type A image is used in most cases, because in a scene containing no objects, setting the center of the screen as the farthest point yields a stereoscopic effect with little sense of incongruity and a comfortable sense of depth.

The basic depth model type B image is obtained by replacing the upper part of the type A image with an arch-shaped cylindrical surface instead of a spherical surface. As its three-dimensional structure in FIG. 6 shows, it is a model image whose upper part is a cylindrical surface (with a vertical axis) and whose lower part is a concave (spherical) surface.

The basic depth model type C image, whose three-dimensional structure is shown in FIG. 7, has a flat upper part and a lower part that continues from that plane as a cylindrical surface coming toward the viewer as it goes down. In other words, it is a model image whose upper part is a plane and whose lower part is a cylindrical surface (with a horizontal axis). The basic depth model type A to C images stored in the frame memories 207 to 209, which constitute the basic depth model type generation means, are supplied to the depth model synthesis unit 206.

Based on the screen-top high-frequency component evaluation value supplied from the evaluation unit 202 and the screen-bottom high-frequency component evaluation value supplied from the evaluation unit 203, the composition ratio determination unit 204 automatically calculates, by a predetermined method and without considering the scene of the image, the composition ratio k1 of basic depth model type A, the composition ratio k2 of basic depth model type B, and the composition ratio k3 of basic depth model type C. It supplies these as a composition ratio signal COM to the switch 205, which serves as control signal determination means. The sum of the three composition ratios k1 to k3 is always 1.

FIG. 8 shows an example of the conditions for determining the composition ratios. The horizontal axis is the screen-top high-frequency component evaluation value (hereinafter, the upper high-frequency component evaluation value) and the vertical axis is the screen-bottom high-frequency component evaluation value (hereinafter, the lower high-frequency component evaluation value). The composition ratios are determined by how these two values compare with the predesignated values tps, tpl, bms, and bml. These determination conditions are the known conditions disclosed by the present applicant in Japanese Patent No. 4214976, but the conditions are not limited to them.

In FIG. 8, regions labeled with more than one type are synthesized linearly according to the high-frequency component evaluation values. For example, in the "typeA/B" region of FIG. 8, the ratio between typeA, the value for basic depth model type A, and typeB, the value for basic depth model type B, is determined from the upper and lower high-frequency component evaluation values as shown below, and typeC, the value for basic depth model type C, is not used in determining the ratio.

typeA : typeB : typeC
= (upper high-frequency component evaluation value - tps) : (tpl - lower high-frequency component evaluation value) : 0

In the "typeA/B/C" region of FIG. 8, the average of typeA/B and typeA/C is adopted, and the typeA/B/C values are determined as follows.

typeA : typeB : typeC
= (upper high-frequency component evaluation value - tps) + (lower high-frequency component evaluation value - bms) : (tpl - upper high-frequency component evaluation value) : (bml - lower high-frequency component evaluation value)

The composition ratios k1, k2, and k3 are calculated by the following equations.

k1 = typeA / (typeA + typeB + typeC)
k2 = typeB / (typeA + typeB + typeC)
k3 = typeC / (typeA + typeB + typeC)

Returning to FIG. 3, when the control signal CTL1 is supplied from the CPU 108, the switch 205 selects the control signal CTL1; when the control signal CTL1 is not supplied, it selects the composition ratio signal COM supplied from the composition ratio determination unit 204. The switch 205 supplies the selected signal to the depth model synthesis unit 206, which serves as synthesis means.
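The ratio calculation for the typeA/B/C region and the selection made by the switch 205 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of `None` to represent "CTL1 not supplied", and the numeric thresholds are all assumptions made for the example.

```python
def blend_ratios(type_a, type_b, type_c):
    """Normalize the three type values so that k1 + k2 + k3 == 1."""
    total = type_a + type_b + type_c
    if total <= 0:
        raise ValueError("type values must sum to a positive number")
    return type_a / total, type_b / total, type_c / total


def ratios_in_abc_region(top, bottom, tps, tpl, bms, bml):
    """Type values for the typeA/B/C region of FIG. 8, then normalized."""
    type_a = (top - tps) + (bottom - bms)
    type_b = tpl - top
    type_c = bml - bottom
    return blend_ratios(type_a, type_b, type_c)


def select_ratios(ctl1, com):
    """Switch 205: use CTL1 when the CPU supplies it, otherwise COM."""
    return ctl1 if ctl1 is not None else com


# Hypothetical thresholds and evaluation values, chosen only for illustration.
com = ratios_in_abc_region(top=30, bottom=20, tps=10, tpl=50, bms=10, bml=50)
k1, k2, k3 = select_ratios(None, com)  # no CTL1 supplied, so COM is used
print(k1, k2, k3)  # 0.375 0.25 0.375
```

With these example numbers the type values are 30, 20, and 30, so the normalized ratios sum to 1 as the text requires.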

The depth model synthesis unit 206 synthesizes the basic depth model type A to C images at the composition ratios k1 to k3 indicated by the control signal CTL1 or the composition ratio signal COM supplied from the switch 205, and generates an image signal that serves as the composite depth model.

FIG. 9 is a block diagram of an example of the depth model synthesis unit 206. As shown in the figure, the depth model synthesis unit 206 multiplies the composition ratios k1, k2, and k3, indicated by the control signal CTL1 or the composition ratio signal COM selected by the switch 205, by the image signals of basic depth model types A, B, and C from the frame memories 207, 208, and 209 in separate multipliers 2061, 2062, and 2063. An adder 2064 sums the three products, and the sum is output to the addition unit 212, which serves as depth estimation data generation means.
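The multiply-and-add structure of FIG. 9 amounts to a per-pixel weighted sum of the three model images. A minimal sketch, assuming each model image is a list of rows of depth values (the function name and the tiny 2x2 placeholder images are hypothetical):

```python
def synthesize_depth_model(model_a, model_b, model_c, k1, k2, k3):
    """Per-pixel k1*A + k2*B + k3*C, mirroring multipliers 2061-2063 and adder 2064."""
    return [
        [k1 * a + k2 * b + k3 * c for a, b, c in zip(row_a, row_b, row_c)]
        for row_a, row_b, row_c in zip(model_a, model_b, model_c)
    ]


# Placeholder 2x2 "models"; real type A to C images follow FIGS. 5 to 7.
model_a = [[0, 100], [200, 40]]
model_b = [[80, 80], [80, 80]]
model_c = [[0, 0], [160, 160]]
composite = synthesize_depth_model(model_a, model_b, model_c, 0.5, 0.25, 0.25)
print(composite)  # [[20.0, 70.0], [160.0, 80.0]]
```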

In FIG. 3, the control signal determination unit 210 determines whether the control signal CTL2 is being supplied from the CPU 108. When the control signal CTL2 is supplied, the control signal determination unit 210 passes it to the weighting unit 211. When the control signal CTL2 is not supplied, the control signal determination unit 210 supplies the weighting unit 211 with a control signal CTL2 containing a parameter that corresponds to a weighting coefficient preset in the control signal determination unit 210.

The weighting unit 211 weights the R signal component of the decoded image signal a by multiplying it, as supplied from the image input unit 201, by the weighting coefficient contained in the control signal CTL2 supplied from the control signal determination unit 210. The maximum value of the weighting coefficient is 1. The R signal component is calculated in the image input unit 201 from the luminance signal (Y) and the color difference signal Cr by an operation such as R = Y + Cr.

One reason for using the R signal component is the empirical rule that, in an environment close to front lighting and under conditions where the brightness of the texture does not vary greatly, the magnitude of the R signal component has a high probability of matching the unevenness of the original image. A texture is an element that makes up an image and consists of a single pixel or a group of pixels.

Another reason for using the R signal component is that red and warm colors are advancing colors in color science and are perceived as nearer than cool colors. Placing such regions toward the front therefore makes it possible to emphasize the stereoscopic effect.

While red and warm colors are advancing colors, blue is a receding color and is perceived as farther away than warm colors. The stereoscopic effect can therefore also be emphasized by placing blue regions toward the back. It is further possible to combine both, placing red regions toward the front and blue regions toward the back.

The addition unit 212 generates the depth estimation data d from the image signal of the composite depth model supplied from the depth model synthesis unit 206 and the weighted R signal component of the decoded image signal a supplied from the weighting unit 211. For example, it superimposes the weighted R signal component on the image signal of the composite depth model to generate the depth estimation data d. When the superimposed value exceeds the predetermined number of bits allocated to the depth estimation data d, it is limited to that number of bits. The generated depth estimation data d is supplied to the stereo pair generation unit 300 and also to the CPU 108.
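The weighting and superimposition steps can be sketched together as below. This is an illustration under stated assumptions: R is taken as Y + Cr as in the text, the depth data is assumed to be 8 bits wide (0 to 255), limiting is done by clamping, and the function name and list-of-rows layout are hypothetical.

```python
def depth_estimate(model, luma, cr, weight, max_value=255):
    """Add the weighted R component (R = Y + Cr) to the composite depth model.

    `weight` corresponds to the CTL2 weighting coefficient (at most 1).
    Clamping to 0..max_value is an assumed way of limiting the bit width.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    depth = []
    for model_row, y_row, cr_row in zip(model, luma, cr):
        row = []
        for base, y, c in zip(model_row, y_row, cr_row):
            r = y + c                    # R signal component
            value = base + weight * r    # superimpose on the depth model
            row.append(int(min(max(value, 0), max_value)))
        depth.append(row)
    return depth


d = depth_estimate(model=[[100, 200]], luma=[[60, 180]], cr=[[20, 40]], weight=0.5)
print(d)  # [[140, 255]]
```

In the second pixel the sum 200 + 0.5 * 220 = 310 exceeds the 8-bit range and is limited to 255.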

Next, the configuration and operation of the stereo pair generation unit 300 will be described. FIG. 4 is a block diagram of an example of the stereo pair generation unit 300. The control signal determination unit 301, which serves as control signal determination means, determines whether the control signal CTL3 is being supplied from the CPU 108. When the control signal CTL3 is not supplied, it supplies the texture shift unit 302 with two parameters, representing convergence and depth, that are preset in the control signal determination unit 301. When the control signal CTL3 is supplied, it supplies the texture shift unit 302 with the two parameters representing convergence and depth contained in that control signal CTL3.

The texture shift unit 302, which serves as texture shift means, generates an image signal from a viewpoint different from that of the decoded image signal a, based on the supplied decoded image signal a, the depth estimation data d, and the two parameters representing convergence and depth from the control signal determination unit 301. For example, the texture shift unit 302 generates an image signal whose viewpoint is moved to the left relative to the viewpoint from which the decoded image signal a would be displayed on screen. In that case, when a texture is to be shown to the user as a near view, the nearer an image is, the more it appears toward the user's inner (nose) side, so the texture shift unit 302 generates an image signal in which the texture is moved toward the right of the screen by an amount corresponding to its depth. When a texture is to be shown as a distant view, the farther an image is, the more it appears toward the user's outer side, so the texture shift unit 302 generates an image signal in which the texture is moved toward the left of the screen by an amount corresponding to its depth.

Here, the depth estimation data d for each pixel is represented by an 8-bit value Dd. Starting from the smallest Dd values (that is, from the textures located at the back of the screen), the texture shift unit 302 generates an image signal in which the texture of the decoded image signal a corresponding to each Dd is shifted to the right by (Dd - m)/n pixels, pixel by pixel. Here m is a parameter representing the sense of pop-out (the convergence value), and n is a parameter representing depth (the depth value).

To the user, a texture with a small Dd value appears toward the back of the screen, and a texture with a large Dd value appears toward the front. The value Dd, the convergence value m, and the depth value n each range from 0 to 255. For example, the values preset in the control signal determination unit 301 are convergence value m = 200 and depth value n = 20.
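The shift of (Dd - m)/n pixels, applied from the smallest Dd upward so that nearer textures overwrite farther ones, can be sketched as follows. Several details are assumptions not given in the text: the shift is rounded to the nearest whole pixel, pixels shifted off the row are dropped, unfilled positions are marked with `None`, and the function name is hypothetical.

```python
def texture_shift(image, depth, m=200, n=20):
    """Shift each pixel right by (Dd - m) / n pixels (negative values shift left).

    Pixels are processed in ascending Dd order within each row, so textures
    nearer to the viewer (larger Dd) overwrite farther ones.
    """
    height, width = len(image), len(image[0])
    shifted = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in sorted(range(width), key=lambda col: depth[y][col]):
            target = x + round((depth[y][x] - m) / n)  # rounding is an assumption
            if 0 <= target < width:
                shifted[y][target] = image[y][x]
    return shifted


row = [list("abcde")]
dd = [[200, 200, 240, 200, 200]]  # only pixel "c" is nearer than the screen plane
print(texture_shift(row, dd))  # [['a', 'b', None, 'd', 'c']]
```

With the preset values m = 200 and n = 20, pixel "c" (Dd = 240) moves two pixels to the right and covers "e", leaving a hole at its old position. Such holes are what the occlusion compensation step has to fill.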

The occlusion compensation unit 303 performs occlusion compensation on the different-viewpoint image signal output from the texture shift unit 302 and supplies the compensated image signal to the post-processing unit 304. An occlusion here means a portion where no texture exists because shifting the textures has changed the positional relationships within the image. The occlusion compensation unit 303 fills each occlusion with the corresponding part of the original decoded image signal a. Alternatively, occlusions may be compensated by the method described in the known literature (Kunio Yamada, Kenji Mochizuki, Kiyoharu Aizawa, Takahiro Saito: "Occlusion compensation based on texture statistics of images divided by the region competition method", Journal of the Institute of Image Information Science, Vol. 56, No. 5, pp. 863-866 (2002.5)).
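The simpler of the two compensation methods, filling each hole from the original image, can be sketched as below. It assumes that holes are marked with `None` and that "corresponding" means the pixel at the same position in the original image; both points, and the function name, are assumptions made for illustration.

```python
def fill_occlusions(shifted, original):
    """Replace every hole (None) with the original pixel at the same position."""
    return [
        [orig if pix is None else pix for pix, orig in zip(s_row, o_row)]
        for s_row, o_row in zip(shifted, original)
    ]


shifted = [["a", "b", None, "d", "c"]]
original = [["a", "b", "c", "d", "e"]]
print(fill_occlusions(shifted, original))  # [['a', 'b', 'c', 'd', 'c']]
```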

The post-processing unit 304, which serves as post-processing means, applies post-processing such as smoothing and noise removal by known methods, as needed, to the image signal compensated by the occlusion compensation unit 303, and outputs the left-eye image signal e1. The 2D3D conversion unit 115 uses the decoded image signal a reproduced from the recording medium by the recording/playback I/F unit 114 as the right-eye image signal e2.

In FIG. 1, the display I/F unit 116 receives as input signals the right-eye image signal e2 output from the 2D3D conversion unit 115 and the post-processed left-eye image signal e1, and outputs them to a monitor (not shown) that supports stereo image display. The user can thus view a stereo image. In this way, a pseudo-stereoscopic image is generated from the image data reproduced from the recording medium and is displayed.

Next, the operation of the main part of this embodiment will be described in more detail.

The following describes in detail how the control signals CTL1 to CTL3 are generated so that, when a non-stereoscopic image shot by the user with the video camera 100 is converted into a pseudo-stereoscopic image, the pseudo-stereoscopic image generation parameters are adjusted automatically without the user having to act as a producer.

From the multiplexed data that the recording/playback I/F unit 114 reads from a recording medium (not shown), the CPU 108 acquires various data such as the position of the ranging area, the estimated distance to the subject, the estimated depth of field, data on whether the subject is a face, and, if it is a face, data on the size of the face. The CPU 108 then calculates the control signals CTL1 to CTL3 from this acquired data and the depth estimation data d supplied from the 2D3D conversion unit 115, as follows.

When the ranging area is positioned at the horizontal center of the shooting screen and the estimated distance to the subject is smaller than a predetermined threshold 1, the CPU 108 generates the control signal CTL1 so that the mixing ratio of basic depth model type C is higher than it would be otherwise. The reason is that a subject at close range is usually the main subject and the part the viewer watches during playback. With a model such as basic depth model type A or B, in which regions nearer the center of the screen are placed farther away, the watched part would end up far away, which feels unnatural.

When the subject is a face and the normalized face size is larger than a predetermined threshold 2, the shot is likely to be a bust shot, that is, a shot capturing the whole face together with the subject from the chest up. In that case the CPU 108 generates the control signal CTL1 so that the mixing ratio of basic depth model type C is higher than it would be otherwise, for the same reason as above. Similarly, the CPU 108 generates the control signal CTL1 so that the shallower the estimated depth of field, the higher the mixing ratio of basic depth model type C, because variation in depth within out-of-focus regions feels unnatural.

Furthermore, when the ranging area is positioned at the right or left edge of the shooting screen in the horizontal direction and the estimated distance to the subject is smaller than a predetermined threshold 3, the CPU 108 generates the control signal CTL1 so that the mixing ratio of basic depth model type B is higher than it would be otherwise. The watched part is then considered to be at the right or left edge of the screen, so a model that places that part toward the front and the other parts toward the back feels less unnatural.

When the estimated distance to the subject is larger than a predetermined threshold 4, the CPU 108 generates the control signal CTL1 so that the mixing ratio of basic depth model type A is higher than it would be otherwise. Giving the screen a sense of expanse in this way makes the pseudo-stereoscopic image more powerful.

In all other cases the CPU 108 does not output the control signal CTL1, so the switch 205 outputs the composition ratio signal COM.

When the main subject is a face and the normalized face size is larger than a predetermined threshold 5, the CPU 108 sets the weighting coefficient contained in the control signal CTL2 to a larger value than in other cases. Because the skin-colored part of a face contains an R signal component, increasing the weighting coefficient places the face nearer to the viewer than its surroundings, which strengthens the impression of the pseudo-stereoscopic image.

In addition, the deeper the estimated depth of field, the larger the CPU 108 makes the weighting coefficient contained in the control signal CTL2. When the estimated depth of field is deep, much of the screen is usually rendered without blur, and the sense of depth is lost when the picture is viewed as a two-dimensional (non-stereoscopic) image. The larger coefficient restores a sense of depth.

The CPU 108 also obtains the average of the Dd values of those pixels of the depth estimation data d that correspond to the position of the ranging area within the shooting screen, and uses this average as m. In other words, this yields the convergence value m of the control signal CTL3, which controls convergence and depth. From the shift amount of (Dd - m)/n pixels given above, the region corresponding to the ranging area then becomes a region with a shift amount of 0. A region whose shift amount between the right-eye and left-eye image data is 0 appears, in stereoscopic display, to lie on the actual display surface of the monitor. Since the ranging area contains the main subject and is considered to be the part watched during playback, the aim is to give a stable impression by always placing that part on the actual display surface of the monitor.
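A minimal sketch of this calculation, assuming the ranging area is given as a rectangle `(left, top, right, bottom)` in pixel coordinates with exclusive right and bottom edges (this layout and the function name are hypothetical):

```python
def convergence_from_area(depth, area):
    """Average Dd over the ranging area; the result is used as convergence value m."""
    left, top, right, bottom = area
    values = [depth[y][x] for y in range(top, bottom) for x in range(left, right)]
    return sum(values) / len(values)


dd = [
    [10, 20, 30],
    [40, 180, 200],
    [50, 220, 240],
]
m = convergence_from_area(dd, area=(1, 1, 3, 3))
print(m)  # 210.0
```

A pixel whose Dd equals this average has a shift of (210 - 210)/n = 0 for any n, so the ranging area sits on average at the display surface.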

Furthermore, the deeper the estimated depth of field, the smaller the CPU 108 makes the depth value n of the control signal CTL3. Because the shift amount is (Dd - m)/n pixels, a smaller n gives larger shifts, that is, a greater depth in the stereoscopic image. Conversely, the shallower the estimated depth of field, the larger the CPU 108 makes the depth value n of the control signal CTL3, that is, the smaller the depth in the stereoscopic image.

The reason is as follows. When the estimated depth of field is deep, many parts of the image are usually rendered without blur, and the sense of depth is lost when the picture is viewed as a two-dimensional (non-stereoscopic) image. Supplementing this through 2D3D conversion provides a sense of depth. Conversely, when the estimated depth of field is shallow, the parts of the image that appear to be in focus are usually limited, and a sense of depth is perceived even in a two-dimensional (non-stereoscopic) image. If 2D3D conversion added still more depth in this state, the user could feel as though even the focal position of the eyes were being forced, which can be unpleasant.

When the main subject is a face and the normalized face size is larger than the predetermined threshold 5, the CPU 108 makes the depth value n of the control signal CTL3 larger than in other cases, which reduces the shift amount (Dd - m)/n. That is, it reduces the depth in the stereoscopic image.

The values of the control signals CTL1 to CTL3 described above are generated with temporal continuity across consecutive frames taken into account. That is, the values are changed gradually over time, because the visual perception of a user viewing the pseudo-stereoscopic image cannot keep up with abrupt changes.
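The text does not say how the gradual change is achieved. One common option, shown here purely as an assumption, is first-order exponential smoothing applied to each parameter once per frame; the function name and the smoothing factor `alpha` are hypothetical.

```python
def smooth(previous, target, alpha=0.25):
    """Move a control value a fraction `alpha` of the way toward its new target."""
    return previous + alpha * (target - previous)


# Depth value n stepping from 20 toward a new target of 40 over three frames.
n = 20.0
history = []
for _ in range(3):
    n = smooth(n, 40.0)
    history.append(n)
print(history)  # [25.0, 28.75, 31.5625]
```

A smaller `alpha` gives a slower, gentler transition; the same update could be applied to the composition ratios, the weighting coefficient, and the convergence value.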

As described above, according to this embodiment, when depth information is estimated from a shot image that is a non-stereoscopic image and a pseudo-stereoscopic image is generated, the first to third control signals control the composition ratios of the images of the plural basic depth model types, the weighting coefficient for the non-stereoscopic image signal, and the depth value and convergence value used in texture shifting. The values of the first to third control signals are varied automatically, based on the shooting parameters obtained for each picture, so that an optimal depth model is obtained. As a result, without the user having to make individual settings for each shooting scene, a pseudo-stereoscopic image that is free of incongruity and closer to a real image can be generated for any scene of a non-stereoscopic image.

In addition, according to this embodiment, a sense of depth suited to portrait shooting can be given when the subject is a face. Furthermore, because the ranging area, which is the in-focus region, appears to lie on the actual display surface of the monitor, a stable pseudo-stereoscopic image can be displayed. Also, because the values of the control signals CTL1 to CTL3 are changed gradually over time, the relationship between the amount of blur and the depth appears natural to the user.

<Second Embodiment>
Next, a second embodiment of the present invention will be described. FIG. 10 is a block diagram of a second embodiment of a video camera provided with the pseudo-stereoscopic image generation apparatus according to the present invention. In the figure, components identical to those in FIG. 1 are given the same reference numerals, and their description is omitted. The video camera 150 of the second embodiment shown in FIG. 10 differs from the video camera 100 in that it performs 2D3D conversion on the captured image signal and records the result, rather than performing 2D3D conversion on reproduced image data.

The video camera 150 selectively performs, as appropriate, shooting of a subject in a desired shooting mode, generation of a pseudo-stereoscopic image based on the captured image data, recording of that pseudo-stereoscopic image on a recording medium, and display of a pseudo-stereoscopic image reproduced from the recording medium.

During shooting, the image storage unit 151 in the video camera 150, like the image storage unit 112, sequentially stores the digital luminance signal (Y) and color difference signals (Cb, Cr) of the subject image output from the signal processing unit 111. It differs from the image storage unit 112 in that it outputs the stored luminance signal (Y) and color difference signals (Cb, Cr) to both the CPU 152 and the 2D3D conversion unit 115.

The CPU 152 uses one frame of the luminance signal (Y) and color difference signals (Cb, Cr) read from the image storage unit 151, acquired information such as the position of the zoom lens 101, the position of the focus lens 102, and the aperture amount of the diaphragm 103, and the subject distance estimation table and depth-of-field table stored in the memory 109. In the same manner as the CPU 108, it generates, for each frame, position data of the ranging area within the shooting screen, data on the estimated distance from the in-focus position of the focus lens 102 to the subject, estimated depth-of-field data, data indicating whether the subject is a face and, if it is a face, data on the size of the face.

The CPU 152 then calculates the control signals CTL1 to CTL3 from these generated data and the depth estimation data d supplied from the 2D3D conversion unit 115, by the same method as described in the first embodiment, and supplies them to the 2D3D conversion unit 115.

The 2D3D conversion unit 115 generates the depth estimation data d, by the same method as described in the first embodiment, from the luminance signal (Y) and color difference signals (Cb, Cr) read from the image storage unit 151, which constitute the non-stereoscopic image signal, and the control signals CTL1 and CTL2 calculated by the CPU 152, and outputs the data to the CPU 152. It also generates a left-eye image signal e1 and a right-eye image signal e2 based on the non-stereoscopic image signal (Y, Cb, Cr) and the control signals CTL1 to CTL3. The 2D3D conversion unit 115 then supplies the generated left-eye image signal e1 and right-eye image signal e2 to the image processing unit 153.

The image processing unit 153 encodes the left-eye image signal e1 and the right-eye image signal e2 supplied from the 2D3D conversion unit 115 independently of each other, using the known MPEG-2 or MPEG-4 AVC scheme. Alternatively, the image processing unit 153 encodes the left-eye image signal e1 and the right-eye image signal e2 supplied from the 2D3D conversion unit 115 as one set of pseudo-stereoscopic image signals, based on, for example, the MPEG-4 MVC (Moving Picture Experts Group 4 Multiview Video Coding) scheme.

The recording/playback I/F unit 114 receives the image signal encoded by the image processing unit 153 via the image data bus 117 and records it on a recording medium (not shown).

According to this embodiment, the pseudo-stereoscopic image signal of the captured subject image can be recorded on a recording medium. The pseudo-stereoscopic image signal can therefore be reproduced and displayed not only by the video camera 150 itself but also by another device, from a recording medium recorded by the video camera 150.

<Third Embodiment>
Next, a third embodiment of the present invention will be described. FIG. 11 is a block diagram of a third embodiment of a video camera provided with the pseudo-stereoscopic image generation apparatus according to the present invention. In the figure, components identical to those in FIG. 1 are given the same reference numerals, and their description is omitted. The video camera 170 of the third embodiment shown in FIG. 11 has the configuration of the video camera 100 with a camera shake sensor 171 and a posture sensor 172 added.

The camera shake sensor 171 consists of a gyro sensor or the like; it measures the sway of the camera and outputs the measurement result to the CPU 173. The camera shake sensor 171 also serves as a means of measuring panning and tilting operations of the camera. When the sway of the camera is measured with a gyro sensor or the like, sway whose angular velocity component has a frequency of 3 to 10 Hz is considered to be caused by hand shake. In contrast, a panning or tilting operation, in which the camera is moved in a fixed direction for a certain time, yields an angular velocity that is almost a direct-current component. Accordingly, the CPU 173 processes the sway measurement output by the camera shake sensor 171 to estimate whether the sway is due to hand shake or to a panning or tilting operation, and in either case calculates the magnitude of the sway.
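The distinction between hand shake (3 to 10 Hz) and panning or tilting (near-DC angular velocity) can be illustrated with a small frequency analysis. This is only a sketch: the text does not describe how the CPU 173 performs the computation, and the function names, the plain DFT, and the `threshold` value are assumptions made for illustration.

```python
import cmath
import math


def band_amplitudes(samples, sample_rate_hz):
    """Return (dc_amplitude, shake_amplitude) of angular-velocity samples.

    dc_amplitude is the magnitude of the mean (pan/tilt-like motion);
    shake_amplitude is the largest DFT amplitude in the 3-10 Hz band.
    """
    n = len(samples)
    dc = abs(sum(samples) / n)
    shake = 0.0
    for k in range(1, n // 2):
        freq = k * sample_rate_hz / n
        if 3.0 <= freq <= 10.0:
            coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                        for i, s in enumerate(samples))
            shake = max(shake, 2.0 * abs(coeff) / n)
    return dc, shake


def classify_motion(samples, sample_rate_hz, threshold=0.5):
    """Label a window as 'pan_tilt', 'hand_shake', or 'steady' (hypothetical labels)."""
    dc, shake = band_amplitudes(samples, sample_rate_hz)
    if dc >= threshold and dc >= shake:
        return "pan_tilt", dc
    if shake >= threshold:
        return "hand_shake", shake
    return "steady", max(dc, shake)


# One second at 100 Hz: a 5 Hz oscillation versus a constant rotation rate.
shaky = [2.0 * math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
panning = [3.0] * 100
print(classify_motion(shaky, 100)[0], classify_motion(panning, 100)[0])
```

The returned magnitude corresponds to the sway magnitude that the text says is calculated in either case.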

The sway of the camera may also be calculated without a gyro sensor or the like. For example, as is well known, the average of the motion vectors of all subjects in the screen may be calculated between one frame of the shooting screen and the next, and the frame-to-frame variation of that average motion vector may be measured as the sway of the camera.
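A minimal sketch of this image-based alternative follows. It assumes that per-frame motion vectors are already available as `(dx, dy)` pairs from some motion estimator; the function names and the use of the Euclidean norm as the measure of variation are illustrative assumptions.

```python
import math


def mean_vector(vectors):
    """Average a list of (dx, dy) motion vectors for one frame pair."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def sway_per_frame(frames):
    """Return the change in mean motion vector between consecutive frames.

    `frames` is a list in which each element holds the motion vectors
    measured for one frame; a large value suggests camera sway.
    """
    means = [mean_vector(f) for f in frames]
    return [math.hypot(b[0] - a[0], b[1] - a[1])
            for a, b in zip(means, means[1:])]


frames = [[(1, 0), (1, 0)], [(1, 0), (1, 0)], [(4, 4), (4, 4)]]
print(sway_per_frame(frames))
```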

The posture sensor 172 consists of a gravity sensor or the like; here it measures the roll angle of the image sensor 104 about the optical axis and outputs the measurement result to the CPU 173. The roll angle of the image sensor 104 about the optical axis is defined as 0° when a photographed horizon appears level on the shooting screen, +a° (a being a positive number) when the horizon appears sloping down to the right, and -a° (a being a positive number) when the horizon appears sloping down to the left.

The CPU 173 constitutes, in the present invention, the face size information acquisition means for the subject, the subject estimated distance calculation means that calculates the estimated distance from the video camera 170 to the subject, the depth-of-field calculation means that calculates the estimated depth of field, the position data acquisition means that acquires position data of the ranging area within the imaging screen, the control signal calculation means that calculates the control signals CTL1 to CTL3, the camera shake detection means, the means for detecting the roll angle of the image sensor about the optical axis, and the shooting scene information acquisition means. The camera shake information detected by the camera shake detection means includes not only the magnitude of the camera sway caused by hand shake described above, but also the magnitude of panning and tilting.

That is, the CPU 173 takes the various data described in the first embodiment (the position of the ranging area, the estimated distance to the subject, the estimated depth of field, data indicating whether the subject is a face, and the face size if it is a face) and multiplexes with them the camera sway magnitude information and the panning and tilting magnitude information obtained by the camera shake sensor 171, and the roll angle information of the image sensor 104 about the optical axis obtained by the posture sensor 172.

The estimated distance to the subject may be obtained by the hill-climbing method as in the first embodiment, by a method that emits infrared light and observes the reflected wave, or by the so-called phase difference detection method. The estimated distance to the subject may be the distance between the focus lens and the subject, or the distance between another part of the camera and the subject.

The CPU 173 also reads one frame of the luminance signal (Y) and color difference signals (Cb, Cr) stored in the image storage unit 112 and converts it to hue (H), saturation (S), and value (V). The CPU 173 then divides the one-frame screen into blocks, for example 30 horizontally by 20 vertically, calculates the average hue (H), saturation (S), and value (V) of the signal within each block, and stores the results in the memory 174. The memory 174 holds the results for the current frame and for the immediately preceding frame. In addition to the operating program of the CPU 173, the memory 174 stores a subject distance estimation table and a depth-of-field table, as does the memory 109.
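The block-averaging step can be sketched as follows. The text does not say which YCbCr-to-RGB matrix is used, so the full-range BT.601 (JFIF-style) coefficients below are an assumption, as are the function names; the frame must be at least `rows` pixels high and `cols` pixels wide. Because hue is an angle, a plain arithmetic mean is only a rough summary for blocks that straddle red.

```python
import colorsys


def ycbcr_to_hsv(y, cb, cr):
    """Convert one 8-bit YCbCr sample to (h, s, v), each in the range 0..1.

    Assumption: full-range BT.601 coefficients; the source does not
    specify the conversion matrix.
    """
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    r, g, b = (min(max(c, 0.0), 255.0) / 255.0 for c in (r, g, b))
    return colorsys.rgb_to_hsv(r, g, b)


def block_means(frame, cols=30, rows=20):
    """Return a rows x cols grid of mean (h, s, v) for a frame.

    `frame` is a list of pixel rows, each pixel a (y, cb, cr) tuple.
    """
    height, width = len(frame), len(frame[0])
    grid = []
    for by in range(rows):
        y0, y1 = by * height // rows, (by + 1) * height // rows
        row = []
        for bx in range(cols):
            x0, x1 = bx * width // cols, (bx + 1) * width // cols
            hsv = [ycbcr_to_hsv(*frame[y][x])
                   for y in range(y0, y1) for x in range(x0, x1)]
            n = len(hsv)
            row.append(tuple(sum(p[i] for p in hsv) / n for i in range(3)))
        grid.append(row)
    return grid


# A tiny uniform grey frame split into 3 x 2 blocks for demonstration.
grey = [[(128, 128, 128)] * 6 for _ in range(4)]
grid = block_means(grey, cols=3, rows=2)
```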

The CPU 173 determines the shooting scene from these calculation results and the estimated distance to the subject described above. For example, when many blocks in the upper part of the screen have a bright value (V) and a blue hue (H), and the estimated distance is larger than a predetermined threshold, the scene is determined to be a daytime landscape scene. The CPU 173 also compares, over the whole screen, the value (V) of the preceding frame with the value (V) of the current frame, block by block at the same screen position. When the number of blocks whose change exceeds a first threshold is larger than a second threshold, it judges that the subject is moving a great deal and determines the scene to be a sports scene. Further, when the block at the center of the screen has a red or blue hue (H) and a saturation (S) larger than a third threshold, and the estimated distance to the subject is smaller than a fourth threshold, the CPU 173 determines the scene to be a flower macro scene.
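The three scene rules can be sketched as a small classifier over the per-block statistics. Everything numeric here is a hypothetical placeholder: the text names the thresholds but gives no values, and it does not say which rule takes priority when several match, so the order of the checks is an assumption. Blocks are `(h, s, v)` tuples with hue in degrees and saturation and value in 0..1.

```python
def is_blue(h_deg):
    """Hypothetical hue range treated as blue."""
    return 180.0 <= h_deg <= 260.0


def is_red(h_deg):
    """Hypothetical hue range treated as red (wraps around 0 degrees)."""
    return h_deg <= 20.0 or h_deg >= 340.0


def classify_scene(curr, prev, distance_m, *, v_bright=0.7, far_m=20.0,
                   v_change=0.2, moving_blocks=100, s_vivid=0.6, near_m=0.5):
    """Return 'sports', 'daytime_landscape', 'flower_macro', or 'other'.

    `curr` and `prev` are grids (lists of rows) of (h, s, v) block means
    for the current and preceding frames.
    """
    rows, cols = len(curr), len(curr[0])

    # Sports: many blocks whose brightness changed between frames.
    changed = sum(1 for r in range(rows) for c in range(cols)
                  if abs(curr[r][c][2] - prev[r][c][2]) > v_change)
    if changed > moving_blocks:
        return "sports"

    # Daytime landscape: bright blue blocks dominate the top third, far subject.
    top = [curr[r][c] for r in range(rows // 3) for c in range(cols)]
    sky = sum(1 for h, s, v in top if v > v_bright and is_blue(h))
    if top and sky > len(top) // 2 and distance_m > far_m:
        return "daytime_landscape"

    # Flower macro: vivid red or blue centre block, very near subject.
    h, s, v = curr[rows // 2][cols // 2]
    if (is_red(h) or is_blue(h)) and s > s_vivid and distance_m < near_m:
        return "flower_macro"
    return "other"


def uniform_grid(block, rows=20, cols=30):
    """Helper for the examples: a grid filled with one block value."""
    return [[block] * cols for _ in range(rows)]


landscape = [[(220.0, 0.5, 0.9)] * 30 if r < 6 else [(100.0, 0.3, 0.4)] * 30
             for r in range(20)]
print(classify_scene(landscape, landscape, distance_m=100.0))
```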

The CPU 173 also multiplexes this shooting scene information with the hand-shake vibration magnitude information and the roll angle information about the optical axis described above, together with the various data described in the first embodiment, and supplies the result to the recording/playback I/F unit 114. The recording/playback I/F unit 114 records the supplied multiplexed data on a recording medium (not shown) in association with the encoded image data of the corresponding frame.

Next, the operation of generating a pseudo-stereoscopic image signal from the reproduced signal in the video camera 170 will be described. Of the data read from the recording medium (not shown), the recording/playback I/F unit 114 supplies the encoded image data to the image processing unit 113 via the image data bus 117. It supplies the remaining data to the CPU 173 via the image data bus 117: the position of the ranging area, the estimated distance to the subject, the estimated depth of field, data indicating whether the subject is a face, the face size if it is a face, the magnitude of camera sway due to hand shake, the magnitude of panning and tilting, the roll angle of the image sensor about the optical axis, and the shooting scene information.

The CPU 173 calculates the control signals CTL1 to CTL3 as follows from the data supplied from the recording/playback I/F unit 114 and the depth estimation data d supplied from the 2D3D conversion unit 115. The calculation performed by the CPU 173 includes the same calculation of the control signals CTL1 to CTL3 as that performed by the CPU 108; since that has already been described, it is omitted here, and only the calculation specific to this embodiment is described below.

When the shooting scene information indicates a sports scene, the CPU 173 generates the control signal CTL1 so that the mixing ratio of basic depth model type C is higher than it would otherwise be. In a sports scene the subject often moves vigorously and travels not only through the center of the screen but also toward its edges. With models such as basic depth model types A and B, in which positions nearer the center of the screen are placed farther back, the apparent distance of the subject would change greatly, which can feel unnatural to a user viewing the pseudo-stereoscopic image. The higher ratio of type C prevents this.

When the value of the panning and tilting magnitude information is larger than a predetermined threshold, the CPU 173 likewise generates the control signal CTL1 so that the mixing ratio of basic depth model type C is higher than it would otherwise be. During panning, a subject that was at the left or right edge of the screen moves to the center of the screen. It would be unnatural for the perceived depth of that subject to differ between the time it is at the edge and the time it is at the center, and this setting prevents such a situation.

When the shooting scene information indicates a daytime landscape scene, the CPU 173 generates the control signal CTL1 so that the mixing ratio of basic depth model type A is higher than it would otherwise be. This gives the screen a sense of expanse, which makes the pseudo-stereoscopic image more impressive.

When the shooting scene information indicates a flower macro scene, the CPU 173 sets the weighting coefficient contained in the control signal CTL2 to a larger value than in other cases. With a larger weighting coefficient, the image of a flower containing an R signal component is placed toward the front of the screen, which strengthens the impression of the pseudo-stereoscopic image.

On the other hand, the farther the roll angle about the optical axis is from 0°, the smaller the CPU 173 makes the weighting coefficient contained in the control signal CTL2. The farther the roll angle is from 0°, the more the shooting screen was tilted relative to the horizon during shooting. If the stereoscopic effect were emphasized during playback, the user would feel a sense of incongruity because the user's posture while viewing is not tilted, and the smaller coefficient prevents this. Similarly, the larger the vibration due to hand shake, the smaller the weighting coefficient contained in the control signal CTL2 is made. This is because emphasizing the stereoscopic effect while the screen is shaking would feel incongruous, since the user's posture while viewing is not tilted.
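The CTL1 and CTL2 adjustments of this embodiment can be summarized in a short sketch. The text states only the direction of each adjustment, so the boost amount, the pan/tilt threshold, and the falloff formulas below are hypothetical, as are the function names and the scene labels.

```python
def normalise(ratios):
    """Scale mixing ratios so that they sum to 1."""
    total = sum(ratios.values())
    return {name: value / total for name, value in ratios.items()}


def ctl1_mix(base, scene, pan_tilt, pan_tilt_threshold=5.0, boost=0.3):
    """Return mixing ratios for basic depth model types A, B, and C.

    Type C is boosted for sports scenes or large pan/tilt; type A is
    boosted for daytime landscapes. Boost size is an assumption.
    """
    mix = dict(base)
    if scene == "sports" or pan_tilt > pan_tilt_threshold:
        mix["C"] += boost
    if scene == "daytime_landscape":
        mix["A"] += boost
    return normalise(mix)


def ctl2_weight(base, scene, roll_deg, shake,
                macro_gain=1.5, roll_scale=45.0, shake_scale=10.0):
    """Return the R-signal weighting coefficient.

    Larger for flower macro scenes; smaller as the roll angle moves away
    from 0 degrees or the shake magnitude grows. Formulas are assumptions.
    """
    weight = base * (macro_gain if scene == "flower_macro" else 1.0)
    weight *= max(0.0, 1.0 - abs(roll_deg) / roll_scale)
    weight *= 1.0 / (1.0 + shake / shake_scale)
    return weight


base_mix = {"A": 0.4, "B": 0.3, "C": 0.3}
print(ctl1_mix(base_mix, "sports", pan_tilt=0.0))
print(ctl2_weight(0.2, "flower_macro", roll_deg=0.0, shake=0.0))
```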

The larger the vibration due to hand shake, the larger the CPU 173 also makes the depth value n of the control signal CTL3. That is, the depth of the stereoscopic image is reduced. When the vibration due to hand shake is large, the user perceives the shaking of the screen during playback and feels something like motion sickness even when viewing the two-dimensional (non-stereoscopic) image. Viewing a pseudo-stereoscopic image obtained by 2D3D conversion in this state makes that sensation even stronger. To prevent this, the sense of depth of the stereoscopic image is reduced as the vibration due to hand shake increases.

The farther the roll angle about the optical axis is from 0°, the larger the CPU 173 also makes the depth value n of the control signal CTL3. That is, the depth of the stereoscopic image is reduced. The farther the roll angle is from 0°, the more the shooting screen was tilted relative to the horizon during shooting, so the displayed image is tilted relative to the horizontal of the screen. The line of the user's two eyes, however, usually does not follow the tilt of the displayed image. Displaying a pseudo-stereoscopic image with a strong sense of depth in this state would increase the user's sense of incongruity, and the larger n prevents this.

When the shooting scene information indicates a sports scene, the CPU 173 also makes the depth value n of the control signal CTL3 larger. That is, the depth of the stereoscopic image is reduced. In this case the subject moves vigorously, so the parallax between the right-eye image and the left-eye image also varies greatly over time. Viewing a pseudo-stereoscopic image with a strong sense of depth under these conditions could fatigue the user, and the larger n prevents this.

Furthermore, when the shooting scene information indicates a daytime landscape scene, the vibration due to hand shake is smaller than a first predetermined value, the roll angle about the optical axis is close to 0°, and the estimated distance to the subject is larger than a second predetermined value, the CPU 173 makes the depth value n of the control signal CTL3 smaller. That is, the depth of the stereoscopic image is increased. For images of this kind, giving the generated pseudo-stereoscopic image a strong sense of depth heightens the sense of presence.
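The rules for the depth value n in this embodiment can be collected into one function. The sketch follows the convention used in the text of this embodiment, in which a larger n yields a shallower stereoscopic depth. All gains, factors, and limits are hypothetical placeholders, since the text gives only the direction of each change; the function name and scene labels are also assumptions.

```python
def ctl3_depth_n(base_n, scene, shake, roll_deg, distance_m, *,
                 shake_gain=0.1, roll_gain=0.02, sports_factor=1.5,
                 calm_shake=1.0, level_roll_deg=2.0, far_m=20.0,
                 landscape_factor=0.7):
    """Return the depth value n of CTL3 (larger n means shallower depth).

    n grows with hand-shake magnitude and with |roll angle|, grows for
    sports scenes, and shrinks for a steady, level, distant daytime
    landscape. All numeric parameters are illustrative assumptions.
    """
    n = base_n * (1.0 + shake_gain * shake) * (1.0 + roll_gain * abs(roll_deg))
    if scene == "sports":
        n *= sports_factor
    if (scene == "daytime_landscape" and shake < calm_shake
            and abs(roll_deg) < level_roll_deg and distance_m > far_m):
        n *= landscape_factor
    return n


baseline = ctl3_depth_n(10.0, "other", shake=0.0, roll_deg=0.0, distance_m=5.0)
steady_landscape = ctl3_depth_n(10.0, "daytime_landscape", shake=0.0,
                                roll_deg=0.0, distance_m=100.0)
print(baseline, steady_landscape)
```

In practice the resulting n would also be subject to the gradual frame-to-frame change required of all three control signals.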

Thus, in this embodiment, the control signals CTL1 to CTL3 supplied to the 2D3D conversion unit 115 are controlled automatically using the shooting parameters, so an optimal depth model is generated automatically without the user having to make individual settings for each shooting scene. In particular, in this embodiment the shooting parameters used to control the control signals CTL1 to CTL3 automatically include the magnitude of camera sway due to hand shake, the magnitude of panning and tilting, the roll angle about the optical axis, and the shooting scene information. A pseudo-stereoscopic image with an optimal depth model can therefore be generated according to the shooting scene, such as a sports scene, a daytime landscape scene, or a flower macro scene, and also for non-stereoscopic images that are tilted relative to the horizontal of the screen or that shake strongly within the screen. This suppresses the sense of incongruity and fatigue of the user viewing the pseudo-stereoscopic image, and allows pseudo-stereoscopic images with greater impact and presence to be generated and displayed.

The present invention is not limited to the embodiments described above. For example, the present invention also includes a pseudo-stereoscopic image generation program that causes the CPU 108, 152, or 173 to execute the pseudo-stereoscopic image generation operation of each of the above embodiments. In this case, the pseudo-stereoscopic image generation program may be loaded into the CPU from a recording medium, or may be distributed over a communication network and loaded into the CPU.

The present invention also includes a configuration in which a camera shake sensor and a posture sensor are added to the video camera 150 of the second embodiment, the resulting information on the magnitude of camera sway due to hand shake, the magnitude of panning and tilting, and the roll angle of the image sensor about the optical axis is used to generate a pseudo-stereoscopic image signal with an optimal depth model in the same manner as in the third embodiment, and that pseudo-stereoscopic image signal is recorded on a recording medium. Furthermore, the images of the basic depth model types are not limited to the three-dimensional structures A, B, and C described above, and the number of types may be any plural number other than three.

100, 150, 170 Video camera
101 Zoom lens
102 Focus lens
103 Diaphragm
104 Image sensor
105 Zoom lens driving unit
106 Focus lens driving unit
107 Diaphragm driving unit
108, 152, 173 Central processing unit (CPU)
109, 174 Memory
112, 151 Image storage unit
113, 153 Image processing unit
114 Recording/playback I/F unit
115 2D3D conversion unit
118 Operation unit
171 Camera shake sensor
172 Posture sensor
200 Depth estimation data generation unit
202 High-frequency component evaluation unit for the upper part of the screen
203 High-frequency component evaluation unit for the lower part of the screen
204 Composition ratio determination unit
205 Switch
206 Depth model synthesis unit
207 to 209 Frame memory
210, 301 Control signal determination unit
211 Weighting unit
212 Addition unit
300 Stereo pair generation unit
302 Texture shift unit
303 Occlusion compensation unit
304 Post-processing unit

Claims

A pseudo-stereoscopic image generation apparatus comprising:
face size information acquiring means for acquiring, when the subject is a person, face size information of the person from a non-stereoscopic image signal obtained by photoelectrically converting, with an image sensor, an optical image of the subject formed on the imaging surface of the image sensor through an optical system including a zoom lens, a focus lens, and a diaphragm;
basic depth model generating means for generating images of a plurality of basic depth model types, each representing a basic scene, for generating a pseudo-stereoscopic image signal from the non-stereoscopic image signal;
depth model synthesis means for generating a depth model image by combining the images of the plurality of basic depth model types supplied from the basic depth model generating means, based on a first control signal indicating a composition ratio of the images of the plurality of basic depth model types;
weighting means for multiplying the non-stereoscopic image signal by a weighting coefficient indicated by a second control signal, the second control signal indicating the weighting coefficient for weighting the non-stereoscopic image signal;
depth estimation data generation means for generating depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the depth model image generated by the depth model synthesis means;
texture shifting means for generating the pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data whose depth and convergence have been adjusted by a third control signal indicating the depth and the convergence; and
control signal calculation means that, when calculating and outputting each of the first to third control signals, varies the value of at least one of the first to third control signals to be calculated, using the face size information.

Lens position acquisition means for acquiring respective positions of a zoom lens and a focus lens constituting the optical system;
Subject estimated distance calculation means for calculating an estimated distance from the position of the focus lens to the subject based on the zoom lens position and the position of the focus lens acquired by the lens position acquisition means;
Basic depth model generating means for generating images of a plurality of basic depth model types, each indicating a scene serving as a basis for generating a pseudo-stereoscopic image signal from a non-stereoscopic image signal obtained by photoelectrically converting, with the imaging element, an optical image of a subject formed on the imaging surface of the imaging element through the optical system;
Depth model synthesis means for generating a depth model image by combining the plurality of basic depth model type images supplied from the basic depth model generating means, based on a first control signal indicating a synthesis ratio of the plurality of basic depth model type images;
Weighting means for multiplying the non-stereoscopic image signal by a weighting coefficient indicated by a second control signal for weighting the non-stereoscopic image signal;
Depth estimation data generation means for generating depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the image of the depth model generated by the depth model synthesis means;
Texture shifting means for generating the pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data, the depth and convergence of which have been adjusted by a third control signal indicating the depth and the convergence;
A pseudo-stereoscopic image generation apparatus comprising the above means and control signal calculation means that, when calculating and outputting the first to third control signals, varies the value of at least one of the first to third control signals using the calculated estimated distance to the subject.

Optical system information acquisition means for acquiring the position of the zoom lens, the position of the focus lens, and the aperture value of the diaphragm, respectively, constituting the optical system;
A depth-of-field calculating means for calculating an estimated depth of field based on the aperture value and the zoom lens position acquired by the optical system information acquiring means;
Basic depth model generating means for generating images of a plurality of basic depth model types, each indicating a scene serving as a basis for generating a pseudo-stereoscopic image signal from a non-stereoscopic image signal obtained by photoelectrically converting, with the imaging element, an optical image of a subject formed on the imaging surface of the imaging element through the optical system;
Depth model synthesis means for generating a depth model image by combining the plurality of basic depth model type images supplied from the basic depth model generating means, based on a first control signal indicating a synthesis ratio of the plurality of basic depth model type images;
Weighting means for multiplying the non-stereoscopic image signal by a weighting coefficient indicated by a second control signal for weighting the non-stereoscopic image signal;
Depth estimation data generation means for generating depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the image of the depth model generated by the depth model synthesis means;
Texture shifting means for generating the pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data, the depth and convergence of which have been adjusted by a third control signal indicating the depth and the convergence;
A pseudo-stereoscopic image generation apparatus comprising the above means and control signal calculation means that, when calculating and outputting the first to third control signals, varies the value of at least one of the first to third control signals using the calculated estimated depth of field.
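As an illustration of the depth-of-field calculating means, the following sketch applies the standard thin-lens depth-of-field approximation. It assumes the zoom lens position has already been converted to a focal length and that a subject distance is available; the function name, the parameter names, and the default circle of confusion of 0.03 mm are assumptions and do not come from the claim.

```python
def depth_of_field(focal_length_mm, f_number, subject_distance_mm, coc_mm=0.03):
    """Return (near_limit_mm, far_limit_mm) of the zone of acceptable sharpness.

    Uses the standard approximation based on the hyperfocal distance
    H = f^2 / (N * c) + f, where N is the aperture value (f-number)
    and c is the circle of confusion.
    """
    hyperfocal = focal_length_mm ** 2 / (f_number * coc_mm) + focal_length_mm
    near = (subject_distance_mm * (hyperfocal - focal_length_mm)
            / (hyperfocal + subject_distance_mm - 2 * focal_length_mm))
    if subject_distance_mm >= hyperfocal:
        # Beyond the hyperfocal distance everything to infinity is in focus.
        far = float("inf")
    else:
        far = (subject_distance_mm * (hyperfocal - focal_length_mm)
               / (hyperfocal - subject_distance_mm))
    return near, far
```

A control signal calculation could then, for example, reduce the depth parameter when the interval between the two limits is wide, but that mapping is a design choice the claim leaves open.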

Lens position acquisition means for acquiring respective positions of a zoom lens and a focus lens constituting the optical system;
Automatic focus adjustment means for moving and controlling the position of the focus lens so as to obtain a focus position at which the high-frequency component of the non-stereoscopic image signal within a ranging area, which is a predetermined small area in the imaging screen of the imaging element, is maximized, the non-stereoscopic image signal being obtained by photoelectrically converting, with the imaging element, the optical image of the subject formed on the imaging surface of the imaging element through the optical system;
Position data acquisition means for acquiring position data of the ranging area in the imaging screen;
Subject estimated distance calculating means for calculating an estimated distance from the focus position of the focus lens to the subject;
Basic depth model generating means for generating images of a plurality of basic depth model types indicating a basic scene for generating a pseudo-stereoscopic image signal from the non-stereoscopic image signal;
Depth model synthesis means for generating a depth model image by combining the plurality of basic depth model type images supplied from the basic depth model generating means, based on a first control signal indicating a synthesis ratio of the plurality of basic depth model type images;
Weighting means for multiplying the non-stereoscopic image signal by a weighting coefficient indicated by a second control signal for weighting the non-stereoscopic image signal;
Depth estimation data generation means for generating depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the image of the depth model generated by the depth model synthesis means;
Texture shifting means for generating the pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data, the depth and convergence of which have been adjusted by a third control signal indicating the depth and the convergence;
A pseudo-stereoscopic image generation apparatus comprising the above means and control signal calculation means that, when calculating and outputting the first to third control signals, varies the value of at least one of the first to third control signals according to the position data of the ranging area and the estimated distance to the subject.
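The automatic focus adjustment means searches for the focus lens position at which the high-frequency component inside the ranging area is largest. A minimal sketch of that search follows. The sum of absolute horizontal differences is used here as a stand-in for the high-frequency component, and `capture` is a hypothetical callable that returns the image obtained at a given focus lens position; neither is taken from the claim.

```python
def high_frequency_energy(image, area):
    """Sum of absolute horizontal pixel differences inside the ranging area.

    area is (top, left, height, width) in pixels.
    """
    top, left, height, width = area
    energy = 0
    for y in range(top, top + height):
        row = image[y]
        for x in range(left, left + width - 1):
            energy += abs(row[x + 1] - row[x])
    return energy


def find_focus_position(capture, positions, area):
    """Return the focus lens position whose image is sharpest in the ranging area."""
    return max(positions, key=lambda p: high_frequency_energy(capture(p), area))
```

The `area` tuple used for the search is also the kind of position data that the position data acquisition means would pass on to the control signal calculation.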

Camera shake detection means for detecting camera shake information consisting of information on the shake amplitude of an image sensor that obtains a non-stereoscopic image signal by photoelectrically converting an optical image of a subject formed on its imaging surface through an optical system, and information on the magnitude of panning and tilting;
Basic depth model generating means for generating images of a plurality of basic depth model types indicating a scene serving as a basis for generating a pseudo-stereoscopic image signal from the non-stereoscopic image signal obtained by the imaging device;
Depth model synthesis means for generating a depth model image by combining the plurality of basic depth model type images supplied from the basic depth model generating means, based on a first control signal indicating a synthesis ratio of the plurality of basic depth model type images;
Weighting means for multiplying the non-stereoscopic image signal by a weighting coefficient indicated by a second control signal for weighting the non-stereoscopic image signal;
Depth estimation data generation means for generating depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the image of the depth model generated by the depth model synthesis means;
Texture shifting means for generating the pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data, the depth and convergence of which have been adjusted by a third control signal indicating the depth and the convergence;
A pseudo-stereoscopic image generation apparatus comprising the above means and control signal calculation means that, when calculating and outputting the first to third control signals, varies the value of at least one of the first to third control signals using the camera shake information.

A roll angle detection means for detecting a roll angle around an optical axis of an image pickup element that photoelectrically converts an optical image of a subject imaged on an imaging surface through an optical system to obtain a non-stereoscopic image signal;
Basic depth model generating means for generating images of a plurality of basic depth model types indicating a scene serving as a basis for generating a pseudo-stereoscopic image signal from the non-stereoscopic image signal obtained by the imaging device;
Depth model synthesis means for generating a depth model image by combining the plurality of basic depth model type images supplied from the basic depth model generating means, based on a first control signal indicating a synthesis ratio of the plurality of basic depth model type images;
Weighting means for multiplying the non-stereoscopic image signal by a weighting coefficient indicated by a second control signal for weighting the non-stereoscopic image signal;
Depth estimation data generation means for generating depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the image of the depth model generated by the depth model synthesis means;
Texture shifting means for generating the pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data, the depth and convergence of which have been adjusted by a third control signal indicating the depth and the convergence;
A pseudo-stereoscopic image generation apparatus comprising the above means and control signal calculation means that, when calculating and outputting the first to third control signals, varies the value of at least one of the first to third control signals using the roll angle information detected by the roll angle detection means.

In a camera having an image sensor that photoelectrically converts an optical image of a subject imaged on an imaging surface through an optical system to obtain a non-stereoscopic image signal,
Subject estimated distance calculating means for calculating an estimated distance from the camera to the subject;
Shooting scene information acquisition means for acquiring shooting scene information based on the estimated distance and on a signal change between two adjacent frames of the non-stereoscopic image signal output from the image sensor that photoelectrically converts the optical image of the subject formed on the imaging surface through the optical system;
Basic depth model generation means for generating images of a plurality of basic depth model types indicating a scene serving as a basis for generating a pseudo-stereoscopic image signal from the non-stereoscopic image signal output from the imaging device;
Depth model synthesis means for generating a depth model image by combining the plurality of basic depth model type images supplied from the basic depth model generating means, based on a first control signal indicating a synthesis ratio of the plurality of basic depth model type images;
Weighting means for multiplying the non-stereoscopic image signal by a weighting coefficient indicated by a second control signal for weighting the non-stereoscopic image signal;
Depth estimation data generation means for generating depth estimation data from the non-stereoscopic image signal weighted by the weighting means and the image of the depth model generated by the depth model synthesis means;
Texture shifting means for generating the pseudo-stereoscopic image signal by shifting the texture of the non-stereoscopic image signal based on the depth estimation data, the depth and convergence of which have been adjusted by a third control signal indicating the depth and the convergence;
And control signal calculation means that, when calculating and outputting the first to third control signals, varies the value of at least one of the first to third control signals using the shooting scene information acquired by the shooting scene information acquisition means.
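The shooting scene information acquisition means combines an inter-frame signal change with the estimated subject distance. The sketch below shows one way such information might be derived. The mean absolute difference measure, the two thresholds, and the label strings are hypothetical choices for illustration; the claim does not specify them.

```python
def mean_abs_frame_difference(prev_frame, curr_frame):
    """Average absolute pixel difference between two adjacent frames."""
    total = 0
    count = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            total += abs(c - p)
            count += 1
    return total / count


def classify_scene(prev_frame, curr_frame, distance_m,
                   motion_threshold=8.0, near_threshold_m=2.0):
    """Return a (motion, range) pair of labels describing the shooting scene.

    Both thresholds are illustrative values, not values from the source.
    """
    difference = mean_abs_frame_difference(prev_frame, curr_frame)
    motion = "moving" if difference > motion_threshold else "static"
    distance_class = "near" if distance_m < near_threshold_m else "far"
    return motion, distance_class
```

A control signal calculation could then select different values of the first to third control signals for each label pair.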