JP2020194454A

JP2020194454A - Image processing device and image processing method, program, and storage medium

Info

Publication number: JP2020194454A
Application number: JP2019100724A
Authority: JP
Inventors: 知小松; Satoru Komatsu
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-05-29
Filing date: 2019-05-29
Publication date: 2020-12-03
Anticipated expiration: 2039-05-29
Also published as: JP7300895B2

Abstract

PROBLEM TO BE SOLVED: To estimate attribute information of an object in an image.
SOLUTION: An image processing device includes: depth generation means for generating, from a captured image of a subject, depth information indicating the distance distribution of the subject in the depth direction; object detection means for detecting a region of a specific object from the captured image; posture estimation means for estimating the posture of the specific object; posture conversion means for converting the posture of the specific object in the captured image and the depth information; and attribute information estimation means for estimating attribute information of the specific object from the posture-converted captured image and depth information and from the shooting conditions of the image.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for estimating attribute information of an object in an image.

In wildlife observation and livestock growth management, it can be difficult to directly measure the dimensions (volume) or mass of an object, so there is a need to acquire attribute information of animals without contact.

Known methods for non-contact measurement of the surface shape of an object include pattern projection, multi-view imaging, and TOF (Time of Flight). Patent Document 1 proposes a method for estimating volume from a surface shape measured without contact. In that method, which targets crops, the surface shape is measured by the light-section method, the hidden side of the crop is treated as solid volume down to the ground plane, and the result is then adjusted by a correction coefficient to obtain an approximate volume.

Unlike stationary objects such as crops, animals move, and their movement is difficult to control. Acquiring attribute information such as the dimensions of an animal therefore requires knowing the animal's posture. Patent Document 2 proposes a method that extracts feature points from an image captured by an imaging device and compares them with feature quantities obtained by learning in advance, thereby estimating the posture of the target object.

Patent Document 1: Japanese Patent No. 5686058
Patent Document 2: Japanese Patent No. 4449410

Patent Document 1 neither measures nor estimates the back-side shape of the object being measured. It can estimate an approximate volume, but because the three-dimensional shape of the object is not obtained, the accuracy of volume and mass estimation is reduced.

In wildlife observation and livestock growth management, it is difficult to control the posture of the target when the target is an animal, and when the total length of the animal is estimated from a captured image, different postures yield different estimates. Patent Document 2 estimates the posture of the target object by comparison with feature points learned in advance in order to recognize the object, but it does not estimate attribute information of the object (dimensions, shape, volume, mass, and the like).

The present invention has been made in view of the above problems, and its object is to realize a technique capable of estimating attribute information of an object in an image.

To solve the above problems and achieve the object, the image processing device of the present invention includes: depth generation means for generating, from a captured image of a subject, depth information indicating the distance distribution of the subject in the depth direction; object detection means for detecting a region of a specific object from the captured image; posture estimation means for estimating the posture of the specific object; posture conversion means for converting the posture of the specific object in the captured image and the depth information; and attribute information estimation means for estimating attribute information of the specific object from the posture-converted captured image and depth information and from the shooting conditions of the image.

According to the present invention, attribute information of an object in an image can be estimated.

FIG. 1: A block diagram (a) showing the device configuration of Embodiment 1, a diagram (b) showing the pixel array of the image sensor, and a schematic diagram (c) showing the cross-sectional structure of the image sensor.
FIG. 2: Schematic diagrams explaining the relationship between the image sensor, the optical system, and the image in Embodiment 1.
FIG. 3: A flowchart showing the attribute information estimation process of Embodiment 1.
FIG. 4: A diagram illustrating the target object selection screen of Embodiment 1.
FIG. 5: A diagram explaining an example of the posture estimation process of Embodiment 1.
FIG. 6: A diagram showing an example of the posture conversion process and the dimension measurement positions of Embodiment 1.
FIG. 7: A diagram explaining an example of the method for inputting dimension measurement positions in Embodiment 1.
FIG. 8: A flowchart showing the attribute information estimation process of Embodiment 2.
FIG. 9: A flowchart showing the posture estimation and conversion process of Embodiment 2.

Embodiments are described in detail below with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although a plurality of features are described in the embodiments, not all of these features are essential to the invention, and the features may be combined in any manner. In the accompanying drawings, identical or similar components are given the same reference numbers, and duplicate description is omitted.

[Embodiment 1] Embodiment 1 is described below.
The following describes an example in which the present invention is applied to a digital camera, as one example of an image processing device, that can acquire depth information indicating the distance distribution of a subject. The present invention is, however, applicable to any device capable of estimating attribute information of an object (dimensions, shape, volume, mass, and the like) based on a captured image, depth information corresponding to the captured image, and the shooting conditions of the image.

<Configuration of Digital Camera> First, the configuration and functions of the digital camera 100 of the present embodiment are described with reference to FIG. 1.

The imaging optical system 10 is the photographing lens of the digital camera 100 and forms an optical image of the subject on the image sensor 11. The imaging optical system 10 consists of a plurality of lenses (not shown) arranged along the optical axis 102 and has an exit pupil 101 at a position a predetermined distance from the image sensor 11. In this specification, the direction parallel to the optical axis 102 is defined as the z direction or depth direction; of the directions orthogonal to the optical axis 102, the direction parallel to the horizontal direction of the image sensor 11 is defined as the x direction, and the direction parallel to the vertical direction of the image sensor 11 is defined as the y direction. Axes are set accordingly.

The image sensor 11 is, for example, a CCD (charge-coupled device) sensor or a CMOS (complementary metal-oxide-semiconductor) sensor. The image sensor 11 photoelectrically converts the subject image formed on its imaging surface via the imaging optical system 10 and outputs an image signal for that subject image. In the present embodiment, as described later, the image sensor 11 also has a ranging function based on the imaging-surface phase-difference method, so that in addition to the captured image it can generate and output distance information indicating the distance from the imaging device to the subject (the subject distance).

The control unit 12 is a control device such as a CPU or microprocessor and controls the operation of each block of the digital camera 100. For example, the control unit 12 performs autofocus (AF) during imaging, changes the focus position, changes the F-number (aperture), captures images, and controls the storage unit 14, the input unit 15, the display unit 16, and the communication unit 17.

The image processing device 13 is the block that performs the various kinds of image processing of the digital camera 100. As illustrated, the image processing device 13 has the following image processing blocks: an image generation unit 130, a depth generation unit 131, an object detection unit 132, a posture estimation unit 133, a posture conversion unit 134, and an attribute information estimation unit 135. It also has a memory 136 used as a work area for image processing. The image processing device 13 can be implemented with logic circuits. Alternatively, it may consist of a central processing unit (CPU) and a memory that stores a processing program.

The image generation unit 130 performs various kinds of signal processing on the image signal output from the image sensor 11, such as noise removal, demosaicing, luminance signal conversion, aberration correction, white balance adjustment, and color correction. The image data (captured image) output from the image generation unit 130 is stored in the memory 136 and used by the object detection unit 132 and the display unit 16.

The depth generation unit 131 generates a depth image representing the distribution of depth information, based on signals obtained from the ranging pixels of the image sensor 11 described later. The depth image is two-dimensional information in which the value stored in each pixel is the subject distance of the subject present in the corresponding region of the captured image.

The object detection unit 132 uses the captured image generated by the image generation unit 130 to detect the object designated in advance as the measurement target within the captured image and to determine its position and size in the image. If the type of object to be measured has not been designated in advance, the object detection unit 132 identifies the type. In the present embodiment, the target object is assumed to be an animal other than a human.

The posture estimation unit 133 estimates the posture of the target object within the object region detected by the object detection unit 132, using information that was obtained by learning in advance and is stored in the storage unit 14.

The posture conversion unit 134 converts the posture of the target object in the viewing image and the depth image, as estimated by the posture estimation unit 133, into a specific posture suited to estimating attribute information. The specific posture differs depending on the target object and is determined from posture information stored in advance in the storage unit 14 and/or the memory 136, based on the object type designated in advance or the object information identified by the object detection unit 132.

The attribute information estimation unit 135 estimates at least one of dimensions, shape, volume, and mass as attribute information of the target object, from the viewing image and the depth image in which the posture conversion unit 134 has converted the target object to the specific posture. In dimension estimation, the positions at which dimensions are measured differ depending on the target object. Estimation therefore uses dimension measurement information stored in advance in the storage unit 14 and/or the memory 136, selected according to the object type designated in advance or identified by the object detection unit 132. In shape estimation, the surface shape is obtained from the posture-converted depth image, and the shape is estimated with reference to three-dimensional shape data of the target object stored in advance in the storage unit 14 and/or the memory 136, selected according to the object type identified by the object detection unit 132. In volume estimation, the volume is estimated from the three-dimensional shape of the target object obtained by shape estimation and from the shooting parameters. In mass estimation, the mass is estimated from the volume obtained by volume estimation and density information for the target object. The density information is measured in advance for each object and stored in the storage unit 14.
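As a rough illustration of the last two steps, the sketch below converts a length measured in pixels into metres and derives a mass from a volume and a density. The pinhole-camera relation, the function names, and the example values are assumptions made for illustration; the specification only states that shooting parameters and pre-measured density information are used.

```python
def pixel_length_to_metres(length_px, distance_m, pixel_pitch_m, focal_length_m):
    """Convert a length in pixels to metres on the object.

    Assumes a simple pinhole model (an illustrative assumption):
    size on object = size on sensor * subject distance / focal length.
    """
    if focal_length_m <= 0:
        raise ValueError("focal length must be positive")
    size_on_sensor_m = length_px * pixel_pitch_m
    return size_on_sensor_m * distance_m / focal_length_m


def estimate_mass_kg(volume_m3, density_kg_per_m3):
    """Mass = volume * density, with the density looked up per object type elsewhere."""
    if volume_m3 < 0 or density_kg_per_m3 < 0:
        raise ValueError("volume and density must be non-negative")
    return volume_m3 * density_kg_per_m3


# Hypothetical values: 1000 px span, subject at 5 m, 4 um pixels, 50 mm lens.
body_length_m = pixel_length_to_metres(1000, 5.0, 4e-6, 0.05)
mass_kg = estimate_mass_kg(0.5, 1000.0)
```

The subject distance would come from the depth image, and the pixel pitch and focal length from the shooting conditions; how the device actually combines them is not specified in this section.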

The storage unit 14 is a non-volatile recording medium that records captured image data, intermediate data generated during the operation of each block, and parameters referenced in the operation of the image processing device 13 and the digital camera 100. The storage unit 14 may be any recording medium that supports high-speed reading and writing and has a large capacity, provided it delivers the processing performance required; flash memory, for example, is desirable.

The input unit 15 is a user interface, such as a dial, button, switch, or touch panel, that detects information input and setting-change operations performed on the digital camera 100. When the input unit 15 detects an operation input, it outputs the corresponding control signal to the control unit 12.

The display unit 16 is a display device such as a liquid crystal display or an organic EL display. The display unit 16 is used to check composition during shooting by showing a live view of the captured image, and to present various setting screens and messages. In the present embodiment, the display unit 16 also displays object detection results and estimation results such as shape, volume, and mass.

The communication unit 17 is the communication interface of the digital camera 100 for exchanging information with external devices. The communication unit 17 may be configured to send the captured image, the depth information, and the estimated attribute information of the subject (dimensions, shape, volume, mass) to other devices.

<Configuration of Image Sensor> Next, the detailed configuration of the image sensor 11 of the present embodiment is described with reference to FIGS. 1(b) and 1(c).

As shown in FIG. 1(b), the image sensor 11 is formed by tiling a plurality of pixel groups 110, each consisting of 2 rows × 2 columns of pixels with different color filters. As shown in the enlarged view, red (R), green (G), and blue (B) color filters are arranged in each pixel group 110, and each pixel (photoelectric conversion element) outputs an image signal carrying R, G, or B color information. The present embodiment is described, as an example, with the color filters distributed as illustrated, but the invention is not limited to this arrangement.
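The tiling of 2 × 2 pixel groups can be sketched as a lookup that maps a pixel coordinate to its filter color. The RGGB ordering below is an assumption chosen for illustration; the actual layout is the one shown in FIG. 1(b), which is not reproduced in the text.

```python
# Assumed 2x2 tile (RGGB); the real arrangement is defined by FIG. 1(b).
BAYER_TILE = (("R", "G"),
              ("G", "B"))


def filter_color(row, col, tile=BAYER_TILE):
    """Return the color filter covering the pixel at (row, col).

    The sensor repeats the same 2x2 pixel group, so only the parity
    of the row and column indices matters.
    """
    return tile[row % 2][col % 2]


# First three rows of a 4-pixel-wide sensor under the assumed tile.
layout = [[filter_color(r, c) for c in range(4)] for r in range(3)]
```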

To implement the ranging function of the imaging-surface phase-difference method, each pixel (photoelectric conversion element) of the image sensor 11 of the present embodiment has a plurality of photoelectric conversion units arranged side by side in the I-I' cross section of FIG. 1(b), which runs along the horizontal direction of the image sensor 11. More specifically, as shown in FIG. 1(c), each pixel consists of a light guide layer 113, which includes a microlens 111 and a color filter 112, and a light receiving layer 114, which includes a first photoelectric conversion unit 115 and a second photoelectric conversion unit 116.

In the light guide layer 113, the microlens 111 is configured to efficiently guide the light flux incident on the pixel to the first photoelectric conversion unit 115 and the second photoelectric conversion unit 116. The color filter 112 passes light in a predetermined wavelength band: it passes only light in one of the R, G, or B bands described above and guides it to the first photoelectric conversion unit 115 and the second photoelectric conversion unit 116 behind it.

The light receiving layer 114 contains two photoelectric conversion units (the first photoelectric conversion unit 115 and the second photoelectric conversion unit 116) that convert received light into analog image signals, and the two kinds of signals output from these units are used for ranging. That is, every pixel of the image sensor 11 has two photoelectric conversion units arranged in the same horizontal direction, and two image signals are used: one composed of the signals output from the first photoelectric conversion units 115 of all pixels, and one composed of the signals output from the second photoelectric conversion units 116. In other words, the first photoelectric conversion unit 115 and the second photoelectric conversion unit 116 each receive part of the light flux that enters the pixel through the microlens 111. The two image signals finally obtained therefore form a group of pupil-divided images, corresponding to light fluxes that passed through different regions of the exit pupil of the imaging optical system 10. The sum of the image signals photoelectrically converted by the first photoelectric conversion unit 115 and the second photoelectric conversion unit 116 in each pixel is equivalent to the image signal (for viewing) that would be output by a pixel having only a single photoelectric conversion unit.

With this structure, the image sensor 11 of the present embodiment can output both a viewing image signal and ranging image signals (the two pupil-divided images). The present embodiment is described on the assumption that every pixel of the image sensor 11 has two photoelectric conversion units and that dense depth information can be output, but the invention is not limited to this configuration.

<Ranging Principle of the Imaging-Surface Phase-Difference Method>
The principle by which the digital camera 100 of the present embodiment derives the subject distance from the group of pupil-divided images output by the first photoelectric conversion unit 115 and the second photoelectric conversion unit 116 is described below with reference to FIG. 2.

FIG. 2(a) is a schematic view showing the exit pupil 101 of the imaging optical system 10 and the light flux received by the first photoelectric conversion unit 115 of a pixel in the image sensor 11. FIG. 2(b) is a corresponding schematic view showing the light flux received by the second photoelectric conversion unit 116.

The microlens 111 shown in FIGS. 2(a) and 2(b) is positioned so that the exit pupil 101 and the light receiving layer 114 are optically conjugate. A light flux that has passed through the exit pupil 101 of the imaging optical system 10 is focused by the microlens 111 and guided to the first photoelectric conversion unit 115 or the second photoelectric conversion unit 116. As shown in FIGS. 2(a) and 2(b), the two units mainly receive light fluxes that have passed through different pupil regions: the first photoelectric conversion unit 115 receives the light flux that has passed through the first pupil region 210, and the second photoelectric conversion unit 116 receives the light flux that has passed through the second pupil region 220.

The plurality of first photoelectric conversion units 115 in the image sensor 11 mainly receive the light flux that has passed through the first pupil region 210 and output a first image signal. At the same time, the plurality of second photoelectric conversion units 116 in the image sensor 11 mainly receive the light flux that has passed through the second pupil region 220 and output a second image signal. The first image signal gives the intensity distribution of the image formed on the image sensor 11 by the light flux that passed through the first pupil region 210. Likewise, the second image signal gives the intensity distribution of the image formed on the image sensor 11 by the light flux that passed through the second pupil region 220.

The relative positional shift between the first image signal and the second image signal (the so-called parallax amount) takes a value that depends on the defocus amount. The relationship between the parallax amount and the defocus amount is described with reference to FIGS. 2(c), 2(d), and 2(e), which are schematic views of the image sensor 11 and the imaging optical system 10 of the present embodiment. In the figures, reference numeral 211 denotes the first light flux, which passes through the first pupil region 210, and reference numeral 221 denotes the second light flux, which passes through the second pupil region 220.

FIG. 2(c) shows the in-focus state, in which the first light flux 211 and the second light flux 221 converge on the image sensor 11. In this state, the parallax amount between the first image signal formed by the first light flux 211 and the second image signal formed by the second light flux 221 is 0. FIG. 2(d) shows a state defocused in the negative z direction on the image side. In this state, the parallax amount between the first image signal formed by the first light flux and the second image signal formed by the second light flux is not 0 but has a negative value. FIG. 2(e) shows a state defocused in the positive z direction on the image side. In this state, the parallax amount between the first image signal formed by the first light flux and the second image signal formed by the second light flux has a positive value. Comparing FIGS. 2(d) and 2(e) shows that the direction of the positional shift reverses with the sign of the defocus amount. It also shows that the positional shift arises according to the imaging relationship (geometric relationship) of the imaging optical system and in proportion to the defocus amount. The parallax amount, that is, the positional shift between the first image signal and the second image signal, can be detected by a region-based matching method described later.
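The region-based matching itself is described later in the specification. As a generic illustration of the idea only, the sketch below searches for the integer shift that minimizes the sum of absolute differences (SAD) between a window of the first signal and the shifted window of the second signal along one row. The SAD cost, the function names, and the sign convention (positive when the second signal is displaced toward larger indices) are assumptions, not the method defined in the patent.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def estimate_parallax(s1, s2, center, half_window, max_shift):
    """Return the integer shift d that best aligns s2 with s1 around `center`.

    s1, s2: one row of the first and second image signals (same length).
    The window s1[center-half_window : center+half_window+1] is compared
    with the same window of s2 displaced by d, for d in [-max_shift, max_shift].
    """
    if len(s1) != len(s2):
        raise ValueError("signals must have the same length")
    lo = center - half_window
    hi = center + half_window + 1
    if lo - max_shift < 0 or hi + max_shift > len(s2):
        raise ValueError("search range exceeds signal bounds")
    reference = s1[lo:hi]
    best_shift, best_cost = 0, float("inf")
    for d in range(-max_shift, max_shift + 1):
        cost = sad(reference, s2[lo + d:hi + d])
        if cost < best_cost:  # keep the first minimum found
            best_cost, best_shift = cost, d
    return best_shift


# Toy example: the second row is the first row displaced by two pixels.
row1 = [0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0]
row2 = [0, 0, 0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0]
shift = estimate_parallax(row1, row2, center=5, half_window=2, max_shift=3)
```

A practical implementation would typically add sub-pixel interpolation of the cost minimum and repeat the search for every pixel to build a dense parallax map; those steps are omitted here.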

<Attribute Information Estimation Process> Next, the process executed in the digital camera 100 of the present embodiment to estimate attribute information of the target object from a captured image is described with reference to the flowchart of FIG. 3(a). The process corresponding to the flowchart of FIG. 3(a) can be implemented by the control unit 12 reading the corresponding processing program stored in, for example, the storage unit 14, loading it into a volatile memory (not shown), executing it, and thereby controlling each part of the digital camera 100. The same applies to FIGS. 8 and 9 described later.

In S301, the control unit 12 selects the object to be measured. At the time of shooting, a list of measurable objects is displayed on the display unit 16 so that the user can select the desired object with the input unit 15. This makes it possible to skip estimation of the object type and prevents misrecognition. The input unit 15 and the display unit 16 are separate here, but the display unit 16 may be given the function of the input unit 15, for example by means of a touch panel. FIG. 4 shows a display example of the list of objects to be measured. In FIG. 4, animal categories and the detailed animal types within each category are displayed in selectable form, and the user selects one of the displayed options. If the subject does not match any of the object types prepared in advance, an "Other" option may be displayed, as shown for example in FIG. 4, and attribute information may be estimated using general parameters.

In S302, the control unit 12 performs imaging with the configured imaging settings, such as focus position, aperture, and exposure time. More specifically, the control unit 12 causes the image sensor 11 to capture an image, has the obtained captured image transmitted to the image processing device 13, and controls it to be stored in the memory 136. Here, the captured image consists of two image signals: an image signal S1 composed of signals output only from the first photoelectric conversion unit 115 of the image sensor 11, and an image signal S2 composed of signals output only from the second photoelectric conversion unit 116.

In S303, the image processing device 13 generates a viewing image and a depth image from the obtained captured image. More specifically, the image generation unit 130 of the image processing device 13 first generates a single Bayer array image by adding the pixel values of corresponding pixels of the image signal S1 and the image signal S2. The image generation unit 130 then performs demosaicing on the Bayer array image for each of the R, G, and B colors to generate the viewing image. The demosaicing is performed according to the color filter arranged on the image sensor, and any demosaicing method may be used. In addition, the image generation unit 130 performs processing such as noise removal, luminance signal conversion, aberration correction, white balance adjustment, and color correction, generates the final viewing image, and stores it in the memory 136.

<Depth Image Generation Process>
The depth image, on the other hand, is generated by the depth generation unit 131. The process for generating the depth image is described here with reference to the flowchart of FIG. 3(b).

In S311, the depth generation unit 131 performs light amount correction on the image signal S1 and the image signal S2. At peripheral angles of view of the imaging optical system 10, vignetting causes the shapes of the first pupil region 210 and the second pupil region 220 to differ, so the light amount balance between the image signal S1 and the image signal S2 is lost. In this step, therefore, the depth generation unit 131 corrects the light amounts of the image signal S1 and the image signal S2 using, for example, light amount correction values stored in advance in the storage unit 14 and/or the memory 136.

In S312, the depth generation unit 131 performs processing to reduce noise generated during conversion in the image sensor 11. Specifically, the depth generation unit 131 achieves noise reduction by applying filtering to the image signal S1 and the image signal S2. In general, the higher the spatial frequency, the lower the SN ratio and the larger the relative share of noise. The depth generation unit 131 therefore applies a low-pass filter whose pass rate decreases as the spatial frequency increases. Because the light amount correction in S311 may not produce the desired result, for example due to manufacturing errors in the imaging optical system 10, it is desirable for the depth generation unit 131 to apply a band-pass filter that blocks the DC component and has a low pass rate for high-frequency components.
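As a rough illustration of the band-pass filtering described for S312, the following sketch filters one row of an image signal with a difference of two moving averages: the narrow average attenuates high frequencies, and subtracting the wide average removes the DC component. The source does not specify a filter design, so the function names, window radii, and edge-replicating border handling are all assumptions made for brevity.

```python
def moving_average(row, radius):
    """Box-filter a 1-D signal, replicating the edge samples at the borders."""
    n = len(row)
    out = []
    for i in range(n):
        window = [row[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def band_pass(row, narrow_radius=1, wide_radius=4):
    """Difference of box filters: attenuates high frequencies and blocks DC."""
    narrow = moving_average(row, narrow_radius)
    wide = moving_average(row, wide_radius)
    return [a - b for a, b in zip(narrow, wide)]


if __name__ == "__main__":
    flat = [10.0] * 12            # pure DC offset, e.g. a residual light amount error
    edge = [0.0] * 6 + [1.0] * 6  # a step edge, the kind of structure matching relies on
    print(band_pass(flat))        # every sample is zero: the DC component is removed
    print(band_pass(edge))        # non-zero response only around the edge
```

Applying the same filter to both S1 and S2 means a constant brightness offset between the two signals no longer affects the comparison in the next step.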

In S313, the depth generation unit 131 derives the amount of parallax between the image signal S1 and the image signal S2. Specifically, the depth generation unit 131 sets, in the image signal S1, a point of interest corresponding to the representative pixel information and a matching region centered on the point of interest. The matching region may be, for example, a rectangular region such as a square with sides of a predetermined length centered on the point of interest. Next, the depth generation unit 131 sets a reference point in the image signal S2 and a reference region centered on the reference point. The reference region has the same size and shape as the matching region. While sequentially moving the reference point, the depth generation unit 131 derives the degree of correlation between the image within the matching region of the image signal S1 and the image within the reference region of the image signal S2, and identifies the reference point with the highest correlation as the point in the image signal S2 that corresponds to the point of interest. The relative positional shift between the corresponding point identified in this way and the point of interest is the parallax amount at the point of interest.

By calculating the parallax amount while sequentially changing the point of interest according to the representative pixel information, the depth generation unit 131 derives the parallax amounts at the plurality of pixel positions defined by the representative pixel information. For simplicity, in the present embodiment the pixel positions at which the parallax amount is calculated (the pixel group included in the representative pixel information) are set to be equal in number to the pixels of the viewing image, so that depth information is obtained at the same resolution as the viewing image. Methods such as NCC (Normalized Cross-Correlation), SSD (Sum of Squared Differences), or SAD (Sum of Absolute Differences) may be used to derive the degree of correlation.
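The search described for S313 can be sketched as follows, using SAD as the correlation measure (the lowest SAD is treated as the highest correlation). This is a minimal illustration, not the device's implementation: the function names, the window half-width, the search range, and the restriction to horizontal shifts are assumptions, and the point of interest is assumed to lie at least `half` pixels from the image border.

```python
def sad(img_a, img_b, ay, ax, by, bx, half):
    """Sum of absolute differences between two (2*half+1)-square windows."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(img_a[ay + dy][ax + dx] - img_b[by + dy][bx + dx])
    return total


def disparity_at(s1, s2, y, x, half=1, max_shift=3):
    """Horizontal shift of the best match in s2 for the point of interest (y, x) in s1."""
    width = len(s1[0])
    best_shift, best_cost = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        rx = x + shift                        # candidate reference point in s2
        if rx - half < 0 or rx + half >= width:
            continue                          # reference region would leave the image
        cost = sad(s1, s2, y, x, y, rx, half)
        if cost < best_cost:                  # lowest SAD = highest correlation
            best_shift, best_cost = shift, cost
    return best_shift


if __name__ == "__main__":
    # Synthetic pair: s2 is s1 shifted two pixels to the right.
    row = [0, 0, 1, 5, 9, 4, 2, 0, 0, 0, 0, 0]
    s1 = [row[:] for _ in range(5)]
    s2 = [[0, 0] + r[:-2] for r in s1]
    print(disparity_at(s1, s2, y=2, x=4))     # 2
```

Repeating `disparity_at` for every pixel position in the representative pixel information would give the parallax map; NCC or SSD could be substituted for `sad` without changing the search loop.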

The derived parallax amount can be converted into a defocus amount, which is the distance from the image sensor 11 to the focal point of the imaging optical system 10, by using a predetermined conversion coefficient. Letting K be the predetermined conversion coefficient and ΔL the defocus amount, the parallax amount d is converted into the defocus amount by Equation 1 below.

(Equation 1)
ΔL = K × d
The defocus amount ΔL can further be converted into a subject distance by using Equation 2 below, the lens formula of geometrical optics.
(Equation 2)
1/A + 1/B = 1/F
Here, A is the distance from the object plane to the principal point of the imaging optical system 10 (the subject distance), B is the distance from the principal point of the imaging optical system 10 to the image plane, and F is the focal length of the imaging optical system 10. Since the value of B in the lens formula can be derived from the defocus amount ΔL, the subject distance A can be derived based on the focal length setting at the time of imaging.
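The chain from parallax amount to subject distance through Equations 1 and 2 might look like the sketch below. The source says only that B can be derived from ΔL; the specific relation used here, B = (in-focus image distance) + ΔL, together with the sign convention, the units (millimetres, and K in millimetres per pixel), and all function and parameter names, are assumptions made for illustration.

```python
def image_distance(subject_distance_mm, focal_length_mm):
    """Solve the lens formula 1/A + 1/B = 1/F for B."""
    return 1.0 / (1.0 / focal_length_mm - 1.0 / subject_distance_mm)


def subject_distance(disparity_px, k_mm_per_px, focal_length_mm, focus_distance_mm):
    """Convert a parallax amount d into a subject distance A.

    Assumption (not stated in the source): the image distance B equals the
    image distance of the in-focus plane plus the defocus amount.
    """
    defocus_mm = k_mm_per_px * disparity_px                    # Equation 1
    b_focus = image_distance(focus_distance_mm, focal_length_mm)
    b = b_focus + defocus_mm
    return 1.0 / (1.0 / focal_length_mm - 1.0 / b)             # Equation 2 solved for A


if __name__ == "__main__":
    # Zero parallax reproduces the focus distance; under this sign convention
    # a positive parallax maps to a subject nearer than the focus distance.
    print(subject_distance(0.0, 0.01, 50.0, 2000.0))
    print(subject_distance(1.0, 0.01, 50.0, 2000.0))
```

Evaluating this at every pixel of the parallax map would yield the two-dimensional distance information that forms the depth image.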

The depth generation unit 131 composes two-dimensional information whose pixel values are the subject distances derived in this way, and stores it in the memory 136 as the depth image.

Meanwhile, in S304, the object detection unit 132 detects the target object region. Based on the type of target object selected in S301, the object detection unit 132 identifies the target object region using information that was learned in advance and stored in the storage unit 14, and extracts the target object along the contour of the identified region. The depth image may also be used to assist the extraction of the target object region. Everything outside the extracted target object region is set to a specific or constant value, generating an object extraction image in which only the target object region remains. Likewise, in the depth image, everything outside the target object region is replaced with a specific or constant value, generating an object extraction depth image in which only the target object region holds valid values. The object extraction image and the object extraction depth image are stored in the memory 136 and used in subsequent processing. Various kinds of machine learning, such as deep learning, can be used as the learning method for extracting the target object region; the method is not limited to any particular one.

In S305, the posture estimation unit 133 estimates the posture of the target object in the object extraction image. From the detection result of the target object region in S304, the posture estimation unit 133 extracts feature points within the region in the object extraction image and estimates the posture using feature point data of three-dimensional shapes that was learned in advance and stored in the storage unit 14. Using the object extraction depth image as well makes it possible to estimate posture changes in more detail. Posture estimation mainly uses the object extraction image to estimate information such as where the parts of the animal serving as the target object (head, torso, legs, tail) are located and which way the head is facing. The orientation of the target object as a whole, as seen from the shooting direction, can be determined from the orientation of the torso. The object extraction depth image is used to determine the orientation of the torso: the normal direction at each pixel is calculated from the object extraction depth image, and the normal directions of the torso portion are obtained using the torso position information. Because an animal's torso is a curved surface, the normal direction is not constant. The dominant normal direction is therefore calculated by, for example, principal component analysis. The plane perpendicular to this dominant normal direction is estimated as the plane representing the orientation of the target object. For example, as shown in FIG. 5, a plane Pv that cuts the target animal vertically from head to tail through the center of its torso and a plane Ph that cuts it horizontally are estimated and used in the posture conversion described later.
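The normal-based part of this step can be sketched as follows: per-pixel normals are taken from central differences of the depth map, and the dominant direction is found as the first principal axis of those normals. Several things here are assumptions rather than the source's method: the depth values and pixel spacing are treated as sharing one length unit, the whole map is used instead of only the torso pixels, and the principal axis is computed by power iteration on the uncentered scatter matrix.

```python
import math


def normals_from_depth(depth):
    """Per-pixel unit normals from central differences of a depth map."""
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
            dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
            n = (-dzdx, -dzdy, 1.0)          # normal of the surface z = depth(x, y)
            norm = math.sqrt(sum(c * c for c in n))
            normals.append(tuple(c / norm for c in n))
    return normals


def dominant_direction(vectors, iterations=50):
    """First principal axis of the vectors, via power iteration on sum(v v^T)."""
    scatter = [[sum(v[i] * v[j] for v in vectors) for j in range(3)] for i in range(3)]
    axis = (0.0, 0.0, 1.0)                   # start from the viewing direction
    for _ in range(iterations):
        nxt = [sum(scatter[i][j] * axis[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in nxt))
        axis = tuple(c / norm for c in nxt)
    return axis


if __name__ == "__main__":
    # A flat surface tilted about the vertical axis: depth grows with x.
    depth = [[100.0 + 0.5 * x for x in range(5)] for _ in range(5)]
    print(dominant_direction(normals_from_depth(depth)))
```

For a curved torso the individual normals would fan out, and the principal axis would summarize them into a single direction from which the orientation plane could be derived.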

In the present embodiment, target object region detection and posture estimation are separate processes, but the two may be performed simultaneously by using machine learning.

In S306, the posture conversion unit 134 uses the posture of the target object estimated in S305 to convert the posture of the target object in both the object extraction image and the object extraction depth image. For example, when estimating the dimensions of an animal as attribute information from the object extraction image, there is a specific posture, depending on the type of animal, in which measurement is easy. For a mammal, as shown in FIG. 6, converting the captured image (FIG. 6(a)) into an image as if taken from the side (FIG. 6(b)) makes it easy to measure dimensions such as the total length La, the head-and-body length Lb, and the body height Lc.

Birds, fish, and the like should also be converted into images as captured from the side. For birds, dimensions such as the total length La and the wing length Ld are measured, as shown in FIG. 6(c). For a bird with its wings spread, as in FIG. 6(d), however, it is desirable to convert to an image viewed from above and measure the wingspan Le. For reptiles, amphibians, and arthropods such as insects, an image viewed from above is likewise desirable. For any animal, though, it is desirable to convert to a side view or a top view as appropriate for the posture of the target object in the captured image.

The posture conversion here is basically an image conversion by geometric transformation. Taking a mammal as an example, the vertical cutting plane Pv representing the animal's posture, estimated in the posture estimation above, is used, and a rotation angle is calculated such that the normal of this plane becomes perpendicular to the imaging device 1. A rotation matrix R is generated from the obtained rotation angle, and the object extraction image and the object extraction depth image are rotated as in Equation 3 below, converting the posture of the target animal from that of FIG. 6(a) to that of FIG. 6(b).
(Equation 3)
P' = R × P
P is a position (x, y, z) on the image before conversion, and P' is the position on the image after conversion. Pixel positions left empty by the conversion are interpolated from the surrounding pixels, generating a converted image without gaps.
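A minimal sketch of the rotation step, reading Equation 3 as P' = R P applied to each position P = (x, y, z), is given below. It rests on several assumptions: the rotation is restricted to the vertical (y) axis, "perpendicular to the imaging device" is interpreted as "aligned with the optical (z) axis", the positions are already expressed in one common length unit, and the interpolation of missing pixels is omitted. All function names are hypothetical.

```python
import math


def rotation_about_y(angle_rad):
    """Rotation matrix R for a rotation about the vertical (y) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def rotate(points, r):
    """Apply P' = R P to each point P = (x, y, z)."""
    return [tuple(sum(r[i][j] * p[j] for j in range(3)) for i in range(3)) for p in points]


def angle_to_face_camera(normal):
    """Angle about y that turns the given plane normal onto the optical (z) axis."""
    nx, _, nz = normal
    return -math.atan2(nx, nz)


if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)
    normal = (s, 0.0, s)                      # plane turned 45 degrees away from the camera
    r = rotation_about_y(angle_to_face_camera(normal))
    print(rotate([normal], r))                # approximately [(0.0, 0.0, 1.0)]
```

A fuller transform, as the text notes, could add translation and scaling to the rotation.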

The posture-converted object extraction image and object extraction depth image are stored in the memory 136 and used in subsequent processing. Although the present embodiment has been described using posture conversion by a rotation matrix as an example, a transformation that adds translation, scaling, or the like to the rotation may also be used. Performing posture conversion also has the advantage of reducing the amount of data that must be measured and held in advance for the attribute information estimation described later.

In S307, the attribute information estimation unit 135 estimates the attribute information of the target object. The attribute information refers to the dimensions, shape, volume, and mass of the target object, and at least one of these is estimated.

First, dimension estimation, one kind of attribute information, will be described. In dimension estimation, the posture-converted object extraction image is displayed on the display unit 16, and the user uses the input unit 15 to designate the desired measurement positions on the animal in the displayed image. The available designation methods are: designating two points (FIG. 7(a)); designating three or more points and connecting them linearly (FIG. 7(b)) or with a polynomial; and tracing the portion to be measured (FIG. 7(c)). FIG. 7(a) shows a case where two points P1 and P2 are designated and the horizontal length between them is measured. It is desirable to also allow the user to choose measurement of the vertical length or of the Euclidean distance between the two points. FIG. 7(b) shows an example in which four points P1 to P4 are designated and connected by straight lines. Alternatively, the points may be interpolated with a spline curve or the like and the length of the curve measured. FIG. 7(c) is an example of measuring the length of the curve traced by the user from P1 to P2. Of these various ways of designating measurement positions and measuring the designated section, designating two points measures only straight lines but is simple, whereas designating three or more points or tracing also allows the length of curves to be measured, increasing the freedom of measurement. Measurement along a curve is particularly effective when the posture conversion could not produce the desired posture. For example, it is effective for animals that are rarely stretched out in a straight line, such as when measuring the total length La of a snake as shown in FIG. 6(e).
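The measurement modes above reduce to a few small length computations on the designated pixel coordinates, sketched here. The function names are hypothetical, points are assumed to be (x, y) pixel coordinates, and a traced curve is assumed to arrive as a dense sequence of points so that it can be treated as a polyline; spline or polynomial interpolation is not shown.

```python
import math


def horizontal_length(p1, p2):
    """Horizontal distance between two designated points, in pixels."""
    return abs(p2[0] - p1[0])


def vertical_length(p1, p2):
    """Vertical distance between two designated points, in pixels."""
    return abs(p2[1] - p1[1])


def polyline_length(points):
    """Total Euclidean length of straight segments joining consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


if __name__ == "__main__":
    p1, p2, p3 = (0, 0), (3, 4), (3, 10)
    print(horizontal_length(p1, p2))       # 3
    print(polyline_length([p1, p2]))       # Euclidean distance between two points
    print(polyline_length([p1, p2, p3]))   # length along three connected points
```

All results are lengths in image space, in pixel units, and still need the conversion to object space described next.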

After the measurement points are designated, the length is measured in pixel or sub-pixel units by counting the number of pixels between the designated measurement positions. The measured value at this point is a length in image space. To convert the measured length in pixels into a length in real object space, it is first converted into an image-space length in SI units using the size of one pixel of the image sensor 11. Next, the shooting magnification M is obtained from the shooting parameters, and the actual length in object space is calculated as the product of the measured image-space length and the shooting magnification M.

The shooting magnification M can be calculated by Equation 4 below from the shooting parameters, namely the focal length F of the imaging optical system 10 and the target object distance Z.
(Equation 4)
M = Z / F
The target object distance Z is obtained as follows: the focus distance corresponding to each position of the focus lens included in the imaging optical system 10 is measured in advance, the focus lens position at the time of shooting is detected, and the corresponding focus distance is taken as the target object distance Z.
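Put together, the conversion from a pixel count to a real-world length is a two-step scaling, sketched below with Equation 4. The function and parameter names, the use of millimetres, and the example values (a 0.004 mm pixel pitch, a 50 mm focal length, a 3000 mm object distance) are illustrative assumptions, not values from the source.

```python
def shooting_magnification(object_distance_mm, focal_length_mm):
    """Equation 4: M = Z / F."""
    return object_distance_mm / focal_length_mm


def object_length_mm(length_px, pixel_pitch_mm, object_distance_mm, focal_length_mm):
    """Convert a length measured in pixels into a length in object space."""
    image_length_mm = length_px * pixel_pitch_mm   # pixels -> image-space length
    m = shooting_magnification(object_distance_mm, focal_length_mm)
    return image_length_mm * m                     # image space -> object space


if __name__ == "__main__":
    # 500 px on the sensor, seen at 3 m through a 50 mm lens: about 120 mm.
    print(object_length_mm(500, 0.004, 3000.0, 50.0))
```

Note that M as defined in Equation 4 maps image-space lengths to object-space lengths, so it is multiplied rather than divided.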

Next, shape estimation, one kind of attribute information, will be described. Shape estimation estimates the three-dimensional shape of the target object, using the information on the object type selected in S301 and the posture-converted object extraction image and object extraction depth image generated in S306. Because the values of the posture-converted object extraction depth image depend on the distance from the digital camera 100 to the target object, subtracting the distance to the target object yields the depth image (that is, the shape) of the target object's surface as seen from the digital camera 100. In the following description, the particular measured face of the target object is called the front surface. A single shot can measure only the shape of one face of the target object; the opposite face, which is not visible from the shooting direction, cannot be measured. To estimate the opposite face of the target object, the three-dimensional shapes of the objects to be measured are measured in advance, and reference three-dimensional shapes are stored in the storage unit 14 and/or the memory 136. A single average three-dimensional shape per object may be stored as the reference, but it is desirable to hold multiple three-dimensional shapes in order to improve the estimation accuracy of the opposite face. The reference three-dimensional shape may be either data in voxel units or data expressed in SI units, and the conversion described below differs according to the unit of the reference data. Here, a voxel means a three-dimensional pixel, that is, one pixel extended in the x, y, and z directions. Data in SI units means the size of the target object on the object side measured in SI units. In the front surface shape information calculated above, the depth information (Z direction) is in SI units, whereas the size of the target object in the X and Y directions is in pixel units. Depending on the data unit of the reference three-dimensional shape, either the depth information is converted to voxel units or the X and Y size of the target object is converted to SI units. As described above, the conversion between voxel units and SI units is performed by obtaining the shooting magnification M from the shooting parameters using Equation 4.

Next, the reference three-dimensional shape is scaled so that its size matches that of the detected target object. The front surface shape of the target object is then matched in position against the scaled reference three-dimensional shape. This matching determines which face of the reference three-dimensional shape the measured front surface shape corresponds to. At the same time, the unmeasured opposite face of the target object is identified in the reference three-dimensional shape. The three-dimensional shape of the target object is estimated by combining this identified opposite face of the reference three-dimensional shape with the measured front surface shape of the target object. When multiple different reference three-dimensional shapes are stored, the opposite face is estimated from the reference three-dimensional shape that best matches the measured shape.

To further improve the estimation accuracy of the three-dimensional shape, the difference between the measured front surface shape and the matching face of the reference three-dimensional shape is calculated, and the opposite face of the reference three-dimensional shape is corrected by adding or subtracting this difference to give the estimated opposite face. Alternatively, the amount of this difference relative to the thickness of the shape may be calculated, and the correction amount varied according to the thickness of the shape at the opposite face.

When the size of one pixel in object space is large relative to the size of the target object, the estimated three-dimensional shape becomes a stepped, inaccurate shape. It is therefore desirable for the image sensor 11 to have a large number of pixels, and for the object to occupy as much of the frame as possible at the time of shooting. To reduce pixel-level steps, interpolation may be applied to produce a smoother shape, which may further be converted into polygon data.

Next, volume estimation, one kind of attribute information, will be described. In volume estimation, the volume is calculated from the three-dimensional shape estimated in the shape estimation above. When the estimated three-dimensional shape is voxel data, the number of voxels in the shape is counted, and the volume in object space is estimated by converting the side length of a voxel using Equation 4. When the estimated three-dimensional shape is already expressed in SI units in object space, the volume is estimated by integrating over the interior of the shape. The estimated volume may be derived on a voxel basis, the voxel being the unit volume element (regular grid unit) underlying the image processing, or on the basis of actual real-world dimensions.
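For the voxel case, the volume computation can be sketched as a count followed by a unit conversion. The sketch assumes cubic voxels whose side equals one sensor pixel projected into object space through Equation 4, and an occupancy grid stored as nested lists of 0 and 1; the function name, parameter names, and example values are hypothetical.

```python
def voxel_volume_mm3(occupancy, pixel_pitch_mm, object_distance_mm, focal_length_mm):
    """Volume in object space of a voxel occupancy grid (nested lists of 0/1)."""
    magnification = object_distance_mm / focal_length_mm   # Equation 4: M = Z / F
    side_mm = pixel_pitch_mm * magnification               # voxel side in object space
    count = sum(cell for plane in occupancy for row in plane for cell in row)
    return count * side_mm ** 3


if __name__ == "__main__":
    # A fully occupied 2 x 2 x 2 block; each voxel side maps to 0.2 mm.
    block = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
    print(voxel_volume_mm3(block, 0.005, 2000.0, 50.0))    # about 0.064 mm^3
```

Because the side length enters cubed, any error in the object distance Z is amplified threefold (in relative terms) in the estimated volume.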

Next, mass estimation, one kind of attribute information, will be described. In mass estimation, the mass of the target object is estimated by multiplying the volume derived in the volume estimation above by density information for the target object stored in the storage unit 14 and/or the memory 136. The density may be treated as uniform over the target object, but different values may also be held and used for each body part in order to estimate the mass more accurately. In that case the target object is divided into parts, for example head, torso, arms, and legs, by means such as skeleton analysis, and the mass is estimated using a different density for each part. Although the mass is calculated from density information in the present embodiment, this is not limiting: specific weight information for the target object may be stored in advance in the storage unit 14 and/or the memory 136 and multiplied by the volume of the estimated three-dimensional shape to estimate the weight.
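The per-part mass estimate amounts to a weighted sum of part volumes, as in the sketch below. The function name and the dictionary layout are hypothetical, and the density figures are placeholder numbers chosen only to make the arithmetic visible; they are not measured values for any animal.

```python
def estimate_mass_kg(part_volumes_m3, part_densities_kg_m3, default_density_kg_m3):
    """Sum volume x density over body parts, using a default for unlisted parts."""
    return sum(
        volume * part_densities_kg_m3.get(part, default_density_kg_m3)
        for part, volume in part_volumes_m3.items()
    )


if __name__ == "__main__":
    volumes = {"head": 0.01, "torso": 0.10}      # m^3, illustrative only
    densities = {"head": 1100.0}                 # kg/m^3, placeholder value
    # The torso falls back to the uniform default density.
    print(estimate_mass_kg(volumes, densities, default_density_kg_m3=1000.0))
```

Passing an empty `part_densities_kg_m3` reproduces the uniform-density case described first in the text.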

In S308, the control unit 12 displays the attribute information estimated in S307 on the display unit 16 and stores it in the storage unit 14. It is desirable to record the attribute information estimated in S307 as metadata of the viewing image generated in S303, in association with the depth image.

As described above, according to the present embodiment, the attribute information of an object in an image can be estimated from the depth image generated from the captured image and the conditions under which the image was shot. Specifically, the attribute information of the target object can be estimated by using the depth image, the information learned in advance for object region detection, posture estimation, and posture conversion, the shapes measured in advance, the shooting parameters, and the mass ratio.

[Embodiment 2] Next, the second embodiment will be described.

In the first embodiment, the user selects the measurement target object. The second embodiment differs from the first embodiment in that the user makes no selection input for the measurement target object. In the second embodiment, the configuration and functions of the digital camera 100 are the same as in FIGS. 1 and 3 of the first embodiment, so the description focuses on the points that differ from the attribute information estimation processing of the first embodiment.

FIG. 8 shows the attribute information estimation processing of the second embodiment. Steps that are the same as in the processing of FIG. 3 of the first embodiment are given the same step numbers.

In S801, as in S302 of FIG. 3, the control unit 12 performs control so that imaging is carried out with the configured imaging settings, such as focal position, aperture, and exposure time.

In S802, as in S303 of FIG. 3, the image generation unit 130 generates a viewing image and a depth image.

In S803, the object detection unit 132 recognizes the subject. In subject recognition and region detection, objects in the image are identified and their positions and contours are extracted based on object classification and type information acquired in advance by machine learning, and an object extraction image is generated. The extracted position and contour information is also applied to the depth image, and the depth generation unit 131 generates an object extraction depth image. The machine learning is not limited to a specific method, and any method may be used.

In S804, the posture estimation unit 133 and the posture conversion unit 134 perform posture estimation and posture conversion on the object identified and extracted in S803. The details of the processing of S804 performed by the posture estimation unit 133 and the posture conversion unit 134 of the image processing device 13 will now be described with reference to the flowchart of FIG. 9.

In S8041, the posture estimation unit 133 acquires the shape of the surface of the target object as seen from the shooting direction by subtracting the distance from the digital camera 100 to a reference position on the target object from the object extraction depth image generated in S803. The reference position may be set at the position on the target object nearest to the digital camera 100 or at the farthest position, and is not particularly limited.
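
As a rough illustration of S8041, the sketch below subtracts a reference distance from an object extraction depth image. The representation (a nested list in which `None` marks pixels outside the object) and the function name `surface_relief` are assumptions made for this example, not details taken from the source.

```python
from typing import List, Optional

DepthImage = List[List[Optional[float]]]


def surface_relief(object_depth: DepthImage,
                   reference: str = "nearest") -> DepthImage:
    """Subtract the camera-to-reference distance from every object pixel.

    `None` marks pixels outside the extracted object (assumed encoding).
    `reference` selects the nearest or farthest object point, the two
    options mentioned in the text.
    """
    values = [d for row in object_depth for d in row if d is not None]
    if not values:
        raise ValueError("depth image contains no object pixels")
    if reference == "nearest":
        ref = min(values)
    elif reference == "farthest":
        ref = max(values)
    else:
        raise ValueError("reference must be 'nearest' or 'farthest'")
    return [[None if d is None else d - ref for d in row]
            for row in object_depth]


# Hypothetical 2x2 depth image in metres; the last pixel is background.
relief = surface_relief([[2.0, 2.5], [3.0, None]])
```

With the nearest point as reference, all relief values are zero or positive; with the farthest point, they are zero or negative. Either choice yields the same surface shape up to a constant offset.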

In S8042, the posture estimation unit 133 compares the surface shape of the target object obtained in S8041 with the three-dimensional shape of the target object stored in advance in the storage unit 14, and resizes one of them so that the two have the same size. In view of the subsequent attribute information estimation, it is desirable to fit the size of the three-dimensional shape stored in advance in the storage unit 14 and/or the memory 136 to the size of the surface shape of the target object, and to store the conversion coefficient in the storage unit 14 and/or the memory 136.
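
The source does not say how the conversion coefficient is computed. The sketch below assumes, purely for illustration, a single uniform scale factor obtained as the ratio of one corresponding length (for example, body length) in the measured surface shape and in the stored three-dimensional shape; the function names are hypothetical.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def conversion_coefficient(measured_length: float,
                           template_length: float) -> float:
    """Ratio that scales the stored shape to the measured surface size.

    Assumption: one uniform scale factor derived from a single
    corresponding length; the source leaves the method open.
    """
    if template_length <= 0:
        raise ValueError("template length must be positive")
    return measured_length / template_length


def scale_template(vertices: List[Point3D],
                   coefficient: float) -> List[Point3D]:
    """Apply the uniform scale to every vertex of the stored shape."""
    return [(x * coefficient, y * coefficient, z * coefficient)
            for x, y, z in vertices]


# Hypothetical lengths: measured 1.5 m, stored template 1.0 m.
k = conversion_coefficient(1.5, 1.0)
scaled = scale_template([(1.0, 2.0, 0.0)], k)
```

Keeping the coefficient alongside the stored shape allows the same scaling to be reused later, for instance when the back side of the object is taken from the resized shape.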

In S8043, the posture estimation unit 133 performs matching between the surface shape of the target object acquired in S8041 and the three-dimensional shape stored in advance in the storage unit 14 and/or the memory 136, and identifies the direction from which the target object is being photographed. This method is effective when the posture of the photographed object is similar to the posture stored in advance in the storage unit 14 and/or the memory 136, from which attribute information is easy to acquire, and only the shooting direction differs. When, on the other hand, the posture of the target object differs greatly from the stored posture, the evaluation value of the matching (the matching score) drops. The matching score is therefore compared with a threshold in S8044. If the matching score is higher than the threshold in S8044, the shooting plane representing the orientation of the target object is identified in S8045. If the matching score is lower than the threshold, the joint locations and skeleton of the target object are identified in S8046. This identification also uses information learned and acquired in advance.

In S8047, the posture conversion unit 134 calculates the difference from the skeleton positions in the three-dimensional shape prepared in advance, and converts the surface shape so that it resembles the reference posture by rotating, about each joint position as a pivot, the part on the distal side of that joint. For example, when a sitting cow is photographed, the joint positions, lengths, and rotation angles of the legs (thigh, hock, front knee, and so on) are estimated, and the parts are rotated about the joints to generate an estimated image of the cow standing up. The converted surface shape is input to the matching of S8043 again, and the shooting plane representing the orientation of the target object is identified again.
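
Rotating a body part about a joint, as in S8047, can be sketched as follows. For brevity the example is restricted to a rotation within a single plane; the actual surface shape is three-dimensional, and the source does not specify how the rotation is parameterised. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Point2D = Tuple[float, float]


def rotate_about_joint(points: List[Point2D],
                       joint: Point2D,
                       angle_rad: float) -> List[Point2D]:
    """Rotate the points of one part counter-clockwise about a joint.

    Simplification: planar rotation only. Points are translated so the
    joint is the origin, rotated, and translated back.
    """
    jx, jy = joint
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    rotated = []
    for x, y in points:
        dx, dy = x - jx, y - jy
        rotated.append((jx + c * dx - s * dy, jy + s * dx + c * dy))
    return rotated


# Hypothetical lower-leg point rotated 90 degrees about a knee at (1, 1).
moved = rotate_about_joint([(2.0, 1.0)], (1.0, 1.0), math.pi / 2)
```

Because the joint itself is the pivot, it stays fixed while everything distal to it moves, which keeps the part attached to the rest of the body.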

In S8048, based on the shooting plane information identified in S8045, the posture conversion unit 134 applies a geometric transformation to the object extraction image and the surface shape, as in S306 of FIG. 3, to convert the posture so that the target object appears as if photographed from the side. The converted object extraction image and surface shape are stored in the memory 136 and used in the subsequent processing.
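
Steps S8043 to S8048 can be read as a loop: match, compare the score with the threshold, and either proceed to the side-view conversion or normalise the posture and match again. The sketch below shows only this control flow. The callables `match`, `normalize_pose`, and `to_side_view` stand in for the actual processing and are hypothetical, as is the `max_rounds` limit; the source specifies no termination bound and does not say what happens when the score equals the threshold (treated here as "not higher").

```python
from typing import Callable, Tuple, TypeVar

Shape = TypeVar("Shape")
View = TypeVar("View")
Result = TypeVar("Result")


def estimate_and_convert(surface: Shape,
                         match: Callable[[Shape], Tuple[float, View]],
                         normalize_pose: Callable[[Shape], Shape],
                         to_side_view: Callable[[Shape, View], Result],
                         threshold: float,
                         max_rounds: int = 3) -> Result:
    """Control-flow sketch of S8043 to S8048 (not the actual algorithms)."""
    for _ in range(max_rounds):
        score, view = match(surface)            # S8043: matching
        if score > threshold:                   # S8044: threshold test
            # S8045: shooting plane identified; S8048: side-view conversion
            return to_side_view(surface, view)
        # S8046/S8047: find joints and skeleton, rotate parts, then retry
        surface = normalize_pose(surface)
    raise RuntimeError("no match above the threshold within max_rounds")


# Stub callables for illustration: the score improves after each correction.
result = estimate_and_convert(
    0,
    match=lambda s: (0.5 + 0.2 * s, "front"),
    normalize_pose=lambda s: s + 1,
    to_side_view=lambda s, v: (s, v),
    threshold=0.8,
)
```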

Returning to the description of FIG. 8, in S805 the attribute information estimation unit 135 estimates the attribute information of the target object, as in S307 of FIG. 3. The attribute information estimated in S805 is displayed on the display unit 16 and stored in the storage unit 14. The estimated attribute information is desirably stored together with the depth image as metadata of the viewing image generated in S802.

The points in which the attribute information estimation of S805 differs from S307 are described below.

For dimension estimation, one type of attribute information, in the first embodiment the user specified the measurement position and the dimension was measured at that position. In the second embodiment, by contrast, the image processing device 13 determines the type of the object from the identification result obtained in S803, and determines the measurement positions from the dimension information required for that type (total length, head-and-body length, body height, and so on). Then, as in S8046, skeleton recognition is performed using information learned and acquired in advance, and the measurement positions are located using the contour information obtained by the subject recognition of S803.

Shape estimation, one type of attribute information, uses the conversion coefficient for matching the size of the surface shape generated by the posture estimation and conversion of S804 with that of the three-dimensional shape stored in advance in the storage unit 14 and/or the memory 136. The three-dimensional shape stored in the storage unit 14 is resized using the conversion coefficient, and the back portion of the measurement target object, which is not covered by the surface shape, is taken from the resized three-dimensional shape. The three-dimensional shape of the measurement target object is generated by combining its surface shape with the back shape taken from the three-dimensional shape. When the two are combined, smoothing is applied so that the seam is smooth.

Volume and mass estimation, also types of attribute information, take the body hair of the target object into account. The first embodiment does not consider body hair and calculates the mass as if the hair had the same density as the body, so the error between the actual mass and the estimated mass may become large. In addition, in applications such as wool production, it may be necessary to estimate the hair volume alone.

As described above, in the present embodiment, the amount of body hair is estimated and corrected for, in addition to the processing of the first embodiment. To estimate the amount of body hair, matching is first performed using a joint bilateral filter, a guided filter, or the like, and an alpha matte is calculated. The region where the alpha matte is 1 or less is treated as the hair region, its thickness is calculated, and the volume of the estimated three-dimensional shape excluding the hair region is calculated. For mass estimation, likewise, the mass excluding the hair region is estimated by multiplying the volume excluding the hair region by the density information stored for each type of object. Alternatively, the volume of the hair region alone is calculated from the volume including body hair and the volume excluding it, the hair mass is estimated using hair density information, and the sum of the hair mass and the mass excluding hair is taken, which gives a mass estimate that accounts for the different density of body hair.
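
The second variant of the hair-aware mass estimation can be illustrated with a minimal sketch, assuming the two volumes (with and without the hair region) have already been obtained. The function name and all numeric values are hypothetical.

```python
from typing import Tuple


def hair_corrected_mass(volume_with_hair: float,
                        volume_without_hair: float,
                        body_density: float,
                        hair_density: float) -> Tuple[float, float]:
    """Return (hair volume, total mass).

    The hair volume is the difference of the two volumes; the total mass
    is the body mass plus the hair mass, each with its own density.
    """
    if volume_without_hair > volume_with_hair:
        raise ValueError("volume without hair cannot exceed volume with hair")
    hair_volume = volume_with_hair - volume_without_hair
    body_mass = volume_without_hair * body_density
    hair_mass = hair_volume * hair_density
    return hair_volume, body_mass + hair_mass


# Hypothetical values: volumes in cubic metres, densities in kg per cubic metre.
hair_volume, total_mass = hair_corrected_mass(0.5, 0.25, 1000.0, 100.0)
```

The first returned value corresponds to the hair-only volume mentioned for uses such as wool production; setting the hair density to zero gives the mass excluding hair.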

[Other Embodiments]
In the embodiments above, the image sensor 11 was described as having photoelectric conversion elements of the imaging-plane phase-difference ranging type, so that a viewing image and a depth image can both be acquired; however, in practicing the present invention, acquisition of depth information is not limited to this. For example, depth information may be acquired by stereo ranging based on multiple captured images obtained from a binocular imaging device or from multiple different imaging devices. Alternatively, it may be acquired by, for example, a stereo ranging method using a light irradiation unit and an imaging device, or a method combining TOF (Time of Flight) with an imaging device.

The attribute information estimation processes of the first and second embodiments are not limited to their respective embodiments; processes that use the same information can be interchanged between them.

Image processing devices to which the present embodiment can be applied include digital still cameras, digital video cameras, in-vehicle cameras, mobile phones, and smartphones.

The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The present invention performs non-contact estimation of object attribute information using an imaging device, and is useful in situations where measurement with a tape measure or a scale is difficult, such as simple growth records for livestock, health management of animals in zoos, and acquisition of attributes of wild animals from a distance.

The invention is not limited to the above embodiments, and various changes and modifications can be made without departing from the spirit and scope of the invention. Accordingly, the claims are attached to make the scope of the invention public.

100 ... digital camera, 12 ... control unit, 13 ... image processing device, 130 ... image generation unit, 131 ... depth generation unit, 132 ... object detection unit, 133 ... posture estimation unit, 134 ... posture conversion unit, 135 ... attribute information estimation unit

Claims

Depth generation means for generating depth information showing the distance distribution in the depth direction of the subject from the captured image obtained by capturing the subject, and
An object detecting means for detecting a region of a specific object from the captured image,
A posture estimation means for estimating the posture of the specific object, and
A posture changing means for converting the posture of the specific object in the captured image and the depth information, and
An image processing apparatus comprising: an attribute information estimating means for estimating the attribute information of the specific object from the captured image in which the posture is changed, the depth information, and the shooting conditions of the image.

The image processing apparatus according to claim 1, wherein the object detecting means detects the type and region of the specific object by extracting the region of the specific object using object information acquired by learning in advance.

The image processing apparatus according to claim 1, wherein the posture estimating means estimates the posture of the specific object by using the information of the object acquired by learning in advance.

The image processing apparatus according to claim 3, wherein the posture estimating means estimates the posture of the specific object by using three-dimensional shape data of the object acquired in advance and data on the parts of the specific object.

The image processing apparatus according to any one of claims 1 to 4, wherein the posture changing means performs geometric transformation on the captured image and the depth information by using the posture of the specific object in the captured image and the depth information estimated by the posture estimating means and a posture of the specific object acquired in advance.

The image processing apparatus according to claim 5, wherein the posture changing means performs geometric transformation for each part of the specific object in the captured image and the depth information by using the information on the parts of the specific object estimated by the posture estimating means and information on the posture and parts of the object acquired in advance.

The image processing apparatus according to claim 6, wherein the posture estimating means calculates the principal normal direction of the body portion of the specific object and obtains a plane perpendicular to the normal direction, and
the posture changing means performs geometric transformation on the captured image and the depth information with reference to the plane.

The image processing apparatus according to any one of claims 1 to 7, wherein the attribute information is at least one of dimensions, shape, volume, and mass.

The image processing apparatus according to claim 8, wherein the attribute information estimating means specifies a measurement position according to the type of the specific object, and estimates the dimension at the measurement position using the information of the object acquired by learning in advance.

The image processing apparatus according to claim 8, wherein the attribute information estimating means estimates the shape of the part of the object that cannot be acquired from the captured image by using the three-dimensional shape of the object acquired in advance, and estimates the three-dimensional shape of the specific object by combining it with the shape of the part of the object obtained from the depth information.

The image processing apparatus according to claim 8, wherein the attribute information estimating means estimates a volume from the three-dimensional shape of the specific object, and estimates the mass of the specific object using the volume and the density of the object.

The image processing apparatus according to claim 11, wherein the attribute information estimating means estimates the mass of the specific object by using the parts of the object obtained from the captured image and the density of each part of the object.

The image processing apparatus according to claim 11, wherein, when the specific object is an animal, the attribute information estimating means calculates the volume of the specific object excluding the hair region, and estimates the mass of the specific object using the volume excluding the hair region and the density of the object.

The image processing apparatus according to any one of claims 1 to 12, further comprising an input means for inputting information about the specific object.

The image processing apparatus according to claim 14, wherein the object detecting means determines the type of the specific object by using information for identifying the type of an object acquired by learning in advance and the information about the object input by the input means.

The image processing apparatus according to claim 9, further comprising a designation means for designating the measurement position of the specific object,
wherein the attribute information estimating means calculates the dimension at the measurement position of the specific object designated by the designation means.

The image processing apparatus according to claim 16, wherein the designation means designates the measurement position in accordance with a user operation of designating two points on the specific object, an operation of designating three or more points, or a tracing operation.

The image processing apparatus according to any one of claims 1 to 17, wherein the specific object is an animal other than a human being.

The image processing apparatus according to any one of claims 1 to 17, wherein the image processing apparatus is an image pickup apparatus having an image pickup element for capturing an image, and
the depth generating means generates depth information from images having different parallax captured by the image pickup element.

The image processing apparatus according to any one of claims 1 to 19, further comprising a recording means for recording the captured image, the depth information, and the attribute information in association with each other.

The image processing apparatus according to claim 19, wherein the image pickup element has a plurality of photoelectric conversion units in one pixel.

An image processing method comprising:
a step in which a depth generation means generates depth information indicating the distance distribution of a subject in the depth direction from a captured image of the subject;
a step in which an object detecting means detects a region of a specific object from the captured image;
a step in which a posture estimating means estimates the posture of the specific object;
a step in which a posture changing means converts the posture of the specific object in the captured image and the depth information; and
a step in which an attribute information estimating means estimates the attribute information of the specific object from the captured image in which the posture has been converted, the depth information, and the shooting conditions of the image.

A program for causing a computer to function as the image processing device according to any one of claims 1 to 21.

A storage medium in which a program for operating a computer as an image processing device according to any one of claims 1 to 21 is stored.