JP2010238134A

JP2010238134A - Image processor and program

Info

Publication number: JP2010238134A
Application number: JP2009087806A
Authority: JP
Inventors: Kazunobu Maezono; 和伸前園
Original assignee: Saxa Inc
Current assignee: Saxa Inc
Priority date: 2009-03-31
Filing date: 2009-03-31
Publication date: 2010-10-21

Abstract

PROBLEM TO BE SOLVED: To generate a large number of images in a virtual space based on a small amount of data acquired in a real space, and to construct a recognition database for recognizing a person in the images. SOLUTION: An image processor acquires three-dimensional shape data of a person, applies the data to a joint model of a person, and creates a human body model in the virtual space. Each joint of the human body model is moved in accordance with the movement of each joint of the joint model. A noise generation processing section 11 adds noise corresponding to such differences to the human body model in the virtual space to generate individual differences in action and physique; an image generation processing section 13 generates an image from the human body model based on a set photographing condition; an image feature calculation processing section 14 calculates image features of the person in the generated image; and a machine learning section 15 machine-learns the image features to construct a recognition database 54 for recognizing a person in an image. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to an image processing apparatus and a program for constructing an image recognition system by projecting a small amount of human shape data acquired in real space onto a three-dimensional virtual space.

In general, an image recognition system that recognizes a person in an image constructs the recognition database needed for recognition by acquiring and analyzing a large number of images capturing human shapes, postures, and movements, and then uses the data in that database to recognize the movements, postures, and so on of a person in a real-space image. The more types and the greater the number of images analyzed, the higher the accuracy of the recognition database and the more accurately a person's shape, posture, and movement can be recognized. In practice, however, human body shapes and movements, the surrounding environment and lighting, and the means and direction of photographing all vary widely, and it is extremely difficult to photograph and acquire every image corresponding to these differences (noise).

One conceivable approach is to obtain images of many patterns by varying a three-dimensional model of a person in a virtual space and generating the desired images. As an apparatus of this kind, an apparatus is conventionally known that generates a desired three-dimensional model by pasting an image of a two-dimensional model onto a three-dimensional basic model (see Patent Document 1).

However, this conventional apparatus merely changes the posture of the three-dimensional model by moving specific parts of the two-dimensional model. It cannot generate the variety of images corresponding to the noise described above, and the difference from images acquired in real space tends to be large. As a result, accurate images cannot be acquired, the patterns of obtainable images are limited, and it is difficult to greatly increase their number.

Japanese Patent Laid-Open No. H11-175765

The present invention has been made in view of these conventional problems. Its object is to generate, easily and in enormous numbers in a virtual space, images that are difficult to acquire in real space, based on a small amount of data acquired in real space, and to obtain images comparable to those acquired in real space. Another object is to construct a highly versatile recognition database using only data acquired in the virtual space, by calculating predetermined image features from these images and performing machine learning on them.

The invention of claim 1 is an image processing apparatus comprising: means for acquiring three-dimensional shape data of a person; means for applying the three-dimensional shape data to a joint model of a person in a virtual space to create a human body model in the virtual space; means for moving each joint of the human body model in accordance with the movement of each joint of the joint model to change the human body model; means for adding noise corresponding to real-space differences to the human body model; means for generating an image from the human body model; and means for storing the image generated from the human body model.
The invention of claim 2 is the image processing apparatus according to claim 1, comprising: means for acquiring displacement information for each joint of the joint model according to the movement and posture of a person; and means for changing the joint model based on the displacement information for each joint.
The invention of claim 3 is the image processing apparatus according to claim 1 or 2, comprising means for setting photographing conditions for the image acquired from the human body model, wherein the means for generating an image from the human body model generates, from the human body model, an image based on the photographing conditions.
The invention of claim 4 is the image processing apparatus according to any one of claims 1 to 3, wherein the means for adding noise includes means for adding noise of at least one of a person's posture, motion, and physique to the human body model to generate individual differences.
The invention of claim 5 is the image processing apparatus according to any one of claims 1 to 4, wherein the means for adding noise includes means for adding noise of at least one of clothing, lighting, and a shielding object to the human body model.
The invention of claim 6 is the image processing apparatus according to any one of claims 1 to 5, comprising: means for calculating image features of a person in an image based on the image generated from the human body model; and means for machine-learning the image features to construct a recognition database for recognizing a person in an image.
The invention of claim 7 is a program for causing a computer to realize each means of the image processing apparatus according to any one of claims 1 to 6.

According to the present invention, an enormous number of images that are difficult to acquire in real space can be generated easily in a virtual space, based on a small amount of data acquired in real space. Moreover, because noise corresponding to preset real-space differences is added to these images with respect to human shape, posture, motion, lighting, and so on, the images obtained are comparable to images acquired in real space. By calculating predetermined image features from these images and performing machine learning on them, a highly versatile recognition database can be constructed using only data acquired in the virtual space.

FIG. 1 is a block diagram schematically showing the configuration of the image processing apparatus of this embodiment.
FIG. 2 is a schematic diagram showing a human joint model.
FIG. 3 is a diagram showing a luminance pattern used to generate differences in clothing pattern.
FIG. 4 is a flowchart showing the procedure of image processing by the image processing apparatus of this embodiment.

An embodiment of the image processing apparatus of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram schematically showing the configuration of the image processing apparatus of this embodiment.

The image processing apparatus 1 includes, for example, a computer having a central processing unit (CPU), a ROM (Read Only Memory) that stores various programs, and a RAM (Random Access Memory) that temporarily stores data being processed by the CPU. The image processing apparatus 1 also includes an image processing unit 10; a noise generation function group 20 and an image feature calculation function group 40, each consisting of various functions stored in advance; and an image memory 30 that stores images. The image processing apparatus 1 further includes a three-dimensional human shape database 50 and a motion capture database 51, in which the various data used for image processing are stored in advance.

In the image processing apparatus 1, a person in a predetermined posture is first photographed from the front and back, left and right, above and below, and so on, using a distance image sensor such as a stereo camera, to acquire images (distance images) of the person viewed from all directions. Based on the distance data of each acquired image, the surface of the person is measured three-dimensionally in a mesh at, for example, a predetermined vertical and horizontal pitch, and the overall three-dimensional surface shape of the person is calculated as polygon data connecting the measurement points. The shape data thus obtained is stored in the three-dimensional shape database 50. The image processing apparatus 1 then applies the three-dimensional shape data of the person created and acquired in this way to a joint model of a person in the virtual space to create a human body model in the virtual space.

FIG. 2 is a schematic diagram showing a human joint model.
The joint model 70 represents a person by connecting a plurality of joints. As shown in the figure, it consists of twelve joints 71, 72, 73 connected in sequence according to the joint positions of a person, and four end effectors 74 located at the tips of the hands and feet. Of the joints 71, 72, 73, the ten joints 71, 72 are set to be rotatable, and the four joints 72 are set to have fixed positions within the joint model 70.
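As a rough illustration of how such a joint model might be held in memory, the following Python sketch stores one record per joint. The class and field names are hypothetical, and the per-numeral counts (six joints 71, four joints 72, two joints 73) are only inferred from the totals stated above (twelve joints, ten rotatable joints 71 and 72, four position-fixed joints 72); the connection topology of FIG. 2 is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Joint:
    """One joint of the joint model (field names are hypothetical)."""
    numeral: int          # reference numeral used in the text: 71, 72 or 73
    rotatable: bool       # True for joints 71 and 72
    position_fixed: bool  # True for joints 72 (fixed within the model)
    angle_deg: float = 0.0


def build_joint_model():
    # Counts are inferred from the text, not read from FIG. 2.
    joints = []
    joints += [Joint(71, True, False) for _ in range(6)]
    joints += [Joint(72, True, True) for _ in range(4)]
    joints += [Joint(73, False, False) for _ in range(2)]
    end_effectors = 4  # tips of the hands and feet (numeral 74)
    return joints, end_effectors


joints, end_effectors = build_joint_model()
print(len(joints), sum(j.rotatable for j in joints), end_effectors)
```

Displacement information read from a motion capture database would then update `angle_deg` (and, for joints whose position is not fixed, a position field) frame by frame.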

Here, the motion capture database 51 stores three-dimensional human movement and posture data acquired by a motion capture system that digitally records human movements and postures. For example, a person wearing markers performs predetermined motions, the markers are detected continuously by a tracker, and one or more sets of time-series data are acquired and stored for each motion and posture. In this way, the change in each part of the joint model 70 is acquired, and the rotation angle, position, and displacement of each of the joints 71, 72, 73 of the joint model 70 according to the person's movement and posture (collectively referred to here as displacement information) are stored in order as time-series data to build the database.
The distance image of the person to be fitted to the joint model 70 is preferably one of a person in a posture matching the posture of the joint model 70. The joint model 70 is only an example, and any joint model can be used.

The image processing apparatus 1 displays an image (human body model) based on the three-dimensional shape data of the person on display means (not shown). On this displayed image, the user designates, by operating a mouse or the like, the positions corresponding to the joints 71, 72, 73 and end effectors 74 of the joint model 70, so that positions in the image of the human body model representing the three-dimensional shape data are associated with positions in the joint model 70. The image processing apparatus 1 thereby fits the three-dimensional shape data of the human body model to the joint model 70, moves each joint of the created human body model in accordance with the movement of each joint of the joint model 70, and changes the human body model to follow the joint model 70. By changing the position and orientation of the human body model and the shapes of its parts in this way, a human body model and three-dimensional shape data in a predetermined posture or motion are created.

The image processing apparatus 1 reads from the motion capture database 51 described above the displacement information for each of the joints 71, 72, 73 of the joint model 70 corresponding to a specific motion or posture of a person and, based on this displacement information, moves the joints 71, 72, 73 by changing their rotation angles, positions, and so on, thereby changing the joint model 70. At the same time, the noise generation processing unit 11 of the image processing unit 10 changes the three-dimensional shape data (human body model) of the person, so that various human movements and the like are reproduced by the human body model from a predetermined, small amount of three-dimensional shape data.

That is, in real space, physique differs from person to person: some people are thin and some are heavy, some are tall and some are short. Shape also varies depending on whether a person is heavily or lightly dressed, and motion differs between individuals, some moving slowly and others quickly. The image processing apparatus 1 therefore reproduces such real-space differences, including differences in physique, shape, and motion as well as differences in lighting and environment, by having the noise generation processing unit 11 add noise. Based on the noise models 21 to 26 set in the noise generation function group 20, the noise generation processing unit 11 adds noise corresponding to real-space differences in human shape, motion, physique, lighting, and so on, and changes the three-dimensional shape data of the person and the human body model. Examples of the various kinds of noise added by the noise generation processing unit 11 are described below. These are examples only, and the approach can be extended to any noise model for faithfully reproducing real space.

The noise generation processing unit 11 adds physique noise to the human body model (three-dimensional shape data) based on the human physique noise model 23 and the human shape noise model 24, thereby generating individual differences in physique. Specifically, it varies the volumes of the head, torso, arms, and legs of the human body model; the height and the lengths of the head, torso, arms, legs, and shoulder width; and the circumferences of the head, chest, abdomen, buttocks, arms, and legs. The standard Japanese size is taken as the central value of each quantity. An upper limit and a lower limit are set arbitrarily for each quantity, such that the difference between the upper limit and the central value equals the difference between the lower limit and the central value. The probability of occurrence of each value follows a normal distribution in which the upper and lower limits correspond to ±3σ. (Each value is varied by a predetermined increment at a time, for example 0.1σ. The same applies hereinafter.) The standard Japanese size is applied to each quantity here, but a worldwide standard size or a standard size for a particular country or region may be applied instead. A normal distribution is assumed as the probability density function of the variation of each quantity, but it can be replaced with statistical data on human physique differences established medically, drawing on knowledge from the field of orthopedic medicine.
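The physique rule above can be sketched as a small enumeration: values step from the lower limit to the upper limit in increments of 0.1σ, and each carries a normal-distribution weight. This is a minimal Python sketch under stated assumptions; the function name, the normalization of the weights, and the example numbers (a central height of 170.0 with limits ±15.0) are illustrative and do not come from the source.

```python
import math


def physique_values(center, half_range, step_sigma=0.1):
    """Enumerate candidate values from -3 sigma to +3 sigma.

    half_range is the distance from the central value to either limit,
    so sigma = half_range / 3 (the limits correspond to +/-3 sigma).
    Returns (value, probability) pairs; normalizing the weights so they
    sum to 1 is an assumption of this sketch.
    """
    sigma = half_range / 3.0
    steps = round(3.0 / step_sigma)
    pairs = []
    for k in range(-steps, steps + 1):
        z = k * step_sigma                  # offset in units of sigma
        weight = math.exp(-0.5 * z * z)     # unnormalized normal density
        pairs.append((center + z * sigma, weight))
    total = sum(w for _, w in pairs)
    return [(value, w / total) for value, w in pairs]


# Hypothetical example: height centered on 170.0 with limits of +/-15.0.
table = physique_values(170.0, 15.0)
print(len(table), table[0][0], table[-1][0])
```

Each body dimension (volumes, lengths, circumferences) would get its own table, and human body models would be generated by picking values according to these probabilities.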

The noise generation processing unit 11 also adds clothing noise to the human body model based on the other noise model 24, thereby generating differences in clothing. Differences in clothing include differences in thickness, such as light or heavy clothing, and differences in pattern. As for differences in pattern, it suffices that a sufficiently high correlation value can be obtained when template matching is performed on the luminance image generated by projecting the human body model of the three-dimensional shape data onto the image plane. Here, a predetermined luminance pattern 80 (see FIG. 3) is first tiled and projected in a grid over the area of the human body model surface where clothing is worn, and white noise within ±33% of the luminance gradation range is added to the luminance pattern 80 to generate differences in clothing pattern. That is, as the luminance gradation changes, for example, the white portion 82 of the luminance pattern 80 is gradually darkened while the black portion 81 is left unchanged, or the black portion 81 is gradually lightened while the white portion 82 is left unchanged, so that the overall color changes gradually between white and black and differences in pattern are generated.
Differences in the shape and thickness of clothing, on the other hand, are generated in the same way as differences in physique. However, the thickness of clothing varies most at the torso and arms and only slightly at the head and legs. In accordance with this rule, appropriate noise is added to the circumferences of the head, chest, abdomen, and buttocks, and to the height, shoulder width, and lengths of the torso, arms, and legs, to change the three-dimensional shape data and generate differences in clothing thickness on the human body model.
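The pattern-noise step can be illustrated as follows. This Python sketch adds noise within ±33% of the gradation range to one tile of a black-and-white luminance pattern. The 8-bit gradation (256 levels), the uniform distribution of the noise, and the small checker tile standing in for luminance pattern 80 are all assumptions of the sketch, since the source specifies only "white noise" within ±33%.

```python
import random


def add_pattern_noise(pattern, rng, levels=256, fraction=0.33):
    """Add per-pixel noise within +/-fraction of the gradation range.

    pattern is a list of rows of integer luminance values. The result
    is clipped to the valid range [0, levels - 1].
    """
    top = levels - 1
    amplitude = fraction * top
    noisy = []
    for row in pattern:
        noisy.append([
            min(top, max(0, round(p + rng.uniform(-amplitude, amplitude))))
            for p in row
        ])
    return noisy


# Hypothetical 4x4 tile: 0 stands for a black portion, 255 for a white portion.
tile = [[0, 0, 255, 255],
        [0, 0, 255, 255],
        [255, 255, 0, 0],
        [255, 255, 0, 0]]
noisy_tile = add_pattern_noise(tile, random.Random(0))
print(noisy_tile[0])
```

Because of the clipping, black pixels can only become lighter and white pixels can only become darker, which matches the description of the pattern drifting between white and black.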

The noise generation processing unit 11 adds posture noise to the human body model based on the human posture noise model 22, thereby generating individual differences in posture. For each of the joints 71, 72, 73 of the joint model 70 (see FIG. 2), a plurality of joint angles for the same posture are acquired from the motion capture database 51, and their mean and standard deviation are obtained. The probability of occurrence of a joint angle is assumed to follow a normal distribution, and noise is added to each joint angle within a range of ±3σ about the mean to change the three-dimensional shape data, generating individual differences in posture and creating the human body model. When a sufficiently large number of samples of the same posture can be obtained from the motion capture database 51, the optimal joint angle variation obtained by statistically analyzing the variation of each joint angle for each posture may be applied instead. Alternatively, drawing on knowledge from rehabilitation medicine, the range of joint angle variation may be set with reference to data on the range of motion of each joint and the physical load placed on the joints in each posture.
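The posture-noise procedure for a single joint can be sketched in Python as follows: estimate the mean and standard deviation from several recordings of the same posture, then draw angles from a normal distribution restricted to ±3σ. The function names and sample angles are hypothetical, and two details are assumptions of the sketch: the use of the sample standard deviation, and rejection sampling as the way to enforce the ±3σ range.

```python
import random
import statistics


def joint_angle_stats(samples_deg):
    """Mean and sample standard deviation of one joint's angle."""
    return statistics.mean(samples_deg), statistics.stdev(samples_deg)


def sample_joint_angle(mean_deg, sd_deg, rng):
    """Draw an angle from N(mean, sd), redrawing until within +/-3 sd."""
    while True:
        angle = rng.gauss(mean_deg, sd_deg)
        if abs(angle - mean_deg) <= 3.0 * sd_deg:
            return angle


# Hypothetical elbow angles (degrees) for the same posture from three recordings.
mean_deg, sd_deg = joint_angle_stats([88.0, 90.0, 92.0])
rng = random.Random(0)
variants = [sample_joint_angle(mean_deg, sd_deg, rng) for _ in range(5)]
print(mean_deg, sd_deg, variants)
```

Repeating this for every joint yields one perturbed pose; each perturbed pose then drives the human body model to produce a posture variant.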

The noise generation processing unit 11 adds motion noise to the human body model based on the human motion noise model 21, thereby generating individual differences in motion. Specifically, for example, for each of the joints 71, 72, 73 of the joint model 70, the time-series data of the joint angle is acquired from the motion capture database 51, and individual differences in motion are expressed by stretching or compressing this time-series data. The stretch factor of the joint angle time-series data is set within a range of 1/2 to 2 times the original data, and every stretch factor is assumed to occur with equal probability: the original data, the 1/2-times data, and the 2-times data are all equally likely. Individual differences in motion are expressed by applying noise to the joint angle time-series data according to these rules.
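The time-stretching rule can be sketched as resampling a joint-angle series by a factor drawn uniformly from 1/2 to 2. In this Python sketch the function names are hypothetical, and linear interpolation and a uniform draw on a linear scale are assumptions; the source states only that all stretch factors in the range are equally likely.

```python
import random


def random_stretch_factor(rng):
    """Uniform stretch factor in [0.5, 2.0] (linear scale assumed)."""
    return rng.uniform(0.5, 2.0)


def stretch_series(angles, factor):
    """Resample a joint-angle time series to about factor times its duration.

    Uses linear interpolation between neighboring samples; the series
    must contain at least two samples.
    """
    last = len(angles) - 1
    n_out = max(2, round(last * factor) + 1)
    out = []
    for i in range(n_out):
        t = i * last / (n_out - 1)      # position in the original series
        lo = int(t)
        hi = min(lo + 1, last)
        frac = t - lo
        out.append(angles[lo] * (1.0 - frac) + angles[hi] * frac)
    return out


series = [0.0, 10.0, 20.0, 30.0, 40.0]
print(stretch_series(series, 2.0))   # slower motion: more frames
print(stretch_series(series, 0.5))   # faster motion: fewer frames
```

Applying the same factor to every joint of the model keeps the joints synchronized while changing the overall speed of the motion.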

When it is assumed that distance images will be generated by a stereo camera in real space, the noise generation processing unit 11 reproduces the measurement error of the distance image by applying a preset error model to the stereo matching positions of the image. In the virtual experiment environment, the exact correspondence between corresponding points of the stereo images output from the configured stereo camera is known from the settings of the camera specification and viewpoint position setting processing unit 12 described later. An error is added within a circle with a radius of 3 pixels centered on each such stereo corresponding point. The probability density function with which the error-containing stereo measurement points are distributed about the exact stereo corresponding point is assumed to follow a normal distribution, and the measurement error is reproduced by applying a noise model in which ±3 pixels about the exact stereo corresponding point corresponds to ±3σ (standard deviation = 1 pixel).
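The matching-error model above can be illustrated by perturbing an exact stereo correspondence with a Gaussian offset (σ = 1 pixel) confined to a circle of radius 3 pixels. The Python sketch below assumes an isotropic two-dimensional offset and uses rejection sampling to enforce the circle; the source does not say whether the error applies in both image axes or only along the disparity direction, and the function name is hypothetical.

```python
import math
import random


def perturb_match(u, v, rng, sigma_px=1.0, radius_px=3.0):
    """Return (u, v) shifted by a Gaussian error limited to a circle.

    Offsets are drawn independently per axis from N(0, sigma_px) and
    redrawn until they fall inside the circle of radius radius_px.
    """
    while True:
        du = rng.gauss(0.0, sigma_px)
        dv = rng.gauss(0.0, sigma_px)
        if math.hypot(du, dv) <= radius_px:
            return u + du, v + dv


rng = random.Random(0)
print(perturb_match(100.0, 50.0, rng))
```

The perturbed match position would then be used in place of the exact one when triangulating distance, so that the synthetic distance image carries an error similar to that of a real stereo camera.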

In addition, in real space an occluding object may be present between the subject and the camera. In this embodiment, the noise generation processing unit 11 reproduces this occlusion by placing one or more rectangular parallelepipeds of appropriate size at appropriate positions between the subject (human body model) and the camera in the virtual space. In real space there may also be a shielding object between the light source and the subject that casts a shadow on the subject, or light sources of various colors may be mixed and fall on the subject. In this embodiment these phenomena are treated as illumination noise, and the noise generation processing unit 11 reproduces them by projecting a preset illumination noise model 25 onto the subject in the virtual space. Specifically, illumination noise is generated by adding to the luminance image luminance noise produced by varying the amplitudes of sine waves of multiple frequencies within ±3.3% of the gradation range of the luminance image. Illumination noise of various colors, not just of luminance, is generated in the same way. In this manner, the noise generation processing unit 11 adds shielding-object noise or illumination noise to the human body model.
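The illumination-noise rule can be sketched as a sum of sinusoids whose amplitudes are drawn within ±3.3% of the gradation range and added to a luminance image. In this Python sketch the spatial frequencies, the random phases, the 8-bit gradation, and the division of each amplitude by the number of components (so that the summed noise also stays within ±3.3%) are all assumptions; the source specifies only multi-frequency sine waves with amplitudes varied within ±3.3%.

```python
import math
import random


def illumination_noise(width, height, freqs, rng, levels=256, fraction=0.033):
    """Build a noise image as a sum of 2-D sine waves.

    freqs is a list of (fx, fy) spatial frequencies in cycles per pixel
    (hypothetical values chosen by the caller).
    """
    amp_max = fraction * (levels - 1)
    comps = [
        (rng.uniform(-amp_max, amp_max) / len(freqs), fx, fy,
         rng.uniform(0.0, 2.0 * math.pi))
        for fx, fy in freqs
    ]
    return [
        [sum(a * math.sin(2.0 * math.pi * (fx * x + fy * y) + ph)
             for a, fx, fy, ph in comps)
         for x in range(width)]
        for y in range(height)
    ]


def apply_noise(luma, noise, levels=256):
    """Add the noise to a luminance image and clip to the valid range."""
    top = levels - 1
    return [[min(top, max(0, round(p + n))) for p, n in zip(prow, nrow)]
            for prow, nrow in zip(luma, noise)]


rng = random.Random(0)
noise = illumination_noise(16, 8, [(0.05, 0.0), (0.0, 0.1), (0.2, 0.15)], rng)
image = apply_noise([[128] * 16 for _ in range(8)], noise)
print(image[0])
```

Colored illumination noise could be produced the same way by generating an independent noise image per color channel.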

Thereafter, the image processing apparatus 1 sequentially generates images corresponding to real-space photographs from the three-dimensional shape data (human body model) produced by adding noise as described above, and constructs the recognition database 54. First, the camera specification and viewpoint position setting processing unit 12 of the image processing apparatus 1 sets the photographing conditions, such as the camera and viewpoint position, for the images to be acquired from the human body model. Specifically, for example, it sets the sensor values output by the camera (luminance image, distance image), its focal length, image plane size, and image plane resolution, the installation state of the camera (position, height, roll angle, pan angle, tilt angle, and so on), and its position relative to the subject. A plurality of cameras may be set. Next, the image generation processing unit 13 generates images of luminance, distance, and so on based on the three-dimensional shape data (human body model). In doing so, based on the photographing conditions set by the camera specification and viewpoint position setting processing unit 12, and in accordance with the camera specification and viewpoint position settings, the distance or luminance to be acquired from the human body model is projected onto the image plane by central projection, generating an image based on the photographing conditions. A pinhole camera model is adopted here as the camera model, but other camera models, such as a fisheye camera model, may also be used. In this way, the image generation processing unit 13 generates images from the human body model and stores each image in the corresponding folder of the image memory 30.
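Central projection with a pinhole camera model can be sketched as follows. This Python sketch projects three-dimensional points given in camera coordinates onto the image plane and keeps the nearest depth per pixel to form a distance image. It is a simplification under stated assumptions: only individual points (for example polygon vertices) are projected rather than rasterized polygons, the focal length is expressed in pixels, the principal point is taken as the image center, and the function names are hypothetical.

```python
def project_point(point, focal_px, cx, cy):
    """Project a camera-frame point (x, y, z) to pixel coordinates.

    Returns (u, v, z), or None when the point is not in front of the camera.
    """
    x, y, z = point
    if z <= 0:
        return None
    u = focal_px * x / z + cx
    v = focal_px * y / z + cy
    return round(u), round(v), z


def render_distance_image(points, width, height, focal_px):
    """Render a distance image; None marks pixels with no measurement."""
    cx, cy = width / 2, height / 2
    image = [[None] * width for _ in range(height)]
    for point in points:
        hit = project_point(point, focal_px, cx, cy)
        if hit is None:
            continue
        u, v, z = hit
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest surface seen through this pixel.
            if image[v][u] is None or z < image[v][u]:
                image[v][u] = z
    return image


points = [(0.0, 0.0, 2.0), (0.0, 0.0, 3.0), (0.1, 0.0, 2.0)]
depth = render_distance_image(points, 64, 48, focal_px=100.0)
print(depth[24][32], depth[24][37])
```

A luminance image could be produced with the same projection by storing the surface luminance of the nearest point instead of its depth; camera position and pan, tilt, and roll angles would enter as a rigid transform applied to the points before projection.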

The image feature calculation processing unit 14 of the image processing apparatus 1 also calculates, from each image generated from the human body model, the image features of the person in the image that are needed for each kind of human recognition processing, such as the extent and position of the person in the image and the person's posture and motion. The image features calculated are the various image features preset in the image feature calculation function group 40. The machine learning unit 15 of the image processing apparatus 1 then performs machine learning on the calculated image features to construct a recognition database, which is stored as the recognition database 54 used for recognizing people.

At this time, for example: (1) image features for extracting the human region in an image are calculated, and human-likeness is machine-learned from those features; (2) image features for estimating a person's posture are calculated, and the rotation angle, position, and so on of each joint are machine-learned from those features; (3) image features for determining a person's motion are calculated, and the motion is machine-learned from those features. Examples of the image feature calculation and recognition database construction for (1), (2), and (3) are described below. This embodiment shows an example that uses distance images, but the same approach is applicable to luminance images and binarized images.

(1) Construction of the human-region extraction database.
Human-region extraction uses, as features of human-likeness, the size, length, and circularity of an object obtained from the distance image. A recognition database is constructed by supervised learning with a multilayer perceptron that classifies these image features as human or non-human. The method of calculating size, length, and circularity is not particularly limited, and existing image processing techniques can be used. For example, the size can be obtained by integrating the pixel values over the region the object occupies in the distance image. The length can be obtained simply by dividing that size by the number of pixels the object occupies in the distance image. The circularity can be obtained by applying a circular Hough transform to the outermost edge line of the object. These human-likeness features are only examples, and other features can also be used. The feature calculation methods described above can likewise be implemented using existing image processing techniques.
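The size and length features described above reduce to a sum and a division. The sketch below assumes the object's region is already available as a boolean mask over the distance image; the function name `size_and_length` and the mask input are hypothetical, and the circularity feature (circular Hough transform) is left out.

```python
import numpy as np

def size_and_length(depth, mask):
    """Return (size, length) of an object in a distance image.

    size   : sum of the distance-image pixel values over the object's region
    length : size divided by the number of pixels the object occupies
    The boolean `mask` marking the object's region is an assumed input.
    """
    depth = np.asarray(depth, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    n_pixels = int(mask.sum())
    if n_pixels == 0:
        return 0.0, 0.0  # no object present in the image
    size = float(depth[mask].sum())
    return size, size / n_pixels
```

The resulting values, together with a circularity score, would form the input vector of the human/non-human classifier.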

(2) Construction of the human posture recognition database.
Posture recognition uses the size and length of the object in each local region obtained by dividing the distance image into a grid of small regions. With the size and length of the object in each local region of the distance image as input, a recognition database is constructed by supervised learning of each joint angle with a multilayer perceptron. The size and length of the object in each local region of the distance image are calculated in the same way as described in (1).
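The grid-based local features can be sketched as below, assuming that size is the sum of pixel values over the object's pixels in a cell and length is that sum divided by the pixel count, as in (1). The function name `grid_features`, the mask input, and the row-major ordering of the feature vector are assumptions made for the example.

```python
import numpy as np

def grid_features(depth, mask, rows, cols):
    """Per-cell (size, length) features over a rows x cols grid of the distance image.

    Returns a flat vector [size_00, length_00, size_01, length_01, ...].
    Cells that contain no object pixels contribute (0, 0).
    """
    depth = np.asarray(depth, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    h, w = depth.shape
    row_edges = np.linspace(0, h, rows + 1).astype(int)
    col_edges = np.linspace(0, w, cols + 1).astype(int)

    feats = []
    for i in range(rows):
        for j in range(cols):
            cell = (slice(row_edges[i], row_edges[i + 1]),
                    slice(col_edges[j], col_edges[j + 1]))
            cell_mask = mask[cell]
            n_pixels = int(cell_mask.sum())
            size = float(depth[cell][cell_mask].sum())
            length = size / n_pixels if n_pixels else 0.0
            feats.extend([size, length])
    return np.array(feats)
```

A vector of this kind, paired with the joint angles that were used to pose the human body model, would serve as one supervised training example for the regressor.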
(3) Construction of the human motion recognition database.
The motion recognition database is constructed by learning, with an HMM (Hidden Markov Model) algorithm, the time-series data of each joint angle and displacement estimated using the posture recognition database constructed in (2).
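Once one HMM per motion has been trained, a new sequence can be classified by comparing its likelihood under each model. The sketch below shows only that scoring step, using the scaled forward algorithm for a discrete HMM. It assumes the joint-angle time series has already been quantized into integer symbols and that the model parameters are given; the training step (for example Baum-Welch) is omitted, and the function name and parameter layout are hypothetical.

```python
import numpy as np

def hmm_log_likelihood(obs, start_prob, trans_prob, emit_prob):
    """Log P(obs | model) for a discrete HMM, via the scaled forward algorithm.

    obs        : sequence of integer observation symbols (assumed quantized joint data)
    start_prob : (S,) initial state probabilities
    trans_prob : (S, S) transitions, trans_prob[i, j] = P(state j | state i)
    emit_prob  : (S, K) emissions, emit_prob[i, k] = P(symbol k | state i)
    """
    start_prob = np.asarray(start_prob, dtype=float)
    trans_prob = np.asarray(trans_prob, dtype=float)
    emit_prob = np.asarray(emit_prob, dtype=float)

    alpha = start_prob * emit_prob[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ trans_prob) * emit_prob[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf  # sequence is impossible under this model
        log_lik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid underflow on long sequences
    return log_lik
```

A motion label would then be chosen as the model with the highest log-likelihood for the observed sequence.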

The image processing apparatus 1 performs machine learning on the image features calculated in this way and constructs the recognition database 54 for recognizing a person in an image. Although an example of calculating image features from distance images is shown here, the approach is also applicable to luminance images and binarized images. Nor are the image features limited to the predetermined ones: new image feature calculation functions can be added and applied as needed.

Next, the procedure of image processing by the image processing apparatus 1 will be described.
FIG. 4 is a flowchart showing the procedure of image processing by the image processing apparatus 1.
As shown in the figure, the image processing apparatus 1 acquires human three-dimensional shape data and motion capture data from the databases 50 and 51 (S101), and first fits the joint model 70 to the three-dimensional shape data to create a human body model (S102). Next, based on the information read from the motion capture database 51 on the rotation angle, position, and so on of a person's joints during a specific motion or posture (displacement information for each joint), it changes the joint model 70 to move each joint of the human body model, thereby changing the human body model (S103). The noise generation processing unit 11 then adds to the human body model, as described above, noise representing differences found in the real space, for example noise relating to posture, motion, physique, clothing pattern, clothing shape and thickness, lighting, and occluding objects, to generate individual differences and the like (S104), producing each set of three-dimensional shape data (human body model).

Subsequently, the shooting conditions, such as the camera, for the images to be acquired from the human body model are set (S105); based on those settings, images are generated from the human body model to which noise and the like have been added (S106), and the generated images are stored in the image memory 30. Next, the image feature calculation processing unit 14 calculates image features from each generated image (S107), and the machine learning unit 15 performs machine learning on them (S108). The image processing apparatus 1 repeats the above procedure (S101 to S108), generating images while varying the person's motion and posture, the noise, the shooting conditions, and so on, and sequentially machine-learning their image features, to construct the recognition database and the image recognition system needed to recognize people in actual images.
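The repetition of steps S102 to S107 over varying motions, noise settings, and shooting conditions can be sketched as a loop over all combinations. This is only an outline under stated assumptions: the function `build_training_set` and its callable parameters (`pose_model`, `add_noise`, `render`, `extract_features`) are hypothetical stand-ins for the processing units, and the learner of S108 is not shown.

```python
from itertools import product

def build_training_set(motions, noise_settings, cameras,
                       pose_model, add_noise, render, extract_features):
    """Generate (features, motion label) pairs for every combination of conditions.

    The four callables are placeholders supplied by the caller:
      pose_model(motion)        -> posed human body model   (S102, S103)
      add_noise(body, noise)    -> body model with noise    (S104)
      render(body, camera)      -> image                    (S105, S106)
      extract_features(image)   -> feature vector           (S107)
    """
    samples = []
    for motion, noise, camera in product(motions, noise_settings, cameras):
        body = pose_model(motion)
        body = add_noise(body, noise)
        image = render(body, camera)
        samples.append((extract_features(image), motion))
    return samples  # would be passed to the machine learning step (S108)
```

Because the sample count is the product of the three list lengths, a small number of captured motions yields a large training set once noise settings and camera configurations are varied.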

As described above, in this embodiment, projecting a small amount of data on human shape, posture, motion, and so on acquired in the real space onto a three-dimensional virtual space makes it possible to generate and acquire, easily and in very large numbers, images that are difficult to acquire in the real space (for example, images captured from multiple viewpoints). In addition, by applying to the human shape, posture, and motion in the virtual space a noise model obtained, for example, statistically through experiments, a very large amount of realistic three-dimensional shape data can be generated that reflects the differences in human shape, posture, motion, and so on found in the real space. Likewise, for shadows, ambient light, and lighting, reproducing a preset illumination model in the virtual space makes it possible to generate a very large amount of three-dimensional shape data in which the lighting effects are closer to those of the real space.

Furthermore, as described above, the very large amount of three-dimensional shape data acquired can be converted into many kinds of images, such as multi-viewpoint distance images, single-viewpoint distance images, multi-viewpoint luminance images, single-viewpoint luminance images, multi-viewpoint binary images, and single-viewpoint binary images, and the resulting images compare favorably with images acquired in the real space. By machine-learning predetermined image features calculated from the images acquired in this way to construct a recognition database, a highly versatile recognition database can be constructed using only data acquired in the virtual space. Applying this recognition database to the real space makes it possible to construct an image recognition system that is not affected by differences such as the type of imaging device. That is, whether the imaging device is a distance image sensor or a luminance image sensor, people, motions, and so on in the image can be recognized regardless of the difference. For example, by calculating image features from a two-dimensional luminance image from a luminance image sensor and retrieving the corresponding distance image from the three-dimensional shape data, a recognition result based on a higher-dimensional image can be obtained. Referring to that result enables more accurate recognition of human shape, posture, motion, and so on.

The present invention can also be realized as a program that causes a computer to implement each of the means of the image processing apparatus 1 described above.

Description of symbols: 1 ... image processing apparatus, 10 ... image processing unit, 11 ... noise generation processing unit, 12 ... camera specification and viewpoint position setting processing unit, 13 ... image generation processing unit, 14 ... image feature calculation processing unit, 15 ... machine learning unit, 20 ... noise generation function group, 30 ... image memory, 40 ... image feature calculation function group, 50 ... human three-dimensional shape database, 51 ... motion capture database, 54 ... recognition database, 70 ... human joint model.

Claims

An image processing apparatus comprising:
means for acquiring three-dimensional shape data of a person;
means for fitting the three-dimensional shape data to a human joint model in a virtual space to create a human body model in the virtual space;
means for moving each joint of the human body model in accordance with the movement of each joint of the joint model, thereby changing the human body model;
means for adding, to the human body model, noise corresponding to differences in the real space;
means for generating an image from the human body model; and
means for storing the image generated from the human body model.

The image processing apparatus according to claim 1, further comprising:
means for acquiring displacement information for each joint of the joint model according to the motion and posture of the person; and
means for changing the joint model based on the displacement information for each joint.

The image processing apparatus according to claim 1 or 2,
further comprising means for setting shooting conditions for the image to be acquired from the human body model,
wherein the means for generating an image from the human body model generates, from the human body model, an image based on the shooting conditions.

The image processing apparatus according to any one of claims 1 to 3,
wherein the means for adding noise includes means for adding noise relating to at least one of a person's posture, motion, and physique to the human body model to generate individual differences.

The image processing apparatus according to any one of claims 1 to 4,
wherein the means for adding noise includes means for adding noise relating to at least one of clothing, lighting, and occluding objects to the human body model.

The image processing apparatus according to any one of claims 1 to 5, further comprising:
means for calculating image features of a person in an image based on the image generated from the human body model; and
means for constructing, by machine learning of the image features, a recognition database for recognizing a person in an image.

A program for causing a computer to implement each means of the image processing apparatus according to any one of claims 1 to 6.