JP2018116589A

JP2018116589A - State identification apparatus, program, and method using changed image group of target images

Info

Publication number: JP2018116589A
Application number: JP2017008044A
Authority: JP
Inventors: 剣明呉; Jiangming Wu; 聿津湯; Tang Yujin; 矢崎　智基; Tomomoto Yazaki; 智基矢崎
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-01-20
Filing date: 2017-01-20
Publication date: 2018-07-26

Abstract

[Problem] To provide a device capable of identifying the state of an identification target with higher accuracy, even from images in which the target appears in various ways under various circumstances. [Solution] This state identification device, which uses an image of an identification target and identifies the state of the target by means of an identification model, has: image group generation means that applies, to one input image of the identification target, processing that changes the appearance of the identification target in the one image while maintaining its state, and generates an image group including the images generated by this processing; and identification processing means that uses the generated image group when constructing the identification model and/or when identifying the state using the identification model. The image group generation means preferably makes, for at least one type of appearance, at least a change of the one image toward the direction opposite to the direction in the one image. [Selected drawing] FIG. 1

Description

The present invention relates to a technique for identifying the state of a predetermined target based on an image of that target.

Various techniques have been devised for identifying the state of a predetermined target, for example a human facial expression, using an image of that target, for example a photograph of a face.

In the field of human facial expression recognition in particular, many researchers are working to improve recognition technology using, for example, a three-category model (positive, negative, neutral) or Paul Ekman's seven-category model (neutral, joy, disgust, anger, surprise, sadness, fear).

As one example of such work, Patent Document 1 discloses a technique that learns feature quantities from a large amount of face image data based on the above classification models and identifies facial expressions from those feature quantities. That technique aims in particular to efficiently collect training data of natural facial expressions, rather than deliberately posed ones, and to create a classifier with good recognition accuracy.

Patent Document 1: JP 2011-150381 A

However, with conventional techniques such as that of Patent Document 1, achieving identification accuracy high enough for real facial expression recognition environments requires, as training data, a large number of face images captured under various circumstances and in various shooting conditions.

If such training data cannot be obtained in large quantities and the classifier cannot sufficiently learn face images covering the shooting conditions found in real situations, facial expressions are often misjudged. As a result, identification accuracy drops and the classifier becomes impractical or of limited general use.

For example, even a classifier that has reached a certain level of accuracy on images of faces looking straight ahead often suffers a marked drop in accuracy when, in the face image to be identified, part of the face extends beyond the image frame, the face is turned up, down, left, or right, the head is strongly tilted, items such as a hat or sunglasses are worn, or the illuminance (brightness) of the face region is too low or too high.

At the same time, it is in practice extremely difficult to prepare, as training data, a large number of face images covering such varied circumstances and shooting conditions. Consequently, depending on the circumstances and conditions under which a face image was captured, expression identification is often carried out with low accuracy, and the expression is misidentified.

Accordingly, an object of the present invention is to provide a device, a program, and a method capable of identifying the state of an identification target with higher accuracy, even from images in which the target appears in various ways under various circumstances.

According to the present invention, there is provided a state identification device that uses an image of an identification target and identifies the state of the identification target by means of an identification model, the device comprising:
image group generation means for applying, to one input image of the identification target, processing that changes the appearance of the identification target in the one image while maintaining the state, and for generating an image group including the images generated by the processing; and
identification processing means for using the generated image group when constructing the identification model and/or when identifying the state using the identification model.

In one embodiment of the state identification device according to the present invention, the image group generation means preferably makes, for at least one type of appearance, at least a change of the one image toward the direction opposite to the direction in the one image.
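As a minimal sketch of an opposite-direction change, assume the image is a grayscale pixel grid and the appearance type is the left/right orientation of the face. Mirroring the image left to right then turns a face that looks left into one that looks right. The names `mirror_horizontally` and `make_image_group` are hypothetical and are not taken from the publication, which does not specify how the change is implemented.

```python
from typing import List

# Assumed representation: a grayscale image as rows of pixel values.
Image = List[List[int]]


def mirror_horizontally(image: Image) -> Image:
    """Return a left-right mirrored copy of the image.

    A face turned to the left in the input appears turned to the
    right in the output; the pixel content is otherwise unchanged.
    """
    return [list(reversed(row)) for row in image]


def make_image_group(image: Image) -> List[Image]:
    """Return the original image together with its mirrored variant."""
    return [image, mirror_horizontally(image)]


if __name__ == "__main__":
    sample = [[1, 2, 3],
              [4, 5, 6]]
    for member in make_image_group(sample):
        print(member)
```

Mirroring is only one possible way to obtain the opposite direction; it works here under the assumption that the expression label does not depend on which side of the face is shown.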

In another embodiment of the state identification device according to the present invention, the image group generation means preferably makes, for at least one type of appearance, at least a change of the one image toward a direction that is preset for that type and is suitable for identification.

In a further embodiment of the state identification device according to the present invention, the image group generation means preferably changes the one image with respect to two or more of the plural types of appearance in combination.

In a further embodiment of the state identification device according to the present invention, it is also preferable that the image group generation means changes the one image, for at least one type of appearance, toward a direction preset for that type, thereby generating an image group for constructing an identification model characterized by that direction, and
that the identification processing means constructs the identification model using the generated image group.

In a further embodiment of the state identification device according to the present invention, it is also preferable that the image group generation means changes the one image, for at least one type of appearance, toward all directions that the type can take, thereby generating an image group for constructing an identification model in which variation of the identification results among the directions of that type is suppressed, and
that the identification processing means constructs the identification model using the generated image group.

In a further embodiment of the state identification device according to the present invention, the image group generation means preferably applies, to one image associated with one state as its correct answer, a number of changes preset for that state with respect to at least one type of appearance, and thereby generates an image group with a predetermined number of samples.

In this embodiment with a preset number of changes, when the number of image samples associated with the one state is smaller than the number of image samples associated with another state, the image group generation means preferably applies a larger number of changes to the images associated with the one state than to the images associated with the other state.
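The per-state number of changes can be illustrated with a small calculation. The sketch below assumes that every state should reach roughly the same target number of samples and that each change yields exactly one additional image; the function name `changes_per_image` and the example counts are hypothetical and do not appear in the publication.

```python
import math
from typing import Dict


def changes_per_image(sample_counts: Dict[str, int],
                      target_per_state: int) -> Dict[str, int]:
    """Return, for each state, how many changed images to create per original.

    Each original contributes itself plus its changed copies, so a state
    with `count` originals needs ceil(target / count) - 1 changes per image
    to reach at least `target_per_state` samples.
    """
    plan: Dict[str, int] = {}
    for state, count in sample_counts.items():
        if count <= 0:
            raise ValueError(f"state {state!r} has no original images")
        plan[state] = max(0, math.ceil(target_per_state / count) - 1)
    return plan


if __name__ == "__main__":
    # Hypothetical counts: "fear" is under-represented relative to "joy".
    counts = {"joy": 500, "fear": 100}
    print(changes_per_image(counts, target_per_state=1000))
```

With these example counts the under-represented state receives more changes per image than the well-represented one, which is the relationship the embodiment describes.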

In a further embodiment of the state identification device according to the present invention, it is also preferable that the identification processing means uses the generated image group both when constructing the identification model and when identifying the state using the identification model, and
that, when generating the image group used for identifying the state, the image group generation means changes the one input image, for every type of appearance of the identification target, in the same direction as the change made for that type when generating the image group used for constructing the identification model, and makes no change for a type that was not changed when generating that image group.
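One way to keep the changes at identification time consistent with those used for model construction is to record each change as a pair of appearance type and direction and to filter the candidates against that record. This is only a sketch under that assumption; the `Change` tuple, the function `select_inference_changes`, and the example type names are hypothetical.

```python
from typing import List, Set, Tuple

# Assumed encoding of a change: (appearance type, direction).
Change = Tuple[str, str]


def select_inference_changes(training_changes: Set[Change],
                             candidate_changes: List[Change]) -> List[Change]:
    """Keep only candidates whose type and direction were used in training.

    Types that were never changed when building the model, and directions
    that differ from the training direction, are dropped.
    """
    return [change for change in candidate_changes
            if change in training_changes]


if __name__ == "__main__":
    used_in_training = {("orientation", "left"), ("brightness", "brighter")}
    candidates = [("orientation", "left"), ("orientation", "up"),
                  ("occlusion", "add_sunglasses"), ("brightness", "brighter")]
    print(select_inference_changes(used_in_training, candidates))
```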

In a further embodiment of the state identification device according to the present invention, the identification processing means preferably determines the state for each image in the generated image group using the identification model, takes a weighted average of the determined states using weights preset according to whether the image is the original image before change and/or weights preset according to the type changed in the image, and uses the weighted average as the identification result for the state.
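The weighted averaging can be sketched as follows, assuming that the identification model returns a score per state for each image and that the weight of each image has already been looked up from its origin (original or changed) and changed type. The function names and the example scores and weights are illustrative assumptions, not values from the publication.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # assumed model output: state name -> score


def weighted_average(predictions: List[Tuple[Scores, float]]) -> Scores:
    """Average per-image scores, each paired with its preset weight."""
    total_weight = sum(weight for _, weight in predictions)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    states = predictions[0][0].keys()
    return {
        state: sum(scores[state] * weight
                   for scores, weight in predictions) / total_weight
        for state in states
    }


def identify(predictions: List[Tuple[Scores, float]]) -> str:
    """Return the state with the highest weighted-average score."""
    averaged = weighted_average(predictions)
    return max(averaged, key=averaged.get)


if __name__ == "__main__":
    # Hypothetical weights: the original image counts double.
    group = [
        ({"joy": 0.4, "neutral": 0.6}, 2.0),  # original image
        ({"joy": 0.9, "neutral": 0.1}, 1.0),  # orientation changed
        ({"joy": 0.8, "neutral": 0.2}, 1.0),  # brightness changed
    ]
    print(weighted_average(group), identify(group))
```

In this example the original image alone would be judged "neutral", while the weighted result over the whole group favours "joy", which shows how the changed images can influence the final identification.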

In a further embodiment of the state identification device according to the present invention, the image group generation means preferably generates the image group so that it includes the one input image of the identification target itself.

In a further embodiment of the state identification device according to the present invention, it is also preferable that the state identification device further comprises degree determination means for extracting, from the one input image of the identification target, an image region of the identification target and for determining, based on that image region, the presence or absence and/or the degree of occurrence of at least one type of appearance, and
that the image group generation means makes the change, for the at least one type, by changing the determined presence or absence and/or the determined degree.

Specifically, in the state identification device according to the present invention, the image group generation means preferably changes, as the plural types of appearance, at least one of: the extent to which the identification target appears in the one image; the orientation or tilt in which the identification target appears; the presence or absence, or the kind, of anything occluding the identification target; the brightness of the identification target; the hue of the identification target; and the degree of contrast between the identification target and the background.
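Two of the listed appearance types, brightness and contrast, lend themselves to a short illustration. The sketch below assumes an 8-bit grayscale pixel grid and applies the change to the whole image; the publication speaks of the brightness of the target and the contrast between target and background, so a real implementation would presumably restrict or mask the operation accordingly. All function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # assumed: 8-bit grayscale pixel grid


def clamp(value: float) -> int:
    """Round and limit a pixel value to the 0..255 range."""
    return max(0, min(255, int(round(value))))


def change_brightness(image: Image, offset: int) -> Image:
    """Add a constant offset to every pixel (positive = brighter)."""
    return [[clamp(pixel + offset) for pixel in row] for row in image]


def change_contrast(image: Image, factor: float, pivot: int = 128) -> Image:
    """Scale each pixel's distance from a pivot value (factor > 1 = stronger)."""
    return [[clamp(pivot + (pixel - pivot) * factor) for pixel in row]
            for row in image]


if __name__ == "__main__":
    face = [[0, 100, 250],
            [118, 128, 138]]
    print(change_brightness(face, 10))
    print(change_contrast(face, 2.0))
```

Neither operation moves or reshapes facial features, which is why such changes can reasonably be assumed to leave the state (for example, the expression) intact.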

Specifically, in the state identification device according to the present invention, it is also preferable that the identification target is a face and the state is a facial expression,
that the image group generation means applies, to the one input image of the identification target, processing that changes the appearance of the face in the one image while maintaining the facial expression, and
that the identification processing means identifies the facial expression in the one input image using the identification model.

According to the present invention, there is also provided a state identification system that uses an image of an identification target and identifies the state of the identification target by means of an identification model, the system comprising:
image group generation means for applying, to one input image of the identification target, processing that changes the appearance of the identification target in the one image while maintaining the state, and for generating an image group including the images generated by the processing;
identification model construction means for constructing the identification model from the generated image group; and
identification processing means for identifying the state by means of the identification model, using the generated image group.

According to the present invention, there is further provided a state identification program that causes a computer, mounted on a device that uses an image of an identification target and identifies the state of the identification target by means of an identification model, to function as:
image group generation means for applying, to one input image of the identification target, processing that changes the appearance of the identification target in the one image while maintaining the state, and for generating an image group including the images generated by the processing; and
identification processing means for using the generated image group when constructing the identification model and/or when identifying the state using the identification model.

According to the present invention, there is still further provided a state identification method carried out by a computer mounted on a device that uses an image of an identification target and identifies the state of the identification target by means of an identification model, the method comprising:
a step of applying, to one input image of the identification target, processing that changes the appearance of the identification target in the one image while maintaining the state, and generating an image group including the images generated by the processing; and
a step of using the generated image group when constructing the identification model and/or when identifying the state using the identification model.

With the state identification device, program, and method of the present invention, the state of an identification target can be identified with higher accuracy, even from images in which the target appears in various ways under various circumstances.

FIG. 1 is a functional block diagram showing the functional configuration of one embodiment of the state identification device according to the present invention.
FIG. 2 is a flowchart outlining one embodiment of the state identification method according to the present invention.
FIG. 3 shows face image data for explaining examples of the appearance types judged by the appearance degree determination unit.
FIG. 4 is a flowchart showing one embodiment of the changed image creation and image group generation processing according to the present invention.
FIG. 5 is a schematic diagram showing one embodiment of the identification model constructed by the identification model construction unit.
FIG. 6 is a schematic diagram showing one embodiment of the state identification system according to the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

[State identification device]
FIG. 1 is a functional block diagram showing the functional configuration of one embodiment of the state identification device according to the present invention.

As shown in FIG. 1, the smartphone 1 serving as the state identification device of this embodiment has a built-in camera 102 of known configuration. Using the camera 102, it can, for example, photograph the user's face to generate a photographic image, identify the facial expression of the user shown in that image, and display the identification result on, for example, the touch panel display (TP/DP) 106. It can of course also acquire a photographic image of a face to be identified from outside over a communication network via the communication interface unit 101 and perform the expression identification processing on it.

As one application example, an application 121 of the smartphone 1, for example a dialogue AI application, can use this expression identification result to understand, for example, the emotion (utterance intention) of the user it is conversing with, and then adjust its response or personalize the dialogue with that user.

In this embodiment, the smartphone 1 preferably also acquires the learning images (photographs of human faces) used by the facial expression identification processing unit 114 from the image management server 2 via the communication interface unit 101. The image group generation unit 113 of the smartphone 1 then generates, from the acquired learning images, the learning image group used for learning. As a result, a sufficient number of learning images can be prepared without importing as many image samples from outside as was conventionally necessary.

The smartphone 1 as the state identification device according to the present invention is thus a device that uses an image of an identification target (for example, a human face) and identifies the state of the identification target (for example, a facial expression) by means of an identification model, and is characterized by having:
(A) image group generation means (image group generation unit 113) that applies, to one input image of the identification target, processing that changes the appearance of the identification target (for example, the face) in the one image while maintaining its state (for example, the expression), and generates an image group including the images generated by this processing; and
(B) identification processing means (facial expression identification processing unit 114) that uses the image group generated in (A) when constructing the identification model and/or when identifying the state (for example, the expression) using the identification model.

When an identification model is constructed, a large number of images of the identification target are normally required as training data in order to achieve identification accuracy adequate for practical use. It is also preferable that these include many images that differ from one another in how the identification target appears. In reality, however, it has been extremely difficult to prepare a large number of identification target images with varied appearances. Likewise, when a state is identified using an identification model, the result is affected by how the identification target appears in the image, and the state is often misidentified.

In contrast, the smartphone 1 having the above configurations (A) and (B) generates an image group that includes images produced by processing that changes the appearance of the identification target while maintaining its state. This makes it possible, for example, to reliably prepare a learning and/or identification image group consisting of a larger number of images in which the identification target appears in various ways. As a result, the state of the identification target can be identified with higher accuracy, even from images in which the target appears in various ways under various circumstances.

The state identification device according to the present invention embodied in the smartphone 1 is of course not limited to cases where the state to be identified is a human facial expression. According to the present invention, any state whose identification result is affected by how the target appears in an image representing that state can be identified with higher accuracy. For example, the sex, age, race (or species), clothing, and so on of a person or animal can be the state to be identified. The present invention can also be applied to the recognition of various objects such as furniture, daily necessities, and automobiles. Conventionally, the results of such state identification have contained large errors or been wrong depending on how the target appears in the image, whereas the present invention allows such states to be identified more accurately.

The state identification device according to the present invention embodied in the smartphone 1 is likewise not limited to a smartphone. For example, a tablet computer, a notebook computer, a personal computer (PC), a set-top box (STB), a robot, or digital signage can be adopted as the state identification device. Such devices (terminals) with a built-in camera can then read the user's facial expression even in varied real environments and, for example, respond according to the information on the expression read, or use that information to evaluate an action previously taken toward the user.

[State identification method]
FIG. 2 is a flowchart outlining one embodiment of the state identification method according to the present invention. In this embodiment, the identification target is a human face and the state of the identification target is the expression of that face. FIG. 2(A) shows the flow of the facial expression identification model construction processing (learning processing) using a learning image group according to the present invention, and FIG. 2(B) shows the flow of the facial expression identification processing using a determination image group according to the present invention. The model construction processing is outlined first, with reference to FIG. 2(A).

(S101) A learning image is acquired. This learning image serves as the original image from which a learning image group is subsequently generated to increase the number of image samples available for the learning processing.

(S102) A face image region is extracted from the acquired learning image.
(S103) In the extracted face image region, the presence or absence and/or the degree of occurrence of each preset appearance type is determined.

Here, the appearance types can include, for example, a preset type concerning face orientation that comprises one state in which the face looks straight ahead and four states in which the face is turned up, down, left, or right relative to the frontal state. The degree for this appearance type may be the angle between the direction the face is turned and the frontal direction, which serves as the reference (zero degrees). The appearance types are described in detail later with reference to FIG. 3.
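The five orientation states and the angular degree can be illustrated with a small classifier. The sketch assumes that a head-pose estimate is already available as yaw and pitch angles in degrees (positive yaw meaning right, positive pitch meaning up) and that a threshold separates "front" from the four turned states; the sign convention, the threshold value, and the function name `orientation_state` are all assumptions made for illustration.

```python
from typing import Tuple


def orientation_state(yaw_deg: float, pitch_deg: float,
                      threshold_deg: float = 10.0) -> Tuple[str, float]:
    """Map a head pose to one of five orientation states plus its degree.

    The degree is the larger of the two angle magnitudes, measured from
    the frontal direction (zero degrees).
    """
    if abs(yaw_deg) >= abs(pitch_deg):
        angle = abs(yaw_deg)
        direction = "right" if yaw_deg > 0 else "left"
    else:
        angle = abs(pitch_deg)
        direction = "up" if pitch_deg > 0 else "down"
    if angle < threshold_deg:
        direction = "front"
    return direction, angle


if __name__ == "__main__":
    for pose in [(25.0, 5.0), (-30.0, 10.0), (3.0, -40.0), (2.0, 3.0)]:
        print(pose, orientation_state(*pose))
```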

(S104, S105) Based on the determined presence or absence and/or degree of each appearance type in the acquired learning image, changed images are created, and a learning image group including the changed images is generated. The changed image creation processing and the learning image group generation processing are described in detail later with reference to FIG. 4.
(S106) A facial expression identification model is constructed using the generated learning image group. This completes the facial expression identification model construction processing.
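Steps S101 to S106 can be summarized as a short pipeline. In the sketch below, the face extraction, appearance judgement, change creation, and training steps are passed in as callables because the publication does not fix their implementation at this point; the function names `build_learning_group` and `construct_model` are hypothetical. Including the original face image in the group is also an assumption, based on the embodiment in which the group contains the input image.

```python
from typing import Any, Callable, Iterable, List, Tuple

Labelled = Tuple[Any, str]  # (image, expression label)


def build_learning_group(
    labelled_images: Iterable[Labelled],
    extract_face: Callable[[Any], Any],
    judge_appearance: Callable[[Any], Any],
    make_changes: Callable[[Any, Any], Iterable[Any]],
) -> List[Labelled]:
    """Expand labelled images into a learning image group (S101 to S105)."""
    group: List[Labelled] = []
    for image, state in labelled_images:               # S101: acquire
        face = extract_face(image)                     # S102: face region
        judged = judge_appearance(face)                # S103: presence/degree
        group.append((face, state))                    # keep the original
        for changed in make_changes(face, judged):     # S104: changed images
            group.append((changed, state))             # S105: label unchanged
    return group


def construct_model(
    labelled_images: Iterable[Labelled],
    extract_face: Callable[[Any], Any],
    judge_appearance: Callable[[Any], Any],
    make_changes: Callable[[Any, Any], Iterable[Any]],
    train: Callable[[List[Labelled]], Any],
) -> Any:
    """Build the learning image group and hand it to a trainer (S106)."""
    group = build_learning_group(labelled_images, extract_face,
                                 judge_appearance, make_changes)
    return train(group)
```

Each changed image inherits the label of its source image, which reflects the requirement that the change alters the appearance while maintaining the state.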

Next, the facial expression identification processing is outlined with reference to FIG. 2(B).
(S201) An identification target image is acquired. This image is the target of state identification and also serves as the original image from which a determination image group is subsequently generated to increase the number of image samples available for the state identification processing.

(S202) A face image region is extracted from the acquired identification target image.
(S203) In the extracted face image region, the presence or absence and/or the degree of occurrence of each preset appearance type is determined. The appearance types can be the same as those described for step S103 and are described in detail later with reference to FIG. 3.

（Ｓ２０４、Ｓ２０５）取得した識別対象画像における、見え方種別毎における当該見え方種別発生の有無及び／又は度合いの判定結果に基づいて、変更画像を作成し、当該変更画像を含む判定画像群を生成する。これらの変更画像作成処理及び学習画像群生成処理は、ステップＳ１０４及びＳ１０５での処理と同様のものとしたり、互いに関連付けられた連携した処理としたりすることができる。いずれにしても、これらの処理について、後に図４を用いて詳細に説明する。 (S204, S205) Based on the result of determining, for each appearance type, whether that appearance type occurs in the acquired identification target image and/or to what degree, changed images are created, and a determination image group including the changed images is generated. The changed image creation processing and image group generation processing here can be the same as the processing in steps S104 and S105, or can be processing that is linked to and coordinated with it. In either case, this processing will be described in detail later with reference to FIG. 4.

（Ｓ２０６）生成した判定画像群の各々に対し、生成された表情識別モデルを用いて、その顔の表情についての判定を行う。ここで使用する表情識別モデルは、ステップＳ１０６で構築されたものとすることができる。または、別途、例えば公知の方法で予め生成された表情識別モデルを利用することも可能である。 (S206) For each image in the generated determination image group, the facial expression is judged using a facial expression identification model. The model used here can be the one constructed in step S106. Alternatively, a facial expression identification model generated separately in advance, for example by a known method, can also be used.

（Ｓ２０７）ステップＳ２０６での判定結果を総合して、取得した識別対象画像における顔の表情の識別を行う。これで、表情識別処理が完了する。 (S207) The determination results in step S206 are combined to identify facial expressions in the acquired identification target image. This completes the facial expression identification process.
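
Step S207 combines the per-image judgments from step S206 into one result for the identification target image. The text at this point does not say how the judgments are combined, so the following is only a minimal sketch that assumes a simple majority vote; the function name `combine_judgments` and the string labels are hypothetical and chosen for illustration.

```python
from collections import Counter


def combine_judgments(labels):
    """Combine per-image judgments into one result by majority vote.

    Majority voting is an assumption made for this sketch; the description
    only states that the judgments of step S206 are combined.
    """
    if not labels:
        raise ValueError("at least one judgment is required")
    counts = Counter(labels)
    best = max(counts.values())
    # Break ties by first appearance so the result is deterministic.
    for label in labels:
        if counts[label] == best:
            return label


# Judgments for the original image and four changed images (illustrative).
judgments = ["positive", "neutral", "positive", "negative", "positive"]
print(combine_judgments(judgments))
```

Other combination rules, such as averaging per-class scores, would fit the same step equally well.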

［装置構成］ [Device configuration]
次に図１に戻って、以下、本発明による状態識別装置の具体的構成について説明を行う。同図の機能ブロック図に示すように、状態識別装置（表情識別装置）である本実施形態のスマートフォン１は、通信インタフェース部１０１と、カメラ１０２と、画像データベース１０３と、変更画像記憶部１０４と、判定結果記憶部１０５と、タッチパネル・ディスプレイ（ＴＰ・ＤＰ）１０６と、プロセッサ・メモリとを有する。ここで、プロセッサ・メモリは、スマートフォン１のコンピュータを機能させるプログラムを実行することによって、状態識別機能（表情識別機能）を実現させる。 Returning to FIG. 1, a specific configuration of the state identification device according to the present invention is described below. As shown in the functional block diagram of FIG. 1, the smartphone 1 of the present embodiment, which is a state identification device (facial expression identification device), includes a communication interface unit 101, a camera 102, an image database 103, a changed image storage unit 104, a determination result storage unit 105, a touch panel display (TP/DP) 106, and a processor memory. Here, the processor memory realizes the state identification function (facial expression identification function) by executing a program that causes the computer of the smartphone 1 to function.

さらに、このプロセッサ・メモリは、機能構成部として、画像管理部１１１と、見え方度合判定部１１２と、画像変更部１１３ａ、学習画像群生成部１１３ｂ及び判定画像群生成部１１３ｃを含む画像群生成部１１３と、識別モデル構築部１１４ａ及び表情判定部１１４ｂを含む表情識別処理部（状態識別処理部）１１４と、アプリケーション１２１と、入出力制御部１２２とを有する。ここで、図１におけるスマートフォン１の機能構成部間を矢印で接続して示した処理の流れは、本発明による状態識別方法（表情識別方法）の一実施形態としても理解される。 Furthermore, the processor memory includes, as functional components, an image management unit 111; an appearance degree determination unit 112; an image group generation unit 113 including an image change unit 113a, a learning image group generation unit 113b, and a determination image group generation unit 113c; a facial expression identification processing unit (state identification processing unit) 114 including an identification model construction unit 114a and a facial expression determination unit 114b; an application 121; and an input/output control unit 122. The processing flow shown by the arrows connecting the functional components of the smartphone 1 in FIG. 1 can also be understood as an embodiment of the state identification method (facial expression identification method) according to the present invention.

通信インタフェース部１０１は、
（ａ）学習画像群生成の元画像となる学習用画像や、表情識別処理部１１４での学習用の大量の画像のデータ、さらには、
（ｂ）判定画像群生成の元画像となる識別対象画像や、表情識別処理部１１４でそのまま識別処理を受ける識別対象画像のデータ
を、例えば外部の画像管理サーバ２からインターネット等の通信ネットワークを介して取得し、画像管理部１１１に出力する。また、この通信インタフェース部１０１を介して、本発明に係る表情識別プログラム（アプリ）や、当該表情識別結果を利用したサービスを提供可能なアプリケーション・プログラム、例えば対話ＡＩアプリ、をダウンロードすることも可能となっている。
The communication interface unit 101 acquires
(a) learning images that serve as original images for generating learning image groups, and data of a large number of images for learning in the facial expression identification processing unit 114, as well as
(b) identification target images that serve as original images for generating determination image groups, and data of identification target images that undergo identification processing as they are in the facial expression identification processing unit 114,
from, for example, an external image management server 2 via a communication network such as the Internet, and outputs them to the image management unit 111. The facial expression identification program (app) according to the present invention, and application programs that can provide services using the facial expression identification result, such as a conversational AI app, can also be downloaded via the communication interface unit 101.

カメラ１０２は、例えばユーザの顔を撮影し、識別対象画像としての顔画像データを画像管理部１１１に出力することができる。 For example, the camera 102 can photograph a user's face and output face image data as an identification target image to the image management unit 111.

画像管理部１１１は、通信インタフェース部１０１やカメラ１０２から上述したような画像データを入力し、画像データベース１０３に保存して管理する。また、ユーザ等による指示や装置内処理からの要請等に応じて、画像群生成の元画像となる学習用画像又は識別対象画像を、見え方度合判定部１１２に出力する。 The image management unit 111 receives image data such as that described above from the communication interface unit 101 or the camera 102, and stores and manages it in the image database 103. In response to an instruction from a user or the like, or a request from in-device processing, it also outputs a learning image or identification target image, which serves as the original image for image group generation, to the appearance degree determination unit 112.

同じく図１において、見え方度合判定部１１２は、入力された識別対象（例えば顔）に係る１つの画像（学習用画像又は識別対象画像）に対し、識別対象（例えば顔）に係る画像領域を抽出し、この画像領域に基づいて、見え方に係る少なくとも１つの種別における当該種別発生の有無及び／又は度合いを判定する。ここで、種別発生有りを（度合い）＝１とし、種別発生無しを（度合い）＝０と解釈することも可能である。ちなみに、この後、画像群生成部１１３は、当該種別につき、ここで判定された有無の状態及び／又は度合いを変えることによって元画像の変更を行い、変更画像を作成するのである。 Also in FIG. 1, the appearance degree determination unit 112 extracts, from one input image (learning image or identification target image) of the identification target (for example, a face), the image region of the identification target, and determines, based on this image region, whether each of at least one appearance type occurs and/or to what degree. Here, occurrence of a type can also be interpreted as (degree) = 1 and non-occurrence as (degree) = 0. The image group generation unit 113 subsequently changes the original image by altering, for that type, the presence/absence state and/or degree determined here, and thereby creates a changed image.

なお、これらの画像領域の抽出、及び見え方種別に係る判定は、例えば、画像をスキャンし、画素値が急激に変化するエッジを取り出して識別対象（例えば顔）を検知する公知の方法を用いて実施することができる。また、このような抽出を可能とするオープンソースを利用してもよい。さらには、公知の深層学習（Deep Learning）等の機械学習を用いた抽出手法を利用することも可能である。 The extraction of these image regions and the determination relating to the appearance types can be carried out using, for example, a known method that scans the image and extracts edges where pixel values change abruptly in order to detect the identification target (for example, a face). Open-source software that enables such extraction may also be used. Furthermore, an extraction technique using machine learning, such as known deep learning, can also be used.

図３は、見え方度合判定部１１２での判定に係る見え方種別の例を説明するための顔画像データである。 FIG. 3 is face image data for explaining an example of the appearance type related to the determination by the appearance degree determination unit 112.

見え方種別の一例として、図３（Ａ）には、顔が画面からはみ出している画像が示されている。このように、見え方種別として例えば、顔が画面からはみ出していない１つの状態、及び顔が画面の上下左右の各々の辺からはみ出している状態を含む、顔のはみ出しに係る種別を設定することができる。ここで、この見え方種別における度合いとしては、例えば、顔全体の（推定）面積又は画素数に対する、画像の外にはみ出した顔部分の面積又は画素数（(顔全体面積又は画素数)−(画像内に残っている顔部分の面積又は画素数)）の割合としてもよい。 As one example of an appearance type, FIG. 3(A) shows an image in which the face protrudes beyond the screen. In this way, a type relating to face protrusion can be set as an appearance type; it includes, for example, one state in which the face does not protrude beyond the screen and states in which the face protrudes beyond each of the top, bottom, left, and right edges of the screen. The degree for this appearance type may be, for example, the ratio of the area or number of pixels of the face portion lying outside the image, that is, (area or number of pixels of the whole face) − (area or number of pixels of the face portion remaining in the image), to the (estimated) area or number of pixels of the whole face.
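
The protrusion degree described above can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that the whole face is approximated by an axis-aligned bounding box given as `(left, top, right, bottom)` in pixel coordinates, which may extend beyond the image; the function name `protrusion_ratio` is hypothetical.

```python
def protrusion_ratio(face_box, image_width, image_height):
    """Ratio of the face area lying outside the image to the whole face area.

    face_box is (left, top, right, bottom); approximating the (estimated)
    whole face by a rectangle is an assumption of this sketch.
    """
    left, top, right, bottom = face_box
    full_area = (right - left) * (bottom - top)
    if full_area <= 0:
        raise ValueError("face_box must have positive area")
    # Part of the face box that is still inside the image.
    visible_w = max(0, min(right, image_width) - max(left, 0))
    visible_h = max(0, min(bottom, image_height) - max(top, 0))
    visible_area = visible_w * visible_h
    # (whole face area) - (area remaining in the image), over the whole area.
    return (full_area - visible_area) / full_area


# A 100x100 face whose left half lies outside a 200x200 image.
print(protrusion_ratio((-50, 0, 50, 100), 200, 200))
```

A ratio of 0 corresponds to the state in which the face does not protrude at all.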

また、見え方種別の他の例として、図３（Ｂ）には、顔の向き・傾きの角度が大きい画像が示されている。このように、見え方種別として例えば、顔が正面を向いている１つの状態、及び顔が正面を向いた状態から見て上下左右の各々の方に向いている状態を含む、顔の向き・傾きに係る種別を設定してもよい。ここで、この見え方種別における度合いとしては、例えば、正面向きを基準（ゼロ度）として顔が向いている向きのなす角度（及び顔の傾き具合を表す傾き角）とすることができる。 As another example of the appearance type, FIG. 3B shows an image with a large face orientation / tilt angle. In this way, the types of appearance include, for example, one state where the face is facing the front, and a state where the face is facing toward each of the top, bottom, left and right when viewed from the front. A type related to the inclination may be set. Here, the degree of the appearance type can be, for example, an angle formed by a direction in which the face is facing with reference to the front direction (zero degree) (and an inclination angle representing the degree of inclination of the face).

なお、顔の向きに係る見え方種別の判定は、例えば、上下左右の各々の方に向いている顔の画像によって、予めこれら４つの向きのパターンを学習した識別モデルを生成し、この識別モデルを用いて顔の向きを判定する手法を利用することにより実施可能である。また、顔の傾きに係る見え方種別の判定についても、同様に傾き具合の異なる複数のパターンを学習した識別モデルを生成し、この識別モデルを用いて顔の傾きを判定する手法を利用することにより実施可能である。 The appearance type relating to face orientation can be determined by, for example, generating in advance an identification model that has learned the patterns of these four orientations from images of faces turned upward, downward, to the left, and to the right, and using this identification model to determine the face orientation. Likewise, the appearance type relating to face tilt can be determined by generating an identification model that has learned a plurality of patterns with different degrees of tilt and using this identification model to determine the face tilt.

さらに、見え方種別の更なる他の例として、図３（Ｃ）には、顔又は顔付近にサングラス、帽子やマスクといった装着物が付された画像が示されている。このように、見え方種別として例えば、顔又は顔付近に装着物が何ら付されていない１つの状態、並びに顔又は顔付近に眼鏡、サングラス、ゴーグル、帽子、頬かむり、スカーフ、マスク及びその他の装着物の各々が付された状態を含む、装着物付加に係る種別を設定することができる。ここで、この見え方種別における度合いとしては、例えば、顔全体の（推定）面積又は画素数に対する、装着物によって遮蔽された顔部分の面積又は画素数（(顔全体面積又は画素数)−(遮蔽されていない露出した顔部分の面積又は画素数)）の割合としてもよい。 As yet another example of an appearance type, FIG. 3(C) shows images in which an item such as sunglasses, a hat, or a mask is worn on or near the face. In this way, a type relating to worn items can be set as an appearance type; it includes, for example, one state in which nothing is worn on or near the face, and states in which each of eyeglasses, sunglasses, goggles, a hat, a cloth wrapped over the head and cheeks, a scarf, a mask, and other items is worn on or near the face. The degree for this appearance type may be, for example, the ratio of the area or number of pixels of the face portion occluded by the worn item, that is, (area or number of pixels of the whole face) − (area or number of pixels of the exposed, unoccluded face portion), to the (estimated) area or number of pixels of the whole face.

なお、このような装着物付加に係る見え方種別の判定も、例えば、所定の装着物を付された顔の画像によって、予め当該装着物の有無を学習した識別モデルを生成し、この識別モデルを用いて装着物の有無を判定する手法を利用することにより実施可能である。 The appearance type relating to worn items can likewise be determined by, for example, generating in advance an identification model that has learned the presence or absence of a given item from images of faces wearing that item, and using this identification model to determine whether the item is present.

また、見え方種別の更なる他の例として、図３（Ｄ）には、顔の照度（明度）が高すぎたり低すぎたりする画像が示されている。このように、見え方種別として例えば、顔画像領域の全画素における明度（照度）の平均が所定の基準範囲内である１つの状態、及び顔画像領域の全画素における明度（照度）の平均が所定の基準範囲を超えて高すぎる又は低すぎる状態を含む、顔の照度（明度）に係る種別を設定してもよい。ここで、この見え方種別における度合いとしては、例えば、この照度（明度）の平均値と、照度（明度）基準値との差とすることができる。 As yet another example of an appearance type, FIG. 3(D) shows images in which the illuminance (brightness) of the face is too high or too low. In this way, a type relating to face illuminance (brightness) may be set as an appearance type; it includes, for example, one state in which the average brightness (illuminance) over all pixels of the face image region is within a predetermined reference range, and states in which that average is outside the reference range, being too high or too low. The degree for this appearance type can be, for example, the difference between this average illuminance (brightness) and an illuminance (brightness) reference value.
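
The brightness-related state and degree described above can be sketched as follows. The sketch assumes the face image region is given as a flat list of 8-bit brightness values; the names `brightness_degree` and `brightness_state`, the state strings, and the numeric reference values are all hypothetical.

```python
def mean_brightness(face_pixels):
    """Average brightness over all pixels of the face image region."""
    if not face_pixels:
        raise ValueError("face region must contain at least one pixel")
    return sum(face_pixels) / len(face_pixels)


def brightness_degree(face_pixels, reference):
    """Degree: difference between the average brightness and a reference value."""
    return mean_brightness(face_pixels) - reference


def brightness_state(face_pixels, low, high):
    """Classify the region against the reference range [low, high]."""
    mean = mean_brightness(face_pixels)
    if mean > high:
        return "too_bright"
    if mean < low:
        return "too_dark"
    return "within_range"


region = [200, 220, 240]  # illustrative brightness values
print(brightness_state(region, 80, 180))
print(brightness_degree(region, 128))
```

A signed degree is used here so that the sign indicates whether the face is brighter or darker than the reference.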

なお、当然に、見え方度合判定部１１２での判定に係る見え方種別は、上述した４つに限定されるものではない。そのうちの１つ、２つ又は３つであってもよく、また、他の見え方種別を含んでいてもよい。ここで、他の見え方種別としては、例えば、顔画像領域の全画素における色（彩度）の平均に係る見え方種別、顔画像における顔画像領域と背景領域との明度／彩度のコントラストに係る見え方種別や、顔部分のサイズ（例えば、画像全体に対する顔部分の面積又は画素数の割合）を挙げることができる。 Of course, the appearance types related to the determination by the appearance degree determination unit 112 are not limited to the above four. One, two, or three of them may be included, and other types of appearance may be included. Here, as other types of appearance, for example, the type of appearance related to the average of colors (saturation) in all pixels of the face image region, the brightness / saturation contrast between the face image region and the background region in the face image And the size of the face part (for example, the ratio of the area of the face part or the number of pixels to the entire image).

すなわち、見え方種別の例をまとめると、１つの画像における（ａ）識別対象の表れている度合い、（ｂ）識別対象の表れている向き若しくは傾き、（ｃ）識別対象を遮蔽するものの有無若しくは種類、（ｄ）識別対象の明度、（ｅ）識別対象の色合い、及び（ｆ）識別対象と背景とのコントラストの度合い等のうちの少なくとも１つを、見え方種別として採用することができる。このように、画像中において、識別対象が、その識別対象となる状態を維持したままで判別可能な変化を生じ得るならば、その変化に係る一連の見え方を、見え方種別とすることが可能である。 To summarize the examples of appearance types, at least one of the following can be adopted as an appearance type in one image: (a) the degree to which the identification target appears, (b) the orientation or tilt in which the identification target appears, (c) the presence or absence, or the kind, of something occluding the identification target, (d) the brightness of the identification target, (e) the hue of the identification target, and (f) the degree of contrast between the identification target and the background. In general, if the identification target can undergo a discernible change in the image while the state to be identified is maintained, the series of appearances associated with that change can be treated as an appearance type.

図１に戻って、画像群生成部１１３の画像変更部１１３ａは、入力された識別対象（例えば顔）に係る１つの画像（学習用画像又は識別対象画像）に対し、この１つの画像における識別対象（例えば顔）の見え方を変更し且つ当該状態を維持する処理を施す。 Returning to FIG. 1, the image changing unit 113 a of the image group generating unit 113 identifies one image (learning image or identification target image) related to the input identification target (for example, a face) in this one image. A process of changing the appearance of the target (for example, the face) and maintaining the state is performed.

次いで、画像群生成部１１３の学習画像群生成部１１３ｂ及び判定画像群生成部１１３ｃはそれぞれ、画像変更部１１３ａでの処理で作成された変更画像を含む学習画像群及び判定画像群を生成する。生成された画像群（及び元画像）は、変更画像記憶部１０４に保存され、適宜読み出されて使用されることも好ましい。 Next, the learning image group generation unit 113b and the determination image group generation unit 113c of the image group generation unit 113 respectively generate a learning image group and a determination image group including the changed image created by the processing in the image change unit 113a. It is also preferable that the generated image group (and the original image) is stored in the modified image storage unit 104 and is appropriately read and used.

ここで、学習画像群に含まれる各画像には、元画像である学習用画像に付与されている「正解情報」がそのまま付加されている。識別対象の状態として顔の表情を識別する場合、この正解情報としては、ポジティブ、ニュートラル、及びネガティブという表情に関する３つのカテゴリを表す値（例えばそれぞれ１、０及び−１）を用いることができる。当然に、他の表情分類モデル、例えばPaul Ekmanの７つのカテゴリ（ニュートラル、喜び、嫌悪、怒り、サプライズ、悲しみ、恐怖）に基づく感情分類モデルや、さらに細分化された感情分類モデル（例えば、Paul Ekmanモデルの７つのカテゴリに対し、さらに、おもしろさ、軽蔑、満足、困惑、興奮、罪悪感、功績に基づく自負心、安心、納得感、喜び、及び恥を追加したモデル）におけるカテゴリを表す値を、正解情報とすることも可能である。 Here, to each image included in the learning image group, the “correct information” assigned to the learning image that is the original image is added as it is. When a facial expression is identified as a state to be identified, values representing three categories (for example, 1, 0, and −1, respectively) relating to facial expressions of positive, neutral, and negative can be used as the correct answer information. Naturally, other facial expression classification models, such as the emotion classification model based on Paul Ekman's seven categories (neutral, joy, disgust, anger, surprise, sadness, fear), and more detailed emotion classification models (eg, Paul A value that represents the categories in the Ekman model's seven categories, plus fun, contempt, satisfaction, embarrassment, excitement, guilt, pride based on achievement, security, persuasion, joy, and shame. Can be used as correct answer information.
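
The way correct-answer information carries over to every image in a learning image group can be sketched as follows. The numeric values 1, 0, and −1 come from the description; the function `build_learning_group`, the dictionary `LABEL_VALUES`, and the placeholder change function are hypothetical and serve only as an illustration.

```python
# Three-category labels and their values as given in the description.
LABEL_VALUES = {"positive": 1, "neutral": 0, "negative": -1}


def build_learning_group(original, label, changes):
    """Return (image, label value) pairs for the original and each changed image.

    Each change must alter only the appearance, not the state, so every
    changed image inherits the label of the original unchanged.
    """
    value = LABEL_VALUES[label]
    group = [(original, value)]  # this sketch also keeps the original image
    for change in changes:
        group.append((change(original), value))
    return group


# Placeholder "image" (a list of pixel values) and a placeholder change.
original = [1, 2, 3]
group = build_learning_group(original, "negative", [lambda img: img[::-1]])
print(group)
```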

なお、生成される画像群には、入力された１つの画像（元画像）を含めることも好ましい。勿論、元画像を含まない画像群を生成してもよいが、元画像を含めた画像群を用いることによって、より識別精度の高い識別モデルを構築したり、識別モデルによる識別精度を向上させたりすることが可能となる。ここで、以上に説明した画像群生成部１１３における画像変更・画像群生成処理がとり得る種々の実施形態について、以下に説明する。 Note that the generated image group preferably includes one input image (original image). Of course, an image group that does not include the original image may be generated, but by using the image group that includes the original image, an identification model with higher identification accuracy can be constructed, or the identification accuracy by the identification model can be improved. It becomes possible to do. Here, various embodiments that can be taken by the image change / image group generation processing in the image group generation unit 113 described above will be described below.

（形態ａ）入力された１つの画像（学習用画像又は識別対象画像）に対し、少なくとも１つの見え方種別について、この１つの画像における方向とは反対の方向への変更を少なくとも行う。 (Mode a) For one input image (learning image or identification target image), at least a change toward the direction opposite to the direction in that image is made for at least one appearance type.

この形態ａでは、例えば、（図３（Ｂ）で説明した）顔の向き・傾きに係る見え方種別について、入力された１つの画像が左向きの顔の画像である場合、この左向きの顔を右向きに変更する（例えば反転させる）処理を行った変更画像を作成し、この変更画像を少なくとも含む画像群を生成することができる。例えば、左向きの顔の画像（元画像）と、右向きの顔の画像（変更画像）とを含む画像群（学習画像群又は判定画像群）を生成してもよい。ちなみにこのような処理によって、構築する識別モデルの汎用性（画像での見え方に対するロバスト性）を向上させたり、識別モデルの偏向性に対応して識別精度を向上させたりすることも可能となる。 In this form a, for example, when one input image is a left-facing face image with respect to the appearance type related to the face orientation / tilt (described in FIG. 3B), this left-facing face is displayed. It is possible to create a modified image that has been subjected to a process of changing to the right (for example, to invert), and to generate an image group including at least the modified image. For example, an image group (a learning image group or a determination image group) including a left-facing face image (original image) and a right-facing face image (changed image) may be generated. By the way, by such processing, it becomes possible to improve the versatility of the identification model to be constructed (robustness with respect to the appearance in the image) or to improve the identification accuracy corresponding to the deviation of the identification model. .
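
The left-to-right change in form a can be illustrated with a horizontal mirror, which the description itself gives as an example (reversal). The sketch below represents an image as a list of pixel rows; the function names `flip_horizontal` and `make_image_group` are hypothetical.

```python
def flip_horizontal(image):
    """Mirror an image left to right; image is a list of rows of pixel values."""
    return [list(reversed(row)) for row in image]


def make_image_group(original):
    """Image group holding the original and its mirrored counterpart."""
    return [original, flip_horizontal(original)]


# Tiny 2x3 placeholder image; a left-facing face would become right-facing.
original = [[1, 2, 3],
            [4, 5, 6]]
for image in make_image_group(original):
    print(image)
```

Mirroring changes the facing direction while leaving the facial expression itself unchanged, which is the property required of a change in appearance.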

（形態ｂ）入力された１つの画像（学習用画像又は識別対象画像）に対し、少なくとも１つの見え方種別について、当該見え方種別に関し予め設定された方向であって識別に好適な方向への変更を少なくとも行う。 (Mode b) For one input image (learning image or identification target image), at least a change is made, for at least one appearance type, toward a direction that is preset for that appearance type and is suitable for identification.

この形態ｂでは、例えば、（図３（Ｄ）で説明した）顔の照度（明度）に係る見え方種別について、入力された１つの画像における顔画像領域の全画素での明度（照度）の平均値が所定の基準明度（照度）範囲を超えて高すぎる場合、この明度（照度）平均値がこの基準明度（照度）範囲の値となるように各画素での明度（照度）を変更する処理を行った変更画像を作成し、この変更画像を少なくとも含む画像群を生成することができる。例えば、明度（照度）の平均値が高すぎる画像（元画像）と、明度（照度）の平均値を所定の基準明度（照度）範囲内に収めた画像（変更画像）とを含む画像群（学習画像群又は判定画像群）を生成してもよい。ちなみにこのような処理によって、より識別性能の高い識別モデルを構築したり、識別モデルによる識別結果の精度を向上させたりすることも可能となる。 In this form b, for example, for the appearance type related to the illuminance (brightness) of the face (described in FIG. 3D), the brightness (illuminance) of all the pixels of the face image area in one input image is displayed. If the average value is too high beyond the predetermined reference lightness (illuminance) range, the lightness (illuminance) at each pixel is changed so that the average value of the lightness (illuminance) becomes a value in the reference lightness (illuminance) range. A modified image that has been processed can be created, and an image group that includes at least the modified image can be generated. For example, an image group including an image (original image) whose average value of brightness (illuminance) is too high and an image (changed image) in which the average value of brightness (illuminance) is within a predetermined reference brightness (illuminance) range ( Learning image group or determination image group) may be generated. Incidentally, it is possible to construct a discrimination model with higher discrimination performance and improve the accuracy of the discrimination result by the discrimination model by such processing.
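
The brightness correction in form b can be sketched as a uniform shift of pixel values. The description only requires that the average end up inside the reference range; moving it to the midpoint of the range, as done below, is an assumption of this sketch, as are the function name `shift_brightness_into_range` and the 8-bit value range.

```python
def shift_brightness_into_range(face_pixels, low, high):
    """Shift all pixels so that the average brightness falls in [low, high].

    Images whose average is already in range are returned unchanged.
    Targeting the midpoint of the range is an assumption of this sketch.
    """
    mean = sum(face_pixels) / len(face_pixels)
    if low <= mean <= high:
        return list(face_pixels)
    offset = (low + high) / 2 - mean
    # Clamp to the 8-bit range; heavy clamping can move the average slightly.
    return [min(255, max(0, int(round(p + offset)))) for p in face_pixels]


too_bright = [200, 220, 240]  # average 220, above the range used below
print(shift_brightness_into_range(too_bright, 100, 160))
```

An image group for form b could then hold both the original region and the corrected one.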

（形態ｃ）入力された１つの画像（学習用画像又は識別対象画像）に対し、予め設定された複数の見え方種別のうちの複数について合わせて変更を行う。 (Mode c) For one input image (learning image or identification target image), changes are made jointly for two or more of a plurality of preset appearance types.

この形態ｃでは、例えば、（図３（Ｂ）で説明した）顔の向き・傾きに係る見え方種別と、（図３（Ｄ）で説明した）顔の照度（明度）に係る見え方種別との両種別について、同時に変更の施された変更画像を作成することができる。具体的には、例えば、左向きの顔の画像であって且つ顔画像領域での明度（照度）の平均値が高すぎる画像（元画像）から、右向きの顔の画像であって且つ明度（照度）の平均値が所定の基準明度（照度）範囲内に収まった画像（変更画像）を作成し、画像群（学習画像群又は判定画像群）に含めてもよい。ちなみにこのような処理によって、構築する識別モデルの汎用性（見え方の種々の変化に対するロバスト性）を向上させたり、識別モデルによる識別結果の精度を向上させたりすることも可能となる。 In this form c, for example, a changed image can be created in which both the appearance type relating to face orientation/tilt (described with FIG. 3(B)) and the appearance type relating to face illuminance (brightness) (described with FIG. 3(D)) are changed at the same time. Specifically, for example, from an image (original image) of a left-facing face in which the average brightness (illuminance) of the face image region is too high, an image (changed image) of a right-facing face in which the average brightness (illuminance) falls within the predetermined reference brightness (illuminance) range may be created and included in the image group (learning image group or determination image group). Such processing can improve the versatility of the identification model being constructed (its robustness to various changes in appearance) and can improve the accuracy of identification results obtained with the identification model.

（形態ｄ）入力された１つの画像（学習用画像又は識別対象画像）に対し、少なくとも１つの見え方種別について、当該見え方種別に関し予め設定された方向への変更を行い、当該方向によって特徴付けられる識別モデルを構築するための画像群を生成する。 (Mode d) For one input image (learning image or identification target image), a change toward a direction preset for the appearance type is made for at least one appearance type, and an image group is generated for constructing an identification model characterized by that direction.

この形態ｄでは、例えば、（図３（Ｂ）で説明した）顔の向き・傾きに係る見え方種別について、入力された１つの画像が下向きではない顔の画像である場合、この顔を下向きに変更する処理を行った変更画像を作成し、この変更画像を少なくとも含む画像群を生成することができる。すなわち、この場合、下向きの顔の画像を必ず（又は高い頻度で）含む画像群（学習画像群又は判定画像群）を生成することができる。ちなみにこのような処理によって、所定の方向によって特徴付けられる（例えば、特に下向きの顔の表情を精度良く識別可能な）識別モデルを構築したり、所定の方向によって特徴付けられた識別モデルの当該特徴に合わせた画像入力を行うことにより識別精度を向上させたりすることも可能となる。 In this form d, for example, with regard to the appearance type related to the orientation / tilt of the face (described in FIG. 3B), when one input image is a face image that is not downward, this face is faced downward. It is possible to create a modified image that has been subjected to the process of changing to, and generate an image group including at least the modified image. That is, in this case, it is possible to generate an image group (a learning image group or a determination image group) that always includes an image of a downward face (or frequently). By the way, by such processing, an identification model characterized by a predetermined direction (for example, an identification model that can particularly accurately identify a facial expression of a downward face, for example) is constructed, or the identification model characterized by a predetermined direction It is also possible to improve the identification accuracy by inputting an image according to the above.

（形態ｅ）入力された１つの画像（学習用画像又は識別対象画像）に対し、少なくとも１つの見え方種別について、当該見え方種別に関しとり得る全ての方向への変更を行い、当該見え方種別についての方向毎の識別結果のばらつきが抑制された識別モデルを構築するための画像群を生成する。 (Mode e) For one input image (learning image or identification target image), at least one appearance type is changed in all possible directions for the appearance type, and the appearance type An image group for constructing an identification model in which variations in identification results for each direction are suppressed is generated.

この形態ｅでは、例えば、（図３（Ｂ）で説明した）顔の向き・傾きに係る見え方種別について、入力された１つの画像が左向きの顔の画像である場合、この左向きの顔を、右向き、上向き、及び下向きに変更する処理を行った変更画像を作成し、元画像とこれらの変更画像とを含む画像群（学習画像群又は判定画像群）を生成することができる。ちなみにこのような処理によって、識別精度のより高い識別モデルを構築し、識別モデルの汎用性（画像での見え方に対するロバスト性）を向上させたり、識別モデルによる識別の精度を向上させたりすることも可能となる。 In this form e, for example, when one input image is a left-facing face image with respect to the appearance type related to the face orientation / tilt (described in FIG. 3B), this left-facing face is selected. Then, it is possible to create a modified image that has been processed to change to the right, upward, and downward, and generate an image group (learning image group or determination image group) that includes the original image and these modified images. By the way, by this kind of processing, an identification model with higher identification accuracy is constructed, and the versatility of the identification model (robustness with respect to the appearance in the image) is improved, or the accuracy of identification by the identification model is improved. Is also possible.

（形態ｆ）正解として１つの状態が対応付けられた１つの画像（学習用画像）に対し、少なくとも１つの見え方種別について、この１つの状態に対し予め設定された変更数だけの変更を行い、所定のサンプル数の学習画像群を生成する。 (Mode f) For one image (learning image) associated with one state as a correct answer, at least one appearance type is changed by the number of changes set in advance for the one state. A learning image group having a predetermined number of samples is generated.

この形態ｆによれば、例えば、（正解としての）ある１つの状態が対応付けられた学習用の画像サンプルを、必要とするサンプル数だけ準備することが可能となる。ここで、設定された変更数だけの変更は、例えば、見え方種別における見え方の「度合い」を段階的に変化させ、当該見え方種別について複数（変更数だけ）の互いに異なる度合いを有する変更画像を作成することによって達成される。 According to this form f, for example, it is possible to prepare as many image samples as necessary for learning that are associated with one state (as a correct answer). Here, the change by the set number of changes is, for example, a step that changes the “degree” of the appearance in the appearance type step by step and has a plurality of (only the number of changes) different degrees of the appearance type. This is accomplished by creating an image.

ここで、１つの状態が対応付けられた画像のサンプル数が、他の状態が対応付けられた画像のサンプル数よりも小さい場合、当該１つの状態が対応付けられた画像について、当該他の状態が対応付けられた画像についてよりも大きい変更数だけの変更を行うことも好ましい。例えば、顔の表情がポジティブ、ニュートラル及びネガティブのうちのいずれの状態であるかを識別する表情識別モデルを構築する際、正解としてネガティブ状態である表情の顔画像サンプルが、他の状態に比べて比較的に取得し難いとする。この場合、ネガティブ状態の元画像については、他の状態の元画像と比較して、例えば（図３（Ｄ）で説明した）顔の照度（明度）に係る見え方種別において、顔画像領域での明度（照度）の平均値をより細かく変化させ、より多くの変更画像を作成することができる。これにより、表情識別モデルに対し、より偏りの少ない良好な学習を行わせることも可能となる。 Here, when the number of image samples associated with one state is smaller than the number of image samples associated with another state, it is also preferable to apply a larger number of changes to the images associated with the one state than to the images associated with the other state. For example, suppose that, when constructing a facial expression identification model that identifies whether a facial expression is positive, neutral, or negative, face image samples whose correct label is the negative state are comparatively harder to obtain than those of the other states. In this case, for original images in the negative state, the average brightness (illuminance) of the face image region can be varied in finer steps than for original images in the other states, for example for the appearance type relating to face illuminance (brightness) (described with FIG. 3(D)), so that more changed images are created. This allows the facial expression identification model to be trained well, with less bias.
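
The idea of applying more changes to under-represented states can be sketched in two small steps: decide how many changed images each original needs, then pick that many brightness targets in finer or coarser steps. The target sample count, the function names `changes_per_image` and `brightness_targets`, and the evenly spaced targets are all assumptions made for illustration.

```python
def changes_per_image(sample_counts, target_per_state):
    """Number of changed images to create per original image, for each state.

    States with fewer original samples get a larger number of changes so
    that every state reaches roughly target_per_state samples.
    """
    plan = {}
    for state, count in sample_counts.items():
        if count <= 0:
            raise ValueError("each state needs at least one original image")
        missing = max(0, target_per_state - count)
        plan[state] = -(-missing // count)  # ceiling division
    return plan


def brightness_targets(low, high, count):
    """count evenly spaced target averages in [low, high]; more targets mean finer steps."""
    if count <= 0:
        return []
    if count == 1:
        return [(low + high) / 2]
    step = (high - low) / (count - 1)
    return [low + step * i for i in range(count)]


counts = {"positive": 1000, "neutral": 1000, "negative": 250}
plan = changes_per_image(counts, 1000)
print(plan)
print(brightness_targets(100, 160, plan["negative"]))
```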

（形態ｇ）識別モデルを構築する際、及びこの識別モデルを用いて状態を識別する際の両方で画像群を使用するケースにおいて、状態識別の際に使用される判定画像群を生成する場合、入力された１つの画像（識別対象画像）に対し、識別対象の見え方に係るいずれの種別についても、識別モデル構築の際に使用された学習画像群を生成する際に行われた種別についての変更と同じ方向への変更を行い、一方、学習画像群を生成する際に当該種別についての変更が行われなかった場合には変更を行わない。 (Mode g) In the case where image groups are used both when constructing the identification model and when identifying a state using that identification model, when the determination image group used for state identification is generated, each appearance type of the identification target in the one input image (identification target image) is changed in the same direction as the change made for that type when the learning image group used to construct the identification model was generated; if no change was made for that type when the learning image group was generated, no change is made for it.

この形態ｇによれば、学習画像群と判定画像群とにおいて、いずれの見え方種別についても同方向への変更処理の施された画像が準備可能となる。すなわち、学習画像群と判定画像群との間で、見え方の変更の傾向を揃えることができる。その結果、例えば、所定の方向によって特徴付けられた（例えば、特に下向きの顔の表情を精度良く識別可能な）識別モデルを構築した上で、この識別モデルの当該特徴に合わせた画像入力を行うことによって、識別精度を向上させることも可能となるのである。 According to the form g, it is possible to prepare an image that has been subjected to the change process in the same direction for any type of appearance in the learning image group and the determination image group. That is, it is possible to align the tendency of change in appearance between the learning image group and the determination image group. As a result, for example, an identification model characterized by a predetermined direction (for example, an identification model that can particularly accurately identify a facial expression of a downward face, for example) is constructed, and an image input according to the feature of the identification model is performed. As a result, the identification accuracy can be improved.

図４は、本発明に係る変更画像作成・画像群生成処理の一実施形態を示すフローチャートである。なお、同図に示されたフローは、図２に示されたステップＳ１０４及びＳ２０４と、ステップＳ１０５及びＳ２０５との処理フローに対応している。 FIG. 4 is a flowchart showing an embodiment of the modified image creation / image group generation processing according to the present invention. The flow shown in the figure corresponds to the processing flow of steps S104 and S204 and steps S105 and S205 shown in FIG.

（Ｓ３０１）元画像に対する変更処理について、変更すべき見え方種別、並びに当該種別についての変更における方向及び度合いを設定する。 (S301) For the change processing applied to the original image, the appearance types to be changed, and the direction and degree of the change for each type, are set.
（Ｓ３０２）ここで、どの見え方種別について変更処理を行うかを決定し、設定された見え方種別について順次処理を進める。 (S302) Here, it is decided for which appearance types the change processing is to be performed, and processing proceeds sequentially through the appearance types that have been set.

なお、設定された見え方種別、及び当該種別についての変更における方向及び度合い（属性）は、見え方種別一覧データベースに登録され、適宜参照されることも好ましい。ちなみに、本実施形態では、顔のはみ出しに係る種別、顔の向き・傾きに係る種別、装着物付加に係る種別、及び顔の照度（明度）に係る種別の４つの見え方種別が設定され、これらの見え方種別について順次処理が実施される。 In addition, it is also preferable that the set appearance type and the direction and degree (attribute) in the change of the type are registered in the appearance type list database and referred to as appropriate. By the way, in the present embodiment, four types of appearances are set: a type related to the protrusion of the face, a type related to the orientation / tilt of the face, a type related to the attachment addition, and a type related to the illuminance (brightness) of the face, Processing is sequentially performed for these appearance types.

(S303a, S304a) For the appearance type relating to face protrusion, it is determined whether the face (image region) extracted from the input image (original image) protrudes from the frame. If the determination is true (the face protrudes), no change processing is performed for this appearance type. If the determination is false, segmentation processing is applied to the input image (original image), in which the face does not protrude, so that the face (image region) protrudes from the frame by a predetermined amount in a predetermined direction.

(S303b, S304b, S304c) For the appearance type relating to face orientation and tilt, it is determined whether the face (image region) extracted from the input image (original image) is turned up, down, left, or right, and whether the face is tilted. If both determinations are false (the face is frontal and not tilted), the face is tilted by a predetermined angle in a predetermined direction (for example, by rotating the face image region). Alternatively, no change processing may be performed for this appearance type in this case. If either determination is other than false, the orientation or tilt of the face is changed (for example, reversed), or the face is tilted by a predetermined angle in a predetermined direction (for example, by rotating the face image region).

(S303c, S304d) For the type relating to the addition of a worn item, it is determined whether a predetermined worn item (for example, glasses or a hat) is present in or near the face image region extracted from the input image (original image). If the determination is true (the predetermined worn item is present), no change processing is performed for this appearance type. If the determination is false, the predetermined worn item is added at a predetermined position in or near the face image region of the input image (original image), which lacks that item.

(S303d, S304e, S304f, S304g) For the type relating to face illuminance (brightness), it is determined whether the mean brightness (illuminance) over all pixels of the face image region extracted from the input image (original image) lies within a predetermined reference brightness (illuminance) range, is too high and exceeds the range, or is too low and falls below it. If the mean lies within the range, no change processing is performed for this appearance type. Alternatively, a predetermined brightness (illuminance) change may be applied in this case. If the mean brightness (illuminance) is determined to be too high or too low, illuminance (brightness) change processing is applied to that input image (original image) so that the mean brightness (illuminance) falls within the predetermined reference range.

(S305) It is determined whether, for every set appearance type, the decision on change processing has been made or the change processing has been performed. If the determination is false, the flow returns to step S302 so that the appearance types not yet handled are processed. It is also possible to apply change processing for a plurality of appearance types to a single input image (original image). In that case, step S305 decides whether an input image that has already been changed with respect to one appearance type should also undergo change processing for the remaining appearance types.

(S306) If, on the other hand, the determination in step S305 is true, a learning image group or a determination image group that includes the created changed images is generated. This completes the changed-image creation and image group generation processing.
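The flow of steps S301 to S306 can be illustrated with the following minimal sketch. It is not the implementation of the embodiment: the image is assumed to be a plain 2-D list of grey levels (0 to 255), face detection is omitted, the function names and the reference brightness range of 100 to 140 are hypothetical, and keeping the original image in the group is an assumption made for illustration.

```python
from statistics import mean

def flip_horizontal(img):
    """Mirror the image left-right; the depicted state (expression) is unchanged."""
    return [row[::-1] for row in img]

def shift_left(img, amount, fill=0):
    """Shift the content left by `amount` pixels so part of it leaves the frame."""
    return [row[amount:] + [fill] * amount for row in img]

def brightness_in_range(img, low, high):
    """Check whether the mean brightness lies inside the reference range."""
    return low <= mean(p for row in img for p in row) <= high

def normalize_brightness(img, low, high):
    """Rescale pixels so the mean brightness moves to the middle of [low, high]."""
    avg = mean(p for row in img for p in row)
    if avg == 0:
        return img  # an all-black image cannot be rescaled
    scale = ((low + high) / 2) / avg
    return [[min(255, round(p * scale)) for p in row] for row in img]

def generate_image_group(original, appearance_types):
    """appearance_types: list of (needs_change, change) pairs set in S301."""
    group = [original]                       # assumption: the original stays in the group
    for needs_change, change in appearance_types:   # S302 / S305 loop
        if needs_change(original):           # S303x: decide whether to change
            group.append(change(original))   # S304x: create the changed image
    return group                             # S306: the generated image group

LOW, HIGH = 100, 140  # assumed reference brightness range
APPEARANCE_TYPES = [
    # Stand-in predicates: a real system would test face position and orientation.
    (lambda im: True, lambda im: shift_left(im, 1)),
    (lambda im: True, flip_horizontal),
    (lambda im: not brightness_in_range(im, LOW, HIGH),
     lambda im: normalize_brightness(im, LOW, HIGH)),
]
```

Each entry pairs a decision (S303) with a change (S304), so further appearance types such as the addition of a worn item could be added to the list without altering the loop.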

Naturally, the changed-image creation and image group generation processing according to the present invention is not limited to the embodiment described above. For example, various processing forms can be set by applying at least one of forms a to g described above, or by combining several of them. In any case, this processing makes it possible to prepare a sufficient number of image samples from fewer original images than before, or alternatively to prepare a larger quantity of image samples than before.

Returning to FIG. 1, the facial expression identification processing unit 114 in this embodiment includes an identification model construction unit 114a and a facial expression determination unit 114b. Of these, the identification model construction unit 114a performs learning with the generated learning image group and constructs a (facial expression) identification model. Each image sample in the learning image group carries the correct-answer information that the original image had in the first place. The identification model construction unit 114a refers to this correct-answer information and checks against it as learning proceeds.

The identification model constructed here can be, for example, a classifier that includes a convolutional neural network (CNN), which is a kind of deep learning. Naturally, the identification model constructed with the learning image group according to the present invention is not limited to a classifier including a CNN, and may be a classifier based on another machine learning method capable of state identification. Furthermore, as an identification model for determining facial expressions, it is also possible to adopt a method that generates feature amounts for parts such as the eyes, nose, mouth, and eyebrows in the face image region and determines the expression from their similarity to the feature amounts of a reference image region.

The facial expression determination unit 114b, on the other hand, inputs the generated determination image group into the constructed identification model, determines the state (facial expression) of the identification target in the changed images included in the determination image group and in the identification target image (original image), and outputs a determination state (for example, 1 if the expression is positive, 0 if neutral, and -1 if negative) together with a determination value (score). Either the identification model construction unit 114a or the facial expression determination unit 114b may perform its processing without the generated image group, using instead an ordinary large set of learning images or the identification target image as is.

FIG. 5 is a schematic diagram showing an embodiment of the identification model constructed by the identification model construction unit 114a.

As shown in FIG. 5, the identification model constructed by the identification model construction unit 114a in this embodiment is based on a CNN, which is a kind of feedforward network. This CNN contains a plurality of convolution layers. A convolution layer mimics the function of simple cells in the animal visual cortex and performs convolution processing, in which a kernel (a weighting matrix filter) is slid over the image to generate a feature map. Through this convolution processing, basic features such as edges and gradients can be extracted while the image resolution is reduced step by step, yielding information on local correlation patterns.
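The sliding-kernel operation described above can be sketched as follows. This is an illustrative example only, assuming a single-channel image held as a 2-D list, no padding, and a stride of 1; the function name `convolve2d` is hypothetical. As in most CNN implementations, the kernel is applied without flipping (strictly speaking a cross-correlation).

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Weighted sum of the image patch whose top-left corner is (i, j).
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 1x2 kernel that responds to a dark-to-bright transition (a vertical edge).
edge_kernel = [[-1, 1]]
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
feature_map = convolve2d(image, edge_kernel)
```

The feature map is nonzero only where the kernel straddles the edge, which is the kind of local pattern a convolution layer extracts.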

In this CNN, each convolution layer is preferably paired with a pooling layer (subsampling layer), and convolution processing and pooling processing are performed repeatedly. Pooling processing mimics the function of complex cells in the animal visual cortex. It summarizes the feature map output from the convolution layer (the responses of the convolution filter within a given region) by its maximum value, mean value, or the like, which reduces the number of parameters to be tuned while securing local translation invariance. This absorbs differences in appearance caused by slight shifts in the image and makes it possible to obtain appropriate feature amounts that capture the essential features.
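The local translation invariance obtained by pooling can be seen in a small sketch. Max pooling over non-overlapping 2x2 windows is assumed here for illustration; the embodiment also allows mean pooling, and the function name `max_pool` is hypothetical.

```python
def max_pool(feature_map, size=2):
    """Summarize each non-overlapping size x size window by its maximum value."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, w - size + 1, size)
        ]
        for i in range(0, h - size + 1, size)
    ]

# The strong response (9) sits at slightly different positions in the two maps.
map_a = [
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
]
map_b = [
    [0, 0, 0, 0],
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
]
pooled_a = max_pool(map_a)
pooled_b = max_pool(map_b)
```

Both maps pool to the same 2x2 result because the shifted response stays inside the same window, which is how a slight displacement in the image is absorbed.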

The learning image group generation unit 113b outputs the generated learning image group to the identification model construction unit 114a so that this CNN can be trained. Specifically, the image samples contained in the large learning image group generated from many original images are input; the response of the multilayer network formed by several of the CNN's layers, excluding the final layer, is output as a feature amount; and this output is checked against the correct answer to generate and update the neuron connection weights, the network configuration parameters, and the like. Learning proceeds in this way.

Ordinarily, such CNN training uses a large-scale image data set consisting of a large number of identification target images accumulated in, for example, a general image database. In this embodiment, by contrast, the learning image group is generated from the original images before being input to the CNN, so the large number of image samples that is required can be prepared from fewer images (original images) than the number of learning samples normally needed.

Returning to FIG. 1, the facial expression determination unit 114b of the facial expression identification processing unit 114 inputs the generated determination image group into the constructed identification model and determines the state (facial expression) of the identification target in the changed images included in the determination image group and in the identification target image (original image). Consider the case where the constructed facial expression identification model has been trained with correct-answer information, attached to the learning image group, that takes values corresponding to three expression categories, for example positive, neutral, and negative (values 1, 0, and -1). In this case, the determination state for each image in the facial expression determination unit 114b (for example, 1 if the expression is positive, 0 if neutral, and -1 if negative) is set to the state of whichever of the three categories has the highest determination value (score). The determination value (score) is an index of the confidence in the corresponding category and is preferably normalized so that the values sum to 1.
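The per-image decision described above can be sketched as follows. The use of a softmax to normalize the scores so that they sum to 1 is an assumption made for illustration (the text only requires normalization), and the names `LABELS`, `softmax`, and `decide` are hypothetical.

```python
import math

# Category values as given in the text: positive 1, neutral 0, negative -1.
LABELS = {"positive": 1, "neutral": 0, "negative": -1}

def softmax(raw_scores):
    """Normalize raw model outputs into scores that sum to 1 (assumed scheme)."""
    peak = max(raw_scores.values())
    exps = {cat: math.exp(v - peak) for cat, v in raw_scores.items()}
    total = sum(exps.values())
    return {cat: e / total for cat, e in exps.items()}

def decide(scores):
    """Return (determination state, score) for the category with the highest score."""
    best = max(scores, key=scores.get)
    return LABELS[best], scores[best]

# Hypothetical raw outputs of the identification model for one image.
scores = softmax({"positive": 2.0, "neutral": 1.0, "negative": 0.0})
state, confidence = decide(scores)
```

The determination state is the value of the winning category, and its normalized score serves as the confidence for that category.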

Using the constructed identification model, the facial expression identification processing unit 114 determines the state (facial expression) for each image included in the generated determination image group, outputs the determination state (for example, 1 if the expression is positive, 0 if neutral, and -1 if negative) and the determination value corresponding to each category, and performs the final state identification on the basis of these determination values (scores). For example, the mean of the determination values (scores) determined for the individual images of the determination image group is preferably used as the state identification result.

As a modification,
(a) according to whether the image whose state has been determined is the original image before the change,
mutually different "weights" may be set in advance and used to take a weighted average of the determination values (scores) determined for the individual images of the determination image group, and this weighted average may be used as the state identification result. For example, by giving the determination value of the original image a larger weight than those of the other images (changed images) in the weighted averaging, the determination result for the original image can be reflected more strongly in the final state identification result.

As a further modification,
(b) according to the type that was changed in the image,
mutually different "weights" may be set in advance and used to take a weighted average of the determination values (scores) determined for the individual images of the determination image group, and this weighted average may be used as the state identification result. For example, if the facial expression determination result is found to be affected most strongly by the orientation and tilt of the face, the determination values of changed images altered with respect to the face orientation/tilt appearance type may be given a larger weight in the weighted averaging than those of changed images altered with respect to other appearance types. This makes it possible to obtain a state identification result that takes the strong influence of face orientation and tilt into account.

It is also preferable to set "weights" based on both (a) and (b) above and to derive the state identification result with these weights. It is further possible to determine the final state identification result by applying an operation other than (weighted) averaging, for example a majority vote, to the determination values of the individual images.
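The weighted averaging of modifications (a) and (b) and the majority-vote alternative can be sketched as follows. The data layout, the function names, and the concrete weight values are hypothetical: each result is assumed to be a pair of the changed appearance type (`None` for the original image) and a dictionary of per-category scores.

```python
from collections import Counter

def weighted_average(results, original_weight=2.0, type_weights=None):
    """Average per-category scores, weighting by original/changed and by changed type."""
    type_weights = type_weights or {}
    totals, weight_sum = {}, 0.0
    for changed_type, scores in results:
        if changed_type is None:            # (a): the unchanged original image
            w = original_weight
        else:                               # (b): weight by the type that was changed
            w = type_weights.get(changed_type, 1.0)
        weight_sum += w
        for category, score in scores.items():
            totals[category] = totals.get(category, 0.0) + w * score
    return {category: t / weight_sum for category, t in totals.items()}

def majority_vote(results):
    """Alternative to averaging: each image votes for its top-scoring category."""
    votes = Counter(max(scores, key=scores.get) for _, scores in results)
    return votes.most_common(1)[0][0]

# Hypothetical scores for the original image and two changed images.
results = [
    (None, {"positive": 0.6, "negative": 0.4}),
    ("orientation", {"positive": 0.2, "negative": 0.8}),
    ("brightness", {"positive": 0.7, "negative": 0.3}),
]
favor_original = weighted_average(results)
favor_orientation = weighted_average(
    results, original_weight=1.0, type_weights={"orientation": 3.0})
voted = majority_vote(results)
```

With the original image weighted more heavily the averaged result leans positive, whereas weighting the orientation-changed image more heavily turns it negative, which shows how the choice of weights shapes the final identification result.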

The state identification result (and the state determination results) described above may be output directly to, for example, the application 121 and used there, or may preferably be stored in the determination result storage unit 105 and later read out and used as appropriate by, for example, the application 121. The application 121 can use the acquired state identification result (and state determination results) as, for example, facial expression judgment data in a predetermined application program. In addition, the result of this use (processing result) and the underlying state identification result (and state determination results) may be displayed on the touch panel display 106 via the input/output control unit 122, or may be transmitted to an external information processing device through the communication interface unit 101.

[State identification system]
FIG. 6 is a schematic diagram showing an embodiment of a state identification system according to the present invention.

As shown in FIG. 6, the state identification system of this embodiment includes a facial expression identification preparation device 3 serving as a server and a smartphone 5 serving as a state identification device according to the present invention.

Of these, the facial expression identification preparation device 3 includes an appearance degree determination unit 312, an image group generation unit 313, and a facial expression identification processing unit 314 that includes an identification model construction unit 314a. These functional components have the same functions and configurations as the components of the same names in the smartphone 1 shown in FIG. 1. The facial expression identification preparation device 3 can therefore generate a learning image group using, for example, learning images acquired from an external image management server 2 as original images, construct an identification model with the generated learning image group, and transmit the constructed identification model (its information) to, for example, the smartphone 5. Because the facial expression identification preparation device 3 of this embodiment generates the learning image group used to train the identification model, it can itself be regarded as a state identification device according to the present invention.

The smartphone 5, on the other hand, includes a communication interface unit 501, a camera 502, a touch panel display 503, an image management unit 511, an appearance degree determination unit 512, an image group generation unit 513, a facial expression identification processing unit 514 that includes a facial expression determination unit 514b, an application 521, and an input/output control unit 522. These functional components have the same functions and configurations as the components of the same names in the smartphone 1 shown in FIG. 1. The smartphone 5 can therefore, for example, receive the constructed identification model (its information) from the facial expression identification preparation device 3, generate a determination image group using an identification target image captured by the camera 502 as the original image, identify the state from the generated determination image group with the acquired identification model, and use the state identification result in the application 521.

As described above, in this embodiment the identification model is constructed (trained) by the facial expression identification preparation device 3. The smartphone 5 therefore need not acquire a large number of image samples from the image management server 2 or elsewhere, nor carry out the computationally heavy identification model construction processing. As a result, both the amount of information processing executed in the smartphone 5 and the amount of information it must take in are far smaller. In other words, the smartphone 5 can achieve suitable facial expression identification with the size and processing capability of a mobile terminal.

In yet another embodiment, the smartphone 5 may include none of the appearance degree determination unit 512, the image group generation unit 513, and the facial expression identification processing unit 514; the facial expression identification preparation device 3 includes all of these functional components, and the smartphone 5 acquires the state identification result (its information) from the facial expression identification preparation device 3 and uses it. In such an embodiment, the facial expression identification preparation device 3 functions as the state identification device according to the present invention.

The terminal that benefits from the facial expression identification result output by the server (the facial expression identification preparation device 3) described above is, of course, not limited to a smartphone. It may be, for example, a tablet computer, a notebook computer, a PC, or a thin client terminal suited to use in an IoT (Internet of Things) environment, and terminals of various other forms, such as a set-top box (STB), digital signage, or a robot, can also be adopted.

As described in detail above, the present invention generates an image group that includes images produced by processing that changes the appearance of the identification target while maintaining its state. This makes it possible to reliably prepare more images of the identification target, with a greater variety of appearances, than before. As a result, even for an image of an identification target that appears in various ways under various conditions, the influence of those conditions and appearances can be adjusted for, and the state of the identification target can be identified with higher accuracy.

Specifically, the present invention addresses, for example, the conventional problem that the identification accuracy of an identification model remains low because few learning image samples are available. By increasing the number of learning image samples (and the number of image patterns), an identification model with high identification accuracy (and high robustness to conditions and appearance) can be constructed. It is also possible, when identifying a state with the identification model, to increase the number of input images (and image patterns) and thereby improve identification accuracy (and robustness to conditions and appearance).

By using the highly accurate facial expression identification results achieved with the present invention, application programs that provide a variety of services can also be developed. One example is a conversational AI application that uses the facial expression identification result to understand the emotion (utterance intention) of the terminal user it is talking with, and then adjusts its responses or personalizes the content of the conversation with that user.

Those skilled in the art can readily make various changes, modifications, and omissions to the embodiments of the present invention described above within the scope of its technical idea and viewpoint. The above description is given by way of example only and is not intended to impose any limitation. The present invention is limited only by the claims and their equivalents.

1, 5 Smartphone (state identification device)
101, 501 Communication interface unit
102, 502 Camera
103 Image database
104 Changed image storage unit
105 Determination result storage unit
106, 503 Touch panel display (TP/DP)
111, 511 Image management unit
112, 312, 512 Appearance degree determination unit
113, 313, 513 Image group generation unit
113a Image change unit
113b Learning image group generation unit
113c Determination image group generation unit
114, 314, 514 Facial expression identification processing unit
114a, 314a Identification model construction unit
114b, 514b Facial expression determination unit
121, 521 Application
122, 522 Input/output control unit
2 Image management server
3 Facial expression identification preparation device

Claims

A state identification device that identifies the state of an identification target with an identification model, using an image of the identification target, the device comprising:
image group generation means for applying, to one input image of the identification target, processing that changes the appearance of the identification target in the one image while maintaining its state, and for generating an image group that includes the image generated by this processing; and
identification processing means for using the generated image group when constructing the identification model and/or when identifying the state using the identification model.

The state identification device according to claim 1, wherein the image group generation means performs on the one image, for at least one type related to the appearance, at least a change in the direction opposite to the direction in the one image.

The state identification device according to claim 1 or 2, wherein the image group generation means performs on the one image, for at least one type related to the appearance, at least a change in a direction that is preset for the at least one type and is suitable for identification.

The state identification device according to any one of claims 1 to 3, wherein the image group generation means performs on the one image changes relating to two or more of a plurality of types related to the appearance.

The state identification device according to claim 1, wherein the image group generation means performs on the one image, for at least one type related to the appearance, a change in a direction preset for the at least one type, and generates an image group for constructing an identification model characterized by that direction, and
the identification processing means constructs the identification model using the generated image group.

The state identification device according to claim 1, wherein the image group generation means performs on the one image, for at least one type related to the appearance, changes in all directions possible for the at least one type, and generates an image group for constructing an identification model in which variation in the identification results across the directions for the at least one type is suppressed, and
the identification processing means constructs the identification model using the generated image group.

The state identification device according to claim 1, wherein the image group generation means makes, on one image associated with one state as a correct answer, for at least one type related to the appearance, as many changes as the number of changes preset for the one state, thereby generating the image group with a predetermined number of samples.

The state identification device according to claim 7, wherein, when the number of image samples associated with the one state is smaller than the number of image samples associated with another state, the image group generation means makes a larger number of changes on an image associated with the one state than on an image associated with the other state.
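The relationship between sample counts and the number of changes can be sketched as follows. This is only one possible rule, assumed here for illustration: each original image yields itself plus `n` changed images, and `n` is chosen so that every state reaches a common target. The function name `changes_per_image`, the target value, and the sample counts are hypothetical.

```python
import math


def changes_per_image(sample_counts, target_samples):
    """Number of changed images to generate from each original, per state.

    Assumed rule: each original yields itself plus n changed copies, so a
    state with `count` originals reaches the target when
    count * (1 + n) >= target_samples. Counts are assumed to be positive.
    """
    return {
        state: max(0, math.ceil(target_samples / count) - 1)
        for state, count in sample_counts.items()
    }


counts = {"happy": 1000, "surprised": 250}  # made-up sample counts
plan = changes_per_image(counts, target_samples=1000)
print(plan)  # the state with fewer samples receives more changes per image
```

With these made-up counts, the state with fewer samples receives three changes per image and the well-represented state receives none, so both end up with the same number of samples.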

The state identification device according to any one of claims 1 to 8, wherein the identification processing means uses the generated image group both when constructing the identification model and when identifying the state using the identification model, and
when generating an image group to be used for identifying the state, the image group generation means makes, on one input image, for any type related to the appearance of the identification target, a change in the same direction as the change made for that type when the image group used to construct the identification model was generated, and makes no change for that type if no change was made for it when that image group was generated.
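A minimal sketch of this consistency between model construction and identification follows. It assumes that the changes applied during construction are recorded as a list of type names and that the same list drives image group generation at identification time. The change functions, the `CHANGES` table, and `TYPES_USED_FOR_CONSTRUCTION` are hypothetical names introduced for illustration.

```python
def brighten(image, offset=50):
    """Brightness change (assumed direction: brighter), clamped to 0..255."""
    return [[min(255, p + offset) for p in row] for row in image]


def mirror(image):
    """Orientation change by horizontal mirroring."""
    return [list(reversed(row)) for row in image]


# Hypothetical table of available changes, keyed by appearance type.
CHANGES = {"brightness": brighten, "orientation": mirror}

# Types (and hence directions) that were changed when the model was constructed.
TYPES_USED_FOR_CONSTRUCTION = ["brightness"]


def make_group(image, types_used):
    """Original image plus one changed image per type in `types_used`."""
    return [image] + [CHANGES[t](image) for t in types_used]


input_image = [[10, 200]]
identification_group = make_group(input_image, TYPES_USED_FOR_CONSTRUCTION)
print(identification_group)
```

Because the orientation was not changed during construction in this sketch, no mirrored image is generated for identification either.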

The state identification device according to any one of claims 1 to 9, wherein the identification processing means determines the state of each image included in the generated image group using the identification model, computes a weighted average of the determined states using weights preset according to whether the image is the original image before the change and/or weights preset according to the type changed in the image, and uses the weighted average result as the identification result of the state.
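The weighted averaging described above can be sketched as follows, assuming that the identification model returns a score per state for each image in the group. The scores, the weights, and the function name `weighted_state` are made up for illustration; the specification does not fix their values.

```python
def weighted_state(scores_per_image, weights):
    """Weighted average of per-image state scores, then pick the top state."""
    total = sum(weights)
    states = scores_per_image[0].keys()
    averaged = {
        s: sum(w * scores[s] for scores, w in zip(scores_per_image, weights)) / total
        for s in states
    }
    return max(averaged, key=averaged.get), averaged


# Made-up model outputs for an image group of three images.
scores = [
    {"happy": 0.6, "neutral": 0.4},  # original image before the change
    {"happy": 0.3, "neutral": 0.7},  # brightness changed
    {"happy": 0.8, "neutral": 0.2},  # orientation changed
]
weights = [2.0, 1.0, 1.0]  # assumed: the original image counts double

label, averaged = weighted_state(scores, weights)
print(label, averaged)
```

Giving the original image a larger weight is one possible choice; weights per changed type could be set in the same list.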

The state identification device according to claim 1, wherein the image group generation means generates the image group so that it includes the one input image related to the identification target.

The state identification device further comprising degree determination means for extracting, from one input image related to the identification target, an image region related to the identification target, and for determining, based on the image region, the presence or absence and/or the degree of at least one type related to the appearance,
wherein the image group generation means makes the change by altering the presence or absence and/or the degree determined for the at least one type.

The state identification device, wherein the image group generation means changes, as the plurality of types related to the appearance, at least one of: the degree to which the identification target appears in the one image; the orientation or tilt of the identification target; the presence or absence, or the type, of an object that occludes the identification target; the brightness of the identification target; the hue of the identification target; and the degree of contrast between the identification target and the background.
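Two of the listed types, brightness and contrast, can be sketched on a small grayscale image as follows. The pixel representation (rows of 0..255 values), the helper names, and the offset and factor values are assumptions made for illustration, not values taken from the specification.

```python
def clamp(value):
    """Keep a pixel value inside the 0..255 range."""
    return max(0, min(255, int(round(value))))


def change_brightness(image, offset):
    """Shift every pixel by `offset` (positive = brighter)."""
    return [[clamp(p + offset) for p in row] for row in image]


def change_contrast(image, factor):
    """Scale each pixel's distance from the mean (factor > 1 = more contrast)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[clamp(mean + factor * (p - mean)) for p in row] for row in image]


dark = [[10, 20], [30, 40]]  # made-up dark image
group = [dark, change_brightness(dark, 100), change_contrast(dark, 2.0)]
print(group)
```

Neither change moves pixels, so the shape of the target, and therefore its state, is left as it was.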

The state identification device according to claim 1, wherein the identification target is a face and the state is a facial expression,
the image group generation means performs, on the one input image related to the identification target, a process that changes the appearance of the face in the one image while maintaining the facial expression, and
the identification processing means identifies the facial expression in the one input image using the identification model.

A state identification system for identifying a state of an identification target by an identification model using an image related to the identification target, the state identification system comprising:
image group generation means for performing, on one input image related to the identification target, a process that changes the appearance of the identification target in the one image while maintaining the state, and for generating an image group including an image generated by the process;
identification model construction means for constructing the identification model using the generated image group; and
identification processing means for identifying the state by the identification model using the generated image group.

A state identification program for causing a computer, mounted on a device that identifies a state of an identification target by an identification model using an image related to the identification target, to function as:
image group generation means for performing, on one input image related to the identification target, a process that changes the appearance of the identification target in the one image while maintaining the state, and for generating an image group including an image generated by the process; and
identification processing means for using the generated image group when the identification model is constructed and/or when the state is identified using the identification model.

A state identification method implemented in a computer mounted on a device that identifies a state of an identification target by an identification model using an image related to the identification target, the state identification method comprising:
a step of performing, on one input image related to the identification target, a process that changes the appearance of the identification target in the one image while maintaining the state, and generating an image group including an image generated by the process; and
a step of using the generated image group when the identification model is constructed and/or when the state is identified using the identification model.