JP4641389B2 - Information processing method and information processing apparatus

Info

Publication number: JP4641389B2
Application number: JP2004166135A
Authority: JP
Inventors: 克彦森; 優和真継; 美絵石井; 裕輔御手洗
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2004-06-03
Filing date: 2004-06-03
Publication date: 2011-03-02
Anticipated expiration: 2024-06-03
Also published as: JP2005346471A

Description

The present invention relates to a technique for detecting a user's reactions, such as facial expressions and actions, and recognizing the user's emotion from the detected reactions.

In recent years, research has advanced on artificial agents that serve as an interface between machines and humans by recognizing human intentions and emotions and communicating with people. Various pet robots have also appeared, some of which recognize the user's behavior and emotions and change their own behavior according to the recognition result; such a pet robot can be regarded as a kind of artificial agent. An artificial agent of this kind needs to detect the state of a person using a camera or a microphone and, based on the detected state, recognize the person's emotions, such as joy, anger, sorrow, and pleasure.

For example, apparatuses that detect a facial expression from an image captured with a CCD camera or the like have been disclosed (see, for example, Patent Document 1). Such an apparatus applies a wavelet transform to the input image and detects the expression from the difference between the average power in each frequency band and the average power for an expressionless face.

Techniques that use both sound and images to detect the facial expression and emotion of a subject have also been disclosed (see, for example, Non-Patent Document 1).

Furthermore, a technique has been disclosed that uses feature vectors corresponding to facial displacement together with codebooks for vector quantization prepared for each expression category, and determines the category based on the symbol sequence obtained after vector quantization (see, for example, Patent Document 2).

In addition, a technique has been disclosed that examines classifying a smile image as a pleasant smile, an unpleasant smile, or a social smile based on variations in the position and size of the eyes, mouth, and eyebrows (see, for example, Non-Patent Document 2).

Patent Document 1: JP-A-8-249447
Patent Document 2: Japanese Patent No. 2839855
Non-Patent Document 1: Lawrence S. Chen, Thomas S. Huang, Tsutomu Miyasato, Ryohei Nakatsu: "Multimodal Human Emotion/Expression Recognition", Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, pp. 366-371 (1998)
Non-Patent Document 2: Kataoka, Iwaguchi, Sagi: "Classification of smiles by tracking eye and mouth movements", Information Processing Society of Japan research report, 2001-HI-95, pp. 109-116 (2001)

As described above, research is under way on detecting a person's state (reaction) with a camera or microphone and recognizing the person's emotions, such as joy, anger, sorrow, and pleasure, from the detected state (reaction). However, accurately associating the detected state (reaction) with an emotion is not easy.

Taking facial expressions as one kind of reaction, some people are highly expressive and others are not. In other words, people who feel the same emotion do not all show the same expression. Conversely, people who show the same expression do not necessarily feel the same emotion. The correspondence between a person's state (reaction), such as a facial expression, and emotion therefore needs to be set for each individual. The conventional examples above make no mention of such differences among individual users.

The present invention has been made in view of the above problems, and its object is to provide a technique for easily setting, for each individual, the correspondence between a person's state (reaction), typified by facial expression, and emotion.

To achieve this object, an information processing apparatus of the present invention has, for example, the following configuration.

That is, an information processing apparatus for reproducing movie data, characterized by comprising:
storage holding means for storing, in advance and in association with each scene of the movie, emotion information indicating an emotion that a person viewing the scene is expected to have;
reproduction means for reproducing the movie data;
imaging means for capturing an image of a user viewing the movie data;
estimation means for estimating, using a preset recognition model, the emotion of the user while viewing a scene of the movie, based on a feature amount of the user obtained from the image; and
correction means for correcting parameters of the recognition model so that the emotion estimated by the estimation means while the user is viewing the scene is estimated as the emotion indicated by the emotion information stored in the storage holding means in association with that scene.

That is, an information processing apparatus for playing a game, characterized by comprising:
storage holding means for storing, in advance and in association with each piece of information indicating a degree of progress of the game, emotion information indicating an emotion that a person playing the game is expected to have at that degree of progress;
imaging means for capturing an image of a user playing the game;
monitoring means for monitoring the progress of the game being played by the user;
estimation means for estimating, using a preset recognition model, the emotion of the user at each degree of progress of the game, based on a feature amount of the user obtained from the image; and
correction means for correcting parameters of the recognition model so that the emotion estimated by the estimation means for each degree of progress of the game is estimated as the emotion indicated by the emotion information held in the storage holding means in association with the information indicating that degree of progress.

To achieve this object, an information processing method of the present invention has, for example, the following configuration.

That is, an information processing method performed by an information processing apparatus that reproduces movie data, characterized by comprising:
a storage holding step in which storage holding means of the information processing apparatus stores in a storage holding unit, in advance and in association with each scene of the movie, emotion information indicating an emotion that a person viewing the scene is expected to have;
a reproduction step in which reproduction means of the information processing apparatus reproduces the movie data;
an imaging step in which imaging means of the information processing apparatus captures an image of a user viewing the movie data;
an estimation step in which estimation means of the information processing apparatus estimates, using a preset recognition model, the emotion of the user while viewing a scene of the movie, based on a feature amount of the user obtained from the image; and
a correction step in which correction means of the information processing apparatus corrects parameters of the recognition model so that the emotion estimated in the estimation step while the user is viewing the scene is estimated as the emotion indicated by the emotion information stored in the storage holding unit in association with that scene.

That is, an information processing method performed by an information processing apparatus that plays a game, characterized by comprising:
a storage holding step in which storage holding means of the information processing apparatus stores in a storage holding unit, in advance and in association with each piece of information indicating a degree of progress of the game, emotion information indicating an emotion that a person playing the game is expected to have at that degree of progress;
an imaging step in which imaging means of the information processing apparatus captures an image of a user playing the game;
a monitoring step in which monitoring means of the information processing apparatus monitors the progress of the game being played by the user;
an estimation step in which estimation means of the information processing apparatus estimates, using a preset recognition model, the emotion of the user at each degree of progress of the game, based on a feature amount of the user obtained from the image; and
a correction step in which correction means of the information processing apparatus corrects parameters of the recognition model so that the emotion estimated in the estimation step for each degree of progress of the game is estimated as the emotion indicated by the emotion information held in the storage holding unit in association with the information indicating that degree of progress.

With the configuration of the present invention, the correspondence between a person's state (reaction), typified by facial expression, and emotion can be easily set for each individual.

Hereinafter, the present invention will be described in detail according to preferred embodiments with reference to the accompanying drawings.

[First Embodiment]
FIG. 1 is a diagram illustrating the functional configuration of the information processing system according to the present embodiment.

In the figure, reference numeral 100 denotes a user; the following describes the processing for setting the correspondence between this user's facial expressions and emotions. The system according to this embodiment comprises a camera 101, a microphone 102, a control unit 103, a reaction measurement unit 104, an emotion recognition unit 105, a stimulus data presentation unit 106, an emotion data holding unit 107, an emotion model correction unit 108, and an emotion model 109.

Each unit constituting the system is briefly described below.

The camera 101 captures the user as a moving image. The microphone 102 collects the user's voice. The image information and audio information obtained by the camera 101 and the microphone 102 are output to the reaction measurement unit 104 in the subsequent stage.

The control unit 103 controls each unit of the system shown in the figure. The reaction measurement unit 104 measures the reaction of the user 100 using the image information obtained from the camera 101 or the audio information obtained from the microphone 102. A reaction here means, for example, the position and size of the eyes, mouth, and so on in the face of the user 100, or the loudness of the user's voice.

The emotion recognition unit 105 recognizes the emotion of the user 100 using the information indicating the reaction of the user 100 measured by the reaction measurement unit 104, together with the emotion model 109. The recognition method is described later.

The stimulus data presentation unit 106 presents (reproduces) images and sound according to stimulus data to the user 100. The stimulus data is associated with data (information) indicating the emotion that a typical user is expected to feel when presented with the images and sound according to that stimulus data. The stimulus data is described in detail later. The emotion data holding unit 107 holds the stimulus data and the data indicating the emotions associated with it.

When the stimulus data presentation unit 106 presents stimulus data to the user 100, the emotion model correction unit 108 corrects the parameters of the emotion model 109 so that the emotion recognized by the emotion recognition unit 105 is recognized as the emotion associated with the presented stimulus data. The emotion model 109 is a model that takes the information from the reaction measurement unit 104 as input and recognizes the emotion of the user 100 based on that input.
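The interplay of the units described above (present a stimulus, measure the reaction, recognize an emotion, correct the model when the result differs from the associated emotion) can be sketched as a short loop. The Python sketch below is illustrative only. `Scene`, `PrototypeModel`, `calibrate`, and the nearest-prototype model are hypothetical stand-ins chosen for brevity; they are not the emotion model of this document, which is described later.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Scene:
    name: str
    expected_emotion: str  # emotion a typical viewer is assumed to feel


@dataclass
class PrototypeModel:
    """Toy stand-in for emotion model 109: one prototype vector per emotion."""
    prototypes: Dict[str, List[float]]

    def recognize(self, features: Sequence[float]) -> str:
        # Return the emotion whose prototype is closest to the measured features.
        def sq_dist(emotion: str) -> float:
            return sum((p - f) ** 2
                       for p, f in zip(self.prototypes[emotion], features))
        return min(self.prototypes, key=sq_dist)

    def correct(self, features: Sequence[float], emotion: str,
                rate: float = 0.5) -> None:
        # Move the prototype of the expected emotion toward this user's reaction.
        old = self.prototypes[emotion]
        self.prototypes[emotion] = [p + rate * (f - p)
                                    for p, f in zip(old, features)]


def calibrate(scenes: Sequence[Scene],
              measure: Callable[[Scene], Sequence[float]],
              model: PrototypeModel,
              passes: int = 10) -> None:
    """Present each scene, measure the reaction, and correct on mismatch."""
    for _ in range(passes):
        for scene in scenes:
            features = measure(scene)  # role of reaction measurement unit 104
            if model.recognize(features) != scene.expected_emotion:
                model.correct(features, scene.expected_emotion)  # role of unit 108
```

In this sketch the model is only touched when the recognized emotion disagrees with the emotion associated with the scene, so a user whose reactions already match the initial model leaves it unchanged.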

FIG. 7 is a diagram illustrating the hardware configuration of the system according to the present embodiment. In this embodiment, the camera 101 and the microphone 102 are connected to a computer such as a PC (personal computer) or WS (workstation), and the series of recognition processes and the parameter correction process for the emotion model 109 are performed on this computer.

Reference numeral 701 denotes a CPU, which controls the entire computer using programs and data stored in the RAM 702 and the ROM 703, and also receives, via the I/F 707, the data of each frame input from the camera 101 and the analog signal input from the microphone 102. It also performs the series of recognition processes and the parameter correction process for the emotion model 109 described later. The CPU 701 functions as the control unit 103 in FIG. 1.

Reference numeral 702 denotes a RAM, which has an area for temporarily storing programs and data that are stored in the external storage device 706 and loaded under the control of the CPU 701, as well as a work area that the CPU 701 uses to perform various processes. It also has an area for temporarily storing the image information and audio information input from the camera 101 and the microphone 102 via the I/F 707.

Reference numeral 703 denotes a ROM, which stores programs, data, and the like for starting up the computer.

Reference numeral 704 denotes an operation unit, which comprises a keyboard, a mouse, and the like, and is used to input various instructions to the CPU 701.

Reference numeral 705 denotes a display unit, which comprises a CRT, a liquid crystal screen, or the like, and can display images, text, and so on. The display unit 705 can also output sound. The display unit 705 functions as the stimulus data presentation unit 106 in FIG. 1.

Reference numeral 706 denotes an external storage device, which functions as a large-capacity information storage device such as a hard disk drive. It can store the OS (operating system) as well as the programs and data that cause the CPU 701 to execute the series of processes described later. In this embodiment, the reaction measurement unit 104, the emotion recognition unit 105, the emotion model 109, and the emotion model correction unit 108 are all implemented as programs and data, stored in the external storage device 706, and loaded into the RAM 702 under the control of the CPU 701. The external storage device 706 also functions as the emotion data holding unit 107 in FIG. 1.

Reference numeral 707 denotes an I/F to which the camera 101 and the microphone 102 can be connected; the data of each frame input from the camera 101 and the analog signal input from the microphone 102 are received via this I/F 707. The I/F 707 is assumed to contain an A/D converter for converting analog signals from the camera 101 and the microphone 102 into digital data.

Reference numeral 708 denotes a bus connecting the above-described units.

The emotion recognition process performed by this computer, and the process of correcting the parameters of the emotion model 109 used to recognize the emotion of the user 100, are described in detail below with reference to FIG. 3, which shows a flowchart of these processes.

In this embodiment, what the display unit 705 presents to the user 100 is a movie. That is, the stimulus data is movie data, and therefore consists of the image data of each frame of the movie and the audio data. In addition, for each scene of the movie, the stimulus data is associated with information indicating the emotion that a person watching the scene is (generally) expected to feel.

Using a movie as stimulus data in this way has several advantages. One is that many people can have the same experience, which makes it easy to compare the average reaction with an individual's idiosyncratic reaction. If per-individual settings were instead based on experiences in daily life, comparison would be difficult because the circumstances are rarely exactly the same; using a movie avoids this problem.

Moreover, a movie generally contains many scenes, so watching it from start to finish yields reactions for all of the basic emotions. Furthermore, because there are multiple scenes in which the user feels the same emotion, the emotion model is not built from a single, possibly atypical scene; it is built from the user's average reaction over multiple scenes that evoke the same emotion, which yields a more accurate emotion model.
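The idea of using the average reaction over several scenes with the same emotion can be illustrated with a few lines of Python. This is a minimal sketch under assumptions: the function name `mean_reaction_per_emotion` and the representation of a reaction as a plain list of numbers are hypothetical and do not come from this document.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def mean_reaction_per_emotion(
    samples: Iterable[Tuple[str, Sequence[float]]]
) -> Dict[str, List[float]]:
    """Average the feature vectors of all scenes that share an emotion label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for emotion, features in samples:
        if emotion not in sums:
            sums[emotion] = [0.0] * len(features)
            counts[emotion] = 0
        # Accumulate element-wise so each feature is averaged separately.
        sums[emotion] = [s + f for s, f in zip(sums[emotion], features)]
        counts[emotion] += 1
    return {e: [s / counts[e] for s in total] for e, total in sums.items()}
```

Averaging in this way reduces the influence of any single scene in which the user happened to react atypically.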

FIG. 2 is a diagram illustrating a configuration example of an emotion data sheet that associates data indicating an emotion with each scene. For example, statistics show that many people feel "anger" when watching the scene from the frame at 10 minutes 21 seconds to the frame at 12 minutes 20 seconds after the start of the movie, so, as shown in the figure, data indicating the emotion "anger" is associated with that scene. The same applies to the other scenes.
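An emotion data sheet of this kind amounts to a table of time ranges and emotion labels. The following Python sketch shows one possible representation and lookup. The names `SceneEmotion`, `EMOTION_SHEET`, and `expected_emotion` are hypothetical, and only the first row (10:21 to 12:20, "anger") is taken from the text; the second row is made up for illustration.

```python
from typing import List, NamedTuple, Optional


class SceneEmotion(NamedTuple):
    start_s: int   # scene start, in seconds from the beginning of the movie
    end_s: int     # scene end (inclusive), in seconds
    emotion: str   # emotion a typical viewer is expected to feel


def mmss(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds timestamp to seconds."""
    return minutes * 60 + seconds


EMOTION_SHEET: List[SceneEmotion] = [
    SceneEmotion(mmss(10, 21), mmss(12, 20), "anger"),     # row from the text
    SceneEmotion(mmss(15, 0), mmss(16, 30), "happiness"),  # made-up row
]


def expected_emotion(t_s: int,
                     sheet: List[SceneEmotion] = EMOTION_SHEET) -> Optional[str]:
    """Return the emotion associated with playback time t_s, if any."""
    for row in sheet:
        if row.start_s <= t_s <= row.end_s:
            return row.emotion
    return None  # no emotion is associated with this part of the movie
```

During playback, the current position can be passed to `expected_emotion` to obtain the label against which the recognition result is compared.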

The data of such an emotion data sheet is created in advance, stored in the external storage device 706, and loaded into the RAM 702 as necessary.

The CPU 701 then loads into the RAM 702 the movie data stored in the external storage device 706, the data indicating the emotion associated with each scene, and the emotion data sheet data, reproduces the movie data, and outputs (reproduces) the image information and audio information of each frame of the movie on the display unit 705 (step S301).

The user 100 watches the presented movie images and listens to the presented audio, and the user's state (reaction) at this time is collected by the camera 101 and the microphone 102. The camera 101 captures a moving image of the user 100, and the image data of each captured frame is sequentially output to the RAM 702 via the I/F 707. Similarly, the microphone 102 collects the sound uttered by the user 100, and the collected audio signal is output to the RAM 702 via the I/F 707.

The CPU 701 measures the reaction of the user 100 watching the movie, using the image information and audio information output from the camera 101 and the microphone 102 to the RAM 702 (step S302). As described above, measuring the reaction means measuring feature amounts such as the positions and sizes in the image of the parts that make up the face of the user 100 (eyes, mouth, nose), changes in them, the positions in the image of the hands and feet and changes in those positions, and the loudness of the voice uttered by the user 100; these are measured as parameter measurement values used as input to the emotion model 109 described later. The content of these feature amounts is not particularly limited, and the methods for measuring and calculating them are well known, so further description is omitted.

The CPU 701 then recognizes (estimates) the emotion of the user 100 based on the obtained measurement values (step S303). That is, it recognizes the emotion of the user 100 who is watching and listening to the scene presented in step S301. The emotion model 109 is used for this recognition. The emotion model 109 has been loaded from the external storage device 706 into the RAM 702 in the form of a program and data, and by using it the CPU 701 can perform the recognition process described below.

As the emotion model 109, for example, a well-known neural network can be used. FIG. 4 is a diagram illustrating a configuration example of a well-known hierarchical neural network. As shown in the figure, the hierarchical neural network has a three-layer structure consisting of an input layer, an intermediate layer, and an output layer. Information input to the input layer is processed by the intermediate layer, the result is further processed by the output layer, and the final result is output from the neurons of the output layer.

The input layer has as many neurons as there are feature amounts obtained in step S302. Each neuron of the input layer therefore receives one of the feature amounts obtained in step S302 (the positions and sizes of the parts that make up the face of the user 100, such as the eyes, mouth, and nose, changes in them, the positions of the hands and feet and changes in those positions, the loudness of the voice uttered by the user 100, and so on).

Each neuron of the output layer, on the other hand, represents a different emotion. When the feature amounts obtained in step S302 are input to the neurons of the input layer, one of the neurons of the output layer fires, and the emotion assigned to the neuron that fired is taken as the recognition result. For example, if the output layer has seven neurons, each neuron is assigned exactly one of "happiness", "anger", "sadness", "disgust", "surprise", "fear", and "no emotion", without duplication. If, for example, only the neuron assigned "surprise" fires, the recognition result is "surprise". The types of emotion are not limited to these, and accordingly the number of neurons in the output layer is not limited to seven either.
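The recognition step with a three-layer network can be sketched in Python as follows. This is an illustration under assumptions, not the implementation of this document: the sigmoid activation, the function names `layer` and `recognize`, and the rule of treating the output neuron with the largest activation as the one that "fires" are all choices made for this sketch. Only the seven emotion labels come from the text.

```python
import math
from typing import List, Sequence

EMOTIONS = ["happiness", "anger", "sadness", "disgust",
            "surprise", "fear", "no emotion"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs: Sequence[float],
          weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def recognize(features: Sequence[float],
              w_hidden, b_hidden, w_out, b_out) -> str:
    """Input layer -> intermediate layer -> output layer -> emotion label."""
    hidden = layer(features, w_hidden, b_hidden)
    output = layer(hidden, w_out, b_out)
    # Assumption: the most active output neuron is the one that fires.
    winner = max(range(len(output)), key=output.__getitem__)
    return EMOTIONS[winner]
```

The number of rows in `w_hidden` sets the size of the intermediate layer, and `w_out` needs one row per emotion, so changing the set of emotions only changes the length of `EMOTIONS` and the shape of the output weights.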

As is well known, the weighting coefficients between neurons in adjacent layers are already set; they are initialized so that the network outputs the emotion of an average person when the parameter measurement values of an average person's reaction data are input. This initialization is performed by presenting the same stimulus data to a large number of people and training the network with the parameter measurement values of their reaction data as input signals to the input layer and the average emotion at that time as the teacher signal.

As a result, when the reaction data of various people is input to the neural network, only the neuron representing the emotion that most people would feel fires in the output layer.

For some people, however, inputting, for example, the facial feature amounts of an angry face into the neural network may cause the neuron assigned the emotion of sadness to fire. This happens, for instance, when the person's expression is not an average expression.

Therefore, as described in detail below, when a neural network is used as the emotion model 109, the weighting coefficients between neurons need to be corrected so that the firing pattern of the output-layer neurons obtained when the reaction data of the user 100 viewing a scene is input to the neural network matches the output-layer firing pattern that represents the emotion associated with that scene.
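One common way to move a network's output toward a desired firing pattern is a gradient step on the weights. The Python sketch below applies the delta rule to the output-layer weights only, using a one-hot target for the emotion associated with the scene. This is an assumption made for illustration: the text here does not state which learning rule is used, and the names `outputs`, `correct_output_weights`, and the learning rate `lr` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def outputs(hidden: Sequence[float],
            w_out: Sequence[Sequence[float]],
            b_out: Sequence[float]) -> List[float]:
    """Activations of the output layer for given intermediate-layer values."""
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(w_out, b_out)]


def correct_output_weights(hidden: Sequence[float],
                           w_out: Sequence[Sequence[float]],
                           b_out: Sequence[float],
                           target_index: int,
                           lr: float = 0.5
                           ) -> Tuple[List[List[float]], List[float]]:
    """One delta-rule step toward a one-hot target; returns new weights."""
    y = outputs(hidden, w_out, b_out)
    new_w, new_b = [], []
    for k, (row, b) in enumerate(zip(w_out, b_out)):
        t = 1.0 if k == target_index else 0.0
        # Gradient of the squared error through the sigmoid.
        delta = (t - y[k]) * y[k] * (1.0 - y[k])
        new_w.append([w + lr * delta * h for w, h in zip(row, hidden)])
        new_b.append(b + lr * delta)
    return new_w, new_b
```

Repeating the step for the same scene raises the activation of the target neuron and lowers the others; a full implementation would normally also propagate the error back to the weights between the input and intermediate layers.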

Next, the case where a model other than a neural network is used as the emotion model 109 is described. As such a model, an emotion model is used that specifies, for each emotion, the range in which the parameter measurement values of the reaction data of the user 100 measured in step S302 lie.

FIG. 5 is a diagram showing an example of the ranges of the reaction parameters specified by such a model. The emotion model sheet 501 shown in the figure describes, for each emotion such as the "happiness" mentioned above, the ranges in which the parameter measurement values of the reaction data lie, such as the positions of the eyes, mouth, and so on, the positions of the smaller parts that make them up, and changes in them. For example, for the emotion "happiness", the positional relationship between the center of the eye and the outer corner of the eye is specified. With the origin at the upper left of the image, the vertical length of the eye denoted Ev, and the center position of the eye denoted (Ex, Ey), the vertical position Eey of the outer corner of the eye satisfies the conditional expression
Eey ≥ Ey + a × Ev and Eey < Ey + b × Ev (a = 0.1, b = 1.0)
Because the origin is at the upper left, larger values of Eey are lower in the image, so this expression indicates that the outer corner of the eye is lowered when the emotion "happiness" is shown.

Conversely, when the parameter measurement values for the eye center position and the outer-corner position in the reaction data of the user 100 satisfy this expression, the user 100 may be in the state of “happiness”. Using this emotion model 109, the parameter measurement values obtained for each of the other components, such as the eyes and mouth, are likewise checked against the expressions describing their positional relationships, and the emotion with the largest number of satisfied expressions is recognized as the emotion of the user 100.
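The counting rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary keys, the `recognize` function, and the placeholder “sadness” condition are assumptions introduced here; only the “happiness” eye condition and its default coefficients come from the text.

```python
from typing import Callable, Dict, List

# Reaction data: measured parameter values (key names are illustrative).
Reaction = Dict[str, float]
Condition = Callable[[Reaction], bool]


def happiness_eye_condition(r: Reaction, a: float = 0.1, b: float = 1.0) -> bool:
    """Eey >= Ey + a*Ev and Eey < Ey + b*Ev (image origin at the upper left)."""
    return r["Ey"] + a * r["Ev"] <= r["Eey"] < r["Ey"] + b * r["Ev"]


def recognize(reaction: Reaction, model: Dict[str, List[Condition]]) -> str:
    """Return the emotion whose condition group has the most satisfied expressions."""
    scores = {emotion: sum(cond(reaction) for cond in conds)
              for emotion, conds in model.items()}
    return max(scores, key=scores.get)


# Hypothetical model sheet: the "sadness" condition is a placeholder,
# not a condition given in the source.
model = {
    "happiness": [happiness_eye_condition],
    "sadness": [lambda r: r["Eey"] < r["Ey"]],
}
reaction = {"Ey": 100.0, "Ev": 20.0, "Eey": 105.0}
result = recognize(reaction, model)
```

In a fuller model sheet each emotion would carry many conditions (mouth, hands, voice volume, and so on), and ties between emotions would need an explicit rule; here `max` simply returns the first emotion with the highest count.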

By defining, for each emotion, the positions and changes of the facial components and the existence ranges of hand and foot movements, voice volume, and the like, each emotion can be expressed on the model sheet. The values of the parameters (coefficients) (a, b) in the above expression are initially set so that the emotion of an average person is output when the parameter measurement values of an average person's reaction data are input. The parameters in the other relational expressions are naturally initialized in the same way.

As a result, when the reaction data of various people are applied to the conditional expressions described in a model sheet such as that shown in FIG. 5, the conditional expressions corresponding to the emotion that most people would have are, for the most part, satisfied.

しかし、人によっては、例えば怒っている場合の顔の特徴量を各条件式に代入してみても、悲しみの感情に該当する条件式を満たすようになることがある。これは、この人の表情が平均的な表情ではない場合等に生ずる。 However, depending on the person, for example, even if the facial feature amount when angry is substituted into each conditional expression, the conditional expression corresponding to the emotion of sadness may be satisfied. This occurs when the facial expression of this person is not an average facial expression.

従って以下で詳しく説明するが、感情モデル１０９として感情毎に設けた条件式群を用いる場合には、あるシーンを見たユーザ１００の反応データを感情毎に設けた条件式群に代入した場合に、このシーンに関連付けられている感情に対して設けた条件式群を最も満たすように、各条件式群におけるパラメータを修正する必要がある。 Therefore, as will be described in detail below, when a group of conditional expressions provided for each emotion is used as the emotion model 109, the reaction data of the user 100 who has seen a certain scene is substituted into the conditional expression group provided for each emotion. It is necessary to correct the parameters in each conditional expression group so that the conditional expression group provided for the emotion associated with this scene is most satisfied.

As described above, whichever model is used as the emotion model 109, in this embodiment the parameters of the emotion model 109 must be corrected so that the emotion of the user 100, recognized from the reaction data of the user 100 viewing a certain scene, is the emotion associated with that scene.

従ってステップＳ３０４では、感情モデル１０９のパラメータの修正処理を行う。 Accordingly, in step S304, the parameter of the emotion model 109 is corrected.

先ず、感情モデル１０９としてニューラルネットワークを用いた場合のステップＳ３０４における処理について説明する。 First, the process in step S304 when a neural network is used as the emotion model 109 will be described.

図８は、感情モデル１０９としてニューラルネットワークを用いた場合のステップＳ３０４における処理を説明する図である。ここでは出力層を構成するニューロンの数を３とするが、これに限定するものではない。 FIG. 8 is a diagram illustrating the process in step S304 when a neural network is used as the emotion model 109. Although the number of neurons constituting the output layer is 3 here, the number is not limited to this.

同図において出力層を構成するニューロン８０１，８０２，８０３にはそれぞれ「怒り」、「悲しみ」、「幸福」の感情を割り当てているものとする。従ってニューロン８０１のみが発火する場合（「１００」の出力パターンが出力層から出力された場合）、ニューラルネットワークは認識結果として「怒り」を出力したことになる。一方、ニューロン８０２のみが発火する場合（「０１０」の出力パターンが出力層から出力された場合）、ニューラルネットワークは認識結果として「悲しみ」を出力したことになる。一方、ニューロン８０３のみが発火する場合（「００１」の出力パターンが出力層から出力された場合）、ニューラルネットワークは認識結果として「幸福」を出力したことになる。 In the figure, it is assumed that emotions “anger”, “sadness”, and “happiness” are assigned to the neurons 801, 802, and 803 constituting the output layer, respectively. Therefore, when only the neuron 801 fires (when the output pattern of “1 0 0” is output from the output layer), the neural network outputs “anger” as the recognition result. On the other hand, when only the neuron 802 fires (when the output pattern “0 1 0” is output from the output layer), the neural network outputs “sadness” as the recognition result. On the other hand, when only the neuron 803 fires (when the output pattern of “0 0 1” is output from the output layer), the neural network outputs “happiness” as the recognition result.
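The mapping from output pattern to recognition result can be sketched as below. The text only covers the case where exactly one neuron fires; picking the most active neuron for graded outputs is an assumption made here for illustration, and the names `EMOTIONS` and `decode` are hypothetical.

```python
# Emotions assigned to output neurons 801, 802, 803, in that order.
EMOTIONS = ("anger", "sadness", "happiness")


def decode(output):
    """Map an output-layer pattern to the emotion of the most active neuron.

    Assumption: for non-binary outputs the largest activation wins.
    """
    index = max(range(len(output)), key=lambda i: output[i])
    return EMOTIONS[index]


anger = decode([1, 0, 0])
sadness = decode([0, 1, 0])
happiness = decode([0, 0, 1])
```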

Suppose that the reaction data of the user 100 viewing a certain scene is input to the input layer, that the output pattern “0.1 0.2 0.9” is output from the output layer, and that information indicating “happiness” is associated with this scene.

In this case, the user 100 is generally expected to feel “happiness” when viewing this scene, so it is preferable that the pattern “0 0 1” indicating “happiness” be output from the output layer as the recognition result for the emotion of the user 100. The fact that a different output pattern, “0.1 0.2 0.9”, was output means that the weighting coefficients must be corrected so that the neural network outputs the desired output pattern (“0 0 1”) when the above reaction data is input.

Therefore, in such a case, the difference (error) between the output pattern obtained from the output layer when the reaction data is input to the input layer and the desired output pattern serving as the teacher signal is computed, and well-known backpropagation is performed based on this difference to correct the weighting coefficients between neurons in adjacent layers (learning process). By repeating this process a number of times to correct the weighting coefficients, the output pattern “0 0 1” comes to be output from the output layer when the reaction data of the user 100 viewing a happy scene is input to the neural network; that is, the emotion is recognized as “happiness”.
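The learning process can be illustrated with a small, self-contained backpropagation sketch. Everything specific here is an assumption for illustration: the network size, sigmoid units without bias terms, the squared-error measure, the learning rate, and the four-element feature vector are not given in the source, which only states that well-known backpropagation is applied to the error against the teacher signal.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Input -> hidden -> output network with sigmoid units (no bias terms)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        y = [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One backpropagation update; returns the squared error before it."""
        h, y = self.forward(x)
        # Output-layer deltas for E = 0.5 * sum((y - t)^2).
        d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, target)]
        # Hidden-layer deltas, propagated back through w2 (before w2 changes).
        d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w2[k][j]
                                     for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * h[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * x[i]
        return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))


net = TinyNet(n_in=4, n_hidden=5, n_out=3)
reaction = [0.8, 0.1, 0.6, 0.3]  # hypothetical reaction-data features
teacher = [0.0, 0.0, 1.0]        # teacher signal for "happiness" (neuron 803)
errors = [net.train_step(reaction, teacher) for _ in range(500)]
```

Repeating `train_step` drives the output for this reaction data toward the teacher pattern, which mirrors the repeated correction described above. A practical system would train on reaction data from many scenes rather than a single sample.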

The same applies when the user 100 is shown other scenes (scenes that evoke other emotions) and the weighting coefficients used to recognize the emotion of the user 100 at that time are corrected. In this embodiment, the process may be repeated for one scene (each scene shown in FIG. 2) until the error reaches a predetermined value before moving on to the next scene, or it may be performed once per scene in sequence. Various algorithms can be applied to the learning process, and the algorithm is not particularly limited. The termination criterion for the learning process is likewise not particularly limited.

次に、感情モデル１０９がパラメータ計測値の存在範囲を各感情毎に規定している感情モデルの場合のステップＳ３０４における処理について説明する。 Next, the process in step S304 in the case where the emotion model 109 is an emotion model that defines the existence range of parameter measurement values for each emotion will be described.

For example, when the user 100 is viewing a scene associated with information indicating “happiness”, the parameters included in the conditional expressions are corrected so that the reaction data of the user 100 at that time satisfies the conditional expressions corresponding to the emotion “happiness”. For example, in the case of the above conditional expression for the eye used to recognize the emotion of happiness,
Eey ≧ Ey + a × Ev and Eey < Ey + b × Ev
the vertical length Ev of the eye and the eye center position (Ex, Ey) contained in the reaction data of the user 100 viewing the scene associated with information indicating “happiness” are substituted into this expression, and the coefficients a and b are corrected so that the expression is satisfied.
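The source does not say how a and b are adjusted, so the following is only one possible update rule, sketched under stated assumptions: the interval [a, b) is widened by a small hypothetical `margin` just far enough to contain the measured eye-corner position, and Ev is assumed to be positive. The function name `correct_coefficients` is introduced here for illustration.

```python
def correct_coefficients(eey, ey, ev, a, b, margin=0.05):
    """Widen [a, b) so that Eey >= Ey + a*Ev and Eey < Ey + b*Ev both hold.

    Assumes ev > 0. `margin` is an illustrative slack, not a value from the source.
    """
    ratio = (eey - ey) / ev  # eye-corner offset from the eye centre, in units of Ev
    if ratio < a:
        a = ratio - margin
    if ratio >= b:
        b = ratio + margin
    return a, b


# Reaction data that fails the initial condition (a = 0.1, b = 1.0).
eey, ey, ev = 101.0, 100.0, 20.0
a, b = correct_coefficients(eey, ey, ev, a=0.1, b=1.0)
satisfied = ey + a * ev <= eey < ey + b * ev
```

A rule that only widens the range can make the condition easier to satisfy for other emotions as well, so a practical correction would probably also consider the conditions of the competing emotions.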

In this embodiment, the parameter correction of the conditional expressions may be completed for one scene (each scene shown in FIG. 2) before moving on to the next scene, or it may be performed once per scene in sequence.

以上の処理を映画の終了時まで行う（ステップＳ３０５）。 The above processing is performed until the end of the movie (step S305).

By performing the processing described above, the facial expressions, limb movements, and voice volume that the user 100 exhibits when feeling the emotion associated with each scene are learned by correcting the parameters included in the emotion model 109. As a result, the trained emotion model 109 becomes a model that can recognize the emotion of the user 100 from visual and auditory information specific to the user 100.

By creating an emotion model for each individual user in this way and recognizing emotions from the user's reactions based on that individual model, the user's emotions can be recognized accurately. That is, although individuals generally react differently (in facial expression and so on) even when they feel the same emotion, each individual's reaction to the same stimulus has been learned, so emotion recognition that is unaffected by differences in individual reactions becomes possible.

また、刺激データに映画を使用することで、前述のように、精度の高い感情モデルが作成できるため、精度の高い感情認識が可能になる。 In addition, by using a movie as the stimulus data, a highly accurate emotion model can be created as described above, so that highly accurate emotion recognition is possible.

また本実施形態では、感情モデル１０９のパラメータの初期値は、平均的な人の反応データのパラメータ計測値が入力されたときに、平均的な人の感情を出力するように初期設定されているので、初期値を全て０と設定するよりも、容易に個人ごとの感情モデルが作成できる。 In this embodiment, the initial values of the parameters of the emotion model 109 are initially set so that the average human emotion is output when the parameter measurement value of the average human reaction data is input. Therefore, it is possible to easily create an emotion model for each individual rather than setting all initial values to 0.

Instead of recognizing the emotion for every scene, the emotion may be recognized only when the parameter measurement values of the reaction data of the user 100 change, for example when the facial expression or the motion of the user 100 changes. This reduces the error arising from periods of no reaction when an emotion model is created for a person who shows little reaction.
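One way to restrict recognition to moments of change is to compare each measurement with the preceding one, as in the sketch below. The threshold value, the dictionary representation of a sample, and the function names are assumptions for illustration; the source does not specify how a change is detected.

```python
def changed(previous, current, threshold=0.1):
    """True if any parameter moved by more than `threshold` since the last sample."""
    return any(abs(current[k] - previous[k]) > threshold for k in current)


def samples_to_recognize(samples, threshold=0.1):
    """Yield only the samples whose measurements differ from the preceding sample."""
    previous = None
    for sample in samples:
        if previous is not None and changed(previous, sample, threshold):
            yield sample
        previous = sample


# Hypothetical sequence of a single measured parameter.
samples = [{"mouth": 0.0}, {"mouth": 0.02}, {"mouth": 0.5}, {"mouth": 0.5}]
selected = list(samples_to_recognize(samples))
```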

In this embodiment, both image information and audio information of the user 100 are collected to recognize emotions. Some users, however, make no sound regardless of the emotion they feel. In that case, emotions may be recognized using image information alone; the microphone 102 is then unnecessary, and only the image information is input to the emotion model 109.

[Second Embodiment]
In the first embodiment, movie data is used as the stimulus data; in this embodiment, a video game program and its data are used.

That is, emotion data is prepared that associates the progress of a game being played with the emotion a person feels at that state of progress. The emotion model of the user is then created, in the same manner as in the first embodiment, from the emotion data for the game progress and the user's reaction to that progress.

The progress of the game here refers to, for example, how long it takes to clear a certain stage, how many enemies have been defeated within a certain time, or whether the user's character has lost to enemies consecutively within a certain time. In this embodiment, the emotion data holding unit 107 holds emotion data that associates the game progress with the emotion at that time.

図６は、本実施形態における感情データシートの構成例を示している。 FIG. 6 shows a configuration example of an emotion data sheet in the present embodiment.

The emotion data sheet 601 associates the game progress described above with emotions. This association is created from average emotion data obtained by having many people play the game and compiling the emotions felt at each state of progress.

Accordingly, the CPU 701 monitors the play of the user 100 with respect to each progress item shown in the figure (for example, the clear time and the number of enemies defeated within a certain time) and identifies the applicable progress items in the emotion data sheet. Since an emotion is associated with each identified item, the emotion that occurs most often among those associated with the identified items is output as the recognition result. What is recognized here is the emotion that the user 100 is expected to feel given this progress (the estimated emotion).

一方で、第１の実施形態と同様にして、カメラ１０１、マイク１０２からの画像情報、音声情報に基づいて感情モデル１０９によりユーザ１００の感情を認識する。そして、認識した感情と上記推定感情（第１の実施形態では教師信号に該当）とを参照し、第１の実施形態と同様にして感情モデル１０９のパラメータを修正する。 On the other hand, as in the first embodiment, the emotion of the user 100 is recognized by the emotion model 109 based on image information and audio information from the camera 101 and the microphone 102. Then, referring to the recognized emotion and the estimated emotion (corresponding to the teacher signal in the first embodiment), the parameters of the emotion model 109 are corrected in the same manner as in the first embodiment.

以上の説明からもわかるとおり、第１の実施形態では、各シーン毎に感情を示す情報が予め関連付けられて感情データシートの形態で記憶保持されていたので、各シーン毎に認識されるべき感情はこの感情データシートを参照することにより得られたのであるが、本実施形態では、刺激データとしてゲームのプログラム、データを用いたことにより、感情データシートの構成、及びこの感情データシートを用いたユーザ１００の感情を認識する方法は異なる。 As can be seen from the above description, in the first embodiment, information indicating emotions is associated with each scene in advance and stored in the form of an emotion data sheet, so the emotions to be recognized for each scene. Is obtained by referring to this emotion data sheet, but in this embodiment, by using a game program and data as stimulus data, the configuration of the emotion data sheet and this emotion data sheet are used. The method for recognizing the emotion of the user 100 is different.

すなわち、ＣＰＵ７０１は常にユーザ１００のゲームの進展具合を監視し、所定の時間毎に感情データシートのどの進展具合に該当するのかを判断する。そして該当する進展具合に関連付けられた感情のうち、最も多いものを認識結果とする。これにより、ＣＰＵ７０１は、「この進展具合なら、ユーザ１００はこのような感情を抱くであろう」と判断して、感情を認識することができる。 In other words, the CPU 701 always monitors the progress of the game of the user 100 and determines which progress of the emotion data sheet corresponds to each predetermined time. Of the emotions associated with the corresponding progress, the most common emotion is taken as the recognition result. Thereby, the CPU 701 can recognize the emotion by determining that “the user 100 will have such an emotion if this progress is made”.
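The selection of the estimated emotion from the emotion data sheet can be sketched as a simple vote over the matching progress items. The sheet entries, thresholds, and state keys below are hypothetical; the actual contents of the emotion data sheet 601 are not reproduced in this text.

```python
from collections import Counter

# Hypothetical emotion data sheet: (predicate on the game state, associated emotion).
EMOTION_SHEET = [
    (lambda s: s["clear_time_s"] > 300, "anger"),
    (lambda s: s["consecutive_losses"] >= 3, "anger"),
    (lambda s: s["enemies_defeated"] >= 10, "happiness"),
]


def estimated_emotion(state):
    """Return the most frequent emotion among matching progress items, or None."""
    votes = Counter(emotion for matches, emotion in EMOTION_SHEET if matches(state))
    return votes.most_common(1)[0][0] if votes else None


state = {"clear_time_s": 420, "consecutive_losses": 4, "enemies_defeated": 12}
estimate = estimated_emotion(state)
```

The result plays the role of the teacher signal: it would be compared with the emotion recognized by the emotion model 109 at the same moment. Calling this function at fixed intervals corresponds to the periodic check described above.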

このように、現在のユーザ１００の感情がどのようなものであるかを得るために映画の代わりにテレビゲームを用いることで、感情データシートの構成、及びこの感情データシートを用いたユーザ１００の感情を認識する方法が異なる以外は、本実施形態は第１の実施形態と同じである。 Thus, by using a video game instead of a movie in order to obtain what the current user's 100 emotion is, the configuration of the emotion data sheet and the user 100 using this emotion data sheet The present embodiment is the same as the first embodiment except that the method for recognizing emotions is different.

このように刺激データとしてテレビゲームを使用すると以下のような長所がある。すなわち、多数の人がゲームの進展に沿って、ほぼ同じ体験をすることが出来るので、平均的な反応と個人の特異な反応との比較がしやすい。さらに、同じ感情を持つ状態がいくつかあるので、ある特異な状態のみで感情モデルが作成されるのではなく、同じ感情をもつ複数の状態に対する平均的な反応を使用して感情モデルを作成するため、精度の高い感情モデルが作成できる。 Thus, using a video game as stimulus data has the following advantages. That is, since many people can experience almost the same experience as the game progresses, it is easy to compare the average response with the individual specific response. In addition, since there are several states with the same emotion, an emotion model is created using an average response to multiple states with the same emotion, instead of creating an emotion model only with a specific state. Therefore, an accurate emotion model can be created.

Video games are also interactive, and the user engages with them more actively than with a movie, so the user's reactions, such as speaking aloud, become stronger and emotions such as joy, anger, sorrow, and pleasure are expressed more readily. This makes the parameters of the user's reaction data easy to measure. However, when the emotion model is created and corrected, the emotion strength output for the input parameter measurement values must be treated as the maximum level. That is, when a value lower than the level measured during game play is input, the output emotion level must be the normal level.

The reaction data of the user 100 may also be obtained in forms other than those of the above embodiments. For example, in the first embodiment, image information and audio information of the user 100 viewing a scene are collected as reaction data, but the pulse of the user viewing the scene may instead be measured and the emotion estimated from the pulse rate per unit time. Because the pulse rate can serve as partial information about emotion (for example, the pulse rate per unit time rises during excitement), it can also be used as reaction data. The kinds of reaction data described above may also be combined as appropriate.
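As a small illustration of turning a pulse measurement into a reaction-data parameter, the sketch below computes a mean pulse rate from beat timestamps. The input format (a list of beat times in seconds) and the function name are assumptions; the source does not describe the sensor or its output.

```python
def pulse_rate_bpm(beat_times_s):
    """Mean pulse rate in beats per minute from beat timestamps given in seconds."""
    if len(beat_times_s) < 2:
        raise ValueError("need at least two beats")
    duration = beat_times_s[-1] - beat_times_s[0]
    # Number of beat-to-beat intervals divided by elapsed time, scaled to one minute.
    return 60.0 * (len(beat_times_s) - 1) / duration


rate = pulse_rate_bpm([0.0, 0.5, 1.0, 1.5, 2.0])
```

The resulting value could be appended to the image and audio parameters as one more input to the emotion model.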

[Other Embodiments]
Needless to say, the object of the present invention is also achieved by supplying a recording medium (or storage medium) on which the program code of software that implements the functions of the above-described embodiments is recorded to a system or apparatus, and having a computer (or CPU or MPU) of the system or apparatus read and execute the program code stored in the recording medium. In this case, the program code itself read from the recording medium implements the functions of the above-described embodiments, and the recording medium on which the program code is recorded constitutes the present invention.

Needless to say, the invention covers not only the case where the functions of the above-described embodiments are implemented by the computer executing the read program code, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code and the functions of the above-described embodiments are implemented by that processing.

Furthermore, needless to say, the invention also covers the case where the program code read from the recording medium is written into a memory provided in a function expansion card inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided in the function expansion card or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are implemented by that processing.

本発明を上記記録媒体に適用する場合、その記録媒体には、先に説明したフローチャート（機能構成）に対応するプログラムコードが格納されることになる。 When the present invention is applied to the recording medium, program code corresponding to the flowchart (functional configuration) described above is stored in the recording medium.

FIG. 1 is a diagram showing the functional configuration of an information processing system according to the first embodiment of the present invention.
FIG. 2 is a diagram showing a configuration example of an emotion data sheet in which data indicating an emotion is associated with each scene.
FIG. 3 is a flowchart of the emotion recognition process performed by the computer according to the first embodiment of the present invention and of the process of correcting the parameters of the emotion model 109 for recognizing the emotion of the user 100.
FIG. 4 is a diagram showing a configuration example of a well-known hierarchical neural network.
FIG. 5 is a diagram showing an example of the range of each reaction parameter defined by an emotion model that defines, for each emotion, the existence range of the parameter measurement values of the reaction data of the user 100 measured in step S302.
FIG. 6 is a diagram showing a configuration example of an emotion data sheet in the second embodiment of the present invention.
FIG. 7 is a diagram showing the hardware configuration of the system according to the first embodiment of the present invention.
FIG. 8 is a diagram explaining the process in step S304 when a neural network is used as the emotion model 109.

Claims

An information processing apparatus for reproducing movie data, comprising:
storage holding means for holding in advance, for each scene of the movie, emotion information indicating an emotion that a person viewing the scene is expected to have, in association with the scene;
reproduction means for reproducing the movie data;
imaging means for capturing an image of a user viewing the movie data;
estimation means for estimating, using a preset recognition model, the emotion of the user when viewing a scene of the movie based on a feature amount of the user obtained from the image; and
correction means for correcting a parameter of the recognition model so that the emotion of the user estimated by the estimation means when viewing the scene is estimated as the emotion indicated by the emotion information held in the storage holding means in association with the scene.

The information processing apparatus according to claim 1, wherein the estimation means inputs the feature amount to the neuron group of the input layer of a multilayer neural network and obtains the output from the neuron group of the output layer as information indicating the emotion of the user.

The information processing apparatus according to claim 2, wherein the correction means corrects the weighting coefficients between neurons so that, when the feature amount obtained from the image of the user viewing the scene is input to the neuron group of the input layer, the emotion information held in the storage holding means in association with the scene is output.

The information processing apparatus according to claim 3, wherein the correction means corrects the weighting coefficients by backpropagation.

The information processing apparatus according to claim 1, wherein the estimation means refers to an allowable range of the feature amount obtained in advance for each emotion and outputs, as the estimation result, the emotion whose allowable range contains the feature amount obtained from the image of the user.

The information processing apparatus according to claim 1, further comprising collecting means for collecting audio information of the user, wherein the estimation means estimates, using a preset recognition model, the emotion of the user when viewing the scene based on the feature amount obtained from the image and a feature amount obtained from the audio information.

An information processing apparatus for playing a game, comprising:
storage holding means for holding in advance, for each piece of information indicating a state of progress of the game, emotion information indicating an emotion that a person is expected to have in that state of progress of the game, in association with the information;
imaging means for capturing an image of a user while the game is being played;
monitoring means for monitoring the progress of the game being played by the user;
estimation means for estimating, using a preset recognition model, the emotion of the user in each state of progress of the game based on a feature amount of the user obtained from the image; and
correction means for correcting a parameter of the recognition model so that the emotion estimated by the estimation means for each state of progress of the game is estimated as the emotion indicated by the emotion information held in the storage holding means in association with the information indicating that state of progress.

An information processing method performed by an information processing apparatus for reproducing movie data, comprising:
a storage holding step in which storage holding means of the information processing apparatus holds in advance, in a storage holding unit, for each scene of the movie, emotion information indicating an emotion that a person viewing the scene is expected to have, in association with the scene;
a reproduction step in which reproduction means of the information processing apparatus reproduces the movie data;
an imaging step in which imaging means of the information processing apparatus captures an image of a user viewing the movie data;
an estimation step in which estimation means of the information processing apparatus estimates, using a preset recognition model, the emotion of the user when viewing a scene of the movie based on a feature amount of the user obtained from the image; and
a correction step in which correction means of the information processing apparatus corrects a parameter of the recognition model so that the emotion of the user estimated in the estimation step when viewing the scene is estimated as the emotion indicated by the emotion information held in the storage holding unit in association with the scene.

An information processing method performed by an information processing apparatus that plays a game, comprising:
a storage holding step in which storage holding means of the information processing apparatus holds in advance, in a storage holding unit, for each piece of information indicating a state of progress of the game, emotion information indicating an emotion that a person is expected to have in that state of progress of the game, in association with the information;
an imaging step in which imaging means of the information processing apparatus captures an image of a user while the game is being played;
a monitoring step in which monitoring means of the information processing apparatus monitors the progress of the game being played by the user;
an estimation step in which estimation means of the information processing apparatus estimates, using a preset recognition model, the emotion of the user in each state of progress of the game based on a feature amount of the user obtained from the image; and
a correction step in which correction means of the information processing apparatus corrects a parameter of the recognition model so that the emotion estimated in the estimation step for each state of progress of the game is estimated as the emotion indicated by the emotion information held in the storage holding unit in association with the information indicating that state of progress.

A computer program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 7.

A computer-readable storage medium storing the computer program according to claim 10.