JP2025013311A

JP2025013311A - Control System

Info

Publication number: JP2025013311A
Application number: JP2024112906A
Authority: JP
Inventors: 正義孫; Masayoshi Son
Original assignee: SoftBank Group Corp
Current assignee: SoftBank Group Corp
Priority date: 2023-07-14
Filing date: 2024-07-12
Publication date: 2025-01-24

Abstract

To provide an electronic device capable of guiding a user. SOLUTION: A control system comprises a state recognition unit, an emotion determination unit, and an action determination unit. An electronic device detachably provided on a neck strap includes one or more sensors configured to collect information. Device operation includes setting first action content for guiding a user by having the electronic device reproduce at least one of a voice and an image. SELECTED DRAWING: Figure 2

Description

The present invention relates to a control system.

Patent Document 1 discloses a camera-equipped communication terminal provided with a neck strap.

JP 2019-215444 A

However, although the conventional technology can keep the neck strap from appearing in the camera image, it does not assist the user's actions, so there is room for improvement in supporting the user.

According to a first aspect of the present invention, a control system is provided. The control system includes: a state recognition unit that recognizes a user state, including a user's behavior, and a state of an electronic device; an emotion determination unit that determines an emotion of the user or an emotion of the electronic device; and a behavior decision unit that, at a predetermined timing, determines one of a plurality of types of device operations, including not operating, as the behavior of the electronic device, using a behavior decision model and at least one of the user state, the state of the electronic device, the emotion of the user, and the emotion of the electronic device. The electronic device, which is detachably provided on a neck strap, includes one or more sensors that collect information, and the device operations include setting first action content that leads the user by having the electronic device reproduce at least one of a sound and an image.
Here, the electronic device includes a communication terminal. The communication terminal includes a device that performs physical actions, a device that outputs video and audio without performing physical actions, and an agent that operates in software.

Schematic diagram of an example of a system 5 according to a first embodiment.
Schematic diagram of the functional configuration of a robot 100 according to the first embodiment.
Schematic diagram of an example of an operation flow of collection processing by the robot 100 according to the first embodiment.
Schematic diagram of an example of an operation flow of response processing by the robot 100 according to the first embodiment.
Schematic diagram of an example of an operation flow of autonomous processing by the robot 100 according to the first embodiment.
An emotion map 400 onto which a plurality of emotions are mapped.
An emotion map 900 onto which a plurality of emotions are mapped.
(A) External view of a stuffed animal 100N according to a second embodiment; (B) internal structure diagram of the stuffed animal 100N.
Rear front view of the stuffed animal 100N according to the second embodiment.
Schematic diagram of the functional configuration of the stuffed animal 100N according to the second embodiment.
Schematic diagram of the functional configuration of an agent system 500 according to a third embodiment.
An example of the operation of the agent system.
An example of the operation of the agent system.
Schematic diagram of the functional configuration of an agent system according to a fourth embodiment.
An example of a usage mode of the agent system with smart glasses.
Schematic diagram of an example of the hardware configuration of a computer 1200.
External view of a communication terminal according to an embodiment of the present disclosure.
Diagram illustrating examples of sounds and the like reproduced by the communication terminal.
Diagram illustrating examples of sounds and the like reproduced by the communication terminal.
Diagram illustrating examples of sounds and the like reproduced by the communication terminal.
Diagram illustrating examples of sounds and the like reproduced by the communication terminal.
Diagram illustrating examples of sounds and the like reproduced by the communication terminal.

The present invention will be described below through embodiments of the invention, but the following embodiments do not limit the invention according to the claims. Furthermore, not all combinations of the features described in the embodiments are necessarily essential to the solution of the invention.

[First embodiment]
FIG. 1 schematically illustrates an example of a system 5 according to the present embodiment. The system 5 includes a robot 100, a robot 101, a robot 102, and a server 300. A user 10a, a user 10b, a user 10c, and a user 10d are users of the robot 100. A user 11a, a user 11b, and a user 11c are users of the robot 101. A user 12a and a user 12b are users of the robot 102. In the description of the present embodiment, the user 10a, the user 10b, the user 10c, and the user 10d may be collectively referred to as the user 10. The user 11a, the user 11b, and the user 11c may be collectively referred to as the user 11. The user 12a and the user 12b may be collectively referred to as the user 12. The robot 101 and the robot 102 have substantially the same functions as the robot 100. Therefore, the system 5 will be described mainly with reference to the functions of the robot 100.

The robot 100 converses with the user 10 and provides video to the user 10. In doing so, the robot 100 cooperates with the server 300 and other devices with which it can communicate via the communication network 20 to converse with the user 10 and to provide video and the like to the user 10. For example, the robot 100 not only learns appropriate conversation by itself, but also cooperates with the server 300 to learn how to converse more appropriately with the user 10. The robot 100 also has the server 300 record captured video data and the like of the user 10, requests the video data and the like from the server 300 as necessary, and provides it to the user 10.

The robot 100 also has emotion values that represent the types of its own emotions. For example, the robot 100 has emotion values that represent the strength of each of the emotions "joy," "anger," "sorrow," "pleasure," "comfort," "discomfort," "peace of mind," "anxiety," "sadness," "excitement," "worry," "relief," "fulfillment," "emptiness," and "normal." For example, when the robot 100 converses with the user 10 while its emotion value for excitement is high, it speaks at a fast pace. In this way, the robot 100 can express its own emotions through its behavior.

The robot 100 may be configured to determine the behavior of the robot 100 corresponding to the emotion of the user 10 by matching a sentence generation model using AI (Artificial Intelligence) with an emotion engine. Specifically, the robot 100 may be configured to recognize the behavior of the user 10, determine the emotion of the user 10 regarding that behavior, and determine the behavior of the robot 100 corresponding to the determined emotion.

More specifically, when the robot 100 recognizes the behavior of the user 10, it uses a preset sentence generation model to automatically generate the behavior content that the robot 100 should take in response to that behavior. The sentence generation model may be interpreted as an algorithm and computation for automatic text-based dialogue processing. The sentence generation model is publicly known, as disclosed in, for example, JP 2018-081444 A and ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>), and a detailed description thereof is therefore omitted. Such a sentence generation model is configured by a large language model (LLM: Large Language Model).

As described above, this embodiment combines a large language model with an emotion engine, making it possible to reflect the emotions of the user 10 and the robot 100, as well as various linguistic information, in the behavior of the robot 100. In other words, according to this embodiment, a synergistic effect can be obtained by combining a sentence generation model with an emotion engine.

The robot 100 also has a function of recognizing the behavior of the user 10. The robot 100 recognizes the behavior of the user 10 by analyzing the facial image of the user 10 acquired by its camera function and the voice of the user 10 acquired by its microphone function. The robot 100 determines the behavior it will execute based on the recognized behavior of the user 10 and the like.

As an example of a behavior decision model, the robot 100 stores rules that define the behaviors the robot 100 executes based on the emotions of the user 10, the emotions of the robot 100, and the behavior of the user 10, and performs various behaviors according to the rules.

Specifically, the robot 100 has, as an example of a behavior decision model, reaction rules for determining the behavior of the robot 100 based on the emotions of the user 10, the emotions of the robot 100, and the behavior of the user 10. For example, the reaction rules define "laughing" as the behavior of the robot 100 when the behavior of the user 10 is "laughing." The reaction rules also define "apologizing" as the behavior of the robot 100 when the behavior of the user 10 is "getting angry," "answering" when the behavior of the user 10 is "asking a question," and "calling out" when the behavior of the user 10 is "being sad."

When the robot 100 recognizes that the behavior of the user 10 is "getting angry," it selects, based on the reaction rules, the behavior "apologizing" defined in the reaction rules as the behavior to be executed by the robot 100. For example, when the robot 100 selects the behavior "apologizing," it performs an apologizing motion and outputs a voice expressing words of apology.
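The example rules above can be sketched as a simple lookup table. This is a minimal illustration only: the identifiers (`REACTION_RULES`, `select_robot_behavior`, the behavior labels) and the `"do_nothing"` fallback are assumptions made for this sketch and do not appear in the original disclosure.

```python
# Hypothetical reaction-rule table; keys are recognized user behaviors,
# values are the robot behaviors named in the examples in the text.
REACTION_RULES = {
    "laugh": "laugh",
    "get_angry": "apologize",
    "ask_question": "answer",
    "be_sad": "call_out",
}


def select_robot_behavior(user_behavior: str) -> str:
    """Return the robot behavior defined for the recognized user behavior.

    Falls back to "do_nothing" (an assumption of this sketch) when no rule
    matches the recognized behavior.
    """
    return REACTION_RULES.get(user_behavior, "do_nothing")


print(select_robot_behavior("get_angry"))  # apologize
```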

In addition, the reaction rules define that, when the emotion of the robot 100 is "normal" (i.e., "joy" = 0, "anger" = 0, "sorrow" = 0, "pleasure" = 0) and the condition that the state of the user 10 is "alone and looks lonely" is satisfied, the emotion of the robot 100 changes to "worried" and the robot 100 can execute the behavior "calling out."

When the robot 100 recognizes, based on the reaction rules, that its current emotion is "normal" and that the user 10 is alone and looks lonely, the robot 100 increases its own "sorrow" emotion value. The robot 100 also selects the behavior "calling out" defined in the reaction rules as the behavior to be executed toward the user 10. For example, when the robot 100 selects the behavior "calling out," it converts the words "What's wrong?", which express concern, into a worried-sounding voice and outputs it.
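The conditional rule described above (robot emotion "normal" and user "alone and looks lonely") might be sketched as follows. The class and function names, the state label `"alone_lonely"`, and the increment of 1 are all assumptions for illustration; the text does not specify how much the "sorrow" value increases.

```python
from dataclasses import dataclass, field


@dataclass
class RobotEmotion:
    # Strengths of the four basic emotions; all zero corresponds to "normal".
    values: dict = field(
        default_factory=lambda: {"joy": 0, "anger": 0, "sorrow": 0, "pleasure": 0}
    )

    def is_normal(self) -> bool:
        return all(v == 0 for v in self.values.values())


def apply_lonely_rule(emotion: RobotEmotion, user_state: str):
    """Apply the example rule: raise "sorrow" and select "call_out".

    Returns a dict describing the selected behavior, or None when the
    rule's condition is not satisfied.
    """
    if emotion.is_normal() and user_state == "alone_lonely":
        emotion.values["sorrow"] += 1  # step size is an assumption
        return {
            "action": "call_out",
            "utterance": "What's wrong?",
            "voice_tone": "worried",
        }
    return None


robot = RobotEmotion()
print(apply_lonely_rule(robot, "alone_lonely"))
```

Because the rule requires the "normal" emotion, applying it a second time to the same `RobotEmotion` returns `None` in this sketch: the "sorrow" value is no longer zero.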

The robot 100 also transmits to the server 300 user reaction information indicating that this behavior elicited a positive reaction from the user 10. The user reaction information includes, for example, the user behavior "getting angry," the robot 100 behavior "apologizing," the fact that the reaction of the user 10 was positive, and the attributes of the user 10.

The server 300 stores the user reaction information received from the robot 100. The server 300 receives and stores user reaction information not only from the robot 100 but also from each of the robot 101 and the robot 102. The server 300 then analyzes the user reaction information from the robot 100, the robot 101, and the robot 102, and updates the reaction rules.

The robot 100 receives the updated reaction rules from the server 300 by querying the server 300 for them. The robot 100 incorporates the updated reaction rules into the reaction rules it already stores. This allows the robot 100 to incorporate reaction rules acquired by the robot 101, the robot 102, and other robots into its own reaction rules.
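The text does not say how the server 300 analyzes the user reaction information, so the following is only one possible sketch: for each user behavior, pick the robot behavior with the highest share of positive reactions across all reporting robots, then merge the result into a robot's local rules. The function names, the report fields, and the selection criterion are all assumptions.

```python
from collections import defaultdict


def update_reaction_rules(reports):
    """Aggregate user reaction reports into updated reaction rules.

    Each report is a dict with "user_behavior", "robot_behavior", and
    "positive" (bool). For every user behavior, the robot behavior with the
    highest positive-reaction rate is chosen (an assumed criterion).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [positive count, total]
    for report in reports:
        key = (report["user_behavior"], report["robot_behavior"])
        counts[key][0] += int(report["positive"])
        counts[key][1] += 1

    best = {}  # user_behavior -> (robot_behavior, positive rate)
    for (user_b, robot_b), (positives, total) in counts.items():
        rate = positives / total
        if user_b not in best or rate > best[user_b][1]:
            best[user_b] = (robot_b, rate)
    return {user_b: robot_b for user_b, (robot_b, _) in best.items()}


def merge_rules(local_rules, updated_rules):
    """Return the robot's local rules with the server's updates applied."""
    merged = dict(local_rules)
    merged.update(updated_rules)
    return merged


reports = [
    {"user_behavior": "get_angry", "robot_behavior": "apologize", "positive": True},
    {"user_behavior": "get_angry", "robot_behavior": "apologize", "positive": True},
    {"user_behavior": "get_angry", "robot_behavior": "laugh", "positive": False},
]
updated = update_reaction_rules(reports)
print(merge_rules({"laugh": "laugh"}, updated))
```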

FIG. 2 schematically illustrates the functional configuration of the robot 100. The robot 100 has a sensor unit 200, a sensor module unit 210, a storage unit 220, a control unit 228, and a control target 252. The control unit 228 has a state recognition unit 230, an emotion determination unit 232, a behavior recognition unit 234, a behavior decision unit 236, a memory control unit 238, a behavior control unit 250, a related information collection unit 270, and a communication processing unit 280.

The control target 252 includes a display device, a speaker, LEDs in the eyes, and motors that drive the arms, hands, legs, and the like. The posture and gestures of the robot 100 are controlled by controlling the motors of the arms, hands, legs, and the like. Some of the emotions of the robot 100 can be expressed by controlling these motors. The facial expressions of the robot 100 can also be expressed by controlling the light emission state of the LEDs in its eyes. The posture, gestures, and facial expressions of the robot 100 are examples of the attitude of the robot 100.

The sensor unit 200 includes a microphone 201, a 3D depth sensor 202, a 2D camera 203, a distance sensor 204, a touch sensor 205, and an acceleration sensor 206. The microphone 201 continuously detects sound and outputs sound data. The microphone 201 may be provided on the head of the robot 100 and may have a function of performing binaural recording. The 3D depth sensor 202 detects the contour of an object by continuously projecting an infrared pattern and analyzing the infrared pattern in infrared images continuously captured by an infrared camera. The 2D camera 203 is an example of an image sensor. The 2D camera 203 captures images using visible light and generates visible-light video information. The distance sensor 204 detects the distance to an object by emitting, for example, a laser or ultrasonic waves. The sensor unit 200 may also include a clock, a gyro sensor, a sensor for motor feedback, and the like.

Note that, among the components of the robot 100 shown in FIG. 2, the components other than the control target 252 and the sensor unit 200 are examples of components of the control system of the robot 100. The control system of the robot 100 controls the control target 252.

The storage unit 220 includes a behavior decision model 221, history data 222, collected data 223, and behavior schedule data 224. The history data 222 includes the past emotion values of the user 10, the past emotion values of the robot 100, and a history of behavior; specifically, it includes a plurality of event data items, each including an emotion value of the user 10, an emotion value of the robot 100, and a behavior of the user 10. The data including the behavior of the user 10 includes a camera image representing the behavior of the user 10. The emotion values and the behavior history are recorded for each user 10, for example, by being associated with the identification information of the user 10. At least a part of the storage unit 220 is implemented by a storage medium such as a memory. The storage unit 220 may also include a person DB that stores the facial image of the user 10, the attribute information of the user 10, and the like. Note that the functions of the components of the robot 100 shown in FIG. 2, except for the control target 252, the sensor unit 200, and the storage unit 220, can be realized by a CPU operating based on a program. For example, the functions of these components can be implemented as CPU operations by an operating system (OS) and programs that run on the OS.
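As a rough illustration of how per-user event data in the history data 222 could be organized, the sketch below stores events keyed by a user identifier. The class names, field names, and in-memory storage are assumptions made for this sketch; the disclosure only states what each event contains and that records are associated with user identification information.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventData:
    user_emotion: float                   # signed emotion value of the user
    robot_emotion: dict                   # e.g. {"joy": 0, "anger": 0, ...}
    user_behavior: str                    # e.g. "laugh"
    camera_image: Optional[bytes] = None  # image showing the user's behavior


class HistoryData:
    """Keeps event data per user, keyed by a user identifier."""

    def __init__(self) -> None:
        self._events = defaultdict(list)

    def record(self, user_id: str, event: EventData) -> None:
        self._events[user_id].append(event)

    def latest(self, user_id: str) -> Optional[EventData]:
        """Return the most recent event for the user, or None if absent."""
        events = self._events.get(user_id)
        return events[-1] if events else None


history = HistoryData()
history.record("user10a", EventData(2.0, {"joy": 1}, "laugh"))
history.record("user10a", EventData(-1.0, {"sorrow": 1}, "be_sad"))
print(history.latest("user10a").user_behavior)  # be_sad
```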

The sensor module unit 210 includes a voice emotion recognition unit 211, a speech understanding unit 212, a facial expression recognition unit 213, and a face recognition unit 214. Information detected by the sensor unit 200 is input to the sensor module unit 210. The sensor module unit 210 analyzes the information detected by the sensor unit 200 and outputs the analysis result to the state recognition unit 230.

The voice emotion recognition unit 211 of the sensor module unit 210 analyzes the voice of the user 10 detected by the microphone 201 to recognize the emotion of the user 10. For example, the voice emotion recognition unit 211 extracts features such as frequency components of the voice and recognizes the emotion of the user 10 based on the extracted features. The speech understanding unit 212 analyzes the voice of the user 10 detected by the microphone 201 and outputs text information representing the content of the utterance of the user 10.

The facial expression recognition unit 213 recognizes the facial expression and emotion of the user 10 from the image of the user 10 captured by the 2D camera 203. For example, the facial expression recognition unit 213 recognizes the facial expression and emotion of the user 10 based on the shapes and positional relationship of the eyes and mouth, and the like.

The face recognition unit 214 recognizes the face of the user 10. The face recognition unit 214 recognizes the user 10 by matching a facial image stored in a person DB (not shown) against the facial image of the user 10 captured by the 2D camera 203.

The state recognition unit 230 recognizes the state of the user 10 based on the information analyzed by the sensor module unit 210. For example, it mainly performs perception-related processing using the analysis results of the sensor module unit 210, generating perceptual information such as "Daddy is alone" or "There is a 90% probability that Daddy is not smiling." It then performs processing to understand the meaning of the generated perceptual information, generating semantic information such as "Daddy is alone and looks lonely."

The state recognition unit 230 also recognizes the state of the robot 100 based on the information detected by the sensor unit 200. For example, the state recognition unit 230 recognizes, as the state of the robot 100, the remaining battery level of the robot 100, the brightness of the environment around the robot 100, and the like.

The emotion determination unit 232 determines an emotion value indicating the emotion of the user 10 based on the information analyzed by the sensor module unit 210 and the state of the user 10 recognized by the state recognition unit 230. For example, the information analyzed by the sensor module unit 210 and the recognized state of the user 10 are input to a pre-trained neural network to obtain an emotion value indicating the emotion of the user 10.

Here, the emotion value indicating the emotion of the user 10 is a value indicating whether the user's emotion is positive or negative. For example, if the user's emotion is a bright emotion accompanied by pleasure or comfort, such as "joy," "pleasure," "comfort," "peace of mind," "excitement," "relief," and "fulfillment" (in other words, a "positive emotion"), the value is positive, and the brighter the emotion, the larger the value. If the user's emotion is an unpleasant emotion, such as "anger," "sorrow," "discomfort," "anxiety," "sadness," "worry," and "emptiness" (in other words, a "negative emotion"), the value is negative, and the more unpleasant the emotion, the larger the absolute value of the negative value. If the user's emotion is none of the above ("normal"), the value is 0.
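The sign convention for the user's emotion value can be illustrated with a small helper. In the embodiment the value comes from a pre-trained neural network; the function below is only a hypothetical stand-in that shows the mapping from an emotion category and a non-negative intensity to a signed value. The English labels and the `intensity` parameter are assumptions.

```python
# Hypothetical English labels for the emotion categories listed in the text.
POSITIVE_EMOTIONS = {
    "joy", "pleasure", "comfort", "peace_of_mind",
    "excitement", "relief", "fulfillment",
}
NEGATIVE_EMOTIONS = {
    "anger", "sorrow", "discomfort", "anxiety",
    "sadness", "worry", "emptiness",
}


def user_emotion_value(emotion: str, intensity: float) -> float:
    """Map an emotion category and its intensity to a signed emotion value.

    Positive emotions yield +intensity, negative emotions yield -intensity,
    and anything else (e.g. "normal") yields 0.
    """
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    if emotion in POSITIVE_EMOTIONS:
        return float(intensity)
    if emotion in NEGATIVE_EMOTIONS:
        return -float(intensity)
    return 0.0


print(user_emotion_value("joy", 3))     # 3.0
print(user_emotion_value("anger", 2))   # -2.0
print(user_emotion_value("normal", 5))  # 0.0
```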

In addition, the emotion determination unit 232 determines an emotion value indicating the emotion of the robot 100 based on the information analyzed by the sensor module unit 210, the information detected by the sensor unit 200, and the state of the user 10 recognized by the state recognition unit 230.

The emotion value of the robot 100 includes an emotion value for each of a plurality of emotion categories, for example, a value (0 to 5) indicating the strength of each of "joy," "anger," "sorrow," and "pleasure."

Specifically, the emotion determination unit 232 determines an emotion value indicating the emotion of the robot 100 according to rules for updating the emotion value of the robot 100, which are defined in association with the information analyzed by the sensor module unit 210 and the state of the user 10 recognized by the state recognition unit 230.

For example, if the state recognition unit 230 recognizes that the user 10 looks lonely, the emotion determination unit 232 increases the "sorrow" emotion value of the robot 100. If the state recognition unit 230 recognizes that the user 10 is smiling, the emotion determination unit 232 increases the "joy" emotion value of the robot 100.

The emotion determination unit 232 may also take the state of the robot 100 into account when determining the emotion value indicating the emotion of the robot 100. For example, when the remaining battery level of the robot 100 is low or when the environment around the robot 100 is completely dark, the "sorrow" emotion value of the robot 100 may be increased. Furthermore, when the user 10 keeps talking to the robot 100 even though the remaining battery level is low, the "anger" emotion value may be increased.
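The update rules for the robot's emotion values described above could look roughly like this. The function name, the state labels, and the step size of 1 are assumptions; only the direction of each change and the 0 to 5 range come from the text.

```python
MAX_LEVEL = 5  # emotion strengths range from 0 to 5 per the text


def update_robot_emotion(values, user_state,
                         battery_low=False, dark=False,
                         user_keeps_talking=False):
    """Return a new emotion-value dict after applying the example rules.

    Each rule raises one emotion by 1 (an assumed step), clamped to MAX_LEVEL.
    The input dict is not modified.
    """
    updated = dict(values)

    def bump(name):
        updated[name] = min(MAX_LEVEL, updated[name] + 1)

    if user_state == "lonely":
        bump("sorrow")
    if user_state == "smiling":
        bump("joy")
    if battery_low or dark:
        bump("sorrow")
    if battery_low and user_keeps_talking:
        bump("anger")
    return updated


current = {"joy": 0, "anger": 0, "sorrow": 0, "pleasure": 0}
print(update_robot_emotion(current, "lonely", dark=True))
```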

The behavior recognition unit 234 recognizes the behavior of the user 10 based on the information analyzed by the sensor module unit 210 and the state of the user 10 recognized by the state recognition unit 230. For example, the information analyzed by the sensor module unit 210 and the recognized state of the user 10 are input to a pre-trained neural network, the probability of each of a plurality of predetermined behavior categories (e.g., "laughing," "getting angry," "asking a question," "being sad") is obtained, and the behavior category with the highest probability is recognized as the behavior of the user 10.
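The final selection step, choosing the behavior category with the highest probability, amounts to an argmax over the network's outputs. In this sketch the probabilities are given directly as a dict; the neural network itself, the function name, and the category labels are assumptions.

```python
def recognize_behavior(probabilities: dict) -> str:
    """Return the behavior category with the highest probability."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    return max(probabilities, key=probabilities.get)


# Example output of a (hypothetical) pre-trained classifier.
probs = {"laugh": 0.10, "get_angry": 0.65, "ask_question": 0.20, "be_sad": 0.05}
print(recognize_behavior(probs))  # get_angry
```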

As described above, in this embodiment, the robot 100 identifies the user 10 and then acquires the content of the user's utterances. When acquiring and using that content, the necessary consent is obtained from the user 10 in accordance with laws and regulations, and the control system of the robot 100 according to this embodiment gives due consideration to protecting the personal information and privacy of the user 10.

Next, the processing of the behavior decision unit 236 during response processing, in which the robot 100 responds to the behavior of the user 10, will be described.

The behavior decision unit 236 determines a behavior corresponding to the behavior of the user 10 recognized by the behavior recognition unit 234, based on the current emotion value of the user 10 determined by the emotion determination unit 232, the history data 222 of past emotion values determined by the emotion determination unit 232 before the current emotion value of the user 10 was determined, and the emotion value of the robot 100. This embodiment describes the case in which the behavior decision unit 236 uses the single most recent emotion value included in the history data 222 as the past emotion value of the user 10, but the disclosed technology is not limited to this aspect. For example, the behavior decision unit 236 may use a plurality of the most recent emotion values as the past emotion value of the user 10, or may use an emotion value from a unit period earlier, such as one day ago. The behavior decision unit 236 may also determine the behavior corresponding to the behavior of the user 10 by considering not only the current emotion value of the robot 100 but also the history of its past emotion values. The behavior determined by the behavior decision unit 236 includes a gesture performed by the robot 100 or the content of an utterance by the robot 100.

本実施形態に係る行動決定部２３６は、ユーザ１０の行動に対応する行動として、ユーザ１０の過去の感情値と現在の感情値の組み合わせと、ロボット１００の感情値と、ユーザ１０の行動と、行動決定モデル２２１とに基づいて、ロボット１００の行動を決定する。例えば、行動決定部２３６は、ユーザ１０の過去の感情値が正の値であり、かつ現在の感情値が負の値である場合、ユーザ１０の行動に対応する行動として、ユーザ１０の感情値を正に変化させるための行動を決定する。 The behavior decision unit 236 according to this embodiment decides the behavior of the robot 100 as the behavior corresponding to the behavior of the user 10, based on a combination of the past and current emotion values of the user 10, the emotion value of the robot 100, the behavior of the user 10, and the behavior decision model 221. For example, when the past emotion value of the user 10 is a positive value and the current emotion value is a negative value, the behavior decision unit 236 decides the behavior corresponding to the behavior of the user 10 as the behavior for changing the emotion value of the user 10 to a positive value.

行動決定モデル２２１としての反応ルールには、ユーザ１０の過去の感情値と現在の感情値の組み合わせと、ロボット１００の感情値と、ユーザ１０の行動とに応じたロボット１００の行動が定められている。例えば、ユーザ１０の過去の感情値が正の値であり、かつ現在の感情値が負の値であり、ユーザ１０の行動が悲しむである場合、ロボット１００の行動として、ジェスチャーを交えてユーザ１０を励ます問いかけを行う際のジェスチャーと発話内容との組み合わせが定められている。 The reaction rules as the behavior decision model 221 define the behavior of the robot 100 according to a combination of the past and current emotional values of the user 10, the emotional value of the robot 100, and the behavior of the user 10. For example, when the past emotional value of the user 10 is a positive value and the current emotional value is a negative value, and the behavior of the user 10 is sad, a combination of gestures and speech content when asking a question to encourage the user 10 with gestures is defined as the behavior of the robot 100.
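
The rule lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `sign_label`, `REACTION_RULES`, and `decide_behavior`, the gesture and speech strings, and the mapping of "normal" to a value of zero are hypothetical and do not come from the specification. For brevity the robot's own emotion value is left out of the key.

```python
from typing import Optional


def sign_label(value: float) -> str:
    """Classify an emotion value; treating zero as "normal" is an assumption."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "normal"


# Hypothetical rule table keyed by (past sign, current sign, user behavior).
REACTION_RULES: dict[tuple[str, str, str], dict[str, str]] = {
    ("positive", "negative", "sad"): {
        "gesture": "lean_forward",            # illustrative gesture name
        "speech": "Is something bothering you?",  # illustrative encouraging question
    },
}


def decide_behavior(past: float, current: float, user_behavior: str) -> Optional[dict[str, str]]:
    """Return the behavior defined for this combination, or None if no rule matches."""
    key = (sign_label(past), sign_label(current), user_behavior)
    return REACTION_RULES.get(key)
```

Returning `None` here stands in for the "no operation" case; an actual rule set would also be keyed by the robot's emotion pattern.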

例えば、行動決定モデル２２１としての反応ルールには、ロボット１００の感情値のパターン（「喜」、「怒」、「哀」、「楽」の値「０」～「５」の６値の４乗である１２９６パターン）、ユーザ１０の過去の感情値と現在の感情値の組み合わせのパターン、ユーザ１０の行動パターンの全組み合わせに対して、ロボット１００の行動が定められる。すなわち、ロボット１００の感情値のパターン毎に、ユーザ１０の過去の感情値と現在の感情値の組み合わせが、負の値と負の値、負の値と正の値、正の値と負の値、正の値と正の値、負の値と普通、及び普通と普通等のように、複数の組み合わせのそれぞれに対して、ユーザ１０の行動パターンに応じたロボット１００の行動が定められる。なお、行動決定部２３６は、例えば、ユーザ１０が「この前に話したあの話題について話したい」というような過去の話題から継続した会話を意図する発話を行った場合に、履歴データ２２２を用いてロボット１００の行動を決定する動作モードに遷移してもよい。 For example, the reaction rule as the behavior decision model 221 defines the behavior of the robot 100 for all combinations of the patterns of the emotion values of the robot 100 (1296 patterns, which are the fourth power of six values of "joy", "anger", "sorrow", and "pleasure", from "0" to "5"); the combination patterns of the past emotion values and the current emotion values of the user 10; and the behavior patterns of the user 10. That is, for each pattern of the emotion values of the robot 100, the behavior of the robot 100 is defined according to the behavior patterns of the user 10 for each of a plurality of combinations of the past emotion values and the current emotion values of the user 10, such as negative values and negative values, negative values and positive values, positive values and negative values, positive values and positive values, negative values and normal values, and normal values and normal values. Note that the behavior decision unit 236 may transition to an operation mode that determines the behavior of the robot 100 using the history data 222, for example, when the user 10 makes an utterance intending to continue a conversation from a past topic, such as "I want to talk about that topic we talked about last time."
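
As a quick check on the pattern count stated above, the following sketch enumerates every combination of the four emotion values. The names `EMOTIONS`, `LEVELS`, and `patterns` are illustrative only; the four categories and the range 0 to 5 are taken from the text.

```python
from itertools import product

# Four emotion categories, each taking one of six values (0 through 5).
EMOTIONS = ("joy", "anger", "sorrow", "pleasure")
LEVELS = range(6)

# One pattern assigns a value to every emotion category.
patterns = [dict(zip(EMOTIONS, values)) for values in product(LEVELS, repeat=len(EMOTIONS))]

print(len(patterns))  # 6 ** 4 = 1296
```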

なお、行動決定モデル２２１としての反応ルールには、ロボット１００の感情値のパターン（１２９６パターン）の各々に対して、最大で一つずつ、ロボット１００の行動としてジェスチャー及び発言内容の少なくとも一方が定められていてもよい。あるいは、行動決定モデル２２１としての反応ルールには、ロボット１００の感情値のパターンのグループの各々に対して、ロボット１００の行動としてジェスチャー及び発言内容の少なくとも一方が定められていてもよい。 The reaction rules as the behavior decision model 221 may define at least one of a gesture and a statement as the behavior of the robot 100, up to one for each of the patterns (1296 patterns) of the emotion value of the robot 100. Alternatively, the reaction rules as the behavior decision model 221 may define at least one of a gesture and a statement as the behavior of the robot 100, for each group of patterns of the emotion value of the robot 100.

行動決定モデル２２１としての反応ルールに定められているロボット１００の行動に含まれる各ジェスチャーには、当該ジェスチャーの強度が予め定められている。行動決定モデル２２１としての反応ルールに定められているロボット１００の行動に含まれる各発話内容には、当該発話内容の強度が予め定められている。 The strength of each gesture included in the behavior of the robot 100 defined in the reaction rules as the behavior decision model 221 is predetermined. The strength of each utterance content included in the behavior of the robot 100 defined in the reaction rules as the behavior decision model 221 is predetermined.

記憶制御部２３８は、行動決定部２３６によって決定された行動に対して予め定められた行動の強度と、感情決定部２３２により決定されたロボット１００の感情値とに基づいて、ユーザ１０の行動を含むデータを履歴データ２２２に記憶するか否かを決定する。 The memory control unit 238 determines whether or not to store data including the behavior of the user 10 in the history data 222 based on the predetermined behavior strength for the behavior determined by the behavior determination unit 236 and the emotion value of the robot 100 determined by the emotion determination unit 232.

具体的には、ロボット１００の複数の感情分類の各々に対する感情値の総和と、行動決定部２３６によって決定された行動が含むジェスチャーに対して予め定められた強度と、行動決定部２３６によって決定された行動が含む発話内容に対して予め定められた強度との和である強度の総合値が、閾値以上である場合、ユーザ１０の行動を含むデータを履歴データ２２２に記憶すると決定する。 Specifically, if the total intensity value, which is the sum of the emotion values for each of the multiple emotion classifications of the robot 100, the predetermined intensity for the gesture included in the behavior determined by the behavior determination unit 236, and the predetermined intensity for the speech content included in the behavior determined by the behavior determination unit 236, is equal to or greater than a threshold value, it is determined that data including the behavior of the user 10 is to be stored in the history data 222.
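
The storage decision above amounts to a single sum and comparison. A minimal sketch, assuming integer intensities and a caller-supplied threshold (the function name `should_store` and its parameter names are hypothetical):

```python
def should_store(
    robot_emotions: dict[str, int],
    gesture_intensity: int,
    speech_intensity: int,
    threshold: int,
) -> bool:
    """Return True when the total intensity value is at or above the threshold."""
    # Total = sum of the robot's emotion values + gesture intensity + speech intensity.
    total = sum(robot_emotions.values()) + gesture_intensity + speech_intensity
    return total >= threshold
```

For example, emotion values of 3, 0, 1, and 2 with a gesture intensity of 2 and a speech intensity of 1 give a total of 9, so the data would be stored for any threshold of 9 or less.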

記憶制御部２３８は、ユーザ１０の行動を含むデータを履歴データ２２２に記憶すると決定した場合、行動決定部２３６によって決定された行動と、現時点から一定期間前までの、センサモジュール部２１０で解析された情報（例えば、その場の音声、画像、匂い等のデータなどのあらゆる周辺情報）、及び状態認識部２３０によって認識されたユーザ１０の状態（例えば、ユーザ１０の表情、感情など）を、履歴データ２２２に記憶する。 When the memory control unit 238 decides to store data including the behavior of the user 10 in the history data 222, it stores in the history data 222 the behavior determined by the behavior determination unit 236, information analyzed by the sensor module unit 210 from the present time up to a certain period of time ago (e.g., all peripheral information such as data on the sound, images, smells, etc. of the scene), and the state of the user 10 recognized by the state recognition unit 230 (e.g., the facial expression, emotions, etc. of the user 10).

行動制御部２５０は、行動決定部２３６が決定した行動に基づいて、制御対象２５２を制御する。例えば、行動制御部２５０は、行動決定部２３６が発話することを含む行動を決定した場合に、制御対象２５２に含まれるスピーカから音声を出力させる。このとき、行動制御部２５０は、ロボット１００の感情値に基づいて、音声の発声速度を決定してもよい。例えば、行動制御部２５０は、ロボット１００の感情値が大きいほど、速い発声速度を決定する。このように、行動制御部２５０は、感情決定部２３２が決定した感情値に基づいて、行動決定部２３６が決定した行動の実行形態を決定する。 The behavior control unit 250 controls the control target 252 based on the behavior determined by the behavior determination unit 236. For example, when the behavior determination unit 236 determines a behavior that includes speaking, the behavior control unit 250 causes a speaker included in the control target 252 to output a voice. At this time, the behavior control unit 250 may determine the speaking rate of the voice based on the emotion value of the robot 100. For example, the behavior control unit 250 sets a faster speaking rate as the emotion value of the robot 100 increases. In this way, the behavior control unit 250 determines the execution form of the behavior determined by the behavior determination unit 236 based on the emotion value determined by the emotion determination unit 232.
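
The text only states that a larger emotion value yields a faster speaking rate; it does not give a formula. One possible mapping, assuming a linear relationship with a hypothetical base rate and step size, is sketched below.

```python
def speaking_rate(emotion_value: int, base_rate: float = 1.0, step: float = 0.1) -> float:
    """Map an emotion value to a speech-rate multiplier.

    The linear form, base_rate, and step are assumptions for illustration;
    the only property taken from the text is that the rate grows with the value.
    """
    return base_rate + step * emotion_value
```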

行動制御部２５０は、行動決定部２３６が決定した行動を実行したことに対するユーザ１０の感情の変化を認識してもよい。例えば、ユーザ１０の音声や表情に基づいて感情の変化を認識してよい。その他、センサ部２００に含まれるタッチセンサ２０５で衝撃が検出されたことに基づいて、ユーザ１０の感情の変化を認識してよい。センサ部２００に含まれるタッチセンサ２０５で衝撃が検出された場合に、ユーザ１０の感情が悪くなったと認識したり、センサ部２００に含まれるタッチセンサ２０５の検出結果から、ユーザ１０の反応が笑っている、あるいは、喜んでいる等と判断される場合には、ユーザ１０の感情が良くなったと認識したりしてもよい。ユーザ１０の反応を示す情報は、通信処理部２８０に出力される。 The behavior control unit 250 may recognize a change in the user 10's emotions in response to the execution of the behavior determined by the behavior determination unit 236. For example, the change in emotions may be recognized based on the voice or facial expression of the user 10. Alternatively, the change in emotions may be recognized based on the detection of an impact by the touch sensor 205 included in the sensor unit 200. If an impact is detected by the touch sensor 205 included in the sensor unit 200, the user 10's emotions may be recognized as having worsened, and if the detection result of the touch sensor 205 included in the sensor unit 200 indicates that the user 10 is smiling or happy, the user 10's emotions may be recognized as having improved. Information indicating the user 10's reaction is output to the communication processing unit 280.

また、行動制御部２５０は、行動決定部２３６が決定した行動をロボット１００の感情に応じて決定した実行形態で実行した後、感情決定部２３２は、当該行動が実行されたことに対するユーザの反応に基づいて、ロボット１００の感情値を更に変化させる。具体的には、感情決定部２３２は、行動決定部２３６が決定した行動を行動制御部２５０が決定した実行形態でユーザに対して行ったことに対するユーザの反応が不良でなかった場合に、ロボット１００の「喜」の感情値を増大させる。また、感情決定部２３２は、行動決定部２３６が決定した行動を行動制御部２５０が決定した実行形態でユーザに対して行ったことに対するユーザの反応が不良であった場合に、ロボット１００の「哀」の感情値を増大させる。 In addition, after the behavior control unit 250 executes the behavior determined by the behavior determination unit 236 in the execution form determined according to the emotion of the robot 100, the emotion determination unit 232 further changes the emotion value of the robot 100 based on the user's reaction to the executed behavior. Specifically, when the user's reaction to the behavior, performed in the execution form determined by the behavior control unit 250, is not bad, the emotion determination unit 232 increases the "joy" emotion value of the robot 100. Conversely, when the user's reaction to that behavior is bad, the emotion determination unit 232 increases the "sorrow" emotion value of the robot 100.
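
The feedback step above can be sketched as follows. The keys "joy" and "sorrow" stand for 「喜」 and 「哀」; the function name, the increment of 1, and the clamping to a maximum of 5 (chosen to match the 0 to 5 range mentioned for the reaction rules) are assumptions rather than details from the specification.

```python
MAX_VALUE = 5  # assumed upper bound, matching the 0..5 range of the emotion values


def update_robot_emotion(emotions: dict[str, int], reaction_was_bad: bool, delta: int = 1) -> dict[str, int]:
    """Return a copy of the robot's emotion values adjusted for the user's reaction."""
    updated = dict(emotions)
    # A bad reaction raises "sorrow"; any other reaction raises "joy".
    key = "sorrow" if reaction_was_bad else "joy"
    updated[key] = min(MAX_VALUE, updated[key] + delta)
    return updated
```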

更に、行動制御部２５０は、決定したロボット１００の感情値に基づいて、ロボット１００の感情を表現する。例えば、行動制御部２５０は、ロボット１００の「喜」の感情値を増加させた場合、制御対象２５２を制御して、ロボット１００に喜んだ仕草を行わせる。また、行動制御部２５０は、ロボット１００の「哀」の感情値を増加させた場合、ロボット１００の姿勢がうなだれた姿勢になるように、制御対象２５２を制御する。 Furthermore, the behavior control unit 250 expresses the emotion of the robot 100 based on the determined emotion value of the robot 100. For example, when the "joy" emotion value of the robot 100 has been increased, the behavior control unit 250 controls the control target 252 to make the robot 100 perform a joyful gesture. When the "sorrow" emotion value of the robot 100 has been increased, the behavior control unit 250 controls the control target 252 so that the robot 100 assumes a drooping posture.

通信処理部２８０は、サーバ３００との通信を担う。上述したように、通信処理部２８０は、ユーザ反応情報をサーバ３００に送信する。また、通信処理部２８０は、更新された反応ルールをサーバ３００から受信する。通信処理部２８０がサーバ３００から、更新された反応ルールを受信すると、行動決定モデル２２１としての反応ルールを更新する。 The communication processing unit 280 is responsible for communication with the server 300. As described above, the communication processing unit 280 transmits user reaction information to the server 300. In addition, the communication processing unit 280 receives updated reaction rules from the server 300. When the communication processing unit 280 receives updated reaction rules from the server 300, it updates the reaction rules as the behavioral decision model 221.

サーバ３００は、ロボット１００、ロボット１０１及びロボット１０２とサーバ３００との間の通信を行い、ロボット１００から送信されたユーザ反応情報を受信し、ポジティブな反応が得られた行動を含む反応ルールに基づいて、反応ルールを更新する。 The server 300 communicates with the robots 100, 101, and 102, receives the user reaction information transmitted from the robot 100, and updates the reaction rules based on reaction rules that include behaviors for which positive reactions were obtained.

関連情報収集部２７０は、所定のタイミングで、ユーザ１０について取得した好み情報に基づいて、外部データ（ニュースサイト、動画サイトなどのＷｅｂサイト）から、好み情報に関連する情報を収集する。 The related information collection unit 270 collects information related to the preference information acquired about the user 10 at a predetermined timing from external data (websites such as news sites and video sites) based on the preference information acquired about the user 10.

具体的には、関連情報収集部２７０は、ユーザ１０の発話内容、又はユーザ１０による設定操作から、ユーザ１０の関心がある事柄を表す好み情報を取得しておく。関連情報収集部２７０は、一定期間毎に、好み情報に関連するニュースを、例えばＣｈａｔＧＰＴＰｌｕｇｉｎｓ（インターネット検索＜URL: https://openai.com/blog/chatgpt-plugins＞）を用いて、外部データから収集する。例えば、ユーザ１０が特定のプロ野球チームのファンであることが好み情報として取得されている場合、関連情報収集部２７０は、毎日、所定時刻に、特定のプロ野球チームの試合結果に関連するニュースを、例えばＣｈａｔＧＰＴＰｌｕｇｉｎｓを用いて、外部データから収集する。 Specifically, the related information collection unit 270 acquires preference information indicating matters of interest to the user 10 from the contents of the speech of the user 10 or from a setting operation by the user 10. The related information collection unit 270 periodically collects news related to the preference information from external data, for example, using ChatGPT Plugins (Internet search <URL: https://openai.com/blog/chatgpt-plugins>). For example, if it is acquired as preference information that the user 10 is a fan of a specific professional baseball team, the related information collection unit 270 collects news related to the game results of the specific professional baseball team from external data at a predetermined time every day, for example, using ChatGPT Plugins.

感情決定部２３２は、関連情報収集部２７０によって収集した好み情報に関連する情報に基づいて、ロボット１００の感情を決定する。 The emotion determination unit 232 determines the emotion of the robot 100 based on information related to the preference information collected by the related information collection unit 270.

具体的には、感情決定部２３２は、関連情報収集部２７０によって収集した好み情報に関連する情報を表すテキストを、感情を判定するための予め学習されたニューラルネットワークに入力し、各感情を示す感情値を取得し、ロボット１００の感情を決定する。例えば、収集した特定のプロ野球チームの試合結果に関連するニュースが、特定のプロ野球チームが勝ったことを示している場合、ロボット１００の「喜」の感情値が大きくなるように決定する。 Specifically, the emotion determination unit 232 inputs text representing information related to the preference information collected by the related information collection unit 270 into a pre-trained neural network for determining emotions, obtains emotion values indicating each emotion, and determines the emotion of the robot 100. For example, if the collected news related to the game results of a specific professional baseball team indicates that the specific professional baseball team won, the emotion determination unit 232 determines that the emotion value of "joy" for the robot 100 is large.

記憶制御部２３８は、ロボット１００の感情値が閾値以上である場合に、関連情報収集部２７０によって収集した好み情報に関連する情報を、収集データ２２３に格納する。 When the emotion value of the robot 100 is equal to or greater than the threshold value, the memory control unit 238 stores information related to the preference information collected by the related information collection unit 270 in the collected data 223.

次に、ロボット１００及び通信端末が自律的に行動する自律的処理を行う際の、行動決定部２３６の処理について説明する。 Next, we will explain the processing of the behavior decision unit 236 when the robot 100 and the communication terminal perform autonomous processing to act autonomously.

本実施形態における自律的処理では、ロボット１００及び通信端末は、ユーザを監視することで、自発的に又は定期的に、ユーザの状態又は行動を検知してよい。自発的は、ロボット１００及び通信端末が外部から契機なしに、ユーザの状態又は行動を自ら進んで取得することと解釈してよい。外部から契機は、ユーザからロボット１００及び通信端末への質問、ユーザからロボット１００及び通信端末への能動的な行動などを含み得る。定期的とは、１秒単位、１分単位、１時間単位、数時間単位、数日単位、週単位、曜日単位などの、特定周期と解釈してよい。 In the autonomous processing of this embodiment, the robot 100 and the communication terminal may detect the user's state or behavior spontaneously or periodically by monitoring the user. Spontaneous may be interpreted as the robot 100 and the communication terminal acquiring the user's state or behavior of their own accord without an external trigger. External triggers may include questions from the user to the robot 100 and the communication terminal, active behavior from the user to the robot 100 and the communication terminal, etc. Periodically may be interpreted as a specific cycle, such as every second, every minute, every hour, every few hours, every few days, every week, or every day of the week.

ユーザの状態は、ユーザの行動傾向を含み得る。行動傾向は、ユーザが長時間歩行すること、ユーザが長時間ランニングすること、ユーザが危険な動作を行うことを含み得る。危険な動作は、ユーザが車道を歩くこと、ユーザが歩道から車道に侵入することを含み得る。行動傾向は、ユーザの多動性又は衝動性のある行動の傾向と解釈してもよい。 The user's state may include the user's behavioral tendency. The behavioral tendency may include the user walking for a long time, the user running for a long time, and the user performing a dangerous action. The dangerous action may include the user walking on the roadway, and the user entering the roadway from the sidewalk. The behavioral tendency may be interpreted as the user's tendency to behave hyperactively or impulsively.

また、自律的処理では、ロボット１００及び通信端末は、検知したユーザの状態又は行動について、生成系ＡＩに質問し、質問に対する生成系ＡＩの回答と、検知したユーザの行動とを、対応付けて記憶してよい。このとき、ロボット１００及び通信端末は、当該行動を是正する行動内容を、当該回答に対応付けて記憶してよい。 In addition, in the autonomous processing, the robot 100 and the communication terminal may ask the generative AI questions about the detected state or behavior of the user, and may store the generative AI's answer to the question in association with the detected user behavior. At this time, the robot 100 and the communication terminal may store the action content for correcting the behavior in association with the answer.

質問に対する生成系ＡＩの回答と、検知したユーザの行動と、行動を是正する行動内容とを対応付けた情報は、テーブル情報として、メモリなどの記憶媒体に記録してよい。当該テーブル情報は、記憶部に記録された特定情報と解釈してよい。 The information associating the generative AI's answer to the question, the detected user behavior, and the action content for correcting the behavior may be recorded as table information in a storage medium such as a memory. The table information may be interpreted as specific information recorded in the storage unit.

具体的には、ユーザが長時間歩行する傾向がある場合に、ロボット１００及び通信端末が生成系ＡＩに「このような行動をとるユーザにどのような案内を行うべきか？」という質問を行い、この質問に対する生成系ＡＩの回答が「以前通ったルートを同じルートを提案する」である場合、ロボット１００及び通信端末は、当該回答と、長時間歩行し得るユーザの行動と、行動を是正する行動内容（例えば「以前にも通ったルートを案内します」というガイダンス内容）とを、テーブル情報に対応付けて記録してよい。長時間歩行しているユーザの行動を検知した場合、ロボット１００及び通信端末の行動決定部２３６は、当該テーブル情報を利用することで、ロボット１００及び通信端末の行動として、「以前にも通ったルートを案内します」という音声を再生してよい。 Specifically, when a user has a tendency to walk for long periods of time, the robot 100 and the communication terminal ask the generative AI, "What kind of guidance should be given to a user who behaves like this?", and when the generative AI answers this question with, "Propose the same route as the user has taken before," the robot 100 and the communication terminal may record the answer, the user's behavior that may lead to long periods of walking, and the action content for correcting the behavior (for example, guidance content such as "We will guide you along a route you have taken before") in association with table information. When detecting the behavior of a user who has been walking for long periods of time, the action decision unit 236 of the robot 100 and the communication terminal may use the table information to play a voice stating, "We will guide you along a route you have taken before," as the action of the robot 100 and the communication terminal.

他の例として、ユーザが長時間ランニングする傾向がある場合に、ロボット１００及び通信端末が生成系ＡＩに「このような行動をとるユーザにどのような案内を行うべきか？」という質問を行い、この質問に対する生成系ＡＩの回答が「水分補給を促す案内を定期的に行うことが望ましい」である場合、ロボット１００及び通信端末は、当該回答と、長時間ランニングし得るユーザの行動と、行動を是正する行動内容（例えば「水分補給をこまめに行ってください」というガイダンス内容）とを、テーブル情報に対応付けて記録してよい。長時間ランニングしているユーザの行動を検知した場合、ロボット１００及び通信端末の行動決定部２３６は、当該テーブル情報を利用することで、ロボット１００及び通信端末の行動として、「水分補給をこまめに行ってください」という音声を再生してよい。 As another example, if a user has a tendency to run for long periods of time, the robot 100 and the communication terminal may ask the generative AI, "What kind of guidance should be given to a user who behaves in this way?", and if the generative AI answers this question by saying, "It is desirable to periodically provide guidance encouraging the user to hydrate," the robot 100 and the communication terminal may record the answer, the behavior of the user who may be running for long periods of time, and the behavioral content for correcting the behavior (for example, the guidance content "Please hydrate frequently") in association with table information. When the behavior of a user who is running for long periods of time is detected, the behavior decision unit 236 of the robot 100 and the communication terminal may use the table information to play a voice message saying, "Please hydrate frequently," as the behavior of the robot 100 and the communication terminal.

他の例として、ユーザが車道を歩く傾向がある場合に、ロボット１００及び通信端末が生成系ＡＩに「このような行動をとるユーザにどのような案内を行うべきか？」という質問を行い、この質問に対する生成系ＡＩの回答が「車両と接触する可能性があるので歩道に戻ることを提案することを推奨する」である場合、ロボット１００及び通信端末は、当該回答と、車道を歩き得るユーザの行動と、行動を是正する行動内容（例えば「歩道に戻ってください」というガイダンス内容）とを、テーブル情報に対応付けて記録してよい。車道を歩いているユーザの行動を検知した場合、ロボット１００及び通信端末の行動決定部２３６は、当該テーブル情報を利用することで、ロボット１００及び通信端末の行動として、「歩道に戻ってください」という音声を再生してよい。 As another example, if a user has a tendency to walk on the roadway, the robot 100 and the communication terminal may ask the generative AI, "What kind of guidance should be given to a user who behaves in this way?", and if the generative AI's answer to this question is, "There is a possibility of contact with a vehicle, so we recommend suggesting that the user return to the sidewalk," the robot 100 and the communication terminal may record this answer, the behavior of the user who may walk on the roadway, and the action content for correcting the behavior (for example, the guidance content "Please return to the sidewalk") in association with the table information. When the behavior of a user walking on the roadway is detected, the behavior decision unit 236 of the robot 100 and the communication terminal may use the table information to play a voice message saying, "Please return to the sidewalk," as the behavior of the robot 100 and the communication terminal.

また自律的処理では、検出したユーザの行動と、記憶した特定情報とに基づき、ユーザの状態又は行動に対して、注意を促すロボット１００及び通信端末の行動予定を設定してよい。 In addition, in the autonomous processing, a behavioral schedule may be set for the robot 100 and the communication terminal to alert the user to the user's state or behavior, based on the detected user behavior and the stored specific information.

前述したように、ロボット１００及び通信端末は、ユーザの状態又は行動に対応する生成系ＡＩの回答と、検知したユーザの状態又は行動とを対応付けたテーブル情報を記憶媒体に記録し得る。以下に、テーブルに記憶する内容の例について説明する。 As described above, the robot 100 and the communication terminal can record table information in a storage medium that associates the generative AI's response corresponding to the user's state or behavior with the detected user's state or behavior. An example of the contents stored in the table is described below.

（１．ユーザが長時間歩行する傾向がある場合） (1. When the user tends to walk for long periods of time)
当該傾向がある場合、ロボット１００及び通信端末は、ロボット１００及び通信端末自ら生成系ＡＩに、「このような行動をとるユーザにどのような案内を行うべきか？」という質問を行う。この質問に対する生成系ＡＩの回答が、例えば「以前通ったルートとは異なるルートを提案する」、「雨が降りそうなので雨宿りできる場所へ案内するルートを提案する」、「自転車などの車両の通行が比較的少ないルートを提案する」などである場合、ロボット１００及び通信端末は、ユーザの行動と、生成系ＡＩの回答を対応付けて記憶してよい。 When such a tendency exists, the robot 100 and the communication terminal themselves ask the generative AI, "What kind of guidance should be provided to a user who behaves in this way?" If the generative AI answers this question with, for example, "Suggest a route different from the route taken previously," "Suggest a route that guides the user to a place where they can take shelter from the rain because it looks like it is about to rain," or "Suggest a route with relatively little traffic of vehicles such as bicycles," the robot 100 and the communication terminal may store the user's behavior and the generative AI's answer in association with each other.

（２．ユーザが長時間ランニングする傾向がある場合） (2. When the user tends to run for long periods of time)
当該傾向がある場合、ロボット１００及び通信端末は、ロボット１００及び通信端末自ら生成系ＡＩに、「このような行動をとるユーザにどのような案内を行うべきか？」という質問を行う。この質問に対する生成系ＡＩの回答が、例えば「水分補給を促す案内を定期的に行う」、「自転車が少なく空気がきれいなルートを提案する」、「信号機が少ないルートを提案する」、「負荷を高めるため起伏のあるルートを提案する」などである場合、ロボット１００及び通信端末は、ユーザの行動と、生成系ＡＩの回答を対応付けて記憶してよい。 When such a tendency exists, the robot 100 and the communication terminal themselves ask the generative AI, "What kind of guidance should be provided to a user who behaves in this way?" If the generative AI answers this question with, for example, "Provide guidance encouraging hydration at regular intervals," "Suggest a route with fewer bicycles and cleaner air," "Suggest a route with fewer traffic lights," or "Suggest an undulating route to increase the load," the robot 100 and the communication terminal may store the user's behavior and the generative AI's answer in association with each other.

（３．ユーザが車道を歩く傾向がある場合） (3. When the user tends to walk on the roadway)
当該傾向がある場合、ロボット１００及び通信端末は、ロボット１００及び通信端末自ら生成系ＡＩに、「このような行動をとるユーザにどのような事態が及び得るか？」という質問を行う。この質問に対する生成系ＡＩの回答が、例えば「自転車などの車両と接触する可能性がある」、「高速ＩＣに侵入し得る」、「できるだけ歩道を歩くように促すアドバイスをする」などである場合、ロボット１００及び通信端末は、車道を歩くというユーザの行動と、生成系ＡＩの回答を対応付けて記憶してよい。またロボット１００及び通信端末は、当該行動を是正する行動内容を、当該回答に対応付けて記憶してよい。 When such a tendency exists, the robot 100 and the communication terminal themselves ask the generative AI, "What kind of situation may befall a user who behaves in this way?" If the generative AI answers this question with, for example, "There is a possibility of contact with a vehicle such as a bicycle," "The user may enter a highway interchange," or "Provide advice encouraging the user to walk on the sidewalk as much as possible," the robot 100 and the communication terminal may store the user's behavior of walking on the roadway in association with the generative AI's answer. The robot 100 and the communication terminal may also store the action content for correcting the behavior in association with the answer.

行動を是正する行動内容は、ユーザの危険な行動を是正する音声の再生、及び、ユーザの危険な行動を是正する画像の再生の少なくとも１つを含めてよい。 The behavioral content for correcting the behavior may include at least one of playing audio to correct the user's risky behavior and playing an image to correct the user's risky behavior.

ユーザの危険な行動を是正する音声は、ユーザを特定の場所に誘導する音声を含み得る。特定の場所は、ユーザを現在位置する場所以外の場所、例えば、歩道などを含めてよい。具体的には、当該音声は、「車道を歩くのはやめてください」、「そこは歩道ではありません」、「車道は危ない」、「すぐ歩道に移動して下さい」などの音声を含めてよい。 The audio that corrects the user's risky behavior may include audio that guides the user to a specific location. The specific location may include a location other than the user's current location, such as a sidewalk. Specifically, the audio may include audio such as "Please do not walk on the road," "That is not a sidewalk," "The road is dangerous," and "Move to the sidewalk immediately."

このように、自律的処理では、ユーザの状態又は行動に対応する生成系ＡＩの回答と、当該状態又は行動の内容と、当該状態又は行動を是正する行動内容とを対応付けたテーブルを、メモリなどの記憶媒体に記録してよい。 In this way, in autonomous processing, a table that associates the generative AI's response corresponding to the user's state or behavior, the content of that state or behavior, and the content of the behavior that corrects that state or behavior may be recorded in a storage medium such as a memory.
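
The table described above could be represented in many ways; the following is a minimal sketch using an in-memory dictionary. The class name `GuidanceEntry`, the behavior keys, and the function `first_action_for` are hypothetical, while the answer and guidance strings are taken from the examples in this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GuidanceEntry:
    ai_answer: str          # the generative AI's answer to the question
    corrective_action: str  # the guidance to play back when the behavior is detected


# Hypothetical behavior keys mapped to the example answers and guidance from the text.
GUIDANCE_TABLE: dict[str, GuidanceEntry] = {
    "long_walk": GuidanceEntry(
        "Propose the same route as the user has taken before",
        "We will guide you along a route you have taken before",
    ),
    "long_run": GuidanceEntry(
        "It is desirable to periodically provide guidance encouraging the user to hydrate",
        "Please hydrate frequently",
    ),
    "walking_on_roadway": GuidanceEntry(
        "There is a possibility of contact with a vehicle, so we recommend "
        "suggesting that the user return to the sidewalk",
        "Please return to the sidewalk",
    ),
}


def first_action_for(detected_behavior: str) -> Optional[str]:
    """Look up the guidance for a detected behavior; None means no action is taken."""
    entry = GUIDANCE_TABLE.get(detected_behavior)
    return entry.corrective_action if entry else None
```

Keeping the generative AI's answer next to the guidance text mirrors the three-way association described above, even though only the guidance is needed at playback time.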

また、自律的処理では、当該テーブルを記録した後、ユーザの行動を自律的又は定期的に検出し、検出したユーザの行動と記憶したテーブルの内容とに基づき、ユーザに注意を促す通信端末の行動予定を設定してよい。具体的には、ロボット１００及び通信端末の行動決定部２３６が、検出したユーザの行動と記憶したテーブルの内容とに基づき、第１行動内容を設定してよい。 In addition, in the autonomous processing, after recording the table, the user's behavior may be detected autonomously or periodically, and a behavioral plan for the communication terminal that alerts the user may be set based on the detected user's behavior and the contents of the stored table. Specifically, the behavior decision unit 236 of the robot 100 and the communication terminal may set the first behavior content based on the detected user's behavior and the contents of the stored table.

具体的には、行動決定部２３６は、自発的に又は定期的に、センサに基づきユーザの行動を検知し、検知したユーザの行動と予め記憶した特定情報とに基づき、電子機器の行動として、ユーザを先導することを決定した場合には、第１行動内容を決定してよい。 Specifically, the behavior decision unit 236 may detect the user's behavior based on a sensor, either autonomously or periodically, and when it determines that the behavior of the electronic device is to lead the user based on the detected user's behavior and specific information stored in advance, it may determine the first behavior content.

センサは、通信端末に設けられている１又は複数のセンサと解釈してよい。具体的には、１又は複数のセンサは、カメラ、マイク、ジャイロセンサなどを含み得る。 The sensor may be interpreted as one or more sensors provided in the communication terminal. Specifically, the one or more sensors may include a camera, a microphone, a gyro sensor, etc.

ここで、図１６を参照して、通信端末の具体的について説明する。 Here, the communication terminal will be described in detail with reference to FIG. 16.

図１６は、本開示の実施形態に係る通信端末の外観図である。図１６に示すように、通信端末１００Ｍは、ネックストラップ６００に着脱可能に設けられる電子機器と解釈してよい。通信端末１００Ｍは、後述するスマートホンと解釈してよく、スマートＡＩデバイスと解釈してもよい。通信端末１００Ｍは、前述したロボット１００～１０２の機能を備えてよい。ネックストラップ６００は、通信端末１００Ｍに係止めされるフック形状の部材を備えてよい。 FIG. 16 is an external view of a communication terminal according to an embodiment of the present disclosure. As shown in FIG. 16, the communication terminal 100M may be interpreted as an electronic device that is detachably attached to a neck strap 600. The communication terminal 100M may be interpreted as a smartphone, which will be described later, or as a smart AI device. The communication terminal 100M may have the functions of the robots 100 to 102 described above. The neck strap 600 may have a hook-shaped member that is engaged with the communication terminal 100M.

（第１行動内容） (First action content)
第１行動内容は、音声及び画像の少なくとも１つを通信端末１００Ｍが再生することでユーザを先導する通信端末１００Ｍの行動を含み得る。以下に、第１行動内容の例について説明する。 The first action content may include an action in which the communication terminal 100M leads the user by playing back at least one of a sound and an image. Examples of the first action content are described below.

行動決定部２３６は、歩行するユーザが特定ルートを移動するようにルート案内する第１行動内容として、ユーザの歩行を補助する音声及び画像の少なくとも１つの再生を実行し得る。具体的には、行動決定部２３６は、通信端末１００Ｍがネックストラップ６００に装着されたことを検出した場合、ユーザが定期的に向かう傾向のある場所を、自発的に検索し、検索した場所までのルートを案内してよい。行動決定部２３６は、さらに、通信端末１００Ｍが有するセンサに含まれるカメラがユーザの進行方向に向けられていることを検出した場合、ユーザが定期的に向かう傾向のある場所を、自発的に検索し、検索した場所までのルートを案内してよい。また行動決定部２３６は、ユーザが歩行し得るルートの案内に関する音声を再生してよい。当該音声は、「Ａさんがこの時間によく向かう場所を検索しました。案内を開始します」、図１７に示す「１０ｍ先を右方向です」などを含み得る。行動決定部２３６はユーザが歩行し得るルートの案内に関する画像を通信端末１００Ｍの画面に再生してもよい。 The action decision unit 236 may execute the reproduction of at least one of a sound and an image that assists the user in walking as a first action content that guides the user to move along a specific route. Specifically, when the action decision unit 236 detects that the communication terminal 100M is attached to the neck strap 600, it may spontaneously search for a place that the user tends to go to periodically and guide the user along a route to the searched place. When the action decision unit 236 detects that the camera included in the sensor of the communication terminal 100M is pointed in the user's traveling direction, it may spontaneously search for a place that the user tends to go to periodically and guide the user along a route to the searched place. The action decision unit 236 may also reproduce a sound related to the guidance of a route that the user can walk. The sound may include, for example, "Places that Mr. A often goes to at this time have been searched for. Guidance will begin," and "Turn right 10 meters ahead," as shown in FIG. 17. The action decision unit 236 may play an image on the screen of the communication terminal 100M that provides guidance on a route that the user can walk.

行動決定部２３６は、設定された案内ルートと別のルートを案内する音声及び画像の少なくとも１つの再生を実行してよい。具体的には、行動決定部２３６は、設定された案内ルートの交通状況、当該ルート付近の気候、現在時刻などを含めて、ユーザが定期的に向かう傾向のある場所を、自発的に再検索し、再検索した場所までのルート、つまり、設定された案内ルートとは別のルートに関連する音声などを再生してよい。より具体的には、当該音声は、「今日は交通量が多いため別ルートを検索しました。別ルートで案内しますか？」、図１８に示すように「この先通行止めです。別ルートを案内しますか？」などを含み得る。 The behavior decision unit 236 may execute playback of at least one of audio and images guiding a route other than the set guidance route. Specifically, the behavior decision unit 236 may autonomously re-search for places to which the user tends to go periodically, including the traffic conditions of the set guidance route, the weather near the route, the current time, and the like, and play audio and the like related to the route to the re-searched place, that is, a route other than the set guidance route. More specifically, the audio may include, "Traffic is heavy today, so we have searched for an alternative route. Would you like to be guided along the alternative route?", or, as shown in FIG. 18, "The road is closed ahead. Would you like to be guided along an alternative route?", etc.

行動決定部２３６は、設定された案内ルートを先導されるユーザが、案内ルートから外れた場合に、ユーザを案内ルートに戻す経路を自発的に検索し、検索した経路を案内する音声及び画像の少なくとも１つの再生を実行してよい。具体的には、行動決定部２３６は、案内ルートから外れた経路の交通状況、当該経路付近の気候、現在時刻、当該経路付近で過去に起きた出来事などを含めて、ユーザが戻り得る場所を、自発的に再検索し、再検索した場所までのルートに関連する音声などを再生してよい。つまり、設定された案内ルートに戻る経路に関連する音声を再生してよい。より具体的には、当該音声は、「人通りが少なく危険なので、元ルートに戻る経路を検索しました」、「雨が降り出しました。以前土砂崩れが発生した地域のようです。元の道に戻りますか？」、図１９に示すように「ルートを外れました。元ルートに戻る経路を案内します。」などを含み得る。ロボット１００及び通信端末の行動決定部２３６は、前述した音声、例えば「以前にも通ったルートを案内します」、「水分補給をこまめに行ってください」、「歩道に戻ってください」などの音声及び画像の少なくとも１つの再生を実行してよい。 When a user being led along a set guidance route deviates from the guidance route, the behavior decision unit 236 may spontaneously search for a route that returns the user to the guidance route, and may execute playback of at least one of audio and images that guide the user along the searched route. Specifically, the behavior decision unit 236 may spontaneously re-search for a place to which the user can return, including the traffic conditions on the route that deviated from the guidance route, the weather near the route, the current time, and past events that occurred near the route, and may play audio related to the route to the re-searched place. In other words, audio related to a route that returns to the set guidance route may be played. More specifically, the audio may include, "Since there are few people and it is dangerous, we have searched for a route that returns to the original route," "It has started to rain. It seems to be an area where a landslide occurred before. Would you like to return to the original road?", and, as shown in FIG. 19, "You have deviated from the route. We will guide you to a route that returns to the original route." The behavior decision unit 236 of the robot 100 and the communication terminal may reproduce at least one of the above-mentioned sounds and images, such as "I will guide you along a route you have taken before," "Please drink water frequently," and "Please return to the sidewalk."

（第２行動内容） (Second Action Content)
行動決定部２３６は、通信端末１００Ｍがユーザの先導中に、又は、ユーザの歩行を補助する音声及び画像の少なくとも１つを再生した後に、ユーザの行動を検出することでユーザの行動が是正されたか否かを判定し、ユーザの行動が是正された場合、第１行動内容と異なる第２行動内容を決定してよい。 While the communication terminal 100M is leading the user, or after it plays at least one of audio and images that assist the user in walking, the behavior decision unit 236 may detect the user's behavior to determine whether that behavior has been corrected and, if it has, may decide on second action content different from the first action content.

ユーザの行動が是正された場合とは、第１行動内容による通信端末１００Ｍの動作が実行された結果、ユーザが特定行動及び特定行為を辞めた場合、又は、特定状況が解消された場合と解釈してよい。具体的には、ユーザの行動が是正された場合は、提案した別ルートに従ってユーザが歩行を開始した場合、ルートを外れたことで元ルートに戻る経路を案内したときに、その経路に従ってユーザが歩行を開始した場合などを含み得る。 A case where the user's behavior is corrected may be interpreted as a case where the user stops the specific behavior and specific action or the specific situation is resolved as a result of the operation of the communication terminal 100M being executed according to the first action content. Specifically, a case where the user's behavior is corrected may include a case where the user starts walking according to a suggested alternative route, a case where the user deviates from the route and is guided to a route back to the original route and starts walking according to that route, etc.

第２行動内容は、ユーザの行動を褒める音声、ユーザの行動に対して感謝する音声、ユーザの行動を褒める画像、及び、ユーザの行動に対して感謝する画像の少なくとも１つの再生を含めて良い。 The second action content may include playing at least one of audio praising the user's action, audio thanking the user for the user's action, an image praising the user's action, and an image thanking the user for the user's action.

具体的には、ユーザの行動を褒める音声は、図２０に示すように「よく戻れましたね！素晴らしい」などの音声を含めてよい。ユーザの行動に対して感謝する音声は、図２１に示すように「戻ってくれて有り難うございます」、「別ルートを選択してくれて有り難うございます」という音声を含めてよい。ユーザの行動を褒める画像は、例えばグッドポーズをとるキャラクタの画像を含めてよい。ユーザの行動に対して感謝する画像は、例えばお礼をするキャラクタの画像を含めてよい。 Specifically, audio praising the user's actions may include audio such as "You made it back! That's great," as shown in FIG. 20. Audio thanking the user for their actions may include audio such as "Thank you for coming back," or "Thank you for choosing a different route," as shown in FIG. 21. An image praising the user's actions may include, for example, an image of a character striking a good pose. An image thanking the user for their actions may include, for example, an image of a character giving thanks.

（第３行動内容） (Third Action Content)
行動決定部２３６は、通信端末１００Ｍがユーザの先導中に、又は、ユーザの歩行を補助する音声及び画像の少なくとも１つを再生した後に、ユーザの行動を検出することでユーザの行動が是正されたか否かを判定し、ユーザの行動が是正されていない場合、第１行動内容と異なる第３行動内容を決定してよい。 While the communication terminal 100M is leading the user, or after it plays at least one of audio and images that assist the user in walking, the behavior decision unit 236 may detect the user's behavior to determine whether that behavior has been corrected and, if it has not, may decide on third action content different from the first action content.

ユーザの行動が是正されていない場合とは、第１行動内容による通信端末１００Ｍの動作が実行されたにもかかわらず、ユーザが危険な行動及び行為を継続した場合、危険な状況が解消されていない場合、又は、ユーザが通信端末１００Ｍの提案とは異なる行動を開始又は継続した場合と解釈してよい。 A case where the user's behavior is not corrected may be interpreted as a case where the user continues dangerous behavior and actions despite the operation of the communication terminal 100M according to the first action content being executed, a case where the dangerous situation is not resolved, or a case where the user starts or continues a behavior different from that suggested by the communication terminal 100M.

具体的には、ユーザの行動が是正されていない場合は、提案した別ルートに従ってユーザが歩行を行わずに当該ルートとは異なるルートを歩行した場合、ルートを外れたことで元ルートに戻る経路を案内したときに、その経路を無視してユーザが歩行を継続した場合などを含み得る。 Specifically, if the user's behavior is not corrected, this may include cases where the user does not follow the suggested alternative route but walks a different route, or where the user strays from the route and is guided back to the original route, but continues walking while ignoring the suggested route.

第３行動内容は、ユーザ以外の人物への特定情報の送信、ユーザの興味を引く音の再生、ユーザの興味を引く画像の再生の少なくとも１つを含めてよい。 The third action content may include at least one of sending specific information to a person other than the user, playing a sound that attracts the user's interest, and playing an image that attracts the user's interest.

ユーザ以外の人物への特定情報の送信は、ユーザの親族、ユーザの配偶者、ユーザの保護者などに対する、警告メッセージが記載されたメールの配信を含めてよい。ユーザ以外の人物への特定情報の送信は、ユーザの親族、ユーザの配偶者、ユーザの保護者などに対する、ユーザとその周囲の風景を含む画像（静止画像、動画像）の配信などを含めてよい。ユーザ以外の人物への特定情報の送信は、警告メッセージを有する音声の配信を含めてよい。 The sending of specific information to persons other than the user may include the delivery of emails containing a warning message to the user's relatives, the user's spouse, the user's guardian, etc. The sending of specific information to persons other than the user may include the delivery of images (still images, moving images) containing the user and the scenery around the user to the user's relatives, the user's spouse, the user's guardian, etc. The sending of specific information to persons other than the user may include the delivery of audio containing a warning message.

ユーザの興味を引く音の再生は、ユーザが好きな特定の音楽を含めてよく、また「こっちに戻って下さい」、「安全な道を通りましょう」などの音声を含めてよい。 The playing of sounds to interest the user may include specific music the user likes, and may also include voice prompts such as "come back this way" and "take a safe route."

ユーザの興味を引く画像の再生は、ユーザが飼っている動物の画像、ユーザの両親の画像、動物のキャラクタの画像などを含めてよい。 Playback of images that interest the user may include images of the user's pets, images of the user's parents, images of animal characters, etc.
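
The branch between the second and third action content described above can be sketched as follows. This is a minimal illustration only, under assumptions that the disclosure does not state: positions and route vertices are planar coordinates in meters, "corrected" is approximated as every recent position lying within a hypothetical tolerance of the suggested route, and the names `FollowUp`, `distance_to_route`, `behavior_corrected`, `choose_follow_up`, and `tolerance_m` are illustrative.

```python
import math
from enum import Enum


class FollowUp(Enum):
    SECOND = "praise_or_thank"    # second action content (user followed the suggestion)
    THIRD = "notify_or_attract"   # third action content (user did not follow it)


def distance_to_route(point, route):
    # Coarse stand-in for point-to-polyline distance: distance to the nearest vertex.
    return min(math.dist(point, vertex) for vertex in route)


def behavior_corrected(recent_positions, suggested_route, tolerance_m=15.0):
    # Assumption: with no observations, treat the behavior as not corrected.
    if not recent_positions:
        return False
    return all(
        distance_to_route(p, suggested_route) <= tolerance_m
        for p in recent_positions
    )


def choose_follow_up(recent_positions, suggested_route, tolerance_m=15.0):
    # Corrected behavior leads to the second action content, otherwise the third.
    if behavior_corrected(recent_positions, suggested_route, tolerance_m):
        return FollowUp.SECOND
    return FollowUp.THIRD
```

A real device would likely use geodesic coordinates and a proper distance to the route polyline; the vertex-based distance here only keeps the sketch short.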

本開示のロボット１００及び通信端末によれば、自律的処理によって、ユーザを先導する行動を実行し得る。これにより、ロボット１００及び通信端末は、初めての街を歩行するユーザを適切なルートに案内し得る。また歩行が趣味のユーザに対してこれまで通ったことがないルートを案内し得る。また通行できないルートを事前に検知して、ユーザを別ルートへ誘導し得る。また道に迷い特定のルートから外れたユーザの元のルートに誘導し得る。 The robot 100 and communication terminal disclosed herein can perform actions to lead the user through autonomous processing. This allows the robot 100 and communication terminal to guide a user walking in a city for the first time to an appropriate route. It can also guide a user who enjoys walking to a route they have not taken before. It can also detect impassable routes in advance and guide the user to a different route. It can also guide a user who has become lost and deviated from a specific route back to the original route.

行動決定部２３６は、所定のタイミングで、ユーザ１０の状態、ユーザ１０の感情、ロボット１００及び通信端末の感情、及びロボット１００及び通信端末の状態の少なくとも一つと、行動決定モデル２２１とを用いて、行動しないことを含む複数種類のロボット行動及び通信端末行動の何れかを、ロボット１００及び通信端末の行動として決定する。ここでは、行動決定モデル２２１として、対話機能を有する文章生成モデルを用いる場合を例に説明する。 The behavior decision unit 236 uses at least one of the state of the user 10, the emotion of the user 10, the emotion of the robot 100 and the communication terminal, and the state of the robot 100 and the communication terminal, and the behavior decision model 221 at a predetermined timing to decide one of a plurality of types of robot behaviors and communication terminal behaviors, including no behavior, as the behavior of the robot 100 and the communication terminal. Here, an example will be described in which a sentence generation model with a dialogue function is used as the behavior decision model 221.

具体的には、行動決定部２３６は、ユーザ１０の状態、ユーザ１０の感情、ロボット１００及び通信端末の感情、及びロボット１００及び通信端末の状態の少なくとも一つを表すテキストと、ロボット行動及び通信端末行動を質問するテキストとを文章生成モデルに入力し、文章生成モデルの出力に基づいて、ロボット１００及び通信端末の行動を決定する。 Specifically, the behavior decision unit 236 inputs text representing at least one of the state of the user 10, the emotion of the user 10, the emotion of the robot 100 and the communication terminal, and the state of the robot 100 and the communication terminal, and text asking about the robot behavior and the communication terminal behavior, into a sentence generation model, and decides the behavior of the robot 100 and the communication terminal based on the output of the sentence generation model.

例えば、複数種類のロボット及び通信端末の行動は、以下の（１）～（２６）を含む。 For example, the behaviors of multiple types of robots and communication terminals include the following (1) to (26).

（１）ロボット１００及び通信端末１００Ｍは、何もしない。
（２）ロボット１００及び通信端末１００Ｍは、夢をみる。
（３）ロボット１００及び通信端末１００Ｍは、ユーザに話しかける。
（４）ロボット１００及び通信端末１００Ｍは、絵日記を作成する。
（５）ロボット１００及び通信端末１００Ｍは、アクティビティを提案する。
（６）ロボット１００及び通信端末１００Ｍは、ユーザが会うべき相手を提案する。
（７）ロボット１００及び通信端末１００Ｍは、ユーザが興味あるニュースを紹介する。
（８）ロボット１００及び通信端末１００Ｍは、写真や動画を編集する。
（９）ロボット１００及び通信端末１００Ｍは、ユーザと一緒に勉強する。
（１０）ロボット１００及び通信端末１００Ｍは、記憶を呼び起こす。
（１１）ロボット１００及び通信端末１００Ｍは、ユーザを先導する第１行動内容として、ユーザが歩行し得るルートの画像を通信端末１００Ｍの画面に再生してよい。
（１２）ロボット１００及び通信端末１００Ｍは、ユーザの行動を是正する第１行動内容として、ユーザが歩行し得るルートに関連する音声を再生してよい。
（１３）ロボット１００及び通信端末１００Ｍは、ユーザの行動を是正する第１行動内容として、設定された案内ルートとは別のルートの画像を通信端末１００Ｍの画面に再生してよい。
（１４）ロボット１００及び通信端末１００Ｍは、ユーザの行動を是正する第１行動内容として、設定された案内ルートとは別のルートに関連する音声を再生してよい。
（１５）ロボット１００及び通信端末１００Ｍは、ユーザの行動を是正する第１行動内容として、設定された案内ルートを先導されるユーザが、案内ルートから外れた場合に、設定された案内ルートに戻るルートの画像を通信端末１００Ｍの画面に再生してよい。
（１６）ロボット１００及び端末１００Ｍは、ユーザの行動を是正する第１行動内容として、設定された案内ルートを先導されるユーザが、案内ルートから外れた場合に、設定された案内ルートに戻るルートに関連する音声を再生してよい。
（１７）ロボット１００及び通信端末１００Ｍは、第１行動内容と異なる第２行動内容として、ユーザの行動を褒める音声を再生してよい。
（１８）ロボット１００及び通信端末１００Ｍは、第１行動内容と異なる第２行動内容として、ユーザの行動に対して感謝する音声を再生してよい。
（１９）ロボット１００及び通信端末１００Ｍは、第１行動内容と異なる第２行動内容として、ユーザの行動を褒める画像を再生してよい。
（２０）ロボット１００及び通信端末１００Ｍは、第１行動内容と異なる第２行動内容として、ユーザの行動に対して感謝する画像を再生してよい。
（２１）ロボット１００及び通信端末１００Ｍは、第１行動内容と異なる第３行動内容として、ユーザ以外の人物への特定情報を送信してよい。
（２２）ロボット１００及び通信端末１００Ｍは、第１行動内容と異なる第３行動内容として、ユーザの興味を引く音を再生してよい。
（２３）ロボット１００及び通信端末１００Ｍは、第１行動内容と異なる第３行動内容として、ユーザの興味を引く画像を再生してよい。

(1) The robot 100 and the communication terminal 100M do nothing.
(2) The robot 100 and the communication terminal 100M dream.
(3) The robot 100 and the communication terminal 100M talk to the user.
(4) The robot 100 and the communication terminal 100M create a picture diary.
(5) The robot 100 and the communication terminal 100M suggest an activity.
(6) The robot 100 and the communication terminal 100M suggest people that the user should meet.
(7) The robot 100 and the communication terminal 100M introduce news that the user is interested in.
(8) The robot 100 and the communication terminal 100M edit photos and videos.
(9) The robot 100 and the communication terminal 100M study together with the user.
(10) The robot 100 and the communication terminal 100M evoke memories.
(11) As a first action content for leading the user, the robot 100 and the communication terminal 100M may display an image of a route that the user may walk on the screen of the communication terminal 100M.
(12) The robot 100 and the communication terminal 100M may play back audio related to a route that the user may walk as a first action content for correcting the user's behavior.
(13) As a first action content for correcting the user's behavior, the robot 100 and the communication terminal 100M may play back, on the screen of the communication terminal 100M, an image of a route different from the set guidance route.
(14) The robot 100 and the communication terminal 100M may play back audio related to a route other than the set guidance route as a first action content for correcting the user's behavior.
(15) As a first action content for correcting the user's behavior, the robot 100 and the communication terminal 100M may play on the screen of the communication terminal 100M an image of a route that returns to the set guidance route when the user being led along the set guidance route deviates from the guided route.
(16) As a first action content for correcting the user's behavior, the robot 100 and the communication terminal 100M may play audio related to a route back to the set guidance route when the user being led along the set guidance route deviates from the guided route.
(17) The robot 100 and the communication terminal 100M may play back a sound praising the user's behavior as a second behavior content different from the first behavior content.
(18) The robot 100 and the communication terminal 100M may play back a voice expressing gratitude for the user's action as a second action content different from the first action content.
(19) The robot 100 and the communication terminal 100M may play back an image praising the user's behavior as a second behavior content different from the first behavior content.
(20) The robot 100 and the communication terminal 100M may play an image expressing gratitude for the user's action as a second action content different from the first action content.
(21) The robot 100 and the communication terminal 100M may transmit specific information to a person other than the user as a third action content different from the first action content.
(22) The robot 100 and the communication terminal 100M may play a sound that attracts the user's interest as a third behavior content different from the first behavior content.
(23) The robot 100 and the communication terminal 100M may play back an image that attracts the user's interest as a third action content different from the first action content.

行動決定部２３６は、一定時間の経過毎に、状態認識部２３０によって認識されたユーザ１０の状態及びロボット１００又は通信端末１００Ｍの状態、感情決定部２３２により決定されたユーザ１０の現在の感情値と、ロボット１００又は通信端末１００Ｍの現在の感情値とを表すテキストと、行動しないことを含む複数種類のロボット行動の何れかを質問するテキストとを、文章生成モデルに入力し、文章生成モデルの出力に基づいて、ロボット１００又は通信端末１００Ｍの行動を決定する。ここで、ロボット１００又は通信端末１００Ｍの周辺にユーザ１０がいない場合には、文章生成モデルに入力するテキストには、ユーザ１０の状態と、ユーザ１０の現在の感情値とを含めなくてもよいし、ユーザ１０がいないことを表すことを含めてもよい。以下では通信端末１００Ｍをロボットと読み替えてよい。通信端末１００Ｍは、スマートホンと解釈してよく、スマートＡＩデバイスと解釈してもよい。ロボット行動は通信端末行動を読み替えてよい。 The behavior determination unit 236 inputs the state of the user 10 and the state of the robot 100 or the communication terminal 100M recognized by the state recognition unit 230, the current emotion value of the user 10 determined by the emotion determination unit 232, text representing the current emotion value of the robot 100 or the communication terminal 100M, and text asking about one of a plurality of types of robot behaviors including not taking any action, into the sentence generation model every time a certain period of time has elapsed, and determines the behavior of the robot 100 or the communication terminal 100M based on the output of the sentence generation model. Here, when the user 10 is not present around the robot 100 or the communication terminal 100M, the text input to the sentence generation model does not need to include the state of the user 10 and the current emotion value of the user 10, or may include a representation that the user 10 is not present. In the following, the communication terminal 100M may be read as a robot. The communication terminal 100M may be interpreted as a smartphone or as a smart AI device. The robot behavior may be read as a communication terminal behavior.

一例として、「ロボットはとても楽しい状態です。ユーザは普通に楽しい状態です。ユーザは寝ています。ロボットの行動として、次の（１）～（２６）のうち、どれがよいですか？
（１）ロボットは何もしない。
（２）ロボットは夢をみる。
（３）ロボットはユーザに話しかける。
・・・」というテキストを、文章生成モデルに入力する。文章生成モデルの出力「（１）何もしない、または（２）ロボットは夢を見る、のどちらかが、最も適切な行動であると言えます。」に基づいて、ロボット１００の行動として、「（１）何もしない」または「（２）ロボットは夢を見る」を決定する。

As an example, the following text is input to the sentence generation model: "The robot is in a very happy state. The user is in a moderately happy state. The user is sleeping. Which of the following (1) to (26) is the best behavior for the robot?
(1) The robot does nothing.
(2) The robot dreams.
(3) The robot talks to the user.
..." Based on the output of the sentence generation model, "It can be said that either (1) doing nothing or (2) the robot dreams is the most appropriate behavior," the behavior of the robot 100 is determined to be "(1) doing nothing" or "(2) the robot dreams."

他の例として、「ロボットは少し寂しい状態です。ユーザは不在です。ロボットの周辺は暗いです。ロボットの行動として、次の（１）～（２６）のうち、どれがよいですか？
（１）ロボットは何もしない。
（２）ロボットは夢をみる。
（３）ロボットはユーザに話しかける。
・・・」というテキストを、文章生成モデルに入力する。文章生成モデルの出力「（２）ロボットは夢を見る、または（４）ロボットは、絵日記を作成する、のどちらかが、最も適切な行動であると言えます。」に基づいて、ロボット１００の行動として、「（２）ロボットは夢を見る」または「（４）ロボットは、絵日記を作成する。」を決定する。

As another example, the following text is input to the sentence generation model: "The robot is a little lonely. The user is not present. The robot's surroundings are dark. Which of the following (1) to (26) is the best behavior for the robot?
(1) The robot does nothing.
(2) The robot dreams.
(3) The robot talks to the user.
..." Based on the output of the sentence generation model, "It can be said that either (2) the robot dreams or (4) the robot creates a picture diary is the most appropriate behavior," the behavior of the robot 100 is determined to be "(2) the robot dreams" or "(4) the robot creates a picture diary."
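
The prompt construction and output handling in the two examples above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the shortened action table `ACTIONS`, the helper names `build_prompt` and `parse_choices`, and the rule of extracting half-width numbers in parentheses from the reply are all hypothetical, and the call to the sentence generation model itself is left out.

```python
import re

# Hypothetical, shortened action table; the description lists more entries.
ACTIONS = {
    1: "The robot does nothing.",
    2: "The robot dreams.",
    3: "The robot talks to the user.",
    4: "The robot creates a picture diary.",
}


def build_prompt(state_sentences, actions):
    # State and emotion text first, then the question, then the numbered choices.
    first, last = min(actions), max(actions)
    question = (
        f"Which of the following ({first}) to ({last}) "
        "is the best behavior for the robot?"
    )
    lines = [" ".join(state_sentences) + " " + question]
    lines += [f"({n}) {text}" for n, text in sorted(actions.items())]
    return "\n".join(lines)


def parse_choices(model_output, actions):
    # Collect every known action number mentioned in the reply, in order of appearance.
    chosen = []
    for match in re.finditer(r"\((\d+)\)", model_output):
        number = int(match.group(1))
        if number in actions and number not in chosen:
            chosen.append(number)
    return chosen
```

When the reply names more than one candidate, as in both examples, the description does not say how a single behavior is picked, so the sketch returns all candidates and leaves that choice to the caller.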

行動決定部２３６は、ロボット行動として、「（２）ロボットは夢をみる。」すなわち、オリジナルイベントを作成することを決定した場合には、文章生成モデルを用いて、履歴データ２２２のうちの複数のイベントデータを組み合わせたオリジナルイベントを作成する。このとき、記憶制御部２３８は、作成したオリジナルイベントを、履歴データ２２２に記憶させる。 When the behavior decision unit 236 decides on "(2) The robot dreams," i.e., creating an original event, as the robot behavior, it uses the sentence generation model to create an original event that combines multiple pieces of event data from the history data 222. At this time, the storage control unit 238 stores the created original event in the history data 222.

行動決定部２３６は、ロボット行動として、「（３）ロボットはユーザに話しかける。」、すなわち、ロボット１００が発話することを決定した場合には、文章生成モデルを用いて、ユーザ状態と、ユーザの感情又はロボットの感情とに対応するロボットの発話内容を決定する。このとき、行動制御部２５０は、決定したロボットの発話内容を表す音声を、制御対象２５２に含まれるスピーカから出力させる。なお、行動制御部２５０は、ロボット１００の周辺にユーザ１０が不在の場合には、決定したロボットの発話内容を表す音声を出力せずに、決定したロボットの発話内容を行動予定データ２２４に格納しておく。 When the behavior decision unit 236 decides that the robot 100 will speak, i.e., "(3) The robot speaks to the user," as the robot behavior, it uses a sentence generation model to decide the robot's utterance content corresponding to the user state and the user's emotion or the robot's emotion. At this time, the behavior control unit 250 causes a sound representing the determined robot's utterance content to be output from a speaker included in the control target 252. Note that, when the user 10 is not present around the robot 100, the behavior control unit 250 stores the determined robot's utterance content in the behavior schedule data 224 without outputting a sound representing the determined robot's utterance content.

行動決定部２３６は、ロボット行動として、「（７）ロボットは、ユーザが興味あるニュースを紹介する。」ことを決定した場合には、文章生成モデルを用いて、収集データ２２３に格納された情報に対応するロボットの発話内容を決定する。このとき、行動制御部２５０は、決定したロボットの発話内容を表す音声を、制御対象２５２に含まれるスピーカから出力させる。なお、行動制御部２５０は、ロボット１００の周辺にユーザ１０が不在の場合には、決定したロボットの発話内容を表す音声を出力せずに、決定したロボットの発話内容を行動予定データ２２４に格納しておく。 When the behavior decision unit 236 decides that the robot behavior is "(7) The robot introduces news that is of interest to the user," it uses a sentence generation model to decide the robot's utterance content corresponding to the information stored in the collected data 223. At this time, the behavior control unit 250 causes a sound representing the decided robot's utterance content to be output from a speaker included in the control target 252. Note that when the user 10 is not present around the robot 100, the behavior control unit 250 stores the decided robot's utterance content in the behavior schedule data 224 without outputting a sound representing the decided robot's utterance content.

行動決定部２３６は、ロボット行動として、「（４）ロボットは、絵日記を作成する。」、すなわち、ロボット１００がイベント画像を作成することを決定した場合には、履歴データ２２２から選択されるイベントデータについて、画像生成モデルを用いて、イベントデータを表す画像を生成すると共に、文章生成モデルを用いて、イベントデータを表す説明文を生成し、イベントデータを表す画像及びイベントデータを表す説明文の組み合わせを、イベント画像として出力する。なお、行動制御部２５０は、ロボット１００の周辺にユーザ１０が不在の場合には、イベント画像を出力せずに、イベント画像を行動予定データ２２４に格納しておく。 When the behavior decision unit 236 determines that the robot behavior is "(4) The robot creates a picture diary.", i.e., that the robot 100 creates an event image, it uses an image generation model to generate an image representing the event data for the event data selected from the history data 222, and uses a text generation model to generate an explanatory text representing the event data, and outputs the combination of the image representing the event data and the explanatory text representing the event data as an event image. Note that when the user 10 is not present around the robot 100, the behavior control unit 250 does not output the event image, but stores the event image in the behavior schedule data 224.

行動決定部２３６は、ロボット行動として、「（８）ロボットは、写真や動画を編集する。」、すなわち、画像を編集することを決定した場合には、履歴データ２２２から、感情値に基づいてイベントデータを選択し、選択されたイベントデータの画像データを編集して出力する。なお、行動制御部２５０は、ロボット１００の周辺にユーザ１０が不在の場合には、編集した画像データを出力せずに、編集した画像データを行動予定データ２２４に格納しておく。 When the behavior decision unit 236 determines that the robot behavior is "(8) The robot edits photos and videos," i.e., that an image is to be edited, it selects event data from the history data 222 based on the emotion value, and edits and outputs the image data of the selected event data. Note that when the user 10 is not present near the robot 100, the behavior control unit 250 stores the edited image data in the behavior schedule data 224 without outputting the edited image data.

行動決定部２３６は、ロボット行動として、「（５）ロボットは、アクティビティを提案する。」、すなわち、ユーザ１０の行動を提案することを決定した場合には、履歴データ２２２に記憶されているイベントデータに基づいて、文章生成モデルを用いて、提案するユーザの行動を決定する。このとき、行動制御部２５０は、ユーザの行動を提案する音声を、制御対象２５２に含まれるスピーカから出力させる。なお、行動制御部２５０は、ロボット１００の周辺にユーザ１０が不在の場合には、ユーザの行動を提案する音声を出力せずに、ユーザの行動を提案することを行動予定データ２２４に格納しておく。 When the behavior decision unit 236 determines that the robot behavior is "(5) The robot proposes an activity," i.e., that the robot proposes an action for the user 10, it uses a sentence generation model to determine the proposed user action based on the event data stored in the history data 222. At this time, the behavior control unit 250 causes a sound proposing the user action to be output from a speaker included in the control target 252. Note that, when the user 10 is not present around the robot 100, the behavior control unit 250 stores in the action schedule data 224 the suggestion of the user action without outputting the sound proposing the user action.

行動決定部２３６は、ロボット行動として、「（６）ロボットは、ユーザが会うべき相手を提案する。」、すなわち、ユーザ１０と接点を持つべき相手を提案することを決定した場合には、履歴データ２２２に記憶されているイベントデータに基づいて、文章生成モデルを用いて、提案するユーザと接点を持つべき相手を決定する。このとき、行動制御部２５０は、ユーザと接点を持つべき相手を提案することを表す音声を、制御対象２５２に含まれるスピーカから出力させる。なお、行動制御部２５０は、ロボット１００の周辺にユーザ１０が不在の場合には、ユーザと接点を持つべき相手を提案することを表す音声を出力せずに、ユーザと接点を持つべき相手を提案することを行動予定データ２２４に格納しておく。 When the behavior decision unit 236 determines that the robot behavior is "(6) The robot proposes people that the user should meet," that is, to propose people that the user 10 should have contact with, it uses a sentence generation model based on the event data stored in the history data 222 to determine people that the proposed user should have contact with. At this time, the behavior control unit 250 causes a speaker included in the control target 252 to output a sound indicating that a person that the user should have contact with is being proposed. Note that, when the user 10 is not present around the robot 100, the behavior control unit 250 stores in the behavior schedule data 224 the suggestion of people that the user should have contact with, without outputting a sound indicating that a person that the user should have contact with is being proposed.

行動決定部２３６は、ロボット行動として、「（９）ロボットは、ユーザと一緒に勉強する。」、すなわち、勉強に関してロボット１００が発話することを決定した場合には、文章生成モデルを用いて、ユーザ状態と、ユーザの感情又はロボットの感情とに対応する、勉強を促したり、勉強の問題を出したり、勉強に関するアドバイスを行うためのロボットの発話内容を決定する。このとき、行動制御部２５０は、決定したロボットの発話内容を表す音声を、制御対象２５２に含まれるスピーカから出力させる。なお、行動制御部２５０は、ロボット１００の周辺にユーザ１０が不在の場合には、決定したロボットの発話内容を表す音声を出力せずに、決定したロボットの発話内容を行動予定データ２２４に格納しておく。 When the behavior decision unit 236 decides that the robot behavior is "(9) The robot studies together with the user," i.e., that the robot 100 will make an utterance regarding studying, it uses a sentence generation model to decide the content of the robot's utterance to encourage studying, give study questions, or give advice regarding studying, which corresponds to the user state and the user's or the robot's emotions. At this time, the behavior control unit 250 causes a sound representing the determined content of the robot's utterance to be output from a speaker included in the control target 252. Note that, when the user 10 is not present around the robot 100, the behavior control unit 250 stores the determined content of the robot's utterance in the behavior schedule data 224, without outputting the sound representing the determined content of the robot's utterance.

行動決定部２３６は、ロボット行動として、「（１０）ロボットは、記憶を呼び起こす。」、すなわち、イベントデータを思い出すことを決定した場合には、履歴データ２２２から、イベントデータを選択する。このとき、感情決定部２３２は、選択したイベントデータに基づいて、ロボット１００の感情を判定する。更に、行動決定部２３６は、選択したイベントデータに基づいて、文章生成モデルを用いて、ユーザの感情値を変化させるためのロボット１００の発話内容や行動を表す感情変化イベントを作成する。このとき、記憶制御部２３８は、感情変化イベントを、行動予定データ２２４に記憶させる。 When the behavior decision unit 236 decides that the robot behavior is "(10) The robot recalls a memory," i.e., that the robot recalls event data, it selects the event data from the history data 222. At this time, the emotion decision unit 232 judges the emotion of the robot 100 based on the selected event data. Furthermore, the behavior decision unit 236 uses a sentence generation model based on the selected event data to create an emotion change event that represents the speech content and behavior of the robot 100 for changing the user's emotion value. At this time, the memory control unit 238 stores the emotion change event in the scheduled behavior data 224.

例えば、ユーザが見ていた動画がパンダに関するものであったことをイベントデータとして履歴データ２２２に記憶し、当該イベントデータが選択された場合、「パンダに関する話題で、次ユーザに会ったときにかけるべきセリフは何がありますか。三つ挙げて。」と、文章生成モデルに入力し、文章生成モデルの出力が、「（１）動物園にいこう、（２）パンダの絵を描こう、（３）パンダのぬいぐるみを買いに行こう」であった場合、ロボット１００が、「（１）、（２）、（３）でユーザが一番喜びそうなものは？」と、文章生成モデルに入力し、文章生成モデルの出力が、「（１）動物園にいこう」である場合は、ロボット１００が次にユーザに会ったときに「（１）動物園にいこう」とロボット１００が発話することを、感情変化イベントとして作成し、行動予定データ２２４に記憶される。 For example, suppose that the fact that the video the user was watching was about pandas is stored as event data in the history data 222 and that this event data is selected. In that case, "On the topic of pandas, what lines should be said to the user at the next meeting? Name three." is input to the sentence generation model. If the output of the sentence generation model is "(1) Let's go to the zoo, (2) Let's draw a picture of a panda, (3) Let's go buy a stuffed panda," the robot 100 then inputs "Which of (1), (2), and (3) would the user be most happy about?" to the sentence generation model. If the output of the sentence generation model is "(1) Let's go to the zoo," an emotion change event in which the robot 100 says "(1) Let's go to the zoo" the next time it meets the user is created and stored in the action schedule data 224.
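
The two-step prompting in the panda example can be sketched as follows. This is only an illustration under assumptions: `generate` stands for any callable that sends text to a sentence generation model and returns its reply (no particular model API is implied), and the function name `create_emotion_change_event`, the prompt wording, and the dictionary layout of the event are hypothetical.

```python
def create_emotion_change_event(topic, generate):
    # Step 1: ask the model for three candidate lines on the topic.
    candidates = generate(
        f"On the topic of {topic}, what lines should be said to the user "
        "at the next meeting? Name three."
    )
    # Step 2: ask the model which candidate the user would enjoy most.
    best = generate(
        f"Which of these would the user be most happy about? {candidates}"
    )
    # The result would be stored in the action schedule data for the next meeting.
    return {"type": "emotion_change", "topic": topic, "utterance": best.strip()}
```

Passing the model as a callable keeps the sketch testable with a stub and avoids assuming any specific model interface.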

また、例えば、ロボット１００の感情値が大きいイベントデータを、ロボット１００の印象的な記憶として選択する。これにより、印象的な記憶として選択されたイベントデータに基づいて、感情変化イベントを作成することができる。 In addition, for example, event data with a high emotion value for the robot 100 is selected as an impressive memory for the robot 100. This makes it possible to create an emotion change event based on the event data selected as an impressive memory.

行動決定部２３６は、自発的に又は定期的に、センサに基づきユーザの行動を検知し、検知したユーザの行動と予め記憶した特定情報とに基づき、通信端末１００Ｍである電子機器の行動として、ユーザを先導することを決定した場合には、以下の第１行動内容を実行し得る。なお、行動決定部２３６は、センサに含まれるカメラがユーザの進行方向に向けられている場合に、第１行動内容を決定してもよい。これにより、ユーザのルート案内の実行を望む場合のみ、通信端末１００Ｍによるユーザの先導を開始し得る。 The behavior decision unit 236 may detect the user's behavior based on the sensor, either voluntarily or periodically, and may execute the following first behavior content when it determines that the behavior of the electronic device, which is the communication terminal 100M, is to lead the user based on the detected user behavior and pre-stored specific information. Note that the behavior decision unit 236 may determine the first behavior content when the camera included in the sensor is pointed in the user's traveling direction. In this way, leading the user by the communication terminal 100M may begin only when the user desires to execute route guidance.
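
The condition that the camera is pointed in the user's traveling direction could be checked, for instance, by comparing two compass headings. The sketch below is an assumption-laden illustration: the description does not say how the directions are obtained or compared, and the names `angular_difference`, `should_start_leading`, and the threshold `max_offset_deg` are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    # Smallest absolute angle between two headings, in degrees (0 to 180).
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def should_start_leading(camera_heading_deg, travel_heading_deg, max_offset_deg=30.0):
    # Start leading only when the camera roughly faces the traveling direction.
    return angular_difference(camera_heading_deg, travel_heading_deg) <= max_offset_deg
```

For example, headings of 350 and 10 degrees differ by 20 degrees once wrap-around is taken into account, so the check passes with the default threshold.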

行動決定部２３６は、通信端末行動として、前述した「（１１）」の第１行動内容、すなわち、ユーザが歩行し得るルートの画像を通信端末１００Ｍの画面に再生し得る。 The action decision unit 236 may play, as a communication terminal action, the first action content of "(11)" described above, i.e., an image of a route that the user may walk, on the screen of the communication terminal 100M.

行動決定部２３６は、通信端末行動として、前述した「（１２）」の第１行動内容、すなわち、ユーザが歩行し得るルートに関連する音声を再生し得る。 The behavior decision unit 236 may play back, as a communication terminal behavior, the first behavior content of "(12)" described above, i.e., audio related to a route that the user may walk.

行動決定部２３６は、通信端末行動として、前述した「（１３）」の第１行動内容、すなわち、ユーザの行動を是正する第１行動内容として、設定された案内ルートとは別のルートの画像を通信端末１００Ｍの画面に再生し得る。 The behavior decision unit 236 may play, as a communication terminal behavior, an image of a route other than the set guidance route on the screen of the communication terminal 100M as the first behavior content of "(13)" described above, that is, as a first behavior content that corrects the user's behavior.

行動決定部２３６は、通信端末行動として、前述した「（１４）」の第１行動内容、すなわち、設定された案内ルートとは別のルートに関連する音声を再生し得る。 The action decision unit 236 may play back, as a communication terminal action, the first action content of "(14)" described above, i.e., audio related to a route other than the set guidance route.

行動決定部２３６は、通信端末行動として、前述した「（１５）」の第１行動内容、すなわち、設定された案内ルートを先導されるユーザが、案内ルートから外れた場合に、設定された案内ルートに戻るルートの画像を通信端末１００Ｍの画面に再生し得る。 As a communication terminal action, the action decision unit 236 may execute the first action content of "(15)" described above, i.e., when the user being led along the set guidance route deviates from it, play on the screen of the communication terminal 100M an image of a route that returns to the set guidance route.

行動決定部２３６は、通信端末行動として、前述した「（１６）」の第１行動内容、すなわち、設定された案内ルートを先導されるユーザが、案内ルートから外れた場合に、設定された案内ルートに戻るルートに関連する音声を再生し得る。 The behavior decision unit 236 may play, as a communication terminal behavior, the first behavior content of "(16)" described above, i.e., audio related to a route back to the set guidance route when the user being led along the set guidance route deviates from the guidance route.

行動決定部２３６は、第１行動内容と異なる第２行動内容を実行し得る。具体的には、行動決定部２３６は、通信端末行動として、前述した「（１７）」の第２行動内容、すなわち、ユーザの行動を褒める音声を再生し得る。 The behavior decision unit 236 may execute a second behavior content different from the first behavior content. Specifically, the behavior decision unit 236 may play back, as the communication terminal behavior, the second behavior content of "(17)" described above, i.e., a voice praising the user's behavior.

行動決定部２３６は、通信端末行動として、前述した「（１８）」の第２行動内容、すなわち、ユーザの行動に対して感謝する音声を再生し得る。 The action decision unit 236 may play, as the communication terminal action, the second action content of "(18)" described above, i.e., a voice expressing gratitude for the user's action.

行動決定部２３６は、通信端末行動として、前述した「（１９）」の第２行動内容、すなわち、ユーザの行動を褒める画像を再生し得る。 The behavior decision unit 236 may play, as the communication terminal behavior, the second behavior content of "(19)" described above, i.e., an image praising the user's behavior.

行動決定部２３６は、通信端末行動として、前述した「（２０）」の第２行動内容、すなわち、ユーザの行動に対して感謝する画像を再生し得る。 The action decision unit 236 may play, as the communication terminal action, the second action content of "(20)" mentioned above, i.e., an image expressing gratitude for the user's action.

行動決定部２３６は、第１行動内容と異なる第３行動内容を実行し得る。具体的には、行動決定部２３６は、通信端末行動として、前述した「（２１）」の第３行動内容、すなわち、ユーザ以外の人物への特定情報を送信し得る。 The action decision unit 236 may execute a third action content different from the first action content. Specifically, as a communication terminal action, the action decision unit 236 may execute the third action content of "(21)" described above, i.e., transmit specific information to a person other than the user.

行動決定部２３６は、通信端末行動として、前述した「（２２）」の第３行動内容、すなわち、ユーザの興味を引く音の再生を実行し得る。 The behavior decision unit 236 may execute the third behavior content of "(22)" described above, i.e., playing a sound that attracts the user's interest, as a communication terminal behavior.

行動決定部２３６は、通信端末行動として、前述した「（２３）」の第３行動内容、すなわち、ユーザの興味を引く画像の再生を実行し得る。 The action decision unit 236 may execute the third action content of "(23)" mentioned above, i.e., playing an image that attracts the user's interest, as a communication terminal action.

前述した「（１１）」～「（１６）」に示す第１行動内容として、ユーザが歩行し得るルートの画像などを再生する場合、関連情報収集部２７０は、収集データ２２３に、ユーザが歩行し得るルートの画像などのデータを格納してよい。 When playing back images of routes that the user can walk as the first action content shown in "(11)" to "(16)" above, the related information collection unit 270 may store data such as images of routes that the user can walk in the collected data 223.

前述した「（１７）」～「（２０）」に示す第２行動内容として、ユーザの行動を褒める音声などを再生する場合、関連情報収集部２７０は、収集データ２２３に、ユーザが歩行し得るルートの画像などのデータを格納してよい。 When playing back audio or the like praising the user's behavior as the second behavior content shown in "(17)" to "(20)" above, the related information collection unit 270 may store data such as images of the route the user may walk in the collected data 223.

前述した「（２１）」～「（２３）」に示す第３行動内容として、ユーザ以外の人物への特定情報の送信などする場合、関連情報収集部２７０は、収集データ２２３に、特定情報のデータなどを格納してよい。 When sending specific information to a person other than the user as the third action content shown in "(21)" to "(23)" described above, the related information collection unit 270 may store data of the specific information in the collected data 223.

行動決定部２３６は、状態認識部２３０によって認識されたユーザ１０の状態に基づいて、ロボット１００に対するユーザ１０の行動がない状態から、ロボット１００に対するユーザ１０の行動を検知した場合に、行動予定データ２２４に記憶されているデータを読み出し、ロボット１００の行動を決定する。 When the behavior decision unit 236 detects, based on the state of the user 10 recognized by the state recognition unit 230, a transition from a state in which the user 10 is taking no action toward the robot 100 to an action of the user 10 toward the robot 100, it reads out the data stored in the behavior schedule data 224 and decides the behavior of the robot 100.

例えば、ロボット１００の周辺にユーザ１０が不在だった場合に、ユーザ１０を検知すると、行動決定部２３６は、行動予定データ２２４に記憶されているデータを読み出し、ロボット１００の行動を決定する。また、ユーザ１０が寝ていた場合に、ユーザ１０が起きたことを検知すると、行動決定部２３６は、行動予定データ２２４に記憶されているデータを読み出し、ロボット１００の行動を決定する。 For example, if the user 10 was absent from the vicinity of the robot 100 and the user 10 is then detected, the behavior decision unit 236 reads out the data stored in the behavior schedule data 224 and decides on the behavior of the robot 100. Likewise, if the user 10 was asleep and it is detected that the user 10 has woken up, the behavior decision unit 236 reads out the data stored in the behavior schedule data 224 and decides on the behavior of the robot 100.
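
The trigger described above can be read as a check on a state transition. The following is a minimal sketch in Python, assuming that user states are reduced to simple text labels. The labels, the `TRIGGER_TRANSITIONS` set, and the `on_state_change` function are hypothetical names introduced only for illustration and do not appear in the disclosure.

```python
from typing import List, Optional, Set, Tuple

# Assumed state labels. The description only gives "absent -> detected"
# and "asleep -> awake" as examples of such transitions.
TRIGGER_TRANSITIONS: Set[Tuple[str, str]] = {
    ("absent", "present"),
    ("asleep", "awake"),
}


def on_state_change(
    previous: str, current: str, behavior_schedule: List[str]
) -> Optional[str]:
    """Return a scheduled behavior if the transition is a trigger.

    `behavior_schedule` stands in for the behavior schedule data 224.
    """
    if (previous, current) not in TRIGGER_TRANSITIONS:
        return None
    if not behavior_schedule:
        return None
    # Reading out and consuming the oldest entry is an assumption of this
    # sketch; the description only says the stored data is read out.
    return behavior_schedule.pop(0)
```

For instance, `on_state_change("absent", "present", ["greet the user"])` would return the stored entry, while a call with an unchanged state would return `None`.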

なお、ロボット１００の一部（例えば、センサモジュール部２１０、格納部２２０、制御部２２８）が、ロボット１００の外部（例えば、サーバ）に設けられ、ロボット１００が、外部と通信することで、上記のロボット１００の各部として機能するようにしてもよい。 In addition, parts of the robot 100 (e.g., the sensor module section 210, the storage section 220, the control section 228) may be provided outside the robot 100 (e.g., a server), and the robot 100 may communicate with the outside to function as each part of the robot 100 described above.

図３は、ユーザ１０の好み情報に関連する情報を収集する収集処理に関する動作フローの一例を概略的に示す。図３に示す動作フローは、一定期間毎に、繰り返し実行される。ユーザ１０の発話内容、又はユーザ１０による設定操作から、ユーザ１０の関心がある事柄を表す好み情報が取得されているものとする。なお、動作フロー中の「Ｓ」は、実行されるステップを表す。 Figure 3 shows an example of an operational flow for a collection process that collects information related to the preference information of the user 10. The operational flow shown in Figure 3 is executed repeatedly at regular intervals. It is assumed that preference information indicating matters of interest to the user 10 is acquired from the contents of the speech of the user 10 or from a setting operation performed by the user 10. Note that "S" in the operational flow indicates the step that is executed.

まず、ステップＳ９０において、関連情報収集部２７０は、ユーザ１０の関心がある事柄を表す好み情報を取得する。 First, in step S90, the related information collection unit 270 acquires preference information that represents matters of interest to the user 10.

ステップＳ９２において、関連情報収集部２７０は、好み情報に関連する情報を、外部データから収集する。 In step S92, the related information collection unit 270 collects information related to the preference information from external data.

ステップＳ９４において、感情決定部２３２は、関連情報収集部２７０によって収集した好み情報に関連する情報に基づいて、ロボット１００の感情値を決定する。 In step S94, the emotion determination unit 232 determines the emotion value of the robot 100 based on information related to the preference information collected by the related information collection unit 270.

ステップＳ９６において、記憶制御部２３８は、上記ステップＳ９４で決定されたロボット１００の感情値が閾値以上であるか否かを判定する。ロボット１００の感情値が閾値未満である場合には、収集した好み情報に関連する情報を収集データ２２３に記憶せずに、当該処理を終了する。一方、ロボット１００の感情値が閾値以上である場合には、ステップＳ９８へ移行する。 In step S96, the memory control unit 238 determines whether the emotion value of the robot 100 determined in step S94 above is equal to or greater than a threshold value. If the emotion value of the robot 100 is less than the threshold value, the process ends without storing the collected information related to the preference information in the collected data 223. On the other hand, if the emotion value of the robot 100 is equal to or greater than the threshold value, the process proceeds to step S98.

ステップＳ９８において、記憶制御部２３８は、収集した好み情報に関連する情報を、収集データ２２３に格納し、当該処理を終了する。 In step S98, the memory control unit 238 stores the information related to the collected preference information in the collected data 223 and terminates the process.
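
Steps S90 to S98 can be sketched as a single function. The following Python sketch is illustrative only: the `CollectedStore` class, the callables `fetch_external` and `score_emotion`, and the default threshold of 3.0 are assumptions, since the description does not specify how external data is fetched, how the emotion value is computed, or what the threshold is.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CollectedStore:
    """Stand-in for the collected data 223."""
    items: List[str] = field(default_factory=list)


def collect_related_info(
    preference: str,
    fetch_external: Callable[[str], List[str]],
    score_emotion: Callable[[List[str]], float],
    store: CollectedStore,
    threshold: float = 3.0,  # assumed value
) -> bool:
    """Run one pass of S90 to S98; return True if the info was stored."""
    # S90: the preference information is supplied by the caller.
    # S92: collect information related to the preference from external data.
    related = fetch_external(preference)
    # S94: determine the robot's emotion value from the collected information.
    emotion_value = score_emotion(related)
    # S96: below the threshold, end without storing anything.
    if emotion_value < threshold:
        return False
    # S98: store the related information in the collected data.
    store.items.extend(related)
    return True
```

Passing the fetch and scoring steps in as callables keeps the sketch independent of any particular search service or emotion model.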

図４Ａは、ユーザ１０の行動に対してロボット１００が応答する応答処理を行う際に、ロボット１００において行動を決定する動作に関する動作フローの一例を概略的に示す。図４Ａに示す動作フローは、繰り返し実行される。このとき、センサモジュール部２１０で解析された情報が入力されているものとする。 Figure 4A schematically shows an example of an operation flow for determining the behavior of the robot 100 when the robot 100 performs response processing in which it responds to the behavior of the user 10. The operation flow shown in Figure 4A is executed repeatedly. At this time, it is assumed that information analyzed by the sensor module unit 210 has been input.

まず、ステップＳ１００において、状態認識部２３０は、センサモジュール部２１０で解析された情報に基づいて、ユーザ１０の状態及びロボット１００の状態を認識する。 First, in step S100, the state recognition unit 230 recognizes the state of the user 10 and the state of the robot 100 based on the information analyzed by the sensor module unit 210.

ステップＳ１０２において、感情決定部２３２は、センサモジュール部２１０で解析された情報、及び状態認識部２３０によって認識されたユーザ１０の状態に基づいて、ユーザ１０の感情を示す感情値を決定する。 In step S102, the emotion determination unit 232 determines an emotion value indicating the emotion of the user 10 based on the information analyzed by the sensor module unit 210 and the state of the user 10 recognized by the state recognition unit 230.

ステップＳ１０３において、感情決定部２３２は、センサモジュール部２１０で解析された情報、及び状態認識部２３０によって認識されたユーザ１０の状態に基づいて、ロボット１００の感情を示す感情値を決定する。感情決定部２３２は、決定したユーザ１０の感情値及びロボット１００の感情値を履歴データ２２２に追加する。 In step S103, the emotion determination unit 232 determines an emotion value indicating the emotion of the robot 100 based on the information analyzed by the sensor module unit 210 and the state of the user 10 recognized by the state recognition unit 230. The emotion determination unit 232 adds the determined emotion value of the user 10 and the emotion value of the robot 100 to the history data 222.

ステップＳ１０４において、行動認識部２３４は、センサモジュール部２１０で解析された情報及び状態認識部２３０によって認識されたユーザ１０の状態に基づいて、ユーザ１０の行動分類を認識する。 In step S104, the behavior recognition unit 234 recognizes the behavior classification of the user 10 based on the information analyzed by the sensor module unit 210 and the state of the user 10 recognized by the state recognition unit 230.

ステップＳ１０６において、行動決定部２３６は、ステップＳ１０２で決定されたユーザ１０の現在の感情値及び履歴データ２２２に含まれる過去の感情値の組み合わせと、ロボット１００の感情値と、上記ステップＳ１０４で認識されたユーザ１０の行動と、行動決定モデル２２１とに基づいて、ロボット１００の行動を決定する。 In step S106, the behavior decision unit 236 decides the behavior of the robot 100 based on a combination of the current emotion value of the user 10 determined in step S102 and the past emotion values included in the history data 222, the emotion value of the robot 100, the behavior of the user 10 recognized in step S104, and the behavior decision model 221.

ステップＳ１０８において、行動制御部２５０は、行動決定部２３６により決定された行動に基づいて、制御対象２５２を制御する。 In step S108, the behavior control unit 250 controls the control object 252 based on the behavior determined by the behavior determination unit 236.

ステップＳ１１０において、記憶制御部２３８は、行動決定部２３６によって決定された行動に対して予め定められた行動の強度と、感情決定部２３２により決定されたロボット１００の感情値とに基づいて、強度の総合値を算出する。 In step S110, the memory control unit 238 calculates a total intensity value based on the predetermined action intensity for the action determined by the action determination unit 236 and the emotion value of the robot 100 determined by the emotion determination unit 232.

ステップＳ１１２において、記憶制御部２３８は、強度の総合値が閾値以上であるか否かを判定する。強度の総合値が閾値未満である場合には、ユーザ１０の行動を含むイベントデータを履歴データ２２２に記憶せずに、当該処理を終了する。一方、強度の総合値が閾値以上である場合には、ステップＳ１１４へ移行する。 In step S112, the memory control unit 238 determines whether the total intensity value is equal to or greater than the threshold value. If the total intensity value is less than the threshold value, the process ends without storing the event data including the behavior of the user 10 in the history data 222. On the other hand, if the total intensity value is equal to or greater than the threshold value, the process proceeds to step S114.

ステップＳ１１４において、行動決定部２３６によって決定された行動と、現時点から一定期間前までの、センサモジュール部２１０で解析された情報、及び状態認識部２３０によって認識されたユーザ１０の状態とを含むイベントデータを、履歴データ２２２に記憶する。 In step S114, event data including the action determined by the action determination unit 236, information analyzed by the sensor module unit 210 from the present time up to a certain period of time ago, and the state of the user 10 recognized by the state recognition unit 230 is stored in the history data 222.
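
The storage decision in steps S110 to S114 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the description does not say how the action intensity and the robot's emotion value are combined, so a simple sum is assumed here, and the `ACTION_INTENSITY` table, its values, and the default threshold are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical predetermined intensity per action (assumed values).
ACTION_INTENSITY: Dict[str, float] = {"nod": 1.0, "speak": 2.0, "gesture": 3.0}


def total_intensity(action: str, robot_emotion_value: float) -> float:
    """S110: combine action intensity and emotion value (sum is assumed)."""
    return ACTION_INTENSITY.get(action, 0.0) + robot_emotion_value


def store_event_if_intense(
    action: str,
    robot_emotion_value: float,
    event: Dict[str, Any],
    history: List[Dict[str, Any]],
    threshold: float = 4.0,  # assumed value
) -> bool:
    """S112 and S114: store the event only when the total reaches the threshold."""
    # S112: below the threshold, end without storing the event data.
    if total_intensity(action, robot_emotion_value) < threshold:
        return False
    # S114: store the decided action together with the recent sensor
    # information and user state carried in `event`.
    history.append({**event, "action": action})
    return True
```

Here `history` stands in for the history data 222, and `event` is assumed to carry the recent sensor information and recognized user state.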

図４Ｂは、ロボット１００が自律的に行動する自律的処理を行う際に、ロボット１００において行動を決定する動作に関する動作フローの一例を概略的に示す。図４Ｂに示す動作フローは、例えば、一定時間の経過毎に、繰り返し自動的に実行される。このとき、センサモジュール部２１０で解析された情報が入力されているものとする。なお、上記図４Ａと同様の処理については、同じステップ番号を表す。 Figure 4B schematically shows an example of an operation flow for determining the behavior of the robot 100 when the robot 100 performs autonomous processing in which it acts autonomously. The operation flow shown in Figure 4B is automatically and repeatedly executed, for example, at regular time intervals. At this time, it is assumed that information analyzed by the sensor module unit 210 has been input. Note that the same step numbers are used for the same processes as those in Figure 4A above.

ステップＳ２００において、行動決定部２３６は、上記ステップＳ１００で認識されたユーザ１０の状態、ステップＳ１０２で決定されたユーザ１０の感情、ロボット１００の感情、及び上記ステップＳ１００で認識されたロボット１００の状態と、上記ステップＳ１０４で認識されたユーザ１０の行動と、行動決定モデル２２１とに基づいて、行動しないことを含む複数種類のロボット行動の何れかを、ロボット１００の行動として決定する。 In step S200, the behavior decision unit 236 decides on one of multiple types of robot behaviors, including no action, as the behavior of the robot 100 based on the state of the user 10 recognized in step S100, the emotion of the user 10 determined in step S102, the emotion of the robot 100, and the state of the robot 100 recognized in step S100, the behavior of the user 10 recognized in step S104, and the behavior decision model 221.

ステップＳ２０１において、行動決定部２３６は、上記ステップＳ２００で、行動しないことが決定されたか否かを判定する。ロボット１００の行動として、行動しないことが決定された場合には、当該処理を終了する。一方、ロボット１００の行動として、行動しないことが決定されていない場合には、ステップＳ２０２へ移行する。 In step S201, the behavior decision unit 236 determines whether or not it was decided in step S200 above that no action should be taken. If it was decided that no action should be taken as the action of the robot 100, the process ends. On the other hand, if it was not decided that no action should be taken as the action of the robot 100, the process proceeds to step S202.

ステップＳ２０２において、行動決定部２３６は、上記ステップＳ２００で決定したロボット行動の種類に応じた処理を行う。このとき、ロボット行動の種類に応じて、行動制御部２５０、感情決定部２３２、又は記憶制御部２３８が処理を実行する。 In step S202, the behavior determination unit 236 performs processing according to the type of robot behavior determined in step S200. At this time, the behavior control unit 250, the emotion determination unit 232, or the memory control unit 238 executes processing according to the type of robot behavior.

ステップＳ１１２において、記憶制御部２３８は、強度の総合値が閾値以上であるか否かを判定する。強度の総合値が閾値未満である場合には、ユーザ１０の行動を含むデータを履歴データ２２２に記憶せずに、当該処理を終了する。一方、強度の総合値が閾値以上である場合には、ステップＳ１１４へ移行する。 In step S112, the memory control unit 238 determines whether the total intensity value is equal to or greater than the threshold value. If the total intensity value is less than the threshold value, the process ends without storing data including the behavior of the user 10 in the history data 222. On the other hand, if the total intensity value is equal to or greater than the threshold value, the process proceeds to step S114.

ステップＳ１１４において、記憶制御部２３８は、行動決定部２３６によって決定された行動と、現時点から一定期間前までの、センサモジュール部２１０で解析された情報、及び状態認識部２３０によって認識されたユーザ１０の状態と、を、履歴データ２２２に記憶する。 In step S114, the memory control unit 238 stores the behavior determined by the behavior determination unit 236, the information analyzed by the sensor module unit 210 from the present time up to a certain period of time ago, and the state of the user 10 recognized by the state recognition unit 230 in the history data 222.
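
The branch in steps S200 to S202 of the autonomous flow can be sketched as a small dispatcher. The Python below is illustrative only: the `NO_ACTION` label, the `decide` callable (standing in for the decision made with the behavior decision model 221), and the `handlers` table are assumptions, and the storage steps S112 and S114 are left out.

```python
from typing import Callable, Dict, Optional

NO_ACTION = "no_action"  # assumed label for "taking no action"


def autonomous_step(
    decide: Callable[[], str],
    handlers: Dict[str, Callable[[], str]],
) -> Optional[str]:
    """Run S200 to S202 once and return the handler's result, if any."""
    # S200: decide on one of several robot behaviors, including no action.
    behavior = decide()
    # S201: if no action was decided, end the process.
    if behavior == NO_ACTION:
        return None
    # S202: run the processing that corresponds to the decided behavior.
    return handlers[behavior]()
```

In the description, the processing in S202 is carried out by the behavior control unit 250, the emotion determination unit 232, or the memory control unit 238 depending on the behavior type; the `handlers` table is one simple way to model that choice.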

以上説明したように、ロボット１００によれば、ユーザ状態に基づいて、ロボット１００の感情を示す感情値を決定し、ロボット１００の感情値に基づいて、ユーザ１０の行動を含むデータを履歴データ２２２に記憶するか否かを決定する。これにより、ユーザ１０の行動を含むデータを記憶する履歴データ２２２の容量を抑制することができる。そして例えば、１０年後にユーザ状態が１０年前と同じ状態であるとロボット１００が判断したときに、１０年前の履歴データ２２２を読み込むことにより、ロボット１００は１０年前当時のユーザ１０の状態（例えばユーザ１０の表情、感情など）、更にはその場の音声、画像、匂い等のデータなどのあらゆる周辺情報を、ユーザ１０に提示することができる。 As described above, according to the robot 100, an emotion value indicating the emotion of the robot 100 is determined based on the user state, and whether or not to store data including the behavior of the user 10 in the history data 222 is determined based on the emotion value of the robot 100. This makes it possible to limit the storage capacity required for the history data 222, which stores data including the behavior of the user 10. For example, when the robot 100 determines, 10 years later, that the user state is the same as it was 10 years earlier, the robot 100 can read the history data 222 from 10 years earlier and present to the user 10 all kinds of peripheral information, such as the state of the user 10 at that time (e.g., the facial expression, emotions, etc. of the user 10) and data on the sound, image, smell, etc. of the location.

また、ロボット１００によれば、ユーザ１０の行動に対して適切な行動をロボット１００に実行させることができる。従来は、ユーザの行動を分類し、ロボットの表情や恰好を含む行動を決めていた。これに対し、ロボット１００は、ユーザ１０の現在の感情値を決定し、過去の感情値及び現在の感情値に基づいてユーザ１０に対して行動を実行する。従って、例えば、昨日は元気であったユーザ１０が今日は落ち込んでいた場合に、ロボット１００は「昨日は元気だったのに今日はどうしたの？」というような発話を行うことができる。また、ロボット１００は、ジェスチャーを交えて発話を行うこともできる。また、例えば、昨日は落ち込んでいたユーザ１０が今日は元気である場合に、ロボット１００は、「昨日は落ち込んでいたのに今日は元気そうだね？」というような発話を行うことができる。また、例えば、昨日は元気であったユーザ１０が今日は昨日よりも元気である場合、ロボット１００は「今日は昨日よりも元気だね。昨日よりも良いことがあった？」というような発話を行うことができる。また、例えば、ロボット１００は、感情値が０以上であり、かつ感情値の変動幅が一定の範囲内である状態が継続しているユーザ１０に対しては、「最近、気分が安定していて良い感じだね。」というような発話を行うことができる。 According to the robot 100, the robot 100 can be made to perform an appropriate action in response to the action of the user 10. Conventionally, the user's actions were classified and the action including the robot's facial expression and appearance was determined. In contrast, the robot 100 determines the current emotional value of the user 10 and performs an action on the user 10 based on the past emotional value and the current emotional value. Therefore, for example, if the user 10 who was cheerful yesterday is depressed today, the robot 100 can make an utterance such as "You were cheerful yesterday, but what's wrong with you today?" The robot 100 can also make an utterance with gestures. For example, if the user 10 who was depressed yesterday is cheerful today, the robot 100 can make an utterance such as "You were depressed yesterday, but you look cheerful today, don't you?" For example, if the user 10 who was cheerful yesterday is more cheerful today than yesterday, the robot 100 can make an utterance such as "You're more cheerful today than yesterday. Has something better happened than yesterday?" Furthermore, for example, the robot 100 can say to a user 10 whose emotion value is equal to or greater than 0 and whose emotion value fluctuation range continues to be within a certain range, "You've been feeling stable lately, which is good."
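
The comparison of past and current emotion values can be sketched as follows. This Python illustration assumes that the user's emotion is summarized as a single signed value per day, with 0 or more meaning cheerful and negative meaning depressed. That encoding, the function names, and the `max_range` parameter are assumptions made for the sketch; the utterances are the examples given in the description.

```python
from typing import Sequence


def choose_utterance(yesterday: float, today: float) -> str:
    """Pick an example utterance from yesterday's and today's emotion values."""
    if yesterday >= 0 and today < 0:
        return "You were cheerful yesterday, but what's wrong today?"
    if yesterday < 0 and today >= 0:
        return "You were depressed yesterday, but you look cheerful today, don't you?"
    if yesterday >= 0 and today > yesterday:
        return ("You're more cheerful today than yesterday. "
                "Has something better happened than yesterday?")
    return ""  # no example utterance applies


def is_stable(values: Sequence[float], max_range: float) -> bool:
    """True if all values are 0 or more and their spread is within max_range."""
    if not values:
        return False
    return min(values) >= 0 and (max(values) - min(values)) <= max_range
```

When `is_stable` holds over a recent window, the robot could use the utterance "You've been feeling stable lately, which is good."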

また、例えば、ロボット１００は、ユーザ１０に対し、「昨日言っていた宿題はできた？」と質問し、ユーザ１０から「できたよ」という回答が得られた場合、「偉いね！」等の肯定的な発話をするとともに、拍手又はサムズアップ等の肯定的なジェスチャーを行うことができる。また、例えば、ロボット１００は、ユーザ１０が「一昨日話したプレゼンテーションがうまくいったよ」という発話をすると、「頑張ったね！」等の肯定的な発話をするとともに、上記の肯定的なジェスチャーを行うこともできる。このように、ロボット１００がユーザ１０の状態の履歴に基づいた行動を行うことによって、ユーザ１０がロボット１００に対して親近感を覚えることが期待できる。 For example, the robot 100 can ask the user 10, "Did you finish the homework you mentioned yesterday?", and if the user 10 answers, "I did," it can make a positive utterance such as "Great!" together with a positive gesture such as clapping or a thumbs up. Likewise, when the user 10 says, "The presentation I told you about the day before yesterday went well," the robot 100 can make a positive utterance such as "You did a great job!" together with the positive gesture described above. Because the robot 100 acts on the basis of the history of the state of the user 10 in this way, the user 10 can be expected to feel a sense of closeness to the robot 100.

また、例えば、ユーザ１０が、パンダに関する動画を見ているときに、ユーザ１０の感情の「楽」の感情値が閾値以上である場合、当該動画におけるパンダの登場シーンを、イベントデータとして履歴データ２２２に記憶させてもよい。 For example, when user 10 is watching a video about a panda, if the emotion value of user 10's emotion of "pleasure" is equal to or greater than a threshold, a scene in which a panda appears in the video may be stored as event data in the history data 222.

履歴データ２２２や収集データ２２３に蓄積したデータを用いて、ロボット１００は、どのような会話をユーザとすれば、ユーザの幸せを表現する感情値が最大化されるかを常に学習することができる。 Using the data stored in the history data 222 and the collected data 223, the robot 100 can constantly learn what kind of conversation to have with the user in order to maximize the emotional value that expresses the user's happiness.

また、ロボット１００がユーザ１０と会話をしていない状態において、ロボット１００の感情に基づいて、自律的に行動を開始することができる。 In addition, when the robot 100 is not engaged in a conversation with the user 10, the robot 100 can autonomously start to act based on its own emotions.

また、自律的処理において、ロボット１００が、自動的に質問を生成して、文章生成モデルに入力し、文章生成モデルの出力を、質問に対する回答として取得することを繰り返すことによって、良い感情を増大させるための感情変化イベントを作成し、行動予定データ２２４に格納することができる。このように、ロボット１００は、自己学習を実行することができる。 Furthermore, in the autonomous processing, the robot 100 can create emotion change events for increasing positive emotions by repeatedly generating questions, inputting them into a sentence generation model, and obtaining the output of the sentence generation model as an answer to the question, and storing these in the behavior schedule data 224. In this way, the robot 100 can execute self-learning.
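
The repeated question-and-answer cycle can be sketched as a simple loop. In the Python below, `generate_question` and `text_model` are placeholder callables standing in for the robot's question generation and the sentence generation model, and the record layout is an assumption; the description does not define the format of an emotion change event.

```python
from typing import Callable, Dict, List


def self_learning_rounds(
    generate_question: Callable[[], str],
    text_model: Callable[[str], str],
    behavior_schedule: List[Dict[str, str]],
    rounds: int = 3,
) -> None:
    """Repeat question generation and answering, storing each result.

    `behavior_schedule` stands in for the behavior schedule data 224.
    """
    for _ in range(rounds):
        # Automatically generate a question.
        question = generate_question()
        # Feed it to the sentence generation model and take the output
        # as the answer to the question.
        answer = text_model(question)
        # Store the pair as a candidate emotion change event.
        behavior_schedule.append({"question": question, "answer": answer})
```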

また、ロボット１００が、外部からのトリガを受けていない状態において、自動的に質問を生成する際に、ロボットの過去の感情値の履歴から特定した印象に残ったイベントデータに基づいて、質問を自動的に生成することができる。 In addition, when the robot 100 automatically generates a question without receiving an external trigger, the question can be automatically generated based on memorable event data identified from the robot's past emotion value history.

また、関連情報収集部２７０が、ユーザについての好み情報に対応して自動的にキーワード検索を実行して、検索結果を取得する検索実行段階を繰り返すことによって、自己学習を実行することができる。 In addition, the related information collection unit 270 can perform self-learning by automatically performing a keyword search corresponding to the preference information about the user and repeating the search execution step of obtaining search results.

ここで、検索実行段階は、外部からのトリガを受けていない状態において、ロボットの過去の感情値の履歴から特定した、印象に残ったイベントデータに基づいて、キーワード検索を自動的に実行するようにしてもよい。 Here, the search execution stage may be configured to automatically execute a keyword search based on memorable event data identified from the robot's past emotion value history when no external trigger is received.

なお、感情決定部２３２は、特定のマッピングに従い、ユーザの感情を決定してよい。具体的には、感情決定部２３２は、特定のマッピングである感情マップ（図５参照）に従い、ユーザの感情を決定してよい。 In addition, the emotion determination unit 232 may determine the user's emotion according to a specific mapping. Specifically, the emotion determination unit 232 may determine the user's emotion according to an emotion map (see FIG. 5), which is a specific mapping.

図５は、複数の感情がマッピングされる感情マップ４００を示す図である。感情マップ４００において、感情は、中心から放射状に同心円に配置されている。同心円の中心に近いほど、原始的状態の感情が配置されている。同心円のより外側には、心境から生まれる状態や行動を表す感情が配置されている。感情とは、情動や心的状態も含む概念である。同心円の左側には、概して脳内で起きる反応から生成される感情が配置されている。同心円の右側には概して、状況判断で誘導される感情が配置されている。同心円の上方向及び下方向には、概して脳内で起きる反応から生成され、かつ、状況判断で誘導される感情が配置されている。また、同心円の上側には、「快」の感情が配置され、下側には、「不快」の感情が配置されている。このように、感情マップ４００では、感情が生まれる構造に基づいて複数の感情がマッピングされており、同時に生じやすい感情が、近くにマッピングされている。 Figure 5 is a diagram showing an emotion map 400 on which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer an emotion is to the center of the concentric circles, the more primitive the state it represents. Emotions that represent states and actions arising from a state of mind are arranged toward the outside of the concentric circles. Here, "emotion" is a concept that also includes affect and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are arranged. On the right side of the concentric circles, emotions that are generally induced by situational judgment are arranged. In the upward and downward directions of the concentric circles, emotions that are generally generated from reactions occurring in the brain and are also induced by situational judgment are arranged. In addition, emotions of "pleasure" are arranged on the upper side of the concentric circles, and emotions of "discomfort" are arranged on the lower side. In this way, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that tend to occur simultaneously are mapped close to each other.

（１）例えばロボット１００の感情決定部２３２である感情エンジンが、１００ｍｓｅｃ程度で感情を検知している場合、ロボット１００の反応動作（例えば相槌）の決定は、頻度が少なくとも、感情エンジンの検知頻度（１００ｍｓｅｃ）と同様のタイミングに設定してよく、これよりも早いタイミングに設定してもよい。感情エンジンの検知頻度はサンプリングレートと解釈してよい。 (1) For example, if the emotion engine, which is the emotion determination unit 232 of the robot 100, detects emotions about every 100 msec, the reaction action of the robot 100 (e.g., a backchannel) may be determined at least as often as the detection interval of the emotion engine (100 msec), or at a shorter interval. The detection frequency of the emotion engine may be interpreted as the sampling rate.

１００ｍｓｅｃ程度で感情を検知し、即時に連動して反応動作（例えば相槌）を行うことで、不自然な相槌ではなくなり、自然な空気を読んだ対話を実現できる。ロボット１００は、感情マップ４００の曼荼羅の方向性とその度合い（強さ）に応じて、反応動作（相槌など）を行う。なお、感情エンジンの検知頻度（サンプリングレート）は、１００ｍｓに限定されず、シチュエーション（スポーツをしている場合など）、ユーザの年齢などに応じて、変更してもよい。 By detecting emotions in about 100 msec and immediately performing a corresponding reaction (e.g., a backchannel), unnatural backchannels can be avoided, and a natural dialogue that reads the atmosphere can be realized. The robot 100 performs a reaction (such as a backchannel) according to the directionality and the degree (strength) of the mandala in the emotion map 400. Note that the detection frequency (sampling rate) of the emotion engine is not limited to 100 ms, and may be changed according to the situation (e.g., when playing sports), the age of the user, etc.

（２）感情マップ４００と照らし合わせ、感情の方向性とその度合いの強さを予め設定しておき、相槌の動き及び相槌の強弱を設定してよい。例えば、ロボット１００が安定感、安心などを感じている場合、ロボット１００は、頷いて話を聞き続ける。ロボット１００が不安、迷い、怪しい感じを覚えている場合、ロボット１００は、首をかしげてもよく、首振りを止めてもよい。 (2) The directionality of emotions and the strength of their intensity may be preset in reference to the emotion map 400, and the movement of the interjections and the strength of the interjections may be set. For example, if the robot 100 feels a sense of stability or security, the robot 100 may nod and continue listening. If the robot 100 feels anxious, confused, or suspicious, the robot 100 may tilt its head or stop shaking its head.

これらの感情は、感情マップ４００の３時の方向に分布しており、普段は安心と不安のあたりを行き来する。感情マップ４００の右半分では、内部的な感覚よりも状況認識の方が優位に立つため、落ち着いた印象になる。 These emotions are distributed in the three o'clock direction of emotion map 400, and usually fluctuate between relief and anxiety. In the right half of emotion map 400, situational awareness takes precedence over internal sensations, resulting in a sense of calm.

（３）ロボット１００が褒められて快感を覚えた場合、「あー」というフィラーが台詞の前に入り、きつい言葉をもらって痛感を覚えた場合、「うっ！」というフィラーが台詞の前に入ってよい。また、ロボット１００が「うっ！」と言いつつうずくまる仕草などの身体的な反応を含めてよい。これらの感情は、感情マップ４００の９時あたりに分布している。 (3) If the robot 100 feels good after being praised, the filler "ah" may be inserted before the line, and if the robot 100 feels hurt after receiving harsh words, the filler "ugh!" may be inserted before the line. Also, a physical reaction such as the robot 100 crouching down while saying "ugh!" may be included. These emotions are distributed around 9 o'clock on the emotion map 400.

（４）感情マップ４００の左半分では、状況認識よりも内部的な感覚（反応）の方が優位に立つ。よって、思わず反応してしまった印象を与え得る。 (4) In the left half of the emotion map 400, internal sensations (reactions) are more important than situational awareness. This can give the impression that the person is reacting unconsciously.

ロボット１００が納得感という内部的な感覚（反応）を覚えながら状況認識においても好感を覚える場合、ロボット１００は、相手を見ながら深く頷いてよく、また「うんうん」と発してよい。このように、ロボット１００は、相手へのバランスのとれた好感、すなわち、相手への許容や寛容といった行動を生成してよい。このような感情は、感情マップ４００の１２時あたりに分布している。 When the robot 100 feels a favorable feeling in its situational awareness while also feeling an internal sensation (reaction) of satisfaction, the robot 100 may nod deeply while looking at the other person, or may say "uh-huh." In this way, the robot 100 may generate a behavior that shows a balanced favorable feeling toward the other person, that is, acceptance and tolerance toward the other person. Such emotions are distributed around 12 o'clock on the emotion map 400.

逆に、ロボット１００が不快感という内部的な感覚（反応）を覚えながら状況認識においても、ロボット１００は、嫌悪を覚えるときには首を横に振る、憎しみを覚えるくらいになると、目のＬＥＤを赤くして相手を睨んでもよい。このような感情は、感情マップ４００の６時あたりに分布している。 Conversely, when the robot 100 feels an internal sensation (reaction) of discomfort and, in its situational awareness as well, feels disgust, the robot 100 may shake its head from side to side; when the feeling rises to the level of hatred, it may turn its eye LEDs red and glare at the other person. These types of emotions are distributed around the 6 o'clock position on the emotion map 400.

（５）感情マップ４００の内側は心の中、感情マップ４００の外側は行動を表すため、感情マップ４００の外側に行くほど、感情が目に見える（行動に表れる）ようになる。 (5) The inside of emotion map 400 represents what is going on inside one's mind, while the outside of emotion map 400 represents behavior, so the further out on emotion map 400 you go, the more visible the emotions become (the more they are expressed in behavior).

（６）感情マップ４００の３時付近に分布する安心を覚えながら、人の話を聞く場合、ロボット１００は、軽く首を縦に振って「ふんふん」と発する程度であるが、１２時付近の愛の方になると、首を深く縦に振るような力強い頷きをしてよい。 (6) When listening to someone with a sense of relief, which is distributed around the 3 o'clock area of the emotion map 400, the robot 100 may lightly nod its head and say "hmm," but when it comes to love, which is distributed around 12 o'clock, it may nod vigorously, nodding its head deeply.
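
The relationship between positions on the emotion map and reactions in items (2) to (6) can be summarized in a lookup table. The Python sketch below is only an illustration: the English emotion labels, the `REACTIONS` table, and the function names are assumptions, and the clock positions and reactions are limited to the examples given in the items above.

```python
from typing import Dict, Tuple

# (clock position on the emotion map, example reaction) per emotion.
REACTIONS: Dict[str, Tuple[int, str]] = {
    "relief": (3, "nod lightly and say 'hmm'"),
    "anxiety": (3, "tilt head"),
    "pleasure": (9, "insert filler 'ah' before the line"),
    "pain": (9, "insert filler 'ugh!' before the line"),
    "love": (12, "nod deeply"),
    "disgust": (6, "shake head"),
    "hatred": (6, "turn eye LEDs red and glare"),
}


def dominant_factor(clock: int) -> str:
    """Which factor dominates at a given clock position (1 to 12)."""
    if not 1 <= clock <= 12:
        raise ValueError("clock position must be between 1 and 12")
    if clock in (6, 12):
        return "both"  # top and bottom: reaction and situational judgment
    if clock <= 5:
        return "situational awareness"  # right half of the map
    return "internal sensation"  # left half of the map


def react(emotion: str) -> Tuple[str, str]:
    """Return the example reaction and the dominant factor for an emotion."""
    clock, reaction = REACTIONS[emotion]
    return reaction, dominant_factor(clock)
```

A fuller model would also take the distance from the center into account, since item (5) states that emotions further out are expressed more visibly in behavior.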

ここで、人の感情は、姿勢や血糖値のような様々なバランスを基礎としており、それらのバランスが理想から遠ざかると不快、理想に近づくと快という状態を示す。ロボットや自動車やバイク等においても、姿勢やバッテリー残量のような様々なバランスを基礎として、それらのバランスが理想から遠ざかると不快、理想に近づくと快という状態を示すように感情を作ることができる。感情マップは、例えば、光吉博士の感情地図（音声感情認識及び情動の脳生理信号分析システムに関する研究、徳島大学、博士論文：https://ci.nii.ac.jp/naid/500000375379）に基づいて生成されてよい。感情地図の左半分には、感覚が優位にたつ「反応」と呼ばれる領域に属する感情が並ぶ。また、感情地図の右半分には、状況認識が優位にたつ「状況」と呼ばれる領域に属する感情が並ぶ。 Here, human emotions are based on various balances such as posture and blood sugar level, and when these balances are far from the ideal, it indicates an unpleasant state, and when they are close to the ideal, it indicates a pleasant state. Emotions can also be created for robots, cars, motorcycles, etc., based on various balances such as posture and remaining battery power, so that when these balances are far from the ideal, it indicates an unpleasant state, and when they are close to the ideal, it indicates a pleasant state. The emotion map may be generated, for example, based on the emotion map of Dr. Mitsuyoshi (Research on speech emotion recognition and emotion brain physiological signal analysis system, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map is lined with emotions that belong to an area called "reaction" where sensation is dominant. The right half of the emotion map is lined with emotions that belong to an area called "situation" where situation recognition is dominant.

感情マップでは学習を促す感情が２つ定義される。１つは、状況側にあるネガティブな「懺悔」や「反省」の真ん中周辺の感情である。つまり、「もう２度とこんな想いはしたくない」「もう叱られたくない」というネガティブな感情がロボットに生じたときである。もう１つは、反応側にあるポジティブな「欲」のあたりの感情である。つまり、「もっと欲しい」「もっと知りたい」というポジティブな気持ちのときである。 The emotion map defines two emotions that encourage learning. The first is the negative emotion around the middle of "repentance" or "reflection" on the situation side. In other words, this is when the robot experiences negative emotions such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the positive emotion around "desire" on the response side. In other words, this is when the robot has positive feelings such as "I want more" or "I want to know more."

感情決定部２３２は、センサモジュール部２１０で解析された情報、及び認識されたユーザ１０の状態を、予め学習されたニューラルネットワークに入力し、感情マップ４００に示す各感情を示す感情値を取得し、ユーザ１０の感情を決定する。このニューラルネットワークは、センサモジュール部２１０で解析された情報、及び認識されたユーザ１０の状態と、感情マップ４００に示す各感情を示す感情値との組み合わせである複数の学習データに基づいて予め学習されたものである。また、このニューラルネットワークは、図６に示す感情マップ９００のように、近くに配置されている感情同士は、近い値を持つように学習される。図６では、「安心」、「安穏」、「心強い」という複数の感情が、近い感情値となる例を示している。 The emotion determination unit 232 inputs the information analyzed by the sensor module unit 210 and the recognized state of the user 10 into a pre-trained neural network, obtains emotion values indicating each emotion shown in the emotion map 400, and determines the emotion of the user 10. This neural network is pre-trained based on multiple learning data that are combinations of the information analyzed by the sensor module unit 210 and the recognized state of the user 10, and emotion values indicating each emotion shown in the emotion map 400. In addition, this neural network is trained so that emotions that are located close to each other have similar values, as in the emotion map 900 shown in Figure 6. Figure 6 shows an example in which multiple emotions, such as "peace of mind," "calm," and "reassuring," have similar emotion values.

また、感情決定部２３２は、特定のマッピングに従い、ロボット１００の感情を決定してよい。具体的には、感情決定部２３２は、センサモジュール部２１０で解析された情報、状態認識部２３０によって認識されたユーザ１０の状態、及びロボット１００の状態を、予め学習されたニューラルネットワークに入力し、感情マップ４００に示す各感情を示す感情値を取得し、ロボット１００の感情を決定する。このニューラルネットワークは、センサモジュール部２１０で解析された情報、認識されたユーザ１０の状態、及びロボット１００の状態と、感情マップ４００に示す各感情を示す感情値との組み合わせである複数の学習データに基づいて予め学習されたものである。例えば、タッチセンサ（図示省略）の出力から、ロボット１００がユーザ１０になでられていると認識される場合に、「嬉しい」の感情値「３」となることを表す学習データや、加速度センサ２０６の出力から、ロボット１００がユーザ１０に叩かれていると認識される場合に、「怒」の感情値「３」となることを表す学習データに基づいて、ニューラルネットワークが学習される。また、このニューラルネットワークは、図６に示す感情マップ９００のように、近くに配置されている感情同士は、近い値を持つように学習される。 Furthermore, the emotion determination unit 232 may determine the emotion of the robot 100 according to a specific mapping. Specifically, the emotion determination unit 232 inputs the information analyzed by the sensor module unit 210, the state of the user 10 recognized by the state recognition unit 230, and the state of the robot 100 into a pre-trained neural network, obtains emotion values indicating each emotion shown in the emotion map 400, and determines the emotion of the robot 100. This neural network is pre-trained based on multiple learning data that are combinations of the information analyzed by the sensor module unit 210, the recognized state of the user 10, and the state of the robot 100, and emotion values indicating each emotion shown in the emotion map 400. For example, the neural network is trained based on learning data indicating that the emotion value of "happy" becomes "3" when it is recognized from the output of a touch sensor (not shown) that the robot 100 is being stroked by the user 10, and learning data indicating that the emotion value of "anger" becomes "3" when it is recognized from the output of the acceleration sensor 206 that the robot 100 is being hit by the user 10. Furthermore, this neural network is trained so that emotions that are located close to each other have similar values, as in the emotion map 900 shown in FIG. 6.
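
One possible way to encourage nearby emotions to take similar values is to spread each labeled emotion value to its neighbors on the map when building training targets. The Python sketch below illustrates that idea only; it is not stated in the description to be the training method used. The 2-D coordinates in `POSITIONS`, the Gaussian weighting, and the `sigma` parameter are all assumptions.

```python
import math
from typing import Dict, Tuple

# Hypothetical 2-D positions of emotions on the map (assumed coordinates).
POSITIONS: Dict[str, Tuple[float, float]] = {
    "relief": (1.0, 0.0),
    "calm": (1.1, 0.1),
    "reassured": (1.0, 0.2),
    "hatred": (0.0, -2.0),
}


def soft_targets(labeled: str, value: float, sigma: float = 0.5) -> Dict[str, float]:
    """Spread one labeled emotion value to nearby emotions on the map.

    Each emotion receives the labeled value scaled by a Gaussian weight of
    its distance from the labeled emotion, so close emotions get close values.
    """
    px, py = POSITIONS[labeled]
    targets: Dict[str, float] = {}
    for name, (x, y) in POSITIONS.items():
        dist_sq = (x - px) ** 2 + (y - py) ** 2
        targets[name] = value * math.exp(-dist_sq / (2 * sigma ** 2))
    return targets
```

With these assumed coordinates, a label of "relief" at value 3 gives "calm" and "reassured" targets close to 3 and leaves "hatred" near 0, which mirrors the example of similar values for neighboring emotions.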

行動決定部２３６は、ユーザの行動と、ユーザの感情、ロボットの感情とを表すテキストに、ユーザの行動に対応するロボットの行動内容を質問するための固定文を追加して、対話機能を有する文章生成モデルに入力することにより、ロボットの行動内容を生成する。 The behavior decision unit 236 generates the robot's behavior content by adding, to text representing the user's behavior, the user's emotions, and the robot's emotions, a fixed sentence for asking what the robot should do in response to the user's behavior, and inputting the result into a sentence generation model with a dialogue function.

例えば、行動決定部２３６は、感情決定部２３２によって決定されたロボット１００の感情から、表１に示すような感情テーブルを用いて、ロボット１００の状態を表すテキストを取得する。ここで、感情テーブルには、感情の種類毎に、各感情値に対してインデックス番号が付与されており、インデックス番号毎に、ロボット１００の状態を表すテキストが格納されている。 For example, the behavior determination unit 236 obtains text representing the state of the robot 100 from the emotion of the robot 100 determined by the emotion determination unit 232, using an emotion table such as that shown in Table 1. Here, in the emotion table, an index number is assigned to each emotion value for each type of emotion, and text representing the state of the robot 100 is stored for each index number.

感情決定部２３２によって決定されたロボット１００の感情が、インデックス番号「２」に対応する場合、「とても楽しい状態」というテキストが得られる。なお、ロボット１００の感情が、複数のインデックス番号に対応する場合、ロボット１００の状態を表すテキストが複数得られる。 When the emotion of the robot 100 determined by the emotion determination unit 232 corresponds to index number "2", the text "very happy state" is obtained. Note that when the emotion of the robot 100 corresponds to multiple index numbers, multiple pieces of text representing the state of the robot 100 are obtained.

また、ユーザ１０の感情に対しても、表２に示すような感情テーブルを用意しておく。 In addition, an emotion table like that shown in Table 2 is prepared for the emotions of user 10.

ここで、ユーザの行動が、「一緒にあそぼう」と話しかけるであり、ロボット１００の感情が、インデックス番号「２」であり、ユーザ１０の感情が、インデックス番号「３」である場合には、「ロボットはとても楽しい状態です。ユーザは普通に楽しい状態です。ユーザに「一緒にあそぼう」と話しかけられました。ロボットとして、どのように返事をしますか？」というテキストを文章生成モデルに入力し、ロボットの行動内容を取得する。行動決定部２３６は、この行動内容から、ロボットの行動を決定する。 Here, if the user's action is saying "Let's play together," the emotion of the robot 100 is index number "2," and the emotion of the user 10 is index number "3," the text "The robot is having a lot of fun. The user is having a normal amount of fun. The user said to the robot, 'Let's play together.' How will you respond as the robot?" is input into the sentence generation model to obtain the robot's behavior content. The behavior decision unit 236 decides the robot's behavior from this behavior content.
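The prompt assembly described above can be sketched as follows. This is a minimal illustration, not the implementation of the behavior decision unit 236: the table contents are limited to the two entries quoted in the example (Tables 1 and 2 are not reproduced here), and the names `ROBOT_STATE_TEXT`, `USER_STATE_TEXT`, `FIXED_QUESTION`, and `build_prompt` are hypothetical.

```python
# Hypothetical excerpts of the emotion tables: index number -> state text.
# Only the entries used in the example above are filled in.
ROBOT_STATE_TEXT = {2: "having a lot of fun"}
USER_STATE_TEXT = {3: "having a normal amount of fun"}

# Fixed sentence that asks the model for the robot's behavior content.
FIXED_QUESTION = "How will you respond as the robot?"


def build_prompt(robot_indices, user_indices, user_utterance):
    """Assemble the text passed to the sentence generation model.

    An emotion may correspond to several index numbers, so each argument
    is a list and yields one sentence per index.
    """
    parts = [f"The robot is {ROBOT_STATE_TEXT[i]}." for i in robot_indices]
    parts += [f"The user is {USER_STATE_TEXT[i]}." for i in user_indices]
    parts.append(f"The user said to the robot, '{user_utterance}'")
    parts.append(FIXED_QUESTION)
    return " ".join(parts)


prompt = build_prompt([2], [3], "Let's play together.")
print(prompt)
```

The resulting string would then be sent to whichever sentence generation model is in use; that call is omitted because the source does not specify an interface.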

このように、行動決定部２３６は、ロボット１００の感情の種類毎で、かつ、当該感情の強さ毎に予め定められたロボット１００の感情に関する状態と、ユーザ１０の行動とに対応して、ロボット１００の行動内容を決定する。この形態では、ロボット１００の感情に関する状態に応じて、ユーザ１０との対話を行っている場合のロボット１００の発話内容を分岐させることができる。すなわち、ロボット１００は、ロボットの感情に応じたインデックス番号に応じて、ロボットの行動を変えることができるため、ユーザは、ロボットに心があるような印象を持ち、ロボットに対して話しかけるなどの行動をとることが促進される。 In this way, the behavior decision unit 236 decides the behavior of the robot 100 in response to the state of the robot 100's emotion, which is predetermined for each type of emotion of the robot 100 and for each strength of the emotion, and the behavior of the user 10. In this form, the speech content of the robot 100 during a dialogue with the user 10 can be branched according to the state of the robot 100's emotion. In other words, since the robot 100 can change its behavior according to an index number according to the emotion of the robot, the user has the impression that the robot has a heart, and is encouraged to take actions such as talking to the robot.

また、行動決定部２３６は、ユーザの行動と、ユーザの感情、ロボットの感情とを表すテキストだけでなく、履歴データ２２２の内容を表すテキストも追加した上で、ユーザの行動に対応するロボットの行動内容を質問するための固定文を追加して、対話機能を有する文章生成モデルに入力することにより、ロボットの行動内容を生成するようにしてもよい。これにより、ロボット１００は、ユーザの感情や行動を表す履歴データに応じて、ロボットの行動を変えることができるため、ユーザは、ロボットに個性があるような印象を持ち、ロボットに対して話しかけるなどの行動をとることが促進される。また、履歴データに、ロボットの感情や行動を更に含めるようにしてもよい。 Furthermore, the behavior decision unit 236 may generate the robot's behavior content by adding not only text representing the user's behavior, the user's emotions, and the robot's emotions, but also text representing the contents of the history data 222, adding a fixed sentence for asking about the robot's behavior content corresponding to the user's behavior, and inputting the result into a sentence generation model with a dialogue function. This allows the robot 100 to change its behavior according to the history data representing the user's emotions and behavior, so that the user has the impression that the robot has a personality, and is encouraged to take actions such as talking to the robot. The history data may also further include the robot's emotions and actions.

また、感情決定部２３２は、文章生成モデルによって生成されたロボット１００の行動内容に基づいて、ロボット１００の感情を決定してもよい。具体的には、感情決定部２３２は、文章生成モデルによって生成されたロボット１００の行動内容を、予め学習されたニューラルネットワークに入力し、感情マップ４００に示す各感情を示す感情値を取得し、取得した各感情を示す感情値と、現在のロボット１００の各感情を示す感情値とを統合し、ロボット１００の感情を更新する。例えば、取得した各感情を示す感情値と、現在のロボット１００の各感情を示す感情値とをそれぞれ平均して、統合する。このニューラルネットワークは、文章生成モデルによって生成されたロボット１００の行動内容を表すテキストと、感情マップ４００に示す各感情を示す感情値との組み合わせである複数の学習データに基づいて予め学習されたものである。 The emotion determination unit 232 may also determine the emotion of the robot 100 based on the behavioral content of the robot 100 generated by the sentence generation model. Specifically, the emotion determination unit 232 inputs the behavioral content of the robot 100 generated by the sentence generation model into a pre-trained neural network, obtains emotion values indicating each emotion shown in the emotion map 400, and integrates the obtained emotion values indicating each emotion with the emotion values indicating each emotion of the current robot 100 to update the emotion of the robot 100. For example, the emotion values indicating each emotion obtained and the emotion values indicating each emotion of the current robot 100 are averaged and integrated. This neural network is pre-trained based on multiple learning data that are combinations of texts indicating the behavioral content of the robot 100 generated by the sentence generation model and emotion values indicating each emotion shown in the emotion map 400.

例えば、文章生成モデルによって生成されたロボット１００の行動内容として、ロボット１００の発話内容「それはよかったね。ラッキーだったね。」が得られた場合には、この発話内容を表すテキストをニューラルネットワークに入力すると、感情「嬉しい」の感情値として高い値が得られ、感情「嬉しい」の感情値が高くなるように、ロボット１００の感情が更新される。 For example, if the speech content of the robot 100, "That's great. You're lucky," is obtained as the behavioral content of the robot 100 generated by the sentence generation model, then when the text representing this speech content is input to the neural network, a high emotion value is obtained for the emotion "happy," and the emotion of the robot 100 is updated so that the emotion value of the emotion "happy" becomes higher.
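The averaging-based integration mentioned above can be sketched as below. The emotion labels, the numeric values, and the function name `integrate_emotions` are assumptions for illustration; the neural network that produces the inferred values is replaced by a hard-coded dictionary.

```python
def integrate_emotions(current, inferred):
    """Average the current and newly inferred value for each emotion label.

    A label missing from one side is treated as 0 (an assumption; the
    source does not say how missing labels are handled).
    """
    labels = set(current) | set(inferred)
    return {
        label: (current.get(label, 0) + inferred.get(label, 0)) / 2
        for label in labels
    }


# Hypothetical values: the robot's current emotions, and the values the
# network might return for the utterance "That's great. You're lucky."
current = {"happy": 1, "anger": 2}
inferred = {"happy": 5, "anger": 0}

updated = integrate_emotions(current, inferred)
print(updated["happy"], updated["anger"])
```

With these inputs the "happy" value rises from 1 to 3.0, which matches the direction of the update described in the example.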

ロボット１００においては、生成系ＡＩなどの文章生成モデルと、感情決定部２３２とが連動して、自我を有し、ユーザがしゃべっていない間も様々なパラメータで成長し続ける方法が実行される。 In the robot 100, a sentence generation model such as generative AI works in conjunction with the emotion determination unit 232 to give the robot an ego and allow it to continue to grow with various parameters even when the user is not speaking.

生成系ＡＩは、深層学習の手法を用いた大規模言語モデルである。生成系ＡＩは外部データを参照することもでき、例えば、ＣｈａｔＧＰＴＰｌｕｇｉｎｓでは、対話を通して天気情報やホテル予約情報といった様々な外部データを参照しながら、なるべく正確に答えを出す技術が知られている。例えば、生成系ＡＩでは、自然言語で目的を与えると、様々なプログラミング言語でソースコードを自動生成することができる。例えば、生成系ＡＩでは、問題のあるソースコードを与えると、デバッグして問題点を発見し、改善されたソースコードを自動生成することもできる。これらを組み合わせて、自然言語で目的を与えると、ソースコードに問題がなくなるまでコード生成とデバッグを繰り返す自律型エージェントが出てきている。そのような自律型エージェントとして、ＡｕｔｏＧＰＴ、ｂａｂｙＡＧＩ、ＪＡＲＶＩＳ、及びＥ２Ｂ等が知られている。 Generative AI is a large-scale language model that uses deep learning techniques. Generative AI can also refer to external data. For example, ChatGPT Plugins is known as a technology that provides answers as accurately as possible while referring to various external data such as weather information and hotel reservation information through dialogue. For example, generative AI can automatically generate source code in various programming languages when a goal is given in natural language. For example, generative AI can also debug and find the problem when problematic source code is given, and automatically generate improved source code. Combining these, autonomous agents are emerging that, when a goal is given in natural language, repeat code generation and debugging until there are no problems in the source code. AutoGPT, babyAGI, JARVIS, E2B, etc. are known as such autonomous agents.

本実施形態に係るロボット１００では、特許文献２（特許第６１９９９２７号公報）に記載されているような、ロボットが強い感情を覚えたイベントデータを長く残し、ロボットにあまり感情が湧かなかったイベントデータを早く忘却するという技術を用いて、学習すべきイベントデータを、印象的な記憶が入ったデータベースに残してよい。 In the robot 100 according to this embodiment, the event data to be learned may be stored in a database containing impressive memories using a technique described in Patent Document 2 (Patent Publication No. 6199927) in which event data for which the robot felt strong emotions is stored for a long time and event data for which the robot felt little emotion is quickly forgotten.

また、ロボット１００は、カメラ機能で取得したユーザ１０の映像データ等を、履歴データ２２２に記録させてよい。ロボット１００は、必要に応じて履歴データ２２２から映像データ等を取得して、ユーザ１０に提供してよい。ロボット１００は、感情の強さが強いほど、情報量がより多い映像データを生成して履歴データ２２２に記録させてよい。例えば、ロボット１００は、骨格データ等の高圧縮形式の情報を記録している場合に、興奮の感情値が閾値を超えたことに応じて、ＨＤ動画等の低圧縮形式の情報の記録に切り換えてよい。ロボット１００によれば、例えば、ロボット１００の感情が高まったときの高精細な映像データを記録として残すことができる。 The robot 100 may also record video data of the user 10 acquired by the camera function in the history data 222. The robot 100 may acquire video data from the history data 222 as necessary and provide it to the user 10. The robot 100 may generate video data with a larger amount of information as the emotion becomes stronger and record it in the history data 222. For example, when the robot 100 is recording information in a highly compressed format such as skeletal data, it may switch to recording information in a low-compression format such as HD video when the emotion value of excitement exceeds a threshold. The robot 100 can leave a record of high-definition video data when the robot 100's emotion becomes heightened, for example.
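A minimal sketch of the format switch described above, assuming emotion values on the 0 to 5 scale used later in this description. The threshold value and the names `choose_recording_format`, `"skeleton"`, and `"hd_video"` are hypothetical.

```python
def choose_recording_format(excitement_value, threshold=3):
    """Return the recording format for the current excitement value.

    Below or at the threshold, only highly compressed skeletal data is
    recorded; above it, recording switches to low-compression HD video.
    """
    return "hd_video" if excitement_value > threshold else "skeleton"


print(choose_recording_format(2))
print(choose_recording_format(5))
```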

ロボット１００は、ロボット１００がユーザ１０と話していないときに、印象的なイベントデータが記憶されている履歴データ２２２から自動的にイベントデータをロードして、感情決定部２３２により、ロボットの感情を更新し続けてよい。ロボット１００は、ロボット１００がユーザ１０と話していないとき、ロボット１００の感情が学習を促す感情になったときに、印象的なイベントデータに基づいて、ユーザ１０の感情を良くするように変化させるための感情変化イベントを作成することができる。これにより、ロボット１００の感情の状態に応じた適切なタイミングでの自律的な学習（イベントデータを思い出すこと）を実現できるとともに、ロボット１００の感情の状態を適切に反映した自律的な学習を実現することができる。 When the robot 100 is not talking to the user 10, the robot 100 may automatically load event data from the history data 222 in which impressive event data is stored, and may continue to update the robot's emotions using the emotion determination unit 232. When the robot 100 is not talking to the user 10 and the robot 100's emotions become emotions that encourage learning, the robot 100 may create an emotion change event for changing the user 10's emotions for the better, based on the impressive event data. This allows the robot 100 to realize autonomous learning (recalling event data) at an appropriate timing according to the emotional state of the robot 100, and also allows autonomous learning that appropriately reflects the emotional state of the robot 100 to be realized.

学習を促す感情とは、ネガティブな状態では光吉博士の感情地図の「懺悔」や「反省」あたりの感情であり、ポジティブな状態では感情地図の「欲」のあたりの感情である。 Emotions that promote learning, in a negative state, are emotions like "repentance" or "remorse" on Dr. Mitsuyoshi's emotion map, and in a positive state, are emotions like "desire" on the emotion map.

ロボット１００は、ネガティブな状態において、感情地図の「懺悔」及び「反省」を、学習を促す感情として取り扱ってよい。ロボット１００は、ネガティブな状態において、感情地図の「懺悔」及び「反省」に加えて、「懺悔」及び「反省」に隣接する感情を、学習を促す感情として取り扱ってもよい。例えば、ロボット１００は、「懺悔」及び「反省」に加えて、「惜」、「頑固」、「自滅」、「自戒」、「後悔」、及び「絶望」の少なくともいずれかを、学習を促す感情として取り扱う。これらにより、例えば、ロボット１００が「もう２度とこんな想いはしたくない」「もう叱られたくない」というネガティブな気持ちを抱いたときに自律的な学習を実行するようにできる。 In a negative state, the robot 100 may treat "repentance" and "remorse" in the emotion map as emotions that encourage learning. In a negative state, the robot 100 may also treat emotions adjacent to "repentance" and "remorse," in addition to "repentance" and "remorse" themselves, as emotions that encourage learning. For example, in addition to "repentance" and "remorse," the robot 100 treats at least one of "chagrin" (惜), "stubbornness," "self-destruction," "self-admonition," "regret" (後悔), and "despair" as an emotion that encourages learning. This allows the robot 100 to perform autonomous learning when it has negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again."

ロボット１００は、ポジティブな状態においては、感情地図の「欲」を、学習を促す感情として取り扱ってよい。ロボット１００は、ポジティブな状態において、「欲」に加えて、「欲」に隣接する感情を、学習を促す感情として取り扱ってもよい。例えば、ロボット１００は、「欲」に加えて、「うれしい」、「陶酔」、「渇望」、「期待」、及び「羞」の少なくともいずれかを、学習を促す感情として取り扱う。これらにより、例えば、ロボット１００が「もっと欲しい」「もっと知りたい」というポジティブな気持ちを抱いたときに自律的な学習を実行するようにできる。 When the robot 100 is in a positive state, it may treat "desire" in the emotion map as an emotion that encourages learning. When the robot 100 is in a positive state, it may treat emotions adjacent to "desire" as emotions that encourage learning, in addition to "desire". For example, the robot 100 treats at least one of "happiness", "euphoria", "craving", "anticipation", and "shyness" as emotions that encourage learning, in addition to "desire". This allows the robot 100 to perform autonomous learning when it feels positive emotions such as "wanting more" or "wanting to know more".

ロボット１００は、上述したような学習を促す感情以外の感情をロボット１００が抱いているときには、自律的な学習を実行しないようにしてもよい。これにより、例えば、極端に怒っているときや、盲目的に愛を感じているときに、自律的な学習を実行しないようにできる。 The robot 100 may be configured not to execute autonomous learning when the robot 100 is experiencing emotions other than the emotions that encourage learning as described above. This can prevent the robot 100 from executing autonomous learning, for example, when the robot 100 is extremely angry or when the robot 100 is blindly feeling love.
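The gating rule in the preceding paragraphs can be sketched as a set-membership check. The labels are taken from the text; the function name `should_learn`, the `include_adjacent` switch, and the sample label "怒" used for an emotion outside the sets are assumptions for illustration.

```python
# Core emotions that encourage learning (labels from the emotion map).
CORE_LEARNING_EMOTIONS = {"懺悔", "反省", "欲"}  # repentance, remorse, desire

# Adjacent emotions that may optionally be treated the same way.
ADJACENT_LEARNING_EMOTIONS = {
    "惜", "頑固", "自滅", "自戒", "後悔", "絶望",   # next to 懺悔 / 反省
    "うれしい", "陶酔", "渇望", "期待", "羞",       # next to 欲
}


def should_learn(emotion_label, include_adjacent=True):
    """Return True if autonomous learning should run for this emotion."""
    if emotion_label in CORE_LEARNING_EMOTIONS:
        return True
    return include_adjacent and emotion_label in ADJACENT_LEARNING_EMOTIONS


print(should_learn("欲"))   # desire: learning is triggered
print(should_learn("惜"))   # adjacent emotion: triggered when enabled
print(should_learn("怒"))   # anger: outside both sets, no learning
```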

感情変化イベントとは、例えば、印象的なイベントの先にある行動を提案することである。印象的なイベントの先にある行動とは、感情地図のもっとも外側にある感情ラベルのことで、例えば「愛」の先には「寛容」や「許容」という行動がある。 An emotion-changing event, for example, is a suggestion of an action that follows a memorable event. An action that follows a memorable event is an emotion label that is on the outermost side of the emotion map. For example, beyond "love" are actions such as "tolerance" and "acceptance."

ロボット１００がユーザ１０と話していないときに実行される自律的な学習では、印象的な記憶に登場する人々と自分について、それぞれの感情、状況、行動などを組み合わせて、文章生成モデルを用いて、感情変化イベントを作成する。 In the autonomous learning that is performed when the robot 100 is not talking to the user 10, the robot 100 combines the emotions, situations, actions, and so on of the people who appear in a memorable memory and of the robot itself, and uses a sentence generation model to create an emotion change event.

すべての感情値が０から５の６段階評価で表されているとして、印象的なイベントデータとして、「友達が叩かれて嫌そうにしていた」というイベントデータが履歴データ２２２に記憶されている場合を考える。ここでの友達はユーザ１０を指し、ユーザ１０の感情は「嫌悪感」であり、「嫌悪感」を表す値としては５が入っていたとする。また、ロボット１００の感情は「不安」であり、「不安」を表す値としては４が入っていたとする。 Let us consider a case where all emotion values are expressed on a six-level scale from 0 to 5, and event data such as "a friend looked displeased after being hit" is stored in the history data 222 as memorable event data. The friend in this case refers to the user 10, and the emotion of the user 10 is "disgust," with 5 entered as the value representing "disgust." In addition, let us assume that the emotion of the robot 100 is "anxiety," and 4 is entered as the value representing "anxiety."

ロボット１００はユーザ１０と話をしていない間、自律的処理を実行することにより、様々なパラメータで成長し続けることができる。具体的には、履歴データ２２２から例えば感情値が強い順に並べた最上位のイベントデータとして「友達が叩かれて嫌そうにしていた」というイベントデータをロードする。ロードされたイベントデータにはロボット１００の感情として強さ４の「不安」が紐づいており、ここで、友達であるユーザ１０の感情として強さ５の「嫌悪感」が紐づいていたとする。ロボット１００の現在の感情値が、ロード前に強さ３の「安心」であるとすると、ロードされた後には強さ４の「不安」と強さ５の「嫌悪感」の影響が加味されてロボット１００の感情値が、口惜しい（悔しい）を意味する「惜」に変化することがある。このとき、「惜」は学習を促す感情であるため、ロボット１００は、ロボット行動として、イベントデータを思い出すことを決定し、感情変化イベントを作成する。このとき、文章生成モデルに入力する情報は、印象的なイベントデータを表すテキストであり、本例は「友達が叩かれて嫌そうにしていた」ことである。また、感情地図では最も内側に「嫌悪感」の感情があり、それに対応する行動として最も外側に「攻撃」が予測されるため、本例では友達がそのうち誰かを「攻撃」することを避けるように感情変化イベントが作成される。 While not talking to the user 10, the robot 100 can continue to grow with various parameters by executing autonomous processing. Specifically, the robot 100 loads from the history data 222 the top-ranked event data when sorted, for example, in descending order of emotion value; here, this is the event data "a friend looked displeased after being hit." Assume that the loaded event data is linked to "anxiety" of strength 4 as the emotion of the robot 100 and to "disgust" of strength 5 as the emotion of the friend, the user 10. If the current emotion value of the robot 100 before loading is "relief" of strength 3, then after loading, the influence of "anxiety" of strength 4 and "disgust" of strength 5 is taken into account, and the emotion value of the robot 100 may change to "chagrin" (惜), meaning vexation or frustration. Since "chagrin" is an emotion that encourages learning, the robot 100 decides, as a robot behavior, to recall the event data and creates an emotion change event. The information input to the sentence generation model at this time is text representing the memorable event data; in this example, "a friend looked displeased after being hit." Also, since the emotion map has the emotion of "disgust" at the innermost position and predicts "attack" at the outermost position as the corresponding behavior, in this example the emotion change event is created so as to prevent the friend from eventually "attacking" someone.
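Loading the top-ranked event can be sketched as below. The source only says events are ordered by emotion value strength; ranking by the strongest linked emotion is one possible reading and is an assumption here, as are the `EventData` class, the `load_top_event` function, and the first sample event. How the loaded emotions shift the robot's current emotion is not specified in the text and is not modeled.

```python
from dataclasses import dataclass


@dataclass
class EventData:
    text: str
    emotions: dict  # emotion label -> strength on the 0-to-5 scale


def load_top_event(history):
    """Return the event whose strongest linked emotion is the highest."""
    return max(history, key=lambda event: max(event.emotions.values()))


history = [
    # Hypothetical low-intensity event, added only to make ranking visible.
    EventData("went for a walk with a friend", {"relief": 2}),
    # Event from the example: robot anxiety 4, user disgust 5.
    EventData("a friend looked displeased after being hit",
              {"anxiety": 4, "disgust": 5}),
]

top = load_top_event(history)
print(top.text)
```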

例えば、印象的なイベントデータの情報を使用して、穴埋め問題を解けば、下記のような入力テキストを自動生成できる。 For example, by using information from impressive event data to solve fill-in-the-blank questions, you can automatically generate input text like the one below.

「ユーザが叩かれていました。そのとき、ユーザは、非常に嫌悪感を持っていました。ロボットはとても不安でした。ロボットが次にユーザに会ったときにかけるべきセリフを３０文字以内で教えてください。ただし、会う時間帯に関係ないようにお願いします。また、直接的な表現は避けてください。候補は３つ挙げるものとします。
＜期待するフォーマット＞
候補１：（ロボットがユーザにかけるべき言葉）
候補２：（ロボットがユーザにかけるべき言葉）
候補３：（ロボットがユーザにかけるべき言葉）」
"The user was being hit. At that time, the user felt very disgusted. The robot was very anxious. Please tell me what the robot should say to the user the next time they meet, in 30 characters or less. However, make sure it does not depend on the time of day they meet. Also, avoid direct expressions. List three candidates.
<Expected format>
Candidate 1: (Words the robot should say to the user)
Candidate 2: (Words the robot should say to the user)
Candidate 3: (Words the robot should say to the user)"
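The fill-in-the-blank generation of this input text can be sketched with a string template. The slot names, the wording of the English template, and the function `build_recall_prompt` are assumptions for illustration; the source only shows the finished prompt.

```python
TEMPLATE = (
    "{event} At that time, the user felt {user_feeling}. "
    "The robot was {robot_feeling}. "
    "Please tell me what the robot should say to the user the next time "
    "they meet, in {max_chars} characters or less. However, make sure it "
    "does not depend on the time of day they meet. Also, avoid direct "
    "expressions. List {n} candidates.\n"
    "<Expected format>\n"
    "{format_lines}"
)


def build_recall_prompt(event, user_feeling, robot_feeling, n=3, max_chars=30):
    """Fill the blanks of the template from memorable event data."""
    format_lines = "\n".join(
        f"Candidate {i}: (Words the robot should say to the user)"
        for i in range(1, n + 1)
    )
    return TEMPLATE.format(
        event=event,
        user_feeling=user_feeling,
        robot_feeling=robot_feeling,
        max_chars=max_chars,
        n=n,
        format_lines=format_lines,
    )


recall_prompt = build_recall_prompt(
    "The user was being hit.", "very disgusted", "very anxious"
)
print(recall_prompt)
```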

このとき、文章生成モデルの出力は、例えば、以下のようになる。 In this case, the output of the sentence generation model would look something like this:

「候補１：大丈夫？昨日のこと気になってたんだ。
候補２：昨日のこと、気にしていたよ。どうしたらいい？
候補３：心配していたよ。何か話してもらえる？」
"Candidate 1: Are you okay? I've been thinking about what happened yesterday.
Candidate 2: I was concerned about what happened yesterday. What should I do?
Candidate 3: I was worried about you. Can you tell me about it?"

さらに、感情変化イベントの作成で得られた情報については、ロボット１００は、下記のような入力テキストを自動生成してもよい。 Furthermore, based on the information obtained by creating an emotion change event, the robot 100 may automatically generate input text such as the following:

「「ユーザが叩かれていました」場合、そのユーザに次の声をかけたとき、ユーザはどのような気持ちになるでしょうか。ユーザの感情は、「喜Ａ怒Ｂ哀Ｃ楽Ｄ」の形式で、ＡからＤは、０から５の６段階評価の整数が入るものとします。
候補１：大丈夫？昨日のこと気になってたんだ。
候補２：昨日のこと、気にしていたよ。どうしたらいい？
候補３：心配していたよ。何か話してもらえる？」
"In the case where 'the user was being hit,' how will the user feel when spoken to in each of the following ways? Express the user's emotions in the format 'Joy A, Anger B, Sorrow C, Pleasure D,' where A to D are integers on a six-level scale from 0 to 5.
Candidate 1: Are you okay? I've been thinking about what happened yesterday.
Candidate 2: I was concerned about what happened yesterday. What should I do?
Candidate 3: I was worried about you. Can you tell me about it?"

「ユーザの感情は以下のようになるかもしれません。
候補１：喜３怒１哀２楽２
候補２：喜２怒１哀３楽２
候補３：喜２怒１哀３楽３」
"The user's emotions might be as follows.
Candidate 1: Joy 3, Anger 1, Sorrow 2, Pleasure 2
Candidate 2: Joy 2, Anger 1, Sorrow 3, Pleasure 2
Candidate 3: Joy 2, Anger 1, Sorrow 3, Pleasure 3"

このように、ロボット１００は、感情変化イベントを作成した後に、想いをめぐらす処理を実行してもよい。 In this way, the robot 100 may perform a musing process after creating an emotion change event.

最後に、ロボット１００は、複数候補の中から、もっとも人が喜びそうな候補１を使用して、感情変化イベントを作成し、行動予定データ２２４に格納し、ユーザ１０に次回会ったときに備えてよい。 Finally, the robot 100 may create an emotion change event using candidate 1, the candidate among the multiple candidates that is most likely to please a person, store it in the action schedule data 224, and prepare for the next time the robot 100 meets the user 10.
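The selection step can be sketched by parsing the predicted emotions in the 「喜Ａ怒Ｂ哀Ｃ楽Ｄ」 format and picking one candidate. Ranking by the joy (喜) value, with ties going to the lower candidate number, is an assumption; the source only states that the candidate most likely to please is used. The names `parse_predictions` and `select_candidate` are hypothetical.

```python
import re

# Matches lines such as "候補１：喜３怒１哀２楽２". In Python 3, \d also
# matches full-width digits, and int() converts them.
PATTERN = re.compile(r"候補(\d+)：喜(\d)怒(\d)哀(\d)楽(\d)")


def parse_predictions(text):
    """Return {candidate number: {emotion: value}} from the model output."""
    predictions = {}
    for match in PATTERN.finditer(text):
        number, joy, anger, sorrow, pleasure = (int(g) for g in match.groups())
        predictions[number] = {
            "joy": joy, "anger": anger, "sorrow": sorrow, "pleasure": pleasure,
        }
    return predictions


def select_candidate(predictions):
    """Pick the candidate with the highest joy value (assumed criterion)."""
    return max(predictions, key=lambda n: (predictions[n]["joy"], -n))


model_output = (
    "ユーザの感情は以下のようになるかもしれません。\n"
    "候補１：喜３怒１哀２楽２\n"
    "候補２：喜２怒１哀３楽２\n"
    "候補３：喜２怒１哀３楽３"
)

predictions = parse_predictions(model_output)
best = select_candidate(predictions)
print(best)
```

Under this criterion the example output selects candidate 1, which is the candidate the description uses.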

以上のように、家族や友達と会話をしていないときでも、印象的なイベントデータが記憶されている履歴データ２２２の情報を使用して、ロボットの感情値を決定し続け、上述した学習を促す感情になったときに、ロボット１００はロボット１００の感情に応じて、ユーザ１０と会話していないときに自律的学習を実行し、履歴データ２２２や行動予定データ２２４を更新し続ける。 As described above, even when the robot is not talking to family or friends, the robot continues to determine the robot's emotion value using information from the history data 222, which stores impressive event data, and when the robot experiences an emotion that encourages learning as described above, the robot 100 performs autonomous learning when not talking to the user 10 in accordance with the emotion of the robot 100, and continues to update the history data 222 and the action schedule data 224.

以上は、感情値を用いた例であるが、感情地図ではホルモンの分泌量とイベント種類から感情をつくることができるため、印象的なイベントデータにひもづく値としてはホルモンの種類、ホルモンの分泌量、イベントの種類であっても良い。 The above are examples using emotion values, but because emotion maps can create emotions from hormone secretion levels and event types, the values linked to memorable event data could also be hormone type, hormone secretion levels, or event type.

以下、具体的な実施例を記載する。 Specific examples are described below.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザの興味関心のあるトピックや趣味に関する情報を調べる。 For example, the robot 100 may look up information about topics or hobbies that interest the user, even when the robot 100 is not talking to the user.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザの誕生日や記念日に関する情報を調べ、祝福のメッセージを考える。 For example, even when the robot 100 is not talking to the user, it can check information about the user's birthdays and anniversaries and think up a congratulatory message.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザが行きたがっている場所や食べ物、商品のレビューを調べる。 For example, even when the robot 100 is not talking to the user, it looks up reviews of places the user wants to visit, as well as of foods and products the user is interested in.

ロボット１００は、例えば、ユーザと話をしていないときでも、天気情報を調べ、ユーザのスケジュールや計画に合わせたアドバイスを提供する。 For example, even when the robot 100 is not talking to the user, it can check weather information and provide advice tailored to the user's schedule and plans.

ロボット１００は、例えば、ユーザと話をしていないときでも、地元のイベントやお祭りの情報を調べ、ユーザに提案する。 For example, even when the robot 100 is not talking to the user, it can look up information about local events and festivals and suggest them to the user.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザの興味のあるスポーツの試合結果やニュースを調べ、話題を提供する。 For example, even when the robot 100 is not talking to the user, it can check the results and news of sports that interest the user and provide topics of conversation.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザの好きな音楽やアーティストの情報を調べ、紹介する。 For example, even when the robot 100 is not talking to the user, it searches for and introduces information about the user's favorite music and artists.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザが気になっている社会的な問題やニュースに関する情報を調べ、意見を提供する。 For example, even when the robot 100 is not talking to the user, it can look up information about social issues or news that concern the user and provide its opinion.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザの故郷や出身地に関する情報を調べ、話題を提供する。 For example, even when the robot 100 is not talking to the user, it can look up information about the user's hometown or birthplace and provide topics of conversation.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザの仕事や学校の情報を調べ、アドバイスを提供する。 For example, the robot 100 can look up information about the user's work or school and provide advice, even when it is not talking to the user.

ロボット１００は、ユーザと話をしていないときでも、ユーザが興味を持つ書籍や漫画、映画、ドラマの情報を調べ、紹介する。 Even when the robot 100 is not talking to the user, it searches for and introduces information about books, comics, movies, and dramas that may be of interest to the user.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザの健康に関する情報を調べ、アドバイスを提供する。 For example, the robot 100 may check information about the user's health and provide advice even when it is not speaking to the user.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザの旅行の計画に関する情報を調べ、アドバイスを提供する。 For example, the robot 100 may look up information about the user's travel plans and provide advice even when it is not speaking to the user.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザの家や車の修理やメンテナンスに関する情報を調べ、アドバイスを提供する。 For example, the robot 100 can look up information and provide advice on repairs and maintenance for the user's home or car, even when it is not speaking to the user.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザが興味を持つ美容やファッションの情報を調べ、アドバイスを提供する。 For example, even when the robot 100 is not talking to the user, it searches for information on beauty and fashion that the user is interested in and provides advice.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザのペットの情報を調べ、アドバイスを提供する。 For example, the robot 100 can look up information about the user's pet and provide advice even when it is not talking to the user.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザの趣味や仕事に関連するコンテストやイベントの情報を調べ、提案する。 For example, even when the robot 100 is not talking to the user, it searches for and suggests information about contests and events related to the user's hobbies or work.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザのお気に入りの飲食店やレストランの情報を調べ、提案する。 For example, the robot 100 searches for and suggests information about the user's favorite eateries and restaurants even when it is not talking to the user.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザの人生に関わる大切な決断について、情報を収集しアドバイスを提供する。 For example, even when the robot 100 is not talking to the user, it can collect information and provide advice about important decisions that affect the user's life.

ロボット１００は、例えば、ユーザと話をしていないときでも、ユーザが心配している人に関する情報を調べ、助言を提供する。 For example, the robot 100 looks up information about someone the user is concerned about and provides advice, even when it is not speaking to the user.

［第２実施形態］ [Second Embodiment]
第２実施形態では、上記のロボット１００を、ぬいぐるみに搭載するか、又はぬいぐるみに搭載された制御対象機器（スピーカやカメラ）に無線又は有線で接続された制御装置に適用する。なお、第１実施形態と同様の構成となる部分については、同一符号を付して説明を省略する。 In the second embodiment, the robot 100 is mounted on a stuffed toy, or is applied to a control device connected wirelessly or by wire to a control target device (speaker or camera) mounted on the stuffed toy. Note that parts having the same configuration as those in the first embodiment are given the same reference numerals and the description thereof is omitted.

第２実施形態は、具体的には、以下のように構成される。例えば、ロボット１００を、ユーザ１０と日常を過ごしながら、当該ユーザ１０と日常に関する情報を基に、対話を進めたり、ユーザ１０の趣味趣向に合わせた情報を提供する共同生活者（具体的には、図７及び図８に示すぬいぐるみ１００Ｎ）に適用する。第２実施形態では、上記のロボット１００の制御部分を、スマートホン５０に適用した例について説明する。 Specifically, the second embodiment is configured as follows. For example, the robot 100 is applied to a cohabitant (specifically, a stuffed toy 100N shown in Figs. 7 and 8) that spends daily life with the user 10 and advances a dialogue with the user 10 based on information about the user's daily life, and provides information tailored to the user's hobbies and interests. In the second embodiment, an example in which the control part of the robot 100 is applied to a smartphone 50 will be described.

ロボット１００の入出力デバイスとしての機能を搭載したぬいぐるみ１００Ｎは、ロボット１００の制御部分として機能するスマートホン５０が着脱可能であり、ぬいぐるみ１００Ｎの内部で、入出力デバイスと、収容されたスマートホン５０とが接続されている。 The stuffed toy 100N, which is equipped with the function of an input/output device for the robot 100, has a detachable smartphone 50 that functions as the control part of the robot 100, and the input/output device and the housed smartphone 50 are connected inside the stuffed toy 100N.

図７（Ａ）に示される如く、ぬいぐるみ１００Ｎは、本実施形態（その他の実施形態）では、外観が柔らかい布生地で覆われた熊の形状であり、その内方に形成された空間部５２には、入出力デバイスとして、センサ部２００Ａ及び制御対象２５２Ａが配置されている（図９参照）。センサ部２００Ａは、マイク２０１及び２Ｄカメラ２０３を含む。具体的には、図７（Ｂ）に示される如く、空間部５２には、耳５４に相当する部分にセンサ部２００のマイク２０１が配置され、目５６に相当する部分にセンサ部２００の２Ｄカメラ２０３が配置され、及び、口５８に相当する部分に制御対象２５２Ａの一部を構成するスピーカ６０が配置されている。なお、マイク２０１及びスピーカ６０は、必ずしも別体である必要はなく、一体型のユニットであってもよい。ユニットの場合は、ぬいぐるみ１００Ｎの鼻の位置など、発話が自然に聞こえる位置に配置するとよい。なお、ぬいぐるみ１００Ｎは、動物の形状である場合を例に説明したが、これに限定されるものではない。ぬいぐるみ１００Ｎは、特定のキャラクタの形状であってもよい。 As shown in FIG. 7A, in this embodiment (and in the other embodiments), the stuffed toy 100N has the shape of a bear whose exterior is covered with soft fabric, and the sensor unit 200A and the control target 252A are arranged as input/output devices in the space 52 formed inside it (see FIG. 9). The sensor unit 200A includes a microphone 201 and a 2D camera 203. Specifically, as shown in FIG. 7B, in the space 52, the microphone 201 of the sensor unit 200 is arranged in the part corresponding to the ear 54, the 2D camera 203 of the sensor unit 200 is arranged in the part corresponding to the eye 56, and the speaker 60, which constitutes part of the control target 252A, is arranged in the part corresponding to the mouth 58. Note that the microphone 201 and the speaker 60 do not necessarily need to be separate and may be an integrated unit. In the case of an integrated unit, it is preferably placed where speech sounds natural, such as at the nose of the stuffed toy 100N. Although the stuffed toy 100N has been described as having the shape of an animal, the shape is not limited to this. The stuffed toy 100N may also have the shape of a specific character.

図９は、ぬいぐるみ１００Ｎの機能構成を概略的に示す。ぬいぐるみ１００Ｎは、センサ部２００Ａと、センサモジュール部２１０と、格納部２２０と、制御部２２８と、制御対象２５２Ａとを有する。 Figure 9 shows a schematic functional configuration of the plush toy 100N. The plush toy 100N has a sensor unit 200A, a sensor module unit 210, a storage unit 220, a control unit 228, and a control target 252A.

本実施形態のぬいぐるみ１００Ｎに収容されたスマートホン５０は、第１実施形態のロボット１００と同様の処理を実行する。すなわち、スマートホン５０は、図９に示す、センサモジュール部２１０としての機能、格納部２２０としての機能、及び制御部２２８としての機能を有する。 The smartphone 50 housed in the stuffed toy 100N of this embodiment executes the same processing as the robot 100 of the first embodiment. That is, the smartphone 50 has a function as the sensor module unit 210, a function as the storage unit 220, and a function as the control unit 228 shown in FIG. 9.

図８に示される如く、ぬいぐるみ１００Ｎの一部（例えば、背部）には、ファスナー６２が取り付けられており、当該ファスナー６２を開放することで、外部と空間部５２とが連通する構成となっている。 As shown in FIG. 8, a zipper 62 is attached to a part of the stuffed animal 100N (e.g., the back), and opening the zipper 62 allows communication between the outside and the space 52.

ここで、スマートホン５０が、外部から空間部５２へ収容され、ＵＳＢハブ６４（図７（Ｂ）参照）を介して、各入出力デバイスとＵＳＢ接続することで、上記第１実施形態のロボット１００と同等の機能を持たせることができる。 Here, the smartphone 50 is accommodated in the space 52 from the outside and connected to each input/output device via a USB hub 64 (see FIG. 7B), thereby providing the same functionality as the robot 100 of the first embodiment.

また、ＵＳＢハブ６４には、非接触型の受電プレート６６が接続されている。受電プレート６６には、受電用コイル６６Ａが組み込まれている。受電プレート６６は、ワイヤレス給電を受電するワイヤレス受電部の一例である。 A non-contact type power receiving plate 66 is connected to the USB hub 64. A power receiving coil 66A is built into the power receiving plate 66. The power receiving plate 66 is an example of a wireless power receiving unit that receives wireless power.

受電プレート６６は、ぬいぐるみ１００Ｎの両足の付け根部６８付近に配置され、ぬいぐるみ１００Ｎを載置ベース７０に置いたときに、最も載置ベース７０に近い位置となる。載置ベース７０は、外部のワイヤレス送電部の一例である。 The power receiving plate 66 is disposed near the bases 68 of both feet of the stuffed toy 100N, and is located closest to the mounting base 70 when the stuffed toy 100N is placed on the mounting base 70. The mounting base 70 is an example of an external wireless power transmission unit.

この載置ベース７０に置かれたぬいぐるみ１００Ｎが、自然な状態で置物として鑑賞することが可能である。 The stuffed animal 100N placed on this mounting base 70 can be viewed as an ornament in its natural state.

また、この付け根部は、他の部位のぬいぐるみ１００Ｎの表層厚さに比べて薄く形成しており、より載置ベース７０に近い状態で保持されるようになっている。 In addition, this base portion is made thinner than the surface thickness of other parts of the stuffed animal 100N, so that it is held closer to the mounting base 70.

載置ベース７０には、充電パット７２を備えている。充電パット７２は、送電用コイル７２Ａが組み込まれており、送電用コイル７２Ａが信号を送って、受電プレート６６の受電用コイル６６Ａを検索し、受電用コイル６６Ａが見つかると、送電用コイル７２Ａに電流が流れて磁界を発生させ、受電用コイル６６Ａが磁界に反応して電磁誘導が始まる。これにより、受電用コイル６６Ａに電流が流れ、ＵＳＢハブ６４を介して、スマートホン５０のバッテリー（図示省略）に電力が蓄えられる。 The mounting base 70 is equipped with a charging pad 72. The charging pad 72 incorporates a power transmission coil 72A, which sends a signal to search for the power receiving coil 66A on the power receiving plate 66. When the power receiving coil 66A is found, a current flows through the power transmission coil 72A, generating a magnetic field, and the power receiving coil 66A reacts to the magnetic field, starting electromagnetic induction. As a result, a current flows through the power receiving coil 66A, and power is stored in the battery (not shown) of the smartphone 50 via the USB hub 64.

すなわち、ぬいぐるみ１００Ｎを置物として載置ベース７０に載置することで、スマートホン５０は、自動的に充電されるため、充電のために、スマートホン５０をぬいぐるみ１００Ｎの空間部５２から取り出す必要がない。 In other words, by placing the stuffed toy 100N on the mounting base 70 as an ornament, the smartphone 50 is automatically charged, so there is no need to remove the smartphone 50 from the space 52 of the stuffed toy 100N to charge it.

なお、第２実施形態では、スマートホン５０をぬいぐるみ１００Ｎの空間部５２に収容して、有線による接続（ＵＳＢ接続）したが、これに限定されるものではない。例えば、無線機能（例えば、「Bluetooth（登録商標）」）を持たせた制御装置をぬいぐるみ１００Ｎの空間部５２に収容して、制御装置をＵＳＢハブ６４に接続してもよい。この場合、スマートホン５０を空間部５２に入れずに、スマートホン５０と制御装置とが、無線で通信し、外部のスマートホン５０が、制御装置を介して、各入出力デバイスと接続することで、上記第１実施形態のロボット１００と同等の機能を持たせることができる。また、制御装置をぬいぐるみ１００Ｎの空間部５２に収容した制御装置と、外部のスマートホン５０とを有線で接続してもよい。 In the second embodiment, the smartphone 50 is housed in the space 52 of the stuffed toy 100N and connected by wire (USB connection), but the configuration is not limited to this. For example, a control device with a wireless function (e.g., "Bluetooth (registered trademark)") may be housed in the space 52 of the stuffed toy 100N and connected to the USB hub 64. In this case, the smartphone 50 is not placed in the space 52; the smartphone 50 and the control device communicate wirelessly, and the external smartphone 50 connects to each input/output device via the control device, thereby providing the same functions as the robot 100 of the first embodiment. Alternatively, the control device housed in the space 52 of the stuffed toy 100N and the external smartphone 50 may be connected by wire.

また、第２実施形態では、熊のぬいぐるみ１００Ｎを例示したが、他の動物でもよいし、人形であってもよいし、特定のキャラクタの形状であってもよい。また、着せ替え可能でもよい。さらに、表皮の材質は、布生地に限らず、ソフトビニール製等、他の材質でもよいが、柔らかい材質であることが好ましい。 In the second embodiment, a teddy bear 100N is used as an example, but it may be another animal, a doll, or the shape of a specific character. It may also be dressable. Furthermore, the material of the outer skin is not limited to cloth, and may be other materials such as soft vinyl, but a soft material is preferable.

さらに、ぬいぐるみ１００Ｎの表皮にモニタを取り付けて、ユーザ１０に視覚を通じて情報を提供する制御対象２５２を追加してもよい。例えば、目５６をモニタとして、目に映る画像によって喜怒哀楽を表現してもよいし、腹部に、内蔵したスマートホン５０のモニタが透過する窓を設けてもよい。さらに、目５６をプロジェクターとして、壁面に投影した画像によって喜怒哀楽を表現してもよい。 Furthermore, a monitor may be attached to the outer skin of the stuffed toy 100N to add a control target 252 that provides information to the user 10 visually. For example, the eyes 56 may serve as monitors that express joy, anger, sorrow, and pleasure through the images shown in them, or a window may be provided in the abdomen through which the monitor of the built-in smartphone 50 can be seen. Furthermore, the eyes 56 may serve as projectors that express joy, anger, sorrow, and pleasure through images projected onto a wall.

第２実施形態によれば、ぬいぐるみ１００Ｎの中に既存のスマートホン５０を入れ、そこから、ＵＳＢ接続を介して、カメラ２０３、マイク２０１、スピーカ６０等をそれぞれ適切な位置に延出させた。 According to the second embodiment, an existing smartphone 50 is placed inside the stuffed toy 100N, and the camera 203, microphone 201, speaker 60, etc. are extended from there to appropriate positions via a USB connection.

さらに、ワイヤレス充電のために、スマートホン５０と受電プレート６６とをＵＳＢ接続して、受電プレート６６を、ぬいぐるみ１００Ｎの内部からみてなるべく外側に来るように配置した。 Furthermore, for wireless charging, the smartphone 50 and the power receiving plate 66 are connected via USB, and the power receiving plate 66 is positioned as far toward the outside as possible, as viewed from the inside of the stuffed toy 100N.

スマートホン５０のワイヤレス充電を使おうとすると、スマートホン５０をぬいぐるみ１００Ｎの内部からみてできるだけ外側に配置しなければならず、ぬいぐるみ１００Ｎを外から触ったときにごつごつしてしまう。 If the wireless charging function of the smartphone 50 itself were used, the smartphone 50 would have to be placed as far toward the outside as possible, as viewed from the inside of the stuffed toy 100N, and the stuffed toy 100N would feel hard and lumpy when touched from the outside.

そのため、スマートホン５０を、できるだけぬいぐるみ１００Ｎの中心部に配置し、ワイヤレス充電機能（受電プレート６６）を、できるだけぬいぐるみ１００Ｎの内部からみて外側に配置した。カメラ２０３、マイク２０１、スピーカ６０、及びスマートホン５０は、受電プレート６６を介してワイヤレス給電を受電する。 For this reason, the smartphone 50 is placed as close to the center of the stuffed toy 100N as possible, and the wireless charging function (power receiving plate 66) is placed as far toward the outside as possible, as viewed from the inside of the stuffed toy 100N. The camera 203, microphone 201, speaker 60, and smartphone 50 receive wirelessly supplied power via the power receiving plate 66.

なお、第２実施形態のぬいぐるみ１００Ｎの他の構成及び作用は、第１実施形態のロボット１００と同様であるため、説明を省略する。 The rest of the configuration and operation of the stuffed toy 100N of the second embodiment is the same as that of the robot 100 of the first embodiment, so a description thereof is omitted.

また、ぬいぐるみ１００Ｎの一部（例えば、センサモジュール部２１０、格納部２２０、制御部２２８）が、ぬいぐるみ１００Ｎの外部（例えば、サーバ）に設けられ、ぬいぐるみ１００Ｎが、外部と通信することで、上記のぬいぐるみ１００Ｎの各部として機能するようにしてもよい。 In addition, some parts of the stuffed toy 100N (e.g., the sensor module unit 210, the storage unit 220, and the control unit 228) may be provided outside the stuffed toy 100N (e.g., on a server), and the stuffed toy 100N may communicate with the outside so that those parts function as the respective parts of the stuffed toy 100N described above.

［第３実施形態］ [Third embodiment]
上記第１実施形態では、制御システムをロボット１００に適用する場合を例示したが、第３実施形態では、上記のロボット１００を、ユーザと対話するためのエージェントとし、制御システムをエージェントシステムに適用する。なお、第１実施形態及び第２実施形態と同様の構成となる部分については、同一符号を付して説明を省略する。 In the first embodiment, the control system is applied to the robot 100, but in the third embodiment, the robot 100 is used as an agent for interacting with a user, and the control system is applied to an agent system. Note that parts having the same configuration as those in the first and second embodiments are given the same reference numerals and will not be described.

図１０は、制御システムの機能の一部又は全部を利用して構成されるエージェントシステム５００の機能ブロック図である。 Figure 10 is a functional block diagram of an agent system 500 that is configured using some or all of the functions of a control system.

エージェントシステム５００は、ユーザ１０との間で行われる対話を通じてユーザ１０の意図に沿った一連の行動を行うコンピュータシステムである。ユーザ１０との対話は、音声又はテキストによって行うことが可能である。 The agent system 500 is a computer system that performs a series of actions in accordance with the intentions of the user 10 through dialogue with the user 10. The dialogue with the user 10 can be performed by voice or text.

エージェントシステム５００は、センサ部２００Ａと、センサモジュール部２１０と、格納部２２０と、制御部２２８Ｂと、制御対象２５２Ｂと、を有する。 The agent system 500 has a sensor unit 200A, a sensor module unit 210, a storage unit 220, a control unit 228B, and a control target 252B.

エージェントシステム５００は、例えば、ロボット、人形、ぬいぐるみ、ウェアラブル端末（ペンダント、スマートウォッチ、スマート眼鏡）、スマートホン、スマートスピーカ、イヤホン及びパーソナルコンピュータなどに搭載され得る。また、エージェントシステム５００は、ウェブサーバに実装され、ユーザが所持するスマートホン等の通信端末上で動作するウェブブラウザを介して利用されてもよい。 The agent system 500 may be installed in, for example, a robot, a doll, a stuffed toy, a wearable device (pendant, smart watch, smart glasses), a smartphone, a smart speaker, earphones, or a personal computer. The agent system 500 may also be implemented on a web server and used via a web browser running on a communication terminal, such as a smartphone, owned by the user.

エージェントシステム５００は、例えばユーザ１０のために行動するバトラー、秘書、教師、パートナー、友人又は恋人としての役割を担う。エージェントシステム５００は、ユーザ１０と対話するだけでなく、アドバイスの提供、目的地までの案内又はユーザの好みに応じたリコメンド等を行う。また、エージェントシステム５００はサービスプロバイダに対して予約、注文又は代金の支払い等を行う。 The agent system 500 plays the role of, for example, a butler, secretary, teacher, partner, friend, or lover acting on behalf of the user 10. The agent system 500 not only converses with the user 10, but also provides advice, guides the user to a destination, or makes recommendations based on the user's preferences. The agent system 500 also makes reservations, places orders, or makes payments with service providers.

感情決定部２３２は、上記第１実施形態と同様に、ユーザ１０の感情及びエージェント自身の感情を決定する。行動決定部２３６は、ユーザ１０及びエージェントの感情も加味しつつロボット１００の行動を決定する。すなわち、エージェントシステム５００は、ユーザ１０の感情を理解し、空気を読んで心からのサポート、アシスト、アドバイス及びサービス提供を実現する。また、エージェントシステム５００は、ユーザ１０の悩み相談にものり、ユーザを慰め、励まし、元気づける。また、エージェントシステム５００は、ユーザ１０と遊び、絵日記を描き、昔を思い出させてくれる。エージェントシステム５００は、ユーザ１０の幸福感が増すような行動を行う。ここで、エージェントとは、ソフトウェア上で動作するエージェントである。 The emotion determination unit 232 determines the emotions of the user 10 and the agent itself, as in the first embodiment. The behavior determination unit 236 determines the behavior of the robot 100 while taking into account the emotions of the user 10 and the agent. That is, the agent system 500 understands the emotions of the user 10, reads the mood, and provides heartfelt support, assistance, advice, and services. The agent system 500 also listens to the worries of the user 10, comforts, encourages, and cheers up the user. The agent system 500 also plays with the user 10, draws picture diaries, and helps the user reminisce about the past. The agent system 500 performs actions that increase the user 10's sense of happiness. Here, the agent is an agent that runs on software.

制御部２２８Ｂは、状態認識部２３０と、感情決定部２３２と、行動認識部２３４と、行動決定部２３６と、記憶制御部２３８と、行動制御部２５０と、関連情報収集部２７０と、コマンド取得部２７２と、ＲＰＡ（Robotic Process Automation）２７４と、キャラクタ設定部２７６と、通信処理部２８０と、を有する。 The control unit 228B has a state recognition unit 230, an emotion determination unit 232, a behavior recognition unit 234, a behavior determination unit 236, a memory control unit 238, a behavior control unit 250, a related information collection unit 270, a command acquisition unit 272, an RPA (Robotic Process Automation) 274, a character setting unit 276, and a communication processing unit 280.

行動決定部２３６は、上記第１実施形態と同様に、エージェントの行動として、ユーザ１０と対話するためのエージェントの発話内容を決定する。行動制御部２５０は、エージェントの発話内容を、音声及びテキストの少なくとも一方によって制御対象２５２Ｂとしてのスピーカやディスプレイにより出力する。 As in the first embodiment, the behavior decision unit 236 decides, as the agent's behavior, the content of the agent's utterance for dialogue with the user 10. The behavior control unit 250 outputs the agent's utterance content as at least one of voice and text through a speaker or display serving as the control target 252B.

キャラクタ設定部２７６は、ユーザ１０からの指定に基づいて、エージェントシステム５００がユーザ１０と対話を行う際のエージェントのキャラクタを設定する。すなわち、行動決定部２３６から出力される発話内容は、設定されたキャラクタを有するエージェントを通じて出力される。キャラクタとして、例えば、俳優、芸能人、アイドル、スポーツ選手等の実在の著名人又は有名人を設定することが可能である。また、漫画、映画又はアニメーションに登場する架空のキャラクタを設定することも可能である。エージェントのキャラクタが既知のものである場合には、当該キャラクタの声、言葉遣い、口調及び性格は、既知であるため、ユーザ１０が自分の好みのキャラクタを指定するのみで、キャラクタ設定部２７６におけるプロンプト設定が自動で行われる。設定されたキャラクタの声、言葉遣い、口調及び性格が、ユーザ１０との対話において反映される。すなわち、行動制御部２５０は、キャラクタ設定部２７６によって設定されたキャラクタに応じた音声を合成し、合成した音声によってエージェントの発話内容を出力する。これにより、ユーザ１０は、自分の好みのキャラクタ（例えば好きな俳優）本人と対話しているような感覚を持つことができる。 The character setting unit 276 sets the character of the agent when the agent system 500 converses with the user 10 based on the designation from the user 10. That is, the speech content output from the action determination unit 236 is output through the agent having the set character. For example, it is possible to set real celebrities or famous people such as actors, entertainers, idols, and athletes as characters. It is also possible to set fictional characters that appear in comics, movies, or animations. If the character of the agent is known, the voice, language, tone, and personality of the character are known, so that the user 10 only needs to designate a character of his/her choice, and the prompt setting in the character setting unit 276 is automatically performed. The voice, language, tone, and personality of the set character are reflected in the conversation with the user 10. That is, the action control unit 250 synthesizes a voice according to the character set by the character setting unit 276, and outputs the speech content of the agent using the synthesized voice. This allows the user 10 to have the feeling that he/she is conversing with his/her favorite character (for example, a favorite actor) himself/herself.

エージェントシステム５００が例えばスマートホン等のディスプレイを有するデバイスに搭載される場合、キャラクタ設定部２７６によって設定されたキャラクタを有するエージェントのアイコン、静止画又は動画がディスプレイに表示されてもよい。エージェントの画像は、例えば、３Ｄレンダリング等の画像合成技術を用いて生成される。エージェントシステム５００において、エージェントの画像が、ユーザ１０の感情、エージェントの感情、及びエージェントの発話内容に応じたジェスチャーを行いながらユーザ１０との対話が行われてもよい。なお、エージェントシステム５００は、ユーザ１０との対話に際し、画像は出力せずに音声のみを出力してもよい。 When the agent system 500 is mounted on a device with a display, such as a smartphone, an icon, still image, or video of the agent having a character set by the character setting unit 276 may be displayed on the display. The image of the agent is generated using an image synthesis technique, such as 3D rendering. In the agent system 500, a dialogue with the user 10 may be conducted while the image of the agent makes gestures according to the emotions of the user 10, the emotions of the agent, and the content of the agent's speech. Note that the agent system 500 may output only audio without outputting an image when engaging in a dialogue with the user 10.

感情決定部２３２は、第１実施形態と同様に、ユーザ１０の感情を示す感情値及びエージェント自身の感情値を決定する。本実施形態では、ロボット１００の感情値の代わりに、エージェントの感情値を決定する。エージェント自身の感情値は、設定されたキャラクタの感情に反映される。エージェントシステム５００が、ユーザ１０と対話する際、ユーザ１０の感情のみならず、エージェントの感情が対話に反映される。すなわち、行動制御部２５０は、感情決定部２３２によって決定された感情に応じた態様で発話内容を出力する。 As in the first embodiment, the emotion determination unit 232 determines an emotion value indicating the emotion of the user 10 and an emotion value of the agent itself. In this embodiment, instead of the emotion value of the robot 100, the emotion value of the agent is determined. The emotion value of the agent itself is reflected in the emotion of the set character. When the agent system 500 converses with the user 10, not only the emotion of the user 10 but also the emotion of the agent is reflected in the dialogue. In other words, the behavior control unit 250 outputs the speech content in a manner according to the emotion determined by the emotion determination unit 232.

また、エージェントシステム５００が、ユーザ１０に向けた行動を行う場合においてもエージェントの感情が反映される。例えば、ユーザ１０がエージェントシステム５００に写真撮影を依頼した場合において、エージェントシステム５００がユーザの依頼に応じて写真撮影を行うか否かは、エージェントが抱いている「悲」の感情の度合いに応じて決まる。キャラクタは、ポジティブな感情を抱いている場合には、ユーザ１０に対して好意的な対話又は行動を行い、ネガティブな感情を抱いている場合には、ユーザ１０に対して反抗的な対話又は行動を行う。 The agent's emotions are also reflected when the agent system 500 takes an action toward the user 10. For example, when the user 10 asks the agent system 500 to take a photo, whether the agent system 500 takes the photo in response to the request is determined by the degree of "sadness" the agent is feeling. When the character has positive emotions, it engages in friendly dialogue or actions toward the user 10, and when it has negative emotions, it engages in defiant dialogue or actions toward the user 10.

履歴データ２２２は、ユーザ１０とエージェントシステム５００との間で行われた対話の履歴をイベントデータとして記憶している。格納部２２０は、外部のクラウドストレージによって実現されてもよい。エージェントシステム５００は、ユーザ１０と対話する場合又はユーザ１０に向けた行動を行う場合、履歴データ２２２に格納された対話履歴の内容を加味して対話内容又は行動内容を決定する。例えば、エージェントシステム５００は、履歴データ２２２に格納された対話履歴に基づいてユーザ１０の趣味及び嗜好を把握する。エージェントシステム５００は、ユーザ１０の趣味及び嗜好に合った対話内容を生成したり、リコメンドを提供したりする。行動決定部２３６は、履歴データ２２２に格納された対話履歴に基づいてエージェントの発話内容を決定する。履歴データ２２２には、ユーザ１０との対話を通じて取得したユーザ１０の氏名、住所、電話番号、クレジットカード番号等の個人情報が格納される。ここで、「クレジットカード番号を登録しておきますか？」など、エージェントが自発的にユーザ１０に対して個人情報を登録するか否かを質問する発話をし、ユーザ１０の回答に応じて、個人情報を履歴データ２２２に格納するようにしてもよい。 The history data 222 stores the history of the dialogue between the user 10 and the agent system 500 as event data. The storage unit 220 may be realized by an external cloud storage. When the agent system 500 dialogues with the user 10 or takes an action toward the user 10, the content of the dialogue or the action is determined by taking into account the content of the dialogue history stored in the history data 222. For example, the agent system 500 grasps the hobbies and preferences of the user 10 based on the dialogue history stored in the history data 222. The agent system 500 generates dialogue content that matches the hobbies and preferences of the user 10 or provides recommendations. The action decision unit 236 determines the content of the agent's utterance based on the dialogue history stored in the history data 222. The history data 222 stores personal information of the user 10, such as the name, address, telephone number, and credit card number, acquired through the dialogue with the user 10. Here, the agent may proactively ask the user 10 whether or not to register personal information, such as "Would you like to register your credit card number?", and depending on the user's 10 response, the personal information may be stored in the history data 222.

行動決定部２３６は、上記第１実施形態で説明したように、文章生成モデルを用いて生成された文章に基づいて発話内容を生成する。具体的には、行動決定部２３６は、ユーザ１０により入力されたテキストまたは音声、感情決定部２３２によって決定されたユーザ１０及びキャラクタの双方の感情及び履歴データ２２２に格納された会話の履歴を、文章生成モデルに入力して、エージェントの発話内容を生成する。このとき、行動決定部２３６は、更に、キャラクタ設定部２７６によって設定されたキャラクタの性格を、文章生成モデルに入力して、エージェントの発話内容を生成してもよい。エージェントシステム５００において、文章生成モデルは、ユーザ１０とのタッチポイントとなるフロントエンド側に位置するものではなく、あくまでエージェントシステム５００の道具として利用される。 As described in the first embodiment, the behavior determination unit 236 generates the utterance content based on the sentence generated using the sentence generation model. Specifically, the behavior determination unit 236 inputs the text or voice input by the user 10, the emotions of both the user 10 and the character determined by the emotion determination unit 232, and the conversation history stored in the history data 222 into the sentence generation model to generate the utterance content of the agent. At this time, the behavior determination unit 236 may further input the personality of the character set by the character setting unit 276 into the sentence generation model to generate the utterance content of the agent. In the agent system 500, the sentence generation model is not located on the front end side that is the touch point with the user 10, but is used merely as a tool of the agent system 500.

コマンド取得部２７２は、発話理解部２１２の出力を用いて、ユーザ１０との対話を通じてユーザ１０から発せられる音声又はテキストから、エージェントのコマンドを取得する。コマンドは、例えば、情報検索、店の予約、チケットの手配、商品・サービスの購入、代金の支払い、目的地までのルート案内、リコメンドの提供等のエージェントシステム５００が実行すべき行動の内容を含む。 The command acquisition unit 272 uses the output of the speech understanding unit 212 to acquire commands for the agent from the voice or text uttered by the user 10 through dialogue with the user 10. The commands include the content of actions to be performed by the agent system 500, such as, for example, searching for information, making a reservation at a store, arranging tickets, purchasing a product or service, paying for it, getting route guidance to a destination, and providing recommendations.

ＲＰＡ２７４は、コマンド取得部２７２によって取得されたコマンドに応じた行動を行う。ＲＰＡ２７４は、例えば、情報検索、店の予約、チケットの手配、商品・サービスの購入、代金の支払い等のサービスプロバイダの利用に関する行動を行う。 The RPA 274 performs actions according to the commands acquired by the command acquisition unit 272. The RPA 274 performs actions related to the use of service providers, such as information searches, store reservations, ticket arrangements, product and service purchases, and payment.

ＲＰＡ２７４は、サービスプロバイダの利用に関する行動を実行するために必要なユーザ１０の個人情報を、履歴データ２２２から読み出して利用する。例えば、エージェントシステム５００は、ユーザ１０からの依頼に応じて商品の購入を行う場合、履歴データ２２２に格納されているユーザ１０の氏名、住所、電話番号、クレジットカード番号等の個人情報を読み出して利用する。初期設定においてユーザ１０に個人情報の入力を要求することは不親切であり、ユーザにとっても不快である。本実施形態に係るエージェントシステム５００においては、初期設定においてユーザ１０に個人情報の入力を要求するのではなく、ユーザ１０との対話を通じて取得した個人情報を記憶しておき、必要に応じて読み出して利用する。これにより、ユーザに不快な思いをさせることを回避でき、ユーザの利便性が向上する。 The RPA 274 reads out from the history data 222 the personal information of the user 10 required to execute actions related to the use of the service provider, and uses it. For example, when the agent system 500 purchases a product at the request of the user 10, it reads out and uses the personal information of the user 10, such as the name, address, telephone number, and credit card number, stored in the history data 222. It is unkind and unpleasant for the user to be required to input personal information in the initial settings. In the agent system 500 according to this embodiment, instead of requiring the user 10 to input personal information in the initial settings, the personal information acquired through the dialogue with the user 10 is stored, and is read out and used as necessary. This makes it possible to avoid making the user feel uncomfortable, and improves user convenience.
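As a non-limiting illustration, the handling of personal information described above (stored when it comes up in dialogue, and read out only when an action needs it) could be sketched in Python as follows. The class `HistoryStore`, its method names, and the placeholder values are assumptions made for illustration only; this description does not specify a data layout, and a real system would also need appropriate protection of the stored data.

```python
class HistoryStore:
    """Hypothetical stand-in for the personal-information part of the history data 222."""

    def __init__(self):
        self.personal_info = {}

    def store_if_agreed(self, key, value, user_agrees):
        """Store an item mentioned in dialogue only when the user agreed
        to register it; return whether it was stored."""
        if user_agrees:
            self.personal_info[key] = value
        return user_agrees

    def collect(self, required_keys):
        """Return (available, missing) so that the caller can ask the user
        only for the items that were never provided."""
        available = {k: self.personal_info[k]
                     for k in required_keys if k in self.personal_info}
        missing = [k for k in required_keys if k not in self.personal_info]
        return available, missing


store = HistoryStore()
# Placeholder values, not real data.
store.store_if_agreed("name", "Taro Yamada", user_agrees=True)
store.store_if_agreed("credit_card_number", "0000-0000-0000-0000", user_agrees=False)
available, missing = store.collect(["name", "address", "credit_card_number"])
```

In this sketch, an action such as a purchase would proceed with the items in `available` and would raise only the items in `missing` in the next dialogue turn, rather than requesting everything during initial setup.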

エージェントシステム５００は、例えば、以下のステップ１～ステップ６により、対話処理を実行する。 The agent system 500 executes the dialogue processing, for example, through the following steps 1 to 6.

（ステップ１）エージェントシステム５００は、エージェントのキャラクタを設定する。具体的には、キャラクタ設定部２７６は、ユーザ１０からの指定に基づいて、エージェントシステム５００がユーザ１０と対話を行う際のエージェントのキャラクタを設定する。 (Step 1) The agent system 500 sets the character of the agent. Specifically, the character setting unit 276 sets the character of the agent when the agent system 500 interacts with the user 10, based on the designation from the user 10.

（ステップ２）エージェントシステム５００は、ユーザ１０から入力された音声又はテキストを含むユーザ１０の状態、ユーザ１０の感情値、エージェントの感情値、履歴データ２２２を取得する。具体的には、上記ステップＳ１００～Ｓ１０３と同様の処理を行い、ユーザ１０から入力された音声又はテキストを含むユーザ１０の状態、ユーザ１０の感情値、エージェントの感情値、及び履歴データ２２２を取得する。 (Step 2) The agent system 500 acquires the state of the user 10, including the voice or text input from the user 10, the emotion value of the user 10, the emotion value of the agent, and the history data 222. Specifically, the same processing as in steps S100 to S103 above is performed to acquire the state of the user 10, including the voice or text input from the user 10, the emotion value of the user 10, the emotion value of the agent, and the history data 222.

（ステップ３）エージェントシステム５００は、エージェントの発話内容を決定する。 (Step 3) The agent system 500 determines the content of the agent's utterance.
具体的には、行動決定部２３６は、ユーザ１０により入力されたテキストまたは音声、感情決定部２３２によって特定されたユーザ１０及びキャラクタの双方の感情及び履歴データ２２２に格納された会話の履歴を、文章生成モデルに入力して、エージェントの発話内容を生成する。 Specifically, the behavior determination unit 236 inputs the text or voice input by the user 10, the emotions of both the user 10 and the character identified by the emotion determination unit 232, and the conversation history stored in the history data 222 into a sentence generation model, and generates the agent's speech content.

例えば、ユーザ１０により入力されたテキストまたは音声、感情決定部２３２によって特定されたユーザ１０及びキャラクタの双方の感情及び履歴データ２２２に格納された会話の履歴を表すテキストに、「このとき、エージェントとして、どのように返事をしますか？」という固定文を追加して、文章生成モデルに入力し、エージェントの発話内容を取得する。 For example, a fixed sentence such as "How would you respond as an agent in this situation?" is added to the text or voice input by the user 10, the emotions of both the user 10 and the character identified by the emotion determination unit 232, and the text representing the conversation history stored in the history data 222, and this is input into the sentence generation model to obtain the content of the agent's speech.
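As a non-limiting illustration, the text input to the sentence generation model described above could be assembled as in the following Python sketch. The function name `build_prompt`, the field labels, and the line-by-line layout are assumptions made for illustration; only the fixed sentence is taken from this description.

```python
# Illustrative sketch only; names and layout are hypothetical.
FIXED_SENTENCE = "How would you respond as an agent in this situation?"


def build_prompt(user_input, user_emotion, agent_emotion, history):
    """Join the user input, both emotions, and the conversation history
    into one text and append the fixed sentence at the end."""
    lines = [
        f"User input: {user_input}",
        f"User emotion: {user_emotion}",
        f"Agent emotion: {agent_emotion}",
        "Conversation history:",
    ]
    lines.extend(f"- {turn}" for turn in history)
    lines.append(FIXED_SENTENCE)
    return "\n".join(lines)


prompt = build_prompt(
    "Please make a reservation at a nice Chinese restaurant nearby for tonight at 7pm",
    "joy",
    "joy",
    ["User: I like spicy food.", "Agent: Noted."],
)
```

The resulting `prompt` would then be passed to whichever sentence generation model is used, and the model's reply would be taken as the agent's utterance content.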

一例として、ユーザ１０により入力されたテキスト又は音声が「今夜７時に、近くの美味しいチャイニーズレストランを予約してほしい」である場合、エージェントの発話内容として、「かしこまりました。」、「こちらがおすすめのレストランです。１．AAAA。２．BBBB。３．CCCC。４．DDDD」が取得される。 As an example, if the text or voice input by the user 10 is "Please make a reservation at a nice Chinese restaurant nearby for tonight at 7pm," the utterances "Understood." and "Here are some recommended restaurants: 1. AAAA. 2. BBBB. 3. CCCC. 4. DDDD." are obtained as the agent's utterance content.

また、ユーザ１０により入力されたテキスト又は音声が「４番目のDDDDがいい」である場合、エージェントの発話内容として、「かしこまりました。予約してみます。何名の席ですか。」が取得される。 In addition, if the text or voice input by the user 10 is "Number 4, DDDD, would be good," the utterance "Understood. I'll try to make a reservation. How many people is the table for?" is obtained as the agent's utterance content.

（ステップ４）エージェントシステム５００は、エージェントの発話内容を出力する。 (Step 4) The agent system 500 outputs the agent's utterance content.
具体的には、行動制御部２５０は、キャラクタ設定部２７６によって設定されたキャラクタに応じた音声を合成し、合成した音声によってエージェントの発話内容を出力する。 Specifically, the behavior control unit 250 synthesizes a voice according to the character set by the character setting unit 276, and outputs the agent's speech in the synthesized voice.

（ステップ５）エージェントシステム５００は、エージェントのコマンドを実行するタイミングであるか否かを判定する。 (Step 5) The agent system 500 determines whether it is time to execute the agent's command.
具体的には、行動決定部２３６は、文章生成モデルの出力に基づいて、エージェントのコマンドを実行するタイミングであるか否かを判定する。例えば、文章生成モデルの出力に、エージェントがコマンドを実行する旨が含まれている場合には、エージェントのコマンドを実行するタイミングであると判定し、ステップ６へ移行する。一方、エージェントのコマンドを実行するタイミングでないと判定された場合には、上記ステップ２へ戻る。 Specifically, the behavior decision unit 236 judges whether or not it is time to execute the agent's command based on the output of the sentence generation model. For example, if the output of the sentence generation model includes information indicating that the agent should execute a command, it is judged that it is time to execute the agent's command, and the process proceeds to step 6. On the other hand, if it is judged that it is not time to execute the agent's command, the process returns to step 2.

（ステップ６）エージェントシステム５００は、エージェントのコマンドを実行する。 (Step 6) The agent system 500 executes the agent's command.
具体的には、コマンド取得部２７２は、ユーザ１０との対話を通じてユーザ１０から発せられる音声又はテキストから、エージェントのコマンドを取得する。そして、ＲＰＡ２７４は、コマンド取得部２７２によって取得されたコマンドに応じた行動を行う。例えば、コマンドが「情報検索」である場合、ユーザ１０との対話を通じて得られた検索クエリ、及びＡＰＩ（Application Programming Interface）を用いて、検索サイトにより、情報検索を行う。行動決定部２３６は、検索結果を、文章生成モデルに入力して、エージェントの発話内容を生成する。行動制御部２５０は、キャラクタ設定部２７６によって設定されたキャラクタに応じた音声を合成し、合成した音声によってエージェントの発話内容を出力する。 Specifically, the command acquisition unit 272 acquires a command for the agent from a voice or text issued by the user 10 through a dialogue with the user 10. Then, the RPA 274 performs an action according to the command acquired by the command acquisition unit 272. For example, if the command is "information search", an information search is performed on a search site using a search query obtained through a dialogue with the user 10 and an API (Application Programming Interface). The behavior decision unit 236 inputs the search results into a sentence generation model to generate the agent's utterance content. The behavior control unit 250 synthesizes a voice according to the character set by the character setting unit 276, and outputs the agent's utterance content using the synthesized voice.

また、コマンドが「店の予約」である場合、ユーザ１０との対話を通じて得られた予約情報、予約先の店情報、及びＡＰＩを用いて、電話ソフトウエアにより、予約先の店へ電話をかけて、予約を行う。このとき、行動決定部２３６は、対話機能を有する文章生成モデルを用いて、相手から入力された音声に対するエージェントの発話内容を取得する。そして、行動決定部２３６は、店の予約の結果（予約の成否）を、文章生成モデルに入力して、エージェントの発話内容を生成する。行動制御部２５０は、キャラクタ設定部２７６によって設定されたキャラクタに応じた音声を合成し、合成した音声によってエージェントの発話内容を出力する。 Also, if the command is "store reservation," the reservation information obtained through dialogue with the user 10, the information on the store to be reserved, and an API are used to call the store with telephone software and make the reservation. At this time, the behavior decision unit 236 uses a sentence generation model with a dialogue function to obtain the agent's utterance in response to the voice input from the other party. The behavior decision unit 236 then inputs the result of the reservation (whether the reservation succeeded or failed) into the sentence generation model to generate the agent's utterance content. The behavior control unit 250 synthesizes a voice according to the character set by the character setting unit 276, and outputs the agent's utterance content using the synthesized voice.

そして、上記ステップ２へ戻る。 Then go back to step 2 above.

ステップ６において、エージェントにより実行された行動（例えば、店の予約）の結果についても履歴データ２２２に格納される。履歴データ２２２に格納されたエージェントにより実行された行動の結果は、エージェントシステム５００によりユーザ１０の趣味、又は嗜好を把握することに活用される。例えば、同じ店を複数回予約している場合には、その店をユーザ１０が好んでいると認識したり、予約した時間帯、又はコースの内容もしくは料金等の予約内容を次回の予約の際にお店選びの基準としたりする。 In step 6, the results of the actions taken by the agent (e.g., making a reservation at a restaurant) are also stored in the history data 222. The results of the actions taken by the agent stored in the history data 222 are used by the agent system 500 to understand the hobbies or preferences of the user 10. For example, if the same restaurant has been reserved multiple times, the agent system 500 may recognize that the user 10 likes that restaurant, and may use the reservation details, such as the reserved time period, or the course content or price, as a criterion for choosing a restaurant the next time the reservation is made.
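As a non-limiting illustration, the use of stored action results to infer preferences could be sketched in Python as follows. The record fields (`restaurant`, `time`), the function names, and the threshold of two reservations are assumptions made for illustration and are not specified in this description.

```python
from collections import Counter


def preferred_restaurants(reservations, min_count=2):
    """Return restaurants reserved at least min_count times, most frequent first."""
    counts = Counter(r["restaurant"] for r in reservations)
    return [name for name, n in counts.most_common() if n >= min_count]


def usual_time_slot(reservations):
    """Return the most frequently reserved time slot, or None if there is no history."""
    counts = Counter(r["time"] for r in reservations)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical action results as they might be kept in the history data 222.
reservation_history = [
    {"restaurant": "DDDD", "time": "19:00"},
    {"restaurant": "AAAA", "time": "12:00"},
    {"restaurant": "DDDD", "time": "19:00"},
]
```

Here a restaurant reserved repeatedly is treated as one the user likes, and the most common time slot is one possible criterion for the next reservation; other reservation details such as course content or price could be counted in the same way.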

このように、エージェントシステム５００は、対話処理を実行し、必要に応じて、サービスプロバイダの利用に関する行動を行うことができる。 In this way, the agent system 500 can execute interactive processing and, if necessary, take action related to the use of the service provider.
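As a non-limiting illustration, the flow of Steps 1 to 6 could be sketched in Python as follows. This is a simplified sketch under stated assumptions: `ModelOutput`, `run_dialogue`, `fake_generate`, and the command string format are hypothetical names introduced here, the sentence generation model is replaced by a stub, emotion values are omitted, and the loop runs over a finite list of inputs instead of waiting for the user.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelOutput:
    utterance: str
    command: Optional[str] = None  # None: not yet time to execute a command


def run_dialogue(character, user_inputs, generate, execute):
    """Run Steps 1 to 6 over a finite list of user inputs.

    generate(character, text, history) -> ModelOutput   (Step 3)
    execute(command) -> str                             (Step 6)
    """
    history = []   # stands in for the history data 222
    spoken = []    # utterances that would be synthesized in Step 4
    # Step 1: the character is fixed by the caller.
    for text in user_inputs:                      # Step 2: acquire input
        out = generate(character, text, history)  # Step 3: decide utterance
        spoken.append(out.utterance)              # Step 4: output utterance
        history.append(("dialogue", text, out.utterance))
        if out.command is not None:               # Step 5: time to execute?
            result = execute(out.command)         # Step 6: execute command
            history.append(("action", out.command, result))
        # In either case, control returns to Step 2.
    return spoken, history


def fake_generate(character, text, history):
    # Hypothetical stub used in place of a real sentence generation model.
    if "DDDD" in text:
        return ModelOutput("Understood. I'll try to make a reservation.", "reserve:DDDD")
    return ModelOutput("Here are some recommended restaurants.")


spoken, history = run_dialogue(
    "butler",
    ["Book a Chinese restaurant for 7pm", "Number 4, DDDD, would be good"],
    fake_generate,
    lambda command: "reserved",
)
```

In this sketch the result of the executed command is appended to the same history as the dialogue, which mirrors the storing of action results described for Step 6.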

図１１及び図１２は、エージェントシステム５００の動作の一例を示す図である。図１１には、エージェントシステム５００が、ユーザ１０との対話を通じてレストランの予約を行う態様が例示されている。図１１では、左側に、エージェントの発話内容を示し、右側に、ユーザ１０の発話内容を示している。エージェントシステム５００は、ユーザ１０との対話履歴に基づいてユーザ１０の好みを把握し、ユーザ１０の好みに合ったレストランのリコメンドリストを提供し、選択されたレストランの予約を実行することができる。 Figures 11 and 12 are diagrams showing an example of the operation of the agent system 500. Figure 11 illustrates an example of the agent system 500 making a restaurant reservation through dialogue with the user 10. In Figure 11, the left side shows the agent's speech, and the right side shows the user's speech. The agent system 500 is able to grasp the preferences of the user 10 based on the dialogue history with the user 10, provide a recommended list of restaurants that match the preferences of the user 10, and make a reservation at the selected restaurant.

一方、図１２には、エージェントシステム５００が、ユーザ１０との対話を通じて通信販売サイトにアクセスして商品の購入を行う態様が例示されている。図１２では、左側に、エージェントの発話内容を示し、右側に、ユーザ１０の発話内容を示している。エージェントシステム５００は、ユーザ１０との対話履歴に基づいて、ユーザがストックしている飲料の残量を推測し、ユーザ１０に当該飲料の購入を提案し、実行することができる。また、エージェントシステム５００は、ユーザ１０との過去の対話履歴に基づいて、ユーザの好みを把握し、ユーザが好むスナックをリコメンドすることができる。このように、エージェントシステム５００は、執事のようなエージェントとしてユーザ１０とコミュニケーションを取りながら、レストラン予約、又は、商品の購入決済など様々な行動まで実行することで、ユーザ１０の日々の生活を支えてくれる。 On the other hand, FIG. 12 illustrates an example in which the agent system 500 accesses a mail-order site through a dialogue with the user 10 to purchase a product. In FIG. 12, the left side shows the agent's speech, and the right side shows the user's speech. The agent system 500 can estimate the remaining amount of a drink stocked by the user 10 based on the dialogue history with the user 10, and can suggest and execute the purchase of the drink to the user 10. The agent system 500 can also understand the user's preferences based on the past dialogue history with the user 10, and recommend snacks that the user likes. In this way, the agent system 500 supports the user 10's daily life by communicating with the user 10 as a butler-like agent and performing various actions such as making restaurant reservations or purchasing and paying for products.

なお、第３実施形態のエージェントシステム５００の他の構成及び作用は、第１実施形態のロボット１００と同様であるため、説明を省略する。 Note that other configurations and operations of the agent system 500 of the third embodiment are similar to those of the robot 100 of the first embodiment, so a description thereof will be omitted.

エージェントシステム５００が通信端末１００Ｍ（図１６参照）に搭載されてもよい。この場合、エージェントシステム５００の行動決定部２３６は、行動しないことを含む複数種類のユーザと対話するためのエージェントの行動の何れかを、通信端末１００Ｍ（図１６参照）である電子機器の行動として決定してよい。 The agent system 500 may be installed in the communication terminal 100M (see FIG. 16). In this case, the behavior decision unit 236 of the agent system 500 may decide one of multiple types of agent behaviors for interacting with a user, including no behavior, as the behavior of the electronic device that is the communication terminal 100M (see FIG. 16).

エージェントシステム５００の行動決定モデル２２１が、対話機能を有する文章生成モデルの場合、行動決定部２３６は、ユーザ状態、エージェントの状態、ユーザの感情、及びエージェントの感情の少なくとも一つを表すテキストと、エージェント行動を質問するテキストとを文章生成モデルに入力し、文章生成モデルの出力に基づいて、通信端末１００Ｍの行動を決定してよい。 When the behavior decision model 221 of the agent system 500 is a text generation model with a dialogue function, the behavior decision unit 236 may input text representing at least one of the user state, the agent state, the user's emotion, and the agent's emotion, and text asking about the agent's behavior, to the text generation model, and determine the behavior of the communication terminal 100M based on the output of the text generation model.
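As a non-limiting illustration, forming the model input from state and emotion text and mapping the model output back to an action could be sketched in Python as follows. The action list `ACTIONS`, the function names, and the substring-matching rule are assumptions made for illustration; this description does not specify how the model output is interpreted.

```python
# Hypothetical action list; "do nothing" reflects that not acting is a valid choice.
ACTIONS = ["do nothing", "speak to the user", "play guidance audio"]


def build_behavior_query(states):
    """Turn the available state and emotion items into text plus a question."""
    described = [f"{name}: {value}" for name, value in states.items()]
    question = ("Which one of the following should the agent do next? "
                + "; ".join(ACTIONS))
    return "\n".join(described + [question])


def parse_action(model_output):
    """Pick the first listed action named in the model output,
    falling back to 'do nothing' when none is recognized."""
    lowered = model_output.lower()
    for action in ACTIONS:
        if action in lowered:
            return action
    return "do nothing"


query = build_behavior_query({"user state": "walking", "user emotion": "joy"})
```

Falling back to "do nothing" when the output is not recognized is one conservative design choice; an implementation could instead query the model again.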

次に、エージェントが自律的に行動する自律的処理を行う際の、行動決定部２３６の処理について説明する。 Next, we will explain the processing of the behavior decision unit 236 when the agent performs autonomous processing to act autonomously.

本実施形態における自律的処理では、エージェントは、ユーザを監視することで、自発的に又は定期的に、ユーザの状態又は行動を検知してよい。自発的は、エージェントが外部から契機なしに、ユーザの状態又は行動を自ら進んで取得することと解釈してよい。外部から契機は、ユーザから通信端末１００Ｍを介してのエージェントへの質問、ユーザから通信端末１００Ｍへの能動的な行動などを含み得る。 In the autonomous processing of this embodiment, the agent may detect the user's state or behavior spontaneously or periodically by monitoring the user. Spontaneous may be interpreted as the agent acquiring the user's state or behavior of its own accord without an external trigger. External triggers may include a question from the user to the agent via the communication terminal 100M, active behavior from the user toward the communication terminal 100M, etc.

また、自律的処理では、エージェントは、検知したユーザの状態又は行動について、生成系ＡＩに質問し、質問に対する生成系ＡＩの回答と、検知したユーザの行動とを、対応付けて記憶してよい。このとき、エージェントは、当該行動を是正する行動内容を、当該回答に対応付けて記憶してよい。 In addition, in autonomous processing, the agent may ask the generative AI questions about the detected state or behavior of the user, and store the generative AI's answer to the question in association with the detected user behavior. At this time, the agent may store the action content for correcting the behavior in association with the answer.

具体的には、ユーザが長時間歩行する傾向がある場合に、エージェントが生成系ＡＩに「このような行動をとるユーザにどのような案内を行うべきか？」という質問を行い、この質問に対する生成系ＡＩの回答が「以前通ったルートと同じルートを提案する」である場合、エージェントは、当該回答と、長時間歩行し得るユーザの行動と、行動を是正する行動内容（例えば「以前にも通ったルートを案内します」というガイダンス内容）とを、テーブル情報に対応付けて記録してよい。長時間歩行しているユーザの行動を検知した場合、エージェントシステム５００の行動決定部２３６は、当該テーブル情報を利用することで、通信端末１００Ｍの行動として、「以前にも通ったルートを案内します」という音声を再生してよい。 Specifically, when a user has a tendency to walk for long periods of time, the agent asks the generative AI, "What kind of guidance should be given to a user who behaves like this?", and if the generative AI's answer to this question is "Propose the same route as one the user has taken before," the agent may record this answer, the behavior of the user who may walk for long periods of time, and the action content for correcting the behavior (for example, guidance content such as "We will guide you along a route you have taken before") in association with one another in table information. When the behavior of a user who has been walking for a long time is detected, the action decision unit 236 of the agent system 500 may use the table information to play a voice message saying, "We will guide you along a route you have taken before," as the action of the communication terminal 100M.

他の例として、ユーザが長時間ランニングする傾向がある場合に、エージェントが生成系ＡＩに「このような行動をとるユーザにどのような案内を行うべきか？」という質問を行い、この質問に対する生成系ＡＩの回答が「水分補給を促す案内を定期的に行うことが望ましい」である場合、エージェントは、当該回答と、長時間ランニングし得るユーザの行動と、行動を是正する行動内容（例えば「水分補給をこまめに行ってください」というガイダンス内容）とを、テーブル情報に対応付けて記録してよい。長時間ランニングしているユーザの行動を検知した場合、エージェントシステム５００の行動決定部２３６は、当該テーブル情報を利用することで、通信端末１００Ｍの行動として、「水分補給をこまめに行ってください」という音声を再生してよい。 As another example, if a user has a tendency to run for long periods of time, the agent may ask the generative AI, "What kind of guidance should be given to a user who behaves in this way?" and if the generative AI answers this question by saying, "It is desirable to provide regular guidance encouraging hydration," the agent may record the answer, the behavior of the user who may be running for long periods of time, and the action content for correcting the behavior (for example, the guidance content "Please hydrate frequently") in association with table information. When the behavior of a user who is running for long periods of time is detected, the action decision unit 236 of the agent system 500 may use the table information to play a voice message saying, "Please hydrate frequently," as the action of the communication terminal 100M.

他の例として、ユーザが車道を歩く傾向がある場合に、エージェントが生成系ＡＩに「このような行動をとるユーザにどのような案内を行うべきか？」という質問を行い、この質問に対する生成系ＡＩの回答が「車両と接触する可能性があるので歩道に戻ることを提案することを推奨する」である場合、エージェントは、当該回答と、車道を歩いているユーザの行動と、行動を是正する行動内容（例えば「歩道に戻ってください」というガイダンス内容）とを、テーブル情報に対応付けて記録してよい。車道を歩いているユーザの行動を検知した場合、エージェントシステム５００の行動決定部２３６は、当該テーブル情報を利用することで、通信端末１００Ｍの行動として、「歩道に戻ってください」という音声を再生してよい。 As another example, if a user has a tendency to walk on the roadway, the agent asks the generative AI, "What kind of guidance should be given to a user behaving in this way?", and if the generative AI answers this question with, "There is a risk of contact with a vehicle, so we recommend suggesting that the user return to the sidewalk," the agent may record this answer, the behavior of the user walking on the roadway, and the action content for correcting the behavior (for example, the guidance content "Please return to the sidewalk") in association with the table information. When the behavior of a user walking on the roadway is detected, the action decision unit 236 of the agent system 500 may use the table information to play a voice message saying "Please return to the sidewalk" as the action of the communication terminal 100M.

また自律的処理では、検出したユーザの行動と、記憶した特定情報とに基づき、ユーザの状態又は行動に対して、注意を促すエージェントの行動予定を設定してよい。 In addition, in autonomous processing, an agent may set an action schedule to alert the user to the user's state or behavior based on the detected user behavior and the stored specific information.

前述したように、エージェントは、ユーザの状態又は行動に対応する生成系ＡＩの回答と、検知したユーザの状態又は行動とを対応付けたテーブル情報を記憶媒体に記録し得る。以下に、テーブルに記憶する内容の例について説明する。 As mentioned above, the agent can record table information in a storage medium that associates the generative AI's response corresponding to the user's state or behavior with the detected user's state or behavior. Below, an example of the contents stored in the table is described.

（１．ユーザが長時間歩行する傾向がある場合） (1. When the user tends to walk for long periods of time)
当該傾向がある場合、エージェントは、エージェント自ら生成系ＡＩに、「このような行動をとるユーザにどのような案内を行うべきか？」という質問を行う。この質問に対する生成系ＡＩの回答が、例えば「以前通ったルートを同じルートを提案する」、「通ったことがないルートを提案する」、「自転車などの車両が比較的少ないルートを提案する」、「起伏のあるルートを提案する」などである場合、エージェントは、ユーザの行動と、生成系ＡＩの回答を対応付けて記憶してよい。 If such a tendency exists, the agent itself asks the generative AI, "What kind of guidance should be provided to a user behaving in this way?" If the generative AI answers this question with, for example, "Suggest the same route as the one taken before," "Suggest a route that has never been taken before," "Suggest a route with relatively few bicycles or other vehicles," "Suggest a route with hills," or the like, the agent may store the user's behavior in association with the generative AI's answer.

（２．ユーザが長時間ランニングする傾向がある場合） (2. When the user tends to run for long periods of time)
当該傾向がある場合、エージェントは、エージェント自ら生成系ＡＩに、「このような行動をとるユーザにどのような案内を行うべきか？」という質問を行う。この質問に対する生成系ＡＩの回答が、例えば「水分補給を促す案内を定期的に行う」、「自転車が少なく空気がきれいなルートを提案する」、「信号機が少ないルートを提案する」などである場合、エージェントは、ユーザの行動と、生成系ＡＩの回答を対応付けて記憶してよい。 If such a tendency exists, the agent itself asks the generative AI, "What kind of guidance should be provided to a user behaving in this way?" If the generative AI's answer to this question is, for example, "Provide periodic guidance encouraging hydration," "Suggest a route with fewer bicycles and cleaner air," or "Suggest a route with fewer traffic lights," the agent may store the user's behavior in association with the generative AI's answer.

（３．ユーザが車道を歩く傾向がある場合） (3. When the user tends to walk on the roadway)
当該傾向がある場合、エージェントは、エージェント自ら生成系ＡＩに、「このような行動をとるユーザにどのような事態が及び得るか？」という質問を行う。この質問に対する生成系ＡＩの回答が、例えば「車両と接触する可能性がある」、「自転車と接触する可能性がある」、「高速ＩＣに侵入し得る」などである場合、エージェントは、車道を歩くというユーザの行動と、生成系ＡＩの回答を対応付けて記憶してよい。またエージェントは、当該行動を是正する行動内容を、当該回答に対応付けて記憶してよい。 If such a tendency exists, the agent itself asks the generative AI, "What kind of situation may befall a user who behaves in this way?" If the generative AI answers this question with, for example, "There is a possibility of contact with a vehicle," "There is a possibility of contact with a bicycle," "There is a possibility of entering a highway interchange," etc., the agent may store the user's behavior of walking on the roadway in association with the generative AI's answer. The agent may also store the content of an action to correct the behavior in association with the answer.

ユーザの危険な行動を是正する音声は、ユーザを特定の場所に誘導する音声を含み得る。特定の場所は、ユーザを現在位置する場所以外の場所、例えば、通信端末１００Ｍの近傍、歩道などを含めてよい。具体的には、当該音声は、「車道を歩くのはやめてください」、「そこは歩道ではありません」、「車道は危ない」、「すぐ歩道に移動して下さい」などの音声を含めてよい。 The audio that corrects the user's risky behavior may include audio that guides the user to a specific location. The specific location may include a location other than the user's current location, such as near the communication terminal 100M or a sidewalk. Specifically, the audio may include audio such as "Please do not walk on the road," "That is not a sidewalk," "The road is dangerous," and "Please move to the sidewalk immediately."

また、自律的処理では、当該テーブルを記録した後、ユーザの行動を自律的又は定期的に検出し、検出したユーザの行動と記憶したテーブルの内容とに基づき、ユーザに注意を促すエージェントの行動予定を設定してよい。具体的には、エージェントシステム５００の行動決定部２３６が、検出したユーザの行動と記憶したテーブルの内容とに基づき、第１行動内容を設定してよい。 In addition, in the autonomous processing, after recording the table, the user's behavior may be detected autonomously or periodically, and an agent action schedule for alerting the user may be set based on the detected user's behavior and the contents of the stored table. Specifically, the action decision unit 236 of the agent system 500 may set the first action content based on the detected user's behavior and the contents of the stored table.
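
The table lookup described above can be illustrated with a minimal Python sketch. The names `GuidanceEntry`, `GUIDANCE_TABLE`, `first_action_for`, and the behavior labels are hypothetical and do not appear in the disclosure; the guidance strings are taken from the examples given earlier. It is only one possible way to associate a detected behavior with the generative AI's answer and the corrective guidance.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GuidanceEntry:
    ai_answer: str  # answer returned by the generative AI
    guidance: str   # guidance audio the terminal may play back


# Hypothetical table keyed by a label for the detected user behavior.
GUIDANCE_TABLE: Dict[str, GuidanceEntry] = {
    "long_walk": GuidanceEntry(
        "Propose the same route as the user has taken before",
        "We will guide you along a route you have taken before",
    ),
    "long_run": GuidanceEntry(
        "It is desirable to provide regular guidance encouraging hydration",
        "Please hydrate frequently",
    ),
    "walking_on_roadway": GuidanceEntry(
        "There is a risk of contact with a vehicle",
        "Please return to the sidewalk",
    ),
}


def first_action_for(detected_behavior: str) -> Optional[str]:
    """Return the guidance to play, or None when the table has no entry."""
    entry = GUIDANCE_TABLE.get(detected_behavior)
    return entry.guidance if entry is not None else None
```

Returning `None` for an unknown behavior corresponds to the case where no first action content is set.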

具体的には、エージェントシステム５００の行動決定部２３６は、自発的に又は定期的に、通信端末１００Ｍが有するセンサに基づきユーザの行動を検知し、検知したユーザの行動と予め記憶した特定情報とに基づき、電子機器の行動として、ユーザを先導することを決定した場合には、第１行動内容を決定してよい。 Specifically, the behavior decision unit 236 of the agent system 500 may detect the user's behavior, either autonomously or periodically, based on a sensor included in the communication terminal 100M, and when it decides to lead the user as the behavior of the electronic device based on the detected user's behavior and specific information stored in advance, it may determine the first behavior content.

（第１行動内容） (First action content)
第１行動内容は、移動部６０１が有する車輪６０１ａを制御することでユーザを先導するエージェントの行動を含み得る。車輪６０１ａを制御することには、車輪６０１ａの回転方向、回転速度及び操舵方向の少なくとも１つを変更することが含まれ得る。以下に、第１行動内容の例について説明する。 The first action content may include an action of the agent leading the user by controlling the wheels 601a of the moving unit 601. Controlling the wheels 601a may include changing at least one of the rotation direction, rotation speed, and steering direction of the wheels 601a. An example of the first action content will be described below.

エージェントシステム５００の行動決定部２３６は、歩行するユーザが特定ルートを移動するようにルート案内する第１行動内容として、ユーザの歩行を補助する音声及び画像の少なくとも１つの再生を実行し得る。具体的には、行動決定部２３６は、図１７に示すように、ユーザが歩行し得るルートの画像を通信端末１００Ｍの画面に再生してよく、当該ルートに関連する音声を再生してよい。当該音声は、「１０ｍ先を右方向です」などを含み得る。 The behavior decision unit 236 of the agent system 500 may execute playback of at least one of audio and images that assist the user in walking as a first behavior content that provides route guidance to the walking user so that the user moves along a specific route. Specifically, the behavior decision unit 236 may play an image of a route that the user may walk on the screen of the communication terminal 100M, as shown in FIG. 17, and may play audio related to the route. The audio may include, for example, "Turn right 10 meters ahead."

エージェントシステム５００の行動決定部２３６は、設定された案内ルートと別のルートを案内する音声及び画像の少なくとも１つの再生を実行してよい。具体的には、行動決定部２３６は、図１８に示すように、設定された案内ルートとは別のルートの画像を通信端末１００Ｍの画面に再生してよく、当該ルートに関連する音声を再生してよい。当該音声は、「この先通行止めです。別ルートを案内しますか？」などを含み得る。 The behavior decision unit 236 of the agent system 500 may execute playback of at least one of audio and images guiding a route other than the set guidance route. Specifically, as shown in FIG. 18, the behavior decision unit 236 may play an image of a route other than the set guidance route on the screen of the communication terminal 100M, and may play audio related to that route. The audio may include, for example, "Road closed ahead. Would you like to be guided to an alternative route?"

エージェントシステム５００の行動決定部２３６は、設定された案内ルートを先導されるユーザが、案内ルートから外れた場合に、ユーザを案内ルートに戻す経路を案内する音声及び画像の少なくとも１つの再生を実行してよい。具体的には、行動決定部２３６は、図１９に示すように、設定された案内ルートに戻るルートの画像を通信端末１００Ｍの画面に再生してよく、当該ルートに関連する音声を再生してよい。当該音声は、「ルートを外れました。元ルートに戻る経路を案内します。」などを含み得る。エージェントシステム５００の行動決定部２３６は、前述した音声、例えば「以前にも通ったルートを案内します」、「水分補給をこまめに行ってください」、「歩道に戻ってください」などの音声及び画像の少なくとも１つの再生を実行してよい。 When a user being led along a set guidance route deviates from the guidance route, the behavior decision unit 236 of the agent system 500 may execute playback of at least one of audio and image guiding the user on a route back to the guidance route. Specifically, as shown in FIG. 19, the behavior decision unit 236 may execute playback of an image of a route back to the set guidance route on the screen of the communication terminal 100M, and may execute playback of audio related to the route. The audio may include, for example, "You have deviated from the route. We will guide you back to the original route." The behavior decision unit 236 of the agent system 500 may execute playback of at least one of the audio and image described above, such as, for example, "We will guide you along a route you have taken before," "Please drink water frequently," and "Please return to the sidewalk."

（第２行動内容） (Second action content)
エージェントシステム５００の行動決定部２３６は、移動部６０１がユーザの先導中に、又は、ユーザの歩行を補助する音声及び画像の少なくとも１つを再生した後に、ユーザの行動を検出することでユーザの行動が是正されたか否かを判定し、ユーザの行動が是正された場合、第１行動内容と異なる第２行動内容を決定してよい。 The behavior decision unit 236 of the agent system 500 may detect the user's behavior while the moving unit 601 is leading the user, or after playing at least one of audio and images that assist the user's walking, and thereby determine whether the user's behavior has been corrected. If the user's behavior has been corrected, the behavior decision unit 236 may decide on a second action content different from the first action content.

ユーザの行動が是正された場合とは、第１行動内容によるエージェントの動作が実行された結果、ユーザが特定行動及び特定行為を辞めた場合、又は、特定状況が解消された場合と解釈してよい。具体的には、ユーザの行動が是正された場合は、提案した別ルートに従ってユーザが歩行を開始した場合、ルートを外れたことで元ルートに戻る経路を案内したときに、その経路に従ってユーザが歩行を開始した場合などを含み得る。 A case where the user's behavior is corrected may be interpreted as a case where the user stops a specific behavior and a specific action as a result of the agent's operation based on the first action content being executed, or a case where a specific situation is resolved. Specifically, a case where the user's behavior is corrected may include a case where the user starts walking according to a suggested alternative route, or a case where the user starts walking according to a route that leads back to the original route after straying from the route.

具体的には、ユーザの行動を褒める音声は、図２０に示すように「よく戻れましたね！素晴らしい」などの音声を含めてよい。ユーザの行動に対して感謝する音声は、図２１に示すように「戻ってくれて有り難うございます」、「別ルートを選択してくれて有り難うございます」という音声を含めてよい。ユーザの行動を褒める画像は、グッドポーズをとるキャラクタの画像を含めてよい。ユーザの行動に対して感謝する画像は、お礼をするキャラクタの画像を含めてよい。 Specifically, audio praising the user's actions may include audio such as "You made it back! That's great," as shown in FIG. 20. Audio thanking the user for their actions may include audio such as "Thank you for coming back," or "Thank you for choosing a different route," as shown in FIG. 21. An image praising the user's actions may include an image of a character striking a good pose. An image thanking the user for their actions may include an image of a character giving thanks.

（第３行動内容） (Third action content)
行動決定部２３６は、移動部６０１がユーザの先導中に、又は、ユーザの歩行を補助する音声及び画像の少なくとも１つを再生した後に、ユーザの行動を検出することでユーザの行動が是正されたか否かを判定し、ユーザの行動が是正されていない場合、第１行動内容と異なる第３行動内容を決定してよい。 The behavior decision unit 236 may detect the user's behavior while the moving unit 601 is leading the user, or after playing at least one of audio and images that assist the user's walking, and thereby determine whether the user's behavior has been corrected. If the user's behavior has not been corrected, the behavior decision unit 236 may decide on a third action content different from the first action content.

ユーザの行動が是正されていない場合とは、第１行動内容によるエージェントの動作が実行されたにもかかわらず、ユーザが危険な行動及び行為を継続した場合、危険な状況が解消されていない場合、又は、ユーザがエージェントの提案とは異なる行動を開始又は継続した場合と解釈してよい。 A case where the user's behavior is not corrected may be interpreted as a case where the user continues dangerous behavior and actions despite the execution of the agent's operation according to the first action content, a case where the dangerous situation is not resolved, or a case where the user starts or continues a behavior different from that suggested by the agent.
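
The branch between the second and third action content can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions: the names `FollowUp`, `is_corrected`, and `choose_follow_up` are hypothetical, and "corrected" is reduced to "the targeted behavior is no longer detected", which is narrower than the interpretation given in the text.

```python
from enum import Enum


class FollowUp(Enum):
    SECOND = "second action content"  # chosen when the behavior was corrected
    THIRD = "third action content"    # chosen when the behavior was not corrected


def is_corrected(targeted_behavior: str, detected_after: str) -> bool:
    """Assumed criterion: the targeted behavior is no longer being detected."""
    return detected_after != targeted_behavior


def choose_follow_up(targeted_behavior: str, detected_after: str) -> FollowUp:
    """Pick the follow-up action category after the first action content ran."""
    if is_corrected(targeted_behavior, detected_after):
        return FollowUp.SECOND
    return FollowUp.THIRD
```

For example, if the first action content targeted "off_route" and the user is later detected as "on_route", the sketch selects the second action content.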

本開示のエージェントシステム５００によれば、自律的処理によって、ユーザを先導する行動を実行し得る。これにより、エージェントは、初めての街を歩行するユーザを適切なルートに案内し得る。また歩行が趣味のユーザに対してこれまで通ったことがないルートを案内し得る。また通行できないルートを事前に検知して、ユーザを別ルートへ誘導し得る。また道に迷い特定のルートから外れたユーザの元のルートに誘導し得る。 According to the agent system 500 disclosed herein, the agent can perform actions to lead the user through autonomous processing. This allows the agent to guide a user walking in a city for the first time to an appropriate route. It can also guide a user who enjoys walking to a route they have not taken before. It can also detect impassable routes in advance and guide the user to a different route. It can also guide a user who has become lost and deviated from a specific route back to the original route.

行動決定部２３６は、所定のタイミングで、ユーザ１０の状態、ユーザ１０の感情、エージェントの感情、及びエージェントの状態の少なくとも一つと、行動決定モデル２２１とを用いて、行動しないことを含む複数種類のエージェント行動の何れかを、エージェントの行動として決定する。ここでは、行動決定モデル２２１として、対話機能を有する文章生成モデルを用いる場合を例に説明する。 The behavior decision unit 236 uses at least one of the state of the user 10, the emotion of the user 10, the emotion of the agent, and the state of the agent, and the behavior decision model 221 at a predetermined timing to decide one of a number of types of agent behavior, including no action, as the agent's behavior. Here, an example will be described in which a sentence generation model with a dialogue function is used as the behavior decision model 221.

具体的には、行動決定部２３６は、ユーザ１０の状態、ユーザ１０の感情、エージェントの感情、及びエージェントの状態の少なくとも一つを表すテキストと、エージェント行動を質問するテキストとを文章生成モデルに入力し、文章生成モデルの出力に基づいて、エージェントの行動を決定する。 Specifically, the behavior decision unit 236 inputs text representing at least one of the state of the user 10, the emotion of the user 10, the emotion of the agent, and the state of the agent, and text asking about the agent's behavior, into a sentence generation model, and decides on the agent's behavior based on the output of the sentence generation model.
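
As a rough illustration of this step, the following Python sketch combines text describing the state and emotions with text asking about the agent's behavior, and maps the model's reply to one candidate behavior. The function names, the prompt wording, the "answer with the number only" convention, and the fallback to the first candidate are all assumptions; `generate` stands in for the sentence generation model and is not an API of any particular library.

```python
from typing import Callable, List, Mapping


def build_behavior_prompt(context: Mapping[str, str], actions: List[str]) -> str:
    """Combine state/emotion text with a question about the agent's behavior."""
    if not context:
        raise ValueError("at least one state or emotion entry is required")
    described = "\n".join(f"{name}: {value}" for name, value in context.items())
    options = "\n".join(f"({i}) {text}" for i, text in enumerate(actions, start=1))
    return (
        f"{described}\n"
        "Which of the following behaviors should the agent take? "
        "Answer with the number only.\n"
        f"{options}"
    )


def decide_behavior(
    context: Mapping[str, str],
    actions: List[str],
    generate: Callable[[str], str],
) -> str:
    """Ask the model and return the chosen behavior (first candidate on failure)."""
    reply = generate(build_behavior_prompt(context, actions)).strip()
    try:
        index = int(reply) - 1
    except ValueError:
        return actions[0]  # unparsable reply: fall back to the first candidate
    return actions[index] if 0 <= index < len(actions) else actions[0]
```

Placing "The agent does nothing." first in `actions` makes inaction the fallback when the reply cannot be interpreted, which is a design choice of this sketch rather than something the disclosure specifies.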

例えば、複数種類のエージェント行動は、以下の（１）～（２３）を含む。 For example, the multiple types of agent behaviors include (1) to (23) below.

（１）エージェントは、何もしない。
（２）エージェントは、夢をみる。
（３）エージェントは、ユーザに話しかける。
（４）エージェントは、絵日記を作成する。
（５）エージェントは、アクティビティを提案する。
（６）エージェントは、ユーザが会うべき相手を提案する。
（７）エージェントは、ユーザが興味あるニュースを紹介する。
（８）エージェントは、写真や動画を編集する。
（９）エージェントは、ユーザと一緒に勉強する。
（１０）エージェントは、記憶を呼び起こす。
（１１）エージェントは、ユーザを先導する第１行動内容として、ユーザが歩行し得るルートの画像を通信端末の画面に再生してよい。
（１２）エージェントは、ユーザの行動を是正する第１行動内容として、ユーザが歩行し得るルートに関連する音声を再生してよい。
（１３）エージェントは、ユーザの行動を是正する第１行動内容として、設定された案内ルートとは別のルートの画像を通信端末１００Ｍの画面に再生してよい。
（１４）エージェントは、ユーザの行動を是正する第１行動内容として、設定された案内ルートとは別のルートに関連する音声を再生してよい。
（１５）エージェントは、ユーザの行動を是正する第１行動内容として、設定された案内ルートを先導されるユーザが、案内ルートから外れた場合に、設定された案内ルートに戻るルートの画像を通信端末１００Ｍの画面に再生してよい。
（１６）エージェントは、ユーザの行動を是正する第１行動内容として、設定された案内ルートを先導されるユーザが、案内ルートから外れた場合に、設定された案内ルートに戻るルートに関連する音声を再生してよい。
（１７）エージェントは、第１行動内容と異なる第２行動内容として、ユーザの行動を褒める音声を再生してよい。
（１８）エージェントは、第１行動内容と異なる第２行動内容として、ユーザの行動に対して感謝する音声を再生してよい。
（１９）エージェントは、第１行動内容と異なる第２行動内容として、ユーザの行動を褒める画像を再生してよい。
（２０）エージェントは、第１行動内容と異なる第２行動内容として、ユーザの行動に対して感謝する画像を再生してよい。
（２１）エージェントは、第１行動内容と異なる第３行動内容として、ユーザ以外の人物への特定情報を送信してよい。
（２２）エージェントは、第１行動内容と異なる第３行動内容として、ユーザの興味を引く音を再生してよい。
（２３）エージェントは、第１行動内容と異なる第３行動内容として、ユーザの興味を引く画像を再生してよい。

(1) The agent does nothing.
(2) The agent dreams.
(3) The agent speaks to the user.
(4) The agent creates a picture diary.
(5) The agent suggests an activity.
(6) The agent suggests people for the user to meet.
(7) The agent introduces news that may be of interest to the user.
(8) The agent edits photos and videos.
(9) The agent learns together with the user.
(10) The agent evokes memories.
(11) As a first action content for leading the user, the agent may display an image of a route that the user may walk on the screen of the communication terminal.
(12) The agent may play back audio related to a route that the user may walk as a first action content for correcting the user's behavior.
(13) As a first action content for correcting the user's behavior, the agent may play back, on the screen of the communication terminal 100M, an image of a route different from the set guidance route.
(14) The agent may play back audio related to a route other than the set guide route as a first action content for correcting the user's behavior.
(15) As a first action content for correcting the user's behavior, the agent may play on the screen of the communication terminal 100M an image of a route that returns to the set guidance route when the user being led along the set guidance route deviates from the guided route.
(16) As a first action content for correcting a user's behavior, the agent may play audio related to a route back to the set guidance route when the user being led along a set guidance route deviates from the guided route.
(17) The agent may play back audio praising the user's behavior as a second action content different from the first action content.
(18) The agent may play back, as a second action content different from the first action content, a voice message expressing gratitude for the user's action.
(19) The agent may play back, as a second action content different from the first action content, an image praising the user's action.
(20) The agent may play back, as a second action content different from the first action content, an image expressing gratitude for the user's action.
(21) The agent may transmit specific information to a person other than the user as a third action content different from the first action content.
(22) The agent may play a sound that attracts the user's interest as a third action content different from the first action content.
(23) The agent may play back an image that attracts the user's interest as a third action content different from the first action content.

エージェントシステム５００の行動決定部２３６は、自発的に又は定期的に、センサに基づきユーザの行動を検知し、検知したユーザの行動と予め記憶した特定情報とに基づき、エージェントの行動として、ユーザを先導することを決定した場合には、以下の第１行動内容を実行し得る。 The behavior decision unit 236 of the agent system 500 detects the user's behavior based on a sensor, either autonomously or periodically, and when it determines that the agent's behavior is to lead the user based on the detected user's behavior and pre-stored specific information, it may execute the following first behavior content.

エージェントシステム５００の行動決定部２３６は、エージェント行動として、前述した「（１１）」の第１行動内容、すなわち、ユーザが歩行し得るルートの画像を通信端末１００Ｍの画面に再生し得る。 The behavior decision unit 236 of the agent system 500 may, as an agent behavior, reproduce on the screen of the communication terminal 100M the first behavior content of "(11)" described above, i.e., an image of a route that the user may walk.

エージェントシステム５００の行動決定部２３６は、エージェント行動として、前述した「（１２）」の第１行動内容、すなわち、ユーザが歩行し得るルートに関連する音声を再生し得る。 The behavior decision unit 236 of the agent system 500 may play back, as an agent behavior, the first behavior content of "(12)" described above, i.e., audio related to the route the user may walk.

エージェントシステム５００の行動決定部２３６は、エージェント行動として、前述した「（１３）」の第１行動内容、すなわち、ユーザの行動を是正する第１行動内容として、設定された案内ルートとは別のルートの画像を通信端末１００Ｍの画面に再生し得る。 The behavior decision unit 236 of the agent system 500 may play, as an agent behavior, an image of a route other than the set guidance route on the screen of the communication terminal 100M as the first behavior content of "(13)" described above, that is, as the first behavior content that corrects the user's behavior.

エージェントシステム５００の行動決定部２３６は、エージェント行動として、前述した「（１４）」の第１行動内容、すなわち、設定された案内ルートとは別のルートに関連する音声を再生し得る。 The behavior decision unit 236 of the agent system 500 may play back, as an agent behavior, the first behavior content of "(14)" described above, i.e., audio related to a route other than the set guidance route.

エージェントシステム５００の行動決定部２３６は、エージェント行動として、前述した「（１５）」の第１行動内容、すなわち、設定された案内ルートを先導されるユーザが、案内ルートから外れた場合に、設定された案内ルートに戻るルートの画像を通信端末１００Ｍの画面に再生し得る。 The behavior decision unit 236 of the agent system 500 may, as an agent behavior, play on the screen of the communication terminal 100M an image of the first behavior content of "(15)" described above, that is, a route that returns to the set guidance route when the user being led along the set guidance route deviates from the guided route.

エージェントシステム５００の行動決定部２３６は、エージェント行動として、前述した「（１６）」の第１行動内容、すなわち、設定された案内ルートを先導されるユーザが、案内ルートから外れた場合に、設定された案内ルートに戻るルートに関連する音声を再生し得る。 The behavior decision unit 236 of the agent system 500 may play back, as an agent behavior, the first behavior content of "(16)" described above, i.e., audio related to a route back to the set guidance route when the user being led along the set guidance route deviates from the guidance route.

エージェントシステム５００の行動決定部２３６は、第１行動内容と異なる第２行動内容を実行し得る。具体的には、行動決定部２３６は、エージェント行動として、前述した「（１７）」の第２行動内容、すなわち、ユーザの行動を褒める音声を再生し得る。 The behavior decision unit 236 of the agent system 500 may execute a second behavior content different from the first behavior content. Specifically, the behavior decision unit 236 may play back, as the agent behavior, the second behavior content of "(17)" described above, i.e., a voice praising the user's behavior.

エージェントシステム５００の行動決定部２３６は、エージェント行動として、前述した「（１８）」の第２行動内容、すなわち、ユーザの行動に対して感謝する音声を再生し得る。 The behavior decision unit 236 of the agent system 500 may play the second behavior content of "(18)" described above as an agent behavior, i.e., a voice expressing gratitude for the user's behavior.

エージェントシステム５００の行動決定部２３６は、エージェント行動として、前述した「（１９）」の第２行動内容、すなわち、ユーザの行動を褒める画像を再生し得る。 The behavior decision unit 236 of the agent system 500 may play the second behavior content of "(19)" described above, i.e., an image praising the user's behavior, as the agent behavior.

エージェントシステム５００の行動決定部２３６は、エージェント行動として、前述した「（２０）」の第２行動内容、すなわち、ユーザの行動に対して感謝する画像を再生し得る。 The behavior decision unit 236 of the agent system 500 may play the second behavior content of "(20)" mentioned above, i.e., an image expressing gratitude for the user's behavior, as an agent behavior.

エージェントシステム５００の行動決定部２３６は、第１行動内容と異なる第３行動内容を実行し得る。具体的には、行動決定部２３６は、エージェント行動として、前述した「（２１）」の第３行動内容、すなわち、ユーザ以外の人物への特定情報を送信し得る。 The behavior decision unit 236 of the agent system 500 may execute a third behavior content different from the first behavior content. Specifically, the behavior decision unit 236 may, as the agent behavior, execute the third behavior content of "(21)" described above, that is, transmit specific information to a person other than the user.

エージェントシステム５００の行動決定部２３６は、エージェント行動として、前述した「（２２）」の第３行動内容、すなわち、ユーザの興味を引く音の再生を実行し得る。 The behavior decision unit 236 of the agent system 500 may execute the third behavior content of "(22)" mentioned above, i.e., playing a sound that attracts the user's interest, as an agent behavior.

エージェントシステム５００の行動決定部２３６は、エージェント行動として、前述した「（２３）」の第３行動内容、すなわち、ユーザの興味を引く画像の再生を実行し得る。 The behavior decision unit 236 of the agent system 500 may execute the third behavior content of "(23)" described above, i.e., playing an image that attracts the user's interest, as the agent behavior.

前述した「（１１）」～「（１６）」に示す第１行動内容として、ユーザが歩行し得るルートの画像などを再生する場合、エージェントシステム５００の関連情報収集部２７０は、収集データ２２３に、ユーザが歩行し得るルートの画像などのデータを格納してよい。 When playing back images of routes that the user can walk as the first action content shown in "(11)" to "(16)" above, the related information collection unit 270 of the agent system 500 may store data such as images of routes that the user can walk in the collected data 223.

前述した「（１７）」～「（２０）」に示す第２行動内容として、ユーザの行動を褒める音声などを再生する場合、エージェントシステム５００の関連情報収集部２７０は、収集データ２２３に、ユーザが歩行し得るルートの画像などのデータを格納してよい。 When playing back audio or the like praising the user's behavior as the second behavior content shown in "(17)" to "(20)" above, the related information collection unit 270 of the agent system 500 may store data such as images of the route the user may walk in the collected data 223.

前述した「（２１）」～「（２３）」に示す第３行動内容として、ユーザ以外の人物への特定情報の送信などする場合、エージェントシステム５００の関連情報収集部２７０は、収集データ２２３に、特定情報のデータなどを格納してよい。 When the third action content shown in "(21)" to "(23)" above involves sending specific information to a person other than the user, the related information collection unit 270 of the agent system 500 may store data on the specific information in the collected data 223.

次に、エージェントシステム５００がユーザ１０の感情を見守り、必要に応じてユーザ１０の代理で所定のアクションを実行する見守り機能を実行する場合の流れを説明する。 Next, we will explain the flow when the agent system 500 executes a monitoring function in which it monitors the emotions of the user 10 and, if necessary, performs a specified action on behalf of the user 10.

行動決定部２３６は、感情決定部２３２が判定したユーザ１０の感情が負の感情である場合、ユーザ１０の感情を表すテキストと、ユーザ１０に話しかける言葉を質問するテキストとを文章生成モデルに入力する。例えば、感情決定部２３２は、センサモジュール部２１０の音声感情認識部２１１により解析されたユーザ１０の音声に基づいて、ユーザ１０の感情を判定する。なお、上記のユーザ１０の感情を判定する方法はあくまで一例であり、感情決定部２３２は、センサモジュール部２１０で解析された情報、及び状態認識部２３０によって認識されたユーザ１０の状態を適宜用いて、ユーザ１０の感情を判定することができる。 When the emotion of the user 10 determined by the emotion determination unit 232 is a negative emotion, the behavior determination unit 236 inputs text expressing the emotion of the user 10 and text asking what words to say to the user 10 into the sentence generation model. For example, the emotion determination unit 232 determines the emotion of the user 10 based on the voice of the user 10 analyzed by the voice emotion recognition unit 211 of the sensor module unit 210. Note that the above method of determining the emotion of the user 10 is merely an example, and the emotion determination unit 232 can determine the emotion of the user 10 by appropriately using the information analyzed by the sensor module unit 210 and the state of the user 10 recognized by the state recognition unit 230.

ここで、感情決定部２３２が判定したユーザ１０の負の感情が「不安」である場合、行動決定部２３６は、例えば「以下の負の感情を抱いている人間に話しかける言葉を教えてください。負の感情：不安」というテキストを文章生成モデルに入力する。行動決定部２３６は、文章生成モデルの出力に基づいて、通信端末１００Ｍの行動として通信端末１００Ｍが有する制御対象２５２Ｂに含まれるスピーカから音声出力させるエージェントの発話内容を決定する。例えば、行動決定部２３６は、上記テキストの入力による文章生成モデルの出力に基づいて、当該エージェントの発話内容を「不安そうな顔をしていますが、何かありましたか。」に決定する。 Here, if the negative emotion of user 10 determined by emotion determination unit 232 is "anxiety," behavior determination unit 236 inputs text, for example, "Please tell me what words to say to a person who is feeling the following negative emotion. Negative emotion: anxiety," into the sentence generation model. Based on the output of the sentence generation model, behavior determination unit 236 determines the speech content of the agent to be output as a voice from a speaker included in control object 252B possessed by communication terminal 100M as the behavior of communication terminal 100M. For example, based on the output of the sentence generation model in response to the input of the above text, behavior determination unit 236 determines the speech content of the agent to be "You look anxious. Is something wrong?"

行動制御部２５０は、行動決定部２３６が決定した上記のエージェントの発話内容をスピーカから音声出力する。具体的には、行動制御部２５０は、「不安そうな顔をしていますが、何かありましたか。」との内容をスピーカから音声出力させる。 The behavior control unit 250 outputs the speech content of the agent determined by the behavior determination unit 236 from the speaker. Specifically, the behavior control unit 250 causes the speaker to output the speech content, "You look anxious. Is something wrong?".
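
The prompt used in this monitoring flow can be sketched in Python as follows. The prompt wording follows the example text above; the set `NEGATIVE_EMOTIONS`, the function names, and the `generate` callable standing in for the sentence generation model are assumptions made for illustration.

```python
from typing import Callable, Optional

# Assumed set of labels treated as negative emotions; the disclosure names "anxiety".
NEGATIVE_EMOTIONS = frozenset({"anxiety", "sadness", "anger"})


def comfort_prompt(emotion: str) -> str:
    """Build the text asking what to say to a person feeling the given emotion."""
    return (
        "Please tell me what words to say to a person who is feeling "
        f"the following negative emotion. Negative emotion: {emotion}"
    )


def comfort_utterance(emotion: str, generate: Callable[[str], str]) -> Optional[str]:
    """Return the agent's utterance, or None when the emotion is not negative."""
    if emotion not in NEGATIVE_EMOTIONS:
        return None
    return generate(comfort_prompt(emotion))
```

The returned string would then be handed to whatever component drives the speaker; that hand-off is outside this sketch.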

その後、行動決定部２３６は、スピーカから音声出力されたエージェントの発話内容と、状態認識部２３０がセンサモジュール部２１０の発話理解部２１２により出力された文字情報に基づいて認識したユーザ１０の状態としてのユーザ１０の発話内容とを含むユーザ１０とエージェントとの対話内容を表すテキストと、ユーザ１０が抱える問題の解決方法を質問するテキストとを文章生成モデルに入力する。行動決定部２３６は、例えば「以下のユーザ１０とエージェントとの対話内容を踏まえて、ユーザ１０が抱える問題の解決方法を教えてください。エージェント：不安そうな顔をしていますが、何かありましたか。ユーザ１０：お母さんがいなくて寂しいの。エージェント：お母さんはどこにいったかわかりますか。ユーザ１０：うん。そこのお店で買い物しているの。」というテキストを文章生成モデルに入力する。行動決定部２３６は、文章生成モデルの出力に基づいて、スピーカから出力させるエージェントの発話内容を決定する。例えば、行動決定部２３６は、上記テキストの入力による文章生成モデルの出力に基づいて、当該エージェントの発話内容を「そうですか。それならもう少し待っていれば戻ってきますよ。それまで私とおしゃべりでもしませんか。」に決定する。 Then, the behavior decision unit 236 inputs into the sentence generation model text representing the dialogue between the user 10 and the agent, together with text asking how to solve the problem the user 10 is having. The dialogue text includes the agent's speech output from the speaker and the speech of the user 10, which the state recognition unit 230 recognizes as the state of the user 10 based on the character information output by the speech understanding unit 212 of the sensor module unit 210. The behavior decision unit 236 inputs into the sentence generation model, for example, the text "Based on the following dialogue between the user 10 and the agent, please tell me how to solve the problem the user 10 is having. Agent: You look anxious. Is something wrong? User 10: I'm lonely because my mother is not here. Agent: Do you know where your mother went? User 10: Yes. She's shopping at that store over there." Based on the output of the sentence generation model, the behavior decision unit 236 determines the agent's speech to be output from the speaker. For example, based on the output of the sentence generation model using the above text as input, the behavior decision unit 236 determines the agent's utterance content to be "I see. In that case, if you wait a little longer, she'll be back. Why don't you chat with me until then?"

行動制御部２５０は、エージェントの発話内容として、ユーザ１０が抱える問題の解決方法をスピーカから音声出力する。具体的には、行動制御部２５０は、当該エージェントの発話内容として、行動決定部２３６が上記で決定した「そうですか。それならもう少し待っていれば戻ってきますよ。それまで私とおしゃべりでもしませんか。」との内容をスピーカから音声出力させる。 The behavior control unit 250 outputs a solution to the problem that the user 10 is having from the speaker as the agent's speech content. Specifically, the behavior control unit 250 causes the speaker to output the content determined by the behavior decision unit 236 above, "I see. In that case, if you wait a little longer, she'll be back. Why don't you chat with me until then?" as the agent's speech content.

上記構成により、エージェントシステム５００によれば、ユーザ１０が自身で問題の解決方法を考えられない場合でも、エージェントのサポートにより問題の解決を図ることができる。 With the above configuration, the agent system 500 allows the user 10 to solve the problem with the support of the agent, even if the user 10 is unable to think of a way to solve the problem on his or her own.

上記では、ユーザ１０とエージェントとの対話内容に基づいて、ユーザ１０が抱える問題の解決方法をエージェントシステム５００が導出する例を説明した。次は、ユーザ１０がエージェントとの対話が困難である場合におけるエージェントシステム５００の見守り機能の処理の流れを説明する。 The above describes an example in which the agent system 500 derives a solution to a problem that the user 10 has based on the content of the dialogue between the user 10 and the agent. Next, we will explain the processing flow of the monitoring function of the agent system 500 when the user 10 has difficulty conversing with the agent.

行動決定部２３６は、感情決定部２３２が判定したユーザ１０の感情が負の感情であり、かつ、ユーザ１０がエージェントとの対話が困難である場合、ユーザ１０のユーザ状態及びユーザ１０の感情を表すテキストと、ユーザ１０が抱える問題を質問するテキストとを文章生成モデルに入力する。行動決定部２３６は、例えば、エージェントの発話内容がスピーカから出力されてから所定時間以上マイク２０１でユーザ１０の音声を検出できない場合、又はマイク２０１で検出されたユーザ１０の音声が発話理解部２１２で解析できない場合等に、ユーザ１０がエージェントとの対話が困難であると決定する。 When the emotion of the user 10 determined by the emotion determination unit 232 is a negative emotion and the user 10 has difficulty in interacting with the agent, the behavior determination unit 236 inputs text expressing the user state and emotion of the user 10, and text asking about the problem the user 10 has, into the sentence generation model. The behavior determination unit 236 determines that the user 10 has difficulty interacting with the agent, for example, when the microphone 201 cannot detect the voice of the user 10 for a predetermined time or more after the agent's speech content is output from the speaker, or when the speech understanding unit 212 cannot analyze the voice of the user 10 detected by the microphone 201.
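
The two example conditions for judging that dialogue is difficult can be sketched as a small Python predicate. The function name, its parameters, and the default timeout of 10 seconds are hypothetical; the disclosure only speaks of "a predetermined time".

```python
def dialogue_is_difficult(
    seconds_since_agent_spoke: float,
    voice_detected: bool,
    speech_understood: bool,
    timeout_s: float = 10.0,  # assumed value for the "predetermined time"
) -> bool:
    """Return True when either example condition from the text holds."""
    if not voice_detected:
        # No user voice picked up by the microphone: difficult once the timeout elapses.
        return seconds_since_agent_spoke >= timeout_s
    # Voice was picked up: difficult only if it could not be analyzed.
    return not speech_understood
```

Before the timeout elapses with no voice detected, the sketch returns False, so the agent keeps waiting rather than escalating immediately.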

上記の場合、行動決定部２３６は、例えば「以下のユーザ状態及びユーザ１０の感情を踏まえて、ユーザ１０が抱える問題を教えてください。ユーザ状態：ユーザ１０が１人、泣いている確率９９％です。ユーザ１０の感情：不安」というテキストを文章生成モデルに入力する。行動決定部２３６は、文章生成モデルの出力に基づいて、通信端末１００Ｍの行動としてユーザ１０が抱える問題を表す問題情報をユーザ１０と所定の関係を有する関係者の端末に送信することを決定する。所定の関係には、一例として、血族関係、親族関係、姻族関係、恋愛関係、友人関係、及び、学校又は職場での人間関係等が含まれる。また、所定の関係において、関係者は、ユーザ１０に好意を寄せていることが望ましい。ここでは、一例として、所定の関係を血族関係とし、ユーザ１０を「子」、関係者をユーザ１０の「母」とする。例えば、行動決定部２３６は、上記テキストの入力による文章生成モデルの出力に基づいて、「お子さんがおそらく迷子になっています。」との問題情報をユーザ１０の母の端末（例：スマートホン）に送信することを決定する。そして、通信処理部２８０は、当該問題情報をユーザ１０の母の端末に送信する。 In the above case, the behavior decision unit 236 inputs, for example, the text "Considering the following user state and the emotion of the user 10, please tell me the problem that the user 10 is having. User state: The user 10 is alone, with a 99% probability of crying. Emotion of the user 10: anxiety" into the sentence generation model. Based on the output of the sentence generation model, the behavior decision unit 236 decides, as the behavior of the communication terminal 100M, to transmit problem information representing the problem that the user 10 is having to the terminal of a related person who has a predetermined relationship with the user 10. Examples of the predetermined relationship include blood relationships, kinship, relationships by marriage, romantic relationships, friendships, and relationships at school or work. In addition, in the predetermined relationship, it is desirable that the related person be favorably disposed toward the user 10. Here, as an example, the predetermined relationship is a blood relationship, the user 10 is the "child", and the related person is the "mother" of the user 10. For example, based on the output of the sentence generation model using the above text as input, the behavior decision unit 236 decides to send problem information such as "Your child is probably lost" to the terminal (e.g., a smartphone) of the mother of the user 10. The communication processing unit 280 then sends the problem information to the terminal of the mother of the user 10.

上記構成により、エージェントシステム５００によれば、ユーザ１０がうまく意思伝達をできない場合でも、エージェントがユーザ１０の代わりにユーザ１０が抱える問題をユーザ１０の母に伝えることができる。 With the above configuration, the agent system 500 allows the agent to communicate the problems that the user 10 is facing to the user 10's mother on behalf of the user 10, even if the user 10 is unable to communicate his/her intentions well.

他の例として、ユーザ１０がエージェントとの対話が困難である場合、エージェントシステム５００は、上記の関係者だけでなく、ユーザ１０の周囲に存在する他ユーザに助けを求めてもよい。 As another example, if user 10 has difficulty interacting with an agent, agent system 500 may ask for help not only from the above-mentioned parties, but also from other users around user 10.

感情決定部２３２は、ユーザ１０の周囲に存在する他ユーザを２Ｄカメラ２０３で撮影した撮影画像に基づいて、他ユーザの感情を判定する。例えば、感情決定部２３２は、センサモジュール部２１０の表情認識部２１３により認識された他ユーザの表情に基づいて、他ユーザの感情を判定する。２Ｄカメラ２０３は「カメラ」の一例である。なお、感情決定部２３２は、表情認識部２１３により認識された他ユーザの表情に加えて、他のセンサモジュール部２１０で解析された情報、及び状態認識部２３０によって認識された他ユーザの状態を適宜用いて、他ユーザの感情を判定してもよい。 The emotion determination unit 232 determines the emotion of other users based on images of other users around the user 10 captured by the 2D camera 203. For example, the emotion determination unit 232 determines the emotion of other users based on the facial expressions of other users recognized by the facial expression recognition unit 213 of the sensor module unit 210. The 2D camera 203 is an example of a "camera." Note that the emotion determination unit 232 may determine the emotion of other users by appropriately using information analyzed by other sensor module units 210 and the states of other users recognized by the state recognition unit 230 in addition to the facial expressions of other users recognized by the facial expression recognition unit 213.

行動決定部２３６は、感情決定部２３２が判定した他ユーザの感情が正の感情である場合、撮影画像と、ユーザ１０が抱える問題を表すテキストと、他ユーザに話しかける言葉を質問するテキストとを文章生成モデルに入力する。行動決定部２３６は、例えば「以下のユーザ１０が抱える問題を踏まえて、以下の撮影画像に示される人間に話しかける言葉を教えてください。ユーザ１０が抱える問題：迷子撮影画像：xxxyyyzzz111.jpg」というプロンプトを文章生成モデルに入力する。行動決定部２３６は、文章生成モデルの出力に基づいて、スピーカから出力させるエージェントの発話内容を決定する。例えば、行動決定部２３６は、上記プロンプトの入力による文章生成モデルの出力に基づいて、当該エージェントの発話内容を「すみません、そこの緑色のＴシャツを着ているお兄さん、迷子の子どもを助けてくれませんか。」に決定する。なお、行動決定部２３６は、他ユーザの正の感情が「安心」及び「安堵」等の特定の正の感情である場合、又は、他ユーザの正の感情の感情値が所定値以上の場合等に限って、上記プロンプトを文章生成モデルに入力することとしてもよい。 When the emotion of the other user determined by the emotion determination unit 232 is a positive emotion, the behavior determination unit 236 inputs into the sentence generation model the captured image, text expressing the problem faced by the user 10, and text asking what words should be used to address the other user. The behavior determination unit 236 inputs a prompt into the sentence generation model, for example, "Considering the problem faced by the user 10 below, please tell me what words to use to speak to the person shown in the captured image below. Problem faced by the user 10: lost child. Captured image: xxxyyyzzz111.jpg". Based on the output of the sentence generation model, the behavior determination unit 236 determines the agent's speech content to be output from the speaker. For example, based on the output of the sentence generation model in response to the above prompt, the behavior determination unit 236 determines the agent's speech content to be "Excuse me, the man over there in the green T-shirt, could you help a lost child?" Note that the behavior determination unit 236 may input the above prompt into the sentence generation model only when the other user's positive emotion is a specific positive emotion such as "at ease" or "relief," or when the emotion value of the other user's positive emotion is equal to or greater than a predetermined value.
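
The gating condition described above (a positive emotion, optionally restricted to specific emotions or to an emotion value at or above a predetermined value) might be sketched as follows. The `Bystander` type, the emotion labels, and the choice to prefer the bystander with the highest emotion value are all assumptions made for illustration; the specification does not say how one person is chosen when several qualify.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Assumed labels for positive emotions; the text names "at ease" and "relief"
# as examples of specific positive emotions.
POSITIVE_EMOTIONS: Set[str] = {"joy", "at ease", "relief"}


@dataclass
class Bystander:
    image_path: str       # hypothetical path to the captured image
    emotion: str          # emotion label determined for this person
    emotion_value: float  # strength of that emotion


def pick_helper(
    bystanders: Iterable[Bystander],
    min_value: float = 0.0,
    allowed: Set[str] = POSITIVE_EMOTIONS,
) -> Optional[Bystander]:
    """Return a bystander who satisfies the positive-emotion condition, or None."""
    candidates = [
        b for b in bystanders
        if b.emotion in allowed and b.emotion_value >= min_value
    ]
    # Assumption: prefer the strongest positive emotion among the candidates.
    return max(candidates, key=lambda b: b.emotion_value, default=None)
```

Passing a narrower `allowed` set or a higher `min_value` corresponds to the optional restrictions mentioned at the end of the paragraph.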

行動制御部２５０は、エージェントの発話内容としてユーザ１０が抱える問題を、他ユーザに向けてスピーカから音声出力する。一例として、当該スピーカは、超指向性スピーカであり、狙った対象に音を届けることができる。この場合、行動制御部２５０は、他ユーザが存在する方向に音を出力する。そして、この場合、行動制御部２５０は、当該エージェントの発話内容として、行動決定部２３６が上記で決定した「すみません、そこの緑色のＴシャツを着ているお兄さん、迷子の子どもを助けてくれませんか。」との内容をスピーカから音声出力させる。 The behavior control unit 250 outputs the problem that the user 10 is having from the speaker, as the agent's speech content, toward the other user. As an example, the speaker is a super-directional speaker that can deliver sound to an intended target. In this case, the behavior control unit 250 outputs the sound in the direction in which the other user is present. The behavior control unit 250 then causes the speaker to output, as the agent's speech content, the content determined above by the behavior determination unit 236: "Excuse me, the man over there in the green T-shirt, could you help a lost child?"

上記構成により、エージェントシステム５００によれば、正の感情を有する他ユーザに助けを求めることができるため、感情を考慮せずに例えば負の感情を有する他ユーザに助けを求める場合に比べて、ユーザ１０が抱える問題が解決される可能性を高めることができる。また、エージェントシステム５００によれば、エージェントがユーザ１０の代わりに助けを求めていることで、他ユーザもためらわずにユーザ１０を助けることができる。例えば、迷子のユーザ１０を連れていることは他ユーザにとって誘拐犯と勘違いされるリスクもあるが、エージェントシステム５００によれば、エージェントがユーザ１０の代わりに助けを求めていることで、他ユーザは誘拐目的でないことを容易に説明することができる。また、他ユーザは、エージェントと対話を行うことで、エージェントからユーザ１０の氏名及び連絡先等を把握することができ、ユーザ１０を助けるための連絡を行うことができる。また、上記のように、エージェントシステム５００によれば、他ユーザは誘拐目的でないことを容易に説明することができるため、他ユーザにユーザ１０を交番等の安全な場所まで連れて行ってもらうことも可能になる。 With the above configuration, the agent system 500 can ask for help from other users who have positive emotions, which makes it more likely that the problem of the user 10 will be solved than when help is sought without regard to emotion, for example from other users who have negative emotions. In addition, because the agent asks for help on behalf of the user 10, other users can help the user 10 without hesitation. For example, accompanying the lost user 10 exposes another user to the risk of being mistaken for a kidnapper, but because the agent is asking for help on behalf of the user 10, the other user can easily explain that kidnapping is not the purpose. In addition, by conversing with the agent, the other user can learn the name, contact information, and the like of the user 10 from the agent, and can make contact in order to help the user 10. Furthermore, as described above, because the other user can easily explain that kidnapping is not the purpose, it also becomes possible to have the other user take the user 10 to a safe place such as a police box (koban).

ここで、エージェントシステム５００が上記の見守り機能を実行する場合、行動制御部２５０は、感情決定部２３２によって決定されたエージェントの感情に応じた出力態様で、スピーカからエージェントの発話内容を音声出力してもよい。 Here, when the agent system 500 executes the above-mentioned monitoring function, the behavior control unit 250 may output the agent's speech from the speaker in an output format according to the agent's emotion determined by the emotion determination unit 232.

感情決定部２３２は、ユーザ１０の感情が負の感情であると判定した場合、エージェントの感情を負の感情に決定する。例えば、感情決定部２３２は、ユーザ１０の負の感情を「不安」と判定した場合、エージェントの感情を「哀」に決定し、当該「哀」の感情値を増大させる。 When the emotion determination unit 232 determines that the emotion of the user 10 is a negative emotion, it determines the emotion of the agent to be a negative emotion. For example, when the emotion determination unit 232 determines that the negative emotion of the user 10 is "anxiety," it determines the emotion of the agent to be "sadness" and increases the emotion value of "sadness."

行動制御部２５０は、感情決定部２３２が決定したエージェントの負の感情に基づく出力態様で、スピーカからエージェントの発話内容を出力する。例えば、行動制御部２５０は、エージェントの感情が「哀」に決定された場合、スピーカから出力されるエージェントの発話速度を速くしたり、スピーカから出力されるエージェントの発話音量を大きくしたりする。 The behavior control unit 250 outputs the agent's speech content from the speaker in an output mode based on the agent's negative emotion determined by the emotion determination unit 232. For example, when the agent's emotion is determined to be "sadness," the behavior control unit 250 increases the rate of the agent's speech output from the speaker, or increases the volume of the agent's speech output from the speaker.
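
The mapping from the user's emotion to the agent's emotion, and from there to an output mode, could be sketched as below. The names, the "neutral" fallback, and the multiplier values are illustrative assumptions; the specification states only that the speech rate or the volume is increased when the agent's emotion is "sadness", whereas this sketch raises both for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputMode:
    rate: float = 1.0    # speech-rate multiplier (1.0 = normal)
    volume: float = 1.0  # volume multiplier (1.0 = normal)


def agent_emotion_for(user_emotion: str) -> str:
    """Example from the text: user "anxiety" leads to agent "sadness"."""
    # The "neutral" fallback is an assumption for inputs the text does not cover.
    return "sadness" if user_emotion == "anxiety" else "neutral"


def output_mode_for(agent_emotion: str) -> OutputMode:
    """Choose how the agent's speech is rendered for a given agent emotion."""
    if agent_emotion == "sadness":
        # Assumed multipliers: faster and louder to convey urgency.
        return OutputMode(rate=1.2, volume=1.3)
    return OutputMode()
```

A text-to-speech backend would then apply `rate` and `volume` when synthesizing the agent's utterance.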

上記構成により、エージェントシステム５００によれば、エージェントの負の感情に基づく出力態様でスピーカからエージェントの発話内容が出力されるため、エージェントの伝え方で深刻さ及び緊急度等を表現することができる。 With the above configuration, the agent system 500 outputs the agent's speech from the speaker in an output format based on the agent's negative emotions, making it possible to express the seriousness and urgency of the situation through the agent's way of communicating.

また、エージェントシステム５００の一部（例えば、センサモジュール部２１０、格納部２２０、制御部２２８Ｂ）が、ユーザが所持するスマートホン等の通信端末の外部（例えば、サーバ）に設けられ、通信端末が、外部と通信することで、上記のエージェントシステム５００の各部として機能するようにしてもよい。 In addition, part of the agent system 500 (e.g., the sensor module unit 210, the storage unit 220, and the control unit 228B) may be provided outside a communication terminal such as a smartphone carried by the user (for example, on a server), and the communication terminal may function as each part of the agent system 500 described above by communicating with the outside.

［第４実施形態］ [Fourth embodiment]
第４実施形態では、上記のエージェントシステムを、スマート眼鏡に適用する。なお、第１実施形態～第３実施形態と同様の構成となる部分については、同一符号を付して説明を省略する。 In the fourth embodiment, the above-mentioned agent system is applied to smart glasses. Note that parts having the same configuration as in the first to third embodiments are given the same reference numerals, and their description is omitted.

図１３は、制御システムの機能の一部又は全部を利用して構成されるエージェントシステム７００の機能ブロック図である。 Figure 13 is a functional block diagram of an agent system 700 that is configured using some or all of the functions of a control system.

図１４に示すように、スマート眼鏡７２０は、眼鏡型のスマートデバイスであり、一般的な眼鏡と同様にユーザ１０によって装着される。スマート眼鏡７２０は、電子機器及びウェアラブル端末の一例である。 As shown in FIG. 14, the smart glasses 720 are glasses-type smart devices and are worn by the user 10 in the same way as regular glasses. The smart glasses 720 are an example of an electronic device and a wearable terminal.

スマート眼鏡７２０は、エージェントシステム７００を備えている。制御対象２５２Ｂに含まれるディスプレイは、ユーザ１０に対して各種情報を表示する。ディスプレイは、例えば、液晶ディスプレイである。ディスプレイは、例えば、スマート眼鏡７２０のレンズ部分に設けられており、ユーザ１０によって表示内容が視認可能とされている。制御対象２５２Ｂに含まれるスピーカは、ユーザ１０に対して各種情報を示す音声を出力する。スマート眼鏡７２０は、タッチパネル（図示省略）を備えており、タッチパネルは、ユーザ１０からの入力を受け付ける。 The smart glasses 720 include an agent system 700. The display included in the control object 252B displays various information to the user 10. The display is, for example, a liquid crystal display. The display is provided, for example, in the lens portion of the smart glasses 720, and the display contents are visible to the user 10. The speaker included in the control object 252B outputs audio indicating various information to the user 10. The smart glasses 720 include a touch panel (not shown), which accepts input from the user 10.

センサ部２００Ｂの加速度センサ２０６、温度センサ２０７、及び心拍センサ２０８は、ユーザ１０の状態を検出する。なお、これらのセンサはあくまで一例にすぎず、ユーザ１０の状態を検出するためにその他のセンサが搭載されてよいことはもちろんである。 The acceleration sensor 206, temperature sensor 207, and heart rate sensor 208 of the sensor unit 200B detect the state of the user 10. Note that these sensors are merely examples, and it goes without saying that other sensors may be installed to detect the state of the user 10.

マイク２０１は、ユーザ１０が発した音声又はスマート眼鏡７２０の周囲の環境音を取得する。２Ｄカメラ２０３は、スマート眼鏡７２０の周囲を撮像可能とされている。２Ｄカメラ２０３は、例えば、ＣＣＤカメラである。 The microphone 201 captures the voice emitted by the user 10 or the environmental sounds around the smart glasses 720. The 2D camera 203 is capable of capturing images of the surroundings of the smart glasses 720. The 2D camera 203 is, for example, a CCD camera.

センサモジュール部２１０Ｂは、音声感情認識部２１１及び発話理解部２１２を含む。制御部２２８Ｂの通信処理部２８０は、スマート眼鏡７２０と外部との通信を司る。 The sensor module unit 210B includes a voice emotion recognition unit 211 and a speech understanding unit 212. The communication processing unit 280 of the control unit 228B is responsible for communication between the smart glasses 720 and the outside.

図１４は、スマート眼鏡７２０によるエージェントシステム７００の利用態様の一例を示す図である。スマート眼鏡７２０は、ユーザ１０に対してエージェントシステム７００を利用した各種サービスの提供を実現する。例えば、ユーザ１０によりスマート眼鏡７２０が操作（例えば、マイクロフォンに対する音声入力、又は指でタッチパネルがタップされる等）されると、スマート眼鏡７２０は、エージェントシステム７００の利用を開始する。ここで、エージェントシステム７００を利用するとは、スマート眼鏡７２０が、エージェントシステム７００を有し、エージェントシステム７００を利用することを含み、また、エージェントシステム７００の一部（例えば、センサモジュール部２１０Ｂ、格納部２２０、制御部２２８Ｂ）が、スマート眼鏡７２０の外部（例えば、サーバ）に設けられ、スマート眼鏡７２０が、外部と通信することで、エージェントシステム７００を利用する態様も含む。 Figure 14 is a diagram showing an example of how the agent system 700 is used by the smart glasses 720. The smart glasses 720 provide various services to the user 10 using the agent system 700. For example, when the user 10 operates the smart glasses 720 (e.g., voice input to a microphone, or tapping a touch panel with a finger), the smart glasses 720 start using the agent system 700. Here, using the agent system 700 includes the smart glasses 720 having the agent system 700 and using the agent system 700, and also includes a mode in which a part of the agent system 700 (e.g., the sensor module unit 210B, the storage unit 220, the control unit 228B) is provided outside the smart glasses 720 (e.g., a server), and the smart glasses 720 uses the agent system 700 by communicating with the outside.

ユーザ１０がスマート眼鏡７２０を操作することで、エージェントシステム７００とユーザ１０との間にタッチポイントが生じる。すなわち、エージェントシステム７００によるサービスの提供が開始される。第３実施形態で説明したように、エージェントシステム７００において、キャラクタ設定部２７６によりエージェントのキャラクタの設定が行われる。 When the user 10 operates the smart glasses 720, a touch point is created between the agent system 700 and the user 10. In other words, the agent system 700 starts providing a service. As described in the third embodiment, in the agent system 700, the character setting unit 276 sets the agent character.

感情決定部２３２は、ユーザ１０の感情を示す感情値及びエージェント自身の感情値を決定する。ここで、ユーザ１０の感情を示す感情値は、スマート眼鏡７２０に搭載されたセンサ部２００Ｂに含まれる各種センサから推定される。例えば、心拍センサ２０８により検出されたユーザ１０の心拍数が上昇している場合には、「不安」「恐怖」等の感情値が大きく推定される。 The emotion determination unit 232 determines an emotion value indicating the emotion of the user 10 and an emotion value of the agent itself. Here, the emotion value indicating the emotion of the user 10 is estimated from various sensors included in the sensor unit 200B mounted on the smart glasses 720. For example, if the heart rate of the user 10 detected by the heart rate sensor 208 is increasing, emotion values such as "anxiety" and "fear" are estimated to be large.

また、温度センサ２０７によりユーザの体温が測定された結果、例えば、平均体温を上回っている場合には、「苦痛」「辛い」等の感情値が大きく推定される。また、例えば、加速度センサ２０６によりユーザ１０が何らかのスポーツを行っていることが検出された場合には、「楽しい」等の感情値が大きく推定される。 In addition, when the temperature sensor 207 measures the user's body temperature and, for example, the result is higher than the average body temperature, an emotional value such as "pain" or "distress" is estimated to be high. In addition, when the acceleration sensor 206 detects that the user 10 is playing some kind of sport, an emotional value such as "fun" is estimated to be high.

また、例えば、スマート眼鏡７２０に搭載されたマイク２０１により取得されたユーザ１０の音声、又は発話内容からユーザ１０の感情値が推定されてもよい。例えば、ユーザ１０が声を荒げている場合には、「怒り」等の感情値が大きく推定される。 In addition, for example, the emotion value of the user 10 may be estimated from the voice of the user 10 acquired by the microphone 201 mounted on the smart glasses 720, or the content of the speech. For example, if the user 10 is raising his/her voice, an emotion value such as "anger" is estimated to be high.
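
The sensor-based examples above (heart rate, body temperature, acceleration, and voice) can be summarized as a simple rule table. The sketch below is only one possible reading: the function name, the boolean inputs, the English emotion labels, and the uniform `boost` increment are assumptions, and an actual system could estimate these values quite differently.

```python
from typing import Dict


def estimate_emotion_values(
    heart_rate_rising: bool = False,
    body_temp_c: float = 36.5,
    average_temp_c: float = 36.5,  # assumed baseline for the user
    playing_sport: bool = False,
    voice_raised: bool = False,
    boost: float = 1.0,            # assumed increment per matching rule
) -> Dict[str, float]:
    """Raise emotion values according to the example rules in the description."""
    values = {name: 0.0 for name in
              ("anxiety", "fear", "pain", "distress", "fun", "anger")}
    if heart_rate_rising:              # heart rate sensor 208
        values["anxiety"] += boost
        values["fear"] += boost
    if body_temp_c > average_temp_c:   # temperature sensor 207
        values["pain"] += boost
        values["distress"] += boost
    if playing_sport:                  # acceleration sensor 206
        values["fun"] += boost
    if voice_raised:                   # microphone 201
        values["anger"] += boost
    return values
```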

感情決定部２３２により推定された感情値が予め定められた値よりも高くなった場合、エージェントシステム７００は、スマート眼鏡７２０に対して周囲の状況に関する情報を取得させる。具体的には、例えば、２Ｄカメラ２０３に対して、ユーザ１０の周囲の状況（例えば、周囲にいる人物、又は物体）を示す画像又は動画を撮像させる。また、マイク２０１に対して周囲の環境音を録音させる。その他の周囲の状況に関する情報としては、日付、時刻、位置情報、又は天候を示す情報等が挙げられる。周囲の状況に関する情報は、感情値と共に履歴データ２２２に保存される。履歴データ２２２は、外部のクラウドストレージによって実現されてもよい。このように、スマート眼鏡７２０によって得られた周囲の状況は、その時のユーザ１０の感情値と対応付けられた状態で、いわゆるライフログとして履歴データ２２２に保存される。 When the emotion value estimated by the emotion determination unit 232 is higher than a predetermined value, the agent system 700 causes the smart glasses 720 to acquire information about the surrounding situation. Specifically, for example, the 2D camera 203 captures an image or video showing the surrounding situation of the user 10 (for example, people or objects in the vicinity). In addition, the microphone 201 records the surrounding environmental sounds. Other information about the surrounding situation includes information about the date, time, location information, or weather. The information about the surrounding situation is stored in the history data 222 together with the emotion value. The history data 222 may be realized by an external cloud storage. In this way, the surrounding situation acquired by the smart glasses 720 is stored in the history data 222 as a so-called life log in a state where it is associated with the emotion value of the user 10 at that time.
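
A minimal sketch of the threshold-triggered life log described above is shown here. The `LifeLogEntry` and `HistoryData` types, the field names, and the threshold of 3.0 are hypothetical; the specification says only that the surrounding situation is stored together with the emotion value when that value is higher than a predetermined value.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class LifeLogEntry:
    timestamp: datetime
    emotion_values: Dict[str, float]
    surroundings: Dict[str, str]  # e.g. image path, audio path, location, weather


@dataclass
class HistoryData:
    entries: List[LifeLogEntry] = field(default_factory=list)


def maybe_record(
    history: HistoryData,
    emotion_values: Dict[str, float],
    surroundings: Dict[str, str],
    now: datetime,
    threshold: float = 3.0,  # assumed; the text says "a predetermined value"
) -> Optional[LifeLogEntry]:
    """Store the surroundings with the emotion values when any value exceeds the threshold."""
    if not emotion_values or max(emotion_values.values()) <= threshold:
        return None
    entry = LifeLogEntry(now, dict(emotion_values), dict(surroundings))
    history.entries.append(entry)
    return entry
```

Here `HistoryData` is an in-memory stand-in; as the description notes, the history data could equally be backed by external cloud storage.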

エージェントシステム７００において、履歴データ２２２に周囲の状況を示す情報が、感情値と対応付けられて保存される。これにより、ユーザ１０の趣味、嗜好、又は性格等の個人情報がエージェントシステム７００によって把握される。例えば、野球観戦の様子を示す画像と、「喜び」「楽しい」等の感情値が対応付けられている場合には、ユーザ１０の趣味が野球観戦であり、好きなチーム、又は選手が、履歴データ２２２に格納された情報からエージェントシステム７００により把握される。 In the agent system 700, information indicating the surrounding situation is stored in association with an emotional value in the history data 222. This allows the agent system 700 to grasp personal information such as the hobbies, preferences, or personality of the user 10. For example, if an image showing a baseball game is associated with an emotional value such as "joy" or "fun," the agent system 700 can determine from the information stored in the history data 222 that the user 10's hobby is watching baseball games and their favorite team or player.
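
One simple way to derive interests from such associated records is to count which scenes co-occur with strong positive emotions. The sketch below assumes each record has already been reduced to a `(scene, emotion, value)` tuple, for instance by an image recognizer that is not shown; that reduction, the labels, and the thresholds are all assumptions rather than part of the specification.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def infer_interests(
    records: Iterable[Tuple[str, str, float]],  # (scene label, emotion, value)
    positive: Sequence[str] = ("joy", "fun"),   # assumed positive labels
    min_value: float = 3.0,                     # assumed minimum strength
    top_n: int = 3,
) -> List[str]:
    """Return the scenes most often associated with strong positive emotions."""
    counts = Counter(
        scene for scene, emotion, value in records
        if emotion in positive and value >= min_value
    )
    return [scene for scene, _ in counts.most_common(top_n)]
```

With records in which "baseball game" repeatedly appears alongside "joy" or "fun", that scene would rank first, matching the baseball example in the text.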

そして、エージェントシステム７００は、ユーザ１０と対話する場合又はユーザ１０に向けた行動を行う場合、履歴データ２２２に格納された周囲状況の内容を加味して対話内容又は行動内容を決定する。なお、周囲状況に加えて、上述したように履歴データ２２２に格納された対話履歴を加味して対話内容又は行動内容が決定されてよいことはもちろんである。 When the agent system 700 converses with the user 10 or takes an action toward the user 10, the agent system 700 determines the content of the dialogue or the content of the action by taking into account the content of the surrounding circumstances stored in the history data 222. Of course, the content of the dialogue or the content of the action may be determined by taking into account the dialogue history stored in the history data 222 as described above, in addition to the surrounding circumstances.

上述したように、行動決定部２３６は、文章生成モデルによって生成された文章に基づいて発話内容を生成する。具体的には、行動決定部２３６は、ユーザ１０により入力されたテキストまたは音声、感情決定部２３２によって決定されたユーザ１０及びエージェントの双方の感情、履歴データ２２２に格納された会話の履歴、及びエージェントの性格等を文章生成モデルに入力して、エージェントの発話内容を生成する。さらに、行動決定部２３６は、履歴データ２２２に格納された周囲状況を文章生成モデルに入力して、エージェントの発話内容を生成する。 As described above, the behavior determination unit 236 generates the utterance content based on the sentence generated by the sentence generation model. Specifically, the behavior determination unit 236 inputs the text or voice input by the user 10, the emotions of both the user 10 and the agent determined by the emotion determination unit 232, the conversation history stored in the history data 222, and the agent's personality, etc., into the sentence generation model to generate the agent's utterance content. Furthermore, the behavior determination unit 236 inputs the surrounding circumstances stored in the history data 222 into the sentence generation model to generate the agent's utterance content.
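
The inputs listed above could be combined into a single prompt along the following lines. The function name, the section labels, and the closing instruction are assumptions for illustration; the specification names the inputs but does not define a prompt format or a particular sentence generation model.

```python
from typing import Mapping, Sequence


def build_utterance_prompt(
    user_input: str,
    user_emotion: str,
    agent_emotion: str,
    conversation_history: Sequence[str],
    agent_personality: str,
    surroundings: Mapping[str, str],
) -> str:
    """Concatenate the inputs named in the text into one prompt string."""
    history = "\n".join(f"- {turn}" for turn in conversation_history) or "- (none)"
    context = "\n".join(f"- {k}: {v}" for k, v in surroundings.items()) or "- (none)"
    return (
        f"Agent personality: {agent_personality}\n"
        f"Agent emotion: {agent_emotion}\n"
        f"User emotion: {user_emotion}\n"
        f"Conversation history:\n{history}\n"
        f"Surrounding circumstances:\n{context}\n"
        f"User input: {user_input}\n"
        "Reply as the agent."
    )
```

The returned string would then be handed to the sentence generation model, whose output becomes the agent's utterance.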

生成された発話内容は、例えば、スマート眼鏡７２０に搭載されたスピーカからユーザ１０に対して音声出力される。この場合において、音声としてエージェントのキャラクタに応じた合成音声が用いられる。行動制御部２５０は、エージェントのキャラクタの声質を再現することで、合成音声を生成したり、キャラクタの感情に応じた合成音声（例えば、「怒」の感情である場合には語気を強めた音声）を生成したりする。また、音声出力に代えて、又は音声出力とともに、ディスプレイに対して発話内容が表示されてもよい。 The generated speech content is output as audio to the user 10, for example, from a speaker mounted on the smart glasses 720. In this case, a synthetic voice corresponding to the character of the agent is used as the voice. The behavior control unit 250 generates a synthetic voice by reproducing the voice quality of the agent character, or generates a synthetic voice corresponding to the character's emotion (for example, a voice with a stronger tone in the case of the emotion of "anger"). Also, instead of or together with the audio output, the speech content may be displayed on the display.

ＲＰＡ２７４は、コマンド（例えば、ユーザ１０との対話を通じてユーザ１０から発せられる音声又はテキストから取得されたエージェントのコマンド）に応じた動作を実行する。ＲＰＡ２７４は、例えば、情報検索、店の予約、チケットの手配、商品・サービスの購入、代金の支払い、経路案内、翻訳等のサービスプロバイダの利用に関する行動を行う。 The RPA 274 executes an operation according to a command (e.g., an agent command obtained from a voice or text issued by the user 10 through a dialogue with the user 10). The RPA 274 performs actions related to the use of a service provider, such as information search, store reservation, ticket arrangement, purchase of goods and services, payment, route guidance, translation, etc.

また、その他の例として、ＲＰＡ２７４は、ユーザ１０（例えば、子供）がエージェントとの対話を通じて音声入力した内容を、相手先（例えば、親）に送信する動作を実行する。送信手段としては、例えば、メッセージアプリケーションソフト、チャットアプリケーションソフト、又はメールアプリケーションソフト等が挙げられる。 As another example, the RPA 274 executes an operation to transmit the contents of voice input by the user 10 (e.g., a child) through dialogue with an agent to a destination (e.g., a parent). Examples of transmission means include message application software, chat application software, or email application software.

ＲＰＡ２７４による動作が実行された場合に、例えば、スマート眼鏡７２０に搭載されたスピーカから動作の実行が終了したことを示す音声が出力される。例えば、「お店の予約が完了しました」等の音声がユーザ１０に対して出力される。また、例えば、お店の予約が埋まっていた場合には、「予約ができませんでした。どうしますか？」等の音声がユーザ１０に対して出力される。 When an operation is executed by the RPA 274, for example, a sound indicating that execution of the operation has been completed is output from a speaker mounted on the smart glasses 720. For example, a sound such as "Your restaurant reservation has been completed" is output to the user 10. Also, for example, if the restaurant is fully booked, a sound such as "We were unable to make a reservation. What would you like to do?" is output to the user 10.

以上説明したように、スマート眼鏡７２０では、エージェントシステム７００を利用することでユーザ１０に対して各種サービスが提供される。また、スマート眼鏡７２０は、ユーザ１０によって身につけられていることから、自宅、仕事場、外出先等、様々な場面でエージェントシステム７００を利用することが実現される。 As described above, the smart glasses 720 provide various services to the user 10 by using the agent system 700. In addition, since the smart glasses 720 are worn by the user 10, it is possible to use the agent system 700 in various situations, such as at home, at work, and outside the home.

また、スマート眼鏡７２０は、ユーザ１０によって身につけられていることから、ユーザ１０のいわゆるライフログを収集することに適している。具体的には、スマート眼鏡７２０に搭載された各種センサ等による検出結果、又は２Ｄカメラ２０３等の記録結果に基づいてユーザ１０の感情値が推定される。このため、様々な場面でユーザ１０の感情値を収集することができ、エージェントシステム７００は、ユーザ１０の感情に適したサービス、又は発話内容を提供することができる。 In addition, since the smart glasses 720 are worn by the user 10, they are suitable for collecting the so-called life log of the user 10. Specifically, the emotional value of the user 10 is estimated based on the detection results of various sensors mounted on the smart glasses 720 or the recording results of the 2D camera 203, etc. Therefore, the emotional value of the user 10 can be collected in various situations, and the agent system 700 can provide services or speech content appropriate to the emotions of the user 10.

また、スマート眼鏡７２０では、２Ｄカメラ２０３、マイク２０１等によりユーザ１０の周囲の状況が得られる。そして、これらの周囲の状況とユーザ１０の感情値とは対応付けられている。これにより、ユーザ１０がどのような状況に置かれた場合に、どのような感情を抱いたかを推定することができる。この結果、エージェントシステム７００が、ユーザ１０の趣味嗜好を把握する場合の精度を向上させることができる。そして、エージェントシステム７００において、ユーザ１０の趣味嗜好が正確に把握されることで、エージェントシステム７００は、ユーザ１０の趣味嗜好に適したサービス、又は発話内容を提供することができる。 In addition, the smart glasses 720 obtain the surrounding conditions of the user 10 using the 2D camera 203, microphone 201, etc. These surrounding conditions are associated with the emotion values of the user 10. This makes it possible to estimate what emotions the user 10 felt in what situations. As a result, the agent system 700 can improve the accuracy with which it grasps the hobbies and preferences of the user 10. By accurately grasping the hobbies and preferences of the user 10 in the agent system 700, the agent system 700 can provide services or speech content that are suited to the hobbies and preferences of the user 10.

また、エージェントシステム７００は、他のウェアラブル端末（ペンダント、スマートウォッチ、イヤリング、ブレスレット、ヘアバンド等のユーザ１０の身体に装着可能な電子機器）に適用することも可能である。エージェントシステム７００をスマートペンダントに適用する場合、制御対象２５２Ｂとしてのスピーカは、ユーザ１０に対して各種情報を示す音声を出力する。スピーカは、例えば、指向性を有する音声を出力可能なスピーカである。スピーカは、ユーザ１０の耳に向かって指向性を有するように設定される。これにより、ユーザ１０以外の人物に対して音声が届くことが抑制される。マイク２０１は、ユーザ１０が発した音声又はスマートペンダントの周囲の環境音を取得する。スマートペンダントは、ユーザ１０の首から提げられる態様で装着される。このため、スマートペンダントは、装着されている間、ユーザ１０の口に比較的近い場所に位置する。これにより、ユーザ１０の発する音声を取得することが容易になる。 The agent system 700 can also be applied to other wearable devices (electronic devices that can be worn on the body of the user 10, such as pendants, smart watches, earrings, bracelets, and hair bands). When the agent system 700 is applied to a smart pendant, the speaker as the control target 252B outputs a sound indicating various information to the user 10. The speaker is, for example, a speaker that can output a directional sound. The speaker is set to have a directionality toward the ears of the user 10. This prevents the sound from reaching people other than the user 10. The microphone 201 acquires the sound emitted by the user 10 or the environmental sound around the smart pendant. The smart pendant is worn in a manner that it is hung from the neck of the user 10. Therefore, the smart pendant is located relatively close to the mouth of the user 10 while it is worn. This makes it easy to acquire the sound emitted by the user 10.

なお、上記実施形態では、ロボット１００は、ユーザ１０の顔画像を用いてユーザ１０を認識する場合について説明したが、開示の技術はこの態様に限定されない。例えば、ロボット１００は、ユーザ１０が発する音声、ユーザ１０のメールアドレス、ユーザ１０のＳＮＳのＩＤ又はユーザ１０が所持する無線ＩＣタグが内蔵されたＩＤカード等を用いてユーザ１０を認識してもよい。 In the above embodiment, the robot 100 recognizes the user 10 using a facial image of the user 10, but the disclosed technology is not limited to this aspect. For example, the robot 100 may recognize the user 10 using a voice emitted by the user 10, an email address of the user 10, an SNS ID of the user 10, or an ID card with a built-in wireless IC tag that the user 10 possesses.

ロボット１００は、制御システムを備える電子機器の一例である。制御システムの適用対象は、ロボット１００に限られず、様々な電子機器に制御システムを適用できる。また、サーバ３００の機能は、１以上のコンピュータによって実装されてよい。サーバ３００の少なくとも一部の機能は、仮想マシンによって実装されてよい。また、サーバ３００の機能の少なくとも一部は、クラウドで実装されてよい。 The robot 100 is an example of an electronic device equipped with a control system. The control system is not limited to being applied to the robot 100, but can be applied to various electronic devices. The functions of the server 300 may be implemented by one or more computers. At least some of the functions of the server 300 may be implemented by a virtual machine. At least some of the functions of the server 300 may be implemented in the cloud.

図１５は、スマートホン５０、ロボット１００、サーバ３００、及びエージェントシステム５００、７００として機能するコンピュータ１２００のハードウェア構成の一例を概略的に示す。コンピュータ１２００にインストールされたプログラムは、コンピュータ１２００を、本実施形態に係る装置の１又は複数の「部」として機能させ、又はコンピュータ１２００に、本実施形態に係る装置に関連付けられるオペレーション又は当該１又は複数の「部」を実行させることができ、及び／又はコンピュータ１２００に、本実施形態に係るプロセス又は当該プロセスの段階を実行させることができる。そのようなプログラムは、コンピュータ１２００に、本明細書に記載のフローチャート及びブロック図のブロックのうちのいくつか又はすべてに関連付けられた特定のオペレーションを実行させるべく、ＣＰＵ１２１２によって実行されてよい。 FIG. 15 schematically shows an example of a hardware configuration of a computer 1200 that functions as the smartphone 50, the robot 100, the server 300, and the agent systems 500 and 700. A program installed on the computer 1200 can cause the computer 1200 to function as one or more "parts" of the apparatus according to the present embodiment, can cause the computer 1200 to execute operations associated with the apparatus or with the one or more "parts", and/or can cause the computer 1200 to execute a process according to the present embodiment or steps of that process. Such a program may be executed by the CPU 1212 to cause the computer 1200 to execute specific operations associated with some or all of the blocks of the flowcharts and block diagrams described in this specification.

本実施形態によるコンピュータ１２００は、ＣＰＵ１２１２、ＲＡＭ１２１４、及びグラフィックコントローラ１２１６を含み、それらはホストコントローラ１２１０によって相互に接続されている。コンピュータ１２００はまた、通信インタフェース１２２２、記憶装置１２２４、ＤＶＤドライブ１２２６、及びＩＣカードドライブのような入出力ユニットを含み、それらは入出力コントローラ１２２０を介してホストコントローラ１２１０に接続されている。ＤＶＤドライブ１２２６は、ＤＶＤ－ＲＯＭドライブ及びＤＶＤ－ＲＡＭドライブ等であってよい。記憶装置１２２４は、ハードディスクドライブ及びソリッドステートドライブ等であってよい。コンピュータ１２００はまた、ＲＯＭ１２３０及びキーボードのようなレガシの入出力ユニットを含み、それらは入出力チップ１２４０を介して入出力コントローラ１２２０に接続されている。 The computer 1200 according to this embodiment includes a CPU 1212, a RAM 1214, and a graphics controller 1216, which are connected to each other by a host controller 1210. The computer 1200 also includes input/output units such as a communication interface 1222, a storage device 1224, a DVD drive 1226, and an IC card drive, which are connected to the host controller 1210 via an input/output controller 1220. The DVD drive 1226 may be a DVD-ROM drive, a DVD-RAM drive, or the like. The storage device 1224 may be a hard disk drive, a solid state drive, or the like. The computer 1200 also includes a ROM 1230 and a legacy input/output unit such as a keyboard, which are connected to the input/output controller 1220 via an input/output chip 1240.

ＣＰＵ１２１２は、ＲＯＭ１２３０及びＲＡＭ１２１４内に格納されたプログラムに従い動作し、それにより各ユニットを制御する。グラフィックコントローラ１２１６は、ＲＡＭ１２１４内に提供されるフレームバッファ等又はそれ自体の中に、ＣＰＵ１２１２によって生成されるイメージデータを取得し、イメージデータがディスプレイデバイス１２１８上に表示されるようにする。 The CPU 1212 operates according to the programs stored in the ROM 1230 and the RAM 1214, thereby controlling each unit. The graphics controller 1216 acquires image data generated by the CPU 1212 into a frame buffer or the like provided in the RAM 1214 or into itself, and causes the image data to be displayed on the display device 1218.

通信インタフェース１２２２は、ネットワークを介して他の電子デバイスと通信する。記憶装置１２２４は、コンピュータ１２００内のＣＰＵ１２１２によって使用されるプログラム及びデータを格納する。ＤＶＤドライブ１２２６は、プログラム又はデータをＤＶＤ－ＲＯＭ１２２７等から読み取り、記憶装置１２２４に提供する。ＩＣカードドライブは、プログラム及びデータをＩＣカードから読み取り、及び／又はプログラム及びデータをＩＣカードに書き込む。 The communication interface 1222 communicates with other electronic devices via a network. The storage device 1224 stores programs and data used by the CPU 1212 in the computer 1200. The DVD drive 1226 reads programs or data from a DVD-ROM 1227 or the like and provides them to the storage device 1224. The IC card drive reads programs and data from an IC card and/or writes programs and data to an IC card.

ＲＯＭ１２３０はその中に、アクティブ化時にコンピュータ１２００によって実行されるブートプログラム等、及び／又はコンピュータ１２００のハードウェアに依存するプログラムを格納する。入出力チップ１２４０はまた、様々な入出力ユニットをＵＳＢポート、パラレルポート、シリアルポート、キーボードポート、マウスポート等を介して、入出力コントローラ１２２０に接続してよい。 ROM 1230 stores therein a boot program or the like executed by computer 1200 upon activation, and/or a program that depends on the hardware of computer 1200. I/O chip 1240 may also connect various I/O units to I/O controller 1220 via USB ports, parallel ports, serial ports, keyboard ports, mouse ports, etc.

プログラムは、ＤＶＤ－ＲＯＭ１２２７又はＩＣカードのようなコンピュータ可読記憶媒体によって提供される。プログラムは、コンピュータ可読記憶媒体から読み取られ、コンピュータ可読記憶媒体の例でもある記憶装置１２２４、ＲＡＭ１２１４、又はＲＯＭ１２３０にインストールされ、ＣＰＵ１２１２によって実行される。これらのプログラム内に記述される情報処理は、コンピュータ１２００に読み取られ、プログラムと、上記様々なタイプのハードウェアリソースとの間の連携をもたらす。装置又は方法が、コンピュータ１２００の使用に従い情報のオペレーション又は処理を実現することによって構成されてよい。 The programs are provided by a computer-readable storage medium such as a DVD-ROM 1227 or an IC card. The programs are read from the computer-readable storage medium, installed in the storage device 1224, RAM 1214, or ROM 1230, which are also examples of computer-readable storage media, and executed by the CPU 1212. The information processing described in these programs is read by the computer 1200, and brings about cooperation between the programs and the various types of hardware resources described above. An apparatus or method may be constructed by realizing the operation or processing of information according to the use of the computer 1200.

例えば、通信がコンピュータ１２００及び外部デバイス間で実行される場合、ＣＰＵ１２１２は、ＲＡＭ１２１４にロードされた通信プログラムを実行し、通信プログラムに記述された処理に基づいて、通信インタフェース１２２２に対し、通信処理を命令してよい。通信インタフェース１２２２は、ＣＰＵ１２１２の制御の下、ＲＡＭ１２１４、記憶装置１２２４、ＤＶＤ－ＲＯＭ１２２７、又はＩＣカードのような記録媒体内に提供される送信バッファ領域に格納された送信データを読み取り、読み取られた送信データをネットワークに送信し、又はネットワークから受信した受信データを記録媒体上に提供される受信バッファ領域等に書き込む。 For example, when communication is performed between computer 1200 and an external device, CPU 1212 may execute a communication program loaded into RAM 1214 and instruct communication interface 1222 to perform communication processing based on the processing described in the communication program. Under the control of CPU 1212, communication interface 1222 reads transmission data stored in a transmission buffer area provided in RAM 1214, storage device 1224, DVD-ROM 1227, or a recording medium such as an IC card, and transmits the read transmission data to the network, or writes received data received from the network to a reception buffer area or the like provided on the recording medium.

また、ＣＰＵ１２１２は、記憶装置１２２４、ＤＶＤドライブ１２２６（ＤＶＤ－ＲＯＭ１２２７）、ＩＣカード等のような外部記録媒体に格納されたファイル又はデータベースの全部又は必要な部分がＲＡＭ１２１４に読み取られるようにし、ＲＡＭ１２１４上のデータに対し様々なタイプの処理を実行してよい。ＣＰＵ１２１２は次に、処理されたデータを外部記録媒体にライトバックしてよい。 The CPU 1212 may also cause all or a necessary portion of a file or database stored in an external recording medium such as the storage device 1224, DVD drive 1226 (DVD-ROM 1227), IC card, etc. to be read into the RAM 1214, and perform various types of processing on the data on the RAM 1214. The CPU 1212 may then write back the processed data to the external recording medium.

様々なタイプのプログラム、データ、テーブル、及びデータベースのような様々なタイプの情報が記録媒体に格納され、情報処理を受けてよい。ＣＰＵ１２１２は、ＲＡＭ１２１４から読み取られたデータに対し、本開示の随所に記載され、プログラムの命令シーケンスによって指定される様々なタイプのオペレーション、情報処理、条件判断、条件分岐、無条件分岐、情報の検索／置換等を含む、様々なタイプの処理を実行してよく、結果をＲＡＭ１２１４に対しライトバックする。また、ＣＰＵ１２１２は、記録媒体内のファイル、データベース等における情報を検索してよい。例えば、各々が第２の属性の属性値に関連付けられた第１の属性の属性値を有する複数のエントリが記録媒体内に格納される場合、ＣＰＵ１２１２は、当該複数のエントリの中から、第１の属性の属性値が指定されている条件に一致するエントリを検索し、当該エントリ内に格納された第２の属性の属性値を読み取り、それにより予め定められた条件を満たす第１の属性に関連付けられた第２の属性の属性値を取得してよい。 Various types of information, such as various types of programs, data, tables, and databases, may be stored in the recording medium and may undergo information processing. The CPU 1212 may perform various types of processing on the data read from the RAM 1214, including various types of operations, information processing, conditional judgment, conditional branching, unconditional branching, information search/replacement, etc., as described throughout this disclosure and specified by the instruction sequence of the program, and writes back the results to the RAM 1214. The CPU 1212 may also search for information in a file, database, etc. in the recording medium. For example, when multiple entries each having an attribute value of a first attribute associated with an attribute value of a second attribute are stored in the recording medium, the CPU 1212 may search for an entry whose attribute value of the first attribute matches a specified condition from among the multiple entries, read the attribute value of the second attribute stored in the entry, and thereby obtain the attribute value of the second attribute associated with the first attribute that satisfies a predetermined condition.
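
The entry search described at the end of the paragraph above amounts to filtering on a first attribute and reading out the associated second attribute. A minimal sketch follows, in which entries are modeled as `(first, second)` tuples and the condition is a caller-supplied predicate; both modeling choices and the function name are assumptions.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def lookup_second_attribute(
    entries: Iterable[Tuple[A, B]],
    condition: Callable[[A], bool],
) -> List[B]:
    """Return the second-attribute values of entries whose first attribute matches."""
    # Scan every entry, keep those whose first attribute satisfies the condition,
    # and read the associated second attribute.
    return [second for first, second in entries if condition(first)]
```

For instance, with entries pairing a user identifier with a related person, the predicate `lambda uid: uid == "user10"` would retrieve the related persons recorded for that identifier.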

The programs or software modules described above may be stored in a computer-readable storage medium on or near the computer 1200. A recording medium such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet can also be used as the computer-readable storage medium, so that the programs are provided to the computer 1200 via the network.

Each block in the flowcharts and block diagrams of this embodiment may represent a stage of a process in which an operation is performed, or a "unit" of a device responsible for performing the operation. Particular stages and "units" may be implemented by dedicated circuitry, by programmable circuitry supplied with computer-readable instructions stored on a computer-readable storage medium, and/or by a processor supplied with computer-readable instructions stored on a computer-readable storage medium. The dedicated circuitry may include digital and/or analog hardware circuits, and may include integrated circuits (ICs) and/or discrete circuits. The programmable circuitry may include reconfigurable hardware circuits, such as field-programmable gate arrays (FPGAs) and programmable logic arrays (PLAs), that comprise AND, OR, XOR, NAND, NOR, and other logical operations, as well as flip-flops, registers, and memory elements.

A computer-readable storage medium may include any tangible device capable of storing instructions to be executed by a suitable device. As a result, a computer-readable storage medium having instructions stored on it constitutes an article of manufacture that includes instructions executable to create means for performing the operations specified in the flowcharts or block diagrams. Examples of computer-readable storage media may include electronic storage media, magnetic storage media, optical storage media, electromagnetic storage media, and semiconductor storage media. More specific examples may include floppy (registered trademark) disks, diskettes, hard disks, random access memories (RAMs), read-only memories (ROMs), erasable programmable read-only memories (EPROMs or flash memories), electrically erasable programmable read-only memories (EEPROMs), static random access memories (SRAMs), compact disc read-only memories (CD-ROMs), digital versatile discs (DVDs), Blu-ray (registered trademark) discs, memory sticks, and integrated circuit cards.

The computer-readable instructions may include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, JAVA (registered trademark), and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages.

The computer-readable instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, or to programmable circuitry, either locally or via a local area network (LAN) or a wide area network (WAN) such as the Internet, so that the processor or the programmable circuitry executes the computer-readable instructions to create means for performing the operations specified in the flowcharts or block diagrams. Examples of processors include computer processors, processing units, microprocessors, digital signal processors, controllers, and microcontrollers.

The present invention has been described above using embodiments, but the technical scope of the present invention is not limited to the scope described in those embodiments. It is apparent to those skilled in the art that various modifications or improvements can be made to the embodiments. It is also apparent from the claims that forms incorporating such modifications or improvements can be included in the technical scope of the present invention.

It should be noted that the processes, such as operations, procedures, steps, and stages, in the devices, systems, programs, and methods shown in the claims, specification, and drawings may be executed in any order, unless the order is explicitly indicated by expressions such as "before" or "prior to," or unless the output of an earlier process is used in a later process. Even where an operational flow in the claims, specification, or drawings is described using "first," "next," and the like for convenience, this does not mean that the processes must be performed in that order.

The system according to the present invention has been described above mainly in terms of the functions of the robot 100, the communication terminal 100M, and the like, but it need not be implemented in the robot 100 or the communication terminal 100M. The system may be implemented as a general information processing system. For example, the present invention may be implemented as a software program that runs on a server or personal computer, or as an application that runs on a smartphone or similar device. The method according to the present invention may be provided to users in the form of SaaS (Software as a Service). As one example, the present invention can be used to execute a specific process that creates an article based on extracted information, in a case where pre-processing has been performed that classifies multiple pieces of information from one or more information sources and extracts, from the two or more classified pieces of information, the information whose reliability is greater than a specific value.
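The pre-processing and article-creation flow mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the `Item` type, the 0.0 to 1.0 reliability scale, the threshold, and the plain-text "article" format are all hypothetical, since the disclosure does not specify how information is classified, scored, or turned into an article.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Item:
    """Hypothetical piece of collected information."""
    source: str
    category: str
    text: str
    reliability: float  # assumed scale: 0.0 (unreliable) to 1.0 (reliable)


def classify(items: Iterable[Item]) -> Dict[str, List[Item]]:
    """Pre-processing step 1: group the collected information by category."""
    groups: Dict[str, List[Item]] = defaultdict(list)
    for item in items:
        groups[item.category].append(item)
    return dict(groups)


def extract_reliable(groups: Dict[str, List[Item]], threshold: float) -> List[Item]:
    """Pre-processing step 2: keep items whose reliability is strictly greater than the threshold."""
    return [i for group in groups.values() for i in group if i.reliability > threshold]


def draft_article(items: Iterable[Item]) -> str:
    """Specific process: assemble a plain-text draft from the extracted items."""
    return "\n".join(f"[{i.category}] {i.text} (source: {i.source})" for i in items)


items = [
    Item("feed-a", "weather", "Rain expected this evening.", 0.9),
    Item("feed-b", "weather", "Snow tomorrow.", 0.4),
    Item("feed-a", "traffic", "Line 2 is delayed.", 0.8),
]
reliable = extract_reliable(classify(items), threshold=0.7)
article = draft_article(reliable)
```

The strict comparison mirrors the wording "greater than a specific value", so an item whose reliability equals the threshold is not extracted.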

5 System; 10, 11, 12 User; 20 Communication network; 100, 101, 102 Robot; 100M Communication terminal; 100N Stuffed toy; 200 Sensor unit; 201 Microphone; 202 Depth sensor; 203 Camera; 204 Distance sensor; 210 Sensor module unit; 211 Voice emotion recognition unit; 212 Speech understanding unit; 213 Facial expression recognition unit; 214 Face recognition unit; 220 Storage unit; 221 Behavior decision model; 222 History data; 230 State recognition unit; 232 Emotion determination unit; 234 Behavior recognition unit; 236 Behavior decision unit; 238 Memory control unit; 250 Behavior control unit; 252 Control target; 270 Related information collection unit; 280 Communication processing unit; 300 Server; 500, 700 Agent system; 600 Neck strap; 1200 Computer; 1210 Host controller; 1212 CPU; 1214 RAM; 1216 Graphic controller; 1218 Display device; 1220 Input/output controller; 1222 Communication interface; 1224 Storage device; 1226 DVD drive; 1227 DVD-ROM; 1230 ROM; 1240 Input/output chip

Claims

A control system comprising:
a state recognition unit that recognizes a user state, including a user's behavior, and a state of an electronic device;
an emotion determination unit that determines an emotion of the user or an emotion of the electronic device; and
a behavior decision unit that decides, at a predetermined timing, one of a plurality of types of device operation, including no operation, as a behavior of the electronic device, using a behavior decision model and at least one of the user state, the state of the electronic device, the emotion of the user, and the emotion of the electronic device,
wherein the electronic device is detachably attached to a neck strap and includes one or more sensors that collect information, and
the device operation includes setting a first action content that leads the user by causing the electronic device to play at least one of a sound and an image.

The control system according to claim 1, wherein the behavior decision unit detects the user's behavior via the sensors, either autonomously or periodically, and decides on the first action content when it decides, based on the detected behavior of the user and specific information stored in advance, that the behavior of the electronic device is to lead the user.

The control system according to claim 2, wherein the behavior decision unit decides on the first action content when a camera included in the sensors is directed in the direction in which the user is moving.

The control system of claim 2, wherein the first action content includes playing at least one of a sound and an image that assists the user in walking.

The control system according to claim 4, wherein the playing includes playing at least one of a sound and an image that guides the user along a route other than a set guidance route.

The control system according to claim 5, wherein the playing includes playing at least one of a sound and an image that, when the user being led along the set guidance route deviates from the guidance route, guides the user along a route back to the guidance route.

The control system according to claim 4, wherein the behavior decision unit detects the user's behavior while the user is being led, or after at least one of the sound and the image has been played, to determine whether the user's behavior has been corrected, and, if the user's behavior has been corrected, decides on a second action content different from the first action content.

The control system of claim 7, wherein the second action content includes playing at least one of a voice praising the user's action, a voice thanking the user for the user's action, an image praising the user's action, and an image thanking the user for the user's action.

The control system according to claim 4, wherein the behavior decision unit detects the user's behavior while the user is being led, or after at least one of the sound and the image has been played, to determine whether the user's behavior has been corrected, and, if the user's behavior has not been corrected, decides on a third action content different from the first action content.

The control system according to claim 9, wherein the third action content includes at least one of sending specific information to a person other than the user, playing a sound that attracts the user's interest, and playing an image that attracts the user's interest.
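As a non-limiting illustration of the follow-up decision recited in the preceding claims, the sketch below selects among no operation, a first action content (guiding the user back to the guidance route), a second action content (praise or thanks), and a third action content (notifying another person or attracting the user's interest). The `Action` names, the nearest-waypoint distance test, and the 10 m tolerance are assumptions made for this sketch and are not taken from the claims.

```python
import math
from enum import Enum
from typing import Sequence, Tuple

Point = Tuple[float, float]


class Action(Enum):
    NONE = "no operation"
    GUIDE_BACK = "first action content: guide the user back to the guidance route"
    PRAISE = "second action content: praise or thank the user"
    ESCALATE = "third action content: notify another person or attract the user's interest"


def distance_to_route(position: Point, route: Sequence[Point]) -> float:
    """Distance to the nearest waypoint (a simplification of distance to the route itself)."""
    return min(math.dist(position, waypoint) for waypoint in route)


def decide_action(position: Point, route: Sequence[Point],
                  already_guided: bool, tolerance_m: float = 10.0) -> Action:
    """Choose an action from the user's position and whether guidance was already played."""
    off_route = distance_to_route(position, route) > tolerance_m
    if not already_guided:
        # First pass: only intervene when the user has left the guidance route.
        return Action.GUIDE_BACK if off_route else Action.NONE
    # After guidance: check whether the user's behavior has been corrected.
    return Action.ESCALATE if off_route else Action.PRAISE


route = [(0.0, 0.0), (0.0, 50.0), (0.0, 100.0)]
```

For example, `decide_action((30.0, 50.0), route, already_guided=False)` selects the first action content, and calling the function again after guidance selects the second or third action content depending on whether the user has returned to within the tolerance.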

The control system according to claim 1, wherein
the electronic device is a communication terminal, and
the behavior decision unit decides, as the behavior of the communication terminal, one of a plurality of types of agent behaviors for interacting with the user, including taking no action.

The control system according to claim 11, wherein
the behavior decision model is a sentence generation model having a dialogue function, and
the behavior decision unit inputs, into the sentence generation model, text representing at least one of the user state, the agent state, the user's emotion, and the agent's emotion, together with text asking about the agent's behavior, and decides the behavior of the communication terminal based on an output of the sentence generation model.

The control system according to claim 12, wherein
when the emotion of the user determined by the emotion determination unit is a negative emotion, the behavior decision unit inputs text expressing the emotion of the user and text asking what words to speak to the user into the sentence generation model, and decides, based on an output of the sentence generation model, utterance content of the agent to be output from a control target of the communication terminal as the behavior of the communication terminal;
the control system further comprises a behavior control unit that controls the control target so as to execute the behavior of the communication terminal decided by the behavior decision unit;
the behavior decision unit inputs into the sentence generation model text representing a dialogue between the user and the agent, including the utterance content of the agent output from the control target and utterance content of the user as the user state recognized by the state recognition unit, together with text asking how to solve a problem the user has, and decides the utterance content of the agent to be output from the control target based on an output of the sentence generation model; and
the behavior control unit outputs a solution to the problem the user has from the control target as the utterance content of the agent.

The control system according to claim 13, wherein
when the emotion of the user determined by the emotion determination unit is a negative emotion and the user has difficulty interacting with the agent, the behavior decision unit inputs text expressing the user state and the emotion of the user and text asking about a problem the user has into the sentence generation model, and decides, based on an output of the sentence generation model, to transmit, as the behavior of the communication terminal, problem information expressing the problem the user has to a terminal of a related person having a predetermined relationship with the user; and
the control system further comprises a communication processing unit that transmits the problem information to the terminal of the related person.

The control system according to claim 14, wherein
the emotion determination unit determines an emotion of another user around the user based on an image of the other user captured by a camera;
when the emotion of the other user determined by the emotion determination unit is a positive emotion, the behavior decision unit inputs the captured image, text expressing the problem the user has, and text asking what words to speak to the other user into the sentence generation model, and decides the utterance content of the agent to be output from the control target based on an output of the sentence generation model; and
the behavior control unit outputs the problem the user has from the control target to the other user as the utterance content of the agent.

The control system according to claim 13, wherein
when the emotion determination unit determines that the emotion of the user is a negative emotion, the emotion determination unit determines that the emotion of the agent is a negative emotion; and
the behavior control unit outputs the utterance content of the agent from the control target in an output mode based on the negative emotion of the agent determined by the emotion determination unit.
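As a non-limiting illustration of deciding a behavior with a sentence generation model, as recited above, the sketch below builds text describing the states and emotions together with a question about the agent's behavior, and maps the model's reply to one of several candidate behaviors that include taking no action. The prompt wording, the candidate list, and the `fake_model` stub are hypothetical; any real sentence generation model would be supplied by the caller as the `generate` function.

```python
from typing import Callable, Sequence

ASCII_DIGITS = "0123456789"


def build_prompt(user_state: str, agent_state: str, user_emotion: str,
                 agent_emotion: str, candidates: Sequence[str]) -> str:
    """Combine state and emotion text with a question about the agent's behavior."""
    options = "\n".join(f"({n}) {c}" for n, c in enumerate(candidates, start=1))
    return (
        f"User state: {user_state}\n"
        f"Agent state: {agent_state}\n"
        f"User emotion: {user_emotion}\n"
        f"Agent emotion: {agent_emotion}\n"
        "Which one of the following behaviors should the agent take? "
        "Answer with the number only.\n"
        f"{options}"
    )


def decide_behavior(prompt: str, candidates: Sequence[str],
                    generate: Callable[[str], str]) -> str:
    """Ask the model and map its reply to a candidate; fall back to the first (no action)."""
    reply = generate(prompt)
    number = "".join(ch for ch in reply if ch in ASCII_DIGITS)
    if number and 1 <= int(number) <= len(candidates):
        return candidates[int(number) - 1]
    return candidates[0]


# The first candidate is "do nothing", so unparseable replies result in no action.
CANDIDATES = ["do nothing", "speak to the user", "play route guidance"]


def fake_model(prompt: str) -> str:
    """Stand-in for a sentence generation model; always picks option 3."""
    return "(3)"


prompt = build_prompt("walking toward the station", "idle", "anxious", "calm", CANDIDATES)
chosen = decide_behavior(prompt, CANDIDATES, fake_model)
```

Placing "do nothing" first and using it as the fallback is a design choice of this sketch: a reply that cannot be parsed leads to no operation rather than to an unintended behavior.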