JP2018151471A - Dialogue method, dialogue system, dialogue apparatus, and program

Info

Publication number: JP2018151471A
Application number: JP2017046363A
Authority: JP
Inventors: Hiroaki Sugiyama; Hiromi Narimatsu; Yuichiro Yoshikawa; Hiroshi Ishiguro
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Current assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC
Priority date: 2017-03-10
Filing date: 2017-03-10
Publication date: 2018-09-27
Anticipated expiration: 2037-03-10
Also published as: JP6610965B2

Abstract

Problem to be solved: To continue a dialogue with a small number of scenarios regardless of the user's response to a question. Solution: A dialogue system 100 presents to a user 101 a first utterance, which is a question, and a second utterance, which is an utterance related to the first utterance. A humanoid robot 50-1 presents the first utterance. A microphone 11-1 receives a user utterance uttered by the user 101 after the first utterance. The humanoid robot 50-1 presents the second utterance, which is obtained by omitting at least part of the content of the user utterance from an utterance related to the first utterance. Selected drawing: Fig. 1

Description

The present invention relates to technology by which a computer conducts a dialogue with a human in natural language, applicable to, for example, robots that communicate with people.

In recent years, research and development of robots that communicate with people has progressed, and such robots have been put to practical use in various settings. For example, in communication therapy, a robot can serve as a conversation partner for a person who feels lonely. Specifically, in a nursing home for the elderly, a robot that acts as a listener for a resident can ease the resident's loneliness, and the sight of the resident talking with the robot can also spark conversation between the resident and the people around them, such as family members and caregivers. In communication training, a robot can serve as a practice partner. Specifically, at a foreign language school, a robot that acts as a practice partner for learners can make foreign language study more efficient. In an information presentation system, the robots basically let a person listen to a dialogue between robots while occasionally speaking to the person, which draws the person into the dialogue without boring them and presents information in a form that is easy to accept. Specifically, efficient presentation of news, product introductions, trivia and general knowledge, and education (for example, childcare and education for children, liberal arts instruction for adults, and moral awareness raising) can be expected when people have time to spare at meeting spots in town, bus stops, or station platforms, or when they can afford to join a dialogue at home or in a classroom. Furthermore, in an information collection system, a robot can collect information while talking to a person. Because communicating with a robot preserves the sense of having a conversation, information can be collected without giving the person the oppressive feeling of being questioned by another person. Specifically, applications to personal information surveys, market research, product evaluation, and preference surveys for product recommendation are envisaged. Human-robot communication is thus expected to find a variety of applications, and robots that converse with users more naturally are awaited. In addition, with the spread of smartphones, services such as LINE (registered trademark) let multiple users enjoy conversation by chatting in nearly real time. Applying robot conversation technology to such a chat service would make it possible to realize a chat service that converses naturally with a user even when no human chat partner is available.

In this specification, hardware that serves as a user's dialogue partner in these services, such as a robot or a chat partner, and computer software that causes a computer to function as such hardware, are collectively called an agent. Because an agent is a dialogue partner for the user, it may be anthropomorphized or personified, or may have a character or individuality, as a robot or chat partner does.

The key to realizing these services is technology that enables an agent, implemented in hardware or computer software, to converse naturally with a human.

One example of such an agent is a spoken dialogue system, as described in Non-Patent Document 1, that recognizes the user's speech, understands and infers the intention of the utterance, and gives an appropriate response. Research on spoken dialogue systems has advanced actively along with progress in speech recognition technology, and such systems have been put to practical use in, for example, automated voice response systems.

Another example of such an agent is a scenario dialogue system, which converses with a user about a specific topic according to a predetermined scenario. A scenario dialogue system can keep a dialogue going as long as the dialogue unfolds along the scenario. For example, the dialogue system described in Non-Patent Document 2 conducts a dialogue between a user and multiple agents that includes interruptions by agents and exchanges between agents. An agent utters a question prepared in the scenario to the user and, when the user's answer corresponds to one of the options prepared in the scenario, makes the utterance associated with that option. In other words, a scenario dialogue system is a dialogue system in which the agent makes utterances based on scenarios stored in the system in advance. In such a system, when an agent asks the user a question and receives a reply, the agent can pass over the reply with a backchannel response such as "I see" regardless of what the user said, or another agent can interrupt and change the topic. Even if the user's utterance strays from the intended topic, the system can thus respond in a way that keeps the user from feeling that the story has broken down.

Yet another example of such an agent is a chat dialogue system, in which the agent makes utterances that follow the content of the user's utterances so that the user and the agent hold a natural dialogue. For example, the dialogue system described in Non-Patent Document 3 realizes chat between a user and the system by having the system speak according to rules written in advance, triggered by words contained in the user's or the agent's utterances, while giving greater weight to what is specific to the context across multiple turns of the dialogue. The rules used by a chat dialogue system need not be written in advance; they may be generated automatically based on the content of the user's utterance, based on the immediately preceding utterance by the user or the agent or an utterance made near it, or based on utterances that include at least such an utterance. Non-Patent Document 3 describes a technique that generates rules automatically based on words that have a co-occurrence or dependency relation with words contained in the user's utterance. The dialogue system described in Non-Patent Document 4 reduces the cost of rule creation by combining manually written rules with rules produced by a statistical utterance generation method. Unlike a scenario dialogue system, a chat dialogue system does not have the agent speak along a prepared scenario. It therefore avoids the situation in which, depending on what the user says, the agent's utterance fails to correspond to the user's utterance; the agent can make utterances based on at least the content of the user's utterance, the immediately preceding utterance by the user or the agent or an utterance made near it, or utterances that include at least such an utterance. In other words, a chat dialogue system is a dialogue system in which the agent makes utterances on that basis. Such chat dialogue systems can respond explicitly to the user's utterances.

Non-Patent Document 1: Tatsuya Kawahara, "Spoken Dialogue System by Spoken Language", Information Processing, vol. 45, no. 10, pp. 1027-1031, October 2004
Non-Patent Document 2: Tsunehiro Arimoto, Yuichiro Yoshikawa, Hiroshi Ishiguro, "Impression Evaluation of Dialogue without Speech Recognition by Multiple Robots", Annual Conference of the Robotics Society of Japan, 2016
Non-Patent Document 3: Hiroaki Sugiyama, Toyomi Meguro, Ryuichiro Higashinaka, Yasuhiro Minami, "Generation of Response Sentences Using Dependency and Examples for User Utterances with Arbitrary Topics", Transactions of the Japanese Society for Artificial Intelligence, vol. 30(1), pp. 183-194, 2015
Non-Patent Document 4: Toyomi Meguro, Hiroaki Sugiyama, Ryuichiro Higashinaka, Yasuhiro Minami, "Construction of a Dialogue System Based on the Fusion of Rule-Based Utterance Generation and Statistical Utterance Generation", Proceedings of the National Conference of the Japanese Society for Artificial Intelligence, vol. 28, pp. 1-4, 2014

Building dialogue scenarios by hand is enormously costly. It is possible to collect existing resources such as web articles and generate utterance sentences on many topics, but such text is usually not written in a conversational style and is dense with information, so it is often hard to follow in a dialogue.

To help the user understand, a sentence generated in this way can be split into several sentences that multiple agents take turns uttering, but the sense of dialogue diminishes if the user only listens passively. The sense of dialogue could be improved by getting the user to speak, for example by inserting questions addressed to the user. However, because the user's utterances are difficult to control completely, a scenario must be prepared for each anticipated user utterance, and it is hard to reduce the cost substantially.

In view of the above, an object of the present invention is to realize a dialogue system and a dialogue device that can continue a dialogue with a small number of scenarios prepared in advance, regardless of the user's response to a question from the dialogue system.

To solve the above problem, a dialogue method according to a first aspect of the present invention is executed by a dialogue system that presents to a user a first utterance, which is a question, and a second utterance, which is an utterance related to the first utterance. The method includes a first presentation step in which a presentation unit presents the first utterance, an utterance reception step in which an input unit receives a user utterance uttered by the user after the first utterance, and a second presentation step in which the presentation unit presents the second utterance, obtained by omitting at least part of the content of the user utterance from an utterance related to the first utterance.

A dialogue method according to a second aspect of the present invention is executed by a dialogue system that presents to a user a first utterance, which is a question that prompts the user to utter a word belonging to a specific word class, and a second utterance, which is an utterance related to the first utterance. The method includes a first presentation step in which a presentation unit presents the first utterance, an utterance reception step in which an input unit receives a user utterance uttered by the user after the first utterance, a second presentation step in which the presentation unit presents an additional utterance selected from predetermined utterance sentences according to whether the user utterance contains a word belonging to the word class, and a third presentation step in which the presentation unit presents the second utterance.

A dialogue method according to a third aspect of the present invention is executed by a dialogue system that presents to a user a first utterance, which is a question that tests a certain piece of knowledge, and a second utterance, which is an utterance related to the first utterance. The method includes a first presentation step in which a presentation unit presents the first utterance, an utterance reception step in which an input unit receives a user utterance uttered by the user after the first utterance, a second presentation step in which the presentation unit presents an additional utterance selected from predetermined utterance sentences according to whether the user utterance contains a word representing the knowledge, and a third presentation step in which the presentation unit presents the second utterance.

According to the present invention, the dialogue can return to a scenario prepared in advance regardless of what the user says in response to a question presented by the dialogue system. It is therefore possible to realize a dialogue system and a dialogue device that can continue a dialogue with only a small number of prepared scenarios.

Fig. 1 is a diagram illustrating the functional configuration of the dialogue system of the embodiment.
Fig. 2 is a diagram illustrating the processing procedure of the dialogue method of the embodiment.
Fig. 3 is a diagram illustrating the functional configuration of a dialogue system according to a modification.

In the present invention, when the dialogue system conducts a dialogue along a scenario stored in advance, it asks the user a question to improve the sense of dialogue and changes its subsequent utterances according to the content of the user's utterance in response. If what the user says in response to a question is contained in what the dialogue system is about to say next, uttering it unchanged would repeat the same content and give a redundant impression. The dialogue system therefore omits the content of the user's utterance from what it was about to say and presents the resulting utterance. When the word class that the user will utter in response to a question can be anticipated, an utterance for the case where the user's utterance contains a word in that word class and an utterance for the case where it does not are prepared in advance, and what the system says immediately after the user's utterance is switched according to whether the user's utterance contains a word in the word class. Furthermore, when the system asks a question that tests some knowledge, an utterance for the case where the user's utterance contains the correct answer and an utterance for the case where it does not are prepared in advance, and what the system says immediately after the user's utterance is switched according to whether the user's utterance contains the correct answer. Conventionally, a large number of scenarios had to be prepared to cover every possible user response to a question from the dialogue system, and the cost of doing so was enormous. With the above configuration, a small number of scenarios suffices regardless of the content of the user's response, and the cost of preparing scenarios can be reduced.
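
The three behaviours described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the use of case-insensitive substring matching, and the sample phrases are all hypothetical and chosen for the example.

```python
from typing import Iterable, List


def omit_user_content(planned_phrases: List[str], user_utterance: str) -> List[str]:
    """Drop planned phrases whose content the user has already said.

    Assumption: the planned utterance is pre-split into phrases and
    "already said" is approximated by case-insensitive substring match.
    """
    said = user_utterance.lower()
    return [p for p in planned_phrases if p.lower() not in said]


def select_follow_up(user_utterance: str,
                     expected_words: Iterable[str],
                     if_present: str,
                     if_absent: str) -> str:
    """Choose between two prepared utterances.

    The same switch covers both the word-class case (expected_words are the
    members of the word class) and the knowledge-question case
    (expected_words are the words that express the correct answer).
    """
    said = user_utterance.lower()
    hit = any(w.lower() in said for w in expected_words)
    return if_present if hit else if_absent


# Hypothetical usage
remaining = omit_user_content(
    ["Okonomiyaki", "is a savory pancake", "from Osaka"],
    "I think it is from Osaka",
)
reply = select_follow_up("I like ramen", {"ramen", "sushi"},
                         "Nice choice.", "I see.")
```

Because only two follow-up utterances are prepared per question, the number of scenario branches stays fixed no matter how many different answers users give.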

Embodiments of the present invention are described in detail below. In the drawings, components having the same function are given the same reference numerals, and redundant description is omitted.

The dialogue system of the embodiment is a system in which at least one humanoid robot converses with a user; that is, it is an example in which the agent is a humanoid robot. As shown in Fig. 1, the dialogue system 100 includes, for example, a dialogue device 1, an input unit 10 comprising a microphone 11, and a presentation unit 50 having at least a speaker 51. The dialogue device 1 includes, for example, a speech recognition unit 20, an utterance determination unit 30, and a speech synthesis unit 40. The dialogue method of the embodiment is realized by the dialogue system 100 performing the processing of each step described later.
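
The data flow through the dialogue device 1 (speech recognition, utterance determination, speech synthesis) can be pictured as a simple chain of three stages. The sketch below is an assumption-laden illustration: the function `dialogue_turn` and its callable parameters are hypothetical names, and the stubs stand in for real speech recognition and synthesis engines, which the description leaves open.

```python
from typing import Callable


def dialogue_turn(audio: bytes,
                  recognize: Callable[[bytes], str],
                  decide: Callable[[str], str],
                  synthesize: Callable[[str], bytes]) -> bytes:
    """Run one user turn through the three units of the dialogue device.

    recognize  ~ speech recognition unit 20 (audio signal -> text)
    decide     ~ utterance determination unit 30 (user text -> system text)
    synthesize ~ speech synthesis unit 40 (system text -> audio signal)
    """
    user_text = recognize(audio)
    system_text = decide(user_text)
    return synthesize(system_text)


# Hypothetical stubs so the sketch runs without any speech engine.
out = dialogue_turn(
    b"hello",
    recognize=lambda a: a.decode("utf-8"),
    decide=lambda t: "You said " + t,
    synthesize=lambda t: t.encode("utf-8"),
)
```

Passing the stages in as callables mirrors the statement that any existing recognition or synthesis technique may be substituted.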

The dialogue device 1 is, for example, a special device configured by loading a special program into a known or dedicated computer having a central processing unit (CPU), a main storage device (RAM: Random Access Memory), and the like. The dialogue device 1 executes each process under the control of, for example, the central processing unit. Data input to the dialogue device 1 and data obtained in each process are stored in, for example, the main storage device, and the stored data is read out as needed and used in other processes. At least part of each processing unit of the dialogue device 1 may be implemented in hardware such as an integrated circuit.

[Input unit 10]
The input unit 10 may be configured integrally, or partially integrally, with the presentation unit 50. In the example of Fig. 1, microphones 11-1 and 11-2, which are part of the input unit 10, are mounted on the heads (at the ear positions) of humanoid robots 50-1 and 50-2, which constitute the presentation unit 50. In the example of Fig. 1, the presentation unit 50 consists of two humanoid robots 50-1 and 50-2, but it may consist of a single humanoid robot or of three or more humanoid robots.

The input unit 10 is an interface through which the dialogue system 100 acquires the user's utterances. In other words, the input unit 10 is an interface for inputting the user's utterances into the dialogue system 100. For example, the input unit 10 is a microphone 11 that picks up the user's speech and converts it into an audio signal. The microphone 11 only needs to be able to pick up the speech uttered by the user 101. That is, Fig. 1 is only an example, and either of the microphones 11-1 and 11-2 may be omitted. Alternatively, one or more microphones, or a microphone array having multiple microphones, installed at a location other than the humanoid robots 50-1 and 50-2, such as near the user 101, may serve as the input unit, in which case neither microphone 11-1 nor 11-2 need be provided. The microphone 11 outputs the audio signal of the user's speech obtained by the conversion. The audio signal output by the microphone 11 is input to the speech recognition unit 20.

[Speech recognition unit 20]
The speech recognition unit 20 performs speech recognition on the audio signal of the user's speech input from the microphone 11, converts it into text representing the content of the user's utterance, and outputs the text to the utterance determination unit 30. Any existing speech recognition technique may be used, and the one best suited to the usage environment may be selected as appropriate.

[Utterance determination unit 30]
The utterance determination unit 30 determines text representing the content of an utterance from the dialogue system 100 and outputs it to the speech synthesis unit 40. When text representing the content of the user's utterance is input from the speech recognition unit 20, the utterance determination unit 30 determines the text of the system's utterance based on that input text and outputs it to the speech synthesis unit 40. When the presentation unit 50 of the dialogue system 100 consists of multiple humanoid robots, the utterance determination unit 30 may determine which humanoid robot presents the utterance. In that case, it also outputs information identifying the presenting humanoid robot to the speech synthesis unit 40. In that case, the utterance determination unit 30 may also determine the addressee of the utterance, that is, whether the utterance is presented to the user or to one of the humanoid robots, and it then also outputs information identifying the addressee to the speech synthesis unit 40.
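
One way to picture what the utterance determination unit 30 hands to the speech synthesis unit 40 is a small record holding the text plus the two optional pieces of information mentioned above. The class and field names below are hypothetical and serve only to illustrate the description; the identifier strings are likewise made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UtteranceDecision:
    """Output of the utterance determination step (illustrative only)."""
    text: str                        # content of the system utterance
    speaker: Optional[str] = None    # which humanoid robot presents it, if decided
    addressee: Optional[str] = None  # the user or another robot, if decided


# Hypothetical usage: robot 50-1 addresses the user.
decision = UtteranceDecision("What food do you like?",
                             speaker="robot-50-1", addressee="user")
# With a single robot, speaker and addressee can simply be left unset.
minimal = UtteranceDecision("I see.")
```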

[Speech synthesis unit 40]
The speech synthesis unit 40 converts the text representing the utterance content input from the utterance determination unit 30 into an audio signal representing that content and outputs it to the presentation unit 50. Any existing speech synthesis technique may be used, and the one best suited to the usage environment may be selected as appropriate. When the presentation unit 50 of the dialogue system 100 consists of multiple humanoid robots and the utterance determination unit 30 supplies, together with the text, information identifying the humanoid robot that is to present the utterance, the speech synthesis unit 40 outputs the audio signal to the humanoid robot identified by that information. When the utterance determination unit 30 additionally supplies information identifying the addressee of the utterance, the speech synthesis unit 40 outputs both the audio signal and the addressee information to the identified humanoid robot.

[Presentation unit 50]
The presentation unit 50 is an interface for presenting to the user the utterance content determined by the utterance determination unit 30. For example, the presentation unit 50 is a humanoid robot built to resemble a human. The humanoid robot emits the speech corresponding to the audio signal input from the speech synthesis unit 40 from, for example, a speaker 51 mounted on its head; that is, it presents the utterance. The speaker 51 only needs to be able to emit the speech corresponding to the audio signal input from the speech synthesis unit 40. That is, Fig. 1 is only an example, and either of the speakers 51-1 and 51-2 may be omitted. Alternatively, one or more speakers, or a speaker array having multiple speakers, may be installed at a location other than the humanoid robots 50-1 and 50-2, such as near the user 101, in which case neither speaker 51-1 nor 51-2 need be provided. The humanoid robot may also present the utterance content determined by the utterance determination unit 30 to the user through nonverbal behavior such as facial expressions and body movements. For example, it may nod to indicate agreement with the preceding utterance and shake its head to indicate disagreement. When presenting an utterance, the humanoid robot can also turn its face or whole body toward the user or another humanoid robot to show that it is addressing the user or robot it is facing. When the presentation unit 50 is a humanoid robot, one humanoid robot is prepared, for example, for each personality (agent) that takes part in the dialogue. In the following, two humanoid robots 50-1 and 50-2 are assumed to be present as an example in which two personalities take part in the dialogue. When the presentation unit 50 of the dialogue system 100 consists of multiple humanoid robots and the utterance determination unit 30 has determined which humanoid robot is to present the utterance, the humanoid robot 50-1 or 50-2 that receives the audio signal output by the speech synthesis unit 40 presents the utterance. When information identifying the addressee determined by the utterance determination unit 30 is also input, the humanoid robot 50-1 or 50-2 presents the utterance with its face or gaze directed toward the humanoid robot or user identified by that information.

The processing procedure of the dialogue method of the embodiment is described below with reference to FIG. 2.

In step S1, the dialogue system 100 outputs speech representing the content of the first utterance from the speaker 51-1 of the humanoid robot 50-1; that is, it presents the first utterance. The speech representing the content of the first utterance is obtained by the speech synthesis unit 40 converting the text representing the content of the first utterance, determined by the utterance determination unit 30, into a speech signal.

The utterance determination unit 30 determines the text representing the content of the first utterance according to, for example, the content of the utterances up to that point. Any technique used in conventional dialogue systems for determining utterance content from the preceding utterances may be used; for example, the scenario dialogue system described in Non-Patent Document 2 can be used. When the utterance determination unit 30 uses the technique of a scenario dialogue system, it takes, for example, the dialogue comprising roughly the five most recent utterances, selects a scenario for which the inter-word distance between the words and focus words of those utterances and the words and focus words contained in each scenario stored in a storage unit (not shown) in the utterance determination unit 30 is smaller than a predetermined distance, and determines the text representing the content of the first utterance by selecting text contained in the selected scenario.
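The scenario selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of Non-Patent Document 2: utterances are assumed to be already tokenized into word lists, a Jaccard distance between word sets stands in for the unspecified inter-word distance, and the names `select_scenario`, `max_distance`, and `window` are hypothetical.

```python
def jaccard_distance(words_a, words_b):
    """Distance in [0, 1]; 0 means identical word sets (an assumed stand-in metric)."""
    a, b = set(words_a), set(words_b)
    if not a or not b:
        return 1.0
    return 1.0 - len(a & b) / len(a | b)


def select_scenario(recent_utterances, scenarios, max_distance=0.9, window=5):
    """Return the name of the closest scenario, or None if none is close enough.

    recent_utterances: list of tokenized utterances (lists of words), oldest first.
    scenarios: dict mapping a scenario name to its words and focus words.
    """
    # Pool the words of roughly the last five utterances.
    context = [w for utterance in recent_utterances[-window:] for w in utterance]
    best_name, best_distance = None, max_distance
    for name, words in scenarios.items():
        distance = jaccard_distance(context, words)
        if distance < best_distance:  # must be closer than the threshold
            best_name, best_distance = name, distance
    return best_name
```

Returning `None` when no scenario is close enough leaves room for a fallback, such as a chat-style reply, which the caller would have to supply.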

A scenario stored in the storage unit (not shown) in the utterance determination unit 30 consists of multiple utterance sentences generated, for example, by acquiring an existing resource such as an article on the Web, splitting it into sentences, and converting each sentence into a conversational style. To allow a dialogue among multiple agents, a scenario may also have simple utterances, such as backchannel responses, inserted among the generated utterance sentences. To strengthen the sense of dialogue, a scenario includes an utterance sentence that asks the user a question about the topic of the article or about a keyword in the article. Hereinafter, the utterance sentences in a scenario up to and including the question are called the first utterance, and the utterance sentences after the question are called the post-question utterance.

A method of generating the scenarios stored in advance in the storage unit (not shown) in the utterance determination unit 30 is described next. Suppose, for example, that a news article with the following content has been acquired.

A. "A transport plane on a training flight crashed and was wrecked in △△ City, ○○ Prefecture; two people injured"
By splitting this text A, converting it into a conversational style, and including a question, a scenario consisting of, for example, the following utterance sentences A-1 to A-4 is generated. Here, an utterance sentence (A-2) that asks the user a question probing their knowledge of the incident is generated and included in the scenario. In this example, utterance sentences A-1 to A-2 in the scenario are the first utterance, and utterance sentences A-3 to A-4 are the post-question utterance.
A-1. "Hey, in ○○ Prefecture..."
A-2. "Do you know about the transport plane?"
A-3. "It was in △△ City, and it crashed and was wrecked during training, and two people were hurt"
A-4. "That's scary"
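One way to hold such a scenario in memory, with the split into first utterance and post-question utterance, is sketched below. The class name `Scenario` and the field `question_index` are hypothetical; the source does not specify a storage format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    utterances: List[str]   # utterance sentences in spoken order
    question_index: int     # position of the question put to the user

    @property
    def first_utterance(self) -> List[str]:
        # Everything up to and including the question.
        return self.utterances[: self.question_index + 1]

    @property
    def post_question(self) -> List[str]:
        # Everything after the question.
        return self.utterances[self.question_index + 1:]


# Scenario A from the text: A-1 and A-2 form the first utterance.
scenario_a = Scenario(
    utterances=[
        "Hey, in ○○ Prefecture...",
        "Do you know about the transport plane?",
        "It was in △△ City, and it crashed and was wrecked during training, "
        "and two people were hurt",
        "That's scary",
    ],
    question_index=1,
)
```

Storing only the index of the question keeps the two parts derivable from a single ordered list, so the scenario text is never duplicated.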

Suppose also, for example, that a news article with the following content has been acquired.
B. "The long-established film magazine '○○○' announced its annual best ten, and the animated film '×××' took first place among Japanese films, beating strong contenders such as '△△△' and '□□□'"
By splitting this text B, converting it into a conversational style, and including a question, a scenario consisting of, for example, the following utterance sentences B-1-1 to B-1-3 is generated. Here, an utterance sentence (B-1-1) that asks the user a question to be answered with any film title is generated and included in the scenario. In this example, utterance sentence B-1-1 in the scenario is the first utterance, and utterance sentences B-1-2 to B-1-3 are the post-question utterance.
B-1-1. "What films have you seen lately?"
B-1-2. "××× took first place, you know"
B-1-3. "Yeah, that's amazing"

Alternatively, by splitting the above text B, converting it into a conversational style, and including a question, a scenario consisting of the following utterance sentences B-2-1 to B-2-3 is generated. Here, an utterance sentence (B-2-1) that asks the user a question to be answered with a specific film title is generated and included in the scenario. In this example, utterance sentence B-2-1 in the scenario is the first utterance, and utterance sentences B-2-2 to B-2-3 are the post-question utterance.
B-2-1. "This year's best ten Japanese films were announced recently. Do you know what it was?"
B-2-2. "The answer was ×××! It took first place"
B-2-3. "Wow, that's amazing"

In step S2, the microphone 11 receives the utterance made by the user 101 after the first utterance is presented. This utterance is hereinafter called the user utterance. The speech signal representing the user's utterance content acquired by the microphone 11 is input to the speech recognition unit 20. The speech recognition unit 20 performs speech recognition on that signal and outputs the resulting text to the utterance determination unit 30 as text representing the user's utterance content.

In step S3, the utterance determination unit 30 receives the text representing the user's utterance content output by the speech recognition unit 20, determines the text representing the content of the second utterance on the basis of the text representing the content of the first utterance and the text representing the user's utterance content, and outputs it to the speech synthesis unit 40. The second utterance may be a single utterance or multiple utterances. When the presentation unit 50 of the dialogue system 100 consists of multiple humanoid robots, the utterance determination unit 30 may determine which humanoid robot presents the second utterance, in which case it outputs information identifying that humanoid robot together with the text representing the content of the second utterance. The utterance determination unit 30 may also determine the addressee of the second utterance, in which case it outputs information identifying the addressee together with the text representing the content of the second utterance.

The utterance determination unit 30 generates the text representing the content of the second utterance from the text of the post-question utterance, that is, the utterance sentences after the question, in the scenario that contains the first utterance, among the scenarios stored in the storage unit (not shown) in the utterance determination unit 30. In doing so, it selects a method of generating the text representing the content of the second utterance according to the text representing the content of the first utterance and the text representing the user's utterance content. Specifically, it selects from the following methods.

1. When the text representing the content of the first utterance is a question asking for arbitrary information on some topic, and the text representing the user's utterance content contains at least part of the text representing the content of the post-question utterance, the utterance determination unit 30 generates, as the text representing the content of the second utterance, the text of the post-question utterance with all or part of that portion omitted.

2. When the text representing the content of the first utterance is a question asking for arbitrary information belonging to a specific word class on some topic, the utterance determination unit 30 determines whether the text representing the user's utterance content contains information belonging to that word class. It then generates, as the text representing the content of the second utterance, the text of the post-question utterance with the text of an additional utterance added to it. The additional utterance is selected from utterance sentences stored in the storage unit (not shown) in the utterance determination unit 30, that is, from predetermined utterance sentences. The text representing the content of the additional utterance expresses content that shifts the topic from the user's utterance to the post-question utterance.

For example, when the text representing the user's utterance content contains information belonging to that word class, the unit determines the text of an additional utterance that sympathizes with the user, such as "That one's good" or "Speaking of which", together with the text of the post-question utterance. When the text representing the user's utterance content contains no information belonging to that word class, the unit determines the text of an additional utterance containing information belonging to that word class, to be presented by a humanoid robot other than the one that presented the first utterance, together with the text of the post-question utterance.

3. When the text representing the content of the first utterance is a question testing some knowledge (that is, a question that has a correct answer), the utterance determination unit 30 determines whether the text representing the user's utterance content contains the correct answer and, if it does not, whether it contains information belonging to the same word class as the correct answer. It then generates, as the text representing the content of the second utterance, the text of the post-question utterance with the text of an additional utterance added to it. The additional utterance is selected from utterance sentences stored in the storage unit (not shown) in the utterance determination unit 30, that is, from predetermined utterance sentences. The text representing the content of the additional utterance expresses content that shifts the topic from the user's utterance to the post-question utterance.

For example, when the text representing the user's utterance content contains the correct answer to the question, the unit determines the text of an additional utterance indicating that the answer is correct, such as "Yes!" or "You got it!", to be presented by the humanoid robot that presented the first utterance, together with the text of the post-question utterance. When the text representing the user's utterance content contains information that is not the correct answer but belongs to the same word class as the correct answer, the unit determines the text of an additional utterance indicating that the answer is incorrect, such as "Nope!" or "That's not it!", to be presented by the humanoid robot that presented the first utterance, together with the text of the post-question utterance.

Further, for example, when the text representing the user's utterance content contains information belonging to the same word class as the correct answer, the unit determines the text of an additional utterance containing incorrect-answer information, to be presented by a humanoid robot other than the one that presented the first utterance, together with the text of the post-question utterance. When the text representing the user's utterance content contains no information belonging to the same word class as the correct answer, the unit determines the text of an additional utterance containing the correct-answer information, to be presented by a humanoid robot other than the one that presented the first utterance, together with the text of the post-question utterance.
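The choice among the three methods can be summarized in a small dispatcher. The sketch below is an assumption-laden illustration: the question kinds `"open"`, `"word_class"`, and `"quiz"`, the returned labels, and plain substring matching are all hypothetical simplifications. The `"post_question_as_is"` branch in particular is an assumed fallback that the text does not spell out. For quiz questions the sketch follows the variant in which the asking robot confirms or denies the answer.

```python
def choose_strategy(question_kind, user_text, post_question_terms=(),
                    word_class=(), correct=None):
    """Return a label naming how the second utterance would be built.

    question_kind: "open" (any information on a topic), "word_class"
    (any item of a given class), or "quiz" (a question with a correct answer).
    """
    if question_kind == "open":
        # Method 1: omit what the user has already said.
        overlap = any(term in user_text for term in post_question_terms)
        return "omit_overlap" if overlap else "post_question_as_is"

    has_class_item = any(item in user_text for item in word_class)
    if question_kind == "word_class":
        # Method 2: sympathize, or let the other robot supply an item.
        if has_class_item:
            return "sympathize_then_transition"
        return "other_robot_supplies_item"

    if question_kind == "quiz":
        # Method 3: check the correct answer first, then the word class.
        if correct is not None and correct in user_text:
            return "acknowledge_correct"
        if has_class_item:
            return "deny_incorrect"
        return "other_robot_supplies_answer"

    raise ValueError(f"unknown question kind: {question_kind}")
```

A real system would match word-class members using the output of speech recognition and some form of entity lookup rather than raw substrings.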

In step S4, the speech synthesis unit 40 synthesizes speech from the text representing the content of the second utterance output by the utterance determination unit 30, generating speech that represents the content of the second utterance, and outputs it from the speaker 51-1 of the humanoid robot 50-1 or the speaker 51-2 of the humanoid robot 50-2; that is, the second utterance is presented. When information identifying the humanoid robot that is to present the second utterance is input from the utterance determination unit 30 together with the text representing the content of the second utterance, the speech synthesis unit 40 outputs the speech from the speaker 51 of the corresponding humanoid robot 50. When information identifying the addressee of the second utterance is input from the utterance determination unit 30 together with the text, the presentation unit 50 turns the face or whole body of the humanoid robot 50 toward the addressee and outputs the speech representing the content of the second utterance. Although FIG. 2 shows the second utterance being presented in step S4, when the second utterance contains multiple utterances (for example, an additional utterance and a post-question utterance), step S4 is carried out as multiple sub-steps.
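The sub-step behavior of step S4 might be sketched as a loop over the parts of the second utterance. The callbacks `speak` and `turn_toward` are hypothetical stand-ins for the speech synthesis unit 40 and the robot's motion control. The default speaker `"R1"` is also an assumption.

```python
def present_second_utterance(parts, speak, turn_toward=None):
    """Present each part of the second utterance as its own sub-step.

    parts: list of dicts with "text" and optional "robot" and "addressee".
    speak(robot, text): hypothetical callback that synthesizes and plays speech.
    turn_toward(robot, addressee): hypothetical callback that orients the robot.
    """
    for part in parts:
        robot = part.get("robot", "R1")  # assumed default speaker
        addressee = part.get("addressee")
        if addressee is not None and turn_toward is not None:
            turn_toward(robot, addressee)  # face the addressee before speaking
        speak(robot, part["text"])
```

Passing the two actions in as callbacks keeps the ordering logic testable without audio hardware or a robot.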

Thereafter, the dialogue system continues the dialogue with the user by making utterances on the topic of the second utterance. For example, when the post-question utterance is in the middle of a scenario selected by the technique of a scenario dialogue system, the dialogue system outputs from the speaker speech representing the scenario utterances determined by that technique, so that a dialogue following the scenario takes place between the user and the dialogue system. When the post-question utterance is the last utterance of the selected scenario, the dialogue system outputs from the speaker speech representing a chat utterance determined from the user's utterance by the technique of a chat dialogue system. The subsequent utterances may be presented by any one humanoid robot or by multiple humanoid robots.

[Specific examples]
Specific examples of dialogue carried out by the dialogue system of the embodiment are given below. In these examples, R denotes a robot and H denotes the user. The number following R is the identifier of the humanoid robot. t(i) (i = 0, 1, 2, ...) denotes an utterance in the dialogue, where i is a number indicating the order of the utterance. A part of an utterance enclosed in parentheses is omitted when the utterance is presented.

(Specific Example 1-1)
Specific Example 1-1 is an example in which, after presenting a first utterance containing a question asking for arbitrary information on some topic, the system presents a second utterance obtained by omitting from the post-question utterance part of the information contained in the user utterance.
t(1) R1: Hey, in ○○ Prefecture...
t(2) H: Yeah?
t(3) R1: Do you know about the transport plane?
t(4) H: Oh, the one in △△ City
t(5) R2: Right, (it was in △△ City, and) it crashed and was wrecked during training, and two people were hurt
t(6) R1: That's scary

In this example, the dialogue system 100 first presents the first utterances t(1) and t(3) on the topic of text A above. The user may speak between t(1) and t(3), so that a user utterance t(2) falls between them. Next, the dialogue system 100 acquires the user utterance t(4), the user's response to t(3), which is the last of the first utterances and is a question asking for arbitrary information on the topic of text A. Because the user utterance t(4) contains the information "△△ City", the dialogue system 100 then presents the second utterance t(5), in which the part "it was in △△ City, and" is omitted from the post-question utterance. This avoids repeating "△△ City", which the user has already stated as known background, and so avoids giving an impression of redundancy.
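The omission in Specific Example 1-1 can be sketched as below, assuming the post-question utterance has been split in advance into clauses tagged with key terms. The clause segmentation, the key terms, and the name `omit_known_parts` are illustrative assumptions, as is the safeguard against returning an empty utterance.

```python
def omit_known_parts(clauses, user_text):
    """Drop clauses of the post-question utterance the user has already covered.

    clauses: list of (key_terms, clause_text) pairs, in spoken order.
    A clause is dropped when any of its key terms occurs in user_text.
    """
    kept = [text for keys, text in clauses
            if not any(key in user_text for key in keys)]
    if not kept:  # assumed safeguard: never return an empty utterance
        kept = [text for _, text in clauses]
    return "".join(kept)


# Post-question utterance A-3, segmented by hand for this sketch.
post_question = [
    (["△△ City"], "it was in △△ City, and "),
    ([], "it "),
    (["crash"], "crashed and "),
    ([], "was wrecked during training, and two people were hurt"),
]
```

With the user utterance "Oh, the one in △△ City", only the location clause is dropped and the rest of A-3 is presented unchanged.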

(Specific Example 1-2)
Specific Example 1-2 is an example in which, after presenting a first utterance containing a question asking for arbitrary information on some topic, the system presents a second utterance obtained by omitting from the post-question utterance part of the information contained in the user utterance.
t(1) R1: Hey, in ○○ Prefecture...
t(2) H: Yeah?
t(3) R1: Do you know about the transport plane?
t(4) H: Oh, it crashed, right?
t(5) R2: Right, it was in △△ City, and it (crashed and) was wrecked during training, and two people were hurt
t(6) R1: That's scary

In this example, the dialogue system 100 first presents the first utterances t(1) and t(3), which ask for arbitrary information on the topic of text A above. The user may speak between t(1) and t(3), so that a user utterance t(2) falls between them. Next, the dialogue system 100 acquires the user utterance t(4), the user's response to t(3), which is the last of the first utterances and is a question asking for arbitrary information on the topic of text A. Because the user utterance t(4) contains the information "crashed", the dialogue system 100 then presents the second utterance t(5), in which the part "crashed and" is omitted from the post-question utterance.

(Specific Example 2-1)
Specific Example 2-1 is an example in which, after presenting a first utterance that is a question asking for arbitrary information belonging to a specific word class on some topic, the system determines that the user utterance contains information belonging to that word class and presents a second utterance comprising an additional utterance that sympathizes with the user and the post-question utterance that follows it.
t(1) R1: What films have you seen lately?
t(2) H: I saw □□□
t(3) R2: That one's popular, isn't it
t(4) R1: Speaking of which, ××× took first place, you know
t(5) R2: Yeah, that's amazing

In this example, the dialogue system 100 first presents the first utterance t(1), a question on the topic of text B above to which the user can be expected to reply with a word belonging to the film-title class. Next, the dialogue system 100 acquires the user utterance t(2), the user's response to the first utterance t(1). Because the user utterance t(2) contains the film title "□□□", the dialogue system 100 then presents the additional utterance t(3), which sympathizes with the user, such as "That one's popular, isn't it". The dialogue system 100 further presents the post-question utterance t(4), to which a topic-shifting phrase such as "Speaking of which" has been added. In this way the dialogue can be brought back to the same topic whatever film title the user says, while the user is kept from feeling that their utterance was ignored.

(Specific Example 2-2)
Specific Example 2-2 is an example in which, after presenting a first utterance that is a question asking for arbitrary information belonging to a specific word class on some topic, the system determines that the user utterance contains no information belonging to that word class and presents a second utterance comprising an additional utterance containing information belonging to the word class and the post-question utterance that follows it.
t(1) R1: What films have you seen lately?
t(2) H: I haven't seen any films lately
t(3) R2: I saw □□□
t(4) R1: That one's popular, isn't it. Speaking of which, ××× took first place, you know
t(5) R2: Yeah, that's amazing

In this example, the dialogue system 100 first presents the first utterance t(1), a question on the topic of text B above to which the user can be expected to reply with a word belonging to the film-title class. Next, the dialogue system 100 acquires the user utterance t(2), the user's response to the first utterance t(1). Because the user utterance t(2) contains no film title, the humanoid robot R2, rather than the humanoid robot R1 that asked question t(1), presents the additional utterance t(3) containing the film title "□□□". The humanoid robots then exchange the post-question utterances t(4) and t(5), which follow on from the additional utterance t(3), thereby returning the dialogue to the original topic.
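Specific Examples 2-1 and 2-2 differ only in whether the user named a film title, which suggests a two-branch builder like the sketch below. The function name, the fixed English phrases, and the substring test for class membership are illustrative assumptions. R1 is assumed to be the robot that asked the question.

```python
def build_class_question_reply(user_text, word_class, fallback_item, post_question):
    """Return the second utterance as a list of (robot, text) turns.

    Assumes R1 asked the question and R2 is the other humanoid robot.
    """
    if any(item in user_text for item in word_class):
        # The user named an item of the class: sympathize, then shift topic.
        return [
            ("R2", "That one's popular"),
            ("R1", "Speaking of which, " + post_question),
        ]
    # Otherwise the other robot supplies an item so the scenario can proceed.
    return [
        ("R2", "I saw " + fallback_item),
        ("R1", "That one's popular. Speaking of which, " + post_question),
    ]
```

Both branches end with the same post-question utterance, which is what lets a single scenario absorb any reply from the user.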

(Specific Example 3-1)
Specific Example 3-1 is an example in which, after presenting a first utterance that is a question with a correct answer, the system determines that the user utterance contains the correct answer and presents a second utterance comprising an additional utterance containing an incorrect answer and the post-question utterance that follows it.
t(1) R1: This year's best ten Japanese films were announced recently. Do you know what it was?
t(2) H: It's ×××, right?
t(3) R2: Maybe □□□?
t(4) R1: The answer was ×××! It took first place
t(5) R2: Wow, that's amazing

In this example, the dialogue system 100 first presents the first utterance t(1), a question on the topic of text B above asking for the film title selected in the best ten Japanese films. Next, the dialogue system 100 acquires the user utterance t(2), the user's response to the first utterance t(1). Because the user utterance t(2) contains the correct film title "×××", the humanoid robot R2, rather than the humanoid robot R1 that presented the first utterance t(1), presents the additional utterance t(3) containing the incorrect film title "□□□". The humanoid robots then exchange the post-question utterances t(4) and t(5), which follow on from the additional utterance t(3), thereby returning the dialogue to the original topic. If the post-question utterance t(4) revealing the answer were presented immediately after the user utterance t(2) containing the correct answer, the user would get the impression that their utterance t(2) had been ignored. To avoid this, the humanoid robot R2 deliberately presents the incorrect additional utterance t(3).

(Specific Example 3-2)
Specific Example 3-2 is an example in which, after presenting a first utterance that is a question with a correct answer, the system determines that the user utterance does not include the correct answer and presents a second utterance consisting of an additional utterance that includes a word of the same word class as the correct answer and post-question utterances that follow the additional utterance.
t(1) R1: This year's top ten Japanese movies were announced recently. Do you know what it was?
t(2) H: It's □□□, right?
t(3) R2: Is it □□□?
t(4) R1: The correct answer was ×××! It was ranked first.
t(5) R2: Wow, that's amazing.

In this example, the dialogue system 100 first presents a first utterance t(1), a question on the topic of sentence B above that asks for the name of the movie selected in the Japanese movie top ten. Next, the dialogue system 100 acquires the user utterance t(2), which is the user's response to the first utterance t(1). Because the user utterance t(2) includes the incorrect movie name “□□□”, the dialogue system 100 then has the humanoid robot R2, which is not the humanoid robot R1 that presented the first utterance t(1), present an additional utterance t(3) that includes the incorrect movie name “□□□”. The dialogue system 100 then returns to the original topic by having the humanoid robots exchange the post-question utterances t(4) and t(5) concerning the additional utterance t(3). In this case, the additional utterance t(3) of the humanoid robot R2 may be either correct or incorrect, as long as it belongs to the word class of movie names.

(Specific Example 3-3)
Specific Example 3-3 is an example in which, after presenting a first utterance that is a question with a correct answer, the system determines that the user utterance does not include the correct answer but does include information belonging to the same word class as the correct answer, and presents a second utterance consisting of an additional utterance indicating that the answer is incorrect and post-question utterances that follow the additional utterance.
t(1) R1: This year's top ten Japanese movies were announced recently. Do you know what it was?
t(2) H: It's □□□, right?
t(3) R1: No, the correct answer was ×××! It was ranked first.
t(4) R2: Wow, that's amazing.

In this example, the dialogue system 100 first presents a first utterance t(1), a question on the topic of sentence B above that asks for the name of the movie selected in the Japanese movie top ten. Next, the dialogue system 100 acquires the user utterance t(2), which is the user's response to the first utterance t(1). Because the user utterance t(2) includes the incorrect movie name “□□□”, the humanoid robot R1 that presented the first utterance t(1) then presents the post-question utterance t(3), which reveals the correct answer, prefixed with an additional utterance such as “No” indicating that the answer is incorrect. If the user utterance t(2) had instead included the correct movie name, prefixing an additional utterance such as “Well done!” indicating that the answer is correct would avoid giving a redundant impression even though the post-question utterance t(3) reveals the correct answer.
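The acknowledgement described in Specific Example 3-3 can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `reveal_with_acknowledgement`, the substring test for the correct answer, and the placeholder titles “Film A” and “Film B” are hypothetical and do not come from the patent.

```python
def reveal_with_acknowledgement(user_utterance: str, correct_answer: str) -> str:
    """Build R1's utterance: an acknowledgement followed by the reveal.

    Assumption: the correct answer is detected by plain substring matching.
    """
    if correct_answer in user_utterance:
        # The user was right, so confirm before repeating the answer.
        acknowledgement = "Well done!"
    else:
        # The user was wrong (or gave no answer), so signal that first.
        acknowledgement = "No."
    return f"{acknowledgement} The correct answer was {correct_answer}! It was ranked first."


if __name__ == "__main__":
    print(reveal_with_acknowledgement("It's Film B, right?", "Film A"))
    print(reveal_with_acknowledgement("It's Film A, right?", "Film A"))
```

Only the short prefix changes with the user's answer, so the reveal itself can stay a fixed scenario sentence.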

(Specific Example 3-4)
Specific Example 3-4 is an example in which, after presenting a first utterance that is a question with a correct answer, the system determines that the user utterance includes neither the correct answer nor any information belonging to the same word class as the correct answer, and presents a second utterance consisting of an additional utterance that includes a word of the same word class as the correct answer and post-question utterances that follow the additional utterance.
t(1) R1: This year's top ten Japanese movies were announced recently. Do you know what it was?
t(2) H: I haven't watched any movies lately, so I don't know.
t(3) R2: Is it □□□?
t(4) R1: The correct answer was ×××! It was ranked first.
t(5) R2: Wow, that's amazing.

In this example, the dialogue system 100 first presents a first utterance t(1), a question on the topic of sentence B above that asks for the name of the movie selected in the Japanese movie top ten. Next, the dialogue system 100 acquires the user utterance t(2), which is the user's response to the first utterance t(1). Because the user utterance t(2) includes no word belonging to the word class of movie names, the dialogue system 100 then has the humanoid robot R2, which is not the humanoid robot R1 that presented the first utterance t(1), present an additional utterance t(3) that includes the incorrect movie name “□□□”. The dialogue system 100 then returns to the original topic by having the humanoid robots exchange the post-question utterances t(4) and t(5) concerning the additional utterance t(3). The additional utterance t(3) presented by the humanoid robot R2 may instead include the correct movie name, in which case an additional utterance such as “That's right!” indicating that the answer is correct may be prefixed to the utterance of the humanoid robot R1.
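The branching across Specific Examples 3-1, 3-2 and 3-4 can be sketched as follows. This is a minimal illustration, not the implementation described in the patent: the names `AnswerType`, `classify_user_utterance` and `build_second_utterance`, the substring matching, and the placeholder titles are all assumptions made for the example. The sketch follows the variant in which robot R2 offers a guess from the same word class before robot R1 reveals the answer.

```python
from enum import Enum


class AnswerType(Enum):
    CORRECT = "correct"              # user utterance contains the correct answer
    SAME_CLASS = "same_class"        # contains another word of the same word class
    NO_CLASS_WORD = "no_class_word"  # contains no word of that word class


def classify_user_utterance(user_utterance, correct_answer, word_class):
    """Classify the user utterance by substring matching (an assumption)."""
    if correct_answer in user_utterance:
        return AnswerType.CORRECT
    if any(word in user_utterance for word in word_class):
        return AnswerType.SAME_CLASS
    return AnswerType.NO_CLASS_WORD


def build_second_utterance(user_utterance, correct_answer, word_class, guess):
    """Return the classification and the (speaker, text) turns that follow.

    The first turn is the additional utterance; the rest are the
    post-question utterances. R2 always offers a guess, so R1's reveal
    responds to R2 and the same scenario works for any user reply.
    """
    kind = classify_user_utterance(user_utterance, correct_answer, word_class)
    additional = ("R2", f"Is it {guess}?")
    post_question = [
        ("R1", f"The correct answer was {correct_answer}! It was ranked first."),
        ("R2", "Wow, that's amazing."),
    ]
    return kind, [additional] + post_question


if __name__ == "__main__":
    movies = ["Film A", "Film B", "Film C"]  # hypothetical word class of movie names
    kind, turns = build_second_utterance("It's Film A, right?", "Film A", movies, "Film B")
    print(kind)
    for speaker, text in turns:
        print(f"{speaker}: {text}")
```

In this sketch the classification is returned but does not change the turns, which mirrors the point of these examples: one scenario covers a correct answer, a wrong answer, and no answer at all.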

[Modification]
The embodiment described above uses humanoid robots as agents and conducts the dialogue by voice, but the presentation unit of that embodiment may be either a humanoid robot having a body or a robot without one. The dialogue technique of the present invention is not limited to these forms: the dialogue may also be conducted using an agent that, unlike a humanoid robot, has no physical body and no vocalization mechanism. One such form is a dialogue conducted using an agent displayed on a computer screen. More specifically, the technique can be applied to a group chat in which multiple accounts converse by text message, such as “LINE” (registered trademark) or “2channel” (registered trademark), with the user's account conversing with the accounts of the dialogue apparatus. In this form, the computer whose screen displays the agent must be near the person, but that computer and the dialogue apparatus may be connected via a network such as the Internet. In other words, this dialogue system is applicable not only to dialogues in which speakers such as a person and a robot actually talk face to face, but also to conversations in which the speakers communicate over a network.

As shown in FIG. 3, the dialogue system 200 according to the modification consists of, for example, a single dialogue apparatus 2. The dialogue apparatus 2 according to the modification includes, for example, an input unit 10, a speech recognition unit 20, an utterance determination unit 30, and a presentation unit 50. The dialogue apparatus 2 may also include, for example, a microphone 11 and a speaker 51.

The dialogue apparatus 2 according to the modification is an information processing apparatus, for example a mobile terminal such as a smartphone or tablet, or a desktop or laptop personal computer. The following description assumes that the dialogue apparatus 2 is a smartphone. The presentation unit 50 is the liquid crystal display of the smartphone. A chat application window is displayed on this display, and the conversation in a group chat is shown in the window in chronological order. Group chat is a function in which multiple accounts post text messages to one another to carry on a conversation. A plurality of virtual accounts, each corresponding to a virtual personality controlled by the dialogue apparatus 2, and the user's account are assumed to participate in this group chat. In other words, this modification is an example in which the agents are virtual accounts displayed on the liquid crystal display of the smartphone serving as the dialogue apparatus. Using a software keyboard, the user can enter utterance content into the input unit 10, which is an input area provided in the group chat window, and post it to the group chat through their own account. The utterance determination unit 30 determines the utterance content from the dialogue apparatus 2 based on posts from the user's account and posts it to the group chat through the virtual accounts. Alternatively, the microphone 11 and the speech recognition function of the smartphone may be used so that the user enters utterance content into the input unit 10 by voice. Likewise, the speaker 51 and the speech synthesis function of the smartphone may be used to output the utterance content obtained from each dialogue system from the speaker 51 in a voice corresponding to each virtual account.
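The group chat arrangement can be pictured with a small sketch. Everything here is an assumption made for illustration: the `GroupChat` class, the `decide_utterances` stand-in for the utterance determination unit 30, the account names, and the fixed replies are hypothetical and are not taken from the patent or from any real chat service API.

```python
from dataclasses import dataclass, field


@dataclass
class GroupChat:
    """Chronological log of (account, text) posts shown in the chat window."""
    log: list = field(default_factory=list)

    def post(self, account: str, text: str) -> None:
        self.log.append((account, text))


def decide_utterances(user_text: str) -> list:
    """Hypothetical stand-in for the utterance determination unit 30.

    A real unit would choose replies based on user_text; this stub
    returns a fixed scenario so the posting flow can be shown.
    """
    return [
        ("R2", "Is it Film B?"),
        ("R1", "The correct answer was Film A! It was ranked first."),
    ]


def on_user_post(chat: GroupChat, user_account: str, user_text: str) -> None:
    """Record the user's post, then post each reply through its virtual account."""
    chat.post(user_account, user_text)
    for account, text in decide_utterances(user_text):
        chat.post(account, text)


if __name__ == "__main__":
    chat = GroupChat()
    chat.post("R1", "This year's top ten Japanese movies were announced. Do you know what it was?")
    on_user_post(chat, "user", "I don't know.")
    for account, text in chat.log:
        print(f"{account}: {text}")
```

Keeping the virtual accounts as plain labels on posts reflects that a single dialogue apparatus controls all of them.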

Although embodiments of the present invention have been described above, the specific configuration is not limited to these embodiments, and it goes without saying that design changes made as appropriate without departing from the spirit of the invention are included in the invention. Except for the order of the utterances presented by the presentation unit, the various processes described in the embodiments may be executed not only in time series in the order described but also in parallel or individually, according to the processing capability of the apparatus that executes them or as needed.

[Program, recording medium]
When the various processing functions of each apparatus described in the above embodiments are implemented by a computer, the processing content of the functions that each apparatus should have is described by a program. Executing this program on a computer implements the various processing functions of each apparatus on that computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time the program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are implemented solely through execution instructions and retrieval of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the apparatus is configured by executing a predetermined program on a computer, but at least part of this processing content may instead be implemented in hardware.

1, 2 Utterance device
10 Input unit
11 Microphone
20 Speech recognition unit
30 Utterance determination unit
40 Speech synthesis unit
50 Presentation unit
51 Speaker
100, 200 Dialogue system
101 User

Claims

A dialogue method executed by a dialogue system that presents to a user a first utterance that is a question and a second utterance that is an utterance related to the first utterance, the dialogue method including:
a first presentation step in which a presentation unit presents the first utterance;
an utterance receiving step in which an input unit receives a user utterance uttered by the user after the first utterance; and
a second presentation step in which the presentation unit presents the second utterance, in which at least a part of the content of the user utterance is omitted from the utterance related to the first utterance.

A dialogue method executed by a dialogue system that presents to a user a first utterance that is a question prompting the user to utter a word belonging to a specific word class and a second utterance that is an utterance related to the first utterance, the dialogue method including:
a first presentation step in which a presentation unit presents the first utterance;
an utterance receiving step in which an input unit receives a user utterance uttered by the user after the first utterance;
a second presentation step in which the presentation unit presents an additional utterance selected from predetermined utterance sentences according to whether or not the user utterance includes a word belonging to the word class; and
a third presentation step in which the presentation unit presents the second utterance.

A dialogue method executed by a dialogue system that presents to a user a first utterance that is a question asking about certain knowledge and a second utterance that is an utterance related to the first utterance, the dialogue method including:
a first presentation step in which a presentation unit presents the first utterance;
an utterance receiving step in which an input unit receives a user utterance uttered by the user after the first utterance;
a second presentation step in which the presentation unit presents an additional utterance selected from predetermined utterance sentences according to whether or not the user utterance includes a word representing the knowledge; and
a third presentation step in which the presentation unit presents the second utterance.

The dialogue method according to claim 3,
wherein, when the user utterance does not include a word representing the knowledge, the second presentation step selects and presents the additional utterance according to whether or not the user utterance includes a word belonging to the same word class as the word representing the knowledge.

A dialogue system that presents to a user a first utterance that is a question and a second utterance that is an utterance related to the first utterance, the dialogue system including:
an input unit that receives a user utterance uttered by the user after the first utterance;
an utterance determination unit that determines the first utterance and the second utterance, in which at least a part of the content of the user utterance is omitted from the utterance related to the first utterance; and
a presentation unit that presents the first utterance and, after the user utterance is received, presents the second utterance.

A dialogue system that presents to a user a first utterance that is a question prompting the user to utter a word belonging to a specific word class and a second utterance that is an utterance related to the first utterance, the dialogue system including:
an input unit that receives a user utterance uttered by the user after the first utterance;
an utterance determination unit that determines the first utterance, an additional utterance selected from predetermined utterance sentences according to whether or not the user utterance includes a word belonging to the word class, and the second utterance; and
a presentation unit that presents the first utterance, presents the additional utterance after the user utterance is received, and presents the second utterance.

A dialogue system that presents to a user a first utterance that is a question asking about certain knowledge and a second utterance that is an utterance related to the first utterance, the dialogue system including:
an input unit that receives a user utterance uttered by the user after the first utterance;
an utterance determination unit that determines the first utterance, an additional utterance selected from predetermined utterance sentences according to whether or not the user utterance includes a word representing the knowledge, and the second utterance; and
a presentation unit that presents the first utterance, presents the additional utterance after the user utterance is received, and presents the second utterance.

A dialogue apparatus that determines utterances to be presented by a dialogue system including at least an input unit that receives a user's utterance and a presentation unit that presents utterances,
the dialogue apparatus including an utterance determination unit that determines a first utterance that is a question and a second utterance that is an utterance related to the first utterance and in which at least a part of the content of a user utterance uttered by the user after the first utterance is omitted.

A dialogue apparatus that determines utterances to be presented by a dialogue system including at least an input unit that receives a user's utterance and a presentation unit that presents utterances,
the dialogue apparatus including an utterance determination unit that determines a first utterance that is a question prompting the user to utter a word belonging to a specific word class, an additional utterance selected from predetermined utterance sentences according to whether or not the user utterance includes a word belonging to the word class, and a second utterance.

A dialogue apparatus that determines utterances to be presented by a dialogue system including at least an input unit that receives a user's utterance and a presentation unit that presents utterances,
the dialogue apparatus including an utterance determination unit that determines a first utterance that is a question asking about certain knowledge, an additional utterance selected from predetermined utterance sentences according to whether or not the user utterance includes a word representing the knowledge, and a second utterance.

A program for causing a computer to execute each step of the dialogue method according to claim 1.

A program for causing a computer to function as the dialogue apparatus according to claim 8.