JP2023149322A

JP2023149322A - Method for recovering from failure in interaction and computer program

Info

Publication number: JP2023149322A
Application number: JP2022057831A
Authority: JP
Inventors: Satoshi Funayama (船山 智); Kurima Sakai (境 くりま)
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2022-03-31
Filing date: 2022-03-31
Publication date: 2023-10-13

Abstract

[Problem] To enable a robot to recover naturally when it fails to communicate with a human. [Solution] The recovery method has two parts. A failure detection step 366 detects a failure in communication with a dialogue partner. Steps 370 to 394 respond to the detected failure: they control the robot to conduct a dialogue with the partner that involves the expression of emotion, and they use the information obtained in that dialogue to recover. The failure detection step includes selectively performing one of two processes, according to the reliability of the identification process. The first process makes a first utterance that confirms the identification result with a first attitude. The second process makes a second utterance that starts a procedure for identifying the dialogue partner with a second attitude that appears less confident. The failure detection step also includes controlling the robot to make a third utterance that starts the identification procedure with the second attitude, in response to the partner's reply to the first utterance indicating that the identification result is wrong. [Selected figure] FIG. 3

Description

The present invention relates to robot control technology, and in particular to improving the way humanoid robots communicate with people.

Robot technology has advanced remarkably in recent years, and robots can now perform many tasks that were previously difficult. In particular, robots are expected to take over human tasks in daily life and to carry out tasks that require communication with people. Research and development on robots that blend into society and engage in everyday activities involving people is therefore very active.

Robots that perform everyday activities need dialogue capabilities, and various robots said to have dialogue functions have been developed. A dialogue is, originally, an exchange of words between two people facing each other. It is therefore considered desirable that a robot acting as a person's conversation partner be close to human form. Humanoid robots, whose appearance closely resembles that of humans, pursue exactly this kind of human likeness. Hereinafter, a humanoid robot is simply referred to as a robot.

Published literature shows that, in human-robot communication, many people cannot tell a robot from a human if they observe its behavior for only a short time. Robots are therefore considered capable of becoming communication media that can interact closely with people in society.

One problem that arises when such robots converse with people is utterance collision, in which the utterances of the two parties overlap. Utterance collisions also occur frequently in conversations between people, who resolve them flexibly in various ways depending on the situation. A robot conversation partner lacks this flexibility. As a result, a conversation between a person and a robot can differ considerably from a conversation between two people.

Patent Document 1, listed below, makes one proposal related to this problem. The technique it discloses estimates who will speak next in a conversation with multiple participants. It bases the estimate on information about changes in the shape of each speaker's mouth. Such a method might prevent utterance collisions in settings such as remote conferences.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2018-77791

The technique of Patent Document 1 may be effective for estimating the next speaker in a conversation between people. It does not, however, solve the problem of a robot's own utterances colliding with those of its conversation partner when the robot takes part in a conversation. Moreover, because the partner is human, collisions between the robot's utterances and the person's can never be ruled out. The technique of Patent Document 1 also cannot address how to resolve such a collision without making the partner feel uncomfortable.

In short, these problems amount to one question: how can a robot that has failed to communicate with a person recover from the failure without making that person feel uncomfortable? For example, if the robot cannot recognize its partner at the start of a conversation, or recognizes the partner incorrectly, the conversation between them cannot proceed. In such cases, too, the robot must recover from the communication failure naturally.

It is therefore an object of the present invention to provide a method for recovering from a failure in dialogue, and a computer program, that allow a robot to recover naturally when it fails to communicate with a person.

A method for recovering from a failure in dialogue according to a first aspect of the present invention includes the following steps:

- a failure detection step, in which a computer detects a failure in communication between a robot and a dialogue partner; and
- a recovery step, in which the computer responds to a failure detected in the failure detection step. Following a predetermined procedure, it controls the robot to conduct a dialogue with the partner that involves the expression of emotion, and it uses information obtained in that dialogue to recover from the failure.

Preferably, the failure detection step includes the following steps:

- The computer performs one of two processes, depending on whether the reliability of the process for identifying the dialogue partner is higher than a predetermined threshold. In the first process, it controls the robot to make a first utterance that confirms the identification result with the partner, using a first attitude prepared in advance. In the second process, it controls the robot to make a second utterance that starts a procedure for identifying the partner, using a second attitude prepared in advance to appear less confident than the first.
- If the partner's reply to the first utterance indicates that the identification result is wrong, the computer controls the robot to make a third utterance that starts the identification procedure, using the second attitude.
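The selective branch above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The attitude labels, the sample wording, and the helper names `opening_action` and `after_first_reply` are hypothetical. The sketch compares against the threshold strictly ("higher than"). It also reuses the second utterance as the third, following the preferred variant in which the two are the same.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical attitude labels. The text only says both attitudes are
# prepared in advance and that the second one looks less confident.
FIRST_ATTITUDE = "confident"
SECOND_ATTITUDE = "unsure"

# Hypothetical wording; the second utterance asks whether this is a first meeting.
SECOND_UTTERANCE = "Excuse me, is this the first time we have met?"


@dataclass
class RobotAction:
    utterance: str
    attitude: str


def opening_action(identified_name: str, confidence: float,
                   threshold: float) -> RobotAction:
    """Choose between the first and the second utterance."""
    if confidence > threshold:
        # First utterance: confirm the identification result with confidence.
        return RobotAction(f"Hello, you are {identified_name}, right?", FIRST_ATTITUDE)
    # Second utterance: start the identification procedure, looking less sure.
    return RobotAction(SECOND_UTTERANCE, SECOND_ATTITUDE)


def after_first_reply(identification_confirmed: bool) -> Optional[RobotAction]:
    """React to the partner's reply to the first utterance."""
    if identification_confirmed:
        # Identification was right: the caller continues with the acquaintance scenario.
        return None
    # Third utterance: identical to the second here, as in the preferred variant.
    return RobotAction(SECOND_UTTERANCE, SECOND_ATTITUDE)
```

A confidence exactly equal to the threshold falls through to the less confident second utterance. Under this sketch, a borderline recognition is never asserted confidently.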

More preferably, the recovery step includes the following steps:

- If the partner's reply to the first utterance indicates that the identification result is correct, the computer classifies the partner as an acquaintance of the robot.
- The computer controls the robot to start a dialogue that follows a scenario prepared in advance for dialogue with acquaintances.

More preferably, the second utterance and the third utterance are the same utterance.

Preferably, the second utterance is one in which the robot asks whether the partner is meeting the robot for the first time.

More preferably, the recovery step further includes the following steps:

- The computer determines whether the partner's reply to the second utterance affirms that this is the partner's first meeting with the robot.
- If the reply in the determining step is negative, the computer controls the robot to make a fourth utterance that asks whether the partner is the person identified by the identification process. The robot makes this utterance with a third attitude, prepared in advance to appear even less confident than the second attitude.
- If the partner's reply to the fourth utterance is affirmative, the computer classifies the partner as an acquaintance of the robot. It then controls the robot to start the dialogue with a fourth attitude, prepared in advance to appear relieved.
- If the partner's reply to the fourth utterance is negative, the computer controls the robot to display a fifth attitude, prepared in advance to appear disappointed, and to perform an additional identification process.
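The branch on the reply to the "first meeting?" question can be sketched as two small functions. The attitude labels and the wording are hypothetical. The paragraph does not say what happens when the partner confirms a first meeting, so this sketch returns `None` in that case and leaves it to the caller.

```python
from typing import Optional, Tuple

# Hypothetical labels for the attitudes described in the text.
THIRD_ATTITUDE = "even_less_confident"  # used for the fourth utterance
FOURTH_ATTITUDE = "relieved"
FIFTH_ATTITUDE = "disappointed"


def fourth_utterance(first_meeting_affirmed: bool,
                     identified_name: str) -> Optional[Tuple[str, str]]:
    """Return (utterance, attitude) for the fourth utterance, or None.

    Only the case where the partner denies a first meeting is covered here.
    """
    if first_meeting_affirmed:
        return None  # not specified in this paragraph; left to the caller
    # Ask, even more hesitantly, whether the partner is the identified person.
    return (f"Um... might you be {identified_name}?", THIRD_ATTITUDE)


def after_fourth_reply(affirmative: bool) -> Tuple[str, str]:
    """Map the reply to the fourth utterance to (next step, attitude)."""
    if affirmative:
        # Acquaintance: start the dialogue looking relieved.
        return ("acquaintance_dialogue", FOURTH_ATTITUDE)
    # Look disappointed and run the additional identification process.
    return ("additional_identification", FIFTH_ATTITUDE)
```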

More preferably, the additional identification process includes the following steps:

- The computer controls the robot to utter a question asking the partner's name.
- The computer generates a determination result. It determines whether the name in the partner's reply matches the name of a person registered in a person information database prepared in advance.
- If the determination result is affirmative, the computer classifies the partner as an acquaintance of the robot. It then controls the robot to start a dialogue following the scenario for dialogue with acquaintances, while displaying a fifth attitude prepared in advance to appear happy.
- If the determination result is negative, the computer classifies the partner as a person unknown to the robot. It then controls the robot to start a dialogue with the partner following a scenario prepared in advance for dialogue with unknown persons.

Preferably, the additional identification process includes the following steps:

- The computer controls the robot to utter a question asking the partner's name.
- The computer generates a determination result. It determines whether the name in the partner's reply matches the name of a person registered in the person information database prepared in advance.
- If the determination result is affirmative, the computer checks whether the partner is the same person as the one registered in the person information database. According to the result of that check, it classifies the partner as either an acquaintance of the robot or a person unknown to the robot.
- If the partner is classified as an acquaintance, the computer controls the robot to start a dialogue following the scenario for dialogue with acquaintances, while displaying the fifth attitude prepared in advance to appear happy.
- If the determination result is negative, or if the partner is classified as unknown, the computer controls the robot to start a dialogue with the partner following the scenario prepared in advance for dialogue with unknown persons.
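The additional identification process of the two preceding paragraphs can be sketched as one function, under these assumptions:

- The person information database is a dictionary from a hypothetical person identifier to a record with a `"name"` field.
- Names are compared by exact string match after trimming whitespace. A real system would probably need more tolerant matching of recognized speech.
- The optional `confirm_same_person` callback stands in for the extra same-person check of the second variant. With the default callback, any name match counts as an acquaintance, as in the first variant.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical schema: person identifier -> {"name": ..., "affiliation": ...}
PersonDB = Dict[str, Dict[str, str]]


def additional_identification(
    reply_name: str,
    person_db: PersonDB,
    confirm_same_person: Callable[[str], bool] = lambda person_id: True,
) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (classification, person_id, attitude) after asking the partner's name."""
    name = reply_name.strip()
    for person_id, record in person_db.items():
        # Several registered people may share a name; check each candidate.
        if record.get("name") == name and confirm_same_person(person_id):
            # Acquaintance: start the acquaintance scenario looking happy.
            return ("acquaintance", person_id, "happy")
    # No match, or not confirmed: start the scenario for unknown persons.
    return ("unknown", None, None)
```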

More preferably, the recovery step further includes the following steps:

- If the partner's reply to the fourth utterance is affirmative, the computer controls the robot to make a fifth utterance for identifying the partner.
- The computer generates a determination result. It determines whether the information identifying the partner in the reply to the fifth utterance matches the result of the identification process.
- If the determination result is affirmative, the computer controls the robot to make a sixth utterance that confirms the partner is an acquaintance of the robot.
- If the partner's reply to the sixth utterance is affirmative, the computer classifies the partner as an acquaintance of the robot. It then controls the robot to start a dialogue following the scenario for dialogue with acquaintances, while displaying the fifth attitude prepared in advance to appear happy.

More preferably, the recovery step further includes a step for the case where the determination result is negative. In that case, the computer classifies the partner as a person unknown to the robot and controls the robot to start a dialogue with the partner following the scenario prepared in advance for dialogue with unknown persons.

Preferably, the recovery step further includes a step for the case where the partner's reply to the sixth utterance is negative. In that case, the computer classifies the partner as a person unknown to the robot and controls the robot to start a dialogue with the partner following the scenario prepared in advance for dialogue with unknown persons.
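The check using the fifth and sixth utterances, together with its two negative branches, can be sketched as follows. The sketch starts after the partner has answered the fourth utterance affirmatively. The callbacks `ask_identity` and `ask_is_acquaintance` are hypothetical stand-ins for making an utterance and interpreting the partner's reply. The returned strings are illustrative scenario names.

```python
from typing import Callable


def verify_with_questions(identified_id: str,
                          ask_identity: Callable[[], str],
                          ask_is_acquaintance: Callable[[], bool]) -> str:
    """Return the scenario to start after the fifth and sixth utterances.

    ask_identity: makes the fifth utterance and returns the person identifier
        that the partner's reply resolves to (hypothetical helper).
    ask_is_acquaintance: makes the sixth utterance and returns True when the
        partner's reply is affirmative (hypothetical helper).
    """
    if ask_identity() != identified_id:
        # Reply does not match the identification result: treat as unknown.
        return "unknown_person_scenario"
    if not ask_is_acquaintance():
        # Negative reply to the sixth utterance: treat as unknown.
        return "unknown_person_scenario"
    # Affirmative: classify as acquaintance and start looking happy (fifth attitude).
    return "acquaintance_scenario"
```

The sixth utterance is only made when the fifth reply matches the identification result. A mismatch therefore ends the check without asking a further question.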

A computer program according to a second aspect of the present invention causes a computer to execute any of the methods described above.

The above and other objects, features, aspects, and advantages of the present invention will become apparent from the following detailed description of the invention, read in conjunction with the accompanying drawings.

FIG. 1 is a block diagram showing the hardware configuration of a robot system according to a first embodiment of the present invention.
FIG. 2 is a schematic diagram showing an example of the description format of a program that the robot system shown in FIG. 1 executes to control a robot.
FIG. 3 is a flowchart showing the control structure of a program that the robot system shown in FIG. 1 executes when starting a conversation with a dialogue partner.
FIG. 4 is a schematic diagram showing patterns of utterance collisions for which the robot is responsible.
FIG. 5 is a schematic diagram showing patterns of utterance collisions for which the person is responsible.
FIG. 6 is a diagram showing the timing at which an utterance collision for which the robot is responsible is detected.
FIG. 7 is a diagram showing the timing at which an utterance collision for which the person is responsible is detected.
FIG. 8 is a flowchart showing the control structure of a program that the robot system according to the first embodiment executes when it detects an utterance collision for which the robot is responsible.
FIG. 9 is a flowchart showing the control structure of a program that the robot system according to the first embodiment executes when it detects an utterance collision for which the dialogue partner is responsible.
FIG. 10 is a flowchart showing the control structure of a program that the robot system according to the first embodiment executes to detect and classify utterance collisions.
FIG. 11 is a diagram explaining the range, within an utterance in progress, that is excluded from being treated as an utterance collision.
FIG. 12 is a flowchart showing the control structure of a program that the robot system according to a second embodiment executes when starting a conversation with a dialogue partner.
FIG. 13 is a flowchart showing the control structure of a program that the robot system according to a third embodiment executes when starting a conversation with a dialogue partner.
FIG. 14 is a flowchart showing the control structure of a program that the robot system according to a fourth embodiment executes to detect and classify utterance collisions.
FIG. 15 is an external view of an example of a computer that implements the robot system according to each embodiment.
FIG. 16 is a block diagram of the example computer shown in FIG. 15.

In the following description and drawings, identical components bear the same reference numerals, so their detailed description is not repeated.

First Embodiment
1. Configuration
FIG. 1 is a block diagram of the hardware configuration of a robot system 100, according to the first embodiment of the present invention, that communicates with people. Referring to FIG. 1, the robot system 100 includes a camera 60, a microphone 66, a speaker 62, and a robot 110. The robot 110 is a humanoid robot. It has actuators at least at the parts corresponding to the joints of its upper body, and it can take various postures by driving these actuators. The head of the robot 110 also has a plurality of actuators that control the positions of control points defined on the robot's face. Driving these actuators gives the robot various facial expressions.

The robot system 100 further includes two recognition computers:

- a face image recognition PC (personal computer) 116, connected to receive the output of the camera 60, which performs face recognition on images of people in the camera video; and
- a speech recognition PC 118, which receives the signal from the microphone 66 and performs speech recognition on the dialogue partner's utterances.

The face image recognition PC 116 uses a neural network to output information indicating which of a number of predetermined people an input face image belongs to. More specifically, the network has one output for each recognizable person. For each person, it outputs the likelihood that the input face image shows that person. The face image recognition PC 116 takes the person with the highest likelihood as the recognition result and outputs information corresponding to that person's identifier. In the following description, the likelihood of the person selected as the recognition result is called the confidence of face image recognition. Besides the recognized person, the face image recognition PC 116 also outputs the identifiers of a predetermined number of other people with high likelihoods, together with those likelihoods.
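The sketch below shows how the face image recognition PC 116 could reduce per-person likelihoods to a recognition result. The likelihood values are taken as given, since how the network produces them is not specified here. `RecognitionResult`, `summarize`, and `n_alternatives` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RecognitionResult:
    person_id: str                          # identifier of the most likely person
    confidence: float                       # that person's likelihood (the "confidence")
    alternatives: List[Tuple[str, float]]   # next most likely people, best first


def summarize(likelihoods: Dict[str, float], n_alternatives: int = 3) -> RecognitionResult:
    """Turn per-person likelihoods from the network into a recognition result."""
    if not likelihoods:
        raise ValueError("no recognizable people")
    ranked = sorted(likelihoods.items(), key=lambda item: item[1], reverse=True)
    best_id, best_score = ranked[0]
    return RecognitionResult(best_id, best_score, ranked[1:1 + n_alternatives])
```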

The robot system 100 further includes a network 114 and a motion control PC 112. The face image recognition PC 116 and the speech recognition PC 118 are both connected to the network 114. The motion control PC 112, also connected to the network 114, controls each actuator of the robot 110 according to the control commands it receives, and so controls the robot's posture, motion, and facial expression. A number of motions and facial expressions of the robot 110 are defined in advance, and programs that produce them are prepared in advance. The expressions include confident, unconfident, happy, and disappointed expressions, among others. The motion control PC 112 passes control information, such as the duration and magnitude of a motion, to the corresponding program as arguments. It thereby controls the actuators so that the robot 110 performs the desired motion with the desired expression.
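The sketch below shows duration and magnitude being passed as arguments to expression programs prepared in advance. The actuator names, the normalized 0-to-1 position scale, and the gains in each program are assumptions for illustration only. In practice these values would come from calibration with human subjects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ActuatorCommand:
    actuator: str       # hypothetical actuator name
    position: float     # normalized target position, 0..1 (assumption)
    duration_s: float   # how long the motion should take


def happy(duration_s: float, magnitude: float) -> List[ActuatorCommand]:
    return [ActuatorCommand("mouth_corner_left", 0.8 * magnitude, duration_s),
            ActuatorCommand("mouth_corner_right", 0.8 * magnitude, duration_s)]


def unconfident(duration_s: float, magnitude: float) -> List[ActuatorCommand]:
    return [ActuatorCommand("inner_brow", 0.6 * magnitude, duration_s),
            ActuatorCommand("head_tilt", 0.3 * magnitude, duration_s)]


# Programs prepared in advance, one per expression.
EXPRESSION_PROGRAMS: Dict[str, Callable[[float, float], List[ActuatorCommand]]] = {
    "happy": happy,
    "unconfident": unconfident,
}


def perform(expression: str, duration_s: float, magnitude: float) -> List[ActuatorCommand]:
    """Run the prepared program for an expression with the given arguments."""
    if expression not in EXPRESSION_PROGRAMS:
        raise KeyError(f"no program prepared for {expression!r}")
    if not 0.0 <= magnitude <= 1.0:
        raise ValueError("magnitude must be between 0 and 1")
    return EXPRESSION_PROGRAMS[expression](duration_s, magnitude)
```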

Each facial expression is calibrated in advance. The robot 110 displays expressions with various parameter settings, and a number of subjects are surveyed about which emotion the robot 110 appears to feel in each. The parameters for each expression can then be determined from the survey results.
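One way to turn the survey results into a choice of parameters is to pick, for each target emotion, the candidate parameter set that the largest fraction of subjects labeled with that emotion. That selection rule is an assumption; the text only says the parameters are determined from the survey results. The parameter-set names and emotion labels below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def choose_parameters(survey: Dict[str, List[str]], target_emotion: str) -> Tuple[str, float]:
    """Pick the parameter set most often perceived as target_emotion.

    survey maps a candidate parameter-set name to the emotion labels that
    individual subjects assigned to the robot's expression under that set.
    Returns (parameter-set name, agreement rate).
    """
    best_name, best_rate = None, -1.0
    for name, answers in survey.items():
        if not answers:
            continue  # no responses collected for this candidate
        rate = Counter(answers)[target_emotion] / len(answers)
        if rate > best_rate:
            best_name, best_rate = name, rate
    if best_name is None:
        raise ValueError("survey contains no responses")
    return best_name, best_rate
```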

The robot system 100 further includes the following components:

- a person information DB (database) 92, which manages information on the people to be recognized by the face image recognition PC 116 (name, affiliation, and so on), keyed by each person's identifier;
- an integrated control PC 122, connected to the network 114 and the person information DB 92, which controls the overall behavior of the robot 110 by computing its motions and utterances from information supplied by the other PCs on the network 114; and
- a speech synthesis PC 120, connected to the network 114, which responds to utterance commands from the integrated control PC 122 by synthesizing the specified utterance and supplying it as an audio signal to the speaker 62.

The integrated control PC 122 stores, in a storage device (not shown), a program that makes the robot 110 act according to a predetermined scenario. In this embodiment, as described later, this program is written as a script expressed in graph form.

The face image recognition PC 116, speech recognition PC 118, speech synthesis PC 120, integrated control PC 122, and motion control PC 112 thus work together to control the robot 110 so that it communicates with people.

The format of the scenarios stored in the storage device (not shown) of the integrated control PC 122 is described with reference to FIG. 2. In this embodiment, a scenario is described in graph form, like the graph 150 shown in FIG. 2. FIG. 2 only illustrates the description format. It is not a scenario for making the robot 110 perform the operations described below.

The graph 150 in FIG. 2 consists of blocks, each representing a unit of robot behavior, and directed edges connecting the blocks. The graph 150 includes a start block 160, an utterance block 162, a question block 164, and a speech recognition block 166. These blocks are connected in series by directed edges and are executed in order along the edges. Each block, here and in the blocks described later, contains the information the robot needs to perform one coherent set of actions. This information includes what the robot should say in that block, what action it should take, and the emotional state in which it should perform that action. When the robot reaches a block, it acts according to the information written in that block.

After the speech recognition block 166, graph 150 further includes a facial expression block 168 and a motion block 170, which are connected in parallel with each other. Both are followed by an end block 172. In speech recognition block 166, the robot 110 recognizes the dialogue partner's spoken answer to the question asked in block 164. Depending on the result, it then executes block 168 or block 170 with different parameters. In this example, the questions asked in question block 164 are basically yes/no questions, or questions whose answer category can be predicted (for example, asking for the partner's name). In speech recognition block 166, the partner's answer is classified as yes, no, don't know, can't say either way, no answer, and so on.

When the answer is yes or no, block 168 is executed with that content as its parameter. For example, the robot makes a happy expression if the answer matches the answer it predicted, and a surprised expression if it does not. When the answer is "I don't know" or "I can't say either way," or when there is no answer, the corresponding parameters are passed to block 170 and block 170 is executed. In block 170, the robot moves its hands or head according to the parameters.
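The block graph and the answer-dependent dispatch after block 166 can be sketched in Python as follows. This is a minimal illustration under assumptions. The `Block` fields, block names, utterance texts, and the keyword-based `classify` function are hypothetical stand-ins. They are not the data format or the speech recognizer used by the robot 110.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Answer(Enum):
    YES = "yes"
    NO = "no"
    DONT_KNOW = "dont_know"
    CANT_SAY = "cant_say"
    NO_ANSWER = "no_answer"


@dataclass
class Block:
    """One unit of robot behavior (hypothetical schema, not the patent's format)."""
    name: str
    kind: str                 # start / utterance / question / asr / expression / motion / end
    text: str = ""            # what the robot says in this block, if anything
    emotion: str = "neutral"  # emotional state while performing the block
    next: List[str] = field(default_factory=list)  # outgoing directed edges


def make_graph_150() -> Dict[str, Block]:
    blocks = [
        Block("start_160", "start", next=["utterance_162"]),
        Block("utterance_162", "utterance", text="Hello!", emotion="happy", next=["question_164"]),
        Block("question_164", "question", text="Is this your first visit?", next=["asr_166"]),
        Block("asr_166", "asr", next=["expression_168", "motion_170"]),
        Block("expression_168", "expression", next=["end_172"]),
        Block("motion_170", "motion", next=["end_172"]),
        Block("end_172", "end"),
    ]
    return {b.name: b for b in blocks}


def serial_path(graph: Dict[str, Block], start: str = "start_160") -> List[str]:
    """Follow the single outgoing edge of each block until a branch or the end."""
    path = [start]
    while len(graph[path[-1]].next) == 1:
        path.append(graph[path[-1]].next[0])
    return path


def classify(transcript: str) -> Answer:
    """Toy keyword classifier standing in for the answer classification in block 166."""
    t = transcript.strip().lower()
    if not t:
        return Answer.NO_ANSWER
    if t in ("yes", "yeah", "right"):
        return Answer.YES
    if t in ("no", "nope"):
        return Answer.NO
    if "don't know" in t:
        return Answer.DONT_KNOW
    return Answer.CANT_SAY


def after_recognition(answer: Answer, predicted: Answer) -> Tuple[str, str]:
    """Pick block 168 or block 170 and the parameter passed to it."""
    if answer in (Answer.YES, Answer.NO):
        # Expression block 168: happy if the robot predicted the answer, surprised otherwise.
        return ("expression_168", "happy" if answer == predicted else "surprised")
    # Motion block 170: hand or head movement chosen by the kind of non-answer.
    return ("motion_170", answer.value)
```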

The following embodiments assume that the robot 110 serves, for example, as a receptionist for visitors to a research institute. They describe how the robot recovers when communication with a visitor fails during that work. Communication with a visitor as the dialogue partner begins with identifying that partner. As examples of recovery from communication failures, the embodiments described below cover two cases. The first is recovery when the robot fails to identify the dialogue partner. The second is recovery when an utterance collision occurs between the partner and the robot during the dialogue.

A. Recovery from Identification Failure
FIG. 3 shows the control structure of a program that the robot 110 executes when it fails to identify its dialogue partner. The program recovers from the failure according to a predetermined procedure. As described above, the dialogue partner is identified by the face image recognition PC 116 shown in FIG. 1. The face image recognition PC 116 notifies the integrated control PC 122 of three things: the identifier of the identified partner, its confidence level, and the identifiers of several other people who might be the partner, although with lower confidence.

Referring to FIG. 3, the program includes a step 360 of receiving the face image recognition result. It also includes a step 362 of branching the flow of control according to whether the received confidence exceeds a predetermined threshold. This threshold cannot be fixed in general, because it depends on how accurately the face image recognition PC 116 recognizes face images. It is desirable to tune it based on the results of actual face image recognition.
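The specification leaves the tuning procedure open. One simple possibility is offered here purely as an assumption, not as the method of this embodiment. It logs (confidence, correct?) pairs from real face image recognition and picks the lowest threshold at which identifications above it are correct often enough. The `tune_threshold` function and its `target_precision` parameter are hypothetical. The comparison uses "strictly greater than," matching step 362.

```python
from typing import List, Optional, Tuple


def tune_threshold(samples: List[Tuple[float, bool]], target_precision: float) -> Optional[float]:
    """Return the lowest threshold t such that results with confidence > t are
    correct at least target_precision of the time, or None if no t achieves it.

    samples: (confidence, was_identification_correct) pairs from logged runs.
    """
    # Candidate thresholds: 0.0 plus every observed confidence value.
    candidates = sorted({0.0} | {c for c, _ in samples})
    for t in candidates:
        accepted = [ok for c, ok in samples if c > t]  # results that would pass step 362
        if accepted and sum(accepted) / len(accepted) >= target_precision:
            return t
    return None
```

A stricter `target_precision` pushes the threshold up. More visitors are then greeted through the hesitant path of step 370 instead of the confident confirmation of step 364.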

The program further includes a step 364, executed in response to an affirmative determination in step 362. In this step, the program reads information about the person corresponding to the identified partner's identifier from the person information DB 92. It then confirms the partner's name by speaking the name contained in that information with a confident attitude (facial expression). The program also includes a step 366 of recognizing the partner's spoken response to the utterance in step 364, determining whether the response is affirmative, and branching the flow of control according to the result. An affirmative determination in step 366 means that identification succeeded and that the partner is an acquaintance of the robot. Therefore, in step 368, the robot 110 classifies the partner as an acquaintance. It then starts a dialogue following a scenario prepared in advance for dialogues with acquaintances. At this point, the robot 110 is preferably controlled to greet the partner with a smile.

The program further includes a step 370 of uttering a question that asks whether this is the partner's first meeting with the robot 110. Step 370 is executed in two cases. The first is when the determination in step 362 is negative, that is, the confidence is at or below the threshold. The second is when the determination in step 366 is negative, that is, identification was wrong and the partner is not the identified person. In step 370 the robot asks its question with an attitude that appears less confident than when the confidence exceeds the threshold. The program also includes a step 372 of recognizing the partner's response to the question in step 370 and branching the flow of control according to whether the response is affirmative. An affirmative determination in step 372 means that the partner and the robot 110 are meeting for the first time. A negative one means that they should have interacted before.

The program further includes a step 374, executed in response to a negative determination in step 372. In this step, the robot asks the partner to confirm whether the name of the person identified by face image recognition in step 360 is the partner's name. The program also includes a step 376 of recognizing the partner's response to that utterance and branching the flow of control according to whether the response is affirmative or negative. An affirmative determination in step 376 means that the initial identification result was correct after all. Control therefore proceeds to step 368, where the partner is classified as an acquaintance and the robot starts a dialogue for acquaintances. In step 374, the robot is preferably made to look unsure. Seen from the partner's side, the robot then appears to be guessing who the partner is. This has the effect of making the partner sense the humanoid robot's intelligence.

The program further includes a step 378, executed in response to a negative determination in step 376, of controlling the robot 110 to ask the partner for his or her name. It also includes a step 380 of recognizing the partner's response and determining whether the partner's name in the recognition result exists in the person information DB 92. Step 380 then branches the flow of control accordingly. If the determination in step 380 is affirmative, the person is an acquaintance of the robot 110, and control proceeds to step 368. If it is negative, the person should be an acquaintance of the robot 110, but the person information DB 92 holds no information about him or her. In that case, in step 382 and the subsequent processing, the program controls the robot 110 to gather information about the partner while showing an apologetic expression.

The program further includes a step 384, executed when the determination in step 372 is affirmative, of controlling the robot 110 to ask the partner for his or her name. It also includes a step 386, which obtains the partner's name by speech recognition of the response. Step 386 determines whether that name matches the name of any of the several people listed as candidates in the face image recognition of step 360, and branches the flow of control according to the result.

If the determination in step 386 is negative, control proceeds to step 394. There the computer classifies the partner as someone met for the first time (an unknown person). The computer then controls the robot 110 to conduct a dialogue following a script prepared in advance for first meetings.

The program further includes a step 388, executed in response to an affirmative determination in step 386. In this step, the robot 110 says, with a slightly unsure expression, something to the effect that it feels it has met the partner before. The program also includes a step 390 of branching the flow of control according to whether the partner's response to this utterance is affirmative. Finally, it includes a step 392, executed when the determination in step 390 is affirmative. Step 392 controls the robot 110 to show a happy expression and classifies the partner as an acquaintance of the robot 110. After step 392, as after step 368, the robot 110 is controlled to conduct a dialogue with an acquaintance. If the determination in step 390 is negative, control proceeds to step 394, where the partner is classified as someone met for the first time.
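The branching in steps 360 to 394 can be traced with a short Python sketch. It assumes a hypothetical partner interface: `ask_yes_no` returns the recognized yes/no answer to an utterance, and `ask_name` returns a recognized name. The utterance wordings, attitude labels, and the `acquaintance_not_in_db` outcome used for step 382 are illustrative only.

```python
from typing import Callable, Dict, List, Tuple


def identification_recovery(
    top_id: str,                       # identifier returned by face recognition (step 360)
    confidence: float,                 # its confidence
    candidates: List[str],             # lower-confidence candidate identifiers
    person_db: Dict[str, str],         # id -> name; stand-in for person information DB 92
    threshold: float,
    ask_yes_no: Callable[[str], bool], # hypothetical: speak, then recognize a yes/no answer
    ask_name: Callable[[str], str],    # hypothetical: speak, then recognize a name
) -> Tuple[str, List[str]]:
    """Walk the FIG. 3 flow; returns (partner category, log of robot utterances)."""
    log: List[str] = []

    def say(attitude: str, text: str) -> str:
        log.append(f"[{attitude}] {text}")
        return text

    top_name = person_db.get(top_id, "")
    known_names = set(person_db.values())

    # Steps 362-366: confirm confidently only when confidence exceeds the threshold.
    if confidence > threshold:
        if ask_yes_no(say("confident", f"You're {top_name}, aren't you?")):
            return "acquaintance", log                                    # step 368
    # Steps 370-372: hesitant question about a first meeting.
    first_meeting = ask_yes_no(say("unsure", "Is this the first time we've met?"))
    if not first_meeting:
        # Steps 374-376: double-check the face recognition result.
        if ask_yes_no(say("unsure", f"Um... might you be {top_name}?")):
            return "acquaintance", log                                    # step 368
        # Steps 378-380: ask for the name and look it up.
        name = ask_name(say("unsure", "Could you tell me your name?"))
        if name in known_names:
            return "acquaintance", log                                    # step 368
        say("apologetic", "I'm sorry, could you tell me a bit about yourself?")  # step 382
        return "acquaintance_not_in_db", log
    # Steps 384-386: ask the name and compare it with the lower-confidence candidates.
    name = ask_name(say("neutral", "Could you tell me your name?"))
    candidate_names = {person_db[c] for c in candidates if c in person_db}
    if name in candidate_names:
        # Steps 388-390: gently suggest a past meeting.
        if ask_yes_no(say("slightly unsure", "I have a feeling we've met before...?")):
            say("happy", f"Nice to see you again, {name}!")              # step 392
            return "acquaintance", log
    return "first_meeting", log                                           # step 394
```

As in FIG. 3, a "no" to the confident confirmation in step 366 falls through to the hesitant question of step 370 rather than ending the exchange.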

In step 372, the partner has stated that this is the first meeting with the robot 110. Steps 388 and 390 nevertheless check whether the partner has met the robot 110 before. The reason is that some people who are not recognized find it too troublesome to explain to a robot that they have met before, and so they simply do not. Including steps such as 388 and 390 is expected to make the partner feel that the robot 110 remembered him or her, and thus to feel closer to the robot 110.

In this way, when the robot fails to identify its dialogue partner, it conducts a dialogue with the partner that involves expressing emotions. Through that dialogue it draws from the partner the information needed to recover from the failure. Based on that information, the robot recovers from the failure naturally and can start a dialogue suited to the partner's category.

B. Recovery from Utterance Collision
A common communication failure in dialogue is the utterance collision. A dialogue between people proceeds as speaker and listener take turns, and the period during which one participant is the speaker is that participant's turn. Turns normally change hands naturally, which is said to be because some turn-taking rule holds between speaker and listener. Under some conditions, however, turn-taking fails and the two participants begin speaking at almost the same time. This is an utterance collision.

The two parties are thought to share a tacit understanding that the speaker holds the right to speak until the turn changes. This right is sometimes called the speaking right. In this specification it is referred to as the floor.

An utterance collision at a turn change is considered mainly a failure on the listener's side. This embodiment considers two kinds of utterance collision: collisions for which the robot is responsible, and collisions for which the dialogue partner is responsible. FIG. 4 illustrates the former and FIG. 5 the latter.

Referring to FIG. 4, an utterance collision for which the robot is responsible is described. During the partner's turn 400, the partner makes utterance 410. Still holding the turn, the partner then pauses briefly before attempting the next utterance 412. The robot mistakes this pause for the end of the partner's turn and starts utterance 414. The beginning of utterance 414 then overlaps in time with the beginning of utterance 412, producing utterance collision 416.

Referring to FIG. 5, an utterance collision for which the partner is responsible is the reverse of the situation in FIG. 4. Specifically, during turn 430 the robot makes utterance 440. Still holding the turn, it starts the next utterance 442 after a brief pause. The partner mistakes this pause for the end of the robot's turn and starts utterance 444. The beginnings of utterances 442 and 444 then overlap in time, producing utterance collision 446.
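The onset overlap in FIGS. 4 and 5 reduces to interval arithmetic. The sketch below assumes that utterances are available as start and end times in milliseconds. The `onset_window_ms` parameter is an illustrative assumption. So is the rule in `responsible_party`, which follows the observation that collisions are mainly the listener's failure: the speaker who did not hold the turn is treated as responsible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    speaker: str   # "robot" or "partner"
    start_ms: int
    end_ms: int


def overlap_ms(a: Utterance, b: Utterance) -> int:
    """Length of the interval during which both utterances are in progress."""
    return max(0, min(a.end_ms, b.end_ms) - max(a.start_ms, b.start_ms))


def onsets_collide(a: Utterance, b: Utterance, onset_window_ms: int) -> bool:
    """True if the utterances overlap and began within onset_window_ms of each other."""
    return overlap_ms(a, b) > 0 and abs(a.start_ms - b.start_ms) <= onset_window_ms


def responsible_party(turn_holder: str, a: Utterance, b: Utterance) -> str:
    """The speaker who did not hold the turn misread the pause, so it is responsible."""
    others = {a.speaker, b.speaker} - {turn_holder}
    return others.pop() if others else turn_holder
```

For example, with utterance 412 as `Utterance("partner", 2000, 4000)` and utterance 414 as `Utterance("robot", 2100, 3000)`, the two overlap by 900 ms. Both onsets fall within a 300 ms window, so the pair counts as an onset collision. Because the partner held turn 400, `responsible_party` returns `"robot"`.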

Note that temporal overlap between two speakers' utterances does not by itself constitute an utterance collision. Typically, one speaker may produce backchannel responses (such as "uh-huh") while the other is speaking. Such backchannels should not be considered collisions. Also, one speaker may start speaking before the other has completely finished. If only a short time remains until the first speaker's turn ends, this too should not be considered a collision. Collision detection must take these cases into account.

The graphs of example programs shown in FIGS. 6 and 7 are used below to describe two things: situations in which utterance collisions are likely, and the sections in which collisions are detected (collision detection sections). FIGS. 6 and 7 show the same graph. FIG. 6 marks situations prone to collisions for which the robot is responsible. FIG. 7 marks situations prone to collisions for which the partner is responsible.

Referring to FIG. 6, the graph begins with a start block at the left end, followed by two utterance blocks and then two question blocks. The question blocks are followed by a speech recognition block that recognizes the partner's speech. Three paths follow the speech recognition block, and one of them is selected according to the partner's response recognized there.

Each of the three paths contains two consecutive utterance blocks. The three paths then merge into a question block, which is followed by another speech recognition block. Based on speech recognition of the partner's utterance in that block, the dialogue topic is selected from topics 1, 2, and 3, and execution of the program ends.

FIG. 6 shows an example of situations prone to collisions for which the robot is responsible. When the robot operates according to the graph of FIG. 6, a collision is likely immediately after the partner's turn 460. A collision detection section 464 is therefore needed, enclosing the beginning of the utterance block that follows turn 460. By contrast, no robot utterance block follows the partner's turn 462, so turn 462 needs no collision detection section.

FIG. 7 shows an example of situations prone to collisions for which the partner is responsible. When the robot speaks according to the graph of FIG. 7, three utterance blocks immediately follow the start block, and these three blocks make up the robot's turn 480. Within turn 480, there is a break in speech after each utterance block. At those points the partner may mistakenly take the robot's turn to have ended and start speaking, which easily causes a collision for which the partner is responsible. Therefore, within turn 480 in FIG. 7, the regions containing the block boundaries are grouped together as collision detection section 484.

The robot's turn 480 is followed by a speech recognition block, which corresponds to the partner's turn. After the end of this speech recognition block comes the robot's turn 482, which contains three mutually parallel paths. At the beginning of each utterance block in turn 482, collisions for which the partner is responsible are likely. The portions containing these points are therefore grouped together as collision detection section 486.

In this embodiment, overlapping utterances that occur outside these collision detection sections are not regarded as utterance collisions. They could, of course, also be treated as collisions.
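The sketch below shows how collision detection sections might be derived from one executed path through a graph such as that of FIGS. 6 and 7. It rests on several assumptions made for illustration. The path is represented as a list of robot and partner segments with times. A fixed `margin_ms` is placed around each boundary before a robot block. The function names are hypothetical. Overlaps at times outside every section are then ignored, as the paragraph above describes.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]   # (speaker, start_ms, end_ms) along one executed path
Section = Tuple[str, int, int]   # (kind, start_ms, end_ms)


def _merge(sections: List[Section]) -> List[Section]:
    """Group overlapping sections of the same kind into a single section."""
    merged: List[Section] = []
    for kind, s, e in sorted(sections):
        if merged and merged[-1][0] == kind and s <= merged[-1][2]:
            merged[-1] = (kind, merged[-1][1], max(e, merged[-1][2]))
        else:
            merged.append((kind, s, e))
    return merged


def detection_sections(timeline: List[Segment], margin_ms: int) -> List[Section]:
    sections: List[Section] = []
    for prev, cur in zip(timeline, timeline[1:]):
        if cur[0] != "robot":
            continue  # only the onset of a robot block is treated as a risky point
        window = (prev[2] - margin_ms, cur[1] + margin_ms)
        # At the start of a robot block the partner may cut in (cf. sections 484, 486).
        sections.append(("partner_responsible", *window))
        if prev[0] == "partner":
            # Right after the partner's turn the robot may cut in (cf. section 464).
            sections.append(("robot_responsible", *window))
    return _merge(sections)


def in_section(t_ms: int, sections: List[Section], kind: str) -> bool:
    """Overlaps outside every section of this kind are not treated as collisions."""
    return any(k == kind and s <= t_ms <= e for k, s, e in sections)
```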

FIG. 8 shows the control structure of a program that recovers from an utterance collision for which the robot is responsible. Referring to FIG. 8, the program includes a step 600 in which the robot stops speaking. It also includes a step 602 of controlling the robot to make a predefined facial expression (for example, surprised or puzzled) showing that it has noticed the collision. In step 602, the robot may also be controlled to utter something like "Oh!"

Following step 602, the program further includes a step 604 in which the robot performs processing to yield the floor to the partner. After step 604 comes a step 606, which determines whether a new collision has occurred. If it has, control returns to step 600; otherwise this recovery processing ends.

Step 604 includes four steps. Step 610, following step 602, determines whether the partner has stopped speaking and branches the flow of control accordingly. Step 612, executed in response to an affirmative determination in step 610, adjusts the turn by yielding the floor to the partner. Step 614 waits until the end of the partner's turn is detected; it is executed when step 612 has finished after an affirmative determination in step 610, or when the determination in step 610 is negative. Step 616, executed in response to detecting the end of the partner's turn, re-conveys the information the robot had been about to say, and step 604 then ends.

Step 610 determines whether the partner's speech was interrupted for the following reason. If the partner did not stop speaking, the robot can recover from the collision simply by stopping its own speech. In that case it is enough for the robot to make a gesture indicating that it yields the floor, without saying anything in particular. In some cases even the gesture is unnecessary. The dialogue can thus be repaired quickly without the robot speaking.

If, on the other hand, the partner did stop speaking, the robot must yield the floor more politely, because the turn originally belonged to the partner. In this case the robot makes an utterance explicitly yielding the floor, together with a corresponding gesture. For example, the robot could say "Please, go ahead" while extending its hand toward the partner to invite them to speak. This is only one example; many other utterances and gestures could be used to repair the dialogue.
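The FIG. 8 flow can be sketched as a loop over a robot interface. `RecordingRobot` is a hypothetical stand-in that only records commands. The three callables abstract the robot's perception of the partner. `max_rounds` is an added safeguard against endless re-collisions and does not appear in the flowchart. Following the flowchart, the robot does nothing extra when the partner keeps talking (step 610 negative), so the optional yielding gesture mentioned in the text is omitted. The spoken phrases are examples only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecordingRobot:
    """Hypothetical robot interface that only records the commands it receives."""
    actions: List[str] = field(default_factory=list)

    def stop_speaking(self) -> None:
        self.actions.append("stop")

    def express(self, face: str) -> None:
        self.actions.append(f"face:{face}")

    def say(self, text: str) -> None:
        self.actions.append(f"say:{text}")

    def gesture(self, name: str) -> None:
        self.actions.append(f"gesture:{name}")


def recover_robot_responsible(
    robot: RecordingRobot,
    partner_stopped: Callable[[], bool],       # did the partner break off? (step 610)
    wait_partner_turn_end: Callable[[], None], # blocks until the partner's turn ends
    recollided: Callable[[], bool],            # did a new collision occur? (step 606)
    pending_text: str,                         # what the robot was about to say
    max_rounds: int = 3,                       # safety bound added for this sketch
) -> bool:
    """FIG. 8 flow; returns True once the dialogue is repaired."""
    for _ in range(max_rounds):
        robot.stop_speaking()                  # step 600
        robot.express("surprised")             # step 602
        robot.say("Ah!")
        if partner_stopped():                  # step 610
            robot.say("Please, go ahead.")     # step 612: polite, explicit yielding
            robot.gesture("offer_hand")
        wait_partner_turn_end()                # step 614
        robot.say(pending_text)                # step 616: re-convey the information
        if not recollided():                   # step 606
            return True
    return False
```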

With processing of this kind, the robot keeps the dialogue natural while preserving the tempo of the exchange between the robot and the partner.

FIG. 9 shows the control structure of a program that recovers the dialogue from an utterance collision for which the partner is responsible. Referring to FIG. 9, the program includes a step 630 of stopping the robot's speech in response to detection of a collision. It then includes a step 632 of controlling the robot to make an expression showing that it has noticed the collision. Next comes a step 634, which branches the flow of control to step 636 or step 638 according to whether the robot's desire to speak exceeds a predetermined threshold. The desire to speak is set as one of the robot's emotion parameters in the utterance block being executed. Step 636 is processing for the robot to yield the floor to the partner, and step 638 is processing for the robot to keep its turn. In this embodiment, the value of the robot 110's desire to speak is set by the system designer, for example when defining the relevant utterance block while authoring the scenario. The value could, of course, also be set by other means during scenario execution, based on other conditions.

Step 636 includes a step 650 of branching the flow of control according to whether the partner's speech was interrupted. It also includes a step 652, executed in response to an affirmative determination in step 650, of adjusting the turn. The processing in step 652 is the same as that in step 612 of FIG. 8. Step 636 further includes a step 654 of waiting until the partner's turn ends. Step 654 is executed when step 652 has finished after an affirmative determination in step 650, or when the determination in step 650 is negative. Finally, step 636 includes a step 656, executed when the partner's turn ends. Step 656 re-conveys the information the robot was trying to convey when the collision occurred, after which step 636 ends.

The program further includes a step 640, executed when step 636 is complete, of determining whether another collision has occurred. If so, control returns to step 630. If not, the dialogue is considered repaired from the collision, and execution of the program ends.

Step 638 includes three steps. Step 660 produces a filler. Step 662, following step 660, adjusts the turn by telling the partner that the robot is keeping the turn. Step 664 re-conveys the information the robot was trying to convey when the collision occurred, after which execution of the program ends. The filler in step 660 may be a vocalization with no particular meaning, such as "umm." The utterance in step 662 may be anything that makes clear the robot is keeping the turn, such as "Let me say this first" or "May I speak first?"
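A corresponding sketch of the FIG. 9 flow follows. The text does not state which branch of step 634 leads to which step. This sketch assumes that a desire to speak above the threshold keeps the turn (step 638) and that otherwise the robot yields it (step 636). `RecordingRobot`, the callables, the example wordings, and `max_rounds` are hypothetical simulation aids.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecordingRobot:
    """Hypothetical robot interface that only records the commands it receives."""
    actions: List[str] = field(default_factory=list)

    def stop_speaking(self) -> None:
        self.actions.append("stop")

    def express(self, face: str) -> None:
        self.actions.append(f"face:{face}")

    def say(self, text: str) -> None:
        self.actions.append(f"say:{text}")

    def gesture(self, name: str) -> None:
        self.actions.append(f"gesture:{name}")


def recover_partner_responsible(
    robot: RecordingRobot,
    desire_to_speak: float,                    # emotion parameter of the current block
    threshold: float,
    partner_stopped: Callable[[], bool],       # step 650
    wait_partner_turn_end: Callable[[], None], # step 654
    recollided: Callable[[], bool],            # step 640
    pending_text: str,
    max_rounds: int = 3,                       # safety bound added for this sketch
) -> bool:
    """FIG. 9 flow; returns True once the dialogue is repaired."""
    for _ in range(max_rounds):
        robot.stop_speaking()                      # step 630
        robot.express("surprised")                 # step 632
        if desire_to_speak > threshold:            # step 634 (assumed mapping)
            robot.say("Umm...")                    # step 660: filler
            robot.say("Let me say this first.")    # step 662: keep the turn
            robot.say(pending_text)                # step 664: re-convey, then end
            return True
        if partner_stopped():                      # step 650
            robot.say("Please, go ahead.")         # step 652 (same as step 612)
            robot.gesture("offer_hand")
        wait_partner_turn_end()                    # step 654
        robot.say(pending_text)                    # step 656
        if not recollided():                       # step 640
            return True
    return False
```

In the keep-turn branch the flowchart ends directly after step 664, so the sketch does not check for a re-collision there.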

FIG. 10 shows the control structure of a program that determines whether an utterance collision has occurred between the robot and the partner. This program is invoked at each tick of the robot's control loop, for example every 100 milliseconds.

Referring to FIG. 10, the program first includes a step 700 of branching the flow of control according to whether the robot is currently recovering from a collision. When the determination in step 700 is negative, a step 704 branches according to whether the robot's utterance overlaps the partner's. When the determination in step 704 is affirmative, a step 706 branches according to whether the robot's own utterance is a backchannel. Whether the robot's utterance in step 706 is a backchannel can be determined by consulting a dictionary compiled in advance from backchannel utterance texts. The same applies when the speaker is the partner.

Control proceeds to step 702 in three cases: when the determination in step 700 is affirmative, when the determination in step 704 is negative, or when the determination in step 706 is affirmative. In step 702, it is determined that no collision has occurred, and the processing for the no-collision case is executed.

Because of the determination in step 700, no collision detection is performed while the robot is recovering from a collision. Because of the determination in step 704, no collision is detected when the utterances of the robot and the partner do not overlap in time. Because of the determination in step 706, a backchannel from the robot is not judged to be a collision, even when it overlaps the partner's utterance.
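The first three checks of FIG. 10 (steps 700 to 706) might look like the sketch below when evaluated once per control-loop tick. The `Snapshot` fields, the contents of `BACKCHANNELS`, and the simulated loop in `run_ticks` are assumptions. A real system would sample the robot's and the partner's speech activity from its own recognizers on a timer.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical backchannel dictionary; the source only says one is compiled in advance.
BACKCHANNELS = {"uh-huh", "mm-hmm", "yeah", "i see", "right"}


@dataclass
class Snapshot:
    """State sampled at one control-loop tick (hypothetical fields)."""
    recovering: bool        # currently recovering from a collision
    robot_speaking: bool
    partner_speaking: bool
    robot_text: str         # text of the robot's current utterance


def prefilter(s: Snapshot) -> str:
    if s.recovering:                                   # step 700
        return "no_collision"                          # step 702
    if not (s.robot_speaking and s.partner_speaking):  # step 704: no overlap
        return "no_collision"
    if s.robot_text.strip().lower() in BACKCHANNELS:   # step 706: backchannel
        return "no_collision"
    return "go_to_708"                                 # remaining checks start at step 708


def run_ticks(snapshots: List[Snapshot], period_ms: int = 100) -> List[Tuple[int, str]]:
    """Simulated control loop: one check per tick, e.g. every 100 ms."""
    return [(i * period_ms, prefilter(s)) for i, s in enumerate(snapshots)]
```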

The program further includes steps 708 and 728. Step 708, executed in response to a negative determination in step 706, branches the flow of control according to whether the robot has retained the right to speak since its previous utterance. Step 728, executed in response to an affirmative determination in step 708, branches the flow of control according to whether the robot's predicted utterance length is at least a threshold T1 milliseconds. When the determination in step 728 is negative, it is concluded that no utterance collision occurs (step 736).

The robot's predicted utterance length here means the maximum length of its current utterance. If that length is sufficiently short, no real utterance collision arises even if the robot delivers the whole utterance. This is the reason for the determination in step 728.

The program further includes steps 730 and 732. Step 730, executed in response to an affirmative determination in step 728, branches the flow of control according to whether the robot's current position lies within a portion of predetermined length at the end of its utterance. Step 732 is executed in response to a negative determination in step 730, that is, when the position is at the start of the utterance or in the utterance body. It branches the flow of control according to whether the robot's utterance and the dialogue partner's utterance have overlapped for at least a threshold T2 milliseconds.

Even if the robot's predicted utterance length is fairly long, the situation need not be treated as an utterance collision when the current position is in the end portion of the utterance. This is the reason for the determination in step 730. In addition, the determination in step 708 has established that the robot holds the right to speak. Therefore, if the robot is not at the end of its utterance and the overlap time is at least the threshold, an utterance collision attributable to the dialogue partner can be concluded to have occurred. This is the reason for the determination in step 732. Accordingly, the program further includes a step 734 that, when the determination in step 732 is affirmative, determines that an utterance collision attributable to the dialogue partner has occurred.

The program further includes a step 736 that determines that no utterance collision occurs in three cases: when the determination in step 728 is negative, when the determination in step 730 is affirmative, or when the determination in step 732 is negative.

The program further includes the following steps:
- Step 710, executed in response to a negative determination in step 708, branches the flow of control according to whether the dialogue partner's utterance is a backchannel.
- Step 712, executed in response to a negative determination in step 710, branches the flow of control according to whether the robot's predicted utterance length is at least the threshold T1 milliseconds.
- Step 714, executed in response to an affirmative determination in step 712, branches the flow of control according to whether the robot's current position is in the end, the beginning, or the body of its utterance.
- Step 718, executed when step 714 determines that the robot's current position is in the utterance body, branches the flow of control according to whether the overlap time is at least the threshold T2 milliseconds.

If the dialogue partner's utterance is a backchannel, the situation need not be regarded as an utterance collision, even if the robot's utterance and the partner's utterance overlap. The determination in step 710 serves this purpose.

If step 714 determines that the robot's current position is in the end portion of its utterance, control proceeds to step 702, and it is concluded that no utterance collision has occurred. If step 714 determines that the position is in the beginning portion, control proceeds to step 716, and it is concluded that an utterance collision attributable to the robot has occurred. If the determination in step 718 is affirmative, control proceeds to step 720, and it is concluded that an utterance collision attributable to the dialogue partner has occurred. If the determination in step 718 is negative, control proceeds to step 722, and it is determined that no utterance collision has occurred.

If the determination in step 708 is negative, the robot does not hold the right to speak. Suppose step 712 finds the robot's predicted utterance length to be at least the threshold and the robot is at the beginning of its utterance. In that case, an utterance collision attributable to the robot can be concluded to have occurred. If step 714 determines that the robot is at the end of its utterance, no utterance collision can be considered to have substantively occurred. If the robot is in the utterance body and an overlap occurs, the overlap can be judged to have arisen because the dialogue partner started speaking. If that overlap lasts at least the threshold, an utterance collision attributable to the dialogue partner can be judged to have occurred. Otherwise, the partner is considered to have ended the utterance immediately, and no utterance collision has substantively occurred.

The program further includes a step 724 that, when the determination in step 712 is negative, branches control according to whether the robot's utterance announces the end of the conversation. If the determination in step 724 is affirmative, control proceeds to step 722, and it is concluded that no utterance collision has occurred. If the determination in step 724 is negative, control proceeds to step 726, and it is concluded that an utterance collision attributable to the robot has occurred. An example of an utterance announcing the end of the conversation in step 724 is a farewell greeting such as "goodbye." Example values are 1300 milliseconds for the threshold T1 and 1700 milliseconds for the threshold T2.
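The whole FIG. 10 decision flow can be condensed into a single function, sketched below under stated assumptions. All inputs (overlap flag, backchannel flags, right-to-speak flag, position, overlap duration) are assumed to be computed upstream. The names `TurnState`, `Collision`, and `classify_collision` are hypothetical. The text does not say where an affirmative step 710 leads, so this sketch returns "no collision" there, following the rationale given for step 710. T1 and T2 use the example values of 1300 ms and 1700 ms.

```python
from dataclasses import dataclass
from enum import Enum

# Example thresholds from the text (steps 728/712 and steps 732/718).
T1_MS = 1300
T2_MS = 1700


class Collision(Enum):
    NONE = "no collision"
    ROBOT = "attributable to robot"
    PARTNER = "attributable to partner"


@dataclass
class TurnState:
    recovering: bool = False           # step 700: recovering from an earlier collision
    overlapping: bool = True           # step 704: utterances overlap in time
    robot_backchannel: bool = False    # step 706
    robot_holds_floor: bool = True     # step 708: robot kept the right to speak
    partner_backchannel: bool = False  # step 710
    predicted_len_ms: int = 0          # robot's predicted (maximum) utterance length
    position: str = "body"             # "head", "body" or "tail" of the robot's utterance
    overlap_ms: int = 0                # duration of the overlap so far
    robot_says_goodbye: bool = False   # step 724: utterance ends the conversation


def classify_collision(s: TurnState) -> Collision:
    # Steps 700/704/706 -> step 702: nothing to detect.
    if s.recovering or not s.overlapping or s.robot_backchannel:
        return Collision.NONE

    if s.robot_holds_floor:
        # Steps 728 and 730 -> step 736: short utterance, or already in its tail.
        if s.predicted_len_ms < T1_MS or s.position == "tail":
            return Collision.NONE
        # Step 732 -> step 734 (partner) or step 736 (none).
        return Collision.PARTNER if s.overlap_ms >= T2_MS else Collision.NONE

    # The robot does not hold the right to speak.
    if s.partner_backchannel:          # step 710 (outcome assumed: no collision)
        return Collision.NONE
    if s.predicted_len_ms >= T1_MS:    # step 712
        if s.position == "tail":       # step 714 -> step 702
            return Collision.NONE
        if s.position == "head":       # step 714 -> step 716
            return Collision.ROBOT
        # Step 718 -> step 720 (partner) or step 722 (none).
        return Collision.PARTNER if s.overlap_ms >= T2_MS else Collision.NONE
    # Step 724 -> step 722 (farewell) or step 726 (robot).
    return Collision.NONE if s.robot_says_goodbye else Collision.ROBOT
```

For example, `classify_collision(TurnState(robot_holds_floor=False, predicted_len_ms=2000, position="head"))` returns `Collision.ROBOT`. The robot does not hold the right to speak, its utterance is long, and it has only just started.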

FIG. 11 shows an example of the beginning 752, body 754, and end 756 of a single utterance 750 by the robot. In the example of FIG. 11, the beginning 752 is the first 500 milliseconds from the start of the utterance. The end 756 is the last 500 milliseconds before the end of the utterance's predicted length. The body 754 is the remainder of the utterance 750. This is only one example: the lengths of the beginning 752 and the end 756 are not limited to those shown in FIG. 11, and the two need not be equal.
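Locating the robot's current position within its utterance, as in FIG. 11, might look like the sketch below. The 500 ms defaults follow the figure's example. The function name, the half-open head interval, the closed tail interval, and the choice to test the head first are assumptions.

```python
def utterance_position(elapsed_ms: int, predicted_len_ms: int,
                       head_ms: int = 500, tail_ms: int = 500) -> str:
    """Classify where the robot currently is within its utterance.

    Returns "head", "body" or "tail". Head is [0, head_ms); tail starts
    tail_ms before the predicted end. These boundary choices are assumptions.
    """
    if elapsed_ms < head_ms:
        return "head"
    if elapsed_ms >= predicted_len_ms - tail_ms:
        return "tail"
    return "body"
```

Any utterance that reaches steps 714 or 730 is at least T1 long (1300 ms in the example). A 500 ms head and a 500 ms tail therefore cannot overlap there, and the order of the two checks only matters for shorter utterances.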

2. Effects
As described above, this embodiment lets the robot repair the dialogue by following a defined procedure and return to normal dialogue. This holds both when the robot fails to identify the dialogue partner and when an utterance collision occurs during the dialogue. Because the robot conducts the repair dialogue with facial expressions and appropriate gestures, the partner experiences communication being restored naturally, just as when talking with a person. Moreover, unlike conventional approaches, the robot does not always yield the right to speak when an utterance collision occurs. The robot has a settable desire to speak. When a collision is attributable to the dialogue partner, the robot either keeps the right to speak and continues, or yields it to the partner, depending on that desire. Such behavior is more human-like than that of conventional techniques, and it lets the partner return from an utterance collision to normal dialogue in a natural way.

2 Second Embodiment
When it fails to identify the dialogue partner, the robot according to the second embodiment executes a program whose control structure is shown in FIG. 12. This replaces the program of FIG. 3 executed by the robot of the first embodiment.

Referring to FIG. 12, the second embodiment's program differs from the program of FIG. 3 in what happens when the determination in step 380 is affirmative. Instead of proceeding immediately to step 368, control passes to a determination in step 770, which identifies the person more reliably. Step 770 determines whether the dialogue partner's attributes match those of the person whose information was retrieved from the person information DB 92 in step 380. For example, the records in the person information DB 92 may contain each person's gender and date of birth. This information is then compared with the gender and age estimated from the face image recognized in step 360. An exact match cannot be established this way. However, a trained neural network can calculate the probability (likelihood) that the person in the face image has the recorded gender and the age calculated from the recorded date of birth.

If this likelihood is at least a certain threshold, the determination in step 770 is affirmative; otherwise it is negative. If it is affirmative, control proceeds to step 368, and the recognized person is classified as an acquaintance. Otherwise, control proceeds to step 382. There, the dialogue partner is classified as a person who should be an acquaintance but has no corresponding record in the person information DB 92.
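The source leaves the likelihood in step 770 to a trained neural network. The sketch below replaces that network with hypothetical precomputed estimates: a table of gender probabilities and a point estimate of age for the face image. It combines them with a simple product. The Gaussian age score, `sigma`, the threshold, and the record field names are illustrative assumptions, not the method of this document. What the sketch shows concretely is deriving the recorded age from the date of birth and applying a single threshold to choose between step 368 and step 382.

```python
import math
from datetime import date


def age_on(birth: date, today: date) -> int:
    # Whole years elapsed; subtract one if this year's birthday has not come yet.
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - int(before_birthday)


def attributes_match(record: dict, gender_probs: dict, est_age: float,
                     today: date, sigma: float = 5.0,
                     threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for the step 770 check.

    gender_probs: estimated probability of each gender for the face image.
    est_age: age estimated from the face image.
    The Gaussian-shaped age score is chosen only for illustration.
    """
    recorded_age = age_on(record["birth_date"], today)
    p_gender = gender_probs.get(record["gender"], 0.0)
    p_age = math.exp(-((recorded_age - est_age) ** 2) / (2 * sigma ** 2))
    # True -> step 368 (acquaintance); False -> step 382.
    return p_gender * p_age >= threshold
```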

Suppose identification by face image recognition has failed, but the name the partner gives in response to the question in step 370 is found in the person information DB 92. Even then, the partner cannot be concluded with certainty to be the person recorded there. Inserting the determination of step 770 makes it possible to judge more accurately whether the dialogue partner is that recorded person.

3 Third Embodiment
Like the second embodiment, the third embodiment differs from the first in that it executes a program whose control structure is shown in FIG. 13, in place of the program of FIG. 3.

The program of FIG. 13 differs from the program of FIG. 3 in what happens when the determination in step 380 is affirmative. Instead of transferring control immediately to step 368, it provides processing that identifies the dialogue partner more reliably, as in the second embodiment.

This program includes processing that identifies the dialogue partner as accurately as possible when the person information DB 92 contains multiple records of persons with the name the partner gave.

More specifically, in addition to the steps of FIG. 3, this program includes a step 800 executed in response to an affirmative determination in step 380. Step 800 branches the flow of control according to whether the person information DB 92 contains multiple records of persons with the name the dialogue partner gave. If the determination in step 800 is negative, that is, if only one record was found, control proceeds to step 368 as in the first embodiment, and the partner is classified as an acquaintance.

The program further includes a step 802 that, in response to an affirmative determination in step 800 (that is, when multiple records were retrieved), identifies the dialogue partner using the records retrieved from the person information DB 92.

Step 802 includes a step 820, executed in response to an affirmative determination in step 800 (that is, when multiple records were retrieved). Step 820 repeatedly executes the following step 822 over those records until a predetermined termination condition is satisfied. The condition is met either when a record whose information matches the dialogue partner is found, or when all records have been examined without finding a match. Step 822 includes two steps:
- Step 840 uses information in the record being processed other than the name to confirm whether the dialogue partner is the person recorded in that record.
- Step 842 branches the flow of control according to whether the result of step 840 is affirmative or negative.

For example, if each person's affiliation is recorded in the person information DB 92, the robot asks the partner a question such as "Are you Mr. B from department A?" in step 840. If the partner answers affirmatively, the determination in step 842 is affirmative; if the partner answers negatively, it is negative.

If the determination in step 842 is affirmative, the dialogue partner is known to be the person in that record. Control therefore exits step 822 and proceeds to step 368, where the partner is classified as an acquaintance. If the determination in step 842 is negative, the robot repeats the same processing with the next record. If steps 840 and 842 have been executed for all records without identifying the partner, control proceeds to step 382. In step 382, the partner is classified as a person who should be an acquaintance but is not recorded in the person information DB 92, and the dialogue begins.
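The record-by-record confirmation loop of steps 800 to 842 can be sketched as follows. `identify_among` and the record fields `name` and `department` are hypothetical. The `ask` callable stands in for the robot speaking the question and interpreting the partner's yes/no reply. Because step 380 was affirmative, at least one record is assumed to exist.

```python
from typing import Callable, List, Optional


def identify_among(records: List[dict],
                   ask: Callable[[str], bool]) -> Optional[dict]:
    """Return the record matching the dialogue partner, or None.

    ask(question) is a hypothetical interface: it should return True when
    the partner answers the question affirmatively.
    """
    if len(records) == 1:               # step 800 negative -> step 368
        return records[0]
    for record in records:              # step 820: loop over the records
        # Step 840: confirm using information other than the name.
        question = f"Are you {record['name']} from {record['department']}?"
        if ask(question):               # step 842 affirmative -> step 368
            return record
    return None                         # no record matched -> step 382
```

The loop stops at the first affirmative answer, so the partner is asked at most one question per candidate record.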

As described above, the third embodiment can identify the dialogue partner even when the person information DB 92 holds multiple records matching the name the partner gave in step 378, provided one of those records is the partner's. This reduces the chance of misidentifying the person and thus increases the likelihood that the subsequent dialogue proceeds smoothly.

4 Fourth Embodiment
The fourth embodiment replaces the program of FIG. 9 of the first embodiment, which the robot system 100 executes when it detects an utterance collision attributable to the dialogue partner. Instead, the robot system executes a program whose control structure is shown in FIG. 14 to control the robot's actions.

Referring to FIG. 14, the program of FIG. 14 differs from the program of FIG. 9 by an additional step 900 between step 632 and step 634. Step 900 branches the flow of control according to whether the utterance collision is the first one. If the determination in step 900 is affirmative, control proceeds to step 636; if it is negative, control proceeds to step 634. Step 634 and the subsequent steps are the same as those shown in FIG. 9.

Providing step 900 has the following effect. In a given dialogue between the partner and the robot, the determination in step 900 is always affirmative the first time an utterance collision attributable to the partner occurs. On that occasion, the robot therefore always yields the right to speak. For the second and subsequent such collisions, the robot either keeps the right to speak or yields it, depending on its desire to speak. In conversations between people, a person sometimes yields the right to speak after an utterance collision even when not at fault. This behavior seems to stem from the expectation that the other person will do the same when the situation is reversed, and it can be regarded as very human. Having the robot behave the same way makes dialogue with it feel, from the partner's perspective, more natural and closer to dialogue with a human.
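The step 900 policy can be sketched as a small per-dialogue object. The rule that links the desire to speak to the decision belongs to FIG. 9, which is outside this excerpt. This sketch therefore assumes the desire is a number in [0, 1] compared against a hypothetical threshold. The class name `FloorPolicy` and its method are also hypothetical.

```python
class FloorPolicy:
    """Decide whether the robot keeps the right to speak after an utterance
    collision attributable to the dialogue partner.

    Sketch of step 900, with a threshold comparison standing in (as an
    assumption) for the desire-to-speak rule of FIG. 9.
    """

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold     # hypothetical desire-to-speak cut-off
        self.partner_collisions = 0    # counted per dialogue

    def keep_floor(self, desire_to_speak: float) -> bool:
        self.partner_collisions += 1
        if self.partner_collisions == 1:   # step 900 affirmative: always yield
            return False
        return desire_to_speak >= self.threshold
```

A fresh `FloorPolicy` would be created for each new dialogue, so that the "first collision" is counted per dialogue as the text describes.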

5 Implementation by Computer
FIG. 15 is an external view of a computer system that operates as, for example, the integrated control PC 122 shown in FIG. 1. FIG. 16 is a hardware block diagram of that computer system. The voice recognition PC 118, voice synthesis PC 120, face image recognition PC 116, and operation control PC 112 shown in FIG. 1 can also be implemented by computer systems configured much like the integrated control PC 122. Only the configuration of the integrated control PC 122 is therefore described here, and the details of the other PCs are not repeated.

Referring to FIG. 15, the computer system 950 includes a computer 970 having a DVD (Digital Versatile Disc) drive 1002. It also includes a keyboard 974, a mouse 976, and a monitor 972, all connected to the computer 970, for interacting with the user. These are only one example of a configuration for cases where user interaction is required. Any general-purpose hardware and software usable for interacting with the system can be used instead, for example a touch panel, voice input, or pointing devices in general. If no such user interaction is expected, these components are unnecessary.

Referring to FIG. 16, in addition to the DVD drive 1002, the computer 970 includes a CPU (Central Processing Unit) 990 and a GPU (Graphics Processing Unit) 992. It also includes a bus 1010 connected to the CPU 990, the GPU 992, and the DVD drive 1002. Finally, it includes a ROM (Read-Only Memory) 996, connected to the bus 1010, that stores a boot-up program for the computer 970 and the like.

The computer 970 further includes a RAM (Random Access Memory) 998 and an SSD (Solid State Drive) 1000, both connected to the bus 1010. The RAM 998 stores the instructions constituting programs, system programs, working data, and the like. The SSD 1000 is a nonvolatile memory that stores the programs executed by the CPU 990 and the GPU 992, the data those programs use, and the like. The computer 970 further includes a network I/F (Interface) 1008 and a USB (Universal Serial Bus) port 1006. The network I/F 1008 provides a connection to a network 986 (the network 114 shown in FIG. 1) for communication with other terminals. The USB port 1006 accepts a removable USB memory 984 and provides communication between the USB memory 984 and the components of the computer 970.

The computer 970 further includes an input/output I/F 1004 connected to the bus 1010 and to external devices such as a microphone 982, a speaker 980, a camera (not shown), and the robot's actuators. It handles input and output between internal components, such as the CPU 990, and these external devices.

In the above embodiments, programs implement the functions of the operation control PC 112, integrated control PC 122, voice recognition PC 118, voice synthesis PC 120, face image recognition PC 116, and so on. These programs are stored, for example, in the SSD 1000, RAM 998, DVD 978, or USB memory 984 shown in FIG. 16. They may also be stored in a storage medium of an external device (not shown) connected via the network I/F 1008 and the network 986. Typically, these data and parameters are written to the SSD 1000 from outside and loaded into the RAM 998 when the computer 970 runs.

The computer programs cause this computer system to implement the functions of the operation control PC 112, integrated control PC 122, voice recognition PC 118, and voice synthesis PC 120 shown in FIG. 1, and of their components. They are stored on a DVD 978 loaded into the DVD drive 1002 and transferred from the DVD drive 1002 to the SSD 1000. Alternatively, the programs are stored in the USB memory 984; the USB memory 984 is inserted into the USB port 1006, and the programs are transferred to the SSD 1000. Alternatively, the programs may be transmitted to the computer 970 over the network 986 and stored in the SSD 1000.

The programs are loaded into the RAM 998 at execution time. A source program may also be entered using the keyboard 974, monitor 972, and mouse 976, and the object program obtained by compiling it may be stored in the SSD 1000. For a scripting language, as in the above embodiments, a script entered with the keyboard 974 or the like may be stored in the SSD 1000. For a program that runs on a virtual machine, a program functioning as that virtual machine must be installed on the computer 970 in advance. Neural networks are used for face image recognition, voice recognition, voice synthesis, and the like. They are also used by the program that estimates the dialogue partner's gender and age from a face image. These neural networks may be trained by another system, or the training may be performed in the robot system 100.

The CPU 990 reads a program from the RAM 998 at the address indicated by an internal register called the program counter (not shown) and interprets the instructions. For each instruction, it reads the data needed for execution from the RAM 998, the SSD 1000, or another device at the address the instruction specifies, and then executes the processing the instruction specifies. The CPU 990 stores the resulting data at an address specified by the program, for example in the RAM 998, the SSD 1000, or a register in the CPU 990. Depending on the address, the data are output from the computer as commands to the robot's actuators, audio signals, and so on. The program also updates the value of the program counter at this time. Computer programs may also be loaded directly into the RAM 998 from the DVD 978, from the USB memory 984, or via the network 986. Some tasks in the programs executed by the CPU 990, mainly numerical computation, are dispatched to the GPU 992. This happens either according to instructions in the program or according to the CPU 990's analysis during instruction execution.

A program that makes the computer 970 implement the functions of each unit in the above embodiments includes a plurality of instructions written and arranged to make the computer 970 realize those functions. Some basic functions needed to execute these instructions may be provided by other software: the operating system (OS) or third-party programs running on the computer 970, modules of toolkits installed on it, or the program's execution environment. The program therefore need not include every function necessary to implement the system and method of this embodiment. It suffices for the program to include only the instructions that carry out the operations of each device and component described above. It obtains the desired results by statically linking, or dynamically calling, the appropriate functions or modules in a controlled manner. How the computer 970 operates for this purpose is well known and is not repeated here.

The GPU 992 is capable of parallel processing and can execute the large volume of computation associated with machine learning concurrently or in a pipelined manner. For example, parallel computational elements found in the program when it is compiled, or found when it is executed, are dispatched from the CPU 990 to the GPU 992 as needed and executed there. The results are returned to the CPU 990 either directly or via a predetermined address in the RAM 998, and are assigned to a predetermined variable in the program.

Sixth: Other Modifications
In the embodiments described above, the robot 110 operates according to a specific scenario. However, the invention is not limited to such embodiments. The method of the above embodiments can also be used for dialogue with a partner in an embodiment in which the robot 110 selects its own behavior each time rather than following a specific scenario. Further, although the above embodiments address errors in identifying the dialogue partner at the start of a dialogue and utterance collisions during a dialogue, the invention is not limited to these cases. For example, when a dialogue breaks down because one party has misunderstood the other's utterance, or when the robot's partner does not recognize the robot as a dialogue partner, the robot can likewise respond while expressing emotion, thereby returning to natural dialogue between the person and the robot. Furthermore, in the above embodiments, one party to the dialogue is a robot with a physical body. However, the invention is not limited to such embodiments; that is, the robot in this invention is not limited to one that has a physical body. The invention can also be applied to images that imitate the human form, such as so-called avatars.

The embodiments disclosed herein are merely illustrative, and the present invention is not limited to the embodiments described above. The scope of the present invention is defined by each claim, taking into consideration the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the language of the claims.

60 Camera
62, 980 Speaker
66, 982 Microphone
92 Person information DB
100 Robot system
110 Robot
112 Motion control PC
114, 986 Network
116 Face image recognition PC
118 Speech recognition PC
120 Speech synthesis PC
122 Integrated control PC
150 Graph
360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 400, 430, 460, 462 Utterance turn
412, 414, 440, 442, 444 Utterance
416, 446 Utterance collision
480, 482 Robot utterance turn
950 Computer system
970 Computer

Claims

1. A method comprising:
a failure detection step in which a computer detects a failure in communication between a robot and a dialogue partner; and
a recovery step in which, in response to the failure being detected in the failure detection step, the computer controls the robot, according to a predetermined procedure, to conduct a dialogue with the dialogue partner that involves the expression of emotion, and recovers from the failure using information obtained in the dialogue,
wherein the failure detection step includes:
a step in which the computer, depending on whether the reliability of an identification process for the dialogue partner is higher than a predetermined threshold, selectively performs either a process of controlling the robot to make a first utterance confirming the result of the identification process with the dialogue partner with a first attitude prepared in advance, or a process of controlling the robot to make a second utterance for starting a procedure for identifying the dialogue partner with a second attitude prepared in advance so as to appear less confident than the first attitude; and
a step in which, in response to the dialogue partner's response to the first utterance indicating an error in the result of the identification process, the computer controls the robot to make a third utterance for starting the identification procedure with the second attitude.
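The selection logic of the failure detection step in claim 1 can be sketched in Python as below. This is a minimal, non-authoritative illustration: the threshold value 0.8, the utterance wording, and the names `Attitude`, `opening_utterance`, and `after_confirmation` are all hypothetical, and the identification confidence is assumed to be a number supplied by an upstream face recognizer.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Attitude(Enum):
    CONFIDENT = auto()  # stands in for the "first attitude"
    UNSURE = auto()     # stands in for the "second attitude" (appears less confident)


@dataclass
class Utterance:
    text: str
    attitude: Attitude


# Hypothetical value: the claim only recites "a predetermined threshold".
CONFIDENCE_THRESHOLD = 0.8

# Hypothetical wording of the utterance that starts the identification procedure.
START_IDENTIFICATION = "Um... is this the first time we've met?"


def opening_utterance(candidate_name: str, confidence: float) -> Utterance:
    """Select the first or the second utterance based on identification reliability."""
    if confidence > CONFIDENCE_THRESHOLD:
        # First utterance: confirm the identification result with the first attitude.
        return Utterance(f"Hello, {candidate_name}, right?", Attitude.CONFIDENT)
    # Second utterance: start the identification procedure with the second attitude.
    return Utterance(START_IDENTIFICATION, Attitude.UNSURE)


def after_confirmation(identification_confirmed: bool) -> Optional[Utterance]:
    """Detect a failure from the partner's reply to the first utterance."""
    if identification_confirmed:
        return None  # no failure detected
    # Failure detected: the third utterance, again with the second attitude.
    return Utterance(START_IDENTIFICATION, Attitude.UNSURE)
```

Reusing one string for the second and third utterances mirrors the option recited in claim 3. A real system would presumably also map each attitude to prosody, gaze, and gestures, which this sketch does not model.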

2. The method of claim 1, wherein the recovery step includes:
a step in which the computer classifies the dialogue partner as an acquaintance of the robot in response to the dialogue partner's response to the first utterance indicating that the result of the identification process is correct; and
a step in which the computer controls the robot to start a dialogue according to a scenario prepared in advance for dialogue with an acquaintance.
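Claim 2's confirmed-identity branch reduces to a classification followed by a scenario lookup. The sketch below assumes scenarios are stored as lists of lines keyed by partner class; `PartnerClass`, `SCENARIOS`, and `handle_confirmation_reply` are hypothetical names, and the scenario text is placeholder content.

```python
from enum import Enum
from typing import List, Optional, Tuple


class PartnerClass(Enum):
    ACQUAINTANCE = "acquaintance"
    UNKNOWN = "unknown"


# Hypothetical scenario contents; the claims only say scenarios are prepared in advance.
SCENARIOS = {
    PartnerClass.ACQUAINTANCE: ["Good to see you again!", "How have you been?"],
    PartnerClass.UNKNOWN: ["Nice to meet you.", "Let me introduce myself."],
}


def handle_confirmation_reply(
    identification_correct: bool,
) -> Tuple[Optional[PartnerClass], List[str]]:
    """Return the partner class and the scenario to start, or (None, []) on failure."""
    if identification_correct:
        partner = PartnerClass.ACQUAINTANCE
        # Return a copy so callers cannot mutate the shared scenario table.
        return partner, list(SCENARIOS[partner])
    # Identification was wrong: the third-utterance path of claim 1 takes over.
    return None, []
```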

3. The method of claim 1 or claim 2, wherein the second utterance and the third utterance are the same utterance.

4. The method according to claim 1, wherein the second utterance is an utterance in which the robot asks whether the conversation partner is meeting the robot for the first time.

5. The method according to any one of claims 2 to 4, wherein the recovery step further includes:
a step in which the computer determines whether the dialogue partner's response to the second utterance affirms that the dialogue partner is meeting the robot for the first time;
a step in which, in response to the dialogue partner's response being negative in the determining step, the computer controls the robot to make a fourth utterance asking whether the dialogue partner is the person identified by the identification process, with a third attitude prepared in advance so as to appear even less confident than the second attitude;
a step in which, in response to the dialogue partner's affirmative response to the fourth utterance, the computer classifies the dialogue partner as an acquaintance of the robot and controls the robot to start a dialogue with a fourth attitude prepared in advance so that the robot appears relieved; and
a step in which, in response to the dialogue partner's negative response to the fourth utterance, the computer controls the robot to perform an additional identification process while showing a fifth attitude prepared in advance so as to appear disappointed.
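The branch recited in claim 5 can be pictured as a short sequence of robot actions. In this hypothetical sketch, each step is a (label, attitude) pair, the partner's replies are assumed to be already recognized as yes/no booleans, and the fourth-utterance wording is invented for illustration.

```python
from enum import Enum, auto
from typing import List, Optional, Tuple


class Attitude(Enum):
    VERY_UNSURE = auto()   # "third attitude": even less confident than the second
    RELIEVED = auto()      # "fourth attitude"
    DISAPPOINTED = auto()  # "fifth attitude"


Step = Tuple[str, Optional[Attitude]]


def recover_from_denied_first_meeting(identified_name: str,
                                      confirms_identity: bool) -> List[Step]:
    """Robot actions after the partner says this is NOT a first meeting."""
    steps: List[Step] = [
        # Fourth utterance: ask hesitantly whether the partner is the identified person.
        (f"Then... could you be {identified_name}?", Attitude.VERY_UNSURE),
    ]
    if confirms_identity:
        steps.append(("classify:acquaintance", None))
        steps.append(("start:acquaintance_dialogue", Attitude.RELIEVED))
    else:
        steps.append(("start:additional_identification", Attitude.DISAPPOINTED))
    return steps


def handle_first_meeting_reply(is_first_meeting: bool, identified_name: str,
                               confirms_identity: bool = False) -> List[Step]:
    if is_first_meeting:
        # Claim 5 does not recite this branch; it is left empty in this sketch.
        return []
    return recover_from_denied_first_meeting(identified_name, confirms_identity)
```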

6. The method of claim 5, wherein the additional identification process includes:
a step in which the computer controls the robot to utter a question asking the dialogue partner for his or her name;
a step in which the computer generates a determination result by determining whether the name included in the dialogue partner's response to the question matches the name of a person registered in a person information database prepared in advance;
a step in which, in response to the determination result being affirmative, the computer classifies the dialogue partner as an acquaintance of the robot and controls the robot to start a dialogue according to the scenario for dialogue with an acquaintance while showing a fifth attitude prepared in advance so as to appear happy; and
a step in which, in response to the determination result being negative, the computer classifies the dialogue partner as a person unknown to the robot and controls the robot to start a dialogue with the dialogue partner according to a scenario prepared in advance for dialogue with an unknown person.
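Claim 6's additional identification process might look like the following, assuming the partner's name has already been extracted from the recognized reply. `PERSON_DB` is a hypothetical in-memory stand-in for the person information DB 92, and the case-insensitive matching rule is an assumption; the claim only requires checking whether the name matches a registered person.

```python
from typing import Mapping, Optional, Tuple

Result = Tuple[str, Optional[str], str]  # (classification, attitude, scenario)

# Hypothetical wording of the name question.
NAME_QUESTION = "May I ask your name?"

# Hypothetical stand-in for the person information DB 92 (registered names only).
PERSON_DB: Mapping[str, dict] = {"Tanaka": {"person_id": 1}, "Suzuki": {"person_id": 2}}


def normalize(name: str) -> str:
    # Assumed normalization: ignore surrounding whitespace and letter case.
    return name.strip().casefold()


def additional_identification(answer_name: str, person_db: Mapping[str, dict]) -> Result:
    """Classify the partner from the name given in reply to NAME_QUESTION."""
    registered = {normalize(n) for n in person_db}
    if normalize(answer_name) in registered:
        # Affirmative: acquaintance, shown with a happy attitude.
        return ("acquaintance", "happy", "acquaintance_scenario")
    # Negative: unknown person; the claim specifies no attitude for this branch.
    return ("unknown", None, "unknown_person_scenario")
```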

7. The method of claim 5, wherein the additional identification process includes:
a step in which the computer controls the robot to utter a question asking the dialogue partner for his or her name;
a step in which the computer generates a determination result by determining whether the name included in the dialogue partner's response to the question matches the name of a person registered in a person information database prepared in advance;
a step in which, in response to the determination result being affirmative, the computer performs processing to confirm whether the dialogue partner is the same person as the person registered in the person information database and, according to the confirmation result, classifies the dialogue partner as either an acquaintance of the robot or a person unknown to the robot;
a step in which, in response to the dialogue partner being classified as an acquaintance of the robot, the computer controls the robot to start a dialogue according to the scenario for dialogue with an acquaintance while showing a fifth attitude prepared in advance so as to appear happy; and
a step in which, in response to the determination result being negative or the dialogue partner being classified as a person unknown to the robot, the computer controls the robot to start a dialogue with the dialogue partner according to a scenario prepared in advance for dialogue with an unknown person.
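Claim 7 adds a check that a matching name really belongs to the registered person. The claim does not say how that check is done, so the sketch below takes it as a caller-supplied predicate `confirm_same_person` (hypothetical); it might, for instance, compare a stored face identifier with the current recognition result. The exact-match name lookup is also an assumption.

```python
from typing import Callable, Mapping, Optional, Tuple

Result = Tuple[str, Optional[str], str]  # (classification, attitude, scenario)


def identify_by_name(answer_name: str,
                     person_db: Mapping[str, dict],
                     confirm_same_person: Callable[[dict], bool]) -> Result:
    """Name lookup followed by a same-person check, as in claim 7."""
    record = person_db.get(answer_name.strip())
    if record is not None and confirm_same_person(record):
        # Name registered and judged to be the same person: acquaintance.
        return ("acquaintance", "happy", "acquaintance_scenario")
    # Name not registered, or registered but judged to be a different person.
    return ("unknown", None, "unknown_person_scenario")
```

For example, `identify_by_name("Tanaka", db, lambda r: r["face_id"] == current_face_id)` would treat a namesake with a different face as an unknown person, where `face_id` and `current_face_id` are hypothetical fields.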

8. The method of claim 5, wherein the recovery step further includes:
a step in which, in response to the dialogue partner's response to the fourth utterance being affirmative, the computer controls the robot to make a fifth utterance for identifying the dialogue partner;
a step in which the computer generates a determination result as to whether information identifying the dialogue partner, included in the dialogue partner's response to the fifth utterance, matches the result of the identification process;
a step in which, in response to the determination result being affirmative, the computer controls the robot to make a sixth utterance for confirming that the dialogue partner is an acquaintance of the robot; and
a step in which, in response to the dialogue partner's affirmative response to the sixth utterance, the computer classifies the dialogue partner as an acquaintance of the robot and controls the robot to start a dialogue according to the scenario for dialogue with an acquaintance while showing a fifth attitude prepared in advance so that the robot appears happy.

9. The method of claim 8, wherein, in the recovery step, in response to the determination result being negative, the computer classifies the dialogue partner as a person unknown to the robot and controls the robot to start a dialogue with the dialogue partner according to a scenario prepared in advance for dialogue with an unknown person.

10. The method according to claim 8, wherein the recovery step further includes a step in which, in response to the dialogue partner's negative response to the sixth utterance, the computer classifies the dialogue partner as a person unknown to the robot and controls the robot to start a dialogue with the dialogue partner according to a scenario prepared in advance for dialogue with an unknown person.
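Claims 8 to 10 together describe a verify-then-confirm flow with two negative exits. A minimal sketch follows the claim text as translated and assumes the partner's replies to the fifth and sixth utterances are already parsed into a name string and a yes/no value (`None` when the sixth utterance is never asked); all names here are hypothetical.

```python
from typing import Optional, Tuple

Result = Tuple[str, Optional[str], str]  # (classification, attitude, scenario)

ACQUAINTANCE: Result = ("acquaintance", "happy", "acquaintance_scenario")
UNKNOWN: Result = ("unknown", None, "unknown_person_scenario")


def verify_acquaintance(identified_name: str,
                        answer_to_fifth: str,
                        answer_to_sixth: Optional[bool]) -> Result:
    """Check the partner's self-identification against the identification result."""
    if answer_to_fifth.strip() != identified_name:
        return UNKNOWN       # claim 9: the determination result is negative
    # Sixth utterance, e.g. "So we have met before, right?" (hypothetical wording).
    if answer_to_sixth:
        return ACQUAINTANCE  # claim 8: affirmative reply to the sixth utterance
    return UNKNOWN           # claim 10: negative reply to the sixth utterance
```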

11. A computer program for causing a computer to perform the method according to any one of claims 1 to 10.