JP4728905B2 - Spoken dialogue apparatus and spoken dialogue program

Info

Publication number: JP4728905B2
Application number: JP2006211166A
Authority: JP
Inventors: 浩彦佐川; 信夫畑岡; 浩明小窪; 健本間; 久高橋; 健大野; 実冨樫; 大介斎藤; 景子桂川
Original assignee: Clarion Co Ltd; Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd; Faurecia Clarion Electronics Co Ltd
Priority date: 2006-08-02
Filing date: 2006-08-02
Publication date: 2011-07-20
Anticipated expiration: 2026-08-02
Also published as: JP2008039928A

Description

The present invention relates to a spoken dialogue apparatus, and a program therefor, that enable smooth interaction between a user and various devices through spoken dialogue.

Many techniques have been proposed for spoken dialogue systems that converse with a user by voice and provide the information and services the user requests. For a spoken dialogue system to interact smoothly with the user, it must not only interpret the user's spoken input correctly but also present an appropriate response sentence to that input, so that the user can easily provide the next spoken input after the response sentence.

Patent Document 1 discloses a technique that prepares a keyword knowledge base storing first keywords to be recognized and second keywords, each associated with a first keyword and a slot item (a keyword type). A second keyword corresponding to the first keyword extracted from the input speech is selected from the keyword knowledge base and recorded for each slot item. A response sentence is generated based on the storage status of each second keyword for each slot item.
Patent Document 2 discloses a technique that classifies words recognized from the input speech into categories and classes and, from the confidence of the recognized words, obtains a likelihood indicating which class of word was recognized. Based on this likelihood, an utterance type such as refinement, answer correction, or re-input is determined and a response sentence is generated. The response sentence is generated by inserting a category word or category class into a response sentence pattern.

These prior-art documents show techniques that change a keyword recognized from the user's spoken input, based on a correspondence prepared in advance between keywords and paraphrases, and substitute the changed keyword into the response sentence.
They also show techniques that determine whether to change a keyword, and techniques that determine the pattern of the response sentence, based on the confidence of the recognized keyword and the dialogue history.
JP 2005-301138 A
JP 2004-251998 A

In the prior art, however, the word into which a given keyword is changed is uniquely determined, and the rules for changing a keyword or determining a response sentence based on confidence and the like are set as common rules, which are then used for processing.
Yet the appropriate way to change a keyword when inserting it into a response sentence, and the appropriate form of the response sentence, can vary not only with the keyword itself but also with the situation, for example when multiple keywords are input. The prior art, which sets and applies common rules, therefore cannot flexibly change and determine the response sentence or the handling according to, for example, the phonological characteristics of each keyword. As a result, the prior art cannot generate an appropriate response sentence.

Accordingly, an object of the present invention is to provide a spoken dialogue apparatus and a spoken dialogue program capable of generating an appropriate response sentence.

To solve the above problem, the present invention records, for each keyword to be recognized, a paraphrase to be used when the keyword is inserted into a response sentence, a response type representing the kind of response sentence, and the conditions under which the paraphrase and the response type are selected. In addition, a response sentence template representing the format of the response sentence is prepared for each response type.
Based on the conditions under which the paraphrase and the response type are selected, the paraphrase and the response type for a recognized keyword are determined, and a response sentence template is then retrieved based on the response type. The response sentence is generated by inserting the paraphrase into the retrieved response sentence template.
The conditions under which the paraphrase and the response type are selected include one or more of the following: a condition based on the confidence value of the recognized keyword, the number of recognized keywords, the type of the recognized keyword, the history of past response types, the history of past response sentences, and past recognition results of the user's speech.

According to the present invention, an appropriate response sentence can be generated.

(Embodiment 1)
Embodiment 1 of the present invention (a spoken dialogue apparatus and a spoken dialogue program) is described below with reference to FIGS. 1 to 7.

FIG. 1 is a diagram showing a configuration example of Embodiment 1 of the present invention. FIG. 1 assumes a spoken dialogue apparatus in which the user inputs the location and name of a target facility by voice, and which searches for information on that facility and outputs the result.

In FIG. 1, the microphone 101 is a means for converting the user's voice into an electrical signal, and the voice input unit 102 is a means for converting the electrical signal input from the microphone 101 into voice data that the information processing unit 105 can process. The voice output unit 103 is a means for converting voice data, generated from a response sentence to the user's spoken input, into an electrical signal, and the speaker 104 is a means for outputting the converted electrical signal as sound. The information processing unit 105 is a means for executing the processing for interacting with the user, based on the various programs stored in the storage unit 106.

The spoken dialogue apparatus 1 is configured using a computer (not shown) that includes a CPU (Central Processing Unit), a main storage device consisting of semiconductor memory such as RAM (Random Access Memory), an auxiliary storage device such as a hard disk drive, an input/output interface, and the like. Here, the CPU corresponds to the information processing unit 105, the main storage device to the storage unit 106, and the input/output interface to the voice input unit 102 and the voice output unit 103.
The main storage device stores the speech recognition program 107, the dialogue control program 108, the speech synthesis program 109, and the search program 110 of the storage unit 106.
The auxiliary storage device stores the dialogue scenarios 111, the keyword type dictionary 112, the paraphrase dictionary 113, the response sentence template dictionary 114, and the database 115. Each function is described in detail later.

When executed by the information processing unit 105, the speech recognition program 107 recognizes the keywords expressed in the user's input voice data and outputs the result. For example, for the user utterance "○○ Museum in Kanagawa Prefecture", the result can be obtained in a form such as (Kanagawa Prefecture, 0.8) (○○ Museum, 0.9). Here, the word in each pair of parentheses is a keyword to be recognized, and the number written with it is a confidence indicating how likely the recognized keyword is to be correct. The confidence value obtained for each keyword as a result of speech recognition processing in commonly used speech recognition technology can be used as is. The example above assumes that only the keywords, namely the prefecture name and the museum name, are output as the result, but all words, including non-keywords such as the particle "の" ("of"), can also be output. Furthermore, although only the highest-confidence keyword is output in the example above, multiple candidates for each keyword in the voice data can also be output.
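The recognition result format described above can be modeled as a list of (keyword, confidence) pairs. The following is a minimal Python sketch, not the patented implementation: the names `RecognizedKeyword` and `best_per_position` are hypothetical, and the second candidate "Kagawa Prefecture" with confidence 0.3 is invented purely to illustrate the multiple-candidate case.

```python
from typing import NamedTuple


class RecognizedKeyword(NamedTuple):
    keyword: str
    confidence: float  # likelihood that the keyword was recognized correctly


def best_per_position(candidates):
    """Keep only the highest-confidence candidate for each keyword position."""
    return [max(cands, key=lambda c: c.confidence) for cands in candidates]


# One inner list per keyword position in the utterance (illustrative data).
n_best = [
    [RecognizedKeyword("Kanagawa Prefecture", 0.8),
     RecognizedKeyword("Kagawa Prefecture", 0.3)],  # invented alternative
    [RecognizedKeyword("○○ Museum", 0.9)],
]
result = best_per_position(n_best)
```

Here `result` corresponds to the single-best output (Kanagawa Prefecture, 0.8) (○○ Museum, 0.9) from the example in the text.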

When executed by the information processing unit 105, the dialogue control program 108 uses the confidence as a condition: based on a recognized keyword and its confidence, it determines a response type and a paraphrase from the paraphrase dictionary 113; based on the determined response type, it determines the response sentence associated with that type from the response sentence template dictionary 114; and it inserts the determined paraphrase into the determined response sentence, generating a response sentence that prompts the user's next utterance. The processing of the dialogue control program 108 is described in detail later.

When executed by the information processing unit 105, the speech synthesis program 109 converts the response sentence generated by the dialogue control program 108 into voice data and outputs it.

The search program 110, when executed by the information processing unit 105, searches the database 115 for information on the target facility, using the facility location and name input by the user as search conditions. A known relational database or the like is used as the database 115. The search program 110 can be readily implemented using the search facilities normally provided with such a database. Alternatively, a commonly used means for searching information on the Internet can be used as the database 115.

FIG. 2 is a diagram showing the format of each dialogue scenario stored in the dialogue scenarios 111. A dialogue scenario records the types and number of keywords the user inputs, information for recognizing the user's speech (the user speech recognition grammar name), and information on the processing (command) performed using the keywords the user has input.

The dialogue name 201 is a character string giving the name of the dialogue, used to distinguish dialogue scenarios. The slot 1 name 202 and the slot n name 204 are character strings giving the names of the slots that the user fills. Here, a slot is a memory area that stores a keyword input by the user, and the slot name is used to distinguish these memory areas. A keyword input by the user for a slot is stored in the memory area corresponding to that slot.
The slot 1 type 203 and the slot n type 205 are character strings giving the type of keyword stored in each slot; they use the same character strings as the keyword type names used in the keyword type dictionary 112 described later. For example, character strings such as "prefecture name" and "museum" are stored. The slot 1 type 203 and the slot n type 205 are used to decide which slot a keyword input by the user is stored in.

The user speech recognition grammar name 206 is a character string giving the name of the speech recognition grammar in which the keywords used to recognize the user's input voice data, and the rules on how those keywords may be sequenced, are registered. The speech recognition grammar can use a format employed in commonly used speech recognition technology. Because the phrasing and keywords of the user's speech differ from dialogue to dialogue, Embodiment 1 sets a speech recognition grammar for each dialogue; however, a single speech recognition grammar covering all target dialogues may be prepared and used instead.
The command 207 is a character string representing a command for searching the database based on the keywords the user has input into the slots named by the slot 1 name 202 through the slot n name 204. For example, when the database is searched using slot 1 and slot 2 as search conditions and the command format is "SEARCH condition 1 condition 2", the command 207 is written as "SEARCH [name of slot 1] [name of slot 2]". Here, SEARCH is the name of the command that performs the search, and the notation [name of slot n] indicates that this part is to be replaced with the keyword stored in slot n.
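The dialogue scenario format and the command substitution described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the class `DialogueScenario`, the function `expand_command`, the slot names `place` and `facility`, the dialogue name, and the grammar name `facility_grammar` are all hypothetical and do not appear in the patent.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DialogueScenario:
    name: str          # dialogue name (201)
    slot_types: dict   # slot name -> keyword type (202-205)
    grammar_name: str  # user speech recognition grammar name (206)
    command: str       # command template (207)
    slots: dict = field(default_factory=dict)  # slot name -> stored keyword


def expand_command(scenario):
    """Replace each [slot name] in the command with the keyword stored in that slot."""
    return re.sub(r"\[([^\]]+)\]",
                  lambda m: scenario.slots[m.group(1)],
                  scenario.command)


scenario = DialogueScenario(
    name="facility search",                       # hypothetical
    slot_types={"place": "prefecture name", "facility": "museum"},
    grammar_name="facility_grammar",              # hypothetical
    command="SEARCH [place] [facility]",
)
scenario.slots["place"] = "Kanagawa Prefecture"
scenario.slots["facility"] = "○○ Museum"
expanded = expand_command(scenario)
```

In this sketch an unfilled slot raises `KeyError`, which reflects the idea that the search is issued only once the required keywords are available.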

FIG. 3 is a diagram showing the format of the information stored in the keyword type dictionary 112. The keyword type dictionary 112 is a dictionary that stores the keywords contained in the user's spoken input in association with the names of their types.

The column denoted type 301 contains the names of the keyword types, and the column denoted keyword 302 contains the keywords belonging to each type. For example, in FIG. 3, "○○ Museum" 304, "△△ Museum" 305, and "×× Museum" 306 are keywords belonging to the type "museum" 303. Likewise, "Tokyo" 308, "Kanagawa Prefecture" 309, and "Chiba Prefecture" 310 are keywords belonging to the type "prefecture name" 307.
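A keyword type dictionary of this kind, together with the use of the slot types to decide which slot receives a keyword, might be sketched in Python as below. The function names `keyword_type` and `slot_for` and the slot names `place` and `facility` are hypothetical; only the types and keywords are taken from the FIG. 3 description.

```python
# Keyword type dictionary: type name -> keywords of that type (per FIG. 3).
KEYWORD_TYPES = {
    "museum": ["○○ Museum", "△△ Museum", "×× Museum"],
    "prefecture name": ["Tokyo", "Kanagawa Prefecture", "Chiba Prefecture"],
}


def keyword_type(keyword):
    """Return the type name of a keyword, or None if it is not registered."""
    for type_name, keywords in KEYWORD_TYPES.items():
        if keyword in keywords:
            return type_name
    return None


def slot_for(keyword, slot_types):
    """Return the name of the slot whose type matches the keyword's type."""
    wanted = keyword_type(keyword)
    for slot_name, slot_type in slot_types.items():
        if slot_type == wanted:
            return slot_name
    return None


slot_types = {"place": "prefecture name", "facility": "museum"}  # hypothetical names
```

For instance, `slot_for("○○ Museum", slot_types)` selects the `facility` slot because both carry the type "museum".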

FIG. 4 is a diagram showing the format of the information stored in the paraphrase dictionary 113. The paraphrase dictionary 113 is a dictionary that defines the paraphrase rules used to generate a response sentence from a keyword contained in the user's spoken input and the confidence of that keyword.

The column denoted keyword 401 contains the keyword to be paraphrased, the column denoted condition (confidence) 402 contains the condition under which the paraphrase is applied, the column denoted paraphrase 403 contains the paraphrase, and the column denoted response type 404 contains the response type. In the condition (confidence) 402 column, "x" represents the confidence of the keyword recognized from the user's voice data; for example, "x > 0.8" expresses the condition "the confidence is greater than 0.8". The four rows in the field denoted 405 are the paraphrase rules for the keyword "○○ Museum", and the four rows in the field denoted 406 are the paraphrase rules for the keyword "Kanagawa Prefecture". The row denoted 407 is one of the paraphrase rules for "○○ Museum": when the confidence is greater than 0.8, the keyword "○○ Museum" is used as is and the response type "keyword confirmation" is selected. The row denoted 408, in contrast, is the rule that when the confidence is greater than 0.5 and at most 0.8, the keyword "○○ Museum" is replaced with "the name of the museum" and the response type "narrowing" is selected.
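The rule lookup described above can be sketched as a per-keyword list of (condition, paraphrase, response type) entries. This Python sketch is illustrative only. The first two rows follow rows 407 and 408; the confidence bounds of the last two rows, and the assignment of "re-input" to the lowest range, are assumptions made for the example, since the text here does not state them. The name `select_paraphrase` is hypothetical.

```python
# Paraphrase rules per keyword: (condition on confidence x, paraphrase, response type).
PARAPHRASE_RULES = {
    "○○ Museum": [
        (lambda x: x > 0.8, "○○ Museum", "keyword confirmation"),            # row 407
        (lambda x: 0.5 < x <= 0.8, "the name of the museum", "narrowing"),   # row 408
        (lambda x: 0.3 < x <= 0.5, "museum", "type confirmation"),           # bounds assumed
        (lambda x: x <= 0.3, "", "re-input"),                                # row assumed
    ],
}


def select_paraphrase(keyword, confidence):
    """Return (paraphrase, response type) for the first rule whose condition holds."""
    for condition, paraphrase, response_type in PARAPHRASE_RULES.get(keyword, []):
        if condition(confidence):
            return paraphrase, response_type
    return None
```

Because each keyword carries its own rule list, thresholds and paraphrases can differ from keyword to keyword, which is the point of contrast with the common rules of the prior art.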

FIG. 5 is a diagram showing the format of the information stored in the response sentence template dictionary 114. The response sentence template dictionary 114 is a dictionary that records the format of the response sentence associated with each response type 404 determined by the paraphrase dictionary 113 (see FIG. 4).

The column denoted response type 501 contains the response type, and the column denoted response sentence template 502 contains the response sentence template for that response type. The row denoted 503 stores the response sentence template for the response type "keyword confirmation", the row denoted 504 the template for "narrowing", the row denoted 505 the template for "type confirmation", and the row denoted 506 the template for "re-input". The "X" in each response sentence template indicates the position at which the paraphrase 403 determined by the paraphrase dictionary 113 is inserted. That is, a response sentence is generated by inserting the paraphrase determined by the paraphrase dictionary 113 into the response sentence template 502 corresponding to the response type 501.

For example, when the paraphrase rule in the row denoted 407 of the paraphrase dictionary 113 (see FIG. 4) is applied, the response sentence template in the row denoted 503 is selected from the response sentence template dictionary 114 and the paraphrase "○○ Museum" is inserted, so the response sentence is "○○ Museum, is that correct?".
When the paraphrase rule in the row denoted 408 is applied, the response sentence template in the row denoted 504 is selected from the response sentence template dictionary 114 and the paraphrase "the name of the museum" is inserted, so the response sentence is "Please say the name of the museum again."
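Template filling of this kind can be sketched in a few lines of Python. In this sketch the placeholder is written `{X}` so that `str.format` can be used, standing in for the "X" of the response sentence templates; the English template wordings are renderings of the two example sentences above, and the name `generate_response` is hypothetical. The template for "re-input" is omitted because its wording is not quoted in this part of the description.

```python
# Response sentence templates per response type; {X} marks the paraphrase position.
RESPONSE_TEMPLATES = {
    "keyword confirmation": "{X}, is that correct?",
    "narrowing": "Please say {X} again.",
    "type confirmation": "{X}, is that correct?",
}


def generate_response(response_type, paraphrase):
    """Insert the paraphrase into the template selected by the response type."""
    return RESPONSE_TEMPLATES[response_type].format(X=paraphrase)


confirm = generate_response("keyword confirmation", "○○ Museum")
narrow = generate_response("narrowing", "the name of the museum")
```

Keeping the templates keyed by response type means a new response style can be added by registering one more entry, without touching the paraphrase rules.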

FIG. 6 is a flowchart showing the processing procedure of the spoken dialogue apparatus 1 to which Embodiment 1 of the present invention is applied. The processing of the spoken dialogue apparatus 1 is carried out mainly by executing the dialogue control program 108. Using the dialogue scenarios 111, the keyword type dictionary 112, the paraphrase dictionary 113, and the response sentence template dictionary 114, the dialogue control program 108 prompts the user to input by voice the keywords that serve as search conditions for the information the user wants; once the necessary keywords have been obtained, it searches the database 115 and outputs the result to the user. The procedure in Embodiment 1 is described below following FIG. 6, with reference to FIG. 1.

FIG. 6 assumes that the user has already selected a particular dialogue scenario by another voice command, by a menu on a screen (not shown), or the like; that the speech recognition program 107 has been notified of the user speech recognition grammar name corresponding to the selected dialogue scenario; and that voice data input by the user can therefore be recognized. It is also assumed that each slot of the dialogue scenario 111 is empty when processing starts.
Steps S600 to S603 below are performed by the dialogue control program 108 as executed by the information processing unit 105.

When processing starts, step S600 initializes each slot. That is, when the user selects a particular dialogue scenario 111 (see FIG. 2), that dialogue scenario 111 is stored in a memory area. The dialogue scenario 111 contains the user speech recognition grammar name 206, the corresponding command 207, the slot 1 type 203, and the slot n type 205. In addition, memory areas for storing the contents of the slots N (N = 1, ..., n) are allocated and left empty (empty slots).

In step S601, among the memory areas storing the contents of the slots N (N = 1, ..., n) of the dialogue scenario 111, a slot being empty is taken as the condition, and the response type of the response sentence that prompts the user to input a keyword for that empty slot is determined from the correspondence table shown in FIG. 7, described below. Step S601 is performed when a dialogue with the user is newly started or when the user is to be prompted to input a keyword for a new slot.

FIG. 7 is a correspondence table showing the information that stores the relationship between empty slots and their response types in Embodiment 1 of the present invention. The table consists of an empty slot list 701 and a response type 702: the column denoted empty slot list 701 stores the slot name, and the column denoted response type 702 stores the response type for that slot. For example, the row denoted 703 indicates that the response type is "request 1" when slot 1 is empty. Similarly, the row denoted 704 indicates that the response type is "request 2" when slot 2 is empty.
The correspondence table shown in FIG. 7, which associates empty slots with response types, is prepared in advance for each user speech recognition grammar name 206. The response types for empty slots can also be stored in the paraphrase dictionary 113 (see FIG. 4) as a kind of paraphrase rule. Alternatively, a separate storage means may be provided to store the response types for empty slots.
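The empty-slot lookup of step S601 can be sketched as a scan over the correspondence table. This Python sketch is an assumption-laden illustration: the description does not say which slot is chosen when several are empty, so the sketch simply takes the first empty slot in table order, and the name `response_type_for_empty_slot` is hypothetical.

```python
# Correspondence table per FIG. 7: (slot name, response type when that slot is empty).
EMPTY_SLOT_RESPONSE_TYPES = [
    ("slot 1", "request 1"),  # row 703
    ("slot 2", "request 2"),  # row 704
]


def response_type_for_empty_slot(slots):
    """Return the response type for the first empty slot, or None if all are filled.

    Choosing the first empty slot in table order is an assumption of this sketch.
    """
    for slot_name, response_type in EMPTY_SLOT_RESPONSE_TYPES:
        if not slots.get(slot_name):
            return response_type
    return None
```

A return value of `None` corresponds to the state in which every slot holds a keyword and no further prompt for input is needed.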

In step S602, the response sentence template dictionary 114 (see FIG. 5) is searched based on the response type selected in step S601, and the corresponding response sentence template is determined. The information stored in the response sentence template dictionary 114 as shown in FIG. 5 does not cover response sentence templates for the response types 702 shown in FIG. 7; however, response sentence templates for empty slots can be included in the response sentence template dictionary 114, for example as in the rows denoted 801 and 802 in FIG. 8, described below.

FIG. 8 is a diagram showing the information stored in a response sentence template dictionary 114a, which extends the response sentence template dictionary 114 shown in FIG. 5. In FIG. 8, in accordance with FIG. 7, the row denoted 801 holds the response sentence used when slot 1 is empty, and the row denoted 802 holds the response sentence used when slot 2 is empty.

In step S603, the paraphrase that has been determined is inserted into the response sentence template to generate the response sentence. The paraphrase to be inserted is determined in step S607, described later. Therefore, when step S603 is reached after the response type was selected in step S601, processing proceeds with no paraphrase inserted into the response sentence template.

In step S604, the response sentence generated in step S603 is converted into voice data by the speech synthesis program 109 and output from the speaker 104 via the voice output unit 103.

In step S605, the speech recognition program 107 recognizes the voice data that the user inputs in reply to the response sentence output in step S604, extracts keywords, and notifies the dialogue control program of the extracted keywords together with the confidence indicating how likely each is to be correct.
Steps S606 to S611 below are performed by the dialogue control program 108 as executed by the information processing unit 105.

In step S606, it is determined whether the recognition result of the voice data is a response to a confirmation. One way to determine this is to check whether the recognition result contains a specific word, registered in advance, that indicates a response to a confirmation, such as "yes" or "no"; if such a word is present, the input is judged to be a response to a confirmation, and otherwise it is judged not to be. Alternatively, information on whether the response sentence asked for confirmation can be retained, and if it did, the user's speech can be judged to be a response to that confirmation. Whether a response sentence asks for confirmation can easily be determined by adding, to the response type 501 or the response sentence template 502 of the response sentence template dictionary 114 (see FIG. 5), information indicating whether it is a response sentence that asks for confirmation. The determination can also use both the information on whether the response sentence asks for confirmation and the specific words contained in the recognition result.
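The three determination methods described for step S606 can be sketched in Python as follows. The function name `is_confirmation_response`, the `mode` parameter, and the English word list are hypothetical. In particular, the description says only that both sources of information can be used together; combining them with a logical AND is an assumption of this sketch.

```python
# Words registered in advance as indicating a response to a confirmation.
CONFIRMATION_WORDS = {"yes", "no"}


def is_confirmation_response(recognized_words, asked_confirmation, mode="both"):
    """Decide whether the user's utterance is a response to a confirmation.

    mode="words": rely only on registered words in the recognition result.
    mode="flag":  rely only on whether the previous response asked for confirmation.
    mode="both":  require both (the AND combination is an assumption of this sketch).
    """
    has_word = any(w.lower() in CONFIRMATION_WORDS for w in recognized_words)
    if mode == "words":
        return has_word
    if mode == "flag":
        return asked_confirmation
    return has_word and asked_confirmation
```

The `asked_confirmation` flag would come from information attached to the response type or template of the previously output response sentence.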

If it is determined in step S606 that the user's voice is not a response to a confirmation (No), the process proceeds to step S607.
In step S607, the paraphrase dictionary 113 (see FIG. 4) is searched based on the keyword recognized in step S605 and its reliability, the paraphrase 403 and the response type 404 are determined, and the process returns to step S602.

For example, suppose the result of recognizing the user's voice in step S605 is (XX Art Museum, 0.4). The paraphrase dictionary 113 (see FIG. 4) is first searched using "XX Art Museum" as the search key, and in step S607 the paraphrase 403 "art museum" and the response type 404 "type confirmation" are selected.
In this example, step S602 selects the corresponding response sentence template 505 from the response sentence template dictionary 114 (see FIG. 5) based on that response type, so the response sentence generated in step S603 is "Art museum, is that correct?".
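The lookup in step S607 and the template insertion in steps S602 and S603 can be sketched as below. The table contents are assumptions made for illustration: the reliability thresholds, the English template wording, and the function names do not come from FIG. 4 or FIG. 5, and only the case (XX Art Museum, 0.4) is taken from the description.

```python
# Each rule: (keyword, condition on reliability, paraphrase, response type).
# Thresholds 0.5 and 0.8 are assumed values for this sketch.
PARAPHRASE_DICT = [
    ("XX Art Museum", lambda c: c > 0.8,
     "XX Art Museum", "keyword confirmation"),
    ("XX Art Museum", lambda c: 0.5 < c <= 0.8,
     "the name of the art museum", "narrowing down"),
    ("XX Art Museum", lambda c: c <= 0.5,
     "art museum", "type confirmation"),
]

# Response type -> response sentence template (wording assumed).
TEMPLATES = {
    "keyword confirmation": "{x}, is that correct?",
    "type confirmation": "{x}, is that correct?",
    "narrowing down": "Please say {x} again.",
}


def lookup_paraphrase(keyword, reliability):
    """Step S607 sketch: first rule whose keyword and condition match."""
    for kw, condition, paraphrase, response_type in PARAPHRASE_DICT:
        if kw == keyword and condition(reliability):
            return paraphrase, response_type
    return None


def generate_response(keyword, reliability):
    """Steps S602-S603 sketch: pick the template and insert the paraphrase."""
    found = lookup_paraphrase(keyword, reliability)
    if found is None:
        return None
    paraphrase, response_type = found
    sentence = TEMPLATES[response_type].format(x=paraphrase)
    return sentence[0].upper() + sentence[1:]
```

Under these assumed rules, a low-reliability recognition leads to a question about the keyword type rather than the keyword itself, which is the behaviour the example in the description illustrates.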

On the other hand, if it is determined in step S606 that the user's voice is a response to a confirmation (Yes), the process proceeds to step S608.

In step S608, it is further determined whether the user's response to the confirmation was affirmative or negative. Here too, if the recognition result of the user's voice contains a specific affirmative word such as "yes" or "that's right", the response is judged to be affirmative; if it contains a specific negative word such as "no" or "that's wrong", the response is judged to be negative.
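A minimal sketch of the affirmative/negative determination in step S608 follows. The word sets mirror the examples in the description; the function name, the treatment of each expression as a single recognized token, and the choice to test affirmative words first are assumptions.

```python
# Example expressions from the description, treated here as single tokens.
AFFIRMATIVE = {"yes", "that's right"}
NEGATIVE = {"no", "that's wrong"}


def confirmation_polarity(recognized_words):
    """Step S608 sketch: return "affirmative", "negative", or None."""
    words = {w.lower() for w in recognized_words}
    if words & AFFIRMATIVE:
        return "affirmative"  # continue to step S610
    if words & NEGATIVE:
        return "negative"     # continue to step S609
    return None               # neither kind of word was recognized
```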

If it is determined in step S608 that the user's response to the confirmation is negative (No), the process proceeds to step S609, where the keyword under confirmation is deleted, and then returns to step S601. Which keyword is under confirmation is easily determined by retaining the recognition result of the user's voice that was input before the utterance judged to be the response to the confirmation.

If it is determined in step S608 that the user's response to the confirmation is affirmative (Yes), the process proceeds to step S610, where the keyword under confirmation is stored in the memory area that holds the content of the corresponding slot N (N = 1, ..., n) of the dialogue scenario 111. To do this, the keyword type 301 is first obtained from the keyword type dictionary 112 (see FIG. 3) using the keyword as the key. Next, the dialogue scenario 111 in FIG. 2 is searched for a slot whose type matches the obtained keyword type. The slot found is treated as the slot for the keyword, and the keyword is stored in the memory area that holds the content of that slot.

After the keyword is stored in the corresponding slot in step S610, the process proceeds to step S611, where it is checked whether keywords have been stored in all slots.
If not every slot contains a keyword (No), the process returns to step S601.
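Steps S610 and S611 can be sketched as follows. The data layout (a list of dictionaries for the dialogue scenario 111 and a plain mapping for the keyword type dictionary 112) and all function names are assumptions; the slot types "prefecture name" and "art museum" follow the examples used elsewhere in the description.

```python
# Stand-in for the keyword type dictionary 112: keyword -> keyword type.
KEYWORD_TYPES = {
    "XX Art Museum": "art museum",
    "Kanagawa Prefecture": "prefecture name",
}


def make_scenario():
    """Stand-in for the slots of the dialogue scenario 111 (all empty)."""
    return [
        {"type": "prefecture name", "content": None},  # slot 1
        {"type": "art museum", "content": None},       # slot 2
    ]


def store_keyword(scenario, keyword):
    """Step S610 sketch: store a confirmed keyword in the matching slot."""
    keyword_type = KEYWORD_TYPES.get(keyword)
    if keyword_type is None:
        return False
    for slot in scenario:
        if slot["type"] == keyword_type:
            slot["content"] = keyword
            return True
    return False


def all_slots_filled(scenario):
    """Step S611 sketch: True when every slot holds a keyword."""
    return all(slot["content"] is not None for slot in scenario)
```

When `all_slots_filled` returns false the dialogue would return to step S601; when it returns true the stored keywords are ready to be passed to the search.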

If keywords are stored in all slots in step S611 (Yes), the process proceeds to step S612, where the search program 110 searches the database 115 using the command indicated by reference numeral 207 in the dialogue scenario 111 (see FIG. 2) and the keywords stored in the slots, and the result is output by the voice synthesis program 109.

Furthermore, although the paraphrase dictionary 113 (see FIG. 4) stores a paraphrase for each keyword, paraphrases can instead be stored for each keyword type. In that case, the format shown in FIG. 9 may be used for the paraphrase dictionary.

FIG. 9 is a configuration diagram showing the information stored in the paraphrase dictionary for keyword types in Embodiment 1 of the present invention.

FIG. 9 differs from the paraphrase dictionary 113 shown in FIG. 4 in the contents of the column indicated by type 901 and the column indicated by paraphrase 902. The type 901 column holds a character string representing the type of keyword to be paraphrased.
In FIG. 9, "art museum" 903 and "prefecture name" 904 are the character strings representing keyword types. The contents of the paraphrase 902 column are almost the same as those of the paraphrase 403 column in FIG. 4, except for the rules in the rows indicated by reference numerals 905 and 906. The rule in row 905 means "if the type of the recognized keyword is art museum and the reliability of the keyword is greater than 0.8, select the recognized keyword itself as the paraphrase". The notation [keyword] in rows 905 and 906 indicates that the recognized keyword is used as the paraphrase.

When the paraphrase dictionary 113a shown in FIG. 9 is used, step S607 in the flowchart of FIG. 6 is changed as follows. First, the keyword recognized from the user's voice is looked up in the keyword type dictionary 112 to determine the keyword type 301. Then the paraphrase dictionary 113a is searched based on the determined keyword type 301 and the reliability of the recognized keyword, and the paraphrase and the response type are determined.
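The modified step S607 can be sketched as below. Only the first rule (type "art museum", reliability greater than 0.8, paraphrase [keyword]) is taken from the description of row 905; its response type, the second rule, the keyword "YY Art Museum", and the function name are assumptions added so that the example runs.

```python
# Stand-in for the keyword type dictionary 112.
KEYWORD_TYPES = {
    "XX Art Museum": "art museum",
    "YY Art Museum": "art museum",   # hypothetical second keyword
}

# Stand-in for the type-based paraphrase dictionary 113a:
# (keyword type, condition on reliability, paraphrase, response type).
TYPE_PARAPHRASE_DICT = [
    ("art museum", lambda c: c > 0.8, "[keyword]", "keyword confirmation"),
    ("art museum", lambda c: c <= 0.8, "art museum", "type confirmation"),
]


def lookup_by_type(keyword, reliability):
    """Resolve the keyword type first, then search the rules by type."""
    keyword_type = KEYWORD_TYPES.get(keyword)
    for rule_type, condition, paraphrase, response_type in TYPE_PARAPHRASE_DICT:
        if rule_type == keyword_type and condition(reliability):
            if paraphrase == "[keyword]":
                # [keyword] means: use the recognized keyword as it is.
                paraphrase = keyword
            return paraphrase, response_type
    return None
```

Because the rules are keyed by type, one pair of rows covers every art museum name in the keyword type dictionary instead of requiring a row per keyword.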

In the paraphrase dictionaries shown in FIGS. 4 and 9, only one paraphrase is registered for each combination of keyword and condition, but a plurality of paraphrases can also be registered. In that case, the paraphrase to use may be chosen, for example, with a random number.
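Random selection among several registered paraphrases could look like the following sketch. The function name and the candidate strings in the usage note are hypothetical; the description states only that a random number may be used.

```python
import random


def choose_paraphrase(candidates, rng=random):
    """Pick one of the registered paraphrases uniformly at random.

    rng may be the random module or a random.Random instance; passing a
    seeded instance makes the choice reproducible in tests.
    """
    return rng.choice(candidates)
```

For example, `choose_paraphrase(["which art museum", "what art museum"])` returns one of the two strings on each call.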

According to Embodiment 1 of the present invention, response sentences can be generated flexibly even when the reliability of the keyword recognition result varies greatly with the phonological characteristics, length, and so on of the keyword. For example, even when keywords are recognized correctly, some kinds of keyword consistently yield high reliability while others yield low reliability. In the paraphrase dictionary 113 (see FIG. 4), the reliability conditions for selecting a paraphrase (see reference numeral 402) differ between "XX Art Museum" and "Kanagawa Prefecture". The settings assume that a correct recognition of "XX Art Museum" usually has high reliability, whereas a correct recognition of "Kanagawa Prefecture" often has low reliability. Setting the paraphrase selection conditions per keyword in this way makes it possible to generate a response sentence suited to the characteristics of the keyword being recognized.

(Embodiment 2)
Embodiment 2 of the present invention (voice dialogue apparatus and voice dialogue program) will be described below with reference to FIG. 10 and other figures.

Embodiment 1 described above assumed that the user's voice contains only one keyword. In dialogue with a user, however, operability generally improves if the user can include two or more keywords in a single utterance.

In Embodiment 2 of the present invention, the response sentence template dictionary 114 uses the format shown in FIG. 10 so that a flexible response sentence can be generated even when the user's voice contains a plurality of keywords. In the response sentence template dictionary 114 shown in FIG. 5, only one keyword is handled at a time, so there is only one corresponding response type. When two or more keywords are handled at the same time, applying the paraphrase dictionary 113 shown in FIG. 4 to each keyword determines a response type for each of them. The response sentence template dictionary 114b shown in FIG. 10 therefore allows a response sentence template to be determined for a combination of response types.

The dictionary shown in FIG. 10 covers the case of two target slots. The combination of response types selected from the keywords corresponding to the respective slots is recorded in the columns indicated by the response type 1001 of slot 1 and the response type 1002 of slot 2. For example, in the row indicated by reference numeral 1003, when the response types of slot 1 and slot 2 are both "keyword confirmation", the response sentence template "[name of slot 2] in [name of slot 1], is that correct?" is selected.
In step S603 of the flowchart shown in FIG. 6, the paraphrase of the keyword corresponding to the type of the respective slot N (N = 1, ..., n) stored in the dialogue scenario 111 is inserted into [name of slot 1] and [name of slot 2]. That is, the slot into which a keyword's paraphrase is inserted can be determined by looking up the type of the keyword in the keyword type dictionary 112 and then searching the dialogue scenario 111 for the slot whose type matches it.

In the rows indicated by reference numerals 1003 to 1006 in FIG. 10, the response type selected for each slot is recorded explicitly, whereas the rows indicated by reference numerals 1007 to 1009 place no constraint on the response type of slot 2. The symbol "*" in the column indicated by reference numeral 1002 shows that the response type is unconstrained. Thus, for example, in the row indicated by reference numeral 1007, if the response type for slot 1 is "narrowing down", the response sentence template "Please say [name of slot 1] again." is selected regardless of the response type for slot 2.
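Template selection over a combination of response types, including the "*" wildcard, can be sketched as follows. The two rows correspond to the rows described for reference numerals 1003 and 1007; the English template wording, the first-match-wins search order, and the function name are assumptions.

```python
# Stand-in for part of the response sentence template dictionary 114b:
# (slot 1 response type, slot 2 response type, template). "*" = any type.
COMBINATION_TEMPLATES = [
    ("keyword confirmation", "keyword confirmation",
     "{slot2} in {slot1}, is that correct?"),
    ("narrowing down", "*",
     "Please say {slot1} again."),
]


def select_template(type1, type2):
    """Return the first template whose two response types both match."""
    for t1, t2, template in COMBINATION_TEMPLATES:
        if t1 in ("*", type1) and t2 in ("*", type2):
            return template
    return None
```

A selected template is completed with `template.format(slot1=..., slot2=...)`, passing the paraphrases chosen for slot 1 and slot 2; a template that mentions only one slot simply ignores the other argument.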

When the user's voice contains a plurality of keywords and a response sentence is generated from a row such as those indicated by reference numerals 1007 to 1009 in FIG. 10, the process of the flowchart in FIG. 6 confirms only one keyword, and the other keywords have to be input again. To avoid this, for example, information indicating whether the recognition result of the user's voice has been confirmed can be added to the dialogue scenario 111 for each slot, and response sentences can then be generated for the unconfirmed keywords using the paraphrase dictionary 113 (see FIG. 4) and the response sentence template dictionary 114 (see FIG. 5). This allows an efficient dialogue covering all the keywords.

When the user's voice contains a plurality of keywords, a paraphrase dictionary such as that shown in FIG. 11 can also be used so that the response type is determined uniquely from the combination of the recognized keywords and their reliabilities.

FIG. 11 is a configuration diagram showing the format of a paraphrase dictionary for combinations of a plurality of keywords.
Each entry registers, as a set, the first keyword 1101, the condition 1102 on the reliability of the first keyword, the second keyword 1103, the condition 1104 on the reliability of the second keyword, the paraphrase 1105 for the first keyword, the paraphrase 1106 for the second keyword, and the response type 1107. For each combination of keywords and reliability conditions indicated by reference numerals 1101 to 1104, the paraphrase for each keyword and the response type are determined.

FIG. 12 shows the format of the response sentence template dictionary used when the paraphrases and response type for a plurality of keywords have been determined, for example by using the paraphrase dictionary 113b of FIG. 11. The format of the response sentence template dictionary 114c shown in FIG. 12 is basically the same as that of the response sentence template dictionary 114 shown in FIG. 5, except that the response sentence templates in the column indicated by response sentence template 1201 are written so that keywords corresponding to a plurality of slots can be inserted.

For example, when the condition indicated by reference numeral 1108 in FIG. 11 applies, the response sentence template in the row indicated by reference numeral 1202 in FIG. 12 is selected. If the type of slot 1 is "prefecture name" and the type of slot 2 is "art museum", the response sentence is "XX Art Museum in Kanagawa Prefecture, is that correct?".
When the condition indicated by reference numeral 1109 in FIG. 11 applies, the response sentence template indicated by reference numeral 1204 in FIG. 12 is selected, and the response sentence is "What art museum in Kanagawa Prefecture is it?".
When the condition indicated by reference numeral 1110 in FIG. 11 applies, the response sentence template indicated by reference numeral 1203 in FIG. 12 is selected, and the response sentence is "XX Art Museum, is that correct?".

Furthermore, a paraphrase dictionary 113 and response sentence template dictionary 114 for a single keyword can be used together with a paraphrase dictionary 113 and response sentence template dictionary 114 for combinations of two or more keywords.
For example, priorities may be assigned to the target slots, and the paraphrase dictionary 113 and response sentence template dictionary 114 for the slot with the higher priority searched first. Alternatively, the paraphrase dictionary 113 and response sentence template dictionary 114 for the combination of the larger number of slots may be searched first.
The slot priority and the number of target slots can also be used together, or priorities for slot combinations can be defined in advance.

According to Embodiment 2 of the present invention, an appropriate response sentence can be generated flexibly even when the way a keyword is paraphrased should change with the types and number of keywords input by the user.
For example, consider generating a response sentence that asks the user to re-enter a keyword recognized from the user's voice while specifying its type. If the user's voice contains one keyword, as in "XX Art Museum.", a response such as "Please say the name of the art museum again." is conceivable.
If the user's voice contains two keywords (Kanagawa Prefecture and XX Art Museum), as in "XX Art Museum in Kanagawa Prefecture.", a response such as "Which art museum in Kanagawa Prefecture?" is appropriate.
In the first example "XX Art Museum" is replaced with "the name of the art museum", and in the second it is replaced with "which art museum".
Thus, even for the same keyword, the appropriate paraphrase may differ depending on the number of keywords included in the response sentence. Depending on the preceding response sentences, an expression such as "what kind of art museum" or "what art museum" may be more appropriate than "which art museum".
In such cases, an appropriate response sentence can be generated by combining the paraphrase dictionary 113 and response sentence template dictionary 114 for a single keyword with the paraphrase dictionary 113 and response sentence template dictionary 114 for combinations of two or more keywords.

(Embodiment 3)
Embodiment 3 of the present invention (voice dialogue apparatus and voice dialogue program) will be described with reference to FIG. 13 and other figures.

The paraphrase dictionary 113 in Embodiments 1 and 2 described above uses only the recognized keyword and its reliability as conditions for selecting the paraphrase and the response type. In dialogue with a user, however, exchanges often go more smoothly if the response sentence is changed according to the amount of information exchanged, the content of the dialogue so far, and so on. To achieve this, items other than the recognized keyword and its reliability are added to the conditions for selecting the paraphrase and the response type in the paraphrase dictionary 113.

FIG. 13 is a configuration diagram showing the information stored in a paraphrase dictionary to which such condition items have been added in Embodiment 3 of the present invention.

In the paraphrase dictionary 113c of FIG. 13, the "number of other slots" shown in the column indicated by reference numeral 1301 is added as a condition. In that column, "*" means that there is no constraint on the number of slots other than the target slot, "0" means that there are no slots other than the target slot, and "y ≥ 1" means that the number of slots other than the target slot is one or more. Here y is a variable name used for convenience.

Suppose, for example, that the paraphrase dictionary 113c shown in FIG. 13 is used together with the response sentence template dictionaries shown in FIGS. 5 and 10. Suppose also that recognition of the user's voice yields "XX Art Museum" as the only keyword, with a reliability of 0.7, and that this keyword corresponds to slot 2. Applying the paraphrase dictionary 113c of FIG. 13 selects the row indicated by reference numeral 1302, giving the paraphrase "the name of the art museum" and the response type "narrowing down". Because there is only one target slot, the response sentence template "Please say X again." is selected from the response sentence template dictionary 114 shown in FIG. 5, and inserting the selected paraphrase produces the response sentence "Please say the name of the art museum again.".

Now suppose that the user's voice also contains "Kanagawa Prefecture", corresponding to slot 1, with a reliability of 1.0. Applying the paraphrase dictionary 113c of FIG. 13 gives the paraphrase "Kanagawa Prefecture" and the response type "keyword confirmation" for "Kanagawa Prefecture". With slot 1 and slot 2 there are now two target slots, so the response sentence template "Is it [name of slot 2] in [name of slot 1]?" is selected from the response sentence template dictionary 114b of FIG. 10.

If the number of other slots is not taken into account as a condition (for example, if the paraphrase dictionary 113 shown in FIG. 4 is used), the paraphrase for "XX Art Museum" is "the name of the art museum", and inserting the paraphrases into the selected template yields "Is it the name of the art museum in Kanagawa Prefecture?". Since the response type for "XX Art Museum" is "narrowing down", this is inappropriate as a response sentence.

By contrast, when the number of other slots is taken into account using the paraphrase dictionary 113c of FIG. 13, the number of other slots is 1, so "what art museum" is selected as the paraphrase for "XX Art Museum". Inserting the selected paraphrases into the template yields "What art museum in Kanagawa Prefecture is it?", a response sentence appropriate to the response type "narrowing down".
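The effect of the "number of other slots" condition can be sketched as below. The two utterances and their resulting paraphrases follow the example in the description; the reliability thresholds, the first row for "XX Art Museum", and all names are assumptions, since the actual rows of FIG. 13 are not reproduced here.

```python
def any_count(n):      # "*": no constraint on the number of other slots
    return True


def zero(n):           # "0": no other slots
    return n == 0


def one_or_more(n):    # "y >= 1": at least one other slot
    return n >= 1


# Stand-in for the paraphrase dictionary 113c: (keyword, reliability
# condition, other-slot condition, paraphrase, response type).
DICT_113C = [
    ("XX Art Museum", lambda c: c > 0.8, any_count,
     "XX Art Museum", "keyword confirmation"),
    ("XX Art Museum", lambda c: 0.5 < c <= 0.8, zero,
     "the name of the art museum", "narrowing down"),
    ("XX Art Museum", lambda c: 0.5 < c <= 0.8, one_or_more,
     "what art museum", "narrowing down"),
    ("Kanagawa Prefecture", lambda c: c > 0.5, any_count,
     "Kanagawa Prefecture", "keyword confirmation"),
]


def lookup_all(recognized):
    """recognized: list of (keyword, reliability) pairs from one utterance.

    Returns one (paraphrase, response type) pair, or None, per keyword.
    """
    results = []
    for keyword, reliability in recognized:
        others = len(recognized) - 1  # keywords other than this one
        match = None
        for kw, rel_ok, slots_ok, paraphrase, response_type in DICT_113C:
            if kw == keyword and rel_ok(reliability) and slots_ok(others):
                match = (paraphrase, response_type)
                break
        results.append(match)
    return results
```

The same keyword with the same reliability thus yields "the name of the art museum" when spoken alone and "what art museum" when spoken together with a prefecture name.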

The paraphrase dictionary 113c of FIG. 13 uses conditions based on the reliability and the number of other slots (the number of keywords contained in the user's voice) to select the paraphrase and the response type, but the types of the other slots, the user name, and the dialogue history (a history of past response types, response sentences, user voice recognition results, and the like) can also be used as conditions. To use this information, columns holding each item are added to the paraphrase dictionary 113c of FIG. 13.

When the user name is used, the user may be asked to enter the name by voice, keyboard, or the like at the start of the dialogue. Alternatively, a known face image recognition technique can be used to recognize the user's face in an image captured by a camera and enter the name that way. This makes it possible to vary the form of the response sentence for each user.

When the dialogue history is used, a sequence of response types, response sentences, and user voice recognition results is stored in the paraphrase dictionary 113 as a condition. Consider, for example, the following dialogue.
(1) Response sentence: Please say the facility name.
(2) User voice: XX Art Museum.
(3) Response sentence: Please say the name of the art museum again.
(4) User voice: XX Art Museum.
(5) Response sentence: XX Art Museum, is that correct?
(6) User voice: Yes.
When the paraphrase dictionary 113 shown in FIG. 4 and the response sentence template dictionary 114a shown in FIG. 8 are used, the response types of the response sentences are "request 1" for (1), "narrowing down" for (3), and "keyword confirmation" for (5). If the dialogue history is represented as a sequence of response types and user voice recognition results, it can be written, for example, as (response: request 1) (user: XX Art Museum) (response: narrowing down) (user: XX Art Museum) (response: keyword confirmation) (user: yes). Here "response" stands for the response type and "user" for the user voice recognition result. If information in this form is stored in the paraphrase dictionary 113 and the result of the dialogue actually carried out is recorded separately in the same form, the dialogue history can be used as a condition of the paraphrase dictionary 113.
In the example above only the recognized keyword is registered as the user voice recognition result, but the reliability obtained from recognition may be recorded with it. The response sentence actually output can easily be stored in place of the response type. It is also possible to store only the response sentences or only the user voice recognition results. In addition, the number of items in the dialogue history stored as a condition in the paraphrase dictionary 113 may be limited.

Using the dialogue history as a condition of the paraphrase dictionary 113 makes it easy to exercise control such as changing the response sentence when the response type would otherwise repeat narrowing down or type confirmation.
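A dialogue history in the (response: ...) (user: ...) form, and two ways it might be tested as a condition, can be sketched as follows. The list-of-tuples layout and both function names are assumptions; the history itself is the six-turn example from the description.

```python
# The example dialogue as (role, value) pairs: "response" carries the
# response type, "user" carries the recognized user utterance.
EXAMPLE_HISTORY = [
    ("response", "request 1"),
    ("user", "XX Art Museum"),
    ("response", "narrowing down"),
    ("user", "XX Art Museum"),
    ("response", "keyword confirmation"),
    ("user", "yes"),
]


def history_ends_with(history, pattern):
    """True when the most recent items of history equal pattern."""
    pattern = list(pattern)
    if not pattern:
        return True
    return history[-len(pattern):] == pattern


def consecutive_responses(history, response_type):
    """Count how many of the latest responses in a row had this type."""
    count = 0
    for role, value in reversed(history):
        if role != "response":
            continue          # skip user turns
        if value != response_type:
            break
        count += 1
    return count
```

A dictionary row could, for instance, apply only when `consecutive_responses(history, "narrowing down")` is at least 2, so that a differently worded response sentence is used after a repeated request. Matching only a short `pattern` against the tail of the history corresponds to limiting the number of history items stored as a condition.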

According to Embodiment 3 of the present invention, the content of the response sentence can be finely controlled for conditions more complex than those of Embodiment 2.

The voice dialogue apparatus of the present invention allows the content of the response sentence to be set in detail for each keyword, or each keyword type, expressed in the user's input voice data. As a result, dialogue with the user becomes more natural and improved operability can be expected. The present invention is therefore suited to automatic response systems in call centers and to operation interfaces for devices such as vending machines and ATMs.

FIG. 1 is a block diagram showing a configuration example of the voice dialogue apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a configuration diagram showing the format of the dialogue scenario in Embodiment 1 of the present invention.
FIG. 3 is a configuration diagram showing the format of the information stored in the keyword type dictionary in Embodiment 1 of the present invention.
FIG. 4 is a configuration diagram showing the format of the information stored in the paraphrase dictionary in Embodiment 1 of the present invention.
FIG. 5 is a configuration diagram showing the format of the information stored in the response sentence template dictionary in Embodiment 1 of the present invention.
FIG. 6 is a flowchart showing the processing procedure of the dialogue control program in Embodiment 1 of the present invention.
FIG. 7 is a correspondence table showing the information that stores the relationship between empty slots and response types in Embodiment 1 of the present invention.
FIG. 8 is a configuration diagram showing the format of the extended response sentence template dictionary in Embodiment 1 of the present invention.
FIG. 9 is a configuration diagram showing the format of the paraphrase dictionary for keyword types in Embodiment 1 of the present invention.
FIG. 10 is a configuration diagram showing the format of the response sentence template dictionary in Embodiment 2 of the present invention.
FIG. 11 is a configuration diagram showing the format of the paraphrase dictionary for combinations of a plurality of keywords in Embodiment 2 of the present invention.
FIG. 12 is a configuration diagram showing the format of the response sentence template dictionary for a plurality of keywords in Embodiment 2 of the present invention.
FIG. 13 is a configuration diagram showing the format of the paraphrase dictionary in Embodiment 3 of the present invention.

Explanation of symbols

107 Speech recognition program
108 Dialogue control program
109 Speech synthesis program
112 Keyword type dictionary
113 Paraphrase dictionary
114 Response sentence template dictionary

Claims

A spoken dialogue apparatus comprising:
voice recognition means for recognizing one or more keywords and their reliability from input user voice;
a paraphrase dictionary that records, for each keyword, a response type indicating the type of response sentence to be conveyed to the user by voice, a paraphrase used when the recognized keyword is included in the response sentence, and a condition for selecting the response sentence;
a response sentence template dictionary that stores response sentences associated with the response types;
dialogue control means for determining, based on the recognized keyword and its reliability, the response type and the paraphrase from the paraphrase dictionary using the reliability as the condition, determining from the response sentence template dictionary the response sentence associated with the determined response type, and inserting the determined paraphrase into the determined response sentence to generate a response sentence; and
voice synthesis means for converting the generated response sentence into voice data and outputting the voice data.
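
The flow recited in the apparatus claim above can be sketched in Python as follows. This is an illustrative sketch only, not the patented implementation: the dictionary layouts, the reliability-threshold form of the condition (`min_reliability`), the `{paraphrase}` placeholder syntax, and every sample entry are assumptions made for illustration. Speech recognition and speech synthesis are omitted.

```python
from dataclasses import dataclass


@dataclass
class ParaphraseEntry:
    min_reliability: float  # assumed form of the "condition": a reliability threshold
    response_type: str
    paraphrase: str


# Hypothetical paraphrase dictionary: keyword -> entries, highest threshold first.
PARAPHRASE_DICT = {
    "Tokyo Station": [
        ParaphraseEntry(0.8, "confirm", "Tokyo Station"),
        ParaphraseEntry(0.0, "reask", "the station"),
    ],
}

# Hypothetical response sentence template dictionary: response type -> template.
TEMPLATE_DICT = {
    "confirm": "Setting the destination to {paraphrase}.",
    "reask": "Could you repeat {paraphrase}?",
}


def generate_response(keyword: str, reliability: float) -> str:
    """Choose a response type and paraphrase by reliability, then fill the template."""
    for entry in PARAPHRASE_DICT[keyword]:
        if reliability >= entry.min_reliability:
            template = TEMPLATE_DICT[entry.response_type]
            return template.format(paraphrase=entry.paraphrase)
    raise LookupError(f"no entry satisfies the condition for {keyword!r}")
```

With these assumed entries, a keyword recognized with high reliability is echoed back in a confirmation, while a low-reliability keyword is replaced by a vaguer paraphrase in a re-ask sentence.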

The spoken dialogue apparatus according to claim 1, wherein the paraphrase dictionary includes, as the condition, in addition to the reliability, any one or more of:
the number of keywords included in the user's voice, the type of the keyword, the history of past response types, the history of past response sentences, and past recognition results of the user's voice.

The spoken dialogue apparatus according to claim 1 or 2, wherein the response sentence template dictionary
records response sentences each corresponding to a combination of the response types that correspond to two or more keywords included in the input user voice.

The spoken dialogue apparatus according to claim 3, wherein, for two or more keywords included in the input user voice, the dialogue control means
preferentially determines, from among the combinations of the determined response types, the response sentence associated with a combination made up of a larger number of response types.
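
The priority rule in the claim above can be illustrated with a short Python sketch. The template dictionary keyed by `frozenset`, the response type names, and the sample templates are all hypothetical; the claim text does not say how ties between equally large combinations are resolved, so this sketch simply keeps the first one found.

```python
from typing import Set

# Hypothetical template dictionary keyed by a combination of response types.
COMBINATION_TEMPLATES = {
    frozenset({"confirm_place", "confirm_genre"}): "Searching for {genre} near {place}.",
    frozenset({"confirm_place"}): "Searching near {place}.",
    frozenset({"confirm_genre"}): "Searching for {genre}.",
}


def select_template(determined_types: Set[str]) -> str:
    """Return the template for the largest registered combination that is
    fully covered by the determined response types."""
    candidates = [combo for combo in COMBINATION_TEMPLATES if combo <= determined_types]
    if not candidates:
        raise LookupError("no template matches the determined response types")
    # Prefer the combination made up of the most response types.
    best = max(candidates, key=len)
    return COMBINATION_TEMPLATES[best]
```

When both a place and a genre response type have been determined, the two-type template wins over either single-type template, so one sentence covers both keywords.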

The spoken dialogue apparatus according to claim 1 or 2, wherein the paraphrase dictionary
records the paraphrase for each keyword in association with a combination of two or more keywords included in the input user voice.

The spoken dialogue apparatus according to any one of claims 1 to 5, further comprising a keyword type dictionary that records each keyword in association with the name of its type, wherein
the paraphrase dictionary
records the response type, the paraphrase, and the condition for each type name, and
the dialogue control means
determines the type name of the recognized keyword from the keyword type dictionary, and determines the response type and the paraphrase from the paraphrase dictionary based on the determined type name and the condition.
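
The two-step lookup in the claim above (keyword to type name, then type name to response type and paraphrase) might look like the following Python sketch. The dictionary contents, the tuple layout, and the use of a minimum-reliability threshold as the condition are assumptions for illustration only.

```python
# Hypothetical keyword type dictionary: keyword -> type name.
KEYWORD_TYPE_DICT = {
    "Tokyo Station": "station",
    "Yokohama Station": "station",
    "ramen": "genre",
}

# Hypothetical paraphrase dictionary keyed by type name:
# type name -> list of (minimum reliability, response type, paraphrase),
# ordered from the highest threshold to the lowest.
TYPE_PARAPHRASE_DICT = {
    "station": [(0.8, "confirm", "that station"), (0.0, "reask", "the station name")],
    "genre": [(0.8, "confirm", "that genre"), (0.0, "reask", "the genre")],
}


def lookup_by_type(keyword, reliability):
    """Resolve the keyword's type name, then pick the first entry whose
    assumed reliability threshold is met."""
    type_name = KEYWORD_TYPE_DICT[keyword]
    for min_reliability, response_type, paraphrase in TYPE_PARAPHRASE_DICT[type_name]:
        if reliability >= min_reliability:
            return response_type, paraphrase
    raise LookupError(f"no entry satisfies the condition for type {type_name!r}")
```

Keying the paraphrase dictionary by type name means one set of entries serves every keyword of that type, rather than one set per keyword.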

The spoken dialogue apparatus according to claim 1, wherein
the paraphrase dictionary
records a plurality of the paraphrases, and
the dialogue control means
randomly selects one of the plurality of paraphrases determined from the paraphrase dictionary.
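
A minimal Python sketch of the random selection in the claim above, assuming the candidate paraphrases have already been retrieved as a list. The function name and the optional `rng` parameter are hypothetical, and uniform selection is an assumption, since the claim only says the choice is random.

```python
import random


def choose_paraphrase(paraphrases, rng=random):
    """Pick one paraphrase uniformly at random from the candidates."""
    if not paraphrases:
        raise ValueError("at least one paraphrase is required")
    return rng.choice(paraphrases)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which can help when testing a dialogue scenario.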

A spoken dialogue program for a computer that has a voice input unit for receiving voice uttered by a user via a voice input device, a voice output unit for outputting voice to the user via a voice output device,
and a storage device storing a paraphrase dictionary that records, for each keyword, a response type indicating the type of response sentence to be conveyed to the user by voice, a paraphrase used when the recognized keyword is included in the response sentence, and a condition for selecting the response sentence, and a response sentence template dictionary that records response sentences associated with the response types,
the program causing the computer to execute, in this order:
a process of recognizing one or more keywords and their reliability from the user's voice input via the voice input unit;
a process of determining the response type and the paraphrase from the paraphrase dictionary based on the recognized keyword and its reliability, using the reliability as the condition;
a process of determining, from the response sentence template dictionary, the response sentence associated with the determined response type;
a process of inserting the determined paraphrase into the determined response sentence to generate a response sentence; and
a process of synthesizing the generated response sentence into voice data and outputting the voice data via the voice output unit.