JPH11143488A

JPH11143488A - Voice recognition device

Info

Publication number: JPH11143488A
Application number: JP30674197A
Authority: JP
Inventors: Satoshi Matsuhashi; 聡松橋; Takuro Nakayama; 卓郎中山; Masahiro Kosaka; 昌宏小坂; Hana Igarashi; 華五十嵐
Original assignee: Hitachi Ltd; Hitachi Communication Systems Inc
Current assignee: Hitachi Ltd; Hitachi Information and Telecommunication Engineering Ltd
Priority date: 1997-11-10
Filing date: 1997-11-10
Publication date: 1999-05-28

Abstract

(57) [Summary]

[Problem] In a dialogue between a speech recognition device and a speaker, to eliminate the discomfort caused to the speaker by constantly confirming whether the recognition result is correct in order to prevent misrecognition and malfunction, and to provide the speech recognition service to the speaker smoothly and accurately.

[Solution] Within the speech recognition device 100, the recognition result for voice input from speakers 110 to 112 is evaluated, and the most suitable voice guidance for speakers 110 to 112 is selected from among the multiple patterns held in the database unit 108. By selecting among these voice guidance patterns, the device can instruct the speaker smoothly and accurately as to what information to input next.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition device that is located in a communication network, such as a public network, and is suitable for providing a voice input service. In particular, it relates to a speech recognition device that can judge whether a speech recognition result is correct and transmit appropriate guidance to the speaker.

[0002]

[Prior Art] As a representative practical application of speech recognition devices, application to the public network has been studied repeatedly so that services of various kinds can be operated. When a speech recognition device is applied to a public network, factors such as differences in the speaker's surrounding environment (for example, noise) and the type of voice input device (for example, a telephone) mean that the device does not necessarily recognize the input speech accurately, and it cannot be relied on to meet the speaker's request (utterance) 100% of the time. Various measures for handling misrecognition toward the speaker have therefore been studied.

[0003] As one such measure, Japanese Unexamined Patent Publication No. H3-248199 describes a speech recognition method for a speech recognition device that obtains a recognition result by matching voice input against a registered vocabulary and transmits the recognition result as an operation instruction to another machine. A first threshold and a second threshold are defined. The sum D of the city-block distances between the input speech pattern and each reference pattern is computed, and the pattern with the smallest D is taken as the recognition result. With a value such as 1/D used as the reliability, the recognition result is transmitted as an operation instruction when the reliability is greater than the first threshold; it is transmitted only after the user has confirmed it when the reliability is greater than the second threshold and smaller than the first threshold; and it is invalidated when the reliability is smaller than the second threshold. This prevents fatal malfunctions without lowering input efficiency.
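The prior-art scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: each pattern is simplified to a single feature vector, the function names (`city_block_distance`, `recognize`, `decide`) are hypothetical, and the handling of values exactly equal to a threshold is an assumption, since the text does not specify it.

```python
from typing import Dict, List, Tuple


def city_block_distance(a: List[float], b: List[float]) -> float:
    """Sum of absolute differences between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(input_pattern: List[float],
              references: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the registered word with the smallest distance sum D and 1/D."""
    word, d = min(
        ((w, city_block_distance(input_pattern, ref))
         for w, ref in references.items()),
        key=lambda pair: pair[1],
    )
    # Assumption: an exact match (D == 0) is treated as infinitely reliable.
    reliability = float("inf") if d == 0 else 1.0 / d
    return word, reliability


def decide(reliability: float, first_threshold: float,
           second_threshold: float) -> str:
    """Map the reliability onto the three actions of the prior-art scheme."""
    if reliability > first_threshold:
        return "send"        # transmit the result as an operation instruction
    if reliability > second_threshold:
        return "confirm"     # transmit only after the user confirms
    return "invalidate"      # discard the recognition result
```

For example, with reference patterns `{"yes": [1.0, 2.0], "no": [5.0, 5.0]}` and input `[1.0, 2.5]`, the closest word is `"yes"` with D = 0.5, so the reliability is 2.0.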

[0004] In addition, a voice response (guidance output) function for confirming whether the recognition result is correct is generally used between the speech recognition device and the speaker.

[0005] The flow of a dialogue in a conventional speech recognition device is described with reference to FIG. 1. The device calculates the likelihood between each input word utterance (S111, S114) and every word registered in the device in advance. As shown at S102 and S105 in FIG. 1, it then often outputs by voice the first candidate of the recognition result ("Sales Department", "Kato-san") and asks the speaker to confirm by inputting "yes" or "no". When the likelihood of the word to be confirmed is extremely small, the device judges that the word is unlikely to be correct and, without performing confirmation, instructs the speaker to re-input with a prompt such as "Please input ... again", as shown at S104 in FIG. 1.

[0006]

[Problem to be Solved by the Invention] As described above, the dialogue between a conventional speech recognition device and a speaker is effective in that it avoids malfunctions caused by speech recognition. On the other hand, because confirmation of the recognition result is always repeated, the dialogue takes time and often causes the speaker psychological discomfort. The problem addressed by the present invention is to eliminate these problems of the prior art.

[0007] That is, an object of the present invention is to provide voice input and voice identification services smoothly, quickly, reliably, and accurately, without increasing the risk of providing the wrong service as a result of misrecognition.

[0008]

[Means for Solving the Problem] According to the present invention, the above problem is solved by providing means that judges the recognition result for a speaker's voice input on the basis of the value calculated by a reliability calculation unit provided in the speech recognition device, and that selects the appropriate voice guidance for the speaker from among a plurality of guidance transition patterns prepared according to that reliability.

[0009] As described above, in a speech recognition system in which the speaker makes the next voice input according to the voice guidance sent from the speech recognition device, the voice guidance with which the device tells the speaker what to input next is a pattern selected from among a plurality of voice guidance patterns according to the result of analyzing the state of the speaker's utterance. The voice input service can therefore be provided smoothly and accurately.

[0010] The guidance that can be output falls broadly into the following four types.

(1) Dialogue promotion guidance: for example, when one service is provided through multiple information inputs, guidance that prompts the (n+1)-th information input after the n-th information input has finished.

[0011] (2) Confirmation (ask-back) guidance: guidance such as "XX, is that right?" that prompts the speaker to input an affirmative word such as "yes" or "that's right", or a negative word such as "no" or "wrong".

[0012] (3) Re-input (ask-again) guidance: when one service is provided through multiple information inputs, guidance that prompts the speaker to input the n-th information again after the n-th information input has finished.

[0013] (4) Recognizable-word presentation guidance: guidance that lists the words the speech recognition device currently has available as recognition candidates and prompts the speaker to choose one of them and input it.

[0014] According to the present invention, when confirmation guidance is unnecessary, dialogue promotion guidance, for example, is sent to the speaker instead. The speaker can then make the next voice input in response to the guidance for the next step without being aware of any confirmation guidance, which reduces discomfort and annoyance.

[0015] Furthermore, so that the guidance content can be confirmed at the end, the guidance data can be stored in the guidance data storage unit and the connection destination can be confirmed. The present invention is described concretely below with reference to the embodiment shown in FIGS. 2 to 8.

[0016] The configuration and operation for selecting and transmitting voice guidance in the speech recognition device according to the present invention are described concretely with reference to the embodiment shown in FIGS. 2 to 8.

[0017]

[Embodiment of the Invention] FIG. 2 shows the hardware configuration of the present invention. The speech recognition device 100 includes a speech recognition unit 102 that computes scores between the speech input from the voice input devices 110 to 112 and the recognition dictionary data stored in the recognition dictionary data storage unit 103, and determines candidate words for the input speech. The score may be obtained, for example, by computing the distances between the input speech and the recognition dictionary data and taking the reciprocal of their sum.

[0018] In addition to the recognition dictionary data storage unit 103, which stores the recognition dictionary data that the speech recognition unit 102 refers to when recognizing input speech, the speech recognition device 100 includes a reliability calculation unit 101 that calculates the reliability of the candidate words sent from the speech recognition unit 102. The reliability can be obtained, for example, as the difference between the score of the first candidate and the score of the second candidate of the recognition result.
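The score and reliability described for units 102 and 101 can be illustrated with a minimal sketch. The function names and the per-word list of distances are assumptions made for the example; the text gives only "reciprocal of the sum of distances" and "difference between the first and second candidate scores" as examples, and does not say how a zero distance sum should be treated.

```python
from typing import Dict, List, Tuple


def score(distances: List[float]) -> float:
    """Score of one dictionary word: reciprocal of the summed distances."""
    total = sum(distances)
    if total <= 0:
        # Assumption: the text does not cover a zero distance sum, so reject it.
        raise ValueError("distance sum must be positive")
    return 1.0 / total


def rank_candidates(distances_by_word: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (word, score) pairs sorted from best to worst score."""
    scored = [(word, score(d)) for word, d in distances_by_word.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def reliability(ranked: List[Tuple[str, float]]) -> float:
    """Reliability: first-candidate score minus second-candidate score."""
    if len(ranked) < 2:
        raise ValueError("at least two candidates are needed")
    return ranked[0][1] - ranked[1][1]
```

A large gap between the best and second-best score means the first candidate stands out clearly, while a small gap means two dictionary words fit the input almost equally well.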

[0019] The speech recognition device 100 further includes: a control unit 107 that receives the code of the first candidate word determined by the speech recognition unit 102 and the reliability calculated by the reliability calculation unit 101, specifies the database addresses of the guidance data to be output next and of the recognition dictionary data to be used for the next recognition, and has them transferred to the guidance data storage unit 105 and the recognition dictionary data storage unit 103, respectively; a database unit 108 that stores the recognition dictionary data 108D used for speech recognition and the guidance data 108G used for speech synthesis; a speech synthesis unit 104 that converts guidance data into voice data and sends it to the speaker; and a guidance data storage unit 105 that stores the guidance data used for speech synthesis.

[0020] FIG. 3 shows the flow of processing in the speech recognition device of FIG. 2, with voice (dashed arrows) and control data (solid arrows) shown together.

[0021] When a line is connected between the speaker and the speech recognition device, the control unit 107 selects guidance data that prompts the speaker to input speech, reads that guidance data from the database unit 108 using the corresponding guidance data address (C301), and transfers it to the guidance data storage unit 105 (C302).

[0022] The speech synthesis unit 104 receives the guidance data stored in the guidance data storage unit 105, converts it into guidance voice, and transmits it to the speaker (A301).

[0023] Meanwhile, based on the recognition dictionary data address (C304), the control unit 107 transfers the recognition dictionary data to be used for speech recognition from the database unit 108 to the recognition dictionary data storage unit 103 (C305).

[0024] When the speaker inputs speech in response to the output guidance (A302), the speech recognition unit 102 computes scores (C307) between the input speech (A302) and the recognition dictionary data sent to it (C306), and determines the candidate words for the input speech.

[0025] The speech recognition unit 102 sends the scores of the recognition result (C307) to the reliability calculation unit 101, obtains the reliability of the recognition result (C308), and sends the recognition result and the reliability to the control unit 107 (C309).

[0026] According to the reliability value, the control unit 107 performs the selection DS of the guidance data and recognition dictionary data to be used next. As before, it sends the guidance data address (C310) and the recognition dictionary data address (C313) to the database unit 108, and the data are transferred to the guidance data storage unit 105 (C311) and the recognition dictionary data storage unit 103 (C314). The guidance data is then sent to the speech synthesis unit 104 (C312), converted into guidance voice, and transmitted to the voice input devices 110 to 112, such as telephones (A303). The recognition dictionary data is transferred to the speech recognition unit 102.
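One turn of the sequence in FIG. 3 can be modeled loosely as below. This is a sketch under assumptions, not the device's actual design: the class names, string addresses such as `"g0"`, and the injected `recognizer` and `select_next` callables are all hypothetical, and speech synthesis is reduced to returning the guidance text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Database:
    """Stand-in for database unit 108."""
    guidance: Dict[str, str]             # guidance data 108G, keyed by address
    dictionaries: Dict[str, List[str]]   # recognition dictionary data 108D


@dataclass
class Device:
    """Stand-in for the control flow around control unit 107."""
    db: Database
    guidance_store: str = ""                                   # unit 105
    dictionary_store: List[str] = field(default_factory=list)  # unit 103

    def load(self, guidance_addr: str, dictionary_addr: str) -> str:
        """Copy the addressed data into units 105 and 103 (C301-C305, C310-C314)."""
        self.guidance_store = self.db.guidance[guidance_addr]
        self.dictionary_store = list(self.db.dictionaries[dictionary_addr])
        return self.guidance_store  # text that would go to speech synthesis 104

    def turn(self, utterance: str,
             recognizer: Callable[[str, List[str]], Tuple[str, float]],
             select_next: Callable[[str, float], Tuple[str, str]]) -> str:
        """Recognize one utterance, then load the next guidance and dictionary."""
        word, reliability = recognizer(utterance, self.dictionary_store)  # C306-C309
        guidance_addr, dictionary_addr = select_next(word, reliability)   # selection DS
        return self.load(guidance_addr, dictionary_addr)
```

Keeping the recognizer and the selection rule as injected callables mirrors the separation between units 102/101 and the control unit 107 in the text.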

[0027] Next, the case of outputting dialogue promotion guidance is described with reference to FIGS. 4, 5, and 7.

[0028] As shown in FIG. 7, suppose that in response to dialogue promotion guidance from the speech recognition device 100 (S701), for example "Please state the department", the speaker's input speech (S711) is "Sales Department", and the reliability R for it is greater than the threshold th1, as in item 1 of FIG. 4 (S506). In that case, as shown in FIG. 5, the control unit 107 does not ask for confirmation and selects the next guidance data and recognition dictionary data in order to recognize the next required piece of information (S507, S508).

[0029] The guidance output to the speaker is then, as shown in FIG. 7, the dialogue promotion guidance for recognizing the second required piece of information (S702): "Please state the name of the person in charge".

[0030] Next, when the reliability R of the recognition result for the speaker's input speech is smaller than the threshold th1 and greater than the threshold th2, as in item 2 of FIG. 4, the control unit 107 selects the next guidance data and recognition dictionary data so as to output confirmation (ask-back) guidance, as shown at S509, S510, S511, and S512 in FIG. 5. The speech recognition device can also output confirmation guidance like that shown at S102 and S105 in FIG. 1.

[0031] Next, when the reliability R of the recognition result for the speaker's input speech is smaller than the threshold th2, as in item 3 of FIG. 4, the control unit 107 selects the next guidance data and recognition dictionary data so as to output re-input (ask-again) guidance, as shown at S513 and S514 in FIG. 5. The guidance output to the speaker is then, for example, "Once more, please", as shown at S104 in FIG. 1.

[0032] Next, the case where the speaker's input is not registered in the recognition dictionary data is described with reference to FIGS. 4, 5, and 8. For example, as shown in FIG. 8, when the speaker's input speech is "Electronic Components Sales Department" and the score A of the recognition result for the input speech is smaller than the threshold D, as in item 4 of FIG. 4, the control unit 107 selects the guidance data and recognition dictionary data next in order. The guidance output to the speaker is then, for example, recognizable-word presentation guidance like that shown at S802 in FIG. 8: "Please choose from Human Resources, Planning, and Sales".
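The four cases of FIG. 4 (items 1 to 4) can be condensed into a small selection function. This is a minimal sketch, assuming that the score check against D is made before the reliability checks and that values exactly equal to a threshold fall into the lower branch; the text specifies neither. The enum and function names are hypothetical.

```python
from enum import Enum


class Guidance(Enum):
    DIALOGUE_PROMOTION = "dialogue promotion"        # item 1: move to next input
    CONFIRMATION = "confirmation (ask-back)"         # item 2: ask yes/no
    RE_INPUT = "re-input (ask-again)"                # item 3: ask to repeat
    WORD_PRESENTATION = "recognizable-word list"     # item 4: list valid words


def select_guidance(score_a: float, reliability_r: float,
                    th1: float, th2: float, d: float) -> Guidance:
    """Pick the next guidance type from score A and reliability R."""
    if th2 > th1:
        raise ValueError("th2 must not exceed th1")
    # Assumption: a low first-candidate score (likely unregistered word)
    # is checked before the reliability thresholds.
    if score_a < d:
        return Guidance.WORD_PRESENTATION
    if reliability_r > th1:
        return Guidance.DIALOGUE_PROMOTION
    if reliability_r > th2:
        return Guidance.CONFIRMATION
    return Guidance.RE_INPUT
```

With illustrative values `th1=2.0`, `th2=0.5`, and `d=1.0`, a clear recognition (A = 4.0, R = 3.0) skips confirmation and moves to the next prompt, while an utterance that matches nothing well (A = 0.4) triggers the word list regardless of R.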

[0033] Next, the case of giving guidance on all recognized content at the end of voice input is described. The recognition results are stored in the guidance data storage unit 105 of FIG. 2 and, following the processing flow of FIG. 5, guidance data is selected at the end of the dialogue so as to announce content that includes all of the recognition results. An example is "Connecting you to Kato in the Sales Department", shown in FIG. 8.
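The end-of-dialogue announcement can be pictured as filling a guidance template with the stored recognition results. The sketch below is purely illustrative: the `DialogueSession` class, the slot names `department` and `person`, and the template string are assumptions, not details from the text.

```python
from typing import Dict


class DialogueSession:
    """Accumulates recognized words, standing in for storage in unit 105."""

    def __init__(self) -> None:
        self.results: Dict[str, str] = {}

    def record(self, slot: str, word: str) -> None:
        """Store the accepted recognition result for one input step."""
        self.results[slot] = word

    def final_guidance(self, template: str) -> str:
        """Build the closing guidance that repeats every recognized item."""
        return template.format(**self.results)


session = DialogueSession()
session.record("department", "Sales Department")
session.record("person", "Kato")
message = session.final_guidance("Connecting you to {person} in the {department}.")
```

Announcing everything once at the end gives the speaker a single chance to catch an error, in place of a confirmation after every input.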

[0034]

[Effect of the Invention] As described above, according to the present invention, one of multiple patterns of guidance is sent to the speaker according to the value from the reliability calculation unit, so unnecessary repeated utterances can be eliminated and the dialogue can proceed smoothly.

[0035] This reduces the speaker's discomfort and allows the service the speaker requests to be provided accurately.

[Brief description of the drawings]

[FIG. 1] Transition diagram showing an example of a dialogue between a conventional speech recognition device and a speaker.

[FIG. 2] Hardware configuration diagram of the speech recognition device according to the present invention.

[FIG. 3] Sequence diagram showing the operation sequence within the speech recognition device according to the present invention.

[FIG. 4] Table listing the conditions for determining the next guidance from the reliability in the speech recognition device according to the present invention.

[FIG. 5] Flowchart of the next-guidance selection process of the speech recognition device according to the present invention.

[FIG. 6] Flowchart of the next-guidance selection process of the speech recognition device according to the present invention.

[FIG. 7] Transition diagram showing an example of a dialogue realized by the speech recognition device according to the present invention.

[FIG. 8] Transition diagram showing an example of a dialogue realized by the speech recognition device according to the present invention.

[Explanation of symbols]

101: reliability calculation unit, 102: speech recognition unit, 103: recognition dictionary data storage unit, 104: speech synthesis unit, 105: guidance data storage unit, 106: interface unit, 107: control unit, 108: database unit, 110 to 112: voice input devices

Continued from the front page

(72) Inventor: Masahiro Kosaka, Information and Communications Division, Hitachi, Ltd., 216 Totsuka-cho, Totsuka-ku, Yokohama-shi, Kanagawa

(72) Inventor: Hana Igarashi, Hitachi Communication Systems, Inc., 180 Totsuka-cho, Totsuka-ku, Yokohama-shi, Kanagawa

Claims

[Claims]

1. A speech recognition device located in a communication network, having: a function for receiving voice input from a speaker; a speech recognition function that, from a group of vocabulary candidates corresponding to a service, selects one or more vocabulary candidates on the basis of the voice input and outputs information on their certainty; and a response function that stores a plurality of types of response data to be transmitted to the speaker, wherein the device uses the certainty information output by the speech recognition function to select response data from the plurality of types stored in the response function and outputs the corresponding guidance.

2. The speech recognition device according to claim 1, wherein the plurality of types of response data stored include at least two of: data for dialogue promotion guidance, data for confirmation (ask-back) guidance, data for re-input (ask-again) guidance, and data for recognizable-word presentation guidance.

3. The speech recognition apparatus according to claim 2, wherein the dialog promoting guidance is output when the information on the probability of the recognition result exceeds a certain value.

4. The speech recognition apparatus according to claim 1, wherein the speaker can be prompted to input the next information without guidance for confirming the recognition result being output.

5. The speech recognition device according to claim 1, wherein the device has response data for instructing the speaker to re-input the same voice as the voice input at the time of the first recognition, and outputs that response data when the distance data and the distance difference data between the two are below a certain value.

6. The speech recognition apparatus according to claim 1, further comprising response data for explaining service contents that can be provided to the speaker, and outputting the response data when the distance data is equal to or less than a predetermined value.

7. The speech recognition apparatus according to claim 1, wherein said speech recognition apparatus has a function of presenting a target word to said speaker.

8. The speech recognition device according to claim 2, wherein, when the device has transmitted response data for promoting dialogue, it finally outputs all recognized content as response data when voice input is complete and asks the speaker to confirm.

9. The speech recognition apparatus according to claim 2, wherein guidance on the recognized content is provided before the recognition result is transmitted to the destination (the name of the service to be connected).