JPH11149294A - Voice recognition device and voice recognition method

Info

Publication number: JPH11149294A
Application number: JP31562597A
Authority: JP
Inventors: Shigeki Aoshima; 滋樹青島
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 1997-11-17
Filing date: 1997-11-17
Publication date: 1999-06-02

Abstract

PROBLEM TO BE SOLVED: To improve recognition performance by making effective use of past recognition results when speech is input again. SOLUTION: A restatement determination unit 20 matches the current input speech pattern against a past input speech pattern to judge whether the current utterance is a restatement of the earlier one. A recognition processing unit 30 matches the current input speech pattern against the standard patterns to select recognition candidates. If the restatement determination unit 20 judges that the current utterance is not a restatement, the recognition candidates selected this time are output as the recognition result as they are. If it is a restatement, adjusted recognition candidates are determined from both matching results, using the past recognition candidates stored in a recognition candidate registration unit 32 together with the recognition candidates obtained this time.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition apparatus and method, and more particularly to a speech recognition apparatus and method capable of improving recognition performance when the speaker utters again.

[0002]

[Prior Art] Speech recognition apparatuses that perform pattern matching between the speech pattern of an utterance input and standard patterns are well known. A speech recognition apparatus can be used as the input means of various devices. For example, if a vehicle is equipped with a speech recognition apparatus, the driver can operate in-vehicle equipment by uttering voice commands, which improves operability and reduces the burden on the driver.

[0003] FIG. 1 shows the configuration of a conventional speech recognition apparatus. When the speaker speaks into the microphone 10, the input speech signal from the microphone 10 is converted into a digital signal by the A/D converter 12 and input to the acoustic processing unit 14. The acoustic processing unit 14 performs acoustic analysis and generates an input speech pattern of feature parameters (cepstrum and the like), which is supplied to the recognition processing unit 16.

[0004] The recognition processing unit 16 performs pattern matching between the input speech pattern and the standard patterns. For word recognition, for example, the recognition dictionary storage unit 18 stores data for the multiple words to be recognized. The input speech pattern is matched against each word individually, and the word with the best matching result is selected as the first-ranked recognition candidate. Lower-ranked recognition candidates (words) are also selected as appropriate, in descending order of matching quality. The recognition processing unit 16 then outputs the one or more selected recognition candidates as the recognition result.
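The candidate selection described in this paragraph can be sketched as follows. This is a minimal illustration, not the apparatus's actual processing: the function name `select_candidates`, the convention that a higher score means a better match, and the example words and scores are all assumptions made for the sketch.

```python
import heapq


def select_candidates(scores, n=3):
    """Return the n best (word, score) pairs, best first.

    `scores` maps each dictionary word to its matching score.
    Assumption for this sketch: a higher score means a better match.
    """
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


# Hypothetical matching scores for a four-word dictionary.
scores = {"raise": 0.62, "lower": 0.71, "open": 0.35, "close": 0.20}
candidates = select_candidates(scores, n=3)
```

The first element of `candidates` plays the role of the first-ranked recognition candidate, and the remaining elements are the lower-ranked candidates.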

[0005] Known pattern matching techniques for the recognition processing unit 16 include the dynamic programming method (hereinafter, the DP method) and probabilistic techniques using a hidden Markov model (hereinafter, HMM). As is well known, the DP method is suited to recognizing the speech of a specific speaker, while the HMM-based technique is suited to recognizing the speech of unspecified speakers. The DP method is also called DTW (Dynamic Time Warping).

[0006]

[Problems to Be Solved by the Invention] To ask the speaker whether a recognition result is correct, a speech recognition apparatus outputs, for example, synthesized speech announcing the recognition candidate (talkback), or shows the recognition candidate on a display. The speaker then, as needed, makes another utterance input to correct the recognition result. An utterance input made again after an earlier utterance input in this way is called a "re-utterance input".

[0007] A re-utterance input is made, for example, when the speaker realizes that he or she said the wrong thing, or when the speech was not recognized as the speaker intended because the speaker stumbled over words partway through. The re-utterance in such cases is a "paraphrase utterance", in which words different from the previous ones are uttered.

[0008] Besides the paraphrase utterance described above, re-utterance inputs include the "restatement utterance". At present the recognition accuracy of speech recognition apparatuses falls short of 100%, so a recognition candidate is sometimes selected incorrectly. In that case the speaker makes a "restatement utterance", uttering the same words as before.

[0009] Conventionally, when a re-utterance input is made, the same recognition processing as for a normal utterance input is performed. The conventional processing does not consider whether the re-utterance input is a paraphrase utterance or a restatement utterance. In the case of a restatement utterance, no matter how many times the speaker repeated the same words, the same kind of misrecognition result could be output every time.

[0010] In the speech recognition system described in Japanese Patent Application Laid-Open No. Hei 1-161299, the part common to the recognition result for the re-utterance input and the recognition result for the preceding utterance input is extracted, and a dictionary search is performed based on that common part. For example, if two characters of a four-character word are the same, the dictionary search is based on those two characters. In this system, however, a word entirely unrelated to the two utterances may be chosen merely because it contains the common part, which can cause misrecognition. Moreover, this conventional system also does not consider whether the re-utterance input is a restatement utterance or a paraphrase utterance, and so cannot cope with the reality that a speaker does not necessarily say the same words twice in a row.

[0011] The present invention has been made in view of the above problems, and its object is to provide a speech recognition apparatus and method that can improve recognition performance by making effective use of past recognition results when a re-utterance input is made.

[0012]

[Means for Solving the Problems] (1) A speech recognition apparatus of the present invention comprises: acoustic processing means for generating an input speech pattern based on an uttered speech signal; recognition processing means for performing recognition matching between the input speech pattern and standard patterns to select recognition candidates for speech recognition; restatement determination means for determining that the current input speech pattern is a pattern obtained by a restatement utterance of a past utterance; and recognition candidate adjustment means for adjusting the recognition candidates selected by the recognition processing means when the determination means determines that a restatement utterance has been made, wherein the recognition candidate adjustment means, as the adjustment processing, determines adjusted recognition candidates using both the recognition candidates of the current recognition matching result and the recognition candidates of the past recognition matching result.

[0013] According to the present invention, when the speaker utters again, the restatement determination means determines whether the current input speech pattern is a pattern obtained by a restatement utterance of a past utterance. In the case of a restatement utterance, referring to the recognition candidates selected by the recognition matching for the past utterance allows the recognition candidates to be determined more accurately, reducing misrecognition. When the utterance is not a restatement, on the other hand, the recognition candidate adjustment is not performed. This avoids the wasted processing of referring to past recognition results to no purpose, and also avoids the misrecognition that such meaningless reference would cause. Thus, according to the present invention, the past recognition candidates are used only after the utterance has been determined to be a restatement, so recognition at the time of a re-utterance input becomes more accurate and recognition performance improves.

[0014] (2) Preferably, the recognition candidate adjustment means determines the adjusted recognition candidates by performing a predetermined calculation based on parameters representing the past and current recognition matching results. Such parameters are, for example, the rank or the similarity of a recognition candidate. As is well known, the similarity here is expressed by the distance or likelihood between the input speech pattern and the standard pattern, or by a recognition score based on them. Because the present invention does not extract the common part of two sets of recognition candidates as the conventional technique does, it is free of the misrecognition caused by such processing, and the past recognition results are put to effective use.

[0015] (3) In a preferred aspect of the present invention, the restatement determination means comprises speech pattern storage means for storing the input speech pattern of a past utterance, and similarity determination means for performing inter-input-pattern matching between the past input speech pattern and the current input speech pattern to determine the similarity between the two, and determines that the current input speech pattern was obtained by a restatement utterance when a predetermined similarity is obtained.

[0016] According to the present invention, when a re-utterance input is made, whether it is a restatement utterance is determined by matching the current input speech pattern against the past one. Because input speech patterns obtained from two utterances by the same speaker are compared with each other, rather than an input speech pattern with a standard pattern, a highly accurate determination can be made quickly with simple processing.

[0017] (4) Preferably, the recognition matching in the recognition processing means is processing suited to unspecified speakers, for example hidden Markov model processing. Also preferably, the inter-input-pattern matching in the similarity determination means is processing suited to a specific speaker, for example dynamic programming processing.

[0018] This aspect works particularly well when the present invention is applied to a speech recognition apparatus for unspecified speakers. Processing suited to unspecified speakers is naturally appropriate for the recognition matching that selects recognition candidates. The inter-input-pattern matching for restatement determination, however, compares two speech signals from the same speaker. Processing suited to a specific speaker is appropriate for such a comparison, and adopting it allows the similarity to be determined with high accuracy. Thus, according to the present invention, a speech recognition apparatus for unspecified speakers can improve its recognition performance by partially using speaker-specific matching to make the restatement determination with high accuracy.

[0019] (5) In a preferred aspect of the present invention, when the speaker has re-uttered two or more times, the similarity determination means obtains the similarity for each of a plurality of combinations of speech patterns, and the recognition candidate adjustment means determines the adjusted recognition candidates based on the speech patterns of the combination with the highest similarity.

[0020] According to the present invention, when two or more re-utterances, that is, three or more utterances, have been made, the combination with the highest similarity is chosen based on the inter-pattern matching results for the plurality of combinations. The final recognition candidates are then selected using the input speech patterns of that combination. Because the processing is narrowed down to the combination that yields highly reliable recognition candidates, recognition performance can be improved while the processing load on the recognition candidate adjustment means is reduced.
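One way to read the narrowing-down step for three or more utterances is sketched below. It assumes that every pair of stored utterances is a candidate combination and that the similarity measure is supplied by the caller; the names `most_similar_pair` and `toy_similarity`, and the toy feature values, are hypothetical and serve only to make the example runnable.

```python
from itertools import combinations


def most_similar_pair(patterns, similarity):
    """Return (i, j, score) for the pair of utterances with the highest similarity.

    `patterns` is the list of stored input speech patterns, oldest first.
    `similarity` is any function returning a larger value for more similar patterns.
    Returns None when fewer than two patterns are available.
    """
    best = None
    for i, j in combinations(range(len(patterns)), 2):
        score = similarity(patterns[i], patterns[j])
        if best is None or score > best[2]:
            best = (i, j, score)
    return best


def toy_similarity(a, b):
    # Stand-in for inter-pattern matching: negative sum of absolute differences.
    return -sum(abs(x - y) for x, y in zip(a, b))


# Three utterances; the first and third are nearly identical.
utterances = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0], [1.0, 2.0, 3.5]]
best_pair = most_similar_pair(utterances, toy_similarity)
```

The candidate sets belonging to the two utterances in `best_pair` would then be the ones passed on to the candidate adjustment.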

[0021] (6) A speech recognition apparatus according to another aspect of the present invention performs speech recognition by recognition matching between an input speech pattern and standard patterns, and comprises restatement determination means that performs similarity determination matching between the input speech pattern of a past utterance and the input speech pattern of the current utterance and, based on the similarity between the two, determines that the current input speech pattern is a pattern obtained by a restatement utterance of the past utterance.

[0022] (7) A speech recognition method of the present invention includes: an acoustic processing step of generating an input speech pattern based on an uttered speech signal; a restatement determination step of determining that the current input speech pattern is a pattern obtained by a restatement utterance of a past utterance; a recognition matching step of performing recognition matching between the input speech pattern and standard patterns; a step of determining the recognition candidates based on the current recognition matching result when the current utterance is determined not to be a restatement utterance; and a step of determining the recognition candidates using both the current and the past recognition matching results when the current utterance is determined to be a restatement utterance. According to this aspect, the above effects of the present invention are obtained in the form of a method.

[0023]

[Embodiments of the Invention] A preferred embodiment of the present invention (hereinafter, the embodiment) is described below with reference to the drawings. In this embodiment, the present invention is applied to a speech recognition apparatus for unspecified speakers. Word recognition is performed in this embodiment as an example.

[0024] FIG. 2 is a block diagram showing the configuration of the speech recognition apparatus of this embodiment. As in the conventional apparatus, when the speaker speaks into the microphone 10, the input speech signal from the microphone 10 is converted into a digital signal by the A/D converter 12 and input to the acoustic processing unit 14. The acoustic processing unit 14 performs acoustic analysis and generates an input speech pattern of feature parameters (cepstrum and the like).

[0025] The generated input speech pattern is supplied to the restatement determination unit 20 and is also stored in the feature parameter registration unit 22. The restatement determination unit 20 and the feature parameter registration unit 22 function as the restatement determination means of the present invention. The feature parameter registration unit 22 is storage means such as RAM and stores input speech patterns. The restatement determination unit 20 performs pattern matching based on the DP method, comparing a past input speech pattern stored in the feature parameter registration unit 22 with the current input speech pattern supplied to the restatement determination unit 20. The comparison yields the similarity of the two input speech patterns (expressed as the distance between them or as a score based on that distance). If the similarity is at or above a predetermined threshold, the current utterance is determined to be a restatement. The restatement determination unit 20 supplies the determination result to the recognition processing unit 30 together with the current input speech pattern. Alternatively, the similarity may be supplied to the recognition processing unit 30 instead of the determination result, and the recognition processing unit 30 may make the restatement determination.
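The restatement determination by DP matching can be illustrated with a textbook DTW distance and a threshold test. This is a sketch under stated assumptions, not the unit's actual algorithm: the Euclidean local distance, the normalisation by the total number of frames, the function names, and the threshold value `max_distance=0.5` are all chosen for illustration. Because a distance is used, "similarity at or above a threshold" becomes "distance at or below a threshold".

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors.

    Each frame is a tuple of floats; all frames must have the same dimension.
    The accumulated cost is divided by len(a) + len(b) so that utterances of
    different lengths are comparable (an assumption of this sketch).
    """
    if not a or not b:
        raise ValueError("both patterns must contain at least one frame")
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m] / (n + m)


def is_restatement(past, current, max_distance=0.5):
    """Judge a restatement when the two patterns are close enough.

    `max_distance` is a hypothetical threshold; a real system would tune it.
    """
    return dtw_distance(past, current) <= max_distance


# Toy one-dimensional "feature" sequences.
past = [(0.0,), (1.0,), (2.0,), (1.0,)]
slower_repeat = [(0.0,), (0.0,), (1.0,), (2.0,), (1.0,)]  # same shape, spoken slower
different_word = [(5.0,), (5.0,), (5.0,)]
```

With these toy sequences, the time-stretched repeat aligns with the past pattern at zero cost and is judged a restatement, while the unrelated sequence is not.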

[0026] The recognition processing unit 30 performs pattern matching between the input speech pattern and the standard patterns. Here, matching using HMMs is performed. As described with reference to FIG. 1, the recognition dictionary storage unit 18 stores data for the multiple words to be recognized. The input speech pattern is matched against each word individually, and the word with the best matching result is selected as the first-ranked recognition candidate. Lower-ranked recognition candidates (words) are then selected in descending order of matching quality. In this embodiment, recognition candidates down to a predetermined rank, for example the tenth, are selected. The selected recognition candidates are sent to the recognition candidate registration unit 32. The recognition candidate registration unit 32 is storage means such as RAM and stores the recognition candidates so that they can be used for recognition candidate adjustment if a re-utterance is made later. The rank of each recognition candidate and the similarity from its matching result are stored along with the candidates. The recognition candidate registration unit 32 and the feature parameter registration unit 22 may be integrated.

[0027] The recognition processing unit 30 also functions as the recognition candidate adjustment means of the present invention. As described above, the recognition processing unit 30 receives from the restatement determination unit 20 the result of determining whether the current utterance input is a restatement utterance. When it is not a restatement utterance, the recognition processing unit 30 does not adjust the recognition candidates, so the recognition candidates selected by the current matching are output as the recognition result as they are. The output recognition result is presented to the speaker by talkback or on a display.

[0028] When the utterance is a restatement utterance, on the other hand, the recognition processing unit 30 reads from the recognition candidate registration unit 32 the recognition candidates selected for the past utterance input. Adjusted recognition candidates are then determined by an overall judgment that uses both the past and the current recognition candidates, and are output as the recognition result. Various techniques can be applied to this overall judgment; examples are given below.

[0029] (1) Each recognition candidate (word) has a rank within the current candidate set and within the past candidate set. An overall rank is obtained by applying a suitable calculation to the two ranks. For example, the sum or product of the two ranks is computed, and the recognition candidates are reordered in ascending order of the sum or product. The overall ranks are assigned according to the reordered sequence. Naturally, no sum or product is computed for a recognition candidate that appears in only one of the candidate sets.
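The rank-based technique (1) can be sketched as below. The sketch takes one reading of the text: candidates that appear in only one set are simply left out of the combined ordering, and ties keep the order of the current candidate list. The function name `combine_ranks` and the example words are hypothetical.

```python
import operator


def combine_ranks(past, current, op=operator.add):
    """Reorder candidates by the sum (or product) of their two ranks.

    `past` and `current` are lists of words, best first (rank 1 first).
    `op` is operator.add for the rank sum or operator.mul for the rank product.
    Words present in only one list are skipped, since no combined value exists.
    """
    past_rank = {word: rank for rank, word in enumerate(past, start=1)}
    current_rank = {word: rank for rank, word in enumerate(current, start=1)}
    common = [word for word in current if word in past_rank]
    # sorted() is stable, so ties keep the order of the current candidate list.
    return sorted(common, key=lambda w: op(past_rank[w], current_rank[w]))


past = ["lower", "raise", "stop", "open"]
current = ["lower", "open", "raise", "close"]
by_sum = combine_ranks(past, current)                       # rank sums: 2, 5, 6
by_product = combine_ranks(past, current, op=operator.mul)  # products: 1, 6, 8
```

In this example "stop" and "close" each appear in only one list and therefore drop out of the combined ordering.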

[0030] (2) Instead of the ranks in (1) above, a suitable calculation is applied to the two similarities of each recognition candidate (word) to obtain an overall similarity. The similarity here indicates how closely the input speech pattern resembles the standard pattern and is expressed as a likelihood or as a score based on the likelihood. This similarity has already been calculated by the pattern matching. Here, for example, the sum or product of the two scores is computed and taken as the total score. The recognition candidate with the largest total score is selected as the first-ranked candidate after adjustment, and the remaining candidates are re-ranked by total score. It is also suitable to weight the two scores when computing the sum or product. For example, the higher of the two scores, or the score of the current recognition result, may be given the larger weight.
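A weighted sum of the two scores, with the larger weight on the current result, might look like the following sketch. The weights 0.4 and 0.6, the score values, and the decision to drop words that lack one of the two scores are assumptions for illustration; the text does not fix any of them.

```python
def combine_scores(past, current, w_past=0.4, w_current=0.6):
    """Rank candidates by a weighted sum of their past and current scores.

    `past` and `current` map words to matching scores (higher is better).
    The weights are hypothetical; here the current result counts for more.
    Words missing from either mapping are skipped (an assumption of this sketch).
    Returns a list of (word, total_score), best first.
    """
    totals = {
        word: w_past * past[word] + w_current * current[word]
        for word in current
        if word in past
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


past_scores = {"lower": 0.70, "raise": 0.68, "open": 0.30}
current_scores = {"lower": 0.60, "raise": 0.66, "close": 0.40}
ranked = combine_scores(past_scores, current_scores)
```

Here "lower" wins the first matching by a small margin but falls behind "raise" once the more heavily weighted current score is taken into account.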

[0031] In both of the above techniques, the two matching results are used in the form of ranks or similarities. That is, the adjusted recognition candidates are determined using ranks or similarities, which are parameters representing the matching results. When a technique other than the above is adopted, the processing is likewise based on the two matching results.

[0032] In addition, when the adjusted recognition candidates are determined, the first-ranked candidate of the past matching result is preferably excluded from the recognition candidates, because it is clear that this candidate is not the correct answer. Also, only the first-ranked candidate may be obtained as the adjusted recognition candidate of the present invention; candidates ranked second and lower need not be obtained.

[0033] FIG. 3 shows how the ranks of the recognition candidates change under the above processing. Suppose the speaker said "raise" (上げる). In the past matching, "raise" was the second-ranked recognition candidate and the incorrect candidate "lower" (下げる) was ranked first. In the current matching, "raise" was ranked third and "lower" was again ranked first. "Lower" is now known not to be the correct answer, so it is removed from the recognition candidates. The correct answer "raise", although never ranked first, obtained a reasonably high rank (similarity) in both matchings, as is to be expected of the correct answer. "Raise", which ranked high both times, is therefore selected as the first-ranked candidate after adjustment. In this way, referring to the two recognition results changes the order of the recognition candidates and makes their selection more accurate.
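The "raise"/"lower" example can be replayed in a few lines. The ranks of "raise" and "lower" follow the description above; FIG. 3 is not reproduced here, so the other candidates ("open", "close") and their ranks are hypothetical, as is the choice of the rank sum as the combining rule and the function name `adjust_after_restatement`.

```python
def adjust_after_restatement(past, current):
    """Drop the rejected past winner, then order the rest by rank sum.

    `past` and `current` are candidate lists, best first.
    The past first-ranked word was already presented and rejected by the
    speaker, so it is excluded before the ranks are combined.
    """
    rejected = past[0]
    past_rank = {word: rank for rank, word in enumerate(past, start=1)}
    current_rank = {word: rank for rank, word in enumerate(current, start=1)}
    common = [w for w in current if w in past_rank and w != rejected]
    return sorted(common, key=lambda w: past_rank[w] + current_rank[w])


# "raise" was 2nd, then 3rd; "lower" was 1st both times.
past_candidates = ["lower", "raise", "close", "open"]
current_candidates = ["lower", "open", "raise", "close"]
adjusted = adjust_after_restatement(past_candidates, current_candidates)
```

With these lists the rank sums are 5 for "raise", 6 for "open", and 7 for "close", so "raise" becomes the first-ranked candidate once "lower" has been excluded.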

[0034] The configuration of the speech recognition apparatus of this embodiment has been described above. In this configuration, the pattern matching in the recognition processing unit 30 is performed to select recognition candidates, whereas the pattern matching in the restatement determination unit 20 is performed to determine whether a restatement was made. Reflecting this difference in purpose, the processing performed in the two matchings is entirely different. The recognition processing unit 30 compares one input speech pattern individually with many standard patterns (all the words to be recognized) and identifies the candidate words. The restatement determination unit 20, by contrast, compares two input speech patterns with each other to obtain their similarity and does not need to identify recognition candidates. The computation required for the restatement determination can therefore be far smaller than that for the recognition processing, so the restatement determination can be made quickly with simple processing.

[0035] The restatement determination unit 20 determines whether the same speaker has uttered the same word. When the speaker and the word are the same, DP matching or the like between the input speech patterns yields a markedly higher similarity than when the words differ. The restatement determination unit 20 of this embodiment can therefore determine with high accuracy whether a restatement utterance has been made. Furthermore, taking advantage of the markedly high similarity obtained for restatement utterances, the matching and other processing in the restatement determination unit 20 may suitably be simplified to the extent that sufficient determination accuracy is maintained, which further speeds up the processing.

【００３６】特に、本実施形態では、言直し判定部２０
でのパターンマッチングにはＤＰ法が採用され、認識処
理部３０でのパターンマッチングには、ＨＭＭが採用さ
れている。認識処理部３０にてＨＭＭを採用している理
由は、本実施形態の音声認識装置の対象が不特定話者で
あり、そして、ＨＭＭが不特定話者の音声認識に適して
いるからである。ＨＭＭは、確率手法を採用しており、
話者の個人差に起因する入力信号の変化に強い。一方、
上記のように言直し判定部２０では「同一話者」の入力
音声パターンが比較されるので、この比較処理には、特
定話者に適応するパターンマッチングが適しており、そ
の一手法としてＤＰ法が採用することにより判断精度が
さらに向上している。このＤＰ法の処理を、前述のよう
に、十分な精度が得られる範囲で簡略化し、処理速度を
高速化することも好適である。In particular, in this embodiment the DP method is used for the pattern matching in the rephrasing determination unit 20, and an HMM is used for the pattern matching in the recognition processing unit 30. The recognition processing unit 30 uses an HMM because the speech recognition apparatus of the present embodiment is intended for unspecified speakers, and the HMM is well suited to speaker-independent recognition: it is a probabilistic method and is robust to variations in the input signal caused by individual differences between speakers. The rephrasing determination unit 20, on the other hand, compares input voice patterns of the same speaker, as described above, so pattern matching adapted to a specific speaker is appropriate for this comparison. Adopting the DP method as one such technique further improves the determination accuracy. As noted above, it is also preferable to simplify the DP processing to the extent that sufficient accuracy is maintained, thereby increasing the processing speed.

【００３７】以上のように、本実施形態の言い直し判定
処理では、同一話者の２つの入力音声パターン同士を比
較して同一単語が発声されたか否かが判定される。この
ような処理にはＨＭＭなど使わなくともよい。ＤＰ法に
基づいて簡単な処理で短時間に高精度な言い直し判定を
行って、後段にて過去の認識結果を参照した処理を行う
べきか否かを決定するのに必要な判定結果が得られる。
さらにＤＰ法を簡略化して処理の高速化を図ることもで
きる。As described above, the rephrasing determination process of the present embodiment compares two input voice patterns of the same speaker to determine whether the same word was uttered. Such processing does not require an HMM or the like. A highly accurate rephrasing determination can be made in a short time with simple processing based on the DP method, yielding the determination result needed to decide whether the subsequent stage should refer to past recognition results. The DP method can also be simplified further to increase the processing speed.
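The pattern-to-pattern comparison described above can be illustrated with a minimal sketch of DP matching (dynamic time warping) between two feature sequences. This is an illustration under stated assumptions, not the patented implementation: the Euclidean frame distance, the length normalization, the mapping from distance to similarity, the function names, and the threshold value 0.8 are all hypothetical choices, and both sequences are assumed to be non-empty.

```python
import math

def dp_distance(a, b):
    """Length-normalized DP (dynamic time warping) distance between two
    non-empty sequences of feature vectors (lists of float lists)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: minimum accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame distance (assumed Euclidean)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)

def similarity(a, b):
    """Assumed mapping of distance to a similarity in (0, 1]; 1.0 means identical."""
    return 1.0 / (1.0 + dp_distance(a, b))

def is_rephrasing(current, previous, threshold=0.8):
    """Threshold test on the similarity; with no stored pattern the answer is False."""
    if previous is None:
        return False
    return similarity(current, previous) >= threshold
```

Because only two patterns are compared, the cost is a single DP table, whereas recognition must score the input against every standard pattern in the dictionary.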

【００３８】次に、図４のフローチャートを参照し、本
実施形態の音声認識処理を説明する。音響処理部１４に
より、入力音声信号に対する音響分析が行われ、特徴パ
ターンの入力音声パターンが生成される（Ｓ１０）。こ
の入力音声パターンは、言直し判定部２０に供給される
とともに、特徴パラメータ登録部２２に登録される（Ｓ
１２）。言直し判定部２０は、前回の発声時の入力音声
パターンを特徴パラメータ登録部２２から読み出す。そ
して、前回の入力音声パターンと、今回の入力音声パタ
ーンが、ＤＰ法に従って比較され、両パターンの類似度
が求められ（Ｓ１４）、類似度と所定値（スレッショル
ド）が比較される（Ｓ１６）。Next, the speech recognition processing of this embodiment is described with reference to the flowchart of FIG. 4. The acoustic processing unit 14 performs acoustic analysis on the input voice signal and generates an input voice pattern in the form of a feature pattern (S10). This input voice pattern is supplied to the rephrasing determination unit 20 and is also registered in the feature parameter registration unit 22 (S12). The rephrasing determination unit 20 reads the input voice pattern of the previous utterance from the feature parameter registration unit 22. The previous and current input voice patterns are then compared by the DP method to obtain the similarity between the two patterns (S14), and the similarity is compared with a predetermined value (threshold) (S16).

【００３９】類似度が所定値未満であれば、今回の発声
は、言い直し発声ではないと判断される（Ｓ１８）。こ
の判断結果が、今回の入力音声パターンとともに認識処
理部３０に送られる。なお、今回の発声が最初の発声の
場合は（再発声でない場合）、Ｓ１６で比較の対象の入
力音声パターンがなく、Ｓ１６の判断はＮＯとなり、Ｓ
１８へ進む。認識処理部３０は、入力音声パターンの認
識処理を行って、認識候補を選定し（Ｓ２０）、認識候
補を認識候補登録部３２に登録する（Ｓ２２）。そし
て、今回の結果のみから認識候補が決定される（Ｓ２
４）。すなわち、Ｓ２０で選定された認識候補が、その
まま、認識結果として出力される。トークバック等によ
って認識結果に修正の必要があると判断されなければ、
後段の処理が行われる。例えば、認識結果は、音声認識
装置と接続された他の機器に送られる。If the similarity is less than the predetermined value, the current utterance is determined not to be a rephrasing utterance (S18). This determination result is sent to the recognition processing unit 30 together with the current input voice pattern. If the current utterance is the first utterance (that is, not a re-utterance), there is no input voice pattern to compare against in S16, so the determination in S16 is NO and the process proceeds to S18. The recognition processing unit 30 performs recognition processing on the input voice pattern, selects recognition candidates (S20), and registers them in the recognition candidate registration unit 32 (S22). The recognition candidate is then determined from the current result alone (S24); that is, the recognition candidate selected in S20 is output as the recognition result as it is. Unless talkback or the like indicates that the recognition result needs correction, the subsequent processing is carried out. For example, the recognition result is sent to another device connected to the speech recognition apparatus.

【００４０】一方、Ｓ１６にて類似度が所定値以上であ
れば、今回の発声は、言い直し発声であると判断される
（Ｓ２６）。この判断結果が、今回の入力音声パターン
とともに認識処理部３０に送られる。認識処理部３０
は、入力音声パターンの認識処理を行って認識候補を選
定し（Ｓ２８）、認識候補を認識候補登録部３２に登録
する（Ｓ３０）。そして、認識処理部３０は、前回の認
識結果と今回の認識結果の双方から、最終的な認識候補
を決定する（Ｓ３２）。ここでは、前回の認識処理で得
られた認識候補セットが、各候補の順位や類似度のデー
タとともに、認識候補登録部３２から読み出される。そ
して、前述したように、順位や類似度を用いた総合判断
が行われ、最終的に調整された認識候補が定められ、認
識結果として出力される。On the other hand, if the similarity is equal to or greater than the predetermined value in S16, the current utterance is determined to be a rephrasing utterance (S26). This determination result is sent to the recognition processing unit 30 together with the current input voice pattern. The recognition processing unit 30 performs recognition processing on the input voice pattern to select recognition candidates (S28) and registers them in the recognition candidate registration unit 32 (S30). The recognition processing unit 30 then determines the final recognition candidate from both the previous and the current recognition results (S32). Here, the recognition candidate set obtained in the previous recognition processing is read from the recognition candidate registration unit 32 together with the rank and similarity data of each candidate. Then, as described above, a comprehensive judgment using the ranks and similarities is made, and the adjusted recognition candidate is determined and output as the recognition result.
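The branch between S24 and S32 can be sketched as follows. This is only an illustration: the text says that ranks and similarities feed a comprehensive judgment but does not give a formula here, so the rule below (sum the scores of each word across the two candidate lists, drop the previous top candidate, re-rank) is an assumption, as are the function names and the (word, score) data layout.

```python
def adjust_candidates(previous, current, exclude=()):
    """Merge two candidate lists of (word, score) pairs.

    Assumed rule: add up the scores of each word across both lists,
    drop excluded words, and rank by the combined score, best first.
    """
    combined = {}
    for word, score in list(previous) + list(current):
        if word not in exclude:
            combined[word] = combined.get(word, 0.0) + score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

def recognize(current, previous, rephrased):
    """Return the word to output for the current utterance."""
    if not rephrased or not previous:
        return current[0][0]          # S24: current result used as is
    # Assumption: a rephrasing implies the earlier top candidate was wrong.
    rejected = {previous[0][0]}
    ranked = adjust_candidates(previous, current, exclude=rejected)
    return ranked[0][0] if ranked else current[0][0]   # S32
```

With this rule, a word that ranked second in both attempts can overtake a word that was output once and rejected by the speaker.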

【００４１】なお、音声認識装置が、今回の発声が再発
声であるか否かを、例えば以下のようにして把握するこ
とも好適である。認識装置は、認識結果をトークバック
やディスプレイ表示により話者に伝える。話者は、この
認識結果を知って再発声の必要性を判断し、適当な操作
スイッチを操作して、再発声を行うことを認識装置に伝
える。認識装置は、その後の再発声が、前と違う言葉を
発する「言い換え発声」か、前と同じ言葉を発する「言
い直し発声」かを判定する。It is also preferable for the speech recognition apparatus to learn whether the current utterance is a re-utterance in, for example, the following way. The apparatus conveys the recognition result to the speaker by talkback or on a display. The speaker, on learning the recognition result, decides whether a re-utterance is needed and operates an appropriate switch to tell the apparatus that a re-utterance will follow. The apparatus then determines whether the subsequent re-utterance is a "paraphrasing utterance", in which different words are spoken, or a "rephrasing utterance", in which the same words are spoken again.

【００４２】また、上記では、最初の発声の後に再発声
が行われた後の処理を想定して認識処理を説明した。あ
る再発声の後にもう一度再発声が行われるときにも、図
４の処理を同様に適用可能である。しかしながら、連続
して再発声が行われるときには、図５に示される下記の
処理を行うことがさらに好適である。The recognition processing above was described for the case in which a re-utterance follows the first utterance. The processing of FIG. 4 can be applied in the same way when another re-utterance follows a re-utterance. When re-utterances are made in succession, however, it is more preferable to perform the following processing shown in FIG. 5.

【００４３】ここでは、すでにＮ−１回の発声が行われ
（Ｎ−２回の再発声）、Ｎ回目の発声が行われるものと
する。まず、図４と同様に、音響処理部１４により、入
力音声信号に対する音響分析が行われ、特徴パターンの
入力音声パターンが生成される（Ｓ５０）。この入力音
声パターンは、言直し判定部２０に供給されるととも
に、特徴パラメータ登録部２２に登録される（Ｓ５
２）。言直し判定部２０は、過去のＮ−１回の発声時の
入力音声パターンを特徴パラメータ登録部２２から読み
出す。そして、図６に示すように、Ｎ−１回の入力音声
パターンのそれぞれと、今回の入力音声パターンとが、
個別にＤＰ法に従って比較され、Ｎ−１の類似度が求め
られる（Ｓ５４）。そして、最大の類似度が、所定値
（スレッショルド）と比較される（Ｓ５６）。Here, it is assumed that N-1 utterances have already been made (that is, N-2 re-utterances) and that the Nth utterance is now made. First, as in FIG. 4, the acoustic processing unit 14 performs acoustic analysis on the input voice signal and generates an input voice pattern in the form of a feature pattern (S50). This input voice pattern is supplied to the rephrasing determination unit 20 and is also registered in the feature parameter registration unit 22 (S52). The rephrasing determination unit 20 reads the input voice patterns of the past N-1 utterances from the feature parameter registration unit 22. Then, as shown in FIG. 6, each of the N-1 past input voice patterns is individually compared with the current input voice pattern by the DP method, yielding N-1 similarities (S54). The maximum similarity is then compared with a predetermined value (threshold) (S56).

【００４４】最大の類似度が所定値未満であれば、今回
の発声は、過去のいずれの発声の言い直し発声でもない
と判断される（Ｓ５８）。そこで、図４のＳ２０以下の
処理と同様の処理が行われる。Ｓ５６の判断結果が、今
回の入力音声パターンとともに認識処理部３０に送られ
る。認識処理部３０は、今回の入力データから認識候補
を選定し（Ｓ６０）、認識候補登録部３２に登録する
（Ｓ６２）。そして、今回の結果のみから認識候補が決
定される（Ｓ６４）。If the maximum similarity is less than the predetermined value, the current utterance is determined not to be a rephrasing of any past utterance (S58). In that case, processing similar to that from S20 onward in FIG. 4 is performed. The determination result of S56 is sent to the recognition processing unit 30 together with the current input voice pattern. The recognition processing unit 30 selects recognition candidates from the current input data (S60) and registers them in the recognition candidate registration unit 32 (S62). The recognition candidate is then determined from the current result alone (S64).

【００４５】一方、最大の類似度が所定値以上であれ
ば、最大の類似度を与えた入力音声パターンと今回の音
声パターンの組み合わせが、最適組であると判定される
（Ｓ６６）。そして、この最適組の２つの音声パターン
について、図４のＳ２８以下の処理と同様の処理が行わ
れる。Ｓ５６の判断結果が、今回の入力音声パターンと
ともに認識処理部３０に送られる。認識処理部３０は、
入力音声パターンの認識処理を行って認識候補を選定し
（Ｓ６８）、認識候補を認識候補登録部３２に登録する
（Ｓ７０）。そして、認識処理部３０は、最適組の入力
音声パターンの認識結果として記憶されている認識候補
セットを、その順位や類似度のデータとともに、認識候
補登録部３２から読み出す。読み出された過去の認識結
果と、今回の認識結果の双方から、最終的な認識候補が
決定される（Ｓ７２）。なお、この際、過去のＮ−１回
の認識処理で１位に選ばれた認識候補は、すべて認識候
補から除外される。On the other hand, if the maximum similarity is equal to or greater than the predetermined value, the pair consisting of the past input voice pattern that gave the maximum similarity and the current input voice pattern is determined to be the optimal pair (S66). Processing similar to that from S28 onward in FIG. 4 is then performed on the two voice patterns of this optimal pair. The determination result of S56 is sent to the recognition processing unit 30 together with the current input voice pattern. The recognition processing unit 30 performs recognition processing on the input voice pattern to select recognition candidates (S68) and registers them in the recognition candidate registration unit 32 (S70). The recognition processing unit 30 then reads from the recognition candidate registration unit 32 the recognition candidate set stored as the recognition result for the past input voice pattern of the optimal pair, together with its rank and similarity data. The final recognition candidate is determined from both this past recognition result and the current recognition result (S72). At this point, every candidate that was ranked first in any of the past N-1 recognition processes is excluded from the recognition candidates.
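The selection of the optimal pair and the exclusion of earlier first-place candidates (S54 to S72) can be sketched as below. The sketch takes the N-1 similarities as already computed. The score-summing rule, the function names, the data layout, and the threshold value are hypothetical; only the maximum-similarity selection and the exclusion of past first-place candidates come from the description.

```python
def select_best_pair(similarities, threshold):
    """Index of the most similar past utterance, or None if below threshold."""
    if not similarities:
        return None
    best = max(range(len(similarities)), key=similarities.__getitem__)
    return best if similarities[best] >= threshold else None

def recognize_nth(current, history, similarities, threshold=0.8):
    """current: [(word, score), ...] for the Nth utterance, best first.
    history: the N-1 past candidate lists, oldest first.
    similarities: the N-1 pattern similarities against the current pattern.
    """
    best = select_best_pair(similarities, threshold)
    if best is None:
        return current[0][0]                      # S64: current result only
    # Every past first-place candidate is excluded (stated in the text).
    rejected = {cands[0][0] for cands in history if cands}
    combined = {}
    for word, score in history[best] + current:   # assumed rule: sum the scores
        if word not in rejected:
            combined[word] = combined.get(word, 0.0) + score
    if not combined:
        return current[0][0]
    return max(combined, key=combined.get)        # S72
```

Only the candidate set of the single most similar past utterance is merged, which keeps the adjustment step small however many re-utterances have accumulated.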

【００４６】なお、図５のＳ５４では、今回の入力音声
パターンと、これまでの入力音声パターンが比較され、
Ｎ−１の類似度が求められた。これに対し、Ｓ５４で
は、他の基準で、比較する組み合わせが選ばれてもよ
い。例えば、_NＣ₂通りの全組み合わせの比較が行われて
もよい。In S54 of FIG. 5, the current input voice pattern is compared with each of the previous input voice patterns to obtain N-1 similarities. Alternatively, the combinations to be compared in S54 may be chosen by another criterion. For example, all N(N-1)/2 pairwise combinations (N choose 2) of the input voice patterns may be compared.
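The all-pairs variant can be sketched with the standard library as follows. The function name and the idea of passing the similarity measure as a parameter are illustrative assumptions; any pattern-to-pattern similarity, such as a DP-based one, could be supplied.

```python
from itertools import combinations

def best_pair(patterns, similarity):
    """Return (i, j, score) for the most similar pair among all
    N-choose-2 pairs of patterns, or None if fewer than two patterns."""
    scored = ((i, j, similarity(patterns[i], patterns[j]))
              for i, j in combinations(range(len(patterns)), 2))
    return max(scored, key=lambda t: t[2], default=None)
```

Comparing every pair costs N(N-1)/2 matchings instead of N-1, so this variant trades processing time for a wider search.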

【００４７】以上、本発明の好適な実施形態を説明し
た。以上に説明したように、本実施形態によれば、言い
直し判定結果に基づいて過去の認識候補を有効に活用す
るので、再発声入力時の認識の正確さが増し、これによ
り認識能力を向上することができる。話者にとってみれ
ば、何度言い直しをしても同じ認識しかしてもらえない
といった事態が回避される。The preferred embodiment of the present invention has been described above. According to the present embodiment, past recognition candidates are used effectively on the basis of the rephrasing determination result, so recognition of a re-uttered input becomes more accurate and recognition capability improves. From the speaker's point of view, this avoids the situation in which the same recognition result is returned no matter how many times the word is repeated.

【００４８】また、本実施形態の言い直し判定処理で
は、同一話者の２つの入力音声パターン同士を比較して
同一単語が発声されたか否かが判定される。従って、Ｄ
Ｐ法等を採用して短時間で高精度な言い直し判定がで
き、さらに判定処理を簡略化して処理の高速化を図るこ
ともできる。特に、本実施形態の認識装置は不特定話者
用のものであるが、その一部に特定話者に適応するＤＰ
マッチングを適用することにより、言い直し判定が精度
よく行われる。In the rephrasing determination process of the present embodiment, two input voice patterns of the same speaker are compared to determine whether the same word was uttered. A highly accurate rephrasing determination can therefore be made in a short time by adopting the DP method or the like, and the determination process can be simplified further to speed up the processing. In particular, although the recognition apparatus of the present embodiment is intended for unspecified speakers, applying DP matching, which is adapted to a specific speaker, to one part of it allows the rephrasing determination to be made accurately.

【００４９】さらにまた、本実施形態によれば、図５を
用いて説明したように、複数回の再発声が行われたとき
に、類似度の最も高い組み合わせを選ばれる。信頼性の
高い認識候補を得られる組み合わせへの絞り込みが行わ
れるので、認識処理部３０での処理負担を軽減しつつ、
認識能力の向上が図れる。Furthermore, according to the present embodiment, as described with reference to FIG. 5, when re-utterances are made a plurality of times, the combination with the highest similarity is selected. Because the candidates are narrowed down to the combination that yields highly reliable recognition candidates, recognition capability improves while the processing load on the recognition processing unit 30 is reduced.

【００５０】以下、本実施形態の変形例を説明する。Hereinafter, a modified example of this embodiment will be described.

【００５１】（１）音声認識装置の認識対象は、単語に
限られない。例えば、文、文字、数字など、なんでもよ
い。(1) The recognition target of the speech recognition device is not limited to words. For example, anything, such as a sentence, a character, or a number, is acceptable.

【００５２】（２）認識処理部３０や言直し判定部２０
のマッチング処理は、ＨＭＭやＤＰ法には限定されず、
他の任意の手法を適用してよい。言直し判定部２０にＨ
ＭＭを採用することも、認識処理部３０にＤＰ法を採用
することも可能である。ただし、上述した一部の効果は
得られない。(2) The matching processing in the recognition processing unit 30 and the rephrasing determination unit 20 is not limited to the HMM and the DP method; any other technique may be applied. It is also possible to use an HMM in the rephrasing determination unit 20 or the DP method in the recognition processing unit 30. In that case, however, some of the effects described above are not obtained.

【００５３】（３）特定話者用の音声認識装置に本発明
が適用されてもよい。用途に併せて、認識処理部３０や
言直し判定部２０のマッチング処理も適宜変更される。(3) The present invention may also be applied to a speech recognition apparatus for a specific speaker. The matching processing in the recognition processing unit 30 and the rephrasing determination unit 20 is then changed as appropriate for the application.

【００５４】（４）図２に示した各構成の機能は、ハー
ドウェアによって実現されてもよく、ソフトウェアによ
って実現されてもよい。(4) The function of each component shown in FIG. 2 may be realized by hardware or software.

[Brief description of the drawings]

【図１】従来の音声認識装置の構成を示すブロック図
である。FIG. 1 is a block diagram showing a configuration of a conventional voice recognition device.

【図２】本発明の実施形態の音声認識装置の構成を示
すブロック図である。FIG. 2 is a block diagram illustrating a configuration of a speech recognition device according to an embodiment of the present invention.

【図３】前回と今回の２回の認識結果に基づいて認識
候補を定めるときの認識候補の順位の変動を示す図であ
る。FIG. 3 is a diagram showing how the ranking of recognition candidates changes when a recognition candidate is determined based on two recognition results, the previous one and the current one.

【図４】図２の装置の音声認識処理のフローチャート
である。FIG. 4 is a flowchart of a voice recognition process of the device of FIG. 2;

【図５】複数回の再発声が行われた場合に適した、図
２の装置の音声認識処理のフローチャートである。FIG. 5 is a flowchart of a speech recognition process of the apparatus of FIG. 2, which is suitable for a case where re-speaking is performed a plurality of times.

【図６】図５の処理において、複数回の入力音声パタ
ーンの類似度判定の組み合わせを示す図である。FIG. 6 is a diagram showing the combinations of input voice patterns from multiple utterances used for similarity determination in the process of FIG. 5.

[Explanation of symbols]

１４音響処理部、１８認識用辞書記憶部、２０言
直し判定部、２２特徴パラメータ登録部、３０認識
処理部、３２認識候補登録部。14 acoustic processing unit, 18 recognition dictionary storage unit, 20 rephrasing determination unit, 22 feature parameter registration unit, 30 recognition processing unit, 32 recognition candidate registration unit.

Claims

[Claims]

1. A speech recognition device comprising: acoustic processing means for generating an input voice pattern based on a voice signal input by utterance; recognition processing means for performing recognition matching between the input voice pattern and standard patterns to select a recognition candidate for speech recognition; rephrasing determination means for determining that the current input voice pattern is a pattern obtained by a rephrasing of a past utterance; and recognition candidate adjustment means for performing, when the determination means determines that a rephrasing utterance has been made, adjustment processing on the recognition candidate selected by the recognition processing means, wherein the recognition candidate adjustment means determines, as the adjustment processing, an adjusted recognition candidate using both the recognition candidates of the current recognition matching result and the recognition candidates of a past recognition matching result.

2. The speech recognition device according to claim 1, wherein the recognition candidate adjustment means determines the adjusted recognition candidate by performing predetermined calculation processing based on parameters representing the past and current recognition matching results.

3. The speech recognition device according to claim 1, wherein the rephrasing determination means includes: voice pattern storage means for storing the input voice pattern of a past utterance; and similarity determination means for performing inter-input-pattern matching between the input voice pattern of the past utterance and the current input voice pattern to determine the similarity between the two input voice patterns, and wherein, when a predetermined similarity is obtained, the current input voice pattern is determined to have been obtained by a rephrasing utterance.

4. The speech recognition device according to claim 3, wherein the recognition matching in the recognition processing means is processing adapted to unspecified speakers, and the inter-input-pattern matching in the similarity determination means is processing adapted to a specific speaker.

5. The speech recognition device according to claim 3, wherein the recognition matching in the recognition processing means is hidden Markov model processing, and the inter-input-pattern matching in the similarity determination means is dynamic programming processing.

6. The speech recognition device according to claim 3, wherein, when the speaker makes a plurality of re-utterances, the similarity determination means obtains a similarity for each of a plurality of combinations of voice patterns, and the recognition candidate adjustment means determines the adjusted recognition candidate based on the combination of voice patterns having the highest similarity.

7. A speech recognition device that performs speech recognition by recognition matching between an input voice pattern and standard patterns, wherein the device performs inter-input-pattern matching between the input voice pattern of a past utterance and the input voice pattern of the current utterance to obtain a similarity and determines, based on the similarity, that the current input voice pattern is a pattern obtained by a rephrasing of the past utterance.

8. A speech recognition method comprising: an acoustic processing step of generating an input voice pattern based on a voice signal input by utterance; a rephrasing determination step of determining that the current input voice pattern is a pattern obtained by a rephrasing of a past utterance; a recognition matching step of performing recognition matching between the input voice pattern and standard patterns; and a step of determining a recognition candidate based on the current recognition matching result when the current utterance is determined not to be a rephrasing utterance, and determining a recognition candidate using both the current recognition matching result and a past recognition matching result when the current utterance is determined to be a rephrasing utterance.