JPH0756595A

JPH0756595A - Voice recognizer

Info

Publication number: JPH0756595A
Application number: JP5204915A
Authority: JP
Inventors: Toshiyuki Odaka; 俊之小高; Akio Amano; 明雄天野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-08-19
Filing date: 1993-08-19
Publication date: 1995-03-03

Abstract

PURPOSE: To recognize voice while coping with variations in a user's manner of utterance and with changes of user, by controlling a collating means or a discriminating means based on the detected changes in the manner of utterance and changes of user. CONSTITUTION: Voice that is input and digitized through an input means 1 is acoustically analyzed at constant time intervals by a first analysis means 2, and the analysis result is output in a form suitable for a collating means 4. The collating means 4 collates the time-series pattern with standard patterns and outputs a score for each standard pattern. The scores output from the collating means 4 are input to a discriminating means 5, which outputs as the recognition result either the candidate corresponding to the best-scoring standard pattern or several top-ranked candidates. A second analysis means 3 analyzes the voice input through the input means 1, extracts changes in the manner of utterance and changes in speaking rate, outputs this information, and controls the means 4 and 5 based on that output.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a voice recognition device, and in particular to a device that satisfactorily recognizes voice when the manner of utterance of the same speaker varies widely and voice when the speaker changes.

[0002]

[Prior Art] A conventional voice recognition device, for example a word recognition device, presupposes that the unit of utterance is a word. When several words are uttered in succession to such a device, the whole run of continuously uttered words is treated as a single word, and a correct recognition result is often not obtained. The user is thus restricted to uttering one word at a time, with pauses between words.

[0003] Furthermore, when the voice recognition device misrecognizes an utterance and the user carefully repeats it one sound at a time with pauses in between, each separately uttered sound is treated as a single word, and recognition becomes even less likely to succeed.

[0004]

[Problem to Be Solved by the Invention] An object of the present invention is to make it possible to recognize voice even when the user's way of speaking changes or the speaker changes.

[0005]

[Means for Solving the Problem] The above object is achieved by providing a second analysis means that detects changes in the various manners of utterance and changes of speaker, and by controlling the collating means or the judging means based on the result of the second analysis means.

[0006]

[Operation] According to the present invention, the collating means or the judging means is controlled based on the result of analyzing the manner of utterance or a change of speaker, so that voice can be recognized in a way that follows the various changes in the manner of utterance and changes of speaker.

[0007]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0008] FIG. 1 is a block diagram showing an embodiment of the voice recognition device of the present invention. The present invention differs from the prior art in that a second analysis means 3 is provided to control the collating means 4 or the judging means 5. Voice that is digitized and input through the input means 1 is sent to the first analysis means 2, where it is acoustically analyzed at constant time intervals. The result of the first analysis means 2 is output in the form required by the collating means 4 (for example, a time-series pattern of feature vectors or a time-series pattern of vector-quantized codes). The collating means 4 collates the time-series pattern, which is the acoustic analysis result obtained from the first analysis means 2, with the standard patterns prepared in advance in the standard pattern storage means 6 as the reference for collation, and outputs a score for each standard pattern. The scores output from the collating means 4 are input to the judging means 5, which outputs as the recognition result either the single candidate corresponding to the best-scoring standard pattern or several top-ranked candidates. Up to this point, the input means 1, the first analysis means 2, the collating means 4, and the judging means 5 have the same configuration as in a conventional voice recognition device. The second analysis means 3, which is where the present invention differs from the prior art, analyzes the voice input through the input means 1, extracts changes in the manner of utterance and changes in speaking rate, and outputs this information. The output of the second analysis means 3 is then used to control the collating means 4 or the judging means 5.
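The data flow of FIG. 1 can be sketched in code as follows. This is only an illustrative sketch: the class name `VoiceRecognizer`, the field names, the callable-based wiring, and the convention that a higher score is better are all assumptions made for this example and do not appear in the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, float]  # (recognition candidate, score); higher = better (assumed)


@dataclass
class VoiceRecognizer:
    """Wiring of FIG. 1. Each field is a hypothetical stand-in for one numbered means."""
    first_analysis: Callable[[Sequence[float]], list]         # means 2
    second_analysis: Callable[[Sequence[float]], str]         # means 3
    collate: Callable[[list, str], List[Candidate]]           # means 4 (uses patterns from means 6)
    judge: Callable[[List[Candidate], str], List[Candidate]]  # means 5

    def recognize(self, samples: Sequence[float]) -> List[Candidate]:
        pattern = self.first_analysis(samples)    # time-series pattern
        control = self.second_analysis(samples)   # e.g. the utterance mode
        scores = self.collate(pattern, control)   # the control output may steer collation
        return self.judge(scores, control)        # and/or the final judgment


def demo() -> List[Candidate]:
    # Toy stand-ins only; none of these lambdas reflects a real acoustic model.
    recognizer = VoiceRecognizer(
        first_analysis=lambda s: [abs(x) for x in s],
        second_analysis=lambda s: "word",
        collate=lambda pattern, mode: [("kokubunji", 0.9), ("kokubu", 0.4)],
        judge=lambda scores, mode: sorted(scores, key=lambda c: c[1], reverse=True)[:1],
    )
    return recognizer.recognize([0.1, -0.2, 0.3])
```

Both `collate` and `judge` receive the control value here so that either, or both, can react to it; a given configuration is free to ignore it in one of the two.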

[0009] In this embodiment, the information extracted by the second analysis means is the utterance mode. "Utterance mode" here means the form or style of utterance. Speaking of modes implies that several modes exist; here three are considered, namely utterance in units of syllables, utterance in units of words, and utterance in units of sentences, referred to respectively as 1) syllable mode, 2) word mode, and 3) sentence mode. Case 1) is the case where each sound is uttered carefully and slowly, or with pauses between sounds, for instance when conveying a new word or when the listener failed to catch what was said; for example, "ko-ku-bu-n-ji" is uttered carefully one sound at a time. Case 2) is the case where a single word is uttered, or several words are uttered separated word by word, as when issuing a command or conveying relatively simple information; for example, "Kokubunji" is uttered. Case 3) is the case of ordinary utterance in units of sentences; for example, "Kokubunji made ikitai" ("I want to go to Kokubunji") is uttered.

[0010] Next, the second analysis means 3, which detects the utterance mode, is described in detail.

[0011] FIG. 2 is a block diagram showing an embodiment of the second analysis means for the case where the utterance mode is detected. FIG. 3 shows some of the information that is input and output in the block diagram of FIG. 2; the labels (a) to (f) correspond between FIG. 2 and FIG. 3. Voice of amplitude w(t), as in FIG. 3(a), is input to the power calculating means 301, and the power (short-time power) pw(t), as in FIG. 3(b),

[0012]

[Equation 1]

[0013] is output, where T is the window width of the short-time analysis. The short-time power pw(t) is input to the power threshold judging means 302, converted into 0 (no power) / 1 (power present), and output as the voice section sp(t), as in FIG. 3(c). The short-time power pw(t) is also input to the power change calculating means 304, and according to the following equation,

[0014]

[Equation 2]

$$dpw(t+1) = \lvert pw(t+1) - pw(t) \rvert$$

the power change dpw(t), as in FIG. 3(d), is calculated. The power change dpw(t) is input to the change threshold judging means 305, and according to the following rule,

[0015]

[Equation 3]

$$fix(t) = \begin{cases} 1 & \text{if } dpw(t) \le DPW_{TH} \\ 0 & \text{otherwise} \end{cases}$$

it is judged whether the part is stationary, and the stationary section fix(t) is output as 0 (non-stationary) / 1 (stationary), as in FIG. 3(e). Here DPW_TH is a constant determined for each system. Next, the vowel stationary section judging means 306 takes as inputs the output sp(t) of the power threshold judging means 302 and the output fix(t) of the change threshold judging means 305, and by

[0016]

[Equation 4]

$$spfix(t) = sp(t) \mathbin{\&} fix(t) \qquad (\&\ \text{denotes logical AND})$$

outputs the vowel stationary section spfix(t), that is, a stationary section due to a vowel, as 0/1, as in FIG. 3(f). The stationary section length calculating means 307 then obtains the stationary section length fixsz as the number of consecutive 1s in the 0/1 sequence spfix(t) output from the vowel stationary section judging means 306. Each time a stationary section length is obtained by the stationary section length calculating means 307, the stationary section evaluating means 308, by

[0017]

[Equation 5]

$$\begin{aligned} &\text{if } fixsz \ge SZ1_{TH} \text{ then } n_A = n_A + 1 \\ &\text{else if } fixsz \ge SZ2_{TH} \text{ then } n_B = n_B + 1 \end{aligned}$$

obtains the number n_A of long stationary sections or the number n_B of short stationary sections. The initial values of n_A and n_B are both 0. SZ1_TH and SZ2_TH are constants determined for each system, with SZ1_TH > SZ2_TH. Finally, when the voice section detecting means 303 detects the end of the voice, it activates the mode judging means 309. The mode judging means 309 receives n_A and n_B from the stationary section evaluating means 308 and judges the mode according to Equations 6 and 7, where n denotes the total number of syllables and n = n_A + n_B.
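The processing described so far, from the short-time power up to the counts n_A and n_B (Equations 1 to 5), can be sketched as follows. Several details are assumptions made only for this illustration. Equation 1 is not reproduced in the text, so `short_time_power` uses one common definition (mean squared amplitude over non-overlapping windows of T samples). The power threshold name `pw_th` and its strict comparison are not given in the text. Since dpw(0) is not defined by Equation 2, fix(0) is set to 0.

```python
from typing import List, Sequence, Tuple


def short_time_power(w: Sequence[float], T: int) -> List[float]:
    """ASSUMED form of Equation 1: mean squared amplitude per window of T samples."""
    return [sum(x * x for x in w[i:i + T]) / T for i in range(0, len(w) - T + 1, T)]


def vowel_stationary_flags(pw: Sequence[float], pw_th: float, dpw_th: float) -> List[int]:
    """Return spfix(t) = sp(t) & fix(t) (Equation 4)."""
    # sp(t): 1 where power is present; threshold name and strict ">" are assumptions.
    sp = [1 if p > pw_th else 0 for p in pw]
    # Equation 2 gives dpw(t) = |pw(t) - pw(t-1)|; Equation 3 thresholds it.
    # fix(0) = 0 is an assumption because dpw(0) is undefined.
    fix = [0] + [1 if abs(pw[t] - pw[t - 1]) <= dpw_th else 0 for t in range(1, len(pw))]
    return [s & f for s, f in zip(sp, fix)]


def run_lengths(flags: Sequence[int]) -> List[int]:
    """Stationary section lengths fixsz: lengths of maximal runs of consecutive 1s."""
    runs, count = [], 0
    for f in flags:
        if f:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs


def count_sections(runs: Sequence[int], sz1_th: int, sz2_th: int) -> Tuple[int, int]:
    """Equation 5: count long (n_A) and short (n_B) stationary sections."""
    n_a = n_b = 0
    for fixsz in runs:
        if fixsz >= sz1_th:
            n_a += 1
        elif fixsz >= sz2_th:
            n_b += 1
    return n_a, n_b
```

For instance, with the power sequence `[0, 5, 5, 5, 5, 0, 5, 5, 0]`, `pw_th=1`, and `dpw_th=0.5`, the flags contain one run of length 3 and one of length 1, so `count_sections` with `sz1_th=3` and `sz2_th=1` yields one long and one short section. Runs shorter than `sz2_th` are counted in neither group, as Equation 5 implies.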

[0018]

[Equation 6]

$$\text{if } n_A / n > N1_{TH}$$

[0019]

[Equation 7]

$$\begin{aligned} &\text{or } n < N2_{TH} \text{ then mode} = \text{syllable mode} \\ &\text{else if } n < N3_{TH} \text{ then mode} = \text{word mode} \\ &\text{else mode} = \text{sentence mode} \end{aligned}$$

Here N1_TH, N2_TH, and N3_TH are constants determined for each system. The mode judging means 309 first judges whether the input voice is in syllable mode, that is, uttered slowly and carefully, according to whether the ratio of the number n_A of long stationary sections to the total number of syllables n exceeds a threshold. It then judges the mode from the size of the total number of syllables n. The output of the mode judging means 309 is used to control the collating means 4 or the judging means 5.
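The decision rule of Equations 6 and 7 translates directly into code. The sketch below is a minimal illustration: the `Mode` enum, the function name, and the handling of n = 0 (which the text does not address) are assumptions, and the threshold values in the usage note are arbitrary.

```python
from enum import Enum


class Mode(Enum):
    SYLLABLE = "syllable"
    WORD = "word"
    SENTENCE = "sentence"


def decide_mode(n_a: int, n_b: int, n1_th: float, n2_th: int, n3_th: int) -> Mode:
    """Apply Equations 6 and 7 to the counts of long (n_a) and short (n_b) sections."""
    n = n_a + n_b  # total number of syllables
    if n == 0:
        # Not covered by the text; rejecting the input is an assumption.
        raise ValueError("no stationary sections were counted")
    if n_a / n > n1_th or n < n2_th:
        return Mode.SYLLABLE
    if n < n3_th:
        return Mode.WORD
    return Mode.SENTENCE
```

With hypothetical thresholds `n1_th=0.5`, `n2_th=2`, `n3_th=6`, five long sections (as in a slowly uttered "ko-ku-bu-n-ji") give syllable mode, one long and four short sections give word mode, and two long and ten short sections give sentence mode.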

[0020] Although the embodiment shown here uses only the change in power to obtain the vowel stationary sections, it goes without saying that they can also be obtained from the change in spectrum alone, or from a combination of the change in power and the change in spectrum.

[0021] Next, the collating means 4 used in this embodiment is described with reference to FIG. 4.

[0022] FIG. 4 is a block diagram showing the configuration of the collating means 4 when the mode is switched using the output of the second analysis means 3. Here a selecting means 44 is placed in front of a plurality of collating means. Based on the output of the second analysis means 3, the selecting means 44 selects, as appropriate, one or more (in this example, at most two) of the plurality of collating means (in this example, the syllable collating means 41, the word collating means 42, and the sentence collating means 43), and sends the information from the first analysis means 2 to the selected collating means. When more than one is selected, the judging means 5 outputs one or more candidates as the recognition result based on the scores. The HMM 61 stores syllable-unit models that have been statistically trained in advance. The syllable collating means 41 performs syllable-unit collation using these models and outputs one or more syllable candidates together with their scores. The word dictionary 62 stores information about words (for example, which syllable sequence each word consists of). The word collating means 42 performs word-unit collation using word-unit models built by combining the syllable-unit models stored in the HMM 61 according to the information in the word dictionary 62, and outputs one or more word candidates together with their scores. The grammar 63 stores a grammar. The sentence collating means 43 performs collation based on the HMM 61, the word dictionary 62, and the grammar 63, and outputs one or more sentence or phrase candidates together with their scores.
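The FIG. 4 arrangement, in which selection happens before collation so that only the chosen collating means run, might be sketched as below. The function name, the dictionary-based selection table, and in particular which collating means are paired with which mode are hypothetical; the text only says that one or at most two are selected as appropriate.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Candidate = Tuple[str, float]                            # (candidate, score)
Matcher = Callable[[Sequence[float]], List[Candidate]]   # stand-in for means 41, 42, 43


def collate_with_preselection(
    pattern: Sequence[float],
    mode: str,
    matchers: Dict[str, Matcher],
    selection: Dict[str, List[str]],
) -> List[Candidate]:
    """Selecting means 44 in front: run only the collating means chosen for this mode."""
    results: List[Candidate] = []
    for name in selection[mode]:
        results.extend(matchers[name](pattern))
    return results


def demo():
    called = []  # records which collating means actually ran

    def make(name: str, output: List[Candidate]) -> Matcher:
        def matcher(pattern: Sequence[float]) -> List[Candidate]:
            called.append(name)
            return output
        return matcher

    matchers = {
        "syllable": make("syllable", [("ko", 0.6)]),
        "word": make("word", [("kokubunji", 0.8)]),
        "sentence": make("sentence", [("kokubunji made ikitai", 0.5)]),
    }
    # Hypothetical selection table (at most two collating means per mode).
    selection = {
        "syllable": ["syllable"],
        "word": ["word", "syllable"],
        "sentence": ["sentence", "word"],
    }
    return collate_with_preselection([0.0], "word", matchers, selection), called
```

In `demo`, word mode runs the word and syllable collating means only; the sentence collating means is never invoked, which is the practical difference from running everything and filtering afterwards.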

[0023] Various methods are conceivable for realizing the syllable collating means 41, the word collating means 42, and the sentence collating means 43; here a method using an HMM (Hidden Markov Model) is considered. Methods of realizing a voice recognition device using HMMs are described in detail in Seiichi Nakagawa, "Development of time-series pattern matching algorithms in speech recognition," Journal of the Japanese Society for Artificial Intelligence, Vol. 3, No. 4, pp. 414-423, 1988, and in Kai-Fu Lee, "Automatic Speech Recognition: The Development of the SPHINX System," Kluwer Academic Publishers, 1989.

[0024] Next, another embodiment of the collating means 4 is described with reference to FIG. 5.

[0025] FIG. 5 is a block diagram showing the configuration of the collating means 4 when the mode is switched using the output of the second analysis means 3. Here the selecting means 44 is placed after the plurality of collating means. That is, the plurality of collating means (the syllable collating means 41, the word collating means 42, and the sentence collating means 43) operate in parallel, and the selecting means 44 selects one or more of the collation results from the individual collating means based on the result of the second analysis means 3. The configurations of the syllable collating means 41, the word collating means 42, and the sentence collating means 43 may be the same as in FIG. 4.
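The FIG. 5 arrangement, where all collating means run in parallel and the selection is applied to their results, could look like the following sketch. Using a thread pool is just one way to express "in parallel" and is an assumption of this example, as are the function name and the selection table.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Candidate = Tuple[str, float]                            # (candidate, score)
Matcher = Callable[[Sequence[float]], List[Candidate]]   # stand-in for means 41, 42, 43


def collate_with_postselection(
    pattern: Sequence[float],
    mode: str,
    matchers: Dict[str, Matcher],
    selection: Dict[str, List[str]],
) -> List[Candidate]:
    """Run every collating means in parallel, then let selecting means 44 pick results."""
    with ThreadPoolExecutor(max_workers=len(matchers)) as pool:
        futures = {name: pool.submit(matcher, pattern) for name, matcher in matchers.items()}
        results = {name: future.result() for name, future in futures.items()}
    # Selection after collation: keep only the results chosen for this mode.
    return [cand for name in selection[mode] for cand in results[name]]


def demo() -> List[Candidate]:
    matchers = {
        "syllable": lambda p: [("ko", 0.6)],
        "word": lambda p: [("kokubunji", 0.8)],
        "sentence": lambda p: [("kokubunji made ikitai", 0.5)],
    }
    # Hypothetical selection table.
    selection = {
        "syllable": ["syllable"],
        "word": ["word"],
        "sentence": ["sentence", "word"],
    }
    return collate_with_postselection([0.0], "sentence", matchers, selection)
```

Compared with selecting first, this variant does more collation work per utterance, but the collating means do not need to know the output of the second analysis means before they start.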

[0026] Next, yet another embodiment of the collating means 4 is described with reference to FIG. 6.

[0027] FIG. 6 is a block diagram showing the configuration of the collating means 4 when the judging means 5 is controlled by the output of the second analysis means 3. The configuration is the same as in the case of FIG. 4 and the like, except that there is no selecting means and all of the collation results are sent to the judging means 5.

[0028] Next, the judging means 5 used in this embodiment is described.

[0029] The judging means 5 receives the output of the collating means 4 as input and outputs, as the recognition result, the single best-scoring candidate or several top-ranked candidates. When a plurality of collating means have been selected by the selecting means 44 in the collating means 4, their collation results are pooled, and the judging means 5 outputs the single best-scoring candidate or several top-ranked candidates based on the scores. The judging means 5 may also receive the output of the second analysis means 3 in addition to the output of the collating means 4. In that case, the judging means 5 corrects the scores of the candidates sent from the collating means 4 (for example, by weighting them) based on the information received from the second analysis means 3 (here, the utterance mode), and then outputs the single best-scoring candidate or several top-ranked candidates as the recognition result.
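The score correction performed by the judging means 5 can be illustrated with a small sketch. The text only says that scores are corrected, for example by weighting; the multiplicative weights, the weight table, the tagging of each candidate with the unit of the collating means that produced it, and the convention that a higher score is better are all assumptions of this example.

```python
from typing import Dict, List, Optional, Tuple

Scored = Tuple[str, str, float]  # (unit of the collating means, candidate, score)


def judge(
    candidates: List[Scored],
    mode: Optional[str] = None,
    weights: Optional[Dict[str, Dict[str, float]]] = None,
    top_k: int = 1,
) -> List[str]:
    """Return the top_k candidates, optionally re-weighting scores by utterance mode."""

    def adjusted(candidate: Scored) -> float:
        unit, _, score = candidate
        if mode is None or weights is None:
            return score  # no control input: rank on the raw scores
        return score * weights[mode].get(unit, 1.0)

    ranked = sorted(candidates, key=adjusted, reverse=True)
    return [label for _, label, _ in ranked[:top_k]]


# Hypothetical pooled results from the three collating means.
POOLED: List[Scored] = [
    ("syllable", "ko", 0.6),
    ("word", "kokubunji", 0.55),
    ("sentence", "kokubunji made ikitai", 0.5),
]
# Hypothetical weights: in word mode, favor word-level candidates.
WEIGHTS = {"word": {"syllable": 0.5, "word": 1.0, "sentence": 0.8}}
```

With these toy values, ranking on raw scores puts the syllable candidate first, whereas passing `mode="word"` moves the word candidate to the top. A soft weighting of this kind lets a candidate from another unit still win if its score is high enough, unlike a hard selection.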

[0030] It goes without saying that both the collating means 4 and the judging means 5 can be controlled.

[0031] In this embodiment the second analysis means detects the utterance mode. If the second analysis means instead performs an analysis that extracts a discrete value related to speaker characteristics (for example, male or female, adult or child), the device can cope with changes of speaker.

[0032] Likewise, if the second analysis means performs an analysis that extracts a continuous value related to the speaking rate of the input voice (for example, the number of syllables per unit time in the voice), the device can cope with changes in speaking rate.
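As a rough illustration of such a continuous value, the number of syllables per unit time could be estimated from a syllable count and the number of voiced analysis frames. The function name, its parameters, and the choice of seconds as the time unit are assumptions; the text does not say how the value is computed.

```python
def speaking_rate(n_syllables: int, n_voice_frames: int, frame_period_s: float) -> float:
    """Syllables per second = syllable count / duration of the voiced part (assumed definition)."""
    duration_s = n_voice_frames * frame_period_s
    if duration_s <= 0:
        raise ValueError("voiced duration must be positive")
    return n_syllables / duration_s
```

For example, 5 syllables over 100 voiced frames of 10 ms each correspond to about 5 syllables per second. The syllable count could be taken, for instance, from a count of vowel stationary sections, but that pairing is a suggestion rather than something the text specifies.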

[0033]

[Effects of the Invention] According to the present invention, the collating means or the judging means is controlled based on the detected changes in the manner of utterance and changes of speaker, so that voice can be recognized in a way that follows the user's varied changes in manner of utterance and changes of speaker.

[Brief description of drawings]

[FIG. 1] Block diagram showing an embodiment of the voice recognition device of the present invention.

[FIG. 2] Block diagram showing an embodiment of the second analysis means.

[FIG. 3] Explanatory diagram showing the flow of data in the second analysis means of this embodiment.

[FIG. 4] Block diagram showing a configuration of the collating means.

[FIG. 5] Block diagram showing another configuration of the collating means.

[FIG. 6] Block diagram showing yet another configuration of the collating means.

[Explanation of Reference Numerals] 1: input means, 2: first analysis means, 3: second analysis means, 4: collating means, 5: judging means, 6: standard pattern storage means.

Claims

[Claims]

1. A voice recognition device comprising: voice input means for inputting voice; first analysis means for analyzing the voice input by the voice input means and outputting a time-series pattern of feature vectors; standard pattern storage means for storing standard patterns prepared in advance as a recognition reference; collating means for obtaining a score for each standard pattern by collating the standard patterns with the time-series pattern of feature vectors obtained from the first analysis means; and judging means for outputting one or a plurality of recognition candidates based on the score for each standard pattern, the voice recognition device being characterized in that second analysis means is provided for the voice input by the voice input means, and the collating means and/or the judging means is controlled using the output of the second analysis means.

2. The voice recognition device according to claim 1, wherein the second analysis means outputs a discrete value, and the collating means and/or the judging means is controlled using the discrete value.

3. The voice recognition device according to claim 2, wherein the collating means comprises a plurality of collating means corresponding to the discrete values, and one or more of the plurality of collating means are selected and used as appropriate based on the discrete value.

4. The voice recognition device according to claim 2, wherein the collating means comprises a plurality of collating means corresponding to the discrete values, all or some of the plurality of collating means can operate in parallel, and one or more of the results of the plurality of collating means are selected based on the discrete value.

5. The voice recognition device according to claim 2, wherein the collating means comprises a plurality of collating means corresponding to the discrete values, all or some of the plurality of collating means can operate in parallel, and recognition is performed on the plurality of collation results obtained from the plurality of collating means based on the discrete value.

6. The voice recognition device according to claim 3, 4 or 5, wherein the second analysis means outputs whether the utterance unit of the input voice is a syllable, a word, or a sentence.

7. The voice recognition device according to claim 3, 4 or 5, wherein the second analysis means extracts a discrete value related to speaker characteristics.

8. The voice recognition device according to claim 1, wherein the output obtained from the second analysis means is a continuously varying quantity, and the collating means and/or the judging means is controlled using the continuously varying quantity.

9. The voice recognition device according to claim 8, wherein the second analysis means outputs a continuously changing amount related to a speaking rate.