JPS61145600A

JPS61145600A - Word recognition equipment

Info

Publication number: JPS61145600A
Application number: JP59269079A
Authority: JP
Inventors: 森井　秀司; 藤井　諭; 二矢田　勝行; 昌克星見
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1984-12-19
Filing date: 1984-12-19
Publication date: 1986-07-03

Abstract

(57) [Abstract] This publication contains application data from before electronic filing was introduced, so no abstract data is recorded.

Description

[Detailed description of the invention]

[Field of industrial application]

The present invention relates to a word recognition device for use in a speech recognition apparatus.

[Configuration of the conventional example and its problems]

A known conventional method of recognizing spoken words first recognizes the input speech in units of phonemes, then computes the similarity between the recognized phoneme sequence and the phoneme sequence of each recognition target word stored in a word dictionary, and takes as the recognized word the word whose dictionary phoneme sequence gives the highest similarity. This method is described in Miwa et al., "A spoken word recognition system using the rough shape of the speech spectrum and its dynamic characteristics," Journal of the Acoustical Society of Japan, 34 (1978).

The conventional word recognition method is described below with reference to the drawings. FIG. 1 shows the configuration of a conventional word recognition device. In the figure, 1 is a similarity calculation unit, 2 is a word dictionary storing the phoneme sequences of the recognition target words, 3 is an inter-phoneme likelihood storage unit storing the likelihoods between phonemes, and 4 is a word determination unit that selects the word with the highest similarity. The recognition method of the word recognition device configured in this way is described below.

A speech signal is converted into a phoneme sequence by a phoneme recognition device (not shown) located upstream of the word recognition device of FIG. 1, and is sent to the similarity calculation unit 1 of FIG. 1. The similarity calculation unit 1 computes the word similarity between the input phoneme sequence and the phoneme sequence of each of the 166 recognition target words stored in the word dictionary 2, using the inter-phoneme likelihoods stored in the inter-phoneme likelihood storage unit 3. The 166 resulting word similarities are sent to the word determination unit 4, which selects the largest and outputs, as the recognized word, the word corresponding to the dictionary phoneme sequence that gave it.

Next, the method of calculating the word similarity in the word determination unit 4 is described. Let the phoneme sequence of a dictionary entry in the word dictionary 2 be D = (D1, D2, ..., DI) and the input phoneme sequence be W = (W1, W2, ..., WJ), where I and J are the numbers of phonemes in the dictionary entry D and in the input phoneme sequence W, respectively. The word similarity S(D, W) is obtained by the recurrence shown in Equation 1.
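Equation 1 itself appears to have been an image in the original publication and is not reproduced in the text. A recurrence consistent with the terms L, La, Laa, Lo, Loo and the boundary conditions given in the text would plausibly take the following form. This is a reconstruction under that assumption, not the verbatim equation of the publication; in particular, combining the five terms with a maximum is inferred from the statement that Equation 1 yields the similarity of the optimal correspondence.

$$
S(D, W) = g(I+1,\, J+1), \qquad
g(i, j) = \ell(i, j) + \max\{\, L,\ L_a,\ L_{aa},\ L_o,\ L_{oo} \,\}
$$

Under this reading, g(i, j) is the best accumulated log-likelihood of any allowed alignment that ends by pairing D_i with W_j, and the indices 0 and I+1 (or J+1) stand for the word-initial and word-final boundaries.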

Here,

$$
\begin{aligned}
L &= g(i-1,\, j-1) \\
L_a &= g(i-1,\, j-2) + \ell_a(j-1) \\
L_{aa} &= g(i-1,\, j-3) + \ell_{aa}(j-2) + \ell_{aa}(j-1) \\
L_o &= g(i-2,\, j-1) + \ell_o(i-1) \\
L_{oo} &= g(i-3,\, j-1) + \ell_{oo}(i-2) + \ell_{oo}(i-1)
\end{aligned}
$$

with the boundary conditions

$$
\begin{aligned}
g(0,\, 0) &= \ell(I+1,\, J+1) = 0 \\
g(i,\, 0) &= g(0,\, j) = -\infty \quad (i \neq 0,\ j \neq 0) \\
g(i,\, J+1) &= g(I+1,\, j) = -\infty \quad (i \neq I+1,\ j \neq J+1)
\end{aligned}
$$

In Equation 1, $\ell(i, j)$ is the likelihood between the i-th phoneme $D_i$ of the dictionary phoneme sequence D and the j-th phoneme $W_j$ of the input phoneme sequence W. Similarly, $\ell_a(j)$ is the likelihood that the phoneme $W_j$ has been added, and $\ell_o(i)$ is the likelihood that the phoneme $D_i$ has been dropped. $\ell_{aa}(j)$ and $\ell_{oo}(i)$ are the likelihoods that phonemes $W_j$ are added consecutively and that phonemes $D_i$ are dropped consecutively, respectively. Equation 1 gives the similarity obtained from the optimal correspondence between the phonemes $D_i$ of the dictionary entry and the phonemes $W_j$ of the input phoneme sequence under the following restrictions: (1) word boundaries always correspond to each other; (2) phonemes are added or dropped at most twice in succession; (3) an addition and an omission do not occur consecutively.

The values of the likelihoods $\ell(i, j)$, $\ell_a(j)$, $\ell_{aa}(j)$, $\ell_o(i)$ and $\ell_{oo}(i)$ in Equation 1 are obtained as the logarithms of the elements of a confusion matrix and are stored in the inter-phoneme likelihood storage unit 3. The confusion matrix represents the probabilities of phoneme recognition errors, including phoneme additions and omissions, and is derived in advance from the phoneme recognition results for a large amount of speech. In this confusion matrix, the appearance probabilities of all recognized phonemes (including omission) for a given input phoneme sum to 1.

In other words, the conventional word recognition method computes the word similarity of Equation 1 between the input phoneme sequence and every dictionary entry in the word dictionary, and takes the word corresponding to the entry with the highest word similarity as the recognized word.
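The conventional procedure can be sketched as a small dynamic program. The following is a minimal illustration, not the implementation in the publication: it assumes the recurrence takes the maximum over the five terms L, La, Laa, Lo, Loo, and the function names (`word_similarity`, `recognize`), the phoneme spellings, and the likelihood values in `TOY` are all hypothetical.

```python
import math

NEG_INF = -math.inf


def word_similarity(D, W, ell, ell_a, ell_aa, ell_o, ell_oo):
    """Return S(D, W) = g(I+1, J+1) for dictionary phonemes D and input phonemes W.

    ell(d, w): log-likelihood that dictionary phoneme d is recognized as w.
    ell_a(w), ell_aa(w): log-likelihood of a single / consecutive added phoneme w.
    ell_o(d), ell_oo(d): log-likelihood of a single / consecutive dropped phoneme d.
    """
    I, J = len(D), len(W)
    # Indices 0 and I+1 (J+1) are the word boundaries; everything starts at -inf.
    g = [[NEG_INF] * (J + 2) for _ in range(I + 2)]
    g[0][0] = 0.0  # the two word-initial boundaries correspond

    for i in range(1, I + 2):
        for j in range(1, J + 2):
            end_d, end_w = i == I + 1, j == J + 1
            if end_d != end_w:
                continue  # a word-final boundary pairs only with the other one
            cands = [g[i - 1][j - 1]]  # L: previous phonemes paired directly
            if j >= 2:  # La: W_{j-1} was added
                cands.append(g[i - 1][j - 2] + ell_a(W[j - 2]))
            if j >= 3:  # Laa: W_{j-2} and W_{j-1} were added
                cands.append(g[i - 1][j - 3] + ell_aa(W[j - 3]) + ell_aa(W[j - 2]))
            if i >= 2:  # Lo: D_{i-1} was dropped
                cands.append(g[i - 2][j - 1] + ell_o(D[i - 2]))
            if i >= 3:  # Loo: D_{i-2} and D_{i-1} were dropped
                cands.append(g[i - 3][j - 1] + ell_oo(D[i - 3]) + ell_oo(D[i - 2]))
            pair = 0.0 if end_d else ell(D[i - 1], W[j - 1])
            g[i][j] = pair + max(cands)
    return g[I + 1][J + 1]


def recognize(W, dictionary, **likelihoods):
    """Pick the dictionary word whose phoneme sequence is most similar to W."""
    return max(dictionary,
               key=lambda word: word_similarity(dictionary[word], W, **likelihoods))


# Hypothetical log-likelihoods, for illustration only.
TOY = dict(
    ell=lambda d, w: math.log(0.9) if d == w else math.log(0.01),
    ell_a=lambda w: math.log(0.05),
    ell_aa=lambda w: math.log(0.02),
    ell_o=lambda d: math.log(0.05),
    ell_oo=lambda d: math.log(0.02),
)
ICHIKAWA = ["i", "c", "i", "k", "a", "w", "a"]
ASAKUSA = ["a", "s", "a", "k", "u", "s", "a"]
DICTIONARY = {"ichikawa": ICHIKAWA, "asakusa": ASAKUSA}

one_added = ["h"] + ICHIKAWA
three_added = ["h", "a", "Q"] + ICHIKAWA
print(recognize(one_added, DICTIONARY, **TOY))
print(word_similarity(ICHIKAWA, one_added, **TOY))
print(word_similarity(ICHIKAWA, three_added, **TOY))
```

With these toy values, a single added phoneme is absorbed by the La term, whereas three consecutive added phonemes cannot be covered by La or Laa alone, so the similarity to the correct entry drops sharply.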

However, the conventional method often fails to work effectively when noise is added to the beginning or end of the input speech, such as breathing sounds or sighs produced by the mouth, or meaningless utterances such as "um". FIG. 2 shows an example of the phoneme sequence obtained by phoneme recognition when breathing noise precedes the utterance "Ichikawa". The breathing noise is recognized as /ha/, and the silent interval between the noise and the start of the speech is recognized as /Q/ (the geminate consonant, sokuon), so the three phonemes /haQ/ are added consecutively at the beginning of the word. When mouth noise is added to the beginning or end of a word in this way, the noise portion tends to be recognized as two or more phonemes, and the silent interval between the noise and the start of the speech, or between the end of the speech and the noise, tends to be recognized as /Q/. As a result, three or more phonemes are often added consecutively. Because the recurrence of Equation 1 assumes that at most two phonemes are added in succession, when three or more phonemes are added consecutively the dictionary phonemes Di and the input phonemes Wj cannot be optimally aligned, as shown in FIG. 3. The word similarity with the dictionary entry of the correct word then becomes small, which makes a correct recognition result difficult to obtain.

[Object of the invention]

The present invention eliminates the above drawbacks of the prior art and provides a word recognition device whose performance degrades little even when noise is added to the beginning or end of the spoken word.

[Configuration of the invention]

The basic configuration of the present invention comprises: a word dictionary storage unit storing the phoneme sequences of the recognition target words; an inter-phoneme likelihood storage unit storing inter-phoneme likelihoods; a phoneme sequence testing unit that determines whether noise may have been added to the beginning or end of the input phoneme sequence; a word boundary re-determination unit that, when noise may have been added, sets word boundaries that exclude the noise portion and generates a corrected input phoneme sequence; a word similarity calculation unit that computes the word similarity between the input phoneme sequence, or the input phoneme sequence and the corrected input phoneme sequence, and the phoneme sequences in the word dictionary storage unit; and a word determination unit that selects the largest of the word similarities and outputs, as the recognized word, the word corresponding to the dictionary phoneme sequence used in computing that largest word similarity. When noise may have been added to the beginning or end of the speech, word similarities with the dictionary phoneme sequences are computed for two input phoneme sequences: the input phoneme sequence itself and the corrected input phoneme sequence whose word boundaries have been corrected.

[Description of the embodiment]

An embodiment of the present invention is described below with reference to the drawings. FIG. 4 is a block diagram of a word recognition device incorporated in a speech recognition apparatus according to an embodiment of the present invention. In FIG. 4, 5 is a word similarity calculation unit, which computes the word similarity between the phoneme sequence of the input speech recognized by an upstream phoneme recognition device (not shown) and the phoneme sequences of the recognition target words stored in the word dictionary storage unit 6.

7 is an inter-phoneme likelihood storage unit, which stores the inter-phoneme likelihoods used by the word similarity calculation unit 5 when computing word similarities. These inter-phoneme likelihoods are obtained by performing phoneme recognition on a large amount of speech in advance and taking the logarithm of each element of the resulting confusion matrix, which represents the probabilities of phoneme recognition errors, including phoneme additions and omissions. 8 is a phoneme sequence testing unit, which tests whether the phoneme sequence sent from the upstream phoneme recognition device contains the geminate consonant /Q/. If the input phoneme sequence contains /Q/, the word boundary re-determination unit 9 re-determines the word boundaries, and the re-determined phoneme sequence is again sent to the word similarity calculation unit 5. 10 is a word determination unit, which finds the largest of the word similarities computed by the word similarity calculation unit 5 and outputs, as the recognized word, the word corresponding to the phoneme sequence of the dictionary entry in the word dictionary storage unit 6 that gave that largest word similarity.
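How the contents of the inter-phoneme likelihood storage unit might be derived can be illustrated with a short sketch. It assumes a table of recognition counts per spoken phoneme, with a hypothetical `"<drop>"` column for omissions; the counts, the phoneme labels, and the function name `log_likelihoods` are invented for illustration, and the handling of added phonemes, which the text does not detail, is left out.

```python
import math


def log_likelihoods(counts):
    """Turn counts[spoken][recognized] into log P(recognized | spoken).

    Each row is normalized so that its probabilities sum to 1, as the text
    states for the confusion matrix. Outcomes that were never observed are
    skipped because log(0) is undefined.
    """
    table = {}
    for spoken, row in counts.items():
        total = sum(row.values())
        table[spoken] = {
            recognized: math.log(n / total)
            for recognized, n in row.items()
            if n > 0
        }
    return table


# Hypothetical recognition counts; "<drop>" marks an omitted phoneme.
COUNTS = {
    "k": {"k": 90, "g": 6, "<drop>": 4},
    "a": {"a": 95, "o": 5, "<drop>": 0},
}

TABLE = log_likelihoods(COUNTS)
for spoken, row in TABLE.items():
    print(spoken, {rec: round(value, 3) for rec, value in row.items()})
```

Storing logarithms lets the similarity of a whole alignment be accumulated by addition, which is what the recurrence of Equation 1 relies on.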

The operation of the word recognition device configured as above is described next. Speech input from a microphone or the like is converted into a phoneme sequence by the phoneme recognition device and sent to the word recognition device of FIG. 4. The input phoneme sequence is sent to the word similarity calculation unit 5 and the phoneme sequence testing unit 8 of FIG. 4. The word similarity calculation unit 5 computes the word similarity between the input phoneme sequence and the phoneme sequence of each recognition target word stored in the word dictionary storage unit 6, using the inter-phoneme likelihoods stored in the inter-phoneme likelihood storage unit 7. This word similarity is computed with Equation 1, for all phoneme sequences stored in the word dictionary storage unit 6.

Meanwhile, the phoneme sequence testing unit 8 tests whether the input phoneme sequence contains the geminate consonant /Q/. If it does not, no further processing is performed. If it does, noise may have been added to the beginning or end of the word, so the input phoneme sequence is sent to the word boundary re-determination unit 9, where the word boundaries are corrected. If the /Q/ in the input phoneme sequence is closer to the beginning of the word than to the end, the word boundary re-determination unit 9 assumes that noise may have been added to the beginning and corrects the word boundary so that the phoneme following the /Q/ becomes the beginning of the word. Conversely, if the /Q/ is closer to the end than to the beginning, noise may have been added to the end, so the word boundary is corrected so that the phoneme preceding the /Q/ becomes the end of the word. The phoneme sequence with corrected word boundaries is sent from the word boundary re-determination unit 9 to the word similarity calculation unit 5, which computes its word similarity with the phoneme sequences in the word dictionary storage unit 6. That is, when the input phoneme sequence contains /Q/, the word similarity between the boundary-corrected input phoneme sequence and the dictionary phoneme sequences is computed in addition to the word similarity between the input phoneme sequence and the dictionary phoneme sequences. The computed word similarities are sent to the word determination unit 10, which outputs as the recognized word the word corresponding to the dictionary phoneme sequence with the largest word similarity.
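The test, boundary correction, and final selection described above can be sketched as follows. This is an illustrative reading, not the circuit of the embodiment: the text does not say what happens when several /Q/ phonemes occur or when /Q/ is equidistant from both ends, so this sketch assumes the first /Q/ is used and that a tie is treated as word-final noise. The names `redetermine_boundary`, `recognize`, and `toy_similarity` are hypothetical, and `toy_similarity` is only a stand-in for Equation 1.

```python
SOKUON = "Q"  # label assumed for the geminate consonant in the phoneme sequence


def redetermine_boundary(W, sokuon=SOKUON):
    """Return the boundary-corrected sequence, or None if W contains no /Q/."""
    if sokuon not in W:
        return None  # no sign of added noise; nothing to correct
    pos = W.index(sokuon)            # assumption: use the first /Q/
    dist_to_start = pos
    dist_to_end = len(W) - 1 - pos
    if dist_to_start < dist_to_end:
        return W[pos + 1:]           # noise at the start: word begins after /Q/
    return W[:pos]                   # noise at the end: word ends before /Q/


def recognize(W, dictionary, similarity):
    """Score W and, if present, its corrected version; return the best word."""
    candidates = [W]
    corrected = redetermine_boundary(W)
    if corrected is not None:
        candidates.append(corrected)
    best_word, best_score = None, float("-inf")
    for word, D in dictionary.items():
        for candidate in candidates:
            score = similarity(D, candidate)
            if score > best_score:
                best_word, best_score = word, score
    return best_word


def toy_similarity(D, W):
    """Hypothetical stand-in for Equation 1: penalize mismatches and length gaps."""
    mismatches = sum(d != w for d, w in zip(D, W))
    return -(mismatches + abs(len(D) - len(W)))


ICHIKAWA = ["i", "c", "i", "k", "a", "w", "a"]
ASAKUSA = ["a", "s", "a", "k", "u", "s", "a"]
DICTIONARY = {"ichikawa": ICHIKAWA, "asakusa": ASAKUSA}

noisy_start = ["h", "a", "Q"] + ICHIKAWA
noisy_end = ICHIKAWA + ["Q", "h", "a"]
print(redetermine_boundary(noisy_start))
print(redetermine_boundary(noisy_end))
print(recognize(noisy_start, DICTIONARY, toy_similarity))
```

Because the uncorrected sequence is always scored as well, a /Q/ that genuinely belongs to the word does not force a wrong result: the trimmed candidate simply fails to win the maximum.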

According to this embodiment, even when noise such as breathing sounds, throat clearing, or meaningless utterances such as "um" is added to the beginning or end of the speech, the word boundary re-determination unit 9 removes the noise, so a correct recognition result can be obtained. This exploits the fact that a silent interval often exists between such noise and the speech, and that the phoneme recognition result for this silent interval is the geminate consonant /Q/.

An evaluation experiment was carried out with a speech recognition apparatus incorporating the word recognition device of this embodiment, using 274 words uttered by a total of 40 male and female speakers. The device worked effectively even when noise of the kind described above was added, and a good result was obtained: an average word recognition rate of 95.6%.

[Effects of the invention]

As described above, the present invention is a word recognition device comprising: a word dictionary storage unit storing the phoneme sequences of the words to be recognized; an inter-phoneme likelihood storage unit storing the likelihoods between phonemes; a phoneme sequence testing unit that determines whether noise may have been added to the input phoneme sequence; a word boundary re-determination unit that corrects the word boundaries and generates a corrected input phoneme sequence; a word similarity calculation unit that computes the word similarity between the input phoneme sequence, or the input phoneme sequence and the corrected input phoneme sequence, and the phoneme sequences stored in the word dictionary storage unit; and a word determination unit that selects the largest of the computed word similarities and outputs, as the recognition result, the word corresponding to the dictionary phoneme sequence used in computing that word similarity. The invention exploits the fact that, when noise such as breathing sounds, throat clearing, or meaningless utterances such as "um" is added to the beginning or end of the speech, a silent interval often exists between the noise and the speech. The phoneme sequence testing unit determines whether noise may have been added to the input phoneme sequence. If so, the word boundary re-determination unit corrects the word boundaries, and word similarities are computed for two input phoneme sequences, the input phoneme sequence and the boundary-corrected phoneme sequence. This has the advantage that correct recognition results can be obtained even for speech with added noise.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the functional configuration of a conventional word recognition device, FIG. 2 shows an example of a phoneme sequence when noise is added to the beginning of a word, FIG. 3 shows an example of an incorrect correspondence between input phonemes and dictionary phonemes, and FIG. 4 is a functional block diagram of a word recognition device according to an embodiment of the present invention.

5: word similarity calculation unit; 6: word dictionary storage unit; 7: inter-phoneme likelihood storage unit; 8: phoneme sequence testing unit; 9: word boundary re-determination unit; 10: word determination unit.

Claims

[Claims]

A word recognition device comprising: a word dictionary storage unit that stores the phoneme sequences of the words to be recognized; an inter-phoneme likelihood storage unit that stores the likelihoods between phonemes; a word boundary re-determination unit that, when noise has been added, corrects the word boundaries and generates a corrected input phoneme sequence; a word similarity calculation unit that calculates, using the inter-phoneme likelihoods stored in the inter-phoneme likelihood storage unit, the word similarity between the input phoneme sequence, or the input phoneme sequence and the corrected input phoneme sequence, and the phoneme sequences stored in the word dictionary storage unit; and a word determination unit that selects the largest of the word similarities calculated by the word similarity calculation unit and outputs, as the recognition result, the word corresponding to the phoneme sequence in the word dictionary storage unit used to calculate that word similarity.