JPS58159595A - Monosyllabic voice recognition system

Info

Publication number: JPS58159595A
Application number: JP57030033A
Authority: JP
Inventors: 教幸藤本; 佐藤　泰雄; 大山　隆之
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-02-26
Filing date: 1982-02-26
Publication date: 1983-09-21

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention] (a) Technical field of the invention: The present invention relates to a monosyllabic speech recognition method for recognizing speech.

(b) Background of the technology: As speech recognition technology has improved in recent years, there has been demand for speech recognition devices that make few recognition errors when recognizing a speaker's voice. In the usual speech recognition method, the speaker's monosyllabic speech is converted into feature parameters and stored in advance. The feature parameters of an unknown input monosyllabic speech are then matched against the stored feature parameters, and the most similar entry is recognized as the corresponding monosyllabic speech. However, even for the same monosyllabic speech the feature parameters change with the manner of utterance, and even if the same monosyllabic speech is registered several times with different ways of uttering it, it is difficult to reduce errors to zero. In particular, for monosyllabic speech whose feature parameters are prone to recognition errors, the recognition rate cannot be improved unless the matching method itself is taken into consideration. For this reason, a re-matching method has been proposed to improve the recognition rate: after all previously registered monosyllabic speech has been matched against the speaker's monosyllabic speech, a plurality of re-matching candidates are selected from the registered monosyllabic speech based on the matching results, in order starting from the one most similar to the unknown input monosyllabic speech, and the unknown input monosyllabic speech is re-matched against those candidates using a re-matching parameter determined by the combination of the candidates. However, this re-matching method leaves room for improvement, and a remedy is desired.

(c) Object of the invention: In response to the above demand, the object of the present invention is, in a monosyllabic speech recognition method of the above re-matching type, to narrow down the number of re-matching candidates so as to shorten the time required for re-matching, and to simplify the configuration of the device so as to improve economy.

(d) Structure of the invention: In the present invention, monosyllabic speech is registered in advance. The feature parameters of the unknown input monosyllabic speech are DP-matched against the feature parameters of all the registered monosyllabic speech, and a plurality of re-matching candidates are selected from the registered monosyllabic speech in descending order of similarity. The unknown input monosyllabic speech is then re-matched against these candidates using a re-matching parameter determined by the combination of the candidates, and the candidate found to be most similar is recognized as the corresponding monosyllabic speech. When the re-matching candidates are selected, however, if the similarity between the first-ranked candidate from DP matching and the unknown input monosyllabic speech is equal to or greater than a threshold determined by that first-ranked candidate, the re-matching step is omitted and the first-ranked candidate is output as the recognition result. This shortens the monosyllabic speech recognition time and simplifies the re-matching circuit.
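The control flow described in (d) can be pictured with the following minimal Python sketch. It is an illustration only, not the circuit of the embodiment: the function name `recognize`, the callable parameters `similarity` and `rematch`, the dictionary `skip_threshold`, and the default of three candidates are all assumptions introduced here, and the source does not specify how similarity is scaled.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed layout for illustration: a list of frames, each a list of channel values.
Features = Sequence[Sequence[float]]


def recognize(
    unknown: Features,
    templates: Dict[str, Features],
    similarity: Callable[[Features, Features], float],
    skip_threshold: Dict[str, float],
    rematch: Callable[[Features, List[str]], str],
    n_candidates: int = 3,
) -> Tuple[str, bool]:
    """Return (recognized label, True if the re-matching stage was run)."""
    # First pass: score the unknown input against every registered syllable.
    scores = {
        label: similarity(unknown, feats) for label, feats in templates.items()
    }
    # Keep the most similar entries as re-matching candidates.
    ranked = sorted(scores, key=lambda label: scores[label], reverse=True)
    candidates = ranked[:n_candidates]
    best = candidates[0]
    # Early exit: the threshold is looked up by the first-ranked candidate.
    if scores[best] >= skip_threshold[best]:
        return best, False
    # Otherwise fall back to the slower re-matching stage (hypothetical callable).
    return rematch(unknown, candidates), True
```

Keeping the threshold in a per-label table mirrors the statement that the threshold is determined by the first-ranked candidate, so syllables that are easily confused can be given a stricter value than syllables that are not.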

(e) Embodiment of the invention: The figure is a block diagram of a circuit showing one embodiment of the present invention.

First, to register monosyllabic speech in advance, the speaker connects the switching unit 3 to the parameter storage unit 4 under the control of the control unit 8 and applies monosyllabic speech to the input.

The preprocessing unit 1 performs speech level adjustment, analog-to-digital conversion and the like, and sends the result to the parameter extraction unit 2. The parameter extraction unit 2 extracts the feature parameters of the monosyllabic speech and stores them in the parameter storage unit 4. Next, to have monosyllabic speech recognized, the speaker connects the switching unit 3 to the storage unit 5 under the control of the control unit 8 and utters a monosyllabic speech. By the same operation as above, the feature parameters of the unknown input monosyllabic speech pass through the preprocessing unit 1, the parameter extraction unit 2, and the switching unit 3 into the storage unit 5. Under the control of the control unit 8, the matching unit 6 DP-matches them against the feature parameters of all the monosyllabic speech stored in the parameter storage unit 4. The monosyllabic speech whose feature parameters are most similar is selected as the first-ranked re-matching candidate, further re-matching candidates are then selected in order, and the candidates are sent to the determination unit 7.
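As an illustration of the DP matching performed in the matching unit 6, the sketch below computes a dynamic-programming alignment distance (the technique commonly known as dynamic time warping) and ranks the registered entries by it. The source does not state the frame distance, the allowed alignment steps, or the number of candidates, so the Euclidean frame distance, the three-way step pattern, and the names `dp_distance` and `rank_candidates` are assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per analysis frame (assumed)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two frames (an assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dp_distance(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Accumulated distance along the best time alignment of x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] is the best accumulated distance aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(x[i - 1], y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # x advances, y frame repeats
                cost[i][j - 1],      # y advances, x frame repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def rank_candidates(
    unknown: Sequence[Frame],
    templates: Dict[str, Sequence[Frame]],
    n_candidates: int = 3,
) -> List[Tuple[str, float]]:
    """Return the n closest registered entries as (label, distance) pairs."""
    scored = sorted(
        (dp_distance(unknown, feats), label) for label, feats in templates.items()
    )
    return [(label, dist) for dist, label in scored[:n_candidates]]
```

Here a smaller distance corresponds to a higher similarity, so the first element of the returned list plays the role of the first-ranked re-matching candidate.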

The determination unit 7 judges the degree of similarity from the distance, calculated by the matching unit 6, between the unknown input monosyllabic speech and each re-matching candidate. If the similarity between the unknown input monosyllabic speech and the candidate selected as first-ranked by the matching unit 6 is greater than the threshold predetermined for that first-ranked candidate, so that the candidate can almost certainly be judged to be the unknown input monosyllabic speech, the candidate is sent to the output via the control unit 8 as the recognition result. If, however, the similarity is smaller than the threshold, so that judging the first-ranked candidate to be the unknown input monosyllabic speech would be risky, recognition is carried out by performing the re-matching operation.

In that case, the control unit 8 causes the feature parameters corresponding to the re-matching candidate to be sent from the parameter storage unit 4 to the multiplier 10, and the feature parameters of the unknown input monosyllabic speech held in the storage unit 5 to be sent to the multiplier 11. The determination unit 7 causes the frequency weight storage unit 12 to send to the multipliers 10 and 11 the re-matching parameter determined by the re-matching candidates, that is, a weighting that emphasizes the components of the frequency bands suited to distinguishing the candidates from one another and attenuates the components of the other frequency bands. The determination unit 7 also causes the threshold storage unit 13 to send to the re-matching unit 9 a threshold, which is a parameter that determines the optimum matching interval for the re-matching candidates. The re-matching unit 9 performs re-matching using the outputs of the multipliers 10 and 11 and the threshold from the threshold storage unit 13. The re-matching candidates are re-matched against the unknown input monosyllabic speech in order, starting from the first-ranked candidate, and the most similar candidate is sent from the control unit 8 to the output as the recognition result.

(f) Effects of the invention: As described in detail above, in a monosyllabic speech recognition method that uses re-matching, the present invention narrows down the number of re-matching candidates, which shortens the time required for re-matching, and it allows the components involved in the re-matching operation to be simplified, which improves economy. Its effect is therefore considerable.
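The role of the multipliers 10 and 11 and the frequency weights in the re-matching stage of the embodiment can be pictured with the hedged sketch below. The source does not describe the distance used by the re-matching unit 9, nor how the threshold from the threshold storage unit 13 is turned into a matching interval. This example therefore assumes a plain frame-by-frame absolute difference and takes the interval directly as a hypothetical `(start, end)` pair of frame indices. The names `apply_weights`, `weighted_distance`, and `rematch` are likewise illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]  # values per frequency band for one frame (assumed)


def apply_weights(frames: Sequence[Frame], weights: Sequence[float]) -> List[List[float]]:
    """Scale every frequency band by its weight (the job of one multiplier)."""
    return [[w * v for w, v in zip(weights, frame)] for frame in frames]


def weighted_distance(
    unknown: Sequence[Frame],
    template: Sequence[Frame],
    weights: Sequence[float],
    interval: Tuple[int, int],
) -> float:
    """Distance over the chosen frame interval after band weighting."""
    start, end = interval  # hypothetical: passed in directly, not derived
    u = apply_weights(unknown[start:end], weights)
    t = apply_weights(template[start:end], weights)
    # Assumed measure: sum of absolute band differences, frame by frame.
    return sum(abs(a - b) for fu, ft in zip(u, t) for a, b in zip(fu, ft))


def rematch(
    unknown: Sequence[Frame],
    candidates: Dict[str, Sequence[Frame]],
    weights: Sequence[float],
    interval: Tuple[int, int],
) -> str:
    """Return the candidate label closest to the unknown input after weighting."""
    return min(
        candidates,
        key=lambda label: weighted_distance(
            unknown, candidates[label], weights, interval
        ),
    )
```

With weights that emphasize the band in which two candidates actually differ, a difference that is swamped by other bands in the unweighted comparison can decide the result, which is the stated purpose of the re-matching parameter.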

[Brief explanation of drawings]

The figure is a block diagram of a circuit showing one embodiment of the present invention. 1 is a preprocessing unit, 2 is a parameter extraction unit, 3 is a switching unit, 4 is a parameter storage unit, 5 is a storage unit, 6 is a matching unit, 7 is a determination unit, 8 is a control unit, 9 is a re-matching unit, 10 and 11 are multipliers, 12 is a frequency weight storage unit, and 13 is a threshold storage unit.

Claims

[Claims]

In a speech recognition device that matches all previously registered monosyllabic speech against an unknown input monosyllabic speech, selects a plurality of re-matching candidates from the registered monosyllabic speech based on the matching results, selects a re-matching parameter for each combination of the re-matching candidates, and re-matches the candidates against the unknown input monosyllabic speech, a monosyllabic speech recognition method characterized in that re-matching is omitted when, at the time the re-matching candidates are selected, the similarity of the first-ranked re-matching candidate exceeds a threshold determined for each re-matching candidate.