JPH0558553B2

Info

Publication number: JPH0558553B2
Application number: JP61192431A
Authority: JP
Inventors: Takayuki Fujimoto
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-08-18
Filing date: 1986-08-18
Publication date: 1993-08-26
Also published as: JPS6348598A

Description

[Detailed Description of the Invention]

[Table of Contents] Summary; Industrial application field; Conventional technology; Problem that the invention seeks to solve; Means for solving problems; Operation; Example; Effect of the invention

[Summary]

The invention concerns a word speech recognition method that holds registered word speech patterns in advance and, when recognizing input speech, computes the inter-pattern distance between the speech pattern of the unknown input word and each registered word speech pattern and takes the word whose registered pattern gives the minimum distance as the recognition result. (1) For registering the word speech patterns, a storage unit that stores the section detection method for each word or monosyllable is provided, and the section detection method is switched for each word or monosyllable to be registered according to the stored content. (2) For recognition, the output of the section detection units for the unknown input speech is selected for each word or monosyllable read from the registered pattern group to be matched, and is then matched and recognized.

[Industrial application field]

The present invention relates to a word speech recognition method that holds registered word speech patterns in advance and, when recognizing input speech, computes the inter-pattern distance between the speech pattern of the unknown input word and each registered word speech pattern and takes the word whose registered pattern gives the minimum distance as the recognition result. In particular, it relates to a method for registering the word speech patterns or a method for recognizing unknown input speech.

In general, when Japanese is spoken, there is a phenomenon in syllables (morae) such as (ki), (ku), (shi), (su), (chi), (tsu), (hi), (fu), (pi), (pu), and (shu), where a consonant (k), (g), (t), (h), or (p) is followed by the vowel (i) or (u): only the mouth shape of the vowel is formed, and the (i) or (u) is not actually voiced. This is called "vowel devoicing" (see the "Japanese Vocal Accent Dictionary", edited by NHK). When such an easily devoiced syllable is at the beginning or end of a word, the speech section detection performed in ordinary speech recognition often drops the devoiced portion. As a result, a speech pattern lacking the devoiced portion may be registered as the standard speech pattern.

In addition, with recent advances in computer technology, document processing systems, so-called word processors, have become widespread. Ordinary word processors, however, process words or monosyllables entered by key, which makes them awkward to operate, and voice word processors have recently begun to be put into practical use. For these, the problem is how to improve the quality of the registered monosyllables.

A speech recognition device compares the standard speech patterns registered in this way with an unknown input speech pattern and outputs the most similar pattern, specifically the standard speech pattern with the minimum inter-pattern distance, as the recognition result. Recognition in such a device also performs section detection as preprocessing, after the feature parameters of the unknown input speech are extracted, so the same problem arises as in registering the standard speech patterns.

Under these circumstances, there has been a demand for a section detection method that allows high-quality standard speech patterns to be registered and high-quality speech to be extracted when unknown input speech is recognized.

[Conventional technology]

Fig. 7 illustrates a conventional method for registering standard speech patterns and recognizing unknown input speech.

First, the registration speech input from a microphone is passed to parameter extraction unit 1, which extracts recognition parameters representing the characteristics of the speech pattern.

Known methods for extracting these recognition parameters include, for example, BPF analysis using a bank of band-pass filters and linear predictive analysis (LPC).

In BPF analysis, the speech input from the microphone is sampled at a specific sampling period (for example, 18 ms), and the digitized spectral intensity of the speech energy is used as the recognition parameter.
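As a rough illustration of this kind of frame-wise spectral parameter, the sketch below cuts a sampled signal into 18 ms frames and measures the energy near a few center frequencies. It is not the filter bank of the patent: the function name `frame_band_energies`, the center frequencies, and the use of a single-frequency correlation in place of real band-pass filters are all assumptions made for illustration.

```python
import math

def frame_band_energies(samples, sample_rate, frame_ms=18.0,
                        centers_hz=(250, 500, 1000, 2000)):
    """Return one list of band energies per non-overlapping frame.

    Each "band" is approximated by correlating the frame with a sinusoid
    at the band's center frequency (an assumed stand-in for a band-pass filter).
    """
    frame_len = int(sample_rate * frame_ms / 1000.0)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies = []
        for f in centers_hz:
            w = 2.0 * math.pi * f / sample_rate
            re = sum(x * math.cos(w * n) for n, x in enumerate(frame))
            im = sum(x * math.sin(w * n) for n, x in enumerate(frame))
            energies.append((re * re + im * im) / frame_len)
        frames.append(energies)
    return frames
```

Fed a pure 1 kHz tone, each frame should show its largest energy in the 1000 Hz band; a real system would quantize these values before using them as recognition parameters.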

From the extracted recognition parameters, the following section detection unit 2 detects the section in which speech is truly present, and the recognition parameters of that section are used as the data for pattern matching.

Specifically, for example, the speech power is calculated from the recognition parameters of the input speech produced by parameter extraction unit 1 and checked against a specific threshold, and the portion exceeding the threshold is taken as the section in which speech is present.
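The threshold check described above can be sketched as follows. This is a minimal example that assumes the speech power has already been computed per frame in decibels; the function name `speech_sections` and the sample values are hypothetical.

```python
def speech_sections(frame_powers_db, threshold_db):
    """Return (start, end) frame index pairs, inclusive, for every run of
    frames whose power exceeds the threshold."""
    sections = []
    start = None
    for i, power in enumerate(frame_powers_db):
        if power > threshold_db and start is None:
            start = i                      # a run of speech frames begins
        elif power <= threshold_db and start is not None:
            sections.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:                  # run continues to the last frame
        sections.append((start, len(frame_powers_db) - 1))
    return sections
```

With a single fixed threshold, a weak word ending that falls below the threshold is simply left out of the detected section, which is the dropout problem discussed in this document.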

There are various ways to improve the detection accuracy, such as combining the speech power threshold with other parameters like zero-crossings or autocorrelation. Basically, however, once a section detection method had been chosen, section detection was carried out by that same method without changing it along the way.
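For readers unfamiliar with the zero-crossing measure, the sketch below counts sign changes within one frame and combines the count with the power check. The combination rule (a frame counts as speech if either measure exceeds its threshold) and all names and threshold values are assumptions for illustration, not taken from the patent.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples of one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def frame_is_speech(power_db, crossings, power_threshold_db, crossing_threshold):
    """Assumed rule: speech if the power OR the zero-crossing count is high."""
    return power_db > power_threshold_db or crossings > crossing_threshold
```

Unvoiced, noise-like sounds tend to have low power but many zero-crossings, so the second test can keep frames that the power test alone would discard.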

The section detection result was then registered in speech dictionary (registered pattern group) 6 as a standard speech pattern without any check.

Next, to recognize an unknown spoken word, the speech pattern of the unknown input word, extracted by the same method, is compared with the standard speech patterns read out one after another from the previously registered speech dictionary (registered pattern group) 6, for example by pattern matching. The distance between the two is calculated, and the standard speech pattern with the minimum distance is detected and taken as the recognition result.
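The minimum-distance rule can be sketched as below. The patent only says "pattern matching"; the dynamic time warping distance used here is one common choice and is an assumption, as are the function names and the toy two-dimensional feature vectors.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def recognize(unknown, dictionary):
    """Return the registered word whose pattern is closest to the unknown input."""
    return min(dictionary, key=lambda word: dtw_distance(unknown, dictionary[word]))
```

For example, with `dictionary = {"aomori": [[0, 0], [1, 1], [2, 2]], "aichi": [[5, 5], [4, 4], [3, 3]]}`, the input `[[0, 0], [1, 1], [1, 1], [2, 2]]` is matched to "aomori" because the time warping absorbs the repeated frame.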

As another conventional approach to recognizing unknown input speech, it has been proposed to hold several thresholds in section detection unit 2 in advance, obtain several sections, match each of them against all of the standard speech patterns, and select the closest combination as the recognition result.

With this approach, however, every registered pattern must be matched against the unknown-input speech patterns for all of the thresholds, so the amount of processing increases greatly. In addition, the multiple speech patterns include many erroneous ones, which causes misrecognition.
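The cost of this exhaustive approach can be made concrete with a small sketch. The names, the scalar "patterns", and the absolute-difference distance are placeholders chosen only to keep the example short.

```python
def exhaustive_match(candidates, dictionary, distance):
    """Match every candidate section against every registered pattern.

    Returns ((best_distance, word, candidate_id), number_of_comparisons).
    """
    comparisons = 0
    best = None
    for cand_id, cand in candidates.items():
        for word, reference in dictionary.items():
            comparisons += 1
            d = distance(cand, reference)
            if best is None or d < best[0]:
                best = (d, word, cand_id)
    return best, comparisons
```

With three thresholds and two registered words this performs 3 x 2 = 6 comparisons, and the count grows in proportion to both numbers; every wrongly detected candidate section also gets a chance to land close to some wrong word.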

[Problem that the invention seeks to solve]

Section detection by such a uniform conventional method suffers from both dropout of parts of the speech section and addition of noise. The two trade off against each other: reducing dropouts increases noise addition, and reducing noise addition increases dropouts, so high-quality section detection is hard to achieve.

Moreover, for a word whose registered pattern suffered a dropout or addition during section detection, the same dropout or addition does not normally occur at recognition time, so most utterances of that word are rejected or misrecognized.

In view of these drawbacks, the present invention aims to provide a method that improves the quality of the speech extracted by section detection. Because the portions likely to be dropped in section detection can be known in advance from devoicing rules and the like, the section detection method is changed for each word or monosyllable in accordance with those rules.

[Means for solving problems]

The above object is achieved as follows. For registering speech patterns, the word speech recognition device of the invention comprises: a parameter extraction unit that extracts recognition parameters representing the characteristics of the input speech pattern; a plurality of section detection units, each with a speech power threshold and/or zero-crossing threshold determined per word or per monosyllable, for detecting the recognition parameters of the section in which true speech exists, that is, the portion of the recognition parameters from the parameter extraction unit that exceeds a specific threshold; a section detection storage unit that stores the relationship between each word or monosyllable and the section detection unit having the corresponding detection function; and a switching unit that, under control of the section detection storage unit, selects one of the outputs of the plurality of section detection units as the registered pattern.

For recognizing unknown speech patterns, the device comprises: a parameter extraction unit that extracts recognition parameters representing the characteristics of the unknown input speech pattern; a plurality of section detection units, each with a set speech power threshold and/or zero-crossing threshold, for detecting the recognition parameters of the section in which true speech exists, that is, the portion of the recognition parameters from the parameter extraction unit that exceeds a specific threshold; a plurality of input pattern memories that store the outputs of the respective section detection units; a registered pattern group that stores the registered word speech patterns; a selection unit that selects the output of an input pattern memory for each word or monosyllable to be matched that is read from the registered pattern group; and a matching unit that matches the registered pattern from the registered pattern group against the output of the selection unit and takes the registered word or monosyllable giving the minimum inter-pattern distance as the recognition result.

[Operation]

That is, the present invention applies to a word speech recognition method that holds registered word speech patterns in advance and, when recognizing input speech, computes the inter-pattern distance between the speech pattern of the unknown input word and each registered word speech pattern and takes the word whose registered pattern gives the minimum distance as the recognition result. (1) For registering the word speech patterns, a storage unit that stores the section detection method for each word or monosyllable is provided, and the section detection method is switched for each word or monosyllable to be registered according to the stored content. (2) The output of the section detection units for the unknown input speech is selected for each word or monosyllable read from the registered pattern group to be matched, and is then matched and recognized. As a result, section detection errors decrease, the quality of the standard speech patterns improves, and the recognition performance for unknown input speech improves.

[Example]

An embodiment of the present invention is described in detail below with reference to the drawings. Fig. 1 shows an example configuration for speech pattern registration according to the invention, and Fig. 2 shows an example configuration for recognizing unknown input speech according to the invention. In these figures, section detection unit 2' and the related mechanisms (switching unit 3a, selection unit 3b, and so on) are the means needed to carry out the invention.

Fig. 3 shows the concept of the registration method of the invention: (a) shows the case of words and (b) the case of monosyllables. Fig. 4 illustrates the error rate when the registration method of the invention is used. Fig. 5 shows the concept of the recognition method of the invention: (a) shows the section detection method and (b) the method of matching against the registered speech patterns. Fig. 6 illustrates the recognition rate when the recognition method of the invention is used.

The word speech recognition method of the invention is described below using Figs. 3 to 6, with reference to Figs. 1 and 2.

First, the method for registering standard speech patterns is described.

Suppose, for example, that the words to be recognized include "aomori" (青森) and "aichi" (愛知). When the word speech patterns are registered, the speech power threshold used for section detection of "aomori" is raised 3 dB above the normal value so that noise is less likely to be added (see "section 1 of A" in Fig. 3(a)). For "aichi", the speech power threshold used for section detection is lowered 3 dB below the normal value at the end of the word only, so that dropouts are less likely. Alternatively, section detection at the end of the word uses both the speech power threshold and a zero-crossing threshold (see "section B" in Fig. 3(a)), so that the section is detected as the section 2 (A+B) pattern.
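The per-word threshold adjustment can be sketched as follows. The 3 dB offsets come from the text; the base threshold of 30 dB, the table layout, the function name, and the frame powers are assumptions made for illustration, and the zero-crossing variant is left out.

```python
BASE_THRESHOLD_DB = 30.0  # assumed "normal" threshold, not given in the source

# Offsets relative to the normal threshold, applied at the start and end of the word.
METHODS = {
    "normal": {"start_offset_db": 0.0, "end_offset_db": 0.0},
    "aomori": {"start_offset_db": 3.0, "end_offset_db": 3.0},   # resist added noise
    "aichi":  {"start_offset_db": 0.0, "end_offset_db": -3.0},  # keep devoiced ending
}

def detect_section(powers_db, method, base_db=BASE_THRESHOLD_DB):
    """Return (start, end) frame indices, inclusive, or None if no speech is found."""
    start_thr = base_db + method["start_offset_db"]
    end_thr = base_db + method["end_offset_db"]
    start = next((i for i, p in enumerate(powers_db) if p > start_thr), None)
    if start is None:
        return None
    end = max((i for i, p in enumerate(powers_db) if i >= start and p > end_thr),
              default=start)
    return start, end
```

For frame powers `[20, 32, 40, 41, 39, 29, 28, 20]`, the normal setting gives `(1, 4)`, the "aichi" setting extends the end to frame 6 so the weak final frames are kept, and the "aomori" setting gives `(2, 4)`, trimming the marginal first frame.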

This means that section detection exploits the following characteristics of the spoken word "aichi" (愛知): its ending is devoiced in most cases, so the speech power at the end of the word is low, but the high-frequency components are relatively strong and the zero-crossing count is high.

Fig. 3(b) plots the changes over time in speech power and zero-crossings for the monosyllables "su" (す) and "a" (ア) on the same axis. It shows that, for monosyllables too, the speech power and zero-crossing characteristics differ from one monosyllable to another.

Fig. 4 illustrates the error rate when speech patterns are registered by the procedure of the invention. It shows that section detection by the speech power threshold alone drops many devoiced word endings; that section detection by the speech power threshold together with zero-crossings detects many devoiced endings correctly but increases the error rate due to noise addition; and that the error rate is lowest when the section detection method is changed individually.

Thus, the configuration for speech pattern registration in the word speech recognition device of the invention is characterized as follows. Noting that the speech power and zero-crossing characteristics differ for each word or monosyllable, the section detection method for each word or monosyllable is stored in section detection storage unit 3, so that the section detection best suited to extracting that word or monosyllable is performed, and switching unit 3a selects among section detection units (1 to n) 2' according to the input speech (see Fig. 1).

Next, the configuration for recognizing unknown input speech in the word speech recognition device of the invention is described.
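Before turning to recognition, the registration side (Fig. 1) can be summarized in a short sketch: all section detection units process the input, and the stored word-to-unit relationship decides which output is registered. The data layout (frames as `(power_db, feature)` pairs), the function names, and the thresholds are hypothetical.

```python
def make_power_detector(threshold_db):
    """Build a section detection unit that keeps the span from the first to the
    last frame whose power exceeds the given threshold."""
    def detect(frames):
        idx = [i for i, (power, _) in enumerate(frames) if power > threshold_db]
        if not idx:
            return []
        return [feature for _, feature in frames[idx[0]:idx[-1] + 1]]
    return detect

def register_word(word, frames, detectors, method_store, registered_patterns):
    """Run every detection unit, then let the stored relationship pick one output."""
    outputs = {det_id: detect(frames) for det_id, detect in detectors.items()}
    chosen = method_store[word]                 # role of section detection storage unit 3
    registered_patterns[word] = (chosen, outputs[chosen])  # role of switching unit 3a
    return registered_patterns[word]
```

Storing the chosen unit number alongside the pattern is a design choice of this sketch; it lets the recognition side later pick the input pattern memory that was produced by the same detection method.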

Suppose, for example, that the words to be recognized include "aomori" (青森) and "aichi" (愛知). The speech power threshold used for section detection of "aomori" is raised 3 dB above the normal value so that noise is less likely to be added (see "section 1 of A" in Fig. 5(a)). For "aichi", the speech power threshold used for section detection is lowered 3 dB below the normal value at the end of the word only, so that dropouts are less likely. Alternatively, section detection at the end of the word uses both the speech power threshold and a zero-crossing threshold (see "section B" in Fig. 5(a)), so that the section is detected as the section 2 (A+B) pattern.

In this way, the section detection method for each word or monosyllable is determined beforehand from known devoicing rules and the like (see section detection units (1 to n) 2' in Fig. 2). When unknown speech is input, each of section detection units (1 to n) 2' performs section detection, and the recognition parameters of each detected speech section are stored in input pattern memories (1 to n) 2''.

Meanwhile, control unit 7 reads out the standard speech patterns registered in registered pattern group 6 one word at a time, has selection unit 3b select the corresponding input pattern memory (1 to n) 2'', and has matching unit 4 compute the inter-pattern distance to that standard speech pattern. The standard speech pattern with the smallest inter-pattern distance is output as the recognition result (see Fig. 2).

Fig. 5(b) schematically shows this recognition process for the case of recognizing the spoken word "aichi" (愛知).

In one of the section detection results shown in the figure, the final "chi" is dropped; in another, noise is added at the beginning of the word and the final "chi" is also dropped.

Accordingly, when the inter-pattern distances to registered pattern groups A, B, and C read from registered pattern group 6 are calculated, the distance is smallest within registered pattern group B, as shown in the figure. By selecting, from the words registered in group B, the one with the minimum inter-pattern distance, the spoken word "aichi" (愛知) is recognized correctly.
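The recognition flow of Fig. 2 described above can be sketched as follows: every section detection unit fills its own input pattern memory, and each registered word is compared only with the memory that matches its stored detection method. The distance function, thresholds, and data values are placeholders chosen for illustration and are not taken from the patent.

```python
def make_detector(threshold_db):
    """Section detection unit: span from the first to the last frame above threshold."""
    def detect(powers_db):
        idx = [i for i, p in enumerate(powers_db) if p > threshold_db]
        return powers_db[idx[0]:idx[-1] + 1] if idx else []
    return detect

def pattern_distance(a, b):
    """Placeholder distance: frame-wise difference plus a length-mismatch penalty."""
    return sum(abs(x - y) for x, y in zip(a, b)) + 10.0 * abs(len(a) - len(b))

def recognize(powers_db, detectors, registered):
    """registered maps word -> (detector id, reference pattern)."""
    # One input pattern memory per section detection unit.
    memories = {det_id: detect(powers_db) for det_id, detect in detectors.items()}
    best_word, best_dist = None, float("inf")
    for word, (det_id, reference) in registered.items():
        # Selection unit: use only the memory for this word's detection method.
        dist = pattern_distance(memories[det_id], reference)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, best_dist
```

Each registered pattern is compared exactly once, so the number of comparisons equals the number of registered words rather than the number of words times the number of detection units.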

Fig. 6 illustrates the recognition rate when unknown input speech is recognized according to the invention. It shows error rates similar to those for registration in Fig. 4, and the recognition rate is highest when the section detection method is changed individually for each word (+).

As is clear from Fig. 5(b), this "+" section detection also yields many erroneous sections. In matching, however, these almost always give large inter-pattern distances, so they actually cause fewer misrecognitions.

Whether standard speech patterns are being registered or unknown input speech is being recognized, the conditions for selecting the section detection method may of course be set manually in advance for each word or monosyllable, or generated automatically from the devoicing rules mentioned above.

In the example above, the devoicing rules show that the ending of "aichi" (愛知) is devoiced, so the section detection described above is specified for the end of that word.
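Automatic generation from a devoicing rule might look like the sketch below. It is a deliberately simplified assumption: it looks only at the romanized ending of the word, uses the syllables listed earlier in this document, and returns hypothetical method labels.

```python
# Syllables listed in this document as prone to vowel devoicing.
DEVOICED_SYLLABLES = ("ki", "ku", "shi", "su", "chi", "tsu",
                      "hi", "fu", "pi", "pu", "shu")

def choose_method(romaji_word):
    """Pick a section detection method label from the word's final syllable."""
    if romaji_word.endswith(DEVOICED_SYLLABLES):
        return "relaxed_end"   # e.g. lower the end threshold or add zero-crossings
    return "strict"            # e.g. raise the threshold to resist noise
```

Here `choose_method("aichi")` returns `"relaxed_end"` and `choose_method("aomori")` returns `"strict"`; a fuller rule set would also examine the beginning of the word and the surrounding sounds.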

Even with the section detection method of the invention, the problem of noise addition remains. However, words with devoiced endings such as "aichi" (愛知) usually make up only about 10 to 20% of all words. For the other 80 to 90%, section detection that resists noise addition is used, as with "aomori" (青森), so the noise addition problem becomes far smaller.

If the words to be recognized are fixed from the start and change little or not at all, the section detection methods may be set manually in advance; if they change often, it is better to generate them automatically using the devoicing rules mentioned above. When monosyllables are used as the registered standard speech patterns, presetting is of course acceptable.

[Effect of the invention]

As described in detail above, the word speech recognition device of the invention applies to a word speech recognition method that holds registered word speech patterns in advance and, when recognizing input speech, computes the inter-pattern distance between the speech pattern of the unknown input word and each registered word speech pattern and takes the word whose registered pattern gives the minimum distance as the recognition result. (1) For registering the word speech patterns, a storage unit that stores the section detection method for each word or monosyllable is provided, and the section detection method is switched for each word or monosyllable to be registered according to the stored content. (2) The output of the section detection units for the unknown input speech is selected for each word or monosyllable read from the registered pattern group to be matched, and is then matched and recognized. This reduces section detection errors, improves the quality of the standard speech patterns, and improves the recognition performance for unknown input speech.

[Brief explanation of drawings]

Fig. 1 shows an example configuration for speech pattern registration in the word speech recognition device of the invention. Fig. 2 shows an example configuration for recognizing unknown input speech in the word speech recognition device of the invention. Fig. 3 shows the concept of the registration method of the invention. Fig. 4 illustrates the error rate when the registration method of the invention is used. Fig. 5 shows the concept of the recognition method of the invention. Fig. 6 illustrates the recognition rate when the recognition method of the invention is used. Fig. 7 illustrates a conventional method for registering standard speech patterns and recognizing unknown input speech.

In the drawings, 1 is the parameter extraction unit, 2 is the section detection unit, 2' is section detection units 1 to n, 2'' is the input pattern memories, 3 is the section detection storage unit, 3a is the switching unit, 3b is the selection unit, 5 is the matching unit, 6 is the registered pattern group, 8 is the control unit, A is section 1, A+B is section 2, and ~ denotes the section detection methods.

Claims

[Scope of Claims]

1. A word speech recognition device comprising: a parameter extraction unit that extracts recognition parameters representing the characteristics of an input speech pattern; a plurality of section detection units, each with a speech power threshold and/or zero-crossing threshold determined per word or per monosyllable, for detecting the recognition parameters of the section in which true speech exists, which is the portion of the recognition parameters from the parameter extraction unit that exceeds a specific threshold; a section detection storage unit that stores the relationship between each word or monosyllable and the section detection unit having the corresponding detection function; and a switching unit that, under control of the section detection storage unit, selects one of the outputs of the plurality of section detection units as the registered pattern.

2. A word speech recognition device comprising: a parameter extraction unit that extracts recognition parameters representing the characteristics of an unknown input speech pattern; a plurality of section detection units, each with a set speech power threshold and/or zero-crossing threshold, for detecting the recognition parameters of the section in which true speech exists, which is the portion of the recognition parameters from the parameter extraction unit that exceeds a specific threshold; a plurality of input pattern memories that store the outputs of the respective section detection units; a registered pattern group that stores registered word speech patterns; a selection unit that selects the output of an input pattern memory for each word or monosyllable to be matched that is read from the registered pattern group; and a matching unit that matches the registered pattern from the registered pattern group against the output of the selection unit and takes the registered word or monosyllable giving the minimum inter-pattern distance as the recognition result.