JP3427500B2

JP3427500B2 - Membership calculation device and HMM device

Info

Publication number: JP3427500B2
Application number: JP18747294A
Authority: JP
Inventors: Eiichi Tsuboka (坪香 英一)
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1994-08-09
Filing date: 1994-08-09
Publication date: 2003-07-14
Anticipated expiration: 2018-07-14
Also published as: JPH0854893A

Description

Detailed Description of the Invention

【０００１】[0001]

[Industrial field of application] The present invention relates to a speech recognition device.

【０００２】[0002]

[Prior art] Hidden Markov models (HMMs) have come to be widely used in the field of speech recognition. One variant is the HMM based on fuzzy vector quantization (FVQ/HMM). IEICE Technical Report SP93-27 (June 1993) describes a multiplicative FVQ/HMM (sometimes rendered "synergistic") and an additive FVQ/HMM; the multiplicative FVQ/HMM is particularly noteworthy for its excellent performance.

[0003] FIG. 1(a) is a block diagram explaining the general principle of the FVQ/HMM.

[0004] Reference numeral 100 denotes a feature extraction unit, which converts the input speech to be recognized into a feature vector, for example every 10 msec. Cepstra and their regression coefficients, for example, are now commonly used as features.

[0005] Reference numeral 101 denotes a vector quantization unit, which converts each feature vector into a membership vector. Reference numeral 102 denotes a codebook; the vector quantization is performed on the basis of the information in this codebook.

[0006] The codebook 102 is built by clustering a set of training vectors into M clusters, labeling each cluster, and storing the representative vector of each cluster so that it can be retrieved by its label. The training vector set consists of feature vectors obtained separately by converting various utterances, recorded in advance for codebook creation, with the feature extraction unit 100 or with a feature extraction means that operates in the same way. The representative vector is usually the mean vector of the cluster.
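
The codebook described above can be sketched in Python as a table from label to mean vector. This is only an illustration under stated assumptions: the function name `build_codebook` is hypothetical, and the cluster labels 1, ..., M are assumed to have been assigned already by some clustering step that the sketch does not perform.

```python
def build_codebook(vectors, labels, num_clusters):
    """Return {label: mean vector} for clusters labeled 1..num_clusters.

    labels[n] is the (assumed, precomputed) cluster label of vectors[n].
    """
    dim = len(vectors[0])
    sums = {m: [0.0] * dim for m in range(1, num_clusters + 1)}
    counts = {m: 0 for m in range(1, num_clusters + 1)}
    for vec, m in zip(vectors, labels):
        counts[m] += 1
        for d in range(dim):
            sums[m][d] += vec[d]
    # The representative vector is the cluster mean; empty clusters are skipped.
    return {m: [s / counts[m] for s in sums[m]]
            for m in sums if counts[m] > 0}


codebook = build_codebook([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]], [1, 1, 2], 2)
```

Looking up `codebook[m]` then retrieves the representative vector of cluster m by its label.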

[0007] The membership vector is the vector whose elements are the degrees of membership of the feature vector at each time point in each of the clusters. Let y_t be the feature vector at time t, let C_1, ..., C_M be the clusters, and let u_tm be the degree of membership of y_t in C_m; the membership vector into which y_t is converted is then u_t = (u_t1, ..., u_tM)^T. Throughout this application vectors are column vectors and T denotes transposition. Various definitions of u_tm are possible. Letting μ_m be the representative vector of C_m and d_tm = [(y_t - μ_m)^T (y_t - μ_m)]^(1/2) be the Euclidean distance between y_t and μ_m, u_tm can be defined, for example, as

【０００８】[0008]

【数１】 [Equation 1]

[0009] (J. G. Bezdek: "Pattern Recognition with Fuzzy Objective Function Algorithm", Plenum Press, New York (1981)).

[0010] To reduce the amount of computation, in practice the membership is not computed for all clusters but only for the k clusters with the smallest d_tm, from the smallest to the k-th smallest. That is, the elements of the membership vector u_t take the value computed by (Equation 1) for the top k clusters by membership (the k nearest neighbors) and are set to 0 for all others.

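The membership computation with k-nearest-neighbor truncation might look as follows in Python. Because (Equation 1) is not reproduced in this text, the sketch assumes the standard fuzzy c-means membership from the fuzzy-clustering literature; the fuzziness exponent `fuzziness`, the choice to normalize over only the k retained clusters, and all function names are assumptions for illustration.

```python
import math


def euclidean(y, mu):
    """Euclidean distance d_tm between a feature vector and a centroid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, mu)))


def membership_vector(y, centroids, fuzziness=2.0, k=None):
    """Return u_t with nonzero entries only for the k nearest clusters."""
    dists = [euclidean(y, mu) for mu in centroids]
    order = sorted(range(len(centroids)), key=lambda m: dists[m])
    kept = order if k is None else order[:k]
    u = [0.0] * len(centroids)
    # If y coincides with one or more centroids, share the membership equally.
    exact = [m for m in kept if dists[m] == 0.0]
    if exact:
        for m in exact:
            u[m] = 1.0 / len(exact)
        return u
    p = 2.0 / (fuzziness - 1.0)
    for m in kept:
        # Assumed fuzzy c-means form: 1 / sum_n (d_tm / d_tn)^(2/(F-1)).
        u[m] = 1.0 / sum((dists[m] / dists[n]) ** p for n in kept)
    return u


u_t = membership_vector([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0], [5.0, 5.0]], k=2)
```

With this normalization the retained memberships sum to 1, and the third (farthest) cluster receives exactly 0.
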
[0011] Reference numeral 103 denotes an HMM storage unit, which stores an HMM for each recognition unit w = 1, ..., W, such as the words or syllables to be recognized. Reference numeral 104 denotes a likelihood calculation unit, which computes, from the membership vector sequence obtained at the output of the vector quantization unit, the likelihood of each HMM for the input speech, that is, the degree to which the feature vector sequence y_1, ..., y_T is generated by each HMM.

[0012] Reference numeral 105 denotes a decision unit which, letting L^w be the degree to which y_1, ..., y_T is generated by HMM w, computes

【００１３】[0013]

【数２】 [Equation 2]

[0014] and takes w^* as the recognition result. FIG. 1(b) illustrates the principle of the HMM. q_1, ..., q_{J+1} are the states, a_ij is the transition probability from state i to state j, and ω_i(y_t) is the degree of occurrence of y_t in state i. With these symbols, the degree L to which the feature vector sequence y_1, ..., y_T is generated by this HMM is given by (Equation 3). Here X = (x_1, x_2, ..., x_{T+1} = q_{J+1}) is a state sequence; a final state J+1 is assumed, which is reached at time T+1 and in which no vector is generated. Further, π_i is the probability of being in state i at t = 1, and λ is the parameter set of the HMM.

【００１５】[0015]

【数３】 [Equation 3]
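
As an illustration of how the degree L can be accumulated over all state sequences, here is a minimal Python sketch of the forward recursion. The equation image is not reproduced in this text, so the sketch assumes the usual form in which each path contributes π for its first state, ω for every emitting state visited, and a for every transition, with the path ending in the non-emitting final state. The function name and the array layout (`a[i][J]` holding the transition into the final state) are assumptions.

```python
def sequence_likelihood(pi, a, omega):
    """Sum of path probabilities over all state sequences ending in the final state.

    pi[i]       initial probability of emitting state i (i = 0..J-1)
    a[i][j]     transition probability; column j = J is the final state
    omega[t][i] degree of occurrence of frame t in state i
    """
    num_states = len(pi)
    alpha = [pi[i] * omega[0][i] for i in range(num_states)]
    for t in range(1, len(omega)):
        alpha = [sum(alpha[i] * a[i][j] for i in range(num_states)) * omega[t][j]
                 for j in range(num_states)]
    # Close every path with the transition into the non-emitting final state.
    return sum(alpha[i] * a[i][num_states] for i in range(num_states))


pi = [1.0, 0.0]
a = [[0.6, 0.4, 0.0],
     [0.0, 0.7, 0.3]]
omega = [[0.5, 0.1],
         [0.2, 0.8]]
L = sequence_likelihood(pi, a, omega)
```

In this toy model only the path (state 1, state 2, final) has nonzero probability, so L equals 1.0 × 0.5 × 0.4 × 0.8 × 0.3.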

[0016] The likelihood calculation unit 104 computes the likelihood L^w for recognition unit w according to (Equation 3) for w = 1, ..., W. Different HMMs result from different definitions of ω_i(y_t). The FVQ/HMM considered here defines ω_i(y_t) in principle as follows. (1) Multiplicative FVQ/HMM

【００１７】[0017]

【数４】 [Equation 4]

[0018] The name "multiplicative" comes from the latter expression in (Equation 4). (2) Additive FVQ/HMM

【００１９】[0019]

【数５】 [Equation 5]

[0020] The name "additive" comes from the form of (Equation 5). Here b_im is the occurrence probability of cluster m in state i, and u_tm is the degree of membership of y_t in cluster m. As noted above, in practice the summation or multiplication in (Equation 4) or (Equation 5) is carried out only over the top K clusters by membership, in which case (Equation 4) and (Equation 5) become (Equation 6) and (Equation 7). Here h(k) is the cluster in which y_t has the k-th highest membership.

【００２１】[0021]

【数６】 [Equation 6]

【００２２】[0022]

【数７】 [Equation 7]
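
The two definitions of ω_i(y_t) can be sketched in Python as follows. The equation images are not reproduced here, so the forms used are inferred from the surrounding text and the claims: the multiplicative type takes log ω as the u-weighted sum of log b (equivalently, ω as a weighted geometric mean of b), and the additive type takes ω as the u-weighted sum of b. The function names, the `top_k` parameter, and the assumption that every b_im is strictly positive are illustrative.

```python
import math


def top_indices(u_t, top_k=None):
    """Indices of the clusters with the highest membership (all if top_k is None)."""
    order = sorted(range(len(u_t)), key=lambda m: u_t[m], reverse=True)
    return order if top_k is None else order[:top_k]


def log_emission_multiplicative(u_t, log_b_i, top_k=None):
    """log omega_i(y_t) as a sum of products of u_tm and stored log b_im."""
    return sum(u_t[m] * log_b_i[m] for m in top_indices(u_t, top_k))


def emission_additive(u_t, b_i, top_k=None):
    """omega_i(y_t) as a membership-weighted sum of b_im."""
    return sum(u_t[m] * b_i[m] for m in top_indices(u_t, top_k))


u_t = [0.8, 0.2, 0.0]
b_i = [0.5, 0.25, 0.25]          # assumed strictly positive
log_b_i = [math.log(b) for b in b_i]
omega_mult = math.exp(log_emission_multiplicative(u_t, log_b_i))
omega_add = emission_additive(u_t, b_i)
```

Storing `log_b_i` once means the multiplicative score needs only multiplications and additions per frame, whereas the additive score must still be passed through a logarithm before it can be used in a log-domain recursion.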

[0023] In practice the likelihood is usually not computed directly from (Equation 3). The Viterbi method is commonly used instead, in logarithmic form so that the products become sums. That is,

【００２４】[0024]

【数８】 [Equation 8]

[0025] is computed, and L' is taken as the likelihood. (Equation 8) can be computed efficiently by dynamic programming. That is,

【００２６】[0026]

【数９】 [Equation 9]

[0027] is computed recursively for t = 2, ..., T+1 and i = 1, ..., J+1, starting from φ_i(1) = log π_i, and L' is then obtained as

【００２８】[0028]

【数１０】 [Equation 10]

[0029] Because the recognition results differ little whether L or L' is used, the Viterbi method is generally used for recognition. Computing the recurrence of (Equation 9) requires log ω_j(y_t). In the multiplicative case, if log b_im is stored in place of b_im, the first expression of (Equation 6) allows log ω_j(y_t) to be computed with a sum of products alone, so the multiplicative type is also the most advantageous in terms of computation.
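
A log-domain Viterbi recursion of the kind described above can be sketched in Python. (Equation 9) and (Equation 10) are not reproduced in this text, so the exact recurrence is an assumption: it starts from φ_i(1) = log π_i as stated, charges the emission of each frame to the state occupied at that frame, and reads L' from the final state at time T+1. The helper `safe_log` and the array layout are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """log p, with log 0 represented as negative infinity."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_log_likelihood(log_pi, log_a, log_omega):
    """Best-path log likelihood L' reaching the final state after the last frame.

    log_pi[i]       log initial probability of emitting state i (i = 0..J-1)
    log_a[i][j]     log transition probability; column j = J is the final state
    log_omega[t][i] log degree of occurrence of frame t in state i
    """
    num_states = len(log_pi)
    num_frames = len(log_omega)
    phi = list(log_pi)                       # phi_i(1) = log pi_i
    for t in range(1, num_frames):
        phi = [max(phi[i] + log_a[i][j] + log_omega[t - 1][i]
                   for i in range(num_states))
               for j in range(num_states)]
    # Last step: emit the final frame and move into the non-emitting final state.
    return max(phi[i] + log_a[i][num_states] + log_omega[num_frames - 1][i]
               for i in range(num_states))


log_pi = [safe_log(p) for p in [1.0, 0.0]]
log_a = [[safe_log(p) for p in row] for row in [[0.6, 0.4, 0.0], [0.0, 0.7, 0.3]]]
log_omega = [[safe_log(p) for p in row] for row in [[0.5, 0.1], [0.2, 0.8]]]
L_prime = viterbi_log_likelihood(log_pi, log_a, log_omega)
```

Every step is an addition followed by a maximum, which is why storing log b_im and using the multiplicative emission keeps the whole computation free of logarithm and exponential calls at recognition time.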

【００３０】[0030]

[Problem to be solved by the invention] In the conventional example above, the membership u_tm is computed by (Equation 1). This definition is derived by minimizing the fuzzy-clustering objective function

【００３１】[0031]

【数１１】 [Equation 11]

[0032] subject to the constraint

【００３３】[0033]

【数１２】 [Equation 12]

[0034] that is, by the criterion of minimizing (Equation 11) under the condition (Equation 12). However, (Equation 11) is imposed a priori; it is not derived theoretically or experimentally, and it cannot be said to be necessarily optimal as a model of the real world.
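
For orientation, the constrained minimization referred to above has a standard form in the fuzzy c-means literature, sketched below in LaTeX. This is an assumption, not a reproduction of (Equation 11), (Equation 12), or (Equation 1), whose images are not present in this text; in particular the fuzziness exponent F > 1 is a symbol introduced here for illustration.

```latex
% Assumed standard fuzzy c-means formulation; F > 1 is a fuzziness
% exponent that the surrounding text does not define.
\begin{align}
J &= \sum_{t} \sum_{m=1}^{M} u_{tm}^{F}\, d_{tm}^{2}, \\
\sum_{m=1}^{M} u_{tm} &= 1 \quad \text{for every } t, \\
u_{tm} &= \left[ \sum_{n=1}^{M}
          \left( \frac{d_{tm}}{d_{tn}} \right)^{2/(F-1)} \right]^{-1}.
\end{align}
```

Under these assumptions the third line is the minimizer of the first subject to the second, which shows how strongly the resulting membership depends on the chosen objective rather than on the data.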

【００３５】[0035]

[Means for solving the problem] A membership calculation means comprising a hierarchical neural network in which, where M is the number of clusters and y is the observation vector, each unit of the input layer corresponds to an element of y, and the output of the m-th unit of the output layer is taken as the degree of membership of y in cluster m (m = 1, ..., M) or as the posterior probability of cluster m given y (hereinafter both are referred to as the degree of membership).

【００３６】[0036]

[Operation] Where M is the number of clusters and y is the observation vector, each unit of the input layer of the hierarchical neural network is made to correspond to an element of y, and the output of the m-th unit of the output layer is taken as the degree of membership of y in cluster m (m = 1, ..., M) or as the posterior probability of cluster m given y (hereinafter both are referred to as the degree of membership).

【００３７】[0037]

[Embodiment] Letting C_m denote cluster m, u_tm can be interpreted as the posterior probability P(C_m|y_t), and the FVQ/HMM can be interpreted as defining the posterior probability of a cluster by the degree of membership. Extending this idea, if the posterior probability of C_m given y_t can be obtained separately by some other method, it should be usable as the degree of membership in the FVQ/HMM.

[0038] Meanwhile, the following is known. Suppose a hierarchical neural network is configured so that each unit of the input layer corresponds to an element of the feature vector to be classified and the m-th unit of the output layer corresponds to cluster m, and suppose it is trained on a training vector set so that, for each vector presented to the input layer, the output of the output-layer unit corresponding to the cluster to which that vector belongs becomes 1 and the outputs of the other units become 0. Then, when an unknown input y_t is presented, the output of the m-th unit corresponds to P(C_m|y_t) (Takio Kurita: "Method of determining the number of hidden layer units in a three-layer neural network based on the information criterion", IEICE Transactions D-II, Vol. J73-D-II, No. 11, pp. 1872-1878 (November 1990)). The present invention makes use of this fact.

[0039] Accordingly, the model is created by the following procedure.

(1) Codebook creation
The training vector set is clustered into M clusters. The clusters are labeled 1, ..., M, and the centroid of each cluster, that is, its representative vector, is stored so that it can be retrieved by label. Concretely, the L.B.G. algorithm or the like is used.

(2) Neural network training
The neural network to be used here is shown in FIG. 3. This example has three layers, called the input layer, the intermediate layer, and the output layer. Each circle is called a unit. The input/output characteristic is often taken to be 1 (the input is passed through) in the input layer, 1/[1 + exp(-θ)] in the intermediate layer, and either 1 or 1/[1 + exp(-θ)] in the output layer, where θ is the input level. 1/[1 + exp(-θ)] is called the sigmoid function. The k-th unit of layer u and the j-th unit of layer u+1 are connected with weight w^u_k^{u+1}_j; if the output of the k-th unit of layer u is o^u_k, the input to the j-th unit of layer u+1 is i^{u+1}_j = Σ_k o^u_k w^u_k^{u+1}_j.
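
The forward computation of such a three-layer network can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function names and the weight layout (`w[k][j]` connecting unit k of one layer to unit j of the next) are hypothetical, and no bias terms are included because the text does not mention any.

```python
import math


def sigmoid(theta):
    """The sigmoid function 1 / [1 + exp(-theta)]."""
    return 1.0 / (1.0 + math.exp(-theta))


def layer_inputs(outputs, weights):
    """i_j = sum_k o_k * w[k][j] for every unit j of the next layer."""
    num_next = len(weights[0])
    return [sum(outputs[k] * weights[k][j] for k in range(len(outputs)))
            for j in range(num_next)]


def forward(y, w_hidden, w_out, sigmoid_output=True):
    """Input layer passes y through; hidden layer is sigmoid; output is 1 or sigmoid."""
    hidden_out = [sigmoid(v) for v in layer_inputs(y, w_hidden)]
    out_in = layer_inputs(hidden_out, w_out)
    return [sigmoid(v) for v in out_in] if sigmoid_output else out_in


w_hidden = [[0.3, -0.2], [0.1, 0.4]]   # 2 input units -> 2 hidden units
w_out = [[1.0, -1.0], [1.0, -1.0]]     # 2 hidden units -> 2 output units
linear_out = forward([0.0, 0.0], w_hidden, w_out, sigmoid_output=False)
sigmoid_out = forward([0.0, 0.0], w_hidden, w_out)
```

With a zero input every hidden unit outputs 0.5, so the output-layer inputs are 1 and -1 regardless of `w_hidden`.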

[0040] Training the neural network means determining these weight coefficients by supplying, for each training input vector, the desired output as a teacher signal. The weights are found by iterative computation as the values that minimize the squared error between the actual output and the desired output. In the present invention this is done concretely as follows.

[0041] Let o^U_k be the output of the k-th unit of the output layer. For a training vector y_n ∈ C_m under the clustering above, the network is trained so that o^U_k = 1 (k = m) and o^U_k = 0 (k ≠ m). Here n = 1, ..., N is the serial number assigned to the training vector. The objective function to be minimized is then given by (Equation 13), where δ_km is the Kronecker delta: δ_km = 1 when k = m and δ_km = 0 when k ≠ m.

【００４２】[0042]

【数１３】 [Equation 13]
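
The training target can be illustrated with a short Python sketch of a squared-error objective against one-hot targets. The equation image is not reproduced in this text, so the exact form (a plain sum of squared differences over all training vectors and output units, with no scaling factor) is an assumption, as is the function name.

```python
def squared_error(outputs, cluster_labels):
    """Sum over n and k of (o_k(y_n) - delta_km)^2, where m is the cluster of y_n.

    outputs[n][k-1] is the output of unit k for training vector n;
    cluster_labels[n] is the label m (1..M) of the cluster containing y_n.
    """
    total = 0.0
    for o_n, m in zip(outputs, cluster_labels):
        for k, o_k in enumerate(o_n, start=1):
            delta_km = 1.0 if k == m else 0.0   # Kronecker delta
            total += (o_k - delta_km) ** 2
    return total


error = squared_error([[1.0, 0.0], [0.5, 0.5]], [1, 2])
```

The first vector is classified perfectly and contributes nothing; the second contributes 0.25 for each of its two output units.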

[0043] The minimization of (Equation 13) is carried out by the well-known method called backpropagation (details omitted).

(3) HMM training
Training the HMM means estimating the initial probabilities π_i, the state transition probabilities a_ij, and the cluster occurrence probabilities b_im from training data (a set of feature vector sequences obtained for each recognition unit, such as a word). This can be done by the well-known Baum-Welch method (details omitted). The present invention differs from the conventional model in that u_tm in (Equation 4) to (Equation 7) is taken to be the output of the neural network for the input y_t.

[0044] To recognize words 1, ..., W, sets of training feature vector sequences are obtained from utterances of words 1, ..., W.

[0045] The recognition procedure is as follows.

(1) Likelihood calculation
The method of computing the likelihood of an HMM for the input speech is described first. FIG. 2 shows an embodiment of the likelihood calculation device according to the present invention. The HMM whose likelihood is to be computed has already been obtained as described above and is stored in the HMM storage unit 204. Reference numeral 200 denotes a speech input terminal, to which the speech to be recognized is input. Reference numeral 201 denotes a feature extraction unit, which converts the speech into a feature vector y_t, for example every 10 msec (an interval called a frame); t is the frame number. Well-known features such as LPC coefficients and cepstra are used, and y_t is a vector, usually of ten-odd dimensions, whose elements are these features. Reference numeral 202 denotes a membership calculation unit, which consists of a hierarchical neural network as described above and computes the degree of membership of y_t in each cluster (the posterior probability of each cluster given y_t), u_tm = P(C_m|y_t). Reference numeral 203 denotes a likelihood calculation unit, which computes (Equation 9) on the basis of these memberships. Reference numeral 205 denotes the output terminal for the computed likelihood.

(2) Recognition
FIG. 4 shows an embodiment of the speech recognition device according to the present invention. Reference numeral 206 denotes an HMM storage unit, which stores the models corresponding to recognition units 1, ..., W, that is, HMM 1, HMM 2, ..., HMM W. Blocks 200 to 203 perform the same functions as the blocks with the same numbers in FIG. 2, except that block 203 computes the likelihoods of all the models HMM 1, HMM 2, ..., HMM W for the output of the membership calculation unit 202. Reference numeral 207 denotes a decision unit which, letting L^w be the likelihood for recognition unit w, computes (Equation 2). Reference numeral 208 denotes the terminal that outputs the recognition result w^*.
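
The recognition flow described above (memberships per frame, one likelihood per model, then a maximum over models) can be sketched in Python. This is only a structural illustration: `membership_fn` stands in for the neural-network membership calculation and `score_fn` for the likelihood calculation, and both, like the function name `recognize`, are hypothetical placeholders supplied by the caller.

```python
def recognize(frames, models, membership_fn, score_fn):
    """Return (w*, scores), where w* maximizes the score L^w over all models.

    frames         sequence of feature vectors y_1..y_T
    models         dict mapping recognition unit w to its model
    membership_fn  maps a feature vector to its membership vector
    score_fn       maps (model, membership sequence) to a likelihood L^w
    """
    memberships = [membership_fn(y) for y in frames]   # computed once, shared by all models
    scores = {w: score_fn(model, memberships) for w, model in models.items()}
    best_w = max(scores, key=scores.get)
    return best_w, scores


# Toy stand-ins: each "model" is just a number and the score scales with T.
best_w, scores = recognize(
    frames=[[0.0], [1.0]],
    models={1: -1.0, 2: 0.5},
    membership_fn=lambda y: [1.0],
    score_fn=lambda model, memberships: model * len(memberships),
)
```

Computing the membership vectors once and reusing them for every model reflects the block structure in which a single membership calculation unit feeds the likelihood calculation for all W models.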

【００４６】[0046]

[Effect of the invention] As described above, the neural network that computes the cluster posterior probabilities is trained directly from actual data, so the posterior probability of each cluster given y_t can be obtained in a form closer to reality, and an improvement in the recognition rate can be expected.

[Brief description of drawings]

FIG. 1 is a block diagram showing an example of a conventional FVQ/HMM.

FIG. 2 is a block diagram showing an embodiment of the likelihood calculation according to the present invention.

FIG. 3 is a diagram showing an embodiment of a speech recognition device according to the present invention.

FIG. 4 is a block diagram showing an embodiment of a speech recognition device according to the present invention.

[Explanation of symbols]

100 Feature extraction unit
101 Vector quantization unit
102 Codebook
103 HMM storage unit
104 Likelihood calculation unit
105 Decision unit

Continuation of the front page

(56) References
Yasuo Tan, Toshiro Ejima, "Proposal of the network Fuzzy Partition Model using multiple-input/output elements and its basic properties", IEICE Technical Report [Pattern Recognition and Understanding], September 21, 1989, PRU89-45, pp. 39-46
Eiichi Tsuboka, Junichi Nakahashi, "Multiplicative FVQ/HMM", IEICE Technical Report [Speech], Japan, June 18, 1993, SP93-27, pp. 25-32
Eiichi Tsuboka, Junichi Nakahashi, "A study on FVQ/HMM", Proceedings of the Acoustical Society of Japan Autumn Meeting, fiscal year Heisei 4, Japan, October 5, 1992, 2-1-2, pp. 81-82
Junichi Nakahashi, Eiichi Tsuboka, "Effect of reflecting the HMM structure in code vector creation", Proceedings of the Acoustical Society of Japan Spring Meeting, fiscal year Heisei 5, Japan, March 17, 1993, 2-4-3, pp. 23-24
Yoshinaga Kato, Masahide Sugiyama, "Continuous speech recognition using the fuzzy partition model", IEICE Technical Report [Speech], Japan, June 30, 1992, SP92-28, pp. 31-37
Eiichi Tsuboka, Junichi Nakahashi, "Speech recognition in which the dissimilarity between feature vectors is taken as the dissimilarity between membership vectors", IEICE Transactions D-II, Japan, December 1996, Vol. J79-D-II, No. 12, pp. 2039-2046

(58) Fields searched (Int.Cl.7, DB name)
G10L 15/10
G10L 15/14
G10L 15/16
G06F 15/18 560
JICST file (JOIS)

Claims

(57) [Claims]

1. A device for calculating the degree of occurrence of an observation vector, comprising: membership calculation means which, where M is the number of clusters and y is the observation vector, consists of a hierarchical neural network in which each unit of the input layer corresponds to an element of y and the output of the m-th (m = 1, ..., M) unit of the output layer is taken as the degree of membership of y in cluster m or as the posterior probability of cluster m given y (hereinafter both are referred to as the degree of membership); and observation vector occurrence degree calculation means which, where b_m is the occurrence probability of cluster m and u_m is the degree of membership of y in cluster m, calculates the degree of occurrence of y (the degree to which y is observed) as a function of b_m and u_m, wherein u_m is calculated by the membership calculation means.

2. The observation vector occurrence degree calculation device according to claim 1, wherein the logarithm of the degree of occurrence is calculated as the weighted arithmetic mean of the logarithms of b_m with u_m as the weights.

3. The observation vector occurrence degree calculation device described above, wherein the degree of occurrence is calculated as the weighted geometric mean of b_m with u_m as the weights.

4. The observation vector occurrence degree calculation device described above, wherein the degree of occurrence is calculated as the weighted arithmetic mean of b_m with u_m as the weights.

5. A likelihood calculation device comprising: a hidden Markov model (HMM); observation vector occurrence degree calculation means which, where b_im is the occurrence probability of cluster m (m = 1, ..., M) in state i, y_t is the observation vector at time t, and u_tm is the degree of membership of y_t in cluster m, calculates the degree of occurrence of y_t in state i (the degree to which y_t is observed) as a function of b_im and u_tm; and observation vector sequence occurrence degree calculation means which calculates the degree to which the observation vector sequence y_1, y_2, ..., y_T is generated by the HMM, using the results of the observation vector occurrence degree calculation means, wherein u_tm is calculated by a membership calculation device which, where M is the number of clusters and y is the observation vector, consists of a hierarchical neural network in which each unit of the input layer corresponds to an element of y and the output of the m-th unit of the output layer is taken as the degree of membership of y in cluster m (m = 1, ..., M).

6. A recognition device comprising: W HMMs corresponding to recognition units 1, ..., W; observation vector occurrence degree calculation means which, where b^w_im is the occurrence probability of cluster m (m = 1, ..., M) in state i of hidden Markov model w, y_t is the observation vector at time t, and u_tm is the degree of membership of y_t in cluster m, calculates the degree of occurrence of y_t in state i of HMM w (the degree to which y_t is observed) as a function of b^w_im and u_tm; observation vector sequence occurrence degree calculation means which calculates, for w = 1, ..., W, the degree L^w to which the observation vector sequence y_1, y_2, ..., y_T is generated by HMM w, using the results of the observation vector occurrence degree calculation means; and recognition result determination means which finds the w giving the maximum among L^1, ..., L^W and determines that w as the recognition result, wherein u_tm is calculated by a membership calculation device which, where M is the number of clusters and y is the observation vector, consists of a hierarchical neural network in which each unit of the input layer corresponds to an element of y and the output of the m-th unit of the output layer is taken as the degree of membership of y in cluster m (m = 1, ..., M).

7. The recognition device according to claim 6, wherein the logarithm of the degree of occurrence is calculated as the weighted arithmetic mean of the logarithms of b_m with u_m as the weights.

8. The recognition device described above, wherein the degree of occurrence is calculated as the weighted geometric mean of b_m with u_m as the weights.

9. The recognition device described above, wherein the degree of occurrence is calculated as the weighted arithmetic mean of b_m with u_m as the weights.

10. An HMM creation device comprising: observation vector occurrence degree calculation means which, where b^w_im is the occurrence probability of cluster m (m = 1, ..., M) in state i of the HMM of recognition unit w, y_t is the observation vector at time t, and u_tm is the degree of membership of y_t in cluster m, calculates the degree of occurrence of y_t in state i (the degree to which y_t is observed) as a function of b^w_im and u_tm; observation vector sequence occurrence degree calculation means which calculates the degree to which the observation vector sequence y_1, y_2, ..., y_T is generated by the HMM, using the results of the observation vector occurrence degree calculation means; and parameter learning means which learns the parameters of HMM w so as to maximize the degree of occurrence, calculated by the observation vector sequence occurrence degree calculation means, of the training observation vector sequences corresponding to recognition unit w, wherein u_tm is calculated by a membership calculation device which, where M is the number of clusters and y is the observation vector, consists of a hierarchical neural network in which each unit of the input layer corresponds to an element of y and the output of the m-th unit of the output layer is taken as the degree of membership of y in cluster m (m = 1, ..., M).

11. The HMM creation device according to claim 10, wherein the logarithm of the degree of occurrence is calculated as the weighted arithmetic mean of the logarithms of b_m with u_m as the weights.

12. The HMM creation device according to claim 10, wherein the degree of occurrence is calculated as the weighted geometric mean of b_m with u_m as the weights.

13. The HMM creation device according to claim 10, wherein the degree of occurrence is calculated as the weighted arithmetic mean of b_m with u_m as the weights.

14. The membership calculation device according to claim 1, comprising clustering means for clustering a set of training vectors, wherein the weight coefficients are learned so that, when a training vector belonging to the m-th cluster is input, the output of the m-th unit of the output layer is 1 and the outputs of the other units of the output layer are 0.