JPH09160586A

JPH09160586A - Learning method for hidden markov model

Info

Publication number: JPH09160586A
Application number: JP7317825A
Authority: JP
Inventors: Takashi I; 傑易
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1995-12-06
Filing date: 1995-12-06
Publication date: 1997-06-20

Abstract

PROBLEM TO BE SOLVED: To provide a learning method for environment-dependent phoneme HMMs (hidden Markov models) that smooths the parameters and reduces bias toward the learning data while retaining the advantages of environment-dependent phoneme HMMs. SOLUTION: A word HMM is learned in step 6, decomposed into environment-dependent phoneme HMMs in step 7, and reconnected in step 9 to rebuild the word HMM. The environment-dependent phoneme HMMs are learned by repeating these learning, decomposition, and connection processes. It is then judged whether the number of samples used to learn each environment-dependent phoneme HMM is sufficient. Only when the number is judged insufficient, a parameter is computed in step 12 as the weighted mean of the parameter of the environment-dependent phoneme HMM and the parameter of the corresponding environment-independent phoneme HMM, using a weighting factor that depends on the number of learned samples, and this value is substituted for the parameter of the environment-dependent phoneme HMM.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a learning method for hidden Markov models (hereinafter "HMMs") used in speech recognition methods and the like.

[0002]

[Prior Art] Conventional techniques in this field are described, for example, in the following documents.

Document 1: S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, "An Introduction to the Application of the Theory of Probabilistic Function of a Markov Process to Automatic Speech Recognition," The Bell System Technical Journal, 62(4) (April 1983), American Telephone and Telegraph Company (US), pp. 1035-1074.

Document 2: Seiichi Nakagawa, "Speech Recognition by Probabilistic Models" (July 1988), Institute of Electronics, Information and Communication Engineers, pp. 55-61.

Speech recognition techniques comprise classical pattern-matching methods and statistical methods, and the latter have become mainstream in recent years. Among the statistical methods, a Markov model with probabilistic finite states has been proposed, and it is usually called an HMM. In general, an HMM consists of a plurality of states (for example, speech features) and transitions between the states. An HMM also has transition probabilities, which represent the transitions between states, and output probabilities for outputting a label on each transition (a label is a typical speech feature parameter; there are usually tens to thousands of label types). A speech recognition method using such HMMs is described in Document 1, and an example for word speech recognition is shown in FIG. 2.

[0003] FIG. 2 shows an example of the structure of a word HMM used in a conventional speech recognition method. In FIG. 2, $S_1$, $S_2$, $S_3$, and $S_4$ denote HMM states such as speech features; $a_{11}$, $a_{12}$, $a_{22}$, $a_{23}$, $a_{33}$, $a_{34}$, $a_{44}$, and $a_{45}$ denote state transition probabilities; and $b_1(k)$, $b_2(k)$, $b_3(k)$, and $b_4(k)$ denote the label output probabilities for label $k$. In an HMM, when a state transition occurs with state transition probability $a_{ij}$ ($i = 1, \ldots, 4$; $j = 1, \ldots, 5$), a label is output with label output probability $b_j(k)$.

To recognize an uttered word using HMMs, the HMM of each word is first trained, using learning data prepared for that word, so that it outputs the label sequence of the word with the highest probability. Next, the label sequence of an uttered unknown word is input, and the word HMM that gives the highest output probability is taken as the recognition result.

In this kind of speech recognition method, an HMM is assigned to each uttered word itself and trained, and the recognition result is decided by the likelihood (that is, the output probability of the label sequence). Such word HMMs ensure excellent recognition accuracy, but a huge amount of learning data becomes necessary as the recognition vocabulary grows. Another drawback is that speech other than the trained words cannot be recognized at all.

In phonetics, on the other hand, a word is usually expressed as a sequence of phonetic elements called phonemes. A method therefore exists in which an HMM is prepared for each phoneme and these HMMs are concatenated to generate a word HMM for word recognition. In actually uttered word speech, however, each phoneme is affected by its neighboring phonemes, and its feature parameters (for example, the spectrum) are considerably deformed. Such spectral deformation caused by coarticulation sometimes cannot be fully represented by phoneme HMMs. A method that recognizes words by simply concatenating phoneme HMMs therefore cannot avoid a drop in the recognition rate.
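The recognition procedure described in this paragraph (score the label sequence against every word HMM and pick the most likely word) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a state-emission simplification in which the label is emitted by the destination state, a model that always starts in $S_1$, and no separate exit state, so each row of `a` sums to 1. The names `sequence_likelihood` and `recognize` are hypothetical.

```python
def sequence_likelihood(a, b, labels):
    """Forward algorithm: Pr(labels | HMM) for a discrete-output HMM.

    a[i][j]: state transition probability from state i to state j
    b[j][k]: probability that state j outputs label k
    Assumption: the model starts in state 0 (S1 in FIG. 2).
    """
    n = len(a)
    alpha = [0.0] * n
    alpha[0] = b[0][labels[0]]
    for k in labels[1:]:
        alpha = [
            sum(alpha[i] * a[i][j] for i in range(n)) * b[j][k]
            for j in range(n)
        ]
    return sum(alpha)


def recognize(word_hmms, labels):
    """Return the word whose HMM gives the highest output probability.

    word_hmms maps a word to its (a, b) parameter pair.
    """
    return max(
        word_hmms,
        key=lambda word: sequence_likelihood(*word_hmms[word], labels),
    )


# Toy example with two states and two labels (values are illustrative only).
a = [[0.5, 0.5], [0.0, 1.0]]
hmms = {
    "A": (a, [[0.9, 0.1], [0.9, 0.1]]),  # mostly emits label 0
    "B": (a, [[0.1, 0.9], [0.1, 0.9]]),  # mostly emits label 1
}
best = recognize(hmms, [0, 0, 0])
```

In the toy example the label sequence `[0, 0, 0]` is far more probable under model "A" than under model "B", so `recognize` selects "A".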

[0004] To eliminate the influence of such coarticulation, phoneme models that depend on the preceding and following phonological environment, namely diphones and triphones, have been proposed. Here, a diphone is a phoneme for which either the preceding phoneme or the following phoneme is known (a one-sided environment-dependent phoneme), and a triphone is a phoneme for which both the preceding and the following phonemes are known (an environment-dependent phoneme). For speech recognition, diphone or triphone HMMs are prepared, word HMMs are constructed by concatenating them, and word recognition is performed.

Compared with environment-independent phoneme HMMs, environment-dependent phoneme HMMs avoid the drop in recognition rate caused by coarticulatory spectral deformation, but because the number of models is large, a large amount of learning data must be prepared to train them. In addition, information indicating the interval in which each diphone or triphone occurs in the learning data (that is, label information) must also be prepared. Moreover, automatic labeling by computer, for example, does not achieve satisfactory accuracy, so labeling is done almost entirely by hand.

A learning method that requires no label information has therefore been proposed. In this method, environment-independent phoneme HMMs, which are easy to train, are prepared first. Then, for learning data consisting of utterances of words (or phrases or sentences; the same applies hereinafter) whose content is known but which are unlabeled, the environment-independent phoneme HMMs are concatenated to construct word HMMs, and these word HMMs are trained. Because this is word HMM training, the learning process can be carried out as long as the word boundaries (that is, the start and end of each word) are known. The word HMMs are then decomposed, by the reverse of the concatenation procedure, to generate environment-dependent phoneme HMMs. To improve learning accuracy, the concatenated learning and the decomposition are repeated, so that environment-dependent phoneme HMMs are generated approximately.

[0005]

[Problems to Be Solved by the Invention] However, the conventional learning method for environment-dependent phoneme HMMs has the following problems. For a particular environment-dependent phoneme HMM, the amount of corresponding speech data is in some cases very limited, so the properties of an environment-dependent phoneme HMM obtained by the learning described above are easily influenced by the learning data. That is, the HMM may be biased toward the learning data.

One known remedy is to train the environment-dependent phoneme HMMs by the above method and then, particularly for HMMs trained from a small number of learning samples, take a weighted average of the parameters of the environment-dependent phoneme HMM and the parameters of the corresponding environment-independent phoneme HMM, and use the averaged values as the parameters of the environment-dependent phoneme HMM. This works because environment-independent phoneme HMMs are easy to train from a large amount of learning data and are less biased toward the learning data.

In this averaging of HMM parameters, however, the parameter called the weighting factor depends on the learning data, and its value was determined experimentally, so determining the weighting factor was laborious. Furthermore, depending on how the weighting factor was set, the emphasis could fall on the environment-independent phoneme HMM parameters, so that the environment-dependent phoneme HMM parameters obtained by learning might not be put to use.

To solve these problems of the prior art, the present invention provides a learning method for environment-dependent phoneme HMMs that is less biased toward the learning data, by setting the weighting factor in a way that is simple and makes effective use of the learning results.

[0006]

[Means for Solving the Problems] To solve the above problems, the first invention applies the following measures to an HMM learning method in which, to learn environment-dependent phoneme HMMs, environment-independent phoneme HMMs prepared in advance are concatenated to construct word (or phrase or sentence) HMMs, and the environment-dependent phoneme HMMs are then learned by repeating three processes: a learning process that trains the word (or phrase or sentence) HMMs, a decomposition process that decomposes the learning result into environment-dependent phoneme HMMs after the learning process, and a connection process that reconnects the decomposed environment-dependent phoneme HMMs into word (or phrase or sentence) HMMs.

Specifically, in the first invention, the number of learning samples used to learn each environment-dependent phoneme HMM is counted. Only when this number is judged insufficient, a parameter is calculated as the weighted average of the parameter of the environment-dependent phoneme HMM obtained in the final decomposition of the repetition and the corresponding parameter of the environment-independent phoneme HMM, using a weighting factor that depends on the number of learning samples. The parameter of the environment-dependent phoneme HMM obtained in the final decomposition is then replaced with the calculated parameter, and the environment-dependent phoneme HMM is thereby learned.

[0007] The second invention is a learning method substantially the same as the first invention, but differs in how the replacement parameter is calculated when the number of learning samples is judged insufficient. In the second invention, a parameter is calculated as the weighted average of the parameter of the environment-dependent phoneme HMM obtained in the final decomposition of the repetition and the corresponding parameter of a one-sided environment-dependent phoneme HMM prepared in advance, using a weighting factor that depends on the number of learning samples. The parameter of the environment-dependent phoneme HMM obtained in the final decomposition is replaced with the calculated parameter, and the environment-dependent phoneme HMM is thereby learned.

According to the first invention, because the HMM learning method is configured as described above, the number of learning samples used to learn each environment-dependent phoneme HMM is counted after the environment-dependent phoneme HMMs have been learned. If this number is judged insufficient, the parameters of the environment-dependent phoneme HMM from the final decomposition and the corresponding parameters of the environment-independent phoneme HMM are averaged with a weighting factor that depends on the number of learning samples, yielding new parameters. The parameters of the environment-dependent phoneme HMM are then replaced with these new parameters. If the number of learning samples is sufficient, no replacement is performed.

According to the second invention, unlike the first invention, the new parameters are calculated as the weighted average of the parameters of the one-sided environment-dependent phoneme HMM prepared in advance and the parameters of the environment-dependent phoneme HMM from the final decomposition, using a weighting factor that depends on the number of learning samples.

[0008]

[Embodiments of the Invention] First Embodiment

FIG. 1 is a flowchart showing the processing of an HMM learning method according to a first embodiment of the present invention. The HMM learning method of this embodiment is described with reference to this figure. In the HMM learning method of the first embodiment, the processing of steps 1 to 13 in FIG. 1 is executed using, for example, a program-controlled computer.

First, when learning starts in step 1, the speech signals of the learning data (for example, the words "akai" and "sakae" as word speech) are input in step 2, and processing proceeds to the preprocessing of step 3. In the preprocessing of step 3, for example, the input analog speech signal is converted to a digital signal by analog-to-digital conversion, and speech feature parameters corresponding to the states $S_i$ ($i = 1, \ldots, 4$) of the HMM shown in FIG. 2 are extracted, for example by extracting the LPC cepstrum through LPC (Linear Predictive Coding) analysis. Processing then proceeds to step 5.

[0009] The environment-independent phoneme HMM dictionary 4 stores, for example, the HMMs of the Japanese phonemes (about 30 to 40 types). "Environment-independent" means that the phonemes before and after the phoneme in question are unknown. This is written, for example, as follows.

(*)a(*)  (*)i(*)  (*)u(*)  (*)e(*)  (*)o(*)  (*)k(*)  (*)s(*)  (*)t(*)  ...

In step 5, the environment-independent phoneme HMMs are concatenated to form word HMMs, for example as in equation (1), by referring to the phoneme-sequence representation of each input word and to the environment-independent phoneme HMM dictionary 4.

HMM of word "akai" ← (*)a(*) + (*)k(*) + (*)a(*) + (*)i(*)
HMM of word "sakae" ← (*)s(*) + (*)a(*) + (*)k(*) + (*)a(*) + (*)e(*)   ... (1)

where "+" denotes concatenation of HMMs.

Next, in step 6, the parameters of each word HMM are estimated (HMM learning) using the preprocessing result of step 3 and the word HMM formed in step 5. The HMM parameters are estimated using, for example, the Baum-Welch (B-W) algorithm described in Document 2. In the B-W algorithm, for an observed label sequence $O = o_1, o_2, \ldots, o_T$ and a state sequence $I = i_1, i_2, \ldots, i_T$, a forward variable $\alpha_t(i)$ and a backward variable $\beta_t(i)$ are defined as in equation (2).

$$\alpha_t(i) = \Pr(o_1, o_2, \ldots, o_t,\; i_t = S_i)$$
$$\beta_t(i) = \Pr(o_{t+1}, o_{t+2}, \ldots, o_T \mid i_t = S_i) \qquad (2)$$

The state transition probabilities $a_{ij}$ and the label output probabilities $b_j(k)$ are then estimated as in the following equation (3).
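The forward and backward variables of equation (2) can be computed recursively. The sketch below is an illustration under stated assumptions rather than the patent's procedure: it uses a state-emission simplification (the label is emitted by the state that is entered), starts the model in $S_1$, and has no exit state. The function names `forward` and `backward` are hypothetical.

```python
def forward(a, b, labels):
    """alpha[t][i] = Pr(o_1..o_t, i_t = S_i), as in equation (2).

    Assumption: the model starts in state 0 (S1).
    """
    n = len(a)
    alpha = [[b[i][labels[0]] if i == 0 else 0.0 for i in range(n)]]
    for k in labels[1:]:
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * a[i][j] for i in range(n)) * b[j][k]
            for j in range(n)
        ])
    return alpha


def backward(a, b, labels):
    """beta[t][i] = Pr(o_{t+1}..o_T | i_t = S_i), as in equation (2)."""
    n = len(a)
    beta = [[1.0] * n]  # beta at the final time step
    for k in reversed(labels[1:]):
        nxt = beta[0]
        beta.insert(0, [
            sum(a[i][j] * b[j][k] * nxt[j] for j in range(n))
            for i in range(n)
        ])
    return beta


# Toy parameters (illustrative values only).
a = [[0.6, 0.4], [0.0, 1.0]]
b = [[0.7, 0.3], [0.2, 0.8]]
labels = [0, 1, 1]
alpha = forward(a, b, labels)
beta = backward(a, b, labels)
# For every t, sum_i alpha[t][i] * beta[t][i] equals Pr(O | model).
totals = [sum(x * y for x, y in zip(at, bt)) for at, bt in zip(alpha, beta)]
```

A useful sanity check is that every entry of `totals` is the same number, because the product $\alpha_t(i)\,\beta_t(i)$ summed over $i$ gives the probability of the whole label sequence at any time $t$.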

[0010]

[Equation 1]

When the word HMMs have been trained in this way, step 7 decomposes each word HMM into environment-dependent phoneme HMMs, for example as in equation (4).

Word HMM "akai" → (O)a(k); (a)k(a); (k)a(i); (a)i(O)
Word HMM "sakae" → (O)s(a); (s)a(k); (a)k(a); (k)a(e); (a)e(O)   ... (4)

These environment-dependent phoneme HMMs are stored in the environment-dependent phoneme HMM dictionary 8. At this point there are two HMMs for (a)k(a), so their average is taken as in equation (5) and stored in the environment-dependent phoneme HMM dictionary 8.

new (a)k(a) = { (a)k(a) of word "akai" + (a)k(a) of word "sakae" } / 2   ... (5)

where "+" denotes arithmetic addition.
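The decomposition of equation (4) and the averaging of equation (5) can be sketched as follows. This is a minimal illustration: it treats each HMM's parameters as a flat list of numbers, which is an assumption, and the names `decompose` and `average_models` are hypothetical. The boundary symbol "O" is the one that appears in equation (4).

```python
from collections import defaultdict


def decompose(phonemes, boundary="O"):
    """List the environment-dependent phoneme labels of one word, as in equation (4)."""
    padded = [boundary] + list(phonemes) + [boundary]
    return [
        f"({padded[i - 1]}){padded[i]}({padded[i + 1]})"
        for i in range(1, len(padded) - 1)
    ]


def average_models(instances):
    """Average parameter lists that share a label, as in equation (5).

    instances: iterable of (label, parameter_list) pairs.
    Returns {label: (mean_parameters, sample_count)}.
    """
    groups = defaultdict(list)
    for label, params in instances:
        groups[label].append(params)
    return {
        label: ([sum(col) / len(rows) for col in zip(*rows)], len(rows))
        for label, rows in groups.items()
    }


akai = decompose("akai")    # ['(O)a(k)', '(a)k(a)', '(k)a(i)', '(a)i(O)']
sakae = decompose("sakae")  # ['(O)s(a)', '(s)a(k)', '(a)k(a)', '(k)a(e)', '(a)e(O)']

# Two instances of (a)k(a), one from each word (parameter values are illustrative).
merged = average_models([
    ("(a)k(a)", [0.2, 0.8]),
    ("(a)k(a)", [0.4, 0.6]),
])
```

Returning the sample count alongside the mean keeps the number of learning samples per environment-dependent phoneme available for the sufficiency check performed later in step 11.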

[0011] In step 10, it is determined whether the environment-dependent phoneme HMMs have converged under a given criterion (that is, whether the difference between the previous and current values of the environment-dependent phoneme HMM parameters is smaller than this criterion). If they have not converged, the environment-dependent phoneme HMMs decomposed in step 7 are concatenated in step 9 to rebuild the word HMMs, as in equation (6), and processing returns to the word HMM learning of step 6, so that the learning and decomposition processes are repeated.

HMM of word "akai" ← (O)a(k) + (a)k(a) + (k)a(i) + (a)i(O)
HMM of word "sakae" ← (O)s(a) + (s)a(k) + (a)k(a) + (k)a(e) + (a)e(O)   ... (6)

If, on the other hand, the determination in step 10 shows convergence, the learning loop ends, and step 11 determines whether the number of learning samples used to learn each environment-dependent phoneme HMM is sufficient. That is, if the number of learning samples $n$ is equal to or greater than a predetermined threshold $m$, learning ends as it is in step 13.
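The two decisions in steps 10 and 11 can be sketched as simple predicates. This is only an illustration: the patent does not give a specific convergence criterion, so the maximum absolute parameter change and the default tolerance used here are assumptions, as are the function names `has_converged` and `needs_smoothing`.

```python
def has_converged(previous, current, tolerance=1e-4):
    """Step 10 check (assumed form): every parameter changed by less than tolerance.

    previous, current: {label: flat parameter list} with the same labels.
    """
    return all(
        abs(p - c) < tolerance
        for label in current
        for p, c in zip(previous[label], current[label])
    )


def needs_smoothing(n, m):
    """Step 11 check: smoothing is applied only when n is below the threshold m."""
    return n < m


prev = {"(a)k(a)": [0.50, 0.50]}
curr_close = {"(a)k(a)": [0.50001, 0.49999]}
curr_far = {"(a)k(a)": [0.60, 0.40]}
```

With these values, `has_converged(prev, curr_close)` is true and `has_converged(prev, curr_far)` is false, while `needs_smoothing(3, 10)` is true and `needs_smoothing(10, 10)` is false, matching the "equal to or greater than $m$" condition in the text.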

[0012] If the number of learning samples $n$ is less than the threshold $m$, step 12 applies weighted averaging to the parameters of the environment-dependent phoneme HMM using the weighting factor $\gamma$ given by equation (7).

$$\gamma = 1 - \frac{n}{m} \qquad (7)$$

For example, in step 12, let the state parameters of the environment-dependent phoneme HMM be $a_{ij}$ ($i = 1, \ldots, 4$; $j = 1, \ldots, 5$) and $b_j(k)$ ($j = 1, \ldots, 4$), and let the state parameters of the environment-independent phoneme HMM be $a_{ij}^{0}$ ($i = 1, \ldots, 4$; $j = 1, \ldots, 5$) and $b_j^{0}(k)$ ($j = 1, \ldots, 4$). The state parameters $a_{ij}^{\mathrm{new}}$ and $b_j^{\mathrm{new}}(k)$ of the new environment-dependent phoneme HMM are then given by equation (8).

$$a_{ij}^{\mathrm{new}} = \gamma \, a_{ij} + (1 - \gamma)\, a_{ij}^{0}$$
$$b_j^{\mathrm{new}}(k) = \gamma \, b_j(k) + (1 - \gamma)\, b_j^{0}(k) \qquad (8)$$

The environment-dependent phoneme HMM dictionary 8 is updated with the new state parameters $a_{ij}^{\mathrm{new}}$ and $b_j^{\mathrm{new}}(k)$ obtained in this way, and learning ends in step 13.

As described above, the first embodiment has the following advantage. In step 11, whether the number of learning samples $n$ is sufficient is judged against the threshold $m$. Then, in step 12, the weighting factor $\gamma$ is determined from this threshold $m$ according to the number of samples $n$ actually used in learning. There is therefore no need to determine the weighting factor $\gamma$ by hand while examining experimental results, which allows rapid learning, and because the weighting factor $\gamma$ reflects the number of learning samples $n$, the bias toward the learning data can be reduced.
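Steps 11 and 12 can be sketched in a few lines. The sketch follows equations (7) and (8) exactly as printed, so the weight applied to the environment-dependent parameters is $\gamma = 1 - n/m$. It treats the parameters of one HMM as a flat list, which is an assumption, and the function names `weighting_factor` and `smooth` are hypothetical.

```python
def weighting_factor(n, m):
    """Equation (7): gamma = 1 - n/m, defined here only for 0 <= n < m."""
    if not 0 <= n < m:
        raise ValueError("equation (7) applies only when 0 <= n < m")
    return 1.0 - n / m


def smooth(dependent, independent, n, m):
    """Equation (8), applied element-wise to flat parameter lists.

    Returns the dependent parameters unchanged when n >= m (step 11).
    """
    if n >= m:
        return list(dependent)
    g = weighting_factor(n, m)
    return [g * d + (1.0 - g) * p0 for d, p0 in zip(dependent, independent)]


# Illustrative values: n = 4 samples, threshold m = 10, so gamma = 0.6.
new_params = smooth([0.9, 0.1], [0.5, 0.5], n=4, m=10)
unchanged = smooth([0.9, 0.1], [0.5, 0.5], n=12, m=10)
```

Because equation (8) is a convex combination, a row of probabilities that sums to 1 in both input models still sums to 1 after smoothing, so no renormalization is needed in this sketch.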

[0013] Second Embodiment

FIG. 3 is a flowchart showing the processing of an HMM learning method according to a second embodiment of the present invention; processing common to FIG. 1 is given the same reference numerals. In the first embodiment, step 12 of FIG. 1 uses the environment-independent phoneme HMMs of the environment-independent phoneme HMM dictionary 4. In the second embodiment, by contrast, step 12A is provided in place of step 12 of the first embodiment, and step 12A performs weighted averaging of the HMM parameters using one-sided environment-dependent phoneme (that is, diphone) HMMs prepared in advance in the diphone and phoneme HMM dictionary 4A.

In the HMM learning method of the second embodiment, as in the first embodiment, the processing of steps 1 to 13 in FIG. 3 is executed using, for example, a program-controlled computer. The processing up to the determination of whether the number of learning samples used to learn the environment-dependent phoneme HMM is sufficient is carried out in steps 1 to 11, as in the first embodiment. If the number of learning samples is sufficient, learning ends as it is in step 13; if it is insufficient, processing proceeds to step 12A.

[0014] In step 12A, the parameters of the environment-dependent phoneme HMM are averaged with the parameters of the one-sided environment-dependent phoneme HMM prepared in advance in the diphone and phoneme HMM dictionary 4A, using the weighting factor $\gamma$ of equation (7). For example, in step 12A, let the state parameters of the environment-dependent phoneme HMM be $a_{ij}$ ($i = 1, \ldots, 4$; $j = 1, \ldots, 5$) and $b_j(k)$ ($j = 1, \ldots, 4$), and let the state parameters of the one-sided environment-dependent phoneme HMM be $a_{ij}^{1}$ ($i = 1, \ldots, 4$; $j = 1, \ldots, 5$) and $b_j^{1}(k)$ ($j = 1, \ldots, 4$). The state parameters $a_{ij}^{\mathrm{new}}$ and $b_j^{\mathrm{new}}(k)$ of the new environment-dependent phoneme HMM are then given by equation (9).

$$a_{ij}^{\mathrm{new}} = \gamma \, a_{ij} + (1 - \gamma)\, a_{ij}^{1}$$
$$b_j^{\mathrm{new}}(k) = \gamma \, b_j(k) + (1 - \gamma)\, b_j^{1}(k) \qquad (9)$$

The environment-dependent phoneme HMM dictionary 8 is updated with the new state parameters $a_{ij}^{\mathrm{new}}$ and $b_j^{\mathrm{new}}(k)$ obtained in this way, and learning ends in step 13. As described above, the second embodiment has the following advantages (1) and (2).
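Step 12A can be sketched as a lookup in a diphone dictionary followed by the averaging of equation (9). Several details here are assumptions made for illustration: the text does not say how a one-sided model is chosen for a given environment-dependent phoneme, so this sketch tries the left-context diphone first and then the right-context one; the diphone label notation "(a)k(*)" and "(*)k(a)" extends the "(*)a(*)" notation of the description; parameters are flat lists; and the function names are hypothetical.

```python
def one_sided_candidates(label):
    """For a label such as "(a)k(e)", return ["(a)k(*)", "(*)k(e)"]."""
    left, rest = label[1:].split(")", 1)
    center, right = rest.split("(", 1)
    right = right.rstrip(")")
    return [f"({left}){center}(*)", f"(*){center}({right})"]


def smooth_with_diphone(label, dependent, n, m, diphone_dict):
    """Equation (9) with gamma from equation (7); unchanged when n >= m."""
    if n >= m:
        return list(dependent)
    for candidate in one_sided_candidates(label):
        if candidate in diphone_dict:
            g = 1.0 - n / m
            one_sided = diphone_dict[candidate]
            return [g * d + (1.0 - g) * p1 for d, p1 in zip(dependent, one_sided)]
    # The method presupposes that a one-sided model has been prepared.
    raise KeyError(f"no one-sided model prepared for {label}")


# Illustrative dictionary 4A content with a single right-context diphone.
diphones = {"(*)k(a)": [0.4, 0.6]}
result = smooth_with_diphone("(a)k(a)", [0.8, 0.2], n=5, m=10, diphone_dict=diphones)
```

Raising an error when no one-sided model is found mirrors the condition stated for the second embodiment, namely that the method is usable when one-sided environment-dependent phoneme HMMs have been prepared in advance.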

[0015] (1) According to the second embodiment, the weighting factor $\gamma$ is calculated as in the first embodiment, so there is no need to determine it by hand, and rapid learning is possible. Moreover, because the weighting factor $\gamma$ reflects the number of learning samples, the bias toward the learning data can be reduced.

(2) Because the second embodiment uses one-sided environment-dependent phoneme HMMs in the weighted averaging of step 12A, it has the following advantage over the first embodiment. The environment-independent phoneme HMMs used in the first embodiment do not take the preceding or following phonological environment into account, so the number of phonemes is very limited. It is therefore easy to make the environment-independent phoneme HMM dictionary 4 complete. On the other hand, the HMM parameters obtained from the environment-independent phoneme HMM dictionary 4 are highly generalized, so the possibility of a discrepancy when learning the environment-dependent phoneme HMMs of individual words cannot be ruled out. By contrast, the one-sided environment-dependent phoneme HMMs used in the second embodiment take either the preceding or the following phonological environment into account, so HMM parameters closer to the environment in which each word is actually used can be obtained. Discrepancies are therefore less likely when learning the environment-dependent phoneme HMMs of individual words. However, the number of phonemes increases with the combinations of phonological environments, which makes it difficult to make the diphone and phoneme HMM dictionary 4A complete. The HMM learning method of the second embodiment can therefore be used when a one-sided environment-dependent phoneme HMM has been prepared in advance for each phoneme that makes up the words to be learned.

[0016] Thus, in the second embodiment, the weighted averaging of step 12A uses one-sided environment-dependent phoneme HMMs prepared in advance, so the parameters can be smoothed and more accurate speech recognition becomes possible.

The present invention is not limited to the above embodiments, and various modifications are possible. Examples of such modifications include the following.

(a) The first and second embodiments describe the HMM learning method using the four-state HMM shown in FIG. 2 as an example, but the method applies equally to HMMs with other numbers of states.

(b) The first and second embodiments describe the HMM learning method for input word speech, but environment-dependent phoneme HMMs can be learned in the same way when the speech of a phrase or sentence is input.

【００１７】[0017]

[Effects of the Invention] As described in detail above, according to the first invention, the parameters are replaced only for an environment-dependent phoneme HMM that has not been sufficiently learned, and they are replaced with parameters obtained by weighted averaging with a weighting factor according to the number of learning samples. This makes it possible to speed up the learning process. Furthermore, bias toward the learning data can be reduced and the parameters can be smoothed without losing the advantages of the conventional environment-dependent phoneme HMM, so highly accurate speech recognition becomes possible. According to the second invention, as in the first invention, the environment-dependent phoneme HMM parameters are averaged with a weighting factor according to the number of learning samples, so the learning process can be sped up. Moreover, because this averaging uses the parameters of the one-sided environment-dependent phoneme HMM, parameters close to the environment in which the individual words (or phrases or sentences) are actually used are obtained. Compared with the first invention, bias toward the learning data can therefore be reduced and the parameters smoothed further without losing the advantages of the conventional environment-dependent phoneme HMM, so even more accurate speech recognition becomes possible.
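
The weighted averaging summarized above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the sample-count threshold, and the linear form of the weighting factor γ are all assumptions made here for demonstration. The only properties taken from the text are that γ depends on the number of learning samples and that smoothing is applied only when that number is judged insufficient. The `backoff` argument stands for the environment-independent phoneme HMM parameters (first invention) or the one-sided environment-dependent phoneme HMM parameters (second invention).

```python
from typing import List, Sequence


def weighting_factor(num_samples: int, threshold: int) -> float:
    """Hypothetical gamma: rises linearly with the sample count, clipped to [0, 1]."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return min(1.0, max(0.0, num_samples / threshold))


def smooth_parameters(
    dependent: Sequence[float],
    backoff: Sequence[float],
    num_samples: int,
    threshold: int,
) -> List[float]:
    """Replace under-trained parameters with a weighted average.

    dependent: parameters of the environment-dependent phoneme HMM
    backoff:   corresponding parameters of the more general model
    threshold: assumed sample count at which training is judged sufficient
    """
    if len(dependent) != len(backoff):
        raise ValueError("parameter vectors must have the same length")
    # Sufficiently trained models are left untouched.
    if num_samples >= threshold:
        return list(dependent)
    gamma = weighting_factor(num_samples, threshold)
    # Convex combination: more samples means more trust in the dependent model.
    return [gamma * d + (1.0 - gamma) * b for d, b in zip(dependent, backoff)]


if __name__ == "__main__":
    # Toy probability vectors for one state; values are illustrative only.
    print(smooth_parameters([0.8, 0.2], [0.4, 0.6], num_samples=5, threshold=10))
```

Because the result is a convex combination, two probability vectors that each sum to 1 yield a smoothed vector that also sums to 1, so no renormalization is needed in this sketch.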

Brief Description of the Drawings

FIG. 1 is a flowchart of the processing of an HMM learning method according to a first embodiment of the present invention.

FIG. 2 is a diagram showing an example of the structure of a word HMM used in a conventional speech recognition method.

FIG. 3 is a flowchart of the processing of an HMM learning method according to a second embodiment of the present invention.

[Description of Reference Numerals]

4 Environment-independent phoneme HMM dictionary
4A Diphone and environment-independent phoneme HMM dictionary
5 Step of the word HMM construction process
6 Step of the word HMM learning process
7 Step of the decomposition process that decomposes the word HMM into environment-dependent phoneme HMMs
8 Environment-dependent phoneme HMM dictionary
9 Step of the connection learning process that connects environment-dependent phoneme HMMs to reconstruct the word HMM
10 Step of the convergence judgment process for the environment-dependent phoneme HMM
11 Step of the judgment process for the number of learning samples
12, 12A Step of the weighted averaging process for HMM parameters

Claims

[Scope of Claims]

1. A learning method for hidden Markov models that learns environment-dependent phoneme hidden Markov models by repeating a learning process, a decomposition process, and a connecting process, wherein the learning process connects environment-independent phoneme hidden Markov models prepared in advance to construct a hidden Markov model of any one of a word, a phrase, or a sentence and learns that hidden Markov model, the decomposition process decomposes the learning result into environment-dependent phoneme hidden Markov models after the learning process, and the connecting process reconnects the decomposed environment-dependent phoneme hidden Markov models to create the hidden Markov model of any one of a word, a phrase, or a sentence, the method being characterized by: counting the number of learning samples used for learning the environment-dependent phoneme hidden Markov model; only when the number of learning samples is judged to be insufficient, calculating a parameter by weighted averaging, with a weighting factor according to the number of learning samples, the parameter of the environment-dependent phoneme hidden Markov model finally decomposed in the repetition and the corresponding parameter of the environment-independent phoneme hidden Markov model; and replacing the parameter of the environment-dependent phoneme hidden Markov model finally decomposed in the repetition with the calculated parameter, thereby learning the environment-dependent phoneme hidden Markov model.

2. A learning method for hidden Markov models that learns environment-dependent phoneme hidden Markov models by repeating a learning process, a decomposition process, and a connecting process, wherein the learning process connects environment-independent phoneme hidden Markov models prepared in advance to construct a hidden Markov model of any one of a word, a phrase, or a sentence and learns that hidden Markov model, the decomposition process decomposes the learning result into environment-dependent phoneme hidden Markov models after the learning process, and the connecting process reconnects the decomposed environment-dependent phoneme hidden Markov models to create the hidden Markov model of any one of a word, a phrase, or a sentence, the method being characterized by: counting the number of learning samples used for learning the environment-dependent phoneme hidden Markov model; only when the number of learning samples is judged to be insufficient, calculating a parameter by weighted averaging, with a weighting factor according to the number of learning samples, the parameter of the environment-dependent phoneme hidden Markov model finally decomposed in the repetition and the corresponding parameter of a one-sided environment-dependent phoneme hidden Markov model prepared in advance; and replacing the parameter of the environment-dependent phoneme hidden Markov model finally decomposed in the repetition with the calculated parameter, thereby learning the environment-dependent phoneme hidden Markov model.