JP3209247B2 - Excitation signal coding for speech - Google Patents

Excitation signal coding for speech

Info

Publication number
JP3209247B2
Authority
JP
Japan
Prior art keywords
speech
vector
excitation vector
noise excitation
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP16558193A
Other languages
Japanese (ja)
Other versions
JPH0720895A (en)
Inventor
健弘 守谷
章俊 片岡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP16558193A priority Critical patent/JP3209247B2/en
Publication of JPH0720895A publication Critical patent/JPH0720895A/en
Application granted granted Critical
Publication of JP3209247B2 publication Critical patent/JP3209247B2/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a high-efficiency speech coding method that digitally encodes a speech signal with as little information as possible, and more particularly to an excitation signal coding method for determining the excitation signal supplied to a linear prediction synthesis filter.

[0002]

2. Description of the Related Art: CELP (Code Excited Linear Prediction) coding is well known as a speech coding method for rates of about 8 kbit/s and below. As shown in FIG. 5, this coding method selects excitation vectors from a noise codebook 11 and an adaptive codebook 12 on a frame-by-frame basis, adjusts their gains in gain adjusters 13 and 14, and feeds them to a linear prediction synthesis filter 15 to synthesize speech. The difference between this synthesized speech and the input speech is obtained in a subtractor 16, the difference output is passed through a perceptual weighting filter 17, distortion is computed from the filter output in a distortion calculator 18, and the excitation signal is selected and the gains adjusted so that this distortion becomes small. The selection criterion for the excitation vectors is to minimize the perceptual error between the synthesized signal and the input signal. Because the excitation vectors are determined by feeding back the waveform that is finally synthesized in this way, high quality is obtained.
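
As a point of reference, the analysis-by-synthesis structure just described can be sketched as follows (a minimal illustration assuming NumPy; the variable names and the use of an explicit impulse response matrix are conventions of this sketch, not taken from the patent):

```python
import numpy as np

def celp_frame_error(x_weighted, H, adaptive_vec, noise_vec, g_a, g_n):
    """Perceptually weighted error for one candidate excitation, per FIG. 5.

    x_weighted   : input speech of the frame after perceptual weighting
    H            : impulse response matrix of synthesis filter 15 combined
                   with perceptual weighting filter 17 (lower triangular, n x n)
    adaptive_vec : excitation vector from adaptive codebook 12
    noise_vec    : excitation vector from noise codebook 11
    g_a, g_n     : gains applied by gain adjusters 14 and 13
    """
    excitation = g_a * adaptive_vec + g_n * noise_vec   # summed excitation
    synthesized = H @ excitation                         # filters 15 + 17
    error = x_weighted - synthesized                     # subtractor 16
    return float(error @ error)                          # distortion calculator 18
```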

[0003] On the other hand, this approach has the drawbacks that a large number of noise excitation vectors must be stored and that the amount of computation required to search the excitation vectors is enormous. Several techniques have therefore been proposed to reduce the storage capacity and the amount of computation; the basic principle of the search is described below, together with an organized introduction of representative computation-reduction methods related to the present invention.

[0004] In the search for a noise excitation vector in CELP coding, as described above, the difference between the output synthesized waveform of the linear prediction synthesis filter 15 and the input speech is taken, and the vector that minimizes the distortion after this difference output has passed through the perceptual weighting filter 17 is selected. Specifically, in a given frame (frame length n samples), let X be the n-dimensional column vector that is the target of quantization, namely the current frame's input speech with the response component from the previous frame and the synthesized component of the adaptive code vector subtracted, passed through the perceptual weighting filter; let C_j (j = 0, ..., m−1) be the n-dimensional column vectors of the m noise excitation vectors in the codebook; let H be the impulse response matrix (n × n) of the synthesis filter 15 with the perceptual weighting filter 17 folded in; and let g be the gain of the noise excitation vector. The search then finds the j that minimizes the distortion d given below.
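
The expression for d itself appears in the source only as an image and is not reproduced there; a form consistent with the gain and distortion expressions (3) and (4) of the next paragraph, stated here as an assumption rather than as the patent's own equation, is

```latex
d = \lVert X - g\,H C_j \rVert^2
  = X^{\mathsf{T}} X - 2 g\, X^{\mathsf{T}} H C_j + g^{2}\,(H C_j)^{\mathsf{T}} (H C_j)
```

Setting the derivative with respect to g to zero gives equation (3), and substituting that gain back in gives equation (4).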

[0005] Here, if the optimal gain is assumed to be available and the gain is quantized after the noise excitation vector has been determined, the optimal gain is
g = X^T H C_j / ((H C_j)^T (H C_j))   (3)
(where X^T denotes the transpose of X), and the distortion in that case is
d = X^T X − (X^T H C_j)^2 / ((H C_j)^T (H C_j))   (4)
It therefore suffices to search for the j that maximizes the second term,
f = (X^T H C_j)^2 / ((H C_j)^T (H C_j))   (5)
Here the synthesis H C_j requires on the order of n^2/2 multiply-accumulate operations, so synthesizing C_j for every j requires an enormous amount of computation. Various proposals have therefore been made.

[0006] First, for the numerator of equation (5), a technique is known that computes X^T H only once in advance and then takes its inner product with C_j; this reduces the amount of computation without introducing any approximation error, since each inner product requires only n operations. However, attempting to reduce the computation of the denominator introduces approximation error. For example, a known method writes the denominator term as (C_j)^T H^T H C_j, approximates H^T H by a Toeplitz-type correlation matrix, and computes the term as the inner product of the autocorrelation function of H and the autocorrelation function of C_j. The denominator term can then be computed with n multiply-accumulate operations, but the approximation error is large, increasing the distortion and degrading quality. Moreover, because a correlation value must be stored for each noise excitation vector C_j, additional storage comparable to that of the codebook itself is required.
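
A sketch of this autocorrelation-based approximation of the denominator (assuming NumPy; the names are illustrative, and the factor of two on the nonzero lags follows from the symmetry of the Toeplitz approximation of H^T H):

```python
import numpy as np

def autocorr(v, max_lag):
    """Autocorrelation of a vector up to max_lag."""
    return np.array([v[: len(v) - l] @ v[l:] for l in range(max_lag + 1)])

def approx_energy(r_h, r_c):
    """Approximate (H C_j)^T (H C_j) = C_j^T H^T H C_j with H^T H taken as Toeplitz.

    r_h : autocorrelation of the combined impulse response (computed once per frame)
    r_c : autocorrelation of the noise excitation vector C_j; storing this per
          codebook entry is the extra memory cost noted above
    """
    return float(r_h[0] * r_c[0] + 2.0 * (r_h[1:] @ r_c[1:]))
```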

[0007] Separately from this, a technique is also known that keeps excitation vectors obtained by low-pass filtering and downsampling C and approximates the denominator term with these vectors. The amount of computation is reduced in proportion to the downsampling ratio, but this method too suffers from approximation error and from the extra storage needed for the downsampled vectors. Another known method divides the search into two stages: in the first stage, several C_j are preselected as candidates by an approximate calculation, and in the second stage equation (5) is evaluated only for those candidates and the maximizing vector is chosen. Possible first-stage approximations include using only the numerator term, or using the approximation methods introduced above. The distortion can be reduced by making the first-stage approximation error smaller or by increasing the number of candidates, but this diminishes the computational savings.

[0008]

SUMMARY OF THE INVENTION: An object of the present invention is, when speech is coded with a small amount of information, to improve the quality of the coded speech while keeping the amount of computation as low as possible, and in particular to provide a speech excitation signal coding method which, combined with the two-stage search introduced among the conventional methods above, greatly reduces the amount of computation at the cost of only a small degradation in distortion.

[0009]

MEANS FOR SOLVING THE PROBLEMS: In the present invention, when a noise excitation vector is selected (or preselected), the denominator term of equation (5), that is, the energy term of the vector, is also taken into account; the characteristic feature is that this energy term is obtained by reading out an estimate stored in advance. The energy estimates are stored beforehand for each noise excitation vector and for each class of synthesis filter. FIG. 1 compares this invention with the conventional methods for the case of a two-stage search. Conventionally, either the energy term was ignored entirely and preselection was based only on the computed numerator of equation (5), as shown in FIG. 1B, or the energy term was obtained by an approximate computation and used to approximate equation (5) for preselection, as shown in FIG. 1C. In this invention, as shown in FIG. 1A, the energy term is estimated by referring to a table based on the classification of the impulse response, and equation (5) is approximated with this estimate for preselection.

[0010]

DESCRIPTION OF THE PREFERRED EMBODIMENTS: FIG. 2 shows the configuration of the essential part of the most basic embodiment of the present invention. In addition to the usual codebook 11 of m n-dimensional noise excitation vectors, this embodiment provides a storage unit 21 in which several kinds of impulse response patterns are stored to represent the classes of the synthesis filter, a selector 22 that selects from the storage unit 21 the code corresponding to an impulse response sequence, and an energy term table 23 whose rows correspond to the impulse response patterns and whose columns correspond to the noise excitation vectors, each element holding a precomputed estimate of the post-synthesis energy or of its reciprocal. The search procedure when a two-stage search is performed is as follows.

[0011] 1. When the characteristics of the synthesis filter and the perceptual weighting filter are given for a frame, the impulse response of the synthesis filter is determined. The selector 22 matches this impulse response against the response patterns stored in advance in the storage unit 21 and determines the impulse response pattern of the current frame.
2. Next, the numerator term of equation (5) is computed for all noise excitation vectors.

[0012] 3. Referring to the table 23 of energy estimates corresponding to the impulse response determined by the selector 22, the value (numerator term / energy estimate) is computed for each noise excitation vector, and the preset number of candidates with the largest values are retained.
4. Equation (5) is computed only for the preselected candidates, and the code of the optimal noise excitation vector is determined.
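
Putting steps 1 to 4 together, a minimal sketch of this two-stage search might look like the following (assuming NumPy; the data layout, the function name, and the use of an explicit H matrix are assumptions of this sketch, not taken from the patent):

```python
import numpy as np

def two_stage_search(x, H, codebook, patterns, energy_table, n_candidates=8):
    """Two-stage noise-excitation search with a table of energy estimates.

    x            : target vector X (perceptually weighted, past response removed)
    H            : n x n impulse response matrix of synthesis + weighting filter
    codebook     : m x n array of noise excitation vectors C_j (codebook 11)
    patterns     : M x n array of representative impulse responses (storage unit 21)
    energy_table : M x m array of precomputed (H C_j)^T (H C_j) per pattern (table 23)
    """
    h = H[:, 0]                       # first column = impulse response of this frame

    # step 1: classify the impulse response (nearest representative pattern, cf. eq. (6) below)
    i = int(np.argmin(np.sum((patterns - h) ** 2, axis=1)))

    # step 2: numerator of eq. (5) for all j: compute X^T H once, then one inner product each
    xth = x @ H
    numer = (codebook @ xth) ** 2

    # step 3: preselect using stored energy estimates in place of exact (H C_j)^T (H C_j)
    score = numer / energy_table[i]
    candidates = np.argsort(score)[-n_candidates:]

    # step 4: exact evaluation of eq. (5) only for the candidates
    best_j, best_f = -1, -np.inf
    for j in candidates:
        hc = H @ codebook[j]
        f = (x @ hc) ** 2 / (hc @ hc)
        if f > best_f:
            best_j, best_f = j, f
    return best_j
```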

[0013] In the above processing, the impulse response can be classified, that is, its pattern determined, either by a method using a distance measure on the impulse response itself, i.e. on the FIR filter coefficients, or by a method using a measure based on an all-pole spectral model that approximates the impulse response. Let h be the impulse response vector determined for each frame (with h^T = {h_0, ..., h_{n-1}}), and let h'_i (i = 0, ..., M−1) be the impulse response vectors stored as the M representative patterns; in the former case, the representative pattern that minimizes e below is selected.

[0014]
e = ‖h − h'_i‖^2   (6)
As a variant of this, one may apply a triangular window W whose weight decreases for the higher-order coefficients, as in the following equation, or compute the distance over the low-order coefficients only:
e = ‖W(h − h'_i)‖^2   (7)
This takes into account that, because the impulse response matrix is lower triangular, the low-order coefficients contribute more to the energy computation.

[0015] On the other hand, when an all-pole spectral model is used, the representative pattern is chosen with the maximum-likelihood distance measure that minimizes the following e:
e = Σ α_i Φ_i   (8)
where the sum runs from i = 0 to p, p is the order of the model, α_i is the i-th linear prediction coefficient of the representative model, and Φ_i is the i-th autocorrelation function of the impulse response; this measure is frequently used in speech recognition. The small number of representative patterns prepared in advance can be created by using, as they are, the representative-pattern design algorithms used in ordinary vector quantization.

[0016] The above embodiment has been described on the basis of the basic CELP coding shown in FIG. 5, but the present invention can also be applied to a scheme in which all noise excitation vectors are orthogonalized to the pitch component before the noise excitation vector search, and to the case where there are noise excitation vectors in a plurality of channels. As a second embodiment, the case where the noise excitation vectors are orthogonalized to the pitch component (the output of the adaptive codebook), an n-dimensional vector P, is now described. In this case, the expression (5) that is finally maximized is modified as follows.

[0017]
f = (X^T H C_j − ρ X^T H P)^2 / ((H C_j)^T (H C_j) − ρ (H C_j)^T (H P))   (9)
where
ρ = (H C_j)^T (H P) / ((H P)^T (H P))   (10)
In equation (9) as well, the numerator term can be computed exactly with comparatively few operations, just as for equation (5), and the same holds for the second term of the denominator, the part multiplied by ρ, which is likewise a sum-of-products computation. Therefore, if the first term of the denominator is read from the table by the same procedure as in the first embodiment, the amount of computation can be reduced. Moreover, even if the computation of the second term of the denominator is omitted altogether and only the first term is used in the evaluation, the distortion hardly increases.

[0018] As a third embodiment, the case of two channels of noise excitation vectors is described. Here too, performing a two-stage search leads to the processing shown in FIG. 3. With C_j as the excitation vector of the first channel and C'_k as the excitation vector of the second channel, the final evaluation measure is as follows.

[0019]

(Equation 11, given in the source only as an image)
However, the preliminary selection for each channel performs the same processing as in the first embodiment: the approximate value of the energy term in the denominator is read from the table, and several candidates maximizing equation (5) are selected. Then, among the combinations of candidates from the two channels, the combination of j and k that maximizes equation (11) is searched for.

[0020] In the above, the energy term table 23 may instead store, for each excitation vector and for each class of synthesis filter, the reciprocal of the estimated post-synthesis energy of the noise excitation vector. In some cases, the preliminary selection may also be taken as the final selection without computing equation (5) exactly.

[0021]

EFFECTS OF THE INVENTION: FIG. 4 shows, in comparison with the conventional method, the relationship between the SNR of the input speech relative to the coding distortion and the total amount of computation of the speech coding process (in MOPS, Mega Operations Per Second: the number of operations, in units of one million per second, required for real-time processing) when this invention is used. In this case the noise excitation consists of two channels, with 7 bits (excluding polarity) allocated to each channel. The dimension n of the noise excitation vectors is 40, and orthogonalization to the pitch component is used in combination. The impulse responses are classified into four types. The conventional method here is the one that performs preselection using only the numerator term of equation (5). The numbers in the figure are the numbers of candidates retained out of 128 for each channel. The figure shows that the present invention improves the SNR with almost no increase in the amount of computation compared with the conventional preselection. If the number of retained candidates is the same, the breakdown of the computation added by this invention is one distance calculation per frame for classifying the impulse response, plus one multiplication by the reciprocal of the denominator term per noise excitation vector in the preselection, which is a very small amount compared with the distance computations of the search. The storage added by this invention is the capacity of the table 21 for classifying the impulse response and of the energy table 23. For example, when the impulse responses are classified into four types as in the case of FIG. 3 and 40-dimensional noise excitation vectors are used, the increase in storage capacity is about 10%.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing the processing of the present invention in a two-stage search in comparison with the conventional methods.

FIG. 2 is a block diagram showing the main part of an embodiment of the present invention.

FIG. 3 is a block diagram showing the main part of an embodiment in which the present invention is applied to the case of two-channel noise excitation vectors.

FIG. 4 is a diagram comparing the relationship between SNR and the amount of computation according to the present invention with that of the conventional method.

FIG. 5 is a block diagram showing the basic principle of the CELP coding method.

Continued on the front page (58) Field surveyed (Int.Cl.7, DB name): G10L 19/12

Claims (1)

(57) [Claims]
[Claim 1] In a speech excitation signal coding method having a speech synthesis model in which a pitch period component vector and a noise excitation vector serve as the excitation source of a linear prediction synthesis filter, and in which a noise excitation vector is selected from a codebook so as to minimize the error between the synthesized speech and the input speech, the method comprising: obtaining, for each noise excitation vector, the inner product of its post-synthesis waveform and the input speech waveform; determining the class of the synthesis filter; reading out, for each noise excitation vector under the determined class, an estimate from a table of post-synthesis energy estimates of the excitation vectors, or from a table of the reciprocals of those estimates, prepared in advance for each noise excitation vector and for each class of synthesis filter; and selecting or preselecting a noise excitation vector on the basis of the value obtained by dividing the square of the inner product by the energy estimate.
JP16558193A 1993-07-05 1993-07-05 Excitation signal coding for speech Expired - Lifetime JP3209247B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP16558193A JP3209247B2 (en) 1993-07-05 1993-07-05 Excitation signal coding for speech

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP16558193A JP3209247B2 (en) 1993-07-05 1993-07-05 Excitation signal coding for speech

Publications (2)

Publication Number Publication Date
JPH0720895A JPH0720895A (en) 1995-01-24
JP3209247B2 true JP3209247B2 (en) 2001-09-17

Family

ID=15815080

Family Applications (1)

Application Number Title Priority Date Filing Date
JP16558193A Expired - Lifetime JP3209247B2 (en) 1993-07-05 1993-07-05 Excitation signal coding for speech

Country Status (1)

Country Link
JP (1) JP3209247B2 (en)

Also Published As

Publication number Publication date
JPH0720895A (en) 1995-01-24

Similar Documents

Publication Publication Date Title
US5208862A (en) Speech coder
US6510407B1 (en) Method and apparatus for variable rate coding of speech
JP3114197B2 (en) Voice parameter coding method
JP2746039B2 (en) Audio coding method
EP0422232B1 (en) Voice encoder
US5323486A (en) Speech coding system having codebook storing differential vectors between each two adjoining code vectors
JP3094908B2 (en) Audio coding device
KR100194775B1 (en) Vector quantizer
JPH056199A (en) Voice parameter coding system
JP2800618B2 (en) Voice parameter coding method
JP3180786B2 (en) Audio encoding method and audio encoding device
JP2624130B2 (en) Audio coding method
JP2002268686A (en) Voice coder and voice decoder
JP3209248B2 (en) Excitation signal coding for speech
JP3209247B2 (en) Excitation signal coding for speech
JPH0854898A (en) Voice coding device
JP2931059B2 (en) Speech synthesis method and device used for the same
JP3153075B2 (en) Audio coding device
JP3471889B2 (en) Audio encoding method and apparatus
JP3144284B2 (en) Audio coding device
JP3192051B2 (en) Audio coding device
JP3299099B2 (en) Audio coding device
JP3194930B2 (en) Audio coding device
JP3089967B2 (en) Audio coding device
JP3471542B2 (en) Audio coding device

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20070713

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080713

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090713

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100713

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110713

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120713

Year of fee payment: 11

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130713

Year of fee payment: 12

EXPY Cancellation because of completion of term