JPH0511799A

JPH0511799A - Speech coding system

Info

Publication number: JPH0511799A
Application number: JP3167270A
Authority: JP
Inventors: Masako Kato; 雅子加藤; Yoshiaki Tanaka; 良紀田中; Tomohiko Taniguchi; 智彦谷口; Hideaki Kurihara; 秀明栗原; Fumio Amano; 文雄天野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-07-08
Filing date: 1991-07-08
Publication date: 1993-01-22

Abstract

(57) [Abstract]

[Object] To speed up speech coding by reducing the amount of computation in error evaluation while keeping degradation of synthesized speech quality to a minimum.

[Configuration] An excitation vector 102 is selected from codebook storage means 101. Speech synthesis processing means 103 takes the excitation vector 102 as input and performs speech synthesis based on linear prediction analysis. Decimation means 106 decimates the element values of an input speech vector 105 and of a synthesized speech vector 104 at a fixed interval. Error evaluation means 109 evaluates the squared error between the decimated input speech vector 107 and the decimated synthesized speech vector 108. This process is repeated for each excitation vector 102, and the input speech vector 105 is encoded based on the parameters that produced the synthesized speech vector with the minimum squared error (maximum correlation value).
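The selection step summarized in the abstract can be sketched as follows. This is only an illustration under assumptions: the function names, the decimation phase (starting at element 0), and the use of plain Python lists are hypothetical, and gain handling and perceptual weighting are left out.

```python
def decimate(vec, step):
    """Keep every `step`-th element, starting at index 0 (assumed phase)."""
    return vec[::step]


def squared_error(u, v):
    """Sum of squared element-wise differences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def select_by_decimated_error(x, synthesized, step):
    """Return the index of the candidate whose decimated squared error is smallest."""
    x_dec = decimate(x, step)
    errors = [squared_error(x_dec, decimate(y, step)) for y in synthesized]
    return min(range(len(errors)), key=errors.__getitem__)


# Hypothetical 8-sample frame and three synthesized candidates.
target = [1, 2, 3, 4, 5, 6, 7, 8]
candidates = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 3, 0, 5, 0, 7, 0],
    [1, 2, 3, 4, 0, 0, 0, 0],
]
best = select_by_decimated_error(target, candidates, step=2)
```

With a step of 2, each error evaluation touches half as many elements as the full comparison, which is where the saving described in the abstract would come from.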

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech coding system for compressing and transmitting speech signals and, more specifically, to a method of evaluating the error between an input speech vector and a synthesized speech vector in a speech coding system that uses, for example, code-excited LPC coding (CELP).

[0002]

[Prior Art] Speech is an important means of human communication, so speech information plays a very large role in society. In today's highly office-automated world, voice communication by telephone is, along with numbers and text, an indispensable means of communication. In recent years in particular, as hardware backed by digital signal processing and LSI technology has become dramatically more powerful and more compact, the expansion of digital communication channels and the digital integration of various services using ISDN (Integrated Services Digital Network) and the like have been advancing rapidly in Japan and abroad.

[0003] In digital communication, the basic measure is determined by the amount of information transmitted per unit time on the line in use (the bit rate). Compression technology for communication information must therefore be developed urgently, both to use networks efficiently and economically and to meet communication demand. Moreover, speech as a means of human communication is required to provide promptness, directive, and alerting functions and, in society, is in many cases used for communications such as business notices and instructions for action. Considering this, a speech coding system that can achieve highly efficient compression of speech signals at low to medium bit rates of 4 to 16 kbit/s is desired, particularly for digital mobile radio systems and in-house corporate communication systems.

[0004] A representative high-efficiency speech coding approach is analysis-synthesis coding, in which the speech signal is analyzed on the basis of a speech production model, parameters are extracted, and those parameters are encoded. Substantial compression of a speech signal requires efficient extraction of the speech information from the signal, and for this a speech production model is indispensable. A representative and practical analysis-synthesis method based on a speech production model is the one based on the linear prediction model, known as LPC (linear predictive coding).

[0005] This method exploits the high correlation between sample values of a speech waveform and expresses the current signal x_i as the sum of a predicted value, which is a linear combination of the past K samples (K is about 10), and the error signal at that time, as in Equation 1 below.

[0006]

[Equation 1]

[0007] The signal can thus be reduced to the expression in Equation 1. Here, a_j are the linear prediction coefficients (LPC coefficients) and e_i is called the prediction error signal. Finding the LPC coefficients under the condition that the mean squared value of the prediction error signal e_i over a fixed interval is minimized is called linear prediction analysis (LPC analysis). Performing LPC analysis on the original speech signal (the speech signal to be encoded) is equivalent to assuming a speech production system based on an all-pole model, whose poles correspond to the spectral peaks, called formants, in the frequency characteristic of mainly the human vocal tract. The LPC coefficients are the system parameters obtained when the original speech signal is approximated by an all-pole speech production model over a fixed interval (for example, several tens of milliseconds) in which the frequency characteristic of the original speech signal can be regarded as stationary. Highly efficient speech coding can be achieved by computing the LPC coefficients through LPC analysis for each such interval and encoding them.
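As an illustration of LPC analysis, the sketch below estimates the coefficients a_j for one frame. The text only states the minimum mean squared error condition; the autocorrelation method with the Levinson-Durbin recursion used here is one common way to satisfy it and is an assumption, as are the function names and the absence of windowing.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[k] = sum_i x[i] * x[i - k]."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def lpc_coefficients(x, order):
    """Estimate a_1..a_K so that x_i is approximated by sum_j a_j * x_{i-j}.

    Assumption: autocorrelation method solved with the Levinson-Durbin
    recursion; the source only states the minimum mean squared error condition.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)      # a[0] is unused so that a[j] matches a_j
    err = r[0]                   # prediction error energy of the order-0 model
    for k in range(1, order + 1):
        if err <= 0.0:           # silent or perfectly predicted frame: stop early
            break
        refl = (r[k] - sum(a[j] * r[k - j] for j in range(1, k))) / err
        prev = a[:]
        a[k] = refl
        for j in range(1, k):
            a[j] = prev[j] - refl * prev[k - j]
        err *= 1.0 - refl * refl
    return a[1:]


# Hypothetical frame: a decaying exponential, which a first-order predictor
# with a_1 close to 0.5 describes almost exactly.
frame = [0.5 ** i for i in range(40)]
coeffs = lpc_coefficients(frame, 2)
```

For this test frame the second coefficient comes out close to zero, since one past sample already predicts the next one.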

[0008] In Equation 1 above, the first term on the right-hand side represents the operation of a speech synthesis filter whose filter coefficients are the K LPC coefficients a_j, and the prediction error signal e_i in the second term can be regarded as the input to the system that produces the filter output x_i, that is, as the sound source. In other words, synthesizing speech in the LPC scheme requires, in addition to the LPC coefficients that define the speech synthesis filter, a sound source signal as the input to that filter. Consequently, encoding speech on the basis of the LPC scheme requires encoding the LPC coefficients and also encoding the prediction error signal e_i efficiently.
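The role of the two terms can be sketched as an all-pole filter driven by a sound source signal. This is a minimal illustration that assumes the sign convention x_i = sum_j a_j x_{i-j} + e_i and a zero initial filter state; the function name is hypothetical.

```python
def lpc_synthesize(a, excitation):
    """Run x_i = sum_{j=1..K} a_j * x_{i-j} + e_i with zero initial state."""
    out = []
    for i, e in enumerate(excitation):
        # Linear combination of the already synthesized past samples.
        predicted = sum(a[j - 1] * out[i - j] for j in range(1, len(a) + 1) if i - j >= 0)
        out.append(predicted + e)
    return out


# A single unit pulse as the sound source, with one hypothetical coefficient.
impulse_response = lpc_synthesize([0.5], [1.0, 0.0, 0.0, 0.0])
```

Feeding a unit pulse shows the filter's impulse response, which here halves at every sample.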

[0009] Equation 1 above can be rewritten as follows.

[0010]

[Equation 2]

[0011] It follows from Equation 2 that the prediction error signal e_i can be obtained by passing the original speech signal x_i through an inverse filter whose characteristic is the inverse of the linear system modeled by LPC analysis. This prediction error signal is also called the residual signal.
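Equations 1 and 2 survive only as placeholders in this text. A reconstruction consistent with the surrounding description (a predicted value formed as a linear combination of the past K samples, plus the error signal) would read as follows; the sign convention of the coefficients is an assumption.

```latex
\begin{align}
x_i &= \sum_{j=1}^{K} a_j \, x_{i-j} + e_i \tag{1} \\
e_i &= x_i - \sum_{j=1}^{K} a_j \, x_{i-j} \tag{2}
\end{align}
```

Read this way, Equation 1 is the synthesis (all-pole) filter driven by e_i, and Equation 2 is the corresponding inverse filter applied to x_i.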

[0012] FIG. 7 shows an example of the time-domain waveform of the residual signal in comparison with the original speech signal. If LPC analysis predicted the speech signal perfectly, the residual signal would be a completely random signal of small amplitude with a flat power spectrum. In practice, however, particularly in voiced segments, which make up most of speech, LPC analysis does not predict the speech signal perfectly, and a time series of periodic pulses appears in the residual signal, as shown in FIG. 7. A noise time series also appears between the pulses. In unvoiced segments, on the other hand, the residual signal is a noise time series that is nearly white in frequency.

[0013] Thus, viewed macroscopically, the residual signal contains a pulse time-series component in voiced segments and, viewed microscopically, a noise time-series component. The pulse time series is known to be synchronized with the pulsed airflow produced by vibration of the vocal cords in the human throat, and the repetition period of the pulses is called the pitch period. The noise time series can be represented approximately, for every fixed number of samples (for example, 64 samples at 8 kHz sampling), by one of a set of typical noise time-series candidates statistically classified into several types.

[0014] Code-excited LPC coding (CELP; the same applies hereinafter) is a speech coding scheme that exploits these facts to encode the residual signal efficiently and faithfully in speech transmission systems with low to medium bit rates of 4 to 16 kbit/s, such as those applicable to mobile radio.

[0015] In the CELP scheme, the transmitter performs the following encoding. First, for each portion of the input speech signal to be transmitted, obtained by dividing the signal into a predetermined number of samples (each portion is called a frame), LPC coefficients corresponding mainly to the vocal tract characteristics are computed by LPC analysis, and the set of LPC coefficients is encoded as the vocal tract information for the current frame and transmitted to the receiver. Next, of the components representing the residual signal obtained by removing from the input speech signal the component expressed by the LPC coefficients, the pulse time-series component and the remaining noise time-series component are encoded separately and transmitted to the receiver on the decoding side. The receiver decodes the noise time-series component and the pulse time-series component from the code data for the residual signal sent by the transmitter and adds the two to decode the residual signal. An LPC synthesis filter is then constructed from the LPC coefficients sent by the transmitter, and the decoded residual signal is fed into it to synthesize the output speech signal.

[0016] Thus, the CELP scheme achieves efficient, high-quality speech coding by encoding the component corresponding to the vocal tract characteristics of speech as LPC coefficients and by encoding the residual signal, which corresponds to the sound source characteristics of speech, separately as a pulse time-series component and a noise time-series component.

[0017] A codebook storing pulse time-series candidates and a codebook storing noise time-series candidates are each provided in both the transmitter and the receiver. The transmitter and the receiver hold identical pulse time-series codebooks and identical noise time-series codebooks.

[0018] In the transmitter, the pulse time series that best approximates the residual signal of each frame is searched for among the candidates in the pulse time-series codebook, and a number (index) indicating the position of that pulse time series in the codebook is transmitted to the receiver. The optimum gain for matching the amplitude of the optimum pulse time series to that of the residual signal is also computed, encoded, and transmitted to the receiver. Next, the optimum pulse time series multiplied by the optimum gain is removed from the residual signal to obtain a residual error signal for the frame. The noise time series that best approximates this residual error signal is then searched for among the candidates in the noise time-series codebook, in the same way as in the search for the pulse time series, and its index is transmitted to the receiver. The optimum gain for matching the amplitude of the optimum noise time series to that of the residual error signal is likewise computed, encoded, and transmitted to the receiver.

[0019] Correspondingly, the receiver looks up the noise time-series codebook with the noise time-series index sent from the transmitter and reads out the corresponding noise time series. Multiplying that noise time series by the optimum gain sent from the transmitter decodes the residual error signal. The receiver then looks up the pulse time-series codebook with the pulse time-series index sent from the transmitter and reads out the corresponding pulse time series. That pulse time series is multiplied by the optimum gain sent from the transmitter, and the residual error signal is added to the resulting pulse time-series signal to decode the residual signal.
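The receiver-side reconstruction of the residual signal can be sketched as below. The function name, the tiny codebooks, and the 0-based indices are illustrative assumptions (the text numbers codebook entries from 1), and quantization of the gains is ignored.

```python
def decode_residual(pulse_codebook, noise_codebook, pulse_index, pulse_gain,
                    noise_index, noise_gain):
    """Rebuild the residual as gain-scaled pulse entry plus gain-scaled noise entry."""
    pulse = pulse_codebook[pulse_index]
    noise = noise_codebook[noise_index]
    # The scaled noise entry is the decoded residual error signal; adding it
    # to the scaled pulse entry gives the decoded residual signal.
    return [pulse_gain * p + noise_gain * c for p, c in zip(pulse, noise)]


# Hypothetical 4-sample codebooks shared by transmitter and receiver.
pulse_cb = [[1, 0, 0, 0], [0, 0, 1, 0]]
noise_cb = [[1, -1, 1, -1], [2, 0, -2, 0]]
residual = decode_residual(pulse_cb, noise_cb, 1, 2.0, 0, 0.5)
```

Because both sides hold identical codebooks, only the two indices and the two gains need to be transmitted for the residual.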

[0020] The CELP processing in which the transmitter searches the pulse time-series codebook for the optimum pulse time series and optimum gain approximating the residual signal, and searches the noise time-series codebook for the optimum noise time series and optimum gain approximating the residual error signal, is described below with reference to the principle block diagram of FIG. 8. In FIG. 8, 801 is the part that performs the former processing and 802 is the part that performs the latter.

[0021] To find, among the candidates in the pulse time-series codebook, the pulse time series that best approximates the residual signal of a frame, one could simply compute the error between the residual signal of each frame obtained by LPC analysis of the input speech signal (see Equation 2) and each pulse time-series candidate in the codebook, and select the candidate that minimizes it. In practice, however, the pulse time series that minimizes the error as perceived by human hearing should be selected, so the error is evaluated at the level of the speech signal. Moreover, the error is evaluated not on the speech signal as it is but on a speech signal whose frequency characteristic has been weighted to match human hearing.

[0022] That is, in part 801 of FIG. 8, the perceptual weighting unit 803 first weights the frequency characteristic of the input speech vector s for one frame (an input speech signal consisting of multiple samples) to match human hearing, and outputs the weighted input speech vector x (a weighted input speech signal consisting of multiple samples).

[0023] Meanwhile, the pulse vector codebook 807 (the pulse time-series codebook) stores multiple pulse vectors (pulse time series), each consisting of multiple samples. Let one pulse vector be denoted p(m), where m is the index of the pulse vectors in the pulse vector codebook 807; if the number of pulse vectors is α, then 1 ≤ m ≤ α.

[0024] Now the error evaluation unit 804 instructs the readout unit 806 to read the first pulse vector, m = 1. The first pulse vector (pulse time series) p(1), read from the pulse vector codebook 807 by the readout unit 806, is then input to the weighted short-term prediction unit 805.

[0025] The weighted short-term prediction unit 805 performs the processing of the LPC synthesis filter formed from the LPC coefficients a_j (1 ≤ j ≤ K), taking perceptual weighting into account, and thereby computes the synthesized speech vector y(1) (a synthesized speech signal consisting of multiple samples) corresponding to the first pulse vector p(1). The LPC coefficients a_j (1 ≤ j ≤ K) are obtained by applying K-th order LPC analysis (not shown) to the input speech vector s.

[0026] The error evaluation unit 804 computes the error between the synthesized speech vector y(1) corresponding to the first pulse vector p(1) and the weighted input speech vector x. When the error evaluation for the first pulse vector is finished, the error evaluation unit 804 updates m to 2 and has the readout unit 806 read the next pulse vector p(2), corresponding to m = 2, from the pulse vector codebook 807.

[0027] In this way, while updating m from 1 to α, the error evaluation unit 804 computes, for the synthesized speech vector y(m) corresponding to each pulse vector p(m) read in turn from the pulse vector codebook 807, the error with respect to the same weighted input speech vector x obtained by the perceptual weighting unit 803. When the error evaluation has been completed for all pulse vectors p(m) (1 ≤ m ≤ α), it outputs m_opt, the value of the index m corresponding to the pulse vector with the smallest error, together with the optimum gain g.

[0028] A method that evaluates the error while synthesizing the speech signal (speech vector) in this way is generally called analysis-by-synthesis (the A-b-S method). With the A-b-S method the residual signal need not be obtained directly from the input speech signal, but speech synthesis must be performed every time a codebook entry is searched.
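The search loop described above can be sketched as follows. This is a simplified illustration, not the circuit of FIG. 8: perceptual weighting is omitted, the gain is fixed at 1, indices are 0-based, and the helper names and test data are hypothetical.

```python
def synthesize(a, excitation):
    """All-pole synthesis y_i = sum_j a_j * y_{i-j} + p_i with zero initial state."""
    out = []
    for i, p in enumerate(excitation):
        predicted = sum(a[j - 1] * out[i - j] for j in range(1, len(a) + 1) if i - j >= 0)
        out.append(predicted + p)
    return out


def analysis_by_synthesis_search(a, codebook, target):
    """Synthesize every codebook entry and keep the one closest to the target."""
    best_index, best_error = None, float("inf")
    for m, entry in enumerate(codebook):
        y = synthesize(a, entry)            # one synthesis per candidate
        error = sum((t - v) ** 2 for t, v in zip(target, y))
        if error < best_error:
            best_index, best_error = m, error
    return best_index, best_error


# Hypothetical data: the target is the synthesis of the second entry.
lpc = [0.5]
pulse_codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
target = synthesize(lpc, pulse_codebook[1])
found = analysis_by_synthesis_search(lpc, pulse_codebook, target)
```

The call to `synthesize` inside the loop is the per-candidate cost that the text identifies as the price of the A-b-S method.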

[0029] The error evaluation method of the error evaluation unit 804, which takes gain into account, is described next. First, the error power Ed between the weighted input speech vector x and the synthesized speech vector y(m) is given by the following equation.

[0030]

[Equation 3]

[0031] Here, the synthesized speech vector y(m) is synthesized in the weighted short-term prediction unit 805 by the following filter operation.

[0032]

[Equation 4]

[0033] The subscript i on the weighted input speech vector x and on the synthesized speech vector y(m) denotes the element number within the speech samples of one frame (one set) of N samples, with 1 ≤ i ≤ N.

[0034] The gain g that minimizes the error power Ed of Equation 3 is obtained by differentiating Equation 3 with respect to g and setting the result to zero, which gives the following equation.

[0035]

[Equation 5]

[0036] Substituting Equation 5 into Equation 3 gives the error power Ed at this point as follows.

[0037]

[Equation 6]

[0038] The first term on the right-hand side of Equation 6 is the power of the weighted input speech vector x; it remains constant even when the pulse vector p(m) is changed and the synthesized speech vector y(m) changes accordingly. The pulse vector p(m) that minimizes the error power Ed can therefore be found as the one that maximizes the second term on the right-hand side of Equation 6, namely the quantity given by Equation 7 below.

[0039]

[Equation 7]

[0040] Equation 7 computes a kind of correlation between the weighted input speech vector x and the synthesized speech vector y(m). Finding the pulse vector p(m) that maximizes Equation 7 is therefore nearly equivalent to finding the candidate for which the correlation between the two speech vectors is largest.
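Equations 3 and 5 to 7 appear only as placeholders in this text. The derivation described in the surrounding paragraphs can be reconstructed as follows, assuming the gain g scales the synthesized vector inside the squared error and writing A for the criterion that the text later calls "the value A". Equation 4 (the weighted filter operation) is not reconstructed here, because the text does not state how the perceptual weighting enters the filter.

```latex
\begin{align}
E_d &= \sum_{i=1}^{N} \bigl( x_i - g \, y_i(m) \bigr)^2 \tag{3} \\
\frac{\partial E_d}{\partial g} = 0
  \;\Rightarrow\;
g &= \frac{\sum_{i=1}^{N} x_i \, y_i(m)}{\sum_{i=1}^{N} y_i(m)^2} \tag{5} \\
E_d &= \sum_{i=1}^{N} x_i^2
      - \frac{\bigl( \sum_{i=1}^{N} x_i \, y_i(m) \bigr)^2}{\sum_{i=1}^{N} y_i(m)^2} \tag{6} \\
A &= \frac{\bigl( \sum_{i=1}^{N} x_i \, y_i(m) \bigr)^2}{\sum_{i=1}^{N} y_i(m)^2} \tag{7}
\end{align}
```

Expanding Equation 3 gives $\sum x_i^2 - 2g \sum x_i y_i(m) + g^2 \sum y_i(m)^2$; inserting the gain of Equation 5 turns the last two terms into $-A$, which is why minimizing $E_d$ and maximizing $A$ select the same candidate.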

[0041] Based on this principle of error evaluation, the error evaluation unit 804 of FIG. 8 evaluates Equation 7 for the synthesized speech vector y(m) corresponding to each pulse vector p(m) and outputs, as m_opt, the value of the index m for which the value A is ultimately largest. That is, the optimum pulse vector and the optimum synthesized speech vector that maximize Equation 7 are p(m_opt) and y(m_opt).

[0042] The error evaluation unit 804 then outputs the optimum gain g by evaluating Equation 5 with the optimum synthesized speech vector y(m_opt). As a result of this processing, the optimum synthesized speech vector y(m_opt) is output from the error evaluation unit 804 to the multiplier 808. The multiplier 808 multiplies y(m_opt) by the optimum gain g, and the subtractor 809 subtracts the product from the weighted input speech vector x, yielding the weighted input speech error vector ex.
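The selection, gain computation, and subtraction steps can be sketched together as below. The sketch assumes the criterion A is the squared correlation divided by the energy of the synthesized vector and that the gain is the correlation divided by that energy, which is how Equations 7 and 5 are read from the surrounding text; names, 0-based indices, and data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def select_gain_and_error_vector(x, synthesized):
    """Pick the candidate maximizing A, then return (m_opt, g, ex = x - g * y)."""
    def criterion(y):
        energy = dot(y, y)
        return dot(x, y) ** 2 / energy if energy > 0 else 0.0

    m_opt = max(range(len(synthesized)), key=lambda m: criterion(synthesized[m]))
    y = synthesized[m_opt]
    energy = dot(y, y)
    g = dot(x, y) / energy if energy > 0 else 0.0     # optimum gain for the winner
    ex = [xi - g * yi for xi, yi in zip(x, y)]        # target for the noise search
    return m_opt, g, ex


# Hypothetical weighted input vector and synthesized candidates.
x = [2.0, 4.0, 1.0, 0.0]
candidates = [[1.0, 0.0, 0.0, 0.0], [1.0, 2.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
m_opt, g, ex = select_gain_and_error_vector(x, candidates)
```

In this sketch the remaining error energy `dot(ex, ex)` equals the power of `x` minus the winning value of A, matching the relation between the error power and the criterion.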

[0043] Next, when searching the candidates in the noise time-series codebook for the noise time series that best approximates the residual error signal of a frame, the error is likewise evaluated at the level of the perceptually weighted speech signal.

[0044] That is, in part 802 of FIG. 8, the noise vector codebook 813 (the noise time-series codebook) stores multiple noise vectors (noise time series), each consisting of multiple samples. Let one noise vector be denoted c(n), where n is the index of the noise vectors in the noise vector codebook 813; if the number of noise vectors is β, then 1 ≤ n ≤ β.

[0045] The control operation of the error evaluation unit 810 is almost the same as that of the error evaluation unit 804 described above. While updating n from 1 to β, the error evaluation unit 810 computes the error between the weighted input speech error vector ex and the synthesized speech error vector ey(n) that the weighted short-term prediction unit 811 synthesizes from each noise vector c(n) read in turn from the noise vector codebook 813. When the error evaluation has been completed for all noise vectors c(n) (1 ≤ n ≤ β), it outputs n_opt, the value of the index n corresponding to the noise vector with the smallest error, together with the optimum gain b.

[0046] The error evaluation in the error evaluation unit 810, including the gain, is based on Equations 8 to 12 below, which correspond to Equations 3 to 7 above.

[0047]

[Equation 8]

[0048]

[Equation 9]

[0049]

[Equation 10]

[0050]

[Equation 11]

[0051]

[Equation 12]

[0052] That is, the error evaluation unit 810 evaluates Equation 12 for the synthesized speech error vector ey(n) that the weighted short-term prediction unit 811 synthesizes according to Equation 9 from each noise vector c(n) read from the noise vector codebook 813 by the readout unit 812, and outputs, as n_opt, the value of the index n for which the value A is ultimately largest. The error evaluation unit 810 then outputs the optimum gain b by evaluating Equation 10 with the optimum synthesized speech error vector ey(n_opt).

[0053] Corresponding to the transmitter operation described above, the receiver looks up its own noise vector codebook, which has exactly the same configuration as 813 in FIG. 8, with the index n_opt sent from the transmitter and reads out the corresponding optimum noise vector c(n_opt). Multiplying it by the optimum gain b sent from the transmitter decodes the residual error signal. The receiver then looks up its own pulse vector codebook, which has exactly the same configuration as 807 in FIG. 8, with the index m_opt sent from the transmitter and reads out the corresponding optimum pulse vector p(m_opt). This is multiplied by the optimum gain g sent from the transmitter, and the residual error signal is added to the resulting pulse vector to decode the residual signal.

[0054]

[Problems to Be Solved by the Invention] In the conventional CELP scheme based on the A-b-S method described above, speech synthesis must be performed every time a codebook entry is searched, so a very large amount of computation is needed to obtain the optimum indices m_opt and n_opt and the optimum gains g and b.

[0055] In particular, obtaining high-quality synthesized speech requires a large size β for the noise vector codebook 813 of FIG. 8, for the following reason. There are not very many kinds of pulse vectors needed to approximate the residual signal, whereas the residual error signal, obtained by removing the pulse vector component from the residual signal, has a nearly white frequency characteristic, and a very large variety of patterns can appear as time-domain vector patterns. Many noise vector patterns are therefore needed to approximate such a residual error signal accurately.

[0056] Because a large noise vector codebook 813 is required, the number of times the error evaluation unit 810 of FIG. 8 calculates equation (12) increases, and the overall amount of computation becomes enormous.

[0057] This problem is not limited to the CELP method. It arises in the same way in, for example, the multi-pulse coding method, which is one method of coding the residual signal, when the squared error between the input speech and the synthesized speech is evaluated repeatedly based on the A-b-S method.

[0058] An object of the present invention is to speed up speech coding by reducing the amount of computation in error evaluation while keeping degradation of synthesized speech quality to a minimum.

【００５９】[0059]

[Means for Solving the Problems] FIG. 1 is a block diagram of the present invention. The invention presupposes a speech coding system that encodes an input speech vector while repeatedly performing speech synthesis and evaluating the squared error between the resulting synthesized speech vector and the input speech vector. More specifically, a speech coding system based on, for example, the code-excited linear prediction (CELP) method is applied. In this method, the input speech vector 105 is first subjected to linear prediction analysis and its linear prediction coefficients are encoded. In parallel, an excitation vector 102 is selected from the codebook storage means 101, the speech synthesis processing means 103 performs speech synthesis based on that excitation vector and the linear prediction coefficients, and the residual signal of the input speech vector 105 is vector-quantized while the squared error between the resulting synthesized speech vector 104 and the input speech vector 105 is evaluated repeatedly. Other methods, such as the multi-pulse coding method, which is one method of coding the residual signal, can also be adopted as the speech synthesis method.

[0060] The invention first has thinning means 106, which thins out the element values of the input speech vector 105 and the element values of the synthesized speech vector 104 at a fixed interval.

[0061] Next, it has error evaluation means 109, which evaluates the squared error between the thinned input speech vector 107 and the thinned synthesized speech vector 108 obtained when the thinning means 106 thins out the element values of the input speech vector 105 and the synthesized speech vector 104. The error evaluation means 109 calculates, for example, the correlation value between the thinned input speech vector 107 and the thinned synthesized speech vector 108.

[0062] By repeating the squared error evaluation described above, the input speech vector 105 is encoded based on the parameters that produced the synthesized speech vector with the minimum squared error (maximum correlation value).

[0063] In the configuration described above, the squared error evaluation can be performed after weighting based on auditory characteristics. Specifically, perceptual weighting is applied to the input speech vector 105, and the speech synthesis processing means 103 performs speech synthesis that takes the perceptual weighting into account, generating a perceptually weighted synthesized speech vector 104. Thinning by the thinning means 106 and error evaluation by the error evaluation means 109 are then performed on these two vectors.

【００６４】[0064]

[Operation] In a speech coding system that encodes the input speech vector 105 while repeatedly performing speech synthesis and evaluating the squared error between the resulting synthesized speech vector 104 and the input speech vector 105, each squared error evaluation requires operations such as multiplications between the elements of the synthesized speech vector 104 and the input speech vector 105. In the code-excited linear prediction method, for example, if the number of excitation vectors 102 stored in the codebook storage means 101 is increased to improve the quality of the decoded speech, the squared error must be evaluated for every one of those excitation vectors, so an enormous amount of computation is required overall.

[0065] The present invention therefore exploits the fact that a speech signal is highly correlated over several adjacent samples: the error evaluation means 109 evaluates the error between the thinned input speech vector 107 and the thinned synthesized speech vector 108, which the thinning means 106 obtains by thinning out the element values of the input speech vector 105 and the synthesized speech vector 104.

[0066] This reduces the amount of computation for error evaluation to a fraction of its original value while keeping degradation of synthesized speech quality at the decoder to a minimum, so the speech coding device can be made faster and its circuit scale can be reduced.

【００６７】[0067]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. The embodiment consists broadly of the LPC encoding unit of FIG. 2 and the residual encoding unit of FIG. 3.

[0068] The embodiment is characterized by the configuration and operation of the error evaluation unit 310 of FIG. 3, described later.

<Description of LPC Encoding Unit>

FIG. 2 is a block diagram of the LPC encoding unit.

[0069] The input speech signal, sampled at a sampling frequency of 8 kHz, is held in the buffer unit 201 as the input speech vector s, for example one frame of 320 samples at a time. The encoding of the 320-sample input speech vector s is executed in real time within 320 sampling periods.

[0070] The LPC analysis unit 202 performs LPC analysis on the 320-sample input speech vector s and outputs one set of LPC coefficients a_j (1 ≤ j ≤ K). PARCOR analysis or LSP analysis, for example, can be used as the specific method of LPC analysis, and the order K of the LPC analysis is, for example, 10.
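The text does not say which algorithm computes the coefficients. As one possibility only, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion, which yields both predictor coefficients and reflection coefficients (often called PARCOR coefficients, up to a sign convention). The function names and the sign convention s[n] ≈ Σ a_j s[n − j] are assumptions for this example, not taken from the patent.

```python
def autocorrelation(s, max_lag):
    """r[k] = sum over n of s[n] * s[n - k], for k = 0..max_lag."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, k): a[j-1] is the predictor coefficient a_j in
    s[n] ~ sum_j a_j * s[n - j]; k holds the reflection coefficients.
    """
    a = [0.0] * (order + 1)      # a[0] is unused
    k = []
    err = r[0]                   # prediction error energy so far
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k_i = acc / err          # assumes err > 0, i.e. a non-silent frame
        prev = a[:]
        a[i] = k_i
        for j in range(1, i):
            a[j] = prev[j] - k_i * prev[i - j]
        err *= 1.0 - k_i * k_i
        k.append(k_i)
    return a[1:], k


# Toy autocorrelation of a first-order process (hypothetical values).
lpc, parcor = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For the embodiment's parameters one would call `levinson_durbin(autocorrelation(s, 10), order=10)` on each 320-sample frame, typically after windowing, which is left out here.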

[0071] The set of tenth-order LPC coefficients a_j obtained by the LPC analysis unit 202 for every 320 samples of the input speech vector s is encoded by the encoding unit 203 and then transmitted to a speech decoding device (not shown).

[0072] The encoded LPC coefficients a_j are also decoded in the decoding unit 204, by the same process as the decoding in the speech decoding device, and are used in the residual encoding unit described later. The decoded LPC coefficients a_j are used so that the quantization error arising when the LPC coefficients a_j are encoded is included in the residual signal; encoding that residual signal keeps to a minimum the degradation of decoded speech quality caused by encoding the LPC coefficients a_j.

<Description of Residual Encoding Unit>

FIG. 3 is a block diagram of the residual encoding unit.

[0073] In FIG. 3, the parts with reference numerals 301 to 313 correspond, in numerical order, to parts 801 to 813 in the principle configuration of the CELP method of FIG. 8 described in the "Prior Art" section. The basic configuration of FIG. 3 is the same as the principle configuration of FIG. 8, with additions needed to realize the CELP method as an actual residual encoding device.

[0074] First, the perceptual weighting unit 303 weights the frequency characteristic of one frame (= 320 samples) of the input speech vector s output from the buffer unit 201 of the LPC encoding unit of FIG. 2 so that it matches human hearing, and the resulting 64-sample weighted input speech vector x is held in the buffer unit 314. The processing of the sections 301 and 302 is then performed in succession on the 64-sample weighted input speech vector x held in the buffer unit 314.

[0075] The processing of one subframe (= 64 samples) in the perceptual weighting unit 303 and the sections 301 and 302 is executed for five consecutive subframes, which completes the processing of one frame (= 320 samples) of the input speech vector s.
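The frame and subframe arithmetic can be checked with a short sketch. The constants come from the text (8 kHz, 320 samples per frame, 64 samples per subframe); the function name is a hypothetical helper introduced for illustration.

```python
SAMPLE_RATE_HZ = 8000
FRAME_LEN = 320       # samples per frame
SUBFRAME_LEN = 64     # samples per subframe


def split_into_subframes(frame, subframe_len=SUBFRAME_LEN):
    """Split one frame into consecutive, non-overlapping subframes."""
    if len(frame) % subframe_len != 0:
        raise ValueError("frame length must be a multiple of the subframe length")
    return [frame[i:i + subframe_len]
            for i in range(0, len(frame), subframe_len)]


frame = list(range(FRAME_LEN))                # stand-in for 320 speech samples
subframes = split_into_subframes(frame)       # 5 subframes of 64 samples
frame_ms = 1000 * FRAME_LEN / SAMPLE_RATE_HZ  # frame duration in milliseconds
```

With these values a frame lasts 40 ms, so the two codebook searches for all five subframes must finish within that time for the encoder to run in real time.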

[0076] Accordingly, the encoding device outputs the following to the decoding device (not shown): from the LPC encoding unit of FIG. 2, one set of tenth-order LPC coefficients a_j per 320 samples; and from the residual encoding unit of FIG. 3, five pairs of optimum index m_opt and optimum gain g per 320 samples and, likewise, five pairs of optimum index n_opt and optimum gain b per 320 samples.

Description of the first-stage residual encoding process

More specifically, in the section 301, the following error evaluation is first performed on each subframe of the 64-sample weighted input speech vector x held in the buffer unit 314.

[0077] FIG. 4 shows the configuration of the error evaluation unit 304 of FIG. 3. The error evaluation unit 304 consists of an evaluation value calculation unit 401, a maximum evaluation value storage unit 402, a synthesized speech vector storage unit 403, an index storage unit 404, an index counter unit 405, and a gain calculation unit 406. The description below refers to FIG. 3 and FIG. 4.

[0078] The adaptive codebook 307 is similar to the pulse vector codebook 807 of FIG. 8 and stores α pulse vectors p(m) (1 ≤ m ≤ α). Each pulse vector has 64 elements (samples) in the example above; that is, p(m) = p_i(m) (1 ≤ i ≤ 64). In this embodiment, the pulse vectors p(m) are updated by an updating unit 327 described later.

[0079] The evaluation value calculation unit 401 in the error evaluation unit 304 first sets the index counter unit 405 to the initial value 1, and the index counter unit 405 outputs m = 1 to the reading unit 306. The reading unit 306 then reads the first pulse vector p(1) from the adaptive codebook 307 and outputs it to the weighted short-term prediction unit 305.

[0080] The weighted short-term prediction unit 305 calculates the synthesized speech vector y(1) corresponding to the first pulse vector p(1) by running the LPC synthesis filter of equation (4), which is built from the LPC coefficients a_j (1 ≤ j ≤ K), with perceptual weighting taken into account. These LPC coefficients a_j (1 ≤ j ≤ K) are supplied from the encoding unit 203 of FIG. 2.
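Equation (4) is not reproduced in this text, so the following is only a sketch of a generic all-pole LPC synthesis filter. The sign convention, the zero initial filter state, and the `gamma` parameter (a common way to fold perceptual weighting into the coefficients as a_j·γ^j) are all assumptions for illustration and may differ from the patent's filter.

```python
def lpc_synthesize(excitation, a, gamma=1.0):
    """All-pole filter: y[n] = excitation[n] + sum_j (a_j * gamma**j) * y[n - j].

    `a` holds a_1..a_K. The filter starts from a zero state, which a real
    encoder would replace with the memory left by the previous subframe.
    """
    weighted = [a_j * gamma ** (j + 1) for j, a_j in enumerate(a)]
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for j, w_j in enumerate(weighted, start=1):
            if n - j >= 0:
                acc += w_j * y[n - j]
        y.append(acc)
    return y


# A single pulse through a first-order filter (hypothetical coefficient).
y = lpc_synthesize([1.0, 0.0, 0.0, 0.0], a=[0.5])
```

Each codebook entry costs one such filtering pass of about K multiply-adds per sample, which is why the size of the codebook dominates the encoder's workload.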

[0081] The evaluation value calculation unit 401 in the error evaluation unit 304 calculates the evaluation value A of equation (7) for the synthesized speech vector y(1) (= y_i(1), 1 ≤ i ≤ 64) corresponding to the first pulse vector p(1) and one subframe (= 64 samples) of the weighted input speech vector x (= x_i, 1 ≤ i ≤ 64) read from the buffer unit 314.

[0082] When the error evaluation for the first vector is finished, the evaluation value calculation unit 401 stores the evaluation value A, the first synthesized speech vector y(1), and the index m = 1 in the maximum evaluation value storage unit 402, the synthesized speech vector storage unit 403, and the index storage unit 404, respectively.

[0083] Next, on instruction from the evaluation value calculation unit 401, the index counter unit 405 increments its counter value, the index m, from 1 to 2. The reading unit 306 then reads the second pulse vector p(2) from the adaptive codebook 307 and outputs it to the weighted short-term prediction unit 305.

[0084] While having the index counter unit 405 step m from 1 to α in this way, the evaluation value calculation unit 401 performs the following control. It calculates the evaluation value A of equation (7) for the synthesized speech vector y(m) (= y_i(m), 1 ≤ i ≤ 64) corresponding to the pulse vector p(m) read from the adaptive codebook 307 and one subframe of the weighted input speech vector x (= x_i, 1 ≤ i ≤ 64) output from the buffer unit 314. It then compares this evaluation value A with the value stored in the maximum evaluation value storage unit 402. If the newly calculated value A is larger, it replaces the content of the maximum evaluation value storage unit 402 with the new value A, and replaces the contents of the synthesized speech vector storage unit 403 and the index storage unit 404 with the current synthesized speech vector y(m) and the current index m, respectively. If the newly calculated value A is less than or equal to the stored value, no replacement is made.
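The search loop described in paragraphs [0079] to [0084] can be sketched as below. Equation (7) is not reproduced in this text; the sketch assumes the evaluation value A is the squared cross-correlation divided by the energy of the synthesized vector, which is the usual CELP criterion but is an assumption here. The synthesized vectors are passed in ready-made, so the synthesis filter step is left out, and all names are hypothetical.

```python
def evaluation_value(x, y):
    """Assumed form of A: (sum x_i*y_i)^2 / sum y_i^2."""
    corr = sum(x_i * y_i for x_i, y_i in zip(x, y))
    energy = sum(y_i * y_i for y_i in y)
    return corr * corr / energy if energy > 0 else 0.0


def search_codebook(x, synthesized):
    """Return (m_opt, y_opt, A_max); m is 1-based as in the text.

    `synthesized[m - 1]` stands for y(m), the synthesis-filter output
    for codebook entry m.
    """
    best_m = 1                                  # index storage unit
    best_y = synthesized[0]                     # synthesized vector storage unit
    best_a = evaluation_value(x, best_y)        # maximum evaluation value storage unit
    for m in range(2, len(synthesized) + 1):
        y = synthesized[m - 1]
        a = evaluation_value(x, y)
        if a > best_a:                          # ties keep the earlier entry
            best_m, best_y, best_a = m, y, a
    return best_m, best_y, best_a


x = [1.0, 2.0, 3.0, 4.0]
candidates = [[1.0, 0.0, 0.0, 0.0], [2.0, 4.0, 6.0, 8.0], [0.0, 0.0, 0.0, 1.0]]
m_opt, y_opt, a_max = search_codebook(x, candidates)
```

The strict `>` comparison mirrors the rule in the text that nothing is replaced when the new value is less than or equal to the stored one.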

[0085] When the error evaluation has been completed for all index values m (1 ≤ m ≤ α), the index storage unit 404 holds the index m = m_opt of the pulse vector that maximized the evaluation value A of equation (7), and the synthesized speech vector storage unit 403 holds the corresponding optimum synthesized speech vector y(m_opt).

[0086] The evaluation value calculation unit 401 then outputs to the gain calculation unit 406 the weighted input speech vector x (= x_i, 1 ≤ i ≤ 64) output from the buffer unit 314 and the optimum synthesized speech vector y(m_opt) (= y_i(m_opt), 1 ≤ i ≤ 64) read from the synthesized speech vector storage unit 403. The gain calculation unit 406 calculates equation (5) from these two vectors and outputs the optimum gain g, which is encoded by the encoding unit 318.
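Equation (5) is not reproduced in this text. Assuming it is the least-squares gain that minimizes the squared error between x and g·y(m_opt), which is the standard result but an assumption here, the gain calculation can be sketched as:

```python
def optimal_gain(x, y_opt):
    """Least-squares gain: g = (sum x_i*y_i) / (sum y_i^2).

    Assumed form of equation (5); returns 0.0 for an all-zero y_opt.
    """
    energy = sum(y_i * y_i for y_i in y_opt)
    if energy == 0:
        return 0.0
    return sum(x_i * y_i for x_i, y_i in zip(x, y_opt)) / energy


# y_opt is exactly twice x, so the best gain is one half.
g = optimal_gain([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Under this assumption the gain is needed only for the winning entry, so it is computed once per subframe rather than once per codebook entry.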

[0087] In this way, the optimum index m_opt from the evaluation value calculation unit 401 and the encoded optimum gain g from the encoding unit 318 are sent to the speech decoding device (not shown).

[0088] The evaluation value calculation unit 401 also outputs the optimum index m_opt read from the index storage unit 404 to the reading unit 306. The reading unit 306 reads the optimum pulse vector p(m_opt) (= p_i(m_opt), 1 ≤ i ≤ 64) corresponding to m_opt from the adaptive codebook 307 and outputs it to the buffer unit 322. The data in the buffer unit 322 is used in the update of the adaptive codebook 307 described later.

[0089] Following this, the evaluation value calculation unit 401 outputs the optimum synthesized speech vector y(m_opt) (= y_i(m_opt), 1 ≤ i ≤ 64) read from the synthesized speech vector storage unit 403 to the multiplication unit 308 of FIG. 3. The optimum gain g encoded by the encoding unit 318 is decoded in the decoding unit 319, by the same process as the decoding in the speech decoding device (not shown), and then input to the multiplication unit 308. As explained for the decoding unit 204, the decoded optimum gain is used in order to remove the influence of the quantization error caused by encoding. The multiplication unit 308 multiplies the optimum synthesized speech vector y(m_opt) by the value of the optimum gain g decoded by the decoding unit 319, and the subtraction unit 309 subtracts this product from one subframe of the weighted input speech vector x (= x_i, 1 ≤ i ≤ 64) output from the buffer unit 314, giving one subframe of the weighted input speech error vector ex (= ex_i, 1 ≤ i ≤ 64).

Description of the second-stage residual encoding process

Next, in the section 302 of FIG. 3, the following error evaluation is performed on each subframe of the 64-sample weighted input speech error vector ex held in the buffer unit 316.
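The hand-off from the first stage to the second stage can be sketched as follows. The uniform gain quantizer with step 0.25 is purely a stand-in, since the text does not describe how the encoding unit 318 and decoding unit 319 represent the gain; the function names are hypothetical.

```python
def quantize_gain(g, step=0.25):
    """Stand-in for encoding unit 318: map the gain to an integer code."""
    return round(g / step)


def dequantize_gain(code, step=0.25):
    """Stand-in for decoding unit 319: the value the decoder will see."""
    return code * step


def first_stage_error(x, y_opt, g, step=0.25):
    """ex_i = x_i - g_hat * y_i(m_opt), using the decoded gain g_hat."""
    g_hat = dequantize_gain(quantize_gain(g, step), step)
    return [x_i - g_hat * y_i for x_i, y_i in zip(x, y_opt)]


# g = 0.6 is quantized to 0.5, and the error is formed with 0.5.
ex = first_stage_error([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], g=0.6)
```

Forming ex with the decoded gain rather than the exact one means the second stage also has to absorb the gain's quantization error, so the encoder and decoder stay in step.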

[0090] FIG. 5 shows the configuration of the error evaluation unit 310 of FIG. 3. Corresponding to parts 401 to 406 of the error evaluation unit 304 of FIG. 4, the error evaluation unit 310 has an evaluation value calculation unit 501, a maximum evaluation value storage unit 502, a synthesized speech error vector storage unit 503, an index storage unit 504, an index counter unit 505, and a gain calculation unit 506. In addition it has thinning units 507 and 508, which are the most significant feature of this embodiment. The description below refers to FIG. 3 and FIG. 5.

[0091] The stochastic codebook 313 is similar to the noise vector codebook 813 of FIG. 8 and stores β noise vectors c(n) (1 ≤ n ≤ β). Each noise vector has 64 elements (samples) in this embodiment; that is, c(n) = c_i(n) (1 ≤ i ≤ 64).

[0092] Like unit 401 of FIG. 4, the evaluation value calculation unit 501 performs the following control while having the index counter unit 505 step n from 1 to β. For each noise vector c(n) read from the stochastic codebook 313 by the reading unit 312, the weighted short-term prediction unit 311 synthesizes a synthesized speech error vector ey(n) (= ey_i(n), 1 ≤ i ≤ 64) based on equation (9), which is input through the buffer unit 317. The evaluation value calculation unit 501 calculates an error evaluation value for this vector and one subframe of the weighted input speech error vector ex (= ex_i, 1 ≤ i ≤ 64) output from the buffer unit 316. It then compares the evaluation value with the value stored in the maximum evaluation value storage unit 502. If the newly calculated value is larger, it replaces the content of the maximum evaluation value storage unit 502 with the new value, and replaces the contents of the synthesized speech error vector storage unit 503 and the index storage unit 504 with the current synthesized speech error vector ey(n) and the current index n, respectively. If the newly calculated value is less than or equal to the stored value, no replacement is made.

[0093] Just as equation (7), the evaluation formula used in the evaluation value calculation unit 401 of FIG. 4, was determined from equations (3) to (6), the evaluation formula used in the evaluation value calculation unit 501 is determined from equations (8) to (11). The evaluation formula basically derived from these equations is equation (12), which was also used in the conventional example. A major feature of this embodiment is that equation (12) is modified into equation (13) below.

【００９４】[0094]

【数１３】 [Equation 13]
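The equation image is not reproduced in this text. As a hedged reconstruction only, and assuming that equation (12) has the usual CELP form of squared cross-correlation divided by energy (equation (12) is likewise not reproduced here, so this is an assumption), a form consistent with thinning at a fixed interval M would be:

```latex
A = \frac{\left( \displaystyle\sum_{i=1}^{N\,(\mathrm{step}\ M)} ex_i \, ey_i(n) \right)^{2}}
         {\displaystyle\sum_{i=1}^{N\,(\mathrm{step}\ M)} ey_i(n)^{2}}
```

In this notation both sums take i = 1, 1 + M, 1 + 2M, and so on up to N, so setting M = 1 would recover the unthinned evaluation.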

[0095] In the conventional example, as in equation (12), the evaluation value A was calculated using all the element values (signal sample values) of one subframe of the weighted input speech error vector ex (= ex_i, 1 ≤ i ≤ 64) and of one synthesized speech error vector ey(n) (= ey_i(n), 1 ≤ i ≤ 64). In this embodiment, by contrast, the evaluation value A is calculated as in equation (13) while the element values of each vector are thinned out at a fixed interval M. Speech signals and synthesized speech signals are well known to have short-term correlation of about order 8 to 10. The evaluation value A calculated with thinning, as in equation (13), can therefore be expected not to differ greatly from the value calculated without thinning, as in equation (12). In equation (13), "N (step M)" indicates that the element index i runs from 1 up to N in increments of M.
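The thinned evaluation can be sketched in a few lines. The form of A used here, squared cross-correlation over energy, is an assumption, since the equation images are not reproduced in this text; the index rule (i = 1, 1 + M, ... up to N) is taken from the paragraph above, and the function name is hypothetical.

```python
def evaluation_value_thinned(ex, ey, step):
    """Evaluate A using only every `step`-th sample of both vectors.

    The 1-based indices 1, 1 + M, 1 + 2M, ... of the text are the 0-based
    positions 0, M, 2M, ... used here. step=1 gives the unthinned value.
    """
    positions = range(0, len(ex), step)
    corr = sum(ex[i] * ey[i] for i in positions)
    energy = sum(ey[i] * ey[i] for i in positions)
    return corr * corr / energy if energy > 0 else 0.0


ex = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ey = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

a_full = evaluation_value_thinned(ex, ey, step=1)     # uses all 8 samples
a_thinned = evaluation_value_thinned(ex, ey, step=4)  # uses samples 1 and 5
```

The absolute value of A changes with thinning, but the search only compares A across codebook entries, so what matters is whether the ranking of the entries is preserved.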

[0096] For this embodiment it has been confirmed experimentally that, for thinning based on equation (13) with M up to about 4, the decoded speech achieves an S/N (or segmental S/N) that is almost as good as that obtained with equation (12).

[0097] FIG. 6 compares the amount of computation when the evaluation value A is calculated by equation (12) and by equation (13). It shows that, for example, with N = 64 samples per subframe and a thinning interval M = 4, calculating the evaluation value A by equation (13) requires only about one quarter as many additions and multiplications.
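FIG. 6 itself is not reproduced in this text, so its exact figures are unknown. The sketch below only counts operations for an assumed form of A (squared cross-correlation over energy, with one division left uncounted); the counting rules and function names are assumptions for illustration.

```python
def terms_per_sum(n, step):
    """Number of sample positions visited: i = 1, 1 + M, ... up to N."""
    return len(range(0, n, step))


def evaluation_cost(n, step):
    """Return (multiplications, additions) for one evaluation of A.

    Assumed counting: ex*ey and ey*ey per visited sample, one extra
    multiplication to square the correlation, and two running sums.
    """
    terms = terms_per_sum(n, step)
    multiplications = 2 * terms + 1
    additions = 2 * (terms - 1)
    return multiplications, additions


full = evaluation_cost(64, 1)      # no thinning
thinned = evaluation_cost(64, 4)   # thinning interval M = 4
```

Under these assumptions the counts fall from (129, 126) to (33, 30), which agrees with the roughly one-quarter figure stated in the text.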

[0098] Based on this principle, the error evaluation unit 310 of FIG. 3 or FIG. 5 calculates the evaluation value A as follows. First, one subframe of the weighted input speech error vector ex (= ex_i, 1 ≤ i ≤ 64) output from the buffer unit 316 is thinned out at the sample interval M by the thinning unit 507 and input to the evaluation value calculation unit 501. Likewise, one synthesized speech error vector ey(n) (= ey_i(n), 1 ≤ i ≤ 64) output from the buffer unit 317 is thinned out at the sample interval M by the thinning unit 508 and input to the evaluation value calculation unit 501. These operations correspond to the element index i in equation (13) running from 1 up to N in increments of M. The evaluation value calculation unit 501 performs the rest of the calculation in equation (13).

[0099] The synthesized speech error vector storage unit 503 stores the unthinned synthesized speech error vector ey(n) (= ey_i(n), 1 ≤ i ≤ 64), not the thinned one. This is because calculating the optimum gain, described later, without thinning improves the quality of the speech synthesized by the speech decoding device, and because the optimum gain is calculated only once per subframe, so the amount of computation does not increase much even without thinning.

[0100] With the above computation of the evaluation value A as the basic step, once the error evaluation has been completed for all index values n (1 ≤ n ≤ β), the index storage unit 504 holds the index n = n_opt of the noise vector that maximized the evaluation value A of Equation 13, and the synthesized speech error vector storage unit 503 holds the corresponding optimum synthesized speech error vector ey(n_opt) (= ey_i(n_opt), 1 ≤ i ≤ 64).
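The search over all index values can be sketched as below. This is a minimal sketch, not the patented circuit: it assumes the candidate vectors ey(n) are already available as a list (in the device they are synthesized one at a time), it assumes the same correlation-over-energy form of A as a stand-in for Equation 13, and `search_codebook` is a hypothetical name.

```python
def evaluation_value(ex, ey, m):
    """Assumed form of A over samples thinned out at interval m."""
    cross = sum(x * y for x, y in zip(ex[::m], ey[::m]))
    energy = sum(y * y for y in ey[::m])
    return cross * cross / energy if energy else 0.0


def search_codebook(ex, candidates, m):
    """Return (n_opt, ey_opt).

    n_opt is the 1-based index that maximizes A on the thinned-out samples;
    ey_opt is the corresponding vector kept at full length, as the storage
    unit 503 is described as doing.
    """
    best_a = -1.0
    n_opt, ey_opt = None, None
    for n, ey in enumerate(candidates, start=1):
        a = evaluation_value(ex, ey, m)
        if a > best_a:
            best_a, n_opt, ey_opt = a, n, ey
    return n_opt, ey_opt
```

Only the comparison uses thinned-out samples; the vector that is kept for the later gain computation is the full one.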

[0101] After this, the evaluation value calculation unit 501 outputs to the gain calculation unit 506 the weighted input speech error vector ex (= ex_i, 1 ≤ i ≤ 64) output from the buffer unit 316 and the optimum synthesized speech error vector ey(n_opt) (= ey_i(n_opt), 1 ≤ i ≤ 64) read from the synthesized speech vector storage unit 403. The gain calculation unit 506 outputs the optimum gain b by evaluating Equation 10 on these two vectors, as in the conventional example. The optimum gain b is then encoded by the encoding unit 320.
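The gain computation on the full-length vectors might look like the following sketch. Equation 10 is not reproduced in this passage, so the code assumes the usual least-squares gain, b = Σ ex_i·ey_i / Σ ey_i², summed over all 64 samples without thinning; `optimum_gain` is a hypothetical name.

```python
def optimum_gain(ex, ey_opt):
    """Least-squares gain minimizing sum (ex_i - b * ey_i)^2.

    Assumed stand-in for Equation 10; every sample is used (no thinning).
    """
    energy = sum(y * y for y in ey_opt)
    if energy == 0.0:
        return 0.0  # no usable contribution from an all-zero vector
    cross = sum(x * y for x, y in zip(ex, ey_opt))
    return cross / energy


# If ex is exactly twice ey_opt, the gain that removes the error is 2.
print(optimum_gain([2.0, 4.0, -6.0], [1.0, 2.0, -3.0]))
```

Because this runs once per subframe rather than once per codebook entry, using all samples here costs far less than it would inside the search loop.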

[0102] In this way, the optimum index n_opt from the evaluation value calculation unit 501 and the encoded optimum gain b from the encoding unit 320 are each sent to a speech decoding device (not specifically shown).

Description of the update process of the adaptive codebook 307

Finally, the update process of the adaptive codebook 307 is described.

[0103] After completing the above error evaluation for one subframe (= 64 samples), the evaluation value calculation unit 501 outputs the optimum index n_opt read from the index storage unit 504 to the reading unit 312. The reading unit 312 reads the optimum noise vector c(n_opt) (= c_i(n_opt), 1 ≤ i ≤ 64) corresponding to this optimum index n_opt from the stochastic codebook 313 and outputs it to the multiplication unit 324. Meanwhile, the buffer unit 322 outputs to the multiplication unit 323 the part of the 64-sample optimum pulse vector p(m_opt) (= p_i(m_opt), 1 ≤ i ≤ 64) that corresponds to the 64-sample optimum noise vector c(n_opt). The multiplication unit 323 multiplies the optimum pulse vector p(m_opt) by the value obtained by decoding the optimum gain g in the decoding unit 319, and the multiplication unit 324 multiplies the optimum noise vector c(n_opt) by the value obtained by decoding the optimum gain b in the decoding unit 321. The decoded optimum gains are used here, as explained for the decoding unit 204, to remove the effect of the quantization error introduced by encoding. The two products are then added by the addition unit 325 and stored in the buffer unit 326.
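The combination performed by the multiplication units 323 and 324 and the addition unit 325 can be sketched as follows. The uniform quantizer is purely an assumption made to show why decoded gains are used (the text does not specify how gains are encoded), and all function names are hypothetical.

```python
STEP = 0.25  # hypothetical quantizer step size, for illustration only


def encode_gain(gain):
    """Hypothetical uniform quantizer: gain -> integer code."""
    return round(gain / STEP)


def decode_gain(code):
    """Inverse of encode_gain: integer code -> reconstructed gain."""
    return code * STEP


def update_vector(p_opt, c_opt, g_dec, b_dec):
    """Per-sample g_dec * p_i + b_dec * c_i, the data held in buffer unit 326."""
    return [g_dec * p + b_dec * c for p, c in zip(p_opt, c_opt)]


g, b = 0.53, 2.1                       # unquantized optimum gains (made up)
g_dec = decode_gain(encode_gain(g))    # 0.5: what the decoder will also see
b_dec = decode_gain(encode_gain(b))    # 2.0
print(update_vector([1.0, 2.0], [3.0, 4.0], g_dec, b_dec))
```

Using `g_dec` and `b_dec` rather than `g` and `b` keeps the encoder's stored excitation identical to what a decoder can rebuild from the transmitted codes.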

[0104] The 64 samples of update data obtained in the buffer unit 326 in this way represent the optimum residual signal that the speech decoding device (not specifically shown) would obtain for the current 64-sample interval. The updating unit 327 writes this optimum residual signal into the adaptive codebook 307 as a new set of pulse vectors. At this time, the oldest set of pulse vectors in the codebook is discarded.

[0105] As described above, the adaptive codebook 307 stores, newest first, the sets of optimum residual signals obtained up to the current frame in the processing at 301 in FIG. 3, and these data are used for the pulse vector search in the next subframe. Because the residual signal repeats a similar shape from one subframe to the next, using these data as the first-stage approximation of the residual signal allows the residual signal to be encoded more accurately. The speech decoding device (not specifically shown) updates its adaptive codebook in exactly the same way, so the data on the two sides never become inconsistent.
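The newest-first update with discard of the oldest entry behaves like a fixed-length queue. The sketch below models the adaptive codebook as a `collections.deque` of 64-sample sets; this container choice and the function names are assumptions for illustration and say nothing about how the codebook memory is actually organized in the device.

```python
from collections import deque

SUBFRAME = 64  # samples per subframe, as in the embodiment


def make_codebook(num_sets):
    """Codebook of num_sets all-zero entries; index 0 is the newest."""
    return deque(([0.0] * SUBFRAME for _ in range(num_sets)), maxlen=num_sets)


def update_codebook(codebook, residual):
    """Write the new residual as the newest entry.

    Because the deque has a fixed maxlen, appending on the left drops the
    oldest entry from the right automatically.
    """
    if len(residual) != SUBFRAME:
        raise ValueError("expected one subframe of samples")
    codebook.appendleft(list(residual))


codebook = make_codebook(3)
for value in (1.0, 2.0, 3.0, 4.0):
    update_codebook(codebook, [value] * SUBFRAME)
print([entry[0] for entry in codebook])  # newest first; the 1.0 set is gone
```

Running the same `update_codebook` calls on the decoder side with the same decoded data keeps both codebooks identical, which is the consistency property the paragraph above relies on.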

[0106] The above description covers an embodiment in which the present invention is applied to the CELP scheme, one of the speech coding schemes. The present invention is not limited to this, however. It can be applied in the same way, for example, to processing in the multi-pulse coding scheme, one of the residual signal coding schemes, in which the squared error between the input speech and the synthesized speech is evaluated repeatedly based on the A-b-S (analysis-by-synthesis) method.

[0107]

[Effects of the Invention] According to the present invention, exploiting the high correlation between a few adjacent samples of a speech signal, the thinning means thins out the element values of the input speech vector and of the synthesized speech vector, and the error evaluation means 109 evaluates the error on the resulting thinned-out input speech vector and thinned-out synthesized speech vector. This reduces the amount of computation for error evaluation to a fraction of the original while keeping degradation of the quality of the speech synthesized on the decoding device side to a minimum, which makes it possible to speed up the speech encoding device and reduce its circuit scale.

[Brief description of drawings]

FIG. 1 is a block diagram of the present invention.

FIG. 2 is a configuration diagram of the LPC encoding unit in the speech encoding device according to the present invention.

FIG. 3 is a configuration diagram of the residual encoding unit in the speech encoding device according to the present invention.

FIG. 4 is a configuration diagram of the error evaluation unit 304 in the speech encoding device according to the present invention.

FIG. 5 is a configuration diagram of the error evaluation unit 310 in the speech encoding device according to the present invention.

FIG. 6 is a diagram comparing the amounts of computation of Equation 12 and Equation 13.

FIG. 7 is a diagram showing a speech signal and a residual signal.

FIG. 8 is a diagram showing the principle configuration of the CELP scheme.

[Explanation of symbols]

101 Codebook storage means
102 Excitation vector
103 Speech synthesis processing means
104 Synthesized speech vector
105 Input speech vector
106 Thinning means
107 Thinned-out input speech vector
108 Thinned-out synthesized speech vector
109 Error evaluation means

Continuation of front page

(72) Inventor: Hideaki Kurihara, c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Fumio Amano, c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa

Claims

[Claims]

1. A speech coding system that encodes an input speech vector while repeating evaluation of the squared error between the input speech vector and a synthesized speech vector obtained by speech synthesis processing, the system comprising: thinning means for thinning out, at a fixed interval, each of the element values of the input speech vector and the element values of the synthesized speech vector; and error evaluation means for evaluating the squared error between the thinned-out input speech vector and the thinned-out synthesized speech vector obtained by the thinning means.

2. A speech coding system based on code-excited linear prediction, including processing in which an excitation vector (102) is selected from codebook storage means (101), speech synthesis processing (103) based on linear prediction analysis is performed on the excitation vector, and the residual signal of an input speech vector (105) is vector-quantized while evaluation of the squared error between the resulting synthesized speech vector (104) and the input speech vector (105) is repeated, the system comprising: thinning means (106) for thinning out, at a fixed interval, each of the element values of the input speech vector (105) and the element values of the synthesized speech vector (104); and error evaluation means (109) for evaluating the squared error between the thinned-out input speech vector (107) and the thinned-out synthesized speech vector (108) obtained by the thinning means.

3. A speech coding system based on code-excited linear prediction, including processing in which an excitation vector is selected from codebook storage means, speech synthesis processing based on linear prediction analysis is performed on the excitation vector, and the residual signal of an input speech vector is vector-quantized while evaluation of the squared error between the resulting synthesized speech vector and the input speech vector is repeated after weighting based on auditory characteristics, the system comprising: thinning means for thinning out, at a fixed interval, each of the element values of the input speech vector and the element values of the synthesized speech vector; and error evaluation means for evaluating the squared error between the thinned-out input speech vector and the thinned-out synthesized speech vector obtained by the thinning means.

4. The speech coding system according to any one of claims 1 to 3, wherein the error evaluation means calculates a correlation value between the thinned-out input speech vector and the thinned-out synthesized speech vector, and the input speech vector is encoded based on the parameters that produced the synthesized speech vector for which the correlation value is maximum.