JPH10260699A

JPH10260699A - Voice coding method and apparatus

Info

Publication number: JPH10260699A
Application number: JP10024615A
Authority: JP
Inventors: Lin Yin; イーンリン
Original assignee: Nokia Mobile Phones Ltd
Current assignee: Nokia Oyj
Priority date: 1997-02-07
Filing date: 1998-02-05
Publication date: 1998-09-29
Also published as: AU5664898A; FR2759510A1; FI970553L; GB2322776A; DE19804584A1; FI970553A0; SE9800338D0; FI970553A7; SE9800338L; WO1998035447A3; WO1998035447A2; CN1199959A; GB9802611D0; CN1202513C; GB2322776B

Abstract

(57) [Abstract] (as amended)
[Problem] To provide an audio coding method and apparatus capable of achieving a high compression gain.
[Solution] A method of coding an audio electrical signal using backward adaptive prediction. A first time frame of the audio electrical signal to be coded is received and transformed into the frequency domain using a modified discrete cosine transform (MDCT). The resulting frequency spectrum has 1024 spectral components. Subsequent time frames of the audio electrical signal are then received, and the discrete cosine transform is applied to each in turn to generate a stream of spectral data values for each spectral component. For each stream, a set of prediction coefficients is calculated for each spectral value using a predetermined number of previously received consecutive spectral values of that stream. Using the set of linear prediction coefficients, a predicted spectral value is generated, and the error between the predicted spectral value and the corresponding actual spectral value is calculated.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a method of coding and decoding an electronic signal and to apparatus for carrying out such a method.

[0002]

[Prior art] It is well known that transmitting data in digital form increases the signal-to-noise ratio and the information capacity of a transmission channel. Nevertheless, there remains a demand to increase channel capacity further by compressing digital signals substantially. For audio signals, two basic compression principles are conventionally applied. The first removes statistical or deterministic redundancy in the source signal, while the second suppresses or removes components of the source signal that are redundant as far as human perception is concerned. Recently the latter principle has become dominant in high-quality audio applications. It typically involves separating the audio signal into a number of frequency components (often called 'subbands'), each of which is analysed and quantized with a quantization accuracy chosen to remove irrelevancy (for the listener). The ISO (International Standards Organisation) MPEG (Moving Pictures Expert Group) audio coding standard and other audio coding standards employ and further define this principle. However, MPEG (and other standards) also uses a technique known as 'adaptive prediction' to reduce the data rate further.

[0003] One particular form of adaptive prediction is known as 'backward adaptive lattice prediction'. Fuchs et al., 'Improving MPEG Audio Coding by Backward Adaptive Linear Stereo Prediction', AES Convention, New York, Preprint 4086, Oct. 1995, describes one such backward adaptive lattice prediction algorithm. For each spectral value (the 'current' value) of each frequency component, backward adaptive lattice prediction generates a set of prediction coefficients in the coder from the previously calculated spectral values of that component (via the intermediate calculation of quantized spectral values). These coefficients are then used to predict the current spectral value. The error between the current spectral value and the predicted spectral value is determined, and this error value is transmitted (after quantization) to the receiver. It will be seen that, at any given time, the current prediction coefficients are effectively derived from all previously received sample values. At the receiver, the coefficients are calculated in the same way, and a reconstructed spectral value is obtained by combining the predicted spectral value with the received error value.

[0004] In some algorithms using backward adaptive prediction, a measure of the compression achieved is often determined during the compression process, and the error values are sent only if a positive compression gain is achieved. Otherwise, the actual quantized frequency component signals are transmitted instead.

[0005]

[Problems to be solved by the invention] The new MPEG-2 AAC standard uses psychoacoustic modelling with 1024 frequency components together with backward adaptive linear prediction. The new MPEG-4 VM standard is expected to have similar requirements. However, such a large number of frequency components incurs a large computational overhead because of the complexity of the prediction algorithm, and also requires a large amount of memory to store the calculated coefficients. In addition, with backward adaptive lattice prediction, even when the predictor is switched 'off' (for example, when no compression benefit is obtained by sending error values), the decoder must continue to determine the coefficients so that the predictor can be switched 'on' again when required without any temporary degradation in performance. This gives rise to additional computational overhead.

[0006] It is an object of the present invention to overcome, or at least mitigate, one or more of the above disadvantages.

[0007] This object is achieved by using a backward adaptive prediction algorithm that operates on a relatively large number of frequency components of the audio signal to be coded and that calculates the prediction coefficients for a component from a predetermined number of previously received sample values of that component.

[0008]

[Means for solving the problems] According to a first aspect of the present invention, there is provided a method of coding an audio electrical signal using backward adaptive prediction, the method comprising: (a) receiving a first time frame of the audio electrical signal to be coded; (b) transforming the time frame into the frequency domain to generate a frequency spectrum having 512 or more spectral components; (c) receiving subsequent time frames of the audio electrical signal and repeating step (b) for each of these frames in turn to generate a stream of spectral data values for each spectral component; and (d) for each said stream, calculating a set of prediction coefficients for each spectral value using the covariance of a predetermined number of previously determined reconstructed spectral values of the stream, using the set of prediction coefficients to generate a predicted spectral value, and calculating the error between the predicted spectral value and the corresponding actual spectral value, wherein the calculated errors provide a coded representation of the spectral value stream, and the errors can be recombined with the predicted spectral values to obtain reconstructed spectral values.

[0009] Unlike conventional backward adaptive prediction algorithms, the method of the present invention does not derive a set of prediction coefficients from all preceding spectral values. That is, the prediction coefficients are recalculated for each spectral value rather than merely adapted from a previously calculated set. Consequently, the decoder need not keep updating the coefficients during periods when the predictor is switched off.

[0010] Backward adaptive prediction algorithms that calculate the prediction coefficients from the covariance of a predetermined number of previous spectral values are generally not suitable for coding audio signals subdivided into a relatively small number of frequency subbands (e.g. 32). They have, however, been found to be suitable when the audio signal is subdivided into a relatively large number of frequency subbands (e.g. 1024, as specified in the draft MPEG-4 standard). This is because, when a large number of subbands is defined, the order of the prediction algorithm (i.e. the number of prediction coefficients) can be made low, and algorithms embodying the invention offer high performance and are computationally efficient at low orders. Preferably, the prediction order is 1 or 2. More preferably, the prediction order is 2.

[0011] Preferably, said predetermined number of previously received consecutive spectral values is used to obtain a corresponding number of quantized spectral values. It is therefore the quantized values that are used to calculate the prediction coefficients.

[0012] Preferably, the time windows taken from the audio signal overlap. For example, each window may contain 2048 sample points, with adjacent windows overlapping by 50%. However, the windows may instead be contiguous.
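The windowing described here can be illustrated with a minimal Python sketch. The function name `split_into_frames` and its parameters are hypothetical and chosen only for illustration; the sketch assumes plain rectangular slicing and leaves out the window shape and the transform itself.

```python
def split_into_frames(samples, frame_len=2048, overlap=0.5):
    """Slice a sample sequence into overlapping frames.

    With frame_len=2048 and overlap=0.5 the hop between frame starts is
    1024 samples, so each frame shares half its samples with the next.
    """
    hop = int(frame_len * (1 - overlap))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


# Illustrative input: 4096 dummy PCM samples.
pcm = list(range(4096))
frames = split_into_frames(pcm)
```

Setting `overlap=0.0` in this sketch gives contiguous, non-overlapping windows, the alternative mentioned in the paragraph.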

[0013] In some embodiments of the invention, a new set of prediction coefficients may be calculated for each and every spectral value. In other embodiments, however, it may be computationally more efficient to recalculate the prediction coefficients only for every second or third (or other multiple) spectral value and to use the same coefficients for several consecutive spectral values. It may be appropriate to switch between a low coefficient update rate (e.g. every second value) and a high update rate (e.g. every spectral value) as soon as a transition in the audio signal is detected.

[0014] The lower limit on the predetermined number of previously received sample points used to calculate each set of prediction coefficients is determined by the required coding quality. Preferably, however, the number is four or more. The upper limit on this number is determined by memory and computational constraints. Preferably, the number is 10 or less. More preferably, the predetermined number is 6.

[0015] Any suitable method of estimating the prediction coefficients may be used, for example the autocorrelation method. However, the least squares method has been found to be particularly advantageous.

[0016] Preferably, the prediction coefficients used to calculate the predicted spectral values are linear prediction coefficients.

[0017] It will be appreciated that the present invention is intended for use together with psychoacoustic correction, and that the quantization of the error signal can be controlled accordingly.

[0018] According to a second aspect of the present invention, there is provided a method of decoding an audio electrical signal coded using the method of the first aspect above, the method comprising: receiving as an input signal a series of error values corresponding to the coded audio signal; for each stream, determining a corresponding predicted spectral component value for each error value using a set of prediction coefficients, the prediction coefficients being calculated using the covariance of a predetermined number of previously determined consecutive predicted spectral components for that stream, and combining the error value with the predicted spectral value to provide a reconstructed spectral value; and combining the reconstructed spectral values of all the streams and performing a frequency-to-time transform so as substantially to reconstruct the audio signal.

[0019] It will be appreciated that the details of a particular embodiment of the coding method determine most of the details (e.g. the prediction order) of the corresponding embodiment of the decoding method.

[0020] According to a third aspect of the present invention, there is provided apparatus for coding an audio electrical signal using backward adaptive prediction, the apparatus comprising: an input for receiving the audio electrical signal to be coded; a time-to-frequency domain transformer for transforming received time frames of the received signal from the time domain to the frequency domain to provide a frequency spectrum having 512 or more spectral components; and signal processing means associated with each spectral component for receiving the associated spectral values as a stream, calculating a set of prediction coefficients for each spectral value using the covariance of a predetermined number of previously reconstructed spectral values, using the set of prediction coefficients to generate a predicted spectral value, and calculating the error between the predicted value and the corresponding actual spectral value, wherein the calculated errors provide a coded representation of the received spectral value stream, and the errors can be recombined with the predicted spectral values to obtain reconstructed spectral values.

[0021] According to a fourth aspect of the present invention, there is provided apparatus for decoding an audio electrical signal coded using the apparatus of the third aspect of the invention above, the apparatus comprising: an input for receiving a series of error values corresponding to the coded audio signal; and signal processing means for separating the series of values into separate spectral component streams and for determining, for each error value, a corresponding predicted spectral component value using a set of prediction coefficients, the signal processing means being arranged to calculate the prediction coefficients using the covariance of a predetermined number of previously determined consecutive reconstructed spectral values, and to combine each error value with the corresponding predicted spectral value to provide a reconstructed spectral value.

[0022] According to a fifth aspect of the present invention, there is provided a communication system comprising a combination of the apparatus of the third and fourth aspects of the invention.

[0023] According to a sixth aspect of the present invention, there is provided a mobile communication device comprising a combination of the apparatus of the third and fourth aspects of the invention.

[0024] For a better understanding of the present invention, and to show how it may be carried into effect, reference will now be made, by way of example, to the accompanying drawings.

[0025]

[Embodiments of the invention] Referring to FIG. 1, a pulse code modulated (PCM) audio input signal g(t) to be coded is applied to the input of a first signal processing unit 1 of the coding apparatus. This first unit 1 is arranged to transform the input signal g(t) from the time domain to the frequency domain frame by frame, where each frame n consists of 2048 sample values and adjacent frames overlap by 50%. More particularly, unit 1 uses a modified discrete cosine transform (MDCT) to transform the signal into the frequency domain, so that the output of unit 1 consists of 1024 separate streams of spectral values x_j(n), each stream j corresponding to a different spectral component. It should be noted that other transforms, for example the Fourier transform, may be used.

[0026] Each stream of data values x_j(n) is applied to a corresponding input of a backward adaptive predictor 2, whose operation is described in detail below. In outline, for each spectral value x_j(n) of each stream, predictor 2 calculates a set of prediction coefficients a_j(n) using previously received spectral values of that stream or, more precisely, the reconstructed quantized spectral values subsequently derived from them. The prediction coefficients are in turn used to calculate an error value e_j(n) for the spectral value. The error values for each stream are applied to the input of a quantizer 3, which is arranged to generate quantized errors ẽ_j(n) for subsequent digital transmission. The quantized errors ẽ_j(n) are applied to a multiplexer 4, which generates a multiplexed error signal 9 for transmission, and are also fed back to predictor 2.

[0027] A further signal processing unit 5 is provided to control the operation of signal processing unit 1 and quantizer 3 in dependence on the psychoacoustic properties of the input audio signal g(t). The operation of this unit is conventional and is not described in detail here.

[0028] For each spectral component j, x(n), x̂(n) and x̃(n) are, respectively, the input signal to predictor 2, the predictor output signal and the reconstructed quantized signal, and e_j(n) and ẽ_j(n) are the prediction error signal and the quantized prediction error signal. The set of prediction coefficients can be written as

[Equation 3]

which is time-dependent, and where the superscript T denotes transposition. The output signal x̂(n) of predictor 2 is calculated from

[Equation 4]

where

[Equation 5]

and P is the prediction order, i.e. the number of coefficients. The predictor error is

[Equation 6]

and the reconstructed quantized signal is

[Equation 7]

The calculation of the predictor coefficients is based on minimizing the mean square prediction error. a(n) is

[Equation 8]
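The equation images for this paragraph are not reproduced in this text. As a hedged sketch only, the definitions given in the paragraph are consistent with the following standard backward linear prediction relations for Equations 3 to 7. The exact indexing and the vector notation are assumptions, not the patent's own formulas, and Equation 8 is not reconstructed here.

```latex
\begin{align}
\mathbf{a}(n) &= [a_1(n), a_2(n), \dots, a_P(n)]^{T} \\
\hat{x}(n) &= \mathbf{a}^{T}(n)\,\tilde{\mathbf{x}}(n-1) \\
\tilde{\mathbf{x}}(n-1) &= [\tilde{x}(n-1), \tilde{x}(n-2), \dots, \tilde{x}(n-P)]^{T} \\
e(n) &= x(n) - \hat{x}(n) \\
\tilde{x}(n) &= \hat{x}(n) + \tilde{e}(n)
\end{align}
```

Under these assumed forms, the prediction depends only on previously reconstructed values, which is what allows a decoder to repeat the same calculation without receiving the coefficients.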

[0029] It will be appreciated that, once the autocorrelation function r(n) has been obtained, the linear predictor can be found by solving the normal equations. Here, however, a least squares algorithm is presented for estimating the linear predictor coefficients sample by sample. The least squares method often yields better linear prediction coefficient estimates than the autocorrelation method, particularly when little data is available. It is shown below that, when the predictor order is low, and in particular when it is only 2, the complexity of the least squares algorithm is comparable to, or lower than, that of the prior art adaptive lattice algorithm.

[0030] Assume again that the reconstructed quantized signal is denoted x̃(n). For a prediction order of 2 and a block length L, the covariances of the reconstructed signal are calculated from

[Equation 9]

An efficient algorithm is

[Equation 10]

With these covariances, the two linear predictor coefficients can be calculated as follows.

[Equation 11]
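Since the covariance and coefficient formulas are not reproduced in this text, the following Python sketch shows one standard way to obtain two least-squares predictor coefficients from a short block of reconstructed values (the covariance method of linear prediction). The function name `order2_coefficients`, the summation limits and the degenerate-case fallback are assumptions made for illustration and may differ from the patent's Equations 9 to 11.

```python
def order2_coefficients(history):
    """Least-squares order-2 predictor from a short history (oldest first).

    Minimises the sum over the block of
        (x[m] - a1 * x[m-1] - a2 * x[m-2]) ** 2
    and returns (a1, a2).
    """
    c01 = c02 = c11 = c12 = c22 = 0.0
    for m in range(2, len(history)):
        x0, x1, x2 = history[m], history[m - 1], history[m - 2]
        c01 += x0 * x1   # covariance of current value with lag 1
        c02 += x0 * x2   # covariance of current value with lag 2
        c11 += x1 * x1
        c12 += x1 * x2
        c22 += x2 * x2
    det = c11 * c22 - c12 * c12
    if abs(det) < 1e-12:
        # Assumed fallback for a degenerate block: no prediction.
        return 0.0, 0.0
    a1 = (c01 * c22 - c02 * c12) / det
    a2 = (c02 * c11 - c01 * c12) / det
    return a1, a2


# Illustrative block of six values obeying x[n] = 1.5*x[n-1] - 0.7*x[n-2].
block = [1.0, 0.5]
for _ in range(4):
    block.append(1.5 * block[-1] - 0.7 * block[-2])
a1, a2 = order2_coefficients(block)
```

For order 2 the normal equations reduce to a 2×2 system solved in closed form, which is why the cost per spectral value stays small. A running update of the five sums as the block slides by one value would avoid recomputing them from scratch; that refinement is omitted here.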

[0031] It will be seen that the linear prediction coefficients are obtained from a predetermined, or fixed, and relatively small number of previous spectral values. The calculation of the coefficients does not depend on all previously received spectral values.

[0032] To increase the robustness of the backward adaptive prediction against channel errors and numerical rounding errors, bandwidth expansion can be performed after the linear prediction coefficients have been obtained. Let the linear prediction coefficients calculated by the above equations be a_i, i = 0, 1, 2, where a_0 = 1. The bandwidth expansion operation replaces each a_i with γ^i a_i, where γ is a constant slightly smaller than 1.
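The bandwidth expansion step is simple enough to show directly. In this Python sketch the function name `bandwidth_expand` and the default value of gamma are assumptions; the text only states that the constant is slightly smaller than 1.

```python
def bandwidth_expand(coeffs, gamma=0.98):
    """Replace each a_i by gamma**i * a_i.

    coeffs[0] is a_0 = 1 and is left unchanged because gamma**0 == 1.
    """
    return [a * gamma ** i for i, a in enumerate(coeffs)]


# gamma=0.9 is used here only to make the arithmetic easy to follow.
expanded = bandwidth_expand([1.0, 1.5, -0.7], gamma=0.9)
```

With these inputs the coefficients become approximately 1.0, 1.35 and -0.567: higher-order coefficients are shrunk more strongly than lower-order ones.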

[0033] As can be seen from the previous section, the covariance function is updated sample by sample. The linear prediction coefficients can therefore also be obtained sample by sample by solving the normal equations. To save computation, however, the linear prediction coefficients can be calculated less frequently, for example once every two samples. The resulting loss in average prediction gain is negligible. The loss in prediction gain is, however, clearly perceptible when a transition occurs in the audio signal to be coded. A transition detector 10 is therefore included, which switches the predictor from the normal low update rate (e.g. every second spectral value) to a high update rate (e.g. every spectral value) when a transition is detected. The high update rate need only be maintained for a short time after the transition is detected.
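The switching between update rates can be sketched as a small scheduling function. The name `update_schedule`, the `hold` length and the per-value transition flags are hypothetical; the text does not say how long the high rate is held or how the detector works.

```python
def update_schedule(transition_flags, hold=4, slow_interval=2):
    """Decide, per spectral value, whether to recompute the coefficients.

    Normally coefficients are recomputed every `slow_interval` values.
    After a detected transition they are recomputed for every value for
    the next `hold` values, then the slow rate resumes.
    """
    recompute = []
    fast_left = 0
    for n, is_transition in enumerate(transition_flags):
        if is_transition:
            fast_left = hold
        if fast_left > 0:
            recompute.append(True)
            fast_left -= 1
        else:
            recompute.append(n % slow_interval == 0)
    return recompute


# A transition is flagged at index 3; the fast rate is held for 3 values.
flags = [False] * 3 + [True] + [False] * 6
schedule = update_schedule(flags, hold=3)
```

In this example the coefficients are recomputed at indices 0 and 2 under the slow rate, at 3, 4 and 5 during the fast period, and then at 6 and 8 once the slow rate resumes.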

[0034] Let G_l denote the prediction gain in scalefactor band l. If G_l > 0, the predictor in this subband can be switched depending on the overall prediction gain, which is calculated as follows:

[Equation 12]

where N_s is the number of scalefactor bands. If G compensates for the additional bits needed for the predictor side information, i.e. G > T_1 (dB), or if the prediction gain does not drop sharply, i.e. G^Present − G^Previous < T_2 (dB), the complete side information is transmitted and the predictors that produce a positive gain are switched on. Otherwise, the predictors are not used, which also means that a transition has occurred. After a transition frame has been detected, the backward adaptive prediction coefficients are calculated sample by sample. After a certain number of samples, the prediction coefficients are calculated for every second sample.
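The on/off decision can be illustrated with a hedged Python sketch. Because the formula for the overall gain is not reproduced in this text, the sketch assumes that G is the sum of the positive per-band gains and that a band gain is the ratio of signal energy to error energy in dB; both are assumptions. The comparison with the previous frame's gain (threshold T_2) is left out, and the names `band_gain_db` and `predictor_decision` are hypothetical.

```python
import math


def band_gain_db(actual, errors):
    """Assumed gain measure: signal energy over error energy, in dB."""
    signal_energy = sum(x * x for x in actual)
    error_energy = sum(e * e for e in errors)
    if signal_energy == 0.0:
        return 0.0
    if error_energy == 0.0:
        return float("inf")
    return 10.0 * math.log10(signal_energy / error_energy)


def predictor_decision(band_gains, t1_db):
    """Return (use_prediction, per_band_flags).

    Prediction is used for the frame only if the summed positive band
    gains exceed t1_db; then only bands with positive gain are enabled.
    """
    overall = sum(g for g in band_gains if g > 0.0)
    if overall > t1_db:
        return True, [g > 0.0 for g in band_gains]
    return False, [False] * len(band_gains)


decision = predictor_decision([3.0, -1.0, 2.0], t1_db=4.0)
```

Here the positive gains sum to 5 dB, which exceeds the assumed 4 dB threshold, so prediction is enabled for the first and third bands only.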

[0035] FIG. 2 shows apparatus for decoding a signal coded using the method described in detail above. The received multiplexed error signal 9 is applied to the input of a demultiplexer 6, which separates the signal into 1024 spectral value streams e_j(n). These streams are then passed to a signal processing unit 7. For each stream, unit 7 calculates a predicted, i.e. estimated, spectral value for each error value. A predetermined number of these predicted values are in turn used to calculate the linear prediction coefficients, from which the predicted value for the current sample can be calculated. This processing matches that described for the coding process. The reconstructed spectral values are obtained by combining the received error signals with the corresponding predicted values. The streams of reconstructed spectral values are applied to a further processing unit 8, which performs an inverse MDCT on the data to substantially recover the original audio signal.
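The symmetry between coder and decoder for a single stream can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: it uses order 2, a block of 6 reconstructed values, a uniform quantizer with a hypothetical step size, coefficients computed from reconstructed values as in the coder description, and a clipping guard on the prediction that is added only to keep the sketch numerically safe.

```python
import math


def ls_order2(hist):
    """Order-2 least-squares coefficients from a history (oldest first)."""
    c01 = c02 = c11 = c12 = c22 = 0.0
    for m in range(2, len(hist)):
        x0, x1, x2 = hist[m], hist[m - 1], hist[m - 2]
        c01 += x0 * x1
        c02 += x0 * x2
        c11 += x1 * x1
        c12 += x1 * x2
        c22 += x2 * x2
    det = c11 * c22 - c12 * c12
    if abs(det) < 1e-9:
        return 0.0, 0.0  # assumed fallback: no prediction
    return (c01 * c22 - c02 * c12) / det, (c02 * c11 - c01 * c12) / det


def predict(hist, limit=2.0):
    """Predict the next value from reconstructed history only."""
    a1, a2 = ls_order2(hist)
    x_hat = a1 * hist[-1] + a2 * hist[-2]
    # Assumed guard against ill-conditioned blocks (not from the source).
    return max(-limit, min(limit, x_hat))


def encode(values, block=6, step=0.25):
    """Return quantized error indices for one stream of spectral values."""
    hist, indices = [0.0] * block, []
    for x in values:
        x_hat = predict(hist)
        q = round((x - x_hat) / step)          # transmitted index
        indices.append(q)
        hist = hist[1:] + [x_hat + q * step]   # reconstructed value
    return indices


def decode(indices, block=6, step=0.25):
    """Rebuild the stream from error indices, mirroring the coder."""
    hist, out = [0.0] * block, []
    for q in indices:
        x_rec = predict(hist) + q * step
        out.append(x_rec)
        hist = hist[1:] + [x_rec]
    return out


values = [math.sin(0.3 * n) for n in range(40)]
reconstructed = decode(encode(values))
```

Because both sides derive the coefficients from the same reconstructed history, only the quantized errors need to be transmitted, and each reconstructed value differs from the original by at most half a quantizer step.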

[0036] FIG. 3 shows a mobile telephone 11 incorporating, in its transmitter, apparatus 12 (corresponding to the apparatus of FIG. 1) for coding radio telephone signals using the coding method described above. The telephone also incorporates, in its receiver, apparatus 13 (corresponding to the apparatus of FIG. 2) for decoding received coded telephone signals.

[Brief description of the drawings]

[FIG. 1] A diagram schematically showing apparatus for coding an audio signal using backward adaptive prediction according to an embodiment of the present invention.

[FIG. 2] A diagram schematically showing apparatus for decoding an audio signal coded by the apparatus of FIG. 1.

[FIG. 3] A diagram showing a mobile telephone incorporating the apparatus of FIGS. 1 and 2.

[Explanation of symbols]

1 first signal processing unit
2 backward adaptive predictor
3 quantizer
4 multiplexer
5 signal processing unit
6 demultiplexer
7 signal processing unit
8 processing unit
9 error signal
10 transition detector
11 mobile telephone
12 apparatus for coding radio telephone signals
13 apparatus for decoding telephone signals

Claims

[Claims]

1. A method of coding an audio electrical signal using backward adaptive prediction, comprising: (a) receiving a first time frame of the audio electrical signal to be coded; (b) transforming the time frame into the frequency domain to generate a frequency spectrum having 512 or more spectral components; (c) receiving subsequent time frames of the audio electrical signal and repeating step (b) for each of these frames in turn to generate a stream of spectral data values for each spectral component; and (d) for each said stream, calculating a set of prediction coefficients for each spectral value using the covariance of a predetermined number of previously determined reconstructed spectral values of the stream, using the set of prediction coefficients to generate a predicted spectral value, and calculating the error between the predicted spectral value and the corresponding actual spectral value, wherein the calculated errors provide a coded representation of the spectral value stream, and the errors can be recombined with the predicted spectral values to obtain reconstructed spectral values.

2. The method according to claim 1, wherein the prediction order is two.

3. The method according to claim 1, wherein the prediction coefficients are recalculated only after a multiple of spectral values has been received, and the same coefficients are used for several consecutive spectral values.

4. The method of claim 3, wherein said multiple is two.

5. The method according to claim 3, further comprising the step of switching between a low coefficient update rate and a high update rate as soon as a transition of the audio signal to be coded is detected.

6. The method according to claim 1, wherein the predetermined number of spectral values is four or more.

7. The method according to claim 1, wherein the predetermined number of spectral values is less than or equal to 10.

8. The method according to claim 1, wherein a least squares method is used to determine the prediction coefficients.

9. The method according to claim 8, when dependent on claim 2, wherein the covariance method is used.

10. The method according to claim 9, wherein the prediction coefficients are determined according to the following equation: [equation not reproduced in the source text]

11. A method of decoding an encoded audio electrical signal, comprising: receiving, as an input signal, a series of error values corresponding to the encoded audio signal; determining a corresponding predicted spectral component value for each error value using prediction coefficients, wherein the prediction coefficients are calculated using the covariance of a predetermined number of previously determined consecutive predicted spectral component values of the stream; combining the error values with the predicted spectral values to provide reconstructed spectral values; and applying a frequency-to-time transform to the reconstructed spectral values of all streams, thereby substantially reconstructing the audio signal.

12. An apparatus for encoding an audio electrical signal using backward adaptive prediction, comprising: an input for receiving an audio electrical signal to be encoded; a time-to-frequency domain transformer for transforming time frames of the received signal from the time domain to the frequency domain to provide a frequency spectrum having 512 or more spectral components; and signal processing means, associated with each spectral component, for receiving the associated spectral values as a stream, calculating a set of prediction coefficients for each spectral value using the covariance of a predetermined number of previously reconstructed spectral values, using the set of prediction coefficients to generate a predicted spectral value, and calculating an error between the predicted spectral value and the corresponding actual spectral value, wherein the calculated errors provide an encoded representation of the received spectral value streams, and the errors can be recombined with the predicted spectral values to obtain reconstructed spectral values.

13. An apparatus for decoding an encoded audio electrical signal, comprising: an input for receiving a series of error values corresponding to the encoded audio signal; and signal processing means for determining, for each error value, a corresponding predicted spectral component value using a set of prediction coefficients, wherein the signal processing means is arranged to calculate the prediction coefficients using the covariance of a predetermined number of previously determined consecutive reconstructed spectral values, and to combine each error value with the corresponding predicted spectral value to provide a reconstructed spectral value.

14. A communication system comprising a combination of the devices according to claim 12 and claim 13.

15. A mobile communication device comprising a combination of the devices according to claim 12 and claim 13.
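
The encoding loop of claim 1 and the mirrored decoding loop of claim 11 can be illustrated for a single spectral-component stream. The following is a minimal Python sketch under stated assumptions, not the implementation described in the patent: the names `covariance_coeffs`, `BackwardPredictor`, `encode_stream` and `decode_stream`, the uniform quantizer with step `step`, the window of 8 past values, and the zero-prediction fallback are all hypothetical choices made for illustration. The prediction order of two and the least-squares (covariance) fit follow claims 2, 8 and 9; the `update_every` parameter loosely models the reduced update rate of claims 3 and 4. The time-to-frequency transform and the multiplexing of streams are omitted.

```python
# Illustrative sketch only: names, quantizer and window length are assumptions.

def covariance_coeffs(history):
    """Order-2 least-squares (covariance method) fit over past values.

    Minimizes sum_n (x[n] - a1*x[n-1] - a2*x[n-2])**2 for n = 2..len-1.
    """
    phi = [[0.0] * 3 for _ in range(3)]
    for n in range(2, len(history)):
        for i in range(3):
            for j in range(3):
                phi[i][j] += history[n - i] * history[n - j]
    det = phi[1][1] * phi[2][2] - phi[1][2] ** 2
    if abs(det) < 1e-12:  # degenerate window: fall back to no prediction
        return 0.0, 0.0
    a1 = (phi[0][1] * phi[2][2] - phi[0][2] * phi[1][2]) / det
    a2 = (phi[0][2] * phi[1][1] - phi[0][1] * phi[1][2]) / det
    return a1, a2


class BackwardPredictor:
    """Predictor state shared in identical form by encoder and decoder."""

    def __init__(self, window=8, update_every=1):
        self.window = window              # number of past reconstructed values used
        self.update_every = update_every  # recompute coefficients every k-th value
        self.history = []                 # reconstructed values only (backward adaptation)
        self.coeffs = (0.0, 0.0)

    def predict(self):
        n = len(self.history)
        if n < 4:                         # too little history: predict zero
            return 0.0
        if n % self.update_every == 0:
            self.coeffs = covariance_coeffs(self.history[-self.window:])
        a1, a2 = self.coeffs
        return a1 * self.history[-1] + a2 * self.history[-2]

    def push(self, reconstructed):
        self.history.append(reconstructed)


def encode_stream(values, step=0.05, window=8, update_every=1):
    """Return (quantized error indices, reconstructed values) for one stream."""
    pred = BackwardPredictor(window, update_every)
    indices, recon = [], []
    for x in values:
        p = pred.predict()
        q = round((x - p) / step)         # quantized prediction error (transmitted)
        r = p + q * step                  # value the decoder will also obtain
        pred.push(r)
        indices.append(q)
        recon.append(r)
    return indices, recon


def decode_stream(indices, step=0.05, window=8, update_every=1):
    """Rebuild the reconstructed values from the error indices alone."""
    pred = BackwardPredictor(window, update_every)
    recon = []
    for q in indices:
        r = pred.predict() + q * step
        pred.push(r)
        recon.append(r)
    return recon
```

Because the coefficients are derived only from reconstructed values, which the decoder also has, no coefficients need to be transmitted in this sketch; only the quantized errors are sent.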