JP3163206B2

JP3163206B2 - Acoustic signal coding device

Info

Publication number: JP3163206B2
Application number: JP18038093A
Authority: JP
Inventors: 智一森尾
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1993-07-21
Filing date: 1993-07-21
Publication date: 2001-05-08
Anticipated expiration: 2016-05-08
Also published as: JPH0736484A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to an acoustic signal encoding device that compresses and encodes audio signals or speech signals for communication or storage.

[0002]

[Description of the Related Art] As a first prior art, there is a technique that, when a speech signal is compressed and encoded, spectrally shapes the quantization noise produced by the encoding by exploiting auditory masking characteristics. One example is "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates", B. S. Atal and J. R. Remde, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 614-617, 1982.

[0003] In this technique, the quantization error waveform is filtered by a filter whose transfer characteristic is given by Equation (1), built from the linear prediction coefficients obtained by linear prediction analysis of the speech signal, and encoding is performed so as to minimize the energy of the filtered error waveform.

[0004]

(Equation 1)

[0005] In Equation (1), a_k is the k-th order linear prediction coefficient, p is the prediction order, and β and γ are constants satisfying 0 ≤ γ ≤ β ≤ 1.
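The role of β and γ in Equation (1) can be illustrated with a short sketch. Because the equation itself is not reproduced in this text, the sketch assumes the commonly used form W(z) = A(z/β)/A(z/γ) with A(z) = 1 + Σ a_k z^-k. The sign convention and the function name `weighting_filter_coeffs` are assumptions made for illustration, not taken from the patent.

```python
import numpy as np

def weighting_filter_coeffs(a, beta, gamma):
    """Return (numerator, denominator) coefficient arrays of W(z).

    Assumed form (not confirmed by the source text):
        W(z) = A(z/beta) / A(z/gamma),  A(z) = 1 + sum_k a[k-1] * z^-k
    `a` holds a_1..a_p, and 0 <= gamma <= beta <= 1.
    """
    a = np.asarray(a, dtype=float)
    if not 0.0 <= gamma <= beta <= 1.0:
        raise ValueError("require 0 <= gamma <= beta <= 1")
    k = np.arange(1, len(a) + 1)
    num = np.concatenate(([1.0], a * beta ** k))   # a_k * beta^k
    den = np.concatenate(([1.0], a * gamma ** k))  # a_k * gamma^k
    return num, den

# Hypothetical second-order predictor, weighted with beta = 1 and gamma = 0.8.
num, den = weighting_filter_coeffs([-0.9, 0.5], beta=1.0, gamma=0.8)
```

Under this assumed form, β = 1 leaves the numerator equal to A(z), while γ < 1 pulls the poles of the denominator toward the origin and so broadens the formant peaks of the weighting response.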

[0006] Code-excited linear predictive coding (hereinafter CELP), a speech coding method that uses this perceptual weighting filter, is described in, for example, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates", M. R. Schroeder and B. S. Atal, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 937-940, 1985. FIG. 4 is a block diagram showing its configuration.

[0007] In FIG. 4, 1/A(z) is the linear prediction synthesis filter for speech, given by Equation (2).

[0008]

(Equation 2)

[0009] Setting γ = 0.8 and β = 1 in Equation (1) and combining the perceptual weighting filter with the linear prediction synthesis filter of Equation (2) yields the simplified form of Equation (3).

[0010] In this case, the block diagram of FIG. 4 changes to the configuration shown in FIG. 5.

[0011]

(Equation 3)
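As a worked step, the simplification that leads to Equation (3) can be written out as follows. Equations (1) to (3) are not reproduced in this text, so the form of W(z) and the sign convention for A(z) below are assumptions; only the cancellation of A(z) for β = 1 follows directly from the description.

```latex
\begin{align}
A(z) &= 1 + \sum_{k=1}^{p} a_k z^{-k} \quad \text{(assumed sign convention)} \\
W(z) &= \frac{A(z/\beta)}{A(z/\gamma)}
      = \frac{1 + \sum_{k=1}^{p} a_k \beta^{k} z^{-k}}
             {1 + \sum_{k=1}^{p} a_k \gamma^{k} z^{-k}} \\
\left. W(z)\,\frac{1}{A(z)} \right|_{\beta = 1}
     &= \frac{A(z)}{A(z/\gamma)} \cdot \frac{1}{A(z)}
      = \frac{1}{1 + \sum_{k=1}^{p} a_k \gamma^{k} z^{-k}},
      \qquad \gamma = 0.8
\end{align}
```

With β = 1 the numerator of the weighting filter cancels the synthesis filter, so a single all-pole filter replaces the cascade of two filters.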

[0012] In the prior art described above, the perceptual weighting filter represents the auditory masking characteristic only by a very simple approximation.

[0013] As a second prior art, there is a technique used in the compression coding of audio signals. This method exploits auditory masking characteristics more aggressively than the first prior art.

[0014] FIG. 6 shows the operation sequence of the acoustic signal encoder used in MPEG. An example is given in "High-Efficiency Coding of Acoustic Signals: The MPEG Audio Coding System", Goto, Journal of the Acoustical Society of Japan, Vol. 47, No. 12, pp. 966-969, 1991.

[0015] In the upper right of the flow in FIG. 6, the power spectrum of the input signal is obtained using an FFT, and the auditory masking characteristic is calculated from the power spectrum and related information. MPEG Layers 1 and 2 are basically subband coders, and the number of coding bits for each band is determined from the masking characteristic and related information.

[0016] As a third prior art, there is a technique that combines the first and second techniques. An auditory masking characteristic is obtained from power spectrum information, and encoding is performed, using a perceptual weighting filter that has the inverse of that characteristic, so as to minimize the energy of the quantization error waveform. One example is "Some Experiments in Perceptual Masking of Quantizing Noise in Analysis-By-Synthesis Speech Coders", R. Drogo De Iacovo and R. Montagna, EUROSPEECH, pp. 825-828, 1991.

[0017] In this method, a minimum-phase finite impulse response filter (hereinafter FIR filter) whose power spectrum matches the auditory masking characteristic is designed using the Hilbert transform, and its inverse filter is used as the perceptual weighting filter.

[0018]

[Problems to Be Solved by the Invention] However, the characteristic of the perceptual weighting filter in the first prior art is obtained by a simple approximation and therefore differs from the human auditory masking characteristic, so the quantization noise cannot be concealed sufficiently.

[0019] In the second prior art, the masking characteristic is obtained according to a model of human auditory masking, but subband coding is ultimately used and side information such as the bit allocation is also required, so the compression ratio cannot be made sufficiently low.

[0020] The third prior art addresses these two problems: by taking the auditory masking characteristic into account and using a perceptual weighting filter, it achieves a coding scheme with a high compression ratio. However, because its perceptual weighting filter is an FIR filter, it has two drawbacks. First, for the same filter order it approximates an amplitude-frequency characteristic less well than an infinite impulse response filter (hereinafter IIR filter). Second, it is difficult to simplify the processing by combining the perceptual weighting filter with the linear prediction synthesis filter for speech, as described for the first prior art.

[0021] In view of the above problems in the prior art, an object of the present invention is to provide an acoustic signal encoding device that can conceal quantization noise sufficiently, can make the compression ratio sufficiently low, and can simplify the overall processing.

[0022]

[Means for Solving the Problems] The object of the present invention is achieved by an acoustic signal encoding device comprising means for obtaining the power spectrum of an acoustic signal, means for obtaining an auditory masking spectrum characteristic, first filtering means having the inverse power spectrum characteristic of the acoustic signal, and second filtering means having a spectrum characteristic obtained by dividing the power spectrum characteristic of the acoustic signal by the auditory masking spectrum characteristic, wherein perceptual weighting is performed by the first filtering means and the second filtering means.

[0023] The acoustic signal encoding device of the present invention may comprise inverse Fourier transform means for obtaining an autocorrelation sequence from the power spectrum of the acoustic signal, and means for calculating the coefficients of the second filtering means from the autocorrelation sequence.

[0024] The acoustic signal encoding device of the present invention may comprise means for obtaining a logarithmic power spectrum, means for obtaining a cepstrum from the logarithmic power spectrum by inverse Fourier transform, and means for calculating the coefficients of the second filtering means from the cepstrum.

[0025]

[Operation] In the acoustic signal encoding device of the present invention, the power spectrum of the acoustic signal and the auditory masking spectrum characteristic are obtained. The first filtering means has the inverse power spectrum characteristic of the acoustic signal, the second filtering means has a spectrum characteristic obtained by dividing the power spectrum characteristic of the acoustic signal by the auditory masking spectrum characteristic, and perceptual weighting is performed by the first filtering means and the second filtering means.

[0026] In the acoustic signal encoding device of the present invention, the inverse Fourier transform means obtains an autocorrelation sequence from the power spectrum of the acoustic signal, and the coefficients of the second filtering means are calculated from the autocorrelation sequence.

[0027] In the acoustic signal encoding device of the present invention, a logarithmic power spectrum is obtained, a cepstrum is obtained from the logarithmic power spectrum by inverse Fourier transform, and the coefficients of the second filtering means are calculated from the cepstrum.

[0028]

[Embodiments] Embodiments of the acoustic signal encoding device of the present invention are described below with reference to the drawings.

[0029] FIG. 1 is a block diagram showing the configuration of a first embodiment of the acoustic signal encoding device of the present invention, as an example that uses a CELP system.

[0030] The acoustic signal encoding device of FIG. 1 is made up of the following components. An input terminal 105 receives the acoustic signal. An LPC analysis unit 110, connected to the input terminal 105, performs linear prediction analysis (hereinafter LPC analysis) on the acoustic signal. A power spectrum calculation unit 111, connected to the LPC analysis unit 110, calculates the power spectrum P(ω) of the signal from the LPC analysis result. A masking characteristic calculation unit 112, connected to the power spectrum calculation unit 111, calculates the masking characteristic M(ω) from the power spectrum of the signal. A divider 113, connected to the power spectrum calculation unit 111 and the masking characteristic calculation unit 112, divides the power spectrum of the signal by the masking characteristic. An IIR filter coefficient calculation unit 114, connected to the divider 113, obtains IIR filter coefficients from the spectral ratio produced by the divider 113. An FIR filter 107, connected to the input terminal 105 and the LPC analysis unit 110, is the first filtering means for perceptually weighting the input signal. An IIR filter 109, connected to the FIR filter 107 and the IIR filter coefficient calculation unit 114, is one part of the second filtering means for perceptually weighting the input signal. An excitation codebook 101 is the codebook for CELP speech coding. An amplifier 102, connected to the codebook 101, amplifies the excitation signal. A pitch component synthesis filter 103, connected to the amplifier 102, synthesizes the pitch component. An IIR filter 104, connected to the pitch component synthesis filter 103 and the IIR filter coefficient calculation unit 114, is the other part of the second filtering means and has the characteristic obtained by combining the speech spectrum synthesis filter with the perceptual weighting filter. A subtractor 106, connected to the IIR filters 104 and 109, takes the difference between the perceptually weighted input signal and the perceptually weighted reproduced signal. An energy minimization unit 108, connected to the subtractor 106, sets the coding parameters so as to minimize the energy of the difference waveform.

[0031] This embodiment differs from the prior art of FIG. 5 in how the perceptual weighting filter is constructed. The following description therefore concentrates on the construction of the perceptual weighting filter.

[0032] The signal input from the input terminal 105 is divided into segments of a fixed duration, called frames. For each frame, the LPC analysis unit 110 calculates the linear prediction coefficients. These coefficients are set as the coefficients of the FIR filter 107, which is the numerator of the perceptual weighting filter of Equation (1) (from here on, β = 1 in Equation (1)). The power spectrum calculation unit 111 computes the amplitude transfer characteristic from the calculated linear prediction coefficients: the power spectrum is obtained by Equation (4) from the transfer characteristic of Equation (2).

[0033]

(Equation 4)

[0034] In Equation (4), ω = 2πFs, where Fs is the sampling frequency.
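A minimal sketch of computing a power spectrum from linear prediction coefficients is shown below. Equation (4) is not reproduced in this text, so the sketch assumes P(ω) = g² / |A(e^{jω})|² with A(z) = 1 + Σ a_k z^-k, evaluated on a normalized frequency grid from 0 to π. The sign convention, the gain parameter and the function name `lpc_power_spectrum` are assumptions.

```python
import numpy as np

def lpc_power_spectrum(a, n_fft=512, gain=1.0):
    """P(w) = gain^2 / |A(e^{jw})|^2 on n_fft//2 + 1 bins from 0 to pi.

    Assumes A(z) = 1 + sum_k a[k-1] * z^-k (the sign convention is an assumption).
    """
    a_full = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    A = np.fft.rfft(a_full, n_fft)   # zero-padded FFT samples A(z) on the unit circle
    return gain ** 2 / np.abs(A) ** 2

# Hypothetical second-order predictor.
P = lpc_power_spectrum([-0.9, 0.5])
```

Evaluating A(z) with a zero-padded FFT gives all frequency bins at once, which is convenient because the later steps (masking calculation and division) also work bin by bin.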

[0035] In the description above, the power spectrum of the input signal is calculated from the LPC analysis result, but it may instead be calculated by Fourier transforming the input signal. In that case a higher frequency resolution than that of the LPC spectrum is obtained, so the masking characteristic can be calculated more accurately.
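The Fourier-transform alternative can be sketched as a windowed periodogram of one frame. The Hann window, the FFT length, the normalization and the function name `frame_power_spectrum` are assumptions chosen for illustration; the text only states that the input signal is Fourier transformed.

```python
import numpy as np

def frame_power_spectrum(frame, n_fft=512):
    """Periodogram of one Hann-windowed frame on n_fft//2 + 1 bins."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    X = np.fft.rfft(windowed, n_fft)         # zero-padded to n_fft points
    return np.abs(X) ** 2 / len(frame)

fs = 8000.0                                  # hypothetical sampling rate in Hz
t = np.arange(256) / fs
P_fft = frame_power_spectrum(np.sin(2 * np.pi * 1000.0 * t))
peak_hz = np.argmax(P_fft) * fs / 512        # frequency of the strongest bin
```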

[0036] The masking characteristic calculation unit 112 calculates the masking spectrum characteristic from the power spectrum of the input signal. In outline, the power spectrum is decomposed into the critical bands of hearing, the masking curve for quantization noise produced by the input signal is calculated for every critical band, and the masking curve M(ω) is then calculated taking into account the minimum audible threshold across the whole signal band, masking along the time axis, and so on. Various methods of calculating the masking curve have been proposed. One example is "Estimation of Perceptual Entropy Using Noise Masking Criteria", J. D. Johnston, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 2524-2527, 1988.
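The outline above (critical-band decomposition, a masking contribution from every band, and a minimum audible threshold) can be sketched roughly as follows. This is a much simplified illustration and not the procedure of the cited reference: the Bark mapping, the spreading function, the 10 dB offset, the constant floor that stands in for the threshold in quiet, and all function names are assumptions chosen for the sketch, and masking along the time axis is ignored.

```python
import numpy as np

def hz_to_bark(f):
    """Zwicker-style approximation of the Hz-to-Bark mapping."""
    f = np.asarray(f, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def masking_curve(P, fs, offset_db=10.0, floor=1e-6):
    """Very simplified masking curve M(w) from a power spectrum P(w).

    P covers bins from 0 to fs/2. offset_db and floor are placeholder values,
    not taken from the patent.
    """
    P = np.asarray(P, dtype=float)
    freqs = np.linspace(0.0, fs / 2.0, len(P))
    band = np.floor(hz_to_bark(freqs)).astype(int)   # critical-band index per bin
    n_bands = band.max() + 1
    energy = np.bincount(band, weights=P, minlength=n_bands)

    # Schroeder-type spreading function in dB; dz = maskee band - masker band.
    dz = np.arange(n_bands)[:, None] - np.arange(n_bands)[None, :]
    spread_db = 15.81 + 7.5 * (dz + 0.474) - 17.5 * np.sqrt(1.0 + (dz + 0.474) ** 2)
    spread_energy = (10.0 ** (spread_db / 10.0)) @ energy

    threshold = spread_energy * 10.0 ** (-offset_db / 10.0)
    count = np.bincount(band, minlength=n_bands)
    per_bin = threshold / np.maximum(count, 1)       # share each band threshold over its bins
    return np.maximum(per_bin[band], floor)          # floor stands in for the threshold in quiet

M = masking_curve(np.ones(257), fs=8000.0)
```

The result has one value per frequency bin, so it can be divided directly into a power spectrum sampled on the same grid.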

[0037] The perceptual weighting filter is designed so that the quantization noise produced by encoding is shaped to follow this masking curve. That is, the power spectrum of the perceptual weighting filter must have the inverse characteristic of the masking spectrum. Consider a filter F(z) that satisfies the relationship in Equation (5).

[0038]

(Equation 5)

[0039] The filter F(z) can be regarded as a filter whose amplitude transfer characteristic is the power spectrum P(ω) of the input signal divided by the masking spectrum M(ω).

[0040] When this filter F(z) is realized as an all-pole IIR filter, combining the perceptual weighting filter W(z) with the speech synthesis filter 1/A(z) gives the simplified form of Equation (6).

[0041]

(Equation 6)
[0042] To carry out this operation, the divider 113 computes P(ω)/M(ω), and the IIR filter coefficient calculation unit 114 calculates the IIR filter coefficients from the power spectrum given by P(ω)/M(ω).

[0043] The transfer function of the filter F(z) is given by Equation (7).

[0044]

(Equation 7)

[0045] In Equation (7), q is the order of the IIR filter and need not equal the linear prediction order of the speech. f_k is the k-th order coefficient of the IIR filter calculated by the IIR filter coefficient calculation unit 114.

[0046] With the processing above, the input signal is perceptually weighted by the FIR filter 107, whose transfer function A(z) is set from the linear prediction coefficients obtained by the LPC analysis unit 110, followed by the IIR filter 109 described above. The perceptually weighted reproduced signal is obtained by the IIR filter 104, which is set to the same coefficients as the IIR filter 109. The subsequent encoding is the same as in a general CELP coder. In outline, the coding parameters are chosen so that the error energy between the perceptually weighted input signal and the perceptually weighted reproduced signal is minimized.
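The parameter search described above can be sketched as a small analysis-by-synthesis loop. This is an illustration under stated assumptions, not the patent's implementation: the sign conventions A(z) = 1 + Σ a_k z^-k and F(z) = 1/(1 + Σ f_k z^-k), the exhaustive gain-optimal search, and all function names are assumptions, and the pitch component synthesis filter 103 and filter memory carried between frames are omitted.

```python
import numpy as np

def fir_filter(b, x):
    """y[n] = sum_k b[k] * x[n-k], zero initial state."""
    return np.convolve(x, b)[: len(x)]

def all_pole_filter(f, x):
    """y[n] = x[n] - sum_{k>=1} f[k-1] * y[n-k], i.e. 1 / (1 + sum f_k z^-k)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(n, len(f)) + 1):
            acc -= f[k - 1] * y[n - k]
        y[n] = acc
    return y

def search_codebook(frame, a, f, codebook):
    """Return (index, gain, error) minimising the weighted error energy.

    frame    : input samples of one frame
    a        : a_1..a_p of A(z)            (role of FIR filter 107)
    f        : f_1..f_q of F(z)            (role of IIR filters 104 and 109)
    codebook : 2-D array, one excitation vector per row
    """
    # Weighted input: frame -> A(z) -> F(z).
    target = all_pole_filter(f, fir_filter(np.concatenate(([1.0], a)), frame))
    best = (None, 0.0, np.inf)
    for idx, code in enumerate(codebook):
        synth = all_pole_filter(f, code)             # weighted reproduced signal
        energy = float(synth @ synth)
        gain = float(target @ synth) / energy if energy > 0 else 0.0
        err = float(np.sum((target - gain * synth) ** 2))
        if err < best[2]:
            best = (idx, gain, err)
    return best
```

Because the synthesis filter 1/A(z) and the numerator A(z) of the weighting filter cancel, each candidate excitation passes through only one all-pole filter, which is the computational saving the description refers to.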

[0047] Next, methods of calculating the IIR filter coefficients from the power spectrum P(ω) and the masking spectrum M(ω) are described.

[0048] FIG. 2 shows a procedure that calculates the IIR filter coefficients from the power spectrum P(ω) and the masking spectrum M(ω) by an inverse Fourier transform followed by solving the normal equations.

[0049] The procedure is described below with reference to FIG. 2.

[0050] First, a power spectrum is defined as in Equation (8).

[0051]

(Equation 8)

[0052] The power spectrum S(ω) and the autocorrelation function R(τ) are related as in Equation (9), so the autocorrelation sequence is calculated for τ = 0 to q using an FFT or a similar method.

[0053]

(Equation 9)

[0054] Next, the autocorrelation coefficients are converted to IIR filter coefficients by solving the normal equations of Equation (10), as is commonly done in linear prediction analysis of speech.

[0055]

(Equation 10)

[0056] In Equation (10), (...)^T denotes the matrix transpose.

[0057] The coefficients of the IIR filter are calculated by the operations above.
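The FIG. 2 procedure (spectral ratio, inverse Fourier transform to an autocorrelation sequence, then the normal equations) can be sketched as follows. Equations (8) to (10) are not reproduced in this text, so the sketch assumes S(ω) = P(ω)/M(ω), F(z) = g/(1 + Σ f_k z^-k) and the usual Yule-Walker form of the normal equations. The sign convention and the function name `iir_coeffs_from_spectrum` are assumptions.

```python
import numpy as np

def iir_coeffs_from_spectrum(P, M, q):
    """All-pole coefficients f_1..f_q whose power response approximates P/M.

    P, M : power spectrum and masking spectrum on n_fft//2 + 1 bins (0..pi)
    Assumes F(z) = g / (1 + sum_k f_k z^-k); the sign convention is an assumption.
    """
    S = np.asarray(P, dtype=float) / np.asarray(M, dtype=float)
    R = np.fft.irfft(S)[: q + 1]                     # autocorrelation R(0)..R(q)
    # Normal equations: sum_k f_k * R(|i - k|) = -R(i) for i = 1..q.
    lag = np.abs(np.arange(q)[:, None] - np.arange(q)[None, :])
    return np.linalg.solve(R[lag], -R[1:])

# Check on a known all-pole spectrum with a flat (constant) masking spectrum.
A = np.fft.rfft([1.0, -0.9, 0.5], 1024)
f_est = iir_coeffs_from_spectrum(1.0 / np.abs(A) ** 2, np.ones(513), q=2)
```

For larger q, the Toeplitz structure of the system would normally be exploited with the Levinson-Durbin recursion. A general linear solve is used here only to keep the sketch short.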

[0058] FIG. 3 shows another method, which calculates the IIR filter coefficients using the cepstrum obtained by homomorphic processing.

[0059] Here the division of the power spectrum P(ω) by the masking spectrum M(ω) is carried out in the logarithmic domain, so the processing of Equation (11) is performed. In FIG. 3 this corresponds to taking the logarithms of P(ω) and M(ω) in the logarithm units 301 and 302 respectively, and subtracting them in the arithmetic unit 303.

[0060]

(Equation 11)

[0061] When this Log S(ω) is inverse Fourier transformed by the inverse FFT unit 304, the cepstrum c_n is obtained according to Equation (12) (see "Fundamentals of Speech Information Processing", Saito and Nakata, Ohmsha, pp. 99-103).

[0062]

(Equation 12)

[0063] Because the low-order part of the cepstrum c_n represents the spectral envelope, the cepstrum is windowed with a cepstrum window (for example, w_n = 1 for n = 1 to q and w_n = 0 for n > q). The coefficients of the IIR filter are then calculated from the resulting cepstrum c_n by Equation (13).

[0064]

(Equation 13)

[0065] In Equation (13), k is an integer from 1 to q.
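The FIG. 3 procedure can be sketched in the same style. Equations (11) to (13) are not reproduced in this text, so the recursion below is the standard relation between the cepstrum of a minimum-phase all-pole filter and its coefficients, used here as a stand-in for Equation (13). The convention F(z) = g/(1 + Σ f_k z^-k) and the function name `iir_coeffs_from_cepstrum` are assumptions.

```python
import numpy as np

def iir_coeffs_from_cepstrum(P, M, q):
    """All-pole coefficients f_1..f_q from the cepstrum of log(P/M).

    Assumes F(z) = g / (1 + sum_k f_k z^-k) is minimum phase. The recursion is
    the standard cepstrum-to-predictor relation, not copied from Equation (13).
    """
    log_S = np.log(np.asarray(P, dtype=float)) - np.log(np.asarray(M, dtype=float))
    c = np.fft.irfft(log_S)                  # cepstrum c_0, c_1, ...
    c = c[: q + 1]                           # rectangular cepstrum window: keep n <= q
    f = np.zeros(q + 1)
    f[0] = 1.0
    for n in range(1, q + 1):
        acc = sum(k * c[k] * f[n - k] for k in range(1, n))
        f[n] = -c[n] - acc / n
    return f[1:]

# Check on a known all-pole spectrum with a flat (constant) masking spectrum.
A = np.fft.rfft([1.0, -0.9, 0.5], 1024)
f_cep = iir_coeffs_from_cepstrum(1.0 / np.abs(A) ** 2, np.ones(513), q=2)
```

Unlike the normal-equation route, this path needs no matrix solve, but it requires P(ω) and M(ω) to be strictly positive so that the logarithms are defined.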

[0066] The description above uses a CELP system, but the invention can readily be applied to other systems that include a perceptual weighting filter as a component, such as multipulse coding.

[0067] Note that by changing only the coding-related part of the encoding device, a decoding device can be realized without changing any of the other parts.

[0068]

[Effects of the Invention] The acoustic signal encoding device of the present invention comprises means for obtaining the power spectrum of an acoustic signal, means for obtaining an auditory masking spectrum characteristic, first filtering means having the inverse power spectrum characteristic of the acoustic signal, and second filtering means having a spectrum characteristic obtained by dividing the power spectrum characteristic of the acoustic signal by the auditory masking spectrum characteristic, and performs perceptual weighting with the first filtering means and the second filtering means. The quantization noise produced by encoding can therefore be noise-shaped by the perceptual weighting filter, and by exploiting auditory masking, a property of human hearing, the noise is made less audible and the reproduced sound quality is improved. In addition, the perceptual weighting filter can be simplified by combining it with the linear prediction synthesis filter for speech, which reduces the amount of computation needed for encoding.

[0069] With the inverse Fourier transform means for obtaining an autocorrelation sequence from the power spectrum of the acoustic signal, the acoustic signal encoding device of the present invention can effectively calculate the coefficients of the second filtering means from the autocorrelation sequence.

[0070] With the means for obtaining a logarithmic power spectrum and the means for obtaining a cepstrum from the logarithmic power spectrum by inverse Fourier transform, the acoustic signal encoding device of the present invention can effectively calculate the coefficients of the second filtering means from the cepstrum.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the acoustic signal encoding device of the present invention.

FIG. 2 is a flowchart explaining one example of filter coefficient calculation in the acoustic signal encoding device of the present invention.

FIG. 3 is a block diagram explaining another example of filter coefficient calculation in the acoustic signal encoding device of the present invention.

FIG. 4 is a block diagram explaining a prior-art CELP speech coding scheme.

FIG. 5 is a block diagram explaining the simplification of the perceptual weighting filter processing in the prior-art CELP speech coding scheme.

FIG. 6 is a flowchart explaining a prior-art coding scheme that takes auditory masking into account.

[Explanation of symbols]

101 CELP excitation codebook
102 Multiplier
103 Pitch component synthesis filter
104, 109 All-pole IIR filters
105 Input terminal
106 Subtractor
107 FIR filter
108 Error energy minimization unit
110 Linear prediction analysis unit
111 Power spectrum calculation unit
112 Masking characteristic calculation unit
113 Spectral ratio calculation unit
114 All-pole IIR filter calculation unit
301, 302 Logarithm units
303 Subtractor
304 Inverse Fourier transform unit
305 Cepstrum windowing unit
306 Cepstrum-to-prediction-coefficient conversion unit

Claims

(57) [Claims]

1. An acoustic signal encoding device comprising: means for obtaining a power spectrum of an acoustic signal; means for obtaining an auditory masking spectrum characteristic; first filtering means having an inverse power spectrum characteristic of the acoustic signal; and second filtering means having a spectrum characteristic obtained by dividing the power spectrum characteristic of the acoustic signal by the auditory masking spectrum characteristic, wherein perceptual weighting is performed by the first filtering means and the second filtering means.

2. The acoustic signal encoding device according to claim 1, further comprising: inverse Fourier transform means for obtaining an autocorrelation sequence from the power spectrum of the acoustic signal; and means for calculating coefficients of the second filtering means from the autocorrelation sequence.

3. The acoustic signal encoding device according to claim 1, further comprising: means for obtaining a logarithmic power spectrum; means for obtaining a cepstrum from the logarithmic power spectrum by inverse Fourier transform; and means for calculating coefficients of the second filtering means from the cepstrum.