CN101930746B

CN101930746B - An Adaptive Noise Reduction Method for MP3 Compressed Domain Audio

Info

Publication number: CN101930746B
Application number: CN2010102154044A
Authority: CN
Inventors: 余小清; 许雪琼; 张静; 刘军伟; 万旺根
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2010-06-29
Filing date: 2010-06-29
Publication date: 2012-05-02
Anticipated expiration: 2030-06-29
Also published as: CN101930746A

Abstract

The invention relates to an adaptive noise reduction method for MP3 audio that operates directly in the MP3 compressed domain. MDCT coefficients are first extracted from the noisy MP3 audio data, and activity detection based on the MDCT spectral energy is used to separate active audio segments from silent segments. In parallel, exploiting the sparsity of the MDCT coefficients, a normal inverse Gaussian (NIG) distribution function is used as a statistical prior for them. Following Bayesian theory, a maximum a posteriori (MAP) estimator based on the NIG prior probability model is then designed, which yields the attenuation factor of the corresponding audio segment. In the noise attenuation stage, this factor is used to attenuate the noise in the audio segment, and the number of attenuation iterations is adjusted adaptively according to the attenuation weight of the silent segments. Experimental results show that the algorithm effectively removes noise from MP3 audio and improves the signal-to-noise ratio of the compressed audio, and that the denoised MP3 audio is of good quality.

Description

An Adaptive Noise Reduction Method for MP3 Compressed Domain Audio

Technical Field

The invention relates to an adaptive noise reduction method for MP3 compressed-domain audio. It mainly concerns denoising noisy MP3 audio directly in the MP3 compressed domain under different levels of Gaussian white noise.

Background

Audio noise reduction uses signal processing and pattern recognition methods to remove noise from noisy audio, so that the denoised audio has a higher signal-to-noise ratio and better quality. It is one of the key problems in audio signal processing.

Much of the audio data on the Internet and in databases is stored in compressed formats, and processing audio in the compressed domain has become an active research topic. Researchers in China and abroad have studied segmentation, classification, and retrieval algorithms for compressed audio extensively and have obtained experimental results close to those for uncompressed audio. When the compressed audio is contaminated by noise, however, the accuracy of classification and retrieval algorithms degrades severely. The usual approach, decompressing the noisy compressed audio and then denoising it, is time-consuming and inevitably reduces the efficiency of any further processing of the compressed audio. It is therefore important to study how to denoise audio directly in the compressed domain at minimal computational cost, so as to improve the retrieval efficiency of compressed-domain audio.

Audio compression takes the auditory masking properties of the human ear into account, and the window function of the modified discrete cosine transform (MDCT) is selected by psychoacoustic model 2. The MDCT is a Fourier-related transform, and its coefficients are sparse. MDCT coefficients can therefore be extracted from compressed-domain audio, a model function that fits a sparse distribution can be sought as a prior for the coefficients, and a filter can then be constructed to denoise the compressed-domain audio. The present invention follows this approach: it extracts MDCT coefficients from audio in the MPEG-1 Audio Layer III (MP3) compressed domain, models their distribution a priori with a normal inverse Gaussian function, and constructs a maximum a posteriori (MAP) estimation function to denoise the compressed-domain audio.

The proposed method solves the problem of denoising noisy audio in the MP3 compressed domain and can further be applied in speech recognition and classification/retrieval systems for MP3 audio.

Summary of the Invention

The object of the invention is to provide an adaptive noise reduction method for MP3 compressed-domain audio that denoises noisy MP3 audio by extracting MDCT coefficients from the MP3 audio, modeling the distribution of the MDCT coefficients a priori, and constructing an estimator.

The technical solution adopted by the invention is as follows: MDCT coefficients are first extracted from the MP3 audio data, a prior probability model of the MDCT coefficients is built, and a noise attenuation estimator is then constructed. At the same time, silent segments of the MP3 audio are detected, and the degree of noise attenuation applied to the noisy audio segments is adjusted according to the attenuation weight of the silent segments.

This technical solution can be further refined. MDCT coefficients are first extracted from the MP3 audio data and their characteristics are analyzed. Based on these characteristics, the normal inverse Gaussian distribution function is selected as a suitable prior probability model for the distribution of the MDCT coefficients, and the noise attenuation estimator is then constructed according to Bayesian maximum a posteriori probability theory. At the same time, silent segments of the MP3 audio are detected using the MDCT spectral energy feature, and the degree of noise attenuation during denoising is adjusted according to the attenuation weight of the silent segments. The method comprises the following steps:

1) Preprocessing of the noisy MP3 compressed audio, in four parts: decoding the MP3 frame header; obtaining the side information; obtaining the main data and scale factors; and Huffman decoding with dequantization;

2) Extraction of the MDCT coefficients and amplitude mapping: the MDCT coefficients of the two granules of each frame are found in the dequantized MP3 frame and averaged per frequency point to form the MDCT spectral coefficients of each audio frame, and the amplitude range of the MDCT coefficients is mapped to 0 to L;

3) Prior modeling of the distribution of the MDCT coefficients and construction of the maximum a posteriori probability estimator: the distributions of the noise-free and the noisy MDCT coefficients are analyzed separately to obtain the statistical characteristics of the noise-free MDCT coefficients. Based on the sparse statistical characteristics of the MDCT coefficients, the normal inverse Gaussian (NIG) distribution function is used as the prior model. An estimator based on the NIG prior distribution model is then derived according to the Bayesian maximum a posteriori probability criterion;

4) Silent segment detection: a spectral energy feature based on the MDCT coefficients is extracted, and the silent segments in the MP3 audio are detected according to this energy feature;

5) Adaptive iterative estimation: the estimator from 3) is applied to the noisy MP3 audio, and the number of estimation iterations is adjusted adaptively by the attenuation factor of the silent segments detected in 4).

The beneficial effects of the invention are as follows. MP3 audio is denoised directly in the MP3 compressed domain, which is simpler and saves computation time compared with the traditional approach of decoding the MP3 compressed audio to uncompressed wave audio before denoising. The distribution characteristics of the MDCT coefficients of MP3 audio are studied and a function suitable for prior modeling of that distribution is selected; experimental results show that the selected function fits the distribution of the MDCT coefficients well. The noise attenuation estimator designed from the prior probability distribution of the MDCT coefficients effectively denoises MP3 compressed audio. In addition, silent segments in the MP3 audio are detected using the MDCT spectral energy feature, and the attenuation factor of the silent segments adaptively controls the degree of noise attenuation. This avoids the audible artifacts introduced by over-attenuation or under-attenuation during denoising, and the denoised audio is of good quality.

Description of the Drawings

Figure 1 is a flow chart of the method of the invention.

Detailed Description

A preferred embodiment of the adaptive noise reduction method for MP3 compressed-domain audio is described below with reference to the drawing. The method is divided into five steps:

Step 1: Preprocessing of the noisy MP3 compressed audio

Preprocessing of the noisy MP3 compressed audio consists of four parts: decoding the MP3 frame header; obtaining the side information; reading the main data and scale factors; and Huffman decoding with dequantization.

1. Synchronizing the data stream and obtaining the frame header information

A) Search the MP3 data stream for the synchronization information defined by the MP3 coding format;

B) Use the synchronization information to find the starting position of each frame in the MP3 data stream;

C) Once the starting position of a data frame is determined, obtain the frame header information Head;

2. Obtaining the side information

A) Determine the starting position of the side information from the coding format of the MP3 frame header;

B) Obtain the side information Side using the frame header information Head;

3. Reading the MP3 main data and scale factors

A) Compute the length L of the main data from the side information Side;

B) Determine the starting position of the MP3 main data from the main data offset in the frame header information Head;

C) Read the main data D, of total length L, from the current frame;

D) Extract the scale factors Scale from the main data D;

4. Huffman decoding and dequantization

A) Determine the start and end positions of the Huffman-coded data from the side information Side;

B) Huffman-decode the MP3 main data D to obtain the 32*18 Huffman decoding result F[32, 18];

C) Dequantize the data in the Huffman decoding result F[32, 18].
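The synchronization search in item 1 of this step can be sketched as follows. This is a minimal illustration and not the patent's implementation: it assumes MPEG-1 Layer III frames only, ignores free-format streams, CRC checks, and ID3 tags, and the names `parse_header` and `find_frame_offsets` are hypothetical. It relies on the standard header layout (11 sync bits, version, layer, bit-rate index, sample-rate index, padding bit) and the usual Layer III frame length of 144 × bitrate / sample rate plus padding.

```python
# Bit rates (kbit/s) and sample rates (Hz) for MPEG-1 Layer III headers.
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
SAMPLE_RATES_HZ = [44100, 48000, 32000]


def parse_header(data, pos):
    """Return header fields if an MPEG-1 Layer III header starts at pos, else None."""
    if pos + 4 > len(data):
        return None
    b0, b1, b2 = data[pos], data[pos + 1], data[pos + 2]
    # Frame sync: 11 consecutive set bits.
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return None
    version = (b1 >> 3) & 0x03        # 3 means MPEG-1
    layer = (b1 >> 1) & 0x03          # 1 means Layer III
    bitrate_index = (b2 >> 4) & 0x0F
    rate_index = (b2 >> 2) & 0x03
    padding = (b2 >> 1) & 0x01
    if version != 3 or layer != 1:
        return None
    # Index 0 is free format and 15 is invalid; sample-rate index 3 is reserved.
    if bitrate_index in (0, 15) or rate_index == 3:
        return None
    bitrate = BITRATES_KBPS[bitrate_index] * 1000
    sample_rate = SAMPLE_RATES_HZ[rate_index]
    return {
        "bitrate": bitrate,
        "sample_rate": sample_rate,
        "frame_length": 144 * bitrate // sample_rate + padding,
    }


def find_frame_offsets(data):
    """Scan for sync words and hop frame by frame once a header is found."""
    offsets = []
    pos = 0
    while pos + 4 <= len(data):
        header = parse_header(data, pos)
        if header is None:
            pos += 1                  # no header here, keep searching
        else:
            offsets.append(pos)
            pos += header["frame_length"]
    return offsets
```

Side information parsing, Huffman decoding, and dequantization are considerably longer and are normally taken from an existing MP3 decoder, so they are not sketched here.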

Step 2: MDCT coefficient extraction and amplitude mapping

1. Constructing the modified discrete cosine transform (MDCT) coefficients of each audio frame

A) Allocate two n*576 arrays, MDCT₀[n, 576] and MDCT₁[n, 576], to hold the MDCT coefficients of the two granules of each MP3 frame, where n is the number of frames in the MP3 audio;

B) Locate the MDCT coefficients of the two granules of the same frame in the array F, reorder them from low to high frequency, and store them in MDCT₀[i, j] and MDCT₁[i, j];

C) Compute the average of the MDCT coefficients of the two granules at each frequency point of the same frame, and use it as the MDCT coefficient value M[i, j] of that frame:

$M[i, j] = \frac{MDCT_{0}[i, j] + MDCT_{1}[i, j]}{2}$

where MDCT₀[i, j] and MDCT₁[i, j] are the j-th MDCT spectral values of granule 0 and granule 1 of the i-th audio frame, and M[i, j] is the j-th averaged MDCT spectral value of the i-th audio frame.

2. Mapping the amplitude range of the MDCT coefficients

The amplitudes of the MDCT coefficients are mapped linearly from the range 0 to 1 onto the range 0 to P, which makes it easier to study the statistical distribution of the MDCT coefficients and the corresponding fitting function:

$x'_{ij} = \frac{M[i, j] - M_{min}}{M_{max} - M_{min}} \times P$

where x′_ij is the j-th MDCT spectral value of the i-th audio frame after amplitude mapping, M[i, j] is the j-th averaged MDCT spectral value of the i-th audio frame obtained in item 1, M_min is the smallest MDCT spectral coefficient, M_max is the largest MDCT spectral coefficient, and P is the maximum amplitude after mapping.
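The two formulas of this step can be sketched in a few lines. The sketch assumes the dequantized coefficients are already available as nested lists indexed [frame][frequency], and that M_min and M_max are taken over the whole audio segment (the text does not say whether they are global or per frame). The function names are illustrative.

```python
def average_granules(mdct0, mdct1):
    """M[i][j] = (MDCT0[i][j] + MDCT1[i][j]) / 2 for every frame i and bin j."""
    return [
        [(a + b) / 2.0 for a, b in zip(row0, row1)]
        for row0, row1 in zip(mdct0, mdct1)
    ]


def map_amplitude(m, p):
    """Linearly map all values of m onto [0, p] using the global min and max."""
    flat = [value for row in m for value in row]
    m_min, m_max = min(flat), max(flat)
    span = m_max - m_min
    if span == 0:
        # Degenerate case (constant input): assumed here to map to 0.
        return [[0.0 for _ in row] for row in m]
    return [[(value - m_min) / span * p for value in row] for row in m]
```

For example, averaging `[[0.0, 2.0]]` with `[[2.0, 4.0]]` gives `[[1.0, 3.0]]`, and mapping that onto 0 to P sends the smallest value to 0 and the largest to P.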

Step 3: Prior modeling of the MDCT coefficients and the maximum a posteriori probability estimator

1. Analyze the distribution characteristics of the MDCT coefficients

2. Compute the probability distribution function of the MDCT coefficients

Once the analysis in item 1 has shown that the distribution of the MDCT coefficients is sparse, the normal inverse Gaussian distribution function is used to model it, and the probability distribution function of the MDCT coefficients is expressed as:

$p(x) = \frac{\alpha\delta}{\pi q(x)} \exp[h(x)] K_{1}[\alpha q(x)]$

where K_λ(·) is the modified Bessel function of the second kind with index λ, K₁(·) is the modified Bessel function of the second kind with index 1, and 0 ≤ |β| < α, δ > 0, -∞ < μ < ∞. Here α is the attenuation factor, δ is the scale factor, μ is the mean, and β is the skew factor.
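The definitions of q(x) and h(x) are not reproduced in the text above. In the standard NIG parameterization they are q(x) = sqrt(δ² + (x − μ)²) and h(x) = δ·sqrt(α² − β²) + β(x − μ); the sketch below assumes those forms. To stay within the Python standard library, K₁ is evaluated from the integral representation K_ν(x) = ∫₀^∞ exp(−x cosh t) cosh(νt) dt with a plain trapezoidal rule; a library Bessel routine would normally be used instead. The truncation point and step count are illustrative choices.

```python
import math


def bessel_k1(x, steps=200):
    """Modified Bessel function of the second kind, order 1, for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Beyond t_max the integrand is below exp(-x - 40) * cosh(t_max): negligible.
    t_max = math.acosh(1.0 + 40.0 / x)
    h = t_max / steps
    total = 0.5 * (math.exp(-x) + math.exp(-x * math.cosh(t_max)) * math.cosh(t_max))
    for k in range(1, steps):
        t = k * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(t)
    return total * h


def nig_pdf(x, alpha, delta, beta=0.0, mu=0.0):
    """NIG density p(x) = alpha*delta / (pi*q) * exp(h) * K1(alpha*q)."""
    if not (0 <= abs(beta) < alpha and delta > 0):
        raise ValueError("require 0 <= |beta| < alpha and delta > 0")
    q = math.sqrt(delta ** 2 + (x - mu) ** 2)                            # assumed q(x)
    h = delta * math.sqrt(alpha ** 2 - beta ** 2) + beta * (x - mu)     # assumed h(x)
    return alpha * delta / (math.pi * q) * math.exp(h) * bessel_k1(alpha * q)
```

With β = 0 and μ = 0 the density is symmetric about zero, which matches the symmetry assumption made for the MDCT coefficients in the parameter estimation. For large δ·α the factor exp(h) can overflow while K₁ underflows, so a production implementation would work with logarithms or an exponentially scaled Bessel function.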

3. Analyze the influence of the parameters [α, δ, β, μ]^T on the shape of the normal inverse Gaussian distribution

4. Parameter estimation

To fit the probability distribution of the MDCT coefficients with the normal inverse Gaussian distribution function of item 2, the parameters [α, δ, β, μ]^T must be estimated.

A) Compute the noise variance $\sigma_{\epsilon}^{2}$, the mean μ, and the skew factor β

The added noise is assumed to be zero-mean Gaussian white noise, and the first few frames of the noisy audio are assumed to be pure noise frames. The variance $\sigma_{\epsilon}^{2}$ of the noise MDCT coefficients is estimated from these pure noise frames, and the mean μ is computed from the noisy MDCT coefficients. The MDCT coefficients of an MP3 audio signal are distributed symmetrically, so the skew factor is assumed to be β = 0.

B) Compute the attenuation factor α and the scale factor δ

The NIG distribution model of the noise-free MDCT coefficients is characterized by a skewness coefficient and a kurtosis coefficient, from which the attenuation factor α and the scale factor δ can be estimated as follows:

$\delta = C_{1} \times \sqrt{{\gamma\beta}_{2} |1 - \eta^{2}|}$

$\alpha = C_{2} \times \beta_{2}^{2} \sqrt{{\gamma\beta}_{2}} / \hat{k}_{4}$

where the $\hat{k}$ terms are the 2nd- to 4th-order cumulants of the noisy MDCT coefficients, and C₁ and C₂ control the magnitudes of the attenuation factor α and the scale factor δ so that the NIG distribution fits the distribution of the MDCT coefficients well.

C) Estimate the parameters C₁ and C₂

For different audio types and different signal-to-noise ratios, the fitting error of the MDCT coefficient distribution was measured for different values of C₁ and C₂. The best values found were C₁ = 0.1 and C₂ = 0.1, so the estimates of the attenuation factor α and the scale factor δ become:

$\delta = 0.1 \times \sqrt{{\gamma\beta}_{2} |1 - \eta^{2}|}$

$\alpha = 0.1 \times \beta_{2}^{2} \sqrt{{\gamma\beta}_{2}} / \hat{k}_{4}$
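The statistics that feed the parameter estimation can be sketched as follows. The sketch covers only the noise variance from the leading pure-noise frames and the 2nd- to 4th-order sample cumulants; it does not implement the α and δ formulas, because the skewness, kurtosis, and η terms they depend on are not defined in the text above. Cumulants are computed from biased central moments (k₂ = m₂, k₃ = m₃, k₄ = m₄ − 3m₂²), and the number of leading noise frames is a caller-supplied assumption.

```python
def sample_cumulants(values):
    """Return (k2, k3, k4) estimated from biased central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m2, m3, m4 - 3.0 * m2 ** 2


def noise_variance(frames, n_noise_frames):
    """Variance of the MDCT coefficients in the first n_noise_frames frames,
    which are assumed to contain noise only."""
    coeffs = [v for frame in frames[:n_noise_frames] for v in frame]
    k2, _, _ = sample_cumulants(coeffs)
    return k2
```

A near-zero third cumulant on real data would support the β = 0 symmetry assumption made in A).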

5. Design the attenuation estimator

According to the Bayesian maximum a posteriori probability criterion, the estimation function based on the NIG prior distribution model is:

$\hat{x} = \frac{1}{1 + \sigma_{\epsilon}^{2} \zeta} \times (y + \sigma_{\epsilon}^{2} \beta) = \frac{1}{1 + \sigma_{\epsilon}^{2} \zeta} \times y$

where K_λ(·) is the modified Bessel function of the second kind with index λ, and $\hat{x}$ is the noise-free MP3 audio data obtained by attenuating the noisy MP3 audio data y. The second equality follows from the assumption β = 0.

Accordingly, the attenuation factor for the noisy MP3 audio is:

$a = \frac{1}{1 + \sigma_{\epsilon}^{2} \zeta}$
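As a small worked illustration of the two formulas above, the shrinkage can be written as below. The expression for ζ is not reproduced in the text, so it is treated here as a value supplied by the caller; the function names are illustrative.

```python
def attenuation_factor(noise_var, zeta):
    """a = 1 / (1 + sigma_eps^2 * zeta)."""
    return 1.0 / (1.0 + noise_var * zeta)


def map_estimate(y, noise_var, zeta, beta=0.0):
    """x_hat = (y + sigma_eps^2 * beta) / (1 + sigma_eps^2 * zeta).

    With beta = 0 this reduces to attenuation_factor(...) * y.
    """
    return (y + noise_var * beta) / (1.0 + noise_var * zeta)
```

For instance, with σ_ε² = 0.5 and ζ = 2 the attenuation factor is 1 / (1 + 1) = 0.5, so a noisy coefficient y = 4 is estimated as 2.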

Step 4: Silent segment detection

1. Extracting the MDCT spectral feature

The spectral energy of the MDCT coefficients is computed as follows:

$EM(i) = \frac{1}{N} \sum_{j = 0}^{N - 1} M^{2}(i, j)$

where EM(i) is the MDCT spectral energy of the i-th audio frame, M(i, j) is the j-th averaged MDCT spectral value of the i-th audio frame, and N = 576 is the number of MDCT coefficients in one audio frame. For the whole MP3 audio segment, the MDCT spectral energies of the individual frames form the feature vector EM = [EM(0), EM(1), ... EM(N-1)], with N here denoting the number of frames in the segment; that is, EM is the MDCT spectral energy envelope of the audio segment.

2. Adjusting the decision threshold

A) Initialize the decision threshold: the mean of the MDCT spectral energy envelope of the whole signal is used as the initial decision threshold L_th:

$L_{th} = \frac{1}{N} \sum_{i = 0}^{N - 1} EM(i)$

where EM(i) is the MDCT spectral energy of the i-th audio frame, N is the number of frames in the audio segment, and L_th is the initial decision threshold.

B) Threshold adjustment: all frames whose value in the MDCT spectral envelope EM is below the decision threshold L_th are treated as noise frames:

$EM_{noise}(i) = EM(i) \quad \text{if } EM(i) < L_{th}$

where EM_noise(i) indicates that the MDCT spectral energy of the i-th audio frame is taken as the MDCT spectral energy of a noise frame.

The mean and standard deviation of the noise spectrum sequence, denoted L_noise and S_noise, are initialized as:

$L_{noise} = \frac{1}{M} \sum_{i = 0}^{M - 1} EM_{noise}(i)$

$S_{noise} = \sqrt{\frac{1}{M} \sum_{i = 0}^{M - 1} (EM_{noise}(i) - L_{noise})^{2}}$

where EM_noise(i) is the MDCT spectral energy of the i-th noise frame, L_noise and S_noise are the mean and standard deviation of the noise energy sequence, and M is the number of frames in the noise segment.

Given the mean L_noise and standard deviation S_noise of the noise frame energy sequence, the decision threshold L_th is readjusted:

$L_{th} = C_{0} \times (L_{noise} + C_{1} \times S_{noise})$

where C₀ and C₁ are empirical constants; in the experiments C₀ = 1.001 and C₁ is tuned between 1.5 and 2.0. After the decision threshold L_th has been adjusted, the noise and speech frames are separated again, the mean L_noise and standard deviation S_noise of the noise spectral energy sequence are recomputed, and the decision threshold is adjusted again. This is repeated until the decision threshold is stable.
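The energy feature and the threshold iteration of items 1 and 2 can be sketched as below. The convergence tolerance `tol` and the iteration cap `max_iter` are assumptions added so that "until the decision threshold is stable" has a concrete meaning; the text does not specify them.

```python
import math


def spectral_energy(frames):
    """EM(i): mean squared MDCT value of each frame."""
    return [sum(v * v for v in frame) / len(frame) for frame in frames]


def adapt_threshold(energy, c0=1.001, c1=1.5, tol=1e-9, max_iter=100):
    """Iterate L_th = C0 * (L_noise + C1 * S_noise) until it stops changing."""
    l_th = sum(energy) / len(energy)          # initial threshold: mean energy
    for _ in range(max_iter):
        noise = [e for e in energy if e < l_th]
        if not noise:                         # nothing below threshold: stop
            break
        l_noise = sum(noise) / len(noise)
        s_noise = math.sqrt(sum((e - l_noise) ** 2 for e in noise) / len(noise))
        new_th = c0 * (l_noise + c1 * s_noise)
        converged = abs(new_th - l_th) <= tol
        l_th = new_th
        if converged:
            break
    return l_th
```

With energies `[1.0, 1.0, 1.0, 1.0, 100.0]` the initial threshold is 20.8, the four low-energy frames are taken as noise, and the threshold settles at 1.001 after the second pass.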

3. Merging of activity endpoints

A) Classify each frame as silent or non-silent according to the threshold:

$E_{type}[i] = \begin{cases} 0, & EM[i] < L_{th} \\ 1, & EM[i] \geq L_{th} \end{cases}$

where E_type[i] is the type of the i-th audio frame and EM[i] is its MDCT spectral energy; E_type[i] = 0 denotes a silent frame and E_type[i] = 1 denotes an active audio frame.

B) Count the number of frames F_N contained in each silent segment;

C) If F_N < 10, the segment is a pause within continuous active audio and is merged into the corresponding audio segment;
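A sketch of items A) to C) follows. It assumes that a "pause between continuous active audio segments" means a silent run with active frames on both sides, so short silent runs at the very start or end of the audio are left untouched; that reading is an interpretation, not something the text states.

```python
def classify_frames(energy, l_th):
    """E_type[i]: 0 for a silent frame, 1 for an active frame."""
    return [0 if e < l_th else 1 for e in energy]


def merge_short_pauses(types, min_silence=10):
    """Relabel silent runs shorter than min_silence frames as active when
    they are enclosed by active frames."""
    merged = list(types)
    n = len(merged)
    i = 0
    while i < n:
        if merged[i] != 0:
            i += 1
            continue
        j = i
        while j < n and merged[j] == 0:   # find the end of this silent run
            j += 1
        if j - i < min_silence and i > 0 and j < n:
            for k in range(i, j):
                merged[k] = 1
        i = j
    return merged
```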

Step 5: Adaptive iterative attenuation

1. Use the attenuation function obtained in step 3 to compute the attenuation values of the silent segments detected in step 4;

2. Compute the average attenuation value $\bar{a}$ of the silent segments from item 1;

3. Use the attenuation function obtained in step 3 to attenuate the MDCT coefficients of the noisy MP3 audio;

4. Adjust the number of estimation iterations adaptively according to the average attenuation value $\bar{a}$ of the silent segments: repeat items 1, 2, and 3, and stop iterating when the following condition is satisfied, at which point noise reduction is complete:

$\bar{a} \leq a_{min} + c$

where $\bar{a}$ is the average attenuation value of the silent segments and a_min is the minimum attenuation factor of the whole audio segment, which can be obtained from the high-frequency band of the MDCT coefficients. The constant c controls the residual component and is set to c = 0.001; see Figure 1.
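The control flow of this step can be sketched as follows. The per-coefficient gain is passed in as a callable `gain_fn`, a hypothetical stand-in for the attenuation function of step 3, and `a_min` is supplied by the caller rather than derived from the high-frequency band. The `max_iter` cap is an added safeguard that the text does not mention.

```python
def adaptive_denoise(frames, types, gain_fn, a_min, c=0.001, max_iter=50):
    """Repeat: gains on silent frames -> average -> attenuate all frames,
    stopping once the average silent-frame gain satisfies a_bar <= a_min + c."""
    frames = [list(frame) for frame in frames]   # work on a copy
    iterations = 0
    while iterations < max_iter:
        # Items 1 and 2: attenuation values of the silent frames and their mean.
        silent_gains = [
            gain_fn(v) for frame, t in zip(frames, types) if t == 0 for v in frame
        ]
        if not silent_gains:
            break                                # no silent frames to steer by
        a_bar = sum(silent_gains) / len(silent_gains)
        # Item 3: attenuate every MDCT coefficient.
        frames = [[gain_fn(v) * v for v in frame] for frame in frames]
        iterations += 1
        # Item 4: stopping rule.
        if a_bar <= a_min + c:
            break
    return frames, iterations
```

With a gain that shrinks small coefficients much more than large ones, silent frames are driven toward zero within a few passes while high-energy frames are only mildly attenuated.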

Experimental Results

The experiments used broadcast audio material from China Central Television (CCTV). The audio is in MP3 format with a sampling frequency of 44.1 kHz. Three audio types were used: speech, music, and mixed speech and music, with 20 clips of each type. Gaussian white noise of different levels was added to each type of audio, and the noisy MP3 audio was processed with the proposed adaptive noise reduction algorithm. The signal-to-noise ratio (SNR) after noise reduction is computed as:

$SNR = 10 \log_{10} \frac{\sum_{n = 0}^{N - 1} x^{2}(n)}{\sum_{n = 0}^{N - 1} (x(n) - \hat{x}(n))^{2}}$

where x(n) is the PCM data obtained by decoding the noise-free MP3 audio and $\hat{x}(n)$ is the PCM data obtained by decoding the denoised MP3 audio. The SNR before and after noise reduction is compared in Table 1:

Table 1: SNR of MP3 audio before and after noise reduction

MP3 audio signal    SNR before noise reduction    SNR after noise reduction    Average SNR gain
Music 1             -5 dB                         8.11 dB                      13.11 dB
Music 2             0 dB                          11.40 dB                     11.40 dB
Music 3             5 dB                          14.89 dB                     9.89 dB
Music 4             10 dB                         17.93 dB                     7.93 dB
Music 5             15 dB                         22.57 dB                     7.57 dB
Speech 1            -5 dB                         8.12 dB                      13.12 dB
Speech 2            0 dB                          10.78 dB                     10.78 dB
Music + Speech 1    -5 dB                         6.26 dB                      11.26 dB
Music + Speech 2    0 dB                          9.13 dB                      9.13 dB
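The SNR definition used for Table 1 can be sketched as below, assuming the two decoded PCM sequences are time-aligned and of equal length (the function name is illustrative). The gain column of the table is the difference between the SNR after and before noise reduction, for example 8.11 dB − (−5 dB) = 13.11 dB for Music 1.

```python
import math


def snr_db(clean, denoised):
    """10 * log10( sum x(n)^2 / sum (x(n) - x_hat(n))^2 ) in dB."""
    if len(clean) != len(denoised):
        raise ValueError("sequences must have the same length")
    signal = sum(x * x for x in clean)
    error = sum((x - y) ** 2 for x, y in zip(clean, denoised))
    if error == 0:
        return math.inf                      # identical signals
    return 10.0 * math.log10(signal / error)
```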

Extensive statistical experiments show that the proposed noise reduction method works directly in the MP3 compressed domain and effectively denoises different types of noisy MP3 audio. The signal-to-noise ratio of the MP3 audio is greatly improved after noise reduction, and the processed audio has good perceptual quality. This work solves the problem of denoising audio directly in the MP3 compressed domain and also suggests a new approach to noise-robust algorithms for MP3 audio classification and retrieval.

Claims

1. An MP3 compressed domain audio adaptive noise reduction method is characterized in that: firstly, extracting MDCT coefficients representing the frequency domain characteristics of original audio from MP3 compressed audio, then analyzing the sparse statistical characteristics of the MDCT coefficients, carrying out prior modeling on the MDCT coefficients by adopting a normal inverse Gaussian NIG distribution function, and designing a maximum posterior probability estimator based on the normal inverse Gaussian NIG prior probability model by utilizing a Bayesian criterion to obtain attenuation factors of corresponding audio segments; finally, in a noise reduction part, detecting a mute section in the MP3 audio by using the MDCT spectrum energy characteristics, and adaptively controlling the iteration times of noise attenuation through the attenuation weight of the detected mute section, thereby realizing the adaptive noise reduction of the MP3 compressed audio;

the specific operation steps are as follows:

1) preprocessing the noisy MP3 compressed audio, including decoding the MP3 frame header, acquiring the side information, acquiring the main data and scaling factors, Huffman decoding, and inverse quantization;

2) extracting the MDCT coefficients and mapping their amplitude: the MDCT coefficients of the two granules of each frame are located in the inverse-quantized MP3 frame and averaged frequency bin by frequency bin to construct the MDCT spectral coefficients of each audio frame, and the amplitude range of the MDCT coefficients is mapped to between 0 and L;

3) modeling the distribution of the MDCT coefficients a priori and constructing a maximum a posteriori probability estimator: the distributions of the noise-free and the noisy MDCT coefficients are analyzed separately to obtain the statistical characteristics of the noise-free MDCT coefficients; according to the sparse statistical characteristics of the MDCT coefficients, the MDCT coefficients are modeled a priori with a normal inverse Gaussian (NIG) distribution function; a noise attenuation estimator based on the NIG prior distribution model is designed according to the Bayesian maximum a posteriori probability criterion;

4) silent segment detection: spectral energy features are extracted from the MDCT coefficients, and the silent segments in the MP3 audio are detected according to the MDCT spectral energy feature parameters;

5) adaptive iterative estimation: the noisy MP3 audio is estimated with the estimator of step 3), and the number of estimation iterations is adaptively adjusted according to the attenuation factor of the silent segments detected in step 4);

the specific steps of modeling the distribution of the MDCT coefficients a priori and constructing the maximum a posteriori probability estimator in step 3) are as follows:

① analyzing the distribution characteristics of the MDCT coefficients;

② calculating the probability distribution function of the MDCT coefficients;

the analysis of step ① shows that the MDCT distribution is sparse, so the distribution of the MDCT coefficients is modeled with a normal inverse Gaussian distribution function, and the MDCT probability distribution function is expressed as:

in the formula:

K_λ(·) is the modified Bessel function of the second kind with index λ, and K₁(·) is the modified Bessel function of the second kind with index 1,

0 ≤ |β| < α, δ > 0, and −∞ < μ < ∞; where α is the attenuation factor, δ is the scale factor, μ is the mean, and β is the skew factor;
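For reference, the normal inverse Gaussian density as it is commonly written in the statistics literature is given below, using the parameter names of this claim. This is the textbook form and is only an assumption about the expression referred to above; the claim also mentions a Bessel function of general index λ, so its own formula may be written differently.

```latex
f(x;\alpha,\beta,\mu,\delta) = \frac{\alpha\delta}{\pi}\,
\exp\!\left(\delta\sqrt{\alpha^{2}-\beta^{2}} + \beta\,(x-\mu)\right)
\frac{K_{1}\!\left(\alpha\sqrt{\delta^{2} + (x-\mu)^{2}}\right)}
     {\sqrt{\delta^{2} + (x-\mu)^{2}}}
```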

③ analyzing the influence of the four parameters (attenuation factor α, scale factor δ, mean μ and skew factor β) on the characteristics of the normal inverse Gaussian distribution;

④ parameter estimation

To fit the probability distribution of the MDCT coefficients with the normal inverse Gaussian distribution function of step ②, the four parameters [α, δ, β, μ]^T that determine the shape of the normal inverse Gaussian distribution must be estimated:

A) calculating the noise variance, the mean μ and the skew factor β

Assuming that the added noise is zero-mean white Gaussian noise and that the first few frames of the noisy audio are pure noise frames, the variance of the MDCT coefficients of the noise is estimated from these pure noise frames, and the mean μ of the noisy MDCT coefficients is calculated; because the MDCT coefficients of an MP3 audio signal are symmetrically distributed, the skew factor β is taken to be 0;

B) calculating the attenuation factor α and the scale factor δ

The attenuation factor α and the scale factor δ are estimated from the skewness coefficient and the kurtosis coefficient of the NIG distribution model; the skewness coefficient of the NIG distribution model of the noise-free MDCT coefficients is

the kurtosis coefficient is

where

the corresponding attenuation factor α and scale factor δ are estimated by the following equation:

where the quantities involved are, respectively, the second- to fourth-order cumulants of the noisy MDCT coefficients,

and the parameters C₁ and C₂ weight the attenuation factor α and the scale factor δ; suitably chosen values of C₁ and C₂ enable the NIG to fit the distribution of the MDCT coefficients effectively;

C) estimating the weights C₁ and C₂ of the attenuation factor and the scale factor

For different audio types and different signal-to-noise ratios, the fitting error of the MDCT coefficient distribution is measured for different values of C₁ and C₂, and the optimal values C₁ = 0.1 and C₂ = 0.1 are finally obtained; the estimation formula for the attenuation factor α and the scale factor δ is therefore:
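The weighted estimation formula itself is not reproduced in this text. As a rough illustration of the underlying idea only, the sketch below applies plain cumulant matching for a symmetric NIG (β = 0), for which the standard relations are variance = δ/α and fourth cumulant = 3δ/α³. It omits the weights C₁ and C₂, and it assumes that zero-mean Gaussian noise raises the second cumulant by its variance while leaving the fourth cumulant unchanged. The function names are hypothetical, and this is not presented as the claimed estimator.

```python
import math

def sample_cumulants(y):
    """Return (mean, k2, k4): sample mean, variance and 4th cumulant of y."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m4 = sum((v - mean) ** 4 for v in y) / n
    return mean, m2, m4 - 3.0 * m2 ** 2

def estimate_nig_symmetric(y, noise_var):
    """Cumulant-matching estimate of (alpha, delta, mu) with beta fixed at 0.

    Assumes y = x + n, with x NIG-distributed and n zero-mean Gaussian noise
    of variance noise_var, so k2(x) = k2(y) - noise_var and k4(x) = k4(y).
    """
    mu, k2_y, k4_y = sample_cumulants(y)
    k2_x = k2_y - noise_var
    if k2_x <= 0 or k4_y <= 0:
        raise ValueError("data are not heavier-tailed than the Gaussian noise")
    # For beta = 0: k2 = delta / alpha and k4 = 3 * delta / alpha**3
    alpha = math.sqrt(3.0 * k2_x / k4_y)
    delta = k2_x * alpha
    return alpha, delta, mu
```

A larger fourth cumulant (heavier tails) yields a smaller α, which corresponds to a sparser, more peaked prior.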

⑤ designing a noise attenuation function based on the NIG prior distribution model according to the Bayesian maximum a posteriori probability criterion:

in the formula,

K_λ(·) is the modified Bessel function of the second kind with index λ,

and the output of the function is the noise-free MP3 audio data obtained by attenuating the noisy MP3 audio data y;

accordingly, the attenuation factor for the noisy MP3 audio is obtained as:
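The closed-form attenuation function and attenuation factor are not reproduced in this text. The sketch below shows only the general MAP principle behind them: for a noisy coefficient y = x + n with Gaussian noise, it searches numerically for the x that maximizes the Gaussian log-likelihood plus the log of a symmetric NIG prior (β = 0, μ = 0 assumed). The grid search, the numerical Bessel integration, and all function names are illustrative assumptions, not the claimed closed form.

```python
import math

def bessel_k1(x, t_max=20.0, steps=2000):
    """K_1(x) for x > 0 via K_1(x) = integral_0^inf exp(-x cosh t) cosh t dt,
    evaluated with the trapezoidal rule on [0, t_max]."""
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        c = math.cosh(k * h)
        w = 0.5 if k in (0, steps) else 1.0
        total += w * math.exp(-x * c) * c
    return total * h

def log_nig_prior(x, alpha, delta):
    """Log of the symmetric NIG density (beta = 0, mu = 0), up to a constant.
    Suitable for moderate alpha * sqrt(delta^2 + x^2); K_1 underflows beyond ~700."""
    r = math.sqrt(delta * delta + x * x)
    return math.log(bessel_k1(alpha * r)) - math.log(r)

def map_estimate(y, noise_var, alpha, delta, grid=100):
    """Grid-search MAP estimate of x from y = x + n, n ~ N(0, noise_var)."""
    best_x, best_lp = 0.0, -math.inf
    for k in range(grid + 1):
        x = y * k / grid  # candidates between 0 and y: the prior can only shrink y
        lp = -(y - x) ** 2 / (2.0 * noise_var) + log_nig_prior(x, alpha, delta)
        if lp > best_lp:
            best_x, best_lp = x, lp
    return best_x

def attenuation_factor(y, noise_var, alpha, delta):
    """Ratio of the MAP estimate to the noisy coefficient (1.0 when y == 0)."""
    return 1.0 if y == 0 else map_estimate(y, noise_var, alpha, delta) / y
```

Because the symmetric NIG prior peaks at zero, the resulting factor lies between 0 and 1, and it decreases as the assumed noise variance grows.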

2. The MP3 compressed domain audio adaptive noise reduction method of claim 1, wherein: the specific steps of performing MP3 compressed audio preprocessing in step 1) are as follows:

① acquiring the synchronization data stream and the frame header information;

A) searching the MP3 data stream for synchronization information according to the MP3 encoding format;

B) locating the start position of each data frame in the MP3 data stream according to the synchronization information;

C) after the start position of the data frame is determined, acquiring the frame header information Head;

② acquiring the side information from the decoded frame header information

A) determining the start position of the side information in the MP3 frame header according to the encoding format of the MP3 frame header;

B) acquiring the side information Side from the MP3 frame header information Head;

③ extracting the MP3 main data and the scaling factor

A) calculating the length L of the main data from the side information Side;

B) determining the start position of the MP3 main data from the offset of the main data in the header information Head;

C) acquiring the main data D of total length L from the current frame;

D) extracting the scaling factor Scale from the main data D;

④ performing Huffman decoding and inverse quantization on the MP3 main data stream

A) determining the start and end positions of the Huffman-coded data from the side information Side;

B) performing Huffman decoding on the MP3 main data D to obtain a 32 × 18 Huffman decoding result F[32, 18];

C) inverse quantizing the data in the Huffman decoding result F[32, 18].
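The synchronization search and header decoding of step ① can be sketched as follows. The sketch assumes MPEG-1 Layer III frames, decodes only the header fields needed to locate the side information, and uses hypothetical function names. In the MPEG-1 Layer III layout the side information follows the 4-byte header (and the optional 2-byte CRC) and occupies 17 bytes for mono or 32 bytes for two-channel streams. A practical decoder would additionally validate the bitrate and sampling-rate fields and confirm that consecutive frames line up, since the sync pattern can occur by chance inside audio data.

```python
def find_frame_sync(data, start=0):
    """Return the index of the next MPEG-1 Layer III frame header, or -1."""
    for i in range(start, len(data) - 3):
        # 11 sync bits set, version bits 11 (MPEG-1), layer bits 01 (Layer III);
        # the lowest bit of the second byte is the protection bit and is ignored.
        if data[i] == 0xFF and (data[i + 1] & 0xFE) == 0xFA:
            return i
    return -1

def parse_header(data, pos):
    """Decode selected fields of the 4-byte frame header at data[pos:pos + 4]."""
    b1, b2, b3 = data[pos + 1], data[pos + 2], data[pos + 3]
    has_crc = (b1 & 0x01) == 0          # protection bit 0 means a CRC follows
    channel_mode = b3 >> 6              # 3 means single channel (mono)
    return {
        "bitrate_index": b2 >> 4,
        "sample_rate_index": (b2 >> 2) & 0x03,
        "padding": (b2 >> 1) & 0x01,
        "channel_mode": channel_mode,
        "side_info_start": pos + 4 + (2 if has_crc else 0),
        "side_info_len": 17 if channel_mode == 3 else 32,
    }
```

For example, the header bytes `FF FB 90 64` decode to bitrate index 9, sampling-rate index 0, no padding, and joint stereo.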

3. The MP3 compressed domain audio adaptive noise reduction method of claim 2, wherein: the MDCT coefficient extraction and amplitude mapping process in step 2) specifically includes the following steps:

① constructing the modified discrete cosine transform (MDCT) coefficients of each audio frame

A) allocating storage MDCT₀[n, 576] and MDCT₁[n, 576], each of size n × 576, for the MDCT coefficients of the two granules of each MP3 audio frame, where n is the number of frames of the MP3 audio;

B) locating the MDCT coefficients of the two granules of the same audio frame in the array F and rearranging them in order of increasing frequency to obtain MDCT₀[i, j] and MDCT₁[i, j];

C) calculating the average of the MDCT coefficients of the two granules at the same frequency bin in the same audio frame and taking it as the MDCT coefficient value M[i, j] of that frame:

M [i, j] = \frac{{MDCT}_{0} [i, j] + {MDCT}_{1} [i, j]}{2}

where MDCT₀[i, j] and MDCT₁[i, j] are the j-th MDCT spectral values of granule 0 and granule 1 of the i-th audio frame, respectively, and M[i, j] is the j-th average MDCT spectral value of the i-th audio frame;

② mapping the amplitude range of the MDCT coefficients: the amplitude of the MDCT coefficients is linearly mapped to between 0 and P, in the range of 0 to 1, so that the statistical distribution of the MDCT coefficients and the corresponding fitting function can be studied conveniently:

where x′_ij is the j-th MDCT spectral value of the i-th audio frame after amplitude mapping, M[i, j] is the j-th average MDCT spectral value of the i-th audio frame obtained in step ①, M_min is the smallest MDCT spectral coefficient, M_max is the largest MDCT spectral coefficient, and P is the maximum amplitude after mapping.
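The granule averaging and amplitude mapping can be sketched in a few lines. The averaging follows the formula for M[i, j] given above. The mapping formula is not reproduced in this text, so the sketch assumes the usual linear min-max form x′ = (M − M_min) / (M_max − M_min) × P, with the extremes taken over the whole audio clip; the function names are hypothetical.

```python
def average_granules(mdct0, mdct1):
    """Average the two granules of every frame bin by bin to obtain M[i][j]."""
    return [[(a + b) / 2.0 for a, b in zip(g0, g1)]
            for g0, g1 in zip(mdct0, mdct1)]

def map_amplitude(m, p=1.0):
    """Linearly map all values of M from [M_min, M_max] onto [0, p] (assumed form)."""
    m_min = min(min(row) for row in m)
    m_max = max(max(row) for row in m)
    span = m_max - m_min
    if span == 0:
        # Constant input: every value maps to the lower bound
        return [[0.0 for _ in row] for row in m]
    return [[(v - m_min) / span * p for v in row] for row in m]
```

In practice each row would hold the 576 coefficients of one frame; the two-element rows used for testing only keep the arithmetic easy to follow.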

4. The MP3 compressed domain audio adaptive noise reduction method of claim 1, wherein: the silent segment detection of step 4) comprises the following specific steps:

① extracting spectral features based on the MDCT coefficients

where EM(i) is the MDCT spectral energy of the i-th audio frame, M(i, j) is the j-th average MDCT spectral value of the i-th audio frame, and N = 576 is the number of MDCT coefficients in one audio frame; for the entire MP3 audio segment, the MDCT spectral energies of its frames form the feature vector EM = [EM(0), EM(1), ..., EM(N−1)], where N in this vector denotes the number of frames of the audio segment, i.e. EM is the MDCT spectral energy envelope of the audio segment;
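The energy formula is not reproduced in this text. A minimal sketch, assuming the frame energy is the sum of the squared averaged MDCT coefficients of the frame (a per-frame mean of squares would differ only by the constant factor 1/576), is shown below; the function name is hypothetical.

```python
def mdct_spectral_energy(m):
    """Return the energy envelope EM: one value per frame of M.

    Assumes EM(i) = sum over j of M[i][j] ** 2.
    """
    return [sum(v * v for v in frame) for frame in m]
```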

② adjusting the decision threshold according to the MDCT spectral energy characteristics

A) initializing the decision threshold: the mean of the MDCT spectral energy envelope of the whole signal is taken as the initial decision threshold L_th

L_{th} = \frac{1}{N} \sum_{i=0}^{N-1} EM(i)

where EM(i) is the MDCT spectral energy of the i-th audio frame, N is the number of frames of the audio segment, and L_th is the initial decision threshold;

B) adjusting the threshold: every frame of the audio segment whose MDCT spectral energy EM is below the decision threshold L_th is treated as a noise frame, so that

EM_noise(i) = EM(i) if EM(i) < L_th

where EM_noise(i) indicates that the MDCT spectral energy of the i-th audio frame is that of a noise frame;

the mean and standard deviation of the noise spectrum sequence, denoted L_noise and S_noise respectively, are initialized as

L_{noise} = \frac{1}{M} \sum_{i=0}^{M-1} {EM}_{noise}(i)

S_{noise} = \sqrt{\frac{1}{M} \sum_{i=0}^{M-1} \left( {EM}_{noise}(i) - L_{noise} \right)^{2}}

where EM_noise(i) is the MDCT spectral energy of the i-th noise frame, L_noise and S_noise are the mean and standard deviation of the noise energy sequence, respectively, and M is the number of frames in the noise segment;

on the basis of the mean L_noise and standard deviation S_noise of the noise frame energy sequence, the decision threshold L_th is readjusted:

L_th = C₀ × (L_noise + C₁ × S_noise)

where C₀ and C₁ are empirical constants, with C₀ = 1.001 and C₁ adjusted between 1.5 and 2.0; with the adjusted decision threshold L_th, noise frames and speech frames are distinguished again, the mean L_noise and standard deviation S_noise of the noise spectral energy sequence are recalculated, and the decision threshold is adjusted again; these steps are repeated until the decision threshold is stable;
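The threshold iteration of step ② can be sketched as follows. The update rule and the constants C₀ and C₁ come from the text; the convergence tolerance, the iteration cap, and the function name are assumptions added so that the loop is guaranteed to terminate.

```python
import math

def adaptive_threshold(em, c0=1.001, c1=1.5, tol=1e-9, max_iter=100):
    """Iterate L_th = c0 * (L_noise + c1 * S_noise) until it stops changing."""
    l_th = sum(em) / len(em)                  # A) initial threshold: mean energy
    for _ in range(max_iter):
        noise = [e for e in em if e < l_th]   # frames currently judged as noise
        if not noise:
            break
        l_noise = sum(noise) / len(noise)
        s_noise = math.sqrt(sum((e - l_noise) ** 2 for e in noise) / len(noise))
        new_th = c0 * (l_noise + c1 * s_noise)
        converged = abs(new_th - l_th) <= tol
        l_th = new_th
        if converged:
            break
    return l_th
```

With an envelope such as `[1.0, 1.2, 0.8, 1.0, 50.0, 60.0, 55.0, 1.1]`, the initial mean lies far above the noise floor, and the threshold settles just above the five low-energy frames after one update.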

③ merging active endpoints

A) classifying each frame as a silent frame or a non-silent frame according to the threshold

E_{type}[i] = \begin{cases} 0, & EM[i] < L_{th} \\ 1, & EM[i] \ge L_{th} \end{cases}

where E_type[i] is the audio type of the i-th frame and EM[i] is the MDCT spectral energy of the i-th audio frame; E_type[i] = 0 indicates a silent frame and E_type[i] = 1 indicates an active audio frame;

B) calculating the number of frames F_N contained in each silent segment;

C) if F_N < 10, the silent segment is a pause between consecutive active audio segments and is merged into the corresponding audio segment.
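Frame classification and the merging of short pauses can be sketched as follows. The 10-frame limit comes from the text; leaving short silent runs at the very start or end of the clip unmerged is an assumption of this sketch (such runs are not between two active segments), and the function names are hypothetical.

```python
def classify_frames(em, l_th):
    """E_type[i] = 0 for silent frames (EM < L_th), 1 for active frames."""
    return [0 if e < l_th else 1 for e in em]

def merge_short_pauses(e_type, min_silence=10):
    """Relabel silent runs shorter than min_silence that lie between active frames."""
    out = list(e_type)
    i, n = 0, len(out)
    while i < n:
        if out[i] == 0:
            j = i
            while j < n and out[j] == 0:
                j += 1
            # The silent run is out[i:j]; merge it only if it is short and
            # bounded by active frames on both sides.
            if j - i < min_silence and i > 0 and j < n:
                out[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return out
```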

5. The MP3 compressed domain audio adaptive noise reduction method of claim 1, wherein: the adaptive iterative estimation of step 5) specifically comprises the following steps:

① calculating the attenuation values of the silent segments detected in step 4) with the attenuation function obtained in step 3);

② calculating the average attenuation value ā of the silent segments of step ①;

③ attenuating the MDCT coefficients of the noisy MP3 audio with the attenuation function obtained in step 3);

④ adaptively adjusting the number of estimation iterations according to the average attenuation value ā of the silent segments: steps ①, ② and ③ are repeated, and the iteration stops and noise reduction is complete when the following condition is met:

\bar{a} \le a_{min} + c

where ā is the average attenuation value of the silent segments, a_min is the minimum attenuation factor over the whole audio, which can be obtained from the high-frequency band of the MDCT coefficients, and c controls the residual component, with c = 0.001.
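The iteration control of steps ① to ④ can be sketched as follows. The attenuation function is passed in as a callable because its closed form is not reproduced in this text; a_min is likewise supplied by the caller. The iteration cap and the function name are assumptions of this sketch.

```python
def adaptive_denoise(frames, silent_idx, attenuation, a_min, c=0.001, max_iter=50):
    """Repeat attenuation until the mean silent-segment gain is <= a_min + c.

    frames      -- list of per-frame MDCT coefficient lists (noisy input)
    silent_idx  -- indices of the frames detected as silent
    attenuation -- callable mapping one coefficient y to a gain in [0, 1]
    Returns the attenuated frames and the number of iterations performed.
    """
    if not silent_idx:
        raise ValueError("at least one silent frame is required")
    iterations = 0
    for _ in range(max_iter):
        # Step 1: gains of the silent-segment coefficients
        gains = [attenuation(y) for i in silent_idx for y in frames[i]]
        # Step 2: average gain of the silent segments
        a_bar = sum(gains) / len(gains)
        # Step 3: attenuate every coefficient of the whole clip
        frames = [[attenuation(y) * y for y in f] for f in frames]
        iterations += 1
        # Step 4: stop once the silent segments are attenuated far enough
        if a_bar <= a_min + c:
            break
    return frames, iterations
```

Clips with more residual noise in their silent segments therefore receive more attenuation passes, which is the adaptive behavior described in the claim.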