CN101930746B - An Adaptive Noise Reduction Method for MP3 Compressed Domain Audio - Google Patents
- Publication number
- CN101930746B (application No. CN2010102154044A)
- Authority
- CN
- China
- Prior art keywords
- audio
- mdct
- noise
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention relates to an adaptive noise reduction method for MP3 audio that operates directly in the MP3 compressed domain.

First, MDCT coefficients are extracted from the noisy MP3 audio data. Activity detection based on MDCT spectral-energy features then separates the audio into active segments and silent segments. In parallel, the sparsity of the MDCT coefficients is exploited: a normal inverse Gaussian (NIG) distribution serves as a prior statistical model of the coefficients. Based on Bayesian theory, a maximum a posteriori (MAP) estimator built on this NIG prior yields an attenuation factor for each audio segment.

In the noise-attenuation stage, these attenuation factors are applied to the noisy segments. The number of attenuation iterations is adjusted adaptively according to the attenuation weight of the silent segments.

Experimental results show that the algorithm effectively removes noise from MP3 audio. It also raises the signal-to-noise ratio of the compressed audio and preserves good audio quality after noise reduction.
Description
Technical Field
The invention relates to an adaptive noise reduction method for MP3 compressed-domain audio. It applies adaptive noise reduction to noisy MP3 audio directly in the MP3 compressed domain, under a range of white Gaussian noise conditions.
Background
Audio noise reduction uses signal processing and pattern recognition methods to remove noise from noisy audio. The goal is audio with a higher signal-to-noise ratio and better quality. Audio noise reduction is one of the key problems in audio signal processing.

Large amounts of audio data on the Internet and in databases are stored in compressed formats, so processing audio directly in the compressed domain has become an active research topic. Researchers in China and abroad have studied segmentation, classification and retrieval algorithms for compressed audio extensively, with results close to those obtained on uncompressed audio. When the compressed audio contains noise, however, the accuracy of audio classification and retrieval degrades severely.

The usual approach is to decompress the noisy audio first and then denoise it. This is time-consuming and inevitably reduces the efficiency of any further processing of compressed audio. It is therefore important to perform noise reduction directly in the compressed domain, at minimal computational cost, to improve the efficiency of compressed-domain audio retrieval.

Audio compression takes the auditory masking properties of the human ear into account: psychoacoustic model 2 is used to select the window function of the modified discrete cosine transform (MDCT). The MDCT can also be regarded as a modification of the FFT, and MDCT coefficients are sparse. MDCT coefficients can therefore be extracted from compressed-domain audio and modeled a priori with a distribution that fits sparse data. A filter can then be constructed to denoise the compressed-domain audio.

The present invention follows this approach. It extracts MDCT coefficients from MP3 (MPEG-1 Audio Layer III) compressed audio and models their distribution a priori with the normal inverse Gaussian function. It then constructs a maximum a posteriori estimator to denoise the compressed-domain audio.

The proposed method solves the problem of denoising noisy audio in the MP3 compressed domain. It can further be applied in speech recognition and classification/retrieval systems for MP3 audio.
Summary of the Invention
The object of the invention is to provide an adaptive noise reduction method for MP3 compressed-domain audio. The method extracts MDCT coefficients from MP3 audio, models their distribution a priori, and constructs an estimator to denoise noisy MP3 audio.

The technical solution works as follows. First, MDCT coefficients are extracted from the MP3 audio data. Their prior probability distribution is then modeled, and a noise attenuation estimator is constructed. At the same time, the silent segments of the MP3 audio are detected, and their attenuation proportion is used to adjust how strongly the noisy audio segments are attenuated.

The technical solution can be refined further:
- The MDCT coefficients are extracted from the MP3 audio data and their characteristics analyzed.
- Based on these characteristics, the normal inverse Gaussian distribution is chosen as the prior model of the MDCT coefficient distribution.
- A noise attenuation estimator is constructed according to Bayesian maximum a posteriori (MAP) theory.
- MDCT spectral-energy features are used to detect the silent segments of the MP3 audio.
- The attenuation proportion of the silent segments adjusts the degree of noise attenuation during denoising.

The method comprises the following steps:
1) Preprocessing of the noisy compressed MP3 audio, in four parts: decoding the MP3 frame header, obtaining the side information, reading the main data and scale factors, and Huffman decoding with dequantization;
2) MDCT coefficient extraction and amplitude mapping: take the MDCT coefficients of the two granules of each dequantized MP3 frame and average them frequency line by frequency line. This gives the MDCT spectrum of each frame, whose amplitude range is then mapped to 0 to P;
3) Prior modeling of the MDCT distribution and construction of the MAP estimator:
   - Analyze the distributions of noise-free and noisy MDCT coefficients to obtain the statistical properties of the noise-free coefficients.
   - Based on their sparse statistics, model the MDCT coefficients a priori with the normal inverse Gaussian (NIG) distribution.
   - Derive an estimator based on the NIG prior according to the Bayesian MAP criterion;
4) Silent segment detection: extract spectral-energy features from the MDCT coefficients and use them to detect the silent segments of the MP3 audio;
5) Adaptive iterative estimation: apply the estimator from 3) to the noisy MP3 audio. The number of estimation iterations is adjusted adaptively using the attenuation factors of the silent segments detected in 4).
The invention has the following beneficial effects:
- Noise reduction is performed directly in the MP3 compressed domain. This is simpler and saves computation compared with the conventional approach of decoding MP3 into uncompressed WAV audio before denoising.
- The distribution of MP3 MDCT coefficients is studied and a suitable prior model is selected. Experiments show that this function fits the MDCT distribution effectively.
- The noise attenuation estimator designed from this prior effectively denoises compressed MP3 audio.
- Silent segments are detected with MDCT spectral-energy features, and their attenuation factors adaptively control the degree of attenuation. This avoids the artifacts introduced by over- or under-attenuation, and the denoised audio has good quality.
Description of Drawings
Figure 1 is a flow chart of the method of the invention.
Detailed Description of the Embodiments
A preferred embodiment of the adaptive noise reduction method for MP3 compressed-domain audio is described below with reference to the drawing. The method consists of five steps:
Step 1: Preprocessing of the noisy compressed MP3 audio
Preprocessing of the noisy compressed MP3 audio consists of four parts: decoding the MP3 frame header, obtaining the side information, reading the main data and scale factors, and Huffman decoding with dequantization.
1. Stream synchronization and frame header acquisition
A) Search the MP3 data stream for the synchronization information defined by the MP3 format;
B) Use the synchronization information to locate the start of each frame in the MP3 data stream;
C) Once the start of a frame is found, read the frame header Head;
2. Obtaining the side information
A) Determine the start position of the side information from the MP3 frame header format;
B) Obtain the side information Side from the frame header information Head;
3. Reading the MP3 main data and scale factors
A) Compute the length L of the main data from the side information Side;
B) Determine the start position of the MP3 main data from the main-data offset in the frame header information Head;
C) Read main data D of total length L from the current frame;
D) Extract the scale factors Scale from the main data D;
4. Huffman decoding and dequantization
A) Determine the start and end positions of the Huffman-coded data from the side information Side;
B) Huffman-decode the MP3 main data D to obtain the 32×18 result array F[32, 18];
C) Dequantize the data in the Huffman decoding result F[32, 18].
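The synchronization search of item 1 and the dequantization of item 4 can be illustrated with the short Python sketch below. It is a simplified illustration, not the patent's code, and rests on several assumptions:
- It handles MPEG-1 Layer III streams only, with no MPEG-2/2.5 and no free-format bitrates.
- It handles long blocks only.
- The function names `find_frame_header` and `requantize_long` are hypothetical.

The requantization follows the long-block formula of ISO/IEC 11172-3.

```python
import math

# MPEG-1 Layer III bitrates in kbit/s, indexed by the 4-bit bitrate field (0 = free format).
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
# MPEG-1 sampling rates in Hz, indexed by the 2-bit sampling-frequency field (3 is reserved).
SAMPLE_RATES_HZ = [44100, 48000, 32000]


def find_frame_header(data: bytes, start: int = 0):
    """Return (offset, header) for the first MPEG-1 Layer III header at or after `start`, else None."""
    for pos in range(start, len(data) - 3):
        b0, b1, b2, b3 = data[pos], data[pos + 1], data[pos + 2], data[pos + 3]
        # 11 sync bits set, version bits 11 (MPEG-1), layer bits 01 (Layer III).
        if b0 != 0xFF or (b1 & 0xFE) != 0xFA:
            continue
        bitrate_index, sr_index = b2 >> 4, (b2 >> 2) & 0x3
        if bitrate_index in (0, 15) or sr_index == 3:
            continue  # free-format and reserved values are not handled in this sketch
        bitrate = BITRATES_KBPS[bitrate_index] * 1000
        sample_rate = SAMPLE_RATES_HZ[sr_index]
        padding = (b2 >> 1) & 0x1
        return pos, {
            "crc_protected": (b1 & 0x1) == 0,
            "bitrate": bitrate,
            "sample_rate": sample_rate,
            "padding": padding,
            "channel_mode": b3 >> 6,
            # Layer III frame length in bytes: 144 * bitrate / sample_rate (+1 if padded).
            "frame_length": 144 * bitrate // sample_rate + padding,
        }
    return None


def requantize_long(is_value: int, global_gain: int, scalefac_scale: int = 0,
                    scalefac: int = 0, preflag: int = 0, pretab: int = 0) -> float:
    """Requantize one long-block Huffman value:
    sign(is) * |is|^(4/3) * 2^((global_gain - 210) / 4) * 2^(-m * (scalefac + preflag * pretab)),
    where m = 0.5 if scalefac_scale == 0 else 1."""
    m = 0.5 if scalefac_scale == 0 else 1.0
    magnitude = abs(is_value) ** (4.0 / 3.0)
    gain = 2.0 ** (0.25 * (global_gain - 210))
    scale = 2.0 ** (-m * (scalefac + preflag * pretab))
    return math.copysign(magnitude * gain * scale, is_value)
```

For a 128 kbit/s, 44.1 kHz frame without padding, the computed frame length is 417 bytes. Adding `frame_length` to the returned offset gives the position where the next header search can start.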
Step 2: MDCT coefficient extraction and amplitude mapping
1. Constructing the modified discrete cosine transform (MDCT) coefficients of each audio frame
A) Allocate two n×576 arrays, MDCT_0[n, 576] and MDCT_1[n, 576], to hold the MDCT coefficients of the two granules of each MP3 frame, where n is the number of frames in the MP3 audio;
B) From the array F, take the MDCT coefficients of the two granules of the same frame, reorder them from low to high frequency, and store them in MDCT_0[i, j] and MDCT_1[i, j];
C) Compute the mean of the two granules' MDCT coefficients at each frequency line of the same frame, and use it as that frame's MDCT coefficient M[i, j]:
$$M[i,j] = \frac{\mathrm{MDCT}_0[i,j] + \mathrm{MDCT}_1[i,j]}{2}$$
where MDCT_0[i, j] and MDCT_1[i, j] are the j-th MDCT spectral values of granule 0 and granule 1 of the i-th frame, and M[i, j] is the j-th averaged MDCT spectral value of the i-th frame.
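Sub-steps B) and C) reduce to a reordering followed by an element-wise average. The sketch below assumes long blocks, for which the Huffman output stored as F[32, 18] is already in frequency order when read as index sb·18 + k. Short blocks are interleaved by window and would need a different reordering. The names `flatten_subbands` and `average_granules` are hypothetical.

```python
from typing import List, Sequence


def flatten_subbands(F: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a 32 x 18 array F[sb][k] into 576 lines ordered by index sb * 18 + k (long blocks)."""
    return [F[sb][k] for sb in range(32) for k in range(18)]


def average_granules(mdct0: Sequence[Sequence[float]],
                     mdct1: Sequence[Sequence[float]]) -> List[List[float]]:
    """M[i][j] = (MDCT0[i][j] + MDCT1[i][j]) / 2 for every frame i and frequency line j."""
    if len(mdct0) != len(mdct1):
        raise ValueError("both granule arrays must cover the same number of frames")
    return [[(a + b) / 2.0 for a, b in zip(g0, g1)] for g0, g1 in zip(mdct0, mdct1)]
```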
2. MDCT amplitude range mapping
The MDCT amplitudes are normalized to the range 0 to 1 and then linearly mapped to the range 0 to P. This simplifies the study of their statistical distribution and of the corresponding fitting functions:
$$x'_{ij} = P\,\frac{M[i,j] - M_{\min}}{M_{\max} - M_{\min}}$$
where:
- x′_ij is the j-th MDCT spectral value of the i-th frame after amplitude mapping;
- M[i, j] is the j-th averaged MDCT spectral value of the i-th frame obtained in 1;
- M_min and M_max are the smallest and largest MDCT spectral coefficients;
- P is the maximum amplitude after mapping.
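The mapping amounts to a global min-max scaling, sketched below under a few assumptions:
- M_min and M_max are taken over all frames of the file.
- A constant input maps to 0.
- The default P = 1.0 is an arbitrary placeholder, since the text gives no value for P.

`map_amplitudes` is a hypothetical name.

```python
from typing import List, Sequence


def map_amplitudes(M: Sequence[Sequence[float]], P: float = 1.0) -> List[List[float]]:
    """x'[i][j] = P * (M[i][j] - M_min) / (M_max - M_min), with global min/max over all frames."""
    values = [v for row in M for v in row]
    m_min, m_max = min(values), max(values)
    span = m_max - m_min
    if span == 0:
        return [[0.0 for _ in row] for row in M]  # constant input: map everything to 0
    return [[P * (v - m_min) / span for v in row] for row in M]
```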
Step 3: Prior modeling of the MDCT coefficients and the maximum a posteriori estimator
1. Analyze the distribution of the MDCT coefficients
2. Compute the probability distribution function of the MDCT coefficients
The analysis in 1 shows that the MDCT coefficients have a sparse distribution, so the normal inverse Gaussian (NIG) distribution is used to model them. The resulting MDCT probability density function is
$$f(x;\alpha,\delta,\beta,\mu) = \frac{\alpha\delta\,K_1\!\left(\alpha\sqrt{\delta^2+(x-\mu)^2}\right)}{\pi\sqrt{\delta^2+(x-\mu)^2}}\,e^{\delta\gamma+\beta(x-\mu)},\qquad \gamma=\sqrt{\alpha^2-\beta^2}$$
where K_λ(·) denotes the modified Bessel function of the second kind of index λ, so K_1(·) is the one of index 1. The parameters satisfy 0 ≤ |β| < α, δ > 0 and −∞ < μ < ∞. Here α is the decay factor, δ the scale factor, μ the mean (location), and β the tilt factor.
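Evaluating the NIG density numerically requires K_1, but the Python standard library has no Bessel functions. This sketch therefore computes log K_1 from the integral representation K_1(z) = ∫₀^∞ exp(−z cosh t) cosh t dt with the trapezoidal rule. The cut-off `t_max` and the step count are assumptions chosen for double-precision accuracy at moderate arguments. In practice `scipy.special.k1` or `scipy.stats.norminvgauss` would normally be used instead. The density follows the standard NIG form with parameters α, δ, β, μ, and the function names are hypothetical.

```python
import math


def log_bessel_k1(z: float, t_max: float = 12.0, steps: int = 600) -> float:
    """log K_1(z), z > 0, via K_1(z) = integral_0^inf exp(-z cosh t) cosh t dt (trapezoidal rule)."""
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        # exp(-z) is factored out so that large z does not underflow to log(0).
        total += weight * math.exp(-z * (math.cosh(t) - 1.0)) * math.cosh(t)
    return -z + math.log(total * h)


def nig_pdf(x: float, alpha: float, delta: float, beta: float = 0.0, mu: float = 0.0) -> float:
    """Normal inverse Gaussian density; requires 0 <= |beta| < alpha and delta > 0."""
    if not (0 <= abs(beta) < alpha and delta > 0):
        raise ValueError("invalid NIG parameters")
    gamma = math.sqrt(alpha * alpha - beta * beta)
    q = math.sqrt(delta * delta + (x - mu) ** 2)
    log_f = (math.log(alpha * delta / math.pi) + log_bessel_k1(alpha * q) - math.log(q)
             + delta * gamma + beta * (x - mu))
    return math.exp(log_f)
```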
3. Analyze the effect of the parameters [α, δ, β, μ]^T on the shape of the NIG distribution
4. Parameter estimation
Fitting the MDCT distribution with the NIG function of 2 requires estimating the parameters [α, δ, β, μ]^T.
A) Estimate the noise variance σ_n², the mean μ, and the tilt factor β. Two assumptions are made: the added noise is zero-mean white Gaussian noise, and the first few frames of the noisy audio contain noise only. σ_n², the variance of the noise MDCT coefficients, is estimated from these pure-noise frames. μ is computed as the mean of the noisy MDCT coefficients. Because the MDCT coefficients of MP3 audio are symmetrically distributed, the tilt factor is set to β = 0.
B) Estimate the decay factor α and the scale factor δ
For the NIG model of the noise-free MDCT coefficients, the skewness and excess kurtosis are
$$\text{skewness} = \frac{3\beta}{\alpha\sqrt{\delta\gamma}},\qquad \text{excess kurtosis} = \frac{3\left(1 + 4\beta^2/\alpha^2\right)}{\delta\gamma},\qquad \gamma = \sqrt{\alpha^2 - \beta^2}$$
and the decay factor α and the scale factor δ can be estimated as follows:
The estimates use the second- to fourth-order cumulants of the noisy MDCT coefficients. C_1 and C_2 control the magnitudes of α and δ so that the NIG model fits the MDCT distribution effectively.
C) Estimate the parameters C_1 and C_2
The fitting error of the MDCT distribution was measured for different values of C_1 and C_2 across audio types and signal-to-noise ratios. The best values were C_1 = 0.1 and C_2 = 0.1, which give the following estimates of the decay factor α and the scale factor δ:
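The patent's own estimates of α and δ (with C_1 = C_2 = 0.1) are not reproduced in this text. The sketch below therefore substitutes plain moment matching, and it should not be read as the patent's formula. It rests on the following assumptions and facts:
- β = 0, and the noise is independent zero-mean Gaussian noise.
- Cumulants of independent variables add, and Gaussian noise contributes only to the second cumulant.
- A symmetric NIG has variance δ/α and fourth cumulant 3δ/α³.

Together these give α = √(3(κ₂ − σ_n²)/κ₄) and δ = α(κ₂ − σ_n²), where κ₂ and κ₄ are the second and fourth cumulants of the noisy coefficients. The mean μ is simply `statistics.fmean` of the noisy coefficients. All function names are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple


def noise_variance(noise_frames: Sequence[Sequence[float]]) -> float:
    """sigma_n^2 from leading pure-noise frames, assuming zero-mean noise (mean not subtracted)."""
    values = [v for frame in noise_frames for v in frame]
    return fmean([v * v for v in values])


def cumulants_2_4(values: Sequence[float]) -> Tuple[float, float]:
    """Sample 2nd and 4th cumulants: k2 = m2, k4 = m4 - 3 * m2^2 (central moments, 1/N)."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m4 = fmean([(v - mean) ** 4 for v in values])
    return m2, m4 - 3.0 * m2 * m2


def nig_alpha_delta(k2_noisy: float, k4_noisy: float, noise_var: float) -> Tuple[float, float]:
    """Moment-matched (alpha, delta) of a symmetric NIG (beta = 0) observed in Gaussian noise."""
    v = k2_noisy - noise_var  # clean variance: the noise only adds to the 2nd cumulant
    if v <= 0 or k4_noisy <= 0:
        raise ValueError("need a positive clean variance and a positive 4th cumulant")
    alpha = math.sqrt(3.0 * v / k4_noisy)  # from v = delta/alpha and k4 = 3*delta/alpha**3
    return alpha, alpha * v
```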
5. Designing the attenuation estimator
According to the Bayesian maximum a posteriori criterion, the following estimation function based on the NIG prior is designed:
where K_λ(·) is the modified Bessel function of the second kind of index λ, and the estimate x̂ is the noise-free MP3 audio data obtained by attenuating the noisy MP3 audio data y.
Correspondingly, the attenuation factor of the noisy MP3 audio is:
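The closed-form estimator itself is not reproduced in this text. As a hedged stand-in, the sketch below computes the same kind of MAP estimate numerically for a single coefficient:
- It maximizes log N(y; x, σ_n²) + log NIG(x; α, δ) over a grid between 0 and y. This is sufficient when β = μ = 0, because the prior is then symmetric and decreasing in |x|.
- It reports the attenuation factor as x̂/y. This is an assumption about how the patent defines it.

The grid search replaces the Bessel-function solution, so accuracy is limited to |y|/200 with the default grid. `map_gain` and `log_nig_prior` are hypothetical names.

```python
import math


def log_bessel_k1(z: float, t_max: float = 12.0, steps: int = 600) -> float:
    """log K_1(z), z > 0, via K_1(z) = integral_0^inf exp(-z cosh t) cosh t dt (trapezoidal rule)."""
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-z * (math.cosh(t) - 1.0)) * math.cosh(t)
    return -z + math.log(total * h)


def log_nig_prior(x: float, alpha: float, delta: float) -> float:
    """log NIG(x; alpha, delta, beta=0, mu=0); with beta = 0, gamma equals alpha."""
    q = math.sqrt(delta * delta + x * x)
    return math.log(alpha * delta / math.pi) + log_bessel_k1(alpha * q) - math.log(q) + delta * alpha


def map_gain(y: float, noise_var: float, alpha: float, delta: float, grid: int = 201):
    """Return (x_hat, gain): grid-search MAP estimate of x from y = x + n, n ~ N(0, noise_var)."""
    if y == 0:
        return 0.0, 0.0  # the gain is undefined for y = 0; report 0 by convention
    best_x, best_score = 0.0, -math.inf
    for k in range(grid):
        x = y * k / (grid - 1)  # candidates between 0 and y
        score = -(y - x) ** 2 / (2.0 * noise_var) + log_nig_prior(x, alpha, delta)
        if score > best_score:
            best_x, best_score = x, score
    return best_x, best_x / y
```

Small observations are shrunk strongly toward zero, while large ones keep most of their amplitude. This is the behavior expected from a sparse (heavy-tailed) prior.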
Step 4: Silent segment detection
1. Extracting MDCT spectral features
The spectral energy of the MDCT coefficients is computed as follows:
where:
- EM(i) is the MDCT spectral energy of the i-th frame;
- M(i, j) is the j-th averaged MDCT spectral value of the i-th frame;
- N = 576 is the number of MDCT coefficients per frame.

Over the whole MP3 audio segment, the per-frame MDCT spectral energies form the feature vector EM = [EM(0), EM(1), ..., EM(n−1)], where n is the number of frames. EM is thus the MDCT spectral energy envelope of the audio segment.
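The energy formula is not reproduced in this text. The sketch assumes the usual definition EM(i) = Σ_j M(i, j)², the sum of squared averaged MDCT values over the N = 576 lines of frame i. `mdct_energy_envelope` is a hypothetical name.

```python
from typing import List, Sequence


def mdct_energy_envelope(M: Sequence[Sequence[float]]) -> List[float]:
    """EM(i) = sum_j M[i][j]**2 for each frame i (squared-magnitude energy is an assumption)."""
    return [sum(v * v for v in frame) for frame in M]
```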
2. Adjusting the decision threshold
A) Initialize the decision threshold with the mean of the MDCT spectral energy envelope of the whole signal:
$$L_{th} = \frac{1}{n}\sum_{i=0}^{n-1} EM(i)$$
where EM(i) is the MDCT spectral energy of the i-th frame, n is the number of frames in the audio segment, and L_th is the initial decision threshold.
B) Threshold adjustment: all frames of the MDCT spectral envelope EM whose energy is below the decision threshold L_th are treated as noise frames:
$$EM_{noise}(i) = EM(i) \quad \text{if } EM(i) < L_{th}$$
where EM_noise(i) is the MDCT spectral energy of frame i when that frame is classified as a noise frame.
The mean and standard deviation of the noise energy sequence, denoted L_noise and S_noise, are initialized as
$$L_{noise} = \frac{1}{M}\sum_{i=0}^{M-1} EM_{noise}(i),\qquad S_{noise} = \sqrt{\frac{1}{M}\sum_{i=0}^{M-1}\left(EM_{noise}(i) - L_{noise}\right)^2}$$
where EM_noise(i) is the MDCT spectral energy of the i-th noise frame, L_noise and S_noise are the mean and standard deviation of the noise energy sequence, and M is the number of noise frames.
Once L_noise and S_noise have been obtained, the decision threshold L_th is readjusted:
$$L_{th} = C_0\left(L_{noise} + C_1 S_{noise}\right)$$
where C_0 and C_1 are empirical constants (this C_1 is unrelated to the C_1 of Step 3). In the experiments, C_0 = 1.001 and C_1 is tuned between 1.5 and 2.0.

The procedure then iterates:
1. After L_th is updated, noise and speech frames are reclassified.
2. L_noise and S_noise are recomputed.
3. The threshold is updated again.

This is repeated until the decision threshold stabilizes.
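The threshold refinement of A) and B) is a small fixed-point iteration. The sketch makes these assumptions:
- The noise statistics use the 1/M population standard deviation.
- Iteration stops when the threshold changes by less than a relative tolerance.
- C_1 = 1.75, a value inside the 1.5 to 2.0 range given above.

`adapt_threshold`, `max_iter` and `tol` are hypothetical names.

```python
from statistics import fmean, pstdev
from typing import Sequence


def adapt_threshold(EM: Sequence[float], c0: float = 1.001, c1: float = 1.75,
                    max_iter: int = 50, tol: float = 1e-9) -> float:
    """Iteratively refine the silence threshold L_th from the per-frame energy envelope EM."""
    l_th = fmean(EM)  # A) initial threshold: mean of the whole envelope
    for _ in range(max_iter):
        noise = [e for e in EM if e < l_th]  # B) frames below the threshold count as noise
        if not noise:
            break
        l_noise = fmean(noise)
        s_noise = pstdev(noise)  # population standard deviation (1/M normalization)
        new_th = c0 * (l_noise + c1 * s_noise)
        if abs(new_th - l_th) <= tol * max(1.0, abs(l_th)):
            return new_th  # threshold has stabilized
        l_th = new_th
    return l_th
```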
3. Merging active endpoints
A) Classify frames as silent or non-silent using the threshold:
$$E_{type}[i] = \begin{cases} 0, & EM[i] < L_{th} \\ 1, & EM[i] \ge L_{th} \end{cases}$$
where E_type[i] is the type of the i-th frame and EM[i] is its MDCT spectral energy. E_type[i] = 0 marks a silent frame and E_type[i] = 1 an active audio frame.
B) Count the number of frames F_N in each silent segment;
C) If F_N < 10, the segment is a pause between consecutive active audio segments and is merged into the corresponding active segment.
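Frame labeling and the merging rule of C) can be sketched as follows. The sketch assumes that only silent runs bounded by active frames on both sides count as pauses, so leading and trailing silence is kept. The function names are hypothetical.

```python
from typing import List, Sequence


def label_frames(EM: Sequence[float], l_th: float) -> List[int]:
    """E_type[i] = 0 (silent) if EM[i] < l_th, else 1 (active)."""
    return [0 if e < l_th else 1 for e in EM]


def merge_short_silences(labels: Sequence[int], min_len: int = 10) -> List[int]:
    """Relabel silent runs shorter than min_len that lie between active frames as active."""
    out = list(labels)
    i, n = 0, len(out)
    while i < n:
        if out[i] != 0:
            i += 1
            continue
        j = i
        while j < n and out[j] == 0:
            j += 1
        # Silent run is out[i:j]; merge it only if it is short and bounded on both sides.
        if j - i < min_len and i > 0 and j < n:
            for k in range(i, j):
                out[k] = 1
        i = j
    return out
```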
Step 5: Adaptive iterative attenuation
1. Use the attenuation function from Step 3 to compute attenuation values for the silent segments detected in Step 4;
2. Compute the average attenuation value of the silent segments from 1;
3. Use the attenuation function from Step 3 to attenuate the MDCT coefficients of the noisy MP3 audio;
4. Adjust the number of estimation iterations adaptively from the average attenuation value of the silent segments. Repeat steps 1, 2 and 3, and stop iterating (noise reduction is then complete) when the following condition is met:
The condition involves three quantities:
- the average attenuation value of the silent segments;
- a_min, the minimum attenuation factor of the whole audio, which can be obtained from the high-frequency MDCT coefficients;
- C, which controls the residual component (C = 0.001).

The overall procedure is shown in Figure 1.
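The control flow of Step 5 can be outlined as below. The patent's stopping condition is a formula not reproduced here, so it is passed in as a caller-supplied predicate `should_stop` on the average silent-segment attenuation. `gain_fn` stands for the Step 3 attenuation function applied per coefficient. Both callbacks, the flat coefficient list and `max_iter` are assumptions of this sketch, and `adaptive_denoise` is a hypothetical name.

```python
from statistics import fmean
from typing import Callable, List, Sequence


def adaptive_denoise(coeffs: Sequence[float], silent_idx: Sequence[int],
                     gain_fn: Callable[[float], float],
                     should_stop: Callable[[float], bool],
                     max_iter: int = 20) -> List[float]:
    """Iterate items 1-3 of Step 5 until the caller-supplied stopping rule (item 4) holds."""
    if not silent_idx:
        raise ValueError("at least one silent coefficient index is required")
    x = list(coeffs)
    for _ in range(max_iter):
        # Item 1: attenuation values of the silent-segment coefficients.
        silent_gains = [gain_fn(x[k]) for k in silent_idx]
        # Item 2: their average.
        avg_silent_gain = fmean(silent_gains)
        # Item 3: attenuate every coefficient with its own gain.
        x = [gain_fn(v) * v for v in x]
        # Item 4: hypothetical stopping rule standing in for the patent's condition.
        if should_stop(avg_silent_gain):
            break
    return x
```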
Experimental Results
The experiments used broadcast audio material from China Central Television (CCTV). The material is in MP3 format with a sampling rate of 44.1 kHz. It covers three audio types: speech, music, and mixed speech and music, with 20 clips of each type.

White Gaussian noise at different levels was added to each type of audio, and the noisy MP3 audio was processed with the proposed adaptive noise reduction algorithm. The signal-to-noise ratio after noise reduction is computed as:
$$\mathrm{SNR} = 10\log_{10}\frac{\sum_n x^2(n)}{\sum_n \left(x(n)-\hat{x}(n)\right)^2}$$
where x(n) is the PCM data decoded from the noise-free MP3 audio and x̂(n) is the PCM data decoded from the denoised MP3 audio. Table 1 compares the SNR before and after noise reduction:
表1:对MP3音频降噪前后的信噪比SNR对比Table 1: SNR comparison of MP3 audio before and after noise reduction
Extensive statistical experiments show that the method operates directly in the MP3 compressed domain and effectively denoises different types of noisy MP3 audio. The SNR of the denoised MP3 audio is substantially improved, and the processed audio has good perceptual quality. This work solves the problem of noise reduction directly in the MP3 compressed domain. It also offers a new direction for research on noise-robust MP3 audio classification and retrieval.
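The SNR formula is not reproduced in this text. The sketch assumes the common definition SNR = 10·log10(Σ x(n)² / Σ (x(n) − x̂(n))²), computed on time-aligned PCM sequences of equal length. `snr_db` is a hypothetical name.

```python
import math
from typing import Sequence


def snr_db(clean: Sequence[float], denoised: Sequence[float]) -> float:
    """10*log10(sum x(n)^2 / sum (x(n) - x_hat(n))^2) for time-aligned PCM sequences."""
    if len(clean) != len(denoised):
        raise ValueError("signals must have the same length")
    signal = sum(x * x for x in clean)
    error = sum((x - y) ** 2 for x, y in zip(clean, denoised))
    return math.inf if error == 0 else 10.0 * math.log10(signal / error)
```

For example, `snr_db([1.0, 1.0], [1.0, 0.0])` returns about 3.01 dB.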
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010102154044A CN101930746B (en) | 2010-06-29 | 2010-06-29 | An Adaptive Noise Reduction Method for MP3 Compressed Domain Audio |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010102154044A CN101930746B (en) | 2010-06-29 | 2010-06-29 | An Adaptive Noise Reduction Method for MP3 Compressed Domain Audio |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101930746A | 2010-12-29 |
CN101930746B | 2012-05-02 |
Family
ID=43369879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010102154044A Expired - Fee Related CN101930746B (en) | 2010-06-29 | 2010-06-29 | An Adaptive Noise Reduction Method for MP3 Compressed Domain Audio |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101930746B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120245927A1 (en) * | 2011-03-21 | 2012-09-27 | On Semiconductor Trading Ltd. | System and method for monaural audio processing based preserving speech information |
CN103730123A (en) * | 2012-10-12 | 2014-04-16 | 联芯科技有限公司 | Method and device for estimating attenuation factors in noise suppression |
CN103971698B (en) * | 2013-01-25 | 2019-01-11 | 北京千橡网景科技发展有限公司 | Method and apparatus for voice real-time noise-reducing |
CN104242850A (en) * | 2014-09-09 | 2014-12-24 | 联想(北京)有限公司 | Audio signal processing method and electronic device |
CN108595386B (en) * | 2018-05-07 | 2022-01-25 | 长沙理工大学 | Distributed optical fiber vibration measurement method and device based on high-order cumulant analysis |
CN109087657B (en) * | 2018-10-17 | 2021-09-14 | 成都天奥信息科技有限公司 | Voice enhancement method applied to ultra-short wave radio station |
KR102756512B1 (en) * | 2019-04-03 | 2025-01-21 | 돌비 레버러토리즈 라이쎈싱 코오포레이션 | Scalable voice scene media server |
CN110838306B (en) * | 2019-11-12 | 2022-05-13 | 广州视源电子科技股份有限公司 | Voice signal detection method, computer storage medium and related equipment |
CN112863546A (en) * | 2021-01-21 | 2021-05-28 | 安徽理工大学 | Belt conveyor health analysis method based on audio characteristic decision |
CN113436637A (en) * | 2021-06-20 | 2021-09-24 | 杭州登虹科技有限公司 | Compression algorithm of audio flow |
CN116417015B (en) * | 2023-04-03 | 2023-09-12 | 广州市迪士普音响科技有限公司 | Silence detection method and device for compressed audio |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6324502B1 (en) * | 1996-02-01 | 2001-11-27 | Telefonaktiebolaget Lm Ericsson (Publ) | Noisy speech autoregression parameter enhancement method and apparatus |
CN1624767A (en) * | 2003-12-03 | 2005-06-08 | 富士通株式会社 | Noise reduction device and noise reduction method |
EP1760696A2 (en) * | 2005-09-03 | 2007-03-07 | GN ReSound A/S | Method and apparatus for improved estimation of non-stationary noise for speech enhancement |
CN101142623A (en) * | 2003-11-28 | 2008-03-12 | 斯盖沃克斯瑟路申斯公司 | Noise Suppressor for Speech Coding and Speech Recognition |
CN101221762A (en) * | 2007-12-06 | 2008-07-16 | 上海大学 | MP3 compression field audio partitioning method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1253581B1 (en) * | 2001-04-27 | 2004-06-30 | CSEM Centre Suisse d'Electronique et de Microtechnique S.A. - Recherche et Développement | Method and system for speech enhancement in a noisy environment |
Also Published As
Publication number | Publication date |
---|---|
CN101930746A (en) | 2010-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101930746B (en) | An Adaptive Noise Reduction Method for MP3 Compressed Domain Audio | |
CN107274908B (en) | Wavelet Speech Denoising Method Based on New Threshold Function | |
CN109410977B (en) | Voice segment detection method based on MFCC similarity of EMD-Wavelet | |
CN108831499B (en) | Speech enhancement method using speech existence probability | |
EP2352145B1 (en) | Transient speech signal encoding method and device, decoding method and device, processing system and computer-readable storage medium | |
JP2010539539A (en) | Speech improvement with speech clarification | |
CN111091833A (en) | Endpoint detection method for reducing noise influence | |
Jangjit et al. | A new wavelet denoising method for noise threshold | |
CN110931039A (en) | Wireless voice noise reduction device and method based on wavelet | |
CN110808057A (en) | A Speech Enhancement Method Based on Constrained Naive Generative Adversarial Networks | |
CN110709926A (en) | Apparatus and method for post-processing audio signals using prediction-based shaping | |
CN106653004B (en) | Speaker identification feature extraction method for sensing speech spectrum regularization cochlear filter coefficient | |
Zou et al. | Speech signal enhancement based on MAP algorithm in the ICA space | |
CN118016079B (en) | Intelligent voice transcription method and system | |
Gupta et al. | Speech enhancement using MMSE estimation and spectral subtraction methods | |
CN102169694A (en) | Method and device for generating psychoacoustic model | |
CN112927700B (en) | Blind audio watermark embedding and extracting method and system | |
Yektaeian et al. | Comparison of spectral subtraction methods used in noise suppression algorithms | |
CN115602190A (en) | Forged voice detection algorithm and system based on main body filtering | |
Sanam et al. | Teager energy operation on wavelet packet coefficients for enhancing noisy speech using a hard thresholding function | |
JPH113091A (en) | Audio signal rise detection device | |
Wu et al. | Speech endpoint detection in noisy environment using Spectrogram Boundary Factor | |
Liu | A new wavelet threshold denoising algorithm in speech recognition | |
Faek et al. | Speaker recognition from noisy spoken sentences | |
Lu et al. | A robust feature extraction based on the MTF concept for speech recognition in reverberant environment. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20120502; termination date: 20140629 |
| | EXPY | Termination of patent right or utility model | |