CN101303858B

CN101303858B - Method and apparatus for implementing fundamental tone enhancement post-treatment

Info

Publication number: CN101303858B
Application number: CN 200710104394
Authority: CN
Inventors: 刘丽; 李伟; 曹军彬; 孙晓刚; 张清; 许丽净; 许剑峰; 杜正中; 胡晨; 苗磊; 杨毅
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2007-05-11
Filing date: 2007-05-11
Publication date: 2011-06-01
Anticipated expiration: 2027-05-11
Also published as: WO2008138267A1; CN101303858A

Abstract

A method and device for implementing pitch enhancement post-processing. The method includes: obtaining the gain of a decoded signal; judging whether the gain exceeds a predetermined threshold; and, after determining that the gain exceeds the predetermined threshold, outputting the decoded signal after long-term post-filtering. Because configuring the filter coefficients and making the threshold decision are relatively simple, the post-processing is simplified while a better pitch enhancement effect is still obtained.

Description

Method and device for realizing post-processing of pitch enhancement

Technical field

The invention relates to the field of audio decoding, and in particular to adaptive post-processing for pitch enhancement in the audio decoding process.

Background

In audio decoding, post-processing is applied to the decoded speech to improve its perceived quality. The purpose of post-processing is to enhance the information in the synthesized sound signal that matters for perceptual quality, that is, to reduce or remove the interference that degrades it. Current post-processing techniques generally fall into two classes: formant post-processing and pitch post-processing. In pitch post-processing, the frequency response of the filter must be related to the pitch harmonics.

Taking the AMR-WB+ (Adaptive Multi-Rate Wideband plus) codec as an example, its post-processing is a band-selective pitch enhancement algorithm. As shown in Figure 1, the decoded synthesized sound signal is split into two sub-bands. The low band is first passed through an adaptive pitch enhancement filter, which attenuates the noise between pitch harmonics at the low-frequency end, and is then low-pass filtered; the other band is filtered directly by a high-pass filter. Finally, the two processed band signals are summed to obtain the pitch-enhanced synthesized sound signal.

In Figure 1, pitch enhancement post-processing is achieved by two modules in the low-frequency sub-band: a Pitch enhancer and a Low-pass filter.

The Pitch enhancer module attenuates the inter-harmonic noise at the low-frequency end of the decoded signal to an appropriate degree; the Low-pass filter then removes spectral tilt and other unwanted frequency components. The Pitch enhancer is implemented as a time-varying linear filter with the transfer function:

$H_E(z) = (1 - \alpha) + \frac{\alpha}{2} z^{T} + \frac{\alpha}{2} z^{-T} \qquad (1)$

where the coefficient α controls how strongly the inter-harmonic noise is attenuated, and T is the pitch period of the decoded signal.

The Low-pass filter module is a linear-phase FIR (finite impulse response) low-pass filter with a cut-off frequency of 5×Fs/256 kHz, where Fs is the internal sampling rate. The filter has a delay of 12 samples. In the implementation, the filter memory registers must be updated with the low-pass-filtered signal state; 2×12 register values are updated in each subframe.

The coefficient α in Eq. (1) lies between 0 and 0.5 and is given by:

$\alpha = 0.5 \times g_p \qquad (2)$

where g_p is the decoded pitch gain. When AMR-WB+ uses the TCX (transform coded excitation) mode, α is set directly to 0, and the transfer function of the pitch enhancement filter becomes:

$H_E(z) = 1 \qquad (3)$

Eq. (3) means that no pitch enhancement post-processing is performed in TCX mode.
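For comparison with the proposed scheme, the prior-art Pitch enhancer of Eqs. (1) to (3) can be sketched as below. This is a minimal illustration under stated assumptions, not the AMR-WB+ reference code: it processes a single buffer, treats samples outside the buffer as zero, uses an integer pitch period T, and omits the band split and low-pass/high-pass filtering of Figure 1. The function names are hypothetical.

```python
def pitch_enhancer_alpha(g_p, tcx_mode):
    """Eqs. (2)/(3): alpha = 0.5 * g_p, or 0 when the frame uses TCX mode."""
    if tcx_mode:
        return 0.0  # H_E(z) = 1: no pitch enhancement in TCX mode
    # Clamp g_p to [0, 1] so alpha stays in the 0..0.5 range stated for Eq. (1)
    # (assumption: the source does not say how out-of-range gains are handled).
    return 0.5 * min(max(g_p, 0.0), 1.0)


def pitch_enhancer(x, T, alpha):
    """Eq. (1): y(n) = (1 - alpha) x(n) + alpha/2 x(n + T) + alpha/2 x(n - T).

    The z^T term is a look-ahead of T samples, so the filter is non-causal.
    Samples outside the buffer are treated as zero (simplification).
    """
    N = len(x)

    def at(i):
        return x[i] if 0 <= i < N else 0.0

    return [(1.0 - alpha) * at(n) + 0.5 * alpha * (at(n + T) + at(n - T))
            for n in range(N)]
```

With g_p = 0.8, for example, α = 0.4: each output sample keeps 60% of the current sample and adds 20% from one pitch period before and 20% from one pitch period after it.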

The post-processing method above thus removes the noise between the harmonics at the low-frequency end of the decoded speech signal and improves the perceived quality of the decoded synthesized sound.

In the course of making the present invention, the inventors found that the prior-art pitch enhancement post-processing described above has at least the following problem:

The decoded speech signal must first be split into frequency bands, and each sub-band must be filtered differently, which makes the post-processing complicated to implement.

Summary of the invention

Embodiments of the present invention provide a method and device for implementing pitch enhancement post-processing that simplify the post-processing and improve the quality of the audio signal it produces.

An embodiment of the present invention provides a method for implementing pitch enhancement post-processing, including: obtaining the gain of a decoded signal; judging whether the gain exceeds a predetermined threshold; and, after determining that the gain exceeds the predetermined threshold, post-filtering the decoded signal.

An embodiment of the present invention further provides a device for implementing pitch enhancement post-processing, including:

a gain evaluation unit, configured to obtain the gain of the decoded signal;

a threshold judging unit, configured to judge whether the gain of the decoded signal determined by the gain evaluation unit exceeds a predetermined threshold; and

an adaptive post-filter, configured to perform long-term post-filtering, according to the judgment result of the threshold judging unit, only on decoded signals whose gain exceeds the predetermined threshold.

As the technical solutions provided by the above embodiments show, configuring the filter coefficients and making the threshold decision are relatively simple in the embodiments of the present invention, and a better pitch enhancement effect can be obtained. In addition, pitch enhancement is applied to the entire decoded speech signal, with no band splitting and no separate low-pass and high-pass filtering, which further reduces processing complexity.

Description of drawings

Figure 1 is a schematic diagram of the principle of prior-art pitch enhancement post-processing;

Figure 2 is a schematic flowchart of the method provided by an embodiment of the present invention;

Figure 3 is a schematic structural diagram of the device provided by an embodiment of the present invention;

Figure 4 is a schematic structural diagram of the gain evaluation unit in an embodiment of the present invention.

Detailed description

The embodiments of the present invention make full use of the energy characteristics of the decoded signal, comparing them with the decoded pitch gain and pitch period to obtain the pitch information that best reflects the characteristics of the sound. On this basis they provide a threshold evaluation and decision scheme for selectively applying the pitch enhancement post-filter, so that the decoded signal has better perceptual quality.

Specifically, in an embodiment of the present invention, the gain of the decoded signal is first obtained, and it is then judged whether the gain exceeds a predetermined threshold. If it does, the decoded signal is output after long-term post-filtering; otherwise, the decoded signal is output directly. The post-filter used for this post-filtering may be an all-zero post-filter.

In addition, when an all-zero post-filter is chosen, the embodiments also give specific values for the local adjustment factor λ and the adaptive global gain G₁ in its transfer function that further improve perceived audio quality. Other types of post-filters may of course also be used for the post-filtering.

To aid understanding of the embodiments, the cause of coding noise between pitch harmonics is explained first.

Taking AMR-WB+ as an example, its speech coding part uses CELP (Code-Excited Linear Prediction). At the encoder, the input signal is pre-emphasized and a 16th-order linear prediction analysis is performed, after which the signal is encoded with a pitch synthesis filter expressed as:

$\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}} \qquad (4)$

where T is the pitch period and g_p is the pitch gain.
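To see why the decoded signal is strongly periodic, the long-term synthesis filter of Eq. (4) can be run as the recursion y(n) = e(n) + g_p · y(n − T). The sketch below assumes an integer pitch period and zero initial filter memory; it does not model the fractional-lag adaptive codebook of a real CELP codec, and the function name is hypothetical.

```python
def pitch_synthesis(excitation, T, g_p):
    """Eq. (4), 1 / (1 - g_p z^-T), run as y(n) = e(n) + g_p * y(n - T)."""
    y = []
    for n, e in enumerate(excitation):
        past = y[n - T] if n >= T else 0.0  # zero initial memory (assumption)
        y.append(e + g_p * past)
    return y


print(pitch_synthesis([1.0, 0, 0, 0, 0, 0, 0], 2, 0.5))
```

A single pulse with T = 2 and g_p = 0.5 becomes [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.125]: a pulse train repeating every T samples, whose spectrum peaks at multiples of 2π/T. This is the harmonic structure that the post-filter later reinforces.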

In speech perception theory, the formant regions of speech matter more to auditory perception than the spectral valleys. Therefore, at low coding rates, performance in the valley regions is usually sacrificed so that the formants are coded as well as possible. As a result, the valleys, including those between pitch harmonic peaks, may contain more perceptible coding noise than the peaks.

Given this cause of coding noise, a suitable post-processing filter should be applied at the decoder to reduce the coding noise and obtain better perceptual quality.

The specific implementation of the embodiments is described below with reference to the drawings.

The method for implementing pitch enhancement post-processing in audio decoding provided by an embodiment of the present invention is shown in Figure 2 and includes the following steps:

Step 1: determine the gain of the received decoded signal from the decoded signal.

Specifically, the ratio of the signal amplitudes in adjacent pitch periods is computed as:

$\mathrm{ratio} = \sqrt{\frac{0.01 + \sum_{i=0}^{T} \mathrm{syn\_in}^2(i)}{0.01 + \sum_{i=0}^{T} \mathrm{syn\_in}^2(i+T)}} \qquad (5)$

This ratio is compared with the gain decoded from the bitstream, and the smaller of the two is taken as the final gain of the decoded signal.
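Step 1 can be sketched as follows. It follows Eq. (5) literally: both sums run from i = 0 to T inclusive, so at least 2T + 1 samples are needed, and the constant 0.01 keeps the ratio defined for silent input. Treating syn_in[0..T] as the earlier period and syn_in[T..2T] as the later one is an assumption about the buffer layout, and the function names are hypothetical.

```python
import math


def amplitude_ratio(syn_in, T):
    """Eq. (5): energy-based amplitude ratio between two adjacent pitch periods."""
    if len(syn_in) < 2 * T + 1:
        raise ValueError("Eq. (5) needs at least 2*T + 1 samples")
    num = 0.01 + sum(syn_in[i] ** 2 for i in range(T + 1))      # earlier period
    den = 0.01 + sum(syn_in[i + T] ** 2 for i in range(T + 1))  # later period
    return math.sqrt(num / den)


def decoded_signal_gain(syn_in, T, decoded_pitch_gain):
    """Step 1: the smaller of the Eq. (5) ratio and the pitch gain from the bitstream."""
    return min(amplitude_ratio(syn_in, T), decoded_pitch_gain)
```

Taking the minimum means that a large decoded pitch gain cannot by itself trigger post-filtering: the measured waveform must also be roughly periodic, and vice versa.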

Step 2: judge whether the gain determined in Step 1 exceeds a predetermined threshold; if so, go to Step 3, otherwise go to Step 4.

In this embodiment, a decision threshold E_thr for applying the post-processing filter is set according to the signal energy characteristics of the current and adjacent pitch periods of the decoded synthesized sound signal: the long-term post-filtering is performed only when the gain E_com determined in Step 1 is greater than E_thr; otherwise it is skipped.

The threshold decision based on E_thr exploits the strong periodicity of voiced speech frames, which is reflected in the gain g_p′ decoded from the bitstream sent by the encoder. Extensive program debugging and observation of how the parameters vary show that g_p′ is relatively large and close to a stable value in voiced frames, whereas in unvoiced frames it is small and often close to 0. Overall, g_p′ is roughly equal to the ratio of the signal amplitude in the current pitch period to that in the previous pitch period. For the AMR-WB+ codec, after many experiments comparing the PESQ (an objective speech quality measure) difference between the decoded signal and the original sound signal, E_thr can be set to 0.6.

Note that the threshold can be chosen to suit the specific codec framework; for example, in codecs other than AMR-WB+, it can be selected in the range 0 to 1.

Step 3: apply long-term post-filtering to the decoded signal (i.e., the pitch synthesis signal obtained by decoding at the decoder), output the result, and go to Step 4.

Specifically, an all-zero post-filter can be used to attenuate the noise between pitch harmonics. To keep the peaks at the pitch harmonic frequencies, the zeros should be placed at the frequencies corresponding to the valleys between harmonics, i.e., at π/T, 3π/T, ..., (2T-1)π/T. The all-zero post-filter can therefore take the form:

$H(z) = G_1 \times (1 + \lambda \times z^{-T}) \qquad (6)$

In Eq. (6), T is the pitch period, G₁ is the overall gain control of the filter, and λ is a local adjustment factor.
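The frequency response of Eq. (6) makes the zero placement concrete. The zeros of 1 + λz^{-T} satisfy z^T = −λ, so for 0 < λ < 1 they lie at radius λ^{1/T} and angles (2k+1)π/T: the response dips at the inter-harmonic valleys named above, but to 1 − λ rather than exactly zero, while it peaks at 1 + λ on the harmonics 2kπ/T. The helper names below are hypothetical.

```python
import cmath
import math


def all_zero_magnitude(omega, T, lam, g1=1.0):
    """|H(e^{j*omega})| for Eq. (6), H(z) = G1 * (1 + lam * z^-T)."""
    return abs(g1 * (1.0 + lam * cmath.exp(-1j * omega * T)))


def peak_to_valley_db(lam):
    """Response at harmonics (1 + lam) over response at valleys (1 - lam), in dB."""
    return 20.0 * math.log10((1.0 + lam) / (1.0 - lam))
```

With λ = 0.1 and T = 40, the magnitude is 1.1 at ω = 2π/T and 0.9 at ω = π/T, a peak-to-valley contrast of about 1.7 dB, so the enhancement is deliberately mild.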

In this step, taking the AMR-WB+ codec as an example, the pitch period T of the all-zero post-filter can be determined in the same way as in AMR-WB+, for example by using the T output by the pitch tracking module. To avoid pitch doubling, the normalized autocorrelation between signal segments separated by a delay of T/2 is also computed; if it is greater than 0.95, T/2 is used as the new pitch period in the post-processing, so that the pitch period at the low-frequency end is obtained more accurately and in real time.
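The pitch-doubling check can be sketched as below. The source gives only the 0.95 threshold; the analysis window (start, length) and the particular normalization, the cross-correlation divided by the square root of the product of the two segment energies, are assumptions, and the function names are hypothetical.

```python
import math


def normalized_autocorr(x, lag, start, length):
    """Normalized correlation of x[n] with x[n - lag] for n in [start, start + length).

    The normalization used here is an assumption; the source only says
    "normalized autocorrelation".
    """
    if start < lag or start + length > len(x):
        raise ValueError("window must satisfy lag <= start and start + length <= len(x)")
    num = e_cur = e_lag = 0.0
    for n in range(start, start + length):
        num += x[n] * x[n - lag]
        e_cur += x[n] * x[n]
        e_lag += x[n - lag] * x[n - lag]
    if e_cur == 0.0 or e_lag == 0.0:
        return 0.0  # silent window: no evidence of periodicity
    return num / math.sqrt(e_cur * e_lag)


def correct_pitch_doubling(x, T, start, length, threshold=0.95):
    """Return T // 2 if the signal is already strongly periodic at half the lag, else T."""
    half = T // 2
    if half > 0 and normalized_autocorr(x, half, start, length) > threshold:
        return half
    return T
```

For a sine with period 20 that the tracker reported as T = 40, the correlation at lag 20 is close to 1 and the function returns 20; for a sine whose true period is 40, the correlation at lag 20 is close to −1 and T = 40 is kept.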

In this step, λ usually lies between 0 and 1; it determines the relative weighting of signals one pitch period apart. For the AMR-WB+ codec, experiments indicate that λ can be set to 0.1.

In this step, to prevent the signal distortion that the post-filter introduces while attenuating the noise between pitch harmonics, adaptive gain control is used to determine the adaptive global gain G₁, as follows.

Assume that, at time k, the input of the post-processing filter is x(n) and its output is y(n). From the transfer function (6):

$y(n) = G_1 \times [x(n) + \lambda \times x(n-T)] \qquad (7)$

For voiced frames, the strong periodicity of voiced sound means that the waveforms in adjacent pitch periods differ only slightly in amplitude, so we can set

$x(n-T) \approx gain \times x(n) \qquad (8)$

Substituting (8) into (7) gives

$y(n) \approx G_1 \times [1 + \lambda \times gain] \times x(n) \qquad (9)$

This derivation shows that, without adaptive gain control (G₁ = 1), the filter would scale the output y(n) relative to the input x(n) by roughly (1 + λ × gain) while attenuating inter-harmonic noise, which would greatly degrade the perceptual quality of the final synthesized speech signal. The adaptive global gain G₁ is therefore chosen as:

$G_1 \approx \frac{1}{1 + \lambda \times gain} \qquad (10)$

This determines all parameters of the all-zero post-filter.
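Eqs. (7) and (10) together give a gain-compensated all-zero post-filter, sketched below. The history argument (the last T input samples of the previous block) is an assumption about how filter memory is carried across blocks; the source does not describe buffering, and the function name is hypothetical.

```python
def all_zero_postfilter(x, T, lam, gain, history=None):
    """Eq. (7) with G1 from Eq. (10): y(n) = G1 * [x(n) + lam * x(n - T)].

    history: the last T input samples before x (filter memory); zeros if omitted.
    """
    g1 = 1.0 / (1.0 + lam * gain)  # Eq. (10)
    past = [0.0] * T if history is None else list(history)
    if len(past) != T:
        raise ValueError("history must hold exactly T samples")
    buf = past + list(x)  # buf[n + T] is x(n), buf[n] is x(n - T)
    return [g1 * (buf[n + T] + lam * buf[n]) for n in range(len(x))]
```

As a worked value, λ = 0.1 and gain = 0.8 give G₁ = 1/1.08 ≈ 0.926. For a perfectly periodic input with gain = 1, the 1.1 boost from Eq. (9) is cancelled by G₁ = 1/1.1, so the output level matches the input.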

Step 4: output the pitch synthesis signal obtained at the decoder.

Specifically, let synth_in denote the decoded pitch synthesis signal in Steps 2 and 3, and synth_out the output after long-term pitch post-filtering. The processing of Steps 2 and 3 can then be expressed as:

$\mathrm{synth\_out} = \begin{cases} \mathrm{synth\_in}, & \text{if } E_{com} < E_{thr} \\ \mathrm{synth\_in} \otimes h, & \text{if } E_{com} \geq E_{thr} \end{cases} \qquad (11)$

In Eq. (11), h is the impulse response of the adaptive post-filter H(z), and ⊗ denotes convolution. Eq. (11) shows that the pitch synthesis signal output in Step 4 is one of two kinds:

(1) the pitch synthesis signal after the long-term post-filtering of Step 3, with adaptive gain control applied to prevent the signal distortion that the post-filter introduces while attenuating the noise between pitch harmonics; or

(2) the pitch synthesis signal output directly, without the processing of Step 3.
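Steps 1 to 4 can be combined into one per-block routine that implements Eq. (11). This is a sketch under assumptions: Eq. (5) is evaluated on the current block itself (so it must hold at least 2T + 1 samples), the previous T samples are passed in as filter memory, and the defaults E_thr = 0.6 and λ = 0.1 are the AMR-WB+ values given in the text. The function name is hypothetical.

```python
import math


def pitch_postprocess(synth_in, history, T, decoded_pitch_gain, e_thr=0.6, lam=0.1):
    """Sketch of Steps 1-4 / Eq. (11). Returns (synth_out, e_com, filtered)."""
    if len(synth_in) < 2 * T + 1 or len(history) != T:
        raise ValueError("need len(synth_in) >= 2*T + 1 and len(history) == T")
    # Step 1: Eq. (5) ratio, then the smaller of it and the decoded pitch gain.
    num = 0.01 + sum(synth_in[i] ** 2 for i in range(T + 1))
    den = 0.01 + sum(synth_in[i + T] ** 2 for i in range(T + 1))
    e_com = min(math.sqrt(num / den), decoded_pitch_gain)
    # Step 2: Eq. (11) bypasses the filter when E_com < E_thr.
    if e_com < e_thr:
        return list(synth_in), e_com, False
    # Step 3: y(n) = G1 * [x(n) + lam * x(n - T)], G1 = 1 / (1 + lam * E_com).
    g1 = 1.0 / (1.0 + lam * e_com)
    buf = list(history) + list(synth_in)  # buf[n + T] is x(n), buf[n] is x(n - T)
    out = [g1 * (buf[n + T] + lam * buf[n]) for n in range(len(synth_in))]
    # Step 4: output the (possibly filtered) pitch synthesis signal.
    return out, e_com, True
```

Returning the decision flag alongside the output makes it easy to log how often the post-filter is actually engaged, for example to compare voiced and unvoiced passages.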

An embodiment of the present invention further provides a device for implementing pitch enhancement post-processing in audio decoding. Its structure is shown in Figure 3 and may include the following units:

(1) Gain evaluation unit

This unit obtains the gain of the decoded signal.

As shown in Figure 4, this unit may include:

a ratio determination unit, configured to determine the ratio of the signal amplitudes in adjacent pitch periods, i.e., the ratio of the signal amplitude in the previous pitch period to that in the current pitch period; and

a decoded-signal gain determination unit, configured to compare this ratio with the gain obtained by decoding and take the smaller of the two as the gain of the decoded signal.

(2) Threshold judging unit

This unit judges whether the gain of the decoded signal determined by the gain evaluation unit exceeds a predetermined threshold.

If the device is used in AMR-WB+ decoding, the predetermined threshold used by the threshold judging unit may be 0.6.

(3) Adaptive post-filter

This unit performs long-term post-filtering, according to the judgment result of the threshold judging unit, only on decoded signals whose gain exceeds the predetermined threshold.

The adaptive post-filter may be an all-zero post-filter with the transfer function $H(z) = G_1 \times (1 + \lambda \times z^{-T})$, where G₁ is the adaptive global gain, λ is the local adjustment factor, and T is the pitch period.

Moreover, if the device is used in AMR-WB+ decoding, the all-zero post-filter uses λ = 0.1 and an adaptive global gain $G_1 \approx \frac{1}{1 + \lambda \times gain}$, so as to avoid the signal distortion that the post-filter introduces while attenuating the noise between pitch harmonics.

Note that, in embodiments of the present invention, a comb filter may also be used as the pitch enhancement post-filter. A comb filter exploits the strong periodicity of voiced sound: in the frequency domain, it retains the fundamental frequency of the sound signal and its harmonics at integer multiples, while suppressing non-harmonic components.

Because the gaps between harmonics are dominated by noise, in the ideal case the noise between harmonics could be removed completely if the fundamental frequency (pitch period) were known.

The transfer function of the comb filter used in this embodiment is:

$H_c(z) = \sum_{k=-L}^{L} a_k z^{-kT} \qquad (12)$

The corresponding time-domain expression is:

$y(n) = \sum_{k=-L}^{L} a_k \, x(n - kT) \qquad (13)$

where x(n) is the decoded speech signal, y(n) is the output of the comb filter, and a_k (−L ≤ k ≤ L) are the 2L+1 tap coefficients of the comb filter. The coefficients a_k can adapt to changes in the speech spectrum; in each subframe, a_k can be configured with reference to the decoded-signal gain obtained above. For the pitch period T, repeated prediction should be avoided.

Eq. (13) shows that the output y(n) is a delay-weighted average of the input x(n) that emphasizes the periodic components: when the delay matches the pitch period, the averaging reinforces the periodic components, while aperiodic components, or components whose period differs from that of the signal, are suppressed or eliminated.
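Eq. (13) can be sketched directly. The source says the a_k are configured from the decoded-signal gain but does not give the rule, so the uniform weights used in the check below (a_k = 1/(2L+1)) are an assumption chosen only to show the averaging behaviour. Samples outside the buffer are treated as zero; a real decoder would need look-ahead and history for the non-causal taps. The function name is hypothetical.

```python
def comb_filter(x, T, a):
    """Eq. (13): y(n) = sum_{k=-L}^{L} a_k x(n - kT).

    a holds the 2L+1 taps a_{-L} .. a_{L}; samples outside x count as zero.
    """
    if len(a) % 2 != 1:
        raise ValueError("need an odd number (2L+1) of coefficients")
    L = len(a) // 2
    N = len(x)
    y = []
    for n in range(N):
        acc = 0.0
        for k in range(-L, L + 1):
            m = n - k * T
            if 0 <= m < N:
                acc += a[k + L] * x[m]
        y.append(acc)
    return y
```

With L = 1, T = 3 and a = [1/3, 1/3, 1/3], a signal that repeats every 3 samples passes through unchanged away from the buffer edges, while a signal that flips sign every 3 samples (energy at the inter-harmonic frequency π/T) comes out at −1/3 of its input level.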

In summary, when an FIR filter is used for pitch enhancement post-processing of the full-band decoded sound signal, both the threshold decision and the configuration of the filter coefficients are relatively simple to implement in the embodiments of the present invention. The embodiments can also adapt, in each subframe, to energy changes of the synthesized sound signal at the decoder, giving a better pitch enhancement effect. For example, within the AMR-WB+ codec framework, pitch enhancement post-processing can be achieved with relatively simple operations, improving the perceived quality of the decoded sound.

Moreover, besides enhancing the pitch of speech signals to obtain better perceptual quality, the scheme provided by the embodiments was found, in subjective and objective tests on a large number of music sequences, to improve the perceptual quality of music signals substantially as well.

The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for implementing pitch enhancement post-processing, comprising: obtaining the gain of a decoded signal; judging whether the gain exceeds a predetermined threshold; and, after determining that the gain exceeds the predetermined threshold, post-filtering the decoded signal;

wherein the step of obtaining the gain of the decoded signal specifically includes:

determining the ratio of the signal amplitudes of adjacent pitch periods; and

comparing the ratio with the pitch gain obtained by decoding, and taking the smaller of the two as the gain of the decoded signal.

2. The method according to claim 1, wherein the step of performing post-filtering processing on the decoded signal comprises:

post-filtering the decoded signal with an all-zero post-filter whose transfer function is H(z) = G₁ × (1 + λ × z^-T), where G₁ is the adaptive global gain, λ is the local adjustment factor, and T is the pitch period.

3. The method according to claim 2, characterized in that, in the Adaptive Multi-Rate Wideband plus (AMR-WB+) encoding and decoding process, λ is set to 0.1 and the adaptive global gain is:

G₁ ≈ 1 / (1 + λ × gain)

where gain is the gain of the decoded signal in each subframe.

4. The method according to claim 1, 2 or 3, characterized in that, in the AMR-WB+ encoding and decoding process, the predetermined threshold is 0.6.

5. A device for implementing pitch enhancement post-processing, characterized by comprising:

a gain evaluation unit, configured to obtain the gain of the decoded signal;

a threshold judging unit, configured to judge whether the gain of the decoded signal determined by the gain evaluation unit exceeds a predetermined threshold; and

an adaptive post-filter, configured to perform post-filtering, according to the judgment result of the threshold judging unit, only on decoded signals whose gain exceeds the predetermined threshold;

wherein the gain evaluation unit specifically includes:

a ratio determination unit, configured to determine the ratio of the signal amplitudes of adjacent pitch periods; and

a decoded-signal gain determination unit, configured to compare the ratio with the pitch gain obtained by decoding and take the smaller of the two as the gain of the decoded signal.

6. The device according to claim 5, wherein the adaptive post-filter is an all-zero post-filter whose transfer function is:

H(z) = G₁ × (1 + λ × z^-T), where G₁ is the adaptive global gain, λ is the local adjustment factor, and T is the pitch period.

7. The device according to claim 6, characterized in that, when the device is used in the AMR-WB+ decoding process, the all-zero post-filter uses λ = 0.1 and the adaptive global gain G₁ ≈ 1 / (1 + λ × gain), where gain is the gain of the decoded signal in each subframe.

8. The device according to claim 5, 6 or 7, wherein, when the device is used in the AMR-WB+ decoding process, the predetermined threshold used by the threshold judging unit is 0.6.