CN108172231B - A Kalman Filter-Based De-reverberation Method and System - Google Patents
A Kalman Filter-Based De-reverberation Method and System
- Publication number
- CN108172231B (application CN201711285885.4A)
- Authority
- CN
- China
- Prior art keywords
- signal
- reverberation
- kalman
- microphone
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a de-reverberation method and system based on Kalman filtering. The method comprises: preprocessing the original signal collected by each microphone to obtain the corresponding frequency-domain signal, and delaying these signals to form the input signal; estimating the reverberation signal with a Kalman filtering algorithm and a time-varying multi-channel autoregressive model, taking the original signals collected by the microphones at the current time as the reference signal, and subtracting the reverberation signal to obtain the error signal; updating the Kalman filter coefficients using the Kalman gain matrix and the error signal; obtaining the target signal from the original signals collected by the microphones at the current time, the input signal, and the updated Kalman filter coefficients; and finally converting the frequency-domain target signal to the time domain with an inverse Fourier transform. By diagonalizing the error covariance matrix of the Kalman filter state vector, the method reduces the complexity of the adaptive multi-channel linear prediction de-reverberation algorithm.
Description
Technical field
The present invention relates to the field of speech de-reverberation, and in particular to a de-reverberation method and system based on Kalman filtering.
Background
As shown in Figure 1, because sound waves are reflected by the room boundaries and by objects in the room, a microphone receives not only the direct sound from the source but also reflected sound from all directions. Sound arriving within 30-50 ms after the direct sound is generally called early reflections, and sound arriving later is called late reflections, i.e. the reverberation tail. Psychoacoustic studies have found that early reflections reinforce the direct sound and improve speech intelligibility, whereas late reverberation masks subsequently arriving direct sound and blurs the speech. Reverberation also degrades the speech quality of the microphone signal and the recognition accuracy of speech recognition systems. In applications such as teleconferencing and smart speakers in enclosed rooms, the microphone is often in the far field of the source, and the damage caused by reverberation grows as the distance between source and microphone increases. Moreover, in voice communication systems where ambient noise is low, the microphone signal is affected mainly by room reverberation, which reduces the fidelity and intelligibility of the speech signal and seriously impairs communication quality. De-reverberation of the microphone signal is therefore necessary.
Speech de-reverberation is an active research topic. The main existing approaches are:
(1) Linear prediction residual enhancement. This class of algorithms uses the source-filter model of speech, which treats speech as an excitation sequence passed through a time-varying all-pole filter. Linear prediction analysis of the reverberant speech yields an estimate of the all-pole filter coefficients, i.e. the linear prediction coefficients. Inverse filtering the microphone signal then gives the corresponding excitation signal, i.e. the residual. De-reverberation is achieved by enhancing the residual, and the speech signal is reconstructed with the estimated linear prediction coefficients.
(2) Spectral enhancement. Spectral enhancement methods are a classical family of de-reverberation algorithms. They enhance the speech signal by modifying the noisy or reverberant signal in the short-time Fourier transform domain. Reference [1] (K. Kinoshita, M. Delcroix, T. Nakatani, and M. Miyoshi, "Suppression of late reverberation effect on speech signal using long-term multiple-step linear prediction," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4, pp. 534–545, May 2009.) estimates the late reverberation by delayed linear prediction and then removes it with spectral subtraction. Reference [2] (F. Xiong, N. Moritz, R. Rehr, J. Anemuller, B. Meyer, T.G.G. Doclo, and S. Goetze, "Robust ASR in reverberant environments using temporal cepstrum smoothing for speech enhancement and an amplitude modulation filterbank for feature extraction," in Proc. REVERB Challenge Workshop, Florence, Italy, 2014.) uses a minimum mean square error method to estimate the clean speech amplitude spectrum as a preprocessing stage for automatic speech recognition; the power spectral density of the clean speech is estimated from the power spectral densities of the late reverberation and the stationary background noise. In general, spectral enhancement methods must first estimate the reverberation time in order to determine the amount of spectral attenuation. However, blind estimation of the reverberation time remains a difficult problem, especially in noisy environments, and research on it is ongoing.
(3) Inverse filtering. A de-reverberation algorithm is called blind when no prior knowledge of the room impulse response between the source and the microphone is available. Multi-channel linear prediction with a microphone array is a classical blind de-reverberation algorithm. According to the multiple input/output inverse theorem (MINT), a multi-channel method can perfectly equalize a time-invariant room impulse response provided that the channel transfer functions share no common zeros. However, the MINT algorithm is very sensitive to system identification errors, and the impulse responses of real rooms often contain nearly common zeros, so MINT is difficult to apply in practice.
Time-domain linear prediction algorithms usually require very long filters and tend to whiten the target signal. Researchers have therefore proposed applying multi-channel linear prediction in the short-time Fourier transform (STFT) domain, processing each subband independently. In the STFT domain, the reverberant speech in each frequency band is described by an autoregressive model, which shortens the filter needed in each subband. Because room impulse responses actually vary with time, the prediction coefficients must be modeled as time-varying. A multichannel autoregressive (MAR) signal model in the STFT domain has recently been proposed in which a Kalman filter estimates the MAR coefficients; this algorithm can be viewed as a generalized recursive least squares (RLS) algorithm.
The computational complexity of STFT-domain multi-channel linear prediction grows quadratically with the filter order in each subband, which limits its use on many resource-constrained platforms. Reference [3] (T. Dietzen, S. Doclo, A. Spriet, et al., "Low-complexity Kalman filter for multi-channel linear-prediction-based blind speech dereverberation," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE, 2017.) proposed a simplified Kalman filter solution for the STFT-domain adaptive multi-channel linear prediction de-reverberation algorithm that reduces the complexity to linear in the filter order. However, this simplification causes some loss of speech quality. In addition, that algorithm estimates only one channel, whereas in practice multiple channels need to be computed.
Summary of the invention
The purpose of the present invention is to overcome the above shortcomings of existing de-reverberation methods by proposing a low-complexity de-reverberation method based on Kalman filtering, which further reduces the complexity of the STFT-domain adaptive multi-channel linear prediction de-reverberation algorithm without degrading speech quality.
To achieve this purpose, the invention proposes a Kalman filter-based de-reverberation method, comprising:
preprocessing the original signal collected by each microphone to obtain the corresponding frequency-domain signal, and delaying these signals to form the input signal;
estimating the reverberation signal with a Kalman filtering algorithm and a time-varying multi-channel autoregressive model, taking the original signals collected by the microphones at the current time as the reference signal, and subtracting the reverberation signal to obtain the error signal;
updating the Kalman filter coefficients using the Kalman gain matrix and the error signal;
obtaining the target signal from the original signals collected by the microphones at the current time, the input signal, and the updated Kalman filter coefficients;
and finally converting the frequency-domain target signal to the time domain with an inverse Fourier transform.
As an improvement of the above method, the method specifically comprises:
Step 1) The signals y_m(n), 1≤m≤M, collected by the M microphones are framed, windowed, and Fourier transformed to obtain the corresponding frequency-domain signals Y_m(n);
The frequency-domain signal Y_m(n) is given by formula (1),
where k is the frequency index, N is the number of Fourier transform points, n is the time-frame index, w_STFT(l) is the short-time Fourier transform analysis window function, and R is the frame shift;
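Formula (1) itself is missing from this text. A standard STFT analysis expression that uses exactly the symbols defined above is sketched below. The summation limits and the sign of the exponent are assumptions, not taken from the patent.

```latex
Y_m(k,n) = \sum_{l=0}^{N-1} y_m(l + nR)\, w_{\mathrm{STFT}}(l)\, e^{-j 2\pi k l / N} \tag{1}
```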
Step 2) The frequency-domain signals of the M microphones from time n-D to time n-L form the input signal matrix Y(n-D), and the reverberation signal vector r(n) is estimated using the Kalman weight vector, where D is the delay and L is the linear prediction length;
y(n) = [Y_1(n), ..., Y_M(n)]^T (2)
In formula (3), I_M is the M×M identity matrix, ⊗ denotes the Kronecker product, and Y(n-D) is a sparse matrix of size M×L_c formed from the microphone observation signals, with L_c = M^2(L-D+1);
The reverberation signal vector r(n) is calculated according to formula (4);
In formula (4), the M×M matrices C_p(n-1) are the time-varying Kalman weight coefficients, p = D, D+1, ..., L, and Vec{·} is the matrix column-stacking operator;
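Formulas (3) and (4) are not reproduced in this text. One form that matches the stated dimensions (Y(n-D) of size M×L_c with L_c = M^2(L-D+1)) and the usual multichannel autoregressive model is sketched below. The ordering of the delayed frames, the ordering of the coefficient matrices inside Vec{·}, and the hat notation for estimates are assumptions.

```latex
\mathbf{Y}(n-D) = \mathbf{I}_M \otimes \left[\mathbf{y}^{T}(n-D), \ldots, \mathbf{y}^{T}(n-L)\right] \tag{3}

\hat{\mathbf{r}}(n) = \mathbf{Y}(n-D)\,\hat{\mathbf{c}}(n-1), \qquad
\hat{\mathbf{c}}(n-1) = \mathrm{Vec}\left\{\left[\mathbf{C}_D(n-1), \ldots, \mathbf{C}_L(n-1)\right]^{T}\right\} \tag{4}
```

With this ordering, row m of Y(n-D) multiplies the m-th column of the transposed coefficient block, so the product equals the familiar sum \sum_{p=D}^{L} C_p(n-1) y(n-p).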
Step 3) The reverberation signal vector r(n) obtained in step 2) is subtracted from the signal y(n) collected by the microphones at the current time to obtain the error signal vector e(n);
e(n) = y(n) - r(n) (5)
Step 4) The Kalman gain matrix K(n) is calculated;
Step 5) The Kalman filter coefficients are updated using the Kalman gain matrix K(n) and the error signal vector e(n);
Step 6) The target signal vector x(n) is calculated from the signal y(n) collected by the microphones at the current time, the input signal matrix Y(n-D), and the updated Kalman filter coefficients;
Step 7) An inverse Fourier transform is applied to the frequency-domain target signal vector x(n) to obtain the time-domain target signal vector x_t(l).
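The formulas for steps 5) and 6) are missing from this text. Assuming the conventional Kalman/adaptive-filter update, in which the gain maps the error to a coefficient correction and the target signal is the observation minus the re-estimated reverberation, they would read as follows. The hat notation and the exact form are assumptions; the dimensions are consistent with K(n) being L_c×M and e(n) being M×1.

```latex
\hat{\mathbf{c}}(n) = \hat{\mathbf{c}}(n-1) + \mathbf{K}(n)\,\mathbf{e}(n) \tag{12}

\mathbf{x}(n) = \mathbf{y}(n) - \mathbf{Y}(n-D)\,\hat{\mathbf{c}}(n) \tag{13}
```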
As an improvement of the above method, step 4) specifically comprises:
Step 401) The target signal variance is calculated by first-order smoothing according to formula (6);
Formula (6) involves the target signal variance at time n-1, the target signal variance at time n-2, and the target signal vector x(n-1) at time n-1; α is the smoothing factor, set to 0.2;
Step 402) The variance of the perturbation noise w(n) is first calculated according to formula (7), and the prior misalignment variance is then calculated according to formula (8);
In formula (7), L_c = M^2(L-D+1) and η is usually 10^-5; formula (8) uses the posterior misalignment variance at time n-1;
Step 403) The regularization factor δ(n) is calculated from the target signal variance and the prior misalignment variance according to formula (9);
Step 404) The covariance matrix S_Y(n-D) is calculated from the signals collected by the microphones according to formula (10);
S_Y(n-D) = Y(n-D) Y^H(n-D) (10)
Step 405) The Kalman gain matrix K(n) is calculated according to formula (11);
K(n) = Y^H(n-D) [S_Y(n-D) + δ(n) I_M]^-1 (11).
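The M×M inverse in formula (11) becomes a scalar division if Y(n-D) has the Kronecker structure I_M ⊗ ỹ^T suggested by the description of formula (3). Here ỹ is a hypothetical name for the stacked delayed frames, and the structure itself is an assumption because formula (3) is not reproduced in this text. A short derivation under that assumption:

```latex
\mathbf{S}_Y(n-D) = (\mathbf{I}_M \otimes \tilde{\mathbf{y}}^{T})(\mathbf{I}_M \otimes \tilde{\mathbf{y}}^{*})
                  = \mathbf{I}_M \otimes (\tilde{\mathbf{y}}^{T}\tilde{\mathbf{y}}^{*})
                  = \lVert \tilde{\mathbf{y}} \rVert^{2}\, \mathbf{I}_M

\mathbf{K}(n) = \frac{\mathbf{Y}^{H}(n-D)}{\lVert \tilde{\mathbf{y}} \rVert^{2} + \delta(n)}
```

In this form the gain is the conjugated input normalized by its energy plus δ(n), which is the shape of a regularized NLMS step.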
As an improvement of the above method, the method further comprises, after step 7):
updating the posterior misalignment variance.
A Kalman filter-based de-reverberation system comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the above method when executing the program.
The advantages of the present invention are:
1. By diagonalizing the error covariance matrix of the Kalman filter state vector, the method reduces the complexity of the adaptive multi-channel linear prediction de-reverberation algorithm;
2. The simplified Kalman filtering algorithm of the invention can be regarded as a normalized least mean square (NLMS) algorithm with a variable regularization factor. In addition, both the error signal vector e(n) and the target signal vector x(n) are M×1 vectors, which makes it convenient to cascade other multi-channel algorithms afterwards and also provides more information for calculating the target signal variance.
Description of drawings
Figure 1 is a schematic diagram of how room reverberation arises;
Figure 2 is a block diagram of the Kalman filter de-reverberation of the present invention;
Figure 3 is a block diagram of the Kalman weight vector update of the present invention;
Figure 4 is a block diagram of the Kalman gain matrix calculation module of the present invention;
Figure 5 is a block diagram of the prior misalignment variance estimation of the present invention.
Detailed description
The present invention is described in detail below with reference to the drawings and specific embodiments.
A low-complexity de-reverberation method based on Kalman filtering comprises:
Step 1) The signals y_m(n), 1≤m≤M, collected by the M microphones are framed, windowed, and Fourier transformed to obtain the corresponding frequency-domain signals Y_m(k,n); to simplify notation, the frequency index k is omitted below;
The frequency-domain signal Y_m(k,n) is calculated according to formula (1),
where k is the frequency index, N is the number of Fourier transform points, n is the time-frame index, w_STFT(l) is the short-time Fourier transform analysis window function, and R is the frame shift;
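Step 1) can be sketched in plain Python as follows. This is a minimal illustration that assumes the standard STFT definition (window N samples starting at n·R, then take an N-point DFT); the function name `stft_frame`, the zero-padding at the end of the signal, and the rectangular window in the example are hypothetical choices, not details from the patent.

```python
import cmath
import math

def stft_frame(y, n, N, R, window):
    """Spectrum of time frame n: window N samples starting at n*R, then N-point DFT."""
    start = n * R
    # Samples past the end of the signal are treated as zero (an assumption).
    frame = [(y[start + l] if start + l < len(y) else 0.0) * window[l]
             for l in range(N)]
    # Direct DFT, one value per frequency index k.
    return [sum(frame[l] * cmath.exp(-2j * math.pi * k * l / N) for l in range(N))
            for k in range(N)]

# Example: a cosine that sits exactly on bin 2 of an 8-point transform.
y = [math.cos(2 * math.pi * 2 * l / 8) for l in range(16)]
rectangular = [1.0] * 8  # hypothetical window choice for the example
spectrum = stft_frame(y, n=1, N=8, R=4, window=rectangular)
```

A practical implementation would use an FFT and a tapered analysis window; the direct DFT is used here only to keep the sketch dependency-free.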
Step 2) The frequency-domain signals of the M microphones from time n-D to time n-L form the input signal matrix Y(n-D), and the reverberation signal vector r(n) is estimated using the Kalman weight vector, where D is the delay and L is the linear prediction length;
Y(n-D) is a sparse matrix of size M×L_c formed from the microphone observation signals, with L_c = M^2(L-D+1). r(n) represents the late reverberation.
The input signal matrix Y(n-D) is obtained according to formulas (2) and (3);
y(k,n) = [Y_1(k,n), ..., Y_M(k,n)]^T (2)
In formula (3), ⊗ denotes the Kronecker product.
The reverberation signal vector r(n) is calculated according to formula (4);
In formula (4), estimated quantities are distinguished from true values, and the M×M matrices C_p(n-1) are the time-varying Kalman weight coefficients, p = D, D+1, ..., L. L is the linear prediction length. The delay D>1 is chosen according to the frame overlap of the short-time Fourier transform (STFT), so that the correlation between x(n) and r(n) is negligible. Vec{·} is the matrix column-stacking operator.
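The construction of Y(n-D) and the reverberation estimate can be sketched for a single frequency bin as below. The sketch assumes that formula (3) is the Kronecker product of I_M with the row of stacked delayed frames and that formula (4) is a matrix-vector product with the coefficient vector; both are assumptions, since the formulas are not reproduced in this text. All function names are hypothetical.

```python
def kron_identity_row(M, row):
    """Return I_M (Kronecker) row as an M x (M*len(row)) list of lists."""
    width = len(row)
    return [[row[j - m * width] if m * width <= j < (m + 1) * width else 0j
             for j in range(M * width)]
            for m in range(M)]

def build_input_matrix(frames, n, D, L):
    """frames[t] holds the M-vector y(t) of one frequency bin; returns Y(n-D)."""
    M = len(frames[n])
    # Stack y(n-D), ..., y(n-L) into one row of length M*(L-D+1).
    stacked = [value for p in range(D, L + 1) for value in frames[n - p]]
    return kron_identity_row(M, stacked)

def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Example with M = 2 microphones, D = 2, L = 3, so L_c = M^2 (L-D+1) = 8.
frames = [[1 + 0j, 2 + 0j], [3 + 0j, 4 + 0j], [5 + 0j, 6 + 0j], [7 + 0j, 8 + 0j]]
Y = build_input_matrix(frames, n=3, D=2, L=3)
c_prev = [1 + 0j] * 8      # hypothetical coefficient vector
r = matvec(Y, c_prev)      # estimated reverberation for this bin
```

Each row of Y contains the same stacked frames in a different column block, which is why the matrix is sparse: only M·(L-D+1) of the L_c entries in a row are non-zero.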
Step 3) The reverberation signal vector r(n) obtained in step 2) is subtracted from the signal y(n) collected by the microphones at the current time to obtain the error signal vector e(n);
e(n) = y(n) - r(n) (5)
Step 4) The Kalman gain matrix K(n) is calculated from the input signal matrix Y(n-D), the target signal variance, and the prior misalignment variance; specifically:
Step 401) The target signal variance at time n is calculated by first-order smoothing according to formula (6);
Formula (6) involves the target signal variance at time n-1, the target signal variance at time n-2, and the target signal vector x(n-1) at time n-1; α is the smoothing factor, set to 0.2;
Step 402) The variance of the perturbation noise w(n) is first calculated according to formula (7), and the prior misalignment variance is then calculated according to formula (8);
In formula (7), L_c = M^2(L-D+1) and η is a small positive constant; a value of 10^-5 is generally recommended.
Step 403) The regularization factor δ(n) is calculated from the target signal variance and the prior misalignment variance according to formula (9);
Step 404) The covariance matrix S_Y(n-D) is calculated from the signals collected by the microphones according to formula (10);
S_Y(n-D) = Y(n-D) Y^H(n-D) (10)
Step 405) The Kalman gain matrix K(n) is calculated according to formula (11);
K(n) = Y^H(n-D) [S_Y(n-D) + δ(n) I_M]^-1 (11)
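Formulas (10) and (11) can be sketched without any matrix inversion if Y(n-D) is the Kronecker product of I_M with the stacked delayed frames, because S_Y(n-D) is then the frame energy times I_M. That structure is an assumption based on the description of formula (3); the function name `kalman_gain` and the example values are hypothetical, and δ(n) is passed in as a given number because formulas (6) to (9) are not reproduced in this text.

```python
def kalman_gain(stacked, M, delta):
    """K = Y^H (Y Y^H + delta*I_M)^-1 for Y = I_M (Kronecker) stacked^T.

    Returns an (M*len(stacked)) x M list of lists.
    """
    # Y Y^H equals energy * I_M, so the inverse is a scalar division.
    energy = sum(abs(v) ** 2 for v in stacked)
    scale = 1.0 / (energy + delta)
    width = len(stacked)
    K = [[0j] * M for _ in range(M * width)]
    for m in range(M):
        for i, v in enumerate(stacked):
            # Entry of Y^H: conjugate of Y[m][m*width + i] = stacked[i].
            K[m * width + i][m] = v.conjugate() * scale
    return K

# Example: two stacked samples with energy 25, and delta = 25.
stacked = [3 + 4j, 0j]
K = kalman_gain(stacked, M=2, delta=25.0)
```

The cost is linear in the length of the stacked vector per channel, which is consistent with the low-complexity goal stated in the text.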
Step 5) The Kalman filter coefficients are updated using the Kalman gain matrix K(n) and the error signal vector e(n);
Step 6) The target signal vector x(n) is calculated from the signal y(n) collected by the microphones at the current time, the input signal matrix Y(n-D), and the updated Kalman filter coefficients;
Step 7) The inverse Fourier transform of the frequency-domain signal vector x(n) is computed to obtain the time-domain target signal vector x_t(l);
Step 8) The posterior misalignment variance is updated;
In formula (15), I_M is the M×M identity matrix, L_c = M^2(L-D+1), and L is the linear prediction length. tr[·] denotes the trace of a matrix.
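Steps 3), 5) and 6) for one frequency bin and one time frame can be sketched as below. Formula (5) is taken from the text; the coefficient update (previous coefficients plus K(n)·e(n)) and the target signal (y(n) minus Y(n-D) times the updated coefficients) are assumed conventional forms, since the corresponding formulas are not reproduced here. The function names and example values are hypothetical.

```python
def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def update_and_dereverberate(y, Y, c_prev, K):
    """One iteration: error signal, coefficient update, target signal."""
    r = matvec(Y, c_prev)                                    # reverberation estimate
    e = [yi - ri for yi, ri in zip(y, r)]                    # formula (5)
    c = [ci + di for ci, di in zip(c_prev, matvec(K, e))]    # assumed update form
    x = [yi - ri for yi, ri in zip(y, matvec(Y, c))]         # assumed target form
    return e, c, x

# Single-microphone toy example (M = 1, one prediction tap).
y = [4 + 0j]
Y = [[2 + 0j]]
c_prev = [0j]
K = [[0.25 + 0j]]   # conj(2) / (|2|^2 + delta) with delta = 4
e, c, x = update_and_dereverberate(y, Y, c_prev, K)
```

Note that x(n) is computed with the updated coefficients while e(n) uses the previous ones, which follows the ordering of steps 3) to 6) in the text.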
Figure 2 is a system block diagram of the low-complexity Kalman filter-based de-reverberation algorithm of the present invention. Y(n-D) is the input signal matrix formed from the frequency-domain signals of the M microphones from time n-D to time n-L, r(n) is the reverberation signal vector estimated by the Kalman filtering algorithm, y(n) is the reference signal vector formed from the signals collected by the microphones at the current time, and x(n) is the final output target signal vector. The Fourier transform module 201 applies a Fourier transform to the signals collected by the microphones; the Fourier transform of the m-th microphone signal is denoted Y_m(n). The delay module 202 delays the signals collected by the microphones. The delay D>1 is chosen according to the frame overlap of the STFT, so that the correlation between x(n) and r(n) is negligible. The Kalman filter module 203 filters the input signal with the Kalman filter to estimate the reverberation signal. The summation module 204 computes the target signal vector x(n). The inverse Fourier transform module 205 transforms the frequency-domain signal to the time domain.
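Figure 3 is a block diagram of the Kalman weight coefficient update, which includes the Kalman gain calculation module 303. The weight vector update is obtained from the error signal vector and the Kalman gain matrix, and the final output target signal vector x(n) is computed from the updated weight vector.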
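Figure 4 is a block diagram of the Kalman gain matrix calculation, which includes the prior misalignment variance estimation module 403. The product module 401 multiplies its two inputs, and the inversion module 402 inverts its input. The Kalman gain matrix is calculated from the target signal variance, the input signal matrix Y(n-D), and the prior misalignment variance, the latter being produced by the prior misalignment variance estimation module 403. The Kalman gain is essential both for updating the filter weights and for estimating the prior misalignment variance. R_e(n) is calculated first, and the Kalman gain matrix K(n) is then obtained.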
The prior misalignment variance estimation module shown in Figure 5 also shows how the posterior misalignment variance is calculated. The transpose module 501 transposes a matrix. Module 503 computes the trace of a matrix.
From the above analysis and Figures 2, 3 and 4, the following conclusions can be drawn:
First, the invention greatly reduces the computational complexity of the STFT-domain adaptive multi-channel linear prediction de-reverberation algorithm;
Second, the invention not only reduces the computational complexity but also preserves the quality of the output speech;
Finally, the invention achieves a good trade-off between the tracking performance and the convergence performance of the Kalman filter.
This shows that the invention provides an effective de-reverberation technique that removes the reverberation caused by sound reflections in a room and improves speech intelligibility and the recognition accuracy of automatic speech recognition systems.
It should be noted that the simplified Kalman filtering algorithm described here can be regarded as an NLMS algorithm with a variable regularization factor, where δ(n) acts as that factor. The variance of the perturbation noise w(n) plays an important role in estimating the filter coefficients c(n): a small variance gives good misalignment performance but poor tracking, while a large variance gives good tracking but poor misalignment. In other words, its value largely determines the tracking and convergence behavior of the Kalman filter. While the algorithm has not yet converged, the difference term in formula (7) is large, so the variance given by formula (7) is also large, which provides fast convergence and tracking. As the algorithm converges to steady state, the difference term shrinks, which yields a smaller variance and therefore lower misalignment.
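Formulas (7) to (9) are not reproduced in this text. The forms below are one reading that is consistent with the behavior described above and with simplified Kalman filters in the adaptive-filtering literature; they are assumptions, and the symbols σ_w^2, σ_μ^2 and σ_x^2 are hypothetical notation for the perturbation noise variance, the misalignment variance, and the target signal variance.

```latex
\sigma_w^2(n) = \frac{1}{L_c}\,\bigl\lVert \hat{\mathbf{c}}(n-1) - \hat{\mathbf{c}}(n-2) \bigr\rVert^{2} + \eta \tag{7}

\sigma_\mu^2(n \mid n-1) = \sigma_\mu^2(n-1) + \sigma_w^2(n) \tag{8}

\delta(n) = \frac{\sigma_x^2(n)}{\sigma_\mu^2(n \mid n-1)} \tag{9}
```

Under this reading, rapidly changing coefficients raise σ_w^2(n), which raises the prior misalignment variance and lowers δ(n), so the gain in formula (11) takes larger steps; near steady state the opposite happens and the steps shrink.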
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements of the technical solutions that do not depart from their spirit and scope shall all fall within the scope of the claims of the present invention.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711285885.4A CN108172231B (en) | 2017-12-07 | 2017-12-07 | A Kalman Filter-Based Reverberation Method and System |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711285885.4A CN108172231B (en) | 2017-12-07 | 2017-12-07 | A Kalman Filter-Based Reverberation Method and System |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108172231A (en) | 2018-06-15 |
CN108172231B (en) | 2021-07-30 |
Family
ID=62524587
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711285885.4A Active CN108172231B (en) | 2017-12-07 | 2017-12-07 | A Kalman Filter-Based Reverberation Method and System |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108172231B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108600894B (en) * | 2018-07-11 | 2023-07-04 | 甘肃米笛声学有限公司 | Earphone self-adaptive active noise control system and method |
CN109297718B (en) * | 2018-09-29 | 2020-08-07 | 重庆长安汽车股份有限公司 | Evaluation method of order howling noise |
CN109243476B (en) * | 2018-10-18 | 2021-09-03 | 电信科学技术研究院有限公司 | Self-adaptive estimation method and device for post-reverberation power spectrum in reverberation voice signal |
CN110289011B (en) * | 2019-07-18 | 2021-06-25 | 大连理工大学 | A Speech Enhancement System for Distributed Wireless Acoustic Sensor Networks |
CN111599372B (en) * | 2020-04-02 | 2023-03-21 | 云知声智能科技股份有限公司 | Stable on-line multi-channel voice dereverberation method and system |
CN111474481B (en) * | 2020-04-13 | 2022-08-09 | 深圳埃瑞斯瓦特新能源有限公司 | Battery SOC estimation method and device based on extended Kalman filtering algorithm |
CN111599374B (en) * | 2020-04-16 | 2023-04-18 | 云知声智能科技股份有限公司 | Single-channel voice dereverberation method and device |
CN111540372B (en) * | 2020-04-28 | 2023-09-12 | 北京声智科技有限公司 | Method and device for noise reduction processing of multi-microphone array |
CN111933170B (en) * | 2020-07-20 | 2024-03-29 | 歌尔科技有限公司 | Voice signal processing method, device, equipment and storage medium |
CN112017680B (en) * | 2020-08-26 | 2024-07-02 | 西北工业大学 | Dereverberation method and device |
CN115065422B (en) * | 2021-07-26 | 2024-12-10 | 中国计量科学研究院 | System and method for evaluating communication quality in a reverberation room |
CN114205731B (en) * | 2021-12-08 | 2023-12-26 | 随锐科技集团股份有限公司 | Speaker area detection method, speaker area detection device, electronic equipment and storage medium |
CN116320857A (en) * | 2023-03-27 | 2023-06-23 | 厦门亿联网络技术股份有限公司 | Kalman self-adaption-based array microphone noise reduction method and device |
CN117316175B (en) * | 2023-11-28 | 2024-01-30 | 山东放牛班动漫有限公司 | Intelligent encoding storage method and system for cartoon data |
CN117318671B (en) * | 2023-11-29 | 2024-04-23 | 有研(广东)新材料技术研究院 | Self-adaptive filtering method based on fast Fourier transform |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101460999A (en) * | 2006-06-05 | 2009-06-17 | 埃克奥迪公司 | Blind signal extraction |
CN103187068A (en) * | 2011-12-30 | 2013-07-03 | 联芯科技有限公司 | Priori signal-to-noise ratio estimation method, device and noise inhibition method based on Kalman |
CN107393550A (en) * | 2017-07-14 | 2017-11-24 | 深圳永顺智信息科技有限公司 | Method of speech processing and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130332156A1 (en) * | 2012-06-11 | 2013-12-12 | Apple Inc. | Sensor Fusion to Improve Speech/Audio Processing in a Mobile Device |
Non-Patent Citations (2)
Title |
---|
Multichannel Online Blind Speech Dereverberation with Marginalization of Static Observation Parameters in a Rao-Blackwellized Particle Filter;Christine Evers et al.;《Journal of Signal Processing Systems》;20110615;第315-316页 * |
Online Dereverberation for Dynamic Scenarios Using a Kalman Filter With an Autoregressive Model;Sebastian Braun et al.;《IEEE Signal Processing Letters》;20161231;第23卷(第12期);第1741-1743页 * |
Also Published As
Publication number | Publication date |
---|---|
CN108172231A (en) | 2018-06-15 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108172231B (en) | A Kalman Filter-Based Reverberation Method and System | |
Kinoshita et al. | Neural Network-Based Spectrum Estimation for Online WPE Dereverberation. | |
JP5124014B2 (en) | Signal enhancement apparatus, method, program and recording medium | |
Doclo et al. | GSVD-based optimal filtering for single and multimicrophone speech enhancement | |
CN107993670B (en) | Microphone array speech enhancement method based on statistical model | |
CN108141656B (en) | Method and apparatus for digital signal processing of microphones | |
US11373667B2 (en) | Real-time single-channel speech enhancement in noisy and time-varying environments | |
CN106875938B (en) | An Improved Nonlinear Adaptive Speech Endpoint Detection Method | |
JP6894580B2 (en) | Signal processing devices and methods that provide audio signals with reduced noise and reverberation | |
CN108154885A (en) | A method for dereverberating multichannel speech signals using the QR-RLS algorithm | |
JP6225245B2 (en) | Signal processing apparatus, method and program | |
CN112037809A (en) | Residual echo suppression method based on deep neural network with multi-feature flow structure | |
CN110111802B (en) | Kalman filtering-based adaptive dereverberation method | |
KR20220022286A (en) | Method and apparatus for extracting reverberant environment embedding using dereverberation autoencoder | |
CN110111804A (en) | Adaptive dereverberation method based on RLS algorithm | |
Kinoshita et al. | Multi-step linear prediction based speech dereverberation in noisy reverberant environment. | |
WO2020078210A1 (en) | Adaptive estimation method and device for post-reverberation power spectrum in reverberation speech signal | |
CN116052702A (en) | Kalman filtering-based low-complexity multichannel dereverberation noise reduction method | |
CN113851141A (en) | Novel method and device for noise suppression by microphone array | |
US11195540B2 (en) | Methods and apparatus for an adaptive blocking matrix | |
Prasad et al. | Two microphone technique to improve the speech intelligibility under noisy environment | |
Jung et al. | Noise Reduction after RIR removal for Speech De-reverberation and De-noising | |
Kinoshita et al. | A linear prediction-based microphone array for speech dereverberation in a realistic sound field | |
Schwartz et al. | LPC-based speech dereverberation using Kalman-EM algorithm | |
Islam et al. | Statistical modeling for suppression of late reverberation with inverse filtering for early reflections |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||