CN106504763A - Microphone-array multi-target speech enhancement method based on blind source separation and spectral subtraction - Google Patents
- Publication number: CN106504763A (application CN201611191478.2A)
- Authority
- CN
- China
- Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a microphone-array multi-target speech enhancement method based on blind source separation and spectral subtraction, comprising the following steps: collecting multi-channel multi-target signals through a microphone array; band-pass filtering each collected single-channel signal to suppress non-speech noise and interference, followed by pre-emphasis; windowing and framing the speech to obtain frame signals, converting each frame to the frequency domain with the short-time Fourier transform, and extracting the magnitude spectrum and phase spectrum of each frame; detecting the start and end endpoints of the speech signal and estimating the noise power spectrum; reducing the background noise of the speech frames by spectral subtraction; combining the spectral-subtraction output with the phase spectrum and applying the inverse short-time Fourier transform to obtain the time-domain speech signal; and finally performing blind source separation to obtain each target signal. The method is simple to implement, has low resource requirements and low computational complexity, and achieves multi-target signal enhancement.
Description
Technical Field
The invention belongs to the fields of signal processing and computer speech signal processing, and in particular relates to a speech enhancement method based on a microphone array.
Background Art
The goal of speech enhancement is to extract the original speech, as pure as possible, from a noisy speech signal: to suppress background noise, improve speech quality, and improve the listener's comfort so that the listener does not tire. It plays an increasingly important role in mitigating noise pollution, improving speech quality, and improving speech intelligibility. Speech enhancement is a problem that urgently needs solving now that speech signal processing has reached the practical stage. Resistance to noise interference is an important factor in raising recognition rates in speech recognition. As speech recognition applications expand and enter practical use, more effective speech enhancement techniques are urgently needed to strengthen recognition features and make speech easy to recognize. A speech signal is a complex nonlinear signal; separating the desired speech from various mixtures, especially from co-channel speech interference, is a hard digital signal processing problem. No algorithm can filter out the noise completely, and it is difficult to maintain high subjective and objective evaluation performance in the presence of every kind of noise.
A typical workflow of microphone-array speech enhancement is shown in Figure 1 and mainly comprises the following steps:
1) Design a microphone array structure that meets the requirements.
2) Use a multi-channel speech acquisition system to collect multi-channel speech signals.
3) Apply preprocessing operations to the collected multi-channel speech signals, such as general preprocessing, voice activity detection, channel delay estimation, and target direction estimation.
4) Apply an array speech enhancement algorithm to obtain a relatively clean speech signal.
In step 1), designing an appropriate microphone array structure is very important.
Microphone array topologies can be divided into one-dimensional linear arrays (including equispaced, nested, and non-equispaced arrays), two-dimensional planar arrays (including uniform and non-uniform circular arrays and square arrays), and three-dimensional arrays. In practice, uniform linear arrays, nested linear arrays, and uniform planar arrays are the most common. Research shows that the array topology strongly influences a microphone-array speech system, and its design is closely tied to the choice of multi-channel signal model.
According to the distance between the sound source and the array, sound signal models divide into far-field and near-field models. The difference is: the far-field model uses a plane-wave model, ignoring the amplitude differences among the channels' received signals; the source has a single incidence angle relative to the array, and the delays between array elements are linearly related. The near-field model uses a spherical wavefront: it accounts for amplitude differences between received signals, each array element has its own incidence angle, and the inter-element delays have no simple relationship. There is no absolute standard dividing near field from far field; it is generally accepted that a source is in the far field when its distance to the array center is much greater than the signal wavelength, and in the near field otherwise.
Generally, a microphone array can be regarded as a spatial sampling device. As with sampling in time, the array's sampling frequency must be high enough to avoid spatial ambiguity and spatial aliasing. For an equispaced linear array, the spatial sampling rate is defined as U_s = 1/d; that is, the spatial sampling frequency U_s is determined by the microphone spacing d. Considering that adjacent samples of the same signal differ by a phase shift, the normalized spatial frequency is defined as U = (d/λ)·sinΦ, where λ is the wavelength and Φ the incidence angle. To avoid spatial aliasing, the normalized frequency must satisfy |U| ≤ 1/2. Since the incidence angle ranges over −90° ≤ Φ ≤ 90°, the spacing between adjacent microphones must satisfy d ≤ λ_min/2.
The spatial sampling theorem above relates the microphone spacing, the signal frequency, and the direction of arrival (incidence angle Φ). If it is not satisfied, spatial aliasing occurs.
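The spacing bound can be checked numerically. A minimal sketch (the function name and the speech-band cutoff are illustrative, not from the patent): for speech band-limited to 3400 Hz, the half-wavelength rule d ≤ λ_min/2 gives about 5 cm.

```python
def max_mic_spacing(f_max_hz: float, c: float = 343.0) -> float:
    """Largest alias-free spacing d <= lambda_min / 2 for an equispaced line array (metres)."""
    wavelength_min = c / f_max_hz   # shortest wavelength in the band
    return wavelength_min / 2.0

# Speech band-limited to 3.4 kHz (the pre-filter's upper cutoff):
d = max_mic_spacing(3400.0)   # roughly 0.05 m
```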
For a uniform linear microphone array, let r_m be the straight-line distance from the sound source to the m-th microphone. The discrete signal output by the m-th microphone can then be written x_m[n] = s[n − Δn_m] + η_m[n], where s[n] is the source signal, Δn_m is the delay in samples between the m-th microphone's received signal and the source signal, and η_m[n] is the noise received by the m-th microphone. Let Δτ_m be the corresponding time delay; then Δn_m = f_s·Δτ_m with Δτ_m = r_m/c, where f_s is the sampling frequency and c the propagation speed of sound in space. From this, the array signal matrix output by the microphone array can be established:
x_1[n] = s[n − Δn_1] + η_1[n]
x_2[n] = s[n − Δn_2] + η_2[n]
…
x_N[n] = s[n − Δn_N] + η_N[n]
where N is the number of array elements.
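The array signal matrix above can be simulated directly. The helper below is an illustrative sketch with names of our own choosing: it builds delayed, noisy copies of a source for integer sample delays Δn_m.

```python
import numpy as np

def array_signals(s, sample_delays, noise_std=0.0, seed=0):
    """Rows are x_m[n] = s[n - dn_m] + eta_m[n] (integer delays, zero-padded at the start)."""
    rng = np.random.default_rng(seed)
    X = np.zeros((len(sample_delays), len(s)))
    for m, dn in enumerate(sample_delays):
        X[m, dn:] = s[: len(s) - dn]                      # delayed copy of the source
        X[m] += noise_std * rng.standard_normal(len(s))   # additive sensor noise eta_m
    return X

s = np.arange(6.0)
X = array_signals(s, sample_delays=[0, 2])   # second mic lags by 2 samples
```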
In step 3), individual preprocessing operations may be added or omitted depending on the enhancement method.
In preprocessing, pre-emphasis and pre-filtering are dictated by the characteristics of the speech signal. Pre-filtering has two purposes: (1) to suppress all frequency components of the input above f_s/2 and so prevent aliasing; (2) to suppress 50 Hz power-line interference. The pre-filter must therefore be a band-pass filter; with upper and lower cutoff frequencies f_H and f_L, typical values are f_H = 3400 Hz and f_L = 60–100 Hz at a sampling frequency f_s = 16000 Hz.
Because the average power spectrum of speech is shaped by glottal excitation and lip and nostril radiation, it rolls off at about 6 dB per octave above roughly 800 Hz; when computing the spectrum, the higher-frequency components are therefore smaller and harder to resolve than the low-frequency ones, so pre-emphasis is applied during preprocessing. Its purpose is to boost the high-frequency part and flatten the signal spectrum so that the spectrum can be computed with the same signal-to-noise ratio over the whole band, which aids spectral analysis or vocal-tract parameter analysis. Pre-emphasis is realized by a digital filter that boosts high frequencies, generally a first-order filter; based on its operation, the corresponding emphasis is s′(n) = s(n) − α·s(n+1). To restore the original signal, the pre-emphasized signal spectrum must be de-emphasized, i.e. s″(n) = s′(n) + β·s′(n+1), where s(n) is the source signal, s′(n) the emphasized signal, and s″(n) the de-emphasized signal; α and β are the emphasis factors, generally taken around 0.8–0.95.
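A minimal sketch of the pre-emphasis/de-emphasis pair. Note the hedge: this uses the common convention that references the previous sample, y[n] = x[n] − α·x[n−1], with its exact recursive inverse; the indexing convention and the value of α here are our own choices, not prescribed by the patent.

```python
import numpy as np

def pre_emphasis(x, alpha=0.95):
    """y[n] = x[n] - alpha * x[n-1]: first-order high-frequency boost."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= alpha * x[:-1]
    return y

def de_emphasis(y, alpha=0.95):
    """Exact inverse of pre_emphasis: z[n] = y[n] + alpha * z[n-1]."""
    z = np.empty(len(y))
    acc = 0.0
    for n, v in enumerate(y):
        acc = v + alpha * acc
        z[n] = acc
    return z

x = np.sin(np.linspace(0.0, 3.0, 64))
restored = de_emphasis(pre_emphasis(x))   # round trip recovers the input
```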
Because speech is a non-stationary, time-varying signal whose production is closely tied to the motion of the articulators, and the articulators change state much more slowly than the sound vibrates, speech can be considered short-time stationary. Studies find that over spans of 5–50 ms the speech spectrum and some physical parameters remain essentially unchanged. Methods and theory for stationary processes can therefore be brought into short-time speech processing by dividing the signal into many short segments, each called an analysis frame; processing one frame is then equivalent to processing a sustained signal with fixed characteristics. Frames may be contiguous or overlapping, with frame lengths typically 10–30 ms. The overlap between the previous and next frame is called the frame shift, and the ratio of frame shift to frame length is usually taken as 0 to 1/2. Each extracted frame is windowed, i.e. multiplied by a window function w(n), forming the windowed speech. Windowing mainly reduces the spectral leakage introduced by framing: framing truncates the speech abruptly, which is equivalent to a periodic convolution of the speech spectrum with the spectrum of a rectangular window. Because the rectangular window's sidelobes are high, the signal spectrum "smears", i.e. leaks. A Hamming window can be used instead: its lower sidelobes effectively suppress leakage, it has smoother low-pass characteristics, and it yields a smoother spectrum.
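The framing and Hamming windowing described above can be sketched as follows (frame length and hop are in samples; the function name and the 25 ms / 10 ms choice at 16 kHz are illustrative):

```python
import numpy as np

def frame_and_window(x, frame_len=400, hop=160):
    """Split x into overlapping frames (hop = frame shift) and apply a Hamming window."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)   # taper each frame to curb spectral leakage

# 25 ms frames with a 10 ms shift at 16 kHz:
F = frame_and_window(np.ones(16000), frame_len=400, hop=160)
```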
Estimating the inter-element time delays plays a very important role in the whole microphone-array speech enhancement algorithm: together with the signal frequency it determines the beam's directivity, and it is used to estimate the direction of the sound source. Delay-estimation accuracy directly affects the performance of the speech processing system. Because the array samples the speech signal in space, each microphone's received signal is delayed relative to the reference microphone. To steer the maximum of the beamformer output toward the target source, keeping the desired speech signals received by the microphones synchronized is the key means of solving this problem. Typical delay-estimation methods include generalized cross-correlation (GCC), adaptive-filtering-based estimation, adaptive eigendecomposition, and higher-order cumulant methods, of which GCC is the most widely used. Suppose a pair of microphones receives x1(t) = s(t) + η1 and x2(t) = s(t − D) + η2, where s(t) is the source signal, x1(t) and x2(t) are the two microphones' signals, D is the sound propagation delay between the microphones, and η1, η2 are additive background noise. Assuming s(t), η1, and η2 are mutually uncorrelated, and ignoring signal amplitude attenuation, the generalized cross-correlation function R12(τ) between x1(t) and x2(t) is:
R12(τ) = (1/2π) ∫ ψ12(ω) X1(ω) X2*(ω) e^{jωτ} dω, where X1(ω) and X2(ω) are the Fourier transforms of x1(t) and x2(t), respectively, and ψ12(ω) is a generalized cross-correlation weighting function. Choosing a weighting function suited to the situation gives R12(τ) a sharp peak, whose location is the delay between the two microphones.
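A runnable sketch of GCC using the PHAT weighting ψ12(ω) = 1/|X1(ω)X2*(ω)|, one common choice; the function and its handling of lags are illustrative, not the patent's prescription:

```python
import numpy as np

def gcc_phat(x1, x2):
    """Integer-sample delay of x2 relative to x1 via generalized cross-correlation (PHAT)."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12            # PHAT weighting whitens the cross-spectrum
    r = np.fft.irfft(cross, n=n)
    lags = np.concatenate((r[-(n // 2):], r[: n // 2 + 1]))  # reorder so lag 0 is centred
    return -(int(np.argmax(np.abs(lags))) - n // 2)

rng = np.random.default_rng(1)
s = rng.standard_normal(256)
delayed = np.concatenate((np.zeros(5), s))    # x2 = x1 delayed by 5 samples
```

Here `gcc_phat(s, delayed)` recovers the 5-sample delay.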
Voice activity detection (VAD), also called speech detection or speech endpoint detection, accurately determines the start and end points of the input speech to ensure good performance of the speech processing system. Speech and noise are processed differently, so if a frame cannot be judged as noisy speech or pure noise, it cannot be processed appropriately. In a speech enhancement system, to gather more information about the background noise, endpoint detection focuses on accurately detecting the silent segments; learning speech characteristics and accumulating noise-source estimates both depend on accurate endpoint detection. VAD is usually performed frame by frame, with frame lengths from 10 to 30 ms. The general method is: extract one or more contrast feature parameters from the input signal and compare them against one or more thresholds. Exceeding the threshold indicates a voiced segment; otherwise the segment is silent.
Speech detection generally has two steps:
Step 1: based on features of the speech signal. Parameters such as energy, zero-crossing rate, entropy, and pitch, along with their derivatives, are used to classify speech/non-speech segments in the signal stream.
Step 2: once a speech signal is detected in the stream, decide whether the point is the start or the end of the speech. In speech systems, varying backgrounds and natural conversational patterns make mid-sentence pauses (non-speech) likely, especially the silent gap before a plosive initial, so this start/end decision is particularly important.
Current speech endpoint detection methods fall roughly into two categories:
The first is HMM-based endpoint detection of speech in noisy environments, which requires the background noise to be stationary and the signal-to-noise ratio to be high.
The second is detection based on the signal's short-time energy: an energy threshold is derived from statistics of the background-noise energy and used to locate the start of the speech signal.
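The energy-threshold idea in the second method can be sketched as follows; the assumption that the leading frames are noise-only and the factor of 3 are illustrative choices, not the patent's:

```python
import numpy as np

def energy_vad(frames, n_noise_frames=5, factor=3.0):
    """True where a frame's short-time energy exceeds a noise-derived threshold."""
    energy = np.sum(np.asarray(frames, dtype=float) ** 2, axis=1)
    threshold = factor * energy[:n_noise_frames].mean()   # noise statistics -> threshold
    return energy > threshold

frames = np.vstack([np.full((5, 160), 0.01),   # quiet, noise-only frames
                    np.ones((3, 160))])        # louder speech frames
flags = energy_vad(frames)
```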
In step 4), a speech enhancement algorithm is used to obtain a relatively clean speech signal.
Speech enhancement techniques divide mainly into single-channel methods and multi-channel (microphone-array) methods. Single-channel methods are numerous, mostly pairing noise-cancellation methods with the characteristics of the speech signal to build targeted algorithms; the most theoretically mature, simplest, and most effective is spectral subtraction (SS). A single pickup sensor is constrained by venue, distance, and application, so pickup quality suffers greatly and subsequent enhancement becomes difficult.
The basic principle of spectral subtraction is: in the frequency domain, subtract the noise power spectrum from the noisy-speech power spectrum to estimate the speech power spectrum; its square root gives the speech magnitude estimate, whose phase is restored before an inverse Fourier transform recovers the time-domain signal. Because the human ear is insensitive to phase, the phase of the noisy speech is used for restoration. Since speech is short-time stationary, it is treated as a stationary random signal in short-time spectral amplitude estimation.
Let s(n), η(n), and x(n) denote speech, noise, and noisy speech, with short-time spectra S(ω), Γ(ω), and X(ω), respectively. Assume s(n) and η(n) are uncorrelated and the noise is additive, giving the additive signal model x(n) = s(n) + η(n). Denoting the windowed signals by x_w(n), s_w(n), η_w(n), we have x_w(n) = s_w(n) + η_w(n); its Fourier transform gives X_w(ω) = S_w(ω) + Γ_w(ω), so the power spectrum satisfies |X_w(ω)|² = |S_w(ω)|² + |Γ_w(ω)|² + S_w(ω)Γ_w*(ω) + S_w*(ω)Γ_w(ω). Only |X_w(ω)|² is estimated from the observed data; the remaining terms must be approximated by their statistical means. Since s(n) and η(n) are independent, the cross-power terms have zero mean, so the estimate of the original speech is |Ŝ_w(ω)|² = |X_w(ω)|² − E[|Γ_w(ω)|²]. The estimate |Ŝ_w(ω)|² cannot be guaranteed non-negative, because the noise estimate carries error: when the estimated average noise power exceeds the noisy-speech power of some frame, that frame's estimate becomes negative. Such negative values can be made positive by flipping their sign, or simply set to zero. Restoring the noisy phase to |Ŝ_w(ω)| and applying the inverse short-time Fourier transform (ISTFT) yields the time-domain estimate of the speech signal.
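A single-frame sketch of this power-spectral subtraction, reusing the noisy phase and clamping negative power estimates to zero (the function name is our own; a full system would also estimate the noise PSD from noise frames):

```python
import numpy as np

def spectral_subtract_frame(frame, noise_psd):
    """Subtract a noise power spectrum from one frame, restore the noisy phase, invert."""
    spec = np.fft.rfft(frame)
    clean_power = np.maximum(np.abs(spec) ** 2 - noise_psd, 0.0)  # clamp negatives to zero
    magnitude = np.sqrt(clean_power)
    return np.fft.irfft(magnitude * np.exp(1j * np.angle(spec)), n=len(frame))

frame = np.cos(2.0 * np.pi * np.arange(64) / 8.0)
identity = spectral_subtract_frame(frame, noise_psd=0.0)   # zero noise leaves frame intact
```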
Current microphone-array speech enhancement algorithms mainly include beamforming, subspace decomposition, and blind source separation. Blind source separation (BSS) is the process of recovering source signals from the observed signals alone, without knowledge of the sources or of the mixing process. BSS thus does not depend on prior conditions of the current event and can perform speech enhancement with fewer microphones; the central problem the algorithm solves is separating each speaker's speech when multiple speakers interfere and overlap, thereby enhancing each target's speech.
Independent component analysis (ICA) is one of the effective methods for blind signal separation and belongs to linear instantaneous-mixture blind signal processing. It does not rely on detailed knowledge of the source signal types or on precise identification of the transmission system's characteristics; it is an effective redundancy-removal technique. Different cost functions yield different ICA algorithms, such as information maximization (Infomax), FastICA, maximum entropy (ME) and minimum mutual information (MMI), and maximum likelihood (ML). The basic principle: the observed signals are regarded as the target signals mixed by a linear transformation; to recover the targets, an inverse linear transformation must be found that unmixes the observations, achieving source separation.
In the noise-free case, let X = [x1(t) x2(t) … xN(t)]′ denote a set of observations received by the microphone array, where t is the time or sample index and N the number of microphones. Assume X is a linear mixture of independent components S, with A an unknown full-rank matrix; the vector form of the signal model is then X = AS.
In the noisy case, assuming additive noise, the signal model becomes X = AS + Γ, where Γ = [η1 η2 … ηN]′ is the noise vector. Writing Γ = AΓ0 (i.e. Γ0 = A⁻¹Γ) transforms this to X = A(S + Γ0), so the noisy model is still the basic ICA model, only with the independent components changed from S to S + Γ0. Under the basic ICA signal model, letting W denote the separation matrix to be found and Y the separated signal matrix, we have Y = WX = WAS. The ultimate goal of ICA is to find an optimal, or near-optimal, separation matrix W such that the signals in Y are mutually independent and approximate the source signals as closely as possible.
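As an illustrative sketch of the ICA idea, a compact symmetric FastICA with a tanh nonlinearity; this specific algorithm, its whitening step, and its parameters are our own choices rather than the patent's prescription:

```python
import numpy as np

def fast_ica(X, n_iter=200, seed=0):
    """Recover Y ~ S (up to permutation and scaling) from X = A S, A square, full rank."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))            # whiten: decorrelate to unit variance
    Z = (E @ np.diag(d ** -0.5) @ E.T) @ X
    n, T = Z.shape
    W = np.random.default_rng(seed).standard_normal((n, n))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)                      # nonlinearity g(u) = tanh(u)
        W = G @ Z.T / T - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)             # symmetric decorrelation (W W^T)^(-1/2) W
        W = U @ Vt
    return W @ Z

t = np.arange(2000)
S = np.vstack((np.sin(0.05 * t), np.sign(np.sin(0.11 * t))))   # two independent sources
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S                     # unknown mixing A
Y = fast_ica(X)
```

Each row of Y then matches one source up to sign and scale.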
At present, microphone-array speech enhancement targets a single source, which limits the effective pickup of an array device; such traditional single-target enhancement cannot meet the needs of practical applications.
Summary of the Invention
To solve the current technical problem of multi-target enhancement of array speech signals, the present invention proposes a microphone-array multi-target speech enhancement method based on blind source separation and spectral subtraction.
The microphone-array multi-target speech enhancement method based on blind source separation and spectral subtraction of the present invention comprises the following steps:
Step 1: collect the noisy speech signals through a two-dimensional planar microphone array with at least 4 microphones, obtaining the acquired signal of each channel.
Step 2: for each channel's acquired signal, perform steps 201–205:
Step 201: band-pass filter the acquired signal to suppress non-speech noise and interference; then apply pre-emphasis, framing, and windowing to the band-pass-filtered signal to obtain frame signals;
then convert each frame to the frequency domain, i.e. apply the short-time Fourier transform to each frame signal, and compute each frame's power spectrum; at the same time, compute and retain each frame's phase spectrum for phase restoration during spectral subtraction;
Step 203: perform speech detection on each frame, judge whether the current frame is a speech frame or a noise frame, and estimate the noise power spectrum from the noise frames;
Step 204: remove the noise power spectrum from each speech frame's power spectrum by spectral subtraction, obtaining each frame's speech power spectrum estimate;
步骤205:对语音功率谱估计开方,并基于对应帧的相位谱进行相位恢复后,再进行短时傅立叶反变换,得到语音帧的时域估计信号;Step 205: Estimate the square root of the speech power spectrum, perform phase recovery based on the phase spectrum of the corresponding frame, and then perform inverse short-time Fourier transform to obtain the time domain estimation signal of the speech frame;
In step 2, signal preprocessing is applied to the captured signal of each single channel, dividing it into many short (noisy) speech segments, i.e., frame signals; spectral subtraction is then applied to each frame signal to reduce the background noise of the speech frames.
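For illustration, the framing and short-time Fourier transform of steps 201 to 205 can be sketched in Python as follows. This is a minimal sketch: the frame length of 256 samples, 50% overlap, pre-emphasis coefficient 0.97, and the function name are assumed example values, not fixed by the invention.

```python
import numpy as np

def preprocess(x, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasise, frame with overlap, Hamming-window, and return
    per-frame power and phase spectra (illustrative sketch)."""
    # pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    win = np.hamming(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)   # short-time Fourier transform
    power = np.abs(spec) ** 2            # power spectrum of each frame
    phase = np.angle(spec)               # phase spectrum kept for later recovery
    return power, phase

x = np.random.randn(4000)                # a stand-in for one channel's signal
power, phase = preprocess(x)
```

The retained `phase` array is what step 205 uses to rebuild the complex spectrum before the inverse transform.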
Step 3: Apply blind source separation to the time-domain estimates of the speech frames of all channels to separate the sources and obtain the target signals of the different sources;
Step 4: Apply de-emphasis, de-windowing, and frame reassembly to the target signals of each source to obtain the target speech signals of the different sources.
In summary, by adopting the above technical solution, the beneficial effects of the present invention are: (1) the environmental background noise is handled with a traditional single-channel speech enhancement method, so the algorithm is simple and undemanding in resources; (2) spatial filtering by array signal processing is no longer relied upon and no wideband beamforming algorithm needs to be considered, which reduces the structural complexity of the algorithm; (3) the blind source separation algorithm enhances all target signals at once, instead of enhancing a single target or taking turns enhancing one target at a time.
Description of drawings
FIG. 1 is a schematic diagram of a traditional speech enhancement system.
FIG. 2 is a schematic diagram of a system implementing a specific embodiment of the present invention.
FIG. 3 is a flowchart of the speech detection.
FIG. 4 is a flowchart of the spectral-subtraction single-channel speech enhancement method.
Detailed description
To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the embodiments and the accompanying drawings.
Referring to FIG. 2, the multi-target speech enhancement method of the present invention first preprocesses each single-channel signal (speech signal) captured by the two-dimensional planar microphone array, dividing each single-channel speech signal into many short speech segments to obtain frame signals for the subsequent voice activity detection and spectral subtraction. The signal preprocessing comprises band-pass filtering, pre-emphasis, overlapped framing, and Hamming windowing.
Voice activity detection and spectral subtraction are performed on the frame signals of each channel separately; blind source separation is then applied to all channels of the same speech frame to obtain the target signals of the different sources. Finally, the inverse of the signal preprocessing is applied: the target signals of each source are de-emphasized and de-windowed (Hamming), and the frames are reassembled into the target speech signals, achieving enhancement of multiple target speech signals.
According to the characteristics of the sound sources and the noise field in an indoor environment, the multi-channel noisy speech signal of the actual environment is modeled with a diffuse noise field model and a near-field source model. The speech signals in the space are captured by an 8×8 planar array of 64 microphones.
Let X = [x1(t) x2(t) … xj(t) … xN(t)]′ denote the noisy speech signals output by the channels, where j is the microphone channel index.
The array signal (frame signal) obtained after signal preprocessing of the noisy speech signal of each channel is Xpw = [x1pw(n) x2pw(n) … xjpw(n) … xNpw(n)]′, where n = 1, 2, …, L, L is the frame length, and w is the frame number.
A short-time Fourier transform of the frame signal Xpw yields the magnitude spectrum |Xpw(ω)| and the phase spectrum Φpw(ω), where ω is the frequency sampling point, obtained by uniformly sampling the angular frequency from 0 to 2π at N equally spaced points. Thus:
|Xpw| = [|X1pw(ω)| |X2pw(ω)| … |Xjpw(ω)| … |XNpw(ω)|]′
Using |Xpw| = [|X1pw(ω)| |X2pw(ω)| … |Xjpw(ω)| … |XNpw(ω)|]′, the speech start and end endpoints are detected according to the flowchart shown in FIG. 3, i.e., the current frame is judged to be either a noise frame or a speech frame, and the result is used for spectral-subtraction denoising. The detection of the speech start endpoint (start frame) and end endpoint (end frame) proceeds as follows:
The speech energy of each frame is computed as Mw = Σω |Xpw(ω)|², where N is the frame length, w is the frame number, 1 ≤ w ≤ L, L is the number of frames, and ω runs over the points of each frame;
The threshold T is initialized: its initial value is set from statistics of the background noise energy.
Each frame is then classified against the threshold T to judge whether the current frame is a noise frame or a speech frame, and T is updated from the most recent k noise frames:
a. Compute the speech energy Mw of the current frame; if Mw is greater than T, the current frame is judged to be a speech frame, otherwise a noise frame;
b. If the current frame is a noise frame, update the threshold T from the most recent k noise frames (k is an empirical value, usually greater than or equal to 10):
b1: Compute the average speech energy EMN and the maximum and minimum energies EMAX and EMIN of the most recent k noise frames;
b2: Obtain the updated threshold T = min[a × (EMAX − EMIN) + EMN, b × EMN], where 0 < a < 1 and 1 < b < 10;
c. If the current frame is a speech frame, check whether all frames have been processed; if so, endpoint detection is complete; otherwise, repeat steps a to c for the next frame.
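Steps a to c can be sketched as follows. This is an illustrative sketch: the assumption that the first k frames are noise, the rule T = b × EMN for the initial threshold, and the parameter values a = 0.5, b = 2 are example choices, not values mandated by the invention.

```python
import numpy as np

def vad_labels(frame_powers, k=10, a=0.5, b=2.0):
    """Energy-based speech/noise decision with adaptive threshold T.
    frame_powers: (frames, bins) power spectra; returns 1=speech, 0=noise."""
    energies = frame_powers.sum(axis=1)      # M_w: energy of each frame
    noise_hist = list(energies[:k])          # assume the first k frames are noise
    T = b * float(np.mean(noise_hist))       # initial T from background-noise statistics
    labels = []
    for M in energies:
        if M > T:
            labels.append(1)                 # step a: speech frame
        else:
            labels.append(0)                 # noise frame
            noise_hist.append(M)             # step b: refresh the noise history
            recent = noise_hist[-k:]
            EMN = float(np.mean(recent))
            EMAX, EMIN = max(recent), min(recent)
            T = min(a * (EMAX - EMIN) + EMN, b * EMN)  # step b2 update rule
    return np.array(labels)

# low-energy "noise" frames followed by high-energy "speech" frames
powers = np.concatenate([np.ones((12, 4)), 25 * np.ones((5, 4))])
labels = vad_labels(powers)
```

The threshold only moves while noise frames arrive, so a long speech burst cannot drag T upward.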
Furthermore, the short-time zero-crossing rate can be used to verify the speech-frame/noise-frame decisions, so as to prevent misjudgment.
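The zero-crossing-rate check can be sketched like this (an illustrative definition; the invention does not fix a particular formula for it):

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ; voiced speech
    tends to have a low rate, broadband noise a higher one."""
    signs = np.sign(frame)
    signs[signs == 0] = 1    # treat exact zeros as positive
    return float(np.mean(signs[1:] != signs[:-1]))

t = np.arange(256)
fast = zero_crossing_rate(np.sin(2 * np.pi * t / 8))    # high-frequency tone
slow = zero_crossing_rate(np.sin(2 * np.pi * t / 64))   # low-frequency tone
```

Comparing the rate of a candidate frame against typical noise-frame rates gives the secondary check described above.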
Referring to FIG. 4, the noise power spectrum is estimated from all detected noise frames, and the estimated noise is then removed from each speech frame by spectral subtraction: the currently estimated noise power spectrum is subtracted from the power spectrum of the speech frame to obtain the estimated speech power spectrum; the square root of the estimate is taken, the phase is recovered from the phase spectrum of the speech frame, and an inverse short-time Fourier transform yields the time-domain estimate of the speech frame, i.e., the enhanced single-channel speech signal.
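One frame of this spectral-subtraction step can be sketched as follows. The spectral floor `floor`, used to keep the subtracted power non-negative, is a common safeguard assumed here rather than something specified in the text.

```python
import numpy as np

def subtract_noise(frame, noise_power, floor=1e-3):
    """Spectral subtraction of one windowed time-domain frame: power spectrum
    minus estimated noise power, square root, phase recovery, inverse STFT."""
    spec = np.fft.rfft(frame)
    power = np.abs(spec) ** 2
    phase = np.angle(spec)                         # phase kept for recovery
    clean_power = np.maximum(power - noise_power, floor * power)
    clean_mag = np.sqrt(clean_power)               # magnitude = sqrt of power estimate
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(frame))

frame = np.random.randn(256)
out = subtract_noise(frame, noise_power=np.zeros(129))  # zero noise: identity
```

With a zero noise estimate the frame is reconstructed exactly, which is a convenient sanity check of the sqrt/phase/inverse-transform chain.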
Once the above is complete, blind source separation is carried out with natural-gradient ICA. The specific procedure is:
(1) If the mean of the current observed signal X (the set of single-channel enhanced speech sequences) is not zero, its mean is first subtracted from X;
(2) A matrix B is chosen such that the covariance matrix E{VVᵀ} equals the identity matrix I, where V = BX; the components of the vector V are then uncorrelated and have unit variance;
(3) Whitening based on singular value decomposition: first the covariance Rx = E{XXᵀ} of X is estimated; Rx is a real Hermitian matrix. Then the singular value decomposition Rx = UΣUᵀ is computed, where the columns of U = [u1, u2, …, un] are the left singular vectors of Rx. The purpose of whitening is to weaken the correlation between the mixed speech signals;
(4) Since σ1 ≥ σ2 ≥ … ≥ σm > 0 and σm+1 = … = σn = 0 (m ≤ n), the number of source signals is estimated to be m;
(5) Finally, an orthogonal transform is performed with U = [u1, u2, …, un] and Um = [u1, u2, …, um], giving V = BX.
The recovered signal Y is obtained from Y = WX = WAS = PS, where P = WA may be called either the performance matrix or the convergence matrix, W is the separation matrix of independent component analysis (ICA), A is the ICA mixing matrix, and S is the source signal.
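Steps (1) to (5) plus the separation itself can be sketched as follows. This is an illustrative, square, noise-free sketch: the nonlinearity f(y) = tanh(y), the step size, the iteration count, and the assumption of as many channels as sources are all example choices, not values fixed by the invention.

```python
import numpy as np

def natural_gradient_ica(X, n_iter=500, mu=0.1):
    """Blind source separation sketch: zero-mean, SVD whitening, then the
    natural-gradient ICA update W <- W + mu*(I - f(Y)Y^T/T)W with f = tanh."""
    X = X - X.mean(axis=1, keepdims=True)    # step (1): subtract the mean
    Rx = np.cov(X)                           # R_x = E{X X^T}
    U, s, _ = np.linalg.svd(Rx)              # step (3): SVD of the covariance
    B = np.diag(1.0 / np.sqrt(s)) @ U.T      # whitening matrix, step (2)
    V = B @ X                                # whitened: E{V V^T} = I
    m, T = V.shape
    W = np.eye(m)                            # separation matrix
    for _ in range(n_iter):
        Y = W @ V
        G = np.eye(m) - np.tanh(Y) @ Y.T / T  # natural-gradient direction
        W = W + mu * G @ W
    return W @ V                             # recovered Y (up to scale and order)

np.random.seed(0)
S = np.random.laplace(size=(2, 5000))        # two super-Gaussian sources
A = np.array([[1.0, 0.6], [0.5, 1.0]])       # mixing matrix
Y = natural_gradient_ica(A @ S)
```

Because ICA leaves scale and ordering ambiguous, a recovered component is matched to a source by the largest absolute correlation rather than by index.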
The above are only specific embodiments of the present invention. Any feature disclosed in this specification may, unless otherwise stated, be replaced by an alternative feature that is equivalent or serves a similar purpose; all of the disclosed features, or all of the steps of any method or process, may be combined in any way, except for mutually exclusive features and/or steps.
Claims (3)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510967234 | 2015-12-22 | ||
CN2015109672348 | 2015-12-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106504763A true CN106504763A (en) | 2017-03-15 |
Family
ID=58333455
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611191478.2A Pending CN106504763A (en) | 2015-12-22 | 2016-12-21 | Multi-target Speech Enhancement Method Based on Microphone Array Based on Blind Source Separation and Spectral Subtraction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106504763A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102750956A (en) * | 2012-06-18 | 2012-10-24 | 歌尔声学股份有限公司 | Method and device for removing reverberation of single channel voice |
CN202749088U (en) * | 2012-08-08 | 2013-02-20 | 滨州学院 | Voice reinforcing system using blind source separation algorithm |
CN103854660A (en) * | 2014-02-24 | 2014-06-11 | 中国电子科技集团公司第二十八研究所 | Four-microphone voice enhancement method based on independent component analysis |
US20150078571A1 (en) * | 2013-09-17 | 2015-03-19 | Lukasz Kurylo | Adaptive phase difference based noise reduction for automatic speech recognition (asr) |
CN104935546A (en) * | 2015-06-18 | 2015-09-23 | 河海大学 | A Blind Separation Method of MIMO-OFDM Signals to Improve the Convergence Speed of Natural Gradient Algorithm |
2016-12-21 CN CN201611191478.2A patent/CN106504763A/en active Pending
Non-Patent Citations (4)
Title |
---|
LI Yunhua: "Single-channel speech signal enhancement based on blind source separation", Computer Simulation *
YANG Zhen et al.: "A real-time speech recognition simulation system based on the SB card", Journal of Nanjing Institute of Posts and Telecommunications *
ZHI Zhenhua: "Research on blind speech separation algorithms", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology *
CHEN Weiguo: "Theory and application of real-time speech signal processing systems", China Doctoral Dissertations Full-text Database (Electronic Journal), Information Science and Technology *
Cited By (70)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107102296A (en) * | 2017-04-27 | 2017-08-29 | 大连理工大学 | A Sound Source Localization System Based on Distributed Microphone Array |
CN107102296B (en) * | 2017-04-27 | 2020-04-14 | 大连理工大学 | A sound source localization system based on distributed microphone array |
CN107293305A (en) * | 2017-06-21 | 2017-10-24 | 惠州Tcl移动通信有限公司 | It is a kind of to improve the method and its device of recording quality based on blind source separation algorithm |
US11308974B2 (en) | 2017-10-23 | 2022-04-19 | Iflytek Co., Ltd. | Target voice detection method and apparatus |
CN107785029A (en) * | 2017-10-23 | 2018-03-09 | 科大讯飞股份有限公司 | Target voice detection method and device |
CN107785029B (en) * | 2017-10-23 | 2021-01-29 | 科大讯飞股份有限公司 | Target voice detection method and device |
US11869481B2 (en) | 2017-11-30 | 2024-01-09 | Alibaba Group Holding Limited | Speech signal recognition method and device |
CN109859749A (en) * | 2017-11-30 | 2019-06-07 | 阿里巴巴集团控股有限公司 | A kind of voice signal recognition methods and device |
US11482237B2 (en) | 2017-12-01 | 2022-10-25 | Tencent Technology (Shenzhen) Company Limited | Method and terminal for reconstructing speech signal, and computer storage medium |
CN109887494A (en) * | 2017-12-01 | 2019-06-14 | 腾讯科技(深圳)有限公司 | The method and apparatus of reconstructed speech signal |
WO2019105238A1 (en) * | 2017-12-01 | 2019-06-06 | 腾讯科技(深圳)有限公司 | Method and terminal for speech signal reconstruction and computer storage medium |
CN108831500A (en) * | 2018-05-29 | 2018-11-16 | 平安科技(深圳)有限公司 | Sound enhancement method, device, computer equipment and storage medium |
CN108899052B (en) * | 2018-07-10 | 2020-12-01 | 南京邮电大学 | A Parkinson's Speech Enhancement Method Based on Multiband Spectral Subtraction |
CN108899052A (en) * | 2018-07-10 | 2018-11-27 | 南京邮电大学 | A kind of Parkinson's sound enhancement method based on mostly with spectrum-subtraction |
CN109671439A (en) * | 2018-12-19 | 2019-04-23 | 成都大学 | A kind of intelligence fruit-bearing forest bird pest prevention and treatment equipment and its birds localization method |
CN109671439B (en) * | 2018-12-19 | 2024-01-19 | 成都大学 | Intelligent fruit forest bird pest control equipment and bird positioning method thereof |
CN109884591A (en) * | 2019-02-25 | 2019-06-14 | 南京理工大学 | A sound signal enhancement method for multi-rotor UAV based on microphone array |
US11664042B2 (en) | 2019-03-06 | 2023-05-30 | Plantronics, Inc. | Voice signal enhancement for head-worn audio devices |
US11049509B2 (en) | 2019-03-06 | 2021-06-29 | Plantronics, Inc. | Voice signal enhancement for head-worn audio devices |
CN110060704A (en) * | 2019-03-26 | 2019-07-26 | 天津大学 | A kind of sound enhancement method of improved multiple target criterion study |
CN110111806A (en) * | 2019-03-26 | 2019-08-09 | 广东工业大学 | A kind of blind separating method of moving source signal aliasing |
CN110491410A (en) * | 2019-04-12 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Speech separating method, audio recognition method and relevant device |
CN110223708A (en) * | 2019-05-07 | 2019-09-10 | 平安科技(深圳)有限公司 | Sound enhancement method and relevant device based on speech processes |
CN110223708B (en) * | 2019-05-07 | 2023-05-30 | 平安科技(深圳)有限公司 | Speech enhancement method based on speech processing and related equipment |
CN111986692A (en) * | 2019-05-24 | 2020-11-24 | 腾讯科技(深圳)有限公司 | Sound source tracking and pickup method and device based on microphone array |
CN112289335A (en) * | 2019-07-24 | 2021-01-29 | 阿里巴巴集团控股有限公司 | Voice signal processing method and device and pickup equipment |
CN112289335B (en) * | 2019-07-24 | 2024-11-12 | 阿里巴巴集团控股有限公司 | Voice signal processing method, device and sound pickup device |
CN110459236B (en) * | 2019-08-15 | 2021-11-30 | 北京小米移动软件有限公司 | Noise estimation method, apparatus and storage medium for audio signal |
CN110459234A (en) * | 2019-08-15 | 2019-11-15 | 苏州思必驰信息科技有限公司 | For vehicle-mounted audio recognition method and system |
CN110459236A (en) * | 2019-08-15 | 2019-11-15 | 北京小米移动软件有限公司 | Noise estimation method, device and the storage medium of audio signal |
CN110459234B (en) * | 2019-08-15 | 2022-03-22 | 思必驰科技股份有限公司 | Vehicle-mounted voice recognition method and system |
CN111128217A (en) * | 2019-12-31 | 2020-05-08 | 杭州爱莱达科技有限公司 | Distributed multi-channel voice coherent laser radar interception method and device |
CN111239680A (en) * | 2020-01-19 | 2020-06-05 | 西北工业大学太仓长三角研究院 | A DOA Estimation Method Based on Differential Array |
CN111239680B (en) * | 2020-01-19 | 2022-09-16 | 西北工业大学太仓长三角研究院 | A DOA Estimation Method Based on Differential Array |
CN113314137A (en) * | 2020-02-27 | 2021-08-27 | 东北大学秦皇岛分校 | Mixed signal separation method based on dynamic evolution particle swarm shielding EMD |
CN113314137B (en) * | 2020-02-27 | 2022-07-26 | 东北大学秦皇岛分校 | Mixed signal separation method based on dynamic evolution particle swarm shielding EMD |
CN111402917A (en) * | 2020-03-13 | 2020-07-10 | 北京松果电子有限公司 | Audio signal processing method and device and storage medium |
CN111627456A (en) * | 2020-05-13 | 2020-09-04 | 广州国音智能科技有限公司 | Noise elimination method, device, equipment and readable storage medium |
CN113763982A (en) * | 2020-06-05 | 2021-12-07 | 阿里巴巴集团控股有限公司 | Audio processing method and device, electronic equipment and readable storage medium |
CN112309414A (en) * | 2020-07-21 | 2021-02-02 | 东莞市逸音电子科技有限公司 | Active noise reduction method based on audio coding and decoding, earphone and electronic equipment |
CN112309414B (en) * | 2020-07-21 | 2024-01-12 | 东莞市逸音电子科技有限公司 | Active noise reduction method based on audio encoding and decoding, earphone and electronic equipment |
CN112151036A (en) * | 2020-09-16 | 2020-12-29 | 科大讯飞(苏州)科技有限公司 | Anti-sound-crosstalk method, device and equipment based on multi-pickup scene |
CN112151036B (en) * | 2020-09-16 | 2021-07-30 | 科大讯飞(苏州)科技有限公司 | Anti-sound-crosstalk method, device and equipment based on multi-pickup scene |
CN112735464A (en) * | 2020-12-21 | 2021-04-30 | 招商局重庆交通科研设计院有限公司 | Tunnel emergency broadcast sound effect information detection method |
CN112666522B (en) * | 2020-12-24 | 2024-10-22 | 北京地平线信息技术有限公司 | Awakening word sound source positioning method and device |
CN113030862A (en) * | 2021-03-12 | 2021-06-25 | 中国科学院声学研究所 | Multi-channel speech enhancement method and device |
CN113077808B (en) * | 2021-03-22 | 2024-04-26 | 北京搜狗科技发展有限公司 | Voice processing method and device for voice processing |
WO2022198820A1 (en) * | 2021-03-22 | 2022-09-29 | 北京搜狗科技发展有限公司 | Speech processing method and apparatus, and apparatus for speech processing |
CN113077808A (en) * | 2021-03-22 | 2021-07-06 | 北京搜狗科技发展有限公司 | Voice processing method and device for voice processing |
CN113160845A (en) * | 2021-03-29 | 2021-07-23 | 南京理工大学 | Speech enhancement algorithm based on speech existence probability and auditory masking effect |
CN113329288A (en) * | 2021-04-29 | 2021-08-31 | 开放智能技术(南京)有限公司 | Bluetooth headset noise reduction method based on notch technology |
CN113053406A (en) * | 2021-05-08 | 2021-06-29 | 北京小米移动软件有限公司 | Sound signal identification method and device |
CN113314135A (en) * | 2021-05-25 | 2021-08-27 | 北京小米移动软件有限公司 | Sound signal identification method and device |
CN113314135B (en) * | 2021-05-25 | 2024-04-26 | 北京小米移动软件有限公司 | Voice signal identification method and device |
CN113362847A (en) * | 2021-05-26 | 2021-09-07 | 北京小米移动软件有限公司 | Audio signal processing method and device and storage medium |
CN114171052B (en) * | 2021-11-30 | 2025-01-28 | 深圳云知声信息技术有限公司 | A method, device, electronic device and storage medium for separating two-person voices |
CN114171052A (en) * | 2021-11-30 | 2022-03-11 | 深圳云知声信息技术有限公司 | Double voice separation method and device, electronic equipment and storage medium |
CN114639398B (en) * | 2022-03-10 | 2023-05-26 | 电子科技大学 | A Wideband DOA Estimation Method Based on Microphone Array |
CN114639398A (en) * | 2022-03-10 | 2022-06-17 | 电子科技大学 | Broadband DOA estimation method based on microphone array |
CN114822572A (en) * | 2022-04-18 | 2022-07-29 | 西北工业大学 | A filter bank-based speech enhancement method under low signal-to-noise ratio |
CN114974279A (en) * | 2022-05-10 | 2022-08-30 | 中移(杭州)信息技术有限公司 | Sound quality control method, device, equipment and storage medium |
CN114974279B (en) * | 2022-05-10 | 2024-10-25 | 中移(杭州)信息技术有限公司 | Sound quality control method, device, equipment and storage medium |
CN117409799A (en) * | 2023-09-25 | 2024-01-16 | 深圳市极客空间科技有限公司 | Audio signal processing system and method |
CN117409799B (en) * | 2023-09-25 | 2024-07-09 | 杭州来疯科技有限公司 | Audio signal processing system and method |
CN117238278B (en) * | 2023-11-14 | 2024-02-09 | 三一智造(深圳)有限公司 | Speech recognition error correction method and system based on artificial intelligence |
CN117238278A (en) * | 2023-11-14 | 2023-12-15 | 三一智造(深圳)有限公司 | Speech recognition error correction method and system based on artificial intelligence |
CN118553261A (en) * | 2024-07-25 | 2024-08-27 | 深圳市计通智能技术有限公司 | Directional sound source noise reduction method and medium of head-mounted AR equipment |
CN118553261B (en) * | 2024-07-25 | 2024-10-22 | 深圳市计通智能技术有限公司 | Directional sound source noise reduction method and medium of head-mounted AR equipment |
CN119252277A (en) * | 2024-12-05 | 2025-01-03 | 电子科技大学 | A method and device for processing audio signals based on machine learning algorithm catboost |
CN119252277B (en) * | 2024-12-05 | 2025-02-25 | 电子科技大学 | Audio signal processing method and device based on machine learning algorithm catboost |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106504763A (en) | Multi-target Speech Enhancement Method Based on Microphone Array Based on Blind Source Separation and Spectral Subtraction | |
JP6074263B2 (en) | Noise suppression device and control method thereof | |
WO2015196729A1 (en) | Microphone array speech enhancement method and device | |
CN107479030B (en) | Frequency division and improved generalized cross-correlation based binaural time delay estimation method | |
US8654990B2 (en) | Multiple microphone based directional sound filter | |
CN106226739A (en) | Merge the double sound source localization method of Substrip analysis | |
CN102157156B (en) | Single-channel voice enhancement method and system | |
CN101778322B (en) | Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic | |
CN107316648A (en) | A kind of sound enhancement method based on coloured noise | |
CN105225672B (en) | Merge the system and method for the dual microphone orientation noise suppression of fundamental frequency information | |
CN102411138A (en) | A method for robot sound source localization | |
CN108198568B (en) | Method and system for localizing multiple sound sources | |
CN102456351A (en) | Voice enhancement system | |
WO2015196760A1 (en) | Microphone array speech detection method and device | |
JP6225245B2 (en) | Signal processing apparatus, method and program | |
Velasco et al. | Novel GCC-PHAT model in diffuse sound field for microphone array pairwise distance based calibration | |
CN110310650A (en) | A Speech Enhancement Algorithm Based on Second-Order Differential Microphone Array | |
Hosseini et al. | Time difference of arrival estimation of sound source using cross correlation and modified maximum likelihood weighting function | |
Zhang et al. | A speech separation algorithm based on the comb-filter effect | |
Bavkar et al. | PCA based single channel speech enhancement method for highly noisy environment | |
Zhu et al. | Modified complementary joint sparse representations: a novel post-filtering to MVDR beamforming | |
CN111009259A (en) | Audio processing method and device | |
Firoozabadi et al. | Combination of nested microphone array and subband processing for multiple simultaneous speaker localization | |
JP2017181761A (en) | Signal processing device and program, and gain processing device and program | |
Wang | Speech enhancement using fiber acoustic sensor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170315 |