CN114373475A

CN114373475A - Method, device and storage medium for speech noise reduction based on microphone array

Info

Publication number: CN114373475A
Application number: CN202111621218.5A
Authority: CN
Inventors: 王向辉; 高朴; 韩冬; 陈捷; 王瑞琪; 王姣; 李梅
Original assignee: Shaanxi University of Science and Technology
Current assignee: Shaanxi University of Science and Technology
Priority date: 2021-12-28
Filing date: 2021-12-28
Publication date: 2022-04-19

Abstract

The present application discloses a speech noise reduction method based on a microphone array. It addresses two problems in the prior art: the complexity of solving for the filter grows rapidly as the filter length increases, and the ability to track changes in the statistical characteristics of the speech signal and the noise degrades. The method includes: acquiring a noisy speech signal; preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal; estimating the statistical characteristics of the frequency-domain noisy speech signal and of the noise signal; dividing the microphone array into multiple sub-arrays, estimating multiple sub-filters respectively, and determining a frequency-domain noise reduction filter; and denoising the frequency-domain noisy speech signal with the frequency-domain noise reduction filter and converting the result into a time-domain noise-reduced speech signal. Because the signal covariance matrices required when solving for the filter have smaller dimensions, the complexity of solving for the speech noise reduction filter is significantly reduced, and the filter's ability to track changes in the statistical characteristics of the speech signal and the noise is improved.

Description

Method, device and storage medium for speech noise reduction based on microphone array

Technical Field

The present application relates to the technical field of microphone arrays, and in particular to a speech noise reduction method, device, and storage medium based on a microphone array.

Background

Speech noise reduction plays a pivotal role in systems such as intelligent voice, human-computer interaction, teleconferencing, hearing aids, in-vehicle systems, virtual reality, immersive communication, and military voice communication under ultra-high background noise. Its performance directly affects the experience of voice interaction.

Early voice interaction systems were usually equipped with only one microphone, and the corresponding noise reduction method is single-channel speech noise reduction. Single-channel methods are simple to implement and computationally efficient and can achieve a certain effect, but they also have significant limitations. Studies have shown that, under certain conditions, single-channel noise reduction necessarily introduces speech distortion, and the greater the improvement in signal-to-noise ratio, the greater the distortion introduced. In contrast, multi-channel speech noise reduction methods have more potential to significantly improve the signal-to-noise ratio while introducing little or no speech distortion. Classical multi-channel methods include the multi-channel Wiener filter, the multi-channel tradeoff filter, the minimum variance distortionless response (MVDR) filter, the linearly constrained minimum variance (LCMV) filter, and the generalized sidelobe canceller. In recent years, researchers at home and abroad have proposed speech noise reduction methods based on deep learning, which can achieve good performance. However, because their generalization ability is usually weak, they are currently difficult to apply widely in practical systems.

To achieve better speech noise reduction performance, it is usually necessary to use more microphones so as to obtain richer space-time-frequency information. This usually also means designing longer filters, and longer filters bring two problems. First, the complexity of solving for the filter increases rapidly with the filter length. Second, the signal covariance matrix required when solving for the filter has a larger dimension, so more observation samples are needed to estimate it in order to compute the filter coefficients. This reduces the ability to track changes in the statistical characteristics of the speech signal and the noise, so the non-stationary noise that is common in practice cannot be handled well.

Summary of the Invention

By providing a speech noise reduction method based on a microphone array, the embodiments of the present application solve two problems that arise in the prior art when the filter is long. First, the complexity of solving for the filter increases rapidly with the filter length. Second, the signal covariance matrix required when solving for the filter has a larger dimension, so more observation samples are needed to estimate it in order to compute the filter coefficients; this reduces the ability to track changes in the statistical characteristics of the speech signal and the noise, so the non-stationary noise that is common in practice cannot be handled well. The embodiments of the present application significantly reduce the complexity of solving for the filter. In addition, the signal covariance matrices required when solving for the filter have smaller dimensions and can therefore be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

In a first aspect, an embodiment of the present invention provides a speech noise reduction method based on a microphone array, the method comprising:

acquiring a noisy speech signal;

preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal;

estimating the statistical characteristics of the frequency-domain noisy speech signal, and estimating the statistical characteristics of the noise signal;

dividing the microphone array into multiple sub-arrays, and estimating multiple sub-filters respectively;

determining a frequency-domain noise reduction filter according to the multiple sub-filters;

performing noise reduction on the frequency-domain noisy speech signal according to the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal; and

converting the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.

With reference to the first aspect, in a possible implementation, preprocessing the noisy speech signal includes: framing and windowing the noisy speech signal and then applying the fast Fourier transform.

With reference to the first aspect, in a possible implementation, estimating the statistical characteristics of the frequency-domain noisy speech signal includes estimating them by temporal smoothing.

With reference to the first aspect, in a possible implementation, estimating the statistical characteristics of the noise signal includes estimating them with an existing noise estimation algorithm.

With reference to the first aspect, in a possible implementation, dividing the microphone array into multiple sub-arrays and estimating multiple sub-filters respectively includes iteratively estimating the multiple sub-filters by exploiting the low-rank structure of the noise reduction filter.

In a second aspect, an embodiment of the present invention provides a speech noise reduction device based on a microphone array, comprising:

a signal acquisition module, configured to acquire a noisy speech signal;

a signal preprocessing module, configured to preprocess the noisy speech signal to determine a frequency-domain noisy speech signal;

a statistical characteristic estimation module, configured to estimate the statistical characteristics of the frequency-domain noisy speech signal and the statistical characteristics of the noise signal;

a sub-filter determination module, configured to divide the microphone array into multiple sub-arrays and estimate multiple sub-filters respectively;

a frequency-domain noise reduction filter determination module, configured to determine a frequency-domain noise reduction filter according to the multiple sub-filters;

a noise reduction module, configured to perform noise reduction on the frequency-domain noisy speech signal according to the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal; and

a time-domain noise-reduced speech signal determination module, configured to convert the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.

With reference to the second aspect, in a possible implementation, the signal preprocessing module is configured to frame and window the noisy speech signal and then apply the fast Fourier transform.

With reference to the second aspect, in a possible implementation, the statistical characteristic estimation module is configured to estimate the statistical characteristics of the noisy speech signal by temporal smoothing.

With reference to the second aspect, in a possible implementation, the statistical characteristic estimation module is configured to estimate the statistical characteristics of the noise signal with an existing noise estimation algorithm.

With reference to the second aspect, in a possible implementation, the frequency-domain noise reduction filter determination module is configured to iteratively estimate the multiple sub-filters by exploiting the low-rank structure of the noise reduction filter.

In a third aspect, an embodiment of the present invention provides a speech noise reduction server based on a microphone array, including a memory and a processor;

the memory is configured to store computer-executable instructions;

the processor is configured to execute the computer-executable instructions to implement the method of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing executable instructions; when a computer executes the executable instructions, the method according to any implementation of the first aspect can be carried out.

One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:

The embodiment of the present invention adopts a speech noise reduction method based on a microphone array. The method includes: acquiring a noisy speech signal; preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal; estimating the statistical characteristics of the frequency-domain noisy speech signal and of the noise signal; dividing the microphone array into multiple sub-arrays and estimating multiple sub-filters respectively; determining a frequency-domain noise reduction filter according to the multiple sub-filters; performing noise reduction on the frequency-domain noisy speech signal according to the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal; and converting the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal. This effectively solves the two problems that arise in the prior art when the filter is long. First, the complexity of solving for the filter increases rapidly with the filter length. Second, the signal covariance matrix required when solving for the filter has a larger dimension, so more observation samples are needed to estimate it in order to compute the filter coefficients; this reduces the ability to track changes in the statistical characteristics of the speech signal and the noise, so the non-stationary noise that is common in practice cannot be handled well. The embodiment of the present invention significantly reduces the complexity of solving for the filter. In addition, the signal covariance matrices required when solving for the filter have smaller dimensions and can therefore be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

Description of Drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of the steps of the microphone-array-based speech noise reduction method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the microphone-array-based speech noise reduction device provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of the microphone-array-based speech noise reduction server provided by an embodiment of the present application;

FIG. 4 compares the complexity of the method provided by an embodiment of the present application with that of the conventional method;

FIG. 5 shows the mean square error of the method provided by an embodiment of the present application as a function of the number of iterations;

FIG. 6 compares the mean square error over time of the method provided by an embodiment of the present application and of the conventional method when the statistical characteristics of the noise change suddenly.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

Early voice interaction systems were usually equipped with only one microphone, and the corresponding speech noise reduction method is single-channel speech noise reduction. Single-channel methods are simple to implement and computationally efficient and can achieve a certain effect, but they also have significant limitations. Studies have shown that, under certain conditions, single-channel noise reduction necessarily introduces speech distortion, and the greater the improvement in signal-to-noise ratio, the greater the distortion introduced. In contrast, multi-channel speech noise reduction methods have more potential to significantly improve the signal-to-noise ratio while introducing little or no speech distortion. Multi-channel speech noise reduction usually requires more microphones in order to obtain richer space-time-frequency information, which leads to two problems. First, the complexity of solving for the filter increases rapidly with the filter length. Second, the signal covariance matrix required when solving for the filter has a larger dimension, so more observation samples are needed to estimate it in order to compute the filter coefficients; this reduces the ability to track changes in the statistics of the speech signal and the noise, so the non-stationary noise that is common in practice cannot be handled well.

An embodiment of the present invention provides a speech noise reduction method based on a microphone array. As shown in FIG. 1, the method includes the following steps:

Step S101: acquire a noisy speech signal.

Step S102: preprocess the noisy speech signal to determine a frequency-domain noisy speech signal.

Step S103: estimate the statistical characteristics of the frequency-domain noisy speech signal, and estimate the statistical characteristics of the noise signal.

Step S104: divide the microphone array into multiple sub-arrays, and estimate multiple sub-filters respectively.

Step S105: determine a frequency-domain noise reduction filter according to the multiple sub-filters.

Step S106: perform noise reduction on the frequency-domain noisy speech signal according to the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal.

Step S107: convert the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.

Taken together, the above steps construct a more reasonable filter and avoid computing one very long filter as a whole, as existing multi-channel speech noise reduction methods do; shorter filters mean fewer filter coefficients. Compared with existing methods, the method provided by the present application therefore significantly reduces the complexity of solving for the speech noise reduction filter. Moreover, the signal covariance matrices required when solving for the filter have smaller dimensions, so they can be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

In a specific embodiment of the present application, the time-domain noisy speech signal is expressed as

y_m(t) = x_m(t) + v_m(t), m = 1, 2, ..., M (1)

where y_m(t) denotes the noisy speech signal received by the m-th microphone; x_m(t) denotes the clean speech signal received by the m-th microphone; v_m(t) denotes the background noise signal received by the m-th microphone; t denotes the discrete time index; and M denotes the number of microphones.

In a specific embodiment of the present application, all signals are assumed to be zero-mean broadband signals, and the speech signal and the noise signal are assumed to be uncorrelated. The purpose of speech noise reduction is to recover the clean speech signal from the noisy speech signal. Without loss of generality, microphone 1 is taken as the reference microphone in the present application, that is, x₁(t) is the desired signal (the signal to be recovered).

Preprocessing the noisy speech signal includes framing and windowing the noisy speech signal and then applying the fast Fourier transform to obtain the frequency-domain noisy speech signal, expressed as:

where w denotes the window function; T denotes the length of the window function (which is also the length of a speech frame); L denotes the step between two adjacent frames; and the zero-mean random variables Y_m(k,n), X_m(k,n), and V_m(k,n) are the Fourier transform values of y_m(t), x_m(t), and v_m(t), respectively, in the k-th frequency band of the n-th frame, with k ∈ {0, 1, ..., K-1}.
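As a minimal sketch of this preprocessing step, and of the inverse transform used later to return to the time domain, the following Python/NumPy code frames a signal with a window w of length T and step L and applies the FFT to each frame. The periodic Hann window, the values T = 256 and L = 128, the choice K = T, and the function names `stft` and `istft` are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def stft(y, T=256, L=128):
    """Frame, window, and FFT a 1-D signal. Returns an array of shape (K, N), K = T."""
    # Periodic Hann window (an assumed choice); with L = T/2 shifted copies sum to 1
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(T) / T)
    n_frames = 1 + (len(y) - T) // L
    frames = np.stack([y[n * L : n * L + T] * w for n in range(n_frames)], axis=1)
    return np.fft.fft(frames, axis=0)

def istft(Y, L=128):
    """Inverse FFT of every frame followed by overlap-add."""
    T, n_frames = Y.shape
    out = np.zeros((n_frames - 1) * L + T)
    frames = np.fft.ifft(Y, axis=0).real
    for n in range(n_frames):
        out[n * L : n * L + T] += frames[:, n]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)          # stand-in for clean speech at one microphone
v = 0.1 * rng.standard_normal(1024)    # stand-in for background noise
Y, X, V = stft(x + v), stft(x), stft(v)
```

Because the transform is linear, Y equals X + V in every bin and frame, which is the frequency-domain counterpart of equation (1). With this window and step, samples covered by two frames are recovered exactly by `istft`; the first and last L samples are covered by only one frame and are attenuated by the window.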

For convenience, the signal model is written in vector form as

y(k,n) = x(k,n) + v(k,n) (3)

where

y(k,n) = [Y₁(k,n), Y₂(k,n), ..., Y_M(k,n)]^T (4)

x(k,n) and v(k,n) are defined analogously to y(k,n), and the superscript T denotes transposition.

In conventional methods, a filter h(k,n) of length M is usually designed to perform speech noise reduction, namely:

Z(k,n) = h^H(k,n)y(k,n) (5)

where

h(k,n) = [H₁(k,n), H₂(k,n), ..., H_M(k,n)]^T (6)

and Z(k,n) is the estimate of X₁(k,n). When M is large, however, this leads to the two problems described in the background.
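Equation (5) can be illustrated with a short NumPy sketch. The array layout (microphones, bins, frames), the function name `apply_filter`, and the simplification that the filter is held fixed over frames are assumptions; the filter used in the example simply passes the reference microphone through and is not a noise reduction filter.

```python
import numpy as np

def apply_filter(h, Y):
    """Z(k, n) = h^H(k) y(k, n) for every bin k and frame n.

    h: (K, M) complex filter, held fixed over frames here for simplicity.
    Y: (M, K, N) multichannel frequency-domain signal.
    """
    return np.einsum("km,mkn->kn", h.conj(), Y)

M, K, N = 4, 8, 5
rng = np.random.default_rng(0)
Y = rng.standard_normal((M, K, N)) + 1j * rng.standard_normal((M, K, N))

h_ref = np.zeros((K, M), dtype=complex)
h_ref[:, 0] = 1.0                      # selects microphone 1, the reference
Z = apply_filter(h_ref, Y)

h_rand = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
Z_rand = apply_filter(h_rand, Y)
```

With `h_ref`, the output Z equals the reference channel Y₁(k,n) unchanged. For a general filter, each entry of the output is the conjugated inner product of the length-M filter with the length-M observation vector.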

Estimating the statistical characteristics of the frequency-domain noisy speech signal includes estimating them by temporal smoothing. Estimating the statistical characteristics of the noise signal includes estimating them with an existing noise estimation algorithm.

Since the speech signal and the noise are uncorrelated, the variance of Z(k,n) can be expressed as:

Φ_Z(k,n) = h^H(k,n)Φ_y(k,n)h(k,n)

= h^H(k,n)Φ_x(k,n)h(k,n) + h^H(k,n)Φ_v(k,n)h(k,n) (7)

where Φ_a(k,n) = E[a(k,n)a^H(k,n)], a(k,n) ∈ {y(k,n), x(k,n), v(k,n)}. In general, Φ_y(k,n) can be estimated by temporal smoothing, while Φ_v(k,n) can be obtained with a noise estimation algorithm from the existing literature. Once estimates of Φ_y(k,n) and Φ_v(k,n) are available, Φ_x(k,n) can be obtained as Φ_y(k,n) - Φ_v(k,n).
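One common form of temporal smoothing is a first-order recursive average, sketched below for a single frequency bin. The recursion, the forgetting factor `alpha = 0.9`, and the function name `smoothed_covariance` are assumptions, since the source does not state which smoothing rule is used. The noise covariance is computed here from noise-only frames purely as a stand-in for a real noise estimation algorithm.

```python
import numpy as np

def smoothed_covariance(frames, alpha=0.9):
    """Recursive (exponentially weighted) estimate of E[a a^H] in one frequency bin.

    frames: (N, M) array, row n is the M-channel vector a(k, n).
    alpha:  forgetting factor in (0, 1); the value is an assumption.
    """
    M = frames.shape[1]
    phi = np.zeros((M, M), dtype=complex)
    for a in frames:
        phi = alpha * phi + (1 - alpha) * np.outer(a, a.conj())
    return phi

rng = np.random.default_rng(0)
M, N = 4, 2000
# Synthetic data: one source seen identically at all microphones, plus white noise
x = (rng.standard_normal((N, 1)) + 1j * rng.standard_normal((N, 1))) * np.ones((1, M))
v = 0.3 * (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M)))

phi_y = smoothed_covariance(x + v)
phi_v = smoothed_covariance(v)     # oracle noise frames stand in for a noise tracker
phi_x = phi_y - phi_v              # estimate of the speech covariance
```

A smaller `alpha` tracks changes faster but gives a noisier estimate, which is the trade-off behind the tracking problem discussed in the background. Because both terms are estimates, the difference `phi_x` is Hermitian but is not guaranteed to be positive semidefinite.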

To derive the method of the present invention, the microphone array is divided into M₂ sub-arrays with M₁ microphones each, so that M = M₁M₂. Microphones 1 to M₁ form the first sub-array, microphones M₁+1 to 2M₁ form the second sub-array, and so on. In the present invention, it is assumed that M₁ ≤ M₂. The filter h(k,n) can be partitioned in the same way, namely

h(k,n) = [h₁^T(k,n), h₂^T(k,n), ..., h_{M₂}^T(k,n)]^T (8)

where

h_m(k,n) = [H_{(m-1)M₁+1}(k,n), H_{(m-1)M₁+2}(k,n), ..., H_{mM₁}(k,n)]^T, m = 1, 2, ..., M₂ (9)

The sub-filters h_m(k,n), m = 1, 2, ..., M₂, can then be arranged into a matrix of dimension M₁×M₂:

H(k,n) = [h₁(k,n), h₂(k,n), ..., h_{M₂}(k,n)] (10)

Note that h(k,n) = vec[H(k,n)], where vec(·) denotes the vectorization operator. For brevity, the indices k and n are omitted below wherever no ambiguity arises. Applying the singular value decomposition (SVD) to H gives:

H = H₁ΣH₂^H (11)

where H₁ is an M₁×M₁ matrix and H₂ is an M₂×M₂ matrix. H₁ and H₂ are two orthogonal matrices, and Σ is an M₁×M₂ diagonal matrix whose diagonal elements are non-negative real numbers, arranged in this application in descending order. The superscript H denotes the conjugate transpose.
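The reshaping h = vec(H) and the decomposition of H can be checked numerically with the sketch below. The sizes M₁ = 2 and M₂ = 4 and the random filter are assumptions for illustration. Column-major order is used so that column m of H is the m-th sub-filter, and `U`, `Sigma`, and `Vh` play the roles of H₁, Σ, and H₂^H.

```python
import numpy as np

M1, M2 = 2, 4
rng = np.random.default_rng(1)
h = rng.standard_normal(M1 * M2) + 1j * rng.standard_normal(M1 * M2)

# Column-major reshape: column m of H holds the m-th length-M1 sub-filter
H = h.reshape(M1, M2, order="F")
h_back = H.flatten(order="F")              # vec(H)

# Full SVD: U is M1 x M1, Vh is M2 x M2, s holds the M1 singular values
U, s, Vh = np.linalg.svd(H)
Sigma = np.zeros((M1, M2))
Sigma[:M1, :M1] = np.diag(s)
H_rebuilt = U @ Sigma @ Vh
```

NumPy returns the singular values in descending order, matching the ordering stated above. For a complex-valued H the two outer factors are unitary, the complex counterpart of orthogonal matrices.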

The noisy speech signals received by the individual channels are strongly correlated, so the sub-filters h_m(k,n), m = 1, 2, ..., M₂, are usually also strongly correlated, and the matrix H is therefore usually not of full row rank. As a result, H can usually be well approximated by its P largest singular values and the corresponding singular vectors, namely:

where

It should be noted that the ambiguity this introduces has no effect on the matrix H. Correspondingly, the filter h can be approximated as:

It should be noted that when P = M₁, h_P = h.
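The effect of keeping only the P largest singular values can be sketched as follows. The code relies on the standard identity vec(u v^H) = conj(v) ⊗ u, which is an assumption about how the truncated sum is vectorized; the function name `low_rank_filter` and the synthetic, nearly rank-one filter are also hypothetical.

```python
import numpy as np

def low_rank_filter(h, M1, M2, P):
    """Rank-P approximation h_P of h = vec(H), with H of size M1 x M2."""
    H = h.reshape(M1, M2, order="F")
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    # vec(u v^H) = conj(v) kron u, and row p of Vh already holds conj(v_p)
    return sum(s[p] * np.kron(Vh[p], U[:, p]) for p in range(P))

M1, M2 = 3, 5
rng = np.random.default_rng(1)
a = rng.standard_normal(M1) + 1j * rng.standard_normal(M1)
b = rng.standard_normal(M2) + 1j * rng.standard_normal(M2)
perturbation = 0.01 * (rng.standard_normal((M1, M2)) + 1j * rng.standard_normal((M1, M2)))
H = np.outer(a, b) + perturbation          # strongly correlated columns
h = H.flatten(order="F")

errors = [np.linalg.norm(h - low_rank_filter(h, M1, M2, P)) for P in range(1, M1 + 1)]
```

The approximation error cannot increase with P and vanishes at P = M₁, in line with the remark that h_P = h in that case. When the columns of H are strongly correlated, as in this synthetic example, P = 1 already leaves only a small residual.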

Applying the relation:

h_P can be written as:

where H_σ1,p has size M×M₂ and H_σ2,p has size M×M₁, for p = 1, 2, ..., P. The filter output Z(k,n) can then be written as:

where

H_σ1,P = [H_σ1,1 H_σ1,2 ... H_σ1,P]^H (24)

H_σ2,P = [H_σ2,1 H_σ2,2 ... H_σ2,P]^H (25)

The sizes of h_σ1,P, h_σ2,P, y_σ1,P(t), y_σ2,P(t), H_σ1,P, and H_σ2,P are M₁P×1, M₂P×1, M₂P×1, M₁P×1, M₂P×M, and M₁P×M, respectively. It can be seen that when the parameter P is small, the lengths of the sub-filters h_σ1,P and h_σ2,P are much smaller than the length of the filter h.
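The sizes stated above are consistent with the standard Kronecker product identities h₂ ⊗ h₁ = (I_{M₂} ⊗ h₁)h₂ = (h₂ ⊗ I_{M₁})h₁. The sketch below checks these identities for a single term (P = 1) and shows how the filter output can be computed from either short sub-filter. That the source uses exactly this relation is an assumption, and the names `A1` and `A2` are hypothetical stand-ins for the M×M₂ and M×M₁ matrices.

```python
import numpy as np

M1, M2 = 2, 3
rng = np.random.default_rng(2)
h1 = rng.standard_normal(M1) + 1j * rng.standard_normal(M1)   # length-M1 sub-filter
h2 = rng.standard_normal(M2) + 1j * rng.standard_normal(M2)   # length-M2 sub-filter
y = rng.standard_normal(M1 * M2) + 1j * rng.standard_normal(M1 * M2)

A1 = np.kron(np.eye(M2), h1[:, None])    # I_M2 kron h1, size M x M2
A2 = np.kron(h2[:, None], np.eye(M1))    # h2 kron I_M1, size M x M1
h = np.kron(h2, h1)                      # full-length filter, size M

Z = np.vdot(h, y)                        # h^H y with the long filter
Z_via_h2 = np.vdot(h2, A1.conj().T @ y)  # h2^H applied to a length-M2 vector
Z_via_h1 = np.vdot(h1, A2.conj().T @ y)  # h1^H applied to a length-M1 vector
```

All three outputs agree, so the same estimate can be viewed as a short filter acting on a transformed, lower-dimensional observation. For P > 1 the transformed observations are stacked, which gives the lengths M₂P and M₁P listed above.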

The mean square error (MSE) between the desired signal X₁ and its estimate Z is:

where E(·) denotes mathematical expectation, Re(·) denotes taking the real part, and the superscript * denotes complex conjugation.

To derive the filter of the present invention, the MSE is written in the following forms:

where

It should be noted that when the parameter P is small, the dimensions of the matrices Φ_yσ1,p (M₂P×M₂P) and Φ_yσ2,p (M₁P×M₁P) are much smaller than the dimension of the matrix Φ_y (M×M).

This brings two advantages:

1) Compared with solving for the conventional multi-channel speech noise reduction filter, which requires the inverse of Φ_y, solving for the sub-filters h_σ1,P and h_σ2,P, which requires the inverses of Φ_yσ1,p and Φ_yσ2,p, has significantly lower complexity;

2) Compared with the matrix Φ_y, the matrices Φ_yσ1,p and Φ_yσ2,p can be estimated from fewer signal observation samples, so the sub-filters h_σ1,P and h_σ2,P can track changes in the signal statistics more quickly.
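The size reduction can be made concrete with a small calculation. The configuration M₁ = M₂ = 8 and P = 2 is a made-up example, and the cubic cost model for matrix inversion is a rough assumption that ignores the cost of forming the reduced matrices and the number of iterations.

```python
def covariance_dims(M1, M2, P):
    """Side lengths of the matrices that must be estimated and inverted."""
    return {"full": M1 * M2, "phi_y_sigma1": M2 * P, "phi_y_sigma2": M1 * P}

def cubic_cost(n):
    """Rough operation count for inverting an n x n matrix (assumed ~ n^3)."""
    return n ** 3

dims = covariance_dims(M1=8, M2=8, P=2)
full_cost = cubic_cost(dims["full"])
per_iteration = cubic_cost(dims["phi_y_sigma1"]) + cubic_cost(dims["phi_y_sigma2"])
ratio = full_cost / per_iteration
print(dims, full_cost, per_iteration, ratio)
```

In this example the conventional filter needs a 64×64 matrix, while each iteration of the sub-filter approach needs two 16×16 matrices, so under the cubic model one inversion of the full matrix costs as much as 32 iterations.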

对近似滤波器进行运算，包括：采用迭代求解的方式，得到维纳滤波器。Calculating the approximate filter, including: adopting an iterative solution method to obtain a Wiener filter.

基于式(27)和(28)，很难导出子滤波器h _σ1，P和h _σ2，P的闭式解。所以，本发明中采用迭代求解的方式。为此，在求解其中一个子滤波器时，假设另一个子滤波器固定，即Based on equations (27) and (28), it is difficult to derive closed-form solutions for the subfilters h _σ1,P and h _σ2,P . Therefore, an iterative solution method is adopted in the present invention. For this reason, when solving one of the subfilters, the other subfilter is assumed to be fixed, i.e.

将子滤波器h _σ1，P按如下方式初始化：Initialize the subfilters h _{σ1, P} as follows:

其中，in,

x_p的定义与y_p类似。可以看出，h_σ1,W,p为第p个子矩阵的维也纳滤波器，长为M₁。The definition of x _p is similar to that of y _p . It can be seen that h _σ1,W,p is the Vienna filter of the p-th sub-matrix, and its length is M ₁ .

应用

构建

并将其带入式(29)和(30)，可得application

Construct

And bring it into equations (29) and (30), we can get

将式(38)和(39)带入至式(34)中可得：Substituting equations (38) and (39) into equation (34), we get:

Differentiating equation (40) with respect to […] and setting the result to zero gives the Wiener solution of the sub-filter […]:

Applying […] to construct […] and substituting it into equations (31) and (32) gives:

Substituting […] and […] into equation (33) gives:

Based on (44), the Wiener solution of the sub-filter […] is obtained:

Proceeding in this way, at the n-th iteration we have:

where

At this point, the iterative Wiener filter of the present application is obtained:
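The alternating procedure described above (fix one sub-filter, solve a small Wiener problem for the other, then swap) can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's exact formulas: it assumes the full filter has the Kronecker-sum form h = Σ_p h2_p ⊗ h1_p with M = M₁M₂, takes the speech component at the first microphone as the desired signal, starts from a random h1 in place of the per-sub-array Wiener initialization, and uses a least-squares solve for the small systems. All function and variable names are hypothetical.

```python
import numpy as np


def build_H1(h1: np.ndarray, M2: int) -> np.ndarray:
    """Stack I_{M2} (x) h1[p] over p into an (M1*M2) x (M2*P) matrix."""
    return np.hstack([np.kron(np.eye(M2), col[:, None]) for col in h1])


def build_H2(h2: np.ndarray, M1: int) -> np.ndarray:
    """Stack h2[p] (x) I_{M1} over p into an (M1*M2) x (M1*P) matrix."""
    return np.hstack([np.kron(col[:, None], np.eye(M1)) for col in h2])


def mse(h, Phi_y, phi_yx, phi_x) -> float:
    """E|x1 - h^H y|^2 written in terms of second-order statistics."""
    quad = np.vdot(h, Phi_y @ h)      # h^H Phi_y h
    cross = np.vdot(h, phi_yx)        # h^H phi_yx
    return float(np.real(phi_x - 2.0 * np.real(cross) + quad))


def iterative_wiener(Phi_y, phi_yx, phi_x, M1, M2, P, n_iter=10, seed=0):
    """Alternately solve for h2 (h1 fixed) and h1 (h2 fixed)."""
    rng = np.random.default_rng(seed)
    # Assumption: random start; the application uses its own initialization.
    h1 = rng.standard_normal((P, M1)) + 1j * rng.standard_normal((P, M1))
    history = []
    for _ in range(n_iter):
        # h1 fixed: an (M2*P)-dimensional Wiener problem for h2.
        H1 = build_H1(h1, M2)
        A = H1.conj().T @ Phi_y @ H1
        b = H1.conj().T @ phi_yx
        h2 = np.linalg.lstsq(A, b, rcond=None)[0].reshape(P, M2)
        # h2 fixed: an (M1*P)-dimensional Wiener problem for h1.
        H2 = build_H2(h2, M1)
        A = H2.conj().T @ Phi_y @ H2
        b = H2.conj().T @ phi_yx
        h1 = np.linalg.lstsq(A, b, rcond=None)[0].reshape(P, M1)
        h = H2 @ h1.reshape(-1)       # full-length filter, sum_p h2_p (x) h1_p
        history.append(mse(h, Phi_y, phi_yx, phi_x))
    return h, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M1, M2 = 3, 3
    M = M1 * M2
    d = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    Phi_x = np.outer(d, d.conj())                     # rank-1 speech covariance
    B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    Phi_y = Phi_x + B @ B.conj().T / M + np.eye(M)    # speech plus noise
    h, hist = iterative_wiener(Phi_y, Phi_x[:, 0], Phi_x[0, 0].real, M1, M2, P=1)
    print("MSE per iteration:", hist)
```

Each half-step inverts only an M₂P × M₂P or an M₁P × M₁P system, which matches the dimensions noted earlier, and because each half-step minimizes the same quadratic cost over one set of variables, the recorded mean square error should not increase from one iteration to the next.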

An embodiment of the present invention provides a speech noise reduction device based on a microphone array. As shown in FIG. 2, the device includes a signal acquisition module 201, a signal preprocessing module 202, a statistical characteristic estimation module 203, a sub-filter determination module 204, a frequency-domain noise reduction filter determination module 205, a noise reduction module 206, and a time-domain noise-reduced speech signal determination module 207. The signal acquisition module 201 is configured to acquire a noisy speech signal; the signal preprocessing module 202 is configured to preprocess the noisy speech signal to determine a frequency-domain noisy speech signal; the statistical characteristic estimation module 203 is configured to estimate the statistical characteristics of the frequency-domain noisy speech signal and the statistical characteristics of the noise signal; the sub-filter determination module 204 is configured to divide the microphone array into a plurality of sub-arrays and to estimate a plurality of sub-filters respectively; the frequency-domain noise reduction filter determination module 205 is configured to determine a frequency-domain noise reduction filter according to the plurality of sub-filters; the noise reduction module 206 is configured to perform noise reduction on the frequency-domain noisy speech signal according to the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal; and the time-domain noise-reduced speech signal determination module 207 is configured to convert the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.
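The signal path through modules 202, 206 and 207 (framing, windowing and FFT; per-frequency filtering; conversion back to the time domain) can be sketched as below. This is an illustration only: the square-root periodic Hann window, 50% overlap, frame length of 256 and overlap-add synthesis are assumptions chosen so that the transform pair reconstructs the signal, and the per-bin filters are taken as a given input rather than estimated as modules 203 to 205 would do. All names are hypothetical.

```python
import numpy as np


def sqrt_hann(frame_len: int) -> np.ndarray:
    """Square root of a periodic Hann window (sums to one at 50% overlap)."""
    n = np.arange(frame_len)
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len))


def stft(x: np.ndarray, win: np.ndarray) -> np.ndarray:
    """Frame with 50% overlap, window, and FFT. Returns (n_frames, K)."""
    frame_len = len(win)
    hop = frame_len // 2
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X: np.ndarray, win: np.ndarray) -> np.ndarray:
    """Inverse FFT, synthesis window, and overlap-add."""
    frame_len = len(win)
    hop = frame_len // 2
    frames = np.fft.irfft(X, n=frame_len, axis=1) * win
    out = np.zeros(hop * (X.shape[0] - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out


def denoise(y: np.ndarray, filters: np.ndarray, frame_len: int = 256) -> np.ndarray:
    """Apply one length-M filter per frequency bin to an (M, T) signal.

    filters has shape (K, M) with K = frame_len // 2 + 1; the output in
    bin k and frame t is h(k)^H y(k, t).
    """
    win = sqrt_hann(frame_len)
    Y = np.stack([stft(ch, win) for ch in y])          # (M, n_frames, K)
    Z = np.einsum("km,mtk->tk", filters.conj(), Y)     # (n_frames, K)
    return istft(Z, win)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.standard_normal((4, 2048))                 # 4 microphones
    K = 256 // 2 + 1
    filters = np.zeros((K, 4), dtype=complex)
    filters[:, 0] = 1.0                                # pass microphone 0 through
    z = denoise(y, filters)
    print("max interior error:", np.max(np.abs(z[128:-128] - y[0, 128:-128])))
```

With the pass-through filter used in the demo, the output should match the first microphone signal everywhere except the first and last half frame, where only one window overlaps.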

FIG. 4 compares the complexity of the method provided by the present application with that of the traditional method. FIG. 5 shows how the mean square error of the method provided by the present application varies with the number of iterations. FIG. 6 shows how the mean square errors of the proposed method and the traditional method vary over time when the statistical characteristics of the noise change suddenly. In other words, the method provided by the present application effectively reduces complexity and improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

An embodiment of the present invention provides a speech noise reduction server based on a microphone array. As shown in FIG. 3, the server includes a memory 301 and a processor 302; the memory 301 is configured to store computer-executable instructions, and the processor 302 is configured to execute the computer-executable instructions.

An embodiment of the present invention provides a computer-readable storage medium storing executable instructions which, when executed by a computer, are capable of implementing the above speech noise reduction method.

The above storage medium includes, but is not limited to, random access memory (RAM), read-only memory (ROM), cache, hard disk drive (HDD), or memory card. The memory may be used to store computer program instructions.

Although the present application provides the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included on the basis of routine or non-inventive work. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only execution order. When an actual device or client product executes the method, the steps may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the method shown in the embodiments or the accompanying drawings.

The devices or modules described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. For convenience of description, the above device is described in terms of separate functional modules. When implementing the present application, the functions of the modules may be implemented in the same piece or in multiple pieces of software and/or hardware. A module that implements a given function may also be implemented by a combination of multiple sub-modules or sub-units.

The methods, devices, or modules described in the present application may be implemented by a controller in the form of computer-readable program code, and the controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for realizing various functions may be regarded as structures within the hardware component. The means for realizing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.

Some of the modules in the device described in the present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, classes, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

From the description of the above embodiments, those skilled in the art can clearly understand that the present application may be implemented by means of software plus the necessary hardware. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product, or may be embodied in the implementation process of data migration. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a mobile terminal, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present application.

The embodiments in this specification are described in a progressive manner; for parts that are the same or similar across embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others. All or part of the present application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.

The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the present application.

Claims

1. A speech noise reduction method based on a microphone array, characterized by comprising:

acquiring a voice signal with noise;

preprocessing the voice signal with noise to determine a frequency domain voice signal with noise;

estimating the statistical characteristic of the frequency domain voice signal with noise, and estimating the statistical characteristic of the noise signal;

dividing a microphone array into a plurality of sub-arrays, and respectively estimating a plurality of sub-filters;

determining a frequency domain noise reduction filter according to the plurality of sub-filters;

carrying out noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter to determine a frequency domain noise reduction voice signal;

and converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.

2. The method of claim 1, wherein the preprocessing of the voice signal with noise comprises: performing framing and windowing on the voice signal with noise, and then performing a fast Fourier transform.

3. The method according to claim 1, wherein said estimating the statistical properties of the frequency-domain noisy speech signal comprises estimating the statistical properties of the noisy speech signal according to a time-smoothed estimation.

4. The method of claim 1, wherein estimating the statistical properties of the noise signal comprises estimating the statistical properties of the noise signal according to an existing noise estimation algorithm.

5. The method of claim 1, wherein the dividing the microphone array into a plurality of sub-arrays and estimating a plurality of sub-filters separately comprises iteratively estimating the plurality of sub-filters using a low rank architecture of a noise reduction filter.

6. A speech noise reduction device based on a microphone array, characterized by comprising:

the signal acquisition module is used for acquiring a voice signal with noise;

the signal preprocessing module is used for preprocessing the voice signal with the noise and determining a frequency domain voice signal with the noise;

the statistical characteristic estimation module is used for estimating the statistical characteristic of the frequency domain voice signal with noise and estimating the statistical characteristic of the noise signal;

the sub-filter determining module is used for dividing the microphone array into a plurality of sub-arrays and respectively estimating a plurality of sub-filters;

a frequency domain noise reduction filter determining module, configured to determine a frequency domain noise reduction filter according to the plurality of sub-filters;

the noise reduction module is used for carrying out noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter and determining a frequency domain noise reduction voice signal;

and the time domain noise reduction voice signal determination module is used for converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.

7. A microphone array based speech noise reduction server comprising a memory and a processor;

the memory is configured to store computer-executable instructions;

the processor is configured to execute the computer-executable instructions to implement the method of any of claims 1-5.

8. A computer-readable storage medium having stored thereon executable instructions that, when executed by a computer, are capable of implementing the method of any one of claims 1-5.