
CN114373475A - Method, device and storage medium for speech noise reduction based on microphone array - Google Patents


Info

Publication number
CN114373475A
Authority
CN
China
Prior art keywords
noise reduction
noise
signal
voice signal
filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111621218.5A
Other languages
Chinese (zh)
Inventor
王向辉
高朴
韩冬
陈捷
王瑞琪
王姣
李梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi University of Science and Technology
Original Assignee
Shaanxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi University of Science and Technology
Priority to CN202111621218.5A
Publication of CN114373475A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present application discloses a speech noise reduction method based on a microphone array. It addresses two problems in the prior art: the complexity of solving for the filter grows rapidly as the filter length increases, and the ability to track changes in the statistical characteristics of the speech signal and the noise degrades. The method includes: acquiring a noisy speech signal; preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal; estimating the statistical characteristics of the frequency-domain noisy speech signal and of the noise signal; dividing the microphone array into multiple sub-arrays, estimating multiple sub-filters respectively, and determining a frequency-domain noise reduction filter; and performing noise reduction on the frequency-domain noisy speech signal with the frequency-domain noise reduction filter and converting the result into a time-domain noise-reduced speech signal. The signal covariance matrices needed to solve for the filter therefore have smaller dimensions, which significantly reduces the complexity of solving for the speech noise reduction filter and improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

Figure 202111621218

Description

Method, device and storage medium for speech noise reduction based on microphone array

Technical Field

The present application relates to the technical field of microphone arrays, and in particular to a method, device, and storage medium for speech noise reduction based on a microphone array.

Background

Speech noise reduction plays a pivotal role in systems such as intelligent voice, human-computer interaction, teleconferencing, hearing aids, in-vehicle systems, virtual reality, immersive communication, and military voice communication under ultra-high background noise. Its performance directly affects the quality of the voice interaction experience.

Early voice interaction systems were usually equipped with only one microphone, and the corresponding approach is single-channel speech noise reduction. Single-channel methods are simple to implement and computationally efficient, and they achieve a certain effect, but they also have significant limitations. Studies have shown that, under certain conditions, single-channel noise reduction inevitably introduces speech distortion, and the larger the improvement in signal-to-noise ratio, the larger the distortion. In contrast, multi-channel speech noise reduction methods have more potential to improve the signal-to-noise ratio significantly while introducing little or no speech distortion. Classical multi-channel methods include the multi-channel Wiener filter, the multi-channel tradeoff filter, the minimum variance distortionless response (MVDR) filter, the linearly constrained minimum variance (LCMV) filter, and the generalized sidelobe canceller (GSC). In recent years, researchers in China and abroad have proposed speech noise reduction methods based on deep learning, which can achieve good performance; however, because their generalization ability is usually weak, they are still difficult to apply widely in practical systems.

To achieve better noise reduction performance, more microphones are usually needed in order to capture richer spatial, temporal, and spectral information. This usually also means designing a longer filter, and a longer filter brings two problems. First, the complexity of solving for the filter grows rapidly as the filter length increases. Second, the signal covariance matrix needed to solve for the filter has a larger dimension, so more observation samples are required to estimate it and compute the filter coefficients. This degrades the ability to track changes in the statistical characteristics of the speech signal and the noise, and makes it harder to handle the non-stationary noise that is common in practice.

Summary of the Invention

By providing a speech noise reduction method based on a microphone array, the embodiments of the present application solve two problems that arise in the prior art when the filter is long. First, the complexity of solving for the filter grows rapidly as the filter length increases. Second, the signal covariance matrix needed to solve for the filter has a larger dimension, so more observation samples are required to estimate it and compute the filter coefficients; this degrades the ability to track changes in the statistical characteristics of the speech signal and the noise, and makes it harder to handle the non-stationary noise that is common in practice. The embodiments of the present application significantly reduce the complexity of solving for the filter, and the signal covariance matrices needed in the solution have smaller dimensions, so they can be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

In a first aspect, an embodiment of the present invention provides a speech noise reduction method based on a microphone array, the method comprising:

acquiring a noisy speech signal;

preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal;

estimating the statistical characteristics of the frequency-domain noisy speech signal, and estimating the statistical characteristics of the noise signal;

dividing the microphone array into multiple sub-arrays, and estimating multiple sub-filters respectively;

determining a frequency-domain noise reduction filter from the multiple sub-filters;

performing noise reduction on the frequency-domain noisy speech signal with the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal;

converting the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.

With reference to the first aspect, in a possible implementation, preprocessing the noisy speech signal includes: framing and windowing the noisy speech signal and then applying a fast Fourier transform.

With reference to the first aspect, in a possible implementation, estimating the statistical characteristics of the frequency-domain noisy speech signal includes estimating them by time smoothing.

With reference to the first aspect, in a possible implementation, estimating the statistical characteristics of the noise signal includes estimating them with an existing noise estimation algorithm.

With reference to the first aspect, in a possible implementation, dividing the microphone array into multiple sub-arrays and estimating multiple sub-filters respectively includes iteratively estimating the multiple sub-filters by exploiting the low-rank structure of the noise reduction filter.

In a second aspect, an embodiment of the present invention provides a speech noise reduction device based on a microphone array, comprising:

a signal acquisition module, configured to acquire a noisy speech signal;

a signal preprocessing module, configured to preprocess the noisy speech signal to determine a frequency-domain noisy speech signal;

a statistical characteristic estimation module, configured to estimate the statistical characteristics of the frequency-domain noisy speech signal and the statistical characteristics of the noise signal;

a sub-filter determination module, configured to divide the microphone array into multiple sub-arrays and estimate multiple sub-filters respectively;

a frequency-domain noise reduction filter determination module, configured to determine a frequency-domain noise reduction filter from the multiple sub-filters;

a noise reduction module, configured to perform noise reduction on the frequency-domain noisy speech signal with the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal;

a time-domain noise-reduced speech signal determination module, configured to convert the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.

With reference to the second aspect, in a possible implementation, the signal preprocessing module is configured to frame and window the noisy speech signal and then apply a fast Fourier transform.

With reference to the second aspect, in a possible implementation, the statistical characteristic estimation module is configured to estimate the statistical characteristics of the noisy speech signal by time smoothing.

With reference to the second aspect, in a possible implementation, the statistical characteristic estimation module is configured to estimate the statistical characteristics of the noise signal with an existing noise estimation algorithm.

With reference to the second aspect, in a possible implementation, the frequency-domain noise reduction filter determination module is configured to iteratively estimate the multiple sub-filters by exploiting the low-rank structure of the noise reduction filter.

In a third aspect, an embodiment of the present invention provides a speech noise reduction server based on a microphone array, including a memory and a processor;

the memory is configured to store computer-executable instructions;

the processor is configured to execute the computer-executable instructions to implement the method of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing executable instructions; when a computer executes the executable instructions, the method of any implementation of the first aspect can be carried out.

The one or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:

The embodiments of the present invention adopt a speech noise reduction method based on a microphone array. The method includes: acquiring a noisy speech signal; preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal; estimating the statistical characteristics of the frequency-domain noisy speech signal and of the noise signal; dividing the microphone array into multiple sub-arrays and estimating multiple sub-filters respectively; determining a frequency-domain noise reduction filter from the multiple sub-filters; performing noise reduction on the frequency-domain noisy speech signal with the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal; and converting the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal. This effectively solves the two problems that arise in the prior art when the filter is long. First, the complexity of solving for the filter grows rapidly as the filter length increases. Second, the signal covariance matrix needed to solve for the filter has a larger dimension, so more observation samples are required to estimate it and compute the filter coefficients, which degrades the ability to track changes in the statistical characteristics of the speech signal and the noise and makes it harder to handle the non-stationary noise that is common in practice. The embodiments of the present invention significantly reduce the complexity of solving for the filter, and the signal covariance matrices needed in the solution have smaller dimensions, so they can be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

Brief Description of the Drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of the steps of the microphone-array-based speech noise reduction method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the microphone-array-based speech noise reduction device provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of the microphone-array-based speech noise reduction server provided by an embodiment of the present application;

FIG. 4 compares the complexity of the method provided by an embodiment of the present application with that of the conventional method;

FIG. 5 shows the mean square error of the method provided by an embodiment of the present application as a function of the number of iterations;

FIG. 6 compares the mean square error over time of the method provided by an embodiment of the present application and of the conventional method when the statistical characteristics of the noise change abruptly.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

Early voice interaction systems were usually equipped with only one microphone, and the corresponding approach is single-channel speech noise reduction. Single-channel methods are simple to implement and computationally efficient, and they achieve a certain effect, but they also have significant limitations. Studies have shown that, under certain conditions, single-channel noise reduction inevitably introduces speech distortion, and the larger the improvement in signal-to-noise ratio, the larger the distortion. In contrast, multi-channel methods have more potential to improve the signal-to-noise ratio significantly while introducing little or no speech distortion. Multi-channel speech noise reduction usually requires more microphones in order to capture richer spatial, temporal, and spectral information, which in turn causes two problems. First, the complexity of solving for the filter grows rapidly as the filter length increases. Second, the signal covariance matrix needed to solve for the filter has a larger dimension, so more observation samples are required to estimate it and compute the filter coefficients; this degrades the ability to track changes in the statistics of the speech signal and the noise, and makes it harder to handle the non-stationary noise that is common in practice.

An embodiment of the present invention provides a speech noise reduction method based on a microphone array. As shown in FIG. 1, the method includes the following steps:

Step S101: acquire a noisy speech signal.

Step S102: preprocess the noisy speech signal to determine a frequency-domain noisy speech signal.

Step S103: estimate the statistical characteristics of the frequency-domain noisy speech signal and the statistical characteristics of the noise signal.

Step S104: divide the microphone array into multiple sub-arrays and estimate multiple sub-filters respectively.

Step S105: determine a frequency-domain noise reduction filter from the multiple sub-filters.

Step S106: perform noise reduction on the frequency-domain noisy speech signal with the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal.

Step S107: convert the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.

Taken together, these steps construct the filter from several short sub-filters instead of computing one long filter as a whole, as existing multi-channel speech noise reduction methods do; a shorter filter means fewer filter coefficients. Compared with existing methods, the method provided by the present application therefore significantly reduces the complexity of solving for the speech noise reduction filter. The signal covariance matrices needed in the solution also have smaller dimensions, so they can be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

In a specific embodiment of the present application, the time-domain noisy speech signal is expressed as

$$y_m(t) = x_m(t) + v_m(t), \quad m = 1, 2, \ldots, M \tag{1}$$

where $y_m(t)$ is the noisy speech signal received by the $m$-th microphone, $x_m(t)$ is the clean speech signal received by the $m$-th microphone, $v_m(t)$ is the background noise signal received by the $m$-th microphone, $t$ is the discrete time index, and $M$ is the number of microphones.

In a specific embodiment of the present application, all signals are assumed to be zero-mean, broadband signals, and the speech signal and the noise signal are assumed to be uncorrelated. The goal of speech noise reduction is to recover the clean speech signal from the noisy speech signal. Without loss of generality, microphone 1 is taken as the reference microphone in this application, i.e. $x_1(t)$ is the desired signal (the signal to be recovered).

Preprocessing the noisy speech signal includes framing and windowing the signal and then applying a fast Fourier transform, which yields the frequency-domain noisy speech signal, expressed as:

[Equation (2): image BDA0003437550130000071, not reproduced in this text]

where $w$ is the window function; $T$ is the length of the window function (also the length of a speech frame); $L$ is the step between two adjacent frames; and the zero-mean random variables $Y_m(k,n)$, $X_m(k,n)$, $V_m(k,n)$ are the Fourier transform values of $y_m(t)$, $x_m(t)$, $v_m(t)$, respectively, in the $k$-th frequency band of the $n$-th frame, with $k \in \{0, 1, \ldots, K-1\}$.
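The framing, windowing, and FFT described above can be sketched as follows. This is a minimal illustration, not the patent's exact equation (2), whose image is not reproduced here: it assumes a Hann window, $K = T$ frequency bins, and that frame $n$ starts at sample $nL$. The function name `stft_multichannel` and the default values of `T` and `L` are hypothetical.

```python
import numpy as np

def stft_multichannel(y, T=256, L=128):
    """Frame, window, and FFT each microphone signal.

    y : real array of shape (M, num_samples), one row per microphone.
    Returns Y of shape (M, K, N) with K = T bins and N frames, where
    Y[m, k, n] is the k-th FFT bin of the n-th windowed frame of y[m].
    """
    y = np.asarray(y, dtype=float)
    M, num_samples = y.shape
    if num_samples < T:
        raise ValueError("signal shorter than one frame")
    w = np.hanning(T)                      # assumed window function w
    N = 1 + (num_samples - T) // L         # number of complete frames
    Y = np.empty((M, T, N), dtype=complex)
    for n in range(N):
        frame = y[:, n * L:n * L + T] * w  # frame n starts at sample n*L
        Y[:, :, n] = np.fft.fft(frame, axis=1)
    return Y
```

Because framing, windowing, and the FFT are all linear, the transform of $y_m = x_m + v_m$ is the sum of the transforms of $x_m$ and $v_m$, which is why the additive signal model carries over to each time-frequency bin.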

For convenience, the signal model is written in vector form as

$$\mathbf{y}(k,n) = \mathbf{x}(k,n) + \mathbf{v}(k,n) \tag{3}$$

where

$$\mathbf{y}(k,n) = [Y_1(k,n), Y_2(k,n), \ldots, Y_M(k,n)]^T \tag{4}$$

$\mathbf{x}(k,n)$ and $\mathbf{v}(k,n)$ are defined analogously to $\mathbf{y}(k,n)$, and the superscript $T$ denotes the transpose.

In conventional methods, a filter $\mathbf{h}(k,n)$ of length $M$ usually has to be designed to perform speech noise reduction, namely:

$$Z(k,n) = \mathbf{h}^H(k,n)\,\mathbf{y}(k,n) \tag{5}$$

where

$$\mathbf{h}(k,n) = [H_1(k,n), H_2(k,n), \ldots, H_M(k,n)]^T \tag{6}$$

and $Z(k,n)$ is the estimate of $X_1(k,n)$. When $M$ is large, however, this leads to the two problems described in the Background.
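As a small illustration of equation (5), the snippet below applies a filter to one time-frequency bin. The values are made up, and the pass-through filter that simply selects microphone 1 is a hypothetical choice used only to make the result easy to check; it is not the filter designed in this application.

```python
import numpy as np

def apply_filter(h, y):
    """Return Z = h^H y for one time-frequency bin.

    h, y : complex arrays of shape (M,).
    np.vdot conjugates its first argument, so it computes h^H y directly.
    """
    return np.vdot(h, y)

M = 4
h = np.zeros(M, dtype=complex)
h[0] = 1.0  # hypothetical pass-through filter that selects microphone 1
y = np.array([1 + 2j, 3 - 1j, 0.5j, -2.0])
Z = apply_filter(h, y)  # equals y[0] for this particular h
```

In practice one filter of length $M$ is needed for every frequency bin $k$ (and, if the statistics change, every frame $n$), which is where the cost of a large $M$ accumulates.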

Estimating the statistical characteristics of the frequency-domain noisy speech signal includes estimating them by time smoothing. Estimating the statistical characteristics of the noise signal includes estimating them with an existing noise estimation algorithm.

Since the speech signal and the noise are uncorrelated, the variance of $Z(k,n)$ can be expressed as:

$$\begin{aligned}
\Phi_Z(k,n) &= \mathbf{h}^H(k,n)\,\mathbf{\Phi}_{\mathbf{y}}(k,n)\,\mathbf{h}(k,n) \\
&= \mathbf{h}^H(k,n)\,\mathbf{\Phi}_{\mathbf{x}}(k,n)\,\mathbf{h}(k,n) + \mathbf{h}^H(k,n)\,\mathbf{\Phi}_{\mathbf{v}}(k,n)\,\mathbf{h}(k,n)
\end{aligned} \tag{7}$$

where $\mathbf{\Phi}_{\mathbf{a}}(k,n) = E[\mathbf{a}(k,n)\,\mathbf{a}^H(k,n)]$ with $\mathbf{a}(k,n) \in \{\mathbf{y}(k,n), \mathbf{x}(k,n), \mathbf{v}(k,n)\}$. In general, $\mathbf{\Phi}_{\mathbf{y}}(k,n)$ can be estimated by time smoothing, and $\mathbf{\Phi}_{\mathbf{v}}(k,n)$ can be obtained with a noise estimation algorithm from the existing literature. Once estimates of $\mathbf{\Phi}_{\mathbf{y}}(k,n)$ and $\mathbf{\Phi}_{\mathbf{v}}(k,n)$ are available, $\mathbf{\Phi}_{\mathbf{x}}(k,n)$ is obtained as $\mathbf{\Phi}_{\mathbf{y}}(k,n) - \mathbf{\Phi}_{\mathbf{v}}(k,n)$.
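The text does not spell out the time-smoothing rule. One common choice, assumed here purely for illustration, is a first-order recursive average with a forgetting factor `lam`. The sketch below shows that rule for a single frequency bin, together with the subtraction that yields the speech covariance and the quadratic form of equation (7). All function names and the value `lam=0.9` are hypothetical, and the noise covariance is taken as given rather than estimated.

```python
import numpy as np

def smooth_covariance(Y_bin, lam=0.9):
    """Recursively estimate Phi_y for one frequency bin.

    Y_bin : complex array of shape (M, N), the M-channel observations
            y(k, n) of a single bin k over N frames.
    lam   : assumed forgetting factor in (0, 1); larger means slower tracking.
    Returns an array of shape (N, M, M) holding the estimate after each frame.
    """
    M, N = Y_bin.shape
    phi = np.zeros((M, M), dtype=complex)
    out = np.empty((N, M, M), dtype=complex)
    for n in range(N):
        y = Y_bin[:, n]
        phi = lam * phi + (1.0 - lam) * np.outer(y, y.conj())  # y y^H
        out[n] = phi
    return out

def speech_covariance(phi_y, phi_v):
    """Phi_x = Phi_y - Phi_v, valid when speech and noise are uncorrelated."""
    return phi_y - phi_v

def output_variance(h, phi):
    """h^H Phi h, as in equation (7); real for Hermitian Phi."""
    return np.real(np.vdot(h, phi @ h))
```

With this kind of estimator, an $M \times M$ matrix needs on the order of $M$ or more frames before the estimate is well conditioned, which illustrates why a larger covariance matrix tracks changing statistics more slowly. Starting the recursion from zero also biases the first few estimates toward zero.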

为导出本发明中的方法,将麦克风阵列分为M2个子阵,每个子阵中有M1个麦克风,即M=M1*M2,第1至M1个麦克风组成第一个子阵,第M1+1至2M1个麦克风组成第二个子阵,以此类推。在本发明中,我们假设M1≤M2。同样, 可以将滤波器h(k,n)按上述方式分解,即In order to derive the method in the present invention, the microphone array is divided into M 2 sub-arrays, and there are M 1 microphones in each sub-array, that is, M=M 1 *M 2 , and the first to M 1 microphones form the first sub-array , the M 1 +1 to 2M 1 microphones form the second sub-array, and so on. In the present invention, we assume that M 1 ≤ M 2 . Similarly, the filter h(k,n) can be decomposed as above, namely

Figure BDA0003437550130000081
Figure BDA0003437550130000081

where

Figure BDA0003437550130000082

The sub-filters $h_m(k,n)$, $m = 1, 2, \ldots, M_2$, can then be arranged into a matrix of dimension $M_1 \times M_2$:

$$H(k,n) = [h_1(k,n), h_2(k,n), \ldots, h_{M_2}(k,n)] \qquad (10)$$
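The partition into sub-filters and the stacking into an $M_1 \times M_2$ matrix can be sketched as follows. The helper names are hypothetical, and the sketch assumes the sub-filters are consecutive length-$M_1$ blocks of $h$ and that vectorization is column-major, which is the usual convention for vec.

```python
def split_into_subfilters(h, m1, m2):
    """Cut a length m1*m2 filter into m2 consecutive blocks of length m1."""
    assert len(h) == m1 * m2
    return [h[m * m1:(m + 1) * m1] for m in range(m2)]

def stack_columns(subfilters):
    """Build the m1 x m2 matrix whose m-th column is subfilters[m]."""
    m1 = len(subfilters[0])
    return [[col[i] for col in subfilters] for i in range(m1)]

def vec(matrix):
    """Column-major vectorization of a nested-list matrix."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(cols) for i in range(rows)]

M1, M2 = 2, 3
h = [complex(i, -i) for i in range(1, M1 * M2 + 1)]  # arbitrary toy filter
H = stack_columns(split_into_subfilters(h, M1, M2))
```

With this layout, vectorizing `H` column by column returns the original filter `h`.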

Note that $h(k,n) = \mathrm{vec}[H(k,n)]$, where $\mathrm{vec}(\cdot)$ denotes the matrix vectorization operator. For brevity, the indices $k$ and $n$ are dropped below wherever no ambiguity arises. Applying the singular value decomposition (SVD) to the matrix $H$ gives:

Figure BDA0003437550130000083

where

Figure BDA0003437550130000084

is an $M_1 \times M_1$ matrix and

Figure BDA0003437550130000091

is an $M_2 \times M_2$ matrix. $H_1$ and $H_2$ are unitary matrices, and $\Sigma$ is an $M_1 \times M_2$ diagonal matrix whose diagonal elements are non-negative real numbers. In this application they are arranged in descending order, i.e.

Figure BDA0003437550130000092

The superscript $H$ denotes the conjugate transpose.

The noisy speech signals received on the different channels are strongly correlated, so the sub-filters $h_m(k,n)$, $m = 1, 2, \ldots, M_2$, are usually strongly correlated as well, and the matrix $H$ usually does not have full row rank. $H$ can therefore usually be well approximated by its $P$ largest singular values and the corresponding singular vectors, namely:

Figure BDA0003437550130000093

where

Figure BDA0003437550130000094

Note that the ambiguity introduced by

Figure BDA0003437550130000095

has no effect on the matrix $H$. Correspondingly, the filter $h$ can be approximated as:

Figure BDA0003437550130000096

Note that $h_P = h$ when $P = M_1$.
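The link between a low-rank $H$ and a short sum of Kronecker products can be checked numerically. The sketch below assumes the approximation has the form of a sum of Kronecker products of length-$M_2$ and length-$M_1$ vectors, which is an inference from $h = \mathrm{vec}(H)$ and the stated sizes, since the equation images are not reproduced here. It avoids an SVD routine and instead tests rank one by checking that every 2 x 2 minor vanishes; all vectors are arbitrary toy values.

```python
def kron(a, b):
    """Kronecker product of two vectors: [a0*b, a1*b, ...]."""
    return [ai * bj for ai in a for bj in b]

def unvec(h, m1, m2):
    """m1 x m2 matrix H such that column-major vec(H) equals h."""
    return [[h[j * m1 + i] for j in range(m2)] for i in range(m1)]

def max_abs_2x2_minor(H):
    """Largest |2 x 2 minor|; it is zero exactly when rank(H) <= 1."""
    rows, cols = len(H), len(H[0])
    best = 0.0
    for i in range(rows):
        for k in range(i + 1, rows):
            for j in range(cols):
                for l in range(j + 1, cols):
                    minor = H[i][j] * H[k][l] - H[i][l] * H[k][j]
                    best = max(best, abs(minor))
    return best

M1, M2 = 2, 3
h1, h2 = [1 + 1j, 2 - 1j], [1 + 0j, 0.5j, -2 + 0j]
g1, g2 = [1 + 0j, -1 + 0j], [0j, 1 + 0j, 1 + 0j]

# One Kronecker term gives a rank-one matrix H = h1 h2^T.
rank_one = max_abs_2x2_minor(unvec(kron(h2, h1), M1, M2))

# Two independent terms generally raise the rank to two.
two_terms = [a + b for a, b in zip(kron(h2, h1), kron(g2, g1))]
rank_two = max_abs_2x2_minor(unvec(two_terms, M1, M2))
```

Here `rank_one` is zero while `rank_two` is not, which mirrors the statement that $P$ terms capture a matrix of rank $P$.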

Applying the relation:

Figure BDA0003437550130000097

$h_P$ can be written as:

Figure BDA0003437550130000098

where

Figure BDA0003437550130000099

has size $M \times M_2$ and

Figure BDA00034375501300000910

has size $M \times M_1$. The filter output $Z(k,n)$ can then be written as:

Figure BDA0003437550130000101

where

Figure BDA0003437550130000102

Figure BDA0003437550130000103

Figure BDA0003437550130000104

Figure BDA0003437550130000105

$$H_{\sigma1,P} = [H_{\sigma1,1}\ H_{\sigma1,2}\ \ldots\ H_{\sigma1,P}]^H \qquad (24)$$

$$H_{\sigma2,P} = [H_{\sigma2,1}\ H_{\sigma2,2}\ \ldots\ H_{\sigma2,P}]^H \qquad (25)$$

The sizes of $\underline{h}_{\sigma1,P}$, $\underline{h}_{\sigma2,P}$, $y_{\sigma1,P}(t)$, $y_{\sigma2,P}(t)$, $H_{\sigma1,P}$ and $H_{\sigma2,P}$ are $M_1P \times 1$, $M_2P \times 1$, $M_2P \times 1$, $M_1P \times 1$, $M_2P \times M$ and $M_1P \times M$, respectively. When the parameter $P$ is small, the sub-filters $\underline{h}_{\sigma1,P}$ and $\underline{h}_{\sigma2,P}$ are much shorter than the filter $h$.
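The $M \times M_1$ and $M \times M_2$ matrix sizes quoted above are consistent with the standard Kronecker identities $h_2 \otimes h_1 = (h_2 \otimes I_{M_1})\,h_1 = (I_{M_2} \otimes h_1)\,h_2$. Whether the patent's relation is exactly this one (for example, whether it carries conjugates) cannot be confirmed from the extracted text, so the following is only a numerical check of the standard identity with toy vectors.

```python
def kron(A, B):
    """Kronecker product of two nested-list matrices."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def column(v):
    """Turn a flat vector into an n x 1 matrix."""
    return [[x] for x in v]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

M1, M2 = 2, 3
h1 = [1 + 1j, 2 - 1j]            # length M1
h2 = [1 + 0j, 0.5j, -2 + 0j]     # length M2

direct = [row[0] for row in kron(column(h2), column(h1))]  # h2 kron h1
A = kron(column(h2), identity(M1))   # size M x M1
B = kron(identity(M2), column(h1))   # size M x M2
via_h1 = matvec(A, h1)
via_h2 = matvec(B, h2)
```

Both `via_h1` and `via_h2` reproduce `direct`, which is what lets the long filter be expressed linearly in either short factor while the other is held fixed.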

The mean square error (MSE) between the desired signal $X_1$ and its estimate $Z$ is

Figure BDA0003437550130000106

where

Figure BDA0003437550130000107

$E(\cdot)$ denotes mathematical expectation,

Figure BDA0003437550130000108

denotes taking the real part, and the superscript $*$ denotes complex conjugation.

To derive the filter of the present invention, the MSE is rewritten in the following form:

Figure BDA0003437550130000109

where

Figure BDA0003437550130000111

Figure BDA0003437550130000112

Figure BDA0003437550130000113

Figure BDA0003437550130000114

Note that when the parameter $P$ is small, the dimensions of the matrices $\Phi_{y\sigma1,P}$ ($M_2P \times M_2P$) and $\Phi_{y\sigma2,P}$ ($M_1P \times M_1P$) are far smaller than the dimension of the matrix $\Phi_y$ ($M \times M$).

This brings two advantages:

1) Compared with the conventional multichannel speech noise reduction filter, which requires the inverse of $\Phi_y$, computing the sub-filters $\underline{h}_{\sigma1,P}$ and $\underline{h}_{\sigma2,P}$, which require only the inverses of $\Phi_{y\sigma1,P}$ and $\Phi_{y\sigma2,P}$, has significantly lower complexity;

2) The matrices $\Phi_{y\sigma1,P}$ and $\Phi_{y\sigma2,P}$ can be estimated from fewer signal observations than the matrix $\Phi_y$, so the sub-filters $\underline{h}_{\sigma1,P}$ and $\underline{h}_{\sigma2,P}$ can track changes in the signal statistics more quickly.
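The first advantage can be made concrete with a rough operation count. The sketch below assumes that inverting an $n \times n$ matrix costs on the order of $n^3$ operations and ignores constants and the cost of forming the covariance matrices; the array sizes, the value of $P$ and the iteration count are hypothetical and are not taken from the patent's figures.

```python
def inversion_cost(n):
    """Rough operation count for inverting an n x n matrix (assumed cubic)."""
    return n ** 3

def conventional_cost(m1, m2):
    """One inversion of the full M x M matrix, with M = m1 * m2."""
    return inversion_cost(m1 * m2)

def decomposed_cost(m1, m2, p, iterations):
    """Two smaller inversions (m2*p and m1*p) per iteration."""
    per_iteration = inversion_cost(m2 * p) + inversion_cost(m1 * p)
    return iterations * per_iteration

M1, M2, P, ITERATIONS = 4, 8, 2, 5
full = conventional_cost(M1, M2)
reduced = decomposed_cost(M1, M2, P, ITERATIONS)
print(full, reduced)
```

Under this crude model the saving depends on $P$ being small: with $P = M_1$ the larger sub-problem is already as big as the full one, so the decomposition no longer reduces the count.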

Operating on the approximate filter includes obtaining the Wiener filter by iterative solution.

It is difficult to derive closed-form solutions for the sub-filters $\underline{h}_{\sigma1,P}$ and $\underline{h}_{\sigma2,P}$ from equations (27) and (28), so the present invention solves for them iteratively. When solving for one sub-filter, the other sub-filter is assumed fixed, i.e.

Figure BDA0003437550130000115

Figure BDA0003437550130000116

The sub-filter $\underline{h}_{\sigma1,P}$ is initialized as follows:

Figure BDA0003437550130000117

where

Figure BDA0003437550130000118

Figure BDA0003437550130000119

Figure BDA00034375501300001110

$x_p$ is defined analogously to $y_p$. Here $h_{\sigma1,W,p}$ is the Wiener filter of the $p$-th sub-array and has length $M_1$.

Using

Figure BDA00034375501300001111

to construct

Figure BDA00034375501300001112

and substituting it into equations (29) and (30) gives

Figure BDA0003437550130000121

Figure BDA0003437550130000122

Substituting equations (38) and (39) into equation (34) gives:

Figure BDA0003437550130000123

Differentiating equation (40) with respect to

Figure BDA0003437550130000124

and setting the result to zero gives the Wiener solution for the sub-filter

Figure BDA0003437550130000125

as:

Figure BDA0003437550130000126

Using

Figure BDA0003437550130000127

to construct

Figure BDA0003437550130000128

and substituting it into equations (31) and (32) gives:

Figure BDA0003437550130000129

Figure BDA00034375501300001210

Substituting

Figure BDA00034375501300001211

and

Figure BDA00034375501300001212

into equation (33) gives:

Figure BDA00034375501300001213

Based on equation (44), the Wiener solution for the sub-filter

Figure BDA00034375501300001214

is:

Figure BDA00034375501300001215

Proceeding in this way, at the $n$-th iteration we have:

Figure BDA00034375501300001216

where

Figure BDA00034375501300001217

Figure BDA00034375501300001218

Figure BDA00034375501300001219

The iterative Wiener filter of this application is then obtained as:

Figure BDA00034375501300001220
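The alternating structure of the iteration can be sketched on a simplified problem. The example below is not the patent's algorithm: it assumes real-valued data, a single Kronecker term ($P = 1$), sample averages in place of expectations, and synthetic Gaussian signals with hypothetical channel gains. It only illustrates the idea of solving a small Wiener problem for one factor while the other is held fixed.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def kron(a, b):
    return [ai * bj for ai in a for bj in b]

def mse(h, Y, X):
    """Sample mean square error of the estimate h^T y against x."""
    return sum((x - sum(hi * yi for hi, yi in zip(h, y))) ** 2
               for y, x in zip(Y, X)) / len(X)

def wiener(Z, X):
    """Filter w minimizing the sample mean of (x - w^T z)^2."""
    n, count = len(Z[0]), len(Z)
    R = [[sum(z[i] * z[j] for z in Z) / count for j in range(n)] for i in range(n)]
    r = [sum(z[i] * x for z, x in zip(Z, X)) / count for i in range(n)]
    return solve(R, r)

def reduce_with_h2(y, h2, m1):
    """Combine the sub-arrays with weights h2, leaving a length-m1 vector."""
    return [sum(h2[m] * y[m * m1 + i] for m in range(len(h2))) for i in range(m1)]

def reduce_with_h1(y, h1, m2):
    """Filter each sub-array with h1, leaving a length-m2 vector."""
    m1 = len(h1)
    return [sum(h1[i] * y[m * m1 + i] for i in range(m1)) for m in range(m2)]

def alternate(Y, X, m1, m2, iterations=5):
    h2 = [1.0] + [0.0] * (m2 - 1)  # assumed start: first sub-array only
    history = []
    for _ in range(iterations):
        h1 = wiener([reduce_with_h2(y, h2, m1) for y in Y], X)
        h2 = wiener([reduce_with_h1(y, h1, m2) for y in Y], X)
        history.append(mse(kron(h2, h1), Y, X))
    return h1, h2, history

random.seed(0)
M1, M2 = 2, 2
gains = [1.0, 0.8, 0.6, 0.9]  # hypothetical channel gains
X, Y = [], []
for _ in range(200):
    s = random.gauss(0, 1)     # desired signal at the first microphone
    X.append(s)
    Y.append([g * s + 0.5 * random.gauss(0, 1) for g in gains])

h1, h2, history = alternate(Y, X, M1, M2)
```

Because each half-step exactly minimizes the same sample MSE over one factor, the recorded error cannot increase from one iteration to the next, and it stays at or above the error of the unconstrained length-$M$ Wiener filter computed from the same samples.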

An embodiment of the present invention provides a speech noise reduction device based on a microphone array. As shown in FIG. 2, the device includes a signal acquisition module 201, a signal preprocessing module 202, a statistical characteristic estimation module 203, a sub-filter determination module 204, a frequency-domain noise reduction filter determination module 205, a noise reduction module 206, and a time-domain noise-reduced speech signal determination module 207. The signal acquisition module 201 acquires the noisy speech signal. The signal preprocessing module 202 preprocesses the noisy speech signal to determine the frequency-domain noisy speech signal. The statistical characteristic estimation module 203 estimates the statistical characteristics of the frequency-domain noisy speech signal and of the noise signal. The sub-filter determination module 204 divides the microphone array into a plurality of sub-arrays and estimates a plurality of sub-filters respectively. The frequency-domain noise reduction filter determination module 205 determines the frequency-domain noise reduction filter from the plurality of sub-filters. The noise reduction module 206 applies the frequency-domain noise reduction filter to the frequency-domain noisy speech signal to determine the frequency-domain noise-reduced speech signal. The time-domain noise-reduced speech signal determination module 207 converts the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.

FIG. 4 compares the complexity of the method provided by this application with that of the conventional method. FIG. 5 shows how the mean square error of the method provided by this application changes with the number of iterations. FIG. 6 shows the mean square error over time of the proposed method and the conventional method when the noise statistics change abruptly. The method provided by this application thus effectively reduces complexity and improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

An embodiment of the present invention provides a server for speech noise reduction based on a microphone array. As shown in FIG. 3, the server includes a memory 301 and a processor 302. The memory 301 stores computer-executable instructions, and the processor 302 executes the computer-executable instructions.

An embodiment of the present invention provides a computer-readable storage medium storing executable instructions which, when executed by a computer, are capable of implementing the method described above.

The storage medium includes, but is not limited to, random access memory (RAM), read-only memory (ROM), cache, hard disk drive (HDD), or memory card. The memory may be used to store computer program instructions.

Although this application provides the method steps described in the embodiments or flowcharts, more or fewer steps may be included on the basis of routine or non-inventive work. The order of steps listed in this embodiment is only one of many possible execution orders and does not represent the only one. When an actual device or client product executes the method, the steps may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the method shown in this embodiment or the accompanying drawings.

The devices or modules described in the above embodiments may be implemented by a computer chip or entity, or by a product with a certain function. For convenience, the above device is described in terms of separate functional modules. When implementing this application, the functions of the modules may be implemented in one or more pieces of software and/or hardware. A module that implements a given function may also be implemented as a combination of several sub-modules or sub-units.

The methods, devices or modules described in this application may be implemented as computer-readable program code, and the controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller can therefore be regarded as a hardware component, and the means it contains for realizing various functions can be regarded as structures within the hardware component. The means for realizing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.

Some modules of the device described in this application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, classes and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

From the description of the above embodiments, those skilled in the art can clearly understand that this application can be implemented by means of software plus the necessary hardware. On this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product, or may be embodied in the course of carrying out a data migration. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a mobile terminal, a server, a network device or the like) to execute the methods described in the embodiments of this application or in certain parts of the embodiments.

The embodiments in this specification are described in a progressive manner. For parts that are the same or similar between embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. All or part of this application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.

The above embodiments are intended only to illustrate the technical solutions of this application, not to limit it. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of this application.

Claims (8)

1. A speech noise reduction method based on a microphone array, characterized by comprising:
acquiring a noisy speech signal;
preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal;
estimating the statistical characteristics of the frequency-domain noisy speech signal, and estimating the statistical characteristics of the noise signal;
dividing the microphone array into a plurality of sub-arrays, and estimating a plurality of sub-filters respectively;
determining a frequency-domain noise reduction filter according to the plurality of sub-filters;
performing noise reduction on the frequency-domain noisy speech signal according to the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal; and
converting the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.
2. The method of claim 1, wherein preprocessing the noisy speech signal comprises: framing and windowing the noisy speech signal, and then performing a fast Fourier transform.
3. The method of claim 1, wherein estimating the statistical characteristics of the frequency-domain noisy speech signal comprises estimating the statistical characteristics of the noisy speech signal by temporal smoothing.
4. The method of claim 1, wherein estimating the statistical characteristics of the noise signal comprises estimating the statistical characteristics of the noise signal according to an existing noise estimation algorithm.
5. The method of claim 1, wherein dividing the microphone array into a plurality of sub-arrays and estimating a plurality of sub-filters respectively comprises iteratively estimating the plurality of sub-filters using a low-rank structure of the noise reduction filter.
6. A speech noise reduction device based on a microphone array, characterized by comprising:
a signal acquisition module, configured to acquire a noisy speech signal;
a signal preprocessing module, configured to preprocess the noisy speech signal and determine a frequency-domain noisy speech signal;
a statistical characteristic estimation module, configured to estimate the statistical characteristics of the frequency-domain noisy speech signal and the statistical characteristics of the noise signal;
a sub-filter determination module, configured to divide the microphone array into a plurality of sub-arrays and estimate a plurality of sub-filters respectively;
a frequency-domain noise reduction filter determination module, configured to determine a frequency-domain noise reduction filter according to the plurality of sub-filters;
a noise reduction module, configured to perform noise reduction on the frequency-domain noisy speech signal according to the frequency-domain noise reduction filter and determine a frequency-domain noise-reduced speech signal; and
a time-domain noise-reduced speech signal determination module, configured to convert the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.
7. A microphone-array-based speech noise reduction server, comprising a memory and a processor;
the memory is configured to store computer-executable instructions;
the processor is configured to execute the computer-executable instructions to implement the method of any one of claims 1-5.
8. A computer-readable storage medium having stored thereon executable instructions that, when executed by a computer, are capable of implementing the method of any one of claims 1-5.
CN202111621218.5A 2021-12-28 2021-12-28 Method, device and storage medium for speech noise reduction based on microphone array Pending CN114373475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111621218.5A CN114373475A (en) 2021-12-28 2021-12-28 Method, device and storage medium for speech noise reduction based on microphone array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111621218.5A CN114373475A (en) 2021-12-28 2021-12-28 Method, device and storage medium for speech noise reduction based on microphone array

Publications (1)

Publication Number Publication Date
CN114373475A true CN114373475A (en) 2022-04-19

Family

ID=81142867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111621218.5A Pending CN114373475A (en) 2021-12-28 2021-12-28 Method, device and storage medium for speech noise reduction based on microphone array

Country Status (1)

Country Link
CN (1) CN114373475A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5917919A (en) * 1995-12-04 1999-06-29 Rosenthal; Felix Method and apparatus for multi-channel active control of noise or vibration or of multi-channel separation of a signal from a noisy environment
WO2006114100A1 (en) * 2005-04-26 2006-11-02 Aalborg Universitet Estimation of signal from noisy observations
CN110517701A (en) * 2019-07-25 2019-11-29 华南理工大学 A kind of microphone array voice enhancement method and realization device
CN112802490A (en) * 2021-03-11 2021-05-14 北京声加科技有限公司 Beam forming method and device based on microphone array
CN113409804A (en) * 2020-12-22 2021-09-17 声耕智能科技(西安)研究院有限公司 Multichannel frequency domain speech enhancement algorithm based on variable-span generalized subspace

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIANGHUI WANG ET AL: "Multichannel Iterative Noise Reduction Filters in the Short-Time-Fourier-Transform Domain Based on Kronecker Product Decomposition", IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING ( VOLUME: 29), pages 2725 - 2739 *
何成林, 杜利民, 马昕: "麦克风阵列语音增强的研究", 计算机工程与应用, no. 24 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220419