
CN118609587A - Signal noise reduction method, device, equipment and readable storage medium - Google Patents


Publication number
CN118609587A
Authority
CN
China
Prior art keywords
signal
noise
speech data
frequency domain
noise reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410804637.XA
Other languages
Chinese (zh)
Inventor
蒋超
刘兵兵
吴劼
毛婷婷
侯天峰
袁斌
李晶晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Goertek Intelligent Technology Co Ltd
Original Assignee
Goertek Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Goertek Intelligent Technology Co Ltd
Priority to CN202410804637.XA
Publication of CN118609587A
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present application discloses a signal noise reduction method, device, equipment and readable storage medium, and relates to the field of signal processing technology. The method includes: obtaining an original signal to be denoised and obtaining a noise signal; and inputting the original signal and the noise signal into a pre-trained signal noise reduction model to output a noise reduction result. The trained model is obtained by training with first noisy speech data and second noisy speech data as model input data and an ideal processing result as the model training label, where the signal-to-noise ratio of the first noisy speech data is higher than that of the second noisy speech data. The present application achieves effective noise reduction for non-stationary noise.

Description

Signal noise reduction method, device, equipment and readable storage medium

Technical Field

The present application relates to the field of signal processing technology, and in particular to a signal noise reduction method, device, equipment and readable storage medium.

Background Art

Speech is inevitably disturbed by noise from the surrounding environment, including noise introduced by the transmission medium, electrical noise inside the communication equipment, and even interference from other speakers. Because of this interference, the speech signal received by a microphone is not the clean original speech signal but a noisy speech signal contaminated by noise, which sharply degrades the performance of many speech processing systems. Speech noise reduction is therefore needed to recover an original speech signal that is as clean as possible from the noisy speech signal.

Compared with a single microphone, a microphone array adds a spatial domain on top of the time-frequency domain. It makes full use of the spatial, time and frequency domain information of the speech signal, and offers high spatial resolution, high signal gain and strong resistance to interference. It can compensate for the shortcomings of a single microphone in noise processing, speech extraction and separation, and similar tasks.

In the related art, when there are multiple pickup microphones (i.e., a microphone array), the usual practice is to first fuse the signals received by the microphone array (i.e., the multi-channel signal) and then input the fused signal into a signal noise reduction model, such as a deep learning model, to denoise it. However, this approach handles non-stationary noise (for example, knocking sounds or horn sounds) poorly and leaves obvious residual noise.

Summary of the Invention

The main purpose of the present application is to provide a signal noise reduction method, device, equipment and readable storage medium, aiming to solve the technical problem that current noise reduction approaches perform poorly on non-stationary noise and leave obvious residual noise.

To achieve the above purpose, the present application provides a signal noise reduction method, the signal noise reduction method comprising:

obtaining an original signal to be denoised and obtaining a noise signal, wherein the original signal is a signal obtained by fusing a multi-channel signal received by a microphone array, and the noise signal is a signal obtained by performing noise estimation on the multi-channel signal based on a preset noise estimation algorithm;

inputting the original signal and the noise signal into a pre-trained signal noise reduction model, and outputting a noise reduction result, wherein the trained signal noise reduction model is obtained by training with first noisy speech data and second noisy speech data as model input data and an ideal processing result as the model training label, and the signal-to-noise ratio of the first noisy speech data is higher than the signal-to-noise ratio of the second noisy speech data.

In one embodiment, after the step of inputting the original signal and the noise signal into the pre-trained signal noise reduction model and outputting the noise reduction result, the method further comprises:

if the noise reduction result is a gain factor, performing noise reduction on the original signal based on the gain factor to obtain a denoised signal.
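As an illustration of this embodiment, the sketch below applies a per-bin gain factor to one frame of the frequency-domain original signal. The function name `apply_gain`, the element-wise multiplication, and the clamping of gains to the range [floor, 1] are assumptions made for the example; the application does not specify how the gain factor is applied.

```python
def apply_gain(spectrum, gains, floor=0.0):
    """Scale each complex frequency bin of one frame by its real-valued gain.

    Assumption: gains are clamped to [floor, 1] so the model output can only
    attenuate a bin, never amplify it. The source does not state this.
    """
    if len(spectrum) != len(gains):
        raise ValueError("spectrum and gains must have the same number of bins")
    return [min(1.0, max(floor, g)) * x for x, g in zip(spectrum, gains)]


# One hypothetical frame with three bins and the gains predicted for it.
frame = [complex(1.0, 1.0), complex(0.0, 2.0), complex(-3.0, 0.0)]
denoised = apply_gain(frame, [1.0, 0.5, 0.0])
```

Because the gain is real-valued, the phase of the original signal is kept and only the magnitude of each bin changes.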

In one embodiment, before the step of obtaining the original signal to be denoised and obtaining the noise signal, the method further comprises:

obtaining a training data set, wherein the training data set includes clean speech data and noise data;

mixing the clean speech data with the noise data to obtain first noisy speech data and second noisy speech data, wherein the signal-to-noise ratio of the first noisy speech data is higher than the signal-to-noise ratio of the second noisy speech data;
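The mixing step can be sketched as follows. This is a minimal example, assuming the same noise recording is scaled to two different target SNRs; the 10 dB and -5 dB values, the synthetic sine "speech", and the helper names `mix_at_snr` and `measure_snr_db` are illustrative and do not come from the application.

```python
import math
import random


def power(x):
    """Mean power of a sample sequence."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean, noise, target_snr_db):
    """Return clean + scaled noise so that the mixture has the requested SNR."""
    scale = math.sqrt(power(clean) / (power(noise) * 10 ** (target_snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


def measure_snr_db(clean, mixture):
    """SNR of a mixture, given the clean signal it was built from."""
    residual = [m - c for m, c in zip(mixture, clean)]
    return 10 * math.log10(power(clean) / power(residual))


rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

first_noisy = mix_at_snr(clean, noise, 10.0)    # higher SNR, stands in for the original signal
second_noisy = mix_at_snr(clean, noise, -5.0)   # lower SNR, stands in for the noise signal
```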

determining an ideal processing result based on the clean speech data, and training a preset signal noise reduction model to be trained with the first noisy speech data and the second noisy speech data as inputs and the ideal processing result as the training label, to obtain the trained signal noise reduction model.

In one embodiment, before the step of determining the ideal processing result based on the clean speech data, the method further comprises:

performing a Fourier transform on each of the first noisy speech data, the second noisy speech data and the clean speech data to obtain the first noisy speech data in the frequency domain, the second noisy speech data in the frequency domain and the clean speech data in the frequency domain;

performing the step of determining the ideal processing result based on the clean speech data using the first noisy speech data in the frequency domain, the second noisy speech data in the frequency domain and the clean speech data in the frequency domain.

In one embodiment, the step of determining the ideal processing result based on the clean speech data comprises:

taking the clean speech data in the frequency domain as the ideal processing result; or,

calculating an ideal gain factor based on the first noisy speech data in the frequency domain and the clean speech data in the frequency domain, and taking the ideal gain factor as the ideal processing result.

In one embodiment, the step of calculating the ideal gain factor based on the first noisy speech data in the frequency domain and the clean speech data in the frequency domain comprises:

obtaining the signal amplitude of the first noisy speech data in the frequency domain, and obtaining the signal amplitude of the clean speech data in the frequency domain;

taking the signal amplitude of the first noisy speech data in the frequency domain as a first signal amplitude, and taking the signal amplitude of the clean speech data in the frequency domain as a second signal amplitude;

taking the ratio of the second signal amplitude to the first signal amplitude as the ideal gain factor.
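The ratio described above can be sketched per frequency bin as follows. The function name `ideal_gain` and the small `eps` guard against division by zero are assumptions added for the example; the three-bin spectra are made-up values.

```python
def ideal_gain(noisy_spectrum, clean_spectrum, eps=1e-12):
    """Per-bin ratio of the clean amplitude (second signal amplitude) to the
    noisy amplitude (first signal amplitude).

    eps is a hypothetical safeguard for empty bins; it is not in the source.
    """
    return [abs(s) / max(abs(x), eps)
            for x, s in zip(noisy_spectrum, clean_spectrum)]


# Hypothetical one-frame spectra of the first noisy speech data and the clean speech data.
noisy = [complex(3, 4), complex(0, 2), complex(1, 0)]
clean = [complex(3, 0), complex(0, 1), complex(1, 0)]
gains = ideal_gain(noisy, clean)
```

For these values the amplitudes are 5, 2, 1 (noisy) and 3, 1, 1 (clean), so the training labels for the frame are the ratios 3/5, 1/2 and 1.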

In one embodiment, the preset signal noise reduction model is a neural network model comprising a convolutional neural network layer, a first gated recurrent unit and a second gated recurrent unit connected in sequence.

In addition, to achieve the above purpose, the present application further provides a signal noise reduction device, the signal noise reduction device comprising:

an acquisition module, configured to obtain an original signal to be denoised and to obtain a noise signal, wherein the original signal is a signal obtained by fusing a multi-channel signal received by a microphone array, and the noise signal is a signal obtained by performing noise estimation on the multi-channel signal based on a preset noise estimation algorithm;

a noise reduction module, configured to input the original signal and the noise signal into a pre-trained signal noise reduction model and output a noise reduction result, wherein the trained signal noise reduction model is obtained by training with first noisy speech data and second noisy speech data as model input data and an ideal processing result as the model training label, and the signal-to-noise ratio of the first noisy speech data is higher than the signal-to-noise ratio of the second noisy speech data.

In addition, to achieve the above purpose, the present application further provides signal noise reduction equipment, which includes a microphone array and a processor, the microphone array being electrically connected to the processor; the processor is configured to execute the steps of the signal noise reduction method described above.

In addition, to achieve the above purpose, the present application further provides a readable storage medium. The readable storage medium is a computer-readable storage medium storing a program for implementing the signal noise reduction method, and the program is executed by a processor to implement the steps of the signal noise reduction method described above.

The present application further provides a computer program product, including a computer program which, when executed by a processor, implements the steps of the signal noise reduction method described above.

In the present application, an original signal to be denoised and a noise signal are obtained, where the original signal is obtained by fusing the multi-channel signal received by a microphone array, and the noise signal is obtained by performing noise estimation on the multi-channel signal based on a preset noise estimation algorithm. The original signal and the noise signal are input into a pre-trained signal noise reduction model, which outputs a noise reduction result. The trained model is obtained by training with first noisy speech data and second noisy speech data as model input data and an ideal processing result as the model training label, and the signal-to-noise ratio of the first noisy speech data is higher than that of the second noisy speech data. Compared with an approach that feeds only the fused signal into the signal noise reduction model, the embodiments of the present application feed two inputs into the pre-trained model: the original signal obtained by multi-channel fusion and the noise signal obtained by noise estimation. Because the model is trained with the first and second noisy speech data as inputs, with the first noisy speech data representing the higher-SNR original signal and the second noisy speech data representing the lower-SNR noise signal, the trained model can effectively predict its output from the original signal and noise signal actually supplied. The input noise signal provides reference information, such as the position of the noise and the magnitude of its amplitude, which is used when denoising the original signal, so that the model also handles non-stationary noise well.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.

To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.

FIG. 1 is a schematic flow chart of a first embodiment of the signal noise reduction method of the present application;

FIG. 2 is a schematic diagram of the noise reduction result of a traditional noise reduction approach, as referred to in an embodiment of the present application;

FIG. 3 is a schematic diagram of the noise reduction result obtained with the signal noise reduction method of the present application;

FIG. 4 is a simplified schematic flow chart of signal noise reduction in an embodiment of the signal noise reduction method of the present application;

FIG. 5 is a simplified schematic flow chart of noise reduction model training in an embodiment of the signal noise reduction method of the present application;

FIG. 6 is a schematic structural diagram of the signal noise reduction device of the present application;

FIG. 7 is a schematic structural diagram of the equipment in the hardware operating environment involved in the signal noise reduction device in an embodiment of the present application.

The realization of the purpose, the functional features and the advantages of the present application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Description

To make the above purposes, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

The main solution of the present application is: obtaining an original signal to be denoised and obtaining a noise signal, wherein the original signal is a signal obtained by fusing a multi-channel signal received by a microphone array, and the noise signal is a signal obtained by performing noise estimation on the multi-channel signal based on a preset noise estimation algorithm; and inputting the original signal and the noise signal into a pre-trained signal noise reduction model and outputting a noise reduction result, wherein the trained signal noise reduction model is obtained by training with first noisy speech data and second noisy speech data as model input data and an ideal processing result as the model training label, and the signal-to-noise ratio of the first noisy speech data is higher than the signal-to-noise ratio of the second noisy speech data.

The present application obtains the original signal produced by multi-channel fusion and the noise signal produced by noise estimation, and feeds both into the pre-trained signal noise reduction model. Because the model is trained with the first and second noisy speech data as inputs, with the first noisy speech data representing the higher-SNR original signal and the second noisy speech data representing the lower-SNR noise signal, the trained model can effectively predict its output from the original signal and noise signal actually supplied. The noise signal provides reference information, such as the position of the noise and the magnitude of its amplitude, which is used when denoising the original signal, so that the model also handles non-stationary noise well.

It should be noted that the execution subject of this embodiment may be a computing service device with data processing, network communication and program running functions, such as a tablet computer, a personal computer or a mobile phone, or signal noise reduction equipment capable of the above functions, such as an earphone or a wearable device. By way of example, the embodiments of the present application are described with an earphone as the execution subject.

Based on this, the present application proposes a signal noise reduction method according to a first embodiment. Referring to FIG. 1, the signal noise reduction method includes steps S10 to S20:

Step S10: obtain an original signal to be denoised and obtain a noise signal, wherein the original signal is a signal obtained by fusing a multi-channel signal received by a microphone array, and the noise signal is a signal obtained by performing noise estimation on the multi-channel signal based on a preset noise estimation algorithm;

The original signal may be obtained by the earphone itself fusing the multi-channel signal received by the microphone array, or another device may fuse the multi-channel signal received by the microphone array and send the fused signal to the earphone. Similarly, the noise signal may be obtained by the earphone itself performing noise estimation on the multi-channel signal based on the preset noise estimation algorithm, or another device may perform the noise estimation based on the preset noise estimation algorithm and send the resulting noise signal to the earphone. This embodiment places no specific restriction on this.

It can be understood that, because the microphones in the array are at different distances from the sound source, the multi-channel signal received by the microphone array may first be time-aligned, and the original signal and the noise signal are then derived from the time-aligned multi-channel signal.
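One common way to time-align channels, shown here only as an assumed illustration since the application does not name an alignment method, is to estimate each channel's integer-sample delay relative to a reference channel by cross-correlation and then shift it back. The helper names `estimate_delay` and `align` are hypothetical.

```python
def estimate_delay(reference, channel, max_lag):
    """Return the lag (in samples) at which `channel` best matches `reference`."""
    def corr(lag):
        return sum(reference[n] * channel[n + lag]
                   for n in range(len(reference))
                   if 0 <= n + lag < len(channel))
    return max(range(-max_lag, max_lag + 1), key=corr)


def align(channel, lag):
    """Advance `channel` by `lag` samples, zero-padding to keep its length."""
    n = len(channel)
    return [channel[i + lag] if 0 <= i + lag < n else 0.0 for i in range(n)]


# The same pulse reaches the second microphone 3 samples later.
reference = [0.0] * 5 + [1.0, 2.0, 1.0] + [0.0] * 8
delayed = [0.0] * 8 + [1.0, 2.0, 1.0] + [0.0] * 5
lag = estimate_delay(reference, delayed, max_lag=5)
aligned = align(delayed, lag)
```

Here the estimated lag is 3 samples, and after shifting, the second channel coincides with the reference.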

After the microphone array picks up a signal, the picked-up signal is treated as the multi-channel signal and is fused. Specifically, a beamforming algorithm may be used to fuse the multi-channel signal. The beamforming algorithm may be an existing one, such as the Generalized Sidelobe Canceller (GSC) algorithm, the Minimum Variance Distortionless Response (MVDR) algorithm, or the Linearly Constrained Minimum Variance (LCMV) algorithm.

The preset noise estimation algorithm may be any algorithm configured in advance for estimating the noise in the multi-channel signal, such as a mean-square-error algorithm or a spectral estimation algorithm. Since this embodiment uses a beamforming algorithm to fuse the multi-channel signal into the original signal, in a preferred implementation the same beamforming algorithm is also used for noise estimation. For example, when the GSC algorithm is used for fusion, the signal output after the multi-channel signal passes through the blocking matrix of the GSC algorithm is used as the noise signal. In this way, a single pass of one algorithm yields both the original signal and the noise signal, which reduces the number of times the multi-channel signal is processed, and deriving both signals from the same algorithm also keeps their data consistent with each other.

By way of example, the embodiments of the present application are described with the signal obtained by fusing the multi-channel signal using the GSC algorithm as the original signal, and the signal output by the blocking matrix of the GSC algorithm as the noise signal.

The GSC algorithm consists of three main parts: a Fixed Beamformer (FBF), a Blocking Matrix (BM) and an Adaptive Noise Canceller (ANC). The fixed beamformer forms the upper branch, and the blocking matrix and the adaptive noise canceller form the lower branch. The beam aimed at the sound source direction is taken as the main beam, and the others are reference beams. The upper branch passes the main beam and rejects the reference beams; the lower branch blocks the main beam and passes the reference beams. Specifically, in the lower branch the blocking matrix blocks the main beam, and the ANC equalizes the signal Z output by the blocking matrix so that it matches the residual noise in the output of the upper branch. The multi-channel signal x is fed into the GSC algorithm; the upper branch produces the upper-branch signal y_c, and the lower branch produces the blocking signal y_b. The upper-branch signal y_c and the blocking signal y_b are then filtered, for example by Wiener filtering the two signals, so that the noise in the upper and lower branches cancels, giving the processing result y_p. When the GSC algorithm is used to fuse the multi-channel beams, the processing result y_p serves as the original signal to be denoised, and the signal Z output by the blocking matrix serves as the noise signal.
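The GSC structure described above can be sketched in the time domain as follows. This is a simplified illustration under several assumptions that are not in the application: the channels are already time-aligned, the fixed beamformer is a plain average, the blocking matrix takes adjacent-channel differences, and the ANC is an NLMS filter whose output y_b is subtracted from y_c (standing in for the Wiener filtering the text mentions). The function name `gsc` and the parameters `taps`, `mu` and `eps` are hypothetical.

```python
import math
import random


def gsc(channels, taps=4, mu=0.05, eps=1e-8):
    """Return (y_p, z): the enhanced signal and the blocking-matrix outputs.

    channels: list of M equal-length, time-aligned sample lists.
    """
    m, n = len(channels), len(channels[0])
    # Upper branch (FBF): after alignment, delay-and-sum reduces to an average.
    y_c = [sum(ch[t] for ch in channels) / m for t in range(n)]
    # Blocking matrix: adjacent-channel differences cancel the aligned target.
    z = [[channels[i][t] - channels[i + 1][t] for t in range(n)]
         for i in range(m - 1)]
    # ANC: one NLMS filter per noise reference, adapted sample by sample.
    w = [[0.0] * taps for _ in range(m - 1)]
    y_p = []
    for t in range(n):
        bufs = [[z[i][t - k] if t - k >= 0 else 0.0 for k in range(taps)]
                for i in range(m - 1)]
        y_b = sum(wk * bk for wi, bi in zip(w, bufs) for wk, bk in zip(wi, bi))
        e = y_c[t] - y_b                      # noise estimate removed from y_c
        norm = eps + sum(bk * bk for bi in bufs for bk in bi)
        for wi, bi in zip(w, bufs):
            for k in range(taps):
                wi[k] += mu * e * bi[k] / norm
        y_p.append(e)
    return y_p, z


rng = random.Random(0)
n = 2000
speech = [math.sin(2 * math.pi * 0.01 * t) for t in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
mic0 = [s + v for s, v in zip(speech, noise)]   # noise reaches only this microphone
mic1 = list(speech)
y_p, z = gsc([mic0, mic1])
```

In this toy setup `z[0]` contains only the noise, which is the property that makes the blocking-matrix output usable as the noise signal, while `y_p` plays the role of the original signal passed on to the noise reduction model.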

It should be noted that the multi-channel signals collected by the microphone array are time-domain signals. If the fusion or noise estimation is performed directly on the time-domain signals without a Fourier transform, the noise signal and/or the original signal may still be in the time domain. In that case, the time-domain signal is Fourier transformed to obtain a frequency-domain signal, and subsequent processing is based on the frequency-domain noise signal and original signal. By way of example, the time-domain signal is divided into frames; the frame length and overlap length can be set as needed, for example a frame length between 7.5 ms and 30 ms, and an STFT (short-time Fourier transform) is applied to each frame to obtain the frequency-domain signal. It can be understood that every time-domain signal is converted to the frequency domain with the same framing scheme, to keep the signals consistent with one another.
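The framing and transform described above can be sketched as follows. The 16 kHz sampling rate (so that 256 samples is 16 ms, inside the 7.5 ms to 30 ms range), the 50% overlap, and the Hann window are assumptions made for illustration; the function name `stft_frames` is hypothetical.

```python
import numpy as np

def stft_frames(x, frame_len=256, hop=128):
    """Split x into overlapping frames and return the one-sided spectrum
    of each frame. Assumes len(x) >= frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)   # window choice is an assumption
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)   # shape (n_frames, frame_len // 2 + 1)
```

Calling the same function with the same `frame_len` and `hop` on both the original signal and the noise signal gives spectra of identical shape, which is what keeps the two inputs aligned frame by frame.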

Step S20: input the original signal and the noise signal into a pre-trained signal denoising model, and obtain a denoising result as output, wherein the trained signal denoising model is trained with first noisy speech data and second noisy speech data as model input data and an ideal processing result as the model training label, and the signal-to-noise ratio of the first noisy speech data is higher than that of the second noisy speech data.

It should be noted that the pre-trained signal denoising model may specifically be a model trained with the first noisy speech data and the second noisy speech data as inputs and the ideal processing result as the training label, where the ideal processing result indicates the expected result of processing the first noisy speech data, such as a clean speech signal or an ideal gain factor. The first noisy speech data represents the original signal, which has a higher signal-to-noise ratio; the second noisy speech data represents the noise signal, which has a lower signal-to-noise ratio; and the ideal processing result represents the ideal output. The trained model can therefore effectively predict the output from the original signal and noise signal actually input.

In a possible implementation, after the step of inputting the original signal and the noise signal into the pre-trained signal denoising model and obtaining a denoising result as output, the method further includes:

Step S201: if the denoising result is a gain factor, denoise the original signal based on the gain factor to obtain the denoised signal.

It should be noted that if an ideal gain factor is used as the training label during model training, the denoising result output by the signal denoising model is also a gain factor. The original signal can then be denoised based on the gain factor output by the model; specifically, the gain factor can be used to adjust the signal amplitude of the original signal. It can be understood that any existing gain-factor-based denoising method may be used for this step, so it is not described in detail in this embodiment.
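One common way to adjust the signal amplitude with a gain factor is to scale each time-frequency bin of the original spectrum while leaving its phase unchanged. The sketch below assumes a per-bin gain with the same shape as the spectrum; the function name `apply_gain` and the clipping bound `max_gain` are hypothetical and not taken from the present application.

```python
import numpy as np

def apply_gain(spec, gain, max_gain=1.0):
    """Scale the magnitude of each bin of a complex spectrum by a real gain.
    Multiplying by a non-negative real number keeps the phase unchanged."""
    gain = np.clip(gain, 0.0, max_gain)   # clipping range is an assumption
    return spec * gain
```

The result is still a frequency-domain signal, so an inverse transform is needed before playback.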

Further, if a clean speech signal is used as the training label during model training, the denoising result output by the signal denoising model is itself a speech signal (that is, the denoised signal). In either case, whether the denoised signal is output directly by the model or obtained by denoising the original signal with the gain factor output by the model, an inverse Fourier transform can be applied to the denoised signal to convert it from the frequency domain to the time domain for output.

In this embodiment, an original signal to be denoised and a noise signal are obtained, wherein the original signal is obtained by fusing the multi-channel signals received by a microphone array, and the noise signal is obtained by performing noise estimation on the multi-channel signals with a preset noise estimation algorithm. The original signal and the noise signal are input into a pre-trained signal denoising model, which outputs a denoising result, wherein the trained signal denoising model is trained with first noisy speech data and second noisy speech data as model input data and an ideal processing result as the model training label, and the signal-to-noise ratio of the first noisy speech data is higher than that of the second noisy speech data. Thus, in contrast to a denoising approach that feeds only the fused signal into a denoising model, this embodiment feeds two inputs into the pre-trained model: the original signal obtained by multi-channel fusion and the noise signal obtained by noise estimation.
Because the model is trained with the first and second noisy speech data as inputs, with the first noisy speech data representing the original signal with a higher signal-to-noise ratio and the second representing the noise signal with a lower signal-to-noise ratio, the trained model can effectively predict the output from the original signal and noise signal actually input. It denoises the original signal using the reference information provided by the input noise signal, such as the position and amplitude of the noise, so that the model also handles non-stationary noise well.

By way of example, refer to FIG. 2 and FIG. 3, in which the boxed regions are non-stationary noise. FIG. 2 shows the denoising result obtained with a conventional signal denoising method that takes only the original signal as input. FIG. 3 shows the denoising result obtained with the signal noise reduction method of the embodiments of the present application, which takes both the original signal and the noise signal as input. Comparing FIG. 2 with FIG. 3 shows that the method of the embodiments handles non-stationary noise better and leaves less residual noise.

By way of example, to aid understanding of the technical concept or principle of this embodiment, refer to FIG. 4, which provides a brief flow diagram of signal denoising, as follows:

1. Obtain the multi-channel signals collected by microphones mic1 and mic2 of the microphone array.

2. Input the collected multi-channel signals into the GSC algorithm. The output of the blocking matrix is the noise signal; adaptively filtering the blocking-matrix output against the upper-branch output of the GSC algorithm gives the noisy original signal.

3. Input both the noise signal and the noisy original signal obtained in step 2 into the pre-trained signal denoising model, which is specifically a neural network model, and obtain the gain factor IRM as output.

4. Denoise the original signal based on the gain factor IRM to obtain the denoised signal.

Based on the first embodiment of the present application, in the second embodiment, content that is the same as or similar to the first embodiment can be found in the description above and is not repeated. On this basis, before the step of obtaining the original signal to be denoised and obtaining the noise signal, the method further includes:

Step A10: obtain a training data set, wherein the training data set includes clean speech data and noise data;

It should be noted that the clean speech data and the noise data may specifically be single-channel time-domain signal data. The clean speech data contains only valid speech and no noise; the noise data contains only noise and no valid speech.

Step A20: mix the clean speech data with the noise data to obtain first noisy speech data and second noisy speech data, wherein the signal-to-noise ratio of the first noisy speech data is higher than that of the second noisy speech data;

It should be noted that, in one implementation, the clean speech data and the noise data can be mixed directly at a preset first signal-to-noise ratio and a preset second signal-to-noise ratio to obtain the first noisy speech data and the second noisy speech data. In another implementation, the clean speech data is played at one sound source position and the noise data at other positions, so that the two are mixed acoustically, and the result is captured by a microphone array to simulate a multi-channel signal. The captured signals are input into the GSC algorithm; the output of the GSC algorithm is used as the first noisy speech data, and the output of the blocking matrix in the GSC algorithm as the second noisy speech data. Because the first and second noisy speech data are then actual GSC outputs, this can further improve the training of the model and, in turn, its denoising performance.
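The first implementation, direct mixing at two preset signal-to-noise ratios, can be sketched as below. The noise is scaled so that the ratio of clean power to scaled noise power matches the target in dB. The function names `mix_at_snr` and `make_pair` are hypothetical, and the default values of 5 dB and -10 dB are just one possible pair of preset ratios.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech, scaled to reach the requested SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def make_pair(clean, noise, high_snr_db=5.0, low_snr_db=-10.0):
    """Return (first noisy speech data, second noisy speech data)."""
    first = mix_at_snr(clean, noise, high_snr_db)    # higher SNR
    second = mix_at_snr(clean, noise, low_snr_db)    # lower SNR
    return first, second
```

Both mixtures share the same clean speech and noise segments, so the second signal is a noise-dominated counterpart of the first rather than an unrelated recording.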

Step A30: determine an ideal processing result based on the clean speech data, and train a preset signal denoising model with the first noisy speech data and the second noisy speech data as input and the ideal processing result as output, to obtain the trained signal denoising model.

The ideal processing result indicates the ideal output expected after the first noisy speech data is processed, such as clean speech data or an ideal gain factor. The preset signal denoising model is trained with the first and second noisy speech data as input and the ideal processing result as output, so that the first noisy speech data simulates a signal with a high signal-to-noise ratio and the second noisy speech data simulates a signal with a low signal-to-noise ratio. The trained model can then effectively predict the output from these two kinds of input signal.

Based on the first embodiment and/or the second embodiment of the present application, in the third embodiment, content that is the same as or similar to the first and second embodiments can be found in the description above and is not repeated. On this basis, before the step of determining the ideal processing result based on the clean speech data, the method further includes:

Step B10: apply the Fourier transform to the first noisy speech data, the second noisy speech data, and the clean speech data respectively, to obtain the first noisy speech data in the frequency domain, the second noisy speech data in the frequency domain, and the clean speech data in the frequency domain;

Step B20: perform the step of determining the ideal processing result based on the clean speech data using the first noisy speech data in the frequency domain, the second noisy speech data in the frequency domain, and the clean speech data in the frequency domain.

The time-domain signals are Fourier transformed to obtain frequency-domain signals, and the signal denoising model is trained on the frequency-domain signals. By way of example, each time-domain signal is divided into frames; the frame length and overlap length can be set as needed, for example a frame length between 7.5 ms and 30 ms, and an STFT (short-time Fourier transform) is applied to each frame to obtain the frequency-domain signal. It can be understood that every time-domain signal is converted to the frequency domain with the same framing scheme, to keep the signals consistent with one another.

In a possible implementation, the step of determining the ideal processing result based on the clean speech data includes:

Step C10: take the clean speech data in the frequency domain as the ideal processing result; or,

Step C20: calculate an ideal gain factor based on the first noisy speech data in the frequency domain and the clean speech data in the frequency domain, and take the ideal gain factor as the ideal processing result.

It can be understood that the first noisy speech data simulates the signal to be denoised, which has a high signal-to-noise ratio, and that ideally the model is expected to output a signal with the noise completely removed once this signal is input. Accordingly, the ideal gain factor calculated from the first noisy speech data and the clean speech data is the factor that, when used to denoise the first noisy speech data, yields clean speech data with the noise completely removed.

In a possible implementation, the step of calculating the ideal gain factor based on the first noisy speech data in the frequency domain and the clean speech data in the frequency domain includes:

Step D10: obtain the signal amplitude of the first noisy speech data in the frequency domain, and obtain the signal amplitude of the clean speech data in the frequency domain;

Specifically, the signal amplitude may be the cumulative or average signal amplitude over all frequency bins in a preset number of signal frames. It can be understood that applying the Fourier transform to N sampling points yields N transform results, one per frequency bin; the value at each bin is denoted the FT value. The signal amplitude at a given frequency bin is then the modulus of the FT value at that bin.

Step D20: take the signal amplitude of the first noisy speech data in the frequency domain as the first signal amplitude, and the signal amplitude of the clean speech data in the frequency domain as the second signal amplitude;

Step D30: take the ratio of the second signal amplitude to the first signal amplitude as the ideal gain factor.

Specifically, the ratio is the second signal amplitude divided by the first signal amplitude, so that multiplying the first signal amplitude by the gain factor gives the signal amplitude of the clean speech data, that is, the ideal signal amplitude.
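Steps D10 to D30 can be sketched as a per-bin ratio of magnitudes. The sketch assumes the amplitude is taken bin by bin rather than accumulated or averaged over several frames, and adds a small constant `eps` to avoid division by zero; both choices, and the function name `ideal_gain`, are assumptions made for illustration.

```python
import numpy as np

def ideal_gain(noisy_spec, clean_spec, eps=1e-8):
    """Ideal gain factor as second amplitude divided by first amplitude."""
    first_amp = np.abs(noisy_spec)    # modulus of the FT value, noisy speech
    second_amp = np.abs(clean_spec)   # modulus of the FT value, clean speech
    return second_amp / (first_amp + eps)
```

Multiplying `np.abs(noisy_spec)` by the returned gain recovers the clean amplitude up to the effect of `eps`, which matches the property stated above.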

In a possible implementation, the preset signal denoising model is a neural network model including a convolutional neural network layer, a first gated recurrent unit, and a second gated recurrent unit connected in sequence.

Given the strong nonlinear modeling capability of neural networks, this embodiment uses a neural network model as the signal denoising model. Specifically, as shown in FIG. 5, the signal denoising model includes a convolutional neural network (CNN) layer, a first gated recurrent unit (GRU), and a second gated recurrent unit connected in sequence.

By way of example, to aid understanding of the signal denoising concept or principle when this embodiment is combined with the second embodiment above, a specific embodiment is given, in which the signal denoising flow is:

1. Prepare data. Prepare a large amount of single-channel clean speech data and noise data, and mix them at a high and a low signal-to-noise ratio (SNR); each high/low pair of mixtures is treated as one group. Three groups are mixed in total: 5 dB and -10 dB, 10 dB and -5 dB, and 15 dB and 0 dB. The low-SNR signal simulates the noise estimated by the blocking matrix, and the high-SNR signal simulates the signal obtained after fusion processing.

2. Train the model. As shown in FIG. 5, a neural network with a two-channel input and a one-channel output is designed as the DNN-based denoising module. For each pair of signals obtained above: ① Compute the network input: apply the Fourier transform (frame length 256 points) to the high-SNR and low-SNR signals, and group every three consecutive frames. ② Compute the network output: apply the Fourier transform (frame length 256 points) to the high-SNR signal and the corresponding clean speech signal, and divide the transformed clean speech signal by the corresponding transformed high-SNR signal to obtain the ideal gain value of each frame (denoted IRM). ③ Train: stack the Fourier-transformed high-SNR and low-SNR signals to obtain a 2×3×129 array (the 129 non-redundant points of the 256-point transform) as the network input, and use the IRM corresponding to the middle frame of the high-SNR signal as the output. Then train the neural network.
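The assembly of one training example in step 2 can be sketched as follows. With a 256-point frame, the one-sided transform has 129 bins; three consecutive frames from the high-SNR and low-SNR signals are stacked into a 2×3×129 array, and the label is the IRM of the middle frame. Using magnitudes as input features, a 128-sample hop, a Hann window, and the names `spectra` and `training_examples` are all assumptions made for illustration.

```python
import numpy as np

def spectra(x, frame_len=256, hop=128):
    """One-sided spectrum per frame, shape (n_frames, 129) for frame_len=256."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.fft.rfft(frames * np.hanning(frame_len), axis=1)

def training_examples(high, low, clean, context=3, eps=1e-8):
    """Build (inputs, labels) from aligned high-SNR, low-SNR and clean signals."""
    H, L, C = (np.abs(spectra(s)) for s in (high, low, clean))
    irm = C / (H + eps)                      # ideal gain per frame and bin
    inputs, labels = [], []
    for t in range(len(H) - context + 1):
        # channel 0: high-SNR frames, channel 1: low-SNR frames -> (2, 3, 129)
        inputs.append(np.stack([H[t:t + context], L[t:t + context]]))
        labels.append(irm[t + context // 2])  # IRM of the middle frame
    return np.array(inputs), np.array(labels)
```

Each row of the returned label array has 129 values, one gain per frequency bin, which is the single-channel output the network is trained to predict.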

It should be noted that the above specific embodiment is provided only to aid understanding of the present application and does not limit its signal denoising flow. Further simple variations based on this technical concept all fall within the protection scope of the present application.

In addition, an embodiment of the present application further provides a signal noise reduction device. Referring to FIG. 6, the signal noise reduction device includes:

an acquisition module 10, configured to obtain an original signal to be denoised and a noise signal, wherein the original signal is obtained by fusing the multi-channel signals received by a microphone array, and the noise signal is obtained by performing noise estimation on the multi-channel signals with a preset noise estimation algorithm;

a noise reduction module 20, configured to input the original signal and the noise signal into a pre-trained signal denoising model and obtain a denoising result as output, wherein the trained signal denoising model is trained with first noisy speech data and second noisy speech data as model input data and an ideal processing result as the model training label, and the signal-to-noise ratio of the first noisy speech data is higher than that of the second noisy speech data.

In an embodiment, the noise reduction module 20 is further configured to:

if the denoising result is a gain factor, denoise the original signal based on the gain factor to obtain the denoised signal.

In an embodiment, the signal noise reduction device further includes a model training module, configured to:

obtain a training data set, wherein the training data set includes clean speech data and noise data;

mix the clean speech data with the noise data to obtain first noisy speech data and second noisy speech data, wherein the signal-to-noise ratio of the first noisy speech data is higher than that of the second noisy speech data;

determine an ideal processing result based on the clean speech data, and train a preset signal denoising model with the first noisy speech data and the second noisy speech data as input and the ideal processing result as the training label, to obtain the trained signal denoising model.

The model training module is further configured to:

apply the Fourier transform to the first noisy speech data, the second noisy speech data, and the clean speech data respectively, to obtain the first noisy speech data in the frequency domain, the second noisy speech data in the frequency domain, and the clean speech data in the frequency domain;

perform the step of determining the ideal processing result based on the clean speech data using the first noisy speech data in the frequency domain, the second noisy speech data in the frequency domain, and the clean speech data in the frequency domain.

The model training module is further configured to:

take the clean speech data in the frequency domain as the ideal processing result; or,

calculate an ideal gain factor based on the first noisy speech data in the frequency domain and the clean speech data in the frequency domain, and take the ideal gain factor as the ideal processing result.

The model training module is further configured to:

obtain the signal amplitude of the first noisy speech data in the frequency domain, and obtain the signal amplitude of the clean speech data in the frequency domain;

take the signal amplitude of the first noisy speech data in the frequency domain as the first signal amplitude, and the signal amplitude of the clean speech data in the frequency domain as the second signal amplitude;

take the ratio of the second signal amplitude to the first signal amplitude as the ideal gain factor.

In an embodiment, the preset signal denoising model is a neural network model including a convolutional neural network layer, a first gated recurrent unit, and a second gated recurrent unit connected in sequence.

In addition, an embodiment of the present application further provides signal noise reduction equipment, which includes a memory, a processor, and a signal noise reduction program stored in the memory and executable on the processor; when executed by the processor, the signal noise reduction program implements the steps of the signal noise reduction method described above.

Referring now to FIG. 7, a schematic structural diagram of signal noise reduction equipment suitable for implementing the embodiments of the present application is shown. The equipment may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), and PMPs (portable media players), and fixed terminals such as digital TVs and desktop computers. The equipment shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in FIG. 7, the signal noise reduction equipment may include a processing device 1001 (for example, a central processing unit or a graphics processor), which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores the programs and data required for the operation of the equipment. The processing device 1001, the ROM 1002, and the RAM 1004 are connected to one another via a bus 1005. An input/output (I/O) interface 1006 is also connected to the bus. In general, the following systems can be connected to the I/O interface 1006: an input device 1007 including, for example, a touch screen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, or gyroscope; an output device 1008 including, for example, a liquid crystal display (LCD), speaker, or vibrator; the storage device 1003 including, for example, a magnetic tape or hard disk; and a communication device 1009. The communication device 1009 allows the equipment to communicate with other devices, wirelessly or by wire, to exchange data. Although the figure shows equipment with various systems, it should be understood that not all of the illustrated systems need to be implemented or present; more or fewer systems may alternatively be implemented or present.

特别地,根据本申请公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本申请公开的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置从网络上被下载和安装,或者从存储装置1003被安装,或者从ROM1002被安装。在该计算机程序被处理装置1001执行时,执行本申请公开实施例的方法中限定的上述功能。In particular, according to the embodiments disclosed in the present application, the process described above with reference to the flowchart can be implemented as a computer software program. For example, the embodiments disclosed in the present application include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program includes a program code for executing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through a communication device, or installed from a storage device 1003, or installed from a ROM 1002. When the computer program is executed by the processing device 1001, the above-mentioned functions defined in the method of the embodiment disclosed in the present application are executed.

The signal noise reduction device provided in the embodiment of the present application adopts the signal noise reduction method of the above embodiments and can solve the technical problem of signal noise reduction. Compared with the prior art, the beneficial effects of the signal noise reduction device provided in the present application are the same as those of the signal noise reduction method provided in the above embodiments, and the other technical features of the signal noise reduction device are the same as the features disclosed for the method of the previous embodiment, so they are not repeated here.

It should be understood that the various parts disclosed in this application can be implemented in hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in a suitable manner in any one or more embodiments or examples.

The above is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be determined by the protection scope of the claims.

In addition, to achieve the above purpose, an embodiment of the present application further provides a readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, the computer-readable program instructions being used to execute the signal noise reduction method of the above embodiments.

The computer-readable storage medium provided in the embodiment of the present application may be, for example, a USB flash drive, and more generally may be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this embodiment, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system or device. The program code contained on the computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wire, optical cable, radio frequency (RF), or any suitable combination of the above.

The computer-readable storage medium may be included in the signal noise reduction device, or it may exist independently without being assembled into the signal noise reduction device.

The computer-readable storage medium carries one or more programs. When the one or more programs are executed by the signal noise reduction device, they cause the signal noise reduction device to: obtain an original signal to be denoised and obtain a noise signal, wherein the original signal is obtained by fusing the multi-channel signals received by a microphone array, and the noise signal is obtained by performing noise estimation on the multi-channel signals based on a preset noise estimation algorithm; and input the original signal and the noise signal into a pre-trained signal noise reduction model to output a noise reduction processing result, wherein the trained signal noise reduction model is obtained by training with first noisy speech data and second noisy speech data as model input data and an ideal processing result as the model training label, and the signal-to-noise ratio of the first noisy speech data is higher than that of the second noisy speech data.
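As an illustrative sketch only, and not the applicant's implementation, the two training inputs described above could be produced by mixing one clean utterance with noise at two target signal-to-noise ratios. The helper names `mix_at_snr` and `measured_snr_db`, the 10 dB and 0 dB targets, and the synthetic sine and Gaussian signals are all assumptions made for this example; the application does not specify them.

```python
import math
import random

def power(x):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in x) / len(x)

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR in dB."""
    # Required noise power is p_clean / 10^(snr_db / 10).
    scale = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]

def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy`, treating (noisy - clean) as the noise component."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))

# Hypothetical stand-ins for clean speech data and noise data.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

first_noisy = mix_at_snr(clean, noise, 10.0)   # higher-SNR mixture (assumed 10 dB)
second_noisy = mix_at_snr(clean, noise, 0.0)   # lower-SNR mixture (assumed 0 dB)
```

Both mixtures share the same clean reference, so a single training label derived from the clean speech can serve both inputs.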

Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the C language or similar programming languages. The program code may be executed entirely on the user's computer, partially on the user's computer, as a stand-alone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).

The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present application. In this regard, each box in a flowchart or block diagram can represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes can occur in an order different from that marked in the drawings. For example, two boxes shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or with a combination of dedicated hardware and computer instructions.

The modules involved in the embodiments of the present application may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself.

The readable storage medium provided in the present application is a computer-readable storage medium, which stores computer-readable program instructions (i.e., a computer program) for executing the above signal noise reduction method and can solve the technical problem of signal noise reduction. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in the present application are the same as those of the signal noise reduction method provided in the above embodiments and are not repeated here.

In addition, an embodiment of the present application further provides a computer program product, including a signal noise reduction program which, when executed by a processor, implements the steps of the signal noise reduction method described above.

The specific implementation of the computer program product of the present application is substantially the same as the embodiments of the signal noise reduction method described above and is not repeated here.

It should be noted that, in this document, the terms "include", "comprise", or any other variation thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. In the absence of further restrictions, an element defined by the phrase "comprises a ..." does not exclude the existence of other identical elements in the process, method, article, or system that includes the element.

The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.

The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of the present application.

Claims (10)

1. A signal noise reduction method, characterized in that the signal noise reduction method comprises the following steps:
acquiring an original signal to be denoised, and acquiring a noise signal, wherein the original signal is a signal obtained by fusing the multi-channel signals received by a microphone array, and the noise signal is a signal obtained by performing noise estimation on the multi-channel signals based on a preset noise estimation algorithm;
inputting the original signal and the noise signal into a pre-trained signal noise reduction model, and outputting a noise reduction processing result, wherein the trained signal noise reduction model is obtained by training with first noisy speech data and second noisy speech data as model input data and an ideal processing result as a model training label, and the signal-to-noise ratio of the first noisy speech data is higher than the signal-to-noise ratio of the second noisy speech data.

2. The method according to claim 1, characterized in that, after the step of inputting the original signal and the noise signal into the pre-trained signal noise reduction model and outputting the noise reduction processing result, the method further comprises:
if the noise reduction processing result is a gain factor, performing noise reduction processing on the original signal based on the gain factor to obtain a noise-reduced signal.

3. The method according to claim 1, characterized in that, before the step of acquiring the original signal to be denoised and acquiring the noise signal, the method further comprises:
acquiring a training data set, wherein the training data set includes clean speech data and noise data;
mixing the clean speech data with the noise data to obtain first noisy speech data and second noisy speech data, wherein the signal-to-noise ratio of the first noisy speech data is higher than the signal-to-noise ratio of the second noisy speech data;
determining an ideal processing result based on the clean speech data, and training a preset signal noise reduction model to be trained, with the first noisy speech data and the second noisy speech data as inputs and the ideal processing result as a training label, to obtain the trained signal noise reduction model.

4. The method according to claim 3, characterized in that, before the step of determining an ideal processing result based on the clean speech data, the method further comprises:
performing a Fourier transform on each of the first noisy speech data, the second noisy speech data, and the clean speech data to obtain the first noisy speech data in the frequency domain, the second noisy speech data in the frequency domain, and the clean speech data in the frequency domain;
performing the step of determining an ideal processing result based on the clean speech data, on the basis of the first noisy speech data in the frequency domain, the second noisy speech data in the frequency domain, and the clean speech data in the frequency domain.

5. The method according to claim 4, characterized in that the step of determining an ideal processing result based on the clean speech data comprises:
taking the clean speech data in the frequency domain as the ideal processing result; or,
calculating an ideal gain factor based on the first noisy speech data in the frequency domain and the clean speech data in the frequency domain, and taking the ideal gain factor as the ideal processing result.

6. The method according to claim 5, characterized in that the step of calculating an ideal gain factor based on the first noisy speech data in the frequency domain and the clean speech data in the frequency domain comprises:
acquiring the signal amplitude of the first noisy speech data in the frequency domain, and acquiring the signal amplitude of the clean speech data in the frequency domain;
taking the signal amplitude of the first noisy speech data in the frequency domain as a first signal amplitude, and taking the signal amplitude of the clean speech data in the frequency domain as a second signal amplitude;
taking the ratio of the second signal amplitude to the first signal amplitude as the ideal gain factor.

7. The method according to any one of claims 3 to 6, characterized in that the preset signal noise reduction model is a neural network model comprising a convolutional neural network layer, a first gated recurrent unit, and a second gated recurrent unit connected in sequence.

8. A signal noise reduction apparatus, characterized in that the signal noise reduction apparatus comprises:
an acquisition module, configured to acquire an original signal to be denoised and to acquire a noise signal, wherein the original signal is a signal obtained by fusing the multi-channel signals received by a microphone array, and the noise signal is a signal obtained by performing noise estimation on the multi-channel signals based on a preset noise estimation algorithm;
a noise reduction module, configured to input the original signal and the noise signal into a pre-trained signal noise reduction model and output a noise reduction processing result, wherein the trained signal noise reduction model is obtained by training with first noisy speech data and second noisy speech data as model input data and an ideal processing result as a model training label, and the signal-to-noise ratio of the first noisy speech data is higher than the signal-to-noise ratio of the second noisy speech data.

9. A signal noise reduction device, characterized in that the signal noise reduction device comprises: a memory, a processor, and a signal noise reduction program stored in the memory and executable on the processor, wherein the signal noise reduction program, when executed by the processor, implements the steps of the signal noise reduction method according to any one of claims 1 to 7.

10. A readable storage medium, characterized in that the readable storage medium is a computer-readable storage medium on which a program for implementing a signal noise reduction method is stored, and the program for implementing the signal noise reduction method is executed by a processor to implement the steps of the signal noise reduction method according to any one of claims 1 to 7.
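For illustration only, the ideal gain factor of claim 6 (the ratio of the clean-speech amplitude to the first noisy-speech amplitude in the frequency domain) and the gain-based noise reduction of claim 2 can be sketched as follows. The naive DFT, the `eps` guard against division by zero, the per-bin application of the gain to the noisy spectrum, and the 8-sample test signals are assumptions made for this example; the claims do not prescribe them.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform, adequate for a short demo frame."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def ideal_gain(noisy_spec, clean_spec, eps=1e-12):
    """Per-bin ratio |clean| / |noisy| (second amplitude over first amplitude)."""
    return [abs(c) / max(abs(y), eps) for y, c in zip(noisy_spec, clean_spec)]

def apply_gain(spec, gain):
    """Scale each frequency bin by its gain, keeping the phase of `spec`."""
    return [g * s for g, s in zip(gain, spec)]

# Hypothetical 8-sample frame: a cosine (energy in bins 1 and 7) plus an impulse.
n = 8
clean = [math.cos(2 * math.pi * t / n) for t in range(n)]
noise = [0.5] + [0.0] * (n - 1)   # an impulse has a flat spectrum of 0.5 per bin
noisy = [c + v for c, v in zip(clean, noise)]

clean_spec = dft(clean)
noisy_spec = dft(noisy)
gain = ideal_gain(noisy_spec, clean_spec)        # training label in this sketch
enhanced_spec = apply_gain(noisy_spec, gain)     # magnitudes now match the clean speech
```

In this sketch the gain restores only the magnitude of each bin; the phase of the enhanced spectrum is still that of the noisy input.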
CN202410804637.XA 2024-06-20 2024-06-20 Signal noise reduction method, device, equipment and readable storage medium Pending CN118609587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410804637.XA CN118609587A (en) 2024-06-20 2024-06-20 Signal noise reduction method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN118609587A true CN118609587A (en) 2024-09-06

Family

ID=92555493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410804637.XA Pending CN118609587A (en) 2024-06-20 2024-06-20 Signal noise reduction method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN118609587A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118800268A (en) * 2024-09-14 2024-10-18 潍坊歌尔电子有限公司 Voice signal processing method, voice signal processing device and storage medium
CN118800268B (en) * 2024-09-14 2025-02-28 潍坊歌尔电子有限公司 Voice signal processing method, voice signal processing device and storage medium

Similar Documents

Publication Publication Date Title
CN108615535B (en) Voice enhancement method and device, intelligent voice equipment and computer equipment
CN111341336B (en) Echo cancellation method, device, terminal equipment and medium
CN118609587A (en) Signal noise reduction method, device, equipment and readable storage medium
JP2016048872A (en) Sound collection device
CN108495235B (en) Method and device for separating heavy and low sounds, computer equipment and storage medium
CN112312258B (en) Intelligent earphone with hearing protection and hearing compensation
JP2016503262A (en) Echo suppression
CN111755021B (en) Voice enhancement method and device based on binary microphone array
CN113707170B (en) Wind noise suppression method, electronic device and storage medium
CN118800268B (en) Voice signal processing method, voice signal processing device and storage medium
CN110992975A (en) Voice signal processing method and device and terminal
CN113205824B (en) Sound signal processing method, device, storage medium, chip and related equipment
CN118553260A (en) Adaptive noise reduction method, device, hearing device and readable storage medium
CN118474625A (en) Audio signal processing method, electronic device, and computer-readable storage medium
US11521637B1 (en) Ratio mask post-filtering for audio enhancement
CN118737111A (en) Speech processing method, device, vehicle, storage medium and program product
CN112634930B (en) Multichannel sound enhancement method and device and electronic equipment
CN118737181A (en) Wind noise suppression method, device, audio equipment and readable storage medium
CN112565979B (en) Speaker frequency response numerical value calculation method and device, electronic equipment and storage medium
CN118782084A (en) Voice activity detection method, device, audio device and readable storage medium
CN118762704A (en) Signal noise reduction method, device, storage medium and computer program product
CN114615586B (en) Headphone noise reduction method, device, electronic device and readable storage medium
WO2024082800A1 (en) Audio processing method and apparatus, and terminal device
CN113345394B (en) Audio data processing method and device, electronic equipment and storage medium
CN111145776B (en) Audio processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination