CN114333870A - Voice processing method and device

Info

Publication number: CN114333870A
Application number: CN202011065103.8A
Authority: CN
Inventors: 魏善义; 吴超; 廖猛; 章烨辉
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2022-04-12

Abstract

The present invention discloses a voice processing method and a related device. The method comprises: performing echo cancellation on n channels of first voice signals to obtain n channels of second voice signals, wherein the first voice signals are collected voice signals; performing beamforming on the n channels of second voice signals to obtain m first beams; obtaining an interference candidate beam from the m first beams; and processing the n channels of second voice signals according to the interference candidate beam to obtain a target voice signal. The method can improve the signal quality of the target voice signal.

Description

Voice processing method and device

Technical field

The present invention relates to the field of speech processing, and in particular to a voice processing method and device.

Background

In existing solutions, speech recognition and voice wake-up usually rely on speech enhancement of the received voice signal to raise the success rate of recognition or wake-up. In harsh scenarios (for example, scenarios with strong external noise), however, existing speech enhancement methods usually cannot enhance the signal well, which leads to a low success rate for speech recognition or voice wake-up.

Summary of the invention

Embodiments of the present invention provide a voice processing method and device that perform echo cancellation on multiple channels of voice signals, perform beamforming on the echo-cancelled voice signals, and process the echo-cancelled voice signals according to the resulting beams to obtain a target voice signal, which can improve the signal quality of the target voice signal.

In a first aspect, an embodiment of the present invention provides a voice processing method, the method including:

performing echo cancellation on n channels of first voice signals to obtain n channels of second voice signals, where the first voice signals are collected voice signals;

performing beamforming on the n channels of second voice signals to obtain m first beams;

obtaining an interference candidate beam from the m first beams; and

processing the n channels of second voice signals according to the interference candidate beam to obtain a target voice signal.

In this example, echo cancellation is performed on the first voice signals to obtain the second voice signals, and the second voice signals are processed according to the interference candidate beam obtained by beamforming the second voice signals. Processing the second voice signals with the interference candidate beam in this way can improve the signal quality of the target voice signal.

With reference to the first aspect, in a possible implementation, obtaining the interference candidate beam from the m first beams includes:

obtaining the frame-level energy of each of the m first beams to obtain m first frame-level energy values;

determining a beam average energy value according to the m first frame-level energy values;

determining count values corresponding to the m first beams according to the m first frame-level energy values and the beam average energy value; and

determining the beam with the largest count value among the count values corresponding to the m first beams as the interference candidate beam.

In this example, each first beam is counted according to its frame-level energy and the beam average energy, and the beam with the largest count is determined as the interference candidate beam. Obtaining the interference candidate beam from frame-level energy and average energy improves the accuracy of interference candidate selection.

With reference to the first aspect, in a possible implementation, processing the n channels of second voice signals according to the interference candidate beam to obtain the target voice signal includes:

obtaining a smoothed energy value of the interference candidate beam;

determining a filter strength value according to the smoothed energy value;

filtering the n channels of second voice signals according to the filter strength value and the interference candidate beam to obtain n channels of third voice signals; and

processing the n channels of third voice signals to obtain the target voice signal.

In this example, the second voice signals are filtered using the interference candidate beam and the filter strength value determined from the smoothed energy to obtain the third voice signals, and the third voice signals are processed to obtain the target voice signal. Filtering the second voice signals with the filter strength value and the interference candidate beam can improve the quality of the filtered third voice signals and therefore the signal quality of the target voice signal.

With reference to the first aspect, in a possible implementation, processing the n channels of third voice signals to obtain the target voice signal includes:

obtaining an i-th third voice signal and a j-th third voice signal from the n channels of third voice signals;

performing dereverberation and blind source separation on the i-th third voice signal and the j-th third voice signal to obtain a first candidate voice signal and a second candidate voice signal;

performing beamforming on the n channels of third voice signals to obtain m second beams;

obtaining a first target beam from the m second beams;

performing at least noise reduction on the first target beam to obtain a processed first target beam; and

determining the target voice signal according to the first candidate voice signal, the second candidate voice signal and the processed first target beam.

In this example, dereverberation and blind source separation are performed on the i-th and j-th third voice signals to obtain the first and second candidate voice signals, and the target voice signal is determined according to the first candidate voice signal, the second candidate voice signal and the noise-reduced first target beam, which can improve the signal quality of the target voice signal.

Determining whether the interference candidate beam is a preset beam and, if the interference candidate beam is a preset beam, filtering the n channels of second voice signals according to the interference candidate beam and a preset filter strength to obtain n channels of fourth voice signals;

performing beamforming on the n channels of fourth voice signals to obtain m third beams;

obtaining a second target beam from the m third beams;

performing at least noise reduction on the second target beam to obtain a processed second target beam;

obtaining an h-th second voice signal and a k-th second voice signal from the n channels of second voice signals;

performing filtering and blind source separation on the h-th second voice signal and the k-th second voice signal to obtain a third candidate voice signal and a fourth candidate voice signal; and

determining the target voice signal according to the third candidate voice signal, the fourth candidate voice signal and the processed second target beam.

In this example, after the interference candidate beam is determined to be the preset beam, the second voice signals are filtered according to the interference candidate beam and the preset filter strength. The target voice signal is then determined from the second target beam, after at least noise reduction, and from the third and fourth candidate voice signals obtained by dereverberation and blind source separation of the h-th and k-th second voice signals, which can improve the signal quality of the target voice signal.

With reference to the first aspect, in a possible implementation, the method further includes:

if the interference candidate beam is not the preset beam, determining the target voice signal according to the interference candidate beam, the third candidate voice signal and the fourth candidate voice signal.

If the interference candidate beam is not the preset beam, determining the target voice signal from the interference candidate beam, the third candidate voice signal and the fourth candidate voice signal can reduce the resources consumed in determining the target voice signal.

In a second aspect, an embodiment of the present application provides a voice processing apparatus, the apparatus including:

a cancellation unit, configured to perform echo cancellation on n channels of first voice signals to obtain n channels of second voice signals, where the first voice signals are collected voice signals;

a beamforming unit, configured to perform beamforming on the n channels of second voice signals to obtain m first beams;

an obtaining unit, configured to obtain an interference candidate beam from the m first beams; and

a processing unit, configured to process the n channels of second voice signals according to the interference candidate beam to obtain a target voice signal.

With reference to the second aspect, in a possible implementation, the obtaining unit is specifically configured to:

With reference to the second aspect, in a possible implementation, the processing unit is specifically configured to:

With reference to the second aspect, in a possible implementation, with regard to processing the n channels of third voice signals to obtain the target voice signal, the processing unit is specifically configured to:

With reference to the second aspect, in a possible implementation, the processing unit is configured to:

With reference to the second aspect, in a possible implementation, the processing unit is further configured to:

In a third aspect, an embodiment of the present invention provides a voice processing apparatus, including:

a memory for storing instructions; and

at least one processor coupled to the memory;

where, when the at least one processor executes the instructions, the instructions cause the processor to perform all or part of the method of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform all or part of the method of the first aspect.

These and other aspects of the present invention will be more readily understood from the description of the following embodiments.

Description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of speech enhancement according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a voice processing method according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a voice processing apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of another voice processing apparatus according to an embodiment of the present invention.

Detailed description

The embodiments of the present application are described below with reference to the accompanying drawings.

The following first introduces the main current approach, which applies speech enhancement ahead of voice wake-up and speech recognition. As shown in FIG. 1, speech enhancement is performed on the input signal, and voice wake-up or speech recognition is performed on the enhanced voice signal. Voice wake-up and speech recognition can be applied in human-machine voice interaction scenarios, in which more and more people use voice to wake up an electronic device or to send instructions to it.

The electronic device of the present application may be a terminal device such as a mobile phone, a tablet, a speaker or a television. The voice processing method may process the interference-laden signal picked up by the microphone and output a clean voice signal as the input to the wake-up and recognition engines.

The voice processing method is described in detail below.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of a voice processing method according to an embodiment of the present application. As shown in FIG. 2, the voice processing method includes:

S201. Perform echo cancellation on n channels of first voice signals to obtain n channels of second voice signals, where the first voice signals are collected voice signals.

The n channels of first voice signals may be n channels of voice signals captured by a microphone array (sound pickup array), which may include n microphones. n is a positive integer greater than or equal to 2.

Echo cancellation may be performed on the n channels of first voice signals according to a reference signal, specifically by means of an ACE algorithm; the reference signal may be a preset signal.
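The text does not detail how the reference signal is used. As a minimal sketch only, assuming a normalized LMS (NLMS) adaptive filter, which is a common choice for echo cancellation against a reference signal, the step could look like the following. The function names, tap count and step size are illustrative assumptions and are not taken from the patent.

```python
def nlms_echo_cancel(mic, ref, taps=4, mu=0.5, eps=1e-8):
    """Remove an adaptively estimated echo of `ref` from one microphone channel.

    Assumption: the echo is a short linear filtering of the reference signal.
    """
    w = [0.0] * taps      # current estimate of the echo path
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_est                       # echo-cancelled sample
        norm = sum(b * b for b in buf) + eps   # input power for normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out


def cancel_all_channels(first_signals, ref, **kwargs):
    """Apply the canceller to each of the n first voice signals independently."""
    return [nlms_echo_cancel(ch, ref, **kwargs) for ch in first_signals]
```

Each of the n channels is processed independently here, so the output keeps the n-channel layout that the beamforming step expects.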

S202. Perform beamforming on the n channels of second voice signals to obtain m first beams.

The n channels of second voice signals may be beamformed by a beamforming algorithm or the like to obtain the m first beams. m is independent of n: m may be equal to n or different from n.
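The beamforming algorithm is left open in the text. A minimal sketch, assuming a time-domain delay-and-sum beamformer with integer sample delays, shows how m beams can be formed from n channels with m chosen independently of n. The steering delays and function names are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Form one beam: delay channel c by delays[c] samples, then average."""
    n = len(channels)
    length = len(channels[0])
    beam = []
    for t in range(length):
        acc = 0.0
        for ch, d in zip(channels, delays):
            if 0 <= t - d < length:   # samples outside the frame count as zero
                acc += ch[t - d]
        beam.append(acc / n)
    return beam


def form_beams(channels, steering_delays):
    """Return one beam per entry in steering_delays (m beams from n channels)."""
    return [delay_and_sum(channels, d) for d in steering_delays]
```

With two channels and three sets of steering delays, for example, `form_beams` returns three beams, one per look direction.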

S203. Obtain an interference candidate beam from the m first beams.

The interference candidate beam may be determined from the frame-level energy values corresponding to the m first beams, specifically from those frame-level energy values and a beam average energy value. The beam average energy value can be understood as the average energy of the m beams.

S204. Process the n channels of second voice signals according to the interference candidate beam to obtain a target voice signal.

The n channels of second voice signals may be filtered to obtain n channels of third voice signals, and the n channels of third voice signals are then processed to obtain the target voice signal. Alternatively, it is determined whether the interference candidate beam is a preset beam, and the second voice signals are processed according to the result to obtain the target voice signal. The preset beam may be a continuous-interference beam, that is, a beam in which continuous noise interference is present.

In a possible implementation, one possible method for obtaining the interference candidate beam from the m first beams includes:

A1. Obtain the frame-level energy of each of the m first beams to obtain m first frame-level energy values;

A2. Determine a beam average energy value according to the m first frame-level energy values;

A3. Determine count values corresponding to the m first beams according to the m first frame-level energy values and the beam average energy value;

A4. Determine the beam with the largest count value among the count values corresponding to the m first beams as the interference candidate beam.

The frame-level energy value of a first beam may be obtained from the time-domain samples of each frame of data of that beam. Specifically, the frame-level energy may be obtained by the following formula:

where i is the index of the first beam, with range [1, m]; K is the number of time-domain samples in each frame of data; k has range [1, K]; and P_i is the frame-level energy of the i-th first beam.
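The formula itself does not survive in this text. A plausible reading, given the symbols defined above, is the mean squared amplitude of the K samples in the frame. The sketch below assumes that form; the 1/K normalization in particular is an assumption, and the function names are illustrative.

```python
def frame_energy(frame):
    """Frame-level energy of one beam: mean of squared samples (assumed form)."""
    K = len(frame)
    if K == 0:
        raise ValueError("frame must contain at least one sample")
    return sum(x * x for x in frame) / K


def beam_frame_energies(current_frames):
    """Return [P_1, ..., P_m] for the current frame of each of the m beams."""
    return [frame_energy(f) for f in current_frames]
```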

The m first frame-level energy values may be averaged directly to obtain the beam average energy value. Specifically, the beam average energy value may be obtained by the following formula:

$$P_{avg} = \frac{1}{m} \sum_{i=1}^{m} P_i$$

where P_avg is the beam average energy value, m is the total number of frame-level energy values, and P_i is the i-th frame-level energy value.

The count values corresponding to the m first beams may be determined according to the m first frame-level energy values and the beam average energy value as follows:

First, a VAD module determines from the beam data B1, B2, …, Bm whether the current frame of each beam is in a quiet state: VAD = 0 indicates that the current frame is quiet, and VAD = 1 indicates that it is not quiet.

When the current frame of the i-th beam is not quiet, the counter C_i corresponding to that beam is updated as follows:

where P_avg is the beam average energy value and P_i is the i-th first frame-level energy value.
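The counter update rule is not reproduced in this text. The following sketch assumes the simplest rule consistent with the description: for a non-quiet frame, C_i is incremented when P_i exceeds P_avg and is otherwise left unchanged. That rule, and the function names, are assumptions; step A4 (picking the largest count) follows the text.

```python
def update_counts(counts, energies, vad_flags):
    """Update per-beam counters for one frame.

    counts    : running counters C_1..C_m (modified in place and returned)
    energies  : frame-level energies P_1..P_m of the current frame
    vad_flags : 1 if the beam's current frame is non-quiet, else 0
    """
    p_avg = sum(energies) / len(energies)      # beam average energy value
    for i, (p, vad) in enumerate(zip(energies, vad_flags)):
        if vad == 1 and p > p_avg:             # assumed update condition
            counts[i] += 1
    return counts


def pick_interference_candidate(counts):
    """Step A4: index of the beam with the largest count value."""
    return max(range(len(counts)), key=lambda i: counts[i])
```

A beam that repeatedly carries more energy than the average across beams accumulates the largest count and is selected as the interference candidate.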

In a possible implementation, one possible method for processing the n channels of second voice signals according to the interference candidate beam to obtain the target voice signal includes:

B1. Obtain a smoothed energy value of the interference candidate beam;

B2. Determine a filter strength value according to the smoothed energy value;

B3. Filter the n channels of second voice signals according to the filter strength value and the interference candidate beam to obtain n channels of third voice signals;

B4. Process the n channels of third voice signals to obtain the target voice signal.

The smoothed energy value may be determined from the smoothed energy value of the previous data frame and the maximum beam energy value of the current data frame. Specifically, it may be determined by the following formula:

where the quantities involved are the smoothed energy value of the interference candidate beam, the smoothed energy value of the previous frame, and P_max, the maximum energy value among the beams in the current data frame.
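The smoothing formula is not reproduced in this text. A common way to combine the previous smoothed value with the current maximum is first-order recursive averaging, sketched below. The recursive form and the coefficient `alpha` are assumptions, not values from the patent.

```python
def smooth_energy(prev_smoothed, p_max, alpha=0.9):
    """One step of first-order recursive smoothing (assumed form).

    prev_smoothed : smoothed energy value of the previous frame
    p_max         : maximum beam energy value in the current frame
    alpha         : hypothetical smoothing coefficient in [0, 1)
    """
    return alpha * prev_smoothed + (1.0 - alpha) * p_max
```

A larger `alpha` makes the smoothed energy react more slowly to a single loud frame.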

The filter strength value may be determined from the maximum beam energy, specifically by the following formula:

where β is the filter strength, H is the high threshold, L is the low threshold, and H > L. When the smoothed energy value exceeds the high threshold H, β = 1; when it is below the low threshold L, β = 0. H and L are preset values that can be set from empirical values or historical data.
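Only the two end cases (β = 1 above H, β = 0 below L) are stated in the text. The sketch below fills the range between L and H with linear interpolation, which is an assumption; the function name is illustrative.

```python
def filter_strength(smoothed_energy, low, high):
    """Map smoothed energy to a filter strength beta in [0, 1].

    beta = 1 above `high` and beta = 0 below `low`, as stated in the text;
    the linear ramp in between is an assumed interpolation.
    """
    if high <= low:
        raise ValueError("high threshold must exceed low threshold")
    if smoothed_energy >= high:
        return 1.0
    if smoothed_energy <= low:
        return 0.0
    return (smoothed_energy - low) / (high - low)
```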

The interference candidate beam may be used as a reference beam, and the filter gain adjusted by the filter strength value, to filter the n channels of second voice signals and obtain the n channels of third voice signals. The second voice signals may be filtered by linear adaptive filtering, for example Kalman filtering: with the interference candidate beam as the reference beam, the gain of the filter is adjusted by the filter strength value to obtain the third voice signals.
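The text names Kalman filtering as one example of a linear adaptive filter. For brevity, the sketch below swaps in a normalized LMS filter instead and applies the filter strength β as a gain on the estimated interference before subtraction. Both the NLMS substitution and the way β enters are assumptions; names and parameter values are illustrative.

```python
def suppress_interference(signal, ref_beam, beta, taps=4, mu=0.5, eps=1e-8):
    """Filter one second voice signal using the interference beam as reference.

    beta = 0 leaves the signal untouched; beta = 1 subtracts the full
    interference estimate.
    """
    w = [0.0] * taps      # adaptive filter weights
    buf = [0.0] * taps    # most recent reference-beam samples, newest first
    out = []
    for d, x in zip(signal, ref_beam):
        buf = [x] + buf[:-1]
        est = sum(wi * bi for wi, bi in zip(w, buf))   # interference estimate
        out.append(d - beta * est)                     # strength-scaled subtraction
        err = d - est                                  # adapt on the unscaled error
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * err * bi / norm for wi, bi in zip(w, buf)]
    return out


def filter_all_channels(second_signals, ref_beam, beta, **kwargs):
    """Produce the n third voice signals from the n second voice signals."""
    return [suppress_interference(s, ref_beam, beta, **kwargs) for s in second_signals]
```

Adapting on the unscaled error keeps the interference estimate accurate even while β is small, so suppression can ramp up smoothly when the smoothed energy rises.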

Beamforming may be performed on the n channels of third voice signals to obtain m second beams, and the processed first target beam is determined from the second beams. Two voice signals are also taken from the n channels of third voice signals and processed to obtain candidate voice signals, and the target voice signal is determined according to the candidate voice signals and the processed target beam.

In a possible implementation, one possible method for processing the n channels of third voice signals to obtain the target voice signal includes:

C1. Obtain an i-th third voice signal and a j-th third voice signal from the n channels of third voice signals;

C2. Perform dereverberation and blind source separation on the i-th third voice signal and the j-th third voice signal to obtain a first candidate voice signal and a second candidate voice signal;

C3. Perform beamforming on the n channels of third voice signals to obtain m second beams;

C4. Obtain a first target beam from the m second beams;

C5. Perform at least noise reduction on the first target beam to obtain a processed first target beam;

C6. Determine the target voice signal according to the first candidate voice signal, the second candidate voice signal and the processed first target beam.

The i-th and j-th third voice signals may be obtained by taking the third voice signals corresponding to the two microphones that are farthest apart in the microphone array.
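Choosing the two farthest-apart microphones can be sketched as a search over all microphone pairs. The sketch assumes the microphone coordinates are known; the coordinate list and function name are hypothetical.

```python
from itertools import combinations
from math import dist


def farthest_pair(mic_positions):
    """Return indices (i, j) of the two microphones with the largest spacing."""
    return max(
        combinations(range(len(mic_positions)), 2),
        key=lambda pair: dist(mic_positions[pair[0]], mic_positions[pair[1]]),
    )
```

For a linear array this simply returns the two end microphones; the returned indices select the i-th and j-th third voice signals.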

Performing dereverberation and blind source separation on the i-th third voice signal and the j-th third voice signal to obtain the first candidate voice signal and the second candidate voice signal may specifically involve a cross operation or the like on the two signals. The dereverberation and blind source separation may be carried out by a dereverberation module and a blind source separation module, and the blind source separation module may be implemented with the IVA algorithm.

For the method of obtaining the first target beam from the m second beams, refer to the method of obtaining the interference candidate beam in the foregoing embodiment; details are not repeated here.

The target voice signal may be determined according to the first candidate voice signal, the second candidate voice signal and the processed first target beam in any of the following ways: any one of the three may be determined as the target voice signal; any combination of the three may be determined as the target voice signal; or any combination of the three may be processed further and the resulting signal determined as the target voice signal.

The wake-up enhancement signal on the first-target-beam path, by filtering out the interference beam, effectively improves speech enhancement and interference suppression in external-noise scenarios, and the wake-up enhancement signals on the i-th and j-th paths, through the blind source separation algorithm, improve the quality of target-signal extraction when the target and the interference lie in the same direction. Determining the target voice signal from the processed first target beam and the first and second candidate voice signals therefore improves the quality of the target voice signal, so that speech recognition and voice wake-up using the target voice signal become more accurate.

In a possible implementation, another possible method for processing the n channels of second voice signals according to the interference candidate beam to obtain the target voice signal may be:

D1. Determine whether the interference candidate beam is a preset beam; if the interference candidate beam is a preset beam, filter the n channels of second voice signals according to the interference candidate beam and a preset filter strength to obtain n channels of fourth voice signals;

D2. Perform beamforming on the n channels of fourth voice signals to obtain m third beams;

D3. Obtain a second target beam from the m third beams;

D4. Perform at least noise reduction on the second target beam to obtain a processed second target beam;

D5. Obtain an h-th second voice signal and a k-th second voice signal from the n channels of second voice signals;

D6. Perform filtering and blind source separation on the h-th second voice signal and the k-th second voice signal to obtain a third candidate voice signal and a fourth candidate voice signal;

D7. Determine the target voice signal according to the third candidate voice signal, the fourth candidate voice signal and the processed second target beam.

For the specific implementation of steps D2 to D4, refer to the method of obtaining the first target beam in the foregoing embodiment; details are not repeated here.

When filtering and blind source separation are performed on the h-th second voice signal and the k-th second voice signal, dereverberation may also be performed on them to obtain the third candidate voice signal and the fourth candidate voice signal. For the method of filtering the h-th and k-th second voice signals, refer to the foregoing method of filtering the n channels of second voice signals.

The preset beam may be a continuous-interference beam, that is, a beam in which continuous noise interference is present.

The h-th and k-th second voice signals may be obtained from the n channels of second voice signals in the same way as the i-th and j-th third voice signals in the foregoing embodiment; details are not repeated here.

For determining the target voice signal according to the third candidate voice signal, the fourth candidate voice signal and the processed second target beam, refer to the determination of the target voice signal according to the first candidate voice signal, the second candidate voice signal and the processed first target beam in the foregoing embodiment; details are not repeated here.

In a possible implementation, if the interference candidate beam is not a preset beam, the target speech signal may be obtained as follows:

The target speech signal is determined according to the interference candidate beam, the third candidate speech signal and the fourth candidate speech signal. For details, refer to the foregoing embodiment, in which the target speech signal is determined according to the first candidate speech signal, the second candidate speech signal and the processed first target beam; details are not repeated here.
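The choice between the preset-beam path (steps D1 to D7) and the fallback path can be sketched as plain control flow. The following is a minimal Python sketch under stated assumptions, not the implementation of this embodiment: the `Stages` container, every callable in it, and the function name `determine_target` are hypothetical stand-ins, and the toy lambdas in the demo exist only to make the flow executable. The sketch also assumes that steps D5 and D6 run on both paths, because the fallback path uses the third and fourth candidate speech signals.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


@dataclass
class Stages:
    """Hypothetical stand-ins for the processing blocks named in D1 to D7."""
    is_preset_beam: Callable[[Signal], bool]
    filter_channels: Callable[[Sequence[Signal], Signal, float], List[Signal]]
    beamform: Callable[[Sequence[Signal]], List[Signal]]
    pick_target_beam: Callable[[Sequence[Signal]], Signal]
    denoise: Callable[[Signal], Signal]
    pick_pair: Callable[[Sequence[Signal]], Tuple[Signal, Signal]]
    filter_and_separate: Callable[[Signal, Signal], Tuple[Signal, Signal]]
    decide: Callable[[Signal, Signal, Signal], Signal]


def determine_target(second_signals: Sequence[Signal], candidate_beam: Signal,
                     preset_strength: float, s: Stages) -> Signal:
    # D5, D6: assumed here to run on both paths.
    h_sig, k_sig = s.pick_pair(second_signals)
    third_cand, fourth_cand = s.filter_and_separate(h_sig, k_sig)

    if s.is_preset_beam(candidate_beam):
        # D1: filter all n channels with the preset filter strength.
        fourth_signals = s.filter_channels(second_signals, candidate_beam, preset_strength)
        # D2: beamform the filtered channels into the third beams.
        third_beams = s.beamform(fourth_signals)
        # D3, D4: select the second target beam and denoise it.
        target_beam = s.denoise(s.pick_target_beam(third_beams))
        # D7: combine the candidates with the processed second target beam.
        return s.decide(third_cand, fourth_cand, target_beam)

    # Fallback: the interference candidate beam is used with the candidates.
    return s.decide(third_cand, fourth_cand, candidate_beam)


# Toy stages (illustrative only) so that the control flow can be executed.
toy = Stages(
    is_preset_beam=lambda beam: True,
    filter_channels=lambda sigs, beam, g: [[x - g * b for x, b in zip(ch, beam)] for ch in sigs],
    beamform=lambda sigs: [[sum(col) / len(col) for col in zip(*sigs)]],
    pick_target_beam=lambda beams: beams[0],
    denoise=lambda beam: list(beam),
    pick_pair=lambda sigs: (sigs[0], sigs[-1]),
    filter_and_separate=lambda h, k: (list(h), list(k)),
    decide=lambda c3, c4, beam: list(beam),
)

second = [[1.0, 2.0], [3.0, 4.0]]
candidate = [1.0, 1.0]
preset_result = determine_target(second, candidate, 0.5, toy)
fallback_result = determine_target(
    second, candidate, 0.5, replace(toy, is_preset_beam=lambda beam: False))
```

Passing the stages in as callables keeps the branch logic separate from the signal-processing details, which this passage defers to the foregoing embodiments.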

In this example, de-reverberation and blind source separation are performed on the i-th third speech signal and the j-th third speech signal, and the target speech signal is determined from the resulting first and second candidate speech signals together with the noise-reduced first target beam, which can improve the signal quality of the target speech signal.

Referring to FIG. 3, FIG. 3 is a schematic structural diagram of a speech processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the speech processing apparatus 30 includes:

an elimination unit 301, configured to perform echo cancellation on n channels of first speech signals to obtain n channels of second speech signals, where the first speech signals are collected speech signals;

a beamforming unit 302, configured to perform beamforming on the n channels of second speech signals to obtain m channels of first beams;

an obtaining unit 303, configured to obtain an interference candidate beam from the m channels of first beams; and

a processing unit 304, configured to process the n channels of second speech signals according to the interference candidate beam to obtain a target speech signal.
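How the four units hand data to one another can be illustrated with a short Python sketch. This is a hedged illustration rather than the apparatus itself: the class name `SpeechProcessingApparatus`, its constructor arguments, and the helper functions are hypothetical, and the toy callables in the demo are placeholders. The candidate selection follows the count-based scheme recited in claim 2, but the exact counting rule used here (a beam's count increases in each frame where its energy exceeds the average over all beams) is an assumption.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def frame_energies(beam: Signal, frame_len: int) -> List[float]:
    """Energy of each non-overlapping frame of one beam."""
    return [sum(x * x for x in beam[i:i + frame_len])
            for i in range(0, len(beam) - frame_len + 1, frame_len)]


def pick_interference_candidate(beams: Sequence[Signal], frame_len: int) -> int:
    """Index of the beam with the largest count (assumed counting rule)."""
    energies = [frame_energies(b, frame_len) for b in beams]
    counts = [0] * len(beams)
    for t in range(len(energies[0])):
        avg = sum(e[t] for e in energies) / len(beams)  # beam average energy
        for idx, e in enumerate(energies):
            if e[t] > avg:  # assumption: count frames above the average
                counts[idx] += 1
    return max(range(len(beams)), key=counts.__getitem__)


class SpeechProcessingApparatus:
    """Wires units 301 to 304 together; each unit is an injected callable."""

    def __init__(self,
                 cancel_echo: Callable[[Sequence[Signal]], List[Signal]],
                 beamform: Callable[[Sequence[Signal]], List[Signal]],
                 obtain_candidate: Callable[[Sequence[Signal]], Signal],
                 process: Callable[[Signal, Sequence[Signal]], Signal]) -> None:
        self.elimination_unit = cancel_echo      # unit 301
        self.beamforming_unit = beamform         # unit 302
        self.obtaining_unit = obtain_candidate   # unit 303
        self.processing_unit = process           # unit 304

    def run(self, first_signals: Sequence[Signal]) -> Signal:
        second = self.elimination_unit(first_signals)    # n second signals
        beams = self.beamforming_unit(second)            # m first beams
        candidate = self.obtaining_unit(beams)           # interference candidate
        return self.processing_unit(candidate, second)   # target speech signal


# Toy units (illustrative only): no real echo cancellation or beamforming.
apparatus = SpeechProcessingApparatus(
    cancel_echo=lambda sigs: [list(ch) for ch in sigs],
    beamform=lambda sigs: [list(ch) for ch in sigs],
    obtain_candidate=lambda beams: beams[pick_interference_candidate(beams, 2)],
    process=lambda cand, sigs: [x - c for x, c in zip(sigs[0], cand)],
)

first = [[1.0, 1.0, 2.0, 1.0], [4.0, -4.0, 3.0, 3.0]]
target = apparatus.run(first)
```

In the demo the second channel has the higher energy in both frames, so it is selected as the interference candidate and the toy processing unit subtracts it from the first channel.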

In a possible implementation, the obtaining unit 303 is specifically configured to:

In a possible implementation, the processing unit 304 is specifically configured to:

In a possible implementation, with respect to processing the n channels of third speech signals to obtain the target speech signal, the processing unit 304 is specifically configured to:

In a possible implementation, the processing unit 304 is configured to:

In a possible implementation, the processing unit 304 is further configured to:

In this embodiment, the speech processing apparatus 300 is presented in the form of units. A "unit" here may refer to an application-specific integrated circuit (ASIC), a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or another device that can provide the above functions. In addition, the elimination unit 301, the beamforming unit 302, the obtaining unit 303 and the processing unit 304 may be implemented by the processor 401 of the speech processing apparatus shown in FIG. 4.

As shown in FIG. 4, the speech processing apparatus 40 may be implemented with the structure in FIG. 4. The speech processing apparatus 400 includes at least one processor 401, at least one memory 402 and at least one communication interface 403. The processor 401, the memory 402 and the communication interface 403 are connected to, and communicate with, one another via a communication bus.

The processor 401 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the programs of the above solutions.

The communication interface 403 is configured to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).

The memory 402 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions. It may also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor via a bus, or it may be integrated with the processor.

The memory 402 is configured to store application program code for executing the above solutions, and execution is controlled by the processor 401. The processor 401 is configured to execute the application program code stored in the memory 402.

The code stored in the memory 402 can execute the speech processing method provided above: performing echo cancellation on n channels of first speech signals to obtain n channels of second speech signals, where the first speech signals are collected speech signals; performing beamforming on the n channels of second speech signals to obtain m channels of first beams; obtaining an interference candidate beam from the m channels of first beams; and processing the n channels of second speech signals according to the interference candidate beam to obtain a target speech signal.

An embodiment of the present invention further provides a computer-readable storage medium. The computer-readable storage medium may store a program which, when executed, performs some or all of the steps of any one of the speech processing methods described in the above method embodiments.

It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

Those of ordinary skill in the art can understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

The embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A speech processing method, characterized in that the method comprises:

performing echo cancellation on n channels of first speech signals to obtain n channels of second speech signals, wherein the first speech signals are collected speech signals;

performing beamforming on the n channels of second speech signals to obtain m channels of first beams;

obtaining an interference candidate beam from the m channels of first beams; and

processing the n channels of second speech signals according to the interference candidate beam to obtain a target speech signal.

2. The method according to claim 1, wherein the obtaining an interference candidate beam from the m channels of first beams comprises:

obtaining frame-level energy corresponding to the m channels of first beams to obtain m first frame-level energy values;

determining a beam average energy value according to the m first frame-level energy values;

determining count values corresponding to the m channels of first beams according to the m first frame-level energy values and the beam average energy value; and

determining the beam corresponding to the largest count value among the count values corresponding to the m channels of first beams as the interference candidate beam.

3. The method according to claim 1 or 2, wherein the processing the n channels of second speech signals according to the interference candidate beam to obtain a target speech signal comprises:

obtaining a smoothed energy value of the interference candidate beam;

determining a filter strength value according to the smoothed energy value;

filtering the n channels of second speech signals according to the filter strength value and the interference candidate beam to obtain n channels of third speech signals; and

processing the n channels of third speech signals to obtain the target speech signal.

4. The method according to claim 3, wherein the processing the n channels of third speech signals to obtain the target speech signal comprises:

obtaining the i-th third speech signal and the j-th third speech signal from the n channels of third speech signals;

performing de-reverberation and blind source separation on the i-th third speech signal and the j-th third speech signal to obtain a first candidate speech signal and a second candidate speech signal;

performing beamforming on the n channels of third speech signals to obtain m channels of second beams;

obtaining a first target beam from the m channels of second beams;

performing at least noise reduction on the first target beam to obtain a processed first target beam; and

determining the target speech signal according to the first candidate speech signal, the second candidate speech signal and the processed first target beam.

5. The method according to claim 1 or 2, wherein the processing the n channels of second speech signals according to the interference candidate beam to obtain a target speech signal comprises:

determining whether the interference candidate beam is a preset beam, and if the interference candidate beam is a preset beam, filtering the n channels of second speech signals according to the interference candidate beam and a preset filter strength to obtain n channels of fourth speech signals;

performing beamforming on the n channels of fourth speech signals to obtain m channels of third beams;

obtaining a second target beam from the m channels of third beams;

performing at least noise reduction on the second target beam to obtain a processed second target beam;

obtaining the h-th second speech signal and the k-th second speech signal from the n channels of second speech signals;

performing filtering and blind source separation on the h-th second speech signal and the k-th second speech signal to obtain a third candidate speech signal and a fourth candidate speech signal; and

determining the target speech signal according to the third candidate speech signal, the fourth candidate speech signal and the processed second target beam.

6. The method according to claim 5, wherein the method further comprises:

if the interference candidate beam is not a preset beam, determining the target speech signal according to the interference candidate beam, the third candidate speech signal and the fourth candidate speech signal.

7. A speech processing apparatus, wherein the apparatus comprises:

an elimination unit, configured to perform echo cancellation on n channels of first speech signals to obtain n channels of second speech signals, wherein the first speech signals are collected speech signals;

a beamforming unit, configured to perform beamforming on the n channels of second speech signals to obtain m channels of first beams;

an obtaining unit, configured to obtain an interference candidate beam from the m channels of first beams; and

a processing unit, configured to process the n channels of second speech signals according to the interference candidate beam to obtain a target speech signal.

8. The apparatus according to claim 7, wherein the obtaining unit is specifically configured to:

determine a beam average energy value according to the m first frame-level energy values;

9. The apparatus according to claim 7 or 8, wherein the processing unit is specifically configured to:

obtain a smoothed energy value of the interference candidate beam;

determine a filter strength value according to the smoothed energy value;

process the n channels of third speech signals to obtain a target speech signal.

10. The apparatus according to claim 9, wherein, with respect to processing the n channels of third speech signals to obtain the target speech signal, the processing unit is specifically configured to:

perform de-reverberation and blind source separation on the i-th third speech signal and the j-th third speech signal to obtain a first candidate speech signal and a second candidate speech signal;

obtain a first target beam from the m channels of second beams;

11. The apparatus according to claim 7 or 8, wherein the processing unit is configured to:

obtain a second target beam from the m channels of third beams;

12. The apparatus according to claim 11, wherein the processing unit is further configured to:

13. A speech processing apparatus, comprising:

a memory for storing instructions; and

at least one processor coupled to the memory;

wherein, when the at least one processor executes the instructions, the instructions cause the processor to perform the method according to any one of claims 1 to 6.

14. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 6.