CN111883155B

CN111883155B - Echo cancellation method, device and storage medium

Info

Publication number: CN111883155B
Application number: CN202010700907.4A
Authority: CN
Inventors: 马路; 赵培; 苏腾荣
Original assignee: Haier Uplus Intelligent Technology Beijing Co Ltd
Current assignee: Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date: 2020-07-17
Filing date: 2020-07-17
Publication date: 2023-10-27
Anticipated expiration: 2040-07-17
Also published as: CN111883155A

Abstract

The invention provides an echo cancellation method, an echo cancellation device and a storage medium. The echo cancellation method comprises: obtaining a predicted echo signal from a far-end signal through a nonlinear filter, wherein the nonlinear filter is constructed based on a neural network and, while performing the forward calculation on the far-end signal, applies a nonlinear function to the weighted-summation result of each node; and performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal. The invention addresses the poor nonlinear echo suppression and high processing complexity of schemes that combine linear filtering with nonlinear processing. It improves the estimation accuracy of nonlinear echo and thereby improves the echo cancellation effect.

Description

Echo cancellation method, device and storage medium

Technical Field

The present invention relates to the field of artificial intelligence, and in particular, to an echo cancellation method, apparatus, and storage medium.

Background

Voice signal processing is currently a key technology in the field of human-machine interaction. An echo cancellation algorithm removes, from the signal received by a device's microphone, the sound played by the device itself. It is a key algorithm for overall voice signal processing and speech enhancement and has a strong effect on back-end speech recognition.

FIG. 1 is a schematic diagram of echo cancellation. As shown in FIG. 1, the echo cancellation method in the open-source tool WebRTC (Web Real-Time Communication) uses an adaptive filter to estimate the echo and thereby cancel the linear echo, and uses nonlinear processing to suppress the residual nonlinear echo. This method cancels linear echo well, but nonlinear echo and time-delay estimation errors leave residual echo. Although nonlinear processing suppresses the residual echo to a certain extent, the degree of suppression is limited and some residual echo remains, particularly echo in complex environments and nonlinear echo introduced by the device loudspeaker. This degrades the final echo cancellation result and, in turn, the processing performance of the whole sound signal chain. In addition, the nonlinear processing in the conventional echo cancellation method has high computational complexity and takes up half of the computation time of the whole echo cancellation algorithm.

Disclosure of Invention

The embodiments of the invention provide an echo cancellation method, an echo cancellation device and a storage medium, which at least solve the problems of poor nonlinear echo suppression and high processing complexity in schemes that realize echo cancellation by combining linear filtering with nonlinear processing.

According to an embodiment of the present invention, an echo cancellation method is provided, including: obtaining a predicted echo signal from a far-end signal through a nonlinear filter, wherein the nonlinear filter is constructed based on a neural network and, while performing the forward calculation on the far-end signal based on the neural network, applies a nonlinear function to the weighted-summation result of each node; and performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

In at least one example embodiment, the method further comprises: determining an error signal based on the predicted echo signal and a desired input signal; and adjusting the weight coefficient of the neural network according to the error signal.

In at least one example embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and adjusting the weight coefficients of the neural network according to the error signal includes: calculating the adjusted weight coefficient according to the error signal e(k) as

$$w_{ij}^{l}(k+1) = w_{ij}^{l}(k) + \mu\,\Delta w_{ij}^{l}(k),$$

where e(k) = d(k) − o(k), o(k) is the predicted echo signal, d(k) is the desired input signal, $w_{ij}^{l}(k)$ is the weight coefficient from the i-th node of layer l−1 to the j-th node of layer l of the neural network at time k, $w_{ij}^{l}(k+1)$ is the same weight coefficient at time k+1, μ is the adjustment step size, and $\Delta w_{ij}^{l}(k)$ is the change of the weight coefficient, obtained by taking the partial derivative of the error with respect to $w_{ij}^{l}(k)$, i.e. $\Delta w_{ij}^{l}(k) = -\partial E(k)/\partial w_{ij}^{l}(k)$ with $E(k) = \tfrac{1}{2}e^{2}(k)$.

In at least one exemplary embodiment, before determining the error signal from the predicted echo signal and the desired input signal and adjusting the weight coefficients of the neural network according to the error signal, the method further comprises: performing double-ended detection on the near-end signal and the far-end signal to determine whether sound is present at the near end and at the far end, respectively; and, when the near end has no sound and the far end has sound, proceeding to the steps of determining the error signal from the predicted echo signal and the desired input signal and adjusting the weight coefficients of the neural network according to the error signal, wherein the desired input signal comprises the near-end signal input by the microphone when the near end has no sound and the far end has sound.

In at least one exemplary embodiment, before obtaining the predicted echo signal from the far-end signal through the nonlinear filter and performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal, the method further comprises: performing double-ended detection on the near-end signal and the far-end signal to determine whether sound is present at the near end and at the far end, respectively; and, when the far end has sound, proceeding to the steps of obtaining the predicted echo signal from the far-end signal through the nonlinear filter and performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

In at least one exemplary embodiment, performing double-ended detection on the near-end signal and the far-end signal to determine whether sound is present at the near end and at the far end, respectively, includes: acquiring a first energy value of the near-end signal and a second energy value of the far-end signal; determining that the near end has no sound when the first energy value is below a first sound determination threshold, and that the near end has sound when the first energy value is not below the first sound determination threshold; and determining that the far end has no sound when the second energy value is below a second sound determination threshold, and that the far end has sound when the second energy value is not below the second sound determination threshold.

In at least one example embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and obtaining the predicted echo signal from the far-end signal through the nonlinear filter includes: taking the far-end signal as the input of the input-layer nodes of the neural network and performing the forward calculation on it, wherein, layer by layer, the output values of the nodes of the previous layer are weighted and summed according to the weight coefficients from the previous-layer nodes to the current-layer nodes to obtain a predicted value, and nonlinear processing is applied to the predicted value to obtain the output value of each current-layer node, until the output value of the output-layer node is obtained and used as the predicted echo signal.

According to another embodiment of the present invention, an echo cancellation device is provided, including: a nonlinear filter configured to obtain a predicted echo signal based on a far-end signal, wherein the nonlinear filter is constructed based on a neural network and, while performing the forward calculation on the far-end signal based on the neural network, applies a nonlinear function to the weighted-summation result of each node; and an echo cancellation module configured to perform echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

In at least one example embodiment, the apparatus further comprises: an error determination module for determining an error signal from the predicted echo signal and a desired input signal and inputting the error signal to the nonlinear filter; the nonlinear filter is used for adjusting the weight coefficient of the neural network according to the error signal.

In at least one example embodiment, the apparatus further comprises a double-ended detection module configured to: perform double-ended detection on the near-end signal and the far-end signal to determine whether sound is present at the near end and at the far end, respectively; when the near end has no sound and the far end has sound, enable the function of the nonlinear filter that adjusts the weight coefficients of the neural network according to the error signal, wherein the desired input signal includes the near-end signal input by the microphone when the near end has no sound and the far end has sound; and/or, when the far end has sound, enable the function of the echo cancellation module that performs echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

In at least one example embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and the nonlinear filter obtains the predicted echo signal from the far-end signal by: taking the far-end signal as the input of the input-layer nodes of the neural network and performing the forward calculation on it, wherein, layer by layer, the output values of the nodes of the previous layer are weighted and summed according to the weight coefficients from the previous-layer nodes to the current-layer nodes to obtain a predicted value, and nonlinear processing is applied to the predicted value to obtain the output value of each current-layer node, until the output value of the output-layer node is obtained and used as the predicted echo signal.

According to a further embodiment of the invention, there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.

According to the present application, the far-end signal is input into a nonlinear filter constructed based on a neural network to obtain a predicted echo signal, and echo cancellation is performed on the near-end signal input by the microphone according to the predicted echo signal. In the embodiments of the present application, because the nonlinear filter applies nonlinear processing to the weighted-summation result of each node during the forward computation of the far-end signal, it can replace the conventional combination of a linear adaptive filter and a nonlinear processing module, so no separate nonlinear processing stage for residual nonlinear echo is needed. This improves the estimation accuracy of nonlinear echo and hence the echo cancellation effect, solving the problems of poor nonlinear echo suppression and high processing complexity in schemes that combine linear filtering with nonlinear processing.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 is a schematic diagram of echo cancellation;

FIG. 2 is a flowchart of an echo cancellation method according to embodiment 1 of the present application;

FIG. 3 is a block diagram of the structure of an echo cancellation device according to embodiment 2 of the present application;

FIG. 4 is an exemplary structural block diagram of an echo cancellation device according to embodiment 2 of the present application;

FIG. 5 is a schematic diagram of an echo cancellation algorithm based on nonlinear adaptive filtering according to embodiment 4 of the present application;

FIG. 6 is a flowchart of a nonlinear adaptive filtering algorithm based on a BP neural network according to embodiment 4 of the present application;

FIG. 7 is a coefficient update flowchart of a BP neural network-based nonlinear filter according to embodiment 4 of the present application.

Detailed Description

The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.

Example 1

In this embodiment, an echo cancellation method is provided. FIG. 2 is a flowchart of the echo cancellation method according to embodiment 1 of the present application. As shown in FIG. 2, the flow includes the following steps:

Step S202: obtaining a predicted echo signal from a far-end signal through a nonlinear filter, for example by estimating the echo signal from the far-end signal with the nonlinear filter, wherein the nonlinear filter is constructed based on a neural network and, while performing the forward computation on the far-end signal based on the neural network, applies a nonlinear function to the weighted-summation result of each node;

Step S204: performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

In acoustic echo cancellation (AEC), the far-end signal is the voice signal of the far-end participants (e.g., the other side of a voice conference, located at another site), which is usually delivered to the current site over a communication line. The near-end signal is the signal collected by the near-end microphone. It is the superposition of the voice of the participants speaking at the current site and the far-end participants' voice after it has been played by the near-end loudspeaker (the loudspeaker at the current site) and convolved with the echo path of the local room. Through step S202, the nonlinear filter estimates the echo produced when the far-end signal is emitted by the local loudspeaker and passes through the echo path of the local room; the estimated (predicted) echo signal is then removed from the near-end signal collected by the local microphone, which cancels the echo. The echo picked up by the near-end microphone after the far-end signal is emitted by the local loudspeaker and reflected by complex, changing wall surfaces is indirect echo, and indirect echo is a nonlinear signal. Because the nonlinear filter of this embodiment additionally applies nonlinear processing to the weighted-summation result of each node, it can fully model the echo path that produces the nonlinear echo, estimate the echo more accurately, and better cancel nonlinear echo.
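To make the signal model concrete, here is a minimal NumPy sketch. It assumes a tanh saturation for the loudspeaker and a short hypothetical room impulse response (neither is specified in the text), builds the near-end signal as speech plus echo, and shows that subtracting an exact echo estimate (step S204) leaves only the near-end speech.

```python
import numpy as np

def simulate_microphone(far_end, near_speech, room_ir, drive=3.0):
    """Toy near-end (microphone) signal: near-end speech plus a nonlinear echo.

    Assumptions for illustration only: the loudspeaker nonlinearity is a tanh
    saturation controlled by `drive`, and the room echo path is the FIR `room_ir`.
    """
    speaker_out = np.tanh(drive * far_end) / drive            # hypothetical loudspeaker saturation
    echo = np.convolve(speaker_out, room_ir)[: len(far_end)]  # convolution with the room echo path
    return near_speech + echo, echo

def cancel_echo(near_end, predicted_echo):
    """Step S204: remove the predicted echo from the microphone signal."""
    return near_end - predicted_echo

rng = np.random.default_rng(0)
far = 0.5 * rng.standard_normal(1600)      # 0.1 s of far-end signal at an assumed 16 kHz
speech = np.zeros(1600)                    # near end silent in this toy run
rir = np.array([0.6, 0.0, 0.3, -0.1])      # hypothetical 4-tap room response
mic, true_echo = simulate_microphone(far, speech, rir)
residual = cancel_echo(mic, true_echo)     # a perfect echo estimate leaves only near-end speech
```

In practice the predicted echo comes from the nonlinear filter of step S202 rather than from the simulator, so the residual also contains whatever part of the echo the filter fails to model.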

According to this scheme, a far-end signal is input into a nonlinear filter constructed based on a neural network to obtain a predicted echo signal, and echo cancellation is performed on the near-end signal input by the microphone according to the predicted echo signal. In the embodiments of the invention, because the nonlinear filter applies nonlinear processing to the weighted-summation result of each node during the forward computation of the far-end signal, it can replace the conventional combination of a linear adaptive filter and a nonlinear processing module, so no separate nonlinear processing stage for residual nonlinear echo is needed. This improves the estimation accuracy of nonlinear echo and hence the echo cancellation effect, solving the problems of poor nonlinear echo suppression and high processing complexity in schemes that combine linear filtering with nonlinear processing.

To make the nonlinear filter adaptive, the weight coefficients of the neural network may be adjusted according to the error between the predicted echo signal and the near-end signal. Thus, in at least one example embodiment, the method may further comprise: determining an error signal based on the predicted echo signal and a desired input signal; and adjusting the weight coefficients of the neural network according to the error signal. As an exemplary implementation, the adjustment may be achieved by back-propagating the error signal through the neural network and adjusting, in turn, the weight coefficients between the nodes of adjacent layers.

In at least one example embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and determining an error signal from the predicted echo signal and a desired input signal includes: calculating the error signal e(k) = d(k) − o(k), where o(k) is the predicted echo signal and d(k) is the desired input signal. For example, for a neural network with 10 input nodes, one hidden layer of 3 nodes, and 1 output node, o(k) may be calculated as

$$o(k) = g(k)\, f_2\!\left(\sum_{j=1}^{3} w_{j,1}(k)\, f_1\!\left(\sum_{i=1}^{10} w_{i,j}(k)\, x_i(k)\right)\right),$$

where x_i(k) is the value of input node i, w_{i,j}(k) and w_{j,1}(k) are the input-to-hidden and hidden-to-output weight coefficients, f₁ and f₂ are the nonlinear activation functions used by the hidden layer and the output layer (they may be the same or different), and g(k) is the magnitude gain calculated at the k-th update. The gain is needed because a network using a nonlinear activation function such as Sigmoid produces outputs in a bounded range (between 0 and 1 for Sigmoid, or between −1 and 1 for tanh), while the echo values to be predicted mostly lie outside that range. The network output is therefore multiplied by a data gain that brings it back to the data range of the input, obtained as the maximum value of the data in the input nodes:

$$g(k) = \max_i x_i(k).$$
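A minimal NumPy sketch of this 10-3-1 forward pass with the output gain, for illustration only. The assumptions are: the 10 input nodes hold the 10 most recent far-end samples, f₁ is Sigmoid, f₂ is tanh (so the scaled estimate can be negative), g(k) uses |x_i(k)| because audio samples are signed, and there are no bias terms.

```python
import numpy as np

N_IN, N_HID = 10, 3  # 10 input nodes, one hidden layer of 3 nodes, 1 output node

def sigmoid(v):
    # Numerically stable form of 1 / (1 + exp(-v)).
    return 0.5 * (1.0 + np.tanh(0.5 * v))

def predict_echo(x, w_ih, w_ho):
    """Forward pass for one time step k; returns (o(k), hidden outputs, g(k)).

    x    : (10,) values of the input nodes x_i(k) (assumed: 10 most recent far-end samples)
    w_ih : (10, 3) input-to-hidden weights w_{i,j}
    w_ho : (3,) hidden-to-output weights w_{j,1}
    """
    g = np.max(np.abs(x))            # g(k); taking |x_i| is an assumption for signed audio
    hidden = sigmoid(x @ w_ih)       # f1 = Sigmoid on each hidden node's weighted sum
    o = g * np.tanh(hidden @ w_ho)   # f2 = tanh (assumed), rescaled by the data gain
    return o, hidden, g

rng = np.random.default_rng(1)
w_ih = rng.normal(scale=0.1, size=(N_IN, N_HID))
w_ho = rng.normal(scale=0.1, size=N_HID)
x = rng.uniform(-3.0, 3.0, size=N_IN)   # far-end values outside the (-1, 1) output range
o, hidden, g = predict_echo(x, w_ih, w_ho)
```

Without g(k), the tanh output could never reach magnitudes above 1 such as those in this example; the gain maps the bounded network output back to the data range.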

In at least one example embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and adjusting the weight coefficients of the neural network according to the error signal includes: calculating the adjusted weight coefficient according to the error signal e(k) as

$$w_{ij}^{l}(k+1) = w_{ij}^{l}(k) + \mu\,\Delta w_{ij}^{l}(k),$$

where e(k) = d(k) − o(k), o(k) is the predicted echo signal, d(k) is the desired input signal, $w_{ij}^{l}(k)$ is the weight coefficient from the i-th node of layer l−1 to the j-th node of layer l of the neural network at time k, $w_{ij}^{l}(k+1)$ is the same weight coefficient at time k+1, μ is the adjustment step size, and $\Delta w_{ij}^{l}(k)$ is the change of the weight coefficient, obtained by taking the partial derivative of the error with respect to $w_{ij}^{l}(k)$, i.e. $\Delta w_{ij}^{l}(k) = -\partial E(k)/\partial w_{ij}^{l}(k)$ with $E(k) = \tfrac{1}{2}e^{2}(k)$.

Therefore, the weight change of the output layer is

$$\Delta w_{j,k} = (d_k - o_k)\, g\, f_2'(\mathrm{net}_k)\, o_j, \qquad \mathrm{net}_k = \sum_{j} w_{j,k}\, o_j,$$

where d_k is the desired output value of output node k, o_k is the predicted value of output node k, o_j is the output of intermediate (hidden) node j, and g is the data gain described above; for a Sigmoid f₂, f₂'(net_k) = (o_k/g)(1 − o_k/g). For the three-layer network of this embodiment, Δw_{j,k} is the weight change from intermediate node j to output-layer node k, and the number of output nodes is K = 1.

The weight change of the intermediate hidden layer is

$$\Delta w_{i,j} = f_1'(\mathrm{net}_j)\, o_i \sum_{k=1}^{K} (d_k - o_k)\, g\, f_2'(\mathrm{net}_k)\, w_{j,k}, \qquad \mathrm{net}_j = \sum_{i} w_{i,j}\, o_i,$$

where o_i is the output of node i of the previous (input) layer, o_j is the output of node j of the current layer (for a Sigmoid f₁, f₁'(net_j) = o_j(1 − o_j)), and K is the total number of output nodes; in this embodiment the single output node gives the predicted echo signal, so K = 1. For the three-layer network of this embodiment, Δw_{i,j} is the weight change from input node i to intermediate node j.

In this embodiment, double-ended detection may be used to determine whether there is sound at the near end and at the far end, and the weight coefficients are adjusted only when the near end is silent and the far end has sound: in this case the desired input signal is free of the near-end user's voice, so adjusting the weight coefficients based on it is more accurate. In at least one exemplary embodiment, before determining the error signal from the predicted echo signal and the desired input signal and adjusting the weight coefficients of the neural network according to the error signal, the method may further include:

performing double-ended detection on the near-end signal and the far-end signal to determine whether sound is present at the near end and at the far end, respectively;

when the near end has no sound and the far end has sound, proceeding to the steps of determining the error signal from the predicted echo signal and the desired input signal and adjusting the weight coefficients of the neural network according to the error signal, wherein the desired input signal comprises the near-end signal input by the microphone when the near end has no sound and the far end has sound.

In this embodiment, double-ended detection may also be used to perform echo cancellation only when the far end has sound, because that is when echo cancellation is needed. In at least one exemplary embodiment, before obtaining the predicted echo signal from the far-end signal through the nonlinear filter and performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal, the method may further include:

performing double-ended detection on the near-end signal and the far-end signal to determine whether sound is present at the near end and at the far end, respectively; and, when the far end has sound, proceeding to the steps of obtaining the predicted echo signal from the far-end signal through the nonlinear filter and performing echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.
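The two gating rules above can be summarised as a small decision table. The sketch below is only an illustration; the function name `control_flags` is hypothetical. It maps one double-ended detection result to whether the weights are adapted and whether echo cancellation runs.

```python
from typing import Tuple

def control_flags(near_voiced: bool, far_voiced: bool) -> Tuple[bool, bool]:
    """Map a double-ended detection result to (adapt_weights, cancel_echo).

    adapt_weights: True only when the near end is silent and the far end has sound,
                   so the microphone carries (ideally) echo only and can serve as d(k).
    cancel_echo:   True whenever the far end has sound.
    """
    adapt_weights = far_voiced and not near_voiced
    cancel_echo = far_voiced
    return adapt_weights, cancel_echo

for near in (False, True):
    for far in (False, True):
        print(f"near={near!s:5} far={far!s:5} -> adapt, cancel = {control_flags(near, far)}")
```

Under double talk (both ends voiced) the filter still cancels echo with its current weights but does not adapt, so near-end speech cannot corrupt the weight update.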

Double-ended detection on the near-end signal and the far-end signal, to determine whether sound is present at each end, may be implemented in various ways, for example by comparing energy values or by correlation operations. In at least one exemplary embodiment, performing double-ended detection on the near-end signal and the far-end signal to determine whether sound is present at the near end and at the far end, respectively, may include:

acquiring a first energy value of the near-end signal and a second energy value of the far-end signal;

determining that the near end has no sound when the first energy value is below a first sound determination threshold, and determining that the near end has sound when the first energy value is not below the first sound determination threshold;

and determining that the far end has no sound when the second energy value is below a second sound determination threshold, and determining that the far end has sound when the second energy value is not below the second sound determination threshold.
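An energy-based version of these steps might look like the NumPy sketch below. The frame-wise mean-square energy, the 160-sample frame length and the threshold values are assumptions for illustration; the text does not define how the energy is computed or how the thresholds are chosen.

```python
import numpy as np

def frame_energy(frame):
    """Mean-square energy of one frame (the energy definition is an assumption)."""
    frame = np.asarray(frame, dtype=float)
    return float(np.mean(frame ** 2))

def double_ended_detection(near_frame, far_frame, near_threshold, far_threshold):
    """Return (near_voiced, far_voiced) by comparing frame energies with thresholds.

    A side has sound when its energy is NOT below its threshold, matching the
    "below -> no sound, not below -> sound" rule in the text.
    """
    e_near = frame_energy(near_frame)   # first energy value
    e_far = frame_energy(far_frame)     # second energy value
    return e_near >= near_threshold, e_far >= far_threshold

rng = np.random.default_rng(3)
silence = 0.001 * rng.standard_normal(160)   # e.g. 10 ms at 16 kHz, background noise only
talk = 0.3 * rng.standard_normal(160)
print(double_ended_detection(silence, talk, 1e-4, 1e-4))   # → (False, True)
```

A correlation-based detector, which the text also mentions, would replace `frame_energy` with a statistic comparing the near-end and far-end signals.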

In at least one exemplary embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and step S202 may include the following operation:

Taking the far-end signal as the input of the input-layer nodes of the neural network and performing the forward calculation on it, wherein, layer by layer, the output values of the nodes of the previous layer are weighted and summed according to the weight coefficients from the previous-layer nodes to the current-layer nodes to obtain a predicted value, and nonlinear processing is applied to the predicted value to obtain the output value of each current-layer node, until the output value of the output-layer node is obtained and used as the predicted echo signal.

In the process of weighting and summing the output values of the previous-layer nodes according to the weight coefficients from the previous-layer nodes to the current-layer nodes to obtain the predicted value, the output value of each node can be expressed as

$$o_j^{l}(k) = f\!\left(\sum_{i=1}^{N} w_{ij}^{l}(k)\, o_i^{l-1}(k)\right),$$

where k denotes the k-th iteration, $o_j^{l}(k)$ and $o_i^{l-1}(k)$ are the output values of node j of layer l and of node i of the preceding layer l−1, respectively, $w_{ij}^{l}(k)$ is the weight coefficient between node i of layer l−1 and node j of layer l, N is the number of nodes in layer l−1, and f(x) is a nonlinear function of the argument x. An exemplary nonlinear activation function is the Sigmoid function:

$$f(x) = \frac{1}{1 + e^{-x}}.$$
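The layer-by-layer recursion can be sketched generically in NumPy as follows. This sketch applies the same Sigmoid f at every layer, exactly as in the formula, and does not include the output gain g(k); the layer sizes and random weights in the usage lines are illustrative assumptions. Keeping every layer's outputs is what a back-propagation update needs afterwards.

```python
import numpy as np

def sigmoid(v):
    # Numerically stable equivalent of f(x) = 1 / (1 + exp(-x)).
    return 0.5 * (1.0 + np.tanh(0.5 * v))

def forward(x, weights, f=sigmoid):
    """Layer-by-layer forward computation o_j^l(k) = f(sum_i w_ij^l(k) * o_i^{l-1}(k)).

    weights[l-1] has shape (N_{l-1}, N_l); entry [i, j] holds w_ij^l(k).
    Returns every layer's outputs [o^0, o^1, ..., o^L], with o^0 = x (input layer).
    """
    outputs = [np.asarray(x, dtype=float)]
    for w in weights:
        net = outputs[-1] @ w          # weighted sum over the N nodes of layer l-1
        outputs.append(f(net))         # nonlinear processing gives the layer-l outputs
    return outputs

rng = np.random.default_rng(5)
weights = [rng.normal(scale=0.1, size=(10, 3)), rng.normal(scale=0.1, size=(3, 1))]
layers = forward(rng.uniform(-1.0, 1.0, size=10), weights)
```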

From the description of the above embodiments, a person skilled in the art can clearly understand that the method according to the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) that includes instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the method according to the embodiments of the present invention.

Example 2

In this embodiment, an echo cancellation device is further provided. The device is used to implement the foregoing embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

FIG. 3 is a block diagram of the structure of an echo cancellation device according to embodiment 2 of the present invention. As shown in FIG. 3, the device includes:

the nonlinear filter 32, configured to obtain a predicted echo signal based on a far-end signal, wherein the nonlinear filter 32 is constructed based on a neural network and, while performing the forward computation on the far-end signal based on the neural network, applies a nonlinear function to the weighted-summation result of each node;

and the echo cancellation module 34 is configured to perform echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

To make the nonlinear filter adaptive, the weight coefficients of the neural network may be adjusted according to the error between the predicted echo signal and the near-end signal. Accordingly, as shown in FIG. 4, an exemplary structural block diagram of the echo cancellation device according to embodiment 2 of the present invention, the device may further include:

an error determination module 42 for determining an error signal from the predicted echo signal and a desired input signal and inputting the error signal to the nonlinear filter 32;

the nonlinear filter 32 is configured to adjust a weight coefficient of the neural network according to the error signal.

As shown in FIG. 4, an exemplary structural block diagram of the echo cancellation device according to embodiment 2 of the present invention, in at least one exemplary embodiment the device may further comprise a double-ended detection module 44 configured to: perform double-ended detection on the near-end signal and the far-end signal to determine whether sound is present at the near end and at the far end, respectively; when the near end has no sound and the far end has sound, enable the function of the nonlinear filter 32 that adjusts the weight coefficients of the neural network according to the error signal, wherein the desired input signal comprises the near-end signal input by the microphone when the near end has no sound and the far end has sound; and/or, when the far end has sound, enable the function of the echo cancellation module 34 that performs echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

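In at least one example embodiment, the neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and the nonlinear filter 32 obtains the predicted echo signal from the far-end signal by: taking the far-end signal as the input of the input-layer nodes of the neural network and performing the forward calculation on it, wherein, layer by layer, the output values of the nodes of the previous layer are weighted and summed according to the weight coefficients from the previous-layer nodes to the current-layer nodes to obtain a predicted value, and nonlinear processing is applied to the predicted value to obtain the output value of each current-layer node, until the output value of the output-layer node is obtained and used as the predicted echo signal.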

It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; alternatively, the above modules may be located in different processors in any combination.

Example 3

An embodiment of the invention also provides a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.

Optionally, in this embodiment, the storage medium may be configured to store a computer program for performing the following steps:

Step S1: obtain a predicted echo signal through a nonlinear filter based on the far-end signal. The nonlinear filter is constructed based on a neural network. While performing the forward calculation on the far-end signal, the nonlinear filter applies a nonlinear function to the weighted-summation result of each node.

Step S2: perform echo cancellation on the near-end signal input by the microphone according to the predicted echo signal.

Optionally, in this embodiment, the storage medium may include, but is not limited to, any of the following media capable of storing a computer program: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Example 4

The conventional echo cancellation method based on adaptive filtering uses linear filtering and can therefore cancel only linear echo. Nonlinear processing can suppress nonlinear echo to some extent, but the degree of suppression is limited. In particular, echoes in complex environments and nonlinear echoes introduced by the device's loudspeaker leave a large amount of residual echo, and the nonlinear processing used to remove that residual has high computational complexity. Together, these effects seriously degrade echo cancellation performance and, in turn, the performance of overall sound signal processing and speech enhancement.

To improve the estimation accuracy of nonlinear echo and thereby the echo cancellation effect, this embodiment provides an echo cancellation method based on nonlinear adaptive filtering with a BP (Back Propagation) neural network. A BP neural network is used to construct a nonlinear adaptive filter that estimates the echo signal received by the microphone. This filter replaces the two original modules of linear adaptive filtering and nonlinear processing.

In addition, double-ended detection is used to detect the presence of speech at each end. Echo cancellation is performed only when the far end has sound, and echo estimation is performed only when the near end has no speech, so that near-end speech does not affect the echo estimate.

Owing to its nonlinear characteristics, the neural network has a stronger environment-modeling capability and can predict nonlinear echo well, which improves echo cancellation performance. Note that the BP neural network is described here only as an example. This does not limit the type of neural network, and the method is applicable to various types of neural networks.

Fig. 5 is a schematic diagram of an echo cancellation algorithm based on nonlinear adaptive filtering according to embodiment 4 of the present invention. As shown in Fig. 5, the neural-network-based processing flow mainly involves two modules: double-ended detection (Double Talk Detection, DTD) and the nonlinear adaptive filter.

Double-ended detection examines the far-end and near-end signals. Echo cancellation is performed only when the far end has sound, and echo estimation is performed only when the near end has no sound, so that the presence of near-end sound does not affect the echo estimate. A typical double-ended detection method calculates the energy of the far-end signal and of the near-end signal separately and compares each with a sound-determination threshold.
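The energy-based decision can be sketched in Python as follows. The rule is that sound is deemed present when a signal's energy is not lower than its sound-determination threshold. The frame-based mean energy, the function names, and the default threshold values are hypothetical and serve only for illustration; in practice the thresholds would be tuned per device.

```python
from typing import Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean energy of a frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def double_talk_detect(
    near_frame: Sequence[float],
    far_frame: Sequence[float],
    near_threshold: float = 1e-4,  # hypothetical first sound-determination threshold
    far_threshold: float = 1e-4,   # hypothetical second sound-determination threshold
) -> Tuple[bool, bool]:
    """Return (near_active, far_active).

    Sound is present when the energy is not lower than the threshold.
    """
    near_active = frame_energy(near_frame) >= near_threshold
    far_active = frame_energy(far_frame) >= far_threshold
    return near_active, far_active
```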

The nonlinear adaptive filter is mainly used to estimate the echo signal that is reflected by the environment and picked up by the device's microphone.

Fig. 6 shows the structure of the nonlinear adaptive filter based on a BP neural network according to embodiment 4 of the present invention. The echo cancellation algorithm built on this filter structure mainly comprises the following steps:

S601, extract the input signal: the input samples are stored sequentially in a delay unit (register unit), and the filter length is set according to the application scenario. Here a BP network is assumed with the following structure:

- an input layer with n = 10 nodes;
- one hidden layer with 3 nodes;
- an output layer with 1 node.

This configuration is only an example and should not be construed as limiting the scheme to neural networks of this structure.
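The delay unit of step S601 behaves like a tapped delay line. Below is a minimal sketch using Python's `collections.deque` with a fixed `maxlen`, so that the oldest sample is discarded automatically. The class name `DelayLine` and the newest-first tap ordering are assumptions.

```python
from collections import deque
from typing import Deque, List


class DelayLine:
    """Tapped delay line holding the most recent n far-end samples, newest first."""

    def __init__(self, n: int = 10) -> None:
        # Registers start at zero, as for a filter that has seen no input yet.
        self._buf: Deque[float] = deque([0.0] * n, maxlen=n)

    def push(self, sample: float) -> List[float]:
        # The newest sample enters at tap 0; the oldest falls off the far end.
        self._buf.appendleft(sample)
        return list(self._buf)
```

Each call to `push` returns the 10 values that feed the 10 input nodes for the current iteration k.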

S602, forward calculation: the data in the filter registers are multiplied in turn by the corresponding tap coefficients. The initial values of the tap coefficients may be set to random decimals between -1 and 1, or may all be set to 1. Each hidden-layer node computes a predicted value, which is then processed nonlinearly by a nonlinear unit. The results of the hidden-layer nodes are combined through the weight network to obtain the final output o(k). The output value of each node in each layer can be expressed as:

$$y_j^{l}(k) = f\left(\sum_{i=1}^{N} w_{ij}^{l}(k)\, y_i^{l-1}(k)\right)$$

where:

- k denotes the k-th iteration;
- $y_j^{l}(k)$ and $y_i^{l-1}(k)$ are the output values of node j in layer l and of node i in the preceding layer l-1, respectively;
- $w_{ij}^{l}(k)$ is the weight coefficient from node i of layer l-1 to node j of layer l;
- N is the number of nodes in layer l-1;
- f(x) is a nonlinear function of the independent variable x, which performs nonlinear processing on the weighted sum.

One exemplary nonlinear activation function is the Sigmoid function:

$$f(x) = \frac{1}{1 + e^{-x}}$$
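Back propagation also needs the derivative of the Sigmoid function, which has the closed form f'(x) = f(x)(1 - f(x)). The small Python sketch below implements both; the function names are hypothetical. It also avoids overflow of e^{-x} for large negative x.

```python
import math


def sigmoid(x: float) -> float:
    """Sigmoid f(x) = 1 / (1 + e^{-x}), arranged to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def sigmoid_derivative(x: float) -> float:
    """f'(x) = f(x) * (1 - f(x)), the factor used when back-propagating the error."""
    s = sigmoid(x)
    return s * (1.0 - s)
```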

Assuming 10 input nodes, 3 intermediate (hidden) nodes and 1 output node, after forward propagation through the 3 layers the final predicted output can be expressed as:

$$o(k) = g(k)\, f_2\left(\sum_{j=1}^{3} w_{j1}^{3}(k)\, f_1\left(\sum_{i=1}^{10} w_{ij}^{2}(k)\, x_i(k)\right)\right)$$

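where $w_{j1}^{3}(k)$ is the weight coefficient from node j of the second (hidden) layer to node 1 of the third (output) layer. The output index is 1 because only one predicted output is considered. $x_i(k)$ is the value held in the i-th input node, and $f_1$, $f_2$ and $g(k)$ are defined in step S603.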
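The 10-3-1 forward pass can be sketched in Python as follows, stopping before the amplitude gain g(k) is applied. The following choices are assumptions and do not come from the text:

- $f_1$ is the Sigmoid function;
- $f_2$ is tanh, so that the output can take either sign like an echo sample;
- there are no bias terms;
- the names `forward_10_3_1` and `init_weights` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

N_IN, N_HIDDEN = 10, 3  # 10-3-1 network from step S601


def sigmoid(v: float) -> float:
    """f1: Sigmoid, arranged to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def forward_10_3_1(
    x: Sequence[float],
    w2: List[List[float]],  # w2[i][j]: input node i -> hidden node j
    w3: List[float],        # w3[j]: hidden node j -> the single output node
) -> Tuple[List[float], float]:
    """Return (hidden-layer outputs, network output before the gain g(k))."""
    hidden = [sigmoid(sum(x[i] * w2[i][j] for i in range(N_IN))) for j in range(N_HIDDEN)]
    # Assumption: f2 = tanh, bounded in (-1, 1) but able to go negative.
    out = math.tanh(sum(hidden[j] * w3[j] for j in range(N_HIDDEN)))
    return hidden, out


def init_weights(seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Random initial taps in [-1, 1], one of the options given in step S602."""
    rng = random.Random(seed)
    w2 = [[rng.uniform(-1.0, 1.0) for _ in range(N_HIDDEN)] for _ in range(N_IN)]
    w3 = [rng.uniform(-1.0, 1.0) for _ in range(N_HIDDEN)]
    return w2, w3
```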

S603, error calculation: the predicted signal o(k) is subtracted from the desired input signal d(k) to obtain the error value. The desired input signal d(k) is controlled by double-ended detection, so the error is calculated only when there is a signal at the far end and no signal at the near end. The error can be expressed as:

$$e(k) = d(k) - o(k) = d(k) - g(k)\, f_2\left(\sum_{j=1}^{3} w_{j1}^{3}(k)\, f_1\left(\sum_{i=1}^{10} w_{ij}^{2}(k)\, x_i(k)\right)\right)$$

where:

- e(k) is the difference between the desired signal d(k) and the predicted signal o(k), that is, the neural network output value as predicted by the filter network;
- $f_1$ and $f_2$ are the nonlinear activation functions used by the hidden layer and the output layer, respectively, which may be the same or different;
- g(k) is the amplitude gain calculated at the k-th update.

If the neural network uses a nonlinear activation function such as Sigmoid, its output is a fraction no larger than 1 in magnitude, whereas the amplitude of the echo to be predicted mostly lies outside this range. The output must therefore be multiplied by a data gain that maps it back to the same data range. This gain can be obtained as the maximum of the data in the input nodes, namely $g(k) = \max_i x_i(k)$.
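The gain and error computation can be sketched in Python as shown below, taking the bounded network output as given. The function names are hypothetical. One deliberate deviation is labeled as an assumption: the gain uses the largest absolute input value, $\max_i |x_i(k)|$, because the plain maximum $\max_i x_i(k)$ can be zero or negative for a bipolar audio signal.

```python
from typing import Sequence


def amplitude_gain(x: Sequence[float]) -> float:
    """g(k): largest input magnitude held in the input nodes.

    Assumption: max |x_i(k)| is used instead of max x_i(k), because a bipolar
    audio signal can have a zero or negative maximum.
    """
    return max(abs(v) for v in x)


def error_signal(d: float, net_output: float, x: Sequence[float]) -> float:
    """e(k) = d(k) - o(k), where o(k) = g(k) * (bounded network output)."""
    predicted_echo = amplitude_gain(x) * net_output
    return d - predicted_echo
```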

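S604, backward calculation: the calculated error is back-propagated and each weight coefficient of the filter is adjusted in turn. The filter coefficients are updated as:

$$w_{ij}^{l}(k+1) = w_{ij}^{l}(k) + \mu\, \Delta w_{ij}^{l}(k)$$

where $w_{ij}^{l}(k)$ is the weight coefficient from node i of layer l-1 to node j of layer l at time k, $\Delta w_{ij}^{l}(k)$ is the change in that weight coefficient, and μ is the adjustment step size. The change is obtained by weighting, with the error signal e(k) itself, the partial derivative of e(k) with respect to the weight coefficient. This is equivalent to gradient descent on $\tfrac{1}{2}e^{2}(k)$:

$$\Delta w_{ij}^{l}(k) = -e(k)\,\frac{\partial e(k)}{\partial w_{ij}^{l}(k)} = -\frac{\partial}{\partial w_{ij}^{l}(k)}\left(\frac{1}{2}e^{2}(k)\right)$$

Therefore, with $y_j^{2}(k)$ denoting the output of hidden node j, the weight change of the output layer is:

$$\Delta w_{j1}^{3}(k) = e(k)\, g(k)\, f_2'\left(\sum_{m=1}^{3} w_{m1}^{3}(k)\, y_m^{2}(k)\right) y_j^{2}(k)$$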

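The weight change of the intermediate hidden layer is:

$$\Delta w_{ij}^{2}(k) = e(k)\, g(k)\, f_2'\left(\sum_{m=1}^{3} w_{m1}^{3}(k)\, y_m^{2}(k)\right) w_{j1}^{3}(k)\, f_1'\left(\sum_{n=1}^{10} w_{nj}^{2}(k)\, x_n(k)\right) x_i(k)$$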
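One back-propagation step for the 10-3-1 network can be sketched in Python as shown below. It implements the output-layer and hidden-layer weight changes together with $w(k+1) = w(k) + \mu\,\Delta w$. The following choices are assumptions and do not come from the text:

- $f_1$ is Sigmoid, so $f_1' = f_1(1 - f_1)$;
- $f_2$ is tanh, so $f_2' = 1 - f_2^2$;
- there are no bias terms;
- the gain is $\max_i |x_i(k)|$;
- the default step size value and the name `bp_update` are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def bp_update(
    x: Sequence[float],
    d: float,
    w2: List[List[float]],  # w2[i][j]: input node i -> hidden node j (updated in place)
    w3: List[float],        # w3[j]: hidden node j -> output node (updated in place)
    mu: float = 0.05,       # adjustment step size (hypothetical value)
) -> float:
    """One back-propagation step; returns the error e(k) before the update."""
    n_hidden = len(w3)
    h = [sigmoid(sum(x[i] * w2[i][j] for i in range(len(x)))) for j in range(n_hidden)]
    y = math.tanh(sum(h[j] * w3[j] for j in range(n_hidden)))  # assumed f2 = tanh
    g = max(abs(v) for v in x)                                 # amplitude gain g(k)
    e = d - g * y                                              # e(k) = d(k) - o(k)
    # Output-layer factor e * g * f2'(.), with tanh' = 1 - tanh^2.
    delta_out = e * g * (1.0 - y * y)
    # Hidden-layer factors use the output weights from *before* this update.
    delta_hid = [delta_out * w3[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
    for j in range(n_hidden):
        w3[j] += mu * delta_out * h[j]               # w(k+1) = w(k) + mu * dw
        for i in range(len(x)):
            w2[i][j] += mu * delta_hid[j] * x[i]
    return e
```

Calling the function with `mu=0.0` leaves the weights unchanged and simply returns the current error, which is convenient for checking convergence.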

The flow of updating coefficients of the nonlinear filter based on the BP neural network in the echo cancellation algorithm according to the present embodiment is shown in fig. 7.
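Putting the steps together, the processing flow can be sketched per sample in Python:

1. The far-end sample enters the delay line.
2. The network predicts the echo.
3. When the far end is active, the prediction is subtracted from the microphone signal.
4. The weights are updated only when the far end is active and the near end is silent.

The class `BPEchoCanceller`, its parameters, and the externally supplied detector decisions are hypothetical. $f_1$ = Sigmoid, $f_2$ = tanh, and the gain $\max_i |x_i(k)|$ are likewise assumptions.

```python
import math
import random
from collections import deque
from typing import List


def _sigmoid(v: float) -> float:
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


class BPEchoCanceller:
    """Per-sample sketch: delay line -> forward pass -> gated cancellation/update."""

    def __init__(self, n_in: int = 10, n_hidden: int = 3, mu: float = 0.05, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random initial taps in [-1, 1], one of the options given in step S602.
        self.w2: List[List[float]] = [
            [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)] for _ in range(n_in)
        ]
        self.w3: List[float] = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.taps = deque([0.0] * n_in, maxlen=n_in)  # newest sample first
        self.mu = mu

    def process_sample(self, far: float, mic: float, near_active: bool, far_active: bool) -> float:
        """Return the output sample for one far-end/microphone sample pair."""
        self.taps.appendleft(far)          # S601: update the delay line
        if not far_active:
            return mic                     # no far-end sound: nothing to cancel
        x = list(self.taps)
        n_hidden = len(self.w3)
        # S602: forward pass (f1 = sigmoid, f2 = tanh, both assumptions).
        h = [_sigmoid(sum(x[i] * self.w2[i][j] for i in range(len(x)))) for j in range(n_hidden)]
        y = math.tanh(sum(h[j] * self.w3[j] for j in range(n_hidden)))
        g = max(abs(v) for v in x)         # amplitude gain g(k)
        e = mic - g * y                    # S603: residual = mic - predicted echo
        if not near_active:                # S604: adapt only during far-end-only talk
            delta_out = e * g * (1.0 - y * y)
            delta_hid = [delta_out * self.w3[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                self.w3[j] += self.mu * delta_out * h[j]
                for i in range(len(x)):
                    self.w2[i][j] += self.mu * delta_hid[j] * x[i]
        return e
```

In this structure, the echo-cancelled output and the adaptation error are the same quantity, mic - o(k). During double talk the output still has the predicted echo removed, but the weights are frozen.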

The echo cancellation method of this embodiment, based on BP neural network nonlinear filtering, uses the nonlinear characteristics of the BP neural network to estimate nonlinear echo. It replaces the conventional echo cancellation method built on a linear adaptive filter plus nonlinear processing, and has the following advantages:

Better echo cancellation performance: the method uses a neural network to implement a nonlinear adaptive filter in place of the conventional linear adaptive filter. The nonlinear fitting capability of the neural network can then be used to estimate nonlinear echo, further improving echo cancellation performance;

Simpler algorithm structure: a nonlinear filter based on a BP neural network performs echo estimation, replacing both the linear adaptive filter and the nonlinear processing module required by conventional echo cancellation. As a result, the computational structure is simpler, each module has a clear physical meaning, and the method is easy to implement.

It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a memory device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than that shown or described, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps within them may be fabricated into a single integrated circuit module for implementation. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An echo cancellation method, characterized in that it comprises:

Based on the far-end signal, a predicted echo signal is obtained through a nonlinear filter. The nonlinear filter is constructed based on a neural network, and during the forward calculation of the far-end signal based on the neural network, the nonlinear filter performs nonlinear processing on the weighted summation result of each node in the forward calculation based on a nonlinear function.

Echo cancellation is performed on the near-end signal input to the microphone based on the predicted echo signal;

The method further includes:

An error signal is determined based on the predicted echo signal and a desired input signal;

The weight coefficients of the neural network are adjusted based on the error signal;

The neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer. Each of the input layer, the output layer, and the hidden layers has one or more nodes. Adjusting the weight coefficients of the neural network according to the error signal includes:

The adjusted weight coefficients are calculated based on the error signal e(k) as $w_{ij}^{l}(k+1) = w_{ij}^{l}(k) + \mu\,\Delta w$, where:

- e(k) = d(k) - o(k), o(k) is the predicted echo signal, and d(k) is the desired input signal;
- $w_{ij}^{l}(k)$ represents the weight coefficient from the i-th node in the (l-1)-th layer to the j-th node in the l-th layer of the neural network at time k;
- $w_{ij}^{l}(k+1)$ represents the same weight coefficient at time k+1;
- μ is the adjustment step size;
- Δw represents the change in the weight coefficient, which is obtained from the error signal e(k) by taking its partial derivative with respect to the weight coefficient.

2. The method according to claim 1, characterized in that, before determining the error signal based on the predicted echo signal and the desired input signal, and before adjusting the weight coefficients of the neural network based on the error signal, it further comprises:

The near-end signal and the far-end signal are subjected to dual-end detection to determine whether there is sound at the near end and the far end, respectively.

In the case where there is no sound at the near end but sound at the far end, the process proceeds to the steps of determining an error signal based on the predicted echo signal and a desired input signal, and adjusting the weight coefficients of the neural network based on the error signal, wherein the desired input signal includes the near-end signal input by the microphone when there is no sound at the near end but sound at the far end.

3. The method according to claim 1, characterized in that, before obtaining the predicted echo signal based on the far-end signal through a nonlinear filter, and before performing echo cancellation on the near-end signal input to the microphone according to the predicted echo signal, it further includes:

If there is sound at the far end, the process proceeds to the steps of obtaining a predicted echo signal based on the far-end signal through a nonlinear filter, and then performing echo cancellation on the near-end signal input to the microphone based on the predicted echo signal.

4. The method according to claim 2 or 3, characterized in that, performing dual-end detection on the near-end signal and the far-end signal respectively to determine whether there is sound at the near end and the far end includes:

The first energy value of the near-end signal and the second energy value of the far-end signal are obtained respectively;

If the first energy value is lower than the first sound determination threshold, it is determined that there is no sound at the near end; if the first energy value is not lower than the first sound determination threshold, it is determined that there is sound at the near end.

If the second energy value is lower than the second sound determination threshold, it is determined that there is no sound at the far end; if the second energy value is not lower than the second sound determination threshold, it is determined that there is sound at the far end.

5. The method according to any one of claims 1-3, characterized in that the neural network comprises an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and obtaining the predicted echo signal based on the far-end signal through a nonlinear filter comprises:

The far-end signal is used as the input signal of the node of the input layer of the neural network. Forward computation is performed on the far-end signal. In the forward computation, the output value of the node of the previous layer is weighted and summed layer by layer according to the weight coefficient of the node from the previous layer to the node of the current layer to obtain the predicted value. The predicted value is then nonlinearly processed to obtain the output value of the node of the current layer. This process continues until the output value of the node of the output layer is obtained as the predicted echo signal.

6. An echo cancellation device, characterized in that it comprises:

A nonlinear filter is used to obtain a predicted echo signal based on a far-end signal. The nonlinear filter is constructed based on a neural network, and during the forward calculation of the far-end signal based on the neural network, the nonlinear filter performs nonlinear processing on the weighted summation result of each node in the forward calculation based on a nonlinear function.

An echo cancellation module is used to cancel the echo of the near-end signal input to the microphone based on the predicted echo signal;

The device further includes:

An error determination module is used to determine an error signal based on the predicted echo signal and the desired input signal;

The nonlinear filter is used to adjust the weight coefficients of the neural network according to the error signal;

The neural network includes an input layer, an output layer, and one or more hidden layers between the input layer and the output layer. Each of the input layer, the output layer, and the hidden layers has one or more nodes. The nonlinear filter is further used to calculate the adjusted weight coefficients based on the error signal e(k) as $w_{ij}^{l}(k+1) = w_{ij}^{l}(k) + \mu\,\Delta w$, where:

- e(k) = d(k) - o(k), o(k) is the predicted echo signal, and d(k) is the desired input signal;
- $w_{ij}^{l}(k)$ represents the weight coefficient from the i-th node in the (l-1)-th layer to the j-th node in the l-th layer of the neural network at time k;
- $w_{ij}^{l}(k+1)$ represents the same weight coefficient at time k+1;
- μ is the adjustment step size;
- Δw represents the change in the weight coefficient, which is obtained from the error signal e(k) by taking its partial derivative with respect to the weight coefficient.

7. The apparatus according to claim 6, characterized in that it further comprises a dual-end detection module, used for:

Performing dual-end detection on the near-end signal and the far-end signal respectively to determine whether there is sound at the near end and at the far end. When there is no sound at the near end but sound at the far end, the nonlinear filter is activated to adjust the weight coefficients of the neural network according to the error signal; in this case, the desired input signal includes the near-end signal input by the microphone when there is no sound at the near end but sound at the far end. And/or, when there is sound at the far end, the echo cancellation module is activated to cancel the echo of the near-end signal input by the microphone according to the predicted echo signal.

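8. The apparatus according to any one of claims 6-7, characterized in that the neural network comprises an input layer, an output layer, and one or more hidden layers between the input layer and the output layer, each of the input layer, the output layer, and the hidden layers having one or more nodes, and the nonlinear filter obtains the predicted echo signal based on the far-end signal in the following manner: the far-end signal is used as the input signal of the nodes of the input layer of the neural network and forward computation is performed on it; in the forward computation, the output values of the nodes of the previous layer are weighted and summed layer by layer according to the weight coefficients from the nodes of the previous layer to the nodes of the current layer to obtain a predicted value, and the predicted value is processed nonlinearly to obtain the output value of the nodes of the current layer, until the output value of the node of the output layer is obtained as the predicted echo signal.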

9. A storage medium, characterized in that the storage medium stores a computer program, wherein the computer program is configured to execute the method described in any one of claims 1 to 5 when it is run.