CN101820302B

CN101820302B - Device and method for canceling echo

Info

Publication number: CN101820302B
Application number: CN 200910105667
Authority: CN
Inventors: 王进军; 李智江; 吴浪浪
Original assignee: BYD Co Ltd
Current assignee: BYD Co Ltd
Priority date: 2009-02-27
Filing date: 2009-02-27
Publication date: 2013-10-30
Anticipated expiration: 2029-02-27
Also published as: CN101820302A

Abstract

The invention discloses a device for canceling echo, which comprises a near-end cache module, a near-end voice detection module, a far-end voice detection module, a near-end pitch period module, a far-end pitch period module and a signal separation module. The device can analyze independent components according to a near-end input signal, calculates each signal source and a pitch period thereof when the analyzing result of the independent components is converged, compares the pitch period of each signal source with a near-end pitch period and a far-end pitch period to obtain an acoustic echo signal, further obtains the near-end input signal after the echo is cancelled and outputs the near-end input signal to a far end so as to effectively cancel the acoustic echo in real time.

Description

Echo cancellation device

Technical Field

The invention relates to the field of voice signal processing, in particular to an echo cancellation device and method.

Background

With the advent of the information age, the means of communication on which people increasingly depend have developed from early voice-only communication into integrated multi-service, multi-network communication. In communication services where a voice playing device and a voice capturing device must be used simultaneously, such as teleconferences, video conferences and network calls, echo degrades call quality to some extent. Echoes can be divided into electrical echoes and acoustic echoes; electrical echoes are mainly caused by mismatches at the conversion points present in the communication system.

The acoustic echo is formed by acoustic coupling between the voice playing device and the voice capturing device. Specifically, when a full-duplex channel is used, the speakers and microphones at the near end and the far end all work simultaneously. The near-end input signal is played by the far-end speaker; the sound played by that speaker is picked up by the far-end microphone and transmitted back to the near-end speaker, where the near-end microphone picks it up again, and acoustic echo is generated.

In order to increase the stability of a full-duplex communication system and improve the communication quality, an echo cancellation device with a self-adaptive filtering function which is commonly used at present is arranged at a corresponding position of the system so as to solve the echo problem.

The adaptive filtering is to automatically adjust the filter parameters at the current time to adapt to the input signal at the current time by using the result of the filter parameters obtained at the previous time, so as to realize the optimized filtering. An echo cancellation device with adaptive filtering function generates a simulated echo signal according to the estimated characteristic parameters of the echo path, and subtracts the echo signal from the received signal to realize echo cancellation.
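The adaptive-filtering approach described above can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) update, which is one common variant and is not named in this document; the function names, the three-tap echo path and the step size mu are hypothetical values chosen only for illustration.

```python
import random

def simulate_echo(far, path):
    """Echo picked up by the microphone: far-end signal convolved with the echo path."""
    return [sum(h * far[n - k] for k, h in enumerate(path) if n - k >= 0)
            for n in range(len(far))]

def nlms_echo_canceller(far, mic, taps=3, mu=0.5, eps=1e-8):
    """Return (echo-cancelled signal, final filter weights)."""
    w = [0.0] * taps      # current estimate of the echo-path parameters
    buf = [0.0] * taps    # most recent far-end samples, newest first
    out = []
    for x, d in zip(far, mic):
        buf = [x] + buf[:-1]
        echo_hat = sum(wi * bi for wi, bi in zip(w, buf))  # simulated echo signal
        e = d - echo_hat                                   # received signal minus simulated echo
        norm = sum(b * b for b in buf) + eps
        # Adjust the parameters from the previous step using the current error.
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w

random.seed(0)
echo_path = [0.6, -0.3, 0.1]  # hypothetical echo path, for illustration only
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = simulate_echo(far, echo_path)
residual, w = nlms_echo_canceller(far, mic)
```

In this noise-free toy setting the weights approach the echo path; the step size mu is the kind of externally tuned parameter whose adjustment the following paragraph identifies as a drawback.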

However, because such an echo cancellation device lacks accurate control of the adaptive filter, it cannot operate stably when the environmental noise is large; the echo is then not effectively cancelled, and noise is artificially introduced. In addition, for each application scenario the externally set step-size factor has to be tuned repeatedly to find a reasonable value. If the speaker moves, or several speakers talk at the same time (i.e., in the presence of severe acoustic echo), the operating state of the adaptive filter is also affected.

Disclosure of Invention

The invention aims to solve the problem that existing adaptive-filtering echo cancellation devices cancel echo poorly; to this end, the invention provides an echo cancellation device for acoustic echo that is only slightly influenced by the external environment.

The invention is realized by the following steps:

an echo cancellation device, comprising:

the near-end cache module is used for receiving and storing a near-end input signal and outputting the near-end input signal to the signal separation module;

the near-end voice detection module is used for receiving a near-end input signal and sending a near-end pitch trigger signal to the near-end pitch period module when judging that the signal is a voice signal;

the far-end voice detection module is used for receiving a far-end input signal and sending a far-end pitch trigger signal to the far-end pitch period module when judging that the signal is a voice signal;

a near-end pitch period module, configured to receive a near-end input signal and a near-end pitch trigger signal sent by the near-end speech detection module, generate a near-end speech pitch period according to the near-end input signal, and output the near-end speech pitch period to the signal separation module;

the far-end pitch period module is used for receiving a far-end input signal and a far-end pitch trigger signal sent by the far-end voice detection module, generating a far-end voice pitch period according to the far-end input signal and outputting the far-end voice pitch period to the signal separation module;

the signal separation module is used for receiving a near-end input signal sent by the near-end cache module, a near-end pitch period sent by the near-end pitch period module and a far-end pitch period sent by the far-end pitch period module, carrying out independent component analysis according to the near-end input signal, and calculating each signal source and the pitch period thereof when the independent component analysis result is converged; the signal separation module compares the pitch period of each signal source with the near-end voice pitch period and the far-end voice pitch period to obtain an echo signal and outputs the echo signal to an adder; the adder calculates the difference between the near-end input signal and the echo signal to obtain the near-end signal after echo cancellation and outputs the near-end signal.

The invention further provides an echo cancellation method, wherein the method comprises the following steps:

A. the near-end cache module stores a near-end input signal, meanwhile, the near-end voice detection module stores the near-end input signal and judges whether the signal is a voice signal or not, and if the signal is the voice signal, the step B is executed; meanwhile, the far-end voice detection module stores a far-end input signal and judges whether the signal is a voice signal or not, and if the signal is the voice signal, the step C is executed;

B. the near-end pitch period module receives a near-end input signal, generates a near-end voice pitch period according to the near-end input signal and sends the near-end voice pitch period to the signal separation module;

C. the far-end pitch period module receives a far-end input signal, generates a far-end voice pitch period according to the far-end input signal and sends the far-end voice pitch period to the signal separation module;

D. the signal separation module carries out independent component analysis according to the stored near-end input signal, and when the independent component analysis result is converged, each signal source and the pitch period thereof are calculated; and comparing the pitch period of each signal source with the near-end voice pitch period and the far-end voice pitch period according to the received near-end voice pitch period and the far-end voice pitch period to obtain an echo signal, and calculating the near-end signal after the acoustic echo is eliminated and outputting the near-end signal to the far end.

Compared with the prior art, the echo cancellation device provided by the invention can perform echo cancellation in real time and is only slightly influenced by the external environment, and thus cancels echo effectively.

Drawings

FIG. 1 is a schematic diagram of the implementation of acoustic echo cancellation using the present invention in a telecommunications network;

FIG. 2 is a schematic block diagram of a first embodiment of the present invention for implementing acoustic echo cancellation;

FIG. 3 is a schematic block diagram of a second embodiment of the present invention for implementing acoustic echo cancellation;

FIG. 4 is a schematic block diagram of a third embodiment of the present invention for implementing acoustic echo cancellation;

FIG. 5 is a functional block diagram of a fourth embodiment of the present invention for implementing acoustic echo cancellation;

FIG. 6 is a block diagram of a fifth embodiment of the present invention for implementing acoustic echo cancellation;

FIG. 7 is a schematic block diagram of a sixth embodiment of the present invention for implementing acoustic echo cancellation.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples.

Fig. 1 is a schematic diagram of two terminals respectively using two echo cancellation devices of the present invention to perform near-end acoustic echo cancellation and far-end acoustic echo cancellation in a telecommunication network, wherein a signal with far-end acoustic echo inputted from the near-end is outputted through a second echo cancellation device; the signal with near-end acoustic echo input at the far end is output through the first echo cancellation device, so that both ends have better voice quality under the full-duplex working condition.

Fig. 2 is a schematic block diagram of a first embodiment of implementing acoustic echo cancellation according to the present invention; it shows the portion of fig. 1 enclosed by the dashed box. In the figure, Sin represents the initial signal to be transmitted from the near end to the far end, namely the near-end input signal; Sin comprises one or more of a voice signal, an environmental noise signal and an acoustic echo signal, and is observed through a plurality of observation paths. Sout represents the processed near-end input signal that is output to the far end, i.e., the near-end signal after the environmental noise signal and the acoustic echo have been cancelled. Rin represents the initial signal to be sent from the far end to the near end, i.e., the far-end input signal; Rout denotes the far-end input signal as transmitted to the near end.

As shown in fig. 2, the echo cancellation apparatus mainly includes:

the near-end cache module 1 is used for receiving and storing Sin and outputting the Sin to the signal separation module 6;

the near-end voice detection module 2 is used for receiving Sin and sending a near-end pitch trigger signal to the near-end pitch period module 4 when judging that the signal is a voice signal;

a far-end voice detection module 3, configured to receive Rin, and send a far-end pitch trigger signal to the far-end pitch period module 5 when determining that the signal is a voice signal;

a near-end pitch period module 4, configured to receive a near-end pitch trigger signal and Sin sent by the near-end speech detection module 2, where the near-end pitch period module 4 generates a near-end speech pitch period according to Sin and sends the near-end speech pitch period to the signal separation module 6;

the far-end pitch period module 5 is configured to receive a far-end pitch trigger signal and Rin sent by the far-end voice detection module 3, and the far-end pitch period module 5 generates a far-end voice pitch period according to the Rin and sends the far-end voice pitch period to the signal separation module 6;

a signal separation module 6, configured to receive the near-end input signal sent by the near-end cache module 1, the near-end speech pitch period sent by the near-end pitch period module 4, and the far-end speech pitch period sent by the far-end pitch period module 5, where the signal separation module 6 performs independent component analysis according to Sin and, when the independent component analysis result converges, calculates each signal source and its pitch period; the signal separation module 6 compares the pitch period of each signal source with the near-end voice pitch period and the far-end voice pitch period to obtain an echo signal re' and outputs re' to the adder; and the adder calculates the difference between Sin and the echo signal re' to obtain a near-end signal after echo cancellation and outputs the near-end signal to the far end.

The echo signal re' includes an acoustic echo signal and an ambient noise signal.

The echo cancellation device provided by the invention can perform echo cancellation in real time and is only slightly influenced by the external environment, and thus cancels acoustic echo effectively.

Based on the first embodiment, the echo cancellation device of the present invention further provides a second embodiment. Fig. 3 is a schematic block diagram of a second embodiment, as shown in fig. 3, the second embodiment includes all the modules of fig. 2, and:

the signal separation module includes:

the whitening unit 61: configured to form a mixing matrix from the near-end input signals of a plurality of observation paths, whiten the mixing matrix to obtain a whitened signal, and output the whitened signal to the calculation unit 62;

the calculation unit 62: configured to receive the whitened signal and Sin sent by the whitening unit 61, perform iterative computation on the whitened signal, obtain a separation matrix upon convergence, calculate each signal source according to Sin and the separation matrix, and output the signal sources to the extraction unit 63;

the extraction unit 63: configured to receive each signal source sent by the calculation unit 62, the near-end speech pitch period sent by the near-end pitch period module 4, and the far-end speech pitch period sent by the far-end pitch period module 5, compare the pitch period of each signal source with the near-end speech pitch period and the far-end speech pitch period to obtain an echo signal, and output the echo signal to the adder; the adder calculates the difference between Sin and the echo signal to obtain a near-end signal after echo cancellation and outputs the near-end signal to the far end.

Fig. 4 is a schematic block diagram of a third embodiment of the echo cancellation device of the present invention, which includes all the blocks of fig. 2, compared with the first embodiment, and:

the near-end voice detection module includes a near-end energy calculation unit 21 and a near-end voice determination unit 22.

The near-end energy calculating unit 21 is configured to receive the near-end input signal Sin, calculate the short-time energy ESin of Sin, multiply the maximum energy of Sin during the silent period by a constant 1.2 to obtain the near-end voice threshold ETs, and output ETs and the energy ESin of Sin during the normal call to the near-end voice determining unit 22;

the near-end speech determination unit 22 is configured to receive the near-end input signal short-time energy ESin and the near-end speech threshold ETs, compare the two, and determine that the near-end input signal is a speech signal when the near-end input signal short-time energy ESin is greater than the near-end speech threshold ETs; when the near-end input signal is a speech signal, a near-end pitch trigger signal is sent to the near-end pitch period module 4.

Wherein,

the short-time energy of the near-end input signal is calculated by the formula:

E_{Sin} = Σ_{n = 0}^{N - 1} {S_{in}}^{2} (n)

where N is the number of signal samples in a short period of time (e.g., 20 ms).

Of course, the near-end speech threshold ETs may be a predetermined value, for example, it may be empirically taken to be 0.001.

The far-end voice detection module includes a far-end energy calculation unit 31 and a far-end voice judgment unit 32.

The far-end energy calculating unit 31 is configured to receive the far-end signal Rin, calculate short-time energy ERin of the far-end input signal, multiply a maximum value of the energy of Rin during the silent period by a constant 1.2 to obtain a numerical value serving as a far-end voice threshold ETr, and output the ETr and the energy ERin of Rin during the normal call to the far-end voice judging unit 32;

the far-end voice judging unit 32 is configured to receive the short-time energy ERin of the far-end input signal and the far-end voice threshold ETr, compare the short-time energy ERin of the far-end input signal with the far-end voice threshold ETr, and judge that the far-end input signal is a voice signal when the short-time energy ERin of the far-end input signal is greater than the far-end voice threshold ETr; when the far-end input signal is a speech signal, a far-end pitch trigger signal is sent to the far-end pitch period module 5.

Wherein,

the short-time energy calculation formula of the far-end input signal is as follows:

E_{Rin} = Σ_{n = 0}^{N - 1} {R_{in}}^{2} (n)

Of course, the far-end speech threshold ETr may be a predetermined value, for example, empirically taken to be 0.001.
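The energy-based voice detection used by both the near-end and far-end modules can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the 8 kHz sampling rate (giving N = 160 samples per 20 ms frame) is assumed, and the test frames are synthetic.

```python
def short_time_energy(frame):
    """Sum of squared samples over one frame of N samples."""
    return sum(s * s for s in frame)

def speech_threshold(silent_frames, factor=1.2):
    """Threshold = 1.2 times the largest frame energy observed during the silent period."""
    return factor * max(short_time_energy(f) for f in silent_frames)

def is_speech(frame, threshold):
    """A frame is judged to be voice when its short-time energy exceeds the threshold."""
    return short_time_energy(frame) > threshold

N = 160  # 20 ms at an assumed 8 kHz sampling rate
silence = [[0.001 * (-1) ** n for n in range(N)] for _ in range(3)]  # synthetic silent frames
voiced = [0.1 * (-1) ** (n // 40) for n in range(N)]                 # synthetic loud frame
ET = speech_threshold(silence)
```

The same three functions apply unchanged to Rin with the far-end threshold ETr; a fixed threshold such as 0.001 could be passed to `is_speech` instead of the computed one.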

Fig. 5 is a schematic block diagram of a fourth embodiment of the echo cancellation device of the present invention, which includes all the blocks of fig. 2, compared with the first embodiment, and:

the near-end pitch period module comprises:

a near-end low-pass filtering unit 41, configured to receive the near-end pitch trigger signal and Sin sent by the near-end speech detection module 2, and perform filtering processing on Sin. The near-end low-pass filtering unit 41 filters out high-frequency noise interference and also weakens the influence of the multiple harmonic components in the spectrum of the near-end input signal on the first formant, without weakening the fundamental frequency information of the near-end input signal;

a near-end fourier transform unit 42, configured to generate spectrum information of a near-end input signal and output the spectrum information to a near-end maximum likelihood decision unit 44;

a near-end linear prediction unit 43, configured to perform linear prediction processing on Sin to form a frequency spectrum of a channel impulse response of Sin and output the frequency spectrum to a near-end maximum likelihood decision unit 44;

a near-end maximum likelihood decision unit 44, configured to receive the frequency spectrum information sent by the near-end fourier transform unit 42 and the frequency spectrum of the channel impulse response sent by the near-end linear prediction unit 43, generate a near-end speech pitch period according to the frequency spectrum information and the frequency spectrum of the channel impulse response, and send the near-end speech pitch period to the signal separation module 6.

The far-end pitch period module comprises:

a far-end low-pass filtering unit 51, configured to receive the far-end pitch trigger signal and Rin sent by the far-end voice detection module 3, and perform filtering processing on Rin. The far-end low-pass filtering unit 51 filters out high-frequency noise interference and also weakens the influence of the multiple harmonic components in the spectrum of the far-end input signal on the first formant, without weakening the fundamental frequency information of the far-end input signal;

a far-end fourier transform unit 52, configured to generate spectrum information of a far-end input signal and output the spectrum information to a far-end maximum likelihood decision unit 54;

a far-end linear prediction unit 53, configured to perform linear prediction processing on Rin, form frequency spectrum information of a channel impulse response of Rin, and output the frequency spectrum information to a far-end maximum likelihood decision unit 54;

a far-end maximum likelihood decision unit 54, configured to receive the spectrum information sent by the far-end fourier transform unit 52 and the spectrum information of the channel impulse response sent by the far-end linear prediction unit 53, generate a far-end speech pitch period according to the two, and send the far-end speech pitch period to the signal separation module 6.

Each unit in the far-end pitch period module may adopt the same data processing mode as the corresponding unit in the near-end pitch period module; the specific processing method of each unit is described below by taking the near-end pitch period module as an example:

One option for the near-end low-pass filtering unit 41 is a 5th-order low-pass filter with a cut-off frequency of 800 Hz, which performs filtering processing on Sin.

The near-end fourier transform unit 42 receives the signal processed by the near-end low-pass filtering unit 41, searches the short-time spectrum of the near-end input signal within a frame for the peak corresponding to the first largest formant, and converts that peak into a time-domain value, denoted x_f. The interval [x_f - 1, x_f + 1] is taken as the preliminary estimate of the pitch period; periodic extension is then performed to obtain the glottal excitation, a Hamming window is applied, and spectrum analysis by the Fourier transform yields the glottal excitation spectrum, which is sent to the near-end maximum likelihood decision unit 44;

the near-end linear prediction unit 43 receives Sin, performs linear prediction processing, calculates the vocal tract impulse response, and sends the corresponding frequency spectrum to the near-end maximum likelihood decision unit 44. Because speech samples are correlated, past sample values can be used to predict present or future sample values; that is, a speech sample can be approximated by a linear combination of past speech samples:

{\hat{S}}_{in} (n) = - a_{1} S_{in} (n - 1) - a_{2} S_{in} (n - 2) - . . . - a_{p} S_{in} (n - p)

In the formula, a_i is the weighting coefficient of the past speech sample S_in(n-i), and p (p may be 10) is the prediction order. The difference between the true signal and the predicted signal is the prediction residual:

e (n) = S_{in} (n) - {\hat{S}}_{in} (n) = S_{in} (n) + Σ_{i = 1}^{p} a_{i} S_{in} (n - i)

By minimizing E[|e(n)|²] according to the LMS (least mean square) criterion, a unique set of linear prediction coefficients a_i (i = 1, 2, ..., p) can be determined. Once the prediction coefficients a_i are determined, the spectrum of the corresponding frequency response can be obtained and output to the near-end maximum likelihood decision unit 44.

From the received glottal excitation spectrum and vocal tract spectrum, the near-end maximum likelihood decision unit 44 can reconstruct the short-time spectrum of the original input signal. The reconstructed spectrum and the short-time spectrum of the original input signal are compared for similarity by computing the mean square error between the two; the value at which the mean square error is minimal is the corrected pitch period. The near-end maximum likelihood decision unit 44 outputs the obtained near-end speech pitch period to the signal separation module 6 for pitch matching.
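The linear prediction step of unit 43 can be sketched as below, using the sign convention of the formulas above (the predicted sample is minus the weighted sum of past samples). The text only states that the mean square prediction error is minimized; solving that problem with the autocorrelation method and the Levinson-Durbin recursion is an assumption made here, and all function names are hypothetical.

```python
def autocorrelation(s, max_lag):
    """r[k] = sum over n of s(n) * s(n - k), for k = 0..max_lag."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s))) for k in range(max_lag + 1)]

def lpc(s, p):
    """Return ([a_1..a_p], residual energy) so that s_hat(n) = -sum_i a_i * s(n - i)."""
    r = autocorrelation(s, p)
    a = [1.0] + [0.0] * p
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                     # reflection coefficient of order i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                 # remaining prediction error energy
    return a[1:], err

def prediction_residual(s, coeffs):
    """e(n) = s(n) + sum_i a_i * s(n - i); samples before n = 0 are treated as zero."""
    return [s[n] + sum(a * s[n - i] for i, a in enumerate(coeffs, start=1) if n - i >= 0)
            for n in range(len(s))]

# Synthetic test signal: each sample is 0.9 times the previous one.
signal = [0.9 ** n for n in range(200)]
coeffs, err = lpc(signal, 2)
residual = prediction_residual(signal, coeffs)
```

For this synthetic signal the first coefficient comes out close to -0.9 and the second close to zero, so the residual is essentially confined to the first sample. With p = 10 on real speech, the coefficients describe the vocal tract response whose spectrum unit 43 passes on.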

Fig. 6 is a schematic block diagram of a fifth embodiment of the echo cancellation device according to the present invention, which includes all the blocks of fig. 2, and further includes a control module 7 and an output module 8 for receiving the control signal sent by the control module 7,

the near-end voice detection module 2 receives a near-end input signal, and sends a near-end voice trigger signal to the control module 7 when judging that the signal is a voice signal;

the far-end voice detection module 3 receives a far-end input signal, and sends a far-end voice trigger signal to the control module 7 when judging that the signal is a voice signal;

when only receiving the near-end voice trigger signal sent by the near-end voice detection module 2, or not receiving any voice trigger signal, the control module 7 outputs a control signal to the output module 8; the output module 8 receives the control signal and outputs the near-end input signal to the far end;

when only receiving the far-end voice trigger signal sent by the far-end voice detection module 3, or simultaneously receiving the near-end voice trigger signal sent by the near-end voice detection module 2 and the far-end voice trigger signal sent by the far-end voice detection module 3, the control module 7 controls the signal separation module 6 to receive the near-end input signal sent by the near-end cache module 1, the near-end voice pitch period sent by the near-end pitch period module 4, and the far-end voice pitch period sent by the far-end pitch period module 5; the signal separation module 6 performs independent component analysis according to the near-end input signal and, when the independent component analysis result converges, calculates each signal source and its pitch period; the signal separation module 6 compares the pitch period of each signal source with the near-end voice pitch period and the far-end voice pitch period to obtain an acoustic echo signal and outputs the acoustic echo signal to the adder; the adder calculates the difference between the near-end input signal and the echo signal to obtain a near-end signal after echo cancellation and outputs the near-end signal to the far end.
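The decision made by the control module 7 reduces to a small truth table, sketched below. The function name and the two return labels are hypothetical; only the branching follows the description above.

```python
def route_near_end_frame(near_is_speech, far_is_speech):
    """Decide how the current near-end frame is handled.

    Returns "bypass" when the frame can be passed through the output module
    unchanged, or "separate" when the signal separation module must run.
    """
    if far_is_speech:
        # Far-end voice only, or voice at both ends: echo may be present.
        return "separate"
    # Near-end voice only, or no voice at all: no far-end speech to echo.
    return "bypass"

decisions = {(near, far): route_near_end_frame(near, far)
             for near in (False, True) for far in (False, True)}
```

The result depends only on the far-end trigger, which reflects the fact that acoustic echo in the near-end input arises from far-end speech.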

Based on the fifth embodiment, the echo cancellation device of the present invention provides a sixth embodiment, which includes all the modules of the fifth embodiment, and in which:

the output module further comprises a comfort sound generating unit and a signal output unit,

when the control module only receives the near-end voice trigger signal sent by the near-end voice detection module or does not receive any voice trigger information, the control module outputs a control signal to the comfort sound generation unit;

the comfort sound generating unit is used for receiving the control signal sent by the control module, generating a comfort sound signal of a certain level, and outputting it to the signal output unit;

the signal output unit superposes the near-end input signal and the comfort sound signal and outputs the superposed signal to the far end.

Obviously, the echo cancellation device with the comfort sound generating unit can provide a comfort sound signal of a certain level to the far-end party when there is no voice signal at either end, which effectively prevents the user from mistakenly believing that the line has been interrupted or the equipment has failed.

The following is a specific implementation of the echo cancellation method provided by the present invention, according to the first embodiment:

S01: the near-end cache module stores Sin; meanwhile, the near-end voice detection module stores Sin and judges whether the signal is currently a voice signal, and if it is, step S02 is executed; meanwhile, the far-end voice detection module stores Rin and judges whether the signal is a voice signal, and if it is, step S03 is executed;

S02: the near-end pitch period module receives Sin, generates a near-end voice pitch period according to Sin and sends the near-end voice pitch period to the signal separation module;

S03: the far-end pitch period module receives Rin, generates a far-end voice pitch period according to Rin and sends the far-end voice pitch period to the signal separation module;

S04: the signal separation module performs independent component analysis according to the stored Sin and, when the independent component analysis result converges, calculates each signal source and its pitch period; it then compares the pitch period of each signal source with the received near-end voice pitch period and far-end voice pitch period to obtain an acoustic echo signal, calculates the near-end signal after the acoustic echo is cancelled, and outputs the near-end signal to the far end.

The signal separation module performs independent component analysis in real time; when the analysis result converges, the separation matrix is calculated, and each signal source is thereby obtained.

When the near-end input signal and the far-end input signal are both non-voice signals, Sin is only a near-end environmental noise signal; when only Sin is a voice signal, Sin contains a near-end voice signal and a near-end environmental noise signal; when only Rin is a voice signal, Rin contains a far-end voice signal and a far-end environmental noise signal, and Sin contains a near-end environmental noise signal and the echo signal generated by Rin; when the input signals at both ends are voice signals, Rin contains the echo signal generated by the near-end input signal Sin, a far-end input voice signal and a far-end environmental noise signal.

From the above analysis, the case in which the near-end input signal and the far-end input signal are both voice signals is the most complicated, and the other cases can be regarded as simplifications of it. The steps by which the signal separation module calculates each signal source are therefore described below, taking the case in which the input signals at both ends are voice signals as an example:

For convenience of expression, the invention simplifies the near-end input signal into

S(n) = [S₁(n) S₂(n) S₃(n)]

where S₁(n), S₂(n) and S₃(n) are the near-end input signals received by three microphones; each is a mixed signal formed by superposing, to different degrees, the near-end voice signal, the echo signal generated by the far-end input signal, and the near-end environmental noise signal.

SP01: whiten the signal mixing matrix X(n) to obtain the whitened signal Y(n), where Y(n) = UX(n). The whitening method is to perform eigenvalue decomposition of the covariance matrix of the signal mixing matrix X(n), so that R_x = VΛV^T. Letting U = Λ^{-1/2}V^T, the whitened signal is Y(n) = Λ^{-1/2}V^TX(n).
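Step SP01 can be illustrated with a small sketch. To stay short it assumes only two observation channels (the text uses three microphones), so that the eigendecomposition of the covariance matrix has a closed form; the function name, the mixing weights and the synthetic sources are hypothetical.

```python
import math
import random

def whiten_2ch(x1, x2):
    """Whiten two observation channels; returns (y1, y2, U) with Y = U X."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    x1 = [v - m1 for v in x1]              # remove the mean of each channel
    x2 = [v - m2 for v in x2]
    # Covariance matrix R_x = [[a, b], [b, c]].
    a = sum(v * v for v in x1) / n
    c = sum(v * v for v in x2) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    # Closed-form eigendecomposition R_x = V L V^T of a symmetric 2x2 matrix,
    # with V = [[cs, -sn], [sn, cs]].
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = a * cs * cs + 2.0 * b * cs * sn + c * sn * sn
    lam2 = a * sn * sn - 2.0 * b * cs * sn + c * cs * cs
    # U = L^{-1/2} V^T.
    U = [[cs / math.sqrt(lam1), sn / math.sqrt(lam1)],
         [-sn / math.sqrt(lam2), cs / math.sqrt(lam2)]]
    y1 = [U[0][0] * u + U[0][1] * v for u, v in zip(x1, x2)]
    y2 = [U[1][0] * u + U[1][1] * v for u, v in zip(x1, x2)]
    return y1, y2, U

random.seed(0)
s1 = [random.uniform(-1.0, 1.0) for _ in range(500)]   # synthetic source 1
s2 = [random.uniform(-1.0, 1.0) for _ in range(500)]   # synthetic source 2
x1 = [1.0 * p + 0.5 * q for p, q in zip(s1, s2)]       # hypothetical mixing weights
x2 = [0.3 * p + 1.0 * q for p, q in zip(s1, s2)]
y1, y2, U = whiten_2ch(x1, x2)
```

After whitening, the two output channels have unit variance and zero cross-correlation, which is the property the fixed-point iteration of the next step relies on.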

SP02: the signal separation module estimates a separation matrix W such that

\hat{S} (n) = WY (n),

whose components approximate those of S(n). The iteration steps of the algorithm within a frame are as follows.

1) Let i = 1.

2) Initialize the weight vector w_i(0) and let k = 1.

3) Let w_i(k) = E[Y_i(w_i(k-1)^TY_i)³] - 3w_i(k-1).

4) Let

w_{i} (k) = \frac{w_{i} (k)}{| | w_{i} (k) | |},

To ensure that a different independent component is estimated each time, an orthogonal projection must be added in the loop, giving

w_{i} (k) = w_{i} (k) - Σ_{j = 1}^{k - 1} w_{i} {(k)}^{T} w_{i} (j) w_{i} (j) .

5) If |w_i(k)^Tw_i(k-1)| converges to 1, stop the iteration and output w_i(k); otherwise, let k = k + 1, return to step 3), and continue the iteration. This continues until the separation matrix W is obtained, which can be written as:

W = [\begin{matrix} w_{1} (1) & w_{1} (2) & w_{1} (3) \\ w_{2} (1) & w_{2} (2) & w_{2} (3) \\ w_{3} (1) & w_{3} (2) & w_{3} (3) \end{matrix}]

SP03: according to

\hat{S} (n) = WY (n) = [\begin{matrix} {\hat{S}}_{1} & {\hat{S}}_{2} & {\hat{S}}_{3} \end{matrix}],

each signal source is calculated, and the pitch of each signal source is calculated at the same time.
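As an illustration of step SP03, the sketch below applies a separation matrix to a whitened frame and estimates a pitch period for each recovered source. The pitch estimator here is a simple autocorrelation peak picker, which is a substitute chosen for brevity and not the Fourier transform, linear prediction, and maximum likelihood procedure of the pitch period modules. The sampling rate, the 60 to 400 Hz search range, the identity separation matrix, and the sinusoidal test rows are assumptions.

```python
import numpy as np

def pitch_period(signal, fs, f_min=60.0, f_max=400.0):
    """Return the pitch period in samples, taken as the autocorrelation peak."""
    x = signal - signal.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0 .. len(x)-1
    lag_min = int(fs / f_max)                            # shortest period searched
    lag_max = int(fs / f_min)                            # longest period searched
    return lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))

def separate_sources(W, Y):
    """S_hat(n) = W Y(n): one estimated source per row."""
    return W @ Y

fs = 8000                                   # assumed sampling rate in Hz
t = np.arange(800) / fs                     # one 100 ms frame
# Toy frame with rows at 100 Hz, 160 Hz and 250 Hz; W is the identity for simplicity.
Y = np.vstack([np.sin(2 * np.pi * 100 * t),
               np.sin(2 * np.pi * 160 * t),
               np.sin(2 * np.pi * 250 * t)])
S_hat = separate_sources(np.eye(3), Y)
periods = [pitch_period(s, fs) for s in S_hat]   # pitch periods in samples
```

For the toy frame the periods come out as 80, 50, and 32 samples, that is, the sampling rate divided by each row's frequency.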

The case in which both ends have speech signals is the most complicated, and the other cases are simplifications of it. Signal extraction is therefore explained below using the specific steps for the case in which both ends have speech signals:

SP04: compare the pitch period of each signal source with the pitch period of the far-end input speech, so as to determine which signal source corresponds to the speech of the far-end input signal. The remaining two signals are then known to be the acoustic echo signal generated at the near end and the far-end ambient noise signal, i.e. the signals to be eliminated.
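The pitch comparison can be sketched as a nearest-match rule. The description does not state the matching criterion, so the rule below (the source whose pitch period is closest to the far-end pitch period is treated as the echo, and the closest remaining one to the near-end pitch period as near-end speech) is an assumption, as are the function name `classify_sources` and the example values.

```python
def classify_sources(source_periods, far_end_period, near_end_period):
    """Label separated sources by comparing pitch periods (all in samples).

    Assumed rule: nearest match to the far-end period is the echo, nearest
    remaining match to the near-end period is near-end speech, the rest is noise.
    """
    indices = range(len(source_periods))
    echo = min(indices, key=lambda i: abs(source_periods[i] - far_end_period))
    rest = [i for i in indices if i != echo]
    near = min(rest, key=lambda i: abs(source_periods[i] - near_end_period))
    noise = [i for i in rest if i != near]
    return {"echo": echo, "near_speech": near, "noise": noise}

# Hypothetical pitch periods for three separated sources and the two reference signals.
result = classify_sources([80, 50, 32], far_end_period=51, near_end_period=79)
```

Here the second source (index 1) is labelled as the echo because its period of 50 samples is closest to the far-end period of 51 samples.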

Compared with the prior art, the echo cancellation method provided by the invention can cancel echo in real time and is only slightly affected by the external environment, so that the echo is cancelled effectively.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An echo cancellation device, comprising:

the near-end pitch period module is used for receiving a near-end input signal and a near-end pitch trigger signal sent by the near-end voice detection module, generating a near-end voice pitch period according to the near-end input signal and outputting the near-end voice pitch period to the signal separation module;

the signal separation module is used for receiving a near-end input signal sent by the near-end cache module, a near-end pitch period sent by the near-end pitch period module and a far-end pitch period sent by the far-end pitch period module, carrying out independent component analysis according to the near-end input signal, and calculating the pitch period of each signal source when the independent component analysis result is converged; the signal separation module compares the pitch period of each signal source with the near-end voice pitch period and the far-end voice pitch period to obtain an echo signal and outputs the echo signal to an adder; the adder calculates the difference between the near-end input signal and the echo signal to obtain a near-end signal after echo cancellation and outputs the near-end signal to a far-end loudspeaker, each signal source comprises a near-end input signal and a far-end input signal, wherein the signal separation module comprises:

the whitening unit is used for receiving the near-end input signal to form a mixing matrix, whitening the mixing matrix to obtain a whitened signal and outputting the whitened signal to the computing unit;

the computing unit is used for receiving the whitened signal and the near-end input signal sent by the whitening unit, performing iterative computation according to the whitened signal, obtaining a separation matrix when the iterative computation converges, computing each signal source according to the near-end input signal and the separation matrix, and outputting each signal source to the extracting unit;

the extracting unit is used for receiving each signal source sent by the computing unit, the near-end pitch period sent by the near-end pitch period module and the far-end pitch period sent by the far-end pitch period module, comparing the pitch period of each signal source with the near-end pitch period and the far-end pitch period, obtaining an echo signal and outputting the echo signal to the adder; the adder calculates the difference between the near-end input signal and the echo signal to obtain a near-end signal after echo cancellation and outputs the near-end signal to the far end.

2. The echo cancellation device of claim 1, wherein the near-end speech detection module comprises:

the near-end energy calculating unit is used for receiving a near-end input signal, calculating short-time energy of the signal, determining a near-end voice threshold according to the energy of the signal in a silent period, and outputting the near-end voice threshold and the energy of the near-end input signal during formal conversation to the near-end voice judging unit;

the near-end voice judging unit is used for receiving the near-end voice threshold value and the energy of the near-end input signal during formal conversation and comparing the near-end voice threshold value and the energy to judge whether the near-end input signal is a voice signal; and when the near-end input signal is a voice signal, sending a near-end pitch trigger signal to the near-end pitch period module.

3. The echo cancellation device of claim 1, wherein the far-end speech detection module comprises:

the far-end energy calculating unit is used for receiving a far-end input signal, calculating short-time energy of the signal, determining a far-end voice threshold value according to the energy of the signal in a silent period, and outputting the far-end voice threshold value and the energy of the far-end input signal during formal conversation to the far-end voice judging unit;

the far-end voice judging unit is used for receiving the far-end voice threshold value and the energy of the far-end input signal during formal conversation and comparing the far-end voice threshold value and the energy to judge whether the far-end input signal is a voice signal; and when the far-end input signal is a voice signal, sending a far-end pitch trigger signal to the far-end pitch period module.

4. The echo cancellation device of claim 1, wherein the near-end pitch period module comprises:

the near-end low-pass filtering unit is used for receiving a near-end input signal and a near-end pitch trigger signal sent by the near-end voice detection module, filtering the near-end input signal and outputting the filtered near-end input signal to the near-end Fourier transform unit and the near-end linear prediction unit;

the near-end Fourier transform unit is used for receiving the near-end input signal processed by the near-end low-pass filtering unit, generating the frequency spectrum information of the near-end input signal and outputting the frequency spectrum information to the near-end maximum likelihood judgment unit;

the near-end linear prediction unit is used for receiving the near-end input signal processed by the near-end low-pass filtering unit and carrying out linear prediction processing on it to form a frequency spectrum of the vocal tract impulse response of the near-end input signal, and outputting the frequency spectrum to the near-end maximum likelihood judgment unit;

and the near-end maximum likelihood judgment unit is used for receiving the frequency spectrum information sent by the near-end Fourier transform unit and the frequency spectrum of the vocal tract impulse response sent by the near-end linear prediction unit, generating a near-end voice pitch period according to the frequency spectrum information and the frequency spectrum of the vocal tract impulse response, and sending the near-end voice pitch period to the signal separation module.

5. The echo cancellation device of claim 1, wherein the far-end pitch period module comprises:

the far-end low-pass filtering unit is used for receiving a far-end input signal and a far-end pitch trigger signal sent by the far-end voice detection module, filtering the far-end input signal and outputting the filtered far-end input signal to the far-end Fourier transform unit and the far-end linear prediction unit;

the far-end Fourier transform unit is used for receiving the far-end input signal processed by the far-end low-pass filtering unit, generating the frequency spectrum information of the far-end input signal and outputting the frequency spectrum information to the far-end maximum likelihood judgment unit;

the far-end linear prediction unit is used for receiving the far-end input signal processed by the far-end low-pass filtering unit and carrying out linear prediction processing on it to form a frequency spectrum of the vocal tract impulse response of the far-end input signal, and outputting the frequency spectrum to the far-end maximum likelihood judgment unit;

and the far-end maximum likelihood judgment unit is used for receiving the frequency spectrum information sent by the far-end Fourier transform unit and the frequency spectrum of the vocal tract impulse response sent by the far-end linear prediction unit, generating a far-end voice pitch period according to the frequency spectrum information and the frequency spectrum of the vocal tract impulse response, and sending the far-end voice pitch period to the signal separation module.

6. The echo cancellation device of claim 1, further comprising a control module and an output module for receiving a control signal transmitted by the control module,

the near-end voice detection module receives a near-end input signal and sends a near-end voice trigger signal to the control module when judging that the signal is a voice signal;

the far-end voice detection module receives a far-end input signal and sends a far-end voice trigger signal to the control module when judging that the signal is a voice signal;

when the control module receives only a near-end voice trigger signal sent by the near-end voice detection module, or does not receive any voice trigger signal, the control module outputs a control signal to the output module; the output module receives the control signal and outputs the near-end input signal to the far-end loudspeaker;

when the control module receives only a far-end voice trigger signal sent by the far-end voice detection module, or receives a near-end voice trigger signal sent by the near-end voice detection module and a far-end voice trigger signal sent by the far-end voice detection module at the same time, the control module controls the signal separation module to receive the near-end input signal sent by the near-end cache module, the near-end voice pitch period sent by the near-end pitch period module and the far-end voice pitch period sent by the far-end pitch period module; the signal separation module carries out independent component analysis according to the near-end input signal, and when the independent component analysis result converges, calculates each signal source and its pitch period; the signal separation module compares the pitch period of each signal source with the near-end voice pitch period and the far-end voice pitch period to obtain an echo signal and outputs the echo signal to an adder; the adder calculates the difference between the near-end input signal and the echo signal to obtain a near-end signal after echo cancellation and outputs the near-end signal to the far-end loudspeaker.

7. The echo cancellation device of claim 6, wherein the output module further comprises a comfort noise generation unit and a signal output unit,

the comfort noise generation unit is used for receiving the control signal sent by the control module, generating a comfort noise signal with a certain level and outputting the comfort noise signal to the signal output unit;