CN111755020B

CN111755020B - Stereo echo cancellation method

Info

Publication number: CN111755020B
Application number: CN202010788470.4A
Authority: CN
Inventors: 王青云; 余兵; 梁瑞宇; 丁帆
Original assignee: Nanjing Shibaolian Information Technology Co ltd
Current assignee: Nanjing Shibaolian Information Technology Co ltd
Priority date: 2020-08-07
Filing date: 2020-08-07
Publication date: 2023-02-28
Anticipated expiration: 2040-08-07
Also published as: CN111755020A

Abstract

The invention discloses a stereo echo cancellation method that uses two adaptive filters. The reference signal of the first adaptive filter is the original far-end dual-channel frequency-domain signal without decorrelation processing, and the reference signal of the second adaptive filter is the far-end dual-channel frequency-domain signal with decorrelation processing. The first adaptive filter forms an independent echo cancellation system. Because the two reference signals do not match, the near-end microphone signal lacks part of the second filter's reference signal when its error is computed; this missing part is compensated by multiplying, in the frequency domain, the noise signal obtained from decorrelation by the coefficients of the first adaptive filter. The second adaptive filter therefore also forms an independent echo cancellation system. The method is suitable for multi-channel stereo echo cancellation and has good application prospects.

Description

Stereo echo cancellation method

Technical Field

The invention relates to the technical field of acoustic echo cancellation, in particular to a stereo echo cancellation method.

Background

With the rapid progress of electronics, the development of multimedia communication technology has promoted the wide use of hands-free systems such as video conferencing and teleconferencing systems. Users accordingly place higher demands on the quality of the speech signal. Users of hands-free systems increasingly expect a sense of being present at the scene, and this has led to research on stereo echo cancellation in teleconferencing and similar systems.

In a teleconferencing system, acoustic echo degrades the quality of two-way conversation and, in severe cases, can cause howling, so research on acoustic echo cancellation began as early as the 1980s. To date, acoustic echo cancellation has mainly relied on adaptive filtering. The basic idea is to use the estimated echo-path parameters to generate an estimated echo signal, subtract that estimate from the received signal to obtain a residual signal, and then use the residual signal to update the filter parameters. Single-channel acoustic echo cancellation methods are well established and are used in many practical fields. Research on multi-channel echo cancellation, however, started more recently, and many theoretical problems, research methods and implementation techniques still need to be improved.
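The estimate, subtract, update loop described above can be sketched with a normalized LMS (NLMS) filter. This is a generic single-channel illustration, not the method of the invention; the function name, tap count, step size and the synthetic echo path are all assumptions made for the example.

```python
import numpy as np

def nlms_echo_canceller(x, d, taps=32, mu=0.5, eps=1e-8):
    """Time-domain NLMS: estimate the echo of far-end x and subtract it from mic d."""
    w = np.zeros(taps)        # estimated echo-path coefficients
    buf = np.zeros(taps)      # most recent far-end samples, newest first
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y = w @ buf                                  # estimated echo signal
        e[n] = d[n] - y                              # residual signal
        w += mu * e[n] * buf / (buf @ buf + eps)     # update driven by the residual
    return e, w

# Synthetic test case (hypothetical echo path, no near-end speech or noise).
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
h = rng.standard_normal(32) * np.exp(-0.2 * np.arange(32))
d = np.convolve(x, h)[:len(x)]
e, w = nlms_echo_canceller(x, d)
```

With a single white-noise reference the estimate `w` approaches the true path `h`. With two strongly correlated stereo references this uniqueness is lost, which is the problem the following paragraphs describe.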

Existing multi-channel acoustic echo cancellation methods still generally use adaptive filtering. They differ from the single-channel case in that the strong correlation between the input channels causes slow convergence, a non-unique echo-path solution and large misalignment, which ultimately degrades echo cancellation. To address the correlation between the input signals, researchers have proposed many countermeasures. The common approach is to add a preprocessing stage that reduces the correlation between the input channels by decorrelation. This improves echo cancellation but degrades the sound quality of the stereo signal to a greater or lesser extent, which runs counter to the goal of a sense of being present at the scene.

Therefore, how to achieve multi-channel acoustic echo cancellation while improving the sound quality of the stereo signal, so that the listener has a sense of being present at the scene, is a problem that needs to be solved.

Disclosure of Invention

It is an object of the present invention to overcome the problems of prior-art multi-channel acoustic echo cancellation. The stereo echo cancellation method of the invention cancels the echo signal effectively without degrading stereo sound quality, so that a near-end user can experience a sense of being present at the scene, and it has good application prospects.

In order to achieve the purpose, the invention adopts the technical scheme that:

A stereo echo cancellation method includes the following steps:

step (A), converting the original far-end dual-channel time-domain signal into a dual-channel frequency-domain signal through Fourier transform;

step (B), performing decorrelation processing on the original far-end dual-channel time-domain signal to obtain a decorrelated far-end dual-channel time-domain signal, and converting it into a decorrelated dual-channel frequency-domain signal through Fourier transform, wherein the decorrelation processing uses a psychoacoustic masking noise model: a psychoacoustic masking threshold for the dual-channel time-domain signal is calculated from the spectrum of the input signal, a psychoacoustic masking noise time-domain signal is obtained from the masking threshold, and the decorrelated far-end dual-channel time-domain signal is the sum of the original far-end dual-channel time-domain signal and the dual-channel psychoacoustic masking noise time-domain signal;

step (C), multiplying the non-decorrelated dual-channel frequency-domain signal obtained in step (A) by the coefficients of the first adaptive filter in the frequency domain, adding the two channels, and performing an inverse Fourier transform to obtain the time-domain output of the first adaptive filter;

step (D), for one channel of the near-end microphone, subtracting the time-domain output of the first adaptive filter obtained in step (C) from the time-domain signal captured by that microphone channel to obtain a first time-domain error signal, which is the near-end signal after echo cancellation, performing a Fourier transform on the first time-domain error signal to obtain a first frequency-domain error signal, and updating the frequency-domain coefficients of the first adaptive filter used in step (C) from the first frequency-domain error signal and the far-end frequency-domain signal;

step (E), multiplying the decorrelated dual-channel frequency-domain signal obtained in step (B) by the coefficients of the second adaptive filter in the frequency domain, adding the two channels, and performing an inverse Fourier transform to obtain the time-domain output of the second adaptive filter;

step (F), performing a Fourier transform on the psychoacoustic masking noise obtained in step (B), multiplying it by the coefficients of the first adaptive filter in the frequency domain, and performing an inverse Fourier transform to obtain the time-domain noise convolution amount;

step (G), adding the time-domain noise convolution amount obtained in step (F) to the time-domain signal of the near-end microphone channel used in step (D) to obtain the near-end time-domain signal of the second adaptive filter;

step (H), subtracting the time-domain output of the second adaptive filter obtained in step (E) from the near-end time-domain signal of the second adaptive filter obtained in step (G) to obtain a second time-domain error signal, namely the signal sent to the far end after echo cancellation, performing a Fourier transform on the second time-domain error signal to obtain a second frequency-domain error signal, and updating the frequency-domain coefficients of the second adaptive filter used in step (E) from the second frequency-domain error signal and the far-end frequency-domain signal;

step (I), repeating steps (A) to (H) to perform echo cancellation based on the other channel of the near-end microphone.
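Step (B) above adds noise derived from a psychoacoustic masking threshold to each far-end channel. The masking model itself is not specified in this text, so the sketch below substitutes a deliberately crude stand-in: noise whose spectrum sits a fixed number of decibels below the frame's own spectrum, with random phase. The function names and the 20 dB offset are assumptions chosen for illustration and are not a psychoacoustic model.

```python
import numpy as np

def masking_noise_standin(frame, rng, offset_db=20.0):
    """Stand-in for the masking-noise generator: NOT a real psychoacoustic model.
    The 'threshold' is simply the frame's magnitude spectrum lowered by offset_db."""
    spec = np.fft.rfft(frame)
    threshold = np.abs(spec) * 10.0 ** (-offset_db / 20.0)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=spec.shape)   # independent random phase
    return np.fft.irfft(threshold * np.exp(1j * phase), n=len(frame))

def decorrelate(stereo_frame, rng):
    """Decorrelated signal = original signal + per-channel noise, as in step (B)."""
    noise = np.stack([masking_noise_standin(ch, rng) for ch in stereo_frame])
    return stereo_frame + noise, noise

rng = np.random.default_rng(0)
mono = rng.standard_normal(1024)
stereo = np.stack([mono, mono])               # worst case: identical channels
decorrelated, noise = decorrelate(stereo, rng)
corr = np.corrcoef(decorrelated)[0, 1]        # inter-channel correlation after step (B)
```

The noise is returned separately because step (F) later needs it on its own.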

In the stereo echo cancellation method, in step (A), the original far-end dual-channel time-domain signal is converted into a dual-channel frequency-domain signal through Fourier transform; that is, the reference signal of each of the two channels is segmented and Fourier transformed. The segmented Fourier transform of the reference signal of either channel is shown in formula (1):

where F is the M-order Fourier transform matrix, n is the iteration index, L is the number of new input samples added at each iteration, p is the segment index, N/P is the number of segments, P is the number of coefficients in each segment, M is the order of the filter, and X_p(n) is the frequency-domain signal of one channel.
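Formula (1) is not reproduced in this text, so the following is only a sketch of one common way to build the segmented spectra X_p(n). It assumes an overlap-save layout in which segment p is the M-point FFT of the M reference samples that end p·P samples before the end of block n, with zeros before the start of the signal. The function name and the parameter values are illustrative.

```python
import numpy as np

def segmented_spectra(x, n, num_seg, P, L, M):
    """Return an array of shape (num_seg, M); row p is X_p(n) under the
    assumed layout (M samples ending p*P samples before the end of block n)."""
    off = M + num_seg * P                        # enough leading zeros for early blocks
    padded = np.concatenate([np.zeros(off), x])
    end = (n + 1) * L                            # one past the last sample of block n
    X = np.empty((num_seg, M), dtype=complex)
    for p in range(num_seg):
        stop = off + end - p * P
        X[p] = np.fft.fft(padded[stop - M:stop])
    return X

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
X5 = segmented_spectra(x, n=5, num_seg=4, P=8, L=8, M=16)
X4 = segmented_spectra(x, n=4, num_seg=4, P=8, L=8, M=16)
```

When L equals P, segment p of block n is the same as segment p-1 of block n-1, so an implementation only needs one new FFT per channel per block.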

In the stereo echo cancellation method, in step (C), the non-decorrelated dual-channel frequency-domain signal obtained in step (A) is multiplied by the coefficients of the first adaptive filter in the frequency domain, the two channels are added, and an inverse Fourier transform is performed to obtain the time-domain output of the first adaptive filter. Specifically,

each segment of data is filtered in the frequency domain, the results are accumulated and inverse Fourier transformed, and only the last L points are taken as the valid linear convolution result, giving the time-domain output of the first adaptive filter, as shown in formula (2),

wherein

w_p(n) is the time-domain coefficient vector of the adaptive filter for segment p, of length P; W_p(n) is the frequency-domain coefficient vector of the adaptive filter for segment p, of length M, obtained by appending (M-P) zeros to w_p(n) and then performing a Fourier transform.
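Since formula (2) is not reproduced here, the sketch below shows one plausible reading of the description: multiply each X_p(n) by W_p(n), accumulate over segments and channels, inverse transform, and keep the last L points. It assumes M = P + L, which makes those last L points free of circular wrap-around, and the function name is hypothetical. The result is checked against a direct time-domain convolution.

```python
import numpy as np

def first_filter_output(x_ch, w_ch, n, P, L, M):
    """Output block n of a two-channel partitioned frequency-domain filter.
    x_ch: list of reference signals; w_ch: list of arrays of shape (num_seg, P)."""
    acc = np.zeros(M, dtype=complex)
    for x, w in zip(x_ch, w_ch):
        num_seg = w.shape[0]
        off = M + num_seg * P
        padded = np.concatenate([np.zeros(off), x])
        for p in range(num_seg):
            stop = off + (n + 1) * L - p * P
            X_p = np.fft.fft(padded[stop - M:stop])   # segment spectrum X_p(n)
            W_p = np.fft.fft(w[p], M)                 # append (M - P) zeros, then FFT
            acc += X_p * W_p                          # frequency-domain multiplication
    return np.fft.ifft(acc).real[-L:]                 # last L points: valid linear convolution

rng = np.random.default_rng(0)
P, L, M, num_seg, T = 8, 8, 16, 4, 80
x_ch = [rng.standard_normal(T) for _ in range(2)]
h_ch = [rng.standard_normal(num_seg * P) for _ in range(2)]
w_ch = [h.reshape(num_seg, P) for h in h_ch]          # split each filter into segments
y = np.concatenate([first_filter_output(x_ch, w_ch, n, P, L, M) for n in range(T // L)])
y_ref = sum(np.convolve(x, h)[:T] for x, h in zip(x_ch, h_ch))
```

The block-wise result `y` matches the direct convolution `y_ref`, which is why only the last L points of each inverse transform are kept.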

In the stereo echo cancellation method, in step (D), for one channel of the near-end microphone, the time-domain output of the first adaptive filter obtained in step (C) is subtracted from the time-domain signal captured by that microphone channel to obtain a first time-domain error signal, which is the near-end signal after echo cancellation; the first time-domain error signal is Fourier transformed to obtain a first frequency-domain error signal; and the frequency-domain coefficients of the first adaptive filter used in step (C) are updated from the first frequency-domain error signal and the far-end frequency-domain signal. The specific process is as follows:

the sub-band step size is calculated from the power spectrum of the reference signal, and the coefficients of the first adaptive filter are updated according to the sub-band step size, the spectrum of the reference signal and the frequency-domain residual signal. The first time-domain error e₁(n) is shown in formula (3); the update formula for the sub-band step size pi(n) is shown in formula (4); and the update formula for the coefficients of the first adaptive filter is shown in formula (5),

e₁(n) = d(n) - y₁(n)    (3)

where μ is the global step size and γ is the overflow-prevention coefficient, whose purpose is to prevent data overflow when the reciprocal is taken,

d(n) is the near-end signal captured by the microphone, and I_M is the identity matrix of order M.
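Formulas (4) and (5) are not reproduced in this text. The sketch below therefore uses a textbook-style partitioned block frequency-domain update as a stand-in: the reference power spectrum is smoothed recursively, the per-bin step is μ divided by that power plus γ, and the gradient is constrained to the first P taps. The smoothing factor `lam`, the choice L = P and M = 2P, the constraint and the function name are all assumptions and may differ from the patented formulas.

```python
import numpy as np

def run_pbfdaf(x_ch, d, num_seg, P, mu=0.5, gamma=1e-6, lam=0.9):
    """Adapt a multi-channel partitioned filter block by block; returns (e1, W)."""
    L, M = P, 2 * P                                  # assumed block length and FFT size
    C = len(x_ch)
    W = np.zeros((C, num_seg, M), dtype=complex)     # W_p(n) for every channel
    X = np.zeros((C, num_seg, M), dtype=complex)     # X_p(n) for every channel
    prev = np.zeros((C, L))                          # previous input block per channel
    power = None
    e_out = np.zeros(len(d))
    for n in range(len(d) // L):
        blk = slice(n * L, (n + 1) * L)
        X = np.roll(X, 1, axis=1)                    # X_p(n) = X_{p-1}(n-1) because L == P
        for c in range(C):
            cur = x_ch[c][blk]
            X[c, 0] = np.fft.fft(np.concatenate([prev[c], cur]))
            prev[c] = cur
        y1 = np.fft.ifft((X * W).sum(axis=(0, 1))).real[-L:]
        e1 = d[blk] - y1                             # formula (3)
        e_out[blk] = e1
        E1 = np.fft.fft(np.concatenate([np.zeros(M - L), e1]))
        inst = (np.abs(X) ** 2).sum(axis=(0, 1))     # reference power per frequency bin
        power = inst if power is None else lam * power + (1 - lam) * inst
        step = mu / (power + gamma)                  # gamma guards the reciprocal
        grad = np.fft.ifft(np.conj(X) * (step * E1), axis=-1).real
        grad[..., P:] = 0.0                          # keep only the first P taps
        W += np.fft.fft(grad, axis=-1)
    return e_out, W

rng = np.random.default_rng(0)
P, num_seg, T = 8, 4, 8000
x_ch = [rng.standard_normal(T) for _ in range(2)]    # independent channels, for the test only
h_ch = [rng.standard_normal(num_seg * P) * np.exp(-0.1 * np.arange(num_seg * P))
        for _ in range(2)]
d = sum(np.convolve(x, h)[:T] for x, h in zip(x_ch, h_ch))
e1, W = run_pbfdaf(x_ch, d, num_seg, P)
```

The example uses independent white-noise channels so that the echo paths are identifiable. Real stereo references are strongly correlated, which is exactly the case the second, decorrelated filter is meant to handle.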

In the stereo echo cancellation method, in step (F), the psychoacoustic masking noise obtained in step (B) is Fourier transformed and multiplied by the coefficients of the first adaptive filter in the frequency domain, and the time-domain noise convolution amount N(n) is then obtained through an inverse Fourier transform, as shown in formula (6):

wherein NP_p(n) is the frequency-domain psychoacoustic masking noise, which contains the noise of both channels and has the same structure as the frequency-domain signal X_p(n) of one channel.

In the stereo echo cancellation method, in step (G), the time-domain noise convolution amount obtained in step (F) is added to the time-domain signal of the near-end microphone channel used in step (D) to obtain the near-end time-domain signal dpN(n) of the second adaptive filter, as shown in formula (7):

dpN(n) = d(n) + N(n)    (7).
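The reasoning behind formulas (6) and (7) can be checked numerically. The sketch assumes, as one reading of the text, that the microphone records only the echo of the original far-end signal, while the second filter's reference also contains the masking noise. It uses time-domain convolution as the equivalent of the frequency-domain product in formula (6) and assumes the first filter has converged exactly to the true echo paths. All names and signals are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, taps = 2000, 64
h = rng.standard_normal((2, taps)) * np.exp(-0.1 * np.arange(taps))  # hypothetical true echo paths
x = rng.standard_normal((2, T))               # original far-end channels
noise = 0.05 * rng.standard_normal((2, T))    # stand-in for the masking noise of step (B)

def echo(signals, paths):
    """Sum over channels of each signal convolved with its echo path."""
    return sum(np.convolve(s, p)[:signals.shape[1]] for s, p in zip(signals, paths))

d = echo(x, h)                  # microphone signal: echo of the original signal only
h_hat = h.copy()                # assumption: first adaptive filter has converged exactly
N = echo(noise, h_hat)          # formula (6), time-domain equivalent
dpN = d + N                     # formula (7)
target = echo(x + noise, h)     # echo the decorrelated reference would have produced
```

By linearity `dpN` equals `target`, so the second filter sees a near-end signal that is consistent with its decorrelated reference. In practice the match is only as good as the first filter's path estimate.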

The beneficial effects of the invention are as follows. Unlike a general dual-channel echo cancellation algorithm, the stereo echo cancellation method suppresses the dual-channel echo well while also ensuring that the sound quality of the far-end signal is not degraded, giving the listener a sense of being present at the scene. Because the two reference signals do not match, the near-end microphone signal lacks part of the second adaptive filter's reference signal when its error is computed; this missing part is compensated by multiplying, in the frequency domain, the noise signal obtained from decorrelation by the coefficients of the first adaptive filter. The second adaptive filter therefore also forms an independent echo cancellation system. The method cancels the echo signal effectively without degrading stereo sound quality, is suitable for multi-channel stereo echo cancellation, and has good application prospects.

Drawings

FIG. 1 is a flow chart of a stereo echo cancellation method of the present invention;

FIG. 2 is a block diagram of a stereo echo cancellation system of the present invention;

FIG. 3 is a diagram of a two-channel far-end signal waveform used for simulation of an embodiment of the stereo echo cancellation method of the present invention;

FIG. 4 is a waveform diagram of a residual signal after echo cancellation according to an embodiment of the stereo echo cancellation method of the present invention.

Detailed Description

The invention will be further described with reference to the accompanying drawings.

As shown in fig. 1-2, the stereo echo cancellation method of the present invention includes the steps of,

step (A), converting the original far-end dual-channel time-domain signal into a dual-channel frequency-domain signal through Fourier transform, that is, segmenting and Fourier transforming the reference signal of each of the two channels, wherein the segmented Fourier transform of the reference signal of either channel is shown in formula (1):

where F is the M-order Fourier transform matrix, n is the iteration index, L is the number of new input samples added at each iteration, p is the segment index, N/P is the number of segments, P is the number of coefficients in each segment, M is the order of the filter, and X_p(n) is the frequency-domain signal of one channel;

step (B), performing decorrelation processing on the original far-end dual-channel time-domain signal to obtain a decorrelated far-end dual-channel time-domain signal, and converting it into a decorrelated dual-channel frequency-domain signal through Fourier transform, wherein the decorrelation processing uses a psychoacoustic masking noise model: a psychoacoustic masking threshold for the dual-channel time-domain signal is calculated from the spectrum of the input signal, a psychoacoustic masking noise time-domain signal is obtained from the masking threshold, and the decorrelated far-end dual-channel time-domain signal is the sum of the original far-end dual-channel time-domain signal and the dual-channel psychoacoustic masking noise time-domain signal;

step (C), multiplying the non-decorrelated dual-channel frequency-domain signal obtained in step (A) by the coefficients of the first adaptive filter in the frequency domain, adding the two channels, and performing an inverse Fourier transform to obtain the time-domain output of the first adaptive filter; specifically,

each segment of data is filtered in the frequency domain, the results are accumulated and inverse Fourier transformed, and only the last L points are taken as the valid linear convolution result, giving the time-domain output of the first adaptive filter, as shown in formula (2),

wherein

w_p(n) is the time-domain coefficient vector of the adaptive filter for segment p, of length P; W_p(n) is the frequency-domain coefficient vector of the adaptive filter for segment p, of length M, obtained by appending (M-P) zeros to w_p(n) and then performing a Fourier transform;

step (D), for one channel of the near-end microphone, subtracting the time-domain output of the first adaptive filter obtained in step (C) from the time-domain signal captured by that microphone channel to obtain a first time-domain error signal, which is the near-end signal after echo cancellation, performing a Fourier transform on the first time-domain error signal to obtain a first frequency-domain error signal, and updating the frequency-domain coefficients of the first adaptive filter used in step (C) from the first frequency-domain error signal and the far-end frequency-domain signal, so that the frequency-domain coefficients of the first adaptive filter gradually converge to the room echo path, allowing the filter to model the room and cancel the far-end echo, wherein the update uses a partitioned block frequency-domain adaptive filter algorithm, and the specific process is as follows:

the sub-band step size is calculated from the power spectrum of the reference signal, and the coefficients of the first adaptive filter are updated according to the sub-band step size, the spectrum of the reference signal and the frequency-domain residual signal. The first time-domain error e₁(n) is shown in formula (3); the update formula for the sub-band step size pi(n) is shown in formula (4); and the update formula for the coefficients of the first adaptive filter is shown in formula (5),

e₁(n) = d(n) - y₁(n)    (3)

step (E), multiplying the decorrelated dual-channel frequency-domain signal obtained in step (B) by the coefficients of the second adaptive filter in the frequency domain, adding the two channels, and performing an inverse Fourier transform to obtain the time-domain output of the second adaptive filter, the procedure being consistent with that of step (D);

step (F), performing a Fourier transform on the psychoacoustic masking noise obtained in step (B), multiplying it by the coefficients of the first adaptive filter in the frequency domain, and then obtaining the time-domain noise convolution amount N(n) through an inverse Fourier transform, as shown in formula (6):

wherein NP_p(n) is the frequency-domain psychoacoustic masking noise, which contains the noise of both channels and has the same structure as the frequency-domain signal X_p(n) of one reference channel;

step (G), adding the time-domain noise convolution amount obtained in step (F) to the time-domain signal of the near-end microphone channel used in step (D) to obtain the near-end time-domain signal dpN(n) of the second adaptive filter, as shown in formula (7):

dpN(n) = d(n) + N(n)    (7);

step (H), subtracting the time-domain output of the second adaptive filter obtained in step (E) from the near-end time-domain signal of the second adaptive filter obtained in step (G) to obtain a second time-domain error signal, namely, a signal sent to the far-end after echo cancellation, but not sent to the far-end, performing a Fourier transform on the second time-domain error signal to obtain a second frequency-domain error signal, and updating the frequency-domain coefficients of the second adaptive filter used in step (E) from the second frequency-domain error signal and the far-end frequency-domain signal, so that the frequency-domain coefficients of the second adaptive filter gradually converge to the room echo path, allowing the filter to model the room and cancel the far-end echo, wherein the update uses a partitioned block frequency-domain adaptive filter algorithm and the procedure is consistent with that of step (D);

step (I), repeating steps (A) to (H) to perform echo cancellation based on the other channel of the near-end microphone.

In an embodiment of the stereo echo cancellation method of the present invention, FIG. 3 shows the waveform of the two-channel far-end signal and FIG. 4 shows the waveform of the residual signal after echo cancellation. The simulation environment is as follows: the room size is [5 m, 4 m, 3 m]; the left and right microphones are at [2.2 m, 2 m, 0.9 m] and [2.8 m, 2 m, 0.9 m]; the left and right loudspeakers are at [0 m, 4 m, 2.8 m] and [5 m, 4 m, 2.8 m]; the impulse response length is 3072 points; and the reflectivity r is set to 0.6.

In summary, unlike a general dual-channel echo cancellation algorithm, the stereo echo cancellation method of the present invention suppresses the dual-channel echo well while also ensuring that the sound quality of the far-end signal is not degraded, giving the listener a sense of being present at the scene. Two adaptive filters are used. The reference signal of the first adaptive filter is the original far-end dual-channel frequency-domain signal without decorrelation processing, and the reference signal of the second adaptive filter is the far-end dual-channel frequency-domain signal with decorrelation processing, the decorrelation algorithm using a psychoacoustic masking noise model. The first adaptive filter forms an independent echo cancellation system, although, because of the correlation between the reference signals, its path estimate is not very accurate. Because the two reference signals do not match, the near-end microphone signal lacks part of the second adaptive filter's reference signal when its error is computed; this missing part is compensated by multiplying, in the frequency domain, the noise signal obtained from decorrelation by the coefficients of the first adaptive filter. The second adaptive filter therefore also forms an independent echo cancellation system. The method cancels the echo signal effectively without degrading stereo sound quality, is suitable for multi-channel stereo echo cancellation, and has good application prospects.

The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are given by way of illustration of the principles of the present invention, but that various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications are within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. A stereo echo cancellation method, characterized by comprising the following steps:

step (B), performing decorrelation processing on the original far-end dual-channel time-domain signal to obtain a decorrelated far-end dual-channel time-domain signal, and converting it into a decorrelated dual-channel frequency-domain signal through Fourier transform, wherein the decorrelation processing uses a psychoacoustic masking noise model: a psychoacoustic masking threshold for the dual-channel time-domain signal is calculated from the spectrum of the input signal, a psychoacoustic masking noise time-domain signal is obtained from the masking threshold, and the decorrelated far-end dual-channel time-domain signal is the sum of the original far-end dual-channel time-domain signal and the dual-channel psychoacoustic masking noise time-domain signal;

step (C), the dual-channel frequency domain signal which is obtained in the step (A) and is not subjected to the decorrelation processing is subjected to frequency domain multiplication with the coefficient of the first adaptive filter, the two channels are added, and inverse Fourier transform is performed to obtain the time domain output of the first adaptive filter;

step (D), for one channel of the near-end microphone, subtracting the time-domain output of the first adaptive filter obtained in step (C) from the time-domain signal captured by that microphone channel to obtain a first time-domain error signal, which is the near-end signal after echo cancellation, performing a Fourier transform on the first time-domain error signal to obtain a first frequency-domain error signal, and updating the frequency-domain coefficients of the first adaptive filter used in step (C) from the first frequency-domain error signal and the far-end frequency-domain signal;

step (E), the decorrelation processing dual-channel frequency domain signal obtained in the step (B) is multiplied by the coefficient of a second self-adaptive filter in a frequency domain, the two channels are added, and inverse Fourier transform is carried out to obtain the time domain output of the second self-adaptive filter;

step (G), adding the time domain noise convolution quantity obtained in the step (F) and the time domain signal of one channel based on the near-end microphone in the step (D) to obtain a near-end time domain signal of a second adaptive filter;

step (H), subtracting the time domain output of the second adaptive filter obtained in the step (E) from the near-end time domain signal of the second adaptive filter obtained in the step (G) to obtain a second time domain error signal, namely a signal sent to the far-end after echo cancellation, obtaining a second frequency domain error signal through Fourier transform by using the second time domain error signal, and updating the frequency domain coefficient of the second adaptive filter used in the step (E) through the second frequency domain error signal and the far-end frequency domain signal;

step (I), repeating steps (A) to (H) to perform echo cancellation based on the other channel of the near-end microphone.

2. A stereo echo cancellation method according to claim 1, characterized in that: step (A), converting an original far-end dual-channel time domain signal into a dual-channel frequency domain signal through Fourier transform, namely segmenting a reference signal corresponding to dual channels and performing Fourier transform, wherein the reference signal of any channel is subjected to an operation process of segmented Fourier transform, as shown in a formula (1):

where F is the M-order Fourier transform matrix, n is the iteration index, L is the number of new input samples added at each iteration, p is the segment index, N/P is the number of segments, P is the number of coefficients in each segment, M is the order of the filter, and X_p(n) is the frequency-domain signal of one channel.

3. A stereo echo cancellation method according to claim 1, characterized by: step (C), the dual-channel frequency domain signal which is obtained in step (A) and is not subjected to the decorrelation processing is subjected to frequency domain multiplication with the coefficient of the first adaptive filter, the two channels are added, and inverse Fourier transform is carried out to obtain the time domain output of the first adaptive filter,

wherein

w_p(n) is the time-domain coefficient vector of the adaptive filter for segment p, of length P; W_p(n) is the frequency-domain coefficient vector of the adaptive filter for segment p, of length M, obtained by appending (M-P) zeros to w_p(n) and then performing a Fourier transform.

4. A stereo echo cancellation method according to claim 3, characterized by: step (D), for one channel of the near-end microphone, subtracting the time-domain output of the first adaptive filter obtained in step (C) from the time-domain signal captured by that microphone channel to obtain a first time-domain error signal, which is the near-end signal after echo cancellation, performing a Fourier transform on the first time-domain error signal to obtain a first frequency-domain error signal, and updating the frequency-domain coefficients of the first adaptive filter used in step (C) from the first frequency-domain error signal and the far-end frequency-domain signal, wherein the specific process is as follows,

e₁(n) = d(n) - y₁(n)    (3)

5. A stereo echo cancellation method according to claim 4, characterized in that: step (F), performing a Fourier transform on the psychoacoustic masking noise obtained in step (B), multiplying it by the coefficients of the first adaptive filter in the frequency domain, and then obtaining the time-domain noise convolution amount N(n) through an inverse Fourier transform, as shown in formula (6):

6. A stereo echo cancellation method according to claim 4, characterized by: step (G), adding the time-domain noise convolution amount obtained in step (F) to the time-domain signal of the near-end microphone channel used in step (D) to obtain the near-end time-domain signal dpN(n) of the second adaptive filter, as shown in formula (7):

dpN(n) = d(n) + N(n)    (7).