
CN111128220A - Dereverberation method, apparatus, device and storage medium - Google Patents

Dereverberation method, apparatus, device and storage medium

Info

Publication number
CN111128220A
CN111128220A
Authority
CN
China
Prior art keywords
dereverberation
signal
sound source
nlms
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911416265.9A
Other languages
Chinese (zh)
Other versions
CN111128220B (en)
Inventor
陈俊彬
杨汉丹
王广新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Youjie Zhixin Technology Co ltd
Original Assignee
Shenzhen Youjie Zhixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Youjie Zhixin Technology Co ltd filed Critical Shenzhen Youjie Zhixin Technology Co ltd
Priority to CN201911416265.9A priority Critical patent/CN111128220B/en
Publication of CN111128220A publication Critical patent/CN111128220A/en
Application granted granted Critical
Publication of CN111128220B publication Critical patent/CN111128220B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 - Processing in the frequency domain
    • G10L2021/02082 - Noise filtering the noise being echo, reverberation of the speech
    • G10L2021/02161 - Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 - Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a dereverberation method, apparatus, device and storage medium, wherein the method comprises the following steps: obtaining input signals of M1 channels, and converting them into digital signals by analog-to-digital conversion; performing a short-time Fourier transform on the digital signals to convert them from the time domain to the frequency domain; performing sound source localization on the frequency domain signals, and selecting a target sound source direction; performing beamforming on the M1 frequency domain signals to obtain beam outputs in M2 directions; performing first dereverberation processing on the M2 beam outputs using an NLMS-based WPE algorithm; and performing an inverse Fourier transform on the dereverberated signal to obtain a time-domain dereverberated signal. The dereverberation method, apparatus, device and storage medium provided by the invention beamform the signals input on a few channels to obtain more channels, and then apply the NLMS-based WPE algorithm for dereverberation, so that the dereverberation processing runs faster, the computational complexity is lower, and the dereverberation effect is better.

Description

Dereverberation method, apparatus, device and storage medium
Technical Field
The present invention relates to the field of signal processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for dereverberation.
Background
In some relatively closed environments, if the sound source is far from the microphone, reflections of the speech cause reverberation in the signal received by the microphone, degrading signal quality; this can seriously affect the signal whether it is used for conversation or for speech recognition.
Reverberation reduces the intelligibility of speech, lowers the recognition rate of speech recognition systems, and can even cause listening fatigue. Multichannel adaptive dereverberation algorithms based on microphone arrays can effectively remove reverberation while preserving the direct speech, and are suitable for far-field pickup scenarios. Microphone arrays are increasingly used in commercial products such as smart speakers, robots, and conference call equipment. Current adaptive dereverberation algorithms include spectral subtraction, WPE (Weighted Prediction Error) based on RLS (Recursive Least Squares), WPE based on LMS (Least Mean Squares), and the like. Spectral subtraction damages the speech to some extent and sounds unnatural. LMS-based WPE, with its fixed step size, has difficulty adapting its convergence to the input signal, and achieves dereverberation only by raising the prediction order, which in turn slows convergence. RLS-based WPE requires more parameters than LMS, and its tracking performance and robustness to interference variations are also inferior to LMS.
Disclosure of Invention
The main object of the present invention is to provide a dereverberation method, apparatus, device and storage medium that can dereverberate multi-channel signals with higher operation speed and better effect.
The invention provides a dereverberation method, which comprises the following steps:
obtaining input signals of M1 channels, and converting them into digital signals by analog-to-digital conversion;
performing short-time Fourier transform on the digital signal, and converting the digital signal from a time domain to a frequency domain;
performing sound source positioning on the frequency domain signal, and selecting a target sound source direction;
performing beamforming on the M1 frequency domain signals to obtain beam outputs in M2 directions; wherein M2 is greater than M1, and the M2 beam outputs include the beam output of the target sound source direction;
performing first dereverberation processing on the M2 beam outputs using an NLMS-based WPE algorithm;
and performing inverse Fourier transform on the signal subjected to the dereverberation processing to obtain a dereverberation signal of a time domain.
Further, the step of performing inverse fourier transform on the signal after the dereverberation processing to obtain a dereverberation signal of a time domain includes:
performing second dereverberation processing on the signal subjected to the first dereverberation processing by adopting a post-wiener filter;
and performing inverse Fourier transform on the signal subjected to the second dereverberation processing to obtain a dereverberation signal of a time domain.
Further, the step of performing sound source localization on the frequency domain signal and selecting a target sound source direction includes:
uniformly selecting N direction vectors in a space;
calculating SRP-PHAT values corresponding to the N direction vectors by adopting an SRP-PHAT algorithm;
and selecting the direction corresponding to the maximum value in the SRP-PHAT values as the direction of the target sound source.
Further, before the step of performing the first dereverberation processing on the M2 beam outputs using the NLMS-based WPE algorithm, the method further comprises:
detecting whether target voice exists;
if yes, updating the filter coefficient of the NLMS filter;
and if not, keeping the current filter coefficient of the NLMS filter.
Further, the step of updating the filter coefficient of the NLMS filter includes:
updating the filter coefficients with the formula

G(l+1,k) = G(l,k) + μ E(l,k) Y*(l-D,k) / (α + Y^H(l-D,k) Y(l-D,k))

where μ is the step size adjustment factor, α is a small positive real number, ORD is the prediction order, G(l,k) is the current filter coefficient vector of the NLMS filter, Y(l-D,k) collects the beam output history from the (l-ORD+1-D)-th frame to the (l-D)-th frame, D is the prediction delay, and E(l,k) is the signal of the k-th frequency band of the l-th frame after the first dereverberation.
Further, the step of performing the first dereverberation processing on the M2 beam outputs using the NLMS-based WPE algorithm includes:
the signal of the k-th frequency band of the l-th frame in the time-frequency domain after the first dereverberation is expressed by multi-channel linear prediction as:

E(l,k) = Ymax(l,k) - Y^T(l-D,k) G(l,k)

where Y(l-D,k) = [Y(l-D,k), Y(l-1-D,k), ..., Y(l-ORD+1-D,k)]^T stacks the beam outputs of the M2 directions over ORD past frames, D is the prediction delay, and Ymax(l,k) is the beam output of the target sound source direction in the k-th frequency band of the l-th frame.
Further, the step of performing a second dereverberation process on the first dereverberated signal by using a post-wiener filter includes:
performing second dereverberation processing on the first dereverberated signal using a formula [given only as an image in the source]; wherein O(l,k) is the signal of the k-th frequency band of the l-th frame after the second dereverberation.
The present invention also provides a dereverberation apparatus, comprising:
an acquisition unit for obtaining input signals of M1 channels and converting them into digital signals by analog-to-digital conversion;
the short-time Fourier transform unit is used for performing short-time Fourier transform on the digital signal and converting the digital signal from a time domain to a frequency domain;
the sound source positioning unit is used for performing sound source positioning on the frequency domain signal and selecting a target sound source direction;
a beamforming unit for performing beamforming on the M1 frequency domain signals to obtain beam outputs in M2 directions; wherein M2 is greater than M1, and the M2 beam outputs include the beam output of the target sound source direction;
a first dereverberation unit for performing first dereverberation processing on the M2 beam outputs using an NLMS-based WPE algorithm;
and the inverse Fourier transform unit is used for performing inverse Fourier transform on the signal subjected to the dereverberation processing to obtain a dereverberation signal of a time domain.
The invention also provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of any one of the above methods when executing the computer program.
The invention also provides a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method of any of the above.
The invention uses the multi-channel adaptive dereverberation algorithm as post-processing for the beamforming algorithm: beamforming generates signals of more channels from input signals of fewer channels, and the NLMS-based WPE dereverberation algorithm then performs the dereverberation, so that the dereverberation processing runs faster, the computational complexity is lower, and the dereverberation effect is better.
Drawings
FIG. 1 is a flow chart of a dereverberation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of experimental verification according to an embodiment of the present invention;
fig. 3 is a block diagram of a dereverberation apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, in an embodiment of the present invention, there is provided a dereverberation method, including:
step S1, obtaining input signals of M1 channels, and converting them into digital signals by analog-to-digital conversion;
step S2, performing short-time Fourier transform on the digital signal, and converting the digital signal from a time domain to a frequency domain;
step S3, sound source positioning is carried out on the frequency domain signal, and a target sound source direction is selected;
step S4, performing beamforming on the M1 frequency domain signals to obtain beam outputs in M2 directions; wherein M2 is greater than M1, and the M2 beam outputs include the beam output of the target sound source direction;
step S5, performing first dereverberation processing on the M2 beam outputs using an NLMS-based WPE algorithm;
and step S6, performing inverse Fourier transform on the signal after the dereverberation processing to obtain a dereverberation signal of a time domain.
In this embodiment, as described in step S1 above, input signals of M1 channels are obtained and converted into digital signals by analog-to-digital conversion. The input signals of the M1 channels are obtained with microphones as analog signals, which are converted into digital signals by analog-to-digital conversion before processing.
As described in step S2, the digital signal is subjected to a short-time Fourier transform to convert it from the time domain to the frequency domain. This embodiment uses Xi(l,k) to denote the signal of the k-th frequency band of the l-th frame input on the i-th channel, where i is the channel index, i = 1, 2, ..., M1.
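As a concrete illustration of steps S2 and S6, the sketch below converts a multi-channel time-domain signal to the time-frequency domain and back with the short-time Fourier transform; the sampling rate, frame length, and overlap are illustrative assumptions, not values taken from the patent.

```python
# Minimal STFT / inverse-STFT round trip for an M1-channel signal,
# assuming illustrative analysis parameters (16 kHz, 512-point frames,
# 50% overlap); the patent does not specify these values.
import numpy as np
from scipy.signal import stft, istft

fs = 16000                        # assumed sampling rate
x = np.random.randn(4, fs)        # M1 = 4 channels, 1 s of audio

# X[i, k, l]: channel i, frequency band k, frame l (step S2)
f, t, X = stft(x, fs=fs, nperseg=512, noverlap=256)

# ... frequency-domain processing (localization, beamforming, WPE) ...

# the inverse transform recovers the time-domain signal (step S6)
_, x_rec = istft(X, fs=fs, nperseg=512, noverlap=256)
```

With a window satisfying the overlap-add condition, the round trip reconstructs the input up to numerical precision, so all dereverberation processing can be done on X.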
As described in step S3, the frequency domain signal is subjected to sound source localization, and a target sound source direction is selected. In this embodiment, one or more of the following techniques may be used for sound source localization of the frequency domain signal: SRP-PHAT (Steered Response Power with Phase Transform) based on GCC-PHAT (Generalized Cross-Correlation with Phase Transform), SRP based on beam energy, and the like. The target sound source direction is selected according to the sound source localization.
As described in step S4 above, beamforming is performed on the M1 frequency domain signals to obtain beam outputs in M2 directions, where M2 is greater than M1. In this example, the beamforming of the M1 frequency domain signals may be performed by one or more of SDBF (Superdirective Beamforming), MVDR (Minimum Variance Distortionless Response), DS (Delay-and-Sum), DMA (Differential Microphone Array), GSC (Generalized Sidelobe Canceller), and the like. Taking the superdirective beam as an example, with the target sound source direction as the target direction, beamforming is performed on the M1 frequency domain signals to obtain beams in M2 directions, and the beam output in each direction is found; the M2 directions include the target sound source direction. The beam output in each direction is

Y_m(l,k) = W_m^H(k) X(l,k), m = 1, 2, ..., M2

where X(l,k) is the input signal vector of the k-th frequency band, W_m(k) is the spatial filter coefficient vector of the SDBF, and W_m^H(k) is the conjugate transpose of W_m(k). The expression of W_m(k) is:

W_m(k) = Γ^{-1}(k) α_m(k) / (α_m^H(k) Γ^{-1}(k) α_m(k))

where α_m(k) is the steering vector corresponding to the m-th direction vector d_m, and Γ(k) is the cross-correlation matrix of the diffuse noise field, whose (i,j)-th element is

Γ_{i,j}(k) = sinc(2π f_k l_{i,j} / c)

where l_{i,j} is the distance from the i-th microphone to the j-th microphone, f_k = (k-1) Fs / K is the frequency of the k-th band, k is the frequency index, k = 1, 2, ..., K, and Fs is the sampling frequency. The beam output for each direction is found by the above formulas.
As described in step S5 above, the NLMS-based WPE algorithm is applied to the beam outputs in the M2 directions to perform the first dereverberation processing, obtaining the dereverberated signal in the target sound source direction. The signals of different frequency bands are treated as independent signals, and the dereverberation processing iterates over the signal of each frequency band. The NLMS-based WPE algorithm can adjust its step size in real time according to the input signal, so the dereverberation processing runs in a short time, with low computational complexity and a good dereverberation effect.
As described in step S6, the signal after the dereverberation processing is inverse fourier transformed to obtain a single-channel dereverberation signal in the time domain.
In this embodiment, the signals of the M1 channels are beamformed to obtain beam outputs in M2 directions with M2 greater than M1, and the NLMS-based WPE algorithm performs the first dereverberation processing on the M2 beam outputs to obtain the dereverberated signal in the target sound source direction. Beamforming reduces noise and enhances the target speech while increasing the number of channels, which strengthens the dereverberation; the NLMS-based WPE dereverberation can adjust its step size in real time according to the input signal, so the processing runs faster, the computational complexity is lower, and the dereverberation effect is better.
In an embodiment, the step S6 of performing inverse fourier transform on the dereverberated signal to obtain a time-domain dereverberated signal includes:
step S601, a post-wiener filter is adopted to carry out second dereverberation processing on the signal subjected to the first dereverberation processing;
step S602, inverse fourier transform is performed on the second dereverberation processed signal to obtain a time domain dereverberation signal.
In this embodiment, the post-wiener filter is used to perform a second dereverberation process on the first dereverberation processed signal to eliminate residual reverberation, so that the dereverberation effect is better, and the inverse fourier transform is performed on the second dereverberation processed signal to obtain a single-channel dereverberation signal in the time domain.
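The patent gives its post-filter formula only as an image, so the sketch below uses a generic single-channel wiener-type gain as a stand-in to illustrate the idea of suppressing residual reverberation; the speech and residual-reverberation power estimates and the spectral floor are illustrative assumptions, not the patent's formula.

```python
# Hedged sketch: a generic wiener-type post-filter applied to the
# first-stage output E(l, k). Gain = S / (S + R), where S estimates the
# direct speech power and R the residual reverberation power; both
# estimators here are illustrative assumptions.
import numpy as np

def wiener_postfilter(E, R_est, floor=0.1):
    """Apply a wiener-type gain to the first-stage output E(l, k)."""
    S_est = np.maximum(np.abs(E) ** 2 - R_est, 0.0)   # crude speech power estimate
    gain = S_est / (S_est + R_est + 1e-12)
    return np.maximum(gain, floor) * E                # spectral floor limits distortion
```

The spectral floor keeps the gain from collapsing to zero in heavily reverberant bins, a common way to trade residual reverberation against speech distortion.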
In an embodiment, the step S3 of performing sound source localization on the frequency domain signal and selecting a target sound source direction includes:
s301, uniformly selecting N direction vectors in a space;
step S302, calculating SRP-PHAT values corresponding to the N direction vectors by adopting an SRP-PHAT algorithm;
and step S303, selecting the direction corresponding to the maximum value in the SRP-PHAT values as a target sound source direction.
In this embodiment, as described in step S301 above, the SRP-PHAT algorithm uniformly selects N direction vectors in space, denoted d_n, n = 1, 2, ..., N.
As described in step S302, the SRP-PHAT values corresponding to the N direction vectors are calculated:

P(d_n) = Σ_{i=1}^{M1-1} Σ_{j=i+1}^{M1} R_{ij}[τ_{ij}(d_n)]

where R_{ij}[τ_{ij}(d_n)] is the generalized cross-correlation function weighted by the phase transform (GCC-PHAT) of the signals received at the i-th and j-th microphones, expressed as:

R_{ij}(τ) = Σ_{k=1}^{K} (X_i(l,k) X_j^*(l,k) / |X_i(l,k) X_j^*(l,k)|) e^{j 2π f_k τ}

where k is the frequency index, k = 1, 2, ..., K, K is the number of points of the short-time Fourier transform, (·)^* denotes conjugation, f_k = (k-1) Fs / K, Fs is the sampling frequency, and τ_{ij}(d_n) is the time difference of arrival (TDOA) of direction vector d_n at the i-th and j-th microphones. With r_i and r_j the rectangular coordinate vectors of the i-th and j-th microphones and c the speed of sound,

τ_{ij}(d_n) = (||r_i - d_n|| - ||r_j - d_n||) / c

where ||·|| denotes the 2-norm of a vector. This embodiment uses the above formulas to calculate the SRP-PHAT values corresponding to the N direction vectors.
As described in step S303, the direction corresponding to the maximum SRP-PHAT value is selected as the target sound source direction. The SRP-PHAT algorithm has stronger robustness in a reverberation environment, can realize sound source positioning in a real environment, and is favorable for accurately positioning a sound source.
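The SRP-PHAT search of steps S301 to S303 can be sketched as follows; the candidate direction set, the array geometry, and the far-field TDOA model are illustrative assumptions.

```python
# Hedged sketch of SRP-PHAT localization over N candidate directions,
# following the formulas above: PHAT-weighted cross-spectra of all
# microphone pairs are steered to each candidate's TDOAs and the
# direction with the maximum steered response power is selected.
import numpy as np

def srp_phat(X, mic_pos, candidates, fs, nfft, c=343.0):
    """X: (M, K) one STFT frame; candidates: (N, 3) direction vectors."""
    M, K = X.shape
    freqs = np.arange(K) * fs / nfft
    scores = np.zeros(len(candidates))
    for n, d in enumerate(candidates):
        tau = mic_pos @ d / c                        # far-field delay per mic
        for i in range(M - 1):
            for j in range(i + 1, M):
                cross = X[i] * X[j].conj()
                phat = cross / (np.abs(cross) + 1e-12)  # phase transform weighting
                # steer the pair (i, j) to the candidate's TDOA
                scores[n] += np.real(
                    np.sum(phat * np.exp(2j * np.pi * freqs * (tau[i] - tau[j])))
                )
    return candidates[np.argmax(scores)], scores
```

In practice the scores would be accumulated over several frames before taking the maximum, which further improves robustness in reverberation.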
In one embodiment, before step S5 of performing the first dereverberation processing on the M2 beam outputs using the NLMS-based WPE algorithm, the method further includes:
step S5a, detecting whether a target voice exists;
step S5b, if yes, updating the filter coefficient of the NLMS filter;
and step S5c, if not, keeping the current filter coefficient of the NLMS filter.
In this embodiment, VAD (Voice Activity Detection) of the target voice is used to control the update of the filter coefficients of the NLMS filter, reducing the influence of interference, enhancing the robustness of the dereverberation, and reducing the amount of computation.
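The VAD-gated update of steps S5a to S5c can be sketched as follows; the energy-based detector is an illustrative stand-in, since the patent does not specify a particular VAD, and the NLMS step shown is a reconstruction from the variable definitions in the text.

```python
# Hedged sketch of the VAD-gated coefficient update: the NLMS filter is
# only adapted when target speech is detected; otherwise the current
# coefficients are kept. Threshold and step-size values are assumptions.
import numpy as np

def simple_energy_vad(frame, threshold=1e-4):
    """Toy VAD: declare speech when the frame energy exceeds a threshold."""
    return np.mean(np.abs(frame) ** 2) > threshold

def gated_nlms_update(G, Y_hist, e, vad_flag, mu=0.5, alpha=1e-6):
    """Update G only when the VAD reports target speech (steps S5b/S5c)."""
    if not vad_flag:
        return G                               # keep current coefficients
    norm = alpha + np.real(Y_hist.conj() @ Y_hist)
    return G + mu * e * Y_hist.conj() / norm   # NLMS step
```

Freezing the filter during non-speech segments avoids adapting to interference and also skips the update computation entirely for those frames.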
In an embodiment, the step S5b of updating the filter coefficient of the NLMS filter includes:
step S5b1, calculating the filter coefficients of the NLMS filter with the formula

G(l+1,k) = G(l,k) + μ E(l,k) Y*(l-D,k) / (α + Y^H(l-D,k) Y(l-D,k))

where μ is the step size adjustment factor, α is a small positive real number, ORD is the prediction order, G(l,k) is the current filter coefficient vector of the filter, Y(l-D,k) collects the beam output history from the (l-ORD+1-D)-th frame to the (l-D)-th frame, D is the prediction delay, and E(l,k) is the signal of the k-th frequency band of the l-th frame after the first dereverberation.
In this embodiment, when voice is detected, the VAD of the target voice controls the update of the filter coefficients of the NLMS filter, reducing the influence of interference, enhancing the robustness of the dereverberation, and reducing the amount of computation, so that the dereverberation effect is better.
In one embodiment, step S5 of performing the first dereverberation processing on the M2 beam outputs using the NLMS-based WPE algorithm includes:
in step S501, after the first dereverberation, the signal of the k-th frequency band of the l-th frame in the time-frequency domain is represented by multi-channel linear prediction as:

E(l,k) = Ymax(l,k) - Y^T(l-D,k) G(l,k)

where Y(l-D,k) = [Y(l-D,k), Y(l-1-D,k), ..., Y(l-ORD+1-D,k)]^T stacks the beam outputs of the M2 directions over ORD past frames, D is the prediction delay, and Ymax(l,k) is the beam output of the target sound source direction in the k-th frequency band of the l-th frame.
In an embodiment, the step S601 of performing the second dereverberation processing on the first dereverberated signal by using the post wiener filter includes:
step S6011, performing second dereverberation processing on the first dereverberated signal using a formula [given only as an image in the source].
In this embodiment, the above formula is adopted to perform the second dereverberation processing on the first dereverberated signal, so as to eliminate the residual reverberation, and thus the dereverberation effect is better.
Referring to fig. 2, fig. 2 is an experimental verification of the present invention.
As shown in fig. 2, fig. 2 takes 36 recordings from a 4-microphone array as an example, with a frame length of 160 points, to compare SRMR (speech-to-reverberation modulation energy ratio) values.
1. The raw data takes the data of the first channel (hereinafter labeled "raw");
2. RLS-based WPE dereverberation of the original 4-channel data with prediction order ORD = 20, 4-channel input and single-channel output (hereinafter labeled "experiment 1");
3. NLMS-based WPE dereverberation of the original 4-channel data with ORD = 20, 4-channel input and single-channel output (hereinafter labeled "experiment 2");
4. NLMS-based WPE dereverberation of the original 4-channel data with ORD = 20, 4-channel input and 4-channel output, followed by beamforming (BF) with 4-channel input and single-channel output (hereinafter labeled "experiment 3");
5. BF of the original 4-channel data with 4-channel input and 8-channel output, followed by NLMS-based WPE dereverberation with ORD = 4, 8-channel input and single-channel output (i.e. the embodiment of the present invention, hereinafter labeled "experiment 4").
The mean SRMR value of each experiment was calculated: raw, 3.2055; experiment 1, 4.3595; experiment 2, 4.5440; experiment 3, 4.7216; experiment 4, 5.0315.
The total time consumed was 3230.681 s for experiment 1, 38.270 s for experiment 2, 111.305 s for experiment 3, and 46.083 s for experiment 4. Experiment 4 uses a smaller prediction order, and its filter converges faster. The comparison shows that beamforming the signals of fewer channels into speech of more channels and then dereverberating with the NLMS-based WPE algorithm takes markedly less computation time than the RLS-based WPE algorithm, while achieving a better dereverberation effect with lower computational complexity.
Referring to fig. 3, an embodiment of the present invention further provides a dereverberation apparatus, including:
an acquisition unit 10 for obtaining input signals of M1 channels and converting them into digital signals by analog-to-digital conversion;
a short-time fourier transform unit 20, configured to perform short-time fourier transform on the digital signal, and convert the digital signal from a time domain to a frequency domain;
a sound source positioning unit 30, configured to perform sound source positioning on the frequency domain signal, and select a target sound source direction;
a beamforming unit 40 for performing beamforming on the M1 frequency domain signals to obtain beam outputs in M2 directions; wherein M2 is greater than M1, and the M2 beam outputs include the beam output of the target sound source direction;
a first dereverberation unit 50 for performing first dereverberation processing on the M2 beam outputs using an NLMS-based WPE algorithm;
and an inverse fourier transform unit 60, configured to perform inverse fourier transform on the signal after the dereverberation processing, so as to obtain a dereverberation signal in a time domain.
In one embodiment, the inverse fourier transform unit 60 includes:
the second dereverberation subunit is used for performing second dereverberation processing on the signal subjected to the first dereverberation processing by adopting a post-wiener filter;
and the inverse Fourier transform subunit is used for performing inverse Fourier transform on the signal subjected to the second dereverberation processing to obtain a dereverberation signal of a time domain.
In an embodiment, the sound source localization unit 30 includes:
a selecting subunit, configured to uniformly select N directional vectors in a space;
a calculating subunit, configured to calculate, by using an SRP-PHAT algorithm, SRP-PHAT values corresponding to the N direction vectors;
and the selecting subunit is used for selecting the direction corresponding to the maximum value in the SRP-PHAT values as the target sound source direction.
In an embodiment, the dereverberation apparatus further includes:
a detection unit for detecting whether there is a target voice;
the updating unit is used for updating the filter coefficient of the NLMS filter if the NLMS filter exists;
and the holding unit is used for holding the current filter coefficient of the NLMS filter if the NLMS filter does not exist.
In one embodiment, the update unit includes:
a calculation subunit for calculating the filter coefficients of the NLMS filter with the formula

G(l+1,k) = G(l,k) + μ E(l,k) Y*(l-D,k) / (α + Y^H(l-D,k) Y(l-D,k))

where μ is the step size adjustment factor, α is a small positive real number, ORD is the prediction order, G(l,k) is the current filter coefficient vector of the NLMS filter, Y(l-D,k) collects the beam output history from the (l-ORD+1-D)-th frame to the (l-D)-th frame, D is the prediction delay, and E(l,k) is the signal of the k-th frequency band of the l-th frame after the first dereverberation.
In an embodiment, the first dereverberation unit 50 includes:
a representation subunit, configured to represent, by multi-channel linear prediction, the signal of the k-th frequency band of the l-th frame in the time-frequency domain after the first dereverberation as:

E(l,k) = Ymax(l,k) − Y^T(l−D,k)·G(l,k)

wherein Y(l−D,k) = [Y(l−D,k), Y(l−1−D,k), ..., Y(l−ORD+1−D,k)]^T, G(l,k) = [G_1(l,k), G_2(l,k), ..., G_ORD(l,k)]^T, D is the prediction delay, and Ymax(l,k) is the beam output of the target sound source direction at the k-th frequency band of the l-th frame.
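The expression E(l,k) = Ymax(l,k) − Y^T(l−D,k)·G(l,k) can be illustrated for one frequency band with fixed coefficients; in the embodiment the coefficients are adapted per frame by the NLMS filter, and the function name and the zero-padding of the pre-history are assumptions of this sketch:

```python
import numpy as np

def mclp_dereverb_band(Ymax, G, D=2):
    """Apply E(l) = Ymax(l) - Y^T(l-D) G over one band k of a whole
    utterance, with prediction delay D and order ORD = len(G).

    Ymax : (L,) complex beam outputs of the target direction
    G    : (ORD,) prediction coefficients (held constant here;
           in the embodiment they are updated per frame by NLMS)
    """
    ORD = len(G)
    E = np.empty_like(Ymax)
    for l in range(len(Ymax)):
        # Y(l-D,k) = [Y(l-D), Y(l-1-D), ..., Y(l-ORD+1-D)];
        # frames before the start of the signal are taken as zero
        taps = np.array([Ymax[l - D - j] if l - D - j >= 0 else 0.0
                         for j in range(ORD)])
        # Subtract the predicted late reverberation from the beam output
        E[l] = Ymax[l] - taps @ G
    return E
```

The delay D skips the early reflections, so only the late reverberation (predictable from frames at least D in the past) is subtracted, leaving the direct sound in E(l,k).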
In an embodiment, the second dereverberation subunit includes:
a second dereverberation module, configured to perform the second dereverberation processing on the signal subjected to the first dereverberation processing by the formula

O(l,k) = |E(l,k)|^2 / (|E(l,k)|^2 + |Y^T(l−D,k)·G(l,k)|^2) · E(l,k)

wherein O(l,k) is the signal of the k-th frequency band of the l-th frame after the second dereverberation.
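A second-stage Wiener gain of this kind can be sketched as follows. The exact gain of the embodiment is given only by an image placeholder, so the power-ratio form below, which uses the predicted late reverberation R(l,k) = Y^T(l−D,k)·G(l,k) as the interference estimate, is an assumption of this sketch:

```python
import numpy as np

def post_wiener(E, R, eps=1e-12):
    """Second dereverberation O(l,k) via a Wiener gain.

    E : first-stage output E(l,k) (desired-signal estimate)
    R : predicted late reverberation R(l,k) (interference estimate)
    """
    # Gain approaches 1 where the residual reverberation is weak
    # and approaches 0 where it dominates the first-stage output
    gain = np.abs(E) ** 2 / (np.abs(E) ** 2 + np.abs(R) ** 2 + eps)
    return gain * E
```

The gain only rescales magnitudes per time-frequency bin, so the phase of E(l,k) is preserved for the subsequent inverse Fourier transform.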
In this embodiment, for the specific implementation of the above units, subunits, and modules, reference is made to the corresponding method embodiments, which are not repeated here.
The invention also provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of any one of the above methods when executing the computer program.
The invention also provides a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method of any of the above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method of dereverberation, comprising:
acquiring input signals of M1 channels, and converting the input signals into digital signals through analog-to-digital conversion;
performing short-time Fourier transform on the digital signal, and converting the digital signal from a time domain to a frequency domain;
performing sound source positioning on the frequency domain signal, and selecting a target sound source direction;
performing beam forming on the M1 frequency-domain signals to obtain beam outputs in M2 directions; wherein M2 is greater than M1, and the beam outputs in the M2 directions include the beam output of the target sound source direction;
performing first dereverberation processing on the beam outputs in the M2 directions by adopting an NLMS-based WPE algorithm;
and performing inverse Fourier transform on the signal subjected to the dereverberation processing to obtain a dereverberation signal of a time domain.
2. The dereverberation method of claim 1, wherein the step of performing inverse Fourier transform on the signal subjected to the dereverberation processing to obtain a time-domain dereverberation signal comprises:
performing second dereverberation processing on the signal subjected to the first dereverberation processing by adopting a post-Wiener filter;
and performing inverse Fourier transform on the signal subjected to the second dereverberation processing to obtain a dereverberation signal of a time domain.
3. The dereverberation method of claim 1, wherein the step of source-locating the frequency-domain signal and selecting a target sound source direction comprises:
uniformly selecting N direction vectors in a space;
calculating SRP-PHAT values corresponding to the N direction vectors by adopting an SRP-PHAT algorithm;
and selecting the direction corresponding to the maximum value in the SRP-PHAT values as the direction of the target sound source.
4. The dereverberation method of claim 1, wherein before the step of performing the first dereverberation processing on the beam outputs in the M2 directions by adopting the NLMS-based WPE algorithm, the method further comprises:
detecting whether target voice exists;
if yes, updating the filter coefficient of the NLMS filter;
and if not, keeping the current filter coefficient of the NLMS filter.
5. The dereverberation method of claim 4, wherein the step of updating the filter coefficients of the NLMS filter comprises:
calculating the filter coefficients of the NLMS filter by the update formula

G(l+1,k) = G(l,k) + μ·Y*(l−D,k)·E(l,k) / (α + Y^H(l−D,k)·Y(l−D,k))

wherein μ is a step-size adjustment factor, α is a positive real number, ORD is the prediction order, G(l,k) is the current filter coefficient vector of the NLMS filter, Y(l−D,k) is the vector of beam output historical values from the (l−ORD+1−D)-th frame to the (l−D)-th frame, D is the prediction delay, and E(l,k) is the signal of the k-th frequency band of the l-th frame after the first dereverberation.
6. The dereverberation method of claim 1, wherein the step of performing the first dereverberation processing on the beam outputs in the M2 directions by adopting the NLMS-based WPE algorithm comprises:
representing, by multi-channel linear prediction, the signal of the k-th frequency band of the l-th frame in the time-frequency domain after the first dereverberation as:

E(l,k) = Ymax(l,k) − Y^T(l−D,k)·G(l,k)

wherein Y(l−D,k) = [Y(l−D,k), Y(l−1−D,k), ..., Y(l−ORD+1−D,k)]^T, G(l,k) = [G_1(l,k), G_2(l,k), ..., G_ORD(l,k)]^T, D is the prediction delay, and Ymax(l,k) is the beam output of the target sound source direction at the k-th frequency band of the l-th frame.
7. The dereverberation method of claim 2, wherein the step of performing the second dereverberation processing on the first-dereverberated signal by adopting the post-Wiener filter comprises:
performing the second dereverberation processing on the signal subjected to the first dereverberation processing by the formula

O(l,k) = |E(l,k)|^2 / (|E(l,k)|^2 + |Y^T(l−D,k)·G(l,k)|^2) · E(l,k)

wherein O(l,k) is the signal of the k-th frequency band of the l-th frame after the second dereverberation.
8. A dereverberation apparatus, comprising:
an acquisition unit, configured to acquire input signals of M1 channels and convert the input signals into digital signals through analog-to-digital conversion;
the short-time Fourier transform unit is used for performing short-time Fourier transform on the digital signal and converting the digital signal from a time domain to a frequency domain;
the sound source positioning unit is used for performing sound source positioning on the frequency domain signal and selecting a target sound source direction;
a beam forming unit, configured to perform beam forming on the M1 frequency-domain signals to obtain beam outputs in M2 directions; wherein M2 is greater than M1, and the beam outputs in the M2 directions include the beam output of the target sound source direction;
a first dereverberation unit, configured to perform first dereverberation processing on the beam outputs in the M2 directions by adopting an NLMS-based WPE algorithm;
and the inverse Fourier transform unit is used for performing inverse Fourier transform on the signal subjected to the dereverberation processing to obtain a dereverberation signal of a time domain.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201911416265.9A 2019-12-31 2019-12-31 Dereverberation method, apparatus, device and storage medium Active CN111128220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911416265.9A CN111128220B (en) 2019-12-31 2019-12-31 Dereverberation method, apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911416265.9A CN111128220B (en) 2019-12-31 2019-12-31 Dereverberation method, apparatus, device and storage medium

Publications (2)

Publication Number Publication Date
CN111128220A true CN111128220A (en) 2020-05-08
CN111128220B CN111128220B (en) 2022-06-28

Family

ID=70506710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911416265.9A Active CN111128220B (en) 2019-12-31 2019-12-31 Dereverberation method, apparatus, device and storage medium

Country Status (1)

Country Link
CN (1) CN111128220B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111341338A (en) * 2020-05-19 2020-06-26 深圳市友杰智新科技有限公司 Method and device for eliminating echo and computer equipment
CN111650559A (en) * 2020-06-12 2020-09-11 深圳市裂石影音科技有限公司 Real-time processing two-dimensional sound source positioning method
CN111883162A (en) * 2020-07-24 2020-11-03 杨汉丹 Awakening method and device and computer equipment
CN113160842A (en) * 2021-03-06 2021-07-23 西安电子科技大学 Voice dereverberation method and system based on MCLP
CN113257265A (en) * 2021-05-10 2021-08-13 北京有竹居网络技术有限公司 Voice signal dereverberation method and device and electronic equipment
CN113687305A (en) * 2021-07-26 2021-11-23 浙江大华技术股份有限公司 Method, device and equipment for positioning sound source azimuth and computer readable storage medium
CN114813129A (en) * 2022-04-30 2022-07-29 北京化工大学 Acoustic signal fault diagnosis method of rolling bearing based on WPE and EMD
US11641545B2 (en) 2021-09-01 2023-05-02 Acer Incorporated Conference terminal and feedback suppression method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180182411A1 (en) * 2016-12-23 2018-06-28 Synaptics Incorporated Multiple input multiple output (mimo) audio signal processing for speech de-reverberation
CN109994120A (en) * 2017-12-29 2019-07-09 福州瑞芯微电子股份有限公司 Sound enhancement method, system, speaker and storage medium based on diamylose

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180182411A1 (en) * 2016-12-23 2018-06-28 Synaptics Incorporated Multiple input multiple output (mimo) audio signal processing for speech de-reverberation
CN110088834A (en) * 2016-12-23 2019-08-02 辛纳普蒂克斯公司 Multiple-input and multiple-output (MIMO) Audio Signal Processing for speech dereverbcration
CN109994120A (en) * 2017-12-29 2019-07-09 福州瑞芯微电子股份有限公司 Sound enhancement method, system, speaker and storage medium based on diamylose

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CE XU: "Detection in presence of reverberation combined with blind source separation and beamforming", 《2012 2ND INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER CONTROL》 *
MAO Wei et al.: "Application of a dual micro-array speech enhancement algorithm in speaker recognition", 《TECHNICAL ACOUSTICS》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111341338A (en) * 2020-05-19 2020-06-26 深圳市友杰智新科技有限公司 Method and device for eliminating echo and computer equipment
CN111650559A (en) * 2020-06-12 2020-09-11 深圳市裂石影音科技有限公司 Real-time processing two-dimensional sound source positioning method
CN111883162A (en) * 2020-07-24 2020-11-03 杨汉丹 Awakening method and device and computer equipment
CN113160842A (en) * 2021-03-06 2021-07-23 西安电子科技大学 Voice dereverberation method and system based on MCLP
CN113160842B (en) * 2021-03-06 2024-04-09 西安电子科技大学 MCLP-based voice dereverberation method and system
CN113257265A (en) * 2021-05-10 2021-08-13 北京有竹居网络技术有限公司 Voice signal dereverberation method and device and electronic equipment
CN113687305A (en) * 2021-07-26 2021-11-23 浙江大华技术股份有限公司 Method, device and equipment for positioning sound source azimuth and computer readable storage medium
US11641545B2 (en) 2021-09-01 2023-05-02 Acer Incorporated Conference terminal and feedback suppression method
CN114813129A (en) * 2022-04-30 2022-07-29 北京化工大学 Acoustic signal fault diagnosis method of rolling bearing based on WPE and EMD
CN114813129B (en) * 2022-04-30 2024-03-26 北京化工大学 Rolling bearing acoustic signal fault diagnosis method based on WPE and EMD

Also Published As

Publication number Publication date
CN111128220B (en) 2022-06-28

Similar Documents

Publication Publication Date Title
CN111128220B (en) Dereverberation method, apparatus, device and storage medium
US10446171B2 (en) Online dereverberation algorithm based on weighted prediction error for noisy time-varying environments
US10657981B1 (en) Acoustic echo cancellation with loudspeaker canceling beamformer
JP7011075B2 (en) Target voice acquisition method and device based on microphone array
EP2222091B1 (en) Method for determining a set of filter coefficients for an acoustic echo compensation means
KR101239604B1 (en) Multi-channel adaptive speech signal processing with noise reduction
US11373667B2 (en) Real-time single-channel speech enhancement in noisy and time-varying environments
CN107018470B (en) A kind of voice recording method and system based on annular microphone array
CN109285557B (en) Directional pickup method and device and electronic equipment
US10887691B2 (en) Audio capture using beamforming
US20150063589A1 (en) Method, apparatus, and manufacture of adaptive null beamforming for a two-microphone array
CN112435685B (en) Blind source separation method and device for strong reverberation environment, voice equipment and storage medium
US11483646B1 (en) Beamforming using filter coefficients corresponding to virtual microphones
Nakajima et al. Adaptive step-size parameter control for real-world blind source separation
Corey et al. Motion-tolerant beamforming with deformable microphone arrays
CN113050035B (en) Two-dimensional directional pickup method and device
CN110419228B (en) Signal processing device
Priyanka et al. Adaptive Beamforming Using Zelinski-TSNR Multichannel Postfilter for Speech Enhancement
Kovalyov et al. Dfsnet: A steerable neural beamformer invariant to microphone array configuration for real-time, low-latency speech enhancement
Dietzen et al. Speech dereverberation by data-dependent beamforming with signal pre-whitening
CN114758670A (en) Beam forming method, beam forming device, electronic equipment and storage medium
CN116017230A (en) Microphone array, signal processing method, device, equipment and medium thereof
TWI517143B (en) A method for noise reduction and speech enhancement
Braun et al. Low complexity online convolutional beamforming
Barnov et al. A robust RLS implementation of the ANC block in GSC structures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method, device, equipment, and storage medium for reverberation removal

Granted publication date: 20220628

Pledgee: Shenzhen Shunshui Incubation Management Co.,Ltd.

Pledgor: SHENZHEN YOUJIE ZHIXIN TECHNOLOGY Co.,Ltd.

Registration number: Y2024980029366
