
CN113470675A - Audio signal processing method and device - Google Patents


Info

Publication number
CN113470675A
Authority
CN
China
Prior art keywords
vector
signal
echo
current frame
echo separation
Prior art date
Legal status
Granted
Application number
CN202110739135.XA
Other languages
Chinese (zh)
Other versions
CN113470675B (en)
Inventor
操陈斌
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd and Beijing Xiaomi Pinecone Electronic Co Ltd
Priority to CN202110739135.XA
Publication of CN113470675A
Application granted
Publication of CN113470675B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L2021/02082: Noise filtering, the noise being echo or reverberation of the speech
    • G10L2021/02087: Noise filtering, the noise being separate speech, e.g. cocktail party
    • G10L21/0272: Voice signal separating

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

The present disclosure relates to the field of voice communication technologies, and in particular to an audio signal processing method and apparatus. An audio signal processing method includes: determining a first signal vector according to a first reference signal, a second reference signal, and a first audio signal picked up by a microphone, where the first audio signal includes a first echo signal generated by a first loudspeaker playing the first reference signal and a second echo signal generated by a second loudspeaker playing the second reference signal; obtaining a first residual signal vector according to the first signal vector of a current frame and an echo separation vector of a previous frame; updating the echo separation vector of the previous frame according to the first signal vector and the first residual signal vector to obtain the echo separation vector of the current frame; and performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target audio signal. The method improves stereo echo cancellation and thereby the quality of voice communication.

Description

Audio signal processing method and device
Technical Field
The present disclosure relates to the field of voice communication technologies, and in particular, to an audio signal processing method and apparatus.
Background
As voice communication systems evolve toward more immersive audio and video experiences, for example in online gaming and video conferencing, two speakers are often used to produce stereo sound. After the two near-end speakers play the sound transmitted from the far end, the near-end microphone picks up that sound again and transmits it back to the far end, generating an acoustic echo.
In the related art, echo cancellation in stereo systems generally uses two adaptive filters, each estimating the echo path from one speaker to the microphone, to cancel stereo echoes. However, in complex acoustic scenes such as double talk, the echo paths of the stereo system change constantly, so the related-art echo cancellation methods cannot estimate the acoustic transfer functions accurately and quickly, resulting in a poor echo cancellation effect.
Disclosure of Invention
In order to improve the echo cancellation effect of a stereo speech system, the embodiments of the present disclosure provide an audio signal processing method and apparatus.
In a first aspect, the disclosed embodiments provide an audio signal processing method, including:
determining a first signal vector according to the first reference signal, the second reference signal and a first audio signal picked up by a microphone; the first audio signal comprises a first echo signal generated by a first speaker playing the first reference signal and a second echo signal generated by a second speaker playing the second reference signal;
obtaining a first residual signal vector according to the first signal vector of the current frame and the echo separation vector of the previous frame;
updating the echo separation vector of the previous frame according to the first signal vector and the first residual signal vector to obtain the echo separation vector of the current frame;
and performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target audio signal.
In some embodiments, the determining a first signal vector from the first reference signal, the second reference signal, and the first audio signal picked up by the microphone comprises:
respectively transforming the first reference signal, the second reference signal and the first audio signal from a time domain to a frequency domain to obtain a first frequency domain reference signal, a second frequency domain reference signal and a first frequency domain audio signal;
and arranging the vectors of the first frequency domain reference signal, the second frequency domain reference signal and the first frequency domain audio signal according to a preset direction to obtain the first signal vector.
In some embodiments, the obtaining a first residual signal vector according to the first signal vector of the current frame and the echo separation vector of the previous frame includes:
under the condition that the current frame is not an initial frame, performing echo separation on the first signal vector based on the echo separation vector of the previous frame to obtain the first residual signal vector;
and under the condition that the current frame is an initial frame, performing echo separation on the first signal vector based on a preset initial echo separation vector to obtain the first residual signal vector.
In some embodiments, the updating the echo separation vector of the previous frame according to the first signal vector and the first residual signal vector to obtain the echo separation vector of the current frame includes:
determining an auxiliary variable of the current frame according to the first signal vector and the first residual signal vector of the current frame and an auxiliary variable of a previous frame;
and updating the echo separation vector of the previous frame according to the auxiliary variable of the current frame to obtain the echo separation vector of the current frame.
In some embodiments, said determining an auxiliary variable of the current frame from said first signal vector and first residual signal vector of the current frame and an auxiliary variable of the previous frame comprises:
determining an evaluation function according to the first residual signal vector of the current frame;
determining a contrast function according to the evaluation function;
determining a first covariance matrix according to the first signal vector of the current frame;
and determining the auxiliary variable of the current frame according to the auxiliary variable of the previous frame, the first covariance matrix, the contrast function and the smoothing function.
In some embodiments, the performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target audio signal includes:
performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target frequency domain signal;
and converting the target frequency domain signal from a frequency domain to a time domain to obtain the target audio signal.
In a second aspect, the present disclosure provides an audio signal processing apparatus, including:
a determining module configured to determine a first signal vector from the first reference signal, the second reference signal and a first audio signal picked up by the microphone; the first audio signal comprises a first echo signal generated by a first speaker playing the first reference signal and a second echo signal generated by a second speaker playing the second reference signal;
a deriving module configured to derive a first residual signal vector from the first signal vector of a current frame and an echo separation vector of a previous frame;
a vector updating module configured to update the echo separation vector of the previous frame according to the first signal vector and the first residual signal vector to obtain an echo separation vector of the current frame;
and the echo separation module is configured to perform echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target audio signal.
In some embodiments, the determining module is specifically configured to:
respectively transforming the first reference signal, the second reference signal and the first audio signal from a time domain to a frequency domain to obtain a first frequency domain reference signal, a second frequency domain reference signal and a first frequency domain audio signal;
and arranging the vectors of the first frequency domain reference signal, the second frequency domain reference signal and the first frequency domain audio signal according to a preset direction to obtain the first signal vector.
In some embodiments, the obtaining module is specifically configured to:
under the condition that the current frame is not an initial frame, performing echo separation on the first signal vector based on the echo separation vector of the previous frame to obtain the first residual signal vector;
and under the condition that the current frame is an initial frame, performing echo separation on the first signal vector based on a preset initial echo separation vector to obtain the first residual signal vector.
In some embodiments, the vector update module is specifically configured to:
determining an auxiliary variable of the current frame according to the first signal vector and the first residual signal vector of the current frame and an auxiliary variable of a previous frame;
and updating the echo separation vector of the previous frame according to the auxiliary variable of the current frame to obtain the echo separation vector of the current frame.
In some embodiments, the vector update module is specifically configured to:
determining an evaluation function according to the first residual signal vector of the current frame;
determining a contrast function according to the evaluation function;
determining a first covariance matrix according to the first signal vector of the current frame;
and determining the auxiliary variable of the current frame according to the auxiliary variable of the previous frame, the first covariance matrix, the contrast function and the smoothing function.
In some embodiments, the echo separation module is specifically configured to:
performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target frequency domain signal;
and converting the target frequency domain signal from a frequency domain to a time domain to obtain the target audio signal.
In a third aspect, the disclosed embodiments provide an electronic device, including:
a microphone;
a first speaker;
a second speaker;
a processor; and
a memory storing computer instructions for causing the processor to perform the method according to any one of the embodiments of the first aspect.
In a fourth aspect, the embodiments of the present disclosure provide a storage medium storing computer instructions for causing a computer to execute the method according to any one of the embodiments of the first aspect.
The audio signal processing method of the embodiments of the present disclosure determines a first signal vector according to a first reference signal, a second reference signal, and a first audio signal picked up by a microphone; obtains a first residual signal vector according to the first signal vector of the current frame and the echo separation vector of the previous frame; updates the echo separation vector of the previous frame according to the first signal vector and the first residual signal vector to obtain the echo separation vector of the current frame; and performs echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target audio signal. In the embodiments of the present disclosure, stereo echoes are separated, and thus cancelled, based on independent vector analysis. Compared with related-art echo cancellation methods, this improves stereo echo cancellation in complex environments such as double-talk scenes, avoids damaging the near-end speech, and improves the quality of voice communication.
Drawings
To more clearly describe the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed for that description are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present disclosure; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 2 is a schematic diagram of an audio signal processing method according to some embodiments of the present disclosure.
Fig. 3 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 4 is a schematic diagram of an analysis window in an audio signal processing method according to some embodiments of the present disclosure.
Fig. 5 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 6 is a flow chart of an audio signal processing method in some embodiments according to the present disclosure.
Fig. 7 is a block diagram of an audio signal processing apparatus according to some embodiments of the present disclosure.
FIG. 8 is a block diagram of an electronic device suitable for implementing the method of the present disclosure.
Detailed Description
The technical solutions of the present disclosure are described clearly and completely below with reference to the accompanying drawings. It is to be understood that the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments derived by those of ordinary skill in the art from the embodiments disclosed herein without creative effort fall within the protection scope of the present disclosure. In addition, the technical features of the different embodiments described below may be combined with one another as long as they do not conflict.
Stereophonic sound is sound with a sense of space produced by multiple loudspeaker channels; it is closer to natural sound and achieves a better audio-visual effect, so it is widely used in scenes such as video conferences and online games. However, the sound emitted by the two speakers of a stereo speech system is picked up again by the near-end microphone and transmitted to the far end, forming acoustic echoes that seriously degrade call quality.
In the related art, one method of canceling stereo echo is to provide two adaptive filters, such as NLMS (Normalized Least Mean Squares) filters, with each filter estimating the echo path from one speaker to the microphone to cancel the echo generated by that speaker, thereby implementing stereo echo cancellation. However, in a complex double-talk scene such as multi-user online voice, the echo paths of the stereo system change constantly, and the adaptive filters cannot track these changes quickly and accurately, resulting in a poor echo cancellation effect.
Another related-art method performs echo cancellation based on Independent Component Analysis (ICA), analyzing each loudspeaker echo signal and separating it from the microphone pickup signal. However, in a complex audio scene, the inherent frequency permutation ambiguity of frequency-domain ICA distorts the near-end speech signal and degrades the voice communication effect.
Based on the above drawbacks of the related art, the embodiments of the present disclosure provide an audio signal processing method, an audio signal processing apparatus, an electronic device, and a storage medium, intended to improve the echo cancellation effect of a stereo audio system.
In a first aspect, the embodiments of the present disclosure provide an audio signal processing method, which may be applied to an electronic device with a voice communication system, such as a mobile phone, a tablet computer, a notebook computer, and the like, and the disclosure is not limited thereto.
As shown in fig. 1, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
and S110, determining a first signal vector according to the first reference signal, the second reference signal and the first audio signal picked up by the microphone.
Specifically, the voice communication system of the embodiment of the present disclosure is a stereo system, and the system includes two speakers constituting stereo, that is, a first speaker and a second speaker. The first loudspeaker and the second loudspeaker can be respectively arranged at different positions of the system, thereby forming stereo sound.
It will be appreciated that when the first and second speakers are playing sound, the microphone may pick up echo signals played by both speakers simultaneously. Since the two loudspeakers are located differently, the echoes of the first loudspeaker and the second loudspeaker that are picked up by the microphone have different echo paths.
The first reference signal and the second reference signal are far-end voice signals received by the system, for example, for a call scene, the reference signal refers to voice signals generated by speaking of a far-end speaker received by the system. The first reference signal is a far-end speech signal played through a first speaker, and the second reference signal is a far-end speech signal played through a second speaker.
When the first speaker plays the first reference signal, the first reference signal is propagated through an echo path between the first speaker and the microphone, and the microphone receives the first echo signal when the first reference signal reaches the microphone. Similarly, when the second reference signal is played by the second speaker, the second reference signal is propagated through an echo path between the second speaker and the microphone, and the microphone receives the second echo signal when the second reference signal reaches the microphone.
Meanwhile, for a double-talk scene, the microphone also collects a near-end voice signal generated when a near-end speaker speaks and a near-end background noise signal. That is, the first audio signal picked up by the microphone includes: a near-end speech signal, a background noise signal, a first echo signal, and a second echo signal.
In an embodiment of the disclosure, a first signal vector is determined from a first reference signal, a second reference signal, and a first audio signal picked up by a microphone.
For example, in some embodiments, the first reference signal, the second reference signal, and the first audio signal may be converted from a time domain form to a frequency domain form, thereby combining the signals in the frequency domain form into a matrix, i.e., a first signal vector. The following embodiments of the present disclosure will be described in detail, and will not be described in detail here.
And S120, obtaining a first residual signal vector according to the first signal vector of the current frame and the echo separation vector of the previous frame.
It should be noted that the first audio signal picked up by the microphone is a continuous signal in the time domain, and during signal processing this continuous signal needs to be divided into a sequence of consecutive frames. The "current frame" described in the embodiments of the present disclosure refers to the frame currently being processed, and the "previous frame" refers to the frame immediately before the current frame.
In the embodiment of the present disclosure, as can be seen from S110, the first signal vector is a vector containing both near-end speech and stereo echo, and the echo separation vector is the vector used to separate the stereo echo from the first signal vector. Processing the first signal vector with an echo separation vector separates the stereo echoes from it while preserving the near-end speech.
In the embodiment of the present disclosure, when the first signal vector of the current frame is processed, it is first subjected to separation based on the echo separation vector of the previous frame. In a scene where the echo path changes significantly, the echo signal of the current frame differs considerably from that of the previous frame, so the stereo echo cannot be accurately and completely separated from the first signal vector using the previous frame's echo separation vector. That is, after the first signal vector is processed with the echo separation vector of the previous frame, the resulting first residual signal vector contains the near-end speech signal and a residual stereo echo signal.
Therefore, in the embodiment of the present disclosure, the echo separation vector of the previous frame needs to be updated based on the first residual signal vector, so as to obtain a relatively accurate echo separation vector of the current frame. The echo separation vector of the current frame can relatively accurately represent the stereo echo path of the current frame, so that the stereo echo can be more accurately separated from the first signal vector by using the echo separation vector of the current frame, and the near-end voice is kept. The following S130 to S140 are specifically described.
The following embodiments of the present disclosure will be described in detail with respect to a process of processing a first signal vector of a current frame based on an echo separation vector of a previous frame to obtain a first residual signal vector, which will not be described in detail herein.
And S130, updating the echo separation vector of the previous frame according to the first signal vector and the first residual signal vector to obtain the echo separation vector of the current frame.
In some embodiments, an auxiliary variable of the current frame may be calculated based on the first signal vector, the first residual signal vector, and an auxiliary variable of a previous frame, and an echo separation vector of the current frame may be obtained according to the auxiliary variable of the current frame. The present disclosure is described in detail below, and will not be described in detail here.
It can be understood that the echo separation vector of the current frame corresponds to the stereo echo signal of the current frame. Compared with the echo separation vector of the previous frame, it better represents the stereo echo path at the current time, so the stereo echo signal can be separated more effectively in complex scenes where the echo path changes.
And S140, performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target audio signal.
After obtaining the echo separation vector of the current frame, as can be seen from the foregoing description, the first signal vector represents a vector of the current frame including the near-end speech and the stereo echo, and the echo separation vector of the current frame represents an echo path of the stereo echo of the current frame.
Therefore, performing echo separation on the first signal vector based on the echo separation vector of the current frame, that is, multiplying the two vectors, yields the target audio signal. The target audio signal represents the near-end speech signal after the stereo echo has been cancelled.
The above describes the processing of one frame of the first audio signal. Over consecutive frames, the processing of each successive "current frame" repeats S110 to S140, so that the signal picked up by the microphone is processed continuously and a clean near-end speech signal with the stereo echo removed is obtained.
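As a sketch of the frame-by-frame flow above (all function and variable names here are illustrative, and the S130 update rule is left as a placeholder, since its formulas are given later in the description):

```python
import numpy as np

def process_frame(Xn, Xf1, Xf2, W_prev, update_W):
    """One frame of the S110-S140 loop, evaluated per frequency bin.

    Xn, Xf1, Xf2 : complex arrays of shape (K,), the frequency-domain
        microphone signal and the two reference signals (S110).
    W_prev : complex array of shape (3, K), the echo separation vector
        of the previous frame, one 3-vector per bin.
    update_W : callable standing in for the S130 auxiliary-variable update.
    """
    X = np.stack([Xn, Xf1, Xf2])            # first signal vector, shape (3, K)
    # S120: residual using the previous frame's separation vector
    E1 = np.einsum('ck,ck->k', W_prev, X)   # per-bin dot product W^T X
    # S130: update the separation vector (method-specific, placeholder)
    W = update_W(W_prev, X, E1)
    # S140: echo separation with the current frame's vector
    target = np.einsum('ck,ck->k', W, X)
    return target, W
```

With a separation vector of [1, 0, 0] in every bin and an identity update, the output passes the microphone channel through unchanged, which is a quick sanity check of the shapes involved.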
As can be seen from the above description, in the embodiment of the present disclosure, the stereo echo is eliminated using a method based on Independent Vector Analysis (IVA): the first audio signal picked up by the microphone and the reference signals played by the two speakers are constructed into a first signal vector, and the stereo echo is then separated based on this independent vector. The echo cancellation problem is thereby converted into a multi-channel speech separation problem, and stereo echo cancellation is achieved.
In addition, compared with the dual-filter echo cancellation methods of the related art, no extra double-talk detection or adaptive step-size control is needed, which fundamentally avoids the poor stereo echo cancellation in double-talk scenes caused by inaccurate detection and control. Compared with the related-art ICA method, the near-end speech distortion caused by possible wrong permutation across frequency bands is avoided, and the echo separation vector is updated faster, thereby improving the voice communication effect.
Fig. 2 shows a schematic diagram of an audio signal processing method in some embodiments of the present disclosure, and the method of the present disclosure is described below with reference to fig. 2.
As shown in fig. 2, the audio system of the disclosed example is a stereo system including two speakers, namely a first speaker 210 and a second speaker 220, and a microphone 100. The first speaker 210 plays the received far-end first reference signal x1(n), so that the microphone 100 picks up the first echo signal y1(n) generated by playing the first reference signal x1(n). The second speaker 220 plays the received far-end second reference signal x2(n), so that the microphone 100 picks up the second echo signal y2(n) generated by playing the second reference signal x2(n).
Meanwhile, in a double-talk scene, the near-end speech signal s(n) generated by the near-end speaker and the near-end background noise signal v(n) are also picked up by the microphone 100. That is, the first audio signal d(n) picked up by the microphone 100 can be represented as:
d(n)=s(n)+v(n)+y1(n)+y2(n)
where s (n) + v (n) denotes a near-end audio signal, which includes a near-end speech signal and a background noise signal. y1(n) + y2(n) represents stereo echo signals, and the method of embodiments of the present disclosure aims to cancel the stereo echo signals y1(n) and y2(n) from the first audio signal d (n).
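A minimal numpy sketch of this signal model may help make it concrete. The sample rate, echo-path lengths, and the random stand-ins for speech, noise, and references are all illustrative assumptions, not values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16000                               # one second at a 16 kHz rate (illustrative)
s = 0.1 * rng.standard_normal(n)        # near-end speech stand-in s(n)
v = 0.01 * rng.standard_normal(n)       # near-end background noise v(n)
x1 = rng.standard_normal(n)             # far-end reference for speaker 1
x2 = rng.standard_normal(n)             # far-end reference for speaker 2
h1 = 0.05 * rng.standard_normal(64)     # hypothetical echo path, speaker 1 to mic
h2 = 0.05 * rng.standard_normal(64)     # hypothetical echo path, speaker 2 to mic
y1 = np.convolve(x1, h1)[:n]            # first echo signal y1(n)
y2 = np.convolve(x2, h2)[:n]            # second echo signal y2(n)
d = s + v + y1 + y2                     # microphone pickup d(n), per the formula
```

The goal of the method is to recover s(n) + v(n) from d given only x1 and x2, without access to h1 and h2.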
As shown in fig. 3, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
s310, respectively transforming the first reference signal, the second reference signal and the first audio signal from the time domain to the frequency domain to obtain a first frequency domain reference signal, a second frequency domain reference signal and a first frequency domain audio signal.
S320, arranging the vectors of the first frequency domain reference signal, the second frequency domain reference signal and the first frequency domain audio signal according to a preset direction to obtain a first signal vector.
Specifically, the first reference signal, the second reference signal and the first audio signal are time domain signals, and in order to facilitate the signal processing calculation, the time domain signals are first converted into a frequency domain.
In some embodiments, a short-time Fourier transform (STFT) may be employed to convert the time-domain signal to a frequency-domain signal. In one example, the process of the STFT in fig. 2 is represented as:
Xn=fft(d.*win)
Xf1=fft(x1.*win)
Xf2=fft(x2.*win)
where d is the first audio signal picked up by the microphone 100, x1 is the first reference signal, x2 is the second reference signal, and fft(·) denotes the fast Fourier transform applied to each windowed frame.
win is a short analysis window, which is expressed as:
win=[0;sqrt(hanning(N-1))]
hanning(n)=0.5*[1-cos(2π*n/N)]
where N is the analysis frame length and hanning(n) is the Hanning window of length N-1. In one example, the short analysis window may be as shown in FIG. 4.
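A minimal NumPy construction of this analysis window, following the formulas above (the frame length N = 512 is an assumed example value): a zero is prepended so the window spans the full frame length, and the square root is taken so the same window can be reused at synthesis.

```python
import numpy as np

N = 512  # analysis frame length (assumed example value)

# Hanning window of length N-1, per the formula hanning(n) = 0.5*(1 - cos(2*pi*n/N))
n_idx = np.arange(N - 1)
hann = 0.5 * (1.0 - np.cos(2.0 * np.pi * n_idx / N))

# Short analysis window win = [0; sqrt(hanning(N-1))]
win = np.concatenate(([0.0], np.sqrt(hann)))
```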
The first frequency domain reference signal Xf1, the second frequency domain reference signal Xf2 and the first frequency domain audio signal Xn are obtained by the above calculation, and are then combined into matrix form to obtain the first signal vector, which is expressed as:
X(l,k) = [Xn(l,k), Xf1(l,k), Xf2(l,k)]^T
where l denotes the frame index, k denotes the frequency bin, and X(l,k) denotes the first signal vector at frequency bin k of the l-th frame.
With continued reference to fig. 2, after obtaining the first signal vector, the echo cancellation module 300 separates the stereo echo signals in the first signal vector. In the embodiment of the present disclosure, a first residual signal vector is obtained according to a first signal vector of a current frame and an echo separation vector of a previous frame, and is represented as:
E1(l,k) = W^T(l-1,k) × X(l,k)
where X(l,k) denotes the first signal vector of the current frame (the l-th frame), W^T(l-1,k) denotes the transpose of the echo separation vector of the previous frame (the (l-1)-th frame), and E1(l,k) denotes the first residual signal vector of the current frame. That is, in the above formula, the stereo echo in the first signal vector of the current frame is first cancelled by the echo separation vector of the previous frame, yielding the first residual signal vector E1(l,k).
It should be noted that, during the signal processing of the initial frame, since the initial frame has no previous frame signal, an initial echo separation vector may be preset; in the case that the current frame is the initial frame, the first residual signal vector is obtained from the first signal vector of the initial frame based on this initial echo separation vector.
When the current frame is not the initial frame, since an echo separation vector is calculated for every frame, the first residual signal vector of the current frame is obtained from its first signal vector based on the echo separation vector of the previous frame. The echo separation vector of the current frame, obtained by the subsequent calculation, then serves as the previous-frame echo separation vector for the next frame, and the processing iterates in this loop.
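The per-frequency stacking and the previous-frame separation step can be sketched as follows. The frequency-domain frames are random placeholders; with the preset initial vector W0 = [1, 0, 0]^T, the residual of the initial frame is just the microphone channel Xn, as expected:

```python
import numpy as np

K = 257  # number of frequency bins (assumed example value)

rng = np.random.default_rng(1)
# Frequency-domain frames of the mic channel and the two references (placeholders)
Xn  = rng.standard_normal(K) + 1j * rng.standard_normal(K)
Xf1 = rng.standard_normal(K) + 1j * rng.standard_normal(K)
Xf2 = rng.standard_normal(K) + 1j * rng.standard_normal(K)

# First signal vector X(l, k): one 3-vector per frequency bin, shape (3, K)
X = np.stack([Xn, Xf1, Xf2])

# Preset initial echo separation vector: pass the mic channel through unchanged
W0 = np.zeros((3, K), dtype=complex)
W0[0, :] = 1.0

# E1(l, k) = W^T(l-1, k) x X(l, k), evaluated bin by bin
E1 = np.einsum('ck,ck->k', W0, X)
```

For the initial frame this yields E1 equal to Xn, i.e. no echo is removed until the separation vector has been adapted.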
As shown in fig. 5, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
S510, determining an auxiliary variable of the current frame according to the first signal vector and the first residual signal vector of the current frame and the auxiliary variable of the previous frame.
S520, updating the echo separation vector of the previous frame according to the auxiliary variable of the current frame to obtain the echo separation vector of the current frame.
Specifically, as can be seen from the foregoing, in the embodiment of the present disclosure, it is necessary to obtain an echo separation vector of the current frame based on the first residual signal vector. In the embodiment of the present disclosure, an auxiliary variable function is introduced to calculate and obtain an echo separation vector of the current frame.
Firstly, an evaluation function r is calculated from the first residual signal E1(l,k) of the current frame, expressed as:
r(l) = sqrt( Σk |E1(l,k)|² )
The evaluation function r represents an analytical evaluation of the first residual signal E1(l,k) across all frequency points. A contrast function φ is then determined from the evaluation function, expressed as:
φ(r) = G'(r)/r
where G(·) is a contrast function; for example, with G(r) = r the contrast function reduces to φ(r) = 1/r.
A first covariance matrix Xf(l,k) is then determined from the first signal vector X(l,k) of the current frame, expressed as:
Xf(l,k) = X(l,k) × X^H(l,k)
where (·)^H denotes the Hermitian conjugate transpose. Then, the auxiliary variable of the previous frame is updated based on the contrast function and the covariance matrix to obtain the auxiliary variable of the current frame, expressed as:
V(l,k) = α·V(l-1,k) + (1-α)·φ(r(l))·Xf(l,k)
where V(l,k) denotes the auxiliary variable of the current frame (the l-th frame), V(l-1,k) denotes the auxiliary variable of the previous frame (the (l-1)-th frame), α denotes a smoothing factor, and φ(r(l)) denotes the contrast function evaluated at r(l).
After determining the auxiliary variable V (l, k) of the current frame, an echo separation vector of the current frame is obtained according to the auxiliary variable V (l, k) of the current frame, and is expressed as:
W(l,k) = V(l,k)^(-1) × i
where W(l,k) denotes the echo separation vector of the current frame (the l-th frame), and i is the unit vector selecting the microphone channel, i = [1, 0, 0]^T.
Through the above process, the echo separation vector W (l, k) of the current frame is calculated, so that the stereo echo in the first signal vector of the current frame can be separated based on the echo separation vector of the current frame.
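One pass of the auxiliary-function update described above can be sketched as follows. The weight φ(r) = 1/r used here corresponds to the contrast function choice G(r) = r, which is an assumption for illustration rather than something the text fixes; the smoothing factor alpha is likewise an assumed value.

```python
import numpy as np

def update_separation_vector(X, W_prev, V_prev, alpha=0.98, eps=1e-8):
    """One auxiliary-function update; X and W_prev are (3, K), V_prev is (K, 3, 3)."""
    K = X.shape[1]
    # First residual signal E1(l,k) using the previous frame's separation vector
    E1 = np.einsum('ck,ck->k', W_prev, X)
    # Evaluation function r(l): norm of the residual over all frequency bins
    r = np.sqrt(np.sum(np.abs(E1) ** 2))
    # Contrast-derived weight; phi(r) = 1/r assumes G(r) = r
    phi = 1.0 / max(r, eps)
    # First covariance matrix Xf(l,k) = X(l,k) X^H(l,k) per frequency bin
    Xf = np.einsum('ck,dk->kcd', X, X.conj())
    # Smoothed auxiliary variable of the current frame
    V = alpha * V_prev + (1.0 - alpha) * phi * Xf
    # Echo separation vector W(l,k) = V(l,k)^(-1) i, i selecting the mic channel
    i_vec = np.array([1.0, 0.0, 0.0], dtype=complex)
    W = np.stack([np.linalg.solve(V[k], i_vec) for k in range(K)], axis=1)
    return W, V
```

In use, V_prev would be initialized to the identity matrix at each bin for the initial frame and then carried forward frame by frame, matching the loop iteration described above.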
As shown in fig. 6, in some embodiments, an audio signal processing method of an example of the present disclosure includes:
S610, performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target frequency domain signal.
S620, converting the target frequency domain signal from the frequency domain to the time domain to obtain a target audio signal.
Specifically, the process of performing echo separation on the first signal vector of the current frame is represented as:
E2(l,k) = W^T(l,k) × X(l,k)
where W^T(l,k) denotes the transpose of the echo separation vector of the current frame, X(l,k) denotes the first signal vector of the current frame, and E2(l,k) denotes the target frequency domain signal. Echo separation is performed on the first signal vector of the current frame based on the echo separation vector of the current frame to obtain the audio signal with the stereo echo removed.
It should be noted that, as shown in fig. 2, after performing echo separation and cancellation on the first signal vector of the current frame, the echo cancellation module 300 obtains a target frequency domain signal in a frequency domain form, so that the target frequency domain signal can be converted into a time domain by inverse short-time fourier transform (ISTFT), and a target audio signal e in a time domain form is obtained, which can be represented as:
e=ifft(E2(l)).*win
where e is the target audio signal and ifft(·) is the inverse fast Fourier transform applied to each frame. The target audio signal e is a clean near-end audio signal after the stereo echo is removed, and mainly includes near-end speech and background noise.
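The synthesis step can be sketched as follows for a single frame. The pass-through separation vector is used here purely for illustration, so E2 equals Xn; with 50% frame overlap, the square-root Hann window applied at both analysis and synthesis satisfies the constant-overlap-add condition, so successive windowed frames sum back to the original signal.

```python
import numpy as np

N = 512  # frame length (assumed example value)
n_idx = np.arange(N - 1)
win = np.concatenate(([0.0], np.sqrt(0.5 * (1.0 - np.cos(2.0 * np.pi * n_idx / N)))))

rng = np.random.default_rng(3)
frame = rng.standard_normal(N)  # one time-domain frame (placeholder data)

# Analysis: window the frame and transform to the frequency domain
Xn = np.fft.fft(frame * win)

# Suppose echo separation has produced E2(l); with the pass-through
# separation vector used here for illustration, E2 is simply Xn
E2 = Xn

# Synthesis: inverse FFT, then apply the same window before overlap-add
e = np.real(np.fft.ifft(E2)) * win
```

After overlap-add across frames, the doubly windowed segments reconstruct the time-domain target signal.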
Therefore, in the embodiment of the disclosure, the echo separation vector of the stereo echo is estimated by using an independent vector analysis technology, and an auxiliary variable is introduced to accelerate the update of the echo separation vector, so as to achieve the elimination of the stereo echo.
In a second aspect, the embodiments of the present disclosure provide an audio signal processing apparatus, which may be applied to an electronic device with a voice communication system, such as a mobile phone, a tablet computer, a notebook computer, and the like, and the disclosure is not limited thereto.
As shown in fig. 7, in some embodiments, an audio signal processing apparatus of an example of the present disclosure includes:
a determining module 701 configured to determine a first signal vector from the first reference signal, the second reference signal and the first audio signal picked up by the microphone; the first audio signal comprises a first echo signal generated by a first loudspeaker playing a first reference signal and a second echo signal generated by a second loudspeaker playing a second reference signal;
a deriving module 702 configured to derive a first residual signal vector according to a first signal vector of a current frame and an echo separation vector of a previous frame;
a vector updating module 703 configured to update the echo separation vector of the previous frame according to the first signal vector and the first residual signal vector to obtain the echo separation vector of the current frame;
and an echo separation module 704 configured to perform echo separation on the first signal vector based on the echo separation vector of the current frame, so as to obtain a target audio signal.
Therefore, in the embodiment of the disclosure, the echo separation vector of the stereo echo is estimated by using an independent vector analysis technology, and an auxiliary variable is introduced to accelerate the update of the echo separation vector, so as to achieve the elimination of the stereo echo.
In some embodiments, the determining module 701 is specifically configured to:
respectively transforming the first reference signal, the second reference signal and the first audio signal from a time domain to a frequency domain to obtain a first frequency domain reference signal, a second frequency domain reference signal and a first frequency domain audio signal;
and arranging the vectors of the first frequency domain reference signal, the second frequency domain reference signal and the first frequency domain audio signal according to a preset direction to obtain a first signal vector.
In some embodiments, the obtaining module 702 is specifically configured to:
under the condition that the current frame is not the initial frame, performing echo separation on the first signal vector based on the echo separation vector of the previous frame to obtain a first residual signal vector;
and under the condition that the current frame is the initial frame, performing echo separation on the first signal vector based on a preset initial echo separation vector to obtain a first residual signal vector.
In some embodiments, the vector update module 703 is specifically configured to:
determining an auxiliary variable of the current frame according to the first signal vector and the first residual signal vector of the current frame and an auxiliary variable of a previous frame;
and updating the echo separation vector of the previous frame according to the auxiliary variable of the current frame to obtain the echo separation vector of the current frame.
In some embodiments, the vector update module 703 is specifically configured to:
determining an evaluation function according to a first residual signal vector of the current frame;
determining a contrast function according to the evaluation function;
determining a first covariance matrix according to a first signal vector of a current frame;
and determining the echo separation vector of the current frame according to the auxiliary variable of the previous frame, the first covariance matrix, the contrast function and the smoothing function.
In some embodiments, the echo separation module 704 is specifically configured to:
performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target frequency domain signal;
and converting the target frequency domain signal from a frequency domain to a time domain to obtain a target audio signal.
In a third aspect, the disclosed embodiments provide an electronic device, including:
a microphone;
a first speaker;
a second speaker;
a processor; and
a memory storing computer instructions for causing the processor to perform the method according to any of the embodiments of the first aspect.
The electronic device according to the embodiment of the present disclosure may be described with reference to any one of the foregoing embodiments, and the present disclosure is not repeated herein.
In a fourth aspect, the disclosed embodiments provide a storage medium storing computer instructions for causing a computer to perform the method according to any one of the embodiments of the first aspect.
Fig. 8 is a block diagram of an electronic device according to some embodiments of the present disclosure, and the following describes principles related to the electronic device and a storage medium according to some embodiments of the present disclosure with reference to fig. 8.
Referring to fig. 8, the electronic device 1800 may include one or more of the following components: processing component 1802, memory 1804, power component 1806, multimedia component 1808, audio component 1810, input/output (I/O) interface 1812, sensor component 1816, and communications component 1818.
The processing component 1802 generally controls the overall operation of the electronic device 1800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1802 may include one or more processors 1820 to execute instructions. Further, the processing component 1802 may include one or more modules that facilitate interaction between the processing component 1802 and other components. For example, the processing component 1802 can include a multimedia module to facilitate interaction between the multimedia component 1808 and the processing component 1802. As another example, the processing component 1802 can read executable instructions from a memory to implement electronic device related functions.
The memory 1804 is configured to store various types of data to support operation at the electronic device 1800. Examples of such data include instructions for any application or method operating on the electronic device 1800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 1806 provides power to various components of the electronic device 1800. The power components 1806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1800.
The multimedia component 1808 includes a display screen that provides an output interface between the electronic device 1800 and a user. In some embodiments, the multimedia component 1808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera can receive external multimedia data when the electronic device 1800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
Audio component 1810 is configured to output and/or input audio signals. For example, the audio component 1810 can include a Microphone (MIC) that can be configured to receive external audio signals when the electronic device 1800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1804 or transmitted via the communication component 1818. In some embodiments, audio component 1810 also includes a speaker for outputting audio signals.
I/O interface 1812 provides an interface between processing component 1802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 1816 includes one or more sensors to provide status assessments of various aspects of the electronic device 1800. For example, the sensor component 1816 can detect an open/closed state of the electronic device 1800 and the relative positioning of components, such as the display and keypad of the electronic device 1800. The sensor component 1816 can also detect a change in position of the electronic device 1800 or one of its components, the presence or absence of user contact with the electronic device 1800, the orientation or acceleration/deceleration of the electronic device 1800, and a change in its temperature. The sensor component 1816 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact, and a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1816 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1818 is configured to facilitate communications between the electronic device 1800 and other devices in a wired or wireless manner. The electronic device 1800 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, 3G, 4G, 5G, or 6G, or a combination thereof. In an exemplary embodiment, the communication component 1818 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1818 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components.
It should be understood that the above embodiments are only examples for clearly illustrating the present disclosure, and are not intended to limit it. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Obvious variations or modifications may be made without departing from the scope of the present disclosure.

Claims (14)

1. An audio signal processing method, comprising:
determining a first signal vector according to the first reference signal, the second reference signal and a first audio signal picked up by a microphone; the first audio signal comprises a first echo signal generated by a first speaker playing the first reference signal and a second echo signal generated by a second speaker playing the second reference signal;
obtaining a first residual signal vector according to the first signal vector of the current frame and the echo separation vector of the previous frame;
updating the echo separation vector of the previous frame according to the first signal vector and the first residual signal vector to obtain the echo separation vector of the current frame;
and performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target audio signal.
2. The method of claim 1, wherein determining the first signal vector from the first reference signal, the second reference signal, and the first audio signal picked up by the microphone comprises:
respectively transforming the first reference signal, the second reference signal and the first audio signal from a time domain to a frequency domain to obtain a first frequency domain reference signal, a second frequency domain reference signal and a first frequency domain audio signal;
and arranging the vectors of the first frequency domain reference signal, the second frequency domain reference signal and the first frequency domain audio signal according to a preset direction to obtain the first signal vector.
3. The method of claim 1, wherein obtaining a first residual signal vector according to the first signal vector of the current frame and the echo separation vector of the previous frame comprises:
performing echo separation on the first signal vector based on the echo separation vector of the previous frame to obtain the first residual signal vector under the condition that the current frame is not the initial frame;
and under the condition that the current frame is an initial frame, performing echo separation on the first signal vector based on a preset initial echo separation vector to obtain the first residual signal vector.
4. The method of claim 1, wherein the updating the echo separation vector of the previous frame according to the first signal vector and the first residual signal vector to obtain the echo separation vector of the current frame comprises:
determining an auxiliary variable of the current frame according to the first signal vector and the first residual signal vector of the current frame and an auxiliary variable of a previous frame;
and updating the echo separation vector of the previous frame according to the auxiliary variable of the current frame to obtain the echo separation vector of the current frame.
5. The method of claim 4, wherein determining the auxiliary variable of the current frame according to the first signal vector and the first residual signal vector of the current frame and the auxiliary variable of the previous frame comprises:
determining an evaluation function according to the first residual signal vector of the current frame;
determining a contrast function according to the evaluation function;
determining a first covariance matrix according to the first signal vector of the current frame;
and determining the echo separation vector of the current frame according to the auxiliary variable of the previous frame, the first covariance matrix, the contrast function and the smoothing function.
6. The method of claim 2, wherein the performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target audio signal comprises:
performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target frequency domain signal;
and converting the target frequency domain signal from a frequency domain to a time domain to obtain the target audio signal.
7. An audio signal processing apparatus, comprising:
a determining module configured to determine a first signal vector from the first reference signal, the second reference signal and a first audio signal picked up by the microphone; the first audio signal comprises a first echo signal generated by a first speaker playing the first reference signal and a second echo signal generated by a second speaker playing the second reference signal;
a deriving module configured to derive a first residual signal vector from the first signal vector of a current frame and an echo separation vector of a previous frame;
a vector updating module configured to update the echo separation vector of the previous frame according to the first signal vector and the first residual signal vector to obtain an echo separation vector of the current frame;
and the echo separation module is configured to perform echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target audio signal.
8. The apparatus of claim 7, wherein the determination module is specifically configured to:
respectively transforming the first reference signal, the second reference signal and the first audio signal from a time domain to a frequency domain to obtain a first frequency domain reference signal, a second frequency domain reference signal and a first frequency domain audio signal;
and arranging the vectors of the first frequency domain reference signal, the second frequency domain reference signal and the first frequency domain audio signal according to a preset direction to obtain the first signal vector.
9. The apparatus of claim 7, wherein the obtaining module is specifically configured to:
performing echo separation on the first signal vector based on the echo separation vector of the previous frame to obtain the first residual signal vector under the condition that the current frame is not the initial frame;
and under the condition that the current frame is an initial frame, performing echo separation on the first signal vector based on a preset initial echo separation vector to obtain the first residual signal vector.
10. The apparatus of claim 7, wherein the vector update module is specifically configured to:
determining an auxiliary variable of the current frame according to the first signal vector and the first residual signal vector of the current frame and an auxiliary variable of a previous frame;
and updating the echo separation vector of the previous frame according to the auxiliary variable of the current frame to obtain the echo separation vector of the current frame.
11. The apparatus of claim 10, wherein the vector update module is specifically configured to:
determining an evaluation function according to the first residual signal vector of the current frame;
determining a contrast function according to the evaluation function;
determining a first covariance matrix according to the first signal vector of the current frame;
and determining the echo separation vector of the current frame according to the auxiliary variable of the previous frame, the first covariance matrix, the contrast function and the smoothing function.
12. The apparatus of claim 8, wherein the echo separation module is specifically configured to:
performing echo separation on the first signal vector based on the echo separation vector of the current frame to obtain a target frequency domain signal;
and converting the target frequency domain signal from a frequency domain to a time domain to obtain the target audio signal.
13. An electronic device, comprising:
a microphone;
a first speaker;
a second speaker;
a processor; and
memory storing computer instructions for causing a processor to perform the method according to any one of claims 1 to 6.
14. A storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 6.
CN202110739135.XA 2021-06-30 2021-06-30 Audio signal processing method and device Active CN113470675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110739135.XA CN113470675B (en) 2021-06-30 2021-06-30 Audio signal processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110739135.XA CN113470675B (en) 2021-06-30 2021-06-30 Audio signal processing method and device

Publications (2)

Publication Number Publication Date
CN113470675A true CN113470675A (en) 2021-10-01
CN113470675B CN113470675B (en) 2024-06-25

Family

ID=77876658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110739135.XA Active CN113470675B (en) 2021-06-30 2021-06-30 Audio signal processing method and device

Country Status (1)

Country Link
CN (1) CN113470675B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115278455A (en) * 2022-06-27 2022-11-01 深圳市中深澳信息技术有限公司 Audio processing method and device, conference microphone and computer readable storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160057534A1 (en) * 2014-08-20 2016-02-25 Yuan Ze University Acoustic echo cancellation method and system using the same
CN106887238A (en) * 2017-03-01 2017-06-23 中国科学院上海微系统与信息技术研究所 A kind of acoustical signal blind separating method based on improvement Independent Vector Analysis algorithm
KR101802444B1 (en) * 2016-07-15 2017-11-29 서강대학교산학협력단 Robust speech recognition apparatus and method for Bayesian feature enhancement using independent vector analysis and reverberation parameter reestimation
CN107483761A (en) * 2016-06-07 2017-12-15 电信科学技术研究院 A kind of echo suppressing method and device
US20190272842A1 (en) * 2018-03-01 2019-09-05 Apple Inc. Speech enhancement for an electronic device
CN111128221A (en) * 2019-12-17 2020-05-08 北京小米智能科技有限公司 Audio signal processing method and device, terminal and storage medium
CN111161751A (en) * 2019-12-25 2020-05-15 声耕智能科技(西安)研究院有限公司 Distributed microphone pickup system and method under complex scene
WO2020097828A1 (en) * 2018-11-14 2020-05-22 深圳市欢太科技有限公司 Echo cancellation method, delay estimation method, echo cancellation apparatus, delay estimation apparatus, storage medium, and device
CN111418010A (en) * 2017-12-08 2020-07-14 华为技术有限公司 Multi-microphone noise reduction method and device and terminal equipment
CN111524498A (en) * 2020-04-10 2020-08-11 维沃移动通信有限公司 Filtering method and device and electronic equipment



Also Published As

Publication number Publication date
CN113470675B (en) 2024-06-25

Similar Documents

Publication Publication Date Title
US8842851B2 (en) Audio source localization system and method
CN105513596B (en) Voice control method and control equipment
CN113362843B (en) Audio signal processing method and device
US10978086B2 (en) Echo cancellation using a subset of multiple microphones as reference channels
CN115482830B (en) Speech enhancement method and related equipment
US20240144948A1 (en) Sound signal processing method and electronic device
CN112217948B (en) Echo processing method, device, equipment and storage medium for voice call
CN106791245B (en) Method and device for determining filter coefficients
CN112447184A (en) Voice signal processing method and device, electronic equipment and storage medium
CN113421579B (en) Sound processing method, device, electronic equipment and storage medium
CN113470675B (en) Audio signal processing method and device
Tashev Recent advances in human-machine interfaces for gaming and entertainment
US11388281B2 (en) Adaptive method and apparatus for intelligent terminal, and terminal
CN113488066B (en) Audio signal processing method, audio signal processing device and storage medium
CN113489855A (en) Sound processing method, sound processing device, electronic equipment and storage medium
CN113077808A (en) Voice processing method and device for voice processing
CN111292760B (en) Sounding state detection method and user equipment
CN113362842B (en) Audio signal processing method and device
CN113489854B (en) Sound processing method, device, electronic equipment and storage medium
CN113470676B (en) Sound processing method, device, electronic equipment and storage medium
CN113488067B (en) Echo cancellation method, device, electronic equipment and storage medium
CN118522299A (en) Echo cancellation method, device and storage medium
CN111294473B (en) Signal processing method and device
CN113194387A (en) Audio signal processing method, audio signal processing device, electronic equipment and storage medium
CN114648996A (en) Audio data processing method and device, voice interaction method, equipment and chip, sound box, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant