CN113470677B - Audio processing method, device and system - Google Patents
- Publication number
- CN113470677B
- Authority
- CN
- China
- Prior art keywords
- audio signal
- filtering
- module
- processing
- end audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R27/00—Public address systems
- H04R27/04—Electric megaphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2227/00—Details of public address [PA] systems covered by H04R27/00 but not provided for in any of its subgroups
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2227/00—Details of public address [PA] systems covered by H04R27/00 but not provided for in any of its subgroups
- H04R2227/009—Signal processing in [PA] systems to enhance the speech intelligibility
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The application provides an audio processing method, device and system. The audio processing method comprises the following steps: inputting a far-end audio signal into an adaptive filtering module, and obtaining a first filter coefficient after the adaptive filtering module converges; using the first filter coefficient as the filter coefficient of a reference filtering module, wherein a dual-filtering processing module comprises the reference filtering module and a real-time filtering module; performing dual-filtering processing on a near-end audio signal through the dual-filtering processing module; mixing the dual-filtered near-end audio signal with the far-end audio signal to obtain a mixed audio signal; and playing the mixed audio signal through a near-end sound amplifying device, wherein the near-end sound amplifying device and the near-end acquisition device that captures the near-end audio signal are located in the same space. The application can reduce acoustic feedback during sound amplification.
Description
Technical Field
The present application relates to the field of audio processing, and in particular, to an audio processing method, apparatus, and system.
Background
In remote interaction scenarios such as education recording and broadcasting systems and online meetings, the voice collected at the near end (locally) is amplified and played back so that every participant in a large local space can hear the near-end voice loudly and clearly enough. At the same time, a network connection is needed for online remote interaction: the voice collected at the near end is played synchronously to the people in the far-end space, and the voice of the people in the far-end space, collected by the far-end acquisition device, is amplified and played at the near end.
In such a scenario, the near-end acquisition device, the far-end acquisition device, the near-end sound amplifying device and the far-end sound amplifying device are used almost simultaneously, which may produce echo and howling during the acquisition, playback and transmission of the audio signal.
How to mitigate echo and howling when the near-end acquisition device, the far-end acquisition device, the near-end sound amplifying device and the far-end sound amplifying device are used at the same time is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In order to overcome the defects in the prior art, the application provides an audio processing method, an audio processing device and an audio processing system, which can mitigate echo and howling in scenarios where a near-end acquisition device, a far-end acquisition device, a near-end sound amplifying device and a far-end sound amplifying device are used simultaneously.
According to an aspect of the present application, there is provided an audio processing method including:
inputting a far-end audio signal into an adaptive filtering module, and obtaining a first filter coefficient after the adaptive filtering module converges;
using the first filter coefficient as the filter coefficient of a reference filtering module, wherein a dual-filtering processing module comprises the reference filtering module and a real-time filtering module;
performing dual-filtering processing on a near-end audio signal through the dual-filtering processing module;
mixing the dual-filtered near-end audio signal with the far-end audio signal to obtain a mixed audio signal;
and playing the mixed audio signal through a near-end sound amplifying device, wherein the near-end sound amplifying device and the near-end acquisition device that captures the near-end audio signal are located in the same space.
In some embodiments of the present application, after obtaining the first filter coefficient once the adaptive filtering module has converged, and before using the first filter coefficient as the filter coefficient of the reference filtering module, the method further includes:
performing a confidence calculation on the first filter coefficient;
and, when the confidence of the first filter coefficient is greater than a preset threshold, executing the step of using the first filter coefficient as the filter coefficient of the reference filtering module.
In some embodiments of the application, performing the confidence calculation on the first filter coefficient includes:
obtaining a plurality of adaptive filter curves from a plurality of historical first filter coefficients and the gains, at each frequency point, of the historical audio signals output by the adaptive filtering module under the corresponding historical first filter coefficients;
clustering the plurality of adaptive filter curves to obtain a reference first filter coefficient;
and calculating the confidence of the first filter coefficient according to the reference first filter coefficient.
In some embodiments of the present application, after the dual-filtering processing module performs dual-filtering processing on the near-end audio signal, and before the dual-filtered near-end audio signal and the far-end audio signal are mixed to obtain the mixed audio signal, the method further includes:
performing decorrelation processing on the dual-filtered near-end audio signal, so that the decorrelated near-end audio signal is mixed with the far-end audio signal; and/or
performing automatic equalization processing on set frequency points in the dual-filtered near-end audio signal, so that the automatically equalized near-end audio signal is mixed with the far-end audio signal.
In some embodiments of the present application, the performing automatic equalization processing on the set frequency point in the near-end audio signal after the dual-filtering processing includes:
the response sensitivity of the set frequency point in the near-end audio signal after the automatic equalization processing is made smaller than the response sensitivity of the set frequency point in the near-end audio signal before the automatic equalization processing.
In some embodiments of the present application, after inputting the far-end audio signal into the adaptive filtering module, the method further includes:
estimating an echo signal of the far-end audio signal according to the output signal of the adaptive filtering module;
and removing the echo signal of the far-end audio signal from the near-end audio signal, and sending the echo-removed signal to the opposite end as the far-end audio signal of the opposite end.
In some embodiments of the present application, the reference filtering module and the real-time filtering module respectively process the near-end audio signal, where the near-end audio signal output by the dual filtering process is obtained by weighting the near-end audio signal output by the reference filtering module and the near-end audio signal output by the real-time filtering module.
According to still another aspect of the present application, there is also provided an audio processing apparatus including:
an adaptive filtering module configured to filter the far-end audio signal;
an acquisition module configured to input the far-end audio signal into the adaptive filtering module and obtain a first filter coefficient after the adaptive filtering module converges;
a parameter extraction module configured to use the first filter coefficient as the filter coefficient of a reference filtering module, wherein a dual-filtering processing module comprises the reference filtering module and a real-time filtering module;
the dual-filtering processing module, configured to perform dual-filtering processing on the near-end audio signal;
a mixing module configured to mix the dual-filtered near-end audio signal with the far-end audio signal to obtain a mixed audio signal;
and a control module configured to control a near-end sound amplifying device to play the mixed audio signal, wherein the near-end sound amplifying device and the near-end acquisition device that captures the near-end audio signal are located in the same space.
In some embodiments of the application, further comprising:
The decorrelation module is configured to perform decorrelation processing on the near-end audio signal subjected to the double-filtering processing so as to mix the near-end audio signal subjected to the decorrelation processing and the far-end audio signal; and/or
And the automatic equalization module is configured to perform automatic equalization processing on set frequency points in the near-end audio signal subjected to the double-filtering processing so as to mix the near-end audio signal subjected to the automatic equalization processing with the far-end audio signal.
According to still another aspect of the present application, there is also provided an audio processing system including:
a near-end acquisition device configured to capture a near-end audio signal;
a near-end sound amplifying device located in the same space as the near-end acquisition device;
and the audio processing apparatus described above, serving as a near-end audio processing apparatus.
According to still another aspect of the present application, there is also provided an electronic apparatus including: a processor; a storage medium having stored thereon a computer program which, when executed by the processor, performs the steps as described above.
According to a further aspect of the present application there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps as described above.
Therefore, compared with the prior art, the scheme provided by the application has the following advantages:
According to the application, the far-end audio signal is input into the adaptive filtering module, and the first filtering coefficient converged by the adaptive filtering module is obtained and used as the filtering coefficient of the reference filtering module of the double-filtering processing module, so that the influence of the far-end audio signal played by near-end sound amplifying equipment on the collected near-end audio signal can be reduced, and the echo suppression effect is improved.
Drawings
The above and other features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
Fig. 1 shows a flow chart of an audio processing method according to an embodiment of the application.
Fig. 2 shows a flow chart of forming a far-end audio signal of an opposite end according to an embodiment of the application.
Fig. 3 shows a block diagram of an audio processing system according to an embodiment of the application.
Fig. 4 shows a schematic diagram of an audio processing system and an audio processing device according to an embodiment of the application.
Fig. 5 shows a block diagram of an audio processing system and an audio processing device according to an embodiment of the application.
Fig. 6 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the present disclosure.
Fig. 7 schematically illustrates a schematic diagram of an electronic device in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only and not necessarily all steps are included. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
In order to overcome the defects in the prior art, the application provides an audio processing method and an audio processing device, which can mitigate echo and howling in scenarios where a near-end acquisition device, a far-end acquisition device, a near-end sound amplifying device and a far-end sound amplifying device are used simultaneously. The near-end acquisition device and the near-end sound amplifying device are located in the same space, and the far-end acquisition device and the far-end sound amplifying device are located in the same space. The near-end audio processing system interacts with the far-end audio processing system. The near-end sound amplifying device can play both the audio signals collected by the near-end acquisition device and the audio signals collected by the far-end acquisition device. Meanwhile, the audio signal played by the near-end sound amplifying device may itself be picked up by the near-end acquisition device. Similarly, when the far-end audio processing system is regarded as the near-end audio processing system, the far-end acquisition device and the far-end sound amplifying device collect and play audio signals in the same way.
Referring first to fig. 1, fig. 1 shows a flowchart of an audio processing method according to an embodiment of the present application. Fig. 1 shows the following steps in total:
Step S110: inputting the far-end audio signal into an adaptive filtering module, and obtaining a first filter coefficient after the adaptive filtering module converges.
Specifically, the adaptive filtering module may employ a normalized least mean squares algorithm (NLMS) for filtering. The invention is not limited thereto, and other filtering algorithms of the adaptive filtering module are also within the scope of the invention. The filter coefficient of the adaptive filter module changes dynamically.
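The NLMS adaptation mentioned above can be illustrated with a minimal time-domain sketch. This is not the patent's implementation: the function name `nlms_identify`, the tap count, the step size `mu` and the toy echo path `room` are all assumptions chosen for illustration. The sketch adapts an FIR filter so that the filtered far-end signal approximates the echo picked up by the microphone, and returns both the converged coefficients and the echo-removed error signal.

```python
import numpy as np

def nlms_identify(far_end, mic, num_taps=64, mu=0.5, eps=1e-8):
    """Adapt FIR coefficients w so that (w applied to far_end) approximates mic.

    Returns (w, error), where error[n] = mic[n] - estimated echo at sample n.
    """
    w = np.zeros(num_taps)
    buf = np.zeros(num_taps)            # recent far-end samples, newest first
    error = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        echo_estimate = w @ buf
        e = mic[n] - echo_estimate
        # Normalized LMS update: step size scaled by the input energy.
        w += mu * e * buf / (buf @ buf + eps)
        error[n] = e
    return w, error

# Toy demonstration with a hypothetical 4-tap echo path and no near-end speech.
rng = np.random.default_rng(0)
far_end = rng.standard_normal(4000)
room = np.array([0.6, 0.3, -0.2, 0.1])
mic = np.convolve(far_end, room)[:len(far_end)]
w, err = nlms_identify(far_end, mic, num_taps=8)
```

In this toy setting the leading entries of `w` approach `room`, which plays the role of the converged first filter coefficient that is later handed to the reference filtering module.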
In a preferred embodiment, step S110 may further include a step of performing confidence calculation on the first filter coefficient.
In some embodiments of the present application, the confidence calculation may be performed as follows: obtaining a plurality of adaptive filter curves from a plurality of historical first filter coefficients and the gains, at each frequency point, of the historical audio signals output by the adaptive filtering module under the corresponding historical first filter coefficients; clustering the plurality of adaptive filter curves to obtain a reference first filter coefficient; and calculating the confidence of the first filter coefficient according to the reference first filter coefficient. The horizontal axis of an adaptive filter curve is the frequency point, and the vertical axis is the gain at that frequency point. Because different frequency points correspond to different filter coefficients, the first filter coefficient in this application is in fact a set of filter coefficients, one per frequency point. For clarity, the implementation is described below in terms of filter coefficient sets. For example, a curve (itself a set of filter coefficients) reflecting the trend of the transfer function is fitted through several filter coefficient sets (the historical first filter coefficients), and the variance between a newly input filter coefficient set and the fitted curve is computed. If the variance is greater than a threshold, the newly input set deviates too much and is discarded. If the variance is less than the threshold, the newly input set is added, the set with the earliest addition time is discarded, and the curve is re-fitted using the updated collection of sets.
Specifically, the confidence may be inversely related to the variance: the larger the variance, the lower the confidence, and the smaller the variance, the higher the confidence. The next step is performed when the confidence is greater than a set threshold.
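The variance-based check described above can be sketched as follows. Several details here are assumptions rather than the patent's method: the "fitted curve" is taken to be the per-frequency-point mean of the stored sets, the confidence is mapped as 1 / (1 + variance), the first submitted set is accepted unconditionally to bootstrap the history, and the class name `CoefficientHistory` and its parameters are hypothetical.

```python
import numpy as np
from collections import deque

class CoefficientHistory:
    """Sliding window of accepted filter coefficient sets (one gain per frequency point)."""

    def __init__(self, max_sets=5, var_threshold=0.05):
        self.sets = deque(maxlen=max_sets)   # oldest set is dropped automatically
        self.var_threshold = var_threshold

    def fitted_curve(self):
        # Assumption: the fit is the per-frequency-point mean of the stored sets.
        return np.mean(np.stack(self.sets), axis=0)

    def submit(self, gains):
        """Return (confidence, accepted) for a newly input coefficient set."""
        gains = np.asarray(gains, dtype=float)
        if not self.sets:                    # bootstrap: nothing to compare against yet
            self.sets.append(gains)
            return 1.0, True
        variance = float(np.mean((gains - self.fitted_curve()) ** 2))
        confidence = 1.0 / (1.0 + variance)  # inversely related to the variance
        accepted = variance <= self.var_threshold
        if accepted:
            self.sets.append(gains)          # re-fit happens lazily in fitted_curve()
        return confidence, accepted

hist = CoefficientHistory(max_sets=3, var_threshold=0.05)
base = np.array([1.0, 0.8, 0.5, 0.2])
hist.submit(base)
conf_good, ok_good = hist.submit(base + 0.01)   # close to the history: kept
conf_bad, ok_bad = hist.submit(base + 1.0)      # deviates too much: discarded
```

A caller would hand the new set to the reference filtering module only when `accepted` is true, or when `confidence` exceeds its own threshold.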
In some embodiments of the present application, the confidence level may be calculated by a pre-trained confidence calculation model. The confidence computation model may be any machine learning model, such as a neural network model, a random forest model, etc., and the application is not limited in this regard. The confidence calculation model may learn based on the sample data and the set confidence.
In other embodiments of the present application, when the near-end acquisition device does not capture an audio signal (or when the intensity of the audio signal captured at the near end is 0), only the far-end audio signal is input to the adaptive filtering module, and the filter coefficient obtained after convergence is taken as the reference first filter coefficient. The confidence of the first filter coefficient may then be a measure of the similarity between the first filter coefficient and the reference first filter coefficient, indicating whether the first filter coefficient truly reflects the near-end environment transfer function.
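The text does not specify which similarity measure is used. As one possible illustration, the sketch below uses cosine similarity between the candidate coefficient vector and the reference vector, clipped to the range 0 to 1, and gates the update on a threshold. The function names and the threshold value of 0.9 are assumptions.

```python
import numpy as np

def similarity_confidence(coeffs, reference, eps=1e-12):
    """Cosine similarity between two coefficient vectors, clipped to [0, 1]."""
    coeffs = np.asarray(coeffs, dtype=float)
    reference = np.asarray(reference, dtype=float)
    cos = coeffs @ reference / (np.linalg.norm(coeffs) * np.linalg.norm(reference) + eps)
    return float(np.clip(cos, 0.0, 1.0))

def select_reference_coeffs(current, candidate, reference, threshold=0.9):
    """Adopt the candidate only when its confidence exceeds the threshold."""
    if similarity_confidence(candidate, reference) > threshold:
        return np.asarray(candidate, dtype=float)
    return np.asarray(current, dtype=float)

reference = np.array([0.6, 0.3, -0.2, 0.1])   # hypothetical far-end-only estimate
close = reference + 0.01
far = np.array([-0.6, 0.3, 0.2, -0.1])
conf_close = similarity_confidence(close, reference)
conf_far = similarity_confidence(far, reference)
chosen = select_reference_coeffs(reference, far, reference)
```

Here `close` is adopted and `far` is rejected, so `chosen` stays equal to the coefficients already in use.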
In further embodiments of the present application, a functional relationship between the first filter coefficient and the filtering effect may be generated or fitted from the first filter coefficients and filtering effects observed over a preset period of time in the current near-end and far-end environments. The better the filtering effect, the higher the confidence of the first filter coefficient. The confidence of a newly obtained first filter coefficient can then be calculated in real time from this generated or fitted relationship and used to judge that coefficient.
Performing a confidence calculation on the first filter coefficient thus helps ensure that the first filter coefficient truly reflects the environment transfer function, which further improves the echo suppression effect.
Step S130: using the first filter coefficient as the filter coefficient of a reference filtering module.
In the foregoing preferred embodiment with confidence calculation, a confidence greater than the preset threshold indicates that the first filter coefficient represents the near-end environment transfer function, so the near-end audio signal (which may include the far-end audio signal played by the near-end sound amplifying device) can be filtered using the first filter coefficient.
Step S140: performing dual-filtering processing on the near-end audio signal through the dual-filtering processing module.
Specifically, the dual filtering processing module comprises a reference filtering module and a real-time filtering module. The reference filtering module and the real-time filtering module can adopt a normalized least mean square algorithm for filtering. The invention is not limited thereto, and other filtering algorithms of the reference filtering module and the real-time filtering module are also within the scope of the invention. The filter coefficients of the reference filter module are set by the adaptive filter module. The filter coefficient of the real-time filter module changes dynamically.
Further, the normalized least mean square algorithm processes the input audio signal by using a reference signal, thereby realizing filtering operations such as echo cancellation. In this embodiment, the filter coefficient of the reference filtering process is obtained by convergence and confidence calculation of the adaptive filtering module, so that the near-end and far-end environment transfer functions can be estimated, and meanwhile, the output signal of the reference filtering module can be used as the reference signal of the real-time filtering module, so that the real-time filtering module can perform optimized echo cancellation and filtering operations with reference to the near-end and far-end environment transfer functions.
Specifically, the reference filtering module and the real-time filtering module each process the near-end audio signal, and the near-end audio signal output by the dual-filtering processing is obtained by weighting the near-end audio signal output by the reference filtering module and the near-end audio signal output by the real-time filtering module. The weights of the reference filtering module and the real-time filtering module can be set as required, or calculated from the filtering effects of the two modules. For example, the weights may be derived from one or more of the following: the magnitude of the residual echo energy after filtering by the reference filtering module and by the real-time filtering module; the product of the differences of the residual echo energy after filtering by the two modules; and the product of the residual echo energy of the reference filtering module and the estimated echo difference energy. The estimated echo difference energy is calculated from the difference between the echoes estimated by the reference filtering module and the real-time filtering module. The application is not limited in this regard.
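As a simplified illustration of the weighting step, the sketch below combines the two branch outputs with weights inversely related to each branch's residual energy. This uses only the first of the criteria listed above, and it assumes the output energy of each branch is a usable proxy for residual echo energy (which would hold only when near-end speech is absent). The function name `combine_outputs` is hypothetical.

```python
import numpy as np

def combine_outputs(ref_out, rt_out, eps=1e-12):
    """Weighted sum of the reference-branch and real-time-branch outputs.

    The branch with less residual energy receives the larger weight.
    Returns (combined_signal, (w_ref, w_rt)) with w_ref + w_rt == 1.
    """
    ref_out = np.asarray(ref_out, dtype=float)
    rt_out = np.asarray(rt_out, dtype=float)
    e_ref = np.mean(ref_out ** 2)
    e_rt = np.mean(rt_out ** 2)
    w_ref = e_rt / (e_ref + e_rt + eps)
    w_rt = 1.0 - w_ref
    return w_ref * ref_out + w_rt * rt_out, (w_ref, w_rt)

# Hypothetical residuals: the reference branch cancelled the echo better.
ref_residual = 0.1 * np.ones(100)
rt_residual = np.ones(100)
combined, (w_ref, w_rt) = combine_outputs(ref_residual, rt_residual)
```

Fixed weights set by configuration, which the text also allows, would simply replace the energy-based computation.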
Step S150: mixing the dual-filtered near-end audio signal with the far-end audio signal to obtain a mixed audio signal.
Step S160: the mixed signal is played by a near-end amplifying device.
Therefore, the generation of echo and howling can be relieved while the audio signals collected at the near end and the audio signals collected at the far end are played together.
According to the audio processing method provided by the application, the far-end audio signal is input into the adaptive filtering module, and the first filtering coefficient converged by the adaptive filtering module is obtained and used as the filtering coefficient of the reference filtering module of the double-filtering processing module, so that the influence of the far-end audio signal played by near-end sound amplifying equipment on the collected near-end audio signal can be reduced, and the echo suppression effect is improved.
In some embodiments of the present application, after the dual-filtering processing module performs the dual-filtering processing on the near-end audio signal in step S140 of fig. 1, and before the dual-filtered near-end audio signal and the far-end audio signal are mixed in step S150 to obtain the mixed signal, the following step may further be performed: performing decorrelation processing on the dual-filtered near-end audio signal, so that the decorrelated near-end audio signal is mixed with the far-end audio signal. Specifically, the decorrelation processing can be realized by frequency shifting and/or phase modulation, which reduces the correlation between the audio signal to be played by the sound amplifying device and the audio signal captured by the acquisition device. This both assists the dual-filtering processing and helps avoid howling.
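Frequency shifting, one of the two decorrelation options named above, can be sketched with a single-sideband shift built on the analytic signal. The shift amount (5 Hz), the sample rate and the function names are assumptions for illustration; the source does not give these values, and this block-based FFT approach is only one way to realize a frequency shifter.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (zero the negative frequencies, double the positive)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

def frequency_shift(x, shift_hz, sample_rate):
    """Shift every spectral component of x upward by shift_hz."""
    t = np.arange(len(x)) / sample_rate
    return np.real(analytic_signal(x) * np.exp(2j * np.pi * shift_hz * t))

# Demonstration: a 1000 Hz tone becomes a 1005 Hz tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate          # one second of samples
tone = np.cos(2 * np.pi * 1000 * t)
shifted = frequency_shift(tone, shift_hz=5.0, sample_rate=sample_rate)
peak_hz = int(np.argmax(np.abs(np.fft.rfft(shifted))))   # 1 Hz per bin here
```

Because each pass around the loudspeaker-microphone loop moves the signal to a slightly different frequency, energy cannot keep building up at a single resonant frequency.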
In some embodiments of the present application, after the dual-filtering processing module performs the dual-filtering processing on the near-end audio signal in step S140 of fig. 1, and before the dual-filtered near-end audio signal and the far-end audio signal are mixed in step S150 to obtain the mixed signal, the following step may further be performed: performing automatic equalization processing on set frequency points in the dual-filtered near-end audio signal, so that the automatically equalized near-end audio signal is mixed with the far-end audio signal. Specifically, a room with poor acoustics is prone to sound coloration: the acoustic resonances of the room boost certain frequencies in the sound, which can later lead to howling, so automatic equalization can be applied at the unstable frequency points of the room's sound field. Since the filter coefficients correspond to the individual frequency points of the frequency domain, the frequency points with relatively large coefficients can be taken as approximate estimates of the possibly unstable points and used as the set frequency points. Filtering is then performed with preset automatic equalization parameters (for example, a 1/3-octave band whose center frequency corresponds to the possibly unstable point, with a gain of -6 to -3 dB; the application is not limited in this regard). This adaptively reduces the response sensitivity at those frequency points, so that the response sensitivity at the set frequency points in the near-end audio signal after automatic equalization is lower than before automatic equalization, which suppresses howling in advance and improves the sound transmission gain.
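The automatic equalization step can be sketched as follows. The detection rule (a coefficient magnitude above twice the median) and the brick-wall band cut applied in the FFT domain are simplifying assumptions; a practical equalizer would more likely use a smooth peaking filter. The 1/3-octave bandwidth and the -6 dB gain follow the example values in the text, while all function names are hypothetical.

```python
import numpy as np

def find_unstable_bins(coeff_mag, ratio=2.0):
    """Indices whose coefficient magnitude exceeds ratio * median (assumed heuristic)."""
    coeff_mag = np.asarray(coeff_mag, dtype=float)
    return np.flatnonzero(coeff_mag > ratio * np.median(coeff_mag))

def third_octave_cut(freqs, centers_hz, gain_db=-6.0):
    """Per-frequency linear gains: gain_db inside each 1/3-octave band, unity elsewhere."""
    gains = np.ones(len(freqs))
    g = 10.0 ** (gain_db / 20.0)
    for fc in centers_hz:
        lo, hi = fc * 2.0 ** (-1.0 / 6.0), fc * 2.0 ** (1.0 / 6.0)
        gains[(freqs >= lo) & (freqs <= hi)] = g
    return gains

def auto_equalize(x, sample_rate, coeff_mag, gain_db=-6.0):
    """Attenuate the bands around frequency points with large filter coefficients."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    coeff_freqs = np.linspace(0.0, sample_rate / 2.0, len(coeff_mag))
    centers = coeff_freqs[find_unstable_bins(coeff_mag)]
    gains = third_octave_cut(freqs, centers, gain_db)
    return np.fft.irfft(np.fft.rfft(x) * gains, n=len(x))

# Demonstration: the coefficient at 1000 Hz is unusually large, so that band is cut.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
x = np.cos(2 * np.pi * 1000 * t) + np.cos(2 * np.pi * 3000 * t)
coeff_mag = np.ones(81)          # one coefficient every 50 Hz from 0 to 4000 Hz
coeff_mag[20] = 5.0              # index 20 corresponds to 1000 Hz
y = auto_equalize(x, sample_rate, coeff_mag, gain_db=-6.0)
ratio_1k = np.abs(np.fft.rfft(y)[1000]) / np.abs(np.fft.rfft(x)[1000])
ratio_3k = np.abs(np.fft.rfft(y)[3000]) / np.abs(np.fft.rfft(x)[3000])
```

In the demonstration, the 1000 Hz component is attenuated by 6 dB while the 3000 Hz component passes unchanged.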
Further, the step of automatic equalization processing may be performed before the step of decorrelation processing.
In some embodiments of the present application, after the far-end audio signal is input into the adaptive filtering module in step S110 of fig. 1, the steps shown in fig. 2 may further be performed:
Step S170: estimating an echo signal of the far-end audio signal according to the output signal of the adaptive filtering module.
Step S180: removing the echo signal of the far-end audio signal from the near-end audio signal, and sending the echo-removed near-end audio signal to the opposite end as the far-end audio signal of the opposite end.
In this way, the echo in the near-end audio signal acquired by the near-end acquisition device can be handled at the near end, and the echo-removed audio signal can be used as a reference signal of the adaptive filter. This helps the filter coefficients of the adaptive filter converge, improves the accuracy of the echo estimate, and helps the adaptive filter provide a first filter coefficient with higher confidence to the dual filtering processing module.
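Steps S170 and S180 can be illustrated with a minimal sketch. The application does not name the adaptation algorithm in this passage; the sketch assumes a normalized least-mean-squares (NLMS) filter, and the function name `nlms_echo_cancel`, the tap count, the step size `mu` and the regularization constant `eps` are hypothetical choices made for the example.

```python
def nlms_echo_cancel(far_end, near_end, taps=4, mu=0.5, eps=1e-8):
    """Estimate the echo of far_end inside near_end and remove it.

    Returns the echo-removed signal (the signal that would be sent to the
    opposite end) and the final filter coefficients.
    NLMS is an assumption of this sketch, not a statement of the disclosed method.
    """
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent far-end samples, newest first
    cleaned = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        # Step S170: echo estimate from the adaptive filter output.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        # Step S180: remove the estimated echo from the near-end signal.
        error = d - echo_estimate
        # The echo-removed signal drives the coefficient update.
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights
```

When the near-end signal contains only the echo of the far-end signal through a short echo path, the coefficients converge toward that path and the returned signal decays toward zero. With local speech present, the returned signal approximates the local speech.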
The foregoing merely illustrates various embodiments of the present application, and the present application is not limited thereto. In each embodiment, adding steps, omitting steps, and changing the order of steps all fall within the scope of the present application; the embodiments may be implemented alone or in combination.
Fig. 3 shows a block diagram of an audio processing system according to an embodiment of the application. Specifically, fig. 3 shows a near-end audio processing system and a far-end audio processing system, where the near end and the far end are defined relative to each other. As shown in fig. 3, the near-end audio processing system includes a near-end acquisition device 210, a near-end sound amplifying device 230, and a near-end audio processing apparatus 220. The near-end acquisition device 210 and the near-end sound amplifying device 230 are located in the same space, so that the audio signal played by the near-end sound amplifying device 230 reaches the near-end acquisition device 210 through the near-end environment transfer function. The far-end audio processing system includes a far-end acquisition device 240, a far-end sound amplifying device 260, and a far-end audio processing apparatus 250. The far-end acquisition device 240 and the far-end sound amplifying device 260 are located in the same space, so that the audio signal played by the far-end sound amplifying device 260 reaches the far-end acquisition device 240 through the far-end environment transfer function. The near-end audio processing apparatus 220 and the far-end audio processing apparatus 250 communicate with each other in a wireless or wired manner, so that the near-end sound amplifying device 230 can play the audio signal collected by the far-end acquisition device 240 and, similarly, the far-end sound amplifying device 260 can play the audio signal collected by the near-end acquisition device 210. Further, the sound amplifying device may be, for example, a loudspeaker, and the acquisition device may be, for example, a microphone; the application is not limited in this regard.
Referring now to fig. 4, fig. 4 is a schematic diagram illustrating an audio processing system and an audio processing apparatus according to an embodiment of the present application. The audio processing system and the audio processing apparatus shown in fig. 4 are applied to the scene shown in fig. 3. The audio processing system comprises a near-end acquisition device 210 for near-end audio signals, a near-end sound amplifying device 230, and a near-end audio processing apparatus 220. The far-end audio processing apparatus 250 in fig. 3 may have the same structure as the near-end audio processing apparatus 220.
The near-end audio processing apparatus 220 includes an adaptive filtering module 221, an acquisition module 222, a parameter extraction module 224, a dual filtering processing module 225, a mixing module 226, and a control module 227.
The adaptive filtering module 221 is configured to filter the far-end audio signal.
The acquisition module 222 is configured to input the far-end audio signal into the adaptive filtering module, and acquire the first filter coefficient after convergence of the adaptive filtering module.
The parameter extraction module 224 is configured to take the first filter coefficient as a filter coefficient of a reference filter module.
The dual filter processing module 225 is configured to perform dual filter processing on the near-end audio signal. The dual-filtering processing module comprises a reference filtering module and a real-time filtering module. The reference filtering module and the real-time filtering module may have the same filters as the adaptive filtering module 221.
The mixing module 226 is configured to mix the near-end audio signal after the dual filtering processing with the far-end audio signal, so as to obtain a mixed audio signal.
The control module 227 is configured to control the near-end sound amplifying device to play the mixed audio signal.
Therefore, the far-end audio signal is input into the adaptive filtering module, and the first filter coefficient converged by the adaptive filtering module is acquired and used as the filter coefficient of the reference filtering module of the dual filtering processing module. This reduces the influence that the far-end audio signal played by the near-end sound amplifying device has on the collected near-end audio signal, and improves the echo suppression effect.
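As a simple illustration of the mixing performed by the mixing module 226, the following sketch sums the dual-filtered near-end signal and the far-end signal sample by sample. The gain parameters and the hard limiter are assumptions of the example; the application does not specify mixing weights or clipping behaviour in this passage.

```python
def mix(near_end, far_end, near_gain=0.5, far_gain=0.5, limit=1.0):
    """Mix two equal-length signals into one playback signal.

    near_end is assumed to be the near-end signal after dual filtering;
    far_end is the signal received from the opposite end.
    The gains and the clipping limit are illustrative assumptions.
    """
    if len(near_end) != len(far_end):
        raise ValueError("signals must have the same length")
    mixed = []
    for near_sample, far_sample in zip(near_end, far_end):
        value = near_gain * near_sample + far_gain * far_sample
        # Clip to the playback range so the sum cannot overflow the output.
        mixed.append(max(-limit, min(limit, value)))
    return mixed
```

The resulting list corresponds to the mixed audio signal that the control module 227 would hand to the near-end sound amplifying device.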
Referring now to fig. 5, fig. 5 is a block diagram illustrating an audio processing system and an audio processing apparatus according to an embodiment of the present application. The audio processing system and the audio processing apparatus shown in fig. 5 are applied to the scene shown in fig. 3. The audio processing system comprises a near-end acquisition device 210 for near-end audio signals, a near-end sound amplifying device 230, and a near-end audio processing apparatus 220. The far-end audio processing apparatus 250 in fig. 3 may have the same structure as the near-end audio processing apparatus 220.
The near-end audio processing apparatus 220 includes an adaptive filtering module 221, an acquisition module 222, a confidence calculation module 223, a parameter extraction module 224, a dual filtering processing module 225, a mixing module 226, a control module 227, an echo removing module 228, an automatic equalization module 229, and a decorrelation module 2210.
The adaptive filtering module 221, the acquisition module 222, the parameter extraction module 224, the dual filtering processing module 225, the mixing module 226, and the control module 227 have been described with reference to fig. 4 and are not described again here.
The confidence calculation module 223 is configured to calculate the confidence of the first filter coefficient. Accordingly, the parameter extraction module 224 is configured to use the first filter coefficient as the filter coefficient of the reference filtering module of the dual filtering processing module when the confidence of the first filter coefficient is greater than a preset threshold.
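The interplay between the confidence calculation module 223 and the parameter extraction module 224 can be sketched as follows. According to claim 1, a reference first filter coefficient is fitted from historical adaptive filtering curves and the confidence is calculated against it; the fitting method and the confidence measure are not specified here, so the per-bin mean and the cosine similarity used below, as well as the function names and the threshold value, are assumptions of this sketch.

```python
import math


def fit_reference(history):
    """Fit historical coefficient curves with a per-bin mean.

    The per-bin mean (a least-squares constant fit) is an assumed fitting rule.
    """
    count = len(history)
    return [sum(curve[k] for curve in history) / count
            for k in range(len(history[0]))]


def confidence(coeffs, reference):
    """Cosine similarity between current and reference coefficients (assumed measure)."""
    dot = sum(c * r for c, r in zip(coeffs, reference))
    norm = math.sqrt(sum(c * c for c in coeffs) * sum(r * r for r in reference))
    return dot / norm if norm else 0.0


def update_reference_filter(current_coeffs, new_coeffs, history, threshold=0.9):
    """Adopt new_coeffs for the reference filter only if their confidence exceeds threshold."""
    if confidence(new_coeffs, fit_reference(history)) > threshold:
        return list(new_coeffs)
    return list(current_coeffs)
```

A newly converged coefficient set that closely follows the historical curves is adopted by the reference filtering module, while a set that deviates strongly (for example, one obtained during a disturbance) leaves the existing reference coefficients in place.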
The echo removing module 228 is configured to estimate an echo signal of the far-end audio signal according to the output signal of the adaptive filtering module, remove the echo signal of the far-end audio signal from the near-end audio signal, and send the echo-removed near-end audio signal to the opposite end as the far-end audio signal of the opposite end.
The automatic equalization module 229 is configured to perform automatic equalization processing on the set frequency points in the near-end audio signal after the dual filtering processing, so that the near-end audio signal after the automatic equalization processing is mixed with the far-end audio signal.
The decorrelation module 2210 is configured to perform decorrelation processing on the near-end audio signal after the dual filtering processing, so as to mix the near-end audio signal after the decorrelation processing with the far-end audio signal.
The audio processing apparatus and system of the present application may be implemented in software, hardware, firmware, or any combination thereof. Figs. 3 to 5 are only schematic illustrations of the audio processing apparatus and system provided by the present application; splitting, combining, and adding modules all fall within the scope of the present application without departing from its inventive concept.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium, on which a computer program is stored, which program, when being executed by, for example, a processor, can implement the steps of the audio processing method described in any one of the above embodiments. In some possible embodiments, the aspects of the application may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the application as described in the above-mentioned audio processing method section of this specification, when said program product is run on the terminal device.
Referring to fig. 6, a program product 800 for implementing the above-described method according to an embodiment of the present application is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the tenant computing device, partially on the tenant device, as a stand-alone software package, partially on the tenant computing device, partially on a remote computing device, or entirely on a remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the tenant computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected through the internet using an internet service provider).
In an exemplary embodiment of the present disclosure, an electronic device is also provided, which may include a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to perform the steps of the audio processing method of any of the embodiments described above via execution of the executable instructions.
Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the application may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."
An electronic device 600 according to this embodiment of the application is described below with reference to fig. 7. The electronic device 600 shown in fig. 7 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in fig. 7, the electronic device 600 is in the form of a general purpose computing device. Components of electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610), a display unit 640, etc.
The storage unit 620 stores program code executable by the processing unit 610, such that the processing unit 610 performs the steps according to various exemplary embodiments of the present application described in the above-mentioned audio processing method section of the present specification. For example, the processing unit 610 may perform the steps as shown in fig. 1 or 2.
The storage unit 620 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a tenant to interact with the electronic device 600, and/or any device (e.g., router, modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650. Also, electronic device 600 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 over the bus 630. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 600, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a network device, etc.) to perform the above-mentioned audio processing method according to the embodiments of the present disclosure.
According to the application, the far-end audio signal is input into the adaptive filtering module, and the first filter coefficient converged by the adaptive filtering module is acquired and used as the filter coefficient of the reference filtering module of the dual filtering processing module. This reduces the influence that the far-end audio signal played by the near-end sound amplifying device has on the acquired near-end audio signal, and improves the echo suppression effect.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow, in general, the principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
Claims (8)
1. An audio processing method, comprising:
inputting a far-end audio signal into an adaptive filtering module, and acquiring a first filter coefficient converged by the adaptive filtering module;
obtaining a plurality of adaptive filtering curves according to a plurality of historical first filter coefficients and the gains, at each frequency point, of historical audio signals, wherein the historical audio signals are output by the adaptive filtering module corresponding to the historical first filter coefficients;
fitting the plurality of adaptive filtering curves to obtain a reference first filter coefficient;
calculating the confidence of the first filter coefficient according to the reference first filter coefficient;
when the confidence of the first filter coefficient is greater than a preset threshold, taking the first filter coefficient as a filter coefficient of a reference filtering module, wherein a dual filtering processing module comprises the reference filtering module and a real-time filtering module;
performing dual filtering processing on a near-end audio signal through the dual filtering processing module, with the output signal of the reference filtering module taken as the reference signal of the real-time filtering module;
mixing the near-end audio signal after the dual filtering processing with the far-end audio signal to obtain a mixed audio signal;
playing the mixed audio signal through a near-end sound amplifying device, wherein the near-end sound amplifying device and a near-end acquisition device of the near-end audio signal are located in the same space.
2. The audio processing method according to claim 1, wherein after the dual filtering processing is performed on the near-end audio signal by the dual filtering processing module, and before the near-end audio signal after the dual filtering processing is mixed with the far-end audio signal to obtain the mixed audio signal, the method further comprises:
performing decorrelation processing on the near-end audio signal after the dual filtering processing, so that the near-end audio signal after the decorrelation processing is mixed with the far-end audio signal; and/or
performing automatic equalization processing on set frequency points in the near-end audio signal after the dual filtering processing, so that the near-end audio signal after the automatic equalization processing is mixed with the far-end audio signal.
3. The audio processing method according to claim 2, wherein performing automatic equalization processing on the set frequency points in the near-end audio signal after the dual filtering processing comprises:
making the response sensitivity of the set frequency points in the near-end audio signal after the automatic equalization processing smaller than the response sensitivity of the set frequency points in the near-end audio signal before the automatic equalization processing.
4. The audio processing method of claim 1, wherein after inputting the far-end audio signal into the adaptive filtering module, the method further comprises:
estimating an echo signal of the far-end audio signal according to the output signal of the adaptive filtering module;
removing the echo signal of the far-end audio signal from the near-end audio signal, and sending the echo-removed near-end audio signal to an opposite end as a far-end audio signal of the opposite end.
5. The audio processing method according to claim 1, wherein the reference filtering module and the real-time filtering module each process the near-end audio signal, and the near-end audio signal output by the dual filtering processing is obtained by weighting the near-end audio signal output by the reference filtering module and the near-end audio signal output by the real-time filtering module.
6. An audio processing apparatus, comprising:
an adaptive filtering module configured to filter a far-end audio signal;
an acquisition module configured to input the far-end audio signal into the adaptive filtering module, acquire a first filter coefficient converged by the adaptive filtering module, obtain a plurality of adaptive filtering curves according to a plurality of historical first filter coefficients and the gains, at each frequency point, of historical audio signals, the historical audio signals being output by the adaptive filtering module corresponding to the historical first filter coefficients, fit the plurality of adaptive filtering curves to obtain a reference first filter coefficient, and calculate the confidence of the first filter coefficient according to the reference first filter coefficient;
a parameter extraction module configured to take the first filter coefficient as the filter coefficient of a reference filtering module when the confidence of the first filter coefficient is greater than a preset threshold, wherein a dual filtering processing module comprises the reference filtering module and a real-time filtering module;
the dual filtering processing module, configured to perform dual filtering processing on a near-end audio signal, with the output signal of the reference filtering module taken as the reference signal of the real-time filtering module;
a mixing module configured to mix the near-end audio signal after the dual filtering processing with the far-end audio signal to obtain a mixed audio signal;
a control module configured to control a near-end sound amplifying device to play the mixed audio signal, wherein the near-end sound amplifying device and a near-end acquisition device of the near-end audio signal are located in the same space.
7. The audio processing apparatus of claim 6, further comprising:
a decorrelation module configured to perform decorrelation processing on the near-end audio signal after the dual filtering processing, so that the near-end audio signal after the decorrelation processing is mixed with the far-end audio signal; and/or
an automatic equalization module configured to perform automatic equalization processing on set frequency points in the near-end audio signal after the dual filtering processing, so that the near-end audio signal after the automatic equalization processing is mixed with the far-end audio signal.
8. An audio processing system, comprising:
a near-end acquisition device for near-end audio signals;
a near-end sound amplifying device located in the same space as the near-end acquisition device; and
the audio processing apparatus as claimed in claim 6 or 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110741217.8A CN113470677B (en) | 2021-06-30 | 2021-06-30 | Audio processing method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113470677A (en) | 2021-10-01 |
CN113470677B (en) | 2024-06-21 |
Family
ID=77876960
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110741217.8A Active CN113470677B (en) | 2021-06-30 | 2021-06-30 | Audio processing method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113470677B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109451195A (en) * | 2018-09-18 | 2019-03-08 | 北京佳讯飞鸿电气股份有限公司 | A kind of echo cancel method and system of adaptive double-end monitor |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004258533A (en) * | 2003-02-27 | 2004-09-16 | National Institute Of Advanced Industrial & Technology | Multi-channel real-time sound signal processor |
CN1842110B (en) * | 2005-03-28 | 2010-04-28 | 华为技术有限公司 | Echo eliminating device and method |
KR20090010288A (en) * | 2007-07-23 | 2009-01-30 | 삼성전자주식회사 | Echo cancellation method in portable terminal |
KR101090865B1 (en) * | 2009-07-16 | 2011-12-08 | (주)시그젠 | Real-time howling signal eliminating method |
CN102938254B (en) * | 2012-10-24 | 2014-12-10 | 中国科学技术大学 | Voice signal enhancement system and method |
Also Published As
Publication number | Publication date |
---|---|
CN113470677A (en) | 2021-10-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11297178B2 (en) | Method, apparatus, and computer-readable media utilizing residual echo estimate information to derive secondary echo reduction parameters | |
CN111768796B (en) | Acoustic echo cancellation and dereverberation method and device | |
CN113241085B (en) | Echo cancellation method, device, equipment and readable storage medium | |
CN111031448B (en) | Echo cancellation method, echo cancellation device, electronic equipment and storage medium | |
CN107017004A (en) | Noise suppressing method, audio processing chip, processing module and bluetooth equipment | |
CN110992923B (en) | Echo cancellation method, electronic device, and storage device | |
CN108696648B (en) | Method, device, equipment and storage medium for processing short-time voice signal | |
CN115083431B (en) | Echo cancellation method, device, electronic equipment and computer readable medium | |
CN111710344A (en) | A signal processing method, apparatus, device and computer-readable storage medium | |
CN110782914A (en) | Signal processing method and device, terminal equipment and storage medium | |
CN114242100A (en) | Audio signal processing method, training method and device, equipment and storage medium thereof | |
CN109215672B (en) | Method, device and equipment for processing sound information | |
CN114171049A (en) | Echo cancellation method and device, electronic device and storage medium | |
CN112997249B (en) | Voice processing method, device, storage medium and electronic equipment | |
CN112037810A (en) | Echo processing method, device, medium and computing equipment | |
CN113286047B (en) | Voice signal processing method and device and electronic equipment | |
CN110992975A (en) | Voice signal processing method and device and terminal | |
CN113470677B (en) | Audio processing method, device and system | |
CN113824843B (en) | Voice call quality detection method, device, equipment and storage medium | |
US20230403506A1 (en) | Multi-channel echo cancellation method and related apparatus | |
CN118250389A (en) | Echo cancellation method, device, electronic equipment, vehicle-mounted system and storage medium | |
CN115410593B (en) | Audio channel selection method, device, equipment and storage medium | |
CN113763975B (en) | A voice signal processing method, device and terminal | |
CN113113046B (en) | Performance detection method and device for audio processing, storage medium and electronic equipment | |
CN115620737A (en) | Voice signal processing device, method, electronic equipment and sound amplification system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||