CN121397453A - Audio data processing method and related device - Google Patents
- Publication number
- CN121397453A (application CN202511504369.0A)
- Authority
- CN
- China
- Prior art keywords
- filtering
- frequency
- data
- low
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Stereophonic System (AREA)
Abstract
The application discloses an audio data processing method and a related device, relating to the field of audio processing. The method comprises: acquiring PCM data of each channel of a current audio frame of sound stream information based on a spatial audio algorithm; carrying out bidirectional filtering on the PCM data of the current audio frame based on the PCM data of the previous audio frame to obtain a high-frequency signal and a low-frequency signal; deleting the PCM data of the previous audio frame stored in a first area and moving the PCM data of the current audio frame from a second area to the first area; respectively carrying out down-mixing processing on the high-frequency signal and the low-frequency signal to obtain a high-frequency stereo signal and a low-frequency stereo signal; and combining the high-frequency stereo signal and the low-frequency stereo signal to obtain an audio rendering signal of the current audio frame. Based on the frame-level bidirectional filtering operation, the application realizes real-time output of an audio rendering signal with clear and stable layering under low-delay conditions.
Description
Technical Field
The present application relates to the field of audio data processing technologies, and in particular, to an audio data processing method and a related device.
Background
In some scenarios with strong real-time requirements on audio processing (such as music variety programs), live sound collected by a pickup device must be processed in real time, and the audio data generated after processing is output to the internet. Currently, when processing the live sound, the high-frequency and low-frequency components are down-mixed simultaneously through a spatial audio algorithm, so that the high-frequency sound image and the low-frequency sound image overlap. For example, when a high-frequency sound is located at the center of the stage and a low-frequency piano sound is located around the stage, the two overlap each other, so the sound-source placement of the real stage scene cannot be reproduced and the stereo effect is poor.
Disclosure of Invention
In view of the above problems, the present application provides an audio data processing method and related apparatus, so as to achieve the purpose of real-time processing of audio data. The specific scheme is as follows:
the first aspect of the present application provides an audio data processing method, including:
Acquiring PCM data of each channel of a current audio frame of sound stream information based on a spatial audio algorithm;
if the current audio frame is not the first frame, the PCM data of the current audio frame is stored in a second area of the buffer area, wherein the buffer area comprises a first area and a second area, the second area is arranged behind the first area, and the first area stores the PCM data of the previous audio frame;
Bidirectional filtering is carried out on the PCM data of the current audio frame based on the PCM data of the previous audio frame to obtain a high-frequency signal and a low-frequency signal, the PCM data of the previous audio frame stored in the first area is deleted, and the PCM data of the current audio frame is moved from the second area to the first area;
respectively carrying out down-mixing processing on the high-frequency signal and the low-frequency signal to obtain a high-frequency stereo signal and a low-frequency stereo signal;
and combining the high-frequency stereo signal and the low-frequency stereo signal to obtain an audio rendering signal of the current audio frame.
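The per-frame flow of the first aspect can be sketched end to end. This is a minimal numpy illustration, not the claimed implementation: the one-pole crossover filter, the equal-weight downmix, and the [1024, 10] frame shape are illustrative assumptions.

```python
import numpy as np

def one_pole_lowpass(x, alpha=0.1):
    """First-order IIR low-pass applied per channel (illustrative filter)."""
    y = np.zeros_like(x)
    acc = np.zeros(x.shape[1])
    for n in range(x.shape[0]):
        acc = acc + alpha * (x[n] - acc)
        y[n] = acc
    return y

def zero_phase_lowpass(x, alpha=0.1):
    """'Bidirectional' filtering: filter, reverse, filter again, reverse again."""
    y = one_pole_lowpass(x, alpha)
    return one_pole_lowpass(y[::-1], alpha)[::-1]

def render_frame(frame):
    """[1024, C] PCM frame -> [1024, 2] stereo rendering signal."""
    low = zero_phase_lowpass(frame)
    high = frame - low                                   # crossover split
    # hypothetical downmix: average every channel into both stereo channels
    low_st = np.repeat(low.mean(axis=1, keepdims=True), 2, axis=1)
    high_st = np.repeat(high.mean(axis=1, keepdims=True), 2, axis=1)
    return low_st + high_st                              # combine the two bands

frame = np.random.default_rng(0).standard_normal((1024, 10))
out = render_frame(frame)
```

With this deliberately naive downmix, summing the two bands recovers the plain channel average; the claimed method instead applies loudness downmixing to the low band and mid/side plus HRTF downmixing to the high band.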
Optionally, the bidirectional filtering process includes:
Filtering the PCM data in the buffer area to obtain a filtering result and storing forward buffer data; carrying out reverse processing on the filtering result to obtain a reverse filtering result; carrying out secondary filtering processing on the reverse filtering result to obtain a secondary filtering result and storing reverse buffer data; carrying out secondary reverse processing on the secondary filtering result to obtain a secondary reverse filtering result; and carrying out cutting processing on the secondary reverse filtering result to obtain a high-frequency signal and a low-frequency signal of each channel.
Optionally, filtering the PCM data in the buffer to obtain a filtering result, and storing forward buffered data, including:
If the current audio frame is the first frame, performing first filtering processing on the PCM data in the buffer area to obtain a filtering result, and storing forward buffer data;
And if the current audio frame is not the first frame, performing second filtering processing on the PCM data in the buffer area to obtain a filtering result, removing the first two bits of the filtering result, and storing forward buffer data.
Optionally, performing a second filtering process on the PCM data in the buffer to obtain a filtering result, including:
intercepting the last M bits of the PCM data of the previous audio frame to obtain input buffer data, wherein M is the length of the filter parameter;
Filtering PCM data of the previous audio frame to obtain a processing result, and intercepting the last M bits of the processing result to obtain an initial input condition;
inputting the input buffer data, the initial input condition and the PCM data of the current audio frame into a filter to obtain output data of the current audio frame;
and assigning the initial input condition to the first M bits of the output data to obtain a filtering result.
Optionally, bi-directionally filtering the PCM data of the current audio frame based on the PCM data of the previous audio frame to obtain a high frequency signal and a low frequency signal, including:
Acquiring filter parameters of each channel of a current audio frame, wherein the filter parameters comprise low-frequency input filter parameters, low-frequency feedback filter parameters, high-frequency input filter parameters and high-frequency feedback filter parameters;
Based on the PCM data of the previous audio frame, the low-frequency input filtering parameters and the low-frequency feedback filtering parameters, bidirectional filtering is carried out on the PCM data of the current audio frame, and a low-frequency signal of each channel is obtained;
And carrying out bidirectional filtering on the PCM data of the current audio frame based on the PCM data of the previous audio frame, the high-frequency input filtering parameters and the high-frequency feedback filtering parameters to obtain a high-frequency signal of each channel.
Optionally, the cutting processing is performed on the secondary inverse filtering result to obtain a high-frequency signal and a low-frequency signal of each channel, including:
If the current frame is the first frame, when the secondary reverse filtering result of the current frame is obtained, the first nine bits of the secondary reverse filtering result are cut off, and then the first 1024 sampling points of the remainder are intercepted to obtain the high-frequency signal and low-frequency signal of each channel;
If the current frame is not the first frame, when the secondary reverse filtering result of the current frame is obtained, the first 1024 sampling points of the secondary reverse filtering result are intercepted to obtain the high-frequency signal and low-frequency signal of each channel.
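The trimming rule above can be written out directly. The nine-sample start-up cut and the 1024-sample frame length follow the text; treating the result as a flat per-channel sequence is an assumption.

```python
def trim_result(filtered, is_first_frame, frame_len=1024, startup_cut=9):
    """Cut the secondary reverse filtering result down to one frame.

    First frame: drop the first `startup_cut` samples (filter start-up
    transient), then keep the first `frame_len` samples.
    Later frames: keep the first `frame_len` samples directly.
    """
    if is_first_frame:
        filtered = filtered[startup_cut:]
    return filtered[:frame_len]

data = list(range(1500))            # stand-in for one channel's result
first = trim_result(data, True)     # starts at sample 9
later = trim_result(data, False)    # starts at sample 0
```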
Optionally, the process of performing the downmix processing on the high frequency signal and the low frequency signal respectively includes:
Carrying out loudness downmixing processing on the low-frequency signals of each channel to obtain low-frequency stereo signals;
Performing channel middle side processing on the high-frequency signal of each channel to obtain a sound field expansion signal;
and performing HRTF down-mixing processing on the sound field expansion signal to obtain a high-frequency stereo signal.
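A hedged sketch of the two downmix paths: `loudness_downmix_low` stands in for the loudness downmix with illustrative equal weights, and `mid_side_expand` shows the mid/side ("channel middle side") widening. The HRTF convolution step is omitted, and all weights are assumptions rather than the patent's values.

```python
import numpy as np

def loudness_downmix_low(low, weights=None):
    """Loudness-weighted downmix of per-channel low-frequency signals to stereo."""
    n, c = low.shape
    if weights is None:
        weights = np.full(c, 1.0 / c)    # illustrative equal per-channel gains
    mono = low @ weights                 # weighted sum across channels
    return np.stack([mono, mono], axis=1)

def mid_side_expand(left, right, width=1.5):
    """Mid/side processing: widen the sound field by scaling the side signal."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right) * width
    return mid + side, mid - side        # back to left/right

rng = np.random.default_rng(1)
low = rng.standard_normal((1024, 10))
lo_st = loudness_downmix_low(low)

a = rng.standard_normal(8)
b = rng.standard_normal(8)
l, r = mid_side_expand(a, b, width=1.0)  # width 1.0 leaves the signals unchanged
```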
A second aspect of the present application provides an audio data processing apparatus comprising:
the data acquisition module is used for acquiring PCM data of each channel of the current audio frame of the sound stream information based on a spatial audio algorithm;
The data updating module is used for storing the PCM data of the current audio frame into a second area of the buffer area if the current audio frame is not the first frame, wherein the buffer area comprises a first area and a second area, the second area is arranged behind the first area, and the first area stores the PCM data of the previous audio frame;
The filtering module is used for carrying out bidirectional filtering on the PCM data of the current audio frame based on the PCM data of the previous audio frame to obtain a high-frequency signal and a low-frequency signal, deleting the PCM data of the previous audio frame stored in the first area and moving the PCM data of the current audio frame from the second area to the first area;
the down-mixing processing module is used for respectively carrying out down-mixing processing on the high-frequency signal and the low-frequency signal to obtain a high-frequency stereo signal and a low-frequency stereo signal;
And the data merging module is used for merging the high-frequency stereo signal and the low-frequency stereo signal to obtain an audio rendering signal of the current audio frame.
A third aspect of the application provides a computer program product comprising computer readable instructions which, when run on an electronic device, cause the electronic device to implement the audio data processing method of the first aspect or any implementation of the first aspect.
A fourth aspect of the application provides an electronic device comprising at least one processor and a memory coupled to the processor, wherein:
The memory is used for storing a computer program;
the processor is configured to execute a computer program to enable the electronic device to implement the audio data processing method of the first aspect or any implementation manner of the first aspect.
A fifth aspect of the present application provides a computer storage medium carrying one or more computer programs which, when executed by an electronic device, enable the electronic device to implement the audio data processing method of the first aspect or any implementation manner of the first aspect.
By means of the above technical scheme, a buffer area is introduced during frame-by-frame processing, so that frame-boundary distortion is avoided and real-time performance is guaranteed. On this basis, the audio signal of each channel is divided into a high-frequency part and a low-frequency part, and the high-frequency signal and the low-frequency signal are separately down-mixed to generate a low-frequency stereo signal and a high-frequency stereo signal, improving the accuracy of sound-image positioning and the resolution of spatial distribution. Finally, the low-frequency stereo signal and the high-frequency stereo signal of each channel are added correspondingly to obtain the binaural rendering signal. Unlike existing file-oriented binaural rendering methods, whose conventional bidirectional filtering cannot be applied to a real-time live broadcast system, the method reduces computational complexity and maintains the balance of low-frequency energy, and finally synthesizes the high-frequency signal and the low-frequency signal into the binaural rendering signal, so that a sound field with clear and stable layering can be output in real time under low-delay conditions.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of an audio data processing method according to the present application;
FIG. 2 is a flow chart of an algorithm for audio data processing according to the present application;
Fig. 3 is a schematic structural diagram of an audio data processing device according to the present application;
fig. 4 is a schematic block diagram of an electronic device provided by the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. The terminology used in the description of the embodiments of the application herein is for the purpose of describing particular embodiments of the application only and is not intended to be limiting of the application.
Embodiments of the present application are described below with reference to the accompanying drawings. As one of ordinary skill in the art can know, with the development of technology and the appearance of new scenes, the technical scheme provided by the embodiment of the application is also applicable to similar technical problems.
The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely illustrative of the manner in which embodiments of the application have been described in connection with the description of the objects having the same attributes. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The application provides an audio data processing method, as shown in fig. 1, which can comprise the following steps:
s101, PCM data of each channel of a current audio frame of sound stream information is acquired based on a spatial audio algorithm.
Alternatively, in one embodiment, the spatial audio algorithm may be a proprietary spatial audio algorithm, a Dolby algorithm, or a DTS:X algorithm. The current audio frame refers to the frame of the audio stream currently being processed in time sequence; it is an audio signal segment divided with sampling points as the basic unit, generally comprises a plurality of sampling points, and completely reflects the sound data in that time segment. Channels refer to independent sound channels in an audio system, such as the common left and right channels and the front, rear, and bass channels in multi-channel surround sound; different channels carry different audio information for reconstructing a stereo field in space. In this embodiment, the number of channels may be 10: a left channel, a right channel, a center channel, a bass channel, a left surround channel, a right surround channel, a left front top channel, a right front top channel, a left rear top channel, and a right rear top channel, where the last four are referred to as sky channels. PCM data is pulse-code-modulation data, a discrete digital signal obtained by sampling, quantizing and encoding an analog audio signal; it completely reflects the waveform characteristics of the original sound and is the basic data in audio processing.
Optionally, in this embodiment, the system may directly decode the input audio stream information frame by frame to obtain PCM data, divide the PCM data into frames according to a fixed sampling-point length (for example, 1024 sampling points), and map the frames into PCM data corresponding to the channels according to the number of channels, so as to ensure accuracy of the data input. The PCM data of each channel of the current audio frame is obtained according to the following formula:

x_i = X[(i-1)·1024 + 1 : i·1024, :]

where X is all PCM data contained in the sound stream information, x_i is a [1024, 10] matrix representing one audio frame of 1024 sampling points in length, and i is the audio frame counter, which ranges over the number of audio frames contained in the PCM data (as in all equations below).
Specifically, the amplitude normalization processing can be performed on the PCM data in the decoding process, and the amplitudes of different channels are unified to a reasonable range, so that the problem that the subsequent processing effect is poor due to overlarge volume difference between channels is avoided, and the time sequence and sampling precision of the PCM data are ensured to meet the requirements of different playing devices. By accurately analyzing the PCM data into the frame-by-frame and channel-by-channel PCM data, not only is complete and reliable input data provided for subsequent buffering and filtering processing, but also the PCM data is ensured not to be distorted in the analysis process, thereby improving the precision and controllability of the subsequent processing of the PCM data.
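The frame slicing x_i = X[(i-1)·1024 : i·1024, :] can be sketched with numpy; the row-major [samples, channels] layout is an assumption.

```python
import numpy as np

FRAME_LEN = 1024
NUM_CHANNELS = 10

def get_frame(pcm, i):
    """Return x_i, the i-th audio frame (1-based), of shape [1024, 10]."""
    return pcm[(i - 1) * FRAME_LEN : i * FRAME_LEN, :]

# three frames of fake PCM data, shape [3*1024, 10]
pcm = np.arange(3 * FRAME_LEN * NUM_CHANNELS, dtype=float).reshape(-1, NUM_CHANNELS)
x2 = get_frame(pcm, 2)   # the second frame: rows 1024..2047
```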
And S102, if the current audio frame is not the first frame, storing the PCM data of the current audio frame into a second area of a buffer area, wherein the buffer area comprises the first area and the second area, the second area is arranged behind the first area, and the first area stores the PCM data of the previous audio frame.
Optionally, in this embodiment, the buffer is a storage space pre-divided in the system memory, and is used to temporarily store PCM data of each channel, so as to ensure data continuity and integrity in the processing process. The first area is used for storing the PCM data of the previous audio frame, the second area is used for storing the PCM data of the current audio frame, and the data of the second area is arranged behind the first area, so that an ordered double-area structure is formed, continuous input data is provided for subsequent bidirectional filtering, and particularly when the input audio frame is the first frame, the PCM data of the first frame is put into the first area without any other operation, and meanwhile, the PCM data of the second frame is put into the second area.
Specifically, the buffer may be expressed as:

B_i = [x_{i-1}, x_i]

where B_i is the buffer, whose buffered data is used to buffer the input information so as to achieve continuity of data processing.
When i = 1, B_1 is identical to x_1; when i > 1, the previous frame x_{i-1} and the current frame x_i together form the buffer, with x_{i-1} placed before x_i, giving a buffer of length 2048. The buffer area updates data in time order: when new PCM data is written, the PCM data of the previous audio frame is kept in the first area and the PCM data of the current audio frame is written into the second area, realizing sequential storage of two adjacent frames of data. In yet another embodiment, the buffer may be designed as a circular queue; when the buffer is full, newly written PCM data overwrites the earliest frame of data. This is especially suitable for scenes with high real-time requirements, such as variety programs or live broadcasts, so that the system can continue to operate stably without data loss due to buffer overflow. By setting up a buffer area for buffering the input data, the PCM data of the previous audio frame and the current audio frame can be stored continuously and in order, providing the complete input required for the subsequent bidirectional filtering, improving the continuity and stability of data processing, and optimizing the utilization of storage space, so that real-time performance and processing precision can both be achieved.
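The two-region buffer, including the overwrite-oldest behaviour of the circular-queue variant, can be sketched with a bounded deque (a minimal illustration, not the claimed memory layout).

```python
from collections import deque

class FrameBuffer:
    """Two-region buffer: region 1 holds the previous frame, region 2 the
    current frame. For the first frame, only region 1 is filled; when a
    third frame arrives, the oldest frame is discarded automatically."""
    def __init__(self):
        self.regions = deque(maxlen=2)   # bounded: acts as a 2-slot ring

    def push(self, frame):
        self.regions.append(frame)

    def contents(self):
        # concatenated view: previous frame first, current frame after it
        return [s for frame in self.regions for s in frame]

buf = FrameBuffer()
buf.push([1, 2])           # first frame -> region 1
buf.push([3, 4])           # second frame -> region 2
snapshot = buf.contents()  # previous frame followed by current frame
buf.push([5, 6])           # overwrites the oldest frame [1, 2]
```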
And S103, carrying out bidirectional filtering on the PCM data of the current audio frame based on the PCM data of the previous audio frame to obtain a high-frequency signal and a low-frequency signal, deleting the PCM data of the previous audio frame stored in the first area, and moving the PCM data of the current audio frame from the second area to the first area.
Optionally, in this embodiment, bidirectional filtering may refer to a signal processing manner combining forward filtering and backward filtering, where the forward filtering is used for primarily separating a high frequency signal and a low frequency signal in an audio signal, and the backward filtering is used for eliminating phase distortion possibly introduced in the forward filtering process, so as to obtain a filtering result with no phase deviation. The high frequency signal refers to a high frequency part of an audio signal in PCM data extracted during filtering, and includes clarity, brightness, detail representation, and the like of sound. The low-frequency signal is a low-frequency part of an audio signal in PCM data that remains during filtering, and mainly represents a sense of sound heaviness, atmosphere, space surrounding, and the like. Specifically, the bidirectional filtering is performed in the order of forward filtering, backward filtering, secondary filtering and secondary backward filtering, redundant data is removed through cutting, and the finally obtained high-frequency signal and low-frequency signal are ensured to be smoother and more accurate.
In another embodiment, the bidirectional filtering may adopt a segmentation processing strategy to divide the current audio frame into a plurality of subsections, each subsection respectively carries out bidirectional filtering with the previous audio frame, and then the results of the subsections are spliced to form complete high-frequency signals and low-frequency signals. The embodiment can effectively separate the high-frequency signal and the low-frequency signal and avoid phase distortion, thereby realizing clear display of sound levels and providing high-quality input signals for subsequent down-mixing processing and audio rendering.
And S104, respectively carrying out down-mixing processing on the high-frequency signal and the low-frequency signal to obtain a high-frequency stereo signal and a low-frequency stereo signal.
Alternatively, in this embodiment, the downmix processing refers to converting the high-frequency signal and the low-frequency signal into stereo signals according to certain rules, with the aim of reducing the number of channels while preserving, as far as possible, the spatial characteristics and layering of the original sound field. In the downmixing of the high-frequency signal, channel mid/side processing is generally adopted to decompose the high-frequency components of the multiple channels into a middle signal and two side signals, enhancing the sense of sound-field expansion. When the low-frequency signals are down-mixed, a loudness-downmix operation is often adopted to fuse the low-frequency signals of the different channels with weighting, so that after stereo synthesis the low-frequency signal still has sufficient strength and envelopment. In this way the spatial and layered qualities of the sound are preserved or even enhanced, ensuring a good listening effect on different playback devices, which is particularly suitable for the immersive sound requirements of audio programs.
And S105, combining the high-frequency stereo signal and the low-frequency stereo signal to obtain an audio rendering signal of the current audio frame.
Alternatively, in this embodiment, the audio rendering signal refers to the audio data finally output for playback after fusion and spatialization of the multiple signals. The combining process typically involves aligning the amplitude, phase and timing of the high-frequency and low-frequency stereo signals of each channel and then mixing them at a set synthesis ratio to obtain a rendering result containing full-band information. Specifically, the high-frequency stereo signal and the low-frequency stereo signal can be directly weighted and superimposed to obtain a full-bandwidth audio rendering signal that combines clarity and body, so that the output sound retains detail while maintaining the overall spatial atmosphere, improving the audience's sense of immersion in the audio program.
In one embodiment, the process of bi-directional filtering includes:
Filtering the PCM data in the buffer area to obtain a filtering result and storing forward buffer data; carrying out reverse processing on the filtering result to obtain a reverse filtering result; carrying out secondary filtering processing on the reverse filtering result to obtain a secondary filtering result and storing reverse buffer data; carrying out secondary reverse processing on the secondary filtering result to obtain a secondary reverse filtering result; and carrying out cutting processing on the secondary reverse filtering result to obtain a high-frequency signal and a low-frequency signal of each channel.
Specifically, an IIR filter may first be used to filter the PCM data in the buffer area to obtain a filtering result. The current audio frame refers to the PCM data segment currently being processed; the buffer area is an area in system memory for temporarily storing audio frame data, and the PCM data stored in it is the input digitized audio signal.
The IIR filter is an infinite impulse response filter; it achieves high filtering precision with a limited amount of computation and is suitable for real-time audio processing. Subsequently, the filtering result y of length N is reverse-processed according to the formula

y'(n) = y(N + 1 - n), n = 1, …, N

where reverse processing means time-reversing the data sequence of the filtering result; it is used to eliminate the phase delay possibly introduced by forward filtering, improving the fidelity of the signal. After the reverse processing is completed, secondary filtering is applied to the reverse filtering result,

y'' = IIR(y')

yielding the secondary filtering result y'', and the reverse buffer data is preserved; reverse buffer data refers to the intermediate results generated during the reverse filtering process. Next, secondary reverse processing is applied to the secondary filtering result,

z(n) = y''(N + 1 - n), n = 1, …, N

to obtain the final secondary reverse filtering result z; the purpose of the second reversal is to further eliminate residual phase distortion so that the output signal is smoother and more natural. Finally, the secondary reverse filtering result is cut: the cutting process removes invalid data introduced by zero padding during filtering and keeps only the effective sampling points, so as to obtain the high-frequency signal and the low-frequency signal of each channel, where the high-frequency signal represents the clarity and detail of the audio and the low-frequency signal represents its atmosphere and spatial quality.
In another embodiment, the PCM data in the buffer may likewise be filtered with an IIR filter, after which the filtering result is reverse-processed, y'(n) = y(N + 1 - n), to obtain the reverse filtering result. Secondary filtering, y'' = IIR(y'), is then applied to the reverse filtering result to obtain the secondary filtering result, and the corresponding reverse buffer data is stored. Finally, the secondary filtering result is reversed again, z(n) = y''(N + 1 - n), to obtain the secondary reverse filtering result, which is then cut so that only its effective part is retained, finally yielding the high-frequency signal and low-frequency signal of each channel.
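The filter–reverse–filter–reverse sequence can be demonstrated on an impulse; the one-pole low-pass is an illustrative stand-in for the patent's IIR filter. The symmetric impulse response confirms the zero-phase property that the two reversals are meant to achieve.

```python
import numpy as np

def iir_lowpass(x, alpha=0.5):
    """One-pole IIR low-pass y[n] = y[n-1] + alpha*(x[n] - y[n-1]) (illustrative)."""
    y = np.zeros_like(x)
    acc = 0.0
    for n, v in enumerate(x):
        acc += alpha * (v - acc)
        y[n] = acc
    return y

def bidirectional(x, alpha=0.5):
    y = iir_lowpass(x, alpha)    # forward filtering
    y = y[::-1]                  # reverse processing
    y = iir_lowpass(y, alpha)    # secondary filtering
    return y[::-1]               # secondary reverse processing

x = np.zeros(101)
x[50] = 1.0                      # impulse in the middle of the signal
h = bidirectional(x)             # symmetric about the impulse -> zero phase
```

A single forward pass would smear the impulse only to the right (a phase delay); the backward pass mirrors that smearing, leaving the peak at its original position.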
In one embodiment, filtering PCM data in the buffer to obtain a filtering result, and storing forward buffered data includes:
If the current audio frame is the first frame, performing first filtering processing on the PCM data in the buffer area to obtain a filtering result, and storing forward buffer data;
and if the current audio frame is not the first frame, performing second filtering processing on the PCM data in the buffer area to obtain a filtering result, removing the first two bits of the filtering result, and storing forward buffer data.
Specifically, when the current audio frame is the first frame, in order to ensure smoothness of the filtering process, a zero-padding operation is performed on the input data in the buffer according to the formula

B' = pad(B, 3·M)

where pad(·) denotes the zero-padding function, b and a are the filter parameters, and the padding length 3·M is three times the filter-parameter length.
Using an IIR filter, first filtering may be applied to the PCM data in the buffer,

y = IIR(b, a, B')

to obtain the filtering result, where IIR(·) is the IIR filter function. The current audio frame refers to the PCM data segment currently being processed, and the first frame is the starting frame of the audio stream as it enters the system; it contains no historical frame data and therefore requires special handling. The buffer is an area of system memory for temporarily storing audio frame data, and the PCM data stored in it is the input digitized audio signal. The IIR filter is an infinite impulse response filter that achieves high filtering precision with limited computation and is suitable for real-time audio processing. After the filtering result is generated, the forward buffer data is saved as

F = [y(N-1), y(N)]

i.e., the last two bits of the output result y; forward buffer data refers to the intermediate results recorded during filtering, for use by the subsequent reverse filtering so as to ensure consistency between the forward and backward processing. When the current audio frame being processed is not the first frame, that is, the frame is not the initial frame of the PCM data in time sequence but any frame after the first, the system applies second filtering to the PCM data in the buffer, where the buffer is the storage area the system uses to store the PCM data of audio frames.
First, second filtering is applied to the PCM data in the buffer,

y = IIR2(b, a, B)

to obtain the filtering result, where IIR2(·) is the second filter processing function. To ensure the stability of the filter output and avoid redundant results, the first two bits of the filtering result are removed,

y ← y(3 : N)

eliminating invalid sampling points that may be caused by the initial conditions of the filter. After the filtering result is generated, the forward buffer data F = [y(N-1), y(N)] is saved to provide continuous input conditions for subsequent filtering calculations. By introducing the zero-padding operation, forward filtering, reverse filtering, secondary filtering and secondary reverse filtering for the first frame, the application ensures that the first frame still obtains a high-precision filtering result even without historical reference. This not only avoids the boundary-distortion problem of traditional methods at the first-frame stage but also improves the fidelity of the high-frequency and low-frequency signals of each channel, so that the whole audio stream retains good continuity and stability in subsequent processing; and, when a non-first frame is filtered together with the historical data of the previous frame, data distortion caused by discontinuity of the filter state is avoided.
In one embodiment, performing the second filtering process on the PCM data in the buffer to obtain a filtering result includes:
The method comprises the steps of intercepting the last M bits of PCM data of a previous audio frame to obtain input buffer data, wherein the length of M is the length of a filter parameter, filtering the PCM data of the previous audio frame to obtain a processing result, intercepting the last M bits of the processing result to obtain initial input conditions, inputting the input buffer data, the initial input conditions and the PCM data of the current audio frame into a filter to obtain output data of the current audio frame, and assigning the initial input conditions to the first M bits of the output data to obtain a filtering result.
The filter may be an IIR filter. Specifically, when the second filtering process is performed on the PCM data in the buffer using the IIR filter, the last M bits of the PCM data of the previous audio frame are first intercepted to obtain input buffer data, where the previous audio frame is the immediately preceding frame and M represents the length of the filter parameter, generally corresponding to the order of the IIR filter or the amount of state to be maintained; M may be 3 in this embodiment. The purpose of the input buffer data is to ensure continuity of cross-frame processing. Subsequently, the PCM data of the previous audio frame is filtered to obtain a processing result, and the last M bits are intercepted from the processing result as the initial input condition. The initial input condition refers to the internal state of the IIR filter before it enters the processing of the next frame; it records the state of the IIR filter after processing the previous frame's data, thereby ensuring the consistency and stability of cross-frame filtering. Next, the input buffer data, the initial input condition, and the PCM data of the current audio frame are input into the IIR filter together, and the second filtering process is performed to obtain a processing result. The PCM data of the current audio frame is the original audio input currently to be processed; combined with the input buffer data and the initial input condition, it forms a complete continuous signal during filtering, avoiding discontinuity or distortion between frames. Finally, the system assigns the initial input condition to the first M bits of the processing result, thereby obtaining the output data of the current audio frame.
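The warm-up scheme above — the tail of the previous frame serving as input buffer data so the filter state carries across the frame boundary — can be sketched as follows. The value M = 3 mirrors the embodiment; the direct-form filter and its coefficients are assumptions for illustration, and the sketch warms the filter from the input tail only rather than restoring the exact internal state.

```python
M = 3  # length of the carried filter state, as in the embodiment

def iir_filter(x, b, a):
    # Direct-form IIR filter starting from a zero state.
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)
        y.append(acc / a[0])
    return y

def filter_with_history(prev_frame, cur_frame, b, a):
    # Prepend the last M input samples of the previous frame so the filter
    # is "warm" when the current frame begins, then drop the M warm-up
    # outputs; this keeps frame boundaries continuous.
    joined = prev_frame[-M:] + cur_frame
    return iir_filter(joined, b, a)[M:]

b, a = [0.5], [1.0, -0.5]
stream = [1.0] * 20
frame2 = filter_with_history(stream[:10], stream[10:], b, a)
```

Filtering the stream in two frames this way closely tracks filtering it in one piece, because the residual boundary error decays with the filter's pole after the warm-up samples.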
By introducing the input buffer data and the initial input condition, the method and the device ensure that the second filtering process maintains signal continuity across frames, avoiding the discontinuity and spectral-distortion problems that arise during cross-frame splicing in conventional methods, improving the adaptability and robustness of the IIR filter in real-time audio processing scenarios, and providing a reliable guarantee for the accurate separation of the high-frequency and low-frequency signals of each channel.
In one embodiment, bi-directionally filtering the PCM data of the current audio frame based on the PCM data of the previous audio frame to obtain a high frequency signal and a low frequency signal, further comprising:
The method comprises the steps of obtaining filter parameters of each channel of a current audio frame, wherein the filter parameters comprise low-frequency input filtering parameters, low-frequency feedback filtering parameters, high-frequency input filtering parameters and high-frequency feedback filtering parameters, performing bidirectional filtering on the PCM data of the current audio frame based on the PCM data of the previous audio frame, the low-frequency input filtering parameters and the low-frequency feedback filtering parameters to obtain low-frequency signals of each channel, and performing bidirectional filtering on the PCM data of the current audio frame based on the PCM data of the previous audio frame, the high-frequency input filtering parameters and the high-frequency feedback filtering parameters to obtain high-frequency signals of each channel.
Specifically, a Butterworth frequency divider may first be used to obtain the filter parameters of each channel of the current audio frame, where the divider is a module that splits a full-band signal into a low-frequency portion and a high-frequency portion according to a set frequency threshold, and provides the corresponding parameter support for filtering in different frequency bands. The filter parameters comprise a low-frequency input filtering parameter, a low-frequency feedback filtering parameter, a high-frequency input filtering parameter, and a high-frequency feedback filtering parameter; the input filtering parameters control the path by which the external signal enters the IIR filter, the feedback filtering parameters control the characteristics of the feedback path inside the IIR filter, and together they determine the filter's response to signals in different frequency bands. 8000 Hz can be used as the crossover point for the left and right channels, 1000 Hz for the left and right surround channels, and 450 Hz for the height (top) channels, while the bass channel needs no crossover. Then, based on the PCM data of the previous audio frame, the low-frequency input filtering parameter, and the low-frequency feedback filtering parameter, the PCM data of the current audio frame is bidirectionally filtered to obtain the low-frequency signal of each channel. The system then performs bidirectional filtering on the PCM data of the current audio frame based on the PCM data of the previous audio frame, the high-frequency input filtering parameter, and the high-frequency feedback filtering parameter, to obtain the high-frequency signal of each channel.
By introducing the frequency divider to obtain the low-frequency input filtering parameter, the low-frequency feedback filtering parameter, the high-frequency input filtering parameter, and the high-frequency feedback filtering parameter, the application not only achieves effective separation of the high-frequency and low-frequency components but also eliminates phase distortion through the bidirectional filtering process, ensuring the fidelity and continuity of each channel's output signal, so that both a full low-frequency atmosphere and clear high-frequency details can be presented during program playback, improving the layering and immersion of the overall sound field.
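A minimal way to see what a crossover does is a complementary split: the low band from a low-pass, and the high band as the residual, so the two bands sum back to the input exactly. The one-pole design and the 48 kHz sample rate below are assumptions; the patent's divider instead produces proper IIR input/feedback coefficients per channel.

```python
import math

def onepole_lowpass(x, fc, fs):
    # Standard RC discretization for a one-pole low-pass (an illustrative
    # choice, not the patent's coefficient design).
    alpha = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y

def split_bands(x, fc, fs=48000):
    low = onepole_lowpass(x, fc, fs)
    high = [s - l for s, l in zip(x, low)]  # complement: low + high == x
    return low, high

# Crossover points from the embodiment: 8000 Hz (front L/R),
# 1000 Hz (surrounds), 450 Hz (height channels).
low, high = split_bands([0.0, 1.0, -1.0, 0.5, 0.0], 8000)
```

Exact reconstruction (low + high equals the input sample-for-sample) is the property that lets the two bands be processed separately and recombined without losing signal content.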
In one embodiment, the cutting processing of the secondary inverse filtering result to obtain the high frequency signal and the low frequency signal of each channel includes:
If the current frame is not the first frame, when the secondary reverse filtering result of the current frame is obtained, the first 1024 sampling points of the secondary reverse filtering result are intercepted to obtain the high-frequency signal and the low-frequency signal of each channel.
Specifically, if the current frame is the first frame, when the secondary reverse filtering result of the frame is obtained, the zero-padding bits in the first nine bits of the secondary reverse filtering result are first cut off. Since no historical data exists ahead of the first frame, the zero-padding before it effectively gives the IIR filter a run-in period, allowing it time to transition from its initial state (normally the zero state) to a steady state; this avoids unnatural jumps or oscillations of the output signal at the boundary, which would otherwise distort the output at the beginning and end segments. The invalid data is therefore eliminated by cutting off the first nine bits after the re-filtering process. After the invalid data is removed, the first 1024 sampling points are intercepted from the secondary reverse filtering result; the 1024 sampling points correspond to the effective data length of the filtering process and ensure that the high-frequency and low-frequency signals output by each channel have a uniform duration and structure. If the current frame is not the first frame, that is, in the normal processing stage, the first nine data points need not be cut off when the secondary reverse filtering result is obtained; instead, the first 1024 sampling points are directly intercepted to obtain the high-frequency and low-frequency signals of each channel.
By introducing an extra cutting step when the current frame is the first frame, the application eliminates the invalid data introduced by the IIR filter's insufficient initial conditions, while in the non-first-frame stage the effective sampling points are extracted directly, improving processing efficiency, ensuring the consistency of the high-frequency and low-frequency signals of each channel in duration and structure, and improving the smoothness and stability of the overall audio stream.
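The first-frame/non-first-frame trimming rule reduces to a small helper; the nine-sample padding and 1024-sample frame length are the values given above.

```python
FRAME_LEN = 1024  # valid samples per frame, from the embodiment
PAD = 9           # zero-padding trimmed only on the first frame

def trim_result(result, is_first_frame):
    # First frame: drop the filter run-in padding, then keep one frame's
    # worth of valid samples. Later frames: keep the first 1024 directly.
    if is_first_frame:
        result = result[PAD:]
    return result[:FRAME_LEN]

first = trim_result(list(range(1040)), True)
later = trim_result(list(range(1040)), False)
```

Both branches return exactly one frame of samples, which is what keeps each channel's high- and low-frequency outputs uniform in length.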
In one embodiment, the process of performing the downmix processing on the high frequency signal and the low frequency signal, respectively, includes:
the method comprises the steps of carrying out loudness down-mixing processing on low-frequency signals of each channel to obtain low-frequency stereo signals, carrying out channel middle-side processing on high-frequency signals of each channel to obtain sound field expansion signals, and carrying out HRTF down-mixing processing on the sound field expansion signals to obtain high-frequency stereo signals.
Specifically, as shown in fig. 2, the low-frequency signal is the audio information corresponding to the low-frequency band in the PCM data obtained by bidirectional filtering, and mainly represents the energy and spatial atmosphere of the sound. The loudness downmixing process synthesizes the low-frequency signals of multiple channels into a low-frequency stereo signal while maintaining overall loudness balance; by weighted averaging of the low-frequency signals of the different channels, the low-frequency stereo signal preserves the energy distribution of the original audio and sounds fuller in binaural playback. After the high-frequency signal is obtained as shown in fig. 2, channel middle-side processing is performed on the high-frequency signal of each channel. The middle-side processing is a stereo processing method that yields a sound field expansion signal with a wider spatial distribution, enhancing the localization of the high-frequency part in the stereo sound field. The sound field expansion factor lies in the range [1, 2]. The left channel, left surround channel, left front overhead channel, and left rear overhead channel are paired one-to-one with the right channel, right surround channel, right front overhead channel, and right rear overhead channel; middle-side processing is performed on each left/right channel pair to obtain the sound field expansion signal.
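The elided middle-side formulas plausibly follow the standard mid/side widening form — mid as the channel-pair average, side scaled by the expansion factor — which is sketched here as an assumption; only the [1, 2] range of the factor comes from the text.

```python
def widen_pair(left, right, k=1.5):
    # Mid-side widening for one left/right channel pair (assumed form):
    # mid = (L + R) / 2, side = k * (L - R) / 2, then re-matrix.
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)
        side = 0.5 * k * (l - r)   # k in [1, 2] widens the stereo image
        out_l.append(mid + side)
        out_r.append(mid - side)
    return out_l, out_r

wide_l, wide_r = widen_pair([1.0, 0.5], [0.2, 0.5], k=1.5)
```

With k = 1 the pair is unchanged, and any k preserves the mid component, so only the stereo width of each pair is altered — consistent with the text's claim that the processing widens spatial distribution without disturbing the channel pairing.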
The high-frequency signal refers to the audio information corresponding to the high-frequency band in the PCM data obtained through bidirectional filtering, and mainly reflects the clarity and detail of the sound. Finally, HRTF (head-related transfer function) downmixing is performed on the sound field expansion signal; the HRTF is a filtering model that simulates the auditory characteristics of the human ear and, during stereophonic synthesis, introduces the influence of the head, auricle, and shoulders on sound transmission, thereby improving the sense of space and direction. After the HRTF downmixing, a high-frequency stereo signal is obtained that not only retains the clarity and detail of the high-frequency part but also has a stronger sense of spatial immersion. Finally, as shown in fig. 2, the high-frequency stereo signal and the low-frequency stereo signal are combined to obtain the audio rendering signal of the current audio frame.
From the above, the application ensures the fullness and balance of the low-frequency part by carrying out loudness down-mixing processing on the low-frequency signal. The width and layering of the sound field are expanded by performing channel side processing on the high-frequency signal. And by combining with the down-mixing processing of the HRTF, the space positioning sense and the immersion effect of the high-frequency part are effectively improved, so that the whole audio rendering signal has a thick and heavy low-frequency atmosphere and a fine high-frequency space sense when being played, and the tone quality expression and the hearing experience of a user are greatly improved.
Having described the audio data processing method provided by an embodiment of the present application, an audio data processing apparatus that performs the above method will be described below.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an audio data processing device according to an embodiment of the application. As shown in fig. 3, the audio data processing apparatus includes:
a data acquisition module 401 for acquiring PCM data of each channel of a current audio frame of sound stream information based on a spatial audio algorithm;
A data updating module 402, configured to store PCM data of the current audio frame in a second area of the buffer if the current audio frame is not the first frame, where the buffer includes the first area and the second area, the second area is arranged after the first area, and the first area stores PCM data of the previous audio frame;
A filtering module 403, configured to bi-directionally filter PCM data of a current audio frame based on PCM data of a previous audio frame to obtain a high frequency signal and a low frequency signal, delete PCM data of the previous audio frame stored in the first area, and move PCM data of the current audio frame from the second area to the first area;
A down-mixing module 404, configured to perform down-mixing processing on the high-frequency signal and the low-frequency signal, so as to obtain a high-frequency stereo signal and a low-frequency stereo signal;
the data merging module 405 is configured to merge the high-frequency stereo signal and the low-frequency stereo signal to obtain an audio rendering signal of the current audio frame.
In one embodiment, the data update module 402 is specifically configured to:
Filtering the PCM data in the buffer area to obtain a filtering result, storing forward buffer data, carrying out reverse processing on the filtering result to obtain a reverse filtering result, carrying out secondary filtering processing on the reverse filtering result to obtain a secondary filtering result, storing reverse buffer data, carrying out secondary reverse processing on the secondary filtering result to obtain a secondary reverse filtering result, and carrying out cutting processing on the secondary reverse filtering result to obtain a high-frequency signal and a low-frequency signal of each sound channel.
In one embodiment, the data updating module 402 is specifically configured to perform a first filtering process on the PCM data in the buffer to obtain a filtering result and store forward buffered data if the current audio frame is a first frame;
and if the current audio frame is not the first frame, performing second filtering processing on the PCM data in the buffer area to obtain a filtering result, removing the first two bits of the filtering result, and storing forward buffer data.
In one embodiment, the filtering module 403 is specifically configured to intercept last M bits of PCM data of a previous audio frame to obtain input buffer data, where the length of M is a length of a filter parameter;
Filtering PCM data of the previous audio frame to obtain a processing result, and intercepting the last M bits of the processing result to obtain an initial input condition;
inputting the input buffer data, the initial input condition and the PCM data of the current audio frame into a filter to obtain output data of the current audio frame;
and assigning the initial input condition to the first M bits of the output data to obtain a filtering result.
In one embodiment, the filtering module 403 is specifically configured to obtain a filter parameter of each channel of the current audio frame, where the filter parameter includes a low frequency input filtering parameter, a low frequency feedback filtering parameter, a high frequency input filtering parameter, and a high frequency feedback filtering parameter;
Based on the PCM data of the previous audio frame, the low-frequency input filtering parameters and the low-frequency feedback filtering parameters, bidirectional filtering is carried out on the PCM data of the current audio frame, and a low-frequency signal of each channel is obtained;
And carrying out bidirectional filtering on the PCM data of the current audio frame based on the PCM data of the previous audio frame, the high-frequency input filtering parameters and the high-frequency feedback filtering parameters to obtain a high-frequency signal of each channel.
In one embodiment, the filtering module 403 is specifically configured to, if the current frame is the first frame, cut off the first nine bits of the secondary reverse filtering result when the secondary reverse filtering result of the current frame is obtained, and then intercept the first 1024 sampling points to obtain a high-frequency signal and a low-frequency signal of each channel;
and if the current frame is not the first frame, intercept the first 1024 sampling points of the secondary reverse filtering result when the secondary reverse filtering result of the current frame is obtained, to obtain the high-frequency signal and the low-frequency signal of each channel.
In one embodiment, the downmix module 404 and the data merging module 405 are specifically configured to perform a loudness downmix process on the low-frequency signal of each channel to obtain a low-frequency stereo signal;
Performing channel middle side processing on the high-frequency signal of each channel to obtain a sound field expansion signal;
and performing HRTF down-mixing processing on the sound field expansion signal to obtain a high-frequency stereo signal.
The embodiment of the application also provides electronic equipment. Referring to fig. 4, a schematic diagram of an electronic device suitable for implementing the audio data processing method in an embodiment of the present application is shown. The electronic device in the embodiment of the present application may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a PDA (personal digital assistant), and a PAD (tablet computer), and fixed terminals such as a desktop computer. The electronic device shown in fig. 4 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the application.
As shown in fig. 4, the electronic device may include a processing means (e.g., a central processing unit, a graphic processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the state where the electronic device is powered on, various programs and data necessary for the operation of the electronic device are also stored in the RAM 603. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; output devices 607 including, for example, a liquid crystal display (LCD), speaker, vibrator, and the like; storage devices 608 including, for example, a memory card, hard disk, and the like; and communication devices 609. The communication devices 609 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While fig. 4 shows an electronic device having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
Embodiments of the present application also provide a computer program product including computer readable instructions, which when executed on an electronic device, cause the electronic device to implement any of the audio data processing methods provided by the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium, which carries one or more computer programs, and when the one or more computer programs are executed by the electronic device, the electronic device can realize any audio data processing method provided by the embodiment of the application.
It should be further noted that the above-described apparatus embodiments are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the application, the connection relation between modules indicates that they have communication connections, which may be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus the necessary general-purpose hardware, or by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, functions performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can vary, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, a software implementation is in many cases the preferred embodiment. Based on this understanding, the technical solution of the present application, or the part contributing to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, comprising several instructions for causing a computer device (which may be a personal computer, a training device, a network device, etc.) to perform the methods according to the embodiments of the present application.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When loaded and executed on a computer, the computer instructions produce, in whole or in part, a flow or function in accordance with embodiments of the present application. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a training device or data center containing an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
Claims (10)
1. A method of processing audio data, comprising:
Acquiring PCM data of each channel of a current audio frame of sound stream information based on a spatial audio algorithm;
if the current audio frame is not the first frame, storing the PCM data of the current audio frame into a second area of a buffer area, wherein the buffer area comprises a first area and a second area, the second area is arranged behind the first area, and the first area stores the PCM data of the previous audio frame;
Bi-directionally filtering the PCM data of the current audio frame based on the PCM data of the previous audio frame to obtain a high frequency signal and a low frequency signal, deleting the PCM data of the previous audio frame stored in the first region, and moving the PCM data of the current audio frame from the second region to the first region;
Respectively carrying out down-mixing processing on the high-frequency signal and the low-frequency signal to obtain a high-frequency stereo signal and a low-frequency stereo signal;
And combining the high-frequency stereo signal and the low-frequency stereo signal to obtain an audio rendering signal of the current audio frame.
2. The audio data processing method according to claim 1, wherein the bidirectional filtering process includes:
Filtering the PCM data in the buffer area to obtain a filtering result, storing forward buffer data, carrying out reverse processing on the filtering result to obtain a reverse filtering result, carrying out secondary filtering processing on the reverse filtering result to obtain a secondary filtering result, storing reverse buffer data, carrying out secondary reverse processing on the secondary filtering result to obtain a secondary reverse filtering result, and carrying out cutting processing on the secondary reverse filtering result to obtain a high-frequency signal and a low-frequency signal of each sound channel.
3. The method of processing audio data according to claim 2, wherein filtering the PCM data in the buffer to obtain a filtering result and storing forward buffered data comprises:
If the current audio frame is the first frame, performing first filtering processing on the PCM data in the buffer area to obtain a filtering result, and storing forward buffer data;
and if the current audio frame is not the first frame, performing second filtering processing on the PCM data in the buffer area to obtain a filtering result, removing the first two bits of the filtering result, and storing forward buffer data.
4. The audio data processing method according to claim 3, wherein performing a second filtering process on the PCM data in the buffer to obtain a filtering result comprises:
intercepting the last M bits of the PCM data of the previous audio frame to obtain input buffer data, wherein the length of M is the length of a filter parameter;
Filtering the PCM data of the previous audio frame to obtain a processing result, and intercepting the last M bits of the processing result to obtain an initial input condition;
Inputting the input buffer data, the initial input condition and the PCM data of the current audio frame into a filter to obtain output data of the current audio frame;
And assigning the initial input condition to the first M bits of the output data to obtain a filtering result.
5. The audio data processing method according to claim 1, wherein bi-directionally filtering the PCM data of the current audio frame based on the PCM data of the previous audio frame to obtain a high frequency signal and a low frequency signal, comprising:
Acquiring filter parameters of each channel of the current audio frame, wherein the filter parameters comprise low-frequency input filter parameters, low-frequency feedback filter parameters, high-frequency input filter parameters and high-frequency feedback filter parameters;
Based on the PCM data of the previous audio frame, the low-frequency input filtering parameter and the low-frequency feedback filtering parameter, bidirectional filtering is carried out on the PCM data of the current audio frame, and a low-frequency signal of each channel is obtained;
and carrying out bidirectional filtering on the PCM data of the current audio frame based on the PCM data of the previous audio frame, the high-frequency input filtering parameter and the high-frequency feedback filtering parameter to obtain the high-frequency signal of each channel.
6. The audio data processing method according to claim 2, wherein the performing the cutting process on the secondary inverse filtering result to obtain a high frequency signal and a low frequency signal of each channel comprises:
If the current frame is the first frame, cutting off the first nine bits of the secondary reverse filtering result when the secondary reverse filtering result of the current frame is obtained, and then intercepting the secondary reverse filtering result of the first 1024 sampling points to obtain a high-frequency signal and a low-frequency signal of each sound channel;
And if the current frame is not the first frame, intercepting the secondary reverse filtering results of the first 1024 sampling points of the secondary reverse filtering results when the secondary reverse filtering results of the current frame are obtained, and obtaining the high-frequency signal and the low-frequency signal of each channel.
7. The audio data processing method according to claim 1, wherein the process of performing the downmix processing on the high frequency signal and the low frequency signal, respectively, includes:
performing loudness downmixing processing on the low-frequency signals of each channel to obtain low-frequency stereo signals;
Performing channel middle side processing on the high-frequency signals of each channel to obtain sound field expansion signals;
and performing HRTF down-mixing processing on the sound field expansion signal to obtain a high-frequency stereo signal.
8. A computer program product comprising computer readable instructions which, when run on an electronic device, cause the electronic device to implement the audio data processing method of any of claims 1 to 7.
9. An electronic device comprising at least one processor and a memory coupled to the processor, wherein:
The memory is used for storing a computer program;
the processor is configured to execute the computer program to enable the electronic device to implement the audio data processing method according to any one of claims 1 to 7.
10. A computer storage medium carrying one or more computer programs which, when executed by an electronic device, enable the electronic device to implement the audio data processing method of any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202511504369.0A CN121397453A (en) | 2025-10-20 | 2025-10-20 | Audio data processing method and related device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN121397453A true CN121397453A (en) | 2026-01-23 |
Family
ID=98468648
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202511504369.0A Pending CN121397453A (en) | 2025-10-20 | 2025-10-20 | Audio data processing method and related device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN121397453A (en) |
Similar Documents
| Publication | Title |
|---|---|
| Jot et al. | Digital signal processing issues in the context of binaural and transaural stereophony |
| JP5156110B2 (en) | Method for providing real-time multi-channel interactive digital audio |
| JP4817658B2 (en) | Acoustic virtual reality engine and new technology to improve delivered speech |
| CN102089809B (en) | Method and apparatus for providing improved audio processing |
| US20090262947A1 (en) | Apparatus and Method for Producing 3D Audio in Systems with Closely Spaced Speakers |
| KR20080074223A (en) | Decoding Binaural Audio Signals |
| EP1494504B1 (en) | Audio data processing device, audio data processing method, program for the same, and recording medium for the program recorded therein |
| WO2008046967A1 (en) | Time scaling of multi-channel audio signals |
| TW201521017A (en) | Method for processing audio signal, signal processing unit, binary translator, audio encoder, and audio decoder |
| JP2023551016A (en) | Audio encoding and decoding method and device |
| KR20240172191A (en) | Methods and systems for immersive 3DOF/6DOF audio rendering |
| US20070297624A1 (en) | Digital audio encoding |
| CN113948054B (en) | Audio track processing method, device, electronic device and storage medium |
| US6839675B2 (en) | Real-time monitoring system for codec-effect sampling during digital processing of a sound source |
| CN106231489A (en) | Audio processing method and apparatus |
| CN121397453A (en) | Audio data processing method and related device |
| JP5051782B2 (en) | Method for combining speech synthesis and spatialization |
| US9705953B2 (en) | Local control of digital signal processing |
| JP7741334B2 (en) | Audio processing method and terminal |
| US6463405B1 (en) | Audiophile encoding of digital audio data using 2-bit polarity/magnitude indicator and 8-bit scale factor for each subband |
| CN113194400B (en) | Audio signal processing method, device, equipment and storage medium |
| US20080118071A1 (en) | Low Computation Mono to Stereo Conversion Using Intra-Aural Differences |
| JP2755081B2 (en) | Sound image localization control method |
| CN112567769B (en) | Audio reproduction device, audio reproduction method, and storage medium |
| CN120167116A | Audio processing method, device, equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination |