Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the standard coding flow of the encoder, the processing procedure of the frequency spectrum quantization module for the coded audio frame comprises two parts:
The main calculation formula of the first part is as follows:

gg = 10^((gg_ind + gg_off) / 28)    (formula 1)

where gg represents the quantized global gain parameter, gg_ind represents the quantized global gain index, and gg_off represents the quantized global gain offset. Both gg_ind and gg_off are obtained by the standard coding procedure.
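As a compact illustration of the first part, the global gain computation can be sketched as follows (a sketch assuming the base-10 exponential relation gg = 10^((gg_ind + gg_off)/28) used in LC3-style coders; the function name is illustrative):

```python
def global_gain(gg_ind: int, gg_off: int) -> float:
    # Quantized global gain gg from the quantized global gain index
    # gg_ind and offset gg_off, both produced by the standard coding
    # procedure (base-10 exponent with divisor 28 assumed).
    return 10.0 ** ((gg_ind + gg_off) / 28.0)
```

Under this relation, increasing gg_ind + gg_off by 28 scales the gain by a factor of 10.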
The calculation formula of the second part is as follows:

X_q(n) = ⌊X_f(n) / gg + 0.375⌋,  if X_f(n) ≥ 0
X_q(n) = ⌈X_f(n) / gg − 0.375⌉,  if X_f(n) < 0
with n = 0, 1, ..., N_E − 1    (formula 2)

where X_f(n) denotes the spectral data samples filtered by the TNS (temporal noise shaping) module, X_q(n) denotes the quantized spectral data samples, and N_E (number of encoded spectral lines) is the number of coded spectral lines, which varies with the sampling frequency. Both silence frames and non-silence frames exist in the encoded audio; if every frame is encoded with the standard spectrum quantization process, computation is consumed in the encoder unnecessarily.
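The per-sample rule of the second part can be sketched as follows (a sketch assuming the floor/ceiling quantization with a 0.375 rounding offset used in LC3-style spectral quantization; the function name is illustrative):

```python
import math

def quantize_spectrum(x_f, gg):
    # Quantize the TNS-filtered spectral samples X_f(n),
    # n = 0..N_E-1, by the quantized global gain gg.
    x_q = []
    for x in x_f:
        if x >= 0:
            x_q.append(math.floor(x / gg + 0.375))
        else:
            x_q.append(math.ceil(x / gg - 0.375))
    return x_q
```

Note that an all-zero frame quantizes to all zeros for any positive gg, which is the property the rest of this description exploits for silence frames.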
Fig. 1 shows a schematic diagram of encoded audio containing silence frames for the method of the present application for optimizing the number of audio coding quantization operations. As shown in fig. 1, when the encoded audio includes a mute frame, it follows from the principle of the spectrum quantization module described above that the spectrum quantization result of the mute frame is a fixed value. If this fixed value is output directly, the spectrum quantization process for the mute frame can be omitted, thereby reducing the code rate of the encoder, the computational load of the encoder, and its power consumption.
Fig. 2 shows a specific embodiment of the method of optimizing the number of quantization of an audio code according to the present application.
In the embodiment shown in fig. 2, the method of the present application for optimizing the number of audio coding quantization operations includes: a process S101 of counting the output of the encoded audio frame from the time domain noise shaping module and calculating the maximum value of that output; a process S102 of determining, from the maximum value, whether the encoded audio frame is a mute frame: when the maximum value is zero, the encoded audio frame is a mute frame, and when the maximum value is non-zero, the encoded audio frame is a non-mute frame; and a process S103 of performing spectrum quantization on the encoded audio frame: when the encoded audio frame is a non-mute frame, the standard-flow spectrum quantization process is performed on it to obtain its spectrum quantization result, and when the encoded audio frame is a mute frame, the standard-flow spectrum quantization process is skipped and the spectrum quantization result is set directly to a first preset value.
In the embodiment shown in fig. 2, the method for optimizing the quantization count of audio coding according to the present application includes a process S101 of counting the output result of the encoded audio frame through the time domain noise shaping module and calculating the maximum value in the output result.
In this embodiment, the encoded audio frames are encoded according to the standard encoding flow of the encoder. The output of the encoded audio frame from the time domain noise shaping module is counted and denoted X_f(n), n = 0...N_E − 1, and the maximum value of the output X_f(n) of the time domain noise shaping module within one encoded audio frame is denoted X_f_max.
In the embodiment shown in fig. 2, the method for optimizing the quantization count of audio coding according to the present application includes a process S102 of determining whether the encoded audio frame is a mute frame according to the maximum value, including: when the maximum value is zero, the coded audio frame is a mute frame; and when the maximum value is non-zero, the encoded audio frame is a non-silence frame.
In this embodiment, based on the characteristic of an encoded audio frame that is a silence frame, whether the current encoded audio frame is a silence frame is determined from the processing result of the time domain noise shaping module for that frame. Specifically, it is judged whether the maximum value X_f_max of the output of the time domain noise shaping module is zero. When X_f_max is zero, all outputs X_f(n) of the time domain noise shaping module are zero, indicating that the current encoded audio frame is a mute frame; when X_f_max is non-zero, the current encoded audio frame is a non-mute frame, i.e., an encoded audio frame carrying speech information.
Using the time domain noise shaping module to judge whether the current encoded audio frame is a mute frame requires no additional computation: the judgment directly reuses an intermediate result produced in the normal standard encoding flow. The approach is therefore simple and easy to implement, and avoids any unnecessary increase in computational load.
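The statistics of S101 and the decision of S102 together amount to one pass over the TNS output; a minimal sketch (the maximum is taken over absolute values here, an assumption made so that frames containing only negative samples are not mistaken for silence):

```python
def is_silence_frame(x_f):
    # S101: maximum of the TNS module output X_f(n), n = 0..N_E-1,
    # within the current frame (absolute value assumed).
    x_f_max = max(abs(x) for x in x_f)
    # S102: the frame is a mute frame exactly when this maximum is
    # zero, i.e. every output sample of the TNS module is zero.
    return x_f_max == 0
```

No quantity beyond the TNS module's normal output is needed, matching the point above that the silence judgment adds no extra encoding work.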
In the embodiment shown in fig. 2, the method for optimizing the quantization count of audio coding according to the present application includes a process S103, where a spectrum quantization process is performed on a coded audio frame, including: when the encoded audio frame is a non-mute frame, performing a spectrum quantization process of a standard flow on the encoded audio frame to obtain a spectrum quantization result of the encoded audio frame; and when the encoded audio frame is a mute frame, skipping the spectrum quantization process of the standard flow, and directly setting the spectrum quantization result of the encoded audio frame as a first preset value.
In this embodiment, when the current encoded audio frame is determined to be a mute frame, X_f(n) is zero for all n. Substituting X_f(n) = 0 into formula 2, the standard operation of the spectrum quantization module, shows that the result of the spectrum quantization module for the mute frame is a fixed value, and that this value is independent of the quantized global gain parameter gg. Therefore, when the encoded audio frame is a mute frame, this value can be output directly as the spectrum quantization result of the frame, without running the computation of the spectrum quantization module.
In a specific embodiment of the present application, when the encoded audio frame is a mute frame, the operation result of the spectrum quantization module of the mute frame is set to 0 according to the standard specification of the spectrum quantization process.
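That the preset value 0 matches what the skipped computation would produce can be checked directly against the quantization rule (a sketch assuming the 0.375 rounding offset of LC3-style spectral quantization; the function name is illustrative):

```python
import math

def quantized_sample(x, gg):
    # Per-sample quantization rule of formula 2.
    if x >= 0:
        return math.floor(x / gg + 0.375)
    return math.ceil(x / gg - 0.375)

# For a mute frame every X_f(n) is 0, and floor(0 / gg + 0.375) = 0
# for any positive gain: the result is 0, independent of gg.
assert all(quantized_sample(0.0, gg) == 0 for gg in (0.5, 1.0, 10.0, 255.0))
```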
In this embodiment, a fixed value is calculated according to the standard specification of the spectrum quantization process and is used as the output result of the spectrum quantization module, so that the specific operation process of the spectrum quantization module is skipped directly, the operation amount of the encoder is reduced, and the power consumption of the encoder is reduced.
In one embodiment of the present application, prior to the spectrum quantization process, the method comprises: calculating a quantized global gain offset and a quantized global gain index according to the standard specification, calculating the quantized global gain parameter from the quantized global gain offset and the quantized global gain index, and then performing the spectrum quantization process.
In this embodiment, when the current encoded audio frame is determined to be a non-mute frame, the encoded audio frame is subjected to the spectrum quantization process according to the standard processing flow of the spectrum quantization module in the encoder, part of which is shown in formula 1 and formula 2.
In one example of the present application, according to the specification of the encoding process, when the spectrum quantization module operates on a non-mute encoded audio frame, the quantized global gain parameter gg is calculated from the quantized global gain offset and the quantized global gain index according to formula 1, and the spectrum quantization result of the encoded audio frame is then calculated according to formula 2.
In a specific embodiment of the present application, a quantized global gain minimum is calculated according to the standard specification, and the quantized global gain index is corrected according to the quantized global gain minimum.
The method for optimizing the audio coding quantization times of the application judges whether the coded audio frame is a mute frame or not by utilizing the output result of the time domain noise shaping module in the standard coding flow, and carries out different frequency spectrum quantization operation processes according to different types of the coded audio frame. When the encoded audio frame is a mute frame, skipping a spectrum quantization process, directly outputting a spectrum quantization result of the mute frame according to the working principle of a spectrum quantization module, reducing unnecessary operation amount in the encoder and reducing power consumption of the encoder; and when the encoded audio frame is a non-mute frame, carrying out a spectrum quantization operation process of the encoded audio frame according to a standard encoding flow.
Fig. 3 shows a specific example of the method of optimizing the number of quantization times of audio coding according to the present application.
In the specific example shown in FIG. 3, the maximum value X_f_max is first calculated from the output X_f(n) of the time domain noise shaping module. It is then judged whether X_f_max is zero: when X_f_max is zero, the current encoded audio frame is a mute frame; when X_f_max is non-zero, the current encoded audio frame is a non-mute frame.
When X_f_max is non-zero, the current encoded audio frame is a non-mute frame: the quantized global gain offset gg_off and the quantized global gain index gg_ind are calculated according to the standard encoding flow, the quantized global gain minimum gg_min is calculated to correct gg_ind, the quantized global gain gg is calculated according to formula 1, and the normal spectrum quantization process is performed on the non-mute frame according to formula 2. When X_f_max is zero, the current encoded audio frame is a mute frame: the quantized global gain index gg_ind is set to a second preset value, which may be 255, so that the encoding steps after the spectrum quantization process proceed normally, and the spectrum quantization result of the mute frame is output as a first preset value, which is 0 according to the audio coding specification. Outputting the spectrum quantization result of the mute frame directly omits the complex spectrum quantization computation; skipping the spectrum quantization process reduces the number of spectrum quantization operations and thus the computational load and power consumption of the encoder.
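The branch structure of FIG. 3 can be sketched end to end as follows (a hedged sketch: the gain relation, the 0.375 rounding offset, and the treatment of gg_ind for mute frames follow the description above; the gg_min correction of gg_ind is omitted for brevity, and all names are illustrative):

```python
import math

def quantize_frame(x_f, gg_ind, gg_off):
    # Returns (spectrum quantization result, gg_ind) per FIG. 3.
    if max(abs(x) for x in x_f) == 0:
        # Mute frame: skip spectral quantization entirely. gg_ind is
        # set to the second preset value 255 so the encoding steps
        # after quantization proceed normally; the spectral result
        # is the first preset value 0.
        return [0] * len(x_f), 255
    # Non-mute frame: standard flow, formula 1 then formula 2.
    gg = 10.0 ** ((gg_ind + gg_off) / 28.0)
    x_q = [math.floor(x / gg + 0.375) if x >= 0
           else math.ceil(x / gg - 0.375) for x in x_f]
    return x_q, gg_ind
```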
In one example of the present application, the effect of the method for optimizing the number of audio coding quantization operations is described using monaural encoded audio with a sampling rate of 16 kHz and a frame length of 10 ms. Suppose the audio is 10 seconds long, i.e. 1000 frames in total. If about 500 of these frames are mute frames and the other 500 are non-mute frames, then with the method of the present application the 500 mute frames output their spectrum quantization result directly, without running the spectrum quantization computation; the spectrum quantization process needs to be performed only for the other 500 non-mute frames. The number of spectrum quantization operations is thus reduced from 1000 to 500, which reduces the computational load of the encoder and its power consumption.
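The frame counting in this example is easily verified (the 500/500 split of mute and non-mute frames is the assumption stated above):

```python
frame_ms = 10
audio_ms = 10 * 1000                       # 10 seconds of audio
total_frames = audio_ms // frame_ms        # 1000 frames in total
silence_frames = 500                       # assumed split, as above
quantization_runs = total_frames - silence_frames
assert total_frames == 1000
assert quantization_runs == 500            # halved from 1000
```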
The method of the present application for optimizing the number of audio coding quantization operations judges whether an audio frame is a mute frame and, for mute frames, skips the spectrum quantization process and outputs the result directly, thereby reducing the computational load of the encoder, reducing its power consumption, and extending its operating time under a limited power budget. The method is simple to implement, can be applied to audio coding with frame lengths of 10 ms and 7.5 ms at all sampling rates, and thus has a wide application range. In addition, although the method reduces the number of spectrum quantization operations, it has no negative influence on the sound quality of the encoded audio, which is identical to the sound quality obtained by encoding according to the standard flow.
Fig. 4 shows an embodiment of the system of the application for optimizing the number of quantization of an audio code.
In the embodiment shown in fig. 4, the system of the present application for optimizing the number of audio coding quantization operations includes: a statistics module, which counts the output of the encoded audio frame from the time domain noise shaping module and calculates the maximum value of that output; a judging module, which determines from the maximum value whether the encoded audio frame is a mute frame: when the maximum value is zero, the encoded audio frame is a mute frame, and when the maximum value is non-zero, the encoded audio frame is a non-mute frame; and a spectrum quantization module, which performs spectrum quantization on the encoded audio frame: when the encoded audio frame is a non-mute frame, the standard-flow spectrum quantization process is performed on it to obtain its spectrum quantization result, and when the encoded audio frame is a mute frame, the standard-flow spectrum quantization process is skipped and the spectrum quantization result is set directly to a first preset value.
The system for optimizing the audio coding quantization times of the application judges whether the audio frame is a mute frame or not through the judging module, and then directly skips the spectrum quantization process to directly output the result, thereby reducing the operation amount of the encoder, reducing the power consumption of the encoder and prolonging the service time of the encoder under the limited power consumption.
In one embodiment of the application, a computer readable storage medium stores computer instructions operable to perform the method of optimizing the number of quantization of audio codes described in any of the embodiments. Wherein the storage medium may be directly in hardware, in a software module executed by a processor, or in a combination of the two.
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one embodiment of the application, a computer device includes a processor and a memory storing computer instructions, wherein: the processor operates the computer instructions to perform the method of optimizing the number of quantization of audio codes described in any of the embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The foregoing is only illustrative of the present application and is not to be construed as limiting the scope of the application, and all equivalent structural changes made by the present application and the accompanying drawings, or direct or indirect application in other related technical fields, are included in the scope of the present application.