CN103065637B

CN103065637B - Audio encoder and decoder

Info

Publication number: CN103065637B
Application number: CN201310005503.3A
Authority: CN
Inventors: P·H·海德林; P·J·卡尔森; J·L·萨缪尔森; M·舒格
Original assignee: Dolby International AB
Current assignee: Dolby International AB; Dolby Sweden AB
Priority date: 2008-01-04
Filing date: 2008-12-30
Publication date: 2015-02-04
Anticipated expiration: 2028-12-30
Also published as: JP5624192B2; KR101202163B1; CA3190951A1; CA3076068C; RU2015118725A; BRPI0822236A2; CN101925950A; EP2077550B1; MX2010007326A; EP2573765B1; WO2009086919A1; CN101939781A; EP2235719B1; EP2235719A1; US8924201B2; JP5356406B2; EP4414982A3; US8494863B2; CA2709974A1; CA2960862C

Abstract

The invention relates to an audio encoder and decoder. The present invention teaches a new audio coding system that can code both general audio and speech signals well at low bit rates. A proposed audio coding system comprises linear prediction unit for filtering an input signal based on an adaptive filter; a transformation unit for transforming a frame of the filtered input signal into a transform domain; and a quantization unit for quantizing the transform domain signal. The quantization unit decides, based on input signal characteristics, to encode the transform domain signal with a model-based quantizer or a non-model-based quantizer. Preferably, the decision is based on the frame size applied by the transformation unit.

Description

Audio coder and demoder

The divisional application that the application is application number is 200880125539.2, the applying date is on Dec 30th, 2008, denomination of invention is the application for a patent for invention of " audio coder and demoder ".

Technical field

The present invention relates to the coding of sound signal, in particular to the coding of any sound signal being not limited only to voice, music or its combination.

Background of invention

In the prior art, have the source model be specifically designed as by encoding based on signal, that is, the articulatory system of people, to the speech coder that voice signal is encoded.These scramblers can not process any sound signal of such as music or any other non-speech audio and so on.In addition, be commonly called the music encoder of audio coder in the prior art in addition, they will coding based on the conception of the auditory system of people, instead of based on the source model of signal.These scramblers can process arbitrary signal well, but for the voice signal of low bit rate, special speech coder provides excellent audio quality.Therefore, up to the present, also there is not the general coding structure for any sound signal of encoding, and when operating with low bit rate, it can again can as the music encoder for music as the speech coder for voice.

Therefore, a kind of enhancement mode audio coder and the demoder that can improve audio quality and/or reduce bit rate is needed.

Summary of the invention

The present invention relates to be equal to or better than specially for signal specific customization the quality level of quality level of system effectively to encode any sound signal.

The present invention relates to and comprise linear predictive coding (LPC) and the audio codec algorithm to the transform coder part that the signal through LPC process operates.

The invention further relates to the quantization strategy depending on transform frame size.In addition, also proposed the entropy constrained quantizer based on model using arithmetic coding.In addition, also random offset can be inserted in uniform scalar quantizer.The present invention it is further proposed the quantizer based on model using arithmetic coding, such as, and entropy constrained quantizer (ECQ).

The invention further relates to the scale factor by utilizing the existence of LPC data to come in coded audio scrambler transition coding part effectively.

The invention further relates to the bit reservoir (reservoir) effectively utilized with in the audio coder of variable frame size.

The invention further relates to for generating the scrambler of bit stream to coding audio signal, and for bit stream is decoded and generate sensuously with the demoder of the sound signal of the indistinguishable reconstruct of sound signal of input.

A first aspect of the present invention relates to quantification in transform coder, such as application enhancements discrete cosine transform (MDCT).The quantizer proposed preferably quantizes MDCT line.No matter whether scrambler uses linear predictive coding (LPC) to analyze or extra long-term forecasting (LTP) further, all applicable in this respect.

The invention provides a kind of audio coding system, comprise the linear prediction unit for filtering (filtering) input signal based on sef-adapting filter; For the frame of the described input signal through filtering (filter) being converted to the converter unit of transform domain; And, for quantizing the quantifying unit of described transform-domain signals.Described quantifying unit, based on input signal feature, determines to utilize and to encode described transform-domain signals based on the quantizer of model or the non-quantizer based on model.Preferably, decision makes based on the frame sign of converter unit application.But also can predict the criterion that other input signals for switching quantization strategy rely on, they are all in the scope of the application.

Another importance of the present invention is quantizer can be adaptive.Specifically, can be adaptive based on the model in the quantizer of model, to adjust to input audio signal.Model can change along with the time, such as, depends on input signal feature.This can reduce quantizing distortion, and so can improve coding quality.

According to an embodiment, the quantization strategy proposed depends on frame sign.Also proposed, quantifying unit can based on the frame sign applied by converter unit, determines to utilize to encode described transform-domain signals based on the quantizer of model or the non-quantizer based on model.Preferably, quantifying unit is configured to by the entropy constrained quantification based on model, for frame sign is less than the frame transcoding, coding transform territory signal of threshold value.Quantification based on model can depend on the parameter of classification.Large frame is passable, such as, by with such as based on the scalar quantizer of the entropy code of Huffman, quantizes, as, such as, use in AAC codec.

Audio coding system can further include long-term forecasting (LTP) unit, for the reconstruct based on described section before the input signal of filtering, estimate the described frame of the described input signal through filtering, and transform-domain signals assembled unit, for in described transform domain, combine described long-term forecasting to estimate and the described input signal through conversion, to generate the described transform-domain signals be input in quantifying unit.

Switching between the different quantization method of MDCT line is another aspect of the preferred embodiments of the present invention.By using different quantization strategies for different transform size, codec can perform all quantifications and coding in MDCT territory, without the need to running specific time-domain speech coder in parallel or in series with transform domain codec.Present invention teaches, for the signal of the voice having LTP gain and so on, preferably, use short conversion and based on the quantizer of model, signal encoded.Quantizer based on model is particularly suitable for short conversion, and as after a while by general introduction, provides the advantage of the specific vector quantizer of time domain speech (VQ), and still operate in MDCT territory, and does not have input signal to be the requirement of voice signal.In other words, when based on the quantizer of model and LTP in combination for short transforming section time, remain the efficiency of special time-domain speech coder VQ, and do not lose versatility, also do not leave MDCT territory.

In addition, for more static music signal, preferably use relatively large conversion, as usual in audio codec use, and the quantization scheme by converting greatly the sparse spectral line differentiated can be utilized.Therefore, present invention teaches for this quantization scheme of long conversion use.

So, switch quantization strategy according to frame sign, codec can be made both can to have retained the attribute of dedicated voice codec, the attribute of special audio codec can be retained again, only need by selecting transform size.Avoiding problems and try hard to equally with all problems in the prior art systems of low rate processed voice and sound signal, because these systems inevitably run into effectively by problems that time domain coding (speech coder) and Frequency Domain Coding (audio coder) combine.

According to another aspect of the present invention, quantize to use adaptive step.Preferably, the quantization step of the component of transform-domain signals is adaptive based on linear prediction and/or long-term forecasting parameter.Quantization step can also be configured to depend on frequency further.In the various embodiments of the invention, quantization step determines based at least one item in the following: the polynomial expression of sef-adapting filter, code rate controling parameters, long-term prediction gain value, and input signal variance.

Preferably, quantifying unit comprises the uniform scalar quantizer for quantization transform territory component of signal.Each scalar quantizer all such as based on probability model, to MDCT line application uniform quantization.Probability model can be Laplce or Gauss model, or is suitable for any other probability model of signal characteristic.Random offset can also be inserted in uniform scalar quantizer by quantifying unit further.Random offset is inserted and is provided vector quantization advantage to uniform scalar quantizer.According to an embodiment, random offset is determined based on the optimization of quantizing distortion, preferably, in perception territory and/or consider the cost of the quantity according to the bit needed for encoding to quantification index.

Quantifying unit can further include the arithmetic encoder of the quantification index generated for coding uniform scalar quantizer.This just obtains the low bit rate of the possible minimum value leveled off to given by signal entropy.

Quantifying unit can further include residual quantization device, for quantizing the residual quantization signal produced by uniform scalar quantizer, to reduce total distortion further.Residual quantization device is preferably fixed rate vector quantizer.

Can scrambler go use multiple quantification reconstruction point in the inverse DCT in quantifying unit and/or in demoder.Such as, least mean-square error (MMSE) and/or central point (mid point) reconstruction point can be used, come based on its quantification index reconstruct quantized value.Quantizing reconstruction point can also further based on the dynamic interpolation that can control by the feature of data between central point and MMSE point.This allow control noises insert, and avoid due in order to low bit rate to the spectral hole (hole) that zero amount of bar (bin) specifies MDCT line to cause.

Determining quantizing distortion preferably to apply the perceptual weighting in transform domain when arranging different weights to specific frequency components.Perception weight can derive from linear forecasting parameter effectively.

Of the present invention another independently aspect relate to utilize LPC and SCF(scale factor) universal coexisted of data.In the scrambler based on conversion of such as application enhancements discrete cosine transform (MDCT), in quantification, the usage ratio factor can control quantization step.In the prior art, these scale factors are estimated according to original signal, to determine masking curve.Present suggestion, estimates second group of scale factor by means of perceptual filter or according to the psychoacoustic model that LPC data calculate.This allows the poor replacement between the scale factor estimated by the scale factor and LPC only transmitting/store practical application to transmit/store real scale factor, reduces for transmitting/the cost of the stored ratio factor.So, comprising such as, the voice coding element of such as LPC and so on, and in the audio coding system of the transition coding element of such as MDCT and so on, the present invention, by utilizing the data provided by LPC, reduces the cost of the scale factor information needed for transition coding part being used for transmission coding/decoding device.It should be noted that in this respect independent of other aspects of proposed audio coding system, and also can realize in other audio coding systems.

Such as, perceptual mask curve can be estimated based on the parameter of sef-adapting filter.Second group of scale factor based on linear prediction can be determined based on the perceptual mask curve estimated.Then, based on the difference between the scale factor that uses actual in quantification and the scale factor calculated according to the perceptual mask opisometer based on LPC, determine to store/scale factor information of transmission.This just from store/transmission information delete dynamic perfromance and redundancy, so that the bit needed for storage/transmission scale factor is less.

When LPC and MDCT be not with same number of frames Rate operation, that is, there is different frame signs, then can based on the linear forecasting parameter of interpolation, estimate the scale factor based on linear prediction of the frame of transform-domain signals, to correspond to the time window (window) covered by MDCT frame.

Therefore, the invention provides based on transform coder and the audio coding system of the fundamental forecasting comprised from speech coder and Shaping Module.System of the present invention comprises for the linear prediction unit based on sef-adapting filter filtered input signal; For the frame of the described input signal through filtering being converted to the converter unit of transform domain; For the quantifying unit of quantization transform territory signal; Scale factor determining unit, for based on masking threshold curve, generates scale factor, for when the described transform-domain signals of quantification in described quantifying unit; Linear prediction scale factor estimation unit, for the parameter based on described sef-adapting filter, estimates the scale factor based on linear prediction; And scale factor scrambler, for the difference of encoding between the described scale factor based on masking threshold curve and the described scale factor based on linear prediction.Difference between the scale factor applied by coding and the scale factor can determined in a decoder based on available linear prediction information, coding and storage efficiency can be improved, and the bit only needing storage/transmission less.

The specific aspect of another individual encoders of the present invention relates to the bit reservoir process for variable frame size.In the audio coding system can encoded to the frame of variable-length, by the available bit that distributes between multiple frame, control bit reservoir.When the reasonable difficulty of each frame given estimate and the bit reservoir of the size that defines, allow better gross mass with a certain deviation of required constant bit rate, and the buffer requests applied by bit reservoir size can not be violated.The bit reservoir using the concept of bit reservoir to expand to for the vague generalization audio codec with variable frame size controls by the present invention.Therefore, audio coding system can comprise bit reservoir control module, for estimating based on the length of frame and the difficulty of frame, determines that license is for encoding through the quantity of the bit of the frame of the signal of filtering.Preferably, bit reservoir control module is estimated and/or different frame signs for different frame difficulty, has independent governing equation.The difficulty of different frame signs is estimated and can be normalized, and so, can compare them more easily.In order to the bit controlled for variable rate coder distributes, the lower permission restriction of the bit control algolithm of license is preferably set to the average of the bit of the frame sign of maximum permission by bit reservoir control module.

Further aspect of the present invention relates to the quantizer used based on model, such as, and the process of the bit reservoir in the scrambler of entropy constrained quantizer (ECQ).Suggestion minimizes the change of ECQ step-length.Suggested the specific governing equation be associated with ECQ speed by quantiser step size.

For the sef-adapting filter of filtered input signal, preferably analyze based on linear predictive coding (LPC), comprise the LPC wave filter producing albefaction input signal.The LPC parameter of the present frame of input data can be determined by algorithm known in the art.LPC parameter estimation unit for the frame of input data, can calculate any suitable LPC Parametric Representation, as polynomial expression, transition function, reflection coefficient, line spectral frequencies etc.For to encode or the LPC Parametric Representation of particular type of other process depends on corresponding requirement.As is known to persons skilled in the art, some expression is more suitable for some computing than other expressions, and therefore, some represents for realizing these computings is preferred.Linear prediction unit can operate with fixing (such as, 20 milliseconds) first frame length.Linear prediction is filtered and can also be operated on twisting frequencies axle further, to emphasize some frequency range, as low frequency relative to other frequencies selectively.

Be applied to the conversion of the frame of the input signal through filtering, the Modified Discrete Cosine Tr ansform (MDCT) preferably operated with variable second frame length.Audio coding system can comprise series of windows control module, this unit is by minimizing coding cost function for the whole input signal block comprising several frame, be preferably the perceptual entropy simplified, come for input signal block, determine the frame length of overlapping MDCT window.So, deriving input signal block comminute is the optimal segmentation of the MDCT window with corresponding second frame length.Therefore, propose a kind of transform domain coding structure, comprise the speech coder element with adaptive-length MDCT frame, only as the base unit of all process except LPC.Because MDCT frame length can present many different values, therefore, best sequence can be found, and the frame sign of the sudden change (abrupt) as commonly used in the prior art only applying wicket size and large window size can be avoided to change.In addition, do not need yet as in the method for the little of some prior art and the transition between large window size use the transition mapping window with sharp limit.

Preferably, continuous print MDCT length of window changes by the factor (2) at the most, and/or MDCT length of window is bi-values.More specifically, MDCT length of window can be two Meta Partitions of input signal block.Therefore, MDCT series of windows is only limitted to the predetermined sequence being easy to utilize a small amount of bits of encoded.In addition, series of windows also has the smooth transition of frame sign, thus eliminates the frame sign change of sudden change.

Series of windows control module can be configured to further, when searching for the sequence of MDCT length of window input signal block being minimized to coding cost function, considers that the long-term forecasting generated by long-term forecasting unit is estimated for length of window candidate.In this embodiment, when determining MDCT length of window, long-term forecasting circulation is closed, and this can cause improving the sequence being suitable for the MDCT window of encoding.

Audio coding system can further include LPC scrambler, for variable bit rate recursively coding linear prediction unit generate line spectral frequencies or other suitable LPC Parametric Representations so as to store and/or be transferred to demoder.According to an embodiment, provide linear prediction interpolation unit, for interpolation with the linear forecasting parameter generated corresponding to the speed of the first frame length, so that the variable frame length of matched transform territory signal.

According to an aspect of the present invention, audio coding system can comprise Sensing model unit, and this unit, by warbling for LPC frame and/or the LPC polynomial expression generated by linear prediction unit that tilts, revises the feature of sef-adapting filter.The sensor model received by the amendment to sef-adapting filter feature can in systems in which for many objects.Such as, it can be applied as the perceptual weighting function in quantification or long-term forecasting.

Another aspect of the present invention relates to long-term forecasting (LTP), and the LTP in particular to LTP and the MDCT weighting of the long-term forecasting in MDCT territory, MDCT frame adaptive searches for.No matter whether there is lpc analysis in the upstream of transform coder, these aspects are all applicable.

According to an embodiment, audio coding system comprises inverse quantization and inverse transformation block further, for generating the frame time domain reconstruction of the input signal through filtering.In addition, the long-term forecasting buffer zone of the time domain reconstruction of the frame stored before the input signal of filtering can also be provided for.These unit can arrange in the mode from quantifying unit to the backfeed loop of long-term forecasting extraction unit, and this backfeed loop searches for the section of optimum matching through the reconstruct of the present frame of the input signal of filtering in long-term forecasting buffer zone.In addition, long-term prediction gain estimation unit can also be provided, for adjusting the gain of the institute's selections from long-term forecasting buffer zone, so that its optimum matching present frame.Preferably, estimate from the long-term forecasting that deducts in the input signal of conversion transform domain.Therefore, the second converter unit institute's selections being transformed to transform domain can be provided for.Long-term forecasting circulation can also comprise after inverse quantization and before being inversely transformed into time domain, the long-term forecasting in transform domain be estimated to add feedback signal to.So, can use reverse self-adaptation long-term forecasting scheme, the program in the transform domain as illustrated, predicts the present frame through the input signal of filtering based on frame above.In order to more effective, (adapt) long-term forecasting scheme can be adapted in a different manner further, regard to that some example proposes as follows.

According to an embodiment, long-term forecasting unit comprises long-term forecasting extraction apparatus, and for determining lagged value, this value specifies the section of best-fit through the reconstruct of the signal through filtering of the present frame of the signal of filtering.Long-term prediction gain estimator can estimate the yield value of the signal of institute's selections of the signal be applied to through filtering.Preferably, so determine lagged value and yield value, to minimize the long-term forecasting related in perception territory the estimating distortion criterion with the difference of the input signal of conversion.When minimal distortion criterion, the linear prediction polynomial expression revised can be applied as MDCT territory EQ Gain curve.

Long-term forecasting unit can comprise converter unit, for the reconstruction signal of the section from LTP buffer zone is transformed to transform domain.For effectively realizing MDCT conversion, preferably, conversion is the discrete cosine transform of IV type.

Another aspect of the present invention relates to the audio decoder for the bit stream generated by the embodiment of scrambler above of decoding.Demoder according to an embodiment comprises quantifying unit, for removing the frame quantizing incoming bit stream based on scale factor; Inverse transformation block, for converting transform-domain signals inversely; For the linear prediction unit of transform-domain signals converted inversely described in filtering; And scale factor decoding unit, for based on the scale factor increment received (delta Δ) information, generate the described scale factor that uses in going to quantize, it encodes the described scale factor applied in described scrambler and the difference between the scale factor that generates based on the parameter of described sef-adapting filter.Demoder can further include scale factor determining unit, for based on the masking threshold curve of linear forecasting parameter deriving from present frame, generates scale factor.Scale factor decoding unit can combine the scale factor increment information received and the scale factor based on linear prediction generated, to generate the scale factor for being input to quantifying unit.

Comprise according to the demoder of another embodiment and go quantifying unit based on model, for removing the frame quantizing incoming bit stream; Inverse transformation block, for converting transform-domain signals inversely; And for filtering the linear prediction unit of the transform-domain signals converted inversely.Go quantifying unit can comprise non-removing quantizer and removing quantizer based on model based on model.

Preferably, quantifying unit is gone to comprise at least one adaptive probability model.Go quantifying unit can be configured to carry out self-adaptation as the function of the signal characteristic of transmission to go to quantize.

Go quantifying unit further based on the control data of the frame through decoding, quantization strategy can also be decided.Preferably, go quantified controlling data to receive together with bit stream, or derive from the data received.Such as, quantifying unit is gone to decide quantization strategy based on the transform size of frame.

According to another aspect, quantifying unit is gone to comprise self-adapting reconstruction point.Go quantifying unit can comprise uniform scalar and remove quantizer, they are configured to each quantized interval and use two to go to quantize reconstruction point, specifically, and mid point and MMSE reconstruction point.

According to an embodiment, quantifying unit and arithmetic coding is gone to use quantizer based on model in combination.

In addition, demoder can comprise as above for the many aspects disclosed in scrambler.Generally speaking, demoder is by the operation of mirror image (mirror) scrambler, although some operation only performs in the encoder, and does not have corresponding assembly in a decoder.So, if otherwise do not stated, the content disclosed in scrambler is regarded as also being applicable to demoder.

Aspect above of the present invention can realize as device, equipment, method or the computer program operated on programmable device.Aspect of the present invention can also realize with signal, data structure and bit stream further.

So, the application further discloses audio coding method and audio-frequency decoding method.Exemplary audio coding method comprises the following steps: based on sef-adapting filter filtered input signal; The frame of the described input signal through filtering is converted to transform domain; Quantize described transform-domain signals; Based on masking threshold curve, generate scale factor, for quantification described transform-domain signals time in described quantifying unit; Based on the parameter of described sef-adapting filter, estimate the scale factor based on linear prediction; And the difference between the described scale factor based on masking threshold curve of coding and the described scale factor based on linear prediction.

Another audio coding method comprises the following steps: based on sef-adapting filter filtered input signal; The frame of the described input signal through filtering is converted to transform domain; And quantize described transform-domain signals; Wherein said quantifying unit, based on input signal feature, determines to utilize and to encode described transform-domain signals based on the quantizer of model or the non-quantizer based on model.

Exemplary audio coding/decoding method comprises the following steps: based on scale factor, removes the frame quantizing incoming bit stream; Convert transform-domain signals inversely; The transform-domain signals converted inversely described in linear prediction filtration; Based on the parameter of described sef-adapting filter, estimate the second scale factor; And based on the scale factor difference information received and the second estimated scale factor, generate the described scale factor going to use in quantification.

Another audio coding method comprises the following steps: the frame quantizing incoming bit stream; Convert transform-domain signals inversely; And the transform-domain signals converted inversely described in linear prediction filtration; Wherein, go described in quantize to use non-removing quantizer and removing quantizer based on model based on model.

These are the preferred audio coding/decoding method of the application's instruction and the example of computer program, and person skilled in the art can from deriving additive method to the description of exemplary embodiment below.

Accompanying drawing explanation

Referring now to accompanying drawing, only to limit the scope of the invention or the mode of spirit as illustrated examples, present invention is described, wherein:

Fig. 1 shows the preferred embodiment according to encoder of the present invention;

Fig. 2 shows the more detailed view according to encoder of the present invention;

Fig. 3 shows another embodiment according to scrambler of the present invention;

Fig. 4 shows the preferred embodiment according to scrambler of the present invention;

Fig. 5 shows the preferred embodiment according to demoder of the present invention;

Fig. 6 shows the preferred embodiment according to MDCT line coding of the present invention and decoding;

Fig. 7 shows the preferred embodiment according to encoder of the present invention, and is transferred to another the example of related control data from one;

Fig. 7 a is another illustration of the aspect of scrambler according to an embodiment of the invention;

Fig. 8 shows the example of series of windows according to an embodiment of the invention between LPC data and MDCT data and relation;

Fig. 9 shows the combination according to scale factor data of the present invention and LPC data;

Fig. 9 a shows another embodiment of the combination according to scale factor data of the present invention and LPC data;

Fig. 9 b shows another simplified block diagram according to encoder of the present invention;

Figure 10 shows the preferred embodiment according to the present invention, LPC polynomial expression being converted to MDCT gain trace;

Figure 11 shows the preferred embodiment of constant renewal rate LPC Parameter Mapping to adaptive M DCT series of windows data according to of the present invention;

Figure 12 shows the preferred embodiment calculated according to the transform size based on quantizer of the present invention and type self adaption perceptual weighting filter;

Figure 13 shows the preferred embodiment that self-adaptation according to the present invention depends on the quantizer of frame sign;

Figure 14 shows the preferred embodiment that self-adaptation according to the present invention depends on the quantizer of frame sign;

Figure 15 shows the preferred embodiment that the function as LPC and LTP data according to the present invention carrys out adaptive quantizing step-length;

How Figure 15 a derives incremental rate curve by increment adaptation module from LPC and LTP parameter if showing;

Figure 16 shows according to the preferred embodiment utilizing the quantizer based on model of random offset of the present invention;

Figure 17 shows the preferred embodiment according to the quantizer based on model of the present invention;

Figure 17 a shows another preferred embodiment according to the quantizer based on model of the present invention;

Figure 17 b schematically illustrates the MDCT line demoder 2150 based on model according to an embodiment of the invention;

Figure 17 c shows the pretreated aspect of quantizer according to an embodiment of the invention;

Figure 17 d schematically illustrates the aspect of step-length according to an embodiment of the invention;

Figure 17 e schematically illustrates the entropy constrained scrambler based on model according to an embodiment of the invention;

Figure 17 f schematically illustrates the operation of uniform scalar quantizer (USQ);

Figure 17 g schematically illustrates probability calculation according to an embodiment of the invention;

Figure 17 h shows and according to an embodiment of the inventionly removes quantizing process;

Figure 18 shows the preferred embodiment controlled according to bit reservoir of the present invention;

Figure 18 a shows the key concept that bit reservoir controls;

Figure 18 b shows the concept controlled according to the bit reservoir of variable frame size of the present invention;

Figure 18 c shows the exemplary controlling curve controlled according to the bit reservoir of an embodiment;

Figure 19 shows a preferred embodiment of the inverse DCT of the different reconstruction point of use according to the present invention.

Embodiment

The embodiments described below are the explanation of the principle of audio coder of the present invention and demoder.Should be appreciated that, be obvious to the amendment of layout described herein and details and variant to those skilled in the art.Therefore, intention is only to be limited by the scope of appended Patent right requirement, and the detail of can't help wherein to present as the description of embodiment and explanation is limited.The similar assembly of embodiment is numbered by similar Reference numeral.

In FIG, scrambler 101 and demoder 102 is shown.Scrambler 101 obtains time domain input signal, and produces the bit stream 103 being sent to demoder 102 subsequently.Demoder 102, based on the bit stream 103 received, produces output waveform.Output signal is similar to original input signal in psychologic acoustics.

In fig. 2, a preferred embodiment of scrambler 200 and demoder 210 is shown.Input signal in scrambler 200 is passed through LPC(linear predictive coding) module 201, this module 201 is for having the LPC frame generation albefaction residue signal of the first frame length, and the linear forecasting parameter of correspondence.In addition, gain normalization can also be comprised in LPC module 201.From the residue signal of LPC by with the MDCT(Modified Discrete Cosine Tr ansform of the second variable frame length operation) module 202 converts frequency domain to.Depicted in figure 2 in scrambler 200, include LTP(long-term forecasting) module 205.In another embodiment of the present invention in detail, LTP will be described.MDCT line is quantized 203, is also gone quantification 204, so that when it is available to demoder 210 to the copy of the output of LTP buffer zone feeding through decoding.Due to quantizing distortion, this copy is called the reconstruct of corresponding input signal.Portion under figure 2, depicts demoder 210.Demoder 210 gets the MDCT line quantized, and they is gone quantification 211, adds the contribution from LTP module 214, and performs inverse MDCT conversion 212, is next LPC composite filter 213.

An importance of embodiment is above, MDCT frame is the unique base unit for encoding, although LPC has its oneself (and constant in one embodiment) frame sign, and LPC parameter of also encoding.This embodiment is from transform coder, and the fundamental forecasting introduced from speech coder and Shaping Module.As discussed later below, MDCT frame sign is variable, and by minimizing the perceptual entropy cost function of simplification, determining the best MDCT series of windows of whole piece, making it be applicable to input signal block.This can make convergent-divergent (scale) maintain Best Times/frequency control.In addition, the unified structure proposed avoids the switching of different encoding paradigm or the combination of layering.

In figure 3, than the part summarily describing scrambler 300 in more detail.The whitened signal exported from the LPC module 201 scrambler of Fig. 2 is imported into MDCT bank of filters 302.It can be optionally that the MDCT of time warp analyzes that MDCT analyzes, and the pitch of signal in MDCT mapping window constant (if signal is periodically and with clearly defined pitch) is guaranteed in this analysis.

In figure 3, compare and illustrate in detail LTP module 310.It comprises the LTP buffer zone 311 of the time domain samples of the reconstruct of the output signal section remained above.When given current input section, the optimum matching section in LTP buffer zone 311 searched by LTP extraction apparatus 312.From current be input in the section of quantizer 303 deduct this section before, by gain unit 313 to this section of suitable yield value of application.Obviously, in order to perform subtraction before a quantization, but selected signal segment is also transformed to MDCT territory by LTP extraction apparatus 312.When being combined by the MDCT territory incoming frame of the output signal Duan Yujing conversion before reconstruct, LTP extraction apparatus 312 search minimizes optimum gain and the lagged value of the error function in perception territory.Such as, optimize from the section of the reconstruct through conversion of LTP module 310 and square error (MSE) function between the incoming frame (that is, the residue signal after subtraction) of conversion.This optimization can perform in perception territory, there according to their perceptual importance, and weighted frequency component (that is, MDCT line).LTP module 310 operates in MDCT frame unit, and scrambler 300 once considers that a MDCT frame is remaining, such as, for the quantification in quantization modules 303.Delayed and gain search can be performed in perception territory.Optionally, LTP can select frequency, that is, to frequency self-adaption gain and/or delayed.Depict inverse quantization unit 304 and inverse MDCT unit 306.As explained later, MDCT can be time warp.

In the diagram, another embodiment of scrambler 400 is shown.Except Fig. 3, for illustrating and including lpc analysis 401.Show the DCT-IV conversion 414 for selected signal segment being transformed to MDCT territory.In addition, also show the several means calculating and carry out the least error of LTP section selection.Except the minimizing of residue signal as shown in Figure 4 (being designated LTP2 in the diagram), also show and transformed to the time-domain signal of reconstruct inversely so that the minimizing (being expressed as LTP3) of the difference between the MDCT territory signal quantized through the input signal of conversion and going before being stored in LTP buffer zone 411.This MSE minimum of a function by LTP contribution is guided to through conversion input signal and be used for the best (as much as possible) similarity of input signal of the reconstruct be stored in LTP buffer zone 411.Another substitution error function (being expressed as LTP 1) is based on the difference of these signals in time domain.In the case, the MSE between the time domain reconstruction of the correspondence in the incoming frame of LPC filtering and LTP buffer zone 411 is minimized.Preferably, MSE calculates based on MDCT frame sign, and MDCT frame sign can be different from LPC frame sign.In addition, quantizer and go quantiser block to be replaced by spectrum coding block 403 and frequency spectrum decoding block 404(" Spec enc " and " Spec dec "), they can comprise the extra module except quantification, as depicted in figure 6.Again, MDCT and inverse MDCT can be time warp (WMDCT, IWMDCT).

In Figure 5, proposed demoder 500 is shown.Frequency spectrum data from the bit stream received is quantized 511 inversely, and adds (add) and contribute from the LTP provided by LTP extraction apparatus of LTP buffer zone 515.Also show the LTP extraction apparatus 516 in demoder 500 and LTP gain unit 517.The MDCT line amounted to is synthesized to time domain by MDCT Synthetic block, and time-domain signal is carried out frequency spectrum shaping by LPC composite filter 513.

In figure 6, " Spec dec " and " Spec enc " block 403,404 that describe in detail Fig. 4 is compared.In one embodiment, " Spec enc " block 603 shown by the right of this figure comprises harmonic prediction analysis module 610, TNS analyzes (time-domain noise reshaping) module 611, next being the scale factor Zoom module 612 of MDCT line, is finally quantification and the coding of the line in line of codes module 613.Demoder " Spec Dec " block 604 shown by the left side of this figure performs inverse process, that is, the MDCT line received is gone to quantize in decoding line module 620, and is that scale factor (SCF) Zoom module 621 cancels convergent-divergent.Application TNS synthesis 622 and harmonic prediction synthesis 623.

In the figure 7, the very general illustration of coded system of the present invention is depicted.Example encoder gets input signal, and produces bit stream, except other data, also comprises:

The MDCT line quantized;

Scale factor;

LPC polynomial repressentation;

Signal segment energy (such as, signal variance);

Series of windows;

LTP data.

Demoder according to embodiment reads the bit stream provided, and produces the audio output signal being similar to original signal in psychologic acoustics.

Fig. 7 a is another illustration of the aspect of scrambler 700 according to an embodiment of the invention.Scrambler 700 comprises LPC module 701, MDCT module 704, LTP module 705(only schematically illustrate), quantization modules 703 and for will the signal feedback of reconstruct to the inverse quantization module 704 of LTP module 705.Further provide the pitch estimation module 750 of the pitch for estimating input signal, and for determining the series of windows determination module 751 of best MDCT series of windows (such as, 1 second) for larger input signal block.In this embodiment, MDCT series of windows is determined based on open-loop method, in the method, determines to minimize coding cost function, the sequence of the MDCT window size candidate of the perceptual entropy such as simplified.When searching for best MDCT series of windows, optionally can consider that LTP module 705 is to by the contribution of the minimized coding cost function of series of windows determination module 751.Preferably, for the window size candidate that each has been assessed, determine the best long-term forecasting contribution for the MDCT frame corresponding to window size candidate, and estimate to encode accordingly cost.Generally speaking, short MDCT frame sign is more suitable for phonetic entry, and the long mapping window with meticulous spectral resolution is preferred for sound signal.

Perception weight or perceptual weighting function determine based on the LPC parameter calculated by LPC module 701, will be described in more detail below.Perception weight is provided to the LTP module 705 both operated in MDCT territory and quantization modules 703, so that according to the error of their corresponding perceptual importance weighted frequency component or distortion contribution (contribution).Which coding parameter Fig. 7 a also show preferably by after a while the suitable encoding scheme discussed being transferred to demoder.

Next, the coexisting and the simulation of effect of LPC in MDCT, both in order to retroaction and actual filtering are omitted of LPC and MDCT data will be discussed.

According to an embodiment, LP modular filtration input signal, so that the spectrum shape removing signal, the output subsequently of LP module is the signal of spectral flatness.This operation for such as LTP is favourable.But, can benefit from what the spectrum shape knowing original signal before carrying out LP filtering is to other parts of the codec that the signal of spectral flatness operates.Because coder module after the filtering, the MDCT conversion of the signal of spectral flatness is operated, the spectrum shape that present invention teaches original signal before carrying out LP filtering is passable, if necessary, by by the transition function of used LP wave filter (namely, the spectrum envelope of original signal) be mapped to gain trace or equalizer curve that the frequency (bin) that represents the MDCT of spectral flatness signal applies, represented by the MDCT again putting on spectral flatness signal.On the contrary, LP module can omit actual filtering, and only estimates the transition function being mapped to gain trace subsequently, and the MDCT that this gain trace can be applied in signal represents, as this eliminated the necessity of input signal being carried out to time-domain filtering.

The outstanding aspect of of various embodiments of the present invention uses Application of Splitting Window (segmentation) flexibly to operate the transform coder based on MDCT to LPC whitened signal.In fig. 8 to this has been description, in the figure, together with the windowing of LPC, give exemplary MDCT series of windows.Therefore, can clearly be seen that from this figure, LPC operates constant frame sign (such as, 20ms), and MDCT operates variable window sequence (such as, 4 to 128ms).This allows independently for LPC selects optimal window length, and is that MDCT selects best window sequence.

Fig. 8 also show the relation between LPC data and MDCT data, and these LPC data are specially the LPC parameter generated with the first frame rate, and these MDCT data are specially the MDCT line generated with the second variable bit rate.Downward arrow in this figure represents the LPC data be interpolated between LPC frame (circle), so that the MDCT frame that coupling is corresponding.Such as, be the such as determined time instance of MDCT series of windows, the perceptual weighting function that interpolation LPC generates.

Arrow representative is upwards used for the refining data (that is, control data) of MDCT line coding.For AAC frame, these data normally scale factor, and for ECQ frame, these data are variance correction data etc. normally.Solid line is to represented by dotted arrows when given a certain quantizer, and which data is " important " data for MDCT line coding.Two-way lower arrow represents codec spectral line.

Coexisting of LPC and the MDCT data in scrambler can being utilized, such as, carrying out the perceptual mask curve by considering estimated by LPC parameter, reduce the bit requirement of coding MDCT scale factor.In addition, when determining quantizing distortion, the perceptual weighting that LPC derives can also be used.As shown in the figure also as below by discussion, depend on the frame sign of the data received, namely correspond to MDCT frame or window size, quantizer operates in two modes, and generates the frame (ECQ frame and AAC frame) of two types.

Figure 11 shows the preferred embodiment of constant rate of speed LPC Parameter Mapping to adaptive M DCT series of windows data.LPC mapping block 1100 receives LPC parameter according to LPC renewal rate.In addition, LPC mapping block 1100 also receives the information about MDCT series of windows.Then, it generates the mapping of LPC to MDCT, such as, for the psychoacoustic data based on LPC being mapped to the corresponding MDCT frame generated with variable MDCT frame rate.Such as, LPC mapping block interpolation LPC polynomial expression or correspond to the related data of time instance of MDCT frame, is used as such as, the perception weight in LTP module or quantizer.

Now, by reference to Fig. 9, the details of the sensor model based on LPC is discussed.In one embodiment of the invention, self-adaptation LPC module 901, with by for 16kHz sampling rate signal, uses the linear prediction on such as rank 16, produces white output signal.Such as, the output from LPC module 201 in Fig. 2 is carrying out the remnants after LPC parameter estimation and filtering.As the lower left quarter at Fig. 9 summarily illustrate estimated by LPC polynomial expression A (z), can be warbled by bandwidth expansion factor, in one of the present invention realizes, can also by polynomial first reflection coefficient of LPC of amendment correspondence, tilted (tilt).By being moved in unit circle by polynomial limit, the bandwidth of the peak value can expanded in LPC transition function of warbling, so causes softer peak value.Inclination can make LPC transition function more flat, to balance the impact of lower and higher frequency.These amendments make every effort to from the LPC parameter estimated generate by the encoder both sides of system can perceptual mask curve A ' (z).The details of the polynomial manipulation of LPC is presented in Figure 12 below.

To the MDCT coding of LPC remnants operation, in one of the present invention realizes, there is the scale factor controlling the resolution of quantizer or quantization step (and so, the noise by quantizing to introduce).These scale factors are estimated by scale factor estimation module 960 pairs of original input signals.Such as, scale factor is derived from the perceptual mask threshold curve estimated according to original signal.In one embodiment, independent frequency transformation (different frequency resolutions may be had) can be used to determine masking threshold curve, but this is always unrequired.Alternatively, according to the MDCT line generated by conversion module, masking threshold curve can be estimated.The right lower quadrant of Fig. 9 schematically illustrates the scale factor generated by scale factor estimation module 960, quantizes for controlling, and is only limitted to inaudible distortion with the quantizing noise that toilet is introduced.

If LPC wave filter is connected to the upstream of MDCT conversion module, then whitened signal is transformed to MDCT territory.Because this signal has white spectrum, therefore, be not too applicable to deriving perceptual mask curve from it.So, when estimating masking threshold curve and/or scale factor, the MDCT territory EQ Gain curve of the albefaction for compensation spectrum of generation can be used.This is because, need to the signal of the absolute spectral properties with original signal to estimate scale factor, correctly to estimate sensorial sheltering.Below with reference to Figure 10 than discussing from LPC polynomial computation MDCT territory EQ Gain curve in more detail.

Depict the scale factor summarized in Fig. 9 a above and estimate a graphic embodiment.In this embodiment, input signal is imported into the LP module 901 estimated by the spectrum envelope of the input signal described by A (z), and exports the version through filtering of described polynomial expression and input signal.Utilize the inverse of A (z) to carry out filtering to input signal, so as to obtain as scrambler other parts the frequency spectrum white signal that uses.Through the signal of filtering be imported into MDCT converter unit 902, and A (z) polynomial expression is imported into MDCT gain trace computing unit 970(as depicted in figure 14).To MDCT coefficient or line application from the gain trace of LP Polynomial Estimation, to retain the spectrum envelope of original input signal before carrying out scale factor estimation.MDCT line through Gain tuning is imported into the scale factor estimation module 960 into input signal estimation scale factor.

By the method summarized above use, the packet transmitted between encoder contains LP polynomial expression and normally used scale factor in transform coding and decoding device, when using the quantizer based on model, relevant perception information and signal model can be derived from LP polynomial expression.

Specifically, turn back to Fig. 9, the LPC module 901 in this figure carrys out spectrum envelope A (z) of estimated signal from input signal, and from then on derives perception expression A'(z).In addition, input signal is estimated to the scale factor usually used in the perceptual audio codecs based on conversion, or, if consider the transition function of LP wave filter in scale factor is estimated, also their (described in contexts of Figure 10 below) can be estimated to the white signal produced by LP wave filter.Then, can in the polynomial situation of given LP, self-adaptation scale factor in scale factor adaptation module 961, as below summarize, to reduce the bit rate of transmission needed for scale factor.

Usually, scale factor is transferred to demoder, and LP polynomial expression is also like this.Now, assuming that they both from original input signal estimate, and they are both associated with the absolute spectral properties of original input signal to a certain extent, propose coding increment between the two to express, to eliminate any redundancy that may produce when both separately transmit.According to an embodiment, this is utilized to associate as follows.Due to LPC polynomial expression, after correctly being warbled and being tilted, make every effort to represent masking threshold curve, therefore, two kinds of expression can be combined, so that the scale factor desired by the scale factor transmitted representative of transform coder and the difference between those scale factors can derived from the LPC polynomial expression transmitted.Therefore, scale factor adaptation module 961 as shown in Figure 9 calculates the difference between scale factor that the scale factor desired by generating from original input signal and LPC derive.Remain the ability had in LPC structure the quantizer based on MDCT (this quantizer has the concept of the normally used scale factor of institute in transform coder) that LPC remnants operate in this respect, and still have and be switched to only from the possibility of the quantizer based on model of Linear Prediction Data derivation quantization step.

In figure 9b, the simplified block diagram of the encoder according to an embodiment is given.Input signal in scrambler is passed through the LPC module 901 generating albefaction residue signal and corresponding linear forecasting parameter.In addition, gain normalization can also be comprised in LPC module 901.Residue signal from LPC is converted to frequency field by MDCT conversion 902.On the right of Fig. 9 b, depict demoder.Demoder gets the MDCT line quantized, and they is gone quantification 911, and the inverse MDCT conversion 912 of application, be next LPC synthetic filtering 913.

The whitened signal exported from the LPC module 901 scrambler of Fig. 9 b is imported into MDCT bank of filters 902.MDCT line is analyzed due to MDCT, be utilized the different part comprised for MDCT frequency spectrum guide desired by the Transform Coding Algorithm of sensor model of quantization step and transition coding.Determine that the value of quantization step is called " scale factor ", for each subregion of the scalefactor bands by name of MDCT frequency spectrum, have a scale factor value.In prior art Transform Coding Algorithm, scale factor is transferred to demoder by bit stream.

According to an aspect of the present invention, when the scale factor used in coded quantization, use as with reference to the perceptual mask curve from LPC parameter estimation illustrated by figure 9.Estimate that the another kind of possibility of perceptual mask curve is the estimation for the energy distribution on MDCT line, use unmodified LPC filter factor.Utilize this energy budget, can in both encoder application as in transition coding scheme the psychoacoustic model that uses, to obtain the estimation of masking curve.

Then, two of masking curve kinds of expression are combined, so that the scale factor desired by the scale factor that will transmit representative of transform coder and can from the difference between the LPC polynomial expression transmitted or those scale factors of deriving based on the psychoacoustic model of LPC.This feature remains the ability in LPC structure with the quantizer based on MDCT (this quantizer to have in transform coder the concept of normally used scale factor) operated LPC remnants, and still has the possibility controlling quantizing noise according to the psychoacoustic model of transform coder based on each scalefactor bands.Advantage is, compared with the LPC data existed with transmission absolute ratio's factor values and not considering, the difference of transmission scale factor is by bit less for cost.Depend on bit rate, frame sign or other parameters, the amount of the scale factor remnants that will transmit can be selected.For having the control completely to each scalefactor bands, suitable noiseless coding scheme can be utilized to transmit scale factor increment.In other cases, the cost for transmitting scale factor can reduce further in the more rough expression of passing ratio factor difference.The special circumstances with minimum expense are when being all set to 0 for all frequency band ratio factor difference, and when not transmitting extra information.

Figure 10 shows the preferred embodiment according to the present invention, LPC polynomial expression being converted to MDCT gain trace.As depicted in Figure 2, MDCT operates the whitened signal of carrying out albefaction by LPC wave filter 1001.In order to retain the spectrum envelope of original input signal, calculate MDCT gain trace by MDCT gain trace module 1070.For the frequency represented by the point (bin) in being converted by MDCT, by estimating the amplitude response of the spectrum envelope described by LPC wave filter, MDCT territory EQ Gain curve can be obtained.Then, can to MDCT market demand gain trace, such as, when calculating least mean-square error as depicted in fig. 3, or when as above with reference to figure 9 the estimation described for carry out scale factor and determine perceptual mask curve time.

Figure 12 shows the preferred embodiment calculated based on the transform size of quantizer and/or type self adaption perceptual weighting filter.LP polynomial expression A (z) is estimated by the LPC module 1201 in Figure 16.LPC parameter adapting module 1271 receives the LPC parameter of such as LPC polynomial expression A (z), and generates perceptual weighting filter A'(z by amendment LPC parameter).Such as, the bandwidth of expansion LPC polynomial expression A (z), and/or this polynomial expression that tilts.Being input to that self-adaptation warbles with the parameter of tilt module 1272 is give tacit consent to warble and tilting value ρ and γ.When given pre-defined rule, based on used transform size, and/or the quantization strategy Q used, revise these values.Modifiedly to warble and tilt parameters ρ ' and γ ' are imported into LPC parameter adapting module 1271, the input signal spectrum envelope represented by A (z) is converted to by A'(z by this module 1271) represented by perceptual mask curve.

Below, the quantization strategy depending on frame sign according to an embodiment of the invention will be described, and depend on the quantification based on model of parameter of classification.One aspect of the present invention is, it utilizes different quantization strategies for different transform size or frame sign.This is shown in Figure 13, and in the figure, frame sign is used as using based on the quantizer of model or the Selection parameter of the non-quantizer based on model.It should be noted that other aspects independent of disclosed encoder/decoder, this quantification aspect, and also can be applied in other codecs.An example of the non-quantizer based on model is the quantizer based on huffman code table used in AAC audio coding standard.Quantizer based on model can be the entropy constrained quantizer (ECQ) using arithmetic coding.But, also can use other quantizers in various embodiments of the present invention.

According to an independent aspects of the present invention, propose when given particular frame size, the function as frame sign carries out switching can use optimal quantization strategy between different quantization strategies.Exemplarily, series of windows can specify the very static tonal sound period for signal, uses long conversion.For this signal specific type, use long conversion, the quantization strategy using " rareness " character (that is, well-defined discrete tone) that can utilize in signal spectrum is very useful.By the quantization method used in AAC with also as in AAC the huffman code table that uses and spectral line group combine, be highly profitable.But on the contrary, for voice segments, when the coding gain of given LTP, series of windows can specify to use short conversion.For this signal type and transform size, use and do not attempt to search or introduce the rareness in frequency spectrum, but the quantization strategy but maintaining wide band energy (when given LTP, will retaining the pulse as the character of original input signal) is useful.

Figure 14 gives the more generally diagram of this concept, and in the figure, input signal is converted into MDCT territory, is quantized subsequently by the quantizer controlled by the transform size converted for MDCT or frame sign.

According to another aspect of the present invention, the function as LPC and/or LTP data carrys out adaptive quantizer step-length.This permission determines step-length according to the difficulty of frame, and the quantity that control is allocated for the bit that frame is encoded.In fig .15, the illustration being controlled the quantification based on model about how by LPC and LTP data is given.At the top of Figure 15, give the signal diagram of MDCT line.Below, the quantization step increment Delta of the function as frequency is depicted.From then on particular example is clear that very much, and quantization step increases along with frequency, that is, for higher frequency, can produce more quantizing distortion.Incremental rate curve is derived from LPC and LTP parameter by the increment adaptation module described in Figure 15 a.Illustrated by with reference to Figure 13, incremental rate curve can also be derived further by warbling and/or tilting from prediction polynomial expression A (z).

The preferred perceptual weighting function deriving from LPC data is given in equation below:

P (z) = \frac{1 - (1 - τ) r_{1} z^{- 1}}{A (z / ρ)}

Wherein, A (z) is LPC polynomial expression, and τ is tilt parameters, and ρ controls to warble, and r ₁it is the first reflection coefficient gone out according to A (z) polynomial computation.It should be noted that and for the classification of different expression formulas, then A (z) polynomial expression can be calculated, to extract relevant information from polynomial expression.If someone is interested in spectrum slope, to apply " inclination " to resist the slope of frequency spectrum, then it is preferred for polynomial expression being calculated as again reflection coefficient, because the first reflection coefficient represents the slope of frequency spectrum.

In addition, as input signal variances sigma, LTP gain g and the polynomial first reflection coefficient r of prediction can also be derived from ₁function, auto-adaptive increment value Δ.Such as, self-adaptation can based on equation below:

Δ′＝Δ(1+r ₁(1-g ²))

Below, each side of the quantizer based on model according to an embodiment of the invention is outlined.In figure 16, an aspect of each side of the quantizer based on model is shown.Use uniform scalar quantizer, MDCT line is input to quantizer.In addition, also random offset is input to quantizer, and the off-set value of quantized interval used as moving section border.The quantizer proposed provides vector quantization advantage, and maintains the search property of scalar quantizer.Quantizer carries out iteration to a different set of off-set value, and for these off-set values, calculates quantization error.The off-set value (or off-set value vector) that use minimizes quantizing distortion for the specific MDCT line be quantized quantizes.Then, off-set value is transferred to demoder together with the MDCT line to quantize.The use of random offset, going to introduce noise filling in the decoded signal quantized, by doing like this, avoids the spectral hole quantized in frequency spectrum.Otherwise quantized to the low bit rate of null value, this particular importance for wherein many MDCT lines, null value will cause there is audible defect in the frequency spectrum of the signal of reconstruct.

Figure 17 schematically illustrates the MDCT line quantizer (MBMLQ) based on model according to an embodiment of the invention.The top of Figure 17 depicts MBMLQ scrambler 1700.MBMLQ scrambler 1700 is using the MDCT line (if LTP is present in system) of the MDCT line in MDCT frame or LTP remnants as input.MBMLQ uses the statistical model of MDCT line, makes source code to be adapted to signal attribute by based on MDCT frame, produces the effective compression of bit stream.

Can estimate the local gain of MDCT line as the RMS value of MDCT line, and before being imported into MBMLQ scrambler 1700, MDCT line is normalized in gain normalization module 1720.Local gain normalization MDCT line, and be supplementing LP gain normalization.The signal level variation put on when LP gain is adapted to larger, and target change when local gain is adapted to less, can improve the beginning (on-sets) in the quality of transient and voice.Local gain is encoded by fixed rate or variable rate encoding, and is transferred to demoder.

The quantity that Rate control module 1710 controls for the bit of encoding to MDCT frame can be used.Speed control characteristic controls the quantity of the bit used.Speed control characteristic points to the list of specified quantiser step size.Can be undertaken sort (see Figure 17 g) by the descending his-and-hers watches of step-length.

Utilize one group of different rates control characteristic to run MBMLQ scrambler, for frame, produce the speed control characteristic of the bit count of the quantity of the bit lower than the license being controlled to provide by bit reservoir.Speed control characteristic changes at leisure, and this can be used to reduce complexity of searching, and effectively encodes to index.If the index tested around MDCT frame above starts, then can reduce this class index of test.Equally, if probability reaches peak value around the last value of index, then the effective entropy code of this index is obtained.Such as, for the list of 32 step-lengths, each MDCT frame of average out to 2 bits can be used to carry out code rate control characteristic.

Figure 17 also summarily illustrates MBMLQ demoder 1750, and in the figure, if having estimated local gain in scrambler 1700, then MDCT frame is by gain again normalization.

Figure 17 a schematically illustrates the MDCT line scrambler 1700 based on model according to an embodiment.It comprises quantizer pretreatment module 1730(see Figure 17 c), based on the entropy constrained scrambler 1740(of model see Figure 17 e), and can be the arithmetic encoder 1720 of arithmetic encoder of prior art.The task of quantizer pretreatment module 1730 is by by making MBMLQ scrambler self-adapting signal statistical information based on MDCT frame.It gets other codecs parameter as input, and from their derive relevant signals can be used for revise based on the useful statistical information of the behavior of the entropy constrained scrambler 1740 of model.Based on the entropy constrained scrambler 1740 of model, such as, controlled by one group of controling parameters: quantiser step size Δ (increment, gap length), one group of variance evaluation V(vector of MDCT line; Each MDCT line, an estimated value), perceptual mask curve P _mod, the matrix that (at random) offsets or table, and describe the shape of distribution of MDCT line and the statistical model of the MDCT line of their relation of interdependence.All controling parameters referred to above can change between each MDCT frame.

Figure 17 b schematically illustrates the MDCT line demoder 1750 based on model according to an embodiment of the invention.It is fetched from the side information bit of bit stream as input, and they is decoded as the parameter (see Figure 17 c) being imported into quantizer pretreatment module 1760.Quantizer pretreatment module 1760 preferably has and the identical function in demoder 1750 in scrambler 1700.Be imported into the parameter of quantizer pretreatment module 1760 in the encoder with identical in a decoder.Quantizer pretreatment module 1760 exports one group of controling parameters (identical with in scrambler 1700), and these controling parameters are input to probability evaluation entity 1770(see Figure 17 g; With identical in the encoder, see Figure 17 e), and be input to quantization modules 1780(see Figure 17 h; With identical in the encoder, see Figure 17 e).When the variance of given increment for quantizing and signal, the cdf table of the probability density function of all MDCT lines of the representative from probability evaluation entity 1770, be imported into arithmetic decoder (can be any arithmetic encoder be known to those skilled in the art), then, MDCT line bit is decoded as MDCT linear index by this arithmetic decoder.Then, by going quantization modules 1780 to be gone to be quantified as MDCT line by MDCT linear index.

Figure 17 c schematically illustrates the pretreated aspect of quantizer according to an embodiment of the invention, comprises i) step size computation, ii) perceptual mask curve modification, iii) MDCT line variance evaluation, iv) offset table structure.

Step size computation is illustrate in greater detail in Figure 17 d.It comprises i) table inquiry, wherein, produces specified step delta to the speed control characteristic point in the table of step-length _nom(delta_nom), low-yield self-adaptation, and iii) high pass self-adaptation.

Gain normalization causes high-energy sound and low-yield sound to utilize same section SNR to encode usually.This can cause too much bit number for acoustically low-yield.The low-yield self-adaptation proposed allows during between low-yield and high-energy sound, refinement (fine) regulates.When signal energy is as at Figure 17 d-ii) in describe step-down time, can step-length be increased, in these figures, show signal energy (gain g) and controlling elements q _lebetween the exemplary curve of relation.Signal gain g can calculate as input signal itself or the remaining RMS value of LP.Figure 17 d-ii) in controlling curve be an example, other controlling functions of the step-length for increasing low-yield signal can be used.In described example, controlling functions is by by threshold value T ₁and T ₂and the step factor L progressively linear segments that defines is determined.

High pass sound perception does not have low pass sound important.When MDCT frame is high pass, that is, when the energy of the signal in this MDCT frame is concentrated to upper frequency, high pass adaptation function increases step-length, causes the bit of cost less on this frame.And if if LTP exists LTP gain gLTP close to 1, then LTP remnants can become high pass; In this case, it is favourable for not increasing step-length.At Figure 17 d-iii) in depict this mechanism, wherein, r is the first reflection coefficient from LPC.The high pass self-adaptation proposed can use equation below:

Figure 17 c-ii) schematically illustrate the perceptual mask curve modification using low frequency (LF) lifting to remove the Coding artifacts of " being similar to rumble ".Bass boost can be fixing, or makes it be adaptive, only to promote the part below lower than the first spectrum peak.Self-adaptation bass boost can be carried out by using LPC envelope data.

Figure 17 c-iii) schematically illustrate MDCT line variance evaluation.When the activity of LPC prewhitening filter, all MDCT lines all have unit variance (according to LPC envelope).After based on the perceptual weighting in the entropy constrained scrambler 1740 of model (see Figure 17 e), MDCT line has the masking curve P as square perceptual mask curve or square amendment _modinverse variance.If there is LTP, then it can reduce the variance of MDCT line.At Figure 17 c-iii) in, depict the mechanism making estimation variance self-adaptation LTP.The figure shows the Modification growth function q in frequency f _lTP.Modified variance can pass through V _lTPmod=Vq _lTPdetermine.Value L _lTPcan be the function of LTP gain, if so that LTP gain (represents the coupling that LTP has found) around 1, then L _lTPcloser to 0, and if LTP gain around 0, then L _lTPcloser to 1.The variance V={v proposed ₁, v ₂..., V _j..., v _nlTP self-adaptation only affect lower than a certain frequency (f _lTPcutoff) MDCT line.As a result, reduce lower than cut-off frequency f _lTPcutoffmDCT line variance, LTP gain is depended in this reduction.

Figure 17 c-iv) schematically illustrate offset table structure.Nominal offset table is the matrix of filling by the pseudo random number be distributed between-0.5 and 0.5.The quantity of the row in matrix equals the quantity of the MDCT line of being encoded by MBMLQ.The quantity of row is adjustable, and equals the quantity (see Figure 17 e) of the offset vector of test in optimizing based on the RD in the entropy constrained scrambler 1740 of model.Offset table structure function along with quantiser step size convergent-divergent nominal offset table so that offset-between Δ/2 and+Δ/2 distribute.

Figure 17 g schematically illustrates an embodiment of offset table.Offset index is the pointer of Compass, and selects selected offset vector O={o ₁, o ₂..., o _n..., O _n, wherein N is the quantity of the MDCT line in MDCT frame.

As described below, the means provided for carrying out noise filling are offset.If there is low variance v for compared with quantiser step size Δ _jmDCT line skew distribution be limited, then obtain better target and perceived quality.At Figure 17 c-iv) in, describe an example of such restriction, in the figure, k ₁and k ₂it is regulating parameter.The distribution of skew can be uniform, and is distributed between-s and+s.Border s can determine according to following formula:

For low variance MDCT line (wherein, v _jlittle compared with Δ), it is favourable for making distribution of offsets uneven and depending on signal.

Figure 17 e schematically illustrates the entropy constrained scrambler 1740 based on model.By the MDCT line utilizing the value of perceptual mask curve (preferably, deriving from LPC polynomial expression) to split input, sensuously they are being weighted, are causing the MDCT line vector y=(y of weighting ₁..., y _n).The target of encoding subsequently introduces white quantizing noise to the MDCT line in perception territory.In a decoder, apply the inverse of perceptual weighting, this can cause the quantizing noise following perceptual mask curve.

First, the iteration to random offset is summarized.For the every a line j in excursion matrix, perform following operation: quantize each MDCT line by skew uniform scalar quantizer (USQ), wherein, each quantizer is all offset (offset) by its oneself the single deflection value obtained from offset row vector.

The probability (see Figure 17 g) in the minimum distortion interval from each USQ is calculated in probability evaluation entity 1770.USQ index is entropy code.As shown in Figure 17 e, calculate the cost of the quantity according to the bit needed for encoding to index, produce the long R of theoretical code word _j.The overload border of the USQ of MDCT line j can be as calculate, wherein, k ₃any numeral suitably can be selected as, such as, 20.Overload border is the border that in amplitude, quantization error is greater than the half of quantization step.

By the scalar reconstruction value (see Figure 17 h) of going quantization modules 1780 to calculate each MDCT line, produce the MDCT vector quantized optimize in module 1790 at RD, calculated distortion can be square error (MSE), or the distortion measure that another kind is sensuously more relevant, such as, based on perceptual weighting function.Specifically, together to MSE and y and between energy do not mate the distortion measure be weighted and come in handy.

Optimize in module 1790, preferably, based on distortion D at RD _jand/or the long R of theoretical code word of every a line j in excursion matrix _j, assess the cost C.The example of cost function is C=10*log ₁₀(D _j)+λ * R _j/ N.Select the skew minimizing C, and export corresponding USQ exponential sum probability from the entropy constrained scrambler 1780 based on model.

RD optimizes and can optionally be improved further by other attributes changing quantizer together with skew.Such as, replace each offset vector for test in RD optimization, use identical fixing variance evaluation V, variance evaluation vector V can be changed.For offset row vector m, variance evaluation k can be used _mv, wherein, k _mthe line number m=(excursion matrix can be changed to from m=1 along with m) and across, such as, scope 0.5 to 1.5.Change during this makes entropy code and MMSE calculating add up the input signal that statistical model can not be caught is not too responsive.Generally speaking, this can cause lower cost C.

Can by use as in Figure 17 e the residual quantization device described, the MDCT line of quantification is removed in further refinement.Residual quantization device can be, such as, and fixed rate random vector quantizer.

Figure 17 f schematically illustrates the operation of the uniform scalar quantizer (USQ) for quantizing MDCT line n, the figure shows to be in have index i _nthe value of MDCT line n in minimum distortion interval." x " mark represents the center (mid point) with the quantized interval of step delta.The initial point of scalar quantizer is from offset vector O={o ₁, o ₂..., o _n..., o _nmove skew o _o.So, interval border and mid point move this skew.

The use of skew introduces the noise filling of scrambler control in quantized signal, by doing like this, avoids the spectral hole quantized in frequency spectrum.In addition, skew also by providing one group than the coding replacement scheme of cubic lattice more effectively packing space, improves code efficiency.Equally, skew also provides change in the probability tables calculated by probability evaluation entity 1770, and this can cause the more effective entropy code to MDCT linear index (that is, required less bit).

Use variable step size Δ (increment) to allow to quantize there is variable accuracy, so that for sensuously important sound, larger accuracy can be used, and for secondary sound, less accuracy can be used.

Figure 17 g schematically illustrates the probability calculation in probability evaluation entity 1770.Statistical model, quantiser step size Δ, Variance Vector V, the offset index being suitable for MDCT line to the input of this module, and offset table.The output of probability evaluation entity 1770 is cdf tables.For each MDCT line x _j, assessment statistical model (that is, probability density function, pdf).Area under the pdf function of an interval i is the Probability p in this interval _i,j.This probability is used for the arithmetic coding of MDCT line.

Figure 17 h schematically illustrate as, such as go in quantization modules 1780 perform remove quantizing process.With the mid point x in interval _mptogether, the barycenter (MMSE value) in the minimum distortion interval of each MDCT line is calculated .Consider the N n dimensional vector n quantizing MDCT line, scalar MMSE value is suboptimum, and generally speaking, too low.This causes the variance in the output of decoding to be lost and frequency spectrum imbalance.This problem can retain decoding by the variance such as described by Figure 17 h and alleviate, and wherein, reconstruction value calculates as the weighted sum of MMSE value and midrange.Further optional improvement is adaptive weighting, to prevail for speech MMSE value, and prevails for non-voice mid point.This can produce voice more clearly, and remains spectral balance and energy for non-voice.

Variance according to an embodiment of the invention retains decoding by determining that reconstruction point obtains according to equation below:

x _dequant＝(1-χ)x _MMSE+x _MP

Self-adaptation variance retains decoding can determine interpolation factor based on rule below:

Adaptive weighting can also further, such as, and the function g of LTP prediction gain _lTP: X=f (g _lTP).Adaptive weighting changes at leisure, and can effectively be encoded by recurrence entropy code.

In probability calculation (Figure 17 g) and going quantizes, (Figure 17 h) MDCT line statistical model of using will the statistical information of reflection real signal.In a version, statistical model hypothesis MDCT line is independently, and is laplacian distribution.MDCT line is modeled as independent Gaussian by another version.MDCT line is modeled as gauss hybrid models by a version, comprises the relation of interdependence between the MDCT line between in MDCT frame and MDCT frame.Another version makes statistical model be adaptive to line signal statistics.Self-adaptation statistical model can be forward direction and/or oppositely adaptive.

Figure 19 schematically illustrates the another aspect of the present invention of the reconstruction point of the amendment relating to quantizer, in the figure, depicts the inverse DCT used in the demoder of an embodiment.This module except the normal input of inverse DCT, that is, outside the line of quantification and information about quantization step (quantification type), also has the information of the reconstruction point about quantizer.As the quantification index i according to correspondence _ndetermine the value reconstructed time, the inverse DCT of this embodiment can use polytype reconstruction point.Reconstruction value as mentioned above be further used for, such as, in MDCT line scrambler (see Figure 17), determine the quantized residual being input to residual quantization device.In addition, also perform in inverse DCT 304 and quantize reconstruct, so that the MDCT frame of reconstruct coding, for (see Fig. 3) in LTP buffer zone, and certainly in demoder.

Inverse DCT is passable, such as, select the mid point of quantized interval as reconstruction point, or MMSE reconstruction point.In one embodiment of the invention, the reconstruction point of quantizer is selected as the mean value between center and MMSE reconstruction point.Generally speaking, reconstruction point can in be inserted between mid point and MMSE reconstruction point, such as, depend on the signal attribute of such as signal period property and so on.Signal period property information is passable, such as, derive from LTP module.This feature can make Systematical control distortion and energy preserve.Center reconstruction point will guarantee that energy is preserved, and MMSE reconstruction point will guarantee minimum distortion.When Setting signal, reconstruction point can be adapted to and provide optimal compromise place by system.

The present invention comprises new window sequential coding form further.According to one embodiment of present invention, the window for MDCT conversion is binary size, and can size is only with factor 2(2 doubly between a window to another window) change.When 16kHz sampling rate, binary transform size is, such as, corresponding to 4,8 ..., 64,128 of 128ms ..., 2048 samples.Generally speaking, propose variable-size window, the multiple window sizes between minimum window size and largest amount can be got.In one sequence, continuous print window size can only be changed by the factor 2, to produce the smooth sequence of the window size of not change suddenly.As an embodiment the series of windows that defines, that is, be only limitted to binary size, and be only allowed to size between a window to another window and only change with the factor 2, there is multiple advantage.First, do not need specificly to start or stop window, that is, with the window on sharp limit.The time/frequency resolution that this can maintain.Secondly, for coding, series of windows becomes very effective, that is, send the signal using what certain window sequence to demoder.Finally, series of windows will be applicable to (fit) superframe structure all the time well.

When needing to transmit a certain decoder configurations parameter so that when can start operate coding device in the system of the reality of demoder wherein, superframe structure is very useful.These data store the header field of the sound signal of description encoding in the bitstream usually.In order to minimize bit rate, header is not transmit for each frame of coded data, particularly in the system proposed by the present invention, within the system, MDCT frame sign may from very short become very large.Therefore, the present invention proposes a certain amount of MDCT frame to be grouped in becomes superframe together, in the beginning of superframe transmission header data.Superframe is generally defined as specific time span.Therefore, need careful, so that the change of MDCT frame sign is applicable to regular length, predefined superframe length.The series of windows of the present invention of above-outlined ensure that selected series of windows is applicable to superframe structure all the time.

According to one embodiment of present invention, LTP is delayed encodes in variable bit rate mode with LTP gain.This is favourable because due to the LTP validity for fixed cycle signal, in the section that some is long, LTP is delayed trend towards identical.Therefore, this can be utilized by arithmetic coding, causes the delayed and LTP gain coding of variable bit rate LTP.

Similarly, one embodiment of the present of invention, for the coding of LP parameter, also utilize bit reservoir and variable rate encoding.In addition, this invention also teaches recurrence LP coding.

Another aspect of the present invention is the bit reservoir of the variable frame size in process scrambler.In figure 18, depict according to bit reservoir control module 1800 of the present invention.Except the difficulty provided as input is estimated, bit reservoir control module also receives the information of the frame length about present frame.The example estimated for the difficulty in bit reservoir control module is perceptual entropy, or the logarithm of power spectrum.It is important that bit reservoir controls in the system that frame length can change in a different set of frame length wherein.When calculating quantity for the bit of the license of the frame that will encode, the bit reservoir control module 1800 of suggestion can consider frame length, as below summarize.

Here, bit reservoir is defined as a certain bit fixed amount in buffer zone, must be greater than a frame and be permitted for the average of the bit of bit rates.If size is identical, then the quantity of the bit of a frame does not change will be possible.Take out using the bit that is licensed for encryption algorithm as the permission of actual frame bit number before, bit reservoir controls the level checking bit reservoir all the time.So, full bit reservoir means, the quantity of bit available in bit reservoir equals bit reservoir size.After encoding to frame, will deduct the quantity of used bit from buffer zone, by adding the quantity representing the bit of constant bit rate, bit reservoir obtains and upgrades.Therefore, if the quantity of the bit before encoding to frame in bit reservoir equals the quantity of the average bit of each frame, then bit reservoir is empty.

In Figure 18 a, depict the key concept that bit reservoir controls.Scrambler provides the difficulty how means calculating actual frame of encoding compared with frame above.For the average difficulty of 1.0, the quantity of the bit of license depends on the quantity of bit available in bit reservoir.According to given control line, if bit reservoir is very full, then from bit reservoir, take out bit more more than the bit number corresponding to mean bit rate.When the bit reservoir of sky, will be fewer than average number of bits for the bit number of encoding to frame.For the long frame sequence with average difficulty, average bit reservoir level is made way in this behavior.For with more highly difficult frame, control line can be moved upward, and has the effect that the frame being difficult to carry out encoding is allowed to use in same bits reservoir level more bits.Correspondingly, for being easy to encode to frame, only by being moved down by the control line in Figure 18 a, need move to the little situation of difficulty from the situation of average difficulty, the quantity for the bit of a frame permission will be less.Also other can be had to revise except moving simply except control line.Such as, as shown in figure 18 a, according to frame difficulty, the slope of controlling curve can be changed.

When calculating the quantity of bit of license, the restriction to the lower end of bit reservoir must be observed, to be unlikely the more bit that taking-up ratio allows from buffer zone.Be that possible bit reservoir level and difficulty estimate an example with the bit relationships of license by the bit reservoir control program of bit calculating license that comprises of control line as shown in figure 18 a.Equally, the rigid restriction of lower end that other control algolithms will have jointly to bit reservoir level, it prevents bit reservoir from violating empty bit reservoir restriction, and the restriction had upper end, wherein, if scrambler is by bit number too low for consumption, scrambler will be forced to write filling bit.

For enabling this controlling mechanism process one group of variable frame size, must this simple control algolithm of self-adaptation.Must the normalization difficulty that will use estimate, so that the difficulty value of different frame signs is comparable.For each frame sign, the bit for license will have different allowed bands, and because the average of bit of each frame is different for variable frame size, and therefore, each frame sign all has its oneself governing equation, and with its oneself restriction.An example has been shown in Figure 18 b.It is the border of the lower permission of control algolithm to an important modification of the situation of constant frame size.Replace the average of the bit of the actual frame size corresponding to fixed bit rate situation, now, the average of the bit of the frame sign of maximum permission is the minimum permissible value for bit reservoir level before the bit taking out actual frame.This is one of key distinction controlled for the bit reservoir of constant frame size.This restriction ensure that, at least can utilize the average of the bit for this frame sign with the frame after the frame sign of maximum possible.

Difficulty is estimated can be based on, the perceptual entropy (PE) such as deriving from the masking threshold of psychoacoustic model calculates, as performed in AAC, or as the alternative bit count of the quantification with fixed step size, as performed in scrambler ECQ part according to an embodiment of the invention.These values can be normalized relative to variable frame size, and this can be realized by the simple division divided by frame length, and result will be the PE bit count of each sample respectively.For average difficulty, another normalization step can be performed.For this object, frame moving average in the past can be used, cause the frame for difficulty, have the difficulty value being greater than 1.0, or for simple frame, have the difficulty value being less than 1.0.When twice scrambler or when advanced greatly, for this normalization that difficulty is estimated, the difficulty value of the frame that also can look to the future.

Another aspect of the present invention relates to the details of the bit reservoir process for ECQ.The work prerequisite of the bit reservoir management of ECQ is that, when using constant quantization device step-length to encode, ECQ produces the quality of constant.Constant quantization device step-length produces variable bit rate, and the target of bit reservoir makes the change of the quantiser step size between different frames little as much as possible, and don't can violate the constraint of bit reservoir buffer zone.Except the speed produced by ECQ, also by transmitting more information (such as, LTP gain and delayed) based on MDCT frame.Generally speaking extra information be also entropy code, and so consume different speed between frames.

In one embodiment of the invention, the bit reservoir proposed controls the change (see Figure 18 c) attempting to minimize by introducing three variablees ECQ step-length:

-R _{eCQ_AVG}: the average ECQ speed of previous used each sample;

-Δ _{eCQ_AVG}: previous used average Quantizer step-length.

These variablees all dynamically upgrade, to reflect up-to-date encoding statistics.

-R _{eCQ_AVG_DES}: the ECQ speed corresponding on average total bit rate.

When bit reservoir level changes during the time frame of average window, this value will be different from R _{eCQ_AVG}, such as, at this moment, in frame, employed the bit rate higher or lower than the mean bit rate of specifying.It also upgrades along with the rate variation of side information, so that total speed equals the bit rate of specifying.

Bit reservoir controls to use these three values to determine will initial estimation on the increment of present frame.It is the R by searching as shown in Figure 18 c _{eCQ_ Δ}on curve correspond to R _{eCQ_AVG_DES}r _{eCQ_AVG_DES}complete like this.In subordinate phase, if speed is not according to the constraint of bit reservoir, then this value may be changed.Exemplary curve R in Figure 18 c _eCQ-Δequation based on below:

R_{ECQ} = \frac{1}{2} \log_{2} \frac{α}{Δ^{2}}

Certainly, also R can be used _eCQand other relationships between Δ.

In a stationary situation, will close to R _{eCQ_AVG_DES}, the change of Δ will be very little.In nonstatic situation, average calculating operation will guarantee the smooth change of Δ.

Although be described content above with reference to specific embodiment of the present invention, should be appreciated that, theory of the present invention is not limited only to described embodiment.On the other hand, the invention presented in the application will enable those of ordinary skill in the art understand and realize the present invention.Those skilled in the art can understand, and when not departing from the spirit and scope of the present invention proposed by claims institute exclusiveness, can make various amendment.

Claims

1. an audio coding system, comprising:

Linear prediction unit, for carrying out filtering based on sef-adapting filter to input signal;

Converter unit, for being converted to transform domain by the frame of the described input signal through filtering;

Quantifying unit, for quantizing described transform-domain signals;

Scale factor determining unit, for based on masking threshold curve generate scale factor, for quantification described transform-domain signals time in described quantifying unit;

Linear prediction scale factor estimation unit, for the parameter based on described sef-adapting filter, estimates the scale factor based on linear prediction; And

Scale factor scrambler, for the difference of encoding between the described scale factor based on masking threshold curve and the described scale factor based on linear prediction.

2. audio coding system according to claim 1, wherein, described linear prediction scale factor estimation unit comprises perceptual mask curve estimation unit, perceptual mask curve is estimated for the described parameter based on described sef-adapting filter, wherein, the described scale factor based on linear prediction is determined based on estimated perceptual mask curve.

3. audio coding system according to claim 1, wherein, the described scale factor based on linear prediction of the frame of described transform-domain signals is estimated based on the linear forecasting parameter of interpolation.

4. audio coding system according to claim 1, comprising:

Long-term forecasting unit, for the reconstruct of the previous section based on the described input signal through filtering, determines the estimation of the described frame of the described input signal through filtering; And

Transform-domain signals assembled unit, for combining the estimation of the described frame of the described input signal through filtering determined by described long-term forecasting unit and the input signal through conversion in described transform domain, to generate described transform-domain signals.

5. audio coding system according to claim 1, comprises bit reservoir control module, for estimating based on the length of described frame and the difficulty of described frame, and the quantity of the licensed bit of the frame of the described input signal through filtering that determines encoding.

6. audio coding system according to claim 5, wherein, described bit reservoir control module is estimated and/or different frame signs for different frame difficulty, has independent governing equation.

7. the audio coding system according to claim 5 or 6, wherein, the difficulty of the frame sign that the normalization of described bit reservoir control module is different is estimated.

8. the audio coding system according to claim 5 or 6, wherein, the permission lower limit of permitted bit control algolithm is set to the average of the bit of the frame sign of maximum permission by described bit reservoir control module.

9. an audio decoder, comprising:

Go quantifying unit, for based on scale factor, remove the frame quantizing incoming bit stream;

Inverse transformation block, for converting transform-domain signals inversely;

Linear prediction unit, for filtering the transform-domain signals through converting inversely; And

Scale factor decoding unit, for based on the scale factor increment information received, generate the scale factor that uses in going to quantize, this scale factor increment information to the scale factor applied in the encoder with based on the parameter of sef-adapting filter, encode by the difference between the scale factor that generates.

10. audio decoder according to claim 9, comprises

Scale factor determining unit, for based on the masking threshold curve of linear forecasting parameter deriving from present frame, generate scale factor, wherein, the scale factor increment information received described in the combination of described scale factor decoding unit and the scale factor based on linear prediction generated, to generate the scale factor for going quantifying unit described in being input to.

11. 1 kinds of audio coding methods, comprise the following steps:

Based on sef-adapting filter, filtering is carried out to input signal;

The frame of the input signal through filtering is converted to transform domain;

Quantize described transform-domain signals;

Based on masking threshold curve generate scale factor, for quantification described transform-domain signals time in quantifying unit;

Based on the parameter of described sef-adapting filter, estimate the scale factor based on linear prediction; And

Difference between the described scale factor based on masking threshold curve of coding and the described scale factor based on linear prediction.

12. 1 kinds of audio-frequency decoding methods, comprise the following steps:

Based on scale factor, remove the frame quantizing incoming bit stream;

Convert transform-domain signals inversely;

The transform-domain signals through converting inversely is filtered in linear prediction;

Based on the parameter of sef-adapting filter, estimate the second scale factor; And

Based on the scale factor difference information received and the second estimated scale factor, generate the scale factor going to use in quantification.

13. 1 kinds of audio decoding apparatus, comprising:

For based on scale factor, remove the device of the frame quantizing incoming bit stream;

For converting the device of transform-domain signals inversely;

The device of the transform-domain signals through converting inversely is filtered for linear prediction;

For the parameter based on sef-adapting filter, estimate the device of the second scale factor; And

For based on the scale factor difference information received and the second estimated scale factor, generate the device removing the scale factor used in quantification.

14. 1 kinds of audio-frequency decoding methods, comprising:

Remove quantization step, for based on scale factor, remove the frame quantizing incoming bit stream;

Inverse transformation step, for converting transform-domain signals inversely;

Linear prediction step, for filtering the transform-domain signals through converting inversely; And

Scale factor decoding step, for based on the scale factor increment information received, generate the scale factor that uses in going to quantize, described scale factor increment information to the scale factor applied in the encoder with based on the parameter of sef-adapting filter, encode by the difference between the scale factor that generates.