CN110223702A

CN110223702A - Audio decoding system and reconstructing method

Info

Publication number: CN110223702A
Application number: CN201910546611.9A
Authority: CN
Inventors: H. Purnhagen; L. Villemoes; L. J. Samuelsson; T. Hirvonen
Original assignee: Dolby International AB
Current assignee: Dolby International AB
Priority date: 2013-05-24
Filing date: 2014-05-23
Publication date: 2019-09-10
Anticipated expiration: 2034-05-23
Also published as: EP3005352B1; WO2014187987A1; CN105393304A; KR20160003083A; US9818412B2; ES2624668T3; BR112015028914B1; CN105393304B; US20160111097A1; EP3005352A1; HK1216453A1; JP2016522445A; CN110223702B; BR112015028914A2; RU2628177C2; KR101761099B1; RU2015150066A; JP6248186B2

Abstract

This disclosure relates to an audio decoding system and a reconstruction method. It provides methods, devices and computer program products that offer less complex and more flexible control of the decorrelation introduced in an audio coding system. According to the disclosure, this is achieved by calculating and using two weighting factors for the decorrelation of an audio object in the audio coding system: one weighting factor for the approximated audio object and one weighting factor for the decorrelated audio object.

Description

Audio decoding system and reconstructing method

This application is a divisional application of the invention patent application No. 201480029603.2, filed on May 23, 2014, entitled "Audio encoding and decoding methods, media, and audio encoder and decoder".

Cross reference to related applications

This application claims priority to U.S. Provisional Patent Application No. 61/827,288, filed on May 24, 2013, the entire contents of which are incorporated herein.

Technical field

The disclosure herein generally relates to audio coding. In particular, it relates to using and calculating weighting factors for the decorrelation of audio objects in an audio coding system.

This disclosure is related to U.S. Provisional Application No. 61/827,246, filed on the same day as the present application, entitled "Coding of Audio Scenes" and naming Heiko Purnhagen et al. as inventors. The entire contents of the referenced application are incorporated herein by reference.

Background

In conventional audio systems, a channel-based approach is used. Each channel may, for example, represent the content of one loudspeaker or one loudspeaker array. Possible coding schemes for such systems include discrete multichannel coding and parametric coding such as MPEG Surround.

More recently, a new approach has been developed. This approach is object-based. In a system employing an object-based approach, a three-dimensional audio scene is represented by audio objects with their associated positional metadata. These audio objects move around in the three-dimensional audio scene during playback of the audio signal. The system may further include so-called bed channels, which may be described as stationary audio objects that are mapped directly to the loudspeaker positions of, for example, a conventional audio system as described above. At the decoder side of such a system, the objects/bed channels may be reconstructed using downmix signals and an upmix or reconstruction matrix, wherein the objects/bed channels are reconstructed by forming linear combinations of the downmix signals based on the values of the corresponding elements of the reconstruction matrix.

A problem that may arise in an object-based audio system, especially at low target bit rates, is that the correlation between the decoded objects/bed channels can be larger than it was for the original objects/bed channels that were encoded. A common approach to solving such problems and improving the reconstruction of the audio objects, for example in MPEG SAOC, is to introduce decorrelators in the decoder. In MPEG SAOC, the introduced decorrelation aims at reinstating the correct correlation between the audio objects given a specified rendering of the audio objects, that is, depending on what type of playback unit is connected to the audio system.

However, known methods for object-based audio systems are sensitive to the number of downmix signals and the number of objects/bed channels, and may further be complex operations that depend on the rendering of the audio objects. There is therefore a need for a simple and flexible method for controlling the amount of decorrelation introduced in the decoder of such a system, such that the reconstruction of the audio objects is improved.

Brief description of the drawings

Example embodiments will now be described with reference to the accompanying drawings, in which:

Fig. 1 is a generalized block diagram of an audio decoding system according to an example embodiment;

Fig. 2 shows, by way of example, the format in which the reconstruction matrix and the weighting parameters are received by the audio decoding system of Fig. 1;

Fig. 3 is a generalized block diagram of an audio encoder for generating at least one weighting parameter to be used in a decorrelation process in an audio decoding system;

Fig. 4 shows, by way of example, a generalized block diagram of a part of the encoder of Fig. 3 for generating the at least one weighting parameter;

Figs. 5a-5c show, by way of example, mapping functions used in the part of the encoder of Fig. 4.

All the figures are schematic and generally show only the parts that are necessary to elucidate the disclosure, whereas other parts may be omitted or merely suggested. Unless otherwise indicated, like reference numerals refer to like parts in different figures.

Detailed description

In view of the above, it is an object to provide encoders, decoders and associated methods that offer less complex and more flexible control of the introduced decorrelation, such that the reconstruction of audio objects is improved.

I. Overview - Decoder

According to a first aspect, example embodiments propose decoding methods, decoders and computer program products for decoding. The proposed methods, decoders and computer program products may generally have the same features and advantages.

According to example embodiments, there is provided a method for reconstructing a time/frequency tile of N audio objects. The method comprises the steps of: receiving M downmix signals; receiving a reconstruction matrix enabling reconstruction of an approximation of the N audio objects from the M downmix signals; applying the reconstruction matrix to the M downmix signals in order to generate N approximated audio objects; subjecting at least a subset of the N approximated audio objects to a decorrelation process in order to generate at least one decorrelated audio object, whereby each of the at least one decorrelated audio object corresponds to one of the N approximated audio objects; for each of the N approximated audio objects not having a corresponding decorrelated audio object, reconstructing the time/frequency tile of the audio object by the approximated audio object; and, for each of the N approximated audio objects having a corresponding decorrelated audio object, reconstructing the time/frequency tile of the audio object by: receiving at least one weighting parameter representing a first weighting factor and a second weighting factor, weighting the approximated audio object by the first weighting factor, weighting the decorrelated audio object corresponding to the approximated audio object by the second weighting factor, and combining the weighted approximated audio object with the corresponding weighted decorrelated audio object.
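The steps of the method can be sketched end to end for a single tile. The following is a minimal illustration only, not the patented implementation: the function names, the representation of a signal as a list of real samples, and in particular the one-sample delay used as a stand-in decorrelator are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Signal = List[float]


def apply_reconstruction_matrix(matrix: Sequence[Sequence[float]],
                                downmix: Sequence[Signal]) -> List[Signal]:
    """Approximate N objects as linear combinations of the M downmix signals."""
    length = len(downmix[0])
    return [
        [sum(row[m] * downmix[m][t] for m in range(len(downmix))) for t in range(length)]
        for row in matrix
    ]


def placeholder_decorrelator(signal: Signal, delay: int = 1) -> Signal:
    """Stand-in decorrelator (a plain delay); an assumption, not the patent's decorrelator."""
    return [0.0] * delay + list(signal[:len(signal) - delay])


def reconstruct_tile(matrix: Sequence[Sequence[float]],
                     downmix: Sequence[Signal],
                     weights: Dict[int, Tuple[float, float]]) -> List[Signal]:
    """Reconstruct one tile; `weights` maps an object index to its (dry, wet) factors.

    Objects without an entry have no decorrelated counterpart and are
    reconstructed directly as their approximation.
    """
    approximated = apply_reconstruction_matrix(matrix, downmix)
    reconstructed = []
    for n, approx in enumerate(approximated):
        if n not in weights:
            reconstructed.append(approx)
            continue
        dry, wet = weights[n]
        decorrelated = placeholder_decorrelator(approx)
        # Weighted approximation plus weighted decorrelated version.
        reconstructed.append([dry * a + wet * d for a, d in zip(approx, decorrelated)])
    return reconstructed
```

Here only the objects listed in `weights` receive a decorrelated contribution, which mirrors the "at least a subset" wording of the method.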

An audio encoding/decoding system typically divides the time-frequency space into time/frequency tiles, for example by applying suitable filter banks to the input audio signals. A time/frequency tile generally means a portion of the time-frequency space corresponding to a time interval and a frequency subband. The time interval may typically correspond to the duration of a time frame used in the audio encoding/decoding system. The frequency subband may typically correspond to one or several neighboring frequency subbands defined by the filter bank used in the encoding/decoding system. When the frequency subband corresponds to several neighboring frequency subbands defined by the filter bank, this allows for non-uniform frequency subbands in the decoding process of the audio signal, for example wider frequency subbands for higher frequencies of the audio signal. In a broadband case, where the audio encoding/decoding system operates on the whole frequency range, the frequency subband of the time/frequency tile may correspond to the whole frequency range. The above method discloses the steps for reconstructing one such time/frequency tile of the N audio objects. It is to be understood, however, that the method may be repeated for each time/frequency tile of the audio decoding system. It is also to be understood that several time/frequency tiles may be encoded simultaneously. Typically, neighboring time/frequency tiles may overlap somewhat in time and/or frequency. For example, an overlap in time may be equivalent to a linear interpolation of the elements of the reconstruction matrix in time, that is, from one time interval to the next. This disclosure, however, targets other parts of the encoding/decoding system, and any overlap in time and/or frequency between neighboring time/frequency tiles is left for the skilled person to implement.
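The remark that a temporal overlap may amount to linear interpolation of the reconstruction matrix elements can be illustrated as follows. This is a hedged sketch: the disclosure leaves the overlap handling to the skilled person, so the function names and the choice of one interpolated matrix per time slot are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def interpolate_matrix(previous: Matrix, current: Matrix, fraction: float) -> Matrix:
    """Element-wise linear interpolation between two reconstruction matrices.

    fraction = 0.0 returns `previous`, fraction = 1.0 returns `current`.
    """
    return [
        [(1.0 - fraction) * p + fraction * c for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


def matrices_for_slots(previous: Matrix, current: Matrix, num_slots: int) -> List[Matrix]:
    """One interpolated matrix per time slot of the transition, ending at `current`."""
    return [interpolate_matrix(previous, current, (slot + 1) / num_slots)
            for slot in range(num_slots)]
```

For a transition over four slots, the second slot uses the midpoint of the two matrices and the last slot uses the new matrix unchanged.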

As used herein, a downmix signal is a signal which is a combination of one or more bed channels and/or audio objects.

The above method provides a flexible and simple way of reconstructing a time/frequency tile of N audio objects in which any unwanted correlation between the N approximated audio objects is reduced. By using two weighting factors, one for the approximated audio object and one for the decorrelated audio object, a simple parameterization is achieved that allows flexible control of the amount of decorrelation introduced.

Moreover, the simple parameterization in the method does not depend on what type of rendering the reconstructed audio objects are subjected to. An advantage of this is that the same method is used regardless of what type of playback unit is connected to the audio decoding system implementing the method, which leads to a less complex audio decoding system.

According to embodiments, for each of the N approximated audio objects having a corresponding decorrelated audio object, the at least one weighting parameter comprises a single weighting parameter from which the first weighting factor and the second weighting factor can be derived. An advantage of this is that a simple parameterization is provided for controlling the amount of decorrelation introduced in the audio decoding system. This approach uses a single parameter that describes the mix of the "dry" (not decorrelated) contribution and the "wet" (decorrelated) contribution per object and time/frequency tile. By using a single parameter, the required bit rate may be reduced compared with using several parameters, for example one describing the wet contribution and one describing the dry contribution.

According to embodiments, the sum of the squares of the first weighting factor and the second weighting factor equals one. In this case, the single weighting parameter comprises either the first weighting factor or the second weighting factor. This may be a simple way of implementing a single weighting parameter that describes the mix of the dry contribution and the wet contribution per object and time/frequency tile. Moreover, it implies that the reconstructed object will have the same energy as the approximated object.
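The energy claim can be checked with a short calculation. The notation is introduced here for illustration only: the approximated object is written as \hat{S}_n, its decorrelated version as D_n, and the first and second weighting factors as w_dry and w_wet. The derivation assumes that D_n is uncorrelated with \hat{S}_n and has the same energy, which is the idealized behavior of a decorrelator rather than something this section states explicitly.

```latex
\tilde{S}_n = w_{\mathrm{dry}}\,\hat{S}_n + w_{\mathrm{wet}}\,D_n,
\qquad w_{\mathrm{dry}}^2 + w_{\mathrm{wet}}^2 = 1

E\{|\tilde{S}_n|^2\}
  = w_{\mathrm{dry}}^2\,E\{|\hat{S}_n|^2\} + w_{\mathrm{wet}}^2\,E\{|D_n|^2\}
  = \left(w_{\mathrm{dry}}^2 + w_{\mathrm{wet}}^2\right) E\{|\hat{S}_n|^2\}
  = E\{|\hat{S}_n|^2\}
```

The cross term vanishes because of the assumed zero correlation, so the mix ratio can be changed without changing the energy of the reconstructed object.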

According to embodiments, the step of subjecting at least a subset of the N approximated audio objects to a decorrelation process comprises subjecting each of the N approximated audio objects to a decorrelation process, whereby each of the N approximated audio objects corresponds to a decorrelated audio object. This may further reduce any unwanted correlation between the reconstructed audio objects, since all reconstructed audio objects are based on both a decorrelated audio object and an approximated audio object.

According to embodiments, the first and second weighting factors are time and frequency variant. The flexibility of the audio decoding system may thus be increased, since different amounts of decorrelation may be introduced for different time/frequency tiles. This may also further reduce any unwanted correlation between the reconstructed audio objects and improve their quality.

According to embodiments, the reconstruction matrix is time and frequency variant. The flexibility of the audio decoding system is thereby increased, since the parameters used for reconstructing or approximating the audio objects from the downmix signals may vary between time/frequency tiles.

According to another embodiment, the reconstruction matrix and the at least one weighting parameter are arranged in a frame upon receipt. The reconstruction matrix is arranged in a first field of the frame using a first format, and the at least one weighting parameter is arranged in a second field of the frame using a second format, such that a decoder that only supports the first format can decode the reconstruction matrix in the first field and discard the at least one weighting parameter in the second field. Compatibility with decoders that do not implement decorrelation may thereby be achieved.
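The two-field arrangement can be illustrated with a toy frame layout. Everything about the byte layout below is hypothetical (format identifiers, a 1-byte id plus 2-byte length header, float32 payloads); this section does not specify the actual bitstream syntax. The sketch only shows how a decoder that knows the first format alone can skip the second field.

```python
import struct
from typing import Dict, List, Set

FORMAT_MATRIX = 1   # hypothetical identifier for the first format
FORMAT_WEIGHTS = 2  # hypothetical identifier for the second format


def pack_field(format_id: int, values: List[float]) -> bytes:
    """Serialize one field as: format id (1 byte), payload length (2 bytes), float32 payload."""
    payload = struct.pack(f">{len(values)}f", *values)
    return struct.pack(">BH", format_id, len(payload)) + payload


def parse_frame(frame: bytes, supported_formats: Set[int]) -> Dict[int, List[float]]:
    """Decode the fields whose format is supported and skip the rest."""
    fields: Dict[int, List[float]] = {}
    offset = 0
    while offset < len(frame):
        format_id, length = struct.unpack_from(">BH", frame, offset)
        offset += 3
        if format_id in supported_formats:
            count = length // 4
            fields[format_id] = list(struct.unpack_from(f">{count}f", frame, offset))
        offset += length  # an unsupported field is discarded by skipping its payload
    return fields


frame = pack_field(FORMAT_MATRIX, [1.0, 0.0, 0.5, 0.5]) + pack_field(FORMAT_WEIGHTS, [0.75])
legacy = parse_frame(frame, {FORMAT_MATRIX})                 # matrix only
full = parse_frame(frame, {FORMAT_MATRIX, FORMAT_WEIGHTS})   # matrix and weights
```

The length prefix is what lets the legacy decoder step over a field it cannot interpret; any real format would need an equivalent mechanism.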

According to embodiments, the method may further comprise receiving L auxiliary signals, wherein the reconstruction matrix further enables reconstruction of an approximation of the N audio objects from the M downmix signals and the L auxiliary signals, and wherein the method further comprises applying the reconstruction matrix to the M downmix signals and the L auxiliary signals in order to generate the N approximated audio objects. The L auxiliary signals may, for example, include at least one auxiliary signal that is equal to one of the N audio objects to be reconstructed. This may increase the quality of that specific reconstructed audio object, which may be advantageous when one of the N audio objects to be reconstructed represents a part of the audio signal that is of particular importance, for example an audio object representing the voice of a narrator in a documentary. According to embodiments, at least one of the L auxiliary signals is a combination of at least two of the N audio objects to be reconstructed, thereby providing a compromise between bit rate and quality.

According to embodiments, the M downmix signals span a hyperplane, and at least one of the L auxiliary signals does not lie in the hyperplane spanned by the M downmix signals. One or more of the L auxiliary signals may thus represent signal dimensions that are not included in any of the M downmix signals, and the quality of the reconstructed audio objects may consequently increase. In embodiments, at least one of the L auxiliary signals is orthogonal to the hyperplane spanned by the M downmix signals. The entire signal of such an auxiliary signal then represents parts of the audio signal that are not included in any of the M downmix signals. This may increase the quality of the reconstructed audio objects while reducing the required bit rate, since the at least one auxiliary signal does not include any information already present in any of the M downmix signals.
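One way to picture an auxiliary signal that is orthogonal to the hyperplane spanned by the downmix signals is to remove from an audio object everything that the downmix signals can already represent. The sketch below does this with Gram-Schmidt orthogonalization on real-valued sample lists. It is an illustration under assumptions: this section does not say how such auxiliary signals are actually formed, and the function names are invented for the example.

```python
from typing import List, Sequence

Signal = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally long sample sequences."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(signals: Sequence[Signal], tol: float = 1e-12) -> List[Signal]:
    """Gram-Schmidt basis of the subspace (hyperplane) spanned by `signals`."""
    basis: List[Signal] = []
    for signal in signals:
        residual = list(signal)
        for b in basis:
            coeff = dot(residual, b)
            residual = [r - coeff * x for r, x in zip(residual, b)]
        norm = dot(residual, residual) ** 0.5
        if norm > tol:  # skip signals that add no new dimension
            basis.append([r / norm for r in residual])
    return basis


def orthogonal_auxiliary(audio_object: Signal, downmix: Sequence[Signal]) -> Signal:
    """Part of `audio_object` that lies outside the span of the downmix signals."""
    residual = list(audio_object)
    for b in orthonormal_basis(downmix):
        coeff = dot(residual, b)
        residual = [r - coeff * x for r, x in zip(residual, b)]
    return residual
```

The returned signal has zero inner product with every downmix signal, so it carries no information that is already present in the downmix.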

According to example embodiments, there is provided a computer-readable medium comprising computer code instructions adapted to carry out any method of the first aspect when executed on a device having processing capability.

According to example embodiments, there is provided an apparatus for reconstructing a time/frequency tile of N audio objects, the apparatus comprising: a first receiving component configured to receive M downmix signals; a second receiving component configured to receive a reconstruction matrix enabling reconstruction of an approximation of the N audio objects from the M downmix signals; an audio object approximating component arranged downstream of the first and second receiving components and configured to apply the reconstruction matrix to the M downmix signals in order to generate N approximated audio objects; a decorrelating component arranged downstream of the audio object approximating component and configured to subject at least a subset of the N approximated audio objects to a decorrelation process in order to generate at least one decorrelated audio object, whereby each of the at least one decorrelated audio object corresponds to one of the N approximated audio objects; the second receiving component being further configured to receive, for each of the N approximated audio objects having a corresponding decorrelated audio object, at least one weighting parameter representing a first weighting factor and a second weighting factor; and an audio object reconstructing component arranged downstream of the audio object approximating component, the decorrelating component and the second receiving component, and configured to: for each of the N approximated audio objects not having a corresponding decorrelated audio object, reconstruct the time/frequency tile of the audio object by the approximated audio object; and, for each of the N approximated audio objects having a corresponding decorrelated audio object, reconstruct the time/frequency tile of the audio object by weighting the approximated audio object by the first weighting factor, weighting the decorrelated audio object corresponding to the approximated audio object by the second weighting factor, and combining the weighted approximated audio object with the corresponding weighted decorrelated audio object.

II. Overview - Encoder

According to a second aspect, example embodiments propose encoding methods, encoders and computer program products for encoding. The proposed methods, encoders and computer program products may generally have the same features and advantages.

According to example embodiments, there is provided a method in an encoder for generating at least one weighting parameter, wherein the at least one weighting parameter is to be used in a decoder when reconstructing a time/frequency tile of a specific audio object by combining a weighted decoder-side approximation of the specific audio object with a corresponding weighted decorrelated version of the decoder-side approximated specific audio object. The method comprises the steps of: receiving M downmix signals being combinations of at least N audio objects including the specific audio object; receiving the specific audio object; calculating a first quantity indicative of an energy level of the specific audio object; calculating a second quantity indicative of an energy level corresponding to the energy level of an encoder-side approximation of the specific audio object, the encoder-side approximation being a combination of the M downmix signals; and calculating the at least one weighting parameter based on the first quantity and the second quantity.

The above method discloses the steps for generating at least one weighting parameter for a specific audio object during one time/frequency tile. It is to be understood, however, that the method may be repeated for each time/frequency tile of the audio encoding/decoding system and for each audio object.

It may be noted that the tiling in the audio encoding system, that is, the division of the audio signals/objects into time/frequency tiles, need not be the same as the tiling in the audio decoding system.

It may also be noted that the decoder-side approximation of the specific audio object and the encoder-side approximation of the specific audio object may be different approximations or the same approximation.

In order to reduce the required bit rate and to reduce complexity, the at least one weighting parameter may comprise a single weighting parameter from which a first weighting factor and a second weighting factor can be derived, the first weighting factor being used for weighting the decoder-side approximation of the specific audio object and the second weighting factor being used for weighting the decorrelated version of the decoder-side approximated audio object.

In order to prevent energy from being added to the reconstructed audio object on the decoder side, where the reconstructed audio object comprises the decoder-side approximation of the specific audio object and the decorrelated version of the decoder-side approximated audio object, the sum of the squares of the first and second weighting factors may equal one. In this case, the single weighting parameter may comprise either the first weighting factor or the second weighting factor.

According to embodiments, the step of calculating the at least one weighting parameter comprises comparing the first quantity and the second quantity. For example, the energy of the approximation of the specific audio object may be compared with the energy of the specific audio object.

According to example embodiments, comparing the first quantity and the second quantity comprises: calculating a ratio between the second quantity and the first quantity; raising the ratio to the power of α; and using the ratio raised to the power of α to calculate the weighting parameter. This may increase the flexibility of the encoder. The parameter α may be equal to two.

According to example embodiments, the ratio raised to the power of α is subjected to an increasing function that maps the ratio raised to the power of α to the at least one weighting parameter.
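The encoder-side calculation described above (two energy quantities, their ratio, the power α, and an increasing mapping function) can be sketched as follows. This is a hedged illustration: the clipping function used as the mapping is a hypothetical choice, since this section only requires the mapping to be increasing, and the function names and the small `eps` guard against division by zero are likewise assumptions.

```python
from typing import Callable, Sequence


def energy(signal: Sequence[float]) -> float:
    """Energy of one time/frequency tile (sum of squared samples)."""
    return sum(x * x for x in signal)


def clamp_to_unit(value: float) -> float:
    """Hypothetical increasing mapping function: clip to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def weighting_parameter(original: Sequence[float],
                        approximation: Sequence[float],
                        alpha: float = 2.0,
                        mapping: Callable[[float], float] = clamp_to_unit,
                        eps: float = 1e-12) -> float:
    """Weighting parameter for one audio object and one tile."""
    first = energy(original)        # first quantity: energy of the audio object
    second = energy(approximation)  # second quantity: energy of its encoder-side approximation
    ratio = second / max(first, eps)
    return mapping(ratio ** alpha)
```

With this particular mapping, an approximation that captures the full energy of the object yields a parameter of one, and a weaker approximation yields a smaller value. Whether the result is read as the first or the second weighting factor is not fixed by the sketch.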

According to example embodiments, the first and second weighting factors are time and frequency variant.

According to example embodiments, the second quantity indicative of an energy level corresponds to the energy level of an encoder-side approximation of the specific audio object that is a linear combination of the M downmix signals and L auxiliary signals, the downmix signals and the auxiliary signals being formed from the N audio objects. Auxiliary signals may be included in the audio encoding/decoding system in order to improve the reconstruction of the audio objects at the decoder side.

According to example embodiments, at least one of the L auxiliary signals may correspond to a particularly important audio object, such as an audio object representing dialogue. Thus, at least one of the L auxiliary signals may be equal to one of the N audio objects. According to further embodiments, at least one of the L auxiliary signals is a combination of at least two of the N audio objects.

According to example embodiments, the M downmix signals span a hyperplane, and at least one of the L auxiliary signals does not lie in the hyperplane spanned by the M downmix signals. This means that the at least one auxiliary signal represents signal dimensions of the audio objects that were lost when generating the M downmix signals, which may improve the reconstruction of the audio objects at the decoder side. According to further embodiments, the at least one of the L auxiliary signals is orthogonal to the hyperplane spanned by the M downmix signals.

According to example embodiments, there is provided a computer-readable medium comprising computer code instructions adapted to carry out any method of the second aspect when executed on a device having processing capability.

According to example embodiments, there is provided an encoder for generating at least one weighting parameter, wherein the at least one weighting parameter is to be used in a decoder when reconstructing a time/frequency tile of a specific audio object by combining a weighted decoder-side approximation of the specific audio object with a corresponding weighted decorrelated version of the decoder-side approximated specific audio object. The encoder comprises: a receiving component configured to receive M downmix signals being combinations of at least N audio objects including the specific audio object, the receiving component being further configured to receive the specific audio object; and a calculating unit configured to: calculate a first quantity indicative of an energy level of the specific audio object; calculate a second quantity indicative of an energy level corresponding to the energy level of an encoder-side approximation of the specific audio object, the encoder-side approximation being a combination of the M downmix signals; and calculate the at least one weighting parameter based on the first quantity and the second quantity.

Example embodiments

Fig. 1 shows a generalized block diagram of an audio decoding system 100 for reconstructing N audio objects. The audio decoding system 100 performs time/frequency resolved processing, meaning that it operates on individual time/frequency tiles to reconstruct the N audio objects. In the following, the processing of the system 100 for reconstructing one time/frequency tile of the N audio objects is described. The N audio objects may be one or more audio objects.

The system 100 comprises a first receiving component 102 configured to receive M downmix signals 106. The M downmix signals may be one or more downmix signals. The M downmix signals 106 may, for example, be 5.1 or 7.1 surround signals that are backward compatible with established audio decoding systems such as Dolby Digital Plus, MPEG or AAC. In other embodiments, the M downmix signals 106 are not backward compatible. The input signal to the first receiving component 102 may be a bitstream 130 from which the receiving component can extract the M downmix signals 106.

The system 100 further comprises a second receiving component 112 configured to receive a reconstruction matrix 104 enabling reconstruction of an approximation of the N audio objects from the M downmix signals 106. The reconstruction matrix 104 may also be called an upmix matrix. The input signal 126 to the second receiving component 112 may be a bitstream 126 from which the receiving component can extract the reconstruction matrix 104, or elements thereof, as well as additional information that is described in detail below. In some embodiments of the audio decoding system 100, the first receiving component 102 and the second receiving component 112 are combined into one single receiving component. In some embodiments, the input signals 130, 126 are combined into one single input signal, which may be a bitstream with a format that allows the receiving components 102, 112 to extract the different information from the single input signal.

The system 100 may further comprise an audio object approximating component 108 arranged downstream of the first receiving component 102 and the second receiving component 112, and configured to apply the reconstruction matrix 104 to the M downmix signals 106 in order to generate N approximated audio objects 110. More specifically, the audio object approximating component 108 may perform a matrix operation in which the reconstruction matrix is multiplied by a vector comprising the M downmix signals. The reconstruction matrix 104 may be time and frequency variant, that is, the values of the elements of the reconstruction matrix 104 may differ from one time/frequency tile to another. The elements of the reconstruction matrix 104 thus depend on which time/frequency tile is currently being processed.

In more detail, the approximation $\hat{S}_n(k,l)$ of audio object n at frequency k and time slot l (that is, in a time/frequency tile) is computed, for example at the audio object approximating component 108, as

$$\hat{S}_n(k,l) = \sum_{m=1}^{M} c_{m,b,n}\, Y_m(k,l)$$

for all frequency samples k in frequency band b, b = 1, ..., B, where $c_{m,b,n}$ is the reconstruction coefficient of object n in frequency band b associated with the downmix channel $Y_m$. It may be noted that the reconstruction coefficient $c_{m,b,n}$ is assumed to be fixed over the time/frequency tile, but in further embodiments the coefficient may vary during the time/frequency tile.
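The per-band application of the reconstruction coefficients can be sketched as below for a single time slot. The data layout is an assumption made for the example: `coeffs[b][n][m]` holds the coefficient of object n in band b for downmix channel m, `downmix[m][k]` holds the downmix sample of channel m at frequency bin k, and `band_of_bin[k]` gives the band index of bin k.

```python
from typing import List, Sequence


def approximate_objects(coeffs: Sequence[Sequence[Sequence[float]]],
                        downmix: Sequence[Sequence[complex]],
                        band_of_bin: Sequence[int]) -> List[List[complex]]:
    """Approximate every object in every frequency bin of one time slot.

    Each output sample is the sum over the downmix channels of the band's
    reconstruction coefficient times the downmix sample in that bin.
    """
    num_objects = len(coeffs[0])
    num_downmix = len(downmix)
    return [
        [sum(coeffs[band_of_bin[k]][n][m] * downmix[m][k] for m in range(num_downmix))
         for k in range(len(band_of_bin))]
        for n in range(num_objects)
    ]
```

Because the coefficients are indexed by band rather than by bin, all bins in a band share the same matrix row, which matches the assumption that the coefficient is fixed over the tile.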

The system 100 further comprises a decorrelating component 118 arranged downstream of the audio object approximating component 108. The decorrelating component 118 is configured to subject at least a subset 140 of the N approximated audio objects 110 to a decorrelation process in order to generate at least one decorrelated audio object 136. In other words, all or only some of the N approximated audio objects 110 may be subjected to the decorrelation process. Each of the at least one decorrelated audio object 136 corresponds to one of the N approximated audio objects 110. More precisely, the set of decorrelated audio objects 136 corresponds to the set 140 of approximated audio objects that is input to the decorrelating component 118. The purpose of the at least one decorrelated audio object 136 is to reduce unwanted correlation between the N approximated audio objects 110. Such unwanted correlation arises in particular when the audio system comprising the audio decoding system 100 has a low target bit rate. At low target bit rates, the reconstruction matrix may be sparse, meaning that many of its elements may be zero. In this case, a particular approximated audio object 110 may be based on a single downmix signal, or on only a few of the M downmix signals 106, which increases the risk of introducing unwanted correlation between the approximated audio objects 110. According to some embodiments, the decorrelating component 118 subjects each of the N approximated audio objects 110 to a decorrelation process, whereby each of the N approximated audio objects 110 corresponds to a decorrelated audio object 136.

Each of the N approximated audio objects 110 that is subjected to the decorrelation process by the decorrelating component 118 may be subjected to a different decorrelation process, for example by applying a white noise filter to the approximated audio object to be decorrelated, or by applying any other suitable decorrelation process, such as all-pass filtering.
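As an illustration of all-pass filtering as a decorrelation process, the following sketch implements a single Schroeder all-pass section in the time domain. It is one possible example only: this section does not specify the filter structure, and the `delay` and `gain` parameters are hypothetical. Using a different delay/gain pair per object is one simple way to obtain different processes, although it does not by itself guarantee mutually decorrelated outputs.

```python
from typing import List, Sequence


def allpass_decorrelate(signal: Sequence[float], delay: int, gain: float) -> List[float]:
    """Schroeder all-pass section: y[t] = -g*x[t] + x[t-D] + g*y[t-D].

    The magnitude response is flat, so the spectrum of the signal is kept
    while its phase (and hence its waveform) is changed.
    """
    output: List[float] = []
    for t, x in enumerate(signal):
        delayed_in = signal[t - delay] if t >= delay else 0.0
        delayed_out = output[t - delay] if t >= delay else 0.0
        output.append(-gain * x + delayed_in + gain * delayed_out)
    return output
```

For example, `allpass_decorrelate(samples, delay=7, gain=0.5)` and `allpass_decorrelate(samples, delay=11, gain=0.6)` would apply two different all-pass sections to the same input.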

Examples of further decorrelation processes can be found in the MPEG Parametric Stereo coding tool (used in HE-AAC v2 as described in ISO/IEC 14496-3 and in the paper J. Engdegård, H. Purnhagen, J. Rödén, L. Liljeryd, "Synthetic ambience in parametric stereo coding", AES 116th Convention, Berlin, Germany, May 2004), MPEG Surround (ISO/IEC 23003-1) and MPEG SAOC (ISO/IEC 23003-2).

In order not to introduce undesired correlation, different decorrelative transformations is mutual decorrelation.According to other implementations Example carries out identical decorrelative transformation to some or all of objects approached in audio object 110.

System 100 further includes audio object reconstitution assembly 128.Object reconstruction component 128 is disposed in audio object and approaches The downstream of component 108, decorrelation component 118 and the second receiving unit 112.Object reconstruction component 128 is configured as, for N number of Each of the corresponding decorrelation audio object 136 that do not have approached in audio object 138 approaches audio object, by approaching sound Frequency object 138 reconstructs the time/frequency tile of audio object 142.In other words, if a certain approach audio object 138 still Decorrelative transformation is not carried out, then it, which is simply reconstructed into, approaches audio object by what audio object approached that component 108 provides 110.Object reconstruction component 128 is further configured to, and has corresponding decorrelation for N number of approach in audio object 110 Each of audio object 136 approaches audio object, using decorrelation audio object 136 and corresponding approaches 110 liang of audio object Person reconstructs the time/frequency tile of audio object.

To facilitate this process, the second receiving component 112 is further configured to receive at least one weighting parameter 132 for each of the N approximated audio objects 110 that has a corresponding decorrelated audio object 136. The at least one weighting parameter 132 represents a first weighting factor 116 and a second weighting factor 114. The first weighting factor 116, also called the dry factor, and the second weighting factor 114, also called the wet factor, are derived from the at least one weighting parameter 132 by a wet/dry extractor 134. The first weighting factor 116 and/or the second weighting factor 114 may vary over time and frequency, i.e. the values of the weighting factors 116, 114 may differ for each time/frequency tile processed.

In some embodiments, the at least one weighting parameter 132 comprises the first weighting factor 116 and the second weighting factor 114. In some embodiments, the at least one weighting parameter 132 comprises a single weighting parameter. If so, the wet/dry extractor 134 may derive the first weighting factor 116 and the second weighting factor 114 from the single weighting parameter 132. For example, the first weighting factor 116 and the second weighting factor 114 may fulfil a relation that allows one of the weighting factors to be derived once the other is known. An example of such a relation is that the sum of the squares of the first weighting factor 116 and the second weighting factor 114 equals one. Thus, if the single weighting parameter 132 comprises the first weighting factor 116, the second weighting factor 114 may be derived as the square root of one minus the square of the first weighting factor 116, and vice versa.
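The derivation of the wet factor from a single transmitted dry factor can be illustrated with a minimal sketch. The function name derive_wet_dry and the clamping of the input to the range [0, 1] are assumptions made for this illustration; the description only states the sum-of-squares relation.

```python
import math


def derive_wet_dry(w_dry: float) -> tuple[float, float]:
    """Return (w_dry, w_wet) from a single transmitted dry factor.

    Uses the relation w_dry**2 + w_wet**2 == 1 from the description.
    Clamping to [0, 1] is a safeguard added for this sketch only.
    """
    w_dry = min(max(w_dry, 0.0), 1.0)
    w_wet = math.sqrt(1.0 - w_dry * w_dry)
    return w_dry, w_wet
```

For example, a dry factor of 0.6 yields a wet factor of approximately 0.8.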

The first weighting factor 116 is used for weighting 122 the approximated audio object 110, i.e. it is multiplied with the approximated audio object 110. The second weighting factor 114 is used for weighting 120 the corresponding decorrelated audio object 136, i.e. it is multiplied with the corresponding decorrelated audio object 136. The audio object reconstructing component 128 is further configured to combine, for example by performing a summation 124, the weighted approximated audio object 150 with the corresponding weighted decorrelated audio object 152 so as to reconstruct the time/frequency tile of the corresponding audio object 142.

In other words, for each object and each time/frequency tile, the amount of decorrelation can be controlled by one weighting parameter 132. In the wet/dry extractor 134, this weighting parameter is converted into a weighting factor 116 (w_dry) that is applied to the approximated object 110 and a weighting factor 114 (w_wet) that is applied to the decorrelated object 136. The sum of the squares of these weighting factors is one, i.e.

w_dry² + w_wet² = 1

This means that the final object 142, which is the output of the summation 124, has the same energy as the corresponding approximated object 110.
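The energy-preserving behaviour of the weighted summation can be checked with a small sketch. It assumes, beyond what is stated at this point of the description, that the decorrelated object has the same energy as the approximated object and is uncorrelated with it (zero inner product over the tile). The names reconstruct_tile and energy, and the toy sample values, are hypothetical.

```python
import math


def reconstruct_tile(approx, decorr, w_dry):
    """Weight and sum one tile: w_dry * approx + w_wet * decorr."""
    w_wet = math.sqrt(max(0.0, 1.0 - w_dry * w_dry))
    return [w_dry * a + w_wet * d for a, d in zip(approx, decorr)]


def energy(x):
    """Sum of squared samples of one tile."""
    return sum(v * v for v in x)


# Toy tile (assumed values): decorr has the same energy as approx
# and a zero inner product with it.
approx = [1.0, 2.0, -1.0, 0.5]
decorr = [2.0, -1.0, 0.5, 1.0]

out = reconstruct_tile(approx, decorr, 0.6)
```

Under these assumptions, energy(out) equals energy(approx) up to rounding, whatever dry factor between 0 and 1 is chosen.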

In order to enable the input signals 126, 130 to be decoded by an audio decoding system that cannot handle decorrelation, i.e. in order to maintain backwards compatibility with such audio decoders, the input signal 126 may be arranged in a frame 202 as depicted in Fig. 2. According to this embodiment, the reconstruction matrix 104 is arranged in a first field of the frame 202 using a first format, and the at least one weighting parameter 132 is arranged in a second field of the frame 202 using a second format. In this way, a decoder that is able to read the first format but not the second format can still decode the reconstruction matrix 104 and use it to upmix the downmix signals 106 in any conventional way. The second field of the frame 202 may in this case be discarded.
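The two-field arrangement can be illustrated with a toy frame layout. The layout below (a one-byte tag, a two-byte length and a payload of 32-bit floats per field), the tag values and the function names are purely hypothetical and are not taken from the disclosure or from any standard; the sketch only shows how a decoder that knows the first field can skip the second.

```python
import struct

FIELD_MATRIX = 1   # hypothetical tag for the first field (reconstruction matrix)
FIELD_WEIGHTS = 2  # hypothetical tag for the second field (weighting parameters)


def pack_frame(matrix_coeffs, weights):
    """Serialize both fields as tag, payload length, payload."""
    def field(tag, values):
        payload = struct.pack(f"<{len(values)}f", *values)
        return struct.pack("<BH", tag, len(payload)) + payload
    return field(FIELD_MATRIX, matrix_coeffs) + field(FIELD_WEIGHTS, weights)


def parse_frame(data, known_tags):
    """Decode the fields whose tag is known and skip all others."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, length = struct.unpack_from("<BH", data, pos)
        pos += 3  # size of the tag and length header
        if tag in known_tags:
            fields[tag] = list(struct.unpack_from(f"<{length // 4}f", data, pos))
        pos += length  # unknown fields are discarded
    return fields


frame = pack_frame([1.0, 0.5], [0.75])
legacy = parse_frame(frame, {FIELD_MATRIX})               # first format only
full = parse_frame(frame, {FIELD_MATRIX, FIELD_WEIGHTS})  # both formats
```

In this sketch the legacy decoder recovers the matrix coefficients and never interprets the weighting parameters.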

According to some embodiments, the audio decoding system 100 in Fig. 1 may additionally receive L auxiliary signals 144, for example at the first receiving component 102. There may be one or more such auxiliary signals, i.e. L ≥ 1. These auxiliary signals 144 may be included in the input signal 130. The auxiliary signals 144 may be included in the input signal 130 in such a way that backwards compatibility as described above is maintained, i.e. such that a decoder system that cannot handle auxiliary signals can still derive the downmix signals 106 from the input signal 130. The reconstruction matrix 104 may further enable reconstruction of the approximations of the N audio objects 110 from the M downmix signals 106 and the L auxiliary signals 144. The audio object approximating component 108 may therefore be configured to apply the reconstruction matrix 104 to the M downmix signals 106 and the L auxiliary signals 144 in order to generate the N approximated audio objects 110.

The role of the auxiliary signals 144 is to improve the approximation of the N audio objects in the audio object approximating component 108. According to one example, at least one of the auxiliary signals 144 is equal to one of the N audio objects to be reconstructed. In that case, the vector in the reconstruction matrix 104 used to reconstruct that specific audio object will comprise only a single non-zero parameter, for example a parameter with the value one (1). According to other examples, at least one of the L auxiliary signals 144 is a combination of at least two of the N audio objects to be reconstructed.
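Applying the reconstruction matrix to the stacked downmix and auxiliary signals can be sketched as a plain matrix product. The function name approximate_objects, the row and column ordering (downmix signals first, auxiliary signals last) and the toy signals are assumptions for this illustration. The example uses N = 2, M = 1 and L = 1, with the auxiliary signal equal to the first object, so the first matrix row has a single non-zero entry of one.

```python
def approximate_objects(recon_matrix, downmix, aux):
    """Apply an N x (M + L) matrix to M downmix and L auxiliary signals.

    Each signal is a list of samples for one time/frequency tile.
    """
    inputs = list(downmix) + list(aux)
    n_samples = len(inputs[0])
    return [
        [sum(c * sig[t] for c, sig in zip(row, inputs)) for t in range(n_samples)]
        for row in recon_matrix
    ]


s1 = [1.0, 0.0, 2.0]                      # first audio object (toy values)
s2 = [0.0, 1.0, 1.0]                      # second audio object (toy values)
downmix = [[a + b for a, b in zip(s1, s2)]]
aux = [s1]                                # auxiliary signal equal to object 1

recon_matrix = [
    [0.0, 1.0],    # object 1: single non-zero parameter with value one
    [1.0, -1.0],   # object 2: downmix minus auxiliary signal
]
approx = approximate_objects(recon_matrix, downmix, aux)
```

In this idealized case both objects are recovered exactly; in general the matrix only yields approximations.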

In some embodiments, the L auxiliary signals may represent signal dimensions of the N audio objects that were lost in the process of generating the M downmix signals 106 from the N audio objects. This can be explained by noting that the M downmix signals 106 span a hyperplane in a signal space and that the L auxiliary signals 144 do not lie in this hyperplane. For example, the L auxiliary signals 144 may be orthogonal to the hyperplane spanned by the M downmix signals 106. Based on the M downmix signals 106 alone, only signals that lie in the hyperplane can be reconstructed, i.e. audio objects that do not lie in the hyperplane will be approximated by an audio signal in the hyperplane. By further using the L auxiliary signals 144 in the reconstruction, signals that do not lie in the hyperplane can also be reconstructed. As a result, the approximation of the audio objects can be improved by also using the L auxiliary signals.
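The geometric argument can be made concrete with a two-sample toy example, in which a single downmix signal (M = 1) spans a line in a two-dimensional signal space. The helper names and the sample values are assumptions for this sketch; the auxiliary signal is taken here to be exactly the component of the object that is orthogonal to the downmix.

```python
def dot(a, b):
    """Inner product of two equal-length sample vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_onto(s, d):
    """Best approximation of s within the span of the single downmix d."""
    gain = dot(s, d) / dot(d, d)
    return [gain * v for v in d]


obj = [3.0, 1.0]       # toy audio object
downmix = [1.0, 1.0]   # M = 1 downmix signal

approx = project_onto(obj, downmix)            # lies in the span of the downmix
aux = [s - a for s, a in zip(obj, approx)]     # hypothetical auxiliary signal
recon = [a + x for a, x in zip(approx, aux)]   # uses downmix and auxiliary signal
```

From the downmix alone only approx can be obtained; adding the orthogonal auxiliary signal recovers the object itself.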

Fig. 3 shows, by way of example, a generalized block diagram of an audio encoder 300 for generating at least one weighting parameter 320. The at least one weighting parameter 320 is to be used in a decoder (such as the audio decoding system 100 described above) when reconstructing a time/frequency tile of a specific audio object by combining (reference 124 of Fig. 1) a weighted decoder-side approximation of the specific audio object (reference 150 of Fig. 1) with a corresponding weighted decorrelated version of the decoder-side approximation of the specific audio object (reference 152 of Fig. 1).

The encoder 300 comprises a receiving component 302 configured to receive M downmix signals 312, which are combinations of at least N audio objects including the specific audio object. The receiving component 302 is further configured to receive the specific audio object 314. In some embodiments, the receiving component 302 is further configured to receive L auxiliary signals 322. As discussed above, at least one of the L auxiliary signals 322 may be equal to one of the N audio objects, at least one of the L auxiliary signals 322 may be a combination of at least two of the N audio objects, and at least one of the L auxiliary signals 322 may contain information that is not present in any of the M downmix signals.

The encoder 300 further comprises a calculating unit 304. The calculating unit 304 is configured to calculate, for example in a first energy calculating component 306, a first quantity 316 indicative of an energy level of the specific audio object. The first quantity 316 may be calculated as a norm of the specific audio object. For example, the first quantity 316 may be equal to the energy of the specific audio object and may thus be calculated using the squared 2-norm, Q₁ = ||S||², where S denotes the specific audio object. The first quantity may alternatively be calculated as another quantity indicative of the energy of the specific audio object, such as the square root of the energy.

The calculating unit 304 is further configured to calculate a second quantity 318 indicative of an energy level corresponding to the energy level of an encoder-side approximation of the specific audio object 314. The encoder-side approximation may, for example, be a combination, such as a linear combination, of the M downmix signals 312. Alternatively, the encoder-side approximation may be a combination, such as a linear combination, of the M downmix signals 312 and the L auxiliary signals 322. The second quantity may be calculated in a second energy calculating component 308.

The encoder-side approximation may, for example, be computed using a non-energy-matched upmix matrix and the M downmix signals 312. In the context of this specification, the term "non-energy-matched" should be understood to mean that the approximation of the specific audio object is not energy-matched to the specific audio object itself, i.e. the approximation will have a different energy level, usually a lower one, than the specific audio object 314.

Different methods may be used to generate the non-energy-matched upmix matrix. For example, a minimum mean squared error (MMSE) prediction method may be used, which takes at least the N audio objects and the M downmix signals 312 (and possibly the L auxiliary signals 322) as input. This can be described as an iterative method that aims to find the upmix matrix that minimizes the mean squared error of the approximations of the N audio objects. Specifically, the method multiplies a candidate upmix matrix with the M downmix signals 312 (and possibly the L auxiliary signals 322) to approximate the N audio objects, and compares the approximations with the N audio objects in terms of mean squared error. The candidate upmix matrix that minimizes the mean squared error is selected as the upmix matrix used to define the encoder-side approximation of the specific audio object.

When the MMSE method is used, the prediction error e between the specific audio object S and the approximated audio object S′ is orthogonal to S′. This means that:

||S′||² + ||e||² = ||S||²

In other words, the energy of the audio object S is equal to the sum of the energy of the approximated audio object and the energy of the prediction error. Because of this relation, the energy of the prediction error e gives an indication of the energy of the encoder-side approximation S′.
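The energy relation can be verified numerically with a small sketch. Note that the sketch uses a closed-form least-squares solution (normal equations for M = 2 downmix signals) rather than the search over candidate upmix matrices described above; both minimize the same squared error. The function name mmse_gains and the toy signals are assumptions made for this illustration.

```python
def dot(a, b):
    """Inner product of two equal-length sample vectors."""
    return sum(x * y for x, y in zip(a, b))


def mmse_gains(s, d1, d2):
    """Gains (c1, c2) minimizing ||s - c1*d1 - c2*d2||^2 for M = 2."""
    g11, g12, g22 = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    b1, b2 = dot(s, d1), dot(s, d2)
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("downmix signals are linearly dependent")
    return (b1 * g22 - b2 * g12) / det, (g11 * b2 - g12 * b1) / det


S = [1.0, 2.0, 3.0]    # specific audio object (toy values)
D1 = [1.0, 1.0, 0.0]   # first downmix signal (toy values)
D2 = [0.0, 1.0, 1.0]   # second downmix signal (toy values)

c1, c2 = mmse_gains(S, D1, D2)
S_approx = [c1 * x + c2 * y for x, y in zip(D1, D2)]   # S'
e = [s - a for s, a in zip(S, S_approx)]               # prediction error

q1 = dot(S, S)                 # ||S||^2
q2 = dot(S_approx, S_approx)   # ||S'||^2
```

In this toy case the error is orthogonal to the approximation, q2 plus the error energy equals q1 up to rounding, and q2 is smaller than q1, which illustrates why the approximation is not energy-matched.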

Therefore, the second quantity 318 may be calculated using either the approximation S′ of the specific audio object or the prediction error. The second quantity may be calculated as a norm of the approximation S′ of the specific audio object or as a norm of the prediction error e. For example, the second quantity may be calculated as a squared 2-norm, i.e. Q₂ = ||S′||² or Q₂ = ||e||². The second quantity may alternatively be calculated as another quantity indicative of the energy of the approximated specific audio object, such as the square root of the energy of the approximated specific audio object or the square root of the energy of the prediction error.

The calculating unit is further configured to calculate, for example in a parameter calculating component 310, the at least one weighting parameter 320 based on the first quantity 316 and the second quantity 318. The parameter calculating component 310 may, for example, calculate the at least one weighting parameter 320 by comparing the first quantity 316 and the second quantity 318. An exemplary parameter calculating component 310 will now be explained in detail in conjunction with Fig. 4 and Figs. 5a-c.

Fig. 4 shows, by way of example, a generalized block diagram of the parameter calculating component 310 for generating the at least one weighting parameter 320. The parameter calculating component 310 compares the first quantity 316 and the second quantity 318, for example in a ratio calculating component 402, by calculating a ratio r between the second quantity 318 and the first quantity 316. The ratio is then raised to the power of α, i.e.:

r = (Q₂/Q₁)^α

where Q₂ is the second quantity 318 and Q₁ is the first quantity 316. According to some embodiments, when Q₂ = ||S′|| and Q₁ = ||S||, α is equal to 2, i.e. the ratio r is the ratio of the energy of the approximated specific audio object to the energy of the specific audio object. The ratio raised to the power of α is then used to calculate the at least one weighting parameter 320, for example in a mapping component 404. The mapping component 404 subjects r 406 to an increasing function that maps r to the at least one weighting parameter 320. Such increasing functions are illustrated in Figs. 5a-c. In Figs. 5a-c, the horizontal axis represents the value of r 406 and the vertical axis represents the value of the weighting parameter 320. In this example, the weighting parameter 320 is a single weighting parameter corresponding to the first weighting factor 116 in Fig. 1.
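The ratio calculation can be sketched as follows. The function name energy_ratio, the error raised for a non-positive first quantity, and the toy sample values are assumptions for this illustration; the description does not specify how a silent object is handled.

```python
import math


def energy_ratio(q2, q1, alpha):
    """Return r = (q2 / q1) ** alpha for a positive first quantity q1."""
    if q1 <= 0.0:
        raise ValueError("first quantity must be positive")
    return (q2 / q1) ** alpha


S = [3.0, 4.0]          # specific audio object (toy values)
S_approx = [3.0, 0.0]   # encoder-side approximation (toy values)

norm_s = math.sqrt(sum(v * v for v in S))          # Q1 as a 2-norm
norm_a = math.sqrt(sum(v * v for v in S_approx))   # Q2 as a 2-norm

r_from_norms = energy_ratio(norm_a, norm_s, 2.0)   # norms with alpha = 2
r_from_energies = energy_ratio(9.0, 25.0, 1.0)     # energies with alpha = 1
```

Both calls give the same energy ratio, which shows how α compensates for the choice between norms and energies as the two quantities.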

In general, the principle of the mapping function is as follows:

If Q₂ ≪ Q₁, the first weighting factor approaches 0; if Q₂ ≈ Q₁, the first weighting factor approaches 1.

Fig. 5 a shows mapping function 502, in the mapping function 502, the value between 0 and 1 for r406, and the value of r It will be identical as the value of weighting parameters 312.For the value for being greater than 1 of r, the value of weighting parameters 320 will be 1.

Fig. 5b shows another mapping function 504 in which, for values of r 406 between 0 and 0.5, the value of the weighting parameter 320 is 0. For values of r above 1, the value of the weighting parameter 320 is 1. For values of r between 0.5 and 1, the value of the weighting parameter 320 is (r − 0.5) · 2.
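The two mapping functions can be written as clamped linear functions. The function names map_502 and map_504 are hypothetical and simply follow the reference numerals; clamping negative inputs to 0 is an assumption of this sketch, since r is a ratio of non-negative quantities and should not be negative in practice.

```python
def map_502(r):
    """Mapping function 502: identity on [0, 1], saturating at 1."""
    return min(max(r, 0.0), 1.0)


def map_504(r):
    """Mapping function 504: 0 up to r = 0.5, then a linear ramp to 1 at r = 1."""
    return min(max((r - 0.5) * 2.0, 0.0), 1.0)
```

With mapping function 504, any tile whose ratio r is at most 0.5 receives a dry factor of 0 and is therefore reconstructed from the decorrelated object only.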

Fig. 5 c shows the third substitution mapping function 506 of the mapping function of overview diagram 5a-b.Mapping function 506 is by least Four parameter b₁、b₂、β₁And β₂It is limited, these parameters can be the optimal perceived of the reconstruct audio object for decoder-side The constant that quality is tuned.Generally, the maximum of the decorrelation in limitation output audio signal can be beneficial, because The quality for approaching audio object of decorrelation is usually more of poor quality when audio object is individually listened to than approaching.By b₁It is set as big Directly control this point in zero, so as to ensure weighting parameters 320 (therefore and Fig. 1 in the first weighted factor 116) It all will be greater than zero under all situations.By b₂Be set smaller than 1 have be constantly present minimum in the output of audio decoding system 100 The effect of horizontal decorrelation energy.In other words, the second weighted factor 114 in Fig. 1 will be always greater than zero.β₁Implicitly control The amount for the decorrelation added in the output of audio decoding system 100 is made, but is related to different dynamics (with b₁Compared to).Class As, β₂Implicitly control the amount of the decorrelation in the output of audio decoding system 100.

If a curved mapping function between the r values β₁ and β₂ is desired, at least one further parameter, which may be a constant, is needed.
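A piecewise-linear reading of the mapping function 506 can be sketched as follows. The straight line between β₁ and β₂ is an assumption inferred from the fact that the function generalizes Figs. 5a-b; the function name map_506 and the example constants are likewise hypothetical, not tuned values from the disclosure.

```python
def map_506(r, b1, b2, beta1, beta2):
    """Return b1 for r <= beta1, b2 for r >= beta2, linear in between."""
    if not beta1 < beta2:
        raise ValueError("beta1 must be smaller than beta2")
    if r <= beta1:
        return b1
    if r >= beta2:
        return b2
    return b1 + (b2 - b1) * (r - beta1) / (beta2 - beta1)
```

With b1 = 0, b2 = 1, beta1 = 0 and beta2 = 1 this reduces to the mapping of Fig. 5a, and with beta1 = 0.5 it reduces to the mapping of Fig. 5b. Choosing, say, b1 = 0.1 and b2 = 0.9 keeps the dry factor strictly between 0 and 1, so that both the approximated and the decorrelated contribution are always present.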

Equivalents, extensions, alternatives and miscellaneous

Further embodiments of the present disclosure will become apparent to a person skilled in the art after studying the description above. Even though the present description and drawings disclose embodiments and examples, the disclosure is not restricted to these specific examples. Numerous modifications and variations can be made without departing from the scope of the present disclosure, which is defined by the accompanying claims. Any reference signs appearing in the claims are not to be understood as limiting their scope.

Additionally, variations to the disclosed embodiments can be understood and effected by the skilled person in practicing the disclosure, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

The systems and methods disclosed hereinabove may be implemented as software, firmware, hardware or a combination thereof. In a hardware implementation, the division of tasks between functional units referred to in the above description does not necessarily correspond to the division into physical units; on the contrary, one physical component may have multiple functionalities, and one task may be carried out by several physical components in cooperation. Certain components or all components may be implemented as software executed by a digital signal processor or microprocessor, or be implemented as hardware or as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may comprise computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person skilled in the art, the term computer storage media includes both volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Further, it is well known to the skilled person that communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media.

Claims

1. An audio decoding system for reconstructing a time/frequency tile of N audio objects, comprising:

a first receiving component (102) configured to receive a first input signal (130), the first input signal comprising M downmix signals (126) and L auxiliary signals (130);

a second receiving component (112) configured to:

receive a second input signal (126) and extract a reconstruction matrix (104) from the second input signal; and

receive weighting parameters (132);

an audio object approximating component (108) arranged downstream of the first receiving component and the second receiving component, and configured to apply the reconstruction matrix to the M downmix signals and the L auxiliary signals to generate N approximated audio objects;

a wet/dry extractor component (134) arranged downstream of the second receiving component, and configured to derive a dry factor (116) and a wet factor (114) from the weighting parameters received by the second receiving component;

a decorrelating component (118) arranged downstream of the audio object approximating component, and configured to subject at least a subset of the N approximated audio objects to a decorrelation process so as to generate at least one decorrelated audio object, whereby each of the at least one decorrelated audio object corresponds to one of the N approximated audio objects;

an audio object reconstructing component (128) arranged downstream of the audio object approximating component, the decorrelating component and the wet/dry extractor component, the audio object reconstructing component being configured to:

weight the N approximated audio objects using the dry factor;

weight the at least one decorrelated audio object using the wet factor; and

combine the weighted N approximated audio objects and the weighted at least one decorrelated audio object to reconstruct the time/frequency tile of the N audio objects (142).

2. The system according to claim 1, wherein the wet factor and the dry factor vary over time and frequency, and wherein the reconstruction matrix varies over time and frequency.

3. The system according to claim 1, wherein at least one of the L auxiliary signals is equal to one of the N audio objects to be reconstructed.

4. The system according to claim 1, wherein at least one of the L auxiliary signals is a combination of at least two of the N audio objects to be reconstructed.

5. The system according to claim 1, wherein the M downmix signals span a hyperplane, and wherein at least one of the L auxiliary signals does not lie in the hyperplane spanned by the M downmix signals.

6. The system according to claim 5, wherein the at least one of the L auxiliary signals is orthogonal to the hyperplane spanned by the M downmix signals.

7. The system according to claim 1, wherein the reconstruction matrix and the weighting parameters, when received, are arranged in a frame, wherein the reconstruction matrix is arranged in a first field of the frame using a first format and the weighting parameters are arranged in a second field of the frame using a second format, such that a decoder that supports only the first format is able to decode the reconstruction matrix in the first field and discard the weighting parameters in the second field.

8. A method for reconstructing a time/frequency tile of N audio objects by an audio decoding system, comprising:

receiving, by a first receiving component of the audio decoding system, a first input signal, the first input signal comprising M downmix signals and L auxiliary signals;

receiving, by a second receiving component of the audio decoding system, a second input signal and weighting parameters;

extracting, by the second receiving component, a reconstruction matrix from the second input signal;

applying, by an audio object approximating component of the audio decoding system, the reconstruction matrix to the M downmix signals and the L auxiliary signals to generate N approximated audio objects, the audio object approximating component being arranged downstream of the first receiving component and the second receiving component;

deriving, by a wet/dry extractor component of the audio decoding system, a dry factor and a wet factor from the received weighting parameters, the wet/dry extractor component being arranged downstream of the second receiving component;

decorrelating, by a decorrelating component of the audio decoding system, at least a subset of the N approximated audio objects, including generating at least one decorrelated audio object, wherein each of the at least one decorrelated audio object corresponds to one of the N approximated audio objects, the decorrelating component being arranged downstream of the audio object approximating component;

weighting, by an audio object reconstructing component, the N approximated audio objects using the dry factor, the audio object reconstructing component being arranged downstream of the audio object approximating component, the decorrelating component and the wet/dry extractor component;

weighting, by the audio object reconstructing component, the at least one decorrelated audio object using the wet factor; and

combining, by the audio object reconstructing component, the weighted N approximated audio objects and the weighted at least one decorrelated audio object to reconstruct the time/frequency tile of the N audio objects, wherein the audio decoding system comprises one or more computer processors.

9. The method according to claim 8, wherein:

weighting the N approximated audio objects using the dry factor comprises multiplying the N approximated audio objects by the dry factor;

weighting the at least one decorrelated audio object using the wet factor comprises multiplying the at least one decorrelated audio object by the wet factor; and

combining the weighted N approximated audio objects and the weighted at least one decorrelated audio object comprises summing the weighted N approximated audio objects and the weighted at least one decorrelated audio object.

10. The method according to claim 8, wherein the wet factor and the dry factor vary over time and frequency, and wherein the reconstruction matrix varies over time and frequency.

11. The method according to claim 8, wherein at least one of the L auxiliary signals is equal to one of the N audio objects to be reconstructed.

12. The method according to claim 8, wherein at least one of the L auxiliary signals is a combination of at least two of the N audio objects to be reconstructed.

13. The method according to claim 8, wherein the M downmix signals span a hyperplane, and wherein at least one of the L auxiliary signals does not lie in the hyperplane spanned by the M downmix signals.

14. The method according to claim 13, wherein at least one of the L auxiliary signals is orthogonal to the hyperplane spanned by the M downmix signals.

15. The method according to claim 8, wherein the reconstruction matrix and the weighting parameters, when received, are arranged in a frame, wherein the reconstruction matrix is arranged in a first field of the frame using a first format and the weighting parameters are arranged in a second field of the frame using a second format, such that a decoder that supports only the first format is able to decode the reconstruction matrix in the first field and discard the weighting parameters in the second field.