
CN101341533B - Method and apparatus for decoding audio signal - Google Patents


Info

Publication number
CN101341533B
CN101341533B (application number CN2006800421983A)
Authority
CN
China
Prior art keywords
channel
audio signal
spatial
spatial information
information
Legal status
Active
Application number
CN2006800421983A
Other languages
Chinese (zh)
Other versions
CN101341533A (en)
Inventor
房熙锡
吴贤午
林宰显
金东秀
郑亮源
Current Assignee
LG Electronics Inc
Original Assignee
LG Electronics Inc
Application filed by LG Electronics Inc filed Critical LG Electronics Inc
Priority claimed from PCT/KR2006/003659 (WO2007032646A1)
Publication of CN101341533A
Application granted
Publication of CN101341533B

Landscapes

  • Stereophonic System (AREA)

Abstract

An apparatus for decoding an audio signal and a method thereof are disclosed. The present invention includes receiving an audio signal and spatial information, identifying a type of modified spatial information, generating the modified spatial information using the spatial information, and decoding the audio signal using the modified spatial information, wherein the type of the modified spatial information includes at least one of partial spatial information, combined spatial information, and extended spatial information. Thus, the audio signal can be decoded into a configuration different from the configuration decided by the encoding apparatus. Even if the number of speakers is smaller or larger than the number of multi-channels before downmixing, an output channel count equal to the number of speakers can be generated from the downmixed audio signal.

Description

Method and apparatus for decoding audio signal
Technical Field
The present invention relates to audio signal processing, and more particularly, to an apparatus for decoding an audio signal and method thereof. Although the present invention is suitable for a wide scope of applications, it is particularly suitable for decoding audio signals.
Background
In general, an encoder encodes an audio signal. If the audio signal to be encoded is a multi-channel audio signal, the multi-channel audio signal is downmixed into two channels or one channel to generate a downmixed audio signal, and spatial information is extracted from the multi-channel audio signal. The spatial information is information that can be used to upmix the downmix audio signal back into a multi-channel audio signal. Meanwhile, the encoder downmixes the multi-channel audio signal according to a predetermined tree configuration. In this case, the predetermined tree configuration may be a structure agreed between the audio signal decoder and the audio signal encoder. In particular, if there is identification information indicating which of the predetermined tree configurations is used, the decoding apparatus can know the structure of the upmixed audio signal, e.g., the number of channels, the position of each channel, and so on.
Therefore, if the encoder downmixes a multi-channel audio signal according to a predetermined tree configuration, the spatial information extracted in this process also depends on the structure. Therefore, if a decoding apparatus upmixes a downmixed audio signal using spatial information depending on the structure, a multi-channel audio signal according to the structure is generated. That is, if a decoding apparatus uses spatial information generated by an encoding apparatus as it is, upmixing is performed only according to a structure agreed between the encoding apparatus and the decoding apparatus. Therefore, an output channel audio signal that does not follow the agreed structure cannot be generated. For example, it is not possible to upmix a signal into an audio signal having a different (fewer or greater) number of channels than the number of channels determined according to the agreed structure.
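The level-difference bookkeeping described above can be illustrated with a minimal sketch: downmix a channel pair to one channel while extracting a CLD-style parameter, then re-split the downmix so the original power ratio is restored. The function names and the broadband (whole-signal) treatment are assumptions made for illustration; an actual spatial audio codec computes such parameters per time/frequency band and also transmits correlation information.

```python
import math

def downmix_pair(x1, x2):
    """Downmix two channel signals (lists of samples) to one channel and
    extract a CLD-style spatial parameter (level difference in dB)."""
    p1 = sum(s * s for s in x1)
    p2 = sum(s * s for s in x2)
    cld_db = 10.0 * math.log10(p1 / p2)       # channel level difference
    mono = [a + b for a, b in zip(x1, x2)]    # simple sum downmix
    return mono, cld_db

def upmix_pair(mono, cld_db):
    """Re-split a mono downmix into two channels whose power ratio matches the CLD."""
    r = 10.0 ** (cld_db / 10.0)               # target power ratio P1/P2
    g1 = math.sqrt(r / (1.0 + r))             # gain for channel 1
    g2 = math.sqrt(1.0 / (1.0 + r))           # gain for channel 2
    return [g1 * s for s in mono], [g2 * s for s in mono]

l = [0.8, -0.4, 0.6]
r = [0.2, -0.1, 0.3]
m, cld = downmix_pair(l, r)
l2, r2 = upmix_pair(m, cld)
```

Note that the re-split channels preserve only the transmitted power ratio, not the original waveforms, which is exactly why an output configuration different from the encoder's tree becomes possible in principle.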
Disclosure of Invention
Accordingly, the present invention is directed to an apparatus for decoding an audio signal and method thereof that substantially obviate one or more of the problems due to limitations and disadvantages of the related art.
An object of the present invention is to provide an apparatus for decoding an audio signal and method thereof, by which the audio signal can be decoded to have a structure different from that determined by an encoder.
Another object of the present invention is to provide an apparatus for decoding an audio signal and method thereof, by which the audio signal can be decoded using spatial information generated by modifying previous spatial information generated by an encoding process.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
To achieve these and other advantages and in accordance with the purpose of the present invention, as embodied and broadly described, a method of decoding an audio signal according to the present invention includes receiving an audio signal and spatial information, identifying a type of modified spatial information, generating the modified spatial information using the spatial information, and decoding the audio signal using the modified spatial information, wherein the type of the modified spatial information includes at least one of partial spatial information, combined spatial information, and extended spatial information.
To further achieve these and other advantages and in accordance with the purpose of the present invention, a method of decoding an audio signal includes receiving spatial information, generating combined spatial information using the spatial information, and decoding an audio signal using the combined spatial information, wherein the combined spatial information is generated by combining spatial parameters included in the spatial information.
To further achieve these and other advantages and in accordance with the purpose of the present invention, a method of decoding an audio signal includes receiving spatial information including at least one spatial parameter and spatial filter information including at least one filter parameter, generating combined spatial information having a surround effect by combining the spatial parameter with the filter parameter, and converting the audio signal into a virtual surround signal using the combined spatial information.
To further achieve these and other advantages and in accordance with the purpose of the present invention, a method of decoding an audio signal includes receiving an audio signal, receiving spatial information including tree configuration information and spatial parameters, generating modified spatial information by adding extended spatial information to the spatial information, and upmixing the audio signal using the modified spatial information, wherein upmixing the audio signal includes converting the audio signal into a primary upmix audio signal based on the spatial information and converting the primary upmix audio signal into a secondary upmix audio signal based on the extended spatial information.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
Brief description of the drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.
In the drawings:
fig. 1 is a block diagram of an audio signal encoding apparatus and an audio signal decoding apparatus according to the present invention;
FIG. 2 is a schematic diagram of one embodiment of applying partial spatial information;
FIG. 3 is a schematic diagram of another embodiment of applying partial spatial information;
FIG. 4 is a schematic diagram of yet another embodiment of applying partial spatial information;
FIG. 5 is a schematic diagram of one embodiment of applying combined spatial information;
FIG. 6 is a schematic diagram of another embodiment of applying combined spatial information;
FIG. 7 is a diagram of the acoustic paths from the speakers to the listener, showing the location of the speakers;
fig. 8 is a diagram explaining signals output from each speaker position for a surround effect;
fig. 9 is a conceptual diagram explaining a method of generating a 3-channel signal using a 5-channel signal;
FIG. 10 is a diagram of an embodiment of configuring an expansion channel based on expansion channel configuration information;
FIG. 11 is a diagram for explaining the configuration of the extension channels and their relationship to the extension spatial parameters shown in FIG. 10;
FIG. 12 is a position diagram of a 5.1-channel multi-channel audio signal and a 6.1-channel output-channel audio signal;
fig. 13 is a diagram for explaining a relationship between a virtual sound source position and a level difference between two channels;
fig. 14 is a graph explaining the levels of two rear channels and one rear center channel;
fig. 15 is a diagram for explaining the positions of multi-channel audio signals of 5.1 channels and the positions of output channel audio signals of 7.1 channels;
fig. 16 is a diagram explaining the levels of two left channels and the level of one left front channel (Lfs); and
fig. 17 is a diagram explaining the levels of three front channels and the level of one left front channel (Lfs).
Best Mode for Carrying Out The Invention
Reference will now be made in detail to the present preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings.
The terms used in the present invention are selected from general terms that are currently and widely used. In special cases, terms arbitrarily chosen by the applicant are used, and their detailed meanings are explained in the description of the corresponding preferred embodiments. Therefore, the present invention should be understood not by the literal sense of the terms but by their intended meanings.
First, the present invention generates modified spatial information using spatial information and then decodes an audio signal using the generated modified spatial information. In this case, the spatial information is spatial information extracted in downmixing according to a predetermined tree configuration, and the modified spatial information is spatial information newly generated using the spatial information.
The present invention will be explained in detail below with reference to fig. 1.
Fig. 1 is a block diagram of an audio signal encoding apparatus and an audio signal decoding apparatus according to an embodiment of the present invention.
Referring to fig. 1, an apparatus for encoding an audio signal (hereinafter, simply referred to as an encoding apparatus) 100 includes a downmixing unit 110 and a spatial information extracting unit 120. And the apparatus for decoding an audio signal (hereinafter, simply referred to as a decoding apparatus) 200 includes an output channel generating unit 210 and a modified spatial information generating unit 220.
The downmixing unit 110 of the encoding apparatus 100 generates a downmix audio signal d by downmixing a multi-channel audio signal IN_M. The downmix audio signal d may be a signal generated by the downmixing unit 110 by downmixing the multi-channel audio signal IN_M, or an arbitrary downmix audio signal generated by a user arbitrarily downmixing the multi-channel audio signal IN_M.
The spatial information extraction unit 120 of the encoding apparatus 100 extracts spatial information s from the multi-channel audio signal IN_M. In this case, the spatial information is the information required to upmix the downmix audio signal d into the multi-channel audio signal IN_M.
Meanwhile, the spatial information may be information extracted in the process of downmixing the multi-channel audio signal IN_M according to a predetermined tree configuration. In this case, the tree configuration may correspond to the tree configuration(s) agreed between the audio signal decoding apparatus and the audio signal encoding apparatus, by which the present invention is not limited.
And, the spatial information can include tree configuration information, an indicator, a spatial parameter, and the like. The tree configuration information is information on a tree configuration type. Therefore, the number of multi-channels, the downmixing order per channel, and the like vary according to the type of tree configuration. The indicator is information indicating whether the extended spatial information exists or not, etc. And the spatial parameters may include a channel level difference (hereinafter, abbreviated as CLD), inter-channel correlation or coherence (hereinafter, abbreviated as ICC), a channel prediction coefficient (hereinafter, abbreviated as CPC), etc. in downmixing at least two channels into at most two channels.
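The CLD and ICC parameters named above can be illustrated with their textbook definitions: CLD is the power ratio of two channels in decibels, and ICC their normalized cross-correlation. This is a broadband sketch under that assumption; the standardized parameters are computed per parameter band, and the CPC prediction coefficients are omitted here.

```python
import math

def cld_icc(ch1, ch2):
    """Compute a channel level difference (dB) and an inter-channel
    correlation for two channel signals, using the textbook definitions.
    Illustrative only: real spatial coding works per parameter band."""
    p1 = sum(s * s for s in ch1)
    p2 = sum(s * s for s in ch2)
    cross = sum(a * b for a, b in zip(ch1, ch2))
    cld = 10.0 * math.log10(p1 / p2)
    icc = cross / math.sqrt(p1 * p2)   # normalized correlation in [-1, 1]
    return cld, icc

# A channel and a half-amplitude copy: 6 dB level difference, full correlation.
cld, icc = cld_icc([1.0, 0.5, -0.5], [0.5, 0.25, -0.25])
```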
Meanwhile, the spatial information extraction unit 120 can further extract the extended spatial information in addition to the spatial information. In this case, the extension spatial information is information required to additionally extend the downmix audio signal d which has been upmixed with the spatial parameters. In addition, the extended spatial information may include extended channel configuration information and extended spatial parameters. The extended spatial information is not limited to the extended spatial information extracted by the spatial information extraction unit 120, which will be explained below.
Besides, the encoding apparatus 100 can further include a core codec encoding unit (not shown in the drawings) generating a downmix audio bitstream by encoding the downmix audio signal d, a spatial information encoding unit (not shown in the drawings) generating a spatial information bitstream by encoding the spatial information s, and a multiplexing unit (not shown in the drawings) generating an audio signal bitstream by multiplexing the downmix audio bitstream with the spatial information bitstream, and the present invention is not limited in this respect.
Also, the decoding apparatus 200 can further include a demultiplexing unit (not shown in the drawings) that separates the audio signal bitstream into a downmix audio bitstream and a spatial information bitstream, a core codec decoding unit (not shown in the drawings) that decodes the downmix audio bitstream, and a spatial information decoding unit (not shown in the drawings) that decodes the spatial information bitstream, and the present invention is not limited in this respect.
The modified spatial information generating unit 220 in the decoding apparatus 200 identifies the type of the modified spatial information using the spatial information and then generates modified spatial information s' of the identified type based on the spatial information. In this case, the spatial information may be spatial information s communicated from the encoding apparatus 100. And, the modified spatial information is information newly generated using the spatial information.
Meanwhile, there may be various types of modified spatial information. And, the various types of modified spatial information may include at least one of a), b), and c): a) partial spatial information, b) combined spatial information, c) expanded spatial information, the invention not being limited in this respect.
The partial spatial information includes a part of the spatial parameters, the combined spatial information is generated by combining spatial parameters, and the extended spatial information is generated using the spatial information together with the extension spatial information.
The manner in which the modified spatial information is generated by the modified spatial information generating unit 220 may be different according to the type of the modified spatial information. And, a method of generating the modified spatial information according to the type of the modified spatial information will be explained in detail hereinafter.
Meanwhile, a reference for deciding the type of the modified spatial information may correspond to tree configuration information in the spatial information, an indicator in the spatial information, output channel information, and the like. The tree configuration information and the indicator may be included in the spatial information s from the encoding apparatus. The output channel information is information on speakers interconnected to the decoding apparatus 200, and may include the number of output channels, position information on each output channel, and the like. The output channel information may be previously input by a manufacturer or input by a user.
The method of deciding the type of the modified spatial information using such information will be explained in detail later.
The output channel generating unit 210 of the decoding apparatus 200 generates an output channel audio signal OUT_N from the downmix audio signal d using the modified spatial information s'.
The spatial filter information 230 is information on the acoustic path, and is provided to the modified spatial information generating unit 220. The spatial filter information may be used if the modified spatial information generating unit 220 generates combined spatial information having a surround effect.
Hereinafter, the method of decoding an audio signal by generating modified spatial information according to the type of the modified spatial information is explained as follows in the following order: (1) partial spatial information, (2) combined spatial information, and (3) extended spatial information.
(1) Partial spatial information
Since the spatial parameters are calculated in the course of downmixing the multi-channel audio signal according to the predetermined tree configuration, if the downmixed audio signal is decoded using the spatial parameters as they are, the original multi-channel audio signal before downmixing can be reconstructed. If the number of channels N of the output channel audio signal is to be made smaller than the number of channels M of the multi-channel audio signal, the downmix audio signal can be decoded by applying the spatial parameters only partially.
Such a method may vary according to the order and method of downmixing a multi-channel audio signal in an encoding apparatus, i.e., the type of tree configuration. And, the tree configuration type can be queried using the tree configuration information of the spatial information. And this method may be changed according to the number of output channels. In addition, the number of output channels may be queried using the output channel information.
Hereinafter, for the case where the number of channels of the output channel audio signal is smaller than that of the multi-channel audio signal, a method of decoding an audio signal by applying partial spatial information, which partially includes the spatial parameters, is explained for various tree configurations.
(1)-1. First embodiment of tree configuration (5-2-5 tree configuration)
Fig. 2 is a diagram of one embodiment of applying partial spatial information.
Referring to the left half of fig. 2, it shows the order in which a six-channel multi-channel audio signal (left front channel L, left surround channel Ls, center channel C, low-frequency channel LFE, right front channel R, and right surround channel Rs) is downmixed into stereo downmix channels Lo and Ro, and the relation between the multi-channel audio signal and the spatial parameters.
First, downmixing is performed between the left channel L and the left surround channel Ls, between the center channel C and the low-frequency channel LFE, and between the right channel R and the right surround channel Rs. This primary downmixing process generates a left total channel Lt, a center total channel Ct, and a right total channel Rt. The spatial parameters calculated in the primary downmixing process include CLD2 (with ICC2), CLD1 (with ICC1), CLD0 (with ICC0), and the like.
In a secondary process following the primary downmixing, the left total channel Lt, the center total channel Ct, and the right total channel Rt are downmixed together to generate a left channel Lo and a right channel Ro. The spatial parameters calculated in the secondary downmixing process include CLD_TTT, CPC_TTT, ICC_TTT, and the like.
In other words, the six-channel multi-channel audio signal is downmixed in the above-described order to generate the stereo downmix channels Lo and Ro.
If the spatial parameters (CLD2, CLD1, CLD0, CLD_TTT, etc.) calculated in the above sequential manner are used as they are, upmixing is performed in the reverse order of the downmixing to regenerate the six-channel multi-channel audio signal (left front channel L, left surround channel Ls, center channel C, low-frequency channel LFE, right front channel R, and right surround channel Rs).
Referring to the right half of fig. 2, if the partial spatial information corresponds to CLD_TTT among the spatial parameters (CLD2, CLD1, CLD0, CLD_TTT, etc.), the downmix signal is upmixed into the left total channel Lt, the center total channel Ct, and the right total channel Rt. If the left total channel Lt and the right total channel Rt are selected as the output channel audio signal, a two-channel output channel audio signal (Lt and Rt) can be generated. If the left total channel Lt, the center total channel Ct, and the right total channel Rt are selected as the output channel audio signal, a three-channel output channel audio signal (Lt, Ct, and Rt) can be generated. Furthermore, after additional upmixing using CLD1, if the left total channel Lt, the right total channel Rt, the center channel C, and the low-frequency channel LFE are selected, a four-channel output channel audio signal (Lt, Rt, C, and LFE) can be generated.
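The channel-count choices just described can be sketched as follows: depending on how many output channels are wanted, only a prefix of the parameter tree is applied. In this sketch, Lt, Ct, and Rt are assumed to be already recovered by the first upmix stage, the parameter key "CLD1" for the C/LFE split follows the labelling of this embodiment, and the TTT stage and ICC decorrelation are omitted; the function names are hypothetical.

```python
import math

def split_by_cld(mono, cld_db):
    """Split one channel into two using a CLD (dB); ICC decorrelation omitted."""
    r = 10.0 ** (cld_db / 10.0)
    g1, g2 = math.sqrt(r / (1 + r)), math.sqrt(1 / (1 + r))
    return [g1 * s for s in mono], [g2 * s for s in mono]

def partial_upmix(lt, rt, ct, params, n_out):
    """Apply only as much of the parameter tree as the desired output
    configuration needs (sketch for the 5-2-5 tree of fig. 2)."""
    if n_out == 2:                                # stop after the first stage
        return {"Lt": lt, "Rt": rt}
    if n_out == 3:
        return {"Lt": lt, "Ct": ct, "Rt": rt}
    if n_out == 4:                                # split Ct with CLD1 only
        c, lfe = split_by_cld(ct, params["CLD1"])
        return {"Lt": lt, "Rt": rt, "C": c, "LFE": lfe}
    raise ValueError("unsupported output configuration")

out = partial_upmix([0.5, 0.1], [0.4, 0.2], [0.3, 0.3], {"CLD1": 6.0}, 4)
```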
(1)-2. Second embodiment of tree configuration (5-1-5 tree configuration)
Fig. 3 is a diagram of another embodiment of applying partial spatial information.
Referring to the left half of fig. 3, it shows the order in which a six-channel multi-channel audio signal (left front channel L, left surround channel Ls, center channel C, low-frequency channel LFE, right front channel R, and right surround channel Rs) is downmixed into a mono downmix audio signal M, and the relation between the multi-channel audio signal and the spatial parameters.
First, as in the first embodiment, downmixing is performed between the left channel L and the left surround channel Ls, between the center channel C and the low-frequency channel LFE, and between the right channel R and the right surround channel Rs. This primary downmixing process generates a left total channel Lt, a center total channel Ct, and a right total channel Rt. The spatial parameters calculated in the primary downmixing process include CLD3 (with ICC3), CLD4 (with ICC4), CLD5 (with ICC5), and the like. (In this example, CLDx and ICCx are distinct from the correspondingly numbered parameters of the first embodiment.)
In a secondary process following the primary downmixing, the left total channel Lt and the center total channel Ct are downmixed together to generate a left center channel LC, and the center total channel Ct and the right total channel Rt are downmixed together to generate a right center channel RC. The spatial parameters calculated in the secondary downmixing process may include CLD2 (with ICC2), CLD1 (with ICC1), and the like.
Then, in a tertiary downmixing process, the left center channel LC and the right center channel RC are downmixed to generate the mono downmix signal M. The spatial parameter calculated in the tertiary downmixing process includes CLD0 (with ICC0) and the like.
Referring to the right half of fig. 3, if the partial spatial information corresponds to CLD0 among the spatial parameters (CLD3, CLD4, CLD5, CLD1, CLD2, CLD0, etc.), the left center channel LC and the right center channel RC are generated. If the left center channel LC and the right center channel RC are selected as the output channel audio signal, a two-channel output channel audio signal (LC and RC) can be generated.
Meanwhile, if the partial spatial information corresponds to CLD0, CLD1, and CLD2 among the spatial parameters (CLD3, CLD4, CLD5, CLD1, CLD2, CLD0, etc.), the left total channel Lt, the center total channel Ct, and the right total channel Rt are generated.
If the left total channel Lt and the right total channel Rt are selected as the output channel audio signal, a two-channel output channel audio signal (Lt and Rt) can be generated. If the left total channel Lt, the center total channel Ct, and the right total channel Rt are selected as the output channel audio signal, a three-channel output channel audio signal (Lt, Ct, and Rt) can be generated.
In the case where CLD4 is additionally included in the partial spatial information and upmixing to the center channel C and the low-frequency channel LFE has been performed, if the left total channel Lt, the right total channel Rt, the center channel C, and the low-frequency channel LFE are selected as the output channel audio signal, a four-channel output channel audio signal (Lt, Rt, C, and LFE) can be generated.
(1)-3. Third embodiment of tree configuration (5-1-5 tree configuration)
Fig. 4 is a schematic diagram of still another embodiment of applying partial spatial information.
Referring to the left half of fig. 4, it shows the order in which a six-channel multi-channel audio signal (left front channel L, left surround channel Ls, center channel C, low-frequency channel LFE, right front channel R, and right surround channel Rs) is downmixed into a mono downmix audio signal M, and the relation between the multi-channel audio signal and the spatial parameters.
First, as in the first and second embodiments, downmixing is performed between the left channel L and the left surround channel Ls, between the center channel C and the low-frequency channel LFE, and between the right channel R and the right surround channel Rs. This primary downmixing process generates a left total channel Lt, a center total channel Ct, and a right total channel Rt. The spatial parameters calculated in the primary downmixing process include CLD1 (with ICC1), CLD2 (with ICC2), CLD3 (with ICC3), and the like. (In this example, CLDx and ICCx are distinct from the correspondingly numbered parameters of the first and second embodiments.)
In a secondary process following the primary downmixing, the left total channel Lt, the center total channel Ct, and the right total channel Rt are downmixed together to generate a left center channel LC and a right channel R. The spatial parameter CLD_TTT (with ICC_TTT) is calculated in this process.
Then, in a tertiary downmixing process, the left center channel LC and the right channel R are downmixed to generate the mono downmix signal M. The spatial parameter CLD0 (with ICC0) is calculated in this process.
Referring to the right half of fig. 4, if the partial spatial information corresponds to CLD0 and CLD_TTT among the spatial parameters (CLD1, CLD2, CLD3, CLD_TTT, CLD0, etc.), the left total channel Lt, the center total channel Ct, and the right total channel Rt are generated.
If the left total channel Lt and the right total channel Rt are selected as the output channel audio signal, a two-channel output channel audio signal (Lt and Rt) can be generated.
If the left total channel Lt, the center total channel Ct, and the right total channel Rt are selected as the output channel audio signal, a three-channel output channel audio signal (Lt, Ct, and Rt) can be generated.
In the case where CLD2 is additionally included in the partial spatial information and upmixing to the center channel C and the low-frequency channel LFE has been performed, if the left total channel Lt, the right total channel Rt, the center channel C, and the low-frequency channel LFE are selected as the output channel audio signal, a four-channel output channel audio signal (Lt, Rt, C, and LFE) can be generated.
In the above description, the process of generating an output channel audio signal by applying the spatial parameters only partially has been explained, taking three tree configuration types as examples. In addition to the partial spatial information, combined spatial information or extended spatial information may also be applied. Accordingly, the modified spatial information may be applied to the audio signal hierarchically, or collectively and synthetically.
(2) Combining spatial information
Since spatial information is calculated in downmixing a multi-channel audio signal according to a predetermined tree configuration, if the downmixed audio signal is decoded using the spatial parameters of the spatial information as they are, the original multi-channel audio signal before downmixing can be reconstructed. If the number of channels M of the multi-channel audio signal differs from the number of channels N of the output channel audio signal, new combined spatial information can be generated by combining the spatial information, and the downmix audio signal can then be decoded using the generated information. In particular, the combined spatial parameters can be generated by applying the spatial parameters to a conversion formula.
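A conversion formula of the kind mentioned here can be sketched numerically: cascaded per-stage power splits multiply, so a single equivalent parameter can be derived for a path through the tree. The three-parameter tree below (CLD0 splits the mono signal into two halves, CLD1 and CLD2 split each half again) is a hypothetical simplification for illustration, not the patent's actual 5-1-5 structure, and all function names are assumptions.

```python
import math

def power_split(cld_db):
    """Power fractions (p1, p2) implied by one CLD in dB: p1/p2 = 10^(CLD/10)."""
    r = 10.0 ** (cld_db / 10.0)
    return r / (1 + r), 1 / (1 + r)

def combined_cld(cld0, cld1, cld2):
    """Combine cascaded CLDs into the single equivalent CLD between the
    first branch of each side of a hypothetical two-level mono tree."""
    p_left_half, p_right_half = power_split(cld0)
    p_lt = p_left_half * power_split(cld1)[0]    # power reaching Lt
    p_rt = p_right_half * power_split(cld2)[0]   # power reaching Rt
    return 10.0 * math.log10(p_lt / p_rt)

cld_alpha = combined_cld(3.0, 0.0, 0.0)
```

With the second-level splits symmetric (0 dB), the combined parameter reduces to the first-level CLD, which is a quick sanity check on the cascade.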
This method may be changed according to the order and method of downmixing a multi-channel audio signal in an encoding apparatus. And can query the order and method of downmixing using tree configuration information in the spatial information. And this method varies according to the number of output channels. In addition, the number of output channels or the like may be queried using the output channel information.
Next, a detailed embodiment of a method of modifying spatial information and an embodiment of giving a virtual 3-D effect will be explained in the following description.
(2) -1. general combined spatial information
There is provided a method of generating combined spatial parameters by combining the spatial parameters of the spatial information, for upmixing according to a tree configuration different from the tree configuration used in the downmixing process. Accordingly, this method can be applied to all kinds of downmix audio signals, regardless of the tree configuration indicated by the tree configuration information.
If a multi-channel audio signal is 5.1 channels and a downmix audio signal is 1 channel (mono), a method of generating an output channel audio signal of two channels is explained with reference to two embodiments as follows.
(2)-1-1. fourth example of tree configuration (5-1-5_1 tree configuration)
Fig. 5 is a schematic diagram of an embodiment of applying combined spatial information.
Referring to the left half of FIG. 5, CLD_0 to CLD_4 and ICC_0 to ICC_4 (not shown in the drawing) are the spatial parameters calculated in the process of downmixing the 5.1-channel multi-channel audio signal. For example, among the spatial parameters, the inter-channel level difference between the left channel signal L and the right channel signal R is CLD_3, and the inter-channel correlation between L and R is ICC_3. Likewise, the inter-channel level difference between the left surround channel Ls and the right surround channel Rs is CLD_2, and the inter-channel correlation between Ls and Rs is ICC_2.
On the other hand, referring to the right half of fig. 5, if the combined spatial parameters CLD_α and ICC_α are applied to the mono downmix audio signal m, a stereo output channel audio signal consisting of a left channel signal L_t and a right channel signal R_t can be generated directly from the mono audio signal m. In this case, the combined spatial parameters CLD_α and ICC_α are calculated by combining CLD_0 to CLD_4 and ICC_0 to ICC_4.
The following first explains the process of calculating CLD_α among the combined spatial parameters by combining CLD_0 to CLD_4, and then the process of calculating ICC_α among the combined spatial parameters by combining CLD_0 to CLD_4 and ICC_0 to ICC_4.
(2)-1-1-a. Derivation of CLD_α
First, since CLD_α is the level difference between the left output signal L_t and the right output signal R_t, substituting the left output signal L_t and the right output signal R_t into the definition formula for CLD gives the following.
[formula 1]
CLD_α = 10*log10(P_Lt / P_Rt),
where P_Lt is the power of L_t and P_Rt is the power of R_t.
[formula 2]
CLD_α = 10*log10((P_Lt + a) / (P_Rt + a)),
where P_Lt is the power of L_t, P_Rt is the power of R_t, and 'a' is a very small constant.
Thus, CLD_α is defined by formula 1 or formula 2.
Meanwhile, in order to express P_Lt and P_Rt using the spatial parameters CLD_0 to CLD_4, a relational formula between the left output signal L_t and the right output signal R_t of the output channel audio signal and the multi-channel signals L, Ls, R, Rs, C, and LFE is required. The corresponding relational formula can be defined as follows.
[formula 3]
L_t = L + Ls + C/√2 + LFE/√2
R_t = R + Rs + C/√2 + LFE/√2
Since a relational formula such as formula 3 may vary depending on how the output channel audio signal is defined, it may be defined in a manner different from formula 3. For example, the '1/√2' in C/√2 or LFE/√2 may instead be '0' or '1'.
Formula 3 leads to formula 4 below.
[formula 4]
P_Lt = P_L + P_Ls + P_C/2 + P_LFE/2
P_Rt = P_R + P_Rs + P_C/2 + P_LFE/2
CLD_α can be expressed using P_Lt and P_Rt according to formula 1 or formula 2, and P_Lt and P_Rt can be expressed using P_L, P_Ls, P_C, P_LFE, P_R, and P_Rs according to formula 4. Therefore, a relational formula is needed that expresses P_L, P_Ls, P_C, P_LFE, P_R, and P_Rs using the spatial parameters CLD_0 to CLD_4.
Meanwhile, if the tree configuration is as shown in FIG. 5, the relationship between the multi-channel audio signals (L, R, C, LFE, Ls, Rs) and the mono downmix channel signal m is as follows.
[formula 5]
L = c1,OTT3 * c1,OTT1 * c1,OTT0 * m
R = c2,OTT3 * c1,OTT1 * c1,OTT0 * m
C = c1,OTT4 * c2,OTT1 * c1,OTT0 * m
LFE = c2,OTT4 * c2,OTT1 * c1,OTT0 * m
Ls = c1,OTT2 * c2,OTT0 * m
Rs = c2,OTT2 * c2,OTT0 * m
where c1,OTTx = √(10^(CLD_x/10) / (1 + 10^(CLD_x/10))) and c2,OTTx = √(1 / (1 + 10^(CLD_x/10))).
And formula 5 leads to formula 6 below.
[formula 6]
P_L = (c1,OTT3 * c1,OTT1 * c1,OTT0)^2 * m^2
P_R = (c2,OTT3 * c1,OTT1 * c1,OTT0)^2 * m^2
P_C = (c1,OTT4 * c2,OTT1 * c1,OTT0)^2 * m^2
P_LFE = (c2,OTT4 * c2,OTT1 * c1,OTT0)^2 * m^2
P_Ls = (c1,OTT2 * c2,OTT0)^2 * m^2
P_Rs = (c2,OTT2 * c2,OTT0)^2 * m^2
where c1,OTTx and c2,OTTx are as defined in formula 5.
In particular, by substituting formula 6 into formula 4 and substituting formula 4 into formula 1 or formula 2, the combined spatial parameter CLD_α can be expressed as a combination of the spatial parameters CLD_0 to CLD_4.
Meanwhile, substituting formula 6 into 'P_C/2 + P_LFE/2' in formula 4 yields the expansion shown in formula 7.
[formula 7]
P_C/2 + P_LFE/2 = [(c1,OTT4)^2 + (c2,OTT4)^2] * (c2,OTT1 * c1,OTT0)^2 * m^2 / 2
In this case, from the definitions of c1 and c2 (see formula 5), (c1,x)^2 + (c2,x)^2 = 1, so (c1,OTT4)^2 + (c2,OTT4)^2 = 1.
Therefore, formula 7 simplifies as follows.
[formula 8]
P_C/2 + P_LFE/2 = (c2,OTT1 * c1,OTT0)^2 * m^2 / 2
Therefore, by substituting formula 8 and formula 6 into formula 4 and substituting formula 4 into formula 1, the combined spatial parameter CLD_α can be expressed as a combination of the spatial parameters CLD_0 to CLD_4.
(2)-1-1-b. Derivation of ICC_α
First, since ICC_α is the correlation between the left output signal L_t and the right output signal R_t, substituting the left output signal L_t and the right output signal R_t into the corresponding definition formula gives the following.
[formula 9]
ICC_α = P_LtRt / √(P_Lt * P_Rt), where P_x1x2 = Σ x1·x2* (with x2* denoting the complex conjugate of x2).
In formula 9, P_Lt and P_Rt can be expressed using CLD_0 to CLD_4 as in formula 4, formula 6, and formula 8. And P_LtRt can be expanded in the manner of formula 10.
[formula 10]
P_LtRt = P_LR + P_LsRs + P_C/2 + P_LFE/2
In formula 10, 'P_C/2 + P_LFE/2' can be expressed using CLD_0 to CLD_4 according to formula 6. And P_LR and P_LsRs can be expanded according to the ICC definition as follows.
[formula 11]
ICC_3 = P_LR / √(P_L * P_R)
ICC_2 = P_LsRs / √(P_Ls * P_Rs)
In formula 11, moving √(P_L * P_R) or √(P_Ls * P_Rs) to the other side yields formula 12.
[formula 12]
P_LR = ICC_3 * √(P_L * P_R)
P_LsRs = ICC_2 * √(P_Ls * P_Rs)
In formula 12, P_L, P_R, P_Ls, and P_Rs can be expressed using CLD_0 to CLD_4 according to formula 6. Substituting formula 6 into formula 12 gives formula 13.
[formula 13]
P_LR = ICC_3 * c1,OTT3 * c2,OTT3 * (c1,OTT1 * c1,OTT0)^2 * m^2
P_LsRs = ICC_2 * c1,OTT2 * c2,OTT2 * (c2,OTT0)^2 * m^2
In summary, by substituting formula 6 and formula 13 into formula 10 and substituting formula 10 and formula 4 into formula 9, the combined spatial parameter ICC_α can be expressed using the spatial parameters CLD_0 to CLD_3, ICC_2, and ICC_3.
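The derivation above can be checked numerically. The following is a minimal illustrative sketch, not the patent's implementation: the function names are invented here, and each parameter is treated as a single broadband value rather than the per-band values a real decoder would use.

```python
import math

def c1(cld_db):
    # first OTT output gain; c1(x)^2 + c2(x)^2 = 1 (see formula 5)
    r = 10.0 ** (cld_db / 10.0)
    return math.sqrt(r / (1.0 + r))

def c2(cld_db):
    # second OTT output gain
    r = 10.0 ** (cld_db / 10.0)
    return math.sqrt(1.0 / (1.0 + r))

def combined_alpha(cld, icc2, icc3, m2=1.0):
    """cld: CLD_0..CLD_4 in dB for the 5-1-5_1 tree; m2: downmix power."""
    # channel powers per formula 6
    PL  = (c1(cld[3]) * c1(cld[1]) * c1(cld[0])) ** 2 * m2
    PR  = (c2(cld[3]) * c1(cld[1]) * c1(cld[0])) ** 2 * m2
    PLs = (c1(cld[2]) * c2(cld[0])) ** 2 * m2
    PRs = (c2(cld[2]) * c2(cld[0])) ** 2 * m2
    # P_C/2 + P_LFE/2 collapses per formula 8
    Pc2 = (c2(cld[1]) * c1(cld[0])) ** 2 * m2 / 2.0
    # output powers per formula 4
    PLt = PL + PLs + Pc2
    PRt = PR + PRs + Pc2
    cld_alpha = 10.0 * math.log10(PLt / PRt)   # formula 1
    # cross terms per formulas 13 and 10
    PLR   = icc3 * c1(cld[3]) * c2(cld[3]) * (c1(cld[1]) * c1(cld[0])) ** 2 * m2
    PLsRs = icc2 * c1(cld[2]) * c2(cld[2]) * (c2(cld[0])) ** 2 * m2
    PLtRt = PLR + PLsRs + Pc2
    icc_alpha = PLtRt / math.sqrt(PLt * PRt)   # formula 9
    return cld_alpha, icc_alpha
```

With all CLDs at 0 dB and ICC_2 = ICC_3 = 1, the left/right symmetry gives CLD_α = 0 dB and ICC_α = 1, as expected.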
(2)-1-2. fifth example of tree configuration (5-1-5_2 tree configuration)
Fig. 6 is a schematic diagram of another embodiment of applying combined spatial information.
Referring to the left half of FIG. 6, CLD_0 to CLD_4 and ICC_0 to ICC_4 (not shown in the drawing) are the spatial parameters calculated in the process of downmixing the 5.1-channel multi-channel audio signal.
Among the spatial parameters, the inter-channel level difference between the left channel signal L and the left surround channel signal Ls is CLD_3, and the inter-channel correlation between L and Ls is ICC_3. Likewise, the inter-channel level difference between the right channel R and the right surround channel Rs is CLD_4, and the inter-channel correlation between R and Rs is ICC_4.
On the other hand, referring to the right half of fig. 6, if the combined spatial parameters CLD_β and ICC_β are used to generate a left channel signal L_t and a right channel signal R_t from the mono downmix audio signal m, a stereo output channel audio signal can be generated directly from the mono audio signal m. In this case, the combined spatial parameters CLD_β and ICC_β are calculated by combining the spatial parameters CLD_0 to CLD_4 and ICC_0 to ICC_4.
The following first explains the process of calculating CLD_β among the combined spatial parameters by combining CLD_0 to CLD_4, and then the process of calculating ICC_β among the combined spatial parameters by combining CLD_0 to CLD_4 and ICC_0 to ICC_4.
(2)-1-2-a. Derivation of CLD_β
First, since CLD_β is the inter-channel level difference between the left output signal L_t and the right output signal R_t, substituting the left output signal L_t and the right output signal R_t into the definition formula for CLD gives the following.
[formula 14]
CLD_β = 10*log10(P_Lt / P_Rt),
where P_Lt is the power of L_t and P_Rt is the power of R_t.
[formula 15]
CLD_β = 10*log10((P_Lt + a) / (P_Rt + a)),
where P_Lt is the power of L_t, P_Rt is the power of R_t, and 'a' is a very small constant.
Thus, CLD_β is defined by formula 14 or formula 15.
Meanwhile, in order to express P_Lt and P_Rt using the spatial parameters CLD_0 to CLD_4, a relational formula between the left output signal L_t and the right output signal R_t of the output channel audio signal and the multi-channel signals L, Ls, R, Rs, C, and LFE is required. The corresponding relational formula can be defined as follows.
[formula 16]
L_t = L + Ls + C/√2 + LFE/√2
R_t = R + Rs + C/√2 + LFE/√2
Since a relational formula such as formula 16 may vary depending on how the output channel audio signal is defined, it may be defined in a manner different from formula 16. For example, the '1/√2' in C/√2 or LFE/√2 may instead be '0' or '1'.
Formula 16 leads to formula 17 below.
[formula 17]
P_Lt = P_L + P_Ls + P_C/2 + P_LFE/2
P_Rt = P_R + P_Rs + P_C/2 + P_LFE/2
CLD_β can be expressed using P_Lt and P_Rt according to formula 14 or formula 15, and P_Lt and P_Rt can be expressed using P_L, P_Ls, P_C, P_LFE, P_R, and P_Rs according to formula 17. Therefore, a relational formula is needed that expresses P_L, P_Ls, P_C, P_LFE, P_R, and P_Rs using the spatial parameters CLD_0 to CLD_4.
Meanwhile, if the tree configuration is as shown in FIG. 6, the relationship between the multi-channel audio signals (L, R, C, LFE, Ls, Rs) and the mono downmix channel signal m is as follows.
[formula 18]
L = c1,OTT3 * c1,OTT1 * c1,OTT0 * m
Ls = c2,OTT3 * c1,OTT1 * c1,OTT0 * m
R = c1,OTT4 * c2,OTT1 * c1,OTT0 * m
Rs = c2,OTT4 * c2,OTT1 * c1,OTT0 * m
C = c1,OTT2 * c2,OTT0 * m
LFE = c2,OTT2 * c2,OTT0 * m
where c1,OTTx = √(10^(CLD_x/10) / (1 + 10^(CLD_x/10))) and c2,OTTx = √(1 / (1 + 10^(CLD_x/10))).
And formula 18 leads to formula 19 below.
[formula 19]
P_L = (c1,OTT3 * c1,OTT1 * c1,OTT0)^2 * m^2
P_Ls = (c2,OTT3 * c1,OTT1 * c1,OTT0)^2 * m^2
P_R = (c1,OTT4 * c2,OTT1 * c1,OTT0)^2 * m^2
P_Rs = (c2,OTT4 * c2,OTT1 * c1,OTT0)^2 * m^2
P_C = (c1,OTT2 * c2,OTT0)^2 * m^2
P_LFE = (c2,OTT2 * c2,OTT0)^2 * m^2
where c1,OTTx and c2,OTTx are as defined in formula 18.
In particular, by substituting formula 19 into formula 17 and substituting formula 17 into formula 14 or formula 15, the combined spatial parameter CLD_β can be expressed as a combination of the spatial parameters CLD_0 to CLD_4.
Meanwhile, substituting formula 19 into 'P_L + P_Ls' in formula 17 yields the expansion shown in formula 20.
[formula 20]
P_L + P_Ls = [(c1,OTT3)^2 + (c2,OTT3)^2] * (c1,OTT1 * c1,OTT0)^2 * m^2
In this case, from the definitions of c1 and c2 (see formula 18), (c1,x)^2 + (c2,x)^2 = 1, so (c1,OTT3)^2 + (c2,OTT3)^2 = 1.
Therefore, formula 20 simplifies as follows.
[formula 21]
P_L_ = P_L + P_Ls = (c1,OTT1 * c1,OTT0)^2 * m^2
On the other hand, substituting formula 19 into 'P_R + P_Rs' in formula 17 yields the expansion shown in formula 22.
[formula 22]
P_R + P_Rs = [(c1,OTT4)^2 + (c2,OTT4)^2] * (c2,OTT1 * c1,OTT0)^2 * m^2
In this case, from the definitions of c1 and c2 (see formula 18), (c1,x)^2 + (c2,x)^2 = 1, so (c1,OTT4)^2 + (c2,OTT4)^2 = 1.
Therefore, formula 22 simplifies as follows.
[formula 23]
P_R_ = P_R + P_Rs = (c2,OTT1 * c1,OTT0)^2 * m^2
On the other hand, substituting formula 19 into 'P_C/2 + P_LFE/2' in formula 17 yields the expansion shown in formula 24.
[formula 24]
P_C/2 + P_LFE/2 = [(c1,OTT2)^2 + (c2,OTT2)^2] * (c2,OTT0)^2 * m^2 / 2
In this case, from the definitions of c1 and c2 (see formula 18), (c1,x)^2 + (c2,x)^2 = 1, so (c1,OTT2)^2 + (c2,OTT2)^2 = 1.
Therefore, formula 24 simplifies as follows.
[formula 25]
P_C/2 + P_LFE/2 = (c2,OTT0)^2 * m^2 / 2
Therefore, by substituting formula 21, formula 23, and formula 25 into formula 17, and substituting formula 17 into formula 14 or formula 15, the combined spatial parameter CLD_β can be expressed as a combination of the spatial parameters CLD_0 to CLD_4.
(2)-1-2-b. Derivation of ICC_β
First, since ICC_β is the correlation between the left output signal L_t and the right output signal R_t, substituting the left output signal L_t and the right output signal R_t into the corresponding definition formula gives the following.
[formula 26]
ICC_β = P_LtRt / √(P_Lt * P_Rt), where P_x1x2 = Σ x1·x2* (with x2* denoting the complex conjugate of x2).
In formula 26, P_Lt and P_Rt can be expressed using CLD_0 to CLD_4 according to formula 19. And P_LtRt can be expanded in the manner of formula 27.
[formula 27]
P_LtRt = P_L_R_ + P_C/2 + P_LFE/2
In formula 27, 'P_C/2 + P_LFE/2' can be expressed using CLD_0 to CLD_4 according to formula 19. And P_L_R_ can be expanded according to the ICC definition as follows.
[formula 28]
ICC_1 = P_L_R_ / √(P_L_ * P_R_)
Moving √(P_L_ * P_R_) to the other side yields formula 29.
[formula 29]
P_L_R_ = ICC_1 * √(P_L_ * P_R_)
In formula 29, P_L_ and P_R_ can be expressed using CLD_0 to CLD_4 according to formula 21 and formula 23. Substituting formula 21 and formula 23 into formula 29 gives formula 30.
[formula 30]
P_L_R_ = ICC_1 * (c1,OTT1 * c1,OTT0) * (c2,OTT1 * c1,OTT0) * m^2
In summary, by substituting formula 30 into formula 27 and substituting formula 27 and formula 17 into formula 26, the combined spatial parameter ICC_β can be expressed using the spatial parameters CLD_0 to CLD_4 and ICC_1.
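The 5-1-5_2 combination can be sketched the same way. This is again only an illustrative sketch under the same assumptions (single broadband parameter values, function names invented here), following formulas 17, 21, 23, 25, 27, and 30.

```python
import math

def c1(cld_db):
    # first OTT output gain; c1^2 + c2^2 = 1 (see formula 18)
    r = 10.0 ** (cld_db / 10.0)
    return math.sqrt(r / (1.0 + r))

def c2(cld_db):
    # second OTT output gain
    r = 10.0 ** (cld_db / 10.0)
    return math.sqrt(1.0 / (1.0 + r))

def combined_beta(cld, icc1, m2=1.0):
    """cld: CLD_0..CLD_4 in dB for the 5-1-5_2 tree; m2: downmix power."""
    PL_ = (c1(cld[1]) * c1(cld[0])) ** 2 * m2    # formula 21: P_L + P_Ls
    PR_ = (c2(cld[1]) * c1(cld[0])) ** 2 * m2    # formula 23: P_R + P_Rs
    Pc2 = (c2(cld[0])) ** 2 * m2 / 2.0           # formula 25: P_C/2 + P_LFE/2
    PLt = PL_ + Pc2                              # formula 17
    PRt = PR_ + Pc2
    cld_beta = 10.0 * math.log10(PLt / PRt)      # formula 14
    # formula 30
    PL_R_ = icc1 * (c1(cld[1]) * c1(cld[0])) * (c2(cld[1]) * c1(cld[0])) * m2
    PLtRt = PL_R_ + Pc2                          # formula 27
    icc_beta = PLtRt / math.sqrt(PLt * PRt)      # formula 26
    return cld_beta, icc_beta
```

Note that CLD_2, CLD_3, and CLD_4 drop out of the final expressions, exactly as the simplifications in formulas 21, 23, and 25 predict.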
The spatial parameter modification method explained above is only one embodiment. The formulas for P_x or P_xy can obviously be changed in various forms, for example by taking into account the correlations between the respective channels (e.g., ICC_0, etc.) or by additionally considering the signal energy.
(2) -2. combined spatial information with surround effect
First, if combined spatial information is generated by combining the spatial information in consideration of the sound paths, a virtual surround effect can be produced.
The virtual surround effect, or virtual 3-D effect, creates the impression that surround channel speakers are present even though no surround channel speakers actually exist. For example, a 5.1-channel audio signal may be output through two stereo speakers.
A sound path may correspond to spatial filter information. The spatial filter information can use a function called HRTF (head related transfer function), although the present invention is not limited thereto. The spatial filter information can contain filter parameters. The combined spatial parameters can be generated by substituting the filter parameters and the spatial parameters into a conversion formula, and the generated combined spatial parameters may include filter coefficients.
Next, assuming that the multi-channel audio signal is 5 channels and a three-channel output channel audio signal is generated, a method of generating combined spatial information having a surround effect in consideration of the sound paths is explained as follows.
Fig. 7 is a diagram of the sound paths from the speakers to the listener, showing the positions of the speakers.
Referring to fig. 7, the positions of the three speakers SPK1, SPK2, and SPK3 are front left L, center C, and front right R, respectively, and the positions of the virtual surround channels are left surround Ls and right surround Rs, respectively.
The drawing shows the sound paths from the three speaker positions L, C, and R and from the virtual surround channel positions Ls and Rs to the listener's right and left ear positions r and l. The label 'G_x_y' indicates the sound path from position x to position y. For example, the label 'G_L_r' indicates the sound path from the front left position L to the right ear r of the listener.
If there were speakers at all five positions (i.e., speakers also at the left surround Ls and the right surround Rs), and if the listener is at the position shown in fig. 7, then the signal L0 arriving at the left ear of the listener and the signal R0 arriving at the right ear of the listener are expressed by formula 31.
[formula 31]
L0 = L*G_L_l + C*G_C_l + R*G_R_l + Ls*G_Ls_l + Rs*G_Rs_l
R0 = L*G_L_r + C*G_C_r + R*G_R_r + Ls*G_Ls_r + Rs*G_Rs_r
where L, C, R, Ls, and Rs are the channel signals at the respective positions, G_x_y represents the sound path from position x to position y, and '*' represents convolution.
However, as described above, if speakers exist at only the three positions L, C, and R, then the signal L0_real arriving at the left ear of the listener and the signal R0_real arriving at the right ear of the listener are as follows.
[formula 32]
L0_real = L*G_L_l + C*G_C_l + R*G_R_l
R0_real = L*G_L_r + C*G_C_r + R*G_R_r
Since the signals represented in formula 32 do not include the surround channel signals Ls and Rs, they cannot produce a virtual surround effect. To produce the virtual surround effect, the Ls signal components arriving at the listener's ears (l, r) from the speakers at the three positions L, C, and R must be made equal to the Ls signal that would arrive at the listener's ears from a speaker at the original position Ls. The same applies equally to the right surround channel signal Rs.
Considering the left surround channel signal Ls: if the left surround channel signal Ls were output from a speaker at the left surround position Ls as its original position, the signals reaching the left ear l and the right ear r of the listener would be expressed as follows.
[formula 33]
Ls*G_Ls_l, Ls*G_Ls_r
And if the right surround channel signal Rs were output from a speaker at the right surround position Rs as its original position, the signals reaching the left ear l and the right ear r of the listener would be expressed as follows.
[formula 34]
Rs*G_Rs_l, Rs*G_Rs_r
If the signals arriving at the listener's left ear l and right ear r are equal to the components of formula 33 and formula 34, the listener perceives the speakers as if they were present at the left surround position Ls and the right surround position Rs, respectively, even though the signals are actually output through speakers at other positions (for example, through the speaker SPK1 at the front left position).
Meanwhile, the components in formula 33 are the signals arriving at the left ear l and the right ear r of the listener when output from a speaker located at the left surround position Ls. Therefore, if the components shown in formula 33 were output as they are from the speaker SPK1 located at the front left position, the signals reaching the left ear l and the right ear r of the listener would be expressed as follows.
[formula 35]
Ls*G_Ls_l*G_L_l, Ls*G_Ls_r*G_L_r
Examining formula 35, the component 'G_L_l' (or 'G_L_r') corresponding to the sound path from the front left position L to the listener's left ear l (or right ear r) has been added.
However, the signals arriving at the listener's left ear l and right ear r should be the components shown in formula 33, not formula 35. Since the component 'G_L_l' (or 'G_L_r') is added when the sound output from the speaker at the front left position L reaches the listener, the inverse function 'G_L_l^-1' (or 'G_L_r^-1') of 'G_L_l' (or 'G_L_r') should be applied in advance to compensate for this sound path. In other words, if the components corresponding to formula 33 are to be output from the speaker SPK1 located at the front left position L, they must be modified as follows.
[formula 36]
Ls*G_Ls_l*G_L_l^-1, Ls*G_Ls_r*G_L_r^-1
If the components corresponding to formula 34 are output by the speaker SPK1 located at the front left position L, they must likewise be modified as follows.
[formula 37]
Rs*G_Rs_l*G_L_l^-1, Rs*G_Rs_r*G_L_r^-1
Therefore, the signal L' output from the speaker SPK1 located at the front left position L is summarized as follows.
[formula 38]
L' = L + Ls*G_Ls_l*G_L_l^-1 + Rs*G_Rs_l*G_L_l^-1
(the components Ls*G_Ls_r*G_L_r^-1 and Rs*G_Rs_r*G_L_r^-1 are omitted)
When the signal shown in formula 38, output from the speaker SPK1 located at the front left position L, arrives at the listener's left ear position l, the sound path factor 'G_L_l' is added. Each 'G_L_l^-1' term in formula 38 is thereby cancelled out, so that the factors shown in formula 33 and formula 34 finally remain.
Fig. 8 is a diagram explaining signals output from each speaker position to implement a virtual surround effect.
Referring to fig. 8, if the surround signals Ls and Rs are contained, with their sound paths taken into account, in the signal L' output from the speaker position SPK1, the result corresponds to formula 38.
In formula 38, G_Ls_l*G_L_l^-1 is abbreviated as H_Ls_L, giving the following.
[formula 39]
L' = L + Ls*H_Ls_L + Rs*H_Rs_L
For example, the signal C' output from the speaker SPK2 located at the center position C is summarized as follows.
[formula 40]
C' = C + Ls*H_Ls_C + Rs*H_Rs_C
For another example, the signal R' output from the speaker SPK3 located at the front right position R is summarized as follows.
[formula 41]
R' = R + Ls*H_Ls_R + Rs*H_Rs_R
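As an illustration of formulas 39 to 41, the speaker-feed combination can be sketched as time-domain convolutions. In this sketch the filters H_x_y are assumed to be short FIR approximations of the combined sound-path filters (their derivation from G_x_y is outside the sketch), and all function names are invented here.

```python
def convolve(x, h):
    # naive FIR convolution (the '*' operator of formula 31)
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def mix(*signals):
    # sample-wise sum of signals of possibly different lengths
    n = max(len(s) for s in signals)
    return [sum(s[i] if i < len(s) else 0.0 for s in signals) for i in range(n)]

def front_left_feed(L, Ls, Rs, H_Ls_L, H_Rs_L):
    # formula 39: L' = L + Ls * H_Ls_L + Rs * H_Rs_L
    return mix(L, convolve(Ls, H_Ls_L), convolve(Rs, H_Rs_L))
```

With identity filters H = [1.0] the feed degenerates to a plain sum, L' = L + Ls + Rs, which makes the structure of formula 39 easy to verify.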
Fig. 9 is a conceptual diagram explaining a method of generating a 3-channel signal from a 5-channel signal, as in formula 39, formula 40, or formula 41.
If the 5-channel signal is instead used to generate a 2-channel signal L' and R', or if the surround channel signals Ls and Rs are not to be included in the center channel signal C, then H_Ls_C or H_Rs_C becomes 0.
For convenience of implementation, H_x_y can be modified in various ways, for example by using G_x_y in place of H_x_y, or by taking crosstalk into account.
The above detailed explanation relates to one embodiment of combined spatial information having a surround effect, and it can obviously be changed in various forms according to the method of applying the spatial filter information. As described above, the signals output through the speakers (in the above example, the front left channel L', the front right channel R', and the center channel C') can be generated from the downmix audio signal using the combined spatial information, more specifically, using the combined spatial parameters.
(3) Expanding spatial information
First, extended spatial information can be added to the spatial information, and the audio signal can be upmixed using the spatial information together with the extended spatial information. In the corresponding upmixing process, the audio signal is converted into a primary upmix audio signal based on the spatial information, and the primary upmix audio signal is then converted into a secondary upmix audio signal based on the extended spatial information.
In this case, the extended spatial information can include extended channel configuration information, extended channel mapping information, and extended spatial parameters.
The extended channel configuration information is information on the channels that can be configured in addition to the channels configured by the tree configuration information of the spatial information. The extended channel configuration information may include at least one of a division identifier and a non-division identifier, which will be explained in detail below. The extended channel mapping information is position information of each channel configuring the extended channels. And the extended spatial parameters may be used to upmix one channel into at least two channels; they may include inter-channel level differences.
The extended spatial information explained above may be included in the spatial information after (i) being generated by the encoding apparatus or (ii) being generated by the decoding apparatus itself. If the extended spatial information is generated by the encoding apparatus, the presence or absence of the extended spatial information may be determined based on a spatial information indicator. If the extended spatial information is generated by the decoding apparatus itself, the extended spatial parameters of the extended spatial information may be obtained by calculation using the spatial parameters of the spatial information.
Meanwhile, the process of upmixing an audio signal using the spatial information and the extended spatial information may be performed sequentially and hierarchically, or collectively and synthetically. If a single matrix can be calculated based on the spatial information and the extended spatial information, that matrix can be used to upmix the downmix audio signal into the multi-channel audio signal collectively and directly. In this case, the factors configuring the matrix may be defined according to the spatial parameters and the extended spatial parameters.
Hereinafter, after the example of using the extended spatial information generated by the encoding apparatus is explained, the example of generating the extended spatial information by the decoding apparatus itself will be explained.
(3) -1: example using the extended spatial information generated by the encoding apparatus: arbitrary tree configuration
First, an example will be explained in which the extended spatial information is generated by the encoding apparatus, added to the spatial information, and received by the decoding apparatus. In this case, the extended spatial information may be information extracted during the process of downmixing the multi-channel audio signal by the encoding apparatus.
As described above, the extended spatial information includes extended channel configuration information, extended channel mapping information, and extended spatial parameters. In this case, the extended channel configuration information may include at least one of a division identifier and a non-division identifier. Next, the process of configuring the extended channels based on an array of division and non-division identifiers will be explained in detail as follows.
Fig. 10 is a diagram of an embodiment of configuring an extension channel based on extension channel configuration information.
Referring to the lower half of fig. 10, 0s and 1s are arranged in a sequence. In this sequence, '0' denotes a non-division identifier and '1' denotes a division identifier. A non-division identifier 0 exists at the first position (1), and the channel matching this non-division identifier 0 is the left channel L at the uppermost end. Therefore, the left channel L matching this non-division identifier 0 is selected as an output channel instead of being divided. At the second position (2) there is a division identifier 1. The channel matching this division identifier is the left surround channel Ls, next to the left channel L. Therefore, the left surround channel Ls matching this division identifier 1 is divided into two channels.
Since non-division identifiers 0 exist at the third position (3) and the fourth position (4), the two channels divided from the left surround channel Ls are selected as output channels without being further divided. Once the above process is repeated up to the final position (10), the complete extended channel configuration is obtained.
The channel division process is repeated as many times as the number of division identifiers 1, and the process of selecting a channel as an output channel is repeated as many times as the number of non-division identifiers 0. Therefore, the number of channel division units AT0 and AT1 is equal to the number of division identifiers 1 (two), and the number of extended channels (L, Lfs, Ls, R, Rfs, Rs, C, and LFE) is equal to the number of non-division identifiers 0 (eight).
Meanwhile, after the extended channels are configured, the position of each output channel may be mapped using the extended channel mapping information. In the case of fig. 10, mapping is performed in the order of a left front channel L, a left front side channel Lfs, a left surround channel Ls, a right front channel R, a right front side channel Rfs, a right surround channel Rs, a center channel C, and a low frequency channel LFE.
As described above, the extended channels may be configured based on the extended channel configuration information. For this, a channel division unit that divides one channel into at least two channels is necessary. The channel division unit can use the extended spatial parameters when dividing one channel into at least two channels. Since the number of extended spatial parameters is equal to the number of channel division units, it is also equal to the number of division identifiers. Therefore, as many extended spatial parameters as the number of division identifiers can be extracted.
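The identifier-driven channel expansion described above can be sketched as follows. This is a hypothetical reading of the identifier walk: the channel names, the ".1"/".2" sub-channel labels, and the front-to-back processing order are assumptions chosen to match the fig. 10 example.

```python
def configure_extended_channels(base_channels, config_bits):
    """config_bits: 1 = division identifier (divide the current channel into
    two channels, which are processed next), 0 = non-division identifier
    (select the current channel as an output channel)."""
    pending = list(base_channels)
    outputs, divisions = [], 0
    for bit in config_bits:
        ch = pending.pop(0)
        if bit == 1:
            divisions += 1
            pending[:0] = [ch + ".1", ch + ".2"]  # the divided pair is handled next
        else:
            outputs.append(ch)
    return outputs, divisions
```

For the fig. 10 example, base channels [L, Ls, R, Rs, C, LFE] with the identifier sequence 0, 1, 0, 0, 0, 1, 0, 0, 0, 0 yield eight output channels and two division units, which the extended channel mapping information would then map to L, Lfs, Ls, R, Rfs, Rs, C, and LFE.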
Fig. 11 is a diagram for explaining the configuration of the extension channels shown in fig. 10 and their relationship with the extension spatial parameters.
Referring to FIG. 11, there are two channel division units AT0 and AT1, and the extended spatial parameters ATD0 and ATD1 respectively applied to them are shown.
If the extended spatial parameter is an inter-channel level difference, the channel division unit can determine the level difference of the two divided channels using the extended spatial parameter.
Accordingly, when an upmix operation is performed by adding the extended spatial information, the extended spatial parameters may be applied not fully but only partially.
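A channel division unit driven by an inter-channel level difference can be sketched as below. This is a minimal sketch under stated assumptions: the gain formulas reuse the power-preserving c1/c2 form from formula 5, which is an assumption here, and the parameter name atd_db is invented.

```python
import math

def divide_channel(samples, atd_db):
    # divide one channel into two with a level difference of atd_db (dB)
    # between them, preserving total power (g1^2 + g2^2 = 1)
    r = 10.0 ** (atd_db / 10.0)
    g1 = math.sqrt(r / (1.0 + r))
    g2 = math.sqrt(1.0 / (1.0 + r))
    return [g1 * s for s in samples], [g2 * s for s in samples]
```

An ATD of 0 dB divides the channel into two equal-level copies; a positive ATD makes the first divided channel louder than the second.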
(3)-2. example of generating extended spatial information by the decoding apparatus: interpolation/extrapolation
First, extended spatial information can be generated by the decoding apparatus itself and added to the spatial information. An example of generating the extended spatial information using the spatial information is explained in the following description. In particular, the extended spatial information can be generated using the spatial parameters of the spatial information, in which case interpolation, extrapolation, or the like may be used.
(3) -2-1. extension to 6.1 channels
If the multi-channel audio signal is 5.1 channels, an example of generating a 6.1-channel output channel audio signal is explained below.
Fig. 12 is a diagram of the positions of a multi-channel audio signal of 5.1 channels and the positions of an output channel audio signal of 6.1 channels.
Referring to fig. 12(a), it can be seen that the channel positions of the 5.1-channel multi-channel audio signal are the left front channel L, the right front channel R, the center channel C, the low frequency channel LFE (not shown in the drawing), the left surround channel Ls, and the right surround channel Rs, respectively.
In case that a multi-channel audio signal of 5.1 channels is a down-mix audio signal, if a spatial parameter is applied to the down-mix audio signal, the down-mix audio signal is up-mixed again into a multi-channel audio signal of 5.1 channels.
However, a rear center RC channel signal as shown in fig. 12(b) should be further generated to upmix the downmix audio signal into a 6.1 channel multi-channel audio signal.
The channel signal of the rear center RC can be generated using spatial parameters associated with the two rear channels (the left surround channel Ls and the right surround channel Rs). In particular, an inter-channel level difference (CLD) among the spatial parameters indicates the level difference between two channels. Therefore, by adjusting the level difference between the two channels, the position of a virtual sound source existing between the two channels can be changed.
The principle that the position of a virtual sound source varies according to the level difference between two channels is explained as follows.
Fig. 13 is a diagram for explaining the relationship between the position of a virtual sound source and the level difference between two channels, in which the levels of the left and right surround channels Ls and Rs are 'a' and 'b', respectively.
Referring to fig. 13(a), if the level a of the left surround channel Ls is greater than the level b of the right surround channel Rs, it can be seen that the position of the virtual sound source VS is closer to the position of the left surround channel Ls than to the position of the right surround channel Rs.
If audio signals are output from two channels, a listener feels that a virtual sound source exists substantially between the two channels. In this case, the position of the virtual sound source is closer to the position of the channel whose sound level is higher than that of the other channel.
In the case of fig. 13(b), since the sound level of the left surround channel Ls and the sound level of the right surround channel Rs are almost equal, the listener feels that the position of the virtual sound source exists at the center between the left surround channel Ls and the right surround channel Rs.
Thus, the sound level of the rear center channel can be determined using the above-described principle.
Fig. 14 is a diagram for explaining the sound levels of two rear channels and the sound level of one rear center channel.
Referring to fig. 14, the level c of the rear center channel RC can be calculated by interpolating a difference between the level a of the left surround channel Ls and the level b of the right surround channel Rs. In this case, the calculation may use either a nonlinear interpolation method or a linear interpolation method.
The level c of a new channel (e.g., the rear center channel RC) existing between two channels (e.g., Ls and Rs) may be calculated according to a linear interpolation method by the following formula.
[Formula 40]
c = a*k + b*(1-k),
where 'a' and 'b' are the levels of the two existing channels, and 'k' indicates the relative position of the new level-c channel between the level-a channel and the level-b channel.
If the level-c channel (e.g., the rear center channel RC) is positioned at the center between the level-a channel (e.g., Ls) and the level-b channel (e.g., Rs), 'k' is 0.5. Substituting k = 0.5 into formula 40 yields formula 41.
[Formula 41]
c = (a+b)/2
According to formula 41, if the level-c channel (e.g., the rear center channel RC) is located at the center between the level-a channel (e.g., Ls) and the level-b channel (e.g., Rs), the level c of the new channel corresponds to the average of the levels a and b of the previous channels. Formulas 40 and 41 are merely exemplary; the manner of deciding the level c from the levels a and b may also be adjusted otherwise.
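The linear interpolation of formulas 40 and 41 can be sketched directly; the function name is illustrative:

```python
def interpolate_level(a, b, k=0.5):
    """Formula 40: level c of a new channel positioned between a
    level-a channel and a level-b channel. k is the new channel's
    relative position (k = 1 coincides with the level-a channel,
    k = 0 with the level-b channel)."""
    return a * k + b * (1.0 - k)
```

With the default k = 0.5 this reduces to formula 41, the simple average (a + b) / 2.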
(3)-2-2. Extension to 7.1 channels
An example of generating a 7.1-channel output channel audio signal when the multi-channel audio signal is 5.1 channels is explained as follows.
Fig. 15 is a diagram explaining the position of a multi-channel audio signal of 5.1 channels and the position of an output channel audio signal of 7.1 channels.
Referring to fig. 15(a), as in fig. 12(a), it can be seen that channel positions of a multi-channel audio signal of 5.1 channels are a left front channel L, a right front channel R, a center channel C, a low frequency channel (not shown in the drawing) LFE, a left surround channel Ls, and a right surround channel Rs, respectively.
In case that a multi-channel audio signal of 5.1 channels is a down-mix audio signal, if a spatial parameter is applied to the down-mix audio signal, the down-mix audio signal is up-mixed again into a multi-channel audio signal of 5.1 channels.
However, a left front side channel Lfs and a right front side channel Rfs as shown in fig. 15(b) should be further generated to upmix the downmix audio signal into a 7.1-channel multi-channel audio signal.
Since the left front side channel Lfs is positioned between the left front channel L and the left surround channel Ls, the level of the left front side channel Lfs can be determined by interpolation using the level of the left front channel L and the level of the left surround channel Ls.
Fig. 16 is a diagram explaining the levels of two left channels and the level of one left front side channel Lfs.
Referring to fig. 16, it can be seen that the level c of the left front side channel Lfs is a linearly interpolated value based on the level a of the left front channel L and the level b of the left surround channel Ls.
Meanwhile, although the left front side channel Lfs is positioned between the left front channel L and the left surround channel Ls, it also lies outside the span of the three front channels L, C, and R. Therefore, the level of the left front side channel Lfs can alternatively be determined by extrapolation using the levels of the left front channel L, the center channel C, and the right front channel R.
Fig. 17 is a diagram for explaining the levels of three front channels and the level of one left front side channel.
Referring to fig. 17, it can be seen that the level d of the left front side channel Lfs is a linearly extrapolated value based on the level a of the left front channel L, the level c of the center channel C, and the level b of the right front channel R.
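The patent does not give an explicit extrapolation formula. One plausible sketch, under the assumptions that the three front channels sit at equally spaced positions (L at -1, C at 0, R at +1) and that a least-squares line through their levels is extended to a hypothetical Lfs position, is:

```python
def extrapolate_level(levels, positions, x_new):
    """Fit a least-squares line through (position, level) points and
    evaluate it at x_new, which may lie outside the fitted range.
    Positions and the target x_new are illustrative assumptions."""
    n = len(levels)
    mx = sum(positions) / n
    my = sum(levels) / n
    num = sum((x - mx) * (y - my) for x, y in zip(positions, levels))
    den = sum((x - mx) ** 2 for x in positions)
    slope = num / den
    return my + slope * (x_new - mx)

# Levels a, c, b of channels L, C, R at positions -1, 0, +1,
# extrapolated to an assumed Lfs position of -2:
d = extrapolate_level([3.0, 2.0, 1.0], [-1.0, 0.0, 1.0], -2.0)
```

A nonlinear fit could be substituted in the same way, just as the patent allows nonlinear interpolation for the rear center channel.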
In the above description, the process of adding the extended spatial information to the spatial information to generate the output channel audio signal has been explained with reference to two examples. As described above, in the upmixing process of adding the extended spatial information, the extended spatial parameters may not be fully applied but partially applied. Thus, the process of applying the spatial parameters to the audio signal may be performed sequentially and hierarchically, or collectively and synthetically.
INDUSTRIAL APPLICABILITY
Therefore, the present invention provides the following effects.
First, the present invention can generate an audio signal having a configuration different from a predetermined tree configuration, thereby being capable of generating audio signals of various configurations.
Second, since an audio signal having a configuration different from a predetermined tree configuration can be generated, even if the number of multi-channels before down-mixing is more or fewer than the number of speakers, output channels equal in number to the speakers can be generated from the down-mixed audio signal.
Third, if output channels fewer in number than the multiple channels are generated, the output channel audio signal is generated directly from the downmix audio signal, instead of first upmixing the downmix audio signal into a multi-channel audio signal and then downmixing that result into the output channels; this remarkably reduces the operation load required for decoding the audio signal.
Fourth, the present invention provides a pseudo-surround effect when surround channel output is not available because sound paths are considered in generating combined spatial information.
Although the present invention has been described and illustrated herein with reference to the preferred embodiments thereof, it will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit and scope of the invention. Thus, it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims (12)

1. A method of decoding an audio signal, comprising:
receiving a downmix signal;
receiving spatial information comprising at least one spatial parameter and spatial filter information comprising at least one filter parameter;
generating combined spatial information of a surround effect by combining at least one of the spatial parameters with at least one of the filter parameters; and
converting the downmix signal into a virtual surround signal using the combined spatial information,
wherein the downmix signal is a stereo signal including a left channel signal and a right channel signal.
2. The method of claim 1, wherein the combined spatial parameters of the combined spatial information are generated by substituting at least one of the spatial parameters and at least one of the filter parameters into a conversion formula.
3. The method of claim 2, wherein the combined spatial parameters comprise filter coefficients.
4. The method of claim 2, wherein the conversion formula is decided according to tree configuration information on the audio signal.
5. The method of claim 2, wherein the conversion formula is decided based on output channel information.
6. The method of claim 1, wherein the spatial filter information is an acoustic path.
7. A device for decoding an audio signal, comprising:
a modified spatial information generating unit generating combined spatial information of the surround effect by combining at least one of the spatial parameters with at least one of the filter parameters; and
an output channel generating unit converting the downmix signal into a virtual surround signal using the combined spatial information,
wherein at least one of the spatial parameters is included in spatial information and at least one of the filter parameters is included in spatial filter information, and wherein the spatial information, the spatial filter information, and the downmix signal are received, and
wherein the downmix signal is a stereo signal including a left channel signal and a right channel signal.
8. The apparatus of claim 7, wherein the combined spatial parameters of the combined spatial information are generated by substituting at least one of the spatial parameters and at least one of the filter parameters into a conversion formula.
9. The apparatus of claim 8, wherein the combined spatial parameters comprise filter coefficients.
10. The apparatus of claim 8, wherein the conversion formula is decided according to tree configuration information on the audio signal.
11. The apparatus of claim 8, wherein the conversion formula is decided based on output channel information.
12. The apparatus of claim 7, wherein the spatial filter information is an acoustic path.
CN2006800421983A 2005-09-14 2006-09-14 Method and apparatus for decoding audio signal Active CN101341533B (en)

Applications Claiming Priority (18)

Application Number Priority Date Filing Date Title
US71652405P 2005-09-14 2005-09-14
US60/716,524 2005-09-14
US75998006P 2006-01-19 2006-01-19
US60/759,980 2006-01-19
US76036006P 2006-01-20 2006-01-20
US60/760,360 2006-01-20
US77366906P 2006-02-16 2006-02-16
US60/773,669 2006-02-16
US77672406P 2006-02-27 2006-02-27
US60/776,724 2006-02-27
US78751606P 2006-03-31 2006-03-31
US60/787,516 2006-03-31
US81602206P 2006-06-22 2006-06-22
US60/816,022 2006-06-22
KR1020060078300 2006-08-18
KR20060078300 2006-08-18
KR10-2006-0078300 2006-08-18
PCT/KR2006/003659 WO2007032646A1 (en) 2005-09-14 2006-09-14 Method and apparatus for decoding an audio signal

Publications (2)

Publication Number Publication Date
CN101341533A CN101341533A (en) 2009-01-07
CN101341533B true CN101341533B (en) 2012-04-18

Family

ID=40214817

Family Applications (4)

Application Number Title Priority Date Filing Date
CN2006800420618A Active CN101351839B (en) 2005-09-14 2006-09-14 Method and apparatus for decoding an audio signal
CN2006800421752A Active CN101454828B (en) 2005-09-14 2006-09-14 Method and apparatus for decoding an audio signal
CN2006800421983A Active CN101341533B (en) 2005-09-14 2006-09-14 Method and apparatus for decoding audio signal
CN2006800420711A Active CN101356572B (en) 2005-09-14 2006-09-14 Method and apparatus for decoding an audio signal

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN2006800420618A Active CN101351839B (en) 2005-09-14 2006-09-14 Method and apparatus for decoding an audio signal
CN2006800421752A Active CN101454828B (en) 2005-09-14 2006-09-14 Method and apparatus for decoding an audio signal

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN2006800420711A Active CN101356572B (en) 2005-09-14 2006-09-14 Method and apparatus for decoding an audio signal

Country Status (1)

Country Link
CN (4) CN101351839B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2843226A1 (en) 2012-07-02 2014-01-09 Sony Corporation Decoding device, decoding method, encoding device, encoding method, and program
KR20150032651A (en) 2012-07-02 2015-03-27 소니 주식회사 Decoding device and method, encoding device and method, and program
CA2843263A1 (en) 2012-07-02 2014-01-09 Sony Corporation Decoding device, decoding method, encoding device, encoding method, and program
TWI517142B (en) * 2012-07-02 2016-01-11 Sony Corp Audio decoding apparatus and method, audio coding apparatus and method, and program
CN104540084A (en) * 2014-12-16 2015-04-22 广东欧珀移动通信有限公司 Stereo voice communication method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1179074A (en) * 1996-10-08 1998-04-15 三星电子株式会社 Device and method for reproducing multi-channel sound using two speakers
CN1402592A (en) * 2002-07-23 2003-03-12 华南理工大学 Two-loudspeaker virtual 5.1 path surround sound signal processing method
CN1655651A (en) * 2004-02-12 2005-08-17 艾格瑞系统有限公司 Auditory scene based on late reverberation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5632005A (en) * 1991-01-08 1997-05-20 Ray Milton Dolby Encoder/decoder for multidimensional sound fields
US6711266B1 (en) * 1997-02-07 2004-03-23 Bose Corporation Surround sound channel encoding and decoding
EP1054575A3 (en) * 1999-05-17 2002-09-18 Bose Corporation Directional decoding
BRPI0304540B1 (en) * 2002-04-22 2017-12-12 Koninklijke Philips N. V METHODS FOR CODING AN AUDIO SIGNAL, AND TO DECODE AN CODED AUDIO SIGN, ENCODER TO CODIFY AN AUDIO SIGN, CODIFIED AUDIO SIGN, STORAGE MEDIA, AND, DECODER TO DECOD A CODED AUDIO SIGN


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Julia Jakka. Binaural to Multichannel Audio Upmix. Helsinki University of Technology, 2005, pp. 1-59. *

Also Published As

Publication number Publication date
CN101341533A (en) 2009-01-07
CN101356572B (en) 2013-02-13
CN101356572A (en) 2009-01-28
CN101351839B (en) 2012-07-04
CN101454828B (en) 2011-12-28
CN101351839A (en) 2009-01-21
CN101454828A (en) 2009-06-10

Similar Documents

Publication Publication Date Title
US9747905B2 (en) Method and apparatus for decoding an audio signal
US20080235006A1 (en) Method and Apparatus for Decoding an Audio Signal
JP4740335B2 (en) Audio signal decoding method and apparatus
AU2007328614B2 (en) A method and an apparatus for processing an audio signal
US20080221907A1 (en) Method and Apparatus for Decoding an Audio Signal
JP2013033299A (en) Method, storage medium, and system for decoding and encoding multi-channel signal
CN101341533B (en) Method and apparatus for decoding audio signal
RU2380767C2 (en) Method and device for audio signal decoding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant