EP1905034B1 - Virtual source location information based channel level difference quantization and dequantization - Google Patents
- Publication number
- EP1905034B1
- Authority
- EP
- European Patent Office
- Prior art keywords
- cld
- quantization
- vsli
- spatial
- channel audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Not-in-force
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Definitions
- the present invention relates to Spatial Audio Coding (SAC) of a multi-channel audio signal and decoding of an audio bitstream generated by the SAC, and more particularly, to efficient quantization and dequantization of Channel Level Difference (CLD) used as a spatial parameter when SAC-based encoding of a multi-channel audio signal is performed.
- SAC Spatial Audio Coding
- CLD Channel Level Difference
- the SAC approach is an encoding approach for improving transmission efficiency by encoding N number of multi-channel audio signals (N>2) using both a down-mix signal, which is mixed into mono or stereo, and a set of ancillary spatial parameters, which represent a human perceptual characteristic of the multi-channel audio signal.
- the spatial parameters can include Channel Level Difference (CLD) representing a level difference between two channels according to time-frequency, Inter-channel Correlation/Coherence (ICC) representing correlation or coherence between two channels according to time-frequency, Channel Prediction Coefficient (CPC) for making it possible to reproduce a third channel from two channels by prediction, and so on.
- ICC Inter-channel Correlation/Coherence
- CPC Channel Prediction Coefficient
- the CLD is a core element in restoring a power gain of each channel, and is extracted in various ways in the process of SAC encoding. As illustrated in FIG. 1A , on the basis of one reference channel, the CLD is expressed by a power ratio of the reference channel to each of the other channels. For example, if there are six channel signals L, R, C, LFE, Ls and Rs, five power ratios can be obtained based on one reference channel, and CLD1 through CLD5 correspond to levels obtained by applying a base-10 logarithm to each of the five power ratios.
- a multi-channel is divided into a plurality of channel pairs, and each of the channel pairs is analyzed on the basis of stereo, and, in each analysis step, one CLD value is extracted.
- This is carried out by step-by-step use of a plurality of One-To-Two (OTT) modules, which take two input channels to one output channel.
- OTT One-To-Two
- any one of the input stereo signals is recognized as a reference channel, and a base-10 logarithmic value of a power ratio of the reference channel to the other channel is output as a CLD value.
- the CLD value has a dynamic range between -∞ and +∞. Hence, to express the CLD value with a finite number of bits, efficient quantization is required.
- CLD quantization is performed by using a normalized quantization table.
- An example of such a quantization table is given in the SAC standard document (see page 41, Table 57).
- the dynamic range of the CLD value is limited to a predetermined level or less.
- quantization error is introduced, and thus spectrum information is distorted.
- the dynamic range of the CLD value will be limited to the range between -25 dB and +25 dB.
- the present invention is directed to Channel Level Difference (CLD) quantization and dequantization methods capable of minimizing sound deterioration in the process of Spatial Audio Coding (SAC)-based encoding of a multi-channel audio signal.
- the present invention is also directed to CLD quantization and dequantization methods capable of minimizing sound deterioration using advantages of quantization of Virtual Source Location Information (VSLI), which is replaceable with CLD, in the process of SAC-based encoding of a multi-channel audio signal.
- VSLI Virtual Source Location Information
- the present invention is directed to improving quality of sound without additional complexity by providing a VSLI-based CLD quantization table, which can replace the CLD quantization table used for CLD quantization and dequantization in a Moving Picture Experts Group (MPEG)-4 SAC system.
- MPEG Moving Picture Experts Group
- a first aspect of the present invention provides a method for quantizing a Channel Level Difference (CLD) parameter used as a spatial parameter when Spatial Audio coding (SAC)-based encoding of an N-channel audio signal (N>1) is performed in accordance with claim 1.
- a second aspect of the present invention provides a computer-readable recording medium in accordance with claim 12.
- a third aspect of the present invention provides a method for encoding an N-channel audio signal (N>1) based on Spatial Audio Coding (SAC) in accordance with claim 13.
- a fourth aspect of the present invention provides an apparatus for encoding an N-channel audio signal (N>1) based on Spatial Audio Coding (SAC) in accordance with claim 14.
- a fifth aspect of the present invention provides a method for dequantizing an encoded Channel Level Difference (CLD) quantization value when an encoded N-channel audio bitstream (N>1) is decoded based on Spatial Audio coding (SAC) in accordance with claim 17.
- a sixth aspect of the present invention provides a computer-readable recording medium in accordance with claim 22.
- a seventh aspect of the present invention provides a method for decoding an encoded N-channel audio bitstream (N>1) based on Spatial Audio Coding (SAC) in accordance with claim 23.
- the VSLI-based CLD quantization table created according to the present invention can replace the CLD quantization table used in an existing SAC system.
- FIG. 2 schematically illustrates a configuration of a spatial audio coding (SAC) system to which the present invention is to be applied.
- the SAC system can be divided into an encoding part, which generates, encodes and transmits a down-mix signal and spatial parameters from an N-channel audio signal, and a decoding part, which restores the N-channel audio signal from the down-mix signal and spatial parameters transmitted from the encoding part.
- the encoding part includes an SAC encoder 210, an audio encoder 220, a spatial parameter quantizer 230, and a spatial parameter encoder 240.
- the decoding part includes an audio decoder 250, a spatial parameter decoder 260, a spatial parameter dequantizer 270, and an SAC decoder 280.
- the SAC encoder 210 generates a down-mix signal from the input N-channel audio signal and analyzes spatial characteristics of the N-channel audio signal, thereby extracting spatial parameters such as Channel Level Difference (CLD), Inter-channel Correlation/Coherence (ICC), and Channel Prediction Coefficient (CPC).
- an N-channel (N > 1) signal input into the SAC encoder 210 is decomposed into frequency bands by means of an analysis filter bank.
- a quadrature mirror filter (QMF) is used. Spatial characteristics related to spatial perception are analyzed from sub-band signals, and spatial parameters such as CLD, ICC, and CPC are selectively extracted according to an encoding operation mode. Further, the sub-band signals are down-mixed and converted into a down-mix signal of a time domain by means of a QMF synthesis bank.
- the down-mix signal may be replaced by a down-mix signal which is pre-produced by an acoustic engineer (or an artistic/hand-mixed down-mix signal).
- the SAC encoder 210 adjusts and transmits the spatial parameters on the basis of the pre-produced down-mix signal, thereby optimizing multi-channel restoration at the decoder.
- the audio encoder 220 compresses the down-mix signal generated by the SAC encoder 210 or the artistic down-mix signal by using an existing audio compression technique (e.g. Moving Picture Experts Group (MPEG)-4, Advanced Audio Coding (AAC), MPEG-4 High Efficiency Advanced Audio Coding (HE-AAC), MPEG-4 Bit Sliced Arithmetic Coding (BSAC) etc.), thereby generating a compressed audio bitstream.
- MPEG Moving Picture Experts Group
- AAC Advanced Audio Coding
- HE-AAC MPEG-4 High Efficiency Advanced Audio Coding
- BSAC MPEG-4 Bit Sliced Arithmetic Coding
- the spatial parameter quantizer 230 is provided with a quantization table, which is to be used to quantize each of the CLD, ICC and CPC. As described below, in order to minimize sound deterioration caused by quantizing the CLD using an existing normalized CLD quantization table, a Virtual Source Location Information (VSLI)-based CLD quantization table can be used in the spatial parameter quantizer 230.
- VSLI Virtual Source Location Information
- the spatial parameter encoder 240 performs entropy encoding in order to compress the spatial parameters quantized by the spatial parameter quantizer 230, and preferably performs Huffman encoding on quantization indexes of the spatial parameters using a Huffman codebook. As described below, the present invention proposes a new Huffman codebook in order to maximize transmission efficiency of CLD quantization indexes.
- the audio decoder 250 decodes the audio bitstream compressed through the existing audio compression technique (e.g. MPEG-4, AAC, MPEG-4 HE-AAC, MPEG-4 BSAC, etc.).
- the spatial parameter decoder 260 and the spatial parameter dequantizer 270 are modules for performing the inverse of the quantization and encoding performed by the spatial parameter quantizer 230 and the spatial parameter encoder 240.
- the spatial parameter decoder 260 decodes the encoded quantization indexes of the spatial parameters on the basis of the Huffman codebook, and the spatial parameter dequantizer 270 obtains the spatial parameters corresponding to the quantization indexes from the quantization table.
- the VSLI-based CLD quantization table and the Huffman codebook proposed in the present invention are used for the processes of decoding and dequantization of the spatial parameters.
- the SAC decoder 280 restores the N multi-channel audio signals by synthesis of the audio bitstream decoded by the audio decoder 250 and the spatial parameters obtained by the spatial parameter dequantizer 270.
- the SAC system can provide compatibility with an existing mono or stereo audio coding system.
- the present invention is concerned with providing CLD quantization capable of minimizing sound deterioration resulting from quantization by utilizing advantages of the quantization of the VSLI representing a spatial audio image of the multi-channel audio signal.
- the present invention is based on the fact that, in expressing an azimuth angle of the spatial audio image, human ears have difficulty in recognizing an error of 3° or less.
- the VSLI expressed with the azimuth angle has a limited dynamic range of 90°, so that quantization error caused by limitation of the dynamic range upon quantization can be avoided.
- when the CLD quantization table is designed on the basis of the advantages of the quantization of the VSLI, sound deterioration resulting from the quantization can be minimized.
- FIGS. 3A and 3B are views for explaining a concept of VSLI serving as a reference of CLD quantization in accordance with the present invention.
- FIG. 3A illustrates a stereo speaker environment in which two speakers are located at an angle of 60°
- FIG. 3B is a view in which a stereo audio signal in the stereo speaker environment of FIG. 3A is represented by power of a down-mixed signal and by VSLI.
- the stereo or multi-channel audio signal can be represented by the magnitude vector of a down-mix audio signal and the VSLI, which can be obtained by analyzing the power of each channel of the multi-channel audio signal.
- the multi-channel audio signal represented in this way can be restored by projecting the magnitude vector according to the location vector of a sound source.
- the VSLI calculated in this way has a value between $A_L$ and $A_R$.
- $P_L$ and $P_R$ can be restored from the VSLI as follows: First, the VSLI is mapped to a value, VSLI', between 0° and 90° using a Constant Power Panning (CPP) rule, as in Equation 3.
- CPP Constant Power Panning
- $P_L$ and $P_R$ are calculated using Equations 4 and 5.
- $P_L = P_D \cdot \cos^2(\mathrm{VSLI}')$ (Equation 4)
- $P_R = P_D \cdot \sin^2(\mathrm{VSLI}')$ (Equation 5)
- the subject matter of the present invention concerns applying the advantages of quantization of the VSLI to quantization of the spatial parameter, the CLD.
- the CLD can be expressed as in Equation 6.
- $\mathrm{CLD} = 10 \log_{10}\left(\dfrac{P_R}{P_L}\right)$ (Equation 6)
- the CLD can be derived from the VSLI according to Equation 7.
- the CLD can be obtained by taking the natural logarithm, instead of the base-10 logarithm, of the VSLI.
- the CLD values obtained by Equations 7 and 8 can be directly used as spatial parameters of a general SAC system.
- because the CLD has a dynamic range between -∞ and +∞, problems occur in performing quantization using a finite number of bits.
- the main problem is quantization error caused by limitation of the dynamic range. Because the full dynamic range of the CLD cannot be expressed with only a finite number of bits, the dynamic range of the CLD is limited to a predetermined level or less. As a result, quantization error is introduced, and the spectrum information is distorted. If 5 bits are used for the CLD quantization, the dynamic range of the CLD is limited to between -25 dB and +25 dB.
- in contrast, because the VSLI has a finite dynamic range of 90°, such quantization error caused by limitation of the dynamic range upon quantization can be avoided.
- the CLD quantization value using the VSLI can be calculated by taking a base-10 logarithm or natural logarithm.
- e rather than 10 is used as the base when spectrum information is restored by using the CLD value. Table 3.
- when the CLD quantization values and the CLD quantization decision levels are expressed as integers by taking the base-10 logarithm, some of the CLD quantization values turn out to be identical to some of the CLD quantization decision levels, which is a problem.
- the CLD quantization values and decision levels using the natural logarithm are preferably used for actual quantization.
- the CLD quantization values are derived by taking the natural logarithm rather than the base-10 logarithm of the VSLI.
- the VSLI-based CLD quantization table created in this way is employed in the spatial parameter quantizer 230 and the spatial parameter dequantizer 270 of the SAC system illustrated in FIG. 2 , so that sound deterioration resulting from the CLD quantization error can be minimized.
- the present invention proposes a Huffman codebook capable of optimizing Huffman encoding of the CLD quantization indexes derived on the basis of the above-described VSLI-based CLD quantization table.
- the multi-channel audio signal is processed after being split into sub-bands of a frequency domain by means of a filter bank.
- a differential coding method is applied to the quantization index of each sub-band, thereby classifying the quantization indexes into the quantization index of the first sub-band and the other 19 differential indexes between neighboring sub-bands.
- they may be divided into differential indexes between neighboring frames.
- a probability distribution is calculated with respect to each of the three types of indexes classified in this way, and then the Huffman coding method is applied to each of the three types of indexes.
- Huffman codebooks described in Tables 13 and 14 below can be obtained.
- Table 13 is the Huffman codebook for the index of the first sub-band
- Table 14 is the Huffman codebook for the other indexes between neighboring sub-bands.
- Table 13

  Index  Number of Bits  Codeword (hexadecimal)    Index  Number of Bits  Codeword (hexadecimal)
  0      5               0x17                      16     5               0x1d
  1      8               0x64                      17     5               0x19
  2      8               0x65                      18     5               0x1c
  3      8               0xf0                      19     5               0x16
  4      8               0xf1                      20     5               0x18
  5      7               0x33                      21     5               0x14
  6      7               0x79                      22     5               0x13
  7      6               0x18                      23     5               0x15
  8      6               0x22                      24     5               0x1b
  9      6               0x23                      25     5               0x10
  10     6               0x3d                      26     5               0x0e
  11     5               0x0b                      27     5               0x0f
  12     5               0x12                      28     5               0x0d
  13     5               0x1a                      29     5               0x
- the Huffman codebooks proposed in the present invention are employed in the spatial parameter encoder 240 and the spatial parameter decoder 260 of the SAC system illustrated in FIG. 2, so that the bit rate required to transmit the CLD quantization indexes can be reduced.
- the present invention can be provided as a computer program stored on at least one computer-readable medium in the form of at least one product such as a floppy disk, hard disk, CD ROM, flash memory card, PROM, RAM, ROM, or magnetic tape.
- the computer program can be written in any programming language such as C, C++, or JAVA.
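The differential classification of CLD quantization indexes described above (one index for the first sub-band plus differences between neighboring sub-bands) can be sketched in Python as follows. This is a minimal illustration only: the function names `to_differential` and `from_differential` are hypothetical, and the subsequent Huffman coding step is omitted because the codebooks are not fully reproduced here.

```python
from typing import List, Tuple


def to_differential(indexes: List[int]) -> Tuple[int, List[int]]:
    """Split per-sub-band quantization indexes into the first index and
    the differences between neighboring sub-bands."""
    if not indexes:
        raise ValueError("at least one sub-band index is required")
    first = indexes[0]
    # For 20 sub-bands this yields 19 differential indexes.
    diffs = [cur - prev for prev, cur in zip(indexes, indexes[1:])]
    return first, diffs


def from_differential(first: int, diffs: List[int]) -> List[int]:
    """Inverse operation: rebuild the per-sub-band indexes by accumulation."""
    indexes = [first]
    for d in diffs:
        indexes.append(indexes[-1] + d)
    return indexes


# Example with 20 hypothetical sub-band indexes.
example = [15, 16, 16, 14] + [14] * 16
first_index, differential = to_differential(example)
restored = from_differential(first_index, differential)
```

Neighboring sub-bands often carry similar indexes, so the differences cluster around zero, which is what makes a separate entropy codebook for the differential indexes worthwhile.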
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
- The present invention relates to Spatial Audio Coding (SAC) of a multi-channel audio signal and decoding of an audio bitstream generated by the SAC, and more particularly, to efficient quantization and dequantization of Channel Level Difference (CLD) used as a spatial parameter when SAC-based encoding of a multi-channel audio signal is performed.
- Spatial Audio Coding (SAC) is a technology for efficiently compressing a multi-channel audio signal while maintaining compatibility with an existing stereo audio system. In the Moving Picture Experts Group (MPEG), SAC technology has been under standardization since 2002 under the name "MPEG Surround", and is described in detail in the ISO/IEC working document, ISO/IEC CD 14996-x (published on February 18, 2005 and hereinafter referred to as "SAC standard document").
- Specifically, the SAC approach is an encoding approach for improving transmission efficiency by encoding N number of multi-channel audio signals (N>2) using both a down-mix signal, which is mixed into mono or stereo, and a set of ancillary spatial parameters, which represent a human perceptual characteristic of the multi-channel audio signal. The spatial parameters can include Channel Level Difference (CLD) representing a level difference between two channels according to time-frequency, Inter-channel Correlation/Coherence (ICC) representing correlation or coherence between two channels according to time-frequency, Channel Prediction Coefficient (CPC) for making it possible to reproduce a third channel from two channels by prediction, and so on.
- The CLD is a core element in restoring a power gain of each channel, and is extracted in various ways in the process of SAC encoding. As illustrated in FIG. 1A, on the basis of one reference channel, the CLD is expressed by a power ratio of the reference channel to each of the other channels. For example, if there are six channel signals L, R, C, LFE, Ls and Rs, five power ratios can be obtained based on one reference channel, and CLD1 through CLD5 correspond to levels obtained by applying a base-10 logarithm to each of the five power ratios.
- Meanwhile, as illustrated in FIG. 1B, a multi-channel signal is divided into a plurality of channel pairs, each of the channel pairs is analyzed on a stereo basis, and, in each analysis step, one CLD value is extracted. This is carried out by step-by-step use of a plurality of One-To-Two (OTT) modules, which take two input channels to one output channel. In each OTT, any one of the input stereo signals is recognized as a reference channel, and a base-10 logarithmic value of a power ratio of the reference channel to the other channel is output as a CLD value.
- The CLD value has a dynamic range between -∞ and +∞. Hence, to express the CLD value with a finite number of bits, efficient quantization is required.
- Typically, CLD quantization is performed by using a normalized quantization table. An example of such a quantization table is given in the SAC standard document (see page 41, Table 57). Because not all CLD values can be expressed with a finite number of bits, the dynamic range of the CLD value is limited to a predetermined level or less. As a result, quantization error is introduced and spectrum information is distorted. For example, when 5 bits are used for the CLD quantization, the dynamic range of the CLD value is limited to the range between -25 dB and +25 dB.
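The range limitation described above can be illustrated with a minimal Python sketch. It assumes that the CLD of a channel pair is computed as 10·log10 of a power ratio and that out-of-range values are simply clipped to ±25 dB. The function names `cld_db` and `clamp_cld` and the handling of silent channels are hypothetical and are not taken from the SAC standard document.

```python
import math

CLD_LIMIT_DB = 25.0  # dynamic-range limit quoted in the text for 5-bit quantization


def cld_db(p_first: float, p_second: float) -> float:
    """Level difference in dB, 10*log10(p_second / p_first).

    The result is unbounded: it tends to -inf or +inf as one channel
    power approaches zero.
    """
    if p_first == 0.0 and p_second == 0.0:
        return 0.0  # assumption: two silent channels are treated as equal level
    if p_first == 0.0:
        return math.inf
    if p_second == 0.0:
        return -math.inf
    return 10.0 * math.log10(p_second / p_first)


def clamp_cld(cld: float, limit: float = CLD_LIMIT_DB) -> float:
    """Clip a CLD value to [-limit, +limit]; this is where range error arises."""
    return max(-limit, min(limit, cld))


unbounded = cld_db(1.0, 1.0e4)   # a 40 dB level difference
limited = clamp_cld(unbounded)   # clipped to the 25 dB limit
```

In this example a 40 dB level difference is stored as 25 dB, so 15 dB of level information is lost before any rounding to a quantization level takes place.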
- Document JEONGIL SEO ET AL: "A New Cue Parameter for Spatial Audio Coding", JOINT VIDEO TEAM (JVT) OF ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 AND ITU-T SG16 Q6), no. M11264, 13 October 2004 (2004-10-13), suggests a new spatial cue parameter that describes the virtual sound source location as power vectors between channels. The main advantage of describing a spatial characteristic as angle comes from the finite dynamic range.
- The present invention is directed to Channel Level Difference (CLD) quantization and dequantization methods capable of minimizing sound deterioration in the process of Spatial Audio Coding (SAC)-based encoding of a multi-channel audio signal.
- The present invention is also directed to CLD quantization and dequantization methods capable of minimizing sound deterioration using advantages of quantization of Virtual Source Location Information (VSLI), which is replaceable with CLD, in the process of SAC-based encoding of a multi-channel audio signal.
- In addition, the present invention is directed to improving quality of sound without additional complexity by providing a VSLI-based CLD quantization table, which can replace the CLD quantization table used for CLD quantization and dequantization in a Moving Picture Experts Group (MPEG)-4 SAC system.
- A first aspect of the present invention provides a method for quantizing a Channel Level Difference (CLD) parameter used as a spatial parameter when Spatial Audio coding (SAC)-based encoding of an N-channel audio signal (N>1) is performed in accordance with claim 1.
- A second aspect of the present invention provides a computer-readable recording medium in accordance with claim 12.
- A third aspect of the present invention provides a method for encoding an N-channel audio signal (N>1) based on Spatial Audio Coding (SAC) in accordance with claim 13.
- A fourth aspect of the present invention provides an apparatus for encoding an N-channel audio signal (N>1) based on Spatial Audio Coding (SAC) in accordance with claim 14.
- A fifth aspect of the present invention provides a method for dequantizing an encoded Channel Level Difference (CLD) quantization value when an encoded N-channel audio bitstream (N>1) is decoded based on Spatial Audio coding (SAC) in accordance with claim 17.
- A sixth aspect of the present invention provides a computer-readable recording medium in accordance with claim 22.
- A seventh aspect of the present invention provides a method for decoding an encoded N-channel audio bitstream (N>1) based on Spatial Audio Coding (SAC) in accordance with claim 23.
- An eighth aspect of the present invention provides an apparatus for decoding an encoded N-channel audio bitstream (N>1) based on Spatial Audio Coding (SAC) in accordance with claim 24.
- The VSLI-based CLD quantization table created according to the present invention can replace the CLD quantization table used in an existing SAC system.
- By using the VSLI-based CLD quantization table according to the present invention, sound deterioration can be prevented as much as possible. In addition, by using a Huffman codebook in compressing CLD indexes, which is proposed in the present invention, it is possible to reduce a bit rate required to transmit the CLD.
- FIGS. 1A and 1B conceptually illustrate a process of extracting Channel Level Difference (CLD) values from multi-channel signals;
- FIG. 2 schematically illustrates a configuration of a spatial audio coding (SAC) system to which the present invention is to be applied;
- FIGS. 3A and 3B are views for explaining a concept of VSLI serving as a reference of CLD quantization in accordance with the present invention; and
- FIG. 4 is a graph showing CLD quantization values converted from VSLI quantization values in accordance with the present invention.
- Hereinafter, exemplary embodiments of the present invention will be described in detail. However, the present invention is not limited to the exemplary embodiments disclosed below, but can be implemented in various forms. Therefore, these exemplary embodiments are provided for complete disclosure of the present invention and to fully convey the scope of the present invention, as defined by the appended claims, to those of ordinary skill in the art.
- FIG. 2 schematically illustrates a configuration of a spatial audio coding (SAC) system to which the present invention is to be applied. As illustrated, the SAC system can be divided into an encoding part, which generates, encodes and transmits a down-mix signal and spatial parameters from an N-channel audio signal, and a decoding part, which restores the N-channel audio signal from the down-mix signal and spatial parameters transmitted from the encoding part. The encoding part includes an SAC encoder 210, an audio encoder 220, a spatial parameter quantizer 230, and a spatial parameter encoder 240. The decoding part includes an audio decoder 250, a spatial parameter decoder 260, a spatial parameter dequantizer 270, and an SAC decoder 280.
- The SAC encoder 210 generates a down-mix signal from the input N-channel audio signal and analyzes spatial characteristics of the N-channel audio signal, thereby extracting spatial parameters such as Channel Level Difference (CLD), Inter-channel Correlation/Coherence (ICC), and Channel Prediction Coefficient (CPC).
- Specifically, an N-channel (N > 1) signal input into the SAC encoder 210 is decomposed into frequency bands by means of an analysis filter bank. In order to split a signal into sub-bands of a frequency domain with low complexity, a quadrature mirror filter (QMF) is used. Spatial characteristics related to spatial perception are analyzed from sub-band signals, and spatial parameters such as CLD, ICC, and CPC are selectively extracted according to an encoding operation mode. Further, the sub-band signals are down-mixed and converted into a down-mix signal of a time domain by means of a QMF synthesis bank.
- Alternatively, the down-mix signal may be replaced by a down-mix signal which is pre-produced by an acoustic engineer (or an artistic/hand-mixed down-mix signal). At this time, the SAC encoder 210 adjusts and transmits the spatial parameters on the basis of the pre-produced down-mix signal, thereby optimizing multi-channel restoration at the decoder.
- The audio encoder 220 compresses the down-mix signal generated by the SAC encoder 210 or the artistic down-mix signal by using an existing audio compression technique (e.g. Moving Picture Experts Group (MPEG)-4, Advanced Audio Coding (AAC), MPEG-4 High Efficiency Advanced Audio Coding (HE-AAC), MPEG-4 Bit Sliced Arithmetic Coding (BSAC) etc.), thereby generating a compressed audio bitstream.
- Meanwhile, the spatial parameters generated by the SAC encoder 210 are transmitted after being quantized and encoded by the spatial parameter quantizer 230 and the spatial parameter encoder 240. The spatial parameter quantizer 230 is provided with a quantization table, which is to be used to quantize each of the CLD, ICC and CPC. As described below, in order to minimize sound deterioration caused by quantizing the CLD using an existing normalized CLD quantization table, a Virtual Source Location Information (VSLI)-based CLD quantization table can be used in the spatial parameter quantizer 230.
- The spatial parameter encoder 240 performs entropy encoding in order to compress the spatial parameters quantized by the spatial parameter quantizer 230, and preferably performs Huffman encoding on quantization indexes of the spatial parameters using a Huffman codebook. As described below, the present invention proposes a new Huffman codebook in order to maximize transmission efficiency of CLD quantization indexes.
- The audio decoder 250 decodes the audio bitstream compressed through the existing audio compression technique (e.g. MPEG-4, AAC, MPEG-4 HE-AAC, MPEG-4 BSAC, etc.).
- The spatial parameter decoder 260 and the spatial parameter dequantizer 270 are modules for performing the inverse of the quantization and encoding performed by the spatial parameter quantizer 230 and the spatial parameter encoder 240. The spatial parameter decoder 260 decodes the encoded quantization indexes of the spatial parameters on the basis of the Huffman codebook, and the spatial parameter dequantizer 270 obtains the spatial parameters corresponding to the quantization indexes from the quantization table. In analogy to the quantization and encoding of the spatial parameters, the VSLI-based CLD quantization table and the Huffman codebook proposed in the present invention are used for the processes of decoding and dequantization of the spatial parameters.
- The SAC decoder 280 restores the N multi-channel audio signals by synthesis of the audio bitstream decoded by the audio decoder 250 and the spatial parameters obtained by the spatial parameter dequantizer 270. Alternatively, when decoding of the multi-channel audio signals is impossible, only the down-mix signal can be decoded by using an existing audio decoder, so that independent service is possible. Therefore, the SAC system can provide compatibility with an existing mono or stereo audio coding system.
- The present invention is concerned with providing CLD quantization capable of minimizing sound deterioration resulting from quantization by utilizing advantages of the quantization of the VSLI representing a spatial audio image of the multi-channel audio signal. The present invention is based on the fact that, in expressing an azimuth angle of the spatial audio image, human ears have difficulty in recognizing an error of 3° or less. The VSLI expressed with the azimuth angle has a limited dynamic range of 90°, so that quantization error caused by limitation of the dynamic range upon quantization can be avoided. When the CLD quantization table is designed on the basis of the advantages of the quantization of the VSLI, sound deterioration resulting from the quantization can be minimized.
- FIGS. 3A and 3B are views for explaining a concept of VSLI serving as a reference of CLD quantization in accordance with the present invention. FIG. 3A illustrates a stereo speaker environment in which two speakers are located at an angle of 60°, and FIG. 3B is a view in which a stereo audio signal in the stereo speaker environment of FIG. 3A is represented by the power of a down-mixed signal and by VSLI. As illustrated, the stereo or multi-channel audio signal can be represented by the magnitude vector of a down-mix audio signal and the VSLI, which can be obtained by analyzing the power of each channel of the multi-channel audio signals. The multi-channel audio signal represented in this way can be restored by projecting the magnitude vector according to the location vector of a sound source.
- The CLD values obtained by Equations 7 and 8 can be directly used as spatial parameters of a general SAC system.
- As previously described, because the CLD has a dynamic range between -∞ and +∞, problems occur in performing quantization using a finite number of bits. The main problem is quantization error caused by limitation of the dynamic range. Because the full dynamic range of the CLD cannot be expressed with a finite number of bits, the dynamic range of the CLD is limited to a predetermined level or less. As a result, quantization error is introduced, and the spectrum information is distorted. If 5 bits are used for the CLD quantization, the dynamic range of the CLD is limited to between -25 dB and +25 dB.
- In contrast, because the VSLI has a finite dynamic range of 90°, such quantization error caused by limitation of the dynamic range upon quantization can be avoided.
- In one embodiment, upon quantization of the VSLI, if 5 bits are used for the CLD quantization and a linear quantizer is applied, the number of quantization levels is 31 and a quantization interval is 3°. The validity of the VSLI quantization approach can be verified from the fact that people fail to recognize a difference of 3° or less when recognizing the spatial image of an audio signal.
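- As an illustration of the 5-bit linear quantizer described above, the following Python sketch maps an angle in the 0° to 90° range to one of the 31 indexes. The function names are hypothetical, and the CLD-to-angle relation CLD = 20·log10(tan θ) is an assumption inferred from the values listed in Table 1, not an equation quoted from this section.

```python
import math

STEP_DEG = 3.0     # quantization interval stated in the text
MAX_INDEX = 15     # 31 levels: indexes -15 .. +15

def cld_to_vsli_deg(cld_db: float) -> float:
    """Assumed relation inferred from Table 1: CLD = 20*log10(tan(angle))."""
    return math.degrees(math.atan(10.0 ** (cld_db / 20.0)))

def vsli_index(angle_deg: float) -> int:
    """Uniform quantizer centred on 45 degrees; result is clamped to -15..+15."""
    idx = math.floor((angle_deg - 45.0) / STEP_DEG + 0.5)
    return max(-MAX_INDEX, min(MAX_INDEX, idx))

def vsli_value(index: int) -> float:
    """Reconstructed angle in degrees for a quantization index."""
    return 45.0 + STEP_DEG * index
```

Because the angle is bounded, a CLD of very large magnitude still lands on index -15 or +15, which is the property the text relies on when it says the dynamic-range limitation is avoided.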
- When the advantages of this VSLI quantization are applied to the CLD quantization of the stereo coding method, the CLD quantization table used in the existing SAC system can be replaced by a VSLI-based quantization table.
- In one embodiment, quantization values of the VSLI on which 5-bit linear quantization is performed at a quantization interval of 3° and CLD conversion levels corresponding to the VSLI quantization values are given in Table 1.
Table 1. VSLI Quantization values and CLD values
Index | VSLI Quantization value (°) | CLD value |
---|---|---|
-15 | 0 | -324.2604 |
-14 | 3 | -25.6121 |
-13 | 6 | -19.5676 |
-12 | 9 | -16.0057 |
-11 | 12 | -13.4505 |
-10 | 15 | -11.4390 |
-9 | 18 | -9.7645 |
-8 | 21 | -8.3165 |
-7 | 24 | -7.0283 |
-6 | 27 | -5.8567 |
-5 | 30 | -4.7712 |
-4 | 33 | -3.7497 |
-3 | 36 | -2.7748 |
-2 | 39 | -1.8326 |
-1 | 42 | -0.9113 |
0 | 45 | 0.0000 |
1 | 48 | 0.9113 |
2 | 51 | 1.8326 |
3 | 54 | 2.7748 |
4 | 57 | 3.7497 |
5 | 60 | 4.7712 |
6 | 63 | 5.8567 |
7 | 66 | 7.0283 |
8 | 69 | 8.3165 |
9 | 72 | 9.7645 |
10 | 75 | 11.4390 |
11 | 78 | 13.4505 |
12 | 81 | 16.0057 |
13 | 84 | 19.5676 |
14 | 87 | 25.6121 |
15 | 90 | 324.2604 |
- Further, a VSLI decision level for the VSLI quantization is decided by a middle value between neighboring quantization values. The middle value is converted into the CLD and used as a decision level of the CLD quantization. The VSLI-based CLD quantization decision level has a value other than the middle value between neighboring quantization values as seen in Table 2, unlike ordinary CLD quantization in which the decision level has the middle value between neighboring quantization values.
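- The conversion from VSLI quantization values to CLD values, and the way the decision levels are obtained, can be sketched as follows. The relation CLD = 20·log10(tan θ) is an assumption inferred from the entries of Table 1 (it reproduces, for example, 4.7712 at 60°); the function and variable names are hypothetical. The sketch also compares each VSLI-derived decision level with the plain midpoint of the two neighboring CLD values.

```python
import math

def vsli_to_cld(angle_deg: float) -> float:
    """Assumed conversion, inferred from Table 1: 20*log10(tan(angle))."""
    return 20.0 * math.log10(math.tan(math.radians(angle_deg)))

# Interior quantization angles 3, 6, ..., 87 degrees (indexes -14 .. +14).
angles = [45 + 3 * i for i in range(-14, 15)]
cld_values = [vsli_to_cld(a) for a in angles]

# VSLI decision levels: midpoints between neighbouring quantization angles.
decision_angles = [(a + b) / 2 for a, b in zip(angles, angles[1:])]
cld_decisions = [vsli_to_cld(a) for a in decision_angles]

# Midpoints taken directly in the CLD domain, for comparison.
cld_midpoints = [(x + y) / 2 for x, y in zip(cld_values, cld_values[1:])]

for a, d, m in zip(decision_angles, cld_decisions, cld_midpoints):
    print(f"{a:5.1f} deg  decision {d:9.4f}  CLD midpoint {m:9.4f}")
```

The two columns differ most near the ends of the range, where the tangent grows quickly, which is why the decision levels cannot simply be taken halfway between neighboring CLD values.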
- FIG. 4 is a graph showing CLD quantization values converted from VSLI quantization values in accordance with the present invention. As illustrated, when quantizing the VSLI at a uniform angle on the basis of 45°, the decision level between the quantized angles is the middle value between two angles. However, when this VSLI decision level is converted into a CLD value, it can be found that the VSLI decision level has a value other than the middle value between two neighboring CLD values. Table 2 below lists the decision levels of the VSLI quantization and the corresponding CLD values.
- Tables 3 through 7 below are VSLI-based CLD quantization tables created by using Tables 1 and 2, wherein Table 3 gives the CLD quantization values down to the fourth decimal place, Table 4 down to the third decimal place, Table 5 down to the second decimal place, Table 6 down to the first decimal place, and Table 7 to the integer.
- The CLD quantization value using the VSLI can be calculated by taking a base-10 logarithm or natural logarithm. When taking the natural logarithm, e rather than 10 is used as the base when spectrum information is restored by using the CLD value.
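- The role of the logarithm base at restoration can be illustrated with a short Python sketch. It assumes that a CLD value is 20 times the logarithm of a level ratio, which is inferred from the table values rather than quoted from this section; the function name is hypothetical. The example uses the two entries for index 5 in Table 3.

```python
import math

def ratio_from_cld(cld: float, natural: bool = False) -> float:
    """Undo CLD = 20*log(ratio): base e for natural-log tables, else base 10."""
    return math.exp(cld / 20.0) if natural else 10.0 ** (cld / 20.0)

# Index 5 in Table 3: 4.7712 (base-10) and 10.9861 (natural logarithm).
ratio_base10 = ratio_from_cld(4.7712)
ratio_natural = ratio_from_cld(10.9861, natural=True)
print(ratio_base10, ratio_natural)
```

Both calls recover approximately the same ratio, so the two columns of each table carry the same information as long as the decoder uses the matching base.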
Table 3. VSLI-based CLD Quantization Table (Fourth Decimal Place)
Index | CLD (base-10 logarithm) | CLD (natural logarithm) |
---|---|---|
-15 | -65.1400 | -150.0000 |
-14 | -25.6121 | -58.9740 |
-13 | -19.5676 | -45.0561 |
-12 | -16.0057 | -36.8546 |
-11 | -13.4505 | -30.9709 |
-10 | -11.4390 | -26.3392 |
-9 | -9.7645 | -22.4835 |
-8 | -8.3165 | -19.1493 |
-7 | -7.0283 | -16.1833 |
-6 | -5.8567 | -13.4855 |
-5 | -4.7712 | -10.9861 |
-4 | -3.7497 | -8.6339 |
-3 | -2.7748 | -6.3892 |
-2 | -1.8326 | -4.2198 |
-1 | -0.9113 | -2.0982 |
0 | 0.0000 | 0.0000 |
1 | 0.9113 | 2.0982 |
2 | 1.8326 | 4.2198 |
3 | 2.7748 | 6.3892 |
4 | 3.7497 | 8.6339 |
5 | 4.7712 | 10.9861 |
6 | 5.8567 | 13.4855 |
7 | 7.0283 | 16.1833 |
8 | 8.3165 | 19.1493 |
9 | 9.7645 | 22.4835 |
10 | 11.4390 | 26.3392 |
11 | 13.4505 | 30.9709 |
12 | 16.0057 | 36.8546 |
13 | 19.5676 | 45.0561 |
14 | 25.6121 | 58.9740 |
15 | 65.1400 | 150.0000 |
Table 4. VSLI-based CLD Quantization Table (Third Decimal Place)
Index | CLD (base-10 logarithm) | CLD (natural logarithm) |
---|---|---|
-15 | -65.140 | -150.000 |
-14 | -25.612 | -58.974 |
-13 | -19.567 | -45.056 |
-12 | -16.005 | -36.854 |
-11 | -13.450 | -30.970 |
-10 | -11.439 | -26.339 |
-9 | -9.764 | -22.483 |
-8 | -8.316 | -19.149 |
-7 | -7.028 | -16.183 |
-6 | -5.856 | -13.485 |
-5 | -4.771 | -10.986 |
-4 | -3.749 | -8.633 |
-3 | -2.774 | -6.389 |
-2 | -1.832 | -4.219 |
-1 | -0.911 | -2.098 |
0 | 0.000 | 0.000 |
1 | 0.911 | 2.098 |
2 | 1.832 | 4.219 |
3 | 2.774 | 6.389 |
4 | 3.749 | 8.633 |
5 | 4.771 | 10.986 |
6 | 5.856 | 13.485 |
7 | 7.028 | 16.183 |
8 | 8.316 | 19.149 |
9 | 9.764 | 22.483 |
10 | 11.439 | 26.339 |
11 | 13.450 | 30.970 |
12 | 16.005 | 36.854 |
13 | 19.567 | 45.056 |
14 | 25.612 | 58.974 |
15 | 65.140 | 150.000 |
Table 5. VSLI-based CLD Quantization Table (Second Decimal Place)
Index | CLD (base-10 logarithm) | CLD (natural logarithm) |
---|---|---|
-15 | -65.14 | -150.00 |
-14 | -25.61 | -58.97 |
-13 | -19.56 | -45.05 |
-12 | -16.00 | -36.85 |
-11 | -13.45 | -30.97 |
-10 | -11.43 | -26.33 |
-9 | -9.76 | -22.48 |
-8 | -8.31 | -19.14 |
-7 | -7.02 | -16.18 |
-6 | -5.85 | -13.48 |
-5 | -4.77 | -10.98 |
-4 | -3.74 | -8.63 |
-3 | -2.77 | -6.38 |
-2 | -1.83 | -4.21 |
-1 | -0.91 | -2.09 |
0 | 0.00 | 0.00 |
1 | 0.91 | 2.09 |
2 | 1.83 | 4.21 |
3 | 2.77 | 6.38 |
4 | 3.74 | 8.63 |
5 | 4.77 | 10.98 |
6 | 5.85 | 13.48 |
7 | 7.02 | 16.18 |
8 | 8.31 | 19.14 |
9 | 9.76 | 22.48 |
10 | 11.43 | 26.33 |
11 | 13.45 | 30.97 |
12 | 16.00 | 36.85 |
13 | 19.56 | 45.05 |
14 | 25.61 | 58.97 |
15 | 65.14 | 150.00 |
Table 6. VSLI-based CLD Quantization Table (First Decimal Place)
Index | CLD (base-10 logarithm) | CLD (natural logarithm) |
---|---|---|
-15 | -65.1 | -150.0 |
-14 | -25.6 | -58.9 |
-13 | -19.5 | -45.0 |
-12 | -16.0 | -36.8 |
-11 | -13.4 | -30.9 |
-10 | -11.4 | -26.3 |
-9 | -9.7 | -22.4 |
-8 | -8.3 | -19.1 |
-7 | -7.0 | -16.1 |
-6 | -5.8 | -13.4 |
-5 | -4.7 | -10.9 |
-4 | -3.7 | -8.6 |
-3 | -2.7 | -6.3 |
-2 | -1.8 | -4.2 |
-1 | -0.9 | -2.0 |
0 | 0.0 | 0.0 |
1 | 0.9 | 2.0 |
2 | 1.8 | 4.2 |
3 | 2.7 | 6.3 |
4 | 3.7 | 8.6 |
5 | 4.7 | 10.9 |
6 | 5.8 | 13.4 |
7 | 7.0 | 16.1 |
8 | 8.3 | 19.1 |
9 | 9.7 | 22.4 |
10 | 11.4 | 26.3 |
11 | 13.4 | 30.9 |
12 | 16.0 | 36.8 |
13 | 19.5 | 45.0 |
14 | 25.6 | 58.9 |
15 | 65.1 | 150.0 |
Table 7. VSLI-based CLD Quantization Table (Integer)
Index | CLD (base-10 logarithm) | CLD (natural logarithm) |
---|---|---|
-15 | -65 | -150 |
-14 | -25 | -58 |
-13 | -19 | -45 |
-12 | -16 | -36 |
-11 | -13 | -30 |
-10 | -11 | -26 |
-9 | -9 | -22 |
-8 | -8 | -19 |
-7 | -7 | -16 |
-6 | -5 | -13 |
-5 | -4 | -10 |
-4 | -3 | -8 |
-3 | -2 | -6 |
-2 | -1 | -4 |
-1 | -0 | -2 |
0 | 0 | 0 |
1 | 0 | 2 |
2 | 1 | 4 |
3 | 2 | 6 |
4 | 3 | 8 |
5 | 4 | 10 |
6 | 5 | 13 |
7 | 7 | 16 |
8 | 8 | 19 |
9 | 9 | 22 |
10 | 11 | 26 |
11 | 13 | 30 |
12 | 16 | 36 |
13 | 19 | 45 |
14 | 25 | 58 |
15 | 65 | 150 |
- As shown in Tables 7 and 12, when the CLD quantization values and the CLD quantization decision levels are expressed as integers using the base-10 logarithm, some of the CLD quantization values coincide with some of the CLD quantization decision levels. Hence, the CLD quantization values and decision levels using the natural logarithm are preferably used for actual quantization. In other words, when the VSLI-based CLD quantization table and the VSLI-based CLD quantization decision levels are both expressed as integers, the CLD quantization values are derived by taking the natural logarithm rather than the base-10 logarithm of the VSLI.
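- The coincidence described above can be checked numerically with the following Python sketch. Since the decision-level tables are not reproduced in this text, the sketch recomputes the decision levels from the midpoint rule, assumes the relation CLD = 20·log(tan θ) inferred from Table 1, and assumes that integer values are obtained by truncation toward zero, as the entries of Table 7 suggest. Only the indexes 0 through 14 are considered; all names are hypothetical.

```python
import math

def cld(angle_deg: float, natural: bool) -> float:
    """Assumed conversion: 20*ln(tan) or 20*log10(tan) of the VSLI angle."""
    t = math.tan(math.radians(angle_deg))
    return 20.0 * (math.log(t) if natural else math.log10(t))

quant_angles = range(45, 90, 3)                     # 45, 48, ..., 87 degrees
decision_angles = [a + 1.5 for a in quant_angles]   # 46.5, 49.5, ..., 88.5

def integer_collisions(natural: bool) -> set:
    """Integers that appear both as a quantization value and a decision level."""
    quant = {int(cld(a, natural)) for a in quant_angles}
    decision = {int(cld(a, natural)) for a in decision_angles}
    return quant & decision

print("base-10 collisions:", sorted(integer_collisions(natural=False)))
print("natural collisions:", sorted(integer_collisions(natural=True)))
```

Under these assumptions the base-10 integers overlap for several of the small values, while the natural-logarithm integers stay distinct, which is consistent with the preference stated in the text.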
- The VSLI-based CLD quantization table created in this way is employed in the spatial parameter quantizer 230 and the spatial parameter dequantizer 270 of the SAC system illustrated in FIG. 2, so that sound deterioration resulting from the CLD quantization error can be minimized.
- Further, the present invention proposes a Huffman codebook capable of optimizing Huffman encoding of the CLD quantization indexes derived on the basis of the above-described VSLI-based CLD quantization table.
- In the SAC system, the multi-channel audio signal is processed after being split into sub-bands of a frequency domain by means of a filter bank. When the multi-channel audio signal is split into 20 sub-bands, a differential coding method is applied to the quantization index of each sub-band, thereby classifying the quantization indexes into the quantization index of the first sub-band and the other 19 differential indexes between neighboring sub-bands. Alternatively, they may be divided into differential indexes between neighboring frames. A probability distribution is calculated with respect to each of the three types of indexes classified in this way, and then the Huffman coding method is applied to each of the three types of indexes. Thereby, the Huffman codebooks described in Tables 13 and 14 below can be obtained. Table 13 is the Huffman codebook for the index of the first sub-band, and Table 14 is the Huffman codebook for the differential indexes between neighboring sub-bands and between neighboring frames.
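- The differential classification described above can be sketched in Python as follows. The function names are hypothetical, and the differences are kept as signed integers because this section does not state how they are mapped onto the codebook indexes 0 to 30.

```python
def split_indexes(band_indexes):
    """Return (index of first sub-band, differences between neighbouring sub-bands)."""
    first = band_indexes[0]
    diffs = [b - a for a, b in zip(band_indexes, band_indexes[1:])]
    return first, diffs

def merge_indexes(first, diffs):
    """Inverse of split_indexes: rebuild the per-band quantization indexes."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out

# Example with 20 sub-bands of made-up quantization indexes in -15..+15.
bands = [0, 1, 1, 2, 2, 2, 3, 3, 2, 1, 0, 0, -1, -1, -2, -2, -1, 0, 0, 1]
first, diffs = split_indexes(bands)
print(first, diffs)
```

With 20 sub-bands the split yields one first-band index and 19 differences, and neighboring bands that carry similar levels produce many small differences, which is what makes short codewords for small values worthwhile.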
Table 13
Index | Number of Bits | Codeword (hexadecimal) |
---|---|---|
0 | 5 | 0x17 |
1 | 8 | 0x64 |
2 | 8 | 0x65 |
3 | 8 | 0xf0 |
4 | 8 | 0xf1 |
5 | 7 | 0x33 |
6 | 7 | 0x79 |
7 | 6 | 0x18 |
8 | 6 | 0x22 |
9 | 6 | 0x23 |
10 | 6 | 0x3d |
11 | 5 | 0x0b |
12 | 5 | 0x12 |
13 | 5 | 0x1a |
14 | 4 | 0x04 |
15 | 5 | 0x1f |
16 | 5 | 0x1d |
17 | 5 | 0x19 |
18 | 5 | 0x1c |
19 | 5 | 0x16 |
20 | 5 | 0x18 |
21 | 5 | 0x14 |
22 | 5 | 0x13 |
23 | 5 | 0x15 |
24 | 5 | 0x1b |
25 | 5 | 0x10 |
26 | 5 | 0x0e |
27 | 5 | 0x0f |
28 | 5 | 0x0d |
29 | 5 | 0x0a |
30 | 2 | 0x00 |
Table 14
Index | Between neighboring bands: number of bits | Between neighboring bands: codeword | Between neighboring frames: number of bits | Between neighboring frames: codeword |
---|---|---|---|---|
0 | 2 | 0x00003 | 1 | 0x0000 |
1 | 2 | 0x00001 | 2 | 0x0002 |
2 | 3 | 0x00005 | 4 | 0x000f |
3 | 3 | 0x00001 | 4 | 0x000d |
4 | 4 | 0x00009 | 5 | 0x001d |
5 | 4 | 0x00001 | 5 | 0x0019 |
6 | 5 | 0x00011 | 6 | 0x0039 |
7 | 5 | 0x00001 | 6 | 0x0031 |
8 | 6 | 0x00021 | 7 | 0x0071 |
9 | 6 | 0x00001 | 7 | 0x0061 |
10 | 7 | 0x00041 | 8 | 0x00e0 |
11 | 7 | 0x00001 | 8 | 0x00c0 |
12 | 8 | 0x00080 | 9 | 0x0183 |
13 | 8 | 0x00000 | 10 | 0x0386 |
14 | 9 | 0x00102 | 10 | 0x0305 |
15 | 9 | 0x00002 | 11 | 0x070b |
16 | 10 | 0x00206 | 11 | 0x0708 |
17 | 10 | 0x00006 | 11 | 0x0609 |
18 | 11 | 0x0040e | 12 | 0x0e1f |
19 | 11 | 0x0000e | 12 | 0x0e15 |
20 | 12 | 0x0081f | 12 | 0x0c10 |
21 | 12 | 0x0001f | 12 | 0x0e14 |
22 | 13 | 0x0103c | 13 | 0x1c3a |
23 | 13 | 0x0003d | 13 | 0x1c3d |
24 | 14 | 0x0207a | 13 | 0x1c38 |
25 | 14 | 0x00079 | 13 | 0x1c39 |
26 | 14 | 0x00078 | 13 | 0x1823 |
27 | 15 | 0x040f6 | 13 | 0x1822 |
28 | 16 | 0x081ef | 13 | 0x1c3c |
29 | 17 | 0x103dd | 13 | 0x1c3b |
30 | 17 | 0x103dc | 11 | 0x0709 |
- In this manner, the Huffman codebooks proposed in the present invention are employed in the spatial parameter encoder 240 and the spatial parameter decoder 260 of the SAC system illustrated in FIG. 2, so that the bit rate required to transmit the CLD quantization indexes can be reduced.
- Alternatively, when the number of bits used for Huffman encoding of the 20 sub-bands exceeds 100, 5-bit Pulse Code Modulation (PCM) coding can be performed on each sub-band.
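- As a rough illustration of how the codebook of Table 13 and the PCM fallback could be handled in software, the following Python sketch stores the table, checks that the codewords form a prefix code, and decodes one codeword from a bit string. The helper names are hypothetical, writing each codeword most significant bit first is an assumption, and the sketch works on the table indexes 0 to 30 directly because their mapping to the quantization indexes is not stated in this section.

```python
from fractions import Fraction

# index: (number of bits, codeword), copied from Table 13.
TABLE_13 = {
    0: (5, 0x17), 1: (8, 0x64), 2: (8, 0x65), 3: (8, 0xF0), 4: (8, 0xF1),
    5: (7, 0x33), 6: (7, 0x79), 7: (6, 0x18), 8: (6, 0x22), 9: (6, 0x23),
    10: (6, 0x3D), 11: (5, 0x0B), 12: (5, 0x12), 13: (5, 0x1A), 14: (4, 0x04),
    15: (5, 0x1F), 16: (5, 0x1D), 17: (5, 0x19), 18: (5, 0x1C), 19: (5, 0x16),
    20: (5, 0x18), 21: (5, 0x14), 22: (5, 0x13), 23: (5, 0x15), 24: (5, 0x1B),
    25: (5, 0x10), 26: (5, 0x0E), 27: (5, 0x0F), 28: (5, 0x0D), 29: (5, 0x0A),
    30: (2, 0x00),
}

# Codewords as '0'/'1' strings, most significant bit first (assumed bit order).
CODEWORDS = {i: format(code, f"0{bits}b") for i, (bits, code) in TABLE_13.items()}

def is_prefix_free(words):
    """In sorted order, a word that prefixes another is followed directly by one."""
    ordered = sorted(words)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

def kraft_sum(table):
    """Sum of 2**(-length); equals 1 for a complete prefix code."""
    return sum(Fraction(1, 2 ** bits) for bits, _ in table.values())

def decode_first(bitstream):
    """Decode one Table 13 codeword from the front of a '0'/'1' string."""
    for index, word in CODEWORDS.items():
        if bitstream.startswith(word):
            return index, bitstream[len(word):]
    raise ValueError("no codeword matches")

def needs_pcm_fallback(total_huffman_bits, limit=100):
    """True when Huffman coding of the 20 sub-bands costs more than the limit."""
    return total_huffman_bits > limit
```

The limit of 100 bits matches the cost of coding 20 sub-bands with 5 bits each, so the fallback caps the size of the CLD indexes at the PCM cost.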
- The present invention can be provided as a computer program stored on at least one computer-readable medium in the form of at least one product such as a floppy disk, hard disk, CD ROM, flash memory card, PROM, RAM, ROM, or magnetic tape. In general, the computer program can be written in any programming language such as C, C++, or JAVA.
- While the invention has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention as defined by the appended claims.
Claims (26)
- A Channel Level Difference (CLD) quantization method for quantizing a CLD parameter used as a spatial parameter when Spatial Audio Coding (SAC)-based encoding of an N-channel audio signal (N>1) is performed, the CLD quantization method comprising the steps of: extracting CLD for each sub-band from the N-channel audio signal; and quantizing the CLDs by reference to a Virtual Source Location Information (VSLI)-based CLD quantization table designed using CLD quantization values derived from VSLI quantization values of the N-channel audio signal, wherein VSLI expresses an azimuth angle of a spatial audio image of the audio signal, and AL and AR are, respectively, the angles of left and right speakers in a speaker environment.
- The CLD quantization method according to claim 1, wherein the VSLI quantization value is quantized at a predetermined quantization interval within a range between 0° and 90°.
- The CLD quantization method according to claim 2, wherein the predetermined quantization interval is 3°.
- The CLD quantization method according to claim 1, wherein a decision level for the CLD quantization is derived from a VSLI decision level for VSLI quantization.
- The CLD quantization method according to claim 1, wherein the VSLI-based CLD quantization table is as follows:
Index | CLD (base-10 logarithm) | CLD (natural logarithm) |
---|---|---|
-15 | -65.1 | -150.0 |
-14 | -25.6 | -58.9 |
-13 | -19.5 | -45.0 |
-12 | -16.0 | -36.8 |
-11 | -13.4 | -30.9 |
-10 | -11.4 | -26.3 |
-9 | -9.7 | -22.4 |
-8 | -8.3 | -19.1 |
-7 | -7.0 | -16.1 |
-6 | -5.8 | -13.4 |
-5 | -4.7 | -10.9 |
-4 | -3.7 | -8.6 |
-3 | -2.7 | -6.3 |
-2 | -1.8 | -4.2 |
-1 | -0.9 | -2.0 |
0 | 0.0 | 0.0 |
1 | 0.9 | 2.0 |
2 | 1.8 | 4.2 |
3 | 2.7 | 6.3 |
4 | 3.7 | 8.6 |
5 | 4.7 | 10.9 |
6 | 5.8 | 13.4 |
7 | 7.0 | 16.1 |
8 | 8.3 | 19.1 |
9 | 9.7 | 22.4 |
10 | 11.4 | 26.3 |
11 | 13.4 | 30.9 |
12 | 16.0 | 36.8 |
13 | 19.5 | 45.0 |
14 | 25.6 | 58.9 |
15 | 65.1 | 150.0 |
- The CLD quantization method according to claim 1, further comprising the step of performing Huffman encoding on quantization indexes of the CLD.
- The CLD quantization method according to claim 9, wherein the Huffman encoding is performed on a quantization index of a first sub-band by reference to a Huffman codebook as follows:
Index | Number of Bits | Codeword (hexadecimal) |
---|---|---|
0 | 5 | 0x17 |
1 | 8 | 0x64 |
2 | 8 | 0x65 |
3 | 8 | 0xf0 |
4 | 8 | 0xf1 |
5 | 7 | 0x33 |
6 | 7 | 0x79 |
7 | 6 | 0x18 |
8 | 6 | 0x22 |
9 | 6 | 0x23 |
10 | 6 | 0x3d |
11 | 5 | 0x0b |
12 | 5 | 0x12 |
13 | 5 | 0x1a |
14 | 4 | 0x04 |
15 | 5 | 0x1f |
16 | 5 | 0x1d |
17 | 5 | 0x19 |
18 | 5 | 0x1c |
19 | 5 | 0x16 |
20 | 5 | 0x18 |
21 | 5 | 0x14 |
22 | 5 | 0x13 |
23 | 5 | 0x15 |
24 | 5 | 0x1b |
25 | 5 | 0x10 |
26 | 5 | 0x0e |
27 | 5 | 0x0f |
28 | 5 | 0x0d |
29 | 5 | 0x0a |
30 | 2 | 0x00 |
- The CLD quantization method according to claim 10, wherein the Huffman encoding is performed on quantization indexes of the remaining sub-bands other than the first sub-band by reference to a Huffman codebook as follows:
Index | Number of Bits | Codeword (hexadecimal) |
---|---|---|
0 | 2 | 0x00003 |
1 | 2 | 0x00001 |
2 | 3 | 0x00005 |
3 | 3 | 0x00001 |
4 | 4 | 0x00009 |
5 | 4 | 0x00001 |
6 | 5 | 0x00011 |
7 | 5 | 0x00001 |
8 | 6 | 0x00021 |
9 | 6 | 0x00001 |
10 | 7 | 0x00041 |
11 | 7 | 0x00001 |
12 | 8 | 0x00080 |
13 | 8 | 0x00000 |
14 | 9 | 0x00102 |
15 | 9 | 0x00002 |
16 | 10 | 0x00206 |
17 | 10 | 0x00006 |
18 | 11 | 0x0040e |
19 | 11 | 0x0000e |
20 | 12 | 0x0081f |
21 | 12 | 0x0001f |
22 | 13 | 0x0103c |
23 | 13 | 0x0003d |
24 | 14 | 0x0207a |
25 | 14 | 0x00079 |
26 | 14 | 0x00078 |
27 | 15 | 0x040f6 |
28 | 16 | 0x081ef |
29 | 17 | 0x103dd |
30 | 17 | 0x103dc |
- A computer-readable recording medium on which is recorded a computer program for performing the CLD quantization method according to any one of claims 1 through 11.
- A method for encoding an N-channel audio signal (N>1) based on Spatial Audio Coding (SAC), the method comprising the steps of: down-mixing and encoding the N-channel audio signal; extracting spatial parameters including Channel Level Difference (CLD), Inter-channel Correlation/Coherence (ICC), and Channel Prediction Coefficient (CPC), for each sub-band, from the N-channel audio signal; and quantizing the extracted spatial parameters, using the method according to any one of claims 1 to 11.
- An apparatus for encoding an N-channel audio signal (N>1) based on Spatial Audio Coding (SAC), the apparatus comprising: an SAC encoding means for down-mixing the N-channel audio signal to generate a down-mix signal, and extracting spatial parameters including Channel Level Difference (CLD), Inter-channel Correlation/Coherence (ICC), and Channel Prediction Coefficient (CPC), for each sub-band, from the N-channel audio signal; an audio encoding means for generating a compressed audio bitstream from the down-mix signal generated by the SAC encoding means; a spatial parameter quantizing means for quantizing the spatial parameters extracted by the SAC encoding means; and a spatial parameter encoding means for encoding the quantized spatial parameters, wherein the spatial parameter quantizing means quantizes the CLD by reference to a Virtual Source Location Information (VSLI)-based CLD quantization table designed using CLD quantization values derived from VSLI quantization values of the N-channel audio signal, wherein VSLI expresses an azimuth angle of a spatial audio image of the audio signal, and AL and AR are, respectively, the angles of left and right speakers in a speaker environment.
- The apparatus according to claim 14, wherein the VSLI-based CLD quantization table is as follows:
Index | CLD (base-10 logarithm) | CLD (natural logarithm) |
---|---|---|
-15 | -65.1 | -150.0 |
-14 | -25.6 | -58.9 |
-13 | -19.5 | -45.0 |
-12 | -16.0 | -36.8 |
-11 | -13.4 | -30.9 |
-10 | -11.4 | -26.3 |
-9 | -9.7 | -22.4 |
-8 | -8.3 | -19.1 |
-7 | -7.0 | -16.1 |
-6 | -5.8 | -13.4 |
-5 | -4.7 | -10.9 |
-4 | -3.7 | -8.6 |
-3 | -2.7 | -6.3 |
-2 | -1.8 | -4.2 |
-1 | -0.9 | -2.0 |
0 | 0.0 | 0.0 |
1 | 0.9 | 2.0 |
2 | 1.8 | 4.2 |
3 | 2.7 | 6.3 |
4 | 3.7 | 8.6 |
5 | 4.7 | 10.9 |
6 | 5.8 | 13.4 |
7 | 7.0 | 16.1 |
8 | 8.3 | 19.1 |
9 | 9.7 | 22.4 |
10 | 11.4 | 26.3 |
11 | 13.4 | 30.9 |
12 | 16.0 | 36.8 |
13 | 19.5 | 45.0 |
14 | 25.6 | 58.9 |
15 | 65.1 | 150.0 |
- A method for dequantizing an encoded Channel Level Difference (CLD) quantization value when an encoded N-channel audio bitstream (N>1) is decoded based on Spatial Audio Coding (SAC), the method comprising the steps of: performing Huffman decoding on the encoded CLD quantization value; and dequantizing the decoded CLD quantization value by using a Virtual Source Location Information (VSLI)-based CLD quantization table designed using CLD quantization values derived from VSLI quantization values of the N-channel audio signal, wherein VSLI expresses an azimuth angle of a spatial audio image of the audio signal, and AL and AR are, respectively, the angles of left and right speakers in a speaker environment.
- The method according to claim 17, wherein the VSLI-based CLD quantization table is as follows:
Index | CLD (base-10 logarithm) | CLD (natural logarithm) |
---|---|---|
-15 | -65.1 | -150.0 |
-14 | -25.6 | -58.9 |
-13 | -19.5 | -45.0 |
-12 | -16.0 | -36.8 |
-11 | -13.4 | -30.9 |
-10 | -11.4 | -26.3 |
-9 | -9.7 | -22.4 |
-8 | -8.3 | -19.1 |
-7 | -7.0 | -16.1 |
-6 | -5.8 | -13.4 |
-5 | -4.7 | -10.9 |
-4 | -3.7 | -8.6 |
-3 | -2.7 | -6.3 |
-2 | -1.8 | -4.2 |
-1 | -0.9 | -2.0 |
0 | 0.0 | 0.0 |
1 | 0.9 | 2.0 |
2 | 1.8 | 4.2 |
3 | 2.7 | 6.3 |
4 | 3.7 | 8.6 |
5 | 4.7 | 10.9 |
6 | 5.8 | 13.4 |
7 | 7.0 | 16.1 |
8 | 8.3 | 19.1 |
9 | 9.7 | 22.4 |
10 | 11.4 | 26.3 |
11 | 13.4 | 30.9 |
12 | 16.0 | 36.8 |
13 | 19.5 | 45.0 |
14 | 25.6 | 58.9 |
15 | 65.1 | 150.0 |
- The method according to claim 17, wherein, in the step of performing Huffman decoding on the encoded CLD quantization value, the CLD quantization value of a first sub-band is decoded by reference to a Huffman codebook as follows:
Index | Number of Bits | Codeword (hexadecimal) |
---|---|---|
0 | 5 | 0x17 |
1 | 8 | 0x64 |
2 | 8 | 0x65 |
3 | 8 | 0xf0 |
4 | 8 | 0xf1 |
5 | 7 | 0x33 |
6 | 7 | 0x79 |
7 | 6 | 0x18 |
8 | 6 | 0x22 |
9 | 6 | 0x23 |
10 | 6 | 0x3d |
11 | 5 | 0x0b |
12 | 5 | 0x12 |
13 | 5 | 0x1a |
14 | 4 | 0x04 |
15 | 5 | 0x1f |
16 | 5 | 0x1d |
17 | 5 | 0x19 |
18 | 5 | 0x1c |
19 | 5 | 0x16 |
20 | 5 | 0x18 |
21 | 5 | 0x14 |
22 | 5 | 0x13 |
23 | 5 | 0x15 |
24 | 5 | 0x1b |
25 | 5 | 0x10 |
26 | 5 | 0x0e |
27 | 5 | 0x0f |
28 | 5 | 0x0d |
29 | 5 | 0x0a |
30 | 2 | 0x00 |
- The method according to claim 20, wherein the Huffman encoding is performed on quantization indexes of the remaining sub-bands other than the first sub-band by reference to a Huffman codebook as follows:
Index | Number of Bits | Codeword (hexadecimal) |
---|---|---|
0 | 2 | 0x00003 |
1 | 2 | 0x00001 |
2 | 3 | 0x00005 |
3 | 3 | 0x00001 |
4 | 4 | 0x00009 |
5 | 4 | 0x00001 |
6 | 5 | 0x00011 |
7 | 5 | 0x00001 |
8 | 6 | 0x00021 |
9 | 6 | 0x00001 |
10 | 7 | 0x00041 |
11 | 7 | 0x00001 |
12 | 8 | 0x00080 |
13 | 8 | 0x00000 |
14 | 9 | 0x00102 |
15 | 9 | 0x00002 |
16 | 10 | 0x00206 |
17 | 10 | 0x00006 |
18 | 11 | 0x0040e |
19 | 11 | 0x0000e |
20 | 12 | 0x0081f |
21 | 12 | 0x0001f |
22 | 13 | 0x0103c |
23 | 13 | 0x0003d |
24 | 14 | 0x0207a |
25 | 14 | 0x00079 |
26 | 14 | 0x00078 |
27 | 15 | 0x040f6 |
28 | 16 | 0x081ef |
29 | 17 | 0x103dd |
30 | 17 | 0x103dc |
- A computer-readable recording medium on which is recorded a computer program for performing the CLD dequantization method according to any one of claims 17 through 21.
- A method for decoding an encoded N-channel audio bitstream (N>1) based on Spatial Audio Coding (SAC), the method comprising the steps of: decoding the encoded N-channel audio bitstream; dequantizing quantization values of at least one spatial parameter received together with the encoded N-channel audio bitstream using the method according to any one of claims 17 to 21; and synthesizing the decoded N-channel audio bitstream based on the dequantized spatial parameter to restore an N-channel audio signal.
- An apparatus for decoding an encoded N-channel audio bitstream (N>1) based on Spatial Audio Coding (SAC), the apparatus comprising: means for decoding the encoded N-channel audio bitstream; means for decoding quantization values of at least one spatial parameter received together with the encoded N-channel audio bitstream; means for dequantizing the quantization values of the spatial parameter; and synthesizing the decoded N-channel audio bitstream based on the dequantized spatial parameter to restore an N-channel audio signal, wherein the means for dequantizing the quantization value of the spatial parameter dequantizes a CLD included in the spatial parameter by reference to a Virtual Source Location Information (VSLI)-based CLD quantization table designed using CLD quantization values derived from VSLI quantization values of the N-channel audio signal, wherein VSLI expresses an azimuth angle of a spatial audio image of the audio signal, and AL and AR are, respectively, the angles of left and right speakers in a speaker environment.
- The apparatus according to claim 24, wherein the VSLI-based CLD quantization table is as follows:
Index | CLD (base-10 logarithm) | CLD (natural logarithm) |
---|---|---|
-15 | -65.1 | -150.0 |
-14 | -25.6 | -58.9 |
-13 | -19.5 | -45.0 |
-12 | -16.0 | -36.8 |
-11 | -13.4 | -30.9 |
-10 | -11.4 | -26.3 |
-9 | -9.7 | -22.4 |
-8 | -8.3 | -19.1 |
-7 | -7.0 | -16.1 |
-6 | -5.8 | -13.4 |
-5 | -4.7 | -10.9 |
-4 | -3.7 | -8.6 |
-3 | -2.7 | -6.3 |
-2 | -1.8 | -4.2 |
-1 | -0.9 | -2.0 |
0 | 0.0 | 0.0 |
1 | 0.9 | 2.0 |
2 | 1.8 | 4.2 |
3 | 2.7 | 6.3 |
4 | 3.7 | 8.6 |
5 | 4.7 | 10.9 |
6 | 5.8 | 13.4 |
7 | 7.0 | 16.1 |
8 | 8.3 | 19.1 |
9 | 9.7 | 22.4 |
10 | 11.4 | 26.3 |
11 | 13.4 | 30.9 |
12 | 16.0 | 36.8 |
13 | 19.5 | 45.0 |
14 | 25.6 | 58.9 |
15 | 65.1 | 150.0 |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20050065515 | 2005-07-19 | ||
KR20050096256 | 2005-10-12 | ||
KR1020060066822A KR100755471B1 (en) | 2005-07-19 | 2006-07-18 | Virtual source location information based channel level difference quantization and dequantization method |
PCT/KR2006/002824 WO2007011157A1 (en) | 2005-07-19 | 2006-07-19 | Virtual source location information based channel level difference quantization and dequantization method |
Publications (3)
Publication Number | Publication Date |
---|---|
EP1905034A1 (en) | 2008-04-02 |
EP1905034A4 (en) | 2009-11-25 |
EP1905034B1 (en) | 2011-06-01 |
Family
ID=37669008
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP06783342A Not-in-force EP1905034B1 (en) | 2005-07-19 | 2006-07-19 | Virtual source location information based channel level difference quantization and dequantization |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP1905034B1 (en) |
WO (1) | WO2007011157A1 (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5098458B2 (en) * | 2007-06-20 | 2012-12-12 | カシオ計算機株式会社 | Speech coding apparatus, speech coding method, and program |
KR101613975B1 (en) | 2009-08-18 | 2016-05-02 | 삼성전자주식회사 | Method and apparatus for encoding multi-channel audio signal, and method and apparatus for decoding multi-channel audio signal |
WO2011097903A1 (en) | 2010-02-11 | 2011-08-18 | 华为技术有限公司 | Multi-channel signal coding, decoding method and device, and coding-decoding system |
CN102157151B (en) | 2010-02-11 | 2012-10-03 | 华为技术有限公司 | Encoding method, decoding method, device and system of multichannel signals |
US9456289B2 (en) | 2010-11-19 | 2016-09-27 | Nokia Technologies Oy | Converting multi-microphone captured signals to shifted signals useful for binaural signal processing and use thereof |
US9313599B2 (en) * | 2010-11-19 | 2016-04-12 | Nokia Technologies Oy | Apparatus and method for multi-channel signal playback |
US9055371B2 (en) | 2010-11-19 | 2015-06-09 | Nokia Technologies Oy | Controllable playback system offering hierarchical playback options |
PL2740222T3 (en) | 2011-08-04 | 2015-08-31 | Dolby Int Ab | Improved fm stereo radio receiver by using parametric stereo |
EP2834995B1 (en) | 2012-04-05 | 2019-08-28 | Nokia Technologies Oy | Flexible spatial audio capture apparatus |
EP2982139A4 (en) | 2013-04-04 | 2016-11-23 | Nokia Technologies Oy | Visual audio processing apparatus |
US9706324B2 (en) | 2013-05-17 | 2017-07-11 | Nokia Technologies Oy | Spatial object oriented audio apparatus |
GB2575632A (en) * | 2018-07-16 | 2020-01-22 | Nokia Technologies Oy | Sparse quantization of spatial audio parameters |
GB2593672A (en) * | 2020-03-23 | 2021-10-06 | Nokia Technologies Oy | Switching between audio instances |
WO2022173988A1 (en) * | 2021-02-11 | 2022-08-18 | Nuance Communications, Inc. | First and second embedding of acoustic relative transfer functions |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6016473A (en) * | 1998-04-07 | 2000-01-18 | Dolby; Ray M. | Low bit-rate spatial coding method and system |
US20030035553A1 (en) * | 2001-08-10 | 2003-02-20 | Frank Baumgarte | Backwards-compatible perceptual coding of spatial cues |
EP1881486B1 (en) * | 2002-04-22 | 2009-03-18 | Koninklijke Philips Electronics N.V. | Decoding apparatus with decorrelator unit |
JP4212591B2 (en) * | 2003-06-30 | 2009-01-21 | 富士通株式会社 | Audio encoding device |
-
2006
- 2006-07-19 EP EP06783342A patent/EP1905034B1/en not_active Not-in-force
- 2006-07-19 WO PCT/KR2006/002824 patent/WO2007011157A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
EP1905034A1 (en) | 2008-04-02 |
WO2007011157A1 (en) | 2007-01-25 |
EP1905034A4 (en) | 2009-11-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1905034B1 (en) | Virtual source location information based channel level difference quantization and dequantization | |
US11955131B2 (en) | Apparatus and method for encoding or decoding a multi-channel signal | |
US11727944B2 (en) | Apparatus and method for stereo filling in multichannel coding | |
KR101976757B1 (en) | Apparatus for encoding and decoding multi-object audio supporting post downmix signal | |
KR100755471B1 (en) | Virtual source location information based channel level difference quantization and dequantization method | |
US7627480B2 (en) | Support of a multichannel audio extension | |
KR101428487B1 (en) | Multi-channel encoding and decoding method and apparatus | |
US7620554B2 (en) | Multichannel audio extension | |
US8515770B2 (en) | Method and apparatus for encoding and decoding excitation patterns from which the masking levels for an audio signal encoding and decoding are determined | |
USRE46082E1 (en) | Method and apparatus for low bit rate encoding and decoding | |
US20080252510A1 (en) | Method and Apparatus for Encoding/Decoding Multi-Channel Audio Signal | |
EP2270774B1 (en) | Lossless multi-channel audio codec | |
JP3487250B2 (en) | Encoded audio signal format converter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20080110 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
|
DAX | Request for extension of the european patent (deleted) | ||
A4 | Supplementary search report drawn up and despatched |
Effective date: 20091026 |
|
17Q | First examination report despatched |
Effective date: 20100108 |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 19/00 20060101AFI20101125BHEP Ipc: G10L 19/02 20060101ALI20101125BHEP |
|
RTI1 | Title (correction) |
Free format text: VIRTUAL SOURCE LOCATION INFORMATION BASED CHANNEL LEVEL DIFFERENCE QUANTIZATION AND DEQUANTIZATION |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602006022287 Country of ref document: DE Effective date: 20110714 |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: VDEP Effective date: 20110601 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110912 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110902 Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601; Ref country code: BE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20111003; Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601; Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601; Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20111001 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601; Ref country code: MC Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110731; Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601; Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST Effective date: 20120330 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: MM4A |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110731; Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110731; Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110801 |
|
26N | No opposition filed |
Effective date: 20120302 |
|
GBPC | GB: European patent ceased through non-payment of renewal fee |
Effective date: 20110901 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602006022287 Country of ref document: DE Effective date: 20120302 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110719 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110901 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20110719 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110901 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R088 Ref document number: 602006022287 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: HU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20110601 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20160614 Year of fee payment: 11 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R119 Ref document number: 602006022287 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180201 |