WO2009146734A1 - Multichannel audio coding - Google Patents
- Publication number
- WO2009146734A1 (PCT/EP2008/056813)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- channels
- correlation
- difference
- value representing
- status information
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
Definitions
- the invention relates to the field of multi-channel audio coding.
- Audio coding systems are used in particular for transmitting or storing audio signals.
- the audio coding system comprises an encoder at a transmitting side and a decoder at a receiving side.
- the audio signal that is to be transmitted is provided to the encoder.
- the encoder is responsible for adapting the incoming audio data rate to a bitrate level at which the bandwidth conditions in the transmission channel are not violated. Ideally, the encoder discards only irrelevant information from the audio signal in this encoding process.
- the encoded audio signal is then transmitted by the transmitting side of the audio coding system and received at the receiving side of the audio coding system.
- the decoder at the receiving side reverses the encoding process to obtain a decoded audio signal with little or no audible degradation.
- the audio coding system could be employed for archiving audio data.
- the encoded audio data provided by the encoder is stored in some storage unit, and the decoder decodes audio data retrieved from this storage unit.
- in this case, the encoder should achieve a bitrate that is as low as possible, in order to save storage space.
- the original audio signal can be a mono audio signal or a multi-channel audio signal containing at least a first and a second channel signal.
- An example of a multichannel audio signal is a stereo audio signal, which is composed of a left channel signal and a right channel signal.
- Another example is an audio signal that is used for a surround technology and includes for example two stereo channels, an additional center channel and two surround channels.
- different encoding schemes can be applied to a multi-channel audio signal.
- the different channels can be encoded for instance independently from each other. But typically, a correlation exists between the different channels of a multi-channel audio signal, and the most advanced coding schemes exploit this correlation to achieve a further reduction in the bitrate.
- Examples of methods for reducing the bitrate of an encoded stereo audio signal include low bitrate stereo extension methods.
- the stereo audio signal is encoded as a high bitrate mono signal, which is provided by the encoder together with some side information reserved for a stereo extension.
- the stereo audio signal is then reconstructed from the high bitrate mono signal in a stereo extension making use of the side information.
- the side information typically takes only a few kbps of the total bitrate.
- Parametric multi-channel audio coding methods such as Binaural Cue Coding (BCC) enable high-quality multichannel reproduction at a reasonable bitrate, compared with a scenario where all channels are encoded and transmitted separately.
- the compression of a spatial image is based on generating one or several down-mixed signals together with a set of spatial cues.
- the decoder uses the received down-mixed signals and the spatial cues to synthesize a set of channels - which can be different from the number of input channels - with spatial properties as described by the received spatial cues.
- the spatial cues typically include an inter-channel level difference (ICLD), an inter-channel time difference (ICTD) and an inter-channel coherence/correlation (ICC).
- ICLD and ICTD aim at describing the signals from the actual audio sources, whereas the ICC aims at enhancing the spatial sensation by introducing a diffuse component of the audio image, including reverberations, ambience, etc. These cues are normally provided for each frequency band separately.
- the decoding side typically uses a filter that is controlled by the received ICC cues to recreate a coherence/correlation approximating the coherence/correlation which is present in the input signals.
- a method which comprises evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal.
- the method further comprises modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
- a first apparatus which comprises a processor.
- the processor is configured to evaluate correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal.
- the processor is further configured to modify a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
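The decision rule shared by the method and the apparatus can be illustrated with a minimal Python sketch. The names `SegmentCues` and `process_segment` and the `modify` callable are hypothetical; the modification rule itself is passed in as a function because it differs between the embodiments described further on (based on preceding segments, on the mono signal, or on an ICC value).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SegmentCues:
    """Side information for one segment (e.g. a frame) of a multi-channel signal."""
    no_significant_correlation: bool  # correlation status information (one bit)
    difference_value: float           # e.g. an ICLD cue for one band


def process_segment(cues: SegmentCues, modify: Callable[[float], float]) -> float:
    """Return the difference value, modified only if the correlation status
    says the channels are NOT significantly correlated."""
    if cues.no_significant_correlation:
        return modify(cues.difference_value)
    return cues.difference_value
```

For example, `process_segment(SegmentCues(True, 2.0), lambda v: v + 1.0)` returns 3.0, whereas with the status bit cleared the value 2.0 passes through unchanged.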
- the apparatus may comprise for example exclusively the described processor, but it may also comprise additional components.
- the apparatus could further be for example a module provided for integration into an electronic device, like a processing component, a chip or a circuit implementing the processor, or it could be such a device itself. In the latter case, it could be for instance an electronic device, which comprises in addition an interface configured to receive captured multi-channel audio signals and/or an interface configured to output multi-channel audio signals.
- a second apparatus which comprises means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multichannel audio signal, and means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
- the means of this apparatus can be implemented in hardware and/or software. They may comprise for instance a processor for executing computer program code for realizing the required functions, a memory storing the program code, or both. Alternatively, they could comprise for instance a circuit that is designed to realize the required functions, for instance implemented in a chipset or a chip, like an integrated circuit. It is to be understood that further or correspondingly adapted means may be comprised for realizing any of the functions that may optionally be implemented in any described embodiment of the first apparatus.
- a computer readable storage medium in which computer program code is stored.
- the computer program code realizes the described method when executed by a processor.
- the computer readable storage medium could be for example a disk or a memory or the like.
- the computer program code could be stored in the computer readable storage medium in the form of instructions encoding the computer-readable storage medium. It is to be understood that also the computer program code by itself has to be considered an embodiment of the invention.
- certain embodiments of the invention provide that information about a correlation status is used as a decision criterion whether to apply a certain modification to a value indicating a difference between different channels of a multi-channel audio signal.
- the considered audio signal can be for instance a speech signal, but equally any other kind of audio signal, like a music signal.
- the considered segment can be for instance a frame of an audio signal, but equally any other kind of segment, like a superframe or a subframe.
- An audio signal may comprise any number of segments, including one.
- the described processing can further be performed for example for each of a plurality of frequency bands in the segment of an audio signal, only for selected ones of a plurality of frequency bands, or on the entire frequency range of the segment of the audio signal as a whole. A selection of frequency bands could also differ from one segment to the next.
- the multi-channel audio signal may comprise only two channels, for instance the left and right channel of a stereo signal, or any other number of channels, for instance five channels for a surround audio signal.
- the correlation status information could be derived for instance from an inter-channel correlation (ICC) cue obtained in a binaural cue coding, but it could be obtained in any other manner as well which is suited to indicate whether or not a significant correlation between channels is given.
- the difference value that may be modified could be for instance an inter-channel level difference (ICLD) cue obtained in a binaural cue coding, but it could equally be any other kind of value that can be modified to increase the decorrelation of the channels if appropriate.
- the correlation status information is represented by a single bit.
- the modifying of a value representing a difference between channels is based on equations using exclusively non-random values.
- the processor or some other means is configured to realize a corresponding modification.
- the modifying of a value representing a difference between channels is based on a value obtained in a modification of a value representing a difference between channels performed for one or more preceding segments of the multi-channel audio signal.
- the processor or some other means is configured to realize a corresponding modification.
- the modifying of a value representing a difference between channels is based alternatively or in addition on a value representing a mono audio signal, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates a dissimilar level of a signal in the channels.
- a value indicating the amount of the correlation itself might not be required, so that only the correlation status has to be provided.
- the processor or some other means is configured to realize a corresponding modification.
- the modifying of a value representing a difference between channels is based alternatively or in addition on a correlation value indicating the amount of correlation between the channels, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates in contrast a similar level of a signal in the channels.
- the processor or some other means is configured to realize a corresponding modification.
- the code may be implemented to realize a corresponding modification.
- the modifying of a value representing a difference between channels is performed at an encoder side, which generates the correlation status information and the value representing a difference between the channels.
- the processor or some other means is associated with an encoder side, which generates the correlation status information and the value representing a difference between the channels.
- the code is code for such an encoder side.
- the modifying of a value representing a difference between channels is performed at a decoder side, which is provided with correlation status information and a value representing a difference between channels generated by an encoder side.
- the processor or some other means is associated with such a decoder side.
- the code is code for such a decoder side.
- the method comprises obtaining at the decoder side in addition information on frequency bands for which the correlation status information is valid.
- the processor or some other means is configured to obtain such additional information.
- the code may be implemented to obtain such additional information.
- a method is an information providing method, comprising the steps of evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
- an apparatus is an information providing apparatus comprising processing means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and processing means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
- one of the described apparatuses can be seen as an audio signal encoding or decoding apparatus.
- Fig. 1 is a schematic block diagram of a coding system in which an exemplary embodiment of the invention is implemented
- Fig. 2 is a schematic block diagram presenting functional blocks of an exemplary encoder
- Fig. 3 is a schematic block diagram presenting functional blocks of an exemplary decoder
- Fig. 4 is a flow chart illustrating an operation at an encoding side in the system of Figure 1
- Fig. 5 is a flow chart illustrating an operation at a decoding side in the system of Figure 1
- Fig. 6 is a schematic block diagram of an electronic device in which another exemplary embodiment of the invention is implemented
- Fig. 7 is a flow chart illustrating an operation in the electronic device of Figure 6.
- Figure 1 is a schematic diagram of an exemplary system which supports a correlation status controlled modification of inter-channel level differences.
- the system comprises a first electronic device 110 and a second electronic device 120.
- the first electronic device 110 can be for instance a mobile phone, but equally any other device which is able to encode audio data for storage or transmission, for example an audio recording device.
- the device 110 comprises a processor 112 and, linked to this processor 112, a memory 113, an interface for receiving captured audio data 116, and a transmitter (TX) 117.
- the processor 112 is configured to execute implemented computer program code.
- the memory 113 stores computer program code 114, which may be retrieved by the processor 112 for execution.
- the stored program codes 114 comprise code for encoding audio data. It includes code for generating a mono signal, for generating stereo extension cues and for generating inter-channel correlation values and status.
- the memory 113 may comprise in addition a data storage portion 115.
- the processor 112 and the memory 113 could optionally be integrated in a single component, for example on a chip 111.
- the interface 116 could comprise for instance microphones or a socket for connecting microphones.
- the transmitter 117 could belong for example to a cellular engine of the device 110 and be configured to transmit data via a cellular communication network to other devices.
- the second electronic device 120 can also be for instance a mobile phone, but equally any other device which is able to decode audio data for presentation to a user.
- the device 120 comprises a processor 122 and, linked to this processor 122, a memory 123, an interface for presenting audio data 126 to a user and a receiver (RX) 127.
- the processor 122 is configured to execute implemented computer program code.
- the memory 123 stores computer program code 124, which may be retrieved by the processor 122 for execution.
- the stored program codes 124 comprise code for decoding audio data. It includes code for modifying stereo extension values under control of an inter-channel correlation status, and for reconstructing a multi-channel audio signal.
- the memory 123 may comprise in addition a data storage portion (not shown).
- the processor 122 and the memory 123 could optionally be integrated in a single component, for example on a chip 121.
- the interface 126 could comprise for instance loudspeakers or a socket for connecting loudspeakers.
- the receiver 127 could belong for example to a cellular engine of the device 120 and be configured to receive data via a cellular communication network from other devices.
- the interfaces 117 and 127 are configured in any case such that they enable device 110 to transmit encoded audio data to device 120, either directly on a wired or wireless link or indirectly via some communication network.
- Figure 2 is a high-level block diagram of an encoder implemented by the program code 114 of device 110. It is to be understood that the block diagram could equally represent functional blocks of a hardware implementation of an encoder providing the same functions as the program code 114. The blocks are shown to process stereo data, but it has to be noted that the encoder may be adapted for processing audio signals with more than two channels.
- the encoder includes a transform block 201 for transforming the data of a left channel of an audio signal 'L' from the time domain into the frequency domain.
- the resulting frequency domain signal is denoted 'Lf'.
- the encoder further includes a transform block 202 for transforming the data of a right channel of an audio signal 'R' from the time domain into the frequency domain.
- the resulting frequency domain signal is denoted 'Rf'.
- the encoder moreover includes a mono conversion block 203, which is configured to create a down-mixed signal by converting the stereo signal into a mono signal $M_f = 0.5\,(L_f + R_f)$ and to pass the mono signal to a mono encoder, for example an embedded variable bitrate (EV-VBR) mono encoder 204.
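A minimal sketch of the down-mix $M_f = 0.5\,(L_f + R_f)$, applied bin by bin to the frequency-domain channel spectra. Holding the spectra as plain Python lists of complex numbers is an assumption made only to keep the example dependency-free.

```python
def downmix_to_mono(Lf: list[complex], Rf: list[complex]) -> list[complex]:
    """Return the mono spectrum M_f = 0.5 * (L_f + R_f), bin by bin."""
    if len(Lf) != len(Rf):
        raise ValueError("channel spectra must have the same length")
    return [0.5 * (l + r) for l, r in zip(Lf, Rf)]
```

For instance, `downmix_to_mono([1+1j, 2], [3-1j, 0])` yields `[(2+0j), (1+0j)]`.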
- a different way to create the down-mixed signal and the difference signal can be used, for example one comprising a linear combination of the input channels with possible phase correction.
- the mono encoder 204 is configured to encode a received mono signal and to provide a resulting bitstream to a bitstream multiplexer 208.
- the stereo encoder 205 is configured to generate and encode stereo extension data, including a quantization to obtain a desired bitrate, and to provide a resulting bitstream to the multiplexer 208. Any kind of stereo encoder could be used to this end.
- the encoder moreover includes further transformers 206, which are configured to transform the left and right channel signals L and R to the frequency domain, and a correlation encoder 207, which is configured to analyze the left and right channel signals in the frequency domain, to decide which of the spectral bands need decorrelation at a decoder side, and to pass corresponding correlation flags to the multiplexer 208. In an alternative embodiment, instead of employing separate transformers 206, the outputs of transformers 201 and 202 could be provided to correlation encoder 207.
- the multiplexer 208 is configured to multiplex all received information to create a bitstream for storage or transmission.
- Figure 3 is a high-level block diagram of a decoder implemented by the program code 124 of device 120. It is to be understood that the block diagram could equally represent functional blocks of a hardware implementation of a decoder providing the same functions as the program code 124. The blocks are shown to process stereo data, but it has to be noted that the decoder may be adapted for processing audio signals with more than two channels.
- the decoder includes a demultiplexer 307, which is configured to demultiplex a bitstream that has been retrieved from a memory or received from another device and to pass the demultiplexed data to a mono decoder, for example an EV-VBR mono decoder 304, to a decorrelation block 306 and to a stereo decoder 305.
- the mono decoder 304 is configured to decode a received encoded mono signal.
- the stereo decoder 305 is configured to extract and decode stereo extension data from the bitstream, to combine this data with the decoded mono signal to reconstruct a stereo signal, and to output the reconstructed left and right output channels Lf and Rf to inverse transformers 301 and 302.
- the stereo decoder 305 is configured to provide the extracted stereo extension values to the decorrelation block 306 before using them in the reconstruction and to receive modified stereo extension values from the decorrelation block 306 for use in the reconstruction.
- the decorrelation block 306 is configured to extract correlation flags from the bitstream, to modify the stereo extension values when needed, and to provide the modified values to the stereo decoder 305.
- stereo decoder 305 and decorrelation block 306 can be integrated in a single functional block for optimizing the processing.
- the inverse transformer 301 is configured to obtain the time domain left channel L by performing a frequency-to-time domain transformation on the reconstructed left channel Lf.
- the inverse transformer 302 is configured to obtain the time domain right channel R by performing a frequency-to-time domain transformation on the reconstructed right channel Rf.
- the regained stereo signal may be provided for presentation to a user or stored for later consumption.
- the operation at the encoding side illustrated in Figure 4 is to be realized by processor 112 when executing the code for encoding audio data 114 retrieved from memory 113, and equally by the corresponding functional blocks of the encoder of Figure 2.
- the data of the received multi-channel audio signal is divided into subsequent frames, and the processing of the data that is described in the following is performed on a frame-by-frame basis.
- the multi-channel audio signal is transformed into the frequency domain (action 401).
- the employed transform can be any complex-valued transform such as a discrete Fourier transform (DFT), a quadrature mirror filterbank (QMF) transform, or a combination of a modified discrete cosine transform (MDCT) and a modified discrete sine transform (MDST).
- in the latter case, the MDCT is used to obtain the real-valued signals and the MDST is used to obtain the imaginary counterpart for the same input signal.
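To illustrate the MDCT/MDST option, the following sketch computes both transforms directly from their textbook definitions for one frame of 2N samples and combines them as the real and imaginary parts of a complex spectrum. It omits windowing and uses one common phase and sign convention; a practical encoder would use a windowed fast implementation, so this is an assumption-laden reference rather than the codec's actual transform.

```python
import math


def mdct_mdst(frame: list[float]) -> list[complex]:
    """Complex spectrum of N bins for a frame of 2N samples:
    real part = MDCT, imaginary part = MDST (no window applied)."""
    two_n = len(frame)
    if two_n == 0 or two_n % 2:
        raise ValueError("frame length must be a positive even number (2N)")
    n_bins = two_n // 2
    n0 = 0.5 + n_bins / 2.0  # standard MDCT phase offset
    spectrum = []
    for k in range(n_bins):
        re = im = 0.0
        for n, x in enumerate(frame):
            arg = math.pi / n_bins * (n + n0) * (k + 0.5)
            re += x * math.cos(arg)  # MDCT term
            im += x * math.sin(arg)  # MDST term
        spectrum.append(complex(re, im))
    return spectrum
```

A unit impulse at the first sample gives a magnitude of 1 in every bin, since each bin then reduces to cos(arg) + j·sin(arg).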
- the left and right channel signals are down-mixed to a mono signal (action 411), and the mono signal is encoded for transmission (action 412).
- the frequency range of each frequency domain frame is divided into a plurality of frequency bands.
- the left and right channel signals are used for determining multi-channel extension values for each frequency band, including ICLD values (action 421).
- the difference signal can be used for enhancing the stereo quality, in particular when higher bitrates are available.
- icld(i) is the ICLD cue for frequency band i of the current frame, where Offset describes the start and end indices for each spectral band, and where $L_{real}$, $L_{imag}$, $R_{real}$ and $R_{imag}$ are the real and imaginary parts of the complex-valued spectral representations of the left and right channels.
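The equation defining icld(i) is not reproduced in this text. The sketch below assumes, for illustration only, that the cue is the linear ratio of left to right band energies computed from the real and imaginary parts. This is consistent with the later check of whether the ICLD cue equals 1 for similar levels, but the actual definition may differ (for example by using an amplitude ratio). The small `eps` guard is likewise an assumption.

```python
def band_icld(L: list[complex], R: list[complex], offset: list[int]) -> list[float]:
    """Assumed ICLD form: left/right energy ratio per band.
    Band i spans the bins offset[i] .. offset[i+1]-1."""
    eps = 1e-12  # avoids division by zero in silent bands (assumption)
    icld = []
    for i in range(len(offset) - 1):
        lo, hi = offset[i], offset[i + 1]
        e_l = sum(x.real ** 2 + x.imag ** 2 for x in L[lo:hi])  # L_real^2 + L_imag^2
        e_r = sum(x.real ** 2 + x.imag ** 2 for x in R[lo:hi])  # R_real^2 + R_imag^2
        icld.append((e_l + eps) / (e_r + eps))
    return icld
```

Equal band energies give a cue of (almost exactly) 1; a left band four times as energetic as the right gives about 4.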
- the multi-channel extension values are encoded for transmission (action 422).
- a correlation measure such as the inter-channel correlation (ICC) is calculated for each of a plurality of spectral bands (action 431).
- the inter-channel correlation can be calculated for example as follows:
- $icc_t(i)$ is the inter-channel correlation in frequency band i of the current frame
- $icc_{t-1}$ contains the ICC values from the previous frame
- M is the number of spectral bands present for each frame.
- $icc_{t-1}$ could be initialized to '1' (or any other suitable value) at start-up.
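Equation (1) is likewise missing from this text. As a hedged sketch consistent with the definitions above, the code assumes that the current-frame measure is the normalized cross-correlation magnitude of the complex band spectra and that it is recursively smoothed with the previous frame's value $icc_{t-1}(i)$ using a hypothetical weight `alpha`. Both choices, and treating silent bands as correlated, are assumptions; only the initialization of $icc_{t-1}$ to 1 comes from the text.

```python
import math


def band_coherence(L: list[complex], R: list[complex], lo: int, hi: int) -> float:
    """Normalized cross-correlation magnitude of bins lo..hi-1 (range 0..1)."""
    cross = sum(l * r.conjugate() for l, r in zip(L[lo:hi], R[lo:hi]))
    e_l = sum(x.real ** 2 + x.imag ** 2 for x in L[lo:hi])
    e_r = sum(x.real ** 2 + x.imag ** 2 for x in R[lo:hi])
    if e_l == 0 or e_r == 0:
        return 1.0  # silent band treated as correlated (assumption)
    return abs(cross) / math.sqrt(e_l * e_r)


def update_icc(L, R, offset, icc_prev, alpha=0.5):
    """Assumed smoothing: icc_t(i) = alpha*icc_{t-1}(i) + (1-alpha)*coherence(i)."""
    M = len(offset) - 1  # number of spectral bands per frame
    return [alpha * icc_prev[i] + (1 - alpha) * band_coherence(L, R, offset[i], offset[i + 1])
            for i in range(M)]


ICC_START = 1.0  # icc_{t-1} initialized to '1' at start-up
```

Identical channels keep the ICC at 1, while orthogonal band spectra pull it halfway towards 0 per frame with `alpha = 0.5`.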
- the correlation measures in several previous frames could be taken into account, for example by generalizing equation (1) into a weighted sum of the past values:
- $icc_{t-j}(i)$ is the correlation measure in frequency band i of the j-th frame counting backwards from the current frame
- $k_j$ is the weight assigned to the correlation measure in frequency band i of the j-th frame counting backwards from the current frame.
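A weighted sum over past frames can be written generically as below. The function name and the convention that index 0 holds the current frame are assumptions; the weights $k_j$ are supplied by the caller. Non-negative weights that sum to 1 keep the result within the range of the input measures.

```python
def weighted_icc(history: list[list[float]], weights: list[float]) -> list[float]:
    """history[j][i]: correlation measure of band i in the j-th frame counting
    backwards (j = 0 is the current frame); weights[j] is k_j."""
    if not history or len(history) != len(weights):
        raise ValueError("one weight per frame is required")
    n_bands = len(history[0])
    return [sum(k * frame[i] for k, frame in zip(weights, history))
            for i in range(n_bands)]
```

With two frames and equal weights of 0.5, a band measured at 0.2 now and 0.6 in the previous frame ends up at 0.4.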
- sbOffset[] = {0, 5, 11, 18, 25, 33, 43, 56, 72, 91, 116, 146, 183, 240, 274}
- the considered spectral bands for the ICC related computations could then cover 6850 Hz.
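The 6850 Hz figure is consistent with the band table if each spectral bin spans 25 Hz, since 274 × 25 Hz = 6850 Hz. The sketch below derives the bin spacing from those two numbers and lists the 14 bands in hertz; the 25 Hz spacing is inferred from the text rather than stated in it.

```python
SB_OFFSET = [0, 5, 11, 18, 25, 33, 43, 56, 72, 91, 116, 146, 183, 240, 274]
HZ_PER_BIN = 6850 / SB_OFFSET[-1]  # 25.0 Hz, inferred from 274 bins <-> 6850 Hz


def band_edges_hz(offsets=SB_OFFSET, hz_per_bin=HZ_PER_BIN):
    """(start_hz, end_hz) of each band i, which spans bins offsets[i]..offsets[i+1]-1."""
    return [(offsets[i] * hz_per_bin, offsets[i + 1] * hz_per_bin)
            for i in range(len(offsets) - 1)]
```

The lowest band then covers 0 to 125 Hz and the highest 6000 to 6850 Hz, with band widths growing towards higher frequencies.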
- the total frequency range of the audio signal segment, which is considered in the computation of the ICLD values, could be larger than 6850 Hz, but decorrelation might not be applied to higher frequencies, where the impact on subjective quality is lower.
- the exemplary value 0.75 in equation (3) can be considered as a threshold for a correlation measure value (ICC) indicating significant correlation.
- the threshold value can be a fixed or adaptive value, and it may be selected for example based on desired performance, based on the application, based on the characteristics of the input signal, etc.
- the final ICC values are mapped to flag bits for bitstream multiplexing (action 433).
- Alternative approaches include having one flag bit per frequency band or one flag bit for any arbitrarily selected set of frequency bands.
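The mapping of ICC values to flag bits can be sketched as follows, using the convention applied at the decoder ('1' for low correlation, '0' for significant correlation) and the example threshold of 0.75. How the two ICC values of a band pair are combined into one flag is not specified here; averaging them is a hypothetical choice. Setting `pair_bands=False` gives the one-flag-per-band alternative.

```python
def icc_flags(icc: list[float], threshold: float = 0.75,
              pair_bands: bool = True) -> list[int]:
    """Flag 1 = no significant correlation (ICC below threshold), 0 = significant.
    With pair_bands=True one flag covers two neighboring bands, judged on the
    mean of their ICC values (assumed pairing rule)."""
    if pair_bands:
        if len(icc) % 2:
            raise ValueError("pairing needs an even number of bands")
        values = [(icc[i] + icc[i + 1]) / 2 for i in range(0, len(icc), 2)]
    else:
        values = icc
    return [1 if v < threshold else 0 for v in values]
```

For example, ICC values [0.9, 0.8, 0.2, 0.4] yield the flags [0, 1]: the first pair is significantly correlated, the second is not.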
- flag bits are provided for transmission or storage as follows: for each flag, send/store the value of $icc_{flag}(i)$ with 1 bit.
- it could be determined in addition whether the signal level in a frequency band is very similar across the channels, that is, whether the ICLD cue is equal to 1 for any frequency band (action 434).
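Writing each flag with one bit amounts to simple bit packing. The `BitWriter` class below is a hypothetical MSB-first packer for illustration and does not describe the codec's actual bitstream format; the same `write` call with `n_bits=2` could carry 2-bit quantized ICC indices.

```python
class BitWriter:
    """Minimal MSB-first bit packer (illustrative only)."""

    def __init__(self) -> None:
        self.bits: list[int] = []

    def write(self, value: int, n_bits: int) -> None:
        """Append `value` using exactly `n_bits` bits, most significant first."""
        if not 0 <= value < (1 << n_bits):
            raise ValueError("value does not fit in n_bits")
        for shift in range(n_bits - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def to_bytes(self) -> bytes:
        """Pad with zero bits to a whole number of bytes and pack."""
        padded = self.bits + [0] * (-len(self.bits) % 8)
        return bytes(int("".join(map(str, padded[i:i + 8])), 2)
                     for i in range(0, len(padded), 8))
```

Writing the seven flags 0, 1, 1, 0, 0, 0, 1 one bit each produces the bit string 0110001, which is padded to the single byte 0x62.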
- the final ICC value itself may be provided in addition in the bitstream for enabling a better decorrelation.
- in equation (5), qTbl describes a table of quantized ICC values, and the quantization operator Q() returns the table index that minimizes the squared error between the ICC value in question and the quantization table value corresponding to that index.
- the table is as follows:
- the threshold value for the high correlation status in equation (3) could be decreased from 0.75, for example to 0.5, to limit the decorrelation to the spectral bands where decorrelation is perceptually most relevant. This also has the advantage that the side information is minimized at the same time.
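The quantization operator Q() can be sketched as a nearest-entry search under the squared-error criterion. Since the actual qTbl values are not reproduced in this text, the four-entry table below (matching the 2-bit indices read at the decoder) is a hypothetical placeholder.

```python
def quantize_icc(icc: float, q_tbl: list[float]) -> int:
    """Q(): index of the table entry with the smallest squared error to icc."""
    return min(range(len(q_tbl)), key=lambda idx: (icc - q_tbl[idx]) ** 2)


# Hypothetical 4-entry table (2-bit indices); NOT the codec's actual qTbl values.
Q_TBL_EXAMPLE = [0.0, 0.25, 0.5, 0.75]
```

With this placeholder table, an ICC of 0.3 maps to index 1 and an ICC of 0.9 to index 3.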
- the quantized indices of the final ICC values could then be provided for transmission or storage as follows:
- mapping between flag bits and frequency bands is known from the context.
- information indicating for which frequency band or bands a respective flag applies, or any further information, could be provided using additional bits (action 436).
- the encoded mono signal, the encoded stereo extension information and the decorrelation information, the latter including ICC flags and optionally encoded ICC values, are multiplexed to a bitstream for transmission via interface 117 or storage in data storage portion 115 (action 441).
- the bitstream can be constructed in such a way that all encoded data belonging to the same frame, i.e. the encoded mono signal, the encoded stereo extension information and the decorrelation information, are included in a single data unit.
- alternatively, the encoded mono signal of a frame can be encapsulated in one data unit, while the stereo extension information and the decorrelation information for this frame are combined into another data unit.
- further alternatively, the encoded data of a frame is encapsulated in several data units, each comprising encoded data representing a certain frequency range.
- processor 122 when executing the code for encoding audio data 124 retrieved from memory 123, and equally to be realized by the corresponding functional blocks of the decoder of Figure 3.
- When encoded multi-channel audio data that is to be presented to a user is received via receiver 127, the data is forwarded to the processor 122 for processing.
- The received bitstream is first demultiplexed (action 501).
- The mono signal is extracted from the bitstream and decoded (action 511).
- Stereo extension values, including for example ICLD cues, are likewise extracted from the bitstream and decoded.
- ICC flags are extracted from the bitstream and expanded to full resolution (action 531) as follows:
- The "read 1 bit" operation yields the respective decoded ICC flag icc_flag_dec, which corresponds to the icc_flag determined in equation (4) for a respective pair of neighboring frequency bands, optionally modified in case of an ICLD cue equal to 1 as described above.
- The number of received ICC flags is doubled by associating the same flag icc_flag_dec with both frequency bands of each pair of neighboring bands.
- The bitstream may further contain ICC values, or information linking a respective ICC flag and/or value to a respective frequency band.
- Indices for the ICC values could be read from the bitstream right after the flag bits and converted into ICC values as follows:
- The "read 2 bits" operation yields a respective index icc_q_idx as defined above in equation (5), while icc_q is the quantized value associated with a particular index icc_q_idx in table qTbl.
- An exemplary table qTbl has already been introduced above. Also the number of obtained quantized ICC values is doubled by associating the same quantized ICC value to two neighboring frequency bands, respectively.
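The decoder-side parsing (read 1 bit per flag, read 2 bits per index, then double the resolution) can be sketched as follows. The qTbl values, the assumption that indices are present only for flagged pairs, and the helper name `unpack_icc_side_info` are all hypothetical.

```python
# Hypothetical qTbl with 4 entries (2-bit indices); the real table is not reproduced here.
Q_TBL = [0.0, 0.25, 0.5, 0.75]


def unpack_icc_side_info(bits, n_pairs, table=Q_TBL):
    """Parse flag bits and 2-bit indices, then expand both to full band resolution.

    Returns (flags_full, icc_full); icc_full holds None for bands without an index.
    """
    pos = 0

    def read(n_bits):
        # Read n_bits from the bit list, most significant bit first.
        nonlocal pos
        value = 0
        for _ in range(n_bits):
            value = (value << 1) | bits[pos]
            pos += 1
        return value

    flags = [read(1) for _ in range(n_pairs)]               # "read 1 bit" per band pair
    icc_q = [table[read(2)] if f else None for f in flags]  # "read 2 bits" per flagged pair
    # Double the resolution: each decoded value applies to two neighboring bands.
    flags_full = [f for f in flags for _ in range(2)]
    icc_full = [v for v in icc_q for _ in range(2)]
    return flags_full, icc_full


if __name__ == "__main__":
    print(unpack_icc_side_info([0, 1, 1, 1, 0, 0, 1], 3))
    # ([0, 0, 1, 1, 1, 1], [None, None, 0.5, 0.5, 0.25, 0.25])
```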
- The decoded ICC flag is evaluated for all frequency bands of the current frame (action 532). That is, it is determined whether it has a value of '1', representing a low correlation, or a value of '0', representing a significant correlation between the channels.
- icld(i) is the decoded level difference for each frequency band i of a respective frame, as extracted from the bitstream.
- the modification summand b(i) may be determined as follows:
- icc_dec_t is a decoder-internal ICC value for the current frame, and icc_dec_{t-1} is a decoder-internal ICC value for the previous frame.
- The decoder-internal ICC value for the previous frame, icc_dec_{t-1}, may be initialized to 1 at start-up, for example.
- The decoder-internal ICC value for the current frame, icc_dec_t, may be determined as follows:
- icc_q contains the quantized ICC values for those bands where the corresponding ICLD value is 1, and MIN returns the minimum of the specified input values.
- $eMax = \max_i \big( eMono(i) \big)$
- sbOffset again defines the offsets for the considered spectral bands, which may be the same as those indicated above for the encoding.
- iccGain_t is an adaptive gain that is initialized to a suitable value at start-up, for example to '6'. It is then updated for the respective next frame based on the energy values computed for the current frame in equation (9), as follows:
- $iccGain_{t+1} = \min\left(6,\; 0.3 \cdot iccGain_t + 0.7 \cdot iccGain_e\right) \quad (10)$
- iccGain_t is the gain value of the current frame
- iccGain_{t+1} is the gain value for the next frame
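Equation (10) is a first-order recursive smoother with an upper limit of 6. In the sketch below, iccGain_e, which the text derives from equation (9) (not reproduced here), is simply passed in as an input; the per-frame estimates in the usage loop are made-up numbers, and `update_icc_gain` is a hypothetical name.

```python
def update_icc_gain(icc_gain_t, icc_gain_e, cap=6.0):
    """Equation (10): iccGain_{t+1} = MIN(6, 0.3 * iccGain_t + 0.7 * iccGain_e)."""
    return min(cap, 0.3 * icc_gain_t + 0.7 * icc_gain_e)


if __name__ == "__main__":
    gain = 6.0  # example start-up value given in the text
    for icc_gain_e in [2.0, 2.0, 10.0]:  # made-up per-frame estimates
        gain = update_icc_gain(gain, icc_gain_e)
        print(round(gain, 3))  # prints 3.2, 2.36, 6.0 on separate lines
```

The 0.3/0.7 weighting lets the gain follow the current estimate quickly while still smoothing frame-to-frame jumps, and the cap keeps a single high-energy frame from driving the gain above 6.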
- Equation (9) introduces a time-frequency dependent gain for the decorrelation to improve the perceptual quality. This is especially advantageous for signal frames or other signal segments in which the ICLDs have a relatively flat response.
- Detailed analysis of equation (8) shows, however, that decorrelation performs better with the first option whenever the value of icld(i) is equal to 1. Otherwise, an icld(i) value of 1, indicating that the channel signals are similar in a level-difference sense, would drive icc_dec_t(i) in equation (8) to zero, and thus no perceptually significant decorrelation contribution could be expected when applying equation (6).
- The original or modified stereo extension values (action 534, 536 or 537) are then used for reconstructing the multi-channel audio signal by up-mixing the decoded mono signal (action 522).
- the reconstructed multi-channel audio signal is transformed again into the time domain (action 541) and then presented to a user via audio out interface 126.
- In the embodiment described, the ICC flags are transmitted together with the actual audio data. They could also be transmitted separately from the other data.
- a corresponding decorrelation could also be applied for an audio signal comprising more than two channels.
- one of the channels could be selected to be a reference channel, and correlation flags could indicate the correlation between a respective channel and this reference channel.
- correlation flags could indicate the correlation between any arbitrary pair of channels.
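For more than two channels with a reference channel, correlation flags could be derived as sketched below. The normalized cross-correlation magnitude used here is a common definition of inter-channel coherence and stands in for equation (1), which is not reproduced in this text; the threshold default, the treatment of silent segments and the function names are assumptions.

```python
import math


def normalized_correlation(x, y):
    """|sum x*conj(y)| / sqrt(sum|x|^2 * sum|y|^2); works for real or complex samples."""
    num = abs(sum(a * complex(b).conjugate() for a, b in zip(x, y)))
    den = math.sqrt(sum(abs(a) ** 2 for a in x) * sum(abs(b) ** 2 for b in y))
    # Assumption: an all-zero segment is treated as fully correlated.
    return num / den if den > 0 else 1.0


def reference_channel_flags(channels, ref=0, threshold=0.75):
    """Flag (1 = low correlation) for every channel relative to the reference channel."""
    return {
        ch: (1 if normalized_correlation(sig, channels[ref]) < threshold else 0)
        for ch, sig in enumerate(channels)
        if ch != ref
    }


if __name__ == "__main__":
    chans = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1], [-1, 0, -1, 0]]
    print(reference_channel_flags(chans))  # {1: 0, 2: 1, 3: 0}
```

For the arbitrary-pair variant, the same `normalized_correlation` function would simply be evaluated for each selected channel pair instead of against a fixed reference.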
- When processing an audio signal comprising more than two channels, more than one down-mixed signal could be generated and transmitted.
- A set of ICLD values, ICC flags, and possibly ICC values may be provided for each down-mixed signal separately.
- ICLD cues and inter-channel correlation could also be computed in the time domain instead of the frequency domain.
- The presented approach could equally be employed for modifying other kinds of values representing a difference between channels than BCC ICLD cues, and other inter-channel correlation information than BCC ICC cues.
- Another variation of the presented approach may comprise a modified computation of the inter-channel level differences.
- An exemplary modified computation is presented in the following for the case of a stereo audio signal.
- The left and right channel input signals are converted to the frequency domain using a shifted discrete Fourier transform (SDFT).
- The resulting complex-valued spectral samples are converted to the energy domain as follows:
- f_L and f_R are the complex-valued SDFT samples of the left and right channels, respectively, and N is the size of the frame.
- the energy level for each spectral subband is calculated according to:
- offset_1 is a frequency offset table describing the frequency bin offsets for each spectral subband, and M is the number of spectral subbands present in the region.
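The SDFT and the subband energy summation can be sketched as below. The shift parameters (time shift 0, frequency shift 1/2), the omission of any scaling by the frame size N, and the function names are assumptions, since the exact energy-domain equation is not reproduced here. The `offset1` argument plays the role of the frequency offset table described above, holding M + 1 bin boundaries for M subbands.

```python
import cmath


def sdft(x, time_shift=0.0, freq_shift=0.5):
    """Shifted DFT: X(k) = sum_n x(n) * exp(-j*2*pi*(n + n0)*(k + k0)/N).

    Assumption: n0 = time_shift, k0 = freq_shift; the exact shifts used in the
    source are not reproduced. Only the first N/2 bins are returned (real input).
    """
    n_len = len(x)
    return [
        sum(
            x[n] * cmath.exp(-2j * cmath.pi * (n + time_shift) * (k + freq_shift) / n_len)
            for n in range(n_len)
        )
        for k in range(n_len // 2)
    ]


def subband_energies(spectrum, offset1):
    """Sum |X(k)|^2 over bins offset1[m] .. offset1[m+1]-1 for each of the M subbands."""
    bin_energy = [abs(v) ** 2 for v in spectrum]  # any 1/N scaling omitted (assumption)
    return [sum(bin_energy[offset1[m]:offset1[m + 1]]) for m in range(len(offset1) - 1)]


if __name__ == "__main__":
    spec_l = sdft([1.0, 2.0, 3.0, 4.0])
    print(subband_energies(spec_l, [0, 1, 2]))  # per-subband energies of a toy frame
```

With the half-bin frequency shift, the upper half of the spectrum of a real signal mirrors the lower half in magnitude, so summing the first N/2 bins captures half of the total spectral energy.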
- The inter-channel level differences can then be determined for different frequency bands in the form of stereo gain values gain(i) as follows:
- offset_2 is the frequency offset table describing the frequency bin offsets for each spectral subband
- K is the number of spectral gain subbands present in the region
- max ( ) and min() return the maximum and minimum of the specified samples, respectively.
- gain values may then correspond to inter-channel level differences, which are modified whenever an ICC status indicates that there is a low correlation between the channels.
- Additional position values may indicate to which channel a respective gain value belongs.
- the position values may be post-processed to obtain a stable stereo image over time.
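One plausible reading of the max()/min() formulation is a per-band ratio between the louder and the quieter channel, plus a position value recording which channel is louder. The sketch below implements that reading on subband energies grouped by an offset_2-style table. The dB scaling, the small `eps` guard, the position convention (0 = left, 1 = right) and the name `stereo_gains` are assumptions, not the source's equation.

```python
import math


def stereo_gains(e_left, e_right, offset2, eps=1e-12):
    """Per gain subband i: gain(i) = 10*log10(max(E_L, E_R) / min(E_L, E_R)).

    Returns (gains_db, positions) where position 0 means the left channel is
    at least as loud and 1 means the right channel is louder (assumed convention).
    """
    gains, positions = [], []
    for i in range(len(offset2) - 1):
        lo, hi = offset2[i], offset2[i + 1]
        el = sum(e_left[lo:hi]) + eps   # eps avoids division by zero in silent bands
        er = sum(e_right[lo:hi]) + eps
        gains.append(10.0 * math.log10(max(el, er) / min(el, er)))
        positions.append(0 if el >= er else 1)
    return gains, positions


if __name__ == "__main__":
    g, pos = stereo_gains([4, 4, 1, 1], [1, 1, 1, 1], [0, 2, 4])
    print([round(v, 2) for v in g], pos)  # [6.02, 0.0] [0, 0]
```

Splitting the level difference into a non-negative magnitude and a separate position value matches the text's note that position values indicate to which channel a gain belongs, and it lets the positions be smoothed over time independently of the gains.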
- Figure 6 is a schematic diagram of an exemplary electronic device which supports a correlation status controlled modification of inter-channel level differences at an encoder side.
- The electronic device 610 can be, for instance, a mobile phone, but equally any other device that is able to encode audio data for storage or transmission.
- the device 610 comprises a processor 612 and, linked to this processor 612, a memory 613, an interface for receiving audio data 616, and a transmitter (TX) 617.
- The processor 612 is configured to execute implemented computer program code.
- the memory 613 stores computer program code 614, which may be retrieved by the processor 612 for execution.
- the stored program code 614 comprises code for encoding audio data. It includes code for generating a mono signal, for generating stereo extension values, for determining inter-channel correlation values and status, for modifying the stereo extension values under control of the ICC status, and for encoding the mono signal and the modified stereo extension values for storage or transmission.
- the memory 613 may comprise in addition a data storage portion 615.
- the processor 612 and the memory 613 could optionally be integrated in a single component, for example on a chip 611.
- The interface 616 could comprise, for instance, a plurality of microphones or a socket for connecting microphones.
- the transmitter 617 could belong for example to a cellular engine of the device 610 and be configured to transmit data via a cellular communication network to other devices.
- The operations described in the following may be realized by processor 612 when executing the code for encoding audio data 614 retrieved from memory 613, or by a corresponding hardware implementation.
- When a multi-channel audio signal that is to be stored or transmitted is received by device 610 via audio interface 616, it is forwarded to the processor 612 for encoding.
- By way of example, the multi-channel audio signal is a stereo signal.
- the data of the received multi-channel audio signal is divided into subsequent frames, and the processing of the data that is described in the following is performed on a frame-by-frame basis.
- The multi-channel audio signal is transformed into the frequency domain (action 701).
- The left and right channel signals are combined into a mono signal (action 711), and the mono signal is encoded for transmission (action 712).
- Each frequency-domain frame is divided into M frequency bands.
- The left and right channel signals are used for determining multi-channel extension values, including ICLD values for each frequency band (action 721).
- Inter-channel correlation is calculated for each spectral band in accordance with equations (1) and (2) above (action 731).
- The extension values are modified based on the mono signal values under control of the ICC status in accordance with equations (6)-(10) above (action 734).
- To this end, the ICC status could be determined in accordance with equation (4) above. It is to be understood, however, that generating separate ICC flags is only optional, since no flags have to be transmitted in this case.
- The extension values are modified based on the final ICC values under control of the ICC status in accordance with equations (6)-(8) above (action 735).
- For this case, the ICC status can be determined in accordance with equation (4) above, using for example a threshold value of 0.5 instead of 0.75 in equation (3).
- Quantization of the ICC values may not be required in this case, since they are not necessarily transmitted.
- The quantized values icc_q in equation (8) could simply be replaced by the final ICC values icc_t obtained with equation (3).
- Alternatively, the quantized values icc_q in equation (8) could be replaced by the ICC values icc_t(i) determined in accordance with equation (1).
- The modified multi-channel extension values are encoded (action 722).
- The encoded mono signal and the encoded modified stereo extension information are multiplexed into a bitstream for transmission or storage (action 741).
- Some decoders may then decode the encoded data in a conventional manner without applying any further decorrelation processing.
- Using a correlation/coherence processing in a parametric multi-channel audio coding process may result in an improved user-experience due to enhanced spatial sensation.
- Some embodiments of the invention allow reducing the correlation between channels that are derived from the mono signal by modifying values representing a difference between channels, for instance values representing a level difference. As a result, correlation between the channels better approximates that of the original stereo signal, thus improving the feeling of spaciousness.
- Certain embodiments of the invention further allow improving the naturalness and subjective audio quality of a low bit-rate multi-channel audio coding system by using an improved and effective transmission or storage and by processing the correlation/coherence information in a way exploiting the data from previous frames.
- Some embodiments may also be suited to improve the multi-channel audio quality across a wide range of signals.
- Certain embodiments of the invention, which use information about a correlation status as a decision criterion for whether to modify a value representing a difference between channels in a segment of the multi-channel audio signal, ensure that the amount of decorrelation processing is reduced compared to an approach in which decorrelation processing is always performed.
- Certain embodiments further ensure that the actual correlation values need to be provided at most when decorrelation is appropriate. Such embodiments thus enable particularly low bit-rate coding where only a limited number of bits is available for coding the correlation information.
- The least data has to be provided if the correlation status information is encoded as a single bit, the association with frequency bands is predetermined, and the actual correlation values are never provided.
- Providing the association with frequency bands as additional information may render some embodiments more flexible, since the association may then change from segment to segment of the audio signal.
- Providing the actual correlation values in selected cases may further improve the decorrelation without unduly increasing the amount of required side information.
- the transmission of ICC values may be limited to a few cases, while otherwise only a one bit status may be transmitted. If the modification is performed at the encoder, certain embodiments ensure that less side information has to be provided to a decoder and that a decoder which does not support decorrelation processing at all could be employed.
- Certain embodiments ensure that only deterministic values are used in the modification instead of random numbers. This ensures that the decorrelation procedure can be adapted better to the concrete spatial situation.
- connection is to be understood in a way that the involved components are operationally coupled.
- connections can be direct or indirect with any number or combination of intervening elements, and there may be merely a functional relationship between the components.
- any of the mentioned processors could be of any suitable type, for example a computer processor, an application-specific integrated circuit (ASIC), etc.
- Any of the mentioned memories could be implemented as a single memory or as a combination of a plurality of distinct memories, and may comprise, for example, a read-only memory, a flash memory or a hard disk drive memory, etc.
- any other hardware components that have been programmed in such a way to carry out the described functions could be employed as well.
- any of the actions described or illustrated herein may be implemented using executable instructions in a general-purpose or special-purpose processor and stored on a computer-readable storage medium (e.g., disk, memory, or the like) to be executed by such a processor.
- references to 'computer-readable storage medium' should be understood to encompass specialized circuits such as field-programmable gate arrays, application-specific integrated circuits (ASICs), signal processing devices, and other devices.
- the functions illustrated by the combination of processor 122 and memory 123, by the decorrelation block 306 or by the combination of processor 612 and memory 613 can be viewed as means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multichannel audio signal; and as means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
- the program codes 124 or 614 can also be viewed as comprising such means in the form of functional modules.
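The core operation, evaluating the correlation status and modifying a difference value only where no significant correlation is indicated, can be condensed into a short sketch. The modification summand b(i) is passed in precomputed because its defining equations are not reproduced in this text, and `modify_level_differences` is a hypothetical name.

```python
def modify_level_differences(icld, flags, b):
    """Add the modification summand b(i) to icld(i) only where flag(i) == 1.

    flag 1 = no significant correlation (modify), flag 0 = significant
    correlation (leave icld(i) unchanged).
    """
    if not len(icld) == len(flags) == len(b):
        raise ValueError("icld, flags and b must have one entry per band")
    return [d + s if f == 1 else d for d, f, s in zip(icld, flags, b)]


if __name__ == "__main__":
    print(modify_level_differences([2.0, 1.0, 3.0], [0, 1, 1], [0.5, 0.5, -1.0]))
    # [2.0, 1.5, 2.0]
```

Because the decision is a per-band flag test, bands with significant correlation pass through untouched, which is what keeps the amount of decorrelation processing low.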
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Stereophonic System (AREA)
Abstract
To enable reconstruction of a multi-channel audio signal, correlation status information is evaluated at the encoder side or at the decoder side. The correlation status information indicates whether there is a significant correlation between the channels in a segment of a multi-channel audio signal. If the correlation status information indicates that there is no significant correlation between the channels, a value representing a difference between the channels in the segment of the multi-channel audio signal is modified.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2008/056813 WO2009146734A1 (fr) | 2008-06-03 | 2008-06-03 | Codage audio multicanaux |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009146734A1 (fr) | 2009-12-10 |
Family
ID=40351784
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2009146734A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112233684A (zh) * | 2015-03-09 | 2021-01-15 | 弗劳恩霍夫应用研究促进协会 | 用于对多声道信号进行编码或解码的装置与方法 |
WO2022247651A1 (fr) * | 2021-05-28 | 2022-12-01 | 华为技术有限公司 | Procédé et appareil de codage pour signaux audio multicanal |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004030410A1 (fr) * | 2002-09-26 | 2004-04-08 | Koninklijke Philips Electronics N.V. | Procede de traitement de signaux audio et systeme de traitement de signaux audio utilisant ce procede |
EP1814104A1 (fr) * | 2004-11-30 | 2007-08-01 | Matsushita Electric Industrial Co., Ltd. | Appareil de codage stéréo, appareil de décodage stéréo et leurs procédés |
EP1914722A1 (fr) * | 2004-03-01 | 2008-04-23 | Dolby Laboratories Licensing Corporation | Décodage audio multicanal |
- 2008-06-03: WO PCT/EP2008/056813, patent WO2009146734A1 (fr), active application filing
Non-Patent Citations (1)
Title |
---|
ENGDEGARD J ET AL: "Synthetic ambience in parametric stereo coding", AUDIO ENGINEERING SOCIETY CONVENTION PAPER, NEW YORK, NY, US, 8 May 2004 (2004-05-08), pages 1 - 12, XP002347433 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112233684A (zh) * | 2015-03-09 | 2021-01-15 | 弗劳恩霍夫应用研究促进协会 | 用于对多声道信号进行编码或解码的装置与方法 |
CN112233684B (zh) * | 2015-03-09 | 2024-03-19 | 弗劳恩霍夫应用研究促进协会 | 用于对多声道信号进行编码或解码的装置与方法 |
WO2022247651A1 (fr) * | 2021-05-28 | 2022-12-01 | 华为技术有限公司 | Procédé et appareil de codage pour signaux audio multicanal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2016325879B2 (en) | Method and system for decoding left and right channels of a stereo sound signal | |
US20220215846A1 (en) | Audio encoding device, method and program, and audio decoding device, method and program | |
EP1749296B1 (fr) | Extension audio multicanal | |
US8046214B2 (en) | Low complexity decoder for complex transform coding of multi-channel sound | |
CN101128866B (zh) | 多声道音频编码中的优化保真度和减少的信令 | |
JP4934427B2 (ja) | 音声信号復号化装置及び音声信号符号化装置 | |
US9275648B2 (en) | Method and apparatus for processing audio signal using spectral data of audio signal | |
US8170871B2 (en) | Signal coding and decoding | |
US20060013405A1 (en) | Multichannel audio data encoding/decoding method and apparatus | |
US9167367B2 (en) | Optimized low-bit rate parametric coding/decoding | |
Hwang | Multimedia networking: From theory to practice | |
CN103329197A (zh) | 用于反相声道的改进的立体声参数编码/解码 | |
DK2697795T3 (en) | ADAPTIVE SHARING Gain / FORM OF INSTALLMENTS | |
KR20180125475A (ko) | 멀티 채널 코딩 | |
EP3703050B1 (fr) | Procédé de codage audio et produit associé | |
KR101387808B1 (ko) | 가변 비트율을 갖는 잔차 신호 부호화를 이용한 고품질 다객체 오디오 부호화 및 복호화 장치 | |
KR102492791B1 (ko) | 시간-도메인 스테레오 인코딩 및 디코딩 방법 및 관련 제품 | |
WO2009146734A1 (fr) | Codage audio multicanaux | |
US20250210051A1 (en) | Encoder and encoding method for discontinuous transmission of parametrically coded independent streams with metadata | |
US20250210052A1 (en) | Decoder and decoding method for discontinuous transmission of parametrically coded independent streams with metadata | |
US12125492B2 (en) | Method and system for decoding left and right channels of a stereo sound signal | |
EP3424048A1 (fr) | Codeur de signal audio, décodeur de signal audio, procédé de codage et procédé de décodage | |
HK40069408A (en) | Method and system for decoding left and right channels of a stereo sound signal |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 08760397; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 08760397; Country of ref document: EP; Kind code of ref document: A1 |