KR101615776B1

KR101615776B1 - Apparatus and method for coding and decoding multi-object audio signal using different analysis stages

Info

Publication number: KR101615776B1
Application number: KR1020100050036A
Authority: KR
Inventors: 백승권
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute)
Priority date: 2010-05-28
Filing date: 2010-05-28
Publication date: 2016-04-28
Anticipated expiration: 2030-05-28
Also published as: KR20110130623A

Abstract

An apparatus and method for encoding and decoding multi-object audio signals having various channel configurations are provided.
The encoding apparatus includes: a downmixing unit that downmixes a plurality of multi-object audio signals composed of mutually different channels to generate second audio signals, and extracts side information including header information and spatial cue information for each of the second audio signals; a first encoding means that encodes the second audio signals; and a second encoding means that encodes the side information to generate a bitstream. The downmixing unit divides the second audio signals into a plurality of sub-bands and extracts side information for each of the divided sub-bands.

Description

APPARATUS AND METHOD FOR CODING AND DECODING MULTI-OBJECT AUDIO SIGNAL USING DIFFERENT ANALYSIS STAGES

The present invention relates to an apparatus and method for encoding and decoding multi-object audio signals and, more particularly, to an apparatus and method for encoding and decoding multi-object audio signals composed of various channels using different analysis stages.

Here, a multi-object audio signal having various channels refers to a multi-object audio signal in which the individual audio objects have mutually different channel configurations (e.g., mono, stereo, and 5.1 channels).

The present invention was derived from research conducted as part of the user-created music content service technology development program of the Ministry of Culture, Sports and Tourism and the Korea Creative Content Agency [Project title: Development of user-created music content service technology].

With conventional audio encoding and decoding techniques, the user can only listen to audio content passively.

What is required is an apparatus and method that encode and decode, object by object, a plurality of audio objects composed of various channels, so that the user can control each audio object, each with its own channel configuration, as needed and combine a single piece of audio content in various ways to consume diverse audio objects.

In this regard, conventional SAC (Spatial Audio Coding) is a technique that represents, transmits, and restores a multichannel audio signal as a downmixed mono or stereo signal plus spatial cues, and it can transmit a high-quality multichannel audio signal even at a low bit rate.

However, SAC can encode and decode only audio signals that are multichannel but consist of a single object. It cannot encode and decode audio signals that are simultaneously multichannel and multi-object, for example a multi-object audio signal composed of mono, stereo, and 5.1-channel objects.

Conventional Binaural Cue Coding (BCC) can encode and decode multi-object audio signals, but because it is limited to mono-channel audio objects, it cannot encode and decode multi-object audio signals composed of channels other than mono.

Accordingly, an apparatus and method for object-by-object encoding and decoding of a plurality of audio objects composed of various channels are required, allowing the user to control each of a plurality of audio objects with mutually different channel configurations as needed and to combine a single piece of audio content in various ways, thereby consuming diverse audio objects.

According to an aspect of the present invention, there is provided an encoding apparatus including: a downmixing unit that downmixes a plurality of multi-object audio signals composed of mutually different channels to generate second audio signals, and extracts side information including header information and spatial cue information for each of the second audio signals; a first encoding means that encodes the second audio signals; and a second encoding means that encodes the side information to generate a bitstream, wherein the downmixing unit divides the second audio signals into a plurality of sub-bands and extracts side information for each of the divided sub-bands.

The second audio signals may be divided into a plurality of sub-bands according to frequency band.

Side information may be extracted separately for the high-frequency sub-bands and for the low-frequency sub-bands.

The low-frequency sub-bands may be frequency-transformed before the side information is extracted.

The frequency transform may be a discrete Fourier transform (DFT).
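The split analysis described above can be sketched as follows. This is a minimal illustration, not the patented filterbank: it assumes the sub-band sample sequences have already been produced by some analysis filterbank, and the function names and the `split_index` boundary between the low and high groups are hypothetical. Low-band sub-bands are passed through a naive DFT (for finer frequency resolution before cue extraction); high-band sub-bands are left as they are.

```python
import cmath
from typing import List, Sequence, Tuple


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (no normalization)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def split_and_transform(subbands: Sequence[Sequence[complex]], split_index: int
                        ) -> Tuple[List[List[complex]], List[List[complex]]]:
    """Partition sub-band signals into a low and a high group (hypothetical helper).

    Sub-bands with index < split_index form the low-frequency group and are
    DFT-transformed; the remaining high-frequency sub-bands are passed through.
    """
    low = [dft(sb) for sb in subbands[:split_index]]
    high = [list(sb) for sb in subbands[split_index:]]
    return low, high
```

Side information would then be extracted from `low` and `high` separately, which is the point of using different analysis stages.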

According to an aspect of the present invention, there is provided a decoding apparatus including: an input signal analyzer that restores a downmix audio signal from an input audio signal and extracts side information, including header information and spatial cue information, from the side-information bitstream contained in the input audio signal; an audio object extractor that restores the restored downmix audio signal into per-object audio signals using the side information extracted by the input signal analyzer; and an output unit that outputs the restored per-object audio signals as a multi-object audio signal using control information for the input audio signal, wherein the audio object extractor divides the restored downmix audio signal into a plurality of sub-bands and restores the per-object audio signals using the side information for each of the divided sub-bands.

The restored downmix audio signal may be divided into a plurality of sub-bands according to frequency band.

The high-frequency sub-bands and the low-frequency sub-bands may each be used, together with their respective side information, to restore the per-object audio signals.

The low-frequency sub-bands may be inverse frequency-transformed before the per-object audio signals are restored.

The inverse frequency transform may be an inverse discrete Fourier transform (IDFT).
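On the decoder side, the low-band spectra must be returned to the sub-band domain before the sub-bands are recombined. The sketch below, under the same assumptions as an encoder-side DFT split (hypothetical helper names, a naive transform in place of any optimized FFT), applies an inverse DFT with 1/n normalization to each low-band spectrum and appends the untouched high-band sub-bands.

```python
import cmath
from typing import List, Sequence


def idft(X: Sequence[complex]) -> List[complex]:
    """Naive inverse DFT with 1/n normalization."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def reassemble_subbands(low_spectra: Sequence[Sequence[complex]],
                        high_subbands: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """Inverse-transform the low-band group and rejoin it with the high-band group."""
    return [idft(spec) for spec in low_spectra] + [list(sb) for sb in high_subbands]
```

Because the forward DFT is unnormalized and the inverse divides by n, a DFT followed by this IDFT reproduces the original low-band samples up to floating-point error.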

According to an aspect of the present invention, there is provided an apparatus for encoding a multi-object audio signal composed of mutually different channels, including: downmixing means that downmixes the multi-object audio signal composed of mutually different channels into a single downmixed audio signal and extracts side information including header information and spatial cue information for each of the multi-object audio signals; encoding means that encodes the downmixed audio signal; and side information encoding means that generates a bitstream from the side information, wherein the header information includes identifier information for each of the multi-object audio signals composed of mutually different channels, and channel information for those multi-object audio signals.
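The header described above carries, per object, an identifier and channel information. The following is a minimal sketch of how such a header could be serialized; the byte layout (a one-byte object count followed by one-byte identifier and channel-count pairs), the `ObjectHeader` type, and the helper names are all hypothetical and are not the bitstream syntax of FIG. 6.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectHeader:
    object_id: int     # identifier information for one audio object
    num_channels: int  # channel information, e.g. 1 = mono, 2 = stereo, 6 = 5.1


def pack_headers(headers: List[ObjectHeader]) -> bytes:
    """Serialize a 1-byte object count, then one (id, channels) byte pair per object."""
    out = struct.pack("B", len(headers))
    for h in headers:
        out += struct.pack("BB", h.object_id, h.num_channels)
    return out


def unpack_headers(data: bytes) -> List[ObjectHeader]:
    """Inverse of pack_headers."""
    (count,) = struct.unpack_from("B", data, 0)
    return [ObjectHeader(*struct.unpack_from("BB", data, 1 + 2 * i)) for i in range(count)]
```

A decoder reading this header would know, before touching any spatial cues, how many objects exist and which channel configuration each one must be restored to.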

According to an aspect of the present invention, there is provided a method of encoding a multi-object audio signal composed of mutually different channels, including: a downmixing step of downmixing the multi-object audio signal composed of mutually different channels into a single downmixed audio signal and extracting side information including header information and spatial cue information for each of the multi-object audio signals; an encoding step of encoding the downmixed audio signal; and a side information encoding step of generating a bitstream from the side information, wherein the header information includes identifier information for each of the multi-object audio signals composed of mutually different channels, and channel information for those multi-object audio signals.

According to an aspect of the present invention, there is provided an apparatus for decoding a multi-object audio signal composed of mutually different channels, including: input signal analysis means that restores a downmix audio signal from an input audio signal and extracts side information, including header information and spatial cue information, from the side-information bitstream contained in the input audio signal; audio object extraction means that restores the restored downmix audio signal into per-object audio signals using the side information extracted by the input signal analysis means; and output means that outputs the restored per-object audio signals as a multi-object audio signal using control information for the input audio signal, wherein the header information includes identifier information for each of the multi-object audio signals composed of mutually different channels, and channel information for those multi-object audio signals.

According to an aspect of the present invention, there is provided a method of decoding a multi-object audio signal composed of mutually different channels, including: an input signal analysis step of restoring a downmix audio signal from an input audio signal and extracting side information, including header information and spatial cue information, from the side-information bitstream contained in the input audio signal; an audio object extraction step of restoring the restored downmix audio signal into per-object audio signals using the side information extracted in the input signal analysis step; and an output step of outputting the restored per-object audio signals as a multi-object audio signal using control information for the input audio signal, wherein the header information includes identifier information for each of the multi-object audio signals composed of mutually different channels, and channel information for those multi-object audio signals.

According to an aspect of the present invention, there is provided an apparatus for decoding a multi-object audio signal composed of mutually different channels, including: input signal analysis means that restores a downmix audio signal from an input audio signal and extracts side information, including header information and spatial cue information, from the side-information bitstream contained in the input audio signal; side information control means that controls the side information extracted by the input signal analysis means using control information for the input audio signal; and output means that outputs the restored downmix audio signal as a multi-object audio signal using the controlled side information, wherein the header information includes identifier information for each of the multi-object audio signals composed of mutually different channels, and channel information for those multi-object audio signals.

According to another aspect of the present invention, there is provided a method of decoding a multi-object audio signal composed of mutually different channels, including: an input signal analysis step of restoring a downmix audio signal from an input audio signal and extracting side information, including header information and spatial cue information, from the side-information bitstream contained in the input audio signal; a side information control step of controlling the side information extracted in the input signal analysis step using control information for the input audio signal; and an output step of outputting the restored downmix audio signal as a multi-object audio signal using the controlled side information, wherein the header information includes identifier information for each of the multi-object audio signals composed of mutually different channels, and channel information for those multi-object audio signals.

An apparatus and method are provided for encoding and decoding multi-object audio signals composed of various channels using different analysis stages, so that the user can consume audio content actively as needed.

By analyzing the low-band sub-bands and the high-band sub-bands with different methods, coding efficiency can be maximized within a given amount of information, which improves the sound quality of the restored object signals.

FIG. 1 is a block diagram of a multi-object audio encoding apparatus according to an embodiment of the present invention.
FIG. 2 is a detailed block diagram of the mono channel downmixer of FIG. 1 according to an embodiment of the present invention.
FIG. 3 is a detailed block diagram of the stereo channel downmixer of FIG. 1 according to an embodiment of the present invention.
FIG. 4 is a detailed block diagram of the multichannel downmixer of FIG. 1 according to an embodiment of the present invention.
FIG. 5 is a detailed block diagram of the second downmixer of FIG. 1 according to an embodiment of the present invention.
FIG. 6 is a diagram of the structure of the side-information bitstream generated by the side information encoder of FIG. 1 according to an embodiment of the present invention.
FIG. 7 is a detailed structural diagram of the side-information bitstream shown in FIG. 6 according to an embodiment of the present invention.
FIG. 8 is a detailed structural diagram of the side-information bitstream shown in FIG. 6 according to another embodiment of the present invention.
FIG. 9 is a block diagram of an apparatus for decoding multi-object audio signals of various channels according to an embodiment of the present invention.
FIG. 10 is a block diagram of an apparatus for decoding multi-object audio signals of various channels according to another embodiment of the present invention.
FIG. 11 is a flowchart of a multi-object audio encoding method according to an embodiment of the present invention.
FIG. 12 is a flowchart of a multi-object audio decoding method according to an embodiment of the present invention.
FIG. 13 is a flowchart of a multi-object audio decoding method according to another embodiment of the present invention.
FIG. 14 is a detailed structural diagram of a basic downmixer according to an embodiment of the present invention.
FIG. 15 illustrates signals divided into sub-bands according to an embodiment of the present invention.
FIG. 16 illustrates an example of windowing for each signal according to an embodiment of the present invention.
FIG. 17 is a flowchart of a downmixing method according to an embodiment of the present invention.
FIG. 18 is a detailed configuration diagram of a SAC decoder according to an embodiment of the present invention.
FIG. 19 illustrates an example of windowing for each signal according to an embodiment of the present invention.
FIG. 20 illustrates an example of overlap-add for each windowed signal according to an embodiment of the present invention.

Hereinafter, some embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not limited or restricted by these embodiments. Like reference numerals in the drawings denote like elements.

FIG. 1 illustrates a multi-object audio encoding apparatus according to an embodiment of the present invention. In this embodiment, the input audio objects are mono, stereo, and 5.1-channel objects, respectively.

As shown in FIG. 1, the multi-object audio encoding apparatus according to an embodiment of the present invention includes a first downmixer 101, a second downmixer 103, an audio encoder 105, a side information encoder 107, and a multiplexer 109.

The first downmixer 101 includes a mono channel downmixer 111, a stereo channel downmixer 113, and a multichannel downmixer 115.

The first downmixer 101 uses the header information of the input audio objects to identify each input multi-object audio signal as mono, stereo, or multichannel, and groups the signals by channel configuration. The multi-object audio signals are thus grouped by channel configuration and downmixed by the corresponding downmixer 111, 113, or 115.
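The routing step can be sketched as below, assuming each object is represented by a name and the channel count read from its header; the function name and the `"mono"`, `"stereo"`, `"multichannel"` group keys are hypothetical labels for the inputs of downmixers 111, 113, and 115.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_channels(objects: List[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Route each (name, channel_count) object to the mono, stereo, or multichannel group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, num_channels in objects:
        if num_channels == 1:
            groups["mono"].append(name)          # -> mono channel downmixer 111
        elif num_channels == 2:
            groups["stereo"].append(name)        # -> stereo channel downmixer 113
        else:
            groups["multichannel"].append(name)  # -> multichannel downmixer 115
    return dict(groups)
```

For example, a mono vocal, a stereo guitar, a mono bass, and a 5.1 band bed would yield two mono objects, one stereo object, and one multichannel object.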

The first downmixer 101 also extracts a downmixed audio signal and side information including spatial cues from the input audio objects.

That is, sound sources with the same channel configuration are grouped together and input to the first downmixer 101.

The mono channel downmixer 111 extracts a downmix signal and side information including spatial cues from the input mono audio objects; the stereo channel downmixer 113 does the same for the input stereo audio objects; and the multichannel downmixer 115 does the same for the input multichannel (e.g., 5.1-channel) audio objects.

The audio encoder 105 encodes the second downmix signal output from the second downmixer 103.

The side information encoder 107 generates a side-information bitstream using the side information output from the first downmixer 101 and the side information output from the second downmixer 103. The information contained in the side-information bitstream is described in detail with reference to FIG. 6.

The multiplexer 109 multiplexes the encoded signal received from the audio encoder 105 and the side-information bitstream received from the side information encoder 107 to generate the bitstream transmitted to the decoding apparatus.
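The text does not specify the multiplexed format, so the following is only a sketch of one possible container: two big-endian 32-bit length fields followed by the encoded audio payload and the side-information payload. The layout and the `multiplex`/`demultiplex` names are hypothetical; the sketch only shows that the decoder must be able to separate the two streams again.

```python
import struct
from typing import Tuple


def multiplex(audio_payload: bytes, side_info: bytes) -> bytes:
    """Prefix both payloads with their lengths (hypothetical container format)."""
    return struct.pack(">II", len(audio_payload), len(side_info)) + audio_payload + side_info


def demultiplex(stream: bytes) -> Tuple[bytes, bytes]:
    """Recover (audio_payload, side_info) from a stream built by multiplex()."""
    audio_len, side_len = struct.unpack_from(">II", stream, 0)
    start = struct.calcsize(">II")  # 8 bytes of length fields
    audio = stream[start:start + audio_len]
    side = stream[start + audio_len:start + audio_len + side_len]
    return audio, side
```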

The first downmix signal output from the first downmixer 101 is a stereo or mono signal. That is, the downmix signal output from the mono channel downmixer 111 is a mono signal, and the downmix signals output from the other downmixers 113 and 115 are mono or stereo signals.

The second downmixer 103 performs a second downmix on the first downmix signal output from the first downmixer 101, outputs the result, and extracts side information including the spatial cues analyzed during the second downmixing. The second downmix signal is a mono or stereo signal, depending on the mode.

The side information includes the spatial cues and header information for restoring and controlling the audio signals. The side information is described in detail with reference to FIG. 6.

FIG. 2 is a detailed block diagram of the mono channel downmixer 111 of FIG. 1 according to an embodiment of the present invention. In this embodiment, there are N input mono audio objects (m1, …, mN).

As shown in FIG. 2, the mono channel downmixer 111 includes basic downmixers 1 (201) arranged in a cascade.

The number of basic downmixers 1 (201) in the mono channel downmixer 111 is determined by the number N of mono audio objects: with N mono audio objects there are N-1 basic downmixers 1 (201), and with a single mono audio object the input signal is bypassed without any basic downmixer.

Alternatively, depending on the embodiment, a single basic downmixer 1 may be used N-1 times in a cascaded manner.

Basic downmixer 1 downmixes two input signals into one downmixed mono signal and extracts side information including the spatial cues of the input signals.
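A two-input basic downmixer can be sketched as follows. The text does not state the downmix rule or which cues each stage emits, so this sketch assumes a plain sample-wise sum as the downmix and a channel level difference in dB as the cue; the function name `basic_downmix_1` and the `EPS` guard are hypothetical. In the patent the cues are extracted per frequency band, whereas here the whole input sequence is treated as one band for brevity.

```python
import math
from typing import Dict, List, Sequence, Tuple

EPS = 1e-12  # guards against log10(0) when an input is silent


def basic_downmix_1(x1: Sequence[float], x2: Sequence[float]
                    ) -> Tuple[List[float], Dict[str, float]]:
    """Downmix two equal-length signals to one mono signal and return a level cue.

    Assumptions: the downmix is a sample-wise sum, and the cue is
    CLD = 10 * log10(P1 / P2), with P the signal energy in the band.
    """
    if len(x1) != len(x2):
        raise ValueError("inputs must have the same length")
    downmix = [a + b for a, b in zip(x1, x2)]
    p1 = sum(a * a for a in x1)
    p2 = sum(b * b for b in x2)
    cld_db = 10.0 * math.log10((p1 + EPS) / (p2 + EPS))
    return downmix, {"CLD": cld_db}
```

A decoder holding the downmix and the CLD can redistribute the band energy between the two inputs, which is why only the cue, not the second signal, needs to be transmitted.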

The first basic downmixer 1 (201a) generates one downmixed mono signal from two of the mono audio objects input to the mono channel downmixer 111 and extracts side information including spatial cues.

Next, the second basic downmixer 1 (201b) generates one downmixed mono signal from the downmixed mono signal output by the first basic downmixer 1 (201a) and another mono audio object input to the mono channel downmixer 111, and extracts side information including spatial cues.

The (N-1)-th basic downmixer 1 (201d) generates one downmixed mono signal from the downmixed mono signal output by the (N-2)-th basic downmixer 1 (not shown) and the last mono audio object input to the mono channel downmixer 111, and extracts side information including spatial cues.
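The cascade above can be sketched as a left fold: the running downmix is combined with the next object at each stage, so N objects pass through N-1 two-input stages, and a single object is bypassed with zero stages. The sample-wise sum used as each stage's downmix is an assumption, the helper name is hypothetical, and per-stage cue extraction is omitted to keep the structure visible.

```python
from typing import List, Sequence, Tuple


def cascade_downmix(objects: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Fold N mono objects through N-1 two-input stages; return (downmix, stage_count)."""
    if not objects:
        raise ValueError("need at least one object")
    if any(len(o) != len(objects[0]) for o in objects):
        raise ValueError("all objects must have the same length")
    current = list(objects[0])  # a single object is bypassed unchanged
    stages = 0
    for nxt in objects[1:]:
        # one stage: previous downmix + next object -> new mono downmix
        # (a real stage would also emit its spatial cues here)
        current = [a + b for a, b in zip(current, nxt)]
        stages += 1
    return current, stages
```

The same function models both embodiments: N-1 physical basic downmixers in series, or one basic downmixer reused N-1 times.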

Here, a spatial cue is information used in encoding and decoding an audio signal. It is extracted in the frequency domain and includes information such as the level difference, delay difference, and correlation between the two signals input to the basic downmixer 1 (201).

Examples of spatial cues usable in an embodiment of the present invention include, without limitation: the channel (audio signal) level difference (CLD), which represents the power gain information of the audio signals; the inter-channel level difference (ICLD), the energy ratio between audio signals; the inter-channel time difference (ICTD); the inter-channel correlation (ICC), which represents the correlation between audio signals; and virtual source location information.
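Two of the listed cues can be estimated as shown below. These are common textbook estimators, not necessarily the exact definitions used in this patent: ICC is taken as the normalized zero-lag cross-correlation, and ICTD as the lag (in samples) that maximizes the cross-correlation within a hypothetical search range `max_lag`. Both operate on real-valued samples of one band for simplicity.

```python
import math
from typing import Sequence


def icc(x1: Sequence[float], x2: Sequence[float]) -> float:
    """Normalized zero-lag cross-correlation in [-1, 1]; 0.0 if either input is silent."""
    num = sum(a * b for a, b in zip(x1, x2))
    den = math.sqrt(sum(a * a for a in x1) * sum(b * b for b in x2))
    return num / den if den > 0 else 0.0


def ictd(x1: Sequence[float], x2: Sequence[float], max_lag: int) -> int:
    """Lag in samples maximizing the cross-correlation; positive means x2 lags x1."""
    n = len(x1)

    def xcorr(lag: int) -> float:
        return sum(x1[t] * x2[t + lag] for t in range(n) if 0 <= t + lag < n)

    return max(range(-max_lag, max_lag + 1), key=xcorr)
```

For two fully coherent signals differing only in gain, `icc` returns 1.0; for an impulse delayed by two samples, `ictd` returns 2.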

Here, the side information includes the spatial cues and header information for restoring and controlling the audio signals. The side information is described in detail with reference to FIG. 6.

FIG. 3 is a detailed block diagram of the stereo channel downmixer 113 of FIG. 1 according to an embodiment of the present invention. In this embodiment, the input stereo audio objects consist of M left signals (SL1, …, SLM) and M right signals (SR1, …, SRM).

The stereo audio objects input to the stereo channel downmixer 113 are separated into their left and right signals, which are grouped separately.

As shown in FIG. 3, the stereo channel downmixer 113 includes basic downmixers 1 (201). Downmixing M left signals and M right signals requires 2*(M-1) basic downmixers 1 (201) in the stereo channel downmixer 113. As described with reference to FIG. 2, depending on the embodiment, a single basic downmixer 1 may instead be used 2*(M-1) times.

As described with reference to FIG. 2, the M-1 basic downmixers 1 (201la to 201le) for the M left signals analyze their inputs to generate one downmixed left signal and extract side information including spatial cues.

Likewise, the M-1 basic downmixers 1 (201ra to 201re) for the M right signals analyze their inputs to generate one downmixed right signal and extract side information including spatial cues.

As described with reference to FIG. 2, when there is only one stereo audio object, the input left and right signals may be bypassed.

The stereo channel downmixer 113 thus outputs a stereo downmix signal, consisting of the downmixed left signal and the downmixed right signal, and extracts side information including spatial cues.
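The stereo path can be sketched as two independent cascades, one over the left signals and one over the right signals, which makes the 2*(M-1) stage count explicit. As in the mono case, a sample-wise sum stands in for each basic downmixer, cue extraction is omitted, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _fold(signals: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Cascade-sum the signals; return (downmix, number of two-input stages)."""
    if not signals:
        raise ValueError("need at least one signal")
    out = list(signals[0])
    for s in signals[1:]:
        out = [a + b for a, b in zip(out, s)]
    return out, len(signals) - 1


def stereo_channel_downmix(objects: Sequence[Tuple[Sequence[float], Sequence[float]]]
                           ) -> Tuple[Tuple[List[float], List[float]], int]:
    """Downmix M (left, right) stereo objects to one stereo pair with 2*(M-1) stages."""
    left_dm, n_left = _fold([l for l, _ in objects])
    right_dm, n_right = _fold([r for _, r in objects])
    return (left_dm, right_dm), n_left + n_right
```

With M = 3 objects, four stages are used; with M = 1, both channels are bypassed and the count is zero.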

FIG. 4 is a detailed block diagram of the multichannel downmixer (115) of FIG. 1 according to an embodiment of the present invention. In this embodiment, P 5.1-channel audio objects are input.

As shown in FIG. 4, the multichannel downmixer (115) is a downmixer based on conventional MPEG Surround, or spatial-cue-based Spatial Audio Coding (SAC), technology. It extracts side information including spatial cues from the multichannel audio signal and downmixes the audio signal into a mono or stereo downmix signal.

That is, the multichannel downmixer (115) extracts and transmits spatial cues from its input, the P multichannel audio objects, and downmixes the audio signals into a mono or stereo signal. In most cases, there is only one multichannel audio object.

FIG. 5 is a detailed block diagram of the second downmixer (103) of FIG. 1 according to an embodiment of the present invention.

The second downmixer (103) downmixes the signals output from the first downmixer (101) once more (second downmixing) to output a stereo downmix signal, and extracts side information including spatial cues.

As shown in FIG. 5, the second downmixer (103) includes basic downmixers 1 (201f, 201g) and basic downmixer 2 (501).

When the signals downmixed by the stereo channel downmixer (113) and the multichannel downmixer (115) are stereo, each downmixed stereo signal is split and grouped into its left and right signals. Basic downmixers 1 (201f and 201g) downmix the grouped left signals and the grouped right signals, respectively. The mono downmix signal output by each of basic downmixers 1 (201f and 201g) is the representative downmix signal of the left or right signals.

That is, basic downmixer 1 (201f) downmixes the downmixed left signal output from the stereo channel downmixer (113) together with the downmixed left signal output from the multichannel downmixer (115), outputs one representative downmix left signal, and extracts side information.

Similarly, basic downmixer 1 (201g) downmixes the downmixed right signal output from the stereo channel downmixer (113) together with the downmixed right signal output from the multichannel downmixer (115), outputs one representative downmix right signal, and extracts side information.

As described with reference to FIG. 2, in some embodiments a single basic downmixer 1 may be used twice.

Next, basic downmixer 2 (501) downmixes the mono downmix signal output from the mono channel downmixer (111) together with the representative downmix left and right signals output from basic downmixers 1 (201f and 201g), outputs an overall downmix left signal and an overall downmix right signal, and extracts side information including spatial cues.
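The signal flow of FIG. 5 can be sketched as follows, under a loud assumption: Equation 2 is not reproduced in this text, so basic downmixer 2 (501) is modeled here with a hypothetical three-to-two rule that adds the mono downmix to both outputs with gain $g = 1/\sqrt{2}$, and basic downmixers 1 (201f, 201g) are modeled as plain sums. Spatial cue extraction is omitted; the sketch only shows which signals are combined at each stage.

```python
import math
from typing import List, Tuple

Signal = List[float]


def add(a: Signal, b: Signal) -> Signal:
    """Sample-wise sum, standing in for a basic downmixer 1 without cue extraction."""
    return [x + y for x, y in zip(a, b)]


def second_downmix(mono: Signal,
                   stereo_dmx: Tuple[Signal, Signal],
                   mc_dmx: Tuple[Signal, Signal],
                   g: float = 1 / math.sqrt(2)) -> Tuple[Signal, Signal]:
    # Basic downmixer 1 (201f): left of the stereo-channel downmix + left of the multichannel downmix
    rep_left = add(stereo_dmx[0], mc_dmx[0])
    # Basic downmixer 1 (201g): the same for the right signals
    rep_right = add(stereo_dmx[1], mc_dmx[1])
    # Basic downmixer 2 (501): hypothetical three-to-two rule (mono added to both sides with gain g)
    out_left = [l + g * m for l, m in zip(rep_left, mono)]
    out_right = [r + g * m for r, m in zip(rep_right, mono)]
    return out_left, out_right
```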

Here, the side information includes spatial cues and header information used to restore and control the audio signal. The side information is described in detail with reference to FIG. 6.

Basic downmixer 1 (201) and basic downmixer 2 (501) downmix their input audio signals based on Equation 1 and Equation 2, respectively.

Here, $w_b^{ij}$ is a weighting factor that adjusts the downmix level of an input audio signal, and $S_b^j(f)$ is an input audio signal of basic downmixer 1 (201) or basic downmixer 2 (501): a mono signal, or the left or right signal of a stereo signal. The subscript b is the subband index, so each weighting factor $w_b^{ij}$ can be defined per subband.

The weighting factors may be defined differently depending on how the input audio objects are meant to be presented. For example, for a given mono signal $S_b^j(f)$ to be encoded as the dominant signal, its weighting factor may be set to a relatively large value. In Equation 1, if $w_b^{11} = 0.7$ and $w_b^{12} = 0.3$, the downmix signal is $S_b^k(f) = 0.7\,S_b^1(f) + 0.3\,S_b^2(f)$, so $S_b^1(f)$ is downmixed as the dominant signal.
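Equation 1 itself is not reproduced in this text; from the numerical example above it appears to take the form $S_b^k(f) = w_b^{11} S_b^1(f) + w_b^{12} S_b^2(f)$. The sketch below applies that assumed form with one weight pair per subband; the subband boundary list `bounds` (playing the role of $A_b$) is a hypothetical parameter.

```python
from typing import List, Sequence


def weighted_downmix(s1: Sequence[complex], s2: Sequence[complex],
                     bounds: Sequence[int],
                     w11: Sequence[float], w12: Sequence[float]) -> List[complex]:
    """Two-input weighted downmix (assumed form of Equation 1).

    Subband b covers frequency bins bounds[b] .. bounds[b+1]-1, and
    (w11[b], w12[b]) are that subband's weighting factors."""
    out: List[complex] = [0j] * len(s1)
    for b in range(len(bounds) - 1):
        for f in range(bounds[b], bounds[b + 1]):
            out[f] = w11[b] * s1[f] + w12[b] * s2[f]
    return out
```

With $w_b^{11} = 0.7$ and $w_b^{12} = 0.3$ in a subband, that subband of the downmix is dominated by the first input, as in the example above.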

The weighting factors may be determined by a constraint condition that reflects the intended presentation of the downmix signal. Here, the constraint is a constraint on the sound scene: for example, for the violin and guitar signals in the downmixed audio signal to be played back at a ratio of 0.7 to 0.3, the respective weighting factors may be set to 0.7 and 0.3. The constraint information is determined by external input, for example from the system or a user.

The weighting factors must also be reflected in the spatial cue level information. For example, when the channel level difference (CLD) is used as the spatial cue, the spatial cue level information for Equation 1 can be estimated as in Equation 3.

Here, P() is a power operator used to compute the power sum of each signal within a subband, and $A_b$ and $A_{b+1}$ denote the boundaries of subband b.
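Equation 3 is likewise not reproduced here. A minimal sketch, assuming it has the common form $\mathrm{CLD}_b = 10\log_{10}\big(P(w_b^{11}S_b^1)/P(w_b^{12}S_b^2)\big)$ with $P(\cdot)$ summing $|S(f)|^2$ over the bins $A_b \le f < A_{b+1}$, shows how the weights enter the cue: each weight contributes its square to the subband power. The small `eps` guarding against silent subbands is an added assumption.

```python
import math
from typing import List, Sequence


def subband_power(s: Sequence[complex], lo: int, hi: int) -> float:
    """P(): sum of |S(f)|^2 over bins lo <= f < hi (assumed boundary convention)."""
    return sum(abs(x) ** 2 for x in s[lo:hi])


def weighted_cld(s1: Sequence[complex], s2: Sequence[complex], bounds: Sequence[int],
                 w11: Sequence[float], w12: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Per-subband CLD in dB between the weighted inputs of a two-input downmix."""
    clds = []
    for b in range(len(bounds) - 1):
        lo, hi = bounds[b], bounds[b + 1]
        p1 = (w11[b] ** 2) * subband_power(s1, lo, hi)  # P(w11 * S1) = w11^2 * P(S1)
        p2 = (w12[b] ** 2) * subband_power(s2, lo, hi)
        clds.append(10.0 * math.log10((p1 + eps) / (p2 + eps)))
    return clds
```

For two equal-power inputs weighted 0.7 and 0.3, the cue is $20\log_{10}(0.7/0.3) \approx 7.36$ dB even though the unweighted inputs have identical levels, which is why the weights must be reflected in the cue.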

Basic downmixer 2 (501) extracts spatial cues in the same way as the TTT (three-to-two) box of MPEG Surround.

FIG. 6 is a diagram of the structure of the side information bitstream generated by the side information encoder (107) of FIG. 1 according to an embodiment of the present invention.

As shown in FIG. 6, the side information bitstream includes header information and spatial cues.

The header information contains the information needed to restore and reproduce a multi-object audio signal composed of various channels. By defining the channel configuration of each audio object and the object's ID, it provides the decoding information for mono, stereo, and multichannel audio objects.

For example, an identification ID and per-object information may be defined so that a particular encoded audio object can be identified as either a mono or a stereo audio signal.

In one embodiment, the header information may include Spatial Audio Coding (SAC) header information, audio object information, and preset information.

In one embodiment, the SAC header information is generated during spatial-cue-based audio encoding and may include time-slot information. The SAC header information is extracted while the first downmixer (101) and the second downmixer (103) extract side information.

In one embodiment, the audio object information includes information identifying whether each downmixed audio object is mono, stereo, or multichannel, together with object ID information. For example, it includes the number of audio objects per channel type (the numbers of mono, stereo, and multichannel audio objects) and the index information of the audio objects of each channel type (the ID, and information identifying whether the object is mono, stereo, or multichannel).

In one embodiment, the preset information is supplementary header information in which control information for each object is predefined.

For example, the preset information includes preset mode information and preset mode support information. The preset mode information may include, for example, a karaoke mode, a solo object extraction mode (e.g., extracting a guitar or piano performance), preferred rendering information, and default playback mode setting information.

The preset mode support information includes, for example, vocal index information to support the karaoke mode; the index of the corresponding object to support the solo object extraction mode; per-object rendering information (rotation, elevation, speed, etc.) to support preferred rendering; and optimal rendering information for each audio object to support the default stereo and multichannel playback mode settings.

The spatial cues included in the side information contain per-object spatial cue information for the input multi-object audio signal.

The format of the side information may be configured in various ways at the designer's discretion.
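As one possible in-memory layout of the side information of FIG. 6, the hypothetical dataclasses below group the SAC header data (reduced here to a time-slot count), the audio object information, the preset information, and per-object spatial cues. All field names and preset mode strings are illustrative assumptions, since the format is left to the designer.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class ChannelType(Enum):
    MONO = 1
    STEREO = 2
    MULTICHANNEL = 3


@dataclass
class AudioObjectInfo:
    object_id: int
    channel_type: ChannelType


@dataclass
class PresetInfo:
    mode: str = "default_playback"  # hypothetical values, e.g. "karaoke", "solo_extraction"
    support_index: int = -1         # e.g. vocal index (karaoke) or soloed object index


@dataclass
class SideInfo:
    sac_time_slots: int                       # stands in for the SAC header information
    objects: List[AudioObjectInfo]            # audio object information (ID + channel type)
    preset: PresetInfo = field(default_factory=PresetInfo)
    cues: List[Tuple[float, float]] = field(default_factory=list)  # (CLD in dB, ICC) pairs

    def object_counts(self) -> Dict[ChannelType, int]:
        """Number of audio objects per channel type, as carried in the header."""
        counts = {t: 0 for t in ChannelType}
        for obj in self.objects:
            counts[obj.channel_type] += 1
        return counts
```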

FIG. 7 is a detailed structure diagram of the side information bitstream of FIG. 6 according to an embodiment of the present invention, showing the side information for a multi-object audio signal composed of mono and stereo channels.

As shown in FIG. 7, the header information includes the number of audio objects per channel type (the numbers of mono and stereo audio objects) and the index information of the audio objects of each channel type (the ID, and information identifying whether the object is mono, stereo, or multichannel); the side information bitstream also includes spatial cues. In the embodiment of FIG. 7, CLD and ICC are used as example spatial cues.

As shown in FIG. 7, the spatial cues (CLD, ICC) corresponding to each mono and stereo object are included in the side information. That is, the spatial cue information for every input audio object must be included in the side information.

FIG. 8 is a detailed structure diagram of the side information bitstream of FIG. 6 according to another embodiment of the present invention, showing the side information for a multi-object audio signal composed of mono, stereo, and multichannel objects.

As shown in FIG. 8, the header information includes the number of audio objects per channel type (the numbers of mono, stereo, and multichannel audio objects) and the index information of the audio objects of each channel type (the ID, and information identifying whether the object is mono, stereo, or multichannel); the side information bitstream also includes spatial cues. In the embodiment of FIG. 8, CLD and ICC are used as example spatial cues.

Here, the spatial cues for multichannel objects may be cascade-multiplexed with the spatial cues for mono and stereo objects and represented as a single side information bitstream. The spatial cues extracted by the mono channel downmixer (111), the stereo channel downmixer (113), and the second downmixer (103) are the spatial cues for the mono and stereo audio objects in FIG. 8, and the spatial cues extracted by the multichannel downmixer (115) are the spatial cues for the multichannel audio objects in FIG. 8.
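A minimal serialization sketch of the FIG. 8 layout follows, assuming hypothetical field widths: a header carrying the per-type object counts and cue counts, one (ID, channel type) entry per object, and then the quantized (CLD, ICC) cue indices, with the multichannel cues cascaded after the mono and stereo cues. The byte layout is an illustration only; it is not the bitstream syntax of the patent or of MPEG Surround.

```python
import struct
from typing import List, Tuple

Cue = Tuple[int, int]  # hypothetical (quantized CLD index, quantized ICC index)
HEADER = ">BBBHH"      # n_mono, n_stereo, n_multichannel, n_mono/stereo cues, n_multichannel cues
OBJECT = ">HB"         # object ID, channel type (1 = mono, 2 = stereo, 3 = multichannel)
CUE = ">bB"            # signed CLD index, unsigned ICC index


def pack_side_info(objects: List[Tuple[int, int]], ms_cues: List[Cue], mc_cues: List[Cue]) -> bytes:
    counts = [sum(1 for _, t in objects if t == k) for k in (1, 2, 3)]
    out = bytearray(struct.pack(HEADER, *counts, len(ms_cues), len(mc_cues)))
    for obj_id, ch_type in objects:
        out += struct.pack(OBJECT, obj_id, ch_type)
    # Cascaded multiplexing: mono/stereo cues first, multichannel cues appended after them
    for cld, icc in ms_cues + mc_cues:
        out += struct.pack(CUE, cld, icc)
    return bytes(out)


def unpack_side_info(data: bytes):
    n_mono, n_stereo, n_mc, n_ms_cues, n_mc_cues = struct.unpack_from(HEADER, data, 0)
    pos = struct.calcsize(HEADER)
    objects = []
    for _ in range(n_mono + n_stereo + n_mc):
        objects.append(struct.unpack_from(OBJECT, data, pos))
        pos += struct.calcsize(OBJECT)
    cues = []
    for _ in range(n_ms_cues + n_mc_cues):
        cues.append(struct.unpack_from(CUE, data, pos))
        pos += struct.calcsize(CUE)
    return objects, cues[:n_ms_cues], cues[n_ms_cues:]
```

Because the header states how many cues belong to the mono/stereo section, a decoder can split the cascaded cue list without knowing how many cues each object type contributes.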

FIG. 9 is a block diagram of an apparatus for decoding multi-object audio signals of various channels according to an embodiment of the present invention.

The decoding apparatus according to this embodiment extracts spatial cue information from an audio bitstream generated by, for example, the encoding apparatus of FIG. 1, and predicts the information of each channel from the extracted spatial cues, thereby restoring a multi-object audio signal composed of various channels (an audio signal containing mono, stereo, and multichannel audio objects).

As shown in FIG. 9, the decoding apparatus according to this embodiment includes a demultiplexer (DEMUX) (901), an audio decoder (903), a side information analyzer (905), an audio object extractor (907), and a rendering processor (909).

The demultiplexer (901) separates an audio information bitstream and a side information bitstream from an audio bitstream generated by, for example, the encoding apparatus of FIG. 1.

The audio decoder (903) restores the downmix audio signal from the audio information bitstream separated by the demultiplexer (901).

The side information analyzer (905) extracts side information, including the spatial cue information of each audio object, from the side information bitstream separated by the demultiplexer (901).

The audio object extractor (907) restores the audio signal of each object from the downmix audio signal using the header information of the side information extracted by the side information analyzer (905).

The header information contains the number of audio objects per channel type (the numbers of mono, stereo, and multichannel audio objects) and the index information of the audio objects of each channel type (the ID, and information identifying whether the object is mono, stereo, or multichannel). The audio object extractor (907) can therefore restore the audio signal of each object from the downmix audio signal output by the audio decoder (903), based on the header information and spatial cue information of the side information extracted by the side information analyzer (905).
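How spatial cues let the extractor split a downmix can be illustrated for the simplest case of two objects sharing one mono downmix. This sketch assumes a gain-only parametric split with no decorrelation: for each subband, the CLD is converted to a power ratio $r = 10^{\mathrm{CLD}/10}$, and the downmix is scaled by $\sqrt{r/(1+r)}$ and $\sqrt{1/(1+r)}$, which preserves the subband power and reproduces the transmitted level difference. It is a textbook-style illustration, not the extraction procedure of the patent.

```python
import math
from typing import List, Sequence, Tuple


def split_by_cld(downmix: Sequence[complex], bounds: Sequence[int],
                 clds: Sequence[float]) -> Tuple[List[complex], List[complex]]:
    """Estimates two object components from a mono downmix using one CLD (dB) per subband.

    Subband b covers bins bounds[b] .. bounds[b+1]-1. The gains satisfy
    g1^2 + g2^2 = 1 and g1^2 / g2^2 = 10^(CLD/10)."""
    est1: List[complex] = [0j] * len(downmix)
    est2: List[complex] = [0j] * len(downmix)
    for b, cld in enumerate(clds):
        r = 10.0 ** (cld / 10.0)  # power ratio of object 1 to object 2
        g1 = math.sqrt(r / (1.0 + r))
        g2 = math.sqrt(1.0 / (1.0 + r))
        for f in range(bounds[b], bounds[b + 1]):
            est1[f] = g1 * downmix[f]
            est2[f] = g2 * downmix[f]
    return est1, est2
```

Because the downmix may carry weighting factors, the estimates in this sketch correspond to the weighted components $w_b^{1j} S_b^j(f)$; dividing by the weights would be a further, separate step.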

The rendering processor (909) receives, from an external source, rendering control information for each audio object restored by the audio object extractor (907) (e.g., the position and size of the audio object in space) and output channel control information (e.g., 5.1 channels, 7.1 channels, or stereo). It then positions the audio signal of each object restored by the audio object extractor (907) accordingly and outputs the resulting audio signal.
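For the rendering step, a constant-power stereo panner is one simple way to position each restored object. In this sketch the rendering control information is reduced to a hypothetical pan position in $[-1, 1]$ and a gain per object, and only a stereo output layout is handled; 5.1 or 7.1 output would need a multichannel panning law instead.

```python
import math
from typing import List, Sequence, Tuple


def render_stereo(objects: Sequence[Sequence[float]], positions: Sequence[float],
                  gains: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Mixes restored object signals into stereo with constant-power panning.

    positions: -1.0 = hard left, 0.0 = center, +1.0 = hard right (hypothetical control format)."""
    n = len(objects[0])
    left = [0.0] * n
    right = [0.0] * n
    for sig, pos, gain in zip(objects, positions, gains):
        theta = (pos + 1.0) * math.pi / 4.0  # maps [-1, 1] to [0, pi/2]
        gl, gr = gain * math.cos(theta), gain * math.sin(theta)  # gl^2 + gr^2 = gain^2
        for i, x in enumerate(sig):
            left[i] += gl * x
            right[i] += gr * x
    return left, right
```

The cosine/sine pair keeps the total power of each object independent of its position, so moving an object does not change its perceived loudness.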

FIG. 10 is a block diagram of an apparatus for decoding multi-object audio signals of various channels according to another embodiment of the present invention. Unlike the embodiment of FIG. 9, which renders the restored audio signal of each object, the embodiment of FIG. 10 first controls the side information and then renders the audio objects according to the controlled side information to restore the audio signal.

As shown in FIG. 10, the decoding apparatus according to this embodiment includes a demultiplexer (901), an audio decoder (903), a side information analyzer (905), a side information controller (1001), and a SAC decoder (1003).

The demultiplexer (901), the audio decoder (903), and the side information analyzer (905) may have the same configuration as the corresponding components of FIG. 9.

The side information controller (1001) receives, from an external source, rendering control information (e.g., the position and size of the audio objects in space) and output channel control information (e.g., 5.1 channels, 7.1 channels, or stereo) for the downmix audio signal restored by the audio decoder (903), and controls the side information extracted by the side information analyzer (905) (e.g., the signal level information and correlation information of each audio object) according to this external input.
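One way the side information controller could act on a CLD is sketched below, assuming the rendering control information reduces to a linear gain for each of two objects that share one CLD. Scaling object 1 by $g_1$ and object 2 by $g_2$ multiplies their power ratio by $(g_1/g_2)^2$, so the CLD shifts by $20\log_{10}(g_1/g_2)$ dB. The `floor_db` clamp that stands in for a muted object (e.g., the vocal in karaoke mode) is an added assumption.

```python
import math
from typing import List, Sequence


def control_cld(clds: Sequence[float], gain1: float, gain2: float,
                floor_db: float = -60.0) -> List[float]:
    """Shifts per-subband CLDs (dB) so object 1 is scaled by gain1 and object 2 by gain2.

    Gains below the floor are clamped so that a muted object (gain 0) stays finite in dB."""
    floor = 10.0 ** (floor_db / 20.0)
    g1 = max(gain1, floor)
    g2 = max(gain2, floor)
    delta = 20.0 * math.log10(g1 / g2)  # power ratio scales by (g1/g2)^2
    return [c + delta for c in clds]
```

This only changes the relative level of the two objects; a complete controller would also adjust the overall gain and, where needed, the correlation (ICC) information.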

The SAC decoder (1003) uses the side information controlled by the side information controller (1001) to restore the downmix audio signal output by the audio decoder (903) into multi-object audio signals of various channels.

The SAC decoder (1003) restores the audio signal of each object from the downmix audio signal using the header information of the side information controlled by the side information controller (1001).

Because the header information contains the number of audio objects per channel type (the numbers of mono, stereo, and multichannel audio objects) and the index information of the audio objects of each channel type (the ID, and information identifying whether the object is mono, stereo, or multichannel), the SAC decoder (1003) can restore the audio signal of each object from the downmix audio signal output by the audio decoder (903), based on the header information and spatial cue information of the side information controlled by the side information controller (1001).

FIG. 11 is a flowchart of a multi-object audio encoding method according to an embodiment of the present invention.

As shown in FIG. 1, input multi-object audio signals of various channels are identified as mono, stereo, or multichannel from the header information of each input audio object, and are grouped by channel type (S1101).

Next, the sound sources grouped by channel type in step S1101 are downmixed, and side information including spatial cues is extracted (S1103).

That is, a downmix signal and side information including spatial cues are extracted separately from the input mono audio objects, from the input stereo audio objects, and from the input multichannel (e.g., 5.1-channel) audio objects.

The first downmix signal output from step S1103 is a stereo or mono signal: the downmix signal obtained from the input mono audio objects is mono, while the downmix signals obtained from the stereo and multichannel audio objects are mono or stereo.

Next, the first downmix signal output from step S1103 undergoes a second downmix, and side information including the spatial cues analyzed during the second downmix is extracted (S1105). The second downmix signal is mono or stereo, depending on the mode.

Next, the second downmix signal output from step S1105 is encoded (S1107).

Next, a side information bitstream is generated from the side information output in step S1103 and the side information output in step S1105 (S1109).

Next, the signal encoded in step S1107 and the side information bitstream generated in step S1109 are multiplexed to generate the bitstream transmitted to the decoding apparatus (S1111).
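The order of steps S1101 to S1111 can be captured as a small pipeline. In this sketch only the grouping of step S1101 is implemented; the downmixers, core encoder, side information encoder, and multiplexer are passed in as hypothetical callables, so the function documents the data flow rather than any particular codec.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Groups = Dict[str, List[int]]


def group_by_channel_type(objects: List[Tuple[int, str]]) -> Groups:
    """S1101: groups object IDs by channel type ('mono', 'stereo', 'multichannel')."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for obj_id, channel_type in objects:
        groups[channel_type].append(obj_id)
    return dict(groups)


def encode_flow(objects: List[Tuple[int, str]],
                first_downmix: Callable[[Groups], Tuple[Any, Any]],
                second_downmix: Callable[[Any], Tuple[Any, Any]],
                core_encode: Callable[[Any], Any],
                encode_side_info: Callable[[Any, Any], Any],
                mux: Callable[[Any, Any], Any]) -> Any:
    groups = group_by_channel_type(objects)     # S1101: identify and group by channel type
    dmx1, side1 = first_downmix(groups)         # S1103: per-group downmix + side information
    dmx2, side2 = second_downmix(dmx1)          # S1105: second downmix + side information
    audio_bits = core_encode(dmx2)              # S1107: encode the second downmix signal
    side_bits = encode_side_info(side1, side2)  # S1109: build the side information bitstream
    return mux(audio_bits, side_bits)           # S1111: multiplex into the output bitstream
```

Injecting the stages as callables keeps the ordering explicit and lets each stage be replaced or tested on its own.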

FIG. 12 is a flowchart of a multi-object audio decoding method according to an embodiment of the present invention.

As shown in FIG. 12, an audio information bitstream and a side information bitstream are separated from the audio bitstream generated in step S1111 (S1201).

Next, the downmix audio signal is restored from the audio information bitstream separated in step S1201 (S1203).

Next, side information including the spatial cue information of each audio object is extracted from the side information bitstream separated in step S1201 (S1205).

Next, the audio signal of each object is restored from the downmix audio signal using the header information of the side information extracted in step S1205 (S1207).

Because the header information contains the number of audio objects per channel type (the numbers of mono, stereo, and multichannel audio objects) and the index information of the audio objects of each channel type (the ID, and information identifying whether the object is mono, stereo, or multichannel), the audio signal of each object can be restored from the downmix audio signal output in step S1203, based on the header information and spatial cue information of the side information extracted in step S1205.

Next, rendering control information for each audio object restored in step S1207 (e.g., the position and size of the audio object in space) and output channel control information (e.g., 5.1 channels, 7.1 channels, or stereo) are received from an external source; the audio signal of each object restored in step S1207 is positioned accordingly, and a multi-object audio signal is output (S1209).

FIG. 13 is a flowchart of a multi-object audio decoding method according to another embodiment of the present invention.

As shown in FIG. 13, an audio information bitstream and a side information bitstream are separated from the audio bitstream generated in step S1111 (S1301).

Next, the downmix audio signal is restored from the audio information bitstream separated in step S1301 (S1303).

Next, side information including the spatial cue information of each audio object is extracted from the side information bitstream separated in step S1301 (S1305).

Next, rendering control information (e.g., the position and size of the audio objects in space) and output channel control information (e.g., 5.1 channels, 7.1 channels, or stereo) for each audio object restored in step S1303 are received from an external source, and the side information extracted in step S1305 (e.g., the signal level information and correlation information of each audio object) is controlled according to this external input (S1307).

Next, using the side information controlled in step S1307, the downmix audio signal restored in step S1303 is restored into multi-object audio signals of various channels (S1309).

Here, the audio signal of each object is restored from the downmix audio signal using the header information of the side information controlled in step S1307.

Because the header information contains the number of audio objects per channel type (the numbers of mono, stereo, and multichannel audio objects) and the index information of the audio objects of each channel type (the ID, and information identifying whether the object is mono, stereo, or multichannel), the audio signal of each object can be restored from the downmix audio signal output in step S1303, based on the header information and spatial cue information of the side information controlled in step S1307.

FIG. 14 is a detailed structure diagram of the basic downmixers (201a to 201d) according to an embodiment of the present invention.

The basic downmixer includes a first sub-band splitter (1410), a second sub-band splitter (1420), a transformer (1440), a high-band spatial cue extractor (1430), a low-band spatial cue extractor (1450), and a sub-band concatenator (1470).

The first sub-band splitter (1410) splits the input signal $X_{R1}[n]$ into sub-bands by frequency band, generating M filtered signals $X_{R11}[k]$ to $X_{R1M}[k]$.

In the input signal $X_{R1}[n]$, R denotes the right channel, 1 denotes the first stereo object, and n is the time index. The channel and stereo object are only examples.

In the sub-band signals $X_{R11}[k]$ to $X_{R1M}[k]$, R denotes the right channel, the first digit 1 denotes the first stereo object, the following number is the sub-band index, and k is the time index.

FIG. 15 illustrates the sub-band signals according to an embodiment of the present invention.

In order, $X_{R1M}[k]$ is the time-domain signal of the highest frequency band, and $X_{R11}[k]$ is the time-domain signal of the lowest frequency band.

상기 서브-밴드로 분할된 신호 중 일부, 예컨대 X_R11[k] 내지 X_R1j[k]의 j개의 낮은 주파수 신호는 상기 변환기(1440)로 입력된다.The j low frequency signals of some of the sub-band divided signals, for example X _R11 [k] to X _R1j [k], are input to the converter 1440.

The transformer 1440 windows each of its input signals.

FIG. 16 shows an example of windowing each signal according to an embodiment of the present invention.

Each of the input signals X_R11[k] to X_R1j[k] consists of N subframes (solid lines). Each input signal is windowed together with its N previous subframes (dotted lines).

The N previous subframes may be stored in a history buffer of the transformer.

Since the time index of the input signal is k, the time index of the N previous subframes is k-N.
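
The pairing of N current subframes with the N previous ones (time indices k-N to k+N-1) can be sketched as a small history buffer. This sketch assumes that each "subframe" is one sub-band time slot and that the buffer starts as zeros. `HistoryFramer` is a hypothetical name.

```python
class HistoryFramer:
    """Pairs each block of N sub-band samples with the N samples that preceded it."""

    def __init__(self, n):
        self.n = n
        self.history = [0.0] * n  # previous N subframes (zeros before the first call)

    def push(self, block):
        """Return the 2N-sample frame covering time indices k-N .. k+N-1."""
        if len(block) != self.n:
            raise ValueError("block must contain exactly N samples")
        frame = self.history + list(block)
        self.history = list(block)  # the current block becomes the next call's history
        return frame
```

Each call yields the 2N-sample span that is then windowed and transformed. Consecutive frames overlap by N samples.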

The transformer 1440 transforms the windowed signal into the frequency domain (time-to-frequency transform, T/F transform) and outputs the result.

The transform may be, for example, a discrete Fourier transform (DFT) or a modified discrete cosine transform (MDCT).

The transform may be a long-block time-to-frequency transform such as a 1024-point or 2048-point DFT.

Such a long-block transform can reduce the amount of computation required for the transform.

The subframes of the input signal and its previous subframes total 2N, so windowing and frequency transformation are performed over 2N subframes.
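
The 2N-point windowing and transform can be sketched as follows. The text names a DFT or an MDCT but does not name a window, so the sine window used here is an assumption. The naive DFT is for illustration only. A practical 1024- or 2048-point transform would use an FFT. The function names are hypothetical.

```python
import cmath
import math


def sine_window(length):
    """Sine window w[i] = sin(pi * (i + 0.5) / length)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def windowed_dft(frame):
    """Window a 2N-sample frame and return its 2N-point DFT (naive O(N^2))."""
    length = len(frame)
    w = sine_window(length)
    xw = [frame[i] * w[i] for i in range(length)]
    return [sum(xw[t] * cmath.exp(-2j * math.pi * b * t / length) for t in range(length))
            for b in range(length)]
```

For a real frame, bins b and 2N-b are complex conjugates. Only bins 0 to N therefore carry independent information.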

Accordingly, the transformer 1440 transforms the sub-band signals X_R11[k] to X_R1j[k] into the frequency-domain signals S¹_{b,1}(f) to S¹_{b,j}(f), respectively.

In the signal S, the superscript is the object index identifying the stereo object, the subscript b is the frequency bin index, and the subscript number is the sub-band index.

A separate transformer 1440 may be provided for each of the sub-band signals.

The transformer 1440 may also be a component of the first sub-band splitter 1410 and the second sub-band splitter 1420, or a component of the low-band spatial cue extractor 1450.

The second sub-band splitter 1420 divides the right-channel input signal X_R2[n] of the second stereo object into sub-bands according to frequency band, generating M filtered signals X_R21[k] to X_R2M[k].

The transformer 1440 applies the same windowing and frequency-domain transform described above to some of the signals produced by the second sub-band splitter 1420, for example the j low-frequency signals X_R21[k] to X_R2j[k], and outputs the signals S²_{b,1}(f) to S²_{b,j}(f).

The number of bands to which the frequency transform is applied (j in this example) may be increased or decreased in consideration of factors such as the sampling rate of the input signal.

The low-band spatial cue extractor 1450 receives the frequency-transformed signals S¹_{b,1}(f) to S¹_{b,j}(f) and S²_{b,1}(f) to S²_{b,j}(f) from the transformer 1440.

Based on these input signals, the low-band spatial cue extractor 1450 generates and outputs additional information such as spatial cues.

The low-band spatial cue extractor 1450 may extract the spatial cues according to Equation 3 above.

For example, Equation 3 may be applied by taking S²_{b,1}(f) as the T/F-transformed signal of the first sub-band of the second stereo object. Likewise, the T/F-transformed signal of the first sub-band of the first stereo object is S¹_{b,1}(f).

That is, the low-band spatial cue extractor 1450 may use Equations 1, 2, and 3 above with S¹_b(f) replaced by S¹_{b,1}(f), S²_b(f) replaced by S²_{b,1}(f), and so on.
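
Equations 1 to 3 are not reproduced in this text. As a hedged illustration only, the sketch below assumes that Equation 3 is a conventional channel level difference (CLD): the ratio, in dB, of the two objects' energies summed over the frequency bins b of one sub-band. It is evaluated once for each low sub-band 1 to j. The function names and the small `eps` guard are hypothetical.

```python
import math


def cld_db(spec1, spec2, eps=1e-12):
    """Assumed CLD: 10*log10 of the energy ratio of object 1 to object 2 in one sub-band."""
    p1 = sum(abs(v) ** 2 for v in spec1)
    p2 = sum(abs(v) ** 2 for v in spec2)
    return 10.0 * math.log10((p1 + eps) / (p2 + eps))


def low_band_clds(s1_bands, s2_bands):
    """One CLD per low sub-band, from S^1_{b,s}(f) and S^2_{b,s}(f) for s = 1..j."""
    return [cld_db(a, b) for a, b in zip(s1_bands, s2_bands)]
```

Equal spectra give 0 dB. Doubling the amplitude of object 1 gives about +6.02 dB.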

The high-band spatial cue extractor 1430 receives M-j high-frequency signals from each sub-band splitter: X_{R1(j+1)}[k] to X_R1M[k] from the first sub-band splitter 1410, and X_{R2(j+1)}[k] to X_R2M[k] from the second sub-band splitter 1420.

Based on these input signals, the high-band spatial cue extractor 1430 generates and outputs additional information such as spatial cues.

The high-band spatial cue extractor 1430 may extract the spatial cues according to Equation 4.

The input signals of the low-band spatial cue extractor 1450 are analyzed in the frequency domain.

The high-band spatial cue extractor 1430, by contrast, operates on the band-filtered time-domain signals, so its power operator is applied along the time axis.
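
Equation 4 is likewise not reproduced here. This sketch assumes the same dB energy-ratio form, with each sub-band's power taken as the mean square of its time-domain samples over one analysis frame. The function names are hypothetical.

```python
import math


def subband_power(x):
    """Mean-square power of a real time-domain sub-band signal over one frame."""
    return sum(v * v for v in x) / len(x)


def high_band_clds(x1_bands, x2_bands, eps=1e-12):
    """One assumed CLD (dB) per high sub-band j+1..M, from time-domain power."""
    return [10.0 * math.log10((subband_power(a) + eps) / (subband_power(b) + eps))
            for a, b in zip(x1_bands, x2_bands)]
```

In this form, the high bands need only a sum of squares per band, with no windowing or transform.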

Each of a plurality of signal combiners 1460 receives a corresponding pair of sub-band filtered signals (for example, X_R11[k] and X_R21[k]) from the first sub-band splitter 1410 and the second sub-band splitter 1420, combines the two signals, and outputs the result.

There may be as many signal combiners 1460 as there are sub-bands. Alternatively, one signal combiner 1460 may process a plurality of signal pairs.

The signal combiners may be included in the sub-band concatenator 1470.

The signals combined by the signal combiners 1460 are input to the sub-band concatenator 1470.

The sub-band concatenator 1470 concatenates the combined signals and outputs a downmix signal.

In other words, downmixing is performed by the signal combiners 1460 and the sub-band concatenator 1470.
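
This sketch assumes full-rate sub-bands that sum back to the original signal, with no decimation. Under that assumption, combining and concatenation reduce to two additions: first across the two objects within each band, then across the bands. A real concatenator would instead up-sample and synthesis-filter each band. The function name is hypothetical.

```python
def combine_and_concatenate(x1_bands, x2_bands):
    """Downmix two objects given as lists of equal-length, full-rate sub-band signals."""
    # Signal combiners: add the two objects' signals band by band.
    combined = [[a + b for a, b in zip(band1, band2)]
                for band1, band2 in zip(x1_bands, x2_bands)]
    # Sub-band concatenator: with full-rate bands, merging the bands is a sum.
    length = len(combined[0])
    return [sum(band[t] for band in combined) for t in range(length)]
```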

Because this embodiment takes right-channel signals as input, the resulting downmix signal is the right downmix signal.

FIG. 17 is a flowchart of a downmixing method according to an embodiment of the present invention.

First, two time-domain input signals, for example X_R1[n] and X_R2[n], are each divided by frequency band into M sub-band signals: X_R1[n] into X_R11[k] to X_R1M[k], and X_R2[n] into X_R21[k] to X_R2M[k] (S1710).

Some of the sub-band signals, for example the M-j signals in the high frequency band, are used to generate high-band additional information (S1720).

Generation of the high-band additional information is described in detail above with reference to FIG. 14.

Other sub-band signals, for example the j signals in the low frequency band, are windowed (S1730).

The windowed signals are transformed into the frequency domain (S1740).

The frequency-domain signals are used to generate low-band additional information (S1750).

The windowing, the frequency-domain transform, and the generation of the low-band additional information are described in detail above with reference to FIG. 14.

The sub-band signals are downmixed to generate a downmix signal (S1760).

Generation of the downmix signal is described in detail above with reference to FIG. 14.

FIG. 18 is a detailed configuration diagram of the SAC decoder 1003 according to an embodiment of the present invention.

The SAC decoder 1003 includes a sub-band splitter 1810, a high-band upmixer 1820, a transformer 1830, a low-band upmixer 1840, a delay unit 1850, an inverse transformer 1860, and a sub-band concatenator 1870.

The sub-band splitter 1810 receives the decoded right downmix signal as input. As shown in FIG. 15, it divides the input signal into sub-bands according to frequency band, generating and outputting M filtered signals X_R1[k] to X_RM[k].

Of these sub-band signals, the j low-frequency signals, for example X_R1[k] to X_Rj[k], are input to the transformer 1830.

The transformer 1830 windows each input low-frequency sub-band signal, transforms it into the frequency domain, and outputs the result.

Accordingly, the transformer 1830 transforms the sub-band signals X_R1[k] to X_Rj[k] into the frequency-domain signals S_{b,1}(f) to S_{b,j}(f), respectively.

The windowing and frequency transform are the same as described for the transformer 1440 of FIG. 14.

The low-band upmixer 1840 receives the frequency-transformed signals S_{b,1}(f) to S_{b,j}(f) together with the additional information bitstream. It upmixes the input signals based on the additional information bitstream.

This upmixing is the counterpart of Equations 3 and 4. It derives gain values from conventional CLD (channel level difference) values.

The relationship between the power P₁ of the first stereo object and the power P₂ of the second stereo object is given by Equation 5.

Accordingly, the upmixing may be performed according to Equation 6.

According to the additional information, the input signal S_{b,1}(f) is separated into S¹_{b,1}(f), the signal of the first stereo object, and S²_{b,1}(f), the signal of the second stereo object. The input signals of the other sub-bands are likewise separated according to the additional information.
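
Equations 5 and 6 are not reproduced in this text. One common way to turn a CLD back into gains takes P₁/P₂ = 10^(CLD/10) and splits the downmix so that g1² + g2² = 1. This is shown as an assumption, not as the patent's exact formulas. The split preserves total power only if the two objects are uncorrelated. The helper names are hypothetical.

```python
import math


def cld_to_gains(cld_db):
    """Gains (g1, g2) with g1**2 / g2**2 = 10**(cld_db / 10) and g1**2 + g2**2 = 1."""
    ratio = 10.0 ** (cld_db / 10.0)
    g1 = math.sqrt(ratio / (1.0 + ratio))
    g2 = math.sqrt(1.0 / (1.0 + ratio))
    return g1, g2


def upmix_band(downmix_spec, cld_db):
    """Split one sub-band S_{b,s}(f) of the downmix into estimates of S^1 and S^2."""
    g1, g2 = cld_to_gains(cld_db)
    return [g1 * v for v in downmix_spec], [g2 * v for v in downmix_spec]
```

A CLD of 0 dB splits the band equally, with g1 = g2 = 1/√2.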

Thereafter, the signals of the first stereo object (S¹_{b,1}(f) to S¹_{b,j}(f)) and the signals of the second stereo object (S²_{b,1}(f) to S²_{b,j}(f)) are processed separately. This embodiment does not show any further processing of the second stereo object's signals.

Next, the inverse transformer 1860 converts the first stereo object's signals S¹_{b,1}(f) to S¹_{b,j}(f) into X_R11[k] to X_R1j[k], respectively. This conversion is described in detail below.

The inverse transformer 1860 applies an inverse frequency transform, for example an inverse discrete Fourier transform (IDFT) or an inverse modified discrete cosine transform, to each of the input signals S¹_{b,1}(f) to S¹_{b,j}(f).

Next, the inverse transformer 1860 windows each inverse-transformed signal to produce a windowed signal.

FIG. 19 shows an example of windowing each signal according to an embodiment of the present invention.

Each of the input signals S¹_{b,1}(f) to S¹_{b,j}(f) consists of N subframes (solid lines). Each input signal is windowed together with its N previous subframes (dotted lines).

The inverse transformer 1860 overlap-adds the windowed signals and outputs X_R11[k] to X_R1j[k], which are the signals as they were before the frequency transform.

The overlap-add removes the window effect.

FIG. 20 shows an example of overlap-adding each windowed signal according to an embodiment of the present invention.

As shown in FIG. 20, the windowed signal is represented as the current low-band sub-band, which has 2N subframes.

The overlapper 2020 overlap-adds the later N subframes of the current low-band sub-band output 2040 (that is, of the windowed signal) with the former N subframes of the subframes 2030 of the previous low-band sub-band. This produces the low-band sub-band outputs X_R11[k] to X_R1j[k] (2050).

The N previous subframes may be stored in the history buffer 2010 of the transformer.
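
The following is a minimal sketch of the inverse path. It assumes sine windows on both the analysis and the synthesis side, with 2N-sample frames at 50% overlap. Under that assumption, w[i]² + w[i+N]² = 1. Overlap-adding the earlier half of each windowed frame onto the stored later half of the previous frame therefore cancels the window effect and returns the input delayed by N samples. How this conventional arrangement maps onto the "former/later" wording for FIG. 20 is an assumption. The class name is hypothetical.

```python
import cmath
import math


def sine_window(length):
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def dft(x, inverse=False):
    """Naive DFT; with inverse=True it is the IDFT (including the 1/n scale)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


class OverlapAddSynthesizer:
    """IDFT -> synthesis window -> overlap-add with an N-sample history buffer."""

    def __init__(self, n):
        self.n = n
        self.window = sine_window(2 * n)
        self.history = [0.0] * n  # later half of the previous windowed frame

    def push(self, spectrum):
        """Take one 2N-bin spectrum and return N finished output samples."""
        frame = [v.real for v in dft(spectrum, inverse=True)]
        windowed = [frame[i] * self.window[i] for i in range(2 * self.n)]
        out = [self.history[i] + windowed[i] for i in range(self.n)]
        self.history = windowed[self.n:]
        return out
```

Feed it spectra from a matching sine-windowed 2N-point DFT of overlapping frames. It then reproduces the original sub-band signal, delayed by N samples.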

In this way, the input signal S¹_{b,1}(f) of the inverse transformer 1860 passes through the IDFT, windowing, and overlap-add processes described above and is output as X_R11[k].

The high-band upmixer 1820 receives the M-j high-frequency sub-band signals, for example X_{R(j+1)}[k] to X_RM[k], together with the additional information bitstream. It upmixes the input signals based on the additional information bitstream.

The relationship between the power P₁ of the first stereo object and the power P₂ of the second stereo object is given by Equation 5 above.

Accordingly, the upmixing may be performed according to Equation 7.

According to the additional information, the input signal X_RM[k] is separated into X_R1M[k], the signal of the first stereo object, and X_R2M[k], the signal of the second stereo object. The other input signals are likewise separated according to the additional information.

Thereafter, the signals of the first stereo object (X_{R1(j+1)}[k] to X_R1M[k]) and the signals of the second stereo object (X_{R2(j+1)}[k] to X_R2M[k]) are processed separately. This embodiment does not show any further processing of the second stereo object's signals.

The high-band upmixer 1820 outputs the first stereo object's signals X_{R1(j+1)}[k] to X_R1M[k] to the delay unit 1850, which delays them by a specific time, for example D. The delay synchronizes them with the low-band signals X_R11[k] to X_R1j[k], which have passed through the transformer 1830 and the inverse transformer 1860.
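
The delay unit can be sketched as a FIFO of D samples. The text does not give D. If the low-band path frames 2N samples and uses 50% overlap-add, it lags its input by N sub-band samples, so D = N would be one plausible choice. `DelayLine` is a hypothetical name.

```python
from collections import deque


class DelayLine:
    """Delays a stream of samples by d samples, emitting zeros during start-up."""

    def __init__(self, d):
        self.buffer = deque([0.0] * d)

    def process(self, block):
        out = []
        for v in block:
            self.buffer.append(v)             # newest sample enters the FIFO
            out.append(self.buffer.popleft())  # sample from d steps ago leaves it
        return out
```

State persists across calls, so a stream can be processed block by block.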

The sub-band concatenator 1870 receives the high-band signals X_{R1(j+1)}[k] to X_R1M[k] from the delay unit 1850 and the low-band signals X_R11[k] to X_R1j[k] from the inverse transformer 1860. It concatenates the sub-bands to generate the signal X_R1[n-D], which is delayed by D.

The concatenation combines the sub-bands by summing their up-sampled signals, obtained through a conventional up-sampling process.

A separate high-band upmixer 1820, delay unit 1850, transformer 1830, low-band upmixer 1840, and inverse transformer 1860 may be provided for each input signal or signal pair. Each of these components may also be a sub-component of another, adjacent component.

FIG. 21 is a flowchart of a decoding method according to an embodiment of the present invention.

First, the decoded downmix signal is divided by frequency band into M sub-band signals X_R1[k] to X_RM[k] (S2110).

Some of the sub-band signals, for example the M-j signals in the high frequency band, are high-band upmixed and separated into signals for the first stereo object and signals for the second stereo object (S2120).

High-band upmixing is described in detail above with reference to FIG. 18.

The high-band upmixed signals are delayed for synchronization with the other signals (S2130).

The delay of the high-band signals is described in detail above with reference to FIG. 18.

Meanwhile, the remaining sub-band signals, for example the j signals in the low frequency band, are windowed (S2140), and the windowed signals are transformed into the frequency domain (S2150).

The windowing and the frequency-domain transform are described in detail above with reference to FIG. 18.

The frequency-domain low-band signals are low-band upmixed and separated into signals for the first stereo object and signals for the second stereo object (S2160).

Low-band upmixing is described in detail above with reference to FIG. 18.

The low-band upmixed signals are inverse frequency transformed (S2170). The inverse-transformed signals are windowed (S2175), and the windowed signals are overlap-added to remove the window effect (S2180).

The inverse frequency transform, windowing, and overlap-add are described in detail above with reference to FIG. 18.

The delayed high-band signals and the overlap-added low-band signals are sub-band concatenated (S2190).

Sub-band concatenation is described in detail above with reference to FIG. 18.

The sub-band concatenation produces the output signal, and the procedure ends.

The method according to an embodiment of the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or they may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include the following:
- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks;
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the present invention has been described above with reference to limited embodiments and drawings, the invention is not limited to these embodiments. Those of ordinary skill in the art to which the invention pertains can make various modifications and variations based on this description.

Therefore, the scope of the present invention should not be limited to the described embodiments. It should be defined by the claims below and their equivalents.

101: First downmixer
103: Second downmixer
105: Audio encoder
107: Additional information encoder
109: Multiplexer
111: Mono channel downmix
112: Stereo channel downmix
115: Multichannel downmix

Claims

delete

A decoding method, comprising:
decoding a downmix audio signal from an audio information bitstream generated by an encoding apparatus;
extracting additional information from an additional information bitstream generated by the encoding apparatus;
controlling the additional information according to rendering control information and output channel control information for the downmix audio signal; and
decoding an audio signal for each object from the downmix audio signal using the controlled additional information,
wherein the decoding of the audio signal for each object comprises:
dividing the downmix audio signal into M sub-band signals according to frequency band;
windowing the first through j-th sub-band signals, which lie in the lower frequency band, among the M sub-band signals;
transforming the windowed sub-band signals into the frequency domain;
upmixing the frequency-domain sub-band signals according to the additional information bitstream and separating them into low-band signals for each stereo object;
inverse frequency transforming the low-band signals;
windowing each inverse-frequency-transformed low-band signal together with its previous subframe;
overlap-adding the later subframes and the previous subframes of the windowed low-band signals to output low-band signals from which the window effect has been removed;
upmixing and delaying the M-j sub-band signals whose frequency band is higher than that of the j-th sub-band signal among the M sub-band signals, to output high-band signals; and
sub-band concatenating the low-band signals from which the window effect has been removed and the high-band signals, to decode the audio signal for each object.

The method of claim 5,
wherein the controlling of the additional information comprises:
controlling signal magnitude information and correlation information of the audio objects in the additional information according to the rendering control information and the output channel control information.

delete

The method of claim 5,
wherein the outputting of the high-band signals comprises:
upmixing the M-j sub-band signals using the additional information and separating them into high-band signals for each object; and
delaying the high-band signals according to the time required to window the j sub-band signals, transform them into the frequency domain, and output them as the low-band signals, and outputting the delayed high-band signals.

delete

An encoding method, comprising:
grouping, by channel, a plurality of multi-object audio signals composed of different channels;
downmixing the channel-grouped multi-object audio signals to generate a downmix audio signal, and extracting additional information;
encoding the downmix audio signal to generate an audio information bitstream; and
generating an additional information bitstream using the additional information,
wherein the extracting comprises:
dividing the multi-object audio signals into M sub-band signals;
windowing each of the first through j-th sub-band signals, which lie in the lower frequency band among the M sub-band signals, together with its previous subframe;
transforming the windowed sub-band signals into the frequency domain;
extracting low-band additional information using the frequency-domain sub-band signals;
extracting high-band additional information using the M-j sub-band signals whose frequency band is higher than that of the j-th sub-band signal among the M sub-band signals; and
downmixing the M sub-band signals divided from the multi-object audio signals to generate the downmix audio signal.

The method of claim 10,
wherein the extracting comprises:
downmixing the channel-grouped multi-object audio signals with a first downmixer to generate a first downmix signal, and extracting additional information; and
downmixing the first downmix signal with a second downmixer to generate the downmix audio signal, and extracting additional information.

The method of claim 11,
wherein, when the number of audio objects included in each channel group of the grouped multi-object audio signals is N, the first downmixer comprises N-1 basic downmixers connected in a cascade structure.

The method of claim 11,
wherein the generating of the first downmix signal and extracting of additional information comprises:
identifying the channel of each of the grouped multi-object audio signals as one of a mono channel, a stereo channel, and a multichannel; and
downmixing the grouped multi-object audio signals with a mono channel downmixer, a stereo channel downmixer, or a multichannel downmixer according to the identification result.

The method of claim 13,
wherein the generating of the downmix audio signal and extracting of additional information comprises:
downmixing the first downmix signal output from the stereo channel downmixer and the first downmix signal output from the multichannel downmixer to output a representative downmix signal, and extracting additional information; and
downmixing the first downmix signal output from the mono channel downmixer and the representative downmix signal to generate a second downmix signal, and extracting additional information.

delete