KR102149019B1

KR102149019B1 - Method for generating and playing object-based audio contents and computer readable recording medium for recording data having file format structure for object-based audio service

Info

Publication number: KR102149019B1
Application number: KR1020190081325A
Authority: KR
Inventors: 장인선; 김휘용; 서정일; 강경옥; 홍진우; 김진웅; 안치득; 이태진; 함승철
Original assignee: 한국전자통신연구원; (주)오디즌
Priority date: 2008-04-23
Filing date: 2019-07-05
Publication date: 2020-08-28
Anticipated expiration: 2038-02-28
Also published as: KR20190087354A

Abstract

Disclosed are a method for generating and playing back object-based audio content, and a computer-readable recording medium recording data having a file format structure for an object-based audio service. The method of generating object-based audio content includes receiving a plurality of audio objects, generating at least one preset using the received audio objects, and storing the plurality of audio objects together with preset parameters describing the attributes of the at least one preset, wherein the preset parameters are stored in the form of a box defined in a media file format for the object-based audio content.

Description

Method for generating and playing object-based audio content, and computer-readable recording medium recording data having a file format structure for an object-based audio service

The present invention relates to a method for generating and playing back object-based audio content that can efficiently store preset information for the content, and to a computer-readable recording medium recording data having a file format structure for an object-based audio service.

The present invention is derived from research conducted as part of the IT source technology development program of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project management number: 2008-F-011-01, Project title: Development of next-generation DTV core technology (standardization-linked) - Development of glasses-free personal 3D broadcasting technology (continued)].

Conventional audio signals delivered through broadcast services such as TV broadcasting, radio broadcasting, and Digital Multimedia Broadcasting (DMB) are produced by mixing audio signals obtained from several sound sources into a single audio signal, which is then stored or transmitted.

In such an environment, a listener can adjust the level of the overall audio signal, but cannot control the characteristics of the individual sound sources, for example by adjusting the level of each source contained in the signal.

However, if the audio signals of the individual sound sources are stored independently rather than mixed when the content is authored, a content playback terminal can let the user listen to the content while controlling the level of each source.

An audio service in which the storage/transmission side stores or transmits several audio signals independently, and the user listens while appropriately controlling each signal at the receiver (content playback device), is called an object-based audio service.

In such an object-based audio service, attributes such as the position of each object, its sound level, and the acoustic characteristics that depend on the object positions are defined and provided as presets, so that the user can apply them when playing back the audio content. That is, if several sets of preset audio information are generated and included in the file, the receiving side can play the object-based audio service more efficiently.

The existing ISO Base Media File Format (ISO-BMFF) defines a file structure that can contain various types of media, such as audio, video, and still images. This file structure is flexible and extensible with respect to the interchange, management, editing, and presentation of media.

If audio tracks and preset information are added to this ISO base media file format and then stored or transmitted, an object-based audio service can be provided more efficiently.

Embodiments of the present invention aim to provide a method of generating object-based audio content that can efficiently store presets for a plurality of audio objects.

A method of generating object-based audio content according to an embodiment of the present invention includes receiving a plurality of audio objects, generating at least one preset using the received audio objects, and storing the plurality of audio objects together with preset parameters describing the attributes of the at least one preset, wherein the preset parameters are stored in the form of a box defined in a media file format for the object-based audio content.

In this case, the media file format may be the ISO base media file format.

In addition, the box may include a movie ('moov') box, the 'moov' box may include a first box defined within the 'moov' box, and the first box may include a second box defined within the first box. The preset parameters may include a first preset parameter and a second preset parameter. The first preset parameter may include at least one of the number of presets and the preset ID of any one of the presets. The first preset parameter may be stored in the first box, and the second preset parameter may be stored in the second box.

A method of playing back object-based audio content according to an embodiment of the present invention includes restoring a plurality of audio objects and at least one preset from the object-based audio content, mixing the plurality of audio objects based on the at least one preset to generate an output audio signal, and playing back the output audio signal. Each of the at least one preset includes preset parameters, and the preset parameters may be stored in the object-based audio content in the form of a box defined in a media file format for the object-based audio content.

A computer-readable recording medium according to an embodiment of the present invention, recording data having a file format structure for an object-based audio service, includes a file type ('ftyp') box that stores the specification information of the object-based audio content, a media data ('mdat') box that stores the plurality of audio objects constituting the object-based audio content, and a movie ('moov') box that stores metadata for the presentation of the stored audio objects. Preset parameters describing the attributes of at least one preset generated using the plurality of audio objects are stored in either the 'ftyp' box or the 'moov' box.

According to the present invention, presets for a plurality of audio objects can be stored efficiently.

FIG. 1 is a diagram showing the basic form of a media file format structure for storing object-based audio content according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the relationship between tracks and channels according to an embodiment of the present invention.
FIG. 3 is a flowchart of a method of generating object-based audio content according to an embodiment of the present invention.
FIG. 4 is a diagram showing the structure of 'moov' according to an embodiment of the present invention.
FIG. 5 is a flowchart of a method of playing back object-based audio content according to an embodiment of the present invention.
FIG. 6 is a flowchart of a method of playing back object-based audio content according to another embodiment of the present invention.
FIGS. 7 and 8 are diagrams illustrating the structure of a file format for storing object-based audio content that includes description information, according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The embodiments exemplified below may be modified into various other forms, and the scope of the present invention is not limited to the embodiments described below. The embodiments are provided to explain the present invention more completely to those of ordinary skill in the art.

FIG. 1 is a diagram showing the basic form of a media file format structure for storing object-based audio content according to an embodiment of the present invention.

Referring to FIG. 1, the media file format structure for storing object-based audio content broadly comprises: a file type box (hereinafter 'ftyp'), which stores the specification information of the object-based audio content (that is, the type information of the object-based audio content file); a movie box (hereinafter 'moov'), which stores metadata (for example, decoding times) for the presentation of the plurality of audio object data constituting the content; and a media data box (hereinafter 'mdat'), which stores the plurality of audio object data.
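The top-level layout described for FIG. 1 can be illustrated with a minimal box walker. This is a sketch rather than the patent's implementation: it relies only on the general ISO-BMFF box header (a 32-bit big-endian size followed by a four-character type, with size 1 signalling a 64-bit size and size 0 meaning "to the end"). The names `iter_boxes`, `make_box`, and `sample_file` are hypothetical, and the sample bytes are toy data, not real audio.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box type, payload offset, payload size) for each box in data[start:end]."""
    if end is None:
        end = len(data)
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the enclosing data.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield raw_type.decode("latin-1"), pos + header, size - header
        pos += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size header (toy helper for the example)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload


# Toy file with the three top-level boxes named in the description.
sample_file = (
    make_box("ftyp", b"isom" + b"\x00" * 4)
    + make_box("moov")
    + make_box("mdat", b"\x00" * 16)
)
top_level = [box_type for box_type, _, _ in iter_boxes(sample_file)]
```

Because the generator takes `start` and `end`, the same function can be pointed at the payload of a container box to list the boxes nested inside it.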

'ftyp' and 'moov' each include a meta box (hereinafter 'meta'). In general, 'meta' stores descriptive metadata for the plurality of audio object data stored in 'mdat'.

Here, the media file format structure for storing object-based audio content is preferably the ISO Base Media File Format (ISO-BMFF) structure.

The following describes a method of generating object-based audio content by storing presets related to playback of the content, together with a plurality of audio objects, according to the ISO-BMFF structure. However, as mentioned above, the generation method described below is not limited to object-based audio content having the ISO-BMFF structure. It can also be extended to multichannel audio content having a media file format structure for storing multimedia data, such as an MP4 file.

Before describing the method of generating object-based audio content according to an embodiment of the present invention, the preset parameters that represent the attributes of a preset stored in the content are described first. The preset parameters may include at least one of the items of preset information listed below.

1. Preset name, preset ID

The 'preset name' is a string corresponding to a preset, and the 'preset ID' is an integer corresponding to each preset.

2. Number of presets, default preset ID

The 'number of presets' is the number of presets included in the object-based audio content.

The 'default preset ID' is the ID of the preset that should be played first, in the initial state with no user interaction, when the object-based audio content is played back. The default preset ID may correspond to any one of the preset IDs included in the object-based audio content.

3. Whether to display preset information

'Whether to display preset information' indicates whether the preset information (for example, the per-input-track or per-input-channel volume information, or the per-input-track or per-input-channel frequency gain information, described below) is shown to the user when the object-based audio content is played back.

4. Whether the preset is editable

'Whether the preset is editable' indicates whether the user can edit the preset when the object-based audio content is played back.

5. Number of input tracks, input track IDs, number of input channels per input track

The 'number of input tracks' is the number of input tracks stored in the object-based audio content. Here, an input track may correspond to a sound source. That is, when the object-based audio content consists of vocals, piano, and drums, each of the vocals, piano, and drums may form one track.

The 'input track ID' is an integer corresponding to each input track.

The 'number of input channels per input track' is the number of channels included in each input track.

The relationship between tracks and channels is described below with reference to FIG. 2.

FIG. 2 is a diagram illustrating the relationship between tracks and channels according to an embodiment of the present invention.

FIG. 2 shows a vocal track 210, a piano track 220, and a drum track 230.

When each sound source is recorded in two channels (that is, in stereo), each track may include two channels. That is, when vocals, piano, and drums are recorded in two channels, the vocal track 210 consists of a first channel 211 and a second channel 212, the piano track 220 consists of a first channel 221 and a second channel 222, and the drum track 230 consists of a first channel 231 and a second channel 232. Although FIG. 2 shows every track with the same number of channels, the number of channels per track may differ from track to track.

If the author of the object-based audio content sets presets per track, the plurality of audio objects may correspond to tracks. If the author sets presets per channel, the plurality of audio objects may correspond to channels.

6. Output channel type, number of output channels

The 'output channel type' indicates the channel configuration through which the object-based audio content is played back, and the 'number of output channels' is the number of output channels for that output channel type.

7. Number of frequency bands for sound equalization, center frequency of each band, bandwidth of each band

The 'number of frequency bands' is the number of frequency bands to which sound equalization is applied in order to correct signal distortion that arises during amplification or transmission.

8. Volume information per input track or per input channel

'Volume information' is information about the volume of each of the plurality of audio objects. When an audio object corresponds to an input track, 'volume information per input track' is stored in the object-based audio content. When an audio object corresponds to an input channel, 'volume information per input channel' is stored.

9. Frequency gain information per input track or per input channel

'Frequency gain information' is information about the frequency gain used when sound equalization is applied. When an audio object corresponds to an input track, 'frequency gain information per input track' is stored in the object-based audio content. When an audio object corresponds to an input channel, 'frequency gain information per input channel' is stored.

10. Preset global volume information

'Preset global volume information' is information for adjusting the volume of the plurality of audio objects as a whole.

11. Size and angle of the sound image

The 'size of the sound image' and the 'angle of the sound image' are the size value and the angle value of the sound image formed by the plurality of channels stored in the object-based audio content.

The author of object-based audio content can generate the content in various ways by storing, in accordance with the ISO base media file format structure, preset parameters that include at least one of the items listed above.
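The eleven items above can be gathered into a simple in-memory model, shown here as a sketch. All class names, field names, types, and default values are assumptions made for illustration; the text lists what information a preset may carry but does not prescribe this representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Preset:
    """One preset; comments give the list item each hypothetical field mirrors."""
    preset_id: int                                                   # item 1
    name: str                                                        # item 1
    show_info: bool = True                                           # item 3
    editable: bool = False                                           # item 4
    input_track_ids: List[int] = field(default_factory=list)         # item 5
    channels_per_track: List[int] = field(default_factory=list)      # item 5
    output_channel_type: int = 0                                     # item 6
    num_output_channels: int = 2                                     # item 6
    eq_center_freqs: List[float] = field(default_factory=list)       # item 7
    eq_bandwidths: List[float] = field(default_factory=list)         # item 7
    object_volumes: List[float] = field(default_factory=list)        # item 8
    object_eq_gains: List[List[float]] = field(default_factory=list) # item 9
    global_volume: float = 1.0                                       # item 10
    sound_image_size: float = 0.0                                    # item 11
    sound_image_angle: float = 0.0                                   # item 11


@dataclass
class PresetCollection:
    """All presets of one piece of content plus the default preset ID (item 2)."""
    presets: List[Preset]
    default_preset_id: int

    @property
    def num_presets(self) -> int:
        return len(self.presets)

    def initial_preset(self) -> Preset:
        """Return the preset to use before any user interaction."""
        for preset in self.presets:
            if preset.preset_id == self.default_preset_id:
                return preset
        raise ValueError("default preset ID does not match any stored preset")


# Toy data: three tracks (vocals, piano, drums), two presets.
karaoke = Preset(preset_id=1, name="karaoke",
                 input_track_ids=[1, 2, 3], object_volumes=[0.0, 1.0, 1.0])
full_mix = Preset(preset_id=2, name="full mix",
                  input_track_ids=[1, 2, 3], object_volumes=[1.0, 1.0, 1.0])
collection = PresetCollection([karaoke, full_mix], default_preset_id=2)
```

Splitting the count and the default ID from the per-preset fields follows the way the text separates item 2 from the other items.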

FIG. 3 is a flowchart of a method of generating object-based audio content according to an embodiment of the present invention.

First, in step 310, a plurality of audio objects are received.

Next, in step 320, at least one preset is generated using the received audio objects.

Finally, in step 330, the plurality of audio objects and the preset parameters describing the attributes of the preset are stored. As mentioned above, the preset parameters may include at least one of the items listed above.

In this case, the preset parameters are stored in the form of a box defined in the media file format for the object-based audio content.
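Steps 310 to 330 can be sketched as a single function. This is a toy illustration under several assumptions: a preset is reduced to one volume byte (0 to 255) per audio object, the preset payload layout is invented, and the box type `xpst` is a hypothetical placeholder rather than a type defined by the patent or by ISO-BMFF. Only the size-plus-type box header and the 'mdat' type come from the file format itself.

```python
import struct
from typing import List


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Wrap a payload in a box header: 32-bit big-endian size, then 4-byte type."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def generate_content(audio_objects: List[bytes],
                     volumes_per_preset: List[List[int]]) -> bytes:
    # Step 310: receive a plurality of audio objects.
    if not audio_objects:
        raise ValueError("at least one audio object is required")

    # Step 320: generate at least one preset from the objects.
    # Assumption: a preset is just one volume byte per object.
    if not volumes_per_preset:
        raise ValueError("at least one preset is required")
    presets = []
    for volumes in volumes_per_preset:
        if len(volumes) != len(audio_objects):
            raise ValueError("a preset needs one volume per audio object")
        presets.append(bytes(volumes))  # raises ValueError outside 0..255

    # Step 330: store the objects and the preset parameters, the latter as a box.
    preset_payload = bytes([len(presets)]) + b"".join(presets)
    preset_box = make_box(b"xpst", preset_payload)  # hypothetical box type
    media_box = make_box(b"mdat", b"".join(audio_objects))
    return preset_box + media_box


# Two toy "audio objects" and one preset with a volume for each.
content = generate_content([b"\x01\x02", b"\x03"], [[100, 50]])
```

In a real file the preset box would sit at one of the locations discussed in the following paragraphs rather than at the top level.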

The process of storing the preset parameters in step 330 is described in detail below.

Storing the preset parameters in the 'meta' within 'ftyp' or the 'meta' within 'moov'

According to an embodiment of the present invention, the preset parameters may be stored in the 'meta' within 'ftyp' (hereinafter the first 'meta') or in the 'meta' within 'moov' (hereinafter the second 'meta').

That is, as mentioned above, the first 'meta' or the second 'meta' may store description information (or descriptive metadata) that gives general information about the object-based audio content, such as the song title, artist name, and album name, and the preset parameters may be stored together with this description information.

Storing the preset parameters in a 'meta' separate from the 'meta' that stores the description information

According to an embodiment of the present invention, the preset parameters may be stored in a 'meta' that is separate from the 'meta' in which the description information for the object-based audio content is stored.

The reason is that the description information relates to identifying the object-based audio content, whereas the preset parameters relate to playing it back. Because the two kinds of information differ in nature, it is preferable to handle them separately.

As an example, the description information may be stored in the first 'meta', and the preset parameters may be stored in the second 'meta'.

The ISO base media file format specifies that only one 'meta' may exist within a given level, so 'ftyp' and 'moov' can each contain only one 'meta' at their lower level. Therefore, for the description information and the preset parameters to be stored separately, they must be stored in 'meta' boxes at different levels (that is, the first 'meta' and the second 'meta'). In this case, because the preset parameters have the character of presentation metadata, the description information may be stored in the first 'meta' and the preset parameters in the second 'meta'.

As another example, the description information may remain stored in 'meta' (the first 'meta' and the second 'meta'), and the preset parameters may be stored in a meco box (hereinafter 'meco') within 'ftyp' or 'moov'.

'meco' is the Additional Metadata Container Box specified in the ISO base media file format for storing additional metadata, and it can hold separate metadata that the ISO base media file format does not itself define. Accordingly, the preset parameters may be stored in either the 'meco' within 'ftyp' or the 'meco' within 'moov'.

Storing the preset parameters in a newly defined box within 'moov'

According to an embodiment of the present invention, the preset parameters may be stored in a newly defined box within 'moov'.

As mentioned above, the preset parameters and the description information differ in nature, so the preset parameters are preferably handled separately from the description information. In addition, because the preset parameters have the character of presentation metadata, they are preferably stored within 'moov'. Therefore, to manage the preset parameters efficiently, it is preferable to define a new box within 'moov' and store the preset parameters in that newly defined box.

FIG. 4 is a diagram showing the structure of 'moov' according to an embodiment of the present invention.

As shown in FIG. 4, two boxes may be defined within 'moov'.

The first box is defined within 'moov' and stores the first preset parameter, which is the preset parameter representing the overall information about the presets. Hereinafter, the first box is called the preset container box, that is, 'prco'.

As an example, the first preset parameter may include at least one of the number of presets and the default preset ID mentioned above. The default preset ID is the ID of the preset that should be played first, in the initial state with no user interaction, when the object-based audio content is played back. The default preset ID may correspond to any one of the preset IDs included in the object-based audio content.

The second box is defined within 'prco' and stores the second preset parameter, which is the parameter describing the attributes of a preset.

As an example, the second preset parameter may include the items listed above other than the number of presets and the default preset ID. Hereinafter, the second box is called the preset box, that is, 'prst'.

'prco' contains as many 'prst' boxes as there are presets in the object-based audio content. If no preset is stored in the object-based audio content, 'prco' contains no 'prst'.

As an example, 'prst' may store preset parameters that include the preset information mentioned above, except for the number of presets and the default preset ID.
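The nesting rule above (one 'prco' inside 'moov', holding one 'prst' per preset) lends itself to a small consistency check. The sketch below models boxes as plain Python objects instead of bytes. The `Box` class and `count_presets` function are hypothetical, the field name `num_preset` is borrowed from the 'prco' syntax given in Table 2 of this document, and the requirement of exactly one 'prco' follows the "Quantity: Exactly one" entry there.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Box:
    """Minimal in-memory box: a type, some integer fields, and child boxes."""
    box_type: str
    fields: Dict[str, int] = field(default_factory=dict)
    children: List["Box"] = field(default_factory=list)


def count_presets(moov: Box) -> int:
    """Return the number of 'prst' boxes after checking it against num_preset."""
    containers = [b for b in moov.children if b.box_type == "prco"]
    if len(containers) != 1:
        raise ValueError("expected exactly one 'prco' inside 'moov'")
    prco = containers[0]
    presets = [b for b in prco.children if b.box_type == "prst"]
    declared = prco.fields.get("num_preset", 0)
    if len(presets) != declared:
        raise ValueError(
            f"'prco' declares {declared} presets but holds {len(presets)} 'prst' boxes"
        )
    return len(presets)


# Content with two presets, and content with none (an empty 'prco').
moov_with_presets = Box("moov", children=[
    Box("prco", {"num_preset": 2, "default_preset_ID": 1},
        [Box("prst"), Box("prst")]),
])
moov_without_presets = Box("moov", children=[Box("prco", {"num_preset": 0})])
```

A reader of such a file could run a check of this kind before trusting the default preset ID.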

According to an embodiment of the present invention, when 'moov' includes 'prco' and 'prst', the structure of the ISO base media file format can be represented as shown in Table 1.

Table 1

Box type            Description
ftyp                file type and compatibility
moov                container for all the metadata
  mvhd              movie header, overall declarations
  trak              container for an individual track or stream
    tkhd            track header, overall information about the track
    tref            track reference container
    edts            edit list container
      elst          an edit list
    mdia            container for the media information in a track
      mdhd          media header, overall information about the media
      hdlr          handler, declares the media (handler) type
      minf          media information container
        smhd        sound media header, overall information (sound track only)
        hmhd        hint media header, overall information (hint track only)
        nmhd        null media header, overall information (some tracks only)
        dinf        data information box, container
          dref      data reference box, declares source(s) of media data in track
        stbl        sample table box, container for the time/space map
          stsd      sample descriptions (codec types, initialization etc.)
          stts      (decoding) time-to-sample
          stsc      sample-to-chunk, partial data-offset information
          stsz      sample sizes (framing)
          stz2      compact sample sizes (framing)
          stco      chunk offset, partial data-offset information
          co64      64-bit chunk offset
  prco              container for the presets
    prst            preset box, container for the preset information
mdat                media data container
free                free space
skip                free space
meta                metadata
  hdlr              handler, declares the metadata (handler) type
  dinf              data information box, container
    dref            data reference box, declares source(s) of metadata items
  iloc              item location
  iinf              item information
  xml               XML container
  bxml              binary XML container
  pitm              primary item reference

Embodiments of the syntax and semantics of 'prco' and 'prst' are described in detail below.

Table 2 shows an embodiment of the syntax of 'prco'.

Table 2

Preset Container Box
Box type: 'prco'
Container: Movie Box ('moov')
Mandatory: Yes
Quantity: Exactly one

syntax

aligned(8) class PresetContainerBox extends Box('prco'){
    unsigned int(8) num_preset;
    unsigned int(8) default_preset_ID;
}
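As an illustration of how a reader might consume the two 'prco' fields, the following minimal Python sketch parses them from a byte buffer. It assumes the standard ISO base media file format box header (a 32-bit big-endian size followed by a four-character type) and ignores any child boxes that follow the two fields; the function name parse_prco is hypothetical and not part of the described format.

```python
import struct


def parse_prco(buf: bytes) -> dict:
    """Read num_preset and default_preset_ID from a 'prco' box (sketch).

    Assumption: the box starts with the usual 8-byte header
    (32-bit big-endian size, 4-character type).
    """
    size, box_type = struct.unpack(">I4s", buf[:8])
    if box_type != b"prco":
        raise ValueError("not a 'prco' box")
    # Both fields are declared as unsigned int(8) in the syntax.
    num_preset, default_preset_id = struct.unpack(">BB", buf[8:10])
    return {
        "size": size,
        "num_preset": num_preset,
        "default_preset_ID": default_preset_id,
    }


# Hypothetical box: 3 presets, default preset ID 1.
example_box = struct.pack(">I4sBB", 10, b"prco", 3, 1)
parsed_prco = parse_prco(example_box)
```

A default_preset_ID of 0 would, per the semantics, point the player at the preset carried inside the SAOC bitstream rather than at a 'prst' box.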

The semantics of the syntax in Table 2 are as follows.

'num_preset' is the number of presets in 'prco'.

'default_preset_ID' is the ID of the default preset. When the author does not set 'default_preset_ID', the preset ID of the preset with the smallest preset ID value may be used as 'default_preset_ID'.

If 'default_preset_ID' is set to '0', the object-based audio content may be played according to the preset stored inside the bitstream of those audio objects, among the plurality of audio objects included in the object-based audio content, that were encoded and stored using multi-object audio compression (SAOC: MPEG-D Spatial Audio Object Coding). This is described in more detail with reference to FIG. 6.

Table 3 shows the general syntax of 'prst'.

Table 3

Preset Box
Box type: 'prst'
Container: Preset Container Box ('prco')
Mandatory: No
Quantity: zero or more

syntax

aligned(8) class PresetBox extends FullBox('prst', version=0, flags){
    unsigned int(8) preset_ID;
    unsigned int(8) num_preset_track;
    unsigned int(8) preset_track_ID[num_preset_track];
    unsigned int(8) preset_type;
    unsigned int(8) preset_global_volume;

    if(preset_type == 0) {}
    if(preset_type == 1) {}
    if(preset_type == 2) {}
    if(preset_type == 3) {}
    if(preset_type == 4) {}
    if(preset_type == 5) {}
    if(preset_type == 6) {}
    if(preset_type == 7) {}
    if(preset_type == 8) {}
    if(preset_type == 9) {}
    if(preset_type == 10) {}
    if(preset_type == 11) {}
    string preset_name;
}

The semantics of the syntax in Table 3 are as follows.

'version' is the version of 'prst'.

'flags' is flag information indicating, when the object-based audio content is played, whether the information stored in 'prst' is displayed to the user and whether the user is allowed to edit the information stored in 'prst'.

'flags' is flag information with an 8-bit integer data type and may have the meanings shown in Table 4.

Table 4

Flags   Display   Edit
0x01    disable   disable
0x02    enable    disable
0x03    enable    enable

That is, if 'flags' is 0x01, the preset-related information stored in 'prst' is not displayed to the user when the object-based audio content is played, and the user cannot edit it.

If 'flags' is 0x02, the preset-related information stored in 'prst' is displayed to the user when the object-based audio content is played, but the user cannot edit it.

If 'flags' is 0x03, the information stored in 'prst' is displayed to the user when the object-based audio content is played, and the user can edit it.
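The three flag values can be read as a small lookup table. The sketch below transcribes Table 4 into Python; the names PRESET_FLAGS and preset_permissions are hypothetical, and rejecting values outside Table 4 is an assumption, since the text does not say how a player should treat them.

```python
# Display/edit permissions per 'flags' value, transcribed from Table 4.
PRESET_FLAGS = {
    0x01: {"display": False, "edit": False},
    0x02: {"display": True, "edit": False},
    0x03: {"display": True, "edit": True},
}


def preset_permissions(flags: int) -> dict:
    """Return the display/edit permissions for a 'prst' flags value."""
    if flags not in PRESET_FLAGS:
        # Assumption: values not listed in Table 4 are treated as errors.
        raise ValueError(f"flags value {flags:#04x} is not listed in Table 4")
    return PRESET_FLAGS[flags]
```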

'preset_ID' is the preset ID and may have a value of 1 or more.

'num_preset_track' is the number of input tracks associated with the preset.

'preset_track_ID[num_preset_track]' is an array that stores the IDs of the input tracks.

'preset_name' is the preset name.

'preset_global_volume' is the preset global volume information.

In general, to emphasize the rhythm of object-based audio content, an author creates a preset in which the volume of percussion instruments such as drums is relatively louder than the volume of the other instruments.

However, if the relative volume difference between the percussion and the other instruments is small, the listener does not perceive enough rhythm. Conversely, if the relative volume difference is large, the overall volume drops. This is because percussion sounds generally behave like sound effects, so that over the whole playback duration of the object-based audio content the high-frequency components of the percussion account for a larger share than those of the other instruments.

For example, when a preset consisting of [vocal, piano, drum] has the volume values [250, 200, 400], the overall volume is adequate but the rhythm is not emphasized; when the preset has the volume values [100, 150, 400], the rhythm is emphasized but the overall volume is reduced.

This can be solved by additionally storing preset global volume information in the object-based audio content. The preset global volume information is used to adjust the overall volume of the audio objects that make up a preset.

That is, if the volume values of all input tracks are stored relative to the default global volume value set in the object-based audio content, and a preset is created whose preset global volume value is greater than that default global volume value, then during playback of the object-based audio content the relative volume difference is enlarged by the ratio (preset global volume value) / (default global volume value).

As an example, when the default global volume value is '50' and a preset consisting of [vocal, piano, drum] has the volume values [100, 150, 400], setting the preset global volume value to 100 doubles the volume of each instrument. The vocal and piano, which carry the main melody, become about twice as loud, bringing the overall volume of the object-based audio content to an appropriate level, and the drum also becomes twice as loud, so the rhythm can be emphasized.
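The arithmetic of this example can be sketched in a few lines of Python. The function name apply_preset_global_volume is hypothetical, and treating the scaling as a plain multiplication of each stored volume value by the ratio of the two global volume values is an assumption drawn from the worked numbers above.

```python
def apply_preset_global_volume(track_volumes, preset_global_volume,
                               default_global_volume):
    """Scale per-track volume values by preset/default global volume (sketch)."""
    if default_global_volume <= 0:
        raise ValueError("default global volume must be positive")
    ratio = preset_global_volume / default_global_volume
    return [volume * ratio for volume in track_volumes]


# [vocal, piano, drum] with default global volume 50 and preset global volume 100.
scaled = apply_preset_global_volume([100, 150, 400], 100, 50)
# scaled == [200.0, 300.0, 800.0]: every instrument is twice as loud.
```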

Amplifying the volume with the preset global volume value in this way may degrade sound quality, for example through clipping. However, given the experimental finding that users generally find it hard to perceive the degradation that occurs in percussion when the percussion sound is raised above a certain level, the degradation caused by using preset global volume information should not be a problem.

The preset global volume information may also be used to increase the overall volume when the default global volume value is already at its maximum.

That is, in the playback of ordinary object-based audio content, when the default global volume value is at its maximum, it is not possible to adjust the volume of each audio object. However, if preset global volume information is stored in the object-based audio content, the content can be played at a volume greater than the maximum of the default global volume value.

'preset_type' is the type of the preset.

According to an embodiment of the present invention, the preset type may be determined based on the kind of mixing information, the target to which the mixing information is applied, and whether the mixing information changes over the playback time of the object-based audio content. The method of determining the preset type is described in detail below.

First, the preset type may be determined based on the kind of mixing information.

As an example, the mixing information may include at least one of volume information and sound equalization information. Hereinafter, a preset created considering only volume information is called a volume preset, a preset created considering only equalization information is called an equalization preset, and a preset created considering both volume information and equalization information is called a volume/equalization preset.

Next, the preset type may be determined based on the target to which the mixing information is applied.

That is, the preset type may be determined according to whether the mixing information is applied with each input track regarded as an audio object or with each input channel regarded as an audio object. Hereinafter, a preset created with input tracks regarded as audio objects is called a track preset, and a preset created with input channels regarded as audio objects is called a channel preset.

Finally, the preset type may be determined based on whether the mixing information changes over the playback time of the object-based audio content.

That is, the preset type may be determined according to whether the mixing information keeps a constant value or changes as the object-based audio content is played. Hereinafter, a preset whose mixing information does not change is called a static preset, and a preset whose mixing information changes is called a dynamic preset.

According to an embodiment of the present invention, when a dynamic preset is stored in object-based audio content, 'prst' may contain a table that maps each input track ID to the mixing information of that input track. In this case, the mixing information for each sample number of the input track may be derived from 'stts' (the decoding time-to-sample box) defined in the existing ISO-BMFF, which stores the relationship between decoding time and sample number, together with the mixing information stored in the table. This makes random access possible during playback of the object-based audio content and can reduce the amount of mixing information stored in the content.
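The random-access lookup described here can be sketched as two steps: convert a decoding time to a sample number using 'stts' entries, then look up the mixing value for that sample number. The sketch below is only an illustration under stated assumptions: 'stts' is modeled as a list of (sample_count, sample_delta) pairs as in ISO-BMFF, the mapping table is modeled as (sample_count, value) runs, and the function names and all numeric data are hypothetical.

```python
def sample_number_at(stts_entries, decode_time):
    """Return the 1-based sample number whose decoding interval holds decode_time."""
    first_sample = 1
    start_time = 0
    for sample_count, sample_delta in stts_entries:
        span = sample_count * sample_delta
        if decode_time < start_time + span:
            return first_sample + (decode_time - start_time) // sample_delta
        first_sample += sample_count
        start_time += span
    raise ValueError("decode_time is beyond the last sample")


def mixing_value_at(mix_runs, sample_number):
    """Return the mixing value that applies to a 1-based sample number."""
    last_sample = 0
    for sample_count, value in mix_runs:
        last_sample += sample_count
        if sample_number <= last_sample:
            return value
    raise ValueError("sample_number is beyond the mapping table")


# Hypothetical data: 30 samples of 1024 ticks each, volume changing every 10 samples.
stts = [(30, 1024)]
volume_runs = [(10, 20), (10, 10), (10, 60)]
seek_sample = sample_number_at(stts, 10 * 1024)      # sample 11
seek_volume = mixing_value_at(volume_runs, seek_sample)  # value for sample 11
```

Because the lookup needs only the two tables and not the preceding frames, a player can jump to any decoding time without replaying the mixing history.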

When presets are created using the information described above, the preset types may be classified as shown in Table 5. Table 5 lists 12 preset types, but the set may be extended further depending on the classification criteria.

Table 5

preset_type   static(S)/dynamic(D)   track(T)/channel(C)   volume (Vol)   equalization (Eq)   meaning
0             S                      T                     Vol            -                   static track volume preset
1             S                      T                     Vol            Eq                  static track volume preset with equalization
2             S                      T                     -              Eq                  static track equalization preset
3             D                      T                     Vol            -                   dynamic track volume preset
4             D                      T                     Vol            Eq                  dynamic track volume preset with equalization
5             D                      T                     -              Eq                  dynamic track equalization preset
6             S                      C                     Vol            -                   static object volume preset
7             S                      C                     Vol            Eq                  static object volume preset with equalization
8             S                      C                     -              Eq                  static object equalization preset
9             D                      C                     Vol            -                   dynamic object volume preset
10            D                      C                     Vol            Eq                  dynamic object volume preset with equalization
11            D                      C                     -              Eq                  dynamic object equalization preset
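The twelve rows of Table 5 follow a regular pattern, so the three classification axes can be recovered from the 'preset_type' value with integer arithmetic. The following Python sketch shows one way to do this; the function name describe_preset_type and the returned dictionary layout are hypothetical, and the arithmetic only holds for the values 0 to 11 listed in the table.

```python
def describe_preset_type(preset_type: int) -> dict:
    """Decode a preset_type value (0..11) into the three axes of Table 5."""
    if not 0 <= preset_type <= 11:
        raise ValueError("Table 5 only defines preset_type values 0 to 11")
    target = "track" if preset_type < 6 else "channel"
    timing = "static" if preset_type % 6 < 3 else "dynamic"
    kind = preset_type % 3  # 0: volume, 1: volume + equalization, 2: equalization
    return {
        "timing": timing,
        "target": target,
        "volume": kind in (0, 1),
        "equalization": kind in (1, 2),
    }
```

For example, describe_preset_type(4) reports a dynamic track preset with both volume and equalization, matching row 4 of the table.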

Referring to Table 5, the mixing information includes volume information and equalization information, and it is stored in 'prst' in a different form depending on the preset type. The storage form of the mixing information is divided broadly according to whether the preset type is a static preset or a dynamic preset.

1. When the preset type is a static preset

When the preset type is a static preset, the mixing information is the same across the frames that make up the object-based audio content, so a single set of mixing information is stored for each audio object. The storage form is further divided according to whether the preset type is a track preset or a channel preset.

1.1. When the preset type is a static/track preset ('preset_type' value of 0, 1, or 2)

When mixing information is stored per track, the output channel type may be determined by the input track that has the most channels. For example, when a first input track contains two channels and a second input track contains one channel, the first input track has more channels, so the output channel type may be determined to be stereo.

In this case, the syntax of the preset in 'prst' may be as shown in Tables 6 to 8.

Table 6

if(preset_type == 0){ // static track volume preset
    for(i=0; i<num_preset_track; i++){
        unsigned int(8) preset_volume;
    }
}
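To make the field order concrete, the sketch below walks the body of a 'prst' box for 'preset_type' 0, following the field order of Table 3 and the loop of Table 6. It is a minimal illustration under several assumptions: the input starts immediately after the FullBox header, the trailing 'preset_name' string is returned as raw bytes without interpreting its encoding, and the function name and example bytes are hypothetical.

```python
def parse_static_track_volume_preset(body: bytes) -> dict:
    """Parse the 'prst' fields after the FullBox header for preset_type 0 (sketch)."""
    pos = 0
    preset_id = body[pos]; pos += 1
    num_preset_track = body[pos]; pos += 1
    track_ids = list(body[pos:pos + num_preset_track]); pos += num_preset_track
    preset_type = body[pos]; pos += 1
    preset_global_volume = body[pos]; pos += 1
    if preset_type != 0:
        raise ValueError("this sketch only handles preset_type 0")
    # One 8-bit preset_volume per input track, in preset_track_ID order.
    volumes = list(body[pos:pos + num_preset_track]); pos += num_preset_track
    if len(volumes) != num_preset_track:
        raise ValueError("body is truncated")
    return {
        "preset_ID": preset_id,
        "preset_track_ID": track_ids,
        "preset_global_volume": preset_global_volume,
        "preset_volume": volumes,
        "preset_name_raw": body[pos:],  # assumption: left undecoded
    }


# Hypothetical body: preset 1, tracks [1, 7], type 0, global volume 100, volumes [50, 80].
example_body = bytes([1, 2, 1, 7, 0, 100, 50, 80]) + b"rhythm\x00"
parsed_preset = parse_static_track_volume_preset(example_body)
```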

Table 7

if(preset_type == 1){ // static track volume preset with equalization
    for(i=0; i<num_preset_track; i++){
        unsigned int(8) preset_volume;
        unsigned int(8) num_freq_band;
        for(j=0; j<num_freq_band; j++){
            unsigned int(16) center_freq;
            unsigned int(16) bandwidth;
            unsigned int(8) preset_freq_gain;
        }
    }
}

Table 8

if(preset_type == 2){ // static track equalization preset
    for(i=0; i<num_preset_track; i++){
        unsigned int(8) num_freq_band;
        for(j=0; j<num_freq_band; j++){
            unsigned int(16) center_freq;
            unsigned int(16) bandwidth;
            unsigned int(8) preset_freq_gain;
        }
    }
}

The semantics of the syntax in Tables 6 to 8 are as follows.

'preset_volume' is the volume information.

The volume information may include a volume gain value between the input volume value of an input track and the output volume value of an output track. The volume gain value may be expressed as a percentage or in decibels (dB).

A volume gain value expressed as a percentage or in decibels may also be quantized before it is stored. In this case, the quantized volume gain value may be expressed as shown in Tables 9 and 10.

Table 9

index          0     1      2      3      ...   149    200
value(ratio)   0     0.02   0.04   0.06   ...   3.98   4.00

Table 10

index       0     1     2     3     4     5    6    7    8    9   10   11   12   13
value(dB)   -25   -21   -18   -15   -12   -8   -5   -3   -1   0   1    2    3    4
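A player that stores the decibel index of Table 10 needs to map it back to a gain, and an authoring tool needs the reverse mapping. The Python sketch below illustrates both directions. The function names are hypothetical, and converting decibels to a linear factor with 10^(dB/20) assumes the gain is an amplitude gain, which the text does not state; choosing the nearest table entry when quantizing is likewise an assumption.

```python
# Table 10, transcribed: index -> volume gain in dB.
VOLUME_GAIN_DB = [-25, -21, -18, -15, -12, -8, -5, -3, -1, 0, 1, 2, 3, 4]


def db_index_to_linear(index: int) -> float:
    """Map a Table 10 index to a linear factor (assumes amplitude gain)."""
    if not 0 <= index < len(VOLUME_GAIN_DB):
        raise ValueError("index is outside Table 10")
    return 10.0 ** (VOLUME_GAIN_DB[index] / 20.0)


def nearest_db_index(gain_db: float) -> int:
    """Quantize a gain in dB to the index of the closest Table 10 entry."""
    return min(range(len(VOLUME_GAIN_DB)),
               key=lambda i: abs(VOLUME_GAIN_DB[i] - gain_db))
```

Index 9 corresponds to 0 dB, so db_index_to_linear(9) leaves the volume unchanged.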

'num_freq_band' is the number of frequency bands to which sound equalization is applied and has an integer value from 0 to 32.

'center_freq' is the center frequency of each frequency band and has an integer value from 0 to 20,000 (unit: Hz).

'bandwidth' is the bandwidth of each frequency band and has an integer value from 0 to 20,000 (unit: Hz).

'preset_freq_gain' is the frequency gain value in each frequency band.

Like the volume gain value, the frequency gain value may be expressed as a percentage or in decibels (dB), and a frequency gain value expressed in either form may be quantized before it is stored. In this case, the quantized frequency gain value may be expressed as shown in Table 11.

Table 11

index   0     1      2      3      ...   149    200
gain    0     0.02   0.04   0.06   ...   3.98   4.00

1.2. When the preset type is a static/channel preset ('preset_type' value of 6, 7, or 8)

When mixing information is stored per channel, it may be stored taking into account the number of input tracks, the number of channels per input track, and the output channel type. In this case, the syntax of the preset in 'prst' may be as shown in Tables 12 to 14.

Table 12

if(preset_type == 6){ // static object volume preset
    unsigned int(8) num_input_channel[num_preset_track];
    unsigned int(8) output_channel_type;
    for (i=0; i<num_preset_track; i++){
        for (j=0; j<num_input_channel[i]; j++){
            for (k=0; k<num_output_channel; k++){
                unsigned int(8) preset_volume;
            }
        }
    }
}

Table 13

if(preset_type == 7){ // static object volume preset with equalization
    for (i=0; i<num_preset_track; i++){
        for (j=0; j<num_input_channel[i]; j++){
            for (k=0; k<num_output_channel; k++){
                unsigned int(8) preset_volume;
                unsigned int(8) num_freq_band;
                for(m=0; m<num_freq_band; m++){
                    unsigned int(16) center_freq;
                    unsigned int(16) bandwidth;
                    unsigned int(8) preset_freq_gain;
                }
            }
        }
    }
}

Table 14

if(preset_type == 8){ // static object equalization preset
    for (i=0; i<num_preset_track; i++){
        for (j=0; j<num_input_channel[i]; j++){
            for (k=0; k<num_output_channel; k++){
                unsigned int(8) num_freq_band;
                for(m=0; m<num_freq_band; m++){
                    unsigned int(16) center_freq;
                    unsigned int(16) bandwidth;
                    unsigned int(8) preset_freq_gain;
                }
            }
        }
    }
}

The semantics of the syntax in Tables 12 to 14 are as follows.

'num_input_channel[num_preset_track]' is an array that stores the number of channels of each input track.

As an example, 'num_input_channel[num_preset_track]' may be built from the 'channel_count' information in 'moov'/'trak'/'mdia'/'minf'/'stbl'/'stsd'. The entry for an input track has the value '1' when the track contains a mono channel, '2' when it contains a stereo channel, and '5' when it contains 5 channels.

'output_channel_type' is the output channel type, and 'num_output_channel' is the number of output channels. As an example, 'output_channel_type' and 'num_output_channel' may be related as shown in Table 15.

Table 15

output_channel_type   Meaning          num_output_channel
0                     mono channel     1
1                     stereo channel   2
2                     5 channel        5

Also, according to an embodiment of the present invention, when the preset type is a static/object/volume preset and the number of output channels is 5, the mixing information stored in 'prst' may be expressed as shown in Table 16.

Table 16

                        preset_track_ID = 1   preset_track_ID = 7
output channel volume   L      R              M
L                       50     0              50
R                       0      80             50
C                       50     80             0
Ls                      0      0              30
Rs                      0      0              30

In this case, the parameters stored in 'prst' have the following values.

num_preset_track = 2

preset_track_ID[2] = [1, 7]

num_input_channel[2] = [2, 1]

num_output_channel = 5

preset_volume = [50, 0, 50, 0, 0, 0, 80, 80, 0, 0, 50, 50, 0, 30, 30]
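The order of the 'preset_volume' values follows the nested loops of the static object volume syntax: input track first, then input channel, then output channel. The Python sketch below reproduces that ordering from the matrix values given above; the dictionary layout and the name flatten_preset_volume are hypothetical choices made for the illustration.

```python
OUTPUT_CHANNELS = ["L", "R", "C", "Ls", "Rs"]

# Mixing matrix from the example: track ID -> input channel -> output gains.
MIX = {
    1: {
        "L": {"L": 50, "R": 0, "C": 50, "Ls": 0, "Rs": 0},
        "R": {"L": 0, "R": 80, "C": 80, "Ls": 0, "Rs": 0},
    },
    7: {
        "M": {"L": 50, "R": 50, "C": 0, "Ls": 30, "Rs": 30},
    },
}


def flatten_preset_volume(mix, track_ids, output_channels):
    """Serialize the matrix in track / input-channel / output-channel order."""
    flat = []
    for track_id in track_ids:
        # dict preserves insertion order, so input channels keep their order.
        for gains in mix[track_id].values():
            for output_channel in output_channels:
                flat.append(gains[output_channel])
    return flat


preset_volume = flatten_preset_volume(MIX, [1, 7], OUTPUT_CHANNELS)
num_input_channel = [len(MIX[track_id]) for track_id in [1, 7]]
```

The result has 15 entries, one per (input channel, output channel) pair: 3 input channels times 5 output channels.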

Looking at 'preset_volume' here, some of the mixing information is stored redundantly. Because this needlessly increases the amount of stored information, a way to reduce the amount of information stored in 'prst' is needed. This is described in more detail in items B, C, and D of section 2 below.

2. When the preset type is a dynamic preset

When the preset type is a dynamic preset, the mixing information changes across the frames that make up the object-based audio content, so different mixing information may be stored for different frames.

Accordingly, the mixing information may be expressed as a matrix indexed by frame number (or sample number), and the matrix may in turn be expressed as a table that maps each frame of an input track to the corresponding mixing information.

The following describes in detail how to store the mixing information when the changing mixing information is given as a mapping table such as Table 17.

Table 17

sampling number   Input Track
                  preset_track_ID = 1   preset_track_ID = 3
1                 50                    20
2                 50                    20
...               ...                   ...
9                 50                    20
10                50                    20
11                50                    10
12                50                    10
...               ...                   ...
19                50                    10
20                50                    10
21                70                    60
22                70                    60
...               ...                   ...
29                70                    60
30                70                    60

A. Storing the mixing information value for each frame number as is

B. Storing the mixing information value for each frame number as a reference value and the difference between the mixing information and the reference value

The reference value is the reference mixing information value in a reference frame. Accordingly, the reference mixing information value in the reference frame, and the difference between the mixing information in each frame other than the reference frame and the reference mixing information value, may be stored in 'prst'.

If the reference value is 0, Table 17 can be expressed compactly as Table 18.

Table 18

sampling count   Input Track: preset_track_ID = 1
20               50
10               70

sampling count   Input Track: preset_track_ID = 3
10               20
10               10
10               60

Accordingly, when the mixing information is stored in 'prst' in the form of a table such as Table 18, the amount of stored information can be reduced.
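The reduction from the per-sample mapping table to the compact table amounts to grouping consecutive equal values and recording how many samples each group covers. The sketch below shows this with Python's itertools.groupby, using the per-sample values of the two input tracks from the mapping table; the function names are hypothetical, and the sketch covers only the case stated in the text where the reference value is 0, so stored differences equal the values themselves.

```python
from itertools import groupby


def to_sampling_counts(values):
    """Collapse per-sample values into (sampling count, value) runs."""
    return [(len(list(run)), value) for value, run in groupby(values)]


def from_sampling_counts(runs):
    """Expand (sampling count, value) runs back into per-sample values."""
    expanded = []
    for count, value in runs:
        expanded.extend([value] * count)
    return expanded


# Per-sample volume values for samples 1..30 of each input track.
track_1 = [50] * 20 + [70] * 10
track_3 = [20] * 10 + [10] * 10 + [60] * 10

runs_1 = to_sampling_counts(track_1)  # [(20, 50), (10, 70)]
runs_3 = to_sampling_counts(track_3)  # [(10, 20), (10, 10), (10, 60)]
```

Here 60 per-sample values shrink to five (count, value) pairs, and from_sampling_counts restores the original sequence exactly.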

C. Storing the mixing information using flag information that indicates repetition

In this method, when the mixing information value of the previous frame and that of the current frame are the same, the mixing information value is not stored. Instead, flag information indicating that the current frame has the same mixing information value as the previous frame is stored, which reduces the amount of information stored in 'prst'.

Even when the mixing information value changes over time, it is unlikely to change in every frame, so assigning a flag value to every frame is inefficient.

Therefore, according to the method for generating object-based audio content according to an embodiment of the present invention, the mixing information value and the flag information may be stored based on information about the frame interval at which the mixing information changes.

For example, when the mixing information changes as in Table 17, the mixing information (that is, the volume information) can be regarded as changing in units of 10 frames. Table 17 can therefore be expressed compactly as Table 19.

Table 19
preset_volume          | 50 | 50 | 70 | 20 | 10 | 60
volume_flag            | 0  | 1  | 0  | 0  | 0  | 0
modified preset_volume | 50 | _  | 70 | 20 | 10 | 60

Therefore, the parameters stored in 'prst' have the following relationship.

dynamic_interval = 10

volume_flag = [0, 1, 0, 0, 0, 0]

preset_volume = [50, 70, 20, 10, 60]

Here, 'dynamic_interval' denotes the frame interval and 'volume_flag' denotes the volume flag information. When the mixing information of the current frame is the same as that of the previous frame, 'volume_flag' has the value '1'; when the two differ, 'volume_flag' has the value '0'.

From this it can be understood that the frames included in the object-based audio content are divided into frame groups according to a specific frame interval, and that the mixing information is stored for each frame group.

That is, according to an embodiment of the present invention, when the first group mixing information for a first frame group differs from the second group mixing information for a second frame group, the preset parameters stored in 'prst' include the first group mixing information, the second group mixing information, first flag information indicating that the first and second group mixing information differ, and the number of frames included in each of the frame groups (that is, the frame interval).

Conversely, when the first group mixing information and the second group mixing information are the same, the preset parameters stored in 'prst' include the first group mixing information, second flag information indicating that the first and second group mixing information are the same, and the number of frames included in each of the frame groups.
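The flag-based scheme can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes that the value is constant within each frame group, so that the first frame of a group represents the whole group.

```python
def group_frames(frame_values, dynamic_interval):
    """Pick one value per frame group (assumes the value is constant within a group)."""
    return frame_values[::dynamic_interval]


def encode_with_flags(group_values):
    """Return (volume_flag, preset_volume) for a list of per-group values.

    A flag of 1 means "same as the previous group", so no value is stored;
    a flag of 0 means the value is stored explicitly.
    """
    volume_flag = []
    preset_volume = []
    for index, value in enumerate(group_values):
        if index > 0 and value == group_values[index - 1]:
            volume_flag.append(1)
        else:
            volume_flag.append(0)
            preset_volume.append(value)
    return volume_flag, preset_volume


# Values from the preset_volume row of Table 19.
flags, volumes = encode_with_flags([50, 50, 70, 20, 10, 60])
```

With the Table 19 values, the sketch yields the same 'volume_flag' and 'preset_volume' arrays as listed above.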

D. Storing mixing information using the number of times the mixing information changes and the frame numbers at which it changes

In this scheme, the number of times the mixing information changes, the frame numbers of the frames at which it changes, and the corresponding mixing information are stored. In terms of random access, this scheme is therefore more efficient than scheme C described above.

For example, when the mixing information changes as shown in Table 17, the number of changes, the frame numbers at which the mixing information changes, and the mixing information (that is, the volume information) stored in 'prst' are as follows.

num_updates = 3

updated_sample_number = [1, 11, 21]

preset_volume = [50, 20, 50, 10, 70, 60]

Here, 'num_updates' denotes the number of changes (updates) of the mixing information, and 'updated_sample_number' denotes the frame numbers at which the mixing information changes (is updated).
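To illustrate why this layout suits random access, the sketch below looks up the volumes that apply at an arbitrary frame with a binary search over 'updated_sample_number'. The function name is hypothetical, and the sketch assumes that 'preset_volume' is ordered update by update with one value per preset track, which is an interpretation of the arrays above rather than something stated explicitly.

```python
from bisect import bisect_right


def volumes_at_frame(frame_number, updated_sample_number, preset_volume, num_preset_track):
    """Return the per-track volumes in effect at frame_number.

    Assumption: preset_volume holds num_preset_track values per update,
    in the same order as updated_sample_number.
    """
    # Index of the last update whose frame number is <= frame_number.
    update_index = bisect_right(updated_sample_number, frame_number) - 1
    if update_index < 0:
        raise ValueError("frame_number precedes the first update")
    start = update_index * num_preset_track
    return preset_volume[start:start + num_preset_track]


updated_sample_number = [1, 11, 21]
preset_volume = [50, 20, 50, 10, 70, 60]
at_frame_15 = volumes_at_frame(15, updated_sample_number, preset_volume, 2)
```

Unlike the flag-based scheme, no earlier flags have to be replayed to reach a given frame; the lookup needs only the sorted list of update frame numbers.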

The schemes above address efficient storage of the mixing parameters when the mixing information changes with playback time. They are also applicable when the preset type is a static preset and the stored mixing information contains repeated values.

For example, when the mixing information stored in 'prst' is as shown in Table 16 and is stored according to scheme C above using flag information, Table 16 can be transformed into Table 20.

Table 20
preset_volume          | 50 | 0 | 50 | 0 | 0 | 0 | 80 | 80 | 0 | 0 | 50 | 50 | 0 | 30 | 30
volume_flag            | 0  | 0 | 0  | 0 | 1 | 1 | 0  | 1  | 0 | 1 | 0  | 1  | 0 | 0  | 1
modified preset_volume | 50 | 0 | 50 | 0 | _ | _ | 80 | _  | 0 | _ | 50 | _  | 0 | 30 | _

volume_flag = [0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1]

preset_volume = [50, 0, 50, 0, 80, 0, 50, 0, 30]
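A reader of such data has to expand the flags back into the full list of values. A minimal Python sketch of that step is given below; the function name is hypothetical and the error handling is an assumption of this example.

```python
def decode_with_flags(volume_flag, preset_volume):
    """Expand (volume_flag, preset_volume) into the full list of volume values."""
    stored = iter(preset_volume)
    result = []
    for flag in volume_flag:
        if flag == 0:
            # A flag of 0 means the value was stored explicitly.
            result.append(next(stored))
        else:
            # A flag of 1 means "repeat the previous value".
            if not result:
                raise ValueError("the first entry cannot repeat a previous value")
            result.append(result[-1])
    return result


volume_flag = [0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
preset_volume = [50, 0, 50, 0, 80, 0, 50, 0, 30]
full_volume = decode_with_flags(volume_flag, preset_volume)
```

Applied to the two arrays above, the sketch reproduces the original 15 values of the preset_volume row in Table 20 from only 9 stored values.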

In this case, the syntax of the preset in 'prst' shown in Table 12 may be modified as shown in Table 21.

if(preset_type == 6){ // static object volume preset
    unsigned int(8) num_input_channel[num_preset_track];
    unsigned int(8) output_channel_type;
    unsigned int(16) num_volume_flag;
    for (i=0; i<num_volume_flag; i++){
        unsigned int(8) volume_flag;
        if(volume_flag==0){
            unsigned int(8) preset_volume;
        }
    }
}

The semantics of the syntax in Table 21 are as follows.

'volume_flag' denotes the volume flag information and has the data type of a 1-bit integer. When the mixing information of the current frame is the same as that of the previous frame, 'volume_flag' has the value '1'; when the two differ, 'volume_flag' has the value '0'.

'num_volume_flag' denotes the array length of 'volume_flag'.

An embodiment in which the mixing information of a dynamic preset is stored in 'prst' based on the preset storage schemes described above is now described in detail.

2.1. When the preset type is a dynamic track preset ('preset_type' value of 3, 4, or 5)

As mentioned above, when the preset type is a track preset, the output channel type need not be considered when storing the mixing information.

According to an embodiment of the present invention, the syntax of the preset in 'prst' may be as shown in Tables 22 to 24. The syntaxes in Tables 22 to 24 store the mixing information using scheme D described above.

if(preset_type == 3){ // dynamic track volume preset
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        unsigned int(16) updated_sample_number;
        for(j=0; j<num_preset_track; j++){
            unsigned int(8) preset_volume;
        }
    }
}

if(preset_type == 4){ // dynamic track volume preset with equalization
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        unsigned int(16) updated_sample_number;
        for(j=0; j<num_preset_track; j++){
            unsigned int(8) preset_volume;
            unsigned int(16) num_freq_band;
            for (k=0; k<num_freq_band; k++){
                unsigned int(16) center_freq;
                unsigned int(16) bandwidth;
                unsigned int(8) preset_freq_gain;
            }
        }
    }
}

if(preset_type == 5){ // dynamic track equalization preset
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        unsigned int(16) updated_sample_number;
        for(j=0; j<num_preset_track; j++){
            unsigned int(16) num_freq_band;
            for(k=0; k<num_freq_band; k++){
                unsigned int(16) center_freq;
                unsigned int(16) bandwidth;
                unsigned int(8) preset_freq_gain;
            }
        }
    }
}

The semantics of the syntax in Tables 22 to 24 are as follows.

'num_updates' denotes the number of changes (updates) of the mixing information.

'updated_sample_number' denotes the frame number at which the mixing information changes (is updated).
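As an illustration of how a reader might walk the fields of the dynamic track volume preset (preset_type 3), the Python sketch below parses a byte payload laid out in the order of that syntax. It assumes big-endian, byte-aligned fields and a payload that starts directly at 'num_updates'; the function name and the way 'num_preset_track' is passed in are hypothetical.

```python
import struct


def parse_dynamic_track_volume(payload, num_preset_track):
    """Parse num_updates, then per update: updated_sample_number and one volume per track.

    Assumptions: big-endian fields, no padding, payload begins at num_updates.
    """
    offset = 0
    (num_updates,) = struct.unpack_from(">H", payload, offset)  # unsigned int(16)
    offset += 2
    updates = []
    for _ in range(num_updates):
        (updated_sample_number,) = struct.unpack_from(">H", payload, offset)
        offset += 2
        volumes = list(payload[offset:offset + num_preset_track])  # unsigned int(8) each
        if len(volumes) != num_preset_track:
            raise ValueError("payload is truncated")
        offset += num_preset_track
        updates.append((updated_sample_number, volumes))
    return updates


# Hypothetical payload built from the scheme D example values.
payload = (struct.pack(">H", 3)
           + struct.pack(">HBB", 1, 50, 20)
           + struct.pack(">HBB", 11, 50, 10)
           + struct.pack(">HBB", 21, 70, 60))
parsed = parse_dynamic_track_volume(payload, num_preset_track=2)
```

The presets with equalization would add the 'num_freq_band' loop inside the per-track loop in the same manner.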

In addition, when the mixing information is stored according to scheme C above, the syntax of Table 22 may be modified as shown in Table 25.

if(preset_type == 3){ // dynamic track volume preset
    unsigned int(8) dynamic_interval;
    unsigned int(32) num_volume_flag;
    for(i=0; i<num_volume_flag; i++){
        unsigned int(8) volume_flag;
        if(volume_flag==0){
            unsigned int(8) preset_volume;
        }
    }
}

The semantics of the syntax in Table 25 are as follows.

'dynamic_interval' denotes the frame interval.

2.2. When the preset type is a dynamic channel preset ('preset_type' value of 9, 10, or 11)

As mentioned above, if the mixing information is stored per channel, it may be stored in consideration of the number of input tracks, the number of channels per input track, and the output channel type.

In this case, the syntax of the preset in 'prst' may be as shown in Tables 26 to 28. These syntaxes store the mixing information using scheme D described above.

if(preset_type == 9){ // dynamic object volume preset
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        unsigned int(16) updated_sample_number;
        for(j=0; j<num_preset_track; j++){
            for (k=0; k<num_input_channel[j]; k++){
                for (m=0; m<num_output_channel; m++){
                    unsigned int(8) preset_volume;
                }
            }
        }
    }
}

if(preset_type == 10){ // dynamic object volume preset with equalization
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        for(j=0; j<num_preset_track; j++){
            for (k=0; k<num_input_channel[i]; k++){
                for (m=0; m<num_output_channel; m++){
                    unsigned int(8) preset_volume;
                    unsigned int(8) num_freq_band;
                    for(m=0; m<num_freq_band; m++){
                        for(n=0; n<num_freq_band; n++){
                            unsigned int(16) center_freq;
                            unsigned int(16) bandwidth;
                            unsigned int(8) preset_freq_gain;
                        }
                    }
                }
            }
        }
    }
}

if(preset_type == 11){ // dynamic object equalization preset
    unsigned int(16) num_updates;
    for(i=0; i<num_updates; i++){
        for(j=0; j<num_preset_track; j++){
            for (k=0; k<num_input_channel[i]; k++){
                for (m=0; m<num_output_channel; m++){
                    unsigned int(8) num_freq_band;
                    for(m=0; m<num_freq_band; m++){
                        for(n=0; n<num_freq_band; n++){
                            unsigned int(16) center_freq;
                            unsigned int(16) bandwidth;
                            unsigned int(8) preset_freq_gain;
                        }
                    }
                }
            }
        }
    }
}

The description so far treats the mixing information as containing only volume information and equalization information. According to an embodiment of the present invention, however, the mixing information may further include a size value of a sound image formed by at least one input channel and an angle value of that sound image. The size value and the angle value of the sound image are preset parameters that determine the virtual position of the sound image.

In this case, the angle value of the sound image may be quantized before it is stored. For example, the angle value of the sound image may be expressed in the form of a table such as Table 29.

Table 29
index    |   0 |   1 |   2 |   3 |   4 |   5 |   6
value(°) |   0 |   5 |  10 |  15 |  20 |  25 |  30
index    |   7 |   8 |   9 |  10 |  11 |  12 |  13
value(°) |  40 |  50 |  60 |  70 |  80 |  90 | 100
index    |  14 |  15 |  16 |  17 |  18 |  19 |  20
value(°) | 110 | 120 | 130 | 140 | 150 | 160 | 170
index    |  21 |  22 |  23 |  24 |  25 |  26 |  27
value(°) | 180 | 190 | 200 | 210 | 220 | 230 | 240
index    |  28 |  29 |  30 |  31 |  32 |  33 |  34
value(°) | 250 | 260 | 270 | 280 | 290 | 300 | 310
index    |  35 |  36 |  37 |  38 |  39 |  40 |  41
value(°) | 320 | 330 | 335 | 340 | 345 | 350 | 355
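The quantization table can be expressed compactly in code. The Python sketch below rebuilds the 42 table entries and maps an arbitrary angle to the index of the nearest entry. The nearest-entry rule and the use of circular distance (so that 358° maps to 0° rather than 355°) are assumptions of this example; the text above only gives the table itself.

```python
# Table 29: 5-degree steps near the front (0-30 and 335-355), 10-degree steps elsewhere.
ANGLE_TABLE = (list(range(0, 31, 5))
               + list(range(40, 331, 10))
               + list(range(335, 356, 5)))


def circular_distance(a, b):
    """Smallest absolute difference between two angles in degrees."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)


def quantize_angle(angle_deg):
    """Return the Table 29 index whose value is closest to angle_deg (assumed rule)."""
    angle = angle_deg % 360
    return min(range(len(ANGLE_TABLE)),
               key=lambda index: circular_distance(angle, ANGLE_TABLE[index]))


def dequantize_angle(index):
    """Return the angle in degrees stored at a Table 29 index."""
    return ANGLE_TABLE[index]
```

With 42 entries, an index fits in 6 bits, whereas a raw angle between 0 and 359 would need 9 bits.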

Further, according to an embodiment of the present invention, the object-based audio content may further include a mono/stereo audio signal, which is a downmix of the audio signal mixed according to any one of the at least one preset.

The mono/stereo audio signal is stored for compatibility with audio playback devices that cannot play object-based audio content.

When the object-based audio content further includes a mono/stereo audio signal, an audio device capable of playing object-based audio content plays the content based on the plurality of audio objects and the at least one preset, and an audio device that cannot play object-based audio content plays the mono/stereo audio signal instead. The object-based audio content can therefore be played regardless of the type of audio device.

As an example, the mono/stereo audio signal may be stored in 'mdat'. In this case, the semantics of flags in 'moov'/'trak'/'tkhd' may be modified as shown in Table 30. In Table 30, the underlined parts are the semantics to be deleted and the parts in bold are the semantics to be added.

flags - is a 24-bit integer with flags; the following values are defined:

- Track_enabled: Indicates that the track is enabled. Flag value is 0x000001. A disabled track (the low bit is zero) is treated as if it were not present.
- Track_in_movie: Indicates that the track is used in the presentation. Flag value is 0x000002.
- Track_in_interaction_movie: Indicates that the track is used in the presentation by an interactive music player. Flag value is 0x000002.
- Track_in_non_interaction_movie: Indicates that the track is used in the presentation by a non-interactive music player. Flag value is 0x000003.
- Track_in_preview: Indicates that the track is used when previewing the presentation. Flag value is 0x000004.

Storing preset parameters in 'trak' within 'moov' using MPEG-4 BIFS (Binary Format for Scenes)

According to an embodiment of the present invention, the preset parameters may be stored, using MPEG-4 BIFS, in a track box (hereinafter 'trak') within 'moov'.

In this case, a first preset parameter that represents overall information about the presets (for example, the number of presets or the default preset ID) may be stored in 'prco' described above, or may be stored using a node newly defined in BIFS.

When the first preset parameter is stored using a node newly defined in BIFS, the node interface may be expressed as shown in Table 31. In Table 31, 'PresetSound' is the newly defined node.

node interface

PresetSound{
    exposedField SFNode source NULL
    exposedField SFInt32 numPresets 1
    exposedField SFInt32 default_preset_ID 1
}

The semantics of the node interface in Table 31 are as follows.

The 'source' field follows the semantics of subclause 7.2.2.116 of ISO/IEC 14496-11:2005.

The 'numPresets' field and the 'default_preset_ID' field follow the semantics of 'prco' described above.

In addition, among the preset parameters, those representing volume information may be stored by appropriately combining the AudioMix node and the WideSound node.

Among the preset parameters, those representing equalization information may be stored using PROTO audioEcho among the existing AudioRXProto nodes, or may be stored using a node newly defined in BIFS.

When the equalization information (more precisely, the frequency gain values) is stored using a node newly defined in BIFS, the node interface may be expressed as shown in Table 32. In Table 32, 'PresetAudioEqualizer' is the newly defined node.

node interface
PresetAudioEqualizer{
    eventIn MFNode addChildren
    eventIn MFNode removeChildren
    exposedField MFNode children []
    exposedField SFInt32 numInputs 1
    exposedField MFFloat params []
}

The semantics of the node interface in Table 32 are as follows.

The 'children' field denotes the outputs of the nodes that can be mixed at the same time. Examples of nodes in the 'children' field include AudioSource and AudioMix.

'addChildren' denotes the list of nodes to be added to the 'children' field.

'removeChildren' denotes the list of nodes to be removed from the 'children' field.

The 'numInputs' field denotes the number of input tracks.

The 'params' field is a [numInputs × 3·numFreqBands] matrix in which each row stores the equalization parameters (equalization information) of the frequency bands applied to one input track. The parameters are listed in Table 33.

Table 33
Data Type | Function     | Default value | Range
float     | numFreqBands | 2             | 0, …, 32
float[]   | centerFreq   | []            | 0, …, 20000
float[]   | bandwidth    | []            | 0, …, 20000
float[]   | gain         | 1             | 0.1, …, 10

Here, 'numFreqBands' denotes the number of frequency bands, 'centerFreq' the center frequency of each frequency band, 'bandwidth' the bandwidth of each frequency band, and 'gain' the gain value of each frequency band.

That is, each row of the 'params' field is composed as follows.

numFreqBands = params [0]

centerFreq [0...numFreqBands-1] = params [1 ... numFreqBands]

bandwidth [0...numFreqBands-1] = params [numFreqBands + 1 ... 2·numFreqBands]

gain [0...numFreqBands-1] = params [2·numFreqBands+1 ... 3·numFreqBands]
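The row layout above can be unpacked with a few slices. The following Python sketch is only illustrative; the function name and the returned dictionary are hypothetical, and the sample row uses made-up values within the ranges of Table 33.

```python
def unpack_params_row(params):
    """Split one row of the 'params' field into its equalization parameters.

    Layout assumed from the text: params[0] is numFreqBands, followed by
    numFreqBands center frequencies, bandwidths, and gains, in that order.
    """
    num_freq_bands = int(params[0])
    expected_length = 1 + 3 * num_freq_bands
    if len(params) < expected_length:
        raise ValueError("row is shorter than 1 + 3 * numFreqBands")
    n = num_freq_bands
    return {
        "numFreqBands": n,
        "centerFreq": list(params[1:1 + n]),
        "bandwidth": list(params[1 + n:1 + 2 * n]),
        "gain": list(params[1 + 2 * n:1 + 3 * n]),
    }


# Hypothetical row for one input track with two frequency bands.
row = [2, 100.0, 1000.0, 50.0, 200.0, 1.0, 0.5]
equalizer = unpack_params_row(row)
```

Because 'numFreqBands' comes first, a reader can determine the length of the remaining three sections before reading them.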

Storing preset parameters in 'xml' within 'meta' using MPEG-4 LASeR (Lightweight Application Scene Representation)

According to an embodiment of the present invention, the preset parameters may be stored, using MPEG-4 LASeR, in an XML box (hereinafter 'xml') within 'meta'.

In this case, the preset parameters may be stored by newly defining elements and attributes as shown in Table 34.

i. presetContainer element

semantics

The presetContainer element stores the same information as 'prco' described above.

attribute

'numPreset' denotes the number of presets.
'defaultPresetID' denotes the default preset ID.

ii. preset element

semantics

The preset element stores the same information as 'prst' described above. The preset element exists as a child of the presetContainer element.

attribute
The syntax and semantics of the ISO-BMFF 'prst' described above are used as attributes.

Others

According to an embodiment of the present invention, when preset information is already described in a file that contains a plurality of audio objects, the object-based audio content format may refer to that information, or the preset information may be converted to fit the object-based audio content format and stored as preset parameters in that format.

Further, according to an embodiment of the present invention, when preset information is described in a file written in a scene description language such as BIFS or LASeR, the object-based audio content format may refer to that information, or the preset information may be converted to fit the schema of the object-based audio content format and stored as preset parameters in that format.

Further, according to an embodiment of the present invention, when preset information is obtained from a file that consists only of presets, the object-based audio content format may refer to that information. The preset information stored in such a file may also be stored in the object-based audio content format.

As mentioned above, description information (or description metadata) is additionally stored in the object-based audio content, and the stored description information can be used to search and filter object-based audio content. A method of storing the description information is described below with reference to FIGS. 7 and 8.

FIGS. 7 and 8 are diagrams illustrating the structure of a file format for storing object-based audio content that includes description information, according to an embodiment of the present invention.

In the ISO-based object-based audio content file format, the description information may include metadata describing an album (hereinafter 'album level metadata'), metadata describing a song (hereinafter 'song level metadata'), and metadata describing a track (hereinafter 'track level metadata'). These metadata are summarized in Table 35.

Table 35
Description                                                                                            | album | song | track
title                                                                                                  | o     | o    | o
singer                                                                                                 | o     | o    | -
composer                                                                                               | -     | o    | -
lyricist                                                                                               | -     | o    | -
performing musician                                                                                    | -     | -    | o
genre                                                                                                  | o     | o    | -
file date                                                                                              | o     | o    | o
CD track number of the song                                                                            | -     | o    | -
production                                                                                             | o     | o    | -
publisher                                                                                              | o     | o    | -
copyright information                                                                                  | o     | o    | -
ISRC (International Standard Recording Code)                                                           | -     | o    | -
image                                                                                                  | o     | o    | -
URL: site address related to the music and the artist (e.g. album homepage, fan cafe, music video)    | o     | o    | -

상기의 메타데이터는 "노래(song) 및 트랙을 표현하기 위한 메타데이터"와 "앨범을 표현하기 위한 메타데이터"의 2가지 타입으로 분류될 수 있다. 여기서, "앨범을 표현하기 위한 메타데이터"는 객체기반 오디오 컨텐츠 내에 저장된 노래(song) 중에서 같은 앨범 내에 수록되어 있는 노래(song)들에 대한 공통되는 정보들을 표현한다. The metadata can be classified into two types: "metadata for expressing songs and tracks" and "metadata for expressing albums". Here, "metadata for expressing an album" represents common information about songs stored in the same album among songs stored in object-based audio content.

The album level metadata may be stored in 'ftyp'/'meta', the song level metadata in 'moov'/'meta', and the track level metadata in 'moov'/'trak'/'meta'. These locations are summarized in Table 36.

Table 36

| Metadata | Location |
|---|---|
| track level | trak/meta box |
| song level | moov/meta box |
| album level | meta box of file |
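
The locations in Table 36 can be illustrated with a minimal sketch that walks the box hierarchy of an ISO-BMFF byte string and reports at which levels a 'meta' box is present. The function names `iter_boxes`, `find_meta_levels`, and `make_box` are hypothetical helpers, not part of the described format, and the sketch assumes the usual ISO-BMFF box header (32-bit big-endian size followed by a 4-character type, with size 1 signalling a 64-bit size and size 0 meaning "to the end").

```python
import struct

def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:      # a 64-bit size follows the type field
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:    # box runs to the end of the enclosing span
            size = end - pos
        if size < header or pos + size > end:
            break          # malformed box: stop rather than guess
        yield box_type, pos + header, pos + size
        pos += size

def find_meta_levels(data):
    """Return the description levels for which a 'meta' box is present."""
    levels = []
    for box_type, s, e in iter_boxes(data):
        if box_type == b"meta":                      # meta box of the file
            levels.append("album")
        elif box_type == b"moov":
            for t2, s2, e2 in iter_boxes(data, s, e):
                if t2 == b"meta":                    # moov/meta
                    levels.append("song")
                elif t2 == b"trak":                  # moov/trak/meta
                    if any(t3 == b"meta" for t3, _, _ in iter_boxes(data, s2, e2)):
                        levels.append("track")
    return levels

def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size header (hypothetical test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload
```

The sketch only locates the 'meta' boxes; it does not decode the MPEG-7 description they carry.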

The structure of the ISO-based object-based audio content file format in which the above metadata is stored may be as shown in FIGS. 7 and 8. The format structure shown in FIG. 7 is a single type file structure, and the format structure shown in FIG. 8 is a multiple type file structure.

Here, the above metadata may be handled according to mp7t (MPEG-7 type).

More specifically, 'CreationInformation', 'MediaInformation', and 'Semantics DS' of MPEG-7 may be used for the track level metadata and the song level metadata. For the album level metadata, 'ContentCollection DS' and 'CreationInformation DS' of MPEG-7 may be used, because the album level metadata contains structure information on the plurality of songs included in one album.

This is summarized in Tables 37 to 39.

Table 37

| Tag Name | Semantics |
|---|---|
| CreationInformation/Creation/Creator[@type="Instrument"] | The title of the track |
| CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonType"]/Name/{FamilyName, GivenName} (artist name); CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonGroupType"]/Name (group name) | The name of a musician who is performing instruments, such as vocal, guitar, keyboard and so on |
| CreationInformation/CreationCoordinates/Date/TimePoint | Time point of the recording |

Table 38

| Tag Name | Semantics |
|---|---|
| CreationInformation/Creation/Title[@type="songTitle"] | The title of the song |
| CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonType"]/Name/{FamilyName, GivenName} (artist name); CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonGroupType"]/Name (group name) | The name of a musician such as singer, composer and lyricist |
| CreationInformation/Classification/Genre[@href="urn:id3:v1:genreID"] | Genre |
| CreationInformation/CreationCoordinates/Date/TimePoint | Time point when the song is released |
| Semantics/SemanticBase[@xsi:type="SemanticStateType"]/AttributeValuePair | CD track number of the song |
| CreationInformation/Creation/Abstract/FreeTextAnnotation | Information on production, publisher and site address related to the music and the artist (e.g. album homepage, fan cafe and music video) |
| CreationInformation/Creation/copyrightString | Textual label indicating information that may be displayed or otherwise made known to the end user |
| MediaInformation/MediaIdentification/EntityIdentifier | ISRC |
| CreationInformation/Creation/TitleMedia[@type="TitleImage"] | |

Table 39

| Tag Name | Semantics |
|---|---|
| ContentCollection/CreationInformation/Creation/Title[@type="albumTitle"] | The title of the album |
| ContentCollection/CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonType"]/Name/{FamilyName, GivenName} (artist name); CreationInformation/Creation/Creator[Role/@href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"]/Agent[@xsi:type="PersonGroupType"]/Name (group name) | The name of the representative musician of the album |
| ContentCollection/CreationInformation/Classification/Genre[@href="urn:id3:v1:genreID"] | Genre |
| ContentCollection/CreationInformation/CreationCoordinates/Date/Timepoint | Time point when the album is released |
| ContentCollection/CreationInformation/Creation/Abstract/FreeTextAnnotation | Information on production, publisher and site address related to the music and the artist (e.g. album homepage, fan cafe and music video) |
| ContentCollection/CreationInformation/Creation/CopyrightString | Textual label indicating information that may be displayed or otherwise made known to the end user |
| ContentCollection/CreationInformation/Creation/TitleMedia[@type="TitleImage"] | The title of the multimedia content in image form |

In addition, the object-based audio content may include audio content-related information such as the lyrics of a song. If this information is displayed on the audio content playback device while the object-based audio content is played, the object-based audio service can be provided to the user more effectively. The audio content-related information may change with the playback time of the object-based audio content. Hereinafter, audio content-related information that changes with the playback time is referred to as 'Timed Text'.

In the object-based audio content file format, Timed Text may be provided using a Timed Text standard such as 3GPP TS 26.245 (hereinafter, '3GPP Timed Text') or the MPEG-4 Streaming Text Format.

As an example, when Timed Text is provided using 3GPP Timed Text, the 3GPP Timed Text may comprise a text sample and a sample description.

Here, the text sample may comprise a text string and a sample modifier, where the sample modifier carries information on how to render the text string.
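
As a rough illustration of this text string plus sample modifier split, the sketch below parses one text sample. It assumes the 3GPP Timed Text sample layout of a 16-bit big-endian text length, the text bytes (UTF-8 unless a UTF-16 byte-order mark is present), and then zero or more modifier boxes; that layout is not spelled out in this description, and `parse_text_sample` is a hypothetical name.

```python
import struct

def parse_text_sample(sample):
    """Split a text sample into its text string and its modifier boxes."""
    (text_len,) = struct.unpack_from(">H", sample, 0)
    raw = sample[2:2 + text_len]
    # A leading byte-order mark signals UTF-16; otherwise treat as UTF-8.
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8")
    modifiers = []
    pos = 2 + text_len
    while pos + 8 <= len(sample):
        size, box_type = struct.unpack_from(">I4s", sample, pos)
        if size < 8 or pos + size > len(sample):
            break  # malformed modifier box: stop rather than guess
        modifiers.append((box_type.decode("ascii"), sample[pos + 8:pos + size]))
        pos += size
    return text, modifiers
```

The modifier payloads are returned undecoded; interpreting them (for example, highlight ranges) is left out of this sketch.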

In ISO-BMFF, the text samples are stored as one track (i.e., a text track) in 'mdat'. The stored text samples are played in synchronization with timed media such as an audio track, using the information stored in 'stts', 'stsc', 'stco', and the like within 'moov'/'trak'/'mdia'/'minf'/'stbl'.
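
To make the synchronization step concrete, here is a minimal sketch of how the time-to-sample information in 'stts' could be used to decide which text sample is shown at a given playback time. It assumes the 'stts' entries have already been read as (sample_count, sample_delta) pairs and that the track timescale is known; the function names are hypothetical.

```python
import bisect

def sample_start_times(stts_entries, timescale):
    """Expand (sample_count, sample_delta) runs into start times in seconds."""
    times, t = [], 0
    for count, delta in stts_entries:
        for _ in range(count):
            times.append(t / timescale)
            t += delta
    return times

def active_sample(times, playback_time):
    """Index of the last sample starting at or before playback_time, or None."""
    i = bisect.bisect_right(times, playback_time) - 1
    return i if i >= 0 else None
```

For instance, with entries `[(2, 1000), (1, 3000)]` and a timescale of 1000, the three text samples start at 0, 1, and 2 seconds, so a playback time of 1.5 seconds selects the second sample.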

In addition, the sample description contains information on how the text is rendered. For example, the sample description contains information on the position of the displayed text, the color of the text, the background color, and the like. The sample description may be described in 'stsd' by extending 'SampleEntry' to 'TextSampleEntry'.

The method of generating object-based audio content according to an embodiment of the present invention has been described above. Hereinafter, a method of reproducing object-based audio content generated by that method is described with reference to FIG. 5.

FIG. 5 is a flowchart illustrating a method of reproducing object-based audio content according to an embodiment of the present invention.

First, in step 510, a plurality of audio objects and at least one preset are restored from the object-based audio content.

In this case, the object-based audio content is content generated by the object-based audio content generation method described with reference to FIG. 3.

In step 520, an output audio signal is generated by mixing the plurality of audio objects based on the at least one preset.

In step 530, the generated output audio signal is reproduced.
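
The mixing in step 520 can be sketched as a gain-weighted sum, on the assumption that a preset supplies one volume gain per pair of input object and output channel. This is an illustrative simplification rather than the method of the embodiment, and `mix_objects` and its argument layout are hypothetical.

```python
def mix_objects(objects, gains):
    """Mix mono audio objects into output channels.

    objects: list of equal-length sample lists, one per audio object.
    gains:   gains[i][c] is the volume gain from object i to output channel c.
    """
    num_channels = len(gains[0])
    num_samples = len(objects[0])
    out = [[0.0] * num_samples for _ in range(num_channels)]
    for obj, obj_gains in zip(objects, gains):
        for c, g in enumerate(obj_gains):
            for n, x in enumerate(obj):
                out[c][n] += g * x  # accumulate this object's contribution
    return out
```

With two objects and a stereo output, gains of `[[1.0, 0.0], [0.5, 0.5]]` would place the first object entirely in the first channel and split the second object evenly between both channels.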

As mentioned above, when the default preset ID included in the preset parameter has the value '0', the object-based audio content may be played according to a preset stored inside the bitstream of the audio objects that were encoded and stored using the multi-object audio compression technology (SAOC). Hereinafter, the process of playing the object-based audio content based on a preset stored inside that SAOC bitstream is described in detail with reference to FIG. 6.

FIG. 6 is a flowchart illustrating a method of reproducing object-based audio content according to another embodiment of the present invention.

First, in step 610, it is determined whether a preset exists in the object-based audio content.

If it is determined in step 610 that a preset exists (that is, that 'num_preset' has a value other than '0'), it is determined in step 620 whether a default preset ID exists in the object-based audio content.

If it is determined in step 620 that a default preset ID exists (that is, that 'default_preset_ID' has a value other than '0'), then in step 630 an output audio signal is generated by mixing the plurality of audio objects based on the preset whose preset ID equals the default preset ID, and in step 670 the generated output signal is reproduced.

If it is determined in step 610 that no preset exists (that is, that 'num_preset' has the value '0'), or if it is determined in step 620 that no default preset ID exists (that is, that 'default_preset_ID' has the value '0'), it is determined in step 640 whether an SAOC bitstream exists.

If it is determined in step 640 that an SAOC bitstream exists, it is determined in step 650 whether a preset exists in the SAOC bitstream.

If it is determined in step 650 that a preset exists in the SAOC bitstream, then in step 660 an output audio signal is generated by mixing the plurality of audio objects based on the first preset included in the SAOC bitstream, and in step 670 the generated output signal is reproduced.

If it is determined in step 640 that no SAOC bitstream exists, or if it is determined in step 650 that no preset exists in the SAOC bitstream, it is concluded that the object-based audio content contains no preset, and the object-based audio content is not reproduced.
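
The decision flow of FIG. 6 can be condensed into a short sketch. The names `num_preset` and `default_preset_ID` come from the description; representing the file-level presets as a dictionary keyed by preset ID and the SAOC-bitstream presets as a list (or `None` when no SAOC bitstream exists) is an assumption made only for illustration.

```python
def select_preset(num_preset, default_preset_id, presets, saoc_presets=None):
    """Return the preset to mix with, or None when the content is not played.

    presets:      dict mapping preset ID to preset (file-format level).
    saoc_presets: list of presets inside the SAOC bitstream,
                  or None when no SAOC bitstream exists.
    """
    # Steps 610 and 620: a preset and a default preset ID both exist.
    if num_preset != 0 and default_preset_id != 0:
        return presets[default_preset_id]  # used for mixing in step 630
    # Steps 640 and 650: fall back to the first preset in the SAOC bitstream.
    if saoc_presets:
        return saoc_presets[0]
    # No usable preset anywhere: the content is not reproduced.
    return None
```

A caller would mix and play only when the returned value is not `None`, which mirrors the final branch of the flowchart.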

Further, the embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments of the present invention, and vice versa.

As described above, the present invention has been described with reference to specific matters such as particular components, and to limited embodiments and drawings. These are provided only to aid a more general understanding of the present invention; the present invention is not limited to the above embodiments, and a person of ordinary skill in the field to which the present invention belongs can make various modifications and variations from this description. Therefore, the spirit of the present invention should not be defined by the described embodiments alone; not only the claims set out below but also everything equal or equivalent to those claims falls within the scope of the spirit of the present invention.

Claims

(Deleted)

An object-based audio content generation device comprising:
a processor that generates at least one preset related to a plurality of audio objects, and generates object-based audio content including the plurality of audio objects and the at least one preset,
wherein the object-based audio content includes (i) a type of output channel, which is information on which channel the object-based audio content is played through, (ii) the number of input channels, (iii) the number of output channels according to the type of output channel, and (iv) a volume gain between an input channel and an output channel.

The device of claim 2,
wherein the preset is editable by a user, and
the object-based audio content is provided according to a media file format including track information.

An object-based audio content reproducing apparatus comprising:
a processor that extracts, from object-based audio content, a plurality of audio objects and at least one preset related to the plurality of audio objects, and reproduces the object-based audio content using the preset and the audio objects,
wherein the object-based audio content includes (i) a type of output channel, which is information on which channel the object-based audio content is played through, (ii) the number of input channels, (iii) the number of output channels according to the type of output channel, and (iv) a volume gain between an input channel and an output channel.

The apparatus of claim 4,
wherein the preset is editable by a user, and
the object-based audio content is provided according to a media file format including track information.

An object-based audio content generation method comprising:
generating at least one preset related to a plurality of audio objects; and
generating object-based audio content including the plurality of audio objects and the at least one preset,
wherein the object-based audio content includes (i) a type of output channel, which is information on which channel the object-based audio content is played through, (ii) the number of input channels, (iii) the number of output channels according to the type of output channel, and (iv) a volume gain between an input channel and an output channel.

The method of claim 6,
wherein the preset is editable by a user, and
the object-based audio content is provided according to a media file format including track information.

An object-based audio content playback method comprising:
extracting, from object-based audio content, a plurality of audio objects and at least one preset related to the plurality of audio objects; and
reproducing the object-based audio content using the preset and the audio objects,
wherein the object-based audio content includes (i) a type of output channel, which is information on which channel the object-based audio content is played through, (ii) the number of input channels, (iii) the number of output channels according to the type of output channel, and (iv) a volume gain between an input channel and an output channel.

The method of claim 8,
wherein the preset is editable by a user, and
the object-based audio content is provided according to a media file format including track information.