KR20130053444A

KR20130053444A - Signaling data for multiplexing video components

Info

Publication number: KR20130053444A
Application number: KR1020137003808A
Authority: KR
Inventors: Ying Chen; Marta Karczewicz; Yong Wang
Original assignee: Qualcomm Incorporated
Priority date: 2010-07-15
Filing date: 2011-07-15
Publication date: 2013-05-23
Anticipated expiration: 2031-07-15
Also published as: KR101436267B1

Abstract

A server may provide a client with information describing characteristics of audio and video components, independently of the encoded samples of the audio and video components themselves. The client may use the information to select components and then request the selected components, e.g., in accordance with a streaming network protocol. In one example, an apparatus for sending encapsulated video data includes a processor configured to determine characteristics for components of a plurality of representations of video content, wherein the characteristics comprise at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components; and one or more interfaces configured to send the characteristics to a client device, receive a request from the client device for at least one of the components after sending the characteristics, and send the requested components to the client device in response to the request.

Description

SIGNALING DATA FOR MULTIPLEXING VIDEO COMPONENTS

This application claims the benefit of U.S. Provisional Application No. 61/364,747, filed July 15, 2010, and U.S. Provisional Application No. 61/366,436, filed July 21, 2010, both of which are hereby incorporated by reference in their respective entireties.

This disclosure relates to the storage and transport of encoded video units.

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, and the like. To transmit and receive digital video information more efficiently, digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), and in extensions of such standards.

Video compression techniques perform spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video frame or slice may be partitioned into macroblocks. Each macroblock can be further partitioned. Macroblocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to neighboring macroblocks. Macroblocks in an inter-coded (P or B) frame or slice may use spatial prediction with respect to neighboring macroblocks in the same frame or slice, or temporal prediction with respect to macroblocks in other reference frames.

After video data has been encoded, the video data may be packetized for transmission or storage. The video data may be assembled into a video file conforming to any of a variety of standards, such as the International Organization for Standardization (ISO) base media file format and extensions thereof, such as AVC.

Efforts have been made to develop new video coding standards based on H.264/AVC. One such standard is the Scalable Video Coding (SVC) standard, which is the scalable extension to H.264/AVC. Another standard is Multiview Video Coding (MVC), which has become the multiview extension to H.264/AVC. A joint draft of MVC is described in JVT-AB204, "Joint Draft 8.0 on Multiview Video Coding," 28th JVT meeting, Hannover, Germany, July 2008, available at http://wftp3.itu.int/av-arch/jvt-site/2008_07_Hannover/JVT-AB204.zip. A version of the AVC standard is described in JVT-AD007, "Editors' draft revision to ITU-T Rec. H.264 | ISO/IEC 14496-10 Advanced Video Coding - in preparation for ITU-T SG 16 AAP Consent (in integrated form)," 30th JVT meeting, Geneva, CH, February 2009, available at http://wftp3.itu.int/av-arch/jvt-site/2009_01_Geneva/JVT-AD007.zip. This document integrates SVC and MVC into the AVC specification.

In general, this disclosure describes techniques for transporting video data, e.g., via a network streaming protocol such as hypertext transfer protocol (HTTP) streaming. In some cases, video content may include multiple possible combinations of audio data and video data. For example, the content may have multiple possible audio tracks (e.g., in different languages such as English, Spanish, and French) and multiple possible video tracks (e.g., encoded with different coding parameters, such as various bitrates and frame rates, and/or with various other characteristics). These tracks may be referred to as components, e.g., audio components and video components. Each combination of components may form a unique presentation of the multimedia content and may be delivered to a client as a service. The techniques of this disclosure allow a server to signal characteristics of the various representations, or multimedia components, in a single data structure. In this manner, a client device may retrieve the data structure and select one of the representations to request from the server, e.g., in accordance with a streaming network protocol.

In one example, a method of sending encapsulated video data includes sending characteristics for components of a plurality of representations of video content to a client device, wherein the characteristics comprise at least one of a frame rate, a profile indicator, a level indicator, dependencies between the components, and a number of target output views for a 3D representation; receiving a request from the client device for at least one of the components after sending the characteristics; and sending the requested components to the client device in response to the request.

In another example, an apparatus for sending encapsulated video data includes a processor configured to determine characteristics for components of a plurality of representations of video content, wherein the characteristics comprise at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components; and one or more interfaces configured to send the characteristics to a client device, receive a request from the client device for at least one of the components after sending the characteristics, and send the requested components to the client device in response to the request.

In another example, an apparatus for sending encapsulated video data includes means for sending characteristics for components of a plurality of representations of video content to a client device, wherein the characteristics comprise at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components; means for receiving a request from the client device for at least one of the components after sending the characteristics; and means for sending the requested components to the client device in response to the request.

In another example, a computer program product includes a computer-readable storage medium comprising instructions that, when executed, cause a processor of a source device for sending encapsulated video data to send characteristics for components of a plurality of representations of video content to a client device, wherein the characteristics comprise at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components; receive a request from the client device for at least one of the components after sending the characteristics; and send the requested components to the client device in response to the request.

In another example, a method of receiving encapsulated video data includes requesting characteristics for components of a plurality of representations of video content from a source device, wherein the characteristics comprise at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components; selecting one or more of the components based on the characteristics; requesting samples of the selected components; and decoding and presenting the samples after the samples have been received.

In another example, an apparatus for receiving encapsulated video data includes one or more interfaces configured to request characteristics for components of a plurality of representations of video content from a source device, wherein the characteristics comprise at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components; and a processor configured to select one or more of the components based on the characteristics and to cause the one or more interfaces to submit requests for samples of the selected components to the source device.

In another example, an apparatus for receiving encapsulated video data includes means for requesting characteristics for components of a plurality of representations of video content from a source device, wherein the characteristics comprise at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components; means for selecting one or more of the components based on the characteristics; means for requesting samples of the selected components; and means for decoding and presenting the samples after the samples have been received.

In another example, a computer program product includes a computer-readable storage medium comprising instructions that cause a processor of a device for receiving encapsulated video data to request characteristics for components of a plurality of representations of video content from a source device, wherein the characteristics comprise at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components; select one or more of the components based on the characteristics; request samples of the selected components; and decode and present the samples after the samples have been received.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

FIG. 1 is a block diagram illustrating an example system in which an audio/video (A/V) source device transfers audio and video data to an A/V destination device.
FIG. 2 is a block diagram illustrating components of an example encapsulation unit suitable for use in the A/V source device shown in FIG. 1.
FIG. 3 is a conceptual diagram illustrating an example component map box and an example component arrangement box that may be used in the system of FIG. 1.
FIG. 4 is a conceptual diagram illustrating an example timing interval for multiplexing an example video component and an example audio component in the system of FIG. 1.
FIG. 5 is a flowchart illustrating an example method for providing a component map box and component arrangement boxes from a server to a client.

In general, this disclosure describes techniques for transporting video content. The techniques of this disclosure include transporting video content using a streaming protocol, such as hypertext transfer protocol (HTTP) streaming. Although HTTP is described for purposes of illustration, the techniques presented in this disclosure may be useful with other types of streaming. The video content may be encapsulated in video files of a particular file format, such as the ISO base media file format or its extensions. The video content may also be encapsulated in an MPEG-2 transport stream. A content server may provide a multimedia service that includes different types of media data (e.g., audio and video) and various sets of data for each type (e.g., different languages, such as English, Spanish, and German audio, and/or different encoding types for video, such as MPEG-2, MPEG-4, H.264/AVC, or H.265). The techniques of this disclosure may be particularly useful for signaling how the various types, and the sets of data of each type, can be combined and multiplexed.

This disclosure refers to a collection of related multimedia data of a scene as "content," which may include multiple video and/or audio content components. The term "content component," or simply "component," refers to media of a single type, e.g., video or audio data. A component of data may refer to a track of data, a sub-track, or a collection of tracks or sub-tracks. In general, a "track" may correspond to a sequence of related encoded picture samples, while a sub-track may correspond to a subset of the encoded samples of a track. As an example, a content component may correspond to a video track, an audio track, or movie subtitles. An HTTP streaming server may deliver a set of content components to a client as a service to the client.

A service may correspond to a selection of one video content component from all video content components available for the content and a selection of one audio content component from all audio content components available for the content. For example, a football match program, as content stored on an HTTP server, may have multiple video content components, e.g., with different bitrates (512 kbps or 1 Mbps) or different frame rates, and multiple audio components, e.g., English, Spanish, or Chinese. Thus, a service provided to the client may correspond to a selection of one video component and one audio component, e.g., Spanish audio with the 512 kbps video. A combination of video and audio components may also be referred to as a representation of the content.
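The pairing of one video component with one audio component can be illustrated with a minimal Python sketch. The `Component` class and `enumerate_services` function are hypothetical names introduced only for this illustration; the component values are taken from the football match example above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a content component."""
    kind: str         # "video" or "audio"
    description: str  # e.g. "512 kbps" or "Spanish"


def enumerate_services(components):
    """Pair every video component with every audio component."""
    videos = [c for c in components if c.kind == "video"]
    audios = [c for c in components if c.kind == "audio"]
    return list(product(videos, audios))


components = [
    Component("video", "512 kbps"),
    Component("video", "1 Mbps"),
    Component("audio", "English"),
    Component("audio", "Spanish"),
    Component("audio", "Chinese"),
]
services = enumerate_services(components)
```

Under these assumptions, two video components and three audio components yield six candidate representations, one of which is the 512 kbps video with Spanish audio.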

In HTTP streaming, as one example, a client device generates one or more requests for data in the form of HTTP Get requests or partial Get requests. An HTTP Get request specifies a uniform resource locator (URL) or uniform resource name (URN) of a file. An HTTP partial Get request specifies a URL or URN of a file, as well as a byte range of the file to retrieve. An HTTP streaming server may respond to an HTTP Get request by outputting (e.g., sending) the file at the requested URL or URN, or, in the case of an HTTP partial Get request, the requested byte range of the file. So that a client can properly generate HTTP Get and partial Get requests, the server may provide the client with information regarding the URLs and/or URNs of files corresponding to content components, as well as characteristics of those components, such that the client can select desired content components and properly generate HTTP Get and/or partial Get requests for them.
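The difference between a Get request and a partial Get request can be sketched in Python with the standard `urllib.request.Request` class. The helper names `build_get` and `build_partial_get` and the example URL are assumptions made for illustration; the sketch only constructs the request objects and does not contact a server. A partial Get is expressed here with the standard HTTP `Range` header.

```python
from urllib.request import Request


def build_get(url):
    """Build a Get request for a whole file."""
    return Request(url, method="GET")


def build_partial_get(url, first_byte, last_byte):
    """Build a partial Get request for an inclusive byte range of a file."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    headers = {"Range": f"bytes={first_byte}-{last_byte}"}
    return Request(url, headers=headers, method="GET")


# Hypothetical file location; a real client would take it from the
# information signaled by the server.
url = "http://example.com/content/video_512k.mp4"
full = build_get(url)
partial = build_partial_get(url, 0, 1023)
```

Passing either object to `urllib.request.urlopen` would send it; a server that honors the byte range typically answers a partial Get with status 206 and only the requested bytes.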

The techniques of this disclosure include signaling characteristics of content components, e.g., signaling the locations of data for the various content components. In this manner, a client device may select a representation of the content and generate requests for combinations of the various types of content components. For example, in accordance with the example above, a user may elect to view the 512 kbps video with Spanish audio. The viewer's client device may submit a request for these two components. That is, the client device may determine the locations of the data for the 512 kbps video and the Spanish audio using the data signaled by the server, and then generate requests for the data corresponding to these content components. In response to the request, the server may deliver these two components as a service to the client device.

The ISO base media file format is designed to contain timed media information for a presentation in a flexible, extensible format that facilitates interchange, management, editing, and presentation of the media. The ISO base media file format (ISO/IEC 14496-12:2004) is specified in MPEG-4 Part 12, which defines a general structure for time-based media files. It is used as the basis for other file formats in the family, such as the AVC file format (ISO/IEC 14496-15), which defines support for H.264/MPEG-4 AVC video compression, the 3GPP file format, the SVC file format, and the MVC file format. The 3GPP file format and the MVC file format are extensions of the AVC file format. The ISO base media file format contains the timing, structure, and media information for timed sequences of media data, such as audio-visual presentations. The file structure may be object-oriented. A file can be decomposed into basic objects very simply, and the structure of the objects is implied from their type.

Files conforming to the ISO base media file format (and extensions thereof) may be formed as a series of objects, called "boxes." Data in the ISO base media file format may be contained in boxes, such that no other data needs to be contained within the file and there need not be data outside of boxes within the file. This includes any initial signature required by the specific file format. A "box" may be an object-oriented building block defined by a unique type identifier and length. In general, a presentation is contained in one file, and the media presentation is self-contained. The movie container (movie box) may contain the metadata of the media, and the video and audio frames may be contained in the media data container and could be in other files.
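Because every box is defined by a length and a type identifier, a file can be walked box by box without understanding each payload. The following Python sketch is a minimal reader for top-level boxes, assuming the usual ISO base media file format header layout (a 32-bit big-endian size followed by a four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning the box runs to the end of the data). The helper `make_box` and the sample bytes are made up for illustration and do not form a playable file.

```python
import struct


def iter_boxes(data):
    """Yield (box_type, payload) for each top-level box in `data`."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("malformed box")
        yield box_type.decode("ascii"), data[offset + header:offset + size]
        offset += size


def make_box(box_type, payload):
    """Build a box with a 32-bit size field (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


sample = make_box("ftyp", b"isom") + make_box("moov", b"") + make_box("mdat", b"\x00\x01")
boxes = list(iter_boxes(sample))
```

A reader built this way can skip unknown box types by their length alone, which is what allows new boxes to be added to the format without breaking existing parsers.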

In accordance with the techniques of this disclosure, a server may provide a component map box that signals characteristics of various content components. The component map box may correspond to a data structure that may be stored in a file separate from the files storing the encoded samples for the various content components. The component map box may signal characteristics of a content component that are not conventionally signaled for video data in a data structure stored outside the files that actually contain the coded video samples. Such a data structure, as in the component map box, may also be signaled in a manifest file or Media Presentation Description of HTTP streaming.

These characteristics may include, for example, a frame rate, a profile indicator, a level indicator, and dependencies between the components. The characteristics signaled by the component map box may also include three-dimensional characteristics for 3D video, such as a number of views and relationships between the views (e.g., two views that form a stereo pair). The component map box may signal these characteristics in addition to conventionally signaled characteristics for a content component, such as its bitrate and resolution. The component map box may also provide a service identifier (e.g., a content_id value) that uniquely identifies a service of the content. Each component of the service may be associated with the service identifier.
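How a client might use such characteristics can be sketched in Python. The `ComponentEntry` class, its field names, and both helper functions are hypothetical and do not reflect the actual box syntax; the profile and level numbers are illustrative H.264-style values, and dependencies are assumed to be acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentEntry:
    """Hypothetical view of one component described by a component map box."""
    component_id: int
    frame_rate: float
    profile_idc: int
    level_idc: int
    depends_on: list = field(default_factory=list)  # component_ids needed for decoding


def is_decodable(entry, supported_profiles, max_level_idc, max_frame_rate):
    """Check a component against the client's (assumed) decoder capabilities."""
    return (entry.profile_idc in supported_profiles
            and entry.level_idc <= max_level_idc
            and entry.frame_rate <= max_frame_rate)


def with_dependencies(component_id, entries):
    """Return the component and everything it depends on, dependencies first."""
    by_id = {e.component_id: e for e in entries}
    ordered = []

    def visit(cid):
        if cid in ordered:
            return
        for dep in by_id[cid].depends_on:
            visit(dep)
        ordered.append(cid)

    visit(component_id)
    return ordered


entries = [
    ComponentEntry(1, 15.0, 66, 30),                   # e.g. a base layer
    ComponentEntry(2, 30.0, 100, 40, depends_on=[1]),  # e.g. an enhancement layer
]
playable = [e.component_id for e in entries if is_decodable(e, {66}, 30, 30.0)]
fetch_order = with_dependencies(2, entries)
```

The point of the sketch is that both decisions, whether a component is decodable and which other components must be fetched with it, can be made from the signaled characteristics alone, before any encoded samples are retrieved.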

A source device may be configured to provide a component map box for video content regardless of how the content is encapsulated. That is, the source device may provide the component map box to client devices regardless of whether the video content is encapsulated according to the Advanced Video Coding (AVC) file format, the Scalable Video Coding (SVC) file format, the Multiview Video Coding (MVC) file format, the Third Generation Partnership Project (3GPP) file format, or other file formats. A component map box may signal characteristics of the content components for particular content. In some examples, each component may correspond to a video or audio track of a file, a track in a series of small files, track fragments, combinations of tracks (e.g., in SVC or MVC), or a subset of a track.

In general, a component map box may be stored separately from the video data it describes. In some examples, the component map box may be included in a separate file or as part of one movie file that includes content components, e.g., an mp4 or 3GP file, or another file supporting the functionality described in this disclosure. The location of the component map box may vary by encapsulating file type. Moreover, the component map box may be made an extension to the ISO base media file format or to one or more of its extensions. Such a data structure, as in the component map box, may also be signaled in a manifest file or Media Presentation Description of HTTP streaming.

By default, a component map box may apply to the entire duration of the associated content. In some cases, however, a component map box may apply only to a particular timing interval of the content. In such cases, a server may provide multiple component map boxes and signal, for each one, the timing interval to which it corresponds. In some examples, when a server provides multiple component map boxes, the server may be configured in a static mode, in which the component map boxes are arranged contiguously, in timing-interval order, in the same file. In other examples, the server may be configured in a dynamic mode, in which the component map boxes may be provided in separate files and/or at locations that are not contiguous with one another. The dynamic mode may provide advantages for live streaming, while the static mode may provide advantages for seeking across a larger temporal range.
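One way a client could find the component map box that applies at a given playback time, for example after a seek, is a binary search over the interval start times. This Python sketch assumes the static-mode ordering described above, with each box applying from its start time until the next box begins; the function name and the `(start_seconds, box)` tuple layout are assumptions made for illustration.

```python
from bisect import bisect_right


def map_box_for_time(map_boxes, t):
    """Return the box whose timing interval contains time `t`.

    `map_boxes` is a list of (start_seconds, box) tuples sorted by start time.
    """
    starts = [start for start, _ in map_boxes]
    index = bisect_right(starts, t) - 1
    if index < 0:
        raise ValueError("time precedes the first timing interval")
    return map_boxes[index][1]


# Placeholder strings stand in for parsed component map boxes.
map_boxes = [(0.0, "map box A"), (10.0, "map box B"), (20.0, "map box C")]
current = map_box_for_time(map_boxes, 12.5)
```

With all boxes available in interval order, a seek to any time costs one lookup, which is one plausible reading of why the static mode suits seeking over a large temporal range.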

This disclosure also provides a component arrangement box that may be included within each file to signal relationships between the tracks of that file and the various components. For example, in a file that contains data for two or more tracks, the component arrangement box may signal the relationship between track identifiers for the tracks in the file and component identifiers for the corresponding content components. In this manner, a client device may first retrieve the component map box from a server device. The client device may then select one or more components of a representation based on the characteristics signaled by the component map box. The client device may then retrieve component arrangement boxes from the files storing the components described by the component map box. Using the component map box, which may include segment information such as byte ranges of fragments for a particular component, the client may determine where fragments of the selected components are stored in the files. Based on this determination, the client may submit requests (e.g., HTTP GET or partial GET requests) for fragments of the tracks or sub-tracks corresponding to the selected components.

In this manner, rather than signaling in the component map box how each file or each track is associated with a content component, this information may be stored in the component arrangement boxes associated with the respective files. The component map box may signal the component identifiers (e.g., component_id values) of all components of the content, while a component arrangement box may signal the relationships between the component_id values of the components stored within the file corresponding to that component arrangement box and the content_id values associated with those component_id values. The component map box may also, in some cases, store segment information. Further, the component map box may include a flag indicating whether the component map box includes segment information. A client device may be configured to assume that, if the component map box does not include segment information, the media data of the representation is contained in dependent representations.

The server may assign unique component_id values to each type of media, thereby ensuring that a component_id value is unique for any video or audio component in the same service. Components of a particular type may be switchable with one another. That is, a client may switch between various video components, e.g., in response to changing network conditions or other factors. A client need not request components of every available type. For example, a client may omit requesting captions for content that includes closed caption components. Moreover, in some cases, multiple components of the same media type may be requested, e.g., to support 3D video or picture in picture (PIP). The server may provide additional signaling to support specific functionalities, such as picture in picture.

For example, the server may provide a flag indicating whether a component includes a description of picture in picture data. If the flag indicates that the component includes picture in picture data, the component map box may provide an identifier of a representation that is to be shown together with the current representation to form a picture in picture display. One representation may correspond to a large picture, while the other representation may correspond to a smaller picture overlaid on the large picture.

As noted above, the server may provide a component arrangement box in each file that includes encoded samples corresponding to one or more components. The component arrangement box may be provided in header data of the file. The component arrangement box may indicate which components are included in the file and how those components are stored, e.g., as tracks within the file. The component arrangement box may provide a mapping between a component identifier value and the track identifier value of the corresponding track in the file.
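As an illustration only, the mapping carried by component arrangement boxes might be modeled as follows. The class name ComponentArrangement, the function resolve_track, and the file names are hypothetical and are not taken from this disclosure; the sketch shows only how a component_id selected from the component map box could be resolved to a file and a track_id through per-file component arrangement boxes.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentArrangement:
    """Hypothetical per-file box: maps component_id -> track_id in that file."""
    file_name: str
    component_to_track: dict = field(default_factory=dict)


def resolve_track(component_id, arrangements):
    """Return (file_name, track_id) for a component_id, or None if absent."""
    for box in arrangements:
        if component_id in box.component_to_track:
            return box.file_name, box.component_to_track[component_id]
    return None


# Two files may reuse track_id 1; the component_id values remain unique.
arrangements = [
    ComponentArrangement("video.mp4", {10: 1}),
    ComponentArrangement("audio.mp4", {20: 1}),
]
```

In this sketch, resolve_track(10, arrangements) yields the video file and its track, even though both files use the same track_id value.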

The component map box may also signal dependencies between content components, where the signaled dependencies may include the order of the dependencies, together with the current content component, for the decoding order of the content components inside an access unit. The signaled information regarding dependencies for a current representation may include either or both of the representations that depend on the current representation and the representations on which the current representation depends. There may also be dependencies between content components in the temporal dimension. However, simply indicating temporal_id values for each video component might not be sufficient, because the temporal sub-layers in entirely independent alternative video bitstreams do not necessarily have frame rates that map to each other. For example, one video component may have a frame rate of 24 fps and a temporal_id equal to 0, and may have a sub-layer of 12 fps (assuming two temporal layers), while another video component may have a frame rate of 30 fps and a temporal_id equal to 0, and may have a sub-layer of 7.5 fps (assuming three temporal layers). The server may therefore indicate the temporal layer difference when the dependency of two video components is signaled.
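The frame-rate mismatch in this example can be reproduced with a short sketch. It assumes, purely for illustration, that each temporal layer halves the frame rate of the layer above it; that assumption is consistent with the 24/12 fps and 30/7.5 fps figures but is not stated in this disclosure, and the function name is hypothetical.

```python
def sublayer_frame_rates(full_rate, num_layers):
    """Frame rate of each temporal layer, assuming each layer halves the rate.

    Index 0 is the lowest temporal layer in this sketch.
    """
    return [full_rate / 2 ** (num_layers - 1 - i) for i in range(num_layers)]


a = sublayer_frame_rates(24, 2)  # two temporal layers
b = sublayer_frame_rates(30, 3)  # three temporal layers
```

Under that assumption the lowest layers run at 12 fps and 7.5 fps respectively, so equal layer indices in the two bitstreams do not correspond to equal frame rates.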

In general, the signaled characteristics of a component may include, for example, an average bitrate, a maximum bitrate (e.g., over one second), a resolution, a frame rate, dependencies on other components, and/or reserved extensions, e.g., for multiview video, which may include the number of views targeted for output and identifiers for those views. Information regarding the series of media fragments forming a content component may also be signaled. The signaled information for each media fragment may include the byte offset of the media fragment, the decoding time of the first sample in the media fragment, a random access point in the fragment together with its decoding time and presentation time, and/or a flag indicating whether the fragment belongs to a new segment (and thus a different URL) of the content component.

In some cases, fragments of audio data are temporally aligned with fragments of video data. This disclosure provides techniques for multiplexing multiple content components based on a particular time interval. The component map box may provide a list of supported multiplexing intervals, or a range of multiplexing intervals. The multiplexing interval may be denoted T and may represent the temporal length of the multiplexed audio and video data. Suppose that the next time interval to be requested is [n*T, (n+1)*T]. The client device may determine whether each content component has any fragment with a start time t in the range (n*T) <= t <= ((n+1)*T). If so, the client device may request that fragment. Fragments starting before n*T may have been requested before the current multiplexing interval n*T, while fragments starting after (n+1)*T may be requested in a later multiplexing interval. In this manner, content components whose fragment boundaries do not align with each other or with the requested multiplexing interval may nevertheless be multiplexed. Moreover, the multiplexing interval may change during the service without disturbing the multiplexing of the content components.
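The interval test described above can be sketched as follows. The function name and the fragment start times are hypothetical; only the condition n*T <= t <= (n+1)*T is taken from the text.

```python
def fragments_in_interval(start_times, n, T):
    """Return indices of fragments whose start time t satisfies n*T <= t <= (n+1)*T."""
    lo, hi = n * T, (n + 1) * T
    return [i for i, t in enumerate(start_times) if lo <= t <= hi]


# Hypothetical fragment start times in seconds; the boundaries do not align.
video_starts = [0.0, 2.0, 4.0, 6.0]
audio_starts = [0.0, 1.5, 3.0, 4.5, 6.0]
T = 3.0
```

Because the range is closed at both ends, a fragment that starts exactly on a boundary satisfies the condition for two consecutive intervals; a client built along these lines would presumably track which fragments it has already requested.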

The client device may be configured to adapt to changing network conditions by changing the multiplexing interval. For example, when relatively more bandwidth becomes available, the client device may increase the multiplexing interval. On the other hand, when relatively less bandwidth becomes available, the client device may decrease the multiplexing interval. The client device may further be configured to request multiplexed fragments based on a certain timing interval and instantaneous bitrate. The client device may calculate the instantaneous bitrate from the number of bytes in a fragment and the temporal duration of the fragment.
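A minimal sketch of that calculation follows. The text names only the two inputs (bytes in the fragment and its duration); the conversion to bits per second and the function name are assumptions made here for illustration.

```python
def instantaneous_bitrate(num_bytes, duration_seconds):
    """Bits per second for one fragment: 8 * bytes / duration."""
    if duration_seconds <= 0:
        raise ValueError("fragment duration must be positive")
    return 8 * num_bytes / duration_seconds
```

For example, a 250,000-byte fragment lasting two seconds corresponds to one megabit per second under this definition.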

In some examples, the server may assign the same component identifier to two consecutive media representations, e.g., two video files having sequential timing information, in order to support temporal splicing. As noted above, in some cases, representations may include content components stored in different files. Accordingly, a client device may need to submit multiple GET or partial GET requests to retrieve the data for a particular time interval of the content. That is, the client may need to submit multiple GET or partial GET requests that refer to the various files storing the content components for that representation. When multiple requests are needed to obtain the data to be multiplexed in a certain time interval, the client device may pipeline the requests to ensure that no data from another time interval is received between the desired media fragment data of the current time interval.

In this manner, media content having components in multiple files can be supported in a network streaming context, such as HTTP streaming. That is, a representation of the media content may include one component in one file and another component in a separate file. The server may signal the characteristics of the components in the different files in a single data structure, e.g., a component map box. This may enable a client to make requests for any target content component, or for any duration of a target content component.

The use of data structures similar to the component map box and component arrangement box of this disclosure may also provide other advantages. For example, two media tracks in different components may have the same track identifier (track_id) value within their respective components. However, as noted above, the component map box may refer to separate components using a component identifier, which is not the same as a track identifier value. Because each file may include a component arrangement box that maps component identifiers to track identifiers, the component map box may refer to components using a component identifier that is independent of the track identifier value. The component arrangement box may also provide an efficient mechanism for specifying which file corresponds to which content, e.g., when a content delivery network (CDN) server stores multiple files corresponding to many different pieces of content.

Furthermore, the techniques of this disclosure may support clients with different network buffer sizes. That is, some clients may require buffers of a different size than others, e.g., due to network conditions, client capabilities, and the like. Therefore, in some cases, multiple types of components for a particular representation may need to be multiplexed at different time intervals. This disclosure provides techniques by which a server signals the different possible multiplexing time intervals, and thereby accounts for variations in the sizes of the requested data and, consequently, for the performance of transmission, e.g., in terms of round trip time between a client and a server using HTTP.

Moreover, in some cases, a content component in one file may depend on several other content components in one or more other files. Such a dependency may occur within an access unit. As an example, a video content component may correspond to a 4CIF SVC enhancement layer that depends on a common interface format (CIF) layer and a quarter common interface format (QCIF) layer. Both the CIF and QCIF layers may be in one file, while the 4CIF enhancement layer may be in another file. The techniques of this disclosure may ensure that the client properly requests data for the CIF, QCIF, and 4CIF layers, such that the decoder of the client receives samples from these layers in a proper decoding order, based on the dependencies.
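One way to derive a valid request and decoding order from such dependencies is a topological sort, sketched below with the standard-library graphlib module (Python 3.9 or later). The dependency table is an assumption built from the example above, with the CIF layer taken to depend on the QCIF layer; this disclosure does not prescribe this particular algorithm.

```python
from graphlib import TopologicalSorter

# Each layer maps to the set of layers it depends on (assumed for illustration).
dependencies = {
    "QCIF": set(),
    "CIF": {"QCIF"},
    "4CIF": {"CIF", "QCIF"},
}

# static_order() yields every layer after all of the layers it depends on.
decoding_order = list(TopologicalSorter(dependencies).static_order())
```

With these dependencies the only valid order is QCIF, then CIF, then 4CIF, regardless of which file stores each layer.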

In some examples, a dynamic server may be used to dynamically create files that multiplex content components together. For example, a dynamic server may support methods following the Common Gateway Interface (CGI) service to multiplex components together and make the data for a current time interval a continuous part of a dynamic file. CGI is described in Request for Comments 3875, available at http://tools.ietf.org/html/rfc3875. Using a service such as CGI, the server may dynamically generate a file that includes a combination of various content components for a representation of the content.

A presentation (motion sequence) may be contained in several files. Timing and framing (position and size) information is generally in the ISO base media file, and the ancillary files may use essentially any format. The presentation may be "local" to the system containing the presentation, or may be provided via a network or other stream delivery mechanism.

The files may have a logical structure, a time structure, and a physical structure, and these structures are not required to be coupled. The logical structure of the file may be that of a movie or video clip (potentially including both video and audio data) that in turn contains a set of time-parallel tracks. The time structure of the file may be that the tracks contain sequences of samples in time, and those sequences are mapped into the timeline of the overall movie by optional edit lists. The physical structure of the file may separate the data needed for logical, time, and structural decomposition from the media data samples themselves. This structural information may be concentrated in a movie box, possibly extended in time by movie fragment boxes. The movie box may document the logical and timing relationships of the samples, and may also contain pointers to where they are located. Those pointers may point into the same file or into another file, e.g., one referenced by a URL.

Each media stream may be contained in a track specialized for that media type (audio, video, etc.), and may further be parameterized by a sample entry. The sample entry may contain the "name" of the exact media type (the type of decoder needed to decode the stream) and any required parameterization of that decoder. The name may also take the form of a four-character code, e.g., "moov" or "trak". There are defined sample entry formats not only for MPEG-4 media, but also for the media types used by other organizations using this file format family.

Support for meta-data generally takes two forms. First, timed meta-data may be stored in an appropriate track and synchronized, as desired, with the media data it describes. Second, there may be general support for non-timed meta-data attached to the movie or to an individual track. The structural support is general and allows meta-data resources to be stored elsewhere in the file or in another file, in a manner similar to the storage of the media data, that is, the coded video pictures. In addition, these resources may be named and may be protected.

The term "progressive download" is used to describe the transfer of digital media files from a server to a client, typically using the HTTP protocol. When the transfer is initiated from a computer, the computer may begin playback of the media before the download is complete. One difference between streaming media and progressive download lies in how the digital media data is received and stored by the end user device that accesses the digital media. A media player capable of progressive download playback relies on the metadata located in the header of the file being intact, and on a local buffer of the digital media file as it is downloaded from a web server. At the point at which a specified amount of buffered data becomes available to the local playback device, the device may begin to play the media. This specified amount of buffered data may be embedded into the file by the producer of the content in the encoder settings, and may be reinforced by additional buffer settings imposed by the media player of the client computer.

In progressive downloading or HTTP streaming, instead of providing a single movie box (moov box) that includes all media data, including video and audio samples, movie fragments (moof) are supported to contain extra samples in addition to those contained in the movie box. Typically, the movie fragments contain the samples for a certain period of time. Using the movie fragments, a client can quickly seek to a desired time. A movie fragment may contain continuous bytes of a file, such that, in accordance with a streaming protocol such as HTTP streaming, a client may issue a partial GET request to retrieve the movie fragment.

With respect to 3GPP as an example, HTTP/TCP/IP transport is supported for 3GPP files for download and progressive download. Furthermore, using HTTP for video streaming may provide some advantages, and video streaming services based on HTTP are becoming popular. One advantage of HTTP streaming is that existing Internet components and protocols may be used, such that no new effort is needed to develop new techniques for transporting video data over a network. Other transport protocols, e.g., the real time protocol (RTP) payload format, require intermediate network devices, e.g., middle boxes, to be aware of the media format and the signaling context. Also, HTTP streaming can be client-driven, which may avoid control issues. Using HTTP also does not necessarily require new hardware or software implementations at a web server that has HTTP 1.1 implemented. HTTP streaming also provides TCP-friendliness and firewall traversal. In HTTP streaming, a media representation may be a structured collection of data that is accessible to the client. The client may request and download media data information to present a streaming service to a user.

A service is experienced by the user of a client as a representation of a movie, which the client decodes and renders from the content components delivered by the server. In HTTP streaming, instead of receiving the full content in response to one request, the client can request segments of the content components. In this manner, HTTP streaming may provide for more flexible delivery of content. A segment may include a set of continuous movie fragments that can be requested by one URL. For example, a segment may be a whole small file, which may contain video and audio. As another example, a segment may correspond to one movie fragment, which may contain one video track fragment and one audio track fragment. As still another example, a segment may correspond to several movie fragments, any or all of which may have one video fragment and one audio fragment, and the movie fragments may be continuous in decoding time.

A content delivery network (CDN), also referred to as a content distribution network, may include a system of computers containing copies of data, placed at various points in the network so as to maximize bandwidth for access to the data by clients throughout the network. A client may access a copy of the data near to the client, as opposed to all clients accessing the same central server, which may avoid bottlenecks near an individual server. Content types may include web objects, downloadable objects (media files, software, documents, and the like), applications, real-time media streams, and other components of Internet delivery (DNS, routes, and database queries). There are many successful CDNs that rely only on the HTTP protocol, more specifically, on origin servers, proxies, and caches based on HTTP 1.1.

In HTTP streaming, frequently used operations include GET and partial GET. The GET operation retrieves a whole file associated with a given uniform resource locator (URL) or uniform resource name (URN). The partial GET operation receives a byte range as an input parameter and retrieves a continuous number of bytes of a file corresponding to the received byte range. Thus, movie fragments may be provided for HTTP streaming, because a partial GET operation can retrieve one or more individual movie fragments. A movie fragment may contain several track fragments from different tracks.
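In HTTP 1.1, a partial GET is an ordinary GET request that carries a Range header. The sketch below builds such a request with Python's standard urllib.request module without sending it; the URL, the byte offsets, and the helper name partial_get_request are hypothetical.

```python
import urllib.request


def partial_get_request(url, first_byte, last_byte):
    """Build (without sending) an HTTP request for an inclusive byte range."""
    request = urllib.request.Request(url)
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request


# Hypothetical URL and byte range covering one movie fragment.
req = partial_get_request("http://example.com/movie.mp4", 1000, 4999)
```

Passing the request to urllib.request.urlopen would send it; a server that honors the range would normally answer with status 206 (Partial Content) and only the requested bytes.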

In the context of HTTP streaming, a segment may be delivered as a response to a GET request or a partial GET request (in HTTP 1.1). In a CDN, computing devices such as proxies and caches can store the segments delivered in response to those requests. Thus, if the segment is requested by another client (or the same client), and the client has a path through this proxy device, the proxy device can deliver its local copy of the segment to the client without retrieving the segment from the origin server again. In HTTP streaming, if the proxy device supports HTTP 1.1, byte ranges delivered as responses to requests can be combined while stored in the cache of the proxy device, or extracted while being used as a local copy of a response to a request. Each content component may include sections of continuous fragments, each of which can be requested by an HTTP GET or partial GET sent by a client device. Such a fragment of a content component may be referred to as a media fragment.

To support various bitrates and various devices, as well as to adapt to various user preferences, there may be more than one media representation in HTTP streaming. A description of the representations can be provided in a media presentation description (MPD) data structure, generated by a server and sent to a client, which may correspond to a component map box. That is, a conventional MPD data structure may include data corresponding to the component map box, as described in this disclosure. In other examples, a component map box may further include data similar to an MPD data structure, in addition to the data described in this disclosure with respect to the component map box. The described representations may include content components contained in one or more movie files. If a static content server is used, the server may store the movie files. If a dynamic content server is supported, the server may generate a dynamic file (content) in response to a received request. Although dynamic content may be generated on the fly by a server, this is transparent to computing devices such as proxies and caches. Thus, the segments provided in response to requests to the dynamic content server can also be cached. A dynamic content server may have a more complex implementation, and may be less storage-optimal at the server side or less cache-efficient during delivery of the content.

In addition, this disclosure also includes techniques for signaling in the MPD whether a particular representation (e.g., a combination of components) is a complete operation point. That is, a server may provide a flag in the MPD to indicate to the client whether a representation can be selected as a complete video operation point. An operation point may correspond to an MVC sub-bitstream, that is, a subset of an MVC bitstream that comprises a subset of views at a certain temporal level and represents a valid bitstream unto itself. An operation point may represent a certain level of temporal and view scalability and contain only the NAL units required for a valid bitstream to represent a certain subset of views at a certain temporal level. An operation point may be described by view identifier values of the subset of views and a highest temporal identifier of the subset of views.
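
As a simplified illustration of how an operation point is described by a set of view identifiers and a highest temporal identifier, the following sketch filters a list of NAL units. The `NalUnit` class and `extract_operation_point` function are hypothetical, and the sketch assumes the caller has already included in `view_ids` any views needed for inter-view prediction; real MVC sub-bitstream extraction involves additional rules not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    view_id: int       # view the NAL unit belongs to
    temporal_id: int   # temporal level of the NAL unit


def extract_operation_point(nal_units, view_ids, highest_temporal_id):
    """Keep only NAL units in the chosen views at or below the temporal level."""
    return [
        nal for nal in nal_units
        if nal.view_id in view_ids and nal.temporal_id <= highest_temporal_id
    ]


# Hypothetical bitstream with three views and two temporal levels.
bitstream = [NalUnit(v, t) for v in (0, 1, 2) for t in (0, 1)]
sub_bitstream = extract_operation_point(bitstream, {0, 1}, 0)
```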

The MPD may also describe individual representations for multimedia content. For example, for each representation, the MPD may signal a representation identifier, a default attribute representation identifier, a profile and level indicator for the representation, a frame rate for the representation, a dependency group identifier, and a temporal identifier. The representation identifier may provide a unique identifier of the associated representation for the multimedia content. The default attribute representation identifier may provide an identifier of a representation whose attributes will be used as default attributes for the current representation; these attributes may include any or all of a profile and level indicator, bandwidth, width, height, frame rate, dependency group identifier, temporal identifier, and/or a frame packing type for 3D video. The frame rate identifier may specify a frame rate of the video component(s) for the corresponding representation. The dependency group identifier may specify the dependency group to which the corresponding representation is assigned. Representations in a dependency group having a given temporal identifier value may depend on representations in the same dependency group having lower temporal identifier values.
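
The role of the default attribute representation identifier can be sketched as a simple dictionary merge. The field names (`default_attr_rep_id`, `profile_idc`, and so on) and the single-level lookup are assumptions made for illustration; the disclosure does not define a concrete syntax at this point.

```python
def resolve_attributes(representations, rep_id):
    """Return the attributes of rep_id, filling gaps from its default representation."""
    rep = representations[rep_id]
    default_id = rep.get("default_attr_rep_id")
    defaults = representations[default_id] if default_id is not None else {}
    resolved = dict(defaults)   # start from the inherited defaults
    resolved.update(rep)        # explicitly signaled attributes take precedence
    return resolved


# Hypothetical MPD content: "r2" signals only its bandwidth and inherits the rest.
representations = {
    "r1": {"profile_idc": 100, "level_idc": 31, "frame_rate": 30.0,
           "width": 1280, "height": 720, "bandwidth": 2_000_000},
    "r2": {"default_attr_rep_id": "r1", "bandwidth": 4_000_000},
}
resolved_r2 = resolve_attributes(representations, "r2")
```

With this arrangement an MPD only needs to signal the attributes in which a representation differs from its default representation.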

For 3D video representations, e.g., corresponding to multiview video, the component map box may describe a number of target views for output. That is, the component map box may include a value representative of the number of target output views for a representation. In some examples, the component map box may provide depth information for a single view along with coded samples for the single view, such that a client device may construct a second view from the single view and the depth information. A flag may be provided to indicate that the representation is a view plus depth representation. In some examples, multiple views may be contained in the representation, each being associated with depth information. In this manner, each of the views can be used as a basis for creating a stereo view pair, resulting in two views for each of the views of the representation. Thus, although multiple views may be contained in the representation, no two of the views necessarily form a stereo view pair. In some examples, a flag may be included to indicate whether a representation is only a dependent representation, which cannot by itself form a valid representation for the corresponding multimedia content.

FIG. 1 is a block diagram illustrating an example system 10 in which audio/video (A/V) source device 20 transports audio and video data to A/V destination device 40. System 10 of FIG. 1 may correspond to a video teleconference system, a server/client system, a broadcaster/receiver system, or any other system in which video data is sent from a source device, such as A/V source device 20, to a destination device, such as A/V destination device 40. In some examples, A/V source device 20 and A/V destination device 40 may perform bidirectional information exchange. That is, A/V source device 20 and A/V destination device 40 may be capable of both encoding and decoding (and transmitting and receiving) audio and video data. In some examples, audio encoder 26 may comprise a voice encoder, also referred to as a vocoder.

In the example of FIG. 1, A/V source device 20 comprises audio source 22 and video source 24. Audio source 22 may comprise, for example, a microphone that produces electrical signals representative of captured audio data to be encoded by audio encoder 26. Alternatively, audio source 22 may comprise a storage medium storing previously recorded audio data, an audio data generator such as a computerized synthesizer, or any other source of audio data. Video source 24 may comprise a video camera that produces video data to be encoded by video encoder 28, a storage medium encoded with previously recorded video data, a video data generation unit, or any other source of video data.

Raw audio and video data may comprise analog or digital data. Analog data may be digitized before being encoded by audio encoder 26 and/or video encoder 28. Audio source 22 may obtain audio data from a speaking participant while the speaking participant is speaking, and video source 24 may simultaneously obtain video data of the speaking participant. In other examples, audio source 22 may comprise a computer-readable storage medium comprising stored audio data, and video source 24 may comprise a computer-readable storage medium comprising stored video data. In this manner, the techniques described in this disclosure may be applied to live, streaming, real-time audio and video data or to archived, pre-recorded audio and video data. Moreover, the techniques may be applied to computer-generated audio and video data.

Audio frames that correspond to video frames are generally audio frames containing audio data that was captured by audio source 22 contemporaneously with the video data captured by video source 24 that is contained within the video frames. For example, while a speaking participant generally produces audio data by speaking, audio source 22 captures the audio data, and video source 24 captures video data of the speaking participant at the same time, that is, while audio source 22 is capturing the audio data. Hence, an audio frame may temporally correspond to one or more particular video frames. Accordingly, an audio frame corresponding to a video frame generally corresponds to a situation in which audio data and video data were captured at the same time, and for which the audio frame and the video frame comprise, respectively, the audio data and the video data that were captured at the same time.

In some examples, audio encoder 26 may encode a timestamp in each encoded audio frame that represents a time at which the audio data for the encoded audio frame was recorded, and similarly, video encoder 28 may encode a timestamp in each encoded video frame that represents a time at which the video data for the encoded video frame was recorded. In such examples, an audio frame corresponding to a video frame may comprise an audio frame comprising a timestamp and a video frame comprising the same timestamp. A/V source device 20 may include an internal clock from which audio encoder 26 and/or video encoder 28 may generate the timestamps, or that audio source 22 and video source 24 may use to associate audio and video data, respectively, with a timestamp.

In some examples, audio source 22 may send data to audio encoder 26 corresponding to a time at which audio data was recorded, and video source 24 may send data to video encoder 28 corresponding to a time at which video data was recorded. In some examples, audio encoder 26 may encode a sequence identifier in encoded audio data to indicate a relative temporal ordering of the encoded audio data, without necessarily indicating an absolute time at which the audio data was recorded, and similarly, video encoder 28 may also use sequence identifiers to indicate a relative temporal ordering of encoded video data. Similarly, in some examples, a sequence identifier may be mapped or otherwise correlated with a timestamp.

The techniques of this disclosure are generally directed to the transport of encoded multimedia (e.g., audio and video) data, and to the reception and subsequent interpretation and decoding of the transported multimedia data. In particular, encapsulation unit 30 may generate a component map box for a multimedia content, as well as component arrangement boxes for each file corresponding to the multimedia content. In some examples, a processor may execute instructions corresponding to encapsulation unit 30. That is, instructions to perform the functionality attributed to encapsulation unit 30 may be stored on a computer-readable medium and executed by a processor. In other examples, other processing circuitry may be configured to perform the functions attributed to encapsulation unit 30 as well. The component map box may be stored separately from the components (e.g., audio components, video components, or other components) of the content.

Accordingly, destination device 40 may request the component map box for the multimedia content. Destination device 40 may use the component map box to determine which components to request in order to perform playback of the content, based on preferences of a user, network conditions, decoding and rendering capabilities of destination device 40, or other factors.

A/V source device 20 may provide a "service" to A/V destination device 40. A service generally corresponds to a combination of one or more audio and video content components, where the audio and video content components are subsets of the available content components of a full content. One service may correspond to stereo video having two views, while another service may correspond to four views, and still another service may correspond to eight views. In general, a service corresponds to source device 20 providing a combination (that is, a subset) of the available content components. A combination of content components is also referred to as a representation of the content.

Encapsulation unit 30 receives encoded samples from audio encoder 26 and video encoder 28 and forms corresponding network abstraction layer (NAL) units from the encoded samples, which may take the form of packetized elementary stream (PES) packets. In the example of H.264/AVC (Advanced Video Coding), coded video segments are organized into NAL units, which provide a "network-friendly" video representation addressing applications such as video telephony, storage, broadcast, or streaming. NAL units can be categorized as video coding layer (VCL) NAL units and non-VCL NAL units. VCL units may contain data from the core compression engine and may include block, macroblock, and/or slice level data. Other NAL units may be non-VCL NAL units. In some examples, a coded picture in one time instance, normally presented as a primary coded picture, may be contained in an access unit, which may include one or more NAL units.

In accordance with the techniques of this disclosure, encapsulation unit 30 may construct a component map box that describes characteristics of content components. Encapsulation unit 30 may also construct component arrangement boxes for one or more video files. Encapsulation unit 30 may associate each component arrangement box with a corresponding video file, and may associate a component map box with a set of video files. In this manner, there may be a 1:1 correspondence between component arrangement boxes and video files, and a 1:N correspondence between component map boxes and video files.

As noted above, the component map box may describe characteristics of components that are common to a content. For example, the content may include audio components, video components, and other components, such as closed captioning. Components of a given type may be switchable with one another. For example, two video components may be switchable in that data from either of the two components may be retrieved without impeding playback of the content. The various components may be encoded in various ways and with various qualities. For example, various video components may be encoded at various frame rates or bitrates, using different encoders (e.g., corresponding to different codecs), encapsulated in various file types (e.g., H.264/AVC or MPEG-2 transport stream (TS)), or otherwise differ from each other. However, the selection of a video component, for example, is generally independent of the selection of an audio component. The characteristics of a component signaled by the component map box may include an average bitrate, a maximum bitrate (e.g., over one second of playback time for the component), a resolution, a frame rate, dependencies on other components, and extensions for various file types, such as multi-view video, e.g., the number of views targeted for output and identifiers for each of the views.

Source device 20, which may act as a server such as an HTTP server, may store multiple representations of the same content for adaptation. Some representations may contain multiple content components. The components may be stored in different files on a storage device of source device 20 (e.g., one or more hard drives), and thus a representation may include data from different files. By signaling the characteristics of the various components, encapsulation unit 30 may provide destination device 40 with the ability to select one of each switchable component in order to render and play back the corresponding content. That is, destination device 40 may retrieve the component map box from source device 20 for a particular content, select components for the content corresponding to a particular representation of the content, and then retrieve data for the selected components from source device 20, e.g., in accordance with a streaming protocol such as HTTP streaming.

Destination device 40 may select a representation based on network conditions, such as available bandwidth, and the characteristics of the components. Moreover, destination device 40 may adapt to changing network conditions using the data signaled by source device 20. That is, because components of the same type are switchable with one another, when network conditions change, destination device 40 may select a different component of a particular type that is better suited to the newly determined network conditions.
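
One possible selection rule is sketched here purely as an illustration: pick the switchable component with the highest signaled maximum bitrate that still fits the measured bandwidth, and fall back to the lowest-bitrate component otherwise. The field names and the rule itself are assumptions; the disclosure leaves the client's selection policy open.

```python
def select_component(components, available_bps):
    """Pick the best-quality switchable component that fits the available bandwidth."""
    fitting = [c for c in components if c["max_bitrate"] <= available_bps]
    if fitting:
        return max(fitting, key=lambda c: c["max_bitrate"])
    # Nothing fits: fall back to the least demanding component.
    return min(components, key=lambda c: c["max_bitrate"])


# Hypothetical switchable video components signaled by the server.
video_components = [
    {"component_id": 1, "max_bitrate": 500_000},
    {"component_id": 2, "max_bitrate": 1_500_000},
    {"component_id": 3, "max_bitrate": 4_000_000},
]

before = select_component(video_components, 2_000_000)  # bandwidth is good
after = select_component(video_components, 800_000)     # bandwidth dropped
```

Re-running the same function when the bandwidth estimate changes models the switch between components of the same type described above.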

Encapsulation unit 30 assigns component identifier values to each component of a multimedia content. The component identifier values are unique to the components, regardless of type. That is, there should not be, for example, an audio component and a video component having the same component identifier. The component identifiers are also not necessarily related to track identifiers within individual files. For example, the content may have two video components, each stored in a different file. Each of the files may identify its video component using the same track identifier, because identifiers local to a particular file are unique within the scope of that file but not externally. However, because the techniques of this disclosure involve providing characteristics of components that may reside within multiple files, this disclosure proposes uniquely assigning component identifiers that are not necessarily related to the track identifiers.

The component map box may also indicate how fragments are stored for each component/track in the file, e.g., where the fragments begin, whether they include random access points (and whether the random access points are instantaneous decoding refresh (IDR) or open decoding refresh (ODR) pictures), byte offsets to the beginning of each fragment, decoding times of the first samples in each fragment, decoding and presentation times for the random access points, and a flag indicating whether a particular fragment belongs to a new segment. Each segment may be independently retrievable. For example, encapsulation unit 30 may store each segment of a component such that each segment can be retrieved using a unique URL or URN.
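
To illustrate how a client might use such per-fragment information, the following sketch looks up the latest fragment that contains a random access point at or before a seek time. The record layout (`byte_offset`, `decode_time`, `has_rap`) is a hypothetical simplification of the signaled data, and the list is assumed to be sorted by decoding time.

```python
def fragment_for_seek(fragments, seek_time):
    """Return the last fragment with a random access point at or before seek_time."""
    best = None
    for frag in fragments:  # assumed sorted by decode_time
        if frag["has_rap"] and frag["decode_time"] <= seek_time:
            best = frag
    return best


# Hypothetical fragment index for one component (times in seconds).
fragment_index = [
    {"byte_offset": 0,       "decode_time": 0.0, "has_rap": True},
    {"byte_offset": 120_000, "decode_time": 2.0, "has_rap": False},
    {"byte_offset": 250_000, "decode_time": 4.0, "has_rap": True},
    {"byte_offset": 380_000, "decode_time": 6.0, "has_rap": False},
]

start = fragment_for_seek(fragment_index, 5.0)
```

The `byte_offset` of the returned fragment could then serve as the first byte of a partial GET request.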

Moreover, encapsulation unit 30 may provide, in each of the files, a component arrangement box that provides a mapping between component identifiers for the content and track identifiers within the corresponding file. Encapsulation unit 30 may also signal dependencies between components of the same type. For example, certain components may depend on other components of the same type in order to be decoded correctly. As one example, in scalable video coding (SVC), a base layer may correspond to one component, and an enhancement layer for the base layer may correspond to another component. As another example, in multi-view video coding (MVC), one view may correspond to one component, and another view of the same scene may correspond to another component. As still another example, samples of one component may be encoded relative to samples of another component. For example, in MVC, there may be components corresponding to different views for which inter-view prediction is enabled.
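
The mapping carried by component arrangement boxes can be pictured as one small table per file. In this sketch the file names, identifier values, and the `locate_component` helper are all hypothetical; the point is only that the content-wide component identifier, not the file-local track identifier, distinguishes the components.

```python
# One hypothetical component arrangement table per file:
# component_id (unique across the content) -> track_id (local to the file).
component_arrangement = {
    "video_low.mp4":  {1: 1},
    "video_high.mp4": {2: 1},   # same track_id as in video_low.mp4
    "audio.mp4":      {3: 1},
}


def locate_component(arrangement, component_id):
    """Return (file_name, track_id) for the first file holding the component."""
    for file_name, mapping in arrangement.items():
        if component_id in mapping:
            return file_name, mapping[component_id]
    raise KeyError(f"component {component_id} not found")


location = locate_component(component_arrangement, 2)
```

All three files reuse track identifier 1, yet each component remains unambiguous through its component identifier.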

In this manner, destination device 40 may determine dependencies between components and retrieve, in addition to the desired components, the parent components of any components that depend on them, in order to properly decode and/or render the components. Encapsulation unit 30 may further signal an ordering of the dependencies and/or a decoding order of the components, so that destination device 40 can request data for the components in the proper order. Furthermore, encapsulation unit 30 may signal temporal layer differences between components having dependencies, so that destination device 40 can properly align samples of the components for decoding and/or rendering. For example, one video component may have a frame rate of 24 fps and a sub-layer with temporal_id equal to 0 at 12 fps, while another video component may have a frame rate of 30 fps and a sub-layer with temporal_id equal to 0 at 7.5 fps.
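
Resolving which components to retrieve, and in what order, can be sketched as a depth-first walk over the signaled dependencies. The dictionary layout and the function name `components_to_fetch` are assumptions, and the sketch presumes the dependency graph has no cycles.

```python
def components_to_fetch(dependencies, wanted):
    """Return wanted components plus all their parents, parents listed first."""
    order = []
    seen = set()

    def visit(component_id):
        if component_id in seen:
            return
        seen.add(component_id)
        for parent in dependencies.get(component_id, []):
            visit(parent)            # parents must be decoded first
        order.append(component_id)

    for component_id in wanted:
        visit(component_id)
    return order


# Hypothetical SVC-like layering: 1 = base layer, 2 and 3 = enhancement layers.
dependencies = {2: [1], 3: [2]}
fetch_order = components_to_fetch(dependencies, [3])
```

Requesting component 3 alone therefore pulls in components 1 and 2, in an order a decoder could consume.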

Encapsulation unit 30 may signal various possible multiplexing intervals for combinations of components that form a representation. In this manner, destination device 40 may select one of the possible multiplexing intervals in order to request data for the various components within a sufficient period of time, such that data for upcoming segments of the components can be retrieved while previous segments of the components are being decoded and displayed. That is, destination device 40 may request data for the components far enough in advance that playback is not interrupted (assuming no immediate change in network conditions), yet not so far in advance that a buffer overflows. If there is a change in network conditions, destination device 40 may select a different multiplexing interval, rather than switching components entirely, to ensure that a sufficient amount of data is retrieved for decoding and rendering while awaiting transmission of more subsequent data. Encapsulation unit 30 may signal the multiplexing intervals based on explicitly signaled intervals or a range of intervals, and may signal these multiplexing intervals within the component map box.
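
The trade-off described above can be made concrete with a small sketch. It assumes, for illustration only, that the client picks the longest signaled interval whose data both fits in the free buffer space and can be downloaded before the already-buffered media runs out; the parameter names and this particular policy are not taken from the disclosure.

```python
def choose_multiplexing_interval(intervals, bitrate_bps, bandwidth_bps,
                                 buffered_seconds, free_buffer_bytes):
    """Return the longest usable interval in seconds, or None if none is usable."""
    usable = []
    for interval in intervals:
        size_bytes = bitrate_bps * interval / 8          # data covering the interval
        download_seconds = bitrate_bps * interval / bandwidth_bps
        fits_buffer = size_bytes <= free_buffer_bytes    # avoid buffer overflow
        arrives_in_time = download_seconds <= buffered_seconds  # avoid stalls
        if fits_buffer and arrives_in_time:
            usable.append(interval)
    return max(usable) if usable else None


# Hypothetical numbers: 1 Mbit/s content, 2 Mbit/s link, 3 s already buffered.
chosen = choose_multiplexing_interval(
    intervals=[1, 2, 5, 10],
    bitrate_bps=1_000_000,
    bandwidth_bps=2_000_000,
    buffered_seconds=3,
    free_buffer_bytes=700_000,
)
```

With these figures the 5-second interval is selected: it needs 625,000 bytes and 2.5 seconds to download, whereas the 10-second interval would neither fit the buffer nor arrive in time.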

In some examples, source device 20 may receive requests that specify multiple byte ranges. That is, destination device 40 may specify multiple byte ranges in one request in order to achieve multiplexing of various components within a file. Destination device 40 may send multiple requests when the components are in multiple files, and any or all of the requests may specify one or more byte ranges. As an example, destination device 40 may submit multiple HTTP GET or partial GET requests to multiple URLs or URNs, where any or all of the partial GET requests may specify multiple byte ranges within the URLs or URNs of the requests. Source device 20 may respond by providing the requested data to destination device 40. In some examples, source device 20 may support dynamic multiplexing, e.g., by implementing a common gateway interface (CGI) to multiplex components of a representation together to dynamically form a file, which source device 20 may then provide to destination device 40.
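
A minimal sketch of grouping fragment requests per file follows. It assumes the standard HTTP/1.1 `Range` header with a comma-separated list of byte ranges is used to carry the ranges (one possible way to express multiple ranges in a request); the URLs and the helper `build_range_requests` are hypothetical.

```python
from collections import defaultdict


def build_range_requests(fragments):
    """Group (url, first_byte, last_byte) tuples into one ranged request per URL."""
    ranges_by_url = defaultdict(list)
    for url, first_byte, last_byte in fragments:
        ranges_by_url[url].append((first_byte, last_byte))

    requests = []
    for url, ranges in ranges_by_url.items():
        spec = ",".join(f"{first}-{last}" for first, last in sorted(ranges))
        requests.append((url, {"Range": f"bytes={spec}"}))
    return requests


# Hypothetical fragments: video and audio share one file, captions are separate.
wanted = [
    ("http://example.com/av.mp4", 0, 999),
    ("http://example.com/av.mp4", 5000, 5999),
    ("http://example.com/captions.mp4", 100, 199),
]
range_requests = build_range_requests(wanted)
```

Components stored in the same file are fetched with a single multi-range request, while components in other files each get their own request.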

Encapsulation unit 30 may also specify the temporal duration of the content to which a component map box corresponds. By default, destination device 40 may be configured to determine that a component map box applies to the entire content when no temporal duration is signaled. When a temporal duration is signaled, however, destination device 40 may be configured to request multiple component map boxes for the content, each corresponding to a different temporal duration of the content. Encapsulation unit 30 may store the component map boxes together contiguously, or in separate locations.

In some cases, various portions (e.g., segments) of a component may be stored in separate files (e.g., URL or URN retrievable data structures). In such cases, the same component identifier may be used to identify the component in each file, e.g., within a component arrangement box of the file. The files may have sequential timing information, that is, timing information indicating that one of the files immediately follows another of the files. Destination device 40 may generate requests for multiplexed fragments based on a certain timing interval and an instantaneous bitrate. Destination device 40 may calculate the instantaneous bitrate based on the number of bytes in the fragments of a component.
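As one possible sketch of the calculation described above, the instantaneous bitrate may be estimated as the total number of bits in the fragments divided by the timing interval those fragments cover. Dividing by the interval is an assumption made for illustration; the disclosure states only that the bitrate is based on the number of bytes in the fragments. The function name is hypothetical.

```python
def instantaneous_bitrate(fragment_sizes_bytes, interval_seconds):
    """Estimate bits per second for fragments spanning one timing interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return sum(fragment_sizes_bytes) * 8 / interval_seconds


# Two hypothetical 250,000-byte fragments covering a 2-second interval.
print(instantaneous_bitrate([250_000, 250_000], 2.0))  # 2000000.0
```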

As with most video coding standards, H.264/AVC defines the syntax, semantics, and decoding process for error-free bitstreams, any of which conform to a certain profile or level. H.264/AVC does not specify the encoder, but the encoder is tasked with guaranteeing that the generated bitstreams are standard-compliant for a decoder. In the context of a video coding standard, a "profile" corresponds to a subset of algorithms, features, or tools and the constraints that apply to them. As defined by the H.264 standard, for example, a "profile" is a subset of the entire bitstream syntax that is specified by the H.264 standard. A "level" corresponds to limitations on decoder resource consumption, such as decoder memory and computation, which are related to the resolution of the pictures, the bit rate, and the macroblock (MB) processing rate. A profile may be signaled with a profile_idc (profile indicator) value, while a level may be signaled with a level_idc (level indicator) value.

The H.264 standard recognizes, for example, that within the bounds imposed by the syntax of a given profile, it is still possible to require a large variation in the performance of encoders and decoders, depending on the values taken by syntax elements in the bitstream, such as the specified size of the decoded pictures. The H.264 standard further recognizes that, in many applications, it is neither practical nor economical to implement a decoder capable of dealing with all hypothetical uses of the syntax within a particular profile. Accordingly, the H.264 standard defines a "level" as a specified set of constraints imposed on the values of the syntax elements in the bitstream. These constraints may be simple limits on values. Alternatively, these constraints may take the form of constraints on arithmetic combinations of values (e.g., picture width multiplied by picture height multiplied by the number of pictures decoded per second). The H.264 standard further provides that individual implementations may support a different level for each supported profile.
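A constraint on an arithmetic combination of values can be illustrated with a short sketch. The example below expresses picture width multiplied by picture height multiplied by picture rate in units of 16 x 16 macroblocks per second and compares the result against a limit. The function names and the limit value are assumptions chosen for illustration and are not limits stated in this disclosure.

```python
import math


def macroblocks_per_second(width, height, frames_per_second):
    """Count 16 x 16 macroblocks per picture, then scale by the picture rate."""
    mbs_per_picture = math.ceil(width / 16) * math.ceil(height / 16)
    return mbs_per_picture * frames_per_second


def within_level_limit(width, height, frames_per_second, max_mb_per_second):
    """Return True when the processing rate does not exceed the given limit."""
    return macroblocks_per_second(width, height, frames_per_second) <= max_mb_per_second


ILLUSTRATIVE_LIMIT = 108_000  # hypothetical macroblocks-per-second limit

print(within_level_limit(1280, 720, 30, ILLUSTRATIVE_LIMIT))   # True
print(within_level_limit(1920, 1080, 30, ILLUSTRATIVE_LIMIT))  # False
```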

A decoder conforming to a profile ordinarily supports all the features defined in the profile. For example, as a coding feature, B-picture coding is not supported in the baseline profile of H.264/AVC but is supported in other profiles of H.264/AVC. A decoder conforming to a level should be capable of decoding any bitstream that does not require resources beyond the limitations defined in the level. Definitions of profiles and levels may be helpful for interpretability. For example, during video transmission, a pair of profile and level definitions may be negotiated and agreed upon for a whole transmission session. More specifically, in H.264/AVC, a level may define, for example, limitations on the number of macroblocks that need to be processed, decoded picture buffer (DPB) size, coded picture buffer (CPB) size, vertical motion vector range, maximum number of motion vectors per two consecutive MBs, and whether a B-block can have sub-macroblock partitions smaller than 8 x 8 pixels. In this manner, a decoder may determine whether the decoder is capable of properly decoding the bitstream.

A media representation may include a media representation description (MPD), which may contain descriptions of different alternative representations (e.g., video services with different qualities); a description may include, e.g., codec information, a profile value, and a level value. Destination device 40 may retrieve the MPD of a media representation to determine how to access movie fragments of the various representations. Movie fragments may be located in movie fragment boxes (moof boxes) of video files.

Video compression standards such as ITU-T H.261, H.262, H.263, MPEG-1, MPEG-2, and H.264/MPEG-4 Part 10 make use of motion compensated temporal prediction to reduce temporal redundancy. The encoder uses motion compensated prediction from some previously encoded pictures (also referred to herein as frames) to predict the current coded pictures according to motion vectors. There are three major picture types in typical video coding: intra coded pictures ("I-pictures" or "I-frames"), predicted pictures ("P-pictures" or "P-frames"), and bi-directionally predicted pictures ("B-pictures" or "B-frames"). P-pictures use only reference pictures that precede the current picture in temporal order. In a B-picture, each block of the B-picture may be predicted from one or two reference pictures. These reference pictures may be located before or after the current picture in temporal order.

In accordance with the H.264 coding standard, as an example, B-pictures use two lists of previously coded reference pictures, list 0 and list 1. These two lists can each contain past and/or future coded pictures in temporal order. Blocks in a B-picture may be predicted in one of several ways: motion-compensated prediction from a list 0 reference picture, motion-compensated prediction from a list 1 reference picture, or motion-compensated prediction from the combination of both list 0 and list 1 reference pictures. To obtain the combination of both list 0 and list 1 reference pictures, two motion compensated reference areas are obtained from a list 0 reference picture and a list 1 reference picture, respectively. Their combination may be used to predict the current block.
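One simple way to combine the two motion compensated reference areas is a sample-wise rounded average. The sketch below assumes that combination rule for illustration only; the text above does not state how the two areas are combined, and the function name is hypothetical.

```python
def bipredict(list0_block, list1_block):
    """Combine two equally sized reference areas by a rounded sample-wise average."""
    if len(list0_block) != len(list1_block):
        raise ValueError("reference areas must have the same number of rows")
    predicted = []
    for row0, row1 in zip(list0_block, list1_block):
        if len(row0) != len(row1):
            raise ValueError("reference areas must have the same number of columns")
        predicted.append([(a + b + 1) >> 1 for a, b in zip(row0, row1)])
    return predicted


# Hypothetical 1 x 2 reference areas taken from list 0 and list 1 reference pictures.
print(bipredict([[100, 102]], [[104, 105]]))  # [[102, 104]]
```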

The ITU-T H.264 standard supports intra prediction in various block sizes, such as 16 x 16, 8 x 8, or 4 x 4 for luma components and 8 x 8 for chroma components, as well as inter prediction in various block sizes, such as 16 x 16, 16 x 8, 8 x 16, 8 x 8, 8 x 4, 4 x 8, and 4 x 4 for luma components and corresponding scaled sizes for chroma components. In this disclosure, "N x N" and "N by N" may be used interchangeably to refer to the pixel dimensions of a block in terms of vertical and horizontal dimensions, e.g., 16 x 16 pixels or 16 by 16 pixels. In general, a 16 x 16 block will have 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N x N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Blocks may also have different numbers of pixels in the horizontal and vertical dimensions. That is, blocks may include N x M pixels, where N is not necessarily equal to M.

Block sizes that are less than 16 x 16 may be referred to as partitions of a 16 x 16 macroblock. Video blocks may comprise blocks of pixel data in the pixel domain, or blocks of transform coefficients in the transform domain, e.g., following application of a transform such as a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to residual video block data representing pixel differences between coded video blocks and predictive video blocks. In some cases, a video block may comprise blocks of quantized transform coefficients in the transform domain.

Smaller video blocks can provide better resolution, and may be used for locations of a video frame that include high levels of detail. In general, macroblocks and the various partitions, sometimes referred to as sub-blocks, may be considered video blocks. In addition, a slice may be considered to be a plurality of video blocks, such as macroblocks and/or sub-blocks. Each slice may be an independently decodable unit of a video frame. Alternatively, frames themselves may be decodable units, or other portions of a frame may be defined as decodable units. The term "coded unit" or "coding unit" may refer to any independently decodable unit of a video frame, such as an entire frame, a slice of a frame, a group of pictures (GOP), also referred to as a sequence, or another independently decodable unit defined according to applicable coding techniques.

The term macroblock refers to a data structure for encoding picture and/or video data according to a two-dimensional pixel array that comprises 16 x 16 pixels. Each pixel comprises a chrominance component and a luminance component. Accordingly, the macroblock may define four luminance blocks, each comprising a two-dimensional array of 8 x 8 pixels, two chrominance blocks, each comprising a two-dimensional array of 16 x 16 pixels, and a header comprising syntax information, such as a coded block pattern (CBP), an encoding mode (e.g., intra- (I), or inter- (P or B) encoding modes), a partition size for partitions of an intra-encoded block (e.g., 16 x 16, 16 x 8, 8 x 16, 8 x 8, 8 x 4, 4 x 8, or 4 x 4), or one or more motion vectors for an inter-encoded macroblock.

Video encoder 28, video decoder 48, audio encoder 26, audio decoder 46, encapsulation unit 30, and decapsulation unit 38 each may be implemented as any of a variety of suitable processing circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combinations thereof. Each of video encoder 28 and video decoder 48 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined video encoder/decoder (CODEC). Likewise, each of audio encoder 26 and audio decoder 46 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined CODEC. An apparatus including video encoder 28, video decoder 48, audio encoder 26, audio decoder 46, encapsulation unit 30, and/or decapsulation unit 38 may comprise any combination of one or more integrated circuits, microprocessors, and/or a wireless communication device, such as a cellular telephone.

After encapsulation unit 30 has assembled a video file based on received data, encapsulation unit 30 passes the video file to output interface 32 for output. In some examples, encapsulation unit 30 may store the video file locally or send the video file to a remote server via output interface 32, rather than sending the video file directly to destination device 40. Output interface 32 may comprise, for example, a transmitter, a transceiver, a device for writing data to a computer-readable medium such as an optical drive or a magnetic media drive (e.g., a floppy drive), a universal serial bus (USB) port, a network interface, or another output interface. Output interface 32 outputs the video file to computer-readable medium 34, such as a transmission signal, a magnetic medium, an optical medium, a memory, a flash drive, or another computer-readable medium. Output interface 32 may implement HTTP 1.1 to respond to HTTP GET and partial GET requests. In this manner, source device 20 may act as an HTTP streaming server.

Ultimately, input interface 36 retrieves the data from computer-readable medium 34. Input interface 36 may comprise, for example, an optical drive, a magnetic media drive, a USB port, a receiver, a transceiver, or another computer-readable medium interface. Input interface 36 may provide the data to decapsulation unit 38. Decapsulation unit 38 may decapsulate elements of a video file to retrieve encoded data and send the encoded data to either audio decoder 46 or video decoder 48, depending on whether the encoded data is part of an audio or video component. Audio decoder 46 decodes encoded audio data and sends the decoded audio data to audio output 42, while video decoder 48 decodes encoded video data and sends the decoded video data, which may include a plurality of views, to video output 44.

FIG. 2 is a block diagram illustrating components of an example encapsulation unit 30. In the example of FIG. 2, encapsulation unit 30 includes video input interface 80, audio input interface 82, file creation unit 60, and video file output interface 84. File creation unit 60, in this example, includes component assembly unit 62, component map box constructor 64, and component arrangement (arr't) box constructor 66.

Video input interface 80 and audio input interface 82 receive encoded video and audio data, respectively. Video input interface 80 and audio input interface 82 may receive encoded video and audio data as the data is encoded, or may retrieve encoded video and audio data from a computer-readable medium. Upon receiving encoded video and audio data, video input interface 80 and audio input interface 82 pass the encoded video and audio data to file creation unit 60 for assembly into a video file.

File creation unit 60 may correspond to a control unit including hardware, software, and/or firmware configured to perform the functions and procedures attributed thereto. The control unit may further perform the functions attributed to encapsulation unit 30 generally. For examples in which file creation unit 60 is embodied in software and/or firmware, encapsulation unit 30 may include a computer-readable medium comprising instructions for one or more processors associated with file creation unit 60 (as well as component assembly unit 62, component map box constructor 64, and component arrangement box constructor 66), and a processing unit to execute the instructions. Each of the sub-units of file creation unit 60 (in this example, component assembly unit 62, component map box constructor 64, and component arrangement box constructor 66) may be implemented as individual hardware units and/or software modules, and may be functionally integrated or further separated into additional sub-units.

File creation unit 60 may correspond to any suitable processing unit or processing circuitry, such as, for example, one or more microprocessors, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), or any combination thereof. File creation unit 60 may further include a non-transitory computer-readable medium storing instructions for any or all of component assembly unit 62, component map box constructor 64, and component arrangement box constructor 66, as well as a processor for executing the instructions.

In general, file creation unit 60 may create one or more video files including the received audio and video data. Component assembly unit 62 may produce a component of content from the received encoded video and audio samples. A component may correspond to a number of segments, each of which may include one or more video fragments. Each of the segments may be independently retrievable by a client device, such as destination device 40. For example, file creation unit 60 may assign a unique URL or URN to a file including a segment. Component assembly unit 62 may generally ensure that encoded samples belonging to the same component are assembled with that component. Component assembly unit 62 may also assign a unique component identifier to each component of the content. File creation unit 60 may include data for more than one component in a file, and one component may span multiple files. File creation unit 60 may store data for a component as a track within a video file.

Component map box constructor 64 may produce a component map box for multimedia content in accordance with the techniques of this disclosure. For example, the component map box may signal characteristics of the components of the content. These characteristics may include an average bitrate of the component, a maximum bitrate of the component, a resolution and frame rate of the component (assuming the component is a video component), dependencies on other components, or other characteristics. When dependencies are signaled, component map box constructor 64 may also specify a temporal layer difference between components having a dependent relationship. The component map box may also signal a set of potential multiplexing intervals, or a range of multiplexing intervals, available for the component. In some examples, file creation unit 60 may store the component map box in a file separate from all other files including coded samples for the content. In other examples, file creation unit 60 may include the component map box in a header of one of the video files.
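As a non-limiting sketch, the signaled characteristics could be held in a simple record on the client, which then filters components against its own constraints. The field names, the selection rule, and the example values below are hypothetical; they do not reflect the syntax of the component map box, and only direct dependencies are considered.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentCharacteristics:
    component_id: int
    average_bitrate: int                 # bits per second
    maximum_bitrate: int                 # bits per second
    width: int = 0
    height: int = 0
    frame_rate: float = 0.0
    depends_on: list = field(default_factory=list)           # component identifiers
    multiplex_intervals: list = field(default_factory=list)  # seconds


def selectable(components, bandwidth_bps, max_frame_rate):
    """Return identifiers of components the client could request."""
    by_id = {c.component_id: c for c in components}
    chosen = []
    for c in components:
        # A dependent component also needs its dependencies, so sum their peak rates.
        needed = [c] + [by_id[d] for d in c.depends_on]
        total = sum(n.maximum_bitrate for n in needed)
        if total <= bandwidth_bps and c.frame_rate <= max_frame_rate:
            chosen.append(c.component_id)
    return chosen


base = ComponentCharacteristics(1, 800_000, 1_000_000, 640, 360, 15.0)
enhancement = ComponentCharacteristics(2, 1_200_000, 1_500_000, 640, 360, 30.0, depends_on=[1])
print(selectable([base, enhancement], 2_000_000, 30.0))  # [1]
```

Because the characteristics are available independently of the coded samples, a client following this sketch can make its selection before requesting any media data.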

By default, a component map box applies to the entire content. However, when a component map box applies only to a portion of the content, component map box constructor 64 may signal the temporal duration of the content to which the component map box applies. Component map box constructor 64 may then produce multiple component map boxes for the content in a static mode or a dynamic mode. In the static mode, component map box constructor 64 groups all of the component map boxes together, in an order corresponding to the temporal durations of the content to which the component map boxes correspond. In the dynamic mode, component map box constructor 64 may place each component map box in a different location, e.g., in different files.
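In the static mode, where the component map boxes are ordered by the temporal durations to which they apply, a client might locate the applicable box with a binary search. The sketch below assumes each box is described by a start time, a duration, and a label; this tuple layout and the function name are hypothetical.

```python
import bisect


def map_box_for_time(map_boxes, t):
    """Return the label of the box covering time t, or None if no box covers it.

    map_boxes is a list of (start_seconds, duration_seconds, label) tuples
    sorted by start time.
    """
    starts = [start for start, _, _ in map_boxes]
    i = bisect.bisect_right(starts, t) - 1
    if i < 0:
        return None
    start, duration, label = map_boxes[i]
    return label if t < start + duration else None


boxes = [(0, 60, "A"), (60, 60, "B")]
print(map_box_for_time(boxes, 59.9))  # A
print(map_box_for_time(boxes, 60))    # B
```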

The component map box may also signal whether a media fragment belongs to a new segment of a component. Because each segment of a component includes a component identifier, segments belonging to the same component can be identified, even when the segments are stored in separate files. The component map box may further signal timing information for the portions of a component within the files that include encoded samples for the component. Accordingly, temporal splicing is naturally supported. For example, a client device, such as destination device 40, may determine that two distinct files include data for the same component, as well as the temporal ordering of the two files.

Component arrangement box constructor 66 may produce component arrangement boxes for each file produced by file creation unit 60. In general, component arrangement box constructor 66 may identify which components are included within the file, as well as a correspondence between component identifiers and track identifiers for the file. In this manner, the component arrangement box may provide a mapping between component identifiers for the content and track identifiers for the file. A track identifier may correspond to the track of the file having encoded samples for the component specified in the mapping.
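The mapping described above may be pictured as one dictionary per file, keyed by component identifier. The sketch below, which uses hypothetical file names and a hypothetical function name, finds every file and track that carries a given component.

```python
def locate_component(arrangement_boxes, component_id):
    """List (file_name, track_id) pairs for files whose box maps the component.

    arrangement_boxes maps each file name to a {component_id: track_id} dictionary.
    """
    return [
        (file_name, mapping[component_id])
        for file_name, mapping in arrangement_boxes.items()
        if component_id in mapping
    ]


# Hypothetical files: component 1 spans both files, component 2 is only in the first.
boxes = {"seg1.mp4": {1: 1, 2: 2}, "seg2.mp4": {1: 1}}
print(locate_component(boxes, 1))  # [('seg1.mp4', 1), ('seg2.mp4', 1)]
```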

The component arrangement box may also indicate how the fragments of each component are stored in the file. For example, component arrangement box constructor 66 may specify byte ranges for the fragments of a component in the file, byte offsets to a particular fragment, the decoding time of the first sample in the media fragment, whether a random access point is present in the fragment and, if so, its decoding and presentation times, and whether the random access point is an IDR or ODR picture.
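As an illustrative sketch, a client could use such per-fragment information to seek: it finds the latest fragment whose random access point is presented at or before the target time and derives the byte range to request. The record layout and function names are hypothetical, and the entries are assumed to be listed in temporal order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FragmentEntry:
    byte_offset: int
    byte_length: int
    first_decode_time: float
    rap_presentation_time: Optional[float] = None  # None: no random access point
    rap_type: Optional[str] = None                 # "IDR" or "ODR"


def seek_fragment(entries, target_time):
    """Return the last fragment with a random access point at or before target_time."""
    best = None
    for entry in entries:
        rap = entry.rap_presentation_time
        if rap is not None and rap <= target_time:
            best = entry
    return best


def byte_range(entry):
    """Return the inclusive (first, last) byte positions of the fragment."""
    return (entry.byte_offset, entry.byte_offset + entry.byte_length - 1)


entries = [
    FragmentEntry(0, 1000, 0.0, 0.0, "IDR"),
    FragmentEntry(1000, 800, 2.0),
    FragmentEntry(1800, 1200, 4.0, 4.1, "ODR"),
]
print(byte_range(seek_fragment(entries, 3.0)))  # (0, 999)
```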

After file creation unit 60 has produced a file, file output interface 84 may output the file. In some examples, file output interface 84 may store the file to a computer-readable storage medium, such as a hard disk. In some examples, file output interface 84 may send the file via output interface 32 (FIG. 1) to another device configured to act as a server, e.g., an HTTP streaming server that implements HTTP 1.1. In some examples, file output interface 84 may store the file to a local storage medium, such that output interface 32 can provide the file to client devices, such as destination device 40, e.g., in response to HTTP streaming requests.

FIG. 3 is a conceptual diagram illustrating an example component map box 100 and component arrangement box 152A. In this example, component map box 100 includes video components 110 and audio components 140. It should be noted that component map box 100 itself includes signaled characteristics for video components 110 and audio components 140. As noted with respect to FIG. 2, component map box 100 and component arrangement boxes 152 may be generated by file creation unit 60, e.g., by component map box constructor 64 and component arrangement box constructor 66, respectively. In this manner, encapsulation unit 30 may signal characteristics of multimedia content and of the files that include data for the multimedia content. For example, video components 110 include signaled characteristics for components 112, and audio components 140 include signaled characteristics for components 142. As shown in this example, component 112A includes component characteristics 114A.

Component characteristics 114A, in this example, include bitrate information 116, resolution information 118, frame rate information 120, codec information 122, profile and level information 124, dependencies information 126, segment information 128, multiplexing interval information 130, and 3D video information 132.

Bitrate information 116 may include either or both of an average bitrate and a maximum bitrate for component 112A. Bitrate information 116 may also include flags indicating whether average and/or maximum bitrate information is signaled. For example, bitrate information 116 may include an average bitrate flag and a maximum bitrate flag, where the average bitrate flag indicates whether an average bitrate is signaled for component 112A, and the maximum bitrate flag indicates whether a maximum bitrate is signaled for component 112A. Bitrate information 116 may also include an average bitrate value indicating the average bitrate for component 112A. Similarly, bitrate information 116 may include a maximum bitrate value indicating the maximum bitrate over a certain period of time, e.g., over an interval of one second.

Resolution information 118 may describe the resolution of component 112A, e.g., in terms of the pixel width and pixel height of a picture. In some cases, resolution information 118 may not be explicitly signaled for component 112A. For example, component characteristics 114A may include a default characteristics flag indicating whether the component having index i has the same characteristics as the component of the same content having index i-1. When the flag indicates that the characteristics are the same, the characteristics need not be signaled. The default characteristics may correspond to a subset of the available characteristics, such as resolution, frame rate, codec information, profile information, and level information, or other combinations of characteristics that can be signaled by a component map box such as component map box 100. In some examples, individual flags are included for each potential characteristic, indicating whether the corresponding characteristic for the component is the same as that of the previous component.
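The inheritance implied by the default characteristics flag can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the `Characteristics` class, the `resolve` function, and the `(flag, explicit)` input shape are hypothetical names introduced here, not part of the described box syntax.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass
class Characteristics:
    # Hypothetical container for the characteristics the flag can cover.
    width: Optional[int] = None
    height: Optional[int] = None
    frame_rate: Optional[int] = None      # frames per 256 seconds
    codec: Optional[str] = None
    profile_level: Optional[int] = None


def resolve(entries: List[Tuple[bool, Optional[Characteristics]]]) -> List[Characteristics]:
    """Resolve characteristics in index order.

    Each entry is (default_characteristics_flag, explicit characteristics).
    When the flag is set, the component at index i copies index i-1.
    """
    resolved: List[Characteristics] = []
    for default_flag, explicit in entries:
        if default_flag:
            if not resolved:
                raise ValueError("the first component has no predecessor to copy")
            resolved.append(replace(resolved[-1]))  # independent copy of i-1
        else:
            if explicit is None:
                raise ValueError("explicit characteristics are required")
            resolved.append(explicit)
    return resolved


components = resolve([
    (False, Characteristics(1280, 720, 7680, "avc1", 31)),
    (True, None),  # same characteristics as the previous component
])
```

Copying rather than sharing the previous entry keeps later edits to one component from silently changing another.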

In some examples, frame rate information 120 may be designated as a default characteristic, as described above. Alternatively, frame rate information 120 may specify a frame rate for component 112A. The frame rate may be specified in frames per 256 seconds of the video component. Codec information 122 may also be designated as a default characteristic, as described above. Alternatively, codec information 122 may specify the encoder used to encode component 112A. Similarly, profile and level information 124 may be designated as default characteristics or explicitly specified, e.g., as profile indicator (profile_idc) and level indicator (level_idc) values.
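Because the frame rate is expressed in frames per 256 seconds, a conventional frames-per-second figure is the signaled value divided by 256. The following Python sketch shows the conversion in both directions; the function names are hypothetical, and rounding to the nearest integer when encoding is an assumption rather than something the text specifies.

```python
from fractions import Fraction


def frames_per_second(frame_rate_field: int) -> Fraction:
    """Convert a value in frames per 256 seconds to frames per second."""
    return Fraction(frame_rate_field, 256)


def to_frame_rate_field(fps: Fraction) -> int:
    """Encode frames per second as frames per 256 seconds (rounding assumed)."""
    return round(Fraction(fps) * 256)


fps_30 = frames_per_second(7680)                      # exactly 30 frames per second
ntsc_field = to_frame_rate_field(Fraction(30000, 1001))
```

Using `Fraction` avoids floating-point drift for rates such as 30000/1001.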

Dependencies information 126 may indicate whether component 112A depends on other ones of components 110. If so, dependencies information 126 may include information indicating a temporal identifier for component 112A and the difference between the temporal identifier for component 112A and the temporal identifier for the component on which component 112A depends.

Segment information 128 describes the segments of component 112A. The segments may be stored in files, such as files 150. In the example of FIG. 3, data for the segments of component 112A may be stored in file 150A, specifically in video track 158, as described in greater detail below. In some cases, segments for component 112A may be stored in multiple files. Each segment may correspond to one or more fragments. For each fragment, segment information 128 may signal whether the fragment includes a random access point, the type of the random access point (e.g., IDR or ODR), whether the fragment corresponds to a new file (e.g., a new segment), a byte offset to the start of the fragment, timing information for the first sample of the fragment (e.g., decoding and/or display time), a byte offset to the next fragment, a byte offset to the random access point, if present, and the number of samples for which decoding is to be skipped when starting the stream at an ODR RAP.
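A client could turn the signaled byte offsets into an HTTP partial request for one fragment. The Python sketch below only builds the `Range` header value. It assumes, without confirmation from the text, that both offsets are measured from the start of the file and that the next fragment's offset marks the first byte after the current fragment; `fragment_range_header` is a hypothetical helper name.

```python
def fragment_range_header(fragment_offset: int, next_fragment_offset: int) -> str:
    """Build an HTTP Range header value covering exactly one fragment.

    HTTP byte ranges are inclusive at both ends, so the last byte requested
    is one before the offset of the next fragment (assumed exclusive end).
    """
    if fragment_offset < 0 or next_fragment_offset <= fragment_offset:
        raise ValueError("offsets must satisfy 0 <= start < next")
    return f"bytes={fragment_offset}-{next_fragment_offset - 1}"


header_value = fragment_range_header(1000, 5000)
```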

Multiplexing interval information 130 may specify a set or range of multiplexing intervals for component 112A. 3D video information 132 may be included when component 112A is to be used to produce a three-dimensional effect, e.g., by displaying two or more slightly different views of a scene simultaneously or nearly simultaneously. 3D video information 132 may include the number of views to be displayed, identifiers for the components corresponding to the views, the start time of the 3D representation for a particular basic video component, the time duration of the 3D representation, target resolutions (e.g., the target width and target height of the 3D representation when ultimately displayed), positioning information (e.g., a horizontal offset and a vertical offset in a display window), a window layer indicating the layer of the decoded video component for presentation, and a transparent factor. In general, a lower window layer value may indicate that the associated video component is to be rendered earlier and may be covered by a video component with a higher layer value. The transparency level information may be combined with the window layer information. When a component is combined with another component having a lower window layer value, each pixel in the other component may be weighted by a value of [transparency level] / 255, and the collocated pixel in the current component may be weighted by a value of (255 - [transparency level]) / 255.
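The weighting described above amounts to a per-pixel alpha blend. The Python sketch below applies the two stated weights to a single 8-bit sample. The function name `blend_pixel` and the round-to-nearest integer arithmetic are assumptions made for illustration.

```python
def blend_pixel(current: int, other: int, transparent_level: int) -> int:
    """Blend one 8-bit sample of the current component with the collocated
    sample of the component that has the lower window layer value.

    The other component is weighted by transparent_level / 255 and the
    current component by (255 - transparent_level) / 255.
    """
    if not 0 <= transparent_level <= 255:
        raise ValueError("transparent_level must fit in 8 bits")
    weighted = other * transparent_level + current * (255 - transparent_level)
    return (weighted + 127) // 255  # round to nearest (assumed)


opaque = blend_pixel(200, 100, 0)         # current component only
see_through = blend_pixel(200, 100, 255)  # lower layer only
mixed = blend_pixel(200, 100, 51)         # 20% lower layer, 80% current
```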

FIG. 3 also illustrates the correspondence between components 112, 142 and the various files 150 that include data for components 112, 142. In this example, file 150A includes encoded samples for video component 112A in the form of video track 158, and encoded samples for audio component 142A in the form of audio track 160. File 150A also includes component arrangement box 152A. As further shown in this example, component arrangement box 152A includes component to video track map 154 and component to audio track map 156. Component to video track map 154 indicates that the component identifier for component 112A is mapped to video track 158 of file 150A. Similarly, component to audio track map 156 indicates that the component identifier for component 142A is mapped to audio track 160 of file 150A.

In this example, component 112B corresponds to video track 162 of file 150B, and component 142B corresponds to audio track 164 of file 150C. Accordingly, component arrangement box 152B may include a mapping between component 112B and video track 162, while component arrangement box 152C may include a mapping between component 142B and audio track 164. In this manner, a client device may retrieve component map box 100 and component arrangement boxes 152 to determine which components to request and how to access the encoded data for those components from files 150.
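The lookup a client performs with the component arrangement boxes can be sketched in Python as a search over per-file mappings. The dictionary below mirrors the correspondences of FIG. 3, with the figure's reference numerals standing in for real component and track identifiers. The file keys and the `locate` function are hypothetical.

```python
from typing import Dict, Tuple

# One entry per file: component identifier -> track identifier, as carried
# by that file's component arrangement box (reference numerals used as IDs).
arrangement_boxes: Dict[str, Dict[str, int]] = {
    "file_150A": {"112A": 158, "142A": 160},
    "file_150B": {"112B": 162},
    "file_150C": {"142B": 164},
}


def locate(component_id: str, boxes: Dict[str, Dict[str, int]]) -> Tuple[str, int]:
    """Return (file, track) holding the encoded samples of a component."""
    for file_name, component_to_track in boxes.items():
        if component_id in component_to_track:
            return file_name, component_to_track[component_id]
    raise KeyError(f"component {component_id} is not mapped in any file")


video_location = locate("112B", arrangement_boxes)
audio_location = locate("142A", arrangement_boxes)
```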

The pseudocode below is one example implementation of a data structure for a component map box.

aligned(8) class ComponentMapBox extends FullBox('cmmp', version, 0) {
    unsigned int (32) box_length;
    unsigned int (64) content_ID;
    unsigned int (32) timescale;
    unsigned int (8) video_component_count;
    unsigned int (8) audio_component_count;
    unsigned int (8) other_component_count;
    bit (1) first_cmmp_flag; // default 1
    bit (1) more_cmmp_flag; // default 0
    bit (2) cmmp_byte_range_idc;
    bit (1) multi_video_present_flag;
    bit (2) multiplex_interval_idc;
    bit (1) duration_signalled_flag;
    bit (1) dynamic_component_map_mode_flag;
    bit (7) reserved_bit;
    if (duration_signalled_flag) {
        unsigned int (64) starting_time;
        unsigned int (64) duration;
    }
    for (i=1; i<= video_component_count; i++) {
        unsigned int (8) component_ID;
        bit (1) average_bitrate_flag;
        bit (1) maximum_bitrate_flag;
        bit (1) default_characteristics_flag;
        bit (2) resolution_idc;
        bit (2) frame_rate_idc;
        bit (2) codec_info_idc;
        bit (2) profile_level_idc;
        bit (1) dependency_flag;
        bit (1) 3DVideo_flag;
        bit (2) reserved_flag;
        // bitrate
        if (average_bitrate_flag)
            unsigned int (32) avgbitrate;
        if (maximum_bitrate_flag)
            unsigned int (32) maxbitrate;
        // resolution
        if (!default_characteristics_flag) {
            if (resolution_idc == 1) {
                unsigned int (16) width;
                unsigned int (16) height;
            }
            else if (resolution_idc == 2)
                unsigned int (8) same_cha_component_id;
            // when resolution_idc is equal to 0, the resolution is not specified;
            // when the value is equal to 3, the component has the same resolution
            // as the component with an index of i-1.
            // frame rate
            if (frame_rate_idc == 1)
                unsigned int (32) frame_rate;
            else if (frame_rate_idc == 2)
                unsigned int (8) same_cha_component_id;
            // when frame_rate_idc is equal to 0, the frame rate is not specified;
            // when the value is equal to 3, the component has the same frame rate
            // as the component with an index of i-1.
            if (codec_info_idc == 1)
                string [32] compressorname;
            else if (codec_info_idc == 2)
                unsigned int (8) same_cha_component_id;
            // profile_level
            if (profile_level_idc == 1)
                profile_level;
            else if (profile_level_idc == 2)
                unsigned int (8) same_cha_component_id;
        }
        if (dependency_flag) {
            unsigned int (8) dependent_comp_count;
            bit (1) temporal_scalability;
            unsigned int (3) temporal_id;
            bit (4) reserved;
            for (j=1; j<= dependent_comp_count; j++) {
                unsigned int (6) dependent_comp_id;
                if (temporal_scalability)
                    unsigned int (2) delta_temporal_id;
            }
        }
        if (3DVideo_flag) {
            unsigned int (8) number_target_views;
        }
        // segments
        if (cmmp_byte_range_idc > 0) {
            unsigned int (16) entry_count_byte_range;
            for (j=1; j<= entry_count_byte_range; j++) {
                int (2) contains_RAP;
                int (1) RAP_type;
                bit (1) new_file_start_flag;
                int (4) reserved;
                unsigned int (32) reference_offset;
                unsigned int (32) reference_delta_time;
                if (cmmp_byte_range_idc > 0)
                    unsigned int (32) next_fragment_offset;
                if (contains_RAP > 1) {
                    unsigned int (32) RAP_delta_time;
                    unsigned int (32) delta_offset;
                }
                if (contains_RAP > 0 && RAP_type != 0) {
                    unsigned int (32) delta_DT_PT;
                    unsigned int (8) number_skip_samples;
                }
            }
        }
        if (multiplex_interval_idc == 1) {
            unsigned int (8) entry_count;
            for (j=1; j<= entry_count; j++)
                unsigned int (32) multiplex_time_interval;
        }
        else if (multiplex_interval_idc == 2) {
            unsigned int (32) min_muliplex_time_interval;
            unsigned int (32) max_muliplex_time_interval;
        }
    }
    if (multi_video_present_flag) {
        unsigned int (8) multi_video_group_count;
        for (i=1; i<= multi_video_group_count; i++) {
            unsigned int (8) basic_video_component_id;
            unsigned int (8) extra_video_component_count;
            int (64) media_time;
            int (64) duration;
            for (j=1; j<= extra_video_component_count; j++)
                unsigned int (8) component_id;
            for (j=0; j<= extra_video_component_count; j++) {
                unsigned int (16) target_width;
                unsigned int (16) target_height;
                unsigned int (16) horizontal_offset;
                unsigned int (16) vertical_offset;
                unsigned int (4) window_layer;
                unsigned int (8) transparent_level;
            }
        }
    }
    for (i=1; i<= audio_component_count; i++) {
        unsigned int (8) component_ID;
        bit (1) average_bitrate_flag;
        bit (1) maximum_bitrate_flag;
        bit (2) codec_info_idc;
        bit (2) profile_level_idc;
        ......
        // similar to the syntax table for video components
    }
}

The semantics of the pseudocode elements in this example are as follows. It should be understood that, in other examples, other variable names and semantics may be assigned to the elements of a component map box. In this example, box_length indicates the length of the component map box in units of bytes. content_ID specifies a unique identifier of the content that the streaming server provides.

timescale is an integer that specifies the time scale for the entire service. It is the number of time units that pass in one second. For example, a time coordinate system that measures time in sixtieths of a second has a time scale of 60.
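As a quick illustration of the time scale, the Python sketch below converts a time value expressed in time units into seconds. The helper name `ticks_to_seconds` is hypothetical, and applying it to fields such as starting_time or duration is an assumption based on timescale covering the entire service.

```python
from fractions import Fraction


def ticks_to_seconds(ticks: int, timescale: int) -> Fraction:
    """Convert a count of time units into seconds for a given timescale."""
    if timescale <= 0:
        raise ValueError("timescale must be a positive integer")
    return Fraction(ticks, timescale)


# With a timescale of 60, each time unit lasts 1/60 of a second.
one_and_a_half_seconds = ticks_to_seconds(90, 60)
```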

video_component_count specifies the number of alternative video components that the service corresponding to the component map box can provide. Any two video components defined in this box can be switched with each other. If there are video presentations that consist of multiple representations, such a presentation may contain one component belonging to the alternative video component group.

audio_component_count specifies the number of alternative audio components that the service corresponding to the component map box can provide. Any two audio components defined in this box can be switched with each other. other_component_count specifies the number of other components of the service corresponding to the component map box.

first_cmmp_flag indicates whether this component map box is the first box of the same type for the associated service. more_cmmp_flag indicates whether this component map box is the last box of the same type for the associated service.

cmmp_byte_range_idc having a value of 0 indicates that byte range and timing information is not signaled in the component map box. cmmp_byte_range_idc having a value greater than 0 indicates that byte range and timing information is signaled in the component map box. cmmp_byte_range_idc having a value of 1 indicates that only the starting byte offset of a segment of a component is signaled. cmmp_byte_range_idc having a value of 2 indicates that both the starting byte offset and the ending byte offset of a segment of a component are signaled.

temporal_scalability indicates whether the current component depends on the following signaled content components, at least one of which has a lower temporal_id. temporal_id indicates the temporal identifier of the current component. The temporal_id value may be ignored if the temporal_scalability value is 0. delta_temporal_id indicates the difference between the highest temporal identifier value of all of the samples of the current component and the highest temporal identifier value of the dependent component.

multi_video_present_flag having a value of 1 indicates that there are video presentations rendered from more than one decoded video component. For example, in the case of picture-in-picture, multi_video_present_flag may have a value of 0, which may indicate that no video presentation is rendered from more than one decoded video component.

multiplex_interval_idc having a value of 0 indicates that the multiplexing interval is not signaled. multiplex_interval_idc having a value of 1 indicates that a list of multiplexing intervals is signaled. multiplex_interval_idc having a value of 2 indicates that a range of multiplexing intervals is signaled.
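The three multiplex_interval_idc cases could be handled by a client roughly as in the following Python sketch, which checks whether a desired multiplexing interval is among those signaled. The function name, its parameters, and the choice to return `None` when nothing is signaled are assumptions made for illustration.

```python
from typing import Optional, Sequence


def interval_supported(
    requested: int,
    multiplex_interval_idc: int,
    intervals: Sequence[int] = (),
    min_interval: Optional[int] = None,
    max_interval: Optional[int] = None,
) -> Optional[bool]:
    """Check a requested multiplexing interval against the signaled data.

    Returns None when no interval is signaled (idc 0), so the caller can
    tell "unknown" apart from "not supported".
    """
    if multiplex_interval_idc == 0:
        return None
    if multiplex_interval_idc == 1:      # explicit list of intervals
        return requested in intervals
    if multiplex_interval_idc == 2:      # inclusive range of intervals
        if min_interval is None or max_interval is None:
            raise ValueError("a range requires both bounds")
        return min_interval <= requested <= max_interval
    raise ValueError("unsupported multiplex_interval_idc value")


in_list = interval_supported(500, 1, intervals=[250, 500])
in_range = interval_supported(750, 2, min_interval=250, max_interval=500)
unknown = interval_supported(500, 0)
```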

duration_signalled_flag indicates whether the duration of the service to which the component map box corresponds is signaled. If the duration is not signaled (e.g., when duration_signalled_flag has a value of 0), the current component map box is assumed to apply to the entire service.

dynamic_component_map_mode_flag indicates whether the dynamic mode is supported for the current component map box. dynamic_component_map_mode_flag having a value of 1 indicates static mode, in which the next component map box of the same service, if one exists, immediately follows the current component map box in the same file. dynamic_component_map_mode_flag having a value of 0 indicates dynamic mode, in which the next component map box of the same service will be transmitted to the client later by different means. For example, the next component map box may be included in a movie box of a following file.

starting_time indicates the starting time of the service to which the current component map box applies. duration indicates the duration of the content to which the current component map box applies.
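To make the fixed-length leading fields of the example syntax concrete, the Python sketch below unpacks them from a byte string. It rests on several assumptions the text does not confirm: the input starts immediately after the FullBox header, multi-byte fields are big-endian, and bit fields are packed most significant bit first in declaration order. The function name and the returned dictionary layout are hypothetical.

```python
import struct

# box_length, content_ID, timescale, three counts, then 16 bits of flags.
_FIXED = struct.Struct(">IQIBBBH")
_TIMES = struct.Struct(">QQ")  # starting_time, duration


def parse_cmmp_fixed_fields(payload: bytes) -> dict:
    """Parse the fields that precede the per-component loops."""
    (box_length, content_id, timescale,
     n_video, n_audio, n_other, flags) = _FIXED.unpack_from(payload, 0)
    fields = {
        "box_length": box_length,
        "content_ID": content_id,
        "timescale": timescale,
        "video_component_count": n_video,
        "audio_component_count": n_audio,
        "other_component_count": n_other,
        "first_cmmp_flag": (flags >> 15) & 0x1,
        "more_cmmp_flag": (flags >> 14) & 0x1,
        "cmmp_byte_range_idc": (flags >> 12) & 0x3,
        "multi_video_present_flag": (flags >> 11) & 0x1,
        "multiplex_interval_idc": (flags >> 9) & 0x3,
        "duration_signalled_flag": (flags >> 8) & 0x1,
        "dynamic_component_map_mode_flag": (flags >> 7) & 0x1,
        # the low 7 bits are reserved_bit
    }
    offset = _FIXED.size
    if fields["duration_signalled_flag"]:
        fields["starting_time"], fields["duration"] = _TIMES.unpack_from(payload, offset)
        offset += _TIMES.size
    fields["bytes_consumed"] = offset
    return fields


# Example payload: 2 video components, 1 audio component, duration signaled.
example = struct.pack(">IQIBBBH", 37, 42, 60, 2, 1, 0, 0xA380) + struct.pack(">QQ", 0, 600)
parsed = parse_cmmp_fixed_fields(example)
```

The `bytes_consumed` entry tells a caller where the per-component loop would begin under the same packing assumptions.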

component_ID is a unique identifier of the component. average_bitrate_flag indicates whether the average bitrate is signaled for the associated component. maximum_bitrate_flag indicates whether the maximum bitrate is signaled for the associated component. Bitrate value 116 of FIG. 3 may correspond to either or both of average_bitrate_flag and maximum_bitrate_flag in the pseudocode. default_characteristics_flag indicates whether the current component with an index of i has the same values as the component with an index of i-1 for the following characteristics: resolution, frame rate, codec information, profile, and level.

The resolution_idc, frame_rate_idc, codec_info_idc, and profile_level_idc values being set to 0 indicates that the resolution, frame rate, codec information, profile, and/or level (respectively) of the associated video component is not signaled. resolution_idc may correspond to resolution value 118 of FIG. 3. frame_rate_idc may correspond to frame rate value 120 of FIG. 3. codec_info_idc may correspond to codec information value 122 of FIG. 3. profile_level_idc may correspond to profile/level value 124 of FIG. 3. These values being set to 1 indicates that the resolution, frame rate, codec information, profile, or level (respectively) of the associated video component is the same as that of the video component with an index of i-1. Any of these values being set to 2 indicates that the respective value is equal to that of one specific video component, signaled using the value "same_cha_component_id".

dependency_flag indicates whether the dependency of the current video component is signaled. When dependency_flag indicates that the component depends on other video components, the components on which the current component depends may also be signaled. That is, if the dependency is signaled, the corresponding video component depends on the signaled video components. The dependency_flag value, along with the signaled video components on which the current component depends, may correspond to dependencies value 126 of FIG. 3.

3DVideo_flag indicates whether the current video component is related to MVC or other video content providing a 3D representation. number_target_views specifies the number of target views when decoding a 3D video component, e.g., one coded with MVC (multiview video coding). entry_count_byte_range specifies the number of fragments signaled for the associated component. 3DVideo_flag, number_target_views, and entry_count_byte_range may generally correspond to 3D video information value 132 of FIG. 3.

avgbitrate indicates the average bitrate of the associated component. maxbitrate indicates the maximum bitrate of the associated component, calculated over any interval of one second. width and height indicate the resolution of the decoded video component, in units of luma pixels.

same_resl_component_id indicates the component identifier of the video component having the same specific characteristic (resolution, frame rate, codec information, or profile and level) as the associated video component.

frame_rate indicates the frame rate of the video component, in frames per 256 seconds. compressorname is a 4-byte value indicating the brand of the codec, for example, "avc1". It has the same semantics as the major_brand of the file type box. profile_level indicates the profile and level required to decode the current video component. dependent_comp_count indicates the number of dependent video components for the associated video component.
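Because frame_rate is expressed in frames per 256 seconds, a client would divide by 256 to obtain frames per second. The following is a minimal sketch; both helper names are hypothetical.

```python
def frames_per_second(frame_rate: int) -> float:
    """Convert a frame_rate field (frames per 256 seconds) to frames per second."""
    return frame_rate / 256.0


def encode_frame_rate(fps: float) -> int:
    """Inverse conversion, rounded to the nearest integer field value."""
    return round(fps * 256)
```

For example, a frame_rate field of 7680 corresponds to 30 frames per second, and 25 frames per second would be stored as 6400.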

dependent_comp_id specifies one component identifier of the video components on which the associated video component depends. At the same time instance, samples in different content components may be ordered in ascending order of the index of the content components. That is, a sample with index j may be placed earlier than a sample with index j+1, and the sample of the current content component may be the last sample at that time instance.

contains_RAP indicates whether fragments of the component contain a random access point. contains_RAP is set to 0 if the fragment does not contain any random access point. contains_RAP is set to 1 if the fragment contains a random access point as the first sample in the fragment. contains_RAP is set to 2 if the random access point is not the first sample of the fragment. RAP_type specifies the type of random access points (RAPs) contained in the referenced track of the movie fragment. RAP_type is set to 0 if the random access point is an instantaneous decoder refresh (IDR) picture. RAP_type is set to 1 if the random access point is an open GOP random access point, e.g., an open decoder refresh (ODR) picture.
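The value assignments for contains_RAP and RAP_type can be written as enumerations. This is a sketch only; the Python class, member, and function names are hypothetical labels for the numeric values listed above.

```python
from enum import IntEnum


class ContainsRAP(IntEnum):
    NONE = 0              # fragment contains no random access point
    FIRST_SAMPLE = 1      # RAP is the first sample of the fragment
    NOT_FIRST_SAMPLE = 2  # RAP is present but is not the first sample


class RAPType(IntEnum):
    IDR = 0       # instantaneous decoder refresh picture
    OPEN_GOP = 1  # open GOP random access point, e.g., an ODR picture


def starts_with_rap(contains_rap: int) -> bool:
    """True when decoding can begin at the first sample of the fragment."""
    return ContainsRAP(contains_rap) is ContainsRAP.FIRST_SAMPLE
```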

The new_file_start_flag flag indicates whether a fragment is the first fragment of the corresponding component in a file. This implies that the currently indicated fragment is in a new file. This signaling may be advantageous when relatively small files are used at the server or when temporal splicing is used.

reference_offset indicates the offset to the starting byte of a fragment in the file containing the fragment. reference_delta_time indicates the decoding time of the associated fragment. next_fragment_offset indicates the starting byte offset of the next fragment of the associated video component in the file containing the fragment. RAP_delta_time indicates the decoding time difference between the first IDR random access point and the first sample of the fragment. delta_offset indicates the byte offset difference between the byte offset of the first sample of the fragment and the byte offset of the random access point.

delta_DT_PT indicates the difference between the decoding time and the presentation time for a RAP that is an ODR (open GOP random access point). number_skip_samples indicates the number of samples having a presentation time prior to the ODR and a decoding time after the ODR, where the ODR may be the first RAP of a referenced track of a movie fragment. Note that a decoder may skip decoding of these samples if the decoder receives a stream starting at the ODR. contains_RAP, RAP_type, new_file_start_flag, reference_offset, reference_delta_time, next_fragment_offset, RAP_delta_time, delta_offset, delta_DT_PT, and number_skip_samples may generally correspond to segment information 128.

multiplex_time_interval indicates the multiplexing interval for the associated video component, in units of the timescale. Video components are generally associated with multiplexing interval information, although multiplexing interval information may also be signaled for audio components. multiplex_time_interval may correspond to multiplex interval value 130 of FIG. 3. min_muliplex_time_interval and max_muliplex_time_interval indicate the range of the multiplexing interval for the associated video component, in units of the timescale. multi_video_group_count specifies the number of video presentations that are to be displayed as a combination of multiple decoded video components.

basic_video_component_id specifies the component identifier of the basic video component. The other extra video components that are signaled are considered, together with the basic video component, as one alternative video presentation. For example, suppose that the previous loop of "for video_component_count of video components" includes seven video components, having CIDs 0, 1, 2, 3, 4, 5, and 6. Suppose further that there is a basic_video_component_id with the number 3 and two extra video components, 5 and 6. Then only the following group of presentations are alternatives to each other: {0}, {1}, {2}, {3, 5, 6}, and {4}.
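The grouping in this example can be reproduced with a short sketch. It assumes a single multi-video group; the function and parameter names are hypothetical.

```python
from typing import Iterable, List, Sequence


def alternative_presentations(component_ids: Iterable[int],
                              basic_video_component_id: int,
                              extra_component_ids: Sequence[int]) -> List[List[int]]:
    """Group components into presentations that are alternatives to each other.

    The basic component and its extra components form one presentation; every
    other component is a presentation on its own.
    """
    extras = set(extra_component_ids)
    groups: List[List[int]] = []
    for cid in component_ids:
        if cid == basic_video_component_id:
            groups.append([cid, *extra_component_ids])
        elif cid not in extras:
            groups.append([cid])
    return groups
```

Calling `alternative_presentations(range(7), 3, [5, 6])` yields the five groups listed in the example, in the same order.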

media_time indicates the starting time of the multi-video presentation with the basic video component having the identifier basic_video_component_id. duration specifies the duration of the multi-video presentation with the basic video component having the identifier basic_video_component_id. target_width and target_height specify the target resolution of a video component in this multi-video presentation. Note that if this is not the same as the original resolution of the video, the destination device may perform scaling.

horizontal_offset and vertical_offset specify the horizontal and vertical offsets in the display window. window_layer indicates the layer of the decoded video component for presentation. A lower layer value may indicate that the associated video component is rendered earlier and may be covered by a video component with a higher layer value. The decoded video components may be rendered in ascending order of window_layer values.

transparent_level indicates the transparency factor used when this decoded video is combined with those having a window_layer lower than that of the current video component. For each pixel, the existing pixel may be weighted by a value of transparent_level/255, and the co-located pixel in the current decoded video component may be weighted by a value of (255 - transparent_level)/255.
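The layering and weighting described above amount to a conventional blend. The sketch below assumes single-channel pixel values and represents each component as a dictionary; the function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def render_order(components: List[Dict]) -> List[Dict]:
    """Return components in ascending window_layer order (lowest is drawn first)."""
    return sorted(components, key=lambda c: c["window_layer"])


def blend_pixel(existing: float, current: float, transparent_level: int) -> float:
    """Combine the already-rendered pixel with the co-located current pixel.

    existing is weighted by transparent_level/255 and current by
    (255 - transparent_level)/255, as stated in the text.
    """
    if not 0 <= transparent_level <= 255:
        raise ValueError("transparent_level must be in the range 0..255")
    w_existing = transparent_level / 255
    w_current = (255 - transparent_level) / 255
    return existing * w_existing + current * w_current
```

With transparent_level equal to 0 the current component fully covers the lower layers, and with 255 it is fully transparent.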

The pseudocode below is one example implementation of a data structure for a component arrangement box.

aligned(8) class ComponentArrangeBox extends FullBox('cmar', version, 0) {
    unsigned int(64) content_ID;
    unsigned int(8) component_count;
    bit(1) track_map_flag;
    bit(1) sub_fragment_flag;
    bit(1) agg_fragment_flag;
    for (i=1; i<= component_count; i++) {
        unsigned int(8) component_ID;
        if (track_map_flag)
            unsigned int(32) track_id;
    }
    if (sub_fragment_flag) {
        unsigned int(8) major_component_count;
        for (i=1; i<= major_component_count; i++) {
            unsigned int(8) full_component_ID;
            unsigned int(8) sub_set_component_count;
            for (j=1; j< sub_set_component_count; j++)
                unsigned int(8) sub_set_component_ID;
        }
    }
    if (agg_fragment_flag) {
        unsigned int(8) aggregated_component_count;
        for (i=1; i<= aggregated_component_count; i++) {
            unsigned int(8) aggr_component_id;
            for (j=1; j<= dependent_component_count; j++)
                unsigned int(8) depenedent_component_ID;
        }
    }
}

The semantics of the pseudocode elements in this example are as follows. It should be understood that, in other examples, other variable names and semantics may be assigned to the elements of the component arrangement box.

component_count specifies the number of components in the current file. track_map_flag indicates whether the mapping of tracks in this file to components of the service having the service identifier content_ID is signaled.

sub_fragment_flag indicates whether the mapping of sub-tracks in this file to components is signaled. agg_fragment_flag indicates whether the mapping of track aggregations in this file to components is signaled. major_component_count indicates the number of major components, which contain all of the samples in the file.

The component_ID values indicate the identifiers of the major components stored in the file, in the order of the first sample of each component in each movie fragment. track_id indicates the identifier of the track that corresponds to the component having the component identifier component_ID.

sub_set_component_count indicates the number of sub-components that form the full set of the component having a component_id of full_component_ID. sub_set_component_ID specifies the component_id values of the sub-components that form the full set of the component having a component_id of full_component_ID. Any two sub-components of the same component have no overlapping samples.

In some examples, aggregated_component_count indicates the number of content components that are aggregated from other components in the file. In some examples, aggregated_component_count indicates the number of dependent components required to aggregate the aggregated content component having the component identifier aggr_component_id. aggr_component_id specifies the component identifier of the aggregated component. depenedent_component_ID specifies the component identifier of a component used to aggregate the component having an identifier of aggr_component_id.

Table 1 below illustrates another example set of syntax objects consistent with the techniques of this disclosure. The "Element or Attribute Name" column describes the name of the syntax object. The "Type" column describes whether the syntax object is an element or an attribute. The "Cardinality" column describes the cardinality of the syntax object, that is, the number of instances of the syntax object in an instance of a data structure corresponding to Table 1. The "Optionality" column describes whether the syntax object is optional, where in this example "M" indicates mandatory, "O" indicates optional, "OD" indicates optional with a default value, and "CM" indicates conditionally mandatory. The "Description" column describes the semantics of the corresponding syntax object.

In the example of FIG. 3, a client device may request component map box 100, which includes information on characteristics of video components 110 and audio components 140. For example, component 112A is described by component characteristics 114A. Likewise, each of the other components is described by component characteristics information similar to component characteristics 114A. The client device may also retrieve component arrangement boxes 152, which describe mappings between component identifiers and tracks of audio and video data, such as video tracks 158, 162 and audio tracks 160, 164. In this manner, component map box 100 is stored separately from files 150, which include coded samples of audio and video data. The client device may use the data of component map box 100 and component arrangement boxes 152 to select a representation of content and to request segments of the selected components, e.g., in accordance with a network streaming protocol such as HTTP streaming.

FIG. 4 is a conceptual diagram illustrating an example timing interval 190 for multiplexing video component 180 and audio component 184. In this example, a representation includes video component 180 and audio component 184. Video component 180 includes video fragments 182A-182D (video fragments 182), while audio component 184 includes audio fragments 186A-186C (audio fragments 186). Video fragments 182 may include encoded video samples, while audio fragments 186 may include encoded audio samples.

Video fragments 182 and audio fragments 186 may be arranged in decoding time order within video component 180 and audio component 184, respectively. Axis 188 of FIG. 4 indicates decoding time information for video component 180 and audio component 184. In this example, decoding time increases from left to right, as shown by axis 188. Thus, a client device may decode video fragment 182A before video fragment 182B, for example.

FIG. 4 also illustrates an example timing interval 190 of one second. In accordance with the techniques of this disclosure, the component map box for video component 180 and audio component 184 may indicate that timing interval 190 is one of a potential set of timing intervals, or may indicate a range of timing intervals that includes timing interval 190. A client device may use this information to request fragments from video component 180 and audio component 184 in a manner that avoids buffer overflow while also ensuring that a sufficient amount of data is buffered to be decoded before the next set of information is streamed over the network.

As noted above, this disclosure relates to network streaming situations, in which a client continuously requests data from a server and decodes and renders the data as it is retrieved. For example, while decoding and rendering data of video fragment 182A and audio fragment 186A, a client device may request video fragment 182B and audio fragment 186B. As shown in the example of FIG. 4, video fragments 182 and audio fragments 186 are not necessarily temporally aligned. Accordingly, a client device may use the timing interval information to determine when to request data of subsequent fragments of video component 180 and audio component 184.

In general, a client device may be configured to retrieve fragments having a starting decoding time within the next timing interval. If a component includes a fragment having a starting decoding time within the next timing interval, the client may request that fragment. Otherwise, the client may skip requests for data from that component until a subsequent timing interval.

In the example of FIG. 4, timing interval 190 is equal to one second. The decoding time values of video fragments 182 of video component 180 may be, in this example, N-1 seconds for video fragment 182A, N+1.2 seconds for video fragment 182B, N+2.1 seconds for video fragment 182C, and N+3.3 seconds for video fragment 182D. The decoding time values of audio fragments 186 of audio component 184 may be, in this example, N-0.2 seconds for audio fragment 186A, N+1.3 seconds for audio fragment 186B, and N+3.2 seconds for audio fragment 186C.

As an example, assume that the next upcoming local decoding time at destination device 40 is N+2 seconds. Accordingly, destination device 40 may determine which component fragments have a decoding time between N+2 and N+3 seconds, that is, the local decoding time plus timing interval 190. Fragments of components having starting decoding times between N+2 and N+3 seconds may correspond to the next fragments to be requested. In this example, only video fragment 182C has a decoding time between N+2 and N+3 seconds. Thus, destination device 40 may submit a request to retrieve video fragment 182C, e.g., an HTTP partial GET request specifying the byte range of video fragment 182C. Because none of audio fragments 186 has a decoding time between N+2 and N+3 seconds, destination device 40 would not yet submit a request for any of audio fragments 186.

As another example, when the upcoming local decoding time is N+3 seconds, destination device 40 may submit requests for video fragment 182D and audio fragment 186C, because video fragment 182D and audio fragment 186C have decoding times between N+3 seconds and N+4 seconds.

As yet another example, suppose instead that timing interval 190 were two seconds. If the local decoding time were N+1 seconds, destination device 40 may first determine that video fragments 182B and 182C and audio fragment 186B have decoding times between N+1 seconds and N+3 seconds. Accordingly, destination device 40 may submit requests for video fragments 182B and 182C and audio fragment 186B.
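The three examples above can be checked with a short sketch. Decoding times are written as offsets in seconds relative to N; the half-open window (start inclusive, end exclusive) and all names in the code are assumptions of this illustration rather than part of the described technique.

```python
from typing import Dict, List

# Decoding times from the example of FIG. 4, as offsets (seconds) from N.
VIDEO: Dict[str, float] = {"182A": -1.0, "182B": 1.2, "182C": 2.1, "182D": 3.3}
AUDIO: Dict[str, float] = {"186A": -0.2, "186B": 1.3, "186C": 3.2}


def fragments_to_request(local_time: float, timing_interval: float,
                         *components: Dict[str, float]) -> List[str]:
    """Return fragments whose decoding time starts within the next interval."""
    window_end = local_time + timing_interval
    selected: List[str] = []
    for fragments in components:
        for name, decode_time in fragments.items():
            # Assumed half-open window: [local_time, local_time + interval).
            if local_time <= decode_time < window_end:
                selected.append(name)
    return selected
```

With a one-second interval, a local decoding time of N+2 selects only 182C and N+3 selects 182D and 186C; with a two-second interval at N+1, the result is 182B, 182C, and 186B.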

FIG. 5 is a flowchart illustrating an example method for providing a component map box and component arrangement boxes from a server to a client. FIG. 5 also illustrates an example method for using the component map box and component arrangement boxes to select components that form a presentation and to request encoded samples of the selected components. Although generally described with respect to source device 20 and destination device 40 of FIG. 1, it should be understood that other devices may implement the techniques of FIG. 5. For example, any server and client configured to communicate via a streaming protocol may implement these techniques.

Initially, source device 20 may receive encoded video samples (200). Source device 20 may also receive encoded audio samples. The received samples correspond to common content. The received samples may correspond to various components of the content. Source device 20 may determine to which components the samples correspond. Source device 20 may also store the samples in one or more files as components of the content (202). Source device 20 may arrange the samples in the form of fragments, such that a component may include one or more fragments and fragments of the same component may be stored in separate files.

After storing the components of the content in one or more files, source device 20 may generate component arrangement boxes for each of the files (204). As described above, the component arrangement boxes may provide a mapping between component identifiers and track identifiers of a file. Source device 20 may also generate a component map box that describes all of the components of the content (206). As described above, the component map box may describe characteristics of the components of the content, such as, for example, bitrates, frame rates, resolution, codec information, profile and level information, dependencies between components, segment information, multiplexing intervals, segment information describing byte ranges of fragments within the components of the content, and/or 3D video information.

In some examples, source device 20 may store the files including the encoded video samples and the component map box, as well as the component arrangement boxes, locally within source device 20. In other examples, source device 20 may send the files and the component map box to a separate server device, to be provided to clients via a streaming network protocol. In the example of FIG. 5, it is assumed that source device 20 stores the files and implements a streaming network protocol, such as HTTP streaming.

Accordingly, destination device 40 may request the component map box and component arrangement boxes from source device 20 (208). For example, destination device 40 may send a request for the component map box and component arrangement boxes to source device 20 in accordance with HTTP streaming, e.g., a HEAD request directed to a URL associated with the content. In response to receiving the request (210), source device 20 may provide the component map box and component arrangement boxes to destination device 40 (212).

After receiving the component map box and component arrangement boxes (214), destination device 40 may determine which components to request based on data contained within the component map box. For example, destination device 40 may determine whether destination device 40 is capable of decoding and rendering particular components based on the resolution, frame rate, codec information, profile and level information, and 3D video information. Destination device 40 may also determine current network conditions, such as available bandwidth, and select components based on the current network conditions. For example, destination device 40 may select components having a relatively lower bitrate when less bandwidth is available, or a relatively higher bitrate when more bandwidth is available. As another example, destination device 40 may select a multiplexing interval based on current network conditions, and may adapt to changing network conditions by modifying the multiplexing interval based on the possible multiplexing intervals indicated by the box. If a selected component depends on another component, destination device 40 may request both the selected component and the component on which the selected component depends.
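One possible selection policy consistent with this paragraph is sketched below. It is an illustration under several assumptions: the Component record, the resolution-only capability check, and the choice to sum avgbitrate over a component and its dependencies are all hypothetical simplifications, and an actual client could weigh frame rate, codec, and profile and level as well.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    component_id: int
    avgbitrate: int  # bits per second, assumed to cover this component alone
    width: int
    height: int
    depends_on: List[int] = field(default_factory=list)


def dependency_closure(cid: int, by_id: Dict[int, Component]) -> List[int]:
    """Return cid and everything it depends on, dependencies first."""
    ordered: List[int] = []
    seen = set()

    def visit(c: int) -> None:
        if c in seen:
            return
        seen.add(c)
        for dep in by_id[c].depends_on:
            visit(dep)
        ordered.append(c)

    visit(cid)
    return ordered


def select_components(components: List[Component], available_bps: int,
                      max_width: int, max_height: int) -> List[int]:
    """Pick the highest-bitrate decodable choice that fits the bandwidth."""
    by_id = {c.component_id: c for c in components}
    best: List[int] = []
    best_rate = -1
    for c in components:
        if c.width > max_width or c.height > max_height:
            continue  # the device cannot render this resolution
        ids = dependency_closure(c.component_id, by_id)
        rate = sum(by_id[i].avgbitrate for i in ids)
        if best_rate < rate <= available_bps:
            best, best_rate = ids, rate
    return best
```

The returned list includes the components on which the chosen component depends, so that all of them can be requested together.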

After selecting the components to request, destination device 40 may determine, based on the received component arrangement boxes, which files store data for the selected components (218). For example, destination device 40 may analyze the component arrangement boxes to determine whether a file has a mapping between the component identifier of a selected component and a track of the file. If so, destination device 40 may request data from that file, e.g., in accordance with a streaming network protocol (220). The requests may include HTTP GET or partial GET requests to URLs or URNs of the files, possibly specifying byte ranges of the files, where the byte ranges may correspond to fragments of the components stored by the files.
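
As a rough sketch of the lookup and request steps above, the following Python code finds the file whose component arrangement box maps a component identifier to a track, and then builds (without sending) an HTTP partial GET for a byte range. The dictionary layout, the function names, and the example URLs are assumptions made for illustration; the only real API used is `urllib.request.Request` from the Python standard library.

```python
import urllib.request


def find_track(arrangement_boxes, component_id):
    """Locate the file and track that carry a component.

    arrangement_boxes is assumed to map a file URL to a dict of
    {component_id: track_id}, standing in for one component
    arrangement box per file.
    """
    for url, mapping in arrangement_boxes.items():
        if component_id in mapping:
            return url, mapping[component_id]
    return None


def build_partial_get(url, first_byte, last_byte):
    """Build an HTTP partial GET for an inclusive byte range of a file."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return request
```

The byte range passed to `build_partial_get` would come from the fragment byte offsets signaled for the component; a client could pass the returned object to `urllib.request.urlopen` to issue the request.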

Destination device 40 may submit multiple requests to retrieve sequential portions of the components based on the selected multiplexing interval. That is, destination device 40 may initially request fragments from each of the selected components. Then, for the next multiplexing interval, destination device 40 may determine, for each component, whether a fragment starts within that interval and, if so, request the fragment(s). In this manner, destination device 40 may request fragments from the components based on the multiplexing interval. Destination device 40 may also periodically re-evaluate network conditions and adapt to changing conditions, e.g., by changing the multiplexing interval or by requesting data from different components. In any case, in response to the requests, source device 20 may output the requested data to destination device 40 (222).
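
The interval-by-interval request pattern can be illustrated with a short sketch. It assumes that the client already knows the decoding time at which each fragment of each selected component starts (here in integer milliseconds) and groups fragments by the multiplexing interval in which they start. The function name and data layout are hypothetical.

```python
def fragments_by_interval(fragment_starts, interval_ms, total_ms):
    """Group fragments by the multiplexing interval in which they start.

    fragment_starts is assumed to map component_id to a list of fragment
    start times in milliseconds. The result holds one list per interval,
    each containing (component_id, fragment_index) pairs to request.
    """
    interval_count = -(-total_ms // interval_ms)  # ceiling division
    schedule = []
    for k in range(interval_count):
        start, end = k * interval_ms, (k + 1) * interval_ms
        batch = [
            (component_id, index)
            for component_id, starts in sorted(fragment_starts.items())
            for index, t in enumerate(starts)
            if start <= t < end
        ]
        schedule.append(batch)
    return schedule
```

A client that re-evaluates network conditions could call the function again with a different `interval_ms` for the remaining portion of the content.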

In one or more examples, the techniques described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperating hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

Claims

1. A method of transmitting encapsulated video data, the method comprising:
transmitting, to a client device, characteristics for components of a plurality of representations of video content, the characteristics comprising at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components;
receiving, from the client device, a request for at least one of the components after transmitting the characteristics; and
transmitting the requested components to the client device in response to the request.

The method of claim 1,
wherein at least two of the components are stored in separate files, and
wherein transmitting the characteristics comprises transmitting a data structure comprising characteristics for each of the at least two components.

The method of claim 1, further comprising
storing the characteristics for the components in a file separate from one or more files that store encoded samples for the components,
wherein transmitting the characteristics comprises:
receiving a first request for the file in which the characteristics are stored; and
transmitting, in response to the first request, the file independently of the one or more files storing the encoded samples,
wherein the request for at least one of the video components comprises a second, different request.

The method of claim 1, further comprising:
storing the characteristics for each of the components in a single data structure, the data structure being distinct from the components;
assigning an identifier to the data structure that associates the data structure with multimedia content comprising the plurality of representations; and
assigning unique identifiers to the representations of the multimedia content,
wherein transmitting the characteristics comprises transmitting the data structure.

The method of claim 1,
wherein transmitting the characteristics further comprises transmitting component identifier values for the components,
wherein at least one of the component identifier values is different from a track identifier value for the component corresponding to the at least one of the component identifier values.

The method of claim 5,
further comprising transmitting information indicating a correspondence between the component identifier values for the components and track identifier values for the components in one or more files that store encoded samples for the components.

The method of claim 6,
further comprising transmitting, for each of the components of the one or more files, information indicating byte offsets to fragments within the component, decoding times of first samples in the fragments, random access points in the fragments, and whether the fragments belong to new segments of the component.

The method of claim 1,
wherein transmitting the characteristics comprises transmitting information indicating that a set of the components are switchable to each other,
wherein the request specifies at least one of the set of components.

The method of claim 1,
wherein transmitting the characteristics comprises transmitting information indicating the dependencies between the components and an ordering of the dependencies between the components with respect to a decoding order of the components in an access unit.

The method of claim 1,
wherein transmitting the characteristics comprises transmitting information indicating the dependencies between the components and a temporal layer difference between a first component and a second component that depends on the first component.

The method of claim 1,
wherein transmitting the characteristics comprises transmitting information indicating a number of target views for output for one or more of the plurality of representations.

The method of claim 1,
wherein transmitting the characteristics comprises transmitting information indicating possible multiplexing intervals for a combination of two or more of the components,
wherein the request specifies fragments of any of the two or more components having decoding times within a common one of the multiplexing intervals.

The method of claim 1,
wherein the characteristics comprise a first set of characteristics,
wherein transmitting the characteristics comprises transmitting information indicating a first time duration of the components to which the first set of characteristics corresponds,
the method further comprising
transmitting a second set of characteristics for the components and a second time duration of the components to which the second set of characteristics corresponds.

An apparatus for transmitting encapsulated video data, the apparatus comprising:
a processor configured to determine characteristics for components of a plurality of representations of video content, the characteristics comprising at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components; and
one or more interfaces configured to transmit the characteristics to a client device, receive a request for at least one of the components from the client device after transmitting the characteristics, and transmit the requested components to the client device in response to the request.

The apparatus of claim 14,
wherein the characteristics further comprise component identifier values for the components,
wherein at least one of the component identifier values is different from a track identifier value for the component corresponding to the at least one of the component identifier values, and
wherein the characteristics further comprise information indicating a correspondence between the component identifier values for the components and track identifier values for the components in one or more files that store encoded samples for the components.

The apparatus of claim 15,
wherein the characteristics further comprise, for each of the components of the one or more files, information indicating byte offsets to fragments within the component, decoding times of first samples in the fragments, random access points in the fragments, and whether the fragments belong to new segments of the component.

The apparatus of claim 14,
wherein the characteristics comprise information indicating the dependencies between the components and an ordering of the dependencies between the components with respect to a decoding order of the components in an access unit.

The apparatus of claim 14,
wherein the characteristics comprise information indicating the dependencies between the components and a temporal layer difference between a first component and a second component that depends on the first component.

The apparatus of claim 14,
wherein the characteristics comprise information indicating a number of target views for output for one or more of the plurality of representations.

The apparatus of claim 14,
wherein the characteristics comprise information indicating possible multiplexing intervals for a combination of two or more of the components,
wherein the request specifies fragments of any of the two or more components having decoding times within a common one of the multiplexing intervals.

The apparatus of claim 14,
wherein the characteristics comprise a first set of characteristics,
wherein the one or more interfaces are configured to transmit information indicating a first time duration of the components to which the first set of characteristics corresponds,
wherein the processor is further configured to generate a second set of characteristics for the components and a second time duration of the components to which the second set of characteristics corresponds, and
wherein the one or more interfaces are configured to transmit the second set of characteristics.

The apparatus of claim 14,
wherein the apparatus comprises at least one of:
an integrated circuit;
a microprocessor; and
a wireless communication device that includes the processor.

An apparatus for transmitting encapsulated video data, the apparatus comprising:
means for transmitting, to a client device, characteristics for components of a plurality of representations of video content, the characteristics comprising at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components;
means for receiving, from the client device, a request for at least one of the components after transmitting the characteristics; and
means for transmitting the requested components to the client device in response to the request.

The apparatus of claim 23,
wherein the means for transmitting the characteristics comprises:
means for transmitting component identifier values for the components, wherein at least one of the component identifier values is different from a track identifier value for the component corresponding to the at least one of the component identifier values;
means for transmitting information indicating a correspondence between the component identifier values for the components and track identifier values for the components in one or more files that store encoded samples for the components; and
means for transmitting, for each of the components of the one or more files, information indicating byte offsets to fragments within the component, decoding times of first samples in the fragments, random access points in the fragments, and whether the fragments belong to new segments of the component.

The apparatus of claim 23,
wherein the means for transmitting the characteristics comprises means for transmitting information indicating the dependencies between the components and an ordering of the dependencies between the components with respect to a decoding order of the components in an access unit.

The apparatus of claim 23,
wherein the means for transmitting the characteristics comprises means for transmitting information indicating the dependencies between the components and a temporal layer difference between a first component and a second component that depends on the first component.

The apparatus of claim 23,
wherein the means for transmitting the characteristics comprises means for transmitting information indicating possible multiplexing intervals for a combination of two or more of the components,
wherein the request specifies fragments of any of the two or more components having decoding times within a common one of the multiplexing intervals.

The apparatus of claim 23,
wherein the characteristics comprise a first set of characteristics,
wherein the means for transmitting the characteristics comprises means for transmitting information indicating a first time duration of the components to which the first set of characteristics corresponds,
the apparatus further comprising
means for transmitting a second set of characteristics for the components and a second time duration of the components to which the second set of characteristics corresponds.

A computer program product comprising a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor of a source device for transmitting encapsulated video data to:
transmit, to a client device, characteristics for components of a plurality of representations of video content, the characteristics comprising at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components;
receive, from the client device, a request for at least one of the components after transmitting the characteristics; and
transmit the requested components to the client device in response to the request.

The computer program product of claim 29,
wherein the instructions that cause the processor to transmit the characteristics comprise instructions that cause the processor to:
transmit component identifier values for the components, wherein at least one of the component identifier values is different from a track identifier value for the component corresponding to the at least one of the component identifier values;
transmit information indicating a correspondence between the component identifier values for the components and track identifier values for the components in one or more files that store encoded samples for the components; and
transmit, for each of the components of the one or more files, information indicating byte offsets to fragments within the component, decoding times of first samples in the fragments, random access points in the fragments, and whether the fragments belong to new segments of the component.

The computer program product of claim 29,
wherein the instructions that cause the processor to transmit the characteristics comprise
instructions that cause the processor to transmit information indicating the dependencies between the components, an ordering of the dependencies between the components with respect to a decoding order of the components in an access unit, and a temporal layer difference between a first component and a second component that depends on the first component.

The computer program product of claim 29,
wherein the instructions that cause the processor to transmit the characteristics comprise
instructions that cause the processor to transmit information indicating a number of target views for output for one or more of the plurality of representations.

The computer program product of claim 29,
wherein the instructions that cause the processor to transmit the characteristics comprise instructions that cause the processor to transmit information indicating possible multiplexing intervals for a combination of two or more of the components,
wherein the request specifies fragments of any of the two or more components having decoding times within a common one of the multiplexing intervals.

The computer program product of claim 29,
wherein the characteristics comprise a first set of characteristics,
wherein the instructions that cause the processor to transmit the characteristics comprise instructions that cause the processor to transmit information indicating a first time duration of the components to which the first set of characteristics corresponds,
the computer program product further comprising
instructions that cause the processor to transmit a second set of characteristics for the components and a second time duration of the components to which the second set of characteristics corresponds.

A method of receiving encapsulated video data, the method comprising:
requesting, from a source device, characteristics for components of a plurality of representations of video content, the characteristics comprising at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components;
selecting one or more of the components based on the characteristics;
requesting samples of the selected components; and
decoding and presenting the samples after the samples have been received.

The method of claim 35, further comprising:
receiving information indicating a correspondence between component identifier values for the selected components and track identifier values for the components in one or more files that store encoded samples for the components; and
receiving information indicating byte offsets to fragments within each of the selected components, decoding times of first samples in the fragments, random access points in the fragments, and whether the fragments belong to new segments of the respective components,
wherein requesting the samples comprises requesting samples from the tracks of the one or more files that correspond to the track identifier values corresponding to the component identifier values for the selected components, based on the byte offsets, the decoding times, the random access points, and the indications of whether the fragments belong to new segments.

The method of claim 35, further comprising:
receiving information indicating that at least one of the selected components depends on another component; and
requesting samples of the component on which the at least one of the selected components depends.

The method of claim 35,
wherein requesting the samples of the selected components comprises:
determining a next multiplexing interval;
determining which of the selected components have fragments that start within the next multiplexing interval; and
requesting, from the determined ones of the selected components, the fragments that start within the next multiplexing interval.

The method of claim 35,
wherein the characteristics comprise a first set of characteristics,
the method further comprising:
receiving information indicating a first time duration of the components to which the first set of characteristics corresponds;
requesting a second set of characteristics for the components, the second set of characteristics corresponding to a second time duration of the components; and
requesting samples from the components corresponding to the second time duration based on the second set of characteristics.

An apparatus for receiving encapsulated video data, the apparatus comprising:
one or more interfaces configured to request, from a source device, characteristics for components of a plurality of representations of video content, the characteristics comprising at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components; and
a processor configured to select one or more of the components based on the characteristics and to cause the one or more interfaces to submit requests for samples of the selected components to the source device.

The apparatus of claim 40,
wherein the processor is configured to:
receive information indicating a correspondence between component identifier values for the selected components and track identifier values for the components in one or more files that store encoded samples for the components;
receive information indicating byte offsets to fragments within each of the selected components, decoding times of first samples in the fragments, random access points in the fragments, and whether the fragments belong to new segments of the respective components; and
construct the requests for the samples from the tracks of the one or more files that correspond to the track identifier values corresponding to the component identifier values for the selected components, based on the byte offsets, the decoding times, the random access points, and the indications of whether the fragments belong to new segments.

The apparatus of claim 40,
wherein the processor is configured to:
receive information indicating that at least one of the selected components depends on another component; and
request samples of the component on which the at least one of the selected components depends.

The apparatus of claim 40,
wherein, to generate the requests for the samples of the selected components, the processor is configured to:
determine a next multiplexing interval;
determine which of the selected components have fragments that start within the next multiplexing interval; and
request, from the determined ones of the selected components, the fragments that start within the next multiplexing interval.

The apparatus of claim 40,
wherein the characteristics comprise a first set of characteristics, and
wherein the processor is configured to:
receive information indicating a first time duration of the components to which the first set of characteristics corresponds;
request a second set of characteristics for the components, the second set of characteristics corresponding to a second time duration of the components; and
request samples from the components corresponding to the second time duration based on the second set of characteristics.

An apparatus for receiving encapsulated video data, the apparatus comprising:
means for requesting, from a source device, characteristics for components of a plurality of representations of video content, the characteristics comprising at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components;
means for selecting one or more of the components based on the characteristics;
means for requesting samples of the selected components; and
means for decoding and presenting the samples after the samples have been received.

The apparatus of claim 45, further comprising:
means for receiving information indicating a correspondence between component identifier values for the selected components and track identifier values for the components in one or more files that store encoded samples for the components; and
means for receiving information indicating byte offsets to fragments within each of the selected components, decoding times of first samples in the fragments, random access points in the fragments, and whether the fragments belong to new segments of the respective components,
wherein the means for requesting the samples comprises means for requesting samples from the tracks of the one or more files that correspond to the track identifier values corresponding to the component identifier values for the selected components, based on the byte offsets, the decoding times, the random access points, and the indications of whether the fragments belong to new segments.

The apparatus of claim 45, further comprising:
means for receiving information indicating that at least one of the selected components depends on another component; and
means for requesting samples of the component on which the at least one of the selected components depends.

The apparatus of claim 45,
wherein the means for requesting the samples of the selected components comprises:
means for determining a next multiplexing interval;
means for determining which of the selected components have fragments that start within the next multiplexing interval; and
means for requesting, from the determined ones of the selected components, the fragments that start within the next multiplexing interval.

The apparatus of claim 45,
wherein the characteristics comprise a first set of characteristics,
the apparatus further comprising:
means for receiving information indicating a first time duration of the components to which the first set of characteristics corresponds;
means for requesting a second set of characteristics for the components, the second set of characteristics corresponding to a second time duration of the components; and
means for requesting samples from the components corresponding to the second time duration based on the second set of characteristics.

50. A computer program product comprising instructions that, when executed, cause a processor of a device for receiving encapsulated video data to:
Request, from a source device, characteristics of components of a plurality of representations of video content, the characteristics comprising at least one of a frame rate, a profile indicator, a level indicator, and dependencies between the components;
Select one or more of the components based on the characteristics;
Request samples of the selected components; and
Decode and present the samples after the samples are received.
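
As a hedged illustration of the selection step, a client could compare the signaled characteristics with its own decoding capabilities. The names `ComponentInfo` and `select_components`, and the three capability parameters, are assumptions made for this sketch; the claim does not specify the selection criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentInfo:
    component_id: int
    frame_rate: float  # frames per second
    profile_idc: int   # profile indicator
    level_idc: int     # level indicator


def select_components(components, supported_profiles, max_level, max_frame_rate):
    """Keep only the components the client is assumed able to decode and render."""
    return [
        c for c in components
        if c.profile_idc in supported_profiles
        and c.level_idc <= max_level
        and c.frame_rate <= max_frame_rate
    ]
```

Because the characteristics are delivered separately from the encoded samples, this filtering can run before any media data is requested.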

51. The computer program product of claim 50, further comprising instructions that cause the processor to:
Receive information indicative of a correspondence between component identifier values for the selected components and track identifier values for the components in one or more files that store encoded samples for the components; and
Receive information indicative of byte offsets for fragments within each of the selected components, decoding times of first samples in the fragments, random access points in the fragments, and indications of whether the fragments belong to new segments of the respective components;
Wherein the instructions that cause the processor to request the samples comprise instructions that cause the processor to request, based on the byte offsets, the decoding times, the random access points, and the indications of whether the fragments belong to new segments, samples from the tracks of the one or more files corresponding to the track identifier values that correspond to the component identifier values for the selected components.

52. The computer program product of claim 50, further comprising instructions that cause the processor to:
Receive information indicating that at least one of the selected components is dependent on another component; and
Request samples of the component upon which the at least one of the selected components depends.

53. The computer program product of claim 50,
Wherein the instructions that cause the processor to request samples of the selected components comprise instructions that cause the processor to:
Determine a next multiplexing interval;
Determine, among the selected components, components having fragments starting at the next multiplexing interval; and
Request the fragments starting at the next multiplexing interval from the determined ones of the selected components.

54. The computer program product of claim 50,
Wherein the characteristics comprise a first set of characteristics,
Further comprising instructions that cause the processor to:
Receive information indicative of a first time duration of the components to which the first set of characteristics corresponds;
Request a second set of characteristics for the components, the second set of characteristics corresponding to a second time duration of the components; and
Request, based on the second set of characteristics, samples from the components corresponding to the second time duration.