KR100883603B1 - Method and apparatus for decoding video signal using reference picture

Info

Publication number: KR100883603B1
Application number: KR1020077023436A
Authority: KR
Inventors: 전병문; 박승욱; 박지호; 윤도현
Original assignee: 엘지전자 주식회사
Priority date: 2005-04-13
Filing date: 2006-04-12
Publication date: 2009-02-13
Anticipated expiration: 2026-04-12
Also published as: WO2006109986A1; EP1880553A1; KR20080000588A; EP1880553A4; KR20080000587A; EP1880552A1; EP1880552A4; KR100883602B1; WO2006109988A1

Abstract

In a method of decoding a video signal, at least a portion of a current image in a current layer is predicted based on at least a portion of a reference image, offset information, and dimension information. The offset information indicates a position offset between at least one boundary pixel of the reference image and at least one boundary pixel of the current image, and the dimension information indicates at least one dimension of the current image.

Description

METHOD AND APPARATUS FOR DECODING VIDEO SIGNAL USING REFERENCE PICTURES

The present invention relates to scalable encoding and decoding of a video signal and, more particularly, to a method and apparatus for encoding a video signal in which a base layer of the video signal is additionally used to code an enhanced layer of the video signal, and to a method and apparatus for decoding the encoded video data.

A scalable video codec (SVC) encodes video into a high-quality picture sequence while allowing a portion of the encoded picture sequence (in particular, a partial sequence of frames intermittently selected from the full sequence of frames) to be decoded and used to present the video at lower quality. Motion compensated temporal filtering (MCTF) is an encoding scheme that has been proposed for use in scalable video codecs.

Although low-quality video can be presented by receiving and processing part of a picture sequence encoded in a scalable manner as described above, image quality still drops sharply when the bitrate is low. One solution to this problem is to hierarchically provide an auxiliary picture sequence for low bitrates, for example a picture sequence with a small screen size and/or a low frame rate, so that each decoder can select and decode the sequence suited to its capabilities and characteristics. For example, not only a main picture sequence in 4CIF (Common Intermediate Format) but also an auxiliary picture sequence in CIF and an auxiliary picture sequence in QCIF (Quarter CIF) are encoded and transmitted to decoders. Each sequence is referred to as a layer; of any two given layers, the higher is called the enhanced layer and the lower is called the base layer.

These picture sequences are redundant because the same video signal source is encoded into each of them. To increase the coding efficiency of each sequence, the amount of coded information in the higher sequence needs to be reduced by performing inter-sequence picture prediction, in which video frames of the higher sequence are predicted from video frames of the lower sequence that temporally coincide with them.

However, video frames in sequences of different layers may have different aspect ratios. For example, video frames of the higher sequence (i.e., the enhanced layer) may have a wide aspect ratio of 16:9, whereas video frames of the lower sequence (i.e., the base layer) may have a narrow aspect ratio of 4:3. In this case, when an enhanced layer picture is predicted, it is necessary to determine which part of the base layer picture is to be used for the enhanced layer picture, or for which part of the enhanced layer picture the base layer picture is to be used.

Summary of the Invention

The present invention relates to decoding a video signal.

In one embodiment of a method of decoding a video signal, at least a portion of a current image in a current layer is predicted based on at least a portion of a reference image, offset information, and dimension information. The offset information may indicate a position offset between at least one boundary pixel of the reference image and at least one boundary pixel of the current image, and the dimension information may indicate at least one dimension of the current image.

In one embodiment, the reference image is based on a base image in a base layer. For example, the reference image may be at least an upsampled portion of the base image.

In one embodiment, the offset information includes left offset information indicating a position offset between at least one left-side pixel of the reference image and at least one left-side pixel of the current image.

In another embodiment, the offset information includes top offset information indicating a position offset between at least one top-side pixel of the reference image and at least one top-side pixel of the current image.

In one embodiment, the dimension information includes width information indicating the width of the current image.

In another embodiment, the dimension information includes height information indicating the height of the current image.

In one embodiment, the offset information may be obtained from a header for at least a portion of a picture (for example, a slice or a frame) in the current layer. The presence of the offset information may also be determined based on an indicator in the header.

Other embodiments include an apparatus for decoding a video signal.

FIG. 1 is a block diagram of a video signal encoding apparatus to which a scalable video signal coding method according to the present invention is applied.

FIG. 2 is a block diagram of the part of the MCTF encoder shown in FIG. 1 that performs image estimation/prediction and update operations.

FIGS. 3a and 3b illustrate the relationship between enhanced layer frames and base layer frames that can be used as reference frames for converting an enhanced layer frame into an H frame containing a predictive image.

FIG. 4 illustrates how a portion of a base layer picture is selected and enlarged for use in a prediction operation on an enhanced layer picture according to an embodiment of the present invention.

FIGS. 5a and 5b illustrate embodiments of the structure of information, transmitted to a decoder according to the present invention, regarding the positional relationship between a base layer picture and an enhanced layer picture.

FIG. 6 illustrates how an area including a base layer picture is enlarged for use in a prediction operation on an enhanced layer picture according to another embodiment of the present invention.

FIG. 7 illustrates how a base layer picture is enlarged to an area larger than an enhanced layer picture for use in a prediction operation on the enhanced layer picture according to another embodiment of the present invention.

FIG. 8 is a block diagram of an apparatus for decoding a data stream encoded by the apparatus of FIG. 1.

FIG. 9 is a block diagram of the part of the MCTF decoder shown in FIG. 8 that performs inverse prediction and update operations.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a video signal encoding apparatus to which the scalable video signal coding method according to the present invention is applied. Although the apparatus of FIG. 1 is implemented to code an input video signal in two layers, the principles of the present invention described below also apply when a video signal is coded in three or more layers. The present invention is likewise not limited to the MCTF scheme described below by way of example and may be applied to any scalable video coding scheme.

The video signal encoding apparatus shown in FIG. 1 comprises an MCTF encoder (enhanced layer encoder) 100 to which the present invention is applied, a texture coding unit 110, a motion coding unit 120, a base layer (BL) encoder 150, and a muxer (multiplexer) 130. The MCTF encoder 100 is an enhanced layer encoder that encodes an input video signal macroblock by macroblock according to the MCTF scheme and generates suitable management information. The texture coding unit 110 converts information of the encoded macroblocks into a compressed bitstream. The motion coding unit 120 converts motion vectors of image blocks obtained by the MCTF encoder 100 into a compressed bitstream according to a specified scheme. The base layer encoder 150 encodes the input video signal according to a specified scheme, for example the MPEG-1, MPEG-2, or MPEG-4 standard or the H.261, H.263, or H.264 standard, and produces a small-screen picture sequence, for example a sequence of pictures scaled down to 25% of their original size. The muxer 130 encapsulates the output data of the texture coding unit 110, the small-screen picture sequence output from the base layer encoder 150, and the output vector data of the motion coding unit 120 into a desired format. The muxer 130 then multiplexes the encapsulated data into a desired transmission format and outputs it. The base layer encoder 150 can provide a low-bitrate data stream not only by encoding the input video signal into a picture sequence with a smaller screen size than the enhanced layer pictures, but also by encoding the input video signal into a picture sequence with the same screen size as the enhanced layer pictures at a lower frame rate than the enhanced layer. In the embodiments of the present invention described below, the base layer is encoded into a small-screen picture sequence; the small-screen picture sequence is referred to as the base layer sequence, and the frame sequence output from the MCTF encoder 100 is referred to as the enhanced layer sequence.

The MCTF encoder 100 performs motion estimation and prediction operations on each target macroblock in a video frame. The MCTF encoder 100 also performs an update operation for each target macroblock by adding the image difference between the target macroblock and a corresponding macroblock in a neighboring frame to that corresponding macroblock. FIG. 2 shows some of the elements of the MCTF encoder 100 that carry out these operations.

The elements of the MCTF encoder 100 shown in FIG. 2 include an estimator/predictor 102, an updater 103, and a decoder 105. The decoder 105 decodes the encoded stream received from the base layer encoder 150 and enlarges the decoded small-screen frames to the frame size of the enhanced layer using an internal scaler 105a. For each macroblock in the current frame, which is to be coded into residual data, the estimator/predictor 102 searches for a reference block in the frames adjacent to the current frame (before or after it) and in the frames enlarged by the scaler 105a. The estimator/predictor 102 then obtains the image difference (i.e., the pixel-to-pixel difference) of each macroblock in the current frame from the reference block, or from the corresponding block in the temporally coincident frame enlarged by the scaler 105a, and codes the image difference into the macroblock. The estimator/predictor 102 also obtains a motion vector extending from the macroblock to the reference block. The updater 103 performs an update operation for each macroblock in the current frame whose reference block has been found in a frame before or after the current frame, by multiplying the image difference of the macroblock by an appropriate constant (for example, 1/2 or 1/4) and adding the result to the reference block. The operation performed by the updater 103 is referred to as the "U" operation, and a frame produced by the "U" operation is referred to as an "L" frame.
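The update ("U") operation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoder's actual implementation: blocks are modelled as nested Python lists of pixel values, and the function name `update_block` and the `weight` parameter are hypothetical.

```python
def update_block(reference_block, residual_block, weight=0.5):
    """Add the weighted image difference of a macroblock to its reference block.

    Assumption: both blocks have the same dimensions; `weight` stands for the
    "appropriate constant" (for example 1/2 or 1/4) mentioned in the text.
    """
    return [
        [ref + weight * res for ref, res in zip(ref_row, res_row)]
        for ref_row, res_row in zip(reference_block, residual_block)
    ]


reference = [[100, 102], [98, 101]]   # pixels of the reference block
residual = [[4, -2], [0, 6]]          # image difference of the macroblock
updated = update_block(reference, residual, weight=0.5)
```

With a weight of 1/2, each reference pixel moves halfway toward the value implied by the residual.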

Instead of operating on whole video frames, the estimator/predictor 102 and the updater 103 of FIG. 2 may operate simultaneously and in parallel on a plurality of slices produced by dividing a single frame. A frame (or slice) containing the image differences produced by the estimator/predictor 102 is referred to as an "H" frame (or slice). An "H" frame (or slice) contains data having the high-frequency components of the video signal. In the following description of the embodiments, the term "picture" is used to denote a slice or a frame wherever that usage is technically feasible.

The estimator/predictor 102 divides each input video frame (or each L frame obtained at the previous level) into macroblocks of a desired size. For each divided macroblock, the estimator/predictor 102 searches the previous and next neighboring frames of the enhanced layer and/or the base layer frames enlarged by the scaler 105a for a block whose image is most similar to that of the macroblock. That is, the estimator/predictor 102 searches for a macroblock that is temporally correlated with each divided macroblock. The block whose image is most similar to a target image block is the one with the smallest image difference from it. The image difference between two image blocks is defined, for example, as the sum or the average of the pixel-to-pixel differences between the two blocks. Among the blocks whose image difference from a target macroblock in the current frame is at or below a threshold, the block with the smallest image difference from the target macroblock is referred to as the reference block. A picture containing a reference block is referred to as a reference picture. For each macroblock of the current frame, the two reference blocks (or two reference pictures) may both be in frames preceding the current frame (including base layer frames), may both be in frames following it (including base layer frames), or may be located one in a preceding frame and one in a following frame.
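The reference block search described above can be sketched as an exhaustive search over one candidate frame. This is an illustrative sketch only: the text defines the image difference as a sum or average of pixel differences, and the use of absolute differences, the full-frame search range, and all function names here are assumptions.

```python
def block_difference(block_a, block_b):
    """Sum of absolute pixel-to-pixel differences between two equal-sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def extract_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def find_reference_block(target, frame, size, threshold):
    """Return ((top, left), difference) for the best block, or None.

    Only blocks whose difference is at or below `threshold` qualify; among
    those, the one with the smallest difference is the reference block.
    """
    best = None
    for top in range(len(frame) - size + 1):
        for left in range(len(frame[0]) - size + 1):
            diff = block_difference(target, extract_block(frame, top, left, size))
            if diff <= threshold and (best is None or diff < best[1]):
                best = ((top, left), diff)
    return best


candidate_frame = [
    [10, 10, 10, 10],
    [10, 10, 50, 60],
    [10, 10, 70, 80],
    [10, 10, 10, 10],
]
target_block = [[50, 61], [70, 80]]
match = find_reference_block(target_block, candidate_frame, size=2, threshold=8)
no_match = find_reference_block(target_block, candidate_frame, size=2, threshold=0)
```

When no candidate meets the threshold the function returns `None`, which corresponds to the case where no reference block is found in that frame.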

When a reference block is found, the estimator/predictor 102 calculates and outputs a motion vector from the current block to the reference block. The estimator/predictor 102 also calculates and outputs the pixel error values (i.e., pixel difference values) of the current block, either from the pixel values of a reference block located in the previous frame or the next frame, or from the average pixel values of two reference blocks located in the previous and next frames. The image or pixel difference values are also referred to as residual data.

If the motion estimation operation finds no macroblock in the two neighboring frames (including base layer frames) whose image difference from the current macroblock is at or below the desired threshold, the estimator/predictor 102 determines whether the base layer sequence contains a frame in the same time zone as the current frame (hereinafter, a temporally coincident frame) or a frame in a time zone close to the current frame (hereinafter, a temporally close frame). If such a frame is present in the base layer sequence, the estimator/predictor 102 obtains the image difference (i.e., residual data) of the current macroblock from the corresponding macroblock in the temporally coincident or close frame, based on the pixel values of the two macroblocks, and does not obtain a motion vector of the current macroblock with respect to the corresponding macroblock. The close time zone of the current frame corresponds to a time interval containing frames that can be regarded as having the same image as the current frame. Information on this time interval is carried in the encoded stream.

The operation of the estimator/predictor 102 described above is referred to as the "P" operation. When the estimator/predictor 102 performs the "P" operation to produce an H frame, by searching for a reference block for each macroblock in the current frame and coding each macroblock into residual data, it can selectively use, as reference pictures, enlarged pictures of the base layer received from the scaler 105a in addition to the neighboring L frames of the enhanced layer before and after the current frame, as shown in FIG. 3a.

In one embodiment of the present invention, five frames are used to produce each H frame. FIG. 3b shows five frames that can be used to produce an H frame. As shown, a current L frame 400L has an L frame 401 preceding it and an L frame 402 following it. The current L frame 400L also has a base layer frame 405 in the same time zone. One or two of the L frames 401 and 402 at the same MCTF level as the current L frame 400L, the base layer frame 405 in the same time zone as the L frame 400L, and the base layer frames 403 and 404 preceding and following the frame 405 are used as reference pictures to produce an H frame 400H from the current L frame 400L. As described above, there are various reference block selection modes. To inform the decoder of which mode has been selected, the MCTF encoder 100 inserts "reference block selection mode" information into a field at a specified position in the header area of the corresponding macroblock and then sends it to the texture coding unit 110.

When a base layer picture is used as a reference picture for predicting an enhanced layer picture in the reference picture selection method shown in FIG. 3b, all or part of the base layer picture can be used for the prediction. For example, as shown in FIG. 4, when the base layer picture has an aspect ratio of 4:3, the actual image portion 502 of the base layer picture has an aspect ratio of 16:9, and the enhanced layer picture 500 has an aspect ratio of 16:9, the upper and lower horizontal portions 501a and 501b of the base layer picture contain invalid data. In this case, only the image portion 502 of the base layer picture is used to predict the enhanced layer picture 500. To achieve this, the scaler 105a selects (or crops) the image portion 502 of the base layer picture (S41), up-samples the selected image portion 502 to enlarge it to the size of the enhanced layer picture 500 (S42), and provides the enlarged image portion to the estimator/predictor 102.
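Steps S41 and S42 above can be sketched as a crop followed by an upsampling step. This is a toy illustration under assumptions: the text does not say which interpolation the scaler 105a uses, so nearest-neighbor sampling is substituted here, and the picture sizes (a 32x24 base picture with three-row bars, a 64x36 enhanced picture) and function names are hypothetical.

```python
def crop(picture, top, left, height, width):
    """S41: select the actual image portion of a picture (a list of rows)."""
    return [row[left:left + width] for row in picture[top:top + height]]


def upsample_nearest(picture, out_height, out_width):
    """S42: enlarge a picture by nearest-neighbor sampling (an assumed filter)."""
    in_height, in_width = len(picture), len(picture[0])
    return [
        [picture[y * in_height // out_height][x * in_width // out_width]
         for x in range(out_width)]
        for y in range(out_height)
    ]


BASE_W, BASE_H = 32, 24   # 4:3 base layer picture
BAR = 3                   # invalid rows above and below the 16:9 image portion

# Rows inside the bars hold 0 (invalid data); image rows hold their row index.
base_picture = [
    [0] * BASE_W if y < BAR or y >= BASE_H - BAR else [y] * BASE_W
    for y in range(BASE_H)
]

image_portion = crop(base_picture, top=BAR, left=0,
                     height=BASE_H - 2 * BAR, width=BASE_W)
reference_picture = upsample_nearest(image_portion, out_height=36, out_width=64)
```

The resulting `reference_picture` has the enhanced layer size and contains no pixels from the invalid bars.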

The MCTF encoder 100 incorporates position information for the selected portion of the base layer picture into the header of the current picture coded into residual data. The MCTF encoder 100 also sets a flag "flag_base_layer_cropping", which indicates that a portion of the base layer picture has been selected and used, and inserts it at an appropriate position in the picture header so that the flag is delivered to the decoder. The position information is not transmitted when the flag "flag_base_layer_cropping" is reset.
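The flag-gated signalling described above can be sketched from the decoder's side. The text does not give the bit-level header syntax, so the header is modelled here as a plain dictionary; that layout and the function name `read_base_layer_crop` are hypothetical. Only `flag_base_layer_cropping` and the offset field names are taken from the description.

```python
OFFSET_FIELDS = ("left_offset", "right_offset", "top_offset", "bottom_offset")


def read_base_layer_crop(picture_header):
    """Return the position information as a dict, or None when the flag is reset.

    Assumption: `picture_header` is an already-parsed header represented as a
    dictionary; a real decoder would read these fields from the bitstream.
    """
    if not picture_header.get("flag_base_layer_cropping", 0):
        return None  # flag reset: no position information was transmitted
    return {name: picture_header[name] for name in OFFSET_FIELDS}


with_crop = read_base_layer_crop({
    "flag_base_layer_cropping": 1,
    "left_offset": 0, "right_offset": 0, "top_offset": 3, "bottom_offset": 3,
})
without_crop = read_base_layer_crop({"flag_base_layer_cropping": 0})
```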

FIGS. 5a and 5b show embodiments of the structure of the information regarding the selected portion 512 of the base layer picture. In the embodiment of FIG. 5a, the selected portion 512 of the base layer picture is specified by offsets (left_offset, right_offset, top_offset, and bottom_offset) from the left, right, top, and bottom boundaries of the base layer picture. The left offset indicates a position offset between a left-side pixel (or, for example, at least one pixel) in the base layer image and a left-side pixel in the selected portion 512. The top offset indicates a position offset between a top-side pixel (or, for example, at least one pixel) in the base layer image and a top-side pixel in the selected portion 512. The right offset indicates a position offset between a right-side pixel (or, for example, at least one pixel) in the base layer image and a right-side pixel in the selected portion 512. The bottom offset indicates a position offset between a bottom-side pixel (or, for example, at least one pixel) in the base layer image and a bottom-side pixel in the selected portion 512. In the embodiment of FIG. 5b, the selected portion 512 of the base layer picture is specified by offsets (left_offset and top_offset) from the left and top boundaries of the base layer picture and by the width and height (crop_width and crop_height) of the selected portion 512. Various other specification methods are also possible.
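The two structures of FIGS. 5a and 5b carry the same information once the base layer picture size is known, which the following sketch makes explicit. It assumes that all offsets are measured in pixels inward from the respective picture boundary; the function names are hypothetical.

```python
def offsets_to_rect(pic_width, pic_height,
                    left_offset, right_offset, top_offset, bottom_offset):
    """FIG. 5a form -> FIG. 5b form: (left_offset, top_offset, crop_width, crop_height)."""
    crop_width = pic_width - left_offset - right_offset
    crop_height = pic_height - top_offset - bottom_offset
    return left_offset, top_offset, crop_width, crop_height


def rect_to_offsets(pic_width, pic_height,
                    left_offset, top_offset, crop_width, crop_height):
    """FIG. 5b form -> FIG. 5a form: (left, right, top, bottom) offsets."""
    right_offset = pic_width - left_offset - crop_width
    bottom_offset = pic_height - top_offset - crop_height
    return left_offset, right_offset, top_offset, bottom_offset


# A 32x24 picture whose selected portion excludes three rows at top and bottom.
rect = offsets_to_rect(32, 24, left_offset=0, right_offset=0,
                       top_offset=3, bottom_offset=3)
round_trip = rect_to_offsets(32, 24, *rect)
```

The FIG. 5b form needs two fewer boundary offsets but requires the width and height to be sent instead, so neither form is inherently more compact.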

The offsets in the information on the selected portion shown in FIGS. 5a and 5b may have negative values. For example, as shown in FIG. 6, when the base layer picture has an aspect ratio of 4:3, the enhanced layer picture 600 has an aspect ratio of 16:9, and the actual image portion of the enhanced layer picture has an aspect ratio of 4:3, the left and right offset values (left_offset and right_offset) have negative values -d_L and -d_R. The portions 601a and 601b extended from the base layer picture are specified by the negative values -d_L and -d_R. The extended portions 601a and 601b are padded with offscreen data, and the picture 610 including the extended portions 601a and 601b is upsampled to the same size as the enhanced layer picture 600. Accordingly, the data of the area 611 in the enlarged base layer picture, which corresponds to the actual image portion of the enhanced layer picture 600, can be used to predict the actual image portion of the enhanced layer picture 600.
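The effect of negative offsets can be sketched with a single region-selection routine that crops for positive offsets and extends for negative ones. This is an illustration under assumptions: the text does not say what the offscreen data contains, so a constant `fill` value is used as a placeholder, and the 12x9 picture size and the function name are hypothetical.

```python
def select_region(picture, left_offset, right_offset, top_offset, bottom_offset,
                  fill=0):
    """Select a region given boundary offsets; negative offsets extend the picture.

    Assumption: positions outside the original picture are filled with `fill`,
    standing in for the "offscreen data" of the extended portions.
    """
    height, width = len(picture), len(picture[0])
    region = []
    for y in range(top_offset, height - bottom_offset):
        row = []
        for x in range(left_offset, width - right_offset):
            inside = 0 <= y < height and 0 <= x < width
            row.append(picture[y][x] if inside else fill)
        region.append(row)
    return region


# A 12x9 (4:3) base picture extended by two columns on each side to 16x9.
base_picture = [[1] * 12 for _ in range(9)]
extended = select_region(base_picture, left_offset=-2, right_offset=-2,
                         top_offset=0, bottom_offset=0)
```

The extended picture has a 16:9 shape, so upsampling it uniformly maps the original 4:3 content onto the central area of the enhanced layer picture.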

Because the offset fields of the information shown in FIGS. 5a and 5b can have negative values, the same advantages as in the example of FIG. 4 can be obtained by using the information of FIGS. 5a and 5b as position information of the area overlapping the enhanced layer picture, which is to be associated with the enlarged base layer picture, rather than as information specifying the selected area in the base layer picture.

Specifically, referring to FIG. 7, when a base layer picture 702 is upsampled so that the actual image area 701 of the base layer picture 702 is enlarged to the size of an enhanced layer picture 700, the enlarged (e.g., upsampled) picture corresponds to an area larger than the enhanced layer picture 700. In this example, top and bottom offsets (top_offset and bottom_offset) are included in the position information of the area overlapping the enhanced layer picture 700. These offsets correspond to the enlarged base layer picture and are assigned negative values (-d_T and -d_B) so that only the actual image area of the enlarged base layer picture is used to predict the enhanced layer picture 700. In the example of FIG. 7, the left and right offsets of the position information of the area corresponding to the enlarged base layer picture are zero. However, the left and right offsets may be non-zero and likewise correspond to the enlarged base layer picture. A portion of the image in the enlarged base layer picture may not be used in determining the enhanced layer picture. Similarly, when the offset information corresponds to the base layer picture, as opposed to the upsampled base layer picture, a portion of the image in the base layer picture may not be used in determining the enhanced layer picture.
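The arithmetic behind the FIG. 7 example can be sketched as follows. Several assumptions are made that the text does not state: the actual image area spans the full width of the base layer picture, scaling is uniform in both directions, and an offset is negative when the enlarged base layer picture extends beyond the enhanced layer picture. The function name and the example sizes are hypothetical.

```python
from fractions import Fraction


def enlarged_picture_offsets(base_width, base_height, bar_top, bar_bottom,
                             enh_width):
    """Return (left, right, top, bottom) offsets and the overlap height.

    `bar_top` and `bar_bottom` are the invalid rows above and below the actual
    image area in the base layer picture (assumed letterbox layout).
    """
    scale = Fraction(enh_width, base_width)   # uniform upsampling factor
    enlarged_height = base_height * scale
    top_offset = -bar_top * scale             # corresponds to -d_T
    bottom_offset = -bar_bottom * scale       # corresponds to -d_B
    overlap_height = enlarged_height + top_offset + bottom_offset
    return (0, 0, top_offset, bottom_offset), overlap_height


# 32x24 base picture with three-row bars, enlarged for a 64x36 enhanced picture.
offsets, overlap_height = enlarged_picture_offsets(32, 24, 3, 3, enh_width=64)
```

Here the enlarged picture is 48 rows tall, and the two negative offsets trim it to the 36 rows that overlap the enhanced layer picture.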

Also, in this embodiment, the left offset indicates a position offset between a left side pixel (or, for example, at least one pixel) in the upsampled base layer image and a left side pixel in the enhanced layer image. The top offset indicates a position offset between a top side pixel (or, for example, at least one pixel) in the upsampled base layer image and a top side pixel in the enhanced layer image. The right offset indicates a position offset between a right side pixel (or, for example, at least one pixel) in the upsampled base layer image and a right side pixel in the enhanced layer image. The bottom offset indicates a position offset between a bottom side pixel (or, for example, at least one pixel) in the upsampled base layer image and a bottom side pixel in the enhanced layer image.
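As an illustration only, the geometry implied by the four offsets can be sketched in Python as follows. The sign convention (a positive offset places the edge of the upsampled base layer picture inside the enhanced layer picture, a negative offset places it outside), the function names, and the parameter names are assumptions made for this sketch and are not taken from the specification.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def upsampled_base_rect(enh_width: int, enh_height: int,
                        left_offset: int, top_offset: int,
                        right_offset: int, bottom_offset: int) -> Rect:
    """Rectangle covered by the upsampled base layer picture, expressed in
    enhanced layer pixel coordinates (assumed sign convention: positive
    offsets move an edge inward, negative offsets move it outward)."""
    return Rect(left_offset,
                top_offset,
                enh_width - right_offset,
                enh_height - bottom_offset)


def overlap_with_enhanced(rect: Rect, enh_width: int, enh_height: int) -> Rect:
    """Part of the upsampled base layer picture that overlaps the enhanced
    layer picture; only this part can serve as a prediction reference."""
    return Rect(max(rect.left, 0), max(rect.top, 0),
                min(rect.right, enh_width), min(rect.bottom, enh_height))
```

For a hypothetical 64x48 enhanced layer picture with top and bottom offsets of -8 and zero left and right offsets, `upsampled_base_rect` yields a rectangle that extends 8 rows above and below the enhanced layer picture, and `overlap_with_enhanced` clips it back to the full enhanced layer picture, which corresponds to the situation described for FIG. 7.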

As described above, the information of FIGS. 5A and 5B can be used either as information for selecting the part of the base layer picture that is used for prediction of the enhanced layer picture, or as position information of the region that overlaps the enhanced layer picture and is associated with the base layer picture used for prediction of the enhanced layer picture.

Information on the size and aspect ratio of the base layer picture, mode information of the actual image of the base layer picture, and the like can be determined, for example, by decoding it from a sequence header of the encoded base layer stream. That is, this information may be recorded in the sequence header of the encoded base layer stream. Accordingly, the position of the selected area within the base layer picture, or of the region that overlaps the enhanced layer picture and corresponds to the base layer picture, is determined based on the position or offset information, and all or part of the base layer picture is used to suit this determination.

Returning to FIGS. 1 and 2, the MCTF encoder 100 performs the above-described "P" and "U" operations on a picture sequence of a predetermined length, for example a group of pictures (GOP), thereby generating a sequence of H frames and a sequence of L frames, respectively. An estimator/predictor and an updater of the next serially connected stage (not shown) then repeat the "P" and "U" operations on the generated L frame sequence to produce a further sequence of H frames and a sequence of L frames. The "P" and "U" operations are performed a predetermined number of times (for example, until one L frame is produced per GOP) to produce the final enhanced layer sequence.
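The repeated "P" and "U" operations can be pictured with the following simplified Python sketch. It assumes a Haar-like lifting step with no motion compensation (each odd frame is predicted only from the preceding even frame, and the update weight is 1/2) and treats a frame as a flat list of pixel values. These simplifications, together with all function names, are assumptions for illustration and do not reproduce the estimator/predictor and updater of FIGS. 1 and 2.

```python
from typing import List, Tuple

Frame = List[float]


def mctf_level(frames: List[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """One temporal decomposition level on an even number of frames."""
    l_frames, h_frames = [], []
    for even, odd in zip(frames[0::2], frames[1::2]):
        # "P" operation: the H frame holds the prediction error of the odd frame.
        h = [o - e for o, e in zip(odd, even)]
        # "U" operation: the L frame is the even frame updated with half the error.
        l = [e + d / 2 for e, d in zip(even, h)]
        h_frames.append(h)
        l_frames.append(l)
    return l_frames, h_frames


def mctf_encode_gop(gop: List[Frame]) -> Tuple[Frame, List[List[Frame]]]:
    """Repeat the level until one L frame remains (GOP length: a power of two)."""
    h_levels = []
    l_frames = gop
    while len(l_frames) > 1:
        l_frames, h_frames = mctf_level(l_frames)
        h_levels.append(h_frames)
    return l_frames[0], h_levels
```

In this sketch a GOP of four frames passes through two levels, leaving one L frame and two lists of H frames (two H frames from the first level and one from the second).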

The data stream encoded in the above-described manner is transmitted to a decoding apparatus by wire or wirelessly, or is delivered via a recording medium. The decoding apparatus reconstructs the original video signal in the enhanced layer and/or the base layer according to the method described below.

FIG. 8 is a block diagram of an apparatus for decoding a data stream encoded by the apparatus of FIG. 1. The decoding apparatus of FIG. 8 includes a demux (demultiplexer) 200, a texture decoding unit 210, a motion decoding unit 220, an MCTF decoder 230, and a base layer decoder 240. The demultiplexer 200 separates the received data stream into a compressed motion vector stream, a compressed macroblock information stream, and a base layer stream. The texture decoding unit 210 restores the compressed macroblock information stream to its original uncompressed state. The motion decoding unit 220 restores the compressed motion vector stream to its original uncompressed state. The MCTF decoder 230 is an enhanced layer decoder that converts the uncompressed macroblock information stream and the uncompressed motion vector stream back into the original video signal according to the MCTF scheme. The base layer decoder 240 decodes the base layer stream according to a specified scheme, for example the MPEG-4 or H.264 standard.

The MCTF decoder 230 includes, as an internal element, an inverse filter having the structure shown in FIG. 9 for reconstructing an input stream into the original frame sequence.

FIG. 9 shows some elements of the inverse filter, which reconstruct a sequence of H and L frames of MCTF level N into a sequence of L frames of level N-1. The elements of the inverse filter of FIG. 9 include an inverse updater 231, an inverse predictor 232, a motion vector decoder 235, an arranger 234, and a scaler 230a. The inverse updater 231 subtracts pixel difference values of input H frames from the corresponding pixel values of input L frames. The inverse predictor 232 reconstructs an input H frame into a frame having the original image, with reference to the L frames from which the image differences of the H frames have been subtracted and/or the enlarged picture output from the scaler 230a. The motion vector decoder 235 decodes the input motion vector stream into motion vector information for each block and supplies the motion vector information to the inverse predictor of each stage (for example, the inverse predictor 232). The arranger 234 interleaves the frames completed by the inverse predictor 232 between the L frames output from the inverse updater 231, thereby producing a normal video frame sequence. The scaler 230a enlarges a small-screen picture of the base layer to the size of the enhanced layer picture according to, for example, information such as that shown in FIGS. 5A and 5B.

The L frames output from the arranger 234 constitute an L frame sequence 601 of level N-1. The inverse updater and predictor of the next stage, for level N-1, reconstruct the L frame sequence 601 and an input H frame sequence 602 of level N-1 into an L frame sequence. This decoding process is performed the same number of times as the number of MCTF levels employed in the encoding process, thereby reconstructing the original video frame sequence. With reference to the "reference_selection_code" information contained in the header of each macroblock of an input H frame, the inverse predictor 232 identifies the enlarged frame of the base layer and/or the L frame of the enhanced layer that was used as the reference frame when the macroblock was coded into residual data. The inverse predictor 232 determines a reference block in the identified frame based on a motion vector provided by the motion vector decoder 235, and adds the pixel values of the reference block (or the average pixel values of two macroblocks used as reference blocks of the macroblock) to the pixel difference values of the macroblock of the H frame, thereby reconstructing the original image of the macroblock of the H frame.
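The level-by-level reconstruction can be sketched in Python as follows. The sketch assumes a Haar-like lifting scheme without motion compensation, in which the encoder formed each H frame as odd - even and each L frame as even + H/2, and it uses only enhanced layer L frames as references (no enlarged base layer frame). These assumptions and all function names are illustrative and are not taken from the specification.

```python
from typing import List

Frame = List[float]


def inverse_mctf_level(l_frames: List[Frame], h_frames: List[Frame]) -> List[Frame]:
    """Rebuild the level N-1 L frame sequence from level N L and H frames."""
    out: List[Frame] = []
    for l, h in zip(l_frames, h_frames):
        # Inverse update: remove the H frame contribution from the L frame.
        even = [lv - hv / 2 for lv, hv in zip(l, h)]
        # Inverse prediction: add the reference pixels back to the differences.
        odd = [hv + ev for hv, ev in zip(h, even)]
        # Arranger: interleave the two reconstructed frames in display order.
        out.extend([even, odd])
    return out


def inverse_mctf(l_frame: Frame, h_levels: List[List[Frame]]) -> List[Frame]:
    """Apply the inverse level once per MCTF level, deepest level first.

    h_levels[0] holds the H frames of the first encoding level."""
    frames = [l_frame]
    for h_frames in reversed(h_levels):
        frames = inverse_mctf_level(frames, h_frames)
    return frames
```

The loop in `inverse_mctf` runs once per encoding level, mirroring the statement that decoding is repeated as many times as there were MCTF levels in the encoder.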

When a base layer picture has been used as a reference frame of the current H frame, the scaler 230a, based on positional relationship information such as that shown in FIGS. 5A and 5B, which is contained in a header analyzed by the MCTF decoder 230, either selects and enlarges an area within the base layer picture (in the example of FIG. 4) or enlarges an area larger than the base layer picture (in the example of FIG. 6), so that the enlarged area of the base layer picture is used, as described above, to reconstruct macroblocks containing residual data in the current H frame into original image blocks. The positional relationship information is extracted from the header and referenced when the information indicating whether it is included (specifically, the flag "flag_base_layer_cropping" in the example of FIGS. 5A and 5B) indicates that it is included.
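A minimal Python sketch of the select-and-enlarge path (the FIG. 4 case) is given below. Only "flag_base_layer_cropping", "top_offset", and "bottom_offset" appear in the text; the field names "left_offset" and "right_offset", the representation of the header as a dictionary, the interpretation of the offsets as distances inward from the base layer picture edges, and the nearest-neighbour interpolation are all assumptions made for this sketch. Negative offsets, which address an area larger than the base layer picture, are deliberately not handled here.

```python
from typing import Dict, List

Picture = List[List[int]]  # rows of pixel values


def select_area(base: Picture, header: Dict[str, int]) -> Picture:
    """Return the part of the base layer picture named by the header, or the
    whole picture when flag_base_layer_cropping is not set."""
    if not header.get("flag_base_layer_cropping", 0):
        return base
    top = header.get("top_offset", 0)
    bottom = header.get("bottom_offset", 0)
    left = header.get("left_offset", 0)    # assumed field name
    right = header.get("right_offset", 0)  # assumed field name
    if min(top, bottom, left, right) < 0:
        raise ValueError("negative offsets are outside the scope of this sketch")
    height, width = len(base), len(base[0])
    return [row[left:width - right] for row in base[top:height - bottom]]


def upsample_nearest(area: Picture, out_width: int, out_height: int) -> Picture:
    """Enlarge the selected area to the enhanced layer size (nearest neighbour)."""
    in_height, in_width = len(area), len(area[0])
    return [[area[y * in_height // out_height][x * in_width // out_width]
             for x in range(out_width)]
            for y in range(out_height)]
```

With a hypothetical 4x4 base layer picture and top and bottom offsets of 1, `select_area` keeps the two middle rows and `upsample_nearest` stretches them back to four rows, so that only the selected area contributes to the enlarged reference picture.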

When the information of FIGS. 5A and 5B is used as information indicating the position of the region that overlaps the enhanced layer picture and is used for prediction of the enhanced layer picture, the inverse predictor 232 uses the enlarged base layer picture received from the scaler 230a for prediction of the enhanced layer picture by associating the whole of the enlarged base layer picture with all or part of the current H frame, or with an area larger than the current H frame, according to the values (positive or negative) of the offset information. In the case of FIG. 7, in which the enlarged base layer picture is associated with an area larger than the current H frame, the inverse predictor 232 uses only the area of the enlarged base layer picture that corresponds to the H frame in order to reconstruct macroblocks in the current H frame into their original images. In this example, the offset information includes negative values.

For one H frame, MCTF decoding is performed in specific units, for example in units of slices in a parallel fashion, so that the macroblocks in the frame have their original images reconstructed, and the reconstructed macroblocks are then combined to form a complete video frame.

The decoding method described above reconstructs an MCTF-encoded data stream into a complete video frame sequence. Depending on its processing and presentation capabilities, the decoding apparatus either decodes and outputs a base layer sequence, or decodes and outputs an enhanced layer sequence using the base layer.

The decoding apparatus described above may be incorporated into a mobile communication terminal, a media player, or the like.

As is apparent from the above description, the method and apparatus for encoding/decoding a video signal according to the present invention increase coding efficiency by using pictures of the base layer, which are provided for low-performance decoders, in addition to pictures of the enhanced layer when encoding a video signal in a scalable manner, so that the total amount of coded data is reduced. In addition, a part of the base layer picture that can be used for a prediction operation of the enhanced layer picture is specified, so that the prediction operation can be performed normally without performance degradation even when a picture enlarged from the base layer picture cannot be used directly for the prediction operation of the enhanced layer picture.

Although the present invention has been described with reference to exemplary embodiments, it will be apparent to those skilled in the art that various improvements, modifications, substitutions, and additions can be made without departing from the scope of the invention. Accordingly, the present invention covers such improvements, modifications, substitutions, and additions.

Claims

A method of decoding a video signal, comprising:

obtaining position information of a current block in a current image in an enhanced layer;

obtaining offset information between at least one boundary pixel of a reference image in a base layer and at least one boundary pixel of the current image;

obtaining position information of a reference block in the reference image in the base layer based on the position information of the current block and the offset information;

obtaining reference residual data based on the position information of the reference block;

obtaining residual data of the current block based on the reference residual data; and

decoding the current block based on the residual data of the current block.

The method of claim 1,

wherein the enhanced layer has an aspect ratio or spatial resolution different from that of the base layer, and the base layer corresponds to the same video signal as the enhanced layer.

(Deleted)

The method of claim 1,

wherein the reference image represents an upsampled portion of a base layer image.

The method of claim 1,

wherein the reference residual data represents an upsampled portion of a base layer image.

The method of claim 5,

wherein the upsampled portion of the base layer image is coded as residual data.

The method of claim 1,

wherein the offset information comprises:

left offset information indicating a position offset between at least one left side pixel of the reference image and at least one left side pixel of the current image;

upper offset information indicating a position offset between at least one upper side pixel of the reference image and at least one upper side pixel of the current image;

right offset information indicating a position offset between at least one right side pixel of the reference image and at least one right side pixel of the current image; and

lower offset information indicating a position offset between at least one lower side pixel of the reference image and at least one lower side pixel of the current image.

The method of claim 1,

further comprising determining whether the offset information is to be obtained from a header of at least one portion of the current image,

wherein the offset information is obtained based on the determining step.

The method of claim 1,

further comprising upsampling at least a portion of a base layer image to obtain an upsampled image as the reference image.

The method of claim 9,

wherein the portion of the base layer image represents a residual-data-coded portion of the base layer image.

The method of claim 1,

wherein the position information of the reference block is obtained based on size information indicating at least one size of the current image.

The method of claim 11,

wherein the size information includes width information indicating a width of the current image and height information indicating a height of the current image.

(Deleted)

The method of claim 1,

wherein the offset information is obtained from a slice header of the enhanced layer.

An apparatus for decoding a video signal, comprising:

a demultiplexer (demux) that obtains position information of a current block in a current image in an enhanced layer, and offset information between at least one boundary pixel of a reference image in a base layer and at least one boundary pixel of the current image; and

a decoder that obtains position information of a reference block in the reference image in the base layer based on the position information of the current block and the offset information, and obtains reference residual data based on the position information of the reference block.

(Deleted)