WO2006108863A2

WO2006108863A2 - Process for scalable coding of images

Info

Publication number: WO2006108863A2
Application number: PCT/EP2006/061563
Authority: WO
Inventors: Edouard Francois; Jérome Vieron; Gwenaelle Marquant; Nicolas Burdin; Patrick Lopez
Original assignee: Thomson Licensing SAS
Current assignee: Thomson Licensing SAS
Priority date: 2005-04-13
Filing date: 2006-04-12
Publication date: 2006-10-19
Anticipated expiration: 2007-10-13

Abstract

The process, comprising an inter layer residual prediction mode to code a current macroblock of an enhancement layer from a prediction macroblock of a base layer, a macroblock being made up of elementary blocks, the prediction macroblock being obtained through an upsampling of the base layer (3), is characterised in that the upsampling is made at an elementary block level and in that the phase of the top left sample within an elementary block depends on the location of the elementary block within the picture.

Description

Process for scalable coding of images

The invention relates to a process for coding/decoding of images, more specifically scalable video coding/decoding where several spatial video formats need to be encoded/decoded.

This process enables the encoding of two spatial resolutions, low and high layers, where the ratio between the picture widths and heights of the two successive spatial layers can differ horizontally and vertically and is not necessarily equal to 2, and where pictures of the higher resolution level can contain parts, for example picture borders, that are not present in the corresponding pictures of the low resolution level.

This process is fully generic, that is, any inter-layer size ratio and any cropping window can be handled. It addresses both the "spatial scalable coding using cropped areas" (CE9) and "non dyadic scalability" (CE10).

We consider two successive spatial layers: a low layer, considered as the base layer, and a high layer, considered as the enhancement layer. Figure 1 shows the relations between the enhancement layer 1 and the base layer 2. The width and height of enhancement layer pictures are defined respectively as w_enh and h_enh. Base layer picture dimensions are defined as w_base and h_base. Base layer pictures are a subsampled version of sub-pictures of enhancement layer pictures, of dimensions w_extract and h_extract, positioned at coordinates (x_orig, y_orig) in the enhancement layer picture coordinate system. In figure 1, the enhancement and base layer pictures are divided into macroblocks. The upsampling factors between base layer pictures and extraction pictures in the enhancement layer are defined as α_horiz = w_extract / w_base and α_vertic = h_extract / h_base for the horizontal and vertical dimensions respectively. The parameters (x_orig, y_orig, w_extract, h_extract) completely define the geometrical relations between high and low layer pictures. In a standard dyadic scalable version, these parameters are equal to (0, 0, 2·w_base, 2·h_base).

The problem to solve is the encoding/decoding of the macroblocks of the enhancement layer knowing the decoded base layer. Figure 2 illustrates the macroblock overlapping between the upsampled base layer picture 3, in dashed lines, and the enhancement layer picture, in solid lines. A macroblock of the enhancement layer may have either no corresponding base layer block (on the borders of the enhancement layer picture) or one or several corresponding base layer macroblocks. Consequently, a specific handling of the inter layer prediction is necessary.

An aim of the invention is to solve the aforementioned problems.

Its object is a process for scalable coding of images with a F1 format corresponding to an enhanced layer and with a F2 format having a lower resolution than F1 , corresponding to a base layer, comprising an inter layer residual prediction mode to code a current macroblock of the enhancement layer from a prediction macroblock of the base layer, a macroblock being made up of elementary blocks, the prediction macroblock being obtained through an upsampling of the base layer, characterised in that the upsampling is made at an elementary block level and in that the phase of the top left sample within an elementary block depends on the location of the elementary block within the picture.

According to a particular embodiment, the size of an elementary block is 4 x 4 pixels.

According to a particular embodiment, the filter used for the upsampling is a 6-tap filter. According to a particular embodiment, the upsampling is a 4/3 upsampling corresponding to 4 samples for 3 original pixels.

According to a particular embodiment, the phase, expressed as the relative position of the pixel, for 3 successive elementary blocks, is 0, 1/2 and 1/4.

Other objects and features of the invention will become understood from the following description with reference to the accompanying drawings which represent :

- figure 1 , relations between the enhancement layer and base layer,

- figure 2, macroblock overlapping between upsampled base layer picture and enhancement layer picture,

- figure 3, authorized blocks for inter layer motion vector prediction,

- figure 4, localization of a high layer pixel in the base layer,

- figure 5, 4x4 blocks b of a 8x8 block B,

- figure 6, 8x8 blocks B of a prediction macroblock,

- figure 7, oversampling of the 4x4 blocks in the case of a 4/3 ratio.

In the sequel, according to the semantics defined further, we will name:

- scaled_base_column = x_orig

- scaled_base_line = y_orig

- scaled_base_width = w_extract

- scaled_base_height = h_extract

Inter layer prediction is only possible for macroblocks fully embedded in the scaled base layer window, corresponding to the grey-colored area in figure 3, that is, macroblocks whose coordinates (MB_x, MB_y) respect the following conditions:

MB_x >= (scaled_base_column + 15) / 16 and

MB_x < (scaled_base_column + scaled_base_width) / 16 and

MB_y >= (scaled_base_line + 15) / 16 and

MB_y < (scaled_base_line + scaled_base_height) / 16
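The four conditions above can be sketched as a small predicate. This is only an illustration: the function name is hypothetical, and the divisions are assumed to be integer divisions, so that only macroblocks lying entirely inside the scaled base layer window pass the test.

```python
def inter_layer_prediction_allowed(mb_x, mb_y,
                                   scaled_base_column, scaled_base_line,
                                   scaled_base_width, scaled_base_height):
    """Return True if macroblock (mb_x, mb_y) is fully embedded in the
    scaled base layer window (assumption: integer division by 16)."""
    first_col = (scaled_base_column + 15) // 16
    end_col = (scaled_base_column + scaled_base_width) // 16
    first_row = (scaled_base_line + 15) // 16
    end_row = (scaled_base_line + scaled_base_height) // 16
    return first_col <= mb_x < end_col and first_row <= mb_y < end_row


# Example: a 64x32 window whose top left corner is at luma position (8, 0).
window = dict(scaled_base_column=8, scaled_base_line=0,
              scaled_base_width=64, scaled_base_height=32)
print(inter_layer_prediction_allowed(0, 0, **window))  # macroblock cut by the left edge
print(inter_layer_prediction_allowed(1, 0, **window))  # macroblock inside the window
```

In this example the window covers luma columns 8 to 71, so macroblock column 0 (columns 0 to 15) and macroblock column 4 (columns 64 to 79) are only partially covered and are rejected.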

Inter-layer motion prediction

A high layer macroblock can exploit inter-layer prediction using scaled base layer motion data, using either "BASE_LAYER_MODE" or "QPEL_REFINEMENT_MODE", as in case of dyadic spatial scalability. In case of using one of these two modes, the high layer macroblock is reconstructed with default motion data deduced from the base layer. These modes are only authorized for high layer macroblocks having corresponding base layer blocks, i.e. grey-colored area in figure 3 where bold line represents the upsampled base layer window 3.

- base-layer prediction macroblock construction

As in [JSVM1.0], these macroblock modes indicate that motion/prediction information, including macroblock partitioning, is directly derived from the base layer. The process consists in constructing a prediction macroblock, MB_pred, inheriting motion data from the base layer. When using "BASE_LAYER_MODE", the macroblock partitioning as well as the reference indices and motion vectors are those of the prediction macroblock MB_pred. "QPEL_REFINEMENT_MODE" is similar, but a quarter-sample motion vector refinement is achieved.

The process to derive MB_pred works in three steps:

- for each 4x4 block of MB_pred, inheritance of motion data from the base layer motion data

- partitioning choice for each 8x8 block of MB_pred

- mode choice for MB_pred

These 3 steps are described in the following sections.

- 4x4 block inheritance

For each 4x4 block b of the macroblock, the inheriting process consists in considering the center point (x_c, y_c) of the block, computing its base layer luma location (xB, yB), and inheriting motion data from the base layer block containing the pixel (xB, yB). Figure 4 represents the localisation of a high layer pixel in the base layer.

The detailed motion data inheritance process for b is the following:

Identification of base layer corresponding macroblock, 8x8 block and 4x4 block :

- Let (x0, y0) be the luma coordinates of the upper left pixel of b. The base luma location (xB, yB) of the pixel (1, 1) of b in the base layer is derived as follows:

xB = ((x0 - scaled_base_column + 1) * BasePicWidthInSamples_L) / scaled_base_width

yB = ((y0 - scaled_base_line + 1) * BasePicHeightInSamples_L) / scaled_base_height

with

BasePicWidthInSamples_L = 16 * (BasePicWidthInMbsMinus1 + 1)

BasePicHeightInSamples_L = 16 * (BasePicHeightInMbsMinus1 + 1)

- the base corresponding macroblock is defined as the base layer macroblock containing pixel (xB, yB). Its coordinates (mbxB, mbyB) are given by the following equations:

mbxB = xB / 16

mbyB = yB / 16

- the base corresponding 8x8 block is defined as the base layer 8x8 block, belonging to the base corresponding macroblock, and containing pixel (xB, yB). Its coordinates (b8xB, b8yB) in the base corresponding macroblock are defined by the following equations:

b8xB = (xB - mbxB * 16) / 8

b8yB = (yB - mbyB * 16) / 8

- the base corresponding 4x4 block is defined as the base layer 4x4 block containing pixel (xB, yB). Its coordinates (b4xB, b4yB) in the base corresponding macroblock are defined by the following equations:

b4xB = (xB - mbxB * 16) / 4

b4yB = (yB - mbyB * 16) / 4

Motion data inheritance :

- IF the base corresponding macroblock is intra, THEN 4x4 block is set as intra

- ELSE for each list listx (listx=0 or 1 ), the 4x4 block gets the reference index and motion vector from the base corresponding 4x4 block
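The identification and inheritance steps above can be sketched as follows. This is a minimal illustration, not the reference implementation: the function names and the dictionary layout are hypothetical, all divisions are assumed to be integer divisions, and the macroblock and 4x4 block indices are computed by the same pattern as the 8x8 block formula, which is an assumption of this sketch.

```python
def locate_base_block(x0, y0,
                      scaled_base_column, scaled_base_line,
                      scaled_base_width, scaled_base_height,
                      base_pic_width_in_mbs_minus1,
                      base_pic_height_in_mbs_minus1):
    """Map pixel (1, 1) of the 4x4 block whose upper left luma pixel is
    (x0, y0) to the base layer and identify the blocks that contain it."""
    base_w = 16 * (base_pic_width_in_mbs_minus1 + 1)
    base_h = 16 * (base_pic_height_in_mbs_minus1 + 1)
    xB = (x0 - scaled_base_column + 1) * base_w // scaled_base_width
    yB = (y0 - scaled_base_line + 1) * base_h // scaled_base_height
    mbxB, mbyB = xB // 16, yB // 16                          # assumed formula
    b8 = ((xB - mbxB * 16) // 8, (yB - mbyB * 16) // 8)
    b4 = ((xB - mbxB * 16) // 4, (yB - mbyB * 16) // 4)      # assumed formula
    return {"pixel": (xB, yB), "mb": (mbxB, mbyB), "b8": b8, "b4": b4}


def inherit_motion(base_mb_is_intra, base_block_motion):
    """Motion data inheritance for one 4x4 block.

    base_block_motion maps a list index (0 or 1) to a (ref_idx, mv) pair
    taken from the base corresponding 4x4 block (hypothetical layout)."""
    if base_mb_is_intra:
        return {"intra": True, "motion": {}}
    return {"intra": False, "motion": dict(base_block_motion)}


# Example: 48x48 base layer, 64x64 scaled window at the origin (4/3 ratio).
print(locate_base_block(20, 4, 0, 0, 64, 64, 2, 2))
```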

- 8x8 block inheritance

Once each 4x4 block has been treated, a merging process is applied to merge the reference indices (1 per list) and motion vectors of the 8x8 block it belongs to. In the following, 4x4 blocks of a 8x8 block are identified as indicated in figure 5: blocks b1 and b2 for the top, b3 and b4 for the bottom, from left to right. For each 8x8 block B, the following process is applied:

- IF the four 4x4 blocks have been classified as intra blocks, B is considered as an intra 8x8 block.

- ELSE, reference indices and partitioning mode are chosen as follows:

Reference indices choice (for assigning same indices to each 4x4 block)

o for each list lx

■ IF no 4x4 block uses this list, no reference index and motion vector of this list are set to B

■ ELSE

• reference index r_B(lx) for B is computed as the minimum of the existing reference indices of the four 4x4 blocks: r_B(lx) = min(r_b(lx)) for b in {b1, b2, b3, b4}

• IF (r_b1(lx) != r_B(lx))
o r_b1(lx) = r_B(lx)
o IF (r_b2(lx) == r_B(lx)) mv_b1(lx) = mv_b2(lx)
o ELSE IF (r_b3(lx) == r_B(lx)) mv_b1(lx) = mv_b3(lx)
o ELSE IF (r_b4(lx) == r_B(lx)) mv_b1(lx) = mv_b4(lx)

• IF (r_b2(lx) != r_B(lx))
o r_b2(lx) = r_B(lx)
o IF (r_b1(lx) == r_B(lx)) mv_b2(lx) = mv_b1(lx)
o ELSE IF (r_b4(lx) == r_B(lx)) mv_b2(lx) = mv_b4(lx)
o ELSE IF (r_b3(lx) == r_B(lx)) mv_b2(lx) = mv_b3(lx)

• IF (r_b3(lx) != r_B(lx))
o r_b3(lx) = r_B(lx)
o IF (r_b4(lx) == r_B(lx)) mv_b3(lx) = mv_b4(lx)
o ELSE IF (r_b1(lx) == r_B(lx)) mv_b3(lx) = mv_b1(lx)
o ELSE IF (r_b2(lx) == r_B(lx)) mv_b3(lx) = mv_b2(lx)

• IF (r_b4(lx) != r_B(lx))
o r_b4(lx) = r_B(lx)
o IF (r_b3(lx) == r_B(lx)) mv_b4(lx) = mv_b3(lx)
o ELSE IF (r_b2(lx) == r_B(lx)) mv_b4(lx) = mv_b2(lx)
o ELSE IF (r_b1(lx) == r_B(lx)) mv_b4(lx) = mv_b1(lx)

Choice of partitioning mode

o Two 4x4 blocks are considered as identical if their motion vectors are identical. The merging process is applied as follows:

■ IF b1 is identical to b2 and b3 is identical to b4 THEN

• if b1 is identical to b3 then BLK_8x8 is chosen

• else BLK_8x4 is chosen

■ ELSE IF b1 is identical to b3 and b2 is identical to b4 THEN BLK_4x8 is chosen

■ ELSE BLK_4x4 is chosen
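The 8x8 merging process above can be sketched for a single list as follows. The data layout is hypothetical: each 8x8 block is described by four reference indices and four motion vectors in the order b1, b2, b3, b4, with None marking a 4x4 block that does not use the list. Treating such a block like one whose reference index differs from the minimum is an assumption of this sketch.

```python
# Order in which the other 4x4 blocks are tried as motion vector donors
# (horizontal neighbour, vertical neighbour, diagonal), indices 0..3 = b1..b4.
DONOR_ORDER = {0: (1, 2, 3), 1: (0, 3, 2), 2: (3, 0, 1), 3: (2, 1, 0)}


def merge_reference_indices(refs, mvs):
    """Align the four 4x4 blocks of one 8x8 block on the minimum reference
    index of one list, copying a motion vector from a donor when needed."""
    used = [r for r in refs if r is not None]
    if not used:
        return [None] * 4, [None] * 4      # list not used by the 8x8 block
    r_min = min(used)
    refs, mvs = list(refs), list(mvs)
    for i in range(4):
        if refs[i] != r_min:
            refs[i] = r_min
            for j in DONOR_ORDER[i]:
                if refs[j] == r_min:
                    mvs[i] = mvs[j]
                    break
    return refs, mvs


def choose_partition(mvs):
    """Pick the 8x8 partitioning from the motion vectors of b1..b4."""
    b1, b2, b3, b4 = mvs
    if b1 == b2 and b3 == b4:
        return "BLK_8x8" if b1 == b3 else "BLK_8x4"
    if b1 == b3 and b2 == b4:
        return "BLK_4x8"
    return "BLK_4x4"


refs, mvs = merge_reference_indices([1, 0, 0, 2],
                                    [(4, 0), (1, 1), (1, 1), (7, 7)])
print(refs, mvs, choose_partition(mvs))
```

When both lists are in use, the identity test would have to compare the motion vectors of both lists; the sketch keeps a single list for brevity.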

- prediction macroblock mode choice

A final process is applied to determine the MB_pred mode. In the following, 8x8 blocks of the macroblock are identified as indicated in figure 6: blocks B1 and B2 for the top, B3 and B4 for the bottom, from left to right.

Two 8x8 blocks are considered as identical blocks if:

- One or both of the two 8x8 blocks are classified as intra blocks, or

- The partitioning mode of both blocks is BLK_8x8 and the reference indices and motion vectors of list0 and list1 of each 8x8 block, if they exist, are identical.

The mode choice is done using the following process:

- IF all 8x8 blocks are classified as intra blocks, THEN MB_pred is classified as INTRA macroblock

- ELSE, MB_pred is an INTER macroblock

Remaining intra 8x8 blocks are enforced to be inter 8x8 blocks

o 8x8 blocks classified as intra are enforced to INTER blocks with 8x8 partitioning. Their reference indices and motion vectors are computed as follows. Let B_INTRA be such a 8x8 block. For each list listx:

■ IF none of the other 8x8 blocks uses this list, no reference index and motion vector of this list is assigned to B_INTRA

■ ELSE, the following steps are applied:

• reference index r(listx) is computed as the minimum of the existing reference indices of the other 8x8 blocks: r(listx) = min(r_B(listx)) over the other 8x8 blocks B

• mean motion vector mv_mean(listx) of the 8x8 blocks having the same reference index r(listx) is computed

• r(listx) is assigned to B_INTRA and each 4x4 block of B_INTRA is enforced to have r(listx) and mv_mean(listx) as reference index and motion vector.

Mode choice

o The choice of the partitioning mode for B is achieved. Two 8x8 blocks are considered as identical if their partitioning mode is BLK_8x8 and the reference indices and motion vectors of list0 and list1 of each 8x8 block, if they exist, are identical. The merging process is applied as follows:

■ if B1 is identical to B2 and B3 is identical to B4 then

• if B1 is identical to B3 then MODE_16x16 is chosen.

• else MODE_16x8 is chosen.

■ else if B1 is identical to B3 and B2 is identical to B4 then MODE_8x16 is chosen.

■ else MODE_8x8 is chosen.

- motion vectors scaling

A motion vector rescaling is finally applied to every existing motion vector of the prediction macroblock MB_pred. A motion vector mv = (d_x, d_y) is scaled into the vector mv_s = (d_sx, d_sy) using the following equations:

d_sx = [d_x * α_horiz]

d_sy = [d_y * α_vertic]

Using the semantics described further, the actual formulas are as follows:

d_sx = (2 * d_x * scaled_base_width + sign[d_x] * BasePicWidthInSamples_L) / (2 * BasePicWidthInSamples_L)

d_sy = (2 * d_y * scaled_base_height + sign[d_y] * BasePicHeightInSamples_L) / (2 * BasePicHeightInSamples_L)
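The scaling formulas above can be sketched as follows. The function names are hypothetical, and the division is assumed to truncate toward zero, as integer division does in C; under that assumption the added sign term makes the formula round the scaled component to the nearest integer, symmetrically for positive and negative values.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def div_trunc(num, den):
    """Integer division truncating toward zero (den > 0), as assumed here."""
    q = abs(num) // den
    return q if num >= 0 else -q


def scale_motion_vector(d_x, d_y,
                        scaled_base_width, scaled_base_height,
                        base_pic_width, base_pic_height):
    """Scale a base layer motion vector (d_x, d_y) to the enhancement layer."""
    d_sx = div_trunc(2 * d_x * scaled_base_width + sign(d_x) * base_pic_width,
                     2 * base_pic_width)
    d_sy = div_trunc(2 * d_y * scaled_base_height + sign(d_y) * base_pic_height,
                     2 * base_pic_height)
    return d_sx, d_sy


# 4/3 ratio: a 480x360 base layer scaled to a 640x480 window.
print(scale_motion_vector(3, -3, 640, 480, 480, 360))
```

With a 4/3 ratio, a component of 3 scales to exactly 4, and a component of 1 (ideal value 1.33) scales to 1.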

Inter-layer texture prediction

Inter layer texture prediction is based on the same principles as inter layer motion prediction. It is only possible for macroblocks fully embedded in the scaled base layer window (grey-colored area in figure 3). For Intra texture prediction, the interpolation filter is applied across transform blocks boundaries. For residual texture prediction, this process only works inside transform blocks (4x4 or 8x8 depending on the transform).

The process at the decoder works as follows. Let MB be a high layer texture macroblock to be interpolated. Texture samples of MB are derived as follows:

- let ( xP , yP ) be the position of the upper left pixel of the macroblock in the high layer coordinates reference.

- a base layer prediction array is first derived as follows:

o the corresponding quarter-pel position (x4, y4) of (xP, yP) in the base layer is computed as:

x4 = 4 * xP * BasePicWidthInSamples_L / scaled_base_width

y4 = 4 * yP * BasePicHeightInSamples_L / scaled_base_height

o the integer-pel position (xB, yB) is then derived as:

xB = x4 / 4

yB = y4 / 4

o the quarter-pel phase (px, py) is then derived as:

px = x4 - 4 * xB

py = y4 - 4 * yB

o the base layer prediction array corresponds to the samples contained in the area between (xB-8, yB-8) and (xB+16, yB+16). The same filling process as used in the dyadic case is applied to fill sample areas corresponding to non-existing or non-available samples (for instance, in case of intra texture prediction, samples that do not belong to intra blocks).

- the base layer prediction array is then upsampled. The upsampling is applied in two steps: first, texture is upsampled using the AVC half pixel 6-tap filter; then a bilinear interpolation is achieved to build the quarter pel samples, which results in a quarter-pel interpolation array. For intra texture, this interpolation crosses block boundaries. For residual texture, interpolation does not cross transform block boundaries.

- then, prediction sample pred[x, y] at each position (x, y), x = 0..N-1, y = 0..N-1, of the high layer block is computed as:

pred[x, y] = interp[xI, yI]

with

xI = px + (4 * x * BasePicWidthInSamples_L / scaled_base_width)

yI = py + (4 * y * BasePicHeightInSamples_L / scaled_base_height)

where interp[xI, yI] is the quarter-pel interpolated base layer sample at position (xI, yI)
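The position arithmetic of this derivation can be sketched as follows. It only illustrates the index computations, not the filtering. The function names are hypothetical, divisions are assumed to be integer divisions, and the integer-pel position and phase are taken here as the quotient and remainder of the quarter-pel position by 4, which is an assumption of this sketch.

```python
def quarter_pel_position(p, base_size, scaled_size):
    """Quarter-pel base layer position of high layer coordinate p."""
    return 4 * p * base_size // scaled_size


def split_quarter_pel(v4):
    """Split a quarter-pel position into (integer-pel position, phase).

    Assumption: quotient and remainder of the division by 4."""
    integer = v4 // 4
    return integer, v4 - 4 * integer


def interp_index(phase, n, base_size, scaled_size):
    """Index into the quarter-pel interpolation array for the n-th sample
    of the high layer block along one dimension."""
    return phase + (4 * n * base_size // scaled_size)


# Example along the horizontal axis: 48-sample base width, 72-sample window.
x4 = quarter_pel_position(16, 48, 72)
xB, px = split_quarter_pel(x4)
print(x4, xB, px, [interp_index(px, x, 48, 72) for x in range(4)])
```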

- inter-layer intra texture prediction

A given macroblock MB of the current layer can exploit inter layer intra texture prediction only if co-located macroblocks of the base layer exist and are intra macroblocks.

For generating the intra prediction signal for high-pass macroblocks coded in I_BL mode, the corresponding 8x8 blocks of the base layer high-pass signal are directly de-blocked and interpolated, as in case of 'standard' dyadic spatial scalability. The same padding process is applied for deblocking.

- inter-layer residual prediction

A given macroblock MB of current layer can exploit inter layer residual prediction only if co-located macroblocks of the base layer exist and are not intra macroblocks.

At the encoder, the upsampling process consists in upsampling each elementary transform block, without crossing the block boundaries. For instance, if a MB is coded into four 8x8 blocks, four upsampling processes will be applied on exactly 8x8 pixels as input.

The interpolation process is achieved in two steps: first, the base layer texture is upsampled using the AVC half pixel 6-tap filter; then a bilinear interpolation is achieved to build the quarter pel samples. For the interpolated high layer samples, the nearest quarter pel position is chosen as the interpolated pixel.

Figure 7 shows how the 4x4 blocks are oversampled in the case of a 4/3 ratio. The number of pixels per block in the original picture is 4x4. After the 4/3 oversampling, this number becomes 6x6, 6x5, 5x6 or 5x5 depending on the location of the block within the picture. It can be observed that this number of pixels inside an oversampled 4x4 block follows a regular periodic pattern. In the 4/3 case, the pattern is : {6, 5, 5}

The phase of the top left pixel of each block, expressed as the position of this pixel relative to the original grid, is given by another periodic pattern. In the 4/3 case, the pattern is: {0, 1/2, 1/4}.
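Both periodic patterns can be reproduced with a short sketch along one dimension. The model used here is an assumption made for illustration: the scaled window starts at position 0, a high layer sample at position x maps to base position x * 3/4, and a sample belongs to the 4x4 base block whose span contains its mapped position. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def upsampled_block_layout(num_blocks, num=4, den=3, block_size=4):
    """For each base block along one dimension, return the pair
    (number of high layer samples, phase of the first sample).

    The phase is the offset of the first sample, in base layer pixels,
    from the start of its base block (assumed mapping: x * den / num)."""
    layout = []
    for k in range(num_blocks):
        # First high layer sample falling in block k and in block k + 1
        # (ceiling of the scaled block boundaries).
        start = -(-(block_size * k * num) // den)
        end = -(-(block_size * (k + 1) * num) // den)
        phase = Fraction(start * den, num) - block_size * k
        layout.append((end - start, phase))
    return layout


for size, phase in upsampled_block_layout(6):
    print(size, phase)
```

Under these assumptions the sizes repeat as 6, 5, 5 and the phases as 0, 1/2, 1/4. In two dimensions the block size is the product of the horizontal and vertical sizes, which gives the 6x6, 6x5, 5x6 and 5x5 cases.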

Claims

1 Process for scalable coding of images with a F1 format (1 ) corresponding to an enhanced layer and with a F2 format (2) having a lower resolution than F1 , corresponding to a base layer, comprising an inter layer residual prediction mode to code a current macroblock of the enhancement layer from a prediction macroblock of the base layer, a macroblock being made up of elementary blocks, the prediction macroblock being obtained through an upsampling of the base layer (3), characterised in that the upsampling is made at an elementary block level and in that the phase of the top left sample within an elementary block depends on the location of the elementary block within the picture.

2 Process according to claim 1 , characterized in that the size of an elementary block is 4 x 4 pixels.

3 Process according to claim 1 , characterized in that the filter used for the upsampling is a 6-tap filter.

4 Process according to claim 1 , characterized in that the upsampling is a 4/3 upsampling corresponding to 4 samples for 3 original pixels.

5 Process according to claim 4, characterized in that the phase, expressed as the relative position of the pixel, for 3 successive elementary blocks, is 0, 1/2 and 1/4.