
WO2013069117A1 - Prediction image generation method, encoding method, and decoding method - Google Patents

Prediction image generation method, encoding method, and decoding method Download PDF

Info

Publication number
WO2013069117A1
WO2013069117A1 (application PCT/JP2011/075851, JP2011075851W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
pixel
reference image
unit
predicted image
Prior art date
Application number
PCT/JP2011/075851
Other languages
French (fr)
Japanese (ja)
Inventor
昭行 谷沢
中條 健
Original Assignee
株式会社東芝
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社東芝
Priority to PCT/JP2011/075851 priority Critical patent/WO2013069117A1/en
Priority to TW101101752A priority patent/TW201320750A/en
Publication of WO2013069117A1 publication Critical patent/WO2013069117A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Definitions

  • Embodiments of the present invention relate to a predicted image generation method, an encoding method, and a decoding method.
  • for example, H.264 discloses an inter-prediction coding scheme that eliminates temporal redundancy and realizes high coding efficiency by performing fractional-accuracy motion compensation prediction using an encoded image as a reference image.
  • in addition, an implicit weighted motion compensation prediction method has been adopted that encodes moving images including fade and dissolve effects more efficiently than the inter-prediction encoding methods of ISO/IEC MPEG (Moving Picture Experts Group)-1, 2, and 4.
  • in this method, motion compensation prediction with fractional accuracy is performed on an input moving image having a luminance component and two color difference components. Then, a weighting factor for each of the luminance and the two color differences of the reference image is implicitly derived from the temporal distance ratio between the reference images, and the prediction image is multiplied by the weighting factor.
  • that is, weighted motion compensation prediction is realized by multiplying a pixel after motion compensation prediction by a weighting factor implicitly derived from the temporal distance ratio of the reference images. However, since an offset term for correcting the deviation of the pixel value is not taken into account, the encoding efficiency is reduced.
  • the problem to be solved by the present invention is to provide a prediction image generation method, an encoding method, and a decoding method capable of improving the encoding efficiency.
  • the predicted image generation method of the embodiment includes a first derivation step, a second derivation step, a third derivation step, and a predicted image generation step.
  • in the first derivation step, a pixel average value of each of two or more reference images and a pixel error value indicating the pixel difference from that pixel average value are derived.
  • in the second derivation step, a pixel average value of the predicted image is derived using a temporal distance ratio between at least two reference images among the two or more reference images and the predicted image, and the pixel average values of the at least two reference images.
  • the pixel error value of the predicted image is derived using the temporal distance ratio and the pixel error value of the at least two reference images.
  • in the third derivation step, the weighting coefficient of the reference image is derived using the pixel error value of the reference image and the pixel error value of the predicted image, and the offset of the reference image is derived using the derived weighting coefficient, the pixel average value of the reference image, and the pixel average value of the predicted image.
  • in the predicted image generation step, the predicted image of one target block, obtained by dividing the input image into a plurality of blocks, is generated using the reference image, the weighting coefficient of the reference image, and the offset of the reference image.
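  • as a rough illustration of these steps, the following C sketch derives a weighting coefficient and offset from the statistics of two reference images; the structure, helper names, and the linear-interpolation and ratio formulas are illustrative assumptions, not the claimed implementation.

```c
/* Hypothetical per-image statistics (names are illustrative only). */
typedef struct {
    int poc;   /* display order of the image                  */
    int dc;    /* pixel average value                         */
    int ac;    /* pixel error value (mean error from the dc)  */
} ImageStats;

/* Derive the predicted-image statistics from two reference images using the
 * temporal distance ratio, then derive the weighting coefficient and offset
 * of one reference image (assumed formulas, fixed-point with log2_denom).   */
static void derive_wp_params(const ImageStats *ref1, const ImageStats *ref2,
                             int cur_poc, const ImageStats *ref,
                             int log2_denom, int *weight, int *offset)
{
    int tb = cur_poc - ref1->poc;              /* distance to current image   */
    int td = ref2->poc - ref1->poc;            /* distance between references */
    if (td == 0) td = 1;                       /* degenerate case, assumed    */

    /* predicted-image average and error, linearly interpolated in time */
    int dc_p = ref1->dc + ((ref2->dc - ref1->dc) * tb) / td;
    int ac_p = ref1->ac + ((ref2->ac - ref1->ac) * tb) / td;

    /* weighting coefficient = ratio of error values, offset = DC mismatch */
    *weight = (ac_p << log2_denom) / (ref->ac ? ref->ac : 1);
    *offset = dc_p - ((*weight * ref->dc) >> log2_denom);
}
```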
  • explanatory drawing of an example of bidirectional prediction in the first embodiment; block diagram showing an example of the multi-frame motion compensation unit of the first embodiment; explanatory drawing of an example of the fixed-point precision of the weighting coefficient in the first embodiment.
  • diagram showing an example of the reference image group feature amount of the first embodiment.
  • diagram showing an example of the WP parameter information of the first embodiment; flowchart illustrating an example of the reference image group feature amount derivation process according to the first embodiment.
  • flowchart illustrating an example of the predicted image feature amount derivation process according to the first embodiment.
  • block diagram showing a configuration example of the decoding device of the second embodiment.
  • the encoding device and decoding device of each of the following embodiments can be realized by hardware such as an LSI (Large-Scale Integration) chip, a DSP (Digital Signal Processor), or an FPGA (Field Programmable Gate Array).
  • the encoding device and the decoding device of each of the following embodiments can be realized by software by causing a computer to execute a program.
  • the term “image” can be appropriately replaced with a term such as “video”, “pixel”, “image signal”, “picture”, or “image data”.
  • FIG. 1 is a block diagram illustrating an example of the configuration of the encoding device 100 according to the first embodiment.
  • the encoding apparatus 100 divides each frame or each field constituting the input image into a plurality of pixel blocks, and performs prediction on the divided pixel blocks using the encoding parameters input from the encoding control unit 113 to generate a predicted image.
  • the encoding apparatus 100 then subtracts the predicted image from the input image divided into the plurality of pixel blocks to generate a prediction error, orthogonally transforms and quantizes the generated prediction error, and further performs entropy encoding to generate and output encoded data.
  • the encoding apparatus 100 performs predictive encoding by selectively applying a plurality of prediction modes in which at least one of the block size of the pixel block and the prediction image generation method is different.
  • the generation method of the prediction image is roughly classified into two types: intra prediction that performs prediction within the encoding target frame and inter prediction that performs motion compensation prediction using one or more reference frames that are temporally different.
  • intra prediction is also referred to as intra-picture prediction, intra-frame prediction, or the like
  • inter prediction is also referred to as inter-picture prediction, inter-frame prediction, motion compensation prediction, or the like.
  • FIG. 2 is an explanatory diagram showing an example of a predictive coding order of pixel blocks in the first embodiment.
  • as shown in FIG. 2, the encoding device 100 performs predictive encoding from the upper left to the lower right of the pixel blocks, so that in the encoding target frame f the already encoded pixel blocks p are located to the left of and above the encoding target pixel block c.
  • the encoding apparatus 100 performs predictive encoding in the order shown in FIG. 2, but the order of predictive encoding is not limited to this.
  • the pixel block indicates a unit for processing an image, and corresponds to, for example, an M ⁇ N size block (M and N are natural numbers), a coding tree block, a macro block, a sub block, or one pixel.
  • the pixel block is basically used in the meaning of the coding tree block, but may be used in other meanings.
  • a pixel block is used to mean a pixel block of the prediction unit.
  • a block may be called by a name such as a unit.
  • a coding block is called a coding unit.
  • FIG. 3A is a diagram showing an example of the block size of the coding tree block in the first embodiment.
  • the coding tree block is typically a 64 ⁇ 64 pixel block as shown in FIG. 3A.
  • the present invention is not limited to this, and it may be a 32 ⁇ 32 pixel block, a 16 ⁇ 16 pixel block, an 8 ⁇ 8 pixel block, a 4 ⁇ 4 pixel block, or the like.
  • the coding tree block need not be a square, and may be, for example, an M × N (M ≠ N) pixel block.
  • FIGS. 3B to 3D are diagrams illustrating specific examples of the coding tree block according to the first embodiment.
  • N represents the size of the reference coding tree block.
  • the size when divided is defined as N, and the size when not divided is defined as 2N.
  • FIG. 3C shows a coding tree block obtained by dividing the coding tree block of FIG. 3B into quadtrees.
  • the coding tree block has a quadtree structure as shown in FIG. 3C.
  • the four pixel blocks after the division are numbered in the Z-scan order as shown in FIG. 3C.
  • each of the numbered pixel blocks after division can be further divided into quadtrees.
  • in this way, a coding tree block can be divided recursively, and the depth of division is defined by Depth.
  • the Depth of the coding tree block shown in FIG. 3B is 0, and the Depth of the coding tree block shown in FIG. 3C is 1.
  • the coding tree block having the largest unit is called a large coding tree block, and the input image signal is encoded in this unit in the raster scan order.
  • the encoding target block or coding tree block of the input image may be referred to as a prediction target block or a prediction pixel block.
  • the encoding unit is not limited to a pixel block, and at least one of a frame, a field, a slice, a line, and a pixel can be used.
  • the encoding device 100 includes a subtraction unit 101, an orthogonal transform unit 102, a quantization unit 103, an inverse quantization unit 104, an inverse orthogonal transform unit 105, an addition unit 106, a predicted image generation unit 107, a reference image feature amount deriving unit 108, a predicted image feature amount deriving unit 109, a parameter deriving unit 110, a motion evaluation unit 111, and an encoding unit 112.
  • the encoding control unit 113 shown in FIG. 1 controls the encoding apparatus 100 and can be realized by, for example, a CPU (Central Processing Unit).
  • the subtraction unit 101 subtracts the corresponding prediction image from the input image divided into pixel blocks to obtain a prediction error.
  • the subtraction unit 101 outputs a prediction error and inputs it to the orthogonal transform unit 102.
  • the orthogonal transform unit 102 performs orthogonal transform such as discrete cosine transform (DCT) or discrete sine transform (DST) on the prediction error input from the subtraction unit 101 to obtain transform coefficients.
  • the orthogonal transform unit 102 outputs transform coefficients and inputs them to the quantization unit 103.
  • the quantization unit 103 performs a quantization process on the transform coefficient input from the orthogonal transform unit 102 to obtain a quantized transform coefficient. Specifically, the quantization unit 103 performs quantization according to quantization information such as a quantization parameter or a quantization matrix specified by the encoding control unit 113. More specifically, the quantization unit 103 divides the transform coefficient by the quantization step size derived from the quantization information to obtain a quantized transform coefficient.
  • the quantization parameter indicates the fineness of quantization.
  • the quantization matrix is used for weighting the fineness of quantization for each component of the transform coefficient.
  • the quantization unit 103 outputs the quantized transform coefficient and inputs it to the inverse quantization unit 104 and the encoding unit 112.
  • the inverse quantization unit 104 performs an inverse quantization process on the quantized transform coefficient input from the quantization unit 103 to obtain a restored transform coefficient. Specifically, the inverse quantization unit 104 performs inverse quantization according to the quantization information used in the quantization unit 103. More specifically, the inverse quantization unit 104 multiplies the quantization transform coefficient by the quantization step size derived from the quantization information to obtain a restored transform coefficient.
  • the quantization information used in the quantization unit 103 is loaded from an internal memory (not shown) of the encoding control unit 113 and used.
  • the inverse quantization unit 104 outputs the restored transform coefficient and inputs it to the inverse orthogonal transform unit 105.
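  • as a minimal sketch of the quantization and inverse quantization just described, the following C functions divide a transform coefficient by the quantization step size and multiply it back; the derivation of the step size from the quantization parameter and quantization matrix, and the rounding details, are omitted and the names are illustrative.

```c
/* Illustrative scalar quantization: the quantization unit divides each
 * transform coefficient by the quantization step size, and the inverse
 * quantization unit multiplies the quantized coefficient back by it.   */
static inline int quantize(int coeff, int q_step)       /* q_step > 0 assumed */
{
    return coeff / q_step;     /* quantized transform coefficient */
}

static inline int dequantize(int level, int q_step)
{
    return level * q_step;     /* restored transform coefficient  */
}
```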
  • the inverse orthogonal transform unit 105 performs an inverse orthogonal transform such as an inverse discrete cosine transform (IDCT) or an inverse discrete sine transform (IDST) on the restored transform coefficient input from the inverse quantization unit 104 to obtain a restored prediction error. Note that the inverse orthogonal transform performed by the inverse orthogonal transform unit 105 corresponds to the orthogonal transform performed by the orthogonal transform unit 102. The inverse orthogonal transform unit 105 outputs the restored prediction error and inputs it to the addition unit 106.
  • the addition unit 106 adds the restored prediction error input from the inverse orthogonal transform unit 105 and the corresponding predicted image to generate a locally decoded image.
  • the adding unit 106 outputs the local decoded image and inputs it to the predicted image generation unit 107.
  • the predicted image generation unit 107 stores the locally decoded image input from the addition unit 106 as a reference image in a memory (not shown in FIG. 1), outputs the reference image stored in the memory, and inputs it to the reference image feature amount deriving unit 108 and the motion evaluation unit 111.
  • the predicted image generation unit 107 performs weighted motion compensation prediction based on the WP parameter information input from the parameter derivation unit 110 and the motion information input from the motion evaluation unit 111, and generates a predicted image.
  • the predicted image generation unit 107 outputs a predicted image and inputs it to the subtracting unit 101 and the adding unit 106.
  • FIG. 4 is a block diagram illustrating an example of the configuration of the predicted image generation unit 107 of the first embodiment.
  • the predicted image generation unit 107 includes a multi-frame motion compensation unit 201, a memory 202, a unidirectional motion compensation unit 203, a prediction parameter control unit 204, a reference image selector 205, a frame memory 206, and a reference image control unit 207.
  • the frame memory 206 stores the locally decoded image input from the addition unit 106 as a reference image under the control of the reference image control unit 207.
  • the frame memory 206 has a plurality of memory sets FM0 to FMN (N ⁇ 1) for temporarily storing reference images.
  • the prediction parameter control unit 204 prepares a plurality of combinations of reference image numbers and prediction parameters as a table based on the motion information input from the motion evaluation unit 111.
  • the motion information indicates a motion vector indicating a shift amount of motion used in motion compensation prediction, a reference image number, information on a prediction mode such as unidirectional / bidirectional prediction, and the like.
  • the prediction parameter refers to information regarding a motion vector and a prediction mode.
  • the prediction parameter control unit 204 selects and outputs a combination of the reference image number of the reference image used for generating the prediction image and the prediction parameter based on the input image, and inputs the reference image number to the reference image selector 205.
  • the prediction parameter is input to the unidirectional motion compensation unit 203.
  • the reference image selector 205 is a switch for selecting which output end of the frame memories FM0 to FMN included in the frame memory 206 is connected, according to the reference image number input from the prediction parameter control unit 204. For example, if the reference image number is 0, the reference image selector 205 connects the output end of FM0 to the output end of the reference image selector 205, and if the reference image number is N, it connects the output end of FMN to the output end of the reference image selector 205.
  • the reference image selector 205 outputs the reference image stored in the frame memory to which its output end is connected among the frame memories FM0 to FMN included in the frame memory 206, and inputs it to the unidirectional motion compensation unit 203, the reference image feature amount deriving unit 108, and the motion evaluation unit 111.
  • the unidirectional motion compensation unit 203 performs a motion compensation prediction process according to the prediction parameter input from the prediction parameter control unit 204 and the reference image input from the reference image selector 205, and generates a unidirectional prediction image.
  • FIG. 5 is a diagram illustrating an example of a motion vector relationship of motion compensation prediction in the bidirectional prediction according to the first embodiment.
  • in the unidirectional motion compensation unit 203, motion compensation prediction interpolation processing is performed using a reference image, and a unidirectional prediction image is generated based on the amount of motion deviation of the generated interpolation image from the pixel block at the encoding target position in the input image.
  • the amount of deviation is a motion vector.
  • a prediction image is generated using a set of two types of reference images and motion vectors.
  • an interpolation process with 1/2 pixel accuracy, an interpolation process with 1/4 pixel accuracy, or the like is used, and the value of the interpolation image is generated by performing the filtering process on the reference image.
  • in H.264, for example, interpolation processing up to 1/4-pixel accuracy can be performed on the luminance signal, and the shift amount is therefore expressed as four times the integer pixel accuracy.
  • the unidirectional motion compensation unit 203 outputs a unidirectional prediction image and temporarily stores it in the memory 202.
  • the multi-frame motion compensation unit 201 performs weighted prediction using two types of unidirectional prediction images.
  • in bidirectional prediction, the unidirectional prediction image generated first is stored in the memory 202, and the unidirectional prediction image generated second is directly output to the multi-frame motion compensation unit 201.
  • hereinafter, the unidirectional prediction image generated first is referred to as the first predicted image, and the unidirectional prediction image generated second is referred to as the second predicted image.
  • two unidirectional motion compensation units 203 may be prepared and each may generate two unidirectional prediction images.
  • in this case, the unidirectional motion compensation unit 203 may directly output the first unidirectional prediction image as the first predicted image to the multi-frame motion compensation unit 201.
  • the multi-frame motion compensation unit 201 generates a predicted image by performing weighted prediction using the first predicted image input from the memory 202, the second predicted image input from the unidirectional motion compensation unit 203, and the WP parameter information input from the parameter deriving unit 110.
  • the multi-frame motion compensation unit 201 outputs a prediction image and inputs the prediction image to the subtraction unit 101 and the addition unit 106.
  • FIG. 6 is a block diagram illustrating an example of the configuration of the multi-frame motion compensation unit 201 according to the first embodiment.
  • the multi-frame motion compensation unit 201 includes a default motion compensation unit 301, a weighted motion compensation unit 302, a WP parameter control unit 303, and WP selectors 304 and 305.
  • the WP parameter control unit 303 outputs a WP application flag and weight information based on the WP parameter information input from the parameter derivation unit 110, inputs the WP application flag to the WP selectors 304 and 305, and weights the weight information. Input to the motion compensation unit 302.
  • the WP parameter information includes the fixed-point precision of the weighting coefficient, a first WP application flag, a first weighting coefficient, and a first offset corresponding to the first predicted image, and a second WP application flag, a second weighting coefficient, and a second offset corresponding to the second predicted image.
  • the WP application flag is a parameter that can be set for each corresponding reference image and signal component, and indicates whether to perform weighted motion compensation prediction.
  • the weight information includes information on the fixed point precision of the weight coefficient, the first weight coefficient, the first offset, the second weight coefficient, and the second offset.
  • the WP parameter control unit 303 outputs the WP parameter information by separating it into a first WP application flag, a second WP application flag, and weight information.
  • the first WP application flag is input to the WP selector 304
  • the second WP application flag is input to the WP selector 305
  • the weight information is input to the weighted motion compensation unit 302.
  • the WP selectors 304 and 305 switch the connection end of each predicted image based on the WP application flag input from the WP parameter control unit 303.
  • when each WP application flag is 0, the WP selectors 304 and 305 connect their output ends to the default motion compensation unit 301, and output the first predicted image and the second predicted image to the default motion compensation unit 301.
  • when each WP application flag is 1, the WP selectors 304 and 305 connect their output ends to the weighted motion compensation unit 302, and output the first predicted image and the second predicted image to the weighted motion compensation unit 302.
  • the default motion compensation unit 301 performs an average value process based on the two unidirectional prediction images (first prediction image and second prediction image) input from the WP selectors 304 and 305 to generate a prediction image. Specifically, when the first WP application flag and the second WP application flag are 0, the default motion compensation unit 301 performs average value processing based on Expression (1).
  • P [x, y] is a predicted image
  • PL0 [x, y] is a first predicted image
  • PL1 [x, y] is a second predicted image
  • offset2 and shift2 are parameters of the rounding process in the average value processing, and are determined by the internal calculation accuracy of the first predicted image and the second predicted image. If the bit accuracy of the predicted image is L and the bit accuracy of the first predicted image and the second predicted image is M (L ≤ M), shift2 is formulated by Equation (2) and offset2 is formulated by Equation (3).
  • offset2 = (1 << (shift2 - 1)) ... (3)
  • in the case of unidirectional prediction, the default motion compensation unit 301 calculates the final predicted image based on Expression (4) using only the first predicted image.
  • PLX [x, y] indicates a unidirectional prediction image (first prediction image), and X is an identifier indicating the list number of the reference list, and is either 0 or 1. For example, when the list number is 0, PL0 [x, y] is obtained, and when the list number is 1, PL1 [x, y] is obtained.
  • offset1 and shift1 are parameters of the rounding process, and are determined by the internal calculation accuracy of the first predicted image. Assuming that the bit accuracy of the predicted image is L and the bit accuracy of the first predicted image is M (L ⁇ M), shift1 is formulated by Equation (5), and offset1 is formulated by Equation (6).
  • offset1 = (1 << (shift1 - 1)) ... (6)
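  • a minimal C sketch of the default average-value processing of Expressions (1) to (6) is shown below; the concrete forms shift2 = M - L + 1 and shift1 = M - L are assumptions consistent with the description (they are the usual choice when the internal accuracy is M bits and the output accuracy is L bits), not quotations of Equations (2) and (5).

```c
/* Clip a value to the valid pixel range of the given bit depth. */
static int clip_pixel(int v, int bitdepth)
{
    int max = (1 << bitdepth) - 1;
    return v < 0 ? 0 : (v > max ? max : v);
}

/* Default bidirectional average, Expression (1), with assumed shift2/offset2. */
static int default_bipred(int pl0, int pl1, int L, int M)
{
    int shift2  = M - L + 1;                         /* assumed form of Eq. (2) */
    int offset2 = 1 << (shift2 - 1);                 /* Eq. (3)                 */
    return clip_pixel((pl0 + pl1 + offset2) >> shift2, L);
}

/* Default unidirectional prediction, Expression (4), with assumed shift1/offset1. */
static int default_unipred(int plx, int L, int M)
{
    int shift1  = M - L;                             /* assumed form of Eq. (5) */
    int offset1 = shift1 ? 1 << (shift1 - 1) : 0;    /* Eq. (6)                 */
    return clip_pixel((plx + offset1) >> shift1, L);
}
```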
  • the weighted motion compensation unit 302 is based on the two unidirectional prediction images (first prediction image and second prediction image) input from the WP selectors 304 and 305 and the weight information input from the WP parameter control unit 303. Performs weighted motion compensation. Specifically, when the first WP application flag and the second WP application flag are 1, the weighted motion compensation unit 302 performs the weighting process based on Expression (7).
  • w0C is a weighting coefficient corresponding to the first predicted image
  • w1C is a weighting coefficient corresponding to the second predicted image
  • o0C is an offset corresponding to the first predicted image
  • o1C is an offset corresponding to the second predicted image.
  • logWDC is a parameter indicating the fixed-point precision of each weighting coefficient.
  • when the calculation accuracy of the first predicted image, the second predicted image, and the predicted image differ, the weighted motion compensation unit 302 realizes the rounding process by controlling logWDC, which is the fixed-point precision, as in Expression (8).
  • the rounding process can be realized by replacing logWDC in Expression (7) with logWD'C in Expression (8).
  • for example, when the bit accuracy of the predicted image is 8 and the bit accuracy of the first predicted image and the second predicted image is 14, resetting logWDC makes it possible to realize a batch rounding process with the same calculation accuracy as shift2 in Expression (1).
  • in the case of unidirectional prediction, the weighted motion compensation unit 302 calculates the final predicted image based on Equation (9) using only the first predicted image.
  • PLX[x, y] indicates a unidirectional prediction image (the first predicted image)
  • wXC indicates a weighting coefficient corresponding to unidirectional prediction
  • X is an identifier indicating the list number of the reference list, and is either 0 or 1. For example, when the list number is 0, PL0[x, y] and w0C are used, and when the list number is 1, PL1[x, y] and w1C are used.
  • as in the case of bidirectional prediction, when the calculation accuracy of the predicted image differs from that of the first predicted image, the weighted motion compensation unit 302 realizes the rounding process by controlling logWDC, which is the fixed-point precision, as in Expression (8).
  • the rounding process can be realized by replacing logWDC with logWD'C in Expression (8), as described above. For example, when the bit accuracy of the predicted image is 8 and the bit accuracy of the first predicted image is 14, resetting logWDC realizes a batch rounding process with the same calculation accuracy as shift1 in Expression (4).
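  • the following C sketch illustrates the weighted motion compensation of Expressions (7) and (9); since the expressions themselves are not reproduced in this text, the H.264-style rounding used below is an assumption consistent with the description, and clip_pixel is the same clamp as in the previous sketch.

```c
static int clip_pixel(int v, int bitdepth)          /* clamp to [0, 2^bitdepth - 1] */
{
    int max = (1 << bitdepth) - 1;
    return v < 0 ? 0 : (v > max ? max : v);
}

/* Weighted bidirectional prediction in the spirit of Expression (7). */
static int weighted_bipred(int pl0, int pl1, int w0, int w1,
                           int o0, int o1, int log_wd, int out_bd)
{
    int p = ((pl0 * w0 + pl1 * w1 + (1 << log_wd)) >> (log_wd + 1))
            + ((o0 + o1 + 1) >> 1);
    return clip_pixel(p, out_bd);
}

/* Weighted unidirectional prediction in the spirit of Expression (9). */
static int weighted_unipred(int plx, int wx, int ox, int log_wd, int out_bd)
{
    int p = log_wd > 0
          ? ((plx * wx + (1 << (log_wd - 1))) >> log_wd) + ox
          : plx * wx + ox;
    return clip_pixel(p, out_bd);
}
```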
  • FIG. 7 is an explanatory diagram of an example of the fixed-point precision of the weighting factor in the first embodiment, and is a diagram illustrating an example of a change between a moving image having a pixel value change in the time direction and an average pixel value.
  • the encoding target frame is Frame (t)
  • the temporally previous frame is Frame (t ⁇ 1)
  • the temporally subsequent frame is Frame (t + 1).
  • the weighting coefficient means the degree of change in FIG. 7 and takes a value of 1.0 when there is no change in the pixel value, as is clear from Equation (7) and Equation (9).
  • the fixed-point precision is a parameter that controls the step size corresponding to the fractional part of the weighting coefficient; when there is no pixel value change, the weighting coefficient is 1 << logWDC.
  • FIG. 7 shows an example in which the average pixel value of the image decreases with time, and the weighting factor corresponds to the inclination of the decrease.
  • the fixed-point precision of the weighting coefficient is information indicating the precision of this slope. For example, in FIG. 7, when the weighting coefficient between Frame(t-1) and Frame(t+1) is 0.75 in decimal notation, it can be expressed as 3/4 with 1/4 precision.
  • adjustment can be made with an offset value indicating a correction value (deviation amount) corresponding to the intercept of the linear function.
  • for example, when the weighting coefficient between Frame(t-1) and Frame(t+1) is 0.60 in decimal notation and the fixed-point precision is 1 (1 << 1), the weighting coefficient cannot be represented exactly at that fixed-point precision, so a value such as 1 (corresponding to a weighting coefficient of 0.50) is set.
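  • the following small C program works through this representation: a real-valued weighting coefficient w is stored as the nearest integer to w × (1 << logWD), so 0.75 maps exactly to 3 when logWD = 2, while 0.60 rounds to 1 (that is, 0.50) when logWD = 1; the helper name is hypothetical.

```c
#include <stdio.h>

/* Nearest fixed-point representation of a real weighting coefficient. */
static int to_fixed_weight(double w, int log_wd)
{
    return (int)(w * (1 << log_wd) + 0.5);
}

int main(void)
{
    printf("%d\n", to_fixed_weight(0.75, 2));  /* 3  -> 3/4 with 1/4 precision */
    printf("%d\n", to_fixed_weight(0.60, 1));  /* 1  -> 1/2, 0.60 not exact    */
    return 0;
}
```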
  • in the case of unidirectional prediction, the parameters corresponding to the second predicted image (the second WP application flag, the second weighting coefficient, and the second offset) are not used, and may therefore be set to predetermined initial values.
  • the reference image feature amount deriving unit 108, the predicted image feature amount deriving unit 109, and the parameter deriving unit 110 implicitly derive the WP parameter information corresponding to the predicted image from the reference images input from the predicted image generation unit 107.
  • the reference image feature amount deriving unit 108 derives the reference image feature amount of each reference image input from the predicted image generation unit 107, outputs a reference image group feature amount that summarizes the derived reference image feature amounts, and inputs it to the predicted image feature amount deriving unit 109 and the parameter deriving unit 110.
  • FIG. 8 is a diagram illustrating an example of a reference image group feature amount according to the first embodiment.
  • the reference image group feature amount is a table of 2N + 2 reference image feature amounts, and each reference image feature amount includes the list number of the reference list of the reference image, the reference image number of the reference image, the pixel average value of the reference image, and the pixel error value indicating the pixel difference from the pixel average value of the reference image.
  • the list number is an identifier indicating the prediction direction, and takes a value of 0 for unidirectional prediction, and two values of 0 and 1 can be used for bidirectional prediction.
  • the reference image number is a value corresponding to 0 to N indicated in the frame memory 206.
  • list numbers and reference image numbers are managed in accordance with the DPB (decoded picture buffer) used in H.264 and the like; the encoding control unit 113 sets them in the predicted image generation unit 107 (for example, which reference image the reference image selector 205 outputs to the reference image feature amount deriving unit 108), and the reference image feature amount deriving unit 108 implicitly sets these values in the reference image feature amount according to the reference image that is output to it.
  • the average pixel value and the pixel error value are calculated by the reference image feature amount deriving unit 108.
  • the table of reference image group feature values shown in FIG. 8 is an example, and the configuration of reference images that can be used differs depending on the coding structure, so the table size also differs.
  • for example, in the case of P-slice, the reference image numbers of list number 1, which correspond to bidirectional prediction, cannot be used, so the table contains only the entries of list number 0.
  • the table size also varies depending on the number of reference images.
  • FIG. 9 is a block diagram illustrating an example of the configuration of the reference image feature quantity deriving unit 108 of the first embodiment.
  • the reference image feature quantity deriving unit 108 includes an average value calculating unit 401, an error value calculating unit 402, and an integrating unit 403.
  • the average value calculation unit 401 calculates the pixel average value of the reference image, outputs the calculated pixel average value, and the error value calculation unit 402 and the integration unit Input to 403.
  • the average value calculation unit 401 calculates the pixel average value of the reference image using, for example, Equation (10).
  • DCLX(t) represents the pixel average value of the reference image of list number X and reference image number t. Therefore, the pixel average value of the reference image of list number 0 and reference image number 1 is represented as DCL0(1), and the pixel average value of the reference image of list number 1 and reference image number 0 is represented as DCL1(0). n indicates the number of pixels of the reference image of list number X and reference image number t. Yx,y(t) represents the pixel value at the (x, y) coordinates of the reference image of list number X and reference image number t.
  • the error value calculation unit 402 calculates the pixel error value of the reference image using the pixel average value of the reference image input from the average value calculation unit 401, outputs the calculated pixel error value, and inputs it to the integration unit 403. The error value calculation unit 402 calculates the pixel error value with respect to the pixel average value of the reference image using, for example, Equation (11).
  • ACLX (t) represents a pixel error value that is an average value of differences (errors) between the pixel value of each pixel of the reference image of list number X and reference image number t and the pixel average value of the reference image. .
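  • a minimal C sketch of Expressions (10) and (11) is given below; the use of the mean absolute difference for the pixel error value is an assumption consistent with the description of ACLX(t) as the average of the errors from the pixel average value.

```c
#include <stdlib.h>   /* llabs */

/* Compute the pixel average value (dc) and pixel error value (ac) of one
 * reference image given as n pixels; n > 0 is assumed.                   */
static void ref_image_features(const unsigned char *pix, int n, int *dc, int *ac)
{
    long long sum = 0, err = 0;
    for (int i = 0; i < n; i++)
        sum += pix[i];
    *dc = (int)(sum / n);                                  /* Expression (10) */

    for (int i = 0; i < n; i++)
        err += llabs((long long)pix[i] - *dc);
    *ac = (int)(err / n);                                  /* Expression (11) */
}
```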
  • the integration unit 403 integrates the list number and reference image number of the reference image, the pixel average value of the reference image input from the average value calculation unit 401, and the pixel error value input from the error value calculation unit 402 into the reference image feature amount.
  • the integration unit 403 collects the integrated reference image feature values in a table as shown in FIG. 8 and outputs them as reference image group feature values, which are input to the predicted image feature value deriving unit 109 and the parameter deriving unit 110.
  • the predicted image feature amount deriving unit 109 derives a predicted image feature amount based on the reference image group feature amount input from the reference image feature amount deriving unit 108, outputs the derived predicted image feature amount, and the parameter deriving unit 110. To enter.
  • the predicted image feature amount includes a pixel average value of the predicted image, a pixel error value obtained by averaging errors from the pixel average value, and a predicted image feature amount derivation flag indicating whether the predicted image feature amount has been derived.
  • FIG. 10 is a block diagram illustrating an example of the configuration of the predicted image feature quantity deriving unit 109 according to the first embodiment.
  • the predicted image feature quantity deriving unit 109 includes a feature quantity control unit 411, a memory 412, and a predicted image feature quantity calculation unit 413.
  • the feature amount control unit 411 selects and outputs, from the reference image group feature amount input from the reference image feature amount deriving unit 108, two reference image feature amounts used for deriving the predicted image feature amount; it inputs (loads) one of them into the memory 412 and inputs the other to the predicted image feature amount calculation unit 413.
  • specifically, the feature amount control unit 411 derives, from the list number and reference image number of each reference image feature amount collected in the reference image group feature amount, the POC (Picture Order Count) indicating the display order (display time) of the reference image. The reference list and the reference image number are information for indirectly specifying the reference image, whereas the POC is information for directly specifying the reference image and corresponds to the absolute position of the reference image. Then, the feature amount control unit 411 selects two POCs from the derived POCs in order of increasing distance from the encoding target image (predicted image), thereby selecting from the reference image group feature amount the two reference image feature amounts used for deriving the predicted image feature amount.
  • the feature amount control unit 411 selects two POCs (POC1 and POC2) using Equation (12) when the encoded slice is P-slice.
  • num_of_active_ref_l0_minus1 is one of the syntax elements, and indicates a value obtained by subtracting 1 from the number of reference images used in the reference list of list number 0, that is, N.
  • RefPicOrderCnt is a function that, when given a list number of a reference image and a reference image number of the reference image, returns a POC of the reference image
  • ListL0 indicates list number 0
  • refIdx indicates a reference image number.
  • refPOC indicates the POC arrangement of the reference image in which the reference image feature amount is included in the reference image group feature amount
  • curPOC indicates the POC of the encoding target image.
  • SortRefPOC calculates the absolute difference value between the POC indicated by curPOC and each of the POCs stored in refPOC, the number of which is one more than the value given as the third argument (num_of_active_ref_l0_minus1 or num_of_active_ref_l0_minus1 - 1).
  • SortRefPOC is a function that returns the POC having the smallest absolute difference value among the POCs stored in refPOC, deletes the POC from refPOC, and rearranges the POCs stored in refPOC.
  • for example, when refPOC = {0, 1, 2, 3} and curPOC = 4, SortRefPOC(refPOC, curPOC, num_of_active_ref_l0_minus1) returns POC 3, whose absolute difference value of 1 is the smallest, as POC1, deletes 3 from refPOC, and rearranges refPOC to {0, 1, 2}.
  • next, SortRefPOC(refPOC, curPOC, num_of_active_ref_l0_minus1 - 1) returns POC 2, whose absolute difference value of 2 is now the smallest, as POC2, deletes 2 from refPOC, and rearranges refPOC to {0, 1}.
  • the feature quantity control unit 411 selects two POCs (POC1 and POC2) using Expression (13) when the encoded slice is B-slice.
  • POC1 = SortRefPOC(refPOCL0, curPOC, num_of_active_ref_l0_minus1), POC2 = SortRefPOC(refPOCL1, curPOC, num_of_active_ref_l1_minus1) ... (13)
  • refPOCL0 indicates the POC arrangement of the reference image of list number 0 in which the reference image feature quantity is included in the reference image group feature quantity.
  • num_of_active_ref_l1_minus1 is one of the syntax elements, and indicates a value obtained by subtracting 1 from the number of reference images used in the reference list of list number 1, that is, N.
  • refPOCL1 indicates the POC arrangement of the reference image of list number 1 in which the reference image feature amount is included in the reference image group feature amount.
  • ListL1 indicates list number 1.
  • when the two selected POCs have the same temporal distance from the encoding target image (for example, when the reference image closest in temporal distance to the encoding target image is selected from both reference lists), the feature amount control unit 411 reselects one POC (POC2) using Expression (14).
  • the feature quantity control unit 411 can select two POCs having different temporal distances from the encoding target image (POC2 having a temporal distance different from that of POC1) from the encoding target image by repeatedly performing Expression (14).
  • X of refPOCLX is an identifier indicating a list number. For example, after searching a reference list of list number 0, a reference list of list number 1 may be searched. For this reason, when POC (POC2) is reselected, the two POCs (reference images) that are closest in time to the encoded image may be selected from the same reference list.
  • M is a value indicating the number of repetitions. M is defined by the number of reference images set for each reference list, for example.
  • the feature quantity control unit 411 selects and outputs a reference image feature quantity corresponding to POC1 from the reference image group feature quantity, and inputs it to the memory 412. Further, when POC2 is selected, the feature amount control unit 411 selects and outputs a reference image feature amount corresponding to POC2 from the reference image group feature amount, and inputs it to the predicted image feature amount calculation unit 413.
  • the feature amount control unit 411 selects two POCs, but three or more POCs may be selected.
  • the feature amount control unit 411 selects three or more reference image feature amounts from the reference image group feature amount, and a predicted image feature amount is derived from the selected three or more reference image feature amounts. Note that in the case of P-slice, it is necessary that N ⁇ 2, and the feature amount control unit 411 may search the reference list with the list number 0 after executing Expression (12).
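  • a minimal C sketch of this SortRefPOC-style selection is given below: it repeatedly picks the POC closest in display order to the current image and removes it; the compaction used in place of the described rearrangement, and the handling of the B-slice reselection of Expression (14), are simplifications, and the helper names are hypothetical.

```c
#include <stdlib.h>   /* abs */

/* Return the POC in ref_poc[] closest to cur_poc and remove it from the array. */
static int pop_nearest_poc(int *ref_poc, int *count, int cur_poc)
{
    int best = 0;
    for (int i = 1; i < *count; i++)
        if (abs(ref_poc[i] - cur_poc) < abs(ref_poc[best] - cur_poc))
            best = i;
    int poc = ref_poc[best];
    ref_poc[best] = ref_poc[--(*count)];   /* delete by compaction */
    return poc;
}

/* Select the two POCs (POC1, POC2) closest to the encoding target image. */
static void select_two_pocs(int *ref_poc, int count, int cur_poc,
                            int *poc1, int *poc2)
{
    *poc1 = pop_nearest_poc(ref_poc, &count, cur_poc);
    *poc2 = pop_nearest_poc(ref_poc, &count, cur_poc);
    /* For B-slice, when POC2 has the same temporal distance as POC1, the
     * description reselects POC2 (Expression (14)) until the distances differ. */
}
```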
  • the memory 412 holds the reference image feature amount input from the feature amount control unit 411 (the reference image feature amount corresponding to POC1, hereinafter referred to as the first reference image feature amount). Then, when the reference image feature amount corresponding to POC2 (hereinafter referred to as the second reference image feature amount) is output from the feature amount control unit 411 to the predicted image feature amount calculation unit 413, the memory 412 outputs the first reference image feature amount and inputs it to the predicted image feature amount calculation unit 413.
  • the predicted image feature amount calculation unit 413 calculates the predicted image feature amount using the first reference image feature amount input from the memory 412 and the second reference image feature amount input from the feature amount control unit 411, outputs it, and inputs it to the parameter deriving unit 110.
  • the predicted image feature amount calculation unit 413 first calculates the temporal distance between the reference image of the first reference image feature amount and the reference image of the second reference image feature amount according to Equation (15).
  • DistScaleFactor = Clip3(-1024, 1023, (tb * tx + 32) >> 6) ... (15)
  • Clip3 (A, B, C) returns A if the value C falls below the minimum value A, returns B if the value C exceeds the maximum value B, and returns value C if none of the conditions apply.
  • tb is calculated by Expression (16)
  • tx is calculated by Expression (17).
  • Tb indicates the time distance between curPOC and POC1.
  • tx is an intermediate variable for performing division of tb / td with fixed-point precision, and indicates a division result of tb / td. Note that the fixed point has 8-bit precision, and when the value of DistScaleFactor is 128 (median value), it means that tb / td is 1.
  • td indicates a temporal distance between POC1 and POC2, and is calculated by Expression (18).
  • td = Clip3(-128, 127, POC2 - POC1) ... (18)
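  • the following C sketch computes the temporal distance ratio of Expressions (15) to (18); since Expressions (16) and (17) are not reproduced here, the clipping of tb and the reciprocal approximation used for tx are assumptions, chosen so that DistScaleFactor becomes 128 when tb equals td, matching the statement that 128 corresponds to tb/td = 1.

```c
#include <stdlib.h>   /* abs */

static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

static int dist_scale_factor(int cur_poc, int poc1, int poc2)
{
    int tb = clip3(-128, 127, cur_poc - poc1);       /* assumed form of Eq. (16) */
    int td = clip3(-128, 127, poc2 - poc1);          /* Expression (18)          */
    if (td == 0)                                     /* POC1 == POC2: treated as */
        return 0;                                    /* not derivable, Eq. (23)  */
    int tx = (8192 + abs(td) / 2) / td;              /* assumed form of Eq. (17) */
    return clip3(-1024, 1023, (tb * tx + 32) >> 6);  /* Expression (15)          */
}
```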
  • the predicted image feature value calculation unit 413 calculates the pixel average value of the predicted image according to Equation (19), and calculates the pixel error value of the predicted image according to Equation (20).
  • DCP indicates the pixel average value of the predicted image
  • DC1 and DC2 indicate the pixel average value of the first reference image feature value and the pixel average value of the second reference image feature value, respectively.
  • ACP indicates the pixel error value of the predicted image
  • AC1 and AC2 indicate the pixel error value of the first reference image feature value and the pixel error value of the second reference image feature value, respectively.
  • the values of Shft3 and Ofst3 are determined according to INTERNAL_PREC indicating the internal calculation accuracy of the predicted image, Shft3 is calculated by Equation (21), and Ofst3 is calculated by Equation (22). In the first embodiment, since the fixed point precision of DistScaleFactor is 8, when INTERNAL_PREC is 8, DCP and ACP are rounded to integer precision.
  • when any of Expressions (23) to (25) is satisfied, the predicted image feature amount calculation unit 413 cannot derive the predicted image feature amount, or the temporal distance is too far, so the pixel average value DCP of the predicted image and the pixel error value ACP of the predicted image are set to initial values.
  • POC1 == POC2 ... (23), (DistScaleFactor >> 2) < -64 ... (24), (DistScaleFactor >> 2) > 128 ... (25)
  • when any of the conditions of Expressions (23) to (25) is satisfied, the predicted image feature amount calculation unit 413 sets the predicted image feature amount derivation flag wp_avaiable_flag, which is an internal variable, to false; if DistScaleFactor satisfies none of the conditions of Expressions (23) to (25), wp_avaiable_flag is set to true.
  • when the predicted image feature amount derivation flag wp_avaiable_flag is set to false, initial values are set in DCP and ACP. For example, DefaultDC indicating 0 is set in DCP, and DefaultAC indicating 0 is set in ACP.
  • when the predicted image feature amount derivation flag is set to true, the values calculated by Expression (19) and Expression (20) are set in DCP and ACP, respectively.
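  • a C sketch of this derivation is given below; the availability conditions follow Expressions (23) to (25) as stated, while the linear interpolation used for DCP and ACP (treating DistScaleFactor as tb/td with 128 representing 1.0) is an assumption, because Expressions (19) to (22) are not reproduced in this text.

```c
typedef struct {
    int available;   /* wp_avaiable_flag                           */
    int dcp;         /* pixel average value of the predicted image */
    int acp;         /* pixel error value of the predicted image   */
} PredFeature;

static PredFeature derive_pred_feature(int poc1, int poc2, int dsf,
                                       int dc1, int dc2, int ac1, int ac2)
{
    PredFeature f = { 0, 0, 0 };              /* DefaultDC = DefaultAC = 0 */
    if (poc1 == poc2 ||                       /* Expression (23) */
        (dsf >> 2) < -64 ||                   /* Expression (24) */
        (dsf >> 2) > 128)                     /* Expression (25) */
        return f;                             /* flag remains false */

    f.available = 1;
    f.dcp = dc1 + ((dsf * (dc2 - dc1) + 64) >> 7);   /* assumed form of (19) */
    f.acp = ac1 + ((dsf * (ac2 - ac1) + 64) >> 7);   /* assumed form of (20) */
    return f;
}
```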
  • the parameter deriving unit 110 uses the reference image group feature amount input from the reference image feature amount deriving unit 108 and the predicted image feature amount input from the predicted image feature amount deriving unit 109 to use the WP parameter of the encoding target image. Deriving information.
  • FIG. 11A and FIG. 11B are diagrams illustrating an example of WP parameter information according to the first embodiment.
  • An example of WP parameter information at the time of P-slice is as shown in FIG. 11A
  • an example of WP parameter information at the time of B-slice is as shown in FIGS. 11A and 11B.
  • the list number and the reference image number are the same as the reference image group feature quantity, and the WP application flag, the weighting coefficient, and the offset are as described with reference to FIGS.
  • the WP application flag, weighting factor, and offset with list number 0 correspond to the first WP application flag, first weighting factor, and first offset, respectively
  • the WP application flag, weighting coefficient, and offset with list number 1 correspond to the second WP application flag, the second weighting coefficient, and the second offset, respectively. Since the WP parameter information is held for each reference list and each reference image, the information required for B-slice amounts to 2N + 2 entries when N + 1 reference images are used.
  • the parameter deriving unit 110 first checks the predicted image feature amount derivation flag wp_avaiable_flag included in the predicted image feature amount to confirm whether the WP parameter information can be derived. When wp_avaiable_flag is set to false, the WP parameter information cannot be derived, so the parameter deriving unit 110 sets the weighting coefficient and the offset corresponding to the list number X and the reference image number Y to initial values according to Expression (26) and Expression (27).
  • Weight[X][Y] is a value corresponding to w0C or w1C used in Equations (7) and (9).
  • Log2Denom is a value corresponding to logWDC, calculated by Expression (28) and used in Expression (7) and the like.
  • Default_Value may be set to 0 or 7, for example.
  • that is, when wp_avaiable_flag is set to false, the parameter deriving unit 110 repeatedly executes Expression (26) and Expression (27) for all combinations of list number X and reference image number Y (all reference images), thereby setting initial values in the WP parameter information.
  • the parameter deriving unit 110 sets the WP application flag (WP_flag [X] [Y]) corresponding to the list number X and the reference image number Y to false when wp_avaiable_flag is set to false.
  • when wp_avaiable_flag is set to true, the parameter deriving unit 110 derives the weighting coefficient and the offset corresponding to the list number X and the reference image number Y according to Expression (29) and Expression (30), respectively.
  • curDC and curAC indicate the pixel average value DCP and the pixel error value ACP of the predicted image, respectively.
  • DC [X] [Y] and AC [X] [Y] indicate the pixel average value DCLX (Y) and pixel error value ACLX (Y) of the reference image with the list number X and the reference image number Y, respectively.
  • LeftShft is a value for correcting a change in calculation accuracy for Shft3 used in equations (19) to (22), and is calculated by equation (31).
  • Ofst4 is a parameter used for rounding when dividing by AC [X] [Y]. For example, if rounding is performed during rounding with fixed-point precision, Ofst4 may be set to (AC [X] [Y] >> 1), and Ofst4 may be set to 0 when always rounding down.
  • RealLog2Denom is calculated by Expression (32), and RealOfst is calculated by Expression (33).
  • RealLog2Denom = Log2Denom + LeftShft ... (32)
  • RealOfst = (1 << (RealLog2Denom - 1)) ... (33)
  • RealLog2Denom may be set to a predetermined value such as 7, for example.
  • that is, when wp_avaiable_flag is set to true, the parameter deriving unit 110 repeatedly executes Expression (29) and Expression (30) for all combinations of list number X and reference image number Y (all reference images), thereby deriving the WP parameter information shown in FIGS. 11A and 11B.
  • the parameter deriving unit 110 sets the WP application flag (WP_flag [X] [Y]) corresponding to the list number X and the reference image number Y to true when wp_avaiable_flag is set to true.
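  • a C sketch of the per-reference derivation of Expressions (29) to (33) is given below; since the expressions themselves are not reproduced in this extraction, the ratio-of-error-values form of the weighting coefficient and the DC-mismatch form of the offset are assumptions consistent with the surrounding description (curDC/curAC are DCP/ACP, DC/AC are the reference image statistics), and LeftShft is taken as 0.

```c
/* Derive Weight[X][Y] and Offset[X][Y] for one reference image. */
static void derive_weight_offset(int cur_dc, int cur_ac,   /* DCP, ACP of predicted image */
                                 int ref_dc, int ref_ac,   /* DC[X][Y], AC[X][Y]          */
                                 int log2_denom,           /* fixed-point precision       */
                                 int *weight, int *offset)
{
    int ofst4     = ref_ac >> 1;                            /* rounding term, as described */
    int real_ofst = log2_denom ? 1 << (log2_denom - 1) : 0; /* Expression (33), LeftShft=0 */

    if (ref_ac == 0) {                           /* degenerate case, assumed handling */
        *weight = 1 << log2_denom;               /* weighting coefficient of 1.0      */
        *offset = cur_dc - ref_dc;
        return;
    }
    *weight = ((cur_ac << log2_denom) + ofst4) / ref_ac;                 /* assumed (29) */
    *offset = cur_dc - ((*weight * ref_dc + real_ofst) >> log2_denom);   /* assumed (30) */
}
```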
  • the motion evaluation unit 111 performs motion evaluation between a plurality of frames based on the input image and the reference image input from the predicted image generation unit 107, outputs motion information, and uses the motion information as the predicted image generation unit. 107 and the encoding unit 112.
  • the motion evaluation unit 111 calculates optimal motion information by a technique such as block matching, in which an error is calculated as the difference value between the prediction target pixel block of the input image and a plurality of reference image blocks corresponding to the same position, the position is shifted with fractional accuracy, and the block that minimizes the error is searched for.
  • in the case of bidirectional prediction, the motion evaluation unit 111 calculates motion information for bidirectional prediction by performing block matching including the default motion compensation prediction shown in Equation (1) and Equation (4), using the motion information derived by unidirectional prediction.
  • at this time, the motion evaluation unit 111 can also calculate motion information taking weighted prediction into consideration by performing block matching including the weighted motion compensation prediction shown in Equation (7) and Equation (9). In this case, the motion evaluation unit 111 may perform block matching using Equation (7) and Equation (9) according to the WP parameter information output from the parameter deriving unit 110.
  • the motion evaluation unit 111 is exemplified as one function of the encoding device 100.
  • the motion evaluation unit 111 is not an essential component of the encoding device 100; for example, it may be a device outside the encoding device 100. In this case, the motion information calculated by the motion evaluation unit 111 may be loaded into the encoding device 100.
  • the encoding unit 112 encodes the quantized transform coefficients input from the quantization unit 103, the motion information input from the motion evaluation unit 111, and various encoding parameters such as the quantization information specified by the encoding control unit 113, and generates encoded data.
  • the encoding process corresponds to, for example, Huffman coding or arithmetic coding; for example, H.264 uses context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC).
  • the encoding parameter is a parameter necessary for decoding, such as prediction information indicating a prediction method, information on quantization transform coefficients, information on quantization, and the like.
  • the encoding control unit 113 has an internal memory (not shown), the encoding parameter is held in the internal memory, and the encoding parameter of the adjacent already encoded pixel block is used when encoding the pixel block.
  • the prediction information of the pixel block can be derived from the prediction information of the encoded adjacent block.
  • the encoding unit 112 outputs the generated encoded data according to an appropriate output timing managed by the encoding control unit 113.
  • the output encoded data is, for example, multiplexed with various information by a multiplexing unit (not shown), temporarily stored in an output buffer (not shown) or the like, and then output to a storage system (storage medium) or a transmission system (communication line).
  • FIG. 12 is a flowchart illustrating an example of a flow of a reference image group feature quantity derivation process performed by the reference image feature quantity derivation unit 108 of the first embodiment.
  • when the slice type of the encoding target image is B-slice, two reference lists can be used, so the reference image feature amount deriving unit 108 sets PRED_TYPE to 1; when the slice type is P-slice, only one reference list can be used, so it sets PRED_TYPE to 0. Note that the reference image feature amount deriving unit 108 can determine whether the slice type of the encoding target image is B-slice or P-slice by referring to the variable slice_type.
  • the reference image feature quantity deriving unit 108 initializes the list number X to 0 (step S101), and initializes the reference image number Y to 0 (step S102).
  • next, the average value calculation unit 401 calculates the pixel average value DCLX[Y] according to Equation (10), and the error value calculation unit 402 calculates the pixel error value ACLX[Y] according to Equation (11) (step S103).
  • the integration unit 403 integrates the list number X, the reference image number Y, the pixel average value DCLX [Y], and the pixel error value ACLX [Y] into the reference image feature amount, and a reference image group feature amount table. Update. Then, the reference image feature quantity deriving unit 108 increments the reference image number Y (step S104).
  • next, the reference image feature amount deriving unit 108 determines whether or not the incremented reference image number Y is larger than num_ref_active_lx_minus1 (step S105); if it is not larger (No in step S105), the process returns to step S103.
  • if it is larger (Yes in step S105), the reference image feature amount deriving unit 108 determines that the calculation of the pixel average value and the pixel error value of all the reference images in the reference list of list number X is completed, and increments the list number X (step S106).
  • next, the reference image feature amount deriving unit 108 determines whether or not the incremented list number X is larger than PRED_TYPE (step S107); if it is not larger (No in step S107), the process returns to step S102.
  • if it is larger (Yes in step S107), the reference image feature amount deriving unit 108 determines that all the reference lists have been processed, outputs the reference image group feature amount, and ends the process.
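  • the loop structure of FIG. 12 can be sketched in C as below; compute_and_integrate is a hypothetical stand-in for steps S103 and S104 (computing DCLX[Y] and ACLX[Y] and adding them to the reference image group feature table), and PRED_TYPE is 1 for B-slice and 0 for P-slice as described.

```c
typedef void (*FeatureFn)(int list_x, int ref_y);   /* hypothetical hook for S103/S104 */

static void derive_ref_group_features(int pred_type,                   /* PRED_TYPE */
                                      const int num_ref_active_lx_minus1[2],
                                      FeatureFn compute_and_integrate)
{
    for (int x = 0; x <= pred_type; x++)                       /* steps S101, S106, S107 */
        for (int y = 0; y <= num_ref_active_lx_minus1[x]; y++) /* steps S102, S104, S105 */
            compute_and_integrate(x, y);                       /* step S103              */
}
```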
  • Note that reference images associated with a list number and a reference image number may be input collectively in a batch.
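As a rough illustration of the flow in FIG. 12, the C sketch below loops over every (list number X, reference image number Y) pair and fills one feature entry per reference image. The exact forms of Equations (10) and (11) are not reproduced in this excerpt, so the sketch assumes DC is the mean pixel value and AC is the mean absolute deviation from that mean; the type and function names are illustrative, and a single linear index is used instead of the (X, Y) addressing of the reference image group feature quantity table.

```c
/* Hypothetical per-reference entry mirroring the table of FIG. 8. */
typedef struct {
    int list;     /* list number X               */
    int ref_idx;  /* reference image number Y    */
    int dc;       /* pixel average value DCLX[Y] */
    int ac;       /* pixel error value  ACLX[Y]  */
} RefFeature;

/* Assumed forms of Equations (10) and (11): DC is the mean pixel value and
 * AC is the mean absolute difference from that mean, both integer-rounded. */
static void derive_ref_feature(const unsigned char *pix, int num_pixels,
                               int list, int ref_idx, RefFeature *out)
{
    long long sum = 0, err = 0;
    for (int i = 0; i < num_pixels; i++)
        sum += pix[i];
    int dc = (int)(sum / num_pixels);                 /* Equation (10), assumed */
    for (int i = 0; i < num_pixels; i++)
        err += (pix[i] >= dc) ? pix[i] - dc : dc - pix[i];
    out->list = list;
    out->ref_idx = ref_idx;
    out->dc = dc;
    out->ac = (int)(err / num_pixels);                /* Equation (11), assumed */
}

/* Loops of FIG. 12 (steps S101-S107): one entry per (list X, reference Y). */
static int derive_ref_group_features(const unsigned char *const *ref_pix,
                                     int num_pixels, int pred_type,
                                     const int *num_ref_active_lx_minus1,
                                     RefFeature *table)
{
    int n = 0;
    for (int x = 0; x <= pred_type; x++)                         /* reference list loop  */
        for (int y = 0; y <= num_ref_active_lx_minus1[x]; y++) { /* reference image loop */
            derive_ref_feature(ref_pix[n], num_pixels, x, y, &table[n]);
            n++;
        }
    return n;  /* number of reference image feature entries produced */
}
```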
  • FIG. 13 is a flowchart illustrating an example of a flow of a predicted image feature quantity derivation process performed by the predicted image feature quantity derivation unit 109 of the first embodiment.
  • the predicted image feature quantity deriving unit 109 sets PRED_TYPE to 1 when the slice type of the encoding target image is B-slice, and sets PRED_TYPE to 0 when the slice type is P-slice. Note that the predicted image feature quantity deriving unit 109 can determine whether the slice type of the encoding target image is B-slice or P-slice by referring to the variable slice_type managed by the encoding control unit 113.
  • the predicted image feature quantity deriving unit 109 initializes the list number X to 0 (step S201) and initializes the reference image number Y to 0 (step S202).
  • Next, the feature quantity control unit 411 derives the POC from the list number X and the reference image number Y using the RefPicOrderCnt function of Equation (12) or Equation (13) according to the slice type of the encoding target image, and stores the derived POC in the refPOC array. Then, the feature quantity control unit 411 increments the reference image number Y (step S203).
  • The feature quantity control unit 411 determines whether or not the incremented reference image number Y is larger than num_ref_active_lx_minus1 (step S204); if it is not larger (No in step S204), the process returns to step S203.
  • If it is larger (Yes in step S204), the feature quantity control unit 411 determines that the POC derivation of all the reference images in the reference list of the list number X has been completed, and increments the list number X (step S205).
  • The feature quantity control unit 411 then determines whether or not the incremented list number X is larger than PRED_TYPE (step S206); if it is not larger (No in step S206), the process returns to step S202.
  • If it is larger (Yes in step S206), the feature quantity control unit 411 determines that all the reference lists have been processed and, using the SortRefPOC function of Equation (12) or Equation (13) according to the slice type of the encoding target image, sets to POC1 the POC having the smallest absolute distance (absolute difference) from the POC of the encoding target image indicated by curPOC among the POCs stored in refPOC. Then, the feature quantity control unit 411 outputs the first reference image feature quantity, which is the reference image feature quantity of the reference image of POC1, to the memory 412, and deletes that POC from the refPOC array (step S207).
  • Next, the feature quantity control unit 411 again uses the SortRefPOC function of Equation (12) or Equation (13) according to the slice type of the encoding target image, sets to POC2 the POC having the smallest absolute distance (absolute difference) from the POC of the encoding target image indicated by curPOC among the POCs stored in refPOC, and deletes that POC from the refPOC array (step S208).
  • In step S209, the feature quantity control unit 411 determines whether POC2 needs to be reselected; if so (Yes in step S209), it executes Equation (14) to update POC2 (step S210) and returns to step S209.
  • Otherwise (No in step S209), the feature quantity control unit 411 outputs the second reference image feature quantity, which is the reference image feature quantity of the reference image of POC2, to the predicted image feature quantity calculation unit 413.
  • Next, the predicted image feature quantity deriving unit 109 derives the temporal distance ratio tb between POC1 and curPOC using Equation (16) (step S211).
  • the predicted image feature quantity deriving unit 109 derives the time distance ratio td between POC1 and POC2 using Expression (18) (step S212).
  • the predicted image feature quantity deriving unit 109 derives DistScaleFactor used for distance scaling from the time distance ratio tb and the time distance ratio td using Expression (15) (step S213).
  • the predicted image feature quantity deriving unit 109 determines whether or not the derived DistScaleFactor satisfies any of the formulas (23) to (25) that are conditions for the exception process (step S214).
  • If any of the conditions is satisfied (Yes in step S214), the predicted image feature quantity deriving unit 109 sets the predicted image feature quantity derivation flag wp_avaiable_flag to false (step S215) and, accordingly, sets the pixel average value DCP and the pixel error value ACP of the predicted image to their initial values (step S216).
  • If none of the conditions is satisfied (No in step S214), the predicted image feature quantity deriving unit 109 sets wp_avaiable_flag to true (step S217).
  • Then, the predicted image feature quantity deriving unit 109 derives the pixel average value DCP of the predicted image from DistScaleFactor, the pixel average value DC1 of the first reference image feature quantity, and the pixel average value DC2 of the second reference image feature quantity using Equation (19), and derives the pixel error value ACP of the predicted image from DistScaleFactor, the pixel error value AC1 of the first reference image feature quantity, and the pixel error value AC2 of the second reference image feature quantity using Equation (20).
  • the predicted image feature quantity deriving unit 109 outputs the pixel average value DCP, the pixel error value ACP, and the predicted image feature quantity derivation flag wp_avaiable_flag of the derived predicted image as the predicted image feature quantity (step S218).
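A compact C sketch of steps S203 through S218 is shown below for illustration. Equations (15) to (20) and the exception conditions of Equations (23) to (25) are not reproduced in this excerpt, so the fixed-point scale factor, the range check, and the linear inter/extrapolation of DC and AC are stated as assumptions; the struct and function names are illustrative only.

```c
#include <stdbool.h>

typedef struct {
    int dcp;                /* pixel average value DCP of the predicted image   */
    int acp;                /* pixel error value  ACP of the predicted image    */
    bool wp_avaiable_flag;  /* predicted image feature quantity derivation flag */
} PredFeature;

/* Illustrative derivation from the two nearest reference images.  The scale
 * factor, the range check, and the linear inter/extrapolation below are
 * assumptions standing in for Equations (15)-(25). */
static PredFeature derive_pred_feature(int cur_poc,
                                       int poc1, int dc1, int ac1,
                                       int poc2, int dc2, int ac2)
{
    PredFeature pf = { 0, 0, false };     /* DefaultDC = 0, DefaultAC = 0 */
    int tb = cur_poc - poc1;              /* temporal distance to POC1 (Eq. (16), assumed)   */
    int td = poc2 - poc1;                 /* temporal distance POC1-POC2 (Eq. (18), assumed) */
    if (td == 0)
        return pf;                        /* cannot scale: keep the initial values */
    int dist_scale = (256 * tb) / td;     /* DistScaleFactor (Eq. (15), assumed)   */
    if (dist_scale <= -1024 || dist_scale >= 1024)
        return pf;                        /* placeholder for the exception conditions (23)-(25) */
    /* Linear inter/extrapolation along the temporal axis (Eqs. (19), (20), assumed). */
    pf.dcp = dc1 + (dist_scale * (dc2 - dc1)) / 256;
    pf.acp = ac1 + (dist_scale * (ac2 - ac1)) / 256;
    pf.wp_avaiable_flag = true;
    return pf;
}
```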
  • FIG. 14 is a flowchart illustrating an example of a flow of a WP parameter information derivation process performed by the parameter derivation unit 110 of the first embodiment.
  • the parameter deriving unit 110 sets PRED_TYPE to 1 when the slice type of the encoding target image is B-slice, and sets PRED_TYPE to 0 when the slice type is P-slice.
  • the parameter deriving unit 110 can determine whether the slice type of the image to be encoded is B-slice or P-slice by referring to the variable slice_type managed by the encoding control unit 113.
  • First, the parameter deriving unit 110 checks whether or not the predicted image feature quantity derivation flag wp_avaiable_flag is set to false (step S301).
  • When wp_avaiable_flag is set to false (Yes in step S301), the parameter deriving unit 110 sets Log2Denom to 0 using Equation (28) (step S302).
  • Then, the parameter deriving unit 110 sets the weighting coefficients of all reference images to their initial values using Equation (26) (step S303), sets the offsets of all reference images to their initial values using Equation (27) (step S304), and ends the process.
  • On the other hand, when wp_avaiable_flag is set to true (No in step S301), the parameter deriving unit 110 sets Log2Denom to a predetermined value (for example, 7) (step S305).
  • the parameter deriving unit 110 initializes the list number X to 0 (step S306), and initializes the reference image number Y to 0 (step S307).
  • Next, the parameter deriving unit 110 derives the weighting coefficient Weight[X][Y] corresponding to the list number X and the reference image number Y using Equation (29), and derives the offset Offset[X][Y] using Equation (30) (step S308).
  • Next, the parameter deriving unit 110 increments the reference image number Y (step S309). Then, the parameter deriving unit 110 determines whether or not the incremented reference image number Y is larger than num_ref_active_lx_minus1 (step S310); if it is not larger (No in step S310), the process returns to step S308.
  • If it is larger (Yes in step S310), the parameter deriving unit 110 determines that the derivation of the weighting factors and offsets of all the reference images in the reference list of the list number X has been completed, and increments the list number X (step S311).
  • The parameter deriving unit 110 then determines whether or not the incremented list number X is larger than PRED_TYPE (step S312); if it is not larger (No in step S312), the process returns to step S307.
  • If it is larger (Yes in step S312), the parameter deriving unit 110 determines that all the reference lists have been processed, outputs the WP parameter information, and ends the process.
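The per-reference derivation of FIG. 14 can be condensed as in the following sketch. Equations (26) to (30) are not reproduced in this excerpt; the sketch assumes that the initial values are an identity weight and a zero offset, that the weighting factor is the ratio of the predicted-image pixel error value to the reference pixel error value expressed with Log2Denom bits of fixed-point precision, and that the offset compensates the remaining difference of the pixel average values, consistent with the integer form shown later in Equation (36). All names are illustrative.

```c
typedef struct {
    int log2_denom;  /* fixed-point precision of the weighting factor */
    int weight;      /* Weight[X][Y]                                  */
    int offset;      /* Offset[X][Y]                                  */
    int wp_flag;     /* WP application flag WP_flag[X][Y]             */
} WpParam;

/* Illustrative derivation of one (list X, reference Y) entry.  The initial
 * values and the weight/offset formulas stand in for Equations (26)-(30). */
static WpParam derive_wp_param(int wp_avaiable_flag, int log2_denom,
                               int ref_dc, int ref_ac, int dcp, int acp)
{
    WpParam p;
    if (!wp_avaiable_flag || ref_ac == 0) {
        p.log2_denom = 0;                    /* Eq. (28), assumed          */
        p.weight     = 1 << p.log2_denom;    /* identity weight (Eq. (26)) */
        p.offset     = 0;                    /* zero offset (Eq. (27))     */
        p.wp_flag    = 0;
        return p;
    }
    p.log2_denom = log2_denom;                              /* e.g. 7 (step S305) */
    p.weight = (acp << log2_denom) / ref_ac;                /* Eq. (29), assumed  */
    p.offset = dcp - ((p.weight * ref_dc) >> log2_denom);   /* Eq. (30), assumed  */
    p.wp_flag = 1;
    return p;
}
```

In an actual implementation the function would be called for every combination of list number X and reference image number Y, mirroring steps S306 to S312.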
  • FIG. 15 is a diagram illustrating an example of the syntax 500 used by the encoding device 100 according to the first embodiment.
  • a syntax 500 indicates a structure of encoded data generated by the encoding apparatus 100 by encoding an input image (moving image data).
  • A decoding apparatus described later decodes moving image data by referring to the same syntax structure as the syntax 500 and interpreting the syntax of the encoded data.
  • the syntax 500 includes three parts: a high level syntax 501, a slice level syntax 502, and a coding tree level syntax 503.
  • the high level syntax 501 includes syntax information of a layer higher than the slice.
  • a slice refers to a rectangular area or a continuous area included in a frame or a field.
  • the slice level syntax 502 includes information necessary for decoding each slice.
  • the coding tree level syntax 503 includes information necessary for decoding each coding tree (ie, each coding tree block). Each of these parts includes more detailed syntax.
  • the high level syntax 501 includes sequence and picture level syntaxes such as a sequence parameter set syntax 504, a picture parameter set syntax 505, and an adaptation parameter set syntax 506.
  • The slice level syntax 502 includes a slice header syntax 507, a pred weight table syntax 508, a slice data syntax 509, and the like.
  • The pred weight table syntax 508 is called from the slice header syntax 507.
  • the coding tree level syntax 503 includes a coding tree unit syntax 510, a transform unit syntax 511, a prediction unit syntax 512, and the like.
  • the coding tree unit syntax 510 may have a quadtree structure. Specifically, the coding tree unit syntax 510 can be recursively called as a syntax element of the coding tree unit syntax 510. That is, one coding tree block can be subdivided with a quadtree.
  • the coding tree unit syntax 510 includes a transform unit syntax 511.
  • the transform unit syntax 511 is called in each coding tree unit syntax 510 at the extreme end of the quadtree.
  • the transform unit syntax 511 describes information related to inverse orthogonal transformation and quantization.
  • FIG. 16 is a diagram illustrating an example of the picture parameter set syntax 505 according to the first embodiment.
  • weighted_unipred_idc is a syntax element indicating whether the weighted motion compensation prediction of the first embodiment for P-slices is valid or invalid.
  • When weighted_unipred_idc is 0, the weighted motion compensation prediction of the first embodiment in the P-slice is invalid. Accordingly, the WP application flag included in the WP parameter information is always set to 0, and the WP selectors 304 and 305 connect their output terminals to the default motion compensation unit 301.
  • When weighted_unipred_idc is 1, explicit weighted motion compensation prediction (not described in the first embodiment) in the P-slice is valid. Explicit weighted prediction is a mode in which the WP parameter information is explicitly encoded using the pred weight table syntax, and can be realized by the method described in H.264. When weighted_unipred_idc is 2, the implicit weighted motion compensation prediction of the first embodiment in the P-slice is valid.
  • Note that weighted_unipred_idc may be changed to weighted_pred_flag so that the implicit weighted motion compensation prediction of the first embodiment in the P-slice is always prohibited.
  • weighted_bipred_idc is, for example, a syntax element indicating whether the weighted motion compensation prediction of the first embodiment for B-slices is valid or invalid.
  • When weighted_bipred_idc is 0, the weighted motion compensation prediction of the first embodiment in the B-slice is invalid. Accordingly, the WP application flag included in the WP parameter information is always set to 0, and the WP selectors 304 and 305 connect their output terminals to the default motion compensation unit 301.
  • When weighted_bipred_idc is 1, explicit weighted motion compensation prediction (not described in the first embodiment) in the B-slice is valid.
  • Explicit weighted prediction is a mode in which the WP parameter information is explicitly encoded using the pred weight table syntax, and can be realized by the method described in H.264. When weighted_bipred_idc is 2, the implicit weighted motion compensation prediction of the first embodiment in the B-slice is valid.
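To summarize the semantics described above for both syntax elements, the small sketch below maps an idc value to the prediction mode that is enabled; the enum and function names are illustrative and are not part of any standard syntax.

```c
/* Illustrative mapping from the idc value to the enabled prediction mode. */
typedef enum {
    WP_OFF      = 0,  /* weighted prediction disabled (default motion compensation) */
    WP_EXPLICIT = 1,  /* WP parameters explicitly coded via the pred weight table   */
    WP_IMPLICIT = 2   /* WP parameters implicitly derived as in this embodiment     */
} WpMode;

static WpMode wp_mode_from_idc(int idc)
{
    switch (idc) {
    case 1:  return WP_EXPLICIT;
    case 2:  return WP_IMPLICIT;
    default: return WP_OFF;   /* idc == 0: the WP application flag stays 0 */
    }
}
```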
  • FIG. 17 is a diagram illustrating an example of the sequence parameter set syntax 504 according to the first embodiment.
  • profile_idc is an identifier indicating information on a profile of encoded data.
  • level_idc is an identifier indicating information on the level of encoded data.
  • seq_parameter_set_id is an identifier indicating which sequence parameter set syntax 504 is to be referred to.
  • max_num_ref_frames is a variable indicating the maximum number of reference images in a frame.
  • the implicit_weighted_unipred_enabled_flag is a syntax element indicating, for example, whether the P-slice implicit weighted motion compensation prediction is valid or invalid with respect to the encoded data.
  • the implicit_weighted_bipred_enabled_flag is a syntax element indicating, for example, whether the B-slice implicit weighted motion compensation prediction is valid or invalid with respect to the encoded data.
  • When implicit_weighted_unipred_enabled_flag or implicit_weighted_bipred_enabled_flag is 1, the validity or invalidity of the implicit weighted motion compensation prediction may be defined for each local region in the syntax of a lower layer (picture parameter set, slice header, coding tree block, transform unit, prediction unit, etc.).
  • Also, when implicit_weighted_unipred_enabled_flag or implicit_weighted_bipred_enabled_flag is 1, the image feature quantity of the encoded image may be derived and stored in the DPB together with the information of the reference image.
  • In this way, the encoded slice is stored in the DPB as a reference image together with its image feature quantity, so the process of calculating the image feature quantity does not have to be repeated every time the slice is referred to when another slice is encoded, and the processing amount can be reduced.
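The caching idea can be pictured as in the following sketch, in which the pixel average value and the pixel error value are computed once when the reconstructed slice is placed in the DPB and are then read back whenever the entry is used as a reference. The DpbEntry layout and the field and function names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative DPB entry that carries the reference image feature quantity
 * alongside the reconstructed picture. */
typedef struct {
    const uint8_t *pixels;  /* reconstructed picture              */
    int poc;                /* display order of the picture       */
    bool feature_valid;     /* true once dc/ac have been computed */
    int dc;                 /* cached pixel average value         */
    int ac;                 /* cached pixel error value           */
} DpbEntry;

/* Compute the features once, at the moment the picture enters the DPB. */
static void store_reference(DpbEntry *e, const uint8_t *pix,
                            int num_pixels, int poc)
{
    long long sum = 0, err = 0;
    for (int i = 0; i < num_pixels; i++)
        sum += pix[i];
    int dc = (int)(sum / num_pixels);
    for (int i = 0; i < num_pixels; i++)
        err += (pix[i] >= dc) ? pix[i] - dc : dc - pix[i];
    e->pixels = pix;
    e->poc = poc;
    e->dc = dc;
    e->ac = (int)(err / num_pixels);
    e->feature_valid = true;   /* later slices read e->dc / e->ac directly */
}
```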
  • As described above, in the first embodiment, the reference image feature quantity deriving unit 108 derives the pixel average value and the pixel error value of each reference image as the reference image feature quantity, and the predicted image feature quantity deriving unit 109 derives the pixel average value and the pixel error value of the predicted image as the predicted image feature quantity.
  • The parameter deriving unit 110 then derives, as the WP parameter information, the weighting factor using the pixel error value of the reference image and the pixel error value of the predicted image, and derives the offset using the derived weighting factor, the pixel average value of the reference image, and the pixel average value of the predicted image. Therefore, according to the first embodiment, implicit weighted prediction can be performed in consideration of not only the weighting coefficient but also the offset, so the prediction error can be reduced by using the predicted image generated by this prediction, the code amount can be reduced, and the coding efficiency can be improved.
  • Further, in the first embodiment, linear interpolation prediction or linear extrapolation prediction is performed based on the pixel average values and pixel error values derived for two reference images and the pixel value change between the reference images, whereby the pixel average value and the pixel error value of the predicted image are derived, and the changes in the pixel values between the reference images and the predicted image are predicted from these values. Therefore, according to the first embodiment, a weighting factor and an offset can be effectively predicted for video having temporally continuous fade and dissolve effects, thereby reducing the prediction error, improving the encoding efficiency, and also improving the subjective image quality.
  • Furthermore, when performing weighted motion compensation prediction, the encoding device 100 of the first embodiment uses an implicit mode that does not explicitly encode the weight parameters between two reference images, so the code amount can be reduced and the coding efficiency can be improved.
  • syntax elements not defined in the first embodiment may be inserted between the rows of the syntax tables illustrated in FIGS. 16 to 17, or descriptions regarding other conditional branches may be included.
  • the syntax table may be divided into a plurality of tables, or a plurality of syntax tables may be integrated.
  • the term of each illustrated syntax element can be changed arbitrarily.
  • When the reference image feature quantity deriving unit 108 calculates the pixel average value and the pixel error value of the reference image according to Equations (10) and (11), the calculated pixel average value and pixel error value are fractional values, and an error in the fractional part occurs when they are rounded to integer precision.
  • Therefore, the reference image feature quantity deriving unit 108 may instead calculate the pixel average value and the pixel error value of the reference image using Equation (34) and Equation (35).
  • the parameter deriving unit 110 calculates the offset by the mathematical formula (36).
  • Offset[X][Y] = (((curDC << Log2Denom) - Weight[X][Y] * (DC[X][Y] << LeftShft)) / N + RealOfst) >> RealLog2Denom   ... (36)
  • In Equation (36), division by the image size N is added to Equation (30). For example, by defining the internal calculation accuracy for each image size, the division by N can be removed and absorbed into the RealLog2Denom term. In this way, the rounding error caused by the division can be reduced.
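One way the reconstructed Equation (36) could be evaluated in integer arithmetic is sketched below. Since the definitions of LeftShft, RealOfst, and RealLog2Denom are not reproduced in this excerpt, the relations used here (RealLog2Denom = Log2Denom + LeftShft and RealOfst as the usual rounding offset) are assumptions made only for illustration.

```c
/* Integer evaluation of the reconstructed Equation (36).  The relations for
 * RealLog2Denom and RealOfst below are illustrative assumptions. */
static int offset_eq36(long long cur_dc,      /* curDC        */
                       long long ref_dc,      /* DC[X][Y]     */
                       int weight,            /* Weight[X][Y] */
                       int log2_denom,        /* Log2Denom    */
                       int left_shft,         /* LeftShft     */
                       long long n)           /* image size N */
{
    int real_log2_denom = log2_denom + left_shft;                  /* assumption */
    long long real_ofst =
        real_log2_denom > 0 ? 1LL << (real_log2_denom - 1) : 0;    /* assumption */
    long long num = (cur_dc << log2_denom)
                  - (long long)weight * (ref_dc << left_shft);
    return (int)((num / n + real_ofst) >> real_log2_denom);
}
```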
  • When the reference image feature quantity deriving unit 108 (the average value calculation unit 401 and the error value calculation unit 402) calculates the pixel average value and the pixel error value of the reference image over the entire image using Equations (34) and (35), the calculation accuracy required increases with the resolution of the image. For example, assuming that the pixel range is a 10-bit signal, the worst-case accumulated value for an image having 4096 × 2160 pixels (about 2 to the 23rd power pixels) reaches 2 to the 33rd power (23 + 10), which exceeds the 32-bit arithmetic range.
  • Therefore, the reference image feature quantity deriving unit 108 may execute Equations (34) and (35) in predetermined processing units and quantize the results by predetermined values, so that the calculation accuracy for each processing unit is kept constant. An arbitrary unit such as a slice unit, a line unit, or a pixel block unit can be set as the processing unit.
  • For example, when the processing is performed in coding tree unit units, the reference image feature quantity deriving unit 108 executes Equation (34) and Equation (35) in the 64 × 64 pixel units (2 to the 12th power pixels) illustrated in FIG. 3A, and quantizes the pixel average values and the pixel error values using Equation (37) and Equation (38), respectively.
  • Shft5 is calculated by Equation (39), and Ofst5 is calculated by Equation (40).
  • the value of Shft5 is set to 4 in Expression (39).
  • the parameter deriving unit 110 calculates an offset using the formula (30) and calculates RealLog2Denom using the formula (41).
  • As described above, the division that occurs depending on the image size or block size can be realized by a right shift, and the processing can be performed with 32-bit arithmetic regardless of the resolution. This makes it possible to reduce the hardware scale.
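The block-wise accumulation described above can be sketched as follows: the per-64×64 sums are right-shifted by Shft5, with rounding offset Ofst5, before being added to the running totals so that the totals stay within 32-bit range regardless of resolution. Equations (37) to (40) are not reproduced in this excerpt; Shft5 = 4 and Ofst5 = 1 << (Shft5 - 1) are assumed here, and the function name is illustrative.

```c
#include <stdint.h>

enum { CTU = 64, SHFT5 = 4, OFST5 = 1 << (SHFT5 - 1) };  /* assumed values */

/* Accumulate the quantized per-CTU sums of Equations (34)/(35) (assumed forms). */
static void accumulate_ctu_features(const uint8_t *pix, int stride,
                                    int32_t *dc_acc, int32_t *ac_acc)
{
    int64_t sum = 0, err = 0;
    for (int y = 0; y < CTU; y++)
        for (int x = 0; x < CTU; x++)
            sum += pix[y * stride + x];
    int mean = (int)(sum / (CTU * CTU));
    for (int y = 0; y < CTU; y++)
        for (int x = 0; x < CTU; x++) {
            int d = pix[y * stride + x] - mean;
            err += (d >= 0) ? d : -d;
        }
    /* Quantize the block sums before accumulating (assumed Eqs. (37), (38)). */
    *dc_acc += (int32_t)((sum + OFST5) >> SHFT5);
    *ac_acc += (int32_t)((err + OFST5) >> SHFT5);
}
```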
  • Note that when the pixel average value and the pixel error value are calculated for each processing area, it is not necessary to calculate them using all the pixels of the image.
  • For example, the reference image feature quantity deriving unit 108 may calculate the pixel average value and the pixel error value by sub-sampling, such as skipping every other pixel block, or by simple sub-sampling within the pixel block in line units or slice units.
  • Further, the parameter deriving unit 110 can derive WP parameter information for each processing region. For example, when the reference image feature quantity deriving unit 108 calculates the pixel average value and the pixel error value in coding tree unit units, the reference image feature quantity is derived in the 64 × 64 pixel units shown in FIG. 3A. Since the predicted image feature quantity deriving unit 109 also derives the predicted image feature quantity in units of 64 × 64 pixels shown in FIG. 3A, the parameter deriving unit 110 can derive WP parameter information in units of coding tree units.
  • The parameter deriving unit 110 performs the weighted motion compensation prediction of Equation (7) or Equation (9) using the WP parameter information derived for each processing region, thereby realizing implicit weighted motion compensation prediction for each pixel block. For example, when the temporal pixel value change differs for each region of the video and the change is temporally continuous, the code amount can be reduced by deriving WP parameter information for each pixel block as described above and performing implicit weighted motion compensation prediction.
  • FIG. 18 is a block diagram illustrating an example of the configuration of the decoding device 800 according to the second embodiment.
  • the decoding apparatus 800 decodes encoded data stored in an input buffer (not shown) into a decoded image and outputs the decoded image to an output buffer (not shown) as an output image.
  • the encoded data is output from, for example, the encoding device 100 of FIG. 1 and the like, and is input to the decoding device 800 via a storage system, a transmission system, or a buffer (not shown).
  • The decoding device 800 includes a decoding unit 801, an inverse quantization unit 802, an inverse orthogonal transform unit 803, an addition unit 804, a predicted image generation unit 805, a reference image feature quantity deriving unit 806, a predicted image feature quantity deriving unit 807, and a parameter deriving unit 808.
  • The inverse quantization unit 802, the inverse orthogonal transform unit 803, the addition unit 804, the predicted image generation unit 805, the reference image feature quantity deriving unit 806, the predicted image feature quantity deriving unit 807, and the parameter deriving unit 808 are elements substantially the same as or similar to, respectively, the inverse quantization unit 104, the inverse orthogonal transform unit 105, the addition unit 106, the predicted image generation unit 107, the reference image feature quantity deriving unit 108, the predicted image feature quantity deriving unit 109, and the parameter deriving unit 110 of FIG. 1.
  • the decoding control unit 809 shown in FIG. 18 controls the decoding device 800 and can be realized by, for example, a CPU.
  • the decoding unit 801 performs decoding based on the syntax for each frame or field for decoding the encoded data.
  • the decoding unit 801 sequentially entropy-decodes the code string of each syntax, and reproduces the motion information including the prediction mode, the motion vector, the reference image number, and the like, and the encoding parameters of the encoding target block such as the quantization transform coefficient.
  • the encoding parameters include all parameters necessary for decoding such as information on transform coefficients and information on quantization.
  • The decoding unit 801 has a function of performing decoding processing, such as variable length decoding processing and arithmetic decoding processing, on the input encoded data.
  • For example, H.264 uses context-adaptive variable-length coding (CAVLC: Context-based Adaptive Variable Length Coding) and context-adaptive binary arithmetic coding (CABAC: Context-based Adaptive Binary Arithmetic Coding). These processes are also called decoding processes.
  • the decoding unit 801 outputs motion information, a quantized transform coefficient, and the like, inputs the quantized transform coefficient to the inverse quantization unit 802, and inputs the motion information to the predicted image generation unit 805.
  • the inverse quantization unit 802 performs an inverse quantization process on the quantized transform coefficient input from the decoding unit 801 to obtain a restored transform coefficient. Specifically, the inverse quantization unit 802 performs inverse quantization according to the quantization information used in the decoding unit 801. More specifically, the inverse quantization unit 802 multiplies the quantized transform coefficient by the quantization step size derived from the quantization information to obtain a restored transform coefficient. The inverse quantization unit 802 outputs the restored transform coefficient and inputs it to the inverse orthogonal transform unit 803.
  • the inverse orthogonal transform unit 803 performs inverse orthogonal transform corresponding to the orthogonal transform performed on the encoding side on the reconstructed transform coefficient input from the inverse quantization unit 802 to obtain a reconstructed prediction error.
  • the inverse orthogonal transform unit 803 outputs the restoration prediction error and inputs it to the addition unit 804.
  • the addition unit 804 adds the restored prediction error input from the inverse orthogonal transform unit 803 and the corresponding prediction image, and generates a decoded image.
  • the adding unit 804 outputs the decoded image and inputs the decoded image to the predicted image generation unit 805.
  • the adding unit 804 outputs the decoded image to the outside as an output image.
  • The output image is then temporarily stored in an external output buffer (not shown) or the like and, for example, output to a display device system or a video device system (not shown) in accordance with the output timing managed by the decoding control unit 809.
  • the predicted image generation unit 805 generates a predicted image using the motion information input from the decoding unit 801, the WP parameter information input from the parameter derivation unit 808, and the decoded image input from the addition unit 804.
  • The predicted image generation unit 805 includes a multi-frame motion compensation unit 201, a memory 202, a unidirectional motion compensation unit 203, a prediction parameter control unit 204, a reference image selector 205, a frame memory 206, and a reference image control unit 207.
  • the frame memory 206 stores the decoded image input from the adding unit 804 as a reference image under the control of the reference image control unit 207.
  • the frame memory 206 has a plurality of memory sets FM0 to FMN (N ⁇ 1) for temporarily storing reference images.
  • the prediction parameter control unit 204 prepares a plurality of combinations of reference image numbers and prediction parameters as a table based on the motion information input from the decoding unit 801.
  • the motion information indicates a motion vector indicating a shift amount of motion used in motion compensation prediction, a reference image number, information on a prediction mode such as unidirectional / bidirectional prediction, and the like.
  • the prediction parameter refers to information regarding a motion vector and a prediction mode. Then, the prediction parameter control unit 204 selects and outputs a combination of a reference image number and a prediction parameter used for generating a prediction image based on the motion information, inputs the reference image number to the reference image selector 205, and outputs the prediction parameter. Is input to the unidirectional motion compensation unit 203.
  • The reference image selector 205 is a switch for switching which output end of the frame memories FM0 to FMN included in the frame memory 206 is connected, according to the reference image number input from the prediction parameter control unit 204. For example, if the reference image number is 0, the reference image selector 205 connects the output end of FM0 to the output end of the reference image selector 205, and if the reference image number is N, it connects the output end of FMN to the output end of the reference image selector 205. The reference image selector 205 outputs the reference image stored in the frame memory to which its output terminal is connected among the frame memories FM0 to FMN included in the frame memory 206, and inputs it to the unidirectional motion compensation unit 203 and the reference image feature quantity deriving unit 806.
  • the unidirectional motion compensation unit 203 performs a motion compensation prediction process according to the prediction parameter input from the prediction parameter control unit 204 and the reference image input from the reference image selector 205, and generates a unidirectional prediction image. Since motion compensation prediction has already been described with reference to FIG. 5, description thereof will be omitted.
  • the unidirectional motion compensation unit 203 outputs a unidirectional prediction image and temporarily stores it in the memory 202.
  • The multi-frame motion compensation unit 201 performs weighted prediction using two types of unidirectional prediction images.
  • For this purpose, the first unidirectional prediction image is stored in the memory 202, and the second unidirectional prediction image is output directly to the multi-frame motion compensation unit 201.
  • Here, the unidirectional prediction image stored in the memory 202 is referred to as the first predicted image, and the unidirectional prediction image output directly to the multi-frame motion compensation unit 201 is referred to as the second predicted image.
  • Note that two unidirectional motion compensation units 203 may be prepared so that each generates one of the two unidirectional prediction images. In this case, the unidirectional motion compensation unit 203 may output the first unidirectional prediction image directly to the multi-frame motion compensation unit 201 as the first predicted image.
  • The multi-frame motion compensation unit 201 generates a predicted image by performing weighted prediction using the first predicted image input from the memory 202, the second predicted image input from the unidirectional motion compensation unit 203, and the WP parameter information input from the parameter deriving unit 808.
  • the multi-frame motion compensation unit 201 outputs a prediction image and inputs the prediction image to the addition unit 804.
  • the multi-frame motion compensation unit 201 includes a default motion compensation unit 301, a weighted motion compensation unit 302, a WP parameter control unit 303, and WP selectors 304 and 305.
  • the WP parameter control unit 303 outputs a WP application flag and weight information based on the WP parameter information input from the parameter derivation unit 808, inputs the WP application flag to the WP selectors 304 and 305, and weights the weight information. Input to the motion compensation unit 302.
  • The WP parameter information includes information on the fixed-point precision of the weighting factor, the first WP application flag, the first weighting factor, and the first offset corresponding to the first predicted image, and the second WP application flag, the second weighting factor, and the second offset corresponding to the second predicted image.
  • the WP application flag is a parameter that can be set for each corresponding reference image and signal component, and indicates whether to perform weighted motion compensation prediction.
  • the weight information includes information on the fixed point precision of the weight coefficient, the first weight coefficient, the first offset, the second weight coefficient, and the second offset.
  • the WP parameter information represents the same information as in the first embodiment.
  • The WP parameter control unit 303 separates the WP parameter information into a first WP application flag, a second WP application flag, and weight information, and outputs them: the first WP application flag is input to the WP selector 304, the second WP application flag is input to the WP selector 305, and the weight information is input to the weighted motion compensation unit 302.
  • the WP selectors 304 and 305 switch the connection end of each predicted image based on the WP application flag input from the WP parameter control unit 303.
  • When each WP application flag is 0, the WP selectors 304 and 305 connect their output terminals to the default motion compensation unit 301. Then, the WP selectors 304 and 305 output the first predicted image and the second predicted image and input them to the default motion compensation unit 301.
  • When each WP application flag is 1, the WP selectors 304 and 305 connect their output terminals to the weighted motion compensation unit 302, and output the first predicted image and the second predicted image to the weighted motion compensation unit 302.
  • the default motion compensation unit 301 performs an average value process based on the two unidirectional prediction images (first prediction image and second prediction image) input from the WP selectors 304 and 305 to generate a prediction image. Specifically, when the first WP application flag and the second WP application flag are 0, the default motion compensation unit 301 performs average value processing based on Expression (1).
  • In the case of unidirectional prediction, the default motion compensation unit 301 calculates the final predicted image based on Equation (4) using only the first predicted image.
  • the weighted motion compensation unit 302 is based on the two unidirectional prediction images (first prediction image and second prediction image) input from the WP selectors 304 and 305 and the weight information input from the WP parameter control unit 303. Performs weighted motion compensation. Specifically, when the first WP application flag and the second WP application flag are 1, the weighted motion compensation unit 302 performs the weighting process based on Expression (7).
  • Note that the weighted motion compensation unit 302 realizes the rounding process by controlling logWD C, which is the fixed-point precision, as in Equation (8) when the calculation accuracies of the first predicted image, the second predicted image, and the predicted image are different.
  • In the case of unidirectional prediction, the weighted motion compensation unit 302 calculates the final predicted image based on Equation (9) using only the first predicted image.
  • In this case as well, when the calculation accuracies of the first predicted image, the second predicted image, and the predicted image are different, the weighted motion compensation unit 302 realizes the rounding process by controlling logWD C, which is the fixed-point precision, as in Equation (8), in the same manner as in the case of bidirectional prediction.
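For reference, a minimal sketch of the two bi-prediction paths is given below. Equations (1), (7), and (8) are not reproduced in this excerpt, so the rounded average, the weighted sum with fixed-point precision logWD, and the offset handling follow the usual H.264-style forms and should be read as assumptions; 8-bit samples are used for simplicity and the function names are illustrative.

```c
#include <stdint.h>

static uint8_t clip_pixel(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* Default motion compensation (both WP application flags are 0):
 * rounded average of the two unidirectional predictions (assumed Eq. (1)). */
static uint8_t default_bipred(int p0, int p1)
{
    return clip_pixel((p0 + p1 + 1) >> 1);
}

/* Weighted motion compensation (both WP application flags are 1):
 * weighted sum with fixed-point precision log_wd plus offsets (assumed Eq. (7));
 * the added rounding term corresponds to the control of Eq. (8). */
static uint8_t weighted_bipred(int p0, int p1, int w0, int w1,
                               int o0, int o1, int log_wd)
{
    int round = 1 << log_wd;
    int v = ((p0 * w0 + p1 * w1 + round) >> (log_wd + 1)) + ((o0 + o1 + 1) >> 1);
    return clip_pixel(v);
}
```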
  • The reference image feature quantity deriving unit 806, the predicted image feature quantity deriving unit 807, and the parameter deriving unit 808 implicitly derive the WP parameter information corresponding to the predicted image generated by the predicted image generation unit 805, from the reference images input from the predicted image generation unit 805.
  • The reference image feature quantity deriving unit 806 derives the reference image feature quantity of each reference image input from the predicted image generation unit 805, outputs a reference image group feature quantity that summarizes the derived reference image feature quantities, and inputs it to the predicted image feature quantity deriving unit 807 and the parameter deriving unit 808.
  • the reference image feature value deriving unit 806 includes an average value calculating unit 401, an error value calculating unit 402, and an integrating unit 403.
  • The average value calculation unit 401 calculates the pixel average value of the reference image, outputs the calculated pixel average value, and inputs it to the error value calculation unit 402 and the integration unit 403.
  • the average value calculation unit 401 calculates the pixel average value of the reference image using, for example, Equation (10).
  • The error value calculation unit 402 calculates the pixel error value with respect to the pixel average value of the reference image input from the average value calculation unit 401, outputs the calculated pixel error value, and inputs it to the integration unit 403. The error value calculation unit 402 calculates the pixel error value with respect to the pixel average value of the reference image using, for example, Equation (11).
  • The integration unit 403 integrates, into a reference image feature quantity, the list number and the reference image number of the reference image, the pixel average value of the reference image input from the average value calculation unit 401, and the pixel error value with respect to the pixel average value of the reference image input from the error value calculation unit 402.
  • The integration unit 403 collects the integrated reference image feature quantities into a table as illustrated in FIG. 8, outputs the table as the reference image group feature quantity, and inputs it to the predicted image feature quantity deriving unit 807 and the parameter deriving unit 808.
  • The predicted image feature quantity deriving unit 807 derives a predicted image feature quantity based on the reference image group feature quantity input from the reference image feature quantity deriving unit 806, outputs the derived predicted image feature quantity, and inputs it to the parameter deriving unit 808.
  • the predicted image feature amount includes a pixel average value of the predicted image, a pixel error value obtained by averaging errors from the pixel average value, and a predicted image feature amount derivation flag indicating whether the predicted image feature amount has been derived.
  • the predicted image feature value deriving unit 807 includes a feature value control unit 411, a memory 412, and a predicted image feature value calculating unit 413.
  • The feature quantity control unit 411 selects and outputs two reference image feature quantities used for deriving the predicted image feature quantity from the reference image group feature quantity input from the reference image feature quantity deriving unit 806, inputs (loads) one of them into the memory 412, and inputs the other to the predicted image feature quantity calculation unit 413.
  • Specifically, the feature quantity control unit 411 derives, from the list number and the reference image number of each reference image feature quantity collected in the reference image group feature quantity, the POC (Picture Order Count) indicating the display order of the reference image of that reference image feature quantity.
  • The reference list and the reference image number are information for indirectly specifying the reference image, whereas the POC is information for directly specifying the reference image and corresponds to the absolute position of the reference image.
  • the feature amount control unit 411 selects two POCs from the derived POC in the order of the shortest distance from the encoding target image (predicted image), thereby calculating the predicted image feature amount from the reference image group feature amount. Two reference image feature quantities used for derivation are selected.
  • the feature amount control unit 411 selects two POCs (POC1 and POC2) using Equation (12) when the encoded slice is P-slice.
  • the feature quantity control unit 411 selects two POCs (POC1 and POC2) using Expression (13) when the encoded slice is B-slice.
  • the feature amount control unit 411 reselects one POC (POC2) using Expression (14).
  • the feature quantity control unit 411 can select two POCs having different temporal distances from the encoding target image (POC2 having a temporal distance different from that of POC1) from the encoding target image by repeatedly performing Expression (14).
  • the feature quantity control unit 411 selects and outputs a reference image feature quantity corresponding to POC1 from the reference image group feature quantity, and inputs it to the memory 412. Further, when POC2 is selected, the feature amount control unit 411 selects and outputs a reference image feature amount corresponding to POC2 from the reference image group feature amount, and inputs it to the predicted image feature amount calculation unit 413.
  • Although the case where the feature quantity control unit 411 selects two POCs has been described here, three or more POCs may be selected.
  • In this case, the feature quantity control unit 411 selects three or more reference image feature quantities from the reference image group feature quantity, and the predicted image feature quantity is derived from the selected three or more reference image feature quantities. Note that, in the case of P-slice, it is necessary that N ≥ 2, and the feature quantity control unit 411 may search the reference list with the list number 0 after executing Equation (12).
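The selection of the two temporally nearest, mutually distinct POCs can be illustrated as follows. Since Equations (12) to (14) are not reproduced in this excerpt, this is only an equivalent sketch with illustrative names, not the SortRefPOC function itself.

```c
#include <stdlib.h>

/* Pick POC1 as the reference POC closest to the current image, then POC2 as
 * the closest remaining POC whose value differs from POC1 (illustrative
 * equivalent of the behavior described for Equations (12)-(14)). */
static void select_two_pocs(const int *ref_poc, int num_refs, int cur_poc,
                            int *poc1, int *poc2)
{
    int best = 0, second = -1;
    for (int i = 1; i < num_refs; i++)
        if (abs(ref_poc[i] - cur_poc) < abs(ref_poc[best] - cur_poc))
            best = i;
    for (int i = 0; i < num_refs; i++) {
        if (ref_poc[i] == ref_poc[best])
            continue;                        /* keep POC2 distinct from POC1 */
        if (second < 0 ||
            abs(ref_poc[i] - cur_poc) < abs(ref_poc[second] - cur_poc))
            second = i;
    }
    *poc1 = ref_poc[best];
    *poc2 = (second >= 0) ? ref_poc[second] : ref_poc[best];
}
```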
  • The memory 412 holds the reference image feature quantity input from the feature quantity control unit 411 (the reference image feature quantity corresponding to POC1, hereinafter referred to as the first reference image feature quantity). Then, when the reference image feature quantity corresponding to POC2 (hereinafter referred to as the second reference image feature quantity) is output from the feature quantity control unit 411 to the predicted image feature quantity calculation unit 413, the memory 412 outputs the held first reference image feature quantity and inputs it to the predicted image feature quantity calculation unit 413.
  • The predicted image feature quantity calculation unit 413 calculates a predicted image feature quantity using the first reference image feature quantity input from the memory 412 and the second reference image feature quantity input from the feature quantity control unit 411, outputs the calculated predicted image feature quantity, and inputs it to the parameter deriving unit 808.
  • the predicted image feature amount calculation unit 413 first calculates the temporal distance between the reference image of the first reference image feature amount and the reference image of the second reference image feature amount according to Equation (15).
  • the predicted image feature value calculation unit 413 calculates the pixel average value of the predicted image according to Equation (19), and calculates the pixel error value of the predicted image according to Equation (20).
  • When DistScaleFactor satisfies any of the conditions of Equations (23) to (25), the predicted image feature quantity calculation unit 413 determines that the predicted image feature quantity cannot be derived or that the temporal distance is too far, and sets the pixel average value DCP of the predicted image and the pixel error value ACP of the predicted image to their initial values.
  • In this case, the predicted image feature quantity calculation unit 413 sets the predicted image feature quantity derivation flag wp_avaiable_flag, which is an internal variable, to false; if DistScaleFactor does not satisfy any of the conditions of Equations (23) to (25), wp_avaiable_flag is set to true.
  • When the predicted image feature quantity derivation flag wp_avaiable_flag is set to false, initial values are set in DCP and ACP; for example, DefaultDC indicating 0 is set in DCP, and DefaultAC indicating 0 is set in ACP.
  • When the predicted image feature quantity derivation flag is set to true, the values calculated by Equation (19) and Equation (20) are set in DCP and ACP, respectively.
  • the parameter deriving unit 808 uses the reference image group feature value input from the reference image feature value deriving unit 806 and the predicted image feature value input from the predicted image feature value deriving unit 807 to use the WP parameter of the encoding target image. Deriving information.
  • The parameter deriving unit 808 first checks the predicted image feature quantity derivation flag wp_avaiable_flag included in the predicted image feature quantity to determine whether WP parameter information can be derived. When wp_avaiable_flag is set to false, the WP parameter information cannot be derived, so the parameter deriving unit 808 sets the weighting coefficient and the offset corresponding to the list number X and the reference image number Y to their initial values according to Equation (26) and Equation (27).
  • When wp_avaiable_flag is set to false, the parameter deriving unit 808 repeatedly executes Equation (26) and Equation (27) for all combinations of list numbers X and reference image numbers Y (all reference images), thereby setting the initial values in the WP parameter information.
  • the parameter deriving unit 808 sets the WP application flag (WP_flag [X] [Y]) corresponding to the list number X and the reference image number Y to false when wp_avaiable_flag is set to false.
  • On the other hand, when wp_avaiable_flag is set to true, the parameter deriving unit 808 derives the weighting coefficient and the offset corresponding to the list number X and the reference image number Y according to Equation (29) and Equation (30), respectively.
  • When wp_avaiable_flag is set to true, the parameter deriving unit 808 repeatedly executes Equation (29) and Equation (30) for all combinations of list numbers X and reference image numbers Y (all reference images), thereby deriving the WP parameter information shown in FIGS. 11A and 11B.
  • the parameter deriving unit 808 sets the WP application flag (WP_flag [X] [Y]) corresponding to the list number X and the reference image number Y to true when wp_avaiable_flag is set to true.
  • The decoding unit 801 uses the syntax 500 shown in FIG. 15.
  • a syntax 500 indicates a structure of encoded data to be decoded by the decoding unit 801.
  • the syntax 500 has already been described with reference to FIG.
  • the picture parameter set syntax 505 has already been described with reference to FIG. 16 except that encoding is decoding, and thus description thereof will be omitted.
  • the sequence parameter set syntax 504 has already been described with reference to FIG. 17 except that encoding is decoding, and thus description thereof will be omitted.
  • As described above, when performing weighted motion compensation prediction, the decoding device 800 of the second embodiment uses an implicit mode that does not explicitly encode the weight parameters between two reference images, so the code amount can be reduced and the coding efficiency can be improved.
  • encoding and decoding may be performed in order from the lower right to the upper left, or encoding and decoding may be performed so as to draw a spiral from the center of the screen toward the screen end.
  • encoding and decoding may be performed in order from the upper right to the lower left, or encoding and decoding may be performed so as to draw a spiral from the screen end toward the center of the screen.
  • Since the position of the adjacent pixel block that can be referred to changes depending on the encoding order, the position may be changed to an appropriately usable position.
  • The prediction target blocks do not have to have a uniform block shape.
  • the prediction target block size may be a 16 ⁇ 8 pixel block, an 8 ⁇ 16 pixel block, an 8 ⁇ 4 pixel block, a 4 ⁇ 8 pixel block, or the like.
  • the code amount for encoding or decoding the division information increases as the number of divisions increases. Therefore, it is desirable to select the block size in consideration of the balance between the code amount of the division information and the quality of the locally decoded image or the decoded image.
  • In the above embodiments, the color signal components have been described without distinguishing the prediction processes for the luminance signal and the color difference signal; however, the same or different prediction methods may be used for them.
  • If different prediction methods are used for the luminance signal and the color difference signal, the prediction method selected for the color difference signal can be encoded or decoded in the same manner as for the luminance signal.
  • Similarly, when the weighted motion compensation prediction process differs between the luminance signal and the color difference signal, the same or a different weighted motion compensation prediction process may be used. If a different weighted motion compensation prediction process is used for the color difference signal, the process selected for the color difference signal can be encoded or decoded in the same manner as for the luminance signal.
  • Syntax elements not specified in the present embodiment can be inserted between the rows of the tables shown in the syntax configuration, and descriptions regarding other conditional branches may also be included.
  • the syntax table can be divided and integrated into a plurality of tables. Moreover, it is not always necessary to use the same term, and it may be arbitrarily changed depending on the form to be used.
  • Modifications 1 to 4 of the first embodiment may be applied to the second embodiment.
  • As described above, each of the above embodiments solves the problem that an offset for additively correcting the pixel value cannot be used when performing implicit weighted motion compensation prediction, and realizes highly efficient implicit weighted motion compensation prediction processing. Therefore, according to each of the above embodiments, the encoding efficiency is improved, and the subjective image quality is also improved.
  • For example, the program that realizes the processing of each of the above embodiments can be stored in a computer-readable storage medium such as a magnetic disk, an optical disc (CD-ROM, CD-R, DVD, etc.), a magneto-optical disc (MO, etc.), or a semiconductor memory. The storage format may be any form.
  • the program for realizing the processing of each of the above embodiments may be stored on a computer (server) connected to a network such as the Internet and downloaded to the computer (client) via the network.
  • DESCRIPTION OF SYMBOLS
  100 Encoding device
  101 Subtraction unit
  102 Orthogonal transformation unit
  103 Quantization unit
  104 Inverse quantization unit
  105 Inverse orthogonal transformation unit
  106 Addition unit
  107 Predicted image generation unit
  108 Reference image feature quantity deriving unit
  109 Predicted image feature quantity deriving unit
  110 Parameter deriving unit
  111 Motion evaluation unit
  112 Encoding unit
  113 Encoding control unit
  201 Multi-frame motion compensation unit
  202 Memory
  203 Unidirectional motion compensation unit
  204 Prediction parameter control unit
  205 Reference image selector
  206 Frame memory
  207 Reference image control unit
  301 Default motion compensation unit
  302 Weighted motion compensation unit
  303 WP parameter control unit
  304, 305 WP selector
  401 Average value calculation unit
  402 Error value calculation unit
  403 Integration unit
  411 Feature quantity control unit
  412 Memory
  413 Predicted image feature quantity calculation unit
  800 Decoding device
  801 Decoding unit
  802 Inverse quantization unit
  803 Inverse orthogonal transform unit
  804 Addition unit
  805 Predicted image generation unit
  806 Reference image feature quantity deriving unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The prediction image generation method of an embodiment comprises first to third derivation steps and a prediction image generation step. In the first derivation step, a pixel average value and a pixel error value of each of two or more reference images are derived. In the second derivation step, a pixel average value of a prediction image is derived using the time-distance ratio between at least two reference images and the prediction image and the pixel average values of the at least two reference images, and the pixel error value of the prediction image is derived using the time-distance ratio and the pixel error values of the at least two reference images. In the third derivation step, the weight coefficients of the reference images are derived using the pixel error values of the reference images and the pixel error value of the prediction image, and the offsets of the reference images are derived using the derived weight coefficients, the pixel average values of the reference images, and the pixel average value of the prediction image. In the prediction image generation step, using a reference image of one target block obtained by dividing an input image into a plurality of blocks, the weight coefficients, and the offsets, the prediction image of the target block is generated.

Description

予測画像生成方法、符号化方法及び復号方法Predictive image generation method, encoding method, and decoding method
 本発明の実施形態は、予測画像生成方法、符号化方法及び復号方法に関する。 Embodiments of the present invention relate to a predicted image generation method, an encoding method, and a decoding method.
 近年、符号化効率を大幅に向上させた画像符号化方法が、ITU-T(International Telecommunication Union Telecommunication Standardization Sector)とISO(International Organization for Standardization)/IEC(International Electrotechnical Commission)との共同で、ITU-T REC. H.264及びISO/IEC 14496-10(以下、「H.264」という)として勧告されている。 In recent years, image coding methods that have greatly improved coding efficiency have been jointly developed by ITU-T (International Telecommunication Union Telecommunication Standardization Sector) and ISO (International Organization for Standardization) / IEC (International Electrotechnical Commission). T REC. H. H.264 and ISO / IEC 14496-10 (hereinafter referred to as “H.264”).
 H.264には、符号化済みの画像を参照画像に用いて分数精度の動き補償予測を行うことにより、時間方向の冗長性を削除し、高い符号化効率を実現するインター予測符号化方式が開示されている。 H. H.264 discloses an inter-prediction coding scheme that eliminates temporal redundancy and realizes high coding efficiency by performing fractional accuracy motion compensated prediction using a coded image as a reference image. ing.
 また、ISO/IEC MPEG(Moving Picture Experts Group)-1,2,4におけるインター予測符号化方式よりも、フェードやディゾルブ効果を含む動画像を高効率に符号化する暗黙的動き補償予測方式も提案されている。この方式では、時間方向における画素値変化を予測する枠組みとして、輝度と2つの色差とを有する入力動画像に対して分数精度の動き補償予測を行う。そして、参照画像と輝度及び2つの色差毎の重み係数とを、参照画像間の時間的距離比によって暗黙的に導出し、予測画像に重み係数を乗じる。 Also proposed is an implicit motion compensation prediction method that encodes moving images including fade and dissolve effects more efficiently than the inter prediction encoding methods in ISO / IEC MPEG (Moving Picture Experts Group) -1, 2, 4 Has been. In this method, as a framework for predicting a change in pixel value in the time direction, motion compensation prediction with fractional accuracy is performed on an input moving image having luminance and two color differences. Then, the reference image, the luminance and the weighting factor for each of the two color differences are implicitly derived by the temporal distance ratio between the reference images, and the prediction image is multiplied by the weighting factor.
特開2004-179687号公報JP 2004-179687 A
 しかしながら、上述したような従来技術では、参照画像の時間的距離比によって暗黙的に導出した重み係数を動き補償予測後の画素に乗じることで重み付き動き補償予測を実現していたが、画像平均値のズレ分を補正するオフセット項が考慮されないため、符号化効率の低下を招いていた。本発明が解決しようとする課題は、符号化効率を向上できる予測画像生成方法、符号化方法及び復号方法を提供することである。 However, in the related art as described above, weighted motion compensation prediction is realized by multiplying a pixel after motion compensation prediction by a weighting factor implicitly derived by the temporal distance ratio of the reference image. Since the offset term for correcting the deviation of the value is not taken into account, the encoding efficiency is reduced. The problem to be solved by the present invention is to provide a prediction image generation method, an encoding method, and a decoding method capable of improving the encoding efficiency.
 実施形態の予測画像生成方法は、第1導出ステップと、第2導出ステップと、第3導出ステップと、予測画像生成ステップと、を含む。第1導出ステップでは、2以上の参照画像それぞれの画素平均値及び当該画素平均値との画素の差分を示す画素誤差値を導出する。第2導出ステップでは、前記2以上の参照画像のうちの少なくとも2つの参照画像と予測画像との時間距離比及び前記少なくとも2つの参照画像の前記画素平均値を用いて前記予測画像の画素平均値を導出するとともに、前記時間距離比及び前記少なくとも2つの参照画像の前記画素誤差値を用いて前記予測画像の画素誤差値を導出する。第3導出ステップでは、前記参照画像の前記画素誤差値と前記予測画像の前記画素誤差値とを用いて前記参照画像の重み係数を導出するとともに、導出した前記重み係数と前記参照画像の前記画素平均値と前記予測画像の前記画素平均値とを用いて前記参照画像のオフセットを導出する。予測画像生成ステップでは、前記参照画像のうち入力画像を複数のブロックに分割した1つの対象ブロックの参照画像、当該参照画像の前記重み係数、及び当該参照画像の前記オフセットを用いて、前記対象ブロックの前記予測画像を生成する。 The predicted image generation method of the embodiment includes a first derivation step, a second derivation step, a third derivation step, and a predicted image generation step. In the first derivation step, a pixel error value indicating a pixel average value of each of two or more reference images and a pixel difference from the pixel average value is derived. In the second derivation step, a pixel average value of the predicted image using a temporal distance ratio between at least two reference images and the predicted image of the two or more reference images and the pixel average value of the at least two reference images. And the pixel error value of the predicted image is derived using the temporal distance ratio and the pixel error value of the at least two reference images. In the third derivation step, the weight coefficient of the reference image is derived using the pixel error value of the reference image and the pixel error value of the predicted image, and the derived weight coefficient and the pixel of the reference image The offset of the reference image is derived using the average value and the pixel average value of the predicted image. In the predicted image generation step, using the reference image of one target block obtained by dividing the input image of the reference image into a plurality of blocks, the weighting coefficient of the reference image, and the offset of the reference image, the target block The predicted image is generated.
Brief description of the drawings:
A block diagram showing an example of the encoding device of the first embodiment.
An explanatory diagram showing an example of the predictive coding order of pixel blocks in the first embodiment.
A diagram showing an example of the block size of a coding tree block in the first embodiment.
Diagrams showing specific examples of the coding tree block of the first embodiment (three figures).
A block diagram showing an example of the predicted image generation unit of the first embodiment.
A diagram showing an example of the relationship of motion vectors in motion-compensated prediction for bidirectional prediction in the first embodiment.
A block diagram showing an example of the multi-frame motion compensation unit of the first embodiment.
An explanatory diagram of an example of the fixed-point precision of the weighting factor in the first embodiment.
A diagram showing an example of the reference image group feature amount of the first embodiment.
A block diagram showing a configuration example of the reference image feature amount derivation unit of the first embodiment.
A block diagram showing a configuration example of the predicted image feature amount derivation unit of the first embodiment.
Diagrams showing examples of the WP parameter information of the first embodiment (two figures).
A flowchart showing an example of the reference image group feature amount derivation processing of the first embodiment.
A flowchart showing an example of the predicted image feature amount derivation processing of the first embodiment.
A flowchart showing an example of the WP parameter information derivation processing of the first embodiment.
A diagram showing an example of the syntax of the first embodiment.
A diagram showing an example of the picture parameter set syntax of the first embodiment.
A diagram showing an example of the sequence parameter set syntax of the first embodiment.
A block diagram showing a configuration example of the decoding device of the second embodiment.
Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The encoding device and the decoding device of each of the following embodiments can be implemented in hardware such as an LSI (Large-Scale Integration) chip, a DSP (Digital Signal Processor), or an FPGA (Field Programmable Gate Array). The encoding device and the decoding device of each of the following embodiments can also be realized in software by causing a computer to execute a program. In the following description, the term "image" can be read, as appropriate, as "video", "pixel", "image signal", "picture", or "image data".
(First Embodiment)
In the first embodiment, an encoding device that encodes a moving image is described.
FIG. 1 is a block diagram showing an example of the configuration of the encoding device 100 of the first embodiment.
The encoding device 100 divides each frame or field constituting the input image into a plurality of pixel blocks and performs predictive coding on the divided pixel blocks using the encoding parameters input from the encoding control unit 113, thereby generating a predicted image. The encoding device 100 then subtracts the predicted image from the input image divided into the plurality of pixel blocks to generate a prediction error, applies an orthogonal transform and quantization to the generated prediction error, further performs entropy coding, and generates and outputs encoded data.
The encoding device 100 performs predictive coding by selectively applying a plurality of prediction modes that differ in at least one of the pixel-block size and the predicted image generation method. The predicted image generation methods are broadly classified into two types: intra prediction, which performs prediction within the frame being encoded, and inter prediction, which performs motion-compensated prediction using one or more temporally different reference frames. Intra prediction is also called intra-picture prediction or intra-frame prediction, and inter prediction is also called inter-picture prediction, inter-frame prediction, or motion-compensated prediction.
FIG. 2 is an explanatory diagram showing an example of the predictive coding order of pixel blocks in the first embodiment. In the example shown in FIG. 2, the encoding device 100 performs predictive coding from the upper left toward the lower right of the pixel blocks, so in the frame f being encoded the already-encoded pixel blocks p lie to the left of and above the pixel block c being encoded. In the following, for simplicity, the encoding device 100 is assumed to perform predictive coding in the order shown in FIG. 2, but the predictive coding order is not limited to this.
A pixel block denotes a unit for processing an image and corresponds to, for example, an M×N block (M and N are natural numbers), a coding tree block, a macroblock, a sub-block, or a single pixel. In the following description, a pixel block is basically used in the sense of a coding tree block, but it may be used in other senses as well. For example, in the description of a prediction unit, a pixel block is used in the sense of the pixel block of the prediction unit. A block may also be referred to by a name such as a unit; for example, a coding block may be called a coding unit.
FIG. 3A is a diagram showing an example of the block size of a coding tree block in the first embodiment. A coding tree block is typically a 64×64 pixel block as shown in FIG. 3A. However, it is not limited to this, and it may be a 32×32 pixel block, a 16×16 pixel block, an 8×8 pixel block, a 4×4 pixel block, or the like. The coding tree block need not be square; for example, it may be a pixel block of size M×N (M≠N).
FIGS. 3B to 3D are diagrams showing specific examples of the coding tree block of the first embodiment. FIG. 3B shows a coding tree block whose block size is 64×64 (N=32). N represents the size of the reference coding tree block; the size when divided is defined as N, and the size when not divided is defined as 2N. FIG. 3C shows a coding tree block obtained by quadtree partitioning of the coding tree block in FIG. 3B. As shown in FIG. 3C, the coding tree block has a quadtree structure. When the coding tree block is divided, the four pixel blocks after division are numbered in Z-scan order, as shown in FIG. 3C.
A coding tree block can be further quadtree-partitioned within one quadtree index, which allows coding tree blocks to be divided hierarchically. In this case, the depth of the division is defined by Depth. FIG. 3D shows one of the coding tree blocks obtained by quadtree partitioning of the coding tree block in FIG. 3B; its block size is 32×32 (N=16). The Depth of the coding tree block shown in FIG. 3B is 0, and the Depth of the coding tree block shown in FIG. 3D is 1. The coding tree block with the largest unit is called a large coding tree block, and the input image signal is encoded in raster-scan order in units of large coding tree blocks.
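As a small illustration of the hierarchical division described above, the following sketch (an assumption for illustration only; the helper name is not taken from this description) computes the block size at a given Depth, starting from a 64×64 large coding tree block as in FIG. 3B (Depth = 0) and FIG. 3D (Depth = 1).

#include <cstdio>

// Each quadtree split halves the width and height of the coding tree block.
int ctbSizeAtDepth(int largeCtbSize, int depth) {
    return largeCtbSize >> depth;
}

int main() {
    const int largeCtb = 64;                   // large coding tree block: 64x64
    for (int depth = 0; depth <= 4; ++depth) { // 64, 32, 16, 8, 4
        int s = ctbSizeAtDepth(largeCtb, depth);
        std::printf("Depth %d -> %dx%d\n", depth, s, s);
    }
    return 0;
}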
In the following description, the block to be encoded or the coding tree block of the input image may also be referred to as a prediction target block or a prediction pixel block. The coding unit is not limited to a pixel block; at least one of a frame, a field, a slice, a line, and a pixel can also be used.
As shown in FIG. 1, the encoding device 100 includes a subtraction unit 101, an orthogonal transform unit 102, a quantization unit 103, an inverse quantization unit 104, an inverse orthogonal transform unit 105, an addition unit 106, a predicted image generation unit 107, a reference image feature amount derivation unit 108, a predicted image feature amount derivation unit 109, a parameter derivation unit 110, a motion evaluation unit 111, and an encoding unit 112. The encoding control unit 113 shown in FIG. 1 controls the encoding device 100 and can be realized by, for example, a CPU (Central Processing Unit).
The subtraction unit 101 subtracts the corresponding predicted image from the input image divided into pixel blocks to obtain a prediction error. The subtraction unit 101 outputs the prediction error and inputs it to the orthogonal transform unit 102.
The orthogonal transform unit 102 applies an orthogonal transform such as a discrete cosine transform (DCT) or a discrete sine transform (DST) to the prediction error input from the subtraction unit 101 to obtain transform coefficients. The orthogonal transform unit 102 outputs the transform coefficients and inputs them to the quantization unit 103.
The quantization unit 103 quantizes the transform coefficients input from the orthogonal transform unit 102 to obtain quantized transform coefficients. Specifically, the quantization unit 103 performs quantization in accordance with quantization information such as a quantization parameter and a quantization matrix specified by the encoding control unit 113. More specifically, the quantization unit 103 divides the transform coefficients by the quantization step size derived from the quantization information to obtain the quantized transform coefficients. The quantization parameter indicates the granularity of quantization, and the quantization matrix is used to weight that granularity for each component of the transform coefficients. The quantization unit 103 outputs the quantized transform coefficients and inputs them to the inverse quantization unit 104 and the encoding unit 112.
The inverse quantization unit 104 inverse-quantizes the quantized transform coefficients input from the quantization unit 103 to obtain restored transform coefficients. Specifically, the inverse quantization unit 104 performs inverse quantization in accordance with the quantization information used by the quantization unit 103. More specifically, the inverse quantization unit 104 multiplies the quantized transform coefficients by the quantization step size derived from the quantization information to obtain the restored transform coefficients. The quantization information used by the quantization unit 103 is loaded from an internal memory (not shown) of the encoding control unit 113. The inverse quantization unit 104 outputs the restored transform coefficients and inputs them to the inverse orthogonal transform unit 105.
The inverse orthogonal transform unit 105 applies an inverse orthogonal transform such as an inverse discrete cosine transform (IDCT) or an inverse discrete sine transform (IDST) to the restored transform coefficients input from the inverse quantization unit 104 to obtain a restored prediction error. The inverse orthogonal transform performed by the inverse orthogonal transform unit 105 corresponds to the orthogonal transform performed by the orthogonal transform unit 102. The inverse orthogonal transform unit 105 outputs the restored prediction error and inputs it to the addition unit 106.
The addition unit 106 adds the restored prediction error input from the inverse orthogonal transform unit 105 and the corresponding predicted image to generate a locally decoded image. The addition unit 106 outputs the locally decoded image and inputs it to the predicted image generation unit 107.
The predicted image generation unit 107 stores the locally decoded image input from the addition unit 106 in a memory (not shown in FIG. 1) as a reference image, and outputs the reference image stored in the memory to the reference image feature amount derivation unit 108 and the motion evaluation unit 111. The predicted image generation unit 107 also performs weighted motion-compensated prediction based on the WP parameter information input from the parameter derivation unit 110 and the motion information input from the motion evaluation unit 111, and generates a predicted image. The predicted image generation unit 107 outputs the predicted image and inputs it to the subtraction unit 101 and the addition unit 106.
FIG. 4 is a block diagram showing an example of the configuration of the predicted image generation unit 107 of the first embodiment. As shown in FIG. 4, the predicted image generation unit 107 includes a multi-frame motion compensation unit 201, a memory 202, a unidirectional motion compensation unit 203, a prediction parameter control unit 204, a reference image selector 205, a frame memory 206, and a reference image control unit 207.
The frame memory 206 stores the locally decoded image input from the addition unit 106 as a reference image under the control of the reference image control unit 207. The frame memory 206 has a plurality of memory sets FM0 to FMN (N≥1) for temporarily holding reference images.
The prediction parameter control unit 204 prepares, as a table, a plurality of combinations of a reference image number and prediction parameters based on the motion information input from the motion evaluation unit 111. Here, the motion information refers to a motion vector indicating the amount of motion displacement used in motion-compensated prediction, the reference image number, and information on the prediction mode such as unidirectional/bidirectional prediction. The prediction parameters refer to information on the motion vector and the prediction mode. Based on the input image, the prediction parameter control unit 204 selects and outputs the combination of the reference image number of the reference image used for generating the predicted image and the prediction parameters, inputs the reference image number to the reference image selector 205, and inputs the prediction parameters to the unidirectional motion compensation unit 203.
The reference image selector 205 is a switch that selects which output terminal of the frame memories FM0 to FMN in the frame memory 206 to connect, according to the reference image number input from the prediction parameter control unit 204. For example, if the reference image number is 0, the reference image selector 205 connects the output terminal of FM0 to its own output terminal, and if the reference image number is N, it connects the output terminal of FMN to its own output terminal. The reference image selector 205 outputs the reference image stored in the frame memory whose output terminal is connected, among the frame memories FM0 to FMN, and inputs it to the unidirectional motion compensation unit 203, the reference image feature amount derivation unit 108, and the motion evaluation unit 111.
The unidirectional motion compensation unit 203 performs motion-compensated prediction processing in accordance with the prediction parameters input from the prediction parameter control unit 204 and the reference image input from the reference image selector 205, and generates a unidirectional predicted image.
FIG. 5 is a diagram showing an example of the relationship of motion vectors in motion-compensated prediction for bidirectional prediction in the first embodiment. In motion-compensated prediction, interpolation processing is performed using a reference image, and a unidirectional predicted image is generated based on the amount of motion displacement, from the pixel block at the position being encoded, between the generated interpolated image and the input image. Here, the displacement amount is a motion vector. As shown in FIG. 5, in a bidirectional prediction slice (B-slice), a predicted image is generated using two kinds of reference images and a set of motion vectors. As the interpolation processing, half-pel interpolation, quarter-pel interpolation, or the like is used, and the values of the interpolated image are generated by filtering the reference image. For example, in H.264, which can interpolate the luminance signal up to quarter-pel accuracy, the displacement amount is expressed at four times integer-pel accuracy.
The unidirectional motion compensation unit 203 outputs the unidirectional predicted image and temporarily stores it in the memory 202. When the motion information (prediction parameters) indicates bidirectional prediction, the multi-frame motion compensation unit 201 performs weighted prediction using two unidirectional predicted images; therefore, the unidirectional motion compensation unit 203 stores the unidirectional predicted image corresponding to the first one in the memory 202 and outputs the unidirectional predicted image corresponding to the second one directly to the multi-frame motion compensation unit 201. Here, the unidirectional predicted image corresponding to the first one is referred to as the first predicted image, and the unidirectional predicted image corresponding to the second one is referred to as the second predicted image.
Two unidirectional motion compensation units 203 may be provided so that the two unidirectional predicted images are generated by the respective units. In this case, when the motion information (prediction parameters) indicates unidirectional prediction, the unidirectional motion compensation unit 203 may output the first unidirectional predicted image directly to the multi-frame motion compensation unit 201 as the first predicted image.
The multi-frame motion compensation unit 201 performs weighted prediction using the first predicted image input from the memory 202, the second predicted image input from the unidirectional motion compensation unit 203, and the WP parameter information input from the motion evaluation unit 111, and generates a predicted image. The multi-frame motion compensation unit 201 outputs the predicted image and inputs it to the subtraction unit 101 and the addition unit 106.
FIG. 6 is a block diagram showing an example of the configuration of the multi-frame motion compensation unit 201 of the first embodiment. As shown in FIG. 6, the multi-frame motion compensation unit 201 includes a default motion compensation unit 301, a weighted motion compensation unit 302, a WP parameter control unit 303, and WP selectors 304 and 305.
The WP parameter control unit 303 outputs a WP application flag and weight information based on the WP parameter information input from the parameter derivation unit 110, inputs the WP application flag to the WP selectors 304 and 305, and inputs the weight information to the weighted motion compensation unit 302.
Here, the WP parameter information includes the fixed-point precision of the weighting factors; the first WP application flag, first weighting factor, and first offset corresponding to the first predicted image; and the second WP application flag, second weighting factor, and second offset corresponding to the second predicted image. The WP application flag is a parameter that can be set for each corresponding reference image and signal component, and indicates whether weighted motion-compensated prediction is performed. The weight information includes the fixed-point precision of the weighting factors, the first weighting factor, the first offset, the second weighting factor, and the second offset.
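As a minimal sketch of how the WP parameter information listed above could be organized in memory (the struct and field names are assumptions for illustration and are not taken from this description), one possible layout is:

struct WpEntry {
    bool wpApplyFlag;   // WP application flag: whether weighted motion-compensated prediction is used
    int  weight;        // weighting factor (fixed-point)
    int  offset;        // offset
};

struct WpParamInfo {
    int     logWdC;     // fixed-point precision of the weighting factors
    WpEntry first;      // first WP application flag, first weighting factor, first offset
    WpEntry second;     // second WP application flag, second weighting factor, second offset
};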
Specifically, when the WP parameter information is input from the parameter derivation unit 110, the WP parameter control unit 303 separates the WP parameter information into the first WP application flag, the second WP application flag, and the weight information and outputs them, inputting the first WP application flag to the WP selector 304, the second WP application flag to the WP selector 305, and the weight information to the weighted motion compensation unit 302.
The WP selectors 304 and 305 switch the connection destinations of their respective predicted images based on the WP application flags input from the WP parameter control unit 303. When the respective WP application flag is 0, the WP selectors 304 and 305 connect their output terminals to the default motion compensation unit 301 and output the first predicted image and the second predicted image to the default motion compensation unit 301. When the respective WP application flag is 1, the WP selectors 304 and 305 connect their output terminals to the weighted motion compensation unit 302 and output the first predicted image and the second predicted image to the weighted motion compensation unit 302.
The default motion compensation unit 301 performs average-value processing based on the two unidirectional predicted images (the first predicted image and the second predicted image) input from the WP selectors 304 and 305, and generates a predicted image. Specifically, when the first WP application flag and the second WP application flag are 0, the default motion compensation unit 301 performs average-value processing based on Equation (1).
P[x,y] = Clip1((PL0[x,y] + PL1[x,y] + offset2) >> (shift2))   …(1)
Here, P[x,y] is the predicted image, PL0[x,y] is the first predicted image, and PL1[x,y] is the second predicted image. offset2 and shift2 are parameters for the rounding in the average-value processing and are determined by the internal arithmetic precision of the first and second predicted images. With the bit precision of the predicted image being L and the bit precision of the first and second predicted images being M (L≤M), shift2 is given by Equation (2) and offset2 by Equation (3).
shift2 = (M - L + 1)   …(2)
offset2 = (1 << (shift2 - 1))   …(3)
For example, when the bit precision of the predicted image is 8 and the bit precision of the first and second predicted images is 14, shift2 = 7 from Equation (2) and offset2 = (1 << 6) from Equation (3).
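A minimal sketch of the average-value processing of Equations (1) to (3), using the example values above (output bit precision L = 8, internal bit precision M = 14), might look as follows; the function names and the clip1 helper are illustrative assumptions, not part of this description.

#include <algorithm>

// Clip1: clip to the valid range of the 8-bit predicted image.
static int clip1(int v) { return std::min(std::max(v, 0), 255); }

// Equation (1): default bidirectional prediction by rounding average.
int defaultBiPrediction(int pl0, int pl1) {   // pl0, pl1: 14-bit intermediate predictions
    const int L = 8, M = 14;
    const int shift2  = M - L + 1;            // Equation (2): 7
    const int offset2 = 1 << (shift2 - 1);    // Equation (3): 1 << 6
    return clip1((pl0 + pl1 + offset2) >> shift2);
}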
When the prediction mode indicated by the motion information (prediction parameters) is unidirectional prediction, the default motion compensation unit 301 calculates the final predicted image based on Equation (4) using only the first predicted image.
P[x,y] = Clip1((PLX[x,y] + offset1) >> (shift1))   …(4)
Here, PLX[x,y] denotes the unidirectional predicted image (the first predicted image), and X is an identifier indicating the list number of the reference list, taking the value 0 or 1. For example, PL0[x,y] is used when the list number is 0, and PL1[x,y] when the list number is 1. offset1 and shift1 are rounding parameters and are determined by the internal arithmetic precision of the first predicted image. With the bit precision of the predicted image being L and the bit precision of the first predicted image being M (L≤M), shift1 is given by Equation (5) and offset1 by Equation (6).
shift1 = (M - L)   …(5)
offset1 = (1 << (shift1 - 1))   …(6)
For example, when the bit precision of the predicted image is 8 and the bit precision of the first predicted image is 14, shift1 = 6 from Equation (5) and offset1 = (1 << 5) from Equation (6).
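The unidirectional case of Equations (4) to (6) differs only in the rounding parameters; a corresponding sketch, reusing the illustrative clip1 helper from the previous fragment, is:

// Equation (4): default unidirectional prediction, with L = 8 and M = 14.
int defaultUniPrediction(int plx) {           // plx: 14-bit intermediate prediction PLX[x,y]
    const int L = 8, M = 14;
    const int shift1  = M - L;                // Equation (5): 6
    const int offset1 = 1 << (shift1 - 1);    // Equation (6): 1 << 5
    return clip1((plx + offset1) >> shift1);
}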
The weighted motion compensation unit 302 performs weighted motion compensation based on the two unidirectional predicted images (the first predicted image and the second predicted image) input from the WP selectors 304 and 305 and the weight information input from the WP parameter control unit 303. Specifically, when the first WP application flag and the second WP application flag are 1, the weighted motion compensation unit 302 performs the weighting process based on Equation (7).
P[x,y] = Clip1(((PL0[x,y]*w0C + PL1[x,y]*w1C + (1 << logWDC)) >> (logWDC + 1)) + ((o0C + o1C + 1) >> 1))   …(7)
Here, w0C is the weighting factor corresponding to the first predicted image, w1C is the weighting factor corresponding to the second predicted image, o0C is the offset corresponding to the first predicted image, and o1C is the offset corresponding to the second predicted image. Hereinafter, these are referred to as the first weighting factor, the second weighting factor, the first offset, and the second offset, respectively. logWDC is a parameter indicating the fixed-point precision of the weighting factors. The variable C denotes the signal component; for example, for a YUV signal, the luminance signal is denoted C=Y, the Cr chrominance signal C=Cr, and the Cb chrominance component C=Cb.
When the arithmetic precision of the first and second predicted images differs from that of the predicted image, the weighted motion compensation unit 302 realizes the rounding by controlling logWDC, the fixed-point precision, as in Equation (8).
logWD'C = logWDC + offset1   …(8)
The rounding can be realized by replacing logWDC in Equation (7) with logWD'C of Equation (8). For example, when the bit precision of the predicted image is 8 and the bit precision of the first and second predicted images is 14, resetting logWDC makes it possible to perform a batch rounding process at the same arithmetic precision as shift2 in Equation (1).
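A minimal sketch of the weighted combination of Equation (7) follows (again reusing the illustrative clip1 helper; for simplicity it assumes the intermediate predictions already have the output bit precision, so the precision adjustment of Equation (8) is not applied here).

// Equation (7): weighted bidirectional prediction with per-list weights and offsets.
int weightedBiPrediction(int pl0, int pl1,
                         int w0, int w1, int o0, int o1, int logWD) {
    const int num = pl0 * w0 + pl1 * w1 + (1 << logWD);
    return clip1((num >> (logWD + 1)) + ((o0 + o1 + 1) >> 1));
}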
When the prediction mode indicated by the motion information (prediction parameters) is unidirectional prediction, the weighted motion compensation unit 302 calculates the final predicted image based on Equation (9) using only the first predicted image.
P[x,y] = Clip1((PLX[x,y]*wXC + (1 << (logWDC - 1))) >> (logWDC))   …(9)
Here, PLX[x,y] denotes the unidirectional predicted image (the first predicted image), wXC denotes the weighting factor corresponding to unidirectional prediction, and X is an identifier indicating the list number of the reference list, taking the value 0 or 1. For example, PL0[x,y] and w0C are used when the list number is 0, and PL1[x,y] and w1C when the list number is 1.
When the arithmetic precision of the first and second predicted images differs from that of the predicted image, the weighted motion compensation unit 302 realizes the rounding by controlling logWDC, the fixed-point precision, as in Equation (8), in the same way as in bidirectional prediction.
The rounding can be realized by replacing logWDC in Equation (9) with logWD'C of Equation (8). For example, when the bit precision of the predicted image is 8 and the bit precision of the first predicted image is 14, resetting logWDC makes it possible to perform a batch rounding process at the same arithmetic precision as shift1 in Equation (4).
FIG. 7 is an explanatory diagram of an example of the fixed-point precision of the weighting factor in the first embodiment, and shows an example of a moving image whose pixel values change in the temporal direction and of the change of the average pixel value. In the example shown in FIG. 7, the frame being encoded is Frame(t), the temporally preceding frame is Frame(t-1), and the temporally following frame is Frame(t+1). As shown in FIG. 7, in a fade image changing from white to black, the brightness (gradation value) of the image decreases with time. The weighting factor expresses the degree of this change in FIG. 7 and, as is clear from Equations (7) and (9), takes the value 1.0 when there is no pixel value change. The fixed-point precision is a parameter that controls the step width corresponding to the fractional part of the weighting factor, and the weighting factor when there is no pixel value change is 1 << logWDC.
Here, the relationship between a moving image with temporal pixel value change and the weighting factor, the fixed-point precision of the weighting factor, and the offset is described with reference to FIG. 7. As described above, in a fade image changing from white to black as shown in FIG. 7, the pixel value (gradation value) of the image decreases with time. FIG. 7 shows an example in which the average pixel value of the image decreases with time, and the weighting factor corresponds to the slope of this decrease.
The fixed-point precision of the weighting factor is information indicating the precision of this slope. For example, in FIG. 7, when the weighting factor between Frame(t-1) and Frame(t+1) is 0.75 in fractional terms, 3/4 can be represented exactly if the precision is 1/4.
When the slope does not coincide with a value representable at the predetermined fixed-point precision, it can be adjusted with an offset value indicating a correction amount (deviation) corresponding to the intercept of the linear function. For example, in FIG. 7, when the weighting factor between Frame(t-1) and Frame(t+1) is 0.60 in fractional terms and the fixed-point precision is 1 (1 << 1), the weighting factor is set to, for example, 1 (corresponding to a fractional value of 0.50), because 0.60 cannot be represented at that fixed-point precision. In this case, the fractional value of the weighting factor deviates by 0.10 from the optimal value of 0.60, so the corresponding correction amount can be computed from the maximum pixel value and set as the offset value. When the maximum pixel value is 255, this means that a value such as 25 (255 × 0.1) may be set as the offset value.
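As a small worked sketch of this adjustment (an illustration under the assumptions stated above: true slope 0.60, fixed-point precision 1, maximum pixel value 255), the weighting factor and offset can be derived roughly as follows.

#include <cmath>
#include <cstdio>

int main() {
    const double slope  = 0.60;   // desired pixel-value change (weighting factor in fractional terms)
    const int    logWD  = 1;      // fixed-point precision: step width 1/(1 << 1) = 0.5
    const int    maxPix = 255;    // maximum pixel value

    // Nearest representable weighting factor at this precision: 1, i.e. 0.50.
    const int    weight  = static_cast<int>(std::lround(slope * (1 << logWD)));
    const double applied = static_cast<double>(weight) / (1 << logWD);
    // The residual 0.10 is converted into an offset against the maximum pixel value.
    const int    offset  = static_cast<int>((slope - applied) * maxPix);  // about 25

    std::printf("weight=%d (%.2f), offset=%d\n", weight, applied, offset);
    return 0;
}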
In the case of unidirectional prediction, the various parameters corresponding to the second predicted image (the second WP application flag, second weighting factor, and second offset information) are not used and may therefore be set to predetermined initial values.
Returning to FIG. 1, the reference image feature amount derivation unit 108, the predicted image feature amount derivation unit 109, and the parameter derivation unit 110 implicitly derive, from the reference images input from the predicted image generation unit 107, the WP parameter information corresponding to the predicted image generated by the predicted image generation unit 107.
The reference image feature amount derivation unit 108 derives a reference image feature amount for each reference image input from the predicted image generation unit 107, outputs a reference image group feature amount that collects the derived reference image feature amounts, and inputs it to the predicted image feature amount derivation unit 109 and the parameter derivation unit 110.
FIG. 8 is a diagram showing an example of the reference image group feature amount of the first embodiment. In the example shown in FIG. 8, the reference image group feature amount is a table collecting 2N+2 reference image feature amounts, and each reference image feature amount includes the list number of the reference list of the reference image, the reference image number of the reference image, the pixel average value of the reference image, and a pixel error value indicating the pixel-wise difference from the pixel average value of the reference image.
The list number is an identifier indicating the prediction direction; it takes the value 0 for unidirectional prediction and takes the two values 0 and 1 for bidirectional prediction, since two kinds of prediction can be used. The reference image number is a value corresponding to 0 to N indicated in the frame memory 206. The list numbers and reference image numbers are managed according to the DPB used in H.264 and the like, and are set implicitly in the reference image feature amount derivation unit 108 by the encoding control unit 113 according to the settings inside the predicted image generation unit 107 (for example, which reference image the reference image selector 205 outputs to the reference image feature amount derivation unit 108). The pixel average value and the pixel error value are calculated by the reference image feature amount derivation unit 108.
The table of reference image group feature amounts shown in FIG. 8 is an example; because the configuration of usable reference images differs depending on the coding structure, the table size also differs. For example, with a P-slice coding structure, the reference image numbers of list number 1, which correspond to bidirectional prediction, cannot be used, so the table contains only list number 0. The table size also varies with the number of reference images.
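A minimal sketch of one row of the table in FIG. 8 (the type and field names are assumptions for illustration, not taken from this description) could be:

#include <vector>

// One reference image feature amount (one row of the table in FIG. 8).
struct RefImageFeature {
    int    listNumber;   // reference list number (0 or 1)
    int    refIdx;       // reference image number (0..N)
    double pixelMean;    // pixel average value of the reference image
    double pixelError;   // pixel error value: average difference from that average
};

// Reference image group feature amount: the collection of rows (up to 2N+2).
using RefImageGroupFeature = std::vector<RefImageFeature>;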
FIG. 9 is a block diagram showing an example of the configuration of the reference image feature amount derivation unit 108 of the first embodiment. As shown in FIG. 9, the reference image feature amount derivation unit 108 includes an average value calculation unit 401, an error value calculation unit 402, and an integration unit 403.
Each time a reference image is input from the predicted image generation unit 107, the average value calculation unit 401 calculates the pixel average value of the reference image, outputs the calculated pixel average value, and inputs it to the error value calculation unit 402 and the integration unit 403. The average value calculation unit 401 calculates the pixel average value of the reference image using, for example, Equation (10).
DCLX(t) = (1/n) * Σ(x,y) Yx,y(t)   …(10)
Here, DCLX(t) denotes the pixel average value of the reference image with list number X and reference image number t. Thus, the pixel average value of the reference image with list number 0 and reference image number 1 is written DCL0(1), and the pixel average value of the reference image with list number 1 and reference image number 0 is written DCL1(0). n denotes the number of pixels of the reference image with list number X and reference image number t, and Yx,y(t) denotes the pixel value at coordinates (x,y) of the reference image with list number X and reference image number t.
Each time a reference image is input from the predicted image generation unit 107, the error value calculation unit 402 calculates the pixel error value with respect to the pixel average value of the reference image using the pixel average value input from the average value calculation unit 401, outputs the calculated pixel error value, and inputs it to the integration unit 403. The error value calculation unit 402 calculates the pixel error value with respect to the pixel average value of the reference image using, for example, Equation (11).
ACLX(t) = (1/n) * Σ(x,y) |Yx,y(t) - DCLX(t)|   …(11)
Here, ACLX(t) denotes the pixel error value, that is, the average of the differences (errors) between the pixel value of each pixel of the reference image with list number X and reference image number t and the pixel average value of that reference image.
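A minimal sketch of Equations (10) and (11) follows (an illustrative assumption; in particular, the error is computed as the average of the absolute differences from the pixel average, which is how the description of the pixel error value is read here).

#include <cmath>
#include <cstdint>
#include <vector>

// Pixel average value (Equation (10)) and pixel error value (Equation (11)) of
// one reference image, given its pixel values in raster order (assumed non-empty).
void refImageFeatureValues(const std::vector<uint8_t>& pixels, double& dc, double& ac) {
    const double n = static_cast<double>(pixels.size());
    double sum = 0.0;
    for (uint8_t p : pixels) sum += p;
    dc = sum / n;                              // Equation (10): DCLX(t)

    double err = 0.0;
    for (uint8_t p : pixels) err += std::fabs(p - dc);
    ac = err / n;                              // Equation (11): ACLX(t)
}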
Each time a reference image is input from the predicted image generation unit 107, the integration unit 403 integrates, into a reference image feature amount, the list number and reference image number of the reference image, the pixel average value of the reference image input from the average value calculation unit 401, and the pixel error value with respect to the pixel average value of the reference image input from the error value calculation unit 402. The integration unit 403 then collects the integrated reference image feature amounts into a table as shown in FIG. 8, outputs them as the reference image group feature amount, and inputs it to the predicted image feature amount derivation unit 109 and the parameter derivation unit 110.
The predicted image feature amount derivation unit 109 derives a predicted image feature amount based on the reference image group feature amount input from the reference image feature amount derivation unit 108, outputs the derived predicted image feature amount, and inputs it to the parameter derivation unit 110. The predicted image feature amount includes the pixel average value of the predicted image, a pixel error value obtained by averaging the errors from the pixel average value, and a predicted image feature amount derivation flag indicating whether the predicted image feature amount has been derived.
FIG. 10 is a block diagram showing an example of the configuration of the predicted image feature amount derivation unit 109 of the first embodiment. As shown in FIG. 10, the predicted image feature amount derivation unit 109 includes a feature amount control unit 411, a memory 412, and a predicted image feature amount calculation unit 413.
The feature amount control unit 411 selects and outputs, from the reference image group feature amount input from the reference image feature amount derivation unit 108, two reference image feature amounts used for deriving the predicted image feature amount; one is input (loaded) into the memory 412, and the other is input to the predicted image feature amount calculation unit 413.
Specifically, the feature amount control unit 411 derives, from the list number and reference image number of each reference image feature amount collected in the reference image group feature amount, the POC (Picture Order Count) indicating the display order (image display time) of the reference image of that reference image feature amount. The reference list and reference image number are information that specifies a reference image indirectly, whereas the POC is information that specifies a reference image directly and corresponds to the absolute position of the reference image. The feature amount control unit 411 then selects, from the derived POCs, the two POCs closest in temporal distance to the image to be encoded (the predicted image), thereby selecting from the reference image group feature amount the two reference image feature amounts used for deriving the predicted image feature amount.
When the slice being encoded is a P-slice, the feature amount control unit 411 selects two POCs (POC1 and POC2) using Equation (12).
for (refIdx = 0; refIdx <= num_of_active_ref_l0_minus1; refIdx++) {
  refPOC[refIdx] = RefPicOrderCnt(ListL0, refIdx)
}
POC1 = SortRefPOC(refPOC, curPOC, num_of_active_ref_l0_minus1)
POC2 = SortRefPOC(refPOC, curPOC, num_of_active_ref_l0_minus1 - 1)   …(12)
Here, num_of_active_ref_l0_minus1 is one of the syntax elements and indicates the number of reference images used in the reference list of list number 0 minus 1, that is, N. RefPicOrderCnt is a function that, given the list number of a reference image and the reference image number of that reference image, returns the POC of the reference image; ListL0 indicates list number 0, and refIdx indicates the reference image number. refPOC is an array of the POCs of the reference images whose reference image feature amounts are contained in the reference image group feature amount, and curPOC is the POC of the image being encoded. SortRefPOC computes the absolute difference between the POC indicated by curPOC and each of the POCs stored in refPOC, up to the number given by num_of_active_ref_l0_minus1 or num_of_active_ref_l0_minus1 - 1 plus one. SortRefPOC then returns the POC stored in refPOC whose absolute difference is smallest, deletes that POC from refPOC, and rearranges the POCs remaining in refPOC.
For example, when refPOC holds {0, 1, 2, 3}, curPOC is 4, and num_of_active_ref_l0_minus1 is 3, SortRefPOC(refPOC, curPOC, num_of_active_ref_l0_minus1) returns POC number 3, whose absolute difference of 1 is the smallest, as POC1, deletes 3 from refPOC, and rearranges refPOC to {0, 1, 2}. Next, SortRefPOC(refPOC, curPOC, num_of_active_ref_l0_minus1 - 1) returns POC number 2, whose absolute difference of 2 is the smallest, as POC2, deletes 2 from refPOC, and rearranges refPOC to {0, 1}.
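A minimal C++ sketch of the SortRefPOC behavior described above (an illustrative assumption; the function here is written only from this description) is shown below. Running it with refPOC = {0, 1, 2, 3} and curPOC = 4 reproduces the POC1 = 3, POC2 = 2 example.

#include <cstdlib>
#include <vector>

// Returns the POC among the first (count + 1) entries of refPOC that is closest
// to curPOC, removes it from refPOC, and leaves the remaining entries in order.
int SortRefPOC(std::vector<int>& refPOC, int curPOC, int count) {
    int best = 0;
    for (int i = 1; i <= count && i < static_cast<int>(refPOC.size()); ++i) {
        if (std::abs(refPOC[i] - curPOC) < std::abs(refPOC[best] - curPOC)) best = i;
    }
    const int poc = refPOC[best];
    refPOC.erase(refPOC.begin() + best);   // delete the chosen POC and repack
    return poc;
}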
Since the correlation of the image feature amounts tends to become higher as the temporal distance between the image being encoded and a reference image becomes shorter, selecting, for a P-slice, the two POCs (reference images) in order of increasing temporal distance from the image being encoded improves the prediction accuracy of the predicted image feature amount.
When the slice being encoded is a B-slice, the feature amount control unit 411 selects two POCs (POC1 and POC2) using Equation (13).
for (refIdx = 0; refIdx <= num_of_active_ref_l0_minus1; refIdx++) {
  refPOCL0[refIdx] = RefPicOrderCnt(ListL0, refIdx)
}
for (refIdx = 0; refIdx <= num_of_active_ref_l1_minus1; refIdx++) {
  refPOCL1[refIdx] = RefPicOrderCnt(ListL1, refIdx)
}
POC1 = SortRefPOC(refPOCL0, curPOC, num_of_active_ref_l0_minus1)
POC2 = SortRefPOC(refPOCL1, curPOC, num_of_active_ref_l1_minus1)   …(13)
Here, refPOCL0 is an array of the POCs of the reference images of list number 0 whose reference image feature amounts are contained in the reference image group feature amount. num_of_active_ref_l1_minus1 is one of the syntax elements and indicates the number of reference images used in the reference list of list number 1 minus 1, that is, N. refPOCL1 is an array of the POCs of the reference images of list number 1 whose reference image feature amounts are contained in the reference image group feature amount, and ListL1 indicates list number 1.
Since the correlation of the image feature amounts tends to become higher as the temporal distance between the image being encoded and a reference image becomes shorter, selecting, for a B-slice, the POC (reference image) closest in temporal distance to the image being encoded from each of the two reference lists improves the prediction accuracy of the predicted image feature amount.
When the two selected POCs (POC1 and POC2) are the same, the feature amount control unit 411 reselects one of them (POC2) using Equation (14). By repeatedly applying Equation (14), the feature amount control unit 411 can select two POCs whose temporal distances from the image being encoded differ (a POC2 whose temporal distance from the image being encoded differs from that of POC1).
if (POC1 == POC2){
 POC2 = SortRefPOC(refPOCLX, curPOC, num_of_active_ref_l1_minus1-(M))
}   …(14)
 Here, X in refPOCLX is an identifier indicating a list number; for example, the list-number-1 reference list may be searched after the list-number-0 reference list has been searched. For this reason, when POC2 is reselected, the two POCs (reference images) closest in temporal distance to the encoding target image may end up being selected from the same reference list. M is a value indicating the number of repetitions and is defined, for example, by the number of reference images set for each reference list.
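The selection and reselection flow for a B-slice, corresponding to Equations (13) and (14), might be organized as in the C sketch below; the function name, the fixed array sizes, and the simplification of re-searching only the list-number-1 candidates (whereas Equation (14) may also fall back to the other list via refPOCLX) are assumptions of this sketch. RefPicOrderCnt, ListL0, ListL1, and SortRefPOC are used as defined above.

/* Sketch of B-slice POC1/POC2 selection (Eq. (13)) with reselection when the
   two POCs coincide (Eq. (14), simplified to the list-number-1 candidates). */
static void selectPocsBslice(int curPOC,
                             int num_of_active_ref_l0_minus1,
                             int num_of_active_ref_l1_minus1,
                             int *POC1, int *POC2)
{
    int refPOCL0[16], refPOCL1[16];   /* sizes are illustrative */
    for (int refIdx = 0; refIdx <= num_of_active_ref_l0_minus1; refIdx++)
        refPOCL0[refIdx] = RefPicOrderCnt(ListL0, refIdx);
    for (int refIdx = 0; refIdx <= num_of_active_ref_l1_minus1; refIdx++)
        refPOCL1[refIdx] = RefPicOrderCnt(ListL1, refIdx);
    *POC1 = SortRefPOC(refPOCL0, curPOC, num_of_active_ref_l0_minus1);
    *POC2 = SortRefPOC(refPOCL1, curPOC, num_of_active_ref_l1_minus1);
    for (int M = 1; *POC1 == *POC2 && num_of_active_ref_l1_minus1 - M >= 0; M++)
        *POC2 = SortRefPOC(refPOCL1, curPOC, num_of_active_ref_l1_minus1 - M);
}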
 Then, having selected POC1, the feature quantity control unit 411 selects the reference image feature quantity corresponding to POC1 from the reference image group feature quantity, outputs it, and inputs it to the memory 412. Having selected POC2, the feature quantity control unit 411 selects the reference image feature quantity corresponding to POC2 from the reference image group feature quantity, outputs it, and inputs it to the predicted image feature quantity calculation unit 413.
 In the first embodiment, an example in which the feature quantity control unit 411 selects two POCs has been described, but three or more POCs may be selected. In this case, the feature quantity control unit 411 selects three or more reference image feature quantities from the reference image group feature quantity, and the predicted image feature quantity is derived from the selected three or more reference image feature quantities. Note that, in the case of a P-slice, N ≥ 2 is required, and the feature quantity control unit 411 may search the list-number-0 reference list after executing Equation (12).
 The memory 412 holds the reference image feature quantity input from the feature quantity control unit 411 (the reference image feature quantity corresponding to POC1, hereinafter referred to as the first reference image feature quantity). At the timing when the reference image feature quantity corresponding to POC2 (hereinafter referred to as the second reference image feature quantity) is input from the feature quantity control unit 411 to the predicted image feature quantity calculation unit 413, the memory 412 outputs the first reference image feature quantity and inputs it to the predicted image feature quantity calculation unit 413.
 The predicted image feature quantity calculation unit 413 calculates the predicted image feature quantity using the first reference image feature quantity input from the memory 412 and the second reference image feature quantity input from the feature quantity control unit 411, outputs it, and inputs it to the parameter derivation unit 110.
 The predicted image feature quantity calculation unit 413 first calculates, according to Equation (15), the temporal distance between the reference image of the first reference image feature quantity and the reference image of the second reference image feature quantity.
 DistScaleFactor = Clip3(-1024, 1023, (tb*tx+32)>>6)   …(15)
 Here, Clip3(A, B, C) is a clip function that returns A if the value C falls below the minimum value A, returns B if the value C exceeds the maximum value B, and otherwise returns the value C. tb is calculated by Equation (16), and tx is calculated by Equation (17).
 tb = Clip3(-128, 127, curPOC-POC1)   …(16)
 tx = (16384+abs(td/2))/td   …(17)
 tb indicates the temporal distance between curPOC and POC1. tx is an intermediate variable used to carry out the division tb/td with fixed-point precision. Note that the fixed-point precision is 8 bits; when the value of DistScaleFactor is 128 (the central value), the two reference images contribute equally in Equations (19) and (20). td indicates the temporal distance between POC1 and POC2 and is calculated by Equation (18).
 td = Clip3(-128, 127, POC2-POC1)   …(18)
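The computation of Equations (15) to (18) can be collected into a single helper, sketched below in C; Clip3 follows the definition given above, and the function name deriveDistScaleFactor is an assumption of this sketch. The case POC1 == POC2 (td = 0) must be excluded beforehand, as required by Equation (23).

#include <stdlib.h>

static int Clip3(int a, int b, int c) { return c < a ? a : (c > b ? b : c); }

/* Equations (15)-(18): DistScaleFactor approximates tb/td in 8-bit fixed point. */
static int deriveDistScaleFactor(int curPOC, int POC1, int POC2)
{
    int tb = Clip3(-128, 127, curPOC - POC1);          /* Eq. (16) */
    int td = Clip3(-128, 127, POC2 - POC1);            /* Eq. (18), td != 0 assumed */
    int tx = (16384 + abs(td / 2)) / td;               /* Eq. (17) */
    return Clip3(-1024, 1023, (tb * tx + 32) >> 6);    /* Eq. (15) */
}

For example, with curPOC = 2, POC1 = 1, and POC2 = 3, tb = 1, td = 2, tx = 8192, and DistScaleFactor = 128, so the encoding target image lies midway between the two reference images and Equations (19) and (20) average the two feature quantities equally.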
 Next, the predicted image feature quantity calculation unit 413 calculates the pixel average value of the predicted image according to Equation (19) and the pixel error value of the predicted image according to Equation (20).
 DCP = (DistScaleFactor*DC2+(256-DistScaleFactor)*DC1+Ofst3)>>(Shft3);   …(19)
 ACP = (DistScaleFactor*AC2+(256-DistScaleFactor)*AC1+Ofst3)>>(Shft3);   …(20)
 Here, DCP indicates the pixel average value of the predicted image, and DC1 and DC2 indicate the pixel average values of the first and second reference image feature quantities, respectively. ACP indicates the pixel error value of the predicted image, and AC1 and AC2 indicate the pixel error values of the first and second reference image feature quantities, respectively. The values of Shft3 and Ofst3 are determined according to INTERNAL_PREC, which indicates the internal calculation precision of the predicted image; Shft3 is calculated by Equation (21), and Ofst3 is calculated by Equation (22). In the first embodiment, the fixed-point precision of DistScaleFactor is 8, so when INTERNAL_PREC is 8, DCP and ACP are rounded to integer precision.
 Shft3 = INTERNAL_PREC   …(21)
 Ofst3 = (1<<(Shft3-1))   …(22)
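A minimal C sketch of the interpolation in Equations (19) to (22) is given below; the function name and the packaging of the results into output parameters are assumptions of this sketch, and INTERNAL_PREC is taken to be 8 as in the first embodiment.

/* Equations (19)-(22): predicted-image feature values as a DistScaleFactor-
   weighted mix of the two reference-image feature values. */
#define INTERNAL_PREC 8
static void predictFeature(int DistScaleFactor,
                           int DC1, int AC1, int DC2, int AC2,
                           int *DCP, int *ACP)
{
    int Shft3 = INTERNAL_PREC;        /* Eq. (21) */
    int Ofst3 = 1 << (Shft3 - 1);     /* Eq. (22) */
    *DCP = (DistScaleFactor * DC2 + (256 - DistScaleFactor) * DC1 + Ofst3) >> Shft3;  /* Eq. (19) */
    *ACP = (DistScaleFactor * AC2 + (256 - DistScaleFactor) * AC1 + Ofst3) >> Shft3;  /* Eq. (20) */
}

With DistScaleFactor = 128 this reduces to the rounded average of the two reference values, and with DistScaleFactor = 256 it returns DC2 and AC2 unchanged.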
 However, when DistScaleFactor calculated by Equation (15) satisfies any of the conditions of Equations (23) to (25), the predicted image feature quantity cannot be derived or the temporal distance is too large, so the predicted image feature quantity calculation unit 413 sets the pixel average value DCP and the pixel error value ACP of the predicted image to initial values.
 POC1 = POC2   …(23)
 (DistScaleFactor>>2) < -64   …(24)
 (DistScaleFactor>>2) > 128   …(25)
 For example, when DistScaleFactor satisfies any of the conditions of Equations (23) to (25), the predicted image feature quantity calculation unit 413 sets the internal variable wp_avaiable_flag (the predicted image feature quantity derivation flag) to false; when DistScaleFactor satisfies none of the conditions of Equations (23) to (25), it sets wp_avaiable_flag to true. When the predicted image feature quantity derivation flag is set to false, initial values are set in DCP and ACP; for example, DefaultDC indicating 0 is set in DCP, and DefaultAC indicating 0 is set in ACP. When the predicted image feature quantity derivation flag is set to true, the values calculated by Equations (19) and (20) are set in DCP and ACP, respectively.
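The exception handling of Equations (23) to (25) and the associated setting of wp_avaiable_flag, DefaultDC, and DefaultAC can be sketched as follows; the use of <stdbool.h> and the function name are assumptions of this sketch.

#include <stdbool.h>

#define DefaultDC 0
#define DefaultAC 0

/* Equations (23)-(25): decide whether the predicted image feature quantity can
   be derived; otherwise fall back to the initial values. */
static bool checkWpAvailable(int POC1, int POC2, int DistScaleFactor,
                             int *DCP, int *ACP)
{
    if (POC1 == POC2 ||                        /* Eq. (23) */
        (DistScaleFactor >> 2) < -64 ||        /* Eq. (24) */
        (DistScaleFactor >> 2) > 128) {        /* Eq. (25) */
        *DCP = DefaultDC;
        *ACP = DefaultAC;
        return false;                          /* wp_avaiable_flag = false */
    }
    return true;                               /* wp_avaiable_flag = true */
}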
 The parameter derivation unit 110 derives the WP parameter information of the encoding target image using the reference image group feature quantity input from the reference image feature quantity derivation unit 108 and the predicted image feature quantity input from the predicted image feature quantity derivation unit 109.
 FIG. 11A and FIG. 11B are diagrams illustrating an example of the WP parameter information of the first embodiment. An example of the WP parameter information for a P-slice is shown in FIG. 11A, and an example of the WP parameter information for a B-slice is shown in FIG. 11A and FIG. 11B. The list number and the reference image number are the same as in the reference image group feature quantity, and the WP application flag, the weighting factor, and the offset are as described with reference to FIG. 6 and FIG. 7. The WP application flag, weighting factor, and offset of list number 0 correspond to the first WP application flag, the first weighting factor, and the first offset, respectively, and the WP application flag, weighting factor, and offset of list number 1 correspond to the second WP application flag, the second weighting factor, and the second offset, respectively. Since the WP parameter information is held for each reference list and each reference image, the information required for a B-slice amounts to 2N + 2 entries when there are N + 1 reference images.
 The parameter derivation unit 110 first checks the predicted image feature quantity derivation flag wp_avaiable_flag included in the predicted image feature quantity to confirm whether the WP parameter information can be derived. When wp_avaiable_flag is set to false, the WP parameter information cannot be derived, so the parameter derivation unit 110 sets the weighting factor and the offset corresponding to list number X and reference image number Y to initial values according to Equations (26) and (27), respectively.
 Weight[X][Y] = (1<<Log2Denom)   …(26)
 Offset[X][Y] = 0   …(27)
 Weight[X][Y] is a value corresponding to w0C and w1C used in Equations (7) and (9). Log2Denom is a value corresponding to logWD_C used in Equation (7) and the like, and is calculated by Equation (28).
 Log2Denom = Default_Value   …(28)
 Here, Default_Value may be set to, for example, 0 or 7.
 When wp_avaiable_flag is set to false, the parameter derivation unit 110 sets initial values in the WP parameter information by repeatedly executing Equations (26) and (27) for all combinations of list number X and reference image number Y (that is, for all reference images).
 In addition, when wp_avaiable_flag is set to false, the parameter derivation unit 110 sets the WP application flag (WP_flag[X][Y]) corresponding to list number X and reference image number Y to false.
 On the other hand, when wp_avaiable_flag is set to true, the WP parameter information can be derived, so the parameter derivation unit 110 derives the weighting factor and the offset corresponding to list number X and reference image number Y according to Equations (29) and (30), respectively.
 Weight[X][Y] = (((curAC<<Log2Denom)+Ofst4)/(AC[X][Y]<<LeftShft))   …(29)
 Offset[X][Y] = (((curDC<<Log2Denom)-(Weight[X][Y]*(DC[X][Y]<<LeftShft))+RealOfst)>>RealLog2Denom)   …(30)
 Here, curDC and curAC indicate the pixel average value DCP and the pixel error value ACP of the predicted image, respectively. DC[X][Y] and AC[X][Y] indicate the pixel average value DCLX(Y) and the pixel error value ACLX(Y) of the reference image with list number X and reference image number Y, respectively. LeftShft is a value that compensates for the change in calculation precision introduced by Shft3 in Equations (19) to (22), and is calculated by Equation (31).
 LeftShft = (8-INTERNAL_PREC)   …(31)
 Ofst4 is a parameter used for rounding when dividing by AC[X][Y]. For example, to round to the nearest value in the fixed-point rounding process, Ofst4 may be set to (AC[X][Y]>>1); to always round down, Ofst4 may be set to 0. RealLog2Denom is calculated by Equation (32), and RealOfst is calculated by Equation (33).
 RealLog2Denom = Log2Denom+LeftShft   …(32)
 RealOfst = (1<<(RealLog2Denom-1))   …(33)
 Here, RealLog2Denom may be set to a predetermined value such as 7, for example.
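Equations (29) to (33) can be combined into a single derivation routine, sketched below in C; the function name and the choice Ofst4 = (refAC >> 1) (round to nearest) are assumptions of this sketch.

/* Equations (29)-(33): the weighting factor is the ratio of the pixel error
   values, and the offset compensates the difference of the pixel average
   values after weighting. refAC != 0 is assumed. */
#define INTERNAL_PREC 8
static void deriveWeightOffset(int curDC, int curAC, int refDC, int refAC,
                               int Log2Denom, int *Weight, int *Offset)
{
    int LeftShft      = 8 - INTERNAL_PREC;         /* Eq. (31) */
    int RealLog2Denom = Log2Denom + LeftShft;      /* Eq. (32) */
    int RealOfst      = 1 << (RealLog2Denom - 1);  /* Eq. (33) */
    int Ofst4         = refAC >> 1;                /* round to nearest */
    *Weight = ((curAC << Log2Denom) + Ofst4) / (refAC << LeftShft);   /* Eq. (29) */
    *Offset = ((curDC << Log2Denom) - *Weight * (refDC << LeftShft)
               + RealOfst) >> RealLog2Denom;                          /* Eq. (30) */
}

For example, with Log2Denom = 7, INTERNAL_PREC = 8 (so LeftShft = 0), curAC = 2*refAC, curDC = 100, and refDC = 40, the routine yields Weight = 256 (a gain of 2.0 in units of 1/128) and Offset = 20.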
 When wp_avaiable_flag is set to true, the parameter derivation unit 110 derives the WP parameter information shown in FIG. 11A and FIG. 11B by repeatedly executing Equations (29) and (30) for all combinations of list number X and reference image number Y (that is, for all reference images).
 In addition, when wp_avaiable_flag is set to true, the parameter derivation unit 110 sets the WP application flag (WP_flag[X][Y]) corresponding to list number X and reference image number Y to true.
 Returning to FIG. 1, the motion evaluation unit 111 performs motion evaluation between a plurality of frames based on the input image and the reference image input from the predicted image generation unit 107, outputs motion information, and inputs the motion information to the predicted image generation unit 107 and the encoding unit 112.
 The motion evaluation unit 111 calculates optimal motion information by, for example, computing difference values starting from a plurality of reference images at the position corresponding to the prediction target pixel block of the input image to obtain an error, shifting this position with fractional precision, and searching for the block with the minimum error using a technique such as block matching. In the case of bidirectional prediction, the motion evaluation unit 111 calculates motion information for bidirectional prediction by performing block matching that includes the default motion compensated prediction shown in Equations (1) and (4), using the motion information derived by unidirectional prediction.
 At this time, the motion evaluation unit 111 can also calculate motion information that takes weighted prediction into account by performing block matching that includes the weighted motion compensated prediction shown in Equations (7) and (9). In this case, the motion evaluation unit 111 may perform block matching using Equations (7) and (9) according to the WP parameter information output from the parameter derivation unit 110.
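As an illustration of such weighted block matching, the sketch below computes the sum of absolute differences between an input block and a reference block to which a weighting factor and offset in the style of Equation (7) have been applied; the block layout, the strides, and the function name are assumptions of this sketch, and the actual matching criterion and search strategy are left to the implementation.

#include <stdlib.h>

/* SAD between an input block and a weighted reference block: each reference
   pixel is scaled by Weight/(1 << Log2Denom) and shifted by Offset before the
   difference is taken (cf. Eq. (7)). */
static int weightedSAD(const unsigned char *org, int orgStride,
                       const unsigned char *ref, int refStride,
                       int width, int height,
                       int Weight, int Offset, int Log2Denom)
{
    int sad = 0;
    int round = (Log2Denom > 0) ? (1 << (Log2Denom - 1)) : 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int pred = ((ref[y * refStride + x] * Weight + round) >> Log2Denom) + Offset;
            sad += abs(org[y * orgStride + x] - pred);
        }
    }
    return sad;
}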
 In the first embodiment, the motion evaluation unit 111 is exemplified as one function of the encoding device 100, but the motion evaluation unit 111 is not an essential component of the encoding device 100; for example, the motion evaluation unit 111 may be a device outside the encoding device 100. In that case, the motion information calculated by the motion evaluation unit 111 may be loaded into the encoding device 100.
 The encoding unit 112 performs encoding processing on various encoding parameters, such as the quantized transform coefficients input from the quantization unit 103, the motion information input from the motion evaluation unit 111, and the quantization information specified by the encoding control unit 113, and generates encoded data. The encoding processing corresponds to, for example, Huffman coding or arithmetic coding; for example, H.264 uses context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC).
 The encoding parameters are parameters required for decoding, such as prediction information indicating the prediction method, information on the quantized transform coefficients, and information on quantization. For example, the encoding control unit 113 may have an internal memory (not shown) in which the encoding parameters are held, and the encoding parameters of an adjacent, already encoded pixel block may be used when encoding a pixel block. For example, in H.264 intra prediction, the prediction information of a pixel block can be derived from the prediction information of encoded adjacent blocks.
 The encoding unit 112 outputs the generated encoded data according to an appropriate output timing managed by the encoding control unit 113. The output encoded data is, for example, multiplexed with various kinds of information by a multiplexing unit (not shown), temporarily stored in an output buffer (not shown), and then output to, for example, a storage system (storage medium) or a transmission system (communication line), neither of which is shown.
 FIG. 12 is a flowchart illustrating an example of the flow of the reference image group feature quantity derivation processing performed by the reference image feature quantity derivation unit 108 of the first embodiment.
 First, the reference image feature quantity derivation unit 108 sets PRED_TYPE to 1 when the slice type of the encoding target image is B-slice, because two reference lists can be used, and sets PRED_TYPE to 0 when the slice type is P-slice, because one reference list can be used. Since the slice type of the encoding target image is managed by the encoding control unit 113 with the variable slice_type, the reference image feature quantity derivation unit 108 can determine whether the slice type of the encoding target image is B-slice or P-slice by referring to the variable slice_type.
 Subsequently, the reference image feature quantity derivation unit 108 initializes the list number X to 0 (step S101) and initializes the reference image number Y to 0 (step S102).
 Subsequently, when the reference image with list number X and reference image number Y is input from the predicted image generation unit 107, the average value calculation unit 401 calculates the pixel average value DCLX[Y] according to Equation (10), and the pixel error value ACLX[Y] is calculated according to Equation (11) (step S103).
 Subsequently, the integration unit 403 integrates the list number X, the reference image number Y, the pixel average value DCLX[Y], and the pixel error value ACLX[Y] into a reference image feature quantity, and updates the table of the reference image group feature quantity. The reference image feature quantity derivation unit 108 then increments the reference image number Y (step S104).
 Subsequently, the reference image feature quantity derivation unit 108 determines whether the incremented reference image number Y is larger than num_ref_active_lx_minus1 (step S105); if not (No in step S105), the process returns to step S103.
 On the other hand, if it is larger (Yes in step S105), the reference image feature quantity derivation unit 108 determines that the calculation of the pixel average values and pixel error values of all the reference images in the reference list of list number X has been completed, and increments the list number X (step S106).
 Subsequently, the reference image feature quantity derivation unit 108 determines whether the incremented list number X is larger than PRED_TYPE (step S107); if not (No in step S107), the process returns to step S102.
 On the other hand, if it is larger (Yes in step S107), the reference image feature quantity derivation unit 108 determines that all the reference lists have been processed, outputs the reference image group feature quantity, and ends the processing.
 Note that the flowchart of FIG. 12 shows an example in which the corresponding reference images are sequentially input from the predicted image generation unit 107, but reference images in which the list numbers and the reference image numbers are associated may be input all at once.
 FIG. 13 is a flowchart illustrating an example of the flow of the predicted image feature quantity derivation processing performed by the predicted image feature quantity derivation unit 109 of the first embodiment.
 First, the predicted image feature quantity derivation unit 109 sets PRED_TYPE to 1 when the slice type of the encoding target image is B-slice, and sets PRED_TYPE to 0 when the slice type is P-slice. The predicted image feature quantity derivation unit 109 can determine whether the slice type of the encoding target image is B-slice or P-slice by referring to the variable slice_type managed by the encoding control unit 113.
 Subsequently, the predicted image feature quantity derivation unit 109 initializes the list number X to 0 (step S201) and initializes the reference image number Y to 0 (step S202).
 Subsequently, when the reference image group feature quantity is input from the reference image feature quantity derivation unit 108, the feature quantity control unit 411 derives the POC from the list number X and the reference image number Y using the RefPicOrderCnt function of Equation (12) or Equation (13) according to the slice type of the encoding target image, and stores the derived POC in the refPOC array. The feature quantity control unit 411 then increments the reference image number Y (step S203).
 Subsequently, the feature quantity control unit 411 determines whether the incremented reference image number Y is larger than num_ref_active_lx_minus1 (step S204); if not (No in step S204), the process returns to step S203.
 On the other hand, if it is larger (Yes in step S204), the feature quantity control unit 411 determines that the derivation of the POCs of all the reference images in the reference list of list number X has been completed, and increments the list number X (step S205).
 Subsequently, the feature quantity control unit 411 determines whether the incremented list number X is larger than PRED_TYPE (step S206); if not (No in step S206), the process returns to step S202.
 On the other hand, if it is larger (Yes in step S206), the feature quantity control unit 411 determines that all the reference lists have been processed and, using the SortRefPOC function of Equation (12) or Equation (13) according to the slice type of the encoding target image, sets as POC1 the POC in refPOC having the smallest absolute distance (absolute difference value) from the POC of the encoding target image indicated by curPOC. The feature quantity control unit 411 then outputs the first reference image feature quantity, which is the reference image feature quantity of the reference image of POC1, to the memory 412 and deletes that POC from the refPOC array (step S207).
 Subsequently, the feature quantity control unit 411 again uses the SortRefPOC function of Equation (12) or Equation (13) according to the slice type of the encoding target image to set as POC2 the POC in refPOC having the smallest absolute distance (absolute difference value) from the POC of the encoding target image indicated by curPOC, and deletes that POC from the refPOC array (step S208).
 Subsequently, when POC1 and POC2 are the same (Yes in step S209), the feature quantity control unit 411 executes Equation (14) to update POC2 (step S210) and returns to step S209.
 On the other hand, when POC1 and POC2 are not the same (No in step S209), the feature quantity control unit 411 outputs the second reference image feature quantity, which is the reference image feature quantity of the reference image of POC2, to the predicted image feature quantity calculation unit 413.
 Subsequently, when the first reference image feature quantity is input from the memory 412 and the second reference image feature quantity is input from the feature quantity control unit 411, the predicted image feature quantity derivation unit 109 derives the temporal distance tb between POC1 and curPOC using Equation (16) (step S211).
 Subsequently, the predicted image feature quantity derivation unit 109 derives the temporal distance td between POC1 and POC2 using Equation (18) (step S212).
 Subsequently, the predicted image feature quantity derivation unit 109 derives DistScaleFactor, which is used for scaling by the distance ratio, from the temporal distances tb and td using Equation (15) (step S213).
 Subsequently, the predicted image feature quantity derivation unit 109 determines whether the derived DistScaleFactor satisfies any of the conditions of Equations (23) to (25), which are the conditions for exception processing (step S214).
 When one of the exception processing conditions is met (Yes in step S214), the predicted image feature quantity derivation unit 109 sets the predicted image feature quantity derivation flag wp_avaiable_flag to false (step S215). As a result, the pixel average value DCP and the pixel error value ACP of the predicted image are set to the initial values (step S216).
 On the other hand, when none of the exception processing conditions is met (No in step S214), the predicted image feature quantity derivation unit 109 sets wp_avaiable_flag to true (step S217).
 The predicted image feature quantity derivation unit 109 then derives the pixel average value DCP of the predicted image from DistScaleFactor, the pixel average value DC1 of the first reference image feature quantity, and the pixel average value DC2 of the second reference image feature quantity using Equation (19), and derives the pixel error value ACP of the predicted image from DistScaleFactor, the pixel error value AC1 of the first reference image feature quantity, and the pixel error value AC2 of the second reference image feature quantity using Equation (20). The predicted image feature quantity derivation unit 109 then outputs the derived pixel average value DCP and pixel error value ACP of the predicted image and the predicted image feature quantity derivation flag wp_avaiable_flag as the predicted image feature quantity (step S218).
 FIG. 14 is a flowchart illustrating an example of the flow of the WP parameter information derivation processing performed by the parameter derivation unit 110 of the first embodiment.
 First, the parameter derivation unit 110 sets PRED_TYPE to 1 when the slice type of the encoding target image is B-slice, and sets PRED_TYPE to 0 when the slice type is P-slice. The parameter derivation unit 110 can determine whether the slice type of the encoding target image is B-slice or P-slice by referring to the variable slice_type managed by the encoding control unit 113.
 Subsequently, when the reference image group feature quantity is input from the reference image feature quantity derivation unit 108 and the predicted image feature quantity is input from the predicted image feature quantity derivation unit 109, the parameter derivation unit 110 checks whether the predicted image feature quantity derivation flag wp_avaiable_flag is set to false (step S301).
 When wp_avaiable_flag is set to false (Yes in step S301), the parameter derivation unit 110 sets Log2Denom to 0 using Equation (28) (step S302).
 Subsequently, the parameter derivation unit 110 sets the weighting factors of all the reference images to the initial value using Equation (26) (step S303), and sets the offsets of all the reference images to the initial value using Equation (27) (step S304). The processing then ends.
 On the other hand, when wp_avaiable_flag is set to true (No in step S301), the parameter derivation unit 110 sets Log2Denom to a predetermined value (for example, 7) (step S305).
 Subsequently, the parameter derivation unit 110 initializes the list number X to 0 (step S306) and initializes the reference image number Y to 0 (step S307).
 Subsequently, the parameter derivation unit 110 derives the weighting factor Weight[X][Y] corresponding to list number X and reference image number Y using Equation (29), and derives the offset Offset[X][Y] using Equation (30) (step S308).
 Subsequently, the parameter derivation unit 110 increments the reference image number Y (step S309). The parameter derivation unit 110 then determines whether the incremented reference image number Y is larger than num_ref_active_lx_minus1 (step S310); if not (No in step S310), the process returns to step S308.
 On the other hand, if it is larger (Yes in step S310), the parameter derivation unit 110 determines that the derivation of the weighting factors and offsets of all the reference images in the reference list of list number X has been completed, and increments the list number X (step S311).
 Subsequently, the parameter derivation unit 110 determines whether the incremented list number X is larger than PRED_TYPE (step S312); if not (No in step S312), the process returns to step S307.
 On the other hand, if it is larger (Yes in step S312), the parameter derivation unit 110 determines that all the reference lists have been processed, outputs the WP parameter information, and ends the processing.
 FIG. 15 is a diagram illustrating an example of the syntax 500 used by the encoding device 100 of the first embodiment. The syntax 500 shows the structure of the encoded data generated by the encoding device 100 by encoding an input image (moving image data). When decoding the encoded data, a decoding device described later performs syntax interpretation of the moving image by referring to the same syntax structure as the syntax 500.
 The syntax 500 includes three parts: a high-level syntax 501, a slice-level syntax 502, and a coding-tree-level syntax 503. The high-level syntax 501 includes syntax information of layers higher than the slice. A slice refers to a rectangular region or a continuous region included in a frame or a field. The slice-level syntax 502 includes information necessary for decoding each slice. The coding-tree-level syntax 503 includes information necessary for decoding each coding tree (that is, each coding tree block). Each of these parts includes further detailed syntax.
 The high-level syntax 501 includes sequence-level and picture-level syntaxes such as a sequence parameter set syntax 504, a picture parameter set syntax 505, and an adaptation parameter set syntax 506.
 The slice-level syntax 502 includes a slice header syntax 507, a pred weight table syntax 508, a slice data syntax 509, and the like. The pred weight table syntax 508 is called from the slice header syntax 507.
 The coding-tree-level syntax 503 includes a coding tree unit syntax 510, a transform unit syntax 511, a prediction unit syntax 512, and the like. The coding tree unit syntax 510 can have a quadtree structure; specifically, the coding tree unit syntax 510 can be recursively called as a syntax element of the coding tree unit syntax 510, so that one coding tree block can be subdivided by a quadtree. The coding tree unit syntax 510 also includes the transform unit syntax 511, which is called in each coding tree unit syntax 510 at the leaves of the quadtree. The transform unit syntax 511 describes information related to inverse orthogonal transform, quantization, and the like.
 FIG. 16 is a diagram illustrating an example of the picture parameter set syntax 505 of the first embodiment. weighted_unipred_idc is, for example, a syntax element indicating whether the weighted compensated prediction of the first embodiment for P-slices is enabled or disabled. When weighted_unipred_idc is 0, the weighted motion compensated prediction of the first embodiment within a P-slice is disabled; accordingly, the WP application flag included in the WP parameter information is always set to 0, and the WP selectors 304 and 305 connect their respective output ends to the default motion compensation unit 301. When weighted_unipred_idc is 1, explicit weighted motion compensated prediction (not described in the first embodiment) within a P-slice is enabled. Explicit weighted prediction is one of the modes in which the WP parameter information is explicitly encoded using the pred weight table syntax, and can be realized, for example, by the method described in H.264. When weighted_unipred_idc is 2, the implicit weighted motion compensated prediction of the first embodiment within a P-slice is enabled.
 As another example, weighted_unipred_idc may be changed to weighted_pred_flag so that the implicit weighted motion compensated prediction of the first embodiment in a P-slice is always prohibited.
 weighted_bipred_idc is, for example, a syntax element indicating whether the weighted compensated prediction of the first embodiment for B-slices is enabled or disabled. When weighted_bipred_idc is 0, the weighted motion compensated prediction of the first embodiment within a B-slice is disabled; accordingly, the WP application flag included in the WP parameter information is always set to 0, and the WP selectors 304 and 305 connect their respective output ends to the default motion compensation unit 301. When weighted_bipred_idc is 1, explicit weighted motion compensated prediction (not described in the first embodiment) within a B-slice is enabled. Explicit weighted prediction is one of the modes in which the WP parameter information is explicitly encoded using the pred weight table syntax, and can be realized, for example, by the method described in H.264. When weighted_bipred_idc is 2, the implicit weighted motion compensated prediction of the first embodiment within a B-slice is enabled.
 FIG. 17 is a diagram illustrating an example of the sequence parameter set syntax 504 of the first embodiment. profile_idc is an identifier indicating information on the profile of the encoded data. level_idc is an identifier indicating information on the level of the encoded data. seq_parameter_set_id is an identifier indicating which sequence parameter set syntax 504 is referred to. max_num_ref_frames is a variable indicating the maximum number of reference images in a frame. implicit_weighted_unipred_enabled_flag is, for example, a syntax element indicating whether implicit weighted motion compensated prediction for P-slices is enabled or disabled for the encoded data. implicit_weighted_bipred_enabled_flag is, for example, a syntax element indicating whether implicit weighted motion compensated prediction for B-slices is enabled or disabled for the encoded data.
 When implicit_weighted_unipred_enabled_flag or implicit_weighted_bipred_enabled_flag is 0, implicit weighted motion compensated prediction for P-slices or B-slices within the encoded data is disabled; accordingly, the WP application flag included in the WP parameter information is always set to 0, and the WP selectors 304 and 305 connect their respective output ends to the default motion compensation unit 301. On the other hand, when implicit_weighted_unipred_enabled_flag or implicit_weighted_bipred_enabled_flag is 1, implicit weighted motion compensated prediction for P-slices or B-slices within the encoded data is enabled.
 As another example, when implicit_weighted_unipred_enabled_flag or implicit_weighted_bipred_enabled_flag is 1, whether implicit weighted motion compensated prediction is enabled or disabled may be specified for each local region within a slice in the syntax of lower layers (such as the picture parameter set, slice header, coding tree block, transform unit, and prediction unit).
 As yet another example, when implicit_weighted_unipred_enabled_flag or implicit_weighted_bipred_enabled_flag is 1, the image feature quantity of the encoded image may be derived after the decoding processing of the encoded slice is completed and held in the DPB together with the information of the reference image. In this case, the processing of calculating the image feature quantity multiple times, each time the encoded slice held in the DPB as a reference image is referred to when another slice is encoded, can be omitted, so the processing amount can be reduced.
 As described above, in the first embodiment, the reference image feature quantity derivation unit 108 derives the pixel average value and the pixel error value of each reference image as the reference image feature quantity, the predicted image feature quantity derivation unit 109 derives the pixel average value and the pixel error value of the predicted image as the predicted image feature quantity, and the parameter derivation unit 110 derives, as the WP parameter information, a weighting factor using the pixel error values of the reference image and of the predicted image, and an offset using the pixel average values of the reference image and of the predicted image (Equations (29) and (30)). Therefore, according to the first embodiment, implicit weighted prediction can be performed taking into account not only the weighting factor but also the offset, and by using a predicted image generated by this prediction, the prediction error can be reduced, so the code amount can be reduced and the coding efficiency can be improved.
 Specifically, in the first embodiment, the pixel value change between reference images is linearly interpolated or linearly extrapolated based on the pixel average values and pixel error values derived for the two reference images to obtain the pixel average value and the pixel error value of the predicted image, and the pixel value change between the reference images and the predicted image is predicted from these values. Therefore, according to the first embodiment, the weighting factor and the offset can be predicted effectively for video having temporally continuous fade or dissolve effects, so the prediction error can be reduced, the coding efficiency can be improved, and consequently the subjective image quality can be improved.
 As described above, when performing weighted motion compensated prediction in the implicit mode in which the weight parameters are not explicitly encoded, the encoding device 100 of the first embodiment derives the image feature quantity of the predicted image from the image feature quantities derived for two reference images and the temporal distance ratio, and derives the weighting factor and the offset as the weight parameters relating the reference images and the predicted image, thereby reducing the code amount and improving the coding efficiency.
 Note that syntax elements not defined in the first embodiment may be inserted between the rows of the syntax tables illustrated in FIGS. 16 and 17, and descriptions regarding other conditional branches may be included. The syntax tables may be divided into a plurality of tables, or a plurality of syntax tables may be integrated. In addition, the terms used for the illustrated syntax elements may be changed arbitrarily.
(Modification 1 of the first embodiment)
 In Modification 1 of the first embodiment, another method for calculating the pixel average value and the pixel error value of a reference image will be described.
 When the reference image feature quantity derivation unit 108 (the average value calculation unit 401 and the error value calculation unit 402) calculates the pixel average value and the pixel error value of a reference image according to Equations (10) and (11), the calculated pixel average value and pixel error value are fractional values, and an error in the fractional part occurs when they are rounded to integer precision.
 To avoid this error, the rounding error of the division can be reduced by performing the divisions collectively when the WP parameters (weighting factor and offset) are derived. In this case, the reference image feature quantity derivation unit 108 calculates the pixel average value and the pixel error value of the reference image according to Equations (34) and (35).
[Equation (34) is given as image JPOXMLDOC01-appb-M000003]
[Equation (35) is given as image JPOXMLDOC01-appb-M000004]
 In this case, the parameter derivation unit 110 calculates the offset according to Equation (36).
 Offset[X][Y] = ((((curDC<<Log2Denom)-(Weight[X][Y]*(DC[X][Y]<<LeftShft)))/N+RealOfst)>>RealLog2Denom)   …(36)
 In Equation (36), division by the image size N has been added to Equation (30). Note that, for example, by defining the internal calculation precision for each image size, the division by N can be removed and incorporated into the RealLog2Denom term.
 According to Modification 1 of the first embodiment, the rounding error of the division can be reduced.
(Modification 2 of the first embodiment)
 In Modification 2 of the first embodiment as well, another method for calculating the pixel average value and the pixel error value of a reference image will be described.
 In Modification 1 of the first embodiment, when the reference image feature quantity derivation unit 108 (the average value calculation unit 401 and the error value calculation unit 402) calculates the pixel average value and the pixel error value of a reference image over the entire image according to Equations (34) and (35), the calculation precision increases depending on the resolution of the image. For example, for video having 4096x2160 pixels (approximately 2 to the 23rd power), the worst-case accumulated value in Equation (34) is 2 to the 33rd power (23 + 10) when the pixel range is assumed to be a 10-bit signal, which exceeds 32-bit arithmetic.
 Therefore, the reference image feature quantity derivation unit 108 performs Equations (34) and (35) in predetermined processing units and quantizes the results by a predetermined value, so that the calculation precision per processing unit can be kept constant regardless of the resolution. An arbitrary unit such as a slice, a line, or a pixel block can be set as the processing unit.
 For example, when the processing is performed in units of coding tree units, the reference image feature quantity derivation unit 108 performs Equations (34) and (35) in units of 64x64 pixels (2 to the 12th power) shown in FIG. 3A, and quantizes the calculated pixel average value and pixel error value according to Equations (37) and (38), respectively.
 DCLX' = (DCLX+Ofst5)>>Shft5   …(37)
 ACLX' = (ACLX+Ofst5)>>Shft5   …(38)
 Here, Shft5 is calculated by Equation (39) and Ofst5 is calculated by Equation (40). In this example, the value of Shft5 is set to 4 in Equation (39) so that the size-N term of Equation (36) can be absorbed into the shift operation.
 Shft5=4   …(39)
 Ofst5=(1<<(Shft5-1))   …(40)
 In this case, the parameter deriving unit 110 calculates the offset according to Equation (30) and calculates RealLog2Denom according to Equation (41).
 RealLog2Denom = Log2Denom+LeftShft+Shft5   …(41)
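 As an illustration, the quantization step of Equations (37) to (40) can be sketched as follows. DCLX and ACLX are assumed to be the per-coding-tree-unit sums produced by Equations (34) and (35), whose exact form is not reproduced here; the C types and the function name are assumptions.

#include <stdint.h>

#define SHFT5 4                      /* Equation (39) */
#define OFST5 (1 << (SHFT5 - 1))     /* Equation (40) */

/* Per-coding-tree-unit quantization of Equations (37) and (38): right shift
   with rounding keeps every 64x64 accumulation within a fixed precision, so
   the derivation stays inside 32-bit arithmetic regardless of resolution.  */
static inline int32_t quantize_ctu_sum(int32_t sum)
{
    return (sum + OFST5) >> SHFT5;
}

/* DCLX' = quantize_ctu_sum(DCLX);   Equation (37)
   ACLX' = quantize_ctu_sum(ACLX);   Equation (38) */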
 According to Modification 2 of the first embodiment, the division that arises depending on the image size or block size can be realized by a right shift, and the processing can be performed with 32-bit arithmetic regardless of the resolution, so that the hardware scale can be reduced.
(Modification 3 of the first embodiment)
 In Modification 3 of the first embodiment, yet another method for calculating the pixel average value and the pixel error value of the reference image is described.
 For a simple fade or dissolve effect in which the statistical properties of the pixel average value and the pixel error value within the picture change little, there is no need to calculate the pixel average value and the pixel error value for every processing region, nor to use all pixels of the image for the calculation.
 Therefore, the reference image feature quantity deriving unit 108 can reduce the amount of processing required to calculate the pixel average value and the pixel error value by determining in advance the region used for the calculation, for example by skipping every other pixel block, subsampling pixel blocks in a lattice pattern, simple subsampling, or processing per line or per slice.
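 As a sketch of one possible subsampling, the code below assumes that the feature values are a pixel mean and a mean absolute difference, in the spirit of Equations (10) and (11) whose exact form is not reproduced here; the checkerboard pattern of pixel blocks, the block size, and all names and types are illustrative assumptions.

#include <stdint.h>
#include <stdlib.h>

/* Compute the pixel average and pixel error of a reference picture from a
   checkerboard subsample of pixel blocks only (Modification 3). */
void subsampled_features(const uint8_t *pic, int width, int height,
                         int block, int *dc_out, int *ac_out)
{
    int64_t sum = 0, err = 0, count = 0;

    /* First pass: accumulate only every other block. */
    for (int by = 0; by < height / block; by++)
        for (int bx = (by & 1); bx < width / block; bx += 2)
            for (int y = 0; y < block; y++)
                for (int x = 0; x < block; x++) {
                    sum += pic[(by * block + y) * width + bx * block + x];
                    count++;
                }
    if (count == 0) { *dc_out = 0; *ac_out = 0; return; }
    int dc = (int)(sum / count);

    /* Second pass over the same subsampled blocks for the error value. */
    for (int by = 0; by < height / block; by++)
        for (int bx = (by & 1); bx < width / block; bx += 2)
            for (int y = 0; y < block; y++)
                for (int x = 0; x < block; x++)
                    err += abs(pic[(by * block + y) * width + bx * block + x] - dc);

    *dc_out = dc;
    *ac_out = (int)(err / count);
}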
(Modification 4 of the first embodiment)
 In Modification 4 of the first embodiment, yet another method for calculating the pixel average value and the pixel error value of the reference image is described.
 When the reference image feature quantity deriving unit 108 calculates the pixel average value and the pixel error value for each processing region, the parameter deriving unit 110 can derive WP parameter information for each such processing region. For example, when the reference image feature quantity deriving unit 108 calculates the pixel average value and the pixel error value per coding tree unit, it derives the reference image feature quantity in units of 64×64 pixels shown in FIG. 3A; since the predicted image feature quantity deriving unit 109 also derives the predicted image feature quantity in the same 64×64 pixel units shown in FIG. 3A, the parameter deriving unit 110 can derive WP parameter information per coding tree unit.
 The parameter deriving unit 110 then uses the derived per-region WP parameter information to perform the weighted motion compensated prediction of Equation (7) or Equation (9), which realizes implicit weighted motion compensated prediction on a per-pixel-block basis. For example, when the temporal change of pixel values differs from region to region of the video and the change is temporally continuous, deriving WP parameter information for each pixel block as described above and performing weighted motion compensated prediction implicitly can reduce the code amount, as in the sketch below.
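 The following sketch only illustrates applying a per-pixel-block weight and offset to a motion compensated block; Equation (9) itself is not reproduced in this part of the text, so the H.264-style unidirectional expression used here is an assumption, as are all names and types.

#include <stdint.h>

/* Apply one block's implicit weight and offset to its motion compensated
   prediction (Modification 4). The weighted form is an assumed H.264-style
   expression standing in for Equation (9). */
void weighted_block(const int16_t *mc_pred, uint8_t *out, int n,
                    int weight, int offset, int logWD)
{
    int rnd = (logWD >= 1) ? (1 << (logWD - 1)) : 0;
    for (int i = 0; i < n; i++) {
        int v = ((mc_pred[i] * weight + rnd) >> logWD) + offset;
        out[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));   /* clip to 8-bit range */
    }
}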
(Second Embodiment)
 In the second embodiment, a decoding device that decodes encoded data generated by the encoding device of the first embodiment is described.
 FIG. 18 is a block diagram illustrating an example of the configuration of a decoding device 800 according to the second embodiment.
 The decoding device 800 decodes encoded data stored in an input buffer (not shown) or the like into a decoded image, and outputs the decoded image as an output image to an output buffer (not shown). The encoded data is output from, for example, the encoding device 100 of FIG. 1 and is input to the decoding device 800 via a storage system, a transmission system, a buffer, or the like (not shown).
 As illustrated in FIG. 18, the decoding device 800 includes a decoding unit 801, an inverse quantization unit 802, an inverse orthogonal transform unit 803, an addition unit 804, a predicted image generation unit 805, a reference image feature quantity deriving unit 806, a predicted image feature quantity deriving unit 807, and a parameter deriving unit 808. The inverse quantization unit 802, the inverse orthogonal transform unit 803, the addition unit 804, the predicted image generation unit 805, the reference image feature quantity deriving unit 806, the predicted image feature quantity deriving unit 807, and the parameter deriving unit 808 are substantially the same as or similar to the inverse quantization unit 104, the inverse orthogonal transform unit 105, the addition unit 106, the predicted image generation unit 107, the reference image feature quantity deriving unit 108, the predicted image feature quantity deriving unit 109, and the parameter deriving unit 110 of FIG. 1, respectively. The decoding control unit 809 shown in FIG. 18 controls the decoding device 800 and can be realized by, for example, a CPU.
 The decoding unit 801 parses the encoded data for each frame or field on the basis of the syntax. The decoding unit 801 sequentially entropy-decodes the code string of each syntax element and reproduces the motion information, including the prediction mode, the motion vector, and the reference image number, as well as the coding parameters of the target block such as the quantized transform coefficients. Besides the above, the coding parameters include all parameters necessary for decoding, such as information on the transform coefficients and information on quantization.
 Specifically, the decoding unit 801 has a function of performing decoding processing such as variable-length decoding or arithmetic decoding on the input encoded data. For example, H.264 uses context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC). These processes are also called parsing processes.
 The decoding unit 801 outputs the motion information, the quantized transform coefficients, and the like, inputs the quantized transform coefficients to the inverse quantization unit 802, and inputs the motion information to the predicted image generation unit 805.
 The inverse quantization unit 802 performs inverse quantization on the quantized transform coefficients input from the decoding unit 801 to obtain restored transform coefficients. Specifically, the inverse quantization unit 802 performs inverse quantization in accordance with the quantization information used in the decoding unit 801. More specifically, the inverse quantization unit 802 multiplies the quantized transform coefficients by the quantization step size derived from the quantization information to obtain the restored transform coefficients. The inverse quantization unit 802 outputs the restored transform coefficients and inputs them to the inverse orthogonal transform unit 803.
 The inverse orthogonal transform unit 803 performs, on the restored transform coefficients input from the inverse quantization unit 802, an inverse orthogonal transform corresponding to the orthogonal transform performed on the encoding side, and obtains a restored prediction error. The inverse orthogonal transform unit 803 outputs the restored prediction error and inputs it to the addition unit 804.
 The addition unit 804 adds the restored prediction error input from the inverse orthogonal transform unit 803 and the corresponding predicted image to generate a decoded image. The addition unit 804 outputs the decoded image and inputs it to the predicted image generation unit 805. The addition unit 804 also outputs the decoded image to the outside as an output image. The output image is thereafter temporarily stored in an external output buffer (not shown) or the like and is output, for example in accordance with the output timing managed by the decoding control unit 809, to a display device system such as a display or monitor (not shown) or to a video device system.
 The predicted image generation unit 805 generates a predicted image using the motion information input from the decoding unit 801, the WP parameter information input from the parameter deriving unit 808, and the decoded image input from the addition unit 804.
 Here, the details of the predicted image generation unit 805 are described with reference to FIG. 4. Like the predicted image generation unit 107, the predicted image generation unit 805 includes a multi-frame motion compensation unit 201, a memory 202, a unidirectional motion compensation unit 203, a prediction parameter control unit 204, a reference image selector 205, a frame memory 206, and a reference image control unit 207.
 The frame memory 206 stores the decoded image input from the addition unit 804 as a reference image under the control of the reference image control unit 207. The frame memory 206 has a plurality of memory sets FM0 to FMN (N ≥ 1) for temporarily holding reference images.
 The prediction parameter control unit 204 prepares, as a table, a plurality of combinations of a reference image number and prediction parameters on the basis of the motion information input from the decoding unit 801. Here, the motion information refers to the motion vector indicating the amount of motion displacement used in motion compensated prediction, the reference image number, information on the prediction mode such as unidirectional/bidirectional prediction, and the like. The prediction parameters refer to information on the motion vector and the prediction mode. The prediction parameter control unit 204 then selects and outputs, on the basis of the motion information, the combination of the reference image number and the prediction parameters used for generating the predicted image, inputs the reference image number to the reference image selector 205, and inputs the prediction parameters to the unidirectional motion compensation unit 203.
 The reference image selector 205 is a switch that switches, in accordance with the reference image number input from the prediction parameter control unit 204, which output terminal of the frame memories FM0 to FMN of the frame memory 206 is connected. For example, when the reference image number is 0, the reference image selector 205 connects the output terminal of FM0 to its own output terminal, and when the reference image number is N, it connects the output terminal of FMN to its own output terminal. The reference image selector 205 outputs the reference image stored in the frame memory whose output terminal is connected, among the frame memories FM0 to FMN of the frame memory 206, and inputs it to the unidirectional motion compensation unit 203 and to the reference image feature quantity deriving unit 806.
 The unidirectional motion compensation unit 203 performs motion compensated prediction in accordance with the prediction parameters input from the prediction parameter control unit 204 and the reference image input from the reference image selector 205, and generates a unidirectional predicted image. Since motion compensated prediction has already been described with reference to FIG. 5, its description is omitted.
 The unidirectional motion compensation unit 203 outputs the unidirectional predicted image and temporarily stores it in the memory 202. When the motion information (prediction parameters) indicates bidirectional prediction, the multi-frame motion compensation unit 201 performs weighted prediction using two unidirectional predicted images; the unidirectional motion compensation unit 203 therefore stores the unidirectional predicted image corresponding to the first in the memory 202 and directly outputs the unidirectional predicted image corresponding to the second to the multi-frame motion compensation unit 201. Here, the unidirectional predicted image corresponding to the first is referred to as the first predicted image, and the unidirectional predicted image corresponding to the second as the second predicted image.
 Note that two unidirectional motion compensation units 203 may be provided so that each generates one of the two unidirectional predicted images. In this case, when the motion information (prediction parameters) indicates unidirectional prediction, the unidirectional motion compensation unit 203 may directly output the first unidirectional predicted image as the first predicted image to the multi-frame motion compensation unit 201.
 The multi-frame motion compensation unit 201 performs weighted prediction using the first predicted image input from the memory 202, the second predicted image input from the unidirectional motion compensation unit 203, and the WP parameter information input from the parameter deriving unit 808, and generates a predicted image. The multi-frame motion compensation unit 201 outputs the predicted image and inputs it to the addition unit 804.
 Here, the details of the multi-frame motion compensation unit 201 are described with reference to FIG. 6. As in the predicted image generation unit 107, the multi-frame motion compensation unit 201 includes a default motion compensation unit 301, a weighted motion compensation unit 302, a WP parameter control unit 303, and WP selectors 304 and 305.
 The WP parameter control unit 303 outputs a WP application flag and weight information on the basis of the WP parameter information input from the parameter deriving unit 808, inputs the WP application flag to the WP selectors 304 and 305, and inputs the weight information to the weighted motion compensation unit 302.
 Here, the WP parameter information includes the fixed-point precision of the weighting factors, a first WP application flag, a first weighting factor, and a first offset corresponding to the first predicted image, and a second WP application flag, a second weighting factor, and a second offset corresponding to the second predicted image. The WP application flag is a parameter that can be set for each corresponding reference image and signal component, and indicates whether weighted motion compensated prediction is performed. The weight information includes the fixed-point precision of the weighting factors, the first weighting factor, the first offset, the second weighting factor, and the second offset. The WP parameter information represents the same information as in the first embodiment.
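 For illustration, the WP parameter information described above might be laid out as in the following sketch; the struct and field names and the integer types are assumptions, and only the set of items follows the text.

/* Illustrative layout of the WP parameter information (not part of the
   embodiment): fixed-point precision plus per-prediction application flag,
   weighting factor, and offset. */
typedef struct {
    int log2_denom;      /* fixed-point precision of the weighting factors */
    int wp_flag[2];      /* first / second WP application flag             */
    int weight[2];       /* first / second weighting factor                */
    int offset[2];       /* first / second offset                          */
} WPParamInfo;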
 Specifically, when the WP parameter information is input from the parameter deriving unit 808, the WP parameter control unit 303 separates it into the first WP application flag, the second WP application flag, and the weight information and outputs them, inputting the first WP application flag to the WP selector 304, the second WP application flag to the WP selector 305, and the weight information to the weighted motion compensation unit 302.
 The WP selectors 304 and 305 switch the connection destination of the respective predicted images on the basis of the WP application flags input from the WP parameter control unit 303. When the respective WP application flag is 0, each of the WP selectors 304 and 305 connects its output terminal to the default motion compensation unit 301 and outputs the first predicted image and the second predicted image to the default motion compensation unit 301. On the other hand, when the respective WP application flag is 1, each of the WP selectors 304 and 305 connects its output terminal to the weighted motion compensation unit 302 and outputs the first predicted image and the second predicted image to the weighted motion compensation unit 302.
 The default motion compensation unit 301 performs averaging on the basis of the two unidirectional predicted images (the first predicted image and the second predicted image) input from the WP selectors 304 and 305, and generates a predicted image. Specifically, when the first WP application flag and the second WP application flag are 0, the default motion compensation unit 301 performs the averaging according to Equation (1).
 When the prediction mode indicated by the motion information (prediction parameters) is unidirectional prediction, the default motion compensation unit 301 calculates the final predicted image according to Equation (4) using only the first predicted image.
 The weighted motion compensation unit 302 performs weighted motion compensation on the basis of the two unidirectional predicted images (the first predicted image and the second predicted image) input from the WP selectors 304 and 305 and the weight information input from the WP parameter control unit 303. Specifically, when the first WP application flag and the second WP application flag are 1, the weighted motion compensation unit 302 performs the weighting process according to Equation (7).
 When the calculation precision of the first predicted image and the second predicted image differs from that of the predicted image, the weighted motion compensation unit 302 realizes the rounding process by controlling logWD_C, the fixed-point precision, as in Equation (8).
 When the prediction mode indicated by the motion information (prediction parameters) is unidirectional prediction, the weighted motion compensation unit 302 calculates the final predicted image according to Equation (9) using only the first predicted image.
 In this case as well, when the calculation precision of the first predicted image and the second predicted image differs from that of the predicted image, the weighted motion compensation unit 302 realizes the rounding process by controlling logWD_C, the fixed-point precision, as in Equation (8), in the same way as in bidirectional prediction.
 Since the fixed-point precision of the weighting factors has already been described with reference to FIG. 7, its description is omitted. In the case of unidirectional prediction, the various parameters corresponding to the second predicted image (the second WP application flag, the second weighting factor, and the second offset) are not used and may be set to predetermined initial values.
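 Equations (7) and (8) themselves are not reproduced in this part of the text; the bidirectional sketch below uses the familiar H.264-style form as an assumption, only to show where the two weights, the two offsets, and the fixed-point precision logWD_C enter.

/* Hedged sketch of the bidirectional weighted combination performed by the
   weighted motion compensation unit 302. The expression is an assumed
   H.264-style form; when the intermediate precision of the two unidirectional
   predictions differs from the output precision, logWD is adjusted
   accordingly (cf. Equation (8)). */
int weighted_bi_sample(int p0, int p1, int w0, int w1,
                       int o0, int o1, int logWD)
{
    int v = ((p0 * w0 + p1 * w1 + (1 << logWD)) >> (logWD + 1))
            + ((o0 + o1 + 1) >> 1);
    return v < 0 ? 0 : (v > 255 ? 255 : v);   /* clip to the 8-bit pixel range */
}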
 The reference image feature quantity deriving unit 806, the predicted image feature quantity deriving unit 807, and the parameter deriving unit 808 implicitly derive, from the reference images input from the predicted image generation unit 805, the WP parameter information corresponding to the predicted image generated by the predicted image generation unit 805.
 The reference image feature quantity deriving unit 806 derives a reference image feature quantity for each reference image input from the predicted image generation unit 805, outputs a reference image group feature quantity that collects the derived reference image feature quantities, and inputs it to the predicted image feature quantity deriving unit 807 and the parameter deriving unit 808.
 Since the reference image group feature quantity has already been described with reference to FIG. 8, its description is omitted. Here, the details of the reference image feature quantity deriving unit 806 are described with reference to FIG. 9. Like the reference image feature quantity deriving unit 108, the reference image feature quantity deriving unit 806 includes an average value calculating unit 401, an error value calculating unit 402, and an integrating unit 403.
 Each time a reference image is input from the predicted image generation unit 805, the average value calculating unit 401 calculates the pixel average value of the reference image, outputs the calculated pixel average value, and inputs it to the error value calculating unit 402 and the integrating unit 403. The average value calculating unit 401 calculates the pixel average value of the reference image using, for example, Equation (10).
 Each time a reference image is input from the predicted image generation unit 805, the error value calculating unit 402 uses the pixel average value of the reference image input from the average value calculating unit 401 to calculate the pixel error value with respect to that pixel average value, outputs the calculated pixel error value, and inputs it to the integrating unit 403. The error value calculating unit 402 calculates the pixel error value of the reference image using, for example, Equation (11).
 Each time a reference image is input from the predicted image generation unit 805, the integrating unit 403 integrates the list number and reference image number of the reference image, the pixel average value of the reference image input from the average value calculating unit 401, and the pixel error value input from the error value calculating unit 402 into a reference image feature quantity. The integrating unit 403 then collects the integrated reference image feature quantities into a table as shown in FIG. 8, outputs them as the reference image group feature quantity, and inputs it to the predicted image feature quantity deriving unit 807 and the parameter deriving unit 808.
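 For illustration, one row of the table of FIG. 8 might be represented as in the sketch below; the struct and field names are assumptions, and only the set of items (list number, reference image number, pixel average value, pixel error value) follows the text.

/* Illustrative layout of one entry of the reference image group feature
   quantity table assembled by the integrating unit 403 (cf. FIG. 8). */
typedef struct {
    int list;      /* list number                                */
    int ref_idx;   /* reference image number                     */
    int dc;        /* pixel average value of the reference image */
    int ac;        /* pixel error value of the reference image   */
} RefFeature;      /* the POC is derived later from list/ref_idx */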
 The predicted image feature quantity deriving unit 807 derives a predicted image feature quantity on the basis of the reference image group feature quantity input from the reference image feature quantity deriving unit 806, outputs the derived predicted image feature quantity, and inputs it to the parameter deriving unit 808. The predicted image feature quantity includes the pixel average value of the predicted image, a pixel error value obtained by averaging the errors from that pixel average value, and a predicted image feature quantity derivation flag indicating whether the predicted image feature quantity has been derived.
 Here, the details of the predicted image feature quantity deriving unit 807 are described with reference to FIG. 10. Like the predicted image feature quantity deriving unit 109, the predicted image feature quantity deriving unit 807 includes a feature quantity control unit 411, a memory 412, and a predicted image feature quantity calculating unit 413.
 The feature quantity control unit 411 selects and outputs, from the reference image group feature quantity input from the reference image feature quantity deriving unit 806, the two reference image feature quantities used for deriving the predicted image feature quantity, inputting (loading) one into the memory 412 and inputting the other to the predicted image feature quantity calculating unit 413.
 Specifically, the feature quantity control unit 411 derives, from the list number and reference image number of each reference image feature quantity collected in the reference image group feature quantity, the POC (Picture Order Count) indicating the display order of the reference image of that feature quantity. The reference list and the reference image number are information that indirectly designates a reference image, whereas the POC is information that directly designates a reference image and corresponds to the absolute position of the reference image. The feature quantity control unit 411 then selects, from the derived POCs, the two POCs closest in temporal distance to the encoding target image (predicted image), thereby selecting from the reference image group feature quantity the two reference image feature quantities used for deriving the predicted image feature quantity.
 When the slice to be coded is a P-slice, the feature quantity control unit 411 selects the two POCs (POC1 and POC2) using Equation (12).
 When the slice to be coded is a B-slice, the feature quantity control unit 411 selects the two POCs (POC1 and POC2) using Equation (13).
 When the two selected POCs (POC1 and POC2) are identical, the feature quantity control unit 411 reselects one of them (POC2) using Equation (14). By repeating Equation (14), the feature quantity control unit 411 can select two POCs whose temporal distances from the encoding target image differ (a POC2 whose temporal distance from the encoding target image differs from that of POC1).
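 Equations (12) to (14) are not reproduced in this part of the text; the following sketch only follows the prose behaviour (pick the two POCs closest in time to the current image and re-select when both coincide), and all names and types are assumptions.

#include <stdlib.h>

/* Hedged sketch of the POC selection performed by the feature quantity
   control unit 411. */
void select_two_pocs(const int *ref_poc, int num_refs, int cur_poc,
                     int *poc1, int *poc2)
{
    int best = 0, second = 0;
    for (int i = 1; i < num_refs; i++) {
        int d = abs(ref_poc[i] - cur_poc);
        if (d < abs(ref_poc[best] - cur_poc)) {
            second = best;
            best = i;
        } else if (second == best || d < abs(ref_poc[second] - cur_poc)) {
            second = i;
        }
    }
    *poc1 = ref_poc[best];
    *poc2 = ref_poc[second];

    /* Re-selection when the two POCs coincide (cf. Equation (14)): look for
       a reference whose temporal distance from the current image differs. */
    for (int i = 0; i < num_refs && *poc1 == *poc2; i++)
        if (ref_poc[i] != *poc1)
            *poc2 = ref_poc[i];
}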
 When POC1 is selected, the feature quantity control unit 411 selects the reference image feature quantity corresponding to POC1 from the reference image group feature quantity, outputs it, and inputs it to the memory 412. When POC2 is selected, the feature quantity control unit 411 selects the reference image feature quantity corresponding to POC2 from the reference image group feature quantity, outputs it, and inputs it to the predicted image feature quantity calculating unit 413.
 In the second embodiment, an example in which the feature quantity control unit 411 selects two POCs has been described, but three or more POCs may be selected. In that case, the feature quantity control unit 411 selects three or more reference image feature quantities from the reference image group feature quantity, and the predicted image feature quantity is derived from the selected three or more reference image feature quantities. Note that, in the case of a P-slice, N ≥ 2 is required, and the feature quantity control unit 411 may search the reference list with list number 0 after executing Equation (12).
 The memory 412 holds the reference image feature quantity input from the feature quantity control unit 411 (the reference image feature quantity corresponding to POC1, hereinafter referred to as the first reference image feature quantity). At the timing when the reference image feature quantity corresponding to POC2 (hereinafter referred to as the second reference image feature quantity) is input from the feature quantity control unit 411 to the predicted image feature quantity calculating unit 413, the memory 412 outputs the first reference image feature quantity and inputs it to the predicted image feature quantity calculating unit 413.
 The predicted image feature quantity calculating unit 413 calculates the predicted image feature quantity using the first reference image feature quantity input from the memory 412 and the second reference image feature quantity input from the feature quantity control unit 411, outputs it, and inputs it to the parameter deriving unit 808.
 The predicted image feature quantity calculating unit 413 first calculates, according to Equation (15), the temporal distance between the reference image of the first reference image feature quantity and the reference image of the second reference image feature quantity.
 The predicted image feature quantity calculating unit 413 then calculates the pixel average value of the predicted image according to Equation (19) and the pixel error value of the predicted image according to Equation (20).
 However, when the DistScaleFactor calculated by Equation (15) satisfies any of the conditions of Equations (23) to (25), the predicted image feature quantity cannot be derived or the temporal distance is too large, and the predicted image feature quantity calculating unit 413 therefore sets the pixel average value DCP of the predicted image and the pixel error value ACP of the predicted image to initial values.
 For example, when the DistScaleFactor satisfies any of the conditions of Equations (23) to (25), the predicted image feature quantity calculating unit 413 sets the predicted image feature quantity derivation flag wp_avaiable_flag, an internal variable, to false; when the DistScaleFactor satisfies none of the conditions of Equations (23) to (25), it sets wp_avaiable_flag to true. When the predicted image feature quantity derivation flag is set to false, initial values are set in DCP and ACP; for example, DefaultDC indicating 0 is set in DCP and DefaultAC indicating 0 is set in ACP. When the predicted image feature quantity derivation flag is set to true, the values calculated by Equations (19) and (20) are set in DCP and ACP, respectively.
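 The following is a control-flow sketch of this step only. Equations (15), (19), (20) and the validity conditions (23) to (25) are not reproduced in this part of the text, so they appear here as assumed placeholder functions; only the flag handling and the default values follow the prose, and all names and types are assumptions.

/* Control-flow sketch of the predicted image feature quantity calculation
   in the calculating unit 413. */
extern int calc_dist_scale_factor(int poc1, int poc2, int cur_poc); /* Eq.(15), assumed placeholder      */
extern int dist_scale_invalid(int dist_scale);                      /* conditions (23)-(25), assumed     */
extern int interp_dc(int dc1, int dc2, int dist_scale);             /* Eq.(19), assumed placeholder      */
extern int interp_ac(int ac1, int ac2, int dist_scale);             /* Eq.(20), assumed placeholder      */

enum { DefaultDC = 0, DefaultAC = 0 };

typedef struct { int dcp, acp, wp_avaiable_flag; } PredFeature;

PredFeature calc_pred_feature(int dc1, int ac1, int poc1,
                              int dc2, int ac2, int poc2, int cur_poc)
{
    PredFeature f;
    int dist_scale = calc_dist_scale_factor(poc1, poc2, cur_poc);

    if (dist_scale_invalid(dist_scale)) {
        f.wp_avaiable_flag = 0;          /* derivation impossible or references too far apart */
        f.dcp = DefaultDC;
        f.acp = DefaultAC;
    } else {
        f.wp_avaiable_flag = 1;
        f.dcp = interp_dc(dc1, dc2, dist_scale);   /* pixel average value of the predicted image */
        f.acp = interp_ac(ac1, ac2, dist_scale);   /* pixel error value of the predicted image   */
    }
    return f;
}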
 The parameter deriving unit 808 derives the WP parameter information of the encoding target image using the reference image group feature quantity input from the reference image feature quantity deriving unit 806 and the predicted image feature quantity input from the predicted image feature quantity deriving unit 807.
 Since the WP parameter information has already been described with reference to FIGS. 11A and 11B, its description is omitted.
 The parameter deriving unit 808 first checks the predicted image feature quantity derivation flag wp_avaiable_flag included in the predicted image feature quantity to determine whether the WP parameter information can be derived. When wp_avaiable_flag is set to false, the WP parameter information cannot be derived, so the parameter deriving unit 808 sets the weighting factor and the offset corresponding to list number X and reference image number Y to initial values according to Equation (26) and Equation (27), respectively.
 When wp_avaiable_flag is set to false, the parameter deriving unit 808 sets the initial values in the WP parameter information by repeatedly executing Equations (26) and (27) for all combinations of list number X and reference image number Y (all reference images).
 When wp_avaiable_flag is set to false, the parameter deriving unit 808 also sets the WP application flag (WP_flag[X][Y]) corresponding to list number X and reference image number Y to false.
 On the other hand, when wp_avaiable_flag is set to true, the WP parameter information can be derived, so the parameter deriving unit 808 derives the weighting factor and the offset corresponding to list number X and reference image number Y according to Equation (29) and Equation (30), respectively.
 When wp_avaiable_flag is set to true, the parameter deriving unit 808 derives the WP parameter information shown in FIGS. 11A and 11B by repeatedly executing Equations (29) and (30) for all combinations of list number X and reference image number Y (all reference images).
 When wp_avaiable_flag is set to true, the parameter deriving unit 808 also sets the WP application flag (WP_flag[X][Y]) corresponding to list number X and reference image number Y to true.
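 The sketch below only illustrates the loop structure and flag handling just described. Equations (26), (27), (29) and (30) are not reproduced in this part of the text, so the derive_* placeholders and the default values are assumptions; the array sizes, names, and types are likewise illustrative.

/* Control-flow sketch of the implicit WP parameter derivation in the
   parameter deriving unit 808. */
extern int derive_weight(int ref_ac, int pred_ac, int log2_denom);             /* cf. Eq.(29), assumed */
extern int derive_wp_offset(int weight, int ref_dc, int pred_dc, int log2_denom); /* cf. Eq.(30), assumed */

void derive_wp_param_info(int wp_avaiable_flag, int num_lists, int num_refs,
                          const int ref_dc[2][16], const int ref_ac[2][16],
                          int pred_dc, int pred_ac, int log2_denom,
                          int Weight[2][16], int Offset[2][16], int WP_flag[2][16])
{
    for (int X = 0; X < num_lists; X++) {
        for (int Y = 0; Y < num_refs; Y++) {
            if (!wp_avaiable_flag) {
                Weight[X][Y]  = 1 << log2_denom;  /* assumed default (cf. Eq.(26)) */
                Offset[X][Y]  = 0;                /* assumed default (cf. Eq.(27)) */
                WP_flag[X][Y] = 0;                /* weighted prediction disabled  */
            } else {
                Weight[X][Y]  = derive_weight(ref_ac[X][Y], pred_ac, log2_denom);
                Offset[X][Y]  = derive_wp_offset(Weight[X][Y], ref_dc[X][Y], pred_dc, log2_denom);
                WP_flag[X][Y] = 1;
            }
        }
    }
}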
 The decoding unit 801 uses the syntax 500 shown in FIG. 15. The syntax 500 indicates the structure of the encoded data to be decoded by the decoding unit 801. Since the syntax 500 has already been described with reference to FIG. 15, its description is omitted. The picture parameter set syntax 505 has also already been described with reference to FIG. 16, except that decoding is performed instead of encoding, and its description is therefore omitted. Likewise, the sequence parameter set syntax 504 has already been described with reference to FIG. 17, except that decoding is performed instead of encoding, and its description is omitted.
 As described above, when performing weighted motion compensated prediction in the implicit mode, in which the weight parameters are not explicitly encoded, the decoding device 800 of the second embodiment derives the image feature quantity of the predicted image from the image feature quantities derived for two reference images and their temporal distance ratio, and derives a weighting factor and an offset as the weight parameters of the reference images and the predicted image. The code amount can thereby be reduced and the coding efficiency improved.
(Modification)
 In the first and second embodiments described above, an example has been described in which a frame is divided into rectangular blocks such as 16×16 pixel blocks and encoding/decoding is performed in order from the upper-left block of the picture toward the lower right (see FIG. 2A). However, the encoding order and the decoding order are not limited to this example. For example, encoding and decoding may be performed in order from the lower right toward the upper left, or so as to draw a spiral from the center of the picture toward the picture edge. Furthermore, encoding and decoding may be performed in order from the upper right toward the lower left, or so as to draw a spiral from the picture edge toward the center of the picture. In this case, since the position of the adjacent pixel blocks that can be referred to changes with the encoding order, the position may be changed to an available position as appropriate.
 In the first and second embodiments described above, prediction target block sizes such as 4×4, 8×8, and 16×16 pixel blocks have been described as examples, but the prediction target blocks need not have a uniform block shape. For example, the prediction target block size may be a 16×8 pixel block, an 8×16 pixel block, an 8×4 pixel block, a 4×8 pixel block, or the like. It is also not necessary to unify all block sizes within one coding tree block, and a plurality of different block sizes may be mixed. When a plurality of different block sizes are mixed within one coding tree block, the code amount for encoding or decoding the division information increases as the number of divisions increases. It is therefore desirable to select the block size in consideration of the balance between the code amount of the division information and the quality of the locally decoded image or the decoded image.
 In the first and second embodiments described above, for simplicity, a comprehensive description has been given for the color signal components without distinguishing between the prediction process for the luminance signal and that for the color difference signal. However, when the prediction process differs between the luminance signal and the color difference signal, the same prediction method or different prediction methods may be used. If different prediction methods are used for the luminance signal and the color difference signal, the prediction method selected for the color difference signal can be encoded or decoded in the same manner as for the luminance signal.
 In the first and second embodiments described above, for simplicity, a comprehensive description has been given for the color signal components without distinguishing between the weighted motion compensated prediction process for the luminance signal and that for the color difference signal. However, when the weighted motion compensated prediction process differs between the luminance signal and the color difference signal, the same process or different processes may be used. If a different weighted motion compensated prediction process is used for the color difference signal, the process selected for the color difference signal can be encoded or decoded in the same manner as for the luminance signal.
 In the first and second embodiments described above, syntax elements not specified in these embodiments may be inserted between the rows of the tables shown in the syntax configurations, and descriptions of other conditional branches may be included. Alternatively, a syntax table may be divided into, or integrated with, a plurality of tables. It is also not always necessary to use the same terms; they may be changed arbitrarily depending on the form of use.
 Modifications 1 to 4 of the first embodiment may also be applied to the second embodiment.
 As described above, each of the above embodiments solves the problem that an offset for additively correcting pixel values cannot be used when performing implicit weighted motion compensated prediction, and realizes highly efficient implicit weighted motion compensated prediction processing. Therefore, according to each of the above embodiments, the coding efficiency is improved and, in turn, the subjective image quality is also improved.
 While several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and their equivalents.
 For example, a program that realizes the processing of each of the above embodiments can be provided by being stored in a computer-readable storage medium. The storage medium may have any storage format as long as it is a computer-readable storage medium capable of storing the program, such as a magnetic disk, an optical disc (CD-ROM, CD-R, DVD, etc.), a magneto-optical disc (MO, etc.), or a semiconductor memory.
 The program that realizes the processing of each of the above embodiments may also be stored on a computer (server) connected to a network such as the Internet and downloaded to a computer (client) via the network.
 100 Encoding device
 101 Subtraction unit
 102 Orthogonal transform unit
 103 Quantization unit
 104 Inverse quantization unit
 105 Inverse orthogonal transform unit
 106 Addition unit
 107 Predicted image generation unit
 108 Reference image feature quantity deriving unit
 109 Predicted image feature quantity deriving unit
 110 Parameter deriving unit
 111 Motion evaluation unit
 112 Encoding unit
 113 Encoding control unit
 201 Multi-frame motion compensation unit
 202 Memory
 203 Unidirectional motion compensation unit
 204 Prediction parameter control unit
 205 Reference image selector
 206 Frame memory
 207 Reference image control unit
 301 Default motion compensation unit
 302 Weighted motion compensation unit
 303 WP parameter control unit
 304, 305 WP selector
 401 Average value calculation unit
 402 Error value calculation unit
 403 Integration unit
 411 Feature quantity control unit
 412 Memory
 413 Predicted image feature quantity calculation unit
 800 Decoding device
 801 Decoding unit
 802 Inverse quantization unit
 803 Inverse orthogonal transform unit
 804 Addition unit
 805 Predicted image generation unit
 806 Reference image feature quantity deriving unit
 807 Predicted image feature quantity deriving unit
 808 Parameter deriving unit
 809 Decoding control unit

Claims (12)

  1.  A predicted image generation method comprising:
      a first derivation step of deriving, for each of two or more reference images, a pixel average value and a pixel error value indicating a difference of pixels from the pixel average value;
      a second derivation step of deriving a pixel average value of a predicted image using a temporal distance ratio between the predicted image and at least two reference images among the two or more reference images and the pixel average values of the at least two reference images, and deriving a pixel error value of the predicted image using the temporal distance ratio and the pixel error values of the at least two reference images;
      a third derivation step of deriving a weighting factor of a reference image using the pixel error value of the reference image and the pixel error value of the predicted image, and deriving an offset of the reference image using the derived weighting factor, the pixel average value of the reference image, and the pixel average value of the predicted image; and
      a predicted image generation step of generating the predicted image of a target block, which is one of a plurality of blocks into which an input image is divided, using the reference image of the target block among the reference images, the weighting factor of the reference image, and the offset of the reference image.
  2.  The predicted image generation method according to claim 1, wherein,
      in the third derivation step, the weighting factor of each of the reference images is derived according to a ratio between the pixel error value of the reference image and the pixel error value of the predicted image, and the offset of each of the reference images is derived according to the derived weighting factor, the pixel average value of the reference image, and the pixel average value of the predicted image, and
      in the predicted image generation step, the predicted image of the target block is generated by multiplying a value obtained by motion compensated prediction of the reference image of the target block according to a motion vector by the weighting factor of the reference image and adding the offset of the reference image.
  3.  The predicted image generation method according to claim 1, wherein, in the second derivation step, reference images having mutually different image display times are selected from the two or more reference images as the at least two reference images.
  4.  The predicted image generation method according to claim 3, wherein, in the second derivation step, the at least two reference images are selected from the two or more reference images in ascending order of temporal distance from the predicted image.
  5.  The predicted image generation method according to claim 1, wherein, in the second derivation step, the pixel average value and the pixel error value of the predicted image are derived by performing linear prediction by extrapolation or interpolation.
  6.  The predicted image generation method according to claim 1, wherein,
      in the first derivation step, the pixel average value and the pixel error value of each of the two or more reference images are derived with integer precision, and
      in the third derivation step, rounding to a predetermined fixed-point precision is performed when the weighting factor and the offset of the reference image are derived.
  7.  The predicted image generation method according to claim 1, wherein, in the first derivation step, the pixel average value and the pixel error value of each of the two or more reference images are derived in units of slices, lines, or pixel blocks and are quantized with a predetermined fixed-point precision.
  8.  The predicted image generation method according to claim 1, wherein, in the first derivation step, the pixel average value and the pixel error value of each of the two or more reference images are derived in units of subsampled slices, lines, or pixel blocks.
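For claims 7 and 8, the statistics can be gathered on a coarser grid. The sketch below subsamples the reference image with an assumed stride of 4 before computing the pixel average and pixel error values; the stride and the whole-picture granularity are illustrative choices.

```python
import numpy as np

def subsampled_statistics(ref_image, step=4):
    """Derive the pixel average value and the pixel error value from a
    subsampled grid of the reference image (sketch)."""
    samples = np.asarray(ref_image, dtype=np.float64)[::step, ::step]
    mean = float(samples.mean())
    error = float(np.abs(samples - mean).mean())
    return mean, error
```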
  9.  The predicted image generation method according to claim 1, wherein, in the first derivation step, the pixel average value and the pixel error value of each of the two or more reference images are derived after the input image is encoded; and in the third derivation step, the pixel average value and the pixel error value of the reference image are referred to in accordance with the management method of the reference image.
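One way to picture claim 9 is a small cache that stores each reconstructed picture's statistics and later serves them through the reference picture management; keying by picture order count is an illustrative assumption, not part of the claim.

```python
class ReferenceStatsCache:
    """Sketch: keep the statistics of each reconstructed picture so that they
    can be looked up later when that picture is used as a reference."""

    def __init__(self):
        self._stats = {}

    def store(self, poc, pixel_average, pixel_error):
        # Called once a picture has been encoded and reconstructed.
        self._stats[poc] = (pixel_average, pixel_error)

    def lookup(self, poc):
        # Called in the third derivation step when the picture is referenced.
        return self._stats[poc]
```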
  10.  The predicted image generation method according to claim 1, wherein, in the first derivation step, the pixel average value and the pixel error value of each of the two or more reference images are derived in units of slices, lines, or pixel blocks; in the second derivation step, the pixel average value and the pixel error value of the predicted image are derived in the same units; in the third derivation step, the weighting factor and the offset of the reference image are derived in the same units; and in the predicted image generation step, the predicted image of the target block is generated by multiplying a value obtained by motion-compensated prediction of the reference image of the target block according to a motion vector by the weighting factor of that reference image in the corresponding unit and adding the offset of that reference image in the corresponding unit.
  11.  An encoding method comprising:
    a first derivation step of deriving, for each of two or more reference images, a pixel average value and a pixel error value indicating a pixel difference from the pixel average value;
    a second derivation step of deriving a pixel average value of a predicted image using a temporal distance ratio between the predicted image and at least two reference images among the two or more reference images and the pixel average values of the at least two reference images, and deriving a pixel error value of the predicted image using the temporal distance ratio and the pixel error values of the at least two reference images;
    a third derivation step of deriving a weighting factor of the reference image using the pixel error value of the reference image and the pixel error value of the predicted image, and deriving an offset of the reference image using the derived weighting factor, the pixel average value of the reference image, and the pixel average value of the predicted image;
    a predicted image generation step of generating the predicted image of a target block, which is one of a plurality of blocks into which an input image is divided, using the reference image for the target block among the reference images, the weighting factor of the reference image, and the offset of the reference image; and
    an encoding step of encoding a value based on the input image and the predicted image.
  12.  A decoding method comprising:
    a first derivation step of deriving, for each of two or more reference images, a pixel average value and a pixel error value indicating a pixel difference from the pixel average value;
    a second derivation step of deriving a pixel average value of a predicted image using a temporal distance ratio between the predicted image and at least two reference images among the two or more reference images and the pixel average values of the at least two reference images, and deriving a pixel error value of the predicted image using the temporal distance ratio and the pixel error values of the at least two reference images;
    a third derivation step of deriving a weighting factor of the reference image using the pixel error value of the reference image and the pixel error value of the predicted image, and deriving an offset of the reference image using the derived weighting factor, the pixel average value of the reference image, and the pixel average value of the predicted image;
    a predicted image generation step of generating the predicted image of a target block, which is one of a plurality of blocks into which an input image is divided, using the reference image for the target block among the reference images, the weighting factor of the reference image, and the offset of the reference image; and
    a decoding step of generating a decoded image based on the predicted image.
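Because both the encoder and the decoder can derive the weighting factor and offset from reference-image statistics alone, the encoding step of claim 11 only has to encode a value based on the input image and the predicted image, and the decoding step of claim 12 reverses it. A minimal sketch, assuming the encoded value is a plain residual and omitting the transform, quantization, and entropy-coding stages:

```python
import numpy as np

def encode_block(input_block, pred_block):
    """Encoding step (sketch): a value based on the input image and the
    predicted image, shown here as an untransformed residual."""
    return np.asarray(input_block, dtype=np.int32) - np.asarray(pred_block, dtype=np.int32)

def decode_block(residual, pred_block, max_val=255):
    """Decoding step (sketch): generate the decoded image from the predicted
    image and the decoded residual, clipped to the pixel range."""
    rec = np.asarray(pred_block, dtype=np.int32) + residual
    return np.clip(rec, 0, max_val).astype(np.uint8)
```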
PCT/JP2011/075851 2011-11-09 2011-11-09 Prediction image generation method, encoding method, and decoding method WO2013069117A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2011/075851 WO2013069117A1 (en) 2011-11-09 2011-11-09 Prediction image generation method, encoding method, and decoding method
TW101101752A TW201320750A (en) 2011-11-09 2012-01-17 Prediction image generation method, encoding method, and decoding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2011/075851 WO2013069117A1 (en) 2011-11-09 2011-11-09 Prediction image generation method, encoding method, and decoding method

Publications (1)

Publication Number Publication Date
WO2013069117A1 true WO2013069117A1 (en) 2013-05-16

Family

ID=48288709

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/075851 WO2013069117A1 (en) 2011-11-09 2011-11-09 Prediction image generation method, encoding method, and decoding method

Country Status (2)

Country Link
TW (1) TW201320750A (en)
WO (1) WO2013069117A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022065225A (en) * 2019-03-08 2022-04-27 Sharp Corp Lic unit, image decoding device, and image coding device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004179687A (en) * 2002-11-22 2004-06-24 Toshiba Corp Motion picture coding/decoding method and apparatus thereof
JP2007081518A (en) * 2005-09-12 2007-03-29 Victor Co Of Japan Ltd Moving image coding apparatus and moving image coding method
WO2009005071A1 (en) * 2007-07-02 2009-01-08 Nippon Telegraph And Telephone Corporation Moving picture scalable encoding and decoding method, their devices, their programs, and recording media storing the programs
WO2010035731A1 (en) * 2008-09-24 2010-04-01 Sony Corp Image processing apparatus and image processing method

Also Published As

Publication number Publication date
TW201320750A (en) 2013-05-16

Similar Documents

Publication Publication Date Title
US11871007B2 (en) Encoding device, decoding device, encoding method, and decoding method
JP5925801B2 (en) Encoding method, decoding method, encoding device, and decoding device
WO2013057783A1 (en) Encoding method and decoding method
JP6105034B2 (en) Decoding method and decoding apparatus
JP2018007278A (en) Decoding method and decoder
WO2013069117A1 (en) Prediction image generation method, encoding method, and decoding method
JP6419934B2 (en) Electronic device, encoding method and program
JP2017085633A (en) Decoding method and decoding apparatus
JP6262381B2 (en) Electronic device, decoding method and program
JP6235742B2 (en) Electronic device, decoding method and program
JP5916906B2 (en) Encoding method, decoding method, encoding device, and decoding device
JP6088036B2 (en) Decoding method, decoding apparatus, and program
JP6744507B2 (en) Encoding method and decoding method
JP5702011B2 (en) Encoding method, encoding apparatus, and program
JP6682594B2 (en) Encoding method and decoding method
JP2018007279A (en) Decoding method and decoder
JP6262380B2 (en) Electronic device, decoding method and program
JP6132950B2 (en) Encoding method, decoding method, encoding device, and decoding device
JP6235745B2 (en) Electronic device, decoding method and program
JP5869160B2 (en) Decoding method, decoding apparatus, and program
JP2020129848A (en) Data structure of encoded data, storage device, transmission device, and encoding method
JP2020074609A (en) Storage device, transmission device, reception device, and encoded data
JP2019009792A (en) Encoding method, decoding method and encoded data
JP2014197852A (en) Decoding method and decoder
JPWO2013057783A1 (en) Encoding method and decoding method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11875549

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11875549

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP