
HK1148148B - Image encoder and image decoder - Google Patents


Info

Publication number
HK1148148B
HK1148148B (application HK11102133.1A)
Authority
HK
Hong Kong
Prior art keywords
prediction
encoding
intra
unit
macroblock
Application number
HK11102133.1A
Other languages
Chinese (zh)
Other versions
HK1148148A1 (en)
Inventor
关口俊一
山岸秀一
守屋芳美
山田悦久
浅井光太郎
村上笃道
出原优一
Original Assignee
三菱电机株式会社 (Mitsubishi Electric Corporation)
Application filed by 三菱电机株式会社 (Mitsubishi Electric Corporation)
Priority claimed from PCT/JP2008/061566 (WO2009001864A1)
Publication of HK1148148A1
Publication of HK1148148B

Description

Image encoding device and image decoding device
Technical Field
The present invention relates to a digital image signal encoding apparatus, a digital image signal decoding apparatus, a digital image signal encoding method, and a digital image signal decoding method used in an image compression encoding technique, a compressed image data transmission technique, and the like.
Background
Conventionally, international standard video coding schemes such as MPEG and ITU-T H.26x have been premised on a standardized input signal format called the 4:2:0 format. In the 4:2:0 format, a color moving image signal such as RGB is converted into a luminance component (Y) and two color difference components (Cb, Cr), and the number of samples of each color difference component is reduced to half of the luminance component in both the horizontal and vertical directions. Since the color difference components are less visible than the luminance component, conventional international standard video coding schemes such as the MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 standard (hereinafter, AVC) (non-patent document 1) reduce the amount of original information to be encoded by down-sampling the color difference components before encoding. On the other hand, with the recent increase in the resolution and gradation of video displays, and with the aim of reproducing at showing time exactly the color representation defined when contents such as digital cinema are produced, methods that encode the color difference components with the same number of samples as the luminance component, without down-sampling, have been studied. The format in which the luminance component and the color difference components have the same number of samples is called the 4:4:4 format. In AVC, a "high 4:4:4 profile" is planned as a coding scheme that takes the 4:4:4 format as input; the JPEG 2000 (ISO/IEC 15444) standard (non-patent document 2) is also known as a scheme suitable for this purpose.
As shown in fig. 10, whereas the conventional 4:2:0 format presupposes down-sampling of the color difference components and is therefore limited to color space definitions such as Y, Cb, Cr, the 4:4:4 format makes no distinction in sample ratio between color components, so R, G, B can be used directly, and other color space definitions besides Y, Cb, Cr are also possible. In a video coding method using the 4:2:0 format, the color space is fixed to Y, Cb, Cr, so the type of color space need not be considered in the encoding process; in the above AVC high 4:4:4 profile, by contrast, the color space definition affects the encoding process itself. Moreover, because the current high 4:4:4 profile is designed for compatibility with the other profiles whose coding target is the 4:2:0 format defined in the Y, Cb, Cr space, it cannot be said to be optimized for the compression efficiency of the 4:4:4 format.
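To make the volume of original information concrete: per frame, 4:2:0 carries 1.5 samples per pixel position against 3 for 4:4:4. A small sketch (a hypothetical helper for illustration, not part of either standard):

```python
def samples_per_frame(width, height, fmt):
    """Total luma + chroma samples for one frame.

    fmt is "4:2:0" (Cb/Cr halved horizontally and vertically)
    or "4:4:4" (Cb/Cr at full luma resolution).
    """
    luma = width * height
    if fmt == "4:2:0":
        chroma = 2 * (width // 2) * (height // 2)  # Cb + Cr, quarter size each
    elif fmt == "4:4:4":
        chroma = 2 * luma                          # Cb + Cr (or G, B) at full size
    else:
        raise ValueError(fmt)
    return luma + chroma

# A 1920x1080 frame: 4:2:0 carries half the raw samples of 4:4:4.
print(samples_per_frame(1920, 1080, "4:2:0"))  # 3110400 (1.5x luma)
print(samples_per_frame(1920, 1080, "4:4:4"))  # 6220800 (3x luma)
```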
[Non-patent document 1] MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 standard
[Non-patent document 2] JPEG 2000 (ISO/IEC 15444) standard
Disclosure of Invention
For example, in the high 4:2:0 profile of AVC, whose coding target is the 4:2:0 format, a macroblock region of 16 × 16 luminance pixels corresponds to 8 × 8 pixel blocks for each of the color difference components Cb and Cr. In motion compensation prediction in the high 4:2:0 profile, the block size information serving as the unit of motion compensation prediction, the reference image information used for prediction, and the motion vector information of each block are multiplexed for the luminance component only, and the color difference components are motion-compensated using the same information as the luminance component. This design rests on the premise of the 4:2:0 format that, under its color space definition, the luminance component contributes far more than the color difference components to expressing the structure (texture) of the image. However, the current high 4:4:4 profile, even though the block size of the color difference signal per macroblock is extended to 16 × 16 pixels, merely extends the 4:2:0 color difference intra prediction modes (intra prediction modes) and, as in the 4:2:0 format, treats one component as luminance, multiplexes only that component's information, and performs motion compensation prediction with an inter prediction mode (inter prediction mode), reference image information, and motion vector information common to all three components. In the 4:4:4 format, where each color component contributes equally to expressing the structure of the image signal, such a prediction method is not necessarily optimal.
An object of the present invention is to provide an encoding device, a decoding device, an encoding method, a decoding method, programs for executing them, and recording media storing those programs, with improved optimality over the conventional technique when encoding a moving image signal that, like the 4:4:4 format, has no distinction in sample ratio between color components.
An image encoding device according to the present invention is an image encoding device that receives a color video signal composed of a plurality of color components, divides it into predetermined coding unit regions for each color component, and digitally compresses it by selectively applying intra-frame encoding or motion compensated predictive encoding, and includes: a color component separation unit that separates the input signal into the individual color components; a block dividing unit that divides an input color component signal into blocks of the size specified by coding block size indication information and generates the signal of the coding unit region; a prediction image generation unit that generates a prediction image for the signal of the coding unit region according to one or more prediction modes indicating prediction image generation methods; a determination unit that determines the prediction mode used for encoding based on the prediction efficiency of the prediction image output from the prediction image generation unit; a prediction error encoding unit that encodes the difference between the prediction image corresponding to the prediction mode determined by the determination unit and the input color component signal; and an encoding unit that performs variable length coding on the prediction mode, the output of the prediction error encoding unit, and a color component identification flag indicating which color component the signal separated by the color component separation unit belongs to, wherein the encoding unit multiplexes, into a bit stream for each coding unit region, the coding block size indication information, the prediction mode, and the encoded prediction error.
According to the image encoding device and the image decoding device of the present invention, when encoding using various color spaces without being limited to a fixed color space such as Y, Cb, or Cr is performed, intra prediction mode information and inter prediction mode information used for each color component can be flexibly selected, and optimal encoding processing can be performed even when a plurality of color space definitions are provided.
Drawings
Fig. 1 is an explanatory diagram showing a configuration of a video encoding device according to embodiment 1.
Fig. 2 is an explanatory diagram showing the configuration of the video decoding apparatus according to embodiment 1.
Fig. 3 is an explanatory diagram for explaining a predicted image generation method in the intra 4 × 4 prediction mode evaluated by the spatial prediction unit 2 in fig. 1.
Fig. 4 is an explanatory diagram for explaining a predicted image generation method in the intra 16 × 16 prediction mode evaluated by the spatial prediction unit 2 in fig. 1.
Fig. 5 is a flowchart illustrating a procedure of an intra prediction mode determination process performed in the video encoding apparatus of fig. 1.
Fig. 6 is an explanatory diagram showing the data arrangement of a video bit stream output from the video encoding device in embodiment 1.
Fig. 7 is a flowchart illustrating a procedure of an intra prediction decoding process performed by the video decoding apparatus of fig. 2.
Fig. 8 is an explanatory diagram showing another data arrangement form of a video bit stream output from the video encoding device in embodiment 1.
Fig. 9 is an explanatory diagram for explaining a predicted image generation method in which color difference components in the AVC standard correspond to intra prediction modes.
Fig. 10 is an explanatory diagram for explaining a conventional macroblock and a current macroblock.
Fig. 11 is an explanatory diagram showing a configuration of the video encoding device according to embodiment 2.
Fig. 12 is an explanatory diagram showing the configuration of the video decoding apparatus according to embodiment 2.
Fig. 13 is an explanatory diagram for explaining a predicted image generation method in the intra 8 × 8 prediction mode evaluated by the spatial prediction unit 2 in fig. 11.
Fig. 14 is a flowchart illustrating a procedure of an intra coding mode determination process performed by the video coding apparatus of fig. 11.
Fig. 15 is an explanatory diagram showing the data arrangement of a video bit stream output from the video encoding device in embodiment 2.
Fig. 16 is an explanatory diagram showing another data arrangement of a video bit stream output from the video encoding device in embodiment 2.
Fig. 17 is a flowchart illustrating a procedure of an intra prediction decoding process performed by the video decoding apparatus of fig. 12.
Fig. 18 is an explanatory diagram for explaining parameters of the intra prediction mode encoding process of the component C0 in embodiment 3.
Fig. 19 is an explanatory diagram for explaining parameters of the intra prediction mode encoding process of the component C1 in embodiment 3.
Fig. 20 is an explanatory diagram for explaining parameters of the intra prediction mode encoding process of the component C2 in embodiment 3.
Fig. 21 is a flowchart showing the flow of intra prediction mode encoding processing in embodiment 3.
Fig. 22 is a flowchart showing another flow of the intra prediction mode encoding process according to embodiment 3.
Fig. 23 is a flowchart showing the flow of the intra prediction mode decoding process in embodiment 3.
Fig. 24 is an explanatory diagram showing another data arrangement of a video bit stream output from the video encoding device according to embodiment 4.
Fig. 25 is a flowchart showing another flow of the intra prediction mode encoding process according to embodiment 5.
Fig. 26 is an explanatory diagram showing a rule of tabulated predicted value setting in embodiment 5.
Fig. 27 is a flowchart showing an encoding procedure in embodiment 6.
Fig. 28 is an explanatory diagram showing a binary sequence structure of CurrIntraPredMode in embodiment 6.
Fig. 29 is an explanatory diagram showing another binary sequence structure of CurrIntraPredMode in embodiment 6.
Fig. 30 is an explanatory diagram showing a configuration of the video encoding device according to embodiment 7.
Fig. 31 is an explanatory diagram showing the configuration of the video decoding apparatus according to embodiment 7.
Fig. 32 is an explanatory diagram showing a unit of a macroblock.
Fig. 33 is a flowchart showing the flow of inter prediction mode determination processing in embodiment 7.
Fig. 34 is an explanatory diagram showing the data arrangement of a video stream output from the video encoding device according to embodiment 7.
Fig. 35 is a flowchart showing the flow of processing performed by the variable length decoding unit 25 in embodiment 7.
Fig. 36 is an explanatory diagram showing another data arrangement of a video stream output from the video encoding device according to embodiment 7.
Fig. 37 is an explanatory diagram showing another data arrangement of a video stream output from the video encoding device according to embodiment 7.
Fig. 38 is a flowchart showing the flow of inter prediction mode determination processing in embodiment 8.
Fig. 39 is an explanatory diagram showing the data arrangement of the bit stream in the macroblock level in embodiment 8.
Fig. 40 is a flowchart showing the flow of inter-predicted image generation processing in embodiment 8.
Fig. 41 is an explanatory diagram showing another data arrangement of a bit stream in the macroblock level in embodiment 8.
Fig. 42 is an explanatory diagram showing another data arrangement of a bit stream at the macroblock level in embodiment 8.
Fig. 43 is a flowchart showing the flow of inter prediction mode determination processing in embodiment 9.
Fig. 44 is a flowchart showing the flow of inter-predicted image generation processing in embodiment 9.
Fig. 45 is an explanatory diagram showing the configuration of the motion vector encoding unit.
Fig. 46 is an explanatory diagram showing an operation of the motion vector encoding unit.
Fig. 47 is an explanatory diagram showing the configuration of the motion vector decoding unit.
Fig. 48 is an explanatory diagram showing the bitstream syntax.
Fig. 49 is an explanatory diagram showing a structure of macroblock encoded data in embodiment 11.
Fig. 50 is an explanatory diagram showing a detailed configuration of the encoded data of the Cn component header information in fig. 49 in embodiment 11.
Fig. 51 is an explanatory diagram showing another configuration of macroblock encoded data in embodiment 11.
Fig. 52 is an explanatory diagram showing a structure of a bit stream in embodiment 11.
Fig. 53 is an explanatory diagram showing the structure of a slice (slice) in embodiment 11.
Fig. 54 is an explanatory diagram showing an internal configuration related to the arithmetic coding process of the variable-length coding unit 11 in embodiment 12.
Fig. 55 is a flowchart showing the flow of arithmetic coding processing by the variable-length coding unit 11 in embodiment 12.
Fig. 56 is an explanatory diagram showing a detailed flow of the process of step S162 in fig. 55 in embodiment 12.
Fig. 57 is an explanatory diagram showing the concept of a context model (ctx).
Fig. 58 is an explanatory diagram showing an example of a context model relating to a motion vector of a macroblock.
Fig. 59 is an explanatory diagram showing an internal configuration related to the arithmetic decoding processing of the variable length decoding unit 25 in embodiment 12.
Fig. 60 is a flowchart showing the flow of arithmetic decoding processing by the variable length decoding unit 25 in embodiment 12.
Fig. 61 is an explanatory diagram showing a context model 11f in embodiment 12.
Fig. 62 is an explanatory diagram showing differences in modes of a current macroblock in embodiment 12.
Fig. 63 is an explanatory diagram showing the configuration of the encoding device and the decoding device in embodiment 13.
Fig. 64 is an explanatory diagram showing the configuration of the video encoding device according to embodiment 13.
Fig. 65 is an explanatory diagram showing the configuration of the video decoding device according to embodiment 13.
Fig. 66 is an explanatory diagram showing the common encoding process in embodiment 14.
Fig. 67 is an explanatory diagram showing the independent encoding process in embodiment 14.
Fig. 68 is an explanatory diagram showing a temporal motion prediction reference relationship between pictures in the encoding device and the decoding device according to embodiment 14.
Fig. 69 is an explanatory diagram showing an example of the structure of a bit stream generated by the encoding device of embodiment 14 and subjected to input/decoding processing by the decoding device of embodiment 14.
Fig. 70 is an explanatory diagram showing the bit stream structure of slice data in the case of the common encoding process and the independent encoding process.
Fig. 71 is an explanatory diagram showing a schematic configuration of the encoding device according to embodiment 14.
Fig. 72 is an explanatory diagram showing how the processing delay on the encoding device side is reduced.
Fig. 73 is an explanatory diagram showing an internal configuration of the first picture encoding unit.
Fig. 74 is an explanatory diagram showing an internal configuration of the second picture encoding unit.
Fig. 75 is an explanatory diagram showing a schematic configuration of a decoding device according to embodiment 14.
Fig. 76 is an explanatory diagram showing an internal configuration of the first picture decoding unit.
Fig. 77 is an explanatory diagram showing an internal configuration of the second picture decoding unit.
Fig. 78 is an explanatory diagram showing an internal configuration of the first picture encoding unit to which the color space conversion process is applied.
Fig. 79 is an explanatory diagram showing an internal configuration of the first picture encoding unit to which the color space conversion process is applied.
Fig. 80 is an explanatory diagram showing an internal configuration of the first picture encoding unit to which the inverse color space conversion process is applied.
Fig. 81 is an explanatory diagram showing an internal configuration of the first picture encoding unit to which the inverse color space conversion process is applied.
Fig. 82 is an explanatory diagram showing the structure of encoded data of macroblock header information included in a conventional YUV 4:2:0 format bitstream.
Fig. 83 is an explanatory diagram showing an internal configuration of the prediction unit 461 of the first picture decoding unit that ensures compatibility with a conventional YUV 4:2:0 format bitstream.
Fig. 84 is an explanatory diagram showing a structure of a bit stream of multiplexed encoded data in embodiment 15.
Fig. 85 is an explanatory diagram showing information of picture coding types when picture data within an access unit (access unit) starting from an AUD NAL unit is coded.
Fig. 86 is an explanatory diagram showing a structure of a bit stream of multiplexed encoded data in embodiment 15.
Fig. 87 is an explanatory diagram showing a schematic configuration of an encoding device according to embodiment 16.
Fig. 88 is an explanatory diagram showing an internal configuration of the picture coding unit.
Fig. 89 is an explanatory diagram showing a case where different block sizes are used for each color difference component.
Fig. 90 is an explanatory diagram showing a unit of a macroblock.
Fig. 91 is an explanatory diagram showing the data arrangement of a video stream, which is an output from the picture coding unit.
Fig. 92 is an explanatory diagram showing a schematic configuration of a decoding device according to embodiment 16.
Fig. 93 is an explanatory diagram showing an internal configuration of the decoding unit.
Fig. 94 is an explanatory diagram illustrating a method of generating half-pixel prediction pixels.
Fig. 95 is an explanatory diagram illustrating horizontal processing of a half-pixel prediction pixel generation method in the case of only 1/4 pixels MC.
Fig. 96 is an explanatory diagram showing a schematic configuration of an encoding device according to embodiment 17.
Fig. 97 is an explanatory diagram showing the data arrangement of a video stream, which is an output from the picture coding unit.
Fig. 98 is an explanatory diagram showing a schematic configuration of a decoding device according to embodiment 17.
Detailed Description
Embodiment mode 1
In embodiment 1, an encoding device that performs closed intra-frame encoding in units of rectangular regions (macroblocks) of 16 × 16 pixels obtained by equally dividing a video frame input in the 4:4:4 format, and a corresponding decoding device, will be described. The encoding device and the decoding device are based on the coding scheme adopted in the MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 standard of non-patent document 1, with features unique to the present invention added. In all the embodiments below, the macroblock is not necessarily limited to 16 × 16 pixels of a frame image; for example, when a field is the picture serving as the coding unit, as with an interlaced signal, a 16 × 16 pixel block of a field image may be used as the macroblock, or the macroblock block size may be changed adaptively according to whether the macroblock is coded as a frame image or as a field image.
Fig. 1 shows a configuration of a video encoding device according to embodiment 1, and fig. 2 shows a configuration of a video decoding device according to embodiment 1. In fig. 2, elements denoted by the same reference numerals as those of the components of the encoding apparatus of fig. 1 denote the same elements.
The following describes the overall operation of the encoding apparatus and the decoding apparatus, and the intra prediction mode determination process and the intra prediction decoding process, which are characteristic operations of embodiment 1, based on these drawings.
1. Outline of operation of encoding device
In the encoding device of fig. 1, each video frame is input as the input video signal 1 in the 4:4:4 format. As shown in fig. 10, the input video frame is supplied to the encoding device in units of macroblocks, each obtained by dividing the three color components into blocks of the same 16 pixel × 16 pixel size and bundling the three together.
First, the spatial prediction unit 2 performs intra prediction processing for each color component in units of macroblocks, using the local decoded image 15 stored in the memory 16. One memory is prepared per color component (three in this embodiment, but the number may be changed as appropriate by design). There are two classes of intra prediction modes: an intra 4 × 4 prediction mode, which performs spatial prediction using peripheral pixels in units of blocks of 4 pixels × 4 lines as shown in fig. 3, and an intra 16 × 16 prediction mode, which performs spatial prediction using peripheral pixels in units of macroblocks of 16 pixels × 16 lines as shown in fig. 4.
(a) Intra4 × 4 prediction mode
The 16 × 16 pixel block of the luminance signal in the macroblock is divided into 16 blocks of 4 × 4 pixels each, and one of the 9 modes shown in fig. 3 is selected for each 4 × 4 pixel block. The pixels of the surrounding blocks (upper left, upper, upper right, and left) that have already been encoded, undergone local decoding, and been stored in the memory 16 are used for prediction image generation.
Intra4x4_pred_mode = 0: the adjacent upper pixels are used as the prediction image as they are.
Intra4x4_pred_mode = 1: the adjacent left pixels are used as the prediction image as they are.
Intra4x4_pred_mode = 2: the average value of the adjacent 8 pixels is used as the prediction image.
Intra4x4_pred_mode = 3: a weighted average is computed for every 2 to 3 adjacent pixels and used as the prediction image (corresponding to the right 45-degree edge).
Intra4x4_pred_mode = 4: a weighted average is computed for every 2 to 3 adjacent pixels and used as the prediction image (corresponding to the left 45-degree edge).
Intra4x4_pred_mode = 5: a weighted average is computed for every 2 to 3 adjacent pixels and used as the prediction image (corresponding to the left 22.5-degree edge).
Intra4x4_pred_mode = 6: a weighted average is computed for every 2 to 3 adjacent pixels and used as the prediction image (corresponding to the left 67.5-degree edge).
Intra4x4_pred_mode = 7: a weighted average is computed for every 2 to 3 adjacent pixels and used as the prediction image (corresponding to the right 22.5-degree edge).
Intra4x4_pred_mode = 8: a weighted average is computed for every 2 to 3 adjacent pixels and used as the prediction image (corresponding to the left 112.5-degree edge).
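The first three of these modes can be sketched as follows, assuming the neighboring reconstructed pixels are supplied as plain lists; the weighted-average directional modes 3 to 8 are omitted, and the function name is illustrative:

```python
def intra4x4_predict(mode, above, left):
    """Return a 4x4 predicted block (list of rows).

    above: 4 reconstructed pixels of the row directly above the block.
    left:  4 reconstructed pixels of the column to the left of the block.
    """
    if mode == 0:                 # vertical: copy the upper pixels downward
        return [list(above) for _ in range(4)]
    if mode == 1:                 # horizontal: copy the left pixels across
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:                 # DC: rounded average of the 8 neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise NotImplementedError("directional modes 3-8 not sketched")

block = intra4x4_predict(2, above=[10, 12, 14, 16], left=[10, 10, 10, 10])
print(block[0][0])  # 12
```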
When the intra 4 × 4 prediction mode is selected, 16 pieces of mode information are required per macroblock. Therefore, in order to reduce the code amount of the mode information itself, the mode information is predictively encoded from the mode information of adjacent blocks, exploiting its high correlation with adjacent blocks.
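In AVC, this predictive encoding takes the form of a "most probable mode": the smaller of the modes of the left and upper neighboring blocks serves as the prediction, a one-bit flag signals a hit, and a 3-bit remainder distinguishes the 8 remaining modes on a miss. A round-trip sketch (function names are illustrative, not from the text):

```python
def encode_intra4x4_mode(cur, left_mode, above_mode):
    """Code cur (0..8) relative to the most probable mode.

    Returns (flag, rem): flag=1 means "use the predicted mode"
    (rem is None); flag=0 means rem (0..7) selects one of the
    8 remaining modes, so a miss costs 1+3 bits instead of 4 bits
    of raw mode signaling per 4x4 block.
    """
    pred = min(left_mode, above_mode)   # most probable mode
    if cur == pred:
        return 1, None
    return 0, cur if cur < pred else cur - 1

def decode_intra4x4_mode(flag, rem, left_mode, above_mode):
    pred = min(left_mode, above_mode)
    if flag:
        return pred
    return rem if rem < pred else rem + 1

flag, rem = encode_intra4x4_mode(5, left_mode=2, above_mode=7)
assert decode_intra4x4_mode(flag, rem, 2, 7) == 5
```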
(b) Intra16 × 16 prediction mode
This mode predicts the 16 × 16 pixel block corresponding to the macroblock size at once, and one of the four modes shown in fig. 4 is selected in units of macroblocks. As in the intra 4 × 4 prediction mode, the pixels of the surrounding macroblocks (upper left, upper, and left) that have already been encoded, undergone local decoding, and been stored in the memory 16 are used for generating the predicted image.
Intra16x16_pred_mode = 0: the lowermost 16 pixels of the upper macroblock are used as the prediction image.
Intra16x16_pred_mode = 1: the rightmost 16 pixels of the left macroblock are used as the prediction image.
Intra16x16_pred_mode = 2: the average value of 32 pixels in total, the lowermost 16 pixels of the upper macroblock (part A in fig. 4) and the rightmost 16 pixels of the left macroblock (part B in fig. 4), is used as the prediction image.
Intra16x16_pred_mode = 3: a prediction image is generated from 31 pixels in total, namely the lower-right corner pixel of the upper-left macroblock, the lowermost 15 pixels of the upper macroblock (excluding the blank pixels), and the rightmost 15 pixels of the left macroblock (excluding the blank pixels), by predetermined arithmetic processing (weighted addition according to the pixels used and the predicted pixel position).
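Mode 3 corresponds to plane prediction. As one concrete reference point, H.264's Intra_16x16 plane mode fits a linear gradient a + b(x-7) + c(y-7) to the neighboring pixels; note that H.264 uses a 33-pixel neighbor set (16 above, 16 left, and the corner), slightly different from the 31 pixels described above. A sketch of the H.264 variant, assuming 8-bit samples:

```python
def intra16x16_plane(above, left, corner):
    """H.264-style plane prediction for a 16x16 block (8-bit samples).

    above:  16 pixels of the row above the block, above[x] = p[x, -1]
    left:   16 pixels of the column to the left,  left[y]  = p[-1, y]
    corner: the pixel p[-1, -1] above-left of the block
    """
    up = [corner] + list(above)   # up[x + 1] = p[x, -1], up[0] = corner
    lf = [corner] + list(left)
    # Gradient terms from symmetric differences around the block center.
    H = sum((i + 1) * (up[9 + i] - up[7 - i]) for i in range(8))
    V = sum((i + 1) * (lf[9 + i] - lf[7 - i]) for i in range(8))
    a = 16 * (above[15] + left[15])
    b = (5 * H + 32) >> 6
    c = (5 * V + 32) >> 6
    clip = lambda v: max(0, min(255, v))
    return [[clip((a + b * (x - 7) + c * (y - 7) + 16) >> 5)
             for x in range(16)] for y in range(16)]

# With flat neighbors the plane degenerates to a constant block.
flat = intra16x16_plane([100] * 16, [100] * 16, 100)
print(flat[0][0], flat[15][15])  # 100 100
```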
A characteristic feature of the video encoding device of embodiment 1 is that the intra prediction processing method for the three color components is switched according to the intra prediction mode sharing identification flag 23. This point is described in detail in section 2 below.
The spatial prediction unit 2 executes the prediction processing for all the modes, or a subset of the modes, shown in figs. 3 and 4, and the subtractor 3 obtains the prediction difference signal 4. The coding mode determination unit 5 evaluates the prediction efficiency of the prediction difference signal 4 and outputs, as the coding mode 6, the prediction mode that attains the best prediction efficiency for the macroblock to be predicted among the modes evaluated by the spatial prediction unit 2. Here, the coding mode 6 includes discrimination information (corresponding to the intra coding mode of fig. 6) indicating whether the intra 4 × 4 prediction mode or the intra 16 × 16 prediction mode is used, together with the individual prediction mode information (the above Intra4x4_pred_mode or Intra16x16_pred_mode) used in each prediction unit region. The prediction unit region corresponds to a 4 × 4 pixel block in the intra 4 × 4 prediction mode and to a 16 × 16 pixel block in the intra 16 × 16 prediction mode. In selecting the coding mode 6, a weight coefficient 20 for each coding mode, determined by the coding control unit 19, may be taken into account. The optimal prediction difference signal 4 obtained with the coding mode 6 in the coding mode determination unit 5 is output to the orthogonal transform unit 8. The orthogonal transform unit 8 transforms the input prediction difference signal 4 and outputs it to the quantization unit 9 as orthogonal transform coefficients. The quantization unit 9 quantizes the input orthogonal transform coefficients based on the quantization parameter 21 determined by the encoding control unit 19, and outputs the result to the variable length encoding unit 11 as the quantized transform coefficients 10.
The variable length encoding unit 11 entropy-encodes the quantized transform coefficients 10 by means such as Huffman coding or arithmetic coding. The quantized transform coefficients 10 are also restored to the local decoded prediction difference signal 14 via the inverse quantization unit 12 and the inverse orthogonal transform unit 13, and are added by the adder 18 to the predicted image 7 generated according to the encoding mode 6, thereby producing the local decoded image 15. The local decoded image 15 is stored in the memory 16 for use in subsequent intra prediction processing. A deblocking filter control flag 24 indicating whether or not to apply the deblocking filter to the macroblock is also input to the variable length encoding unit 11. (Since the prediction processing in the spatial prediction unit 2 stores and uses the pixel data before deblocking filtering in the memory 16, the deblocking filter processing itself is not needed for encoding; on the decoding apparatus side, however, the deblocking filter is applied according to the instruction of the flag 24 to obtain the final decoded image.)
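The local decoding loop around the quantization unit 9 and inverse quantization unit 12 can be illustrated with a plain scalar quantizer; the actual AVC quantizer is integer- and matrix-based, so this is a deliberate simplification with illustrative names:

```python
def quantize(coeffs, qstep):
    # Scalar quantization of transform coefficients to integer levels.
    return [int(round(c / qstep)) for c in coeffs]

def dequantize(levels, qstep):
    # Inverse quantization: scale the levels back (information is lost).
    return [lv * qstep for lv in levels]

# Residual round trip: the encoder quantizes, the local decode restores an
# approximation, and the prediction is added back to reconstruct pixels.
pred     = [100, 100, 100, 100]            # predicted image 7
residual = [3.0, -1.0, 0.4, 2.6]           # prediction difference signal 4 (post-transform, simplified)
levels   = quantize(residual, qstep=2.0)   # quantized transform coefficients 10
restored = dequantize(levels, qstep=2.0)   # local decoded prediction difference signal 14
recon    = [p + r for p, r in zip(pred, restored)]  # local decoded image 15
print(levels, recon)
```

Because the decoder repeats exactly this dequantize-and-add step, encoder and decoder stay synchronized on the same reconstructed pixels for later intra prediction.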
The intra prediction mode sharing identification flag 23, the quantization end transform coefficient 10, the coding mode 6, and the quantization parameter 21 input to the variable length coding unit 11 are arranged and shaped into a bit stream according to a predetermined rule (syntax), and output to the transmission buffer 17. The transmission buffer 17 smoothes the bit stream in accordance with the bandwidth of the transmission path to which the encoding apparatus is connected and the reading speed of the recording medium, and outputs the bit stream as a video stream 22. Further, feedback information is output to the encoding control unit 19 in accordance with the status of bit stream accumulation in the transmission buffer 17, and the amount of generated code in encoding of the subsequent video frame is controlled.
2. Intra prediction mode determination processing in encoding device
The intra prediction mode determination process, which is a characteristic feature of the encoding apparatus of embodiment 1, will now be described in detail. This process is carried out in units of macroblocks over the above three color components, mainly by the spatial prediction unit 2 and the encoding mode determination unit 5 in the encoding device of fig. 1. Fig. 5 is a flowchart showing the flow of this process. Hereinafter, the image data of the three color components constituting a block are denoted C0, C1, and C2.
First, the encoding mode determination unit 5 receives the intra prediction mode sharing identification flag 23, and determines based on its value whether or not a common intra prediction mode is used for C0, C1, and C2 (step S1 in fig. 5). If the mode is shared, the process proceeds to step S2 and thereafter; if not, it proceeds to step S5 and thereafter.
When the intra prediction modes are shared among C0, C1, and C2, the coding mode determination unit 5 notifies the spatial prediction unit 2 of all the selectable intra 4 × 4 prediction modes, and the spatial prediction unit 2 evaluates their prediction efficiencies and selects the optimal intra 4 × 4 prediction mode common to C0, C1, and C2 (step S2). Next, the coding mode determination unit 5 notifies the spatial prediction unit 2 of all or a part of the selectable intra 16 × 16 prediction modes, and the spatial prediction unit 2 evaluates their prediction efficiencies and selects the optimal intra 16 × 16 prediction mode common to C0, C1, and C2 (step S3). The encoding mode determination unit 5 finally selects the mode with the better prediction efficiency of the modes obtained in steps S2 and S3 (step S4), and ends the process.
When the intra prediction modes are not shared among C0, C1, and C2 and the optimal mode is selected individually for each of C0, C1, and C2, the coding mode determination unit 5 notifies the spatial prediction unit 2 of all or a part of the selectable intra 4 × 4 prediction modes for the Ci (0 ≤ i < 3) component, and the spatial prediction unit 2 evaluates their prediction efficiencies and selects the optimal intra 4 × 4 prediction mode for the Ci (0 ≤ i < 3) component (step S6). Likewise, the optimal intra 16 × 16 prediction mode is selected (step S7). Finally, in step S8, the optimal intra prediction mode for the Ci (0 ≤ i < 3) component is determined.
As a criterion for the prediction efficiency evaluation of the prediction modes performed by the spatial prediction unit 2, for example, the rate-distortion cost given by Jm = Dm + λRm (λ: a positive number) can be used. Here, Dm is the coding distortion or the prediction error amount when the intra prediction mode m is applied. The coding distortion is obtained by applying the intra prediction mode m to obtain a prediction error, transforming and quantizing the prediction error, decoding the video from that result, and measuring the error with respect to the signal before coding. The prediction error amount is obtained by taking the difference between the predicted image under the intra prediction mode m and the signal before encoding and quantifying the magnitude of that difference, using, for example, the sum of absolute differences (SAD). Rm is the amount of code generated when the intra prediction mode m is applied. That is, Jm is a value expressing the tradeoff between the amount of code and the degree of degradation when the intra prediction mode m is applied, and the intra prediction mode m giving the smallest Jm is the best solution.
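The rate-distortion selection described above can be sketched as follows. This is an illustrative fragment, not the patent's implementation; the candidate (Dm, Rm) values below are invented for the example.

```python
def select_intra_mode(candidates, lam):
    """Pick the intra prediction mode m minimizing Jm = Dm + lam * Rm.

    candidates: dict mapping mode index m -> (Dm, Rm), where Dm is the
    coding distortion or SAD prediction error and Rm the code amount.
    """
    best_mode, best_cost = None, float("inf")
    for mode, (dm, rm) in candidates.items():
        jm = dm + lam * rm          # rate-distortion cost of mode m
        if jm < best_cost:
            best_mode, best_cost = mode, jm
    return best_mode, best_cost

# Hypothetical (Dm, Rm) measurements for three candidate modes.
modes = {0: (1200.0, 40), 1: (900.0, 95), 2: (1000.0, 60)}
best, cost = select_intra_mode(modes, lam=5.0)  # mode 2, Jm = 1300.0
```

With a larger λ the rate term dominates, so the low-rate mode 0 would win instead: λ trades distortion against code amount exactly as the text describes.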
When the encoding device performs the processing of step S2 and subsequent steps, one piece of intra prediction mode information is assigned to each macroblock containing the three color components. On the other hand, when the processing of step S5 and subsequent steps is performed, intra prediction mode information is assigned to each color component. Since the intra prediction mode information assigned to the macroblock therefore differs, the intra prediction mode sharing identification flag 23 must be multiplexed into the bit stream so that the decoding apparatus can identify whether the encoding apparatus performed the processing procedure from S2 onward or from S5 onward. Fig. 6 shows the data arrangement of such a bit stream.
The figure shows the data arrangement of the bit stream at the macroblock level. The intra coding mode 28 is information for determining whether the mode is intra 4 × 4 or intra 16 × 16. The basic intra prediction mode 29 indicates the common intra prediction mode information when the intra prediction mode sharing identification flag 23 indicates "common to C0, C1, and C2", and indicates the intra prediction mode information for C0 when the flag indicates "not common to C0, C1, and C2". The extended intra prediction mode 30 is multiplexed only when the intra prediction mode sharing identification flag 23 indicates "not common to C0, C1, and C2", and indicates the intra prediction mode information for C1 and C2. Next, the quantization parameter 21 and the quantized transform coefficient 10 are multiplexed. The coding mode 6 in fig. 1 is a general term for the above intra coding mode 28 and intra prediction modes (basic/extended) (fig. 6 does not include the block filter control flag 24 input to the variable length coding unit 11 in fig. 1; it is omitted because it is not an essential component for explaining the features of embodiment 1).
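The ordering of fig. 6 can be sketched as a serialization routine. The element names are illustrative labels, not syntax element names from any standard, and the values passed in the example are invented.

```python
def serialize_macroblock(intra_coding_mode, sharing_flag, basic_mode,
                         extended_modes, qp, coeffs):
    """Arrange macroblock-level elements in the order of fig. 6:
    intra coding mode 28, sharing flag 23, basic mode 29,
    extended modes 30 (only when not common), QP 21, coefficients 10."""
    elements = [("intra_coding_mode", intra_coding_mode),
                ("intra_pred_mode_sharing_flag", sharing_flag),
                ("basic_intra_pred_mode", basic_mode)]
    if not sharing_flag:               # extended modes carry C1 and C2 info
        elements.append(("extended_intra_pred_modes", extended_modes))
    elements.append(("quantization_parameter", qp))
    elements.append(("quantized_transform_coefficients", coeffs))
    return elements

common = serialize_macroblock(0, True, 2, None, 28, [])
separate = serialize_macroblock(0, False, 2, (1, 8), 28, [])
```

In the common case the extended-mode field is simply absent from the stream, which is the bit saving the text attributes to shared prediction modes.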
In the 4:2:0 format adopted in the conventional video encoding standard, the definition of the color space is fixed to Y, Cb, and Cr, but the 4:4:4 format is not limited to Y, Cb, and Cr, and various color spaces can be used. By configuring the intra prediction mode information as shown in fig. 6, it is possible to perform an optimal encoding process even when there are a plurality of definitions of the color space of the input video signal 1. For example, when the color space is defined by RGB, since the texture of the video remains uniformly in the components of R, G, B, the redundancy of the intra prediction mode information itself can be reduced and the encoding efficiency can be improved by using the common intra prediction mode information. On the other hand, when the color space is defined by Y, Cb, and Cr, the structure of the video texture is integrated into Y, and therefore the common intra prediction mode does not necessarily provide the best result. Therefore, by adaptively using the extended intra prediction mode 30, optimal coding efficiency can be obtained.
3. Outline of operation of decoding device
The decoding apparatus of fig. 2 receives the video stream 22 in the arrangement of fig. 6 output from the encoding apparatus of fig. 1, performs decoding processing on the three color components in units of macroblocks of the same size (4:4:4 format), and restores each video frame.
First, the variable length decoding unit 25 receives the video stream 22 as input, decodes it according to a predetermined rule (syntax), and extracts information such as the intra prediction mode sharing identification flag 23, the quantized transform coefficient 10, the coding mode 6, and the quantization parameter 21. The quantized transform coefficient 10 is input to the inverse quantization unit 12 together with the quantization parameter 21, and inverse quantization is performed. The output is then input to the inverse orthogonal transform unit 13 and restored to the local decoded prediction difference signal 14. On the other hand, the spatial prediction unit 2 receives the coding mode 6 and the intra prediction mode sharing identification flag 23, and obtains the predicted image 7 according to these pieces of information. The specific procedure for obtaining the predicted image 7 will be described later. The local decoded prediction difference signal 14 and the predicted image 7 are added by the adder 18 to obtain a provisional decoded image 15 (this is the same signal as the local decoded image 15 in the encoding apparatus). The provisional decoded image 15 is written back to the memory 16 for use in intra prediction of subsequent macroblocks. One memory is prepared for each color component, three in total (three are described in this embodiment, but the number may be changed as appropriate by design). Further, the block filter 26 is applied to the provisional decoded image 15 according to the instruction of the block filter control flag 24 interpreted by the variable length decoding unit 25, and the final decoded image 27 is obtained.
4. Intra-prediction decoding processing in decoding device
The intra prediction image generation processing, which is a feature of the decoding apparatus according to embodiment 1, will be described in detail. This process is performed in units of macroblocks grouping the above three color components, and is carried out mainly by the variable length decoding unit 25 and the spatial prediction unit 2 in the decoding device of fig. 2. Fig. 7 is a flowchart showing the flow of this process.
Steps S10 to S14 in the flowchart of fig. 7 are performed by the variable length decoding unit 25. The video stream 22 input to the variable length decoding unit 25 conforms to the data arrangement of fig. 6. In step S10, the intra coding mode 28 in the data of fig. 6 is decoded first, and the intra prediction mode sharing identification flag 23 is decoded next (step S11). Further, the basic intra prediction mode 29 is decoded (step S12). In step S13, whether the intra prediction mode is shared among C0, C1, and C2 is determined from the intra prediction mode sharing identification flag 23; if shared, the basic intra prediction mode 29 is used for all of C0, C1, and C2, and if not, the basic intra prediction mode 29 is used as the mode for C0, and the extended intra prediction mode 30 is decoded (step S14) to obtain the mode information for C1 and C2. Since the coding mode 6 of each color component is determined through this processing procedure, it is output to the spatial prediction unit 2, and an intra prediction image of each color component is obtained in accordance with steps S15 to S17. The processing for obtaining the intra prediction image follows the procedures of figs. 3 and 4 and is the same as that performed by the encoding device of fig. 1.
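Steps S10 to S14 can be sketched as follows. The element source here is a simple iterator standing in for the variable length decoder, and the values in the examples are invented.

```python
def parse_intra_modes(elements):
    """Recover the per-component prediction modes from the fig. 6 order.

    elements: iterator yielding decoded syntax values in stream order
    (a stand-in for actual variable length decoding)."""
    intra_coding_mode = next(elements)        # S10: intra coding mode 28
    sharing_flag = next(elements)             # S11: sharing flag 23
    basic_mode = next(elements)               # S12: basic mode 29
    if sharing_flag:                          # S13: one mode for C0, C1, C2
        modes = [basic_mode] * 3
    else:                                     # S14: extended modes for C1, C2
        c1_mode, c2_mode = next(elements)
        modes = [basic_mode, c1_mode, c2_mode]
    return intra_coding_mode, modes

shared = parse_intra_modes(iter([0, True, 4]))       # -> (0, [4, 4, 4])
split = parse_intra_modes(iter([0, False, 4, (1, 8)]))  # -> (0, [4, 1, 8])
```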
Fig. 8 shows a variation of the bit stream data arrangement of fig. 6. In fig. 8, the intra prediction mode sharing identification flag 23 is multiplexed not as a macroblock-level flag but as a flag in an upper data layer such as a slice, picture, or sequence, and an extended intra prediction mode table indication flag 31 is provided so that one of a plurality of code tables defining the code words of the extended intra prediction mode 30 can be selected. Thus, when sufficient prediction efficiency can be ensured by switching in a layer at or above the slice level, overhead bits can be reduced because the intra prediction mode sharing identification flag 23 need not be multiplexed one by one at the macroblock level. In addition, by using the extended intra prediction mode table indication flag 31, a definition of prediction modes specialized for the C1 and C2 components, different from the definition of the basic intra prediction mode 29, can be selected for the extended intra prediction mode 30, making it possible to perform encoding processing suited to the definition of the color space. For example, in 4:2:0-format coding of AVC, an intra prediction mode set different from that of luminance (Y) is defined for the color difference components (Cb, Cr). In the 4:2:0 format, the color difference signal within a macroblock is 8 pixels × 8 lines, and one of the four modes shown in fig. 9 is selected for each macroblock and used in the decoding process. There are two color difference signals, Cb and Cr, but the same mode is used for both. Except for the DC prediction of intra_chroma_pred_mode = 0, the prediction processing is the same as in the intra 16 × 16 prediction mode of fig. 4; in the DC prediction, the 8 × 8 block is divided into four 4 × 4 blocks, and the positions of the pixels from which the average value is obtained are changed for each block.
In fig. 9, for the block labeled "a + x, a or x", the predicted image 7 is the average of the eight pixels of a and x when both a and x are available, the average of the four pixels of a when only a is available, and the average of the four pixels of x when only x is available. When neither a nor x is available, the value 128 is used as the predicted image 7. For the block labeled "b or x", the average of the four pixels of b is used when b is available, and the average of the four pixels of x is used when only x is available.
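The availability rule for the "a + x, a or x" blocks can be sketched as below. The rounded integer average used here is an assumption for illustration; actual codecs specify the exact rounding.

```python
def dc_predict(a_pixels, x_pixels):
    """DC value for an 'a + x, a or x' block: average over whichever of the
    neighbor pixel groups a (above) and x (left) are available, or 128 when
    neither is. Pass None for an unavailable group."""
    pixels = []
    for group in (a_pixels, x_pixels):
        if group is not None:
            pixels.extend(group)
    if not pixels:
        return 128                       # neither neighbor available
    # Rounded integer average (rounding choice assumed for this sketch).
    return (sum(pixels) + len(pixels) // 2) // len(pixels)
```

For a "b or x" block the rule is analogous but prefers b whenever it is available, falling back to x only when b is not.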
In this way, when a change is required in the set of intra prediction modes according to the properties of the color components, more optimal coding efficiency can be obtained by the structure such as the syntax of fig. 8.
Embodiment mode 2
In embodiment 2, another encoding device and a corresponding decoding device will be described, which perform encoding closed within a frame in units of rectangular regions (macroblocks) obtained by equally dividing a video frame input in 4:4:4 format into 16 × 16 pixels. As in embodiment 1, the encoding device and the decoding device are based on the encoding scheme adopted in the MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 standard of non-patent document 1, and are given features unique to the present invention.
Fig. 11 shows a configuration of a video encoding device according to embodiment 2, and fig. 12 shows a configuration of a video decoding device according to embodiment 2. In fig. 11, elements denoted by the same reference numerals as those of the components of the encoding device of fig. 1 denote the same elements. In fig. 12, elements denoted by the same reference numerals as those of the components of the coding apparatus of fig. 11 denote the same elements. In fig. 11, 32 denotes a transform block size identification flag, and 33 denotes an intra coding mode commonization identification flag.
The following describes the overall operation of the encoding device and decoding device according to embodiment 2, and intra-coding/prediction mode determination processing and intra-prediction decoding processing, which are characteristic operations of embodiment 2, based on these drawings.
1. Outline of operation of encoding device
In the encoding device of fig. 11, each video frame of the input video signal 1 is in 4:4:4 format, and is input to the encoding device in units of macroblocks in which the three color components are divided and grouped into blocks of the same size, as shown in fig. 10.
The spatial prediction unit 2 performs intra prediction processing for each color component in units of macroblocks using the local decoded image 15 stored in the memory 16. The intra prediction modes include: an intra 4 × 4 prediction mode in which spatial prediction using peripheral pixels is performed in units of blocks of 4 pixels × 4 lines shown in fig. 3; an intra 8 × 8 prediction mode in which spatial prediction using peripheral pixels is performed in units of blocks of 8 pixels × 8 lines shown in fig. 13; and an intra 16 × 16 prediction mode in which spatial prediction using peripheral pixels is performed in units of macroblocks of 16 pixels × 16 lines shown in fig. 4. In the encoding device according to embodiment 2, the intra 4 × 4 prediction mode and the intra 8 × 8 prediction mode are switched and used according to the state of the transform block size identification flag 32. Which of the 4 × 4, 8 × 8, and 16 × 16 intra prediction modes is used to encode a given macroblock can be expressed by an intra coding mode, as in fig. 6. The encoding device according to embodiment 2 provides two intra coding modes: an intra N × N prediction encoding mode (N is 4 or 8) in which encoding is performed using either the intra 4 × 4 prediction mode or the intra 8 × 8 prediction mode, and an intra 16 × 16 prediction encoding mode in which encoding is performed using the intra 16 × 16 prediction mode. The intra coding modes are described below.
(a) Intra-frame nxn predictive coding mode
The following modes are adopted: the encoding is performed while selectively switching between an intra 4 × 4 prediction mode in which a luminance signal 16 × 16 pixel block within a macroblock is divided into 16 blocks each composed of 4 × 4 pixel blocks and a prediction mode is selected for each 4 × 4 pixel block, and an intra 8 × 8 prediction mode in which a luminance signal 16 × 16 pixel block within a macroblock is divided into four blocks each composed of 8 × 8 pixel blocks and a prediction mode is selected for each 8 × 8 pixel block. The switching between the intra 4 × 4 prediction mode and the intra 8 × 8 prediction mode is linked to the state of the transform block size identification flag 32. This point will be described later. As described in embodiment 1, the intra 4 × 4 prediction mode is selected from among the nine modes shown in fig. 3 in units of 4 × 4 pixel blocks. The pixels of the surrounding blocks (upper left, upper right, and left) that have already finished encoding and performed the local decoding processing and are stored in the memory 16 are used for prediction image generation.
On the other hand, in the intra 8 × 8 prediction mode, any one of the nine modes shown in fig. 13 is selected in units of 8 × 8 pixel blocks. As is clear from comparison with fig. 3, a modification is added to adapt the intra 4 × 4 prediction mode prediction method to an 8 × 8 pixel block.
Intra8x8_pred_mode = 0: the adjacent upper pixels are used as the prediction image as they are.
Intra8x8_pred_mode = 1: the adjacent left pixels are used as the prediction image as they are.
Intra8x8_pred_mode = 2: the average value of the adjacent 8 pixels is used as the prediction image.
Intra8x8_pred_mode = 3: a weighted average is computed for every 2 to 3 pixels from the adjacent pixels and used as the prediction image (corresponding to a right 45-degree edge).
Intra8x8_pred_mode = 4: a weighted average is computed for every 2 to 3 pixels from the adjacent pixels and used as the prediction image (corresponding to a left 45-degree edge).
Intra8x8_pred_mode = 5: a weighted average is computed for every 2 to 3 pixels from the adjacent pixels and used as the prediction image (corresponding to a left 22.5-degree edge).
Intra8x8_pred_mode = 6: a weighted average is computed for every 2 to 3 pixels from the adjacent pixels and used as the prediction image (corresponding to a left 67.5-degree edge).
Intra8x8_pred_mode = 7: a weighted average is computed for every 2 to 3 pixels from the adjacent pixels and used as the prediction image (corresponding to a right 22.5-degree edge).
Intra8x8_pred_mode = 8: a weighted average is computed for every 2 to 3 pixels from the adjacent pixels and used as the prediction image (corresponding to a left 112.5-degree edge).
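A minimal sketch of the first three Intra8x8_pred_mode rules (vertical copy, horizontal copy, DC average) follows. The weighted-average directional modes 3 to 8 are omitted, and the rounding used in the DC average is an assumption of this sketch.

```python
def intra8x8_predict(mode, top, left):
    """Generate an 8x8 predicted block from the 8 upper (top) and 8 left
    reconstructed neighbor pixels, for Intra8x8_pred_mode 0, 1, or 2."""
    if mode == 0:                        # vertical: copy upper pixels down
        return [list(top) for _ in range(8)]
    if mode == 1:                        # horizontal: copy left pixels across
        return [[left[r]] * 8 for r in range(8)]
    if mode == 2:                        # DC: average of adjacent pixels
        dc = (sum(top) + sum(left) + 8) // 16   # rounded average (assumed)
        return [[dc] * 8 for _ in range(8)]
    raise NotImplementedError("weighted-average modes 3-8 are not sketched")

pred = intra8x8_predict(2, [100] * 8, [50] * 8)  # every sample becomes 75
```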
When the intra 4 × 4 prediction mode is selected, 16 pieces of mode information are required for each macroblock. Therefore, in order to reduce the amount of code of the mode information itself, predictive encoding is performed based on the mode information of adjacent blocks, exploiting the high correlation of mode information between adjacent blocks. Similarly, when the intra 8 × 8 prediction mode is selected, predictive encoding is performed based on the mode information of adjacent blocks, exploiting the high correlation of the intra prediction mode between adjacent blocks.
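This predictive coding of mode information can be sketched in the style AVC uses for intra 4 × 4 modes, where a "most probable mode" is taken as the minimum of the left and upper block modes; that particular rule is borrowed from AVC as an assumption, not quoted from the text above.

```python
def code_intra_mode(mode, left_mode, up_mode):
    """Return (matches_predicted, remainder): a single flag bit when the
    chosen mode equals the predicted mode, otherwise a remainder index in
    0..7 that skips the predicted value, so the 8 remaining of 9 modes fit
    in 3 bits instead of 4."""
    predicted = min(left_mode, up_mode)     # most probable mode (assumed rule)
    if mode == predicted:
        return True, None
    return False, mode if mode < predicted else mode - 1
```

Because adjacent blocks often share a direction, the 1-bit "matches predicted" case occurs frequently, which is where the code amount saving comes from.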
(b) Intra 16 x 16 predictive coding mode
This mode predicts a 16 × 16 pixel block equal to the macroblock size at once, and one of the four modes shown in fig. 4 is selected in units of macroblocks. As in the intra 4 × 4 prediction mode, the pixels of the surrounding macroblocks (upper left, upper, and left) that have already been encoded, subjected to local decoding, and stored in the memory 16 are used for generating the predicted image. The mode types are as described with reference to fig. 4 in embodiment 1. In the intra 16 × 16 prediction encoding mode, the transform block size is always 4 × 4. However, a two-stage transform is applied: first, the 16 DC components (direct current components, i.e., average values) of the 4 × 4 blocks are collected and transformed as a 4 × 4 block in that unit, and then the remaining AC components excluding the DC portion are transformed for each 4 × 4 block.
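The DC-gathering step of this two-stage transform can be sketched as follows. The secondary 4 × 4 transform applied to the DC block (a Hadamard transform in AVC) and the per-block AC transforms are omitted, and the truncating average is an assumption of the sketch.

```python
def gather_dc_block(blocks):
    """Collect the DC (average) of each of the 16 4x4 blocks of a macroblock,
    in raster order, into one 4x4 DC block for the second-stage transform.

    blocks: 16 lists of 16 pixel samples each."""
    dc = [[0] * 4 for _ in range(4)]
    for i, block in enumerate(blocks):
        dc[i // 4][i % 4] = sum(block) // 16   # truncating average (assumed)
    return dc

# 16 flat blocks whose sample value equals the block index.
dc = gather_dc_block([[k] * 16 for k in range(16)])
```

Transforming this DC block separately lets the codec exploit the strong correlation between neighboring block averages inside a flat 16 × 16 region.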
The video encoding device according to embodiment 2 is characterized in that the intra prediction, transform, and encoding method for the three color components is switched according to the intra coding mode sharing identification flag 33. This point is described in detail in the following 2.
The spatial prediction unit 2 evaluates the intra prediction modes for the three input color component signals according to the instruction of the intra coding mode sharing identification flag 33. The intra coding mode sharing identification flag 33 indicates whether an intra coding mode is assigned individually to each of the three input color components or the same intra coding mode is assigned to all three components. This is based on the following background.
In the 4:4:4 format, RGB can be used directly as the color space in addition to the Y, Cb, Cr color space conventionally used for encoding. In the Y, Cb, Cr color space, components dependent on the texture of the video are removed from the Cb and Cr signals, so the optimal intra encoding method is likely to differ between the Y component and the two components Cb and Cr (in fact, in 4:2:0-format encoding of AVC/H.264 such as the High profile, the design of the intra prediction modes used for the Y component differs from that used for the Cb and Cr components). On the other hand, when encoding is performed in the RGB color space, the texture structure is not removed between the color components as it is in the Y, Cb, Cr color space, and the correlation between the signal components in the same space is high, so that selecting the intra coding mode in common may improve coding efficiency. This point is not determined by the definition of the color space alone but also depends on the nature of the video even when a particular color space is used, and it is desirable that the encoding scheme itself can adapt to such properties of the video signal. Therefore, in this embodiment, the encoding device is provided with the intra coding mode sharing identification flag 33 so that flexible encoding can be performed for 4:4:4-format video.
In accordance with the state of the intra-coding mode-sharing flag 33 set as described above, the spatial prediction unit 2 performs prediction processing for each color component for all the intra-prediction modes or a predetermined subset shown in fig. 3, 4, and 13, and obtains a prediction difference signal 4 by the subtractor 3. The prediction difference signal 4 is evaluated for its prediction efficiency in the coding mode determination unit 5, and an intra prediction mode that achieves the best prediction efficiency for the target macroblock is selected from the prediction processing performed by the spatial prediction unit 2. Here, when intra N × N prediction is selected, the intra N × N prediction encoding mode is output as the encoding mode 6, and when the prediction mode is intra 4 × 4 prediction, the transform block size identification flag 32 is set to "transform at 4 × 4 block size". In addition, when the prediction mode is intra 8 × 8 prediction, the transform block size identification flag 32 is set to "transform at 8 × 8 block size". Various methods are conceivable for determining the transform block size identification flag 32, but the following methods are given as an example of the encoding device in embodiment 2: in order to determine the block size when transforming the residual obtained by intra N × N prediction, the optimum intra N × N prediction mode is determined in the encoding mode determination unit 5, and then determined in accordance with the N value. For example, when the transform block size is set to 8 × 8 pixel blocks in the case of using the intra 4 × 4 prediction mode, the spatial continuity of the prediction signal is likely to be interrupted in units of 4 × 4 blocks in the prediction difference signal 4 obtained as a result of prediction, and unnecessary high-frequency components are generated, so that the effect of concentrating the signal power by the transform is impaired. 
Such a problem does not occur if the transform block size is set to a 4 × 4 pixel block in accordance with the prediction mode.
When the intra 16 × 16 prediction is selected by the coding mode determination unit 5, the intra 16 × 16 prediction coding mode is output as the coding mode 6. In selecting the coding mode 6, the weight coefficient 20 for each coding mode determined by the coding control unit 19 may also be taken into account.
The prediction difference signal 4 obtained in the encoding mode 6 is output to the orthogonal transform unit 8. The orthogonal transform unit 8 transforms the input prediction difference signal and outputs the transformed prediction difference signal to the quantization unit 9 as an orthogonal transform coefficient. The quantization unit 9 quantizes the input orthogonal transform coefficient based on the quantization parameter 21 determined by the encoding control unit 19, and outputs the quantized transform coefficient to the variable length encoding unit 11 as the quantization-completed transform coefficient 10.
When the transform block size is 4 × 4 block units, the prediction difference signal 4 input to the orthogonal transform unit 8 is divided into 4 × 4 block units, orthogonal transformed, and quantized by the quantization unit 9. When the transform block size is 8 × 8 block units, the prediction difference signal 4 input to the orthogonal transform unit 8 is divided into 8 × 8 block units, orthogonal transformed, and quantized by the quantization unit 9.
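The 4 × 4 / 8 × 8 partitioning of the prediction difference signal can be sketched generically:

```python
def split_macroblock(mb, size):
    """Split a 16x16 prediction difference block into size x size blocks
    (size = 4 or 8) in raster order, as selected by the transform block
    size identification flag."""
    assert size in (4, 8)
    return [[row[x:x + size] for row in mb[y:y + size]]
            for y in range(0, 16, size)
            for x in range(0, 16, size)]

mb = [[r * 16 + c for c in range(16)] for r in range(16)]
blocks4 = split_macroblock(mb, 4)   # 16 blocks of 4x4 samples
blocks8 = split_macroblock(mb, 8)   # 4 blocks of 8x8 samples
```

Each sub-block is then orthogonally transformed and quantized independently, as the text describes.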
The variable length encoding unit 11 entropy encodes the quantized transform coefficient 10 by means such as Huffman coding or arithmetic coding. The quantized transform coefficient 10 is restored to the local decoded prediction difference signal 14 via the inverse quantization unit 12 and the inverse orthogonal transform unit 13 in the block size given by the transform block size identification flag 32 and the like, and is added to the predicted image 7 generated in the encoding mode 6 by the adder 18, thereby generating the local decoded image 15. The local decoded image 15 is stored in the memory 16 for use in the subsequent intra prediction process. Further, a block filter control flag 24 indicating whether or not to apply deblocking filtering to the macroblock is input to the variable length coding unit 11 (since the prediction processing performed by the spatial prediction unit 2 stores and uses pixel data in the memory 16 before deblocking filtering, the deblocking filter processing itself is not needed in the encoding process; on the decoding apparatus side, however, deblocking filtering is performed according to the instruction of the block filter control flag 24 to obtain the final decoded image).
The intra coding mode sharing identification flag 33, the quantized transform coefficient 10, the coding mode 6, and the quantization parameter 21 input to the variable length coding unit 11 are arranged and shaped into a bit stream according to a predetermined rule (syntax), and output to the transmission buffer 17. The transmission buffer 17 smoothes the bit stream in accordance with the bandwidth of the transmission path to which the encoding apparatus is connected and the reading speed of the recording medium, and outputs it as a video stream 22. Further, feedback information is output to the encoding control unit 19 in accordance with the bit stream accumulation status in the transmission buffer 17, and the amount of code generated in encoding of subsequent video frames is controlled.
2. Intra-frame coding mode/prediction mode decision processing in coding apparatus
The determination processing of the intra coding mode and the intra prediction mode, which is a feature of the coding apparatus according to embodiment 2, will be described in detail. This process is performed in units of macroblocks grouping the above three color components, and is carried out mainly by the spatial prediction unit 2 and the encoding mode determination unit 5 in the encoding device of fig. 11. Fig. 14 is a flowchart showing the flow of this process. Hereinafter, the image data of the three color components constituting a block are referred to as C0, C1, and C2.
First, the encoding mode determination unit 5 receives the intra coding mode sharing identification flag 33, and determines based on its value whether or not a common intra coding mode is used for C0, C1, and C2 (step S20 in fig. 14). If shared, the process proceeds to step S21 and thereafter; if not, it proceeds to step S22 and thereafter.
When the intra coding modes are shared among C0, C1, and C2, the coding mode determination unit 5 notifies the spatial prediction unit 2 of all or some of the selectable intra prediction modes (intra N × N prediction, intra 16 × 16 prediction), and the spatial prediction unit 2 evaluates all of the prediction efficiencies thereof and selects the intra coding mode and the intra prediction mode that are optimal for all the components (step S21).
On the other hand, when the optimal intra coding mode is selected individually for each of C0, C1, and C2, the coding mode determination unit 5 notifies the spatial prediction unit 2 of all or some of the intra prediction modes (intra N × N prediction, intra 16 × 16 prediction) selectable for the Ci (0 ≤ i < 3) component, and the spatial prediction unit 2 evaluates their prediction efficiencies and selects the optimal intra coding mode and intra prediction mode for the Ci (0 ≤ i < 3) component (step S23).
In the above steps S21 and S23, when the intra 4 × 4 prediction mode is selected by the spatial prediction unit 2 as the mode providing the best prediction efficiency, the transform block size identification flag 32 is set to "transform at 4 × 4 block size", and when the intra 8 × 8 prediction mode is selected as the mode providing the best prediction efficiency, the transform block size identification flag 32 is set to "transform at 8 × 8 block size".
As a criterion for the prediction efficiency evaluation of the prediction modes performed by the spatial prediction unit 2, for example, the rate-distortion cost given by Jm = Dm + λRm (λ: a positive number) can be used. Here, Dm is the coding distortion or the prediction error amount when the intra prediction mode m is applied. The coding distortion is obtained by applying the intra prediction mode m to obtain a prediction error, transforming and quantizing the prediction error, decoding the video from that result, and measuring the error with respect to the signal before coding. The prediction error amount is obtained by taking the difference between the predicted image under the intra prediction mode m and the signal before encoding and quantifying the magnitude of that difference, using, for example, the sum of absolute differences (SAD). Rm is the amount of code generated when the intra prediction mode m is applied. That is, Jm is a value expressing the tradeoff between the amount of code and the degree of degradation when the intra prediction mode m is applied, and the intra prediction mode m giving the smallest Jm is the best solution.
When the encoding device performs the processing from step S21 onward, one piece of intra coding mode information is assigned to a macroblock in common for the three color components. On the other hand, when it performs the processing from step S22 onward, intra coding mode information is assigned to each color component (three pieces in total). Accordingly, since the intra prediction mode information assigned to a macroblock differs between the two cases, the intra coding mode commonization identification flag 33 must be multiplexed into the bitstream so that the decoding apparatus can identify which of the two processing procedures the encoding apparatus used. Fig. 15 shows the data arrangement of such a bitstream.
In fig. 15, intra coding modes 0 (34a), 1 (34b), and 2 (34c), multiplexed into the bitstream at the macroblock level, indicate the coding mode 6 for the C0, C1, and C2 components, respectively. When the intra coding mode is the intra N × N prediction coding mode, the transform block size identification flag 32 and the intra prediction mode information are multiplexed into the bitstream. On the other hand, when the intra coding mode is the intra 16 × 16 prediction coding mode, the intra prediction mode information is encoded as part of the intra coding mode information, and the transform block size identification flag 32 and the intra prediction mode information are not multiplexed into the bitstream. When the intra coding mode commonization identification flag 33 indicates "common to C0, C1, and C2", the intra coding modes 1 (34b) and 2 (34c), the transform block size identification flags 1 (32b) and 2 (32c), and the intra prediction modes 1 (35b) and 2 (35c) are not multiplexed into the bitstream (the dotted circled portions in fig. 15 indicate these branches). In this case, the intra coding mode 0 (34a), the transform block size identification flag 0 (32a), and the intra prediction mode 0 (35a) each function as coding information common to all color components. Fig. 15 shows an example in which the intra coding mode commonization identification flag 33 is multiplexed into bitstream data at a level higher than the macroblock, such as slice, picture, or sequence. In particular, in use cases such as the example described in embodiment 2, the color space often does not change within a sequence, so the purpose is achieved by multiplexing the intra coding mode commonization identification flag 33 at the sequence level.
In embodiment 2, the intra coding mode commonization identification flag 33 is used in the sense of "whether or not the mode is common to all components", but depending on the definition of the color space of the input video signal 1, it may instead be used in the sense of "whether or not the mode is common to two specific components such as C1 and C2" (in the case of Y, Cb, Cr, it is very likely that Cb and Cr can share a mode). Further, the scope of the intra coding mode commonization identification flag 33 may be limited to the intra coding mode only, so that when the intra N × N prediction mode is used, the transform block size and the N × N prediction mode can be selected independently for each color component (fig. 16). With the syntax structure of fig. 16, for video that requires complicated patterns such as N × N prediction, the coding mode information can be shared while the prediction method is still changed for each color component, which can improve prediction efficiency.
Further, if the information of the intra coding mode commonization identification flag 33 is known in advance by some means in both the encoding apparatus and the decoding apparatus, it need not be carried in the video bitstream. In this case, for example, the encoding apparatus may perform encoding with the intra coding mode commonization identification flag 33 fixed to a certain value, or the information may be transmitted separately from the video bitstream.
3. Outline of operation of decoding device
The decoding apparatus of fig. 12 receives the video stream 22 in the arrangement of fig. 15 output from the encoding apparatus of fig. 11, performs decoding processing on the three color components in units of macroblocks of the same size (4:4:4 format), and restores each video frame.
First, the variable length decoding unit 25 receives the stream 22 as input, decodes it according to a predetermined rule (syntax), and extracts information such as the intra coding mode commonization identification flag 33, the quantized transform coefficients 10, the coding mode 6, and the quantization parameter 21. The quantized transform coefficients 10 are input to the inverse quantization unit 12 together with the quantization parameter 21, and inverse quantization is performed. The output is then input to the inverse orthogonal transform unit 13 and restored to the local decoded prediction difference signal 14. Meanwhile, the spatial prediction unit 2 receives the coding mode 6 and the intra coding mode commonization identification flag 33, and obtains the predicted image 7 according to these pieces of information. The specific procedure for obtaining the predicted image 7 is described later. The local decoded prediction difference signal 14 and the predicted image 7 are added by the adder 18 to obtain the provisional decoded image 15 (this is the same signal as the local decoded image 15 in the encoding apparatus). The provisional decoded image 15 is written back to the memory 16 for use in the intra prediction of subsequent macroblocks. Three planes of memory are prepared, one for each color component. Further, the block filter 26 is applied to the provisional decoded image 15 according to the instruction of the block filter control flag 24 interpreted by the variable length decoding unit 25, and the final decoded image 27 is obtained.
4. Intra-prediction decoding processing in decoding device
The intra prediction image generation processing, which is a feature of the decoding apparatus of embodiment 2, is described in detail. This processing is performed in units of macroblocks in which the above three color components are grouped, and is carried out mainly by the variable length decoding unit 25 and the spatial prediction unit 2 in the decoding device of fig. 12. Fig. 17 is a flowchart showing the flow of this processing.
In the flowchart of fig. 17, steps S25 to S38 are performed by the variable length decoding unit 25. The video stream 22 input to the variable length decoding unit 25 conforms to the data arrangement of fig. 15. In step S25, the intra coding mode 0 (34a) (corresponding to the C0 component) in the data of fig. 15 is decoded first. When the intra coding mode 0 (34a) is "intra N × N prediction", the transform block size identification flag 0 (32a) and the intra prediction mode 0 (35a) are decoded (steps S26, S27). Next, when it is determined from the state of the intra coding mode commonization identification flag 33 that the intra coding/prediction mode information is common to all color components, the intra coding mode 0 (34a), the transform block size identification flag 0 (32a), and the intra prediction mode 0 (35a) are set as the coding information used for the C1 and C2 components (steps S29, S30). Fig. 17 shows the processing in units of macroblocks, and it is assumed that the intra coding mode commonization identification flag 33 used for the determination in step S29 has been read from the bitstream 22 by the variable length decoding unit 25 at the slice level or higher before the processing enters the "start" of fig. 17.
When it is determined in step S29 in fig. 17 that the intra-coding/prediction mode information is coded for each color component, the intra-coding prediction mode information for the C1 and C2 components is decoded in the processing of the next steps S31 to S38. Through the above processing procedures, the encoding mode 6 of each color component is determined, and the determined encoding mode is output to the spatial prediction unit 2, and an intra-prediction image of each color component is obtained in accordance with steps S39 to S41. The process of obtaining an intra-prediction image follows the steps in fig. 3, 4, and 13, and is the same as the process performed by the encoding apparatus in fig. 11.
As described above, if the information of the intra coding mode commonization identification flag 33 is known in advance by some means in both the encoding apparatus and the decoding apparatus, the decoding apparatus need not parse its value from the video bitstream; for example, it may operate with a value fixed in advance, or the information may be transmitted separately from the video bitstream.
In the 4:2:0 format adopted in the conventional video encoding standard, the definition of the color space is fixed to Y, Cb, and Cr, but the 4:4:4 format is not limited to Y, Cb, and Cr, and various color spaces can be used. By configuring the coding information of the intra macroblock as shown in fig. 15 and 16, it is possible to perform an optimal coding process according to the definition of the color space of the input video signal 1 and the properties of the video signal, and to perform a video decoding/reproduction process by uniquely interpreting a bit stream obtained as a result of such a coding process.
Embodiment 3
Embodiment 3 shows another configuration example of the encoding device in fig. 11 and the decoding device in fig. 12. The encoding device and the decoding device are based on the encoding method adopted in MPEG-4AVC (ISO/IEC 14496-10)/ITU-T H.264 standard, which is non-patent document 1, and are given the features unique to the present invention, as in embodiment 1. The video encoding device according to embodiment 3 differs from the encoding device according to embodiment 2 described with reference to fig. 11 only in the variable length encoding unit 11. The video decoding apparatus according to embodiment 3 differs from the decoding apparatus according to embodiment 2 described with reference to fig. 12 only in the variable length decoding unit 25. The other operations are the same as those in embodiment 2, and only the differences will be described here.
1. Encoding step of intra prediction mode information in encoding device
In the encoding device according to embodiment 2, the data arrangement in the bitstream was shown for the information of the intra N × N prediction mode handled by the variable length encoding unit 11, but the encoding procedure itself was not specifically shown. This embodiment shows a specific method for that encoding step. In this embodiment, in particular, considering the case where the values of the intra N × N prediction mode have high correlation between color components, entropy coding that exploits the correlation between the values of the color components is performed for the intra N × N prediction mode obtained for each color component.
In the following description, the bitstream arrangement in the form of fig. 16 is assumed. For simplicity of explanation, the value of the intra coding mode commonization identification flag 33 is set so that the intra coding mode is common to C0, C1, and C2, the intra coding mode is the intra N × N prediction mode, and the transform block sizes 0 to 2 are 4 × 4 blocks. In this case, the intra prediction modes 0 to 2 (35a to 35c) are all intra 4 × 4 prediction modes. In fig. 18 to 20, X denotes the current macroblock to be encoded. The macroblock adjacent on the left is macroblock A, and the macroblock immediately above is macroblock B.
Fig. 18 to 20 are explanatory diagrams of the encoding procedure for each color component of C0, C1, and C2. Fig. 21 and 22 show flowcharts of steps.
Fig. 18 shows the situation of the C0 component of macroblock X. Here, the 4 × 4 block to be encoded is referred to as block X, and the 4 × 4 blocks to the left of and above block X are referred to as block A and block B, respectively. Two cases arise depending on the position of the 4 × 4 block to be encoded within macroblock X. In Case 1, the 4 × 4 blocks to the left of and above the 4 × 4 block to be encoded are outside the current macroblock X, that is, they belong to macroblock A or macroblock B. In Case 2, the 4 × 4 blocks to the left of and above the 4 × 4 block to be encoded are inside the current macroblock X, that is, they belong to macroblock X. In either case, one intra 4 × 4 prediction mode is assigned to each 4 × 4 block X in macroblock X, and this is denoted CurrIntraPredMode. The intra 4 × 4 prediction mode of block A is IntraPredModeA, and that of block B is IntraPredModeB. Both IntraPredModeA and IntraPredModeB are information whose encoding has already finished at the time block X is encoded. When encoding the intra 4 × 4 prediction mode of a given block X, these parameters are first assigned (step S50 in fig. 21).
Next, the predicted value predcurintrapredmode for CurrIntraPredMode of the block X is determined by the following equation (step S51).
predCurrIntraPredMode=Min(IntraPredModeA,IntraPredModeB)
Next, CurrIntraPredMode of the C0 component is encoded. If CurrIntraPredMode == predCurrIntraPredMode, a one-bit flag (prev_intra_pred_mode_flag) indicating that it equals the prediction value is encoded. If CurrIntraPredMode != predCurrIntraPredMode, CurrIntraPredMode is compared with predCurrIntraPredMode: if CurrIntraPredMode is smaller, it is encoded as is; if CurrIntraPredMode is larger, CurrIntraPredMode − 1 is encoded (step S52).
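The C0-component steps S50 to S52 can be sketched as follows. This is a hedged Python illustration: the function name and the dictionary-of-symbols output are invented for clarity (an actual encoder emits entropy-coded bitstream fields, not dictionaries).

```python
def encode_c0_mode(curr, mode_a, mode_b):
    """Sketch of steps S50-S52: derive predCurrIntraPredMode from the
    neighboring blocks A and B, then emit either the one-bit match flag
    or the remaining-mode value for the C0 component."""
    pred = min(mode_a, mode_b)                 # predCurrIntraPredMode
    if curr == pred:
        return {"prev_intra_pred_mode_flag": 1}
    # curr != pred: encode curr as is if smaller, else curr - 1.
    rem = curr if curr < pred else curr - 1    # rem_intra_pred_mode
    return {"prev_intra_pred_mode_flag": 0, "rem_intra_pred_mode": rem}

print(encode_c0_mode(2, 2, 5))  # mode equals the prediction: flag only
print(encode_c0_mode(4, 2, 5))  # 4 > pred(2), so rem_intra_pred_mode = 3
```

Encoding curr − 1 when curr exceeds the prediction value works because the decoder, knowing the prediction value, can unambiguously re-insert the skipped value.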
Next, using fig. 19, the encoding procedure for the C1 component is shown. First, as with the C0 component, the nearby coding parameters such as IntraPredModeA and IntraPredModeB are set according to the position of block X (step S53).
Next, the predicted value candidate 1 predcurintrapredmode 1 for CurrIntraPredMode of the block X is determined by the following equation (step S54).
predCurrIntraPredMode1=Min(IntraPredModeA, IntraPredModeB)
If prev_intra_pred_mode_flag in the C0 component is 1, this predCurrIntraPredMode1 is used as is as predCurrIntraPredMode for block X of the C1 component. The reason is as follows. prev_intra_pred_mode_flag = 1 at the same block position in the C0 component means that, for the C0 component, the correlation between prediction modes is high in the nearby image region. In cases such as an RGB signal, where the correlation of texture structure between the C0 component and the C1 component has not been completely removed, there is a possibility that the correlation between nearby image regions is also high in the C1 component, as it is in the C0 component. Therefore, the prediction value for the C1 component is determined without depending on the intra 4 × 4 prediction mode of the C0 component.
On the other hand, when prev_intra_pred_mode_flag is 0 in the C0 component, that is, when rem_intra_pred_mode is encoded (step S55), CurrIntraPredMode of the C0 component is set as prediction value candidate 2 (step S56). That is,
predCurrIntraPredMode2=CurrIntraPredMode_C0
The background for setting this as the predicted value candidate is as follows. Encoding the rem _ intra _ pred _ mode in the C0 component means that the correlation of intra prediction between nearby image areas is low in the C0 component. In this case, it is similarly expected that the correlation between the neighboring image regions is low in the C1 component, and the intra prediction mode of the same block position in different color components is likely to provide a better prediction value.
The prediction value of CurrIntraPredMode for block X of the C1 component is finally determined as either predCurrIntraPredMode1 or predCurrIntraPredMode2 (step S57). Which value is used is additionally encoded with a one-bit flag (pred_flag). However, pred_flag is encoded only when CurrIntraPredMode matches the prediction value; when it does not match (when rem_intra_pred_mode is encoded), predCurrIntraPredMode1 is used as the prediction value.
Expressing the above steps as a procedure, prev_intra_pred_mode_flag, pred_flag, and rem_intra_pred_mode are consequently encoded into the encoded data (step S58).
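Since the patent's formula block for these steps is not reproduced here, the following is a minimal Python sketch of the C1 procedure reconstructed from the prose above. The function name and the dictionary output are invented for illustration; the branch structure (which symbols are emitted when) is the assumption being illustrated, not a normative definition.

```python
def encode_c1_mode(curr, mode_a, mode_b, c0_prev_flag, c0_curr):
    """Sketch of steps S53-S58 for the C1 component.

    curr:          CurrIntraPredMode of the C1 block X
    mode_a/mode_b: IntraPredModeA / IntraPredModeB (C1 neighbors)
    c0_prev_flag:  prev_intra_pred_mode_flag of the co-located C0 block
    c0_curr:       CurrIntraPredMode of the co-located C0 block
    """
    cand1 = min(mode_a, mode_b)                # predCurrIntraPredMode1
    if c0_prev_flag == 1:
        # C0 predicted well from its neighbors: rely on candidate 1 only.
        if curr == cand1:
            return {"prev_intra_pred_mode_flag": 1}
        rem = curr if curr < cand1 else curr - 1
        return {"prev_intra_pred_mode_flag": 0, "rem_intra_pred_mode": rem}
    cand2 = c0_curr                            # predCurrIntraPredMode2
    if curr == cand1:
        return {"prev_intra_pred_mode_flag": 1, "pred_flag": 0}
    if curr == cand2:
        return {"prev_intra_pred_mode_flag": 1, "pred_flag": 1}
    # No match: candidate 1 serves as the prediction value; pred_flag
    # is not encoded in this branch.
    rem = curr if curr < cand1 else curr - 1
    return {"prev_intra_pred_mode_flag": 0, "rem_intra_pred_mode": rem}
```

The point of candidate 2 is visible in the third branch: when C0 fell back to rem_intra_pred_mode, the same-position C0 mode may predict C1 better than the spatial neighbors, at the cost of one pred_flag bit.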
Next, using fig. 20, the encoding procedure for the C2 component is shown. First, as with the C0 and C1 components, the nearby coding parameters such as IntraPredModeA and IntraPredModeB are set according to the position of block X (step S59).
Next, the predicted value candidate 1 predcurintrapredmode 1 for CurrIntraPredMode of the block X is determined by the following equation (step S60).
predCurrIntraPredMode1=Min(IntraPredModeA, IntraPredModeB)
If prev_intra_pred_mode_flag is 1 in both the C0 and C1 components, this predCurrIntraPredMode1 is used as is as predCurrIntraPredMode for block X of the C2 component. The reason is as follows. prev_intra_pred_mode_flag = 1 at the same block position in the C0 and C1 components means that, for those components, the correlation between prediction modes is high in the nearby image region. In cases such as an RGB signal, where the correlation of texture structure between the C0/C1 components and the C2 component has not been completely removed, there is a possibility that the correlation between nearby image regions is also high in the C2 component, as it is in the C0 and C1 components. Therefore, the prediction value for the C2 component is determined without depending on the intra 4 × 4 prediction modes of the C0 and C1 components.
On the other hand, when prev_intra_pred_mode_flag of the C0 or C1 component is 0, that is, when rem_intra_pred_mode is encoded (step S61), CurrIntraPredMode of the C0 or C1 component is set as prediction value candidate 2 (step S62).
The background for using this as a prediction value candidate is as follows. Encoding rem_intra_pred_mode in the C0 or C1 component means that the correlation of intra prediction between nearby image regions is low in that component. In that case, the correlation between nearby image regions is expected to be low in the C2 component as well, and the intra prediction mode at the same block position in a different color component is likely to provide a better prediction value. According to this way of thinking, when both the C0 and C1 components encode rem_intra_pred_mode, the current intra prediction modes of both C0 and C1 could be prediction value candidates, but here the current intra prediction mode of C1 is adopted as the prediction value. The reason is that when a YUV color space is input, C0 is highly likely to be treated as luminance and C1/C2 as color difference, and in that case C1 is considered closer to the prediction mode of C2 than C0 is. In the case of RGB color space input, whether C0 or C1 is chosen makes little difference, and adopting the C1 component as the prediction value is generally considered appropriate (depending on the design, the C2 component may also be adopted as the prediction value).
The prediction value of CurrIntraPredMode for block X of the C2 component is finally determined as either predCurrIntraPredMode1 or predCurrIntraPredMode2 (step S63). Which value is used is additionally encoded with a one-bit flag (pred_flag).
Expressing the above steps as a procedure, prev_intra_pred_mode_flag, pred_flag, and rem_intra_pred_mode are consequently encoded into the encoded data (step S64).
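For the C2 component, the interesting difference from C1 is how prediction value candidate 2 is chosen from the two earlier components. The following Python sketch reconstructs that choice from the prose above; the precedence when only one of C0/C1 encoded rem_intra_pred_mode is an assumption consistent with the text (the text explicitly fixes only the case where both did, where C1 is used).

```python
def c2_candidate2(c0_prev_flag, c1_prev_flag, c0_curr, c1_curr):
    """Sketch of the candidate-2 choice for the C2 component (step S62).

    Returns the mode used as predCurrIntraPredMode2, or None when both
    C0 and C1 matched their prediction values (prev flags both 1), in
    which case only candidate 1 is in play.
    """
    if c1_prev_flag == 0:
        # C1 encoded rem_intra_pred_mode: use C1's mode. C1 is assumed
        # closer to C2 (both tend to be color-difference components).
        return c1_curr
    if c0_prev_flag == 0:
        # Only C0 encoded rem_intra_pred_mode (assumed precedence).
        return c0_curr
    return None
```

With this candidate in hand, the rest of the C2 procedure (pred_flag, rem_intra_pred_mode) parallels the C1 procedure above.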
The above-described encoding procedure can be defined similarly for the intra 8 × 8 prediction mode. By encoding the intra N × N prediction mode in such a procedure, the amount of code of the prediction mode itself can be reduced by using the correlation with the prediction mode selected from the other color components, and the encoding efficiency can be improved.
The difference between fig. 21 and fig. 22 is whether the encoding of the intra prediction modes of each macroblock is performed separately for each color component or collectively. In the case of fig. 21, each color component is encoded in units of 4 × 4 blocks, and 16 patterns of them are gathered and arranged in the bitstream (step S65). In the case of fig. 22, the 16 4 × 4 blocks of each color component are encoded at once, and the result is arranged in the bitstream for each color component (steps S66, S67, S68).
In the above steps, pred_flag is valid information only when prev_intra_pred_mode_flag is 1, but it may also be defined to be valid when prev_intra_pred_mode_flag is 0. Taking the C1 component as an example, the encoding may be arranged so that whenever rem_intra_pred_mode is encoded for the intra prediction mode of the co-located block of the C0 component, pred_flag is always encoded. In this case, a prediction value with better accuracy can be used even when prev_intra_pred_mode_flag = 0, and an improvement in encoding efficiency can be expected. Furthermore, pred_flag may be encoded without depending on whether rem_intra_pred_mode is encoded for the intra prediction mode of the co-located block of the C0 component. In that case, the intra prediction mode of the C0 component is always used as a prediction value candidate.
Note that pred_flag may be set not in units of 4 × 4 blocks but in units of macroblocks or sequences. When the same selection between prediction value candidates 1 and 2 is applied to all 4 × 4 blocks in a macroblock, the overhead information transmitted as pred_flag can be further reduced. Which of prediction value candidate 1 and prediction value candidate 2 is used may also be determined from the input color space definition, in units of a sequence. In that case, pred_flag need not be transmitted for each macroblock, and overhead information can be reduced further.
2. Decoding step of intra prediction mode information in decoding device
In the decoding device according to embodiment 2, the data arrangement in the bitstream was shown for the information of the intra N × N prediction mode handled by the variable length decoding unit 25, but the decoding procedure itself was not specifically shown. Embodiment 3 shows a specific method for that decoding procedure. In embodiment 3, in particular, considering the case where the values of the intra N × N prediction mode have high correlation between color components, a bitstream in which the intra N × N prediction mode obtained for each color component was entropy-coded by exploiting the correlation between color components is decoded.
In the following description, the bit stream arrangement in the form of fig. 16 is premised. In order to limit the description to the decoding procedure of the intra prediction mode, the value of the intra coding mode sharing flag 33 in the bit stream is set to be shared among the intra coding modes in C0, C1, and C2. The intra coding mode is designated as an intra NxN prediction mode, and the transform block sizes 0-2 are designated as 4 x 4 blocks. In this case, all of the intra prediction modes 0 to 2(35a to 35c) are intra 4 × 4 prediction modes. Similarly to the encoding device, the decoding device also uses the relationships in fig. 18 to 20. In the decoding apparatus, X is set as a current macroblock to be decoded. The macroblock adjacent to the left side is a macroblock a, and the macroblock immediately above is a macroblock B. Fig. 23 shows a flowchart of the decoding step. In fig. 23, the steps given the same numbers as in fig. 21 and 22 show the execution of the same processing as in the encoding apparatus.
Fig. 18 shows the situation of the C0 component of macroblock X. Here, the 4 × 4 block to be decoded is referred to as block X, and the 4 × 4 blocks to the left of and above block X are referred to as block A and block B, respectively. Two cases arise depending on the position of the 4 × 4 block to be decoded within macroblock X. In Case 1, the 4 × 4 blocks to the left of and above the 4 × 4 block to be decoded are outside the current macroblock X, that is, they belong to macroblock A or macroblock B. In Case 2, they are inside the current macroblock X, that is, they belong to macroblock X. In either case, one intra 4 × 4 prediction mode is assigned to each 4 × 4 block X in macroblock X, and this is denoted CurrIntraPredMode. The intra 4 × 4 prediction mode of block A is IntraPredModeA, and that of block B is IntraPredModeB. Both IntraPredModeA and IntraPredModeB are information whose decoding has already finished at the time block X is decoded. When decoding the intra 4 × 4 prediction mode of a given block X, these parameters are first assigned (step S50).
Next, the predicted value predcurintrapredmode for CurrIntraPredMode of the block X is determined by the following equation (step S51).
predCurrIntraPredMode=Min(IntraPredModeA, IntraPredModeB)
Next, a one-bit flag (prev_intra_pred_mode_flag) indicating whether CurrIntraPredMode == predCurrIntraPredMode is decoded. prev_intra_pred_mode_flag = 1 means that CurrIntraPredMode = predCurrIntraPredMode. Otherwise (prev_intra_pred_mode_flag = 0), the information of rem_intra_pred_mode is decoded from the bitstream. If rem_intra_pred_mode is smaller than predCurrIntraPredMode, CurrIntraPredMode = rem_intra_pred_mode; if it is equal to or larger, CurrIntraPredMode = rem_intra_pred_mode + 1 (step S65).
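The C0 decoding rule just described can be summarized in a short sketch. This is an illustrative Python function mirroring the encoding side; the name and the dictionary-of-symbols input are invented for clarity.

```python
def decode_c0_mode(symbols, mode_a, mode_b):
    """Sketch of steps S50, S51, S65: reconstruct CurrIntraPredMode of
    the C0 component from the decoded symbols and the neighbor modes."""
    pred = min(mode_a, mode_b)                 # predCurrIntraPredMode
    if symbols["prev_intra_pred_mode_flag"] == 1:
        return pred
    rem = symbols["rem_intra_pred_mode"]
    # Undo the encoder's skip of the prediction value.
    return rem if rem < pred else rem + 1

print(decode_c0_mode({"prev_intra_pred_mode_flag": 0,
                      "rem_intra_pred_mode": 3}, 2, 5))   # -> 4
```

This is the exact inverse of the C0 encoding rule: the decoder re-inserts the value the encoder skipped over when it emitted CurrIntraPredMode − 1.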
Next, using fig. 19, the decoding procedure for the C1 component is shown. First, as with the C0 component, the nearby coding parameters such as IntraPredModeA and IntraPredModeB are set according to the position of block X (step S53).
Next, the predicted value candidate 1 predcurintrapredmode 1 for CurrIntraPredMode of the block X is determined by the following equation (step S54).
predCurrIntraPredMode1=Min(IntraPredModeA, IntraPredModeB)
If prev_intra_pred_mode_flag in the C0 component is 1, this predCurrIntraPredMode1 is used as is as predCurrIntraPredMode for block X of the C1 component. The reason is the same as that described for the encoding apparatus.
On the other hand, when prev_intra_pred_mode_flag is 0 in the C0 component, that is, when rem_intra_pred_mode is decoded (step S55), CurrIntraPredMode of the C0 component is set as prediction value candidate 2 (step S56). That is,
predCurrIntraPredMode2=CurrIntraPredMode_C0
The reason why this is used as the predicted value candidate is the same as that described in the encoding apparatus.
The prediction value of CurrIntraPredMode for block X of the C1 component is finally determined as either predCurrIntraPredMode1 or predCurrIntraPredMode2 (step S57). Which value is used is determined by decoding a one-bit flag (pred_flag). However, pred_flag is decoded only when CurrIntraPredMode matches the prediction value; when it does not match (when rem_intra_pred_mode is decoded), predCurrIntraPredMode1 is used as the prediction value.
Given prediction value candidate 1, prediction value candidate 2, prev_intra_pred_mode_flag, pred_flag, and rem_intra_pred_mode, CurrIntraPredMode is decoded by the corresponding steps (step S66).
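Since the formula block for step S66 is not reproduced here, the following Python sketch reconstructs the C1 decoding rule from the prose above, mirroring the encoding-side sketch; the function name and dictionary input are invented for illustration.

```python
def decode_c1_mode(symbols, mode_a, mode_b, c0_prev_flag, c0_curr):
    """Sketch of steps S53-S57 and S66: reconstruct CurrIntraPredMode
    of the C1 component.

    symbols:       decoded fields for this block (dict, illustrative)
    mode_a/mode_b: IntraPredModeA / IntraPredModeB (C1 neighbors)
    c0_prev_flag:  prev_intra_pred_mode_flag of the co-located C0 block
    c0_curr:       decoded CurrIntraPredMode of the co-located C0 block
    """
    cand1 = min(mode_a, mode_b)                # predCurrIntraPredMode1
    if symbols["prev_intra_pred_mode_flag"] == 1:
        if c0_prev_flag == 1:
            return cand1                       # only candidate 1 in play
        # pred_flag selects between candidate 1 and candidate 2 (= C0 mode).
        return c0_curr if symbols["pred_flag"] == 1 else cand1
    rem = symbols["rem_intra_pred_mode"]
    return rem if rem < cand1 else rem + 1
```

The C2 decoding (step S71) follows the same shape, with the candidate-2 choice drawn from the C0/C1 modes as described for the encoder.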
Next, using fig. 20, the decoding procedure for the C2 component is shown. First, as with the C0 and C1 components, the nearby coding parameters such as IntraPredModeA and IntraPredModeB are set according to the position of block X (step S59).
Next, the predicted value candidate 1 predcurintrapredmode 1 for CurrIntraPredMode of the block X is determined by the following equation (step S60).
predCurrIntraPredMode1=Min(IntraPredModeA, IntraPredModeB)
If prev_intra_pred_mode_flag is 1 in both the C0 and C1 components, this predCurrIntraPredMode1 is used as is as predCurrIntraPredMode for block X of the C2 component. The reason is the same as that described for the encoding apparatus.
On the other hand, when prev_intra_pred_mode_flag of the C0 or C1 component is 0, that is, when rem_intra_pred_mode is decoded (step S61), CurrIntraPredMode of the C0 or C1 component is set as prediction value candidate 2 (step S62).
The reason why this is used as the predicted value candidate is the same as that described in the encoding apparatus.
The prediction value of CurrIntraPredMode for block X of the C2 component is finally determined as either predCurrIntraPredMode1 or predCurrIntraPredMode2 (step S63). Which value is used is determined by decoding a one-bit flag (pred_flag). However, pred_flag is decoded only when CurrIntraPredMode matches the prediction value; when it does not match (when rem_intra_pred_mode is decoded), predCurrIntraPredMode1 is used as the prediction value.
Given prediction value candidate 1, prediction value candidate 2, prev_intra_pred_mode_flag, pred_flag, and rem_intra_pred_mode, CurrIntraPredMode is decoded by the corresponding steps (step S71).
The decoding procedure described above may be defined similarly for the intra 8 × 8 prediction mode. By decoding the intra N × N prediction mode in such a procedure, it is possible to decode a bit stream in which the amount of code of the prediction mode itself is reduced by using the correlation with the prediction mode selected from other color components, thereby improving the encoding efficiency.
In the above steps, pred_flag is decoded only when prev_intra_pred_mode_flag is 1, but it may also be decoded as information covering the case where prev_intra_pred_mode_flag is 0.
That is, taking the C1 component as an example, the decoding may be arranged so that pred_flag is decoded whenever rem_intra_pred_mode was decoded for the intra prediction mode of the co-located block of the C0 component. The effect of this method is as described for the encoding procedure on the encoding apparatus side. Furthermore, pred_flag may be decoded without depending on whether rem_intra_pred_mode was decoded for the intra prediction mode of the co-located block of the C0 component. In that case, the intra prediction mode of the C0 component is always used as a prediction value candidate.
As described for the encoding apparatus, pred_flag may be included in the bitstream not in units of 4 × 4 blocks but in units of macroblocks or sequences. When the same selection between prediction value candidates 1 and 2 is applied to all 4 × 4 blocks in a macroblock, the overhead information of the decoded pred_flag can be reduced. Which of prediction value candidate 1 and prediction value candidate 2 is used may also be determined from the input color space definition, in units of a sequence. In that case, pred_flag need not be transmitted for each macroblock, and overhead information can be reduced further.
Embodiment 4
In embodiment 2, a bitstream of the format of fig. 16 was explained. In embodiment 2, it was described that, when the intra coding mode indicates "intra N × N prediction", the intra prediction mode of each of the color components C0, C1, and C2 is identified as the intra 4 × 4 prediction mode or the intra 8 × 8 prediction mode based on the values of the transform block size identification flags 0 to 2 (32a to 32c). In embodiment 4, this bitstream arrangement is changed so that, as shown in fig. 24, intra prediction mode indication flags 1 and 2 (36a, 36b) are transmitted at the sequence level for the C1 and C2 components. The intra prediction mode indication flag is valid when the intra N × N prediction mode is selected as the intra coding mode and the transform block size identification flag indicates the 4 × 4 transform, that is, when the intra 4 × 4 prediction mode is used, and the following two states can be switched according to its value.
State 1: for the C1 or C2 component, the intra 4 × 4 prediction mode to be used is individually selected from the nine modes of fig. 3 and encoded.
State 2: for the C1 or C2 component, the intra 4 × 4 prediction mode to be used is limited to DC prediction, that is, intra4x4_pred_mode = 2 in fig. 3, and no intra prediction mode information is encoded.
For example, when encoding in a color space such as Y, Cb, Cr, and for high-resolution video of HDTV quality or higher, a 4 × 4 block corresponds to an extremely small image area. In that case, for components such as Cb and Cr that do not hold the texture structure of the image, it may be more effective to fix the prediction mode information to a single mode and not transmit it as overhead than to leave room for selecting among nine prediction modes. Such a bit stream arrangement enables optimal encoding according to the properties of the input color space and the characteristics of the video.
The decoding device that receives the bit stream of the format of fig. 24 decodes the intra prediction mode indication flags (36a, 36b) in the variable length decoding unit 25, and uses their values to identify whether the bit stream was encoded in state 1 or state 2. It thereby determines, for the C1 or C2 component, whether the intra 4 × 4 prediction mode is decoded from the bit stream or the DC prediction mode, that is, intra4x4_pred_mode = 2 in fig. 3, is fixedly applied.
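The two states can be sketched on the decoder side as follows; `read_mode_from_bitstream` is a hypothetical stand-in for the variable length decoding of the mode, not the normative procedure:

```python
DC_PRED = 2  # intra4x4_pred_mode value for DC prediction (fig. 3)

def decode_c1c2_intra4x4_mode(state1_selected, read_mode_from_bitstream):
    """State 1: the intra 4x4 prediction mode is individually coded and is
    read from the bit stream. State 2: no mode information is coded and DC
    prediction is fixedly applied."""
    if state1_selected:
        return read_mode_from_bitstream()
    return DC_PRED
```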
In embodiment 4, in state 2, intra4x4_pred_mode is limited to 2 for the C1 or C2 component, but as long as the prediction mode information is fixed to a single mode, it may be another prediction mode. In state 2, it may also be determined that the same intra 4 × 4 prediction mode as that of C0 is used for the C1 or C2 component. In this case as well, since it is not necessary to encode the intra 4 × 4 prediction mode for the C1 or C2 component, overhead bits can be reduced.
Embodiment 5
Embodiment 5 shows another configuration example of the encoding device in fig. 11 and the decoding device in fig. 12. The encoding device and the decoding device in embodiment 5 are based on the encoding method adopted in MPEG-4AVC (ISO/IEC 14496-10)/ITU-T h.264 standard, which is non-patent document 1, and are given the features unique to the present invention, as in the other embodiments described above. The video encoding device according to embodiment 5 differs from the encoding device of fig. 11 described in embodiments 2 and 3 only in the operation of the variable length encoding unit 11. The video decoding apparatus according to embodiment 5 differs from the decoding apparatus of fig. 12 described in embodiments 2 and 3 only in the operation of the variable length decoding unit 25. The other operations are the same as those in embodiments 2 and 3, and only the differences will be described here.
1. Encoding step of intra prediction mode information in encoding device
The encoding device according to embodiment 3 showed one specific method of encoding the intra N × N prediction mode information in the variable length encoding unit 11 for a bit stream of the format of fig. 16. Embodiment 5 shows another specific method of that encoding procedure. Embodiment 5 is characterized by a method of adaptive prediction within neighboring pixel regions of the same color component, focusing on the fact that the value of the intra N × N prediction mode reflects the texture structure of the image pattern. The following description assumes the bit stream arrangement of the format of fig. 16. In embodiment 5, the intra N × N prediction mode information of each of the components C0, C1, and C2 is encoded independently for each color component, and the encoding method of the C0 component is applied similarly to C1 and C2; for simplicity, only the C0 component is described. The value of the intra coding mode sharing flag 33 is set to share the intra coding mode among C0, C1, and C2, the intra coding mode is the intra N × N prediction mode, and the transform block size identification flags 0 to 2 (32a to 32c) indicate 4 × 4 blocks. In this case, the intra prediction modes 0 to 2 (35a to 35c) are all intra 4 × 4 prediction modes. Fig. 18 is an explanatory diagram of the encoding procedure of the intra N × N prediction mode information of the C0 component. In fig. 18, X is the current macroblock to be encoded, the macroblock adjacent to its left is macroblock A, and the macroblock immediately above it is macroblock B. Fig. 25 shows a flowchart of the encoding procedure.
In embodiment 3, the smaller value of IntraPredModeA and IntraPredModeB was uniquely assigned as the predicted value predCurrIntraPredMode for the intra 4 × 4 prediction mode CurrIntraPredMode allocated to each 4 × 4 block X in fig. 18. This is the method also adopted in the current AVC/H.264 standard: the larger the value of the intra N × N prediction mode, the more complicated the predicted image generation method, with pixel interpolation that accounts for the directionality of the image pattern, because small values are assigned to modes with high suitability for general image patterns. When the bit rate is low, the increment of the code amount of the prediction mode has a larger influence on mode selection than the increment of distortion, so this scheme improves the overall coding efficiency; however, when the bit rate is relatively high, the increment of distortion has a larger influence on mode selection than the increment of the code amount of the prediction mode, so the smaller value of IntraPredModeA and IntraPredModeB is not always optimal. Based on this observation, embodiment 5 improves the accuracy of the predicted value by adapting its setting to the states of IntraPredModeA and IntraPredModeB, as described below. In this procedure, predCurrIntraPredMode is determined from the states of IntraPredModeA and IntraPredModeB as the value judged best for CurrIntraPredMode in view of the image pattern (steps S73, S74, S75).
(1) When IntraPredModeA and IntraPredModeB are both in the range 0 to 2, MIN(IntraPredModeA, IntraPredModeB) is set as predCurrIntraPredMode.
(2) When either IntraPredModeA or IntraPredModeB is 3 or more and the prediction directions of IntraPredModeA and IntraPredModeB are completely different (for example, IntraPredModeA is 3 and IntraPredModeB is 4), DC prediction (intra4x4_pred_mode = 2) is set as predCurrIntraPredMode.
(3) When either IntraPredModeA or IntraPredModeB is 3 or more and the prediction directions are the same (for example, IntraPredModeA is 3 and IntraPredModeB is 7, both predicting from the upper right), the prediction mode that interpolates pixels (7 in the above example) is set as predCurrIntraPredMode.
As in embodiment 3, preparation for encoding, such as deriving IntraPredModeA and IntraPredModeB, is performed in advance (steps S50, S53, S59). As a result, predCurrIntraPredMode is uniquely derived from the values of IntraPredModeA and IntraPredModeB. Fig. 26 tabulates the rules for setting the predicted value. In fig. 26, the shaded portions are cases that do not follow the conventional MIN(IntraPredModeA, IntraPredModeB) rule, where a better predicted value is determined from the continuity of the image pattern. In step (1) above, the table of category 0 is used; in steps (2) and (3), the table of category 1 is used.
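A sketch of steps (1) to (3), under a hypothetical grouping of the directional modes (3 and above) by the side they predict from; the exact cases are defined by the table of fig. 26, which is not reproduced here, and choosing the larger-numbered mode in step (3) merely matches the example where 3 and 7 give 7:

```python
DC_PRED = 2

# Assumed grouping of directional modes by prediction source; the text's
# example places modes 3 and 7 in the same "from the upper right" group.
DIRECTION_GROUP = {3: 'upper-right', 7: 'upper-right',
                   4: 'upper-left', 5: 'upper-left', 6: 'upper-left',
                   8: 'left'}

def pred_curr_intra_pred_mode(mode_a, mode_b):
    """Derive predCurrIntraPredMode from IntraPredModeA/B per steps (1)-(3)."""
    if mode_a <= 2 and mode_b <= 2:
        return min(mode_a, mode_b)                    # step (1)
    if DIRECTION_GROUP.get(mode_a) == DIRECTION_GROUP.get(mode_b):
        return max(mode_a, mode_b)                    # step (3), e.g. 3 and 7 -> 7
    return DC_PRED                                    # step (2)
```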
After predCurrIntraPredMode is determined in this way, the remaining encoding steps for the C0 component described in embodiment 3 are performed to complete the encoding (steps S52, S58, S64).
The above-described encoding procedure can be defined similarly for the intra 8 × 8 prediction mode. By encoding the intra N × N prediction mode in such a procedure, it is possible to more favorably utilize the correlation of prediction modes in the neighboring pixel regions of the same color component, and it is possible to reduce the amount of code of the prediction mode itself and improve the encoding efficiency.
2. Decoding step of intra prediction mode information in decoding device
The decoding device according to embodiment 3 showed one specific procedure for decoding the intra N × N prediction mode information in the variable length decoding unit 25 for a bit stream of the format of fig. 16. Embodiment 5 shows another specific method of that decoding procedure. In embodiment 5, focusing on the fact that the value of the intra N × N prediction mode reflects the texture structure of the image pattern, a bit stream encoded with adaptive prediction within neighboring pixel regions of the same color component is decoded.
In the following description, the bit stream arrangement in the form of fig. 16 is premised. For simplicity of explanation, the value of the intra coding mode sharing flag 33 in the bit stream is set so that the intra coding modes are shared among C0, C1, and C2. The intra coding mode is designated as an intra NxN prediction mode, and the transform block size identification flags 0 to 2(32a to 32c) are designated as 4 x 4 blocks. In this case, all of the intra prediction modes 0 to 2(35a to 35c) are intra 4 × 4 prediction modes. Similarly to the encoding apparatus, the decoding apparatus will also describe only the C0 component using the relationship shown in fig. 18 (C1 and C2 are decoded independently from C0 by equivalent steps). In the decoding apparatus, X is set as a current macroblock to be decoded. The macroblock adjacent to the left side is a macroblock a, and the macroblock immediately above is a macroblock B.
In embodiment 3, as described for the encoding apparatus, the smaller value of IntraPredModeA and IntraPredModeB was uniquely assigned as the predicted value predCurrIntraPredMode for the intra 4 × 4 prediction mode CurrIntraPredMode allocated to each 4 × 4 block X in fig. 18. In contrast, in the decoding device according to embodiment 5, predCurrIntraPredMode is determined using the table of fig. 26 by the same procedure as shown for the encoding step. Since IntraPredModeA and IntraPredModeB are already decoded and known, the same processing as in the encoding step can be performed.
The subsequent steps are equivalent to the decoding steps for the C0 component described in embodiment 3.
The decoding procedure described above may be defined similarly for the intra 8 × 8 prediction mode. By decoding the intra N × N prediction mode in such a procedure, it is possible to more favorably utilize the correlation of prediction modes in the neighboring pixel regions of the same color component, and to decode an encoded bit stream in which the code amount of the prediction mode itself is reduced.
In the above example, the table of fig. 26 is used in a fixed manner to determine predCurrIntraPredMode for encoding and decoding, but encoding and decoding may also be performed while using the table of fig. 26 as an initial value and successively updating, as predCurrIntraPredMode, the intra prediction mode that occurs most frequently for each state of IntraPredModeA and IntraPredModeB. For example, for the combination "category 0, IntraPredModeA = 0, IntraPredModeB = 0, predCurrIntraPredMode = 0" in fig. 26, the above embodiment always sets predCurrIntraPredMode to 0 when IntraPredModeA = 0 and IntraPredModeB = 0. However, since the video signal itself is not a stationary signal, there is no guarantee that this combination is optimal for every video content; in the worst case, predCurrIntraPredMode may be an inappropriate predicted value for most of the video. Therefore, for example, the frequency of occurrence of CurrIntraPredMode when IntraPredModeA = 0 and IntraPredModeB = 0 is counted, and each time encoding or decoding of CurrIntraPredMode is completed, predCurrIntraPredMode is updated to the prediction mode occurring most frequently for that state of IntraPredModeA and IntraPredModeB. With this configuration, the predicted value used for encoding and decoding CurrIntraPredMode can be set to an optimal value for each video content.
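The successive-update variant described above can be sketched as follows; the initial table entries are illustrative placeholders, not the actual contents of fig. 26:

```python
from collections import defaultdict, Counter

class AdaptivePredictor:
    """Start from a fixed (state -> predictor) table and, for each state
    (IntraPredModeA, IntraPredModeB), replace the predictor by the
    CurrIntraPredMode value observed most frequently so far."""

    def __init__(self, initial_table):
        self.table = dict(initial_table)    # (modeA, modeB) -> predCurrIntraPredMode
        self.counts = defaultdict(Counter)  # per-state frequencies of CurrIntraPredMode

    def predict(self, mode_a, mode_b):
        return self.table[(mode_a, mode_b)]

    def update(self, mode_a, mode_b, curr_mode):
        # Called after each CurrIntraPredMode is encoded or decoded, so that
        # the encoder and decoder stay synchronized.
        c = self.counts[(mode_a, mode_b)]
        c[curr_mode] += 1
        self.table[(mode_a, mode_b)] = c.most_common(1)[0][0]
```

Because both sides apply `update` after every coded symbol, the encoder and decoder derive identical tables without any extra signalling.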
Embodiment 6
Embodiment 6 shows another configuration example of the encoding device in fig. 11 and the decoding device in fig. 12. The encoding device and the decoding device in embodiment 6 are based on the encoding method adopted in MPEG-4AVC (ISO/IEC 14496-10)/ITU-T h.264 standard, which is non-patent document 1, and are given the features unique to the present invention, as in the other embodiments described above. The video encoding device according to embodiment 6 differs from the encoding device of fig. 11 described in embodiments 2, 3, and 5 only in the operation of the variable length encoding unit 11. The video decoding apparatus according to embodiment 6 differs from the decoding apparatus of fig. 12 described in embodiments 2, 3, and 5 only in the operation of the variable length decoding unit 25. The other operations are the same as those in embodiments 2, 3, and 5, and only the differences will be described here.
1. Encoding step of intra prediction mode information in encoding device
The encoding apparatuses according to embodiments 3 and 5 showed specific methods of encoding the intra N × N prediction mode information for a bit stream of the format of fig. 16. Embodiment 6 shows another specific method of that encoding procedure. Embodiment 6 is characterized by a method of adaptive arithmetic coding within neighboring pixel regions of the same color component, focusing on the fact that the value of the intra N × N prediction mode reflects the texture structure of the image pattern. The following description assumes the bit stream arrangement of the format of fig. 16. In embodiment 6, the intra N × N prediction mode information of each of the components C0, C1, and C2 is encoded independently for each color component, and the encoding method of the C0 component is applied similarly to C1 and C2; for simplicity, only the C0 component is described. The value of the intra coding mode sharing flag 33 is set to share the intra coding mode among C0, C1, and C2, the intra coding mode is the intra N × N prediction mode, and the transform block size identification flags 0 to 2 (32a to 32c) indicate 4 × 4 blocks. In this case, the intra prediction modes 0 to 2 (35a to 35c) are all intra 4 × 4 prediction modes. Fig. 18 is an explanatory diagram of the encoding procedure of the intra N × N prediction mode information of the C0 component. In fig. 18, X is the current macroblock to be encoded, the macroblock adjacent to its left is macroblock A, and the macroblock immediately above it is macroblock B. Fig. 27 shows a flowchart of the encoding procedure.
In embodiments 3 and 5, the smaller value of IntraPredModeA and IntraPredModeB was uniquely assigned as the predicted value predCurrIntraPredMode for the intra 4 × 4 prediction mode CurrIntraPredMode allocated to each 4 × 4 block X in fig. 18; when CurrIntraPredMode was equal to it, prev_intra_pred_mode_flag was set to 1 and the encoding of the intra 4 × 4 prediction mode for block X was terminated, while otherwise the code was transmitted by rem_intra_pred_mode. In the present embodiment, CurrIntraPredMode is directly arithmetically encoded using the states of IntraPredModeA and IntraPredModeB. An encoding procedure in accordance with the context-adaptive binary arithmetic coding adopted in the AVC/H.264 standard is used.
First, the CurrIntraPredMode to be encoded is expressed in binary in the form shown in fig. 28 (step S76). The first bin of the binary sequence is a code classifying whether CurrIntraPredMode is vertical prediction or horizontal prediction (see fig. 3). In this example, DC prediction (intra4x4_pred_mode = 2) is classified as horizontal prediction, but it may also be classified as vertical prediction. The second bin provides a terminate bit for the prediction mode value considered to occur most frequently in each of the vertical and horizontal classes. The third and subsequent bins are configured so that the remaining prediction mode values are terminated in order of decreasing occurrence frequency (preferably, the second and subsequent bins of the binary sequence structure in fig. 28 are set according to the symbol occurrence probabilities observed in actual image data encoding).
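In the same spirit (the actual bin assignments are defined by fig. 28, which is not reproduced here), a truncated-unary binarization over an assumed frequency ordering of each class could look like this:

```python
# Assumed classification and frequency ordering; the text classifies DC
# prediction (mode 2) as horizontal, and modes 0, 3, 5, 7 as vertical.
VERTICAL = [0, 3, 5, 7]       # ordering within a class = assumed frequency
HORIZONTAL = [1, 2, 4, 6, 8]

def binarize(mode):
    """Binarize CurrIntraPredMode: first bin = vertical (0) / horizontal (1),
    then truncated unary within the class ('1' terminates, '0' continues)."""
    group, first_bin = (VERTICAL, '0') if mode in VERTICAL else (HORIZONTAL, '1')
    idx = group.index(mode)
    tail = '0' * idx + ('1' if idx < len(group) - 1 else '')
    return first_bin + tail
```

The resulting codes are prefix-free within each class, so each prediction mode value is identified as soon as its terminate bit (or the maximum bin count) is reached.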
Arithmetic coding is performed while sequentially selecting the (0, 1) occurrence probability table to be used for each bin of the binary sequence. In the encoding of the first bin, the context used for arithmetic coding is determined as follows (step S78).
Context A (C_A): for IntraPredModeA and IntraPredModeB, a flag intra_pred_direction_flag representing in binary whether the intra prediction mode is vertical prediction or horizontal prediction is defined, and the following four states are used as context values.
C_A = (intra_pred_direction_flag for IntraPredModeA == 1) + (intra_pred_direction_flag for IntraPredModeB == 1);
Here, in fig. 3, for example, intra_pred_direction_flag is classified as vertical prediction (0) when intra4x4_pred_mode takes the values 0, 3, 5, 7, and as horizontal prediction (1) when it takes the values 1, 2, 4, 6, 8. For each state of C_A, the conditional occurrence probability of CurrIntraPredMode given the states of IntraPredModeA and IntraPredModeB is obtained in advance, and an initial (0, 1) occurrence probability table determined from it is assigned. By configuring the context in this way, the conditional occurrence probability of the first bin can be estimated more accurately, and the efficiency of arithmetic coding is improved. The occurrence probability table of the first bin is selected according to C_A, and arithmetic coding is performed. The occurrence probability table is then updated with the encoded value (step S79).
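The direction flag and the context value C_A follow directly from the definitions above (note that, as written, the sum takes the three values 0 to 2 over the four (flagA, flagB) combinations):

```python
VERTICAL_MODES = {0, 3, 5, 7}  # intra_pred_direction_flag = 0 (vertical), per fig. 3

def intra_pred_direction_flag(mode):
    """0 for vertical prediction, 1 for horizontal prediction."""
    return 0 if mode in VERTICAL_MODES else 1

def context_a(mode_a, mode_b):
    """C_A = (flag for IntraPredModeA == 1) + (flag for IntraPredModeB == 1)."""
    return int(intra_pred_direction_flag(mode_a) == 1) + \
           int(intra_pred_direction_flag(mode_b) == 1)
```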
For the second and subsequent bins, an initial (0, 1) occurrence probability table determined from the occurrence probability of each prediction mode value is assigned in advance (step S80). Then, as for the first bin, binary arithmetic coding is performed and the occurrence probability table is updated (step S81).
The above-described encoding procedure can be defined similarly for the intra 8 × 8 prediction mode. By encoding the intra nxn prediction mode in such a procedure, adaptive arithmetic coding can be applied to encoding of prediction mode information using the correlation of prediction modes in the neighboring pixel regions of the same color component, so that the encoding efficiency can be improved.
2. Decoding step of intra prediction mode information in decoding device
The decoding apparatuses according to embodiments 3 and 5 each show one specific decoding procedure for the information of the intra N × N prediction mode in the variable length decoding unit 25 for the bit stream of the format shown in fig. 16. Embodiment 6 shows another specific method of the decoding step. Embodiment 6 is characterized in that, focusing particularly on the case where the texture structure as an image pattern is reflected in the value of the intra N × N prediction mode, a bitstream encoded by adaptive arithmetic coding in a nearby pixel region in the same color component is decoded.
In the following description, the bit stream arrangement in the form of fig. 16 is premised. For simplicity of explanation, the value of the intra coding mode sharing flag 33 in the bit stream is set so that the intra coding modes are shared among C0, C1, and C2. The intra coding mode is designated as an intra NxN prediction mode, and the transform block size identification flags 0 to 2(32a to 32c) are designated as 4 x 4 blocks. In this case, all of the intra prediction modes 0 to 2(35a to 35c) are intra 4 × 4 prediction modes. Similarly to the encoding apparatus, the decoding apparatus will also describe only the C0 component using the relationship shown in fig. 18 (C1 and C2 are decoded independently from C0 by equivalent steps). In the decoding apparatus, X is set as a current macroblock to be decoded. The macroblock adjacent to the left side is a macroblock a, and the macroblock immediately above is a macroblock B.
In embodiments 3 and 5, as described for the encoding apparatus, the smaller value of IntraPredModeA and IntraPredModeB was uniquely assigned in fig. 18 as the predicted value predCurrIntraPredMode for the intra 4 × 4 prediction mode CurrIntraPredMode allocated to each 4 × 4 block X; prev_intra_pred_mode_flag was decoded, and when its value was 1, predCurrIntraPredMode was used as CurrIntraPredMode, while when prev_intra_pred_mode_flag was zero, rem_intra_pred_mode was decoded to restore the intra 4 × 4 prediction mode of block X. In contrast, in embodiment 6, CurrIntraPredMode is directly arithmetically decoded using the states of IntraPredModeA and IntraPredModeB. A decoding procedure in accordance with the context-adaptive binary arithmetic decoding adopted in the AVC/H.264 standard is used.
The CurrIntraPredMode to be decoded has been encoded as a binary sequence in the format shown in fig. 28, and this sequence is binary-arithmetic-decoded in order from the left end. As described in the encoding procedure of embodiment 6, the first bin of the binary sequence is a code classifying whether CurrIntraPredMode is vertical prediction or horizontal prediction (see fig. 3), and the second and subsequent bins are codes in which the prediction mode values are terminated in order of decreasing occurrence frequency. The reason for this code structure is as described for the encoding step.
In the decoding process, when decoding the first bin, the same context C_A as used in the encoding step is first determined. The occurrence probability table of the first bin is selected according to C_A, arithmetic decoding is performed, and the first bin is restored. The occurrence probability table is then updated with the decoded value.
For the second and subsequent bins, an initial (0, 1) occurrence probability table determined from the occurrence probability of each prediction mode value is assigned in advance. Then, as for the first bin, binary arithmetic decoding is performed and the occurrence probability table is updated. Since the binary sequences of fig. 28 uniquely identify each prediction mode value, CurrIntraPredMode is decoded as soon as a predetermined number of bins has been restored.
The decoding procedure described above may be defined similarly for the intra 8 × 8 prediction mode. By decoding the intra N × N prediction mode in such a procedure, it is possible to decode an encoded bit stream in which the amount of code of the prediction mode itself is reduced by arithmetic coding using correlation of prediction modes in the neighboring pixel regions of the same color component.
Other variants of the table of fig. 28 are also conceivable. For example, a binary sequence as shown in fig. 29 may be used. Here, the following context B is used for the first bin.
Context B (C_B): for IntraPredModeA and IntraPredModeB, a flag intra_dc_pred_flag representing in binary whether the intra prediction mode is DC prediction is defined, and the following four states are used as context values.
C_B = (intra_dc_pred_flag for IntraPredModeA == 1) + (intra_dc_pred_flag for IntraPredModeB == 1);
Here, in fig. 3, intra_dc_pred_flag is set to 1 when intra4x4_pred_mode takes the value 2, and to 0 otherwise. For each state of C_B, the conditional occurrence probability of CurrIntraPredMode given the states of IntraPredModeA and IntraPredModeB is obtained in advance, and an initial occurrence probability table of the first bin value (0, 1) determined from it is assigned. In fig. 29, the first bin is designed to take the value 0 when CurrIntraPredMode is DC prediction, and the value 1 otherwise. For the second bin, context A (C_A) is used. By configuring the contexts in this way, the conditional occurrence probability can be estimated more accurately for both the first and the second bin, and the efficiency of arithmetic coding is improved.
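Context B follows the same pattern; the sketch below takes the formula with its apparent typos corrected, i.e. both terms use intra_dc_pred_flag and the result is C_B:

```python
DC_PRED = 2  # intra4x4_pred_mode value for DC prediction

def intra_dc_pred_flag(mode):
    """1 when the mode is DC prediction, 0 otherwise."""
    return 1 if mode == DC_PRED else 0

def context_b(mode_a, mode_b):
    """C_B = (intra_dc_pred_flag for IntraPredModeA == 1)
           + (intra_dc_pred_flag for IntraPredModeB == 1)."""
    return intra_dc_pred_flag(mode_a) + intra_dc_pred_flag(mode_b)
```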
Embodiment 7
In embodiment 7, an encoding device that performs encoding using inter-frame prediction in units obtained by equally dividing a video frame input in the 4:4:4 format into rectangular regions (macroblocks) of 16 × 16 pixels, and the corresponding decoding device, are described. The encoding device and the decoding device are based on the encoding method adopted in the MPEG-4 AVC (ISO/IEC 14496-10)/ITU-T H.264 standard (hereinafter AVC), and are given features inherent to the present invention.
Fig. 30 shows a configuration of a video encoding device according to embodiment 7, and fig. 31 shows a configuration of a video decoding device according to embodiment 7. In fig. 31, elements denoted by the same reference numerals as those of the components of the coding apparatus of fig. 30 denote the same elements.
The following describes the overall operation of the encoding device and the decoding device, and inter prediction mode determination processing and motion compensated prediction decoding processing, which are characteristic operations of embodiment 7, based on these figures.
1. Outline of operation of encoding device
In the encoding device shown in fig. 30, each video frame of the input video signal 1 is assumed to be in the 4:4:4 format, and the three color components are input to the encoding device in units of macroblocks, divided into blocks of the same size and grouped together.
First, the motion compensation prediction unit 102 selects a reference image of one frame from the motion compensation prediction reference image data of one or more frames stored in the memory 16, and performs motion compensation prediction processing for each color component in units of macroblocks. Memories are prepared for each of the three color components (three in this embodiment, but the number may be changed as appropriate by design). Seven block sizes for motion compensation prediction are provided: first, any of the sizes 16 × 16, 16 × 8, 8 × 16, and 8 × 8 can be selected in units of macroblocks, as shown in figs. 32(a) to (d); when 8 × 8 is selected, any of the sizes 8 × 8, 8 × 4, 4 × 8, and 4 × 4 can further be selected for each 8 × 8 block, as shown in figs. 32(e) to (h). For the selected size information, the size information in macroblock units is output as the macroblock type, and the size information in 8 × 8 block units is output as the sub-macroblock type. The identification number of the selected reference image and the motion vector information are also output for each block.
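The seven block sizes and their two-level signalling (macroblock type, then one sub-macroblock type per 8 × 8 block) can be sketched as follows; the numeric type values are illustrative indices, not the actual code values:

```python
MACROBLOCK_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]   # fig. 32(a)-(d)
SUBBLOCK_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]         # fig. 32(e)-(h)

def partition_signalling(mb_partition, sub_partitions=None):
    """Return (macroblock_type, sub_macroblock_types) for a chosen split.
    Sub-macroblock types are present only when the macroblock is split into
    8x8 blocks, one type per 8x8 block (four in a macroblock)."""
    mb_type = MACROBLOCK_PARTITIONS.index(mb_partition)
    if mb_partition != (8, 8):
        return mb_type, None
    return mb_type, [SUBBLOCK_PARTITIONS.index(p) for p in sub_partitions]
```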
The video encoding device according to embodiment 7 is characterized in that the motion compensation prediction processing method for the three color components is switched in accordance with the inter prediction mode sharing identification flag 123. This point is described in detail in the following 2.
The motion compensation prediction unit 102 executes the motion compensation prediction processing for all the block sizes and sub-block sizes shown in fig. 32, all motion vectors 137 within a predetermined search range, and one or more selectable reference images, and the prediction difference signal 4 is obtained by the subtractor 3 from the motion vector 137 and one reference image. The prediction efficiency of the prediction difference signal 4 is evaluated in the encoding mode determination unit 5, which outputs, from the prediction processing executed by the motion compensation prediction unit 102, the macroblock type/sub-macroblock type 106, the motion vector 137, and the reference image identification number that give the best prediction efficiency for the macroblock to be predicted. When the macroblock type/sub-macroblock type 106 is selected, the weighting factor 20 for each type determined by the encoding control unit 19 may be taken into account. The prediction difference signal 4 obtained by motion compensation prediction based on the selected type, motion vector 137, and reference image is output to the orthogonal transform unit 8. The orthogonal transform unit 8 transforms the input prediction difference signal 4 and outputs it to the quantization unit 9 as orthogonal transform coefficients. The quantization unit 9 quantizes the input orthogonal transform coefficients according to the quantization parameter 21 determined by the encoding control unit 19, and outputs the result to the variable length encoding unit 11 as the quantized transform coefficient 10. The variable length encoding unit 11 entropy-encodes the quantized transform coefficient 10 by means such as Huffman coding or arithmetic coding.
The quantized transform coefficient 10 is restored to the local decoded prediction difference signal 14 via the inverse quantization unit 12 and the inverse orthogonal transform unit 13, and is added by the adder 18 to the predicted image 7 generated from the selected macroblock type/sub-macroblock type 106, motion vector 137, and reference image, thereby generating the local decoded image 15. The local decoded image 15 is stored in the memory 16 for use in subsequent motion compensation prediction processing. A block filtering control flag 24 indicating whether or not to apply block filtering to the macroblock is also input to the variable length encoding unit 11 (since the pixel data before block filtering is stored in the memory 16 for use in the prediction processing executed by the motion compensation prediction unit 102, the block filtering processing itself is not necessary for encoding; on the decoding apparatus side, however, block filtering is performed according to the block filtering control flag 24 to obtain the final decoded image).
The inter prediction mode sharing identification flag 123, the quantized transform coefficient 10, the macroblock type/sub-macroblock type 106, the motion vector 137, the reference image identification number, and the quantization parameter 21 input to the variable length encoding unit 11 are arranged and shaped into a bit stream according to a predetermined rule (syntax), and output to the transmission buffer 17. The transmission buffer 17 smooths the bit stream according to the bandwidth of the transmission path to which the encoding apparatus is connected and the reading speed of the recording medium, and outputs it as the video stream 22. Feedback is also given to the encoding control unit 19 according to the state of bit stream accumulation in the transmission buffer 17, and the amount of code generated in the encoding of subsequent video frames is controlled.
2. Inter prediction mode determination processing in encoding device
The inter prediction mode determination process, which is a feature of the encoding apparatus according to embodiment 7, will be described in detail. In the following description, the inter prediction mode refers to the block size serving as the unit of motion compensation prediction, that is, the macroblock type/sub-macroblock type, and the inter prediction mode determination processing refers to the processing for selecting the macroblock type/sub-macroblock type, the motion vector, and the reference image. This process is performed in units of macroblocks in which the above three color components are grouped together, mainly using the motion compensation prediction unit 102 and the encoding mode determination unit 5 in the encoding device of fig. 30. Fig. 33 is a flowchart showing the flow of this process. Hereinafter, the image data of the three color components constituting the block are referred to as C0, C1, and C2.
First, the encoding mode determination unit 5 receives the inter prediction mode sharing identification flag 123, and determines whether or not the common inter prediction mode, the common motion vector 137, and the common reference picture are used in C0, C1, and C2 based on the value (step S100 in fig. 33). If the commonization is performed, the process proceeds to step S101 and thereafter, otherwise, the process proceeds to step S102 and thereafter.
When the inter prediction mode, the motion vector 137, and the reference picture are shared among C0, C1, and C2, the encoding mode determination unit 5 notifies the motion compensation prediction unit 102 of all the inter prediction modes, the motion vector search range, and the reference picture that can be selected, and the motion compensation prediction unit 102 evaluates all the prediction efficiencies thereof and selects the optimal inter prediction mode, the motion vector 137, and the reference picture that are shared among C0, C1, and C2 (step S101).
When the inter prediction mode, the motion vector 137, and the reference image are not shared among C0, C1, and C2, but the optimal ones are selected for each of C0, C1, and C2, the encoding mode determination unit 5 notifies the motion compensation prediction unit 102 of all the inter prediction modes, the motion vector search range, and the reference images that can be selected for the Ci (0 ≦ i < 3) component, and the motion compensation prediction unit 102 evaluates the prediction efficiency of all of them and selects the optimal inter prediction mode, motion vector 137, and reference image for each Ci (0 ≦ i < 3) (steps S102, S103, and S104).
As a criterion for evaluating the prediction efficiency of a prediction mode in the motion compensation prediction unit 102, for example, a rate-distortion cost given by J(m, v, r) = D(m, v, r) + λR(m, v, r) (λ: a positive number) can be used. Here, D(m, v, r) is the coding distortion or prediction error amount when the inter prediction mode m, the motion vector v within a predetermined range, and the reference image r are applied. The coding distortion is obtained by obtaining the prediction error using the inter prediction mode m, the motion vector v, and the reference image r, transforming and quantizing the prediction error, decoding a video from the result, and measuring the error with respect to the signal before coding. The prediction error amount is obtained by obtaining the difference between the predicted image and the signal before encoding when the inter prediction mode m, the motion vector v, and the reference image r are applied and quantifying the magnitude of that difference, for example as the Sum of Absolute Differences (SAD). R(m, v, r) is the generated code amount when the inter prediction mode m, the motion vector v, and the reference image r are applied. That is, J(m, v, r) is a value that specifies the trade-off between the code amount and the degree of degradation when the inter prediction mode m, the motion vector v, and the reference image r are applied, and the inter prediction mode m, motion vector v, and reference image r giving the smallest J(m, v, r) give the optimal solution.
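The cost evaluation above can be sketched numerically. The following is a minimal illustration, not the encoder's actual implementation; the candidate list and its distortion/rate values are hypothetical, and only the formula J(m, v, r) = D(m, v, r) + λR(m, v, r) comes from the text.

```python
def rd_cost(distortion, rate, lam):
    """Rate-distortion cost J = D + lambda * R (lambda: positive number)."""
    return distortion + lam * rate

def select_best(candidates, lam):
    """Pick the (mode, mv, ref) candidate with the smallest cost J."""
    return min(candidates, key=lambda c: rd_cost(c["D"], c["R"], lam))

# Hypothetical evaluation results for three (mode, motion vector, reference
# image) candidates: D = prediction error amount (e.g. SAD), R = code amount.
candidates = [
    {"mode": "16x16", "mv": (1, 0), "ref": 0, "D": 1200, "R": 40},
    {"mode": "8x8",   "mv": (2, 1), "ref": 0, "D": 900,  "R": 120},
    {"mode": "16x8",  "mv": (1, 1), "ref": 1, "D": 1000, "R": 60},
]
best = select_best(candidates, lam=5.0)
# Costs are 1400, 1500 and 1300 respectively, so the "16x8" candidate wins.
```

Note how a large λ favors low-rate candidates while a small λ favors low-distortion ones, which is exactly the trade-off the text describes.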
When the encoding device performs the processing of step S101 and subsequent steps, one set of inter prediction mode, motion vector 137, and reference image information is assigned to the macroblock containing the three color components. On the other hand, when the processing of step S102 and subsequent steps is performed, inter prediction mode information, a motion vector 137, and a reference image are assigned to each color component. Since the inter prediction mode, motion vector 137, and reference image information assigned to the macroblock thus differ between the two cases, the inter prediction mode commonization identification flag 123 must be multiplexed into the bitstream so that the decoding apparatus can identify whether the encoding apparatus performed the processing of step S101 and after or that of step S102 and after. Fig. 34 shows the data arrangement of such a bit stream.
Fig. 34 shows the data arrangement of the bit stream at the macroblock level. The macroblock type indicates whether the macroblock is intra or inter, and, in the inter modes, includes information indicating the block size serving as the unit of motion compensation. The sub-macroblock type is multiplexed only when the 8 × 8 block size is selected as the macroblock type, and includes the block size information of each 8 × 8 block. The basic macroblock type 128 and the basic sub-macroblock type 129 indicate the common macroblock type and the common sub-macroblock type when the inter prediction mode commonization identification flag 123 indicates "common to C0, C1, and C2", and otherwise indicate the macroblock type and the sub-macroblock type for C0. The extended macroblock type 130 and the extended sub-macroblock type 131 are multiplexed for C1 and C2, respectively, only when the inter prediction mode commonization identification flag 123 indicates that the mode is not "common to C0, C1, and C2", and indicate the macroblock type and the sub-macroblock type for C1 and C2.
The reference image identification number is information for specifying a reference image selected for each of blocks having a size equal to or larger than an 8 × 8 block, which is a unit of motion compensation. In the case of an interframe format frame, since a selectable reference image is one frame, one reference image identification number is multiplexed for each block. As for the motion vector information, a set of motion vector information is multiplexed for each block which becomes a unit of motion compensation. The reference picture identification number and the motion vector information need to be multiplexed with the number of blocks that become a unit of motion compensation included in the macroblock. The base reference picture identification number 132 and the base motion vector information 133 indicate a common reference picture identification number and common motion vector information when the inter prediction mode commonization flag 123 indicates "common to C0, C1, and C2", and indicate a reference picture identification number and motion vector information for C0 otherwise. The extended reference picture identification number 134 and the extended motion vector information 135 are multiplexed with C1 and C2, respectively, and indicate the reference picture identification number and the motion vector information for C1 and C2, only when the inter prediction mode commonization flag 123 indicates that the flag is not "common to C0, C1, and C2".
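The conditional multiplexing described in the two paragraphs above can be summarized as a field-order listing. This is a simplified schematic of fig. 34, not actual bit-level syntax: the per-block repetition of the reference identification number and motion vector, and the 8 × 8-only condition on the sub-macroblock type, are omitted, and the field names are labels derived from the reference numerals in the text.

```python
def macroblock_syntax(common_flag):
    """Schematic field order at the macroblock level of fig. 34.

    common_flag=True : inter prediction mode, motion vector and reference
    picture are shared by C0, C1, C2, so only the basic fields appear.
    common_flag=False : extended fields additionally carry C1 and C2."""
    fields = ["inter_pred_commonization_flag_123",
              "basic_mb_type_128", "basic_sub_mb_type_129"]
    if not common_flag:
        fields += ["ext_mb_type_130_C1", "ext_sub_mb_type_131_C1",
                   "ext_mb_type_130_C2", "ext_sub_mb_type_131_C2"]
    fields += ["base_ref_id_132", "base_mv_info_133"]
    if not common_flag:
        fields += ["ext_ref_id_134_C1", "ext_mv_info_135_C1",
                   "ext_ref_id_134_C2", "ext_mv_info_135_C2"]
    fields += ["quant_param_21", "quant_transform_coeff_10"]
    return fields

common = macroblock_syntax(True)     # no extended fields in the common case
separate = macroblock_syntax(False)  # extended fields for C1 and C2 follow
```

The listing makes visible why the common case saves overhead: all the `ext_*` fields disappear while the decoder can still recover every component's parameters from the basic fields.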
Next, the quantization parameter 21 and the quantized transform coefficient 10 are multiplexed. (The block filter control flag 24 that is input to the variable length coding unit 11 in fig. 30 is omitted from fig. 34, because it is not a component necessary for explaining the feature of embodiment 7.)
In the 4:2:0 format adopted in the conventional video encoding standards, the definition of the color space is fixed to Y, Cb, and Cr, but in the 4:4:4 format the color space is not limited to Y, Cb, and Cr, and various color spaces can be used. By configuring the inter prediction mode information as shown in fig. 34, optimal encoding processing can be performed even when the definition of the color space of the input video signal 1 varies. For example, when the color space is defined by RGB, in regions where the structure of the video texture remains equally in each of the R, G, and B components, using common inter prediction mode information and common motion vector information reduces the redundancy of the inter prediction mode information and the motion vector information themselves and improves the encoding efficiency. On the other hand, in a region containing no red at all (the R component is 0), the inter prediction mode and motion vector information optimal for the R component should differ from those optimal for the G and B components. Here, by adaptively using the extended inter prediction mode, the extended reference picture identification information, and the extended motion vector information, optimal coding efficiency can be obtained.
3. Outline of operation of decoding device
The decoding apparatus of fig. 31 receives the video stream 22 arranged as in fig. 34 and output from the encoding apparatus of fig. 30, performs decoding processing in units of macroblocks in which the three color components have the same size (4:4:4 format), and restores the individual video frames.
First, the variable length decoding unit 25 takes the video stream 22 as input, decodes it according to a predetermined rule (syntax), and extracts information such as the inter prediction mode commonization identification flag 123, the quantized transform coefficient 10, the macroblock type/sub-macroblock type 106, the reference picture identification number, the motion vector information, and the quantization parameter 21. The quantized transform coefficient 10 is input to the inverse quantization unit 12 together with the quantization parameter 21, and inverse quantization processing is performed. The output thereof is then input to the inverse orthogonal transform unit 13 and restored to the local decoded prediction difference signal 14. On the other hand, the macroblock type/sub-macroblock type 106, the inter prediction mode commonization identification flag 123, the motion vector 137, and the reference picture identification number are input to the motion compensation prediction unit 102, and the predicted image 7 is obtained in accordance with these pieces of information. The specific procedure for obtaining the predicted image 7 will be described later. The local decoded prediction difference signal 14 and the predicted image 7 are added by the adder 18 to obtain the provisional decoded image 15 (this is the same signal as the local decoded image 15 in the encoding apparatus). The provisional decoded image 15 is written back to the memory 16 for use in motion compensated prediction of subsequent macroblocks. A memory 16 is prepared for each of the three color components (three memories in the present embodiment, but the number may be changed as appropriate by design). Further, the block filter 26 is applied to the provisional decoded image 15 in accordance with the instruction of the block filter control flag 24 interpreted by the variable length decoding unit 25, and the final decoded image 27 is obtained.
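The reconstruction step just described (inverse quantization, inverse transform, addition of the predicted image) can be sketched on scalar samples. This is a toy illustration under stated simplifications: the inverse orthogonal transform is replaced by the identity and the inverse quantization by a single hypothetical scalar step size, which is far simpler than the actual processing.

```python
def dequantize(levels, qstep):
    """Inverse quantization: rescale coefficient levels by the quantizer
    step (a scalar stand-in for the real coefficient scaling)."""
    return [lv * qstep for lv in levels]

def reconstruct(pred, residual):
    """Provisional decoded image 15: predicted image 7 plus the restored
    prediction difference signal 14, clipped to the 8-bit sample range."""
    return [min(255, max(0, p + r)) for p, r in zip(pred, residual)]

pred = [100, 128, 200, 50]
# Pretend the inverse transform is the identity, so the dequantized levels
# are directly the prediction difference samples.
residual = dequantize([2, -1, 60, -6], qstep=1)
decoded = reconstruct(pred, residual)  # [102, 127, 255, 44] (260 clipped)
```

The clipping in `reconstruct` mirrors the fact that decoded samples must stay within the valid pixel range even when the prediction plus residual overshoots it.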
4. Inter-frame prediction decoding process in decoding device
The inter-predicted image generation processing, which is a feature of the decoding apparatus according to embodiment 7, will be described in detail. This process is performed in units of macroblocks in which the above three color components are grouped together, mainly using the variable length decoding unit 25 and the motion compensation prediction unit 102 in the decoding device of fig. 31. Fig. 35 is a flowchart showing the flow of the processing performed by the variable length decoding unit 25 within this process.
The video stream 22 input to the variable length decoding unit 25 follows the data arrangement of fig. 34. In step S110, the inter prediction mode commonization identification flag 123 in the data of fig. 34 is decoded. Further, the basic macroblock type 128 and the basic sub-macroblock type 129 are decoded (step S111). In step S112, it is determined from the result of the flag 123 whether or not the inter prediction mode is common to C0, C1, and C2. If it is common (yes in step S112), the basic macroblock type 128 and the basic sub-macroblock type 129 are used for all of C0, C1, and C2; otherwise (no in step S112), the basic macroblock type 128 and the basic sub-macroblock type 129 are used as the modes of C0, and the extended macroblock type 130 and the extended sub-macroblock type 131 are decoded for C1 and C2, respectively (step S113), whereby the inter prediction mode information of C1 and C2 is obtained. Next, the basic reference picture identification number 132 and the basic motion vector information 133 are decoded (step S114). When the flag 123 indicates "common to C0, C1, and C2" (yes in step S115), the basic reference picture identification number 132 and the basic motion vector information 133 are used for all of C0, C1, and C2; otherwise (no in step S115), the basic reference picture identification number 132 and the basic motion vector information 133 are used as the information of C0, and the extended reference picture identification number 134 and the extended motion vector information 135 are decoded for C1 and C2, respectively (step S116).
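The branching of steps S110 to S116 can be sketched as follows. The `StubReader` class and its method names are invented for illustration; only the decode order and the sharing decisions follow the flow of fig. 35.

```python
class StubReader:
    """Hypothetical stand-in for the variable length decoding unit 25:
    hands back pre-arranged syntax elements in stream order."""
    def __init__(self, common_flag, elements):
        self.common_flag = common_flag
        self.elements = iter(elements)
    def decode_flag(self):
        return self.common_flag
    def decode_mb_type(self):
        return next(self.elements)
    def decode_ref_and_mv(self):
        return next(self.elements)

def parse_inter_prediction_fields(reader):
    common = reader.decode_flag()              # S110: flag 123
    base_type = reader.decode_mb_type()        # S111: types 128/129
    if common:                                 # S112: shared by all
        mb_type = dict.fromkeys(("C0", "C1", "C2"), base_type)
    else:                                      # S113: 130/131 for C1, C2
        mb_type = {"C0": base_type,
                   "C1": reader.decode_mb_type(),
                   "C2": reader.decode_mb_type()}
    base_rv = reader.decode_ref_and_mv()       # S114: 132/133
    if common:                                 # S115: shared by all
        ref_mv = dict.fromkeys(("C0", "C1", "C2"), base_rv)
    else:                                      # S116: 134/135 for C1, C2
        ref_mv = {"C0": base_rv,
                  "C1": reader.decode_ref_and_mv(),
                  "C2": reader.decode_ref_and_mv()}
    return {"mb_type": mb_type, "ref_mv": ref_mv}

# Common case: one macroblock type and one (ref, mv) set serve C0, C1, C2.
shared = parse_inter_prediction_fields(StubReader(True, ["16x16", (0, (1, 2))]))
```

In the non-common case the same function simply pulls two additional type elements and two additional (ref, mv) elements off the stream, matching the extended fields of fig. 34.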
Through the above processing procedures, the macroblock type/sub-macroblock type 106, the reference image identification number, and the motion vector information of each color component are specified, and therefore, they are output to the motion compensation prediction unit 102, and a motion compensation prediction image of each color component is obtained.
Fig. 36 shows a variation of the bit stream data arrangement of fig. 34. In fig. 36, the inter-prediction mode-sharing identification flag 123 is not a flag at the macroblock level, but is multiplexed into flags located in an upper data layer such as slices, pictures, and sequences. Thus, when sufficient prediction efficiency can be ensured by switching in the upper layer above the slice, it is not necessary to multiplex the prediction mode-sharing identification flag 123 one by one at the macroblock level, and additional bits can be reduced.
In fig. 34 and 36, the inter prediction mode commonization identification flag 123 is multiplexed for each macroblock or into an upper data layer such as a slice, picture, or sequence. When encoding in the 4:4:4 format without multiplexing the inter prediction mode commonization identification flag 123, different inter prediction modes and motion vector information may always be used for each component. Fig. 37 shows the arrangement of the bit stream data in that case. In fig. 37, the inter prediction mode commonization identification flag 123 is not present; instead, class information 136 instructing the handling of an input image in the 4:4:4 format is multiplexed into an upper data layer such as the sequence, and the extended macroblock type 130, the extended sub-macroblock type 131, the extended reference picture identification number 134, and the extended motion vector information 135 are multiplexed in accordance with the decoding result of the class information.
Embodiment 8
Although in embodiment 7 the macroblock type, the sub-macroblock type, the motion vector, and the reference picture can be made different for each color component, embodiment 8 describes a video encoding device and a video decoding device characterized in that the macroblock type, the sub-macroblock type, and the reference picture are made common to all components and only the motion vector can be made different for each component. The video encoding device and the video decoding device in embodiment 8 have the same configurations as those of fig. 30 and 31 in embodiment 7, but differ in that a motion vector commonization identification flag 123b is used instead of the inter prediction mode commonization identification flag 123.
1. Inter prediction mode determination processing in encoding device
The inter prediction mode determination process, which is a feature of the encoding apparatus according to embodiment 8, will be described in detail mainly focusing on a process different from embodiment 7.
This process is performed in units of macroblocks in which the above three color components are grouped together, mainly using the motion compensation prediction unit 102 and the encoding mode determination unit 5 in the encoding device of fig. 30. Fig. 38 is a flowchart showing the flow of this process. Hereinafter, the image data of the three color components constituting the block are referred to as C0, C1, and C2.
First, the encoding mode determination unit 5 receives the motion vector commonization identification flag 123b, and determines based on its value whether or not a common motion vector 137 is used for C0, C1, and C2 (step S120 in fig. 38). If it is made common, the process proceeds to step S121 and after; otherwise, it proceeds to step S122 and after.
When the motion vector 137 is shared among C0, C1, and C2, the encoding mode determining unit 5 notifies the motion compensation predicting unit 102 of all the inter prediction modes, the motion vector search range, and the reference picture that can be selected, and the motion compensation predicting unit 102 evaluates all the prediction efficiencies thereof and selects the optimal inter prediction mode, the motion vector 137, and the reference picture that are shared among C0, C1, and C2 (step S121).
When the motion vector 137 is not made common to C0, C1, and C2 but the optimal motion vector is selected for each of C0, C1, and C2, the encoding mode determination unit 5 notifies the motion compensation prediction unit 102 of all the selectable inter prediction modes, the motion vector search range, and the reference images, and the motion compensation prediction unit 102 evaluates the prediction efficiency of all of them, selects the optimal inter prediction mode and reference image common to C0, C1, and C2 (step S122), and then selects the optimal motion vector for each Ci (0 ≦ i < 3) (steps S123, S124, and S125).
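The division of work in embodiment 8 — one shared mode and reference image, but a motion vector searched per component — can be sketched with a toy one-dimensional block matching. The signals, the SAD-based search, and the search range below are illustrative assumptions, not the actual two-dimensional search of the motion compensation prediction unit 102.

```python
def best_mv(block, ref, search_range):
    """Exhaustive 1-D motion search: return the displacement in
    [-search_range, +search_range] minimizing the Sum of Absolute
    Differences (SAD) against the padded reference row `ref`."""
    best_d, best_sad = None, None
    for d in range(-search_range, search_range + 1):
        sad = sum(abs(block[i] - ref[i + d + search_range])
                  for i in range(len(block)))
        if best_sad is None or sad < best_sad:
            best_d, best_sad = d, sad
    return best_d

search_range = 2
# Reference row padded by `search_range` samples on each side.
ref = [0, 0, 10, 20, 30, 40, 0, 0]
cur = {
    "C0": [10, 20, 30, 40],  # aligned with the reference
    "C1": [0, 10, 20, 30],   # content shifted by one sample
    "C2": [20, 30, 40, 0],   # shifted the other way
}
# Mode and reference image are shared; only the motion vector is per component.
mvs = {comp: best_mv(block, ref, search_range) for comp, block in cur.items()}
```

Each component ends up with its own displacement (0, -1, and +1 here), which is the situation where multiplexing the extended motion vector information per component pays off.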
The motion vector commonization identification flag 123b needs to be multiplexed into the bit stream to be identifiable on the decoding apparatus side. Fig. 39 shows the data arrangement of such a bit stream.
Fig. 39 shows the data arrangement of the bit stream at the macroblock level. The macroblock type 128b, the sub-macroblock type 129b, and the reference picture identification number 132b are always "common to C0, C1, and C2". The basic motion vector information 133 indicates the common motion vector information when the motion vector commonization identification flag 123b indicates "common to C0, C1, and C2", and otherwise indicates the motion vector information for C0. The extended motion vector information 135 is multiplexed for C1 and C2, respectively, and indicates the motion vector information for C1 and C2, only when the motion vector commonization identification flag 123b indicates that the vector is not "common to C0, C1, and C2". The macroblock type/sub-macroblock type 106 in fig. 30 and 31 is a collective name for the macroblock type 128b and the sub-macroblock type 129b in fig. 39.
2. Inter-frame prediction decoding process in decoding device
The decoding device according to embodiment 8 receives the video stream 22 in the arrangement shown in fig. 39 output from the encoding device according to embodiment 8, performs decoding processing for each unit of a macroblock in which three color components are of the same size (4:4:4 format), and restores each video frame.
The inter-predicted image generation processing, which is a feature of the decoding apparatus according to embodiment 8, will be described in detail, focusing mainly on the processing that differs from embodiment 7. This process is performed in units of macroblocks in which the above three color components are grouped together, mainly using the variable length decoding unit 25 and the motion compensation prediction unit 102 in the decoding device of fig. 31. Fig. 40 is a flowchart showing the flow of the processing performed by the variable length decoding unit 25 within this process.
The video stream 22 input to the variable length decoding unit 25 follows the data arrangement of fig. 39. In step S126, the macroblock type 128b and the sub-macroblock type 129b common to C0, C1, and C2 are decoded. Since the decoded macroblock type 128b or sub-macroblock type 129b determines the block size serving as the unit of motion compensation, the reference picture identification number 132b common to C0, C1, and C2 is then decoded for each block serving as the unit of motion compensation (step S127). In step S128, the motion vector commonization identification flag 123b is decoded. Next, the basic motion vector information 133 is decoded for each block serving as the unit of motion compensation (step S129). In step S130, it is determined from the result of the flag 123b whether or not the motion vector 137 is common to C0, C1, and C2. If it is common (yes in step S130), the basic motion vector information 133 is used for all of C0, C1, and C2; otherwise (no in step S130), the basic motion vector information 133 is used as the information of C0, and the extended motion vector information 135 is decoded for C1 and C2, respectively (step S131). Through the above processing procedure, the macroblock type/sub-macroblock type 106, the reference image identification number, and the motion vector information of each color component are determined, so they are output to the motion compensation prediction unit 102 to obtain the motion compensated prediction image of each color component.
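The decode order of fig. 40 can be sketched in the same schematic style. The reader object and its method names are hypothetical; only the element order (type, reference, flag, vectors) and the sharing decision follow the text.

```python
class Fig39Reader:
    """Hypothetical reader returning pre-arranged syntax elements of the
    fig. 39 macroblock in stream order."""
    def __init__(self, elements):
        self.elements = iter(elements)
    def next_element(self):
        return next(self.elements)

def parse_fig39_macroblock(reader):
    mb_type = reader.next_element()    # S126: shared type 128b/129b
    ref_id = reader.next_element()     # S127: shared reference picture id 132b
    common_mv = reader.next_element()  # S128: commonization flag 123b
    base_mv = reader.next_element()    # S129: basic motion vector info 133
    if common_mv:                      # S130: one vector for all components
        mvs = {"C0": base_mv, "C1": base_mv, "C2": base_mv}
    else:                              # S131: extended vectors 135 for C1, C2
        mvs = {"C0": base_mv,
               "C1": reader.next_element(),
               "C2": reader.next_element()}
    return {"mb_type": mb_type, "ref_id": ref_id, "mv": mvs}

# Non-common case: three distinct motion vectors follow one shared type/ref.
mb = parse_fig39_macroblock(
    Fig39Reader(["16x16", 0, False, (1, 0), (1, 1), (0, 2)]))
```

Compared with the embodiment 7 parser, only the motion vector part branches on the flag; the type and reference picture are decoded once unconditionally.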
Fig. 41 shows a variation of the bit stream data arrangement of fig. 39. In fig. 41, the motion vector commonization identification flag 123b is not a flag at the macroblock level, but is multiplexed as a flag located in an upper data layer such as a slice, picture, or sequence. Thus, when sufficient prediction efficiency can be ensured by switching in the upper layer at or above the slice, the motion vector commonization identification flag 123b need not be multiplexed one by one at the macroblock level, and additional bits can be reduced.
In fig. 39 and 41, the motion vector commonization identification flag 123b is multiplexed for each macroblock or into an upper data layer such as a slice, picture, or sequence. When encoding in the 4:4:4 format without multiplexing the motion vector commonization identification flag 123b, different motion vector information may always be used for each component. Fig. 42 shows the arrangement of the bit stream data in that case. In fig. 42, the motion vector commonization identification flag 123b is not present, class information 136 instructing the handling of an input image in the 4:4:4 format is multiplexed into an upper data layer such as the sequence, and the extended motion vector information 135 is multiplexed in accordance with the decoding result of the class information 136.
In embodiment 8, the macroblock type/sub-macroblock type 106 and the reference image may be made common to each color component, and the motion vector 137 may be made different only for each color component. Thus, when sufficient prediction efficiency is obtained by adapting only the motion vector 137 to each color component, it is not necessary to multiplex the macroblock type/sub-macroblock type 106 and the reference picture identification number for each color component, and additional bits can be reduced.
Embodiment 9
In embodiment 7, the macroblock type/sub-macroblock type 106, the motion vector 137, and the reference image can be switched, by means of the inter prediction mode commonization identification flag 123 or the class information 136, between being common to the three components and being different for each color component. Embodiment 9 assumes a 4:4:4 format image such as the Y, Cb, Cr format and additionally makes it possible to switch whether the luminance component (Y) and the color difference components (Cb, Cr) use different modes (in that case, a common mode is used for the two color difference components). That is, a video encoding device and a video decoding device will be described that are characterized in that they can switch among a mode common to the three components, a mode different for each component, and modes that differ between the luminance component and the color difference components. The configurations of the video encoding device and the video decoding device in embodiment 9 are the same as those of fig. 30 and 31 in embodiment 7.
1. Inter prediction mode determination processing in encoding device
The inter prediction mode determination process, which is a feature of the encoding apparatus according to embodiment 9, will be described in detail mainly focusing on a process different from embodiment 7.
This process is performed in units of macroblocks in which the above three color components are grouped together, mainly using the motion compensation prediction unit 102 and the encoding mode determination unit 5 in the encoding device of fig. 30. Fig. 43 is a flowchart showing the flow of this process. Hereinafter, the image data of the three color components constituting the block are referred to as C0, C1, and C2.
First, the encoding mode determination unit 5 receives the inter prediction mode sharing identification flag 123, and determines whether or not the common inter prediction mode, the common motion vector 137, and the common reference picture are used in C0, C1, and C2 based on the value (step S132 in fig. 43). If the commonization is performed, the process proceeds to step S133, and otherwise, the process proceeds to step S134 or S137.
When the inter prediction mode, the motion vector 137, and the reference picture are shared among C0, C1, and C2, the encoding mode determination unit 5 notifies the motion compensation prediction unit 102 of all the inter prediction modes, the motion vector search range, and the reference picture that can be selected, and the motion compensation prediction unit 102 evaluates all the prediction efficiencies thereof and selects the optimal inter prediction mode, the motion vector 137, and the reference picture that are shared among C0, C1, and C2 (step S133).
When the inter prediction mode, the motion vector 137, and the reference image are not shared among C0, C1, and C2, but the optimal ones are selected for each of C0, C1, and C2, the encoding mode determination unit 5 notifies the motion compensation prediction unit 102 of all the inter prediction modes, the motion vector search range, and the reference images that can be selected for the Ci (0 ≦ i < 3) component, and the motion compensation prediction unit 102 evaluates the prediction efficiency of all of them and selects the optimal inter prediction mode, motion vector 137, and reference image for each Ci (0 ≦ i < 3) (steps S134, S135, and S136).
When the inter prediction mode, the motion vector 137, and the reference image are made common between C1 and C2, while C0 (corresponding to the luminance component) and the pair C1, C2 (corresponding to the color difference components) each select their optimal mode separately, the encoding mode determination unit 5 notifies the motion compensation prediction unit 102 of all the inter prediction modes, the motion vector search range, and the reference images that can be selected for the C0 component, and the motion compensation prediction unit 102 evaluates the prediction efficiency of all of them and selects the optimal inter prediction mode, motion vector 137, and reference image for C0 (step S137). Further, all the inter prediction modes, the motion vector search range, and the reference images selectable for the C1 and C2 components are notified, and the motion compensation prediction unit 102 evaluates the prediction efficiency of all of them and selects the optimal inter prediction mode, motion vector 137, and reference image common to C1 and C2 (step S138).
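The three switching states of embodiment 9 can be summarized as groupings of components that share one parameter set: one search (and one set of multiplexed fields) is performed per group. The state names below are labels invented for illustration, not syntax element values.

```python
def sharing_groups(state):
    """Component groupings for the inter prediction mode, motion vector,
    and reference image in embodiment 9:
      "all_common"    -> one set for C0, C1, C2          (step S133)
      "per_component" -> one set per component           (steps S134-S136)
      "c1c2_common"   -> C0 (luminance) alone, C1 and C2
                         (color difference) share one    (steps S137-S138)
    """
    groups = {
        "all_common": [("C0", "C1", "C2")],
        "per_component": [("C0",), ("C1",), ("C2",)],
        "c1c2_common": [("C0",), ("C1", "C2")],
    }
    return groups[state]

# Number of parameter sets searched and multiplexed in each state:
counts = {s: len(sharing_groups(s))
          for s in ("all_common", "per_component", "c1c2_common")}
```

The "c1c2_common" state is the new middle ground that embodiment 9 adds: two parameter sets instead of one or three, matching the typical Y versus Cb/Cr behavior of a 4:4:4 signal.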
The data arrangement of the bit stream output by the encoding apparatus in embodiment 9 is the same as that in fig. 34, but when the inter-prediction mode commonization flag 123 indicates "common to C1 and C2", the extended macroblock type 130, the extended sub-macroblock type 131, the extended reference identification number 134, and the extended motion vector information 135 are information common to C1 and C2.
2. Inter-frame prediction decoding process in decoding device
The decoding device according to embodiment 9 receives the video stream 22 in the arrangement shown in fig. 34 output from the encoding device according to embodiment 9, performs decoding processing for each of the three color components in units of macroblocks of the same size (4:4:4 format), and restores each of the video frames.
The inter-predicted image generation processing, which is a feature of the decoding apparatus according to embodiment 9, will be described in detail, focusing mainly on the processing that differs from embodiment 7. This process is performed in units of macroblocks in which the above three color components are grouped together, mainly using the variable length decoding unit 25 and the motion compensation prediction unit 102 in the decoding device of fig. 31. Fig. 44 is a flowchart showing the flow of the processing performed by the variable length decoding unit 25 within this process.
The video stream 22, which is input to the variable length decoding unit 25, is arranged in accordance with the data of fig. 34. In step S140, the inter prediction mode sharing identification flag 123 in the data of fig. 34 is decoded (step S140). Further, the basic macroblock type 128 and the basic sub-macroblock type 129 are decoded (step S141). In step S142, it is determined whether or not the inter prediction modes are shared among C0, C1, and C2 using the result of the inter prediction mode sharing identification flag 123, and if the inter prediction modes are shared, the basic macroblock type 128 and the basic sub-macroblock type 129 are used for all of C0, C1, and C2, and otherwise, the basic macroblock type 128 and the basic sub-macroblock type 129 are used as the modes of C0. When C1 and C2 are common, the common extended macroblock type 130 and extended sub-macroblock type 131 are decoded in the C1 and C2 components (step S143). When different modes are used for C0, C1, and C2, the extended macroblock type 130 and the extended sub-macroblock type 131 are decoded for C1 and C2, respectively (steps S144, S145, and S146), and mode information of C1 and C2 is obtained. Next, the base reference image identification number 132 and the base motion vector information 133 are decoded (step S147), and when the inter-prediction mode commonization flag 123 indicates "common use among C0, C1, and C2", the base reference image identification number 132 and the base motion vector information 133 are used for all of C0, C1, and C2, otherwise, the base reference image identification number 132 and the base motion vector information 133 are used as information of C0, and when they are common to C1 and C2, the extended reference image identification number 134 and the extended motion vector information 135 which are common to C1 and C2 components are decoded (step S149). 
When different modes are used for C0, C1, and C2, the extended reference picture identification number 134 and the extended motion vector information 135 are decoded for C1 and C2, respectively (steps S150, S151, and S152). Through the above processing procedures, the macroblock type/sub-macroblock type 106, the reference image identification number, and the motion vector information of each color component are specified, and therefore, they are output to the motion compensation prediction unit 102, and a motion compensation prediction image of each color component is obtained.
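The branching of steps S140 to S152 can be sketched as follows. This is an illustrative sketch only: `StubReader` and all field names are hypothetical stand-ins for the parsed syntax elements, not the patent's actual interfaces, and each branch merges the mode decoding and the reference/vector decoding for brevity.

```python
class StubReader:
    """Toy stand-in for the parsed syntax elements of video stream 22;
    a real decoder would pull these from the bit stream."""
    def __init__(self, fields):
        self.fields = dict(fields)

    def read(self, name):
        return self.fields[name]


def decode_inter_prediction_info(bs):
    """Branching of steps S140-S152: per-component macroblock type,
    reference image identification number, and motion vector."""
    sharing = bs.read("inter_prediction_mode_sharing_flag")      # step S140
    basic = (bs.read("basic_macroblock_type"),                   # step S141
             bs.read("basic_reference_image_id"),                # step S147
             bs.read("basic_motion_vector"))
    if sharing == "common_C0_C1_C2":
        # the basic fields apply to all three components
        return {"C0": basic, "C1": basic, "C2": basic}
    if sharing == "common_C1_C2":
        ext = (bs.read("extended_macroblock_type"),              # step S143
               bs.read("extended_reference_image_id"),           # step S149
               bs.read("extended_motion_vector"))
        return {"C0": basic, "C1": ext, "C2": ext}
    # individual modes: C1 and C2 each carry their own extended fields
    out = {"C0": basic}
    for c in ("C1", "C2"):                           # steps S144-S146, S150-S152
        out[c] = (bs.read("extended_macroblock_type"),
                  bs.read("extended_reference_image_id"),
                  bs.read("extended_motion_vector"))
    return out
```

In each case the decoded tuples are what would be passed on to the motion compensation prediction unit 102 to obtain the motion-compensated prediction image of each color component.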
In addition, for this data arrangement of the bit stream, as in the case of fig. 36, when the inter prediction mode sharing identification flag 123 indicates "common to C1 and C2", the extended macroblock type 130, the extended sub-macroblock type 131, the extended reference image identification number 134, and the extended motion vector information 135 are information common to C1 and C2. The operations of a video encoding apparatus and a video decoding apparatus that output and input a video stream following the data arrangement of fig. 36 are the same as in the case of fig. 34.
In embodiment 9, the macroblock type/sub-macroblock type 106, the motion vector 137, and the reference image can all differ for each color component. Alternatively, the macroblock type/sub-macroblock type 106 and the reference image may be made common to all components, and only the motion vector 137 switched among being common to the three components, different for each component, or common to C1 and C2 while separate for C0, selecting whichever is most suitable. For the data arrangement of the bit stream in this case, fig. 39 or fig. 41 applies, and when the inter prediction mode sharing identification flag 123 indicates "common to C1 and C2", the extended motion vector information 135 is common to C1 and C2.
Embodiment 10
In embodiment 10, a method of encoding the input motion vector 137 in the variable length encoding unit 11 of the encoding apparatus described in embodiment 7 and multiplexing the encoded motion vector into a bit stream, and a method of decoding the motion vector 137 from the bit stream in the variable length decoding unit 25 of the corresponding decoding apparatus are described.
Fig. 45 shows the configuration of a motion vector encoding unit that encodes the motion vector 137, forming part of the variable length encoding unit 11 of the encoding device shown in fig. 30.
A method of multiplexing the motion vectors 137 of the three color components (C0, C1, C2) in the bit stream in the order of C0, C1, C2 is described.
The motion vector 137 of C0 is denoted mv0. The motion vector prediction unit 111 obtains the prediction vector mvp0 of the motion vector 137 of C0. As shown in fig. 46, the motion vectors (mvA0, mvB0, mvC0) of the blocks adjacent to the block holding the motion vector to be encoded (mv0) (blocks A, B, and C in fig. 46) are obtained from the memory. It is assumed that the motion vectors 137 of A, B, and C have already been multiplexed into the bitstream. The median value of mvA0, mvB0, and mvC0 is calculated as mvp0. The calculated prediction vector mvp0 and the motion vector mv0 to be encoded are input to the differential motion vector calculation unit 112, which calculates the difference vector between mv0 and mvp0 (mvd0 = mv0 - mvp0). The calculated mvd0 is input to the differential motion vector variable length coding unit 113 and entropy coded by Huffman coding, arithmetic coding, or the like.
Next, the motion vector of C1 (mv1) is encoded. The motion vector prediction unit 111 obtains the prediction vector mvp1 of the motion vector 137 of C1. As shown in fig. 46, the motion vectors (mvA1, mvB1, mvC1) of the blocks adjacent to the block holding the motion vector to be encoded (mv1), and the motion vector (mv0) of C0 at the same position as the block holding mv1, are obtained from the memory 16. It is assumed that the motion vectors 137 of A, B, and C have already been multiplexed into the bitstream. The median value of mvA1, mvB1, mvC1, and mv0 is calculated as mvp1. The calculated prediction vector mvp1 and the motion vector mv1 to be encoded are input to the differential motion vector calculation unit 112, which calculates the differential motion vector between mv1 and mvp1 (mvd1 = mv1 - mvp1). The calculated mvd1 is input to the differential motion vector variable length coding unit 113 and entropy coded by Huffman coding, arithmetic coding, or the like.
Next, the motion vector of C2 (mv2) is encoded. The motion vector prediction unit 111 obtains the prediction vector mvp2 of the motion vector 137 of C2. As shown in fig. 46, the motion vectors (mvA2, mvB2, mvC2) of the blocks adjacent to the block holding the motion vector to be encoded (mv2), and the motion vectors (mv0, mv1) of C0 and C1 at the same position as the block holding mv2, are obtained from the memory. The median value of mvA2, mvB2, mvC2, mv0, and mv1 is calculated as mvp2. The calculated prediction vector mvp2 and the motion vector mv2 to be encoded are input to the differential motion vector calculation unit 112, which calculates the differential motion vector between mv2 and mvp2 (mvd2 = mv2 - mvp2). The calculated mvd2 is input to the differential motion vector variable length coding unit 113 and entropy coded by Huffman coding, arithmetic coding, or the like.
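The median-based prediction described above can be sketched as follows. The helper names are hypothetical, and the tie rule for an even number of candidate vectors (C1 uses four) is an assumption, since the text does not specify one.

```python
def median_mv(vectors):
    """Component-wise median of a list of (x, y) motion vectors.
    For an even count the upper median is taken (assumed tie rule)."""
    xs = sorted(v[0] for v in vectors)
    ys = sorted(v[1] for v in vectors)
    mid = len(vectors) // 2
    return (xs[mid], ys[mid])


def differential_mv(mv, mvp):
    """mvd = mv - mvp: the value handed to entropy coding."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


# Candidate sets per component, as described in the text:
#   mvp0 = median(mvA0, mvB0, mvC0)             -- C0: spatial neighbors only
#   mvp1 = median(mvA1, mvB1, mvC1, mv0)        -- C1: plus co-located C0 vector
#   mvp2 = median(mvA2, mvB2, mvC2, mv0, mv1)   -- C2: plus C0 and C1 vectors
```

For example, with neighbors (1, 1), (3, 5), (2, 3) and co-located vectors (2, 2) and (4, 0), mvp2 is taken component-wise over all five candidates.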
Fig. 47 shows the configuration of a motion vector decoding unit 250 that decodes a motion vector 137 in a part of the variable length decoding unit 25 of the decoding device shown in fig. 31.
The motion vector decoding unit 250 decodes the motion vectors 137 multiplexed into the three color components in the video stream 22 in the order of C0, C1, and C2.
The differential motion vector variable length decoding unit 251 extracts the differential motion vectors (mvd0, mvd1, mvd2) of the three color components (C0, C1, C2) multiplexed in the video stream 22 and performs variable length decoding.
The motion vector predictor 252 calculates the prediction vectors (mvp0, mvp1, mvp2) of the motion vectors 137 of C0, C1, and C2. The method of calculating the prediction vector is the same as that of the motion vector prediction unit 111 of the encoding apparatus.
Next, the motion vector calculation unit 253 adds the differential motion vector and the prediction vector to calculate the motion vector (mvi = mvdi + mvpi, for i = 0, 1, 2). The calculated motion vectors 137 are stored in the memory 16 for use as prediction vector candidates.
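The decoder-side reconstruction is the inverse of the encoder's differencing; a minimal sketch (illustrative names, not the patent's own code):

```python
def reconstruct_mv(mvd, mvp):
    """mvi = mvdi + mvpi: decoder-side motion vector reconstruction
    for each component i = 0, 1, 2."""
    return (mvd[0] + mvp[0], mvd[1] + mvp[1])


# Each reconstructed vector is stored (the role of memory 16) so that it can
# serve as a prediction vector candidate for later blocks and components.
memory = []
for mvd, mvp in [((2, 1), (2, 3)), ((0, 0), (1, 1))]:
    memory.append(reconstruct_mv(mvd, mvp))
```

Because the prediction vector is computed identically on both sides, adding the decoded difference recovers exactly the vector the encoder started from.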
According to embodiment 10, when a motion vector is encoded and decoded, both the motion vectors of the adjacent blocks of the same color component and the motion vectors of the blocks of the other color components at the same position are used as prediction vector candidates. Thus, even when there is no continuity between the motion vector of a block and those of its neighbors within the same color component, as in the boundary region of an object, using the motion vector of the co-located block of a different color component as a prediction vector candidate improves the prediction efficiency of the motion vector, giving the effect of reducing the code amount of the motion vector.
Embodiment 11
In embodiment 11, an example of another encoding device and decoding device derived from those described in embodiment 7 will be described. The encoding device and the decoding device of embodiment 11 are characterized by means for determining, based on a predetermined control signal, whether the C0, C1, and C2 components in a macroblock are encoded according to individual header information, and for multiplexing the information of that control signal into the video stream 22. According to the control signal, the header information necessary for decoding the C0, C1, and C2 components is multiplexed into the video stream 22, and macroblocks that carry no motion vector or transform coefficient information to be transmitted (skipped, or "not coded", macroblocks) are efficiently signaled.
In conventional MPEG video encoding systems including AVC, highly efficient encoding is realized by performing special signaling in the case where there is no encoding information to be transmitted for the macroblock to be encoded, suppressing the code amount of such a macroblock to a minimum. For example, suppose that when a certain macroblock is encoded, the image data at exactly the same position in the reference image used for motion compensation prediction is used as the predicted image (that is, the motion vector is zero), and that transformation and quantization of the resulting prediction error signal make all the quantized transform coefficients in the macroblock zero. In that case the amplitude of the prediction error signal obtained on the decoding side after inverse quantization is zero, and there is no transform coefficient data to transmit to the decoding apparatus side. If it is further assumed that the motion vector is zero, a special macroblock type meaning "motion vector zero, no transform coefficient data" can be defined. Such a macroblock has conventionally been called a skipped macroblock or a not-coded macroblock, and techniques for avoiding the transmission of redundant information by special signaling have been studied. In AVC, the assumption on the motion vector is the condition "the 16 × 16 prediction of fig. 32(a) is performed, and the prediction values used for coding the motion vector (corresponding to the prediction vectors mvp0, mvp1, and mvp2) are equal to the actual motion vector"; when this condition is satisfied and there is no transform coefficient data to transmit, the macroblock is regarded as a skipped macroblock. In conventional AVC, when this skipped macroblock is coded, one of the following two methods is selected according to the variable length coding method used.
Method 1: the number of consecutive skipped macroblocks within a slice (the RUN length) is counted, and the RUN length is variable-length coded.
Method 2: for each macroblock, a flag indicating whether it is a skipped macroblock is encoded.
Fig. 48 shows the bitstream syntax obtained with each method. Fig. 48(a) shows the case where adaptive Huffman coding is used as the variable length coding scheme (method 1), and fig. 48(b) shows the case where adaptive arithmetic coding is used (method 2). In method 1, the skipped macroblock is signaled by mb_skip_run; in method 2, by mb_skip_flag. mb(n) denotes the encoded data of the n-th (non-skipped) macroblock. Note that mb_skip_run and mb_skip_flag are assigned in units of macroblocks, each of which groups the C0, C1, and C2 components together.
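The two signaling methods carry the same information in different forms, as the following sketch shows. This is purely illustrative and does not reproduce the normative AVC syntax; it only demonstrates the run-length versus per-macroblock-flag equivalence.

```python
def to_skip_runs(skip_flags):
    """Method 1 view: the mb_skip_run values, i.e. the number of skipped
    macroblocks preceding each coded (non-skipped) macroblock, plus a
    trailing run for skips at the end of the slice."""
    runs, run = [], 0
    for skipped in skip_flags:
        if skipped:
            run += 1
        else:
            runs.append(run)   # run of skips before this coded macroblock
            run = 0
    runs.append(run)           # trailing run of skips
    return runs


def from_skip_runs(runs, n_macroblocks):
    """Invert method 1 back to per-macroblock flags (the method 2 view)."""
    flags = []
    for run in runs[:-1]:
        flags += [True] * run + [False]
    flags += [True] * runs[-1]
    assert len(flags) == n_macroblocks
    return flags
```

Method 1 is compact when long runs of skips occur; method 2 pairs naturally with adaptive arithmetic coding, which can compress the per-macroblock flag below one bit.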
In contrast, the encoding device and the decoding device of embodiment 11 provide a method of changing the header information, including the motion vector and the like, for each of the C0, C1, and C2 components based on the state of the control signal, that is, the signal corresponding to the inter prediction mode sharing identification flag 123 described in embodiment 7, and of signaling the skipped macroblock for each of the C0, C1, and C2 components. Figs. 49 and 50 show specific examples of the bitstream syntax.
Fig. 49 shows the configuration of macroblock encoded data that is output by the encoding apparatus of embodiment 11 and input to the decoding apparatus of embodiment 11, and fig. 50 shows the detailed configuration of encoded data of Cn component header information in fig. 49. Hereinafter, in order to explain the effect of the structure of the bit stream, the operation of the decoding apparatus that receives the bit stream and restores the video signal will be mainly explained. Fig. 31 is referred to in the description of the operation of the decoding apparatus.
The inter prediction mode sharing identification flag 123 of embodiment 7 is defined here in an expanded form as the macroblock header sharing identification flag 123c. The macroblock header sharing identification flag 123c indicates one of the following: the C0 component header information 139a is regarded as the basic macroblock header information, and only the C0 component header information 139a is multiplexed, as header information used in common by the C1 and C2 components as well; or the C1 component header information 139b and the C2 component header information 139c are individually multiplexed as extension header information. The macroblock header sharing identification flag 123c is extracted from the video stream 22 and decoded by the variable length decoding unit 25. When the flag indicates that only the C0 component header information 139a is multiplexed as header information used in common by the C1 and C2 components, all the C0, C1, and C2 components in the macroblock are decoded using the C0 component header information 139a; when it indicates that the C1 component header information 139b and the C2 component header information 139c are individually multiplexed as extension header information, each of the C0, C1, and C2 components in the macroblock is decoded using its own header information 139a to 139c. This is described in further detail below as processing in units of macroblocks.
1. Case where only the C0 component header information is multiplexed
When the macroblock header sharing identification flag 123c indicates that only the C0 component header information 139a is multiplexed as header information used in common by the C1 and C2 components, the macroblock is decoded for all the C0, C1, and C2 components based on the various macroblock header information included in the C0 component header information 139a. In this case, since the C0 component skip instruction information 138a and the C0 component header information 139a apply in common to the C1 and C2 components, the skip instruction information (138b, 138c) and header information (139b, 139c) for the C1 and C2 components are not multiplexed in the bitstream.
The variable length decoding unit 25 first decodes and evaluates the C0 component skip instruction information 138a. When the C0 component skip instruction information 138a indicates "skip", the C0 component header information 139a is regarded as not being encoded, and the transform coefficient validity indication information 142 within the C0 component header information 139a is regarded as zero (no transform coefficients encoded at all). The C0 to C2 component transform coefficient data (140a to 140c) are then all regarded as not encoded, and all the quantized transform coefficients 10 in the macroblock are set to zero and output. Further, following the definition of the skipped macroblock, the motion vectors 137 of all the C0, C1, and C2 components are set to the same value and output.
If the C0 component skip instruction information 138a does not indicate "skip", the C0 component header information 139a is present and is decoded. Within the C0 component header information 139a, if the macroblock type 128b indicates intra coding, the intra prediction mode 141, the transform coefficient validity indication information 142, and (if the transform coefficient validity indication information 142 is not zero) the quantization parameter are decoded. If the transform coefficient validity indication information 142 is not zero, the C0 to C2 component transform coefficient data (140a to 140c) are decoded and output as the quantized transform coefficients 10; if it is zero, the C0 to C2 component transform coefficient data (140a to 140c) are all regarded as zero, and all the quantized transform coefficients 10 in the macroblock are set to zero and output. If the macroblock type 128b indicates inter coding, the sub-macroblock type 129b is decoded as necessary, and the reference image identification number 132b, the motion vector information 133b, the transform coefficient validity indication information 142, and (if the transform coefficient validity indication information 142 is not zero) the quantization parameter 21 are decoded. Again, if the transform coefficient validity indication information 142 is not zero, the C0 to C2 component transform coefficient data (140a to 140c) are decoded and output as the quantized transform coefficients 10; if it is zero, they are all regarded as zero, and all the quantized transform coefficients 10 in the macroblock are set to zero and output.
As in embodiment 7, the macroblock is decoded in accordance with a predetermined processing procedure using the outputs obtained from the variable length decoding unit 25 through the above operations.
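The per-macroblock procedure for case 1 can be sketched as follows. The `mb` dictionary and its keys are hypothetical stand-ins for the parsed syntax elements; only the branching logic reflects the description above.

```python
def decode_macroblock_case1(mb):
    """Case 1: only the C0 component header is multiplexed and applies
    in common to C1 and C2. mb is a dict standing in for the parsed
    syntax elements of one macroblock."""
    comps = ("C0", "C1", "C2")
    if mb["c0_skip"]:
        # Header regarded as not coded: every quantized transform
        # coefficient is zero, and all three components share the same
        # skip-mode motion vector.
        return {"coeffs": {c: [0] for c in comps},
                "mv": {c: mb["skip_mv"] for c in comps}}
    hdr = mb["c0_header"]                 # applied in common to C1 and C2
    if hdr["coeff_valid"]:
        coeffs = dict(mb["coeff_data"])   # C0-C2 transform coefficient data
    else:
        coeffs = {c: [0] for c in comps}
    return {"coeffs": coeffs, "mv": {c: hdr["mv"] for c in comps}}
```

The point of the sketch is the sharing: one skip indication and one header steer all three color components.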
2. Case where corresponding header information is multiplexed for each of the C0, C1, and C2 components
When the macroblock header sharing identification flag 123c indicates that the C1 component header information 139b and the C2 component header information 139c are individually multiplexed as extension header information separate from the C0 component header information 139a, the image of each component is decoded based on the macroblock header information included in the corresponding header information (139a to 139c) for each of the C0, C1, and C2 components. In this case, the skip instruction information (138b, 138c) and header information (139b, 139c) for the C1 and C2 components are multiplexed in the bit stream.
The variable length decoding unit 25 first decodes and evaluates the C0 component skip instruction information 138a. When the C0 component skip instruction information 138a indicates "skip", the C0 component header information 139a is regarded as not being encoded, and the transform coefficient validity indication information 142 within the C0 component header information 139a is regarded as zero (no transform coefficients encoded at all). The C0 component transform coefficient data 140a is accordingly regarded as not encoded, and all the quantized transform coefficients of the C0 component are set to zero (that is, the relationship between the C0 component skip instruction information 138a and the transform coefficient validity indication information 142 changes depending on the value of the macroblock header sharing identification flag 123c). Further, the motion vector 137 of the C0 component is set and output as defined for the case where the C0 component is skipped.
If the C0 component skip instruction information 138a does not indicate "skip", the C0 component header information 139a is present and is decoded. Within the C0 component header information 139a, if the macroblock type 128b indicates intra coding, the intra prediction mode 141 (a mode of spatial pixel prediction that uses pixels in the vicinity of the pixel to be predicted within the frame as the prediction values), the transform coefficient validity indication information 142, and (if the transform coefficient validity indication information 142 is not zero) the quantization parameter 21 are decoded. If the transform coefficient validity indication information 142 is not zero, the C0 component transform coefficient data is decoded and output as the quantized transform coefficients 10; if it is zero, all the C0 component transform coefficient data are regarded as zero. If the macroblock type indicates inter coding, the sub-macroblock type is decoded as necessary, and the reference image identification number, the motion vector information, the transform coefficient validity indication information, and (if the transform coefficient validity indication information is not zero) the quantization parameter are decoded. Again, if the transform coefficient validity indication information is not zero, the C0 component transform coefficient data is decoded and output as the quantized transform coefficients 10; if it is zero, all the C0 component transform coefficient data are regarded as zero. The same processing steps are performed for C1 and C2.
As in embodiment 7, each of the C0, C1, and C2 components in the macroblock is decoded in accordance with a predetermined processing procedure using the outputs obtained from the variable length decoding unit 25 through the above operations.
The above description has centered on the operation of the decoding apparatus; configuring the bit stream in this way yields the following effects. In conventional AVC, only one set of header information (fig. 50) is available per macroblock, so all the C0 to C2 components must be grouped together for the intra/inter decision and encoded according to that common header information. When a signal component equivalent to the luminance signal, which conveys the content of the image signal, is contained equally in the three color components, as in the 4:4:4 format, variations in signal characteristics can arise between the components due to noise superimposed on the input video signal and the like, and encoding all the C0 to C2 components together is not always optimal. With the bit stream structures of figs. 49 and 50 of embodiment 11, the encoding device can use the macroblock header sharing identification flag 123c to select, for each of the C0 to C2 components, the optimal encoding mode (the macroblock type, covering intra/inter encoding types) and motion vector matched to its signal characteristics, improving encoding efficiency. Furthermore, since encoding was conventionally performed in units of macroblocks grouping all the C0 to C2 components together, a skip was decided only on the condition that no encoded information existed for any component. In embodiment 11, the presence or absence of encoded information can be determined by the skip instruction information 138 for each component, so that when only one component is skipped and the others are not, it is not necessary to treat all components as not skipped, and the code amount can be allocated more efficiently.
In the encoding apparatus, the variable length coding unit 11 determines the value of the skip instruction information 138 based on the quantized transform coefficient data 10, the motion vector 137, the reference image identification number 132b, and the macroblock type/sub-macroblock type 106, in accordance with the definition of the skipped macroblock established in common between the encoding apparatus and the decoding apparatus as described in section 2 of embodiment 11 above.
The bit stream handled by the encoding device and the decoding device of embodiment 11 may also be structured as shown in fig. 51. In this example, the skip instruction information (138), header information (139a to 139c), and transform coefficient data (140a to 140c) of the C0, C1, and C2 components are each arranged together. The skip instruction information 138 may then encode the states of C0, C1, and C2 either as one-bit code symbols each, or as a single code symbol covering the eight combined states. When the skip states of the color components are highly correlated, grouping the code symbols and appropriately defining the context model for arithmetic coding (described in embodiment 12 below) can improve the encoding efficiency of the skip instruction information 138 itself.
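The grouping of the three per-component skip indications into one eight-state code symbol can be sketched as a simple bit packing; the bit assignment below is an assumption for illustration, not a normative mapping.

```python
def pack_skip_states(c0, c1, c2):
    """Map the three per-component skip indications (booleans) onto one
    of eight code symbols, 0..7."""
    return (int(c0) << 2) | (int(c1) << 1) | int(c2)


def unpack_skip_states(symbol):
    """Recover the (C0, C1, C2) skip indications from the code symbol."""
    return bool(symbol & 4), bool(symbol & 2), bool(symbol & 1)
```

When the components skip together often, an adaptive entropy coder sees a heavily skewed distribution over the eight symbols (mass concentrated on 0 and 7), which is exactly the situation where a single grouped symbol codes more cheaply than three independent flags.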
The macroblock header sharing identification flag 123c may be multiplexed into the bit stream in units of any data layer, such as the macroblock, slice, picture, or sequence. When there is a consistent difference in signal properties between the color components of the input signal, multiplexing the macroblock header sharing identification flag 123c in units of a sequence allows efficient encoding with little overhead information. Multiplexing it in units of pictures allows the header to be shared in I pictures, where the macroblock type varies little, and used individually per color component in P and B pictures, where the macroblock type varies greatly, which can be expected to improve the balance between encoding efficiency and computational load. Switching at the picture layer is also desirable from the viewpoint of encoding control for a video signal whose properties change from picture to picture, as at a scene change. Multiplexing the macroblock header sharing identification flag 123c in units of macroblocks increases the code amount per macroblock, but makes it possible to control, for each macroblock, whether the header information is shared based on the signal state of each color component, allowing an encoding apparatus that better follows local signal fluctuations in the image and improves compression efficiency.
When the coding type is switched according to the picture type at the slice level, as in AVC, the following method is conceivable: the macroblock header sharing identification flag 123c is multiplexed for each slice, and when the flag indicates "common to C0, C1, and C2", the bit stream is structured so that the slice contains the encoded information of all three color components, whereas when the flag indicates that individual headers are used for C0, C1, and C2, the bit stream is structured so that one slice contains the information of one color component. Fig. 52 shows this arrangement. In fig. 52, the macroblock header sharing identification flag 123c carries the meaning, as slice configuration identification information, of "the current slice contains the encoded information of all three color components" or "the current slice contains the encoded information of a specific color component". Of course, such slice configuration identification information may also be prepared independently of the macroblock header sharing identification flag 123c. When a slice is identified as "containing the encoded information of a specific color component", this includes the identification of which of C0, C1, and C2 it is. When switching in this way, per slice, between using one macroblock header in common among the C0, C1, and C2 components (a mixed C0/C1/C2 slice) and multiplexing a macroblock header individually for each of the C0, C1, and C2 components (a C0 slice, C1 slice, or C2 slice), and the two kinds of slice are mixed within one picture, the following restriction is imposed: the C0 slice, C1 slice, and C2 slice are always multiplexed into the bitstream as a set that encodes the macroblocks at the same positions in the picture.
That is, the value of first_mb_in_slice included in the slice header, which indicates the position within the picture of the leading macroblock of the slice, always takes the same value across a set of C0, C1, and C2 slices, and the C0, C1, and C2 slices of a set contain the same number of macroblocks. Fig. 53 shows this arrangement. By imposing these restrictions on the structure of the bit stream, the encoding device can adaptively select and encode whichever of a mixed C0/C1/C2 slice or a set of C0, C1, and C2 slices gives the higher encoding efficiency according to the properties of the local signal within the picture, and the decoding device can receive the bit stream thus efficiently encoded and reproduce the video signal. For example, when the bit stream 22 input to the decoding apparatus of fig. 31 has this structure, the variable length decoding unit 25 decodes the slice configuration identification information from the bit stream each time slice data is input, and identifies which of the slice types in fig. 52 the slice to be decoded is. When it determines from the slice configuration identification information that the encoded data is structured as a set of C0, C1, and C2 slices, it can set the state of the inter prediction mode sharing identification flag 123 (or the macroblock header sharing identification flag 123c) to "use individual inter prediction modes (or macroblock headers) for C0, C1, and C2" and carry out the decoding operation. Since it is guaranteed that the value of first_mb_in_slice is equal across the slices of each set and that the number of macroblocks in each is the same, decoding can proceed without the C0, C1, and C2 slices overlapping or leaving gaps in the picture.
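The restriction on a set of C0, C1, and C2 slices can be expressed as a simple check, which is how a decoder might validate a conforming bit stream; the tuple layout here is a hypothetical stand-in for the parsed slice headers.

```python
def valid_slice_group(slices):
    """Check the constraint on a set of three color-component slices.
    slices: iterable of (first_mb_in_slice, num_macroblocks, component)
    tuples, one per slice."""
    slices = list(slices)
    firsts = {s[0] for s in slices}
    counts = {s[1] for s in slices}
    comps = {s[2] for s in slices}
    # same starting macroblock address, same macroblock count,
    # exactly one slice for each of C0, C1, C2
    return len(firsts) == 1 and len(counts) == 1 and comps == {"C0", "C1", "C2"}
```

A set that passes this check covers exactly the same macroblock positions for all three components, which is what rules out overlaps and gaps in the decoded picture.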
In addition, when the signal properties of the C0, C1, and C2 slices differ greatly, identification information may be provided that selects, at the picture level or the sequence level, whether slices with different values of the slice configuration identification information are permitted to be mixed within a picture, in order to avoid the decrease in coding efficiency that such a restriction would otherwise cause.
Embodiment 12
In embodiment 12, an example of another encoding device and decoding device derived from those described in embodiment 11 will be described. The encoding and decoding apparatus of embodiment 12 is characterized in that, when the C0, C1, and C2 components in a macroblock are encoded by an adaptive arithmetic coding scheme, whether the symbol occurrence probability used for arithmetic coding and its learning process are shared among all components or kept separate for each component is adaptively switched using indication information multiplexed in the bit stream.
In embodiment 12, the encoding apparatus differs from that of embodiment 11 only in the variable length encoding unit 11 of fig. 30, and the decoding apparatus differs only in the variable length decoding unit 25 of fig. 31; all other operations follow embodiment 11. The arithmetic encoding and decoding processing that characterizes embodiment 12 is described in detail below.
1. Encoding process
Fig. 54 shows an internal configuration related to arithmetic coding processing in the variable length coding unit 11, and fig. 55 and 56 show an operation flow thereof.
The variable length coding unit 11 of embodiment 12 includes: a context model determination unit 11a that determines the context model (described later) defined for each data type of the data to be encoded, such as the motion vector 137, the reference image identification number 132b, the macroblock type/sub-macroblock type 106, the intra prediction mode 141, and the quantized transform coefficients 10; a binarization unit 11b that converts multi-valued data into binary data according to the binarization rule determined for each type of data to be encoded; an occurrence probability generation unit 11c that provides the occurrence probability of each binarized bin value (0 or 1); an encoding unit 11d that performs arithmetic encoding based on the generated occurrence probabilities; and a memory 11g that stores occurrence probability information. The inputs to the context model determination unit 11a are the various data input to the variable length encoding unit 11 as encoding target data, such as the motion vector 137, the reference image identification number 132b, the macroblock type/sub-macroblock type 106, the intra prediction mode 141, and the quantized transform coefficients 10; the output of the encoding unit 11d corresponds to the macroblock-related information of the video stream 22.
(1) Context model determination processing (step S160 in FIG. 55)
The context model is obtained by modeling the dependency relationship with other information that causes variation in the occurrence probability of the information source symbol. By switching the state of the occurrence probability according to this dependency relationship, encoding can be better adapted to the actual occurrence probability of the symbol. Fig. 57 shows the concept of the context model (ctx). In fig. 57 the information source symbol is binary, but it may also be multilevel. The branches of ctx such as 0 to 2 in fig. 57 are defined on the assumption that the state of the occurrence probability of the information source symbol using that ctx changes depending on the situation. In the video encoding according to embodiment 12, the value of ctx is switched according to the dependency relationship between encoded data in a certain macroblock and encoded data in a neighboring macroblock. For example, fig. 58 shows an example of a context model related to the motion vector of a macroblock, disclosed in D. Marpe et al., "Video Compression Using Context-Based Adaptive Arithmetic Coding", International Conference on Image Processing 2001. In fig. 58, the motion vector of block C is the encoding target (more precisely, the prediction difference value mvd_k(C), obtained by predicting the motion vector of block C from its neighborhood, is encoded), and ctx_mvd(C, k) represents the context model. mvd_k(A) represents the motion vector prediction difference value in block A and mvd_k(B) that in block B; both are used in the definition of the switching evaluation value e_k(C) for the context model. The evaluation value e_k(C) indicates the deviation of the motion vectors in the neighborhood; generally, when this deviation is small, mvd_k(C) tends to be small, and conversely, when e_k(C) is large, mvd_k(C) also tends to be large. Therefore, the symbol occurrence probability of mvd_k(C) is preferably adapted according to e_k(C).
Such a set of occurrence probability variations constitutes a context model; in this case, the context model has three occurrence probability variations.
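As a concrete illustration, the three-way selection described above can be sketched in Python (a minimal sketch; the function name is assumed, and the thresholds 3 and 32 follow the context derivation used for motion vector differences in AVC rather than anything stated in this document):

```python
def select_mvd_context(mvd_a: int, mvd_b: int,
                       low: int = 3, high: int = 32) -> int:
    """Return the context index (0..2) for coding mvd_k(C).

    e_k(C) is the sum of the absolute prediction differences of the
    neighbouring blocks A and B: a small deviation selects context 0,
    a large one context 2. The thresholds are illustrative defaults.
    """
    e = abs(mvd_a) + abs(mvd_b)
    if e < low:
        return 0  # neighbours agree: mvd_k(C) is likely small
    if e > high:
        return 2  # large local deviation: mvd_k(C) is likely large
    return 1      # intermediate case
```

Each returned index addresses one of the three occurrence probability variations of the context model.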
The context model is defined in advance for each type of encoding target data, such as the macroblock type/sub-macroblock type 106, the intra prediction mode 141, and the quantized transform coefficient 10, and is shared by the encoding apparatus and the decoding apparatus. The context model determination unit 11a selects the predefined model according to the type of the encoding target data (the choice of which occurrence probability variation within the context model to use corresponds to the occurrence probability generation process (3) below).
(2) Binarization processing (step S161 in FIG. 55)
The encoding target data is binarized by the binarization unit 11b, and the context model is applied to each bin (binary position) of the resulting binary sequence. The binarization rule converts each item of encoded data into a variable-length binary sequence according to the approximate distribution of the values it can take. Encoding data that is originally multi-valued in bin units, rather than arithmetic-coding it directly, has the advantages that the number of divisions of the probability number line can be reduced, computation is simplified, and the context model can be kept lean.
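For illustration, one of the simplest variable-length binarization rules of this kind is the unary rule sketched below (an assumed example; the actual rule is defined per type of encoded data):

```python
def unary_binarize(value: int):
    """Unary binarization: the value N becomes N '1' bins followed by a
    terminating '0' bin. Small values, which are usually the most
    frequent, thus get the shortest binary sequences."""
    if value < 0:
        raise ValueError("unary binarization expects a non-negative value")
    return [1] * value + [0]
```

For example, the value 3 is binarized to the bin sequence 1, 1, 1, 0, and each bin position can then be assigned its own context.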
(3) Occurrence probability generation processing (step S162 in FIG. 55 (step S162 is shown in detail in FIG. 56))
In processes (1) and (2) above, the binarization of the multi-valued encoding target data and the setting of the context model applied to each bin are completed, and the preparation for encoding is done. Next, the occurrence probability generation unit 11c generates the occurrence probability state used for arithmetic coding. Since each context model includes variations of the occurrence probability for each of the values 0/1, the processing is performed with reference to the context model 11f determined in step S160, as shown in fig. 54. An evaluation value for occurrence probability selection, such as e_k(C) in fig. 58, is determined, and according to this value it is decided which occurrence probability variation among the selection branches of the referenced context model is used for the current coding (step S162a in fig. 56). Furthermore, the variable length coding unit 11 in embodiment 12 includes an occurrence probability information storage memory 11g and stores, per color component, the occurrence probability states 11h that are sequentially updated during encoding. The occurrence probability generation unit 11c selects the occurrence probability state 11h used for the current coding according to the value of the occurrence probability state parameter sharing identification flag 143: either from the states held separately for the color components C0 to C2, or by sharing the state of the C0 component for C1 and C2; the occurrence probability state 11h actually used for coding is thus determined (steps S162b to S162d in fig. 56).
The occurrence probability state parameter sharing identification flag 143 must allow the same selection to be made in the decoding apparatus, and therefore needs to be multiplexed into the bit stream. Such a configuration has the following effects. Taking fig. 58 as an example, when the macroblock header sharing identification flag 123c indicates that the C0 component header information 139a is also used for the other components, and the macroblock type 128b indicates the 16 × 16 prediction mode, only one e_k(C) is determined per macroblock. In this case, the occurrence probability state prepared for the C0 component is always used. On the other hand, when the macroblock header sharing identification flag 123c indicates that the header information (139a to 139c) corresponding to each component is used, and the macroblock type 128b indicates the 16 × 16 prediction mode in each of C0, C1, and C2, e_k(C) can have three variations per macroblock. In the encoding unit 11d at the subsequent stage, two options are available for each variation: the occurrence probability state 11h prepared for the C0 component is used and updated in common, or the occurrence probability states 11h prepared for the respective color components are used and updated individually. In the former case, when the components C0, C1, and C2 have approximately the same motion vector distribution, common use and updating of the occurrence probability state 11h increases the number of learning opportunities, so that the occurrence probability of the motion vector can be learned better. In the latter case, conversely, when the components C0, C1, and C2 have mutually different motion vector distributions, individual use and updating of the occurrence probability states 11h reduces the mismatch caused by learning, so that the occurrence probability of the motion vector can again be learned better.
Since the video signal is not constant, the efficiency of arithmetic coding can be improved by performing such adaptive control.
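The per-component versus shared selection of the occurrence probability state 11h (steps S162b to S162d) amounts to the following sketch (names are assumptions, not from this document):

```python
def select_probability_state(states_per_component, component_index, share_flag):
    """Pick the occurrence probability state 11h for the current coding.

    states_per_component: one state object per color component C0..C2.
    share_flag: models the occurrence probability state parameter
        sharing identification flag 143. When set, the C0 state is
        used (and later updated) for C1 and C2 as well; otherwise
        each component uses and updates its own state.
    """
    if share_flag:
        return states_per_component[0]
    return states_per_component[component_index]
```

Because the same flag is decoded from the bit stream, the decoder makes the identical choice and both sides keep the same learned state.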
(4) Encoding process
Since the occurrence probability of each of the values 0/1 on the probability number line required for the arithmetic coding process is obtained in (3), arithmetic coding is performed in the coding unit 11d according to the process described in the conventional example (step S163 in fig. 55). The actual coded value (0 or 1) 11e is fed back to the occurrence probability generation unit 11c, where the 0/1 occurrence frequency is counted to update the occurrence probability state 11h that was used (step S164). For example, suppose that at the point where 100 bins have been encoded using a specific occurrence probability state 11h, the occurrence probabilities of 0/1 in that occurrence probability variation are 0.25 and 0.75. When a 1 is then encoded using the same occurrence probability variation, the occurrence frequency of 1 is updated, and the occurrence probabilities of 0/1 change to 0.247 and 0.752. This mechanism enables efficient encoding adapted to the actual occurrence probability. The coded value 11e becomes the output of the variable length coding unit 11 and is output from the coding apparatus as the video stream 22.
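The feedback update of step S164 can be reproduced numerically with a simplified frequency-count model (real arithmetic coders typically use a table-driven probability state machine; the class below only illustrates the worked example in the text, and its name is assumed):

```python
class OccurrenceProbabilityState:
    """Simplified occurrence probability state: plain relative frequencies."""

    def __init__(self, count0: int, count1: int):
        self.counts = [count0, count1]

    def prob(self, symbol: int) -> float:
        """Current occurrence probability of the given symbol (0 or 1)."""
        return self.counts[symbol] / sum(self.counts)

    def update(self, coded_symbol: int) -> None:
        """Feed back the actually coded value (step S164)."""
        self.counts[coded_symbol] += 1


# The worked example from the text: after 100 bins with P(0) = 0.25 and
# P(1) = 0.75, coding one more '1' shifts the probabilities to
# 25/101 and 76/101, i.e. roughly 0.247 and 0.752.
state = OccurrenceProbabilityState(25, 75)
state.update(1)
p0, p1 = state.prob(0), state.prob(1)
```
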
2. Decoding process
Fig. 59 shows the internal configuration related to the arithmetic decoding process in the variable length decoding unit 25, and fig. 60 shows its operation flow.
The variable length decoding unit 25 according to embodiment 12 includes: a context model determination unit 11a that identifies the type of each decoding target data, such as the motion vector 137, the reference picture identification number 132b, the macroblock type/sub-macroblock type 106, the intra prediction mode 141, and the quantized transform coefficient 10, and determines the context model defined in common with the encoding apparatus; a binarization unit 11b that generates the binarization rule determined according to the type of the decoding target data; an occurrence probability generation unit 11c that provides the occurrence probability of each bin (0 or 1) according to the binarization rule and the context model; a decoding unit 25a that performs arithmetic decoding based on the generated occurrence probability and decodes data such as the motion vector 137, the reference picture identification number 132b, the macroblock type/sub-macroblock type 106, the intra prediction mode 141, and the quantized transform coefficient 10 from the resulting binary sequence and the binarization rule; and a memory 11g that stores the occurrence probability information. Components 11a to 11c and 11g are identical to the internal components of the variable length coding unit 11 shown in fig. 54.
(5) Context model determination processing, binarization processing, and occurrence probability generation processing
These processes follow the processes (1) to (3) on the encoding device side. Although not shown, the occurrence probability state parameter-sharing identification flag 143 is extracted from the video stream 22 in advance.
(6) Arithmetic decoding process
Since the occurrence probability of the bin about to be decoded is determined by the processes up to (5), the decoding unit 25a restores the value of the bin according to the predetermined arithmetic decoding process (step S166 in fig. 60). The restored value 25b of the bin is fed back to the occurrence probability generation unit 11c, where the 0/1 occurrence frequency is counted to update the occurrence probability state 11h that was used (step S164). Each time the restored value of a bin is determined, the decoding unit 25a checks it against the binary sequence patterns defined by the binarization rule, and outputs the data value indicated by a matching pattern as a decoded data value (step S167). As long as no decoded data value is determined, the process returns to step S166 and decoding continues.
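Assuming the unary binarization rule for illustration, the restore-and-match loop of steps S166 and S167 reduces to the following sketch (`next_bin` stands in for one invocation of the arithmetic decoder; all names are assumptions):

```python
def decode_unary(next_bin):
    """Restore bins one at a time via next_bin() (step S166) until the
    terminating '0' bin completes a unary pattern (step S167)."""
    value = 0
    while next_bin() == 1:
        value += 1
    return value
```

For example, feeding the bin sequence 1, 1, 1, 0 restores the decoded value 3, the inverse of the unary binarization on the encoding side.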
According to the encoding/decoding device having the above configuration and including the arithmetic encoding and arithmetic decoding processes, when the coded information of each color component is adaptively encoded according to the macroblock header sharing identification flag 123c, more efficient encoding can be performed.
Although not particularly shown, the unit of multiplexing the occurrence probability state parameter commonization identification flag 143 may be any of macroblock unit, slice unit, picture unit, and sequence unit. By multiplexing the flag in the upper data layer such as slice, picture, and sequence, when sufficient coding efficiency can be ensured by switching in the upper layer above the slice, the occurrence probability state parameter-sharing identification flag 143 does not need to be multiplexed one by one at the macroblock level, and additional bits can be reduced.
The occurrence probability state parameter sharing identification flag 143 may also be information determined inside the decoding apparatus on the basis of related information contained in the bit stream, rather than being multiplexed as the flag itself.
In embodiment 12, when the macroblock header sharing identification flag 123c is arithmetically encoded in units of macroblocks, the model shown in fig. 61 is used as the context model 11f. In fig. 61, the value of the macroblock header sharing identification flag 123c in a macroblock X is denoted IDC_X. When the macroblock header sharing identification flag 123c of macroblock C is encoded, the value IDC_A of the flag 123c in macroblock A and the value IDC_B of the flag 123c in macroblock B are referenced, and according to the equation in the figure the following three states are obtained.
Value 0: both A and B are in the mode "a common macroblock header is used for C0, C1, and C2".
Value 1: one of A and B is in the mode "a common macroblock header is used for C0, C1, and C2", and the other is in the mode "individual macroblock headers are used for C0, C1, and C2".
Value 2: both A and B are in the mode "individual macroblock headers are used for C0, C1, and C2".
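Since the three states are exactly the count of neighbours in the "individual headers" mode, the context derivation of fig. 61 can presumably be sketched as the sum of the two neighbouring flag values (function and parameter names are assumptions):

```python
def mb_header_flag_context(idc_a: int, idc_b: int) -> int:
    """Context index for coding flag 123c of macroblock C (cf. fig. 61).

    idc_a, idc_b: the already coded flag values of neighbours A and B
    (0 = common macroblock header for C0-C2, 1 = individual headers).
    Returns 0, 1, or 2, matching the three states listed above.
    """
    return idc_a + idc_b
```
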
By encoding the macroblock header sharing identification flag 123c in this manner, arithmetic encoding can be adapted to the encoding state of the neighboring macroblocks, improving coding efficiency. The corresponding operation on the decoding side follows unambiguously, since both the encoding side and the decoding side define the context model and perform arithmetic coding/decoding by the same procedure.
In embodiment 12, the header information of fig. 50 contained in the macroblock header (macroblock type, sub-macroblock type, intra prediction mode, reference image identification number, motion vector, transform coefficient validity indication information, and quantization parameter) is arithmetically encoded using a context model defined for each information type; as shown in fig. 62, each of these context models is defined for the current macroblock C by referring to the corresponding information of macroblocks A and B. Here, as shown in fig. 62(a), when macroblock C is in the mode "a common macroblock header is used for C0, C1, and C2" and macroblock B is in the mode "individual macroblock headers are used for C0, C1, and C2", the information of one specific color component among C0, C1, and C2 is used as reference information in the context model definition.
For example, when C0, C1, and C2 correspond to the R, G, and B color components, one conceivable method is to select the G component, which is closest in character to the luminance signal traditionally used in encoding as a signal that well expresses the structure of an image. This is because, even in the mode "a common macroblock header is used for C0, C1, and C2", the information of the macroblock header is often determined with reference to the G component when encoding.
In the opposite case, as shown in fig. 62(b), when macroblock C is in the mode "individual macroblock headers are used for C0, C1, and C2" and macroblock B is in the mode "a common macroblock header is used for C0, C1, and C2", the header information of the three color components must be encoded and decoded in macroblock C; in this case, as reference information in the context model definition of the header information of each color component, the header information common to the three components in macroblock B is used as the same value for all three components. Needless to say, when the macroblock header sharing identification flags 123c indicate the same value in all of macroblocks A, B, and C, the corresponding reference information always exists and is used directly.
The corresponding operation of the decoding apparatus in embodiment 12 follows unambiguously, since both the encoding side and the decoding side define the context model and perform arithmetic coding/decoding by the same procedure. After determining which component's information the context model refers to, the occurrence probability state associated with that context model is updated according to the state of the occurrence probability state parameter sharing identification flag 143.
In embodiment 12, arithmetic coding is also performed on each of the transform coefficient data of the C0, C1, and C2 components in accordance with the occurrence probability distribution of each piece of encoding target data. For these data, the encoded data of three components is always contained in the bit stream regardless of whether or not the macroblock header is shared. In embodiment 12, since the prediction difference signal is obtained by performing intra prediction and inter prediction in the color space of the encoded input signal, the distribution of the transform coefficient data obtained by integer-transforming the prediction difference signal has the same occurrence probability distribution regardless of the surrounding state such as whether or not the macroblock header is shared as shown in fig. 62. Therefore, in embodiment 12, a common context model is defined for each of the components C0, C1, and C2 regardless of whether or not the macroblock header is shared, and used for encoding and decoding.
The corresponding operation of the decoding apparatus in embodiment 12 follows unambiguously, since both the encoding side and the decoding side define the context model and perform arithmetic coding/decoding by the same procedure. After determining which component's information the context model refers to, the occurrence probability state associated with that context model is updated according to the state of the occurrence probability state parameter sharing identification flag 143.
Embodiment 13
In embodiment 13, another encoding device and decoding device derived from the encoding devices and decoding devices described in embodiments 7 to 12 will be described. The encoding device and decoding device according to embodiment 13 are characterized as follows: the encoding device performs a color space conversion process at the input stage of the encoding device described in embodiments 7 to 12, converting the color space of the captured video signal input to the encoding device into an arbitrary color space suitable for encoding, and multiplexes into the bit stream information specifying the inverse conversion process for returning, on the decoding side, to the color space at the time of image capture; the decoding device extracts the information specifying the inverse conversion process from the bit stream, obtains a decoded image using the decoding device described in embodiments 7 to 12, and performs the inverse color space conversion based on that information.
Fig. 63 shows the configuration of an encoding device and a decoding device in embodiment 13. The encoding device and the decoding device according to embodiment 13 will be described with reference to fig. 63.
The encoding device according to embodiment 13 has a color space conversion unit 301 placed before the encoding device 303 of embodiments 7 to 12. The color space conversion unit 301 provides one or more color space conversion processes, selects the process to use according to the nature of the input video signal, system settings, and the like, performs that conversion on the input video signal, and sends the resulting converted video signal 302 to the encoding device 303. At the same time, information identifying the color space conversion process used is output to the encoding device 303 as color space conversion method identification information 304. The encoding device 303 takes the converted video signal 302 as the signal to be encoded, compresses and encodes it by the methods described in embodiments 7 to 12, multiplexes the color space conversion method identification information 304 into the resulting bit stream 305, and outputs the bit stream to a transmission path or to a recording device that records it on a recording medium.
Here, the prepared color space conversion processes include, for example, the following:

conversion from RGB to YUV as used in conventional standards,
C0 = Y = 0.299 × R + 0.587 × G + 0.114 × B
C1 = U = -0.169 × R - 0.3316 × G + 0.500 × B
C2 = V = 0.500 × R - 0.4186 × G - 0.0813 × B;

prediction between color components,
C0 = G′ = G
C1 = B′ = B - f(G) (where f(G) is the result of filter processing applied to the G component)
C2 = R′ = R - f(G);

and conversion from RGB to YCoCg,
C0 = Y = R/4 + G/2 + B/4
C1 = Co = R/2 - B/2
C2 = Cg = -R/4 + G/2 - B/4.
The input to the color space conversion unit 301 is not necessarily limited to RGB, and the conversion processing is not limited to the three types described above.
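The candidate conversions listed above can be written out directly as a sketch (the filter f() of the inter-component prediction is not specified in this document and is replaced by the identity purely for illustration; the YCoCg weights follow the standard definition of that transform):

```python
def rgb_to_yuv(r, g, b):
    """RGB -> YUV with the coefficients listed above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.3316 * g + 0.500 * b
    v = 0.500 * r - 0.4186 * g - 0.0813 * b
    return y, u, v


def rgb_inter_component_prediction(r, g, b, f=lambda x: x):
    """G is sent as-is; B and R are predicted from the (filtered) G
    component. f() is an identity placeholder for the unspecified
    filter processing."""
    return g, b - f(g), r - f(g)


def rgb_to_ycocg(r, g, b):
    """RGB -> YCoCg (standard weights: Y = R/4 + G/2 + B/4)."""
    y = r / 4 + g / 2 + b / 4
    co = r / 2 - b / 2
    cg = -r / 4 + g / 2 - b / 4
    return y, co, cg
```

For a neutral grey input (R = G = B), both chroma-like outputs of the YCoCg transform are zero, which is the decorrelation property these conversions exploit.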
The decoding device of embodiment 13 has an inverse color space conversion unit 308 placed after the decoding device 306 of embodiments 7 to 12. The decoding device 306 takes the bit stream 305 as input, extracts the color space conversion method identification information 304 from the bit stream 305 and outputs it, and also outputs the decoded image 307 obtained by the operation of the decoding devices described in embodiments 7 to 12. The inverse color space conversion unit 308 has an inverse conversion process corresponding to each of the color space conversion methods selectable by the color space conversion unit 301; based on the color space conversion method identification information 304 output from the decoding device 306, it determines the conversion performed by the color space conversion unit 301, applies the corresponding inverse conversion process to the decoded image 307, and thereby returns the decoded image to the color space of the video signal input to the encoding device of embodiment 13.
According to the encoding device and decoding device as in embodiment 13, by performing an optimal color space conversion process on the encoded video signal at the preceding stage of encoding and the subsequent stage of decoding, the correlation included in the image signal composed of three color components is removed before encoding, encoding can be performed with redundancy reduced, and compression efficiency can be improved. In the conventional standard encoding system such as MPEG, the color space of the signal to be encoded is limited to YUV, but by providing the color space conversion unit 301 and the inverse color space conversion unit 308 and including the color space conversion method identification information 304 in the bit stream 305, it is possible to eliminate the limitation on the color space of the video signal to be encoded and to perform encoding using an optimal conversion from among various means in which the correlation between color components is removed. The color space conversion method identification information 304 may also be multiplexed at the level of pictures, slices, and macroblocks. For example, by multiplexing in units of macroblocks, it is possible to selectively use a transform that can optimally remove the correlation of locality among three color components, and it is possible to improve the coding efficiency.
Although embodiment 13 has been described on the premise that the color space conversion unit 301 and the inverse color space conversion unit 308 always operate, it is also possible not to operate these processing units and instead to encode, in a higher layer such as the sequence, information indicating that compatibility with a conventional standard is ensured.
The color space conversion unit 301 and the inverse color space conversion unit 308 of embodiment 13 may be incorporated into the internal configurations of the encoding devices and decoding devices of embodiments 7 to 12 to perform color space conversion in the prediction difference signal level. Fig. 64 shows an encoding device configured in this manner, and fig. 65 shows a decoding device. The encoding device of fig. 64 includes a transformation unit 310 instead of the orthogonal transformation unit 8 and an inverse transformation unit 312 instead of the inverse orthogonal transformation unit 13, and the decoding device of fig. 65 includes an inverse transformation unit 312 instead of the inverse orthogonal transformation unit 13.
Like the processing performed by the color space conversion unit 301, the transform unit 310 selects the optimal conversion process from among multiple color space conversion processes for the prediction difference signal 4 of the C0, C1, and C2 components output from the encoding mode determination unit 5, and first performs color space conversion. It then applies a transform equivalent to that of the orthogonal transform unit 8 to the result of the color space conversion. The color space conversion method identification information 311 indicating which conversion was selected is sent to the variable length coding unit 11, multiplexed into the bit stream, and output as the video stream 22. The inverse transform unit 312 first performs the inverse transform corresponding to the inverse orthogonal transform unit 13, and then performs the inverse color space conversion process corresponding to the color space conversion process specified by the color space conversion method identification information 311.
In the decoding apparatus, the variable length decoding unit 25 extracts the color space conversion method identification information 311 from the bit stream and passes it to the inverse transform unit 312, which performs the same processing as the inverse transform unit 312 in the encoding apparatus. With this configuration, when the correlation remaining between color components can be sufficiently removed in the prediction difference domain, the removal can be performed as part of the encoding process, which has the effect of improving coding efficiency. However, when individual macroblock headers are used for the C0, C1, and C2 components, the prediction method can differ for each component, for example with the C0 component using intra prediction and the C1 component using inter prediction, so it may be difficult to maintain correlation in the domain of the prediction difference signal 4. Therefore, when individual macroblock headers are used for the C0, C1, and C2 components, the transform unit 310 and the inverse transform unit 312 may operate so as not to perform color space conversion, or identification information indicating whether color space conversion is performed in the domain of the prediction difference signal 4 may be multiplexed into the bit stream. The color space conversion method identification information 311 may be switched in units of any of a sequence, a picture, a slice, or a macroblock.
In the configurations of the encoding device and the decoding device in fig. 64 and 65, the signal definition domain of the signal to be encoded differs depending on the color-space conversion method identification information 311 for each of the conversion coefficient data of the C0, C1, and C2 components. Therefore, the distribution of the transform coefficient data generally becomes a different occurrence probability distribution according to the color space transform method identification information 311. Therefore, when the encoding device and the decoding device are configured as shown in fig. 64 and 65, the encoding and decoding are performed using a context model in which an individual occurrence probability state is associated with each of the states of the color space conversion method identification information 311 for each of the components C0, C1, and C2.
The corresponding operation of the decoding apparatus follows unambiguously, since both the encoding side and the decoding side define the context model and perform arithmetic coding/decoding by the same procedure. After determining which component's information the context model refers to, the occurrence probability state associated with that context model is updated according to the state of the occurrence probability state parameter sharing identification flag 143.
Embodiment 14
In embodiment 14, a more specific device configuration will be described with respect to the encoding device and the decoding device described in the above embodiments.
In the above-described embodiments, the operation of the encoding device and the decoding device was explained using the drawings shown in, for example, fig. 1, fig. 2, fig. 30, and fig. 31. In these figures, the following operations were illustrated: an input video signal composed of three color components is input to the encoding device as a whole, and the three color components are encoded either according to a common prediction mode or macroblock header, or according to individual prediction modes or macroblock headers; the resulting bit stream is input to the decoding device, which, based on flags extracted by decoding from the bit stream (for example, the intra prediction mode sharing identification flag 23 and the inter prediction mode sharing identification flag 123), determines whether the three color components were encoded according to a common prediction mode or macroblock header or according to individual ones, performs the decoding process accordingly, and obtains a reproduced video. It has already been stated that these flags may be encoded and decoded in units of an arbitrary data layer such as macroblock, slice, picture, or sequence. In embodiment 14, a specific device configuration and operation are described with reference to the drawings, in which whether the three color component signals are encoded with a common macroblock header or with individual macroblock headers is switched and encoded/decoded in units of one frame (or one field). Hereinafter, unless otherwise noted, "one frame" refers to a data unit of one frame or one field.
The macroblock header in embodiment 14 includes a transform block size identification flag as shown in fig. 15, coding/prediction mode information such as macroblock type, sub-macroblock type, and intra prediction mode as shown in fig. 50, motion prediction information such as a reference picture identification number and a motion vector, transform coefficient validity/invalidity instruction information, and macroblock overhead information other than transform coefficient data such as a quantization parameter for a transform coefficient.
Hereinafter, the process of encoding three color component signals of one frame by a common macroblock header is referred to as "common encoding process", and the process of encoding three color component signals of one frame by individual independent macroblock headers is referred to as "independent encoding process". Similarly, a process of decoding frame image data from a bitstream obtained by encoding three color component signals of one frame with a common macroblock header is referred to as "common decoding process", and a process of decoding frame image data from a bitstream obtained by encoding three color component signals of one frame with individual independent macroblock headers is referred to as "independent decoding process". In the common encoding process according to embodiment 14, as shown in fig. 66, an input video signal of one frame is divided into macroblocks in a format in which three color components are combined. On the other hand, in the independent encoding process, as shown in fig. 67, the input video signal of one frame is separated into three color components, and these are divided into macroblocks composed of a single color component. That is, the macroblock to be subjected to the common encoding process includes samples of three color components, C0, C1, and C2, but the macroblock to be subjected to the independent encoding process includes only a sample of any one of the C0, C1, and C2 components.
Fig. 68 is an explanatory diagram showing the temporal motion prediction reference relationship between pictures in the encoding device and the decoding device of embodiment 14. In this example, the data unit indicated by a thick vertical line is a picture, and the relationship between pictures and access units is indicated by the surrounding dotted lines. In the case of the common encoding/decoding process, one picture is data representing a video signal of one frame in which the three color components are mixed; in the case of the independent encoding/decoding process, one picture is a video signal of one frame of any one color component. The access unit is the minimum data unit to which a time stamp is attached for the purpose of synchronizing audio information and the like with the video signal. In the common encoding/decoding process, the data of one picture is contained in one access unit (427a of fig. 68), whereas in the independent encoding/decoding process three pictures are contained in one access unit (427b of fig. 68). This is because, in the independent encoding/decoding process, a reproduced video signal of one frame is obtained only when the pictures of all three color components at the same display time are assembled. The number attached above each picture indicates the temporal encoding/decoding processing order of the picture (frame_num of AVC). In fig. 68, the arrows between pictures indicate the reference directions of motion prediction. That is, in the independent encoding/decoding process, motion prediction reference between pictures contained in the same access unit and motion prediction reference between different color components are not performed; the pictures of the respective color components C0, C1, and C2 are encoded/decoded while restricting their prediction references to signals of the same color component.
With such a configuration, in the case of the independent encoding/decoding process in embodiment 14, encoding/decoding of each color component can be executed without completely depending on encoding/decoding processes of other color components, and parallel processing is facilitated.
In AVC, an IDR (Instantaneous Decoder Refresh) picture is defined, which is intra-coded and resets the contents of the reference picture memory used for motion compensation prediction. Since an IDR picture can be decoded without depending on any other picture, it is used as a random access point. In the common encoding process one access unit is one picture, but in the independent encoding process one access unit is composed of a plurality of pictures; therefore, when a certain color component picture is an IDR picture, the remaining color component pictures are also made IDR pictures, an IDR access unit is defined, and the random access function is thereby secured.
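The IDR access-unit rule above can be sketched as a simple check. The picture representation below (dicts with an `idr` flag) is a hypothetical stand-in for illustration, not the patent's data structure: in the independent encoding process, an access unit can serve as a random access point only if every color-component picture in it is an IDR picture.

```python
# Hypothetical sketch of the IDR access-unit rule: in the independent
# encoding process, an access unit is a usable random access point only
# when all of its color-component pictures are IDR pictures.

def is_idr_access_unit(access_unit):
    """access_unit: list of per-component picture dicts with an 'idr' flag."""
    return all(pic["idr"] for pic in access_unit)

au_good = [{"component": c, "idr": True} for c in ("C0", "C1", "C2")]
au_bad = [{"component": "C0", "idr": True},
          {"component": "C1", "idr": False},   # one non-IDR component breaks the rule
          {"component": "C2", "idr": True}]
```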
Hereinafter, in embodiment 14, identification information indicating whether encoding is performed by the common encoding process or encoding is performed by the independent encoding process is referred to as a common encoding/independent encoding identification signal.
Fig. 69 is an explanatory diagram showing an example of the structure of the bit stream generated by the encoding device of embodiment 14 and input to and decoded by the decoding device of embodiment 14. Fig. 69 shows the bit stream structure from the sequence level to the frame level. First, the common encoding/independent encoding identification signal 423 is multiplexed into a header at the sequence level (in the case of AVC, the sequence parameter set or the like). Each frame is encoded in units of access units. AUD denotes the Access Unit Delimiter NAL unit, which in AVC is the unique NAL unit for identifying the boundary of an access unit. When the common encoding/independent encoding identification signal 423 indicates "picture encoding by the common encoding process", the access unit contains the encoded data of one picture. The picture at this time is, as described above, data representing a video signal of one frame in which the three color components are mixed. At this time, the encoded data of the i-th access unit is configured as a set of slice data Slice(i, j), where j is the index of the slice data within one picture.
On the other hand, when the common encoding/independent encoding identification signal 423 indicates "picture encoding by independent encoding processing", one picture is a video signal of one frame of any one color component. In this case, the encoded data of the p-th access unit is configured as a set of Slice data Slice (p, q, r) of the q-th picture in the access unit. r is an index of slice data within one picture. In the case of a video signal having three color components, such as RGB, the number of values that q can take is 3. In addition, in the case where additional data such as transmittance information for alpha blending (alpha blending) is encoded and decoded as the same access unit in addition to the video signal composed of 3 primary colors, or in the case where a video signal composed of four or more color components (for example, YMCK used in color printing) is encoded and decoded, the number of values that q can take is set to four or more. In the encoding device and the decoding device according to embodiment 14, if the independent encoding process is selected, the color components constituting the video signal are encoded completely independently, and therefore, in principle, the number of color components can be freely changed without changing the encoding/decoding process. In the future, even when the signal format for expressing the color of the video signal is changed, the effect that the independent encoding processing in embodiment 14 can be applied is obtained.
In order to realize such a configuration, in embodiment 14, the common encoding/independent encoding identification signal 423 is expressed in the form of "the number of pictures that are contained in one access unit and are independently encoded without mutual motion prediction reference". In this case, the common encoding/independent encoding identification signal 423 can be expressed by the number of values that the above parameter q can take, and this number is hereinafter referred to as num_pictures_in_au. That is, num_pictures_in_au = 1 indicates the "common encoding process" and num_pictures_in_au = 3 indicates the "independent encoding process" of embodiment 14. When there are four or more color components, num_pictures_in_au > 3 may be set. With such signaling, if the decoding apparatus decodes and refers to num_pictures_in_au, it can not only distinguish encoded data produced by the common encoding process from encoded data produced by the independent encoding process, but also simultaneously know how many single-color-component pictures exist within one access unit; this makes it possible to cope with future extensions of the color representation of video signals and to handle the common encoding process and the independent encoding process seamlessly in the bit stream.
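The decoder-side interpretation of num_pictures_in_au described above can be sketched as follows. The function name and error handling are illustrative assumptions; only the mapping of values (1 for the common encoding process, 3 or more for the independent encoding process) comes from the text.

```python
# Sketch of how a decoder might interpret num_pictures_in_au, the form of
# the common encoding/independent encoding identification signal described
# above. The function name and ValueError are assumptions for illustration.

def interpret_num_pictures_in_au(num_pictures_in_au):
    """Return (coding mode, picture count per access unit)."""
    if num_pictures_in_au == 1:
        # one picture mixing all three color components per access unit
        return ("common", 1)
    if num_pictures_in_au >= 3:
        # one single-color-component picture per component, 3 or more of them
        return ("independent", num_pictures_in_au)
    raise ValueError("unsupported num_pictures_in_au: %d" % num_pictures_in_au)

mode, pictures = interpret_num_pictures_in_au(3)  # independent coding, 3 pictures
```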
Fig. 70 is an explanatory diagram showing the bit stream structure of slice data in the common encoding process and the independent encoding process. In a bit stream encoded by the independent encoding process, in order to achieve the effects described later, a color component identification flag (color_channel_idc) is given to the header area at the head of the slice data so that the decoding apparatus can identify to which color-component picture in the access unit the received slice data belongs. Slices having the same value of color_channel_idc are grouped. That is, between slices having different color_channel_idc values, no encoding/decoding dependency (for example, motion prediction reference, context modeling and occurrence probability learning in CABAC, and the like) is allowed. This definition ensures the independence of each picture within an access unit in the independent encoding process. Further, frame_num (the encoding/decoding processing order of the picture to which the slice belongs) multiplexed in each slice header is set to the same value for all the color component pictures within one access unit.
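The two slice-header constraints above (grouping by color_channel_idc and a common frame_num across an access unit) can be sketched as decoder-side bookkeeping. The slice records below are hypothetical dicts, not the patent's syntax elements:

```python
# Illustrative sketch: group received slices by color_channel_idc and check
# that frame_num is identical across all slices of one access unit, as the
# text requires. The dict-based slice records are an assumption.

def group_slices_by_component(slices):
    """Group slice records by their color_channel_idc value."""
    groups = {}
    for s in slices:
        groups.setdefault(s["color_channel_idc"], []).append(s)
    return groups

def consistent_frame_num(slices):
    """All slices of one access unit must carry the same frame_num."""
    return len({s["frame_num"] for s in slices}) == 1

slices = [
    {"color_channel_idc": 0, "frame_num": 5, "data": "C0 slice 0"},
    {"color_channel_idc": 1, "frame_num": 5, "data": "C1 slice 0"},
    {"color_channel_idc": 0, "frame_num": 5, "data": "C0 slice 1"},
    {"color_channel_idc": 2, "frame_num": 5, "data": "C2 slice 0"},
]
groups = group_slices_by_component(slices)
```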
Fig. 71 is an explanatory diagram showing the schematic configuration of the encoding device of embodiment 14. In fig. 71, the common encoding process is executed in the first picture encoding unit 503a, and the independent encoding process is executed in the second picture encoding units 503b0, 503b1, and 503b2 (three are prepared, one for each color component). The input video signal 1 is supplied by the switch (SW) 501 either to the first picture encoding unit 503a or to the color component separation unit 502 and the second picture encoding units 503b0 to 503b2. The switch 501 is driven by the common encoding/independent encoding identification signal 423 and supplies the input video signal 1 to the designated path. In the following, a case is described in which the common encoding/independent encoding identification signal (num_pictures_in_au) 423 is multiplexed into the sequence parameter set when the input video signal is in the 4:4:4 format, and serves as a signal for selecting between the common encoding process and the independent encoding process in units of sequences. This is conceptually the same as the inter prediction mode sharing identification flag 123 described in embodiment 7 and the macroblock header sharing identification flag 123c described in embodiment 11. When the common encoding process is used, the common decoding process must be executed on the decoding apparatus side, and when the independent encoding process is used, the independent decoding process must be executed; therefore, the common encoding/independent encoding identification signal 423 must be multiplexed into the bit stream as information designating which process was used. To this end, the common encoding/independent encoding identification signal 423 is input to the multiplexing unit 504.
The common encoding/independent encoding identification signal 423 may be multiplexed in any unit, as long as that unit is a layer higher than the picture, for example a GOP (Group Of Pictures) unit composed of several pictures within the sequence.
In order to perform the common encoding process, the first picture encoding unit 503a divides the input video signal 1, as shown in fig. 66, into macroblocks in which samples of the three color components are combined, and proceeds with the encoding process in units of those macroblocks. The encoding process in the first picture encoding unit 503a is described later. When the independent encoding process is selected, the input video signal 1 is separated in the color component separation unit 502 into the data of one frame each of C0, C1, and C2, which are supplied to the corresponding second picture encoding units 503b0 to 503b2. The second picture encoding units 503b0 to 503b2 divide the one-frame signal separated for each color component into macroblocks of the format shown in fig. 67, and proceed with the encoding process in units of those macroblocks. The encoding process in the second picture encoding units is described later.
A video signal of one picture composed of the three color components is input to the first picture encoding unit 503a, which outputs the encoded data as the video stream 422a. A video signal of one picture composed of a single color component is input to each of the second picture encoding units 503b0 to 503b2, which output the encoded data as the video streams 422b0 to 422b2. These video streams are multiplexed in the multiplexing unit 504 on the basis of the state of the common encoding/independent encoding identification signal 423, and output as the video stream 422c.
When multiplexing the video stream 422c, in the case of the independent encoding process, the multiplexing order and the transmission order of the slice data in the bit stream can be interleaved between the pictures (color components) within an access unit (fig. 72). In this case, the decoding apparatus side must identify to which color component within the access unit the received slice data belongs. For this purpose, the color component identification flag multiplexed in the header area at the head of the slice data, as shown in fig. 70, is used.
With such a configuration, when the encoding apparatus encodes the pictures of the three color components by parallel processing using three independent sets of second picture encoding units 503b0 to 503b2, as in the encoding apparatus of fig. 71, each unit can send out encoded data as soon as slice data of its own picture is ready, without waiting for the encoded data of the other color-component pictures to be completed. In AVC, one picture can be divided into a plurality of slices and encoded, and the slice data length and the number of macroblocks contained in a slice can be changed flexibly according to the encoding conditions. Between slices adjacent in the image space, the independence of the slice decoding process must be ensured, so neighboring contexts such as intra prediction and arithmetic coding cannot be used; therefore, the longer the slice data, the higher the coding efficiency. On the other hand, when an error enters the bit stream during transmission or recording, the shorter the slice data, the earlier decoding recovers from the error and the more easily quality degradation is suppressed. If the length and structure of slices, the order of the color components, and the like were fixed without multiplexing the color component identification flag, the conditions for generating the bit stream would be fixed in the encoding apparatus, and it would be impossible to respond flexibly to the various requirements of encoding.
In addition, if the bit stream can be configured as shown in fig. 72, the transmission buffer size required for transmission, that is, the processing delay on the encoding apparatus side, can be reduced. Fig. 72 illustrates this. If multiplexing of slice data across pictures were not permitted, the encoding apparatus would need to buffer the encoded data of the other pictures until the encoding of the picture of a particular color component is completed, which means that a picture-level delay occurs. In contrast, if interleaving is possible at the slice level, as shown in the lowermost part of fig. 72, the picture encoding unit of a particular color component can output encoded data to the multiplexing unit in units of slice data, and the delay can be suppressed.
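The two multiplexing orders contrasted in fig. 72 can be sketched as follows. The slice labels and the round-robin interleaving policy are assumptions for illustration; the point is that picture-sequential output forces the multiplexer to wait for whole component pictures, while slice-level interleaving lets each picture encoding unit emit slices as they become ready.

```python
# Sketch (assumed data layout) of the two multiplexing orders of fig. 72:
# picture-sequential output vs. slice-level interleaving across the color
# components of one access unit.

def picture_sequential(slices_per_component):
    """Emit all C0 slices, then all C1 slices, then all C2 slices."""
    out = []
    for comp in sorted(slices_per_component):
        out.extend(slices_per_component[comp])
    return out

def slice_interleaved(slices_per_component):
    """Emit slices round-robin across components (one possible interleaving)."""
    out = []
    queues = [list(slices_per_component[c]) for c in sorted(slices_per_component)]
    while any(queues):
        for q in queues:
            if q:
                out.append(q.pop(0))
    return out

# two slices per color-component picture, tagged "component/slice"
slices = {0: ["C0/s0", "C0/s1"], 1: ["C1/s0", "C1/s1"], 2: ["C2/s0", "C2/s1"]}
```

In the interleaved order the first slice of every component appears before any second slice, which is why the multiplexer need not buffer a whole component picture.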
Within one color component picture, the slice data contained in it may be transmitted in the raster scan order of the macroblocks, or may be transmitted interleaved even within that one picture.
The following describes the operation of the first and second picture encoding units in detail.
Outline of operation of first picture encoding unit
Fig. 73 shows an internal configuration of the first picture encoding unit 503 a. In fig. 73, an input video signal 1 is input in units of macroblocks in the format of 4:4:4 in which three color components are combined, as shown in fig. 66.
First, the prediction unit 461 selects a reference image from the motion compensation prediction reference image data stored in the memory 16a, and performs motion compensation prediction processing in units of macroblocks. The memory 16a stores a plurality of reference image data, each composed of three color components, over a plurality of times, and the prediction unit 461 selects the optimal reference image from among them in units of macroblocks and performs motion prediction. The reference image data in the memory 16a may be stored separately for each color component in planar order, or the samples of the respective color components may be stored in dot-sequential order. Seven block sizes are prepared for motion compensation prediction: first, any one of 16 × 16, 16 × 8, 8 × 16, and 8 × 8 may be selected for each macroblock, as shown in figs. 32(a) to (d); further, when 8 × 8 is selected, any one of 8 × 8, 8 × 4, 4 × 8, and 4 × 4 may be selected for each 8 × 8 block, as shown in figs. 32(e) to (h).
The prediction unit 461 executes the motion compensation prediction processing for each macroblock over all or some of the block sizes/sub-block sizes of fig. 32, motion vectors within a predetermined search range, and one or more usable reference images, and obtains the prediction difference signal 4 for each block serving as the unit of motion compensation prediction, using the motion vector information, the reference image identification number 463 used for prediction, and the subtractor 3. The coding mode determination unit 5 evaluates the prediction efficiency of the prediction difference signal 4, and outputs, from among the predictions performed by the prediction unit 461, the macroblock type/sub-macroblock type 106 and the motion vector information/reference image identification number 463 that yield the best prediction efficiency for the macroblock to be predicted. All macroblock header information, such as the macroblock type, sub-macroblock type, reference image index, and motion vector, is determined as header information common to the three color components, used for encoding, and multiplexed in the bit stream. In evaluating the optimality of the prediction efficiency, only the prediction error amount for a predetermined color component (for example, the G component of RGB or the Y component of YUV) may be evaluated in order to suppress the amount of computation, or the prediction error amounts of all the color components may be comprehensively evaluated to obtain optimal prediction performance at the cost of a larger amount of computation. In the final selection of the macroblock type/sub-macroblock type 106, the weighting coefficient 20 for each type, determined by the encoding control unit 19, may be taken into account.
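One way the "best prediction efficiency" selection above could look is a minimum-distortion search over candidate block sizes and motion vectors. The sketch below uses the sum of absolute differences (SAD) as the prediction error amount; the candidate list, the SAD criterion, and the label strings are assumptions, and a real encoder would also fold in the per-type weighting coefficient 20 mentioned in the text.

```python
# Minimal sketch of prediction-efficiency evaluation: choose, per macroblock,
# the candidate (block size, motion vector) whose predicted block has the
# smallest sum of absolute differences (SAD) from the current block.
# The SAD criterion and candidate labels are assumptions for illustration.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def best_candidate(current_block, candidates):
    """candidates: list of (label, predicted_block); return label with min SAD."""
    return min(candidates, key=lambda c: sad(current_block, c[1]))[0]

cur = [[10, 10], [10, 10]]
cands = [("16x16/mv=(0,0)", [[9, 9], [9, 9]]),      # SAD = 4
         ("8x8/mv=(1,0)",   [[10, 10], [10, 11]])]  # SAD = 1
```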
Similarly, the prediction unit 461 also executes intra prediction. When intra prediction is executed, intra prediction mode information is output as the signal 463. Hereinafter, when intra prediction and motion compensation prediction are not particularly distinguished, the intra prediction mode information, motion vector information, and reference image identification number carried by the output signal 463 are collectively referred to as prediction overhead information. For intra prediction as well, only the prediction error amount of a predetermined color component may be evaluated, or the prediction error amounts of all the color components may be comprehensively evaluated. Finally, the coding mode determination unit 5 evaluates, in terms of prediction efficiency or coding efficiency, whether to make the macroblock type intra prediction or inter prediction, and selects one.
The prediction difference signal 4 obtained by intra prediction or motion compensation prediction based on the selected macroblock type/sub-macroblock type 106 and the prediction overhead information 463 is output to the transform unit 310. The transform unit 310 transforms the input prediction difference signal 4 and outputs it to the quantization unit 9 as transform coefficients. At this time, the size of the block serving as the unit of the transform may be selected from 4 × 4 and 8 × 8. When the transform block size is selectable, the block size selected at the time of encoding is reflected in the value of the transform block size designation flag 464, and this flag is multiplexed in the bit stream. The quantization unit 9 quantizes the input transform coefficients on the basis of the quantization parameter 21 determined by the encoding control unit 19, and outputs the result to the variable length coding unit 11 as the quantized transform coefficients 10. The quantized transform coefficients 10 contain the information of the three color components and are entropy-coded in the variable length coding unit 11 by means such as Huffman coding or arithmetic coding. The quantized transform coefficients 10 are also restored to the local decoded prediction difference signal 14 via the inverse quantization unit 12 and the inverse transform unit 312, and are added by the adder 18 to the predicted image 7 generated on the basis of the selected macroblock type/sub-macroblock type 106 and the prediction overhead information 463, whereby the local decoded image 15 is generated. After block distortion removal processing in the block filter 462, the local decoded image 15 is stored in the memory 16a for use in subsequent motion compensation prediction processing.
A block filter control flag 24 indicating whether or not to apply block filtering to the macroblock is also input to the variable length coding unit 11.
The quantized transform coefficients 10, macroblock type/sub-macroblock type 106, prediction overhead information 463, and quantization parameter 21 input to the variable length coding unit 11 are arranged and shaped into a bit stream according to a predetermined rule (syntax), packed into NAL units in units of slice data consisting of one or more macroblocks in the format of fig. 66, and output to the transmission buffer 17. The transmission buffer 17 smooths the bit stream in accordance with the bandwidth of the transmission path to which the encoding apparatus is connected or the reading speed of the recording medium, and outputs it as the video stream 422a. In addition, feedback is applied to the encoding control unit 19 in accordance with the accumulation state of the bit stream in the transmission buffer 17, and the amount of code generated in the encoding of subsequent video frames is controlled.
Since the output of the first picture encoding unit 503a is slices in units in which the three components are combined, it is equivalent to the code amount in units of whole access units, and the transmission buffer 17 may therefore be placed in the multiplexing unit 504 as it is.
In the first picture encoding unit 503a according to embodiment 14, since all the slice data in the sequence can be identified as slices in which C0, C1, and C2 are mixed (i.e., slices in which information of three color components is mixed) by the common code/independent code identification signal 423, the color component identification flag is not multiplexed in the slice header.
Outline of operation of second picture encoding unit
Fig. 74 shows the internal configuration of the second picture encoding unit 503b0 (503b1, 503b2). In fig. 74, the input video signal 1 is input in units of macroblocks each composed of samples of a single color component in the format of fig. 67.
First, the prediction unit 461 selects a reference image from the motion compensation prediction reference image data stored in the memory 16b, and performs motion compensation prediction processing in units of macroblocks. The memory 16b can store a plurality of reference image data, each composed of a single color component, over a plurality of times, and the prediction unit 461 selects the optimal reference image from among them in units of macroblocks and performs motion prediction. The memory 16b may also be shared with the memory 16a as a single unit holding the amount of the three color components together. Seven block sizes are prepared for motion compensation prediction: any one of 16 × 16, 16 × 8, 8 × 16, and 8 × 8 may be selected for each macroblock, as shown in figs. 32(a) to (d); further, when 8 × 8 is selected, any one of 8 × 8, 8 × 4, 4 × 8, and 4 × 4 may be selected for each 8 × 8 block, as shown in figs. 32(e) to (h).
The prediction unit 461 executes the motion compensation prediction processing for each macroblock over all or some of the block sizes/sub-block sizes of fig. 32, motion vectors within a predetermined search range, and one or more usable reference images, and obtains the prediction difference signal 4 for each block serving as the unit of motion compensation prediction, using the motion vector information, the reference image identification number 463 used for prediction, and the subtractor 3. The coding mode determination unit 5 evaluates the prediction efficiency of the prediction difference signal 4, and outputs, from among the predictions performed by the prediction unit 461, the macroblock type/sub-macroblock type 106 and the motion vector information/reference image identification number 463 that yield the best prediction efficiency for the macroblock to be predicted. All macroblock header information, such as the macroblock type, sub-macroblock type, reference image index, and motion vector, is determined as header information for the single-color-component signal of the input video signal 1, used for encoding, and multiplexed in the bit stream. In evaluating the optimality of the prediction efficiency, only the prediction error amount of the single color component to be encoded is evaluated. In the final selection of the macroblock type/sub-macroblock type 106, the weighting coefficient 20 for each type, determined by the encoding control unit 19, may be taken into account.
Similarly, the prediction unit 461 also executes intra prediction. When intra prediction is executed, intra prediction mode information is output as the signal 463. Hereinafter, when intra prediction and motion compensation prediction are not particularly distinguished, the intra prediction mode information, motion vector information, and reference image identification number carried by the output signal 463 are collectively referred to as prediction overhead information. In intra prediction as well, only the prediction error amount of the single color component to be encoded is evaluated. Finally, whether to make the macroblock type intra prediction or inter prediction is evaluated and selected in terms of prediction efficiency or coding efficiency.
The prediction difference signal 4 obtained using the selected macroblock type/sub-macroblock type 106 and the prediction overhead information 463 is output to the transform unit 310. The transform unit 310 transforms the input prediction difference signal 4 for the single color component and outputs it to the quantization unit 9 as transform coefficients. At this time, the size of the block serving as the unit of the transform may be selected from 4 × 4 and 8 × 8. When selectable, the block size selected at the time of encoding is reflected in the value of the transform block size designation flag 464, and this flag is multiplexed in the bit stream. The quantization unit 9 quantizes the input transform coefficients on the basis of the quantization parameter 21 determined by the encoding control unit 19, and outputs the result to the variable length coding unit 11 as the quantized transform coefficients 10. The quantized transform coefficients 10 contain the information of the single color component and are entropy-coded in the variable length coding unit 11 by means such as Huffman coding or arithmetic coding. The quantized transform coefficients 10 are also restored to the local decoded prediction difference signal 14 via the inverse quantization unit 12 and the inverse transform unit 312, and are added by the adder 18 to the predicted image 7 generated on the basis of the selected macroblock type/sub-macroblock type 106 and the prediction overhead information 463, whereby the local decoded image 15 is generated. After block distortion removal processing in the block filter 462, the local decoded image 15 is stored in the memory 16b for use in subsequent motion compensation prediction processing. A block filter control flag 24 indicating whether or not to apply block filtering to the macroblock is also input to the variable length coding unit 11.
The quantized transform coefficients 10, macroblock type/sub-macroblock type 106, prediction overhead information 463, and quantization parameter 21 input to the variable length coding unit 11 are arranged and shaped into a bit stream according to a predetermined rule (syntax), packed into NAL units in units of slice data consisting of one or more macroblocks in the format of fig. 67, and output to the transmission buffer 17. The transmission buffer 17 smooths the bit stream in accordance with the bandwidth of the transmission path to which the encoding apparatus is connected or the reading speed of the recording medium, and outputs it as the video stream 422b0 (422b1, 422b2). In addition, feedback is applied to the encoding control unit 19 in accordance with the accumulation state of the bit stream in the transmission buffer 17, and the amount of code generated in the encoding of subsequent video frames is controlled.
The outputs of the second picture encoding units 503b0 to 503b2 are slices composed of the data of a single color component only. When code amount control is required in units of whole access units, a common transmission buffer in units of multiplexed slices of all the color components may be provided in the multiplexing unit 504, and feedback may be applied to the encoding control unit 19 of each color component on the basis of the occupancy of that buffer. At this time, the encoding control may be performed using only the amount of generated information of all the color components, or may additionally take into account the state of the transmission buffer 17 of each color component. When the encoding control is performed using only the amount of generated information of all the color components, a function equivalent to the transmission buffer 17 may be realized by the common transmission buffer in the multiplexing unit 504, and the transmission buffer 17 may be omitted.
In the second picture encoding units 503b0 to 503b2 of embodiment 14, all the slice data in the sequence can be identified by the common encoding/independent encoding identification signal 423 as single-color-component slices (that is, C0 slices, C1 slices, or C2 slices), so the color component identification flag is always multiplexed in the slice header, allowing the decoding apparatus side to identify which picture data in the access unit a given slice corresponds to. Therefore, each of the second picture encoding units 503b0 to 503b2 can send out the output from its transmission buffer 17 as soon as the data for one slice has accumulated, without accumulating output for an entire picture.
Note that the common encoding/independent encoding identification signal (num_pictures_in_au) can simultaneously express the information for distinguishing encoded data produced by the common encoding process from encoded data produced by the independent encoding process (common encoding identification information) and the information indicating how many single-color-component pictures exist within one access unit (the number of color components).
The first picture encoding unit 503a and the second picture encoding units 503b0 to 503b2 differ only in whether the macroblock header information is treated as information common to the three components or as information of a single color component, and in the bit stream structure of the slice data. Most of the basic processing blocks in figs. 73 and 74, such as the prediction unit, the transform unit and inverse transform unit, the quantization unit and inverse quantization unit, and the block filter, differ only in whether they process the information of the three color components together or handle only the information of a single color component, and can be realized as functional blocks common to the first picture encoding unit 503a and the second picture encoding units 503b0 to 503b2. Therefore, not only the completely independent encoding processing units of fig. 71, but various encoding device implementations can be realized by appropriately combining the basic components of figs. 73 and 74. Further, if the memory 16a of the first picture encoding unit 503a is arranged in planar order, the reference image memory configuration can be shared between the first picture encoding unit 503a and the second picture encoding units 503b0 to 503b2.
Although not shown, the encoding apparatus according to embodiment 14 assumes a virtual stream buffer (coded picture buffer) that buffers the video stream 422c arranged as shown in figs. 69 and 70 and a virtual frame memory (decoded picture buffer) that buffers the decoded images 427a and 427b, and generates the video stream 422c so that neither overflow nor underflow of the coded picture buffer nor failure of the decoded picture buffer occurs. This control is performed mainly by the encoding control unit 19. Thus, when the decoding apparatus decodes the video stream 422c in accordance with the operations of the coded picture buffer and the decoded picture buffer (the virtual buffer model), it is ensured that no failure occurs in the decoding apparatus. The virtual buffer model is specified below.
The coded picture buffer operates in units of access units. As described above, one access unit contains the encoded data of one picture when the common decoding process is performed, and contains the encoded data of as many pictures as there are color components (three pictures for three components) when the independent decoding process is performed. The operations defined for the coded picture buffer are the times at which the first bit and the last bit of an access unit are input to the coded picture buffer, and the time at which the bits of the access unit are read out from the coded picture buffer. Readout from the coded picture buffer is defined to be instantaneous: all bits of an access unit are read out from the coded picture buffer at the same moment. The bits of an access unit, once read out from the coded picture buffer, are input to the upper header analysis unit, decoded by the first picture decoding unit or the second picture decoding units as described above, and output as a color video frame assembled in units of access units. Under the provisions of the virtual buffer model, the processing from the readout of the bits from the coded picture buffer up to the output of the color video frame of the access unit is assumed to occur instantaneously. The color video frame constituted in units of access units is input to the decoded picture buffer, and its output time from the decoded picture buffer is calculated. The output time from the decoded picture buffer is obtained by adding a predetermined delay time to the readout time from the coded picture buffer. This delay time can be multiplexed into the bit stream to control the decoding apparatus.
When the delay time is 0, that is, when the output time from the decoded picture buffer equals the readout time from the coded picture buffer, the color video frame is output from the decoded picture buffer at the same time as it is input to it. Otherwise, that is, when the output time from the decoded picture buffer is later than the readout time from the coded picture buffer, the color video frame is stored in the decoded picture buffer until its output time is reached. As described above, the operation of the decoded picture buffer is thus defined in units of access units.
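The timing rule above can be summarized in a short sketch. The function names and the abstract time units are illustrative assumptions; the only facts taken from the text are that the decoded picture buffer output time equals the coded picture buffer readout time plus a delay that may be multiplexed into the stream, and that a frame is held in the decoded picture buffer only when that delay is nonzero.

```python
def dpb_output_time(cpb_readout_time: float, delay: float) -> float:
    """Output time from the decoded picture buffer = readout time from
    the coded picture buffer + predetermined delay time."""
    return cpb_readout_time + delay

def frame_is_held_in_dpb(cpb_readout_time: float, delay: float) -> bool:
    """With zero delay the frame is output as soon as it is input to the
    decoded picture buffer; otherwise it is stored until its output time."""
    return dpb_output_time(cpb_readout_time, delay) > cpb_readout_time
```

For example, a frame read out of the coded picture buffer at tick 100 with a delay of 0 is output immediately, while the same frame with a delay of 5 is held until tick 105.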
Fig. 75 is an explanatory diagram showing the schematic configuration of the decoding apparatus according to embodiment 14. In fig. 75, the common decoding process is executed by the first picture decoding unit 603a, and the independent decoding process is executed by the color component determination unit 602 and the second picture decoding units 603b0, 603b1, and 603b2 (of which three are prepared, one per color component).
The video stream 422c is divided into NAL units by the upper header analysis unit 610, and upper header information such as the sequence parameter set and picture parameter sets is decoded as is and stored in a predetermined memory area that can be referred to by the first picture decoding unit 603a, the color component determination unit 602, and the second picture decoding units 603b0 to 603b2 in the decoding apparatus. The common encoding/independent encoding identification signal 423 (num_pictures_in_au) multiplexed in sequence units is decoded and held as part of the upper header information.
The decoded num_pictures_in_au is supplied to the switch (SW) 601. If num_pictures_in_au = 1, the switch 601 supplies the slice NAL units of each picture to the first picture decoding unit 603a; if num_pictures_in_au = 3, it supplies them to the color component determination unit 602. That is, when num_pictures_in_au = 1 the first picture decoding unit 603a performs the common decoding process, and when num_pictures_in_au = 3 the three second picture decoding units 603b0 to 603b2 perform the independent decoding process. The detailed operation of the first and second picture decoding units will be described later.
The color component determination unit 602 identifies, from the value of the color component identification flag shown in fig. 70, which color component picture in the current access unit a slice NAL unit corresponds to, and distributes it to the appropriate one of the second picture decoding units 603b0 to 603b2. This configuration of the decoding apparatus has the effect that, even when a bit stream in which slices are interleaved within an access unit as shown in fig. 72 is received, it can easily determine which slice belongs to which color component picture and decode it correctly.
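The routing performed by the switch 601 and the color component determination unit 602 can be sketched as follows. This is a hypothetical simplification: the flag values 0/1/2 for C0/C1/C2 and the string names of the destination units are assumptions for illustration; the text only states that num_pictures_in_au selects between the first picture decoding unit and the three second picture decoding units, and that the color component identification flag selects among the latter.

```python
def route_slice(num_pictures_in_au: int, color_channel_idc: int) -> str:
    """Decide which picture decoding unit receives a slice NAL unit."""
    if num_pictures_in_au == 1:
        # Common decoding process: every slice goes to the single decoder.
        return "first_picture_decoding_unit_603a"
    # Independent decoding process: the color component identification flag
    # (here assumed 0, 1, or 2) picks one of the three second decoders,
    # which is why interleaved slices (fig. 72) can still be separated.
    return "second_picture_decoding_unit_603b%d" % color_channel_idc
```

With this dispatch, a stream of interleaved C0/C1/C2 slices is separated per slice, without any ordering assumption on the slices within the access unit.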
Summary of operation of first picture decoding unit
Fig. 76 shows the internal configuration of the first picture decoding unit 603a. The first picture decoding unit 603a receives the video stream 422c, output from the encoding apparatus of fig. 71 and arranged as in figs. 69 and 70, after it has been divided into NAL units by the upper header analysis unit 610, in units of slices in which C0, C1, and C2 are mixed, performs decoding processing in units of macroblocks each composed of samples of the three color components as shown in fig. 66, and restores the output video frame.
The variable length decoding unit 25 receives the video stream 422c divided into NAL units, decodes it according to a predetermined rule (syntax), and extracts the quantized transform coefficients 10 of the three color components and the macroblock header information used in common for the three components (macroblock type/sub-macroblock type 106, prediction overhead information 463, transform block size designation flag 464, and quantization parameter 21). The quantized transform coefficients 10 are input, together with the quantization parameter 21, to the inverse quantization unit 12, which performs the same processing as in the first picture encoding unit 503a, and inverse quantization is performed. Its output is then input to the inverse transform unit 312, which performs the same processing as in the first picture encoding unit 503a, and is restored to the local decoded prediction difference signal 14 (if the transform block size designation flag 464 is present in the video stream 422c, it is referred to during the inverse quantization and inverse transform processing). The prediction unit 461, on the other hand, differs from the prediction unit 461 in the first picture encoding unit 503a in that it includes only the processing of generating the predicted image 7 by referring to the prediction overhead information 463; the macroblock type/sub-macroblock type 106 and the prediction overhead information 463 are input to the prediction unit 461 to obtain the predicted image 7 of the three components. When the macroblock type indicates intra prediction, the predicted image 7 of the three components is obtained from the prediction overhead information 463 in accordance with the intra prediction mode information; when the macroblock type indicates inter prediction, it is obtained from the prediction overhead information 463 in accordance with the motion vector and the reference image index.
The local decoded prediction difference signal 14 and the predicted image 7 are added by the adder 18 to obtain the provisional decoded image (local decoded image) 15 of the three components. In order to use the provisional decoded image 15 for motion compensated prediction of subsequent macroblocks, the provisional decoded image samples of the three components are subjected to block distortion removal in the block filter 462, which performs the same processing as in the first picture encoding unit 503a, and are then output as the decoded image 427a and stored in the memory 16a. At this time, the block filtering process is applied to the provisional decoded image 15 in accordance with the instruction of the block filtering control flag 24 decoded by the variable length decoding unit 25. The memory 16a stores a plurality of sets of reference image data, each composed of the three color components, over a plurality of times; the prediction unit 461 selects from these the reference image indicated by the reference image index extracted from the bit stream in macroblock units, and generates the predicted image. The reference image data in the memory 16a may be stored separately for each color component in plane-sequential order, or the samples of the color components may be stored in dot-sequential order. The decoded image 427a includes the three color components and directly becomes a color video frame constituting an access unit 427a0 in the common decoding process.
Outline of operation of second picture decoding unit
Fig. 77 shows the internal configuration of the second picture decoding units 603b0 to 603b2. Each of the second picture decoding units 603b0 to 603b2 receives the video stream 422c, output from the encoding apparatus of fig. 71 and arranged as in figs. 69 and 70, after it has been divided into NAL units by the upper header analysis unit 610, in units of the C0, C1, or C2 slice NAL units allocated by the color component determination unit 602, performs decoding processing in units of macroblocks each composed of samples of a single color component as shown in fig. 67, and restores and outputs the video frame.
The variable length decoding unit 25 receives the video stream 422c as input, decodes it according to a predetermined rule (syntax), and extracts the quantized transform coefficients 10 of the single color component and the macroblock header information applied to the single color component (macroblock type/sub-macroblock type 106, prediction overhead information 463, transform block size designation flag 464, and quantization parameter 21). The quantized transform coefficients 10 are input, together with the quantization parameter 21, to the inverse quantization unit 12, which performs the same processing as in the second picture encoding unit 503b0 (503b1, 503b2), and inverse quantization is performed. Its output is then input to the inverse transform unit 312, which performs the same processing as in the second picture encoding unit 503b0 (503b1, 503b2), and is restored to the local decoded prediction difference signal 14 (if the transform block size designation flag 464 is present in the video stream 422c, it is referred to during the inverse quantization and inverse orthogonal transform processing). The prediction unit 461, on the other hand, differs from the prediction unit 461 in the second picture encoding unit 503b0 (503b1, 503b2) in that it includes only the processing of generating the predicted image 7 by referring to the prediction overhead information 463; the macroblock type/sub-macroblock type 106 and the prediction overhead information 463 are input to the prediction unit 461 to obtain the predicted image 7 of the single color component.
When the macroblock type indicates intra prediction, the predicted image 7 of the single color component is obtained from the prediction overhead information 463 in accordance with the intra prediction mode information; when the macroblock type indicates inter prediction, it is obtained from the prediction overhead information 463 in accordance with the motion vector and the reference image index. The local decoded prediction difference signal 14 and the predicted image 7 are added by the adder 18 to obtain the provisional decoded image 15 of the single color component macroblock. In order to use the provisional decoded image 15 for motion compensated prediction of subsequent macroblocks, the provisional decoded image samples of the single color component are subjected to block distortion removal in the block filter 26, which performs the same processing as in the second picture encoding unit 503b0 (503b1, 503b2), and are then output as the decoded image 427b and stored in the memory 16b. At this time, the block filtering process is applied to the provisional decoded image 15 in accordance with the instruction of the block filtering control flag 24 decoded by the variable length decoding unit 25. The decoded image 427b includes only the samples of a single color component, and is assembled into a color video frame in units of an access unit 427b0 by grouping it with the decoded images 427b output from the other second picture decoding units 603b0 to 603b2 processed in parallel in fig. 75.
As is clear from the above, the first picture decoding unit 603a and the second picture decoding units 603b0 to 603b2 differ only in whether the macroblock header information is treated as information common to three components or as information of a single color component, and in the bitstream structure of the slice data; most of the basic decoding processing blocks in figs. 73 and 74, such as motion compensated prediction, inverse transform, and inverse quantization, can be realized by functional blocks common to the first picture decoding unit 603a and the second picture decoding units 603b0 to 603b2. Therefore, not only the completely independent decoding processing units shown in fig. 75 but also various decoding device implementations can be realized by appropriately combining the basic constituent elements shown in figs. 76 and 77. Further, if the memory 16a in the first picture decoding unit 603a is arranged in plane-sequential order, the configurations of the memory 16a and the memory 16b can be shared by the first picture decoding unit 603a and the second picture decoding units 603b0 to 603b2.
It goes without saying that the encoding apparatus of fig. 71 can operate with the common encoding/independent encoding identification signal 423 permanently fixed to "independent encoding process", encoding all frames independently without using the first picture encoding unit 503a at all; the decoding apparatus of fig. 75 may likewise be configured to receive and decode the bit stream output from such an encoding apparatus. As another form of the decoding apparatus of fig. 75, in a usage mode that presupposes the common encoding/independent encoding identification signal 423 being permanently fixed to "independent encoding process", a decoding apparatus that performs only the independent decoding process may be configured without the switch 601 and the first picture decoding unit 603a.
The common encoding/independent encoding identification signal (num_pictures_in_au) includes the information (common encoding identification information) that distinguishes encoded data obtained by the common encoding process from encoded data obtained by the independent encoding process, and the information (the number of color components) indicating how many single color component pictures exist in one access unit.
Furthermore, if the first picture decoding unit 603a is given a function of decoding a bit stream of the AVC high class in which the three components of the conventional YUV 4:2:0 format are encoded together, and the upper header analysis unit 610 determines, by referring to the class identifier decoded from the video stream 422c, in which format the bit stream is encoded and transmits the result to the switch 601 and the first picture decoding unit 603a as part of the information carried on the signal line of the common encoding/independent encoding identification signal 423, a decoding apparatus that ensures compatibility with bit streams of the conventional YUV 4:2:0 format can be configured.
In addition, in the first picture encoding unit 503a according to embodiment 14, information of the three color components is mixed in the slice data and the same intra/inter prediction processing is applied to the three color components, so signal correlation between the color components may remain in the prediction error signal space. As a means of removing this correlation, the color space conversion processing described in embodiment 13 above may be applied to the prediction error signal, for example. Figs. 78 and 79 show examples of the first picture encoding unit 503a with such a configuration. Fig. 78 is an example in which the color space conversion processing is performed at the pixel level before the transform processing: the color space conversion unit 465 is placed before the transform unit 310, and the inverse color space conversion unit 466 is placed after the inverse transform unit 312. Fig. 79 is an example in which the color space conversion processing is performed on the coefficient data obtained after the transform processing, while appropriately selecting the frequency components to be processed: the color space conversion unit 465 is placed after the transform unit 310, and the inverse color space conversion unit 466 is placed before the inverse transform unit 312. Limiting the frequency components to which the color space conversion is applied has the effect of suppressing the propagation of high-frequency noise components contained in a specific color component into other color components containing little noise. When the frequency components subject to the color space conversion processing can be selected adaptively, signaling information 467 that allows the decoding side to determine the selection made at encoding time is multiplexed into the bit stream.
In the color space conversion processing, a plurality of conversion methods may be switched in macroblock units according to the properties of the image signal to be encoded, as described in embodiment 13 above, or the presence or absence of conversion may be decided in macroblock units. It is also possible to specify the selectable conversion method types at the sequence level or the like, and to designate which of them is selected in units of pictures, slices, macroblocks, and so on. It may also be made selectable whether the color space conversion is performed before or after the transform processing. When these adaptive encoding processes are performed, the encoding mode determination unit 5 may evaluate the encoding efficiency of all the selectable options and select the one with the highest encoding efficiency. When these adaptive encoding processes are performed, signaling information 467 that allows the decoding side to determine the selection made at encoding time is multiplexed into the bit stream. Such signaling may be designated at a level different from the macroblock, such as the slice, picture, GOP, or sequence level.
Figs. 80 and 81 show decoding apparatuses corresponding to the encoding apparatuses of figs. 78 and 79. Fig. 80 is a decoding apparatus that decodes a bit stream encoded by the encoding apparatus of fig. 78, with the color space conversion performed before the transform processing. The variable length decoding unit 25 decodes, from the bit stream, the signaling information 467, which includes information on whether or not conversion is performed in the inverse color space conversion unit 466 and information on the conversion method executable in the inverse color space conversion unit 466, and supplies it to the inverse color space conversion unit 466. Based on this information, the decoding apparatus of fig. 80 performs the color space conversion processing on the inverse-transformed prediction error signal in the inverse color space conversion unit 466. Fig. 81 is a decoding apparatus that decodes a bit stream encoded by the encoding apparatus of fig. 79, with the color space conversion performed after the transform processing on selected frequency components. The variable length decoding unit decodes, from the bit stream, the signaling information 467, which is identification information including information on whether or not conversion is performed in the inverse color space conversion unit 466, information on the conversion method executed in the inverse color space conversion unit, information for specifying the frequency components subject to the color space conversion, and the like, and supplies it to the inverse color space conversion unit 466. Based on this information, the decoding apparatus of fig. 81 performs the color space conversion processing on the inverse-quantized transform coefficient data in the inverse color space conversion unit 466.
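The two decoder variants differ only in where the inverse color space conversion 466 sits relative to the inverse transform. A minimal sketch of this placement choice follows; the function interface, the flag names, and the toy stand-in transforms are assumptions for illustration, not part of the described apparatus.

```python
def inverse_pipeline(coeffs, inv_transform, inv_color_conv,
                     apply_color_conv, in_frequency_domain):
    """Apply the inverse color space conversion either to the
    inverse-quantized coefficients (fig. 81 variant, frequency domain)
    or to the inverse-transformed prediction error (fig. 80 variant,
    pixel domain), gated by the decoded signaling information."""
    if apply_color_conv and in_frequency_domain:
        coeffs = inv_color_conv(coeffs)        # fig. 81 placement
    residual = inv_transform(coeffs)
    if apply_color_conv and not in_frequency_domain:
        residual = inv_color_conv(residual)    # fig. 80 placement
    return residual

# Toy stand-ins: "inverse transform" doubles each value,
# "inverse color space conversion" negates each value.
double = lambda xs: [2 * v for v in xs]
negate = lambda xs: [-v for v in xs]
```

Because both toy operations are linear, the two placements agree here; with a real transform the frequency-domain variant additionally restricts the conversion to selected frequency components, which is the point of the fig. 79/81 configuration.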
In the decoding apparatuses of figs. 80 and 81, as in the decoding apparatus of fig. 75, the first picture decoding unit 603a may be given a function of decoding a bit stream of the AVC high class in which the three components of the conventional YUV 4:2:0 format are encoded together, and the upper header analysis unit 610 determines, by referring to the class identifier decoded from the video stream 422c, in which format the bit stream is encoded and transmits the result to the switch 601 and the first picture decoding unit 603a as part of the information carried on the signal line of the common encoding/independent encoding identification signal 423.
Fig. 82 shows the structure of the encoded data of the macroblock header information included in a conventional YUV 4:2:0 format bit stream. It differs from the Cn component header information shown in fig. 50 only in that, when the macroblock type is intra prediction, the encoded data of the intra color difference prediction mode 144 is included. When the macroblock type is inter prediction, the structure of the encoded data of the macroblock header information is the same as the Cn component header information shown in fig. 50, but a motion vector for the color difference components is generated, by a method different from that for the luminance component, using the reference image identification number and the motion vector information included in the macroblock header information.
The operation of a decoding apparatus that ensures compatibility with conventional YUV 4:2:0 format bit streams will be described. As described above, the first picture decoding unit 603a has a function of decoding conventional YUV 4:2:0 format bit streams. Its internal configuration is the same as that of fig. 76.
The operation of the variable length decoding unit 25 of the first picture decoding unit having the function of decoding conventional YUV 4:2:0 format bit streams will be described. When the video stream 422c is input to the variable length decoding unit, it decodes the color difference format indicator. The color difference format indicator is included in the sequence parameter header of the video stream 422c and indicates whether the input video format is 4:4:4, 4:2:2, 4:2:0, or 4:0:0. The decoding process of the macroblock header information of the video stream 422c is switched according to the value of the color difference format indicator. When the macroblock type indicates intra prediction and the color difference format indicator indicates 4:2:0 or 4:2:2, the intra color difference prediction mode 144 is decoded from the bit stream. When the color difference format indicator indicates 4:4:4, decoding of the intra color difference prediction mode 144 is skipped. When the color difference format indicator indicates 4:0:0, the input video signal is a format composed of the luminance signal only (4:0:0 format), so decoding of the intra color difference prediction mode 144 is likewise skipped. The decoding process of the macroblock header information other than the intra color difference prediction mode 144 is the same as in the variable length decoding unit of a first picture decoding unit 603a that does not have the function of decoding conventional YUV 4:2:0 format bit streams. As described above, when the video stream 422c is input to the variable length decoding unit 25, it extracts the color difference format indicator (not shown), the quantized transform coefficients 10 of the three components, and the macroblock header information (macroblock type/sub-macroblock type 106, prediction overhead information 463, transform block size designation flag 464, and quantization parameter 21).
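The format-dependent branch just described can be sketched compactly. This is an illustrative simplification: the string-valued format indicator and the `read_mode` callback standing in for the bit-stream reader are assumptions; the branching conditions themselves follow the text above.

```python
def decode_intra_chroma_mode(mb_is_intra, chroma_format, read_mode):
    """Return the intra color difference prediction mode 144, or None
    when its decoding is skipped.

    It is decoded only when the macroblock type is intra prediction and
    the color difference format indicator says 4:2:0 or 4:2:2; for 4:4:4
    (chroma predicted like luma) and 4:0:0 (no chroma at all) it is
    skipped, as it is for inter macroblocks."""
    if mb_is_intra and chroma_format in ("4:2:0", "4:2:2"):
        return read_mode()
    return None
```

For example, an intra macroblock in a 4:2:0 stream reads the mode from the bit stream, while the same macroblock in a 4:4:4 stream skips the field entirely.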
The prediction unit 461 receives the color difference format indicator (not shown) and the prediction overhead information 463, and obtains the predicted image 7 of the three components.
Fig. 83 shows the internal configuration of the prediction unit 461 of the first picture decoding unit that ensures compatibility with conventional YUV 4:2:0 format bit streams; its operation is described below.
The switch 4611a determines the macroblock type, and when the macroblock type indicates intra prediction, the switch 4611b determines the value of the color difference format indicator. When the value of the color difference format indicator indicates 4:2:0 or 4:2:2, the predicted image 7 of the three components is obtained from the prediction overhead information 463 in accordance with the intra prediction mode information and the intra color difference prediction mode information. Of the three components, the predicted image of the luminance signal is generated in the luminance signal intra prediction unit 4612 in accordance with the intra prediction mode information, while the predicted image of the two color difference signal components is generated in the color difference signal intra prediction unit 4613, which performs processing different from that for the luminance component, in accordance with the intra color difference prediction mode information. When the value of the color difference format indicator indicates 4:4:4, the predicted images of all three components are generated in the luminance signal intra prediction unit 4612 in accordance with the intra prediction mode information. When the value of the color difference format indicator indicates 4:0:0, the 4:0:0 format consists of the luminance signal only (one component), so only the predicted image of the luminance signal is generated in the luminance signal intra prediction unit 4612 in accordance with the intra prediction mode information.
When the switch 4611a determines that the macroblock type indicates inter prediction, the switch 4611c determines the value of the color difference format indicator. When the value of the color difference format indicator indicates 4:2:0 or 4:2:2, the predicted image of the luminance signal is generated in the luminance signal inter prediction unit 4614 from the motion vector and the reference image index in the prediction overhead information 463, in accordance with the predicted image generation method for the luminance signal defined by the AVC standard. For the predicted image of the two color difference signal components, the color difference signal inter prediction unit 4615 adapts the motion vector obtained from the prediction overhead information 463 to the color difference format to generate a color difference motion vector, and generates the predicted image in accordance with the AVC standard from the reference image indicated by the reference image index obtained from the prediction overhead information 463. When the value of the color difference format indicator indicates 4:0:0, the 4:0:0 format consists of the luminance signal only (one component), so only the predicted image of the luminance signal is generated in the luminance signal inter prediction unit 4614 from the motion vector and the reference image index.
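The switch structure of fig. 83 can be sketched as a two-level dispatch. The returned unit names mirror the reference numerals in the text; treating 4:4:4 inter prediction like the luma-only path (on the analogy of the intra case, where the luminance unit handles all three components) is an assumption, as the text does not spell that branch out.

```python
def select_prediction_units(mb_is_intra, chroma_format):
    """Two-level dispatch of fig. 83: switch 4611a on macroblock type,
    then switch 4611b/4611c on the color difference format indicator.
    Returns the list of prediction units that generate predicted images."""
    if mb_is_intra:
        if chroma_format in ("4:2:0", "4:2:2"):
            return ["luma_intra_4612", "chroma_intra_4613"]
        # 4:4:4: luma unit predicts all three components; 4:0:0: luma only.
        return ["luma_intra_4612"]
    if chroma_format in ("4:2:0", "4:2:2"):
        return ["luma_inter_4614", "chroma_inter_4615"]
    return ["luma_inter_4614"]
```

A 4:2:0 inter macroblock thus engages both the luminance inter predictor 4614 and the color difference inter predictor 4615, while a 4:0:0 macroblock of either type engages only the luminance unit.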
As described above, by providing the means for generating a predicted image of the color difference signals of the conventional YUV 4:2:0 format and switching the means used to generate the predicted images of the three components according to the value of the color difference format indicator decoded from the bit stream, a decoding apparatus that ensures compatibility with conventional YUV 4:2:0 format bit streams can be configured.
Further, if information indicating whether the video stream 422c is a bit stream decodable even by a decoding apparatus that does not support the color space conversion processing, such as the decoding apparatus of fig. 75, is provided in the video stream 422c supplied to the decoding apparatus of fig. 80 or 81, in units of a sequence parameter set or the like, then a bit stream matching the decoding performance of each decoding apparatus can be decoded by any of the apparatuses of figs. 80, 81, and 75, with the effect that compatibility of bit streams can easily be ensured.
Embodiment 15
In embodiment 15, another embodiment will be described that differs from the encoding apparatus and decoding apparatus of embodiment 14 above, such as those of figs. 71 and 75, only in the structure of the input/output bit stream. The encoding apparatus in embodiment 15 multiplexes encoded data with the bit stream structure shown in fig. 84.
In the bit stream of the structure of fig. 69, the AUD NAL unit includes the information primary_pic_type as one of its elements. Fig. 85 shows the picture coding type information indicated by primary_pic_type when the picture data in the access unit starting with the AUD NAL unit is encoded.
For example, primary_pic_type = 0 indicates that everything in the picture is intra-coded. Primary_pic_type = 1 indicates that intra-coded slices and slices that can be motion compensated predicted using only one reference picture can be mixed in the picture. Since primary_pic_type is information defining what coding modes may be used in a picture, operating on this information on the encoding apparatus side allows encoding suited to various conditions such as the properties of the input video signal and the random access function. In embodiment 14 above, there is only one primary_pic_type per access unit, so when the independent encoding process is performed, primary_pic_type is common to the three color component pictures in an access unit. In embodiment 15, when each color component picture is encoded independently, either the primary_pic_type of the remaining two color component pictures is additionally inserted into the AUD NAL unit of fig. 69 according to the value of num_pictures_in_au, or, as in the bit stream structure of fig. 84, the encoded data of each color component picture starts with a NAL unit (Color Channel Delimiter) indicating the start of a color component picture, and the primary_pic_type information of the corresponding picture is included in this CCD NAL unit. In this configuration, since the encoded data of each color component picture is multiplexed together one whole picture at a time, the color component identification flag (color_channel_idc) described in embodiment 14 is included not in the slice header but in the CCD NAL unit. This aggregates the color component identification information that would otherwise be multiplexed into every slice into data in picture units, with the effect of reducing overhead information.
Further, since the CCD NAL unit is configured as a byte string, color_channel_idc needs to be verified only once per color component picture by detecting that unit, and the beginning of the color component picture can be found quickly without variable length decoding processing. Therefore, on the decoding apparatus side, it is not necessary to verify color_channel_idc in each slice header one by one in order to separate the NAL units to be decoded for each color component, and data can be supplied smoothly to the second picture decoding unit.
On the other hand, since this configuration reduces the effects of shrinking the buffer size and processing delay of the encoding apparatus described with fig. 72 of embodiment 14, it may instead be signaled at a higher level (sequence or GOP) whether the color component identification flag is multiplexed in slice units or in color component picture units. With such a bit stream structure, the encoding apparatus can be implemented flexibly according to its usage.
In another embodiment, the encoded data may be multiplexed using the bit stream structure shown in fig. 86. In fig. 86, each AUD includes the color_channel_idc and primary_pic_type that were included in the CCD NAL unit in fig. 84. The bit stream structure in embodiment 15 is configured so that one (color component) picture is included in one access unit even in the case of independent encoding processing. Such a configuration also reduces overhead information by consolidating the color component identification information into data of picture units. Since the AUD NAL unit is configured as a byte string, color_channel_idc needs to be verified only once per picture by detecting that unit, and the head of the color component picture can be found quickly without variable length decoding processing; therefore, on the decoding apparatus side, it is not necessary to verify color_channel_idc in each slice header one by one in order to separate the NAL units to be decoded for each color component, and data can be supplied smoothly to the second picture decoding unit. On the other hand, since a picture of one frame or one field is composed of three access units, it is necessary to specify that the three access units are picture data of the same time. Therefore, in the bit stream structure of fig. 86, the AUD may further be configured to carry a sequence number (e.g., the temporal encoding/decoding order) of each picture. With this configuration, the decoding apparatus can verify the decoding/display order, the color component attribute, the IDR property, and so on of each picture without decoding any slice data, and can efficiently perform bitstream-level editing and special playback.
In the bit stream structures of fig. 69, 84 to 86, information for specifying the number of slice NAL units included in one color component picture may be stored in the area of the AUD or CCD.
In all the embodiments described above, the transform and inverse transform processes may be a transform that guarantees orthogonality, such as the DCT, or may be a transform such as that of AVC, which is not a strictly orthogonal transform like the DCT but approximates orthogonality in combination with the quantization and inverse quantization processes. Alternatively, the prediction error signal may be encoded as pixel-level information without performing a transform at all.
Embodiment 16
In embodiment 16, an encoding device that divides a video frame input in 4:4:4 format into rectangular regions of Mi×Mi pixels (i = 0, 1, 2) independently for each color component and encodes each region using intra-frame and inter-frame adaptive prediction, and a corresponding decoding device, will be described. Mi denotes the size of the regions into which the signal of the i-th color component of the video frame is divided.
1. Outline of operation of encoding device
Fig. 87 shows the configuration of the video encoding device of embodiment 16. The input video signal 1 in the 4:4:4 format is separated into the picture components 505b0, 505b1, and 505b2 of the respective color components by the color component separator 502, and these are input to the picture encoders 503b0, 503b1, and 503b2, which have the same configuration. At this time, the color component separator 502 supplies the sizes Mi (506b0, 506b1, and 506b2) of the rectangular regions that define the coding unit to the corresponding picture encoders 503b0, 503b1, and 503b2, respectively. Thus, each picture encoding unit 503b0, 503b1, and 503b2 that encodes the color component Ci divides the corresponding picture component 505b0, 505b1, and 505b2 according to the rectangular region size Mi and performs encoding in units of that division.
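As an informal illustration, the routing done by the color component separator can be sketched as follows. This is a minimal Python sketch under assumed data structures; the dictionary keys, function name, and tuple layout are invented for illustration and are not part of the described apparatus:

```python
def separate_color_components(frame, sizes):
    """Split a 4:4:4 frame into its three color-component pictures and
    pair each with its coding block size Mi (cf. blocks 502/506 of fig. 87).

    `frame` is a dict of three equally sized planes keyed 'C0', 'C1', 'C2';
    `sizes` gives (M0, M1, M2).  The naming is illustrative only.
    """
    return [(frame[f"C{i}"], sizes[i]) for i in range(3)]

# Each picture encoder receives its plane together with its own Mi.
frame = {"C0": [[0] * 4] * 4, "C1": [[1] * 4] * 4, "C2": [[2] * 4] * 4}
parts = separate_color_components(frame, (16, 32, 32))
print([m for _, m in parts])  # [16, 32, 32]
```

The point of the sketch is only that each component travels with its own rectangular region size, so the three picture encoders can use different block sizes independently.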
The detailed operation of the picture encoding units 503b0, 503b1, and 503b2 will be described below. The picture encoding units 503b0, 503b1, and 503b2 according to embodiment 16 are described as a modification of the configuration shown in fig. 74. Fig. 88 shows their internal structure. In the figure, functional blocks and signal lines having the same numbers as those of the picture coding unit in fig. 74 are the same as those of fig. 74 unless otherwise specified. In the following description of the picture coding units 503b0, 503b1, and 503b2, the picture components 505b0, 505b1, and 505b2 representing the respective color components are referred to as the input signal 505, and the items of information 506b0, 506b1, and 506b2 representing the size Mi of the rectangular region of the designated coding unit are referred to as the coding block size indication information 506. The input signal 505 is divided into rectangular blocks by the block dividing unit 40 based on the coding block size indication information 506. In the following description, the input signal 505 corresponding to the color component Ci is referred to as a picture, the data unit of the encoding process.
The following coding method is available: when the input video signal 1 is a signal expressed in a luminance/color difference color space (for example, Y, Cb, Cr or Y, Co, Cg), the luminance component is assigned to 505b0 and the color difference components to 505b1 and 505b2. In this case, the luminance component is a signal in which the texture information of the image signal is concentrated, while the color difference signal has a low correlation with the texture component of the luminance and is visually a signal component whose role is to color a monochrome image. Therefore, in signal prediction based on the texture structure, such as motion compensation prediction between frames and the intra-frame spatial prediction used in AVC, the block size used as the unit of prediction need not be the same as for the luminance component. Rather than predicting the color difference signal with the same block size as the luminance component, coding efficiency can be improved by predicting it with an individual block size that best exploits the in-picture correlation of the color difference signal. For example, when the Y component is C0, the Cb component is C1, and the Cr component is C2, setting M0 = 16 and M1 = M2 = 32 enlarges the block size of the color difference components relative to the luminance component, so the overhead information per coding unit (prediction mode, motion vector, etc.) for two of the three components can be reduced to about one fourth of that of the luminance component. Fig. 89 shows this state.
In addition, Mi may be determined according to the image size. For example, comparing an HDTV video signal (1920 pixels × 1080 lines) with a low-resolution video showing the same content, such as a CIF signal (352 pixels × 288 lines), a 4 pixel × 4 line block of the HDTV signal corresponds to only about a one-pixel region of the CIF signal. Thus, the higher the image resolution, the smaller the substantial image texture area covered by each pixel. In motion compensation prediction between frames and intra-frame spatial prediction, the similarity of the texture of the original image is detected and the signal portion with the highest similarity is used as the prediction value, so when the signal within the block used as the unit of prediction does not hold a certain degree of texture, prediction cannot be performed satisfactorily (prediction performance is hindered by noise components). Therefore, it is preferable to increase the block size in a high-resolution video so as to cover the texture region that would be covered at low resolution. Accordingly, the larger the image size, the larger Mi can be made. Hereinafter, the rectangular region composed of Mi×Mi pixels is referred to as a macroblock.
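The idea that Mi should grow with resolution can be sketched as below. The concrete thresholds and returned sizes are hypothetical; the text only motivates the monotonic relationship, not specific cut-offs:

```python
def choose_macroblock_size(width, height):
    """Pick a macroblock size Mi from the picture resolution.

    The thresholds below are invented for illustration only: the text
    states only that higher-resolution video benefits from larger blocks,
    not these specific cut-off points.
    """
    pixels = width * height
    if pixels >= 1920 * 1080:      # HDTV-class material and above
        return 32
    if pixels >= 704 * 576:        # SD-class material
        return 16
    return 8                       # CIF-class low resolution

# HDTV frames get larger blocks than CIF frames.
print(choose_macroblock_size(1920, 1080))  # 32
print(choose_macroblock_size(352, 288))    # 8
```

Whatever the actual rule, the decoder only needs the resulting Mi via the coding block size indication information 506, so the encoder is free to use any such heuristic.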
For the input signal 505 divided into macroblocks by the block divider 40, first, the predictor 461 performs: the intra prediction processing for performing spatial prediction from the peripheral pixels stored in the memory 16b in which the local decoding of the current picture is completed, or the motion compensation prediction processing for each color component using a reference image from the prediction reference image data of one or more frames stored in the memory 16 b. The operation of the process of the prediction unit 461 in embodiment 16 is different from that of the prediction unit 461 in fig. 74, and will be described below.
1.1 Intra prediction processing
The prediction unit 461 performs intra prediction processing on a macroblock basis using the reference image 701 stored in the memory 16b. The intra prediction modes include an intra N×N prediction mode, which performs spatial prediction using peripheral pixels in units of blocks of N pixels × N lines, and a macroblock-unit intra prediction mode, which performs spatial prediction using peripheral pixels in units of the macroblocks shown in fig. 89.
(a) Intra NxN prediction mode
A macroblock is divided into blocks each composed of N×N pixels, and spatial prediction is performed for each block. As the block size N used as the unit for performing intra N×N prediction, a value obtained by equally dividing the macroblock size Mi is selected. For example, if Mi = 16, N is either 4 or 8, and if Mi = 32, N is one of 4, 8, and 16. As prediction values, pixels of the already encoded and locally decoded blocks around the current block (upper left, upper right, left) that are stored in the memory 16b are used. As prediction modes, a plurality of modes such as those shown in fig. 3 are prepared. As in embodiment 1, fig. 3 shows the prediction mode types when N = 4; nine prediction modes are defined. Any one of the nine modes is selected in units of 4×4 pixel blocks.
Intra4×4_pred_mode = 0: the adjacent upper pixels are used as the prediction image as they are.
Intra4×4_pred_mode = 1: the adjacent left pixels are used as the prediction image as they are.
Intra4×4_pred_mode = 2: the average value of the adjacent eight pixels is used as the prediction image.
Intra4×4_pred_mode = 3: a weighted average is computed for every two to three adjacent pixels and used as the prediction image (corresponding to a right 45-degree edge).
Intra4×4_pred_mode = 4: a weighted average is computed for every two to three adjacent pixels and used as the prediction image (corresponding to a left 45-degree edge).
Intra4×4_pred_mode = 5: a weighted average is computed for every two to three adjacent pixels and used as the prediction image (corresponding to a left 22.5-degree edge).
Intra4×4_pred_mode = 6: a weighted average is computed for every two to three adjacent pixels and used as the prediction image (corresponding to a left 67.5-degree edge).
Intra4×4_pred_mode = 7: a weighted average is computed for every two to three adjacent pixels and used as the prediction image (corresponding to a right 22.5-degree edge).
Intra4×4_pred_mode = 8: a weighted average is computed for every two to three adjacent pixels and used as the prediction image (corresponding to a left 112.5-degree edge).
When N = 4, 16 items of mode information are required for each macroblock. In order to suppress the code amount of the mode information itself, predictive encoding based on the mode information of adjacent blocks is performed, exploiting the fact that the mode information has a high correlation between adjacent blocks. When N is 8 or 16, although not illustrated, spatial prediction modes reflecting the directionality of the image texture are defined as in the case of N = 4, and intra prediction processing is performed in units of the N×N subblocks into which the Mi×Mi macroblock is equally divided.
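For concreteness, the first three Intra4×4_pred_mode values (vertical, horizontal, DC) can be sketched in Python as follows; the directional weighted-average modes 3 to 8 are omitted, and the list-of-lists block representation and function name are assumptions made for illustration:

```python
def intra4x4_predict(mode, top, left):
    """Toy 4x4 spatial prediction for the first three Intra4x4_pred_mode values.

    `top`  : the 4 reconstructed pixels directly above the block
    `left` : the 4 reconstructed pixels directly to the left
    Only modes 0 (vertical), 1 (horizontal), and 2 (DC) are sketched.
    """
    if mode == 0:                                  # copy the row above downwards
        return [top[:] for _ in range(4)]
    if mode == 1:                                  # copy the left column rightwards
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:                                  # mean of all 8 neighbours
        dc = (sum(top) + sum(left) + 4) // 8       # +4 rounds to nearest
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("mode not sketched")

pred = intra4x4_predict(2, [10, 10, 10, 10], [20, 20, 20, 20])
print(pred[0][0])  # 15 -- the rounded mean of the neighbouring pixels
```

The encoder would evaluate each candidate mode this way and keep the one whose prediction difference is cheapest to code, as described next.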
(b) Macroblock unit intra prediction mode
This mode predicts the Mi×Mi pixel block corresponding to the macroblock size at one time. When Mi = 16, one of the four modes shown in fig. 4 is selected in units of macroblocks. As in the intra N×N prediction mode, pixels of the surrounding macroblocks (upper left, upper, and left) that have already been encoded, subjected to local decoding processing, and stored in the memory 16b are used for generating the predicted image.
Intra16×16_pred_mode = 0: the lowermost 16 pixels of the upper macroblock are used as the prediction image.
Intra16×16_pred_mode = 1: the rightmost 16 pixels of the left macroblock are used as the prediction image.
Intra16×16_pred_mode = 2: the average value of 32 pixels in total, the 16 lowermost pixels of the upper macroblock (part A in fig. 4) and the 16 leftmost pixels of the left macroblock (part B in fig. 4), is used as the prediction image.
Intra16×16_pred_mode = 3: the prediction image is obtained by using 31 pixels in total, namely the lower-right pixel of the upper-left macroblock, the lowermost 15 pixels of the upper macroblock (excluding the blank pixels), and the rightmost 15 pixels of the left macroblock (excluding the blank pixels), and performing predetermined arithmetic processing (weighted addition corresponding to the pixel used and the predicted pixel position). When Mi is not 16, a macroblock-unit spatial prediction mode to which the directionality of the image texture is added is defined, as in the case where Mi = 16.
As a criterion for evaluating the prediction efficiency for intra prediction mode selection by the prediction unit 461, for example, the rate distortion cost given by Jm = Dm + λRm (λ: a positive number) can be used. Here, Dm is the coding distortion or the prediction error amount when the intra prediction mode m is applied. The coding distortion is obtained by computing the prediction difference signal using the intra prediction mode m, transforming and quantizing it, decoding the video from the result, and measuring the error with respect to the signal before coding. The prediction error amount is obtained by computing the difference between the predicted image under the intra prediction mode m and the signal before encoding and quantifying its magnitude, for example as the Sum of Absolute Differences (SAD). Rm is the amount of code generated when the intra prediction mode m is applied. That is, Jm is a value that specifies the tradeoff between the amount of code and the degree of degradation when the intra prediction mode m is applied, and the intra prediction mode m that yields the smallest Jm gives the best solution.
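A minimal sketch of the rate distortion selection Jm = Dm + λRm might look like this; the cost numbers below are invented purely to show how λ trades distortion against rate:

```python
def pick_intra_mode(costs, lam):
    """Select the mode m minimising Jm = Dm + lambda * Rm.

    `costs` maps mode -> (Dm, Rm): distortion (or a prediction-error
    amount such as SAD) and the bits spent when coding with that mode.
    The numbers used below are purely illustrative.
    """
    return min(costs, key=lambda m: costs[m][0] + lam * costs[m][1])

costs = {0: (120.0, 10), 1: (90.0, 40), 2: (100.0, 20)}
# With a small lambda, distortion dominates and mode 1 wins;
# a large lambda penalises rate and shifts the choice to mode 0.
print(pick_intra_mode(costs, 0.1))  # 1
print(pick_intra_mode(costs, 5.0))  # 0
```

The same J = D + λR form reappears below for inter prediction mode selection, with the motion vector v and reference image r added to the argument list.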
1.2 motion compensated prediction processing
The prediction unit 461 further performs inter-frame motion compensation prediction processing in units of macroblocks using the encoded local decoded image 15 stored in the memory 16b. As the block size for performing motion compensation prediction, one of the division shapes Mi×Mi, Mi×(Mi/2), (Mi/2)×Mi, and (Mi/2)×(Mi/2) pixels can be selected in units of macroblocks, as shown in fig. 90(a) to (d). Further, when (Mi/2)×(Mi/2) is selected, one of the sizes (Mi/2)×(Mi/2), (Mi/2)×(Mi/4), (Mi/4)×(Mi/2), and (Mi/4)×(Mi/4) can be selected for each (Mi/2)×(Mi/2) block, as shown in fig. 90(e) to (h).
Further, as shown in fig. 90(i) to (l), regions obtained by dividing a macroblock unequally may be used as motion compensation prediction units. In general, an image signal contains subjects with contours, and motion is often discontinuous across a contour boundary. When only a macroblock and the rectangular blocks that are subsets of it serve as the units of motion detection, and an object boundary inside a block causes a motion discontinuity, prediction efficiency cannot be raised unless the block division is made finer and the number of motion vectors is increased. If regions obtained by dividing a macroblock unequally are prepared as motion compensation prediction units, as in fig. 90(i) to (l), the motion discontinuity along the object contour can be covered with a smaller number of motion vectors and the prediction efficiency can be improved.
In general, when a contour exists within a macroblock, its position and shape can vary widely, and defining them all would require defining every possible intra-block division beyond the shapes shown in fig. 90(i) to (l). By instead defining the unit regions constituting the unevenly divided shapes with blocks no smaller than (Mi/2)×(Mi/2), as in fig. 90(i) to (l) of embodiment 16, the following effects are obtained: the code amount of the additional information to be encoded for representing the division patterns is suppressed, the amount of computation required for motion detection over the division patterns is suppressed, and the memory bandwidth is kept low by making the accesses to the memory 16b for generating prediction values efficient.
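The evenly divided shapes of fig. 90(a) to (h) can be enumerated mechanically, as sketched below; the unevenly divided shapes of fig. 90(i) to (l) are defined pictorially in the text and are not reproduced here:

```python
def even_partitions(mi):
    """Enumerate the evenly divided motion-compensation block shapes of
    fig. 90(a)-(h) for macroblock size Mi, as (width, height) pairs.

    `top` lists the macroblock-level choices (fig. 90(a)-(d)); `sub` lists
    the further choices available per (Mi/2)x(Mi/2) block (fig. 90(e)-(h)).
    The representation is an illustrative assumption.
    """
    h = mi // 2
    q = mi // 4
    top = [(mi, mi), (mi, h), (h, mi), (h, h)]
    sub = [(h, h), (h, q), (q, h), (q, q)]
    return top, sub

top, sub = even_partitions(16)
print(top)  # [(16, 16), (16, 8), (8, 16), (8, 8)]
print(sub)  # [(8, 8), (8, 4), (4, 8), (4, 4)]
```

Restricting the inter prediction mode definition, as discussed next, amounts to removing entries from these lists (e.g. dropping `sub` at low bit rates).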
Which of the divisions in fig. 90 is used for motion compensation prediction is determined as the inter prediction mode, and a motion vector assigned to each divided region is generated and output. The types of inter prediction modes usable in a given picture may be defined so that all of the division patterns in fig. 90 can be specified, or the division patterns selectable as inter prediction modes may be restricted according to conditions, in order to reduce the amount of computation required to select the optimal inter prediction mode and the code amount of the information specifying it. For example, the finer a macroblock is divided, the more motion vector information must be encoded; therefore, when encoding at a low bit rate, the configuration may exclude the division patterns subdivided below (Mi/2)×(Mi/2) pixels shown in fig. 90(e) to (h) and instead make selectable the division patterns shown in fig. 90(i) to (l), for which the code amount of the motion vectors can be kept small. For example, since the size of the quantization parameter can serve as a criterion for judging the bit rate level, the definition of the inter prediction modes can be switched according to the initial value of the quantization parameter at the time of encoding the picture. Alternatively, a dedicated identification bit for determining the definition of the inter prediction modes may be multiplexed into the bit stream.
Furthermore, in pictures that use predicted images from a plurality of reference images, such as MPEG-2 B pictures and AVC bidirectional prediction, where motion vectors must be encoded individually for each reference image, the division patterns subdivided below (Mi/2)×(Mi/2) pixels shown in fig. 90(e) to (h) may be excluded to reduce the amount of motion vector information, and the division patterns shown in fig. 90(i) to (l), for which the code amount of the motion vectors can be kept small, may be made selectable instead. The definition of the inter prediction modes may also be switched based on information indicating the encoding conditions of the immediately preceding encoded picture, or based on information estimating the motion of the entire picture. For example, the following method is available: for a scene with complicated motion, the inter prediction mode definition is chosen so that finer division patterns can be used, while in situations where motion is judged to be uniform and prediction is sufficient even in units of large blocks, a definition without the finer division patterns is used. In addition, the reference picture used for prediction value generation may be specified for each divided block in the macroblock, and the identification number of that reference picture may be encoded.
As a criterion for evaluating the prediction efficiency for inter prediction mode selection in the motion compensation prediction process, for example, the rate distortion cost given by Jm,v,r = Dm,v,r + λRm,v,r (λ: a positive number) can be used. Here, Dm,v,r is the coding distortion or the prediction error amount when the inter prediction mode m, the motion vector v determined for it, and the reference image r are applied. The coding distortion is obtained by computing the prediction difference signal using the inter prediction mode m, the motion vector v, and the reference image r, transforming and quantizing it, decoding the video from the result, and measuring the error with respect to the signal before coding. The prediction error amount is obtained by computing the difference between the predicted image under the inter prediction mode m, the motion vector v, and the reference image r and the signal before encoding and quantifying its magnitude, for example as the Sum of Absolute Differences (SAD). Rm,v,r is the amount of code generated when the inter prediction mode m, the motion vector v, and the reference image r are applied. That is, Jm,v,r is a value that specifies the tradeoff between the amount of code and the degree of degradation when the inter prediction mode m and the motion vector v are applied with the reference image r, and the combination of m, v, and r that yields the smallest Jm,v,r gives the best solution.
1.3 Picture coding Process
The prediction unit 461 performs intra prediction processing for all of the intra prediction modes shown in figs. 3 and 4, or a subset thereof, to generate intra prediction images in units of macroblocks, and performs motion compensation prediction processing for all of the motion compensation prediction modes shown in fig. 90, or a subset thereof, to output the predicted image 7 for the Mi×Mi block. The predicted image 7 is subtracted from the input signal 505 by the subtractor 3 to obtain the prediction difference signal 4. The prediction efficiency of the prediction difference signal 4 is evaluated in the coding mode determination unit 5, and the prediction mode yielding the best prediction efficiency for the macroblock to be predicted among the prediction processes performed by the prediction unit 461 is output as the coding mode 6. That is, the coding mode 6 includes information identifying the macroblock type: whether an intra N×N prediction mode as in fig. 3, a macroblock-unit intra prediction mode as in fig. 4, or motion compensation prediction using one of the division patterns shown in fig. 90 is used. In embodiment 16, the types of coding modes selectable in the current picture are switched using the coding mode definition selection information 711 determined by the coding control unit 19 or the like. As the coding mode definition selection information 711, instead of dedicated selection instruction information, for example, the initial value of the quantization parameter 21 when encoding the current picture, the coding block size indication information 506 notified to the block dividing unit 40, and the like may be used singly or in combination. When the coding mode 6 is selected, a weight coefficient 20 for each coding mode determined by the coding control unit 19 may also be taken into account.
The optimal prediction difference signal 4 obtained by using the coding mode 6 in the coding mode determination unit 5 is output to the transform unit 310.
The transform unit 310 transforms the input prediction difference signal 4 composed of Mi×Mi pixel blocks and outputs it to the quantization unit 9 as transform coefficients. In the transform, the Mi×Mi pixel block is divided into L×L pixel blocks (L ≤ Mi, where Mi is a multiple of L) and transformed, and the transform block size L is specified by the transform block size designation flag 464. With this configuration, a transform process suited to the local properties of the signal in the Mi×Mi pixel block can be performed. The transform block size L may be set to the most efficient value found by trying all settable values of L, or may be matched with the block size of the intra prediction mode or the motion compensation prediction mode. In the latter case, since the coding mode 6 contains information corresponding to the transform block size designation flag 464, the flag need not be separately multiplexed into the bitstream. The quantization unit 9 quantizes the input transform coefficients based on the quantization parameter 21 determined by the encoding control unit 19, and outputs them to the variable length encoding unit 11 as the quantized transform coefficients 10. The quantized transform coefficients 10 are entropy-encoded by the variable length coding unit 11 by means of huffman coding, arithmetic coding, or the like. The quantized transform coefficients 10 are also restored to the local decoded prediction difference signal 14 via the inverse quantization unit 12 and the inverse transform unit 312, and are added by the adder 18 to the predicted image 7 generated by the prediction method corresponding to the coding mode 6, thereby generating the local decoded image 15.
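The tiling of an Mi×Mi prediction difference block into L×L transform blocks can be sketched as follows (only the tiling step; the actual transform, e.g. an AVC-style integer transform, is not shown):

```python
def split_into_transform_blocks(mi, l):
    """Return the top-left (row, col) coordinates of the LxL transform
    blocks tiling an Mi x Mi prediction-difference block.

    Mirrors the constraint in the text: L <= Mi and Mi is a multiple of L.
    A real encoder would then apply the transform to each tile.
    """
    if mi % l != 0:
        raise ValueError("transform size L must divide the macroblock size Mi")
    return [(y, x) for y in range(0, mi, l) for x in range(0, mi, l)]

# A 16x16 block with L=8 splits into four 8x8 transform blocks.
print(split_into_transform_blocks(16, 8))  # [(0, 0), (0, 8), (8, 0), (8, 8)]
```

Since L is signalled by the transform block size designation flag 464 (or implied by the coding mode), the decoder can reproduce exactly this tiling when inverse-transforming.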
Based on the block filtering control flag 24 indicating whether or not to apply block filtering, the local decoded image 15 is either subjected to block-boundary distortion removal filtering by the block filtering unit 462 or stored in the memory 16b as it is, for use in subsequent prediction processing. The block filtering unit 462 refers to the coding block size indication information 506 and the transform block size designation flag 464 and performs optimal block distortion removal on both macroblock boundaries and transform block boundaries. Since the same processing must be performed in the decoding apparatus, the block filtering control flag 24 is input to the variable length coding unit 11 and multiplexed into the bit stream.
In the variable length encoding unit 11, the coding block size indication information 506 defining the macroblock size Mi, the quantized transform coefficients 10, the coding mode 6, the prediction overhead information 463, and the quantization parameter 21 are entropy-coded by means of huffman coding, arithmetic coding, or the like, arranged and shaped into a bit stream according to a predetermined rule (syntax), and output to the transmission buffer 17. The prediction overhead information 463 of embodiment 16 includes the prediction mode information (Intra4×4_pred_mode, Intra16×16_pred_mode, and the like) used for each prediction unit block when intra prediction processing is selected as the coding mode 6. When motion compensation prediction processing is selected as the coding mode 6, it includes the motion vector information and reference picture indices matching the division pattern determined for each of the macroblock types shown in fig. 90. The transmission buffer 17 smooths the bit stream in accordance with the bandwidth of the transmission path to which the encoding apparatus is connected and the reading speed of the recording medium, and outputs it as the video stream 422b0. Feedback information is also output to the encoding control unit 19 according to the state of bit stream accumulation in the transmission buffer 17, and the amount of code generated in encoding subsequent video frames is controlled. The video stream 422b0 is output divided into units of slices, each grouping a plurality of macroblocks.
2. Structure of coded bit stream
Through the above processing, the input video signal 1 to the encoding apparatus is independently encoded by the three picture encoding units 503b0, 503b1, and 503b2 and output as the video streams 422b0, 422b1, and 422b2 in units of slices grouping a plurality of macroblocks; these are arranged by the multiplexing unit 504 and output from the encoding apparatus as the video stream 422c of the input video signal 1 composed of the three components.
Fig. 91 shows the data arrangement of the video streams 422b0, 422b1, and 422b2 output from the picture encoding units 503b0, 503b1, and 503b2. Each video stream obtained by picture coding is configured as encoded data covering the macroblocks included in the same picture, grouped into data units such as slices in which a plurality of macroblocks are collected. A picture level header, referred to by the macroblocks belonging to the same picture as a common parameter, is prepared, and the coding block size indication information 506 and the coding mode definition selection information 711 are stored in it. For all macroblocks in a picture, the macroblock size Mi is determined using the coding block size indication information 506 included in the referenced picture level header, and the variable length coding procedure for the coding mode 6 is determined according to the coding mode definition selection information 711.
Each slice starts with a slice header, and the slice header includes a color component identification flag 721 indicating which color component's encoded data the slice contains (information identifying which of 505b0, 505b1, and 505b2 it is; the same as for the slice encoded data obtained by the independent encoding process of fig. 69). The encoded data of each macroblock in the slice follows the slice header (in this example, a slice in the picture is shown containing K macroblocks). In the data of each macroblock, the coding mode 6, the prediction overhead information 463, the transform block size designation flag 464, the quantization parameter 21 (only when it is changed in macroblock units), and the quantized transform coefficients 10 are arranged. The video stream 422c output from the encoding apparatus of fig. 87 is a format multiplexing the three components, i.e., the video streams 422b0, 422b1, and 422b2 having the structure of fig. 91. In fig. 91, the coding block size indication information 506 and the coding mode definition selection information 711 are placed in the picture level header, but the information for the three components may instead be stored in a sequence level header assigned in units of a sequence grouping a plurality of video frames. This eliminates the need to encode and transmit this information in individual picture level headers, and the amount of header information can be reduced.
3. Outline of operation of decoding device
The decoding apparatus in fig. 92 receives the video stream 422c output from the encoding apparatus in fig. 87, performs decoding processing on a macroblock-by-macroblock basis, and restores each video frame.
In fig. 92, the upper header analysis unit 610 decodes upper header information such as a sequence level header and a picture level header for the video stream 422c, and stores the decoded upper header information in a predetermined memory area that can be referred to by the color component determination unit 602 and the picture decoding units 603b0, 603b1, and 603b 2.
The color component determination unit 602 identifies, from the value of the color component identification flag 721 shown in fig. 91, which color component's picture the slice corresponds to, and distributes the slice to the appropriate picture decoding unit (603b0, 603b1, 603b2). With this configuration of the decoding device, even when a video stream in which the three color components are interleaved is received, it is easy to determine which slice belongs to which color component picture and to decode it correctly.
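A minimal sketch of this dispatch, assuming flag values 0/1/2 for C0/C1/C2 and a stub in place of the real picture decoding units:

```python
class PictureDecoderStub:
    """Stand-in for picture decoding units 603b0/603b1/603b2."""
    def __init__(self, name):
        self.name = name
        self.slices = []

    def decode(self, slice_data):
        self.slices.append(slice_data)
        return self.name

def dispatch_slice(color_component_flag, decoders):
    """Color component determination (602): select the picture decoding
    unit for a slice using the identification flag 721 carried in the
    slice header. The 0/1/2 flag encoding is an assumption."""
    return decoders[color_component_flag]
```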
3.1 Outline of operation of the picture decoding unit 603
The detailed operation of the picture decoders 603b0, 603b1, and 603b2 will be described below. In the description of the picture decoders 603b0, 603b1, and 603b2 according to embodiment 16, a modification of the configuration of fig. 77 will be described. Fig. 93 shows the internal structure of the picture decoders 603b0, 603b1, and 603b 2. In the figure, functional blocks and signal lines to which the same numbers are assigned as those of the picture decoding unit in fig. 77 are the same as those of fig. 77 unless otherwise specified.
The picture decoding units 603b0, 603b1, and 603b2 receive the C0, C1, or C2 slice encoded data distributed by the color component determination unit 602, perform decoding processing in units of macroblocks each composed of samples of a single color component, and restore and output the signal 427b0 (427b1, 427b2) of the corresponding color component of the video frame.
The variable length decoding unit 25 receives the video stream 422c as an input, decodes the video stream 422c according to a predetermined rule (syntax), and extracts a slice header, the quantization-completed transform coefficient 10 of each macroblock, the prediction overhead information 463, the transform block size designation flag 464, the quantization parameter 21, and the coding mode 6. In fig. 92, although it is described that a sequence or picture level header is decoded by the upper header analysis unit 610, in this case, information such as the coding block size indication information 506 and the coding mode definition selection information 711 may be referred to before decoding of a slice is started in the picture decoding unit 603 that decodes the corresponding color component. When decoding is performed by the variable length decoding unit 25 in each picture decoding unit 603, a picture-level header is decoded by the variable length decoding unit 25 before decoding of a slice is started, and information such as the coding block size indication information 506 and the coding mode definition selection information 711 is extracted from a bitstream. Although not shown, the encoding mode definition selection information 711 is used to determine the variable length decoding procedure when the encoding mode 6 is decoded in the variable length decoding unit 25.
The quantized transform coefficient 10 is input, together with the quantization parameter 21, to the inverse quantization unit 12, which performs the same processing as in the picture encoding units 503b0, 503b1, and 503b2, and is subjected to inverse quantization. The output is then input to the inverse transform unit 312, which performs the same processing as in the picture encoding units, and is restored to the local decoded prediction difference signal 14. In these steps, the transform block size designation flag 464, which specifies the transform block size used as the unit of inverse transform and inverse quantization, is referred to, and the coding block size indication information 506 is referred to so that the inverse transform outputs a prediction error image arranged as an Mi×Mi pixel block. On the other hand, the prediction unit 461 differs from that of the picture encoding units 503b0, 503b1, and 503b2 in that it includes only the processing for generating the predicted image 7 by referring to the coding mode 6 and the prediction overhead information 463: the coding mode 6 and the prediction overhead information 463 are input to the prediction unit 461 to obtain the predicted image 7. The coding block size indication information 506 is notified to the prediction unit 461 so that a predicted image for the target Mi×Mi pixel block is generated in accordance with the macroblock size Mi.
When the coding mode 6 indicates an intra prediction mode such as intra N×N prediction or macroblock-unit intra prediction, the predicted image 7 is obtained from the prediction overhead information 463, which includes the intra prediction mode information for each N×N block or for each macroblock, the macroblock size Mi, and the reference image 701 stored in the memory 16b. When the coding mode 6 indicates inter (motion-compensated) prediction, the intra-macroblock partition pattern of fig. 90 is identified from the coding mode 6, and the predicted image 7 is obtained from the motion vector and reference picture index contained in the prediction overhead information 463, the macroblock size Mi, and the reference image 701 stored in the memory 16b.
The local decoded prediction difference signal 14 and the predicted image 7 are added by the adder 18 to obtain the decoded image 427b0 (427b1, 427b2). For use in motion-compensated prediction of subsequent macroblocks, block distortion removal processing may be applied to the decoded image 427b0 (427b1, 427b2) in the deblocking filter 26 based on the deblocking filter control flag 24, in the same manner as in the picture encoding units 503b0, 503b1, and 503b2. In this case the processing result of the deblocking filter 26 is stored in the memory 16b so that it can be referred to as the reference image 701 in subsequent picture decoding, and block distortion removal suited to each macroblock and transform block is performed by referring to the coding block size indication information 506 and the transform block size designation flag 464, as in encoding. The decoded image 427b0 (427b1, 427b2) is stored in the memory 16b for use in subsequent motion-compensated prediction processing. It contains samples of a single color component only; a color video frame is composed by collecting, in units of video frames, the decoded images 427b0, 427b1, and 427b2 output by the picture decoding units 603b0, 603b1, and 603b2 that decode the respective color components.
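The per-macroblock flow just described (inverse quantization 12, inverse transform 312, addition of the predicted image 7 in the adder 18, and storage as reference image 701) can be sketched as follows; the component implementations are toy stand-ins, not the actual transforms:

```python
def inverse_quantize(quantized_coeffs, qp):
    # toy inverse quantization (unit 12): scale back by the quantization parameter
    return [c * qp for c in quantized_coeffs]

def inverse_transform(coeffs):
    # placeholder inverse transform (unit 312): identity, for illustration only
    return list(coeffs)

def clip8(v):
    # keep decoded samples in 8-bit range
    return max(0, min(255, v))

def decode_macroblock(quantized_coeffs, qp, predicted_image, memory):
    """Per-macroblock decoding flow of fig. 93 (names illustrative):
    inverse quantize, inverse transform to the local decoded prediction
    difference signal 14, add predicted image 7, store the result."""
    coeffs = inverse_quantize(quantized_coeffs, qp)
    residual = inverse_transform(coeffs)
    decoded = [clip8(r + p) for r, p in zip(residual, predicted_image)]
    memory.append(decoded)  # kept as reference image 701 for later prediction
    return decoded
```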
According to the encoding and decoding devices described above, in order to efficiently encode a 4:4:4-format color video signal, predictive coding is performed independently for each color component, and the macroblock size used for prediction and coding can be switched dynamically according to the signal properties of each color component. Therefore, in low-bit-rate coding at high compression ratios, where the ratio of the code amount of the prediction overhead information 463 (intra prediction modes, motion vectors, reference picture indices, and the like) to the total code amount becomes large, that code amount can be suppressed efficiently. Furthermore, in motion-compensated prediction, unevenly divided intra-macroblock patterns that improve prediction efficiency with a small number of motion vectors are used to improve the balance between prediction efficiency and code amount, the types of inter prediction modes representing these partition patterns are diversified to raise prediction efficiency for various motions, and the set of required inter prediction modes is switched according to coding conditions such as the bit rate and image resolution. As a result, an encoding device and a decoding device that efficiently encode a 4:4:4-format color video signal can be provided.
In addition, although embodiment 16 has been described with the encoding device of fig. 88, in which the macroblock dividing unit 40 is added to fig. 74, and the corresponding decoding device, the same effects can be obtained with encoding devices in which the macroblock dividing unit 40 is added to the parts of other embodiments that perform individual, independent encoding processing for each color component, and with the corresponding decoding devices. Further, by replacing the part implementing the individual encoding processing in the encoding device of fig. 71 with the encoding device of fig. 87, and the part implementing the individual decoding processing in the decoding device of fig. 75 with the decoding device of fig. 92, an encoding device and a decoding device that encode a 4:4:4-format color video signal with higher adaptability and efficiency can be provided.
Embodiment 17
In embodiment 17, an encoding device and a decoding device that dynamically switch the motion vector detection accuracy when performing motion compensation prediction processing in a prediction unit in the encoding device and the decoding device of embodiment 16 will be described.
Originally, only discrete pixel information generated by sampling (hereinafter, integer pixels) exists in the input signal 505 as a digital image, but the technique of creating virtual samples by interpolation between integer pixels and using them as the predicted image is in wide use. This technique has two effects: prediction accuracy improves because the number of candidate prediction points increases, and prediction efficiency improves because the filtering effect that accompanies interpolation reduces singular points in the predicted image. On the other hand, raising the accuracy of the virtual samples also requires raising the accuracy of the motion vector that expresses the motion amount, so care must be taken with the resulting increase in code amount.
In encoding systems such as MPEG-1 and MPEG-2, half-pixel prediction, which allows virtual samples with a precision of up to 1/2 pixel, is used. Fig. 94 shows how samples of half-pixel accuracy are generated. In the figure, A, B, C, and D denote integer pixels, and e, f, g, h, and i denote virtual samples of half-pixel accuracy generated from A to D.
e=(A+B)//2
f=(C+D)//2
g=(A+C)//2
h=(B+D)//2
i=(A+B+C+D)//4
(where,// denotes division with rounding (into integers))
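As a sketch, the half-pixel formulas above can be written directly in code; `div_round` stands for the rounding division denoted by `//` in the text, and the centre sample i averages all four integer pixels:

```python
def div_round(n, d):
    # the "//" of the text: integer division with rounding to the nearest integer
    return (n + d // 2) // d

def half_pel_samples(A, B, C, D):
    """Half-pixel-accuracy virtual samples from integer pixels A..D,
    laid out as in fig. 94 (A, B on the top row; C, D below)."""
    e = div_round(A + B, 2)          # horizontal, between A and B
    f = div_round(C + D, 2)          # horizontal, between C and D
    g = div_round(A + C, 2)          # vertical, between A and C
    h = div_round(B + D, 2)          # vertical, between B and D
    i = div_round(A + B + C + D, 4)  # centre: average of all four pixels
    return e, f, g, h, i
```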
In addition, MPEG-4 (ISO/IEC 14496-2) adopts 1/4-pixel accuracy prediction using virtual samples of up to 1/4-pixel accuracy. In 1/4-pixel accuracy prediction, half-pixel samples are generated first, and 1/4-pixel accuracy samples are then generated from them. To suppress excessive smoothing in half-pixel sample generation, a filter with a large number of taps is used so that the frequency components of the original signal are preserved as much as possible. For example, in the 1/4-pixel accuracy prediction of MPEG-4, the half-pixel accuracy virtual sample a used for generating 1/4-pixel accuracy virtual samples is generated from the eight surrounding integer pixels as follows. The expression below covers only the horizontal case; the half-pixel accuracy virtual sample a and the integer pixels X-4 to X4 in the expression are in the positional relationship shown in fig. 95.
a=(COE1*X1+COE2*X2+COE3*X3+COE4*X4+COE-1*X-1+COE-2*X-2+COE-3*X-3+COE-4*X-4)//256
(wherein, COEk: filter coefficients (coefficient sum 256). // denotes division with rounding. )
In AVC (ISO/IEC 14496-10), when generating a half-pixel-precision virtual sample, a filter having six taps of [1, -5, 20, 20, -5, 1] is used, and a linear interpolation process similar to the above-described generation of half-pixel samples of MPEG-1 and MPEG-2 is further performed to generate a virtual sample having 1/4 pixel precision.
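A hedged sketch of this two-stage scheme (six-tap half-pel filtering followed by linear interpolation for quarter-pel); the normalization by 32 with rounding and the 8-bit clipping are assumptions about scaling steps not spelled out above:

```python
H_TAPS = [1, -5, 20, 20, -5, 1]  # six-tap half-pel filter named in the text

def clip8(v):
    return max(0, min(255, v))

def half_pel_6tap(row, x):
    """Half-pel sample midway between row[x] and row[x+1], computed from
    the six surrounding integer pixels of a horizontal row."""
    acc = sum(t * row[x - 2 + k] for k, t in enumerate(H_TAPS))
    return clip8((acc + 16) >> 5)  # assumed rounding and normalization by 32

def quarter_pel(p, q):
    # quarter-pel sample by linear interpolation of two neighbouring samples
    return (p + q + 1) >> 1
```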
1. Operation of an encoding device
In embodiment 17, the precision of the virtual samples in the motion compensation prediction process can be specified as either half-pixel or 1/4-pixel precision, and the encoding device and decoding device of embodiment 17 are accordingly configured so that the virtual sample accuracy to be used can be specified for each color component. Fig. 96 shows the structure of the picture encoding units 503b0, 503b1, and 503b2 in embodiment 17; they differ from the picture encoding units 503b0, 503b1, and 503b2 of fig. 88 only in the operation of the prediction unit 461 and the variable length encoding unit 11.
The prediction unit 461 in embodiment 17 receives the virtual pixel accuracy indication information 800, determines from it the accuracy of the virtual pixels used for motion vector detection, and processes accordingly. Although not shown, the configuration is such that the virtual pixel accuracy indication information 800 is specified individually for each color component Ci to the picture encoding unit 503 that encodes that component. When the virtual pixel accuracy indication information 800 indicates "perform motion vector detection with 1/4-pixel accuracy", the prediction unit 461 generates samples of half-pixel accuracy with a multi-tap filter as in MPEG-4 or AVC described above, then generates samples of 1/4-pixel accuracy by linear interpolation, and performs motion vector detection. When it indicates "perform motion vector detection with only half-pixel accuracy", the prediction unit performs motion vector detection while generating half-pixel accuracy samples with a multi-tap filter as in MPEG-4 or AVC described above, or by linear interpolation as in MPEG-1 or MPEG-2. Since the decoding device must generate virtual samples by the same method to obtain the predicted image, the virtual pixel accuracy indication information 800 is multiplexed into the bitstream and output. The half-pixel sample generation method may be fixed to a single method carried out by the same procedure in the encoding and decoding devices, or a plurality of generation methods may be prepared, with the method used multiplexed into the bitstream as the virtual sample generation method indication information 811 and conveyed to the decoding device.
As a method of setting the virtual pixel accuracy indication information 800, the following is conceivable, for example: when encoding is performed in a color space such as Y, Cb, Cr, the information 800 is set to "motion vector detection with 1/4-pixel accuracy" for the Y component, which strongly reflects the texture structure of the image, so that motion is detected with fine accuracy, while it is set to "motion vector detection with only half-pixel accuracy" for the color difference components (Cb, Cr), whose correlation with the texture structure is lower than that of the Y component signal. Besides changing the indicated virtual pixel accuracy per color component in this way, for signals such as RGB in which every component retains the image texture structure to some degree, the information 800 can be set to "motion vector detection with 1/4-pixel accuracy" for all components so that motion is detected with fine accuracy; thus flexible motion-compensated prediction processing matched to the properties of each color component signal can be performed for any color space.
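One possible per-component setting of the indication information 800, following the suggestion above; the concrete values are an illustrative assumption, not a normative table:

```python
QUARTER_PEL = "1/4"  # "motion vector detection with 1/4-pixel accuracy"
HALF_PEL = "1/2"     # "motion vector detection with only half-pixel accuracy"

def default_virtual_pixel_accuracy(color_space):
    """Assumed per-component defaults for indication info 800: fine
    accuracy for the texture-bearing Y component, half-pel for Cb/Cr;
    quarter-pel for every component of an RGB signal."""
    if color_space == "YCbCr":
        return {"Y": QUARTER_PEL, "Cb": HALF_PEL, "Cr": HALF_PEL}
    if color_space == "RGB":
        return {c: QUARTER_PEL for c in "RGB"}
    raise ValueError("unknown color space: " + color_space)
```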
The virtual pixel accuracy indication information 800 is also sent to the variable length encoding unit 11 and used to identify the unit of the motion vector values (included in the prediction overhead information 463) detected by the prediction unit 461. Let MV be the motion vector to be encoded and PMV the prediction vector determined for MV by a predetermined prediction value determination procedure; already-encoded values are used for PMV. The variable length encoding unit 11 encodes the value MV - PMV. When the virtual pixel accuracy indication information 800 indicates "perform motion vector detection with 1/4-pixel accuracy", the unit of the MV value is 1/4 pixel; when it indicates "perform motion vector detection with only half-pixel accuracy", the unit of the MV value is 1/2 pixel. A motion vector expressed in units of 1/4 pixel has twice the range in both its horizontal and vertical components compared with one expressed in units of 1/2 pixel. Therefore, when only samples of 1/2-pixel accuracy are used, setting the unit of MV to 1/2 pixel reduces the amount of information needed to encode MV compared with using 1/4 pixel as the unit.
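A sketch of the unit scaling described here, assuming motion vectors are held internally in 1/4-pel units; halving the coded value under half-pel-only accuracy shrinks its range and hence shortens its variable-length code:

```python
QUARTER_PEL = "1/4"
HALF_PEL = "1/2"

def mv_value_to_code(mv_quarter_units, pmv_quarter_units, accuracy):
    """Value entropy-coded for one motion vector component (MV - PMV).
    Internal 1/4-pel representation is an assumption for illustration."""
    diff = mv_quarter_units - pmv_quarter_units
    if accuracy == HALF_PEL:
        # half-pel vectors occupy only even 1/4-pel positions
        assert diff % 2 == 0
        diff //= 2
    return diff

def code_to_mv_value(coded, pmv_quarter_units, accuracy):
    # decoder-side inverse, mirroring the unit chosen by indication info 800
    step = 2 if accuracy == HALF_PEL else 1
    return pmv_quarter_units + coded * step
```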
By exploiting this property, the virtual pixel accuracy indication information 800 can be adjusted not only to signal properties that differ with the color space, but also, in high-compression encoding where the ratio of the code amount of prediction overhead information 463 such as motion vectors to the total code amount becomes high, to perform encoding that suppresses the code amount of the motion vectors. Since the virtual pixel accuracy indication information 800 can be set independently for each color component, control matched to the state of each color component at high compression is possible, enabling more adaptive encoding processing.
2. Structure of coded bit stream
Fig. 97 shows the data arrangement of the video streams 422b0, 422b1, and 422b2 output from the encoding device of fig. 96. Compared with the stream arrangement of fig. 91, the difference is that the virtual pixel accuracy indication information 800 is multiplexed into the picture-level header. The decoding device receiving the bitstream can thereby identify, for each color component, the unit of the motion vector values included in the prediction overhead information 463, decode the motion vectors in the same way as the encoding device, and generate the predicted image. When, as described above, a plurality of half-pixel sample generation methods can be prepared, the virtual sample generation method indication information 811 may also be multiplexed into the picture-level header. In fig. 97 the virtual pixel accuracy indication information 800 and the virtual sample generation method indication information 811 are multiplexed into the picture-level header area, but they may instead be gathered for the three color components and multiplexed into a higher-level header area such as the sequence-level header.
3. Operation of decoding apparatus
Fig. 98 shows a configuration of a decoding device (picture decoders 603b0, 603b1, and 603b2) according to embodiment 17. The operations of the variable length decoding unit 25 and the prediction unit 461 are different from those of the picture decoding units 603b0, 603b1, and 603b2 in fig. 93. The variable length decoder 25 decodes the video streams 422b0, 422b1, and 422b2 shown in fig. 97, extracts the virtual pixel accuracy indication information 800 included in the picture level header from the video streams, and outputs the extracted information to the predictor 461. When the value of the virtual pixel accuracy indication information 800 indicates "motion vector detection with 1/4 pixel accuracy", the unit of the motion vector value included in the prediction overhead information 463 is set to 1/4 pixels, and the prediction overhead information 463 is sent to the prediction unit 461. The prediction unit 461 generates a prediction image while generating a sample of 1/4 pixel accuracy by linear interpolation after generating a sample of half pixel accuracy by a multi-tap filter such as MPEG-4 or AVC described above, in accordance with the case where the value of the motion vector included in the prediction overhead information 463 is 1/4 pixel units.
On the other hand, when the value of the virtual pixel accuracy indication information 800 indicates "motion vector detection with only half-pixel accuracy", the unit of the motion vector values included in the prediction overhead information 463 is taken to be 1/2 pixel, and the prediction overhead information 463 is sent to the prediction unit 461. Matching the motion vector values being in units of 1/2 pixel, the prediction unit 461 generates the predicted image while generating samples of half-pixel accuracy with a multi-tap filter as in MPEG-4 or AVC described above, or by linear interpolation as in MPEG-1 or MPEG-2. When the device is configured so that a plurality of half-pixel sample generation methods can be selected, the variable length decoding unit 25 extracts the virtual sample generation method indication information 811 of fig. 97 from the bitstream and conveys it to the prediction unit 461, so that half-pixel samples are generated by the same method as in the encoding device.
According to the encoding and decoding devices of embodiment 17 described above, when motion-compensated prediction is performed independently for each color component in order to efficiently encode a 4:4:4-format color video signal, the accuracy of the virtual samples used in motion vector detection and predicted image generation can be switched dynamically according to the signal properties of each color component. Therefore, in low-bit-rate coding at high compression ratios, where the ratio of the motion vector code amount to the total code amount becomes large, encoding that efficiently suppresses the motion vector code amount can be performed. Further, by preparing a plurality of virtual sample generation methods, such as different types of interpolation filters used in virtual sample generation, and switching among them selectively, optimal motion-compensated prediction processing matched to the signal properties of each color component becomes possible, and an encoding device and a decoding device that efficiently encode a 4:4:4-format color video signal can be provided.
In embodiment 17, the encoding apparatus of fig. 96 to which the virtual pixel accuracy instruction information 800 is added to fig. 88 of embodiment 16 and the decoding apparatus of fig. 98 to which the virtual pixel accuracy instruction information 800 is added to fig. 93 are used for description, but similar effects can be obtained even when the encoding apparatus to which the virtual pixel accuracy instruction information 800 is added to the diagrams of other embodiments and the decoding apparatus to which the virtual pixel accuracy instruction information 800 is added are used.
Embodiment 18
In embodiment 18, another form of the encoding and decoding devices of the above embodiments that perform individual, independent encoding processing (in which a single color component is encoded and decoded independently of the other color components) is described, taking embodiment 16 as an example. In individual, independent encoding processing, the coding mode 6 and the prediction overhead information 463 such as motion vectors, which in principle cannot be compressed lossily, are multiplexed into the bitstream for every color component, so compression performance suffers in high-compression encoding where the ratio of their code amount to the total code amount is high. Therefore, in the encoding device of embodiment 18, information such as the coding mode 6 and the prediction overhead information 463 obtained as the result of encoding a macroblock of a specific color component (for example, fixed to the C0 component) is held as reference information. When the picture encoding unit 503 that processes another color component encodes the macroblock located at the same position in the image space as the C0-component macroblock from which the reference information was obtained, it can choose either to encode using the reference information as it is, or to determine the coding mode 6 and the prediction overhead information 463 individually for its own color component, and it multiplexes, in units of macroblocks, a prediction information encoding instruction flag indicating which of the two procedures was selected. Thus, when the correlation between color components with respect to the coding mode 6 and the prediction overhead information 463 is high, their code amount can be reduced efficiently and the compression efficiency improved.
When the coding mode 6 and the prediction overhead information 463 are determined individually for the device's own color component and encoded, the encoding and decoding procedures described in embodiments 16 and 17 can be used as they are. When encoding with reference to the reference information, the encoding side must wait for the encoding processing of the other component until the reference information is obtained, but the multiplexing of the coding mode 6, the prediction overhead information 463, and the like into the bitstream is simply skipped, so the encoding procedures described in embodiments 16 and 17 can be used almost unchanged. The decoding side must decode the reference information first, but otherwise simply decodes the prediction information encoding instruction flag to determine whether to use the reference information or the information decoded for its own macroblock, and the decoding procedures described in embodiments 16 and 17 can be used as they are.
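The per-macroblock selection of embodiment 18 can be sketched as a simple choice driven by the prediction information encoding instruction flag; the boolean representation of the flag is an assumption:

```python
def choose_prediction_info(use_reference_flag, reference_info, own_info):
    """Embodiment 18 selection: the prediction information encoding
    instruction flag says whether to reuse the coding mode and prediction
    overhead obtained for the reference component (e.g. C0) at the same
    image position, or the values coded for this color component itself."""
    return reference_info if use_reference_flag else own_info
```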
Further, whether the prediction information encoding instruction flag is multiplexed in units of macroblocks may itself be determined in a higher data layer (slice, picture, or sequence) and multiplexed into the bitstream; the flag then needs to be multiplexed at the macroblock level only when it is required, such as under high compression, which can improve coding efficiency. The reference information need not be limited to a specific color component; encoding and decoding may be performed while selecting which color component to use as the reference.
The configuration of embodiment 18 is not limited to embodiment 16, and can be applied to all of the encoding devices and decoding devices in the embodiments of the present application, which perform individual and independent encoding processing for encoding and decoding a single color component and other color components independently.

Claims (1)

1. An image decoding apparatus that receives a bit stream obtained by compression-encoding a color video signal composed of a plurality of color components, and decodes the color video signal by selectively applying an intra-picture decoding process or a motion compensated predictive decoding process to each of the color components of the color video signal, the image decoding apparatus comprising:
a color component identification unit that decodes a color component identification flag identifying a color component to which an input bitstream belongs and determines to which color component the bitstream includes encoded information;
a decoding unit configured to decode, for each of the color components determined by the color component identifying unit, a prediction mode indicating a prediction image generation method used for encoding the coding unit region, corresponding prediction overhead information, and information obtained by encoding a prediction error from a bitstream for each coding unit region in accordance with a predetermined syntax;
a prediction image generation unit configured to generate a prediction image for the signal of the coding unit region, based on the decoded prediction mode and the corresponding prediction overhead information;
A prediction error decoding unit that decodes a prediction error signal based on information obtained by encoding the prediction error; and
an adder for adding the output of the predicted image generator and the output of the prediction error decoder,
the decoding unit decodes a prediction information encoding instruction flag indicating whether the prediction mode and the corresponding prediction overhead information used for encoding an encoding target region at the same image position in another color component constituting the same screen are to be used as they are, or whether a prediction mode unique to the component's own color and the corresponding prediction overhead information are to be used for decoding, and determines the prediction mode and the corresponding prediction overhead information used in the predicted image generating unit based on the value of the prediction information encoding instruction flag.