CN110012294B

CN110012294B - Encoding method and decoding method for multi-component video

Info

Publication number: CN110012294B
Application number: CN201910263325.1A
Authority: CN
Inventors: 李国平; 王国中; 商习武; 方志军
Original assignee: Shanghai University of Engineering Science
Current assignee: Shanghai University of Engineering Science
Priority date: 2019-04-02
Filing date: 2019-04-02
Publication date: 2021-03-23
Anticipated expiration: 2039-04-02
Also published as: CN110012294A

Abstract

The invention belongs to the technical field of video coding and decoding, and discloses an encoding method for multi-component video, comprising the following steps: Step 1, with a GOP (group of pictures) as the unit, reorganizing each component of the multi-component video to obtain single-component video sequences; Step 2, artificially adding two components U and V to each single-component video sequence so that it is converted into an existing three-component video format, and then encoding it according to an existing encoding method to obtain corresponding encoded data. A decoding method for multi-component video is also disclosed, comprising the following steps: Step i, decoding the encoded data according to an existing decoding method to obtain decoded data in the corresponding three-component video format, and then removing the two artificially added components U and V to obtain the corresponding single-component video sequences; Step ii, recombining all the single-component video sequences to obtain the corresponding multi-component video. The method of the present invention is highly versatile.

Description

Encoding method and decoding method for multi-component video

Technical Field

The invention belongs to the technical field of video coding and decoding, and particularly relates to a coding method and a decoding method for multi-component video.

Background

The different color spaces such as RGB, HIS, YUV and CMYK are only different representations of the same physical quantity, and these representations are used in different fields. YUV is a color encoding scheme commonly used in video processing; when encoding photographs or video, it allows the chrominance bandwidth to be reduced in view of human perception. Historically, YUV and Y'UV were used to encode analog television signals, while YCbCr is used to describe digital video signals suitable for video and picture compression and transmission, such as MPEG and JPEG.

To save bandwidth, most YUV formats use fewer than 24 bits per pixel on average. The main sampling formats are YCbCr 4:2:0, YCbCr 4:2:2 and YCbCr 4:4:4, written in A:B:C notation:

4:4:4 indicates complete sampling.

4:2:2 denotes 2:1 horizontal sampling and full vertical sampling.

4:2:0 denotes 2:1 horizontal sampling and 2:1 vertical sampling.
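As a minimal sketch of what the three sampling formats mean for plane sizes, the following computes the dimensions of one chroma plane and the average bits per pixel. The function names and the lookup table are hypothetical, and 8-bit samples with even frame dimensions are assumed.

```python
# Hypothetical helper table: (horizontal divisor, vertical divisor) applied to
# each chroma plane for the three sampling formats listed above.
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),  # complete sampling
    "4:2:2": (2, 1),  # 2:1 horizontal, full vertical
    "4:2:0": (2, 2),  # 2:1 horizontal, 2:1 vertical
}


def chroma_plane_size(width, height, sampling):
    """Return (width, height) of one U or V plane; assumes even dimensions."""
    h_div, v_div = CHROMA_DIVISORS[sampling]
    return width // h_div, height // v_div


def average_bits_per_pixel(sampling, bit_depth=8):
    """One luma sample per pixel plus two subsampled chroma planes."""
    h_div, v_div = CHROMA_DIVISORS[sampling]
    return bit_depth * (1 + 2 / (h_div * v_div))
```

Under these assumptions, only 4:4:4 reaches the full 24 bits per pixel; 4:2:2 and 4:2:0 average 16 and 12 bits respectively.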

In a video coding sequence, there are three main types of coded frames: I frames, P frames and B frames:

● I frame (intra-coded picture): encoded using only the information of the frame itself, without reference to other frames;

● P frame (predictive-coded picture): inter-frame predictive coded by motion prediction from the preceding I frame or P frame;

● B frame (bidirectionally predictive-coded picture): coded by motion prediction from both a preceding frame (I frame or P frame) and a subsequent frame (P frame), and provides the highest compression ratio.

A group of pictures (GOP) refers to the span between two I frames; the first frame in a GOP is therefore an I frame.

Conventional video coding usually operates on the three YUV components. In some specific fields, however, video images are represented by four or more components. In technology developed in recent years, image analysis is performed not only on the three RGB primary colors but also on light invisible to the human eye, such as infrared and ultraviolet light, or on information obtained by photographing an object at specific wavelengths within RGB; the analyzed information is used for, among other things, analyzing the sugar content of fruit and pathological analysis of internal organs. A multispectral image (also referred to as a "multiband image" or "multichannel image"), which includes many color components other than RGB, contains a large number of spectral bands, and the amount of data in such an image tends to be large. It is therefore necessary to compress such image data by encoding and decoding, so as to reduce the data amount when a multispectral image is transmitted or recorded on a recording medium. Few methods currently exist for processing such images; they can handle only specific types of images, have poor generality, use complex algorithms and are slow.

Disclosure of Invention

The invention provides an encoding method and a decoding method for multi-component video, which solve problems of existing processing methods such as being able to process only specific types of images and having poor generality.

The invention can be realized by the following technical scheme:

an encoding method for multi-component video, the number of components being greater than three, comprising the following steps:

step one, with a GOP as the unit, recombining each component of the multi-component video to obtain single-component video sequences;

and step two, artificially adding two components U, V to each single-component video sequence to convert it into an existing three-component video format, and then encoding the converted sequence according to an existing encoding method to obtain corresponding encoded data.

Further, the multi-component video is uniformly divided into a plurality of multi-component GOPs; the same component in all frames of each multi-component GOP is gathered into one group, and the data group formed by that component is defined as a single-component video sequence, so that a plurality of single-component video sequences are formed.

Further, the method of obtaining corresponding encoded data comprises the steps of:

step I, artificially adding two components U, V to each data of the single-component video sequence to convert the data into the existing three-component video format;

step II, coding the converted single component video sequence according to the existing coding method, and marking the component type of the component video corresponding to the first I frame image of each GOP image group to obtain corresponding coded data;

step III, repeating the steps I and II, completing the coding of all the single-component video sequences in a multi-component GOP image group, and obtaining corresponding coded data;

and step IV, repeating step III to complete the encoding of all multi-component GOPs and obtain the encoded data corresponding to the multi-component video.

Further, the component U is the same for every frame of data and the component V is the same for every frame of data, and each of U and V takes any value from 0 to 255.

Further, the existing encoding method is transform coding, motion estimation and motion compensation, entropy coding, or hybrid coding, and the GOP length set in the encoding process is the same as the length of the multi-component GOP.

A decoding method based on the above-described encoding method for multi-component video, comprising the steps of:

step i, decoding the encoded data according to an existing decoding method to obtain decoded data in the corresponding three-component video format, and then removing the two artificially added components U, V to obtain the corresponding single-component video sequence;

and step ii, recombining all the single-component video sequences again to obtain the corresponding multi-component video.

Further, the method of obtaining a corresponding single-component video sequence comprises: setting the GOP length of the decoder to be the same as the GOP length of the encoder that produced the encoded data; decoding the GOP to be decoded according to the existing decoding method to obtain decoded data in the corresponding three-component video format; storing the three components of the decoded data at different addresses; removing the two artificially added components U, V; and obtaining the single-component video sequence of the corresponding component type according to the component type of the multi-component video that was marked for the first I frame of the GOP during encoding.

Further, the method of recombining again comprises: judging whether the component type of the first single-component video sequence is the first component of the multi-component video; if so, recombining all the single-component data by applying the inverse of the operation used to recombine the components of the multi-component video; if not, deleting the first single-component video sequence and judging whether the component type of the second single-component video sequence is the first component of the multi-component video, and so on until a single-component video sequence whose component type is the first component of the multi-component video is found, and then recombining all the remaining single-component data by the same inverse operation.

The beneficial technical effects of the invention are as follows:

Through recombination, the multi-component video is turned into single-component video sequences, and U, V components are then artificially added so that each single-component video sequence conforms to a three-component video format that existing encoding and decoding methods can process. The single-component video can thus be encoded and decoded, and the decoded multi-component video is finally obtained by recombining again with the inverse of the recombination operation. The method can handle the encoding and decoding of different types of multi-component video without regard to the types of the additional components, so its generality is strong; at the same time, because recombination and inverse recombination allow existing encoders and decoders to be used, the calculation process is simple. Because the images in each single-component GOP belong to the same component type, and the correlation among images of the same component type is the strongest, the compression effect during encoding is good; experiments also verify that the compression effect of the invention is very good.

Drawings

FIG. 1 is a schematic overview of the process of the present invention.

Detailed Description

The following detailed description of the preferred embodiments will be made with reference to the accompanying drawings.

The invention provides an encoding method for multi-component video, wherein the number of components is more than three, as shown in figure 1, the method specifically comprises the following steps:

step one, with a GOP image group as a unit, recombining each component of a multi-component video to obtain a single-component video sequence.

Specifically, the multi-component video is uniformly divided into a plurality of multi-component GOPs; the same component in all frames of each multi-component GOP is gathered into one group, and the data group formed by that component is defined as a single-component video sequence, so that a plurality of single-component video sequences are formed.

Assume that the multi-component video has four components, denoted Y, Z, H and X, and a frame rate of F fps, as shown below:

(Y1Z1H1X1)，(Y2Z2H2X2)，……，(YnZnHnXn) (1)

wherein (Y1Z1H1X1) represents the 1st frame of the four-component video sequence, (YnZnHnXn) represents the n-th frame, and Yn, Zn, Hn, Xn represent the n-th frame data of components Y, Z, H, X respectively. Because the four components lack correlation with one another, they cannot be sent directly to an encoder for encoding and must first be recombined. The data format after recombination is as follows:

Y1,Y2,…Yk,Z1,Z2,…,Zk,H1,H2,…,Hk,X1,X2,…Xk,Yk+1,Yk+2,…Y2k,Zk+1,Zk+2,…,Z2k,Hk+1,Hk+2,…,H2k,Xk+1,Xk+2,…X2k,Y2k+1,Y2k+2,…Y3k,…… (2)

where k denotes the length of a multi-component GOP, and Y1, Y2, …, Yk denotes one single-component video sequence; each component occurs for k consecutive frames, and the components then repeat cyclically. The value of k depends on the encoder GOP size. Since the frame rate of the four-component video is F fps, the frame rate of the single-component video sequence is 4F fps.
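The recombination of formula (1) into formula (2) can be sketched as follows. This is an illustrative sketch only: the function name `reorganize` and the representation of a frame as a dictionary from component name to plane data are assumptions, and the number of frames is assumed to be a multiple of k, as implied by the uniform division.

```python
def reorganize(frames, components, k):
    """Group each component of every k-frame multi-component GOP.

    frames:     list of dicts, one per frame, mapping component name -> plane data
    components: component names in their fixed order, e.g. "YZHX"
    k:          length of a multi-component GOP (assumed to divide len(frames))

    Returns a list of (component, [planes]) pairs in the order of formula (2).
    """
    sequences = []
    for start in range(0, len(frames), k):
        gop = frames[start:start + k]
        for name in components:
            # All planes of one component within this GOP form one
            # single-component video sequence.
            sequences.append((name, [frame[name] for frame in gop]))
    return sequences
```

For example, with four frames and k = 2, the output order is Y1, Y2, Z1, Z2, H1, H2, X1, X2, Y3, Y4, and so on.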

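Step two, artificially adding two components U, V to the single-component video sequence to convert it into the existing three-component video format, and then encoding it according to the existing encoding method to obtain corresponding encoded data. The specific steps are as follows: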

step I, artificially adding two components U, V to each data of the single-component video sequence, and converting the data into the existing three-component video format.

After the four-component video data is decomposed, the resulting single-component video sequences are to be video encoded, but conventional encoding methods require three components, such as YUV (YCbCr 4:2:0, YCbCr 4:2:2 and YCbCr 4:4:4). To fit the conventional encoding methods, U and V components must be added to the single-component video sequence, which is therefore transformed as follows:

(Y1,U1,V1),(Y2,U2,V2),…(Yk,Uk,Vk),(Z1,U1,V1),(Z2,U2,V2),…,(Zk,Uk,Vk),(H1,U1,V1),(H2,U2,V2),…,(Hk,Uk,Vk),(X1,U1,V1),(X2,U2,V2),…(Xk,Uk,Vk),(Yk+1,Uk+1,Vk+1),(Yk+2,Uk+2,Vk+2),…(Y2k,U2k,V2k),(Zk+1,Uk+1,Vk+1),(Zk+2,Uk+2,Vk+2),…,(Z2k,U2k,V2k),(Hk+1,Uk+1,Vk+1),(Hk+2,Uk+2,Vk+2),…,(H2k,U2k,V2k),(Xk+1,Uk+1,Vk+1),(Xk+2,Uk+2,Vk+2),…(X2k,U2k,V2k),(Y2k+1,U2k+1,V2k+1),(Y2k+2,U2k+2,V2k+2),…(Y3k,U3k,V3k),…… (3)

Thus each single-component video sequence takes an existing three-component video format such as YUV, where (Y1, U1, V1) represents the 1st frame of the three-component video sequence and (Yk, Uk, Vk) represents the k-th frame. The chroma format is chosen according to what the encoder supports (YCbCr 4:2:0, YCbCr 4:2:2 or YCbCr 4:4:4). The value of each added U or V component is any fixed value between 0 and 255: the component U is the same in every frame and the component V is the same in every frame, while the U value and the V value may be equal to or different from each other.
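A minimal sketch of this extension step is given below, assuming 8-bit samples, the YCbCr 4:2:0 format, even frame dimensions, and a planar byte layout with the Y plane followed by the U plane and then the V plane. The function name and parameters are hypothetical; the default of 128 is simply the neutral chroma value for 8-bit video, and any fixed value from 0 to 255 would satisfy the description.

```python
def to_yuv420_frame(plane, width, height, u_fill=128, v_fill=128):
    """Append constant U and V planes to one single-component plane.

    plane: bytes of length width * height holding the real component data,
           which takes the place of the Y plane.
    Returns the bytes of one planar 4:2:0 frame: Y, then U, then V.
    """
    if len(plane) != width * height:
        raise ValueError("plane size does not match width * height")
    for fill in (u_fill, v_fill):
        if not 0 <= fill <= 255:
            raise ValueError("fill values must be in 0..255")
    chroma_size = (width // 2) * (height // 2)
    # The same constant planes are reused for every frame of the sequence.
    u_plane = bytes([u_fill]) * chroma_size
    v_plane = bytes([v_fill]) * chroma_size
    return plane + u_plane + v_plane
```

Because the added planes are constant, an encoder should be able to represent them at very little cost, although the actual overhead depends on the codec and its settings.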

Finally, the three-component video sequence of formula (3) is encoded with any existing coding standard (such as MPEG-2, AVS/AVS2, H.264 or H.265), with the encoder parameters set as follows:

(1) the GOP size of the encoder is set to k;

(2) the component type (Y, Z, H, X) of each GOP is described in the user data area of the I frame of that GOP, and the code stream is finally output.

Step II, encoding the converted single-component video sequence according to an existing encoding method, such as transform coding, motion estimation and motion compensation, entropy coding or hybrid coding, to obtain corresponding encoded data.

So that the multi-component video and the artificially constructed three-component video can be processed uniformly and the calculation simplified, the GOP length set in the encoder is the same as the length of the multi-component GOP, and the component type of the component video corresponding to the first I frame of each GOP is marked.

Step III, repeating steps I and II to complete the encoding of all single-component video sequences in one multi-component GOP and obtain corresponding encoded data.

For the above encoded data, the present invention further provides a decoding method for multi-component video, which specifically includes the following steps:

Step i, decoding the encoded data according to an existing decoding method to obtain decoded data in the corresponding three-component video format, and then removing the two artificially added components U, V to obtain the corresponding single-component video sequence.

First, the GOP length of the decoder is set to be the same as the GOP length of the encoder that produced the encoded data. The GOP to be decoded is then decoded according to the existing decoding method to obtain decoded data in the corresponding three-component video format, and the three components of the decoded data are stored at different addresses. The two artificially added components U, V are then removed; for example, if the data at the last two addresses are fixed to be the artificially added components U, V, those two components can be located by address and removed. Finally, the single-component video sequence of the corresponding component type is obtained according to the component type of the multi-component video that was marked for the first I frame of the GOP during encoding.
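Removal of the added components by address can be sketched as follows, assuming a decoded frame arrives as planar 4:2:0 bytes with the Y plane first and the artificially added U and V planes last. The function name is hypothetical, and real decoders may expose the three planes as separate buffers rather than one byte string.

```python
def strip_added_chroma(frame, width, height):
    """Return only the first plane of a planar 4:2:0 frame.

    The first width * height bytes hold the real component data; the two
    trailing planes are the artificially added U and V and are discarded.
    """
    luma_size = width * height
    expected = luma_size + 2 * (width // 2) * (height // 2)
    if len(frame) != expected:
        raise ValueError("frame size does not match a planar 4:2:0 layout")
    return frame[:luma_size]
```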

Step ii, recombining all the single-component data again to obtain the corresponding multi-component video.

First, it is judged whether the component type of the first single-component video sequence is the first component of the multi-component video. If so, all the single-component data are recombined by applying the inverse of the operation used to recombine the components of the multi-component video. If not, the first single-component video sequence is deleted and the component type of the second single-component video sequence is judged in the same way, and so on until a single-component video sequence whose component type is the first component of the multi-component video is found; all the remaining single-component data are then recombined by the same inverse operation.

Because each component of the decoded data in the three-component video format is stored in a different address, all the components are recombined again according to the difference of the addresses to form the multi-component video.
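The inverse operation for the aligned case, where the first sequence already belongs to the first component, can be sketched as follows. The function name and the data representation (a list of labelled single-component sequences, frames as dictionaries) are assumptions made for illustration; an incomplete trailing group is simply ignored here.

```python
def recombine(sequences, components):
    """Rebuild multi-component frames from labelled single-component sequences.

    sequences:  list of (component, [planes]) pairs, starting with the first
                component and cycling through the components in order
    components: component names in their fixed order, e.g. "YZHX"

    Returns a list of dicts, one per frame, mapping component name -> plane.
    """
    n = len(components)
    frames = []
    for start in range(0, len(sequences) - n + 1, n):
        group = sequences[start:start + n]
        names = [name for name, _ in group]
        if names != list(components):
            raise ValueError("unexpected component order: %r" % (names,))
        # Frame i of the GOP takes the i-th plane of every component.
        for i in range(len(group[0][1])):
            frames.append({name: planes[i] for name, planes in group})
    return frames
```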

An embodiment of the present invention is described specifically by taking a 1000-frame four-component video as an example. Each component has a size of 3840 × 2160, and assuming that the frame rate F is 25 fps, formula (1) becomes

(Y1Z1H1X1)，(Y2Z2H2X2)，……，(Y1000,Z1000,H1000,X1000)。

Four-component video data reorganization:

Assuming that the encoder GOP size is set to 25 frames, the reorganized sequence according to formula (2) is:

Y1,Y2,…Y25,Z1,Z2,…,Z25,H1,H2,…,H25,X1,X2,…X25,Y26,Y27,…Y50,Z26,Z27,…,Z50,H26,H27,…,H50,X26,X27,…X50,Y51,Y52,…Y75,……， (4)

wherein, Y1, Y2, …, Y25, Z1, Z2, …, Z25, H1, H2, …, H25, X1, X2, … X25, etc. respectively represent the corresponding single-component video sequence, and the corresponding frame rate is 100 fps.
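The figures of this example can be checked with a small calculation. The helper below is hypothetical and assumes the frame count is a multiple of the GOP length.

```python
def reorganized_stream_stats(num_frames, num_components, gop_length, frame_rate):
    """Counts and frame rate of the reorganized single-component stream."""
    multi_component_gops = num_frames // gop_length
    return {
        "multi_component_gops": multi_component_gops,
        # Each multi-component GOP yields one encoder GOP per component.
        "encoder_gops": multi_component_gops * num_components,
        "single_component_frames": num_frames * num_components,
        "single_component_frame_rate": frame_rate * num_components,
    }


stats = reorganized_stream_stats(1000, 4, 25, 25)
```

For the 1000-frame example this gives 40 multi-component GOPs, 160 encoder GOPs, 4000 single-component frames and the 100 fps frame rate stated above.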

Encoding of a single component video sequence:

Assuming that an H.264 encoder is used, and considering that the data of the U, V components carry no meaning, the YCbCr 4:2:0 chroma format is adopted; the U and V components are each 1920 × 1080 in size, and their values are all set uniformly to 128.

The single component video sequence is transformed using equation (3) as follows:

(Y1,U1,V1),(Y2,U2,V2),…(Y25,U25,V25),(Z1,U1,V1),(Z2,U2,V2),…,(Z25,U25,V25),(H1,U1,V1),(H2,U2,V2),…,(H25,U25,V25),(X1,U1,V1),(X2,U2,V2),…(X25,U25,V25),(Y26,U26,V26),(Y27,U27,V27),…(Y50,U50,V50),(Z26,U26,V26),(Z27,U27,V27),…,(Z50,U50,V50),(H26,U26,V26),(H27,U27,V27),…,(H50,U50,V50),(X26,U26,V26),(X27,U27,V27),…(X50,U50,V50),(Y51,U51,V51),(Y52,U52,V52),…(Y75,U75,V75),…… (5)

The GOP length of the encoder is set to 25, and the component type (Y, Z, H or X) of each GOP is described in the user data area of the first I frame of that GOP. In the above formula, the component type of the first GOP is Y, that of the second GOP is Z, that of the third GOP is H, that of the fourth GOP is X, that of the fifth GOP is Y, and so on. The sequence is then encoded to produce encoded data.
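The cyclic assignment of component types to encoder GOPs can be sketched as below. Both functions are hypothetical helpers using a 0-based GOP index; how the label is actually written into the user data area depends on the coding standard and encoder used and is not modelled here.

```python
def component_for_gop(gop_index, components):
    """Component type of the encoder GOP with the given 0-based index."""
    return components[gop_index % len(components)]


def source_frame_range(gop_index, num_components, k):
    """1-based (first, last) source frame numbers covered by an encoder GOP."""
    first = (gop_index // num_components) * k + 1
    return first, first + k - 1
```

With components "YZHX" and k = 25, the fifth GOP (index 4) is again a Y GOP and covers frames 26 to 50, in agreement with formula (5).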

Decoding of a single component video sequence:

The encoded data is decoded to obtain a three-component video sequence (as shown in formula (5)). The component type (Y, Z, H, X) of each GOP is determined from the description in the user data area of the first I frame of that GOP, and the three components are stored at different addresses, so that the U, V components can be deleted by address to obtain a single-component video sequence. For example, if the description in the user data area of the I frame of a decoded GOP indicates the H component, the following 25 frames are determined to be (H1, U1, V1), (H2, U2, V2), …, (H25, U25, V25); deleting the U, V components by address yields the single-component video sequence H1, H2, …, H25. The other GOPs are processed in the same way, finally yielding a sequence like formula (4). When the first decoded GOP is this H-component GOP, the following sequence is obtained:

H1,H2,…,H25,X1,X2,…X25,Y26,Y27,…Y50,Z26,Z27,…,Z50,H26,H27,…,H50,X26,X27,…X50,Y51,Y52,…Y75,……， (6)

four-component video data combining:

First, it is judged whether the component type of the first single-component video sequence is the first component of the multi-component video. If so, all the single-component data are recombined by applying the inverse of the recombination operation; that is, if the component type of the first single-component video sequence is Y, the inverse operation is applied directly to obtain the four-component video sequence shown in formula (1).

If not, the first single-component video sequence is deleted and the component type of the second single-component video sequence is judged in the same way, and so on until a single-component video sequence whose component type is the first component of the multi-component video is found; the remaining single-component data are then recombined by the inverse operation. That is, if the component type of the first single-component video sequence is H, as shown in formula (6), H1, H2, …, H25 and X1, X2, …, X25 are deleted and the remainder is recombined by the inverse operation. The four-component video sequence finally obtained is:

(Y26Z26H26X26)，(Y27Z27H27X27)，……，(Y1000,Z1000,H1000,X1000)
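The alignment step applied before the inverse operation can be sketched as follows, assuming the decoded stream is available as a list of labelled single-component sequences. The function name is hypothetical.

```python
def drop_until_first_component(sequences, first_component):
    """Discard leading sequences until one of the first component type is found.

    sequences: list of (component, [planes]) pairs in decoded order
    Returns the remaining sequences, or an empty list if none qualifies.
    """
    for index, (name, _) in enumerate(sequences):
        if name == first_component:
            return sequences[index:]
    return []
```

Applied to a stream that begins as in formula (6), the leading H and X sequences are dropped and recombination starts from Y26, which is why the result above begins at frame 26.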

Although particular embodiments of the present invention have been described above, it will be understood by those skilled in the art that these are by way of example only and that various changes or modifications may be made to these embodiments without departing from the spirit and scope of the invention; the scope of the invention is therefore defined by the appended claims.

Claims

1. An encoding method for multi-component video, wherein the number of components is greater than three, comprising the following steps:

Step 1, with a GOP as the unit, recombining each component of the multi-component video to obtain single-component video sequences;

Step 2, artificially adding two components U, V to the single-component video sequence to convert it into an existing three-component video format, and then encoding it according to an existing encoding method to obtain corresponding encoded data;

wherein the multi-component video is uniformly divided into a plurality of multi-component GOPs, the same component in all frames of each multi-component GOP is gathered into one group, and the data group formed by that component is defined as a single-component video sequence, so that a plurality of single-component video sequences are formed.

2. The encoding method for multi-component video according to claim 1, wherein the method for obtaining corresponding encoded data comprises the following steps:

Step I, artificially adding two components U, V to each datum of the single-component video sequence to convert it into an existing three-component video format;

Step II, encoding the converted single-component video sequence according to the existing encoding method, and marking the component type of the component video corresponding to the first I frame of each GOP, to obtain corresponding encoded data;

Step III, repeating steps I and II to complete the encoding of all single-component video sequences in one multi-component GOP and obtain corresponding encoded data;

Step IV, repeating step III to complete the encoding of all multi-component GOPs and obtain the encoded data corresponding to the multi-component video.

3. The encoding method for multi-component video according to claim 2, wherein the component U corresponding to each datum is the same and the component V corresponding to each datum is also the same, and the component U and the component V each take any value from 0 to 255.

4. The encoding method for multi-component video according to claim 2, wherein the existing encoding method is transform coding, motion estimation and motion compensation, entropy coding or hybrid coding, and the GOP length set in the encoding process is the same as the length of the multi-component GOP.

5. A decoding method based on the encoding method for multi-component video according to claim 1, characterized in that it comprises the following steps:

Step i, decoding the encoded data according to an existing decoding method to obtain decoded data in the corresponding three-component video format, and then removing the two artificially added components U, V to obtain the corresponding single-component video sequence;

Step ii, recombining all the single-component video sequences again to obtain the corresponding multi-component video.

6. The decoding method for multi-component video according to claim 5, wherein the method of obtaining a corresponding single-component video sequence comprises: setting the GOP length of the decoder to be the same as the GOP length of the encoder corresponding to the encoded data; then decoding the GOP to be decoded according to the existing decoding method to obtain decoded data in the corresponding three-component video format, and storing the three components at different addresses; then removing the two artificially added components U, V; and obtaining the single-component video sequence of the corresponding component type according to the component type of the multi-component video corresponding to the first I frame of the GOP during encoding.

7. The decoding method for multi-component video according to claim 5, wherein the method of recombining again comprises: judging whether the component type of the first single-component video sequence is the first component of the multi-component video; if so, recombining all the single-component data by applying the inverse of the operation used to recombine the components of the multi-component video; if not, deleting the first single-component video sequence and judging whether the component type of the second single-component video sequence is the first component of the multi-component video, and so on until a single-component video sequence whose component type is the first component of the multi-component video is found, and then recombining all the remaining single-component data by the same inverse operation.