
CN108924551B - Method for predicting video image coding mode and related equipment - Google Patents

Method for predicting video image coding mode and related equipment

Info

Publication number
CN108924551B
Authority
CN
China
Prior art keywords
mode
prediction unit
coding mode
coding
merging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810994848.9A
Other languages
Chinese (zh)
Other versions
CN108924551A (en)
Inventor
张宏顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810994848.9A priority Critical patent/CN108924551B/en
Publication of CN108924551A publication Critical patent/CN108924551A/en
Application granted granted Critical
Publication of CN108924551B publication Critical patent/CN108924551B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/122: Selection of transform size, e.g. 8x8 or 2x4x8 DCT; selection of sub-band transforms of varying structure or type
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region that is a block, e.g. a macroblock
    • H04N19/567: Motion estimation based on rate distortion criteria
    • H04N19/96: Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides a method for predicting a video image coding mode. After a prediction unit is obtained, the method first determines whether the alternative coding mode of the prediction unit is the merge mode. If it is, the method obtains the target coding mode of an associated prediction unit that has an association relationship with the prediction unit, and if that target coding mode satisfies a condition associated with the merge mode, the merge mode is determined as the target coding mode of the prediction unit. The application also provides a prediction apparatus for the video image coding mode, which ensures that the method can be applied and implemented in practice.

Description

Method for predicting video image coding mode and related equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method for predicting a video image coding mode and a related device.
Background
Video compression is a video image processing technology that removes redundant information from a video image sequence, thereby reducing the resources consumed when video images are stored, transmitted, and otherwise processed.
The main process of video coding is as follows. A frame of video image is sent to an encoder. The frame is first divided into a number of coding unit blocks according to a preset maximum coding unit size, and each block is then depth-divided layer by layer until the minimum coding unit size is reached. Intra-frame and inter-frame prediction are performed on each coding unit to obtain a predicted value. The predicted value is subtracted from the input frame to obtain a data residual, which undergoes a discrete cosine transform (DCT) and quantization to produce residual coefficients. The residual coefficients are sent to the entropy coding module, which outputs the code stream. Meanwhile, the residual coefficients are inverse-quantized and inverse-transformed to obtain the residual values of a reconstructed image; adding these residual values to the intra-frame or inter-frame predicted values yields the reconstructed image. After in-loop filtering, the reconstructed image enters the reference frame queue and serves as a reference image when the next frame of video image is encoded.
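The encode/reconstruct loop above can be sketched in a few lines. This is an illustrative sketch, not any codec's actual API: the DCT-plus-quantization pair is collapsed into a uniform scalar quantizer, and all names are assumptions.

```python
# Minimal sketch of the residual encode/reconstruct loop described above.
# The DCT + quantization pair is reduced to a uniform scalar quantizer.
def encode_block(pixels, prediction, qstep=4):
    residual = [p - q for p, q in zip(pixels, prediction)]   # data residual
    coeffs = [round(r / qstep) for r in residual]            # "transform" + quantize
    recon_residual = [c * qstep for c in coeffs]             # dequantize + inverse
    reconstructed = [p + r for p, r in zip(prediction, recon_residual)]
    return coeffs, reconstructed

coeffs, recon = encode_block([100, 108, 96, 97], [100, 100, 100, 100])
print(coeffs, recon)
```

Note that the reconstructed block, not the original, is what later frames reference; this is why the encoder must itself run the inverse quantization and inverse transform.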
The above procedure of performing intra and inter prediction on coding units is referred to as the predictive coding step. The existing predictive coding step has low coding efficiency, which in turn lowers the efficiency of the whole video compression process.
Disclosure of Invention
In view of the above, the present application provides a method for predicting a video image coding mode, which is used to improve the coding efficiency of a predictive coding step.
To this end, the technical solutions provided by the present application are as follows:
in a first aspect, the present application provides a method for predicting a video image coding mode, including:
obtaining a prediction unit obtained by dividing a video image, where the video image is divided into coding units, and the prediction unit is a unit block that is obtained by dividing a coding unit and is used for coding mode prediction;
for a prediction unit, determining whether an alternative coding mode of the prediction unit is the merge mode or the skip mode;
if the alternative coding mode is the merge mode, obtaining a target coding mode of an associated prediction unit of the prediction unit;
and if the target coding mode of the associated prediction unit satisfies a preset condition associated with the merge mode, determining the merge mode as the target coding mode of the prediction unit.
In a second aspect, the present application provides an apparatus for predicting a video image coding mode, comprising:
a prediction unit obtaining module, configured to obtain a prediction unit obtained by dividing a video image, where the video image is divided into coding units, and the prediction unit is a unit block that is obtained by dividing a coding unit and is used for coding mode prediction;
an alternative coding mode determining module, configured to determine, for a prediction unit, whether an alternative coding mode of the prediction unit is the merge mode or the skip mode;
an associated unit mode determining module, configured to obtain a target coding mode of an associated prediction unit of the prediction unit if the alternative coding mode is the merge mode;
and a target coding mode determining module, configured to determine the merge mode as the target coding mode of the prediction unit if the target coding mode of the associated prediction unit satisfies the preset condition associated with the merge mode.
In a third aspect, the present application provides a prediction apparatus for a video image coding mode, comprising: a processor and a memory, the processor executing a software program stored in the memory, calling data stored in the memory, and performing at least the following steps:
obtaining a prediction unit obtained by dividing a video image, where the video image is divided into coding units, and the prediction unit is a unit block that is obtained by dividing a coding unit and is used for coding mode prediction;
for a prediction unit, determining whether an alternative coding mode of the prediction unit is the merge mode or the skip mode;
if the alternative coding mode is the merge mode, obtaining a target coding mode of an associated prediction unit of the prediction unit;
and if the target coding mode of the associated prediction unit satisfies a preset condition associated with the merge mode, determining the merge mode as the target coding mode of the prediction unit.
In a fourth aspect, the present application provides a readable storage medium, on which a computer program is stored, which, when loaded and executed by a processor, implements the above-mentioned prediction method for video image coding mode.
According to the foregoing technical solutions, the present application provides a method for predicting a video image coding mode. After a prediction unit is obtained, it is first determined whether the alternative coding mode of the prediction unit is the merge mode. If so, the target coding mode of an associated prediction unit having an association relationship with the prediction unit is obtained, and if that target coding mode satisfies a preset condition, the merge mode is determined as the target coding mode of the prediction unit. In other words, once the preliminary judgment determines that the prediction unit would use the merge mode, the target coding mode of the associated prediction unit serves as reference information; if this reference information satisfies the preset condition, the merge mode can be used directly as the final coding mode. The prediction of the motion estimation mode is thereby skipped, the computation of the motion estimation prediction process is avoided, the efficiency of the predictive coding step is improved, and the efficiency of the whole video image processing process improves accordingly.
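The flow summarized above might be sketched as follows. The mode names, the `associated_target_modes` input, and the specific preset condition used here (every associated unit also chose merge) are illustrative assumptions, not the application's exact criteria.

```python
MERGE, SKIP, MOTION_ESTIMATION = "merge", "skip", "me"

def predict_target_mode(alternative_mode, associated_target_modes, full_mode_search):
    """Sketch of the claimed flow: reuse merge when the associated units agree.

    alternative_mode: result of the merge-vs-skip comparison.
    associated_target_modes: final coding modes of associated prediction units.
    full_mode_search: fallback that also evaluates the motion estimation mode.
    """
    if alternative_mode == MERGE:
        # Assumed example of the "preset condition associated with the merge
        # mode": every associated prediction unit also settled on merge.
        if associated_target_modes and all(m == MERGE for m in associated_target_modes):
            return MERGE  # motion-estimation prediction is skipped entirely
    return full_mode_search()

print(predict_target_mode(MERGE, [MERGE, MERGE], lambda: MOTION_ESTIMATION))  # → merge
```

When the condition fails (or the alternative mode is not merge), the sketch falls back to the conventional full search, matching the behavior described in the summary.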
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a flow chart of a method for predicting video image coding modes provided in the present application;
FIG. 2 is a diagram illustrating an example of a partition of a coding unit provided in the present application;
FIG. 3 is an exemplary diagram of eight partitions of a prediction unit provided in the present application;
FIG. 4 is another flow chart of a method for predicting video image coding modes provided in the present application;
FIG. 5 is a schematic structural diagram of a prediction apparatus for video image coding modes according to the present application;
FIG. 6 is a schematic structural diagram of prediction equipment for a video image coding mode provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort shall fall within the protection scope of the present application.
Video is trending toward high definition, high frame rates, and high compression rates. Video compression technologies include, but are not limited to, H.264 and HEVC (High Efficiency Video Coding). The encoding protocols of current video compression technologies are complex, the amount of computation in the encoding process is large, and the demands on the computing capacity of the equipment performing video image processing are high.
Current encoding protocols include a predictive coding step that must try multiple coding modes in sequence for each prediction unit in order to find the best one, resulting in low coding efficiency. The present application mainly addresses the low coding efficiency caused by the large amount of computation in this predictive coding step.
Referring to fig. 1, a flow of a prediction method for a video image coding mode provided in the present application is shown. As shown in fig. 1, the method specifically includes steps S101 to S104.
S101: obtaining a prediction unit obtained by dividing a video image; the video image is divided into coding units, and the prediction unit is obtained by dividing the coding units and is used as a unit block predicted by a coding mode.
As mentioned above, the video image processing flow first divides the video image into coding units and then performs predictive coding. To facilitate understanding of the predictive coding step, the coding unit division step is explained first.
After a frame of video image is obtained, its division into coding units may be performed layer by layer. Specifically, the frame is first divided into a number of coding tree blocks (CTBs) according to a preset maximum unit block size, and each coding tree block is then divided in depth, layer by layer, until the blocks of a given layer reach the preset minimum unit block size.
For example, a frame may first be divided into several 64×64 coding tree blocks. Taking one 64×64 coding tree block as an example, it may be further divided into 32×32 coding blocks; one or more of these 32×32 blocks may then be divided into 16×16 blocks, and one or more 16×16 blocks may in turn be divided into 8×8 blocks.
After division, one frame of image is partitioned into coding blocks of various sizes, such as 64×64, 32×32, 16×16, and 8×8, also called coding units (CU). Each coding unit CU can independently perform operations such as predictive coding, transform and quantization, and entropy coding.
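The layer-by-layer pre-division can be sketched as a quadtree enumeration. This is an illustrative helper, not encoder code; whether each split is kept is decided later by the predictive coding step.

```python
def pre_divide(x, y, size, min_size=8):
    """Yield (x, y, size) for every candidate CU in the quadtree pre-division."""
    yield (x, y, size)
    if size > min_size:
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                yield from pre_divide(x + dx, y + dy, half, min_size)

cus = list(pre_divide(0, 0, 64))
# One 64x64 coding tree block yields 1 + 4 + 16 + 64 = 85 candidate CUs
# at sizes 64, 32, 16, and 8.
print(len(cus))  # → 85
```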
It should be noted that, there is a hierarchical structure between coding units, and a coding unit with a larger size can be divided into coding units with a smaller size, so as to form a coding unit structure of upper and lower layers.
The upper-layer coding unit first divides into lower-layer coding units according to a division rule, but this is a pre-division, or simulated (trial) division; whether the lower-layer division is actually performed is finally determined by the result of the predictive coding step described below. That is, an upper-layer coding unit may or may not have lower-layer coding units. See fig. 2, which shows an example of a divided coding unit. As shown in fig. 2, a 64×64 coding unit is divided into four 32×32 coding units, of which the top-right and bottom-left 32×32 units are each divided into four 16×16 units, while the top-left and bottom-right 32×32 units are not divided further. Furthermore, of the four 16×16 coding units at the bottom left, the top-left and bottom-right units are divided into 8×8 coding units, while the top-right and bottom-left units are not.
As mentioned above, whether a given coding unit undergoes lower-layer division is determined by the result of the predictive coding step, which is described in detail below.
A video comprises multiple frames of images that are continuous in time, and video coding performs the encoding process on each frame in sequence. The encoding process includes a predictive coding step. Predictive coding relies on the temporal or spatial correlation of video frames: the correlation information of already-encoded pixels is used to predict uncoded pixels to obtain a predicted value, and the difference between the actual value and the predicted value of the uncoded pixels is then encoded, thereby removing redundancy.
The predictive coding step operates on prediction units (PU). The prediction unit PU is the basic unit on which the prediction operation is performed; to implement predictive coding, the coding unit CU is divided according to preset division manners to obtain prediction units PU. Note that the division from a coding unit CU to prediction units PU has at most one layer, and the smallest PU size is 4×4.
The division may be symmetrical or asymmetrical. For inter prediction, a 2N×2N coding unit CU (where N may be 4, 8, 16, 32, etc.) may divide its prediction units PU in 7 ways, resulting in 7 different types of prediction units PU.
As shown in fig. 3, the 7 division methods are: 2N×2N, 2N×N, N×2N, 2N×nU, 2N×nD, nL×2N, and nR×2N. Here nU indicates that the division line is above the horizontal center line of the coding unit CU, nD that it is below the horizontal center line, nL that it is to the left of the vertical center line, and nR that it is to the right of the vertical center line.
Different partitioning methods can obtain different types of prediction units PU, such as the 7 prediction units PU obtained above.
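The 7 division manners can be written out as rectangle lists. The 1/4 : 3/4 split used below for the asymmetric modes follows HEVC's asymmetric motion partitioning; the function and dictionary names are illustrative.

```python
def pu_partitions(n):
    """Return the PU rectangles (width, height) for each of the 7 divisions
    of a 2Nx2N coding unit. Asymmetric modes use a 1/4 : 3/4 split."""
    s, q = 2 * n, n // 2   # CU side length 2N; asymmetric offset 2N/4
    return {
        "2Nx2N": [(s, s)],
        "2NxN":  [(s, n), (s, n)],
        "Nx2N":  [(n, s), (n, s)],
        "2NxnU": [(s, q), (s, s - q)],
        "2NxnD": [(s, s - q), (s, q)],
        "nLx2N": [(q, s), (s - q, s)],
        "nRx2N": [(s - q, s), (q, s)],
    }

parts = pu_partitions(16)  # a 32x32 CU
# Every division manner must tile the full CU area.
assert all(sum(w * h for w, h in pus) == 32 * 32 for pus in parts.values())
print(parts["2NxnU"])  # → [(32, 8), (32, 24)]
```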
It should be noted that, as with the division of the coding unit CU, the division of the prediction unit PU is also a simulated (trial) division, and the final division must be determined from among these several division manners.
The coding unit CU is pre-divided into 7 forms of prediction units PU, and each form also needs to be predicted, i.e., it must be predicted which coding mode that form of prediction unit PU will use. The coding modes may include, but are not limited to, the merge mode, the skip mode, and the motion estimation (ME) mode. These are conventional coding modes, and the coding effect on the prediction unit PU differs between modes. One measure of coding effect is the rate-distortion cost (RDcost): the lower the rate-distortion cost, the better the coding effect, and vice versa.
It should be noted that the coding modes that may be used differ between the prediction units obtained by the different division forms, and the coding mode used by each prediction unit must be predicted from among the modes it may use. Specifically, for a 2N×2N prediction unit, the possible coding modes are the merge mode, the skip mode, and the motion estimation mode, so one of these three must be selected as the final coding mode. For the six prediction units 2N×N, N×2N, 2N×nU, 2N×nD, nL×2N, and nR×2N, the possible coding modes are the merge mode and the motion estimation mode, so only one of these two needs to be selected.
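The partition-to-candidate-mode mapping described above can be written down directly; the names are illustrative.

```python
# Coding modes tried per PU division type, per the text above:
# 2Nx2N compares merge/skip/ME; the other six compare only merge/ME.
ALLOWED_MODES = {
    "2Nx2N": ("merge", "skip", "me"),
    **{p: ("merge", "me")
       for p in ("2NxN", "Nx2N", "2NxnU", "2NxnD", "nLx2N", "nRx2N")},
}

print(ALLOWED_MODES["2Nx2N"], ALLOWED_MODES["2NxN"])
```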
In the merge mode calculation process, the motion vectors of 5 prediction unit blocks around the prediction unit are calculated, and the motion vector with the smallest rate-distortion cost is selected from the 5. Note that the rate-distortion cost calculation used by the six prediction unit types 2N×N, N×2N, 2N×nU, 2N×nD, nL×2N, and nR×2N differs from that used by 2N×2N.
Specifically, the rate-distortion cost of a 2N×2N prediction unit is calculated as RDcost = SSD + λ × bit, where RDcost is the rate-distortion cost; SSD (Sum of Squared Differences) is the sum of squared errors between the original and reconstructed pixels, and obtaining this value requires performing the DCT transform, quantization, inverse quantization, and inverse transform on the residual signal, which is computationally expensive; λ is the Lagrangian constant; and bit is the number of bits the coding mode needs to consume. Note that, for this type of prediction unit, there is a high probability that the merge mode is selected as the final prediction mode.
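The SSD-based cost can be sketched as follows, with the expensive DCT/quantize/dequantize/inverse round trip abstracted into a given reconstructed block; function and parameter names are illustrative assumptions.

```python
def rd_cost_ssd(original, reconstructed, bits, lam):
    """RDcost = SSD + lambda * bits, the 2Nx2N formulation.

    In a real encoder the reconstructed pixels come from the full
    DCT / quantize / dequantize / inverse-transform round trip, which is
    what makes this cost expensive to evaluate."""
    ssd = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    return ssd + lam * bits

print(rd_cost_ssd([10, 12, 9, 11], [10, 11, 9, 11], bits=6, lam=0.5))  # → 4.0
```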
The rate-distortion cost of the other six prediction unit types is calculated as RDcost = SATD + λ × bit, where SATD (Sum of Absolute Transformed Differences) is the sum of absolute values taken after a Hadamard transform. This is one way of measuring distortion: after the residual signal of the prediction unit is obtained, it is Hadamard-transformed, and the sum of the absolute values of the transformed elements is computed. The remaining formula parameters are as described above.
Note that SSD is more computationally complex than SATD, so the 2N×2N merge mode prediction process is slower.
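The SATD measure can be sketched with a 1-D Hadamard (butterfly) transform. This is a simplified illustration: HEVC encoders typically apply 2-D 4×4 or 8×8 Hadamard transforms to the residual block, and the function names here are assumptions.

```python
def hadamard_1d(v):
    """Recursive 1-D Hadamard transform; len(v) must be a power of two."""
    if len(v) == 1:
        return list(v)
    half = len(v) // 2
    sums = hadamard_1d([v[i] + v[i + half] for i in range(half)])
    diffs = hadamard_1d([v[i] - v[i + half] for i in range(half)])
    return sums + diffs

def rd_cost_satd(residual, bits, lam):
    """RDcost = SATD + lambda * bits for the six non-2Nx2N divisions."""
    satd = sum(abs(c) for c in hadamard_1d(residual))
    return satd + lam * bits

print(rd_cost_satd([1, -1, 2, 0], bits=4, lam=0.5))  # → 10.0
```

Because SATD needs only the forward transform of the residual, it avoids the quantization/reconstruction round trip that SSD requires, which is the speed difference the text notes.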
For ease of understanding, the skip mode is briefly described. The skip mode is a special case of the merge mode in which the current residual is assumed to be 0. If the rate-distortion cost of the skip mode is smaller, the skip mode is selected as the coding mode of the prediction unit. The skip mode applies only to prediction units whose division scheme is 2N×2N.
After a prediction unit PU is obtained by a certain division manner, the coding mode it uses is determined according to the quality of the coding effect. The coding effects of the prediction units PU obtained by different division manners are then compared to determine which division manner the prediction unit PU finally uses.
For example, after the coding unit CU is divided in the 2N×N manner, it may be determined that the merge mode gives the best coding effect for this division. The coding effect in this case is then compared with the coding effects of the other division manners, such as N×2N, to determine whether the 2N×N division or another division is finally used.
This can be illustrated by a concrete application scenario. Suppose the video records a person walking across a platform. In one frame, the background behind the person is entirely a blackboard; in another frame, half of the background is the blackboard and the other half is a white wall. In the former frame, encoding the whole video block containing the person gives the best coding effect, but in the latter frame, dividing that block into left and right prediction units gives a better coding effect than the other division manners. This division manner applies only to this particular scene; in other application scenarios, a prediction unit obtained by some other division manner may be determined by comparison to give a better coding effect than the rest. This also explains why the prediction of the coding mode needs to be performed: to find the prediction unit division with the best coding effect.
It can be seen that the same steps must be performed many times in the predictive coding process: for each prediction unit, determine its coding effect under different coding modes, and then compare these effects to determine its optimal coding mode. Because the coding effects of multiple different coding modes must be calculated, the amount of computation is large, the predictive coding step is inefficient, and the efficiency of the whole video image processing process suffers.
To improve the efficiency of predictive coding, the present application lets the prediction unit skip the prediction of some coding modes by referring to information around the prediction unit, directly determining the coding mode it uses. The specific implementation is as follows.
S102: for a prediction unit, it is determined whether an alternative coding mode of the prediction unit is a merge mode or a skip mode.
The prediction unit to which the present application is directed may be of certain types, such as those obtained by division manners other than 2N×2N, in other words, prediction units obtained by the N×N, 2N×N, N×2N, 2N×nU, 2N×nD, nL×2N, and nR×2N division manners. A prediction unit in this step may therefore be one obtained by a predetermined type of division. For convenience of description, this prediction unit may be referred to as the target prediction unit.
It should be noted that the prediction of the coding modes may follow a sequence, for example merge mode, then skip mode, then motion estimation mode: the coding effect in the merge mode is calculated first, then the coding effect in the skip mode, and after comparing the two, it is determined whether to use the merge mode or the skip mode. For convenience of description, the coding mode determined at this point may be referred to as the alternative coding mode. The alternative coding mode is thus a non-final coding mode produced partway through the multi-mode determination process.
The obtained alternative coding mode may be the merge mode or the skip mode, with the specific mode determined by the coding effect. The coding effect represents the distortion of the image under the coding mode; one expression of the coding effect is the rate-distortion cost, and selecting the alternative coding mode according to the rate-distortion cost proceeds as follows:
in both the merge mode and the skip mode, a preset number of neighboring unit blocks are selected for the prediction unit, their motion vectors are obtained, a candidate motion vector list is constructed, and the motion vector with the minimum rate-distortion cost is found in the list. The rate-distortion costs under the merge mode and the skip mode are then compared, and the coding mode with the lower cost is chosen. Note that the skip mode and the merge mode construct the candidate motion vector list in the same way but calculate the rate-distortion cost differently. As the rate-distortion cost calculation flow above shows, the residual signal of the prediction unit must be obtained before the cost is calculated, and the cost is then computed from this residual. The difference is that the skip mode sets the prediction residual to 0 when calculating the rate-distortion cost, while the merge mode does not.
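The merge-versus-skip selection can be sketched as follows. Here `merge_cost` and `skip_cost` stand in for the real rate-distortion evaluations (skip costing the same candidates with a zero residual); all names are illustrative assumptions.

```python
def best_candidate(candidate_mvs, cost_fn):
    """Return (mv, cost) with the minimum rate-distortion cost."""
    return min(((mv, cost_fn(mv)) for mv in candidate_mvs), key=lambda p: p[1])

def choose_alternative_mode(candidate_mvs, merge_cost, skip_cost):
    """Compare the best merge candidate against the best skip candidate."""
    _, c_merge = best_candidate(candidate_mvs, merge_cost)
    _, c_skip = best_candidate(candidate_mvs, skip_cost)
    return ("skip", c_skip) if c_skip < c_merge else ("merge", c_merge)

mvs = [(0, 0), (1, 0), (0, -1)]            # candidate list from neighbouring blocks
mode, cost = choose_alternative_mode(
    mvs,
    merge_cost=lambda mv: abs(mv[0]) + abs(mv[1]) + 2.0,  # codes a residual
    skip_cost=lambda mv: abs(mv[0]) + abs(mv[1]) + 1.0)   # residual forced to 0
print(mode, cost)  # → skip 1.0
```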
Whatever form of evaluation function is used, once the rate-distortion costs are calculated, the alternative coding mode of the prediction unit can be obtained from them. Since the rate-distortion costs under both the merge mode and the skip mode are calculated, the resulting alternative coding mode may be either the merge mode or the skip mode.
S103: and if the alternative coding mode is the merging mode, obtaining the target coding mode of the associated prediction unit of the prediction unit.
If the alternative coding mode is the merging mode, the prediction process of the motion estimation mode can be skipped by using relevant information of the prediction units that have an association relationship with the current prediction unit, so that the final coding mode used by the prediction unit is determined directly.
For convenience of description, a prediction unit having an association relationship with the current prediction unit is referred to as an associated prediction unit. The association relationship may be an association in location or an association in time. The association in location may be any one or more of the following relationships: an upper-layer relation, a lower-layer relation, and a same-layer adjacency relation. The upper-layer relation refers to the layer above the prediction unit, in other words, the parent block of the prediction unit; the lower-layer relation refers to the layer below the prediction unit, in other words, a sub-block of the prediction unit; the same-layer adjacency relation refers to being adjacent to the prediction unit in the same layer, in other words, a neighboring block of the prediction unit.
Since a prediction unit is obtained by dividing a coding unit, a coding unit can be considered to be associated with the prediction units divided from it. Because the division from a coding unit to its prediction units spans only one layer, the upper layer of a prediction unit refers to the 2N × 2N prediction unit divided from the upper-layer coding unit, and the lower layer of a prediction unit refers to the 2N × 2N prediction unit divided from the lower-layer coding unit.
In this step, the target coding mode used by the associated prediction unit is obtained, where the target coding mode refers to the final coding mode obtained after the associated prediction unit has compared the coding effects of multiple coding modes.
S104: and if the target coding mode of the associated prediction unit meets the preset condition associated with the merging mode, determining the merging mode as the target coding mode of the prediction unit.
After the target coding mode of the associated prediction unit is obtained, whether it meets the preset condition is judged; if so, the merging mode can be directly determined as the target coding mode of the prediction unit. Thus, the process of comparing the coding effect of the merging mode with that of the motion estimation mode can be omitted.
It should be noted that the preset condition is a condition related to the merge mode, and is used for limiting a condition that the prediction unit uses the merge mode as the target coding mode.
After the target coding mode of the prediction unit is obtained, since the prediction unit was produced by one particular partition mode, its coding effect can be compared with that of the prediction units produced by other partition modes, so as to determine which partition mode the coding unit finally uses.
According to the foregoing technical solutions, the present application provides a method for predicting a video image coding mode. After a prediction unit is obtained, it is first determined whether the alternative coding mode of the prediction unit is the merging mode. If so, the target coding mode of an associated prediction unit having an association relationship with the prediction unit is obtained, and if that target coding mode meets the preset condition associated with the merging mode, the merging mode is determined as the target coding mode of the prediction unit. In this way, once the preliminary judgment determines that the prediction unit uses the merging mode, the target coding mode of the associated prediction unit serves as reference information; if the reference information meets the preset condition, the merging mode can be used directly as the final coding mode, so that the prediction of the motion estimation mode is skipped. This avoids the computation of the motion estimation prediction process, improves the efficiency of the predictive coding step, and thus improves the efficiency of the video image processing process.
The following describes under what conditions on the target coding mode of the associated prediction unit the merging mode can be determined as the target coding mode of the prediction unit, specifically steps A1 to A5 below.
A1: For the obtained prediction unit, judge whether the prediction unit has a parent block whose target coding mode is the merging mode; if so, add a flag indicating that the parent block is in the merging mode.
The purpose of adding the flag is to record that the parent block is in the merging mode. Of course, the parent block may be recorded as being in the merging mode in other ways; this is not limited here.
A2: For the obtained prediction unit, judge whether the prediction unit has sub-blocks whose target coding mode is the merging mode; if so, record the number of sub-blocks whose target coding mode is the merging mode.
A3: For the obtained prediction unit, judge whether the prediction unit has adjacent blocks whose target coding mode is the merging mode; if so, record the number of adjacent blocks whose target coding mode is the merging mode.
The adjacent blocks may include, but are not limited to, any one or more of a left block, a top block, and a top left block.
A4: if the number of the recorded adjacent blocks is greater than 1 and the parent block has a flag indicating that the parent block is a merging mode, it is determined that the target coding mode of the associated PU satisfies the predetermined condition, and the merging mode may be determined as the target coding mode of the PU.
A5: if the number of the recorded adjacent blocks is greater than 1 and the number of the recorded sub-blocks is greater than 2, it is determined that the target coding mode of the associated PU satisfies the predetermined condition, and the merge mode may be determined as the target coding mode of the PU.
Of course, the number threshold for limiting the number of sub-blocks is not limited to 2, and may be other values.
It should be noted that, as can be seen from steps A1-A3, the recorded relevant information of the associated prediction units includes the types of the associated prediction units and the number of associated prediction units of each type. Of course, the relevant information is not limited to this and may be other information, for example the location information of the associated prediction units. The relevant information indicates the degree of association between an associated prediction unit and the prediction unit, which may be high or low: adjacent blocks indicate a high degree of association, while the other blocks indicate a lower one, and among the other blocks the parent block is more closely associated than the sub-blocks.
As can be seen from steps A4-A5, when judging whether the target coding mode of the associated prediction units satisfies the preset condition, it is specifically judged whether the types of the selected associated prediction units meet the type requirement of the preset condition and whether their numbers meet its number requirement.
As can be seen from steps A1-A5, the associated prediction units selected in the present application are prediction units located close to the target prediction unit, and whether the target prediction unit also uses the merging mode is determined by judging whether enough of the nearby prediction units use the merging mode. In other words, the target coding modes of the prediction units surrounding the target prediction unit serve as reference information for determining the target coding mode of the target prediction unit.
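The judgments of steps A1-A5 condense into a single predicate. The following Python sketch is illustrative only; the function name is our own, and the thresholds come from steps A4-A5 (the sub-block threshold of 2 may, as noted above, be set to other values):

```python
def merge_condition_met(parent_is_merge, merge_neighbor_count,
                        merge_subblock_count, subblock_threshold=2):
    """Return True when the associated prediction units satisfy the
    preset condition of steps A4-A5:

    A4: more than one merge-mode neighbor AND the parent block is
        flagged as merge mode; or
    A5: more than one merge-mode neighbor AND more than
        `subblock_threshold` merge-mode sub-blocks.
    """
    if merge_neighbor_count > 1 and parent_is_merge:
        return True  # step A4
    # step A5
    return merge_neighbor_count > 1 and merge_subblock_count > subblock_threshold
```

When this predicate returns True, the merging mode is adopted directly and the motion estimation prediction is skipped.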
For convenience of explaining the difference between the coding mode prediction method provided in the present application and the conventional coding mode prediction method, the present application will be described by way of illustration.
Referring to fig. 4, an example of a method for predicting a coding mode of a video image provided by the present application is shown. As shown in fig. 4, the method includes steps S401 to S406.
S401: obtaining a prediction unit obtained by dividing a video image; the video image is divided into coding units, and the prediction unit is obtained by dividing the coding units and is used as a unit block predicted by a coding mode.
S402: for a prediction unit, it is determined whether an alternative coding mode of the prediction unit is a merge mode or a skip mode.
S403: and if the alternative coding mode is the merging mode, obtaining the target coding mode of the associated prediction unit of the prediction unit.
S404: and judging whether the target coding mode of the associated prediction unit meets a preset condition associated with the merging mode. If yes, go to step S405; if not, go to step S406.
S405: the merge mode is determined as a target coding mode of the prediction unit.
Note that for the descriptions of steps S401 to S405 above, reference may be made to the description of fig. 1; details are not repeated here.
S406: and calculating the coding effect of the prediction unit in the motion estimation mode, and comparing the coding effect with the coding effect of the merging mode to determine whether the target coding mode is the merging mode or the motion estimation mode.
Here, a coding mode of motion estimation is explained.
A video image contains an object, which may be referred to as a target object. The target object moves from one position to another, so its position differs from one video image to the next. When encoding the video image of the current frame, the positions of the target object in the video images of previous and subsequent frames may be referenced: the position of the target object is found in a reference frame, the motion of the target object in the reference frame relative to its position in the current frame is determined, and this position and direction information is taken as a motion vector used for coding the video image of the current frame. This process may be referred to as motion estimation; it produces motion vectors that represent the motion path information of the target object.
The principle of motion estimation is that in most video image sequences the contents of adjacent images are very similar and the background changes very little, so it is not necessary to code all the information of every frame; only the motion information of the moving objects in the current image needs to be transmitted to the decoder, and the current image can be recovered from the content of the previous image plus that motion information. This coding mode effectively saves resources such as video storage and transmission bandwidth.
When performing motion estimation, an encoded video image is used as a reference image (also referred to as a reference frame). A coding unit in the reference image is referred to as a reference coding unit or reference block, and the displacement from the reference coding unit to the target prediction unit is referred to as a motion vector (MV); a prediction residual can be obtained from the motion vector of the reference coding unit. Specifically, the reference frame pointed to by the motion vector of the reference coding unit is determined, the reconstruction data at the position pointed to by the motion vector is obtained in that reference frame, and it is judged whether the reconstruction data needs interpolation: if the motion vector points to an integer pixel, no interpolation is needed; if it points to a sub-pixel, interpolation is performed on the reconstruction data to obtain the predicted value. Finally, the predicted value is subtracted from the pixel value of the target prediction unit to obtain the prediction residual. Motion estimation searches multiple reference images for the coding unit with the smallest prediction residual, which entails a large amount of calculation.
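To make the search concrete, here is a minimal Python sketch of an exhaustive integer-pixel motion search over a single reference frame. It is an illustration of the general technique, not the patent's encoder: distortion is measured by the sum of absolute differences (SAD), no sub-pixel interpolation is performed, and all names are ours:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def integer_motion_search(cur_block, ref_frame, top, left, search_range):
    """Slide the current block over the reference frame within
    +/- search_range of its own position (top, left) and keep the
    displacement (dy, dx) with the smallest SAD; also return the
    prediction residual at that displacement."""
    h, w = cur_block.shape
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # candidate falls outside the reference frame
            cost = sad(cur_block, ref_frame[y:y + h, x:x + w])
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    # Prediction residual = current block minus the matched prediction.
    y, x = top + best_mv[0], left + best_mv[1]
    residual = cur_block.astype(np.int32) - ref_frame[y:y + h, x:x + w].astype(np.int32)
    return best_mv, residual
```

Even this toy search visits (2·search_range + 1)² positions per block per reference frame, which is why repeating it over many reference frames dominates encoding time, as discussed next.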
In the video image coding process, the prediction of the motion estimation mode consumes a large amount of calculation, and the more reference frames there are, the larger the calculation amount. For example, for the target prediction unit, traversing four reference frames in each of the forward and backward directions accounts for about 40% of the total calculation, and coding protocols usually allow up to 16 reference frames, so motion estimation mode prediction consumes a large amount of calculation.
As mentioned above, a coding unit may be divided into prediction units in multiple partition modes, and for the prediction units obtained by each partition mode, motion estimation mode prediction over multiple reference video images needs to be performed, which makes the processing of the entire video image cumbersome. Therefore, if the comparison described above determines that the target coding mode of the target prediction unit is the merging mode, the prediction of the motion estimation mode can be skipped, simplifying the video image processing and accelerating coding.
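The overall decision flow of steps S401-S406 can be sketched as follows. This is an illustrative Python outline, not the patent's implementation: the skip branch is simplified, and the motion estimation cost is passed as a callable so the sketch shows that the expensive computation runs only when the preset condition fails:

```python
def predict_target_mode(merge_cost, skip_cost, condition_met,
                        motion_estimation_cost):
    """Decide the target coding mode of a prediction unit.

    merge_cost / skip_cost: precomputed RD costs (S402).
    condition_met: result of the A1-A5 preset-condition check (S404).
    motion_estimation_cost: zero-argument callable returning the RD
    cost of the motion estimation mode; it is expensive, so it is
    invoked only when motion estimation cannot be skipped (S406).
    """
    if skip_cost < merge_cost:          # S402: alternative mode is skip
        return "skip"
    if condition_met:                   # S404/S405: adopt merge directly
        return "merge"
    me_cost = motion_estimation_cost()  # S406: full comparison needed
    return "merge" if merge_cost <= me_cost else "motion_estimation"
```

Passing the motion estimation cost lazily mirrors the acceleration claimed by the method: when the associated prediction units already vote for the merging mode, the callable is never invoked.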
The following describes a device related to prediction of a video image coding mode provided in the present application, and for related description, reference may be made to the above method for predicting a video image coding mode, which is not described in detail below.
Referring to fig. 5, an example of a prediction apparatus for a video image coding mode is shown, which specifically includes: a prediction unit obtaining module 501, an alternative coding mode determining module 502, an associated unit mode determining module 503, and a target coding mode determining module 504.
A prediction unit obtaining module 501, configured to obtain a prediction unit obtained by dividing a video image; the video image is divided into coding units, and the prediction unit is obtained by the division of the coding units and is used as a unit block predicted by a coding mode;
an alternative coding mode determining module 502, configured to determine, for a prediction unit, whether an alternative coding mode of the prediction unit is a merge mode or a skip mode;
an associated unit mode determining module 503, configured to obtain a target coding mode of an associated prediction unit of the prediction unit if the alternative coding mode is a merge mode;
a target encoding mode determining module 504, configured to determine the merging mode as the target encoding mode of the prediction unit if the target encoding mode of the associated prediction unit satisfies a preset condition associated with the merging mode.
In one example, the alternative coding mode determination module comprises: an alternative coding mode determination sub-module.
The alternative coding mode determining submodule is used for obtaining the rate distortion cost of a prediction unit under two coding modes, namely a merging mode and a skipping mode, aiming at the prediction unit; and selecting the coding mode corresponding to the smaller rate distortion cost as the alternative coding mode of the prediction unit.
In one example, the association unit mode determination module includes: and an association unit mode determination submodule.
An association unit mode determining sub-module, configured to determine, if the candidate coding mode is a merge mode, a prediction unit having a target association relationship with the prediction unit as an association prediction unit of the prediction unit, where the target association relationship is any one or more of the following: an upper layer relation, a lower layer relation and a same layer position adjacent relation; and obtaining a target coding mode of the associated prediction unit.
In one example, the target encoding mode determination module includes: the device comprises an association unit selection submodule, a related information determination submodule, a related information judgment submodule and a target coding mode determination submodule.
The association unit selection submodule is used for selecting an association prediction unit of which the target coding mode is a merging mode; a relevant information determining submodule, configured to determine relevant information of the selected relevant prediction unit, where the relevant information is used to indicate a degree of association between the relevant prediction unit and the prediction unit; the relevant information judgment submodule is used for judging whether the relevant information meets a preset condition associated with the merging mode; if yes, triggering a target coding mode determining submodule; a target encoding mode determination sub-module for determining a merging mode as a target encoding mode of the prediction unit.
In one example, the related information determination sub-module includes: a related information determination unit.
And the related information determining unit is used for determining the type of the selected related prediction unit and the number of the related prediction units of each type.
In one example, the related information judgment sub-module includes: and a related information judging unit.
And the related information judging unit is used for judging whether the type of the selected related prediction unit meets the type requirement in the preset condition related to the merging mode and whether the number meets the number requirement in the preset condition.
Referring to fig. 6, a structure of a prediction apparatus for a video image coding mode provided in the present application is shown. As shown in fig. 6, the apparatus may include: memory 601, processor 602, and communication bus 603.
The memory 601 and the processor 602 complete communication with each other through the communication bus 603.
A memory 601 for storing a program; the memory 601 may comprise high-speed RAM memory and may also include non-volatile memory, such as at least one disk memory.
A processor 602 for executing the program, which may comprise program code including operating instructions for the processor. The program can be specifically used for:
obtaining a prediction unit obtained by dividing a video image; the video image is divided into coding units, and the prediction unit is obtained by the division of the coding units and is used as a unit block predicted by a coding mode;
for a prediction unit, judging whether an alternative coding mode of the prediction unit is a merging mode or a skipping mode;
if the alternative coding mode is a merging mode, obtaining a target coding mode of a relevant prediction unit of the prediction unit;
and if the target coding mode of the associated prediction unit meets a preset condition associated with a merging mode, determining the merging mode as the target coding mode of the prediction unit.
The processor 602 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. It should be noted that the processor 602 may be a hardware representation of the virtualization module described above.
In addition, the present application also provides a readable storage medium, on which a computer program is stored, and when the computer program is loaded and executed by a processor, the method for predicting the video image coding mode as any one of the above methods is implemented.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the same element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A method for predicting a video image coding mode, comprising:
obtaining a prediction unit obtained by dividing a video image; the video image is divided into coding units, and the prediction unit is a unit block which is obtained by dividing the coding unit and is used for predicting as a coding mode;
for a prediction unit, judging whether an alternative coding mode of the prediction unit is a merging mode or a skipping mode;
if the alternative coding mode is a merging mode, obtaining a target coding mode of a relevant prediction unit of the prediction unit;
if the target coding mode of the associated prediction unit meets a preset condition associated with a merging mode, determining the merging mode as the target coding mode of the prediction unit;
wherein, if the target coding mode of the associated prediction unit satisfies a preset condition associated with a merge mode, determining the merge mode as the target coding mode of the prediction unit includes: selecting a related prediction unit with a target coding mode as a merging mode; determining relevant information of the selected associated prediction unit, wherein the relevant information is used for expressing the association degree of the associated prediction unit and the prediction unit; judging whether the related information meets a preset condition associated with a merging mode; if yes, determining a merging mode as a target coding mode of the prediction unit, and skipping prediction of a motion estimation mode;
the determining the relevant information of the selected associated prediction unit includes: judging whether the associated prediction unit is a parent block of the prediction unit and the target coding mode of the parent block is a merging mode, and if so, adding a flag that the parent block is the merging mode; judging whether the associated prediction unit is a sub-block of the prediction unit and the target coding mode of the sub-block is the merging mode, and if so, recording the number of the sub-blocks of which the target coding mode is the merging mode; judging whether the associated prediction unit is an adjacent block of the prediction unit and the target coding mode of the adjacent block is the merging mode, and if so, recording the number of the adjacent blocks of which the target coding mode is the merging mode;
the preset conditions comprise: the number of the recorded adjacent blocks is greater than 1 and the parent block carries the flag indicating the merging mode; or,
the number of the recorded adjacent blocks is greater than 1 and the number of the recorded sub-blocks is greater than 2.
2. The method of claim 1, wherein the determining whether the candidate coding mode of the prediction unit is the merge mode or the skip mode for a prediction unit comprises:
aiming at a prediction unit, obtaining rate distortion cost of the prediction unit under two coding modes, namely a merging mode and a skipping mode respectively;
and selecting the coding mode corresponding to the smaller rate distortion cost as the candidate coding mode of the prediction unit.
3. The method of claim 1, wherein obtaining the target coding mode of the associated one of the prediction units comprises:
determining a prediction unit having a target association relation with the prediction unit as an associated prediction unit of the prediction unit, wherein the target association relation is any one or more of the following: an upper layer relation, a lower layer relation and a same layer position adjacent relation;
obtaining a target coding mode for the associated prediction unit.
4. An apparatus for predicting a video image coding mode, comprising:
the prediction unit obtaining module is used for obtaining a prediction unit obtained by dividing the video image; the video image is divided into coding units, and the prediction unit is a unit block which is obtained by dividing the coding unit and is used for predicting as a coding mode;
the device comprises an alternative coding mode determining module, a coding mode selection module and a coding mode selection module, wherein the alternative coding mode determining module is used for judging whether an alternative coding mode of a prediction unit is a merging mode or a skipping mode aiming at the prediction unit;
a related unit mode determining module, configured to obtain a target coding mode of a related prediction unit of the prediction unit if the alternative coding mode is a merge mode;
a target encoding mode determining module, configured to determine a merging mode as a target encoding mode of the prediction unit if a target encoding mode of the associated prediction unit satisfies a preset condition associated with the merging mode; the target encoding mode determination module includes: the association unit selection submodule is used for selecting an association prediction unit of which the target coding mode is a merging mode; a relevant information determining submodule, configured to determine relevant information of the selected relevant prediction unit, where the relevant information is used to indicate a degree of association between the relevant prediction unit and the prediction unit; the relevant information judgment submodule is used for judging whether the relevant information meets a preset condition associated with the merging mode; if yes, triggering a target coding mode determining submodule; a target encoding mode determination sub-module for determining a merging mode as a target encoding mode of the prediction unit, skipping prediction of a motion estimation mode;
the related information determination sub-module is configured for: judging whether the associated prediction unit is a parent block of the prediction unit and the target coding mode of the parent block is a merging mode, and if so, adding a flag that the parent block is the merging mode; judging whether the associated prediction unit is a sub-block of the prediction unit and the target coding mode of the sub-block is the merging mode, and if so, recording the number of the sub-blocks of which the target coding mode is the merging mode; judging whether the associated prediction unit is an adjacent block of the prediction unit and the target coding mode of the adjacent block is the merging mode, and if so, recording the number of the adjacent blocks of which the target coding mode is the merging mode;
the preset conditions comprise: the number of the recorded adjacent blocks is greater than 1 and the parent block carries the flag indicating the merging mode; or,
the number of the recorded adjacent blocks is greater than 1 and the number of the recorded sub-blocks is greater than 2.
5. The apparatus for predicting video image coding mode according to claim 4, wherein said alternative coding mode determining module comprises:
the alternative coding mode determining submodule is used for obtaining the rate distortion cost of a prediction unit under two coding modes, namely a merging mode and a skipping mode, aiming at the prediction unit; and selecting the coding mode corresponding to the smaller rate distortion cost as the alternative coding mode of the prediction unit.
6. The apparatus for predicting video image coding mode according to claim 5, wherein said associated unit mode determining module comprises:
an association unit mode determining sub-module, configured to determine, if the candidate coding mode is a merge mode, a prediction unit having a target association relationship with the prediction unit as an association prediction unit of the prediction unit, where the target association relationship is any one or more of the following: an upper layer relation, a lower layer relation and a same layer position adjacent relation; and obtaining a target coding mode of the associated prediction unit.
7. A device for predicting a video image coding mode, comprising: a processor and a memory, the processor executing a software program stored in the memory and calling data stored in the memory to perform at least the following steps:
obtaining prediction units obtained by dividing a video image; wherein the video image is divided into coding units, and a prediction unit is a unit block obtained by dividing a coding unit and used as the basic unit for coding-mode prediction;
for a prediction unit, determining whether a candidate coding mode of the prediction unit is a merge mode or a skip mode;
if the candidate coding mode is the merge mode, obtaining a target coding mode of an associated prediction unit of the prediction unit;
if the target coding mode of the associated prediction unit satisfies a preset condition associated with the merge mode, determining the merge mode as the target coding mode of the prediction unit;
wherein determining the merge mode as the target coding mode of the prediction unit if the target coding mode of the associated prediction unit satisfies the preset condition associated with the merge mode comprises: selecting associated prediction units whose target coding mode is the merge mode; determining relevant information of the selected associated prediction units, the relevant information expressing the degree of association between the associated prediction units and the prediction unit; determining whether the relevant information satisfies the preset condition associated with the merge mode; and if so, determining the merge mode as the target coding mode of the prediction unit and skipping prediction of the motion estimation mode;
wherein determining the relevant information of the selected associated prediction units comprises: determining whether an associated prediction unit is a parent block of the prediction unit whose target coding mode is the merge mode, and if so, adding a flag indicating that the parent block is in merge mode; determining whether an associated prediction unit is a sub-block of the prediction unit whose target coding mode is the merge mode, and if so, recording the number of sub-blocks whose target coding mode is the merge mode; determining whether an associated prediction unit is a neighboring block of the prediction unit whose target coding mode is the merge mode, and if so, recording the number of neighboring blocks whose target coding mode is the merge mode;
and wherein the preset condition comprises: the recorded number of neighboring blocks is greater than 1 and the flag indicating that the parent block is in merge mode is present; or,
the recorded number of neighboring blocks is greater than 1 and the recorded number of sub-blocks is greater than 2.
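The preset conditions recited in claim 7 can be sketched as follows; the function and parameter names are illustrative assumptions for readability and are not part of the patent text.

```python
def early_merge_decision(parent_is_merge: bool,
                         merge_subblock_count: int,
                         merge_neighbor_count: int) -> bool:
    """Decide whether merge mode can be chosen directly as the target
    coding mode of a prediction unit, skipping motion-estimation prediction.

    parent_is_merge       -- flag recorded when the parent block's target
                             coding mode is merge mode
    merge_subblock_count  -- number of sub-blocks whose target coding mode
                             is merge mode
    merge_neighbor_count  -- number of neighboring blocks whose target
                             coding mode is merge mode
    """
    # Condition 1: more than one merge-mode neighboring block, and the
    # parent block carries the merge-mode flag.
    if merge_neighbor_count > 1 and parent_is_merge:
        return True
    # Condition 2: more than one merge-mode neighboring block, and more
    # than two merge-mode sub-blocks.
    if merge_neighbor_count > 1 and merge_subblock_count > 2:
        return True
    # Otherwise, fall through to normal mode prediction.
    return False
```

Either condition requires at least two merge-mode neighbors; the parent-block flag and the sub-block count then serve as alternative corroborating evidence before motion-estimation prediction is skipped.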
8. A readable storage medium on which a computer program is stored, the computer program, when loaded and executed by a processor, implementing the method for predicting a video image coding mode according to any one of claims 1 to 3.
CN201810994848.9A 2018-08-29 2018-08-29 Method for predicting video image coding mode and related equipment Active CN108924551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810994848.9A CN108924551B (en) 2018-08-29 2018-08-29 Method for predicting video image coding mode and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810994848.9A CN108924551B (en) 2018-08-29 2018-08-29 Method for predicting video image coding mode and related equipment

Publications (2)

Publication Number Publication Date
CN108924551A CN108924551A (en) 2018-11-30
CN108924551B true CN108924551B (en) 2022-01-07

Family

ID=64407821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810994848.9A Active CN108924551B (en) 2018-08-29 2018-08-29 Method for predicting video image coding mode and related equipment

Country Status (1)

Country Link
CN (1) CN108924551B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020207451A1 * 2019-04-11 2020-10-15 Fuzhou Rockchip Electronics Co., Ltd. H.265 encoding method and apparatus
CN114710664B 2019-09-20 2024-10-25 Hangzhou Hikvision Digital Technology Co., Ltd. Decoding and encoding methods, apparatus and devices thereof
CN111277824B * 2020-02-12 2023-07-25 Tencent Technology (Shenzhen) Co., Ltd. Image prediction processing method, device, terminal and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140125463A * 2013-04-16 2014-10-29 Electronics and Telecommunications Research Institute (ETRI) Fast video coding method
CN104539970A * 2014-12-21 2015-04-22 Beijing University of Technology 3D-HEVC interframe coding merge mode fast decision making method
CN104902271A * 2015-05-15 2015-09-09 Tencent Technology (Beijing) Co., Ltd. Prediction mode selection method and device
CN106131546A * 2016-07-26 2016-11-16 Wang Jingtao Method for early determination of HEVC merge and skip coding modes
CN106416243A * 2014-02-21 2017-02-15 MediaTek Singapore Pte. Ltd. Method of video coding using prediction based on intra picture block copy

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103314590B * 2011-01-13 2018-02-27 NEC Corporation Video decoding apparatus, video decoding method
KR20130004173A * 2011-07-01 2013-01-09 Korea Aerospace University Industry-Academic Cooperation Foundation Method and apparatus for video encoding and decoding
JP5776714B2 * 2013-03-06 2015-09-09 JVC Kenwood Corporation Image decoding apparatus, image decoding method, image decoding program, receiving apparatus, receiving method, and receiving program
CN104602017B * 2014-06-10 2017-12-26 Tencent Technology (Beijing) Co., Ltd. Video encoder, method and apparatus and its inter-frame mode selecting method and device
CN104333755B * 2014-10-27 2017-11-14 Shanghai Jiao Tong University CU early-termination method in HEVC based on SKIP/Merge RD cost of B frames
CN108124154B * 2017-12-28 2020-04-24 Beijing Sumavision Technologies Co., Ltd. Method and device for quickly selecting inter-frame prediction mode and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140125463A * 2013-04-16 2014-10-29 Electronics and Telecommunications Research Institute (ETRI) Fast video coding method
CN106416243A * 2014-02-21 2017-02-15 MediaTek Singapore Pte. Ltd. Method of video coding using prediction based on intra picture block copy
CN104539970A * 2014-12-21 2015-04-22 Beijing University of Technology 3D-HEVC interframe coding merge mode fast decision making method
CN104902271A * 2015-05-15 2015-09-09 Tencent Technology (Beijing) Co., Ltd. Prediction mode selection method and device
CN106131546A * 2016-07-26 2016-11-16 Wang Jingtao Method for early determination of HEVC merge and skip coding modes

Also Published As

Publication number Publication date
CN108924551A (en) 2018-11-30

Similar Documents

Publication Publication Date Title
US10554996B2 (en) Video encoding and decoding
CN110809887B (en) Method and apparatus for motion vector modification for multi-reference prediction
JP7598433B2 (en) Video signal encoding/decoding method and device thereof
KR102085498B1 (en) Method and device for encoding a sequence of images and method and device for decoding a sequence of images
JP7634734B2 (en) Video signal encoding/decoding method and device thereof
KR20190029732A (en) Intra prediction mode based image processing method and apparatus therefor
KR20180019688A (en) Picture prediction method and picture prediction apparatus
TWI692246B (en) Image prediction decoding method
US11206418B2 (en) Method of image encoding and facility for the implementation of the method
JP7311635B2 (en) Codec method, device and equipment therefor
KR20170110556A (en) Image encoding/decoding method and image decoding apparatus using motion vector precision
CN108924551B (en) Method for predicting video image coding mode and related equipment
US20120121019A1 (en) Image processing device and method
KR20140031974A (en) Image coding method, image decoding method, image coding device, image decoding device, image coding program, and image decoding program
CN109905713B (en) Coding acceleration method for HEVC (high efficiency video coding), related device and equipment
JP2012120108A (en) Interpolation image generating apparatus and program, and moving image decoding device and program
WO2011142221A1 (en) Encoding device and decoding device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant