
CN113557720B - Video processing method, apparatus and non-transitory computer readable medium

Info

Publication number: CN113557720B (application CN202080019945.1A)
Authority: CN (China)
Prior art keywords: weights, block, inter, intra, CIIP
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN113557720A
Inventors: 朱维佳, 许继征, 张莉, 张凯, 刘鸿彬, 王悦
Current Assignee: Beijing ByteDance Network Technology Co Ltd; ByteDance Inc
Original Assignee: Beijing ByteDance Network Technology Co Ltd; ByteDance Inc
Application filed by Beijing ByteDance Network Technology Co Ltd and ByteDance Inc; published as CN113557720A; granted and published as CN113557720B.

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process
    • H04N19/463Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding


Abstract

Adaptive weights in multi-hypothesis prediction in video coding are disclosed. In one example implementation, for a conversion between a first block of video and a bitstream representation of the first block, it is determined whether use of an update to the weights in a combined intra- and inter-prediction (CIIP) mode is enabled or disabled for the conversion; in response to determining that updating of the weights in CIIP mode is enabled, the weights applied to a portion of the pixels of the first block in CIIP mode are updated; and the conversion is performed based on the updated weights and the non-updated weights.

Description

Video processing method, apparatus and non-transitory computer readable medium
Cross Reference to Related Applications
In accordance with applicable provisions of patent law and/or the Paris Convention, the present application timely claims the priority of and benefit from International Patent Application No. PCT/CN2019/077838, filed on March 12, 2019. The entire disclosure of International Patent Application No. PCT/CN2019/077838 is incorporated by reference as part of the disclosure of the present application.
Technical Field
The present application relates to video codec techniques, devices, and systems.
Background
Despite advances in video compression, digital video still accounts for the largest share of bandwidth use on the Internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video grows, the bandwidth demand for digital video usage is expected to continue to increase.
Disclosure of Invention
This document describes various embodiments and techniques for using multi-hypothesis prediction in video coding.
In one example aspect, various methods for using multi-hypothesis prediction in video coding are disclosed.
In another example aspect, a method of video processing is disclosed. The method includes updating weights for combining inter and intra prediction modes during a conversion between a current video block and a bitstream representation of the current video block, and performing the conversion based on the updating.
In another example aspect, a method of video processing is disclosed. The method includes: for a conversion between a first block of video and a bitstream representation of the first block, determining whether use of an update to the weights in a combined intra- and inter-prediction (CIIP) mode is enabled or disabled for the conversion; in response to determining that updating of the weights in CIIP mode is enabled, updating the weights applied within a portion of pixels of the first block in CIIP mode; and performing the conversion based on the updated weights and the non-updated weights.
In another example aspect, a method of video processing is disclosed. The method includes: for a conversion between a first block of video and a bitstream representation of the first block, determining a set of weights from multiple sets of weights used in a combined intra- and inter-prediction (CIIP) mode, the determination depending on a message present in at least one of: a sequence level including a Sequence Parameter Set (SPS), a slice level including a slice header, a picture level including a picture header, and a block level including a Coding Tree Unit (CTU) and a Coding Unit (CU); applying the CIIP mode based on the determined set of weights to generate a final prediction of the first block; and performing the conversion based on the final prediction; wherein the final prediction of the first block is generated as a weighted sum of the intra prediction and the merge-based inter prediction of the first block.
In yet another example aspect, a video encoder apparatus configured to implement one of the above methods is disclosed.
In yet another example aspect, a video decoding apparatus configured to implement one of the above methods is disclosed.
In yet another aspect, a computer-readable medium is disclosed. The computer readable medium has stored thereon processor executable code for implementing one of the methods described above.
These and other aspects are described herein.
Drawings
FIG. 1 illustrates an example derivation process for merge candidate list construction.
Fig. 2 shows an example of the location of the spatial merge candidate.
Fig. 3 shows an example of candidate pairs considered for redundancy check for spatial merge candidates.
Fig. 4 shows an example of the locations of the second PUs of the nx2n partitions and the 2nxn partitions.
Fig. 5 is an exemplary illustration of motion vector scaling of a temporal merge candidate.
Fig. 6 shows examples C0 and C1 of candidate positions of the time domain merge candidate.
Fig. 7 shows an example of combining bi-prediction merge candidates.
Fig. 8 summarizes the derivation of motion vector prediction candidates.
Fig. 9 shows an illustration of motion vector scaling of spatial motion vector candidates.
Fig. 10 shows neighboring samples for deriving IC parameters.
Fig. 11 shows a simplified affine motion model for (a) 4-parameter affine and (b) 6-parameter affine.
Fig. 12 is an example of affine MVF per sub-block.
Fig. 13 shows a 4-parameter affine model (a) and a 6-parameter affine model (b).
Fig. 14 shows MVP for af_inter for inherited affine candidates.
Fig. 15 shows MVP for af_inter for constructing affine candidates.
Fig. 16 shows candidates for af_merge.
Fig. 17 shows candidate positions for the affine merge mode.
Fig. 18 shows an example of MMVD search process.
Fig. 19 shows an example of MMVD search points.
FIG. 20 shows DMVR based on bilateral template matching.
Fig. 21 is an example of MVDs (0, 1) mirrored between list 0 and list 1 in DMVR.
Fig. 22 shows MVs that can be checked in one iteration.
Fig. 23 shows an example with the required reference points filled.
Fig. 24 illustrates an example apparatus for implementing the techniques described in this document.
Fig. 25 is a flow chart of an example method of video processing.
Fig. 26 is a flow chart of an example method of video processing.
Fig. 27 is a flow chart of an example method of video processing.
In the description of the drawings, (a) and (b) refer to the left-hand side and right-hand side of the corresponding drawings.
Detailed Description
This document provides various techniques that may be used by decoders of images or video bitstreams to improve the quality of decompressed or decoded digital video or images. For brevity, the term "video" is used herein to include sequences of pictures (conventionally referred to as video) and individual images. Furthermore, the video encoder may also implement these techniques during the encoding process in order to reconstruct the decoded frames for further encoding.
Chapter headings are used in this document to ease understanding and do not limit the embodiments and techniques to corresponding chapters. In this way, embodiments from one section can be combined with embodiments from other sections.
1. Summary
This patent document relates to video encoding and decoding techniques. In particular, it relates to candidate list construction in video coding. It may be applied to existing video coding standards such as HEVC, or to the standard to be finalized (Versatile Video Coding, VVC). It may also be applicable to future video coding standards or video codecs.
2. Preliminary discussion
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC) and H.265/HEVC standards. Since H.262, video coding standards have been based on the hybrid video coding structure, in which temporal prediction plus transform coding is utilized. To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded by VCEG and MPEG jointly in 2015. Since then, many new methods have been adopted by JVET and put into reference software named the Joint Exploration Model (JEM). In April 2018, the Joint Video Expert Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bitrate reduction compared to HEVC.
2.1 Inter prediction in HEVC/H.265
Each inter-predicted PU has motion parameters for one or two reference picture lists. The motion parameters include a motion vector and a reference picture index. The usage of one of the two reference picture lists may also be signaled using inter_pred_idc. Motion vectors may be explicitly coded as deltas relative to predictors.
When a CU is coded with skip mode, one PU is associated with the CU, and there are no significant residual coefficients, no coded motion vector delta and no reference picture index. A merge mode is specified whereby the motion parameters for the current PU are obtained from neighboring PUs, including spatial and temporal candidates. The merge mode can be applied to any inter-predicted PU, not only to those coded in skip mode. The alternative to the merge mode is the explicit transmission of motion parameters, where the motion vector (to be more precise, the Motion Vector Difference (MVD) compared to a motion vector predictor), the corresponding reference picture index for each reference picture list and the reference picture list usage are signaled explicitly per PU. Such a mode is named Advanced Motion Vector Prediction (AMVP) in this disclosure.
When the signaling indicates that one of the two reference picture lists is to be used, the PU is produced from one block of samples. This is referred to as "uni-prediction". Uni-prediction is available for both P slices and B slices.
When the signaling indicates that both reference picture lists are to be used, the PU is produced from two blocks of samples. This is referred to as "bi-prediction". Bi-prediction is available only for B slices.
Details on the inter prediction modes specified in HEVC are provided below. The description starts with the merge mode.
2.1.1 Reference Picture List
In HEVC, the term inter prediction is used to refer to a prediction derived from data elements (e.g., sample values or motion vectors) of a reference picture, rather than a current decoded picture. Similar to in H.264/AVC, a picture may be predicted from multiple reference pictures. The reference pictures for inter prediction are organized in one or more reference picture lists. The reference index identifies which reference picture in the list should be used to create the prediction signal.
A single reference picture List, list 0, is used for P slices, while two reference picture lists, list 0 and List 1, are used for B slices. It should be noted that the reference pictures contained in List 0/1 may be from past pictures and future pictures in the order of capture/display.
2.1.2 Merge mode in HEVC
2.1.2.1 Derivation of candidates for merge mode
When predicting a PU using the merge mode, an index pointing to an entry in the merge candidate list is parsed from the bitstream and motion information is retrieved using the index. The construction of this list is specified in the HEVC standard, which can be summarized according to the following sequence of steps:
step 1: initial candidate derivation
Step 1.1: spatial candidate derivation
Step 1.2: redundancy check for space domain candidates
Step 1.3: time domain candidate derivation
Step 2: additional candidate inserts
Step 2.1: creation of bi-prediction candidates
Step 2.2: insertion of zero motion candidates
A schematic illustration of these steps is also given in fig. 1. For spatial merge candidate derivation, a maximum of four merge candidates are selected among candidates that are located in five different positions. For temporal merge candidate derivation, a maximum of one merge candidate is selected among two candidates. Since a constant number of candidates per PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of merge candidates (MaxNumMergeCand) signaled in the slice header. Since the number of candidates is constant, the index of the best merge candidate is encoded using truncated unary binarization (TU). If the size of the CU is equal to 8, all PUs of the current CU share a single merge candidate list, which is identical to the merge candidate list of the 2N×2N prediction unit.
Hereinafter, the operations associated with the foregoing steps will be described in detail.
2.1.2.2 Spatial candidate derivation
In the derivation of spatial merge candidates, a maximum of four merge candidates are selected among candidates located in the positions depicted in fig. 2. The order of derivation is A1, B1, B0, A0 and B2. Position B2 is considered only when any PU of positions A1, B1, B0, A0 is not available (e.g., because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list, so that coding efficiency is improved. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with an arrow in fig. 3 are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with partitions different from 2N×2N. As an example, fig. 4 depicts the second PU for the cases of N×2N and 2N×N, respectively. When the current PU is partitioned as N×2N, the candidate at position A1 is not considered for list construction; in fact, adding this candidate would lead to two prediction units having the same motion information, which is redundant for having only one PU within a coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N×N.
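To make the ordering and pruning concrete, a small Python-style sketch of the spatial candidate selection is given below. The neighbor objects, their fields and the simplified pruning (against all previously added candidates rather than only the pairs of fig. 3) are illustrative assumptions, not normative HEVC code.

def derive_spatial_merge_candidates(neighbors):
    # neighbors maps position names ("A1", "B1", ...) to objects assumed to
    # expose .available, .intra_coded and .motion_info (illustrative only).
    candidates = []
    def try_add(pos):
        cand = neighbors.get(pos)
        if cand is None or not cand.available or cand.intra_coded:
            return
        # Simplified pruning: the standard checks only the pairs of fig. 3.
        if all(cand.motion_info != c.motion_info for c in candidates):
            candidates.append(cand)
    for pos in ("A1", "B1", "B0", "A0"):
        try_add(pos)
    if len(candidates) < 4:          # B2 is considered only as a fallback
        try_add("B2")
    return candidates[:4]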
2.1.2.3 Time domain candidate derivation
In this step, only one candidate is added to the list. In particular, in the derivation of this temporal merge candidate, a scaled motion vector is derived based on the collocated PU belonging to the picture which has the smallest POC difference from the current picture within the given reference picture list. The reference picture list to be used for derivation of the collocated PU is explicitly signaled in the slice header. The scaled motion vector for the temporal merge candidate is obtained as illustrated by the dashed line in fig. 5; it is scaled from the motion vector of the collocated PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the collocated picture and the collocated picture. The reference picture index of the temporal merge candidate is set equal to zero. A practical realization of the scaling process is described in the HEVC specification. For a B slice, two motion vectors, one for reference picture list 0 and the other for reference picture list 1, are obtained and combined to make the bi-predictive merge candidate.
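The POC-based scaling of the collocated MV can be illustrated as below; this floating-point sketch only mirrors the idea of the tb/td scaling, whereas the HEVC specification defines an equivalent fixed-point procedure.

def scale_temporal_mv(mv_col, tb, td):
    # mv_col: (x, y) MV of the collocated PU
    # tb: POC(current picture) - POC(reference picture of current picture)
    # td: POC(collocated picture) - POC(reference picture of collocated PU)
    if td == 0:
        return mv_col
    scale = tb / td
    return (round(mv_col[0] * scale), round(mv_col[1] * scale))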
As shown in fig. 6, in the collocated PU (Y) belonging to the reference frame, the position of the time domain candidate is selected between candidates C0 and C1. If the PU at position C0 is not available, is intra-coded, or is outside the current codec tree unit (CTU, also known as LCU, largest codec unit) row, position C1 is used. Otherwise, position C0 is used in the derivation of the time domain merge candidate.
Temporal motion vector prediction is also referred to as "TMVP".
2.1.2.4 Additional candidate insertions
In addition to spatial and temporal merge candidates, there are two additional types of merge candidates: combined bi-predictive merge candidates and zero merge candidates. Combined bi-predictive merge candidates are generated by utilizing spatial and temporal merge candidates, and are used for B slices only. A combined bi-predictive candidate is generated by combining the first reference picture list motion parameters of an initial candidate with the second reference picture list motion parameters of another. If these two tuples provide different motion hypotheses, they form a new bi-predictive candidate. As an example, fig. 7 shows the case where two candidates in the original list (on the left), which have mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive merge candidate that is added to the final list (on the right). There are numerous rules regarding the combinations that are considered to generate these additional merge candidates.
The zero motion candidate is inserted to fill the remaining entries in the merge candidate list and thus reach MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index that starts at zero and increases each time a new zero motion candidate is added to the list. Finally, no redundancy check is performed on these candidates.
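A rough Python-style sketch of step 2 (additional candidate insertion) is given below; the candidate representation (dicts with optional 'L0'/'L1' motion, each an (mv, ref_idx) pair) and the pairing order are illustrative assumptions rather than the exact HEVC rules.

def fill_additional_merge_candidates(cands, max_num, is_b_slice):
    # 2.1 combined bi-predictive candidates (B slices only): pair the list-0
    # motion of one candidate with the list-1 motion of another.
    if is_b_slice:
        for a in list(cands):
            for b in list(cands):
                if len(cands) >= max_num:
                    break
                if a is b or 'L0' not in a or 'L1' not in b:
                    continue
                if a['L0'] != b['L1']:           # different motion hypotheses
                    cands.append({'L0': a['L0'], 'L1': b['L1']})
    # 2.2 zero motion candidates with increasing reference index, no pruning
    ref_idx = 0
    while len(cands) < max_num:
        zero = {'L0': ((0, 0), ref_idx)}
        if is_b_slice:
            zero['L1'] = ((0, 0), ref_idx)
        cands.append(zero)
        ref_idx += 1
    return cands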
2.1.3 AMVP
AMVP exploits the spatio-temporal correlation of a motion vector with neighboring PUs, and is used for explicit transmission of motion parameters. For each reference picture list, a motion vector candidate list is constructed by first checking the availability of the left and above spatially neighboring PU positions and of the temporally neighboring PU positions, removing redundant candidates and adding a zero vector to make the candidate list of constant length. The encoder can then select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. As with merge index signaling, the index of the best motion vector candidate is encoded using a truncated unary code. The maximum value to be encoded in this case is 2 (see fig. 8). In the following sections, details of the derivation process of motion vector prediction candidates are provided.
2.1.3.1 Derivation of AMVP candidates
Fig. 8 shows an example derivation process of motion vector prediction candidates.
In motion vector prediction, two types of motion vector candidates are considered: spatial domain motion vector candidates and temporal motion vector candidates. For spatial domain motion vector candidate derivation, two motion vector candidates are ultimately derived based on the motion vector of each PU at five different locations as shown in fig. 2.
For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates derived based on two different collocated positions. After the first list of spatio-temporal candidates is made, duplicated motion vector candidates in the list are removed. If the number of candidates is greater than two, motion vector candidates whose reference picture index within the associated reference picture list is greater than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
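The pruning and padding just described can be sketched as follows, assuming each candidate is an (mv, ref_idx) tuple; this is illustrative, not the normative derivation.

def finalize_amvp_list(candidates, max_cands=2):
    pruned = []
    for c in candidates:                 # spatial candidates first, then temporal
        if c not in pruned:              # remove duplicated MV candidates
            pruned.append(c)
    if len(pruned) > max_cands:          # drop candidates whose ref_idx > 1
        pruned = [c for c in pruned if c[1] <= 1]
    while len(pruned) < max_cands:       # pad with zero motion vector candidates
        pruned.append(((0, 0), 0))
    return pruned[:max_cands]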
2.1.3.2 Spatial motion vector candidates
In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located at the positions shown in fig. 2; those positions are the same as those of motion merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2. For each side there are therefore four cases that can be used as motion vector candidates, with two cases not requiring spatial scaling and two cases using spatial scaling. The four different cases are summarized as follows:
No spatial scaling
(1) Identical reference picture list and identical reference picture index (identical POC)
(2) Different reference picture lists, but the same reference picture (same POC)
Spatial scaling
(3) The same reference picture list, but different reference picture indexes (different POCs)
(4) Different reference picture lists, and different reference pictures (different POCs)
The no-spatial-scaling cases are checked first, followed by the spatial scaling cases. Spatial scaling is considered when the POC differs between the reference picture of the neighboring PU and that of the current PU, regardless of the reference picture list. If all PUs of the left candidates are not available or are intra coded, scaling for the above motion vector is allowed to help the parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
In the spatial scaling process, as shown in fig. 9, the motion vectors of neighboring PUs are scaled in a similar manner as in the temporal scaling. The main difference is that the reference picture list and index of the current PU are given as inputs; the actual scaling procedure is the same as the time domain scaling procedure.
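The four cases listed above reduce to a simple test on the POCs of the two reference pictures; the helper below is only an illustrative summary of that classification.

def spatial_candidate_case(same_ref_list, neigh_ref_poc, cur_ref_poc):
    # Returns the case number (1-4) from the list above; scaling is needed
    # exactly when the POCs differ (cases 3 and 4), regardless of the list.
    if neigh_ref_poc == cur_ref_poc:
        return 1 if same_ref_list else 2      # no spatial scaling
    return 3 if same_ref_list else 4          # spatial scaling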
2.1.3.3 Temporal motion vector candidates
All processes of the derivation of the temporal merge candidate are the same as those of the spatial motion vector candidate (see fig. 6), except for the reference picture index derivation. The decoder is signaled with reference picture indices.
2.2 Local illumination compensation in JEM
Local Illumination Compensation (LIC) is based on a linear model for illumination changes, using a scaling factor a and an offset b. It is enabled or disabled adaptively for each inter-mode coded Coding Unit (CU).
Fig. 10 shows neighboring samples for deriving LIC parameters.
When LIC applies to a CU, a least-squares error method is employed to derive the parameters a and b by using the neighboring samples of the current CU and their corresponding reference samples. More specifically, as shown in fig. 10, sub-sampled (2:1 sub-sampling) neighboring samples of the CU and the corresponding samples in the reference picture (identified by the motion information of the current CU or sub-CU) are used.
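The least-squares derivation of a and b can be sketched as below, taking the sub-sampled neighboring samples of the current CU and their corresponding reference samples as inputs; JEM uses an integer formulation, so this floating-point version is only illustrative.

def derive_lic_params(neigh_cur, neigh_ref):
    # Fit cur ~= a * ref + b over the sub-sampled neighboring samples.
    n = len(neigh_cur)
    sum_x = sum(neigh_ref)
    sum_y = sum(neigh_cur)
    sum_xx = sum(x * x for x in neigh_ref)
    sum_xy = sum(x * y for x, y in zip(neigh_ref, neigh_cur))
    denom = n * sum_xx - sum_x * sum_x
    if n == 0 or denom == 0:
        return 1.0, 0.0                  # fall back to the identity model
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - a * sum_x) / n
    return a, b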
2.2.1 Derivation of prediction blocks
The LIC parameters are derived and applied separately for each prediction direction. For each prediction direction, a first prediction block is generated with the decoded motion information, after which a temporary prediction block is obtained by applying the LIC model. Thereafter, a final prediction block is derived using the two temporary prediction blocks.
Upon encoding and decoding a CU in the merge mode, the LIC flag is copied from the neighboring block in a similar manner to the motion information copy in the merge mode; otherwise, an LIC flag is signaled for the CU to indicate whether LIC is applied.
When LIC is enabled for pictures, an additional CU level RD check is needed to determine if LIC is applied to the CU. When LIC is enabled for CU, the sum of absolute differences with the mean removed (MR-SAD) and the sum of absolute Hadamard transform differences with the mean removed (MR-SATD) are used for integer-pixel motion refinement and fractional-pixel motion refinement, respectively, instead of SAD and SATD.
In order to reduce coding complexity, the following coding scheme is applied in JEM.
● When there is no significant illumination change between the current picture and its reference picture, the LIC is disabled for the entire picture. To identify this, a histogram of the current picture and each reference picture of the current picture is calculated at the encoder. Disabling the LIC for the current picture if the histogram difference between the current picture and each reference picture of the current picture is less than a given threshold; otherwise, LIC is enabled for the current picture.
2.3 Inter prediction method in VVC
There are several new coding tools for inter prediction improvement, such as adaptive motion vector difference resolution (AMVR) for signaling of the MVD, affine prediction mode, Triangular Prediction Mode (TPM), advanced TMVP (ATMVP, also known as SbTMVP), Generalized Bi-prediction (GBI) and Bi-directional Optical Flow (BIO).
2.3.1 Coding and decoding block structure in VVC
In VVC, a quadtree/binary tree/multi-way tree (QT/BT/TT) structure is adopted to divide a picture into square or rectangular blocks.
In addition to QT/BT/TT, a separate tree (also known as a dual coding tree) is employed for I frames in VVC. With the separate tree, the coding block structure is signaled separately for the luma and chroma components.
2.3.2 Adaptive motion vector difference resolution
In HEVC, the Motion Vector Difference (MVD) (between the motion vector of a PU and the predicted motion vector) is signaled in units of quarter luma samples when use_integer_mv_flag is equal to 0 in the slice header. In VVC, a locally adaptive motion vector resolution (AMVR) is introduced. In VVC, an MVD can be coded in units of quarter luma samples, integer luma samples or four luma samples (i.e., 1/4-pel, 1-pel, 4-pel). The MVD resolution is controlled at the coding unit (CU) level, and an MVD resolution flag is conditionally signaled for each CU that has at least one non-zero MVD component.
For a CU with at least one non-zero MVD component, a first flag is signaled to indicate whether quarter-luma sample MV precision is used in the CU. When the first flag (equal to 1) indicates that the quarter-luma sample MV precision is not used, another flag is signaled to indicate whether the full-luma sample MV precision or the quarter-luma sample MV precision is used.
When the first MVD resolution flag of a CU is zero, or when the flag is not encoded for the CU (meaning that all MVDs within the CU are zero), the quarter luma sample MV resolution is used for the CU. When the CU uses the full luminance sample MV precision or the four luminance sample MV precision, the MVPs in the AMVP candidate list of the CU are rounded to the corresponding precision.
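The rounding of the AMVP predictors to the selected MVD resolution can be sketched as follows; MVs are assumed to be stored in quarter-luma-sample units, and the flag semantics (0 = 1/4-pel, 1 = 1-pel, 2 = 4-pel) are an illustrative convention.

def round_mv_to_precision(mv, imv):
    if imv == 0:
        return mv                        # already at 1/4-pel precision
    shift = 2 if imv == 1 else 4         # 1-pel: multiples of 4 units; 4-pel: 16 units
    half = 1 << (shift - 1)
    def rnd(v):
        s = 1 if v >= 0 else -1
        return s * (((abs(v) + half) >> shift) << shift)
    return (rnd(mv[0]), rnd(mv[1]))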
2.3.3 Affine motion compensated prediction
In HEVC, only a translational motion model is applied for Motion Compensation Prediction (MCP). In the real world there are many kinds of motion, e.g., zoom in/out, rotation, perspective motions and other irregular motions. In VVC, a simplified affine transform motion compensation prediction is applied with a 4-parameter affine model and a 6-parameter affine model. As shown in fig. 13, the affine motion field of a block is described by two Control Point Motion Vectors (CPMVs) for the 4-parameter affine model and by 3 CPMVs for the 6-parameter affine model.
Fig. 11 shows a simplified affine motion model of (a) 4-parameter affine and (b) 6-parameter affine.
The Motion Vector Field (MVF) of a block is described by the following equations, where the 4-parameter affine model is used in equation (1) (the 4 parameters are defined as the variables a, b, e and f) and the 6-parameter affine model is used in equation (2) (the 6 parameters are defined as the variables a, b, c, d, e and f):

mv^h(x, y) = a·x - b·y + e = ((mv1^h - mv0^h)/w)·x - ((mv1^v - mv0^v)/w)·y + mv0^h
mv^v(x, y) = b·x + a·y + f = ((mv1^v - mv0^v)/w)·x + ((mv1^h - mv0^h)/w)·y + mv0^v        (1)

mv^h(x, y) = a·x + c·y + e = ((mv1^h - mv0^h)/w)·x + ((mv2^h - mv0^h)/h)·y + mv0^h
mv^v(x, y) = b·x + d·y + f = ((mv1^v - mv0^v)/w)·x + ((mv2^v - mv0^v)/h)·y + mv0^v        (2)

where (mv0^h, mv0^v) is the motion vector of the top-left corner control point, (mv1^h, mv1^v) is the motion vector of the top-right corner control point and (mv2^h, mv2^v) is the motion vector of the bottom-left corner control point; all three are referred to as Control Point Motion Vectors (CPMVs). (x, y) represents the coordinates of a representative point relative to the top-left sample within the current block, and (mv^h(x, y), mv^v(x, y)) is the motion vector derived for a sample located at (x, y). The CP motion vectors may be signaled (as in affine AMVP mode) or derived on the fly (as in affine merge mode). w and h are the width and height of the current block. In practice, the division is implemented by a right shift with a rounding operation. In the VTM, the representative point is defined as the center position of a sub-block; e.g., when the coordinates of the top-left corner of a sub-block relative to the top-left sample within the current block are (xs, ys), the coordinates of the representative point are defined as (xs+2, ys+2). For each sub-block (i.e., 4×4 in the VTM), the representative point is used to derive the motion vector for the whole sub-block.
To further simplify motion compensated prediction, sub-block based affine transform prediction is applied. To derive the motion vector of each M×N (both M and N are set to 4 in the current VVC) sub-block, the motion vector of the center sample of each sub-block is calculated according to equation (1) or (2) (as shown in fig. 12) and rounded to 1/16 fractional accuracy. Then 1/16-pel motion compensated interpolation filters are applied to generate the prediction of each sub-block with the derived motion vector. The 1/16-pel interpolation filters are introduced by the affine mode.
After MCP, the high-precision motion vector of each sub-block is rounded and saved with the same precision as the normal motion vector.
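A floating-point sketch of the per-sub-block MV derivation from the CPMVs follows; the normative process uses fixed-point arithmetic and 1/16-pel rounding, so this is illustrative only.

def affine_subblock_mvs(cpmv, w, h, six_param=False, sb=4):
    # cpmv: [mv0, mv1] (4-parameter) or [mv0, mv1, mv2] (6-parameter),
    # each an (mvx, mvy) pair for the top-left, top-right, bottom-left CPs.
    mv0, mv1 = cpmv[0], cpmv[1]
    a = (mv1[0] - mv0[0]) / w
    b = (mv1[1] - mv0[1]) / w
    if six_param:
        c = (cpmv[2][0] - mv0[0]) / h
        d = (cpmv[2][1] - mv0[1]) / h
    else:
        c, d = -b, a                     # 4-parameter model, see equation (1)
    mvs = {}
    for ys in range(0, h, sb):
        for xs in range(0, w, sb):
            x, y = xs + sb // 2, ys + sb // 2      # representative (center) point
            mvs[(xs, ys)] = (mv0[0] + a * x + c * y, mv0[1] + b * x + d * y)
    return mvs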
Signaling of affine prediction
Similar to the translational motion model, there are also two modes for signaling the side information of affine prediction: AFFINE_INTER mode and AFFINE_MERGE mode.
- AF_INTER mode
For CUs with both width and height greater than 8, the af_inter mode may be applied. Affine flags in the CU level are signaled in the bitstream to indicate whether af_inter mode is used.
In this mode, for each reference picture list (List 0 or List 1), an affine AMVP candidate list is constructed with three types of affine motion predictors in the following order, where each candidate includes the estimated CPMVs of the current block. The differences between the best CPMVs found at the encoder side (such as mv0, mv1, mv2 in fig. 15) and the estimated CPMVs are signaled. In addition, the index of the affine AMVP candidate from which the estimated CPMVs are derived is further signaled.
1) Inherited affine motion predictor
The checking order is similar to that of the spatial MVP in HEVC AMVP list construction. First, a left inherited affine motion predictor is derived from a first block in { A1, A0} which is affine-coded and has the same reference picture as in the current block. Next, an upper inherited affine motion predictor is derived from the first block in { B1, B0, B2} which is affine-coded and has the same reference picture as in the current block. Five blocks A1, A0, B1, B0, B2 are depicted in fig. 16.
Once a neighboring block is found to be coded with an affine mode, the CPMVs of the coding unit covering that neighboring block are used to derive predictors of the CPMVs of the current block. For example, if A1 is coded with a non-affine mode and A0 is coded with a 4-parameter affine mode, the left inherited affine MV predictor will be derived from A0. In this case, the top-left CPMV and the top-right CPMV of the CU covering A0 (as in fig. 16) are used to derive the estimated CPMVs of the current block for its top-left position (with coordinates (x0, y0)), top-right position (with coordinates (x1, y1)) and bottom-right position (with coordinates (x2, y2)).
2) Construction of affine movement predictors
A constructed affine motion predictor consists of Control Point Motion Vectors (CPMVs) derived from neighboring inter-coded blocks, as shown in fig. 15, that have the same reference picture. If the current affine motion model is a 4-parameter affine, the number of CPMVs is 2; otherwise, if the current affine motion model is a 6-parameter affine, the number of CPMVs is 3. The top-left CPMV is derived from the MV at the first block in the group {A, B, C} that is inter coded and has the same reference picture as the current block. The top-right CPMV is derived from the MV at the first block in the group {D, E} that is inter coded and has the same reference picture as the current block. The bottom-left CPMV is derived from the MV at the first block in the group {F, G} that is inter coded and has the same reference picture as the current block.
If the current affine motion model is a 4-parameter affine, the constructed affine motion predictor is inserted into the candidate list only if both the top-left and top-right CPMVs are found, that is, only if they can be used as the estimated CPMVs for the top-left position (with coordinates (x0, y0)) and the top-right position (with coordinates (x1, y1)) of the current block.
If the current affine motion model is a 6-parameter affine, the constructed affine motion predictor is inserted into the candidate list only if the top-left, top-right and bottom-left CPMVs are all found, that is, only if they can be used as the estimated CPMVs for the top-left position (with coordinates (x0, y0)), the top-right position (with coordinates (x1, y1)) and the bottom-right position (with coordinates (x2, y2)) of the current block.
No pruning process is applied when inserting a constructed affine motion predictor into the candidate list.
3) Normal AMVP motion predictor
The following applies until the number of affine motion predictors reaches the maximum:
1) Derive an affine motion predictor by setting all CPMVs equal to the top-left CPMV (if available).
2) Derive an affine motion predictor by setting all CPMVs equal to the top-right CPMV (if available).
3) Derive an affine motion predictor by setting all CPMVs equal to the bottom-left CPMV (if available).
4) Derive an affine motion predictor by setting all CPMVs equal to HEVC TMVP (if available).
5) Derive an affine motion predictor by setting all CPMVs to zero MV.
Note that the top-left, top-right and bottom-left CPMVs have already been derived in the constructed affine motion predictor.
Fig. 14 shows MVP for af_inter for inherited affine candidates.
Fig. 15 shows MVP for af_inter for constructing affine candidates.
In the AF_INTER mode, when the 4/6-parameter affine mode is applied, 2/3 control points are required, and therefore 2/3 MVDs need to be coded for these control points, as shown in fig. 15. It is proposed to derive the MVs in such a way that mvd1 and mvd2 are predicted from mvd0.
As shown in fig. 15, the predicted motion vector, the motion vector difference mvdi and the motion vector mvi correspond to the top-left pixel (i = 0), the top-right pixel (i = 1) or the bottom-left pixel (i = 2), respectively. Note that the addition of two motion vectors (e.g., mvA(xA, yA) and mvB(xB, yB)) is equal to the summation of the two components separately; that is, newMV = mvA + mvB, and the two components of newMV are set to (xA + xB) and (yA + yB), respectively.
- AF_MERGE mode
When a CU is coded in AF_MERGE mode, it gets the first block coded with an affine mode from the valid neighboring reconstructed blocks. The selection order for the candidate blocks is from left, above, above-right, below-left to above-left, as shown in fig. 16 (denoted in order by A, B, C, D, E). For example, if the neighboring below-left block is coded in an affine mode (as denoted by A0 in fig. 16), the Control Point (CP) motion vectors mv0N, mv1N and mv2N of the top-left, top-right and bottom-left corners of the neighboring CU/PU that contains block A are fetched. The motion vectors mv0C, mv1C and mv2C (the latter used only for the 6-parameter affine model) of the top-left/top-right/bottom-left of the current CU/PU are calculated based on mv0N, mv1N and mv2N. It should be noted that in VTM-2.0, if the current block is affine coded, the sub-block (e.g., a 4×4 block in VTM) located in the top-left corner stores mv0 and the sub-block located in the top-right corner stores mv1. If the current block is coded with a 6-parameter affine model, the sub-block located in the bottom-left corner stores mv2; otherwise (with a 4-parameter affine model), LB stores mv2'. The other sub-blocks store the MVs used for MC.
After the CPMVs mv0C, mv1C and mv2C of the current CU are derived, the MVF of the current CU is generated according to the simplified affine motion model in equations (1) and (2). In order to identify whether the current CU is coded with AF_MERGE mode, an affine flag is signaled in the bitstream when there is at least one neighboring block coded in an affine mode.
The affine candidate list is constructed by the steps of:
1) Inserting inherited affine candidates
Inherited affine candidates refer to: candidates are derived from affine motion models of blocks of their valid neighbor affine codecs. At most two inherited affine candidates are derived from affine motion models of neighboring blocks and inserted into the candidate list. For the left predictor, the scan order is { A0, A1}; for the upper predictor, the scan order is { B0, B1, B2}.
2) Insertion construction affine candidates
If the number of candidates in the affine candidate list is less than MaxNumAffineCand (e.g., 5), constructed affine candidates are inserted into the candidate list. A constructed affine candidate means that the candidate is constructed by combining the neighboring motion information of each control point.
A) The motion information of the control point is first derived from the designated spatial and temporal neighbors shown in fig. 17. CPk (k=1, 2,3, 4) represents the kth control point. A0, A1, A2, B0, B1, B2, and B3 are spatial locations for predicting CPk (k=1, 2, 3); t is the time domain position used to predict CP 4.
The coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0), (0, H) and (W, H), respectively, where W and H are the width and height of the current block.
The motion information of each control point is obtained according to the following priority order:
For CP1, the checking priority is B2 -> B3 -> A2. If B2 is available, B2 is used. Otherwise, if B2 is not available, B3 is used. If neither B2 nor B3 is available, A2 is used. If none of the three candidates is available, the motion information of CP1 cannot be obtained.
For CP2, the check priority is B1- > B0.
For CP3, the check priority is A1- > A0.
For CP4, T is used.
B) Second, affine candidate is constructed using a combination of control points.
I. Motion information of three control points is required to construct a 6-parameter affine candidate. The three control points can be selected from one of the following four combinations: {CP1, CP2, CP4}, {CP1, CP2, CP3}, {CP2, CP3, CP4}, {CP1, CP3, CP4}. The combinations {CP1, CP2, CP4}, {CP2, CP3, CP4} and {CP1, CP3, CP4} will be converted to a 6-parameter motion model represented by the top-left, top-right and bottom-left control points.
Motion information of two control points is needed to construct 4-parameter affine candidates. The two control points may be selected from one of two combinations ({ CP1, CP2}, { CP1, CP3 }). These two combinations will be converted into a 4-parameter motion model represented by the upper left control point and the upper right control point.
The combinations of constructed affine candidates are inserted into the candidate list in the following order: {CP1, CP2, CP3}, {CP1, CP2, CP4}, {CP1, CP3, CP4}, {CP2, CP3, CP4}, {CP1, CP2}, {CP1, CP3}.
I. For each combination, the reference index of list X for each CP is checked, and if they are all the same, this combination has a valid CPMV for list X. If the combination does not have a valid CPMV for both list 0 and list 1, then this combination is marked as invalid. Otherwise, it is valid and the CPMV is put into the sub-block merge list.
3) Filling with zero motion vectors
If the number of candidates in the affine candidate list is less than 5, then a zero motion vector with a zero reference index is inserted into the candidate list until the list is full.
More specifically, for the sub-block merge candidate list, the MVs of the 4-parameter merge candidates are set to (0, 0) and the prediction direction is set to uni-prediction from list 0 (for P slices) or bi-prediction (for B slices).
In VTM4, the CPMVs of affine CUs are stored in a separate buffer. The stored CPMVs are used only to generate the inherited CPMVPs in affine merge mode and affine AMVP mode for the most recently coded CUs. The sub-block MVs derived from the CPMVs are used for motion compensation, for MV derivation of the merge/AMVP lists of translational MVs, and for de-blocking. To avoid a picture line buffer for the additional CPMVs, affine motion data inheritance from CUs in the above CTU is treated differently from inheritance from normal neighboring CUs. If the candidate CU for affine motion data inheritance is in the above CTU row, the bottom-left and bottom-right sub-block MVs in the line buffer, instead of the CPMVs, are used for affine MVP derivation. In this way, the CPMVs are stored only in a local buffer. If the candidate CU is coded with a 6-parameter affine model, the affine model is degraded to a 4-parameter model.
Merge with motion vector difference (MMVD)
The ultimate motion vector expression (UMVE, also known as MMVD) is described here. UMVE is used for either skip or merge mode with a proposed motion vector expression method.
UMVE reuses the same merge candidates as those included in the regular merge candidate list in VVC. Among these merge candidates, a base candidate can be selected and is further expanded by the proposed motion vector expression method.
UMVE provides a new Motion Vector Difference (MVD) representation method in which the MVD is represented using a starting point, a motion magnitude, and a motion direction.
This proposed technique uses the merge candidate list as it is. However, only candidates of the default merge type (MRG_TYPE_DEFAULT_N) are considered for UMVE's expansion.
The base candidate index defines a starting point. The base candidate index indicates the best candidate among the candidates within the list, as described below.
TABLE 1 Base candidate IDX

Base candidate IDX | 0       | 1       | 2       | 3
N-th MVP           | 1st MVP | 2nd MVP | 3rd MVP | 4th MVP
If the number of base candidates is equal to 1, the base candidate IDX is not signaled.
The distance index is motion amplitude information. The distance index indicates a predefined distance from the start point information. The predefined distance is as follows:
TABLE 2 Distance IDX

Distance IDX   | 0       | 1       | 2     | 3     | 4     | 5     | 6      | 7
Pixel distance | 1/4-pel | 1/2-pel | 1-pel | 2-pel | 4-pel | 8-pel | 16-pel | 32-pel
The direction index indicates the direction of the MVD relative to the starting point. The direction index may represent four directions as shown below.
TABLE 3 Direction IDX

Direction IDX | 00  | 01  | 10  | 11
x-axis        | +   | -   | N/A | N/A
y-axis        | N/A | N/A | +   | -
A UMVE flag is signaled right after a skip flag or a merge flag is sent. If the skip or merge flag is true, the UMVE flag is parsed. If the UMVE flag is equal to 1, the UMVE syntax is parsed. If not, the AFFINE flag is parsed. If the AFFINE flag is equal to 1, AFFINE mode is used; if not, the skip/merge index is parsed for the skip/merge mode of VTM.
No additional line buffer is required due to UMVE candidates, because the skip/merge candidates of the software are directly used as base candidates. Using the input UMVE index, the supplement to the MV is decided right before motion compensation, so there is no need to hold a long line buffer for this.
Under the current common test conditions, the first merge candidate or the second merge candidate in the merge candidate list can be selected as a basic candidate.
UMVE is also known as Merge with MV Differences (MMVD).
In addition, a flag tile_group_fpel_mmvd_enabled_flag indicating whether fractional distances are used is signaled to the decoder in the slice header. When fractional distances are disabled, the distances in the default table are all multiplied by 4, i.e., the distance table {1, 2, 4, 8, 16, 32, 64, 128} (in pixels) is used. Since the size of the distance table is unchanged, the entropy coding of the distance index is unchanged.
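Putting Tables 1-3 and the fractional-distance flag together, the MVD added to the selected base merge candidate can be sketched as below; the table values and flag semantics follow the text above, and the representation in pixel units is an illustrative assumption.

MMVD_DISTANCES_PEL = [1/4, 1/2, 1, 2, 4, 8, 16, 32]                 # Table 2 (default)
MMVD_DIRECTIONS = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1)}    # Table 3

def mmvd_mvd(distance_idx, direction_idx, fractional_disabled=False):
    dist = MMVD_DISTANCES_PEL[distance_idx]
    if fractional_disabled:              # fractional distances disabled by the flag
        dist *= 4                        # i.e., {1, 2, 4, 8, 16, 32, 64, 128} pixels
    sx, sy = MMVD_DIRECTIONS[direction_idx]
    return (sx * dist, sy * dist)        # added to the base candidate's MV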
Decoder side motion vector refinement (DMVR)
In the bi-prediction operation, for prediction of a region of one block, two prediction blocks formed using a Motion Vector (MV) of list 0 and an MV of list 1, respectively, are combined to form a single prediction signal. In the decoder-side motion vector refinement (DMVR) method, the two motion vectors of the bi-prediction are further refined.
DMVR in JEM
In JEM design, motion vectors are refined through a bilateral template matching process. A bilateral template matching is applied in the decoder to perform a distortion-based search between the bilateral template and reconstructed samples in the reference picture to obtain refined MVs without the need to transmit additional motion information. An example is shown in fig. 20. As shown in fig. 20, the bilateral template is generated as a weighted combination (i.e., average) of two prediction blocks from the initial MV0 of list 0 and the initial MV1 of list 1, respectively. The template matching operation consists of calculating a cost metric between the generated template and the sample areas (surrounding the initial prediction block) in the reference picture. For each of the two reference pictures, the MV that gets the lowest template cost is considered the updated MV of the list to replace the original one. In JEM, nine MV candidates are searched for each list. The nine MV candidates include an original MV and 8 surrounding MVs having one luminance sample offset with respect to the original MV in the horizontal direction or the vertical direction or both. Finally, two new MVs, i.e., MV0 'and MV1' as shown in fig. 20, are used to generate the final bi-prediction result. The Sum of Absolute Differences (SAD) is used as the cost metric. Note that in calculating the cost of a prediction block generated by one surrounding MV, the rounded MV (to the full pixel level) is used instead of the actual MV to obtain the prediction block.
FIG. 20 shows DMVR based on bilateral template matching.
DMVR in VVC
For DMVR in VVC, the MVD is mirrored between list 0 and list 1, as shown in fig. 21, and bilateral matching is performed to refine the MVs, i.e., to find the best MVD among several MVD candidates. Denote the MVs of the two reference picture lists by MVL0(L0X, L0Y) and MVL1(L1X, L1Y). The MVD denoted by (MvdX, MvdY) for list 0 that minimizes a cost function (e.g., SAD) is defined as the best MVD. For the SAD function, it is defined as the SAD between the list 0 reference block, derived with the motion vector (L0X + MvdX, L0Y + MvdY) in the list 0 reference picture, and the list 1 reference block, derived with the motion vector (L1X - MvdX, L1Y - MvdY) in the list 1 reference picture.
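The mirrored-MVD cost and the first refinement step can be sketched as follows; ref0 and ref1 are assumed to be callables that return a prediction block (a list of rows) for a given MV, which is an illustrative abstraction of the interpolation process.

def bilateral_sad(ref0, ref1, mv0, mv1, mvd):
    # The MVD is applied to the list-0 MV and mirrored on the list-1 MV.
    p0 = ref0((mv0[0] + mvd[0], mv0[1] + mvd[1]))
    p1 = ref1((mv1[0] - mvd[0], mv1[1] - mvd[1]))
    return sum(abs(a - b) for r0, r1 in zip(p0, p1) for a, b in zip(r0, r1))

def best_first_step_mvd(cost):
    # cost maps an integer MVD (x, y) to its SAD, e.g. via bilateral_sad.
    return min([(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)], key=cost)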
The motion vector refinement process may iterate twice. In each iteration, at most 6 MVDs (with integer-pel precision) may be checked in two steps, as shown in fig. 22. In the first step, the MVDs (0, 0), (-1, 0), (1, 0), (0, -1) and (0, 1) are checked. In the second step, one of the MVDs (-1, -1), (-1, 1), (1, -1) or (1, 1) may be selected and further checked. Suppose the function Sad(x, y) returns the SAD value of the MVD (x, y). The MVD, denoted by (MvdX, MvdY), checked in the second step is decided as follows:

MvdX = -1;
MvdY = -1;
if (Sad(1, 0) < Sad(-1, 0))
    MvdX = 1;
if (Sad(0, 1) < Sad(0, -1))
    MvdY = 1;
In the first iteration, the starting point is the signaled MV, and in the second iteration, the starting point is the signaled MV plus the selected best MVD in the first iteration. DMVR applies only when one reference picture is a preceding picture and the other reference picture is a following picture and both reference pictures have the same picture order count distance from the current picture.
Fig. 21 is an example of MVDs (0, 1) mirrored between list 0 and list 1 in DMVR.
Fig. 22 shows MVs that can be checked in one iteration. Also, DMVR in VVC first performs integer MVD refinement as described above. This is the first step. Thereafter, MVD refinement in the fractional accuracy is conditionally performed, thereby further refining the motion vector. This is the second step. The condition of whether to perform the second step is based on whether the MVD after the current iteration is zero MV. If it is zero MV (the vertical and horizontal components of MV are 0), then a second step will be performed.
Details of the fractional MVD refinement are given below. It should be noted that MVD represents the difference between the initial motion vector and the final motion vector used in motion compensation.
The integer distance locations and the costs assessed at these locations are used to fit a parametric error surface, which is then used to determine the 1/16 pixel accuracy sub-pixel offset.
The proposed method will be summarized below:
1. The parametric error surface fit is calculated only if the center position is the best cost position in a given iteration.
2. The cost at the center position and the costs at the positions (-1, 0), (0, -1), (1, 0) and (0, 1) relative to the center are used to fit a 2-D parabolic error surface equation of the form

E(x, y) = A(x - x0)^2 + B(y - y0)^2 + C

where (x0, y0) corresponds to the position with the lowest cost and C corresponds to the lowest cost value. By solving the 5 equations in 5 unknowns, (x0, y0) is computed as:

x0 = (E(-1, 0) - E(1, 0)) / (2(E(-1, 0) + E(1, 0) - 2E(0, 0)))
y0 = (E(0, -1) - E(0, 1)) / (2(E(0, -1) + E(0, 1) - 2E(0, 0)))
The (x 0,y0) can be calculated to any desired sub-pixel accuracy calculation by adjusting the accuracy with which the division is performed (i.e., how many bits of the quotient are calculated). For a 1/16 pixel precision, only 4 bits in the quotient's absolute value need to be calculated, which applies to the 2 divisions needed to implement each CU based on fast shift subtraction.
3. The calculated (x 0,y0) is added to the integer distance refinement MV, resulting in a sub-pixel refinement increment MV.
The amplitude of the derived fractional motion vector is limited to be less than or equal to half a pixel; a sketch of this sub-pixel offset computation is given below.
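A minimal sketch of the sub-pixel offset derivation in items 1-3 follows. It assumes the five costs have already been evaluated; the actual design performs the two divisions with a shift-and-subtract integer routine at 1/16-pel precision, whereas plain floating-point arithmetic is used here purely for readability.

```cpp
#include <algorithm>

// Sub-pel offset from the parametric error surface (items 1-3 above).
// eC, eL, eR, eT, eB are the costs at (0,0), (-1,0), (1,0), (0,-1), (0,1).
struct SubPelOffset { double x; double y; };

SubPelOffset errorSurfaceOffset(int eC, int eL, int eR, int eT, int eB) {
    SubPelOffset off = {0.0, 0.0};
    int denomX = eL + eR - 2 * eC;
    int denomY = eT + eB - 2 * eC;
    if (denomX > 0) off.x = (eL - eR) / (2.0 * denomX);
    if (denomY > 0) off.y = (eT - eB) / (2.0 * denomY);
    // The magnitude of the fractional offset is limited to half a pel.
    off.x = std::max(-0.5, std::min(0.5, off.x));
    off.y = std::max(-0.5, std::min(0.5, off.y));
    return off;
}
```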
To further simplify the DMVR process, several variations on the design in JEM are proposed. More specifically, the DMVR design adopted by VTM-4.0 (which will be released soon) has the following main features:
● Early termination occurs when the (0, 0) position SAD between list 0 and list 1 is less than the threshold.
● Early termination occurs when the SAD between List 0 and List 1 is zero for a certain location.
● Block sizes for which DMVR is enabled: W×H >= 64 && H >= 8, where W and H are the width and height of the block.
● For DMVR with a CU size > 16×16, the CU is divided into multiple 16×16 sub-blocks. When only the width or only the height of the CU is greater than 16, it is divided only in the vertical or only in the horizontal direction, respectively (a sketch of this split is given after this list).
● Reference block size (W+7) × (H+7) (for luminance).
● 25-point SAD-based full-pel search (i.e., ±2 refinement search range, single stage).
● DMVR based on bilinear interpolation.
● Subpixel refinement based on the "parametric error surface equation". This process is only performed when the lowest SAD cost in the last MV refinement iteration is not equal to zero and the best MVD is (0, 0).
● Luminance/chrominance MC with reference block padding (if needed).
● The refined MVs are used only for MC and TMVP.
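The sub-block splitting rule in the feature list above can be illustrated with the following sketch. The SubBlock structure and the function interface are assumptions made for illustration; the rule simply caps each sub-block dimension at 16 so that a CU larger than 16×16 is processed as 16×16 sub-blocks.

```cpp
#include <algorithm>
#include <vector>

struct SubBlock { int x, y, w, h; };

// When only one CU dimension exceeds 16, the split happens only in that direction.
std::vector<SubBlock> splitForDmvr(int cuW, int cuH) {
    const int subW = std::min(cuW, 16);
    const int subH = std::min(cuH, 16);
    std::vector<SubBlock> subs;
    for (int y = 0; y < cuH; y += subH)
        for (int x = 0; x < cuW; x += subW)
            subs.push_back({x, y, subW, subH});
    return subs;
}
```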
- Use of DMVR
DMVR may be enabled when all of the following conditions are true (a compact check of these conditions is sketched after this list):
DMVR enable flag (i.e., sps_ dmvr _enabled_flag) in SPS is equal to 1.
The TPM flag, the inter affine flag and the sub-block merge flag (either ATMVP merge or affine merge) are all equal to 0.
The merge flag is equal to 1.
The current block is bi-predicted, and the POC distance between the current picture and the reference picture in list 1 is equal to the POC distance between the reference picture in list 0 and the current picture.
-The current CU height is greater than or equal to 8
The number of luminance samples (CU width x height) is greater than or equal to 64
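A compact form of the enabling conditions listed above is sketched below. The CuInfo field names are illustrative, not the actual VTM data structures, and the POC check omits the sign handling that guarantees one reference picture precedes and the other follows the current picture.

```cpp
// Illustrative collection of the per-CU information needed for the check.
struct CuInfo {
    bool spsDmvrEnabled, mergeFlag, tpmFlag, interAffineFlag, subBlockMergeFlag;
    bool biPredicted;
    int pocCur, pocRefL0, pocRefL1;
    int width, height;
};

bool dmvrAllowed(const CuInfo& cu) {
    return cu.spsDmvrEnabled
        && cu.mergeFlag
        && !cu.tpmFlag && !cu.interAffineFlag && !cu.subBlockMergeFlag
        && cu.biPredicted
        && (cu.pocCur - cu.pocRefL0) == (cu.pocRefL1 - cu.pocCur)  // equal POC distances
        && cu.height >= 8
        && cu.width * cu.height >= 64;
}
```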
- Required reference samples in DMVR
For a block of size W×H, assuming that the maximum allowed MVD value is +/-offSet (e.g., 2 in VVC) and the filter size is filterSize (e.g., 8 for luminance and 4 for chrominance in VVC), (W + 2×offSet + filterSize - 1) × (H + 2×offSet + filterSize - 1) reference samples are required. To reduce the memory bandwidth, only the central (W + filterSize - 1) × (H + filterSize - 1) reference samples are fetched, and the other samples are generated by repeating the boundary of the fetched samples. An example for an 8x8 block is shown in fig. 23.
During motion vector refinement, bilinear motion compensation is performed using these reference samples. At the same time, final motion compensation is also performed using these reference samples.
Fig. 23 shows an example of the required reference samples and the padding.
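The boundary-repeat padding can be sketched as follows. The row-major sample layout, the int16_t sample type and the function interface are illustrative assumptions, while the dimensions follow the description above: only the central (W + filterSize - 1) × (H + filterSize - 1) samples are fetched, and the outer ring needed for the ±offSet search range is replicated from them.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// fetched holds the fetchedW x fetchedH central area; the result is the padded
// (fetchedW + 2*offSet) x (fetchedH + 2*offSet) area used for refinement and MC.
std::vector<int16_t> padReference(const std::vector<int16_t>& fetched,
                                  int fetchedW, int fetchedH, int offSet /* e.g. 2 in VVC */) {
    const int padW = fetchedW + 2 * offSet;
    const int padH = fetchedH + 2 * offSet;
    std::vector<int16_t> padded(padW * padH);
    for (int y = 0; y < padH; ++y) {
        const int srcY = std::clamp(y - offSet, 0, fetchedH - 1);
        for (int x = 0; x < padW; ++x) {
            const int srcX = std::clamp(x - offSet, 0, fetchedW - 1);
            padded[y * padW + x] = fetched[srcY * fetchedW + srcX];  // repeat nearest boundary sample
        }
    }
    return padded;
}
```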
-Combining Intra and Inter Predictions (CIIP)
Multi-hypothesis prediction is proposed, wherein combining intra and inter prediction is one way to generate multiple hypotheses.
When multi-hypothesis prediction is applied to improve intra mode, the multi-hypothesis prediction combines one intra prediction and one merge indexed prediction. In a merge CU, a flag is signaled for merge mode, and an intra mode is selected from an intra candidate list when the flag is true. For the luminance component, the intra candidate list is derived from 4 intra prediction modes, including the DC, planar, horizontal and vertical modes, and the size of the intra candidate list can be 3 or 4 depending on the block shape. When the CU width is greater than twice the CU height, the horizontal mode is excluded from the intra mode list, and when the CU height is greater than twice the CU width, the vertical mode is removed from the intra mode list. A weighted average is used to combine the intra prediction mode selected by the intra mode index and the merge indexed prediction selected by the merge index. For the chrominance component, DM is always applied without extra signaling. The weights used to combine the predictions are described as follows. When the DC or planar mode is selected, or the CB width or height is smaller than 4, equal weights are applied. For those CBs with a CB width and height greater than or equal to 4, when the horizontal/vertical mode is selected, the CB is first split vertically/horizontally into four equal-area regions. Each weight set, denoted (w_intra_i, w_inter_i) with i from 1 to 4, is applied to the corresponding region, where (w_intra_1, w_inter_1) = (6, 2), (w_intra_2, w_inter_2) = (5, 3), (w_intra_3, w_inter_3) = (3, 5) and (w_intra_4, w_inter_4) = (2, 6). (w_intra_1, w_inter_1) is for the region closest to the reference samples and (w_intra_4, w_inter_4) is for the region farthest from the reference samples. The combined prediction is then calculated by summing the two weighted predictions and right-shifting by 3 bits. Furthermore, the intra prediction mode of the intra hypothesis of the predictor can be saved for reference by subsequent neighboring CUs.
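The region-wise weighting described above can be sketched as follows for the horizontal/vertical intra case. The row-major sample layout, the int16_t type and the +4 rounding offset before the 3-bit right shift are assumptions for illustration, while the (6,2)/(5,3)/(3,5)/(2,6) weights and the four equal-area regions follow the description.

```cpp
#include <cstdint>

// Region-wise CIIP weighting (block width and height assumed >= 4). The block is
// split into four equal regions along the intra prediction direction; region 0
// is the one closest to the reference samples.
void ciipCombine(const int16_t* intra, const int16_t* inter, int16_t* dst,
                 int width, int height, bool verticalIntraMode) {
    static const int wIntra[4] = {6, 5, 3, 2};           // w_inter_i = 8 - w_intra_i
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int pos  = verticalIntraMode ? y : x;  // distance from the reference side
            const int size = verticalIntraMode ? height : width;
            const int region = (pos * 4) / size;         // 0..3
            const int wi = wIntra[region];
            const int we = 8 - wi;
            dst[y * width + x] = static_cast<int16_t>(
                (wi * intra[y * width + x] + we * inter[y * width + x] + 4) >> 3);
        }
    }
}
```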
In VTM4, when a CU is coded in merge mode, and if the CU contains at least 64 luma samples (i.e., the CU width times the CU height is equal to or greater than 64), an additional flag is signaled to indicate whether combined inter/intra prediction (CIIP) is applied to the current CU.
3. Problem(s)
CIIP employs a weighted combination of intra prediction and inter prediction, which may be less efficient when encoding and decoding screen content with sharp edges, since it may blur the prediction signal and further harm codec performance.
4. Example enumeration of embodiments and techniques
It is proposed that when CIIP is enabled for a block, some samples within the block may be predicted from intra-prediction only, while others may be predicted from inter-prediction only.
The following detailed descriptions should be considered as examples to explain the general concepts. These techniques should not be interpreted in a narrow way. Furthermore, these techniques can be combined in any manner.
In addition to CIIP mentioned below, the methods described below may also be applied to other decoder motion information derivation techniques.
1. The weights may be updated in CIIP mode for prediction units/codec blocks/regions.
A. In one example, the weight sets in CIIP mode, denoted as (w_intra_i, w_inter_i) (where i is from 1 to 4), may be updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (0, N), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N).
B. In one example, the weight sets in CIIP mode, denoted as (w_intra_i, w_inter_i) (where i is from 1 to 4), may be updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N).
C. In one example, the weight sets in CIIP mode, denoted as (w_intra_i, w_inter_i) (where i is from 1 to 4), may be updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (N, 0) and (w_intra_4, w_inter_4) = (0, N).
D. In one example, the weights of only some parts are updated according to the method described above, and other parts still use the current weights.
E. In one example, the top k rows of pixels may have an (intra, inter) weight of (N, 0), and the other rows may have weights (0, N).
F. In one example, the left k columns of pixels may have an (intra, inter) weight of (N, 0), and the other columns may have weights (0, N).
G. In one example, N is set to 1. Alternatively, when the final prediction block is the weighted average of the two prediction blocks divided by 8, N is set to 8.
H. In one example, the weighted prediction in CIIP mode may be implemented as a block copy when the updated weight sets described above are used (a sketch of this degenerate case is given after these sub-items).
I. In one example, the value of the updated weight in CIIP mode for the prediction unit/codec block/region may depend on, for example, a message (e.g., flag) signaled in the sequence (e.g., SPS)/slice (e.g., slice header)/slice group (e.g., slice header)/picture level (e.g., picture header)/block level (e.g., CTU or CU).
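As a sketch of items 1.E/1.F and the block-copy observation in item 1.H: with the updated (N, 0)/(0, N) weights, the combination degenerates into a region-wise copy of either the intra or the inter prediction. The sketch below shows the "top k rows from intra, remaining rows from inter" variant; the buffer layout and interface are illustrative only.

```cpp
#include <cstdint>

// Region-wise copy realized by the updated weight sets (N,0) and (0,N).
void ciipRegionCopy(const int16_t* intra, const int16_t* inter, int16_t* dst,
                    int width, int height, int k) {
    for (int y = 0; y < height; ++y) {
        const int16_t* src = (y < k) ? intra : inter;    // (w_intra, w_inter) = (N, 0) vs (0, N)
        for (int x = 0; x < width; ++x)
            dst[y * width + x] = src[y * width + x];
    }
}
```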
2. The indication of the use of the update of weights in CIIP modes for prediction units/codec blocks/regions may depend on, for example, a message (e.g., a flag) signaled in the sequence (e.g., SPS)/slice (e.g., slice header)/slice group (e.g., slice header)/picture level (e.g., picture header)/block level (e.g., CTU or CU).
A. In one example, the weights in CIIP mode may be updated if a flag (e.g., tile_group_MHIntra_scc_weight_enabled_flag) signaled in the sequence (e.g., SPS)/slice (e.g., slice header)/slice group (e.g., slice group header)/picture level (e.g., picture header)/block level (e.g., CTU or CU) is true. In other words, the weights in CIIP mode may be updated when the signaled flag is true.
B. In one example, the weights in CIIP mode may not be updated if the flag (e.g., tile_group_MHIntra_scc_weight_enabled_flag) signaled in the sequence (e.g., SPS)/slice (e.g., slice header)/slice group (e.g., slice group header)/picture level (e.g., picture header)/block level (e.g., CTU or CU) is false. In other words, the weights in CIIP mode may not be updated when the signaled flag is false.
C. In one example, the indication of the usage of the updated weights in CIIP mode may be inferred, and the inference may depend on:
A) Current block dimension
B) Current quantization parameters
C) Transformation type
D) Coding and decoding mode of block pointed by inter motion vector
E) Motion vector accuracy
F) Merge index used in CIIP
G) Intra prediction mode used in CIIP
H) The magnitude of the motion vector used in CIIP
I) Indication of which weight set is used by neighboring blocks
D. In one example, an indication of the usage of the updated weights in CIIP mode may be signaled at the sub-block level.
E. In one example, an indication of the use of the weights in CIIP modes to update may be inferred at the sub-block level.
3. Multiple sets of weights may be provided in CIIP modes. The indication of which weight set to use for a prediction unit/codec block/region may depend on, for example, a message (e.g., a flag) signaled in the sequence (e.g., SPS)/slice (e.g., slice header)/slice group (e.g., slice header)/picture level (e.g., picture header)/block level (e.g., CTU or CU).
A. In one example, a message (e.g., a flag) may be signaled in the sequence (e.g., SPS)/slice (e.g., slice header)/picture level (e.g., picture header)/block level (e.g., CTU or CU) to indicate which weight set is used in CIIP mode.
B. Alternatively, in one example, the indication of which weight set to use in CIIP mode may be inferred; furthermore, in one example, the inference may be based on:
A) Current block dimension
B) Current quantization parameters
C) Merge index used in CIIP
D) Intra prediction mode used in CIIP
E) Amplitude of motion vector used in CIIP
F) Indication of which weight set to use for neighboring blocks
C. In one example, an indication of which set of weights to use in CIIP mode may be signaled at the sub-block level.
D. In one example, an indication of which weight set to use in CIIP mode may be inferred at the sub-block level (a sketch of such a selection is given below).
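A hypothetical sketch of item 3 follows: several candidate weight sets are kept, and one is selected per block either from a signaled index or by inference. The second candidate set and the dimension-based inference rule are purely illustrative assumptions, not part of the proposal.

```cpp
#include <array>
#include <utility>

using WeightSet = std::array<std::pair<int, int>, 4>;   // (w_intra_i, w_inter_i), i = 1..4

WeightSet selectCiipWeightSet(bool indexSignaled, int signaledIdx,
                              int blkWidth, int blkHeight) {
    static const WeightSet kSets[2] = {
        {{{6, 2}, {5, 3}, {3, 5}, {2, 6}}},   // current CIIP weights
        {{{8, 0}, {8, 0}, {0, 8}, {0, 8}}},   // an updated set of the (N, 0)/(0, N) form with N = 8
    };
    const int idx = indexSignaled ? signaledIdx
                                  : (blkWidth * blkHeight >= 256 ? 1 : 0);  // illustrative inference rule
    return kSets[idx & 1];
}
```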
Embodiment examples
7.3.2 Raw byte sequence payload, end bit (trailing bit) and byte alignment syntax
7.3.2.1 Sequence parameter set RBSP syntax
sps_scc_MHIntra_weight_enabled_flag equal to 1 specifies that the weights in CIIP mode are updated. sps_scc_MHIntra_weight_enabled_flag equal to 0 specifies that the weights in CIIP mode are not updated. When sps_scc_MHIntra_weight_enabled_flag is not present, it is inferred to be equal to 0.
7.3.4 Slice group header syntax
7.3.4.1 General slice group header syntax
8.5.7.6 Weighted sample prediction procedure for combining merge and intra prediction
Inputs to this process are:
The width cbWidth of the current codec block,
The height cbHeight of the current codec block,
- Two (cbWidth) x (cbHeight) arrays predSamplesInter and predSamplesIntra,
-Intra prediction mode predModeIntra
Variable cIdx specifying color component index
The output of this process is the (cbWidth) x (cbHeight) array predSamplesComb of predicted sample values
The derivation of variable bitDepth is as follows:
- bitDepth is set equal to BitDepthY if cIdx is equal to 0.
- Otherwise, bitDepth is set equal to BitDepthC.
The prediction samples predSamplesComb[x][y] (where x = 0..cbWidth - 1 and y = 0..cbHeight - 1) are derived as follows:
- The weight w is derived as follows:
-w is set equal to 4 if one or more of the following conditions are true:
- cbWidth is less than 4.
- cbHeight is less than 4.
- predModeIntra is equal to INTRA_PLANAR.
- predModeIntra is equal to INTRA_DC.
- Otherwise, if predModeIntra is equal to INTRA_ANGULAR50, then w is specified in Table 8-11, with nPos set equal to y and nSize set equal to cbHeight.
- Otherwise, if predModeIntra is equal to INTRA_ANGULAR18, then w is specified in Table 8-11, with nPos set equal to x and nSize set equal to cbWidth.
Otherwise, w is set equal to 4.
The derivation of the prediction samples predSamplesComb[x][y] is as follows:
-if tile_group_MHIntra_SCC_weight_enabled_flag is 1,
-if tile_group_MHIntra_SCC_weight_enabled_flag is 0,
Table 8-11 - Definition of w1 as a function of position nP and dimension nS
Table 8-12 - Definition of w2 as a function of position nP and dimension nS
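Since the per-sample combination formulas gated by tile_group_MHIntra_SCC_weight_enabled_flag and the contents of Tables 8-11 and 8-12 are not reproduced in this text, the sketch below covers only the derivation of the weight w specified above. The intra mode constants follow the usual VVC numbering, and the table lookup is left as a placeholder callback.

```cpp
// Weight derivation of the weighted sample prediction procedure above.
int deriveCiipWeight(int predModeIntra, int cbWidth, int cbHeight, int x, int y,
                     int (*lookupTable8_11)(int nPos, int nSize)) {
    const int INTRA_PLANAR = 0, INTRA_DC = 1, INTRA_ANGULAR18 = 18, INTRA_ANGULAR50 = 50;
    if (cbWidth < 4 || cbHeight < 4 ||
        predModeIntra == INTRA_PLANAR || predModeIntra == INTRA_DC)
        return 4;
    if (predModeIntra == INTRA_ANGULAR50)      // vertical mode: w depends on the row y
        return lookupTable8_11(y, cbHeight);
    if (predModeIntra == INTRA_ANGULAR18)      // horizontal mode: w depends on the column x
        return lookupTable8_11(x, cbWidth);
    return 4;
}
```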
Fig. 24 is a block diagram of the video processing apparatus 1000. The apparatus 1000 may be used to implement one or more of the methods described herein. The device 1000 may be included in a smartphone, tablet, computer, internet of things (IoT) receiver, or the like. The device 1000 may include one or more processors 1002, one or more memories 1004, and video processing hardware 1006. The processor(s) 1002 may be configured to implement one or more of the methods described in this document. Memory(s) 1004 may be used to store data and code for implementing the methods and techniques described herein. Video processing hardware 1006 may be used to implement some of the techniques described in this document in hardware circuitry.
Fig. 25 is a flow chart of an example method 2500 of video processing. Method 2500 includes updating (2502) weights for combining inter and intra prediction modes during a transition between a current video region and a bitstream representation of a current video block, and performing (2504) the transition based on the updating.
Additional features are described in the claims section and in chapter 4.
Fig. 26 is a flow chart of an example method 2600 of video processing. The method 2600 comprises: determining (2602), for a transition between a first block of video and a bitstream representation of the first block of video, whether to enable or disable use of an update to weights in a combined intra and inter prediction (CIIP) mode to be applied during the transition; in response to determining that updating of weights in CIIP mode is enabled, updating (2604) weights applied in a portion of pixels of the first block in CIIP mode; and performing (2606) the conversion based on the updated weights and the non-updated weights.
In some examples, CIIP mode is applied to derive a final prediction of the first block based on a weighted sum of the intra prediction and the inter merge prediction of the first block.
In some examples, CIIP modes are applied for at least one of the prediction unit, the codec block, and the region during the transition.
In some examples, the weights applied to the partial pixels of the first block in CIIP mode include one or more weight sets, and each weight set (w_intra_i, w_inter_i) includes a weight for intra mode (w_intra_i) and a weight for inter mode (w_inter_i) and is applied to the corresponding prediction unit, codec block or region, where i is an integer.
In some examples, i is from 1 to 4.
In some examples, the weight set (w_intra_1, w_inter_1) is for the region closest to the reference samples and (w_intra_4, w_inter_4) is for the region farthest from the reference samples.
In some examples, the weight sets in CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (0, N), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
In some examples, the weight sets in CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
In some examples, the weight sets in CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (N, 0) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
In some examples, the weight sets (w_intra, w_inter) for pixels in the top K rows of the first block have a weight (N, 0), and the weight sets (w_intra, w_inter) for the other rows have a weight (0, N), K being an integer.
In some examples, the weight sets (w_intra, w_inter) for pixels in the left K columns of the first block have a weight (N, 0), and the weight sets (w_intra, w_inter) for the other columns have a weight (0, N), K being an integer.
In one example, N is set to 1.
In some examples, N is set to 8 when the final prediction block is a weighted average of two prediction blocks divided by 8.
In some examples, weighted prediction is implemented as block copy in CIIP mode when an updated set of weights is used during the transition.
In some examples, the value of the weight applied in the CIIP mode in a portion of the pixels of the first block depends on the message signaled in at least one of the following options: a sequence level comprising a Sequence Parameter Set (SPS), a slice level comprising a slice header, a picture level comprising a picture header, and a block level comprising a Coding Tree Unit (CTU) and a Coding Unit (CU).
In some examples, the determination is based on an indication of use of the update to the weights in CIIP modes.
In some examples, the indication of the usage of the updated weights in CIIP mode depends on a message signaled in at least one of the following options: a sequence level comprising a Sequence Parameter Set (SPS), a slice level comprising a slice header, a picture level comprising a picture header, and a block level comprising a Coding Tree Unit (CTU) and a Coding Unit (CU).
In some examples, the message includes a flag present in the bitstream representation.
In some examples, when the flag is true, the use of updated weights in CIIP mode is enabled.
In some examples, when the flag is false, the use of updated weights in CIIP mode is disabled.
In some examples, the flag is tile_group_MHIntra_scc_weight_enabled_flag.
In some examples, the indication of the usage of the updated weights in CIIP mode is inferred.
In some examples, the indication of the usage of the updated weights in CIIP mode depends on at least one of the following options:
a) Current block dimensions;
b) Current quantization parameters;
c) Transforming the type;
d) A coding/decoding mode of a block to which an inter motion vector points;
e) Motion vector accuracy;
f) A merge index used in CIIP modes;
g) CIIP intra prediction modes used in modes;
h) The magnitude of the motion vector used in CIIP modes;
i) An indication of which set of weights is used by the neighboring blocks of the first block.
In some examples, the indication of the usage of the updated weights in CIIP mode is signaled at the sub-block level.
In some examples, the indication of the usage of the updated weights in CIIP mode is inferred at the sub-block level.
Fig. 27 is a flow chart of an example method 2700 of video processing. The method 2700 includes: for a transition between a first block of video and a bitstream representation of the first block of video, determining (2702) a set of weights from a plurality of sets of weights being used in a combined intra and inter prediction (CIIP) mode, the determining being dependent on a message present in at least one of: a sequence level comprising a Sequence Parameter Set (SPS), a slice level comprising a slice header, a picture level comprising a picture header, and a block level comprising a Coding Tree Unit (CTU) and a Coding Unit (CU); applying (2704) the CIIP mode based on the determined set of weights to generate a final prediction of the first block; and performing (2706) the conversion based on the final prediction; wherein the final prediction of the first block is generated based on a weighted sum of the intra prediction and the inter merge prediction of the first block.
In some examples, at least one of the plurality of sets of weights in the CIIP mode is applied for at least one of the prediction unit, the codec block, and the region during the transition.
In some examples, the message includes a flag present in the bitstream representation.
In some examples, the message is signaled to indicate which set of weights to use in CIIP modes.
In some examples, an indication of which of the plurality of weight sets is being used in CIIP mode is inferred.
In some examples, the indication of which of the plurality of sets of weights is being used in CIIP mode depends on at least one of the following options:
a) Current block dimensions;
b) Current quantization parameters;
c) A merge index used in CIIP modes;
d) CIIP intra prediction modes used in modes;
e) The magnitude of the motion vector used in CIIP modes;
f) An indication of which set of weights is used by the neighboring blocks of the first block.
In some examples, an indication of which of the multiple sets of weights is being used in CIIP mode is signaled at the sub-block level.
In some examples, an indication of which of the multiple sets of weights is being used in CIIP mode is inferred at the sub-block level.
In some examples, the conversion generates the first block of video from the bitstream representation.
In some examples, the conversion generates the bitstream representation from the first block of video.
Other aspects, examples, embodiments, modules, and functional operations disclosed and described herein may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed herein and structural equivalents thereof, or in combinations of one or more of them. The disclosed and other embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine readable storage device, a machine readable storage substrate, a memory device, a composition of matter effecting a machine readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. In addition to hardware, the apparatus may include code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. The propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and decode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. The computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described herein can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disk; CD ROM and DVD ROM discs. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any subject matter or of the claims, but rather as descriptions of features specific to particular embodiments of particular technologies. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although certain features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Also, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Furthermore, the separation of various system components in the embodiments described herein should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples have been described, and other implementations, enhancements, and variations may be made based on what is described and illustrated in this patent document.

Claims (35)

1. A video processing method, comprising:
Determining, for a transition between a first block of video and a bitstream representation of said first block of video, whether to enable or disable use of an update of weights in a combined intra and inter prediction CIIP mode to be applied during said transition;
updating weights applied in a portion of pixels of the first block in CIIP mode in response to determining that updating of weights in CIIP mode is enabled; and
The conversion is performed based on the updated weights and the non-updated weights,
wherein the updated weight sets (w_intra, w_inter) for pixels in the top K rows of the first block have weights (N, 0), and the updated weight sets (w_intra, w_inter) for the other rows have weights (0, N), K being an integer, and/or
the updated weight sets (w_intra, w_inter) for pixels in the left K columns of the first block have weights (N, 0), and the updated weight sets (w_intra, w_inter) for the other columns have weights (0, N), K being an integer.
2. The method of claim 1, wherein the CIIP mode is applied to derive a final prediction of the first block based on a weighted sum of an intra prediction and an inter merge prediction of the first block.
3. The method of claim 1 or 2, wherein the CIIP modes are applied for at least one of prediction units, codec blocks, and regions during the conversion.
4. The method of claim 1, wherein the weights applied to the partial pixels of the first block in CIIP mode comprise one or more weight sets, and each weight set (w_intra_i, w_inter_i) comprises a weight for intra mode (w_intra_i) and a weight for inter mode (w_inter_i) and is applied to a corresponding prediction unit, codec block or region, where i is an integer.
5. The method of claim 4, wherein i is from 1 to 4.
6. The method of claim 5, wherein the weight set (w_intra_1, w_inter_1) is used for the region closest to the reference samples and (w_intra_4, w_inter_4) is used for the region farthest from the reference samples.
7. The method of claim 5 or 6, wherein the weight sets in CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (0, N), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
8. The method of claim 5 or 6, wherein the weight sets in CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (0, N) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
9. The method of claim 5 or 6, wherein the weight sets in CIIP mode are updated to (w_intra_1, w_inter_1) = (N, 0), (w_intra_2, w_inter_2) = (N, 0), (w_intra_3, w_inter_3) = (N, 0) and (w_intra_4, w_inter_4) = (0, N), N being an integer.
10. The method according to claim 1 or 2, wherein N is set to 1.
11. The method according to claim 1 or 2, wherein N is set to 8 when the final prediction block is a weighted average of two prediction blocks divided by 8.
12. The method of claim 1 or 2, wherein when an updated set of weights is used during the transition, weighted prediction is implemented as block copy in CIIP mode.
13. The method of claim 1 or 2, wherein the value of the weights applied in the CIIP mode in the partial pixels of the first block depends on a message signaled in at least one of the following options: a sequence level comprising a sequence parameter set SPS, a slice level comprising a slice header, a picture level comprising a picture header, and a block level comprising a codec tree unit CTU and a codec unit CU.
14. The method of claim 1, wherein the determination is based on an indication of use of the weights in CIIP modes to update.
15. The method of claim 14, wherein the indication of use of the update of weights in CIIP mode is dependent on a message signaled in at least one of: a sequence level comprising a sequence parameter set SPS, a slice level comprising a slice header, a picture level comprising a picture header, and a block level comprising a codec tree unit CTU and a codec unit CU.
16. The method of claim 15, wherein the message comprises a flag present in the bitstream representation.
17. The method of claim 16, wherein when the flag is true, enabling use of updating weights in CIIP modes.
18. The method of claim 16, wherein when the flag is false, disabling use of updating weights in CIIP modes.
19. The method of any of claims 16-18, wherein the flag is tile_group_MHIntra_scc_weight_enabled_flag.
20. The method of claim 14, wherein an indication of use of the update to the weights in CIIP modes is inferred.
21. The method of claim 20, wherein the indication of use of the update of the weights in CIIP mode is dependent on at least one of:
a) Current block dimensions;
b) Current quantization parameters;
c) Transforming the type;
d) A coding/decoding mode of a block to which an inter motion vector points;
e) Motion vector accuracy;
f) A merge index used in CIIP modes;
g) CIIP intra prediction modes used in modes;
h) The magnitude of the motion vector used in CIIP modes;
i) An indication of which set of weights is used by neighboring blocks of the first block.
22. The method of claim 15, wherein the indication of use of the update to weights in CIIP mode is signaled at a sub-block level.
23. The method of claim 14, wherein the indication of use of the update to weights in CIIP modes is inferred at a sub-block level.
24. The method of claim 1, further comprising:
Determining a set of weights from a plurality of sets of weights being used in a combined intra and inter prediction CIIP mode, the determining being dependent on a message present in at least one of: a sequence level comprising a sequence parameter set SPS, a slice level comprising a slice header, a picture level comprising a picture header, and a block level comprising a codec tree unit CTU and a codec unit CU;
Based on the determined set of weights, applying CIIP a pattern to generate a final prediction of the first block; and
Performing the conversion based on the final prediction;
wherein the final prediction of the first block is generated based on a weighted sum of the intra prediction and the inter merge prediction of the first block.
25. The method of claim 24, wherein during the converting, at least one of the plurality of weight sets in the CIIP mode is applied for at least one of a prediction unit, a codec block, and a region.
26. The method of claim 24 or 25, wherein the message comprises a flag present in the bitstream representation.
27. The method of claim 24 or 25, wherein the message is signaled to indicate which weight set is to be used in CIIP mode.
28. The method of claim 24 or 25, wherein an indication of which of the plurality of sets of weights is being used in CIIP mode is inferred.
29. The method of claim 28, wherein the indication of which of the plurality of sets of weights is being used in CIIP mode depends on at least one of:
a) Current block dimensions;
b) Current quantization parameters;
c) A merge index used in CIIP modes;
d) CIIP intra prediction modes used in modes;
e) The magnitude of the motion vector used in CIIP modes;
f) An indication of which set of weights is used by neighboring blocks of the first block.
30. The method of claim 24 or 25, wherein the indication of which of the plurality of sets of weights is being used in CIIP mode is signaled at a sub-block level.
31. The method of claim 24 or 25, wherein an indication of which of the plurality of sets of weights being used in CIIP mode is inferred at a sub-block level.
32. The method of claim 1 or 2, wherein the converting generates a first block of video from the bitstream representation.
33. The method of claim 1 or 2, wherein the converting generates the bitstream representation from a first block of the video.
34. An apparatus in a video system comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to implement the method of any of claims 1 to 33.
35. A non-transitory computer readable medium having stored thereon a computer program product comprising program code for implementing the method of any of claims 1 to 33.
CN202080019945.1A 2019-03-12 2020-03-12 Video processing method, apparatus and non-transitory computer readable medium Active CN113557720B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2019077838 2019-03-12
CNPCT/CN2019/077838 2019-03-12
PCT/CN2020/078988 WO2020182187A1 (en) 2019-03-12 2020-03-12 Adaptive weight in multi-hypothesis prediction in video coding

Publications (2)

Publication Number Publication Date
CN113557720A CN113557720A (en) 2021-10-26
CN113557720B true CN113557720B (en) 2024-06-28

Family

ID=72427741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080019945.1A Active CN113557720B (en) 2019-03-12 2020-03-12 Video processing method, apparatus and non-transitory computer readable medium

Country Status (2)

Country Link
CN (1) CN113557720B (en)
WO (1) WO2020182187A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119254956A (en) * 2023-07-03 2025-01-03 中兴通讯股份有限公司 Method, device and storage medium for determining weight based on template cost

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107113425A (en) * 2014-11-06 2017-08-29 三星电子株式会社 Method for video coding and equipment and video encoding/decoding method and equipment
KR102775879B1 (en) * 2015-09-10 2025-02-28 엘지전자 주식회사 Image processing method based on inter-intra merge prediction mode and device therefor
CN108293113B (en) * 2015-10-22 2022-01-21 Lg电子株式会社 Modeling-based image decoding method and apparatus in image encoding system
EP3376764A4 (en) * 2015-11-12 2019-12-04 LG Electronics Inc. Method and apparatus for coefficient induced intra prediction in image coding system
US11032550B2 (en) * 2016-02-25 2021-06-08 Mediatek Inc. Method and apparatus of video coding
US11496747B2 (en) * 2017-03-22 2022-11-08 Qualcomm Incorporated Intra-prediction mode propagation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANDRE, Seixas Dias. CE10-related: Multi-Hypothesis Intra with Weighted Combination. Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting: Marrakech, MA, 9-18 Jan. 2019, Document: JVET-M0454-v1, 2019, abstract, sections 1-2. *
CHIANG, Man-Shu. CE10.1.1: Multi-hypothesis prediction for improving AMVP mode, skip or merge mode, and intra mode. Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 12th Meeting: Macao, CN, 3-12 Oct. 2018, 2018, section 2.3. *

Also Published As

Publication number Publication date
CN113557720A (en) 2021-10-26
WO2020182187A1 (en) 2020-09-17

Similar Documents

Publication Publication Date Title
US11457226B2 (en) Side information signaling for inter prediction with geometric partitioning
CN113170167B (en) Flag indication method in intra-block copy mode
JP7425808B2 (en) Conditional execution of motion candidate list construction process
CN114009037B (en) Motion candidate list construction for intra block copy mode
US20220094927A1 (en) Sub-block based intra block copy
CN114128295B (en) Construction of geometric partition mode candidate list in video coding and decoding
JP7648722B2 (en) Constructing motion candidate lists for video coding.
WO2020177684A1 (en) Enabling dmvr based on the information in the picture header
CN113966616B (en) Motion candidate list construction using neighboring block information
WO2020140862A1 (en) Conditional application of inter prediction with geometric partitioning in video processing
WO2020164543A1 (en) Motion prediction based on shared merge list
WO2020143742A1 (en) Simplified context modeling for context adaptive binary arithmetic coding
CN113557720B (en) Video processing method, apparatus and non-transitory computer readable medium
RU2808631C2 (en) Maintaining tables for storing candidates for prediction of motion vector based on history

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant