
CN113366831A - Coordination between overlapped block motion compensation and other tools

Info

Publication number: CN113366831A (granted as CN113366831B)
Application number: CN202080008149.8A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 刘鸿彬 (Hongbin Liu), 张莉 (Li Zhang), 张凯 (Kai Zhang), 王悦 (Yue Wang)
Applicants/Assignees: Beijing ByteDance Network Technology Co., Ltd.; ByteDance Inc.
Legal status: Granted; Active
Related application: CN202311510897.8A (published as CN117560503A)


Classifications

    • H04N19/583: Motion compensation with overlapping blocks
    • H04N19/52: Processing of motion vectors by encoding by predictive encoding
    • H04N19/523: Motion estimation or motion compensation with sub-pixel accuracy

    All three fall under H (Electricity); H04 (Electric communication technique); H04N (Pictorial communication, e.g. television); H04N19/00 (Methods or arrangements for coding, decoding, compressing or decompressing digital video signals); H04N19/50 (using predictive coding); H04N19/503 (involving temporal prediction); H04N19/51 (Motion estimation or motion compensation). H04N19/52 additionally falls under H04N19/513 and H04N19/517 (Processing of motion vectors by encoding).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to coordination between overlapped block motion compensation and other tools. A video processing method includes: generating, during a conversion between a current block in video data and a bitstream representation of the current block, at least one sub-block from the current block based on a dimension of the current block; generating different predictions for the at least one sub-block based on different prediction lists; applying early termination processing at a sub-block level to determine whether to apply a bi-directional optical flow (BDOF) processing tool to the at least one sub-block; and performing the conversion based on the applying; wherein the BDOF processing tool generates a prediction offset based on at least one of horizontal or vertical gradients of the different predictions.

Description

Coordination between overlapped block motion compensation and other tools
Cross Reference to Related Applications
Under applicable patent law and/or the provisions of the Paris Convention, the present application timely claims the priority of and the benefit of International Patent Application No. PCT/CN2019/071508, filed on January 13, 2019. The entire disclosure of the aforementioned application is incorporated herein by reference as part of the disclosure of the present application.
Technical Field
This patent document relates to video encoding and decoding techniques, devices and systems.
Background
Digital video accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, the bandwidth demand for digital video usage is expected to continue to grow.
Disclosure of Invention
The disclosed techniques may be used by video decoder or encoder embodiments in which overlapped block motion compensation with motion information derived from neighboring blocks is used. The described methods may be applied to existing video coding standards, such as High Efficiency Video Coding (HEVC), and to future video coding standards or video codecs.
In one example aspect, a method of processing video includes: determining at least one neighboring block of a current block of visual media data during a conversion between the current block and a corresponding codec representation of the visual media data; determining motion information of the at least one neighboring block; and performing Overlapped Block Motion Compensation (OBMC) on the current block based on the motion information of the at least one neighboring block, wherein the OBMC comprises: generating a final predictor of one sub-block of the current block using an intermediate predictor of the sub-block and predictors of at least one neighboring sub-block.
In another example aspect, a method of processing video includes: determining at least one neighboring block of a current block of visual media data during a conversion between the current block and a corresponding codec representation of the visual media data; determining motion information of the at least one neighboring block; modifying the motion information of the current block based on the motion information of the at least one neighboring block to generate modified motion information of the current block; and performing processing of the current block based on the modified motion information.
In another example aspect, a method of processing video includes: determining a plurality of neighboring blocks of a current block of visual media data during a conversion between the current block and a corresponding codec representation of the visual media data; determining motion information of the plurality of neighboring blocks; determining a first prediction block of the current block based on motion information of the current block; determining a second prediction block of the current block based on the motion information of the plurality of neighboring blocks; modifying the first prediction block based on the second prediction block; and performing processing of the current block based on the first prediction block.
In another example aspect, a method of processing video includes: determining a motion vector of a first sub-block within a current block during a conversion between the current block and a bitstream representation of the current block; and performing the conversion using an Overlapped Block Motion Compensation (OBMC) mode; wherein the OBMC mode generates a final prediction value of the first sub-block using an intermediate prediction value of the first sub-block based on the motion vector of the first sub-block and a prediction value of at least a second video unit adjacent to the first sub-block; and wherein a sub-block size of the first sub-block is based on a block size, a block shape, motion information, or a reference picture of the current block.
In another example aspect, a video processing method includes: generating, during a conversion between a current block in video data and a bitstream representation of the current block, at least one sub-block from the current block based on a dimension of the current block; generating different predictions for the at least one sub-block based on different prediction lists; applying early termination processing at a sub-block level to determine whether to apply a bi-directional optical flow (BDOF) processing tool to the at least one sub-block; and performing the conversion based on the applying; wherein the BDOF processing tool generates a prediction offset based on at least one of horizontal or vertical gradients of the different predictions.
In another example aspect, a video processing method includes: generating a current motion vector for a current block during a conversion between the current block in video data and a bitstream representation of the current block; generating one or more neighboring motion vectors of one or more neighboring blocks of the current block; deriving a first type of prediction for the current block based on the current motion vector; deriving one or more second type predictions for the current block based on the one or more neighboring motion vectors; determining whether to apply Local Illumination Compensation (LIC) to the first type prediction or the second type predictions based on characteristics of the current block or the neighboring blocks; and performing the conversion based on the determination; wherein the LIC constructs a linear model with multiple parameters to refine a prediction based on its prediction direction.
In another example aspect, a video processing method includes: generating a current motion vector for a current block during a conversion between the current block in video data and a bitstream representation of the current block; generating one or more neighboring motion vectors of one or more neighboring blocks of the current block; deriving a first type of prediction for the current block based on the current motion vector; deriving one or more second type predictions for the current block based on the one or more neighboring motion vectors; applying generalized bi-directional prediction (GBi) to the first type of prediction or the second type of predictions; and performing the conversion based on the applying; wherein GBi comprises applying equal or unequal weights to different prediction directions of the first and second types of prediction based on a GBi index into a weight list.
In another example aspect, a video processing method includes: determining one or more prediction blocks for a current video block during a conversion between the current video block and a bitstream representation of the current video block; and performing, based on the one or more prediction blocks, the conversion at least by using Overlapped Block Motion Compensation (OBMC) and decoder-side motion vector derivation (DMVD), wherein the DMVD applies refinement to motion vectors based on a sum of absolute differences between different prediction directions, or applies refinement to predictions based on at least one of horizontal or vertical gradients of the different predictions, and wherein the OBMC derives a refined prediction based on the current motion vector of the current video block and one or more neighboring motion vectors of neighboring blocks.
In another example aspect, a video processing method includes: determining availability of at least one neighboring sample of a current video block during a conversion between the current video block in video data and a bitstream representation of the current video block; generating an intra prediction for the current video block based on the availability of the at least one neighboring sample; generating an inter prediction of the current video block based on at least one motion vector; deriving a final prediction for the current video block based on a weighted sum of the intra prediction and the inter prediction; and performing the conversion based on the final prediction.
In yet another exemplary aspect, the various techniques described herein may be implemented as a computer-readable recording medium having recorded thereon a program containing code for execution by a processor to perform the methods described herein.
In yet another example aspect, an apparatus in video encoding may implement the methods described herein.
In yet another exemplary aspect, a video decoder device may implement a method as described herein.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 shows an example of alternative temporal motion vector prediction (ATMVP) for a Coding Unit (CU).
Fig. 2 shows an example of one CU with four sub-blocks A-D and its neighboring blocks a-d.
Fig. 3 shows an example of sub-blocks to which Overlapped Block Motion Compensation (OBMC) is applied.
Fig. 4 shows an example of an encoding flow with different Motion Vector (MV) precision.
FIG. 5 illustrates an example of a simplified affine motion model.
Fig. 6 shows an example of an affine Motion Vector Field (MVF) for each sub-block.
Fig. 7 shows examples of the 4-parameter affine models (a) and 6-parameter affine models (b).
Fig. 8 shows an example of MVP for AF_INTER.
Fig. 9 shows an example of candidates for AF_MERGE.
Fig. 10 shows an example of neighboring blocks of a current block.
Fig. 11 is a block diagram of an example of a video processing apparatus.
Fig. 12 shows a block diagram of an example implementation of a video encoder.
Fig. 13 is a flowchart of an example of a video processing method.
Fig. 14 is a flowchart of an example of a video processing method.
Fig. 15 is a flowchart of an example of a video processing method.
FIG. 16 illustrates an example hardware platform for implementing some disclosed methods.
FIG. 17 illustrates another example hardware platform for implementing some disclosed methods.
FIG. 18 is a block diagram of an example video processing system in which the disclosed technology may be implemented.
Fig. 19 is a flowchart of an example of a video processing method.
Fig. 20 is a flowchart of an example of a video processing method.
Fig. 21 is a flowchart of an example of a video processing method.
Fig. 22 is a flowchart of an example of a video processing method.
Fig. 23 is a flowchart of an example of a video processing method.
Fig. 24 is a flowchart of an example of a video processing method.
Fig. 25 is a flowchart of an example of a video processing method.
Fig. 26 is a flowchart of an example of a video processing method.
Fig. 27 is a flowchart of an example of a video processing method.
Detailed Description
Various techniques are provided herein that may be used by a decoder of a video bitstream to improve the quality of decompressed or decoded digital video. In addition, the video encoder may also implement these techniques during the encoding process in order to reconstruct the decoded frames for further encoding.
For ease of understanding, section headings are used herein and do not limit embodiments and techniques to the respective sections. Thus, embodiments from one section may be combined with embodiments from other sections.
1. Overview
This invention relates to video coding and decoding technologies, and more particularly to overlapped block motion compensation in video coding. It may be applied to existing video coding standards, such as High Efficiency Video Coding (HEVC), or to the standard to be finalized (Versatile Video Coding, VVC). It is also applicable to future video coding standards or video codecs.
2. Background of the invention
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC) and H.265/HEVC standards. Since H.262, video coding standards have been based on a hybrid video coding structure, in which temporal prediction plus transform coding is utilized. To explore future video coding technologies beyond HEVC, the Joint Video Exploration Team (JVET) was founded jointly by VCEG and MPEG in 2015. Since then, many new methods have been adopted by JVET and put into a reference software named Joint Exploration Model (JEM). In April 2018, the Joint Video Experts Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the Versatile Video Coding (VVC) standard, which targets a 50% bitrate reduction compared to HEVC.
Fig. 12 is a block diagram of an example implementation of a video encoder. Fig. 12 shows an encoder implementation with a built-in feedback path, where the video encoder also performs the video decoding function (reconstructing a compressed representation of the video data for encoding of the next video data).
2.1 sub-CU-based motion vector prediction
In JEM with quadtree plus binary tree (QTBT) partitioning, each CU can have at most one set of motion parameters for each prediction direction. Two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU. The alternative temporal motion vector prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In the spatial-temporal motion vector prediction (STMVP) method, motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and the spatial neighboring motion vectors.
In order to maintain a more accurate motion field for sub-CU motion prediction, motion compression of the reference frame is currently disabled.
Fig. 1 is an example of ATMVP motion prediction for a CU.
2.1.1 Alternative temporal motion vector prediction
In the alternative temporal motion vector prediction (ATMVP) method, the temporal motion vector prediction (TMVP) is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU. ATMVP is also known as subblock-based temporal motion vector prediction (SbTMVP). As shown in Fig. 1, the sub-CUs are square N×N blocks (N is set to 4 by default).
ATMVP predicts motion vectors of sub-CUs within a CU in two steps. The first step is to identify the corresponding block in the reference picture with a so-called temporal vector. The reference picture is called a motion source picture. The second step is to divide the current CU into sub-CUs and obtain the reference index and motion vector of each sub-CU from the corresponding block of each sub-CU, as shown in fig. 1.
In a first step, the reference picture and the corresponding block are determined from motion information of spatially neighboring blocks of the current CU. To avoid the repeated scanning process of the neighboring blocks, the first Merge candidate in the Merge candidate list of the current CU is used. The first available motion vector and its associated reference index are set as the temporal vector and the index to the motion source picture. In this way, in ATMVP, the corresponding block can be identified more accurately than in TMVP, where the corresponding block (sometimes referred to as a collocated block) is always located in the lower right or center position relative to the current CU.
In the second step, the corresponding block of each sub-CU is identified by the temporal vector in the motion source picture, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (the smallest motion grid that covers the center sample) is used to derive the motion information for the sub-CU. After the motion information of a corresponding N×N block is identified, it is converted to the motion vectors and reference indices of the current sub-CU, in the same way as the TMVP of HEVC, where motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is fulfilled (i.e., the POCs of all reference pictures of the current picture are smaller than the picture order count (POC) of the current picture) and possibly uses motion vector MVx (the motion vector corresponding to reference picture list X) to predict motion vector MVy (with X being equal to 0 or 1 and Y being equal to 1-X) for each sub-CU.
2.1.2 spatio-temporal motion vector prediction
In this method, the motion vectors of the sub-CUs are derived recursively, in raster scan order. Fig. 2 illustrates this concept. Consider an 8×8 CU, which contains four 4×4 sub-CUs A, B, C, and D. The neighboring 4×4 blocks in the current frame are labeled a, b, c, and d.
The motion derivation for sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N×N block above sub-CU A (block c). If this block c is not available or is intra coded, the other N×N blocks above sub-CU A are checked (from left to right, starting at block c). The second neighbor is the block to the left of sub-CU A (block b). If block b is not available or is intra coded, other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b). The motion information obtained from the neighboring blocks for each list is scaled to the first reference frame of a given list. Next, the temporal motion vector prediction (TMVP) of sub-block A is derived by following the same procedure as the TMVP derivation specified in HEVC: the motion information of the collocated block at position D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
Fig. 2 is an example of one CU with four sub-blocks (A-D) and its neighboring blocks (a-d).
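As a concrete illustration of the averaging step above, the following is a minimal Python sketch, assuming MVs are stored as integer (mvx, mvy) tuples that have already been scaled to the first reference frame of the list; the function name and the use of integer division are illustrative assumptions, not JEM reference-software behavior.

```python
# Minimal sketch of the STMVP averaging step. MVs are (mvx, mvy) integer
# tuples, already scaled to the first reference frame of the list; None
# marks an unavailable neighbor (e.g., intra coded).

def stmvp_motion(above_mv, left_mv, tmvp_mv):
    candidates = [mv for mv in (above_mv, left_mv, tmvp_mv) if mv is not None]
    if not candidates:
        return None  # caller falls back to another prediction
    n = len(candidates)
    return (sum(mv[0] for mv in candidates) // n,
            sum(mv[1] for mv in candidates) // n)

# Above neighbor unavailable; the left and temporal MVs are averaged.
print(stmvp_motion(None, (8, -4), (6, -2)))  # -> (7, -3)
```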
2.1.3 sub-CU motion prediction mode signaling
The sub-CU modes are enabled as additional Merge candidates and no additional syntax element is needed to signal the modes. Two additional Merge candidates are added to the Merge candidate list of each CU to represent the ATMVP mode and the STMVP mode. If the sequence parameter set indicates that ATMVP and STMVP are enabled, up to seven Merge candidates are used. The encoding logic of the additional Merge candidates is the same as for the Merge candidates in the HM, which means that for each CU in a P or B slice, two more RD checks are needed for the two additional Merge candidates.
In JEM, all bins of the Merge index are context coded by context-adaptive binary arithmetic coding (CABAC). However, in HEVC, only the first bin is context coded and the remaining bins are context bypass coded.
2.2 overlapped block motion compensation
Overlapped Block Motion Compensation (OBMC) has previously been used in H.263. In JEM, unlike in H.263, OBMC can be switched on and off using syntax at the CU level. When OBMC is used in JEM, it is performed for all Motion Compensation (MC) block boundaries except the right and bottom boundaries of a CU. Moreover, it is applied to both the luma and chroma components. In JEM, an MC block corresponds to a coding block. When a CU is coded with sub-CU mode (including sub-CU Merge, affine and frame-rate up-conversion (FRUC) modes), each sub-block of the CU is an MC block. To process CU boundaries in a uniform fashion, OBMC is performed at the sub-block level for all MC block boundaries, where the sub-block size is set equal to 4x4, as illustrated in Fig. 3.
When OBMC is applied to the current sub-block, in addition to the current motion vector, the motion vectors of the four connected neighboring sub-blocks (if available and different from the current motion vector) may also be used to derive a prediction block for the current sub-block. These multiple prediction blocks based on multiple motion vectors are combined to generate the final prediction signal for the current sub-block.
The prediction block based on the motion vector of a neighboring sub-block is denoted as P_N, with N indicating an index for the neighboring above, below, left and right sub-blocks, and the prediction block based on the motion vector of the current sub-block is denoted as P_C. When P_N is based on the motion information of a neighboring sub-block that contains the same motion information as the current sub-block, OBMC is not performed from P_N. Otherwise, every sample of P_N is added to the same sample in P_C, i.e., four rows/columns of P_N are added to P_C. The weighting factors {1/4, 1/8, 1/16, 1/32} are used for P_N and the weighting factors {3/4, 7/8, 15/16, 31/32} are used for P_C. The exception are small MC blocks (i.e., when the height or width of the coding block is equal to 4, or when a CU is coded with sub-CU mode), for which only two rows/columns of P_N are added to P_C. In this case, the weighting factors {1/4, 1/8} are used for P_N and the weighting factors {3/4, 7/8} are used for P_C. For a P_N generated based on the motion vector of a vertically (horizontally) neighboring sub-block, samples in the same row (column) of P_N are added to P_C with the same weighting factor.
Fig. 3 is an example of sub-blocks to which OBMC is applied.
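The weighted blending described above can be illustrated with a minimal Python sketch for one above-neighbor P_N, assuming array-based 4x4 predictions; the function name and layout are illustrative, not taken from any codec implementation.

```python
import numpy as np

def obmc_blend_above(pred_c: np.ndarray, pred_n: np.ndarray,
                     small_block: bool = False) -> np.ndarray:
    """Blend the prediction P_N of the above neighbor's MV into the current
    sub-block prediction P_C, row by row, with the weights given above."""
    out = pred_c.astype(np.float64)
    weights_n = [1 / 4, 1 / 8] if small_block else [1 / 4, 1 / 8, 1 / 16, 1 / 32]
    for row, w_n in enumerate(weights_n):
        # Samples in the same row of P_N share one weighting factor.
        out[row, :] = (1 - w_n) * pred_c[row, :] + w_n * pred_n[row, :]
    return out

pc = np.full((4, 4), 100.0)  # prediction from the current sub-block's MV
pn = np.full((4, 4), 120.0)  # prediction from the above sub-block's MV
print(obmc_blend_above(pc, pn)[:, 0])  # -> [105. 102.5 101.25 100.625]
```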
In JEM, for CUs with a size less than or equal to 256 luma samples, a CU level flag is signaled to indicate whether the current CU applies OBMC. For CUs with a size larger than 256 luma samples or that are not coded using AMVP mode, OBMC is applied by default. At the encoder, the impact of OBMC is taken into account in the motion estimation phase when it is applied to the CU. The prediction signal formed by OBMC using the motion information of the upper and left neighboring blocks is used to compensate the upper and left boundaries of the original signal of the current CU, and then a conventional motion estimation process is applied.
2.3 adaptive motion vector difference resolution
In HEVC, when use_integer_mv_flag is equal to 0 in the slice header, the motion vector difference (MVD) (between the motion vector and the predicted motion vector of a PU) is signaled in units of quarter luma samples. In JEM, a locally adaptive motion vector resolution (LAMVR) is introduced. In JEM, the MVD can be coded in units of quarter luma samples, integer luma samples or four luma samples. The MVD resolution is controlled at the coding unit (CU) level, and the MVD resolution flags are conditionally signaled for each CU that has at least one non-zero MVD component.
For a CU with at least one non-zero MVD component, a first flag is signaled to indicate whether quarter luma sample MV precision is used in the CU. When the first flag (equal to 1) indicates that quarter-luma sample MV precision is not used, another flag is signaled to indicate whether integer-luma sample MV precision or four-luma sample MV precision is used.
A CU uses quarter-luma sample MV resolution when the first MVD resolution flag of the CU is zero or not coded for the CU (meaning all MVDs in the CU are zero). When a CU uses integer luma sample MV precision or four luma sample MV precision, the MVP in the AMVP candidate list of the CU will be rounded to the corresponding precision.
In the encoder, RD checking at the CU level is used to determine which MVD resolution is to be used for the CU. That is, RD checking at the CU level is performed three times for each MVD resolution. To speed up the encoder speed, the following encoding scheme is applied in JEM.
During the RD check of a CU with a conventional quarter-luma sample MVD resolution, the motion information of the current CU (integer luma sample precision) is stored. When performing the RD check on the same CU with integer luma sample and 4 luma sample MVD resolutions, the stored motion information (after rounding) is used as a starting point for further small-range motion vector refinement, so that the time-consuming motion estimation process is not repeated three times.
The RD check of a CU with a 4 luma sample MVD resolution is conditionally invoked. For a CU, when the RD cost for the integer luma sample MVD resolution is much greater than the RD cost for the quarter-luma sample MVD resolution, the RD check for the 4 luma sample MVD resolution of the CU will be skipped.
The encoding process is shown in Fig. 4. First, the 1/4-pel MV is tested and the RD cost is calculated and denoted as RDCost0; then the integer MV is tested and the RD cost is denoted as RDCost1. If RDCost1 < th × RDCost0 (where th is a positive threshold), the 4-pel MV is tested; otherwise, the 4-pel MV is skipped. Basically, the motion information and RD cost, etc., of the 1/4-pel MV are already known when checking the integer or 4-pel MV, and they can be reused to speed up the encoding process of the integer or 4-pel MV.
Fig. 4 is an example of a flow chart for encoding with different MV precisions.
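The conditional RD check above can be sketched as follows; the cost inputs are stand-ins, the lazy callback mirrors the conditional invocation of the 4-pel check, and th = 1.3 is an illustrative value (the text only says th is positive).

```python
def choose_mvd_resolution(rd_cost_quarter_pel: float,
                          rd_cost_integer_pel: float,
                          rd_cost_four_pel_fn, th: float = 1.3) -> str:
    """Pick the MVD resolution; skip the 4-pel RD check when the integer-pel
    cost is already much larger than the quarter-pel cost."""
    best_res, best_cost = "1/4-pel", rd_cost_quarter_pel
    if rd_cost_integer_pel < best_cost:
        best_res, best_cost = "integer-pel", rd_cost_integer_pel
    if rd_cost_integer_pel < th * rd_cost_quarter_pel:
        # Called lazily; in JEM this check reuses the stored 1/4-pel motion
        # information as the starting point for refinement.
        cost4 = rd_cost_four_pel_fn()
        if cost4 < best_cost:
            best_res, best_cost = "4-pel", cost4
    return best_res

print(choose_mvd_resolution(1000.0, 1100.0, lambda: 990.0))  # -> 4-pel
```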
2.4 higher motion vector storage accuracy
In HEVC, motion vector precision is one-quarter pixel (one-quarter luma samples and one-eighth chroma samples of 4:2:0 video). In JEM, the accuracy of the internal motion vector storage and the Merge candidate is increased to 1/16 pixels. The higher motion vector precision (1/16 pixels) is used for motion compensated inter prediction of CUs coded in skip/Merge mode. For CUs coded in conventional AMVP mode, integer-pixel or quarter-pixel motion is used, as described in section 2.3.
An SHVC upsampling interpolation filter with the same filter length and normalization factor as the HEVC motion compensated interpolation filter is used as the motion compensated interpolation filter for the additional fractional pixel positions. The chroma component motion vector precision in JEM is 1/32 samples, and an additional interpolation filter for 1/32 pixel fractional positions is derived by using the average of two filters adjacent to 1/16 pixel fractional positions.
2.5 affine motion compensated prediction
In HEVC, Motion Compensated Prediction (MCP) only applies translational motion models. However, there may be a variety of motions in the real world, such as zoom in/out, rotation, perspective motion, and other irregular motions. In JEM a simplified affine transform motion compensated prediction is applied. As shown in fig. 5, the affine motion field of a block is described by two control point motion vectors.
Fig. 5 is an example of a simplified affine motion model.
The motion vector field (MVF) of a block is described by the following equation:

$$\begin{cases} v_x = \dfrac{(v_{1x}-v_{0x})}{w}\,x - \dfrac{(v_{1y}-v_{0y})}{w}\,y + v_{0x} \\[1ex] v_y = \dfrac{(v_{1y}-v_{0y})}{w}\,x + \dfrac{(v_{1x}-v_{0x})}{w}\,y + v_{0y} \end{cases} \qquad (1)$$
where (v_{0x}, v_{0y}) is the motion vector of the top-left corner control point, and (v_{1x}, v_{1y}) is the motion vector of the top-right corner control point.
To further simplify motion compensated prediction, sub-block based affine transform prediction is applied. The sub-block size M×N is derived as in equation 2, where MvPre is the motion vector fraction accuracy (e.g., 1/16 in JEM), and (v_{2x}, v_{2y}) is the motion vector of the bottom-left control point, calculated according to equation 1.

$$\begin{cases} M = \mathrm{clip3}\!\left(4,\; w,\; \dfrac{w \times \mathrm{MvPre}}{\max\left(\lvert v_{1x}-v_{0x}\rvert,\, \lvert v_{1y}-v_{0y}\rvert\right)}\right) \\[1ex] N = \mathrm{clip3}\!\left(4,\; h,\; \dfrac{h \times \mathrm{MvPre}}{\max\left(\lvert v_{2x}-v_{0x}\rvert,\, \lvert v_{2y}-v_{0y}\rvert\right)}\right) \end{cases} \qquad (2)$$
After being derived by equation 2, M and N should be adjusted downward, if necessary, to make them divisors of w and h, respectively.
To derive the motion vector for each M × N sub-block, the motion vector for the center sample of each sub-block may be calculated according to equation 1 and rounded to 1/16 fractional accuracy, as shown in fig. 6. A motion compensated interpolation filter is then applied to generate a prediction for each sub-block using the derived motion vectors.
Fig. 6 is an example of affine MVF of each sub-block.
After MCP, the high precision motion vector of each sub-block is rounded and saved to the same precision as the regular motion vector.
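A minimal Python sketch of equation 1 evaluated at sub-block centers follows, assuming MVs in 1/16-pel units, a fixed 4x4 sub-block size instead of the M×N derivation of equation 2, and Python's round() in place of the normative rounding; all names are illustrative.

```python
def affine_subblock_mvs(v0, v1, w, h, sub=4):
    """Evaluate equation 1 at each sub-block center.

    v0, v1: (x, y) MVs in 1/16-pel units of the top-left and top-right
    control points of a w x h block. Returns one MV per sub x sub sub-block.
    """
    (v0x, v0y), (v1x, v1y) = v0, v1
    mvs = []
    for y0 in range(0, h, sub):
        row = []
        for x0 in range(0, w, sub):
            cx, cy = x0 + sub / 2, y0 + sub / 2  # center sample position
            mvx = (v1x - v0x) / w * cx - (v1y - v0y) / w * cy + v0x
            mvy = (v1y - v0y) / w * cx + (v1x - v0x) / w * cy + v0y
            row.append((round(mvx), round(mvy)))  # stays in 1/16-pel units
        mvs.append(row)
    return mvs

# 16x16 block; the top-right control point moves by (0, -8) in 1/16 pel.
print(affine_subblock_mvs((0, 0), (0, -8), 16, 16)[0][0])  # -> (1, -1)
```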
In JEM, there are two affine motion modes: AF_INTER mode and AF_MERGE mode. For CUs with both width and height larger than 8, the AF_INTER mode can be applied. An affine flag at the CU level is signaled in the bitstream to indicate whether AF_INTER mode is used. In this mode, a candidate list with motion vector pairs {(v_0, v_1) | v_0 = {v_A, v_B, v_C}, v_1 = {v_D, v_E}} is constructed using the neighboring blocks. As shown in Fig. 8, v_0 is selected from the motion vectors of block A, B or C. The motion vector from the neighboring block is scaled according to the reference list and the relationship among the POC of the reference for the neighboring block, the POC of the reference for the current CU and the POC of the current CU. The approach to select v_1 from the neighboring blocks D and E is similar. If the number of candidates in the candidate list is smaller than 2, the list is padded with motion vector pairs composed by duplicating each of the AMVP candidates. When the candidate list is larger than 2, the candidates are first sorted according to the consistency of the neighboring motion vectors (the similarity of the two motion vectors in a candidate pair) and only the first two candidates are kept. An RD cost check is used to determine which motion vector pair candidate is selected as the control point motion vector predictor (CPMVP) of the current CU, and an index indicating the position of the CPMVP in the candidate list is signaled in the bitstream. After the CPMVP of the current affine CU is determined, affine motion estimation is applied and the control point motion vectors (CPMVs) are found. Then the difference between the CPMV and the CPMVP is signaled in the bitstream.
Fig. 7 is an example of the 4-parameter affine model (a) and the 6-parameter affine model (b).
Fig. 8 is an example of MVP for AF_INTER.
In the AF_INTER mode, when the 4/6-parameter affine mode is used, 2/3 control points are required, and therefore 2/3 MVDs need to be coded for these control points, as shown in Fig. 7. In JVET-K0337, it is proposed to derive the MVs by predicting mvd_1 and mvd_2 from mvd_0:
$$mv_0 = \overline{mv}_0 + mvd_0$$

$$mv_1 = \overline{mv}_1 + mvd_1 + mvd_0$$

$$mv_2 = \overline{mv}_2 + mvd_2 + mvd_0$$

where $\overline{mv}_i$ denotes the predicted motion vector of control point i.
At the encoder, the MVD of AF_INTER is derived iteratively. Suppose that this MVD derivation process is iterated n times; the final MVD is then calculated as follows, where a_i and b_i are the estimated affine parameters, and mvd[k]_h and mvd[k]_v are the horizontal and vertical components of mvd_k (k = 0, 1) derived in the i-th iteration.
[Equations rendered as images in the source document; not reproduced here.]
With JVET-K0337, i.e., predicting mvd_1 from mvd_0, what is now actually encoded for mvd_1 is only:

[Equation rendered as an image in the source document; not reproduced here.]
When a CU is coded in AF_MERGE mode, it gets the first block coded in affine mode from the valid neighboring reconstructed blocks, and the selection order for the candidate blocks is from left, above, above-right, below-left to above-left, as shown in Fig. 9. If the neighboring below-left block A is coded in affine mode, as shown in Fig. 9, the motion vectors v_2, v_3 and v_4 of the top-left, top-right and bottom-left corners of the CU containing block A are derived. The motion vector v_0 of the top-left corner of the current CU is calculated according to v_2, v_3 and v_4. After that, the motion vector v_1 of the top-right of the current CU is calculated.
After the CPMVs of the current CU, v_0 and v_1, are derived, the MVF of the current CU is generated according to the simplified affine motion model of equation 1. In order to identify whether the current CU is coded with AF_MERGE mode, an affine flag is signaled in the bitstream when there is at least one neighboring block coded in affine mode.
Fig. 9 is an example of candidates for AF_MERGE.
2.6 Intra Block copy
Decoder side:
in this approach [5], the current (partially) decoded picture is considered as a reference picture. The current picture is placed in the last position of reference picture list 0. Therefore, for a slice using the current picture as the only reference picture, its slice type is considered to be a P slice. The bitstream syntax in this approach follows the same syntax structure as inter coding, while the decoding process is unified with inter coding. The only outstanding difference is that the block vector (the motion vector pointing to the current picture) always uses integer-pel resolution.
The changes in this mode relative to the block-level CPR_flag approach are as follows:
in the encoder search for this mode, both the block width and height are less than or equal to 16.
Chroma interpolation is enabled when the luma block vector is an odd integer.
When the SPS flag is on, Adaptive Motion Vector Resolution (AMVR) is enabled for the CPR mode. In this case, when using AMVR, the block vector can be switched between 1 pixel integer and 4 pixel integer resolution at the block level.
Encoder side:
the encoder performs an RD check on blocks that are not larger than 16 in width or height. For non-Merge mode, a block vector search is first performed using a hash-based search. If no valid candidate is found from the hash search, a local search based on block matching will be performed.
In a hash-based search, hash-key matching (32-bit CRC) between the current block and the reference block is extended to all allowed block sizes. The hash key calculation for each position in the current picture is based on 4x4 blocks. For larger size current blocks, matching of the hash key to the reference block will occur when all of its 4x4 blocks match the hash key in the corresponding reference location. If multiple reference blocks are found to match the current block with the same hash key, the block vector cost of each candidate block is calculated and the block with the lowest cost is selected.
In the block matching search, the search range is set to 64 pixels on the left and top of the current block.
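The hash-key matching described above can be sketched as follows, assuming a flat row-major array of 8-bit luma samples; a real encoder would precompute and store the reference hash keys in a lookup table rather than rehashing per candidate, and all names here are illustrative.

```python
import zlib

def hash4x4(frame, stride, x, y):
    """32-bit CRC hash key of the 4x4 block at (x, y); frame is a flat
    row-major bytearray of 8-bit samples."""
    data = b"".join(bytes(frame[(y + r) * stride + x:
                                (y + r) * stride + x + 4]) for r in range(4))
    return zlib.crc32(data)

def block_matches(cur, ref, stride, cx, cy, rx, ry, bw, bh):
    """A bw x bh current block at (cx, cy) matches a reference position
    (rx, ry) only if every constituent 4x4 block's hash key matches."""
    return all(hash4x4(cur, stride, cx + dx, cy + dy) ==
               hash4x4(ref, stride, rx + dx, ry + dy)
               for dy in range(0, bh, 4) for dx in range(0, bw, 4))

# Toy 8x8 picture where the block at (4, 4) replicates the one at (0, 0).
pic = bytearray(64)
for y in range(4):
    for x in range(4):
        pic[y * 8 + x] = pic[(y + 4) * 8 + (x + 4)] = 10 * y + x
print(block_matches(pic, pic, 8, 4, 4, 0, 0, 4, 4))  # -> True
```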
Fig. 10 is an example of neighboring blocks of a current block.
3. Examples of problems addressed by embodiments
OBMC is always performed at the sub-block level even when the current PU/CU is not coded in sub-block mode, which increases bandwidth and computational complexity. Meanwhile, a fixed 4x4 sub-block size is used, which also leads to a bandwidth problem.
4. Examples of the embodiments
To address this problem, OBMC may be performed in larger block sizes or adaptive sub-block sizes. Meanwhile, in some proposed methods, motion compensation may be performed only once for one prediction direction.
The techniques listed below should be considered as examples to explain the general concept. These techniques should not be construed narrowly. Furthermore, these inventions may be combined in any manner. It is proposed that whether and how the deblocking filter is applied may depend on whether or not the associated scalar quantization is used.
1. It is proposed that in OBMC processing, all sub-blocks within the current block use the same motion information associated with one representative neighboring block.
a. Alternatively, two representative neighboring blocks may be selected. For example, one representative block is selected from the above neighboring blocks, and the other is selected from the left neighboring blocks.
b. Alternatively, in addition to spatially neighboring blocks, neighboring blocks may also be located in different pictures.
c. In one example, this method is applied only when the current block is not coded using sub-block techniques (e.g., ATMVP, affine).
2. When decoding a video unit (e.g., a block or sub-block), motion information derived from the bitstream may be further modified based on motion information of neighboring blocks, and the modified motion information may be used to derive a final prediction block of the video unit.
a. In one example, a representative neighboring block may be selected and its motion information may be used together with the motion information of the current unit to derive modified motion information.
b. Alternatively, motion information of a plurality of representative neighboring blocks may be selected.
c. Furthermore, in one example, each selected piece of motion information may first be scaled to the same reference picture of the current video unit (e.g., for each prediction direction). The scaled MV (denoted neigScaleMvLX) and the MV of the current video unit (denoted currMvLX) may then be used together to derive the final MV used for the MC of the video unit (e.g., using a weighted average).
i. When multiple sets of motion information are selected, neigScaleMvLX may be derived from multiple scaled motion vectors, e.g., using a weighted average or average of all the scaled motion vectors.
in one example, the average MV, denoted avgMv, is calculated as: avgMv = (w1 × neigScaleMvLX + w2 × currMvLX + offset) >> N, where w1, w2, offset, and N are integers (a sketch is given after this item's sub-list).
1. In one example, w1 and w2 are equal to 1 and 3, respectively, and N is 2 and offset is 2.
2. In one example, w1 and w2 are equal to 1 and 7, respectively, and N is 3 and offset is 4.
d. In one example, the proposed method is applied to the border area of the current block, e.g., the top few lines and/or the left few columns of the current block.
i. In one example, neigScaleMvLX is generated with different representative neighboring blocks for the top border region and/or the left border region of a block, and two different neigScaleMvLX may be generated for the top border region and the left border region. For the top-left border region, either of the two neigScaleMvLX may be used.
e. In one example, the proposed method is performed at a sub-block level. avgMv is derived for each sub-block and used for motion compensation of the sub-blocks.
f. In one example, the proposed method is performed at the sub-block level only when the current block is coded in sub-block mode (e.g., ATMVP, STMVP, affine mode, etc.).
g. In one example, one of a plurality of neighboring blocks is selected, denoted as a representative neighboring block, and its motion information can be used to derive the final MV. Alternatively, M neighboring blocks may be selected as representative neighboring blocks, e.g., M ═ 2, one from neighboring blocks above the current block and one from neighboring blocks to the left of the current block.
h. In one example, for a boundary region of a block or a small region of MxN (e.g., 4x4) within a boundary region, the proposed method may not be performed if its representative neighboring block is intra-coded.
i. In one example, for a boundary region of a block, if its representative neighboring block is intra-coded, a number of adjacent and/or non-adjacent blocks are checked until an inter-coded block is found, and if no inter-coded block is available, the method is disabled.
i. In one example, the non-neighboring blocks include upper or/and upper-left or/and upper-right neighboring blocks of a top boundary block of the CU, and the non-neighboring blocks include left or/and upper-left or/and lower-left neighboring blocks of a left boundary block of the CU, as shown in fig. 10.
in one example, the non-adjacent blocks comprise upper or/and upper left or/and upper right or/and left or/and upper left adjacent blocks.
in one example, the non-neighboring blocks are checked in descending order of distance between the non-neighboring blocks and the corresponding boundary blocks.
in one example, only some non-adjacent blocks are checked.
v. in one example, no more than K non-adjacent blocks are checked.
In one example, the width of the top right and top left regions is W/2 and the height of the bottom left region is H/2, where W and H are the width and height of the CU.
j. In one example, for a boundary region of a block, if its representative adjacent/non-adjacent blocks and the current block are both bi-directionally predicted or uni-directionally predicted from the same reference list, the method is performed in each valid prediction direction.
k. In one example, for a boundary area of a block, the method is only performed on the list LX if its representative adjacent/non-adjacent blocks are uni-directionally predicted, e.g., from the list LX, and the current CU is bi-directionally predicted, or vice versa.
i. Alternatively, no MV averaging is performed.
In one example, for a boundary area of a block, if its representative neighboring/non-neighboring block and the current block are both uni-directionally predicted and predicted from different directions, the method is not performed.
i. Alternatively, MVs of adjacent/non-adjacent blocks are scaled to the reference picture of the current block, and MV averaging is performed.
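As a sketch of the weighted MV averaging in item 2.c above (referenced there), assuming tuple-based MVs and the example weights w1 = 1, w2 = 3, offset = 2, N = 2; note that Python's >> on negative integers floors, matching the usual spec convention for arithmetic shifts.

```python
def avg_mv(neig_scale_mv, curr_mv, w1=1, w2=3, offset=2, n=2):
    """avgMv = (w1 * neigScaleMvLX + w2 * currMvLX + offset) >> N,
    applied per component."""
    return tuple((w1 * nv + w2 * cv + offset) >> n
                 for nv, cv in zip(neig_scale_mv, curr_mv))

# Neighboring MV already scaled to the current block's reference picture.
print(avg_mv((16, -8), (4, -4)))  # -> (7, -5)
```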
3. It is proposed that motion information of one or more representative neighboring blocks may be used together to generate an additional prediction block (denoted neigPredLX) of a video unit (block or sub-block). Suppose the prediction block generated with currMvLX is currPredLX; then neigPredLX and currPredLX may be used together to generate the final prediction block of the video unit.
a. In one example, the motion information of the multiple representative neighboring blocks may first be scaled to the same reference picture of the current video unit (e.g., for each prediction direction), and then the scaled MVs are used together (e.g., average/weighted average) to derive neigScaleMvLX, and neigPredLX is generated based on neigScaleMvLX.
b. In one example, the proposed method is applied to the border area of the current block, e.g., the top few lines and/or the left few columns of the current block.
i. In one example, neigScaleMvLX is generated with different representative neighboring blocks for the top and left border regions of a block, and two different neigScaleMvLX may be generated for the top and left border regions. For the top-left border region, either of the two neigScaleMvLX may be used.
c. In one example, the MV scaling process may be skipped.
d. In one example, the proposed method is performed at the sub-block level only when the current block is coded in sub-block mode (e.g., ATMVP, STMVP, affine mode, etc.).
4. It is proposed that when OBMC and Local Illumination Compensation (LIC) work together, the same LIC parameters can be used for the current MV and the neighboring MVs, i.e. some temporal prediction blocks are derived using the same LIC parameters and the current MV or neighboring MVs.
a. In one example, for each prediction direction, the current MV may be used to derive the LIC parameters, which are then used for both the current MV and the neighboring MVs (a sketch follows this item's sub-list).
b. In one example, for each prediction direction, neighboring MVs may be used to derive LIC parameters, which are used for the current MV and the neighboring MVs.
c. In one example, different LIC parameters may be derived for the current MV and the neighboring MVs, and different LIC parameters may be derived for the different neighboring MVs.
d. In one example, LIC may not be performed on neighboring MVs.
e. In one example, LIC may not be performed on neighboring MVs only when the LIC flag corresponding to the neighboring block is false.
f. In one example, when the LIC flags of at least N (N >= 0) neighboring blocks are false, LIC may not be performed on any of the neighboring MVs.
g. In one example, when the LIC flags of at least N (N >= 0) neighboring blocks are true, LIC may be performed on all neighboring MVs.
h. In one example, if the LIC flag of the current block is false, LIC may be performed on neither the current MV nor the neighboring MVs.
i. In one example, even if the LIC flag of the current block is false, LIC may still be performed for a neighboring MV if the LIC flag of its corresponding block is true.
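As a sketch of the shared-parameter idea in item 4.a above (referenced there): the LIC linear model pred' = a × pred + b is fitted once, here with an ordinary least-squares fit over illustrative neighboring reconstructed/reference sample pairs, and the same (a, b) is reused for the prediction blocks obtained with the current MV and with a neighboring MV. The fitting method and all names are assumptions; the text does not specify how the parameters are derived.

```python
def derive_lic_params(neigh_rec, neigh_ref):
    """Least-squares fit of rec ~ a * ref + b over neighboring samples."""
    n = len(neigh_ref)
    sx, sy = sum(neigh_ref), sum(neigh_rec)
    sxx = sum(v * v for v in neigh_ref)
    sxy = sum(r * c for r, c in zip(neigh_rec, neigh_ref))
    denom = n * sxx - sx * sx
    if denom == 0:
        return 1.0, 0.0  # degenerate: identity model
    a = (n * sxy - sx * sy) / denom
    return a, (sy - a * sx) / n

def apply_lic(pred_samples, a, b):
    return [a * p + b for p in pred_samples]

# Fit once from the current MV's neighborhood (here rec = 2 * ref) ...
a, b = derive_lic_params([60, 80, 100], [30, 40, 50])
# ... and reuse the same (a, b) for current-MV and neighboring-MV predictions.
print(apply_lic([45, 55], a, b), apply_lic([44, 56], a, b))  # doubled values
```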
5. It is proposed that the current MV and the neighboring MVs can use the same GBI index when OBMC and GBI work together.
a. In one example, the GBI index of the current block may be used for the current MV and the neighboring MVs.
b. In one example, different GBI indices may be used for the current MV and the neighboring MV, and different GBI indices may be used for the neighboring MV. For example, for an MV (current MV or neighboring MV), the GBI of the corresponding block is used.
c. In one example, a default GBI index may be used for all nearby MVs.
i. For example, a GBI index indicating a [1, 1] weight factor is used as the default GBI index.
6. It is proposed that in OBMC, DMVD may not be applied to neighboring MVs.
7. It is proposed that OBMC can be performed after DMVD when OBMC and DMVD work together.
a. In one example, OBMC may be performed prior to the DMVD to modify the predicted samples, and then the modified predicted samples are used in the DMVD.
i. In one example, the output of the DMVD can be used as the final prediction for the block.
b. In one example, only the modified prediction samples may be used to derive motion vectors in DMVD. After DMVD is completed, the prediction block generated in OBMC may be further used to modify the final prediction of the block.
8. Whether or not to use neighboring MVs in OBMC may depend on block size or motion information of neighboring blocks.
a. In one example, if the size of the neighboring block is 4x4 or/and 4x8 or/and 8x4 or/and 4x16 or/and 16x4, its neighboring MVs may not be used in the OBMC.
b. In one example, if the neighboring blocks are 4x4 or/and 4x8 or/and 8x4 or/and 4x16 or/and 16x4 in size and are bi-predictive, their neighboring MVs may not be used in OBMC.
9. It is proposed that a block coded in Combined Intra and Inter Prediction (CIIP) mode cannot use some neighboring pixels for intra prediction.
a. In one example, neighboring pixels may be considered unavailable to a CIIP mode block if they are from an intra-coded block.
b. In one example, neighboring pixels may be considered unavailable to a CIIP mode block if they are from an inter-coded block.
c. In one example, neighboring pixels may be considered unavailable to a CIIP mode block if they are from a CIIP mode block.
d. In one example, neighboring pixels may be considered unavailable to a CIIP mode block if they are from a CPR mode block.
10. It is proposed that in early termination of BIO, sub-block (or block) level early termination may not be applicable.
a. Whether to apply BIO can only be decided at sub-block level.
b. A block may be divided into sub-blocks, and the decision to apply or not apply BIO may depend purely on the sub-blocks themselves and not on other sub-blocks.
11. It is proposed that for LIC-coded blocks, OBMC is not applied regardless of the motion information of the block.
a. Alternatively, OBMC may still be applied for uni-directionally predicted LIC-coded blocks. Alternatively, how and/or when OBMC is applied may further depend on the coding information of the current/neighboring blocks, such as block size and prediction direction, among others.
12. The sub-block size used in the proposed OBMC or the proposed methods may depend on the block size, block shape, motion information, or reference picture of the current block (suppose the size of the current block is w × h); a sketch follows this item's sub-list.
a. In one example, the sub-block size M1xM2 is used for blocks with w × h >= T, and the sub-block size N1xN2 is used for other blocks.
b. In one example, if w >= T, the width/height of the sub-block is set to M1; otherwise, the width/height of the sub-block is set to N1.
c. In one example, sub-block size M1xM2 is used for uni-directional prediction blocks and sub-block size N1xN2 is used for other blocks.
d. In one example, M1xM2 is 4x4.
e. In one example, M1xM2 is w/4x4 for the upper region and M1xM2 is 4xh/4 for the left region.
f. In one example, M1xM2 is w/4x2 for the upper region and M1xM2 is 4xh/2 for the left region.
g. In one example, N1xN2 is 8x8, 8x4, or 4x8.
h. In one example, N1xN2 is w/2x4 for the upper region and N1xN2 is 4xh/2 for the left region.
i. In one example, N1xN2 is w/2x2 for the upper region and N1xN2 is 2xh/2 for the left region.
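A sketch combining examples 12.a and 12.c above (referenced at item 12): the OBMC sub-block size is chosen from the block area and the prediction direction. The threshold T = 64 and the concrete sizes M1xM2 = 4x4 and N1xN2 = 8x8 are illustrative example values; the listed examples are alternatives, and combining them in a single rule is an assumption for illustration.

```python
def obmc_subblock_size(w, h, uni_prediction, t=64):
    """Return (width, height) of the OBMC sub-block for a w x h block."""
    if w * h >= t or uni_prediction:
        return (4, 4)  # M1 x M2: finer granularity
    return (8, 8)      # N1 x N2: coarser granularity, lower bandwidth

print(obmc_subblock_size(16, 16, uni_prediction=False))  # -> (4, 4)
print(obmc_subblock_size(4, 8, uni_prediction=True))     # -> (4, 4)
print(obmc_subblock_size(4, 8, uni_prediction=False))    # -> (8, 8)
```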
13. The proposed method or OBMC may be applied to a particular mode, block size/shape, and/or a particular sub-block size.
a. The proposed method is applicable to certain modes, such as conventional translational motion (i.e. affine mode disabled).
b. The proposed method can be applied to specific block sizes.
i. In one example, it is applied only to blocks with w × h >= T, where w and h are the width and height of the current block, e.g., T is 16 or 32.
In another example, it is applied only to blocks with w >= T && h >= T, e.g., T is 8.
Alternatively, it is applied only to blocks with w >= T1 && h >= T2, e.g., T1 and T2 equal to 8.
Alternatively, in addition, it is not applied to blocks with w >= T1 and/or h >= T2. For example, T1 and T2 equal 128.
c. The use of the proposed method can be invoked under further conditions (e.g. based on block size/block shape/codec mode/slice type/low delay check flag/time domain layer, etc.).
14. OBMC may be applied to a video unit if the video unit (e.g., block or sub-block) is coded using an Intra Block Copy (IBC) mode.
a. In one example, one or more representative neighboring blocks are utilized only when they are coded in intra block copy mode. Alternatively, in addition, only motion information from such neighboring blocks is used in OBMC.
b. In one example, if one block is coded with sub-block techniques (e.g., ATMVP) and some sub-blocks are coded with IBC mode, OBMC may still be applied to sub-blocks that are not IBC coded. Alternatively, OBMC may be disabled for the entire block.
c. Alternatively, OBMC is disabled in intra block copy mode.
15. The proposed method is applicable to all color components. Alternatively, they may be applied to only certain color components. For example, they may be applied only to the luminance component.
16. Whether and how the proposed method is applied can be signaled from the encoder to the decoder in VPS/SPS/PPS/picture header/slice header/CTU/CU/CTU group/CU group.
Fig. 11 is a block diagram of a video processing apparatus 1100. The apparatus 1100 may be used to implement one or more of the methods described herein. The apparatus 1100 may be implemented in a smartphone, tablet, computer, internet of things (IoT) receiver, and/or the like. The apparatus 1100 may include one or more processors 1102, one or more memories 1104, and video processing hardware 1106. The processor 1102 may be configured to implement one or more of the methodologies described herein. The memory(s) 1104 may be used to store data and code for implementing the methods and techniques described herein. Video processing hardware 1106 may be used to implement some of the techniques described herein in hardware circuitry.
Fig. 13 is a flow diagram of a method 1300 for processing video. The method 1300 includes: determining (1305) that the first video block is adjacent to the second video block, determining (1310) motion information of the second video block, and performing (1315) further processing of sub-blocks of the first video block based on the motion information of the second video block.
Fig. 14 is a flow diagram of a method 1400 for processing video. The method 1400 comprises: determining (1405) that the first video block neighbors the second video block, determining (1410) motion information of the second video block, modifying (1415) the motion information of the first video block based on the motion information of the second video block to generate modified motion information of the first video block, determining (1420) a prediction block for the first video block based on the modified motion information, and performing (1425) further processing of the first video block based on the prediction block.
Fig. 15 is a flow diagram of a method 1500 for processing video. The method 1500 includes: determining (1505) that the first video block is coded using an Intra Block Copy (IBC) mode, and processing (1510) the first video block using Overlapped Block Motion Compensation (OBMC) based on the determination that the first video block is coded using the intra block copy mode.
Some examples of determining candidates for encoding and their use are described in section 4 herein with reference to methods 1300, 1400, and 1500. For example, as described in section 4, sub-blocks of a first video block may be processed based on motion information of a second video block adjacent to the first video block.
Referring to methods 1300, 1400, and 1500, video blocks may be encoded in a video bitstream, wherein bit efficiency may be achieved by using bitstream generation rules related to motion information prediction.
The method may include: determining, by the processor, that the first video block is adjacent to the third video block; and determining, by the processor, motion information of a third video block, wherein further processing of the sub-blocks of the first video block is performed based on the motion information of the third video block, one of the second video block or the third video block being located above the first video block and the other being located to the left of the first video block.
The method may include: wherein the first video block is from a first picture and the second video block is from a second picture, the first picture and the second picture being different pictures.
The method may include: wherein the first video block and the second video block are within the same picture.
The method may include: wherein the method is applied based on a first video block that is not coded using sub-block techniques.
The method may include wherein the modified motion information is further based on motion information of a third video block adjacent to the first video block.
The method may include: wherein the motion information of the second video block and the motion information of the first video block are scaled based on a reference picture associated with the first video block, the modifying being based on the scaled motion information.
The method may include: wherein the scaled motion vectors from the scaled motion information are weighted-averaged or averaged to generate a representative scaled motion vector.
The method may include: wherein the average of the scaled motion vectors is calculated as avgMv = (w1 × neigScaledMvLX + w2 × currMvLX + offset) >> N, where w1, w2, offset, and N are integers, avgMv is the average of the scaled motion vectors, neigScaledMvLX is the representative scaled motion vector, and currMvLX is the motion vector of the first video block.
The method may include: where w1 is 1, w2 is 3, N is 2, and offset is 2.
The method may include: where w1 is 1, w2 is 7, N is 3, and offset is 4.
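The averaging above can be sketched as follows, assuming motion vectors are integer (x, y) pairs and that the right shift is the usual arithmetic (floor) shift; the two weight sets are exactly the examples listed above, and the function name is illustrative.

def avg_mv(neig_scaled_mv, curr_mv, w1=1, w2=3, offset=2, n=2):
    # avgMv = (w1 * neigScaledMvLX + w2 * currMvLX + offset) >> N, per component
    return tuple((w1 * a + w2 * b + offset) >> n
                 for a, b in zip(neig_scaled_mv, curr_mv))

# First weight set:  avg_mv((4, -8), (16, 0)) -> (13, -2)
# Second weight set: avg_mv((4, -8), (16, 0), w1=1, w2=7, offset=4, n=3)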
The method may include: wherein the method is applied to a boundary region of the first video block, the boundary region including a plurality of top rows of the first video block and a plurality of left columns of the first video block.
The method may include: wherein neigScaledMvLX is based on a first neigScaledMvLX associated with the top rows within the boundary region and on a second neigScaledMvLX associated with the left columns within the boundary region.
The method may include: wherein one or both of the first neigScaledMvLX and the second neigScaledMvLX are used for the top-left boundary region of the first video block.
The method may include: wherein the method is performed at a sub-block level, avgMv is derived per sub-block, and the motion compensation of each sub-block is based on its avgMv.
The method may include: wherein the method is performed at a sub-block level based on a first video block coded in a sub-block mode.
The method may include: a motion vector for the first video block is determined based on the motion information of the second video block.
The method may include: wherein the motion vector of the first video block is further based on motion information of a third video block, one of the second video block or the third video block being located above the first video block and the other being located to the left of the first video block.
The method may include: wherein the method is not performed if the second video block is intra-coded.
The method may include: determining, by the processor, that the second video block is within the boundary region; determining, by the processor, that the second video block is intra-coded; determining, by the processor, that a third video block in the boundary region is inter-coded, the third video block being adjacent or non-adjacent to the first video block; and performing further processing of the first video block based on the third video block.
The method may include: wherein the non-adjacent blocks comprise one or more of the above, above-left, or above-right blocks of the top boundary region, and one or more of the left, above-left, or below-left blocks of the left boundary region.
The method may include: wherein the non-adjacent blocks comprise one or more of the above, above-left, above-right, left, or below-left video blocks of the boundary region.
The method may include: wherein the non-neighboring blocks are examined in descending order of the distance between the non-neighboring block and the corresponding block within the boundary region to identify the third video block.
The method may include: wherein a subset of the non-neighboring blocks is examined to identify a third video block.
The method may include: wherein the number of non-adjacent blocks checked to identify the third video block is less than or equal to the threshold K.
The method may include: wherein the width of the upper right and upper left regions is W/2, and the height of the lower left region is H/2, where W and H are the width and height of the first video block, which is the codec unit.
The method may include: determining that the first video block and the third video block are bi-predicted or uni-directionally predicted from a reference list, wherein the method is performed in each valid prediction direction.
The method may include: wherein the third video block is uni-directionally predicted and the first video block is bi-directionally predicted.
The method may include: wherein the third video block is bi-predictive and the first video block is uni-directionally predictive.
The method may include: wherein no motion vector averaging is performed.
The method may include: determining that the third video block and the first video block are uni-directionally predicted and predicted from different directions, wherein the method is not performed based on the determination.
The method may include: wherein the motion vector of the third video block is scaled to the reference picture of the first video block and motion vector averaging is performed.
The method may include: determining motion information of one or more neighboring blocks of the first video block; and determining a prediction block for the first video block based on motion information of one or more neighboring blocks.
The method may include: wherein motion information of one or more neighboring blocks is scaled to a reference picture of the first video block to generate scaled motion information comprising a scaled motion vector, the scaled motion vector being used to determine the prediction block.
The method may comprise: wherein the method is applied to a boundary region of the first video block.
The method may include: wherein neigScaledMvLX is derived from different neighboring blocks for the top boundary region and the left boundary region of the first video block, yielding two different neigScaledMvLX values, either of which may be used for the top-left boundary region.
The method may include: wherein the motion vector scaling process is skipped.
The method may include: wherein the method is performed at a sub-block level based on a first block coded in a sub-block mode.
The method may include: wherein the sub-block size is based on a block size, a block shape, motion information, or a reference picture of the first video block.
The method may include: for blocks with width × height greater than or equal to T, the sub-block size is M1xM2; for blocks with width × height less than T, the sub-block size is N1xN2.
The method may include: wherein the width-to-height ratio of the sub-block is M1 if the width of the block is greater than or equal to T, and N1 otherwise.
The method may include: wherein the sub-block size is M1xM2 for a uni-directional prediction block and N1xN2 for other blocks.
The method may include: wherein M1xM2 is 4x4.
The method may include: wherein for the upper region, M1xM2 is (w/4)x4, and for the left region, M1xM2 is 4x(h/4).
The method may include: wherein for the upper region, M1xM2 is (w/4)x2, and for the left region, M1xM2 is 4x(h/2).
The method may include: wherein N1xN2 is 8x8, 8x4, or 4x8.
The method may include: wherein for the upper region, N1xN2 is (w/2)x4, and for the left region, N1xN2 is 4x(h/2).
The method may include: wherein for the upper region, N1xN2 is (w/2)x2, and for the left region, N1xN2 is 2x(h/2).
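The size rules above admit several alternatives, so the sketch below should be read as one possible combination, assumed for illustration only: uni-prediction selects M1xM2 = 4x4, and an area threshold distinguishes the M and N cases for the upper boundary region. Neither the function name nor this particular combination comes from the source.

def upper_region_subblock_size(w: int, h: int, uni_pred: bool, t: int = 32):
    # Pick (sub_w, sub_h) for sub-blocks in the upper boundary region.
    if uni_pred:
        return (4, 4)        # M1xM2 = 4x4 for uni-directional prediction blocks
    if w * h >= t:
        return (w // 4, 4)   # e.g., M1xM2 = (w/4)x4 for the upper region
    return (w // 2, 4)       # e.g., N1xN2 = (w/2)x4 for the upper region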
The method may include: wherein the method is applied to a regular translational motion mode.
The method may include: wherein the width x height of the first video block is greater than or equal to T, and T is 16 or 32.
The method may include: wherein the first video block has a width greater than or equal to T and a height greater than or equal to T, and T is 8.
The method may include: wherein the first video block has a width greater than or equal to T1, and a height greater than or equal to T2, and T1 and T2 are 8.
The method may include: where the method is not applied to video blocks having a width greater than or equal to T1 or a height greater than or equal to T2, T1 and T2 are 128.
The method may include: wherein the method is performed based on determining a condition according to one or more of: block size, block shape, codec mode, slice type, low-delay check flag, or temporal layer.
The method may include: determining one or more neighboring blocks of the first video block are coded using intra block copy mode, wherein the first video block is processed using OBMC based on the determination that the one or more neighboring blocks of the first video block are coded using intra block copy mode.
The method may include: determining that the first video block comprises a sub-block that is coded with an IBC mode; and processing the sub-blocks of the first video block that are not coded with the IBC mode with OBMC.
The method may include: determining that the first video block comprises a sub-block that is coded with an IBC mode; and processing the sub-blocks without using OBMC based on the sub-blocks coded with the IBC mode.
The method may include: wherein the method is applied to one or more color components.
The method may comprise: wherein the method is applied to the luminance component.
The method may include: wherein the method is applied based on a signal from an encoder to a decoder, the signal being provided by a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a picture header, a slice group, a slice header, a Codec Tree Unit (CTU), a Codec Unit (CU), a CTU group, or a CU group.
It should be appreciated that the disclosed techniques may be implemented in a video encoder or decoder to improve compression efficiency when the codec unit being compressed has a shape that is significantly different from a traditional square or half-square rectangular block. For example, new codec tools using wide or tall codec units (such as units of 4x32 or 32x4 size) may benefit from the disclosed techniques.
FIG. 16 is a schematic diagram illustrating an example of a structure of a computer system or other control device 2600 that may be used to implement various portions of the disclosed technology. In FIG. 16, computer system 2600 includes one or more processors 2605 and memory 2610 connected by an interconnect 2625. Interconnect 2625 may represent any one or more separate physical buses, point-to-point connections, or both, connected by appropriate bridges, adapters, or controllers. Thus, interconnect 2625 may include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus, a HyperTransport or Industry Standard Architecture (ISA) bus, a Small Computer System Interface (SCSI) bus, a Universal Serial Bus (USB), an IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (sometimes referred to as "Firewire").
The processor 2605 may include a Central Processing Unit (CPU) to control overall operation of the host, for example. In some embodiments, the processor 2605 accomplishes this by executing software or firmware stored in the memory 2610. The processor 2605 may be or include one or more programmable general purpose or special purpose microprocessors, Digital Signal Processors (DSPs), programmable controllers, Application Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), or the like, or a combination of such devices.
The memory 2610 may be or include the main memory of a computer system. Memory 2610 represents any suitable form of Random Access Memory (RAM), Read Only Memory (ROM), flash memory, or the like, or a combination of such devices. In use, the memory 2610 may contain, among other things, a set of machine instructions that, when executed by the processor 2605, cause the processor 2605 to perform operations to implement embodiments of the disclosed technology.
Also connected to the processor 2605 by an interconnect 2625 is an (optional) network adapter 2615. Network adapter 2615 provides computer system 2600 with the ability to communicate with remote devices, such as storage clients and/or other storage servers, and may be, for example, an ethernet adapter or a fibre channel adapter.
Fig. 17 shows a block diagram of an example embodiment of a device 2700 that may be used to implement portions of the disclosed technology. Mobile device 2700 may be a laptop, smartphone, tablet, camera, or other device capable of processing video. Mobile device 2700 includes a processor or controller 2701 to process data and a memory 2702 in communication with processor 2701 to store and/or buffer data. For example, the processor 2701 may include a Central Processing Unit (CPU) or a microcontroller unit (MCU). In some implementations, the processor 2701 may include a Field Programmable Gate Array (FPGA). In some implementations, mobile device 2700 includes or communicates with a Graphics Processing Unit (GPU), a Video Processing Unit (VPU), and/or a wireless communication unit to implement various visual and/or communication data processing functions of the smartphone device. For example, memory 2702 may include and store processor-executable code that, when executed by processor 2701, configures mobile device 2700 to perform various operations, such as receiving information, commands, and/or data, processing information and data, and transmitting or providing processed information/data to another data device, such as an actuator or external display. To support various functions of mobile device 2700, memory 2702 can store information and data, such as instructions, software, values, images, and other data processed or referenced by processor 2701. For example, the storage functionality of memory 2702 may be implemented using various types of Random Access Memory (RAM) devices, Read Only Memory (ROM) devices, flash memory devices, and other suitable storage media. In some implementations, mobile device 2700 includes an input/output (I/O) unit 2703 to interface processor 2701 and/or memory 2702 with other modules, units, or devices. For example, I/O unit 2703 may interface with processor 2701 and memory 2702 to utilize various wireless interfaces compatible with typical data communication standards, e.g., between one or more computers and user equipment in the cloud. In some implementations, mobile device 2700 can interface with other devices through I/O unit 2703 using a wired connection. The I/O unit 2703 may include a wireless sensor, such as an infrared detector for detecting remote control signals, or other suitable wireless human interface technology. Mobile device 2700 can also be connected to other external interfaces (e.g., a data store) and/or a visual or audio display device 2704 to retrieve and transmit data and information that can be processed by a processor, stored by a memory, or displayed on display device 2704 or an output unit of an external device. For example, the display device 2704 may display video frames that are modified based on MVP in accordance with the disclosed techniques.
Fig. 18 is a block diagram illustrating an example video processing system 1800 in which various techniques disclosed herein may be implemented. Various implementations may include some or all of the components of system 1800. The system 1800 can include an input 1802 for receiving video content. The video content may be received in a raw or uncompressed format, such as 8-bit or 10-bit multi-component pixel values, or may be received in a compressed or encoded format. Input 1802 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces such as Ethernet and Passive Optical Networks (PONs), and wireless interfaces such as Wi-Fi or cellular interfaces.
System 1800 can include a codec component 1804 that can implement various encoding or coding methods described herein. The codec component 1804 may reduce the average bit rate of the video from the input 1802 to the output of the codec component 1804 to produce a codec representation of the video. Thus, codec techniques are sometimes referred to as video compression or video transcoding techniques. The output of the codec component 1804 may either be stored or transmitted via a communication connection, as represented by component 1806. The stored or communicated bitstream (or codec) representation of the video received at input 1802 may be used by component 1808 to generate pixel values or displayable video that is sent to display interface 1810. The process of generating user-viewable video from a bitstream representation is sometimes referred to as video decompression. Further, while certain video processing operations are referred to as "encoding" operations or tools, it should be understood that codec tools or operations are used at the encoder, and corresponding decoding tools or operations that reverse the encoded results will be performed by the decoder.
Examples of a peripheral bus interface or display interface may include a Universal Serial Bus (USB) or a high-definition multimedia interface (HDMI) or displayport, among others. Examples of storage interfaces include SATA (serial advanced technology attachment), PCI, IDE interfaces, and the like. The techniques described herein may be implemented in various electronic devices such as mobile phones, laptops, smartphones, or other devices capable of performing digital data processing and/or video display.
FIG. 19 is a flow diagram of an example method of video processing. The method 1900 includes: determining (at step 1902) at least one neighboring block to a current block of visual media data during a transition between the current block and a corresponding codec representation of the visual media data; determining (at step 1904) motion information of the at least one neighboring block; and performing (at step 1906) Overlapped Block Motion Compensation (OBMC) on the current block based on the motion information of the at least one neighboring block; wherein the OBMC tool generates a final predictor of a sub-block of the current block using an intermediate predictor of the sub-block and predictors of at least one neighboring sub-block.
In some implementations, additional modifications may be performed to method 1900. For example, performing OBMC on the current block based on motion information of at least one neighboring block includes: OBMC is performed on all sub-blocks of the current block based on motion information of at least one neighboring block. The at least one neighboring block includes a first neighboring block located above the current block and a second neighboring block located at the left side of the current block. The at least one neighboring block and the current block are from different pictures of the visual media data. This method is applied only when the current block is not coded using sub-block techniques.
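The blend performed by the OBMC tool in method 1900 can be sketched as below for the top boundary: each boundary row mixes the sub-block's own (intermediate) predictor with a predictor obtained using the above neighbor's motion. The per-row neighbor weights 1/4, 1/8, 1/16, 1/32 follow the familiar OBMC design and are an assumption here; the text itself does not fix the weights.

NEIGHBOR_WEIGHTS = [8, 4, 2, 1]   # neighbor weight out of 32 for the first four rows

def obmc_blend_top(own_pred, neigh_pred):
    # Blend two equal-sized 2-D predictors (lists of rows) along the top edge.
    out = [row[:] for row in own_pred]
    for r, wn in enumerate(NEIGHBOR_WEIGHTS[:len(own_pred)]):
        for c in range(len(own_pred[r])):
            out[r][c] = ((32 - wn) * own_pred[r][c] + wn * neigh_pred[r][c] + 16) >> 5
    return out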
FIG. 20 is a flow diagram of an example method of video processing. The method 2000 includes: determining (at step 2002) at least one neighboring block to a current block of visual media data during a transition between the current block and a corresponding codec representation of the visual media data; determining (at step 2004) motion information of at least one neighboring block; and modifying (at step 2006) the motion information of the current block based on the motion information of the at least one neighboring block to generate modified motion information of the current block; processing of the current block is performed (at step 2008) based on the modified motion information.
In some implementations, additional modifications may be performed to method 2000. For example, modifying the motion information of the current block based on the motion information of at least one neighboring block to generate modified motion information of the current block includes: modifying the motion information of the current block based on the motion information of the at least one neighboring block and the motion information of the current block. Modifying the motion information of the current block includes: scaling the motion information of the at least one neighboring block to the same reference picture of the current block; and modifying the motion information of the current block based on the scaled motion information of the at least one neighboring block and the motion information of the current block. The scaled motion information of the at least one neighboring block is weighted-averaged or averaged to generate one representative scaled motion vector for each reference picture list of the current block. The modified motion information of the current block is generated as a weighted average of the representative scaled motion vector and the motion vector of the current block. The modified motion vector is calculated as avgMv = (w1 × neigScaledMvLX + w2 × currMvLX + offset) >> N, where w1, w2, offset, and N are integers, avgMv is the modified motion vector, neigScaledMvLX is the representative scaled motion vector, currMvLX is the motion vector of the current block, and X (X = 0, 1) denotes the reference picture list. For example, w1 is 1, w2 is 3, N is 2, and offset is 2, or w1 is 1, w2 is 7, N is 3, and offset is 4. Performing the processing of the current block based on the motion information of the at least one neighboring block includes: performing the processing on a boundary region of the current block, wherein the boundary region of the current block includes a plurality of top rows and/or left columns of the current block. A representative motion vector is generated for the top rows of the current block and the left columns of the current block using different neighboring blocks, respectively. This method is applied at the sub-block level only when the current block is coded using sub-block techniques. The method is not performed on the boundary region of the current block when at least one neighboring block of the boundary region is intra-coded. When the at least one neighboring block is intra-coded, the method further comprises: checking neighboring blocks and/or non-neighboring blocks until an inter-coded block is found, and disabling the motion vector modification process in response to no inter-coded block being found. The non-adjacent blocks include above and/or above-left and/or above-right neighbors of the top boundary region of the current block, and left and/or above-left and/or below-left neighbors of the left boundary region of the current block. The non-adjacent blocks may also include above and/or above-left and/or above-right and/or left and/or below-left neighboring blocks. The non-adjacent blocks are checked in descending order of the distance between the non-adjacent block and the corresponding block in the boundary region. A subset of the non-adjacent blocks, or a number of non-adjacent blocks not greater than a threshold K, is checked.
The above-right and above-left regions have a width of W/2, and the below-left region has a height of H/2, where W and H are the width and height of the current block as a codec unit. The method is performed in each valid prediction direction when the at least one neighboring/non-neighboring block and the current block are bi-directionally or uni-directionally predicted from a reference list. The modified motion information is generated for the first list when the at least one neighboring/non-neighboring block is uni-directionally predicted from the first list and the current block is bi-directionally predicted, or when the at least one neighboring/non-neighboring block is bi-directionally predicted and the current block is uni-directionally predicted from the first list. Alternatively, modified motion information is not generated. When the at least one neighboring/non-neighboring block and the current block are uni-directionally predicted and are predicted from different directions, modified motion information is not generated; alternatively, the motion vectors of the neighboring/non-neighboring blocks are scaled to a reference picture of the current block, and modified motion information is generated.
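The intra-coded fallback described for method 2000 amounts to a bounded scan over candidate blocks, as in the sketch below. The Block type, its fields, and the candidate list are stand-ins; the ordering of candidates (the text specifies descending distance) and the cap K would come from the specification.

from dataclasses import dataclass

@dataclass
class Block:
    is_intra: bool
    mv: tuple  # (x, y); meaningful only for inter-coded blocks

def find_inter_candidate(candidates, k):
    # Scan at most k candidates, pre-sorted per the spec's order.
    for blk in candidates[:k]:
        if not blk.is_intra:
            return blk    # first inter-coded candidate supplies the motion
    return None           # none found: disable the MV modification process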
FIG. 21 is a flow diagram of an example method of video processing. The method 2100 includes: determining (at step 2102) a plurality of neighboring blocks to a current block during a transition between the current block of visual media data and a corresponding codec representation of the visual media data; determining (at step 2104) motion information for a plurality of neighboring blocks; determining (at step 2106) a first prediction block for the current block based on the motion information for the current block; determining (at step 2108) a second prediction block for the current block based on the motion information of the plurality of neighboring blocks; modifying (at step 2110) the first prediction block based on the second prediction block; and the processing of the current block is performed (at step 2112) based on the first prediction block.
In some implementations, additional modifications may be performed to method 2100. For example, motion information of one of a plurality of neighboring blocks is scaled to a reference picture of the current block to generate representative scaled motion information for determining a second prediction block of the current block. Modifying the first prediction block further comprises: the modified prediction block is generated as a weighted average of the first prediction block and the second prediction block. Performing processing of the current block based on the first prediction block includes: the processing is performed on a boundary region of the current block, wherein the boundary region of the current block includes an upper boundary region having a plurality of top rows and/or a left boundary region having a left column of the current block. Two different representative scaled motion vectors are generated for the upper and left boundary regions based on different neighboring blocks. Either of the two different scaled motion vectors is used for the upper left border region. The motion vector scaling process is skipped. Modifying the first prediction block based on the second prediction block comprises: processing is performed on one or more sub-blocks of the current block based on motion information of at least one neighboring block. This method is applied only when the current block is coded using the sub-block technique. The sub-block technique comprises: advanced Temporal Motion Vector Prediction (ATMVP), Spatial Temporal Motion Vector Prediction (STMVP), affine modes including affine inter mode and affine Merge mode.
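Method 2100's modification step can be sketched as a per-sample weighted average of the block's own prediction with a second prediction derived from a neighbor's (scaled) motion. The 3:1 weighting below is an assumed example; the text only states that a weighted average is used.

def blend_predictions(p1, p2, w1=3, w2=1, shift=2):
    # Weighted average of two equal-sized 2-D prediction blocks.
    offset = 1 << (shift - 1)
    return [[(w1 * a + w2 * b + offset) >> shift for a, b in zip(r1, r2)]
            for r1, r2 in zip(p1, p2)]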
FIG. 22 is a flow diagram of an example method of video processing. The method 2200 comprises: determining (at step 2202) a motion vector of a first sub-block within a current block during a transition between the current block and a bitstream representation of the current block; performing a conversion using (at step 2204) an Overlapped Block Motion Compensation (OBMC) mode; wherein the OBMC mode generates a final prediction value of the first sub-block using an intermediate prediction value of the first sub-block based on the motion vector of the first sub-block and a prediction value of at least a second video unit adjacent to the first sub-block; wherein a sub-block size of the first sub-block is based on a block size, a block shape, motion information, or a reference picture of the current block.
FIG. 23 is a flow diagram of an example method of video processing. The method 2300 comprises: generating (at step 2302) at least one sub-block from a current block based on a dimension of the current block during a transition between the current block and a bitstream representation of the current block in video data; generating (at step 2304) different predictions for at least one sub-block based on the different prediction lists; applying (at step 2306) early termination processing at the sub-block level to determine whether to apply a bidirectional optical flow (BDOF) processing tool to at least one sub-block; and performing (at step 2308) a conversion based on the application; wherein the BDOF processing tool generates the prediction offset based on at least one of the different predicted horizontal or vertical gradients.
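The sub-block-level early termination of method 2300 can be sketched as follows: compute the SAD between the two directional predictions of a sub-block, here over every other row since the clauses later in this document note that the SAD may be computed from partial positions, and skip BDOF when the SAD falls below a threshold. The area-based threshold formula is an assumption for illustration.

def bdof_should_apply(pred0, pred1, sub_w, sub_h):
    # Return False when the L0/L1 SAD is below a size-dependent threshold.
    sad = sum(abs(a - b)
              for r0, r1 in zip(pred0[::2], pred1[::2])   # subsampled rows
              for a, b in zip(r0, r1))
    threshold = sub_w * (sub_h // 2)                      # assumed: scales with sub-block area
    return sad >= threshold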
FIG. 24 is a flow diagram of an example method of video processing. The method 2400 includes: generating (at step 2402) a current motion vector for a current block during a transition between the current block and a bitstream representation of the current block in the video data; generating (at step 2404) one or more neighboring motion vectors for one or more neighboring blocks of the current block; deriving (at step 2406) a first type of prediction for the current block based on the current motion vector; deriving (at step 2408) one or more second type predictions for the current block based on the one or more neighboring motion vectors; determining (at step 2410) whether to apply Local Illumination Compensation (LIC) to the first type of prediction or the second type of prediction based on characteristics of the current block or the neighboring blocks; and performing (at step 2412) a conversion based on the determination; where the LIC constructs a linear model with multiple parameters to refine the prediction based on the prediction direction.
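The LIC model in method 2400 refines a prediction as pred' = a * pred + b, with the parameters derived from neighboring reconstructed samples and the corresponding reference samples. The plain least-squares fit below is a floating-point illustration; real codecs use integer approximations, and all names here are hypothetical.

def derive_lic_params(neigh_rec, neigh_ref):
    # Least-squares fit of neigh_rec ~= a * neigh_ref + b.
    n = len(neigh_rec)
    sx, sy = sum(neigh_ref), sum(neigh_rec)
    sxx = sum(x * x for x in neigh_ref)
    sxy = sum(x * y for x, y in zip(neigh_ref, neigh_rec))
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom if denom else 1.0
    b = (sy - a * sx) / n
    return a, b

def apply_lic(pred, a, b):
    return [[int(a * s + b) for s in row] for row in pred]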
FIG. 25 is a flow diagram of an example method of video processing. The method 2500 includes: generating (at step 2502) a current motion vector for a current block during a transition between the current block and a bitstream representation of the current block in video data; generating (at step 2504) one or more neighboring motion vectors of one or more neighboring blocks of the current block; deriving (at step 2506) a first type of prediction for the current block based on the current motion vector; deriving (at step 2508) one or more second type of prediction for the current block based on the one or more neighboring motion vectors; applying (at step 2510) generalized bi-directional prediction (GBi) to either the first type of prediction or the second type of prediction; and performing (at step 2512) a conversion based on the determination; wherein GBi comprises applying equal or unequal weights to different prediction directions of the first and second types of prediction based on GBi indices of the weight list.
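The GBi combination of method 2500 can be sketched as below. The weight list {-2, 3, 4, 5, 10}/8, with index 2 giving the equal 4/8 weights, matches the VVC-era GBi/BCW design and is assumed here rather than taken from this text.

GBI_W1 = [-2, 3, 4, 5, 10]   # weight of the list-1 prediction, in eighths

def gbi_blend(pred0, pred1, gbi_index=2):
    # pred = ((8 - w1) * P0 + w1 * P1 + 4) >> 3, per sample
    w1 = GBI_W1[gbi_index]
    w0 = 8 - w1
    return [[(w0 * a + w1 * b + 4) >> 3 for a, b in zip(r0, r1)]
            for r0, r1 in zip(pred0, pred1)]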
FIG. 26 is a flow diagram of an example method of video processing. The method 2600 comprises: determining (at step 2602) one or more prediction blocks for the current video block during a transition between the current video block and a bitstream representation of the current video block; and performing (at step 2604) a conversion based on the one or more prediction blocks at least by using Overlapped Block Motion Compensation (OBMC) and decoder-side motion vector derivation (DMVD), wherein the DMVD applies refinement to the motion vectors based on a sum of absolute differences between different prediction directions or applies refinement to the predictions based on at least one of horizontal or vertical gradients of the different predictions, and wherein the OBMC derives a refined prediction based on the current motion vector of the current video block and one or more neighboring motion vectors of neighboring blocks.
FIG. 27 is a flow diagram of an example method of video processing. The method 2700 includes: determining (at step 2702) availability of at least one nearby sample point of a current video block during a transition between a current block in video data and a bitstream representation of the current video block; generating (at step 2704) an intra prediction for the current video block based on the availability of the at least one neighboring sample point; generating (at step 2706) inter prediction of the current block based on the at least one motion vector; deriving (at step 2708) a final prediction for the current block based on a weighted sum of the intra prediction and the inter prediction; and performing (at step 2710) the conversion based on the final prediction.
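Method 2700's final prediction is a weighted sum of an intra prediction, built from whichever neighboring samples are available, and an inter prediction. The equal-weight blend below is an assumption; the actual weights could depend on neighbor availability or sample position.

def combine_intra_inter(intra_pred, inter_pred, w_intra=1, w_inter=1):
    # Weighted sum; assumes w_intra + w_inter is a power of two.
    total = w_intra + w_inter
    shift = total.bit_length() - 1
    return [[(w_intra * a + w_inter * b + (total >> 1)) >> shift
             for a, b in zip(ri, rj)]
            for ri, rj in zip(intra_pred, inter_pred)]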
In some implementations, additional modifications may be performed to method 2200. For example, the conversion generates the current block from the bitstream representation, or generates the bitstream representation from the current block. The current block has a width w and a height h; if w × h is greater than or equal to a first threshold T1, the size of the first sub-block is M1xM2, and if w × h is less than T1, the size of the sub-block is N1xN2, where M1, M2, w, h, N1, N2, and T1 are integers. Alternatively, if w is greater than or equal to a second threshold T2, the width-to-height ratio of the first sub-block is M1, and if w is less than T2, the width-to-height ratio of the first sub-block is N1, where M1, N1, and T2 are integers. The size M1xM2 of the first sub-block is used if the current block is a uni-directional prediction block, and the size N1xN2 is used otherwise. M1xM2 is 4x4. For the upper region, M1xM2 is (w/4)x4, and for the left region, M1xM2 is 4x(h/4). For the upper region, M1xM2 is (w/4)x2, and for the left region, M1xM2 is 4x(h/2). N1xN2 is 8x8, 8x4, or 4x8. For the upper region, N1xN2 is (w/2)x4, and for the left region, N1xN2 is 4x(h/2). For the upper region, N1xN2 is (w/2)x2, and for the left region, N1xN2 is 2x(h/2). The method is disabled in affine mode. The method is applied to a translational motion mode. If the product w × h of the width and height of the current block is greater than or equal to a third threshold T3, the method is applied to the current block, where T3 is an integer; T3 is 16 or 32. If the width w of the current block is greater than or equal to a fourth threshold T4 and the height h is greater than or equal to T4, the method is applied to the current block, where T4 is an integer; T4 is 8. If the width w of the current block is greater than or equal to a fifth threshold T5 and the height h is greater than or equal to a sixth threshold T6, the method is applied to the current block, where T5 and T6 are integers; T5 and T6 are integer multiples of 8, and T5 may be the same as or different from T6. If the width w of the current block is greater than or equal to a seventh threshold T7 or the height h is greater than or equal to an eighth threshold T8, the method is not applied to the current block, where T7 and T8 are integers; T7 and T8 are 128. The current block is coded using an Intra Block Copy (IBC) mode, where the IBC mode uses the picture containing the current block as a reference picture. The second video unit is coded using IBC mode. The method is applied to all color components, or to one or more color components; for example, it is applied only to the luminance component. Whether and how the method is to be applied is signaled from an encoder to a decoder in a Video Parameter Set (VPS), Sequence Parameter Set (SPS), Picture Parameter Set (PPS), picture header, slice group, slice header, Codec Tree Unit (CTU), Codec Unit (CU), CTU group, or CU group.
Some of the features that are preferably implemented by some of the embodiments are now disclosed in a clause-based format.
1. A video processing method, comprising:
determining at least one neighboring block to a current block of visual media data during a transition between the current block and a corresponding codec representation of the visual media data;
determining motion information of at least one neighboring block; and
performing Overlapped Block Motion Compensation (OBMC) on the current block based on motion information of at least one neighboring block,
wherein the OBMC comprises generating a final predictor of one sub-block of the current block using an intermediate predictor of the sub-block and predictors of at least one neighboring sub-block.
2. The method of clause 1, wherein performing OBMC on the current block based on the motion information of the at least one neighboring block comprises:
OBMC is performed on all sub-blocks of the current block based on motion information of at least one neighboring block.
3. The method of clause 1 or 2, wherein the at least one neighboring block includes a first neighboring block located above the current block and a second neighboring block located to the left of the current block.
4. The method of any of clauses 1-3, wherein the at least one neighboring block and the current block are from different pictures of the visual media data.
5. The method of any of clauses 1-4, wherein the method is applied only when the current block is not coded using sub-block techniques.
6. A video processing method, comprising:
determining at least one neighboring block to a current block of visual media data during a transition between the current block and a corresponding codec representation of the visual media data;
determining motion information of at least one neighboring block; and
modifying motion information of the current block based on motion information of at least one neighboring block to generate modified motion information of the current block;
processing of the current block is performed based on the modified motion information.
7. The method of clause 6, wherein modifying the motion information of the current block based on the motion information of the at least one neighboring block to generate modified motion information of the current block comprises:
modifying the motion information of the current block based on the motion information of the at least one neighboring block and the motion information of the current block to generate modified motion information of the current block.
8. The method of clause 6 or 7, wherein modifying the motion information of the current block comprises:
scaling motion information of at least one neighboring block to the same reference picture of the current block; and modifying the motion information of the current block based on the scaled motion information of the at least one neighboring block and the motion information of the current block.
9. The method of clause 8, wherein the scaled motion information of at least one neighboring block is weighted-averaged or averaged to generate one representative scaled motion vector for each reference picture list of the current block.
10. The method of clause 9, wherein the modified motion information for the current block is generated as a weighted average of the representative scaled motion vector and the motion vector for the current block.
11. The method of clause 10, wherein the modified motion vector is calculated as: avgMv = (w1 × neigScaledMvLX + w2 × currMvLX + offset) >> N,
where w1, w2, offset, and N are integers, avgMv is the modified motion vector, neigScaledMvLX is the representative scaled motion vector, and currMvLX is the motion vector of the current block; X is the reference picture list, where X = 0, 1.
12. The method of clause 11, wherein w1 is 1, w2 is 3, N is 2, and offset is 2, or wherein w1 is 1, w2 is 7, N is 3, and offset is 4.
13. The method according to any one of clauses 6 to 12, wherein performing the processing of the current block based on the motion information of the at least one neighboring block comprises:
the process is performed on the boundary area of the current block,
wherein the boundary region of the current block includes a plurality of top rows and/or left columns of the current block.
14. The method of clause 13, wherein the representative motion vector is generated for a top row of the current block and a left column of the current block using different neighboring blocks, respectively.
15. The method of any of clauses 6-14, wherein the method is applied at a sub-block level only when the current block is coded using a sub-block technique.
16. The method according to any one of clauses 6 to 15, wherein the method is not performed on the boundary area of the current block when at least one neighboring block of the boundary area is intra-coded.
17. The method of any of clauses 6 to 16, wherein when the at least one neighboring block is intra coded, the method further comprises:
checking neighboring blocks and/or non-neighboring blocks until an inter-coded block is found, and
the motion vector modification process is disabled in response to no inter-coded block being found.
18. The method of clause 17, wherein the non-neighboring block comprises an upper and/or upper-left and/or upper-right neighboring block of a top boundary region of the current block, and the non-neighboring block comprises a left and/or upper-left and/or lower-left neighboring block of a left side boundary region of the current block.
19. The method of clause 17, wherein the non-adjacent blocks comprise above and/or above-left and/or above-right and/or left and/or below-left adjacent blocks.
20. The method of clause 17, wherein the non-adjacent blocks are checked in descending order of distance between the non-adjacent blocks and corresponding blocks within the boundary area.
21. The method of any of clauses 17 to 20, wherein a subset of non-neighboring blocks or a number of non-neighboring blocks is checked, the number not being greater than a threshold K.
22. The method according to any of clauses 17 to 21, wherein the upper-right and upper-left regions have a width of W/2 and the lower-left region has a height of H/2, wherein W and H are the width and height of the current block as a codec unit.
23. The method according to any of clauses 17 to 22, wherein the method is performed in each valid prediction direction when at least one neighboring/non-neighboring block and the current block are bi-directionally or uni-directionally predicted from a reference list.
24. The method of any of clauses 17 to 22, wherein the modified motion information is generated for the first list when the at least one neighboring/non-neighboring block is uni-directionally predicted from the first list and the current block is bi-directionally predicted, or when the at least one neighboring/non-neighboring block is bi-directionally predicted and the current block is uni-directionally predicted from the first list.
25. The method of clause 24, wherein modified motion information is not generated.
26. The method of any of clauses 17 to 22, wherein the modified motion information is not generated when the at least one neighboring/non-neighboring block and the current block are uni-directionally predicted and predicted from different directions.
27. The method according to any one of clauses 17 to 22, wherein, when the at least one neighboring/non-neighboring block and the current block are uni-directionally predicted and are predicted from different directions, the motion vectors of the neighboring/non-neighboring blocks are scaled to the reference picture of the current block, and the modified motion information is generated.
28. A video processing method, comprising:
determining a plurality of neighboring blocks to a current block of visual media data during a transition between the current block and a corresponding codec representation of the visual media data;
determining motion information of a plurality of neighboring blocks;
determining a first prediction block of the current block based on motion information of the current block;
determining a second prediction block of the current block based on motion information of a plurality of neighboring blocks;
modifying the first prediction block based on the second prediction block; and
processing of the current block is performed based on the first prediction block.
29. The method of clause 28, wherein the motion information of one of the plurality of neighboring blocks is scaled to the reference picture of the current block to generate representative scaled motion information, the representative scaled motion information being used to determine the second prediction block for the current block.
30. The method of clause 29, wherein modifying the first prediction block further comprises:
the modified prediction block is generated as a weighted average of the first prediction block and the second prediction block.
31. The method of clause 30, wherein performing processing of the current block based on the first prediction block comprises:
the process is performed on the boundary area of the current block,
wherein the boundary region of the current block includes an upper boundary region having a plurality of top rows and/or a left boundary region having a left column of the current block.
32. The method of clause 31, wherein two different representative scaled motion vectors are generated for the upper boundary region and the left boundary region based on different neighboring blocks.
33. The method of clause 32, wherein either of the two different scaled motion vectors is used for the upper left border region.
34. The method of any of clauses 28 to 33, wherein the motion vector scaling process is skipped.
35. The method of any of clauses 28 to 33, wherein modifying the first prediction block based on the second prediction block comprises:
processing is performed on one or more sub-blocks of the current block based on motion information of at least one neighboring block.
36. The method of any of clauses 28-35, wherein the method is applied only when the current block is coded using sub-block techniques.
37. The method of any of clauses 1-36, wherein the sub-block technique comprises: advanced Temporal Motion Vector Prediction (ATMVP), Spatial Temporal Motion Vector Prediction (STMVP), affine modes including affine inter mode and affine Merge mode.
38. A video processing method, comprising:
determining a motion vector of a first sub-block within a current block during a transition between the current block and a bitstream representation of the current block;
performing a conversion using an Overlapped Block Motion Compensation (OBMC) mode;
wherein the OBMC mode generates a final predictor of the first sub-block using an intermediate predictor of the first sub-block and a predictor of at least a second video unit adjacent to the first sub-block, wherein the intermediate predictor is based on the motion vector of the first sub-block;
wherein a sub-block size of the first sub-block is based on a block size, a block shape, motion information, or a reference picture of the current block.
39. The method of clause 38, wherein the converting generates the current block from a bitstream representation.
40. The method of clause 38, wherein the transforming generates a bitstream representation from the current block.
41. The method of any of clauses 38 to 40, wherein the current block has a width w and a height h, and if w × h is greater than or equal to a first threshold T1, the size of the first sub-block is M1xM2; and if w × h is less than the first threshold T1, the size of the sub-block is N1xN2, where M1, M2, w, h, N1, N2, and T1 are integers.
42. The method of any of clauses 38 to 40, wherein the current block has a width w and a height h, and if w is greater than or equal to a second threshold T2, the width-to-height ratio w/h of the first sub-block of the current block is M1; and if w is less than the second threshold T2, the width-to-height ratio w/h of the first sub-block is N1, where M1, N1, and T2 are integers.
43. The method of clause 41, wherein the size M1xM2 of the first sub-block is used if the current block is a uni-directional prediction block, and the size N1xN2 of the first sub-block is used otherwise.
44. The method of clauses 41 or 43, wherein M1xM2 is 4x4.
45. The method of clauses 41 or 43, wherein M1xM2 is (w/4)x4 for the upper region and M1xM2 is 4x(h/4) for the left region.
46. The method of clauses 41 or 43, wherein M1xM2 is (w/4)x2 for the upper region and M1xM2 is 4x(h/2) for the left region.
47. The method of clause 41 or 43, wherein N1xN2 is 8x8, 8x4, or 4x8.
48. The method of clause 41 or 43, wherein for the upper region, N1xN2 is (w/2)x4, and for the left region, N1xN2 is 4x(h/2).
49. The method of clause 41 or 43, wherein for the upper region, N1xN2 is (w/2)x2, and for the left region, N1xN2 is 2x(h/2).
50. The method of any of clauses 38 to 49, wherein the method is disabled in affine mode.
51. The method according to any of clauses 38 to 50, wherein the method is applied to a translational movement pattern.
52. The method of any of clauses 38 to 51, wherein the method is applied to the current block if the product w × h of the width and the height of the current block is greater than or equal to a third threshold T3, wherein T3 is an integer.
53. The method of clause 52, wherein T3 is 16 or 32.
54. The method according to any of clauses 38 to 51, wherein the method is applied to the current block if the width w of the current block is greater than or equal to a fourth threshold T4 and the height h is greater than or equal to a fourth threshold T4, wherein T4 is an integer.
55. The method of clause 54, wherein T4 is 8.
56. The method according to any of clauses 38 to 51, wherein the method is applied to the current block if the width w of the current block is greater than or equal to a fifth threshold T5 and the height h is greater than or equal to a sixth threshold T6, wherein T5 and T6 are integers.
57. The method of clause 56, wherein T5 and T6 are integer multiples of 8, and T5 is the same as or different from T6.
58. The method according to any of clauses 38 to 51, wherein if the width w of the current block is greater than or equal to the seventh threshold T7 or the height h is greater than or equal to the eighth threshold T8, wherein T7 and T8 are integers, the method is not applied to the current block.
59. The method of clause 58, wherein T7 and T8 are 128.
60. The method of clause 38, wherein the current block is coded using an Intra Block Copy (IBC) mode, wherein the IBC mode uses a picture of the current block as a reference picture.
61. The method of clause 60, wherein the second video unit uses IBC mode codec.
62. The method of any of clauses 38 to 61, wherein the method is applied to all color components.
63. The method of any of clauses 38 to 61, wherein the method is applied to one or more color components.
64. The method of clause 63, wherein the method is applied only to the luma component.
65. The method of any of clauses 1-64, wherein whether and how the method is applied is signaled from an encoder to a decoder in a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a picture header, a slice group, a slice header, a Codec Tree Unit (CTU), a Codec Unit (CU), a CTU group, or a CU group.
66. A video decoding apparatus comprising a processor configured to implement the method of any one or more of clauses 1-65.
67. A video encoding apparatus comprising a processor configured to implement the method of any one or more of clauses 1-65.
68. A computer program product having computer code stored thereon, which when executed by a processor, causes the processor to implement the method of any of clauses 1 to 65.
Some features which are preferably implemented by some embodiments are now disclosed in another clause-based format.
1. A video processing method, comprising:
generating, during a transition between a current block in video data and a bitstream representation of the current block, at least one sub-block from the current block based on a dimension of the current block;
generating different predictions for the at least one sub-block based on different prediction lists;
applying early termination processing at a sub-block level to determine whether a bi-directional optical flow (BDOF) processing tool is applied to the at least one sub-block; and
performing the conversion based on the application;
wherein the BDOF processing tool generates a prediction offset based on at least one of the different predicted horizontal or vertical gradients.
2. The method of clause 1, wherein the early termination processing at the sub-block level is based on a Sum of Absolute Differences (SAD) between different predictions of the at least one sub-block.
3. The method of clause 2, wherein the SAD is calculated based on partial (subsampled) positions of the different predictions.
4. The method of clause 2 or 3, wherein the BDOF processing tool is not applied based on the SAD being less than a threshold.
5. The method of clause 4, wherein the threshold is based on the dimension of the at least one sub-block.
6. The method of clause 1, wherein the BDOF processing tool applies the prediction offset to refine the different predictions to derive a modified prediction for the at least one sub-block.
7. The method of clause 1, further comprising:
the current block is divided into a plurality of sub-blocks, and a SAD is generated for each sub-block.
8. The method of clause 7, wherein determining whether to apply the BDOF processing tool to each sub-block is based on its own SAD without reference to the SADs of other sub-blocks.
9. A video processing method, comprising:
generating a current motion vector for a current block in video data during a transition between the current block and a bitstream representation of the current block;
generating one or more neighboring motion vectors of one or more neighboring blocks of the current block;
deriving a first type of prediction for the current block based on the current motion vector;
deriving one or more second type predictions for the current block based on the one or more neighboring motion vectors;
determining whether to apply Local Illumination Compensation (LIC) to the first type of prediction or the second type of prediction based on characteristics of the current block or the neighboring blocks; and
performing the conversion based on the determination;
wherein the LIC constructs a linear model with multiple parameters to refine the prediction based on the prediction direction.
10. The method of clause 9, wherein the linear models for the first and second types of predictions are derived based on the current motion vector if the LIC is applied to the first and second types of predictions.
11. The method of clause 9, wherein the linear models for the first and second types of predictions are derived based on at least one of the neighboring motion vectors if the LIC is applied to the first and second types of predictions.
12. The method of clause 9, wherein, if the LIC is applied to the first type of prediction, the linear model for the first type of prediction is derived based on the current motion vector, and if the LIC is applied to a second type of prediction, the linear model for the second type of prediction is derived based on at least one of the neighboring motion vectors.
13. The method of any of clauses 9 to 12, wherein different linear models are derived for different second type of predictions based on corresponding neighboring motion vectors.
14. The method of clause 9, wherein the LIC is not applied to the second type of prediction.
15. The method of any of clauses 9 to 14, wherein the bitstream representation includes flags corresponding to the current block and the neighboring blocks to indicate whether the LIC is enabled.
16. The method of clause 15, wherein the LIC is not applied to the second type of prediction if the corresponding flag indicates that the LIC is disabled.
17. The method of any of clauses 9-16, wherein the LIC is not applied to any of the second type predictions if the number of flags of the neighboring blocks indicating that LIC is disabled is greater than or equal to a threshold.
18. The method of any of clauses 9-17, wherein the LIC is applied to all of the second type predictions if the number of flags of the neighboring blocks indicating LIC enablement is greater than or equal to a threshold.
19. The method of any of clauses 9 to 18, wherein, if the flag for the current block indicates that LIC is disabled, the LIC is not applied to the first type of prediction or to any of the second type predictions.
20. The method of any of clauses 9-19, wherein, even if the flag of the current block indicates that LIC is applied, LIC is applied to the second type of prediction if the flag of the corresponding neighboring block indicates that LIC is enabled.
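To make the linear model of clauses 9 to 14 concrete, here is a non-normative least-squares sketch; the two parameters are a scale and an offset, and which motion vector fetches the reference neighbors (the current one per clause 10, or a neighboring one per clauses 11 and 12) is left to the caller.

```python
import numpy as np

def derive_lic_model(neigh_rec: np.ndarray, neigh_ref: np.ndarray):
    """Fit rec ~= a * ref + b over neighboring reconstructed samples and
    the corresponding reference samples; which MV located neigh_ref
    determines whether this follows clause 10, 11, or 12."""
    x = neigh_ref.astype(np.float64).ravel()
    y = neigh_rec.astype(np.float64).ravel()
    var = x.var()
    a = 1.0 if var == 0 else ((x * y).mean() - x.mean() * y.mean()) / var
    b = y.mean() - a * x.mean()
    return a, b

def apply_lic(pred: np.ndarray, a: float, b: float,
              max_val: int = 255) -> np.ndarray:
    """Refine a prediction with the linear model and clip to the sample range."""
    return np.clip(a * pred.astype(np.float64) + b, 0, max_val).astype(pred.dtype)
```

Per clause 13, a separate (a, b) pair may be derived for each neighboring motion vector; per clauses 15 to 20, the model is derived and applied only where the relevant LIC flags permit.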
21. A video processing method, comprising:
generating a current motion vector for a current block in video data during a conversion between the current block and a bitstream representation of the current block;
generating one or more neighboring motion vectors of one or more neighboring blocks of the current block;
deriving a first type of prediction for the current block based on the current motion vector;
deriving one or more second type predictions for the current block based on the one or more neighboring motion vectors;
applying generalized bi-directional prediction (GBi) to the first type of prediction or the second type of prediction; and
performing the conversion based on the applying;
wherein the GBi comprises applying equal or unequal weights to different prediction directions of the first and second types of prediction based on GBi indices of a weight list.
22. The method of clause 21, wherein the same GBi index is used for both the first type of prediction and the second type of prediction.
23. The method of clause 22, wherein the same GBi index is the GBi index of the current block.
24. The method of clause 21, wherein different GBi indices are used for the first type of prediction and the second type of prediction.
25. The method of clause 21, wherein a default GBi index is used for the first type of prediction and the second type of prediction.
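A minimal sketch of the weighted blend of clauses 21 to 25 follows; the weight list below is illustrative (each pair summing to 8), as the actual list and the signalling of its index are defined by the codec rather than assumed here.

```python
import numpy as np

# Illustrative GBi weight list in 1/8 units; index 0 is the equal-weight entry.
GBI_WEIGHTS = [(4, 4), (5, 3), (3, 5), (6, 2), (2, 6)]

def gbi_blend(pred_l0: np.ndarray, pred_l1: np.ndarray,
              gbi_index: int = 0) -> np.ndarray:
    """Clause 21: combine the two prediction directions with equal or
    unequal weights selected by a GBi index into the weight list."""
    w0, w1 = GBI_WEIGHTS[gbi_index]
    p = (w0 * pred_l0.astype(np.int32) +
         w1 * pred_l1.astype(np.int32) + 4) >> 3  # +4 rounds to nearest
    return p.astype(pred_l0.dtype)
```

Clauses 22 to 25 then differ only in which index is passed for the first and second type predictions: the current block's index for both, per-type indices, or a default index (here index 0).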
26. A video processing method, comprising:
determining one or more prediction blocks for a current video block during a conversion between the current video block and a bitstream representation of the current video block; and
performing the converting at least by using Overlapped Block Motion Compensation (OBMC) and decoder-side motion vector derivation (DMVD) based on the one or more prediction blocks,
wherein the DMVD applies refinement to motion vectors based on a sum of absolute differences between different prediction directions, or applies refinement to predictions based on at least one of horizontal or vertical gradients of different predictions,
wherein the OBMC derives a refined prediction based on a current motion vector of the current video block and one or more neighboring motion vectors of the neighboring blocks.
27. The method of clause 26, further comprising:
the converting is performed using the DMVD and then using the OBMC.
28. The method of clause 26 or 27, further comprising:
the converting is performed by modifying one or more predicted samples using the OBMC to obtain modified one or more predicted samples, and then using the modified one or more predicted samples for the DMVD.
29. The method of clause 28, wherein the output of the DMVD is used as a prediction for the current video block.
30. The method of clause 26 or 27, further comprising:
the converting is performed by modifying one or more predicted samples using the OBMC to obtain modified one or more predicted samples, then using the modified one or more predicted samples only for the DMVD, and then modifying an output of the DMVD using one or more predicted blocks generated by the OBMC.
31. The method of any of clauses 26-30, further comprising:
performing the conversion without applying the DMVD to one or more neighboring blocks based on the one or more prediction blocks.
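As one concrete reading of the refined prediction of clause 26, the sketch below blends the prediction obtained with the current motion vector and the one obtained with a top neighbor's motion vector over the first rows, with the neighbor's influence fading away from the boundary; the 1/4, 1/8, 1/16, 1/32 weights are an assumption borrowed from JEM-style OBMC, and the DMVD orderings of clauses 27 to 30 amount to calling a refinement step before or after this blend.

```python
import numpy as np

# Assumed neighbor weights 1/4, 1/8, 1/16, 1/32 (in 1/32 units) for the
# first four rows adjacent to a top neighbor.
NEIGH_W = (8, 4, 2, 1)

def obmc_blend_top(pred_cur: np.ndarray, pred_neigh: np.ndarray) -> np.ndarray:
    """Blend pred_cur (current MV) with pred_neigh (top neighbor's MV)
    row by row near the shared boundary (clause 26)."""
    out = pred_cur.astype(np.int32)  # astype copies, so pred_cur is untouched
    for r, wn in enumerate(NEIGH_W[:out.shape[0]]):
        out[r] = ((32 - wn) * out[r] +
                  wn * pred_neigh[r].astype(np.int32) + 16) >> 5
    return out.astype(pred_cur.dtype)
```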
32. The method of any of clauses 1-31, further comprising:
determining whether to use one or more neighboring motion vectors of the neighboring blocks according to the size and/or motion information of the one or more neighboring blocks.
33. The method of clause 32, wherein the neighboring motion vectors of the neighboring blocks are determined not to be used in response to the neighboring blocks being one of 4x4, 4x8, 8x4, 4x16, and 16x4 in size.
34. The method of clause 32, wherein, in response to the neighboring block being one of 4x4, 4x8, 8x4, 4x16, and 16x4 in size and the neighboring block being bi-predicted, determining not to use a neighboring motion vector of the neighboring block.
35. The method of any of clauses 9-20, wherein the converting is performed without using the second type of prediction based on determining to apply the LIC to the first type of prediction.
36. The method according to any of clauses 9 to 20, wherein the converting is performed by using the second type of prediction based on applying uni-directional-prediction LIC to the first type of prediction.
37. The method of any of clauses 1 to 36, wherein determining whether to use the second type of prediction is based on coding information of the current block and/or one or more neighboring blocks.
38. The method of clause 37, wherein the coding information comprises a block dimension and a prediction direction.
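Clauses 32 to 34 gate the use of a neighbor's motion vector on the neighbor's size and prediction direction. A small decision helper covering both variants might look as follows; the size list is the one recited above, and the require_bi switch is purely an artifact of this sketch.

```python
SMALL_SIZES = {(4, 4), (4, 8), (8, 4), (4, 16), (16, 4)}

def use_neighbor_mv(width: int, height: int, bi_predicted: bool,
                    require_bi: bool) -> bool:
    """require_bi=False realizes clause 33 (exclude every small neighbor);
    require_bi=True realizes clause 34 (exclude a small neighbor only
    when it is also bi-predicted)."""
    if (width, height) not in SMALL_SIZES:
        return True
    return require_bi and not bi_predicted
```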
39. A video processing method, comprising:
determining availability of at least one neighboring sample point of a current video block during a conversion between the current video block in video data and a bitstream representation of the current video block;
generating an intra prediction for the current video block based on the availability of the at least one neighboring sample point;
generating an inter prediction of the current block based on at least one motion vector;
deriving a final prediction for the current block based on a weighted sum of the intra prediction and the inter prediction; and
performing the conversion based on the final prediction.
40. The method of clause 39, wherein neighboring samples are deemed unavailable if the neighboring samples are from an intra-coded neighboring block.
41. The method of clause 39, wherein neighboring samples are deemed unavailable if the neighboring samples are from an inter-coded neighboring block.
42. The method of clause 39, wherein neighboring samples are deemed unavailable if they are from a neighboring block coded in a combined inter-intra prediction (CIIP) mode;
wherein the CIIP mode applies weights to an intra prediction and an inter prediction.
43. The method of clause 39, wherein neighboring samples are deemed unavailable if they are from a neighboring block coded in a Current Picture Reference (CPR) mode;
wherein the CPR uses a block vector pointing to a picture to which the neighboring block belongs.
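The weighted sum of clause 39 can be pictured with the short sketch below; the equal weights summing to 4 are an illustrative default, and the availability rules of clauses 40 to 43 decide which neighboring samples may feed the intra prediction in the first place.

```python
import numpy as np

def combine_intra_inter(intra_pred: np.ndarray, inter_pred: np.ndarray,
                        w_intra: int = 2, w_inter: int = 2) -> np.ndarray:
    """Clause 39: final prediction as a weighted sum of the intra and
    inter predictions (weights assumed to sum to 4 in this sketch)."""
    p = (w_intra * intra_pred.astype(np.int32) +
         w_inter * inter_pred.astype(np.int32) + 2) >> 2  # +2 rounds
    return p.astype(intra_pred.dtype)
```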
44. The method of any of clauses 1-43, wherein the converting generates the current block from the bitstream representation.
45. The method of any of clauses 1-44, wherein the converting generates the bitstream representation from the current block.
46. A video processing apparatus comprising a processor configured to implement the method of any of clauses 1-45.
47. The apparatus of clause 46, wherein the apparatus is a video encoder.
48. The apparatus of clause 46, wherein the apparatus is a video decoder.
49. A computer-readable recording medium storing a program comprising code for causing a processor to implement the method of any one of clauses 1 to 45.
Implementations of the subject matter described herein and the functional operations may be implemented in various systems, digital electronic circuitry, or computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine readable storage device, a machine readable storage substrate, a memory device, a composition of matter effecting a machine readable propagated signal, or a combination of one or more of them. The term "data processing unit" or "data processing apparatus" includes all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or groups of computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
The specification and drawings are intended to be regarded as exemplary only, where exemplary means an example. Further, the use of "or" is intended to include "and/or" unless the context clearly indicates otherwise.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various functions described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claim combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Also, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described herein should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples have been described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (49)

1. A video processing method, comprising:
generating, during a conversion between a current block in video data and a bitstream representation of the current block, at least one sub-block from the current block based on a dimension of the current block;
generating different predictions for the at least one sub-block based on different prediction lists;
applying early termination processing at a sub-block level to determine whether a bi-directional optical flow (BDOF) processing tool is applied to the at least one sub-block; and
performing the conversion based on the application;
wherein the BDOF processing tool generates a prediction offset based on at least one of horizontal or vertical gradients of the different predictions.
2. The method of claim 1, wherein the early termination processing at the sub-block level is based on a Sum of Absolute Differences (SAD) between different predictions of the at least one sub-block.
3. The method of claim 2, wherein the SAD is calculated based on a subset of positions of the different predictions.
4. The method of claim 2 or 3, wherein the BDOF processing tool is not applied based on the SAD being less than a threshold.
5. The method of claim 4, wherein the threshold is based on the dimension of the at least one sub-block.
6. The method of claim 1, wherein the BDOF processing tool applies the prediction offset to refine the different predictions to derive a modified prediction of the at least one sub-block.
7. The method of claim 1, further comprising:
dividing the current block into a plurality of sub-blocks, and generating a SAD for each sub-block.
8. The method of claim 7, wherein determining whether to apply the BDOF processing tool to each sub-block is based on its own SAD without reference to the SADs of other sub-blocks.
9. A video processing method, comprising:
generating a current motion vector for a current block in video data during a conversion between the current block and a bitstream representation of the current block;
generating one or more neighboring motion vectors of one or more neighboring blocks of the current block;
deriving a first type of prediction for the current block based on the current motion vector;
deriving one or more second type predictions for the current block based on the one or more neighboring motion vectors;
determining whether to apply Local Illumination Compensation (LIC) to the first type of prediction or the second type of prediction based on characteristics of the current block or the neighboring blocks; and
performing the conversion based on the determination;
wherein the LIC constructs a linear model with multiple parameters to refine the prediction based on the prediction direction.
10. The method of claim 9, wherein the linear models for the first and second types of prediction are derived based on the current motion vector if the LIC is applied to the first and second types of prediction.
11. The method of claim 9, wherein the linear models of the first and second types of prediction are derived based on at least one of the neighboring motion vectors if the LIC is applied to the first and second types of prediction.
12. The method of claim 9, wherein the linear model for the first type of prediction is derived based on the current motion vector if the LIC is applied for the first type of prediction, and the linear model for the second type of prediction is derived based on at least one of the neighboring motion vectors if the LIC is applied for the second type of prediction.
13. The method according to any of claims 9 to 12, wherein different linear models are derived for different second type of predictions based on corresponding neighboring motion vectors.
14. The method of claim 9, wherein the LIC is not applied to the second type of prediction.
15. The method of any of claims 9 to 14, wherein the bitstream representation comprises flags corresponding to the current block and the neighboring blocks to indicate whether the LIC is enabled.
16. The method of claim 15, wherein the LIC is not applied to the second type of prediction if the corresponding flag indicates that the LIC is disabled.
17. The method of any of claims 9 to 16, wherein the LIC is not applied to any of the second type predictions if the number of flags of the neighboring blocks indicating that LIC is disabled is greater than or equal to a threshold.
18. The method of any of claims 9 to 17, wherein the LIC is applied to all of the second type predictions if the number of flags of the neighboring blocks indicating that LIC is enabled is greater than or equal to a threshold.
19. The method of any of claims 9 to 18, wherein, if the flag of the current block indicates that LIC is disabled, the LIC is not applied to the first type of prediction or to any of the second type predictions.
20. The method of any one of claims 9 to 19, wherein, even if the flag of the current block indicates that the LIC is applied, the LIC is applied to the second type prediction if the flag of the corresponding neighboring block indicates that LIC is enabled.
21. A video processing method, comprising:
generating a current motion vector for a current block in video data during a conversion between the current block and a bitstream representation of the current block;
generating one or more neighboring motion vectors of one or more neighboring blocks of the current block;
deriving a first type of prediction for the current block based on the current motion vector;
deriving one or more second type predictions for the current block based on the one or more neighboring motion vectors;
applying generalized bi-directional prediction (GBi) to the first type of prediction or the second type of prediction; and
performing the conversion based on the applying;
wherein the GBi comprises applying equal or unequal weights to different prediction directions of the first and second types of prediction based on GBi indices of a weight list.
22. The method of claim 21, wherein the same GBi index is used for both the first type of prediction and the second type of prediction.
23. The method of claim 22 wherein the same GBi index is the GBi index of the current block.
24. The method of claim 21, wherein different GBi indices are used for the first type of prediction and the second type of prediction.
25. The method of claim 21, wherein a default GBi index is used for the first type of prediction and the second type of prediction.
26. A video processing method, comprising:
determining one or more prediction blocks for a current video block during a conversion between the current video block and a bitstream representation of the current video block; and
performing the converting at least by using Overlapped Block Motion Compensation (OBMC) and decoder-side motion vector derivation (DMVD) based on the one or more prediction blocks,
wherein the DMVD applies refinement to motion vectors based on a sum of absolute differences between different prediction directions, or applies refinement to predictions based on at least one of horizontal or vertical gradients of different predictions,
wherein the OBMC derives a refined prediction based on a current motion vector of the current video block and one or more neighboring motion vectors of the neighboring blocks.
27. The method of claim 26, further comprising:
the converting is performed using the DMVD and then using the OBMC.
28. The method of claim 26 or 27, further comprising:
the converting is performed by modifying one or more predicted samples using the OBMC to obtain modified one or more predicted samples, and then using the modified one or more predicted samples for the DMVD.
29. The method of claim 28, wherein an output of the DMVD is used as a prediction for the current video block.
30. The method of claim 26 or 27, further comprising:
the converting is performed by modifying one or more predicted samples using the OBMC to obtain modified one or more predicted samples, then using the modified one or more predicted samples only for the DMVD, and then modifying an output of the DMVD using one or more predicted blocks generated by the OBMC.
31. The method of any of claims 26 to 30, further comprising:
performing the conversion without applying the DMVD to one or more neighboring blocks based on the one or more prediction blocks.
32. The method of any of claims 1-31, further comprising:
determining whether to use one or more neighboring motion vectors of the neighboring blocks according to the size and/or motion information of the one or more neighboring blocks.
33. The method of claim 32, wherein the neighboring motion vector of the neighboring block is determined not to be used in response to the neighboring block being one of 4x4, 4x8, 8x4, 4x16, and 16x4 in size.
34. The method of claim 32, wherein the neighboring motion vector of the neighboring block is determined not to be used in response to the neighboring block being one of 4x4, 4x8, 8x4, 4x16, and 16x4 in size and the neighboring block being bi-directionally predicted.
35. The method of any of claims 9-20, wherein the converting is performed without using the second type of prediction based on determining to apply the LIC to the first type of prediction.
36. The method according to any of claims 9-20, wherein the converting is performed by using the second type of prediction based on applying uni-directional-prediction LIC to the first type of prediction.
37. The method of any of claims 1 to 36, wherein whether to use the second type of prediction is determined from coding information of the current block and/or one or more neighboring blocks.
38. The method of claim 37, wherein the coding information comprises a block dimension and a prediction direction.
39. A video processing method, comprising:
determining availability of at least one neighboring sample point of a current video block during a conversion between the current video block in video data and a bitstream representation of the current video block;
generating an intra prediction for the current video block based on the availability of the at least one neighboring sample point;
generating an inter prediction of the current block based on at least one motion vector;
deriving a final prediction for the current block based on a weighted sum of the intra prediction and the inter prediction; and
performing the conversion based on the final prediction.
40. The method of claim 39, wherein neighboring samples are deemed unavailable if they are from an intra-coded neighboring block.
41. The method of claim 39, wherein neighboring samples are deemed unavailable if they are from an inter-coded neighboring block.
42. The method of claim 39, wherein neighboring samples are considered unavailable if they are from a neighboring block coded in a combined inter-intra prediction (CIIP) mode;
wherein the CIIP mode applies weights to an intra prediction and an inter prediction.
43. The method of claim 39, wherein neighboring samples are considered unavailable if they are from a neighboring block coded in a Current Picture Reference (CPR) mode;
wherein the CPR uses a block vector pointing to a picture to which the neighboring block belongs.
44. The method of any of claims 1 to 43, wherein the converting generates the current block from the bitstream representation.
45. The method of any of claims 1 to 44, wherein the converting generates the bitstream representation from the current block.
46. A video processing apparatus comprising a processor configured to implement the method of any of claims 1 to 45.
47. The apparatus of claim 46, wherein the apparatus is a video encoder.
48. The apparatus of claim 46, wherein the apparatus is a video decoder.
49. A computer-readable recording medium storing a program comprising code for causing a processor to implement the method of any one of claims 1 to 45.
CN202080008149.8A 2019-01-13 2020-01-13 Coordination between overlapped block motion compensation and other tools Active CN113366831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311510897.8A CN117560503A (en) 2019-01-13 2020-01-13 Coordination between overlapped block motion compensation and other tools

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2019071508 2019-01-13
CNPCT/CN2019/071508 2019-01-13
PCT/CN2020/071851 WO2020143838A1 (en) 2019-01-13 2020-01-13 Harmonization between overlapped block motion compensation and other tools

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202311510897.8A Division CN117560503A (en) 2019-01-13 2020-01-13 Coordination between overlapped block motion compensation and other tools

Publications (2)

Publication Number Publication Date
CN113366831A true CN113366831A (en) 2021-09-07
CN113366831B CN113366831B (en) 2024-04-05

Family

ID=71520987

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202080008149.8A Active CN113366831B (en) 2019-01-13 2020-01-13 Coordination between overlapped block motion compensation and other tools
CN202311510897.8A Pending CN117560503A (en) 2019-01-13 2020-01-13 Coordination between overlapped block motion compensation and other tools

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202311510897.8A Pending CN117560503A (en) 2019-01-13 2020-01-13 Coordination between overlapped block motion compensation and other tools

Country Status (2)

Country Link
CN (2) CN113366831B (en)
WO (1) WO2020143838A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220201313A1 (en) * 2020-12-22 2022-06-23 Qualcomm Incorporated Bi-directional optical flow in video coding
WO2022222989A1 (en) * 2021-04-21 2022-10-27 Beijing Bytedance Network Technology Co., Ltd. Method, device, and medium for video processing
WO2024125368A1 (en) * 2022-12-12 2024-06-20 Douyin Vision Co., Ltd. Method, apparatus, and medium for video processing
WO2024149017A1 (en) * 2023-01-13 2024-07-18 Mediatek Inc. Methods and apparatus of motion shift in overlapped blocks motion compensation for video coding

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040013309A1 (en) * 2002-07-16 2004-01-22 Samsung Electronics Co., Ltd. Method and apparatus for encoding and decoding motion vectors
US20040264576A1 (en) * 2003-06-10 2004-12-30 Woods John W. Method for processing I-blocks used with motion compensated temporal filtering
CN1984340A (en) * 2005-11-02 2007-06-20 三星电子株式会社 Method and apparatus for encoding and decoding of video
US20160173875A1 (en) * 2014-12-11 2016-06-16 Intel Corporation Rate control for parallel video encoding
US20160366436A1 (en) * 2011-01-18 2016-12-15 Electronics And Telecommunications Research Institute Method for inducing prediction motion vector and apparatuses using same
US20170094305A1 (en) * 2015-09-28 2017-03-30 Qualcomm Incorporated Bi-directional optical flow for video coding
CN107147911A (en) * 2017-07-05 2017-09-08 中南大学 Method and device for fast inter-frame coding mode selection based on local brightness compensation LIC
EP3217663A1 (en) * 2014-11-06 2017-09-13 Samsung Electronics Co., Ltd. Video encoding method and apparatus, and video decoding method and apparatus

Also Published As

Publication number Publication date
CN117560503A (en) 2024-02-13
CN113366831B (en) 2024-04-05
WO2020143838A1 (en) 2020-07-16

Similar Documents

Publication Publication Date Title
JP7368554B2 (en) Block size limit for DMVR
CN111436227B (en) Using combined inter-intra prediction in video processing
CN111131822B (en) Overlapped block motion compensation with motion information derived from a neighborhood
CN112970253B (en) Motion candidate list construction for prediction
CN113196772A (en) Interaction between intra-block copy mode and sub-block based motion vector prediction mode
CN111083492A (en) Gradient computation in bi-directional optical flow
CN113424525A (en) Size selective application of decoder-side refinement tools
CN110944207B (en) Representation of affine models
CN113366831B (en) Coordination between overlapped block motion compensation and other tools
CN110858901B (en) Overlapped block motion compensation using temporal neighbors
CN113906738A (en) Adaptive motion vector difference resolution for affine mode
CN114175636A (en) Indication of adaptive loop filtering in an adaptive parameter set
CN113228675B (en) Motion Vector Bit Depth in Video Processing
WO2022174784A1 (en) On boundary padding motion vector clipping in image/video coding
CN113170139A (en) Simplified context modeling for context adaptive binary arithmetic coding
CN113545038B (en) Size dependent inter-frame coding
CN110557639B (en) Application of interleaved prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant