WO2025152710A1 - Methods and apparatus of using template matching for mv refinement or candidate reordering for video coding

Info

Publication number: WO2025152710A1
Application number: PCT/CN2024/140804
Authority: WO
Inventors: Chih-Yao Chiu; Shih-Chun Chiu; Ching-Yeh Chen; Chih-Wei Hsu; Yi-Wen Chen; Tzu-Der Chuang
Original assignee: MediaTek Inc
Current assignee: MediaTek Inc
Priority date: 2024-01-17
Filing date: 2024-12-20
Publication date: 2025-07-24
Anticipated expiration: 2026-07-17

Abstract

A method and apparatus for video coding using adaptive cost function setting are disclosed. According to the method, one or more templates for the current block are determined. A target cost function or a target set of weighting coefficients is selected from multiple cost functions or multiple sets of weighting coefficients according to size-related information associated with the current block, said one or more templates, or both. A target cost is determined based on said one or more templates using the target cost function or the target set of weighting coefficients selected. A target coding tool belonging to a coding tool group is applied to the current block, wherein a target process of the target coding tool is performed according to information comprising the target cost.

Description

METHODS AND APPARATUS OF USING TEMPLATE MATCHING FOR MV REFINEMENT OR CANDIDATE REORDERING FOR VIDEO CODING

CROSS REFERENCE TO RELATED APPLICATIONS

The present invention is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/621,615, filed on January 17, 2024. The U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates to video coding systems. In particular, the present invention relates to cost function settings for cost function evaluation over one or more templates in a video coding system.

BACKGROUND AND RELATED ART

Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The standard has been published as an ISO standard: ISO/IEC 23090-3:2021, Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021. VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources including 3-dimensional (3D) video signals.

Fig. 1A illustrates an exemplary adaptive Inter/Intra video encoding system incorporating loop processing. For Intra Prediction 110, the prediction data is derived based on previously coded video data in the current picture. For Inter Prediction 112, Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture(s) and motion data. Switch 114 selects Intra Prediction 110 or Inter Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to the underlying image area. The side information associated with Intra Prediction 110, Inter Prediction 112 and in-loop filter 130 is provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.

As shown in Fig. 1A, incoming video data undergoes a series of processing steps in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to this series of processing steps. Accordingly, in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. For example, deblocking filter (DF), Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) may be used. The loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream. In Fig. 1A, Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134. The system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H.264 or VVC.

The decoder, as shown in Fig. 1B, can use similar functional blocks or a portion of the same functional blocks as the encoder, except for Transform 118 and Quantization 120, since the decoder only needs Inverse Quantization 124 and Inverse Transform 126. Instead of Entropy Encoder 122, the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information). The Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140. Furthermore, for Inter prediction, the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.

In VVC and emerging video coding, some new coding tools have been disclosed to improve coding efficiency. The current disclosure focuses on techniques related to Template Matching.

Template Matching (TM)

Template matching is a decoder-side MV derivation method to refine the motion information of the current CU by finding the closest match between a template (i.e., top and/or left neighbouring blocks of the current CU) in the current picture and a template (i.e., of the same size as the template) in a reference picture. As illustrated in Fig. 2, a better MV (i.e., a refined MV) is searched around the initial motion of the current CU within a [–8, +8]-pel search range. The template matching method in JVET-J0021 is used with the following modifications: the search step size is determined based on the AMVR mode, and TM can be cascaded with the bilateral matching process in merge modes.

In AMVP mode, an MVP candidate is determined based on template matching error to select the one which reaches the minimum difference between the current block template and the reference block template, and then TM is performed only for this particular MVP candidate for MV refinement. TM refines this MVP candidate, starting from full-pel MVD precision (or 4-pel for 4-pel AMVR mode) within a [–8, +8]-pel search range by using iterative 16-point diamond search. The AMVP candidate may be further refined by using cross search with full-pel MVD precision (or 4-pel for 4-pel AMVR mode), followed sequentially by half-pel and quarter-pel ones depending on AMVR mode as specified in Table 1. This search process ensures that the MVP candidate still keeps the same MV precision as indicated by the AMVR mode after TM process. In the search process, if the difference between the previous minimum cost and the current minimum cost in the iteration is less than a threshold that is equal to the area of the block, the search process terminates.

Table 1. Search patterns of AMVR and merge mode with AMVR.

In merge mode, a similar search method is applied to the merge candidate indicated by the merge index. As shown in Table 1, TM may be performed all the way down to 1/8-pel MVD precision or may skip those beyond half-pel MVD precision, depending on whether the alternative interpolation filter (which is used when AMVR is half-pel mode) is used according to merged motion information. Besides, when TM mode is enabled, template matching may work as an independent process or an extra MV refinement process between block-based and subblock-based bilateral matching (BM) methods, depending on whether BM can be enabled or not according to its enabling condition check.

When TM is applied to bi-predictive blocks, an iterative process is used. Specifically, the initial motion vectors of L0 and L1 are first refined and TM costs Cost₀ and Cost₁ are calculated for L0 and L1, respectively. When Cost₀ is larger than Cost₁, the refined motion vector of L1 (MV’1) is used to derive a further refined motion vector of L0 (MV’0). Then, MV’1 is further refined using MV’0. Similarly, when Cost₀ is not larger than Cost₁, the refined motion vector of L0 (MV’0) is used to derive a further refined motion vector of L1 (MV’1), and MV’0 is further refined using MV’1. Besides, TM for bi-prediction is enabled when the DMVR condition is satisfied.
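The ordering of the bi-predictive refinement steps can be illustrated with a minimal Python sketch. The function name and the two callables `refine_alone` and `refine_with_other` are hypothetical placeholders for the actual TM search, which is not reproduced here; only the cost comparison and the order of the two follow-up refinements are taken from the description above.

```python
def tm_refine_bi_pred(mv0, mv1, refine_alone, refine_with_other):
    """Order the L0/L1 refinement steps as described for bi-predictive TM.

    refine_alone(list_idx, mv) -> (refined_mv, tm_cost)          (hypothetical)
    refine_with_other(list_idx, mv, other_mv) -> refined_mv      (hypothetical)
    """
    # Step 1: refine each list independently and record its TM cost.
    mv0_ref, cost0 = refine_alone(0, mv0)
    mv1_ref, cost1 = refine_alone(1, mv1)

    if cost0 > cost1:
        # L1 is the more reliable list: use MV'1 to further refine L0,
        # then further refine MV'1 using the updated MV'0.
        mv0_ref = refine_with_other(0, mv0_ref, mv1_ref)
        mv1_ref = refine_with_other(1, mv1_ref, mv0_ref)
    else:
        # Otherwise use MV'0 to further refine L1, then refine MV'0.
        mv1_ref = refine_with_other(1, mv1_ref, mv0_ref)
        mv0_ref = refine_with_other(0, mv0_ref, mv1_ref)
    return mv0_ref, mv1_ref
```

The list with the larger initial cost is refined first in the second pass, guided by the list whose template match was better.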

TM-based Subblock Motion Refinement

In JVET-AF0168 test3.4a, it is proposed to apply the template matching to subblock based motion tools, including the affine and SbTMVP mode. More specifically, the control point motion vectors (CPMVs) of uni-predicted affine merge candidates and the motion shift of SbTMVP candidates are refined using TM. For a uni-predicted affine merge candidate, a same MV offset is assigned to all the CPMVs, and the TM cost of the affine candidate is calculated accordingly. The optimal CPMV offset with the minimum TM cost can be used to refine the corresponding affine candidate. For a SbTMVP candidate, the initial motion shift can be refined with TM, and then the refined motion shift will be utilized to derive subblock temporal.

Commonly Used Cost Function of Template Matching Process

The commonly used cost functions for the TM process include the following:
1. SAD stands for Sum of Absolute Differences
2. SSD stands for Sum of Square Differences
3. SATD stands for Sum of Absolute Transformed Differences
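As an illustration of the three cost functions, the following Python sketch computes them for two equally sized sample blocks given as lists of rows. It is a sketch under stated assumptions: the SATD here uses an unnormalised Hadamard transform on a square block whose side is a power of two, which is one common choice and not necessarily the transform or scaling used in any particular codec; all function names are hypothetical.

```python
def _diff(cur, ref):
    # Element-wise difference between two blocks given as lists of rows.
    return [[c - r for c, r in zip(rc, rr)] for rc, rr in zip(cur, ref)]

def sad(cur, ref):
    """Sum of Absolute Differences."""
    return sum(abs(v) for row in _diff(cur, ref) for v in row)

def ssd(cur, ref):
    """Sum of Square Differences."""
    return sum(v * v for row in _diff(cur, ref) for v in row)

def hadamard(n):
    """Sylvester-type Hadamard matrix of order n (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def _matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def satd(cur, ref):
    """Sum of Absolute Transformed Differences (assumed: unnormalised 2-D
    Hadamard transform of the difference block, square power-of-two size)."""
    d = _diff(cur, ref)
    h = hadamard(len(d))
    # The Sylvester Hadamard matrix is symmetric, so H * D * H^T == H * D * H.
    t = _matmul(_matmul(h, d), h)
    return sum(abs(v) for row in t for v in row)
```

For example, with `cur = [[1, 2], [3, 4]]` and an all-zero `ref`, `sad` gives 10, `ssd` gives 30 and `satd` gives 16.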

In the following, some coding tools that can be used with TM for MV refinement, candidate reordering, or predictor generation are briefly reviewed.

Adaptive Reordering of Merge Candidates (ARMC)

In JVET-V0099 (Na Zhang, et al., “AHG12: Adaptive Reordering of Merge Candidates with Template Matching”, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 22nd Meeting, by teleconference, 20–28 April 2021, Document: JVET-V0099), an adaptive reordering of merge candidates with template matching (ARMC) method is proposed. The reordering method is applied to the regular merge mode, template matching (TM) merge mode, and affine merge mode (excluding the SbTMVP candidate). For the TM merge mode, merge candidates are reordered before the refinement process.

After a merge candidate list is constructed, merge candidates are divided into several subgroups. The subgroup size is set to 5. Merge candidates in each subgroup are reordered in ascending order of their template matching cost values. For simplification, merge candidates in the last subgroup are not reordered unless there is only one subgroup.

The template matching cost is measured by the sum of absolute differences (SAD) between samples of a template of the current block and their corresponding reference samples. The template comprises a set of reconstructed samples neighbouring to the current block. Reference samples of the template are located using the same motion information of the current block.
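The subgroup-based reordering described above can be sketched in Python as follows. The function name and the representation of candidates and costs as parallel lists are assumptions made for illustration; the costs are taken as already computed (for example, by SAD over the template).

```python
def reorder_merge_candidates(candidates, costs, subgroup_size=5):
    """Reorder merge candidates within subgroups by ascending TM cost.

    candidates: list of candidate objects (any type)
    costs: list of template matching costs, one per candidate
    The last subgroup is left untouched unless it is the only subgroup.
    """
    n = len(candidates)
    starts = list(range(0, n, subgroup_size))
    reordered = []
    for i, start in enumerate(starts):
        idx = list(range(start, min(start + subgroup_size, n)))
        is_last = (i == len(starts) - 1)
        if not is_last or len(starts) == 1:
            # list.sort is stable, so ties keep their original list order.
            idx.sort(key=lambda k: costs[k])
        reordered.extend(candidates[k] for k in idx)
    return reordered
```

With seven candidates `a` to `g` and costs `[5, 1, 4, 2, 3, 9, 0]`, the first subgroup becomes `b, d, e, c, a`, while the last subgroup `f, g` keeps its order even though `g` has the lowest cost.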

Intra-Block-Copy (IBC) Mode

Motion Compensation, one of the key technologies in hybrid video coding, explores the pixel correlation between adjacent pictures. It is generally assumed that, in a video sequence, the patterns corresponding to objects or background in a frame are displaced to form corresponding objects in the subsequent frame or correlated with other patterns within the current frame. With the estimation of such displacement (e.g. using block matching techniques), the pattern can be mostly reproduced without the need to re-code the pattern. Similarly, block matching and copy has also been tried to allow selecting the reference block from the same picture as the current block. It was observed to be inefficient when applying this concept to camera captured videos. Part of the reason is that the textural pattern in a spatial neighbouring area may be similar to the current coding block, but usually with some gradual changes over the space. It is difficult for a block to find an exact match within the same picture in a video captured by a camera. Accordingly, the improvement in coding performance is limited.

However, the situation for spatial correlation among pixels within the same picture is different for screen contents. For a typical video with texts and graphics, there are usually repetitive patterns within the same picture. Hence, intra (picture) block compensation has been observed to be very effective. A new prediction mode, i.e., the intra block copy (IBC) mode, also called current picture referencing (CPR), has been introduced for screen content coding to utilize this characteristic. In the CPR mode, a prediction unit (PU) is predicted from a previously reconstructed block within the same picture. Further, a displacement vector (called block vector or BV) is used to indicate the relative displacement from the position of the current block to that of the reference block. The prediction errors are then coded using transformation, quantization and entropy coding.

Intra Template Matching Prediction

Intra template matching prediction (IntraTMP) is a special intra prediction mode that copies the best prediction block from the reconstructed part of the current frame, whose L-shaped template matches the current template. For a predefined search range, the encoder searches for the most similar template matched with the current template in a reconstructed part of the current frame and uses the corresponding block as a prediction block. The encoder then signals the usage of this mode, and the same prediction operation is performed at the decoder side.

Combined Inter and Intra Prediction (CIIP)

In VVC, when a CU is coded in merge mode, if the CU contains at least 64 luma samples (that is, CU width times CU height is equal to or larger than 64), and if both CU width and CU height are less than 128 luma samples, an additional flag is signalled to indicate if the combined inter/intra prediction (CIIP) mode is applied to the current CU. As its name indicates, the CIIP prediction combines an inter prediction signal with an intra prediction signal. The inter prediction signal in the CIIP mode P_inter is derived using the same inter prediction process applied to regular merge mode; and the intra prediction signal P_intra is derived following the regular intra prediction process with the planar mode. Then, the intra and inter prediction signals are combined using weighted averaging, where the weight value wt is calculated depending on the coding modes of the top and left neighbouring blocks of current CU.
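The size condition under which the CIIP flag is signalled can be written as a small predicate. This is a sketch of the condition stated above only; the function name is hypothetical, and the derivation of the weight wt from the neighbouring blocks is not covered.

```python
def ciip_flag_is_signalled(cu_width, cu_height, is_merge):
    """Return True when the CIIP flag would be signalled for the CU.

    Conditions from the text: merge mode, at least 64 luma samples,
    and both width and height smaller than 128 luma samples.
    """
    return (
        is_merge
        and cu_width * cu_height >= 64
        and cu_width < 128
        and cu_height < 128
    )
```

An 8x8 merge CU satisfies the condition, whereas a 4x8 CU (32 samples) or a 128x64 CU does not.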

Merge Mode with MVD (MMVD)

In addition to merge mode, where the implicitly derived motion information is directly used for prediction samples generation of the current CU, the merge mode with motion vector differences (MMVD) is introduced in VVC. An MMVD flag is signalled right after sending a regular merge flag to specify whether MMVD mode is used for a CU.

In MMVD, after a merge candidate is selected, it is further refined by the signalled MVDs information. The further information includes a merge candidate flag, an index to specify motion magnitude, and an index for indication of motion direction. In MMVD mode, one of the first two candidates in the merge list is selected to be used as MV basis. The MMVD candidate flag is signalled to specify which one is used between the first and second merge candidates.

Distance index specifies motion magnitude information and indicates the pre-defined offset from the starting points for a L0 reference block and L1 reference block. An offset is added to either the horizontal component or the vertical component of the starting MV.

Geometric Partitioning Mode (GPM)

In VVC, a Geometric Partitioning Mode (GPM) is supported for inter prediction as described in JVET-W2002 (Adrian Browne, et al., Algorithm description for Versatile Video Coding and Test Model 14 (VTM 14), ITU-T/ISO/IEC Joint Video Experts Team (JVET), 23rd Meeting, by teleconference, 7–16 July 2021, Document: JVET-W2002). The geometric partitioning mode is signalled using a CU-level flag as one kind of merge mode, with other merge modes including the regular merge mode, the MMVD mode, the CIIP mode and the subblock merge mode. A total of 64 partitions are supported by geometric partitioning mode for each possible CU size, w×h = 2^m×2^n with m, n ∈ {3…6}, excluding 8x64 and 64x8. The GPM mode can be applied to skip or merge CUs having a size within the above limit and having at least two regular merge modes.
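To make the size constraint concrete, the following short Python sketch enumerates the CU sizes allowed for GPM according to the rule stated above (w = 2^m, h = 2^n with m, n from 3 to 6, excluding 8x64 and 64x8). The function name is hypothetical.

```python
def gpm_block_sizes():
    """List the (width, height) pairs for which GPM is allowed."""
    excluded = {(8, 64), (64, 8)}
    sizes = []
    for m in range(3, 7):          # m = 3..6 -> width 8..64
        for n in range(3, 7):      # n = 3..6 -> height 8..64
            size = (1 << m, 1 << n)
            if size not in excluded:
                sizes.append(size)
    return sizes
```

Of the 16 combinations of width and height in {8, 16, 32, 64}, 14 remain after the two exclusions.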

Bi-Prediction with CU-level Weight (BCW)

In HEVC, the bi-prediction signal, P_bi-pred is generated by averaging two prediction signals, P₀ and P₁ obtained from two different reference pictures and/or using two different motion vectors. In VVC, the bi-prediction mode is extended beyond simple averaging to allow weighted averaging of the two prediction signals.
P_bi-pred = ((8 - w) * P₀ + w * P₁ + 4) >> 3    (3)

Five weights are allowed in the weighted averaging bi-prediction, w ∈ {-2, 3, 4, 5, 10}. For each bi-predicted CU, the weight w is determined in one of two ways: 1) for a non-merge CU, the weight index is signalled after the motion vector difference; 2) for a merge CU, the weight index is inferred from neighbouring blocks based on the merge candidate index. BCW is only applied to CUs with 256 or more luma samples (i.e., CU width times CU height is greater than or equal to 256). For low-delay pictures, all 5 weights are used. For non-low-delay pictures, only 3 weights (w ∈ {3, 4, 5}) are used. At the encoder, fast search algorithms are applied to find the weight index without significantly increasing the encoder complexity. These algorithms are summarized as follows.
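The weighted averaging of equation (3) and the BCW size condition can be sketched per sample in Python as follows. The constant and function names are hypothetical; the arithmetic follows equation (3) and relies on Python's right shift of integers being an arithmetic (floor) shift.

```python
BCW_WEIGHTS_LOW_DELAY = (-2, 3, 4, 5, 10)   # all five weights
BCW_WEIGHTS_NON_LOW_DELAY = (3, 4, 5)       # reduced set

def bcw_allowed(cu_width, cu_height):
    """BCW is only applied to CUs with 256 or more luma samples."""
    return cu_width * cu_height >= 256

def bcw_blend(p0, p1, w):
    """Blend two prediction samples according to equation (3)."""
    return ((8 - w) * p0 + w * p1 + 4) >> 3
```

For instance, with P₀ = 100 and P₁ = 200, w = 4 gives 150 (plain averaging), w = 10 gives 225 and w = -2 gives 75.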

In the present invention, methods and apparatus to adaptively select cost function setting associated with some coding tools that use template matching for candidate reordering, MV refinement or predictor generation are disclosed.

BRIEF SUMMARY OF THE INVENTION

A method and apparatus for video coding using adaptive cost function setting are disclosed. According to the method, input data associated with a current block is received, wherein the input data comprises pixel data to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side. One or more templates for the current block are determined. A target cost function or a target set of weighting coefficients is selected from multiple cost functions or multiple sets of weighting coefficients according to size-related information associated with the current block, said one or more templates, or both. A target cost is determined based on said one or more templates using the target cost function or the target set of weighting coefficients selected. A target coding tool belonging to a coding tool group is applied to the current block, wherein a target process of the target coding tool is performed according to information comprising the target cost.

In one embodiment, the size-related information comprises width, height, block size, aspect ratio, or a combination thereof. In one embodiment, if the width is greater than twice the height or the height is greater than twice the width, a top template and a left template use different sets of weighting coefficients.

In one embodiment, a first set of spatial weighting coefficients for a top template contains more elements than a second set of spatial weighting coefficients for a left template.

In one embodiment, one or more additional sets of distance weighting coefficients and/or one or more additional sets of spatial weighting coefficients are used. In one embodiment, the multiple cost functions comprise SAD (Sum of Absolute Differences), weighted SAD, SSD (Sum of Square Differences), weighted SSD, SATD (Sum of Absolute Transformed Differences), weighted SATD or a combination thereof.

In one embodiment, the coding tool group comprises ARMC (Adaptive Reordering of Merge Candidates) TM (Template Matching) for affine merge, regular merge, CIIP merge, BM (Bilateral Matching) merge, or a combination thereof. In one embodiment, the coding tool group comprises MVD (Motion Vector Differences) sign prediction for AMVP (Advanced Motion Vector Prediction), MMVD (Merge mode with MVD), affine MMVD, TM based BCW (Bi-Prediction with CU-level Weight) index derivation, IBC (Intra-Block-Copy) CIIP (Combined Inter and Intra Prediction), CIIP TM merge, IBC TM merge, TM merge mode, GPM, intraTMP (Intra Template Matching Prediction), AMVP merge MV refinement, or a combination thereof.

In one embodiment, the target process of the target coding tool comprises candidate reordering, MV refinement, generating predictor, or a combination thereof.

According to another method for the encoder side, RD (Rate-Distortion) costs associated with cost function setting candidates are evaluated, wherein each of the RD costs is evaluated for a target coding tool using one of the cost function setting candidates. The best cost function setting is selected among the cost function setting candidates to achieve a smallest RD cost among the RD costs. One or more syntaxes are signalled in a bitstream to indicate the best cost function setting. In one embodiment, the RD costs are evaluated only for a part of the cost function setting candidates.
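The encoder-side selection can be sketched in Python as a minimum search over the candidate settings. The function name, the `rd_cost` callable and the tuple representation of a setting are hypothetical; how the RD cost of a setting is actually measured for a coding tool is outside the scope of this sketch. The optional `subset` argument models the embodiment in which only part of the candidates is evaluated.

```python
def select_best_setting(candidates, rd_cost, subset=None):
    """Pick the cost function setting with the smallest RD cost.

    candidates: list of settings, e.g. (cost_function_name, template_size)
    rd_cost: callable mapping a setting to its RD cost (hypothetical)
    subset: optional list of indices restricting which candidates are evaluated
    """
    evaluated = candidates if subset is None else [candidates[i] for i in subset]
    return min(evaluated, key=rd_cost)
```

The index of the returned setting within `candidates` is what an encoder would then convey through the one or more syntaxes in the bitstream.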

According to another method for the decoder side, one or more syntaxes in a bitstream are parsed to indicate best cost function setting. One or more templates for the current block are determined. A target process of a target coding tool is applied to the current block using the best cost function setting over said one or more templates to generate processed data. The processed data is then provided.

In one embodiment, the best cost function setting comprises cost function, weighting coefficient set, template size, or a combination thereof. In one embodiment, said one or more syntaxes comprise four syntaxes to indicate best cost function, best template size, best distance weight, and best spatial weight. In one embodiment, said one or more syntaxes comprise two syntaxes to indicate best distance weight, and best spatial weight.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.

Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.

Fig. 2 illustrates an example of template matching, where the template matching is performed on a search area around an initial MV.

Fig. 3 illustrates an example of extended templates, wherein both the top and left templates are extended.

Fig. 4 illustrates an example of extended templates similar to Fig. 3, however the extended templates also include the top-left neighbouring samples.

Fig. 5A-Fig. 5C illustrate examples of subblock-based template matching, where one set of weights with the centroid corresponding to the subblock position is used for calculating the TM cost.

Fig. 6 illustrates a flowchart of an exemplary video coding system that uses adaptive cost function setting according to an embodiment of the present invention.

Fig. 7 illustrates a flowchart of an exemplary video decoding system that parses one or more syntaxes to determine the best cost function setting according to an embodiment of the present invention.

Fig. 8 illustrates a flowchart of an exemplary video encoding system that determines the best cost function setting and signals one or more syntaxes for the best cost function setting according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. References throughout this specification to “one embodiment, ” “an embodiment, ” or similar language mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures, or operations are not shown or described in detail to avoid obscuring aspects of the invention. The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of apparatus and methods that are consistent with the invention as claimed herein.

In order to improve coding efficiency, several embodiments of using template matching for MV refinement or candidate reordering associated with some coding tools are disclosed.

Method 1:

The coding tools that use template matching for MV refinement, candidate reordering or other purposes are listed in the following table.

Table 2. Coding Tools in VTM that are Involved with Template Matching

In this disclosure, a cost function calculation for template matching cost different from the current design of VTM is proposed. The proposed embodiments are summarized in the following table, where each row in the following table represents an embodiment.

Table 3. Proposed embodiments that use cost function or template size different from the implementation of VTM

Table 3 shows part of our proposed embodiments. To summarize all of our proposed embodiments, we propose to unify the cost function in TM which is used in candidate reordering for many coding tools including:
● ARMC TM for Affine merge, Regular merge, CIIP TM merge, TM merge and BM merge
● TM based BCW index derivation
● Affine MMVD and MMVD
● MVD sign prediction for AMVP, Affine AMVP and SMVD
● IBC CIIP and IBC regular merge

Besides, we also propose to unify the cost function in TM which is used in MV refinement for many coding tools, including:
● AMVP merge MV refinement
● CIIP TM merge, TM merge mode, IBC TM merge
● GPM (TM MV candidate)

Last but not least, we also propose to unify the cost function in TM which is used to generate predictor for intraTMP.

The cost function can be SAD, weighted SAD, SSD, weighted SSD, SATD or weighted SATD. The template size can be 1, 2 or 4. The template region to be used can be top-only, left-only, both top and left, or L-shape. The difference between the L-shape and “both top and left” is that the L-shape also uses the top-left neighbouring pixels as part of the template, while “both top and left” does not.

If the cost function is weighted SAD, weighted SSD or weighted SATD, then the following weighting coefficients are used, which are the same as the implementation of VTM.
TM_DISTANCE_WEIGHT[2][4] = { {0, 1, 2, 3}, {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT[2][4] = { {2, 2, 2, 2}, {0, 1, 1, 2} }

The current design uses PU size to determine which set of weighting coefficients is used. If width and height of PU are both larger than 8, the second set of coefficients is used. Otherwise, the first set of coefficients is used.
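The size-based selection between the two coefficient sets can be sketched in Python as follows. The function name is hypothetical, and the sketch covers only which set is chosen; how the distance and spatial coefficients are applied to individual template samples is not specified in this passage and is therefore not modelled.

```python
# Coefficient sets as listed above (index 0: first set, index 1: second set).
TM_DISTANCE_WEIGHT = ((0, 1, 2, 3), (1, 2, 3, 3))
TM_SPATIAL_WEIGHT = ((2, 2, 2, 2), (0, 1, 1, 2))

def select_weight_sets(pu_width, pu_height):
    """Return (distance_weights, spatial_weights) for the given PU size.

    The second set is used when both width and height are larger than 8;
    otherwise the first set is used.
    """
    idx = 1 if (pu_width > 8 and pu_height > 8) else 0
    return TM_DISTANCE_WEIGHT[idx], TM_SPATIAL_WEIGHT[idx]
```

A 16x16 PU therefore selects `(1, 2, 3, 3)` and `(0, 1, 1, 2)`, while an 8x16 PU falls back to `(0, 1, 2, 3)` and `(2, 2, 2, 2)`.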

For other coding tools, such as CCLM, CCCM, GLM, TIMD, DIMD, or LIC, we also propose to use TM for candidate reordering. The cost function can be SAD, weighted SAD, SSD, weighted SSD, SATD or weighted SATD. The template size can include 1, 2 and 4.

Method 2:

For all embodiments that are listed in Table 3, the cost functions are weighted SAD, weighted SSD or weighted SATD. In one embodiment, we propose to use different cost functions or different weighting coefficients according to the size of CU/PU or aspect ratio of CU/PU. Besides, the number of weighting coefficients is not constrained to the above-mentioned sets.

If the width of CU/PU is greater than or equal to twice the height of CU/PU, different weighting coefficients for the top template and the left template are used according to one embodiment. For example, the weighting coefficients for the top template are greater than weighting coefficients for the left template. For another example, the spatial weighting coefficient set for the top template contains more elements than the spatial weighting coefficient set for the left template. Namely, we divide the top template into more sub-regions in the horizontal direction. Following are some examples:
tmWeightIdx = 1 if both width and height of CU/PU are greater than or equal to 8.
tmWeightIdx = 0 for other cases.

Embodiment 1:
Distance weight for top template = { {1, 2, 3, 4} , {2, 3, 4, 4} }
Distance weight for left template = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Spatial weight for top template = { {3, 3, 3, 3} , {1, 2, 2, 3} }
Spatial weight for left template = { {2, 2, 2, 2} , {0, 1, 1, 2} }

Embodiment 2:
Only use the top template for TM cost calculation.
Distance weight for the top template = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Spatial weight for the top template is removed. Namely, equal weight for each spatial position.

Embodiment 3:
Distance weight for the top template and the left template = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Spatial weight for the top template = { {2, 2, 2, 2} , {0, 0, 0, 1, 1, 2, 2, 2} }
Spatial weight for the left template = { {2, 2, 2, 2} , {0, 1, 1, 2} }

Embodiment 4:
Distance weight for the top template and the left template = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Spatial weight for the top template = { {2, 2, 2, 2} , {0, 1, 1, 1, 1, 1, 1, 2} }
Spatial weight for the left template = { {2, 2, 2, 2} , {0, 1, 1, 2} }

Embodiment 5:
Distance weight for the top template and the left template = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Spatial weight for the top template = { {2, 2, 2, 2} , {0, 0, 1, 1, 1, 2, 2, 2} }
Spatial weight for the left template = { {2, 2, 2, 2} , {0, 1, 1, 2} }

Similarly, if the height of CU/PU is greater than or equal to twice the width of CU/PU, we use following weighting coefficients to calculate TM cost:

Embodiment 1:
Distance weight for the top template = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Distance weight for the left template = { {1, 2, 3, 4} , {2, 3, 4, 4} }
Spatial weight for the top template = { {2, 2, 2, 2} , {0, 1, 1, 2} }
Spatial weight for the left template = { {3, 3, 3, 3} , {1, 2, 2, 3} }

Embodiment 2:
Only use the left template for TM cost calculation.
Distance weight for the left template = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Spatial weight for the left template is removed. Namely, equal weight for each spatial position.

Embodiment 3:
Distance weight for the top template and the left template = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Spatial weight for the top template = { {2, 2, 2, 2} , {0, 1, 1, 2} }
Spatial weight for the left template = { {2, 2, 2, 2} , {0, 0, 0, 1, 1, 2, 2, 2} }

Embodiment 4:
Distance weight for the top template and the left template = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Spatial weight for the top template = { {2, 2, 2, 2} , {0, 1, 1, 2} }
Spatial weight for the left template = { {2, 2, 2, 2} , {0, 1, 1, 1, 1, 1, 1, 2} }

Embodiment 5:
Distance weight for the top template and the left template = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Spatial weight for the top template = { {2, 2, 2, 2} , {0, 1, 1, 2} }
Spatial weight for the left template = { {2, 2, 2, 2} , {0, 0, 1, 1, 1, 2, 2, 2} }
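
As a rough illustration of the aspect-ratio rule above, the following Python sketch returns the Embodiment 1 weight sets for wide and tall blocks. The names `select_tm_weights`, `BASE_*` and `BOOSTED_*`, the dictionary layout, and the fallback for blocks that are neither wide nor tall are assumptions made for this sketch; only the Embodiment 1 values and the tmWeightIdx rule are taken from the text.

```python
# Weight tables indexed by tmWeightIdx (0 or 1), copied from Embodiment 1.
BASE_DISTANCE = [[0, 1, 2, 3], [1, 2, 3, 3]]
BASE_SPATIAL = [[2, 2, 2, 2], [0, 1, 1, 2]]
BOOSTED_DISTANCE = [[1, 2, 3, 4], [2, 3, 4, 4]]
BOOSTED_SPATIAL = [[3, 3, 3, 3], [1, 2, 2, 3]]


def tm_weight_idx(width, height):
    """tmWeightIdx = 1 if both dimensions are at least 8, else 0."""
    return 1 if width >= 8 and height >= 8 else 0


def select_tm_weights(width, height):
    """Return (distance, spatial) weight sets for the top and left templates."""
    idx = tm_weight_idx(width, height)
    base = (BASE_DISTANCE[idx], BASE_SPATIAL[idx])
    boosted = (BOOSTED_DISTANCE[idx], BOOSTED_SPATIAL[idx])
    if width >= 2 * height:
        # Wide block: the top template gets the larger weights.
        return {"top": boosted, "left": base}
    if height >= 2 * width:
        # Tall block: the left template gets the larger weights.
        return {"top": base, "left": boosted}
    # Assumption: other blocks use the base tables for both templates.
    return {"top": base, "left": base}
```

For a 32x8 block, for instance, the sketch selects tmWeightIdx = 1 and gives the top template the larger distance and spatial weights.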

Method 3:

This method is a variant of Method 2. We propose to add more weighting coefficient sets to TM_DISATNCE_WEIGHT and TM_SPATIAL_WEIGHT to handle different CU/PU sizes.

Embodiment 1:
tmWeightIdx equals 2 if both width and height of CU/PU are greater than or equal to 16.
tmWeightIdx equals 1 if both width and height of CU/PU are greater than or equal to 8 but
smaller than 16.
tmWeightIdx equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 2, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 1, 1, 1, 2, 2, 2} }

Embodiment 2:
tmWeightIdx equals 2 if both width and height of CU/PU are greater than or equal to 16.
tmWeightIdx equals 1 if both width and height of CU/PU are greater than or equal to 8 but
smaller than 16.
tmWeightIdx equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 3, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 1, 1, 1, 2, 2, 2} }

Embodiment 3:
tmWeightIdx equals 2 if both width and height of CU/PU are greater than or equal to 16.
tmWeightIdx equals 1 if both width and height of CU/PU are greater than or equal to 8 but
smaller than 16.
tmWeightIdx equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 3, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 1, 1, 1, 1, 2, 2} }

Embodiment 4:
tmWeightIdx equals 2 if both width and height of CU/PU are greater than or equal to 16.
tmWeightIdx equals 1 if both width and height of CU/PU are greater than or equal to 8 but
smaller than 16.
tmWeightIdx equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 2, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 1, 1, 1, 1, 2, 2} }

Embodiment 5:
tmWeightIdx equals 2 if both width and height of CU/PU are greater than or equal to 16.
tmWeightIdx equals 1 if both width and height of CU/PU are greater than or equal to 8 but
smaller than 16.
tmWeightIdx equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 3, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 0, 1, 1, 2, 2, 2} }

Embodiment 6:
tmWeightIdx equals 2 if both width and height of CU/PU are greater than or equal to 16.
tmWeightIdx equals 1 if both width and height of CU/PU are greater than or equal to 8 but
smaller than 16.
tmWeightIdx equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 2, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 0, 1, 1, 2, 2, 2} }

Embodiment 7:

We propose to use different tmWeightIdx for the top template and the left template.
tmWeightIdx for the top template equals 2 if the width of CU/PU is greater than or equal to 16.
tmWeightIdx for the top template equals 1 if the width of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the top template equals 0 for other cases.
tmWeightIdx for the left template equals 2 if the height of CU/PU is greater than or equal to 16.
tmWeightIdx for the left template equals 1 if the height of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the left template equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 2, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 1, 1, 1, 2, 2, 2} }

Embodiment 8:

We propose to use different tmWeightIdx for the top template and the left template.
tmWeightIdx for the top template equals 2 if the width of CU/PU is greater than or equal to 16.
tmWeightIdx for the top template equals 1 if the width of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the top template equals 0 for other cases.
tmWeightIdx for the left template equals 2 if the height of CU/PU is greater than or equal to 16.
tmWeightIdx for the left template equals 1 if the height of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the left template equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 3, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 1, 1, 1, 2, 2, 2} }

Embodiment 9:

We propose to use different tmWeightIdx for the top template and the left template.
tmWeightIdx for the top template equals 2 if the width of CU/PU is greater than or equal to 16.
tmWeightIdx for the top template equals 1 if the width of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the top template equals 0 for other cases.
tmWeightIdx for the left template equals 2 if the height of CU/PU is greater than or equal to 16.
tmWeightIdx for the left template equals 1 if the height of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the left template equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 2, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 0, 1, 1, 2, 2, 2} }

Embodiment 10:

We propose to use different tmWeightIdx for the top template and the left template.
tmWeightIdx for the top template equals 2 if the width of CU/PU is greater than or equal to 16.
tmWeightIdx for the top template equals 1 if the width of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the top template equals 0 for other cases.
tmWeightIdx for the left template equals 2 if the height of CU/PU is greater than or equal to 16.
tmWeightIdx for the left template equals 1 if the height of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the left template equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 3, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 0, 1, 1, 2, 2, 2} }

Embodiment 11:

We propose to use different tmWeightIdx for the top template and the left template.
tmWeightIdx for the top template equals 2 if the width of CU/PU is greater than or equal to 16.
tmWeightIdx for the top template equals 1 if the width of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the top template equals 0 for other cases.
tmWeightIdx for the left template equals 2 if the height of CU/PU is greater than or equal to 16.
tmWeightIdx for the left template equals 1 if the height of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the left template equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 2, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 1, 1, 1, 1, 2, 2} }

Embodiment 12:

We propose to use different tmWeightIdx for the top template and the left template.
tmWeightIdx for the top template equals 2 if the width of CU/PU is greater than or equal to 16.
tmWeightIdx for the top template equals 1 if the width of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the top template equals 0 for other cases.
tmWeightIdx for the left template equals 2 if the height of CU/PU is greater than or equal to 16.
tmWeightIdx for the left template equals 1 if the height of CU/PU is greater than or equal to 8 but smaller than 16.
tmWeightIdx for the left template equals 0 for other cases.
TM_DISATNCE_WEIGHT [3] [4] = { {0, 1, 2, 3} , {1, 2, 3, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [3] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} , {0, 0, 1, 1, 1, 1, 2, 2} }
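
The three-level index used in the embodiments of Method 3 can be sketched in Python as follows. The function names are hypothetical. Two points are interpretations rather than statements of the text: for the joint index, the smaller block dimension is assumed to decide (so a 32x8 block maps to index 1, whereas a strictly literal reading of the rule could map it to 0), and for the per-template index, the left template is assumed to follow the block height.

```python
def tm_weight_idx3(size):
    """Map one block dimension to a three-level index (0, 1 or 2)."""
    if size >= 16:
        return 2
    if size >= 8:
        return 1
    return 0


def joint_idx(width, height):
    """Embodiments 1-6: one index shared by both templates.

    Assumption: the smaller dimension decides the index.
    """
    return tm_weight_idx3(min(width, height))


def per_template_idx(width, height):
    """Embodiments 7-12: separate indices for the top and left templates.

    Assumption: the top template follows the width and the left template
    follows the height.
    """
    return {"top": tm_weight_idx3(width), "left": tm_weight_idx3(height)}
```

With separate indices, a 32x8 block would use the third weight set for its long top template and the second weight set for its short left template.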

Method 4:

We propose to calculate the RD cost for each cost function setting, including the cost function, the weighting coefficient set, and/or the template size, and to signal the best setting to the decoder. This method can be applied to one or more coding tools listed in Table 2. The cost function can be weighted SAD, weighted SSD or weighted SATD. The template size can be 1, 2 or 4. The distance weight can be {2, 2, 2, 2}, {0, 1, 2, 3}, {1, 2, 2, 3} or {1, 2, 3, 3}. The spatial weight can be {2, 2, 2, 2}, {0, 1, 1, 2} or {0, 1, 2, 2}. Therefore, there are 3x3x4x3 = 108 kinds of cost function settings. We propose to signal four flags to the decoder according to one embodiment. One flag is signalled to indicate the best cost function. One flag is signalled to indicate the best template size. One flag is signalled to indicate the best distance weight. One flag is signalled to indicate the best spatial weight.

This search can be conducted at the sequence level, picture level, slice level, CTU level or CU level. For example, we can conduct this search at CTU level for MV refinement of TM merge mode. There are four flags signalled to the decoder. One flag is signalled to indicate the best cost function. One flag is signalled to indicate the best template size. One flag is signalled to indicate the best distance weight. One flag is signalled to indicate the best spatial weight.

Instead of searching all elements (or candidates) of the cost function setting, another embodiment searches only some elements of the cost function setting. For example, the template size is fixed to 4 and we search for the best cost function setting: (cost function, distance weight, spatial weight). After the search is done, we signal three flags to the decoder according to one embodiment. One is used to indicate the cost function. One is used to indicate the distance weight. One is used to indicate the spatial weight.

For another example, the template size is fixed to 4 and the cost function is fixed as weighted SAD, and we search for the best cost function setting: (distance weight, spatial weight). After the search is done, we signal two flags to the decoder. One is used to indicate the distance weight. One is used to indicate the spatial weight.

Take candidate reordering of MMVD as an example. We can fix the distance weight and the spatial weight and search for the best cost function setting: (cost function, template size).

The cost function can be weighted SAD, weighted SSD, weighted SATD, SAD, SSD, or SATD. The template size can be 1, 2 or 4. We search the best cost function settings among these 6x3=18 candidates. We signal two flags to the decoder according to one embodiment. One is used to indicate the cost function. The other one is used to indicate the template size.
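
The exhaustive search of Method 4 can be sketched in Python as below. This is only an illustration: `all_settings`, `best_setting` and the `rd_cost` callable are hypothetical names, and how the encoder actually measures the RD cost of a setting is not modelled here. The four returned indices stand in for the four signalled flags.

```python
from itertools import product

COST_FUNCTIONS = ["weighted SAD", "weighted SSD", "weighted SATD"]
TEMPLATE_SIZES = [1, 2, 4]
DISTANCE_WEIGHTS = [(2, 2, 2, 2), (0, 1, 2, 3), (1, 2, 2, 3), (1, 2, 3, 3)]
SPATIAL_WEIGHTS = [(2, 2, 2, 2), (0, 1, 1, 2), (0, 1, 2, 2)]


def all_settings():
    """Enumerate every (cost function, template size, distance, spatial) tuple."""
    return list(product(COST_FUNCTIONS, TEMPLATE_SIZES,
                        DISTANCE_WEIGHTS, SPATIAL_WEIGHTS))


def best_setting(settings, rd_cost):
    """Return (indices to signal, setting) for the smallest RD cost.

    rd_cost is a hypothetical callable supplied by the encoder; it maps a
    setting tuple to a number.
    """
    best = min(settings, key=rd_cost)
    cost_fn, size, dist, spat = best
    indices = (COST_FUNCTIONS.index(cost_fn), TEMPLATE_SIZES.index(size),
               DISTANCE_WEIGHTS.index(dist), SPATIAL_WEIGHTS.index(spat))
    return indices, best
```

Calling `len(all_settings())` reproduces the 3x3x4x3 = 108 count given above.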

Method 5:

This proposed method is a variant of Method 4. Instead of exhaustively searching for the best cost function setting among the 108 candidate settings, we propose to hierarchically search for a sub-optimal cost function setting.
Step 1: The template size is fixed as one of the elements in the set {1, 2, 4}. The cost function can be weighted SAD, weighted SSD, or weighted SATD. The distance weight can be {2, 2, 2, 2}, {0, 1, 2, 3}, {1, 2, 2, 3} or {1, 2, 3, 3}. The spatial weight can be {2, 2, 2, 2}, {0, 1, 1, 2} or {0, 1, 2, 2}. Therefore, there are 3x4x3 = 36 candidates. We search for the best cost function setting among these 36 candidates.
Step 2: Inherit the search result of step 1 and search for the best template size. The template size can be 1, 2 or 4. We search for the best cost function setting among these 3 candidates.

This search can be conducted at the slice level, CTU level or CU level.
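
The two-step search above can be sketched in Python as follows. The function `hierarchical_search`, the `rd_cost` callable and the default fixed template size of 4 are assumptions for illustration. The sketch also counts RD evaluations, which shows the saving over the exhaustive search: 36 + 3 = 39 evaluations.

```python
from itertools import product

COST_FUNCTIONS = ["weighted SAD", "weighted SSD", "weighted SATD"]
TEMPLATE_SIZES = [1, 2, 4]
DISTANCE_WEIGHTS = [(2, 2, 2, 2), (0, 1, 2, 3), (1, 2, 2, 3), (1, 2, 3, 3)]
SPATIAL_WEIGHTS = [(2, 2, 2, 2), (0, 1, 1, 2), (0, 1, 2, 2)]


def hierarchical_search(rd_cost, fixed_size=4):
    """Two-step search; returns (selected setting, number of RD evaluations).

    rd_cost is a hypothetical callable mapping a setting tuple
    (cost function, template size, distance weight, spatial weight) to a cost.
    """
    evaluations = 0

    def counted(setting):
        nonlocal evaluations
        evaluations += 1
        return rd_cost(setting)

    # Step 1: template size fixed, search the other three elements (36 candidates).
    step1 = [(c, fixed_size, d, s)
             for c, d, s in product(COST_FUNCTIONS, DISTANCE_WEIGHTS, SPATIAL_WEIGHTS)]
    c, _, d, s = min(step1, key=counted)

    # Step 2: keep the step-1 result and search the template size (3 candidates).
    step2 = [(c, size, d, s) for size in TEMPLATE_SIZES]
    best = min(step2, key=counted)
    return best, evaluations
```

The result is only guaranteed to match the exhaustive optimum when the elements of the setting do not interact, which is why the text describes the outcome as sub-optimal.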

Method 6:

This proposed method is a variant of Method 5. We propose to perform the hierarchical search in another way.
Step 1: The template size is fixed as one of the elements in the set {1, 2, 4}. The cost criterion is fixed as one of the functions in the set {weighted SAD, weighted SSD, weighted SATD}. The distance weight can be {2, 2, 2, 2}, {0, 1, 2, 3}, {1, 2, 2, 3} or {1, 2, 3, 3}. The spatial weight can be {2, 2, 2, 2}, {0, 1, 1, 2} or {2, 1, 1, 0}. Therefore, there are 4x3 = 12 candidates. We search for the best cost function setting among these 12 candidates.
Step 2: Inherit the search result of step 1 and search for the best template size and the best cost criterion. The template size can be 1, 2 or 4. The cost criterion can be weighted SAD, weighted SSD or weighted SATD. We search for the best cost function setting among these 3x3 = 9 candidates.

Method 7:

This proposed method is a variant of Method 6. We propose to fix the distance weight and the spatial weight in step 1 and search for the best setting of (template size, cost criterion). In step 2, we inherit the search result of step 1 and search for the best setting of (distance weight set, spatial weight set).

Method 8:

This method can be applied to each coding tool listed in Table 2. We propose to select different cost function settings, including the cost function, the weighting coefficients and the template size, according to the template region used (e.g. top-only, left-only, or top+left).

If both the top template and the left template are used, we use both the distance weight and the spatial weight, where tmWeightIdx = (pu.width >= 8 and pu.height >= 8) ? 1 : 0.

Embodiments for the distance weight are listed as follows:
Embodiment 1: TM_DISATNCE_WEIGHT [2] [4] = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Embodiment 2: TM_DISATNCE_WEIGHT [2] [4] = { {0, 1, 2, 3} , {1, 2, 2, 3} }
Embodiment 3: TM_DISATNCE_WEIGHT [2] [4] = { {0, 1, 2, 3} , {1, 1, 2, 3} }
Embodiment 4: TM_DISATNCE_WEIGHT [2] [4] = { {0, 1, 2, 3} , {1, 1, 1, 2} }

Embodiments for the spatial weight are listed as follows:
Embodiment 1: TM_SPATIAL_WEIGHT [2] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} }
Embodiment 2: TM_SPATIAL_WEIGHT [2] [4] = { {2, 2, 2, 2} , {0, 1, 2, 2} }

If only the top template is used, we only use the distance weight for TM cost calculation and set template size as 2 or 4.

There are several embodiments for the distance weight as follows.
tmWeightIdx = pu.width >= 8 ? 1 : 0
Embodiment 1: TM_DISATNCE_WEIGHT [2] [4] = { {0, 1, 2, 3} , {1, 2, 3, 3} }
Embodiment 2: TM_DISATNCE_WEIGHT [2] [4] = { {0, 1, 2, 3} , {1, 2, 2, 3} }
Embodiment 3: TM_DISATNCE_WEIGHT [2] [4] = { {0, 1, 2, 3} , {1, 1, 2, 3} }
Embodiment 4: TM_DISATNCE_WEIGHT [2] [4] = { {0, 1, 2, 3} , {1, 1, 1, 2} }

Similarly, if only the left template is used, we only use the distance weight for TM cost calculation and set template size as 2 or 4.
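
The region-dependent selection of Method 8 can be summarised with the Python sketch below. The function name, the region labels and the returned dictionary are hypothetical. The left-only index rule (based on the block height) is an assumption that mirrors the top-only rule, since the text only says "similarly".

```python
def method8_setting(region, width, height):
    """Return which weights apply and the tmWeightIdx for a template region.

    region is "top+left", "top" or "left" (labels chosen for this sketch).
    """
    if region == "top+left":
        idx = 1 if width >= 8 and height >= 8 else 0
        # The text does not restrict the template size in this case.
        return {"use_distance": True, "use_spatial": True,
                "tm_weight_idx": idx, "template_sizes": None}
    if region == "top":
        idx = 1 if width >= 8 else 0
        return {"use_distance": True, "use_spatial": False,
                "tm_weight_idx": idx, "template_sizes": (2, 4)}
    if region == "left":
        # Assumption: mirrors the top-only rule using the block height.
        idx = 1 if height >= 8 else 0
        return {"use_distance": True, "use_spatial": False,
                "tm_weight_idx": idx, "template_sizes": (2, 4)}
    raise ValueError("unknown template region: " + repr(region))
```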

Method 9:

We propose to extend the template area for template matching. An example of the extended template is shown in Fig. 3, where an extended above template 320 and an extended left template for the current CU 310 are shown.

In this proposed method, we need to use bottom-left and top-right neighbouring pixels.

Embodiment 1:

The width of the top template equals twice the width of CU/PU. Similarly, the height of the left template equals twice the height of CU/PU. The weighting coefficients are the same as those defined in the VTM software.
TM_DISATNCE_WEIGHT [2] [4] = { {0, 1, 2, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [2] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} }

Embodiment 2: L-shape template is used as shown in Fig. 4. Thus, the templates are similar to those in Fig. 3, however, additional top-left neighbouring pixels 410 are included.

The width of the top template equals twice the width of CU/PU plus the template size. The height of the left template equals twice the height of CU/PU. The weighting coefficients are the same as those defined in the VTM software.
TM_DISATNCE_WEIGHT [2] [4] = { {0, 1, 2, 3} , {1, 2, 3, 3} }
TM_SPATIAL_WEIGHT [2] [4] = { {2, 2, 2, 2} , {0, 1, 1, 2} }
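
The template dimensions of the two embodiments of Method 9 can be computed as in the following Python sketch. The function name and the `l_shape` parameter are hypothetical, and the sketch assumes that the thickness of each template (the height of the top template and the width of the left template) equals the template size.

```python
def extended_template_dims(width, height, template_size, l_shape=False):
    """Return ((top_w, top_h), (left_w, left_h)) in samples.

    l_shape=False follows Embodiment 1; l_shape=True follows Embodiment 2,
    where the top template also covers the top-left neighbouring samples.
    Assumption: each template is template_size samples thick.
    """
    top_w = 2 * width + (template_size if l_shape else 0)
    top = (top_w, template_size)
    left = (template_size, 2 * height)
    return top, left
```

For a 16x8 block with a template size of 4, the sketch gives a 32x4 top template and a 4x16 left template in Embodiment 1, and a 36x4 top template in Embodiment 2.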

Method 10: Subblock-Based Template Matching

This method can be applied to all coding tools listed in Table 3 and CCLM, CCCM, GLM, TIMD, DIMD, and LIC. This method can be used to refine MV of each subblock or perform candidate reordering for each subblock.

For each subblock, one set of TM weights with the centroid corresponding to the subblock position can be used to calculate the TM cost. An example is shown in Figs. 5A-5C. Fig. 5A corresponds to the case without using subblock-based TM, where the centroid is for the whole current block. Fig. 5B corresponds to the case for the top-left subblock and Fig. 5C corresponds to the case for the top-right subblock.

The weighting coefficient is indexed with respect to the (x, y) coordinate.

We define four weighting coefficient sets for the top-left subblock, top-right subblock, bottom-left subblock and bottom-right subblock, respectively. The idea behind designing the weighting coefficients for each subblock is to make the centre of mass of the template as close to the centroid of the subblock as possible.

Examples:
Weight matrix for top template of top-left subblock [x] [y] = [ [2, 2, 1, 0], [2, 2, 1, 1], [2, 3, 2, 1], [3, 3, 2, 1] ]
Weight matrix for left template of top-left subblock [x] [y] = [ [2, 2, 2, 3], [2, 2, 3, 3], [1, 1, 2, 2], [0, 1, 1, 1] ]
Weight matrix for top template of top-right subblock [x] [y] = [ [0, 1, 2, 2], [1, 1, 2, 2], [1, 2, 3, 2], [1, 2, 3, 3] ]
Weight matrix for left template of top-right subblock [x] [y] = [ [2, 2, 2, 3], [2, 2, 3, 3], [1, 1, 2, 2], [0, 1, 1, 1] ]
Weight matrix for top template of bottom-left subblock [x] [y] = [ [2, 2, 1, 0], [2, 2, 1, 1], [2, 3, 2, 1], [3, 3, 2, 1] ]
Weight matrix for left template of bottom-left subblock [x] [y] = [ [0, 1, 1, 1], [1, 1, 2, 2], [2, 2, 3, 3], [2, 2, 2, 3] ]
Weight matrix for top template of bottom-right subblock [x] [y] = [ [0, 1, 2, 2], [1, 1, 2, 2], [1, 2, 3, 2], [1, 2, 3, 3] ]
Weight matrix for left template of bottom-right subblock [x] [y] = [ [0, 1, 1, 1], [1, 1, 2, 2], [2, 2, 3, 3], [2, 2, 2, 3] ]
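
The centre-of-mass idea can be checked numerically with the short Python sketch below, which uses the two top-template matrices listed above. The helper `centre_of_mass` is hypothetical, and the sketch reports the result per list index (outer, inner) without assuming which index is horizontal. If the inner index is taken to run along the width of the top template, the weights for the top-left subblock lean towards the left and those for the top-right subblock lean towards the right by the same amount.

```python
TOP_TEMPLATE_TOP_LEFT = [[2, 2, 1, 0], [2, 2, 1, 1], [2, 3, 2, 1], [3, 3, 2, 1]]
TOP_TEMPLATE_TOP_RIGHT = [[0, 1, 2, 2], [1, 1, 2, 2], [1, 2, 3, 2], [1, 2, 3, 3]]


def centre_of_mass(matrix):
    """Return the weighted mean (outer index, inner index) of a weight matrix."""
    total = sum(sum(row) for row in matrix)
    outer = sum(i * sum(row) for i, row in enumerate(matrix)) / total
    inner = sum(j * w for row in matrix for j, w in enumerate(row)) / total
    return outer, inner


outer_tl, inner_tl = centre_of_mass(TOP_TEMPLATE_TOP_LEFT)
outer_tr, inner_tr = centre_of_mass(TOP_TEMPLATE_TOP_RIGHT)
# The geometric centre of a 4x4 grid is at index 1.5 on both axes.
print(inner_tl, inner_tr)
```

The two inner-index values sum to 3, which reflects that the top-right matrix is the mirror image of the top-left matrix.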

The methods of adaptive cost function setting associated with template matching as described above can be implemented at an encoder side or a decoder side. For example, any of the proposed methods can be implemented in an Intra/Inter coding module (e.g. Intra Pred. 150/MC 152 in Fig. 1B) in a decoder or an Intra/Inter coding module in an encoder (e.g. Intra Pred. 110/Inter Pred. 112 in Fig. 1A). Any of the proposed methods can also be implemented as a circuit coupled to the intra/inter coding module at the decoder or the encoder. However, the decoder or encoder may also use an additional processing unit to implement the required template matching processing. While the Intra/Inter coding modules (e.g. units 110/112 in Fig. 1A and units 150/152 in Fig. 1B) are shown as individual processing units, they may correspond to executable software or firmware code stored on a medium, such as a hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array)).

Fig. 6 illustrates a flowchart of an exemplary video coding system that uses adaptive cost function setting according to an embodiment of the present invention. The steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented based on hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to the method, input data associated with a current block is received in step 610, wherein the input data comprises pixel data to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side. One or more templates for the current block are determined in step 620. A target cost function or a target set of weighting coefficients is selected from multiple cost functions or multiple sets of weighting coefficients according to size-related information associated with the current block, said one or more templates, or both in step 630. A target cost is determined based on said one or more templates using the target cost function or the target set of weighting coefficients selected in step 640. A target coding tool belonging to a coding tool group is applied to the current block in step 650, wherein a target process of the target coding tool is performed according to information comprising the target cost.
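
Steps 630 and 640 can be illustrated with the following Python sketch of a weighted SAD over a top template. It is a minimal sketch under stated assumptions, not the normative cost: the names `weighted_sad_top`, `target_cost`, `TM_DISTANCE` and `TM_SPATIAL` are hypothetical, each template row is assumed to be split into equal segments (one per spatial weight), and the weight of a sample is assumed, for illustration only, to be the product of its distance weight and its spatial weight.

```python
TM_DISTANCE = [[0, 1, 2, 3], [1, 2, 3, 3]]
TM_SPATIAL = [[2, 2, 2, 2], [0, 1, 1, 2]]


def weighted_sad_top(cur, ref, distance_w, spatial_w):
    """Weighted SAD over a top template given as rows of samples.

    cur and ref hold one row per distance weight, in the same order as
    distance_w. Assumption: sample weight = distance weight * spatial weight.
    """
    width = len(cur[0])
    seg = max(1, width // len(spatial_w))  # samples per spatial segment
    cost = 0
    for row_c, row_r, dw in zip(cur, ref, distance_w):
        for x, (c, r) in enumerate(zip(row_c, row_r)):
            sw = spatial_w[min(x // seg, len(spatial_w) - 1)]
            cost += dw * sw * abs(c - r)
    return cost


def target_cost(cur, ref, width, height):
    """Select the weight set from the block size, then compute the cost."""
    idx = 1 if width >= 8 and height >= 8 else 0          # step 630
    return weighted_sad_top(cur, ref, TM_DISTANCE[idx], TM_SPATIAL[idx])  # step 640
```

A target coding tool would then compare such costs across merge candidates or MV offsets, for example to reorder candidates or to refine an MV.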

Fig. 7 illustrates a flowchart of an exemplary video decoding system that parses one or more syntaxes to determine the best cost function setting according to an embodiment of the present invention. According to this method, input data associated with a current block is received in step 710, wherein the input data comprises coded data associated with the current block to be decoded. One or more syntaxes in a bitstream are parsed to indicate the best cost function setting in step 720. One or more templates for the current block are determined in step 730. A target process of a target coding tool is applied to the current block using the best cost function setting over said one or more templates to generate processed data in step 740. The processed data is then provided in step 750.

Fig. 8 illustrates a flowchart of an exemplary video encoding system that determines the best cost function setting and signals one or more syntaxes for the best cost function setting according to an embodiment of the present invention. Input data associated with a current block is received in step 810, wherein the input data comprises pixel data to be encoded. One or more templates for the current block are determined in step 820. RD (Rate-Distortion) costs associated with cost function setting candidates are evaluated in step 830, wherein each of the RD costs is evaluated for a target coding tool using one of the cost function setting candidates. The best cost function setting is selected among the cost function setting candidates to achieve a smallest RD cost among the RD costs in step 840. One or more syntaxes are signalled in a bitstream to indicate the best cost function setting in step 850.

The flowcharts shown are intended to illustrate an example of video coding according to the present invention. A person skilled in the art may modify each step, re-arrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In the disclosure, specific syntax and semantics have been used to illustrate examples to implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.

The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirement. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced without these specific details.

Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.

The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A method of video coding, the method comprising:

receiving input data associated with a current block, wherein the input data comprises pixel data to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side;

determining one or more templates for the current block;

selecting a target cost function or a target set of weighting coefficients from multiple cost functions or multiple sets of weighting coefficients according to size-related information associated with the current block, said one or more templates, or both;

deriving a target cost based on said one or more templates using the target cost function or the target set of weighting coefficients selected; and

applying a target coding tool belonging to a coding tool group to the current block, wherein a target process of the target coding tool is performed according to information comprising the target cost.
2. The method of Claim 1, wherein the size-related information comprises width, height, block size, aspect ratio, or a combination thereof.
3. The method of Claim 2, wherein if the width is greater than two times of the height or the height is more than two times of the width, a top template and a left template use different sets of weighting coefficients.
4. The method of Claim 1, wherein a first set of spatial weighting coefficients for a top template contains more elements than a second set of spatial weighting coefficients for a left template.
5. The method of Claim 1, wherein one or more additional sets of distance weighting coefficients and/or one or more additional sets of spatial weighting coefficients are used.
6. The method of Claim 1, wherein the multiple cost functions comprise SAD (Sum of Absolute Differences) , weighted SAD, SSD (Sum of Square Differences) , weighted SSD, SATD (Sum of Absolute Transformed Differences) , weighted SATD or a combination thereof.
7. The method of Claim 1, wherein the coding tool group comprises ARMC (Adaptive Reordering of Merge Candidates) TM (Template Matching) for affine merge, regular merge, CIIP merge, BM (Bilateral Matching) merge, or a combination thereof.
8. The method of Claim 1, wherein the coding tool group comprises MVD (Motion Vector Differences) sign prediction for AMVP (Advanced Motion Vector Prediction) , MMVD (Merge mode with MVD) , affine MMVD, TM based BCW (Bi-Prediction with CU-level Weight) index derivation, IBC (Intra-Block-Copy) CIIP (Combined Inter and Intra Prediction) , CIIP TM merge, IBC TM merge, TM merge mode, GPM, intraTMP (Intra Template Matching Prediction) , AMVP merge MV refinement, or a combination thereof.
9. The method of Claim 1, wherein the target process of the target coding tool comprises candidate reordering, MV refinement, generating predictor, or a combination thereof.
10. An apparatus for video coding, the apparatus comprising one or more electronics or processors arranged to:

receive input data associated with a current block, wherein the input data comprises pixel data to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side;

determine one or more templates for the current block;

select a target cost function or a target set of weighting coefficients from multiple cost functions or multiple sets of weighting coefficients according to size-related information associated with the current block, said one or more templates, or both;

derive a target cost based on said one or more templates using the target cost function or the target set of weighting coefficients selected; and

apply a target coding tool belonging to a coding tool group to the current block, wherein a target process of the target coding tool is performed according to information comprising the target cost.
11. A method of video decoding, the method comprising:

receiving input data associated with a current block, wherein the input data comprises coded data associated with the current block to be decoded;

parsing one or more syntaxes in a bitstream to indicate best cost function setting;

determining one or more templates for the current block;

applying a target process of a target coding tool to the current block using the best cost function setting over said one or more templates to generate processed data; and

providing the processed data.
12. The method of Claim 11, wherein the best cost function setting comprises cost function, weighting coefficient set, template size, or a combination thereof.
13. The method of Claim 11, wherein said one or more syntaxes comprise four syntaxes to indicate best cost function, best template size, best distance weight, and best spatial weight.
14. The method of Claim 11, wherein said one or more syntaxes comprise two syntaxes to indicate best distance weight, and best spatial weight.
15. The method of Claim 11, wherein the target coding tool belongs to a coding tool group comprising ARMC (Adaptive Reordering of Merge Candidates) TM (Template Matching) for affine merge, regular merge, CIIP merge, BM (Bilateral Matching) merge, or a combination thereof.
16. The method of Claim 11, wherein the target coding tool belongs to a coding tool group comprising MVD (Motion Vector Differences) sign prediction for AMVP (Advanced Motion Vector Prediction) , MMVD (Merge mode with MVD) , affine MMVD, TM based BCW (Bi-Prediction with CU-level Weight) index derivation, IBC (Intra-Block-Copy) CIIP (Combined Inter and Intra Prediction) , CIIP TM merge, IBC TM merge, TM merge mode, GPM, intraTMP (Intra Template Matching Prediction) , AMVP merge MV refinement, or a combination thereof.
17. A method of video encoding, the method comprising:

receiving input data associated with a current block, wherein the input data comprises pixel data to be encoded;

determining one or more templates for the current block;

evaluating RD (Rate-Distortion) costs associated with cost function setting candidates, wherein each of the RD costs is evaluated for a target coding tool using one of the cost function setting candidates;

selecting best cost function setting among the cost function setting candidates to achieve a smallest RD cost among the RD costs; and

signalling one or more syntaxes in a bitstream to indicate the best cost function setting.
18. The method of Claim 17, wherein the RD costs are evaluated only for a part of the cost function setting candidates.