WO2026008042A1 - Method and apparatus of fixed filter set selection of adaptive loop filter in video coding - Google Patents
- Publication number
- WO2026008042A1 (PCT/CN2025/106997)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- filter
- alf
- fixed
- current block
- fixed filter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
A method and apparatus for fixed filter selection. According to one method, a target ALF (Adaptive Loop Filter) fixed filter set is selected implicitly from a group of fixed filter sets according to one or more conditions comprising CU/CTU/slice/frame/sequence size, colour component, slice/frame type, CU coding modes, statistics associated with reconstruction samples of the current block, or a combination thereof. According to another method, a filter strength is applied to the filtered output to generate adjusted filtered output. The adjusted filtered output is provided. According to yet another method, one or more fixed-filter-selection flags are signalled or parsed in APS (Adaptation Parameter Set), PPS (Picture Parameter Set), or SPS (Sequence Parameter Set).
Description
CROSS REFERENCE TO RELATED APPLICATIONS
The present invention is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/667,859, filed on July 5, 2024 and U.S. Provisional Patent Application No. 63/669,320, filed on July 10, 2024. The U.S. Provisional Patent Applications are hereby incorporated by reference in their entireties.
The present invention relates to video coding systems using ALF (Adaptive Loop Filter). In particular, the present invention discloses fixed filter selection using an implicit method or according to one or more flags in addition to the slice-level flag. A filter strength is also disclosed to adjust the filter output.
BACKGROUND AND RELATED ART
Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The standard has been published as an ISO standard: ISO/IEC 23090-3:2021, Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021. VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources, including 3-dimensional (3D) video signals.
Fig. 1A illustrates an exemplary adaptive Inter/Intra video encoding system incorporating loop processing. For Intra Prediction, the prediction data is derived based on previously coded video data in the current picture. For Inter Prediction 112, Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture (s) and motion data. Switch 114 selects Intra Prediction 110 or Inter-Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to underlying image area. The side information associated with Intra Prediction 110, Inter prediction 112 and in-loop filter 130, are provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
As shown in Fig. 1A, incoming video data undergoes a series of processing in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to a series of processing. Accordingly, in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. For example, deblocking filter (DF) , Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) may be used. The loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream. In Fig. 1A, Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134. The system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H. 264 or VVC.
The decoder, as shown in Fig. 1B, can use similar or portion of the same functional blocks as the encoder except for Transform 118 and Quantization 120 since the decoder only needs Inverse Quantization 124 and Inverse Transform 126. Instead of Entropy Encoder 122, the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information) . The Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140. Furthermore, for Inter prediction, the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.
According to VVC, an input picture is partitioned into non-overlapped square block regions referred to as CTUs (Coding Tree Units), similar to HEVC. Each CTU can be partitioned into one or multiple smaller size coding units (CUs). The resulting CU partitions can be in square or rectangular shapes. Also, VVC divides a CTU into prediction units (PUs) as a unit to apply a prediction process, such as Inter prediction, Intra prediction, etc.
Adaptive Loop Filter in VVC
In VVC, an Adaptive Loop Filter (ALF) with block-based filter adaption is applied. For the luma component, one filter is selected among 25 filters for each 4×4 block, based on the direction and activity of local gradients.
1. Filter shape
Two diamond filter shapes (as shown in Fig. 2) are used. The 7×7 diamond shape 220 is applied for luma component and the 5×5 diamond shape 210 is applied for the chroma components.
2. Block classification
For the luma component, each 4×4 block is categorized into one out of 25 classes. The classification index C is derived based on its directionality D and a quantized value of activity Â as follows:

C = 5D + Â.
To calculate D and Â, gradients of the horizontal, vertical and two diagonal directions are first calculated using the 1-D Laplacian:

g_v = Σ_{k=i−2..i+3} Σ_{l=j−2..j+3} V_{k,l}, with V_{k,l} = |2R(k,l) − R(k,l−1) − R(k,l+1)|,
g_h = Σ_{k=i−2..i+3} Σ_{l=j−2..j+3} H_{k,l}, with H_{k,l} = |2R(k,l) − R(k−1,l) − R(k+1,l)|,
g_d1 = Σ_{k=i−2..i+3} Σ_{l=j−2..j+3} D1_{k,l}, with D1_{k,l} = |2R(k,l) − R(k−1,l−1) − R(k+1,l+1)|,
g_d2 = Σ_{k=i−2..i+3} Σ_{l=j−2..j+3} D2_{k,l}, with D2_{k,l} = |2R(k,l) − R(k−1,l+1) − R(k+1,l−1)|,

where indices i and j refer to the coordinates of the upper left sample within the 4×4 block and R(i, j) indicates a reconstructed sample at coordinate (i, j).
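The 1-D Laplacian sums above can be sketched as follows (an illustrative Python sketch, not part of the disclosure; the function names and the [row][column] indexing convention are my own, so the roles of the two indices are transposed relative to the (k, l) notation above):

```python
# Per-sample 1-D Laplacian gradients used by ALF block classification.
def laplacian_gradients(R, i, j):
    """Return (V, H, D1, D2) at sample (i, j); R is a 2-D list [row][col]."""
    c = 2 * R[i][j]
    V = abs(c - R[i - 1][j] - R[i + 1][j])            # vertical neighbours
    H = abs(c - R[i][j - 1] - R[i][j + 1])            # horizontal neighbours
    D1 = abs(c - R[i - 1][j - 1] - R[i + 1][j + 1])   # 135-degree diagonal
    D2 = abs(c - R[i - 1][j + 1] - R[i + 1][j - 1])   # 45-degree diagonal
    return V, H, D1, D2

def block_gradients(R, i0, j0, size=4):
    """Sum the sample gradients over a size x size block anchored at (i0, j0)."""
    gv = gh = gd1 = gd2 = 0
    for i in range(i0, i0 + size):
        for j in range(j0, j0 + size):
            V, H, D1, D2 = laplacian_gradients(R, i, j)
            gv, gh, gd1, gd2 = gv + V, gh + H, gd1 + D1, gd2 + D2
    return gv, gh, gd1, gd2
```

A flat region produces all-zero gradients, while an isolated bright sample contributes equally to all four directional sums.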
To reduce the complexity of block classification, the subsampled 1-D Laplacian calculation is applied to the vertical direction (Fig. 3A) and the horizontal direction (Fig. 3B) . As shown in Figs. 3C-D, the same subsampled positions are used for gradient calculation of all directions (gd1 in Fig. 3C and gd2 in Fig. 3D) .
Then maximum and minimum values of the gradients of the horizontal and vertical directions are set as:

g_hv^max = max(g_h, g_v), g_hv^min = min(g_h, g_v).

The maximum and minimum values of the gradients of the two diagonal directions are set as:

g_d^max = max(g_d1, g_d2), g_d^min = min(g_d1, g_d2).
To derive the value of the directionality D, these values are compared against each other and with two thresholds t1 and t2:
Step 1. If both g_hv^max ≤ t1·g_hv^min and g_d^max ≤ t1·g_d^min are true, D is set to 0.
Step 2. If g_hv^max/g_hv^min > g_d^max/g_d^min, continue from Step 3; otherwise continue from Step 4.
Step 3. If g_hv^max > t2·g_hv^min, D is set to 2; otherwise D is set to 1.
Step 4. If g_d^max > t2·g_d^min, D is set to 4; otherwise D is set to 3.
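The four-step decision above can be sketched as follows (an illustrative Python sketch, not part of the disclosure; the threshold defaults t1 = 2 and t2 = 4.5 are the values VVC uses, and the ratio comparison in Step 2 is cross-multiplied to avoid division):

```python
def directionality(gh, gv, gd1, gd2, t1=2, t2=4.5):
    """Derive the directionality D from the four block gradients (Steps 1-4)."""
    hv_max, hv_min = max(gh, gv), min(gh, gv)
    d_max, d_min = max(gd1, gd2), min(gd1, gd2)
    # Step 1: no dominant direction (texture)
    if hv_max <= t1 * hv_min and d_max <= t1 * d_min:
        return 0
    # Step 2: compare the two ratios (cross-multiplied)
    if hv_max * d_min > d_max * hv_min:
        # Step 3: horizontal/vertical direction dominates
        return 2 if hv_max > t2 * hv_min else 1
    # Step 4: diagonal direction dominates
    return 4 if d_max > t2 * d_min else 3
```

For example, near-equal gradients yield D = 0, while a strongly dominant horizontal/vertical gradient yields D = 2.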
The activity value A is calculated as:

A = Σ_{k=i−2..i+3} Σ_{l=j−2..j+3} (V_{k,l} + H_{k,l}).

A is further quantized to the range of 0 to 4, inclusive, and the quantized value is denoted as Â.
For the chroma components in a picture, no classification is applied.
3. Geometric transformations of filter coefficients and clipping values
Before filtering each 4×4 luma block, geometric transformations such as rotation or diagonal and vertical flipping are applied to the filter coefficients f (k, l) and to the corresponding filter clipping values c (k, l) depending on gradient values calculated for that block. This is equivalent to applying these transformations to the samples in the filter support region. The idea is to make different blocks to which ALF is applied more similar by aligning their directionality.
Three geometric transformations, including diagonal, vertical flip and rotation are introduced:
Diagonal: fD (k, l) =f (l, k) , cD (k, l) =c (l, k) ,
Vertical flip: fV (k, l) =f (k, K-l-1) , cV (k, l) =c (k, K-l-1) ,
Rotation: fR (k, l) =f (K-l-1, k) , cR (k, l) =c (K-l-1, k) ,
where K is the size of the filter and 0≤k, l≤K-1 are coefficients coordinates, such that location
(0, 0) is at the upper left corner and location (K-1, K-1) is at the lower right corner. The transformations are applied to the filter coefficients f (k, l) and to the clipping values c (k, l) depending on gradient values calculated for that block. The relationship between the transformation and the four gradients of the four directions are summarized in the following table.
Table 1. Mapping of the gradient calculated for one block and the transformations
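The three transformations can be sketched as index remappings of a K×K coefficient grid (an illustrative Python sketch, not part of the disclosure; function and mode names are my own):

```python
def transform_coeffs(f, mode):
    """Apply an ALF geometric transformation to a K x K coefficient grid f
    (list of lists); mode is 'none', 'diag', 'vflip' or 'rot'."""
    K = len(f)
    if mode == 'diag':   # fD(k, l) = f(l, k)
        return [[f[l][k] for l in range(K)] for k in range(K)]
    if mode == 'vflip':  # fV(k, l) = f(k, K - l - 1)
        return [[f[k][K - l - 1] for l in range(K)] for k in range(K)]
    if mode == 'rot':    # fR(k, l) = f(K - l - 1, k)
        return [[f[K - l - 1][k] for l in range(K)] for k in range(K)]
    return [row[:] for row in f]
```

Note that the rotation equals the diagonal transform followed by the vertical flip, which is why only these three transformations (plus identity) are needed to cover the four orientation cases.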
4. Filtering process
At the decoder side, when ALF is enabled for a CTB, each sample R(i, j) within the CU is filtered, resulting in sample value R′(i, j) as shown below:

R′(i, j) = R(i, j) + ((Σ_{(k,l)≠(0,0)} f(k, l) × K(R(i+k, j+l) − R(i, j), c(k, l)) + 64) >> 7),

where f(k, l) denotes the decoded filter coefficients, K(x, y) is the clipping function and c(k, l) denotes the decoded clipping parameters. The variables k and l vary between −L/2 and L/2, where L denotes the filter length. The clipping function K(x, y) = min(y, max(−y, x)) corresponds to the function Clip3(−y, y, x). The clipping operation introduces non-linearity to make ALF more efficient by reducing the impact of neighbouring sample values that are too different from the current sample value.
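The clipping function and a single filtering step can be sketched as follows (an illustrative Python sketch, not a bit-exact decoder; coefficients are assumed quantized with norm 128, matching the rounding offset 64 and right shift by 7):

```python
def K(x, y):
    """ALF clipping function: K(x, y) = min(y, max(-y, x)), i.e. Clip3(-y, y, x)."""
    return min(y, max(-y, x))

def alf_sample(R, i, j, taps):
    """Filter one sample. `taps` maps an offset (k, l) != (0, 0) to a pair
    (f, c) of coefficient and clipping parameter."""
    acc = 0
    for (k, l), (f, c) in taps.items():
        acc += f * K(R[i + k][j + l] - R[i][j], c)
    return R[i][j] + ((acc + 64) >> 7)
```

With a single tap of coefficient 128 (i.e. weight 1 after the norm-128 scaling), the filtered sample moves by exactly the clipped neighbour difference.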
5. Cross component adaptive loop filter
CC-ALF uses luma sample values to refine each chroma component by applying an adaptive, linear filter to the luma channel and then using the output of this filtering operation for chroma refinement. Fig. 4A provides a system level diagram of the CC-ALF process with respect to the SAO, luma ALF and chroma ALF processes. As shown in Fig. 4A, each colour component (i.e., Y, Cb and Cr) is processed by its respective SAO (i.e., SAO Luma 410, SAO Cb 412 and SAO Cr 414) . After SAO, ALF Luma 420 is applied to the SAO-processed luma and ALF Chroma 430 is applied to SAO-processed Cb and Cr. However, there is a cross-component term from luma to a chroma component (i.e., CC-ALF Cb 422 and CC-ALF Cr 424) . The outputs from the cross-component ALF are added (using adders 432 and 434 respectively) to the outputs from ALF Chroma 430.
Filtering in CC-ALF is accomplished by applying a linear, diamond shaped filter (e.g. filters 440 and 442 in Fig. 4B) to the luma channel. In Fig. 4B, a blank circle indicates a luma sample and a dot-filled circle indicates a chroma sample. One filter is used for each chroma channel, and the operation is expressed as:

ΔI_i(x, y) = Σ_{(x_0, y_0) ∈ S_i} I_Y(x_Y + x_0, y_Y + y_0) · c_i(x_0, y_0),

where (x, y) is the chroma component i location being refined, (x_Y, y_Y) is the luma location based on (x, y), S_i is the filter support area in the luma component, and c_i(x_0, y_0) represents the filter coefficients.
As shown in Fig. 4B, the luma filter support is the region collocated with the current chroma sample after accounting for the spatial scaling factor between the luma and chroma planes. In Fig. 4B, circles represent luma samples, while dotted circles represent chroma samples being refined.
In the VVC reference software, CC-ALF filter coefficients are computed by minimizing the mean square error of each chroma channel with respect to the original chroma content. To achieve this, the VTM (VVC Test Model) algorithm uses a coefficient derivation process similar to the one used for chroma ALF. Specifically, a correlation matrix is derived, and the coefficients are computed using a Cholesky decomposition solver in an attempt to minimize a mean square error metric. In designing the filters, a maximum of 8 CC-ALF filters can be designed and transmitted per picture. The resulting filters are then indicated for each of the two chroma channels on a CTU basis.
Additional characteristics of CC-ALF include:
· The design uses a 3x4 diamond shape with 8 taps.
· Seven filter coefficients are transmitted in the APS.
· Each of the transmitted coefficients has a 6-bit dynamic range and is restricted to power-of-2 values.
· The eighth filter coefficient is derived at the decoder such that the sum of the filter coefficients is equal to 0.
· An APS may be referenced in the slice header.
· CC-ALF filter selection is controlled at CTU-level for each chroma component.
· Boundary padding for the horizontal virtual boundaries uses the same memory access pattern as luma ALF.
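Two of the listed coefficient properties can be sketched as follows (an illustrative Python sketch, not part of the disclosure; the interpretation of the 6-bit dynamic range as magnitudes up to 64 is my assumption):

```python
def derive_eighth_coeff(coeffs7):
    """Derive the eighth CC-ALF coefficient from the seven transmitted ones
    so that all eight coefficients sum to zero."""
    return -sum(coeffs7)

def is_valid_ccalf_coeff(c):
    """Check a transmitted coefficient: zero or a signed power of 2, with
    magnitude bounded by 64 (the bound is an assumption here)."""
    return c == 0 or ((abs(c) & (abs(c) - 1)) == 0 and abs(c) <= 64)
```

The sum-to-zero constraint means the filter produces no correction on a flat luma region, which is consistent with CC-ALF acting as a high-pass refinement.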
As an additional feature, the reference encoder can be configured to enable some basic subjective tuning through the configuration file. When enabled, the VTM attenuates the application of CC-ALF in regions that are coded with high QP and are either near mid-grey or contain a large amount of luma high frequencies. Algorithmically, this is accomplished by disabling the application of CC-ALF in CTUs where any of the following conditions are true:
· The slice QP value minus 1 is less than or equal to the base QP value.
· The number of chroma samples for which the local contrast is greater than (1 << (bitDepth − 2)) − 1 exceeds the CTU height, where the local contrast is the difference between the maximum and minimum luma sample values within the filter support region.
· More than a quarter of chroma samples are in the range between (1 << (bitDepth − 1)) − 16 and (1 << (bitDepth − 1)) + 16.
The motivation for this functionality is to provide some assurance that CC-ALF does not amplify artefacts introduced earlier in the decoding path (this is largely due to the fact that the VTM currently does not explicitly optimize for chroma subjective quality). It is anticipated that alternative encoder implementations may either not use this functionality or incorporate alternative strategies suitable for their encoding characteristics.
6. Filter parameters signalling
ALF filter parameters are signalled in the Adaptation Parameter Set (APS). In one APS, up to 25 sets of luma filter coefficients and clipping value indexes, and up to eight sets of chroma filter coefficients and clipping value indexes, can be signalled. To reduce bit overhead, filter coefficients of different classifications for the luma component can be merged. In the slice header, the indices of the APSs used for the current slice are signalled.
Clipping value indexes, which are decoded from the APS, allow determining clipping values using a table of clipping values for both the luma and chroma components. These clipping values are dependent on the internal bitdepth. More precisely, the clipping values are obtained by the following formula:
AlfClip = {round(2^(B − α·n)) for n ∈ [0..N−1]},

with B equal to the internal bitdepth, α a pre-defined constant value equal to 2.35, and N equal to 4, which is the number of allowed clipping values in VVC. Each AlfClip value is then rounded to the nearest power of 2.
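The clipping-value derivation can be sketched as follows (an illustrative Python sketch, not part of the disclosure; the text does not pin down how the rounding to a power of 2 is performed, so rounding in the log2 domain is one possible reading):

```python
import math

def alf_clip_values(bit_depth, alpha=2.35, n_values=4):
    """AlfClip = { round(2^(B - alpha*n)) for n in [0..N-1] }, each value then
    snapped to the nearest power of two (nearest in the log2 domain here)."""
    out = []
    for n in range(n_values):
        v = round(2 ** (bit_depth - alpha * n))
        out.append(1 << round(math.log2(v)))  # snap to a power of two
    return out
```

Under this reading, a 10-bit internal bitdepth yields four decreasing power-of-two clipping values, the largest being 2^10 = 1024.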
In slice header, up to 7 APS indices can be signalled to specify the luma filter sets that are used for the current slice. The filtering process can be further controlled at CTB level. A flag is always signalled to indicate whether ALF is applied to a luma CTB. A luma CTB can choose a filter set among 16 fixed filter sets and the filter sets from APSs. A filter set index is signalled for a luma CTB to indicate which filter set is applied. The 16 fixed filter sets are pre-defined and hard-coded in both the encoder and the decoder.
For the chroma component, an APS index is signalled in slice header to indicate the chroma filter sets being used for the current slice. At CTB level, a filter index is signalled for each chroma CTB if there is more than one chroma filter set in the APS.
The filter coefficients are quantized with norm equal to 128. In order to restrict the multiplication complexity, a bitstream conformance requirement is applied so that the coefficient value of a non-central position shall be in the range of −2^7 to 2^7 − 1, inclusive. The central position coefficient is not signalled in the bitstream and is inferred to be equal to 128.
Adaptive Loop Filter in ECM
In ECM8 (Muhammed Coban, et al., "Algorithm description of Enhanced Compression Model 8 (ECM 8)", Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 29th Meeting, by teleconference, 11-20 January 2023, Document: JVET-AC2025), some changes from the VVC ALF are disclosed. A brief overview is given below.
1. ALF simplification removal
ALF gradient subsampling and ALF virtual boundary processing are removed. Block size for classification is reduced from 4x4 to 2x2. Filter size for both the luma and chroma, for which ALF coefficients are signalled, is increased to 9x9.
2. ALF with fixed filters
To filter a luma sample, three different classifiers (C0, C1 and C2) and three different sets of filters (F0, F1 and F2) are used. Sets F0 and F1 contain fixed filters, with coefficients trained for classifiers C0 and C1. Coefficients of filters in F2 are signalled. Which filter from a set Fi is used for a given sample is decided by a class Ci assigned to this sample using classifier Ci.
3. Filtering
At first, two 13x13 diamond shape fixed filters F0 and F1 are applied to derive two intermediate samples R0 (x, y) and R1 (x, y) . After that, F2 is applied to R0 (x, y) , R1 (x, y) , and neighbouring samples to derive a filtered sample as
where f_{i,j} is the clipped difference between a neighbouring sample and the current sample R(x, y), and g_i is the clipped difference between R_{i−20}(x, y) and the current sample. The filter coefficients c_i, i = 0, …, 21, are signalled.
4. Classification
Based on directionality D_i and activity Â_i, a class C_i is assigned to each 2x2 block:

C_i = M_{D,i} · Â_i + D_i,

where M_{D,i} represents the total number of directionalities D_i.
As in VVC, values of the horizontal, vertical, and two diagonal gradients are calculated for each sample using the 1-D Laplacian. The sum of the sample gradients within a 4×4 window that covers the target 2×2 block is used for classifier C0, and the sum of sample gradients within a 12×12 window is used for classifiers C1 and C2. The sums of the horizontal, vertical and two diagonal gradients are denoted, respectively, as g_h^i, g_v^i, g_d1^i and g_d2^i. The directionality D_i is determined by comparing these values with a set of thresholds. The directionality D_2 is derived as in VVC using thresholds 2 and 4.5. For D_0 and D_1, horizontal/vertical edge strength E_HV^i and diagonal edge strength E_D^i are calculated first. Thresholds Th = [1.25, 1.5, 2, 3, 4.5, 8] are used. Edge strength E_HV^i is 0 if g_hv^max,i ≤ Th[0]·g_hv^min,i; otherwise, E_HV^i is the maximum integer k such that g_hv^max,i > Th[k−1]·g_hv^min,i. Edge strength E_D^i is 0 if g_d^max,i ≤ Th[0]·g_d^min,i; otherwise, E_D^i is the maximum integer k such that g_d^max,i > Th[k−1]·g_d^min,i. When horizontal/vertical edges are dominant, D_i is derived by using Table 2A; otherwise, diagonal edges are dominant and D_i is derived by using Table 2B.
Table 2A. Mapping of E_HV^i and E_D^i to D_i
Table 2B. Mapping of E_HV^i and E_D^i to D_i
To obtain Â_i, the sum of the vertical and horizontal gradients A_i is mapped to the range of 0 to n, where n is equal to 4 for Â_2 and 15 for Â_0 and Â_1.
In an ALF_APS, up to 4 luma filter sets are signalled, each set may have up to 25 filters.
5. Alternative 2x2 ALF classifier
Classification in ALF is extended with an additional alternative classifier. For a signalled luma filter set, a flag is signalled to indicate whether the alternative classifier is applied. Geometrical transformation is not applied to the alternative band classifier. When the band-based classifier is applied, the sum of sample values of a 2x2 luma block is calculated first. Then the class index is calculated as below:

class_index = (sum * 25) >> (sample bit depth + 2).
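The band classifier formula can be sketched as follows (an illustrative Python sketch, not part of the disclosure; the function name is my own):

```python
def band_class_index(block_2x2, bit_depth):
    """Band-based class index for a 2x2 luma block:
    class_index = (sum * 25) >> (sample bit depth + 2)."""
    s = sum(block_2x2)
    return (s * 25) >> (bit_depth + 2)
```

The maximum block sum is 4 · (2^bit_depth − 1), so the shift by (bit_depth + 2) maps the scaled sum into the 25-class range 0 to 24.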
6. Residual based classifier
A third classifier is based on luma residual sample values. For each 2x2 luma block, the sum of absolute values of the residual samples in a neighbouring 8x8 window is calculated, and the class index is derived as:
classIdx = sum >> (sample bit depth − 4).
The value of classIdx is in the range of 0 to 24, same as in ECM-8.0. The classifier usage is signalled for each luma filter set in APS.
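The residual-based classifier can be sketched as follows (an illustrative Python sketch, not part of the disclosure; the clamp to 24 is implied by the stated 0-to-24 range rather than spelled out in the text):

```python
def residual_class_index(residuals_8x8, bit_depth):
    """Residual-based class index: sum of absolute residuals in an 8x8
    window, shifted by (bit depth - 4) and kept within [0, 24]."""
    s = sum(abs(r) for r in residuals_8x8)
    return min(s >> (bit_depth - 4), 24)
```

For 10-bit content the shift is 6, so an 8x8 window of small residuals maps to a small class index and large-residual regions saturate at class 24.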
7. CCALF with long tap filter
The CCALF process uses a linear filter to filter luma sample values and generate a residual correction for the chroma samples. A 25-tap large filter is used in the CCALF process, as illustrated in Fig. 5. In Fig. 5, taps for luma samples are shown as grey dots and the location of the corresponding chroma sample is shown as a small dash-lined circle. For a given slice, the encoder can collect the statistics of the slice, analyse them, and signal up to 16 filters through the APS.
8. Adaptive filter shape switch/using samples before deblocking filter for ALF
Two candidate filter shapes, a diamond shape as shown in Fig. 6 and a new cross shape as shown in Fig. 7, can be adaptively selected by the luma filters in ALF. The number of coefficients of a luma filter is 22 for both filter shapes. Note that these 22 taps consist of 20 spatial taps (610 and 710 in Fig. 6 and Fig. 7, respectively) and 2 fixed-filter-based taps (620 and 720 in Fig. 6 and Fig. 7, respectively) in both shapes.
In each Adaptation Parameter Set (APS) , a shape index for the derived luma filters is signalled to the decoder. Each APS contains the luma filters that are associated with the filter shape index.
For each CTB, an APS index is signalled to indicate which luma filter shape is used to filter the current CTB. When filtering a luma sample, the coefficients and clip indices are also rearranged according to the corresponding filter shape.
The diamond shape luma ALF is replaced by the longer filter shown in Fig. 7.
The samples before deblocking filters are used as additional inputs for ALF. A final ALF sample is derived by weighting the regular ALF and the filter applied to the samples before the deblocking filter. Specifically, a filtered sample is derived as
where fi,j is the clipped difference between a neighbouring sample and the current sample R (x, y), gi is the clipped difference between an intermediate sample and the current sample R (x, y), and hi,j is the clipped difference between a neighbouring sample before DBF and the current sample R (x, y). The filter coefficients ci, i = 0, …, 24, are signalled. In one example, a 3x3 diamond shape is applied to the samples before the deblocking filter. In an APS, a flag is signalled to indicate whether samples before DBF are used for ALF; this flag is always set to true at the encoder.
9. Extended fixed-filter-output based taps for ALF
In ALF, online-trained filters consist of 4 kinds of filter taps: spatial taps (810), reconstruction-before-DBF based taps (840), residual based taps (850), and fixed-filter-output based taps (820 and 830), as shown in Fig. 8.
10. ALF with residual samples
The residual samples are used as additional inputs to the ALF. A filtered sample is derived as:
where ri is the clipped neighbouring residual sample value and rFilteredi is the clipped residual
sample filtered by the fixed-filter. For residual samples, the fixed filter reuses the offline fixed filter trained for reconstruction after SAO.
11. Additional fixed filter for ALF
An additional fixed filter with a 7x7 diamond shape is introduced; the filter parameters are stored at both the encoder and the decoder. There is no classification for the newly added fixed filter.
An online filter or online-trained filter of the proposed method is shown in Fig. 9, where spatial taps 910 (i.e., taps #0 ~ #19), reconstruction-before-DBF-based taps 940 (i.e., taps #26, #27, #36), residual-based taps 950 (i.e., taps #37 ~ #38) and fixed-filter-output-based taps 920 and 930 (i.e., taps #20 ~ #25, #34, #35) are kept the same as in ECM-8.0, and several extended taps 960 (i.e., taps #28 ~ #33, #39) are introduced into the luma online-trained filters. The reconstruction before DBF is fed into the additional fixed filter to produce the filter outputs, and these filter outputs are then used as input for the newly extended taps. The online filter or online-trained filter refers to a filter specified in an APS (Adaptation Parameter Set), where the filter is trained at the encoder and signalled to the decoder. The online filter or online-trained filter is in contrast to fixed filters, which are offline-trained and pre-defined in the specification.
This filter is always enabled without any filter shape switching.
12. Improved fixed filters
Two Laplacian-based classifiers (one for each fixed filter) are applied to a 2x2 block. In each classifier, activity and directionality values are derived based on vertical, horizontal, and diagonal gradients using a window surrounding each 2x2 block. For each 2x2 block, the mean value of a surrounding window is calculated. Then, for each sample of this window, the difference between the sample value and the mean value is calculated. A scaling factor is determined based on the activity value derived from a Laplacian classifier. The square root of the sum of the squared differences is further quantized to C′ by the scaling factor. The value of C′ is an integer between 0 and 7, inclusive. With i = 0, 1, let Ci denote the class index derived by the classifier of the i-th fixed filter in ECM-9.0. Then the proposed class index C′i is derived as
C′i = C′ * 896 + Ci.
The total number of the fixed filters is not changed.
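For illustration, the class-index combination above may be sketched as follows; the assumed range 0 to 895 for Ci is an inference from the fact that 8 x 896 = 7168 classes (the class count stated elsewhere in this document), not a statement from the specification:

```python
def combined_class_index(c_prime, c_i):
    """Proposed derivation C'_i = C' * 896 + C_i, where C' is the quantized
    value in 0..7 and C_i is the class index from the i-th fixed-filter
    classifier in ECM-9.0 (assumed here to lie in 0..895, so the combined
    index spans 0..7167)."""
    assert 0 <= c_prime <= 7 and 0 <= c_i < 896
    return c_prime * 896 + c_i
```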
Then a class index is determined based on the activity and directionality values. Two diamond shaped fixed filters are selected from the two filter sets by using the two derived class indices. Both fixed filters are applied to the samples before DBF and to the ALF input, where an additional 9x9 diamond filter is used for the samples before DBF. The shape of the first fixed filter applied to the ALF input samples is reduced from 13x13 to 9x9, and the shape of the second fixed filter applied to the ALF input, which is 13x13, is unchanged as shown in Table 3.
Table 3. Comparison of fixed filters between ECM-9.0 and JVET-AE0139
Fixed filter f1 is applied to outputs of f0 (instead of ALF input) and samples before DBF.
Finally, a signalled filter is applied to the ALF input samples, samples before the deblocking filter (DBF) , outputs of the two fixed filters, output of a Gaussian filter and the residual data.
13. Luma residual taps in CCALF (JVET-AF0197)
For CCALF, five luma residual taps in a cross 3x3 shape 1020 are added to the 9x9 CCALF filter shape 1010 as shown in Fig. 10. The extended taps take the co-located and neighbouring luma residual values as input.
14. Chroma fixed filter taps (JVET-AG0157)
The first luma classifier is applied to each 2x2 chroma block. The derived class index is then used to select a fixed filter from the luma fixed filter set related to this classifier. A fixed filter is applied to chroma ALF input sample in a 9x9 diamond shape and DBF input samples in a 9x9 diamond shape.
In a signalled chroma filter, extra taps in a 5x5 cross shape are introduced, which are applied to the fixed filter output of the current sample.
15. Adaptive precision for luma ALF coefficients (JVET-AG0158)
The number of bits used to represent the fractional part of a luma coefficient can vary from 5 to 8 adaptively. For each luma filter set, which contains up to 25 filters, a 2-bit syntax element is signalled to indicate the number of bits used for the coefficients in this set. The real value range of a coefficient is not changed.
In the present invention, methods and apparatus of fixed filter selection using an implicit method or based on one or more flags in addition to the slice-level flag are disclosed. A filter strength to adjust the filter output is also disclosed.
BRIEF SUMMARY OF THE INVENTION
A method and apparatus for in-loop filtering of reconstructed video are disclosed. According to the method, input data for a current block is received, wherein the input data comprises reconstructed samples of the current block and the current block comprises a luma component and one or more chroma components. A target ALF (Adaptive Loop Filter) fixed filter set is selected implicitly from a group of fixed filter sets according to one or more conditions comprising CU/CTU/slice/frame/sequence size, colour component, slice/frame type, CU coding modes, statistics associated with reconstruction samples of the current block, or a combination thereof. The target ALF fixed filter set is applied to the current block. Filtered output generated by said applying ALF fixed filter to the current block is provided.
In one embodiment, a first target ALF fixed filter set is selected for the luma component and a second target fixed filter set is selected for said one or more chroma components.
In one embodiment, said one or more chroma components consist of a first chroma component and a second chroma component, a first target ALF fixed filter set is selected for the luma component, a second target fixed filter set is selected for the first chroma component, and a third target fixed filter set is selected for the second chroma component.
In one embodiment, a first target ALF fixed filter set is used for intra frames and a second target ALF fixed filter set is used for inter frames.
In one embodiment, a first target ALF fixed filter set is selected for first CUs using intra modes, a second target ALF fixed filter set is selected for second CUs using inter modes, and a third target ALF fixed filter set is selected for third CUs using IBC modes.
In one embodiment, the statistics associated with the reconstruction samples of the current block comprise local or global pre-ALF luminance, and/or local pre-ALF variance.
In one embodiment, two different fixed filter sets in the group of fixed filter sets are different in one or more filter coefficients, one or more clipping indices, one or more class-to-filter mappings, one or more classifiers, one or more geometric transforms, or a combination thereof.
In one embodiment, the filtered output is adjusted by a filter strength before the filtered output is provided.
According to another method, input data for a current block is received, wherein the input data comprises reconstructed samples of the current block. A target ALF fixed filter set is applied to the current block to generate filtered output. A filter strength is applied to the filtered output to generate adjusted filtered output. The adjusted filtered output is provided.
In one embodiment, the filter strength is explicitly signalled at a CTU, slice, tile, sub-picture, picture, sequence, or APS level, or a combination thereof.
In one embodiment, the filter strength is implicitly derived according to statistics associated with a current coding region.
In one embodiment, the target ALF fixed filter set is selected from a group of fixed filter sets implicitly.
According to yet another method, input data for a current block is received, wherein the input data comprises reconstructed samples of the current block. One or more fixed-filter-selection flags are signalled or parsed in APS (Adaptation Parameter Set) , PPS (Picture Parameter Set) , or SPS (Sequence Parameter Set) . A target ALF (Adaptive Loop Filter) fixed filter set is selected from a group of fixed filter sets according to one or more flags comprising said one or more fixed-filter-selection flags. The target ALF fixed filter set is applied to the current block. Filtered output generated by said applying ALF fixed filter to the current block is provided.
In one embodiment, said one or more flags comprise a slice-level flag.
In one embodiment, said one or more fixed-filter-selection flags in the APS are signalled per filter, per filter set, or per APS.
In one embodiment, two different fixed filter sets in the group of fixed filter sets are different in one or more filter coefficients, one or more clipping indices, one or more class-to-filter mappings, one or more classifiers, one or more geometric transforms, or a combination thereof.
Fig. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing.
Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
Fig. 2 illustrates the ALF filter shapes for the chroma (left) and luma (right) components.
Figs. 3A-D illustrate the subsampled Laplacian calculations for gv (3A) , gh (3B) , gd1 (3C) and gd2 (3D) .
Fig. 4A illustrates the placement of CC-ALF with respect to other loop filters.
Fig. 4B illustrates a diamond shaped filter for the chroma samples.
Fig. 5 illustrates the 25-tap large filter used in CCALF process.
Fig. 6 illustrates the diamond shaped ALF in ECM-5.0.
Fig. 7 illustrates a longer ALF as an alternative to the diamond shaped ALF in Fig. 6.
Fig. 8 illustrates the filter shape of ALF in ECM-7.0.
Fig. 9 illustrates an example of ALF with additional fixed filter.
Fig. 10 illustrates the newly introduced filter shape for CCALF according to JVET-AF0197.
Fig. 11 illustrates a flowchart of an exemplary video coding system that selects a target ALF fixed filter set implicitly from a group of fixed filter sets according to an embodiment of the present invention.
Fig. 12 illustrates a flowchart of an exemplary video coding system that applies a filter strength to the filtered output according to an embodiment of the present invention.
Fig. 13 illustrates a flowchart of an exemplary video coding system that signals or parses one or more fixed-filter-selection flags in APS, PPS, or SPS according to an embodiment of the present invention.
It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. References throughout this specification to “one embodiment, ” “an embodiment, ” or similar language mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures, or operations are not shown or described in detail to avoid obscuring aspects of the invention. The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of apparatus and methods that are consistent with the invention as claimed herein.
PROPOSED METHOD
ALF Fixed Filter Set Selection in ECM
In ECM-12.0, there are 512 pre-determined filters in each ALF fixed filter set. When a region is filtered by a fixed filter set, one classifier corresponding to the fixed filter set is used to categorize each block in the region into one out of 7168 classes, and one class-to-filter mapping (7168-to-512 mapping) corresponding to the fixed filter set is used to select which filter among the 512 filters should be used for each class. In ECM’s design, the QP value of the region and a slice-level flag are used to determine which fixed filter set to use for the region. In this invention, we provide methods for fixed filter set selection.
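For illustration, the ECM-style class-to-filter lookup described above may be sketched as follows; the function name and the list-based representation of the 7168-to-512 mapping are illustrative assumptions, not ECM syntax:

```python
def select_fixed_filter(class_idx, class_to_filter):
    """Sketch of fixed filter selection in ECM-12.0: a classifier assigns
    each block one of 7168 classes, and a per-set 7168-to-512
    class-to-filter mapping picks one of the 512 pre-determined filters."""
    assert 0 <= class_idx < 7168
    filter_idx = class_to_filter[class_idx]
    assert 0 <= filter_idx < 512
    return filter_idx
```

For example, with a hypothetical mapping class_to_filter = [i % 512 for i in range(7168)], class 515 selects filter 3.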
Proposed ALF Fixed Filter Set Selection
In one embodiment, fixed filter sets are implicitly selected according to CU/CTU/slice/frame/sequence sizes, colour component, slice/frame type, CU coding modes, statistics associated with reconstruction samples of the current block, or a combination thereof.
Example 1.
When applying fixed filtering to a region, if the corresponding CU/CTU/slice/frame/sequence size is smaller than a threshold, a first fixed filter set is used. Otherwise, a second fixed filter set is used.
In another embodiment, fixed filter sets are implicitly selected according to colour components.
Example 2.
When applying fixed filtering to a region, for the luma component of the region, a first fixed filter set is used. For the chroma components of the region, a second fixed filter set is used.
Example 3.
When applying fixed filtering to a region, for the luma component of the region, a first fixed filter set is used. For the chroma Cb component of the region, a second fixed filter set is used. For the chroma Cr component of the region, a third fixed filter set is used.
In another embodiment, fixed filter sets are implicitly selected according to slice/frame types.
Example 4.
When applying fixed filtering to a region, if the current slice/frame is an intra slice/frame, a first fixed filter set is used. Otherwise (the current slice/frame is an inter frame) , a second fixed filter set is used.
In another embodiment, fixed filter sets are implicitly selected according to CU coding information. Specifically, all possible coding modes will be categorized into multiple groups. CUs with coding modes in the same group will share the same fixed filter sets.
Example 5.
When applying fixed filtering to a region, if the corresponding CU uses an intra mode, a first fixed filter set is used. If the corresponding CU uses an inter mode, a second fixed filter set is used. If the corresponding CU uses an IBC mode, a third fixed filter set is used. In this example, CUs in the region include first CUs using intra modes, second CUs using inter modes, and third CUs using IBC modes, such that a first target ALF fixed filter set is selected for the first CUs using intra modes, a second target ALF fixed filter set is selected for the second CUs using inter modes, and a third target ALF fixed filter set is selected for the third CUs using IBC modes.
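For illustration, the mode-group based selection of Example 5 may be sketched as follows; the mode names and set indices are hypothetical and only illustrate that CUs with coding modes in the same group share the same fixed filter set:

```python
# Hypothetical grouping of CU coding modes into fixed-filter-set indices;
# CUs whose coding modes fall in the same group share one fixed filter set.
MODE_GROUP_TO_SET = {"intra": 0, "inter": 1, "ibc": 2}

def select_set_by_cu_mode(cu_mode):
    """Return the fixed filter set index for the given CU coding mode."""
    return MODE_GROUP_TO_SET[cu_mode]
```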
In another embodiment, fixed filter sets are implicitly selected according to some statistics associated with the reconstruction data.
Example 6.
When applying fixed filtering to a region, a local luminance of the pre-ALF samples of the region is calculated. If the luminance is smaller than a threshold (i.e., a darker region) , a first fixed filter set is used. Otherwise (i.e., a brighter region) , a second fixed filter set is used.
Example 7.
When applying fixed filtering to a region, a local variance of the pre-ALF samples of the region is calculated. If the variance is smaller than a threshold (i.e., a flat region) , a first fixed filter set is used. Otherwise (i.e., a texture region) , a second fixed filter set is used.
Example 8.
When applying fixed filtering to a region, a global luminance of the pre-ALF samples of the region is calculated. If the luminance is smaller than a threshold (i.e., a darker slice/frame) , a first fixed filter set is used. Otherwise (i.e., a brighter slice/frame) , a second fixed filter set is used.
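For illustration, the statistics-based selection of Examples 6 and 7 may be sketched as follows; the two-set grouping, thresholds, and function names are illustrative assumptions:

```python
def mean_and_variance(samples):
    """Local pre-ALF statistics used for implicit selection (sketch)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var

def select_by_luminance(samples, threshold):
    """Example 6: a darker region (mean below threshold) uses the first
    fixed filter set (index 0); a brighter region uses the second (1)."""
    mean, _ = mean_and_variance(samples)
    return 0 if mean < threshold else 1

def select_by_variance(samples, threshold):
    """Example 7: a flat region (variance below threshold) uses the first
    fixed filter set (index 0); a texture region uses the second (1)."""
    _, var = mean_and_variance(samples)
    return 0 if var < threshold else 1
```

Example 8 follows the same pattern as select_by_luminance, with the mean computed over the whole slice/frame rather than a local region.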
In another embodiment, in addition to a slice-level flag, one or more flags in APS/PPS/SPS are signalled for fixed filter set selection.
Example 9.
When applying fixed filtering to a region, a PPS/SPS-level flag is used in combination with a slice-level flag to select the appropriate fixed filter set.
Example 10.
When applying an APS filter to a region, the APS filter will take the output of a fixed filtering result as the input, where an APS flag is used to select the fixed filter set for fixed filtering. Specifically, the flag can be signalled per filter, per filter set, or per APS.
In the above embodiments, the first fixed filter set differs from the second fixed filter set in coefficients, clipping indices, class-to-filter mappings, classifiers, geometric transforms, or a combination thereof.
Example 11.
In Example 4, the first fixed filter set for intra slices/frames and the second fixed filter set for inter slices/frames share the same filter coefficients and clipping indices, but use different class-to-filter mappings, classifiers, and geometric transforms.
Example 12.
In Example 4, the first fixed filter set for intra slices/frames and the second fixed filter set for inter slices/frames use the same classifier and geometric transform, but use different filter coefficients, clipping indices, and class-to-filter mappings.
Example 13.
In Example 2, the first fixed filter set for luma and the second fixed filter set for chroma use the same classifier and geometric transform, but use different filter coefficients, clipping indices, and class-to-filter mappings.
In one embodiment, a filter strength is applied to fixed filtering result. The filter strength is explicitly signalled at the CTU/slice/tile/sub-picture/picture/sequence/APS level, or implicitly derived by some statistics associated with the current coding region.
Example 14.
One filter strength for fixed filtering is signalled at slice level. When applying fixed filtering to a region inside the slice, the filtering result will be scaled by the filter strength.
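For illustration, the strength scaling of Example 14 may be sketched in fixed-point arithmetic as follows; the 6-bit fractional precision and rounding offset are assumptions for illustration only, not taken from the specification:

```python
def apply_filter_strength(fixed_filter_output, strength, shift=6):
    """Scale a fixed-filtering result by a slice-level filter strength.

    With the assumed 6-bit precision, strength = 64 corresponds to a
    scaling factor of 1.0; the added offset implements round-to-nearest."""
    return (fixed_filter_output * strength + (1 << (shift - 1))) >> shift
```

For example, a strength of 32 under this convention halves the filtering result.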
ALF with Multiple Outputs
In VVC and ECM, ALF has a single output, and the output is used to generate the reconstruction for future frame reference and for display. In this invention, we propose an ALF with multiple outputs, where some of the outputs are used to generate the reconstruction for future frame reference, while the other outputs are used to generate the reconstruction for display. In such a design, by decoupling the optimization for inter prediction from the optimization for display, the coding efficiency can be improved.
In one embodiment, ALF has two outputs, where one output is used for future frame reference, while the other output is used for current frame display. The two outputs can be generated in parallel.
In another embodiment, ALF has two outputs, where one output is used for future frame reference, while the other output is used for current frame display. The two outputs are generated sequentially. For example, one output is generated first, and the other output is generated based on the first output.
Example 1.
ALF with a first set of ALF parameters will generate a first output for future frame reference, and ALF with a second set of ALF parameters will take the first output as input and generate a second output for current frame display.
In the above embodiments, two separate sets of ALF parameters are signalled for generating two ALF outputs, where ALF parameters include at least adaptive filters in APS, and slice/CTU-level flags related to ALF.
In the above embodiments, two sets of ALF parameters are signalled for generating two ALF outputs, while partial parameters are shared between two sets, where ALF parameters include at least adaptive filters in APS, and slice/CTU-level flags related to ALF.
Example 2.
One set of adaptive filters is signalled and shared, while two sets of the slice/CTU-level flags related to ALF are signalled separately for generating two outputs.
Example 3.
One set of the slice/CTU-level flags related to ALF is signalled and shared, while two sets of adaptive filters are signalled separately for generating two outputs.
Example 4.
Similar to Example 3, but the two separate sets of adaptive filters still share partial parameters. The filter coefficients and clipping indices are shared, but the class-to-filter mappings are separate.
Example 5.
Similar to Example 3, but the two separate sets of adaptive filters still share partial parameters. One large set of filter coefficients and clipping indices is signalled, and two subsets (with or without overlapping) of the large set are used to generate two outputs respectively.
In one embodiment, one or more picture/sequence-level flags are signalled to indicate the number of activated ALF outputs.
Example 6.
One picture/sequence-level flag is signalled. If the flag indicates there are two ALF outputs, one ALF output is used for future frame reference, while the other ALF output is used for display. If the flag indicates there is only one ALF output, the ALF output is used for both future frame reference and for display. Besides, if there is only one ALF output, the signalling of the low level (lower than picture/sequence level) syntax for the second ALF output can be skipped.
The foregoing proposed methods of fixed filter selection can be implemented in encoders and/or decoders. For example, the proposed method can be implemented in an in-loop filtering module of an encoder, and/or an in-loop filtering module of a decoder. With reference to the exemplary encoder and decoder in Fig. 1A and Fig. 1B, any of the proposed methods can be implemented in the in-loop filter module (e.g. ILPF 130 in Fig. 1A and Fig. 1B) of an encoder or a decoder. Alternatively, any of the proposed methods can be implemented as circuits coupled to the inter coding module of an encoder, and/or a motion compensation module or a merge candidate derivation module of the decoder. The proposed ALF fixed filter selection methods may also be implemented using executable software or firmware codes stored on a media, such as hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array) ) .
Fig. 11 illustrates a flowchart of an exemplary video coding system that selects a target ALF fixed filter set implicitly from a group of fixed filter sets according to an embodiment of the present invention. The steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented based on hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to the method, input data for a current block is received in step 1110, wherein the input data comprises reconstructed samples of the current block and the current block comprises a luma component and one or more chroma components. A target ALF (Adaptive Loop Filter) fixed filter set is selected implicitly from a group of fixed filter sets according to one or more conditions comprising CU/CTU/slice/frame/sequence size, colour component, slice/frame type, statistics associated with reconstruction samples of the current block, or a combination thereof in step 1120. The target ALF fixed filter set is applied to the current block in step 1130. Filtered output generated by said applying ALF fixed filter to the current block is provided in step 1140.
Fig. 12 illustrates a flowchart of an exemplary video coding system that applies a filter strength to the filtered output according to an embodiment of the present invention. According to this method, input data for a current block is received, wherein the input data comprises reconstructed samples of the current block. A target ALF fixed filter set is applied to the current block to generate filtered output. A filter strength is applied to the filtered output to generate adjusted filtered output. The adjusted filtered output is provided.
Fig. 13 illustrates a flowchart of an exemplary video coding system that signals or parses one or more fixed-filter-selection flags in APS, PPS, or SPS according to an embodiment of the present invention. According to this method, input data for a current block is received, wherein the input data comprises reconstructed samples of the current block. One or more fixed-filter-selection flags are signalled or parsed in APS (Adaptation Parameter Set) , PPS (Picture Parameter Set) , or SPS (Sequence Parameter Set) . A target ALF (Adaptive Loop Filter) fixed filter set is selected from a group of fixed filter sets according to one or more flags comprising said one or more fixed-filter-selection flags. The target ALF fixed filter set is applied to the current block. Filtered output generated by said applying ALF fixed filter to the current block is provided.
The flowcharts shown are intended to illustrate examples of video coding according to the present invention. A person skilled in the art may modify each step, re-arrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In the disclosure, specific syntax and semantics have been used to illustrate examples to implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.
The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirement. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced.
Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or field programmable gate array (FPGA) . These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Claims (19)
- A method for in-loop filtering of reconstructed video, the method comprising:receiving input data for a current block, wherein the input data comprises reconstructed samples of the current block and the current block comprises a luma component and one or more chroma components;selecting a target ALF (Adaptive Loop Filter) fixed filter set implicitly from a group of fixed filter sets according to one or more conditions comprising CU/CTU/slice/frame/sequence size, colour component, slice/frame type, CU coding modes, statistics associated with reconstruction samples of the current block, or a combination thereof;applying the target ALF fixed filter set to the current block; andproviding filtered output generated by said applying ALF fixed filter to the current block.
- The method of Claim 1, wherein a first target ALF fixed filter set is selected for the luma component and a second target fixed filter set is selected for said one or more chroma components.
- The method of Claim 1, wherein said one or more chroma components consist of a first chroma component and a second chroma component, a first target ALF fixed filter set is selected for the luma component, a second target ALF fixed filter set is selected for the first chroma component, and a third target ALF fixed filter set is selected for the second chroma component.
- The method of Claim 1, wherein a first target ALF fixed filter set is used for intra frames and a second target ALF fixed filter set is used for inter frames.
- The method of Claim 1, wherein a first target ALF fixed filter set is selected for first CUs using intra modes, a second target ALF fixed filter set is selected for second CUs using inter modes, and a third target ALF fixed filter set is selected for third CUs using IBC modes.
- The method of Claim 1, wherein the statistics associated with the reconstruction samples of the current block comprise local or global pre-ALF luminance and/or local pre-ALF variance.
- The method of Claim 1, wherein two different fixed filter sets in the group of fixed filter sets are different in one or more filter coefficients, one or more clipping indices, one or more class-to-filter mappings, one or more classifiers, one or more geometric transforms, or a combination thereof.
- The method of Claim 1, wherein the filtered output is adjusted by a filter strength before the filtered output is provided.
- An apparatus for video coding, the apparatus comprising one or more electronics or processors arranged to:
receive input data for a current block, wherein the input data comprises reconstructed samples of the current block and the current block comprises a luma component and one or more chroma components;
select a target ALF (Adaptive Loop Filter) fixed filter set implicitly from a group of fixed filter sets according to one or more conditions comprising CU/CTU/slice/frame/sequence size, colour component, slice/frame type, statistics associated with reconstruction samples of the current block, or a combination thereof;
apply the target ALF fixed filter set to the current block; and
provide filtered output generated by said applying the target ALF fixed filter set to the current block.
- A method for in-loop filtering of reconstructed video, the method comprising:
receiving input data for a current block, wherein the input data comprises reconstructed samples of the current block;
applying a target ALF fixed filter set to the current block to generate filtered output;
applying a filter strength to the filtered output to generate adjusted filtered output; and
providing the adjusted filtered output.
- The method of Claim 10, wherein the filter strength is explicitly signalled at a CTU, slice, tile, sub-picture, picture, sequence, or APS level, or a combination thereof.
- The method of Claim 10, wherein the filter strength is implicitly derived according to statistics associated with a current coding region.
- The method of Claim 10, wherein the target ALF fixed filter set is selected from a group of fixed filter sets implicitly.
- An apparatus for video coding, the apparatus comprising one or more electronics or processors arranged to:
receive input data for a current block, wherein the input data comprises reconstructed samples of the current block;
apply a target ALF fixed filter set to the current block to generate filtered output;
apply a filter strength to the filtered output to generate adjusted filtered output; and
provide the adjusted filtered output.
- A method for in-loop filtering of reconstructed video, the method comprising:
receiving input data for a current block, wherein the input data comprises reconstructed samples of the current block;
signalling or parsing one or more fixed-filter-selection flags in an APS (Adaptation Parameter Set), PPS (Picture Parameter Set), or SPS (Sequence Parameter Set);
selecting a target ALF (Adaptive Loop Filter) fixed filter set from a group of fixed filter sets according to one or more flags comprising said one or more fixed-filter-selection flags;
applying the target ALF fixed filter set to the current block; and
providing filtered output generated by said applying the target ALF fixed filter set to the current block.
- The method of Claim 15, wherein said one or more flags comprise a slice-level flag.
- The method of Claim 15, wherein said one or more fixed-filter-selection flags in the APS are signalled per filter, per filter set, or per APS.
- The method of Claim 15, wherein two different fixed filter sets in the group of fixed filter sets are different in one or more filter coefficients, one or more clipping indices, one or more class-to-filter mappings, one or more classifiers, one or more geometric transforms, or a combination thereof.
- An apparatus for video coding, the apparatus comprising one or more electronics or processors arranged to:
receive input data for a current block, wherein the input data comprises reconstructed samples of the current block;
signal or parse one or more fixed-filter-selection flags in an APS (Adaptation Parameter Set), PPS (Picture Parameter Set), or SPS (Sequence Parameter Set);
select a target ALF (Adaptive Loop Filter) fixed filter set from a group of fixed filter sets according to one or more flags comprising said one or more fixed-filter-selection flags;
apply the target ALF fixed filter set to the current block; and
provide filtered output generated by said applying the target ALF fixed filter set to the current block.
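As a non-normative illustration only, the claimed flow of implicit fixed-filter-set selection followed by filter-strength adjustment can be sketched as below. All function names, filter-set labels, coefficients, the luminance threshold, and the blending formula are hypothetical stand-ins, not taken from the claims or any codec specification:

```python
# Hypothetical sketch: implicit fixed-filter-set selection by condition
# (component, slice type, pre-ALF luminance statistics), then a toy
# filter application and a filter-strength blend of the output.

def select_fixed_filter_set(slice_type, component, mean_luma, filter_sets):
    """Implicitly pick a fixed filter set from predefined conditions."""
    if component != "luma":
        return filter_sets["chroma"]        # per-component selection
    if slice_type == "intra":
        return filter_sets["intra_luma"]    # per-slice-type selection
    # statistics-based selection, e.g. pre-ALF mean luminance
    return filter_sets["bright"] if mean_luma >= 128 else filter_sets["dark"]

def apply_alf(samples, coeffs):
    """Toy 1-D symmetric 3-tap filter standing in for a real ALF shape."""
    out = list(samples)
    for i in range(1, len(samples) - 1):
        out[i] = (coeffs[0] * samples[i - 1]
                  + coeffs[1] * samples[i]
                  + coeffs[2] * samples[i + 1])
    return out

def adjust_strength(original, filtered, strength):
    """Blend the filtered output back toward the input by a strength factor."""
    return [o + strength * (f - o) for o, f in zip(original, filtered)]

FILTER_SETS = {
    "chroma": [0.25, 0.5, 0.25],
    "intra_luma": [0.1, 0.8, 0.1],
    "bright": [0.2, 0.6, 0.2],
    "dark": [0.05, 0.9, 0.05],
}

block = [100, 120, 80, 140, 90]
coeffs = select_fixed_filter_set("inter", "luma",
                                 sum(block) / len(block), FILTER_SETS)
out = adjust_strength(block, apply_alf(block, coeffs), strength=0.5)
```

In a real codec the selection conditions, filter shapes, and strength signalling (e.g. at CTU or APS level) would follow the bitstream syntax; this sketch only shows how a decoder could branch on those conditions without any explicit per-block index.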
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202463667859P | 2024-07-05 | 2024-07-05 | |
| US63/667,859 | 2024-07-05 | ||
| US202463669320P | 2024-07-10 | 2024-07-10 | |
| US63/669,320 | 2024-07-10 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2026008042A1 true WO2026008042A1 (en) | 2026-01-08 |
Family
ID=98317698
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2025/106997 Pending WO2026008042A1 (en) | 2024-07-05 | 2025-07-04 | Method and apparatus of fixed filter set selection of adaptive loop filter in video coding |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2026008042A1 (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200296365A1 (en) * | 2019-03-16 | 2020-09-17 | Mediatek Inc. | Method and Apparatus for Signaling Adaptive Loop Filter Parameters in Video Coding |
| CN112243587A (en) * | 2018-06-01 | 2021-01-19 | 高通股份有限公司 | Block-based adaptive loop filter (ALF) design and signaling |
| CN113853784A (en) * | 2019-05-17 | 2021-12-28 | 高通股份有限公司 | Multiple sets of adaptive loop filters for video coding |
| CN113891076A (en) * | 2016-02-15 | 2022-01-04 | 高通股份有限公司 | Method and apparatus for filtering decoded blocks of video data and storage medium |
| US20230034367A1 (en) * | 2022-09-28 | 2023-02-02 | Intel Corporation | Adaptive loop filter classification and selection for video coding |
| CN116158080A (en) * | 2020-07-24 | 2023-05-23 | 高通股份有限公司 | Multiple Adaptive Loop Filter Sets |
- 2025-07-04: WO application PCT/CN2025/106997 filed as WO2026008042A1 (status: Pending)
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2024067188A1 (en) | Method and apparatus for adaptive loop filter with chroma classifiers by transpose indexes for video coding | |
| WO2024016981A1 (en) | Method and apparatus for adaptive loop filter with chroma classifier for video coding | |
| WO2026008042A1 (en) | Method and apparatus of fixed filter set selection of adaptive loop filter in video coding | |
| WO2025152690A1 (en) | Method and apparatus of adaptive for in-loop filtering of reconstructed video | |
| WO2025139389A1 (en) | Method and apparatus of adaptive loop filter with shared or adaptively refined fixed filters in video coding | |
| WO2024017200A1 (en) | Method and apparatus for adaptive loop filter with tap constraints for video coding | |
| WO2024212779A1 (en) | Method and apparatus of alf adaptive parameters for video coding | |
| WO2024114810A1 (en) | Method and apparatus for adaptive loop filter with fixed filters for video coding | |
| WO2024082946A9 (en) | Method and apparatus of adaptive loop filter sub-shape selection for video coding | |
| WO2025218584A1 (en) | Method and apparatus of alf classifier sub-modes and content-adaptive aps refinement in video coding | |
| WO2024222417A1 (en) | Method and apparatus of chroma alf with residual taps in video coding system | |
| WO2024082899A1 (en) | Method and apparatus of adaptive loop filter selection for positional taps in video coding | |
| WO2026016800A1 (en) | Method and apparatus of alf syntax design for filter selection in video coding | |
| WO2024017010A1 (en) | Method and apparatus for adaptive loop filter with alternative luma classifier for video coding | |
| WO2024016983A1 (en) | Method and apparatus for adaptive loop filter with geometric transform for video coding | |
| WO2024012168A1 (en) | Method and apparatus for adaptive loop filter with virtual boundaries and multiple sources for video coding | |
| WO2024012167A1 (en) | Method and apparatus for adaptive loop filter with non-local or high degree taps for video coding | |
| WO2024146624A1 (en) | Method and apparatus for adaptive loop filter with cross-component taps for video coding | |
| WO2025152997A1 (en) | Method and apparatus of adaptive loop filter with additional modes and taps related to cccm and fixed filters in video coding | |
| WO2024088003A1 (en) | Method and apparatus of position-aware reconstruction in in-loop filtering | |
| WO2025011377A1 (en) | Method and apparatus of unified classification in in-loop filtering in video coding | |
| WO2024055842A1 (en) | Method and apparatus for adaptive loop filter with non-sample taps for video coding | |
| WO2025001782A1 (en) | Method and apparatus of alf complexity reduction for cross-component taps in video coding | |
| WO2025185711A1 (en) | Method and device of alternative clipping in adaptive loop filter | |
| WO2024146428A1 (en) | Method and apparatus of alf with model-based taps in video coding system |