
CN117837140A - History-based derivation of Rice parameters for wavefront parallel processing in video coding - Google Patents


Info

Publication number
CN117837140A
CN117837140A (application No. CN202280056593.6A)
Authority
CN
China
Prior art keywords
ctu
history counter
partition
cidx
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280056593.6A
Other languages
Chinese (zh)
Inventor
Yue Yu (余越)
Haoping Yu (于浩平)
Vladyslav Zakharchenko (弗莱德斯拉夫·扎克哈成科)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Innopeak Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innopeak Technology Inc filed Critical Innopeak Technology Inc
Priority claimed from PCT/US2022/075497 external-priority patent/WO2023028576A2/en
Publication of CN117837140A publication Critical patent/CN117837140A/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/134: Methods or arrangements using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167: Position within a video image, e.g. region of interest [ROI]
    • H04N19/169: Methods or arrangements using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: The coding unit being an image region, e.g. an object
    • H04N19/176: The region being a block, e.g. a macroblock
    • H04N19/42: Methods or arrangements characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436: Using parallelised computational arrangements
    • H04N19/90: Methods or arrangements using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

In some embodiments, a video decoder decodes video from a bitstream. The video decoder accesses a binary string representing a partition of the video and processes each Coding Tree Unit (CTU) in the partition to generate decoded values for the CTU. The processing includes: for the first CTU of a current CTU row, determining whether the current CTU row is the first CTU row in the partition. If so, a history counter is set to an initial value. If not, the history counter is set to the value stored in a history counter storage variable. The video decoder decodes the CTU by calculating a Rice parameter for the CTU based on the history counter and decoding the binary string corresponding to the CTU based on the calculated Rice parameter. After decoding the CTU, the current value of the history counter is stored in the history counter storage variable.

Description

History-based Rice parameter derivation for wavefront parallel processing in video coding
Cross Reference to Related Applications
The present application claims priority from U.S. provisional application No. 63/260,600, entitled "History-Based Rice Parameter Derivations for Wavefront Parallel Processing in Video Coding", filed on August 26, 2021; U.S. provisional application No. 63/262,078, entitled "History-Based Rice Parameter Derivations for Wavefront Parallel Processing in Video Coding", filed on October 4, 2021; and U.S. provisional application No. 63/251,385, entitled "Representation of Bit Depth Range for VVC Operation Range Extension", filed on October 1, 2021. The entire disclosures of the prior applications are incorporated herein by reference.
Technical Field
The present disclosure relates generally to computer-implemented methods and systems for video processing. In particular, the present disclosure relates to history-based Rice parameter derivation for wavefront parallel processing in video coding.
Background
Ubiquitous camera-enabled devices, such as smartphones, tablets, and computers, make capturing video or pictures easier than ever before. However, even a short video can contain a very large amount of data. Video codec techniques (including video encoding and video decoding) compress video data into smaller sizes, enabling a variety of videos to be stored and transmitted. Video coding is widely used, for example in digital TV broadcasting, video transmission over the internet and mobile networks, real-time applications (e.g., video chat, video conferencing), DVD and Blu-ray discs, and so on. To reduce the storage space for storing video and/or the network bandwidth consumed in transmitting video, it is desirable to increase the efficiency of video codec schemes.
Disclosure of Invention
Some embodiments relate to history-based Rice parameter derivation for wavefront parallel processing in video coding. In one example, a method for decoding video from a video bitstream includes: accessing a binary string representing a partition of the video, the partition including a plurality of Coding Tree Units (CTUs) forming one or more CTU rows; for each CTU of the plurality of CTUs in the partition, prior to decoding the CTU, and in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is the first CTU row in the partition; in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter of a color component, used for calculating the Rice parameter, to an initial value; in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter of the color component to a value stored in a history counter storage variable; decoding the CTU, comprising: calculating Rice parameters of a plurality of Transform Units (TUs) in the CTU based on the value of the history counter; decoding the binary strings corresponding to the TUs in the CTU into coefficient values of the TUs based on the calculated Rice parameters; and determining pixel values of the plurality of TUs in the CTU according to the coefficient values; and after decoding the CTU, storing a current value of the history counter in the history counter storage variable in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row.
In another example, a non-transitory computer-readable medium stores program code executable by one or more processing devices to perform operations. The operations include accessing a binary string representing a partition of a video, the partition including a plurality of CTUs forming one or more rows of CTUs; for each CTU of the plurality of CTUs in the partition, prior to decoding the CTU, and in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is the first CTU row in the partition; in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter for color components to an initial value; in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter for the color component to a value stored in a history counter storage variable; decoding the CTU, comprising: calculating Rice parameters of the TUs in the CTU based on the value of the history counter; decoding the binary strings corresponding to the TUs in the CTU into coefficient values of the TUs based on the calculated Rice parameter; and determining pixel values of the plurality of TUs in the CTU according to the coefficient value; and after decoding the CTU, storing a current value of the history counter in the history counter storage variable in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row.
In another example, a system includes a processing device and a non-transitory computer-readable medium communicatively coupled to the processing device. The processing device is configured to execute program code stored in the non-transitory computer-readable medium to perform operations. The operations include accessing a binary string representing a partition of a video, the partition including a plurality of CTUs forming one or more rows of CTUs; for each CTU of the plurality of CTUs in the partition, prior to decoding the CTU, and in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is the first CTU row in the partition; in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter of a color component to an initial value; in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter of the color component, used for calculating the Rice parameter, to a value stored in a history counter storage variable; decoding the CTU, comprising: calculating the Rice parameters of the TUs in the CTU based on the value of the history counter; decoding the binary strings corresponding to the plurality of TUs in the CTU into coefficient values of the TUs based on the calculated Rice parameters; and determining pixel values of the plurality of TUs in the CTU according to the coefficient values; and after decoding the CTU, storing a current value of the history counter in the history counter storage variable in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row.
In another example, a video encoding method includes accessing a partition of the video, the partition including a plurality of CTUs forming one or more rows of CTUs; processing the partition of the video to generate a binary representation of the partition, the processing comprising: for each CTU of the plurality of CTUs in the partition, prior to encoding the CTU, and in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is the first CTU row in the partition; in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter of a color component, used for calculating the Rice parameter, to an initial value; in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter of the color component to a value stored in a history counter storage variable; and encoding the CTU, comprising: calculating the Rice parameters of the plurality of TUs in the CTU based on the value of the history counter, and encoding coefficient values of the plurality of TUs into binary representations corresponding to the plurality of TUs in the CTU based on the calculated Rice parameters; after encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the history counter in the history counter storage variable; and encoding the binary representation of the partition into a bitstream of the video.
In another example, a non-transitory computer-readable medium stores program code executable by one or more processing devices to perform operations. The operations include accessing a partition of a video, the partition including a plurality of CTUs forming one or more rows of CTUs; processing the partition of the video to generate a binary representation of the partition, the processing comprising: for each CTU of the plurality of CTUs in the partition, prior to encoding the CTU, and in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is the first CTU row in the partition; in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter of a color component to an initial value; in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter of the color component, used for calculating the Rice parameter, to a value stored in a history counter storage variable; and encoding the CTU, comprising: calculating the Rice parameters of the TUs in the CTU based on the value of the history counter; encoding coefficient values of the plurality of TUs into binary representations corresponding to the plurality of TUs in the CTU according to the calculated Rice parameters; after encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the history counter in the history counter storage variable; and encoding the binary representation of the partition into a bitstream of the video.
In another example, a system includes a processing device and a non-transitory computer-readable medium communicatively coupled to the processing device. The processing device is configured to execute program code stored in the non-transitory computer-readable medium to perform operations comprising: accessing a partition of the video, the partition comprising a plurality of CTUs forming one or more rows of CTUs; processing the partition of the video to generate a binary representation of the partition, the processing comprising: for each CTU of the plurality of CTUs in the partition, prior to encoding the CTU, and in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is the first CTU row in the partition; in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter of a color component to an initial value; in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter of the color component, used for calculating the Rice parameter, to a value stored in a history counter storage variable; and encoding the CTU, comprising: calculating the Rice parameters of the TUs in the CTU based on the value of the history counter; encoding coefficient values of the plurality of TUs into binary representations corresponding to the plurality of TUs in the CTU according to the calculated Rice parameters; after encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the history counter in the history counter storage variable; and encoding the binary representation of the partition into a bitstream of the video.
References to these illustrative embodiments are not intended to limit or define the disclosure, but rather to provide examples that aid understanding. Additional embodiments, along with further description, are discussed in the detailed description below.
Drawings
The features, embodiments and advantages of the present disclosure may be better understood when the following detailed description is read with reference to the accompanying drawings.
Fig. 1 is a block diagram illustrating an example of a video encoder configured to implement embodiments presented herein.
Fig. 2 is a block diagram illustrating an example of a video decoder configured to implement embodiments presented herein.
Fig. 3 depicts an example of CTU partitioning of images in video according to some embodiments of the present disclosure.
Fig. 4 depicts an example of Coding Unit (CU) partitioning of CTUs according to some embodiments of the present disclosure.
Fig. 5 depicts an example of a coded block having a predetermined order of elements for processing the coded block.
Fig. 6 depicts an example of a template pattern for computing the local sum variable for coefficients located near TU boundaries.
FIG. 7 depicts an example of tiles with wavefront parallel processing enabled.
Fig. 8 depicts an example of a frame, the tiles contained in the frame, and the CTUs used for computing the history counter, according to some embodiments of the present disclosure.
Fig. 9 depicts one example of a process of encoding a partition of video according to some embodiments of the present disclosure.
Fig. 10 depicts one example of a process of decoding a partition of video according to some embodiments of the present disclosure.
Fig. 11 depicts another example of a process of encoding a partition of video according to some embodiments of the present disclosure.
Fig. 12 depicts another example of a process of decoding a partition of video according to some embodiments of the present disclosure.
FIG. 13 depicts an example of a computing system that may be used to implement some embodiments of the present disclosure.
Detailed Description
Various embodiments provide history-based Rice parameter derivation for wavefront parallel processing in video coding. As discussed above, more and more video data is generated, stored, and transmitted. It is therefore beneficial to improve the efficiency of video codec technology so that less data is used to represent video without compromising the visual quality of the decoded video. One way to increase coding efficiency is to compress the processed video samples into a binary stream using as few binary bits as possible through entropy coding. On the other hand, since video typically contains a large amount of data, it is also beneficial to reduce processing time during encoding and decoding. For this purpose, parallel processing may be employed in video encoding and video decoding.
In entropy coding, video samples are binarized into bins, and a codec algorithm such as context-adaptive binary arithmetic coding (CABAC) may further compress the bins into binary bits. Binarization requires calculating binarization parameters, such as the Rice parameter used in the combination of the Truncated Rice (TR) and limited k-th order Exp-Golomb (EGK) binarization processes specified in the Versatile Video Coding (VVC) specification. To improve codec efficiency, history-based Rice parameter derivation is used. In such derivation, the Rice parameters of TUs in the current CTU of a partition (e.g., a picture, slice, or tile) are derived based on a history counter (denoted StatCoeff) that is calculated from coefficients in previous TUs of the current CTU and of previous CTUs in the partition. The history counter is then used to derive a replacement variable (denoted HistValue) that is in turn used to derive the Rice parameter. The history counter may be updated as TUs are processed. In some examples, the replacement variable for a TU remains unchanged even if the history counter is updated.
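The counter update and replacement-variable derivation described above can be sketched as follows. The formulas follow the VVC operation-range-extension design as described here and should be treated as an assumption rather than a normative restatement:

```python
from math import floor, log2

def update_stat_coeff(stat_coeff, abs_remainder):
    # Exponential-moving-average-style update of the history counter, applied
    # when the first abs_remainder of a TU is coded (assumed formula from the
    # VVC range-extension design); abs_remainder must be >= 1.
    return (stat_coeff + floor(log2(abs_remainder)) + 2) >> 1

def hist_value(stat_coeff):
    # Replacement variable derived from the history counter and used in the
    # subsequent Rice parameter derivation.
    return 1 << stat_coeff
```

For example, a counter of 4 observing an abs_remainder of 8 stays at (4 + 3 + 2) >> 1 = 4, so the counter tracks the typical magnitude of coded remainders.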
The dependency between the current CTU and previous CTUs in the partition used to calculate the history counter may conflict with, limit, or even prevent the use of parallel processing, resulting in unstable or inefficient video encoding. Various embodiments described herein address these issues by reducing or eliminating dependencies between some CTUs in a partition, enabling parallel processing to speed up video processing, or by detecting and avoiding conflicts before they occur. The following non-limiting examples are provided to introduce some embodiments.
In one embodiment, dependencies between CTUs in different CTU rows when calculating the history counter are removed, thereby eliminating parallel-processing dependency conflicts. For example, the history counter may be re-initialized for each CTU row of the partition: the history counter is set to an initial value before calculating the Rice parameter of the first CTU in the CTU row, and subsequent history counter values are calculated from the history counter values of previous TUs in the same CTU row. In this way, the dependency between CTUs in history-based Rice parameter derivation is limited to the same CTU row and does not interfere with parallel processing between different CTU rows, while still benefiting from the coding gain achieved by history-based Rice parameter derivation. In addition, the history-based Rice parameter derivation process is simplified and its computational complexity reduced.
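A minimal sketch of this per-row re-initialization, with `decode_ctu` as a hypothetical stand-in for actual CTU decoding (it takes a CTU and the current counter and returns the updated counter; the returned per-row list is only for illustration):

```python
def decode_partition_row_reset(ctu_rows, init_value, decode_ctu):
    # Re-initialize the history counter at the start of every CTU row so that
    # no counter state crosses row boundaries; rows can then run in parallel.
    finals = []
    for row in ctu_rows:
        stat_coeff = init_value              # per-row re-initialization
        for ctu in row:
            stat_coeff = decode_ctu(ctu, stat_coeff)
        finals.append(stat_coeff)
    return finals
```

With a toy `decode_ctu` that just accumulates, each row's final counter depends only on that row's own CTUs.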
In another embodiment, the dependency between CTUs when calculating the history counter is kept consistent with the dependency between CTUs in parallel processing. For example, parallel encoding may be implemented between the CTU rows of a partition, with an N-CTU delay between two consecutive CTU rows. That is, the processing of a CTU row starts after N CTUs in the previous CTU row have been processed. In this case, the history counter of a CTU row may be calculated based on samples in the first N or fewer CTUs of the previous CTU row. This may be implemented by a store-and-synchronize process: after the last TU in the first CTU of a CTU row is processed, the history counter is stored in a storage variable. The history counter is then synchronized with the stored value in the storage variable before processing the first TU in the first CTU of the subsequent CTU row.
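The store-and-synchronize flow can be sketched as below; `decode_ctu` and the returned per-row trace are illustrative assumptions, and the single storage variable mirrors the CABAC-context synchronization used in wavefront parallel processing:

```python
def process_rows_wpp_sync(ctu_rows, init_value, decode_ctu):
    # After the first CTU of each row is processed, save the history counter
    # in a storage variable; each subsequent row starts from the saved value,
    # so the cross-row dependency never exceeds one CTU of the previous row.
    stored = init_value
    finals = []
    for row_index, row in enumerate(ctu_rows):
        stat_coeff = init_value if row_index == 0 else stored
        for ctu_index, ctu in enumerate(row):
            stat_coeff = decode_ctu(ctu, stat_coeff)
            if ctu_index == 0:
                stored = stat_coeff          # store after the row's first CTU
        finals.append(stat_coeff)
    return finals
```

In the alternative derivation described next, the replacement variable would be stored and restored alongside the counter in the same way.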
In some examples, an alternative history-based Rice parameter derivation is used. In this alternative, the replacement variable HistValue is updated as soon as the history counter StatCoeff is updated while processing a TU. To avoid dependency conflicts with parallel encoding, the dependency between CTUs when calculating the history counter may similarly be limited to no more than N CTUs. Likewise, a store-and-synchronize process may be implemented: after the last TU in the first CTU of a CTU row is processed, the history counter and the replacement variable are each stored in a storage variable. The history counter and the replacement variable are then synchronized with the stored values in the corresponding storage variables before processing the first TU in the first CTU of the subsequent CTU row.
In this way, the dependency between CTUs in two consecutive CTU rows when calculating the history counter is limited to not exceed (i.e., to remain consistent with) the dependency between CTUs in parallel encoding. Thus, the history counter calculation does not interfere with parallel processing while still benefiting from the coding gain achieved by history-based Rice parameter derivation.
Alternatively, parallel processing and history-based Rice parameter derivation are prevented from coexisting in the bitstream. For example, the video encoder may determine whether parallel processing has been enabled. If parallel processing has been enabled, history-based Rice parameter derivation is disabled; otherwise, it is enabled. Similarly, if the video encoder determines that history-based Rice parameter derivation has been enabled, parallel processing is disabled, and vice versa.
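This mutual-exclusion rule reduces to a simple conformance check; the flag names below are illustrative assumptions, not actual VVC syntax element names:

```python
def tools_combination_valid(wpp_enabled, history_rice_enabled):
    # A conforming encoder (or a bitstream checker) would reject any
    # configuration in which both wavefront parallel processing and
    # history-based Rice derivation are enabled at the same time.
    return not (wpp_enabled and history_rice_enabled)
```

An encoder could call this before emitting parameter sets and disable one tool when the check fails.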
Using the Rice parameters determined as discussed above, the video encoder may binarize prediction residual data (e.g., quantized transform coefficients of a residual) into bins, and further compress the bins into binary bits for inclusion in a video bitstream using an entropy encoding algorithm. On the decoder side, the decoder may decode the bitstream back into bins, determine the Rice parameter using any of the methods (or any combination of the methods) described above, and then determine the coefficients from the bins. The coefficients may be further dequantized and inverse transformed to reconstruct the video blocks for display.
In some embodiments, the bit depth of the video samples (e.g., the bit depth used to determine the initial value of the history counter StatCoeff) may be determined from a sequence parameter set (Sequence Parameter Set, SPS) syntax element sps_bitdepth_minus8. The value of the SPS syntax element sps_bitdepth_minus8 is in the range of 0 to 8. Similarly, the size of a decoded picture buffer (Decoded Picture Buffer, DPB) for storing a decoded picture may be determined based on a video parameter set (Video Parameter Set, VPS) syntax element vps_ols_dpb_bitdepth_minus8. The value of the VPS syntax element vps_ols_dpb_bitdepth_minus8 is in the range of 0 to 8. The DPB may be allocated memory according to the determined size of the DPB. The determined bit depth and DPB may be used throughout decoding the video bitstream into pictures.
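The two derivations above reduce to simple arithmetic with range checks; the function names are illustrative:

```python
def bit_depth_from_sps(sps_bitdepth_minus8):
    # BitDepth = 8 + sps_bitdepth_minus8, with the syntax element
    # constrained to the range 0..8 (so BitDepth ranges from 8 to 16).
    if not 0 <= sps_bitdepth_minus8 <= 8:
        raise ValueError("sps_bitdepth_minus8 must be in [0, 8]")
    return 8 + sps_bitdepth_minus8

def dpb_bit_depth_from_vps(vps_ols_dpb_bitdepth_minus8):
    # Same derivation for the bit depth used to size decoded picture
    # buffer storage, from the VPS syntax element.
    if not 0 <= vps_ols_dpb_bitdepth_minus8 <= 8:
        raise ValueError("vps_ols_dpb_bitdepth_minus8 must be in [0, 8]")
    return 8 + vps_ols_dpb_bitdepth_minus8
```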
As described herein, some embodiments improve video coding efficiency and computational efficiency by coordinating history-based Rice parameter derivation with parallel encoding. By doing so, conflicts between history-based Rice parameter derivation and parallel encoding can be avoided, improving the stability of the encoding and decoding processes. Furthermore, by limiting the dependency between CTUs in history-based Rice parameter derivation to not exceed that in parallel encoding, coding gain can still be achieved through history-based Rice parameter derivation without sacrificing the computational efficiency of the codec process. These techniques may serve as effective coding tools in future video coding standards.
Referring now to the drawings, FIG. 1 is a block diagram illustrating an example of a video encoder 100 configured to implement embodiments presented herein. In the example shown in fig. 1, video encoder 100 includes a partitioning module 112, a transform module 114, a quantization module 115, an inverse quantization module 118, an inverse transform module 119, an in-loop filter module 120, an intra-prediction module 126, an inter-prediction module 124, a motion estimation module 122, a decoded image buffer 130, and an entropy encoding module 116.
The input to the video encoder 100 is an input video 102 comprising a sequence of images (also referred to as frames or pictures). In a block-based video encoder, for each image, the video encoder 100 employs the partitioning module 112 to divide the image into blocks 104, each of which contains a plurality of pixels. These blocks may be macroblocks, CTUs, CUs, prediction units, and/or prediction blocks. One image may include blocks of different sizes, and the block partitioning may also differ between images of the video. Each block may be encoded using different prediction types, such as intra prediction, inter prediction, or a hybrid of intra and inter prediction.
In general, the first image of a video signal is an intra-predicted image that is encoded using only intra prediction. In intra-prediction mode, a block of an image is predicted using only data from the same image. An intra-predicted image can be decoded without information from other images. To perform intra prediction, the video encoder 100 shown in Fig. 1 may employ the intra-prediction module 126. The intra-prediction module 126 is configured to generate an intra-prediction block (prediction block 134) using reconstructed samples in the reconstructed blocks 136 of neighboring blocks of the same image. Intra prediction is performed according to the intra-prediction mode selected for the block. The video encoder 100 then calculates the difference between the block 104 and the intra-prediction block 134. This difference is referred to as the residual block 106.
To further remove redundancy from the block, transform module 114 transforms residual block 106 into the transform domain by applying a transform to the samples in the block. Examples of transforms may include, but are not limited to, discrete cosine transforms (Discrete Cosine Transform, DCT) or discrete sine transforms (Discrete Sine Transform, DST). The transformed values may be referred to as transform coefficients representing a residual block in the transform domain. In some examples, the residual block may be directly quantized without being transformed by the transform module 114. This is referred to as a transform skip mode.
The video encoder 100 may also quantize the transform coefficients using the quantization module 115 to obtain quantized coefficients. Quantization involves dividing the samples by a quantization step and subsequent rounding, while inverse quantization involves multiplying the quantization value by the quantization step. This quantization process is known as scalar quantization. Quantization is used to reduce the dynamic range of video samples (transformed or untransformed) so that fewer binary bits are used to represent the video samples.
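The divide-and-round description above can be made concrete with a minimal sketch. This is an illustration only, not the actual VVC quantizer, which additionally applies scaling lists and rate-distortion-optimized quantization:

```python
def quantize(coeff: float, qstep: float) -> int:
    # Scalar quantization: divide by the quantization step, then round.
    return int(round(coeff / qstep))

def dequantize(level: int, qstep: float) -> float:
    # Inverse quantization: multiply the quantized level by the step.
    return level * qstep

# A larger step yields coarser quantization and a smaller dynamic range:
# with qstep = 8, the coefficients 0..63 collapse into levels 0..8.
levels = {quantize(c, 8.0) for c in range(64)}
```

Note that quantization is lossy: dequantizing a level recovers only a value near the original coefficient, which is the source of compression distortion.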
Quantization of coefficients/samples within a block can be done independently, and this quantization method is used in some existing video compression standards, such as h.264 and HEVC. For an N by M block, the 2-D coefficients of the block may be converted into a 1-D array using a particular scan order for coefficient quantization and encoding. Quantization of the intra-block coefficients may utilize scan order information. For example, the quantization of a given coefficient in a block may depend on the state of previously quantized values along the scan order. To further increase codec efficiency, more than one quantizer may be used. Which quantizer is used to quantize the current coefficient depends on information preceding the current coefficient in the encoding/decoding scan order. This quantization method is called dependent quantization.
Quantization step sizes may be used to adjust the quantization levels. For example, for scalar quantization, different quantization steps may be applied to achieve finer or coarser quantization. A smaller quantization step corresponds to a finer quantization, and a larger quantization step corresponds to a coarser quantization. The quantization step size may be indicated by a quantization parameter (Quantization Parameter, QP). Quantization parameters are provided in the encoded bitstream of video so that a video decoder can apply the same quantization parameters for decoding.
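For intuition, a sketch of the QP-to-step mapping, assuming the HEVC/VVC-style relation in which the step size roughly doubles for every increase of 6 in QP:

```python
def qp_to_qstep(qp: int) -> float:
    # Approximate mapping used in HEVC/VVC-style codecs:
    # Qstep ~= 2 ** ((QP - 4) / 6), i.e. adding 6 to QP doubles the step.
    return 2.0 ** ((qp - 4) / 6.0)

# QP 28 quantizes exactly twice as coarsely as QP 22 under this mapping.
ratio = qp_to_qstep(28) / qp_to_qstep(22)
```

This is why QP deltas of 6 are a natural unit when trading bit rate for quality.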
The quantized samples are then encoded by entropy encoding module 116 to further reduce the size of the video signal. The entropy encoding module 116 is configured to apply an entropy encoding algorithm to the quantized samples. In some examples, the quantized samples are binarized into binary items (bins), and the encoding algorithm further compresses the bins into bits. Examples of binarization methods include, but are not limited to, truncated Rice (TR) and limited k-th order Exponential-Golomb (EGk) binarization. In order to improve the coding efficiency, a history-based Rice parameter derivation method is used, in which the Rice parameters derived for a TU are based on variables obtained or updated from previous TUs. Examples of entropy coding algorithms include, but are not limited to, Variable-Length Coding (VLC) schemes, Context-Adaptive VLC schemes (CAVLC), arithmetic coding schemes, binarization, CABAC, Syntax-based context-adaptive Binary Arithmetic Coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or other entropy coding techniques. Entropy encoded data is added to the bitstream of the output encoded video 132.
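Since Rice parameters are central to the rest of this disclosure, a simplified Golomb-Rice binarization may help. This sketch omits the prefix cap and the EGk escape of the actual TR/limited-EGk combination used in VVC:

```python
def rice_bins(value: int, k: int) -> str:
    # Golomb-Rice code with parameter k: a unary prefix for (value >> k),
    # followed by the k low-order bits of value as a fixed-length suffix.
    prefix = "1" * (value >> k) + "0"
    suffix = format(value & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
    return prefix + suffix

# A larger k shortens the unary prefix for large values at the cost of a
# longer fixed suffix, which is why k is adapted to local level statistics.
```

For example, the value 5 is coded as "1101" with k = 1 but as "1001" with k = 2, so the best k depends on the magnitude of the levels being coded.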
As described above, reconstructed blocks 136 from neighboring blocks are used for intra prediction of an image block. Generating a reconstructed block 136 of a block involves calculating a reconstructed residual for the block. The reconstructed residual may be determined by applying inverse quantization and inverse transform to the quantized residual of the block. The inverse quantization module 118 is configured to apply inverse quantization to the quantized samples to obtain dequantized coefficients. The inverse quantization module 118 applies the inverse of the quantization scheme applied by the quantization module 115 by using the same quantization step size as the quantization module 115. The inverse transform module 119 is configured to apply an inverse transform of the transform applied by the transform module 114, such as an inverse DCT or an inverse DST, to the dequantized samples. The output of inverse transform module 119 is the reconstructed residual of the block in the pixel domain. The reconstructed residual may be added to a prediction block 134 of the block to obtain a reconstructed block 136 in the pixel domain. The inverse transform module 119 is not applied to those blocks that skip transforms. The dequantized samples are the reconstructed residuals of the block.
The block in a subsequent image following the first intra-predicted image may be encoded using inter-prediction or intra-prediction. In inter prediction, the prediction of a block in an image is from one or more previously encoded video images. The video encoder 100 uses an inter prediction module 124 to perform inter prediction. The inter prediction module 124 is configured to perform motion compensation on the block based on the motion estimation provided by the motion estimation module 122.
The motion estimation module 122 compares the current block 104 of the current image with the decoded reference image 108 for motion estimation. Decoded reference pictures 108 are stored in a decoded picture buffer 130. The motion estimation module 122 selects the reference block from the decoded reference pictures 108 that best matches the current block. The motion estimation module 122 also identifies an offset between the location (e.g., x, y coordinates) of the reference block and the location of the current block. This offset is referred to as a Motion Vector (MV) and is provided to the inter prediction module 124. In some cases, a plurality of reference blocks are identified for the block in the plurality of decoded reference pictures 108. Thus, a plurality of motion vectors are generated and provided to the inter prediction module 124.
The inter prediction module 124 performs motion compensation using the motion vector and other inter prediction parameters to generate a prediction for the current block (i.e., the inter prediction block 134). For example, based on the motion vector, the inter prediction module 124 may locate a prediction block pointed to by the motion vector in a corresponding reference image. If there is more than one prediction block, these prediction blocks are combined with some weights to generate the prediction block 134 of the current block.
For inter-prediction blocks, video encoder 100 may subtract inter-prediction block 134 from block 104 to generate residual block 106. The residual block 106 may be transformed, quantized, and entropy encoded in the same manner as the residual of the intra-prediction block discussed above. Likewise, a reconstructed block 136 of the inter prediction block may be obtained by inverse quantizing, inverse transforming, and then combining the residual with the corresponding prediction block 134.
The reconstruction block 136 is processed by the in-loop filter module 120 to obtain the decoded image 108 for motion estimation. The in-loop filter module 120 is configured to smooth pixel transitions, thereby improving video quality. The in-loop filter module 120 may be configured to implement one or more in-loop filters, such as a deblocking filter, or a Sample-Adaptive Offset (SAO) filter, or an Adaptive loop filter (Adaptive Loop Filter, ALF), or the like.
Fig. 2 depicts an example of a video decoder 200 configured to implement embodiments presented herein. The video decoder 200 processes the encoded video 202 in the bitstream and generates decoded images 208. In the example shown in fig. 2, video decoder 200 includes entropy decoding module 216, inverse quantization module 218, inverse transform module 219, in-loop filter module 220, intra prediction module 226, inter prediction module 224, and decoded image buffer 230.
The entropy decoding module 216 is configured to perform entropy decoding of the encoded video 202. The entropy decoding module 216 decodes the quantized coefficients, coding parameters including intra-prediction parameters and inter-prediction parameters, and other information. In some examples, entropy decoding module 216 decodes the bitstream of encoded video 202 into a binary representation and then converts the binary representation into quantization levels of the coefficients. The entropy decoded coefficients are then inverse quantized by inverse quantization module 218 and then inverse transformed to the pixel domain by inverse transform module 219. The inverse quantization module 218 and the inverse transform module 219 function similarly to the inverse quantization module 118 and the inverse transform module 119, respectively, as described above with respect to fig. 1. The inverse transformed residual block may be added to the corresponding prediction block 234 to generate a reconstructed block 236. The inverse transform module 219 is not applied to blocks that skip transforms; for those blocks, the dequantized samples generated by the inverse quantization module 218 are used to generate the reconstructed block 236.
A prediction block 234 for a particular block is generated based on the prediction mode of the block. If the encoding parameters of a block indicate that the block is intra-predicted, a reconstructed block 236 of a reference block in the same image may be fed to the intra-prediction module 226 to generate a predicted block 234 of the block. If the encoding parameters of the block indicate that the block is inter predicted, a prediction block 234 is generated by the inter prediction module 224. The intra-prediction module 226 and the inter-prediction module 224 function similarly to the intra-prediction module 126 and the inter-prediction module 124, respectively, of fig. 1.
As discussed above with respect to fig. 1, inter prediction involves one or more reference pictures. The video decoder 200 generates a decoded image 208 of the reference image by applying an in-loop filter module 220 to the reconstructed block of the reference image. Decoded image 208 is stored in decoded image buffer 230 for use by inter prediction module 224 as well as for output.
Referring now to fig. 3, fig. 3 depicts an example of CTU partitioning of images in video according to some embodiments of the present disclosure. As discussed above with respect to fig. 1 and 2, the image is divided into blocks (such as CTU 302 in VVC shown in fig. 3) to encode the image of the video. For example, CTU 302 may be a block of 128x128 pixels. CTUs are processed according to an order such as that shown in fig. 3. In some examples, each CTU 302 in the image may be partitioned into one or more CUs 402 as shown in fig. 4, which may be further partitioned into prediction units or TUs for prediction and transformation. CTU 302 may be partitioned into CUs 402 in different ways depending on the codec scheme. For example, in VVC, CU 402 may be rectangular or square, and may be encoded without further division into prediction units or TUs. Each CU 402 may be as large as its root CTU 302 or a subdivision of the root CTU 302 as small as a 4x4 block. As shown in fig. 4, dividing CTU 302 into CUs 402 in VVC may be quadtree splitting, binary tree splitting, or ternary tree splitting. In fig. 4, the solid lines indicate quadtree splits, and the broken lines indicate binary tree or ternary tree splits.
As discussed above with respect to fig. 1 and 2, quantization is used to reduce the dynamic range of elements of a block in a video signal, thereby using fewer binary bits to represent the video signal. In some examples, the elements at a particular location of a block are referred to as coefficients prior to quantization. After quantization, the quantized value of a coefficient is referred to as a "quantization level" or "level". Quantization typically involves dividing by a quantization step size and subsequent rounding, while inverse quantization involves multiplying by the quantization step size. This quantization process is also known as scalar quantization. Quantization of intra-block coefficients may be performed independently, and such independent quantization methods are used in some existing video compression standards, such as h.264, HEVC, etc. In other examples, dependent quantization is employed, such as in VVC.
For an N by M block, the 2-D coefficients of the block may be converted into a 1-D array using a particular scan order for coefficient quantization and encoding, and encoded and decoded using the same scan order. Fig. 5 shows an example of a coding block (such as a TU) having a predetermined scan order for processing coefficients of the coding block. In this example, the encoding block 500 has a size of 8x8, and processing begins at the lower-right corner position L0 and ends at the upper-left corner position L63. If block 500 is a transformed block, the predetermined order shown in fig. 5 begins at the highest frequency and ends at the lowest frequency. In some examples, processing of the block (such as quantization and binarization) begins with the first non-zero element of the block according to the predetermined scan order. For example, if the coefficients at positions L0-L17 are all zero and the coefficient at L18 is non-zero, processing begins with the coefficient at L18 and is performed for each coefficient from L18 onward in scan order.
Residual coding
In video coding, residual coding is used to convert quantization levels into a bitstream. After quantization, there are NxM quantization levels for an NxM TU coded block. The NxM levels may be zero or non-zero values. Non-zero levels that are not binary are further binarized into binary items. CABAC may further compress the binary items into bits. Furthermore, there are two coding methods based on context modeling. One method adaptively updates a context model based on neighboring coding information. This method is called a context-coding method, and a binary item coded in this way is called a context-coded binary item. In contrast, the other method assumes that the probability of 1 or 0 is always 50% and therefore always uses fixed context modeling without adaptation. This method is called a bypass method, and a binary item encoded in this way is called a bypass binary item.
For a regular residual coding (Regular Residual Coding, RRC) block in VVC, the position of the last non-zero level is defined as the position of the last non-zero level along the coding scan order. The representation of the 2-D coordinates of the last non-zero level (last_sig_coeff_x and last_sig_coeff_y) includes a total of four prefix and suffix syntax elements, namely last_sig_coeff_x_prefix, last_sig_coeff_y_prefix, last_sig_coeff_x_suffix, and last_sig_coeff_y_suffix. The syntax elements last_sig_coeff_x_prefix and last_sig_coeff_y_prefix are first encoded using a context encoding method. If last_sig_coeff_x_suffix and last_sig_coeff_y_suffix are present, they are encoded using a bypass method. The RRC block may consist of several predefined sub-blocks. The syntax element sb_coded_flag is used to indicate whether all levels of the current sub-block are equal to zero. If sb_coded_flag is equal to 1, there is at least one non-zero coefficient in the current sub-block. If sb_coded_flag is equal to 0, all coefficients in the current sub-block are zero. However, the sb_coded_flag of the last non-zero sub-block, i.e., the sub-block containing the last non-zero level according to the coding scan order, is derived to be 1 from last_sig_coeff_x and last_sig_coeff_y without being encoded into the bitstream. Furthermore, the sb_coded_flag of the upper-left sub-block containing the DC position is also derived to be 1 without being encoded into the bitstream. The sb_coded_flag syntax elements present in the bitstream are encoded by a context encoding method. As discussed above with respect to fig. 5, the RRC encodes sub-block by sub-block in reverse coding scan order, starting with the last non-zero sub-block.
To guarantee worst-case throughput, a predefined value RemBinsPass1 is used to limit the maximum number of context-encoded binary items. Within a sub-block, the RRC encodes the level of each location in reverse coding scan order. If RemBinsPass1 is greater than 4, then when encoding the current level, a flag named sig_coeff_flag is first encoded into the bitstream to indicate whether the level is zero or non-zero. If the level is non-zero, abs_level_gtx_flag[n][0] is encoded to indicate whether the absolute level is 1 or greater than 1, where n is an index along the scan order of the current location within the sub-block. If the absolute level is greater than 1, par_level_flag is encoded to indicate whether the level is odd or even, and abs_level_gtx_flag[n][1] will then be present. The flags par_level_flag and abs_level_gtx_flag[n][1] together indicate whether the level is 2, 3, or greater than 3. After each of the above syntax elements is encoded as a context-encoded binary item, the value of RemBinsPass1 is decreased by 1.
If the absolute level is greater than 3, or the value of RemBinsPass1 is not greater than 4, then after the above binary items are encoded by the context encoding method, two other syntax elements, abs_remainder and dec_abs_level, may be encoded as bypass-encoded binary items for the remaining levels. Furthermore, the sign of each level within a block is also encoded to represent the quantization level, and is encoded as a bypass-encoded binary item.
Another residual coding method uses abs_level_gtxX_flag and remaining levels to enable conditional parsing of syntax elements for level coding of residual blocks; the corresponding binarization of the absolute values of the levels is shown in Table 1. Here, abs_level_gtxX_flag describes whether the absolute value of the level is greater than X, where X is an integer, such as 0, 1, 2, or N. If abs_level_gtxY_flag is 0, where Y is an integer between 0 and N-1, then abs_level_gtx(Y+1)_flag will not be present. If abs_level_gtxY_flag is 1, then abs_level_gtx(Y+1)_flag will be present. Furthermore, if abs_level_gtxN_flag is 0, there will be no remaining level. When abs_level_gtxN_flag is 1, there will be a remaining level, which represents the value after (N+1) is subtracted from the level.
TABLE 1. Residual coding based on abs_level_gtxX_flag and remainder

abs(level)           0  1  2  3  4  5  6  7  8  9 10 11 12
abs_level_gtx0_flag  0  1  1  1  1  1  1  1  1  1  1  1  1
abs_level_gtx1_flag     0  1  1  1  1  1  1  1  1  1  1  1
abs_level_gtx2_flag        0  1  1  1  1  1  1  1  1  1  1
abs_level_gtx3_flag           0  1  1  1  1  1  1  1  1  1
Remainder                        0  1  2  3  4  5  6  7  8
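The flag cascade of Table 1 can be expressed procedurally. The following sketch (with N = 3, as in the table) reproduces which flags are present and the value of the remainder:

```python
def gtx_binarization(abs_level: int, n: int = 3):
    # Emit abs_level_gtxX_flag for X = 0..n; each flag is present only if
    # the previous flag was 1. The remainder abs_level - (n + 1) is
    # present only when abs_level_gtxN_flag is 1.
    flags = []
    for x in range(n + 1):
        flag = 1 if abs_level > x else 0
        flags.append(flag)
        if flag == 0:
            break
    remainder = abs_level - (n + 1) if flags[-1] == 1 else None
    return flags, remainder
```

For example, abs(level) = 3 produces the flags 1, 1, 1, 0 with no remainder, while abs(level) = 12 produces 1, 1, 1, 1 with remainder 8, matching the corresponding columns of Table 1.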
For blocks encoded in transform skip residual coding (Transform Skip Residual Coding, TSRC) mode, the TSRC encodes sub-block by sub-block in the coding scan order, starting from the upper-left sub-block. Similarly, the syntax element sb_coded_flag is used to indicate whether all residuals of the current sub-block are equal to zero. The sb_coded_flag syntax elements of all sub-blocks except the last sub-block are encoded into the bitstream. If sb_coded_flag is not equal to 1 for any of the sub-blocks before the last sub-block, the sb_coded_flag of the last sub-block is derived to be 1 and is not encoded into the bitstream. To guarantee worst-case throughput, a predefined value RemCcbs is used to limit the maximum number of context-encoded binary items. If the current sub-block has a non-zero level, the TSRC encodes the level of each location using the coding scan order. If RemCcbs is greater than 4, the following syntax elements are encoded using the context encoding method. For each level, sig_coeff_flag is first encoded into the bitstream to indicate whether the level is zero or non-zero. If the level is non-zero, coeff_sign_flag is encoded to indicate whether the level is positive or negative. abs_level_gtx_flag[n][0] is then encoded to indicate whether the current absolute level of the current position is greater than 1, where n is an index along the scan order of the current position within the sub-block. If abs_level_gtx_flag[n][0] is not zero, par_level_flag is encoded. After each of the above syntax elements is encoded with the context encoding method, the value of RemCcbs is reduced by 1.
After the above syntax elements are encoded for all locations within the current sub-block, if RemCcbs is still greater than 4, up to four flags abs_level_gtx_flag[n][j] are encoded using the context encoding method, where n is an index along the scan order of the current location within the sub-block and j ranges from 1 to 4. After each abs_level_gtx_flag[n][j] is encoded, the value of RemCcbs is decreased by 1. If RemCcbs is not greater than 4, the syntax element abs_remainder is encoded, if necessary, for the current position within the sub-block using a bypass method. For positions whose absolute level is fully encoded with the abs_remainder syntax element by the bypass method, coeff_sign_flag is also encoded by the bypass method. In summary, a predefined counter, RemBinsPass1 in RRC or RemCcbs in TSRC, limits the total number of context-encoded binary items and ensures worst-case throughput.
Rice parameter derivation
In the current RRC design in VVC, two syntax elements, abs_remainder and dec_abs_level, encoded as bypass binary items may be present in the bitstream for the remaining levels. Both abs_remainder and dec_abs_level are binarized by a combination of the truncated Rice (TR) and limited k-th order Exponential-Golomb (EGk) binarization processes specified in the VVC specification, which requires a Rice parameter to binarize a given level. To obtain the optimal Rice parameter, a local summation method as described below is used.
The array AbsLevel[xC][yC] represents an array of absolute values of transform coefficient levels of the current transform block (TB) for the color component index cIdx. Given the array AbsLevel[x][y] of a TB with color component index cIdx and top-left luminance position (x0, y0), the local sum variable locSumAbs is derived as specified by the following pseudocode procedure:
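The pseudocode itself is not reproduced here; the following Python transcription, assumed to follow the template-sum form of VVC clause 9.3.3.2, shows the intended derivation (template positions outside the TB simply contribute nothing, i.e., are treated as 0):

```python
def loc_sum_abs(abs_level, xC, yC, log2_tb_width, log2_tb_height, base_level):
    # abs_level[x][y] holds AbsLevel for the TB; template positions that
    # fall outside the TB are skipped in this baseline derivation.
    w, h = 1 << log2_tb_width, 1 << log2_tb_height
    s = 0
    if xC < w - 1:
        s += abs_level[xC + 1][yC]
        if xC < w - 2:
            s += abs_level[xC + 2][yC]
        if yC < h - 1:
            s += abs_level[xC + 1][yC + 1]
    if yC < h - 1:
        s += abs_level[xC][yC + 1]
        if yC < h - 2:
            s += abs_level[xC][yC + 2]
    # Final step: locSumAbs = Clip3(0, 31, locSumAbs - baseLevel * 5)
    return max(0, min(31, s - base_level * 5))
```

Note that for a coefficient at the lower-right corner of the TB all five template positions fall outside the TB and the sum degenerates to 0, which is precisely the inaccuracy the history-based method below addresses.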
where log2TbWidth and log2TbHeight are the base-2 logarithms of the width and height of the TB, respectively. The variable baseLevel is 4 for abs_remainder and 0 for dec_abs_level. Given the local sum variable locSumAbs, the Rice parameter cRiceParam is derived in the manner specified in Table 2.
TABLE 2. Specification of cRiceParam based on locSumAbs

locSumAbs   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
cRiceParam  0  0  0  0  0  0  0  1  1  1  1  1  1  1  2  2

locSumAbs  16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
cRiceParam  2  2  2  2  2  2  2  2  2  2  2  2  3  3  3  3
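Table 2 amounts to a threshold lookup; a direct transcription:

```python
# cRiceParam thresholds transcribed from Table 2 (locSumAbs has already
# been clipped to [0, 31]): 0-6 -> 0, 7-13 -> 1, 14-27 -> 2, 28-31 -> 3.
def c_rice_param(loc_sum_abs: int) -> int:
    if loc_sum_abs < 7:
        return 0
    if loc_sum_abs < 14:
        return 1
    if loc_sum_abs < 28:
        return 2
    return 3
```

Larger local sums, i.e., larger neighboring levels, select a larger Rice parameter, giving shorter prefixes for the correspondingly larger remaining levels.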
History-based Rice parameter derivation
If the coefficients lie at TU boundaries, or are among the first decoded using the Rice method, the template calculation used for Rice parameter derivation may produce inaccurate coefficient estimates. For these coefficients, the template calculation is biased towards 0, as some template positions may be outside the TU and are interpreted or initialized to a value of 0. Fig. 6 shows an example of the template pattern used to calculate locSumAbs for coefficients located near TU boundaries. Fig. 6 shows a CTU 602 divided into a plurality of CUs, each CU including a plurality of TUs. For TU 604, the position of the current coefficient is shown as a solid block, and the positions of its neighboring samples in the template pattern are shown as patterned blocks. The patterned blocks indicate the predetermined neighborhood of the current coefficient used to calculate the local sum variable locSumAbs.
In fig. 6, because the current coefficient 606 is close to the boundary of TU 604, some neighboring samples in the template pattern are outside the TU boundary, such as neighboring samples 608B and 608E. In the Rice parameter derivation described above, these out-of-boundary neighboring samples are set to 0 when calculating the local sum variable locSumAbs, resulting in inaccurate Rice parameter derivation. For high bit depth samples (e.g., over 10 bits), there may be a large number of neighboring samples outside the TU boundary, and setting this large number of samples to 0 introduces more error into the Rice parameter derivation.
To improve the accuracy of the Rice estimation from the calculated template, it is suggested to update the local sum variable locSumAbs with a historically derived value for template positions outside the current TU, instead of initializing them with a value of 0. The implementation of this method is shown below by the VVC specification text excerpt of clause 9.3.3.2 (suggested text underlined).
To maintain a history of neighboring coefficient/sample values, a history counter StatCoeff[cIdx] for each color component is utilized, where cIdx = 0, 1, 2 represents the three color components Y, U, V, respectively. If the CTU is the first CTU in a partition (e.g., image, slice, or tile), StatCoeff[cIdx] is initialized as follows:
StatCoeff[cIdx] = 2 * Floor(Log2(BitDepth - 10))    (1)
Here, BitDepth specifies the bit depth of the samples of the luma and chroma arrays of the video; Floor(x) represents the largest integer less than or equal to x; and Log2(x) is the base-2 logarithm of x. Before TU decoding and history counter updating, the replacement variable HistValue is initialized to:
HistValue[cIdx]=1<<StatCoeff[cIdx] (2)
The replacement variable HistValue is used as an estimate of neighboring samples outside of the TU boundary (e.g., neighboring samples having horizontal or vertical coordinates outside of the TU). The local sum variable locSumAbs is re-derived (modified part underlined) as specified by the following pseudocode procedure:
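The modified pseudocode is not reproduced here; the following sketch shows the intent of the change, with each of the five template neighbors contributing HistValue instead of 0 when it falls outside the TB (a transcription of the proposal as understood here, not normative text):

```python
def loc_sum_abs_hist(abs_level, xC, yC, log2_tb_width, log2_tb_height,
                     base_level, hist_value):
    # Template positions inside the TB contribute their AbsLevel value;
    # positions outside the TB contribute the history-derived HistValue.
    w, h = 1 << log2_tb_width, 1 << log2_tb_height
    template = [(xC + 1, yC), (xC + 2, yC), (xC + 1, yC + 1),
                (xC, yC + 1), (xC, yC + 2)]
    s = 0
    for x, y in template:
        s += abs_level[x][y] if (x < w and y < h) else hist_value
    return max(0, min(31, s - base_level * 5))
```

For a coefficient at the lower-right corner of the TB, all five contributions become HistValue, so the derived Rice parameter tracks the magnitude of previously coded levels instead of collapsing to 0.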
The history counter StatCoeff is updated by an exponential moving average process based on the first non-zero Golomb-Rice encoded transform coefficient (abs_remainder[cIdx] or dec_abs_level[cIdx]) of each TU. When the first non-zero Golomb-Rice encoded transform coefficient in the TU is encoded as abs_remainder, the history counter StatCoeff of the color component cIdx is updated as follows:

StatCoeff[cIdx] = (StatCoeff[cIdx] + Floor(Log2(abs_remainder[cIdx])) + 2) >> 1    (3)
When the first non-zero Golomb-Rice encoded transform coefficient in the TU is encoded as dec_abs_level, the history counter StatCoeff of the color component cIdx is updated as follows:

StatCoeff[cIdx] = (StatCoeff[cIdx] + Floor(Log2(dec_abs_level[cIdx]))) >> 1    (4)
The updated StatCoeff may be used to calculate the replacement variable HistValue for the next TU, according to equation (2), before decoding the next TU.
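In Python form, assuming the exponential-moving-average update shape used in the related JVET proposal texts (the exact normative equations should be checked against the specification):

```python
def floor_log2(x: int) -> int:
    # Floor(Log2(x)) for a positive integer x.
    return x.bit_length() - 1

def update_stat_coeff(stat_coeff: int, coeff: int,
                      coded_as_abs_remainder: bool) -> int:
    # Exponential moving average over the first non-zero Golomb-Rice
    # coded coefficient of the TU; the abs_remainder and dec_abs_level
    # cases differ by a constant offset (form assumed, see lead-in).
    if coded_as_abs_remainder:
        return (stat_coeff + floor_log2(coeff) + 2) >> 1
    return (stat_coeff + floor_log2(coeff)) >> 1
```

Because the update averages the old counter with the log-magnitude of the newest coefficient, StatCoeff adapts smoothly rather than jumping with each TU.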
Wavefront parallel processing (Wavefront Parallel Processing, WPP)
WPP aims to provide a parallel coding mechanism. When WPP is enabled in VVC, each CTU row of a frame, tile, or slice constitutes a separate partition. WPP is enabled/disabled by the SPS element sps_entropy_coding_sync_enabled_flag. Fig. 7 shows an example of a WPP-enabled tile. In fig. 7, each CTU row of the tile is processed with a delay of one CTU relative to the previous CTU row. In this way, no dependencies between consecutive CTU rows are broken at the partition boundary, except for the CABAC context variables and, if palette coding is enabled, the palette predictor at the end of each CTU row. To mitigate the potential loss of coding efficiency, the contents of the adapted CABAC context variables and palette predictor are propagated from the first coded CTU of the previous CTU row to the first CTU of the current CTU row. WPP does not change the conventional raster scan order of CTUs.
When WPP is enabled, several threads, up to the number of CTU rows in a partition (e.g., tile, slice, or frame), may work in parallel, each processing a single CTU row. When WPP is used in the decoder, each decoding thread processes a single CTU row of the partition. The scheduling of thread processing must be organized such that, for each CTU, decoding of its top neighboring CTU in the previous CTU row has been completed. A small additional overhead is added by WPP so that, after encoding of the first CTU of each CTU row except the last CTU row is completed, all CABAC context variables and the contents of the palette predictor can be stored.
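The wavefront dependency can be modelled with earliest start times. This simplified sketch assumes unit CTU processing time and a dependency on the left neighbor plus one CTU in the row above; `delay` = 0 models a one-CTU row delay as in VVC, while `delay` = 1 models an HEVC-style two-CTU (top-right) dependency. It is an illustration of the scheduling idea, not the full VVC dependency set:

```python
def wavefront_schedule(rows: int, cols: int, delay: int = 0):
    # t[r][c] = earliest time step at which CTU (r, c) can start, given a
    # dependency on the left neighbour (r, c-1) and on CTU (r-1, c+delay)
    # in the previous CTU row.
    t = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(t[r][c - 1])
            if r > 0:
                deps.append(t[r - 1][min(c + delay, cols - 1)])
            t[r][c] = 1 + max(deps) if deps else 0
    return t

# With delay = 0, each CTU row trails the row above by one CTU, so up to
# min(rows, cols) rows can be in flight at once.
```

Any per-CTU state carried along a row (such as CABAC contexts, or the history counter discussed below) must respect this schedule, which is why cross-row history dependencies are problematic.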
When the history-based Rice parameter derivation discussed above is enabled for high bit depth and high bit rate video coding, the last StatCoeff of the previous CTU row is passed to the first TU of the current CTU row. Thus, when WPP is enabled at the same time, this process interferes with WPP and breaks its parallelism. The present disclosure proposes several solutions to this problem when parallel encoding (e.g., WPP) is enabled.
In one embodiment, the dependency between CTUs in different CTU rows when calculating the history counter StatCoeff is removed, thereby eliminating the interference of history-based Rice parameter derivation with parallel encoding. In this embodiment, rather than using the history counter StatCoeff value obtained from the previous CTU row, the initial value of StatCoeff[cIdx] is used to encode the first abs_remainder[cIdx] or dec_abs_level[cIdx] in each CTU row of a partition (e.g., frame, tile, or slice), where cIdx is the index of a color component.
As an example, the initial value of StatCoeff [ cIdx ] can be determined as follows:
StatCoeff[cIdx] = 2 * Floor(Log2(BitDepth - 10))    (5)
Here, BitDepth specifies the bit depth of a sample of the luminance or chrominance array, and Floor(x) represents the largest integer less than or equal to x. As another example, the initial value of StatCoeff[cIdx] may be determined as:
StatCoeff[cIdx] = Clip(MIN_Stat, MAX_Stat, (int)((19 - QP) / 6)) - 1    (6)
Here, MIN_Stat and MAX_Stat are two predefined integers; QP is the initial QP of each slice; and Clip() is an operation defined as follows:

Clip(min, max, x) = min if x < min; max if x > max; and x otherwise    (7)
Before encoding the first TU of each CTU row of a partition (e.g., frame, tile, or slice), the replacement variable HistValue is calculated as follows:
HistValue[cIdx]=1<<StatCoeff[cIdx] (8)
As described above, HistValue may be used to calculate the local sum variable locSumAbs. The history counter StatCoeff[cIdx] can be updated by an exponential moving average process based on the first non-zero Golomb-Rice encoded transform coefficient (abs_remainder[cIdx] or dec_abs_level[cIdx]) of each TU. When the first non-zero Golomb-Rice encoded transform coefficient in the TU is encoded as abs_remainder, the history counter StatCoeff[cIdx] of the color component cIdx is updated as follows:

StatCoeff[cIdx] = (StatCoeff[cIdx] + Floor(Log2(abs_remainder[cIdx])) + 2) >> 1    (9)
When the first non-zero Golomb-Rice encoded transform coefficient in the TU is encoded as dec_abs_level, the history counter StatCoeff[cIdx] of the color component cIdx is updated as follows:

StatCoeff[cIdx] = (StatCoeff[cIdx] + Floor(Log2(dec_abs_level[cIdx]))) >> 1    (10)
The updated StatCoeff[cIdx] is used to calculate the replacement variable HistValue, as shown in equation (8), for the next TU of the current CTU or the first TU of the next CTU in the current CTU row.
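The per-row flow of this embodiment can be sketched end to end. This is a simplification in which each TU is represented only by its first non-zero Golomb-Rice coded coefficient, assumed coded as abs_remainder, with the moving-average form assumed from the proposal text; it requires BitDepth greater than 10 for equation (5) to be defined:

```python
def init_stat_coeff(bit_depth: int) -> int:
    # Equation (5): StatCoeff = 2 * Floor(Log2(BitDepth - 10)),
    # assuming bit_depth > 10.
    return 2 * ((bit_depth - 10).bit_length() - 1)

def process_ctu_row(first_coeffs, bit_depth):
    # StatCoeff is re-initialized at the start of every CTU row, so no
    # history state crosses CTU-row boundaries and rows can be coded in
    # parallel under WPP. Returns the HistValue used for each TU.
    stat_coeff = init_stat_coeff(bit_depth)        # per-row reset
    hist_values = []
    for coeff in first_coeffs:
        hist_values.append(1 << stat_coeff)        # equation (8)
        # abs_remainder-style moving-average update (assumed form)
        stat_coeff = (stat_coeff + (coeff.bit_length() - 1) + 2) >> 1
    return hist_values
```

Because each row starts from the same initial StatCoeff, two rows with identical TU contents produce identical Rice parameter state, regardless of processing order, which is what restores WPP parallelism.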
Fig. 8 shows an example of a frame 802 and the CTUs contained in the frame. In this example, the frame 802 contains two tiles: tile 804A and tile 804B. Tile 804A contains four CTU rows, CTU rows 1-4. The first CTU row includes CTU 0 through CTU 9, the second CTU row includes CTU 10 through CTU 19, and so on. Likewise, tile 804B also includes four CTU rows, CTU rows 1'-4'. The first CTU row includes ten CTUs, CTU 0' through CTU 9', the second CTU row includes CTU 10' through CTU 19', and so on.
According to this embodiment, the initial value of StatCoeff[cIdx] for tile 804A may be determined according to equation (5) or (6). Before encoding the first TU of each of CTU rows 1-4, the replacement variable HistValue[cIdx] is calculated from the initial value of StatCoeff[cIdx] using equation (8). For example, before encoding the first TU of CTU 0, the variable HistValue is calculated using equation (8). This value of HistValue is used to determine the local sum variable locSumAbs of the coefficients in the first TU, which is further used to determine the Rice parameters of the individual coefficients of the first TU. The history counter StatCoeff may be updated according to equation (9) or (10) when the first TU of the current CTU 0 is processed. Prior to processing the second TU in CTU 0, the current value of StatCoeff is used to determine the HistValue of the second TU according to equation (8). A similar procedure is then employed for the second TU to determine the Rice parameter using HistValue and to update StatCoeff. For the first TU in CTU 1, HistValue is calculated according to equation (8) using the latest StatCoeff from the TUs in CTU 0. This process may be repeated until the last CTU (CTU 9) in the current CTU row 1 is processed.
For the second CTU row of tile 804A, the history counter StatCoeff is initialized according to equation (5) or (6) before encoding the first TU of CTU 10, the first CTU of the second CTU row. Processing similar to that described above with respect to CTU row 1 is then performed on the TUs in the CTUs of the second CTU row. Likewise, before encoding the first TU of each of CTU 20 and CTU 30, the variable StatCoeff is initialized again according to equation (5) or (6).
Tile 804B may be processed in a similar manner. Before encoding the first TU of each of CTU rows 1'-4' (i.e., CTU 0', CTU 10', CTU 20', and CTU 30'), the value of StatCoeff[cIdx] is initialized according to equation (5) or (6), and the replacement variable HistValue is calculated using equation (8). The calculated HistValue is used to calculate locSumAbs and the Rice parameters for the TUs in the first CTU and the remaining CTUs of each CTU row. Furthermore, according to equation (9) or (10), the history counter StatCoeff may be updated at most once in each TU, and the updated value of StatCoeff is used to determine the HistValue of the next TU in the same CTU row.
Although fig. 8 is described with a frame 802 containing two tiles 804A and 804B, the same process is also applicable to other scenarios, such as slices containing multiple tiles, frames containing multiple slices, and so forth. In any of these scenarios, the value of the history counter StatCoeff[cIdx] is reset to an initial value prior to encoding the first TU in each CTU row of a partition (e.g., frame, tile, or slice) to eliminate the dependency between CTU rows in the Rice parameter derivation.
Possible VVC specification changes are as follows (changes are shown underlined).
Another possible VVC specification modification to clause 9.3.2.1 is defined below.
Bit depth of video samples
The bit depth of the input video supported by VVC version 2 may exceed 10 bits. Higher video bit depths may provide higher visual quality and lower compression distortion for decoded video. To support the high bit depth of the input video, the semantics of the corresponding SPS syntax element sps_bitdepth_minus8 and VPS syntax element vps_oles_dpb_bitdepth_minus8 [ i ] may be changed as follows.
sps_bitdepth_minus8 specifies the bit depth BitDepth of the samples of the luma and chroma arrays and the value QpBdOffset of the luma and chroma quantization parameter range offset as follows:
BitDepth=8+sps_bitdepth_minus8 (x1)
QpBdOffset=6*sps_bitdepth_minus8 (x2)
sps_bitdepth_minus8 should be in the range of 0 to 8, including 0 and 8.
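Equations (x1) and (x2) can be transcribed directly; the function name is illustrative, and the range check follows the constraint stated above:

```python
def bit_depth_params(sps_bitdepth_minus8: int) -> tuple:
    """Derive BitDepth and QpBdOffset from the SPS syntax element."""
    if not 0 <= sps_bitdepth_minus8 <= 8:
        raise ValueError("sps_bitdepth_minus8 shall be in the range 0..8")
    bit_depth = 8 + sps_bitdepth_minus8        # equation (x1)
    qp_bd_offset = 6 * sps_bitdepth_minus8     # equation (x2)
    return bit_depth, qp_bd_offset
```

For example, a 10-bit sequence signals sps_bitdepth_minus8 = 2, giving BitDepth = 10 and QpBdOffset = 12.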
When sps_video_parameter_set_id is greater than 0 and the SPS is referenced by a layer included in the i-th multi-layer OLS specified by the VPS for any i in the range of 0 to NumMultiLayerOlss-1 (including 0 and NumMultiLayerOlss-1), the requirement for bitstream conformance is that the value of sps_bitdepth_minus8 should be less than or equal to the value of vps_ols_dpb_bitdepth_minus8[ i ].
vps_ols_dpb_bitdepth_minus8[ i ] specifies the maximum allowed value of sps_bitdepth_minus8 for all SPSs referenced by CLVSs in the CVS of the i-th multi-layer OLS. The value of vps_ols_dpb_bitdepth_minus8[ i ] should be in the range of 0 to 8, including 0 and 8. NOTE 2: To decode the i-th multi-layer OLS, the decoder can safely allocate memory for the DPB based on the values of the syntax elements vps_ols_dpb_pic_width[ i ], vps_ols_dpb_pic_height[ i ], vps_ols_dpb_chroma_format[ i ], and vps_ols_dpb_bitdepth_minus8[ i ].
As can be seen from the above, the bit depth BitDepth of the samples of the luma and chroma arrays can be derived from the SPS syntax element sps_bitdepth_minus8 according to equation (x1). Using the determined BitDepth value, the history counter StatCoeff, the replacement variable HistValue, and the Rice parameters may be derived as described above.
The VPS syntax element vps_ols_dpb_bitdepth_minus8[ i ] can be used to derive the size of the DPB. The encoded bitstream may have multiple video layers, and the VPS is used to specify the corresponding syntax elements. For video decoding, the DPB may be used to store reference pictures so that previously decoded pictures may be used to generate a prediction signal for decoding other pictures. The DPB may also be used to reorder the decoded pictures so that they may be output and/or displayed in the correct order. The DPB may further provide an output delay specified for the hypothetical reference decoder: the decoded pictures may remain in the DPB for a predetermined period of time specified for the hypothetical reference decoder and be output after the predetermined period of time has elapsed.
For secure allocation of memory for a DPB, the size of the DPB is determined by the following syntax elements vps_ols_dpb_pic_width [ i ], vps_ols_dpb_pic_height [ i ], vps_ols_dpb_chroma_format [ i ], and vps_ols_dpb_bitdepth_minus8[ i ].
Accordingly, the size of the DPB is determined based on the picture size. In other words, the size of the DPB may be determined according to the chroma format of the samples. If the video frame is a monochrome frame, the size of the frame to be buffered is determined as the base picture size picture_size1. If the chroma subsampling of the color video frame is 4:2:0, the frame size is determined to be 1.5 times the base picture size picture_size1; if the chroma subsampling of the color video frame is 4:2:2, the frame size is determined to be twice the base picture size picture_size1; if the chroma subsampling of the color video frame is 4:4:4, the frame size is determined to be three times the base picture size picture_size1. Given the chroma subsampling, the size of the DPB may be determined as the number of frames to be stored in the DPB multiplied by the size of each frame.
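The frame-size scaling above can be sketched as follows. The scale factors match the description (monochrome 1x, 4:2:0 1.5x, 4:2:2 2x, 4:4:4 3x of the luma picture size); the function and variable names are illustrative, not VVC syntax:

```python
# Scale factor applied to the base (luma) picture size for each
# chroma subsampling format.
CHROMA_SCALE = {
    "monochrome": 1.0,   # luma plane only
    "4:2:0": 1.5,        # plus 2 chroma planes at quarter resolution
    "4:2:2": 2.0,        # plus 2 chroma planes at half resolution
    "4:4:4": 3.0,        # plus 2 full-resolution chroma planes
}

def dpb_size_bytes(width, height, bytes_per_sample, chroma_format, num_frames):
    """DPB size = number of buffered frames x per-frame size."""
    base = width * height * bytes_per_sample          # picture_size1
    frame = int(base * CHROMA_SCALE[chroma_format])   # scaled frame size
    return num_frames * frame
```

For instance, buffering four 1920x1080 4:2:0 frames at 2 bytes per sample requires 4 x 6,220,800 bytes.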
Fig. 9 depicts one example of a process 900 of encoding partitions of video according to some embodiments of the present disclosure. One or more computing devices (e.g., a computing device implementing the video encoder 100) implement the operations depicted in fig. 9 by executing suitable program code (e.g., program code implementing the entropy encoding module 116). For purposes of illustration, the process 900 is described with reference to some examples depicted in the accompanying drawings. However, other implementations are also possible.
In step 902, process 900 involves accessing a partition of a video signal. The partition may be a video frame, slice, or tile, or any other type of partition that is processed as a unit by a video encoder when performing encoding. The partition includes a set of CTUs arranged in CTU rows as shown in fig. 8. Each CTU row includes one or more CTUs, and each CTU includes a plurality of TUs for encoding as shown in the example of fig. 6.
In step 904, which includes steps 906-914, process 900 involves processing each CTU in the set of CTUs in the partition to encode the partition into binary bits. In step 906, process 900 involves determining whether a parallel encoding mechanism has been enabled and whether the current CTU is the first CTU of a CTU row. In some examples, parallel encoding may be indicated by a flag whose value 0 indicates that parallel encoding has been disabled and whose value 1 indicates that parallel encoding has been enabled. If it is determined that the parallel encoding mechanism is enabled and the current CTU is the first CTU of a CTU row, process 900 involves setting the history counter StatCoeff to an initial value in step 908. As described above, if history-based Rice parameter derivation has been enabled, the initial value of the history counter may be set according to equation (5) or (6); otherwise, the initial value of the history counter is set to zero.
If it is determined that the parallel encoding mechanism is not enabled or that the current CTU is not the first CTU of a CTU row, or after setting the history counter in step 908, process 900 involves calculating, in step 910, the Rice parameters for the TUs in the CTU based on the history counter. As described in detail above with respect to figs. 6-8, if the history counter was reset in step 908, the Rice parameters of the TUs in the CTU are calculated based on the reset history counter or a subsequently updated history counter. If the history counter was not reset in step 908, the Rice parameters of the TUs in the CTU are calculated based on the history counter updated in the previous CTU or a history counter subsequently updated in the current CTU.
In step 912, process 900 involves encoding TUs in the CTU into a binary representation based on the calculated Rice parameters, such as by a combination of TR and limited EGK as specified in the VVC specification. In step 914, process 900 involves encoding the binary representation of the CTU as binary bits for inclusion in the bitstream of the video. For example, encoding may be performed using CABAC as discussed above. In step 916, process 900 involves outputting the encoded video stream.
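The reset decision of steps 906-908 above reduces to a small predicate. The following sketch is illustrative only; the function name and arguments are assumptions, not part of process 900:

```python
def maybe_reset_history(stat_coeff, wpp_enabled, is_first_ctu_in_row,
                        initial_value):
    """Return the history counter to use for the current CTU.

    Reset to the initial value (step 908) only when parallel (WPP-style)
    encoding is enabled and the CTU starts a CTU row; otherwise keep the
    running counter from previous TUs/CTUs.
    """
    if wpp_enabled and is_first_ctu_in_row:
        return initial_value
    return stat_coeff
```

With parallel encoding disabled, the counter always carries over, matching the branch into step 910 without a reset.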
Fig. 10 depicts one example of a process 1000 of decoding partitions of video according to some embodiments of the present disclosure. One or more computing devices implement the operations depicted in fig. 10 by executing suitable program code. For example, a computing device implementing the video decoder 200 may implement the operations depicted in fig. 10 by executing program code of the entropy decoding module 216, the inverse quantization module 218, and the inverse transformation module 219. For purposes of illustration, process 1000 is described with reference to some examples depicted in the accompanying drawings. However, other implementations are also possible.
In step 1002, process 1000 involves accessing a binary string or binary representation representing a partition of a video signal. The partition may be a video frame, slice, or tile, or any other type of partition that is processed as a unit by a video encoder when performing encoding. The partition includes a set of CTUs arranged in CTU rows as shown in fig. 8. Each CTU row includes one or more CTUs, and each CTU includes a plurality of TUs for encoding as shown in the example of fig. 6.
In step 1004, which includes steps 1006-1014, process 1000 involves processing the binary string of each CTU in the set of CTUs in the partition to generate decoded samples for the partition. In step 1006, process 1000 involves determining whether a parallel encoding mechanism has been enabled and whether the current CTU is the first CTU of a CTU row. Parallel encoding may be indicated by a flag whose value 0 indicates that parallel encoding has been disabled and whose value 1 indicates that parallel encoding has been enabled. If it is determined that the parallel encoding mechanism has been enabled and the current CTU is the first CTU of a CTU row, process 1000 involves setting the history counter StatCoeff to an initial value in step 1008. As described above, if history-based Rice parameter derivation has been enabled, the initial value of the history counter may be set according to equation (5) or (6); otherwise, the initial value of the history counter is set to zero.
If it is determined that the parallel encoding mechanism is not enabled or that the current CTU is not the first CTU of a CTU row, or after setting the history counter in step 1008, process 1000 involves calculating, in step 1010, the Rice parameters for the TUs in the CTU based on the history counter. As described in detail above with respect to figs. 6-8, if the history counter was reset in step 1008, the Rice parameters of the TUs in the CTU are calculated based on the reset history counter or a subsequently updated history counter. If the history counter was not reset in step 1008, the Rice parameters of the TUs in the CTU are calculated based on the history counter updated in the previous CTU or a history counter subsequently updated in the current CTU.
In step 1012, process 1000 involves decoding the binary string or binary representation of TUs in the CTU into coefficient values based on the calculated Rice parameters, such as by a combination of TR and limited EGK as specified in the VVC specification. In step 1014, process 1000 involves reconstructing pixel values of TUs in the CTU by, for example, inverse quantization and inverse transform as discussed above with respect to fig. 2. In step 1016, process 1000 involves outputting the decoded partitions of the video.
In another embodiment, the correlation between CTUs when the history counter StatCoeff is calculated is kept consistent with the correlation between CTUs in a parallel coding mechanism (such as WPP). For example, the history counter StatCoeff of CTU rows of a partition (e.g., frame, tile, or slice) is calculated based on coefficient values in the first N or fewer CTUs in the previous CTU row, where N is the maximum delay allowed between two consecutive CTU rows in the parallel encoding mechanism. In this way, the correlation between CTUs in two consecutive CTU rows when the history counter StatCoeff is calculated is limited to not exceed (i.e., to stay in agreement with) the correlation between CTUs when parallel processing is performed.
This embodiment may be implemented using a store-and-synchronize process. For example, in the WPP described above, the delay between two consecutive CTU rows is one CTU, so N = 1. During storage, after encoding the last TU of the first CTU in each CTU row (except the last CTU row), StatCoeff[cIdx] may be stored in a storage variable StatCoeffWpp[cIdx]. For each CTU row except the first CTU row, a synchronization process of the Rice parameter derivation is applied before encoding the first TU. During the synchronization process, StatCoeff[cIdx] is synchronized with the StatCoeffWpp[cIdx] saved in the previous CTU row.
As described above, the variable HistValue is calculated as follows before encoding the first TU in each CTU row:
HistValue[cIdx]=1<<StatCoeff[cIdx] (11)
If the current CTU row is the first CTU row of the partition, StatCoeff[cIdx] may be initialized according to equation (5) or (6). The calculated HistValue may be used to determine the local sum variable locSumAbs, which in turn is used to determine the Rice parameters of the TUs in the current CTU. As described above with respect to equations (9) and (10), StatCoeff may be updated through an exponential moving average process according to the first non-zero Golomb-Rice encoded transform coefficient (abs_remainder[cIdx] or dec_abs_level[cIdx]) of each TU.
After encoding the last TU of the first CTU in the first CTU row, StatCoeff[cIdx] may be saved as StatCoeffWpp[cIdx] in the following storage step:
StatCoeffWpp[cIdx]=StatCoeff[cIdx] (12)
the encoding of the remaining CTUs in the first CTU row may be performed in a similar manner as described above with respect to the first embodiment.
Before encoding the first TU of the second CTU row and of any subsequent CTU row, StatCoeff[cIdx] can be obtained through a synchronization step:
StatCoeff[cIdx]=StatCoeffWpp[cIdx] (13)
Using the obtained StatCoeff[cIdx] value, HistValue is calculated according to equation (11). The remaining processing for the CTU row is the same as the processing for the first CTU row.
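The store/synchronize mechanism of equations (12) and (13) can be sketched as follows, assuming a WPP delay of one CTU (N = 1). The per-TU update is replaced by simple additive deltas for illustration; the real update is the exponential moving average of equations (9)/(10):

```python
def encode_rows_with_sync(rows, initial_stat):
    """rows: list of CTU rows; each row is a list of CTUs; each CTU is a
    list of per-TU StatCoeff deltas (stand-ins for the real EMA update).
    Returns the StatCoeff value in effect at the start of each row."""
    stat = initial_stat
    stat_wpp = None
    row_starts = []
    for r, row in enumerate(rows):
        if r == 0:
            stat = initial_stat      # first row: initialize, eqs. (5)/(6)
        else:
            stat = stat_wpp          # synchronization step, eq. (13)
        row_starts.append(stat)
        for c, ctu in enumerate(row):
            for delta in ctu:
                stat += delta        # placeholder per-TU update
            if c == 0 and r != len(rows) - 1:
                stat_wpp = stat      # storage step after first CTU, eq. (12)
    return row_starts
```

Each row thus starts from the counter state reached at the end of the first CTU of the row above, which matches the one-CTU WPP dependency.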
Possible VVC specification changes are specified as follows (changes are shown underlined).
Alternative history-based Rice parameter derivation
History-based Rice parameter derivation may be implemented in an alternative way. In this alternative implementation, if the CTU is the first CTU in a partition (e.g., picture, slice, or tile), then HistValue is initialized with the initial value of StatCoeff[cIdx] as follows:
This initial HistValue is used to encode the first abs_remainder[cIdx] or dec_abs_level[cIdx] until HistValue is updated according to the following rules. When the first non-zero Golomb-Rice encoded transform coefficient in a TU is encoded as abs_remainder, the history counter for the color component cIdx is updated as follows:
When the first non-zero Golomb-Rice encoded transform coefficient in the TU is encoded as dec_abs_level, the history counter of the color component cIdx is updated as follows:
Once the history counter StatCoeff[cIdx] is updated, HistValue is updated as shown in equation (17), and the updated HistValue is used to derive the Rice parameters for the remaining abs_remainder and dec_abs_level syntax elements until StatCoeff[cIdx] and HistValue[cIdx] are updated again.
HistValue[cIdx]=1<<StatCoeff[cIdx] (17)
Based on the current VVC specification, possible specification changes are specified as follows.
Clause 7.3.11.11 (residual coding syntax) is modified as follows (additions are underlined):
To resolve the dependency conflict between parallel encoding and the alternative history-based Rice parameter derivation, StatCoeff[cIdx] and HistValue[cIdx] of each color component are saved after encoding the last TU of the first CTU in each CTU row. The saved values of StatCoeff[cIdx] and HistValue[cIdx] may then be used to initialize StatCoeff[cIdx] and HistValue[cIdx] before processing the first TU of the first CTU of the subsequent CTU row.
This embodiment may also be implemented using a store-and-synchronize process. For example, during storage, StatCoeff[cIdx] and HistValue[cIdx] may be stored, after encoding the last TU of the first CTU in each CTU row, in storage variables StatCoeffWpp[cIdx] and HistValueWpp[cIdx] as shown in equations (18) and (19), respectively.
StatCoeffWpp[cIdx]=StatCoeff[cIdx] (18)
HistValueWpp[cIdx]=HistValue[cIdx] (19)
For each CTU row except the first CTU row, a synchronization process of the Rice parameter derivation is applied before encoding the first TU. For example, StatCoeff[cIdx] and HistValue[cIdx] are synchronized with the StatCoeffWpp[cIdx] and HistValueWpp[cIdx] saved in the previous CTU row, respectively, as shown in equations (20) and (21).
StatCoeff[cIdx]=StatCoeffWpp[cIdx] (20)
HistValue[cIdx]=HistValueWpp[cIdx] (21)
The synchronized variable HistValue is used to encode the first abs_remainder[cIdx] or dec_abs_level[cIdx] until HistValue is updated.
As described above, statCoeff [ cIdx ] may be updated according to the first non-zero Golomb-Rice encoded transform coefficient (abs_remain [ cIdx ]) or dec_abs_level [ cIdx ]) for each TU, as shown in equation (15) or equation (16). Once the history counter StatCoeff [ cIdx ] is updated, the histvue will be updated according to equation (17) and the updated histvue will be used to derive the Rice parameters for the remaining abs_remainders and dec_abs_level syntax elements until the new StatCoeff [ cIdx ] and histvue are updated again.
Based on the current VVC specification, possible specification changes indicated with underlining are specified as follows.
Fig. 11 depicts one example of a process 1100 of encoding a partition of video according to some embodiments of the disclosure. One or more computing devices (e.g., a computing device implementing the video encoder 100) implement the operations depicted in fig. 11 by executing suitable program code (e.g., program code implementing the entropy encoding module 116). For purposes of illustration, the process 1100 is described with reference to some examples depicted in the accompanying drawings. However, other implementations are also possible.
In step 1102, process 1100 involves accessing a partition of a video signal. The partition may be a video frame, slice, or tile, or any other type of partition that is processed as a unit by a video encoder when performing encoding. The partition includes a set of CTUs arranged in CTU rows as shown in fig. 8. Each CTU row includes one or more CTUs, and each CTU includes a plurality of TUs for encoding as shown in the example of fig. 6.
In step 1104, which includes steps 1106-1118, process 1100 involves processing each CTU in the set of CTUs in the partition to encode the partition into binary bits. In step 1106, process 1100 involves determining whether a parallel encoding mechanism has been enabled and whether the current CTU is the first CTU of a CTU row. In some examples, parallel encoding may be indicated by a flag whose value 0 indicates that parallel encoding has been disabled and whose value 1 indicates that parallel encoding has been enabled. If it is determined that the parallel encoding mechanism is enabled and the current CTU is the first CTU of a CTU row, process 1100 involves determining, in step 1107, whether the current CTU row is the first CTU row in the partition. If so, process 1100 involves setting the history counter StatCoeff to an initial value in step 1108. As described above, the initial value of the history counter may be set according to equation (5) or (6). If the current CTU row is not the first CTU row in the partition, process 1100 involves setting the history counter StatCoeff to the value stored in the history counter storage variable in step 1109, as shown in equation (13) or (20). In some examples, such as with the alternative Rice parameter derivation, the value of the replacement variable HistValue may also be reset to a stored value, as shown in equation (21).
If it is determined that the parallel encoding mechanism is not enabled or that the current CTU is not the first CTU of a CTU row, or after setting the value of the history counter in step 1108 or 1109, process 1100 involves calculating, in step 1110, the Rice parameters for the TUs in the CTU based on the history counter (and the replacement variable HistValue if it was also reset). As described above (e.g., with respect to fig. 8 or the alternative Rice parameter derivation), if the value of the history counter was reset in step 1108 or 1109, the Rice parameters of the TUs in the CTU are calculated based on the reset history counter or a subsequently updated history counter. If the history counter was not reset in step 1108 or 1109, the Rice parameters of the TUs in the CTU are calculated based on the history counter updated in the previous CTU or a history counter subsequently updated in the current CTU.
In step 1112, process 1100 involves encoding TUs in the CTU into a binary representation based on the calculated Rice parameters, such as by a combination of TR and limited EGK as specified in the VVC specification. In step 1114, process 1100 involves encoding the binary representation of the CTU as binary bits for inclusion in a bitstream of the video. For example, encoding may be performed using CABAC as discussed above.
In step 1116, process 1100 involves determining whether parallel encoding has been enabled and whether the CTU is the first CTU of the current CTU row. If so, process 1100 involves storing the value of the history counter in the history counter storage variable in step 1118, as shown in equation (12) or (18). In some examples, such as with the alternative Rice parameter derivation, the value of the replacement variable HistValue may also be stored in a storage variable, as shown in equation (19). In step 1120, process 1100 involves outputting the encoded video bitstream.
In some cases, a CTU in a non-first CTU row may be located at a boundary of the partition. For example, the first CTU in the second CTU row has no CTU above it within the partition. In these cases, the history counter of the CTU may be set to the initial value instead of the stored value. A new step 1107' may be added between step 1107 and step 1109 of fig. 11 to determine whether the CTU is at a boundary of the partition (e.g., whether the CTU has no top-adjacent CTU within the partition). If so, process 1100 proceeds to step 1108, where the history counter is set to the initial value; if not, process 1100 proceeds to step 1109, where the history counter is set to the stored value. The remaining steps of fig. 11 may remain unchanged.
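The decision among steps 1107, 1107', 1108, and 1109 can be sketched as a single selector; the function and argument names are illustrative only:

```python
def choose_history(is_first_row, has_top_neighbor_in_partition,
                   initial_value, stored_value):
    """Pick the history counter value for the first CTU of a row.

    Step 1108 (initial value) applies for the first CTU row, and also, per
    step 1107', for a CTU with no top-adjacent CTU inside the partition;
    step 1109 (stored value) applies otherwise.
    """
    if is_first_row or not has_top_neighbor_in_partition:
        return initial_value     # no usable history from the row above
    return stored_value          # synchronize with the stored counter
```

This keeps the boundary case from pulling in a stored value that was produced in a different, unrelated region of the picture.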
Fig. 12 depicts one example of a process 1200 of decoding a partition of video according to some embodiments of the present disclosure. One or more computing devices implement the operations depicted in fig. 12 by executing suitable program code. For example, a computing device implementing the video decoder 200 may implement the operations depicted in fig. 12 by executing program code of the entropy decoding module 216, the inverse quantization module 218, and the inverse transformation module 219. For purposes of illustration, the process 1200 is described with reference to some examples depicted in the accompanying drawings. However, other implementations are also possible.
In step 1202, process 1200 involves accessing a binary string or binary representation representing a partition of a video signal. The partition may be a video frame, slice, or tile, or any other type of partition that is processed as a unit by a video encoder when performing encoding. The partition includes a set of CTUs arranged in CTU rows as shown in fig. 8. Each CTU row includes one or more CTUs, and each CTU includes a plurality of TUs for encoding as shown in the example of fig. 6.
In step 1204, which includes steps 1206-1218, process 1200 involves processing the binary string of each CTU in the set of CTUs in the partition to generate decoded samples for the partition. In step 1206, process 1200 involves determining whether a parallel encoding mechanism has been enabled and whether the current CTU is the first CTU of a CTU row. Parallel encoding may be indicated by a flag whose value 0 indicates that parallel encoding has been disabled and whose value 1 indicates that parallel encoding has been enabled. If it is determined that the parallel encoding mechanism is enabled and the current CTU is the first CTU of a CTU row, process 1200 involves determining, in step 1207, whether the current CTU row is the first CTU row in the partition. If so, process 1200 involves setting the history counter StatCoeff to an initial value in step 1208. As described above, the initial value of the history counter may be set according to equation (5) or (6). If the current CTU row is not the first CTU row in the partition, process 1200 involves setting the history counter StatCoeff to the value stored in the history counter storage variable in step 1209, as shown in equation (13) or (20). In some examples, such as with the alternative Rice parameter derivation, the value of the replacement variable HistValue may also be reset to a stored value, as shown in equation (21).
If it is determined that the parallel encoding mechanism is not enabled or that the current CTU is not the first CTU of a CTU row, or after setting the history counter in step 1208 or 1209, process 1200 involves calculating, in step 1210, the Rice parameters of the TUs in the CTU based on the history counter (and the replacement variable HistValue if its value was also set). As described above (e.g., with respect to fig. 8 or the alternative Rice parameter derivation), if the value of the history counter was reset in step 1208 or 1209, the Rice parameters of the TUs in the CTU are calculated based on the reset history counter or a subsequently updated history counter. If the history counter was not reset in step 1208 or 1209, the Rice parameters of the TUs in the CTU are calculated based on the history counter updated in the previous CTU or a history counter subsequently updated in the current CTU.
In step 1212, process 1200 involves decoding the binary string or binary representation of the TUs in the CTU into coefficient values based on the calculated Rice parameters, such as by a combination of TR and limited EGK as specified in the VVC specification. In step 1214, process 1200 involves reconstructing pixel values of TUs in the CTU by, for example, inverse quantization and inverse transform as discussed above with respect to fig. 2.
In step 1216, process 1200 involves determining whether parallel encoding has been enabled and whether the CTU is the first CTU of the current CTU row. If so, process 1200 involves storing the value of the history counter in the history counter storage variable in step 1218, as shown in equation (12) or (18). In some examples, such as with the alternative Rice parameter derivation, the value of the replacement variable HistValue may also be stored in a storage variable, as shown in equation (19). In step 1220, process 1200 involves outputting the decoded partitions of the video.
In another embodiment, WPP or another parallel coding mechanism and history-based Rice parameter derivation are prevented from coexisting in the code stream. For example, if WPP is enabled, history-based Rice parameter derivation may not be enabled. If WPP is not enabled, history-based Rice parameter derivation may be enabled. Similarly, if history-based Rice parameter derivation has been enabled, WPP may not be enabled. As an example, a syntax change may be made as follows.
7.3.2.22 Sequence parameter set range extension syntax (additions are underlined)
As another example, the corresponding semantic changes are as follows (changes are underlined).
Although in the above description, a TU is described and illustrated in the accompanying drawings (e.g., fig. 6), the same technique may be applied to a TB. In other words, in the embodiments presented above (including the figures), TU may also represent TB.
Computing system examples for implementing video coding related quantization
Any suitable computing system may be used to perform the operations described herein. For example, fig. 13 depicts an example of a computing device 1300 that may implement video encoder 100 of fig. 1 or video decoder 200 of fig. 2. In some embodiments, computing device 1300 may include a processor 1312, the processor 1312 being communicatively coupled to a memory 1314 and executing computer-executable program code and/or accessing information stored in memory 1314. The processor 1312 may include a microprocessor, application-specific integrated circuit (ASIC), state machine, or other processing device. Processor 1312 may include any one or more processing devices. Such a processor may include, or may be in communication with, a computer-readable medium storing instructions that, when executed by the processor 1312, cause the processor to perform the operations described herein.
Memory 1314 may include any suitable non-transitory computer-readable medium. The computer-readable medium may include any electronic, optical, magnetic, or other storage device that can provide computer-readable instructions or other program code to a processor. Non-limiting examples of computer-readable media include magnetic disks, memory chips, ROM, RAM, an ASIC, a configured processor, optical storage, magnetic tape or other magnetic storage, or any other medium from which a computer processor may read instructions. The instructions may include processor-specific instructions generated by a compiler and/or interpreter from code written in any suitable computer programming language including, for example, C, C++, C#, Visual Basic, Java, Python, Perl, JavaScript, and ActionScript.
Computing device 1300 can also include a bus 1316. The bus 1316 may communicatively couple one or more components of the computing device 1300. Computing device 1300 may also include several external or internal devices, such as input or output devices. For example, computing device 1300 is shown having an input/output ("I/O") interface 1318, which interface 1318 may receive input from one or more input devices 1320 or provide output to one or more output devices 1322. One or more input devices 1320 and one or more output devices 1322 may be communicatively coupled to the I/O interface 1318. The communicative coupling may be achieved via any suitable means (e.g., connection via a printed circuit board, connection via a cable, communication via wireless transmission, etc.). Non-limiting examples of input devices 1320 include a touch screen (e.g., one or more cameras for imaging touch areas or one or more pressure sensors for detecting pressure changes caused by touches), a mouse, a keyboard, or any other device that may be used to generate input events in response to physical actions of a user of a computing device. Non-limiting examples of output devices 1322 include an LCD screen, an external monitor, speakers, or any other device that may be used to display or otherwise present output generated by a computing device.
Computing device 1300 may execute program code that configures processor 1312 to perform one or more operations described above with respect to fig. 1-12. The program code may include the video encoder 100 or the video decoder 200. The program code may reside in the memory 1314 or in any suitable computer-readable medium and may be executed by the processor 1312 or any other suitable processor.
Computing device 1300 can also include at least one network interface device 1324. The network interface device 1324 may include any device or group of devices suitable for establishing a wired or wireless data connection to one or more data networks 1328. Non-limiting examples of the network interface device 1324 include an Ethernet network adapter, a modem, and the like. Computing device 1300 may transmit messages as electronic or optical signals through the network interface device 1324.
General considerations
Numerous specific details are set forth herein to provide a thorough understanding of the claimed subject matter. However, it will be understood by those skilled in the art that the claimed subject matter may be practiced without these specific details. In other instances, methods, devices, or systems that are known to one of ordinary skill have not been described in detail so as not to obscure the claimed subject matter.
Unless specifically stated otherwise, it is appreciated that throughout the discussion of the present specification, terms such as "processing," "computing," "determining," and "identifying" or the like, refer to the action or processes of a computing device, such as one or more computers or similar electronic computing device, that manipulates and transforms data represented as physical electronic or magnetic quantities within the computing platform's memories, registers or other information storage devices, transmission devices, or display devices.
The one or more systems discussed herein are not limited to any particular hardware architecture or configuration. The computing device may include any suitable arrangement of components that provide results based on one or more inputs. Suitable computing devices include multi-purpose microprocessor-based computer systems that access stored software that programs or configures the computing system from a general-purpose computing device to a special-purpose computing device that implements one or more embodiments of the present subject matter. Any suitable programming, script, or other type of language or combination of languages may be used to implement the teachings contained herein in software for use in programming or configuring a computing device.
Embodiments of the methods disclosed herein may be performed in the operation of such a computing device. The order of the blocks presented in the above examples may vary—for example, blocks may be reordered, combined, and/or broken into sub-blocks. Some blocks or processes may be performed in parallel.
The use of "adapted" or "configured to" herein is meant to be an open and inclusive language that does not exclude devices adapted or configured to perform additional tasks or steps. Furthermore, the use of "based on" is intended to be open and inclusive in that a process, step, calculation, or other action "based on" one or more stated conditions or values may in fact be based on additional conditions or values other than the stated conditions or values. Headings, lists, and numbers included herein are for ease of explanation only and are not meant as limitations.
While the present subject matter has been described in detail with respect to specific embodiments, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present disclosure has been presented for purposes of example and not limitation, and does not preclude inclusion of modifications, variations and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art.

Claims (40)

1. A method for decoding video from a video bitstream, the method comprising:
accessing a binary string representing a partition of the video, the partition comprising a plurality of coding tree units (CTUs) forming one or more CTU rows;
for each CTU of the plurality of CTUs in the partition,
before decoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is a first CTU row in the partition;
in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter of a color component used for calculating a Rice parameter to an initial value;
in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter of the color component to a value stored in a history counter storage variable;
decoding the CTU, comprising:
calculating Rice parameters of a plurality of transform units (TUs) in the CTU based on the values of the history counter;
decoding the binary strings corresponding to the plurality of TUs in the CTU into coefficient values of the plurality of TUs based on the calculated Rice parameter; and
determining pixel values for the plurality of TUs in the CTU according to the coefficient values; and
after decoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the history counter in the history counter storage variable.
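Taken together, the steps of claim 1 describe a save/restore discipline around the first CTU of each row. A minimal Python model of that control flow is sketched below; the function and helper names (decode_partition, decode_ctu) are illustrative assumptions, and the CTU payloads are opaque stand-ins rather than real bitstream data:

```python
NUM_COMPONENTS = 3  # cIdx: Y, Cb, Cr

def decode_partition(ctu_rows, init_value, decode_ctu):
    """Model of the claimed per-row history-counter handling (WPP enabled).

    ctu_rows:   list of CTU rows; each row is a list of opaque CTU payloads.
    init_value: initial history counter value for every color component.
    decode_ctu: callable(ctu, stat_coeff) that decodes one CTU and may
                update the per-component counters in stat_coeff in place.
    """
    stat_coeff = [init_value] * NUM_COMPONENTS  # StatCoeff[cIdx]
    stored = [init_value] * NUM_COMPONENTS      # history counter storage variable
    for row_idx, row in enumerate(ctu_rows):
        for ctu_idx, ctu in enumerate(row):
            first_in_row = (ctu_idx == 0)
            if first_in_row:
                if row_idx == 0:
                    # first CTU row in the partition: use the initial value
                    stat_coeff = [init_value] * NUM_COMPONENTS
                else:
                    # later rows: restore from the storage variable
                    stat_coeff = list(stored)
            decode_ctu(ctu, stat_coeff)
            if first_in_row:
                # save the state reached after the first CTU of this row
                stored = list(stat_coeff)
    return stat_coeff
```

Because each row resumes from the counter state saved after the first CTU of the row above, a row can begin decoding as soon as the first CTU of the preceding row has finished, which is what permits the wavefront arrangement.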
2. The method of claim 1, wherein the partition is a frame, slice, or tile.
3. The method of claim 1, wherein setting the history counter of the color component cIdx to the initial value comprises:
StatCoeff[cIdx]=2*Floor(Log2(BitDepth-10)),
where StatCoeff denotes the history counter, BitDepth specifies the bit depth of the samples of the luminance and chrominance arrays of the video, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
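As a numeric illustration of the initialization in claim 3 (the helper name is a hypothetical; the formula is only meaningful for BitDepth greater than 10, since Log2 is undefined for non-positive arguments):

```python
import math

def init_history_counter(bit_depth):
    # StatCoeff[cIdx] = 2 * Floor(Log2(BitDepth - 10))
    # Only meaningful for BitDepth > 10 (Log2 needs a positive argument).
    return 2 * math.floor(math.log2(bit_depth - 10))

# init_history_counter(12) -> 2 * Floor(Log2(2)) = 2
# init_history_counter(16) -> 2 * Floor(Log2(6)) = 4
```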
4. The method of claim 1, wherein calculating the Rice parameters for the plurality of TUs in the CTU based on the value of the history counter comprises:
for each TU of the plurality of TUs in the CTU,
determining a replacement variable HistValue based on the history counter;
calculating a local sum variable locSumAbs for a coefficient in the TU using values of neighboring coefficients in a predetermined neighborhood of the coefficient in the TU and the replacement variable HistValue; and
deriving the Rice parameter of the TU based on the local sum variable locSumAbs.
5. The method of claim 4, wherein calculating the local sum variable locSumAbs of the coefficients in the TU comprises:
determining that a neighboring sample of a plurality of neighboring samples in the predetermined neighborhood of the coefficient is outside the TU; and
calculating the local sum variable locSumAbs using the replacement variable HistValue as the value of the neighboring sample outside the TU.
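The substitution recited in claims 4 and 5 can be illustrated as follows. The five-position neighborhood template used here is an assumption made for the sketch; the essential behavior is that a neighbor lying outside the TU contributes the replacement variable HistValue instead of being skipped:

```python
def loc_sum_abs(abs_coeff, x, y, width, height, hist_value):
    # Sum the magnitudes of neighbors of position (x, y) in a width x height TU.
    # The five-position template below is an illustrative assumption; the key
    # behavior is that a neighbor outside the TU contributes HistValue.
    total = 0
    for dx, dy in ((1, 0), (2, 0), (0, 1), (1, 1), (0, 2)):
        nx, ny = x + dx, y + dy
        if nx < width and ny < height:
            total += abs_coeff[ny][nx]
        else:
            total += hist_value  # replacement variable for out-of-TU samples
    return total
```

With hist_value set to 0 this reduces to the conventional local sum; a non-zero HistValue lets coefficients near the TU boundary inherit statistics accumulated from earlier TUs.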
6. The method of claim 1, further comprising:
before decoding the CTU, in response to determining that the current CTU row is not the first CTU row in the partition, setting a replacement variable to a stored replacement variable value, wherein calculating the Rice parameters for the plurality of TUs in the CTU based on the value of the history counter comprises: calculating the Rice parameters of the plurality of TUs in the CTU based on the replacement variable; and
after decoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the replacement variable.
7. The method of claim 1, wherein decoding the CTU further comprises updating the history counter by:
in response to determining that a first non-zero Golomb-Rice encoded transform coefficient of one TU of the plurality of TUs is encoded as abs_remainder, updating the history counter of the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(abs_remainder[cIdx]))+2)>>1; and
in response to determining that the first non-zero Golomb-Rice encoded transform coefficient in the TU is encoded as dec_abs_level, updating the history counter for the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(dec_abs_level[cIdx])))>>1,
wherein StatCoeff denotes the history counter, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
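The two update rules in claim 7 can be written as a small helper (the function name and the boolean flag are illustrative; the coefficient level is assumed positive so that Log2 is defined):

```python
import math

def update_history_counter(stat_coeff, cidx, level, via_abs_remainder):
    # Update StatCoeff[cIdx] from the first non-zero Golomb-Rice coded
    # transform coefficient of a TU; level is assumed to be positive.
    if via_abs_remainder:
        # coefficient coded as abs_remainder
        stat_coeff[cidx] = (stat_coeff[cidx] + math.floor(math.log2(level)) + 2) >> 1
    else:
        # coefficient coded as dec_abs_level
        stat_coeff[cidx] = (stat_coeff[cidx] + math.floor(math.log2(level))) >> 1
```

For example, starting from StatCoeff[cIdx] = 2, a first abs_remainder of 8 yields (2 + 3 + 2) >> 1 = 3.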
8. A non-transitory computer readable medium storing program code executable by one or more processing devices to perform operations comprising:
accessing a binary string representing a partition of a video, the partition comprising a plurality of coding tree units (CTUs) forming one or more CTU rows;
for each CTU of the plurality of CTUs in the partition,
before decoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is a first CTU row in the partition;
in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter for a color component to an initial value;
in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter of the color component to a value stored in a history counter storage variable;
decoding the CTU, comprising:
calculating Rice parameters of a plurality of transform units (TUs) in the CTU based on the values of the history counter;
decoding the binary strings corresponding to the plurality of TUs in the CTU into coefficient values of the plurality of TUs based on the calculated Rice parameter; and
determining pixel values for the plurality of TUs in the CTU according to the coefficient values; and
after decoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the history counter in the history counter storage variable.
9. The non-transitory computer-readable medium of claim 8, wherein setting the history counter of the color component cIdx to the initial value comprises:
StatCoeff[cIdx]=2*Floor(Log2(BitDepth-10)),
where StatCoeff denotes the history counter, BitDepth specifies the bit depth of the samples of the luminance and chrominance arrays of the video, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
10. The non-transitory computer-readable medium of claim 8, wherein calculating the Rice parameter for the plurality of TUs in the CTU based on the value of the history counter comprises:
for each TU of the plurality of TUs in the CTU,
determining a replacement variable HistValue based on the history counter;
calculating a local sum variable locSumAbs for a coefficient in the TU using values of neighboring coefficients in a predetermined neighborhood of the coefficient in the TU and the replacement variable HistValue; and
deriving the Rice parameter of the TU based on the local sum variable locSumAbs.
11. The non-transitory computer readable medium of claim 8, wherein the partition is a frame, slice, or tile.
12. The non-transitory computer-readable medium of claim 10, wherein calculating the local sum variable locSumAbs of the coefficients in the TU comprises:
determining that a neighboring sample of a plurality of neighboring samples in the predetermined neighborhood of the coefficient is outside the TU; and
calculating the local sum variable locSumAbs using the replacement variable HistValue as the value of the neighboring sample outside the TU.
13. The non-transitory computer-readable medium of claim 8, further comprising:
before decoding the CTU, in response to determining that the current CTU row is not the first CTU row in the partition, setting a replacement variable to a stored replacement variable value, wherein calculating the Rice parameters for the plurality of TUs in the CTU based on the value of the history counter comprises: calculating the Rice parameters of the plurality of TUs in the CTU based on the replacement variable; and
after decoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the replacement variable.
14. The non-transitory computer-readable medium of claim 8, wherein decoding the CTU further comprises updating the history counter by:
in response to determining that a first non-zero Golomb-Rice encoded transform coefficient of one TU of the plurality of TUs is encoded as abs_remainder, updating the history counter of the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(abs_remainder[cIdx]))+2)>>1; and
in response to determining that the first non-zero Golomb-Rice encoded transform coefficient in the TU is encoded as dec_abs_level, updating the history counter for the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(dec_abs_level[cIdx])))>>1,
wherein StatCoeff denotes the history counter, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
15. A system, comprising:
a processing device; and
a non-transitory computer readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer readable medium to:
accessing a binary string representing a partition of a video, the partition comprising a plurality of coding tree units (CTUs) forming one or more CTU rows;
for each CTU of the plurality of CTUs in the partition,
before decoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is a first CTU row in the partition;
in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter of a color component used for calculating a Rice parameter to an initial value;
in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter of the color component to a value stored in a history counter storage variable;
decoding the CTU, comprising:
calculating Rice parameters of a plurality of transform units (TUs) in the CTU based on the values of the history counter;
decoding the binary strings corresponding to the plurality of TUs in the CTU into coefficient values of the plurality of TUs based on the calculated Rice parameter; and
determining pixel values for the plurality of TUs in the CTU according to the coefficient values; and
after decoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the history counter in the history counter storage variable.
16. The system of claim 15, wherein setting the history counter of the color component cIdx to the initial value comprises:
StatCoeff[cIdx]=2*Floor(Log2(BitDepth-10)),
where StatCoeff denotes the history counter, BitDepth specifies the bit depth of the samples of the luminance and chrominance arrays of the video, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
17. The system of claim 15, wherein calculating the Rice parameters for the plurality of TUs in the CTU based on the value of the history counter comprises:
for each TU of the plurality of TUs in the CTU,
determining a replacement variable HistValue based on the history counter;
calculating a local sum variable locSumAbs for a coefficient in the TU using values of neighboring coefficients in a predetermined neighborhood of the coefficient in the TU and the replacement variable HistValue; and
deriving the Rice parameter of the TU based on the local sum variable locSumAbs.
18. The system of claim 15, wherein the partition is a frame, slice, or tile.
19. The system of claim 15, further comprising:
before decoding the CTU, in response to determining that the current CTU row is not the first CTU row in the partition, setting a replacement variable to a stored replacement variable value, wherein calculating the Rice parameters for the plurality of TUs in the CTU based on the value of the history counter comprises: calculating the Rice parameters of the plurality of TUs in the CTU based on the replacement variable; and
after decoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the replacement variable.
20. The system of claim 15, wherein decoding the CTU further comprises updating the history counter by:
in response to determining that a first non-zero Golomb-Rice encoded transform coefficient of one TU of the plurality of TUs is encoded as abs_remainder, updating the history counter of the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(abs_remainder[cIdx]))+2)>>1; and
in response to determining that the first non-zero Golomb-Rice encoded transform coefficient in the TU is encoded as dec_abs_level, updating the history counter for the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(dec_abs_level[cIdx])))>>1,
wherein StatCoeff denotes the history counter, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
21. A video encoding method, the method comprising:
accessing a partition of a video, the partition comprising a plurality of coding tree units (CTUs) forming one or more CTU rows;
processing the partition of the video to generate a binary representation of the partition, the processing comprising:
for each CTU of the plurality of CTUs in the partition,
before encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is a first CTU row in the partition;
in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter of a color component used for calculating a Rice parameter to an initial value;
in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter of the color component to a value stored in a history counter storage variable;
encoding the CTU, comprising:
calculating Rice parameters of a plurality of transform units (TUs) in the CTU based on the values of the history counter; and
encoding coefficient values of the plurality of TUs into binary representations corresponding to the plurality of TUs in the CTU based on the calculated Rice parameters; and
after encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the history counter in the history counter storage variable; and
encoding the binary representation of the partition into a bitstream of the video.
22. The method of claim 21, wherein the partition is a frame, slice, or tile.
23. The method of claim 21, wherein setting the history counter of the color component cIdx to the initial value comprises:
StatCoeff[cIdx]=2*Floor(Log2(BitDepth-10)),
where StatCoeff denotes the history counter, BitDepth specifies the bit depth of the samples of the luminance and chrominance arrays of the video, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
24. The method of claim 21, wherein calculating the Rice parameters for the plurality of TUs in the CTU based on the value of the history counter comprises:
for each TU of the plurality of TUs in the CTU,
determining a replacement variable HistValue based on the history counter;
calculating a local sum variable locSumAbs for a coefficient in the TU using values of neighboring coefficients in a predetermined neighborhood of the coefficient in the TU and the replacement variable HistValue; and
deriving the Rice parameter of the TU based on the local sum variable locSumAbs.
25. The method of claim 24, wherein calculating the local sum variable locSumAbs of the coefficients in the TU comprises:
determining that a neighboring sample of a plurality of neighboring samples in the predetermined neighborhood of the coefficient is outside the TU; and
calculating the local sum variable locSumAbs using the replacement variable HistValue as the value of the neighboring sample outside the TU.
26. The method of claim 21, further comprising:
before encoding the CTU, in response to determining that the current CTU row is not the first CTU row in the partition, setting a replacement variable to a stored replacement variable value, wherein calculating the Rice parameters for the plurality of TUs in the CTU based on the value of the history counter comprises: calculating the Rice parameters of the plurality of TUs in the CTU based on the replacement variable; and
after encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the replacement variable.
27. The method of claim 21, wherein encoding the CTU further comprises updating the history counter by:
in response to determining that a first non-zero Golomb-Rice encoded transform coefficient of one TU of the plurality of TUs is encoded as abs_remainder, updating the history counter of the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(abs_remainder[cIdx]))+2)>>1; and
in response to determining that the first non-zero Golomb-Rice encoded transform coefficient in the TU is encoded as dec_abs_level, updating the history counter for the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(dec_abs_level[cIdx])))>>1,
wherein StatCoeff denotes the history counter, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
28. A non-transitory computer readable medium storing program code executable by one or more processing devices to perform operations comprising:
accessing a partition of a video, the partition comprising a plurality of coding tree units (CTUs) forming one or more CTU rows;
processing the partition of the video to generate a binary representation of the partition, the processing comprising:
for each CTU of the plurality of CTUs in the partition,
before encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is a first CTU row in the partition;
in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter of a color component used for calculating a Rice parameter to an initial value;
in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter of the color component to a value stored in a history counter storage variable;
encoding the CTU, comprising:
calculating Rice parameters of a plurality of transform units (TUs) in the CTU based on the values of the history counter; and
encoding coefficient values of the plurality of TUs into binary representations corresponding to the plurality of TUs in the CTU based on the calculated Rice parameters; and
after encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the history counter in the history counter storage variable; and
encoding the binary representation of the partition into a bitstream of the video.
29. The non-transitory computer-readable medium of claim 28, wherein setting the history counter of the color component cIdx to the initial value comprises:
StatCoeff[cIdx]=2*Floor(Log2(BitDepth-10)),
where StatCoeff denotes the history counter, BitDepth specifies the bit depth of the samples of the luminance and chrominance arrays of the video, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
30. The non-transitory computer-readable medium of claim 28, wherein calculating the Rice parameter for the plurality of TUs in the CTU based on the value of the history counter comprises:
for each TU of the plurality of TUs in the CTU,
determining a replacement variable HistValue based on the history counter;
calculating a local sum variable locSumAbs for a coefficient in the TU using values of neighboring coefficients in a predetermined neighborhood of the coefficient in the TU and the replacement variable HistValue; and
deriving the Rice parameter of the TU based on the local sum variable locSumAbs.
31. The non-transitory computer readable medium of claim 28, wherein the partition is a frame, slice, or tile.
32. The non-transitory computer-readable medium of claim 30, wherein calculating the local sum variable locSumAbs of the coefficients in the TU comprises:
determining that a neighboring sample of a plurality of neighboring samples in the predetermined neighborhood of the coefficient is outside the TU; and
calculating the local sum variable locSumAbs using the replacement variable HistValue as the value of the neighboring sample outside the TU.
33. The non-transitory computer-readable medium of claim 28, further comprising:
before encoding the CTU, in response to determining that the current CTU row is not the first CTU row in the partition, setting a replacement variable to a stored replacement variable value, wherein calculating the Rice parameters for the plurality of TUs in the CTU based on the value of the history counter comprises: calculating the Rice parameters of the plurality of TUs in the CTU based on the replacement variable; and
after encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the replacement variable.
34. The non-transitory computer-readable medium of claim 28, wherein encoding the CTU further comprises updating the history counter by:
in response to determining that a first non-zero Golomb-Rice encoded transform coefficient of one TU of the plurality of TUs is encoded as abs_remainder, updating the history counter of the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(abs_remainder[cIdx]))+2)>>1; and
in response to determining that the first non-zero Golomb-Rice encoded transform coefficient in the TU is encoded as dec_abs_level, updating the history counter for the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(dec_abs_level[cIdx])))>>1,
wherein StatCoeff denotes the history counter, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
35. A system, comprising:
a processing device; and
a non-transitory computer readable medium communicatively coupled to the processing device, wherein the processing device is configured to execute program code stored in the non-transitory computer readable medium to:
accessing a partition of a video, the partition comprising a plurality of coding tree units (CTUs) forming one or more CTU rows;
processing the partition of the video to generate a binary representation of the partition, the processing comprising:
for each CTU of the plurality of CTUs in the partition,
before encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is a first CTU of a current CTU row, determining whether the current CTU row is a first CTU row in the partition;
in response to determining that the current CTU row is the first CTU row in the partition, setting a history counter of a color component used for calculating a Rice parameter to an initial value;
in response to determining that the current CTU row is not the first CTU row in the partition, setting the history counter of the color component to a value stored in a history counter storage variable;
encoding the CTU, comprising:
calculating Rice parameters of a plurality of transform units (TUs) in the CTU based on the values of the history counter; and
encoding coefficient values of the plurality of TUs into binary representations corresponding to the plurality of TUs in the CTU based on the calculated Rice parameters; and
after encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the history counter in the history counter storage variable; and
encoding the binary representation of the partition into a bitstream of the video.
36. The system of claim 35, wherein setting the history counter of the color component cIdx to the initial value comprises:
StatCoeff[cIdx]=2*Floor(Log2(BitDepth-10)),
where StatCoeff denotes the history counter, BitDepth specifies the bit depth of the samples of the luminance and chrominance arrays of the video, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
37. The system of claim 35, wherein calculating the Rice parameters for the plurality of TUs in the CTU based on the value of the history counter comprises:
for each TU of the plurality of TUs in the CTU,
determining a replacement variable HistValue based on the history counter;
calculating local sum variables locsub of coefficients in the TU using values of neighboring coefficients in a predetermined neighborhood of the coefficients in the TU and the substitution variable histvue; and
The Rice parameter of the TU is derived based on the local and variable locSumAbs.
38. The system of claim 35, wherein the partition is a frame, slice, or tile.
39. The system of claim 35, further comprising:
before encoding the CTU, in response to determining that the current CTU row is not the first CTU row in the partition, setting a replacement variable to a stored replacement variable value, wherein calculating the Rice parameters for the plurality of TUs in the CTU based on the value of the history counter comprises: calculating the Rice parameters of the plurality of TUs in the CTU based on the replacement variable; and
after encoding the CTU, in response to determining that parallel encoding has been enabled and that the CTU is the first CTU of the current CTU row, storing a current value of the replacement variable.
40. The system of claim 35, wherein encoding the CTU further comprises updating the history counter by:
in response to determining that a first non-zero Golomb-Rice encoded transform coefficient of one TU of the plurality of TUs is encoded as abs_remainder, updating the history counter of the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(abs_remainder[cIdx]))+2)>>1; and
in response to determining that the first non-zero Golomb-Rice encoded transform coefficient in the TU is encoded as dec_abs_level, updating the history counter for the color component cIdx to:
StatCoeff[cIdx]=(StatCoeff[cIdx]+Floor(Log2(dec_abs_level[cIdx])))>>1,
wherein StatCoeff denotes the history counter, Floor(x) represents the largest integer less than or equal to x, and Log2(x) is the base-2 logarithm of x.
CN202280056593.6A 2021-08-26 2022-08-26 History-based derivation of Rice parameters for wavefront parallel processing in video coding Pending CN117837140A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US63/260,600 2021-08-26
US63/251,385 2021-10-01
US202163262078P 2021-10-04 2021-10-04
US63/262,078 2021-10-04
PCT/US2022/075497 WO2023028576A2 (en) 2021-08-26 2022-08-26 History-based rice parameter derivations for wavefront parallel processing in video coding

Publications (1)

Publication Number Publication Date
CN117837140A 2024-04-05

Family

ID=89744323

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202280042714.1A Pending CN117529914A (en) 2021-08-26 2022-08-26 History-based derivation of Rice parameters for wavefront parallel processing in video coding
CN202280056593.6A Pending CN117837140A (en) 2021-08-26 2022-08-26 History-based derivation of Rice parameters for wavefront parallel processing in video coding
CN202280053642.0A Pending CN117837144A (en) 2021-08-26 2022-08-26 Operational range extension for versatile video coding

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202280042714.1A Pending CN117529914A (en) 2021-08-26 2022-08-26 History-based derivation of Rice parameters for wavefront parallel processing in video coding

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202280053642.0A Pending CN117837144A (en) 2021-08-26 2022-08-26 Operational range extension for versatile video coding

Country Status (1)

Country Link
CN (3) CN117529914A (en)

Also Published As

Publication number Publication date
CN117529914A (en) 2024-02-06
CN117837144A (en) 2024-04-05

Similar Documents

Publication Title
US20250047866A1 (en) Operation range extension for versatile video coding
US20250047882A1 (en) Method for decoding video from video bitstream, method for encoding video, video decoder, and video encoder
CN119342228A (en) History-based RICE parameter derivation for video coding
CN117837140A (en) History-based derivation of Rice parameters for wavefront parallel processing in video coding
US12413736B2 (en) History-based rice parameter derivations for video coding
CN117837148A (en) History-based Rice coding parameter derivation for video encoding and decoding
HK40105881A (en) History-based rice parameter derivations for wavefront parallel processing in video coding
HK40107453A (en) Operation range extension for versatile video coding
CN121509682A (en) Initialization processing for video encoding
HK40107071A (en) Initialization processing for video coding

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20240410

Address after: No. 18 Wusha Beach Road, Chang'an Town, Dongguan, Guangdong 523860, China

Applicant after: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.

Country or region after: China

Address before: Room 110, 2479 Bay East Road, Palo Alto, California, USA

Applicant before: Innopeak Technology, Inc.

Country or region before: U.S.A.

SE01 Entry into force of request for substantive examination