Detailed Description
Reference will now be made in detail to the present embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to provide an understanding of the subject matter presented herein. However, various alternatives may be used and the subject matter may be practiced without these specific details without departing from the scope of the claims. For example, the subject matter presented herein may be implemented on many types of electronic devices having digital video capabilities.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the figures are used for distinguishing between objects and not for describing any particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that embodiments of the disclosure described herein may be implemented in other sequences than those illustrated in the figures or otherwise described in the disclosure.
Fig. 1 is a block diagram illustrating an exemplary system 10 for encoding and decoding video blocks in parallel according to some embodiments of the present disclosure. As shown in fig. 1, the system 10 includes a source device 12, the source device 12 generating and encoding video data to be later decoded by a target device 14. The source device 12 and the target device 14 may comprise any of a wide variety of electronic devices, including cloud servers, server computers, desktop or laptop computers, tablet computers, smart phones, set-top boxes, digital televisions, cameras, display devices, digital media players, video gaming machines, video streaming devices, and the like. In some implementations, the source device 12 and the target device 14 are equipped with wireless communication capabilities.
In some implementations, the target device 14 may receive encoded video data to be decoded via the link 16. Link 16 may comprise any type of communication medium or device capable of moving encoded video data from source device 12 to target device 14. In other embodiments, encoded video data may be sent from output interface 22 to storage device 32. The encoded video data in the storage device 32 may then be accessed by the target device 14 via the input interface 28.
As shown in fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Video source 18 may include sources such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface for receiving video from a video content provider, and/or a computer graphics system for generating computer graphics data as source video, or a combination of such sources.
Captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video data may be sent directly to the target device 14 via the output interface 22 of the source device 12. The encoded video data may also (or alternatively) be stored on the storage device 32 for later access by the target device 14 or other device for decoding and/or playback.
The target device 14 includes an input interface 28, a video decoder 30, and a display device 34. Input interface 28 may include a receiver and/or modem and receives encoded video data over link 16. The encoded video data transmitted over link 16 or provided on storage device 32 may include various syntax elements generated by video encoder 20 for use by video decoder 30 in decoding the video data. Such syntax elements may be included in encoded video data transmitted over a communication medium, stored on a storage medium, or stored on a file server.
Video encoder 20 and video decoder 30 may operate in accordance with a proprietary standard or an industry standard (e.g., Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC), or MPEG-4 Part 10 Advanced Video Coding (AVC)) or an extension of such a standard. It should be appreciated that the present application is not limited to a particular video encoding/decoding standard and may be applicable to other video encoding/decoding standards. It is generally contemplated that video encoder 20 of source device 12 may be configured to encode video data according to any of these current or future standards. Similarly, it is also generally contemplated that the video decoder 30 of the target device 14 may be configured to decode video data according to any of these current or future standards.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder and/or decoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic devices, software, hardware, firmware, or any combinations thereof. When implemented in part in software, the electronic device can store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the video encoding/decoding operations disclosed in the present disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device.
Fig. 2 is a block diagram illustrating an exemplary video encoder 20 according to some embodiments described in this disclosure. Video encoder 20 may perform intra-prediction encoding and inter-prediction encoding on video blocks within video frames. Intra-prediction encoding relies on spatial prediction to reduce or remove spatial redundancy in video data within a given video frame or picture. Inter-prediction encoding relies on temporal prediction to reduce or remove temporal redundancy in video data within adjacent video frames or pictures of a video sequence. It should be noted that in the field of video coding, the term "frame" may be used as a synonym for the term "image" or "picture".
As shown in fig. 2, video encoder 20 includes a video data memory 40, a prediction processing unit 41, a Decoded Picture Buffer (DPB) 64, an adder 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. The prediction processing unit 41 further includes a motion estimation unit 42, a motion compensation unit 44, a partitioning unit 45, an intra prediction processing unit 46, and an intra block copy (BC) unit 48. In some implementations, video encoder 20 also includes an inverse quantization unit 58, an inverse transform processing unit 60, and an adder 62 for video block reconstruction. A loop filter 63, such as a deblocking filter, may be located between adder 62 and DPB 64 to filter block boundaries to remove blocking artifacts from the reconstructed video. In addition to the deblocking filter, another loop filter, such as a Sample Adaptive Offset (SAO) filter, a cross-component sample adaptive offset (CCSAO) filter, and/or an Adaptive Loop Filter (ALF), may be used to filter the output of adder 62. In some examples, the loop filter may be omitted and the decoded video block may be provided directly to DPB 64 by adder 62. Video encoder 20 may take the form of fixed or programmable hardware units, or may be dispersed among one or more of the fixed or programmable hardware units described.
Video data memory 40 may store video data to be encoded by components of video encoder 20. The video data in video data memory 40 may be obtained, for example, from video source 18 shown in fig. 1. DPB 64 is a buffer that stores reference video data (e.g., reference frames or pictures) for use by video encoder 20 in encoding the video data (e.g., in intra or inter prediction encoding modes).
As shown in fig. 2, after receiving video data, a partitioning unit 45 within the prediction processing unit 41 partitions the video data into video blocks. This partitioning may also include partitioning the video frame into slices, tiles (e.g., sets of video blocks), or other larger Coding Units (CUs) according to a predefined split structure (e.g., a Quadtree (QT) structure) associated with the video data. It should be noted that the term "block" or "video block" as used herein may be a portion of a frame or picture, in particular a rectangular (square or non-square) portion. Referring to HEVC and VVC, a block or video block may be or correspond to a Coding Tree Unit (CTU), a CU, a Prediction Unit (PU), or a Transform Unit (TU) and/or may be or correspond to a respective block (e.g., a Coding Tree Block (CTB), a Coding Block (CB), a Prediction Block (PB), or a Transform Block (TB)) and/or a sub-block.
The prediction processing unit 41 may select one of a plurality of possible prediction coding modes for the current video block, for example, one of a plurality of intra prediction coding modes or one of a plurality of inter prediction coding modes, based on the error result (e.g., the coding rate and the distortion level). The prediction processing unit 41 may provide the resulting intra- or inter-prediction encoded block to the adder 50 to generate a residual block and to the adder 62 to reconstruct the encoded block for subsequent use as part of a reference frame. Prediction processing unit 41 also provides syntax elements (e.g., motion vectors, intra mode indicators, partition information, and other such syntax information) to entropy encoding unit 56.
To select the appropriate intra-prediction encoding mode for the current video block, intra-prediction processing unit 46 within prediction processing unit 41 may perform intra-prediction encoding of the current video block in relation to one or more neighboring blocks in the same frame as the current block to be encoded to provide spatial prediction. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-prediction encoding of the current video block in relation to one or more prediction blocks in one or more reference frames to provide temporal prediction. Video encoder 20 may perform multiple encoding passes, for example, to select an appropriate encoding mode for each block of video data.
In some embodiments, motion estimation unit 42 determines the inter-prediction mode for the current video frame by generating a motion vector from a predetermined pattern within the sequence of video frames, the motion vector indicating a displacement of a video block within the current video frame relative to a predicted block within a reference video frame. The motion estimation performed by the motion estimation unit 42 is a process of generating a motion vector that estimates motion for a video block. For example, the motion vector may indicate the displacement of a video block within a current video frame or picture relative to a predicted block within a reference frame associated with the current block being encoded within the current frame. The predetermined pattern may designate video frames in the sequence as P-frames or B-frames. The intra BC unit 48 may determine the vector (e.g., block vector) for intra BC encoding in a similar manner as the motion vector used for inter prediction by the motion estimation unit 42, or may determine the block vector using the motion estimation unit 42.
Regardless of whether the prediction block is from the same frame according to intra-prediction or from a different frame according to inter-prediction, video encoder 20 may form a residual video block by subtracting the pixel values of the prediction block from the pixel values of the current video block being encoded. The pixel differences forming the residual video block may include both a luma component difference and a chroma component difference.
Intra-prediction processing unit 46 may encode the current block using various intra-prediction modes, for example, during separate encoding passes, and intra-prediction processing unit 46 (or a mode selection unit in some examples) may select an appropriate intra-prediction mode from the tested intra-prediction modes for use. Intra-prediction processing unit 46 may provide information to entropy encoding unit 56 indicating the intra-prediction mode selected for the block. Entropy encoding unit 56 may encode information into the bitstream that indicates the selected intra-prediction mode.
After the prediction processing unit 41 determines a prediction block for the current video block via inter prediction or intra prediction, the adder 50 forms a residual video block by subtracting the prediction block from the current video block. The residual video data in the residual block may be included in one or more TUs and provided to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a Discrete Cosine Transform (DCT) or a conceptually similar transform.
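The residual formation just described can be illustrated with the following Python sketch (not part of the disclosure; the 2 x 2 sample values are hypothetical), which subtracts a prediction block from the current block sample by sample:

```python
def residual_block(current, prediction):
    """Form a residual block by subtracting the prediction block from the
    current block, sample by sample (blocks are 2-D lists of pixel values)."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]

# Hypothetical 2x2 luma blocks: the residual carries only the differences.
cur = [[120, 124], [118, 122]]
pred = [[118, 120], [119, 121]]
res = residual_block(cur, pred)  # [[2, 4], [-1, 1]]
```

In a real encoder this subtraction is performed for both the luma and the chroma components, and the residual is then forwarded to the transform stage as described above.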
The transform processing unit 52 may send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may also reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting the quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
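A minimal sketch of scalar quantization follows. It is illustrative only: the step-size formula is an approximation of the HEVC/VVC-style mapping from QP to step size, and real codecs additionally use rounding offsets and scaling matrices.

```python
def qstep(qp):
    """Approximate step size: doubles roughly every 6 QP values (assumption)."""
    return 2 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    """Divide each transform coefficient by the step size and round.
    A larger QP gives a larger step, smaller levels, and a lower bit rate."""
    step = qstep(qp)
    return [int(round(c / step)) for c in coeffs]

def dequantize(levels, qp):
    """Inverse operation performed by inverse quantization unit 58/86."""
    step = qstep(qp)
    return [l * step for l in levels]

levels = quantize([64, -32, 8, 0], 22)  # step = 8 -> [8, -4, 1, 0]
```

Note how dequantizing the levels recovers only an approximation of the original coefficients; this loss is the source of quantization distortion mentioned above.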
After quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients into a video bitstream using, for example, context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique. The encoded bitstream may then be sent to a video decoder 30 as shown in fig. 1, or archived in a storage device 32 as shown in fig. 1 for later transmission to the video decoder 30 or retrieval by the video decoder 30. Entropy encoding unit 56 may also entropy encode motion vectors and other syntax elements for the current video frame being encoded.
Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transforms, respectively, to reconstruct the residual video block in the pixel domain for generating reference blocks for predicting other video blocks. As noted above, motion compensation unit 44 may generate a motion compensated prediction block from one or more reference blocks of a frame stored in DPB 64. Motion compensation unit 44 may also apply one or more interpolation filters to the prediction block to calculate sub-integer pixel values for use in motion estimation.
Adder 62 adds the reconstructed residual block to the motion compensated prediction block generated by motion compensation unit 44 to generate a reference block for storage in DPB 64. The reference block may then be used by the intra BC unit 48, the motion estimation unit 42, and the motion compensation unit 44 as a prediction block to inter-predict another video block in a subsequent video frame.
Fig. 3 is a block diagram illustrating an exemplary video decoder 30 according to some embodiments of the present application. Video decoder 30 includes video data memory 79, entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transform processing unit 88, adder 90, and DPB 92. The prediction processing unit 81 further includes a motion compensation unit 82, an intra prediction unit 84, and an intra BC unit 85. Video decoder 30 may perform a decoding process that is substantially reciprocal to the encoding process described above in connection with fig. 2 with respect to video encoder 20. For example, the motion compensation unit 82 may generate prediction data based on the motion vector received from the entropy decoding unit 80, and the intra prediction unit 84 may generate prediction data based on the intra prediction mode indicator received from the entropy decoding unit 80.
In some examples, embodiments of the present disclosure may be dispersed in one or more of the units of video decoder 30. For example, the intra BC unit 85 may perform embodiments of the present application alone or in combination with other units of the video decoder 30 (e.g., the motion compensation unit 82, the intra prediction unit 84, and the entropy decoding unit 80). In some examples, video decoder 30 may not include intra BC unit 85, and the functions of intra BC unit 85 may be performed by other components of prediction processing unit 81 (e.g., motion compensation unit 82).
Video data memory 79 may store video data, such as an encoded video bitstream, to be decoded by other components of video decoder 30. The video data stored in the video data memory 79 may be obtained, for example, from the storage device 32, from a local video source (e.g., a camera), via wired or wireless network communication of video data, or by accessing a physical data storage medium (e.g., a flash drive or hard disk).
During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of encoded video frames and associated syntax elements. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantization coefficients, motion vectors, or intra-prediction mode indicators, as well as other syntax elements. Entropy decoding unit 80 then forwards the motion vector or intra prediction mode indicator, as well as other syntax elements, to prediction processing unit 81.
When a video frame is encoded as an intra-prediction encoded (I) frame, or when an intra-coded prediction block is used in other types of frames, the intra prediction unit 84 of the prediction processing unit 81 may generate prediction data for a video block of the current video frame based on the signaled intra prediction mode and reference data from previously decoded blocks of the current frame.
When a video frame is encoded as an inter-prediction encoded (i.e., B or P) frame, the motion compensation unit 82 of the prediction processing unit 81 generates one or more prediction blocks for the video block of the current video frame based on the motion vectors and other syntax elements received from the entropy decoding unit 80. Each of the prediction blocks may be generated from a reference frame within one of the reference frame lists. Video decoder 30 may construct the reference frame lists, list 0 and list 1, using a default construction technique based on the reference frames stored in DPB 92.
In some examples, when video blocks are encoded according to the intra BC mode described herein, intra BC unit 85 of prediction processing unit 81 generates a prediction block for the current video block based on the block vectors and other syntax elements received from entropy decoding unit 80. The prediction block may be within a reconstructed region of the same picture as the current video block defined by video encoder 20.
The motion compensation unit 82 and/or the intra BC unit 85 determine prediction information for the video block of the current video frame by parsing the motion vector and other syntax elements, and then use the prediction information to generate a prediction block for the current video block being decoded.
Motion compensation unit 82 may also perform interpolation using interpolation filters, such as those used by video encoder 20 during encoding of video blocks, to calculate interpolation values for sub-integer pixels of the reference block. In this case, motion compensation unit 82 may determine interpolation filters used by video encoder 20 from the received syntax elements and use these interpolation filters to generate the prediction block.
The inverse quantization unit 86 dequantizes the quantized transform coefficients provided in the bitstream and entropy decoded by the entropy decoding unit 80, using the same quantization parameter calculated by the video encoder 20 for each video block in the video frame to determine the degree of quantization. The inverse transform processing unit 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to reconstruct the residual block in the pixel domain.
After the motion compensation unit 82 or the intra BC unit 85 generates a prediction block for the current video block based on the vector and other syntax elements, the adder 90 reconstructs a decoded video block for the current video block by adding the residual block from the inverse transform processing unit 88 to the corresponding prediction block generated by the motion compensation unit 82 and the intra BC unit 85. Loop filter 91 (e.g., deblocking filter, SAO filter, CCSAO filter, and/or ALF) may be located between adder 90 and DPB 92 to further process the decoded video block. In some examples, loop filter 91 may be omitted and the decoded video block may be provided directly to DPB 92 by adder 90. The decoded video blocks in a given frame are then stored in DPB 92, and DPB 92 stores reference frames for subsequent motion compensation of the next video block. DPB 92 or a memory device separate from DPB 92 may also store decoded video for later presentation on a display device (e.g., display device 34 of fig. 1).
As shown in fig. 4A, video encoder 20 (or more specifically, partitioning unit 45) generates an encoded representation of a frame by first partitioning the frame into a set of CTUs. The video frame may include an integer number of CTUs ordered consecutively from left to right and top to bottom in raster scan order. Each CTU is the largest logical coding unit, and the width and height of the CTU are signaled by video encoder 20 in the sequence parameter set such that all CTUs in the video sequence have the same size, which is one of 128 x 128, 64 x 64, 32 x 32, and 16 x 16. It should be noted that the application is not necessarily limited to a particular size. As shown in fig. 4B, each CTU may include one CTB of luminance samples, two corresponding coding tree blocks of chrominance samples, and syntax elements for coding the samples of the coding tree blocks. The syntax elements describe the properties of the different types of units of an encoded pixel block and how the video sequence may be reconstructed at video decoder 30, including inter- or intra-prediction, intra-prediction modes, motion vectors, and other parameters. In a monochrome picture or a picture having three separate color planes, a CTU may comprise a single coding tree block and syntax elements for encoding the samples of the coding tree block. A coding tree block may be an NxN block of samples.
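The raster-scan CTU layout described above can be sketched as follows (illustrative only; the frame and CTU dimensions are hypothetical, and edge CTUs are assumed to be padded to whole units, hence the ceiling division):

```python
def ctu_grid(frame_w: int, frame_h: int, ctu_size: int):
    """Return the top-left (x, y) addresses of a frame's CTUs in
    raster-scan order: left to right, then top to bottom."""
    cols = -(-frame_w // ctu_size)  # ceil(frame_w / ctu_size)
    rows = -(-frame_h // ctu_size)  # ceil(frame_h / ctu_size)
    return [(x * ctu_size, y * ctu_size)
            for y in range(rows) for x in range(cols)]

# e.g., a 1920x1080 frame with 128x128 CTUs yields a 15 x 9 grid of 135 CTUs
addrs = ctu_grid(1920, 1080, 128)
```

The grid dimensions follow directly from the signaled CTU size, which is why the size is carried once in the sequence parameter set rather than per CTU.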
To achieve better performance, video encoder 20 may recursively perform tree partitioning, such as binary tree partitioning, ternary tree partitioning, quadtree partitioning, or a combination thereof, on the coding tree blocks of the CTUs and partition the CTUs into smaller CUs. As depicted in fig. 4C, a 64 x 64 CTU 400 is first divided into four smaller CUs, each having a block size of 32 x 32. Among the four smaller CUs, the CUs 410 and 420 are each divided into four CUs with block sizes of 16 x 16. The two 16 x 16 CUs 430 and 440 are each further divided into four CUs with block sizes of 8 x 8. Fig. 4D depicts a quadtree data structure showing the final result of the partitioning process of CTU 400 as depicted in fig. 4C, with each leaf node of the quadtree corresponding to one CU of a respective size ranging from 32 x 32 to 8 x 8. Similar to the CTU depicted in fig. 4B, each CU may include a CB of luma samples and two corresponding coding blocks of chroma samples of a frame of the same size, and syntax elements for coding the samples of the coding blocks. In a monochrome picture or a picture having three separate color planes, a CU may comprise a single coding block and syntax structures for encoding the samples of the coding block. It should be noted that the quadtree partitions depicted in figs. 4C and 4D are for illustrative purposes only, and that one CTU may be split into multiple CUs based on quadtree/ternary tree/binary tree partitions to accommodate varying local characteristics. In a multi-type tree structure, one CTU is divided in a quadtree structure, and each quadtree leaf CU may be further divided in a binary or ternary tree structure. As shown in fig. 4E, there are five possible partition types for a coding block having a width W and a height H, namely, quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.
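The recursive quadtree portion of this partitioning can be sketched in Python as follows. The `should_split` callback is a hypothetical stand-in for the encoder's rate-distortion decision; the example reproduces a single-level split of a 64 x 64 CTU with only the top-left 32 x 32 CU split again:

```python
def quad_split(x, y, size, should_split):
    """Recursively apply quadtree partitioning: a block either stays a leaf
    CU or is split into four half-size sub-blocks (in the style of
    figs. 4C/4D). Returns the leaf CUs as (x, y, size) tuples."""
    if not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):        # top row first, matching raster order
        for dx in (0, half):
            leaves += quad_split(x + dx, y + dy, half, should_split)
    return leaves

# Split the 64x64 CTU once, then split only the top-left 32x32 CU again:
leaves = quad_split(0, 0, 64,
                    lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0))
# -> four 16x16 leaves plus three 32x32 leaves (seven leaf CUs in total)
```

In the full multi-type tree, each quadtree leaf could additionally be tested for binary or ternary splits; this sketch covers only the quadtree stage.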
In the h.266/VVC standard, a frame of an image is divided into a plurality of CTUs, with the CTU used as the basic unit of encoding. Unlike the h.265/HEVC standard, in the h.266/VVC standard a CU may be square or rectangular, and a CTU may contain only one CU (without partitioning) or may be partitioned into multiple CUs. The CTU may be divided according to a Multi-type Tree (MTT) structure, such as BT, TT, or QT partitioning, and the resulting leaf nodes may be further divided using the MTT structure.
The partition status of each node is identified by flags such as split_cu_flag, split_qt_flag, mtt_split_cu_vertical_flag, and mtt_split_cu_binary_flag. For example, split_cu_flag identifies whether the node is further divided; when split_cu_flag = 1, split_qt_flag identifies whether the node employs QT division, and when split_qt_flag = 0, the node employs BT division or TT division; mtt_split_cu_vertical_flag indicates the division direction (1 indicating vertical division; 0 indicating horizontal division); and mtt_split_cu_binary_flag indicates the division type (1 indicating BT division; 0 indicating TT division). It should be noted that the order in which these syntax elements are set is related to the priority of the partition selection made for the CU or CTU. For example, for each QT node, if partitioning continues, QT partitioning or horizontal BT (HBT), vertical BT (VBT), horizontal TT (HTT), or vertical TT (VTT) partitioning may be used. Once BT or TT partitioning has been performed, however, if a node continues to be partitioned, only HBT, VBT, HTT, or VTT partitioning may follow, and QT partitioning cannot be performed. The priority of split_qt_flag is thus before mtt_split_cu_vertical_flag and mtt_split_cu_binary_flag.
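The mapping from these flags to a partition type can be sketched as follows. This is a simplified illustration of the signalling order described above, not the normative VVC parsing process (which also conditions each flag on which splits are allowed at the node):

```python
def parse_partition(split_cu_flag, split_qt_flag=0,
                    mtt_vertical_flag=0, mtt_binary_flag=0):
    """Map the split flags to a partition type, evaluated in priority
    order: split_cu_flag, then split_qt_flag, then the two MTT flags."""
    if split_cu_flag == 0:
        return "NO_SPLIT"            # node is not divided further
    if split_qt_flag == 1:
        return "QT"                  # quadtree split
    direction = "VERTICAL" if mtt_vertical_flag == 1 else "HORIZONTAL"
    kind = "BT" if mtt_binary_flag == 1 else "TT"
    return f"{direction}_{kind}"     # one of the four MTT split types
```

For example, flags (1, 0, 1, 1) denote a vertical binary split, while (1, 1) denotes a quadtree split regardless of the MTT flags, which are then not signaled.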
In addition, in order to further remove redundancy in the above MTT partition scheme, a temporal partition prediction scheme (which may also be referred to as a temporal partition prediction tool) is proposed, which mainly involves the following operations:
Parameter collection and use: after encoding of the reference frame, temporal partition prediction parameters for subsequent use, such as the minimum QT depth (also referred to as the minimum QT partition depth), the average QT depth (also referred to as the average QT partition depth), the maximum MTT depth (also referred to as the maximum MTT partition depth), and the average maximum MTT depth (also referred to as the average maximum MTT partition depth) within the temporal reference region, may be collected. The temporal partition prediction parameters can be acquired and stored after each frame of the image sequence is encoded. For example, the QT depth and the MTT depth may be counted at a granularity of 16 x 16 for reference frames. When the current CU or CTU is being coded, the corresponding temporal partition prediction parameters may be collected based on the co-located block of the reference frame (the co-located block having the size of a CTU), i.e., four parameters for the co-located block are calculated from the collected 16 x 16 granularity information of the reference frame, namely, the minimum QT depth, the average QT depth, the maximum MTT depth, and the average maximum MTT depth, respectively. These four parameters may be used for partition prediction of the current CU or CTU.
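The computation of the four parameters from the stored 16 x 16 granularity depths can be sketched as follows. The flat-list input format and the use of plain arithmetic means are assumptions of this sketch; the text does not specify the exact averaging or rounding rule:

```python
def collect_temporal_params(qt_depths, mtt_depths):
    """Compute the four temporal partition prediction parameters for a
    co-located block, given one QT depth and one (maximum) MTT depth
    per 16x16 unit of the reference frame's co-located region."""
    return {
        "min_qt_depth": min(qt_depths),
        "avg_qt_depth": sum(qt_depths) / len(qt_depths),
        "max_mtt_depth": max(mtt_depths),
        "avg_max_mtt_depth": sum(mtt_depths) / len(mtt_depths),
    }

# Hypothetical depths for the sixteen... here four 16x16 units of a region:
params = collect_temporal_params([1, 2, 2, 3], [0, 1, 2, 1])
```

These aggregated values are then consulted when predicting the partition mode of the current CU or CTU, as described in the following operations.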
Prediction of partition modes: QT partitioning is preferentially allowed according to the minimum value and the average value of the QT depth of the reference picture. The current partition mode is constrained based on the calculated minimum QT depth and average QT depth. For example, if the current QT depth is less than the minimum QT depth of the temporal reference minus 1, only non-partitioning and QT partitioning are allowed; alternatively, if the current QT depth is less than the average QT depth minus 1, QT partitioning, TT partitioning, and non-partitioning are allowed, with BT partitioning additionally being allowed in the case where the parent node uses TT partitioning.
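The two constraint rules just stated can be expressed directly in code. The fall-through case (all modes remain candidates once neither threshold applies) is an assumption of this sketch:

```python
def allowed_partitions(cur_qt_depth, min_qt_depth, avg_qt_depth,
                       parent_is_tt=False):
    """Constrain the candidate partition modes of the current node using
    the minimum/average QT depth collected from the temporal reference."""
    if cur_qt_depth < min_qt_depth - 1:
        # Well below the reference's minimum QT depth: force QT or nothing.
        return {"NO_SPLIT", "QT"}
    if cur_qt_depth < avg_qt_depth - 1:
        allowed = {"NO_SPLIT", "QT", "TT"}
        if parent_is_tt:
            allowed.add("BT")  # BT permitted when the parent used TT
        return allowed
    # Otherwise no temporal constraint applies in this sketch.
    return {"NO_SPLIT", "QT", "BT", "TT"}
```

Pruning BT (and TT) candidates at shallow depths narrows the encoder's search and reduces the split flags that need to be signaled.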
Adaptive adjustment of MTT depth: the maximum MTT depth is adaptively adjusted based on several coding information features, such as the frame type of the reference picture, the picture order count (POC) difference, the quantization parameter (QP), the temporal layer (temporal ID), and the collected temporal partition parameters (such as the maximum and average of the QT depth and the average and maximum of the maximum MTT depth), allowing local regions to be partitioned more deeply or more shallowly, i.e., adaptively increasing or decreasing the maximum MTT depth depending on the temporal information.
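The direction of this adaptation can be sketched as below. The thresholds (POC difference of 2, QP of 37, the depth comparisons) are entirely hypothetical; the text names the input features but not the adjustment rule:

```python
def adjust_max_mtt_depth(base_max_mtt, avg_max_mtt_ref, poc_diff, qp):
    """Raise or lower the maximum MTT depth from temporal-reference
    features (hypothetical thresholds, for illustration only)."""
    depth = base_max_mtt
    if avg_max_mtt_ref > base_max_mtt and poc_diff <= 2:
        depth += 1   # nearby reference partitioned deeply: allow deeper
    elif avg_max_mtt_ref < base_max_mtt - 1 or qp >= 37:
        depth -= 1   # shallow reference or coarse QP: allow shallower
    return max(depth, 0)
```

The point is only that the bound moves in both directions: a deeply partitioned, temporally close reference argues for a larger maximum MTT depth, while a shallow reference or coarse quantization argues for a smaller one.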
Syntax element ordering adjustment for QT partitioning: if the current QT depth is less than the average QT depth of the temporal reference, this indicates that the priority of QT partitioning is higher at this point, and split_qt_flag should be ranked before split_cu_flag.
The partition manner of a CU or CTU is determined using the temporal partition prediction tool in combination with the syntax elements related to the partition status.
In the current partition prediction scheme, a flag for the temporal partition prediction tool is added only at the sequence level. For example, there is only a sequence-level control switch for enabling the temporal partition prediction tool, such as a single control switch flag defined in the Sequence Parameter Set (SPS).
However, this temporal partition prediction tool is closely related to the video content and to the coding control parameters (such as the QP), and there is an inherent temporal dependency (i.e., the correlation of the current frame with the reference frame), making the tool relatively sensitive to the QP and the video content. That is, the effectiveness of the temporal partition prediction tool differs significantly across different sequence contents, and the tool cannot be applied well to all video frames. If only a sequence-level control switch flag is used, codec efficiency is reduced. Therefore, there is a need to further enhance the adaptive capability of the temporal partition prediction tool.
In view of the above, according to the present disclosure, a control switch flag lower than the sequence level is provided so that the activation of the temporal partition prediction tool can be controlled more precisely, thereby improving coding and decoding efficiency. Embodiments of the present disclosure will be described in detail below.
Fig. 5 is a flowchart of a video decoding method according to some embodiments of the present disclosure. The operations shown in fig. 5 may be performed by the decoder shown in fig. 3.
Referring to fig. 5, in step 501, a first syntax element is obtained from a bitstream at a target level lower than the sequence level, wherein the first syntax element indicates whether the temporal partition prediction tool is enabled for a target prediction object corresponding to the target level, the temporal partition prediction tool being used to determine partition information of the target prediction object based on partition information of a temporal reference object of the target prediction object.
According to embodiments of the present disclosure, the target level may include one or more of a picture level, a slice level, or a coding tree unit (CTU) level.
In the case where the target level is the picture level, the target prediction object may be the current picture. In the case where the target level is the slice level, the target prediction object may be the current slice. In the case where the target level is the CTU level, the target prediction object may be the current CTU.
The temporal partition prediction tool according to the present disclosure may be configured to predict a current picture using a reference picture of the current picture; to predict a current slice using a co-located slice, or a block larger than the slice, in the reference picture of the current picture; and to predict a current CTU using a co-located CTU, or a block larger than the CTU, in the reference picture of the current picture.
In the case where the first syntax element is signaled at the picture level, picture-level adaptive switching control of the temporal partition prediction tool may be implemented. The first syntax element may then be used to control the partition decision and/or the partition-related coding and decoding manner for all coding blocks in the current picture. For example, a picture-level first syntax element may be used to control whether the temporal partition prediction tool is enabled for the current frame; when it is enabled, all coding blocks (such as CUs) in the current picture may be predicted using the temporal partition prediction tool. The picture-level temporal partition prediction tool may be used to determine partition information of the current picture based on partition information of a reference picture of the current picture. Here, the partition information may include, but is not limited to, a minimum QT depth, an average QT depth, a maximum MTT depth, an average maximum MTT depth, and the like.
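As a minimal illustrative sketch of the picture-level behavior described above, the partition information could be modeled as a small record that the current picture inherits from its reference picture. The class and function names below are hypothetical and not taken from any standard.

```python
from dataclasses import dataclass

@dataclass
class PartitionInfo:
    """Hypothetical container for the partition statistics named above."""
    min_qt_depth: int
    avg_qt_depth: float
    max_mtt_depth: int
    avg_max_mtt_depth: float

def predict_partition_info(reference: PartitionInfo) -> PartitionInfo:
    # Temporal partition prediction at the picture level: the current
    # picture derives its partition statistics from those of its
    # (co-located) reference picture.
    return PartitionInfo(
        min_qt_depth=reference.min_qt_depth,
        avg_qt_depth=reference.avg_qt_depth,
        max_mtt_depth=reference.max_mtt_depth,
        avg_max_mtt_depth=reference.avg_max_mtt_depth,
    )
```

In a real codec the derived statistics would then constrain the partition search for the coding blocks of the current picture; here they are simply copied to keep the sketch self-contained.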
In the case where the first syntax element is signaled at the slice level, slice-level adaptive switching control of the temporal partition prediction tool may be implemented. The first syntax element may then be used to control the partition decision and/or the partition-related coding and decoding manner for all coding blocks in the current slice. The slice-level temporal partition prediction tool may be used to determine partition information of the current slice based on partition information of a reference slice in a reference picture of the current slice.
For example, the first syntax element may be ph_partitioning_prediction_flag, and may be set in a picture header or a picture parameter set.
The following illustrates the improvements of the present disclosure to the picture header syntax in the VVC standard, as shown in tables 1 and 2 below. The newly added first syntax element is shown in bold italic font in tables 1 and 2.
TABLE 1

| Picture header syntax | Descriptor |
| ... | |
| ***ph_partitioning_prediction_flag*** | ue(v) |
| ... | |

TABLE 2

| Picture header syntax | Descriptor |
| ... | |
| if( ph_inter_slice_allowed_flag ) { | |
| &nbsp;&nbsp;***ph_partitioning_prediction_flag*** | ue(v) |
| &nbsp;&nbsp;if( ph_partition_constraints_override_flag ) { | |
| &nbsp;&nbsp;&nbsp;&nbsp;ph_log2_diff_min_qt_min_cb_inter_slice | ue(v) |
| &nbsp;&nbsp;&nbsp;&nbsp;ph_max_mtt_hierarchy_depth_inter_slice | ue(v) |
| &nbsp;&nbsp;&nbsp;&nbsp;if( ph_max_mtt_hierarchy_depth_inter_slice != 0 ) { | |
| &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ph_log2_diff_max_bt_min_qt_inter_slice | ue(v) |
| &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ph_log2_diff_max_tt_min_qt_inter_slice | ue(v) |
| &nbsp;&nbsp;&nbsp;&nbsp;} | |
| &nbsp;&nbsp;} | |
| } | |
| ... | |
The location of the first syntax element described above is merely exemplary, and the present disclosure is not limited thereto.
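As a non-normative sketch of how a decoder might parse the picture-header fields listed above, the following reads the new flag and then the existing partition-constraint fields under the same conditions as in the syntax tables. `BitReader` is a toy stand-in for a real entropy decoder, and the flag is read with `read_ue` purely to mirror the ue(v) descriptor shown above.

```python
class BitReader:
    """Toy reader over a list of already-decoded values (stand-in for
    real ue(v) parsing of an entropy-coded bitstream)."""
    def __init__(self, values):
        self.values = list(values)

    def read_ue(self):
        return self.values.pop(0)

def parse_picture_header(r, ph_inter_slice_allowed_flag,
                         ph_partition_constraints_override_flag):
    # New first syntax element of this disclosure (Table 1 placement).
    ph = {"ph_partitioning_prediction_flag": r.read_ue()}
    # Existing VVC partition-constraint fields, read only under the
    # conditions shown in the syntax table.
    if ph_inter_slice_allowed_flag:
        if ph_partition_constraints_override_flag:
            ph["ph_log2_diff_min_qt_min_cb_inter_slice"] = r.read_ue()
            ph["ph_max_mtt_hierarchy_depth_inter_slice"] = r.read_ue()
            if ph["ph_max_mtt_hierarchy_depth_inter_slice"] != 0:
                ph["ph_log2_diff_max_bt_min_qt_inter_slice"] = r.read_ue()
                ph["ph_log2_diff_max_tt_min_qt_inter_slice"] = r.read_ue()
    return ph
```

This is only a structural illustration of the conditional parsing order; a conforming parser would of course operate on actual bits.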
At step 502, it is determined whether a temporal partition prediction tool is enabled for the target prediction object based on the first syntax element.
After obtaining the first syntax element, a determination may be made as to whether to enable the temporal partition prediction tool based on a value of the first syntax element. For example, when the value of the first syntax element is 1, it may be determined to enable the temporal partition prediction tool. When the value of the first syntax element is 0, it may be determined that the temporal partition prediction tool is not enabled.
In step 503, based on determining that the temporal partition prediction tool is enabled for the target prediction object, prediction is performed on the target prediction object using the temporal partition prediction tool.
In the case where the temporal partition prediction tool is enabled, the temporal partition prediction tool described above may be used in combination with the partition-related syntax elements to determine the partition manner of the target prediction object.
According to an embodiment of the present disclosure, before the first syntax element is obtained from the bitstream, a second syntax element may be obtained from the bitstream at the sequence level, wherein the second syntax element indicates whether the temporal partition prediction tool can be enabled at the sequence level; the first syntax element is then obtained at the target level based on a determination that the temporal partition prediction tool can be enabled at the sequence level. The second syntax element may be a sequence-level control switch for the temporal partition prediction tool.
For example, in the case where the target level is the picture level, the second syntax element for the sequence level may first be obtained from the bitstream. The second syntax element may indicate whether the temporal partition prediction tool can be enabled for the sequence of pictures. In the case where it is determined that the temporal partition prediction tool can be enabled for the sequence of pictures, the tool can be used for each picture in the sequence, and it may then be further determined, by means of the first syntax element, whether the temporal partition prediction tool is used for each picture in the sequence. In the case where it is determined that the temporal partition prediction tool is not enabled at the sequence level, the tool is not used for any picture in the sequence; the first syntax element is then not present and is inferred to be zero.
For another example, in the case where the target level is the slice level, the second syntax element for the sequence level is first obtained from the bitstream. The second syntax element may indicate whether the temporal partition prediction tool can be enabled for the sequence of pictures. In the case where it is determined that the temporal partition prediction tool can be enabled for the sequence of pictures, the tool can be used for each picture in the sequence, and it may then be further determined, by means of the first syntax element, whether the temporal partition prediction tool is used for the current slice. In the case where it is determined that the temporal partition prediction tool is not enabled for the sequence of pictures, the tool is not used for any picture in the sequence; the first syntax element is then not present and is inferred to be zero.
According to embodiments of the present disclosure, in the case where the temporal partition prediction tool is enabled for the sequence of pictures, the first syntax element may be obtained at a target level lower than the sequence level to further determine whether the temporal partition prediction tool is enabled for the target prediction object corresponding to the target level. In the case where the temporal partition prediction tool is not enabled for the sequence of pictures, the first syntax element may not be obtained and is directly inferred to be zero.
According to another embodiment of the present disclosure, a respective syntax element for controlling the enablement of the temporal partition prediction tool may be set at each of the sequence level and the levels below the sequence level (such as the picture level, the slice level, and the CTU level). For example, where the temporal partition prediction tool can be enabled for a sequence of pictures, it may be determined whether the tool can be enabled for each picture in the sequence; where the tool can be enabled for the current picture, it may be further determined whether the tool can be enabled for each slice in the current picture; and so on.
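The cascaded control across levels can be sketched as follows; the argument names are hypothetical stand-ins for the sequence-, picture-, and slice-level flags, with each lower-level flag absent (and inferred to be 0) when the level above it is off.

```python
def tool_enabled(sps_flag: int, ph_flag: int = 0, slice_flag: int = 0) -> bool:
    """Cascaded enablement check: each level gates the level below it."""
    # Sequence-level switch gates everything: when it is 0, the
    # lower-level flags are not present and are inferred to be 0.
    if not sps_flag:
        return False
    # Picture-level flag gates the slice-level flag in the same way.
    if not ph_flag:
        return False
    return bool(slice_flag)
```

A CTU-level flag would extend the same chain by one more step.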
According to the present disclosure, by adding a control switch for the temporal partition prediction tool at a target level below the sequence level, the temporal dependency of the video can be better controlled (temporal partition prediction introduces an additional frame-level reference dependency, and the lower-level switch allows this dependency to be controlled per video frame). The requirements of video coding and decoding functions can thus be better met, and the compression performance is improved.
Fig. 6 is a flowchart of a video encoding method according to some embodiments of the present disclosure. The video encoding method shown in fig. 6 may be performed by the encoder shown in fig. 2.
Referring to fig. 6, in step 601, it is determined whether the temporal partition prediction tool is used for a target prediction object corresponding to a target level lower than a sequence level, wherein the temporal partition prediction tool is used to determine partition information of the target prediction object based on partition information of a temporal reference object of the target prediction object.
According to an embodiment, the target level may include one or more of a picture level, a slice level, and a coding tree unit (CTU) level.
Whether to use the temporal partition prediction tool for the target prediction object may be determined based on at least one of a multi-pass encoding analysis result and a pre-analysis result obtained before the target prediction object is encoded.
For example, in the case where the target prediction object is the current picture, whether to use the temporal partition prediction tool may be determined based on the content or picture characteristics of the current picture. For another example, the texture distribution of the whole frame may be pre-analyzed to determine whether to use the temporal partition prediction tool. For still another example, whether to use the temporal partition prediction tool may be determined based on the actually selected MTT depths. These examples are merely exemplary, and the present disclosure is not limited thereto.
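One hypothetical pre-analysis heuristic, purely to illustrate the kind of decision described above (the similarity measure and threshold below are assumptions, not part of the disclosure): enable the tool only when the texture of the current picture is close to that of its reference, measured here by a crude variance comparison over luma samples.

```python
def use_temporal_partition_prediction(cur_samples, ref_samples, threshold=0.2):
    """Per-picture decision sketch: use the tool only when the current
    picture's texture (toy measure: sample variance) is close to the
    reference picture's, since the tool relies on temporal correlation."""
    def variance(samples):
        mean = sum(samples) / len(samples)
        return sum((s - mean) ** 2 for s in samples) / len(samples)

    v_cur, v_ref = variance(cur_samples), variance(ref_samples)
    if v_ref == 0:
        return v_cur == 0
    # Relative variance difference as a stand-in for texture similarity.
    return abs(v_cur - v_ref) / v_ref <= threshold
```

A real encoder would use richer statistics (e.g., the per-block QT/MTT depths actually selected in previous passes) rather than a single variance figure.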
At step 602, a first syntax element for the target prediction object is determined, wherein the first syntax element indicates whether the temporal partition prediction tool is used for the target prediction object.
For example, in the case where it is determined that the temporal partition prediction tool is used for the target prediction object, the value of the first syntax element may be determined to be 1. In the case where it is determined that the temporal partition prediction tool is not used for the target prediction object, the value of the first syntax element may be determined to be 0, or the first syntax element may be absent.
The first syntax element may be ph_partitioning_prediction_flag and may be set in a picture header or a picture parameter set, as with the first syntax element added in the picture header shown in table 1 or table 2 above.
In step 603, the first syntax element is signaled at the target level in the bitstream.
For example, in the case where the value of the first syntax element is determined to be 1, the first syntax element may be signaled.
According to embodiments of the present disclosure, it may be determined whether the temporal partition prediction tool can be used at the sequence level; a second syntax element for the sequence level may then be determined, wherein the second syntax element indicates whether the temporal partition prediction tool can be used at the sequence level, and the second syntax element is signaled at the sequence level in the bitstream.
In the case where it is determined that the temporal partition prediction tool is not used at the sequence level, it may be determined that the tool is not used at the target level, and the first syntax element is not signaled at the target level.
In the case where it is determined that the temporal partition prediction tool is used at the sequence level, it may be further determined whether to use the tool for the target prediction object corresponding to the target level lower than the sequence level, so as to determine the first syntax element for the target level.
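Putting steps 601-603 together with the sequence-level gate, the encoder-side flow can be sketched as follows. The `write` callback and the sequence-level flag name `sps_partitioning_prediction_flag` are hypothetical; only `ph_partitioning_prediction_flag` comes from the tables above.

```python
def encode_flags(sps_enabled, picture_decisions, write):
    """Signal the sequence-level switch, then one per-picture flag only
    when the sequence-level switch is on (sketch of steps 601-603)."""
    write("sps_partitioning_prediction_flag", int(sps_enabled))
    if not sps_enabled:
        # Lower-level flags are not signaled; the decoder infers them as 0.
        return
    for use_tool in picture_decisions:
        write("ph_partitioning_prediction_flag", int(use_tool))
```

Here `picture_decisions` stands for the per-picture outcomes of the pre-analysis or multi-pass analysis described in step 601.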
The above-described methods may be implemented using an apparatus comprising one or more circuits comprising an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components. The apparatus may use the circuitry in combination with other hardware or software components to perform the methods described above. Each of the modules, sub-modules, units, or sub-units disclosed above may be implemented, at least in part, using one or more circuits.
Fig. 7 illustrates a computing environment 1610 coupled to a user interface 1650. The computing environment 1610 may be part of a data processing server. The computing environment 1610 includes a processor 1620, a memory 1630, and an input/output (I/O) interface 1640.
Processor 1620 generally controls overall operation of computing environment 1610, such as operations associated with display, data acquisition, data communication, and image processing. Processor 1620 may include one or more processors for executing instructions to perform all or some of the steps of the methods described above. Further, the processor 1620 may include one or more modules that facilitate interactions between the processor 1620 and other components. The processor may be a Central Processing Unit (CPU), a microprocessor, a single-chip microcomputer, a Graphics Processing Unit (GPU), or the like.
Memory 1630 is configured to store various types of data to support the operation of computing environment 1610. The memory 1630 may include predetermined software 1632. Examples of such data include instructions for any application or method operating on computing environment 1610, video data sets, image data, and the like. The memory 1630 may be implemented using any type of volatile or nonvolatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
I/O interface 1640 provides an interface between processor 1620 and peripheral interface modules (e.g., keyboard, click wheel, buttons, etc.). Buttons may include, but are not limited to, a home button, a start scan button, and a stop scan button. I/O interface 1640 may be coupled with an encoder and decoder.
In an embodiment, there is also provided a non-transitory computer readable storage medium including, for example, a plurality of programs in memory 1630 executable by processor 1620 in computing environment 1610 for performing the above-described method and/or storing a bitstream generated by the above-described encoding method or a bitstream to be decoded by the above-described decoding method. In one example, the plurality of programs may be executed by the processor 1620 in the computing environment 1610 to receive (e.g., from the video encoder 20 in fig. 2) a bitstream or data stream comprising encoded video information (e.g., video blocks representing encoded video frames, and/or associated one or more syntax elements, etc.), and may also be executed by the processor 1620 in the computing environment 1610 to perform the above-described decoding method according to the received bitstream or data stream. In another example, the plurality of programs may be executed by the processor 1620 in the computing environment 1610 for performing the encoding methods described above to encode video information (e.g., video blocks representing video frames, and/or associated one or more syntax elements, etc.) into a bitstream or data stream, and may also be executed by the processor 1620 in the computing environment 1610 for transmitting the bitstream or data stream (e.g., to the video decoder 30 in fig. 3). Alternatively, a non-transitory computer readable storage medium may have stored therein a bitstream or data stream comprising encoded video information (e.g., video blocks representing encoded video frames, and/or associated one or more syntax elements, etc.) that is generated by an encoder (e.g., video encoder 20 of fig. 2) using, for example, the encoding methods described above, for use by a decoder (e.g., video decoder 30 of fig. 3) in decoding video data. 
The non-transitory computer readable storage medium may be, for example, ROM, random-access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In an embodiment, a bitstream generated by the above-described encoding method or a bitstream to be decoded by the above-described decoding method is provided. In an embodiment, there is provided a bitstream including encoded video information generated by the above-described encoding method or encoded video information to be decoded by the above-described decoding method.
In an embodiment, a computing device is also provided that includes one or more processors (e.g., processor 1620), and a non-transitory computer-readable storage medium or memory 1630 having stored therein a plurality of programs executable by the one or more processors, wherein the one or more processors are configured to perform the above-described methods when executing the plurality of programs.
In an embodiment, there is also provided a computer program product having instructions for storing or transmitting a bitstream comprising encoded video information generated by the above-described encoding method or encoded video information to be decoded by the above-described decoding method. In an embodiment, a computer program product is also provided that includes a plurality of programs, e.g., in memory 1630, executable by processor 1620 in computing environment 1610 for performing the methods described above. For example, the computer program product may include a non-transitory computer readable storage medium.
In an embodiment, the computing environment 1610 may be implemented by one or more ASICs, DSPs, Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), FPGAs, GPUs, controllers, microcontrollers, microprocessors, or other electronic components for performing the methods described above.
In an embodiment, there is also provided a method of storing a bitstream, comprising storing the bitstream on a digital storage medium, wherein the bitstream comprises encoded video information generated by the above-described encoding method or encoded video information to be decoded by the above-described decoding method.
In an embodiment, there is also provided a method for transmitting a bitstream generated by the above encoder. In an embodiment, a method for receiving a bitstream to be decoded by the decoder described above is also provided.
The description of the present disclosure has been presented for purposes of illustration and is not intended to be exhaustive or limited to the disclosure. Many modifications, variations and alternative embodiments will come to mind to one skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
The order of steps of the method according to the present disclosure is intended to be illustrative only, unless specifically stated otherwise, and the steps of the method according to the present disclosure are not limited to the above-described order, but may be changed according to actual circumstances. Furthermore, at least one of the steps of the method according to the present disclosure may be adjusted, combined or pruned as actually needed.
The examples were chosen and described in order to explain the principles of the present disclosure and to enable others skilled in the art to understand the disclosure for various embodiments and with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the disclosure is not to be limited to the specific examples of the disclosed embodiments, and that modifications and other embodiments are intended to be included within the scope of the disclosure.