The present application is based on and claims priority from U.S. Provisional Application No. 63/426,712, entitled "Methods and Devices for Intra Block Copy and Intra Template Matching," filed in November 2022, the entire contents of which are incorporated herein by reference for all purposes.
Detailed Description
Reference will now be made in detail to the specific embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to provide an understanding of the subject matter presented herein. However, various alternatives may be used without departing from the scope of the claims, and the present subject matter may be practiced without these specific details. For example, the subject matter presented herein may be implemented on many types of electronic devices having digital video capabilities.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in this disclosure refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Reference throughout this specification to "one embodiment," "an example," "some embodiments," "some examples," or similar language means that a particular feature, structure, or characteristic described is included in at least one embodiment or example. Features, structures, elements, or characteristics described in connection with one or some embodiments may be applicable to other embodiments unless explicitly stated otherwise.
Throughout this disclosure, unless explicitly stated otherwise, the terms "first," "second," "third," and the like, are used merely as designations of references to related elements (e.g., devices, components, compositions, steps, etc.), and do not imply any spatial or temporal order. For example, a "first device" and a "second device" may refer to two separately formed devices, or two portions, components, or operational states of the same device, and may be arbitrarily named.
The terms "module," "sub-module," "circuit," "sub-circuit," "circuitry," "sub-circuitry," "unit," or "sub-unit" may include a memory (shared, dedicated, or group) that stores code or instructions executable by one or more processors. A module may include one or more circuits with or without stored code or instructions. A module or circuit may include one or more components connected directly or indirectly. These components may or may not be physically attached to each other or positioned adjacent to each other.
As used herein, the term "if" or "when" may be understood to mean "based on" or "responsive to," depending on the context. These terms, if present in the claims, may not indicate that the relevant limitations or features are conditional or optional. For example, a method may include the steps of i) performing a function or action X′ when or if condition X exists, and ii) performing a function or action Y′ when or if condition Y exists. The method may be implemented with both the ability to perform function or action X′ and the ability to perform function or action Y′. Thus, functions X′ and Y′ may both be performed, at different times, over multiple executions of the method.
The units or modules may be implemented purely in software, purely in hardware or by a combination of hardware and software. In a pure software implementation, for example, units or modules may comprise functionally related code blocks or software components that are directly or indirectly linked together in order to perform particular functions.
Fig. 1A is a block diagram illustrating an exemplary system 10 for encoding and decoding video blocks in parallel according to some embodiments of the present disclosure. As shown in fig. 1A, the system 10 includes a source device 12, the source device 12 generating and encoding video data to be later decoded by a target device 14. Source device 12 and target device 14 may comprise any of a wide variety of electronic devices, including cloud servers, server computers, desktop or laptop computers, tablet computers, smart phones, set-top boxes, digital televisions, cameras, display devices, digital media players, video gaming machines, video streaming devices, and the like. In some implementations, the source device 12 and the target device 14 are equipped with wireless communication capabilities.
In some implementations, the target device 14 may receive the encoded video data to be decoded via a link 16. Link 16 may comprise any type of communication medium or device capable of moving the encoded video data from source device 12 to target device 14. In one example, link 16 may include a communication medium that enables source device 12 to transmit the encoded video data directly to target device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the target device 14. The communication medium may include any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network (e.g., a local area network, a wide area network, or a global network such as the Internet). The communication medium may include routers, switches, base stations, or any other devices that may facilitate communication from source device 12 to target device 14.
In some other implementations, encoded video data may be sent from output interface 22 to storage device 32. The encoded video data in the storage device 32 may then be accessed by the target device 14 via the input interface 28. Storage device 32 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray disc, digital versatile disc (DVD), compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In another example, storage device 32 may correspond to a file server or another intermediate storage device that may hold the encoded video data generated by source device 12. The target device 14 may access the stored video data from the storage device 32 via streaming or download. The file server may be any type of computer capable of storing encoded video data and transmitting the encoded video data to the target device 14. Exemplary file servers include web servers (e.g., for a website), File Transfer Protocol (FTP) servers, Network Attached Storage (NAS) devices, and local disk drives. The target device 14 may access the encoded video data through any standard data connection suitable for accessing encoded video data stored on a file server, including a wireless channel (e.g., a Wireless Fidelity (Wi-Fi) connection), a wired connection (e.g., a Digital Subscriber Line (DSL), a cable modem, etc.), or a combination of both. The transmission of encoded video data from storage device 32 may be a streaming transmission, a download transmission, or a combination of both.
As shown in fig. 1A, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Video source 18 may include sources such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface for receiving video from a video content provider, and/or a computer graphics system for generating computer graphics data as source video, or a combination of such sources. As one example, if video source 18 is a video camera of a security monitoring system, source device 12 and target device 14 may form a camera phone or video phone. However, the embodiments described in this disclosure are generally applicable to video codecs and may be applied to wireless and/or wired applications.
Captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video data may be sent directly to the target device 14 via the output interface 22 of the source device 12. The encoded video data may also (or alternatively) be stored on the storage device 32 for later access by the target device 14 or other device for decoding and/or playback. Output interface 22 may also include a modem and/or a transmitter.
The target device 14 includes an input interface 28, a video decoder 30, and a display device 34. Input interface 28 may include a receiver and/or modem and receives encoded video data over link 16. The encoded video data communicated over link 16 or provided on storage device 32 may include various syntax elements generated by video encoder 20 for use by video decoder 30 in decoding the video data. Such syntax elements may be included within encoded video data sent over a communication medium, stored on a storage medium, or stored on a file server.
In some implementations, the target device 14 may include a display device 34, and the display device 34 may be an integrated display device or an external display device configured to communicate with the target device 14. The display device 34 displays the decoded video data to a user and may comprise any of a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 30 may operate in accordance with a proprietary standard or an industry standard (e.g., VVC, HEVC, MPEG-4 Part 10 AVC) or an extension of such a standard. It should be understood that the present application is not limited to a particular video encoding/decoding standard and is applicable to other video encoding/decoding standards. It is generally contemplated that video encoder 20 of source device 12 may be configured to encode video data according to any of these current or future standards. Similarly, it is also generally contemplated that video decoder 30 of target device 14 may be configured to decode video data according to any of these current or future standards.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder and/or decoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When implemented partially in software, an electronic device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the video encoding/decoding operations disclosed in the present disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
In some implementations, at least some of the components of source device 12 (e.g., video source 18, video encoder 20 or components included in video encoder 20 as described below with reference to fig. 2, and output interface 22) and/or at least some of the components of target device 14 (e.g., input interface 28, video decoder 30 or components included in video decoder 30 as described below with reference to fig. 3, and display device 34) may operate in a cloud computing service network that may provide software, platforms, and/or infrastructure, such as software as a service (SaaS), platform as a service (PaaS), or infrastructure as a service (IaaS). In some implementations, one or more components of the source device 12 and/or the target device 14 that are not included in the cloud computing service network may be provided in one or more client devices, and the one or more client devices may communicate with a server computer in the cloud computing service network through a wireless communication network (e.g., a cellular communication network, a short-range wireless communication network, or a global navigation satellite system (GNSS) communication network) or a wired communication network (e.g., a local area network (LAN) or a power line communication (PLC) network). In some embodiments, at least a portion of the operations described herein may be implemented as cloud-based services provided by one or more server computers in the cloud computing service network that implement at least a portion of the components of source device 12 and/or at least a portion of the components of target device 14, while one or more other operations described herein may be implemented by one or more client devices. In some implementations, the cloud computing service network may be a private cloud, a public cloud, or a hybrid cloud. Terms such as "cloud," "cloud computing," "cloud-based," and the like herein may be used interchangeably as appropriate without departing from the scope of this disclosure. It should be understood that the present disclosure is not limited to implementation in the cloud computing service network described above. Alternatively, the disclosure may be implemented in any other type of computing environment, whether currently known or developed in the future.
Fig. 4A-4E are schematic diagrams illustrating multi-type tree partitioning modes according to some embodiments of the present disclosure. Fig. 4A-4E show five partition types, including quaternary partition (fig. 4A), vertical binary partition (fig. 4B), horizontal binary partition (fig. 4C), vertical ternary partition (fig. 4D), and horizontal ternary partition (fig. 4E), respectively.
Fig. 2 is a block diagram illustrating another exemplary video encoder 20 according to some embodiments described in this disclosure. Video encoder 20 may perform intra-prediction encoding and inter-prediction encoding of video blocks within video frames. Intra-prediction encoding relies on spatial prediction to reduce or eliminate spatial redundancy in video data within a given video frame or picture. Inter-prediction encoding relies on temporal prediction to reduce or eliminate temporal redundancy in video data within adjacent video frames or pictures of a video sequence. It should be noted that the term "frame" may be used as a synonym for the term "image" or "picture" in the field of video coding.
As shown in fig. 2, video encoder 20 includes a video data memory 40, a prediction processing unit 41, a decoded picture buffer (DPB) 64, an adder 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. The prediction processing unit 41 further includes a motion estimation unit 42, a motion compensation unit 44, a partition unit 45, an intra prediction processing unit 46, and an intra block copy (BC) unit 48. In some implementations, video encoder 20 also includes an inverse quantization unit 58, an inverse transform processing unit 60, and an adder 62 for video block reconstruction. A loop filter 63, such as a deblocking filter, may be located between adder 62 and DPB 64 to filter block boundaries and remove blockiness artifacts from the reconstructed video. In addition to the deblocking filter, additional loop filters may be used to filter the output of adder 62, such as a sample adaptive offset (SAO) filter, a cross-component sample adaptive offset (CCSAO) filter, and/or an adaptive in-loop filter (ALF). It should be noted that the present application is not limited to the embodiments described herein with respect to the CCSAO technique; alternatively, the present application may be applied to cases where an offset for any one of the luminance component, the Cb chrominance component, and the Cr chrominance component is selected according to any other one of these components, and that component is modified based on the selected offset. Further, it should also be noted that the first component mentioned herein may be any one of the luminance component, the Cb chrominance component, and the Cr chrominance component; the second component mentioned herein may be any other one of these components; and the third component mentioned herein may be the remaining one of these components. In some examples, the loop filter may be omitted and the decoded video block may be provided directly to DPB 64 by adder 62. Video encoder 20 may take the form of fixed or programmable hardware units, or may be distributed across one or more of the described fixed or programmable hardware units.
Video data memory 40 may store video data to be encoded by components of video encoder 20. The video data in video data store 40 may be obtained, for example, from video source 18 as shown in fig. 1A. DPB 64 is a buffer that stores reference video data (reference frames or pictures) for use by video encoder 20 in encoding the video data (e.g., in intra or inter prediction encoding modes). Video data memory 40 and DPB 64 may be formed from any of a variety of memory devices. In various examples, video data memory 40 may be on-chip with other components of video encoder 20, or off-chip with respect to those components.
As shown in fig. 2, after receiving video data, a partition unit 45 within the prediction processing unit 41 partitions the video data into video blocks. This partitioning may also include partitioning a video frame into slices, tiles (e.g., sets of video blocks), or other larger coding units (CUs) according to a predefined partitioning structure associated with the video data, such as a quad-tree (QT) structure. A video frame is, or may be considered as, a two-dimensional array or matrix of samples having sample values. The samples in the array may also be referred to as pixels or pels. The number of samples in the horizontal and vertical directions (or axes) of the array or picture defines the size and/or resolution of the video frame. The video frame may be divided into a plurality of video blocks, for example, using QT partitioning. A video block is likewise, or may be considered as, a two-dimensional array or matrix of samples having sample values, but its size is smaller than that of the video frame. The number of samples in the horizontal and vertical directions (or axes) of the video block defines the size of the video block. The video block may be further partitioned into one or more block partitions or sub-blocks (which may again form blocks), for example by iteratively using QT partitioning, binary-tree (BT) partitioning, or ternary-tree (TT) partitioning, or any combination thereof. It should be noted that the term "block" or "video block" as used herein may be a portion of a frame or picture, in particular a rectangular (square or non-square) portion. Referring to HEVC and VVC, for example, a block or video block may be or correspond to a coding tree unit (CTU), a CU, a prediction unit (PU), or a transform unit (TU), and/or may be or correspond to a respective block, e.g., a coding tree block (CTB), a coding block (CB), a prediction block (PB), or a transform block (TB), and/or to a sub-block.
Prediction processing unit 41 may select one of a plurality of possible prediction coding modes, such as one of a plurality of intra prediction coding modes or one of a plurality of inter prediction coding modes, for the current video block based on error results (e.g., code rate and distortion level). The prediction processing unit 41 may provide the resulting intra-prediction encoded block (e.g., a prediction block) or inter-prediction encoded block to the adder 50 to generate a residual block, and to the adder 62 to reconstruct the encoded block for subsequent use as part of a reference frame. Prediction processing unit 41 also provides syntax elements, such as motion vectors, intra mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.
To select the appropriate intra-prediction encoding mode for the current video block, intra-prediction processing unit 46 within prediction processing unit 41 may perform intra-prediction encoding of the current video block with respect to one or more neighboring blocks in the same frame as the current block to be encoded to provide spatial prediction. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-prediction encoding of the current video block relative to one or more prediction blocks in one or more reference frames to provide temporal prediction. Video encoder 20 may perform multiple encoding passes, for example, selecting an appropriate encoding mode for each block of video data.
In some embodiments, motion estimation unit 42 determines the inter-prediction mode for the current video frame by generating a motion vector according to a predetermined pattern within the sequence of video frames, the motion vector indicating a displacement of a video block within the current video frame relative to a prediction block within a reference video frame. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors that estimate motion for video blocks. For example, a motion vector may indicate the displacement of the current block being encoded within the current video frame or picture relative to a prediction block within a reference frame. The predetermined pattern may designate video frames in the sequence as P-frames or B-frames. The intra BC unit 48 may determine vectors (e.g., block vectors) for intra BC encoding in a manner similar to the determination of motion vectors for inter prediction by motion estimation unit 42, or may determine the block vectors using motion estimation unit 42.
A prediction block for a video block may be or may correspond to a block or reference block of a reference frame that is deemed to closely match the video block to be encoded in terms of pixel differences, which may be determined by a sum of absolute differences (SAD), a sum of squared differences (SSD), or other difference metrics. In some implementations, video encoder 20 may calculate values for sub-integer pixel positions of reference frames stored in DPB 64. For example, video encoder 20 may interpolate values for one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference frame. Accordingly, the motion estimation unit 42 may perform a motion search with respect to both full pixel positions and fractional pixel positions and output a motion vector with fractional pixel accuracy.
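As an illustration of the block matching described above, the following Python sketch (illustrative only; the function names and the exhaustive search strategy are assumptions for the example and are not taken from any codec specification) computes SAD and SSD and performs a full-pixel motion search over a small window:

```python
import numpy as np

def sad(block_a, block_b):
    # Sum of absolute differences between two equally sized sample blocks.
    return int(np.abs(block_a.astype(np.int64) - block_b.astype(np.int64)).sum())

def ssd(block_a, block_b):
    # Sum of squared differences between two equally sized sample blocks.
    diff = block_a.astype(np.int64) - block_b.astype(np.int64)
    return int((diff * diff).sum())

def full_pel_search(current, reference, top, left, search_range, metric=sad):
    # Exhaustively test every full-pixel displacement within the search
    # window and keep the motion vector with the smallest distortion.
    h, w = current.shape
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > reference.shape[0] or x + w > reference.shape[1]:
                continue  # candidate block would fall outside the reference frame
            cost = metric(current, reference[y:y + h, x:x + w])
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

A search with fractional pixel accuracy would first interpolate the half-, quarter-, or one-eighth-pixel positions of the reference frame and then evaluate the same metric at those interpolated positions.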
Motion estimation unit 42 calculates motion vectors for video blocks in inter-prediction encoded frames by comparing the locations of the video blocks with the locations of predicted blocks of reference frames selected from a first reference frame list (list 0) or a second reference frame list (list 1), each of which identifies one or more reference frames stored in DPB 64. The motion estimation unit 42 sends the calculated motion vector to the motion compensation unit 44 and then to the entropy encoding unit 56.
The motion compensation performed by motion compensation unit 44 may involve extracting or generating a prediction block based on the motion vector determined by motion estimation unit 42. Upon receiving the motion vector for the current video block, motion compensation unit 44 may locate the prediction block to which the motion vector points in one of the reference frame lists, retrieve the prediction block from DPB 64, and forward the prediction block to adder 50. Adder 50 then forms a residual video block of pixel differences by subtracting the pixel values of the prediction block provided by motion compensation unit 44 from the pixel values of the current video block being encoded. The pixel differences forming the residual video block may include a luma component difference or a chroma component difference or both. Motion compensation unit 44 may also generate syntax elements associated with the video blocks of the video frames for use by video decoder 30 in decoding the video blocks of the video frames. The syntax elements may include, for example, syntax elements defining motion vectors used to identify the prediction block, any flags indicating the prediction mode, or any other syntax information described herein. Note that motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are shown separately for conceptual purposes.
In some embodiments, intra BC unit 48 may generate vectors and extract prediction blocks in a manner similar to that described above in connection with motion estimation unit 42 and motion compensation unit 44, but with the prediction blocks located in the same frame as the current block being encoded, and these vectors are referred to as block vectors rather than motion vectors. In particular, intra BC unit 48 may determine an intra prediction mode to be used to encode the current block. In some examples, intra BC unit 48 may encode the current block using various intra prediction modes, e.g., during different encoding passes, and test their performance through rate-distortion analysis. Next, intra BC unit 48 may select an appropriate intra prediction mode among the various tested intra prediction modes to use, and generate an intra mode indicator accordingly. For example, intra BC unit 48 may calculate rate-distortion values for the various tested intra prediction modes using rate-distortion analysis and select the intra prediction mode with the best rate-distortion characteristics among the tested modes as the appropriate intra prediction mode to use. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original unencoded block that was encoded to produce the encoded block, as well as the bit rate (i.e., the number of bits) used to produce the encoded block. Intra BC unit 48 may calculate ratios from the distortions and rates of the various encoded blocks to determine which intra prediction mode exhibits the best rate-distortion value for the block.
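The rate-distortion selection described above is commonly summarized by the Lagrangian cost J = D + λ·R. The sketch below is a minimal illustration of that selection rule; the measured distortion/bit values and the value of λ are assumptions for the example and are not drawn from any particular encoder:

```python
def rd_cost(distortion, bits, lam):
    # Lagrangian rate-distortion cost: J = D + lambda * R.
    return distortion + lam * bits

def select_mode(candidates, lam):
    # candidates: iterable of (mode, distortion, bits) triples gathered
    # from test encodes of the tried prediction modes.
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))[0]

# Hypothetical measurements: mode "B" wins because its extra distortion
# costs less than the extra bits mode "A" would spend.
modes = [("A", 100, 80), ("B", 120, 40)]
assert select_mode(modes, lam=1.0) == "B"
```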
In other examples, intra BC unit 48 may use, in whole or in part, motion estimation unit 42 and motion compensation unit 44 to perform such functions for intra BC prediction in accordance with implementations described herein. In either case, for intra block copying, the prediction block may be a block deemed to closely match the block to be encoded in terms of pixel differences, which may be determined by SAD, SSD, or other difference metric, and the identification of the prediction block may include calculating the value of the sub-integer pixel location.
Regardless of whether the prediction block is from the same frame according to intra-prediction or from a different frame according to inter-prediction, video encoder 20 may form the residual video block by subtracting the pixel values of the prediction block from the pixel values of the current video block being encoded. The pixel differences forming the residual video block may include both luma component differences and chroma component differences.
As an alternative to inter prediction performed by motion estimation unit 42 and motion compensation unit 44 or intra block copy prediction performed by intra BC unit 48 as described above, intra prediction processing unit 46 may intra-predict the current video block. In particular, intra-prediction processing unit 46 may determine an intra-prediction mode for encoding the current block. To this end, intra-prediction processing unit 46 may encode the current block using various intra-prediction modes, e.g., during different encoding passes, and intra-prediction processing unit 46 (or a mode selection unit in some examples) may select an appropriate intra-prediction mode from the tested intra-prediction modes to use. Intra-prediction processing unit 46 may provide information to entropy encoding unit 56 indicating the intra-prediction mode selected for the block. Entropy encoding unit 56 may encode information into the bitstream that indicates the selected intra-prediction mode.
After the prediction processing unit 41 determines a prediction block for the current video block via inter prediction or intra prediction, the adder 50 forms a residual video block by subtracting the prediction block from the current video block. The residual video data in the residual block may be included in one or more TUs and provided to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform.
Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may also reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting the quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
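For intuition, in HEVC and VVC the quantization step size approximately doubles for every increase of 6 in the quantization parameter (QP), roughly Qstep ≈ 2^((QP−4)/6). The following sketch shows scalar quantization under that mapping; it deliberately omits the fixed-point scaling tables and rounding offsets a real codec uses:

```python
def q_step(qp):
    # Approximate HEVC/VVC mapping: the step size doubles every 6 QP.
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    step = q_step(qp)
    return [int(round(c / step)) for c in coeffs]

def dequantize(levels, qp):
    step = q_step(qp)
    return [level * step for level in levels]

coeffs = [100.0, -23.0, 4.0, 0.5]
levels = quantize(coeffs, qp=28)    # QP 28 -> step 16: [6, -1, 0, 0]
recon = dequantize(levels, qp=28)   # lossy: [96.0, -16.0, 0.0, 0.0]
```

A larger QP yields a larger step, coarser levels, and therefore fewer bits at the cost of more distortion.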
After quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients into a video bitstream using, for example, context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique. The encoded bitstream may then be sent to the video decoder 30 as shown in fig. 1A, or archived in the storage device 32 as shown in fig. 1A for later transmission to, or retrieval by, the video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vectors and other syntax elements for the current video frame being encoded.
Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transforms, respectively, to reconstruct the residual video block in the pixel domain for generating reference blocks for predicting other video blocks. As noted above, motion compensation unit 44 may generate a motion compensated prediction block from one or more reference blocks of a frame stored in DPB 64. Motion compensation unit 44 may also apply one or more interpolation filters to the prediction block to calculate sub-integer pixel values for use in motion estimation.
Adder 62 adds the reconstructed residual block to the motion compensated prediction block generated by motion compensation unit 44 to generate a reference block for storage in DPB 64. The reference block may then be used by intra BC unit 48, motion estimation unit 42, and motion compensation unit 44 as a prediction block to inter-predict another video block in a subsequent video frame.
Fig. 3 is a block diagram illustrating another exemplary video decoder 30 according to some embodiments of the present application. Video decoder 30 includes video data memory 79, entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transform processing unit 88, adder 90, and DPB 92. The prediction processing unit 81 further includes a motion compensation unit 82, an intra prediction unit 84, and an intra BC unit 85. Video decoder 30 may perform a decoding process that is substantially reciprocal to the encoding process described above in connection with fig. 2 with respect to video encoder 20. For example, motion compensation unit 82 may generate prediction data based on the motion vectors received from entropy decoding unit 80, while intra-prediction unit 84 may generate prediction data based on the intra-prediction mode indicators received from entropy decoding unit 80.
In some examples, the units of video decoder 30 may be tasked with performing embodiments of the present application. Further, in some examples, embodiments of the present disclosure may be distributed among one or more of the units of video decoder 30. For example, the intra BC unit 85 may perform embodiments of the present application alone or in combination with other units of the video decoder 30, such as the motion compensation unit 82, the intra prediction unit 84, and the entropy decoding unit 80. In some examples, video decoder 30 may not include intra BC unit 85, and the functions of intra BC unit 85 may be performed by other components of prediction processing unit 81 (such as motion compensation unit 82).
Video data memory 79 may store video data, such as an encoded video bitstream, to be decoded by other components of video decoder 30. The video data stored in the video data memory 79 may be obtained, for example, from the storage device 32, from a local video source (such as a camera), via wired or wireless network communication of video data, or by accessing a physical data storage medium (e.g., a flash drive or hard disk). The video data memory 79 may include a coded picture buffer (CPB) that stores encoded video data from an encoded video bitstream. DPB 92 of video decoder 30 stores reference video data for use by video decoder 30 (e.g., in an intra- or inter-prediction decoding mode) when decoding the video data. Video data memory 79 and DPB 92 may be formed from any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. For illustrative purposes, video data memory 79 and DPB 92 are depicted in fig. 3 as two distinct components of video decoder 30. It will be apparent to those skilled in the art that video data memory 79 and DPB 92 may be provided by the same memory device or by separate memory devices. In some examples, video data memory 79 may be on-chip with the other components of video decoder 30, or off-chip relative to those components.
During the decoding process, video decoder 30 receives an encoded video bitstream representing video blocks of encoded video frames and associated syntax elements. Video decoder 30 may receive syntax elements at the video frame level and/or the video block level. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantization coefficients, motion vectors, or intra-prediction mode indicators, as well as other syntax elements. Entropy decoding unit 80 then forwards the motion vector or intra prediction mode indicator and other syntax elements to prediction processing unit 81.
When a video frame is encoded as an intra-prediction encoded (I) frame, or when an intra-coded prediction block is used in another type of frame, the intra prediction unit 84 of the prediction processing unit 81 may generate prediction data for a video block of the current video frame based on the signaled intra prediction mode and reference data from previously decoded blocks of the current frame.
When a video frame is encoded as an inter-prediction encoded (i.e., B or P) frame, the motion compensation unit 82 of the prediction processing unit 81 generates one or more prediction blocks for a video block of the current video frame based on the motion vectors and other syntax elements received from the entropy decoding unit 80. Each of the prediction blocks may be generated from a reference frame within one of the reference frame lists. Video decoder 30 may construct the reference frame lists, List 0 and List 1, using a default construction technique based on the reference frames stored in DPB 92.
In some examples, when decoding a video block according to the intra BC mode described herein, intra BC unit 85 of prediction processing unit 81 generates a prediction block for the current video block based on the block vectors and other syntax elements received from entropy decoding unit 80. The prediction block may be within a reconstructed region of the same picture as the current video block defined by video encoder 20.
The motion compensation unit 82 and/or the intra BC unit 85 determine prediction information for the video block of the current video frame by parsing the motion vector and other syntax elements, and then use the prediction information to generate a prediction block for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra-prediction or inter-prediction) for decoding a video block of a video frame, an inter-prediction frame type (e.g., B or P), construction information for one or more of a reference frame list for the frame, a motion vector for each inter-prediction encoded video block of the frame, an inter-prediction state for each inter-prediction encoded video block of the frame, and other information for decoding a video block in a current video frame.
Similarly, the intra BC unit 85 may use some of the received syntax elements, such as a flag, to determine that the current video block was predicted using the intra BC mode, construction information indicating which video blocks of the frame are within the reconstructed region and should be stored in DPB 92, block vectors for each intra-BC-predicted video block of the frame, intra BC prediction status for each intra-BC-predicted video block of the frame, and other information for decoding the video blocks in the current video frame.
Motion compensation unit 82 may also perform interpolation using interpolation filters, such as those used by video encoder 20 during encoding of the video blocks, to calculate interpolated values for sub-integer pixels of the reference block. In this case, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use these interpolation filters to generate the prediction blocks.
Inverse quantization unit 86 inverse quantizes the quantized transform coefficients provided in the bitstream and entropy decoded by entropy decoding unit 80, using the same quantization parameter calculated by video encoder 20 for each video block in the video frame to determine the degree of quantization. The inverse transform processing unit 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to reconstruct the residual blocks in the pixel domain.
After the motion compensation unit 82 or the intra BC unit 85 generates a prediction block for the current video block based on the vector and other syntax elements, the adder 90 reconstructs a decoded video block for the current video block by adding the residual block from the inverse transform processing unit 88 to the corresponding prediction block generated by the motion compensation unit 82 and the intra BC unit 85. A loop filter 91, such as a deblocking filter, SAO filter, CCSAO filter, and/or ALF, may be located between adder 90 and DPB 92 to further process the decoded video block. In some examples, loop filter 91 may be omitted and the decoded video block may be provided directly to DPB 92 by adder 90. The decoded video blocks in a given frame are then stored in DPB 92, and DPB 92 stores reference frames for subsequent motion compensation of the next video block. DPB 92 or a memory device separate from DPB 92 may also store decoded video for later presentation on a display device (e.g., display device 34 of fig. 1A).
In a typical video codec process, a video sequence generally includes an ordered set of frames or pictures. Each frame may include three sample arrays, denoted SL, SCb, and SCr. SL is a two-dimensional array of luma samples. SCb is a two-dimensional array of Cb chroma samples. SCr is a two-dimensional array of Cr chroma samples. In other cases, a frame may be monochromatic and thus include only one two-dimensional array of luma samples.
As shown in fig. 1C, video encoder 20 (or more specifically, a partition unit in the prediction processing unit of video encoder 20) generates an encoded representation of a frame by first partitioning the frame into a set of CTUs. A video frame may include an integer number of CTUs ordered consecutively from left to right and top to bottom in raster scan order. Each CTU is the largest logical coding unit, and the width and height of the CTU are signaled by video encoder 20 in the sequence parameter set such that all CTUs in the video sequence have the same size: one of 128 x 128, 64 x 64, 32 x 32, and 16 x 16. It should be noted that the present application is not necessarily limited to a particular size. As shown in fig. 1D, each CTU may include one CTB of luma samples, two corresponding coding tree blocks of chroma samples, and syntax elements used for encoding and decoding the samples of the coding tree blocks. The syntax elements describe the properties of the different types of units of an encoded pixel block and how the video sequence may be reconstructed at video decoder 30, including inter- or intra-prediction, intra prediction mode, motion vectors, and other parameters. In a monochrome picture or a picture having three separate color planes, a CTU may comprise a single coding tree block and syntax elements used for encoding and decoding the samples of the coding tree block. A coding tree block may be an N x N block of samples.
To achieve better performance, video encoder 20 may recursively perform tree partitioning, such as binary-tree partitioning, ternary-tree partitioning, quad-tree partitioning, or a combination thereof, on the coding tree blocks of a CTU and partition the CTU into smaller CUs. Figs. 1B-1E are block diagrams illustrating how a frame is recursively partitioned into video blocks of different sizes and shapes according to some embodiments of the present disclosure. As depicted in fig. 1E, the 64 x 64 CTU 400 is first partitioned into four smaller CUs, each having a block size of 32 x 32. Among the four smaller CUs, CU 410 and CU 420 are each partitioned into four CUs with block sizes of 16 x 16. The two 16 x 16 CUs 430 and 440 are each further partitioned into four CUs with block sizes of 8 x 8. Fig. 1B depicts a quad-tree data structure showing the final result of the partitioning process of the CTU 400 as depicted in fig. 1E, with each leaf node of the quad-tree corresponding to one CU of a size ranging from 32 x 32 to 8 x 8. Similar to the CTU depicted in fig. 1D, each CU may include a CB of luma samples and two corresponding coding blocks of chroma samples of a frame of the same size, and syntax elements used for encoding and decoding the samples of the coding blocks. In a monochrome picture or a picture having three separate color planes, a CU may comprise a single coding block and syntax structures used for encoding and decoding the samples of the coding block. It should be noted that the quad-tree partitioning depicted in figs. 1E and 1B is for illustrative purposes only, and one CTU may be partitioned into CUs based on quad-tree/ternary-tree/binary-tree partitioning to accommodate varying local characteristics. In the multi-type tree structure, one CTU is partitioned by a quad-tree structure, and each quad-tree leaf CU may be further partitioned by a binary-tree or ternary-tree structure. As shown in figs. 4A-4E, there are five possible partition types for a coding block of width W and height H, namely, quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.
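The recursive QT stage of this partitioning can be sketched as follows. This is a simplified illustration: the `should_split` callback stands in for the encoder's rate-distortion split decision, and the BT/TT stages of the multi-type tree are omitted:

```python
def quadtree_partition(x, y, size, min_size, should_split):
    # Recursively split a square block into four equal quadrants until the
    # split decision says stop or the minimum CU size is reached.
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]  # a leaf CU
    half = size // 2
    leaves = []
    for qx, qy in ((x, y), (x + half, y), (x, y + half), (x + half, y + half)):
        leaves += quadtree_partition(qx, qy, half, min_size, should_split)
    return leaves

# Arbitrary split rule for illustration: always split the 64x64 CTU itself,
# and keep splitting the top-left quadrant down to 8x8.
cus = quadtree_partition(0, 0, 64, 8,
                         lambda x, y, s: s == 64 or (x == 0 and y == 0))
```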
In some implementations, video encoder 20 may further partition the coding blocks of a CU into one or more M x N PBs. A PB is a rectangular (square or non-square) block of samples to which the same prediction (inter or intra) is applied. A PU of a CU may include a PB of luma samples, two corresponding PBs of chroma samples, and syntax elements used for predicting the PBs. In a monochrome picture or a picture having three separate color planes, a PU may include a single PB and syntax structures used for predicting the PB. Video encoder 20 may generate a predicted luma block, a predicted Cb block, and a predicted Cr block for the luma PB, Cb PB, and Cr PB of each PU of the CU.
Video encoder 20 may use intra-prediction or inter-prediction to generate the prediction block for the PU. If video encoder 20 uses intra-prediction to generate the prediction block for the PU, video encoder 20 may generate the prediction block for the PU based on decoded samples of the frame associated with the PU. If video encoder 20 uses inter prediction to generate the prediction block of the PU, video encoder 20 may generate the prediction block of the PU based on decoded samples of one or more frames other than the frame associated with the PU.
After video encoder 20 generates the predicted luma block, the predicted Cb block, and the predicted Cr block for the one or more PUs of the CU, video encoder 20 may generate a luma residual block for the CU by subtracting the predicted luma block of the CU from the original luma coded block of the CU such that each sample in the luma residual block of the CU indicates a difference between a luma sample in one of the predicted luma blocks of the CU and a corresponding sample in the original luma coded block of the CU. Similarly, video encoder 20 may generate Cb residual blocks and Cr residual blocks for the CU, respectively, such that each sample in the Cb residual block of the CU indicates a difference between a Cb sample in one of the predicted Cb blocks of the CU and a corresponding sample in the original Cb encoded block of the CU, and each sample in the Cr residual block of the CU may indicate a difference between a Cr sample in one of the predicted Cr blocks of the CU and a corresponding sample in the original Cr encoded block of the CU.
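A minimal sketch of this residual computation (the array shapes and sample values are arbitrary examples, not data from any test sequence):

```python
import numpy as np

def residual_block(original, prediction):
    # Each residual sample is the difference between an original sample and
    # the co-located sample of the prediction block, as described above.
    return original.astype(np.int16) - prediction.astype(np.int16)

orig = np.array([[120, 121], [119, 118]], dtype=np.uint8)  # original luma samples
pred = np.array([[118, 120], [119, 120]], dtype=np.uint8)  # predicted luma block
res = residual_block(orig, pred)                           # [[2, 1], [0, -2]]
```

The same subtraction, applied to the Cb and Cr sample arrays, yields the Cb and Cr residual blocks.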
Further, as shown in fig. 1E, video encoder 20 may decompose the luma residual block, the Cb residual block, and the Cr residual block of a CU into one or more luma transform blocks, Cb transform blocks, and Cr transform blocks, respectively, using quad-tree partitioning. A transform block is a rectangular (square or non-square) block of samples to which the same transform is applied. A TU of a CU may include a transform block of luma samples, two corresponding transform blocks of chroma samples, and syntax elements used for transforming the transform block samples. Thus, each TU of a CU may be associated with a luma transform block, a Cb transform block, and a Cr transform block. In some examples, the luma transform block associated with a TU may be a sub-block of the luma residual block of the CU. The Cb transform block may be a sub-block of the Cb residual block of the CU. The Cr transform block may be a sub-block of the Cr residual block of the CU. In a monochrome picture or a picture having three separate color planes, a TU may comprise a single transform block and syntax structures used for transforming the samples of the transform block.
Video encoder 20 may apply one or more transforms to the luma transform block of the TU to generate a luma coefficient block for the TU. The coefficient block may be a two-dimensional array of transform coefficients. The transform coefficients may be scalar quantities. Video encoder 20 may apply one or more transforms to the Cb transform block of the TU to generate a Cb coefficient block for the TU. Video encoder 20 may apply one or more transforms to the Cr transform blocks of the TUs to generate Cr coefficient blocks for the TUs.
After generating the coefficient block (e.g., the luma coefficient block, the Cb coefficient block, or the Cr coefficient block), video encoder 20 may quantize the coefficient block. Quantization generally refers to the process by which transform coefficients are quantized to potentially reduce the amount of data used to represent the transform coefficients, thereby providing further compression. After video encoder 20 quantizes the coefficient block, video encoder 20 may entropy encode syntax elements that indicate the quantized transform coefficients. For example, video encoder 20 may perform CABAC on syntax elements indicating quantized transform coefficients. Finally, video encoder 20 may output a bitstream including a sequence of bits that form a representation of the encoded frames and associated data, which is stored in storage device 32 or transmitted to target device 14.
Upon receiving the bitstream generated by video encoder 20, video decoder 30 may parse the bitstream to obtain syntax elements from the bitstream. Video decoder 30 may reconstruct the frames of video data based at least in part on the syntax elements obtained from the bitstream. The process of reconstructing video data is typically reciprocal to the encoding process performed by video encoder 20. For example, video decoder 30 may perform an inverse transform on the coefficient blocks associated with the TUs of the current CU to reconstruct residual blocks associated with the TUs of the current CU. Video decoder 30 also reconstructs the encoded block of the current CU by adding samples of the prediction block for the PU of the current CU to corresponding samples of the transform block of the TU of the current CU. After reconstructing the encoded blocks for each CU of the frame, video decoder 30 may reconstruct the frame.
As described above, video coding mainly uses two modes, i.e., intra-frame prediction (or intra prediction) and inter-frame prediction (or inter prediction), to achieve video compression. Note that IBC may be regarded as either intra prediction or a third mode. Between the two modes, inter prediction contributes more to codec efficiency than intra prediction because motion vectors are used to predict the current video block from a reference video block.
However, with ever-improving video data capture techniques and increasingly refined video block sizes for preserving details in video data, the amount of data required to represent the motion vectors of the current frame has also increased significantly. One way to overcome this challenge is to benefit from the fact that not only do a set of neighboring CUs in both the spatial and temporal domains have similar video data for prediction purposes, but the motion vectors between these neighboring CUs are also similar. Thus, by exploring the spatial and temporal correlation of the spatially neighboring CUs and/or temporally co-located CUs of the current CU, their motion information may be used as an approximation of the motion information (e.g., motion vector) of the current CU, which is also referred to as the "motion vector predictor (MVP)" of the current CU.
Instead of encoding the actual motion vector of the current CU, as determined by the motion estimation unit described above in connection with fig. 2, into the video bitstream, the motion vector predictor of the current CU is subtracted from the actual motion vector of the current CU to generate a motion vector difference (MVD) for the current CU. By doing so, there is no need to encode the motion vector determined by the motion estimation unit for each CU of a frame into the video bitstream, and the amount of data used to represent the motion information in the video bitstream can be significantly reduced.
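A minimal sketch of the MVD signalling just described (the vector values are arbitrary examples):

```python
def motion_vector_difference(mv, mvp):
    # Encoder side: MVD = actual MV - motion vector predictor. Only the MVD
    # (plus the predictor's index) needs to be written to the bitstream.
    return (mv[0] - mvp[0], mv[1] - mvp[1])

def reconstruct_mv(mvd, mvp):
    # Decoder side: invert the subtraction to recover the actual MV.
    return (mvd[0] + mvp[0], mvd[1] + mvp[1])

mv, mvp = (14, -3), (12, -4)
mvd = motion_vector_difference(mv, mvp)  # (2, 1): small values, cheap to code
assert reconstruct_mv(mvd, mvp) == mv
```

Because neighboring motion vectors tend to be similar, the MVD components are usually small and thus cheaper to entropy encode than the full motion vector.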
As with the process of selecting a prediction block in a reference frame during inter-prediction of an encoded block, both video encoder 20 and video decoder 30 need to employ a set of rules for constructing a motion vector candidate list (also referred to as a "merge list") for the current CU using those potential candidate motion vectors associated with spatially neighboring CUs and/or temporally co-located CUs of the current CU, and then select a member from the motion vector candidate list as a motion vector predictor for the current CU. By doing so, there is no need to send the motion vector candidate list itself from video encoder 20 to video decoder 30, and the index of the selected motion vector predictor within the motion vector candidate list is sufficient for video encoder 20 and video decoder 30 to use the same motion vector predictor within the motion vector candidate list to encode and decode the current CU.
In general, the basic inter prediction scheme applied in VVC remains almost the same as that of HEVC, except that several prediction tools are further extended, added, and/or improved, e.g., extended merge prediction, merge mode with motion vector differences (MMVD), and geometric partitioning mode (GPM).
Extended merge prediction
As video data capture techniques continue to improve and video block sizes for preserving details in video data are finer, the amount of data required to represent the motion vector of the current picture also increases significantly. One way to overcome this challenge is to use the motion information (e.g., motion vectors) of the current CU's spatially neighboring CUs, temporally co-located CUs, etc., as an approximation (e.g., prediction) of the current CU's motion information, which is also referred to as the "Motion Vector Predictor (MVP)" of the current CU.
Similar to the process of selecting a prediction block in a reference picture during inter prediction of an encoded block, both video encoder 20 and video decoder 30 need to employ a set of rules to construct a MVP candidate list for the current CU, and then select one MVP candidate from the MVP candidate list as the MVP for the current CU. By doing so, there is no need to send the MVP candidate list itself between the video encoder 20 and the video decoder 30, and the index of the MVP candidate selected from the MVP candidate list is sufficient for the video encoder 20 and the video decoder 30 to encode and decode the current CU using the same MVP candidate selected from the MVP candidate list.
In VVC, an MVP candidate list is constructed by sequentially including the following five types of MVPs:
spatial MVPs from spatially neighboring CUs (i.e., spatial candidates);
temporal MVP from a temporally co-located CU (i.e., temporal candidate);
history-based MVPs (HMVPs) from a first-in first-out (FIFO) table;
pairwise average MVP; and
zero MVPs.
The size of the MVP candidate list is signaled in the sequence parameter set header, and the maximum allowed size of the MVP candidate list is 6. For each CU encoded in merge mode, the index of the best MVP candidate is encoded using truncated unary binarization. The first bin of the index is encoded with context coding, and bypass coding is used for the remaining bins of the index.
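Truncated unary binarization maps an index to a run of one-bits followed by a terminating zero-bit, where the terminator is dropped for the largest codable value. A short sketch (this shows the binarization only; the subsequent context and bypass arithmetic coding of the bins is not modeled):

```python
def truncated_unary(value, c_max):
    # `value` one-bits followed by a zero terminator; the terminator is
    # omitted when value == c_max, since no larger value can follow.
    assert 0 <= value <= c_max
    bins = [1] * value
    if value < c_max:
        bins.append(0)
    return bins

# With a merge list of size 6, the index ranges over 0..5 (c_max = 5):
assert truncated_unary(0, 5) == [0]
assert truncated_unary(3, 5) == [1, 1, 1, 0]
assert truncated_unary(5, 5) == [1, 1, 1, 1, 1]
```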
The derivation process of each type of MVP is provided below. As in HEVC, VVC also supports parallel derivation of MVP candidate lists for all CUs within a region of a particular size.
Deriving MVP from spatial candidates
The derivation of MVPs from spatial candidates in VVC (e.g., CUs adjacent to the current CU 101 in fig. 5) is the same as that in HEVC, except that the positions of the first two spatial candidates are swapped. A maximum of four spatial candidates are selected from the spatial candidates located at the positions shown in fig. 5 (i.e., the top position B0, the left-side position A0, the upper-right position B1, the lower-left position A1, and the upper-left position B2). The derivation is performed in the order of the CUs at positions B0, A0, B1, A1, and B2. The CU at position B2 is considered only when one or more of the CUs at positions B0, A0, B1, and A1 are not available (e.g., because they belong to another slice or tile) or are intra-coded.
After the CU at position B0 is added as a candidate to the merge candidate list, a redundancy check is performed on the remaining candidates to be added to the merge candidate list, which ensures that candidates having the same motion information are excluded from the merge candidate list, thereby improving codec efficiency. In order to reduce the computational complexity, not all possible candidate pairs are considered in the redundancy check. Instead, only the pairs linked by an arrowed line in fig. 6 are considered, and a candidate of a corresponding pair used for the redundancy check is added to the merge candidate list only when that candidate does not have the same motion information as the candidate already added. The spatial MVPs derived from the candidates in the merge candidate list are added to the MVP candidate list.
Deriving MVP from temporal candidates
Only one temporal candidate is added to the merge candidate list during the derivation of the MVP from temporal candidates. Specifically, when deriving the MVP from this temporal candidate, a scaled motion vector is derived based on the co-located CU (e.g., col_cu 301 in fig. 7) belonging to the co-located picture (e.g., col_pic 302 in fig. 7) of the current CU (e.g., curr_cu 303 in fig. 7), and is added to the MVP candidate list as the temporal MVP candidate. The reference picture list and the reference picture index used to derive the co-located CU are explicitly signaled in the slice header. The scaled motion vector is obtained (i.e., scaled) from the motion vector of the co-located CU using the Picture Order Count (POC) distances tb and td, as shown in fig. 7, where tb is defined as the POC difference between the reference picture of the current picture (e.g., curr_ref 305 in fig. 7) and the current picture, and td is defined as the POC difference between the reference picture of the co-located picture (e.g., col_ref 306 in fig. 7) and the co-located picture. The reference picture index of the temporal candidate is set equal to zero.
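The POC-based scaling can be sketched as follows; this is a simplified illustration with a hypothetical function name, using plain floating point where a real codec uses fixed-point MV scaling:

```python
def scale_temporal_mvp(col_mv, poc_curr, poc_curr_ref, poc_col, poc_col_ref):
    """Scale the co-located CU's MV by the ratio of POC distances tb/td."""
    tb = poc_curr - poc_curr_ref   # current picture to its reference
    td = poc_col - poc_col_ref     # co-located picture to its reference
    scale = tb / td
    return (round(col_mv[0] * scale), round(col_mv[1] * scale))

# The co-located MV (8, -4) is halved when tb = 2 and td = 4:
assert scale_temporal_mvp((8, -4), 10, 8, 12, 8) == (4, -2)
```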
As shown in fig. 8, the position of the temporal candidate (i.e., the co-located CU) for the current CU 401 is selected between positions C0 and C1. If the CU at position C0 in the co-located picture is not available, is intra-coded, or is outside the current row of CTUs, the CU at position C1 is used as the co-located CU for deriving the temporal MVP candidate. Otherwise, the CU at position C0 is used.
Derivation of HMVP candidates
After the spatial MVP and the temporal MVP, HMVP candidates are added to the MVP candidate list. The motion information of previously coded blocks is stored in an HMVP table and used as MVP for the current CU. A table with multiple HMVP candidates is maintained during the encoding/decoding process. The table is reset (emptied) when a new CTU row is encountered. Whenever there is a non-sub-block inter-coded CU, its associated motion information is added to the last entry of the HMVP table as a new HMVP candidate.
The HMVP table size is set to 6. When a new HMVP candidate is inserted into the table, a constrained FIFO rule is used, whereby a redundancy check is first applied to find whether an identical HMVP already exists in the table. If one is found, the identical HMVP is removed from the table, all HMVP candidates after it are moved forward, and the new HMVP candidate is added as the last entry of the table.
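The constrained FIFO rule may be sketched as follows (a hypothetical helper; motion information is modeled as any comparable value):

```python
def update_hmvp_table(table: list, new_candidate, max_size: int = 6) -> list:
    """Constrained FIFO update of the HMVP table."""
    if new_candidate in table:       # redundancy check
        table.remove(new_candidate)  # later entries move forward
    elif len(table) == max_size:
        table.pop(0)                 # table full: drop the oldest entry
    table.append(new_candidate)      # the new candidate becomes the last entry
    return table
```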
HMVP candidates can be used in the MVP candidate list construction process. The latest HMVP candidates in the table are checked in order and inserted into the MVP candidate list after the temporal MVP candidate. A redundancy check is applied to the HMVP candidates against the spatial and/or temporal MVP candidates.
In order to reduce the number of redundancy check operations, the following simplified approach is introduced:
- The last two entries of the HMVP table are redundancy-checked against the spatial MVP candidates derived from the spatial candidates at positions A1 and B1, respectively; and
- the construction of the MVP candidate list from HMVP candidates is terminated as soon as the total number of available MVP candidates reaches the maximum allowed size of the MVP candidate list minus 1.
Derivation of pairwise average MVP candidates
The pairwise average MVP candidate is generated by averaging the MVPs derived from a predefined pair of the first two merge candidates in the existing merge candidate list. The first merge candidate in the predefined pair may be denoted p0Cand and the second p1Cand. An average motion vector is calculated for each reference picture list, according to availability, from the motion vectors of p0Cand and p1Cand. If both motion vectors are available for a reference picture list, they are averaged even when they point to different reference pictures, and the reference picture of the averaged motion vector is set to that of p0Cand. If only one motion vector is available for a reference picture list, that motion vector is used directly. If no motion vector is available for a reference picture list, the motion vector and reference picture index of that list remain invalid.
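A sketch of this per-list averaging is given below; the function name and candidate representation are ours, and plain integer averaging stands in for the exact rounding of the spec:

```python
def pairwise_average_candidate(p0: dict, p1: dict) -> dict:
    """p0/p1 map a reference list index (0 or 1) to a (mv, ref_idx)
    tuple; the entry is absent when that list carries no motion."""
    avg = {}
    for lx in (0, 1):
        c0, c1 = p0.get(lx), p1.get(lx)
        if c0 and c1:
            # average even if the MVs point to different reference
            # pictures; the averaged MV keeps p0Cand's reference
            mv = ((c0[0][0] + c1[0][0]) // 2, (c0[0][1] + c1[0][1]) // 2)
            avg[lx] = (mv, c0[1])
        elif c0 or c1:
            avg[lx] = c0 or c1  # the single available MV is used directly
        # if neither is available, list lx stays invalid (absent)
    return avg
```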
Zero MVP
When the MVP candidate list is not full after adding the pairwise average MVP candidates, zero MVPs are inserted at the end of the MVP candidate list until the maximum allowed size of the MVP candidate list is reached.
MMVD
As described above, in merge mode the motion information (i.e., an MVP candidate) is implicitly derived from the MVP candidate list constructed for the current CU and is directly used as the MV of the current CU for generating its prediction samples, which may result in a certain error between the actual MV of the current CU and the implicitly derived MVP. To improve the accuracy of the MV of the current CU, MMVD is introduced in VVC, wherein a Motion Vector Difference (MVD) of the current CU is added to the implicitly derived MVP to obtain the MV of the current CU. An MMVD flag is signaled after sending the regular merge flag to specify whether MMVD mode is used for the current CU.
In MMVD mode, after an MVP candidate is selected from the first two MVP candidates in the MVP candidate list, MMVD information is signaled, including an MMVD candidate flag, a distance index, and a direction index. The MMVD candidate flag specifies which of the first two MVP candidates is selected as the MV basis, the distance index indicates the motion magnitude information of the MVD, and the direction index indicates the motion direction information of the MVD.
The distance index specifying the motion magnitude information of the MVD indicates a predefined offset from a starting point (e.g., represented by a dashed circle in fig. 9) in a reference picture of the current CU (e.g., L0 reference picture 501 or L1 reference picture 503 in fig. 9) to which the selected MVP candidate points; the MVD may be derived from the offset and added to the selected MVP candidate. The relationship between the distance index and the predefined offset is specified in Table 1 below.
| Distance index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Offset (in units of luma samples) | 1/4 | 1/2 | 1 | 2 | 4 | 8 | 16 | 32 |
TABLE 1
The direction index specifies the sign of the MVD, which represents the direction of the MVD relative to the starting point. Table 2 specifies the relationship between the direction index and the predefined sign. In some examples, the meaning of the sign of the MVD may vary according to the information of the selected MVP candidate. When the selected MVP candidate is a uni-directionally predicted MV, or a bi-directionally predicted MV whose two MVs point to the same side of the current picture (i.e., the POCs of both reference pictures of the current picture (e.g., the list 0 and list 1 reference pictures, also referred to as the L0 and L1 reference pictures, respectively) are both greater than the POC of the current picture, or both less than the POC of the current picture), the sign in Table 2 specifies the sign of the MVD added to the selected MVP candidate. When the selected MVP candidate is a bi-directionally predicted MV whose two MVs point to different sides of the current picture (i.e., the POC of one reference picture of the current picture is greater than the POC of the current picture and the POC of the other reference picture is less than the POC of the current picture), the sign in Table 2 specifies the sign of the list 0 MVD (MVD0) added to the list 0 MVP (MVP0) of the selected MVP candidate, and the sign of the list 1 MVD (MVD1) added to the list 1 MVP (MVP1) of the selected MVP candidate is opposite to the sign in Table 2.
| Direction index | 00 | 01 | 10 | 11 |
| X-axis | + | - | N/A | N/A |
| Y-axis | N/A | N/A | + | - |
TABLE 2
The MVD is scaled according to the POC distances. If the POC distances of the L0 and L1 reference pictures are the same, no scaling is needed. Otherwise, if the POC distance of the L0 reference picture is greater than that of the L1 reference picture, MVD1 is scaled; if the POC distance of the L1 reference picture is greater than that of the L0 reference picture, MVD0 is scaled.
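Putting the distance table, direction table, and scaling rule together, a simplified sketch (hypothetical names; float scaling in place of the codec's fixed-point scaling) for the different-sides bi-prediction case looks as follows:

```python
# Offsets in quarter-luma-sample units for distance indices 0..7
# (i.e., 1/4, 1/2, 1, 2, 4, 8, 16, 32 luma samples, per Table 1)
MMVD_OFFSETS = [1, 2, 4, 8, 16, 32, 64, 128]
# Direction index -> sign pattern on (x, y), per Table 2
MMVD_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def derive_mmvd_mvds(distance_idx, direction_idx, poc_dist_l0, poc_dist_l1):
    """MVD0/MVD1 for a base MV whose references lie on different sides
    of the current picture: the list-1 sign is opposite, and the MVD of
    the list with the smaller POC distance is scaled down."""
    off = MMVD_OFFSETS[distance_idx]
    sx, sy = MMVD_DIRECTIONS[direction_idx]
    mvd0 = (sx * off, sy * off)
    mvd1 = (-mvd0[0], -mvd0[1])      # opposite sign for list 1
    if poc_dist_l0 > poc_dist_l1:
        s = poc_dist_l1 / poc_dist_l0
        mvd1 = (round(mvd1[0] * s), round(mvd1[1] * s))
    elif poc_dist_l1 > poc_dist_l0:
        s = poc_dist_l0 / poc_dist_l1
        mvd0 = (round(mvd0[0] * s), round(mvd0[1] * s))
    return mvd0, mvd1
```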
GPM
In VVC, GPM is supported for inter prediction. GPM is signaled using a CU-level flag as one kind of merge mode, the other merge modes including the regular merge mode, MMVD mode, CIIP mode, and the sub-block merge mode. GPM supports 64 partitions in total for each possible CU size w×h (w = 2^m and h = 2^n, where m, n ∈ {3, 4, 5, 6}), excluding 8×64 and 64×8.
When GPM is used, a CU is split into two parts by a geometrically located straight line. The position of the splitting line is mathematically derived from the angle and offset parameters of a specific partition. Each part of the CU obtained by geometric partitioning is inter-predicted using its own motion, and only uni-prediction is allowed for each partition; that is, each part has one motion vector and one reference index. The uni-prediction motion constraint is applied to ensure that, as with conventional bi-prediction, only two motion-compensated predictions are needed for each CU.
If GPM is used for the current CU, a geometric partition index indicating the partition mode of the geometric partition (i.e., its angle and offset) and two merge indices (one per partition) are further signaled.
The uni-prediction candidate list is derived directly from the merge candidate list constructed according to the extended merge prediction process described above. Let n denote the index of a uni-prediction motion vector in the uni-prediction candidate list. The LX motion vector of the n-th merge candidate in the merge candidate list, with X equal to the parity of n, is used as the n-th uni-prediction motion vector for GPM. These motion vectors are marked with an "x" in fig. 10. If the corresponding LX motion vector of the n-th merge candidate does not exist, the L(1-X) motion vector of the same candidate is used instead as the uni-prediction motion vector for GPM.
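The parity rule can be sketched as follows (a hypothetical helper; each merge candidate is modeled as a dict from reference list index to MV):

```python
def gpm_uni_candidates(merge_list):
    """For candidate n, prefer the LX motion with X = parity of n and
    fall back to L(1-X) when LX carries no motion."""
    uni = []
    for n, cand in enumerate(merge_list):
        x = n & 1
        if x in cand:
            uni.append((x, cand[x]))
        else:
            uni.append((1 - x, cand[1 - x]))
    return uni
```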
CIIP
In VVC, when a CU is coded in merge mode, if the CU contains at least 64 luma samples (i.e., the CU width times the CU height is equal to or greater than 64), and if both the CU width and height are less than 128 luma samples, an additional flag is signaled to indicate whether CIIP mode is applied to the current CU. In CIIP mode, a prediction signal is obtained by combining an inter prediction signal with an intra prediction signal. The inter prediction signal in CIIP mode is derived using the same inter prediction process as applied in regular merge mode, and the intra prediction signal is derived following the regular intra prediction process with the planar mode. The intra and inter prediction signals are then combined using a weighted average, where the weight value is calculated according to the coding modes of the top and left neighboring blocks of the current CU 1601 (shown in fig. 11), as follows:
- isIntraTop is set to 1 if the top neighboring block is available and intra-coded; otherwise isIntraTop is set to 0;
- isIntraLeft is set to 1 if the left neighboring block is available and intra-coded; otherwise isIntraLeft is set to 0;
- if (isIntraLeft + isIntraTop) is equal to 2, the weight value wt is set to 3;
- otherwise, if (isIntraLeft + isIntraTop) is equal to 1, wt is set to 2;
- otherwise, wt is set to 1.
The CIIP prediction signal PCIIP is derived as follows:
PCIIP=((4-wt)*Pinter+wt*Pintra+2)>>2 (1)
where Pinter is the inter prediction signal in CIIP mode, Pintra is the intra prediction signal in CIIP mode, wt is the weight value, and >> represents a right-shift operation.
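Equation (1) and the weight rule translate directly into the following sketch (a hypothetical function name; numpy arrays for the two prediction signals):

```python
import numpy as np

def ciip_blend(p_inter: np.ndarray, p_intra: np.ndarray,
               is_intra_top: bool, is_intra_left: bool) -> np.ndarray:
    """Blend the inter and intra predictions per equation (1)."""
    wt = {2: 3, 1: 2, 0: 1}[int(is_intra_top) + int(is_intra_left)]
    return ((4 - wt) * p_inter.astype(np.int32)
            + wt * p_intra.astype(np.int32) + 2) >> 2
```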
Intra block copy in a Versatile Video Codec (VVC)
Intra Block Copy (IBC) is a tool adopted in the HEVC extensions for SCC. IBC significantly improves the codec efficiency of screen content material. Since IBC mode is implemented as a block-level coding mode, Block Matching (BM) is performed at the encoder to find the optimal block vector (or motion vector) for each CU. Here, a block vector indicates the displacement from the current block to a reference block that has already been reconstructed within the current picture. The luma block vector of an IBC-coded CU has integer precision. The chroma block vector is also rounded to integer precision. When combined with AMVR, IBC mode can switch between 1-pel and 4-pel motion vector precision. An IBC-coded CU is treated as a third prediction mode in addition to the intra and inter prediction modes. IBC mode applies to CUs with both width and height less than or equal to 64 luma samples.
On the encoder side, hash-based motion estimation is performed for IBC. The encoder performs an RD check for blocks with width or height no larger than 16 luma samples. For non-merge mode, the block vector search is first performed using a hash-based search. If the hash search does not return a valid candidate, a local search based on block matching is performed.
In the hash-based search, hash key matching (32-bit CRC) between the current block and reference blocks is extended to all allowed block sizes. The hash key calculation for every position in the current picture is based on 4×4 sub-blocks. For a current block of larger size, its hash key is determined to match that of a reference block when the hash keys of all its 4×4 sub-blocks match the hash keys at the corresponding reference positions. If the hash keys of multiple reference blocks match that of the current block, the block vector cost of each matching reference is calculated and the reference block with the minimum cost is selected.
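The 4×4-based hash table can be sketched as follows; the helper name is ours, and zlib's CRC-32 stands in for whichever 32-bit CRC the encoder actually uses:

```python
import zlib
import numpy as np

def hash_keys_4x4(frame: np.ndarray) -> dict:
    """Map each 32-bit CRC hash key to the list of 4x4 block positions
    carrying it; larger blocks match only when all of their 4x4
    sub-block keys match at the corresponding reference offsets."""
    h, w = frame.shape
    table = {}
    for y in range(h - 3):
        for x in range(w - 3):
            key = zlib.crc32(frame[y:y + 4, x:x + 4].tobytes())
            table.setdefault(key, []).append((x, y))
    return table
```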
In the block matching search, the search range is set to cover both the previous CTU and the current CTU.
At the CU level, IBC mode is signaled with a flag, and IBC mode may be signaled as IBC AMVP mode or IBC skip/merge mode:
IBC skip/merge mode: a merge candidate index is used to indicate which block vector from the list of neighboring candidate IBC-coded blocks is used to predict the current block. The merge list consists of spatial candidates, HMVP candidates, and pairwise candidates.
IBC AMVP mode: the block vector difference is coded in the same way as a motion vector difference. The block vector prediction method uses two candidates as predictors, one from the left neighbor and one from the above neighbor (if IBC-coded). When either neighbor is not available, a default block vector is used as the predictor. A flag is signaled to indicate the block vector predictor index.
IBC reference region
To reduce memory consumption and decoder complexity, IBC in VVC allows only the reconstructed portion of a predefined area, consisting of a region of the current CTU and a specific region of the left CTU, to be used as reference. Fig. 12 shows the reference area of IBC mode, where each block represents a 64×64 luma sample unit.
Depending on the position of the current coding CU within the current CTU, the following applies:
In the case where the current block falls into the top-left 64×64 block of the current CTU, then in addition to the already reconstructed samples in the current CTU, the current block may use CPR mode to refer to reference samples in the bottom-right 64×64 block of the left CTU. The current block may also use CPR mode to refer to reference samples in the bottom-left 64×64 block and the top-right 64×64 block of the left CTU.
In the case where the current block falls into the top-right 64×64 block of the current CTU, then in addition to the already reconstructed samples in the current CTU, the current block may use CPR mode to refer to reference samples in the bottom-left and bottom-right 64×64 blocks of the left CTU if the luma position (0, 64) relative to the current CTU has not yet been reconstructed; otherwise, the current block may refer to reference samples in the bottom-right 64×64 block of the left CTU.
In the case where the current block falls into the bottom-left 64×64 block of the current CTU, then in addition to the already reconstructed samples in the current CTU, the current block may use CPR mode to refer to reference samples in the top-right and bottom-right 64×64 blocks of the left CTU if the luma position (64, 0) relative to the current CTU has not yet been reconstructed; otherwise, the current block may use CPR mode to refer to reference samples in the bottom-right 64×64 block of the left CTU.
In the case where the current block falls into the bottom-right 64×64 block of the current CTU, the current block may use CPR mode to refer only to the already reconstructed samples in the current CTU.
This limitation allows IBC mode to be implemented using local on-chip memory for hardware implementation.
Interaction of IBC with other codec tools
The interactions between IBC mode and other inter coding tools in VVC, such as the pairwise merge candidate, history-based motion vector predictor (HMVP), combined intra/inter prediction mode (CIIP), merge mode with motion vector differences (MMVD), and geometric partitioning mode (GPM), are as follows:
IBC can be used with pairwise merge candidates and HMVP. A new pairwise IBC merge candidate may be generated by averaging two IBC merge candidates. For HMVP, IBC motion is inserted into the history buffer for future reference.
IBC cannot be used in combination with affine motion, CIIP, MMVD and GPM.
When dual tree partitioning is used, IBC is not allowed for chroma coding blocks.
Unlike the HEVC screen content coding extensions, the current picture is no longer included as one of the reference pictures in reference picture list 0 for IBC prediction. The motion vector derivation process for IBC mode excludes all neighboring blocks in inter mode, and vice versa. The following IBC design aspects apply:
IBC shares the same process as regular MV merge, including the pairwise merge candidate and the history-based motion predictor, but disallows TMVP and the zero vector because they are invalid for IBC mode.
Separate HMVP buffers (5 candidates each) are used for regular MV and IBC.
The block vector constraint is implemented in the form of a bitstream conformance constraint: the encoder must ensure that no invalid vectors exist in the bitstream, and merge must not be used if the merge candidate is invalid (out of range or zero). This bitstream conformance constraint is expressed in terms of a virtual buffer, as described below.
For deblocking, IBC is treated as an inter mode.
If the current block is coded using IBC prediction mode, AMVR does not use quarter-pel; instead, AMVR is signaled only to indicate whether the MV has 1-pel or 4-pel precision.
The number of IBC merge candidates may be signaled in the header separately from the numbers of regular, sub-block, and geometric merge candidates.
The virtual buffer concept is used to describe the allowed reference area and the valid block vectors for IBC prediction mode. Denoting the CTU size as ctbSize, the virtual buffer ibcBuf has width wIbcBuf = 128×128/ctbSize and height hIbcBuf = ctbSize. For example, for a CTU size of 128×128, the size of ibcBuf is also 128×128; for a CTU size of 64×64, the size of ibcBuf is 256×64; and for a CTU size of 32×32, the size of ibcBuf is 512×32.
The size of a VPDU is min(ctbSize, 64) in each dimension, and Wv = min(ctbSize, 64).
The virtual IBC buffer ibcBuf is maintained as follows.
At the beginning of decoding each CTU row, the entire ibcBuf is reset to the invalid value -1.
At the start of decoding a VPDU (xVPDU, yVPDU) relative to the top-left corner of the picture, ibcBuf[x][y] = -1 is set for x = (xVPDU % wIbcBuf) .. (xVPDU % wIbcBuf) + Wv - 1 and y = (yVPDU % ctbSize) .. (yVPDU % ctbSize) + Wv - 1.
After decoding a CU that contains position (x, y) relative to the top-left corner of the picture, set:
ibcBuf[x%wIbcBuf][y%ctbSize]=recSample[x][y]
For a block covering coordinates (x, y), the block vector bv = (bv[0], bv[1]) is valid if the following condition holds; otherwise, the block vector is invalid:
ibcBuf[(x + bv[0]) % wIbcBuf][(y + bv[1]) % ctbSize] shall not be equal to -1.
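A direct transcription of this validity rule into a sketch (a hypothetical helper; ibc_buf indexed [x][y] as in the text) is:

```python
def is_valid_bv(ibc_buf, x, y, bw, bh, bv, w_ibc_buf, ctb_size):
    """A bw x bh block at (x, y) has a valid block vector bv only if
    every referenced virtual-buffer sample is not the invalid value -1."""
    for dx in range(bw):
        for dy in range(bh):
            bx = (x + dx + bv[0]) % w_ibc_buf
            by = (y + dy + bv[1]) % ctb_size
            if ibc_buf[bx][by] == -1:
                return False
    return True
```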
Intra block copy in the Enhanced Compression Model (ECM)
In ECM, IBC is improved in the following aspects.
IBC merge list/AMVP list construction
The IBC merge list/AMVP list construction is modified as follows:
An IBC merge/AMVP candidate may be inserted into the IBC merge/AMVP candidate list only if it is valid.
The top-right, bottom-left, and top-left spatial candidates and one pairwise average candidate may be added to the IBC merge/AMVP candidate list.
Template-based adaptive reordering of merge candidates (ARMC-TM) is applied to the IBC merge list.
The HMVP table size for IBC is increased to 25. After up to 20 IBC merge candidates are derived with full pruning, they are reordered together. After reordering, the first 6 candidates with the lowest template matching costs are selected as the final candidates of the IBC merge list.
The zero-vector candidates used to pad the IBC merge/AMVP list are replaced with a set of BVP candidates located in the IBC reference area. A zero vector is invalid as a block vector in IBC merge mode and is therefore discarded as a BVP in the IBC candidate list.
Three candidates are located at the nearest corners of the reference area, and three additional candidates are placed in the middle of three sub-regions (A, B, and C), whose coordinates are determined by the width and height of the current block and the Δx and Δy parameters, as shown in fig. 13.
IBC with template matching
Template matching is used in IBC for both IBC merge mode and IBC AMVP mode.
Compared to the merge list used by the regular IBC merge mode, the IBC-TM merge list is modified such that the candidates are selected according to a pruning method based on the motion distance between the candidates, as in the regular TM merge mode. The trailing zero-motion fill is replaced by motion vectors pointing to the left (-W, 0), above (0, -H), and above-left (-W, -H), where W is the width of the current CU and H is the height of the current CU.
In IBC-TM merge mode, the template matching method is used to refine the selected candidates before the RDO or decoding process. The IBC-TM merge mode competes with the regular IBC merge mode, and a TM merge flag is signaled.
In IBC-TM AMVP mode, up to 3 candidates are selected from the IBC-TM merge list. Each of the 3 selected candidates is refined using the template matching method and sorted according to its resulting template matching cost. Only the first 2 candidates are then considered in the motion estimation process, as usual.
The template matching refinement for both IBC-TM merge and AMVP modes is quite simple, since IBC motion vectors are constrained (i) to be integer and (ii) to lie within the reference area, as shown in fig. 12. Thus, in IBC-TM merge mode all refinements are performed with integer precision, and in IBC-TM AMVP mode they are performed with integer or 4-pel precision, depending on the AMVR value. Such refinements access only samples that have not been interpolated. In both cases, the refined motion vectors in each refinement step and the templates used must comply with the constraints of the reference area.
IBC reference region
The reference area for IBC is extended to the two CTU rows above. Fig. 14 shows the reference area for coding CTU (m, n). Specifically, for CTU (m, n) to be coded, the reference area includes the CTUs with indices (m-2, n-2) ... (W, n-2), (0, n-1) ... (W, n-1), and (0, n) ... (m, n), where W denotes the maximum horizontal index within the current tile, slice, or picture. This arrangement ensures that, for a CTU size of 128, IBC requires no extra memory in the current ECM platform. The per-sample block vector search range (or local search range) is limited to [-(C << 1), C >> 2] horizontally and [-C, C >> 2] vertically to accommodate the reference area extension, where C denotes the CTU size.
IBC merge mode with block vector difference
IBC merge mode with block vector difference is employed in ECM. The distance set is {1 pixel, 2 pixels, 4 pixels, 8 pixels, 12 pixels, 16 pixels, 24 pixels, 32 pixels, 40 pixels, 48 pixels, 56 pixels, 64 pixels, 72 pixels, 80 pixels, 88 pixels, 96 pixels, 104 pixels, 112 pixels, 120 pixels, 128 pixels }, and the BVD direction is two horizontal directions and two vertical directions.
The base candidates are selected from the first five candidates in the reordered IBC merge list. All possible MBVD refinement positions (20×4) for each base candidate are reordered based on the SAD cost between a template (one row above and one column left of the current block) and its reference at each refinement position. Finally, the 8 refinement positions with the lowest template SAD costs are kept as the available positions for MBVD index coding.
IBC adaptation for camera-captured content
When IBC is adapted for camera-captured content, the IBC reference range is reduced from 2 CTU rows to 2×128 rows, as shown in fig. 15. On the encoder side, to reduce complexity, the local search range is set to [-8, 8] horizontally and [-8, 8] vertically, centered on the first block vector predictor of the current CU. These encoder modifications are not applied to SCC sequences.
CIIP in combination with TIMD and TM merging
In this CIIP mode, the prediction samples are generated by weighting an inter prediction signal predicted using a CIIP-TM merge candidate and an intra prediction signal predicted using a TIMD-derived intra prediction mode. The method is applied only to coding blocks whose area is less than or equal to 1024.
The TIMD derivation method is used to derive intra prediction modes in CIIP. Specifically, the intra prediction mode having the smallest SATD value in the TIMD mode list is selected and mapped to one of 67 conventional intra prediction modes.
Furthermore, the weights (wIntra, wInter) are modified when the derived intra prediction mode is an angular mode. For near-horizontal modes (2 <= angular mode index < 34), the current block is split vertically as shown in fig. 16A; for near-vertical modes (34 <= angular mode index <= 66), the current block is split horizontally as shown in fig. 16B.
Table 3 shows (wIntra, wInter) for the different sub-blocks.
| Sub-block index | (wIntra, wInter) |
| 0 | (6, 2) |
| 1 | (5, 3) |
| 2 | (3, 5) |
| 3 | (2, 6) |
Table 3. Modified weights for angle mode.
For CIIP-TM mode, a dedicated CIIP-TM merge candidate list is constructed. The merge candidates are refined by template matching. Like the regular merge candidates, the CIIP-TM merge candidates are also reordered by the ARMC method. The maximum number of CIIP-TM merge candidates is equal to two.
Multi-hypothesis prediction (MHP)
In the multi-hypothesis inter prediction mode, one or more additional motion-compensated prediction signals are signaled in addition to the conventional bi-prediction signal. The resulting overall prediction signal is obtained by sample-wise weighted superposition. With the bi-prediction signal pbi and the first additional inter prediction signal/hypothesis h3, the resulting prediction signal p3 is obtained as follows:
p3=(1-α)pbi+αh3 (2)
According to the mapping presented in table 4, the weighting factor α is specified by the new syntax element add_hyp_weight_idx:
| add_hyp_weight_idx | α |
| 0 | 1/4 |
| 1 | -1/8 |
Table 4. Mapping between add_hyp_weight_idx and α
Similar to the above, more than one additional prediction signal may be used. The resulting overall prediction signal is iteratively accumulated with each additional prediction signal.
pn+1=(1-αn+1)pn+αn+1hn+1 (3)
The resulting overall prediction signal is obtained as the last pn (i.e., the pn with the largest index n). Within this mode, at most two additional prediction signals may be used (i.e., n is limited to 2).
The motion parameters of each additional prediction hypothesis may be signaled explicitly, by specifying a reference index, a motion vector predictor index, and a motion vector difference, or implicitly, by specifying a merge index. A separate multi-hypothesis merge flag distinguishes between these two signaling modes.
For inter AMVP mode, MHP is applied only if unequal weights in BCW are selected in bi-prediction mode.
A combination of MHP and BDOF is possible; however, BDOF is applied only to the bi-prediction portion of the prediction signal (i.e., the ordinary first two hypotheses).
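Equations (2) and (3) with the Table 4 mapping reduce to the following sketch (a hypothetical function name; at most two additional hypotheses):

```python
import numpy as np

ALPHA = {0: 1 / 4, 1: -1 / 8}   # add_hyp_weight_idx -> alpha, per Table 4

def mhp_prediction(p_bi: np.ndarray, extra_hypotheses) -> np.ndarray:
    """Iteratively accumulate (add_hyp_weight_idx, prediction) pairs:
    p_{n+1} = (1 - alpha_{n+1}) * p_n + alpha_{n+1} * h_{n+1}."""
    p = p_bi.astype(np.float64)
    for idx, h in extra_hypotheses:
        a = ALPHA[idx]
        p = (1 - a) * p + a * h
    return p
```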
Geometric Partitioning Modes (GPM) in ECM
GPM with merge motion vector differences (MMVD)
GPM in VVC is extended by applying motion vector refinement on top of the existing GPM uni-directional MVs. A flag is first signaled for a GPM CU to specify whether this mode is used. If it is used, each geometric partition of the GPM CU may further decide whether to signal an MVD. If an MVD is signaled for a geometric partition, then after the GPM merge candidate is selected, the motion of that partition is further refined by the signaled MVD information. All other procedures remain the same as for GPM.
The MVD is signaled as a pair of distance and direction, similar to MMVD. Nine candidate distances (1/4-pel, 1/2-pel, 1-pel, 2-pel, 3-pel, 4-pel, 6-pel, 8-pel, 16-pel) and eight candidate directions (four horizontal/vertical and four diagonal) are involved in GPM with MMVD (GPM-MMVD). In addition, when pic_fpel_mmvd_enabled_flag is equal to 1, the MVD is left-shifted by 2 as in MMVD.
GPM with Template Matching (TM)
Template matching is applied to GPM. When GPM mode is enabled for a CU, a CU-level flag is signaled to indicate whether TM is applied to the two geometric partitions. TM is used to refine the motion information of each geometric partition. When TM is selected, a template is constructed using left neighboring samples, above neighboring samples, or both, according to the partition angle, as shown in Table 5. The motion is then refined by minimizing the difference between the current template and the corresponding template in the reference picture, using the same search pattern as in merge mode with the half-pel interpolation filter disabled.
Table 5. Templates for the first and second geometric partitions, where A denotes using the above samples, L denotes using the left samples, and L+A denotes using both the left and above samples.
The GPM candidate list is constructed as follows:
1. Interleaved list-0 MV candidates and list-1 MV candidates are derived directly from the regular merge candidate list, with list-0 MV candidates given higher priority than list-1 MV candidates. A pruning method with an adaptive threshold based on the current CU size is applied to remove redundant MV candidates.
2. Interleaved list-1 MV candidates and list-0 MV candidates are further derived directly from the regular merge candidate list, with list-1 MV candidates given higher priority than list-0 MV candidates. The same pruning method with the adaptive threshold is applied to remove redundant MV candidates.
3. Zero MV candidates are filled until the GPM candidate list is full.
At most one of GPM-MMVD and GPM-TM is enabled for a GPM CU. This is done by first signaling the GPM-MMVD syntax. When both GPM-MMVD control flags are equal to false (i.e., GPM-MMVD is disabled for both GPM partitions), the GPM-TM flag is signaled to indicate whether template matching is applied to the two GPM partitions. Otherwise (at least one GPM-MMVD flag is equal to true), the value of the GPM-TM flag is inferred to be false.
GPM with inter and intra prediction
In GPM with inter and intra prediction, the final prediction samples are generated by weighting inter-predicted samples and intra-predicted samples for each GPM partition. The inter-predicted samples are derived as in inter GPM, while the intra-predicted samples are derived using an Intra Prediction Mode (IPM) candidate list and an index signaled from the encoder. The IPM candidate list size is predefined as 3. The available IPM candidates are the parallel angular mode with respect to the GPM block boundary (parallel mode), the perpendicular angular mode with respect to the GPM block boundary (perpendicular mode), and the planar mode, as shown in figs. 17A-17D, respectively. Furthermore, GPM with intra and intra prediction, as shown in fig. 17D, is restricted to reduce the signaling overhead for IPMs and to avoid increasing the size of the intra prediction circuit in a hardware decoder. In addition, direct motion vector and IPM storage in the GPM blending area is introduced to further improve codec performance.
In the IPM derivation based on DIMD and neighboring modes, the parallel mode is registered first. Then, if no identical IPM candidate already exists in the list, up to two IPM candidates derived by the decoder-side intra mode derivation (DIMD) method and/or from neighboring blocks may be registered. For the neighboring-mode derivation, there are at most five available neighboring block positions, which are restricted by the angle of the GPM block boundary, as shown in Table 6; these positions have already been used for GPM with template matching (GPM-TM).
Table 6. Positions of the available neighboring blocks for IPM candidate derivation, depending on the angle of the GPM block boundary. A and L denote the above and left side of the prediction block.
GPM-intra can be combined with GPM with merge motion vector differences (GPM-MMVD). TIMD is used for the IPM candidates of GPM-intra to further improve codec performance. The parallel mode may be registered first, followed by the IPM candidates of TIMD, DIMD, and the neighboring blocks.
Template matching based reordering for GPM partitioning patterns
In template matching based reordering for GPM split modes, given the motion information of the current GPM block, the respective TM cost values of the GPM split modes are computed. All GPM split modes are then reordered in ascending order based on their TM cost values. Instead of transmitting the GPM split mode directly, an index coded with a Golomb-Rice code is signaled to indicate the position of the exact GPM split mode in the reordered list.
The reordering method for GPM split modes is a two-step process performed after the respective reference templates of the two GPM partitions in the coding unit are generated, as follows:
- extending the GPM partition edge into the reference templates of the two GPM partitions, yielding 64 reference templates, and computing the respective TM cost for each of the 64 reference templates;
- reordering the GPM split modes in ascending order of their TM cost values and marking the best 32 modes as available split modes.
The edges on the template are extended from the edges of the current CU, as shown in fig. 18, but the GPM blending process is not used in the template region across the edges.
After the reordering in ascending order of TM cost, the index is signaled.
Intra template matching
Intra template matching prediction (intra TMP) is a special intra prediction mode that copies the best prediction block from the reconstructed portion of the current frame, where the best prediction block is the one whose L-shaped template best matches the current template. For a predefined search range, the encoder searches the reconstructed portion of the current frame for the template most similar to the current template and uses the corresponding block as the prediction block. The encoder then signals the use of this mode, and the same prediction operation is performed at the decoder side.
The prediction signal is generated by matching the L-shaped causal neighborhood of the current block with another block in a predefined search area, shown in fig. 19, which consists of:
R1: current CTU
R2: top-left CTU
R3: above CTU
R4: left CTU
The Sum of Absolute Differences (SAD) is used as the cost function.
Within each region, the decoder searches for a template having the smallest SAD with respect to the current template, and uses a corresponding block of the template as a prediction block.
The dimensions of all regions (SearchRange_w, SearchRange_h) are set proportional to the block dimensions (BlkW, BlkH) so that each pixel has a fixed number of SAD comparisons, namely:
SearchRange_w = a × BlkW
SearchRange_h = a × BlkH
where 'a' is a constant that controls the gain/complexity trade-off. In practice, 'a' is equal to 5.
For CUs with width and height less than or equal to 64, an intra template matching tool is enabled. The maximum CU size for intra template matching is configurable.
When DIMD is not used for the current CU, intra template matching prediction modes are signaled at the CU level by a dedicated flag.
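A brute-force version of the search can be sketched as follows; the helper is hypothetical and omits the R1-R4 region handling and availability checks for brevity:

```python
import numpy as np

def intra_tmp_search(recon, cur_x, cur_y, blk_w, blk_h, a=5):
    """Find the block whose L-shaped template (one row above plus one
    column left) minimizes the SAD against the current template."""
    def template(x, y):
        top = recon[y - 1, x:x + blk_w].astype(np.int32)
        left = recon[y:y + blk_h, x - 1].astype(np.int32)
        return np.concatenate([top, left])

    cur_t = template(cur_x, cur_y)
    best, best_sad = None, None
    for y in range(max(1, cur_y - a * blk_h), cur_y + 1):
        for x in range(max(1, cur_x - a * blk_w), cur_x + 1):
            if (x, y) == (cur_x, cur_y):
                continue
            sad = int(np.abs(template(x, y) - cur_t).sum())
            if best_sad is None or sad < best_sad:
                best, best_sad = (x, y), sad
    return best, best_sad
```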
Fusion of template-based intra-mode derivation (TIMD)
For each intra prediction mode in the MPMs, the SATD between the prediction samples and the reconstructed samples of the template is calculated. The two intra prediction modes with the minimum SATD are selected as the TIMD modes. These two TIMD modes are fused with weights after applying PDPC, and this weighted intra prediction is used to code the current CU. Position-dependent intra prediction combination (PDPC) is included in the derivation of the TIMD modes.
The costs of the two selected modes are compared with a threshold; in the test, a cost factor of 2 is applied as follows:
costMode2 < 2 × costMode1.
If the condition is true, fusion is applied, otherwise only mode 1 is used.
The weights of the two modes are computed from their SATD costs as follows:
weight1 = costMode2 / (costMode1 + costMode2)
weight2 = 1 - weight1
The division is performed using the same look-up table (LUT) based integer scheme used by CCLM.
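The fusion decision and weight computation amount to the following sketch (a hypothetical helper; floating point in place of the LUT-based integer division):

```python
def timd_fusion_weights(cost_mode1: int, cost_mode2: int):
    """Return (weight1, weight2); fusion applies only when
    costMode2 < 2 * costMode1, otherwise mode 1 is used alone."""
    if cost_mode2 < 2 * cost_mode1:
        w1 = cost_mode2 / (cost_mode1 + cost_mode2)
        return w1, 1.0 - w1
    return 1.0, 0.0
```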
Local Illumination Compensation (LIC)
LIC is an inter prediction technique that models the local illumination variation between the current block and its prediction block as a function of the local illumination variation between the current block template and the reference block template. The parameters of this function can be represented by a scale α and an offset β, which form a linear equation α·p[x] + β to compensate for the illumination variation, where p[x] is the reference sample pointed to by the MV at position x in the reference picture. When wrap-around motion compensation is enabled, the MV shall be clipped with the wrap-around offset taken into account. Since α and β can be derived based on the current block template and the reference block template, no signaling overhead is required for them, except that an LIC flag is signaled for AMVP mode to indicate the use of LIC.
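Since α and β minimize the mismatch between the two templates, they can be obtained with an ordinary least-squares fit; the sketch below is ours and uses floating point where the actual derivation is integer-based:

```python
import numpy as np

def derive_lic_params(cur_template: np.ndarray, ref_template: np.ndarray):
    """Least-squares fit of cur ~= alpha * ref + beta over the templates."""
    x = ref_template.astype(np.float64).ravel()
    y = cur_template.astype(np.float64).ravel()
    n = x.size
    denom = n * (x * x).sum() - x.sum() ** 2
    if denom == 0:
        return 1.0, y.mean() - x.mean()   # flat template: offset-only model
    alpha = (n * (x * y).sum() - x.sum() * y.sum()) / denom
    beta = (y.sum() - alpha * x.sum()) / n
    return alpha, beta

# The compensated prediction is then alpha * p[x] + beta for each
# reference sample p[x] pointed to by the MV.
```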
The local illumination compensation proposed in JVET-O0066 is used for uni-directionally predicted inter CUs, with the following modifications.
Intra neighboring samples can be used in LIC parameter derivation;
LIC is disabled for blocks with fewer than 32 luma samples;
for both the non-sub-block and affine modes, LIC parameter derivation is performed based on the template block samples of the current CU, rather than on the partial template block samples of the first 16×16 unit at the top left;
the samples of the reference block template are generated by MC with the block MV, without rounding it to integer-pel precision.
OBMC
As described in JVET-L0101, when OBMC is applied, the top and left boundary pixels of a CU are refined using weighted prediction with the motion information of the neighboring blocks.
The conditions under which OBMC is not applied are as follows:
when OBMC is disabled at the SPS level;
when the current block is coded in intra mode or IBC mode;
when LIC is applied to the current block;
when the luma area of the current block is less than or equal to 32.
Sub-block boundary OBMC is performed by applying the same blending to the top, left, bottom, and right sub-block boundary pixels, using the motion information of the neighboring sub-blocks. It is enabled for the following sub-block based coding tools:
affine AMVP mode;
affine merge mode and sub-block based temporal motion vector prediction (SbTMVP);
Sub-block based bilateral matching.
When OBMC mode is used with LMCS in CIIP mode, the inter blending is performed before the LMCS mapping of the inter samples. In CIIP mode, LMCS is applied to the blended inter samples, which are then combined with the intra samples to which LMCS has been applied.
In the associated blending formula, inter predY denotes the samples predicted with the motion of the current block in the original domain, intra predY denotes the samples predicted in the mapped domain, OBMC predY denotes the samples predicted with the motion of a neighboring block in the original domain, and w0 and w1 are weights.
OBMC based on template matching
In the template matching based OBMC scheme, the method for deriving the prediction values of the CU boundary samples is not fixed to weighted prediction; instead, it is selected according to the template matching costs, from among using only the motion information of the current block, using the motion information of a neighboring block, and a mixed mode.
In this scheme, the above template size is equal to 4×1 for each block of size 4×4 at the top CU boundary. If N neighboring blocks have the same motion information, the above template is enlarged to 4N×1, since the MC operation can then be processed in one pass. Similarly, for each block of size 4×4 at the left CU boundary, the left template size is equal to 1×4 or 1×4N (fig. 20).
For each 4×4 top block (or group of N 4×4 blocks), the following steps are used to derive the prediction values of the boundary samples.
Take block A and its above neighboring block AboveNeighbor_A as an example; the left blocks are handled in the same manner.
First, three template matching costs (Cost1, Cost2, Cost3) are measured as the SAD between the reconstructed samples of the template and the corresponding reference samples of the template derived by the MC process, according to the following three types of motion information:
Cost1 is calculated using the motion information of A;
Cost2 is calculated using the motion information of AboveNeighbor_A;
Cost3 is calculated using the motion information of A and AboveNeighbor_A with weighted prediction, using weighting factors 3/4 and 1/4, respectively.
Next, the method used to calculate the final prediction of the boundary samples is selected by comparing Cost1, Cost2, and Cost3 (a sketch of this selection is provided after the blending formulas below).
The original MC result using the motion information of the current block is denoted Pixel1, the MC result using the motion information of the neighboring block is denoted Pixel2, and the final prediction is denoted NewPixel.
If Cost1 is the minimum, NewPixel(i, j) = Pixel1(i, j).
Otherwise, if (Cost2 + (Cost2 >> 2) + (Cost2 >> 3)) <= Cost1, mixed mode 1 is used.
For a luminance block, the number of mixed pixel rows is 4.
NewPixel(i,0)=(26×Pixel1(i,0)+6×Pixel2(i,0)+16)>>5
NewPixel(i,1)=(7×Pixel1(i,1)+Pixel2(i,1)+4)>>3
NewPixel(i,2)=(15×Pixel1(i,2)+Pixel2(i,2)+8)>>4
NewPixel(i,3)=(31×Pixel1(i,3)+Pixel2(i,3)+16)>>5
For a chroma block, the number of mixed pixel rows is 1.
NewPixel(i,0)=(26×Pixel1(i,0)+6×Pixel2(i,0)+16)>>5
Otherwise, if Cost1 <= Cost2, mixed mode 2 is used.
For a luminance block, the number of mixed pixel rows is 2.
NewPixel(i,0)=(15×Pixel1(i,0)+Pixel2(i,0)+8)>>4
NewPixel(i,1)=(31×Pixel1(i,1)+Pixel2(i,1)+16)>>5
For a chroma block, the number of mixed pixel rows/columns is 1.
NewPixel(i,0)=(15×Pixel1(i,0)+Pixel2(i,0)+8)>>4
Otherwise, mixed mode 3 is used.
For a luminance block, the number of mixed pixel rows is 4.
NewPixel(i,1)=(7×Pixel1(i,1)+Pixel2(i,1)+4)>>3
NewPixel(i,2)=(15×Pixel1(i,2)+Pixel2(i,2)+8)>>4
NewPixel(i,3)=(31×Pixel1(i,3)+Pixel2(i,3)+16)>>5
For a chroma block, the number of mixed pixel rows is 1.
NewPixel(i,0)=(7×Pixel1(i,0)+Pixel2(i,0)+4)>>3
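The mode selection that precedes these blending formulas can be sketched as follows (a hypothetical helper; 0 means keeping the original MC result Pixel1):

```python
def select_obmc_mode(cost1: int, cost2: int, cost3: int) -> int:
    """Return 0 (no blending) or mixed mode 1/2/3 from the three
    template matching costs, following the comparisons above."""
    if cost1 <= cost2 and cost1 <= cost3:
        return 0
    if (cost2 + (cost2 >> 2) + (cost2 >> 3)) <= cost1:
        return 1
    if cost1 <= cost2:
        return 2
    return 3
```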
Currently, the IBC tool is not combined with the GPM tool; combining them is straightforward and can improve prediction accuracy and codec performance.
Currently, coding blocks coded in IBC mode are not combined with coding blocks coded in intra mode or inter mode; combining them is straightforward and can improve prediction accuracy and codec performance.
Currently, the number of Block Vectors (BVs) in the IBC tool is limited to one; increasing the number of BVs and combining their prediction results is straightforward and can improve prediction accuracy and codec performance.
Currently, coding blocks coded in intra TMP mode are not combined with coding blocks coded in intra mode or inter mode; combining them is straightforward and can improve prediction accuracy and codec performance.
Currently, the intra TMP tool is not combined with the GPM tool; combining them is straightforward and can improve prediction accuracy and codec performance.
Currently, the IBC tool is not combined with the TIMD tool; combining them is straightforward and can improve prediction accuracy and codec performance.
Currently, the intra TMP tool is not combined with the TIMD tool; combining them is straightforward and can improve prediction accuracy and codec performance.
Currently, the intra TMP tool is not combined with the LIC tool; combining them is straightforward and can improve prediction accuracy and codec performance.
Currently, the IBC tool is not combined with the OBMC tool; combining them is straightforward and can improve prediction accuracy and codec performance.
Currently, the intra TMP tool is not combined with the OBMC tool; combining them is straightforward and can improve prediction accuracy and codec performance.
In the present disclosure, in order to solve the above-described problems, a method of further improving the existing design of IBC is provided. In general, the main features of the technology proposed in the present disclosure are summarized below.
1. The IBC tool is combined with the GPM tool in the form of a GPM with IBC prediction and IBC prediction, a GPM with IBC prediction and intra prediction, or a GPM with IBC prediction and inter prediction.
2. As a simplified version of combining the IBC tool with the GPM tool, for a predefined direction (such as 45 degrees), the top-left part is predicted in intra mode and the bottom-right part in IBC mode, and the two are then weighted-averaged to obtain the final prediction signal.
3. The IBC tool is combined with CIIP tools, where IBC prediction is combined with intra prediction modes, or IBC prediction is combined with inter prediction modes.
4. The IBC tool is combined with the MHP tool, where more than one BV prediction is obtained and they are weighted averaged to obtain the final prediction signal.
5. The intra TMP tool is combined with CIIP tools, where the intra TMP is combined with intra prediction modes or the intra TMP is combined with inter prediction modes.
6. The intra TMP tool is combined with the GPM tool in the form of a GPM with intra TMP prediction and intra TMP prediction, a GPM with intra TMP prediction and intra prediction, or a GPM with intra TMP prediction and inter prediction.
7. As a simplified version of combining the intra TMP tool with the GPM tool, for a predefined direction (such as 45 degrees), the top-left part is predicted in intra mode and the bottom-right part in intra TMP mode, and the two are then weighted-averaged to obtain the final prediction signal.
8. The IBC tool is combined with TIMD tools, where IBC modes are used with intra-prediction modes in MPM for TIMD fusion.
9. The intra TMP tool is combined with TIMD tools, where intra TMP mode is used with intra prediction modes in the MPM for TIMD fusion.
10. The intra TMP tool is combined with an LIC tool, wherein the local illumination variation between the current block and its intra TMP prediction block is compensated with the LIC tool.
11. The IBC tool is combined with the OBMC tool, wherein the top and left boundary pixels of the current block predicted by IBC are refined with the OBMC tool.
12. The intra TMP tool is combined with the OBMC tool, where the top boundary pixels and the left boundary pixels of the current block predicted with intra TMP are refined with the OBMC tool.
In some examples, the disclosed methods may be applied independently or jointly.
GPM with IBC prediction and IBC prediction
In accordance with one or more embodiments of the present disclosure, an IBC tool is combined with a GPM tool in the form of a GPM with IBC prediction and IBC prediction. Different methods may be used to achieve this goal.
In the first method, the two "inter" parts of GPM with inter prediction and inter prediction in VVC are replaced with IBC. This means that two IBC merge prediction results are weighted-averaged with each other according to the partition line in the coding block. The weights may be obtained with reference to GPM with inter prediction and inter prediction in VVC.
In the second approach, the two "inter" parts of the GPM with inter prediction methods and inter prediction methods in the ECM are replaced with IBCs, where some template matching tools may be utilized to further improve codec performance.
GPM with IBC prediction and intra prediction
In accordance with one or more embodiments of the present disclosure, an IBC tool is combined with a GPM tool in the form of a GPM with IBC prediction and intra prediction. Different methods may be used to achieve this goal.
In the first method, the "inter" part of GPM with inter and intra prediction in ECM is replaced with IBC, where the IBC prediction result and the intra prediction result are weighted-averaged to obtain the final prediction signal.
GPM with IBC prediction and inter prediction
In accordance with one or more embodiments of the present disclosure, an IBC tool is combined with a GPM tool in the form of a GPM with IBC prediction and inter-frame prediction. Different methods may be used to achieve this goal.
In the first method, one "inter" part of GPM with inter prediction and inter prediction in VVC is replaced with IBC, where the IBC merge prediction result and the inter merge prediction result are weighted-averaged to obtain the final prediction signal.
In the second approach, one "inter" part of the GPM with inter prediction methods and inter prediction methods in the ECM is replaced with IBCs, where some template matching tools may be utilized to further improve coding performance.
Simplified IBC prediction and intra prediction combination in GPM form
According to one or more embodiments of the present disclosure, the IBC tool is combined with the GPM tool in the form of a simplified GPM with IBC prediction and intra prediction; for example, IBC prediction and intra prediction are combined under a specific partition mode, which can save the bit overhead of representing the partition mode. Different methods may be used to achieve this goal.
In the first method, for a partition line of, for example, 45 degrees, the top-left part of the coding block is coded in intra prediction mode and the bottom-right part in IBC prediction mode, and the two parts are then weighted-averaged as in GPM to obtain the final prediction signal.
Combined IBC-intra/inter prediction
In accordance with one or more embodiments of the present disclosure, a coding block encoded in IBC mode and a coding block encoded in intra mode or inter mode are combined. Different methods may be used to achieve this goal.
In the first method, the encoder/decoder may combine the encoded blocks encoded in IBC mode and the encoded blocks encoded in intra mode. Various methods may be utilized in this combination. In one example, similar to CIIP techniques in VVC, a coding block encoded in IBC merge mode is treated as a coding block encoded in inter merge mode, and a coding block encoded in IBC merge mode and a coding block encoded in planar intra prediction mode are combined. In another example, similar to the combination of CIIP and TIMD and TM merging techniques in ECM, a coding block encoded in IBC merge-TM mode and a coding block encoded in TIMD derived intra prediction mode are combined.
In a second method, the encoder/decoder may combine the encoded blocks encoded in IBC mode and the encoded blocks encoded in inter mode. Various methods may be utilized in this combination. In one example, similar to CIIP techniques in VVC, a coding block encoded in IBC merge mode is treated as a coding block encoded in planar intra mode, and the coding block encoded in IBC merge mode and the coding block encoded in inter merge mode are combined. In another example, the coding blocks encoded in the IBC merge mode are regarded as the coding blocks encoded in the inter merge mode, and the coding blocks encoded in the IBC merge mode and the coding blocks encoded in the inter merge mode are combined by equally averaging.
In a third method, the encoder/decoder may combine the encoded blocks encoded in IBC mode with the encoded blocks encoded in intra mode and the encoded blocks encoded in inter mode. Various methods may be utilized in this combination. In one example, the coding blocks encoded in IBC mode, the coding blocks encoded in intra mode, and the coding blocks encoded in inter mode are directly combined by equally averaging. In another example, first, the coding blocks coded in IBC mode are individually combined with the coding blocks coded in intra mode and the coding blocks coded in inter mode, respectively, as presented in the first method and the second method. The individual combined results are then combined by averaging equally.
Multi-hypothesis IBC prediction
In accordance with one or more embodiments of the present disclosure, the number of Block Vectors (BVs) in the IBC tool is increased to 2 or more, and 2 or more hypotheses are combined to obtain a final prediction result. Different methods may be used to achieve this goal.
In the first method, the encoder/decoder may combine 2 hypotheses corresponding to 2 BVs to obtain a final prediction result. This objective can be achieved using various methods. In one example, 2 BVs corresponding to the minimum rate distortion metric and the second minimum rate distortion metric in IBC AMVP mode are equally averaged to obtain the final prediction result. In another example, the prediction result corresponding to the IBC AMVP mode and the prediction result corresponding to the IBC merge mode are equally averaged to obtain a final prediction result.
In the second method, the encoder/decoder may combine more hypotheses corresponding to more BVs to obtain the final prediction result. This can be achieved in various ways. In one example, the final prediction result is obtained using the iterative accumulation method proposed in the multi-hypothesis prediction (MHP) technique. In another example, all the BVs corresponding to the minimum, second minimum, third minimum, and subsequent rate-distortion metrics in IBC AMVP mode are equally averaged to obtain the final prediction result.
Combined intra TMP-intra/inter prediction
In accordance with one or more embodiments of the present disclosure, a coding block encoded with intra TMP mode is combined with a coding block encoded with intra mode or inter mode. Different methods may be used to achieve this goal.
In the first method, the encoder/decoder may combine a coding block encoded with intra TMP mode with a coding block encoded with intra mode. Various methods may be utilized in this combination. In one example, similar to the CIIP technique in VVC, the coding block encoded with intra TMP mode is treated as a coding block encoded with inter merge mode and is combined with a coding block encoded with planar intra prediction mode. In another example, similar to the combination of CIIP with the TIMD and TM merge techniques in ECM, the coding block encoded with intra TMP mode is combined with a coding block encoded with a TIMD-derived intra prediction mode.
In a second approach, the encoder/decoder may combine a coding block encoded with intra TMP mode with a coding block encoded with inter mode. Various methods may be utilized in this combination. In one example, similar to the CIIP technique in VVC, the coding block encoded with intra TMP mode is treated as a coding block encoded with planar intra mode and is combined with a coding block encoded with inter merge mode. In another example, the coding block encoded with intra TMP mode is treated as a coding block encoded with inter merge mode and is combined with a coding block encoded with inter merge mode by equal averaging.
In a third method, the encoder/decoder may combine a coding block encoded with intra TMP mode with both a coding block encoded with intra mode and a coding block encoded with inter mode. Various methods may be utilized in this combination. In one example, the predictions of the blocks coded with intra TMP mode, intra mode, and inter mode are directly combined by equal averaging. In another example, the intra TMP prediction is first combined separately with the intra prediction and with the inter prediction, as presented in the first and second methods, and the two intermediate results are then combined by equal averaging.
GPM with intra TMP prediction and intra TMP prediction
In accordance with one or more embodiments of the present disclosure, an intra TMP tool is combined with a GPM tool in the form of GPM with intra TMP prediction and intra TMP prediction. Different methods may be used to achieve this goal.
In the first approach, the two inter parts of GPM with inter prediction and inter prediction in VVC are replaced with intra TMP predictions. That is, the two intra TMP prediction results are weighted averaged with each other according to the partition line in the coding block, as sketched below. The weights may be obtained as for GPM with inter prediction and inter prediction in VVC.
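A simplified Python sketch of this blending follows; the 0-to-8 integer weight ramp mimics the 3-bit GPM blending weights in VVC, but the ramp width, its quantization, and the line parameterization here are illustrative rather than normative.

```python
import numpy as np

def gpm_blend(pred0: np.ndarray, pred1: np.ndarray,
              angle_deg: float, offset: float = 0.0) -> np.ndarray:
    """Weighted average of two intra TMP predictions along a GPM-style
    partition line through the block center."""
    h, w = pred0.shape
    y, x = np.mgrid[0:h, 0:w]
    theta = np.deg2rad(angle_deg)
    # Signed distance of each sample to the partition line.
    d = (x - w / 2) * np.cos(theta) + (y - h / 2) * np.sin(theta) - offset
    w0 = np.clip(np.round(4 + d), 0, 8).astype(np.int32)  # 0..8 weight ramp
    return ((w0 * pred0 + (8 - w0) * pred1 + 4) >> 3).astype(np.uint8)
```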
In the second approach, the two inter parts of GPM with inter prediction and inter prediction in ECM are replaced with intra TMP predictions, where some template matching tools may be utilized to further improve codec performance.
GPM with intra TMP prediction and intra prediction
In accordance with one or more embodiments of the present disclosure, an intra TMP tool is combined with a GPM tool in the form of GPM with intra TMP prediction and intra prediction. Different methods may be used to achieve this goal.
In the first method, the "inter" part of the ECM having inter-and intra-prediction methods is replaced with an intra-TMP, wherein the intra-TMP prediction is weighted averaged with the intra-prediction to obtain the final prediction signal.
GPM with intra TMP prediction and inter prediction
In accordance with one or more embodiments of the present disclosure, an intra TMP tool is combined with a GPM tool in the form of GPM with intra TMP prediction and inter prediction. Different methods may be used to achieve this goal.
In the first approach, one inter part of GPM with inter prediction and inter prediction in VVC is replaced with intra TMP, where the intra TMP prediction result and the inter merge prediction result are weighted averaged to obtain the final prediction signal.
In the second approach, one inter part of GPM with inter prediction and inter prediction in ECM is replaced with intra TMP, where some template matching tools may be used to further improve codec performance.
Simplified intra TMP prediction and intra prediction combination in the form of GPM
According to one or more embodiments of the present disclosure, an intra TMP tool is combined with a GPM tool in the form of a simplified GPM with intra TMP prediction and intra prediction that uses a particular partition mode, which may save the bit overhead of signaling the partition. Different methods may be used to achieve this goal.
In the first method, for a fixed partition line of, for example, 45 degrees, the upper left part of the coding block is coded with an intra prediction mode and the lower right part is coded with intra TMP mode; the two predictions are then weighted averaged as in GPM to obtain the final prediction signal. A sketch follows.
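For the fixed 45-degree partition, the blending reduces to a mask over the anti-diagonal of the block. A Python sketch assuming a square block and a one-sample blending band, both illustrative choices:

```python
import numpy as np

def diagonal_split_blend(pred_intra: np.ndarray, pred_tmp: np.ndarray) -> np.ndarray:
    """Upper-left samples take the intra prediction, lower-right samples
    take the intra TMP prediction, and samples on the anti-diagonal are
    equally averaged."""
    n = pred_intra.shape[0]
    y, x = np.mgrid[0:n, 0:n]
    d = x + y - (n - 1)          # signed distance to the anti-diagonal
    out = np.where(d < 0, pred_intra, pred_tmp).astype(np.int32)
    band = (d == 0)
    out[band] = (pred_intra[band].astype(np.int32) + pred_tmp[band] + 1) >> 1
    return out.astype(np.uint8)
```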
Combining IBC mode with TIMD mode
In accordance with one or more embodiments of the present disclosure, an IBC tool is combined with TIMD tools. Different methods may be used to achieve this goal.
In the first method, the IBC mode is regarded as one intra prediction mode and added to the MPM list; the IBC mode is then compared with the other intra prediction modes in the MPM list using template matching costs; finally, the two modes with the minimum and second minimum costs are fused using the TIMD method to obtain the final prediction result.
In the second method, a conventional TIMD prediction result is first obtained; the template matching costs of the IBC mode and of the conventional TIMD prediction result are then calculated; finally, the IBC prediction and the conventional TIMD prediction result are fused using the TIMD method to obtain the final prediction result. A sketch of the fusion follows.
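Both methods end with a TIMD-style fusion in which the two candidates are weighted inversely to their template matching costs; in ECM the weights are cost2/(cost1+cost2) and cost1/(cost1+cost2). A floating-point Python sketch (real implementations use fixed-point weights, and ECM additionally gates the fusion on the relative cost magnitudes):

```python
import numpy as np

def timd_fuse(pred_a: np.ndarray, pred_b: np.ndarray,
              cost_a: int, cost_b: int) -> np.ndarray:
    """Fuse two candidate predictions, e.g., the IBC prediction and the
    conventional TIMD prediction, with cost-derived weights."""
    total = cost_a + cost_b
    if total == 0:               # both templates match perfectly
        return pred_a
    w_a = cost_b / total         # lower cost -> higher weight
    w_b = cost_a / total
    fused = w_a * pred_a.astype(np.float64) + w_b * pred_b
    return np.clip(np.round(fused), 0, 255).astype(np.uint8)
```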
Combining intra TMP mode with TIMD mode
In accordance with one or more embodiments of the present disclosure, an intra TMP tool is combined with a TIMD tool. Different methods may be used to achieve this goal.
In the first method, the intra TMP mode is regarded as one intra prediction mode and added to the MPM list; the intra TMP mode is then compared with the other intra prediction modes in the MPM list using template matching costs; finally, the two modes with the minimum and second minimum costs are fused using the TIMD method to obtain the final prediction result.
In the second method, a conventional TIMD prediction result is first obtained; the template matching costs of the intra TMP mode and of the conventional TIMD prediction result are then calculated; finally, the intra TMP prediction and the conventional TIMD prediction result are fused using the TIMD method to obtain the final prediction result.
Combining intra TMP with LIC
In accordance with one or more embodiments of the present disclosure, an intra TMP tool is combined with an LIC tool. Different methods may be used to achieve this goal.
In a first approach, intra TMP mode is considered as an inter mode, and LIC is used to model the local illumination variation between the current block and its intra TMP prediction block with a linear function whose parameters are derived from the current block template and the reference block template. The function is a linear equation, as used in the conventional LIC method. A sketch follows.
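A Python sketch of this method follows: the parameters alpha and beta of the linear equation are obtained by a least-squares fit over the template samples and are then applied to the intra TMP prediction block. Floating-point arithmetic is used here for clarity; conventional LIC implementations derive the parameters in integer arithmetic.

```python
import numpy as np

def derive_lic_params(cur_template: np.ndarray, ref_template: np.ndarray):
    """Least-squares fit of cur ~ alpha * ref + beta over the templates."""
    x = ref_template.astype(np.float64).ravel()
    y = cur_template.astype(np.float64).ravel()
    n = x.size
    denom = n * np.dot(x, x) - x.sum() ** 2
    if denom == 0:               # flat reference template: offset only
        return 1.0, y.mean() - x.mean()
    alpha = (n * np.dot(x, y) - x.sum() * y.sum()) / denom
    beta = (y.sum() - alpha * x.sum()) / n
    return alpha, beta

def apply_lic(intra_tmp_pred: np.ndarray, alpha: float, beta: float) -> np.ndarray:
    """Compensate the intra TMP prediction block with the derived model."""
    return np.clip(np.round(alpha * intra_tmp_pred + beta), 0, 255).astype(np.uint8)
```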
Combining IBC with OBMC
In accordance with one or more embodiments of the present disclosure, an IBC tool is combined with an OBMC tool. Different methods may be used to achieve this goal.
In the first method, the IBC mode is regarded as an inter mode, and the conventional OBMC method is applied, performing weighted prediction with the block vector information of neighboring blocks to refine the top boundary pixels and the left boundary pixels of the IBC-coded CU.
In the second approach, the IBC mode is regarded as an inter mode, and a template-matching-based OBMC method is applied to refine the top boundary pixels and the left boundary pixels of the IBC-coded CU. A sketch of the boundary blending follows.
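The boundary refinement in the first method can be illustrated with the classic OBMC blending pattern, in which the prediction obtained with the neighbor's block vector contributes 1/4, 1/8, 1/16, and 1/32 to the first four boundary rows; the weights and the number of refined rows are illustrative here. Refining the left boundary columns proceeds analogously with the left neighbor's block vector.

```python
import numpy as np

def obmc_refine_top(pred_cur: np.ndarray, pred_above_bv: np.ndarray,
                    rows: int = 4) -> np.ndarray:
    """Blend the top rows of the current IBC prediction with a second
    prediction fetched using the above neighbor's block vector."""
    out = pred_cur.astype(np.int32).copy()
    neighbor_w = [8, 4, 2, 1]    # out of 32: 1/4, 1/8, 1/16, 1/32
    for r in range(min(rows, out.shape[0])):
        w = neighbor_w[r]
        out[r] = (w * pred_above_bv[r].astype(np.int32)
                  + (32 - w) * out[r] + 16) >> 5
    return out.astype(np.uint8)
```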
Combining intra TMP with OBMC
In accordance with one or more embodiments of the present disclosure, an intra TMP tool is combined with an OBMC tool. Different methods may be used to achieve this goal.
In the first method, intra TMP mode is regarded as an inter mode, and the conventional OBMC method is applied, performing weighted prediction with the block vector information of neighboring blocks to refine the top boundary pixels and the left boundary pixels of the intra TMP-coded CU.
In the second approach, intra TMP mode is regarded as an inter mode, and a template-matching-based OBMC method is applied to refine the top boundary pixels and the left boundary pixels of the intra TMP-coded CU.
Fig. 21 illustrates a computing environment (or computing device) 1610 coupled to a user interface 1650. The computing environment 1610 may be part of a data processing server. In some embodiments, computing device 1610 may perform any of the various methods or processes (e.g., encoding/decoding methods or processes) described previously according to various examples of the disclosure. The computing environment 1610 includes a processor 1620, a memory 1630, and an input/output (I/O) interface 1640.
Processor 1620 generally controls overall operation of computing environment 1610, such as operations associated with display, data acquisition, data communication, and image processing. Processor 1620 may include one or more processors to execute instructions to perform all or some of the steps of the methods described above. Further, the processor 1620 may include one or more modules that facilitate interactions between the processor 1620 and other components. The processor may be a central processing unit (CPU), a microprocessor, a single-chip microcomputer, a graphics processing unit (GPU), or the like.
Memory 1630 is configured to store various types of data to support the operation of computing environment 1610. Examples of such data include instructions for any application or method operated on computing environment 1610, video data sets, image data, and the like. Memory 1630 may include predetermined software 1632. Memory 1630 may be implemented using any type or combination of volatile or non-volatile memory devices, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
I/O interface 1640 provides an interface between processor 1620 and peripheral interface modules (such as keyboards, click wheels, buttons, etc.). Buttons may include, but are not limited to, a home button, a start scan button, and a stop scan button. I/O interface 1640 may be coupled with an encoder and decoder.
Fig. 22 is a flowchart illustrating a method for video decoding according to an example of the present disclosure.
In step 2201, at the decoder side, the processor 1620 may obtain a current CU encoded based on a combined mode that combines at least one of an intra prediction mode, an inter prediction mode, a TIMD mode, an LIC mode, or an OBMC mode with an intra TMP mode.
In step 2202, the processor 1620 may obtain a final prediction for the current CU based on the combined mode.
In some examples, the processor 1620 may obtain a first prediction for the current CU, wherein the first prediction is associated with intra TMP mode, obtain a second prediction for the current CU, wherein the second prediction is associated with intra prediction mode, and obtain a final prediction for the current CU based on the first prediction and the second prediction. In some examples, the intra-prediction mode may include one of a planar intra-prediction mode or TIMD-derived intra-prediction mode.
In some examples, the processor 1620 may obtain a first prediction for the current CU, wherein the first prediction is associated with intra TMP mode, obtain a second prediction for the current CU, wherein the second prediction is associated with inter prediction mode, and obtain a final prediction for the current CU based on the first prediction and the second prediction.
In some examples, the inter prediction mode may include an inter merge mode, and the processor 1620 may obtain a final prediction for the current CU by equally averaging the first prediction and the second prediction.
In some examples, the processor 1620 may obtain a first prediction for the current CU, wherein the first prediction is associated with intra TMP mode, obtain a second prediction for the current CU, wherein the second prediction is associated with intra prediction mode, obtain a third prediction for the current CU, wherein the third prediction is associated with inter prediction mode, and obtain a final prediction for the current CU based on the first prediction, the second prediction, and the third prediction. In some examples, processor 1620 may obtain the final prediction for the current CU by equally averaging the first prediction, the second prediction, and the third prediction.
In some examples, processor 1620 may obtain a first intermediate prediction based on the first prediction and the second prediction, obtain a second intermediate prediction based on the first prediction and the third prediction, and obtain a final prediction for the current CU by equally averaging the first intermediate prediction and the second intermediate prediction.
In some examples, processor 1620 may obtain a Most Probable Mode (MPM) list including intra TMP mode and one or more other intra-prediction modes, calculate a plurality of TM costs by comparing intra TMP mode with the one or more other intra-prediction modes, and select a first TM cost and a second TM cost from the plurality of TM costs, wherein the first TM cost is the smallest TM cost of the plurality of TM costs and is associated with a first mode in the MPM list, and the second TM cost is the second smallest TM cost of the plurality of TM costs and is associated with a second mode in the MPM list. Further, processor 1620 may obtain a final prediction for the current CU by fusing the first mode and the second mode using the TIMD method.
In some examples, the processor 1620 may obtain intra TMP prediction for the current CU, wherein the intra TMP prediction is associated with intra TMP mode, obtain conventional TIMD prediction for the current CU, wherein the conventional TIMD prediction is associated with TIMD, and fuse the intra TMP prediction and the conventional TIMD prediction using template-based intra mode derivation (TIMD) mode to obtain a final prediction for the current CU.
In some examples, processor 1620 may obtain an intra TMP prediction for the current CU, wherein the intra TMP prediction is associated with intra TMP mode, obtain a modeling function that models the local illumination change between the current CU and the intra TMP prediction as a linear function derived from the current CU template and the reference block template, and obtain a local-illumination-compensated intra TMP prediction based on the modeling function.
In some examples, processor 1620 may obtain the current CU encoded based on intra TMP mode and perform weighted prediction using block vector information of neighboring blocks to refine the top boundary pixels and the left boundary pixels of the current CU encoded based on intra TMP mode. For example, assuming that the current CU and its above CU are both encoded using intra TMP mode but with different block vectors, to refine the top boundary pixels of the current CU, another prediction of the top boundary pixels of the current CU is obtained using the block vector of the above CU, and the original prediction and this additional prediction are then weighted averaged to obtain the final prediction.
In some examples, processor 1620 may obtain the current CU encoded based on the intra TMP mode and refine the top boundary pixels and the left boundary pixels of the current CU encoded based on the intra TMP mode using a template matching based method.
Fig. 23 is a flowchart illustrating a method for video encoding corresponding to the method for video decoding as illustrated in fig. 22.
In step 2301, at the encoder side, the processor 1620 may encode the current CU based on a combined mode that combines at least one of intra prediction mode, inter prediction mode, TIMD mode, LIC mode, or OBMC mode with intra TMP mode.
In step 2302, at the encoder side, the processor 1620 may send the current CU encoded based on the combined mode to the decoder.
In some examples, the processor 1620 may obtain a first prediction for the current CU, wherein the first prediction is associated with intra TMP mode, obtain a second prediction for the current CU, wherein the second prediction is associated with intra prediction mode, and obtain a final prediction for the current CU based on the first prediction and the second prediction. In some examples, the intra-prediction mode may include one of a planar intra-prediction mode or TIMD-derived intra-prediction mode.
In some examples, the processor 1620 may obtain a first prediction for the current CU, wherein the first prediction is associated with intra TMP mode, obtain a second prediction for the current CU, wherein the second prediction is associated with inter prediction mode, and obtain a final prediction for the current CU based on the first prediction and the second prediction.
In some examples, the inter prediction mode may include an inter merge mode, and the processor 1620 may obtain a final prediction for the current CU by equally averaging the first prediction and the second prediction.
In some examples, the processor 1620 may obtain a first prediction for the current CU, wherein the first prediction is associated with intra TMP mode, obtain a second prediction for the current CU, wherein the second prediction is associated with intra prediction mode, obtain a third prediction for the current CU, wherein the third prediction is associated with inter prediction mode, and obtain a final prediction for the current CU based on the first prediction, the second prediction, and the third prediction. In some examples, processor 1620 may obtain the final prediction for the current CU by equally averaging the first prediction, the second prediction, and the third prediction.
In some examples, processor 1620 may obtain a first intermediate prediction based on the first prediction and the second prediction, obtain a second intermediate prediction based on the first prediction and the third prediction, and obtain a final prediction for the current CU by equally averaging the first intermediate prediction and the second intermediate prediction.
In some examples, processor 1620 may obtain a Most Probable Mode (MPM) list including intra TMP mode and one or more other intra-prediction modes, calculate a plurality of TM costs by comparing intra TMP mode with the one or more other intra-prediction modes, and select a first TM cost and a second TM cost from the plurality of TM costs, wherein the first TM cost is the smallest TM cost of the plurality of TM costs and is associated with a first mode in the MPM list, and the second TM cost is the second smallest TM cost of the plurality of TM costs and is associated with a second mode in the MPM list. Further, processor 1620 may obtain a final prediction for the current CU by fusing the first mode and the second mode using the TIMD method.
In some examples, the processor 1620 may obtain intra TMP prediction for the current CU, wherein the intra TMP prediction is associated with intra TMP mode, obtain conventional TIMD prediction for the current CU, wherein the conventional TIMD prediction is associated with TIMD, and fuse the intra TMP prediction and the conventional TIMD prediction using template-based intra mode derivation (TIMD) mode to obtain a final prediction for the current CU.
In some examples, processor 1620 may obtain an intra TMP prediction for the current CU, wherein the intra TMP prediction is associated with intra TMP mode, obtain a modeling function that models the local illumination change between the current CU and the intra TMP prediction as a linear function derived from the current CU template and the reference block template, and obtain a local-illumination-compensated intra TMP prediction based on the modeling function.
In some examples, processor 1620 may obtain the current CU encoded based on intra TMP mode and perform weighted prediction using block vector information of neighboring blocks to refine the top boundary pixels and the left boundary pixels of the current CU encoded based on intra TMP mode. For example, assuming that the current CU and its above CU are both encoded using intra TMP mode but with different block vectors, to refine the top boundary pixels of the current CU, another prediction of the top boundary pixels of the current CU is obtained using the block vector of the above CU, and the original prediction and this additional prediction are then weighted averaged to obtain the final prediction.
In some examples, processor 1620 may obtain the current CU encoded based on the intra TMP mode and refine the top boundary pixels and the left boundary pixels of the current CU encoded based on the intra TMP mode using a template matching based method.
Fig. 24 is a flowchart illustrating a method for video decoding according to an example of the present disclosure.
In step 2401, at the decoder side, the processor 1620 may obtain the current CU encoded based on an intra Template Matching Prediction (TMP) mode combined with a Geometric Partition Mode (GPM) mode.
In step 2402, at the decoder side, the processor 1620 may obtain a final prediction for the current CU based on the intra TMP mode combined with the GPM mode.
In some examples, the current CU is divided into a first intra TMP prediction portion and a second intra TMP prediction portion, and processor 1620 may obtain a first intra TMP prediction for the first intra TMP prediction portion, obtain a second intra TMP prediction for the second intra TMP prediction portion, and obtain a final prediction for the current CU based on the first intra TMP prediction and the second intra TMP prediction. In some examples, processor 1620 may also obtain a reordered GPM partition pattern using a TM-based approach, as discussed in the template matching-based reordering portion for the GPM partition pattern.
In some examples, processor 1620 may obtain the final prediction for the current CU by weighted averaging the first intra TMP prediction and the second intra TMP prediction.
In some examples, the current CU is divided into a first intra TMP prediction portion and a second intra prediction portion. The processor 1620 may also obtain a first intra TMP prediction for the first intra TMP prediction portion, obtain a second intra prediction for the second intra prediction portion, and obtain a final prediction for the current CU based on the first intra TMP prediction and the second intra prediction.
In some examples, the decoder may obtain the final prediction for the current CU by weighted averaging the first intra TMP prediction and the second intra prediction.
In some examples, the current CU is divided into a first intra TMP prediction portion and a second inter prediction portion. The processor 1620 may also obtain a first intra TMP prediction for the first intra TMP prediction portion, obtain a second inter-frame merge prediction for the second inter-frame prediction portion, and obtain a final prediction for the current CU by weighted averaging the first intra TMP prediction and the second inter-frame merge prediction. In some examples, processor 1620 may also obtain a reordered GPM partition pattern using a TM-based approach, as discussed in the template matching-based reordering portion for the GPM partition pattern.
In some examples, the current CU is divided into a first portion and a second portion based on a predefined direction. Further, the processor 1620 may obtain a first intra prediction for the first portion, obtain a second intra TMP prediction for the second portion, and obtain a final prediction for the current CU by weighted averaging the first intra prediction and the second intra TMP prediction.
In some examples, the predefined direction is 45 degrees, the first intra prediction is located in the upper left portion of the current CU, and the second intra TMP prediction is located in the lower right portion of the current CU.
Fig. 25 is a flowchart illustrating a method for video encoding corresponding to the method for video decoding as illustrated in fig. 24.
In step 2501, at the encoder side, the processor 1620 may encode the current CU based on the intra TMP mode combined with the GPM mode.
In step 2502, at the encoder side, the processor 1620 may send the current CU encoded based on the intra TMP mode combined with the GPM mode to the decoder.
In some examples, the current CU is divided into a first intra TMP prediction portion and a second intra TMP prediction portion, and processor 1620 may obtain a first intra TMP prediction for the first intra TMP prediction portion, obtain a second intra TMP prediction for the second intra TMP prediction portion, and obtain a final prediction for the current CU based on the first intra TMP prediction and the second intra TMP prediction. In some examples, processor 1620 may also obtain a reordered GPM partition pattern using a TM-based approach, as discussed in the template matching-based reordering portion for the GPM partition pattern.
In some examples, processor 1620 may obtain the final prediction for the current CU by weighted averaging the first intra TMP prediction and the second intra TMP prediction.
In some examples, the current CU is divided into a first intra TMP prediction portion and a second intra prediction portion. The processor 1620 may also obtain a first intra TMP prediction for the first intra TMP prediction portion, obtain a second intra prediction for the second intra prediction portion, and obtain a final prediction for the current CU based on the first intra TMP prediction and the second intra prediction.
In some examples, at the encoder side, the processor 1620 may obtain the final prediction for the current CU by weighted averaging the first intra TMP prediction and the second intra prediction.
In some examples, the current CU is divided into a first intra TMP prediction portion and a second inter prediction portion. The processor 1620 may also obtain a first intra TMP prediction for the first intra TMP prediction portion, obtain a second inter-frame merge prediction for the second inter-frame prediction portion, and obtain a final prediction for the current CU by weighted averaging the first intra TMP prediction and the second inter-frame merge prediction. In some examples, processor 1620 may also obtain a reordered GPM partition pattern using a TM-based approach, as discussed in the template matching-based reordering portion for the GPM partition pattern.
In some examples, the current CU is divided into a first portion and a second portion based on a predefined direction. Further, the processor 1620 may obtain a first intra prediction for the first portion, obtain a second intra TMP prediction for the second portion, and obtain a final prediction for the current CU by weighted averaging the first intra prediction and the second intra TMP prediction.
In some examples, the predefined direction is 45 degrees, the first intra prediction is located in the upper left portion of the current CU, and the second intra TMP prediction is located in the lower right portion of the current CU.
Fig. 26 is a flowchart illustrating a method for video decoding according to an example of the present disclosure.
In step 2601, at the decoder side, the processor 1620 may obtain a current CU encoded based on a combined mode that combines at least one of a TIMD mode or an OBMC mode with an IBC mode.
In step 2602, at the decoder side, the processor 1620 may obtain a final prediction for the current CU based on the combined mode.
In some examples, processor 1620 may obtain a Most Probable Mode (MPM) list including the IBC mode and one or more other intra-prediction modes, calculate a plurality of Template Matching (TM) costs by comparing the IBC mode with the one or more other intra-prediction modes, select a first TM cost and a second TM cost from the plurality of TM costs, wherein the first TM cost is the minimum TM cost of the plurality of TM costs and is associated with a first mode in the MPM list, and the second TM cost is the second minimum TM cost of the plurality of TM costs and is associated with a second mode in the MPM list, and obtain a final prediction for the current CU by fusing the first mode and the second mode using the TIMD method.
In some examples, the processor 1620 may obtain an IBC prediction for the current CU, obtain a conventional TIMD prediction for the current CU, and fuse the IBC prediction and the conventional TIMD prediction using the TIMD method to obtain a final prediction for the current CU, where the IBC prediction is associated with the IBC mode and the conventional TIMD prediction is associated with the TIMD mode.
In some examples, the processor 1620 may obtain the current CU encoded based on IBC mode and perform weighted prediction using block vector information of neighboring blocks to refine the top boundary pixels and the left boundary pixels of the current CU encoded based on IBC mode. For example, assuming that the current CU and its above CU are both encoded using IBC mode but with different block vectors, to refine the top boundary pixels of the current CU, another prediction of the top boundary pixels of the current CU is obtained using the block vector of the above CU, and the original prediction and this additional prediction are then weighted averaged to obtain the final prediction result.
In some examples, the processor 1620 may obtain the current CU encoded based on the IBC mode and refine the top boundary pixels and the left side boundary pixels of the current CU encoded based on the IBC mode using a template matching based method.
Fig. 27 is a flowchart illustrating a method for video encoding corresponding to the method for video decoding as illustrated in fig. 26.
In step 2701, at the encoder side, the processor 1620 may encode the current CU based on a combined mode that combines at least one of a TIMD mode or an OBMC mode with an IBC mode.
In step 2702, at the encoder side, the processor 1620 may send the current CU encoded based on the combined mode to a decoder.
In some examples, processor 1620 may obtain a Most Probable Mode (MPM) list including the IBC mode and one or more other intra-prediction modes, calculate a plurality of Template Matching (TM) costs by comparing the IBC mode with the one or more other intra-prediction modes, select a first TM cost and a second TM cost from the plurality of TM costs, wherein the first TM cost is the minimum TM cost of the plurality of TM costs and is associated with a first mode in the MPM list, and the second TM cost is the second minimum TM cost of the plurality of TM costs and is associated with a second mode in the MPM list, and obtain a final prediction for the current CU by fusing the first mode and the second mode using the TIMD method.
In some examples, the processor 1620 may obtain an IBC prediction for the current CU, obtain a conventional TIMD prediction for the current CU, and fuse the IBC prediction and the conventional TIMD prediction using the TIMD method to obtain a final prediction for the current CU, where the IBC prediction is associated with the IBC mode and the conventional TIMD prediction is associated with the TIMD mode.
In some examples, the processor 1620 may obtain the current CU encoded based on IBC mode and perform weighted prediction using block vector information of neighboring blocks to refine the top boundary pixels and the left boundary pixels of the current CU encoded based on IBC mode. For example, assuming that the current CU and its above CU are both encoded using IBC mode but with different block vectors, to refine the top boundary pixels of the current CU, another prediction of the top boundary pixels of the current CU is obtained using the block vector of the above CU, and the original prediction and this additional prediction are then weighted averaged to obtain the final prediction result.
In some examples, the processor 1620 may obtain the current CU encoded based on the IBC mode and refine the top boundary pixels and the left side boundary pixels of the current CU encoded based on the IBC mode using a template matching based method.
In some examples, an apparatus for video encoding and decoding is provided. The apparatus includes a processor 1620 and a memory 1630 configured to store instructions executable by the processor, wherein the processor, when executing the instructions, is configured to perform any of the methods as shown in Figs. 22-27.
In an embodiment, there is also provided a non-transitory computer readable storage medium including a plurality of programs, for example in memory 1630, executable by processor 1620 in computing environment 1610 for performing the above-described methods and/or storing a bitstream generated by the above-described encoding method or a bitstream to be decoded by the above-described decoding method. In one example, a plurality of programs may be executed by the processor 1620 in the computing environment 1610 to receive (e.g., from the video encoder 20 in fig. 2) a bitstream or data stream comprising encoded video information (e.g., representing video blocks of encoded video frames and/or associated one or more syntax elements, etc.), and may also be executed by the processor 1620 in the computing environment 1610 to perform the above-described decoding method based on the received bitstream or data stream. In another example, a plurality of programs can be executed by the processor 1620 in the computing environment 1610 to perform the encoding methods described above, encode video information (e.g., video blocks representing video frames and/or associated one or more syntax elements, etc.) into a bitstream or data stream, and can also be executed by the processor 1620 in the computing environment 1610 to transmit the bitstream or data stream (e.g., to the video decoder 30 in fig. 3). Optionally, the non-transitory computer readable storage medium may store therein a bitstream or data stream comprising encoded video information (e.g., representing video blocks of an encoded video frame and/or associated one or more syntax elements, etc.) generated by an encoder (e.g., video encoder 20 of fig. 2) using, for example, the encoding methods described above for use by a decoder (e.g., video decoder 30 of fig. 3) in decoding video data. The non-transitory computer readable storage medium may be, for example, ROM, random-access memory (Random Access Memory, RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In an embodiment, a bitstream generated by the above-described encoding method or a bitstream to be decoded by the above-described decoding method is provided. In an embodiment, a bitstream is provided that includes encoded video information generated by the above-described encoding method or encoded video information to be decoded by the above-described decoding method.
In an embodiment, a computing device is also provided that includes one or more processors (e.g., processor 1620), and a non-transitory computer-readable storage medium or memory 1630 having stored therein a plurality of programs executable by the one or more processors, wherein the one or more processors are configured to perform the above-described methods when the plurality of programs are executed.
In an embodiment, there is also provided a computer program product having instructions for storing or transmitting a bitstream comprising encoded video information generated by the above-described encoding method or encoded video information to be decoded by the above-described decoding method. In an embodiment, a computer program product is also provided that includes a plurality of programs, e.g., in memory 1630, executable by processor 1620 in computing environment 1610 for performing the methods described above. For example, the computer program product may include a non-transitory computer readable storage medium.
In an embodiment, the computing environment 1610 may be implemented with one or more ASICs, DSPs, digital signal processing devices (DSPDs), programmable logic devices (PLDs), FPGAs, GPUs, controllers, microcontrollers, microprocessors, or other electronic components for executing the methods described above.
In an embodiment, there is also provided a method of storing a bitstream, comprising storing the bitstream on a digital storage medium, wherein the bitstream comprises encoded video information generated by the above-described encoding method or encoded video information to be decoded by the above-described decoding method.
In an embodiment, a method for transmitting a bitstream generated by the above encoder is also provided. In an embodiment, a method for receiving a bitstream to be decoded by the decoder described above is also provided.
The description of the present disclosure has been presented for purposes of illustration and is not intended to be exhaustive or limited to the disclosure. Many modifications, variations and alternative embodiments will become apparent to those of ordinary skill in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
The order of method steps according to the present disclosure is intended to be illustrative only, unless specifically stated otherwise, and is not limited to the order specifically described above, but may be changed according to actual conditions. Furthermore, at least one of the method steps according to the present disclosure may be adapted, combined or deleted according to the actual requirements.
The examples were chosen and described in order to explain the principles of the present disclosure and to enable others skilled in the art to understand the disclosure for various embodiments and with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the present disclosure is not limited to the specific examples of the disclosed embodiments, and that modifications and other embodiments are intended to be included within the scope of the present disclosure.