HK40092888B - Method for decoding inter-prediction video block of video stream and electronic equipment

Info

Publication number: HK40092888B
Application number: HK62023081192.4A
Authority: HK
Inventors: 赵亮; 赵欣; 刘杉
Original assignee: 腾讯美国有限责任公司
Priority date: 2021-10-21
Filing date: 2022-04-13
Publication date: 2025-03-28

Description

Methods and electronic devices for decoding inter-frame predicted video blocks of a video stream

Incorporation by Reference

This international PCT application is based on and claims the benefit of priority to U.S. non-provisional patent application No. 17/708,801, filed March 30, 2022, which in turn is based on and claims priority to U.S. provisional patent application No. 63/270,397, filed October 21, 2021, and U.S. provisional patent application No. 63/289,122, filed December 13, 2021, both entitled “Adaptive Resolution of Motion Vector Difference”. The entire contents of the earlier applications are incorporated herein by reference.

Technical Field

This disclosure relates generally to video encoding/decoding, and more particularly to methods and electronic devices for decoding inter-predicted video blocks of a video stream.

Background

The background description provided herein is for the purpose of generally presenting the context of this disclosure. Work of the inventors, to the extent that it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, is neither expressly nor impliedly admitted as prior art against this disclosure.

Video encoding and decoding can be performed using inter-picture prediction with motion compensation. Uncompressed digital video can comprise a series of pictures, each having a spatial dimension of, for example, 1920×1080 luminance samples and associated fully sampled or subsampled chrominance samples. The series of pictures can have a fixed or variable picture rate (alternatively referred to as the frame rate) of, for example, 60 pictures per second or 60 frames per second. Uncompressed video has specific bit rate requirements for streaming or data processing. For example, video with a pixel resolution of 1920×1080, a frame rate of 60 frames per second, and 4:2:0 chrominance subsampling at 8 bits per pixel per color channel requires close to 1.5 Gbit/s of bandwidth. One hour of such video requires more than 600 GB of storage space.
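The figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the variable names are illustrative and not part of the disclosure, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
# Worked arithmetic for the 1080p, 60 fps, 8-bit, 4:2:0 example above.
WIDTH, HEIGHT = 1920, 1080
FPS = 60
BITS_PER_SAMPLE = 8

luma_samples = WIDTH * HEIGHT
# 4:2:0 subsampling: each of the two chroma planes holds a quarter of the luma samples.
chroma_samples = 2 * (WIDTH // 2) * (HEIGHT // 2)
samples_per_picture = luma_samples + chroma_samples

bits_per_second = samples_per_picture * BITS_PER_SAMPLE * FPS
bytes_per_hour = bits_per_second * 3600 // 8

print(f"{bits_per_second / 1e9:.2f} Gbit/s")      # 1.49 Gbit/s
print(f"{bytes_per_hour / 1e9:.0f} GB per hour")  # 672 GB per hour
```

The result, roughly 1.49 Gbit/s and about 672 GB per hour, agrees with the "close to 1.5 Gbit/s" and "more than 600 GB" figures in the text.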

One objective of video encoding and decoding is to reduce redundancy in the uncompressed input video signal through compression. Compression can help reduce the aforementioned bandwidth and/or storage requirements, in some cases by two orders of magnitude or more. Lossless compression, lossy compression, or a combination of the two can be employed. Lossless compression refers to techniques in which an exact copy of the original signal can be reconstructed from the compressed signal through a decoding process. Lossy compression refers to an encoding/decoding process in which the original video information is not fully preserved during encoding and not fully recovered during decoding. When lossy compression is used, the reconstructed signal may differ from the original signal, but the distortion between the two is small enough that the reconstructed signal remains useful for the intended application despite some loss of information. In the case of video, lossy compression is widely used in many applications. The tolerable amount of distortion depends on the application. For example, users of certain consumer video streaming applications may tolerate higher distortion than users of cinema or television broadcasting applications. The compression ratio achievable by a particular coding algorithm can be selected or adjusted to reflect various distortion tolerances: a higher tolerable distortion generally allows coding algorithms that yield higher loss and a higher compression ratio.

Video encoders and decoders can utilize techniques from several broad categories and steps, including, for example, motion compensation, Fourier transform, quantization, and entropy coding.

Video codec technology can include a technique known as intra coding. In intra coding, sample values are represented without reference to samples or other data from previously reconstructed reference pictures. In some video codecs, a picture is spatially subdivided into blocks of samples. When all blocks of samples are coded in intra mode, the picture can be referred to as an intra picture. Intra pictures and their derivatives (e.g., independent decoder refresh pictures) can be used to reset the decoder state and can therefore serve as the first picture in a coded video bitstream and video session, or as a still image. The samples of a block after intra prediction can then be transformed into the frequency domain, and the resulting transform coefficients can be quantized before entropy coding. Intra prediction is a technique that minimizes sample values in the pre-transform domain. In some cases, the smaller the DC value after the transform and the smaller the AC coefficients, the fewer bits are required at a given quantization step size to represent the block after entropy coding.

Traditional intra coding, such as that known from MPEG-2 generation coding technologies, does not use intra prediction. However, some newer video compression technologies include techniques that attempt to encode/decode a block based on, for example, surrounding sample data and/or metadata that are obtained during the encoding and/or decoding of spatially neighboring blocks and that precede, in decoding order, the block of data being intra coded or decoded. Such techniques are hereinafter referred to as “intra prediction” techniques. Note that, at least in some cases, intra prediction uses reference data only from the current picture under reconstruction and not from other reference pictures.

Intra prediction can take many different forms. When more than one such technique is available in a given video coding technology, the technique in use can be referred to as an intra prediction mode. One or more intra prediction modes can be provided in a particular codec. In some cases, a mode can have sub-modes and/or can be associated with various parameters, and the mode/sub-mode information and intra coding parameters for a video block can be coded individually or included together in a mode codeword. Which codeword is used for a given combination of mode, sub-mode, and/or parameters can affect the coding efficiency gain obtained through intra prediction, as can the entropy coding technique used to translate the codewords into a bitstream.

A certain mode of intra prediction was introduced with H.264, refined in H.265, and further refined in newer coding technologies such as the Joint Exploration Model (JEM), Versatile Video Coding (VVC), and the Benchmark Set (BMS). In general, for intra prediction, a predictor block can be formed using neighboring sample values that have already become available. For example, the available values of a particular set of neighboring samples can be copied into the predictor block along certain directions and/or lines. A reference to the direction in use can be coded in the bitstream or can itself be predicted.

Referring to Figure 1A, the lower right depicts a subset of nine prediction directions out of the 33 possible intra prediction directions of H.265 (corresponding to the 33 angular modes of the 35 intra modes specified in H.265). The point (101) where the arrows converge represents the sample being predicted. The arrows represent the directions from which neighboring samples are used to predict the sample at (101). For example, arrow (102) indicates that sample (101) is predicted from one or more neighboring samples to the upper right, at a 45-degree angle from the horizontal. Similarly, arrow (103) indicates that sample (101) is predicted from one or more neighboring samples to the lower left of sample (101), at a 22.5-degree angle from the horizontal.

Still referring to Figure 1A, the upper left depicts a square block (104) of 4×4 samples (indicated by a bold dashed line). The square block (104) contains 16 samples, each labeled with an “S” followed by its position in the Y dimension (e.g., row index) and its position in the X dimension (e.g., column index). For example, sample S21 is the second sample in the Y dimension (from the top) and the first sample in the X dimension (from the left). Similarly, sample S44 is the fourth sample in block (104) in both the Y and X dimensions. Because the block is 4×4 samples in size, S44 is at the lower right. Example reference samples that follow a similar numbering scheme are also shown. A reference sample is labeled with an “R” followed by its Y position (e.g., row index) and X position (e.g., column index) relative to block (104). In both H.264 and H.265, prediction samples adjacent to the block under reconstruction are used.

Intra picture prediction of block (104) can begin by copying reference sample values from neighboring samples according to the signaled prediction direction. For example, assume that the coded video bitstream includes signaling that, for block (104), indicates the prediction direction of arrow (102), that is, samples are predicted from one or more prediction samples to the upper right, at a 45-degree angle from the horizontal. In that case, samples S41, S32, S23, and S14 are predicted from the same reference sample R05. Sample S44 is then predicted from reference sample R08.
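The copy pattern in this example can be written out as a short sketch. It assumes the labeling described above, with reference samples R01 through R08 lying in the row directly above the block, so that a 45-degree up-right direction maps sample S(row, col) to reference sample R0(row + col). The function name and the list layout are illustrative only.

```python
def predict_45_up_right(top_refs, size=4):
    """Map each sample label S<row><col> to the reference sample it copies.

    top_refs[x] holds reference sample R0x for x = 1 .. 2*size; index 0 is unused.
    """
    pred = {}
    for row in range(1, size + 1):
        for col in range(1, size + 1):
            # Moving up and to the right by `row` steps reaches column row + col
            # of the reference row above the block.
            pred[f"S{row}{col}"] = top_refs[row + col]
    return pred


top_refs = [None] + [f"R0{x}" for x in range(1, 9)]
pred = predict_45_up_right(top_refs)
print(pred["S41"], pred["S32"], pred["S23"], pred["S14"])  # R05 R05 R05 R05
print(pred["S44"])                                         # R08
```

All samples on the same anti-diagonal share one reference sample, which is why S41, S32, S23, and S14 all copy R05.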

In some cases, the values of multiple reference samples can be combined, for example through interpolation, in order to calculate a reference sample, especially when the direction is not evenly divisible by 45 degrees.

The number of possible directions has increased as video coding technology has continued to develop. For example, in H.264 (2003), nine different directions were available for intra prediction. This increased to 33 in H.265 (2013), and at the time of this disclosure JEM/VVC/BMS can support up to 65 directions. Experimental studies have been conducted to help identify the most suitable intra prediction directions, and certain entropy coding techniques can be used to code those most suitable directions in a small number of bits, at the cost of more bits for the remaining directions. Furthermore, the directions themselves can sometimes be predicted from the neighboring directions used in the intra prediction of neighboring blocks that have already been decoded.

Figure 1B shows a schematic diagram (180) depicting 65 intra prediction directions according to JEM, to illustrate the increase in the number of prediction directions in the various coding technologies developed over time.

The manner in which the bits representing intra prediction directions in the coded video bitstream are mapped to prediction directions can vary from one video coding technology to another; it can range, for example, from simple direct mappings of prediction direction to intra prediction mode, to codewords, to complex adaptive schemes involving most probable modes, and similar techniques. In all cases, however, there can be certain directions for intra prediction that are statistically less likely to occur in video content than certain other directions. Because the goal of video compression is to reduce redundancy, in a well-designed video coding technology those less likely directions are represented by a larger number of bits than more likely directions.
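The relationship between likelihood and bit count can be illustrated with the ideal code length from information theory, -log2(p) bits for a symbol of probability p. The sketch below uses made-up probabilities for three hypothetical direction labels; they are not statistics from any codec.

```python
import math


def ideal_code_length_bits(probability):
    """Ideal (Shannon) code length in bits for a symbol with the given probability."""
    return -math.log2(probability)


# Hypothetical probabilities, chosen only for illustration.
direction_probability = {"likely": 0.5, "less_likely": 0.125, "rare": 0.03125}
for name, p in direction_probability.items():
    print(name, ideal_code_length_bits(p))
# likely 1.0
# less_likely 3.0
# rare 5.0
```

A practical entropy coder approximates these lengths, so the rarer a direction is assumed to be, the more bits its codeword costs.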

Inter-picture prediction, or inter prediction, can be based on motion compensation. In motion compensation, sample data from a previously reconstructed picture or a part thereof (a reference picture), after being spatially shifted in the direction indicated by a motion vector (hereinafter MV), can be used to predict a newly reconstructed picture or picture part (e.g., a block). In some cases, the reference picture can be the same as the picture currently under reconstruction. An MV can have two dimensions, X and Y, or three dimensions, with the third dimension indicating the reference picture in use (akin to a time dimension).
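As a minimal sketch of the spatial shift described above, the following function fetches a prediction block from a reference picture at the position displaced by an MV. It assumes integer-pixel MVs and performs no boundary clipping or sub-pixel interpolation, both of which a real decoder would need; the function name and argument layout are illustrative.

```python
def motion_compensate(reference, top, left, height, width, mv):
    """Return the height x width block of `reference` displaced by mv = (dx, dy).

    `reference` is a list of rows; (top, left) is the position of the current block.
    """
    dx, dy = mv
    rows = reference[top + dy: top + dy + height]
    return [row[left + dx: left + dx + width] for row in rows]


# A 6x6 reference picture whose sample value encodes its position (10*row + col).
reference = [[10 * r + c for c in range(6)] for r in range(6)]
block = motion_compensate(reference, top=2, left=2, height=2, width=2, mv=(1, -1))
print(block)  # [[13, 14], [23, 24]]
```

Here the MV (1, -1) moves the fetch window one sample to the right and one sample up relative to the current block position.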

In some video compression techniques, a current MV applicable to a certain area of sample data can be predicted from other MVs, for example from MVs that relate to other areas of sample data spatially adjacent to the area under reconstruction and that precede the current MV in decoding order. Doing so can substantially reduce the overall amount of data required for coding MVs by removing the redundancy among correlated MVs, thereby increasing compression efficiency. MV prediction can work effectively because, for example, when coding an input video signal derived from a camera (known as natural video), there is a statistical likelihood that areas larger than the area to which a single MV applies move in a similar direction in the video sequence, and such larger areas can therefore, in some cases, be predicted using a similar motion vector derived from the MVs of neighboring areas. As a result, the actual MV for a given area is similar or identical to the MV predicted from the surrounding MVs. After entropy coding, that MV can in turn be represented in fewer bits than would be used if the MV were coded directly rather than predicted from neighboring MVs. In some cases, MV prediction can be an example of lossless compression of a signal (namely, the MVs) derived from the original signal (namely, the sample stream). In other cases, MV prediction itself can be lossy, for example because of rounding errors when a predictor is calculated from several surrounding MVs.
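The idea can be sketched as follows: the encoder transmits only the difference between the actual MV and a predictor derived from neighboring MVs, and the decoder adds that difference back. The component-wise median used here is an illustrative choice of predictor, not the mechanism of this disclosure or of any particular standard.

```python
def predict_mv(neighbor_mvs):
    """Component-wise median of the neighboring MVs (illustrative predictor only)."""
    xs = sorted(mv[0] for mv in neighbor_mvs)
    ys = sorted(mv[1] for mv in neighbor_mvs)
    mid = len(neighbor_mvs) // 2
    return (xs[mid], ys[mid])


def encode_mvd(mv, predictor):
    """Difference that would be written to the bitstream."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


def decode_mv(mvd, predictor):
    """Reconstruct the MV from the decoded difference and the same predictor."""
    return (predictor[0] + mvd[0], predictor[1] + mvd[1])


neighbors = [(4, -2), (5, -2), (4, -1)]
predictor = predict_mv(neighbors)   # (4, -2)
mvd = encode_mvd((5, -2), predictor)
print(mvd)                          # (1, 0)
print(decode_mv(mvd, predictor))    # (5, -2)
```

Because neighboring blocks tend to move together, the difference is usually small and cheaper to entropy code than the MV itself.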

Various MV prediction mechanisms are described in H.265/HEVC (ITU-T Recommendation H.265, “High Efficiency Video Coding”, December 2016). Of the many MV prediction mechanisms that H.265 specifies, described herein is a technique hereinafter referred to as “spatial merge”.

Specifically, referring to Figure 2, the current block (201) comprises samples that the encoder has found, during the motion search process, to be predictable from a previous block of the same size that has been spatially shifted. Instead of coding that MV directly, the MV can be derived from metadata associated with one or more reference pictures, for example from the most recent reference picture (in decoding order), using the MV associated with any one of five surrounding samples denoted A0, A1 and B0, B1, B2 (202 through 206, respectively). In H.265, MV prediction can use predictors from the same reference picture that the neighboring block is using.

Summary of the Invention

This disclosure relates generally to video coding, and more particularly to methods and systems for providing adaptive resolution of the motion vector difference in inter prediction of video blocks. In an example implementation, a method for decoding an inter-predicted video block of a video stream is disclosed. The method may include: receiving the video stream; determining that a motion vector difference (MVD) between a motion vector associated with the inter-predicted video block and a reference motion vector is signaled in the video stream, wherein the reference motion vector corresponds to one reference picture in only one of reference frame list 0 and reference frame list 1, unless the MVD is signaled jointly for two reference pictures; obtaining from the video stream an indication of the size range of the MVD among a plurality of predefined MVD size ranges; determining the pixel resolution of the MVD according to the size range; identifying additional MVD information in the video stream based on the pixel resolution; extracting the additional MVD information from the video stream; and decoding the inter-predicted video block based on the pixel resolution, the additional MVD information, the reference motion vector, and the reference frame associated with the motion vector.

In the example implementation above, the pixel resolution is 2^n pixels, where n is an integer in the range from -6 to 11, inclusive.
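To make the range concrete, the short sketch below enumerates the 18 candidate step sizes 2^n for n from -6 to 11 as exact fractions. It only illustrates the stated range; it does not imply that any given implementation uses all of these values.

```python
from fractions import Fraction

# Candidate MVD step sizes, in pixels, for n = -6 .. 11 inclusive.
resolutions = {n: Fraction(2) ** n for n in range(-6, 12)}

print(resolutions[-6])   # 1/64   (finest step: one sixty-fourth of a pixel)
print(resolutions[-3])   # 1/8
print(resolutions[0])    # 1      (integer-pixel step)
print(resolutions[11])   # 2048   (coarsest step)
```

Negative n gives fractional-pixel steps, n = 0 gives a one-pixel step, and positive n gives steps of several whole pixels.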

In any of the above example implementations, the plurality of predefined MVD size ranges are associated, in a predefined manner, with pixel resolutions in non-ascending order, wherein a higher pixel resolution corresponds to a smaller pixel resolution value.

In any of the above example implementations, obtaining the indication of the size range of the MVD includes: extracting a first predefined syntax element from the video stream, the first predefined syntax element indicating the MVD category of the MVD within a predefined set of MVD categories, with lower MVD categories corresponding to smaller MVD size ranges; and determining the size range of the MVD according to the MVD category.

In any of the above example implementations, determining the pixel resolution of the MVD according to the size range includes: determining whether the size range is higher than a preset MVD range threshold level; upon determining that the size range is higher than the preset MVD range threshold level, determining that the pixel resolution is an integer number of pixels; and upon determining that the size range is not higher than the preset MVD range threshold level, determining that the pixel resolution is a fractional number of pixels.

In any of the above example implementations, identifying the additional MVD information in the video stream based on the pixel resolution includes: parsing the video stream according to a second predefined syntax element to obtain the integer-pixel portion of the MVD; and, upon determining that the pixel resolution is a fractional number of pixels, further parsing the video stream according to at least a third predefined syntax element to obtain the fractional-pixel portion of the MVD.
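The flow described in the preceding implementations (category, then size range, then resolution, then conditional parsing) can be sketched as below. Everything concrete in it is an assumption made for illustration: the category boundaries in `CLASS_START`, the threshold category, the 1/8-pixel fractional precision, and the dictionary keys standing in for entropy-decoded syntax elements.

```python
from fractions import Fraction

# Assumed layout: category c covers integer magnitudes starting at CLASS_START[c].
CLASS_START = [0, 2, 4, 8, 16, 32]  # hypothetical values, in whole pixels
CLASS_THRESHOLD = 1                 # hypothetical: categories above this are integer-pel
FRACTION_BITS = 3                   # hypothetical: fractional part in 1/8-pixel units


def decode_mvd_magnitude(symbols):
    """Decode one MVD magnitude from already entropy-decoded syntax elements.

    `symbols` is a dict with hypothetical keys: "mvd_class" (first element),
    "integer_offset" (second element) and, only when needed, "fraction" (third).
    """
    mvd_class = symbols["mvd_class"]
    integer_part = CLASS_START[mvd_class] + symbols["integer_offset"]
    if mvd_class > CLASS_THRESHOLD:
        # Integer-pixel resolution: no fractional syntax element is read.
        return Fraction(integer_part)
    # Fractional resolution: the third syntax element carries the fractional part.
    return integer_part + Fraction(symbols["fraction"], 1 << FRACTION_BITS)


print(decode_mvd_magnitude({"mvd_class": 0, "integer_offset": 1, "fraction": 3}))  # 11/8
print(decode_mvd_magnitude({"mvd_class": 3, "integer_offset": 5}))                 # 13
```

The second call shows the bit saving: for a large MVD the fractional syntax element is simply absent from the stream.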

In any of the above example implementations, the MVD range threshold level includes the lowest MVD category or the second-lowest MVD category in the predefined set of MVD categories.

In any of the above example implementations, each MVD category in the predefined set of MVD categories whose size range is higher than the preset MVD range threshold level is associated with a single allowed integer MVD pixel value.

In any of the above example implementations, the single allowed integer pixel value includes the pixel value corresponding to the higher end of the corresponding size range.

In any of the above example implementations, the single allowed integer pixel value includes the pixel value corresponding to the midpoint of the corresponding size range.
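When a category above the threshold admits only one value, the category alone identifies the MVD magnitude and no offset needs to be parsed. A minimal sketch of the two selection rules mentioned above follows; the range bounds and the `rule` names are hypothetical.

```python
def single_allowed_value(range_start, range_end, rule="upper"):
    """Pick the one integer magnitude allowed for a size range [range_start, range_end]."""
    if rule == "upper":
        return range_end
    if rule == "midpoint":
        return (range_start + range_end) // 2
    raise ValueError(f"unknown rule: {rule}")


print(single_allowed_value(8, 16, rule="upper"))     # 16
print(single_allowed_value(8, 16, rule="midpoint"))  # 12
```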

In any of the above example implementations, determining the pixel resolution of the MVD according to the size range includes: determining whether the size range is lower than, contains, or is higher than a preset MVD threshold size value; upon determining that the size range is higher than the preset MVD threshold size value, determining that the pixel resolution is an integer number of pixels; and upon determining that the size range is not higher than the preset MVD threshold size value, determining that the pixel resolution is a fractional number of pixels.

In any of the above example implementations, the method further includes, upon determining that the size range contains the preset MVD threshold size value: extracting a second predefined syntax element from the video stream, the second predefined syntax element indicating an MVD size offset of the MVD relative to the starting size of the size range of the MVD; obtaining the integer size of the MVD based on the size range of the MVD and the MVD size offset; determining that the pixel resolution is fractional when the integer size of the MVD is not higher than the preset MVD threshold size value; and determining that the pixel resolution is non-fractional when the integer size of the MVD is higher than the preset MVD threshold size value.

In any of the above example implementations, upon determining that the pixel resolution is fractional, identifying the additional MVD information in the video stream based on the pixel resolution includes: parsing the video stream according to a third predefined syntax element to obtain the fractional portion of the MVD.

In any of the above example implementations, the MVD threshold size value is less than 4 pixels.
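The three-way comparison against a threshold size value can be sketched as follows. The sketch assumes inclusive integer range bounds and a threshold of 3 pixels (one value consistent with "less than 4 pixels"); `read_offset` is a hypothetical callback standing in for parsing the second syntax element, and it is invoked only when the range contains the threshold.

```python
def resolve_resolution(range_start, range_end, threshold, read_offset):
    """Return (resolution, integer_size); integer_size is None unless the offset was read."""
    if range_start > threshold:
        # The whole range lies above the threshold.
        return "integer", None
    if range_end < threshold:
        # The whole range lies below the threshold.
        return "fractional", None
    # The range contains the threshold: the offset decides.
    integer_size = range_start + read_offset()
    if integer_size > threshold:
        return "integer", integer_size
    return "fractional", integer_size


THRESHOLD = 3  # hypothetical value, in pixels
print(resolve_resolution(8, 15, THRESHOLD, lambda: 0))  # ('integer', None)
print(resolve_resolution(0, 1, THRESHOLD, lambda: 0))   # ('fractional', None)
print(resolve_resolution(2, 5, THRESHOLD, lambda: 1))   # ('fractional', 3)
print(resolve_resolution(2, 5, THRESHOLD, lambda: 2))   # ('integer', 4)
```

Only in the last two cases does the decoder need the offset before it knows whether a fractional part follows in the stream.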

In any of the above example implementations, the MVD pixel resolutions associated with the plurality of predefined MVD size ranges differ from one size range to another.

In some other example implementations, a method for decoding an inter-predicted video block of a video stream is disclosed. The method may include: receiving the video stream; determining that a motion vector difference (MVD) between a motion vector associated with the inter-predicted video block and a reference motion vector is signaled in the video stream, wherein the reference motion vector corresponds to one reference picture in only one of reference frame list 0 and reference frame list 1, unless the MVD is signaled jointly for two reference pictures; extracting the integer portion of the size of the MVD from the video stream; determining the pixel resolution of the MVD according to the integer portion of the size of the MVD; identifying additional MVD information in the video stream based on the pixel resolution; and decoding the inter-predicted video block based on the pixel resolution, the integer portion of the size of the MVD, the additional MVD information, the reference motion vector, and the reference frame associated with the motion vector.

In the above example implementation, the pixel resolutions of a plurality of MVDs depend on the sizes of the plurality of MVDs in descending order.

In any of the above example implementations, determining the pixel resolution of the MVD according to the integer portion of the size of the MVD includes: determining whether the integer portion of the size of the MVD is higher than a preset MVD threshold size value; upon determining that the integer portion of the size of the MVD is higher than the preset MVD threshold size value, determining that the pixel resolution is an integer number of pixels; and upon determining that the integer portion of the size of the MVD is not higher than the preset MVD threshold size value, determining that the pixel resolution is a fractional number of pixels.
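In this second family of implementations the decoder reads the integer portion first and only then knows whether a fractional portion follows. The sketch below walks a flat list of hypothetical, already entropy-decoded symbols to show how the parse path depends on each integer portion; the threshold of 3 pixels and the 1/8-pixel fractional units are assumptions.

```python
from fractions import Fraction


def decode_mvds(symbol_stream, threshold=3, fraction_bits=3):
    """Decode MVD magnitudes from a flat sequence of symbols.

    Each MVD contributes its integer portion, followed by a fractional symbol
    only when the integer portion is not higher than `threshold`.
    """
    symbols = iter(symbol_stream)
    magnitudes = []
    for integer_part in symbols:
        if integer_part > threshold:
            # Integer-pixel resolution: nothing more to read for this MVD.
            magnitudes.append(Fraction(integer_part))
        else:
            # Fractional resolution: consume the next symbol as the fractional part.
            fraction = next(symbols)
            magnitudes.append(integer_part + Fraction(fraction, 1 << fraction_bits))
    return magnitudes


# Three MVDs: 2 + 5/8, then 9 (no fraction symbol present), then 0 + 1/8.
print(decode_mvds([2, 5, 9, 0, 1]))
# [Fraction(21, 8), Fraction(9, 1), Fraction(1, 8)]
```

A truncated stream that ends right after a small integer portion would raise `StopIteration` here; a real parser would report a bitstream error instead.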

Aspects of this disclosure also provide a video encoding or decoding device or apparatus including circuitry configured to carry out any of the above method implementations.

Aspects of this disclosure also provide a non-transitory computer-readable medium storing instructions that, when executed by a computer for video decoding and/or encoding, cause the computer to perform the above methods for video decoding and/or encoding.

Brief Description of the Drawings

Further features, the nature, and various advantages of the disclosed subject matter will be more apparent from the following detailed description and the accompanying drawings, in which:

Figure 1A shows a schematic illustration of an exemplary subset of intra prediction directional modes.

Figure 1B shows an illustration of exemplary intra prediction directions.

Figure 2 shows a schematic illustration of a current block and its surrounding spatial merge candidate blocks for motion vector prediction in one example.

Figure 3 shows a schematic illustration of a simplified block diagram of a communication system (300) according to an example embodiment.

Figure 4 shows a schematic illustration of a simplified block diagram of a communication system (400) according to an example embodiment.

Figure 5 shows a schematic illustration of a simplified block diagram of a video decoder according to an example embodiment.

Figure 6 shows a schematic illustration of a simplified block diagram of a video encoder according to an example embodiment.

Figure 7 shows a block diagram of a video encoder according to another example embodiment.

Figure 8 shows a block diagram of a video decoder according to another example embodiment.

Figure 9 shows a scheme of coding block partitioning according to an example embodiment of this disclosure.

Figure 10 shows another scheme of coding block partitioning according to an example embodiment of this disclosure.

Figure 11 shows another scheme of coding block partitioning according to an example embodiment of this disclosure.

Figure 12 shows an example of partitioning a base block into coding blocks according to an example partitioning scheme.

Figure 13 shows an example ternary partitioning scheme.

Figure 14 shows an example quadtree binary tree coding block partitioning scheme.

Figure 15 shows a scheme for partitioning a coding block into multiple transform blocks, and the coding order of the transform blocks, according to an example embodiment of this disclosure.

Figure 16 shows another scheme for partitioning a coding block into multiple transform blocks, and the coding order of the transform blocks, according to an example embodiment of this disclosure.

Figure 17 shows another scheme for partitioning a coding block into multiple transform blocks according to an example embodiment of this disclosure.

Figure 18 shows a flowchart of a method according to an example embodiment of this disclosure.

Figure 19 shows another flowchart of a method according to an example embodiment of this disclosure.

Figure 20 shows a schematic illustration of a computer system according to an example embodiment of this disclosure.

Detailed Description

Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. The phrases “in one embodiment” or “in some embodiments” as used herein do not necessarily refer to the same embodiment, and the phrases “in another embodiment” or “in other embodiments” as used herein do not necessarily refer to a different embodiment. Likewise, the phrases “in one implementation” or “in some implementations” as used herein do not necessarily refer to the same implementation, and the phrases “in another implementation” or “in other implementations” as used herein do not necessarily refer to a different implementation. It is intended, for example, that the claimed subject matter include combinations of the exemplary embodiments/implementations in whole or in part.

In general, terminology may be understood at least in part from usage in context. For example, terms such as “and,” “or,” or “and/or” as used herein may include a variety of meanings that may depend at least in part on the context in which such terms are used. Typically, “or,” if used to associate a list such as A, B, or C, is intended to mean A, B, and C (used here in the inclusive sense) as well as A, B, or C (used here in the exclusive sense). In addition, the terms “one or more” or “at least one” as used herein, depending at least in part on context, may be used to describe any feature, structure, or characteristic in a singular sense, or may be used to describe combinations of features, structures, or characteristics in a plural sense. Similarly, terms such as “a,” “an,” or “the” may be understood to convey a singular usage or a plural usage, depending at least in part on context. In addition, the terms “based on” or “determined by” may be understood as not necessarily intended to convey an exclusive set of factors, and may instead allow for the existence of additional factors not necessarily expressly described, again depending at least in part on context.

Figure 3 shows a simplified block diagram of a communication system (300) according to an embodiment of the present disclosure. The communication system (300) includes a plurality of terminal devices that can communicate with each other via, for example, a network (350). For example, the communication system (300) includes a first pair of terminal devices (310) and (320) interconnected via the network (350). In the example of Figure 3, the first pair of terminal devices (310) and (320) may perform unidirectional transmission of data. For example, the terminal device (310) may encode video data (e.g., video data of a stream of video pictures captured by the terminal device (310)) for transmission via the network (350) to the other terminal device (320). The encoded video data is transmitted in the form of one or more encoded video bitstreams. The terminal device (320) may receive the encoded video data from the network (350), decode the encoded video data to recover the video pictures, and display the video pictures according to the recovered video data. Unidirectional data transmission may be implemented in applications such as media serving.

In another example, the communication system (300) includes a second pair of terminal devices (330) and (340) that perform bidirectional transmission of encoded video data, which may be implemented, for example, during a video conferencing application. For bidirectional data transmission, in one example, each of the terminal devices (330) and (340) may encode video data (e.g., video data of a stream of video pictures captured by that terminal device) for transmission via the network (350) to the other of the terminal devices (330) and (340). Each of the terminal devices (330) and (340) may also receive the encoded video data transmitted by the other of the terminal devices (330) and (340), may decode the encoded video data to recover the video pictures, and may display the video pictures on an accessible display device according to the recovered video data.

In the example of Figure 3, the terminal devices (310), (320), (330), and (340) may be implemented as servers, personal computers, and smartphones, but the applicability of the underlying principles of the present disclosure is not so limited. Embodiments of the present disclosure may be implemented on desktop computers, laptop computers, tablet computers, media players, wearable computers, dedicated video conferencing equipment, and/or the like. The network (350) represents any number or type of networks that convey encoded video data among the terminal devices (310), (320), (330), and (340), including, for example, wireline (wired) and/or wireless communication networks. The communication network (350) may exchange data in circuit-switched channels, packet-switched channels, and/or other types of channels. Representative networks include telecommunications networks, local area networks, wide area networks, and/or the Internet. For the purposes of the present discussion, the architecture and topology of the network (350) may be immaterial to the operation of the present disclosure unless explicitly explained herein.

As an example of an application of the disclosed subject matter, Figure 4 shows the placement of a video encoder and a video decoder in a video streaming environment. The disclosed subject matter is equally applicable to other video applications, including, for example, video conferencing, digital TV broadcasting, gaming, virtual reality, and the storage of compressed video on digital media including CDs, DVDs, memory sticks, and the like.

A video streaming system may include a video capture subsystem (413), which may include a video source (401), for example a digital camera, that creates a stream of uncompressed video pictures or images (402). In one example, the stream of video pictures (402) includes samples recorded by the digital camera of the video source (401). The stream of video pictures (402), depicted as a bold line to emphasize its high data volume when compared to encoded video data (404) (or an encoded video bitstream), may be processed by an electronic device (420) that includes a video encoder (403) coupled to the video source (401). The video encoder (403) may include hardware, software, or a combination thereof to enable or implement aspects of the disclosed subject matter as described in more detail below. The encoded video data (404) (or encoded video bitstream (404)), depicted as a thin line to emphasize its lower data volume when compared to the stream of uncompressed video pictures (402), may be stored on a streaming server (405) for future use, or stored directly on a downstream video device (not shown). One or more streaming client subsystems, such as the client subsystems (406) and (408) in Figure 4, may access the streaming server (405) to retrieve copies (407) and (409) of the encoded video data (404). The client subsystem (406) may include a video decoder (410), for example in an electronic device (430). The video decoder (410) decodes the incoming copy (407) of the encoded video data and creates an uncompressed output stream of video pictures (411) that can be rendered on a display (412) (e.g., a display screen) or another rendering device (not depicted). The video decoder (410) may be configured to perform some or all of the various functions described in the present disclosure. In some streaming systems, the encoded video data (404), (407), and (409) (e.g., video bitstreams) may be encoded according to certain video coding/compression standards. Examples of those standards include ITU-T Recommendation H.265. In one example, a video coding standard under development is informally known as Versatile Video Coding (VVC). The disclosed subject matter may be used in the context of VVC and of other video coding standards.

It should be noted that the electronic devices (420) and (430) may include other components (not shown). For example, the electronic device (420) may include a video decoder (not shown), and the electronic device (430) may also include a video encoder (not shown).

Figure 5 shows a block diagram of a video decoder (510) according to any embodiment of the present disclosure. The video decoder (510) may be included in an electronic device (530). The electronic device (530) may include a receiver (531) (e.g., receiving circuitry). The video decoder (510) may be used in place of the video decoder (410) in the example of Figure 4.

The receiver (531) may receive one or more encoded video sequences to be decoded by the video decoder (510). In the same or another embodiment, one encoded video sequence may be decoded at a time, where the decoding of each encoded video sequence is independent of the other encoded video sequences. Each video sequence may be associated with multiple video frames or pictures. The encoded video sequence may be received from a channel (501), which may be a hardware/software link to a storage device that stores the encoded video data or to a streaming source that transmits the encoded video data. The receiver (531) may receive the encoded video data together with other data, such as encoded audio data and/or ancillary data streams, which may be forwarded to their respective processing circuitry (not depicted). The receiver (531) may separate the encoded video sequence from the other data. To combat network jitter, a buffer memory (515) may be disposed between the receiver (531) and an entropy decoder/parser (520) (hereinafter "parser (520)"). In certain applications, the buffer memory (515) may be implemented as part of the video decoder (510). In other applications, it may be outside of and separate from the video decoder (510) (not depicted). In still other applications, there may be a buffer memory (not depicted) outside of the video decoder (510), for example to combat network jitter, and an additional buffer memory (515) inside the video decoder (510), for example to handle playout timing. When the receiver (531) receives data from a store/forward device of sufficient bandwidth and controllability, or from an isosynchronous network, the buffer memory (515) may not be needed or may be small. For use on packet networks such as the Internet, a buffer memory (515) of sufficient size may be required, and its size may be comparatively large. Such a buffer memory may be implemented with an adaptive size and may be at least partially implemented in an operating system or similar elements (not depicted) outside of the video decoder (510).

The video decoder (510) may include the parser (520) to reconstruct symbols (521) from the encoded video sequence. Categories of those symbols include information used to manage the operation of the video decoder (510), and potentially information to control a rendering device such as a display (512) (e.g., a display screen), which may or may not be an integral part of the electronic device (530) but can be coupled to the electronic device (530), as shown in Figure 5. The control information for the rendering device may be in the form of Supplemental Enhancement Information (SEI) messages or Video Usability Information (VUI) parameter set fragments (not depicted). The parser (520) may parse/entropy-decode the encoded video sequence that it receives. The entropy coding of the encoded video sequence may be in accordance with a video coding technology or standard and may follow various principles, including variable-length coding, Huffman coding, arithmetic coding with or without context sensitivity, and so forth. The parser (520) may extract from the encoded video sequence a set of subgroup parameters for at least one of the subgroups of pixels in the video decoder, based upon at least one parameter corresponding to the subgroup. The subgroups may include Groups of Pictures (GOPs), pictures, tiles, slices, macroblocks, Coding Units (CUs), blocks, Transform Units (TUs), Prediction Units (PUs), and so forth. The parser (520) may also extract from the encoded video sequence information such as transform coefficients (e.g., Fourier transform coefficients), quantizer parameter values, motion vectors, and so forth.

The parser (520) may perform an entropy decoding/parsing operation on the video sequence received from the buffer memory (515), so as to create the symbols (521).

Reconstruction of the symbols (521) may involve multiple different processing or functional units, depending on the type of the encoded video picture or parts thereof (such as inter and intra pictures, inter and intra blocks) and on other factors. Which units are involved, and how, may be controlled by the subgroup control information that the parser (520) parsed from the encoded video sequence. For simplicity, the flow of such subgroup control information between the parser (520) and the multiple processing or functional units below is not depicted.

Beyond the functional blocks already mentioned, the video decoder (510) can be conceptually subdivided into a number of functional units as described below. In a practical implementation operating under commercial constraints, many of these functional units interact closely with each other and can, at least partly, be integrated with one another. However, for the purpose of describing the various functions of the disclosed subject matter clearly, the conceptual subdivision into functional units is adopted in the disclosure below.

A first unit may include a scaler/inverse transform unit (551). The scaler/inverse transform unit (551) may receive quantized transform coefficients as symbols (521) from the parser (520), together with control information indicating which type of inverse transform to use, the block size, the quantization factor/parameters, quantization scaling matrices, and so forth. The scaler/inverse transform unit (551) may output blocks comprising sample values that can be input into an aggregator (555).
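
By way of illustration only, the scaling and inverse-transform step may be sketched as follows. The single flat quantization step `qstep`, the choice of an orthonormal inverse DCT, and all function names are assumptions made for this sketch; they are not taken from the present disclosure or from any particular standard.

```python
import math

def dequantize(levels, qstep):
    """Scale quantized levels back to transform coefficients (flat step size, an assumption)."""
    return [[lvl * qstep for lvl in row] for row in levels]

def idct_1d(coeffs):
    """Orthonormal inverse DCT-II of a 1-D list of coefficients."""
    n = len(coeffs)
    out = []
    for x in range(n):
        s = coeffs[0] / math.sqrt(n)
        for k in range(1, n):
            s += math.sqrt(2.0 / n) * coeffs[k] * math.cos(
                math.pi * (2 * x + 1) * k / (2 * n))
        out.append(s)
    return out

def inverse_transform_2d(coeffs):
    """Separable 2-D inverse transform: rows first, then columns."""
    rows = [idct_1d(r) for r in coeffs]
    cols = [idct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def scale_and_inverse_transform(levels, qstep):
    """Hypothetical stand-in for the scaler/inverse transform unit: levels in, residual block out."""
    return inverse_transform_2d(dequantize(levels, qstep))
```

In this sketch a block whose only non-zero level is the DC coefficient yields a flat residual block, which is the behavior expected of a DCT-style inverse transform.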

In some cases, the output samples of the scaler/inverse transform unit (551) may pertain to an intra-coded block, that is, a block that does not use predictive information from previously reconstructed pictures but can use predictive information from previously reconstructed parts of the current picture. Such predictive information may be provided by an intra picture prediction unit (552). In some cases, the intra picture prediction unit (552) may use information from surrounding blocks that have already been reconstructed and stored in the current picture buffer (558) to generate a block of the same size and shape as the block under reconstruction. The current picture buffer (558) buffers, for example, the partly reconstructed current picture and/or the fully reconstructed current picture. In some implementations, the aggregator (555) may add, on a per-sample basis, the prediction information generated by the intra picture prediction unit (552) to the output sample information provided by the scaler/inverse transform unit (551).

In other cases, the output samples of the scaler/inverse transform unit (551) may pertain to an inter-coded and potentially motion-compensated block. In such a case, a motion compensation prediction unit (553) may access the reference picture memory (557) to fetch samples used for inter picture prediction. After motion-compensating the fetched samples in accordance with the symbols (521) pertaining to the block, these samples may be added by the aggregator (555) to the output of the scaler/inverse transform unit (551) (the output of unit (551) may be referred to as residual samples or a residual signal) so as to generate output sample information. The addresses within the reference picture memory (557) from which the motion compensation prediction unit (553) fetches prediction samples may be controlled by motion vectors, which are available to the motion compensation prediction unit (553) in the form of symbols (521) that may have, for example, an X component, a Y component (shift), and a reference picture component (time). Motion compensation may also include interpolation of sample values fetched from the reference picture memory (557) when sub-sample-accurate motion vectors are in use, and may also be associated with motion vector prediction mechanisms, and so forth.
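
The fetch-and-add behavior described above may be sketched, under simplifying assumptions, as follows. The sketch handles only integer-sample motion vectors (no sub-sample interpolation), clamps out-of-picture coordinates to the picture edge, and clips the sum to an assumed 8-bit range; the function names `motion_compensated_block` and `aggregate` are hypothetical.

```python
def motion_compensated_block(ref, x, y, mv_x, mv_y, w, h):
    """Fetch a w-by-h prediction block from reference picture `ref`.

    (x, y) is the top-left position of the current block and (mv_x, mv_y)
    is an integer-sample motion vector. Coordinates falling outside the
    picture are clamped to the nearest edge sample (an assumption).
    """
    height, width = len(ref), len(ref[0])
    block = []
    for j in range(h):
        row = []
        for i in range(w):
            sy = min(max(y + j + mv_y, 0), height - 1)
            sx = min(max(x + i + mv_x, 0), width - 1)
            row.append(ref[sy][sx])
        block.append(row)
    return block

def aggregate(prediction, residual, bit_depth=8):
    """Add residual samples to prediction samples per sample, clipping to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]
```

The same `aggregate` step applies regardless of whether the prediction block was produced by intra or inter prediction; only the source of the prediction samples differs.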

The output samples of the aggregator (555) may be subject to various loop filtering techniques in the loop filter unit (556). Video compression technologies may include in-loop filter technologies that are controlled by parameters included in the encoded video sequence (also referred to as the encoded video bitstream) and made available to the loop filter unit (556) as symbols (521) from the parser (520). In other embodiments, however, the video compression technologies may also be responsive to meta-information obtained during the decoding of previous (in decoding order) parts of the encoded picture or encoded video sequence, as well as to previously reconstructed and loop-filtered sample values. Several types of loop filters may be included, in various orders, as part of the loop filter unit (556), as described in further detail below.

The output of the loop filter unit (556) may be a sample stream that can be output to the rendering device (512) as well as stored in the reference picture memory (557) for use in future inter picture prediction.

Certain encoded pictures, once fully reconstructed, may be used as reference pictures for future inter picture prediction. For example, once an encoded picture corresponding to the current picture is fully reconstructed and the encoded picture has been identified as a reference picture (by, for example, the parser (520)), the current picture buffer (558) may become part of the reference picture memory (557), and a fresh current picture buffer may be reallocated before the reconstruction of the following encoded picture begins.

The video decoder (510) may perform decoding operations according to a predetermined video compression technology adopted in a standard such as ITU-T Recommendation H.265. The encoded video sequence may conform to the syntax specified by the video compression technology or standard being used, in the sense that the encoded video sequence adheres to both the syntax of the video compression technology or standard and the profiles documented in that technology or standard. Specifically, a profile may select certain tools, from all the tools available in the video compression technology or standard, as the only tools available for use under that profile. To be standard-compliant, the complexity of the encoded video sequence may also need to be within the bounds defined by the level of the video compression technology or standard. In some cases, levels restrict the maximum picture size, the maximum frame rate, the maximum reconstruction sample rate (measured in, for example, megasamples per second), the maximum reference picture size, and so on. Limits set by levels may, in some cases, be further restricted through Hypothetical Reference Decoder (HRD) specifications and through metadata for HRD buffer management signaled in the encoded video sequence.
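
As a rough sketch of how level limits bound the complexity of an encoded video sequence, the check below compares a sequence's picture size, frame rate, and sample rate against a set of limits. The `LevelLimits` type, the `conforms` function, and in particular the numeric limits in `EXAMPLE_LEVEL` are hypothetical values chosen for illustration; they are not the limits of any level defined in H.265 or any other standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LevelLimits:
    max_picture_samples: int   # maximum luma samples per picture
    max_frame_rate: float      # maximum pictures per second
    max_sample_rate: int       # maximum luma samples per second

def conforms(width, height, frame_rate, limits):
    """Return True if the sequence parameters stay within all three limits."""
    picture_samples = width * height
    return (picture_samples <= limits.max_picture_samples
            and frame_rate <= limits.max_frame_rate
            and picture_samples * frame_rate <= limits.max_sample_rate)

# Hypothetical limits, for illustration only.
EXAMPLE_LEVEL = LevelLimits(
    max_picture_samples=2_100_000,
    max_frame_rate=60.0,
    max_sample_rate=125_000_000,
)
```

Under these assumed limits a 1920×1080 sequence at 60 frames per second passes, while a 3840×2160 sequence fails on picture size alone. A real conformance check would also cover the profile's tool set and the HRD buffer constraints mentioned above.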

In some example embodiments, the receiver (531) may receive additional (redundant) data along with the encoded video. The additional data may be included as part of the encoded video sequence(s). The additional data may be used by the video decoder (510) to decode the data properly and/or to reconstruct the original video data more accurately. The additional data may be in the form of, for example, temporal, spatial, or signal-to-noise ratio (SNR) enhancement layers, redundant slices, redundant pictures, forward error correction codes, and so on.

Figure 6 shows a block diagram of a video encoder (603) according to an example embodiment of the present disclosure. The video encoder (603) may be included in an electronic device (620). The electronic device (620) may further include a transmitter (640) (e.g., transmitting circuitry). The video encoder (603) may be used in place of the video encoder (403) in the example of Figure 4.

The video encoder (603) may receive video samples from a video source (601) (which is not part of the electronic device (620) in the example of Figure 6) that may capture the video pictures to be encoded by the video encoder (603). In another example, the video source (601) may be implemented as part of the electronic device (620).

The video source (601) may provide the source video sequence to be encoded by the video encoder (603) in the form of a digital video sample stream that may be of any suitable bit depth (for example, 8 bit, 10 bit, 12 bit, ...), any color space (for example, BT.601 YCrCb, RGB, XYZ, ...), and any suitable sampling structure (for example, YCrCb 4:2:0, YCrCb 4:4:4). In a media serving system, the video source (601) may be a storage device capable of storing previously prepared video. In a video conferencing system, the video source (601) may be a camera that captures local image information as a video sequence. Video data may be provided as a plurality of individual pictures or images that impart motion when viewed in sequence. The pictures themselves may be organized as a spatial array of pixels, where each pixel may comprise one or more samples depending on the sampling structure, color space, and so on, in use. A person of ordinary skill in the art can readily understand the relationship between pixels and samples. The description below focuses on samples.

According to some example embodiments, the video encoder (603) may encode and compress the pictures of the source video sequence into an encoded video sequence (643) in real time or under any other time constraints required by the application. Enforcing the appropriate coding speed is one function of a controller (650). In some embodiments, the controller (650) may be functionally coupled to, and control, the other functional units described below. The coupling is not depicted, for simplicity. Parameters set by the controller (650) may include rate-control-related parameters (picture skip, quantizer, lambda value of rate-distortion optimization techniques, ...), picture size, group of pictures (GOP) layout, maximum motion vector search range, and so forth. The controller (650) may be configured to have other suitable functions that pertain to the video encoder (603) as optimized for a certain system design.

In some example embodiments, the video encoder (603) may be configured to operate in a coding loop. As an oversimplified description, in one example, the coding loop may include a source coder (630) (e.g., responsible for creating symbols, such as a symbol stream, based on an input picture to be coded and one or more reference pictures) and a (local) decoder (633) embedded in the video encoder (603). The decoder (633) reconstructs the symbols to create sample data in a manner similar to how a (remote) decoder would create it, even though the embedded decoder (633) processes the coded video stream from the source coder (630) without entropy coding (as any compression between the symbols and the coded video bitstream in entropy coding may be lossless in the video compression technologies considered in the disclosed subject matter). The reconstructed sample stream (sample data) is input to the reference picture memory (634). Because decoding of a symbol stream leads to bit-exact results independent of decoder location (local or remote), the content of the reference picture memory (634) is also bit-exact between the local encoder and the remote encoder. In other words, the prediction part of the encoder "sees" as reference picture samples exactly the same sample values as a decoder would "see" when using prediction during decoding. This fundamental principle of reference picture synchronicity (and the resulting drift if synchronicity cannot be maintained, for example because of channel errors) is used to improve coding quality.
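The closed coding loop described above can be illustrated with a deliberately tiny sketch. It is not the encoder of FIG. 6: each "picture" is reduced to a single integer sample, prediction is simply the previous reconstructed value, and the function names and the uniform quantizer step are assumptions made for illustration. The point it shows is that the encoder predicts from its own locally decoded reference, so its reference matches what a remote decoder rebuilds from the same symbols.

```python
def encode_sequence(samples, step):
    """Toy closed-loop encoder: predict each sample from the locally
    reconstructed previous sample, not from the original source sample."""
    reference = 0            # stand-in for the reference picture memory
    symbols = []             # quantized residual levels (the "symbol stream")
    encoder_refs = []        # what the embedded (local) decoder reconstructs
    for sample in samples:
        level = round((sample - reference) / step)  # lossy step
        symbols.append(level)
        reference = reference + level * step        # local decoder
        encoder_refs.append(reference)
    return symbols, encoder_refs


def decode_sequence(symbols, step):
    """Toy remote decoder: rebuilds the same references from the symbols."""
    reference = 0
    reconstructed = []
    for level in symbols:
        reference = reference + level * step
        reconstructed.append(reference)
    return reconstructed


symbols, encoder_refs = encode_sequence([10, 13, 19, 18], step=4)
decoder_refs = decode_sequence(symbols, step=4)
print(encoder_refs == decoder_refs)  # the two sides stay synchronized
```

If the encoder predicted from the original samples instead, the quantization error would accumulate on the decoder side only, which is the drift the paragraph refers to.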

“本地”解码器(633)的操作可与例如已在上文结合图5详细描述视频解码器(510)的“远程”解码器相同。然而，另外简要参考图5，由于符号可用且熵编码器(645)和解析器(520)能够无损地将符号编码/解码成已编码视频序列，因此包括缓冲存储器(515)和解析器(520)的视频解码器(510)的熵解码部分可能无法完全在本地解码器(633)中、在编码器中实现。The operation of the “local” decoder (633) can be the same as that of the “remote” decoder, for example, the video decoder (510) which has been described in detail above in conjunction with Figure 5. However, referring briefly to Figure 5 again, since symbols are available and the entropy encoder (645) and parser (520) are able to encode/decode the symbols into an encoded video sequence without loss, the entropy decoding portion of the video decoder (510), which includes the buffer memory (515) and the parser (520), may not be fully implemented in the local decoder (633) or in the encoder.

An observation that can be made at this point is that any decoder technology, except the parsing/entropy decoding that may be present only in a decoder, may also necessarily need to be present, in substantially identical functional form, in a corresponding encoder. For this reason, the disclosed subject matter may at times focus on decoder operations, which are allied to the decoding portion of the encoder. The description of encoder technologies can thus be abbreviated, as they are the inverse of the comprehensively described decoder technologies. Only in certain areas or aspects is a more detailed description of the encoder provided below.

During operation, in some example implementations, the source coder (630) may perform motion-compensated predictive coding, which codes an input picture predictively with reference to one or more previously coded pictures from the video sequence that were designated as "reference pictures." In this manner, the coding engine (632) codes the differences (or residue) in the color channels between pixel blocks of an input picture and pixel blocks of the reference picture(s) that may be selected as prediction reference(s) for the input picture. The term "residue" and its adjective form "residual" may be used interchangeably.
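As a minimal sketch of the difference signal described above, the following computes a per-sample residual between a block of the input picture and a prediction block taken from a reference picture. The function name and the list-of-rows block representation are assumptions for illustration only.

```python
def block_residual(current, reference):
    """Per-sample difference between a current block and its prediction block.

    Both blocks are lists of rows holding integer sample values for one
    color channel.
    """
    same_shape = len(current) == len(reference) and all(
        len(c_row) == len(r_row) for c_row, r_row in zip(current, reference)
    )
    if not same_shape:
        raise ValueError("blocks must have the same dimensions")
    return [
        [c - r for c, r in zip(c_row, r_row)]
        for c_row, r_row in zip(current, reference)
    ]


current = [[52, 55], [61, 59]]
reference = [[50, 56], [60, 60]]
print(block_residual(current, reference))  # [[2, -1], [1, -1]]
```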

本地视频解码器(633)可基于源编码器(630)创建的符号，对可指定为参考图片的图片的已编码视频数据进行解码。编码引擎(632)的操作可为有损过程。当已编码的视频数据可以在视频解码器(图6中未示出)上解码时，重建的视频序列通常可能是带有一些误差的源视频序列的副本。本地视频解码器(633)复制解码过程，该解码过程可由视频解码器对参考图片执行，且可使重建的参考图片存储在参考图片高速缓存(634)中。以这种方式，视频编码器(603)可以在本地存储已重建参考图片的副本，该副本与将由远端(远程)视频解码器获得的已重建参考图片具有共同内容(不存在传输误差)。The local video decoder (633) can decode encoded video data of a picture that can be designated as a reference picture based on symbols created by the source encoder (630). The operation of the encoding engine (632) can be a lossy process. When the encoded video data can be decoded on the video decoder (not shown in Figure 6), the reconstructed video sequence is often a copy of the source video sequence with some errors. The local video decoder (633) replicates the decoding process that can be performed by the video decoder on the reference picture, and can store the reconstructed reference picture in a reference picture cache (634). In this way, the video encoder (603) can locally store a copy of the reconstructed reference picture that shares common content (no transmission errors) with the reconstructed reference picture that will be obtained by the remote video decoder.

The predictor (635) may perform prediction searches for the coding engine (632). That is, for a new picture to be coded, the predictor (635) may search the reference picture memory (634) for sample data (as candidate reference pixel blocks) or certain metadata, such as reference picture motion vectors, block shapes, and so on, that may serve as an appropriate prediction reference for the new picture. The predictor (635) may operate on a sample block-by-pixel block basis to find appropriate prediction references. In some cases, as determined by search results obtained by the predictor (635), an input picture may have prediction references drawn from multiple reference pictures stored in the reference picture memory (634).
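One common way to realize such a prediction search is exhaustive block matching. The sketch below is an assumption for illustration, not the search used by the predictor (635): it scans a square window in one reference picture and keeps the displacement with the lowest sum of absolute differences (SAD). The function names, the SAD cost, and the integer-sample search grid are all hypothetical choices.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def extract(picture, top, left, size):
    """Copy a size x size block whose top-left sample is at (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def full_search(current_block, reference_picture, top, left, search_range):
    """Return (best_sad, (dx, dy)) over all in-bounds integer displacements."""
    size = len(current_block)
    height, width = len(reference_picture), len(reference_picture[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the picture
            cost = sad(current_block, extract(reference_picture, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# 6x6 reference picture with distinct sample values.
reference_picture = [[10 * y + x for x in range(6)] for y in range(6)]
# The current block sits at (top=1, left=1) but matches the reference at (2, 3).
current_block = extract(reference_picture, 2, 3, 2)
print(full_search(current_block, reference_picture, top=1, left=1, search_range=2))
# (0, (2, 1))
```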

控制器(650)可管理源编码器(630)的编码操作，包括例如设置用于对视频数据进行编码的参数和子群参数。The controller (650) can manage the encoding operations of the source encoder (630), including, for example, setting parameters and subgroup parameters for encoding video data.

可在熵编码器(645)中对所有上述功能单元的输出进行熵编码。熵编码器(645)根据诸如霍夫曼编码、可变长度编码、算术编码等的技术来对各种功能单元生成的符号进行无损压缩，从而将该符号转换成已编码视频序列。The outputs of all the above-mentioned functional units can be entropy encoded in the entropy encoder (645). The entropy encoder (645) performs lossless compression on the symbols generated by the various functional units using techniques such as Huffman coding, variable-length coding, and arithmetic coding, thereby converting the symbols into an encoded video sequence.
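Of the lossless techniques named above, Huffman coding is the easiest to sketch. The example below is a generic textbook construction using Python's `heapq`, not the entropy coder (645) itself; the function names and the sample symbol list are assumptions for illustration. It builds a prefix code from symbol frequencies and shows that decoding recovers the symbols exactly.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-code table {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, partial code table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count0, _, table0 = heapq.heappop(heap)
        count1, _, table1 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in table0.items()}
        merged.update({sym: "1" + code for sym, code in table1.items()})
        heapq.heappush(heap, (count0 + count1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[sym] for sym in symbols)


def decode(bits, table):
    inverse = {code: sym for sym, code in table.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix property: first match is the symbol
            decoded.append(inverse[current])
            current = ""
    return decoded


symbols = [0, 0, 0, 0, 1, 1, -1, 2]
table = huffman_code(symbols)
bits = encode(symbols, table)
print(len(bits), decode(bits, table) == symbols)  # 14 True
```

Frequent symbols receive shorter codes, so the eight symbols above take 14 bits rather than the 16 a fixed two-bit code would need.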

传输器(640)可缓冲由熵编码器(645)创建的已编码视频序列，从而为通过通信信道(660)进行传输做准备，该通信信道可以是通向将存储已编码的视频数据的存储设备的硬件/软件链路。传输器(640)可将来自视频编码器(603)的已编码视频数据与要传输的其它数据合并，该其它数据例如是已编码音频数据和/或辅助数据流(未示出来源)。The transmitter (640) can buffer the encoded video sequence created by the entropy encoder (645) in preparation for transmission via a communication channel (660), which may be a hardware/software link to a storage device that will store the encoded video data. The transmitter (640) can combine the encoded video data from the video encoder (603) with other data to be transmitted, such as encoded audio data and/or auxiliary data streams (source not shown).

The controller (650) may manage operation of the video encoder (603). During coding, the controller (650) may assign to each coded picture a certain coded picture type, which may affect the coding techniques that can be applied to the respective picture. For example, pictures often may be assigned one of the following picture types:

帧内图片(I图片)，其可以是不将序列中的任何其它图片用作预测源就可被编码和解码的图片。一些视频编解码器容许不同类型的帧内图片，包括例如独立解码器刷新(Independent Decoder Refresh，“IDR”)图片。本领域的普通技术人员了解I图片的变体及其相应的应用和特征。An intra-frame picture (I-picture) is a picture that can be encoded and decoded without using any other pictures in the sequence as a prediction source. Some video codecs allow different types of intra-frame pictures, including, for example, Independent Decoder Refresh (IDR) pictures. Variations of I-pictures and their corresponding applications and characteristics are well understood by those skilled in the art.

预测性图片(P图片)，其可以是可使用帧内预测或帧间预测进行编码和解码的图片，该帧内预测或帧间预测使用至多一个运动矢量和参考索引来预测每个块的样本值。A predictive picture (P-picture) can be a picture that can be encoded and decoded using intra-frame prediction or inter-frame prediction, which uses at most one motion vector and reference index to predict sample values for each block.

双向预测性图片(B图片)，其可以是可使用帧内预测或帧间预测进行编码和解码的图片，该帧内预测或帧间预测使用至多两个运动矢量和参考索引来预测每个块的样本值。类似地，多个预测性图片可使用多于两个参考图片和相关联元数据以用于重建单个块。A bidirectional predictive picture (B-picture) can be a picture that can be encoded and decoded using intra-frame prediction or inter-frame prediction, which uses at most two motion vectors and a reference index to predict sample values for each block. Similarly, multiple predictive pictures can use more than two reference pictures and associated metadata to reconstruct a single block.

源图片通常可以在空间上细分成多个样本编码块(例如，4×4、8×8、4×8或16×16个样本的块)，且逐块进行编码。这些块可参考其它(已编码)块进行预测性编码，其它(已编码)块由应用于块的相应图片的编码分配来确定。举例来说，I图片的块可进行非预测编码，或该块可参考同一图片的已经编码的块来进行预测编码(空间预测或帧内预测)。P图片的像素块可参考一个先前编码的参考图片通过空间预测或通过时域预测进行预测编码。B图片的块可参考一个或两个先前编码的参考图片通过空间预测或通过时域预测进行预测编码。出于其它目的，源图片或中间处理后的图片可细分成其它类型的块。编码块和其它类型的块的划分可遵循或者可以不遵循相同的方式，如下文进一步详细描述的。Source images can typically be spatially subdivided into multiple sample coding blocks (e.g., 4×4, 8×8, 4×8, or 16×16 sample blocks), and coded block by block. These blocks can be predictively coded with reference to other (already coded) blocks, which are determined by the coding assignments of the corresponding images applied to the blocks. For example, blocks of an I-image can be non-predictively coded, or the block can be predictively coded (spatial or intra-frame prediction) with reference to already coded blocks of the same image. Pixel blocks of a P-image can be predictively coded with reference to a previously coded reference image via spatial or temporal prediction. Blocks of a B-image can be predictively coded with reference to one or two previously coded reference images via spatial or temporal prediction. For other purposes, source images or intermediate images can be subdivided into other types of blocks. The partitioning of coding blocks and other types of blocks may or may not follow the same pattern, as described in further detail below.

视频编码器(603)可根据例如ITU-T H.265建议书的预定视频编码技术或标准执行编码操作。在操作中，视频编码器(603)可执行各种压缩操作，包括利用输入视频序列中的时间和空间冗余的预测编码操作。因此，已编码视频数据可符合所用视频编码技术或标准指定的语法。The video encoder (603) can perform encoding operations according to a predetermined video coding technique or standard, such as ITU-T H.265 Recommendation. In operation, the video encoder (603) can perform various compression operations, including predictive coding operations that utilize temporal and spatial redundancy in the input video sequence. Therefore, the encoded video data can conform to the syntax specified by the video coding technique or standard used.

In one example embodiment, the transmitter (640) may transmit additional data with the encoded video. The source coder (630) may include such data as part of the coded video sequence. The additional data may comprise temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, SEI messages, VUI parameter set fragments, and so on.

采集到的视频可作为呈时间序列的多个源图片(视频图片)。帧内图片预测(通常缩写为帧内预测)利用给定图片中的空间相关性，而帧间预测利用图片之间的(时间或其他)相关性。例如，可以将正在编码/解码的特定图片分成块，正在编码/解码的特定图片被称为当前图片。在当前图片中的块类似于视频中先前已编码且仍被缓冲的参考图片中的参考块时，可通过被称为运动矢量的矢量对当前图片中的块进行编码。该运动矢量指向参考图片中的参考块，且在使用多个参考图片的情况下，该运动矢量可具有识别参考图片的第三维度。The captured video can be presented as multiple source images (video images) in a time-series format. Intra-frame image prediction (often abbreviated as intra-prediction) utilizes spatial correlations within a given image, while inter-frame prediction utilizes (temporal or other) correlations between images. For example, a specific image being encoded/decoded can be divided into blocks, and this specific image being encoded/decoded is called the current image. When a block in the current image resembles a reference block in a previously encoded and still buffered reference image in the video, the block in the current image can be encoded using a vector called a motion vector. This motion vector points to the reference block in the reference image, and when using multiple reference images, the motion vector can have a third dimension that identifies the reference image.

在一些示例性实施例中，双向预测技术可用于帧间图片预测。根据这种双向预测技术，使用两个参考图片，例如按解码次序在视频中的当前图片之前(但是按显示次序可能分别是过去和将来)的第一参考图片和第二参考图片。可通过指向第一参考图片中的第一参考块的第一运动矢量和指向第二参考图片中的第二参考块的第二运动矢量对当前图片中的块进行编码。可通过第一参考块和第二参考块的组合来共同预测该块。In some exemplary embodiments, bidirectional prediction techniques can be used for inter-frame image prediction. According to this bidirectional prediction technique, two reference images are used, such as a first reference image and a second reference image that precede the current image in the video in decoding order (but may be past and future in display order). A block in the current image can be encoded using a first motion vector pointing to a first reference block in the first reference image and a second motion vector pointing to a second reference block in the second reference image. The block can be predicted jointly using a combination of the first and second reference blocks.
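The text above says only that the block is predicted by a combination of the two reference blocks. As one possible combination, the sketch below uses an equal-weight average with rounding; that choice, and the function name, are assumptions for illustration rather than the combination required by the disclosure.

```python
def bi_predict(first_reference_block, second_reference_block):
    """Combine two reference blocks by a rounded equal-weight average."""
    return [
        [(a + b + 1) >> 1 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first_reference_block, second_reference_block)
    ]


print(bi_predict([[100, 102]], [[104, 105]]))  # [[102, 104]]
```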

此外，合并模式技术可用于帧间图片预测以改善编码效率。In addition, merging mode techniques can be used for inter-frame image prediction to improve coding efficiency.

According to some example embodiments of the disclosure, predictions, such as inter-picture predictions and intra-picture predictions, are performed in units of blocks. For example, a picture in a sequence of video pictures is partitioned into coding tree units (CTUs) for compression, and the CTUs in a picture may have the same size, such as 64×64 pixels, 32×32 pixels, or 16×16 pixels. In general, a CTU may include three parallel coding tree blocks (CTBs): one luma CTB and two chroma CTBs. Each CTU can be recursively split by a quadtree into one or more coding units (CUs). For example, a CTU of 64×64 pixels can be split into one CU of 64×64 pixels or four CUs of 32×32 pixels. Each of the one or more 32×32 blocks may be further split into four CUs of 16×16 pixels. In some example embodiments, each CU may be analyzed during encoding to determine a prediction type for the CU among various prediction types, such as an inter prediction type or an intra prediction type. The CU may be split into one or more prediction units (PUs) depending on the temporal and/or spatial predictability. Generally, each PU includes a luma prediction block (PB) and two chroma PBs. In an embodiment, a prediction operation in coding (encoding/decoding) is performed in units of prediction blocks. The split of a CU into PUs (or PBs of different color channels) may be performed in various spatial patterns. A luma or chroma PB, for example, may include a matrix of values (e.g., luma values) for samples, such as 8×8 pixels, 16×16 pixels, 8×16 pixels, 16×8 samples, and the like.
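The recursive quadtree split of a CTU into CUs can be sketched as follows. The split decision is passed in as a callback because the text does not fix how an encoder decides; the function name, the `(x, y, size)` tuple layout, and the example decision rule are assumptions for illustration.

```python
def quadtree_split(x, y, size, min_size, should_split):
    """Recursively split a square block; return the leaf CUs as (x, y, size)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        for offset_y in (0, half):
            for offset_x in (0, half):
                cus.extend(
                    quadtree_split(x + offset_x, y + offset_y, half,
                                   min_size, should_split)
                )
        return cus
    return [(x, y, size)]


# Hypothetical decision: split the 64x64 CTU, then only its top-left 32x32.
def decision(x, y, size):
    return size == 64 or (size == 32 and x == 0 and y == 0)


cus = quadtree_split(0, 0, 64, min_size=16, should_split=decision)
print(len(cus))  # 7: four 16x16 CUs and three 32x32 CUs
print(sum(size * size for _, _, size in cus) == 64 * 64)  # True, CUs tile the CTU
```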

图7示出了根据本公开的另一示例性实施例的视频编码器(703)的图。视频编码器(703)用于接收视频图片序列中的当前视频图片内的样本值的处理块(例如预测块)，且将该处理块编码到作为已编码视频序列的一部分的已编码图片中。示例性视频编码器(703)可用于代替图4的示例中的视频编码器(403)。Figure 7 illustrates a diagram of a video encoder (703) according to another exemplary embodiment of the present disclosure. The video encoder (703) is used to receive a processing block (e.g., a prediction block) of sample values within a current video image in a video image sequence, and to encode the processing block into an encoded image that is part of an encoded video sequence. The exemplary video encoder (703) can be used in place of the video encoder (403) in the example of Figure 4.

For example, the video encoder (703) receives a matrix of sample values for a processing block, such as a prediction block of 8×8 samples. The video encoder (703) then determines whether the processing block is best coded using intra mode, inter mode, or bi-prediction mode using, for example, rate-distortion optimization (RDO). When the processing block is determined to be coded in intra mode, the video encoder (703) may use an intra prediction technique to encode the processing block into the coded picture; when the processing block is determined to be coded in inter mode or bi-prediction mode, the video encoder (703) may use an inter prediction or bi-prediction technique, respectively, to encode the processing block into the coded picture. In some example embodiments, a merge mode may be used as a submode of inter-picture prediction, in which the motion vector is derived from one or more motion vector predictors without the benefit of a coded motion vector component outside the predictors. In some other example embodiments, a motion vector component applicable to the subject block may be present. Accordingly, the video encoder (703) may include components not explicitly shown in FIG. 7, such as a mode decision module to determine the prediction mode of the processing blocks.
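Rate-distortion optimization is commonly formulated as minimizing a Lagrangian cost J = D + λ·R over the candidate modes, where D is the distortion, R the bit cost, and λ the trade-off weight. The sketch below assumes that formulation; the function name, the dictionary layout, and the distortion and rate figures are made up for illustration and do not come from the disclosure.

```python
def choose_mode(candidates, lam):
    """Pick the candidate with the lowest cost J = distortion + lam * rate."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])


# Hypothetical per-mode measurements for one processing block.
candidates = [
    {"mode": "intra", "distortion": 120, "rate": 40},
    {"mode": "inter", "distortion": 90, "rate": 70},
    {"mode": "bi-prediction", "distortion": 80, "rate": 95},
]

print(choose_mode(candidates, lam=0.1)["mode"])  # bi-prediction
print(choose_mode(candidates, lam=0.5)["mode"])  # inter
print(choose_mode(candidates, lam=2.0)["mode"])  # intra
```

A larger λ penalizes bits more heavily, so the decision moves toward the cheaper mode even though its distortion is higher.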

In the example of FIG. 7, the video encoder (703) includes an inter encoder (730), an intra encoder (722), a residual calculator (723), a switch (726), a residual encoder (724), a general controller (721), and an entropy encoder (725) coupled together as shown in the example arrangement of FIG. 7.

帧间编码器(730)配置成接收当前块(例如，处理块)的样本、将该块与参考图片中的一个或多个参考块(例如，按显示次序的先前图片和后来图片中的块)进行比较、生成帧间预测信息(例如，根据帧间编码技术的冗余信息描述、运动矢量、合并模式信息)、以及基于帧间预测信息使用任何合适的技术来计算帧间预测结果(例如，已预测块)。在一些示例中，参考图片是基于已编码视频信息，使用解码单元633解码的已解码参考图片，解码单元633嵌入在图6的示例性编码器620中(其示出为图7的残差解码器728，如下文进一步详细描述的)。The inter-frame encoder (730) is configured to receive samples of the current block (e.g., the processing block), compare the block with one or more reference blocks in a reference image (e.g., blocks in previous and later images in display order), generate inter-frame prediction information (e.g., redundancy information description based on inter-frame coding techniques, motion vectors, merging mode information), and compute inter-frame prediction results (e.g., predicted blocks) based on the inter-frame prediction information using any suitable technique. In some examples, the reference image is a decoded reference image based on encoded video information, decoded using decoding unit 633, which is embedded in the exemplary encoder 620 of FIG. 6 (shown as the residual decoder 728 of FIG. 7, as described in further detail below).

帧内编码器(722)配置成接收当前块(例如，处理块)的样本、将该块与同一图片中已编码的块进行比较、在变换之后生成量化系数、以及在一些情况下还生成帧内预测信息(例如，根据一个或多个帧内编码技术的帧内预测方向信息)。帧内编码器(722)可基于帧内预测信息和同一图片中的参考块计算帧内预测结果(例如，已预测块)。The intra encoder (722) is configured to receive samples of the current block (e.g., the processed block), compare the block with encoded blocks in the same image, generate quantization coefficients after transformation, and in some cases also generate intra prediction information (e.g., intra prediction direction information based on one or more intra coding techniques). The intra encoder (722) may compute intra prediction results (e.g., predicted blocks) based on the intra prediction information and reference blocks in the same image.

通用控制器(721)可配置成确定通用控制数据，且基于该通用控制数据控制视频编码器(703)的其它组件。在一个示例中，通用控制器(721)确定块的预测模式，且基于该预测模式将控制信号提供给开关(726)。例如，当该预测模式是帧内模式时，通用控制器(721)控制开关(726)以选择供残差计算器(723)使用的帧内模式结果，且控制熵编码器(725)以选择帧内预测信息并将帧内预测信息包括在码流中；以及当块的预测模式是帧间模式时，通用控制器(721)控制开关(726)以选择供残差计算器(723)使用的帧间预测结果，且控制熵编码器(725)以选择帧间预测信息并将帧间预测信息包括在码流中。A general controller (721) can be configured to determine general control data and control other components of the video encoder (703) based on that general control data. In one example, the general controller (721) determines the prediction mode of a block and provides control signals to a switch (726) based on that prediction mode. For example, when the prediction mode is an intra-frame mode, the general controller (721) controls the switch (726) to select an intra-frame mode result for use by the residual calculator (723) and controls the entropy encoder (725) to select intra-frame prediction information and include the intra-frame prediction information in the bitstream; and when the prediction mode of a block is an inter-frame mode, the general controller (721) controls the switch (726) to select an inter-frame prediction result for use by the residual calculator (723) and controls the entropy encoder (725) to select inter-frame prediction information and include the inter-frame prediction information in the bitstream.

残差计算器(723)可配置成计算所接收的块与从帧内编码器(722)或帧间编码器(730)选择的块预测结果之间的差(残差数据)。残差编码器(724)可配置成对残差数据进行编码来生成变换系数。例如，残差编码器(724)可配置成将残差数据从空间域变换到频域，以生成变换系数。变换系数接着经由量化处理以获得量化的变换系数。在各种示例性实施例中，视频编码器(703)还包括残差解码器(728)。残差解码器(728)用于执行逆变换，且生成已解码残差数据。已解码残差数据可适当地由帧内编码器(722)和帧间编码器(730)使用。举例来说，帧间编码器(730)可基于已解码残差数据和帧间预测信息生成已解码块，且帧内编码器(722)可基于已解码残差数据和帧内预测信息生成已解码块。适当处理已解码块以生成已解码图片，且已解码图片可以在存储器电路(未示出)中缓冲并用作参考图片。A residual calculator (723) can be configured to calculate the difference (residual data) between the received block and the prediction result of a block selected from the intra encoder (722) or the inter encoder (730). A residual encoder (724) can be configured to encode the residual data to generate transform coefficients. For example, the residual encoder (724) can be configured to transform the residual data from the spatial domain to the frequency domain to generate transform coefficients. The transform coefficients are then quantized to obtain quantized transform coefficients. In various exemplary embodiments, the video encoder (703) also includes a residual decoder (728). The residual decoder (728) is used to perform the inverse transform and generate decoded residual data. The decoded residual data can be used appropriately by the intra encoder (722) and the inter encoder (730). For example, the inter encoder (730) can generate a decoded block based on the decoded residual data and inter-frame prediction information, and the intra encoder (722) can generate a decoded block based on the decoded residual data and intra-frame prediction information. The decoded blocks are processed appropriately to generate a decoded image, which can be buffered in a memory circuit (not shown) and used as a reference image.
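The transform, quantization, and inverse path described above can be sketched with a separable orthonormal DCT-II and a uniform scalar quantizer. Both choices, along with every function name here, are assumptions for illustration; the disclosure does not specify the transform or the quantizer used by the residual encoder (724) and residual decoder (728).

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def transform_2d(block, fn):
    """Apply a 1-D transform to every row, then to every column."""
    rows = [fn(row) for row in block]
    return transpose([fn(col) for col in transpose(rows)])


def encode_residual(residual, step):
    """Spatial domain -> frequency domain -> quantized levels."""
    coeffs = transform_2d(residual, dct_1d)
    return [[round(c / step) for c in row] for row in coeffs]


def decode_residual(levels, step):
    """Quantized levels -> dequantized coefficients -> spatial domain."""
    coeffs = [[level * step for level in row] for row in levels]
    return transform_2d(coeffs, idct_1d)


residual = [[5, -3, 0, 2], [1, 4, -2, 0], [0, 0, 3, -1], [-4, 2, 1, 0]]
decoded = decode_residual(encode_residual(residual, step=4), step=4)
worst = max(abs(a - b) for ra, rb in zip(residual, decoded) for a, b in zip(ra, rb))
print(worst > 0)  # quantization makes the round trip lossy
```

The decoded residual differs from the original because of quantization, which is why the encoder builds its reference pictures from decoded residual data rather than from the original residual.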

The entropy encoder (725) may be configured to format the bitstream to include the encoded block and to perform entropy coding. The entropy encoder (725) is configured to include various information in the bitstream. For example, the entropy encoder (725) may be configured to include the general control data, the selected prediction information (e.g., intra prediction information or inter prediction information), the residual information, and other suitable information in the bitstream. When a block is coded in the merge submode of either inter mode or bi-prediction mode, there may be no residual information.

图8示出了根据本公开的另一实施例的示例性视频解码器(810)的图。视频解码器(810)用于接收作为已编码视频序列的一部分的已编码图像，且对该已编码图像进行解码以生成重建的图片。在一个示例中，视频解码器(810)可用于代替图4的示例中的视频解码器(410)。Figure 8 illustrates an exemplary video decoder (810) according to another embodiment of the present disclosure. The video decoder (810) is used to receive an encoded image as part of an encoded video sequence and decode the encoded image to generate a reconstructed picture. In one example, the video decoder (810) can be used instead of the video decoder (410) in the example of Figure 4.

In the example of FIG. 8, the video decoder (810) includes an entropy decoder (871), an inter decoder (880), a residual decoder (873), a reconstruction module (874), and an intra decoder (872) coupled together as shown in the example arrangement of FIG. 8.

The entropy decoder (871) may be configured to reconstruct, from the coded picture, certain symbols that represent the syntax elements of which the coded picture is made up. Such symbols may include, for example, the mode in which a block is coded (e.g., intra mode, inter mode, bi-prediction mode, merge submode, or another submode), prediction information (e.g., intra prediction information or inter prediction information) that can identify certain samples or metadata used for prediction by the intra decoder (872) or the inter decoder (880), respectively, residual information in the form of, for example, quantized transform coefficients, and the like. In one example, when the prediction mode is the inter or bi-prediction mode, the inter prediction information is provided to the inter decoder (880); and when the prediction type is the intra prediction type, the intra prediction information is provided to the intra decoder (872). The residual information may undergo inverse quantization and is provided to the residual decoder (873).

帧间解码器(880)可配置成接收帧间预测信息，且基于该帧间预测信息生成帧间预测结果。The inter-frame decoder (880) can be configured to receive inter-frame prediction information and generate inter-frame prediction results based on the inter-frame prediction information.

帧内解码器(872)可配置成接收帧内预测信息，且基于该帧内预测信息生成预测结果。The intra-frame decoder (872) can be configured to receive intra-frame prediction information and generate prediction results based on the intra-frame prediction information.

残差解码器(873)可配置成执行逆量化以提取解量化的变换系数，且处理该解量化的变换系数，以将残差从频域变换到空间域。残差解码器(873)还可使用某些控制信息(用以包括量化器参数(Quantizer Parameter，QP))，该信息可由熵解码器(871)提供(未描绘数据路径，因为这仅仅是低数据量控制信息)。The residual decoder (873) can be configured to perform inverse quantization to extract the dequantized transform coefficients and process the dequantized transform coefficients to transform the residual from the frequency domain to the spatial domain. The residual decoder (873) can also use certain control information (to include quantizer parameters (QP)) which can be provided by the entropy decoder (871) (the data path is not depicted because this is only low-data-volume control information).

重建模块(874)可配置成在空间域中组合由残差解码器(873)输出的残差与预测结果(可由帧间预测模块或帧内预测模块输出，视情况而定)以形成已重建块，已重建块形成已重建图片的一部分，已重建图片作为已重建视频的一部分。应注意，还可执行诸如去块操作等其它合适的操作来改善视觉质量。The reconstruction module (874) can be configured to combine the residual output by the residual decoder (873) with the prediction result (which may be output by the inter-frame prediction module or the intra-frame prediction module, as appropriate) in the spatial domain to form a reconstructed block, which forms part of the reconstructed image, which in turn forms part of the reconstructed video. It should be noted that other suitable operations, such as deblocking, can also be performed to improve visual quality.
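The combination performed by the reconstruction module can be sketched as a per-sample addition of the prediction and the decoded residual. Clipping the sum to the valid sample range is an assumption added here for illustration (the text does not mention it), as are the function name and the 8-bit default.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Add the residual to the prediction and clip to [0, 2**bit_depth - 1]."""
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(prediction, residual)
    ]


print(reconstruct([[50, 56], [60, 60]], [[2, -1], [1, -1]]))  # [[52, 55], [61, 59]]
print(reconstruct([[250, 3]], [[10, -5]]))                    # [[255, 0]]
```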

It is noted that the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) can be implemented using any suitable technique. In some example embodiments, the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) can be implemented using one or more integrated circuits. In another embodiment, the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) can be implemented using one or more processors that execute software instructions.

转到用于编码和解码的块划分，一般划分可以从基本块开始，并且可以遵循预定义的规则集、特定模式、划分树或任何划分结构或方案。划分可以是分层的和递归的。在按照下面描述的任何示例划分过程或其他过程或其组合将基本块分割或划分之后，可以获得最终的一组分区或编码块。这些分区中的每一个都可以处于划分层次结构中的不同划分级别之一，并且可以具有各种形状。每个分区可以被称为编码块(CB)。对于下面进一步描述的各种示例划分实现，每个生成的CB可以是任何允许的大小和划分级别。这样的分区被称为编码块，因为它们可以形成单元，对于这些单元，可以对其进行一些基本的编码/解码，并且可以对编码/解码参数进行优化、确定和在已编码视频码流中写入。最终分区中的最高或最深级别表示树的编码块划分结构的深度。编码块可以是亮度编码块或色度编码块。每种颜色的CB树结构可以被称为编码块树(coding block tree，CBT)。Turning to block partitioning for encoding and decoding, partitioning can generally begin with basic blocks and can follow a predefined set of rules, a specific pattern, a partition tree, or any partitioning structure or scheme. Partitioning can be hierarchical and recursive. After dividing or partitioning the basic blocks according to any of the example partitioning processes described below, or other processes or combinations thereof, a final set of partitions or coding blocks can be obtained. Each of these partitions can be at one of the different partitioning levels in the partitioning hierarchy and can have various shapes. Each partition can be called a coding block (CB). For the various example partitioning implementations further described below, each generated CB can be of any allowed size and partitioning level. Such partitions are called coding blocks because they can form units for which some basic encoding/decoding can be performed, and encoding/decoding parameters can be optimized, determined, and written into the encoded video stream. The highest or deepest level in the final partition represents the depth of the tree-based coding block partitioning structure. Coding blocks can be luma coding blocks or chroma coding blocks. The CB tree structure for each color can be called a coding block tree (CBT).

所有颜色通道的编码块可以统称为编码单元(CU)。所有颜色通道的分层结构可以统称为编码树单元(CTU)。一个CTU中不同颜色通道的划分模式或结构可能相同，也可能不同。The coding blocks for all color channels can be collectively referred to as coding units (CUs). The hierarchical structure of all color channels can be collectively referred to as coding tree units (CTUs). The partitioning patterns or structures of different color channels within a CTU may be the same or different.

在一些实现中，用于亮度和色度通道的划分树方案或结构可能不需要相同。换句话说，亮度和色度通道可以具有单独的编码树结构或模式。此外，亮度和色度通道是否使用相同或不同的编码划分树结构以及要使用的实际编码划分树结构可以取决于被编码的切片是P切片、B切片还是I切片。例如，对于I切片，色度通道和亮度通道可以具有单独的编码划分树结构或编码划分树结构模式，而对于P切片或B切片，亮度和色度通道可以共享相同的编码划分树方案。当应用单独的编码划分树结构或模式时，亮度通道可以通过一个编码划分树结构被分成CBs，色度通道可以通过另一个编码划分树结构被分成色度CBs。In some implementations, the partitioning tree schemes or structures used for the luma and chroma channels may not need to be the same. In other words, the luma and chroma channels can have separate coding tree structures or patterns. Furthermore, whether the luma and chroma channels use the same or different coding tree structures, and the actual coding tree structure used, can depend on whether the slice being encoded is a P-slice, a B-slice, or an I-slice. For example, for an I-slice, the chroma and luma channels can have separate coding tree structures or coding tree structure patterns, while for P-slices or B-slices, the luma and chroma channels can share the same coding tree scheme. When applying separate coding tree structures or patterns, the luma channel can be divided into CBs using one coding tree structure, and the chroma channel can be divided into chroma CBs using another coding tree structure.

In some example implementations, a predetermined partitioning pattern can be applied to a basic block. As shown in Figure 9, an example 4-way partition tree can start from a first predefined level (e.g., the 64×64 block level or another size, as the basic block size), and a basic block can be partitioned hierarchically down to a predefined lowest level (e.g., the 4×4 level). For example, a basic block can be subject to four predefined partitioning options or patterns, indicated by 902, 904, 906, and 908, where the partitions designated as R are allowed for recursive partitioning, in that the same partitioning options indicated in Figure 9 can be repeated at a lower scale until the lowest level (e.g., the 4×4 level). In some implementations, additional restrictions can be applied to the partitioning scheme of Figure 9. In the implementation of Figure 9, rectangular partitions (e.g., 1:2/2:1 rectangular partitions) can be allowed but cannot be recursive, whereas square partitions can be recursive. Partitioning according to Figure 9, with recursion where needed, generates a final set of coding blocks. A coding tree depth can be further defined to indicate the splitting depth from the root node or root block. For example, the coding tree depth of the root node or root block (e.g., a 64×64 block) can be set to 0, and after the root block is further split once according to Figure 9, the coding tree depth is increased by 1. For the scheme above, the maximum or deepest level from the 64×64 basic block to the minimum partition of 4×4 is 4 (starting from level 0). Such a partitioning scheme can be applied to one or more color channels. Each color channel can be partitioned independently according to the scheme of Figure 9 (e.g., the partitioning pattern or option among the predefined patterns can be determined independently for each color channel at each hierarchical level). Alternatively, two or more color channels can share the same hierarchical pattern tree of Figure 9 (e.g., the same partitioning pattern or option among the predefined patterns can be chosen for the two or more color channels at each hierarchical level).
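
As a non-limiting illustration of the depth arithmetic above, the following minimal sketch counts how many recursive square splits separate a basic block from the minimum partition. It assumes that each recursive split halves the side length of a square partition; the function name `max_split_depth` is hypothetical and is not part of the described scheme.

```python
def max_split_depth(base_size: int, min_size: int) -> int:
    """Count recursive square splits from base_size down to min_size.

    Assumes each split halves the side length of a square partition,
    with the root block at coding tree depth 0.
    """
    depth = 0
    size = base_size
    while size > min_size:
        size //= 2      # one more level of recursive square splitting
        depth += 1
    return depth


# 64 -> 32 -> 16 -> 8 -> 4 takes four splits, so the deepest level is 4.
print(max_split_depth(64, 4))
```

Under the same assumption, a 128×128 root block would reach the 4×4 level at depth 5.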

Figure 10 shows another example predefined partitioning pattern that allows recursive partitioning to form a partition tree. As shown in Figure 10, an example 10-way partitioning structure or pattern can be predefined. The root block can start at a predefined level (e.g., from a basic block at the 128×128 level or the 64×64 level). The example partitioning structure of Figure 10 includes various 2:1/1:2 and 4:1/1:4 rectangular partitions. The partition types with 3 sub-partitions, indicated as 1002, 1004, 1006, and 1008 in the second row of Figure 10, can be termed "T-type" partitions. The "T-type" partitions 1002, 1004, 1006, and 1008 can be referred to as left T-type, top T-type, right T-type, and bottom T-type, respectively. In some example implementations, none of the rectangular partitions of Figure 10 is allowed to be further subdivided. A coding tree depth can be further defined to indicate the splitting depth from the root node or root block. For example, the coding tree depth of the root node or root block (e.g., a 128×128 block) can be set to 0, and after the root block is further split once according to Figure 10, the coding tree depth is increased by 1. In some implementations, only the all-square partitions in 1010 can be allowed to be recursively partitioned into the next level of the partition tree according to the pattern of Figure 10. In other words, recursive partitioning may not be allowed for the square partitions within the T-type patterns 1002, 1004, 1006, and 1008. Partitioning according to Figure 10, with recursion where needed, generates a final set of coding blocks. Such a scheme can be applied to one or more color channels. In some implementations, more flexibility can be added to the use of partitions below the 8×8 level. For example, 2×2 chroma inter prediction can be used in certain cases.

在用于编码块划分的一些其它示例性实现中，四叉树结构可用于将基本块或中间块分割成四叉树划分。这种四叉树分割可以分层地和递归地应用于任何正方形分区。基本块或中间块或分区是否被进一步四叉树分割可适应于基本块或中间块或分区的各种局部特征。可以进一步调整在图片边界处的四叉树划分。例如，可以在图片边界处执行隐式四叉树分割，使得块将保持四叉树分割，直到大小适合图片边界。In some other exemplary implementations of coded block partitioning, quadtree structures can be used to partition basic blocks or intermediate blocks into quadtree partitions. This quadtree partitioning can be applied hierarchically and recursively to any square partition. Whether a basic block, intermediate block, or partition is further quadtree-partitioned can be adapted to various local features of the basic block, intermediate block, or partition. Quadtree partitioning at image boundaries can be further adjusted. For example, implicit quadtree partitioning can be performed at image boundaries such that the block will remain quadtree-partitioned until its size fits the image boundaries.

In some other example implementations, hierarchical binary partitioning from a basic block can be used. In such a scheme, the basic block or an intermediate-level block can be partitioned into two partitions. A binary partition can be either horizontal or vertical. For example, a horizontal binary partition can split a basic block or intermediate block into equal right and left partitions. Likewise, a vertical binary partition can split a basic block or intermediate block into equal upper and lower partitions. Such binary partitioning can be hierarchical and recursive. A decision can be made at each basic block or intermediate block as to whether the binary partitioning scheme should continue and, if it does, whether a horizontal or a vertical binary partition should be used. In some implementations, further partitioning can stop at a predefined lowest partition size (in either one or both dimensions). Alternatively, further partitioning can stop once a predefined partitioning level or depth from the basic block is reached. In some implementations, the aspect ratio of a partition can be restricted. For example, the aspect ratio of a partition can be no smaller than 1:4 and no larger than 4:1. As such, a vertical strip partition with an aspect ratio of 4:1 can only be further binary split into an upper and a lower partition, each having an aspect ratio of 2:1.
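
As a non-limiting illustration of how a minimum partition size and an aspect ratio limit can restrict the available binary splits, consider the following minimal sketch. The function name `allowed_binary_splits`, the default minimum size of 4, and the split labels `halve_width` and `halve_height` are assumptions introduced for illustration only. The labels name the dimension that is halved, to stay neutral with respect to the horizontal/vertical naming convention.

```python
def allowed_binary_splits(width, height, min_size=4, max_ratio=4):
    """Return the binary splits whose halves respect the size and ratio limits.

    max_ratio=4 models the restriction that the aspect ratio stays between
    1:4 and 4:1; min_size is a hypothetical minimum partition dimension.
    """
    def within_limits(w, h):
        return min(w, h) >= min_size and max(w, h) <= max_ratio * min(w, h)

    splits = {}
    if width % 2 == 0 and within_limits(width // 2, height):
        splits["halve_width"] = (width // 2, height)    # left/right halves
    if height % 2 == 0 and within_limits(width, height // 2):
        splits["halve_height"] = (width, height // 2)   # upper/lower halves
    return splits


# A tall 16x64 strip: halving the width would give 8x64 (1:8), which is
# outside the limit, so only the upper/lower split remains.
print(allowed_binary_splits(16, 64))
```

In this sketch the 16×64 strip can only be split into two 16×32 partitions, which mirrors the strip example in the paragraph above.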

In yet other examples, as shown in Figure 13, a ternary partitioning scheme can be used to partition a basic block or any intermediate block. The ternary pattern can be implemented vertically, as shown at 1302 of Figure 13, or horizontally, as shown at 1304 of Figure 13. Although the example split ratio in Figure 13 (either vertically or horizontally) is shown as 1:2:1, other ratios can be predefined. In some implementations, two or more different ratios can be predefined. Such a ternary partitioning scheme can complement the quadtree or binary partitioning structures, because ternary partitioning can capture an object located at the center of a block in one contiguous partition, whereas quadtree and binary-tree partitioning always split along the block center and would therefore split the object into separate partitions. In some implementations, the width and height of the partitions of the example ternary tree are always powers of 2 to avoid additional transforms.
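
As a non-limiting illustration of the 1:2:1 ratio, the following minimal sketch computes the dimensions of the three partitions produced by one ternary split. The function name `ternary_split` and the `split_width` parameter are hypothetical; the sketch simply assumes that the split dimension is divisible by 4.

```python
def ternary_split(width, height, split_width=True):
    """Split a block 1:2:1 along one dimension.

    Returns the (width, height) of the three resulting partitions.
    split_width=True divides the width; otherwise the height is divided.
    """
    length = width if split_width else height
    if length % 4 != 0:
        raise ValueError("split dimension must be divisible by 4 for 1:2:1")
    quarter = length // 4
    parts = [quarter, 2 * quarter, quarter]
    if split_width:
        return [(p, height) for p in parts]
    return [(width, p) for p in parts]


# A 32x32 block split along its width gives 8x32, 16x32 and 8x32 partitions,
# all with power-of-2 dimensions.
print(ternary_split(32, 32))
```

The middle partition spans the block center, which is the property the paragraph above relies on for capturing a centered object in one partition.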

The partitioning schemes described above can be combined in any manner at different partitioning levels. As one example, the quadtree and binary partitioning schemes described above can be combined to partition a basic block into a quadtree-binary-tree (QTBT) structure. In such a scheme, a basic block or an intermediate block/partition can be either quadtree split or binary split, subject to a set of predefined conditions, if specified. A particular example is shown in Figure 14. In the example of Figure 14, a basic block is first quadtree split into four partitions, as shown by 1402, 1404, 1406, and 1408. Thereafter, each of the resulting partitions is either quadtree split into four further partitions at the next level (such as 1408), binary split into two further partitions (horizontally or vertically, such as 1402 or 1406, both being symmetric), or not split (such as 1404). Binary or quadtree splitting can be allowed recursively for square partitions, as shown by the overall example partitioning pattern of 1410 and the corresponding tree structure/representation in 1420, in which solid lines represent quadtree splits and dashed lines represent binary splits. A flag can be used for each binary splitting node (non-leaf binary partition) to indicate whether the binary split is horizontal or vertical. For example, as shown in 1420, consistent with the partitioning structure of 1410, flag "0" can represent a horizontal binary split and flag "1" a vertical binary split. For quadtree split partitions, there is no need to indicate the splitting type, because quadtree splitting always splits a block or partition both horizontally and vertically to produce 4 sub-blocks/partitions of equal size. In some implementations, the convention can be reversed, with flag "1" representing a horizontal binary split and flag "0" a vertical binary split.

In some example implementations of QTBT, the quadtree and binary splitting ruleset can be represented by the following predefined parameters and the corresponding functions associated with them:

- CTU size: the root node size of a quadtree (the size of a basic block)

- MinQTSize: the minimum allowed quadtree leaf node size

- MaxBTSize: the maximum allowed binary tree root node size

- MaxBTDepth: the maximum allowed binary tree depth

- MinBTSize: the minimum allowed binary tree leaf node size

In some example implementations of the QTBT partitioning structure, the CTU size can be set to 128×128 luma samples with two corresponding 64×64 blocks of chroma samples (when an example chroma subsampling is considered and used), MinQTSize can be set to 16×16, MaxBTSize can be set to 64×64, MinBTSize (for both width and height) can be set to 4×4, and MaxBTDepth can be set to 4. Quadtree partitioning can be applied to the CTU first to generate quadtree leaf nodes. The quadtree leaf nodes can range in size from the minimum allowed size of 16×16 (i.e., MinQTSize) to 128×128 (i.e., the CTU size). If a node is 128×128, it is not first split by the binary tree, because its size exceeds MaxBTSize (i.e., 64×64). Otherwise, nodes that do not exceed MaxBTSize can be partitioned by the binary tree. In the example of Figure 14, the basic block is 128×128. According to the predefined ruleset, the basic block can only be quadtree split. The basic block has a partitioning depth of 0. Each of the resulting four partitions is 64×64, which does not exceed MaxBTSize, and can be further quadtree split or binary-tree split at level 1. The process continues. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further splitting is considered. When a binary tree node has a width equal to MinBTSize (i.e., 4), no further horizontal splitting is considered. Similarly, when a binary tree node has a height equal to MinBTSize, no further vertical splitting is considered.
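
As a non-limiting illustration, the ruleset above can be sketched as a function that lists the splits still available to a node. The class name `QtbtParams`, the function name `allowed_splits`, and the split labels are hypothetical. The sketch also assumes that a node already inside a binary tree is not quadtree split again, which the text above does not state explicitly. The labels name the dimension that is halved: `binary_halve_width` corresponds to what the text above calls horizontal splitting, which is ruled out once the width equals MinBTSize.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QtbtParams:
    """Example parameter values taken from the paragraph above."""
    ctu_size: int = 128
    min_qt_size: int = 16
    max_bt_size: int = 64
    max_bt_depth: int = 4
    min_bt_size: int = 4


def allowed_splits(width, height, bt_depth, in_binary_tree, p=QtbtParams()):
    """List the split types a node may still use under the example ruleset."""
    splits = []
    # Quadtree split: square nodes larger than MinQTSize that are not yet
    # inside a binary tree (assumption: no quadtree split below a binary split).
    if not in_binary_tree and width == height and width > p.min_qt_size:
        splits.append("quad")
    # Binary split: node must not exceed MaxBTSize and depth must be below
    # MaxBTDepth; each dimension can be halved only while above MinBTSize.
    if max(width, height) <= p.max_bt_size and bt_depth < p.max_bt_depth:
        if width > p.min_bt_size:
            splits.append("binary_halve_width")
        if height > p.min_bt_size:
            splits.append("binary_halve_height")
    return splits


print(allowed_splits(128, 128, 0, False))  # the 128x128 root: quadtree only
print(allowed_splits(64, 64, 0, False))    # 64x64: quadtree or either binary split
```

With these example values, the 128×128 root can only be quadtree split, while each 64×64 partition can be quadtree or binary split, consistent with the walk-through of Figure 14.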

In some example implementations, the QTBT scheme described above can be configured to support the flexibility for luma and chroma to have either the same QTBT structure or separate QTBT structures. For example, for P-slices and B-slices, the luma and chroma CTBs in one CTU can share the same QTBT structure. For I-slices, however, the luma CTB can be partitioned into CBs by one QTBT structure, and the chroma CTBs can be partitioned into chroma CBs by another QTBT structure. This means that a CU can be used to refer to different color channels in an I-slice; e.g., a CU in an I-slice can consist of a coding block of the luma component or coding blocks of the two chroma components, whereas a CU in a P-slice or B-slice can consist of coding blocks of all three color components.

在一些其他实现中，QTBT方案可以用上述三叉方案进行补充。这种实现可以被称为多类型树(multi-type-tree，MTT)结构。例如，除了节点的二叉分割之外，可以选择图13的三叉划分模式之一。在一些实现中，只有正方形节点可以进行三叉分割。可以使用附加标志来指示三叉划分是水平的还是垂直的。In some other implementations, the QTBT scheme can be supplemented with the ternary partitioning scheme described above. This implementation can be referred to as a multi-type-tree (MTT) structure. For example, in addition to binary partitioning of nodes, one of the ternary partitioning patterns in Figure 13 can be selected. In some implementations, only square nodes can be ternary partitioned. Additional flags can be used to indicate whether the ternary partition is horizontal or vertical.

The design of two-level or multi-level trees, such as the QTBT implementations and the QTBT implementations supplemented by ternary splitting, may be motivated mainly by complexity reduction. Theoretically, the complexity of traversing a tree is T^D, where T denotes the number of split types and D is the depth of the tree. A tradeoff can be made by using multiple split types (T) while reducing the depth (D).
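
As a non-limiting illustration of this tradeoff, the following minimal sketch evaluates T^D for two configurations. The function name `traversal_cost` and the specific values of T and D are arbitrary assumptions chosen for illustration; they are not taken from any particular codec configuration.

```python
def traversal_cost(num_split_types: int, depth: int) -> int:
    """Theoretical tree traversal complexity T^D."""
    return num_split_types ** depth


# Few split types with a deep tree versus more split types with a shallow tree.
deep_cost = traversal_cost(2, 8)     # T=2, D=8
shallow_cost = traversal_cost(5, 3)  # T=5, D=3
print(deep_cost, shallow_cost)
```

In this arbitrary example, adding split types while limiting the depth lowers the theoretical traversal cost from 256 to 125.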

在一些实现中，可以对CB进行进一步划分。例如，为了在编码和解码过程期间进行帧内或帧间预测的目的，可将CB进一步划分成多个预测块。换句话说，CB可以进一步划分为不同的子分区，在这些子分区中可以做出单独的预测决策/配置。并行地，为了描绘执行视频数据的变换或逆变换的水平，可将CB进一步划分成多个变换块(transform block，TB)。将CB划分为PBs和TBs的方案可以相同，也可以不相同。例如，每个划分方案可以使用其自己的过程来执行，例如，基于视频数据的各种特征。在一些示例实现中，PB和TB划分方案可能是独立的。在其他一些示例实现中，PB和TB的划分方案和边界可能相互关联。例如，在一些实现中，TB可以在PB划分之后进行划分，特别地，每个PB在编码块划分之后确定，然后可以进一步划分为一个或多个TB。例如，在一些实现中，一个PB可以分成一个、两个、四个或其他数量的TB。In some implementations, the CB can be further subdivided. For example, for the purpose of intra-frame or inter-frame prediction during encoding and decoding, the CB can be further divided into multiple prediction blocks. In other words, the CB can be further divided into different sub-partitions, in which separate prediction decisions/configurations can be made. In parallel, to depict the level at which the transform or inverse transform of the video data is performed, the CB can be further divided into multiple transform blocks (TBs). The schemes for dividing the CB into PBs and TBs can be the same or different. For example, each partitioning scheme can be performed using its own process, such as based on various features of the video data. In some example implementations, the PB and TB partitioning schemes may be independent. In other example implementations, the PB and TB partitioning schemes and boundaries may be interrelated. For example, in some implementations, TBs can be partitioned after PB partitioning; specifically, each PB is determined after coded block partitioning and can then be further partitioned into one or more TBs. For example, in some implementations, a PB can be divided into one, two, four, or other numbers of TBs.

In some implementations, the luma and chroma channels can be treated differently when a basic block is partitioned into coding blocks and further into prediction blocks and/or transform blocks. For example, in some implementations, partitioning of a coding block into prediction blocks and/or transform blocks can be allowed for the luma channel but not for the chroma channel(s). In such implementations, transform and/or prediction of chroma blocks is therefore performed only at the coding block level. As another example, the minimum transform block size can differ between the luma and chroma channels; e.g., coding blocks of the luma channel can be allowed to be partitioned into smaller transform and/or prediction blocks than those of the chroma channels. As yet another example, the maximum depth of partitioning of a coding block into transform blocks and/or prediction blocks can differ between the luma and chroma channels; e.g., coding blocks of the luma channel can be allowed to be partitioned into deeper transform and/or prediction blocks than those of the chroma channels. As a specific example, luma coding blocks can be partitioned into transform blocks of multiple sizes, representable by a recursive partition that goes down by up to 2 levels, with transform block shapes such as square, 2:1/1:2, and 4:1/1:4 and transform block sizes from 4×4 to 64×64 allowed. For chroma blocks, however, only the largest possible transform block specified for the luma blocks may be allowed.

在将编码块划分为PBs的一些示例实现中，PB划分的深度、形状和/或其他特征可以取决于PB是帧内编码还是帧间编码。In some example implementations that divide coded blocks into PBs, the depth, shape, and/or other characteristics of the PB divisions may depend on whether the PB is intra-frame coded or inter-frame coded.

将编码块(或预测块)划分为变换块可以在各种示例方案中实现，包括但不限于递归地或非递归地四叉树分割和预定模式分割，并且附加考虑在编码块或预测块的边界处的变换块。通常，得到的变换块可以处于不同的分割级别，可以不具有相同的大小，且可以不需要为正方形形状(例如，它们可以是具有一些允许的大小和纵横比的矩形)。下面结合图15、图16和图17更详细地描述其他示例。Dividing a coded block (or prediction block) into transform blocks can be implemented in various example schemes, including but not limited to recursive or non-recursive quadtree partitioning and predefined pattern partitioning, with additional consideration given to transform blocks at the boundaries of the coded or prediction blocks. Typically, the resulting transform blocks can be at different partitioning levels, can have different sizes, and do not need to be square (e.g., they can be rectangles with some allowed size and aspect ratio). Other examples are described in more detail below with reference to Figures 15, 16, and 17.

然而，在一些其他实现中，经由上述任何划分方案获得的CBs可用作用于预测和/或变换的基本或最小编码块。换句话说，为了执行帧间预测/帧内预测目的和/或变换目的，不执行进一步分割。例如，根据上述QTBT方案获得的CBs可以直接用作执行预测的单元。具体地，这种QTBT结构消除了多种划分类型的概念，即，它消除了CU、PU和TU的分离，且针对如上所述的CU/CB划分形状提供了更大的灵活性。在这种QTBT块结构中，CU/CB可以具有正方形或矩形形状。这种QTBT的叶节点被用作预测和变换处理的单元，而无需任何进一步的划分。这意味着在这种示例性QTBT编码块结构中，CU、PU和TU具有相同的块大小。However, in some other implementations, CBs obtained via any of the above partitioning schemes can be used as basic or minimal coding blocks for prediction and/or transform. In other words, no further partitioning is performed for inter-frame/intra-frame prediction and/or transform purposes. For example, CBs obtained according to the QTBT scheme described above can be directly used as units for performing prediction. Specifically, this QTBT structure eliminates the concept of multiple partitioning types; that is, it eliminates the separation of CU, PU, and TU, and provides greater flexibility for the CU/CB partitioning shape described above. In this QTBT block structure, CU/CB can have a square or rectangular shape. The leaf nodes of this QTBT are used as units for prediction and transform processing without any further partitioning. This means that in this exemplary QTBT coding block structure, CU, PU, and TU have the same block size.

The various CB partitioning schemes described above, and the further partitioning of CBs into PBs and/or TBs (including the case of no PB/TB partitioning), can be combined in any manner. The following particular implementations are provided as non-limiting examples.

A specific example implementation of coding block and transform block partitioning is described below. In such an example implementation, a basic block can be split into coding blocks using recursive quadtree splitting or a predefined splitting pattern described above (such as those in Figure 9 and Figure 10). At each level, whether further quadtree splitting of a particular partition should continue can be determined by local video data characteristics. The resulting CBs can be at various quadtree splitting levels and of various sizes. The decision on whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction can be made at the CB level (or the CU level, for all three color channels). Each CB can be further split into one, two, four, or another number of PBs according to a predefined PB splitting type. Within one PB, the same prediction process can be applied, and the relevant information can be transmitted to the decoder on a PB basis. After the residual block is obtained by applying the prediction process based on the PB splitting type, a CB can be partitioned into TBs according to another quadtree structure similar to the coding tree for the CB. In this particular implementation, a CB or a TB can be, but is not limited to, a square. Further, in this particular example, a PB can be square or rectangular for inter prediction and can only be square for intra prediction. A coding block can be split into, e.g., four square TBs. Each TB can be further split recursively (using quadtree splitting) into smaller TBs, referred to as a Residual Quadtree (RQT).

Another example implementation for partitioning a basic block into CBs, PBs, and/or TBs is further described below. For example, rather than using multiple partition unit types such as those shown in Figure 9 or Figure 10, a quadtree with a nested multi-type tree using binary and ternary splitting structures (e.g., the QTBT, or the QTBT with ternary splitting as described above) can be used. The separation of CB, PB, and TB (i.e., the partitioning of a CB into PBs and/or TBs, and the partitioning of PBs into TBs) can be abandoned, except for CBs whose size is too large for the maximum transform length, where such CBs may need further splitting. This example partitioning scheme can be designed to support more flexibility in CB partition shapes, so that prediction and transform can both be performed at the CB level without further partitioning. In such a coding tree structure, a CB can have either a square or a rectangular shape. Specifically, a coding tree block (CTB) can first be partitioned by a quadtree structure. The quadtree leaf nodes can then be further partitioned by a nested multi-type tree structure. An example of the nested multi-type tree structure using binary or ternary splitting is shown in Figure 11. Specifically, the example multi-type tree structure of Figure 11 includes four splitting types, referred to as vertical binary splitting (SPLIT_BT_VER) (1102), horizontal binary splitting (SPLIT_BT_HOR) (1104), vertical ternary splitting (SPLIT_TT_VER) (1106), and horizontal ternary splitting (SPLIT_TT_HOR) (1108). The CBs then correspond to the leaves of the multi-type tree. In this example implementation, unless the CB is too large for the maximum transform length, this segmentation is used for both prediction and transform processing without any further partitioning. This means that, in most cases, the CB, PB, and TB have the same block size in the quadtree with nested multi-type tree coding block structure. The exception occurs when the maximum supported transform length is smaller than the width or height of a color component of the CB. In some implementations, in addition to binary or ternary splitting, the nested patterns of Figure 11 can further include quadtree splitting.

Figure 12 shows a specific example of the quadtree with nested multi-type tree coding block structure for the block partitioning of one basic block (including quadtree, binary, and ternary splitting options). In more detail, Figure 12 shows that the basic block 1200 is quadtree split into four square partitions 1202, 1204, 1206, and 1208. For each of the quadtree-split partitions, a decision is made on whether to split further using the multi-type tree structure of Figure 11 and quadtree splitting. In the example of Figure 12, partition 1204 is not further split. Partitions 1202 and 1208 each adopt another quadtree split. For partition 1202, the top-left, top-right, bottom-left, and bottom-right partitions of the second-level quadtree split adopt third-level splitting by quadtree, horizontal binary splitting 1104 of Figure 11, no splitting, and horizontal ternary splitting 1108 of Figure 11, respectively. Partition 1208 adopts another quadtree split, and the top-left, top-right, bottom-left, and bottom-right partitions of the second-level quadtree split adopt third-level splitting by vertical ternary splitting 1106 of Figure 11, no splitting, no splitting, and horizontal binary splitting 1104 of Figure 11, respectively. Two of the sub-partitions of the third-level top-left partition of 1208 are further split according to horizontal binary splitting 1104 and horizontal ternary splitting 1108 of Figure 11, respectively. Partition 1206 adopts a second-level split following vertical binary splitting 1102 of Figure 11 into two partitions, which are further split at a third level according to horizontal ternary splitting 1108 and vertical binary splitting 1102 of Figure 11. A fourth-level split is further applied to one of them according to horizontal binary splitting 1104 of Figure 11.

对于上面的特定示例，最大亮度变换大小可以是64×64，并且支持的最大色度变换大小可以不同于亮度，例如32×32。即使图12中的上述示例CBs通常不被进一步分割成更小的PBs和/或TBs，当亮度编码块或色度编码块的宽度或高度大于最大变换宽度或高度时，亮度编码块或色度编码块可以在水平和/或垂直方向上自动分割，以满足该方向上的变换尺寸限制。For the specific example above, the maximum luma transform size can be 64×64, and the supported maximum chroma transform size can be different from the luma, for example, 32×32. Even though the example CBs in Figure 12 are not typically further subdivided into smaller PBs and/or TBs, when the width or height of the luma-coded block or chroma-coded block is greater than the maximum transform width or height, the luma-coded block or chroma-coded block can be automatically subdivided in the horizontal and/or vertical directions to meet the transform size constraints in that direction.

In the specific example above for partitioning a basic block into CBs, and as described above, the coding tree scheme can support the ability for luma and chroma to have separate block tree structures. For example, for P-slices and B-slices, the luma and chroma CTBs in one CTU can share the same coding tree structure. For I-slices, for example, luma and chroma can have separate coding block tree structures. When separate block tree structures are applied, the luma CTB can be partitioned into luma CBs by one coding tree structure, and the chroma CTBs can be partitioned into chroma CBs by another coding tree structure. This means that a CU in an I-slice may consist of a coding block of the luma component or coding blocks of the two chroma components, whereas a CU in a P-slice or B-slice always consists of coding blocks of all three color components, unless the video is monochrome.

当编码块被进一步划分为多个变换块时，其中的变换块可以按照各种顺序或扫描方式在码流中排序。下面将进一步详细描述将编码块或预测块划分成变换块以及变换块的编码顺序的示例实现。在一些示例实现中，如上所述，变换划分可以支持多个形状的变换块，例如1:1(正方形)、1:2/2:1和1:4/4:1，变换块大小范围从4×4到64×64不等。在一些实现中，如果编码块小于或等于64×64，则变换块划分可以仅应用于亮度分量，使得对于色度块，变换块大小与编码块大小相同。否则，如果编码块宽度或高度大于64，则亮度和色度编码块两者可分别被隐式地分割成min(W，64)×min(H，64)和min(W，32)×min(H，32)变换块的倍数。When a coded block is further divided into multiple transform blocks, these transform blocks can be ordered in the bitstream in various sequences or scanning methods. The following describes in more detail example implementations of dividing a coded block or prediction block into transform blocks and the encoding order of these transform blocks. In some example implementations, as mentioned above, transform partitioning can support transform blocks of multiple shapes, such as 1:1 (square), 1:2/2:1, and 1:4/4:1, with transform block sizes ranging from 4×4 to 64×64. In some implementations, if the coded block is less than or equal to 64×64, transform block partitioning can be applied only to the luma component, such that for the chroma block, the transform block size is the same as the coded block size. Otherwise, if the coded block width or height is greater than 64, both the luma and chroma coded blocks can be implicitly partitioned into multiples of min(W, 64)×min(H, 64) and min(W, 32)×min(H, 32) transform blocks, respectively.
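
As a non-limiting illustration of the implicit split described above, the following minimal sketch tiles a block into min(W, max)×min(H, max) transform blocks. The function name `implicit_transform_tiles` is hypothetical. The sketch assumes that W and H are the dimensions of the block passed in and that they are multiples of the tile dimensions, as is the case for power-of-2 block sizes.

```python
def implicit_transform_tiles(width, height, max_tx=64):
    """Tile a block into min(W, max_tx) x min(H, max_tx) transform blocks.

    Returns a list of (x, y, tile_width, tile_height) tuples in raster order.
    max_tx=64 models the luma limit; max_tx=32 can be passed for chroma.
    """
    tile_w = min(width, max_tx)
    tile_h = min(height, max_tx)
    return [
        (x, y, tile_w, tile_h)
        for y in range(0, height, tile_h)
        for x in range(0, width, tile_w)
    ]


# A 128x32 block with a 64-sample limit yields two 64x32 transform blocks.
print(implicit_transform_tiles(128, 32))
```

A 128×128 block would yield four 64×64 transform blocks under the same assumptions.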

在变换块划分的一些示例实现中，对于帧内编码块和帧间编码块，编码块可以进一步划分成多个变换块，其划分深度可达预定数量的级别(例如2个级别)。变换块划分深度和大小可以相关联。对于一些示例实现，从当前深度的变换块大小到下一深度的变换块大小的映射如下表1所示。In some example implementations of transform block partitioning, for intra-frame and inter-frame coding blocks, the coding block can be further divided into multiple transform blocks, with a partitioning depth of up to a predetermined number of levels (e.g., 2 levels). The transform block partitioning depth and size can be correlated. For some example implementations, the mapping from the transform block size at the current depth to the transform block size at the next depth is shown in Table 1 below.

Table 1: Transform partition size setting

Based on the example mapping of Table 1, for a 1:1 square block, the next-level transform split can create four 1:1 square sub-transform blocks. Transform partitioning can stop, for example, at 4×4. As such, a transform size of 4×4 at the current depth corresponds to the same size of 4×4 at the next depth. In the example of Table 1, for a 1:2/2:1 non-square block, the next-level transform split can create two 1:1 square sub-transform blocks, whereas for a 1:4/4:1 non-square block, the next-level transform split can create two 1:2/2:1 sub-transform blocks.

In some example implementations, for the luma component of an intra-coded block, an additional restriction can be applied to transform block partitioning. For example, at each level of transform partitioning, all sub-transform blocks can be restricted to have equal size. For example, for a 32×16 coding block, a level-1 transform split creates two 16×16 sub-transform blocks, and a level-2 transform split creates eight 8×8 sub-transform blocks. In other words, the second-level split must be applied to all first-level sub-blocks to keep the transform units at equal sizes. Figure 15 shows an example of partitioning an intra-coded square block into transform blocks according to Table 1, together with the coding order indicated by arrows. Specifically, 1502 shows the square coding block. 1504 shows the first-level split of the coding block into four equal-sized transform blocks according to Table 1, with the coding order of the four transform blocks indicated by arrows. 1506 shows the second-level split of all the equal-sized first-level blocks into 16 equal-sized transform blocks according to Table 1, with the coding order of the 16 transform blocks indicated by arrows.
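The size mapping and the equal-size restriction described above can be sketched as follows. This is a minimal illustration that follows only the prose rules (four squares from a square, two squares from a 1:2/2:1 block, two 1:2/2:1 blocks from a 1:4/4:1 block, stopping at 4×4); Table 1 itself is not reproduced here, and the function names `next_depth_size` and `uniform_split` are hypothetical.

```python
def next_depth_size(w, h):
    """Transform size at the next depth, per the prose rules for Table 1."""
    if w == 4 and h == 4:
        return (4, 4)            # partitioning stops at 4x4
    if w == h:
        return (w // 2, h // 2)  # 1:1 -> four 1:1 sub-blocks
    if w > h:
        return (w // 2, h)       # 2:1 -> two 1:1, 4:1 -> two 2:1
    return (w, h // 2)           # 1:2 -> two 1:1, 1:4 -> two 1:2


def uniform_split(w, h, levels):
    """Split every block at every level, so all sub-blocks stay equal in size
    (the restriction described for intra-coded luma blocks)."""
    blocks = [(w, h)]
    for _ in range(levels):
        nxt = []
        for bw, bh in blocks:
            sw, sh = next_depth_size(bw, bh)
            nxt.extend([(sw, sh)] * ((bw // sw) * (bh // sh)))
        blocks = nxt
    return blocks


level1 = uniform_split(32, 16, 1)  # two 16x16 blocks
level2 = uniform_split(32, 16, 2)  # eight 8x8 blocks
```

For the 32×16 block in the text, one level yields two 16×16 blocks and two levels yield eight 8×8 blocks.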

In some example implementations, for the luma component of an inter-coded block, the above restriction for intra coding may not apply. For example, after the first-level transform split, any one of the sub-transform blocks can be further split independently by one more level. The resulting transform blocks therefore may or may not be of the same size. Figure 16 shows an example split of an inter-coded block into transform blocks together with their coding order. In the example of Figure 16, the inter-coded block 1602 is split into transform blocks at two levels according to Table 1. At the first level, the inter-coded block is split into four transform blocks of equal size. Then only one of the four transform blocks (not all of them) is further split into four sub-transform blocks, resulting in a total of 7 transform blocks of two different sizes, as shown by 1604. An example coding order of these 7 transform blocks is shown by the arrows in 1604 of Figure 16.
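The independent, per-block splitting described for inter-coded blocks can be sketched as a small recursion. The sketch below assumes a square block and quad splits only; the callback `should_split` stands in for whatever decision the encoder makes, and the traversal order used here is illustrative rather than the coding order of Figure 16.

```python
def partition_inter(x, y, size, should_split, depth=0, max_depth=2):
    """Return (x, y, size) leaves of a transform split in which each
    sub-block decides independently whether to split further."""
    if depth >= max_depth or size <= 4 or not should_split(x, y, size, depth):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += partition_inter(x + dx, y + dy, half,
                                      should_split, depth + 1, max_depth)
    return leaves


# Split the 64x64 block once, then split only its top-left quadrant again.
blocks = partition_inter(
    0, 0, 64,
    lambda x, y, size, depth: depth == 0 or (x, y) == (0, 0),
)
sizes = sorted({size for _, _, size in blocks})
```

With this decision rule the result is seven transform blocks of two sizes (four 16×16 and three 32×32), matching the count in the example above.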

In some example implementations, some additional restrictions on transform blocks can apply to the one or more chroma components. For example, for the one or more chroma components, the transform block size can be as large as the coding block size, but not smaller than a predefined size, e.g., 8×8.

In some other example implementations, for a coding block with either width (W) or height (H) greater than 64, both the luma and chroma coding blocks can be implicitly split into multiples of min(W, 64)×min(H, 64) and min(W, 32)×min(H, 32) transform units, respectively. Here, in this disclosure, "min(a, b)" returns the smaller of a and b.
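As a worked illustration of the implicit split, the sketch below computes the transform unit size and the number of units for a block. It assumes that W and H are the dimensions of the block passed in and that the block divides evenly into units; the function name `implicit_transform_units` is hypothetical.

```python
def implicit_transform_units(w, h, is_chroma=False):
    """Return ((unit_w, unit_h), unit_count) for the implicit split,
    capping the unit size at 64 for luma and 32 for chroma."""
    cap = 32 if is_chroma else 64
    unit_w, unit_h = min(w, cap), min(h, cap)
    return (unit_w, unit_h), (w // unit_w) * (h // unit_h)


luma_units = implicit_transform_units(128, 64)                    # 64x64 units
chroma_units = implicit_transform_units(64, 64, is_chroma=True)   # 32x32 units
```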

Figure 17 further shows another alternative example scheme for partitioning a coding block or prediction block into transform blocks. As shown in Figure 17, instead of using recursive transform partitioning, a predefined set of partitioning types can be applied to a coding block according to the transform type of the coding block. In the particular example shown in Figure 17, one of 6 example partitioning types can be applied to split a coding block into various numbers of transform blocks. Such a scheme for generating a transform block partitioning can be applied to either a coding block or a prediction block.

In more detail, the partitioning scheme of Figure 17 provides up to 6 example partitioning types for any given transform type (a transform type refers to the type of, e.g., the primary transform, such as ADST and others). In this scheme, every coding block or prediction block can be assigned a transform partition type based on, for example, a rate-distortion cost. In an example, the transform partition type assigned to the coding block or prediction block can be determined based on the transform type of the coding block or prediction block. A particular transform partition type can correspond to a transform block split size and pattern, as shown by the 6 transform partition types illustrated in Figure 17. A correspondence between the various transform types and the various transform partition types can be predefined. An example is shown below, with the capitalized labels indicating the transform partition types that can be assigned to the coding block or prediction block based on rate-distortion cost:

• PARTITION_NONE: Assigns a transform size that is equal to the block size.

• PARTITION_SPLIT: Assigns a transform size that is 1/2 the width of the block size and 1/2 the height of the block size.

• PARTITION_HORZ: Assigns a transform size with the same width as the block size and 1/2 the height of the block size.

• PARTITION_VERT: Assigns a transform size with 1/2 the width of the block size and the same height as the block size.

• PARTITION_HORZ4: Assigns a transform size with the same width as the block size and 1/4 the height of the block size.

• PARTITION_VERT4: Assigns a transform size with 1/4 the width of the block size and the same height as the block size.
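The six partition types listed above can be expressed as width and height divisors. The following sketch is illustrative only; the table name `TX_PARTITION_DIVISORS` and the function `transform_size` are hypothetical, while the partition labels are those from the list.

```python
# (width divisor, height divisor) for each transform partition type
TX_PARTITION_DIVISORS = {
    "PARTITION_NONE": (1, 1),
    "PARTITION_SPLIT": (2, 2),
    "PARTITION_HORZ": (1, 2),
    "PARTITION_VERT": (2, 1),
    "PARTITION_HORZ4": (1, 4),
    "PARTITION_VERT4": (4, 1),
}


def transform_size(block_w, block_h, partition):
    """Return (transform size, number of transform blocks) for a block."""
    w_div, h_div = TX_PARTITION_DIVISORS[partition]
    return (block_w // w_div, block_h // h_div), w_div * h_div


split = transform_size(32, 32, "PARTITION_SPLIT")   # four 16x16 blocks
horz4 = transform_size(32, 32, "PARTITION_HORZ4")   # four 32x8 blocks
```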

In the example above, the transform partition types shown in Figure 17 all use a uniform transform size for the partitioned transform blocks. This is merely an example and is not limiting. In some other implementations, mixed transform block sizes can be used for the partitioned transform blocks in a particular partition type (or pattern).

The PBs obtained from any of the partitioning schemes above (or CBs, which are also referred to as PBs when not further partitioned into prediction blocks) can then become the individual blocks for coding via either intra-prediction or inter-prediction. For inter-prediction of a current PB, a residual between the current block and a prediction block can be generated, coded, and included in the coded bitstream.

Inter-prediction can be implemented, for example, in a single-reference mode or a compound-reference mode. In some implementations, a skip flag can first be included in the bitstream for the current block (or at a higher level) to indicate whether the current block is inter-coded and not skipped. If the current block is inter-coded, another flag can be further included in the bitstream as a signal to indicate whether the single-reference mode or the compound-reference mode is used for the prediction of the current block. For the single-reference mode, one reference block can be used to generate the prediction block for the current block. For the compound-reference mode, two or more reference blocks can be used to generate the prediction block, for example by weighted averaging. The compound-reference mode may also be referred to as more-than-one-reference mode, two-reference mode, or multiple-reference mode. A reference block or reference blocks can be identified using a reference frame index or indices and, additionally, using a corresponding motion vector or motion vectors that indicate the shift(s) in location (e.g., in horizontal and vertical pixels) between the reference block(s) and the current block. For example, in the single-reference mode the inter-prediction block for the current block can be generated from a single reference block identified by one motion vector in a reference frame, whereas in the compound-reference mode the prediction block can be generated by a weighted average of two reference blocks in two reference frames indicated by two motion vectors. The motion vector(s) can be coded and included in the bitstream in various manners.
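The difference between the two modes can be illustrated with a small pure-Python sketch. It assumes integer-pixel motion vectors, frames stored as lists of rows, and no boundary handling; sub-pixel interpolation is omitted, and the helper names `fetch_block` and `predict` are hypothetical.

```python
def fetch_block(frame, x, y, w, h, mv):
    """Copy the w x h block of `frame` displaced by integer mv = (dx, dy)."""
    dx, dy = mv
    return [[frame[y + dy + r][x + dx + c] for c in range(w)]
            for r in range(h)]


def predict(refs, x, y, w, h, weights=None):
    """refs is a list of (reference frame, motion vector) pairs.
    One pair gives single-reference prediction; two or more pairs give a
    compound prediction by weighted averaging (equal weights by default)."""
    blocks = [fetch_block(frame, x, y, w, h, mv) for frame, mv in refs]
    weights = weights or [1 / len(blocks)] * len(blocks)
    return [[round(sum(wt * blk[r][c] for wt, blk in zip(weights, blocks)))
             for c in range(w)] for r in range(h)]


frame0 = [[r * 8 + 2 * c for c in range(4)] for r in range(4)]
frame1 = [[20] * 4 for _ in range(4)]
single = predict([(frame0, (-1, -1))], 1, 1, 2, 2)
compound = predict([(frame0, (-1, -1)), (frame1, (0, 0))], 1, 1, 2, 2)
```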

In some implementations, an encoding or decoding system can maintain a decoded picture buffer (DPB). Some pictures can be kept in the DPB waiting to be displayed (in a decoding system), and some pictures in the DPB can be used as reference frames to enable inter-prediction. In some implementations, the reference frames in the DPB can be tagged as either short-term references or long-term references for the current picture being encoded or decoded. For example, short-term reference frames may include frames that are used for inter-prediction of blocks in the current frame or in a predefined number (e.g., 2) of subsequent video frames closest to the current frame in decoding order. Long-term reference frames may include frames in the DPB that can be used to predict image blocks in frames that are more than the predefined number of frames away from the current frame in decoding order. Information about such tags for short-term and long-term reference frames may be referred to as a Reference Picture Set (RPS) and can be added to the header of each frame in the coded bitstream. Each frame in the coded video stream can be identified by a Picture Order Counter (POC), which is numbered according to the playback sequence either in an absolute manner or relative to a picture group starting from, for example, an I-frame.

In some example implementations, one or more reference picture lists containing the identification of short-term and long-term reference frames for inter-prediction can be formed based on the information in the RPS. For example, a single reference picture list, denoted as the L0 reference (or reference list 0), can be formed for unidirectional inter-prediction, whereas two reference picture lists, denoted as L0 (or reference list 0) and L1 (or reference list 1), one for each of the two prediction directions, can be formed for bidirectional inter-prediction. The reference frames included in the L0 and L1 lists can be ordered in various predetermined manners. The lengths of the L0 and L1 lists can be signaled in the video bitstream. Unidirectional inter-prediction can be in either the single-reference mode or, when the multiple references used to generate the prediction block by weighted averaging in the compound prediction mode are on the same side of the block to be predicted, the compound-reference mode. Bidirectional inter-prediction can only be in the compound mode, because bidirectional inter-prediction involves at least two reference blocks.

In some implementations, a merge mode (MM) for inter-prediction can be implemented. Generally, in merge mode, the motion vector in single-reference prediction, or one or more of the motion vectors in compound-reference prediction, for the current PB can be derived from other motion vector(s) rather than being computed and signaled independently. For example, in an encoding system, the current motion vector for the current PB can be represented by the difference between the current motion vector and one or more other already-encoded motion vectors (referred to as reference motion vectors). Such a difference in motion vector(s), rather than the entirety of the current motion vector(s), can be encoded and included in the bitstream and can be linked to the reference motion vector(s). Correspondingly, in a decoding system, the motion vector(s) corresponding to the current PB can be derived based on the decoded motion vector difference(s) and the decoded reference motion vector(s) linked to them. As a specific form of general merge mode (MM) inter-prediction, such inter-prediction based on motion vector difference(s) may be referred to as Merge Mode with Motion Vector Difference (MMVD). MM in general, or MMVD in particular, can thus be implemented to leverage correlations between motion vectors associated with different PBs to improve coding efficiency. For example, neighboring PBs may have similar motion vectors. As another example, motion vectors may be correlated temporally (between frames) for blocks that are similarly located in space.
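The relationship between a motion vector, its reference motion vector, and the signaled difference can be written in a few lines. This is a minimal sketch with motion vectors as (horizontal, vertical) integer tuples; the function names are hypothetical.

```python
def encode_mvd(mv, ref_mv):
    """Encoder side: the difference that is coded instead of the full MV."""
    return (mv[0] - ref_mv[0], mv[1] - ref_mv[1])


def decode_mv(ref_mv, mvd):
    """Decoder side: rebuild the MV from the reference MV and the MVD."""
    return (ref_mv[0] + mvd[0], ref_mv[1] + mvd[1])


ref_mv = (12, -3)          # e.g. the motion vector of a neighboring block
mv = (14, -4)              # motion vector of the current block
mvd = encode_mvd(mv, ref_mv)
restored = decode_mv(ref_mv, mvd)
```

When neighboring blocks move similarly, the difference is small (here (2, -1)), which is what makes it cheaper to code than the full vector.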

In some example implementations, an MM flag can be included in the bitstream during the encoding process to indicate whether the current PB is in merge mode. Additionally or alternatively, an MMVD flag can be included during the encoding process and signaled in the bitstream to indicate whether the current PB is in MMVD mode. The MM and/or MMVD flags or indicators can be provided at the PB level, the CB level, the CU level, the CTB level, the CTU level, the slice level, the picture level, and the like. As a particular example, both an MM flag and an MMVD flag can be included for a current CU, and the MMVD flag can be signaled right after the skip flag and the MM flag to specify whether the MMVD mode is used for the current CU.

In some example implementations of MMVD, a merge candidate list for motion vector prediction can be formed for the block being predicted. The merge candidate list can contain a predetermined number (e.g., 2) of MV predictor candidate blocks whose motion vectors can be used to predict the current motion vector. The MV predictor candidate blocks can include blocks selected from neighboring blocks in the same frame and/or temporal blocks (e.g., identically located blocks in the preceding or subsequent frame of the current frame). These options represent blocks at spatial or temporal locations relative to the current block that are likely to have motion vectors similar or identical to that of the current block. The size of the MV predictor candidate list can be predetermined. For example, the list may contain two candidates. To be on the merge candidate list, a candidate block may, for example, be required to have the same reference frame (or frames) as the current block, must exist (e.g., a boundary check needs to be performed when the current block is near the edge of the frame), and must already be encoded during the encoding process and/or already decoded during the decoding process. In some implementations, the merge candidate list can first be populated with spatially neighboring blocks (scanned in a particular predefined order), if available and meeting the conditions above, and then with temporal blocks if space is still available in the list. The neighboring candidate blocks can, for example, be selected from the left and top blocks of the current block. The merge MV predictor candidate list can be signaled in the bitstream.
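The population order and eligibility checks described above can be sketched as follows. The candidate representation (a dict with `mv`, `refs`, and `coded` fields, or `None` for a block that does not exist) and the function name are assumptions made for this illustration; duplicate pruning, which the text does not describe, is not performed.

```python
def build_merge_candidates(spatial, temporal, current_refs, max_size=2):
    """Fill the list with eligible spatial candidates first (in the given
    scan order), then with temporal candidates while room remains."""
    candidates = []
    for cand in list(spatial) + list(temporal):
        if len(candidates) == max_size:
            break
        if cand is None:                    # block outside the frame
            continue
        if not cand["coded"]:               # not yet encoded/decoded
            continue
        if cand["refs"] != current_refs:    # different reference frame(s)
            continue
        candidates.append(cand["mv"])
    return candidates


left = {"mv": (4, 0), "refs": (0,), "coded": True}
top = None                                   # current block at the top edge
temporal_block = {"mv": (3, 1), "refs": (0,), "coded": True}
merge_list = build_merge_candidates([left, top], [temporal_block], (0,))
```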

In some implementations, the actual merge candidate used as the reference motion vector for predicting the motion vector of the current block can be signaled in the bitstream. In the case that the merge candidate list contains two candidates, a one-bit flag, referred to as a merge candidate flag, can be used to indicate the selection of the reference merge candidate. For a current block predicted in compound mode, each of the multiple motion vectors predicted using an MV predictor can be associated with a reference motion vector from the merge candidate list.

In some example implementations of MMVD, after a merge candidate is selected and used as the base motion vector predictor for the motion vector to be predicted, a motion vector difference (MVD, or delta MV, representing the difference between the motion vector to be predicted and the reference candidate motion vector) can be calculated in the encoding system. Such an MVD can include information representing the magnitude of the MV difference and the direction of the MV difference, both of which can be signaled in the bitstream. The motion difference magnitude and the motion difference direction can be signaled in various manners.

In some example implementations of MMVD, a distance index can be used to specify the magnitude information of the motion vector difference and to indicate one of a set of predefined offsets representing predefined motion vector differences from the starting point (the reference motion vector). An MV offset according to the signaled index can then be added to either the horizontal component or the vertical component of the starting (reference) motion vector. Whether the horizontal or the vertical component of the reference motion vector is offset can be determined by the direction information of the MVD. An example predefined relation between the distance index and the predefined offsets is specified in Table 2.

Table 2: Example relation between distance index and predefined MV offset

In some example implementations of MMVD, a direction index can be further signaled and used to represent the direction of the MVD relative to the reference motion vector. In some implementations, the direction can be restricted to either one of the horizontal and vertical directions. An example 2-bit direction index is shown in Table 3. In the example of Table 3, the interpretation of the MVD can vary according to the information of the starting/reference MVs. For example, when the starting/reference MV corresponds to a uni-prediction block, or corresponds to a bi-prediction block with both reference frame lists pointing to the same side of the current picture (i.e., the POCs of the two reference pictures are both larger than the POC of the current picture, or are both smaller than the POC of the current picture), the sign in Table 3 can specify the sign (direction) of the MV offset added to the starting/reference MV. When the starting/reference MV corresponds to a bi-prediction block with the two reference pictures on different sides of the current picture (i.e., the POC of one reference picture is larger than the POC of the current picture, and the POC of the other reference picture is smaller than the POC of the current picture), and the difference between the reference POC in picture reference list 0 and the current frame is greater than that between the reference POC in picture reference list 1 and the current frame, the sign in Table 3 can specify the sign of the MV offset added to the reference MV corresponding to the reference picture in picture reference list 0, and the sign of the offset for the MV corresponding to the reference picture in picture reference list 1 can have the opposite value (the opposite sign for the offset). Otherwise, if the difference between the reference POC in picture reference list 1 and the current frame is greater than that between the reference POC in picture reference list 0 and the current frame, the sign in Table 3 can specify the sign of the MV offset added to the reference MV associated with picture reference list 1, and the sign of the offset for the reference MV associated with picture reference list 0 has the opposite value.

Table 3: Example implementation of the sign of the MV offset specified by the direction index

Direction IDX | 00 | 01 | 10 | 11
x-axis (horizontal) | + | - | N/A | N/A
y-axis (vertical) | N/A | N/A | + | -
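Combining the distance index with the direction index of Table 3 gives the offset applied to the reference motion vector. In the sketch below the sign table follows Table 3, but the `offsets` list is a placeholder supplied by the caller, because the values of Table 2 are not reproduced in this text; the function names are hypothetical.

```python
# Table 3: direction index -> (sign on x, sign on y); 0 means "N/A"
DIRECTION_SIGNS = {
    0b00: (+1, 0),
    0b01: (-1, 0),
    0b10: (0, +1),
    0b11: (0, -1),
}


def mmvd_offset(distance_idx, direction_idx, offsets):
    """Signed MV offset for the given distance and direction indices."""
    sign_x, sign_y = DIRECTION_SIGNS[direction_idx]
    magnitude = offsets[distance_idx]
    return (sign_x * magnitude, sign_y * magnitude)


def apply_offset(ref_mv, offset):
    """Add the offset to the starting (reference) motion vector."""
    return (ref_mv[0] + offset[0], ref_mv[1] + offset[1])


placeholder_offsets = [1, 2, 4, 8]   # illustrative values, not Table 2
offset = mmvd_offset(2, 0b01, placeholder_offsets)
mv = apply_offset((10, 5), offset)
```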

In some example implementations, the MVD can be scaled according to the POC differences in each direction. If the POC differences for the two lists are the same, no scaling is needed. Otherwise, if the POC difference for reference list 0 is larger than that for reference list 1, the MVD for reference list 1 is scaled. If the POC difference for reference list 1 is larger than that for list 0, the MVD for list 0 can be scaled in the same manner. If the starting MV is uni-predicted, the MVD is added to the available or reference MV.
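The sign rule for bi-prediction described above can be sketched as a function that returns the offsets for list 0 and list 1. Two simplifications are assumed here and are not taken from the text: the POC-based scaling of the MVD is omitted, and when the two reference pictures are on opposite sides at equal POC distance the signaled sign is given to list 0. The function name is hypothetical.

```python
def bipred_offsets(offset, poc_cur, poc_l0, poc_l1):
    """Return (offset for list 0, offset for list 1) from one signed offset."""
    d0 = poc_l0 - poc_cur
    d1 = poc_l1 - poc_cur
    mirrored = (-offset[0], -offset[1])
    if (d0 > 0) == (d1 > 0):
        # Both references on the same side: same sign for both lists.
        return offset, offset
    if abs(d1) > abs(d0):
        # List 1 is farther away: it takes the signaled sign.
        return mirrored, offset
    # List 0 is farther away (or equally far, by assumption).
    return offset, mirrored


same_side = bipred_offsets((4, 0), poc_cur=8, poc_l0=4, poc_l1=6)
l0_farther = bipred_offsets((4, 0), poc_cur=8, poc_l0=0, poc_l1=10)
l1_farther = bipred_offsets((4, 0), poc_cur=8, poc_l0=4, poc_l1=16)
```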

In some example implementations of MVD coding and signaling for bidirectional compound prediction, in addition to or as an alternative to coding and signaling the two MVDs separately, symmetric MVD coding can be implemented such that only one MVD needs to be signaled and the other MVD can be derived from the signaled MVD. In such implementations, motion information including the reference picture indices of both list-0 and list-1 is signaled. However, only the MVD associated with, e.g., reference list-0 is signaled, and the MVD associated with reference list-1 is not signaled but derived. Specifically, at the slice level, a flag referred to as "mvd_l1_zero_flag" can be included in the bitstream to indicate whether the MVD for reference list-1 is not signaled in the bitstream. If this flag is 1, indicating that the list-1 MVD is equal to zero (and thus not signaled), a bidirectional prediction flag referred to as "BiDirPredFlag" can be set to 0, meaning that there is no bidirectional prediction. Otherwise, if mvd_l1_zero_flag is zero, and if the nearest reference picture in list-0 and the nearest reference picture in list-1 form a forward and backward pair of reference pictures or a backward and forward pair of reference pictures, BiDirPredFlag can be set to 1, and both the list-0 and list-1 reference pictures are short-term reference pictures. Otherwise, BiDirPredFlag is set to 0. A BiDirPredFlag of 1 indicates that a symmetric mode flag is additionally signaled in the bitstream. The decoder can extract the symmetric mode flag from the bitstream when BiDirPredFlag is 1. The symmetric mode flag can, for example, be signaled (if needed) at the CU level, and it indicates whether the symmetric MVD coding mode is being used for the corresponding CU. When the symmetric mode flag is 1, the symmetric MVD coding mode is used, and only the reference picture indices of both list-0 and list-1 (referred to as "mvp_l0_flag" and "mvp_l1_flag") and the MVD associated with list-0 (referred to as "MVD0") are signaled; the other motion vector difference, "MVD1", is derived rather than signaled. For example, MVD1 can be derived as -MVD0. As such, only one MVD is signaled in the example symmetric MVD mode.

In some other example implementations of MV prediction, a harmonized scheme can be used to implement the general merge mode, MMVD, and some other types of MV prediction, for both single-reference mode and compound-reference mode MV prediction. Various syntax elements can be used to signal the manner in which the MV of a current block is predicted.
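The flag derivation and the MVD1 = -MVD0 rule can be sketched as below. The sketch treats the short-term status of both reference pictures as a condition for setting the flag, which is an assumption about how the passage should be read; the function and parameter names other than `mvd_l1_zero_flag` are hypothetical.

```python
def derive_bidir_pred_flag(mvd_l1_zero_flag, poc_cur,
                           poc_l0_nearest, poc_l1_nearest, both_short_term):
    """Return 1 when the symmetric mode flag may be signaled, else 0."""
    if mvd_l1_zero_flag:
        return 0
    forward_backward = poc_l0_nearest < poc_cur < poc_l1_nearest
    backward_forward = poc_l1_nearest < poc_cur < poc_l0_nearest
    if (forward_backward or backward_forward) and both_short_term:
        return 1
    return 0


def symmetric_mvds(mvd0):
    """In symmetric MVD mode only MVD0 is signaled; MVD1 is its negation."""
    return mvd0, (-mvd0[0], -mvd0[1])


flag = derive_bidir_pred_flag(0, poc_cur=8, poc_l0_nearest=4,
                              poc_l1_nearest=12, both_short_term=True)
mvd0, mvd1 = symmetric_mvds((3, -2))
```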

For example, for the single-reference mode, the following MV prediction modes can be signaled:

NEARMV – Uses one of the motion vector predictors (MVPs) in the list indicated by a Dynamic Reference List (DRL) index directly, without any MVD.

NEWMV – Uses one of the motion vector predictors (MVPs) in the list signaled by a DRL index as a reference and applies a delta to the MVP (e.g., using an MVD).

GLOBALMV – Uses a motion vector based on frame-level global motion parameters.

Likewise, for the compound-reference inter-prediction mode using two reference frames corresponding to two MVs to be predicted, the following MV prediction modes can be signaled:

NEAR_NEARMV – For each of the two MVs to be predicted, uses one of the motion vector predictors (MVPs) in the list signaled by a DRL index, without an MVD.

NEAR_NEWMV – For predicting the first of the two motion vectors, uses one of the motion vector predictors (MVPs) in the list signaled by a DRL index as a reference MV, without an MVD; for predicting the second of the two motion vectors, uses one of the MVPs in the list signaled by a DRL index as a reference MV in conjunction with an additionally signaled delta MV (an MVD).

NEW_NEARMV – For predicting the second of the two motion vectors, uses one of the motion vector predictors (MVPs) in the list signaled by a DRL index as a reference MV, without an MVD; for predicting the first of the two motion vectors, uses one of the MVPs in the list signaled by a DRL index as a reference MV in conjunction with an additionally signaled delta MV (an MVD).

NEW_NEWMV – Uses one of the motion vector predictors (MVPs) in the list signaled by a DRL index as a reference MV, in conjunction with an additionally signaled delta MV, to predict each of the two MVs.

GLOBAL_GLOBALMV – Uses the MVs from each reference based on their frame-level global motion parameters.

The term "NEAR" above thus refers to MV prediction using a reference MV without an MVD, as in a general merge mode, whereas the term "NEW" refers to MV prediction that uses a reference MV and offsets it with a signaled MVD, as in an MMVD mode. For compound inter-prediction, both the reference base motion vectors and the motion vector deltas above can generally be different or independent between the two references, even though they may be correlated, and such correlation can be leveraged to reduce the amount of information needed to signal the two motion vector deltas. In such situations, joint signaling of the two MVDs can be implemented and indicated in the bitstream.
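The NEAR/NEW naming can be summarized as a table that records, for each motion vector of a mode, whether a signaled MVD is added to the selected MVP. The sketch below is illustrative: the GLOBALMV modes are left out because they depend on global motion parameters, and the names `USES_MVD` and `reconstruct_mvs` are hypothetical.

```python
# For each mode: one entry per motion vector, True if an MVD is added.
USES_MVD = {
    "NEARMV": (False,),
    "NEWMV": (True,),
    "NEAR_NEARMV": (False, False),
    "NEAR_NEWMV": (False, True),
    "NEW_NEARMV": (True, False),
    "NEW_NEWMV": (True, True),
}


def reconstruct_mvs(mode, mvps, mvds):
    """mvps: the MVPs selected via the DRL index, one per motion vector.
    mvds: the signaled MVDs, in order, for the "NEW" motion vectors only."""
    mvd_iter = iter(mvds)
    mvs = []
    for mvp, uses_mvd in zip(mvps, USES_MVD[mode]):
        if uses_mvd:
            dx, dy = next(mvd_iter)
            mvs.append((mvp[0] + dx, mvp[1] + dy))
        else:
            mvs.append(mvp)
    return mvs


near_new = reconstruct_mvs("NEAR_NEWMV", [(1, 2), (3, 4)], [(1, 1)])
new_new = reconstruct_mvs("NEW_NEWMV", [(1, 2), (3, 4)], [(1, 1), (-1, 0)])
```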

The dynamic reference list (DRL) mentioned above may be used to hold a set of indexed motion vectors that is dynamically maintained and treated as the candidate motion vector predictors.

In some example implementations, a predefined resolution for the MVD may be allowed. For example, a motion vector precision (or accuracy) of 1/8 pixel may be allowed. The MVDs described above for the various MV prediction modes may be constructed and signaled in the bitstream in various ways. In some implementations, various syntax elements may be used to signal the motion vector differences described above for reference frame list 0 or list 1.

For example, a syntax element referred to as "mv_joint" may specify which components of the associated motion vector difference are non-zero. For an MVD, this is jointly signaled for all of its non-zero components. For example, mv_joint may take the following values.

0 may indicate that there is no non-zero MVD along either the horizontal or the vertical direction;

1 may indicate that there is a non-zero MVD only along the horizontal direction;

2 may indicate that there is a non-zero MVD only along the vertical direction;

3 may indicate that there is a non-zero MVD along both the horizontal and the vertical directions.
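The four mv_joint values listed above can be read as two independent flags. The following is a minimal sketch of that mapping; the function name `mv_joint_components` is hypothetical and is not taken from any codec library.

```python
def mv_joint_components(mv_joint: int) -> tuple[bool, bool]:
    """Return (has_horizontal, has_vertical) for an mv_joint value of 0..3.

    Hypothetical helper: it only restates the value list in the text.
    """
    if mv_joint not in (0, 1, 2, 3):
        raise ValueError(f"mv_joint must be 0..3, got {mv_joint}")
    has_horizontal = mv_joint in (1, 3)  # 1: horizontal only, 3: both
    has_vertical = mv_joint in (2, 3)    # 2: vertical only, 3: both
    return has_horizontal, has_vertical
```

A decoder following this scheme would parse further per-component syntax elements only for the components whose flag is true.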

When the "mv_joint" syntax element signaled for an MVD indicates that there is no non-zero MVD component, no further MVD information may be signaled. If the "mv_joint" syntax element indicates that there are one or two non-zero components, additional syntax elements may be signaled for each non-zero MVD component, as described below.

For example, a syntax element referred to as "mv_sign" may be used to additionally specify whether the corresponding motion vector difference component is positive or negative.

As another example, a syntax element referred to as "mv_class" may be used to specify, for the corresponding non-zero MVD component, the class of the motion vector difference within a predefined set of classes. The predefined classes may, for example, divide the contiguous magnitude space of the motion vector difference into non-overlapping ranges, each range corresponding to one MVD class. A signaled MVD class thus indicates the magnitude range of the corresponding MVD component. In the example implementation shown in Table 4, a higher class corresponds to motion vector differences with a larger magnitude range. In Table 4, the notation (n, m] denotes a range of motion vector differences greater than n pixels and smaller than or equal to m pixels.

Table 4. Magnitude classes for motion vector difference

In some other examples, a syntax element referred to as "mv_bit" may further be used to specify the integer part of the offset between the non-zero motion vector difference component and the starting magnitude of the correspondingly signaled MV class. The number of bits needed in "mv_bit" to cover the full range of each MVD class may vary as a function of the MV class. For example, in the implementation of Table 4, MV_CLASS_0 and MV_CLASS_1 may each need only a single bit to indicate an integer pixel offset of 1 or 2 from the start of the class range (starting from an MVD of 0 for MV_CLASS_0). Each higher MV_CLASS may need one more "mv_bit" bit than the preceding MV_CLASS.
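The relation between class ranges and the width of "mv_bit" can be sketched as follows. The range table is an assumption reconstructed from the values quoted elsewhere in this description (for example, (32, 64] for MV_CLASS_5 and upper ends of 8 through 2048 for MV_CLASS_2 through MV_CLASS_10), with the MV_CLASS_0 and MV_CLASS_1 boundary inferred from the single-bit remark above; the names `MV_CLASS_RANGES`, `mv_class_of` and `mv_bit_count` are hypothetical.

```python
# Assumed reconstruction of the Table 4 ranges: class k covers (n, m] pixels.
MV_CLASS_RANGES = {0: (0, 2), **{k: (2 ** k, 2 ** (k + 1)) for k in range(1, 11)}}


def mv_class_of(magnitude_px: int) -> int:
    """Return the class whose (n, m] range contains an integer magnitude."""
    for mv_class, (n, m) in MV_CLASS_RANGES.items():
        if n < magnitude_px <= m:
            return mv_class
    raise ValueError(f"magnitude {magnitude_px} is outside all class ranges")


def mv_bit_count(mv_class: int) -> int:
    """Bits needed to index every integer-pixel value inside the class range."""
    n, m = MV_CLASS_RANGES[mv_class]
    return (m - n - 1).bit_length()
```

Under these assumed ranges, classes 0 and 1 each need one bit, and every class from 2 upward needs one bit more than the class before it.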

In some other examples, a syntax element referred to as "mv_fr" may further be used to specify the first 2 fractional bits of the motion vector difference for the corresponding non-zero MVD component, while a syntax element referred to as "mv_hp" may be used to specify the third fractional bit (the high-resolution bit) of the motion vector difference for that component. The two "mv_fr" bits effectively provide an MVD resolution of 1/4 pixel, and the "mv_hp" bit may further provide a resolution of 1/8 pixel. In some other implementations, more than one "mv_hp" bit may be used to provide an MVD pixel resolution finer than 1/8 pixel. In some example implementations, additional flags may be signaled in the bitstream at one or more levels to indicate whether an MVD resolution of 1/8 pixel or finer is supported. If an MVD resolution is not applied to a particular coding unit, the syntax elements above that correspond to the unsupported MVD resolution may not be signaled in the bitstream.
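The way the two "mv_fr" bits and the single "mv_hp" bit combine into a fractional offset can be illustrated with a small sketch. It assumes that "mv_fr" carries the two most significant fractional bits and "mv_hp" the third, as the paragraph above describes; the function name `fractional_mvd` is hypothetical.

```python
from fractions import Fraction


def fractional_mvd(mv_fr: int, mv_hp: int = 0) -> Fraction:
    """Fractional part of an MVD component, in pixels.

    mv_fr: 2-bit value (0..3), each step worth 1/4 pixel.
    mv_hp: 1-bit value (0 or 1), worth an extra 1/8 pixel.
    """
    if not 0 <= mv_fr <= 3:
        raise ValueError("mv_fr must fit in 2 bits")
    if mv_hp not in (0, 1):
        raise ValueError("mv_hp must be a single bit")
    return Fraction(mv_fr, 4) + Fraction(mv_hp, 8)
```

With "mv_hp" left at 0 the result stays on the 1/4-pixel grid; setting it to 1 reaches the odd multiples of 1/8 pixel.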

In the example implementations above, the fractional resolution may be independent of the MVD class. In other words, regardless of the magnitude of the motion vector difference, a predefined number of "mv_fr" and "mv_hp" bits may be used to signal the fractional part of a non-zero MVD component, so that similar motion vector resolution options are available for every class.

However, in some other example implementations, the resolution of the motion vector difference may be differentiated across the MVD magnitude classes. Specifically, for MVDs of larger magnitude in higher MVD classes, a high MVD resolution may not provide a statistically significant improvement in compression efficiency. The MVD may therefore be coded with decreasing resolution (integer pixel resolution or fractional pixel resolution) for larger MVD magnitude ranges, which correspond to higher MVD magnitude classes. Likewise, the MVD may be coded with decreasing resolution (integer pixel resolution or fractional pixel resolution) for generally larger MVD values. Such MVD-class-dependent or MVD-magnitude-dependent MVD resolution may generally be referred to as adaptive MVD resolution. Adaptive MVD resolution may be implemented in the various manners described in the example implementations below to achieve overall better compression efficiency. In particular, because it is statistically observed that treating the MVD resolution of large-magnitude or high-class MVDs, in a non-adaptive manner, at a level similar to that of small-magnitude or low-class MVDs may not significantly increase inter-prediction residual coding efficiency, the number of signaling bits saved by using lower-precision MVDs may exceed the additional bits needed to code the inter-prediction residual that results from such lower-precision MVDs.

In some general example implementations, the pixel resolution or precision of the MVD may decrease, or at least not increase, as the MVD class increases. A lower MVD pixel resolution corresponds to a coarser MVD (that is, a larger step size from one MVD level to the next). In some implementations, the correspondence between MVD pixel resolution and MVD class may be specified, predefined, or preconfigured, and therefore need not be signaled in the encoded bitstream.

In some example implementations, the MV classes of Table 4 may each be associated with a different MVD pixel resolution.

In some example implementations, each MVD class may be associated with a single allowed resolution. In some other implementations, one or more MVD classes may be associated with two or more optional MVD pixel resolutions. In that case, the signaling of such an MVD class in the bitstream may be followed by additional signaling that indicates which of the optional pixel resolutions is selected for the current MVD component.

In some example implementations, the adaptively allowed MVD pixel resolutions may include, but are not limited to, 1/64 pixel, 1/32 pixel, 1/16 pixel, 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixels, 4 pixels, and so on (in descending order of resolution). Each ascending MVD class may thus be associated with one of these resolutions in a non-ascending manner. In some implementations, an MVD class may be associated with two or more resolutions, and its higher resolution may be lower than or equal to the lower resolution of the preceding MVD class. For example, if MV_CLASS_3 of Table 4 is associated with optional 1-pixel and 2-pixel resolutions, then the highest resolution that MV_CLASS_4 of Table 4 may be associated with would be 2 pixels. In some other implementations, the highest allowed resolution of an MV class may be higher than the lowest allowed resolution of the preceding (lower) MV class. In that case, however, the average of the allowed resolutions may only be non-ascending as the MV class ascends.
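The "non-ascending" association between ascending classes and resolutions can be checked mechanically. The sketch below assumes one allowed resolution per class, expressed as the MVD step size in pixels (so a finer resolution is a smaller step); the function name `resolutions_non_ascending` is hypothetical.

```python
from fractions import Fraction


def resolutions_non_ascending(step_per_class) -> bool:
    """True if resolution never gets finer as the class index increases.

    step_per_class[k] is the MVD step size, in pixels, assumed for class k.
    A non-ascending resolution means the step size never shrinks.
    """
    steps = list(step_per_class)
    return all(a <= b for a, b in zip(steps, steps[1:]))
```

For instance, the sequence 1/8, 1/4, 1, 1, 2 pixels satisfies the constraint, whereas a sequence that goes from 1 pixel back to 1/2 pixel does not.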

In some implementations, when a fractional pixel resolution finer than 1/8 pixel is allowed, the "mv_fr" and "mv_hp" signaling may be correspondingly extended to more than 3 fractional bits in total.

In some example implementations, fractional pixel resolution may be allowed only for MVD classes lower than or equal to a threshold MVD class. For example, fractional pixel resolution may be allowed only for MV_CLASS_0 and disallowed for all other MV classes of Table 4. Likewise, fractional pixel resolution may be allowed only for MVD classes lower than or equal to any other one of the MV classes of Table 4. For the MVD classes above the threshold MVD class, only integer pixel resolutions are allowed for the MVD. In this manner, fractional resolution signaling, such as one or more of the "mv_fr" and/or "mv_hp" bits, need not be signaled for an MVD whose signaled class is above the threshold MVD class. For MVD classes with a resolution lower (coarser) than 1 pixel, the number of bits in the "mv_bit" signaling may be further reduced. For example, for MV_CLASS_5 in Table 4, the range of the MVD pixel offset is (32, 64], so 5 bits are needed to signal the entire range at 1-pixel resolution. However, if MV_CLASS_5 is associated with a 2-pixel MVD resolution, then 4 bits rather than 5 may be needed for "mv_bit", and neither "mv_fr" nor "mv_hp" needs to be signaled after "mv_class" is signaled as MV_CLASS_5.
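The bit saving in the MV_CLASS_5 example follows from counting how many values remain in the range at a given step size. A minimal sketch, with the hypothetical helper name `mv_bit_count_at_resolution` and the assumption that the step size divides the range width evenly:

```python
def mv_bit_count_at_resolution(range_start: int, range_end: int, step_px: int) -> int:
    """Bits needed to index the values of (range_start, range_end] at step_px.

    All arguments are in whole pixels; step_px is the integer MVD resolution.
    """
    width = range_end - range_start
    if step_px <= 0 or width <= 0 or width % step_px:
        raise ValueError("step must be positive and divide the range width")
    value_count = width // step_px
    return (value_count - 1).bit_length()
```

For the range (32, 64] this gives 5 bits at 1-pixel resolution (32 values) and 4 bits at 2-pixel resolution (16 values), matching the figures in the text.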

In some example implementations, fractional pixel resolution may be allowed only for an MVD whose integer value is below a threshold integer pixel value. For example, fractional pixel resolution may be allowed only for MVDs smaller than 5 pixels. In that example, fractional resolution may be allowed for MV_CLASS_0 and MV_CLASS_1 of Table 4 and disallowed for all other MV classes. As another example, fractional pixel resolution may be allowed only for MVDs smaller than 7 pixels. In that example, fractional resolution may be allowed for MV_CLASS_0 and MV_CLASS_1 of Table 4 (whose ranges are below 5 pixels) and disallowed for MV_CLASS_3 and higher (whose ranges are above 5 pixels). For an MVD belonging to MV_CLASS_2, whose pixel range encompasses 5 pixels, fractional pixel resolution may be allowed or disallowed depending on the "mv_bit" value. If the "mv_bit" value is signaled as 1 or 2 (so that the integer part of the signaled MVD is 5 or 6, calculated as the start of the pixel range of MV_CLASS_2 plus the offset of 1 or 2 indicated by "mv_bit"), fractional pixel resolution may be allowed. Otherwise, if the "mv_bit" value is signaled as 3 or 4 (so that the integer part of the signaled MVD is 7 or 8), fractional pixel resolution may be disallowed.
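The MV_CLASS_2 case above can be worked through numerically. This sketch assumes that the class range starts at 4 pixels and that the offset carried by "mv_bit" is counted from 1, as in the paragraph; the function name `fractional_allowed` is hypothetical.

```python
def fractional_allowed(class_start_px: int, mv_bit_offset: int, threshold_px: int) -> bool:
    """Decide whether fractional MVD bits may follow, from the integer part.

    class_start_px: start of the class range, e.g. 4 for the assumed (4, 8].
    mv_bit_offset:  1-based integer offset signaled by mv_bit.
    threshold_px:   fractional resolution is allowed only below this value.
    """
    integer_part = class_start_px + mv_bit_offset
    return integer_part < threshold_px
```

With a threshold of 7 pixels, offsets 1 and 2 give integer parts 5 and 6 and keep fractional resolution, while offsets 3 and 4 give 7 and 8 and do not.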

In some other implementations, only a single MVD value may be allowed for MV classes equal to or higher than a threshold MV class. For example, such a threshold MV class may be MV_CLASS_2. MV_CLASS_2 and above may then be allowed only a single MVD value, with no fractional pixel resolution. The single allowed MVD value for each of these MV classes may be predefined. In some examples, the single allowed value may be the upper end of the respective range of these MV classes in Table 4. For example, MV_CLASS_2 through MV_CLASS_10 may be higher than or equal to the threshold class MV_CLASS_2, and the single allowed MVD values for these classes may be predefined as 8, 16, 32, 64, 128, 256, 512, 1024, and 2048, respectively. In some other examples, the single allowed value may be the middle value of the respective range of these MV classes in Table 4. For example, MV_CLASS_2 through MV_CLASS_10 may be at or above the class threshold, and the single allowed MVD values for these classes may be predefined as 6, 12, 24, 48, 96, 192, 384, 768, and 1536, respectively. Any other value within the respective range may also be defined as the single allowed MVD value for each MVD class.

In the implementations above, when the signaled "mv_class" is equal to or higher than the predefined MVD class threshold, the "mv_class" signaling alone is sufficient to determine the MVD value. The magnitude and direction of the MVD are then determined from "mv_class" and "mv_sign".
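Deriving a complete MVD component from "mv_class" and "mv_sign" alone can be sketched as below. The class ranges are assumed to be (2^k, 2^(k+1)] for class k of 2 and above, which agrees with the upper-end values 8 through 2048 quoted above; the function name `single_allowed_mvd` and its parameters are hypothetical.

```python
def single_allowed_mvd(mv_class: int, negative: bool,
                       threshold_class: int = 2, use_midpoint: bool = False) -> int:
    """Return the signed MVD component implied by mv_class and mv_sign only.

    Applies only to classes at or above threshold_class; lower classes would
    still need mv_bit (and possibly fractional bits), which is not modeled.
    """
    if not threshold_class <= mv_class <= 10:
        raise ValueError("mv_class must lie between threshold_class and 10")
    low, high = 2 ** mv_class, 2 ** (mv_class + 1)  # assumed range (low, high]
    magnitude = (low + high) // 2 if use_midpoint else high
    return -magnitude if negative else magnitude
```

Choosing the upper end reproduces 8, 16, ..., 2048 for classes 2 through 10, and choosing the midpoint gives 6, 12, ..., 1536 for the same classes.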

When the MVD is signaled for only one reference frame (from either reference frame list 0 or list 1, but not both), or is jointly signaled for two reference frames, the precision (or resolution) of the MVD may depend on the class of the associated motion vector difference in Table 4 and/or on the magnitude of the MVD.

In some other implementations, the pixel resolution or precision of the MVD may decrease, or at least not increase, as the MVD magnitude increases. For example, the pixel resolution may depend on the integer part of the MVD magnitude. In some implementations, fractional pixel resolution may be allowed only for MVD magnitudes smaller than or equal to a magnitude threshold. A decoder may first extract the integer part of the MVD magnitude from the bitstream. It may then determine the pixel resolution and, from that, decide whether any fractional MVD is present in the bitstream and needs to be parsed (for example, if fractional pixel resolution is not allowed for the particular extracted integer MVD magnitude, the bitstream may contain no fractional MVD bits to extract). The example implementations above for MVD-class-dependent adaptive MVD pixel resolution also apply to MVD-magnitude-dependent adaptive MVD pixel resolution. As a particular example, MVD classes above or encompassing the magnitude threshold may be allowed only one predefined value.

The various example implementations above are described for single-reference mode. They also apply to the example NEW_NEARMV, NEAR_NEWMV, and/or NEW_NEWMV modes in compound prediction under MMVD, and more generally to any signaling of an MVD.

FIG. 18 shows a flowchart 1800 of an example method that follows the principles of the implementations of adaptive MVD resolution described above. The example decoding method flow starts at S1801. In S1810, a video stream is received, and it is determined that a motion vector difference (MVD) between a motion vector associated with the inter-predicted video block and a reference motion vector is signaled in the video stream, where the reference motion vector corresponds to one reference picture in only one of reference frame list 0 and reference frame list 1, unless the MVD is jointly signaled for two reference pictures. In S1820, an indication of the magnitude range of the MVD, among a plurality of predefined motion vector difference magnitude ranges, is obtained from the video stream. In S1830, the pixel resolution of the MVD is determined according to the magnitude range. In S1840, additional MVD information in the video stream is identified based on the pixel resolution. In S1850, the additional MVD information is extracted from the video stream. In S1860, the inter-predicted video block is decoded based on the pixel resolution, the additional MVD information, the reference motion vector, and the reference frame associated with the motion vector. The example method flow ends at S1899.
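Steps S1820 through S1850 can be sketched for a single MVD component as follows. This is an illustration under stated assumptions rather than a decoder: the class-to-resolution mapping `CLASS_RESOLUTION` is invented for the example, the dictionary `symbols` stands in for syntax elements that a real decoder would entropy-decode from the bitstream, and the names `extra_elements_for` and `parse_mvd_component` are hypothetical. The reconstruction in S1860 is not modeled.

```python
from fractions import Fraction

# Assumed mapping from MVD class to step size in pixels (illustrative only).
CLASS_RESOLUTION = {0: Fraction(1, 8), 1: Fraction(1, 4)}
DEFAULT_RESOLUTION = Fraction(1)  # all other classes: integer-pixel only


def extra_elements_for(resolution: Fraction) -> list[str]:
    """S1840: name the additional syntax elements expected at this resolution."""
    names = ["mv_bit"]
    if resolution <= Fraction(1, 4):
        names.append("mv_fr")   # two fractional bits, 1/4-pixel steps
    if resolution <= Fraction(1, 8):
        names.append("mv_hp")   # third fractional bit, 1/8-pixel steps
    return names


def parse_mvd_component(symbols: dict) -> tuple[Fraction, dict]:
    """Run S1820..S1850 on stand-in syntax elements for one MVD component."""
    mv_class = symbols["mv_class"]                                   # S1820
    resolution = CLASS_RESOLUTION.get(mv_class, DEFAULT_RESOLUTION)  # S1830
    needed = extra_elements_for(resolution)                          # S1840
    extra = {name: symbols[name] for name in needed}                 # S1850
    return resolution, extra
```

The point of the sketch is the ordering: the class is read first, the resolution follows from it without further signaling, and only then is it known which remaining elements are present to be parsed.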

FIG. 19 shows another flowchart 1900 of an example method that follows the principles of the implementations of adaptive MVD resolution described above. The example method flow starts at S1901. In S1910, a video stream is received, and it is determined that a motion vector difference (MVD) between a motion vector associated with an inter-predicted video block and a reference motion vector is signaled in the video stream, where the reference motion vector corresponds to one reference picture in only one of reference frame list 0 and reference frame list 1, unless the MVD is jointly signaled for two reference pictures. In S1920, the integer part of the magnitude of the MVD is extracted from the video stream. In S1930, the pixel resolution of the MVD is determined according to the integer part of the magnitude of the MVD. In S1940, additional MVD information in the video stream is identified based on the pixel resolution. In S1950, the inter-predicted video block is decoded based on the pixel resolution, the integer part of the magnitude of the MVD, the additional MVD information, the reference motion vector, and the reference frame associated with the motion vector. The example method flow ends at S1999.

In the embodiments and implementations of this disclosure, any steps and/or operations may be combined or arranged in any number or order, as desired. Two or more of the steps and/or operations may be performed in parallel. The embodiments and implementations in this disclosure may be used separately or combined in any order. Further, each of the methods (or embodiments), an encoder, and a decoder may be implemented by processing circuitry (for example, one or more processors or one or more integrated circuits). In one example, the one or more processors execute a program stored in a non-transitory computer-readable medium. The embodiments in this disclosure may be applied to a luma block or a chroma block. The term block may be interpreted as a prediction block, a coding block, or a coding unit (CU). The term block may also be used here to refer to a transform block. In the following, a block size may refer to the block width or height, the maximum of the width and height, the minimum of the width and height, the area size (width * height), or the aspect ratio of the block (width:height or height:width).

The techniques described above can be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media. For example, FIG. 20 shows a computer system (2000) suitable for implementing certain embodiments of the disclosed subject matter.

The computer software can be coded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or like mechanisms to create code comprising instructions that can be executed directly, or through interpretation, microcode execution, and the like, by one or more computer central processing units (CPUs), graphics processing units (GPUs), and the like.

The instructions can be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smartphones, gaming devices, Internet of Things devices, and the like.

The components shown in FIG. 20 for the computer system (2000) are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of this disclosure. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of the components illustrated in the exemplary embodiment of the computer system (2000).

The computer system (2000) may include certain human interface input devices. Such a human interface input device may be responsive to input by one or more human users through, for example, tactile input (such as keystrokes, swipes, or data glove movements), audio input (such as voice or clapping), visual input (such as gestures), or olfactory input (not depicted). The human interface devices can also be used to capture certain media not necessarily directly related to conscious input by a human, such as audio (for example, speech, music, or ambient sound), images (for example, scanned images or photographic images obtained from a still-image camera), and video (for example, two-dimensional video or three-dimensional video including stereoscopic video).

The human interface input devices may include one or more of the following (only one of each depicted): a keyboard (2001), a mouse (2002), a trackpad (2003), a touch screen (2010), a data glove (not shown), a joystick (2005), a microphone (2006), a scanner (2007), and a camera (2008).

The computer system (2000) may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include tactile output devices (for example, tactile feedback by the touch screen (2010), the data glove (not shown), or the joystick (2005), although there can also be tactile feedback devices that do not serve as input devices), audio output devices (such as speakers (2009) and headphones (not depicted)), visual output devices (such as screens (2010), including CRT screens, LCD screens, plasma screens, and OLED screens, each with or without touch-screen input capability and each with or without tactile feedback capability, some of which may be capable of outputting two-dimensional visual output or more than three-dimensional output through means such as stereographic output; virtual-reality glasses (not depicted); holographic displays and smoke tanks (not depicted)), and printers (not depicted).

The computer system (2000) may also include human-accessible storage devices and their associated media, such as optical media including a CD/DVD ROM/RW drive (2020) with CD/DVD or similar media (2021), a thumb drive (2022), a removable hard drive or solid-state drive (2023), legacy magnetic media such as tape and floppy disk (not depicted), specialized ROM/ASIC/PLD-based devices such as security dongles (not depicted), and the like.

Those skilled in the art should also understand that the term "computer-readable medium" as used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.

The computer system (2000) may also include an interface (2054) to one or more communication networks (2055). Networks can, for example, be wireless, wireline, or optical. Networks can further be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of networks include local area networks such as Ethernet and wireless LANs; cellular networks including GSM, 3G, 4G, 5G, LTE, and the like; TV wireline or wireless wide-area digital networks including cable TV, satellite TV, and terrestrial broadcast TV; and vehicular and industrial networks including CANbus. Certain networks commonly require external network interface adapters attached to certain general-purpose data ports or peripheral buses (2049) (for example, USB ports of the computer system (2000)); others are commonly integrated into the core of the computer system (2000) by attachment to a system bus as described below (for example, an Ethernet interface in a PC computer system or a cellular network interface in a smartphone computer system). Using any of these networks, the computer system (2000) can communicate with other entities. Such communication can be uni-directional and receive-only (for example, broadcast TV), uni-directional and send-only (for example, CANbus to certain CANbus devices), or bi-directional, for example to other computer systems using local or wide-area digital networks. Certain protocols and protocol stacks can be used on each of those networks and network interfaces, as described above.

The aforementioned human interface devices, human-accessible storage devices, and network interfaces can be attached to a core (2040) of the computer system (2000).

The core (2040) can include one or more central processing units (CPUs) (2041), graphics processing units (GPUs) (2042), specialized programmable processing units in the form of field-programmable gate arrays (FPGAs) (2043), hardware accelerators (2044) for certain tasks, graphics adapters (2050), and so forth. These devices, along with read-only memory (ROM) (2045), random-access memory (RAM) (2046), and internal mass storage (2047) such as internal non-user-accessible hard drives and SSDs, may be connected through a system bus (2048). In some computer systems, the system bus (2048) can be accessible in the form of one or more physical plugs to enable extension by additional CPUs, GPUs, and the like. Peripheral devices can be attached either directly to the core's system bus (2048) or through a peripheral bus (2049). In one example, the screen (2010) can be connected to the graphics adapter (2050). Architectures for a peripheral bus include PCI, USB, and the like.

The CPUs (2041), GPUs (2042), FPGAs (2043), and accelerators (2044) can execute certain instructions that, in combination, make up the aforementioned computer code. That computer code can be stored in ROM (2045) or RAM (2046). Transitional data can also be stored in RAM (2046), whereas permanent data can be stored, for example, in the internal mass storage (2047). Fast storage in and retrieval from any of the memory devices can be enabled through the use of cache memory, which can be closely associated with one or more of the CPUs (2041), GPUs (2042), mass storage (2047), ROM (2045), RAM (2046), and the like.

The computer-readable media can have computer code thereon for performing various computer-implemented operations. The media and computer code can be those specially designed and constructed for the purposes of this disclosure, or they can be of a kind well known and available to those skilled in the computer software arts.

As a non-limiting example, a computer system having the architecture (2000), and specifically the core (2040), can provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media can be media associated with the user-accessible mass storage introduced above, as well as certain storage of the core (2040) that is of a non-transitory nature, such as the core-internal mass storage (2047) or ROM (2045). Software implementing various embodiments of this disclosure can be stored in such devices and executed by the core (2040). A computer-readable medium can include one or more memory devices or chips, according to particular needs. The software can cause the core (2040), and specifically the processors therein (including CPUs, GPUs, FPGAs, and the like), to execute particular processes or particular parts of particular processes described herein, including defining data structures stored in RAM (2046) and modifying such data structures according to the processes defined by the software. In addition or as an alternative, the computer system can provide functionality as a result of logic hardwired or otherwise embodied in a circuit (for example, the accelerator (2044)), which can operate in place of or together with software to execute particular processes or particular parts of particular processes described herein. Where appropriate, a reference to software can encompass logic, and vice versa. Where appropriate, a reference to a computer-readable medium can encompass a circuit (such as an integrated circuit (IC)) storing software for execution, a circuit embodying logic for execution, or both. This disclosure encompasses any suitable combination of hardware and software.

While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents that fall within the scope of this disclosure. It will therefore be appreciated that those skilled in the art will be able to devise numerous systems and methods that, although not explicitly shown or described herein, embody the principles of this disclosure and thus fall within its spirit and scope.

Appendix A: Abbreviations

JEM: Joint Exploration Model

VVC: Versatile Video Coding

BMS: Benchmark Set

MV: Motion Vector

HEVC: High Efficiency Video Coding

SEI: Supplemental Enhancement Information

VUI: Video Usability Information

GOPs: Groups of Pictures

TUs: Transform Units

PUs: Prediction Units

CTUs: Coding Tree Units

CTBs: Coding Tree Blocks

PBs: Prediction Blocks

HRD: Hypothetical Reference Decoder

SNR: Signal-to-Noise Ratio

CPUs: Central Processing Units

GPUs: Graphics Processing Units

CRT: Cathode Ray Tube

LCD: Liquid-Crystal Display

OLED: Organic Light-Emitting Diode

CD: Compact Disc

DVD: Digital Video Disc

ROM: Read-Only Memory

RAM: Random Access Memory

ASIC: Application-Specific Integrated Circuit

PLD: Programmable Logic Device

LAN: Local Area Network

GSM: Global System for Mobile Communications

LTE: Long-Term Evolution

CANBus: Controller Area Network Bus

USB: Universal Serial Bus

PCI: Peripheral Component Interconnect

FPGA: Field Programmable Gate Array

SSD: Solid-State Drive

IC: Integrated Circuit

HDR: High Dynamic Range

SDR: Standard Dynamic Range

JVET: Joint Video Exploration Team

MPM: Most Probable Mode

WAIP: Wide-Angle Intra Prediction

CU: Coding Unit

PU: Prediction Unit

TU: Transform Unit

CTU: Coding Tree Unit

PDPC: Position Dependent Prediction Combination

ISP: Intra Sub-Partitions

SPS: Sequence Parameter Set

PPS: Picture Parameter Set

APS: Adaptation Parameter Set

VPS: Video Parameter Set

DPS: Decoding Parameter Set

ALF: Adaptive Loop Filter

SAO: Sample Adaptive Offset

CC-ALF: Cross-Component Adaptive Loop Filter

CDEF: Constrained Directional Enhancement Filter

CCSO: Cross-Component Sample Offset

LSO: Local Sample Offset

LR: Loop Restoration Filter

AV1: AOMedia Video 1

AV2: AOMedia Video 2

MVD: Motion Vector Difference

CfL: Chroma from Luma

SDT: Semi Decoupled Tree

SDP: Semi Decoupled Partitioning

SST: Semi Separate Tree

SB: Super Block

IBC (or IntraBC): Intra Block Copy

CDF: Cumulative Density Function

SCC: Screen Content Coding

GBI: Generalized Bi-prediction

BCW: Bi-prediction with CU-level Weights

CIIP: Combined Intra-Inter Prediction

POC: Picture Order Count

RPS: Reference Picture Set

DPB: Decoded Picture Buffer

MMVD: Merge Mode with Motion Vector Difference

Claims

1. A method for decoding an inter-frame predicted video block of a video stream, characterized in that the method comprises:

receiving the video stream;

determining that a motion vector difference (MVD) between a motion vector associated with the inter-frame predicted video block and a reference motion vector is written into the video stream, wherein the reference motion vector corresponds to only one of the reference frames in reference frame list 0 and reference frame list 1, unless the MVD is written jointly for both reference frames;

obtaining, from the video stream, an indication of a size range of the MVD among a plurality of predefined MVD size ranges, wherein obtaining the indication of the size range of the MVD comprises: extracting a first predefined syntax element from the video stream, the first predefined syntax element indicating an MVD category of the MVD in a predefined MVD category set, a lower MVD category corresponding to a smaller MVD size range; and determining the size range of the MVD based on the MVD category;

determining a pixel resolution of the MVD based on the size range of the MVD, wherein the pixel resolution of the MVD depends on the MVD category in descending order;

identifying additional MVD information in the video stream based on the pixel resolution of the MVD, wherein the additional MVD information indicates a selectable pixel resolution chosen for the MVD;

extracting the additional MVD information from the video stream; and

decoding the inter-frame predicted video block based on the pixel resolution of the MVD, the additional MVD information, the reference motion vector, and the reference frame associated with the motion vector.

2. The method according to claim 1, wherein the pixel resolution of the MVD is 2^n pixels, where n is an integer in the range from -6 to 11, inclusive.
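
Illustrative note (not part of the claims): the mapping from an MVD category to a size range, and the 2^n resolution grid of claim 2, can be sketched as follows. The doubling table in which category k covers sizes in (2^k, 2^(k+1)] integer pixels, with category 0 covering (0, 2], is an assumption made here for illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def mvd_size_range(category: int) -> tuple[int, int]:
    """Return the assumed size range (low, high] in integer pixels.

    Assumption for illustration only: category 0 covers (0, 2] and each
    higher category k covers (2**k, 2**(k + 1)], so a lower category
    always corresponds to a smaller size range.
    """
    if category < 0:
        raise ValueError("category must be non-negative")
    if category == 0:
        return (0, 2)
    return (2 ** category, 2 ** (category + 1))


def pixel_resolution(n: int) -> Fraction:
    """Return the resolution 2**n pixels for an integer n in [-6, 11]."""
    if not -6 <= n <= 11:
        raise ValueError("n must lie between -6 and 11, inclusive")
    return Fraction(2) ** n
```

Under these assumptions, `pixel_resolution(-3)` is one eighth of a pixel and `mvd_size_range(3)` is the range (8, 16].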

3. The method according to claim 1, wherein determining the pixel resolution of the MVD based on the size range of the MVD comprises:

determining whether the size range of the MVD is higher than a preset MVD range threshold level;

when the size range of the MVD is determined to be higher than the preset MVD range threshold level, determining the pixel resolution of the MVD to be an integer number of pixels; and

when the size range of the MVD is determined not to be higher than the preset MVD range threshold level, determining the pixel resolution of the MVD to be a fraction of a pixel.

4. The method according to claim 3, wherein identifying the additional MVD information in the video stream based on the pixel resolution of the MVD comprises:

parsing the video stream according to a second predefined syntax element to obtain an integer pixel portion of the MVD; and

when the pixel resolution of the MVD is determined to be a fraction of a pixel, further parsing the video stream according to at least a third predefined syntax element to obtain a fractional pixel portion of the MVD.
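
Illustrative note (not part of the claims): the conditional parsing of claims 3 and 4 can be sketched as below. The threshold category, the 1/8-pixel fractional step, and the model of the video stream as an iterator of already entropy-decoded symbols are all assumptions for illustration.

```python
from fractions import Fraction
from typing import Iterator

THRESHOLD_CATEGORY = 1            # assumed preset MVD range threshold level
FRACTIONAL_STEP = Fraction(1, 8)  # assumed finest fractional resolution


def uses_fractional_resolution(category: int) -> bool:
    """Categories not higher than the threshold keep fractional precision."""
    return category <= THRESHOLD_CATEGORY


def parse_mvd_size(category: int, symbols: Iterator[int]) -> Fraction:
    """Read the integer part, then the fractional part only when allowed.

    `symbols` is a hypothetical stand-in for the parsed syntax elements.
    """
    integer_part = next(symbols)  # second predefined syntax element
    if uses_fractional_resolution(category):
        fraction_index = next(symbols)  # third predefined syntax element
        return integer_part + fraction_index * FRACTIONAL_STEP
    return Fraction(integer_part)
```

For a category above the threshold, no fractional symbol is consumed, so the next symbol in the stream remains available for whatever syntax element follows.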

5. The method according to claim 3, wherein the preset MVD range threshold level includes the lowest MVD category or the second-lowest MVD category in the predefined MVD category set.

6. The method according to claim 3, wherein each MVD category in the predefined MVD category set whose size range is higher than the preset MVD range threshold level is associated with a single allowed integer MVD pixel value.

7. The method according to claim 6, wherein the single allowed integer MVD pixel value includes the pixel value corresponding to the upper end of the corresponding size range.

8. The method according to claim 6, wherein the single allowed integer MVD pixel value includes the pixel value corresponding to the midpoint of the corresponding size range.
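
Illustrative note (not part of the claims): the two alternatives in claims 7 and 8 for the single allowed integer value of a category above the threshold can be sketched as below. The function name, the `rule` parameter, and the integer rounding of the midpoint are hypothetical choices for illustration.

```python
def single_allowed_value(low: int, high: int, rule: str = "upper") -> int:
    """Pick the one integer MVD size permitted for the range (low, high].

    rule="upper" selects the upper end of the range (as in claim 7);
    rule="midpoint" selects the midpoint, rounded down (as in claim 8).
    """
    if rule == "upper":
        return high
    if rule == "midpoint":
        return (low + high) // 2
    raise ValueError(f"unknown rule: {rule}")
```

With the example range (8, 16], the upper-end rule yields 16 and the midpoint rule yields 12; in either case no further size information needs to be read for that category.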

9. The method according to claim 4, wherein determining the pixel resolution of the MVD based on the size range of the MVD comprises:

determining whether the size range of the MVD is lower than, includes, or is higher than a preset MVD threshold value;

when the size range of the MVD is determined to be higher than the preset MVD threshold value, determining the pixel resolution of the MVD to be an integer number of pixels; and

when the size range of the MVD is determined not to be higher than the preset MVD threshold value, determining the pixel resolution of the MVD to be a fraction of a pixel.

10. The method according to claim 9, further comprising, when the size range of the MVD is determined to include the preset MVD threshold value:

extracting a second predefined syntax element from the video stream, the second predefined syntax element indicating an MVD size offset relative to the starting size of the MVD size range;

obtaining an integer size of the MVD based on the size range of the MVD and the MVD size offset;

when the integer size of the MVD is not higher than the preset MVD threshold value, determining the pixel resolution of the MVD to be fractional; and

when the integer size of the MVD is higher than the preset MVD threshold value, determining the pixel resolution of the MVD to be non-fractional.

11. The method according to claim 10, wherein, when the pixel resolution of the MVD is determined to be fractional, identifying the additional MVD information in the video stream based on the pixel resolution of the MVD comprises:

parsing the video stream according to a third predefined syntax element to obtain a fractional part of the MVD.
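
Illustrative note (not part of the claims): the three-way comparison of claims 9 and 10 can be sketched as below, treating a size range as the half-open interval (low, high]. The interval convention, the function name, and the example threshold of 3 pixels are assumptions for illustration.

```python
from typing import Optional


def resolution_kind(low: int, high: int, threshold: int,
                    offset: Optional[int] = None) -> str:
    """Classify the MVD resolution as "fractional" or "integer".

    The size range is (low, high]. When the range straddles the
    threshold, the offset from the start of the range is needed to
    recover the integer size before the decision can be made.
    """
    if high <= threshold:   # whole range is not higher than the threshold
        return "fractional"
    if low >= threshold:    # whole range is higher than the threshold
        return "integer"
    if offset is None:
        raise ValueError("offset required when the range includes the threshold")
    integer_size = low + offset
    return "fractional" if integer_size <= threshold else "integer"
```

With a threshold of 3, the range (2, 4] straddles the threshold: an offset of 1 gives an integer size of 3, which keeps fractional precision, while an offset of 2 gives 4, which does not.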

12. The method according to claim 9, wherein the preset MVD threshold value is less than 4 pixels.

13. An electronic device for decoding inter-frame predicted video blocks of a video stream, characterized in that the electronic device includes a memory for storing computer instructions and a processor communicating with the memory, wherein the processor, when executing the computer instructions, is configured to cause the electronic device to perform the method of any one of claims 1-12.

14. A method for processing a video stream, wherein the video stream is decoded based on the method of any one of claims 1 to 12.