CN111479110A - Fast Affine Motion Estimation Method for H.266/VVC - Google Patents
- Publication number: CN111479110A
- Application number: CN202010293694.8A
- Authority
- CN
- China
- Prior art keywords
- prediction
- motion estimation
- uni
- affine motion
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals (parent classes shared by all entries below)
- H04N19/109—Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
- H04N19/119—Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
- H04N19/51—Motion estimation or motion compensation
- H04N19/543—Motion estimation other than block-based using regions
- H04N19/567—Motion estimation based on rate distortion criteria
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present invention proposes a fast affine motion estimation method for H.266/VVC. Its steps are: compute the texture complexity of the current CU using the standard deviation, and classify the current CU as a static region or a non-static region according to this texture complexity; for a CU in a static region, skip affine motion estimation, predict the current CU using conventional motion estimation only, and select the best prediction direction mode by rate-distortion optimization; for a CU in a non-static region, classify the current CU with a trained random forest classifier (RFC) model, which outputs the best prediction direction mode. For CUs in static regions the invention skips affine motion estimation, which reduces the computational complexity; for CUs in non-static regions the invention predicts the prediction direction mode directly with the pre-trained model, avoiding the affine motion estimation computation and thereby reducing the complexity of the affine motion estimation module.
Description
Technical Field
The present invention relates to the technical field of image processing, and in particular to a fast affine motion estimation method for H.266/VVC.
Background Art
In today's information age, demand for video services such as 3D imaging, ultra-high-definition video, and virtual reality keeps growing, and the coding and transmission of high-definition video has become a research hotspot. With the development and refinement of the H.266/VVC standard, improvements in video processing efficiency have also driven the video industry forward and laid the foundation for the next generation of video coding technology. Highly dense data poses huge challenges for bandwidth and storage, and the current mainstream video coding standards can no longer satisfy emerging applications. The new-generation video coding standard H.266/VVC therefore came into being to meet requirements on video clarity, fluency, and real-time performance. The international standardization organizations ISO/IEC MPEG and ITU-T VCEG established the Joint Video Exploration Team (JVET), which is responsible for developing the next-generation video coding standard H.266/Versatile Video Coding (VVC). H.266/VVC targets high-definition video of 4K and above with a bit depth of mainly 10 bits, which differs from the positioning of H.265/HEVC; as a result, the maximum block size of the encoder becomes 128, and all pixels are processed internally at 10 bits, so even an 8-bit input sequence is converted to 10 bits for processing.
H.266/VVC uses a hybrid coding framework, and picture partitioning has evolved from a single, fixed partition toward diverse and flexible partitioning structures that adapt more efficiently to the encoding and decoding of high-resolution images. In addition, for new-generation video data, H.266/VVC extends elements of the H.265/HEVC encoder such as inter/intra prediction, prediction signal filtering, transform, quantization/scaling, and entropy coding, and adds new prediction modes that account for the characteristics of the new standard. Specifically, H.266/VVC inherits the motion estimation, motion compensation, and motion vector prediction techniques of H.265/HEVC inter coding and introduces several new techniques on this basis, such as extending the Merge mode with history-based motion vector prediction and adding new prediction methods such as affine transform technology, adaptive motion vector resolution, and motion-compensated prediction with 1/16-sample precision. The introduction of many advanced coding tools greatly improves the coding efficiency of the new-generation video coding standard H.266/VVC, but the rate-distortion cost calculations also significantly increase the computational complexity of H.266/VVC inter coding and thus significantly reduce the encoding speed of new-generation video.
The main principle of inter prediction is to find, for each pixel block of the current picture, a best matching block in previously coded pictures; this process is called motion estimation (ME). The picture used for prediction is called the reference picture, the reference block is the best matching block in the reference picture, the displacement from the reference block to the current block is called the motion vector (MV), and the difference between the current block and the reference block is called the prediction residual. The ME algorithm is the most critical algorithm in H.266/VVC video coding: it occupies more than half of the computation and the vast majority of the running time of the whole encoder and is the dominant factor determining video compression efficiency. By effectively removing temporal redundancy between consecutive pictures, ME has become a research hotspot in video compression technology. To improve compression efficiency, recent video codecs try to estimate motion for blocks of different shapes and sizes; with the multi-type tree, ME can even be performed on very thin blocks (e.g., blocks whose width is one eighth of their height). Therefore, among the modules applied within the multi-type tree (MTT), ME is the tool with the highest coding complexity in VVC. Because more advanced inter prediction schemes are executed recursively on the finely partitioned MTT blocks, the computational complexity of ME increases even more than in HEVC, and future video coding (FVC) additionally applies new techniques such as affine motion estimation within ME. Affine motion estimation (AME), which models non-translational motion such as rotation and zooming, is effective in rate-distortion (RD) performance at the expense of higher coding complexity. AME accounts for a large share of the total ME processing time, so reducing its complexity is very important; to reduce the complexity of the VTM encoder, the AME module needs to be accelerated.
In fact, a great deal of research has addressed the high complexity of H.265/HEVC inter prediction. J. Xiong et al. proposed a fast CU selection algorithm based on pyramid motion divergence that can skip individual inter CUs in H.265/HEVC in advance. L. Shen et al. proposed an adaptive inter-mode decision algorithm for H.265/HEVC that jointly exploits inter-layer and spatio-temporal correlations, combining three techniques: early SKIP mode decision based on statistical analysis, prediction-size-based mode decision, and rate-distortion (RD) cost-based mode decision. H. Lee et al. proposed an early SKIP mode decision method that uses the distortion characteristics obtained after computing the RD cost of the 2N×2N Merge mode. Q. Zhang et al., exploiting the high correlation between texture video and depth map content, proposed an early coding-unit depth-level decision and an adaptive mode decision method to reduce the computational complexity of video coding. Q. Hu et al. proposed a fast inter-mode decision algorithm based on the Neyman-Pearson rule, comprising an early SKIP mode decision and a fast CU size decision, to reduce H.265/HEVC complexity. Z. Pan et al. proposed a fast reference frame selection algorithm based on content similarity to reduce the computational complexity of inter prediction with multiple reference frames. Z. Pan et al. also proposed a fast ME method based on the correlation of the best motion vectors among prediction modes of different sizes to reduce the coding complexity of the H.265/HEVC encoder. J. Zhang et al. proposed a two-stage fast inter CU decision method based on a Bayesian approach and conditional random fields to reduce the coding complexity of the HEVC encoder. T. S. Kim et al. proposed a fast motion estimation algorithm for HEVC that supports a highly flexible block partition structure; by searching narrow regions around multiple accurate motion vector predictors, the algorithm greatly reduces its computational complexity. C. Ma et al. proposed a neural-network-based arithmetic coding method to encode the inter prediction information in HEVC. L. Shen et al. proposed a fast mode decision algorithm that reduces encoder complexity.
The proposed method exploits three tuned quantities: the conditional probability of the SKIP/Merge mode, the motion characteristics, and the mode complexity. D. Wang et al. proposed fast depth-level and inter-mode prediction algorithms that use inter-layer correlation, spatial correlation, and their degree of correlation to speed up HEVC inter coding. All of the above fast inter-frame methods can effectively reduce the computational complexity of H.265/HEVC while preserving coding performance. However, these methods were not designed for the H.266/VVC encoder, which adopts new inter prediction techniques such as more advanced affine motion-compensated prediction, the extended Merge mode, adaptive motion vector resolution, and the triangular partition mode. The spatial and inter-layer correlations exploited by H.265/HEVC inter prediction therefore differ considerably in H.266/VVC, and low-complexity inter coding methods need to be re-examined for H.266/VVC.
Only a small body of work has explored the high complexity of H.266/VVC inter coding. S. Park et al. proposed a method that effectively limits the reference frame search range of both normal motion estimation and affine motion estimation by exploiting the dependencies within the H.266/VVC prediction structure; it reduces coding complexity by capping the CU's reference frame search range based on the prediction information of the parent node. Z. Wang et al. proposed a confidence-interval-based early termination scheme for the quadtree plus binary tree (QTBT) partition structure, establishing a rate-distortion (RD) model based on the motion divergence field to estimate the RD cost of each partition mode; based on this model, H.266/VVC block partitioning is terminated early to eliminate unnecessary partition iterations and strike a good balance between coding performance and coding complexity. Z. Wang et al. also proposed a convolutional neural network (CNN)-oriented fast QTBT partition decision algorithm for H.266/VVC inter coding, which analyzes QTBT statistically to design the CNN architecture and exploits temporal correlation to control the risk of misprediction, improving the robustness of the CNN scheme. D. García-Lucas et al. proposed a pre-analysis algorithm for extracting frame motion information, which is used in the motion estimation module to accelerate the H.266/VVC encoder. S. Park et al. proposed a fast H.266/VVC inter coding method to effectively reduce the coding complexity of affine motion estimation in the VTM when the multi-type tree (MTT) is used; the method comprises two procedures, an early termination scheme and a reduction of the number of reference frames for affine motion estimation. H. Gao et al. proposed a low-complexity decoder-side motion vector refinement scheme that refines the initial MV of the Merge mode by searching for the block with the smallest matching cost in previously decoded reference pictures, and it was added to the bilateral-matching-based decoder-side motion vector refinement method. N. Tang et al. proposed a fast block partitioning algorithm for H.266/VVC inter coding that uses a three-frame difference to decide whether the current block is a static object; when the current block is static, no further splitting is needed, so partitioning is terminated early to speed up inter coding. However, little work has been done to alleviate the complexity of affine motion estimation (AME) in VVC.
For the VTM there is still considerable room to further reduce the ME complexity of the multi-type tree (MTT) structure, especially in AME.
Summary of the Invention
In view of the deficiencies in the above background art, the present invention proposes a fast affine motion estimation method for H.266/VVC, which solves the technical problem of the high coding complexity of affine motion estimation (AME) in the VTM.
The technical solution of the present invention is realized as follows:
A fast affine motion estimation method for H.266/VVC comprises the following steps:
S1. Compute the texture complexity SD of the current CU using the standard deviation, and classify the current CU as a static region or a non-static region according to SD.
S2. For a CU in a static region, skip affine motion estimation (AME), predict the current CU directly with conventional motion estimation (CME), and select the best prediction direction mode by rate-distortion optimization.
S3. For a CU in a non-static region, classify the current CU with the trained random forest classifier (RFC) model and output the best prediction direction mode (the decision flow of steps S1-S3 is sketched below).
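The following is a minimal, non-normative sketch of the decision flow of steps S2-S3, assuming the static/non-static classification of S1 has already been made. The function name, the rd_cost callable, and the illustrative feature vector are assumptions made for illustration and are not VTM identifiers.

```python
def best_direction_mode(is_static, rd_cost, rfc_model, features):
    """Decision flow of steps S2-S3 for one CU.

    is_static -- True if step S1 classified the CU as a static region
    rd_cost   -- callable rd_cost(mode) returning the conventional-ME
                 rate-distortion cost J = D + lambda * R of a direction mode
    rfc_model -- trained random forest classifier with a predict() method
    features  -- feature vector of the current CU used by the classifier
    """
    modes = ("Uni-L0", "Uni-L1", "Bi")
    if is_static:
        # S2: skip affine ME entirely; keep the cheapest conventional-ME mode
        return min(modes, key=rd_cost)
    # S3: non-static region; the classifier outputs the direction mode directly
    return rfc_model.predict([features])[0]
```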
The texture complexity SD of the current CU is calculated with the standard deviation as follows:
where W is the width of the CU, H is the height of the CU, and P(a, b) is the pixel value at position (a, b) in the CU.
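Consistent with these definitions, the texture complexity is the standard deviation of the CU's samples; the omitted equation would take the usual form (a reconstruction, not the verbatim patent formula):

```latex
SD = \sqrt{\frac{1}{W \times H}\sum_{a=1}^{W}\sum_{b=1}^{H}\bigl(P(a,b)-\bar{P}\bigr)^{2}},
\qquad
\bar{P} = \frac{1}{W \times H}\sum_{a=1}^{W}\sum_{b=1}^{H}P(a,b)
```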
The method of predicting the current CU with conventional motion estimation (CME) and selecting the best prediction direction mode by rate-distortion optimization is:
S21. The current CU first undergoes uni-directional prediction Uni-L0, then uni-directional prediction Uni-L1, and finally bi-directional prediction Bi.
S22. Use rate-distortion optimization to compute the rate-distortion costs of the current CU of step S21 under uni-directional prediction Uni-L0, uni-directional prediction Uni-L1, and bi-directional prediction Bi, respectively.
S23. Take the prediction mode with the smallest rate-distortion cost as the best prediction direction mode.
The rate-distortion costs of the uni-directional prediction Uni-L0, the uni-directional prediction Uni-L1, and the bi-directional prediction Bi are all calculated in the same way:
where the cost is minimized over the set of all available reference lists and over the reference frames within each list; L0 and L1 denote the two reference frame lists, φ(j) denotes a reference frame in a reference list, J(·) is the rate-distortion cost function, D(·) denotes the coding distortion of the CU, λ is the Lagrange multiplier, and R(·) denotes the number of bits consumed by coding the CU.
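The cost equations themselves are not visible in this text. A plausible form, consistent with the definitions above (a reconstruction rather than the patent's exact notation), is that each direction mode keeps the smallest Lagrangian cost over its admissible reference frames:

```latex
J_{\mathrm{Uni\text{-}L0}} = \min_{\varphi(j)\in L0} J\bigl(\varphi(j)\bigr),\qquad
J_{\mathrm{Uni\text{-}L1}} = \min_{\varphi(j)\in L1} J\bigl(\varphi(j)\bigr),\qquad
J_{\mathrm{Bi}} = \min_{\varphi(j)\in L0,\;\varphi(k)\in L1} J\bigl(\varphi(j),\varphi(k)\bigr),
\qquad J(\cdot) = D(\cdot) + \lambda R(\cdot)
```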
The training method of the random forest classifier (RFC) model in step S3 is:
S31. Select the Traffic, Kimono, BQSquare, RaceHorseC, and FourPeople video sequences at different resolutions from the common test sequences, encode the first M frames of each on the VTM, and record the CU shape, the CU texture complexity, and the three prediction direction modes of the CUs in the VTM as the data set; the data set comprises a sample set S and a test set T, and the three prediction direction modes are uni-directional prediction Uni-L0, uni-directional prediction Uni-L1, and bi-directional prediction Bi.
S32. Resample the sample set S with the Bootstrap method to generate K training sample sets, and use each generated training set as a root node to grow the corresponding decision trees {T1, T2, ..., TK}, where i = 1, 2, ..., K indexes the i-th training sample set and K is the size of the training sample set.
S33. Start training from the root node: at each internal node of a decision tree, randomly select m feature attributes, compute the Gini index of each feature attribute, select the feature attribute with the smallest Gini index as the optimal splitting attribute of the current node, and, using the smallest Gini index as the splitting threshold, divide the m feature attributes into a left subtree and a right subtree.
S34. Repeat step S33 and train K' times until the K' decision trees are completely trained; each decision tree is grown fully without pruning.
S35. The generated decision trees constitute the random forest classifier (RFC) model. The RFC model is used to classify the test set T; the classification result is obtained by voting, and the class output most often by the K' decision trees is taken as the class of the test set T, which gives the best prediction direction mode of the current CU.
The method for obtaining the data set in step S31 is:
S31.1. Predict the video sequence with conventional motion estimation (CME).
S31.2. Perform affine prediction on the video sequence predicted in step S31.1 with the 4-parameter affine motion model, where the affine prediction includes uni-directional prediction Uni-L0, uni-directional prediction Uni-L1, and bi-directional prediction Bi.
S31.3. Perform affine prediction on the video sequence affine-predicted in step S31.2 with the 6-parameter affine motion model.
S31.4. Compute the rate-distortion costs after the affine predictions of steps S31.2 and S31.3, respectively, and take the prediction mode corresponding to the smallest rate-distortion cost as the prediction direction mode of the video sequence.
The feature attributes include the 2D Haar wavelet transform horizontal coefficient, the 2D Haar wavelet transform vertical coefficient, the 2D Haar wavelet transform angle coefficient, the angular second moment, the contrast, the entropy, the inverse difference moment, the sum of absolute differences, and the gradient.
For the 4-parameter affine motion model, the motion vector of sample position (x, y) in the CU is:
where (mv0x, mv0y) is the motion vector of the top-left control point, (mv1x, mv1y) is the motion vector of the top-right control point, and W is the width of the CU;
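The equation referenced above is not reproduced in this text; the standard 4-parameter affine motion field used in VVC, which matches the two control points just defined, is (a reconstruction):

```latex
\begin{cases}
mv_x = \dfrac{mv_{1x}-mv_{0x}}{W}\,x - \dfrac{mv_{1y}-mv_{0y}}{W}\,y + mv_{0x}\\[2ex]
mv_y = \dfrac{mv_{1y}-mv_{0y}}{W}\,x + \dfrac{mv_{1x}-mv_{0x}}{W}\,y + mv_{0y}
\end{cases}
```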
For the 6-parameter affine motion model, the motion vector of sample position (x, y) in the CU is:
where (mv2x, mv2y) is the motion vector of the bottom-left control point and H is the height of the CU.
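Again the equation itself is not reproduced; the standard 6-parameter affine motion field used in VVC, consistent with the three control points defined above, is (a reconstruction):

```latex
\begin{cases}
mv_x = \dfrac{mv_{1x}-mv_{0x}}{W}\,x + \dfrac{mv_{2x}-mv_{0x}}{H}\,y + mv_{0x}\\[2ex]
mv_y = \dfrac{mv_{1y}-mv_{0y}}{W}\,x + \dfrac{mv_{2y}-mv_{0y}}{H}\,y + mv_{0y}
\end{cases}
```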
Beneficial effects of the technical solution: the present invention first divides CUs into static regions and non-static regions using the standard deviation SD. If a CU belongs to a static region, the probability that the SKIP mode is selected for inter prediction is high, and a static region that tends to select the SKIP mode for inter prediction does not need affine prediction; therefore, the affine motion estimation (AME) module can be terminated early in static regions, and the best direction mode of the current CU is the best direction mode of conventional motion estimation (CME). If a CU belongs to a non-static region, the inter prediction mode of the CU is determined by the random forest classification model, so the optimal prediction direction mode is obtained in advance. The invention thus reduces computational complexity and saves coding time, achieving fast H.266/VVC encoding.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the present invention;
Fig. 2 shows the complexity distribution of the prediction direction modes of the present invention;
Fig. 3 shows the 4-parameter affine model of the present invention;
Fig. 4 shows the 6-parameter affine model of the present invention;
Fig. 5 shows the overall process of motion estimation (ME) in the present invention;
Fig. 6 compares the overall running time of the method of the present invention with that of the FAME method.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a fast affine motion estimation method for H.266/VVC, with the following specific steps:
S1. During image coding, image content in homogeneous regions is usually coded with larger CUs, whereas regions with rich detail are usually coded with smaller CUs. Consequently, the texture complexity of a coding block can be used to decide whether a CU will use the SKIP mode for inter prediction: regions with homogeneous content tend to be coded with the SKIP mode, while regions rich in detail are unlikely to use it. The variance of a CU represents how spread out the energy is across the pixels of the current block, so the texture complexity of a block can be roughly measured by its standard deviation SD. Therefore, the texture complexity SD of the current CU is computed with the standard deviation, and the current CU is classified as a static region or a non-static region according to SD. The standard deviation is:
where W is the width of the CU, H is the height of the CU, and P(a, b) is the pixel value at position (a, b) in the CU. Since the texture complexity of neighboring blocks is correlated with that of the current CU, the classification threshold is derived from the texture complexity of neighboring blocks. According to a large amount of experimental data, it is reasonable to take the minimum of the standard deviations SD of the neighboring blocks of the CU as Th_static. CUs can then be classified by this threshold: if the current SD is smaller than Th_static, the current CU is a static region; conversely, if SD is larger than Th_static, the current CU belongs to a non-static region.
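A minimal sketch of this classification step, assuming the CU luma samples are available as a NumPy array and that the standard deviations of the already-coded neighboring blocks have been stored; the function and variable names are illustrative, not VTM identifiers:

```python
import numpy as np

def texture_sd(cu: np.ndarray) -> float:
    """Texture complexity of a CU, measured as the standard deviation
    of its W x H luma samples P(a, b)."""
    return float(np.std(cu))

def is_static_region(cu: np.ndarray, neighbor_sds: list[float]) -> bool:
    """Classify the CU: static if its SD is below Th_static, the minimum
    SD among its already-coded neighboring blocks."""
    th_static = min(neighbor_sds)
    return texture_sd(cu) < th_static
```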
S2. For conventional motion estimation (CME), existing video coding standards (such as H.265/HEVC) use motion vectors (MVs) that cover translational motion; affine motion estimation (AME), however, can predict not only translational motion but also linear transform motion such as zooming and rotation. If the camera zooms or rotates while capturing the video, AME predicts the motion more accurately than CME. In H.266/VVC, AME proceeds in the same way as CME: it starts with uni-directional prediction Uni-prediction L0, then uni-directional prediction Uni-prediction L1, and finally bi-directional prediction Bi-prediction. After the three prediction direction modes have been evaluated, the best one is selected by rate-distortion optimization (RDO). Fig. 2 shows the distribution of the complexity of the AME inter prediction modes, and Uni-prediction L0 requires more prediction time than Uni-prediction L1. If the reference frames of Uni-prediction L0 and Uni-prediction L1 differ, the coding complexity of the two uni-directional predictions is the same; otherwise, if the reference frames of Uni-prediction L0 and Uni-prediction L1 are identical, the MV of Uni-prediction L1 is copied from the MV of Uni-prediction L0 to avoid redundant AME processing, so Uni-prediction L0 consumes more prediction time than Uni-prediction L1. Although uni-directional prediction requires the greatest coding complexity, the bi-directional prediction mode has a higher probability of being the best inter prediction mode. In the AME module, the computation of the rate-distortion (RD) cost is an important cause of the high complexity.
To obtain the best MV and the best reference frame, the encoder searches the available reference frames, computes the rate-distortion RD cost J(·) with the Lagrange multiplier method, and compares the costs of the prediction results. The RD cost function is J(·) = D(·) + λ·R(·), where D(·) denotes the coding distortion of the CU, λ is the Lagrange multiplier, and R(·) denotes the number of bits consumed by coding the CU. Because the two reference frame lists, denoted uni-directional prediction Uni-L0 and uni-directional prediction Uni-L1, are both used for motion prediction, the ME process for uni-directional prediction must be tested with both lists, generating all available frames in the two lists.
For a CU in a static region, affine motion estimation (AME) is skipped, the current CU is predicted directly with conventional motion estimation (CME), and the best prediction direction mode is selected by rate-distortion optimization. The specific method is:
S21. The current CU first undergoes uni-directional prediction Uni-L0, then uni-directional prediction Uni-L1, and finally bi-directional prediction Bi.
S22. Use rate-distortion optimization to compute the rate-distortion costs of uni-directional prediction Uni-L0, uni-directional prediction Uni-L1, and bi-directional prediction Bi, respectively.
The rate-distortion costs of the uni-directional prediction Uni-L0, the uni-directional prediction Uni-L1, and the bi-directional prediction Bi are respectively:
where the cost is minimized over the set of all available reference lists and over the reference frames within each list; L0 and L1 denote the two reference frame lists, φ(j) denotes a reference frame in a reference list, and J(·) is the rate-distortion cost function.
S23. Take the prediction mode with the smallest rate-distortion cost as the best prediction direction mode.
S3. A CU in a non-static region does not satisfy the condition for skipping the affine motion estimation (AME) process, so the trained random forest classifier (RFC) model is used to classify the current CU and output the best prediction direction mode, further reducing the computational complexity. The random forest algorithm generates K bootstrap sample sets by Bootstrap resampling, and the data of each sample set grows into a decision tree. At each node of every tree, m features (m << M') are drawn at random from the M' feature vectors based on the random subspace method (RSM). According to a node splitting rule, the optimal attribute is selected from the m feature attributes for branch growth; finally, the K' decision trees are combined for majority voting. After the random forest classifier has been generated, the classifier model is tested: every tree in the forest independently determines a classification result, and the final decision takes the class that receives the most identical votes, expressed by the following formula,
where H(t) denotes the combined classification model, hi(t) is a single classification tree model, t denotes the feature attributes of the decision tree, Y denotes the output variable, and I(·) is the indicator function (its value is 1 when a given classification result occurs in the set, and 0 otherwise).
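Consistent with the symbols just defined, the usual majority-vote formula (a reconstruction of the omitted equation) is:

```latex
H(t) = \arg\max_{Y} \sum_{i=1}^{K'} I\bigl(h_i(t) = Y\bigr)
```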
While CUs are traversed, the features of each CU and its prediction direction mode are recorded without interfering with the normal encoding process. Multiple training sets are produced by resampling with the Bagging ensemble method: samples are drawn randomly and in equal numbers from the original training sample set, and drawing with replacement is repeated to generate K new training sample sets. After the samples are extracted, the training module of the random forest classifier model is entered. Table 1 shows the training parameter settings used to build the random forest classifier (RFC) model.
Table 1. Training parameter configuration
According to the parameters of Table 1, the training method of the random forest classifier (RFC) model is:
S31. One key to training the classifier is the selection of the sample set. The Traffic, Kimono, BQSquare, RaceHorseC, and FourPeople video sequences at different resolutions, which cover a wide range of texture complexity, are selected from the common test sequences; the first M = 50 frames of each are encoded on the VTM, while the CU shape, the CU texture complexity, and the three prediction direction modes of the CUs in the VTM are recorded as the data set. The data set comprises a sample set S = 20 and a test set T = 30, and the three prediction direction modes are uni-directional prediction Uni-L0, uni-directional prediction Uni-L1, and bi-directional prediction Bi.
In the VTM, affine-motion blocks are likewise predicted in three ways: uni-directional prediction Uni-L0, uni-directional prediction Uni-L1, and bi-directional prediction Bi. Affine prediction also includes 4-parameter and 6-parameter affine models. Both the uni-directional and the bi-directional prediction of the affine motion estimation (AME) module require the corresponding reference frames, which increases the coding complexity of the VTM. Counting only the reference frames required by each AME module, the AME process needs twice as many reference frames as the CME process. The whole motion estimation (ME) process is shown in Fig. 5. As can be seen from Fig. 5, the method for obtaining the data set in step S31 is:
S31.1. Predict the video sequence with conventional motion estimation (CME); the prediction method is the same as in step S21.
S31.2. Perform affine prediction on the video sequence predicted in step S31.1 with the 4-parameter affine motion model, where the affine prediction includes uni-directional prediction Uni-L0, uni-directional prediction Uni-L1, and bi-directional prediction Bi.
As shown in Fig. 3, the motion vector of sample position (x, y) in a CU under the 4-parameter affine motion model is:
where (mv0x, mv0y) is the motion vector of the top-left control point, (mv1x, mv1y) is the motion vector of the top-right control point, and W is the width of the CU;
S31.3. Perform affine prediction on the video sequence affine-predicted in step S31.2 with the 6-parameter affine motion model.
As shown in Fig. 4, the motion vector of sample position (x, y) in a block under the 6-parameter affine motion model is:
where (mv2x, mv2y) is the motion vector of the bottom-left control point and H is the height of the CU.
S31.4. Compute the rate-distortion costs after the affine predictions of steps S31.2 and S31.3, respectively, and take the prediction mode corresponding to the smallest rate-distortion cost as the prediction direction mode of the video sequence.
S32. Resample the sample set S with the Bootstrap method to generate K training sample sets, and use each generated training set as a root node to grow the corresponding decision trees {T1, T2, ..., TK}, where i = 1, 2, ..., K indexes the i-th training sample set and K is the size of the training sample set.
S33. Start training from the root node: at each internal node of a decision tree, randomly select m feature attributes, compute the Gini index of each feature attribute, select the feature attribute with the smallest Gini index as the optimal splitting attribute of the current node, and, using the smallest Gini index as the splitting threshold, divide the m feature attributes into a left subtree and a right subtree.
The effectiveness of machine learning is highly dependent on the diversity and relevance of the training data set. Although the random forest classifier (RFC) can handle very high-dimensional feature data, selecting truly relevant feature vectors allows the classification model to generalize better. Since the prediction direction mode of a CU is related to the texture, the texture direction, and the motion state of the image, these are used as the classification basis, i.e., as the feature vector of the random forest classifier model. The feature attributes selected by the present invention are the 2D Haar wavelet transform horizontal coefficient (HL), the 2D Haar wavelet transform vertical coefficient (LH), the 2D Haar wavelet transform angle coefficient (HH), the angular second moment (ASM), the contrast (CON), the entropy (ENT), the inverse difference moment (IDM), the sum of absolute differences (SAD), and the gradient. The feature attributes are calculated as follows:
The 2D Haar wavelet transform horizontal coefficient HL of an image represents its texture in the horizontal direction: the larger the value, the richer the horizontal texture, and the smaller the value, the flatter the horizontal texture. The 2D Haar wavelet transform vertical coefficient LH represents the texture in the vertical direction: the larger the value, the richer the vertical texture, and the smaller the value, the flatter the vertical texture. The 2D Haar wavelet transform angle coefficient HH represents the diagonal texture of the image: the larger the value, the richer the texture in the 45° direction, and the smaller the value, the flatter the texture in the 45° direction. The horizontal coefficient HL, the vertical coefficient LH, and the angle coefficient HH are respectively expressed as:
where W is the width of the CU, H is the height of the CU, and P(a, b) is the pixel value at position (a, b).
The angular second moment ASM reflects the uniformity of the gray-level distribution and the coarseness of the texture: the larger the value, the more uniform the texture distribution of the image. The contrast CON reflects the texture depth of the image: the larger the value, the greater the texture depth. The entropy ENT represents the amount of information in the image: the larger the value, the more information the image carries. The inverse difference moment IDM reflects the magnitude of local texture variation: a large value indicates that the texture is fairly uniform between different regions of the image and changes slowly. The angular second moment ASM, contrast CON, entropy ENT, and inverse difference moment IDM are respectively expressed as:
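The four equations referenced above are not reproduced in this text. These quantities are conventionally defined on a normalized gray-level co-occurrence matrix p(i, j); the standard definitions below are offered as an assumption about the intended formulas, since p(i, j) is not defined in the visible text:

```latex
ASM = \sum_{i}\sum_{j} p(i,j)^{2},\qquad
CON = \sum_{i}\sum_{j} (i-j)^{2}\,p(i,j),\qquad
ENT = -\sum_{i}\sum_{j} p(i,j)\,\log p(i,j),\qquad
IDM = \sum_{i}\sum_{j} \frac{p(i,j)}{1+(i-j)^{2}}
```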
In block-matching-based motion estimation algorithms there are many criteria for judging the best matching block; the sum of absolute differences SAD is used here. The smaller the SAD, the closer the reference block is to the current prediction block. The SAD is expressed as:
where Pk(a, b) is the value of the current pixel, (a, b) are the coordinates of the current pixel, Pk-1(a+i', b+j') is the reference pixel value, and (a+i', b+j') are the coordinates of the reference pixel.
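Consistent with the symbols just defined, the SAD accumulated over the W×H samples of the block would take the usual form (a reconstruction; the summation range is assumed):

```latex
SAD(i',j') = \sum_{a=1}^{W}\sum_{b=1}^{H}\bigl|P_{k}(a,b)-P_{k-1}(a+i',\,b+j')\bigr|
```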
The gradient represents the texture direction of the CU; the horizontal and vertical gradients of the luma samples are used as feature attributes. The horizontal and vertical gradients are expressed as:
Gx(a,b) = P(a+1,b) - P(a,b) + P(a+1,b+1) - P(a,b+1),
Gy(a,b) = P(a,b) - P(a,b+1) + P(a+1,b) - P(a+1,b+1),
where Gx(a, b) and Gy(a, b) are the gradient components of the current pixel in the horizontal and vertical directions, respectively, (a, b) are the coordinates of the pixel, and P(a, b) is the pixel value.
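A small sketch of how two of these per-CU features could be computed with NumPy; it covers only the SAD and the aggregated gradient components (the Haar and GLCM features would be computed analogously), and aggregating the gradients by summing absolute values is an assumption made for illustration:

```python
import numpy as np

def sad(cur: np.ndarray, ref: np.ndarray) -> float:
    """Sum of absolute differences between the current block P_k and the
    displaced candidate block P_{k-1} of the same size."""
    return float(np.abs(cur.astype(np.int64) - ref.astype(np.int64)).sum())

def gradient_features(cu: np.ndarray) -> tuple[float, float]:
    """Aggregate horizontal and vertical gradient magnitudes of a CU.
    cu is indexed as cu[b, a] (row b, column a), matching P(a, b) above."""
    p = cu.astype(np.int64)
    # Gx(a,b) = P(a+1,b) - P(a,b) + P(a+1,b+1) - P(a,b+1)
    gx = p[:-1, 1:] - p[:-1, :-1] + p[1:, 1:] - p[1:, :-1]
    # Gy(a,b) = P(a,b) - P(a,b+1) + P(a+1,b) - P(a+1,b+1)
    gy = p[:-1, :-1] - p[1:, :-1] + p[:-1, 1:] - p[1:, 1:]
    return float(np.abs(gx).sum()), float(np.abs(gy).sum())
```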
S34. Repeat step S33 and train K' = 25 times until the K' decision trees are completely trained; each decision tree is grown fully without pruning.
S35. The generated decision trees constitute the random forest classifier (RFC) model. The RFC model is used to classify the test set T; the classification result is obtained by voting, and the class output most often by the K' decision trees is taken as the class of the test set T, which gives the best prediction direction mode of the current CU and reduces the computational complexity of the affine motion estimation (AME) module.
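A compact sketch of how such a classifier could be trained offline and used online with scikit-learn, assuming the nine feature attributes listed above have been extracted per CU into a feature matrix and the recorded best direction modes (Uni-L0, Uni-L1, Bi) are the labels. The tree count (25) and the absence of pruning mirror the values stated in the text; everything else is an illustrative assumption rather than the patent's exact training configuration:

```python
from sklearn.ensemble import RandomForestClassifier

# X: (n_samples, 9) matrix of CU features [HL, LH, HH, ASM, CON, ENT, IDM, SAD, gradient]
# y: best prediction direction mode per CU ("Uni-L0", "Uni-L1", "Bi")
def train_rfc(X, y):
    rfc = RandomForestClassifier(
        n_estimators=25,       # K' = 25 decision trees
        criterion="gini",      # Gini index used for node splitting
        max_depth=None,        # trees grown fully, no pruning
        bootstrap=True,        # Bootstrap resampling of the sample set S
        max_features="sqrt",   # m randomly drawn attributes per node (assumption)
    )
    rfc.fit(X, y)
    return rfc

# Online use for a non-static CU: the majority vote of the trees gives the mode,
# so the affine motion estimation searches for the remaining modes can be skipped.
# best_mode = train_rfc(X_train, y_train).predict([cu_features])[0]
```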
To evaluate the method of the present invention, simulation tests were carried out on the latest H.266/VVC encoder (VTM 7.0). The test video sequences were encoded with the default parameters of the "Random Access" configuration. BDBR reflects the compression performance of the invention, and the reduction in time reflects the reduction in complexity. Table 2 gives the coding characteristics of the invention: the total coding time is reduced to 87% on average and the affine motion estimation (AME) time to 56% on average. The invention therefore saves coding time effectively, and the loss in RD performance is negligible.
Table 2. Coding characteristics of the present invention
Table 2 shows the RD performance of the present invention and the coding run time saved compared with the VTM. The experimental results may fluctuate for different test videos, but the method proposed by the present invention remains effective: compared with the VTM, it can effectively reduce the complexity of the affine motion estimation (AME) module while maintaining good RD performance.
The affine motion estimation (AME) module time was measured for different quantization parameters (QP). When the quantization parameter QP is 22, Fig. 6 shows that the AME module time over all video sequences totals about 36 hours, whereas with the method of the present invention the AME module time is reduced by about 9 hours; the trend is similar for the other QPs. Fig. 6 therefore shows more intuitively that the proposed method reduces the encoding time of the AME module and thus the computational complexity.
以上结合附图详细说明了本发明的技术方案,本发明的技术方案提出了一种针对H.266/VVC的快速仿射运动估计方法,有效地降低了在VTM中的仿射运动估计AME编码复杂度。首先利用标准差SD将CU分为静态区域和非静态区域,如果CU属于静态区域,选择SKIP模式进行帧间预测的概率较高,并且倾向于选择SKIP模式进行帧间预测的静态区域不需要进行仿射预测,因此,在静态区域可以提前终止仿射运动估计AME模块,并且当前CU的最佳方向模式为运动估计CME的最佳方向模式。如果CU属于非静态区域,则根据随机森林分类模型判断CU的帧间预测模式,最终提前得到最优的预测方向模式。因此,本发明降低了计算复杂度并节省了编码时间,从而实现H.266/VVC的快速编码。The technical solution of the present invention is described in detail above with reference to the accompanying drawings. The technical solution of the present invention proposes a fast affine motion estimation method for H.266/VVC, which effectively reduces the affine motion estimation AME coding in VTM the complexity. First, the standard deviation SD is used to divide the CU into a static area and a non-static area. If the CU belongs to a static area, the probability of selecting the SKIP mode for inter-frame prediction is higher, and the static area that tends to select the SKIP mode for inter-frame prediction does not need to be Affine prediction, therefore, the affine motion estimation AME module can be terminated early in static regions, and the best direction mode of the current CU is the best direction mode of the motion estimation CME. If the CU belongs to a non-static area, the inter-frame prediction mode of the CU is determined according to the random forest classification model, and the optimal prediction direction mode is finally obtained in advance. Therefore, the present invention reduces computational complexity and saves coding time, thereby realizing fast coding of H.266/VVC.
The above descriptions are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010293694.8A CN111479110B (en) | 2020-04-15 | 2020-04-15 | Fast Affine Motion Estimation Method for H.266/VVC |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010293694.8A CN111479110B (en) | 2020-04-15 | 2020-04-15 | Fast Affine Motion Estimation Method for H.266/VVC |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111479110A true CN111479110A (en) | 2020-07-31 |
CN111479110B CN111479110B (en) | 2022-12-13 |
Family ID: 71752555
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010293694.8A Active CN111479110B (en) | 2020-04-15 | 2020-04-15 | Fast Affine Motion Estimation Method for H.266/VVC |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111479110B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112689146A (en) * | 2020-12-18 | 2021-04-20 | 重庆邮电大学 | Heuristic learning-based VVC intra-frame prediction rapid mode selection method |
CN112911308A (en) * | 2021-02-01 | 2021-06-04 | 重庆邮电大学 | H.266/VVC fast motion estimation method and storage medium |
CN113225552A (en) * | 2021-05-12 | 2021-08-06 | 天津大学 | Intelligent rapid interframe coding method |
CN113630601A (en) * | 2021-06-29 | 2021-11-09 | 杭州未名信科科技有限公司 | Affine motion estimation method, device, equipment and storage medium |
CN115278260A (en) * | 2022-07-15 | 2022-11-01 | 重庆邮电大学 | VVC fast CU partition method and storage medium based on space-time domain characteristics |
CN115442620A (en) * | 2022-09-06 | 2022-12-06 | 杭州电子科技大学 | AME low-complexity affine motion estimation method |
2020-04-15: Application CN202010293694.8A filed in China; granted as patent CN111479110B (status: Active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1934871A (en) * | 2003-08-25 | 2007-03-21 | 新加坡科技研究局 | Mode decision for inter prediction in video coding |
CN104320658A (en) * | 2014-10-20 | 2015-01-28 | 南京邮电大学 | HEVC (High Efficiency Video Coding) fast encoding method |
WO2018124332A1 (en) * | 2016-12-28 | 2018-07-05 | 엘지전자(주) | Intra prediction mode-based image processing method, and apparatus therefor |
CN110213584A (en) * | 2019-07-03 | 2019-09-06 | 北京电子工程总体研究所 | Coding unit classification method and coding unit sorting device based on Texture complication |
Non-Patent Citations (1)
Title |
---|
Wu Qinggang et al., "Green apple image segmentation based on random forest and multi-feature fusion", Journal of Xinyang Normal University (Natural Science Edition) *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112689146A (en) * | 2020-12-18 | 2021-04-20 | 重庆邮电大学 | Heuristic learning-based VVC intra-frame prediction rapid mode selection method |
CN112911308A (en) * | 2021-02-01 | 2021-06-04 | 重庆邮电大学 | H.266/VVC fast motion estimation method and storage medium |
CN112911308B (en) * | 2021-02-01 | 2022-07-01 | 重庆邮电大学 | A fast motion estimation method and storage medium for H.266/VVC |
CN113225552A (en) * | 2021-05-12 | 2021-08-06 | 天津大学 | Intelligent rapid interframe coding method |
CN113225552B (en) * | 2021-05-12 | 2022-04-29 | 天津大学 | Intelligent rapid interframe coding method |
CN113630601A (en) * | 2021-06-29 | 2021-11-09 | 杭州未名信科科技有限公司 | Affine motion estimation method, device, equipment and storage medium |
CN113630601B (en) * | 2021-06-29 | 2024-04-02 | 杭州未名信科科技有限公司 | An affine motion estimation method, device, equipment and storage medium |
CN115278260A (en) * | 2022-07-15 | 2022-11-01 | 重庆邮电大学 | VVC fast CU partition method and storage medium based on space-time domain characteristics |
CN115442620A (en) * | 2022-09-06 | 2022-12-06 | 杭州电子科技大学 | AME low-complexity affine motion estimation method |
Also Published As
Publication number | Publication date |
---|---|
CN111479110B (en) | 2022-12-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111479110B (en) | Fast Affine Motion Estimation Method for H.266/VVC | |
US10848765B2 (en) | Rate/distortion/RDcost modeling with machine learning | |
WO2020117412A1 (en) | Hybrid motion-compensated neural network with side-information based video coding | |
CN110087087A | VVC inter coding unit prediction mode early decision and block partition early termination method | |
Zhang et al. | Low complexity HEVC INTRA coding for high-quality mobile video communication | |
CN108989802B (en) | A method and system for quality estimation of HEVC video stream using inter-frame relationship | |
CN110383841A (en) | Method and apparatus for being encoded in video compression to motion vector | |
CN103491334B (en) | Video transcode method from H264 to HEVC based on region feature analysis | |
Birman et al. | Overview of research in the field of video compression using deep neural networks | |
US12225221B2 (en) | Ultra light models and decision fusion for fast video coding | |
CN114286093A (en) | Rapid video coding method based on deep neural network | |
CN108076347A (en) | A kind of acquisition methods and device for encoding starting point | |
Yeo et al. | CNN-based fast split mode decision algorithm for versatile video coding (VVC) inter prediction | |
JP2008523724A (en) | Motion estimation technology for video coding | |
CN102075757B (en) | Video foreground object coding method by taking boundary detection as motion estimation reference | |
WO2024083100A1 (en) | Method and apparatus for talking face video compression | |
CN107018412A (en) | A kind of DVC HEVC video transcoding methods based on key frame coding unit partition mode | |
CN117528069A (en) | Displacement vector prediction method, device and equipment | |
Jillani et al. | Multi-view clustering for fast intra mode decision in HEVC | |
Cherigui et al. | Correspondence map-aided neighbor embedding for image intra prediction | |
Bachu et al. | Adaptive order search and tangent-weighted trade-off for motion estimation in H. 264 | |
CN109168000B (en) | HEVC intra-frame prediction rapid algorithm based on RC prediction | |
Khodarahmi et al. | ReMove: Leveraging Motion Estimation for Computation Reuse in CNN-Based Video Processing | |
Chen et al. | CNN-Based fast HEVC quantization parameter mode decision | |
JP3941921B2 (en) | Video encoding device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 2022-11-18. Applicant after: Zhengzhou Light Industry Technology Research Institute Co., Ltd.; Zhengzhou University of Light Industry. Address after: Floor 20-23, Block A, Ximei Building, No. 6 Changchun Road, High-tech Industrial Development Zone, Zhengzhou, Henan, 450000. Applicant before: Zhengzhou University of Light Industry. Address before: No. 5 Dongfeng Road, Jinshui District, Zhengzhou, Henan, 450002. |
GR01 | Patent grant | ||