CN104837019B

CN104837019B - Optimized AVS-to-HEVC video transcoding method based on a support vector machine

Info

Publication number: CN104837019B
Application number: CN201510215888.5A
Authority: CN
Inventors: 解蓉; 罗瑞; 张文军; 张良
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2015-04-30
Filing date: 2015-04-30
Publication date: 2018-01-02
Anticipated expiration: 2035-04-30
Also published as: CN104837019A

Abstract

An optimized AVS-to-HEVC video transcoding method based on a support vector machine (SVM). Feature vectors are collected from the AVS bitstream and learned by the SVM to obtain a trained model, which classifies each extracted AVS feature vector into one of two classes: the coding unit (CU) at the corresponding position in HEVC is split, or it is not split. In the transcoding stage, the trained model predicts whether a CU needs to be split. When the current CU is predicted to need splitting, only the 2N×2N mode and the SKIP mode are evaluated at the current HEVC depth, and the better of the two is selected as the prediction mode. When the current CU is predicted not to need splitting, optimal mode selection follows the standard HEVC encoding process. The invention applies the basic idea of machine learning by dividing the transcoding process into a training stage and a transcoding stage: a model obtained through learning predicts the CU partitioning in HEVC and is combined with a fast mode selection algorithm, which increases transcoding speed while limiting the quality loss of the transcoded video.

Description

AVS to HEVC optimized video transcoding method based on support vector machine

Technical field

The present invention relates to the field of video signal processing, and in particular to an optimized AVS-to-HEVC video transcoding method based on a support vector machine.

Background

Video transcoding converts a compressed bitstream into a target bitstream that meets given requirements by decoding and re-encoding it. With the wide application and rapid development of multimedia technology and the Internet, transmitting all kinds of video data over networks has become a trend in network technology, and a variety of video coding standards have emerged, including MPEG-4, MPEG-2, H.264, AVS and HEVC. Because video resources are diverse, terminal devices differ in display capability, storage capacity and bitstream processing capability, and users' demands for video vary with the situation, how to transcode efficiently so that video suits different hardware devices and network transmission environments has long received wide attention in the industry. HEVC is currently the newest video coding standard; its compression efficiency is about 50% higher than that of standards such as H.264, and it will be used ever more widely. AVS is a standard independently developed in China, with coding performance comparable to H.264 and lower coding complexity, and it has significant influence in video applications. Many compressed AVS bitstreams already exist, and AVS will coexist with HEVC and other standards for a long time, so video conversion from AVS to HEVC has become an important research direction.

Machine learning is applied in many fields, including artificial intelligence, machine translation, data mining, character recognition and business. With the development of digital multimedia technology and transmission networks, it is increasingly applied to video search, video analysis, and video encoding and transcoding, and applications of machine learning algorithms to video transcoding keep growing. The support vector machine is an important machine learning method. Decoding information is extracted from the first frames of the input bitstream, and a support vector machine learns the correspondence between the AVS information and the HEVC coding modes; in the subsequent encoding, the HEVC coding mode is predicted directly from the decoded AVS information without a full iterative traversal, which reduces the complexity of video conversion. How to use the support vector machine to learn an accurate model and make accurate predictions, thereby reducing conversion complexity and increasing transcoding speed, has therefore become an important topic of current research.

A search of the prior art found Chinese patent document CN104320667A, published 2015-01-28, which discloses a multi-pass optimized encoding system comprising several parallel encoders, a look-ahead buffer and a secondary encoder. The input of the look-ahead buffer is connected to the outputs of the parallel encoders, and its output is connected to the input of the secondary encoder. The document also discloses the corresponding method, which has three steps: a first encoding stage, an optimization selection stage and a second encoding stage. In the first encoding stage, several parallel encoders encode simultaneously; the look-ahead buffer then performs an optimization selection over the first-stage results to obtain the optimal encoding path; and the secondary encoder encodes a second time along that path to obtain the final, optimal encoding result. That technique is described as offering higher performance, quality and bandwidth efficiency and better encoding/transcoding results, as being easy to configure and flexible, and as suitable both for high-quality 4K and ultra-high-definition applications and for bandwidth-efficient mobile video applications. However, its input is the original stream, so it cannot exploit the coding information contained in a compressed input bitstream; moreover, for an encoder as complex as HEVC, the encoding paths are numerous and the implementation complexity is relatively high.

Summary of the invention

To address the above deficiencies of the prior art, the present invention proposes an optimized AVS-to-HEVC video transcoding method based on a support vector machine. It adopts a simplified transcoding framework: in the training stage, a model is trained from feature vectors extracted from the AVS bitstream; in the transcoding stage, this model is used to determine the coding unit partitioning in HEVC and is combined with a fast mode selection algorithm. This reduces the complexity of coding unit partitioning and mode selection during re-encoding and greatly increases transcoding speed, while keeping the quality degradation of the transcoded video limited.

The present invention is achieved through the following technical solution:

Feature vectors of the AVS bitstream are collected and learned by a support vector machine to obtain a trained model. The extracted AVS feature vectors are classified into two classes, according to whether the CU at the corresponding position in HEVC is split or not split, and in the transcoding stage the trained model predicts whether a CU needs to be split.

The AVS feature vector is collected from the AVS bitstream and includes the macroblock coding mode, the motion vectors and the transform coefficients.

The macroblock coding mode refers to the mode information of each macroblock in AVS.

The motion vector refers to the average motion vector magnitude of a macroblock in AVS.

The transform coefficient refers to the number of non-zero coefficients among the discrete cosine transform (DCT) coefficients of the AVS coding.

The correspondence refers to the trained model, obtained through learning, that relates the AVS feature vector to whether the coding unit in HEVC needs to be split; that is, a mapping.

As for the prediction: when the current CU is predicted to need splitting, the 2N×2N mode and the SKIP mode are each computed at the current HEVC depth, and the optimal prediction mode is selected from these two modes; when the current CU is predicted not to need splitting, optimal mode selection is performed according to the HEVC standard encoding process.

Technical effect

Compared with the prior art, the present invention combines the basic idea of machine learning with the support vector machine learning method: a model is trained on the video and used to predict the HEVC coding modes in the subsequent transcoding, which reduces transcoding complexity and increases transcoding speed. The transcoding process is divided into two stages, a training stage and a transcoding stage. In the training stage, the AVS feature vectors and the CU partitioning information at the corresponding positions in HEVC are extracted, and the correspondence between the two, that is, the trained model, is obtained with the support vector machine algorithm and tools. In the transcoding stage, the model obtained in the training stage is applied to the AVS feature vectors to predict whether the CU at the corresponding position in HEVC is split at depths 0 and 1. If the current CU is predicted to need splitting, only SKIP and 2N×2N mode selection is performed for the CU at the current depth; if the current CU is predicted not to need splitting, the optimal mode is selected according to the HEVC standard process. By combining machine learning with fast mode selection, a model can be trained adaptively from the characteristics of each sequence. With only limited degradation of the transcoded video quality, the computational complexity of transcoding is greatly reduced, which increases transcoding speed and saves transcoding time.

Description of the drawings

Fig. 1 is a diagram of the transcoding framework of the present invention;

Fig. 2 is a flowchart of the present invention.

Detailed description

An embodiment of the present invention is described in detail below. The embodiment is implemented on the basis of the technical solution of the present invention, and a detailed implementation and a specific operating procedure are given, but the scope of protection of the present invention is not limited to the following embodiment.

Embodiment 1

As shown in Fig. 2, this embodiment comprises the following four steps:

Step 1. Collect the feature vectors of the AVS bitstream. Specifically, collect from the AVS bitstream the information related to the CU partitioning at the corresponding positions in HEVC at depths 0 and 1.

The macroblock coding mode: at depth 0 in the HEVC stream corresponding to the AVS stream, each CU contains 16 macroblocks and therefore 16 mode features; at depth 1, each CU contains 4 macroblocks and correspondingly 4 mode features.

The motion vector: motion vectors in AVS are defined on 8×8 units, so one macroblock contains 4 basic motion vector units. The modulus of each motion vector is taken, and the average of the four moduli in a macroblock is used as one feature. Likewise, a CU has 16 motion features at depth 0 and 4 motion features at depth 1. The average motion vector modulus v_avg is

$$v_{avg} = \frac{1}{N}\sum_{i=1}^{N}\sqrt{v_x(i)^2 + v_y(i)^2}$$

where N is the number of 8×8 blocks contained in the current coding unit, and v_x(i) and v_y(i) are the horizontal and vertical components of the motion vector of the i-th 8×8 block in the coding unit.

The transform coefficient: the DCT in AVS is likewise performed on 8×8 blocks. The number of non-zero coefficients among all DCT coefficients within one macroblock is used as one feature, so there are 16 DCT coefficient features at depth 0 and 4 at depth 1.
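The three feature types described in Step 1 can be assembled into one vector per CU. The following Python sketch is illustrative only: the `Macroblock` container, the function names and the ordering of the features within the vector are assumptions made for this example and are not specified in the text, and parsing the AVS bitstream itself is out of scope.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Macroblock:
    """Hypothetical container for data decoded from one 16x16 AVS macroblock."""
    mode: int                           # macroblock coding mode index
    mvs: List[Tuple[float, float]]      # (v_x, v_y) of each 8x8 block
    dct_blocks: List[List[int]]         # coefficients of each 8x8 block


def mean_mv_magnitude(mvs: Sequence[Tuple[float, float]]) -> float:
    """Average modulus of the motion vectors of the 8x8 blocks."""
    if not mvs:
        return 0.0
    return sum(math.hypot(vx, vy) for vx, vy in mvs) / len(mvs)


def nonzero_dct_count(dct_blocks: Sequence[Sequence[int]]) -> int:
    """Number of non-zero DCT coefficients in the whole macroblock."""
    return sum(1 for block in dct_blocks for c in block if c != 0)


def cu_feature_vector(macroblocks: Sequence[Macroblock]) -> List[float]:
    """Concatenate mode, motion and DCT features of the macroblocks in a CU.

    The ordering (all modes, then all motion features, then all DCT
    features) is an assumption of this sketch.
    """
    modes = [float(mb.mode) for mb in macroblocks]
    motion = [mean_mv_magnitude(mb.mvs) for mb in macroblocks]
    dct = [float(nonzero_dct_count(mb.dct_blocks)) for mb in macroblocks]
    return modes + motion + dct
```

Under this assumed layout, a depth-0 CU covering 16 macroblocks yields 3 × 16 = 48 features, and a depth-1 CU covering 4 macroblocks yields 12.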

Step 2. Use a support vector machine to learn from the AVS feature vectors obtained in Step 1 and obtain a trained model for classifying the coding units at HEVC depths 0 and 1. The extracted AVS feature vectors are divided into two classes: the CU at the corresponding position in HEVC is split, or it is not split.
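The text does not name the SVM kernel or the training tool. As a minimal sketch of the two-class training in Step 2, the following Python code fits a linear SVM by stochastic subgradient descent on the regularized hinge loss. The linear kernel, the hyperparameters (`lam`, `eta`, `epochs`), the label convention (+1 for split, -1 for not split) and all function names are assumptions of this example; a practical system would more likely rely on an established SVM library and train one model per depth.

```python
import random
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def train_linear_svm(samples: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     lam: float = 0.001,
                     eta: float = 0.01,
                     epochs: int = 200,
                     seed: int = 0) -> Model:
    """Fit (w, b) by stochastic subgradient descent on the hinge loss.

    labels: +1 if the co-located HEVC CU was split, -1 otherwise.
    lam is the L2 regularization weight, eta a fixed learning rate.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Regularization step: shrink the weights (bias is not regularized).
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge-loss subgradient step for a sample inside the margin.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def predict_split(model: Model, x: Sequence[float]) -> bool:
    """Return True when the model predicts that the CU should be split."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b > 0.0
```

In use, the training samples would be the per-CU AVS feature vectors from Step 1, ideally scaled to comparable ranges, and the labels would come from the CU partitioning produced by a full HEVC encode of the training frames.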

Step 3. Use the trained model from Step 2 to predict directly whether the current CU needs to be split. If the prediction is that no split is needed, partitioning terminates and no mode selection is performed at greater depths; otherwise, the process moves on to the next HEVC depth and performs Step 4.

Step 4. At the current HEVC depth, compute the 2N×2N mode and the SKIP mode and select the optimal prediction mode from these two. When the current CU is predicted not to need splitting, optimal mode selection is performed according to the HEVC standard encoding process. This reduces the computational complexity of mode selection and increases the speed of the whole transcoding process.

The 2N×2N mode means that the current coding unit is not partitioned further;

The SKIP mode means that no prediction residual information needs to be coded for the current block;

The optimal prediction mode is the coding mode with the smallest rate-distortion cost, selected by a rate-distortion comparison between the 2N×2N and SKIP modes;

The optimal mode selection is the process of comparing the rate-distortion costs of all coding modes in HEVC and selecting the mode with the smallest rate-distortion cost.
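The decision logic of Steps 3 and 4 can be sketched as follows. This is a hedged illustration, not the HM implementation: `decide_cu`, `best_mode` and the `rd_cost` callback are hypothetical names, and how the mode retained at the current depth is later weighed against the deeper partition is not described in the text and is left out.

```python
from typing import Callable, Sequence, Tuple

# Modes evaluated when the SVM predicts that the CU will be split.
FAST_MODES: Tuple[str, ...] = ("SKIP", "2Nx2N")


def best_mode(rd_cost: Callable[[str], float], modes: Sequence[str]) -> str:
    """Return the candidate mode with the smallest rate-distortion cost."""
    return min(modes, key=rd_cost)


def decide_cu(predicted_split: bool,
              rd_cost: Callable[[str], float],
              all_modes: Sequence[str]) -> Tuple[str, bool]:
    """Return (mode chosen at this depth, whether to descend one depth).

    predicted_split: output of the trained classifier for this CU.
    rd_cost: hypothetical callback giving the RD cost of a mode for this CU.
    all_modes: the full candidate list of the standard HEVC mode decision.
    """
    if predicted_split:
        # Split predicted: test only SKIP and 2Nx2N here, then go deeper.
        return best_mode(rd_cost, FAST_MODES), True
    # No split predicted: full mode decision here, partitioning stops.
    return best_mode(rd_cost, all_modes), False
```

A caller would apply `decide_cu` at depths 0 and 1, where the classifier is available, and fall back to the unmodified encoder at greater depths; that fallback is also an assumption of this sketch.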

In summary, the advantages of this embodiment are:

1) By using the characteristics of the video itself, a model suited to each sequence can be trained for that sequence.

2) Predicting the CU partitioning with the trained model avoids the computation otherwise needed to determine the optimal CU partitioning during HEVC encoding. With prediction accuracy maintained, this greatly reduces the computational complexity of transcoding, saves transcoding time and improves transcoding efficiency.

3) Combining the method with a fast mode selection algorithm further removes unnecessary computation and increases transcoding speed.

The proposed algorithm is implemented on top of the HEVC reference software HM10.1. A full-decode, full-re-encode AVS-to-HEVC transcoder serves as the reference transcoder, and the effectiveness of the algorithm is evaluated on three criteria: ΔT, ΔPSNR and ΔBR. They are computed as follows:

$$\Delta PSNR = PSNR_{ref} - PSNR_{prop}$$

ΔT, ΔPSNR and ΔBR denote the percentage of encoding time saved, the reduction in PSNR and the rate of bitrate increase, respectively.
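Only the ΔPSNR expression is given explicitly. As a sketch of how the three criteria might be computed, the Python function below uses the stated ΔPSNR formula and assumes the conventional relative forms for the other two, ΔT = (T_ref − T_prop) / T_ref × 100% and ΔBR = (BR_prop − BR_ref) / BR_ref × 100%. These two forms match the verbal definitions (time saved, bitrate increase) but are assumptions, as are the function and parameter names.

```python
from typing import Tuple


def transcoding_metrics(t_ref: float, t_prop: float,
                        psnr_ref: float, psnr_prop: float,
                        br_ref: float, br_prop: float) -> Tuple[float, float, float]:
    """Return (delta_t_percent, delta_psnr_db, delta_br_percent).

    *_ref:  values measured with the reference (full decode, full re-encode)
            transcoder.
    *_prop: values measured with the proposed transcoder.
    """
    # Assumed form: share of the reference transcoding time that is saved.
    delta_t = (t_ref - t_prop) / t_ref * 100.0
    # Stated in the text: PSNR drop relative to the reference, in dB.
    delta_psnr = psnr_ref - psnr_prop
    # Assumed form: relative bitrate increase over the reference.
    delta_br = (br_prop - br_ref) / br_ref * 100.0
    return delta_t, delta_psnr, delta_br
```

With these sign conventions, a faster transcoder gives a positive ΔT, and a quality loss or bitrate increase gives a positive ΔPSNR or ΔBR.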

The experimental results are shown in Table 1.

Table 1. Experimental results of the SVM-based AVS-to-HEVC transcoding algorithm.

The test results in Table 1 show that, for 1920×1080 sequences, the SVM-based transcoding algorithm increases transcoding speed by 63.68% on average, with an average PSNR decrease of 0.083 dB and a bitrate increase of only 1.82%. For 1280×720 sequences, transcoding speed increases by 71.78% on average, PSNR decreases by 0.09 dB on average, and the bitrate increases by 1.55%. The experimental results indicate that the algorithm of the present invention greatly increases video transcoding speed while keeping the degradation in video quality limited.

Claims

1. An optimized AVS-to-HEVC video transcoding method based on a support vector machine, characterized in that feature vectors of the AVS bitstream are collected and learned by a support vector machine to obtain a trained model; the extracted AVS feature vectors are classified into two classes, namely the CU at the corresponding position in HEVC being split or not split; in the transcoding stage, the trained model is used to predict whether a CU needs to be split; when the current CU is predicted to need splitting, the 2N×2N mode and the SKIP mode are computed at the current HEVC depth and the optimal prediction mode is selected from these two modes; when the current CU is predicted not to need splitting, optimal mode selection is performed according to the HEVC standard encoding process;

The AVS feature vector is collected from the AVS bitstream and includes: the macroblock coding mode, the motion vectors and the transform coefficients;

The optimal prediction mode refers to: the coding mode with the smallest rate-distortion cost, selected by a rate-distortion comparison between the 2N×2N and SKIP modes.

2. The method according to claim 1, wherein the macroblock coding mode refers to the mode information of each macroblock in AVS; the motion vector refers to the average motion vector magnitude of a macroblock in AVS; and the transform coefficient refers to the number of non-zero coefficients among the discrete cosine transform coefficients of the AVS coding.

3. The method according to claim 1 or 2, wherein the macroblock coding mode refers to: at depth 0 in the HEVC stream corresponding to the AVS stream, each CU contains 16 macroblocks and therefore 16 features, and at depth 1 each CU contains 4 macroblocks and correspondingly 4 mode features; the motion vector refers to: motion vectors in AVS are defined on 8×8 units, one macroblock contains 4 basic motion vector units, the modulus of each motion vector is taken, and the average of the four moduli in a macroblock is used as one feature, so that a CU has 16 motion features at depth 0 and 4 motion features at depth 1; the transform coefficient refers to: the DCT in AVS is likewise performed on 8×8 blocks, and the number of non-zero coefficients among all DCT coefficients within one macroblock is used as one feature, so that there are 16 DCT coefficient features at depth 0 and 4 DCT coefficient features at depth 1.

4. The method according to claim 1, wherein the optimal mode selection refers to: comparing the rate-distortion costs of all coding modes in HEVC and selecting the mode with the smallest rate-distortion cost.