CN104168481B

CN104168481B - Real-time AVS soft coding method for Intel mobile platform

Info

Publication number: CN104168481B
Application number: CN201410355678.1A
Authority: CN
Inventors: 刘宏志; 李�浩; 吴中海; 张兴
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2014-07-24
Filing date: 2014-07-24
Publication date: 2017-08-04
Anticipated expiration: 2034-07-24
Also published as: CN104168481A

Abstract

The invention discloses a real-time AVS software encoding method for Intel mobile platforms. The method is as follows: 1) the AVS encoder detects the type of the current macroblock to be encoded and calls the corresponding I-frame or P-frame coding unit; 2) in I-frame macroblock coding, the traversal of prediction modes is terminated early, using the mean matching cost of the prediction modes as the threshold; 3) in P-frame macroblock coding, the traversals of reference frames and prediction modes are terminated early, using the minimum matching cost as the threshold. The matching cost (SAD value) between the current macroblock and the prediction macroblock is computed with the SSE instructions available on Intel mobile platforms. The invention substantially increases processor utilization and achieves video capture and encoding at 10 frames per second.

Description

Real-time AVS soft coding method for Intel mobile platform

Technical field

The method relates to AVS video coding, Android application development, the Android NDK, program optimization and SSE. It applies directly to video coding on mobile platforms and achieves real-time AVS software encoding on the Intel mobile platform.

Background

AVS video coding technology:

The AVS video standard was developed to meet the need for moving-picture compression in applications such as digital TV broadcasting, digital storage media, Internet streaming and multimedia communications. Its scope includes, but is not limited to: digital terrestrial television broadcasting (DTTB), cable television (CATV), interactive storage media, direct broadcast satellite video services (DBS), broadband video services, multimedia mail, multimedia services on packet networks (MSPN), real-time communication services (video conferencing, videophone, etc.) and remote video surveillance.

The standard achieves efficient video coding with a set of techniques including intra prediction, inter prediction, transform, quantization and entropy coding. Inter prediction uses block-based motion vectors to remove redundancy between pictures; intra prediction uses spatial prediction modes to remove redundancy within a picture. Transforming and quantizing the prediction residual removes visual redundancy within the picture. Finally, motion vectors, prediction modes, quantization parameters and transform coefficients are compressed by entropy coding.

Android application development technology:

Android is a free and open-source, Linux-based operating system used mainly on mobile devices such as smartphones and tablets. Its system architecture is organized in four layers, from top to bottom: the application layer, the application framework layer, the system runtime library layer and the Linux kernel layer. The four main Android application components are: Activity, which presents functionality to the user; Service, which runs in the background without a user interface; BroadcastReceiver, which receives broadcasts; and ContentProvider, which lets multiple applications store and read shared data, much like a database. Basic Android applications can be built from these components in an Android integrated development environment (IDE).

NDK technology:

The Android NDK (Native Development Kit) is a set of tools that lets developers embed code written in C/C++ in Android applications. With the NDK, developers can build libraries in C/C++, and the tools package these libraries into the APK file.

Program optimization technology:

Program optimization means improving a program without changing its function, by modifying its algorithms and structure or by using software development tools, according to the characteristics of the processor and the system, so that the modified program runs faster, takes less space or consumes less energy. The guiding principles are equivalence, effectiveness and economy. The main approaches are multithreading, using a processor-specific compiler, program structure optimization and code optimization.

SSE technology:

SSE (Streaming SIMD Extensions) is an instruction set extension comprising 70 instructions, including single-instruction multiple-data (SIMD) floating-point operations plus additional SIMD integer and cache-control instructions. Its advantages include higher-resolution image viewing and processing, high-quality audio, MPEG-2 video and simultaneous MPEG-2 encryption and decryption, lower CPU usage for speech recognition, and higher precision with faster response. (The 128-bit integer intrinsics used later in this method, such as _mm_sad_epu8(), belong to the follow-on SSE2 extension.)

Existing applications of this kind suffer from low AVS encoding efficiency and low processor utilization.

Summary of the invention

To address the low AVS encoding efficiency and low processor utilization of existing solutions, the present invention proposes a real-time AVS software encoding method for Intel mobile platforms.

The method first builds a camera framework using Android application development techniques, then uses the NDK to port an AVS encoder into the Android camera project. Following AVS video coding principles, the encoder's algorithms are optimized; using program optimization techniques, the camera program is multithreaded and the AVS encoder's C code is optimized both in structure and at the code level. Finally, a processor-specific C compiler and the processor's SSE instruction set are used to speed up computation and raise processor utilization.

The core of AVS video coding is the coding of three picture types: I, P and B (the optimizations in this method cover only I and P pictures, not B pictures). I and P pictures are both coded macroblock by macroblock, and all key techniques appear in the coding of a single macroblock. The optimizations of macroblock coding are described below.

The technical solution of the present invention is as follows.

A real-time AVS software encoding method for Intel mobile platforms, comprising the following steps:

1) The AVS encoder detects the type of the current macroblock to be encoded. If it is an I-frame macroblock, the I-frame coding unit is called to encode it; if it is a P-frame macroblock, the P-frame coding unit is called.

2) The P-frame coding unit performs inter prediction on the current P-frame macroblock, selects the inter macroblock mode with the smallest sum of absolute differences (SAD) between blocks as the best inter macroblock mode, and takes the prediction macroblock obtained in that mode as the best prediction macroblock. The residual between it and the current P-frame macroblock is computed, and the resulting residual matrix is transformed, quantized and entropy-coded.

3) The I-frame coding unit performs intra prediction on the current I-frame macroblock to obtain its best intra prediction block, computes the residual between this block and the current I-frame macroblock, and transforms, quantizes and entropy-codes the resulting residual matrix. During the traversal of intra prediction modes, at each mode the unit checks whether the rate-distortion (RD) cost of the 8x8 block computed in the current mode is less than the mean RD cost; if so, that prediction is taken as the best intra prediction block of the current I-frame macroblock and the traversal is terminated.

Here, mean RD cost = (sum of the 8x8-block RD costs over all intra prediction modes computed so far) / (number of intra prediction modes computed so far).
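A short derivation may help when implementing this test. Write C_1, ..., C_k for the RD costs of the k modes computed so far, with C_k the current mode, and assume (as the wording above suggests) that the current mode is included in the average. The condition then reduces to a comparison with the mean of the earlier modes:

```latex
C_k < \frac{1}{k}\sum_{j=1}^{k} C_j
\iff k\,C_k < \sum_{j=1}^{k-1} C_j + C_k
\iff C_k < \frac{1}{k-1}\sum_{j=1}^{k-1} C_j \qquad (k \ge 2)
```

So the test can never fire on the first mode traversed (k = 1), and from the second mode on it gives the same result whether or not the current mode is counted in the mean. Because the threshold is a mean rather than the running minimum, the accepted mode is a good one but not necessarily the cheapest.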

Further, the best inter macroblock mode of the current P-frame macroblock is obtained as follows. Within the inter prediction for each inter macroblock mode, during the inner loop over reference frames: once the first reference frame has been processed, if the cost of the current sub-block in the current mode is less than the minimum cost so far, the current reference frame is taken as the best reference frame for that sub-block, its cost becomes the minimum cost, and the inner reference-frame loop terminates. In the outer loop over inter macroblock modes: once the first mode has been processed, if the total cost of the current mode is less than the minimum total cost so far, the current mode is taken as the best inter macroblock mode of the current P-frame macroblock.

Further, the SAD between blocks is computed as follows: every block to be encoded is split into 8x8 units; for each row, the SSE intrinsic _mm_loadl_epi64() loads one full row of pixels from the 8x8 block to be encoded and from the predicted 8x8 block, _mm_sad_epu8() computes the sum of absolute differences of the two rows, and _mm_extract_epi16() extracts that sum, which is added to the running total. After all rows have been processed, the total is the SAD between the 8x8 block to be encoded and the predicted 8x8 block.

Further, the inter macroblock modes comprise four modes: 16x16, 16x8, 8x16 and 8x8.

Further, inter prediction uses the UMHexagons motion estimation algorithm.

Further, the intra prediction modes comprise: mean (DC) prediction, horizontal prediction, vertical prediction, plane prediction, bottom-left diagonal prediction and bottom-right diagonal prediction.

Beneficial effects of the method:

The method improves AVS encoding efficiency on mobile platforms, lets the camera capture and encode in real time at 10 frames per second, and raises processor utilization. This is analyzed below with test cases.

The camera application with the unoptimized AVS encoder and the camera application with the optimized AVS encoder were each run on the mobile device, and test data were collected by encoding three different YUV files. The test data were then analyzed.

The mobile device configuration is shown in Table 1.

Table 1 Mobile device configuration

Item                          Value
Target                        Android 4.2.2 (API Level 16)
CPU/ABI                       Intel (x86-atom)
SD Card support               yes
SD Card size                  10 GB
Camera Facing Front support   yes
Camera Facing Back support    yes
Keyboard support              yes

The parameters of the three YUV test files are shown in Table 2:

Table 2 Parameters of the YUV test files

The unoptimized and the optimized AVS encoders were each used to encode the three YUV test files, 100 frames per file. Three measurements were recorded: output video bit rate, average signal-to-noise ratio of the Y, U and V components, and encoding time. The results for the three YUV test files are shown in Tables 3, 4 and 5.

Table 3 Test data of a YUV test file

Table 4 Test data of a YUV test file

Table 5 Test data of a YUV test file

Taken together, the test data show that, compared with the unoptimized encoder, the optimized encoder reduces encoding time by more than 80%, keeps the average signal-to-noise ratio at no less than 98% of the original, produces an output bit rate no higher than 172% of the original, and reaches a frame rate of roughly 10 fps.
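As a rough reading of these figures (arithmetic on the stated percentages only, not additional measurements), with T_before and T_after the encoding times before and after optimization:

```latex
T_{\text{after}} \le (1 - 0.80)\,T_{\text{before}} = 0.2\,T_{\text{before}}
\;\Longrightarrow\; \frac{T_{\text{before}}}{T_{\text{after}}} \ge 5,
\qquad
\frac{1}{10\ \text{fps}} = 100\ \text{ms per frame}.
```

That is, the optimized encoder is at least five times faster, and sustaining 10 fps leaves a budget of about 100 ms to capture and encode each frame.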

Description of drawings

The method is further described below with reference to the drawings and embodiments.

Fig. 1 is a flowchart of macroblock encoding.

Fig. 2 is a flowchart of inter prediction before optimization.

Fig. 3 is a flowchart of inter prediction after optimization.

Fig. 4 is a flowchart of best intra prediction mode selection for an 8x8 block before optimization.

Fig. 5 is a flowchart of best intra prediction mode selection for an 8x8 block after optimization.

Fig. 6 is a flowchart of the SAD computation before optimization.

Fig. 7 is a flowchart of the SAD computation after optimization.

Fig. 8 is a schematic diagram of the _mm_loadl_epi64() instruction.

Fig. 9 is a schematic diagram of the _mm_sad_epu8() instruction.

Fig. 10 is a schematic diagram of the _mm_extract_epi16() instruction.

Detailed description

The optimized AVS video encoder is compiled with a processor-specific C compiler into an external library referenced by the Android camera application project. The camera application is then compiled and installed on the mobile platform. Tapping the app icon opens the application, and tapping the "Start Recording" button puts it into the recording state. During recording, each frame is processed as follows: the AVS camera application captures a frame in YUV format and passes it to the optimized AVS encoder C program, which AVS-encodes the YUV picture and saves the encoded data to the SD card. When recording is finished, tapping the "End Recording" button puts the program into the stopped state. The recorded video files are stored on the SD card of the Android mobile platform.

The main flow of AVS macroblock coding is shown in Fig. 1.

For an I-frame macroblock, the main coding steps are: intra chroma prediction, intra luma prediction, computing the RD cost of every intra prediction mode, selecting the mode with the smallest RD cost as the best intra prediction mode, setting the macroblock coding parameters, transform, quantization and entropy coding.

For a P-frame macroblock, the main coding steps are: inter prediction, selecting the best inter macroblock mode, setting the macroblock coding parameters, transform, quantization and entropy coding.

The following points should be noted:

1. The macroblock coding parameters include the residual matrix between the macroblock to be encoded and the prediction macroblock, the best macroblock coding mode parameters, and so on.

2. For intra coding, the intra block size is fixed at 8x8, so the key parameters of the best intra macroblock coding mode are determined simply by choosing, by RD cost, the best of the intra prediction modes (mean, horizontal, vertical, plane, bottom-left diagonal, bottom-right diagonal).

3. For inter coding, the inter prediction algorithm is fixed (this method uses UMHexagons motion estimation), so the key parameters of the best inter macroblock coding mode are determined simply by choosing the best of the inter macroblock modes (16x16, 16x8, 8x16, 8x8, SKIP).

4. Before coding a macroblock, the AVS encoder detects its type: for an I-frame macroblock it calls the I-frame coding unit, and for a P-frame macroblock it calls the P-frame coding unit.

5. In the inter prediction stage, the SAD between the block to be encoded and the prediction block is used as the cost measure, and the inter macroblock mode with the minimum SAD is selected as the best inter macroblock mode.
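Item 1 above lists the residual matrix among the macroblock coding parameters. A minimal sketch of how it can be formed, assuming 8-bit luma samples stored row by row; the function name compute_residual and its parameters are hypothetical, not taken from the encoder:

```c
#include <stdint.h>

/* Hypothetical helper: residual = block to be encoded minus prediction.
 * cur and pred are w x h 8-bit samples addressed with their own strides;
 * residuals can be negative, so they are written as int16_t into a
 * tightly packed w x h array. */
void compute_residual(const uint8_t *cur, int cur_stride,
                      const uint8_t *pred, int pred_stride,
                      int16_t *res, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            res[y * w + x] =
                (int16_t)(cur[y * cur_stride + x] - pred[y * pred_stride + x]);
}
```

The resulting matrix is what the subsequent transform, quantization and entropy coding steps consume.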

Inter prediction optimization:

The purpose of inter prediction is to find the best inter prediction block for the current P-frame macroblock, so that the residual between the macroblock and this block can be computed and the residual matrix passed on to transform, quantization, entropy coding and the remaining coding steps.

Inter prediction must determine the following: for a given inter macroblock mode, the best reference frame, motion vector and minimum cost of every sub-block; and the best inter macroblock mode together with its minimum total cost.

In this method, inter prediction consists mainly of prediction in the four inter macroblock modes (16x16, 16x8, 8x16, 8x8) and prediction in Skip mode (Skip-mode prediction is relatively independent and is not covered by the optimization).

Inter prediction in the four inter macroblock modes is built around two nested loops: the outer loop traverses the four inter macroblock modes and the inner loop traverses the sub-blocks of the current mode. In the inner loop, a motion search is first performed for the current sub-block of the current mode, and all reference frames are then traversed to find the best reference frame and minimum cost of that sub-block. Before moving on to the next mode, the minimum costs of all sub-blocks of the current mode are summed to give the total cost of the mode; comparing this with the minimum total cost so far updates the best inter macroblock mode of the macroblock to be encoded and the corresponding minimum total cost. The process is shown in Fig. 2.

The optimization of inter prediction in the four inter macroblock modes (16x16, 16x8, 8x16, 8x8) applies early termination. While keeping the original structure essentially unchanged, two early exits are added. The first is in the inner loop over reference frames and ends that traversal early; the second is in the outer loop over the four inter macroblock modes and ends that traversal early. The first exit is taken when at least one reference frame has already been processed and the cost of the current sub-block of the current mode with the current reference frame is less than its minimum cost over the previous reference frames. The second exit is taken when at least one mode has already been processed and the total cost of the current mode is less than the minimum total cost of the previous modes. The flow is shown in Fig. 3. Both exits rest on the same idea: it suffices to find a good reference frame or a good inter macroblock mode rather than the optimal one, which improves execution efficiency.
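The two early exits can be sketched as follows. This is a hypothetical sketch, not the encoder's code: the names choose_inter_mode, inter_cost_fn, cost_table and table_cost are invented here, motion vectors and per-sub-block best reference frames are not recorded, and the motion search is replaced by a table lookup (which also counts lookups) so the example stays self-contained.

```c
#define MAX_SUBBLOCKS 4
#define MAX_REFS 4

/* Inter macroblock modes in the traversal order given in the text. */
enum { MODE_16x16, MODE_16x8, MODE_8x16, MODE_8x8, NUM_INTER_MODES };

/* Number of sub-blocks each mode splits a 16x16 macroblock into. */
static const int kSubBlocks[NUM_INTER_MODES] = { 1, 2, 2, 4 };

/* Hypothetical callback: motion search for sub-block `sub` of `mode`
 * against reference frame `ref`, returning its SAD-based cost. */
typedef int (*inter_cost_fn)(int mode, int sub, int ref, void *ctx);

/* Returns the chosen mode and writes its total cost to *out_cost. */
int choose_inter_mode(inter_cost_fn cost, void *ctx, int num_refs, int *out_cost)
{
    int best_mode = MODE_16x16;
    int best_total = 0;                 /* set by the first mode below */

    for (int mode = 0; mode < NUM_INTER_MODES; mode++) {
        int total = 0;
        for (int sub = 0; sub < kSubBlocks[mode]; sub++) {
            int min_cost = cost(mode, sub, 0, ctx);  /* first reference frame */
            for (int ref = 1; ref < num_refs; ref++) {
                int c = cost(mode, sub, ref, ctx);
                if (c < min_cost) {     /* exit 1: beats earlier frames */
                    min_cost = c;       /* ref becomes this sub-block's best */
                    break;
                }
            }
            total += min_cost;
        }
        if (mode == MODE_16x16) {
            best_total = total;
        } else if (total < best_total) { /* exit 2: beats earlier modes */
            best_mode = mode;
            best_total = total;
            break;
        }
    }
    *out_cost = best_total;
    return best_mode;
}

/* Table-driven stand-in for the motion search that counts lookups. */
typedef struct {
    int cost[NUM_INTER_MODES][MAX_SUBBLOCKS][MAX_REFS];
    int lookups;
} cost_table;

int table_cost(int mode, int sub, int ref, void *ctx)
{
    cost_table *t = (cost_table *)ctx;
    t->lookups++;
    return t->cost[mode][sub][ref];
}
```

With four reference frames, a full traversal would perform 36 cost evaluations (4 for 16x16, 8 each for 16x8 and 8x16, 16 for 8x8); in the test scenario below the early exits stop after 10.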

Intra prediction optimization:

As described above, the flow for an I-frame macroblock comprises: intra chroma prediction, intra luma prediction, computing the RD cost of every intra prediction mode, selecting the mode with the smallest RD cost as the best intra prediction mode, setting the macroblock coding parameters, transform, quantization and entropy coding.

Intra prediction operates on 8x8 blocks. In the original flow, all intra prediction modes (mean, horizontal, vertical, plane, bottom-left diagonal, bottom-right diagonal) are traversed to obtain the RD cost of the current 8x8 block in every mode, and the mode with the smallest RD cost is then selected as the best intra prediction mode. The flow is shown in Fig. 4.

This step is also optimized with early termination. The original structure of best intra mode selection for 8x8 blocks is kept, and a test is added to each iteration of the mode traversal; if the test is true, the best intra prediction mode of the 8x8 block is updated and the traversal of the remaining intra prediction modes is terminated. The test is whether the RD cost of the 8x8 block computed in the current intra prediction mode is less than the mean RD cost.

The mean RD cost is computed as:

mean RD cost = (sum of the 8x8-block RD costs over all intra prediction modes computed so far) / (number of intra prediction modes computed so far).

This reduces the number of intra prediction modes traversed and improves execution efficiency. The optimized flow is shown in Fig. 5.
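A minimal sketch of the mean-threshold early exit, assuming the current mode is counted in the mean. The function name choose_intra_mode is hypothetical, and for self-containment the RD costs are passed in as a precomputed array; in an encoder, rd_cost[m] would be computed inside the loop (predict, transform, estimate rate and distortion), which is where the saving comes from.

```c
/* Hypothetical sketch of 8x8 intra mode selection with mean-threshold
 * early termination. Returns the index of the selected mode; *evaluated
 * receives the number of modes examined before the traversal stopped. */
int choose_intra_mode(const int *rd_cost, int num_modes, int *evaluated)
{
    int best = 0;
    long long sum = 0;   /* sum of RD costs of the modes computed so far */
    int n = 0;           /* number of modes computed so far */

    for (int m = 0; m < num_modes; m++) {
        int c = rd_cost[m];
        sum += c;
        n++;
        /* c < sum / n, written without division; never true when n == 1 */
        if ((long long)c * n < sum) {
            best = m;    /* accept this mode and stop the traversal */
            break;
        }
        if (c < rd_cost[best])
            best = m;    /* otherwise keep ordinary minimum tracking */
    }
    *evaluated = n;
    return best;
}
```

For costs {100, 120, 80, 50, 60, 70} the traversal stops at the third mode (80 is below the mean 100) even though the full search would have picked the fourth (50); this is the "good, not optimal" trade-off the method accepts.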

AVS video encoder SSE optimization:

A processor-specific C compiler that supports the SSE instruction set is installed, and the C code of the SAD computation in inter prediction is replaced with suitable SSE instructions, improving execution efficiency and processor utilization.

The target of the SSE optimization is the computation of the SAD between two blocks in inter prediction, defined as:

$$\mathrm{SAD} = \sum_{i=0}^{M-1} \lvert P(i) - Q(i) \rvert,$$

where M is the total number of pixels in the block, P(i) is the Y (luma) value of pixel i in the first block, and Q(i) is the Y value of pixel i in the second block.

Before optimization, the computation is written entirely in C: two nested loops run over the Y and X coordinates of the pixels in the block, and the innermost loop computes the absolute difference at the current position and adds it to the running sum, so that the SAD over all pixels is obtained. The loop over X is already partly optimized: four consecutive pixels are processed as a group to make better use of the processor pipeline. The flow is shown in Fig. 6.
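A sketch of this scalar baseline, assuming a block width that is a multiple of 4 and row-major 8-bit luma with separate strides; the function name sad_scalar and its parameters are hypothetical rather than the encoder's own:

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar SAD over a w x h block. The inner loop handles four horizontally
 * adjacent pixels per iteration (w must be a multiple of 4), mirroring the
 * grouping described above. */
int sad_scalar(const uint8_t *cur, int cur_stride,
               const uint8_t *ref, int ref_stride, int w, int h)
{
    int sad = 0;
    for (int y = 0; y < h; y++) {
        const uint8_t *p = cur + y * cur_stride;
        const uint8_t *q = ref + y * ref_stride;
        for (int x = 0; x < w; x += 4) {
            /* uint8_t operands promote to int, so abs() sees signed values */
            sad += abs(p[x] - q[x]) + abs(p[x + 1] - q[x + 1])
                 + abs(p[x + 2] - q[x + 2]) + abs(p[x + 3] - q[x + 3]);
        }
    }
    return sad;
}
```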

The optimization first splits the incoming 16x16, 16x8, 8x16 or 8x8 block into 8x8 units and then processes these 8x8 blocks one after another. Each 8x8 block is processed with the SSE intrinsics _mm_loadl_epi64(), _mm_sad_epu8() and _mm_extract_epi16(): _mm_loadl_epi64() loads one full row of pixels from the 8x8 block to be encoded and from the predicted 8x8 block, _mm_sad_epu8() computes the SAD of the two rows, and _mm_extract_epi16() extracts that SAD so it can be added to the running total. After all rows have been processed, the total is the SAD between the 8x8 block to be encoded and the predicted 8x8 block. The flow is shown in Fig. 7.
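The row-wise scheme above can be sketched as follows. This is a hypothetical sketch: the names sad8x8_sse2 and sad_block_sse2 are invented, the data layout (row-major 8-bit luma with strides) is assumed, and the intrinsics come from <emmintrin.h> (SSE2; on 32-bit x86, GCC and Clang need -msse2).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* SAD of one 8x8 block, one 8-byte row per iteration. */
int sad8x8_sse2(const uint8_t *cur, int cur_stride,
                const uint8_t *ref, int ref_stride)
{
    int mcost = 0;
    for (int y = 0; y < 8; y++) {
        /* Load 8 pixels into the low 64 bits; the high 64 bits become 0. */
        __m128i s0 = _mm_loadl_epi64((const __m128i *)(ref + y * ref_stride));
        __m128i s1 = _mm_loadl_epi64((const __m128i *)(cur + y * cur_stride));
        /* Row SAD lands in bits 0-15 (the high half contributes 0). */
        __m128i s2 = _mm_sad_epu8(s0, s1);
        mcost += _mm_extract_epi16(s2, 0);
    }
    return mcost;
}

/* Split a 16x16, 16x8, 8x16 or 8x8 block (w x h) into 8x8 units. */
int sad_block_sse2(const uint8_t *cur, int cur_stride,
                   const uint8_t *ref, int ref_stride, int w, int h)
{
    int total = 0;
    for (int by = 0; by < h; by += 8)
        for (int bx = 0; bx < w; bx += 8)
            total += sad8x8_sse2(cur + by * cur_stride + bx, cur_stride,
                                 ref + by * ref_stride + bx, ref_stride);
    return total;
}
```

A row SAD is at most 8 x 255 = 2040, so the 16-bit field read by _mm_extract_epi16() never overflows.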

The intrinsics used in the optimization are explained below.

(1) s0 = _mm_loadl_epi64((__m128i*)ref_block):

As shown in Fig. 8, this instruction loads the 64 bits (8 bytes) of data pointed to by ref_block into the low 64 bits of the 128-bit register s0 and clears the high 64 bits.
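To see the register layout described for Fig. 8, a small hypothetical demo (the function name loadl_epi64_demo is invented) can store the whole 128-bit value back to memory with the SSE2 intrinsic _mm_storeu_si128():

```c
#include <emmintrin.h>
#include <stdint.h>

/* Loads 8 bytes from row with _mm_loadl_epi64 and writes the full 128-bit
 * register to out: bytes 0-7 hold the loaded pixels, bytes 8-15 are zero
 * regardless of what follows the row in memory. */
void loadl_epi64_demo(const uint8_t *row, uint8_t out[16])
{
    __m128i s0 = _mm_loadl_epi64((const __m128i *)row);
    _mm_storeu_si128((__m128i *)out, s0);
}
```

The zeroed high half is what makes the high-lane SAD of _mm_sad_epu8() harmless in the 8x8 routine.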

(2) s2 = _mm_sad_epu8(s0, s1):

As shown in Fig. 9, this instruction computes the byte-wise (8-bit) absolute differences of the low 64 bits of s0 and s1 (both 128-bit values), sums them, and stores the resulting SAD as a 16-bit word in bits 0-15 of s2; likewise, it computes the byte-wise absolute differences of the high 64 bits of s0 and s1, sums them, and stores that SAD as a 16-bit word in bits 64-79 of s2. All other bits of s2 are set to zero. This method uses only the SAD of the low 64 bits.
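The two partial sums can be observed with full 16-byte inputs. In this hypothetical demo (sad_epu8_demo is an invented name), 16-bit lane 0 corresponds to bits 0-15 and lane 4 to bits 64-79:

```c
#include <emmintrin.h>
#include <stdint.h>

/* Runs _mm_sad_epu8 on two 16-byte vectors and returns both partial sums:
 * lo covers bytes 0-7 (bits 0-15 of the result), hi covers bytes 8-15
 * (bits 64-79, i.e. 16-bit lane 4). */
void sad_epu8_demo(const uint8_t a[16], const uint8_t b[16], int *lo, int *hi)
{
    __m128i s0 = _mm_loadu_si128((const __m128i *)a);
    __m128i s1 = _mm_loadu_si128((const __m128i *)b);
    __m128i s2 = _mm_sad_epu8(s0, s1);
    *lo = _mm_extract_epi16(s2, 0);
    *hi = _mm_extract_epi16(s2, 4);
}
```

For example, with a filled with 5 and b holding 2 in bytes 0-7 and 9 in bytes 8-15, lo is 8 x 3 = 24 and hi is 8 x 4 = 32.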

(3) mcost += _mm_extract_epi16(s2, 0):

As shown in Figure 10, this instruction extracts the low 16 bits of s2 (the SAD of the low 64 bits of s0 and s1, computed byte by byte) and adds it to the int variable mcost. In _mm_extract_epi16(__m128i, int), the second argument selects which 16-bit word of s2 is extracted (0 for bits 0-15, 1 for bits 16-31, and so on); the unit of extraction is 16 bits (one word).
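The word index of `_mm_extract_epi16()` must be a compile-time constant, so in general it cannot be a loop variable. The hypothetical helper below therefore writes out all eight extractions to show the index-to-bits mapping; each extracted word is zero-extended into the int result.

```c
#include <emmintrin.h> /* SSE2 */

/* Copy the eight 16-bit words of v into out[0..7], word 0 = bits 0-15. */
void extract_all_words(__m128i v, int out[8])
{
    out[0] = _mm_extract_epi16(v, 0); /* bits 0-15    */
    out[1] = _mm_extract_epi16(v, 1); /* bits 16-31   */
    out[2] = _mm_extract_epi16(v, 2); /* bits 32-47   */
    out[3] = _mm_extract_epi16(v, 3); /* bits 48-63   */
    out[4] = _mm_extract_epi16(v, 4); /* bits 64-79   */
    out[5] = _mm_extract_epi16(v, 5); /* bits 80-95   */
    out[6] = _mm_extract_epi16(v, 6); /* bits 96-111  */
    out[7] = _mm_extract_epi16(v, 7); /* bits 112-127 */
}
```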

Claims

1. A real-time AVS software encoding method for Intel mobile platforms, comprising the following steps:

1) The AVS encoder detects the type of the macroblock currently to be encoded: if it is an I-frame macroblock, it calls the I-frame coding unit to encode it; if it is a P-frame macroblock, it calls the P-frame coding unit to encode it;

2) The P-frame coding unit performs inter-frame prediction on the current P-frame macroblock, selects the inter-frame macroblock mode with the smallest inter-block sum of absolute differences (SAD) as the best inter-frame macroblock mode, and takes the prediction macroblock obtained in that mode as the best predicted macroblock; the residual between it and the current P-frame macroblock is computed, and the resulting residual matrix is transformed, quantized and entropy coded;

3) The I-frame coding unit performs intra-frame prediction on the current I-frame macroblock to obtain its best intra prediction block, computes the residual against the current I-frame macroblock, and transforms, quantizes and entropy codes the resulting residual matrix; during the traversal of intra prediction modes, each time the rate-distortion cost of an 8x8 block is computed in the current intra prediction mode, it is compared with the average rate-distortion cost, and if it is smaller, the block predicted in the current mode is taken as the best intra prediction block of the current I-frame macroblock and the traversal is terminated;

The average rate-distortion cost = (sum of the rate-distortion costs of the 8x8 block over all intra prediction modes evaluated so far) / (number of intra prediction modes evaluated so far);

Wherein, the best inter-frame macroblock mode of the current P-frame macroblock is obtained as follows. In the inter-frame prediction for each inter-frame macroblock mode, during the inner-loop traversal of reference frames, once the first reference frame has been evaluated, if the cost of the current sub-block in the current inter-frame macroblock mode is less than the minimum cost found so far, the current reference frame is taken as the best reference frame for that sub-block in that mode, its cost becomes the new minimum cost, and the inner-loop reference frame traversal is terminated. In the outer-loop traversal of inter-frame macroblock modes, once the first mode has been evaluated, if the total cost in the current inter-frame macroblock mode is less than the minimum total cost found so far, the current mode is taken as the best inter-frame macroblock mode of the current P-frame macroblock.
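The two early-termination rules of claim 1 can be sketched as generic search loops over a cost callback. This is an illustrative sketch under stated assumptions, not the patented implementation: `cost_fn`, `select_intra_mode`, `first_improvement_search` and `table_cost` are hypothetical names, and the candidate count is assumed to be at least 1. The claim text only says that a cheaper inter mode becomes the best one; applying the same early exit to the mode loop follows the abstract's description and is an assumption here. Note that comparing a cost with the average *including* itself is equivalent to comparing it with the average of the earlier modes, so the running-average test is well defined from the second mode on.

```c
#include <limits.h>

/* Hypothetical callback: cost of candidate i (intra mode, reference frame,
   or inter macroblock mode) given encoder state ctx. */
typedef int (*cost_fn)(int i, void *ctx);

/* Intra early termination: stop at the first mode whose RD cost is below the
   average RD cost of all modes evaluated so far (itself included).
   c < sum / n is tested as c * n < sum to avoid integer-division rounding.
   If no mode triggers the exit, the cheapest mode is returned (assumption). */
int select_intra_mode(int num_modes, cost_fn cost, void *ctx, int *best_cost)
{
    long long sum = 0;
    int best = 0, min_cost = INT_MAX;
    for (int m = 0; m < num_modes; m++) {
        int c = cost(m, ctx);
        sum += c;
        if (c < min_cost) {                  /* fallback if no early exit */
            min_cost = c;
            best = m;
        }
        if ((long long)c * (m + 1) < sum) {  /* below running average */
            *best_cost = c;
            return m;                        /* terminate the traversal */
        }
    }
    *best_cost = min_cost;
    return best;
}

/* Reference-frame (and, by assumption, inter-mode) early termination:
   the first candidate sets the minimum cost; the first later candidate
   that beats it is accepted and the loop stops. */
int first_improvement_search(int n, cost_fn cost, void *ctx, int *best_cost)
{
    int min_cost = cost(0, ctx);             /* first candidate always evaluated */
    for (int i = 1; i < n; i++) {
        int c = cost(i, ctx);
        if (c < min_cost) {
            *best_cost = c;
            return i;
        }
    }
    *best_cost = min_cost;
    return 0;
}

/* Hypothetical demo callback: ctx points to a table of precomputed costs. */
int table_cost(int i, void *ctx)
{
    return ((const int *)ctx)[i];
}
```

For example, with intra costs {50, 60, 40, 10, 5}, `select_intra_mode` stops at mode 2 (40 x 3 = 120 < 150) and never evaluates modes 3 and 4, even though they are cheaper. That is the trade-off being made: fewer cost evaluations in exchange for a possibly suboptimal mode.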

2. The method according to claim 1, wherein the inter-block SAD is calculated as follows: every block to be coded is divided into 8x8 units; the SSE instruction _mm_loadl_epi64() loads one whole row of pixels from the 8x8 block to be encoded and from the predicted 8x8 block; the SSE instruction _mm_sad_epu8() computes the SAD of the two rows; and the SSE instruction _mm_extract_epi16() extracts that SAD, which is added to the total SAD; once all rows have been traversed, the total is the SAD between the 8x8 block to be encoded and the predicted 8x8 block.

3. The method according to claim 1, wherein the inter-frame macroblock modes include four inter-frame macroblock modes: 16x16, 16x8, 8x16, and 8x8.

4. The method according to claim 1, wherein the UMHexagonS motion estimation algorithm is used for inter-frame prediction.

5. The method according to claim 1, wherein the intra-frame prediction modes comprise: mean (DC) prediction, horizontal prediction, vertical prediction, plane prediction, down-left diagonal prediction and down-right diagonal prediction.
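Three of the modes listed in claim 5 (vertical, horizontal and mean/DC) can be sketched for a single 8x8 block as follows. This is a simplified illustration only: it assumes both the row above and the column to the left are available and omits the normative AVS rules for reference-sample availability and filtering, so its DC value is a plain rounded mean that may differ from the exact AVS definition. Plane and diagonal modes are not shown. `intra_pred_8x8` and the `MODE_*` constants are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical mode identifiers for this sketch. */
enum { MODE_VERTICAL, MODE_HORIZONTAL, MODE_DC };

/* top[0..7]: reconstructed row above the block;
   left[0..7]: reconstructed column to the left of the block. */
void intra_pred_8x8(int mode, const uint8_t top[8], const uint8_t left[8],
                    uint8_t pred[8][8])
{
    int dc = 0;
    if (mode == MODE_DC) {
        for (int i = 0; i < 8; i++)
            dc += top[i] + left[i];
        dc = (dc + 8) >> 4;                 /* rounded mean of 16 neighbours */
    }
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            pred[y][x] = mode == MODE_VERTICAL   ? top[x]   /* copy column from above */
                       : mode == MODE_HORIZONTAL ? left[y]  /* copy row from the left */
                       : (uint8_t)dc;                       /* flat mean value */
}
```

Each candidate prediction would then be scored (for example with the SAD routine or an RD cost) inside the mode traversal that claim 1 describes.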