CN106558083A

CN106558083A - Method, device and system for accelerating the intra-frame prediction stage of a WebP compression algorithm

Info

Publication number: CN106558083A
Application number: CN201611092708.XA
Authority: CN
Inventors: 魏士欣
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2016-11-30
Filing date: 2016-11-30
Publication date: 2017-04-05

Abstract

The invention discloses an acceleration method, device and system for the intra-frame prediction stage of a WebP compression algorithm, for use in a data center server. The method includes: obtaining a picture in WebP format on the CPU side, dividing the picture into macroblocks of a preset size and recording the position information of each macroblock; selecting the macroblock to be predicted as the current block, judging from the position information of the current block whether an encoded macroblock exists at the preset positions around the current block and, if so, sending the current block and the encoded macroblock to a computing chip; the computing chip performing several preset prediction calculations on the current block and the encoded macroblock, obtaining the prediction result and distortion rate of each prediction calculation, and sending them to the CPU; selecting the prediction result of the prediction calculation with the minimum distortion rate as the encoding of the current block; and selecting the next current block until all macroblocks in the picture are encoded. The invention separates logical judgment from vector calculation and has each handled by a different chip, which makes intra-frame prediction fast.

Description

A method, device and system for accelerating the intra-frame prediction stage of a WebP compression algorithm

Technical Field

The present invention relates to the technical field of picture intra-frame prediction, and in particular to an acceleration method, device and system for the intra-frame prediction stage of a WebP compression algorithm.

Background

In the "Internet Plus" era, according to incomplete statistics, picture data accounts for about 60% of total Internet traffic, and transmitting massive amounts of picture data increases network latency. WebP is a picture compression format that uses a complex compression algorithm in exchange for a higher compression ratio.

In the overall WebP compression flow, intra-frame prediction is an important stage whose purpose is to encode each macroblock obtained by dividing the picture. Because multiple prediction modes are processed, this stage requires complex logical judgments and a large number of vector calculations. A conventional CPU is poorly suited to high-speed parallel computation, so processing is slow.

Therefore, how to provide a fast intra-frame prediction method, device and system for pictures in WebP format is a problem that those skilled in the art currently need to solve.

Summary of the Invention

The purpose of the present invention is to provide an acceleration method, device and system for the intra-frame prediction stage of a WebP compression algorithm, which separate operations such as logical judgment from vector calculation operations and have them performed by a CPU and a computing chip respectively. With the work divided between the two chips, intra-frame prediction is fast.

To solve the above technical problem, the present invention provides an acceleration method for the intra-frame prediction stage of a WebP compression algorithm, for use in a data center server, comprising:

Step s101: obtaining a picture in WebP format on the CPU side, dividing the picture into macroblocks of a preset size and recording the position information of each macroblock;

Step s102: selecting the macroblock currently to be predicted as the current block; judging, according to the position information of the current block, whether an encoded macroblock exists at the preset positions around the current block in the picture; if so, sending the current block and the encoded macroblock to a computing chip, so that the computing chip performs several preset prediction calculations on the current block and the encoded macroblock, obtains the prediction result and distortion rate of each prediction calculation, and sends them to the CPU;

Step s103: selecting the prediction result of the prediction calculation with the smallest distortion rate as the encoding of the current block, and repeating step s102 until all macroblocks in the picture have been encoded.

Preferably, the preset surrounding positions are specifically the left, above and upper-left of the current block.

Preferably, the several preset prediction calculations specifically include:

One or a combination of horizontal prediction, vertical prediction, DC prediction and motion prediction.

To solve the above technical problem, the present invention also provides an acceleration device for the intra-frame prediction stage of a WebP compression algorithm, for use in a data center server, comprising:

A picture acquisition module, configured to obtain a picture in WebP format on the CPU side, divide the picture into macroblocks of a preset size and record the position information of each macroblock;

A selection module, configured to select the macroblock currently to be predicted as the current block; judge, according to the position information of the current block, whether an encoded macroblock exists at the preset positions around the current block in the picture; and, if so, send the current block and the encoded macroblock to a computing chip, so that the computing chip performs several preset prediction calculations on the current block and the encoded macroblock, obtains the prediction result and distortion rate of each prediction calculation, and sends them to the CPU;

An encoding module, configured to select the prediction result of the prediction calculation with the smallest distortion rate as the encoding of the current block, and to trigger the selection module to select the next macroblock to be predicted, until all macroblocks in the picture have been encoded.

To solve the above technical problem, the present invention also provides an acceleration system for the intra-frame prediction stage of a WebP compression algorithm, for use in a data center server, comprising a CPU and a computing chip, the CPU including the above picture intra-frame prediction device.

Preferably, the computing chip is specifically an FPGA.

The present invention provides an acceleration method, device and system for the intra-frame prediction stage of a WebP compression algorithm. The division into macroblocks, and the logical judgment of whether encoded macroblocks exist at the preset positions around the current block, are performed in the CPU; the prediction calculations on the current block and the encoded macroblocks are then performed by the computing chip. In other words, the present invention separates operations such as logical judgment from vector calculation operations and assigns them to the CPU and the computing chip respectively. The computing chip is better suited than the CPU to calculation, while the CPU handles only operations such as logical judgment. Dividing the work between the two chips increases the speed of intra-frame prediction.

Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of the acceleration method for the intra-frame prediction stage of a WebP compression algorithm provided by the present invention;

FIG. 2 is a schematic diagram of the current block and the surrounding encoded macroblocks in a specific embodiment provided by the present invention;

FIG. 3 is a schematic structural diagram of the acceleration system for the intra-frame prediction stage of a WebP compression algorithm provided by the present invention.

Detailed Description

The core of the present invention is to provide an acceleration method and device for the intra-frame prediction stage of a WebP compression algorithm, which separate operations such as logical judgment from vector calculation operations and have them performed by the CPU and the computing chip respectively. With the work divided between the two chips, intra-frame prediction is fast.

To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The present invention provides an acceleration method for the intra-frame prediction stage of a WebP compression algorithm, for use in a data center server. See FIG. 1, which is a flowchart of the method. The method includes:

Step s101: obtaining a picture in WebP format on the CPU side, dividing the picture into macroblocks of a preset size and recording the position information of each macroblock;

It can be understood that the picture uses the YUV color encoding method, and each macroblock consists of one luma block and two chroma blocks. Because Y (the luma component) and UV (the chroma components) have different sampling rates, Y and UV are divided into blocks of different sizes: the Y luma block is 16*16 and each UV chroma block is 8*8. Of course, the present invention does not limit the specific size of each macroblock.
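The macroblock division in step s101 can be illustrated with a minimal Python sketch. It assumes raster-order traversal and rounds the picture dimensions up to whole macroblocks; `MacroblockPos` and `split_into_macroblocks` are hypothetical names introduced here for illustration and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import List

MB_SIZE = 16  # luma macroblock edge; the text cites 16*16 for Y and 8*8 for U/V


@dataclass(frozen=True)
class MacroblockPos:
    """Position of one macroblock, in macroblock units (hypothetical type)."""
    mb_x: int  # column index
    mb_y: int  # row index


def split_into_macroblocks(width: int, height: int,
                           mb_size: int = MB_SIZE) -> List[MacroblockPos]:
    """Return macroblock positions in raster order.

    Assumption: a picture whose size is not a multiple of mb_size is
    covered by rounding the macroblock count up in each dimension.
    """
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return [MacroblockPos(x, y) for y in range(rows) for x in range(cols)]


# A 40x33 luma plane needs 3 columns and 3 rows of 16x16 macroblocks.
positions = split_into_macroblocks(40, 33)
```

Recording positions in macroblock units rather than pixels keeps the later neighbor checks down to simple index arithmetic.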

Step s102: selecting the macroblock currently to be predicted as the current block; judging, according to the position information of the current block, whether an encoded macroblock exists at the preset positions around the current block in the picture; if so, sending the current block and the encoded macroblock to the computing chip, so that the computing chip performs several preset prediction calculations on the current block and the encoded macroblock, obtains the prediction result and distortion rate of each prediction calculation, and sends them to the CPU;

Here the position information is specifically a coordinate, for example an xy coordinate. In addition, the start address of the current block must be calculated from the coordinates of the current block and its row and column widths, for use when sending the macroblock.
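One way to read the start-address calculation is as an offset into a row-major sample plane. The sketch below assumes that layout, with one sample per element and `stride` samples per picture row; the function name and parameters are hypothetical, and the patent does not specify the memory layout.

```python
def block_start_offset(mb_x: int, mb_y: int, mb_size: int, stride: int) -> int:
    """Offset, in samples, of the top-left sample of macroblock (mb_x, mb_y).

    Assumptions: the plane is stored row-major and `stride` is the number
    of samples per picture row.
    """
    return mb_y * mb_size * stride + mb_x * mb_size


# Macroblock (2, 1) of a 64-sample-wide luma plane with 16x16 blocks:
# skip 16 full rows (16 * 64 samples), then 2 blocks (32 samples) to the right.
offset = block_start_offset(2, 1, 16, 64)
```

Adding this offset to the base address of the plane would give the address from which the block's rows are copied, one row every `stride` samples.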

Preferably, the preset surrounding positions are specifically the left, above and upper-left of the current block. The preset surrounding positions depend on the prediction calculation used, and the present invention does not limit them.
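The logical judgment the CPU performs here can be sketched as a lookup of the three preferred neighbor positions in the set of macroblocks already encoded. This is an illustrative sketch only; `available_neighbors` and the use of a set of coordinate tuples are assumptions, not the patent's data structures.

```python
from typing import Dict, Set, Tuple

Pos = Tuple[int, int]  # (mb_x, mb_y) in macroblock units


def available_neighbors(mb_x: int, mb_y: int, encoded: Set[Pos]) -> Dict[str, Pos]:
    """Return the left, above and upper-left neighbors that are already encoded."""
    candidates = {
        "left": (mb_x - 1, mb_y),
        "above": (mb_x, mb_y - 1),
        "above_left": (mb_x - 1, mb_y - 1),
    }
    # Positions outside the picture are never in `encoded`, so the picture
    # borders need no separate bounds check.
    return {name: pos for name, pos in candidates.items() if pos in encoded}


encoded_so_far = {(0, 0), (1, 0), (0, 1)}
neighbors = available_neighbors(1, 1, encoded_so_far)
```

For the block at (1, 1) all three neighbors are available, while the first block of the picture has none.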

In addition, after sending the current block and the encoded macroblock to the computing chip, the CPU sends a start-control instruction to the computing chip to set the relevant parameters and to make the computing chip perform the prediction calculations. Of course, the parameter settings depend on the specific situation.

Further, when prediction is complete, the computing chip sends an end flag to the CPU. On receiving the end flag, the CPU generates a read instruction and reads the prediction results and distortion rates of the various prediction calculations.
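The send, start, end-flag and read sequence described above can be sketched with an in-process stand-in for the computing chip. Everything here is hypothetical: `StubAccelerator` is not a real device driver, its method names are invented for illustration, and a real device would run asynchronously rather than finishing inside `start`.

```python
class StubAccelerator:
    """In-process stand-in for the computing chip (hypothetical, not a driver)."""

    def __init__(self):
        self.buffer = None
        self.params = None
        self.done = False      # plays the role of the end flag
        self.results = None

    def write_blocks(self, current, neighbors):
        """Receive the current block and the encoded neighbor blocks."""
        self.buffer = (current, neighbors)
        self.done = False

    def start(self, **params):
        """Start-control instruction: set parameters and run the predictions."""
        self.params = params
        current, _ = self.buffer
        # Placeholder result; a real chip would fill in one entry per mode.
        self.results = {"placeholder_mode": {"prediction": current, "distortion": 0}}
        self.done = True

    def read_results(self):
        if not self.done:
            raise RuntimeError("end flag not set yet")
        return self.results


def offload(acc, current, neighbors, **params):
    acc.write_blocks(current, neighbors)  # 1. send blocks to the chip
    acc.start(**params)                   # 2. start-control instruction
    while not acc.done:                   # 3. wait for the end flag
        pass
    return acc.read_results()             # 4. read predictions and distortions


acc = StubAccelerator()
res = offload(acc, [[1, 2], [3, 4]], {"left": [[0, 0], [0, 0]]}, mode_count=4)
```

The busy-wait loop is only a placeholder for whatever notification mechanism the host interface provides.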

Step s103: selecting the prediction result of the prediction calculation with the smallest distortion rate as the encoding of the current block, and repeating step s102 until all macroblocks in the picture have been encoded.
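The selection in step s103 amounts to taking the minimum over the distortion rates returned by the computing chip. A minimal sketch, assuming the results arrive as a mapping from a mode name to a (prediction, distortion) pair; that shape and the name `select_best` are illustrative assumptions.

```python
from typing import Any, Dict, Tuple


def select_best(results: Dict[str, Tuple[Any, float]]) -> Tuple[str, Any]:
    """Return (mode, prediction) for the entry with the smallest distortion."""
    best_mode = min(results, key=lambda mode: results[mode][1])
    return best_mode, results[best_mode][0]


# Made-up distortion values, only to exercise the selection.
example = {
    "horizontal": ([1], 12),
    "vertical": ([2], 7),
    "dc": ([3], 9),
}
mode, prediction = select_best(example)
```

If two modes tie, `min` keeps the first one in the mapping's iteration order; the patent does not say how ties should be broken.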

The several preset prediction calculations specifically include one or a combination of horizontal prediction, vertical prediction, DC prediction and motion prediction.

For ease of understanding, a 4*4 macroblock is taken as an example. See FIG. 2, which is a schematic diagram of the current block and the surrounding encoded macroblocks in a specific embodiment provided by the present invention.

The relation for each prediction calculation is as follows:

Horizontal prediction: X_{ij} = L_i; that is, in horizontal prediction, the prediction result of each pixel of the current block equals the encoded value from the encoded macroblock in the same row to its left.

Vertical prediction: X_{ij} = A_j; that is, in vertical prediction, the prediction result of each pixel of the current block equals the encoded value from the encoded macroblock in the same column above it.

DC prediction: X_{ij} = (L_i + A_j)/2

Motion prediction: X_{ij} = L_i + A_j - C

where i, j ∈ {0, 1, 2, 3}, and X_{ij} denotes the prediction result for each pixel of the current macroblock.
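The four relations can be written out for a 4*4 block as a short Python sketch. It takes `L` as the four encoded values to the left (one per row) and `A` as the four encoded values above (one per column), and reads the DC relation as (L_i + A_j)/2. Several points are assumptions, since the text does not settle them: `C` is taken to be the pixel above and to the left of the block (the last relation has the same form as the TrueMotion mode in VP8), integer division is used for the DC average, no clamping to the pixel range is applied, and the distortion is measured as a sum of squared errors.

```python
from typing import Dict, List

Block = List[List[int]]
N = 4  # block edge used in the example


def predict_4x4(L: List[int], A: List[int], C: int) -> Dict[str, Block]:
    """Apply the four relations, with i as the row index and j as the column index."""
    return {
        "horizontal": [[L[i] for j in range(N)] for i in range(N)],
        "vertical": [[A[j] for j in range(N)] for i in range(N)],
        # Assumption: integer (floor) division for the average.
        "dc": [[(L[i] + A[j]) // 2 for j in range(N)] for i in range(N)],
        # Assumption: C is the upper-left pixel; no clamping is applied.
        "motion": [[L[i] + A[j] - C for j in range(N)] for i in range(N)],
    }


def sse(block: Block, pred: Block) -> int:
    """Sum of squared errors, used here as a stand-in for the distortion rate."""
    return sum((b - p) ** 2
               for row_b, row_p in zip(block, pred)
               for b, p in zip(row_b, row_p))


preds = predict_4x4(L=[10, 20, 30, 40], A=[1, 2, 3, 4], C=5)
```

Each of the 16 outputs depends only on `L`, `A` and `C`, so all pixels of all four modes can be computed independently, which is the kind of vector work the description assigns to the computing chip.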

The present invention provides an acceleration method for the intra-frame prediction stage of a WebP compression algorithm. The division into macroblocks, and the logical judgment of whether encoded macroblocks exist at the preset positions around the current block, are performed in the CPU; the prediction calculations on the current block and the encoded macroblocks are then performed by the computing chip. In other words, the present invention separates operations such as logical judgment from vector calculation operations and assigns them to the CPU and the computing chip respectively. The computing chip is better suited than the CPU to calculation, while the CPU handles only operations such as logical judgment. Dividing the work between the two chips increases the speed of intra-frame prediction.

The present invention also provides an acceleration device for the intra-frame prediction stage of a WebP compression algorithm, for use in a data center server. The device includes:

A picture acquisition module 11, configured to obtain a picture in WebP format on the CPU side, divide the picture into macroblocks of a preset size and record the position information of each macroblock;

A selection module 12, configured to select the macroblock currently to be predicted as the current block; judge, according to the position information of the current block, whether an encoded macroblock exists at the preset positions around the current block in the picture; and, if so, send the current block and the encoded macroblock to the computing chip 2, so that the computing chip 2 performs several preset prediction calculations on the current block and the encoded macroblock, obtains the prediction result and distortion rate of each prediction calculation, and sends them to the CPU1;

An encoding module 13, configured to select the prediction result of the prediction calculation with the smallest distortion rate as the encoding of the current block, and to trigger the selection module 12 to select the next macroblock to be predicted, until all macroblocks in the picture have been encoded.
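How modules 12 and 13 cooperate can be sketched as a single loop over the macroblock positions produced by module 11. This is an illustrative sketch under stated assumptions: `encode_picture` and `fake_offload` are hypothetical names, the computing chip is replaced by a callable returning made-up distortion values, and blocks with no encoded neighbor are still offloaded because the text does not say how that case is handled.

```python
from typing import Any, Callable, Dict, List, Tuple

Pos = Tuple[int, int]
Results = Dict[str, Tuple[Any, float]]  # mode -> (prediction, distortion)


def encode_picture(positions: List[Pos],
                   offload: Callable[[Pos, List[Pos]], Results]
                   ) -> Dict[Pos, Tuple[str, Any]]:
    """Run the select / offload / choose loop over all macroblocks."""
    encoded: Dict[Pos, Tuple[str, Any]] = {}
    for (x, y) in positions:
        # Selection module 12: find encoded neighbors (left, above, upper-left).
        neighbors = [p for p in ((x - 1, y), (x, y - 1), (x - 1, y - 1))
                     if p in encoded]
        # Computing chip 2: run every preset prediction calculation.
        results = offload((x, y), neighbors)
        # Encoding module 13: keep the result with the smallest distortion.
        mode = min(results, key=lambda m: results[m][1])
        encoded[(x, y)] = (mode, results[mode][0])
    return encoded


def fake_offload(pos: Pos, neighbors: List[Pos]) -> Results:
    """Stand-in for the chip; the distortion values are invented."""
    return {"horizontal": (None, 10 - 3 * len(neighbors)), "dc": (None, 5)}


out = encode_picture([(0, 0), (1, 0), (0, 1), (1, 1)], fake_offload)
```

Because each block's neighbor check depends on the blocks encoded before it, the loop itself stays sequential on the CPU while the per-block arithmetic is handed off.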

The present invention provides an acceleration device for the intra-frame prediction stage of a WebP compression algorithm. The division into macroblocks, and the logical judgment of whether encoded macroblocks exist at the preset positions around the current block, are performed in the CPU; the prediction calculations on the current block and the encoded macroblocks are then performed by the computing chip. In other words, the present invention separates operations such as logical judgment from vector calculation operations and assigns them to the CPU and the computing chip respectively. The computing chip is better suited than the CPU to calculation, while the CPU handles only operations such as logical judgment. Dividing the work between the two chips increases the speed of intra-frame prediction.

The present invention also provides an acceleration system for the intra-frame prediction stage of a WebP compression algorithm, for use in a data center server. See FIG. 3, which is a schematic structural diagram of the system. The system includes a CPU1 and a computing chip 2, and the CPU1 includes the above picture intra-frame prediction device.

Here the computing chip 2 is specifically an FPGA. It can be understood that an FPGA (Field-Programmable Gate Array) computes quickly and is suitable for processing that involves a large number of vector operations.

In addition, the present invention requires a cache to be provided in the FPGA. The cache can be placed in the FPGA's DDR memory (Double Data Rate synchronous dynamic random-access memory); of course, the present invention does not limit the location of the cache. The cache holds the current block sent by the CPU1 and the encodings of the encoded macroblocks at the preset positions around it, as well as the prediction results and distortion rates obtained once prediction is complete.

In addition, the present invention can use the OpenCL high-level language to produce the corresponding algorithm description, and from it generate a host-side program that runs on the CPU1 and a kernel-side program targeting the FPGA platform. The host-side program is then compiled with the GCC compiler into an executable file that runs on the CPU1, and the kernel program file is compiled and synthesized with the Altera SDK for OpenCL (AOC) high-level synthesis tool into an AOCX file that runs on the FPGA. Finally, the CPU1 and the FPGA are connected through a PCI-E interface for data communication, and the DDR3 memory on the FPGA development board is used as the data cache. Of course, the above is a preferred embodiment, and how the programs are actually generated depends on the situation.

It should be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for accelerating the intra-frame prediction stage of a WebP compression algorithm, for use in a data center server, characterized by comprising:

Step s101: obtaining a picture in WebP format on the CPU side, dividing the picture into macroblocks of a preset size and recording the position information of each macroblock;

Step s102: selecting the macroblock currently to be predicted as the current block; judging, according to the position information of the current block, whether an encoded macroblock exists at the preset positions around the current block in the picture; if so, sending the current block and the encoded macroblock to a computing chip, so that the computing chip performs several preset prediction calculations on the current block and the encoded macroblock, obtains the prediction result and distortion rate of each prediction calculation, and sends them to the CPU;

Step s103: selecting the prediction result of the prediction calculation with the smallest distortion rate as the encoding of the current block, and repeating step s102 until all macroblocks in the picture have been encoded.

2. The method according to claim 1, wherein the preset surrounding positions are specifically the left, above and upper-left of the current block.

3. The method according to claim 1, characterized in that, the preset several predictive calculations specifically include:

One or a combination of horizontal prediction, vertical prediction, DC prediction and motion prediction.

4. An acceleration device for the intra-frame prediction stage of a WebP compression algorithm, for use in a data center server, characterized by comprising:

A picture acquisition module, configured to obtain a picture in WebP format on the CPU side, divide the picture into macroblocks of a preset size and record the position information of each macroblock;

A selection module, configured to select the macroblock currently to be predicted as the current block; judge, according to the position information of the current block, whether an encoded macroblock exists at the preset positions around the current block in the picture; and, if so, send the current block and the encoded macroblock to a computing chip, so that the computing chip performs several preset prediction calculations on the current block and the encoded macroblock, obtains the prediction result and distortion rate of each prediction calculation, and sends them to the CPU;

An encoding module, configured to select the prediction result of the prediction calculation with the smallest distortion rate as the encoding of the current block, and to trigger the selection module to select the next macroblock to be predicted, until all macroblocks in the picture have been encoded.

5. An acceleration system for the intra-frame prediction stage of a WebP compression algorithm, for use in a data center server, characterized by comprising a CPU and a computing chip, wherein the CPU includes the picture intra-frame prediction device according to claim 4.

6. The system according to claim 4, wherein the computing chip is specifically an FPGA.