CN1235411C - Pipeline-based acceleration method for intra-frame prediction mode block coding

Info

Publication number: CN1235411C
Application number: CN 200310101445
Authority: CN
Inventors: 彭聪; 黄晁; 李锦涛
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2003-10-17
Filing date: 2003-10-17
Publication date: 2006-01-04
Anticipated expiration: 2023-10-17
Also published as: CN1529512A

Abstract

The invention relates to the technical field of network media communication, and in particular to a pipeline-based method for accelerating intra-frame prediction mode block coding. The video frame is divided into macroblocks (for example 16×16), each macroblock is divided into sub-blocks (for example 4×4), and each sub-block is predictively coded in intra-frame prediction mode. The method greatly accelerates intra-frame coding, and therefore video coding, without increasing resource consumption. Its characteristics are: it is based on a pipeline structure and does not increase resource consumption; it is applicable to any block-based intra-frame predictive coding; it adjusts the sub-block coding order to suit the pipeline; and it improves coding speed. The invention is suitable for video coding design.

Description

Pipeline-Based Acceleration Method for Intra Prediction Mode Block Coding

Technical Field

The invention relates to the technical field of network media communication, and in particular to a pipeline-based method for accelerating intra-frame prediction mode block coding.

Technical Background

With the rapid development and wide application of multimedia and network technology, transmitting video data over networks has become increasingly common. Because raw video data requires very high bandwidth and is highly redundant, it is usually compressed by encoding before transmission. In some real-time or near-real-time environments (such as video conferencing), the encoder must reach a sufficiently high encoding speed. Because video encoding is computationally expensive, the encoding process needs acceleration algorithms.

Video coding research began in the late 1980s and has a long history, from MPEG-1 and H.261 to the current MPEG-4 and H.264. Many international standards have been proposed, but the basic ideas remain block-based compression and motion prediction. Building on the intra-frame macroblock prediction of earlier standards, H.264 introduces a new intra-frame prediction mode, 4×4 intra prediction, which divides a 16×16 macroblock into sixteen 4×4 sub-blocks. Each sub-block is intra-predicted separately and has its own prediction mode. This mode improves coding efficiency, but it also increases computational complexity and slows down encoding. The pipeline-based acceleration method proposed by the present invention can greatly speed up intra-frame coding.

Summary of the Invention

The purpose of the present invention is to provide a pipeline-based method for accelerating intra-frame prediction mode block coding. The present invention includes the following features:

Technical Solution of the Invention

A pipeline-based method for accelerating intra-frame prediction mode block coding divides the intra-frame coding process into four sub-processes: prediction, DCT and quantization, inverse quantization and inverse DCT, and reconstruction. According to the data dependencies of intra-frame coding among the sub-blocks in a macroblock, the left-to-right, top-to-bottom intra-frame coding order of the sub-blocks is adjusted so that no data dependency exists between a sub-block and the sub-block that follows it in the coding order. The sub-processes of sub-block intra-frame coding can then operate as a pipeline: the sub-processes of a subsequent sub-block need not wait until all sub-processes of the preceding sub-block have finished. Without increasing resource consumption, the DCT and quantization sub-process of the preceding sub-block can run concurrently with the prediction sub-process of the subsequent sub-block, the inverse quantization and inverse DCT sub-process can run concurrently with the DCT and quantization sub-process of the subsequent sub-block, the reconstruction sub-process can run concurrently with the inverse quantization and inverse DCT sub-process of the subsequent sub-block, and so on.

Description of the Drawings

Figure 1 shows the coding order of intra 4×4 blocks in MPEG-4 AVC/H.264.

Figure 2 shows the prediction reference points in MPEG-4 AVC/H.264.

Figure 3 is a schematic diagram of the intra-frame coding pipeline.

Figure 4 is the execution sequence diagram for sequential execution.

Figure 5 is the pipeline execution sequence diagram in the ideal case.

Figure 6 is the pipeline execution sequence diagram in the actual case.

Specific Embodiments of the Invention

MPEG-4 AVC/H.264 intra-frame prediction is taken as an example.

MPEG-4 AVC/H.264 divides the video frame into 16×16 macroblocks and then divides each macroblock into 4×4 sub-blocks. In intra-frame prediction mode, each sub-block is predicted and coded separately. Figure 1 shows the coding order, and Figure 2 shows the reference points required for prediction.

As shown in Figure 3, the intra 4×4 block coding process can be divided into four sub-processes: prediction, DCT and quantization, inverse quantization and inverse DCT, and reconstruction. Let the times required by the four sub-processes be $T_1$, $T_2$, $T_3$ and $T_4$, respectively. The total time required to intra-code the 16 sub-blocks of one macroblock sequentially is then

$$T_{\mathrm{seq}} = 16 \times (T_1 + T_2 + T_3 + T_4)$$

The execution sequence is shown in Figure 4.

Note that at any moment only one of the four sub-processes (prediction, DCT and quantization, inverse quantization and inverse DCT, reconstruction) is executing, which wastes resources and lengthens the computation time. Pipelining can therefore be used to make full use of the computing resources and reduce the computation time. As shown in Figure 5, in the ideal case the execution time of the intra-frame coding pipeline for the 16 sub-blocks of one macroblock is

$$
\begin{aligned}
T_{\mathrm{pipeline\_ideal}} ={}& T_1 + \max(T_1, T_2) + \max(T_1, T_2, T_3) \\
&+ 13 \times \max(T_1, T_2, T_3, T_4) \\
&+ \max(T_2, T_3, T_4) + \max(T_3, T_4) + T_4
\end{aligned}
$$
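As a rough check of the ideal-case expression, the following Python sketch steps a four-stage pipeline in lock-step slots, reading each max as taken over the stage times active in a slot. The function names and the lock-step model are illustrative assumptions, not part of the patent; the stage times are the FPGA cycle counts quoted in this description.

```python
def ideal_pipeline_time(stage_times, n_blocks=16):
    """Total time of a lock-step pipeline with no data dependencies (assumed model)."""
    n_stages = len(stage_times)
    total = 0
    for slot in range(n_blocks + n_stages - 1):
        # In this slot, stage s works on block (slot - s) if that block exists.
        active = [stage_times[s] for s in range(n_stages)
                  if 0 <= slot - s < n_blocks]
        # The slot lasts as long as its slowest active stage.
        total += max(active)
    return total


def ideal_closed_form(t1, t2, t3, t4):
    """Closed form for 16 sub-blocks: 3 fill slots, 13 full slots, 3 drain slots."""
    return (t1 + max(t1, t2) + max(t1, t2, t3)
            + 13 * max(t1, t2, t3, t4)
            + max(t2, t3, t4) + max(t3, t4) + t4)


# Cycle counts from the FPGA reference implementation described in this document.
T = (20, 16, 16, 18)
print(ideal_pipeline_time(T), ideal_closed_form(*T))
```

With these cycle counts both functions return 374 cycles, which serves only as an upper bound on what pipelining could achieve if no sub-block depended on another.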

However, because data dependencies exist between sub-blocks, the ideal state cannot be reached. For example, the prediction of sub-block 1 requires the reconstructed data of sub-block 0, and the prediction of sub-block 2 requires the reconstructed data of sub-blocks 0 and 1. Based on these dependencies, the sub-block coding order is adjusted to make full use of the pipeline. The adjusted coding order is:

0, 1, 4, 2, 5, 3, 6, 8, 7, 9, 12, 10, 13, 11, 14, 15

The resulting execution sequence is shown in Figure 6.
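The effect of the adjusted order can be illustrated with a small Python sketch that counts how often a sub-block needs the reconstruction of the sub-block coded immediately before it. Several assumptions are made here that the text does not state: the numbering of Figure 1 is taken to be the standard H.264 4×4 scan (0 1 4 5 / 2 3 6 7 / 8 9 12 13 / 10 11 14 15), only neighbours inside the same macroblock are considered, and a neighbour counts only if it precedes the block in the standard order. All function names are hypothetical.

```python
def position(block):
    """Map an H.264 4x4 block index (0..15) to (row, col) in the macroblock."""
    col = (block & 1) + 2 * ((block >> 2) & 1)
    row = ((block >> 1) & 1) + 2 * ((block >> 3) & 1)
    return row, col


INDEX_AT = {position(b): b for b in range(16)}


def dependencies(block):
    """Sub-blocks in the same macroblock whose reconstruction `block` may read.

    Assumption: left, above, above-left and above-right neighbours count, and a
    neighbour is usable only if it precedes `block` in the standard order.
    """
    row, col = position(block)
    neighbours = [(row, col - 1), (row - 1, col),
                  (row - 1, col - 1), (row - 1, col + 1)]
    return {INDEX_AT[p] for p in neighbours
            if p in INDEX_AT and INDEX_AT[p] < block}


def back_to_back_stalls(order):
    """Pairs (prev, cur) where cur depends on the block coded just before it."""
    return [(a, b) for a, b in zip(order, order[1:]) if a in dependencies(b)]


STANDARD = list(range(16))
ADJUSTED = [0, 1, 4, 2, 5, 3, 6, 8, 7, 9, 12, 10, 13, 11, 14, 15]

print(len(back_to_back_stalls(STANDARD)), len(back_to_back_stalls(ADJUSTED)))
print(back_to_back_stalls(ADJUSTED))
```

Under these assumptions, 12 of the 15 consecutive pairs in the standard order are dependent, compared with 6 in the adjusted order. The sketch only tallies immediate-predecessor dependencies; it is not a timing model of Figure 6.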

The total time required to intra-code the 16 sub-blocks of one macroblock is then

$$
\begin{aligned}
T_{\mathrm{pipeline\_real}} ={}& 4 \times (T_1 + T_2 + T_3 + T_4) \\
&+ 3 \times \{\, T_1 + T_4 + 2 \times [\max(T_1, T_2) + \max(T_2, T_3) + \max(T_3, T_4) + \max(T_4, T_1)] \,\}
\end{aligned}
$$

The speedup ratio is $\lambda = T_{\mathrm{seq}} / T_{\mathrm{pipeline\_real}}$.

In the FPGA reference hardware implementation, $T_1$ = 20 cycles, $T_2$ = $T_3$ = 16 cycles and $T_4$ = 18 cycles. This gives $T_{\mathrm{seq}}$ = 1120 cycles and $T_{\mathrm{pipeline\_real}}$ = 838 cycles, so the speedup ratio is 1.3365: performance improves by 33.65% without increasing hardware resource consumption.
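The reported speedup can be reproduced directly from the two timing formulas. The Python sketch below is only a numerical check; the function names are illustrative and the cycle counts are those of the FPGA reference implementation above.

```python
def t_seq(t1, t2, t3, t4, n_blocks=16):
    """Sequential time: every sub-block runs all four sub-processes back to back."""
    return n_blocks * (t1 + t2 + t3 + t4)


def t_pipeline_real(t1, t2, t3, t4):
    """Pipelined time with the adjusted coding order, per the formula in the text."""
    overlap = max(t1, t2) + max(t2, t3) + max(t3, t4) + max(t4, t1)
    return 4 * (t1 + t2 + t3 + t4) + 3 * (t1 + t4 + 2 * overlap)


T = (20, 16, 16, 18)  # T1..T4 in cycles
seq, real = t_seq(*T), t_pipeline_real(*T)
print(seq, real, round(seq / real, 4))  # 1120 838 1.3365
```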

Claims

1. A pipeline-based method for accelerating intra-frame prediction mode block coding, wherein the intra-frame coding process is divided into four sub-processes: prediction, DCT and quantization, inverse quantization and inverse DCT, and reconstruction; according to the data dependencies of intra-frame coding among the sub-blocks in a macroblock, the left-to-right, top-to-bottom intra-frame coding order of the sub-blocks is adjusted so that no data dependency exists between a sub-block and the sub-block that follows it in the coding order, whereby the sub-processes of sub-block intra-frame coding form a pipeline and the sub-processes of a subsequent sub-block need not wait until all sub-processes of the preceding sub-block have finished; without increasing resource consumption, the DCT and quantization sub-process of the preceding sub-block can be carried out simultaneously with the prediction sub-process of the subsequent sub-block, the inverse quantization and inverse DCT sub-process can be carried out simultaneously with the DCT and quantization sub-process of the subsequent sub-block, the reconstruction sub-process can be carried out simultaneously with the inverse quantization and inverse DCT sub-process of the subsequent sub-block, and so on.