
CN101150719B - Method and device for parallel video coding - Google Patents


Info

Publication number
CN101150719B
CN101150719B (application CN200610113256A)
Authority
CN
China
Prior art keywords
slice
group
sub
frame
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 200610113256
Other languages
Chinese (zh)
Other versions
CN101150719A (en)
Inventor
孟新建
Current Assignee
Honor Device Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN200610113256
Publication of CN101150719A
Application granted
Publication of CN101150719B
Active legal status
Anticipated expiration

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a parallel video encoding method and device. The device mainly comprises a main processor and a plurality of encoders. The main processor divides the current frame of a video sequence into macroblocks, assigns all macroblocks to one or more slice groups according to predetermined rules, and divides the slice groups of the current frame into one or more sub-slice groups in raster-scan order according to the principle of balancing the processing load among the encoders. All sub-slice groups of the current frame and the encoding configuration parameters are then transmitted to the respective encoders, which encode the sub-slice groups of the current frame in parallel and output their respective bitstreams and parameters. Finally, the bitstreams and parameters output by the encoders are aggregated to complete the encoding of slice groups, frames, and the sequence, and the bitstream of the entire sequence is output. The method and device of the invention are therefore suitable for real-time high-definition video encoding.

Description

Method and device for parallel video coding

Technical Field

The present invention relates to the field of video coding technology, and in particular to a parallel video coding technique.

Background Art

Video coding technology compresses digital video information so that it can be transmitted and stored more efficiently. Video coding is the core technology of multimedia data processing.

At present, the main video compression coding standards include: the moving picture coding standards H.261 and H.263 formulated by the Video Coding Experts Group (VCEG) of the ITU Telecommunication Standardization Sector (ITU-T); the video coding standards MPEG-1 and MPEG-4 Part 2 formulated by the ISO/IEC Moving Picture Experts Group (MPEG); the video coding standards MPEG-2/H.262 and H.264/AVC (MPEG-4 Part 10) jointly formulated by the ITU-T VCEG and the ISO/IEC MPEG Joint Video Team (JVT); and, in addition, VC-1 (formerly WMV-9) and the video coding standard AVS1.0-P2 formulated by the Audio Video coding Standard (AVS) workgroup.

The basic framework of the MPEG, JVT, and VCEG families of video coding standards is shown in Figure 1. They adopt a hybrid coding architecture based on block-based motion compensation and transform coding, comprising intra prediction, inter prediction, transform, quantization, entropy coding, and so on. Inter prediction uses block-based motion vectors to remove redundancy between pictures, while intra prediction uses spatial prediction modes to remove redundancy within a picture. Visual redundancy within a picture is further removed by transforming and quantizing the prediction residual. Finally, the motion vectors, prediction modes, quantization parameters, and transform coefficients are compressed by entropy coding. The basic processing unit of the video decoding process is the macroblock; a macroblock usually comprises a 16×16 block of luma samples and the corresponding chroma sample blocks.

The tools used by the different standards differ to some extent. In the new-generation standards H.264/AVC (MPEG-4 Part 10), VC-1, and AVS1.0-P2, the deblocking filter is a mandatory module, known as the loop filter; in the MPEG-2, H.263, and MPEG-4 Part 2 standards, the deblocking filter is only an optional post-processing step in the decoder.

A real-time video encoder takes a high-definition video signal as input, performs video compression coding in real time, and outputs a bitstream. Real-time video encoders are basic equipment in digital television headend systems, and are also widely used in video conferencing, digital cameras, recordable DVD players, and similar devices.

H.264/AVC (MPEG-4 Part 10), VC-1, and AVS1.0-P2 are known as the new generation of video compression coding standards. Compared with the previous generation represented by MPEG-2, the new-generation standards improve the compression ratio by more than a factor of two, but their complexity also increases by more than a factor of two, making implementation considerably more difficult.

High-definition television (HDTV) usually refers to moving images with 720 progressive scanning lines or 1080 interlaced scanning lines per frame, or above. Common high-definition formats currently include: 720p (resolution 1280×720, frame rate 24, 30, or 60), 1080i (interlaced, resolution 1920×1088 per frame, field rate 60), and 1080p (resolution 1920×1088, frame rate 24 or 30). Video of even higher resolutions will also find applications in the future. High-definition video provides higher video quality, but at the same time high-definition video compression coding is more difficult to realize.

Taking the new-generation standard H.264/AVC as an example, the introduction of tools such as multi-reference-frame motion compensation, variable block-size prediction down to 4×4, rich intra prediction modes, loop filtering, and context-adaptive variable-length coding (CAVLC) or context-adaptive binary arithmetic coding (CABAC) greatly increases the complexity of the encoder. According to estimates, with full-search motion estimation, an H.264/AVC HDTV 720p encoder has a computational complexity of about 3600 giga-instructions per second (GIPS) and a memory access bandwidth of about 5570 gigabytes per second (GBytes/s). The figures are even larger for 1080i and 1080p.

Owing to the enormous computational complexity of high-definition video encoders, real-time encoding is usually difficult to achieve with a single processor. Especially in applications such as headends, which must support multiple channels, multiple video formats and compression coding standards, and transcoding, implementing the encoder with a single device is even more difficult. High-definition encoders therefore often have to be implemented with multiple chips (that is, multiple encoders) performing the encoding in parallel.

However, the industry currently has no parallel video coding scheme that satisfies well the high-performance real-time high-definition encoding requirements of headend and similar applications for new-generation video compression coding standards such as H.264/AVC (MPEG-4 Part 10), VC-1, and AVS1.0-P2.

Summary of the Invention

The purpose of the present invention is to provide a parallel video coding method and device, thereby providing an efficient parallel video coding implementation that meets the requirements of high-performance real-time high-definition video coding.

The purpose of the present invention is achieved through the following technical solutions:

The present invention provides a method for parallel video coding, the method comprising:

dividing the current frame in the video sequence into macroblocks, and assigning all macroblocks to one or more slice groups according to predetermined rules;

dividing each slice group into one or more sub-slice groups in raster-scan order, where during this division each macroblock must be assigned to exactly one sub-slice group;
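The raster-order division above can be sketched as follows (an illustrative Python sketch, not part of the patent; the function name and the proportional-weight heuristic are assumptions):

```python
# Illustrative sketch (not from the patent): split a frame's macroblock rows
# into contiguous sub-slice groups in raster-scan order, so that every
# macroblock row (and hence every macroblock) lands in exactly one group.

def split_rows(num_mb_rows, weights):
    """Partition `num_mb_rows` whole macroblock rows into len(weights)
    contiguous sub-slice groups, sized roughly in proportion to `weights`
    (e.g. a predicted per-group processing load)."""
    total = sum(weights)
    groups, start = [], 0
    for i, w in enumerate(weights):
        remaining = len(weights) - i - 1
        # Proportional share, but leave at least one row per remaining group.
        size = max(1, round(num_mb_rows * w / total))
        size = min(size, num_mb_rows - start - remaining)
        groups.append(range(start, start + size))
        start += size
    # Absorb any rounding slack into the last group.
    groups[-1] = range(groups[-1].start, num_mb_rows)
    return groups

# A 1080-line HD frame has 68 macroblock rows (1088 / 16).
groups = split_rows(68, [1.0, 1.0, 1.0, 1.0])
assert sum(len(g) for g in groups) == 68                           # each row assigned once
assert all(g.stop == h.start for g, h in zip(groups, groups[1:]))  # raster order, no gaps
```

With equal weights this yields four groups of 17 macroblock rows each; unequal weights shift rows toward the groups expected to be cheaper or more expensive to encode.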

mapping the sub-slice groups to the encoders so that their corresponding processing loads are balanced, where each sub-slice group maps to exactly one encoder and each encoder handles one or more sub-slice groups;

transmitting all sub-slice groups of the current frame and the encoding configuration parameters to the multiple encoders according to the sub-slice-group-to-encoder mapping;

when the current frame is an I frame, each encoder discards all reconstructed sub-slice-group data;

when the current frame is not an I frame, each encoder obtains, from the reconstructed sub-slice groups of the other encoders, the reference data required for motion estimation of its own sub-slice groups. This comprises: determining the minimum reference data to be exchanged between encoders according to the overlap, in picture area and storage area, between each encoder's current-frame sub-slice groups and the reconstructed sub-slice groups, as well as the maximum motion-estimation search area and the multi-reference-frame attributes of the current-frame sub-slice groups; obtaining said reference data through exchange operations between the encoders; and using said reference data to update the motion-estimation search-area reference data cached in each encoder for each sub-slice group;
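The determination of the minimal reference data to exchange can be illustrated as follows (a simplified Python sketch, not the patent's method: it considers only the vertical search range of a row-based sub-slice group; all names are illustrative):

```python
# Illustrative sketch (not from the patent): given a vertical motion-estimation
# search range, find which reconstructed macroblock rows of the reference frame
# an encoder needs, and which of those it must fetch from peer encoders.

def needed_reference_rows(group_rows, search_mb_rows, num_mb_rows):
    """Reference-frame MB rows that motion estimation for `group_rows` may
    touch: the group itself plus `search_mb_rows` rows above and below,
    clipped to the frame boundaries."""
    lo = max(0, group_rows.start - search_mb_rows)
    hi = min(num_mb_rows, group_rows.stop + search_mb_rows)
    return range(lo, hi)

def rows_to_fetch(own_rows, needed_rows):
    """Rows this encoder must obtain from other encoders: the rows it needs
    minus the rows it reconstructed itself."""
    return sorted(set(needed_rows) - set(own_rows))

# An encoder owns MB rows 34..50 of a 68-row frame; a ±32-pixel vertical
# search range spans 2 macroblock rows (32 / 16).
own = range(34, 51)
needed = needed_reference_rows(own, 2, 68)
assert needed == range(32, 53)
assert rows_to_fetch(own, needed) == [32, 33, 51, 52]
```

Only the narrow border strips near the sub-slice-group boundaries cross encoders, which is why exchanging just this overlap keeps inter-encoder traffic small; with multiple reference frames the same computation would be repeated per reference frame.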

encoding the sub-slice groups of the current frame in parallel with the multiple encoders, dividing each sub-slice group into one or more slices in raster-scan order during encoding, completing the encoding of all macroblocks in the sub-slice group, generating the reconstructed sub-slice group, and outputting the coded bitstream and parameters;

aggregating the bitstreams and parameters output by the encoders, completing the encoding of slice groups, frames, and the sequence, and outputting the bitstream of the entire sequence.

The predetermined rules include: interleaved scan mode, dispersed scan mode, foreground-and-background scan mode, box-out scan mode, raster scan mode, wipe scan mode, and an explicit scan mode that uses a number to designate the slice group to which each macroblock belongs; alternatively, the current frame is divided into a single slice group.

The sub-slice group is one or more consecutive whole macroblock rows whose width equals the frame width.

Mapping the sub-slice groups to the encoders so that their corresponding processing loads are balanced comprises:

according to the processing capability of each encoder and the cost of transferring reference data, mapping the sub-slice groups contained in the current frame to the encoders so that the multiple encoders finish encoding their assigned sub-slice groups at the same time.
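One way to realize the balanced mapping described above is classic longest-processing-time scheduling (an illustrative Python sketch, not prescribed by the patent; the greedy heuristic and all names are assumptions):

```python
# Illustrative sketch (not from the patent): assign sub-slice groups to
# encoders so that all encoders finish at roughly the same time.
import heapq

def assign_groups(loads, capabilities):
    """Greedy longest-processing-time scheduling: place each sub-slice
    group, heaviest first, on the encoder whose queue currently finishes
    earliest, scaling loads by each encoder's relative capability.
    Returns (group-to-encoder assignment, per-encoder total time)."""
    heap = [(0.0, e) for e in range(len(capabilities))]
    heapq.heapify(heap)
    assignment = [None] * len(loads)
    for g in sorted(range(len(loads)), key=lambda g: -loads[g]):
        finish_time, e = heapq.heappop(heap)
        assignment[g] = e
        heapq.heappush(heap, (finish_time + loads[g] / capabilities[e], e))
    per_encoder = [0.0] * len(capabilities)
    for g, e in enumerate(assignment):
        per_encoder[e] += loads[g] / capabilities[e]
    return assignment, per_encoder

# Four sub-slice groups with predicted loads 4, 3, 3, 2 on two equal encoders.
assignment, times = assign_groups([4.0, 3.0, 3.0, 2.0], [1.0, 1.0])
assert times == [6.0, 6.0]   # both encoders finish together
```

A real scheduler would also fold the reference-data transfer cost mentioned above into each candidate placement, penalizing assignments that move a group far from the encoder holding its reference rows.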

During the encoding of a group of pictures, the division into multiple sub-slice groups comprises:

for an I frame, predicting the encoding load from the number of macroblocks, and mapping sub-slice groups to the encoders according to each encoder's processing capability and said encoding load, so that the multiple encoders finish encoding their assigned sub-slice groups at the same time;

for the first non-I frame, when the slice-group division is unchanged, dividing the sub-slice groups in the same way as for the I frame;

for the second non-I frame, predicting the encoding load of each of its sub-slice groups from the actual measured encoding workload of each sub-slice group of the first non-I frame, and, taking the reference-data loading cost into account, adjusting the sub-slice-group division of the second non-I frame so that the processing times of the multiple encoders coincide when the second non-I frame is encoded in parallel;

for any non-I frame after the second, predicting the encoding load of each of its sub-slice groups from the measured encoding workload of each sub-slice group in the preceding frame or frames, and, taking the reference-frame data loading cost into account, adjusting the sub-slice-group division of that non-I frame so that the processing times of the multiple encoders coincide when the frame is encoded in parallel.
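The feedback-driven re-division described above can be sketched as follows (illustrative Python, not the patent's algorithm: the exponential smoothing and the equal-cost splitting heuristic are assumptions):

```python
# Illustrative sketch (not from the patent): predict per-row encoding cost
# from previous frames, then re-cut the sub-slice-group boundaries so each
# encoder gets roughly equal total predicted cost.

def update_row_costs(prev_costs, measured, alpha=0.5):
    """Exponentially smooth per-macroblock-row cost estimates with the
    workloads measured while encoding the previous frame."""
    return [alpha * m + (1 - alpha) * p for p, m in zip(prev_costs, measured)]

def rebalance(row_costs, num_encoders):
    """Re-split the macroblock rows into contiguous runs of roughly equal
    total predicted cost, one run per encoder."""
    target = sum(row_costs) / num_encoders
    cuts, acc = [0], 0.0
    for i, c in enumerate(row_costs):
        acc += c
        if acc >= target and len(cuts) < num_encoders:
            cuts.append(i + 1)
            acc = 0.0
    cuts.append(len(row_costs))
    return [range(a, b) for a, b in zip(cuts, cuts[1:])]

smoothed = update_row_costs([2.0, 2.0, 2.0, 2.0], [4.0, 2.0, 2.0, 2.0])
assert smoothed == [3.0, 2.0, 2.0, 2.0]
assert rebalance([1.0] * 8, 2) == [range(0, 4), range(4, 8)]
```

In practice the cut points would also be damped by the reference-frame loading cost, since moving a boundary forces the affected encoders to fetch additional reconstructed rows from their neighbors.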

The slice is one or more consecutive whole macroblock rows whose width equals the frame width.

The encoding performed by each encoder on each macroblock of a sub-slice group comprises:

motion estimation and motion compensation using multiple backward or forward reference frames;

intra prediction selection, that is, for intra prediction of the current macroblock, using the data of the left and upper macroblocks of the same slice that have been reconstructed but not yet loop-filtered; intra/inter mode selection and residual computation;

rate control;

integer transform and quantization;

reordering and entropy coding, where the entropy coding is context-adaptive variable-length coding or context-adaptive binary arithmetic coding;

inverse quantization and inverse transform;

reconstruction;

loop filtering.

The encoding performed by each encoder on each macroblock of a sub-slice group further comprises: setting the boundary loop-filtering mode of all slices of the current frame to no filtering, so that each encoder completes the loop filtering of each of its slices independently, with no information exchanged between encoders or between slices.

Said independently completed loop filtering comprises: starting the loop filtering of a slice after the first macroblock of the slice has been reconstructed, or starting the loop filtering of a slice after the entire slice has been reconstructed.

The present invention also provides a device for parallel video coding, comprising a main processor and multiple encoders. The main processor divides the current frame to be encoded into sub-slice groups and transmits them to the respective encoders; the encoders encode the sub-slice groups in parallel and output their respective coded bitstreams to the main processor, which generates the sequence bitstream.

The main processor comprises a slice-group determination unit, a sub-slice-group determination unit, and a top-level coding unit, wherein:

the slice-group determination unit divides the current frame of the video frame sequence into macroblocks and assigns all macroblocks to one or more slice groups according to predetermined rules;

the sub-slice-group determination unit divides each slice group of the current frame into one or more sub-slice groups in raster-scan order, where each macroblock must be assigned to exactly one sub-slice group, each encoder corresponds to one or more sub-slice groups, and each sub-slice group corresponds to one encoder;

the sub-slice-group data transfer unit transmits all sub-slice groups and the encoding configuration parameters of the current frame to the respective encoders according to the sub-slice-group-to-encoder mapping;

the top-level coding unit aggregates the bitstreams and parameters output by the encoders, completes the encoding of slice groups, frames, and the sequence, and outputs the bitstream of the entire sequence.

The encoder comprises a sub-slice-group receiving unit, a reference-data input/output unit, and a coding unit, wherein:

the sub-slice-group receiving unit receives the sub-slice groups and the encoding configuration parameters;

the reference-data input/output unit exchanges reference data between the encoders; when the current frame is not an I frame, it controls the exchange of reconstructed reference-frame sub-slice-group data between the encoders and updates the motion-estimation search-area reference data cached in each encoder for each sub-slice group;

the coding unit encodes the assigned sub-slice groups, dividing each sub-slice group into one or more slices in raster-scan order during encoding, completes the encoding of all macroblocks in the assigned sub-slice groups, generates the reconstructed sub-slice groups, and outputs the coded bitstream and parameters.

The processing of the main processor includes: dividing the entire frame into one slice group and further dividing it into multiple sub-slice groups, where every sub-slice group is one or more consecutive whole macroblock rows whose width equals the frame width.

Each of the multiple encoders further comprises a data storage unit for buffering the current and reconstructed sub-slice-group data.

The slice is one or more consecutive whole macroblock rows whose width equals the frame width, and a given macroblock can only be placed in a single slice.

The main processor of the device further comprises a control unit for initializing, configuring, and controlling the main processor and the multiple encoders to complete the encoding of the entire video sequence.

During encoding, each of the multiple encoders further:

sets the boundary loop-filtering mode of all slices of the current frame to no filtering, so that each encoder completes the loop filtering of each of its slices independently, with no information exchanged between encoders or between slices; the loop filtering of a slice starts after the first macroblock of the slice has been reconstructed, or after the entire slice has been reconstructed.

The multiple encoders only perform macroblock-level entropy coding and output macroblock bitstreams and parameters; the top-level coding module of the main processor performs slice-level entropy coding.

As can be seen from the technical solutions provided above, the present invention provides a parallel video coding scheme, so that a parallel video coding mode can be selected during video coding; the invention can therefore meet the processing capability required for high-performance real-time high-definition video coding. Moreover, because the data to be encoded is supplied to multiple encoders at a moderate granularity under the principle of load balancing, and the encoders encode the video data in parallel, the time the processors spend waiting for one another is minimized and the overhead of interaction between encoders is kept as small as possible. The parallel video coding scheme provided by the invention therefore also achieves high video coding efficiency.

Brief Description of the Drawings

Figure 1 is a schematic diagram of a video coding framework in the prior art;

Figure 2 is a schematic diagram of a specific implementation process of the method of the present invention;

Figure 3 is a schematic structural diagram of a specific implementation of the device of the present invention.

Detailed Description

In the video coding process, the present invention further divides each slice group into multiple sub-slice groups, each comprising one or more slices, and performs parallel encoding with multiple encoders on the basis of said sub-slice groups.

To facilitate understanding of the technical concepts of the existing coding schemes involved in the present invention, some concepts of existing coding schemes are explained first.

H.264 is divided, from high to low in time and space, into different levels: sequence, group of pictures, picture (i.e., frame), field, slice group, slice, macroblock, and sub-macroblock.

(1) Field, frame, and picture

A field or a frame of video can be used to produce a coded picture. Video frames generally fall into two types: progressive or interlaced. In interlaced video, two fields (called the top field and the bottom field) make up a frame.

Current frame: the frame being encoded.

Reconstructed frame: the frame output by the encoder after local decoding and reconstruction.

Reference picture (frame): to improve prediction accuracy, H.264 encoding can select, from a set of forward or backward already-coded reconstructed pictures, one or more pictures that best match the current picture as reference pictures for inter coding. In H.264, the best-matching picture can be selected from up to 16 reference pictures.

(2) Macroblock (MB) and sub-macroblock

A coded picture is usually divided into a number of macroblocks. A macroblock may consist of a 16×16 block of luma pixels plus an additional 8×8 Cb and 8×8 Cr block of chroma pixels. The macroblock is the basic size unit of video coding. A macroblock can be further partitioned into blocks of 16×8, 8×16, or 8×8 luma pixels (with the accompanying chroma pixels); an 8×8 sub-macroblock can be further partitioned into sub-macroblock partitions of 8×4, 4×8, or 4×4 luma pixels (with the accompanying chroma pixels). The two corresponding macroblocks of the top field and bottom field of an interlaced frame form a macroblock pair.

According to whether the prediction mode used is intra or inter, macroblocks are divided into intra-predicted macroblocks (I macroblocks) and inter-predicted macroblocks (P macroblocks). An I macroblock uses already-decoded pixels of the current slice as a reference for intra prediction. A P macroblock uses previously coded pictures as reference pictures for inter prediction.

(3) Slice

In each picture, a number of macroblocks are arranged in the form of slices. An intra-coded slice (I slice) contains only I macroblocks; an inter-coded slice (P slice) may contain both P and I macroblocks. Slices are coded independently of one another: the intra prediction of a slice cannot use macroblocks of other slices as references.

A frame composed entirely of I slices is called an I frame; a frame not composed entirely of I slices is called a non-I frame.

(4) Slice group

In the H.264 standard, the macroblocks of a picture can be divided into multiple slice groups by flexible macroblock ordering (FMO). A slice group is a subset of the MBs of a coded picture and may contain one or more slices. Through the use of slice groups, FMO changes the way a picture is divided into slices and macroblocks, which can further improve the error resilience of slices.

The macroblock-to-slice-group map defines which slice group each macroblock belongs to. With FMO, H.264 defines seven macroblock scan modes: interleaved, dispersed, foreground-and-background, box-out, raster scan, wipe, and explicit (in which a number designates the slice group to which each macroblock belongs).
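A macroblock-to-slice-group map can be illustrated with a simplified row-based variant of the interleaved mode (Python sketch; note the real H.264 interleaved type cycles runs of consecutive macroblocks, so this row-wise version is only an approximation for illustration):

```python
# Illustrative sketch: build a flat macroblock-to-slice-group map, the data
# structure FMO uses to say which slice group each macroblock belongs to.

def interleaved_map(num_mb_rows, mbs_per_row, num_groups):
    """Row-based 'interleaved'-style map: whole macroblock rows are dealt
    to slice groups round-robin. Index = macroblock address in raster order,
    value = slice group number."""
    return [(mb // mbs_per_row) % num_groups
            for mb in range(num_mb_rows * mbs_per_row)]

# 4 MB rows of 2 MBs each, 2 slice groups: rows alternate between groups.
assert interleaved_map(4, 2, 2) == [0, 0, 1, 1, 0, 0, 1, 1]
```

The explicit mode is simply this same array transmitted directly in the bitstream rather than generated by a rule.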

Different standards and profiles support FMO differently. The H.264 Baseline Profile and Extended Profile support all seven FMO scan modes. The H.264 Main Profile, VC-1, and AVS1.0-P2 do not support FMO; they have only the "raster scan" mode, with a single slice group whose size equals the frame.

(5) Group of pictures (GOP)

A group of pictures is a set of consecutive pictures (frames) whose first frame is an I frame.

(6) Sequence

A video sequence is the highest-level syntax structure of the coded bitstream and comprises one or more consecutive coded pictures.

In the coding process, a profile is a specified subset of the syntax, semantics, and algorithms. A decoder conforming to a profile must fully support the subset defined by that profile. The H.264/AVC standard is divided into three profiles and four high-fidelity (High) extensions.

1. Baseline Profile:

Supports intra and inter coding using I slices and P slices, and supports entropy coding with context-adaptive variable-length coding (CAVLC). Mainly used in real-time video communication such as videophones, videoconferencing, and wireless communication.

2. Main Profile:

Supports interlaced video, inter coding using B slices, and inter coding using weighted prediction; supports context-adaptive binary arithmetic coding (CABAC). Mainly used for digital broadcast television and digital video storage.

3. Extended Profile:

Supports efficient switching between bitstreams (SP and SI slices) and improved error resilience (data partitioning), but does not support interlaced video or CABAC. Mainly used for streaming media.

In the specific encoding process, to facilitate adaptation to the various network protocols, H.264/AVC functionality is divided into two layers: the video coding layer (VCL) and the network abstraction layer (NAL, Network Abstraction Layer). VCL data is the output of the encoding process and represents the compressed video data sequence. Before transmission or storage, the coded VCL data is first mapped or encapsulated into NAL units.

The video coding layer of the H.264/AVC (MPEG-4 Part 10) video coding standard adopts a hybrid transform-and-prediction coding method; the corresponding block diagram is shown in FIG. 1. In inter-frame predictive coding, the predicted value PRED (denoted P in the figure) is derived by motion compensation (MC) from an already-coded reference picture, denoted F'n-1. To improve prediction accuracy, and hence the compression ratio, the actual reference picture may be selected from past or future (in display order) frames that have been coded, decoded, reconstructed, and filtered. Subtracting the predicted value PRED from the current block yields a residual block Dn, which, after block transform and quantization, produces a set of quantized transform coefficients X. These are entropy coded and, together with the side information needed for decoding (such as prediction mode, quantization parameters, and motion vectors), form a compressed bitstream, which is passed through the NAL (network abstraction layer) for transmission and storage.

As mentioned above, to provide reference pictures for further prediction, the encoder must be able to reconstruct pictures. Therefore, D'n, obtained by inverse quantization and inverse transform of the residual, is added to the predicted value P to obtain uF'n (the unfiltered frame). To remove the noise generated in the coding/decoding loop and improve the picture quality of the reference frames, thereby improving compression performance, a loop filter is applied; the filtered output is the reconstructed picture, which can be used as a reference picture.

Motion estimation accounts for more than 50% of the encoder's computation and is the bottleneck of encoder implementation. Motion estimation finds, for each block (luminance macroblock and its sub-macroblocks) of the current frame, the block most similar to it according to a matching criterion — the matching block — within a given search range in the previous or next frame; the motion vector is computed from the relative displacement between the matching block and the current block. The commonly used criterion is the minimum sum of absolute differences (SAD). The more accurate the motion estimation, the smaller the compensation residual, the higher the coding efficiency, and the better the coded image quality. For block motion estimation, the reference frame data (also called reference data) of the block's search window must be read in. For a 16×16 macroblock, if the motion estimation search range is horizontal [-64, +63] / vertical [-32, +31], the reference data to be read in is the region of the reference frame corresponding to this macroblock and its surroundings, of size (64+16+64)×(32+16+32) = 144×80. With multiple reference frames, the search window data of several reference frames may need to be read in.
Because the number of macroblocks per frame is large, this memory traffic is huge and becomes a bottleneck of video encoder implementation. Therefore, reference frame search window data must be reused between adjacent macroblocks, which greatly reduces the amount of reference data read in; when many adjacent macroblocks perform motion estimation as one unit, the reference data needed is only slightly larger than the region covered by these macroblocks.
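The reference-data arithmetic above can be checked with a small sketch (a rough model of the search window sizes quoted in the text, not part of any standard):

```python
# Sketch: reference data needed for block motion estimation.
MB = 16  # macroblock width/height in pixels

def search_window_area(sr_x, sr_y, mb_w=1, mb_h=1):
    """Width, height, and pixel count of the reference data needed for a
    region of mb_w x mb_h macroblocks with a search range of +/-sr_x
    horizontally and +/-sr_y vertically."""
    w = sr_x + mb_w * MB + sr_x
    h = sr_y + mb_h * MB + sr_y
    return w, h, w * h

# One 16x16 macroblock, range [-64,+63]/[-32,+31]:
w, h, area = search_window_area(64, 32)
assert (w, h) == (144, 80)  # (64+16+64) x (32+16+32), as computed in the text

# Reusing one window across a whole 80-macroblock row of a 720p frame cuts the
# per-macroblock read-in from 144*80 = 11520 pixels to 1408*80/80 = 1408:
w_row, h_row, area_row = search_window_area(64, 32, mb_w=80)
assert (w_row, area_row // 80) == (1408, 1408)
```

This illustrates why search-window reuse across adjacent macroblocks reduces memory traffic by roughly an order of magnitude.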

The H.264/MPEG-4 AVC standard defines a deblocking filter process for 16×16 macroblock and 4×4 block boundaries. At macroblock boundaries, the purpose of filtering is to remove blocking artifacts caused by adjacent macroblocks having different intra/inter prediction modes or different quantization parameters. At block boundaries, the purpose of filtering is to remove artifacts that may be caused by the transform/quantization and by differences between the motion vectors of adjacent blocks. The loop filter modifies the two pixels on the same side of a macroblock/block boundary using a content-adaptive non-linear algorithm.

The other new-generation video coding standards, namely VC-1 and AVS1.0-P2, share the same coding framework as H.264/AVC, differing only in the details of some modules. Take loop filtering: H.264/AVC can choose whether or not to filter slice boundaries, whereas in VC-1 and AVS1.0-P2 slice boundaries are never filtered, and the specific filtering algorithms also differ. Similarly, the previous-generation video coding standards, including MPEG-4, H.263, and MPEG-4 Part 2, have a coding framework similar to H.264/AVC, VC-1, and AVS1.0-P2, with only some modules different: for example, MPEG-4, H.263, and MPEG-4 Part 2 encoders have no loop filter and use only one reference frame for motion estimation.

To facilitate understanding of the present invention, its specific implementation process is described below.

The implementation process of the present invention is shown in FIG. 2 and specifically comprises:

Step 21: divide the current frame into at least one slice group.

Specifically, the current frame in the video frame sequence is partitioned into macroblocks, and all macroblocks are assigned to one or more slice groups according to predetermined rules.

In this step, the video frame may be divided into slice groups according to the various scanning modes of flexible macroblock ordering (FMO), which include: the interleaved mode, dispersed mode, foreground-and-background mode, box-out mode, raster-scan mode, wipe mode, and explicit mode. In the explicit mode, a number must be used to indicate the slice group to which each macroblock belongs. If the encoder does not support FMO, all macroblocks of a frame can be assigned to one slice group.

Moreover, in the raster-scan mode, a slice group may be chosen as one or more consecutive whole macroblock rows of fixed size and position within the picture, with width equal to the frame width; in the explicit mode, the slice groups may all be chosen to be rectangular.

Step 22: divide the current frame into multiple sub-slice groups based on the slice groups.

In the H.264 coding scheme, data is organized into levels: sequence, group of pictures (GOP), picture (frame), slice group, slice, macroblock, and sub-macroblock. In the present invention, to encode with multiple encoders in parallel, an appropriate granularity must be chosen to divide the encoding task of an entire sequence into multiple sub-tasks (modules): each slice group is further divided into one or more sub-slice groups, and if the current frame has only one slice group, that slice group is divided into multiple sub-slice groups, so that load-balanced parallel encoding can subsequently be performed on the sub-slice groups of the current frame.

In this step, each slice group of the current frame is further divided into one or more sub-slice groups in raster-scan order.

All sub-slice groups may be one or more consecutive whole macroblock rows whose width equals the frame width.

It should be noted that, when dividing sub-slice groups, each macroblock must be assigned to exactly one sub-slice group. Each sub-slice group corresponds to exactly one encoder, but each encoder may correspond to one or more sub-slice groups, i.e., it may encode one or more sub-slice groups.

In addition, in this step one encoder may also correspond to multiple sub-slice groups: for example, if a particular slice group and sub-slice group division scheme produces some undersized sub-slice groups, several small sub-slice groups can be assigned to a single encoder.

The principle of load balancing among multiple encoders is specifically: divide the sub-slice groups so that they match the processing capabilities of the encoders, taking the reference data transfer cost into account during the division, so that all encoders finish encoding their assigned sub-slice groups at the same time.
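As a minimal sketch of this principle, the following divides the macroblock rows of a frame among encoders in proportion to hypothetical relative processing capacities (whole rows only, ignoring the reference data transfer cost mentioned above):

```python
# Sketch: split n_rows macroblock rows among encoders in proportion to their
# (hypothetical) relative processing capacities, assigning whole rows.
def partition_rows(n_rows, capacities):
    total = sum(capacities)
    bounds, start, acc = [], 0, 0.0
    for cap in capacities[:-1]:
        acc += cap
        end = round(n_rows * acc / total)
        bounds.append((start, end))   # this encoder owns rows [start, end)
        start = end
    bounds.append((start, n_rows))
    return bounds

# Five equal encoders over a 720p frame (45 macroblock rows):
print(partition_rows(45, [1, 1, 1, 1, 1]))
# -> [(0, 9), (9, 18), (18, 27), (27, 36), (36, 45)]
```

With unequal capacities, e.g. `partition_rows(45, [2, 1])`, the faster encoder receives the proportionally larger range `(0, 30)`.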

In this step, when encoding a group of pictures (GOP), the specific sub-slice group division scheme includes the following cases:

For an I frame, the encoding load is predicted from the number of macroblocks, and the sub-slice groups are divided to match the processing capabilities of the multiple encoders so that all encoders finish encoding their assigned sub-slice groups simultaneously.

For the first non-I frame, when the slice group division is unchanged, the sub-slice group division of the I frame is reused.

For the second non-I frame, the encoding load of each of its sub-slice groups is predicted from the measured encoding workload of each sub-slice group of the first non-I frame, and the division of the second non-I frame's sub-slice groups is adjusted according to the reference data loading cost, so that the processing times of the multiple encoders coincide when the second non-I frame is encoded in parallel.

For any non-I frame after the second, the encoding load of each of its sub-slice groups is predicted from the encoding workload of the sub-slice groups of the previous frame or previous several frames, and the division of that frame's sub-slice groups is adjusted according to the reference frame data loading cost, so that the processing times of the multiple encoders coincide when that frame is encoded in parallel.
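The repartitioning described for non-I frames can be sketched as follows, assuming per-macroblock-row cycle counts measured on the previous frame are available (the function and variable names are illustrative, not from the patent):

```python
# Sketch: re-divide macroblock rows so each encoder gets roughly equal
# predicted work, using per-row costs measured on the previous frame.
def rebalance(row_costs, n_encoders):
    total = sum(row_costs)
    target = total / n_encoders   # ideal work per encoder
    bounds, start, acc = [], 0, 0.0
    for i, cost in enumerate(row_costs):
        acc += cost
        # Close a range whenever cumulative work crosses the next target.
        if acc >= target * (len(bounds) + 1) and len(bounds) < n_encoders - 1:
            bounds.append((start, i + 1))
            start = i + 1
    bounds.append((start, len(row_costs)))
    return bounds

# Uniform load reproduces the even split; a heavy top region shrinks
# the first encoder's share:
print(rebalance([1] * 45, 5))
# -> [(0, 9), (9, 18), (18, 27), (27, 36), (36, 45)]
```

A production scheme would additionally compare each candidate repartition against the reference data transfer cost it induces, as the text requires.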

Step 23: distribute the data to be encoded, divided into sub-slice groups, to the encoders.

Distribution follows the sub-slice-group-to-encoder correspondence described in step 22. The distributed content comprises not only all sub-slice groups of the current frame but also the related encoding configuration parameter information, including the slice description information such as position and number of macroblocks. The encoding configuration parameter information specifically includes, but is not limited to: coding standard information (standard, profile), FMO and scanning mode, number of reference frames, motion estimation search range, rate control requirements, and loop filter mode.

Step 24: the encoders exchange reference data with each other.

The reference data exchange can specifically be: when the current frame is not an I frame, the encoders exchange reconstructed reference frame sub-slice group data, updating the motion estimation search area reference data of each sub-slice group cached in each encoder.

Further, the reference data exchange works as follows: when the current frame is not an I frame, the encoder performs inter prediction, whose core is motion estimation. Before motion estimation, the encoder must obtain the search area reference data. This reference data comes from the previously reconstructed reference frame, which is composed of the reconstructed sub-slice groups output by the encoders. Reading in reference data is the largest part of the memory traffic of video encoding and strongly affects encoding performance; the amount of reference data read in must therefore be minimized, which requires various reference data reuse strategies.

That is, in this step, if the current frame is not an I frame, the minimum reference data to exchange between encoders is determined from the overlap of the image and storage regions of each encoder's current-frame sub-slice groups with the reconstructed-frame sub-slice groups, together with the maximum motion estimation search area and reference frames of the current frame's sub-slice groups; the exchanged reference data is then used to update the motion estimation search area reference data of each sub-slice group cached in each encoder.

In other words, in the present invention, the reconstructed sub-slice group produced during an encoder's encoding is stored in that encoder's local memory. When the position and size of the same sub-slice group are unchanged in the next frame, all of this reconstructed data can serve as reference data for the next frame's motion estimation; however, the part of the search area outside the sub-slice group region assigned to this encoder is produced by other encoders and must be obtained from them.

For example, with a fixed-size search window and one reference frame in motion estimation, the minimum reference data to exchange between encoders is determined by the search window size; the corresponding reference data is exchanged between encoders and used to update the motion estimation search area reference data of each sub-slice group cached in each encoder.
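For instance, with a vertical search range of ±32 pixels and 16-pixel macroblock rows (the figures of the 720p example), each encoder needs the two reconstructed macroblock rows directly above and below its assigned region from its neighbours. A minimal sketch under these assumptions:

```python
import math

# Sketch: which reconstructed macroblock rows an encoder must fetch from its
# neighbours, assuming a fixed vertical search range and one reference frame.
def halo_rows(assigned, n_rows, search_y=32, mb=16):
    lo, hi = assigned                    # this encoder owns rows [lo, hi)
    halo = math.ceil(search_y / mb)      # extra macroblock rows needed: 2 here
    above = list(range(max(0, lo - halo), lo))
    below = list(range(hi, min(n_rows, hi + halo)))
    return above, below

# Encoder 2 of the equal 5-way split of 45 rows owns rows 9..17:
print(halo_rows((9, 18), 45))
# -> ([7, 8], [18, 19]): rows to obtain from the neighbouring encoders
```

Encoders at the frame edges (e.g. rows 0..8) have no halo on the outer side, so they exchange correspondingly less data.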

Step 25: the encoders encode, in parallel, the macroblocks contained in all sub-slice groups of the current frame to be encoded.

Encoding all sub-slice groups of the current frame in parallel with multiple encoders can specifically be: each encoder encodes its assigned sub-slice groups; during encoding, each sub-slice group is divided in raster-scan order into one or more slices, and each macroblock must be assigned to exactly one slice. Each encoder encodes all macroblocks of its assigned sub-slice groups, produces the reconstructed sub-slice groups, and outputs the coded bitstream and parameters.

In this step, each encoder's encoding of all macroblocks in a sub-slice group may specifically include: motion estimation and motion compensation; intra prediction selection, intra prediction, intra/inter selection, and residual computation; rate control; transform and quantization; reordering and entropy coding; inverse quantization and inverse transform; and reconstruction.

Under the H.264, VC-1, or AVS1.0-P2 standards, each encoder's encoding of all macroblocks in a sub-slice group may specifically include the following steps:

(1) Motion estimation using multiple backward or forward reference frames, and motion compensation;

(2) Intra prediction selection — when the current macroblock is intra predicted, using the reconstructed, not yet loop-filtered data of the left and upper macroblocks in the same slice; intra/inter selection; residual computation;

(3) Rate control;

(4) Integer transform and quantization;

(5) Reordering and entropy coding, the entropy coding being context-based adaptive variable-length coding or context-based adaptive binary arithmetic coding;

(6) Inverse quantization and inverse transform;

(7) Reconstruction;

(8) Loop filtering.

Under the H.264 standard, loop filtering is a necessary step of the encoding process for producing reconstructed frames. Whether the boundaries of each H.264 slice are loop filtered can be selected via the "slice boundary loop filter mode" setting; the present invention preferably sets the "boundary loop filter mode" of all slices of the current frame to "no filtering". When the "boundary loop filter mode" of all slices of the current frame is set to "no filtering", each encoder independently performs the loop filtering of each of its slices, and no data is exchanged between encoders or between slices.

Since the loop filtering of each slice is independent, to save loop filter processing time, in the present invention the loop filtering of the macroblocks of a slice starts once the first macroblock of that slice has been reconstructed; macroblock reconstruction and loop filtering thus form a macroblock-level pipeline. To reduce the loop filter's access frequency to main memory, the bottom 4 pixel rows and block information of the entire macroblock row above, and the rightmost 4 pixel columns and block information of the left macroblock, can be stored in an on-chip buffer, so that loop filtering only needs to access this on-chip buffer. Of course, the loop filtering of a slice can also start after the whole slice has been reconstructed; likewise, each encoder performs the loop filtering of its slices independently, and no data need be exchanged between encoders or between slices.

Step 26: aggregate the bitstreams and parameters output by the encoders and perform top-level encoding to produce the output sequence bitstream.

Specifically: the main processor collects the bitstreams and related parameters output by the encoders, generates the slice group and frame bitstreams, and merges them into the sequence bitstream, which it outputs; for example, for H.264, all encoding processing including the NAL must be completed.

Through steps 21 to 26 above, parallel encoding of video data is achieved; during the encoding of video data, steps 21 to 26 are repeated for each successive frame until the encoding process ends.

In the implementation of the present invention, the sub-slice group division process is the key to the invention; it is described below with specific application examples.

In the present invention, to obtain greater encoding throughput through parallel processing and meet the requirements of real-time high-definition video encoding, multiple encoders must encode in parallel. The architectures and processing capabilities of the encoders may differ, but their total processing capability should exceed the required encoding workload, with some margin for the overhead of the parallel encoding processors.

Take real-time H.264 encoding of high-definition 720p video (1280×720, 30 fps) as an example. The encoder architecture is designed to support 1 or 2 reference frames, with a motion estimation search range of horizontal [-64, +63] / vertical [-32, +31]. A single encoder is implemented with a VLIW DSP or an FPGA. According to prior evaluation, about 5 such encoders together reach the total encoding processing capability required for 720p with some margin; the encoders are numbered encoder 1 to encoder 5.

In the present invention, to obtain the best parallel processing performance, two aspects are mainly considered:

(1) the load is balanced among the encoders, with as little mutual waiting as possible;

(2) the communication and other overhead between the encoders is as small as possible.

The method of the present invention uses flexible sub-slice group division so that the multiple processors (i.e., encoders) of the parallel video coding system satisfy both requirements, achieving the best performance of the whole parallel video coding system.

To satisfy requirement (1), the predicted encoding load of each sub-slice group must match the processing capability of the encoder it is assigned to. The encoding load of a sub-slice group is closely related to its size (the number of macroblocks it contains), the image content, and the encoding configuration parameters (intra prediction, inter prediction, number of reference frames, minimum block size, quantization level, entropy coding method, etc.). In general, the more macroblocks, the greater the encoding load, so the number of macroblocks per sub-slice group can be adjusted to match the processing capability of the corresponding encoder.

As for requirement (2), the communication overhead of multi-encoder parallel processing consists mainly of reference data exchange. If the motion estimation parameters (number of reference frames, search range, etc.) are fixed, the communication overhead is constant as long as the sub-slice group division is unchanged; if the sub-slice group division of the current frame changes, the communication overhead of reference data exchange increases noticeably. Adjustments of the sub-slice group division therefore need to account for the cost of the increased reference data communication.

Taking 720p (1280×720) high-definition video as a concrete example, the slice group and sub-slice group division provided by the present invention is explained. As shown in Table 1, a 720p frame contains 3600 16×16 macroblocks, divided into 45 macroblock rows (the rightmost column of the table gives the macroblock row numbers, MBR1 to MBR45) of 80 macroblocks each; the number in each cell is the macroblock number, increasing in raster-scan order (left to right, top to bottom).

Table 1

Table 1 in the original is a grid of macroblock numbers; only the per-row number spans are recoverable here. Each macroblock row MBRn contains macroblocks (n-1)×80 to n×80-1:

MBR1: 0–79; MBR2: 80–159; MBR3–7: 160–559; MBR8: 560–639; MBR9: 640–719; MBR10: 720–799; MBR11: 800–879; MBR12–16: 880–1279; MBR17: 1280–1359; MBR18: 1360–1439; MBR19: 1440–1519; MBR20: 1520–1599; MBR21–25: 1600–1999; MBR26: 2000–2079; MBR27: 2080–2159; MBR28: 2160–2239; MBR29: 2240–2319; MBR30–34: 2320–2719; MBR35: 2720–2799; MBR36: 2800–2879; MBR37: 2880–2959; MBR38: 2960–3039; MBR39–43: 3040–3439; MBR44: 3440–3519; MBR45: 3520–3599.
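The raster-scan numbering shown in Table 1 follows directly from the frame geometry; a short sketch of the arithmetic:

```python
# Sketch: raster-scan macroblock numbering for a 720p frame.
MB_COLS, MB_ROWS = 1280 // 16, 720 // 16   # 80 columns, 45 rows

def mb_number(row, col):
    """Macroblock number for 0-based (row, col) in raster-scan order."""
    return row * MB_COLS + col

assert MB_COLS * MB_ROWS == 3600                 # total macroblocks per frame
assert (mb_number(0, 0), mb_number(0, 79)) == (0, 79)        # MBR1
assert (mb_number(44, 0), mb_number(44, 79)) == (3520, 3599)  # MBR45
```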

For the frame shown in Table 1, take slice group division by foreground and background as an example. Assume there is only one foreground slice group, of 216 macroblocks (8 macroblock columns, 27 macroblock rows); the remaining macroblocks belong to the background slice group. In practical applications such as video conferencing, the person (or face) in the field of view is usually regarded as the part of greatest interest, and this part is placed in a foreground slice group and encoded separately. The two slice groups are further divided into a total of 6 sub-slice groups: the background slice group into 5 sub-slice groups and the foreground slice group into one.

As shown in Table 2, the resulting sub-slice groups are:

Sub-slice group 1: macroblocks 0–719;

Sub-slice group 2: macroblocks 720–1439, excluding the foreground part (the double-underlined bold area);

Sub-slice group 3: macroblocks 1440–2159, excluding the foreground part (the double-underlined bold area);

Sub-slice group 4: macroblocks 2160–2879, excluding the foreground part (the double-underlined bold area);

Sub-slice group 5: macroblocks 2880–3599;

Sub-slice group 6: all 216 macroblocks of the foreground slice group (the double-underlined bold area).

This sub-slice group division applies when FMO is supported, i.e., under the H.264 Baseline and Extended profiles.
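A minimal sketch of building the macroblock-to-sub-slice-group map for this example follows. Since the Table 2 image is not reproduced here, the foreground rectangle's position (columns 36–43, rows 9–35, i.e., within macroblocks 720–2879) is an assumption made purely for illustration:

```python
# Sketch: macroblock -> sub-slice-group map for the Table 2 example.
MB_COLS, MB_ROWS = 80, 45
FG_COLS = range(36, 44)   # ASSUMED 8 foreground columns (position not given)
FG_ROWS = range(9, 36)    # 27 foreground rows, inside macroblocks 720..2879

def sub_slice_group(mb):
    row, col = divmod(mb, MB_COLS)
    if row in FG_ROWS and col in FG_COLS:
        return 6                   # foreground sub-slice group
    return row // 9 + 1            # background groups 1..5, 9 rows each

groups = [sub_slice_group(mb) for mb in range(MB_COLS * MB_ROWS)]
assert groups.count(6) == 216      # 8 x 27 foreground macroblocks
assert sub_slice_group(0) == 1 and sub_slice_group(3599) == 5
```

With FMO's explicit mode, such a per-macroblock map is exactly what the encoder signals; each sub-slice group is then dispatched to its encoder.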

Table 2

Figure DEST_PATH_S061B3256920061221D000021

仍以对表1所示的帧为例,且按显式划分条带组的实现方式如表3所示,一帧划分成4个条带组,又进一步划分为总共5个子条带组。Still taking the frame shown in Table 1 as an example, and as shown in Table 3, a frame is divided into 4 slice groups and further divided into a total of 5 sub-strip groups.

条带组1为左上角的斜体字区，共1600个宏块(40宏块列、40个宏块行)；该条带组进一步上下等分成两个子条带组(子条带组1、2)，每个子条带组共800个宏块(40宏块列、20个宏块行)。Slice group 1 is the italic area in the upper-left corner, 1600 macroblocks in total (40 macroblock columns by 40 macroblock rows); it is further split evenly top and bottom into two sub-slice groups (sub-slice groups 1 and 2), each of 800 macroblocks (40 macroblock columns by 20 macroblock rows).

条带组2为右上角的带下划线的黑体字区，共1600个宏块(40宏块列、40个宏块行)；该条带组进一步上下等分成两个子条带组(子条带组3、4)，每个子条带组共800个宏块(40宏块列、20个宏块行)。Slice group 2 is the underlined bold area in the upper-right corner, 1600 macroblocks in total (40 macroblock columns by 40 macroblock rows); it is further split evenly top and bottom into two sub-slice groups (sub-slice groups 3 and 4), each of 800 macroblocks (40 macroblock columns by 20 macroblock rows).

条带组3为下方5个整宏块行，共400个宏块(80宏块列、5个宏块行)；该条带组整个作为一个子条带组(子条带组5)。Slice group 3 is the bottom 5 whole macroblock rows, 400 macroblocks in total (80 macroblock columns by 5 macroblock rows); the whole group forms a single sub-slice group (sub-slice group 5).

这种划分适用于H.264基本档次和扩展档次支持FMO的情形。This division likewise applies where FMO is supported, i.e., in the H.264 baseline and extended profiles.
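As a quick sanity check on the explicit division just described, the regions spelled out above tile the 80×45-macroblock frame exactly and their sub-slice groups account for all 3600 macroblocks:

```python
# Tally of the explicit division: two 40x40 quadrants plus the bottom
# 5 whole rows cover the 80x45-macroblock frame with no gap or overlap.
g1 = 40 * 40          # upper-left,  sub-slice groups 1-2 (2 x 800 MBs)
g2 = 40 * 40          # upper-right, sub-slice groups 3-4 (2 x 800 MBs)
g3 = 80 * 5           # bottom rows, sub-slice group 5    (1 x 400 MBs)
assert g1 == 2 * 800 and g2 == 2 * 800 and g3 == 400
assert g1 + g2 + g3 == 80 * 45 == 3600
```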

表3table 3


[表3：80×45宏块帧的宏块编号网格，按宏块行MBR1~MBR45标出各条带组/子条带组区域；原表在文本提取中已损坏，此处仅保留可恢复的结构信息。Table 3: macroblock-number grid of the 80×45-macroblock frame, marking the slice-group/sub-slice-group regions over macroblock rows MBR1-MBR45; the original table was garbled in text extraction, and only the recoverable structure is kept here.]

再以按光栅扫描划分条带组为例，相应的划分结果如表4所示：一帧划分成1个条带组，又进一步等分为总共5个子条带组，每个子条带组为9个整宏块行，共720个宏块。Taking raster-scan slice-group division as a further example, the result is shown in Table 4: the frame forms a single slice group, divided evenly into 5 sub-slice groups, each of 9 whole macroblock rows (720 macroblocks).

子条带组1：宏块编号0~719；Sub-slice group 1: macroblock numbers 0-719;

子条带组2：宏块编号720~1439；Sub-slice group 2: macroblock numbers 720-1439;

子条带组3：宏块编号1440~2159；Sub-slice group 3: macroblock numbers 1440-2159;

子条带组4：宏块编号2160~2879；Sub-slice group 4: macroblock numbers 2160-2879;

子条带组5：宏块编号2880~3599。Sub-slice group 5: macroblock numbers 2880-3599.

这种划分方案适用于H.264、VC-1、AVS1.0-P2、H.263、MPEG-2、MPEG-4 Part 2标准及其各档次。This division scheme applies to the H.264, VC-1, AVS1.0-P2, H.263, MPEG-2 and MPEG-4 Part 2 standards and all their profiles.
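The raster-scan division above reduces to simple range arithmetic; a minimal sketch:

```python
# Sketch: equal raster-scan division of the single slice group (the whole
# 80x45-macroblock frame) into 5 sub-slice groups of 9 whole MB rows each.
FRAME_W_MB, FRAME_H_MB, N_SUB = 80, 45, 5
rows_per_sub = FRAME_H_MB // N_SUB            # 9 macroblock rows
mbs_per_sub = rows_per_sub * FRAME_W_MB       # 720 macroblocks

ranges = [(i * mbs_per_sub, (i + 1) * mbs_per_sub - 1) for i in range(N_SUB)]
assert ranges == [(0, 719), (720, 1439), (1440, 2159),
                  (2160, 2879), (2880, 3599)]
```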

表4Table 4

Figure DEST_PATH_S061B3256920061221D000041

这里对本发明中步骤24所述的交换参考数据举例说明如下：An example of the reference-data exchange described in step 24 of the present invention is given below:

以表4所示的子条带组划分方式为例，以720p、运动估计搜索范围水平[-64,+63]/竖直[-32,+31]为例，假设编码器3对应子条带组3，需要的参考数据区为前一个或多个参考帧图像中对应宏块编号1280~2319的区域。Take the sub-slice-group division of Table 4 as an example, with 720p video and a motion-estimation search range of [-64, +63] horizontally and [-32, +31] vertically. Suppose encoder 3 handles sub-slice group 3; the reference data it needs is the region of one or more preceding reference frames corresponding to macroblock numbers 1280-2319.

对于一个参考帧的运动估计情况，若编码器3在当前帧与上一帧所编码的子条带组划分不变，即同为宏块编号1440~2159的区域，由于所需参考帧区域中宏块编号1440~2159的部分是本编码器上一帧重建的子条带组，故需要从编码器3外部读入的参考数据仅为以下两块：For motion estimation with a single reference frame, if encoder 3's sub-slice-group assignment is unchanged between the current and previous frames (macroblocks 1440-2159 in both), then the part of the required reference area covering macroblocks 1440-2159 is the sub-slice group this encoder itself reconstructed in the previous frame, so only the following two blocks must be read from outside encoder 3:

宏块编号为1280~1439,来自于(编码器2输出的)重建子条带组2;和,Macroblocks numbered 1280-1439 from reconstructed sub-slice group 2 (output by encoder 2); and,

宏块编号为2160~2319,来自于(编码器4输出的)重建子条带组4。The macroblocks are numbered 2160-2319 and come from the reconstructed sub-slice group 4 (outputted by the encoder 4).

同样，对于一个参考帧的运动估计情况，若编码器2在当前帧与上一帧所编码的子条带组划分不变，即同为宏块编号720~1439的区域，由于所需参考帧区域中宏块编号720~1439的部分是本编码器上一帧重建的子条带组，故需要从编码器2外部读入的参考数据仅为以下两块：Similarly, for motion estimation with a single reference frame, if encoder 2's sub-slice-group assignment is unchanged between the current and previous frames (macroblocks 720-1439 in both), then the part of the required reference area covering macroblocks 720-1439 is the sub-slice group this encoder itself reconstructed in the previous frame, so only the following two blocks must be read from outside encoder 2:

宏块编号为1440~1599，来自于(编码器3输出的)重建子条带组3；和，Macroblocks numbered 1440-1599, from reconstructed sub-slice group 3 (output by encoder 3); and,

宏块编号为560~719，来自于(编码器1输出的)重建子条带组1。Macroblocks numbered 560-719, from reconstructed sub-slice group 1 (output by encoder 1).

在此，编码相邻子条带组的编码器2和3相互将重建子条带组数据传给对方作为参考数据，用于进行编码操作。Here, encoders 2 and 3, which encode adjacent sub-slice groups, pass their reconstructed sub-slice-group data to each other as reference data for the encoding operation.
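The worked examples above follow directly from the vertical search range: ±32 pixels reaches two 16-pixel macroblock rows beyond a sub-slice group. A sketch that computes the externally needed reference ranges, assuming an unchanged sub-group layout and a single reference frame:

```python
# Sketch: which reference-frame macroblocks an encoder must fetch from its
# neighbours, given a vertical search range of [-32, +31] pixels
# (2 macroblock rows of 16 px) and the Table 4 layout at 720p.
FRAME_W_MB = 80
LAST_MB = 3599
V_REACH_ROWS = 2                               # 32 px / 16 px per MB row

def external_ref_ranges(first_mb, last_mb):
    """MB ranges needed from the encoders above and below this sub-group."""
    ext = V_REACH_ROWS * FRAME_W_MB
    above = (max(first_mb - ext, 0), first_mb - 1) if first_mb > 0 else None
    below = (last_mb + 1, min(last_mb + ext, LAST_MB)) if last_mb < LAST_MB else None
    return above, below

# Encoder 3 owns MBs 1440..2159 -> needs 1280..1439 and 2160..2319:
assert external_ref_ranges(1440, 2159) == ((1280, 1439), (2160, 2319))
# Encoder 2 owns MBs 720..1439 -> needs 560..719 and 1440..1599:
assert external_ref_ranges(720, 1439) == ((560, 719), (1440, 1599))
```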

若使用多参考帧运动估计,则需要从其它编码器载入所需的多个重建子条带组数据部分。If multi-reference frame motion estimation is used, it is necessary to load the required multiple reconstructed sub-slice group data parts from other encoders.

本发明提供的方法可以适用于各种视频格式，包括高清视频或标清视频，而且可以是逐行扫描也可以是隔行扫描；其中，对于隔行扫描视频，一个隔行扫描帧的顶场和底场的两个对应宏块组成宏块对，需要将一个宏块对分配给同一个条带组、子条带组、条带进行编码。The method provided by the present invention applies to a variety of video formats, including high-definition and standard-definition video, either progressive or interlaced. For interlaced video, the two corresponding macroblocks in the top and bottom fields of an interlaced frame form a macroblock pair, and a macroblock pair must be assigned to the same slice group, sub-slice group and slice for encoding.

基于前面的条带组及子条带组划分的实例，下面描述子条带组编码过程中条带划分的应用实例：Building on the slice-group and sub-slice-group division examples above, application examples of slice division within the sub-slice-group encoding process are described below:

应用实例一Application example one

将子条带组划分为条带的一个例子如表5所示。在表5中，子条带组1在编码过程中分成7个条带，按光栅扫描顺序编号为条带1~7，条带1、2各为两个宏块行，条带3~7各为一个宏块行；An example of dividing a sub-slice group into slices is shown in Table 5. Sub-slice group 1 is divided into 7 slices during encoding, numbered 1-7 in raster-scan order; slices 1 and 2 are two macroblock rows each, and slices 3-7 one macroblock row each.

表5table 5

[表5：子条带组1(宏块0~719，宏块行MBR1~MBR9，每行80个宏块)的条带划分网格；原表在文本提取中已损坏。Table 5: slice-partition grid of sub-slice group 1 (macroblocks 0-719, macroblock rows MBR1-MBR9, 80 macroblocks per row); the original table was garbled in text extraction.]

应用实例二Application example two

第二个条带划分的例子如表6所示：子条带组1在编码过程中分成5个条带，按光栅扫描顺序编号为条带1~5，各条带包含的宏块如下：A second slice-division example is shown in Table 6: sub-slice group 1 is divided into 5 slices during encoding, numbered 1-5 in raster-scan order, containing the following macroblocks:

条带1：宏块编号0~123；Slice 1: macroblock numbers 0-123;

条带2：宏块编号124~237；Slice 2: macroblock numbers 124-237;

条带5：宏块编号238~319；Slice 5: macroblock numbers 238-319;

条带3：宏块编号320~521；Slice 3: macroblock numbers 320-521;

条带4：宏块编号522~719；Slice 4: macroblock numbers 522-719;

表6Table 6

[表6：子条带组1(宏块0~719，宏块行MBR1~MBR9)的第二种条带划分网格；原表在文本提取中已损坏。Table 6: grid of the second slice partition of sub-slice group 1 (macroblocks 0-719, macroblock rows MBR1-MBR9); the original table was garbled in text extraction.]

本发明还包括一种并行视频编码的装置，该装置的具体实现结构如图3所示，包括主处理器和多个编码器：主处理器用于将待编码的当前帧划分为子条带组，并分别传递给多个编码器；多个编码器并行编码各子条带组，输出各自编码码流给主处理器；由主处理器生成序列码流并输出。The present invention also includes an apparatus for parallel video encoding, whose structure is shown in Figure 3: a host processor and a plurality of encoders. The host processor divides the current frame to be encoded into sub-slice groups and passes them to the encoders; the encoders encode the sub-slice groups in parallel and output their bitstreams to the host processor, which generates and outputs the sequence bitstream.

(一)主处理器(1) Main processor

所述的主处理器进一步包括条带组确定单元、子条带组确定单元、子条带组数据传递单元及顶层编码单元，各单元具体为：The host processor further includes a slice-group determining unit, a sub-slice-group determining unit, a sub-slice-group data transfer unit and a top-level coding unit, as follows:

条带组确定单元,用于从数据传送单元接收数字视频序列数据,将视频帧序列中的当前帧固定分割成宏块,按预定规则将所有宏块分配给一个或多个条带组;The slice group determination unit is used to receive digital video sequence data from the data transmission unit, divide the current frame in the video frame sequence into macroblocks, and assign all macroblocks to one or more slice groups according to predetermined rules;

子条带组确定单元,用于对当前帧的所有条带组,按照光栅扫描的顺序划分为一个或多个子条带组,每个宏块必须并且只能分配到一个子条带组;The sub-slice group determination unit is used to divide all slice groups of the current frame into one or more sub-slice groups in the order of raster scanning, and each macroblock must and can only be assigned to one sub-slice group;

子条带组数据传递单元，用于根据子条带组和编码器的对应关系配置，控制多个编码器读入所述的子条带组数据，且每个编码器读入一个或多个子条带组，每一个子条带组包含的待编码数据均传递到同一个编码器；在子条带组数据传递过程中可以使用的总线信号格式包括但不限于：并行接口、高速串行接口、BT656或CCIR601数字视频信号格式、HD-SDI信号、AMBA总线规范中的AHB或AXI总线对应的信号格式，或以太网接口对应的信号格式；The sub-slice-group data transfer unit controls the encoders to read in sub-slice-group data according to the configured sub-slice-group-to-encoder mapping; each encoder reads one or more sub-slice groups, and all data to be encoded in a given sub-slice group is delivered to the same encoder. Bus signal formats usable for this transfer include, but are not limited to: parallel interfaces, high-speed serial interfaces, the BT.656 or CCIR 601 digital video signal formats, HD-SDI, the signal formats of the AHB or AXI buses of the AMBA specification, or Ethernet;

顶层编码单元:用于汇集各编码器输出的码流及相关参数,生成条带组、帧码流,合并处理生成序列码流后输出;Top-level encoding unit: used to collect the code streams and related parameters output by each encoder, generate slice groups and frame code streams, and combine them to generate sequence code streams and output them;

在主处理器中具体可以将整个帧作为一个条带组，划分为多个子条带组，且所有子条带组为在图像中大小和位置固定、宽度等于帧宽度的一个或多个连续的整宏块行。In particular, the host processor may treat the whole frame as a single slice group divided into multiple sub-slice groups, all of which are one or more consecutive whole macroblock rows, fixed in size and position within the image, with width equal to the frame width.

另外，主处理器还完成对接收到的当前帧的图像预处理，包括：二维去噪处理，和/或缩放处理，和/或数字视频4:4:4到4:2:2格式变换处理，和/或数字视频4:2:2到4:2:0格式变换处理等。In addition, the host processor also performs image preprocessing on the received current frame, including two-dimensional denoising, and/or scaling, and/or digital video 4:4:4-to-4:2:2 format conversion, and/or 4:2:2-to-4:2:0 format conversion.

而且，本发明所述主处理器具体可以是一个计算机系统，包括：CPU、存储器、与数据传送单元的接口(含外部通信网络输入接口)、用于输出已编码码流的输出接口，以及用于接受外部配置控制信息的配置控制信息接口。Moreover, the host processor of the present invention may specifically be a computer system comprising: a CPU, memory, an interface to the data transfer unit (including an external communication-network input interface), an output interface for the encoded bitstream, and a configuration/control interface for receiving external configuration and control information.

（二）编码器，所述编码器的结构具体可以包括：(2) The encoder, whose structure may specifically comprise:

（1）子条带组接收单元，用于接收子条带组和编码配置参数；(1) A sub-slice-group receiving unit for receiving sub-slice groups and coding configuration parameters;

（2）参考数据输入输出单元，用于在各编码器之间交换参考数据，具体为：当当前帧不为I帧时，每个编码器通过数据传送单元中设置的交换参考数据单元，从其它编码器获取所需的参考数据，即控制编码器之间交换已重建的参考帧子条带组数据，更新各编码器中缓存的每个子条带组运动估计搜索区参考数据。(2) A reference-data input/output unit for exchanging reference data between encoders. Specifically, when the current frame is not an I frame, each encoder obtains the reference data it needs from the other encoders via the reference-data exchange unit in the data transfer unit; that is, the exchange of reconstructed reference-frame sub-slice-group data between encoders is controlled, and the motion-estimation search-area reference data cached in each encoder for each sub-slice group is updated.

（3）编码单元，用于并行对当前帧所有子条带组编码：每个编码器完成所分配子条带组的编码，在编码过程中将每个子条带组按光栅扫描的顺序划分为一个或多个条带，每个宏块必须并且只能分配到一个条带；每个编码器完成所分配子条带组中所有宏块的编码，产生重建子条带组，产生并输出码流和相关参数；而且，所述条带优选为宽度等于帧宽度的一个或多个连续的整宏块行；(3) A coding unit for encoding all sub-slice groups of the current frame in parallel: each encoder encodes its allocated sub-slice groups, dividing each sub-slice group into one or more slices in raster-scan order during encoding, with each macroblock assigned to exactly one slice. Each encoder completes the encoding of all macroblocks in its allocated sub-slice groups, produces reconstructed sub-slice groups, and generates and outputs a bitstream and related parameters. Each slice is preferably one or more consecutive whole macroblock rows with width equal to the frame width;

本发明所述多个编码器中每个编码器可以为：Each of the plurality of encoders in the present invention may be:

通用处理器，包括：超标量结构的处理器、超长指令字信号处理器(VLIW DSP)；或A general-purpose processor, including a superscalar processor or a very long instruction word digital signal processor (VLIW DSP); or

现场可编程门阵列芯片(FPGA)；或A field-programmable gate array (FPGA); or

定制的超大规模集成电路芯片(VLSI)；或A custom very-large-scale integration (VLSI) chip; or

指令集可配置处理器。A processor with a configurable instruction set.

每个编码器还配置数据随机访问存储器(RAM),即存储单元,用于缓存当前和重建的子条带组数据等信息。数据RAM容量不大时可以使用芯片内大容量RAM,所需容量较大时可以配置使用片外的同步静态随机访问存储器(SSRAM)或同步动态随机访问存储器(SDRAM)。Each encoder is also configured with a data random access memory (RAM), that is, a storage unit for caching information such as current and reconstructed sub-slice group data. When the capacity of the data RAM is not large, the on-chip large-capacity RAM can be used, and when the required capacity is large, the off-chip synchronous static random access memory (SSRAM) or synchronous dynamic random access memory (SDRAM) can be configured.
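As a rough illustration of the buffer sizes involved (an estimate for this example, not a figure from the text), one 9-row sub-slice group at 720p in 4:2:0 occupies about 270 KB per buffered copy:

```python
# Rough estimate: RAM per encoder to hold one 9-row sub-slice group in
# 4:2:0 sampling, per copy (one copy each for current and reconstructed).
BYTES_PER_MB_420 = 16 * 16 * 3 // 2    # 256 luma + 2 x 64 chroma = 384 bytes
mbs_per_sub = 9 * 80                   # one 9-row sub-slice group at 720p
bytes_per_copy = mbs_per_sub * BYTES_PER_MB_420
assert bytes_per_copy == 276_480       # about 270 KB per buffered copy
```

This kind of back-of-the-envelope figure is what decides between on-chip RAM and external SSRAM/SDRAM as described above.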

进一步地，为获得更好的并行处理，用于H.264、VC-1、AVS1.0-P2标准编码时，所述编码器编码中的环路滤波方式为：Further, to obtain better parallelism when encoding for the H.264, VC-1 or AVS1.0-P2 standards, loop filtering in the encoders proceeds as follows:

设置选择当前帧所有条带的边界环路滤波模式为不滤波，各编码器各自独立完成环路滤波处理，各编码器、各条带之间不交换数据；Set the boundary loop-filter mode of all slices of the current frame to no filtering; each encoder completes its loop filtering independently, and no data is exchanged between encoders or between slices.

具体为：一个条带的环路滤波在该条带第一个宏块重建完成后启动，宏块的重建和环路滤波构成宏块级流水线；或者，一个条带的环路滤波在该条带重建全部完成后开始。各编码器对各条带的环路滤波各自独立进行，各编码器、各条带之间不需要交换数据。Specifically: loop filtering of a slice starts once the first macroblock of the slice has been reconstructed, with macroblock reconstruction and loop filtering forming a macroblock-level pipeline; or loop filtering of a slice starts after the whole slice has been reconstructed. Each encoder performs the loop filtering of its slices independently, and no data needs to be exchanged between encoders or between slices.

所述的编码器在编码过程中,相应的参考数据交换过程为:During the encoding process of the encoder, the corresponding reference data exchange process is:

当当前帧不为I帧时，编码器之间交换参考帧子条带组数据：根据各编码器当前帧子条带组和重建帧子条带组的图像区、存储区交叠关系，并考虑当前帧子条带组中运动估计最大搜索区和参考帧，确定编码器之间交换的最少参考数据，更新各编码器中缓存的每个子条带组运动估计搜索区参考数据；其中，运动估计可以采用大小固定的搜索窗和一个参考帧，根据搜索窗尺寸确定编码器之间交换的最少参考数据，更新各编码器中缓存的每个子条带组运动估计搜索区参考数据。When the current frame is not an I frame, reference-frame sub-slice-group data is exchanged between encoders: from the overlap (in both image and storage areas) between each encoder's current-frame sub-slice groups and the reconstructed-frame sub-slice groups, and taking into account the maximum motion-estimation search area and the reference frames of the current frame's sub-slice groups, the minimum reference data to exchange between encoders is determined, and the motion-estimation search-area reference data cached in each encoder for each sub-slice group is updated. Motion estimation may use a fixed-size search window and a single reference frame, in which case the minimum data to exchange is determined from the search-window size.

需要说明的是，在编码器执行编码操作之前，通常还需要对当前帧进行预处理，包括：二维去噪、缩放和格式变换；格式变换指将输入数字视频格式转为编码器要求的格式等处理。常见的编码数字视频格式为4:2:0，常见的输入数字视频格式为BT656或CCIR601规定的4:2:2。H.264高保真扩展编码支持4:2:2和4:4:4，此时输入数字视频为4:2:2或4:4:4。高清视频物理接口常为HD-SDI。It should be noted that the current frame usually requires preprocessing before video encoding, including two-dimensional denoising, scaling and format conversion; format conversion means converting the input digital video format into the format required by the encoder. The common encoding format is 4:2:0, and the common input format is the 4:2:2 specified by BT.656 or CCIR 601. The H.264 fidelity-range extensions support 4:2:2 and 4:4:4, in which case the input digital video is 4:2:2 or 4:4:4. The physical interface for high-definition video is usually HD-SDI.
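One of the format conversions mentioned, 4:2:2 to 4:2:0, can be sketched as simple vertical chroma averaging. This is one filter choice among several; real converters may use longer, field-aware filters, especially for interlaced material.

```python
# Sketch: 4:2:2 -> 4:2:0 conversion of one chroma plane by averaging
# vertically adjacent chroma samples (rounding to nearest).
def chroma_422_to_420(plane):
    """Halve the vertical chroma resolution of one chroma plane."""
    w = len(plane[0])
    return [[(plane[r][c] + plane[r + 1][c] + 1) // 2 for c in range(w)]
            for r in range(0, len(plane), 2)]

plane_422 = [[100, 102], [104, 106], [50, 52], [54, 56]]
assert chroma_422_to_420(plane_422) == [[102, 104], [52, 54]]
```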

而且,多个编码器所需的当前帧数据预处理可采用以下两种形式之一:Also, the current frame data preprocessing required by multiple encoders can take one of two forms:

1、主处理器接收当前帧，集中预处理并缓存后再转发给多个编码器；这种方案要求主处理器处理能力较强、数据帧存储容量较大；1. The host processor receives the current frame, preprocesses it centrally, buffers it, and then forwards it to the encoders; this requires a powerful host processor and a large frame store;

2、多个编码器各自预处理所分配的子条带组。2. Each encoder preprocesses its own allocated sub-slice groups.

本发明所述的装置主处理器中还可以设置有控制单元,用于初始化、配置及控制主处理器中其它处理单元和多个编码器完成整个视频序列编码。The main processor of the device according to the present invention may also be provided with a control unit for initializing, configuring and controlling other processing units and multiple encoders in the main processor to complete the encoding of the entire video sequence.

本发明所述的装置中，为了简化每个编码器的启动、配置电路，降低成本，所述编码器的启动、配置程序或数据可以来自主处理器；对于编码器为FPGA的情况，可以由主处理器通过并行或串行配置接口配置FPGA；对于编码器为通用处理器的情形，可选择并口或串口启动(Boot)，启动过程中应用的BootROM程序来自主处理器。In the apparatus of the present invention, to simplify the boot and configuration circuitry of each encoder and reduce cost, the boot and configuration programs or data of the encoders may come from the host processor. When an encoder is an FPGA, the host processor can configure it through a parallel or serial configuration interface; when an encoder is a general-purpose processor, parallel-port or serial-port boot can be chosen, with the BootROM program used during boot supplied by the host processor.

本发明所述的装置典型地是由一个或多个电路板构成的一个嵌入式计算机系统；或者，也可以是个人计算机或服务器通过通信网络连接在一起的系统，其中编码器可以是PC或服务器。The apparatus of the present invention is typically an embedded computer system built from one or more circuit boards; alternatively, it may be a system of personal computers or servers connected through a communication network, in which each encoder may be a PC or a server.

本发明所述的方法及装置适用于多种编码标准及其各档次。The method and apparatus of the present invention apply to a variety of coding standards and all their profiles.

本发明提供的装置适用于各种视频格式，包括高清视频或标清视频，可以是逐行扫描也可以是隔行扫描；对于隔行扫描视频，一个隔行扫描帧的顶场和底场的两个对应宏块组成宏块对，需要将一个宏块对分配给同一个条带组、子条带组、条带进行编码。The apparatus provided by the present invention applies to a variety of video formats, including high-definition and standard-definition video, either progressive or interlaced. For interlaced video, the two corresponding macroblocks in the top and bottom fields of an interlaced frame form a macroblock pair, and a macroblock pair must be assigned to the same slice group, sub-slice group and slice for encoding.

以上所述,仅为本发明较佳的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应该以权利要求的保护范围为准。The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art within the technical scope disclosed in the present invention can easily think of changes or Replacement should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the protection scope of the claims.

Claims (16)

1. A method of parallel video coding, characterized in that it comprises:
partitioning a current frame in a video sequence into macroblocks, and allocating all macroblocks to one or more slice groups according to a predetermined rule;
dividing each slice group into one or more sub-slice groups in raster-scan order, each macroblock being assigned to one and only one sub-slice group in this division;
mapping the sub-slice groups to encoders according to their processing loads so that the load is balanced, each sub-slice group corresponding to one and the same encoder and each encoder to one or more sub-slice groups;
transferring all sub-slice groups of the current frame and the coding configuration parameters to the encoders according to the sub-slice-group-to-encoder correspondence;
when the current frame is not an I frame, exchanging reconstructed reference-frame sub-slice-group data between the encoders, and updating the motion-estimation search-area reference data cached in each encoder for each sub-slice group;
encoding the sub-slice groups of the current frame in parallel with the encoders, each sub-slice group being divided into one or more slices in raster-scan order during encoding, each encoder completing the encoding of all macroblocks in its allocated sub-slice groups, producing reconstructed sub-slice groups, and outputting a bitstream and parameters;
converging the bitstreams and parameters output by the encoders, completing the coding of slice groups, frames and the sequence, and outputting the bitstream of the whole sequence.
2. The method according to claim 1, characterized in that the predetermined rule comprises:
an interleaved pattern, a dispersed pattern, a foreground-and-background pattern, a box-out pattern, a raster-scan pattern, a wipe pattern, or an explicit pattern in which a number indicates the slice group to which each macroblock belongs; or the current frame is divided into a single slice group.
3. The method according to claim 1, characterized in that each sub-slice group is one or more consecutive whole macroblock rows whose width equals the frame width.
4. The method according to any one of claims 1 to 3, characterized in that mapping the sub-slice groups to the encoders according to processing-load balance comprises:
mapping the sub-slice groups of the current frame to the encoders according to each encoder's processing capability and the reference-data transfer cost, so that the plurality of encoders finish encoding their allocated sub-slice groups at the same time.
5. The method according to claim 1, characterized in that, in coding a group of pictures, the division into a plurality of sub-slice groups comprises:
for an I frame, predicting the coding load from the number of macroblocks, and mapping the sub-slice groups to the encoders according to encoder capability and the predicted coding load so that the encoders finish their allocated sub-slice groups simultaneously;
for the first non-I frame, when the slice-group division is unchanged, dividing the sub-slice groups in the same way as for the I frame;
for the second non-I frame, predicting the coding load of each of its sub-slice groups from the measured coding workload of each sub-slice group of the first non-I frame, and adjusting the sub-slice-group division of the second non-I frame according to the reference-data loading cost, so that the processing times of the encoders coincide when the second non-I frame is coded in parallel;
for any non-I frame after the second, predicting the coding load of each of its sub-slice groups from the coding workload of each sub-slice group of the previous frame or frames, and adjusting the sub-slice-group division according to the reference-frame data loading cost, so that the processing times of the encoders coincide when that non-I frame is coded in parallel.
6. The method according to claim 1, characterized in that each slice is one or more consecutive whole macroblock rows whose width equals the frame width.
7. The method according to claim 1, characterized in that the coding performed by each encoder on each macroblock of a sub-slice group comprises:
motion estimation using a plurality of backward or forward reference frames, and motion compensation;
intra-prediction selection, i.e., using, for the intra prediction of the current macroblock, the left and upper macroblock data of the same slice that have been reconstructed but not yet loop-filtered; intra/inter mode selection and residual computation;
rate control;
integer transform and quantization;
reordering and entropy coding, the entropy coding being context-based adaptive variable-length coding or context-based adaptive binary arithmetic coding;
inverse quantization and inverse transform;
reconstruction;
loop filtering.
8. The method according to claim 7, characterized in that the coding performed by each encoder on each macroblock of a sub-slice group further comprises:
setting the boundary loop-filter mode of all slices of the current frame to no filtering, each encoder independently completing the loop filtering of its own slices, with no information exchanged between encoders or between slices.
9. The method according to claim 8, characterized in that independently completing the loop filtering comprises:
starting the loop filtering of a slice after the first macroblock of the slice has been reconstructed, or starting the loop filtering of a slice after the reconstruction of the whole slice has been completed.
10. A device for parallel video coding, characterized in that it comprises a primary processor and a plurality of encoders, wherein the primary processor is configured to divide a current frame to be encoded into sub-slice-groups and deliver them to the plurality of encoders; the plurality of encoders encode the sub-slice-groups in parallel and output their respective code streams to the primary processor, and the primary processor forms and outputs the sequence code stream;
The primary processor comprises a slice-group determining unit, a sub-slice-group determining unit, a sub-slice-group data transfer unit and a top-layer coding unit, wherein:
The slice-group determining unit is configured to divide the current frame of the video frame sequence into macroblocks and to allocate all macroblocks to one or more slice-groups according to a predetermined rule;
The sub-slice-group determining unit is configured to divide each slice-group of the current frame into one or more sub-slice-groups in raster-scan order, wherein each macroblock must be, and can only be, assigned to one sub-slice-group, each encoder corresponds to one or more sub-slice-groups, and each sub-slice-group corresponds to one encoder;
The sub-slice-group data transfer unit is configured to send all sub-slice-groups of the current frame, together with the coding configuration parameters, to the respective encoders according to the correspondence between sub-slice-groups and encoders;
The top-layer coding unit is configured to converge the code streams and parameters output by the encoders, complete the coding of the slice-groups, frames and sequence, and output the code stream of the whole sequence;
Each encoder comprises a sub-slice-group receiving unit, a reference data input/output unit and a coding unit, wherein:
The sub-slice-group receiving unit is configured to receive the sub-slice-groups and the coding configuration parameters;
The reference data input/output unit is configured to exchange reference data between the encoders: when the current frame is not an I frame, the reconstructed sub-slice-group data of the reference frame are exchanged between the encoders under control, updating the reference data of the motion estimation search region of each sub-slice-group buffered in each encoder;
The coding unit is configured to encode the assigned sub-slice-groups; during encoding, each sub-slice-group is divided into one or more slices in raster-scan order, the encoding of all macroblocks in the assigned sub-slice-groups is completed, the reconstructed sub-slice-groups are produced, and the encoded code stream and parameters are output.
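The reference-data exchange described for the reference data input/output unit only needs to cover each encoder's motion estimation search region, not the whole reconstructed reference frame. A minimal sketch of which reference rows an encoder must buffer, treating a sub-slice-group as a range of macroblock rows and expressing the vertical search range in whole macroblock rows (an assumption for illustration; the claim does not fix these units):

```python
def reference_rows_needed(group, search_range_rows, mb_rows):
    """Rows of the reconstructed reference frame an encoder must hold
    to run motion estimation over its sub-slice-group `group`
    (a range of macroblock rows), clamped to the frame bounds.
    """
    lo = max(group.start - search_range_rows, 0)
    hi = min(group.stop + search_range_rows, mb_rows)
    return range(lo, hi)

# An encoder owning rows 17-33 of a 68-row frame, with a vertical
# search range of 2 macroblock rows, needs rows 15-35 of the
# reference frame -- only 4 extra rows must be exchanged.
assert reference_rows_needed(range(17, 34), 2, 68) == range(15, 36)
```

Only the rows outside an encoder's own sub-slice-group need to be received from the neighbouring encoders, which keeps the inter-encoder traffic small relative to re-sending full frames.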
11. The device according to claim 10, characterized in that the processing of the primary processor comprises:
Dividing the entire frame into one slice-group and further dividing it into a plurality of sub-slice-groups, each sub-slice-group being one or more consecutive whole macroblock rows whose width equals the frame width.
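The division of claim 11 — one slice-group covering the whole frame, split into sub-slice-groups of consecutive whole macroblock rows — combined with the load-balancing principle stated in the abstract, amounts to a balanced contiguous partition of the macroblock rows. A sketch under the simplifying assumption that all rows cost the same to encode (the patent leaves the balancing criterion open):

```python
def split_rows(mb_rows, num_encoders):
    """Partition mb_rows whole macroblock rows into contiguous
    sub-slice-groups, one per encoder, as evenly as possible;
    every row lands in exactly one sub-slice-group."""
    base, extra = divmod(mb_rows, num_encoders)
    groups, start = [], 0
    for i in range(num_encoders):
        size = base + (1 if i < extra else 0)  # spread the remainder
        groups.append(range(start, start + size))
        start += size
    return groups

# A 1080p frame is coded as 1088 lines = 68 macroblock rows of 16
# pixels; four encoders each get 17 contiguous rows.
groups = split_rows(68, 4)
assert [len(g) for g in groups] == [17, 17, 17, 17]
assert sum(len(g) for g in groups) == 68   # each row assigned once
```

With uneven per-row complexity, the same interface could instead cut the rows at points that equalize estimated encoding cost rather than row count.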
12. The device according to claim 10, characterized in that each of the plurality of encoders further comprises a data storage unit for buffering the current and reconstructed sub-slice-group data.
13. The device according to claim 10, characterized in that each slice is one or more consecutive whole macroblock rows whose width equals the frame width, and a given macroblock can be assigned to only one slice.
14. The device according to claim 10, characterized in that the primary processor further comprises a control unit for initializing, configuring and controlling the primary processor and the plurality of encoders to complete the coding of the whole video sequence.
15. The device according to any one of claims 10 to 13, characterized in that, during encoding, each of the plurality of encoders further:
Sets the boundary loop filter mode of all slices of the current frame to no filtering, each encoder independently completing the loop filtering of its own slices with no information exchanged between the encoders or between the slices; and the loop filtering of a slice is started after the reconstruction of the first macroblock of the slice is completed, or, alternatively, after the reconstruction of the whole slice is completed.
16. The device according to any one of claims 10 to 13, characterized in that the plurality of encoders complete only the macroblock-level entropy coding and output the macroblock code streams and parameters, while the top-layer coding module of the primary processor completes the slice-level entropy coding.
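Claim 16 splits entropy coding across the two levels: each encoder emits only the macroblock-level payload of its sub-slice-groups, and the primary processor's top-layer module adds the slice-level wrapping and concatenates everything in order. A minimal sketch of that final assembly step — the 3-byte start code and 1-byte slice index used here are invented for illustration, not real H.264 slice-header syntax:

```python
def assemble_sequence(mb_payloads):
    """Top-layer assembly: wrap each encoder's macroblock-level
    payload in a toy slice header and concatenate the results in
    sub-slice-group order."""
    stream = bytearray()
    for idx, payload in enumerate(mb_payloads):
        stream += b"\x00\x00\x01"   # toy start code (illustrative)
        stream.append(idx & 0xFF)   # toy slice index (illustrative)
        stream += payload           # macroblock bits from encoder idx
    return bytes(stream)

# Payloads produced in parallel by three encoders, merged in order:
out = assemble_sequence([b"\xaa", b"\xbb\xbb", b"\xcc"])
assert out == (b"\x00\x00\x01\x00\xaa"
               b"\x00\x00\x01\x01\xbb\xbb"
               b"\x00\x00\x01\x02\xcc")
```

Keeping the slice-level wrapping on the primary processor means the encoders never need to know their slice's position in the final bitstream, which is what lets them run fully in parallel.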
CN 200610113256 2006-09-20 2006-09-20 Method and device for parallel video coding Active CN101150719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200610113256 CN101150719B (en) 2006-09-20 2006-09-20 Method and device for parallel video coding


Publications (2)

Publication Number Publication Date
CN101150719A CN101150719A (en) 2008-03-26
CN101150719B true CN101150719B (en) 2010-08-11

Family

ID=39251016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200610113256 Active CN101150719B (en) 2006-09-20 2006-09-20 Method and device for parallel video coding

Country Status (1)

Country Link
CN (1) CN101150719B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105472371A (en) * 2016-01-13 2016-04-06 腾讯科技(深圳)有限公司 Video code stream processing method and device

Families Citing this family (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009149671A1 (en) * 2008-06-13 2009-12-17 华为技术有限公司 Method, device, and system for packaging and encoding/decoding of video data
CN101686388B (en) * 2008-09-24 2013-06-05 国际商业机器公司 Video streaming encoding device and method thereof
KR101118091B1 (en) * 2009-06-04 2012-03-09 주식회사 코아로직 Apparatus and Method for Processing Video Data
JP5918128B2 (en) 2009-07-01 2016-05-18 トムソン ライセンシングThomson Licensing Method and apparatus for signaling intra prediction per large block for video encoders and decoders
JP5359657B2 (en) * 2009-07-31 2013-12-04 ソニー株式会社 Image encoding apparatus and method, recording medium, and program
CN102340659B (en) * 2010-07-23 2013-09-04 联合信源数字音视频技术(北京)有限公司 Parallel mode decision device and method based on AVS (Audio Video Standard)
US8344917B2 (en) * 2010-09-30 2013-01-01 Sharp Laboratories Of America, Inc. Methods and systems for context initialization in video coding and decoding
CN101969560B (en) * 2010-11-01 2012-09-05 北京中科大洋科技发展股份有限公司 Slice code rate allocation method of Mpeg2 high-definition coder under multi-core platform
KR101824241B1 (en) * 2011-01-11 2018-03-14 에스케이 텔레콤주식회사 Intra Additional Information Encoding/Decoding Apparatus and Method
CN102281441B (en) * 2011-06-17 2017-05-24 中兴通讯股份有限公司 Method and device for parallel filtering
CN102231631B (en) * 2011-06-20 2018-08-07 深圳市中兴微电子技术有限公司 The coding method of RS encoders and RS encoders
EP2781088A4 (en) * 2011-11-16 2015-06-24 Ericsson Telefon Ab L M Reducing amount of data in video encoding
CN103124345A (en) * 2011-11-18 2013-05-29 江南大学 Parallel encoding method
SI2811745T1 (en) * 2012-01-30 2018-12-31 Samsung Electronics Co., Ltd Method and apparatus for hierarchical data unit-based video encoding and decoding comprising quantization parameter prediction
CN108965892B (en) 2012-01-30 2021-02-19 三星电子株式会社 equipment for video decoding
CN107734339B (en) 2012-02-04 2021-06-01 Lg 电子株式会社 Video encoding method, video decoding method, and device using the same
EP3793200B1 (en) * 2012-04-13 2022-11-09 GE Video Compression, LLC Low delay picture coding
CA2877045C (en) 2012-06-29 2020-12-08 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Video data stream concept
GB2507127B (en) * 2012-10-22 2014-10-08 Gurulogic Microsystems Oy Encoder, decoder and method
MX349298B (en) * 2013-01-04 2017-07-21 Samsung Electronics Co Ltd Method for entropy-encoding slice segment and apparatus therefor, and method for entropy-decoding slice segment and apparatus therefor.
CN103268263B (en) * 2013-05-14 2016-08-10 讯美电子科技有限公司 A kind of method and system of dynamic adjustment multi-graphics processor load
JP6226578B2 (en) * 2013-06-13 2017-11-08 キヤノン株式会社 Image coding apparatus, image coding method, and program
CN103442196B (en) * 2013-08-16 2016-12-07 福建省物联网科学研究院 A kind of video recording method being used for touch panel device based on vector coding
CN103414902A (en) * 2013-08-26 2013-11-27 上海富瀚微电子有限公司 AVC parallel coding method used for low power consumption applications
CN103458244B (en) 2013-08-29 2017-08-29 华为技术有限公司 A kind of video-frequency compression method and video compressor
CN103916675B (en) * 2014-03-25 2017-06-20 北京工商大学 A kind of low latency inner frame coding method divided based on band
CN104980764B (en) * 2014-04-14 2019-06-21 深圳力维智联技术有限公司 Parallel decoding method, apparatus and system based on complex degree equalization
CN104038766A (en) * 2014-05-14 2014-09-10 三星电子(中国)研发中心 Device used for using image frames as basis to execute parallel video coding and method thereof
CN105992018B (en) * 2015-02-11 2019-03-26 阿里巴巴集团控股有限公司 Streaming media transcoding method and apparatus
CN104780377B (en) * 2015-03-18 2017-12-15 同济大学 A kind of parallel HEVC coded systems and method based on Distributed Computer System
CN104811696B (en) * 2015-04-17 2018-01-02 北京奇艺世纪科技有限公司 A kind of coding method of video data and device
CN113115043A (en) * 2015-08-07 2021-07-13 辉达公司 Video encoder, video encoding system and video encoding method
EP3381190B1 (en) * 2016-08-04 2021-06-02 SZ DJI Technology Co., Ltd. Parallel video encoding
CN106231320B (en) * 2016-08-31 2020-07-14 上海交通大学 Joint code rate control method and system supporting multi-machine parallel coding
CN106454354B (en) * 2016-09-07 2019-10-18 中山大学 A kind of AVS2 parallel code processing system and method
CN106603564A (en) * 2016-12-30 2017-04-26 上海寰视网络科技有限公司 Unlimited high-resolution image and video playing methods and systems
CN106849956B (en) * 2016-12-30 2020-07-07 华为机器有限公司 Compression method, decompression method, apparatus and data processing system
US10979728B2 (en) * 2017-04-24 2021-04-13 Intel Corporation Intelligent video frame grouping based on predicted performance
CN107819573A (en) * 2017-10-17 2018-03-20 东北大学 High dimension safety arithmetic coding method
CN107888917B (en) * 2017-11-28 2021-06-22 北京奇艺世纪科技有限公司 A kind of image coding and decoding method and device
CN110971896B (en) * 2018-09-28 2022-02-18 瑞芯微电子股份有限公司 H.265 coding method and device
EP3664451B1 (en) * 2018-12-06 2020-10-21 Axis AB Method and device for encoding a plurality of image frames
EP3668096B1 (en) * 2018-12-11 2025-05-14 Axis AB Method and device for encoding a sequence of image frames using a first and a second encoder
CN109862357A (en) * 2019-01-09 2019-06-07 深圳威尔视觉传媒有限公司 Cloud game image encoding method, device, equipment and the storage medium of low latency
CN112698937B (en) * 2019-10-23 2025-09-30 深圳市茁壮网络股份有限公司 A high-efficiency coding storage method and device
CN111669596B (en) * 2020-06-17 2022-08-12 展讯通信(上海)有限公司 Video compression method and device, storage medium and terminal
CN113259675B (en) * 2021-05-06 2021-10-01 北京中科大洋科技发展股份有限公司 Ultrahigh-definition video image parallel processing method
CN114205595A (en) * 2021-12-20 2022-03-18 广东博华超高清创新中心有限公司 A low-latency transmission method and system based on AVS3 encoding and decoding
CN115134606A (en) * 2022-07-13 2022-09-30 翱捷科技股份有限公司 A method and device for realizing video parallel coding
CN117412062A (en) * 2023-09-28 2024-01-16 协创芯片(上海)有限公司 A multimedia chip that supports H265 encoding
CN118646883B (en) * 2024-08-16 2024-11-08 浙江大华技术股份有限公司 Coding method and related device, equipment and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1126408A (en) * 1994-06-14 1996-07-10 大宇电子株式会社 Apparatus for parallel decoding of digital video signals
US5557332A (en) * 1993-03-05 1996-09-17 Sony Corporation Apparatus and method for reproducing a prediction-encoded video signal


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Full text.


Also Published As

Publication number Publication date
CN101150719A (en) 2008-03-26

Similar Documents

Publication Publication Date Title
CN101150719B (en) Method and device for parallel video coding
TWI626842B (en) Motion picture coding device and its operation method
CN111866512B (en) Video decoding method, video encoding method, video decoding apparatus, video encoding apparatus, and storage medium
CN101621687B (en) Methodfor converting video code stream from H. 264 to AVS and device thereof
CN102150427B (en) System and method for video encoding using adaptive loop filter
US8000388B2 (en) Parallel processing apparatus for video compression
CN103248893B (en) From H.264/AVC standard to code-transferring method and transcoder thereof the fast frame of HEVC standard
CN101783957B (en) A video predictive coding method and device
CN101247525B (en) A Method of Improving the Intra-Frame Coding Rate of Image
Akiyama et al. MPEG2 video codec using image compression DSP
JP2015015666A (en) Video encoding apparatus and operation method thereof
CN101984665A (en) Method and system for evaluating video transmission quality
CN103442228B (en) Code-transferring method and transcoder thereof in from standard H.264/AVC to the fast frame of HEVC standard
CN104519367A (en) Video decoding processing apparatus and operating method thereof
CN114071158A (en) Method, device and device for constructing motion information list in video codec
CN101179729A (en) A H.264 Macroblock Mode Selection Method Based on Statistical Classification of Inter Modes
CN114079782A (en) Video image reconstruction method, device, computer equipment and storage medium
WO2025157062A1 (en) Reference block searching method and circuit, chip, and device
CN109495745B (en) A Lossless Compression and Decoding Method Based on Inverse Quantization/Inverse Transform
CN101198066A (en) A method for selecting inter-frame and intra-frame coding modes
CN201282535Y (en) Device for converting H.264 to AVS video code stream
CN101977317B (en) Intra-frame prediction method and device
CN104954806A (en) Intra-frame video optimization coding method
JP6234770B2 (en) Moving picture decoding processing apparatus, moving picture encoding processing apparatus, and operation method thereof
CN102065297B (en) MPEG-2 (Moving Pictures Experts Group-2) to H.264 fast video transcoding method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20210421

Address after: Unit 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong 518040

Patentee after: Honor Device Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.

CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: Unit 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong 518040

Patentee after: Honor Terminal Co.,Ltd.

Country or region after: China

Address before: 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong

Patentee before: Honor Device Co.,Ltd.

Country or region before: China