CN101150719B

CN101150719B - Parallel video coding method and device

Info

Publication number: CN101150719B
Application number: CN 200610113256
Authority: CN
Inventors: 孟新建
Original assignee: Huawei Technologies Co Ltd
Current assignee: Honor Device Co Ltd
Priority date: 2006-09-20
Filing date: 2006-09-20
Publication date: 2010-08-11
Anticipated expiration: 2026-09-20
Also published as: CN101150719A

Abstract

The invention relates to a parallel video coding method and device. The device comprises a host processor and a plurality of encoders. The host processor divides the current frame of a video sequence into macroblocks and assigns all macroblocks to one or more slice groups according to a preset rule. Following the principle of balancing the processing load among the encoders, each slice group of the current frame is divided, in raster-scan order, into one or more sub-slice groups. The sub-slice groups of the current frame and the coding configuration parameters are sent to the respective encoders, which encode the sub-slice groups in parallel and each output a bitstream and parameters. Finally, the bitstreams and parameters output by the encoders are gathered, the coding of the slice groups, frame and sequence is completed, and the bitstream of the whole sequence is output. The method and device are therefore suitable for real-time high-definition video coding.

Description

Parallel video coding method and device

Technical field

The present invention relates to the technical field of video coding, and in particular to a parallel video coding technique.

Background technology

Video coding compresses digital video information so that it can be transmitted and stored more efficiently. Video coding is a core technology of multimedia data processing.

At present, the main video compression coding standards include: H.261 and H.263, formulated by the Video Coding Experts Group (VCEG) of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T); MPEG1 and MPEG4-Part2, formulated by the Moving Picture Experts Group (MPEG) of ISO/IEC; MPEG2/H.262 and H.264/AVC (MPEG4-Part10), formulated jointly by ITU-T VCEG and ISO/IEC MPEG (for H.264/AVC, through the Joint Video Team, JVT); and, in addition, VC-1 (formerly WMV-9) and AVS1.0-P2, formulated by the Audio Video coding Standard (AVS) workgroup.

The basic framework of the MPEG, JVT and VCEG series of video coding standards is shown in Fig. 1. It is a hybrid coding framework of block-based motion compensation and transform coding, comprising intra prediction, inter prediction, transform, quantization, entropy coding and so on. Inter prediction uses block-based motion vectors to remove redundancy between pictures; intra prediction uses spatial prediction modes to remove redundancy within a picture. The prediction residual is then transformed and quantized to remove visual redundancy in the picture. Finally, the motion vectors, prediction modes, quantization parameters and transform coefficients are compressed by entropy coding. The basic processing unit of video coding and decoding is the macroblock, which generally comprises one 16 × 16 block of luma samples and the corresponding blocks of chroma samples.

The tools used by the different standards differ somewhat. For the new-generation standards H.264/AVC (MPEG4-Part10), VC-1 and AVS1.0-P2, the deblocking filter is a mandatory module and is called the loop filter; in MPEG2, H.263 and MPEG4-Part2, the deblocking filter is only an optional post-processing stage in the decoder.

A real-time video encoder takes a high-definition video signal as input, performs video compression coding in real time, and outputs a bitstream. Real-time video encoders are basic equipment of digital television headend systems and are also widely used in video conferencing equipment, digital cameras, recordable DVD players and similar devices.

H.264/AVC (MPEG4-Part10), VC-1 and AVS1.0-P2 are known as the new generation of video compression coding standards. Compared with the previous generation, represented by MPEG2, the new-generation standards more than double the compression ratio, but their complexity also increases more than twofold, which makes implementation considerably harder.

High-definition television (HDTV) usually refers to moving pictures with 720 or more progressive scan lines, or 1080 or more interlaced scan lines, per frame. Common high-definition formats at present are: 720p (resolution 1280 × 720; frame rate 24, 30 or 60), 1080i (interlaced; frame resolution 1920 × 1088; field rate 60) and 1080p (resolution 1920 × 1088; frame rate 24 or 30). Higher-resolution video will also come into use in the future. HD video offers higher video quality, but HD video compression coding is also harder to implement.

Take the new-generation standard H.264/AVC as an example. It introduces tools such as multi-reference-frame motion compensation, variable-block-size prediction down to 4 × 4, a rich set of intra prediction modes, loop filtering, and context-adaptive variable-length coding (CAVLC) or context-adaptive binary arithmetic coding (CABAC), all of which greatly increase encoder complexity. According to one assessment, with full-search motion estimation the computational complexity of an H.264/AVC HDTV 720p encoder is about 3600 giga-instructions per second (GIPS), and its memory access bandwidth is about 5570 gigabytes per second (GBytes/s). The figures are even larger for 1080i and 1080p.

Because of the huge computational complexity of an HD video encoder, real-time coding is usually hard to achieve with a single processor. The difficulty is even more pronounced in application scenarios such as the headend, which must support multiple channels, multiple video formats and compression standards, and transcoding. High-definition encoders therefore usually need several chips (that is, several encoders) performing the coding in parallel.

However, the industry does not yet have a parallel video coding scheme that satisfies the high-performance, real-time, high-definition coding requirements of headend applications of the new-generation video compression standards such as H.264/AVC (MPEG4-Part10), VC-1 and AVS1.0-P2.

Summary of the invention

The purpose of the present invention is to provide a parallel video coding method and device that implement parallel video coding efficiently and satisfy the requirements of high-performance, real-time HD video coding.

The purpose of the invention is achieved through the following technical solution:

The invention provides a parallel video coding method, comprising:

dividing the current frame of the video sequence into macroblocks, and assigning all macroblocks to one or more slice groups according to a predetermined rule;

dividing each slice group, in raster-scan order, into one or more sub-slice groups, wherein each macroblock must be assigned to one and only one sub-slice group;

mapping the sub-slice groups to the encoders so as to balance their processing load, wherein each sub-slice group corresponds to exactly one encoder and each encoder corresponds to one or more sub-slice groups;

sending all sub-slice groups of the current frame, together with the coding configuration parameters, to the plurality of encoders according to the correspondence between sub-slice groups and encoders;

when the current frame is an I frame, each encoder discarding all reconstructed sub-slice group data;

when the current frame is not an I frame, each encoder obtaining, from the reconstructed sub-slice groups of the other encoders, the reference data required for motion estimation of its own sub-slice groups, which comprises: determining the minimum reference data to be exchanged between the encoders according to how the image areas and storage areas of each encoder's current-frame sub-slice groups overlap those of the reconstructed sub-slice groups, and according to the maximum motion estimation search area and the multi-reference-frame attributes of the current-frame sub-slice groups; obtaining the reference data through exchange operations between the encoders; and using the reference data to update the motion estimation search-area reference data cached in each encoder for each sub-slice group;

the plurality of encoders encoding the sub-slice groups of the current frame in parallel, wherein, during coding, each sub-slice group is divided into one or more slices in raster-scan order, all macroblocks in the sub-slice group are encoded, a reconstructed sub-slice group is produced, and a bitstream and parameters are output;

gathering the bitstreams and parameters output by the encoders, completing the coding of the slice groups, frame and sequence, and outputting the bitstream of the whole sequence.
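The step that determines the minimum reference data to exchange can be illustrated with a small sketch. It is not taken from the patent: it assumes one sub-slice group per encoder, the same row partition in the reference frame as in the current frame, a single reference frame, and a symmetric vertical search range `v_range` in pixels. The function name and parameters are hypothetical.

```python
def rows_needed_from_peers(groups, v_range, frame_mb_rows, mb_size=16):
    """groups[i] = (first_mb_row, last_mb_row), inclusive, coded by encoder i.

    Returns, for each encoder, a dict {peer_index: (first_pixel_row, last_pixel_row)}
    listing the reconstructed reference rows it must fetch from each peer.
    Assumption (not from the source): one group per encoder, one reference frame.
    """
    frame_rows = frame_mb_rows * mb_size
    requests = []
    for i, (first, last) in enumerate(groups):
        # Pixel rows that the motion search of this sub-slice group may touch.
        top = max(0, first * mb_size - v_range)
        bottom = min(frame_rows - 1, (last + 1) * mb_size - 1 + v_range)
        wanted = {}
        for j, (peer_first, peer_last) in enumerate(groups):
            if j == i:
                continue  # the encoder's own reconstructed rows are cached locally
            lo = max(top, peer_first * mb_size)
            hi = min(bottom, (peer_last + 1) * mb_size - 1)
            if lo <= hi:
                wanted[j] = (lo, hi)
        requests.append(wanted)
    return requests
```

For a 68-row frame split as `[(0, 21), (22, 44), (45, 67)]` with `v_range=32`, each encoder only needs a 32-pixel strip from each adjacent encoder, which is far less than a full reference frame.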

The predetermined rule comprises one of: the interleaved scan pattern, the dispersed scan pattern, the foreground and background scan pattern, the box-out scan pattern, the raster scan pattern, the wipe scan pattern, and the explicit scan pattern, in which a number indicates the slice group to which each macroblock belongs; or the current frame is treated as a single slice group.

Each sub-slice group consists of one or more consecutive complete macroblock rows whose width equals the frame width.

The mapping of the sub-slice groups to the encoders so as to balance their processing load comprises:

according to the processing capability of each encoder and the cost of transferring reference data, mapping each sub-slice group of the current frame to an encoder such that the encoders finish coding their assigned sub-slice groups at the same time.
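A minimal sketch of capability-matched division follows. It assumes sub-slice groups made of whole macroblock rows, one group per encoder, and relative encoder capabilities given as positive integers; it ignores the reference data transfer cost that the text also takes into account. The function name is hypothetical.

```python
def split_rows_by_capacity(mb_rows, capacities):
    """Split mb_rows consecutive macroblock rows into len(capacities) contiguous
    ranges whose sizes are roughly proportional to the given capacities.

    Returns inclusive (first_row, last_row) ranges in raster order. A range with
    last_row < first_row is empty (possible when there are fewer rows than encoders).
    """
    total = sum(capacities)
    groups, start, acc = [], 0, 0
    for k, cap in enumerate(capacities):
        acc += cap
        if k < len(capacities) - 1:
            # Rounded cumulative share; integer arithmetic avoids float surprises.
            end = (mb_rows * acc + total // 2) // total
        else:
            end = mb_rows  # the last encoder takes whatever remains
        groups.append((start, end - 1))
        start = end
    return groups
```

With 68 macroblock rows (a 1920 × 1088 frame) and capacities `[1, 1, 2]`, the third encoder receives half of the rows and the first two a quarter each.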

During the coding of a group of pictures, the division into a plurality of sub-slice groups comprises:

for the I frame, predicting the coding load from the number of macroblocks, and mapping the sub-slice groups to the encoders according to the encoders' processing capabilities and the predicted coding load, such that the encoders finish coding their assigned sub-slice groups at the same time;

for the first non-I frame, when the slice group division is unchanged, dividing the sub-slice groups in the same way as for the I frame;

for the second non-I frame, predicting the coding load of each of its sub-slice groups from the actually measured coding workload of each sub-slice group of the first non-I frame, and adjusting the sub-slice group division of the second non-I frame, taking into account the cost of loading reference data, such that the encoders take the same time when coding the second non-I frame in parallel;

for any non-I frame after the second non-I frame, predicting the coding load of each of its sub-slice groups from the coding workload of each sub-slice group of the preceding frame or frames, and adjusting the sub-slice group division of that frame, taking into account the cost of loading reference frame data, such that the encoders take the same time when coding that frame in parallel.
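The feedback-driven adjustment for non-I frames can be sketched as follows. This is an illustration under simplifying assumptions that the source does not state: encoders of equal capability, a load model that spreads each group's measured time evenly over its macroblock rows, and no reference-loading cost. The name `rebalance` is hypothetical.

```python
from itertools import accumulate

def rebalance(prev_groups, prev_times, n_encoders):
    """prev_groups: inclusive (first_row, last_row) ranges used for the previous frame.
    prev_times: measured coding time of each of those ranges.
    Returns new inclusive row ranges with approximately equal predicted load."""
    # Crude per-row load model: each row of a group costs the group's average.
    row_cost = []
    for (first, last), t in zip(prev_groups, prev_times):
        n = last - first + 1
        row_cost.extend([t / n] * n)
    # prefix[r] is the predicted cost of rows 0 .. r-1.
    prefix = [0.0] + list(accumulate(row_cost))
    target = prefix[-1] / n_encoders
    cuts = [0]
    for b in range(1, n_encoders):
        # Place the b-th cut where the cumulative cost is closest to b equal
        # shares, never moving back past the previous cut.
        r = min(range(cuts[-1], len(row_cost) + 1),
                key=lambda r: abs(prefix[r] - b * target))
        cuts.append(r)
    cuts.append(len(row_cost))
    return [(cuts[k], cuts[k + 1] - 1) for k in range(n_encoders)]
```

If two encoders each coded four rows of the previous frame but the second took three times as long, the sketch moves the boundary so that the first encoder receives five rows and the second three.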

Each slice consists of one or more consecutive complete macroblock rows whose width equals the frame width.

The coding performed by each encoder on each macroblock of a sub-slice group comprises:

motion estimation using one or more forward or backward reference frames, and motion compensation;

intra prediction selection, wherein, for intra prediction of the current macroblock, the data of the reconstructed but not yet loop-filtered left and upper macroblocks of the same slice are used; intra/inter selection, and computation of the residual;

rate control;

integer transform and quantization;

reordering and entropy coding, wherein the entropy coding is context-adaptive variable-length coding or context-adaptive binary arithmetic coding;

inverse quantization and inverse transform;

reconstruction;

loop filtering.

The coding performed by each encoder on each macroblock of a sub-slice group further comprises: setting the boundary loop filter mode of all slices of the current frame to no filtering, so that each encoder completes the loop filtering of each of its slices independently and no information is exchanged between encoders or between slices.

Completing the loop filtering independently comprises: starting the loop filtering of a slice after the first macroblock of that slice has been reconstructed, or starting it after the whole slice has been reconstructed.

The present invention also provides a parallel video coding device comprising a host processor and a plurality of encoders. The host processor divides the current frame to be coded into sub-slice groups and passes them to the encoders; the encoders encode the sub-slice groups in parallel and output their bitstreams to the host processor, which forms the sequence bitstream;

The host processor comprises a slice group determining unit, a sub-slice group determining unit, a sub-slice group data transfer unit and a top-level coding unit, wherein:

the slice group determining unit is used to divide the current frame of the video frame sequence into macroblocks and to assign all macroblocks to one or more slice groups according to a predetermined rule;

the sub-slice group determining unit is used to divide each slice group of the current frame, in raster-scan order, into one or more sub-slice groups, wherein each macroblock must be assigned to one and only one sub-slice group, each encoder corresponds to one or more sub-slice groups, and each sub-slice group corresponds to one encoder;

the sub-slice group data transfer unit is used to send all sub-slice groups of the current frame, together with the coding configuration parameters, to the encoders according to the correspondence between sub-slice groups and encoders;

the top-level coding unit is used to gather the bitstreams and parameters output by the encoders, complete the coding of the slice groups, frame and sequence, and output the bitstream of the whole sequence;

Each encoder comprises a sub-slice group receiving unit, a reference data input/output unit and a coding unit, wherein:

the sub-slice group receiving unit is used to receive sub-slice groups and coding configuration parameters;

the reference data input/output unit is used to exchange reference data between the encoders: when the current frame is not an I frame, it controls the exchange of reconstructed reference-frame sub-slice group data between the encoders and updates the motion estimation search-area reference data cached in each encoder for each sub-slice group;

the coding unit is used to encode the assigned sub-slice groups, wherein, during coding, each sub-slice group is divided into one or more slices in raster-scan order, all macroblocks in the assigned sub-slice groups are encoded, reconstructed sub-slice groups are produced, and a bitstream and parameters are output.
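The division of labour between the host processor and the encoders can be pictured with a toy sketch. Everything in it is a stand-in: the "encoder" returns a placeholder byte string instead of a real bitstream, worker threads play the role of separate encoder chips, and the top-level coding is reduced to concatenation in raster order.

```python
from concurrent.futures import ThreadPoolExecutor

def encode_sub_slice_group(rows):
    """Stand-in for one encoder: returns a placeholder 'bitstream' for its rows."""
    first, last = rows
    return f"<ssg rows {first}-{last}>".encode()

def encode_frame(sub_slice_groups, n_encoders):
    """Stand-in for the host processor: dispatch, wait, then gather in order."""
    with ThreadPoolExecutor(max_workers=n_encoders) as pool:
        # map() returns results in the order of the inputs, i.e. raster order,
        # regardless of which worker finishes first.
        streams = list(pool.map(encode_sub_slice_group, sub_slice_groups))
    # Placeholder for the top-level coding unit.
    return b"".join(streams)
```

The point of the sketch is only the control flow: the frame is finished when the slowest encoder finishes, which is why the text stresses that the encoders should complete at the same time.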

The processing of the host processor comprises: treating the entire frame as one slice group and further dividing it into a plurality of sub-slice groups, each of which consists of one or more consecutive complete macroblock rows whose width equals the frame width.

Each of the encoders further comprises a data storage unit used to cache the current and the reconstructed sub-slice group data.

Each slice consists of one or more consecutive complete macroblock rows whose width equals the frame width, and any given macroblock can be assigned to only one slice.

The host processor of the device further comprises a control unit used to initialize, configure and control the host processor and the encoders so as to complete the coding of the whole video sequence.

During coding, each of the encoders further:

sets the boundary loop filter mode of all slices of the current frame to no filtering and completes the loop filtering of each of its slices independently, with no information exchanged between encoders or between slices; the loop filtering of a slice starts after the first macroblock of that slice has been reconstructed, or after the whole slice has been reconstructed.

The encoders perform only macroblock-level entropy coding and output macroblock bitstreams and parameters; the top-level coding unit of the host processor performs the slice-level entropy coding.

As can be seen from the technical solution above, the invention provides a parallel video coding scheme, so that a parallel coding mode can be chosen during video coding and the processing capability required for high-performance, real-time HD video coding can be met. Moreover, because the data to be coded are distributed to the encoders at a moderate granularity according to the principle of load balancing, and the encoders code the video data in parallel, the time the processors spend waiting for one another and the overhead of interaction between encoders are both kept as small as possible. The parallel video coding scheme provided by the invention therefore also achieves high coding efficiency.

Description of drawings

Fig. 1 is a schematic diagram of a prior-art video coding framework;

Fig. 2 is a schematic diagram of a specific implementation of the method of the invention;

Fig. 3 is a schematic structural diagram of a specific implementation of the device of the invention.

Embodiment

The present invention mainly concerns the video coding process: each slice group is further divided into a plurality of sub-slice groups, each sub-slice group comprises one or more slices, and a plurality of encoders code in parallel on the basis of the sub-slice groups.

To aid understanding, some technical concepts of existing coding schemes that the invention relies on are described first.

H.264 is organized, from high to low in time and space, into levels such as sequence, group of pictures, picture (frame), field, slice group, slice, macroblock and sub-macroblock.

(1) Field, frame, picture

A field or a frame of video can be used to produce a coded picture. Video frames are generally of two types: progressive or interlaced. In interlaced video, two fields (called the top field and the bottom field) form a frame;

Current frame: the frame being coded;

Reconstructed frame: the frame output by the encoder's local decoding;

Reference picture (frame): to improve prediction accuracy, an H.264 encoder can select, from a set of previously coded and reconstructed forward or backward pictures, one or more pictures that best match the current picture as reference pictures for inter coding. In H.264 the best-matching picture can be selected from up to 16 reference pictures.

(2) Macroblock (MB) and sub-macroblock

A coded picture is usually divided into a number of macroblocks. A macroblock consists of one 16 × 16 block of luma pixels together with one 8 × 8 Cb and one 8 × 8 Cr block of chroma pixels. The macroblock is the basic unit of video coding. A macroblock can be further partitioned into 16 × 8, 8 × 16 or 8 × 8 luma blocks (with the associated chroma pixels); an 8 × 8 sub-macroblock can in turn be partitioned into 8 × 4, 4 × 8 or 4 × 4 luma blocks (with the associated chroma pixels). Two corresponding macroblocks from the top field and the bottom field of an interlaced frame form a macroblock pair.
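As a quick worked example of what dividing a picture into 16 × 16 macroblocks means for the high-definition formats mentioned earlier, the following helper (a hypothetical name, not part of any standard) counts macroblock columns, rows and the total per frame.

```python
def macroblocks_per_frame(width, height, mb_size=16):
    """Return (columns, rows, total) of macroblocks covering a width x height frame."""
    # Ceiling division, so a partially covered block at the edge still counts.
    cols = -(-width // mb_size)
    rows = -(-height // mb_size)
    return cols, rows, cols * rows
```

A 1280 × 720 frame contains 80 × 45 = 3600 macroblocks, and a 1920 × 1088 frame contains 120 × 68 = 8160. The row counts (45 and 68) are the quantities that a division into whole macroblock rows has to distribute among the encoders.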

Depending on whether intra or inter prediction is used, macroblocks are classified as intra-predicted (I macroblocks) or inter-predicted (P macroblocks). An I macroblock is intra-predicted using already decoded pixels of the current slice as reference. A P macroblock is inter-predicted using previously coded pictures as reference pictures.

(3) Slice

Within each picture, macroblocks are arranged into slices. An intra-coded slice (I slice) contains only I macroblocks; an inter-coded slice (P slice) can contain both P and I macroblocks. Slices are coded independently of one another: intra prediction in one slice cannot use macroblocks of other slices as reference.

A frame consisting entirely of I slices is called an I frame; a frame that does not consist entirely of I slices is called a non-I frame.

(4) Slice group

In the H.264 standard, the macroblocks of a picture can be divided into several slice groups by flexible macroblock ordering (FMO). A slice group is a subset of the macroblocks of a coded picture and can contain one or more slices. Through slice groups, FMO changes the way a picture is divided into slices and macroblocks and can further improve the error resilience of slices.

The macroblock-to-slice-group map defines which slice group each macroblock belongs to. With FMO, H.264 defines seven macroblock scan patterns: interleaved, dispersed, foreground and background, box-out, raster scan, wipe, and explicit (a number indicates the slice group of each macroblock).
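To make the idea of a macroblock-to-slice-group map concrete, here is an illustrative sketch of an interleaved-style map, in which runs of consecutive macroblocks are handed to the slice groups in turn. It is a simplified illustration with hypothetical names, not a conforming implementation of the standard's derivation, and it assumes every run length is a positive integer.

```python
def interleaved_map(num_units, run_lengths):
    """Return a list giving the slice group index of each macroblock (map unit).

    run_lengths[g] consecutive units are assigned to group g, cycling through
    the groups until all num_units units are covered.
    """
    mapping = [None] * num_units
    i = 0
    while i < num_units:
        for group, run in enumerate(run_lengths):
            for j in range(run):
                if i + j < num_units:
                    mapping[i + j] = group
            i += run
            if i >= num_units:
                break
    return mapping
```

Whatever pattern is used, the resulting map assigns every macroblock to exactly one slice group, which is the property the later division into sub-slice groups relies on.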

Different standards and profiles differ in their support for FMO. The H.264 Baseline Profile and Extended Profile support the seven FMO scan patterns. The H.264 Main Profile, VC-1 and AVS1.0-P2 do not support FMO; only the raster scan pattern is available, with a single slice group whose size equals the frame.

(5) Group of pictures (GOP)

A number of consecutive pictures (frames), the first of which is an I frame.

(6) Sequence

A video sequence is the top-level syntax structure of the coded bitstream and comprises one or more consecutive coded pictures.

In coding, a profile is a specified subset of syntax, semantics and algorithms. A decoder conforming to a given profile must fully support the subset that the profile defines. The H.264/AVC standard defines three profiles and four high-fidelity extension profiles (High profiles).

1. Baseline Profile:

Supports intra and inter coding using I slices and P slices, and entropy coding with context-adaptive variable-length coding (CAVLC). It is mainly used for real-time video communication such as video telephony, video conferencing and wireless communication.

2. Main Profile:

Supports interlaced video, inter coding with B slices, and weighted prediction; supports context-adaptive binary arithmetic coding (CABAC). It is mainly used for digital broadcast television and digital video storage.

3. Extended Profile:

Supports efficient switching between bitstreams (SP and SI slices) and improved error resilience (data partitioning), but does not support interlaced video or CABAC. It is mainly used for streaming media.

To adapt conveniently to different network standards and protocols, the functions of H.264/AVC are split into two layers: the video coding layer (VCL) and the network abstraction layer (NAL). VCL data are the output of the coding process and represent the compressed video data sequence. Before VCL data are transmitted or stored, they are first mapped or encapsulated into NAL units.

The video coding layer of the H.264/AVC (MPEG4-Part10) video coding standard uses a hybrid of transform and predictive coding; the corresponding block diagram is again Fig. 1. With intra prediction, the predicted value PRED (denoted P in the figure) is formed from previously coded samples of the current slice; with inter prediction, it is obtained by motion compensation (MC) from a previously coded reference picture, denoted F'n-1. To improve prediction accuracy, and hence the compression ratio, the actual reference picture can be chosen from past or future (in display order) frames that have been coded, decoded, reconstructed and filtered. Subtracting the predicted value PRED from the current block produces a residual block Dn. After block transform and quantization, this yields a set of quantized transform coefficients X, which are entropy coded and, together with the side information needed for decoding (such as prediction modes, quantization parameters and motion vectors), form the compressed bitstream that is passed through the NAL (network abstraction layer) for transmission and storage.

As noted above, to provide reference pictures for further prediction, the encoder must be able to reconstruct pictures. The residual is therefore inverse-quantized and inverse-transformed to give D'n, which is added to the predicted value P to give uF'n (the unfiltered frame). To remove noise produced in the coding and decoding loop and to improve the picture quality of the reference frames, and hence the compression performance, a loop filter is provided; its filtered output is the reconstructed picture, which can be used as a reference picture.

Motion estimation accounts for more than 50% of the encoder's computation and is the bottleneck of encoder implementation. Motion estimation finds, for each block of the current frame (a luma macroblock or one of its sub-blocks), the most similar block, called the matching block, within a given search range in a preceding or following frame according to some matching criterion, and computes the motion vector from the displacement between the matching block and the current block. The criterion commonly used is the minimum sum of absolute differences (SAD). The more accurate the motion estimation, the smaller the compensated residual, the higher the coding efficiency, and the better the quality of the coded picture. Motion estimation of a block requires reading in the reference frame data (also called reference data) of the search window corresponding to that block. For a 16 × 16 macroblock with a motion estimation search range of [-64, +63] horizontally and [-32, +31] vertically, the reference data to be read are the image area of the reference frame at the position of the macroblock and its surroundings, of size (64+16+64) × (32+16+32) = 144 × 80. With multiple reference frames, the search windows of several reference frames may have to be read. Because each frame contains a very large number of macroblocks, the resulting volume of memory accesses is huge and becomes a bottleneck of video coding. Search window data therefore need to be reused between adjacent macroblocks, which greatly reduces the amount of reference frame data read: when many adjacent macroblocks undergo motion estimation in one unit, the reference data to be read are only slightly larger than the area those macroblocks cover.
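The search-window arithmetic and the SAD criterion described above can be sketched in a few lines. This is a plain full search over integer displacements on small 2-D lists, written for clarity rather than speed; the function names and the symmetric range parameter `rng` are assumptions made for the illustration.

```python
def search_window_size(block, h_range, v_range):
    """Window extent as counted in the text: block plus the margin on both sides."""
    return (h_range + block + h_range, v_range + block + v_range)

def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def full_search(cur, ref, bx, by, n, rng):
    """Try every displacement in [-rng, +rng]^2 that stays inside ref and
    return (best_sad, dx, dy)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            inside = (0 <= by + dy and by + dy + n <= h and
                      0 <= bx + dx and bx + dx + n <= w)
            if inside:
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best
```

`search_window_size(16, 64, 32)` reproduces the 144 × 80 window of the example. The nested loops also show why the window matters: every candidate displacement reads reference pixels, so sharing one window among neighbouring macroblocks saves most of the memory traffic.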

The H.264/MPEG-4 AVC standard defines a deblocking filter process for the boundaries of 16 × 16 macroblocks and 4 × 4 blocks. At macroblock boundaries, the purpose of the filtering is to remove blocking artifacts caused by adjacent macroblocks having different intra or inter prediction types or different quantization parameters. At block boundaries, the purpose is to remove artifacts that may be caused by transform/quantization and by differences between the motion vectors of adjacent blocks. The loop filter modifies two pixels on each side of a macroblock/block boundary using a content-adaptive nonlinear algorithm.

The other new-generation video coding standards, VC-1 and AVS1.0-P2, have the same coding framework as H.264/AVC and differ only in the details of some modules. Take loop filtering: H.264/AVC can choose to filter or not to filter slice boundaries, whereas VC-1 and AVS1.0-P2 never filter slice boundaries; the specific filtering algorithms also differ. Likewise, the previous-generation video coding standards, including MPEG2, H.263 and MPEG4-Part2, have a coding framework similar to that of H.264/AVC, VC-1 and AVS1.0-P2 and differ only in some modules; for example, MPEG2, H.263 and MPEG4-Part2 encoders have no loop filter and use only one reference frame for motion estimation.

To aid understanding of the invention, its specific implementation is described below.

The specific implementation of the invention, shown in Fig. 2, comprises:

Step 21: divide the current frame into at least one slice group;

Specifically, the current frame of the video frame sequence is divided into macroblocks, and all macroblocks are assigned to one or more slice groups according to a predetermined rule;

In this step, the video frame can be divided into slice groups using any of the scan patterns of flexible macroblock ordering (FMO): the interleaved, dispersed, foreground and background, box-out, raster scan, wipe and explicit scan patterns, where the explicit scan pattern uses a number to indicate the slice group to which each macroblock belongs. If the encoder does not support FMO, all macroblocks of the frame can be assigned to a single slice group.

In the raster scan pattern, each slice group can be chosen to be one or more consecutive complete macroblock rows of the picture, with fixed size and position and with width equal to the frame width; in the explicit scan pattern, the slice groups can all be chosen to be rectangular.

Step 22: divide the current frame into a plurality of sub-slice groups on the basis of the slice groups;

An H.264 coding scheme is organized into levels such as sequence, group of pictures (GOP), picture (frame), slice group, slice, macroblock and sub-macroblock. To code in parallel with several encoders, the invention must choose a suitable granularity at which to divide the coding task of a whole sequence into sub-tasks (modules). It does so by further dividing each slice group into one or more sub-slice groups; if the current frame has only one slice group, that slice group is divided into a plurality of sub-slice groups. The sub-slice groups of the current frame then serve as the basis for load-balanced parallel coding.

In this step, every slice group of the current frame is further divided into one or more sub-slice groups in raster-scan order;

Each sub-slice group can consist of one or more consecutive complete macroblock rows whose width equals the frame width.

Need to prove, in dividing sub-slice-group process, each macro block must and can only be assigned to a sub-slice-group, with the corresponding one or more sub-slice-group of each encoder, each sub-slice-group only can corresponding encoder, but each encoder can corresponding one or more sub-slice-group, promptly can be used for one or more sub-slice-group are encoded;

In addition, in this step, all right corresponding a plurality of sub-slice-group of encoder for example, when certain slice-group and sub-slice-group splitting scheme cause parton slice-group size less than normal, can the sub-slice-group that several are little be given an encoder.
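
The assignment rule above (each sub-slice-group goes to exactly one encoder, while one encoder may take several small ones) can be sketched as follows. This is an illustrative sketch only, not the procedure of the invention: the greedy largest-first heuristic, the name `assign_sub_slice_groups`, the sample sizes, and the use of macroblock count as the load estimate are all assumptions made for the example.

```python
import heapq

def assign_sub_slice_groups(sizes, n_encoders):
    """Map each sub-slice-group to exactly one encoder (hypothetical helper).

    sizes: dict {sub_slice_group_id: macroblock_count}
    Returns dict {sub_slice_group_id: encoder_index}, indices 0..n_encoders-1.
    """
    # Min-heap of (current load in macroblocks, encoder index).
    heap = [(0, e) for e in range(n_encoders)]
    heapq.heapify(heap)
    assignment = {}
    # Largest sub-slice-groups first; each goes to the least-loaded encoder.
    for group, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        load, encoder = heapq.heappop(heap)
        assignment[group] = encoder
        heapq.heappush(heap, (load + size, encoder))
    return assignment

# Illustrative sizes: five large sub-slice-groups and one small one, five encoders.
sizes = {1: 720, 2: 648, 3: 648, 4: 648, 5: 720, 6: 216}
assignment = assign_sub_slice_groups(sizes, 5)
```

With these sample sizes the small sub-slice-group 6 ends up sharing an encoder with one of the 648-macroblock sub-slice-groups, and no sub-slice-group is ever split across encoders.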

The principle of balancing the processing load among the encoders is as follows: sub-slice-groups are divided so as to match the processing capabilities of the encoders, and the cost of transferring reference data is taken into account during the division, so that all encoders finish encoding their assigned sub-slice-groups at the same time.
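
One way to picture capability-matched division is to split the macroblock rows of a frame into contiguous runs proportional to each encoder's relative capacity. The sketch below assumes that sub-slice-groups are whole macroblock rows and that each encoder's capacity is known as a single relative number; the function name `split_rows` and the capacity values are hypothetical, and the reference data transfer cost is not modelled here.

```python
def split_rows(total_rows, capacities):
    """Split macroblock rows into contiguous runs, one per encoder,
    sized in proportion to each encoder's relative capacity (assumed known)."""
    total_capacity = sum(capacities)
    bounds = [0]
    accumulated = 0.0
    for capacity in capacities:
        accumulated += capacity
        # Round the cumulative boundary so the runs always add up to total_rows.
        bounds.append(round(total_rows * accumulated / total_capacity))
    # (first_row, end_row_exclusive) per encoder, in raster order.
    return [(bounds[i], bounds[i + 1]) for i in range(len(capacities))]

equal = split_rows(45, [1, 1, 1, 1, 1])   # five identical encoders
mixed = split_rows(45, [2, 1, 1, 1])      # one encoder twice as fast
```

For a 45-row frame, five equal encoders receive 9 rows each, while an encoder with twice the capacity of the others receives 18 of the 45 rows.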

In this step, when a group of pictures (GOP) is encoded, the sub-slice-groups are divided as follows:

For an I frame, the encoding load is predicted from the number of macroblocks, and the sub-slice-groups are divided to match the processing capabilities of the encoders, so that all encoders finish encoding their assigned sub-slice-groups at the same time.

For any non-I frame from the second non-I frame onward, the encoding load of each sub-slice-group of that frame is predicted from the amount of encoding work measured for the previous frame, or for each sub-slice-group of the previous frame. Taking into account the cost of loading reference frame data, the sub-slice-group division of the non-I frame is adjusted so that the encoders take the same time when encoding the frame in parallel.
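
The adjustment for non-I frames can be sketched as a feedback step: estimate each encoder's speed from the previous frame, propose a new row split, and adopt it only if the predicted gain outweighs the cost of reloading reference data for rows that change owner. Everything below is an assumption made for illustration: the function name `rebalance_rows`, the linear speed model, and the crude cost model that charges `reload_cost_per_row` for every row of boundary shift.

```python
def rebalance_rows(prev_rows, prev_times, reload_cost_per_row):
    """prev_rows[i]: macroblock rows encoder i coded in the previous frame.
    prev_times[i]: time encoder i needed for them (any consistent unit)."""
    total = sum(prev_rows)
    speeds = [rows / t for rows, t in zip(prev_rows, prev_times)]  # rows per time unit
    total_speed = sum(speeds)

    new_rows, old_bounds, new_bounds = [], [], []
    acc_speed, acc_rows, last_bound = 0.0, 0, 0
    for rows, speed in zip(prev_rows, speeds):
        acc_speed += speed
        acc_rows += rows
        bound = round(total * acc_speed / total_speed)
        new_rows.append(bound - last_bound)
        last_bound = bound
        old_bounds.append(acc_rows)
        new_bounds.append(bound)

    # Rows that change owner: total shift of the boundaries between neighbours.
    moved = sum(abs(n - o) for n, o in zip(new_bounds, old_bounds))
    predicted = max(n / s for n, s in zip(new_rows, speeds)) + moved * reload_cost_per_row
    # Keep the old division unless the new one is predicted to finish sooner.
    return new_rows if predicted < max(prev_times) else list(prev_rows)

# Encoder 3 took twice as long as the others on the previous frame.
cheap = rebalance_rows([9, 9, 9, 9, 9], [9.0, 9.0, 18.0, 9.0, 9.0], 0.5)
costly = rebalance_rows([9, 9, 9, 9, 9], [9.0, 9.0, 18.0, 9.0, 9.0], 2.0)
```

With a low reload cost the slow encoder's share shrinks from 9 rows to 5; with a high reload cost the previous division is kept, which mirrors the trade-off between load balance and reference data traffic described above.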

Step 23: distribute the data to be encoded, as divided into sub-slice-groups, to the encoders.

Distribution follows the correspondence between sub-slice-groups and encoders established in Step 22. What is distributed includes not only all sub-slice-groups of the current frame but also the relevant coding configuration parameters, including slice descriptors such as position and number of macroblocks. The coding configuration parameters include, but are not limited to: coding standard information (standard, profile), FMO and scan mode, number of reference frames, motion estimation search range, rate control requirements, and loop filter mode.

Step 24: exchange reference data among the encoders.

The reference data may be exchanged as follows: when the current frame is not an I frame, the encoders exchange the reconstructed sub-slice-group data of the reference frame and update the reference data cached in each encoder for the motion estimation search area of each sub-slice-group.

In more detail: when the current frame is not an I frame, the encoder performs inter prediction, the core of which is motion estimation. Before motion estimation, the encoder must obtain the reference data of the search area. The reference data comes from a previously reconstructed reference frame, which is assembled from the reconstructed sub-slice-groups output by the individual encoders. Reading in reference data accounts for the largest share of memory accesses in video coding and strongly affects coding performance. The amount of reference data read in therefore needs to be minimized, which calls for reference data reuse strategies.

That is, in this step, if the current frame is not an I frame, the minimum reference data to be exchanged among the encoders is determined from the picture-area and storage-area overlap between each encoder's sub-slice-groups of the current frame and its reconstructed sub-slice-groups, together with the maximum motion estimation search area and the reference frames used by the sub-slice-groups of the current frame. The exchanged reference data is then used to update the motion estimation search area reference data cached in each encoder for each sub-slice-group.

In other words, the reconstructed sub-slice-group that an encoder produces while encoding is stored in that encoder's local memory. When the position and size of the same sub-slice-group are unchanged in the next frame, all of this reconstructed data can serve as reference data for motion estimation of the next frame. The part of the search area that lies outside the encoder's assigned sub-slice-group region, however, is produced by other encoders and must be obtained from them.

For example, when motion estimation uses a fixed-size search window and one reference frame, the minimum reference data to be exchanged among the encoders is determined from the size of the search window, and the encoders exchange the corresponding data in order to update the motion estimation search area reference data cached in each encoder for each sub-slice-group.

Step 25: the encoders encode, in parallel, the macroblocks contained in all sub-slice-groups of the current frame.

Parallel encoding of all sub-slice-groups of the current frame may proceed as follows: each encoder encodes its assigned sub-slice-groups. During encoding, each sub-slice-group is divided in raster scan order into one or more slices, and each macroblock must be assigned to exactly one slice. Each encoder encodes all macroblocks in its assigned sub-slice-groups, produces the reconstructed sub-slice-groups, and outputs the coded bitstream and parameters.

In this step, encoding all macroblocks of a sub-slice-group in each encoder may comprise: motion estimation and motion compensation; intra prediction mode selection, intra prediction, intra/inter selection and residual computation; rate control; transform and quantization; reordering and entropy coding; inverse quantization and inverse transform; and reconstruction.

For the H.264, VC-1 or AVS1.0-P2 standards, encoding all macroblocks of a sub-slice-group in each encoder may comprise the following steps:

(1) motion estimation using multiple backward or forward reference frames, and motion compensation;

(2) intra prediction mode selection, where intra prediction of the current macroblock uses the reconstructed, not yet loop-filtered data of the left and upper macroblocks in the same slice; intra/inter selection and residual computation;

(3) rate control;

(4) integer transform and quantization;

(5) reordering and entropy coding, where the entropy coding is context-based adaptive variable-length coding or context-based adaptive binary arithmetic coding;

(6) inverse quantization and inverse transform;

(7) reconstruction;

(8) loop filtering.

In the H.264 standard, loop filtering is a necessary part of producing the reconstructed frame during encoding. Whether loop filtering is applied across the boundary of each slice in H.264 is selectable through the "slice boundary loop filter mode"; the present invention preferably sets the boundary loop filter mode of all slices of the current frame to "no filtering". With this setting, each encoder independently performs the loop filtering of each of its slices, and no data is exchanged between encoders or between slices.

Because the loop filtering of each slice is independent, to save processing time the loop filtering of the macroblocks of a slice starts, in the present invention, as soon as the first macroblock of that slice has been reconstructed, so that macroblock reconstruction and loop filtering form a macroblock-level pipeline. To reduce how often loop filtering accesses main memory, an on-chip buffer can hold the pixel values and block information of the bottom 4 pixel rows of the entire macroblock row above and of the rightmost 4 pixel columns of the macroblock to the left; loop filtering then only needs to access this on-chip buffer. Alternatively, the loop filtering of a slice may begin after the whole slice has been reconstructed; here too, each encoder filters each slice independently, and no data needs to be exchanged between encoders or between slices.

Step 26: aggregate the bitstreams and parameters output by the encoders, perform top-level encoding, and produce the output sequence bitstream.

Specifically, the primary processor collects the bitstream and related parameters output by each encoder, generates the slice group and frame bitstreams, and merges them into the sequence bitstream, which is then output. For H.264, for example, this covers all encoding up to and including the NAL.

Steps 21 to 26 thus implement parallel encoding of video data. While video data is being encoded, Steps 21 to 26 are repeated for each subsequent frame until encoding ends.
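
The per-frame flow of distribution, parallel encoding and aggregation (Steps 23, 25 and 26) can be sketched with a thread pool standing in for the hardware encoders. This is a toy model under stated assumptions: `stub_encode` and `encode_frame` are hypothetical names, the "bitstream" is just a tagged byte string, the slice group and sub-slice-group division (Steps 21 and 22) is passed in ready-made as row ranges, and the reference data exchange of Step 24 is omitted.

```python
from concurrent.futures import ThreadPoolExecutor

def stub_encode(job):
    """Stand-in for one encoder (hypothetical): returns a tagged byte string
    instead of a real coded bitstream."""
    index, macroblocks = job
    return f"[ssg{index}:{len(macroblocks)}]".encode("ascii")

def encode_frame(macroblocks, row_ranges, cols, n_encoders):
    # Step 23: hand each sub-slice-group (a run of macroblock rows) to an encoder.
    jobs = [(i + 1, macroblocks[first * cols:end * cols])
            for i, (first, end) in enumerate(row_ranges)]
    # Step 25: encode the sub-slice-groups in parallel.
    with ThreadPoolExecutor(max_workers=n_encoders) as pool:
        chunks = list(pool.map(stub_encode, jobs))  # map() preserves job order
    # Step 26: the primary processor merges the outputs in raster order.
    return b"".join(chunks)

frame = list(range(3600))                                   # one 720p frame of macroblock numbers
ranges = [(0, 9), (9, 18), (18, 27), (27, 36), (36, 45)]    # (first_row, end_row_exclusive)
stream = encode_frame(frame, ranges, 80, 5)
```

Because the merge step concatenates results in job order rather than completion order, the output stays in raster order even when the encoders finish at different times.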

The division into sub-slice-groups is the key to implementing the present invention. It is described below with reference to concrete application examples.

In the present invention, to obtain greater encoding capacity through parallel processing and meet the requirements of real-time high-definition video encoding, multiple encoders perform the encoding in parallel. The architectures and processing capabilities of the encoders may differ, but their total processing capability should exceed the required encoding workload, with some margin to cover the additional overhead of parallel encoding.

Take real-time H.264 encoding of 720p high-definition video (1280 × 720, 30 fps) as an example. The encoder architecture is designed to support 1 or 2 reference frames and a motion estimation search range of [-64, +63] horizontally and [-32, +31] vertically. Each encoder is implemented with a VLIW DSP or an FPGA. According to a prior assessment, about 5 such encoders together provide the total processing capability needed for 720p with some margin; the encoders are numbered encoder 1 to 5.
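
The scale of this example can be checked with a little arithmetic. The sketch below only counts macroblocks; the per-encoder figure assumes five identical encoders and ignores the margin and parallel overhead mentioned above, so it is a rough lower bound and not a figure taken from the assessment.

```python
WIDTH, HEIGHT, FPS = 1280, 720, 30
MB_SIZE = 16        # macroblocks are 16 x 16 pixels
N_ENCODERS = 5      # assumed identical for this estimate

mb_per_frame = (WIDTH // MB_SIZE) * (HEIGHT // MB_SIZE)    # 80 columns x 45 rows
mb_per_second = mb_per_frame * FPS
mb_per_second_per_encoder = mb_per_second / N_ENCODERS

print(mb_per_frame, mb_per_second, mb_per_second_per_encoder)
```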

To obtain the best parallel processing performance, the present invention mainly considers the following two aspects:

(1) the load is balanced among the encoders, so that they wait for one another as little as possible;

(2) overheads such as communication between the encoders are as small as possible.

By dividing sub-slice-groups flexibly, the method of the present invention lets the processors (that is, the encoders) of the parallel video coding system satisfy both aspects, so that the system as a whole achieves its best performance.

To meet requirement (1), the encoding load of a sub-slice-group should match the processing capability of the encoder to which it is assigned. The encoding load of a sub-slice-group depends closely on its size (the number of macroblocks it contains), the characteristics of the picture content, and the coding configuration parameters (intra prediction, inter prediction, number of reference frames, minimum block size, quantization level, entropy coding mode, and so on). In general, the more macroblocks, the greater the encoding load, so the load can be matched to the encoder's processing capability by adjusting the number of macroblocks in the sub-slice-group.

For requirement (2), the communication overhead of multi-encoder parallel processing is mainly the exchange of reference data. If the motion estimation parameters (number of reference frames, search range, and so on) are fixed and the sub-slice-group division is unchanged, the communication overhead is constant; if the sub-slice-group division of the current frame changes, the overhead of exchanging reference data increases markedly. Any adjustment of the sub-slice-group division must therefore weigh the cost of the increased reference data communication.

The division into slice groups and sub-slice-groups provided by the present invention is described below using 720p high-definition video (1280 × 720) as an example. Table 1 shows the macroblock division of one 720p frame: there are 3600 macroblocks of 16 × 16 pixels, arranged in 45 macroblock rows (the rightmost column of the table gives the macroblock row number, MBR1 to MBR45) of 80 macroblocks each. The number in each cell is the macroblock number, which increases in raster scan order (left to right, top to bottom).

Table 1

0	1	2	…	36	37	38	39	40	41	42	43	…	77	78	79	MBR1
80	81	82	…	116	117	118	119	120	121	122	123	…	157	158	159	MBR2
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	MBR3~7
560	561	562	…	596	597	598	599	600	601	602	603	…	637	638	639	MBR8
640	641	642	…	676	677	678	679	680	681	682	683	…	717	718	719	MBR9
720	721	722	…	756	757	758	759	760	761	762	763	…	797	798	799	MBR10
800	801	802	…	836	837	838	839	840	841	842	843	…	877	878	879	MBR11
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	MBR12~16
1280			…									…			1359	MBR17
1360			…									…			1439	MBR18
1440			…									…			1519	MBR19
1520			…									…			1599	MBR20
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	MBR21~25
2000			…									…			2079	MBR26
2080			…									…			2159	MBR27
2160			…									…			2239	MBR28
2240			…									…			2319	MBR29
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	MBR30~34
2720			…									…			2799	MBR35
2800			…									…			2879	MBR36
2880			…									…			2959	MBR37
2960			…									…			3039	MBR38
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	MBR39~43
3440			…									…			3519	MBR44
3520			…									…			3599	MBR45
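
The numbering used in Table 1 follows directly from the raster scan rule. As a small sketch (the helper names `mb_grid`, `mb_number` and `mb_position` are illustrative, and rows and columns are counted from 0 here, so macroblock row label MBRk corresponds to row index k - 1):

```python
MB_SIZE = 16  # macroblocks are 16 x 16 pixels

def mb_grid(width, height):
    """Return (columns, rows) of macroblocks for a frame of the given size."""
    return width // MB_SIZE, height // MB_SIZE

def mb_number(row, col, cols):
    """Raster-scan macroblock number of the cell at (row, col), both 0-based."""
    return row * cols + col

def mb_position(number, cols):
    """Inverse mapping: macroblock number to (row, col), both 0-based."""
    return divmod(number, cols)

cols, rows = mb_grid(1280, 720)          # 80 columns, 45 rows
first_of_mbr9 = mb_number(8, 0, cols)    # MBR9 is row index 8
last_position = mb_position(3599, cols)  # last macroblock of the frame
```

For a 720p frame this gives 80 × 45 = 3600 macroblocks, with MBR9 starting at macroblock 640 and macroblock 3599 sitting in the last column of MBR45.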

Taking the frame in Table 1, consider dividing it into slice groups by foreground and background. Suppose there is only one foreground slice group of 216 macroblocks (8 macroblock columns by 27 macroblock rows); all remaining macroblocks belong to the background slice group. In practice, in video conferencing the person (or face) in view is often treated as the region of greatest interest, and this region is made a foreground slice group and encoded separately. The two slice groups are further divided into 6 sub-slice-groups in total: the background slice group into 5 sub-slice-groups and the foreground slice group into one.

As shown in Table 2, the resulting sub-slice-groups may be as follows:

Sub-slice-group 1: macroblocks 0 to 719;

Sub-slice-group 2: macroblocks 720 to 1439, excluding the foreground part (the bold, double-underlined cells);

Sub-slice-group 3: macroblocks 1440 to 2159, excluding the foreground part (the bold, double-underlined cells);

Sub-slice-group 4: macroblocks 2160 to 2879, excluding the foreground part (the bold, double-underlined cells);

Sub-slice-group 5: macroblocks 2880 to 3599;

Sub-slice-group 6: all 216 macroblocks of the foreground slice group (the bold, double-underlined cells).
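
This foreground and background division can be expressed as a per-macroblock map. The sketch below is illustrative only: the text gives the foreground size (8 columns by 27 rows) but its exact position is shown only in Table 2, so the position used here (macroblock columns 36 to 43, macroblock rows MBR10 to MBR36) is an assumption, as is the function name `sub_slice_group`.

```python
COLS, ROWS = 80, 45
FG_ROWS = range(9, 36)    # assumed: MBR10..MBR36 (0-based rows 9..35), 27 rows
FG_COLS = range(36, 44)   # assumed: columns 36..43, 8 columns

def sub_slice_group(mb):
    """Return the sub-slice-group (1..6) of a macroblock number."""
    row, col = divmod(mb, COLS)
    if row in FG_ROWS and col in FG_COLS:
        return 6              # foreground slice group, one sub-slice-group
    return row // 9 + 1       # background: 9 macroblock rows per sub-slice-group

counts = {}
for mb in range(COLS * ROWS):
    group = sub_slice_group(mb)
    counts[group] = counts.get(group, 0) + 1
```

Under this assumed position, sub-slice-groups 1 and 5 hold 720 macroblocks each, sub-slice-groups 2 to 4 hold 648 each, and sub-slice-group 6 holds the 216 foreground macroblocks, so every macroblock belongs to exactly one sub-slice-group.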

This sub-slice-group division applies where FMO is supported, as in the H.264 Baseline and Extended profiles.

Table 2

(Table 2 is reproduced as an image in the original publication.)

Again taking the frame in Table 1, Table 3 shows a division into slice groups in explicit mode: the frame is divided into 3 slice groups, which are further divided into 5 sub-slice-groups in total.

Slice group 1 is the italic cells in the upper-left corner, 1600 macroblocks in total (40 macroblock columns by 40 macroblock rows). It is further divided into an upper and a lower sub-slice-group (sub-slice-groups 1 and 2) of 800 macroblocks each (40 macroblock columns by 20 macroblock rows).

Slice group 2 is the bold, underlined cells in the upper-right corner, 1600 macroblocks in total (40 macroblock columns by 40 macroblock rows). It is further divided into an upper and a lower sub-slice-group (sub-slice-groups 3 and 4) of 800 macroblocks each (40 macroblock columns by 20 macroblock rows).

Slice group 3 is the bottom 5 whole macroblock rows, 400 macroblocks in total (80 macroblock columns by 5 macroblock rows). The whole slice group forms a single sub-slice-group (sub-slice-group 5).

This division applies where FMO is supported, as in the H.264 Baseline and Extended profiles.

Table 3

0	1	2	…	36	37	38	39	40	41	42	43	…	77	78	79	MBR1
80	81	82	…	116	117	118	119	120	121	122	123	…	157	158	159	MBR2
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	MBR3~7
560	561	562	…	596	597	598	599	600	601	602	603	…	637	638	639	MBR8
640	641	642	…	676	677	678	679	680	681	682	683	…	717	718	719	MBR9
720	721	722	…	756	757	758	759	760	761	762	763	…	797	798	799	MBR10
800	801	802	…	836	837	838	839	840	841	842	843	…	877	878	879	MBR11
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	MBR12~16
1280			…									…			1359	MBR17
1360			…									…			1439	MBR18
1440			…									…			1519	MBR19
1520			…									…			1599	MBR20
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	MBR21~25
2000			…									…			2079	MBR26
2080			…									…			2159	MBR27
2160			…									…			2239	MBR28
2240			…									…			2319	MBR29
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	MBR30~34
2720			…									…			2799	MBR35
2800			…									…			2879	MBR36
2880			…									…			2959	MBR37
2960			…									…			3039	MBR38
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	MBR39~43
3440			…									…			3519	MBR44
3520			…									…			3599	MBR45

As a further example, divide the frame into slice groups by raster scan; the result is shown in Table 4. The frame forms 1 slice group, which is further divided into 5 sub-slice-groups of 9 whole macroblock rows, or 720 macroblocks, each.

Sub-slice-group 1: macroblocks 0 to 719;

Sub-slice-group 2: macroblocks 720 to 1439;

Sub-slice-group 3: macroblocks 1440 to 2159;

Sub-slice-group 4: macroblocks 2160 to 2879;

Sub-slice-group 5: macroblocks 2880 to 3599.

This division scheme applies to the H.264, VC-1, AVS1.0-P2, H.263, MPEG2 and MPEG4-Part2 standards and all of their profiles.

Table 4

(Table 4 is reproduced as an image in the original publication.)

The reference data exchange of Step 24 is illustrated as follows:

Take the sub-slice-group division of Table 4, with 720p video and a motion estimation search range of [-64, +63] horizontally and [-32, +31] vertically. Suppose encoder 3 handles sub-slice-group 3. The reference data it needs is the region of macroblocks 1280 to 2319 in the previous reference frame or frames.

With motion estimation over one reference frame, if the sub-slice-group encoded by encoder 3 is the same in the current frame and the previous frame, namely macroblocks 1440 to 2159, then the region of macroblocks 1440 to 2159 within the required reference area is the sub-slice-group that this encoder reconstructed for the previous frame. The only reference data that must be read in from outside encoder 3 is the following two items:

macroblocks 1280 to 1439, from reconstructed sub-slice-group 2 (output by encoder 2); and

macroblocks 2160 to 2319, from reconstructed sub-slice-group 4 (output by encoder 4).

Likewise, with motion estimation over one reference frame, if the sub-slice-group encoded by encoder 2 is the same in the current frame and the previous frame, namely macroblocks 720 to 1439, then the region of macroblocks 720 to 1439 within the required reference area is the sub-slice-group that this encoder reconstructed for the previous frame. The only reference data that must be read in from outside encoder 2 is the following two items:

macroblocks 1440 to 1599, from reconstructed sub-slice-group 3 (output by encoder 3); and

macroblocks 560 to 719, from reconstructed sub-slice-group 1 (output by encoder 1).

Here, encoders 2 and 3, whose current sub-slice-groups are adjacent, pass reconstructed sub-slice-group data to each other as reference data for encoding.

If multi-reference-frame motion estimation is used, the required parts of several reconstructed sub-slice-groups must be loaded from other encoders.
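
The macroblock ranges in this example follow from the vertical search range alone, because each sub-slice-group spans the full frame width. A minimal sketch, assuming one reference frame, an unchanged division, full-width sub-slice-groups, and no extra margin for sub-pixel interpolation (the function names `reference_rows` and `to_exchange` are illustrative; rows are 0-based and inclusive):

```python
MB_SIZE = 16

def reference_rows(first_row, last_row, total_rows, v_min=-32, v_max=31):
    """Macroblock rows (inclusive) covered by the vertical search range."""
    above = -(v_min // MB_SIZE)      # rows needed above: ceil(32 / 16) = 2
    below = -(-v_max // MB_SIZE)     # rows needed below: ceil(31 / 16) = 2
    top = max(0, first_row - above)
    bottom = min(total_rows - 1, last_row + below)
    return top, bottom

def to_exchange(first_row, last_row, total_rows, cols=80):
    """Macroblock ranges (inclusive) that must come from other encoders."""
    top, bottom = reference_rows(first_row, last_row, total_rows)
    needed = []
    if top < first_row:
        needed.append((top * cols, first_row * cols - 1))
    if bottom > last_row:
        needed.append(((last_row + 1) * cols, (bottom + 1) * cols - 1))
    return needed

encoder3 = to_exchange(18, 26, 45)   # sub-slice-group 3: macroblocks 1440..2159
encoder2 = to_exchange(9, 17, 45)    # sub-slice-group 2: macroblocks 720..1439
encoder1 = to_exchange(0, 8, 45)     # top of the frame: nothing above to fetch
```

For encoder 3 this yields macroblocks 1280 to 1439 and 2160 to 2319, two macroblock rows on each side, while the encoder at the top of the frame only needs the two rows below its own region.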

The method provided by the present invention is applicable to various video formats, including high-definition and standard-definition video, whether progressive or interlaced. For interlaced video, the two corresponding macroblocks of the top field and bottom field of an interlaced frame form a macroblock pair, and a macroblock pair must be assigned to the same slice group, sub-slice-group and slice for encoding.

Based on the preceding examples of slice group and sub-slice-group division, application examples of dividing a sub-slice-group into slices during encoding are described below:

Application example one

Table 5 shows an example of dividing a sub-slice-group into slices. In Table 5, sub-slice-group 1 is divided into 7 slices during encoding, numbered slice 1 to 7 in raster scan order; slices 1 and 2 are two macroblock rows each, and slices 3 to 7 are one macroblock row each.

Table 5

0	1	2	…	36	37	38	39	40	41	42	43	…	77	78	79	MBR1
80	81	82	…	116	117	118	119	120	121	122	123	…	157	158	159	MBR2
160	161	162	…	196	197	198	199	200	201	202	203	…	237	238	239	MBR3
240	241	242	…	276	277	278	279	280	281	282	283	…	317	318	319	MBR4
320	321	322	…	356	357	358	359	360	361	362	363	…	397	398	399	MBR5
400	401	402	…	436	437	438	439	440	441	442	443	…	477	478	479	MBR6
480	481	482	…	516	517	518	519	520	521	522	523	…	557	558	559	MBR7
560	561	562	…	596	597	598	599	600	601	602	603	…	637	638	639	MBR8
640	641	642	…	676	677	678	679	680	681	682	683	…	717	718	719	MBR9

Application example two

A second example of slice division is shown in Table 6. In Table 6, sub-slice-group 1 is divided into 5 slices during encoding, numbered slice 1 to 5 in raster scan order. The macroblocks in each slice are as follows:

Slice 1: macroblocks 0 to 123;

Slice 2: macroblocks 124 to 237;

Slice 3: macroblocks 238 to 319;

Slice 4: macroblocks 320 to 521;

Slice 5: macroblocks 522 to 719.

Table 6

0	1	2	…	36	37	38	39	40	41	42	43	44	…	77	78	79	MBR1
80	81	82	…	116	117	118	119	120	121	122	123	124	…	157	158	159	MBR2
160	161	162	…	196	197	198	199	200	201	202	203	204	…	237	238	239	MBR3
240	241	242	…	276	277	278	279	280	281	282	283	284	…	317	318	319	MBR4
320	321	322	…	356	357	358	359	360	361	362	363	364	…	397	398	399	MBR5
400	401	402	…	436	437	438	439	440	441	442	443	444	…	477	478	479	MBR6
480	481	482	…	516	517	518	519	520	521	522	523	524	…	557	558	559	MBR7
560	561	562	…	596	597	598	599	600	601	602	603	604	…	637	638	639	MBR8
640	641	642	…	676	677	678	679	680	681	682	683	684	…	717	718	719	MBR9
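
The rule that each macroblock of a sub-slice-group belongs to exactly one slice can be checked mechanically. The sketch below lists the macroblock ranges of the two application examples in raster scan order and verifies that they tile macroblocks 0 to 719 without gaps or overlaps; the helper name `is_partition` is illustrative.

```python
def is_partition(ranges, first, last):
    """True if the inclusive (lo, hi) ranges, taken in order, cover
    first..last exactly once with no gap and no overlap."""
    expected = first
    for lo, hi in ranges:
        if lo != expected or hi < lo:
            return False
        expected = hi + 1
    return expected == last + 1

# Application example one: two slices of two macroblock rows, then five of one row.
example_one = [(0, 159), (160, 319), (320, 399), (400, 479),
               (480, 559), (560, 639), (640, 719)]
# Application example two: slice boundaries that do not fall on row boundaries.
example_two = [(0, 123), (124, 237), (238, 319), (320, 521), (522, 719)]

ok_one = is_partition(example_one, 0, 719)
ok_two = is_partition(example_two, 0, 719)
sizes_two = [hi - lo + 1 for lo, hi in example_two]
```

Both examples pass the check; in the second one the slices hold 124, 114, 82, 202 and 198 macroblocks, which again sums to the 720 macroblocks of sub-slice-group 1.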

The present invention also includes a parallel video coding device, the structure of which is shown in Figure 3. It comprises a primary processor and a plurality of encoders. The primary processor divides the current frame to be encoded into sub-slice-groups and passes them to the encoders. The encoders encode the sub-slice-groups in parallel and output their coded bitstreams to the primary processor, which generates and outputs the sequence bitstream.

(1) Primary processor

The primary processor further comprises a slice group determining unit, a sub-slice-group determining unit, a sub-slice-group data transfer unit and a top-level coding unit, as follows:

The slice group determining unit receives the digital video sequence data from the data transfer unit, divides the current frame of the video sequence into macroblocks in a fixed manner, and assigns all macroblocks to one or more slice groups according to a predetermined rule;

The sub-slice-group determining unit divides every slice group of the current frame, in raster scan order, into one or more sub-slice-groups; each macroblock must be assigned to exactly one sub-slice-group;

The sub-slice-group data transfer unit configures and controls the encoders to read in the sub-slice-group data according to the correspondence between sub-slice-groups and encoders. Each encoder reads in one or more sub-slice-groups, and all data to be encoded in a given sub-slice-group is delivered to the same encoder. Bus signal formats that may be used for transferring sub-slice-group data include, but are not limited to, those of: a parallel interface, a high-speed serial interface, BT656 or CCIR601 digital video signals, HD-SDI signals, the AHB or AXI bus of the AMBA bus specification, or an Ethernet interface;

The top-level coding unit collects the bitstream and related parameters output by each encoder, generates the slice group and frame bitstreams, and merges them into the sequence bitstream for output.

In the primary processor, the entire frame may specifically be treated as one slice group and divided into a plurality of sub-slice-groups, each of which is one or more contiguous whole macroblock rows whose size and position in the picture are fixed and whose width equals the frame width.

In addition, the primary processor may first preprocess the received current frame, including: two-dimensional denoising, and/or scaling, and/or digital video 4:4:4 to 4:2:2 format conversion, and/or digital video 4:2:2 to 4:2:0 format conversion, and so on.

Furthermore, the primary processor of the present invention may specifically be a computer system comprising: a CPU; memory; an interface to the data transfer unit (an interface to the data transfer unit and an external communication network input interface); an output interface for outputting the coded bitstream; and a configuration and control information interface for receiving external configuration and control information.

(2) Encoder. The structure of each encoder may specifically comprise:

(1) a sub-slice-group receiving unit, for receiving sub-slice-groups and coding configuration parameters;

(2) a reference data input/output unit, for exchanging reference data among the encoders. Specifically, when the current frame is not an I frame, each encoder obtains the reference data it needs from other encoders through the reference data exchange unit provided in the data transfer unit; that is, the encoders are controlled to exchange the reconstructed sub-slice-group data of the reference frame and to update the motion estimation search area reference data cached in each encoder for each sub-slice-group;

(3) a coding unit, for encoding all sub-slice-groups of the current frame in parallel: each encoder encodes its assigned sub-slice-groups, dividing each sub-slice-group in raster scan order into one or more slices during encoding, with each macroblock assigned to exactly one slice; each encoder encodes all macroblocks of its assigned sub-slice-groups, produces the reconstructed sub-slice-groups, and produces and outputs the bitstream and related parameters. Each slice is preferably one or more contiguous whole macroblock rows whose width equals the frame width.

Each of the plurality of encoders of the present invention may be:

a general-purpose processor, including a processor with a superscalar architecture or a very long instruction word digital signal processor (VLIW DSP); or

a field-programmable gate array chip (FPGA); or

a custom very-large-scale integration chip (VLSI); or

a processor with a configurable instruction set.

Each encoder is also configured with data random access memory (RAM), that is, a storage unit used to buffer information such as the current and reconstructed sub-slice-group data. When the required data RAM capacity is small, large-capacity on-chip RAM can be used; when the required capacity is large, off-chip synchronous static random-access memory (SSRAM) or synchronous dynamic random-access memory (SDRAM) can be configured.

Further, to obtain better parallel processing when coding under the H.264, VC-1 or AVS1.0-P2 standards, loop filtering in the encoders is performed as follows:

The boundary loop filter mode of all bands of the current frame is set to no filtering; each encoder completes its loop filtering independently, and no data is exchanged between encoders or between bands;

Specifically, the loop filtering of a band starts after the first macroblock of that band has been reconstructed, so that macroblock reconstruction and loop filtering form a macroblock-level pipeline; alternatively, the loop filtering of a band starts after the whole band has been reconstructed. Each encoder carries out the loop filtering of each band independently, and no data needs to be exchanged between encoders or between bands.
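The first option, a macroblock-level pipeline within one band, can be pictured with the following sketch. It is only a scheduling illustration under the assumption that the filter stage lags the reconstruction stage by a fixed number of macroblocks; the function name and the `lag` parameter are hypothetical, and the real dependency between the two stages is dictated by the coding standard in use.

```python
from typing import List, Optional, Tuple

# (time step, macroblock being reconstructed, macroblock being filtered);
# None means that stage is idle at that step.
Step = Tuple[int, Optional[int], Optional[int]]


def pipeline_schedule(num_macroblocks: int, lag: int = 1) -> List[Step]:
    """Build a two-stage schedule for one band.

    Filtering starts once the first macroblock has been reconstructed
    and then trails reconstruction by `lag` macroblocks.
    """
    if num_macroblocks <= 0 or lag <= 0:
        raise ValueError("num_macroblocks and lag must be positive")
    schedule: List[Step] = []
    for step in range(num_macroblocks + lag):
        recon = step if step < num_macroblocks else None
        filt = step - lag if step >= lag else None
        schedule.append((step, recon, filt))
    return schedule
```

With three macroblocks and a lag of one, the schedule has four steps: only reconstruction runs in the first step, both stages run in the middle steps, and only filtering runs in the last step. The second option corresponds to running all reconstruction steps first and all filtering steps afterwards.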

During coding by the encoders, the corresponding reference data exchange process is as follows:

When the current frame is not an I frame, reference-frame sub-slice-group data is exchanged between the encoders. The minimum reference data to be exchanged between the encoders is determined from the overlap, in image area and in memory, between each encoder's current-frame sub-slice-groups and the reconstructed-frame sub-slice-groups, taking into account the maximum motion estimation search area and the reference frames of the current-frame sub-slice-groups; the motion estimation search area reference data buffered in each encoder for each sub-slice-group is then updated. Motion estimation may use a fixed-size search window and a single reference frame, in which case the minimum reference data to be exchanged between the encoders is determined from the search window size, and the motion estimation search area reference data buffered in each encoder is updated accordingly.
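For the fixed search window, single reference frame case, the determination of the minimum exchange can be sketched as follows. The sketch assumes that every sub-slice-group is a range of whole macroblock rows, that each encoder holds exactly one range per frame, and that the vertical search range is expressed in macroblock rows; the function name, the data layout and these simplifications are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

RowRange = Tuple[int, int]  # (first macroblock row, one past the last row)


def exchange_plan(
    current: List[RowRange],
    reconstructed: List[RowRange],
    search_rows: int,
    frame_rows: int,
) -> Dict[Tuple[int, int], RowRange]:
    """Return the reference rows each encoder must fetch from the others.

    current[i] is the row range encoder i codes in the current frame;
    reconstructed[i] is the row range of the reference frame that
    encoder i already holds. The result maps (source, destination)
    encoder indices to the row range to transfer.
    """
    plan: Dict[Tuple[int, int], RowRange] = {}
    for dst, (c0, c1) in enumerate(current):
        if c0 >= c1:
            continue  # this encoder has nothing to code
        # Rows the motion search may touch, clipped to the frame.
        need0 = max(0, c0 - search_rows)
        need1 = min(frame_rows, c1 + search_rows)
        for src, (r0, r1) in enumerate(reconstructed):
            if src == dst:
                continue  # already in the encoder's own memory
            lo, hi = max(need0, r0), min(need1, r1)
            if lo < hi:
                plan[(src, dst)] = (lo, hi)
    return plan
```

With two encoders each holding four of eight rows and a search range of one row, each encoder fetches only the single boundary row of its neighbour. If the division changes between frames, for example the first encoder takes five rows in the current frame, the plan grows on that side and shrinks on the other, which is the overlap relationship the paragraph above refers to.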

It should be noted that, in the course of the encoders' coding operation, the current frame usually also needs to be preprocessed before video coding, including two-dimensional denoising, scaling and format conversion; format conversion refers to converting the input digital video format into the format required by the encoder. The common coded digital video format is 4:2:0, and the common input video format is 4:2:2 as specified by BT656 or CCIR601. H.264 high-fidelity extended coding supports 4:2:2 and 4:4:4, in which case the input digital video is 4:2:2 or 4:4:4. The physical interface for HD video is often HD-SDI.

The current-frame preprocessing required by the plurality of encoders may take one of the following two forms:

1. The primary processor receives the current frame, preprocesses and buffers it centrally, and then distributes it to the plurality of encoders; this scheme requires a primary processor with stronger processing capability and a larger frame memory capacity;

2. The plurality of encoders each preprocess the sub-slice-groups assigned to them.
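As an illustration of the 4:2:2 to 4:2:0 format conversion listed among the preprocessing steps, the following sketch halves the vertical resolution of one chroma plane by averaging each pair of vertically adjacent rows. The two-tap average is an assumption chosen for brevity; the source does not specify a filter, and a practical converter would typically use a longer filter and account for chroma sample positions and interlaced content. The function name is hypothetical.

```python
from typing import List


def chroma_422_to_420(plane: List[List[int]]) -> List[List[int]]:
    """Vertically downsample one chroma plane by a factor of two.

    plane is a list of rows of 8-bit samples with an even number of
    rows. Each output sample is the rounded average of the two input
    samples above one another. The luma plane is left unchanged by
    this conversion.
    """
    if len(plane) % 2 != 0:
        raise ValueError("chroma plane must have an even number of rows")
    out: List[List[int]] = []
    for top, bottom in zip(plane[0::2], plane[1::2]):
        if len(top) != len(bottom):
            raise ValueError("all rows must have the same width")
        out.append([(a + b + 1) // 2 for a, b in zip(top, bottom)])
    return out
```

Because each output row depends only on two input rows, this step can run either centrally on the primary processor or separately in each encoder on its own sub-slice-group, matching the two forms listed above.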

The primary processor of the device of the present invention may also be provided with a control unit, used to initialize, configure and control the other processing units of the primary processor and the plurality of encoders so as to complete the coding of the whole video sequence.

In the device of the present invention, to simplify the startup and configuration circuitry of each encoder and reduce cost, the startup and configuration programs or data of the encoders may come from the primary processor. Where an encoder is an FPGA, the primary processor may configure the FPGA through a parallel or serial configuration interface; where an encoder is a general-purpose processor, a parallel or serial port may be selected for booting (Boot), and the BootROM program used during startup comes from the primary processor.

The device of the present invention is typically an embedded computer system made up of one or more circuit boards; alternatively, it may be a system of personal computers or servers linked together by a communication network, in which each encoder may be a PC or a server.

The method and device of the present invention are suitable for multiple coding standards and their respective profiles.

The device provided by the present invention is applicable to various video formats, including HD video and SD video, and the video may be progressive or interlaced. For interlaced video, the two corresponding macroblocks of the top field and the bottom field of an interlaced frame form a macroblock pair, and a macroblock pair must be assigned to the same slice-group, sub-slice-group and band for coding.
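The macroblock-pair constraint can be illustrated with a short sketch. It assumes that sub-slice-groups are whole macroblock rows, that a macroblock pair occupies two vertically adjacent macroblock rows (rows 2k and 2k+1), and that rows are divided evenly rather than by predicted load; all three points, and the function name and `row_unit` parameter, are assumptions for illustration.

```python
from typing import List, Tuple


def divide_rows(frame_rows: int, num_encoders: int, row_unit: int = 1) -> List[Tuple[int, int]]:
    """Divide macroblock rows into contiguous ranges, one per encoder.

    Every boundary is a multiple of row_unit, so with row_unit=2 a
    macroblock pair is never split across two sub-slice-groups.
    Ranges are (first row, one past the last row).
    """
    if num_encoders <= 0 or row_unit <= 0:
        raise ValueError("num_encoders and row_unit must be positive")
    if frame_rows % row_unit != 0:
        raise ValueError("frame_rows must be a multiple of row_unit")
    units = frame_rows // row_unit
    base, extra = divmod(units, num_encoders)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for i in range(num_encoders):
        count = base + (1 if i < extra else 0)  # spread the remainder
        stop = start + count * row_unit
        ranges.append((start, stop))
        start = stop
    return ranges
```

For a frame of 68 macroblock rows and four encoders, `row_unit=1` places the first boundary at row 17, which would separate rows 16 and 17 of one pair, whereas `row_unit=2` places every boundary on an even row.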

The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the protection scope of the claims.

Claims

1. A parallel video coding method, characterized by comprising:

sending all sub-slice-groups of the current frame and the coding configuration parameters to the respective encoders according to the correspondence between sub-slice-groups and encoders;

when the current frame is not an I frame, exchanging the reconstructed reference-frame sub-slice-group data between the encoders, and updating the motion estimation search area reference data buffered in each encoder for each sub-slice-group;

coding the sub-slice-groups of the current frame in parallel by the encoders, wherein, during coding, each sub-slice-group is divided into one or more bands in raster-scan order, all macroblocks in the sub-slice-groups assigned to each encoder are coded, the reconstructed sub-slice-groups are produced, and the coded code stream and parameters are output;

2. The method according to claim 1, characterized in that the predetermined rule comprises:

an interleaved pattern, a dispersed pattern, a foreground-and-background pattern, a box-out scan pattern, a raster scan pattern, a wipe scan pattern, and an explicit pattern that uses numbering to indicate the slice-group to which each macroblock belongs; or, dividing the current frame into a single slice-group.

3. The method according to claim 1, characterized in that the sub-slice-group is one or more consecutive complete macroblock rows whose width equals the frame width.

4. The method according to any one of claims 1 to 3, characterized in that the processing of assigning the sub-slice-groups to the encoders in a load-balanced manner according to their corresponding processing loads comprises:

5. The method according to claim 1, characterized in that, in the course of coding a group of pictures, the processing of dividing into a plurality of sub-slice-groups comprises:

for any non-I frame from the second non-I frame onward, predicting the coding load of each sub-slice-group of that non-I frame from the coding workload of each sub-slice-group of the frame or frames preceding it, and, taking into account the cost of loading reference frame data, adjusting the division of the slice-group of that non-I frame so that the processing times of the plurality of encoders are equal when that non-I frame is coded in parallel.

6. The method according to claim 1, characterized in that the band is one or more consecutive complete macroblock rows whose width equals the frame width.

7. The method according to claim 1, characterized in that the coding process performed by each encoder on each macroblock in a sub-slice-group comprises:

rate control;

integer transform and quantization;

inverse quantization and inverse transform;

reconstruction;

loop filtering.

8. The method according to claim 7, characterized in that the coding process performed by each encoder on each macroblock in a sub-slice-group further comprises:

setting the boundary loop filter mode of all bands of the current frame to no filtering, each encoder completing the loop filtering of each band independently, with no information exchanged between encoders or between bands.

9. The method according to claim 8, characterized in that completing the loop filtering independently comprises:

the loop filtering of a band starts after the first macroblock of that band has been reconstructed, or the loop filtering of a band starts after the whole band has been reconstructed.

10. A parallel video coding device, characterized by comprising a primary processor and a plurality of encoders, wherein the primary processor is used to divide the current frame to be coded into sub-slice-groups and pass them to the plurality of encoders respectively; the plurality of encoders code the sub-slice-groups in parallel and output their respective coded code streams to the primary processor; and the primary processor generates and outputs the sequence code stream;

11. The device according to claim 10, characterized in that the processing of the primary processor comprises:

dividing the entire frame into one slice-group and further dividing it into a plurality of sub-slice-groups, each sub-slice-group being one or more consecutive complete macroblock rows whose width equals the frame width.

12. The device according to claim 10, characterized in that each encoder of the plurality of encoders further comprises a data storage unit, used to buffer the current and reconstructed sub-slice-group data.

13. The device according to claim 10, characterized in that the band is one or more consecutive complete macroblock rows whose width equals the frame width, and any given macroblock can be assigned to only one band.

14. The device according to claim 10, characterized in that the primary processor of the device further comprises a control unit, used to initialize, configure and control the primary processor and the plurality of encoders so as to complete the coding of the whole video sequence.

15. The device according to any one of claims 10 to 13, characterized in that, during coding, each encoder of the plurality of encoders further comprises:

16. The device according to any one of claims 10 to 13, characterized in that the plurality of encoders perform only macroblock-level entropy coding and output macroblock code streams and parameters, and the top-level coding module of the primary processor performs the slice-level entropy coding.