
CN101150719B - Parallel video coding method and device - Google Patents

Info

Publication number
CN101150719B
Authority
CN
China
Prior art keywords
slice
group
sub
frame
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 200610113256
Other languages
Chinese (zh)
Other versions
CN101150719A (en)
Inventor
孟新建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN200610113256A
Publication of CN101150719A
Application granted
Publication of CN101150719B
Legal status: Active
Anticipated expiration

Landscapes

  • Compression or Coding Systems of TV Signals (AREA)

Abstract

The invention relates to a parallel video coding method and device. The device comprises a host processor and a plurality of encoders. The host processor divides the current frame of a video sequence into macroblocks and assigns all macroblocks to one or more slice groups according to a predetermined rule; following the principle of balancing the processing load among the encoders, it divides the slice groups of the current frame into one or more sub-slice groups in raster-scan order; it then sends all sub-slice groups of the current frame, together with the coding configuration parameters, to the encoders, which encode the sub-slice groups of the current frame in parallel and output their respective bitstreams and parameters; finally, the host processor gathers the bitstreams and parameters output by each encoder, completes the coding of sub-slice groups, frames and the sequence, and outputs the bitstream of the whole sequence. The method and device of the invention are therefore suitable for real-time high-definition video coding.

Description

Parallel video coding method and device
Technical field
The present invention relates to the technical field of video coding, and in particular to parallel video coding technology.
Background technology
Video coding technology compresses digital video information so that it can be transmitted and stored more efficiently. Video coding is the core technology of multimedia data processing.
At present, the main video compression coding standards include: the H.261 and H.263 moving picture standards formulated by the Video Coding Experts Group (VCEG) of the ITU Telecommunication Standardization Sector (ITU-T); the MPEG-1 and MPEG-4 Part 2 video coding standards formulated by the Moving Picture Experts Group (MPEG) of ISO/IEC; the MPEG-2/H.262 and H.264/AVC (MPEG-4 Part 10) standards formulated jointly by ITU-T VCEG and ISO/IEC MPEG in the Joint Video Team (JVT); and, in addition, VC-1 (formerly WMV-9) and the AVS1.0-P2 standard formulated by the Audio Video coding Standard (AVS) workgroup.
The basic framework of the MPEG, JVT and VCEG families of video coding standards is shown in Fig. 1: a hybrid coding framework of block-based motion compensation and transform coding, comprising intra prediction, inter prediction, transform, quantization, entropy coding and so on. Inter prediction uses block-based motion vectors to remove the redundancy between pictures; intra prediction uses spatial prediction modes to remove the redundancy within a picture. The visual redundancy within a picture is further removed by transforming and quantizing the prediction residual. Finally, the motion vectors, prediction modes, quantization parameters and transform coefficients are compressed by entropy coding. The basic processing unit of the video coding process is the macroblock; a macroblock generally comprises one 16×16 block of luma samples and the corresponding chroma sample blocks.
The coding tools used by different standards differ somewhat. In the new-generation standards H.264/AVC (MPEG-4 Part 10), VC-1 and AVS1.0-P2, the deblocking filter is a mandatory module and is referred to as loop filtering; in the MPEG-2, H.263 and MPEG-4 Part 2 standards, the deblocking filter is only an optional post-processing step in the decoder.
A real-time video encoder takes a high-definition video signal as input, performs video compression coding in real time and outputs the bitstream. Real-time video encoders are basic equipment of digital television headend systems and are also widely used in video conferencing, digital video cameras, recordable DVD players and similar equipment.
H.264/AVC (MPEG-4 Part 10), VC-1 and AVS1.0-P2 are referred to as new-generation video compression coding standards. Compared with the previous generation represented by MPEG-2, the new-generation standards roughly double the compression ratio, but their complexity increases by more than a factor of two and they are far more difficult to implement.
High-definition television (HDTV) generally refers to moving images with 720 progressive or 1080 interlaced scan lines per frame and above. Common high-definition formats at present are: 720p (resolution 1280×720, frame rates 24, 30, 60), 1080i (interlaced, field rate 60, resolution 1920×1088 per frame) and 1080p (resolution 1920×1088, frame rates 24, 30). Even higher resolutions will also come into use in the future. High-definition video provides higher video quality, but at the same time high-definition video compression coding is harder to implement.
Taking the new-generation standard H.264/AVC as an example: the introduction of multi-reference-frame motion compensation, variable block-size prediction down to 4×4, rich intra prediction modes, loop filtering, and context-adaptive variable-length coding (CAVLC) or context-adaptive binary arithmetic coding (CABAC) greatly increases encoder complexity. According to one assessment, with full-search motion estimation the computational complexity of an H.264/AVC HDTV 720p encoder is about 3600 giga-instructions per second (GIPS) and its memory access bandwidth is about 5570 gigabytes per second (GBytes/s); the figures are even larger for 1080i and 1080p.
Because of the huge computational complexity of a high-definition video encoder, real-time coding is usually difficult to achieve with a single processor. Especially in application scenarios such as the headend, where multi-channel coding, various video formats and compression coding standards, and transcoding must be supported, realizing the coding with a single encoder is even more difficult. Therefore, a high-definition encoder usually has to be implemented with multiple chips (i.e. a plurality of encoders) carrying out the encoding in parallel.
However, industry does not yet have a parallel video coding scheme that satisfactorily meets the high-performance real-time high-definition coding requirements of headend applications of the new-generation video compression coding standards such as H.264/AVC (MPEG-4 Part 10), VC-1 and AVS1.0-P2.
Summary of the invention
The purpose of the present invention is to provide a parallel video coding method and device, thereby providing an efficient parallel video coding implementation that satisfies high-performance real-time high-definition video coding.
The objective of the invention is achieved through the following technical solutions:
The invention provides a parallel video coding method, the method comprising:
dividing the current frame of a video sequence into macroblocks, and assigning all macroblocks to one or more slice groups according to a predetermined rule;
dividing each slice group into one or more sub-slice groups in raster-scan order, each macroblock being assigned to exactly one sub-slice group in this division;
mapping the sub-slice groups to the encoders so as to balance their processing load, each sub-slice group corresponding to one and the same encoder and each encoder corresponding to one or more sub-slice groups;
sending all sub-slice groups of the current frame and the coding configuration parameters to the plurality of encoders according to the correspondence between sub-slice groups and encoders;
when the current frame is an I frame, each encoder discarding all reconstructed sub-slice-group data;
when the current frame is not an I frame, each encoder obtaining from the reconstructed sub-slice groups of the other encoders the reference data required for the motion estimation of its own sub-slice groups; this comprises: determining the minimum reference data to be exchanged between the encoders according to the overlap, in image area and memory region, between each encoder's current-frame sub-slice groups and the reconstructed sub-slice groups, and according to the maximum search region and multi-reference-frame attributes of the motion estimation of the current-frame sub-slice groups; obtaining said reference data through exchange operations between the encoders; and using said reference data to update the motion-estimation search-region reference data of each sub-slice group cached in each encoder;
encoding the sub-slice groups of the current frame in parallel with the plurality of encoders, wherein during encoding each sub-slice group is divided into one or more slices in raster-scan order, the coding of all macroblocks in the sub-slice group is completed, a reconstructed sub-slice group is produced, and the coded bitstream and parameters are output;
gathering the bitstream and parameters output by each encoder, completing the coding of slice groups, frames and the sequence, and outputting the bitstream of the whole sequence.
The predetermined rule comprises: the interleaved pattern, the dispersed pattern, the foreground-and-background pattern, the box-out pattern, the raster-scan pattern, the wipe pattern, and the explicit pattern in which a number must be used to indicate the slice group of each macroblock; or the current frame is divided into a single slice group.
The sub-slice group is one or more consecutive complete macro-block rows whose width equals the frame width.
Mapping the sub-slice groups to the encoders so as to balance their processing load comprises:
mapping the sub-slice groups contained in the current frame to the encoders according to the processing capability of each encoder and the reference-data transfer cost, so that the plurality of encoders finish encoding their assigned sub-slice groups at the same time.
In the coding of a group of pictures, the division into a plurality of sub-slice groups comprises:
for an I frame, predicting the coding load from the number of macroblocks, and mapping the sub-slice groups to the encoders according to the processing capability of the encoders and said coding load, so that the plurality of encoders finish encoding their assigned sub-slice groups at the same time;
for the first non-I frame, when the slice-group division is unchanged, performing the sub-slice-group division in the same way as for the I frame;
for the second non-I frame, predicting the coding load of each of its sub-slice groups from the actually measured coding workload of each sub-slice group of the first non-I frame, and adjusting the sub-slice-group division of the second non-I frame according to the reference-data loading cost, so that the processing times of the encoders are equal when the second non-I frame is encoded in parallel;
for any non-I frame after the second, predicting the coding load of each of its sub-slice groups from the coding workload of each sub-slice group of the previous frame or frames, and adjusting the sub-slice-group division of that frame according to the reference-data loading cost, so that the processing times of the encoders are equal when that frame is encoded in parallel.
The slice is one or more consecutive complete macro-block rows whose width equals the frame width.
The coding carried out by each encoder on each macroblock of a sub-slice group comprises:
motion estimation using a plurality of backward or forward reference frames, and motion compensation;
intra prediction selection, i.e. during intra prediction of the current macroblock, using the reconstructed but not yet loop-filtered left and upper macroblock data of the same slice; intra/inter selection; residual calculation;
rate control;
integer transform and quantization;
reordering and entropy coding, the entropy coding being context-adaptive variable-length coding or context-adaptive binary arithmetic coding;
inverse quantization and inverse transform;
reconstruction;
loop filtering.
The coding carried out by each encoder on each macroblock of a sub-slice group further comprises: setting the boundary loop-filter mode of all slices of the current frame to no filtering, each encoder independently performing the loop filtering of its own slices, with no information exchanged between encoders or between slices.
Independently performing the loop filtering comprises: the loop filtering of a slice starts after the reconstruction of the first macroblock of the slice is finished, or the loop filtering of a slice starts after the reconstruction of the whole slice is finished.
The present invention also provides a parallel video coding device, comprising a host processor and a plurality of encoders; the host processor divides the current frame to be encoded into sub-slice groups and passes them to the plurality of encoders, the encoders encode the sub-slice groups in parallel and output their respective bitstreams to the host processor, and the host processor generates the sequence bitstream;
the host processor comprises a slice-group determining unit, a sub-slice-group determining unit, a sub-slice-group data transfer unit and a top-layer coding unit, wherein:
the slice-group determining unit is used to divide the current frame of the video frame sequence into macroblocks and assign all macroblocks to one or more slice groups according to the predetermined rule;
the sub-slice-group determining unit is used to divide each slice group of the current frame into one or more sub-slice groups in raster-scan order, each macroblock being assigned to exactly one sub-slice group, each encoder corresponding to one or more sub-slice groups, and each sub-slice group corresponding to one encoder;
the sub-slice-group data transfer unit sends all sub-slice groups of the current frame and the coding configuration parameters to the encoders according to the correspondence between sub-slice groups and encoders;
the top-layer coding unit gathers the bitstream and parameters output by each encoder, completes the coding of slice groups, frames and the sequence, and outputs the bitstream of the whole sequence;
the encoder comprises a sub-slice-group receiving unit, a reference-data input/output unit and a coding unit, wherein:
the sub-slice-group receiving unit is used to receive sub-slice groups and coding configuration parameters;
the reference-data input/output unit is used to exchange reference data between the encoders: when the current frame is not an I frame, it controls the exchange of the already reconstructed reference-frame sub-slice-group data between the encoders and updates the motion-estimation search-region reference data of each sub-slice group cached in each encoder;
the coding unit is used to encode the assigned sub-slice groups: during encoding, each sub-slice group is divided into one or more slices in raster-scan order, the coding of all macroblocks in the assigned sub-slice groups is completed, reconstructed sub-slice groups are produced, and the coded bitstream and parameters are output.
The processing of the host processor comprises: dividing the whole frame into one slice group and further dividing it into a plurality of sub-slice groups, all sub-slice groups being one or more consecutive complete macro-block rows whose width equals the frame width.
Each of the plurality of encoders further comprises a data storage unit for caching the current and reconstructed sub-slice-group data.
The slice is one or more consecutive complete macro-block rows whose width equals the frame width, and a given macroblock can only be assigned to a single slice.
The host processor of the device further comprises a control unit for initializing, configuring and controlling the host processor and the plurality of encoders to complete the coding of the whole video sequence.
The encoding carried out by each of the plurality of encoders further comprises:
setting the boundary loop-filter mode of all slices of the current frame to no filtering, each encoder independently performing the loop filtering of its own slices with no information exchanged between encoders or between slices, and the loop filtering of a slice starting after the reconstruction of the first macroblock of the slice is finished, or the loop filtering of a slice starting after the reconstruction of the whole slice is finished.
The plurality of encoders may perform only macroblock-level entropy coding and output the macroblock bitstreams and parameters, with the top-layer coding module of the host processor performing the slice-level entropy coding.
It can be seen from the technical solutions above that the invention provides a parallel video coding scheme, so that a parallel mode of video coding can be adopted in the coding process, which allows the invention to satisfy the processing capability required by high-performance real-time high-definition video coding. At the same time, because the data to be encoded are supplied to the plurality of encoders at a moderate granularity according to the principle of load balancing, and are encoded by the encoders in parallel, the time the processors spend waiting for one another is reduced as much as possible and the overhead of interaction between the encoders is kept as small as possible; the parallel video coding scheme provided by the invention therefore also achieves high coding efficiency.
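For orientation only, the following minimal Python sketch mirrors the pipeline summarized above: a host splits each frame into sub-slice groups, hands them to the encoders in parallel and concatenates the returned bitstreams. All names (split_frame, Encoder.encode, merge_streams) are hypothetical illustrations, not interfaces defined by the patent.

    # Minimal sketch of the host/encoder pipeline (hypothetical names, not the patent's API).
    from concurrent.futures import ThreadPoolExecutor

    def encode_sequence(frames, encoders, split_frame, merge_streams):
        """split_frame(frame, n) -> n batches of sub-slice groups, one per encoder;
        each encoder exposes encode(batch) -> (bitstream, params)."""
        sequence_stream = bytearray()
        with ThreadPoolExecutor(max_workers=len(encoders)) as pool:
            for frame in frames:
                batches = split_frame(frame, len(encoders))         # host: slice-group / sub-slice-group division
                futures = [pool.submit(enc.encode, batch)           # encoders work in parallel
                           for enc, batch in zip(encoders, batches)]
                results = [f.result() for f in futures]             # wait for the slowest encoder (hence load balancing)
                sequence_stream += merge_streams(results)           # host: top-layer coding / stream merging
        return bytes(sequence_stream)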
Description of drawings
Fig. 1 is a schematic diagram of a video coding framework in the prior art;
Fig. 2 is a schematic diagram of a specific implementation process of the method of the present invention;
Fig. 3 is a schematic structural diagram of a specific implementation of the device of the present invention.
Embodiment
In the video coding process, the present invention further divides the slice groups into a plurality of sub-slice groups, each sub-slice group comprising one or more slices, and uses a plurality of encoders to perform parallel encoding on the basis of these sub-slice groups.
For ease of understanding, the concepts of existing coding schemes that are involved in the present invention are first described.
In H.264, the coded content is divided, from high to low in time and space, into levels such as sequence, group of pictures, picture (i.e. frame), field, slice group, slice, macroblock and sub-macroblock.
(1) Field, frame, picture
A field or a frame of video can be used to produce a coded picture. Usually, video frames are of two types: progressive or interlaced. In the interlaced case, two fields (called the top field and the bottom field) form a frame;
Current frame: the frame being encoded;
Reconstructed frame: the frame reconstructed by the encoder through local decoding;
Reference picture (frame): to improve prediction accuracy, H.264 coding can select, from a group of forward or backward coded and reconstructed pictures, one or more pictures that best match the current picture as references for inter coding. In H.264 the best-matching picture can be selected from at most 16 reference pictures.
(2) Macroblock (MB) and sub-macroblock
A coded picture is usually divided into several macroblocks; a macroblock consists of one 16×16 block of luma pixels plus an 8×8 Cb and an 8×8 Cr block of chroma pixels. The macroblock is the basic size unit of video coding. A macroblock can be further divided into blocks of 16×8, 8×16 or 8×8 luma pixels (with the accompanying chroma pixels); an 8×8 sub-macroblock can in turn be divided into sub-macroblock partitions of 8×4, 4×8 or 4×4 luma pixels (with the accompanying chroma pixels). The two corresponding macroblocks of the top field and bottom field of an interlaced frame form a macroblock pair.
According to whether the prediction mode adopted is intra or inter, macroblocks are divided into intra-predicted macroblocks (I macroblocks) and inter-predicted macroblocks (P macroblocks). An I macroblock uses already decoded pixels of the current slice as the reference for intra prediction; a P macroblock uses previously coded pictures as reference pictures for inter prediction.
(3) Slice
Within each picture, a number of macroblocks are arranged in the form of slices. An intra-coded slice (I slice) contains only I macroblocks, while an inter-coded slice (P slice) can contain both P and I macroblocks. The coding of slices is mutually independent: the intra prediction of one slice cannot use macroblocks of other slices as references.
A frame composed entirely of I slices is called an I frame; a frame not composed entirely of I slices is called a non-I frame.
(4) Slice group
In the H.264 standard, the macroblocks of a picture can be divided into a plurality of slice groups through flexible macroblock ordering (FMO); a slice group is a subset of the MBs of a coded picture and can comprise one or several slices. Through the use of slice groups, FMO changes the way a picture is divided into slices and macroblocks and can further improve the error resilience of slices.
The macroblock-to-slice-group map defines which slice group each macroblock belongs to. Using the FMO technique, H.264 defines seven macroblock scan patterns: interleaved, dispersed, foreground and background, box-out, raster scan, wipe, and explicit (the slice group of each macroblock being indicated by a number).
Different standards and profiles differ in their support for FMO. The H.264 Baseline and Extended profiles support the seven FMO scan patterns. The H.264 Main profile, VC-1 and AVS1.0-P2 do not support FMO and have only the raster-scan pattern: there is only one slice group, whose size equals the frame.
(5) Group of pictures (GOP)
A number of consecutive pictures (frames) whose starting frame is an I frame.
(6) Sequence
A video sequence is the highest-level syntactic structure of the coded bitstream and comprises one or more consecutive coded pictures.
In the coding process, a profile is a specified subset of syntax, semantics and algorithms; a decoder conforming to a certain profile must fully support the subset defined by that profile. The H.264/AVC standard defines three profiles and four high-fidelity extension profiles (the High profiles).
1. Baseline profile:
Supports intra and inter coding using I slices and P slices, and supports entropy coding with context-adaptive variable-length coding (CAVLC). Mainly used for real-time video communication such as videotelephony, video conferencing and wireless communication.
2. Main profile:
Supports interlaced video, inter coding with B slices and inter coding with weighted prediction; supports context-adaptive binary arithmetic coding (CABAC). Mainly used for digital broadcast television and digital video storage.
3. Extended profile:
Supports efficient switching between bitstreams (SP and SI slices) and improved error resilience (data partitioning), but does not support interlaced video or CABAC; mainly used for streaming media.
In the concrete coding process, in order to adapt conveniently to the protocols of various networks, the functionality of H.264/AVC is divided into two layers, namely the video coding layer (VCL) and the network abstraction layer (NAL). The VCL data are the output of the coding process and represent the compressed and coded video data sequence. Before the VCL data are transmitted or stored, they are first mapped or encapsulated into NAL units.
The video coding layer of the H.264/AVC (MPEG-4 Part 10) video coding standard adopts a hybrid coding method of transform and prediction; the corresponding block diagram is still as shown in Fig. 1. If inter predictive coding is used, the predicted value PRED (denoted P in the figure) is obtained by motion compensation (MC) from already coded reference pictures, where a reference picture is denoted F'n-1. In order to improve prediction accuracy, and thereby the compression ratio, the actual reference picture can be selected from past or future (in display order) frames that have been coded, decoded, reconstructed and filtered. After the predicted value PRED is subtracted from the current block, a residual block Dn is produced; after block transform and quantization, a set of quantized transform coefficients X is obtained which, after entropy coding, forms a compressed bitstream together with some side information needed for decoding (such as the prediction mode, quantization parameter and motion vectors), passed through the NAL (network abstraction layer) for transmission and storage.
As stated above, in order to provide reference pictures for further prediction, the encoder must be able to reconstruct pictures. Therefore, the residual D'n obtained after inverse quantization and inverse transform is added to the predicted value P to obtain uF'n (the unfiltered frame). In order to remove the noise produced in the coding and decoding loop, improve the picture quality of the reference frame and thereby improve compression performance, a loop filter is provided; the filtered output is the reconstructed picture, which can be used as a reference picture.
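As an aid to reading Fig. 1, the following Python-style sketch lays out the per-macroblock loop just described; predict, transform, quantize and the other helpers are placeholders for the corresponding modules, not a real codec API.

    # Structural sketch of the hybrid coding loop of Fig. 1 for one macroblock (illustrative only).
    def encode_macroblock(mb, reference_frames, recon_frame, tools):
        pred, side_info = tools.predict(mb, reference_frames, recon_frame)   # intra or motion-compensated prediction (PRED)
        residual = mb.pixels - pred                                          # Dn = current block minus prediction
        coeffs = tools.quantize(tools.transform(residual))                   # X = quantized transform coefficients
        bitstream = tools.entropy_code(coeffs, side_info)                    # CAVLC/CABAC plus modes, QP, motion vectors
        recon_residual = tools.inverse_transform(tools.dequantize(coeffs))   # D'n
        unfiltered = pred + recon_residual                                   # uF'n, reconstruction before loop filtering
        recon_frame.store(mb.position, tools.loop_filter(unfiltered))        # filtered output, usable as future reference
        return bitstream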
Motion estimation accounts for more than 50% of the encoder's computation and is the bottleneck of encoder implementation. Motion estimation means, for each block of the current frame (a luma macroblock and its sub-macroblocks), finding within a given search range of the previous or the following frame the block most similar to the current block according to some matching criterion, i.e. the matching block, and computing the motion vector from the relative displacement between the matching block and the current block. A commonly used criterion is the minimum sum of absolute differences (SAD). The more accurate the motion estimation, the smaller the residual to be compensated, the higher the coding efficiency and the better the picture quality of the coded output. For block motion estimation, the reference frame data (also called reference data) of the search window corresponding to the block must be read in. For a 16×16 macroblock with a motion estimation search range of horizontal [-64, +63] / vertical [-32, +31], the reference data to be read in is the image region of the reference frame corresponding to this macroblock and its surroundings, of size (64+16+64)×(32+16+32) = 144×80. With multiple reference frames, the search-window data of several reference frames may have to be read in. Because the number of macroblocks per frame is very large, this memory access volume is huge and becomes a bottleneck of video coding implementation. For this reason, the reference-frame search-window data must be reused between adjacent macroblocks, which greatly reduces the amount of reference frame data read in; when many adjacent macroblocks are motion-estimated together as one unit, the reference data to be read in is only slightly larger than the region of these macroblocks themselves.
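The bandwidth argument above follows directly from the search geometry; the small sketch below (hypothetical helper names) reproduces the 144×80 region quoted for a ±64/±32 search around a 16×16 macroblock and shows how little the area grows when the window is reused across a whole macro-block row.

    def search_region_size(block_w=16, block_h=16, sx=64, sy=32):
        """Reference area (width, height) needed to motion-estimate one block with a
        search range of roughly [-sx, +sx-1] horizontally and [-sy, +sy-1] vertically."""
        return (sx + block_w + sx, sy + block_h + sy)

    def row_region_size(frame_w=1280, block_h=16, sy=32):
        """Reference area for a whole macro-block row when the search window is reused
        across adjacent macroblocks: full frame width, same vertical extent."""
        return (frame_w, sy + block_h + sy)

    print(search_region_size())   # (144, 80): 144*80 pixels per macroblock if nothing is reused
    print(row_region_size())      # (1280, 80): shared by all 80 macroblocks of a 720p row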
The H.264/MPEG-4 AVC standard defines a deblocking filter process for 16×16 macroblock and 4×4 block boundaries. For macroblock boundaries, the purpose of the filtering is to eliminate the blocking artifacts caused by adjacent macroblocks coming from different frames in inter prediction or using different quantization parameters. For block boundaries, the purpose of the filtering is to eliminate artifacts that may be caused by the transform/quantization and by differences between the motion vectors of adjacent blocks. Loop filtering adjusts the pixels on the two sides of the macroblock/block boundary with a content-adaptive nonlinear algorithm.
The other new-generation video coding standards, namely VC-1 and AVS1.0-P2, have the same coding framework as H.264/AVC and differ only in the details of some modules. Loop filtering is an example: H.264/AVC can choose whether or not to filter slice boundaries, whereas VC-1 and AVS1.0-P2 never filter slice boundaries, and the concrete filtering algorithms also differ. Likewise, the previous-generation video coding standards, including MPEG-2, H.263 and MPEG-4 Part 2, have a coding framework similar to that of H.264/AVC, VC-1 and AVS1.0-P2 and differ only in some modules; for example, MPEG-2, H.263 and MPEG-4 Part 2 encoders have no loop filtering and use only one reference frame for motion estimation.
To facilitate understanding of the present invention, its specific implementation process is described below.
The specific implementation process of the invention, as shown in Fig. 2, comprises:
Step 21: divide the current frame into at least one slice group;
Specifically, the current frame of the video frame sequence is divided into macroblocks, and all macroblocks are assigned to one or more slice groups according to a predetermined rule;
In this step, the video frame can be divided into slice groups using the various scan modes of flexible macroblock ordering (FMO), which include: the interleaved pattern, the dispersed pattern, the foreground-and-background pattern, the box-out pattern, the raster-scan pattern, the wipe pattern and the explicit pattern, in which a number is used to indicate the slice group of each macroblock. If the encoder does not support FMO, all macroblocks of the frame are assigned to a single slice group.
In the raster-scan pattern, the slice groups can be chosen as one or more consecutive complete macro-block rows of fixed size and position whose width equals the frame width; in the explicit pattern, the shapes of the slice groups can all be chosen to be rectangles.
Step 22: divide the current frame into a plurality of sub-slice groups on the basis of the slice groups;
In the H.264 coding scheme, the content is divided into levels such as sequence, group of pictures (GOP), picture (frame), slice group, slice, macroblock and sub-macroblock. In the present invention, in order to encode in parallel with a plurality of encoders, the coding task of a whole sequence must be divided into a plurality of subtasks (modules) at a suitable granularity, i.e. the slice groups are further divided into one or more sub-slice groups; if the current frame has only one slice group, that slice group is divided into a plurality of sub-slice groups, so that the subsequent parallel encoding can be load-balanced over the sub-slice groups of the current frame.
In this step, each slice group of the current frame is further divided into one or more sub-slice groups in raster-scan order;
All sub-slice groups can be one or more consecutive complete macro-block rows whose width equals the frame width.
It should be noted that, when dividing sub-slice groups, each macroblock must be assigned to exactly one sub-slice group; each sub-slice group corresponds to only one encoder, but each encoder can correspond to one or more sub-slice groups, i.e. can encode one or more sub-slice groups;
In addition, in this step an encoder can also be given several sub-slice groups; for example, when a certain slice-group and sub-slice-group division produces some sub-slice groups that are rather small, several small sub-slice groups can be given to one encoder.
The principle of balancing the processing load among the encoders is as follows: the sub-slice groups are divided so as to match the processing capabilities of the encoders, taking the reference-data transfer cost into account during the division, so that the encoders finish encoding their assigned sub-slice groups at the same time.
In this step, during the coding of a group of pictures (GOP), the concrete sub-slice-group division scheme comprises the following cases (illustrated by the sketch after this list):
for an I frame, the coding load is predicted from the number of macroblocks, and the sub-slice groups are divided so as to match the processing capability of each encoder, so that the encoders finish encoding their assigned sub-slice groups at the same time;
for the first non-I frame, when the slice-group division is unchanged, the sub-slice-group division of the I frame is adopted;
for the second non-I frame, the coding load of each of its sub-slice groups is predicted from the actually measured coding workload of each sub-slice group of the first non-I frame, and the sub-slice-group division of the second non-I frame is adjusted according to the reference-data loading cost, so that the processing times of the encoders are equal when the second non-I frame is encoded in parallel;
for any non-I frame after the second, the coding load of each of its sub-slice groups is predicted from the coding workload of each sub-slice group of the previous frame or frames, and the sub-slice-group division of that frame is adjusted according to the reference-data loading cost, so that the processing times of the encoders are equal when that frame is encoded in parallel.
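The following sketch illustrates one way to perform the division just listed, under the assumption of this embodiment that sub-slice groups are whole macro-block rows and that the per-row load is additive; divide_rows and its arguments are illustrative names, not part of the patent.

    def divide_rows(row_loads, encoder_capacities):
        """row_loads[i]: predicted coding load of macro-block row i (e.g. its macroblock
        count for an I frame, or the measured load of the previous frame otherwise).
        encoder_capacities[k]: relative processing capability of encoder k.
        Returns one (first_row, last_row) range per encoder with roughly proportional load."""
        total_load, total_cap = sum(row_loads), sum(encoder_capacities)
        bounds, row, acc = [], 0, 0.0
        for k, cap in enumerate(encoder_capacities):
            target = total_load * cap / total_cap        # share of the load this encoder should take
            first = row
            while row < len(row_loads) and (row == first or acc < target
                                            or k == len(encoder_capacities) - 1):
                acc += row_loads[row]
                row += 1
            bounds.append((first, row - 1))
            acc -= target                                # carry the rounding error to the next encoder
        return bounds

    # 45 rows of equal load and 5 identical encoders give 9 rows each, as in Table 4 below.
    print(divide_rows([80.0] * 45, [1.0] * 5))           # [(0, 8), (9, 17), (18, 26), (27, 35), (36, 44)]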
Step 23: distribute the data to be encoded, now divided into sub-slice groups, to the encoders;
The concrete distribution follows the correspondence between sub-slice groups and encoders established in step 22. What is distributed includes not only all sub-slice groups of the current frame but also the relevant coding configuration parameter information, including descriptive information about the slices such as position and macroblock count; the coding configuration parameter information includes, but is not limited to: coding standard information (standard, profile), FMO and scan pattern, number of reference frames, motion estimation search range, rate control requirements, and the loop filtering mode.
Step 24: exchange reference data between the encoders;
The concrete exchange of reference data can be as follows: when the current frame is not an I frame, the already reconstructed reference-frame sub-slice-group data are exchanged between the encoders, and the motion-estimation search-region reference data of each sub-slice group cached in each encoder are updated;
More specifically, the reference data exchange proceeds as follows: when the current frame is not an I frame, the encoders perform inter prediction, whose most critical part is motion estimation; before motion estimation, an encoder must obtain the search-region reference data; the reference data come from the previously reconstructed reference frame, which is assembled from the reconstructed sub-slice groups output by the encoders; reading in the reference data is the largest part of the memory access volume of video coding and has a great influence on coding performance; the amount of reference data read in must be reduced as far as possible, which requires various reference-data reuse strategies.
That is, in this step, if the current frame is not an I frame, the minimum reference data to be exchanged between the encoders is determined according to the overlap, in image area and memory region, between each encoder's current-frame sub-slice groups and the reconstructed sub-slice groups of the reconstructed frame, and according to the maximum motion-estimation search region and the reference frames of the current-frame sub-slice groups; the exchanged reference data are then used to update the motion-estimation search-region reference data of each sub-slice group cached in each encoder.
That is to say, in the present invention, the reconstructed sub-slice groups produced during encoding are stored in the local memory of the encoder that produced them. When the position and size of the same sub-slice group are unchanged in the next frame, all of this reconstructed sub-slice-group data can be used as reference data for the motion estimation of the next frame; the search-region reference data lying outside the region of the sub-slice groups assigned to this encoder, however, are produced by other encoders and must be obtained from them.
For example, in the motion estimation process, with a fixed-size search window and one reference frame, the minimum reference data to be exchanged between the encoders is determined from the size of the search window, and the corresponding reference data are exchanged between the encoders and used to update the motion-estimation search-region reference data of each sub-slice group cached in each encoder.
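As an illustration of how the minimum exchange can be determined in this embodiment (a sketch assuming sub-slice groups are whole macro-block rows spanning the frame width, so only the vertical search extent matters; names are illustrative):

    import math

    def rows_to_exchange(own_rows, all_rows, vert_search=32, mb_size=16):
        """own_rows: (first, last) macro-block rows of this encoder's sub-slice group.
        all_rows:  {encoder_id: (first, last)} for every encoder's sub-slice group.
        Returns {encoder_id: (first, last)}: the reconstructed rows that must be fetched
        from each other encoder for motion estimation with a vertical range of about ±vert_search."""
        extra = math.ceil(vert_search / mb_size)                  # e.g. 32 pixels -> 2 macro-block rows
        lo_row = min(first for first, _ in all_rows.values())
        hi_row = max(last for _, last in all_rows.values())
        need_first = max(own_rows[0] - extra, lo_row)
        need_last = min(own_rows[1] + extra, hi_row)
        needed = {}
        for enc, (first, last) in all_rows.items():
            a, b = max(first, need_first), min(last, need_last)
            if (first, last) != own_rows and a <= b:
                needed[enc] = (a, b)
        return needed

    # Table 4 layout: five encoders, nine macro-block rows each (rows 0-44).
    layout = {k + 1: (9 * k, 9 * k + 8) for k in range(5)}
    print(rows_to_exchange(layout[3], layout))   # {2: (16, 17), 4: (27, 28)} = macroblocks 1280-1439 and 2160-2319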
Step 25: each encoder encodes the macroblocks contained in the sub-slice groups of the current frame that it is to encode;
The parallel encoding of all sub-slice groups of the current frame by the plurality of encoders can specifically be as follows: each encoder completes the coding of its assigned sub-slice groups; during encoding, each sub-slice group is divided into one or more slices in raster-scan order, and each macroblock must be assigned to exactly one slice; each encoder completes the coding of all macroblocks in its assigned sub-slice groups, produces the reconstructed sub-slice groups, and outputs the coded bitstream and parameters.
In this step, the coding of all macroblocks in a sub-slice group by an encoder can specifically comprise: motion estimation and motion compensation; intra prediction selection and intra prediction; intra/inter selection and residual calculation; rate control; transform and quantization; reordering and entropy coding; inverse quantization and inverse transform; and reconstruction.
If the H.264, VC-1 or AVS1.0-P2 standard is followed, the coding of all macroblocks in a sub-slice group by each encoder can specifically comprise the following steps:
(1) motion estimation using a plurality of backward or forward reference frames, and motion compensation;
(2) intra prediction selection, i.e. during intra prediction of the current macroblock, using the reconstructed but not yet loop-filtered left and upper macroblock data of the same slice; intra/inter selection; residual calculation;
(3) rate control;
(4) integer transform and quantization;
(5) reordering and entropy coding, the entropy coding being context-adaptive variable-length coding or context-adaptive binary arithmetic coding;
(6) inverse quantization and inverse transform;
(7) reconstruction;
(8) loop filtering.
According to the H.264 standard, loop filtering is a necessary step for producing the reconstructed frame in the coding process. In H.264, whether the boundary of each slice is loop-filtered can be selected through the 'slice-boundary loop filtering mode'; in the present invention, the boundary loop-filter mode of all slices of the current frame is preferably set to 'no filtering'. When the boundary loop-filter mode of all slices of the current frame is set to 'no filtering', each encoder independently performs the loop filtering of its own slices, and no data is exchanged between encoders or between slices.
Since the loop filtering of each slice is independent, in order to save loop-filtering time, in the present invention the loop filtering of the macroblocks of a slice starts after the reconstruction of the first macroblock of the slice is finished, the reconstruction and loop filtering of macroblocks forming a macroblock-level pipeline. To reduce the frequency with which the loop filtering accesses main memory, the bottom 4 rows of pixels and the block information of the entire macro-block row above, and the rightmost 4 columns of pixels and the block information of the macroblock to the left, can be cached on chip, so that the loop filtering only needs to access this on-chip buffer. Of course, the loop filtering of a slice can also begin after the reconstruction of the whole slice is finished; in that case too, each encoder performs the loop filtering of its own slices independently, and no data needs to be exchanged between encoders or between slices.
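A compact sketch of the macroblock-level pipeline option just described (a sequential stand-in for the hardware pipeline; reconstruct_mb and deblock_mb are placeholders):

    # Overlap reconstruction and deblocking inside one slice, with slice-boundary filtering off.
    def encode_slice_pipelined(slice_mbs, reconstruct_mb, deblock_mb):
        """slice_mbs: the macroblocks of one slice in raster order. Deblocking of a macroblock
        starts as soon as it has been reconstructed, one step behind the reconstruction stage."""
        previous = None
        for mb in slice_mbs:
            recon = reconstruct_mb(mb)      # prediction, transform, quantization, inverse, reconstruction
            if previous is not None:
                deblock_mb(previous)        # filter the macroblock reconstructed in the previous step
            previous = recon
        if previous is not None:
            deblock_mb(previous)            # drain the pipeline: filter the last macroblock of the slice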
Step 26: gather the bitstreams and parameters output by the encoders, perform the top-layer coding and produce the output sequence bitstream;
Specifically: the host processor collects the bitstream and the relevant parameters output by each encoder, generates the slice-group and frame bitstreams, merges them into the sequence bitstream and outputs it; for H.264, for example, all coding including the NAL must be completed.
Through the processing of steps 21 to 26, parallel encoding of the video data is achieved; in the course of encoding the video data, steps 21 to 26 are repeated for each subsequent frame until the coding process is finished.
In the specific implementation of the invention, the division into sub-slice groups is the key to realizing the invention; the sub-slice-group division process is described below with concrete application examples.
In the present invention, in order to obtain greater overall processing capability through parallel processing and satisfy the requirements of real-time high-definition video coding, a plurality of encoders must carry out the encoding in parallel. The architectures and coding capabilities of the encoders may differ, but their total coding capability should be greater than the required coding workload, with a certain margin to cover the additional overhead of the parallel encoding processors.
Take the H.264 real-time coding of 720p high-definition video (1280×720, 30 fps) as an example. The encoder architecture is designed to support 1 or 2 reference frames and a motion estimation search range of horizontal [-64, +63] / vertical [-32, +31]. A single encoder is implemented with a VLIW DSP or an FPGA; according to a prior assessment, about 5 such encoders together reach the total coding capability required for 720p with a certain margin. The encoders are numbered encoder 1 to encoder 5.
In the present invention, in order to obtain the best parallel processing performance, the following two aspects are mainly considered:
(1) the load is balanced among the encoders, so that they wait for each other as little as possible;
(2) the overhead of communication and the like between the encoders is as small as possible.
Precisely through flexible sub-slice-group division, the method of the invention makes the plurality of processors (i.e. encoders) in the parallel video coding system satisfy the above two aspects, so as to achieve the best performance of the whole parallel video coding system.
To meet requirement (1), the coding load of a sub-slice group should match the processing capability of the encoder to which it is assigned. The coding load of a sub-slice group is closely related to its size (i.e. the number of macroblocks it contains), to the image content characteristics, and to the coding configuration parameters (intra prediction, inter prediction, number of reference frames, smallest block size, quantization level, entropy coding mode, etc.). Usually, the more macroblocks, the larger the coding load, so the match can be achieved by adjusting the number of macroblocks of a sub-slice group to the processing capability of the corresponding encoder.
With respect to requirement (2), the communication overhead of parallel processing among the encoders is mainly the exchange of reference data. If the motion estimation parameters (number of reference frames, search range, etc.) are fixed and the sub-slice-group division is unchanged, the communication overhead is constant; if the sub-slice-group division of the current frame changes, the communication overhead of the reference data exchange increases noticeably. Therefore, any adjustment of the sub-slice-group division must take into account the cost impact of the increased reference-data communication overhead, as in the sketch below.
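The trade-off just described can be expressed as a simple cost comparison. The sketch below uses a hypothetical cost model (all quantities in the same time unit) and re-divides only when the predicted time saved on the slowest encoder exceeds the extra reference-data transfer time.

    def should_repartition(current_bounds, balanced_bounds, row_loads,
                           encoder_caps, extra_rows_exchanged, row_transfer_time):
        """current_bounds / balanced_bounds: per-encoder (first_row, last_row) ranges.
        Returns True when re-dividing is worth the added reference-data exchange."""
        def slowest(bounds):
            return max(sum(row_loads[a:b + 1]) / cap
                       for (a, b), cap in zip(bounds, encoder_caps))
        gain = slowest(current_bounds) - slowest(balanced_bounds)
        cost = extra_rows_exchanged * row_transfer_time
        return gain > cost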
The slice-group and sub-slice-group division modes provided by the invention are now described with the concrete example of 720p high-definition video (1280×720). Table 1 gives the macroblock numbering of one 720p frame: there are 3600 16×16 macroblocks in total, divided into 45 macro-block rows (labelled MBR1 to MBR45 in the table), each macro-block row containing 80 macroblocks; the macroblock numbers increase in raster-scan order (left to right, top to bottom);
Table 1
Macro-block row   Macroblock numbers
MBR1              0-79
MBR2              80-159
MBR3-MBR7         160-559
MBR8              560-639
MBR9              640-719
MBR10             720-799
MBR11             800-879
MBR12-MBR16       880-1279
MBR17             1280-1359
MBR18             1360-1439
MBR19             1440-1519
MBR20             1520-1599
MBR21-MBR25       1600-1999
MBR26             2000-2079
MBR27             2080-2159
MBR28             2160-2239
MBR29             2240-2319
MBR30-MBR34       2320-2719
MBR35             2720-2799
MBR36             2800-2879
MBR37              2880-2959
MBR38             2960-3039
MBR39-MBR43       3040-3439
MBR44             3440-3519
MBR45             3520-3599
(Macro-block row MBRn contains the 80 macroblocks numbered (n-1)×80 through n×80-1.)
For the frame shown in Table 1, take division into slice groups by foreground and background as an example. Suppose there is only one foreground slice group, of 216 macroblocks in total (8 macroblock columns, 27 macro-block rows); the remaining macroblocks belong to the background slice group. In practical applications such as video conferencing, the person (or face) in the field of view is often regarded as the part of greatest interest, and this part is placed in a foreground slice group to be coded separately. The two slice groups are further divided into 6 sub-slice groups in total: the background slice group is divided into 5 sub-slice groups, and the foreground slice group forms one sub-slice group.
As shown in Table 2, the sub-slice groups after this division can specifically be the following:
Sub-slice group 1: macroblock numbers 0-719;
Sub-slice group 2: macroblock numbers 720-1439, except the foreground part (the bold double-underlined blocks);
Sub-slice group 3: macroblock numbers 1440-2159, except the foreground part (the bold double-underlined blocks);
Sub-slice group 4: macroblock numbers 2160-2879, except the foreground part (the bold double-underlined blocks);
Sub-slice group 5: macroblock numbers 2880-3599;
Sub-slice group 6: all 216 macroblocks of the foreground slice group (the bold double-underlined blocks).
This sub-slice-group division mode is applicable when the H.264 Baseline or Extended profile, which support FMO, is used.
Table 2
(Table 2 shows the foreground/background division above marked on the macroblock grid of Table 1; the foreground macroblocks are shown in bold with double underlining.)
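To make the mapping concrete, the sketch below builds a macroblock-to-sub-slice-group map for a foreground/background layout of this kind. The foreground rectangle position (columns 36-43, macro-block rows 9-35) is an assumed example value chosen only so that the counts match; it is not specified by the patent.

    # Sketch: explicit macroblock -> sub-slice-group map for the foreground/background example.
    FRAME_COLS, FRAME_ROWS = 80, 45
    FG_COLS, FG_ROWS = range(36, 44), range(9, 36)      # ASSUMED 8 x 27 = 216 foreground macroblocks

    def sub_slice_group_of(mb_number):
        row, col = divmod(mb_number, FRAME_COLS)
        if row in FG_ROWS and col in FG_COLS:
            return 6                                    # foreground slice group = sub-slice group 6
        return row // 9 + 1                             # background rows map to sub-slice groups 1-5

    mb_map = [sub_slice_group_of(n) for n in range(FRAME_COLS * FRAME_ROWS)]
    print(mb_map.count(6))                              # 216 foreground macroblocks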
Still taking the frame shown in Table 1 as an example, an implementation of explicit slice-group division is shown in Table 3: the frame is divided into three slice groups, which are further divided into 5 sub-slice groups in total.
Slice group 1 is the italic block in the upper-left corner, 1600 macroblocks in total (40 macroblock columns, 40 macro-block rows); this slice group is further divided vertically into two sub-slice groups (sub-slice groups 1 and 2), each of 800 macroblocks (40 macroblock columns, 20 macro-block rows).
Slice group 2 is the underlined bold block in the upper-right corner, 1600 macroblocks in total (40 macroblock columns, 40 macro-block rows); this slice group is further divided vertically into two sub-slice groups (sub-slice groups 3 and 4), each of 800 macroblocks (40 macroblock columns, 20 macro-block rows).
Slice group 3 is the 5 complete macro-block rows at the bottom, 400 macroblocks in total (80 macroblock columns, 5 macro-block rows); this slice group forms a single sub-slice group (sub-slice group 5).
This division is applicable when the H.264 Baseline or Extended profile, which support FMO, is used.
Table 3
(Table 3 uses the same macroblock numbering as Table 1. Slice group 1 occupies the left 40 macroblock columns of macro-block rows MBR1-MBR40, slice group 2 the right 40 macroblock columns of MBR1-MBR40, and slice group 3 the complete macro-block rows MBR41-MBR45.)
Taking division into slice groups by raster scan as a further example, the corresponding division result is shown in Table 4: the frame is divided into 1 slice group, which is further divided into 5 sub-slice groups, each of 9 complete macro-block rows, i.e. 720 macroblocks.
Sub-slice group 1: macroblock numbers 0-719;
Sub-slice group 2: macroblock numbers 720-1439;
Sub-slice group 3: macroblock numbers 1440-2159;
Sub-slice group 4: macroblock numbers 2160-2879;
Sub-slice group 5: macroblock numbers 2880-3599.
This division scheme is applicable to the H.264, VC-1, AVS1.0-P2, H.263, MPEG-2 and MPEG-4 Part 2 standards and all their profiles.
Table 4
(Table 4 shows this raster-scan division marked on the macroblock grid of Table 1: sub-slice groups 1-5 correspond to macro-block rows MBR1-9, MBR10-18, MBR19-27, MBR28-36 and MBR37-45, respectively.)
The reference data exchange described in step 24 of the present invention is illustrated as follows:
Take the sub-slice-group division shown in Table 4 as an example, with 720p video and a motion estimation search range of horizontal [-64, +63] / vertical [-32, +31], and suppose encoder 3 handles sub-slice group 3. The reference data region it needs is the region of one or more reference frames corresponding to macroblocks numbered 1280-2319.
In the case of motion estimation with one reference frame, if the sub-slice-group division at encoder 3 is unchanged between the current frame and the already coded previous frame, i.e. both are the region of macroblocks numbered 1440-2159, then, because the region of macroblocks 1440-2159 within the required reference frame region is exactly the sub-slice group reconstructed by this encoder for the previous frame, the reference data that must be read in from outside encoder 3 are only the following two:
macroblocks numbered 1280-1439, from reconstructed sub-slice group 2 (output by encoder 2); and
macroblocks numbered 2160-2319, from reconstructed sub-slice group 4 (output by encoder 4).
Likewise, in the case of motion estimation with one reference frame, if the sub-slice-group division at encoder 2 is unchanged between the current frame and the already coded previous frame, i.e. both are the region of macroblocks numbered 720-1439, then, because the region of macroblocks 720-1439 within the required reference frame region is exactly the sub-slice group reconstructed by this encoder for the previous frame, the reference data that must be read in from outside encoder 2 are only the following two:
macroblocks numbered 1440-1599, from reconstructed sub-slice group 3 (output by encoder 3); and
macroblocks numbered 560-719, from reconstructed sub-slice group 1 (output by encoder 1).
Here, encoders 2 and 3, whose current sub-slice groups are adjacent, pass reconstructed sub-slice-group data to each other as reference data for use in the encoding.
If multi-reference-frame motion estimation is used, the required parts of several reconstructed sub-slice groups must be loaded from the other encoders.
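The figures in this example can be checked with a few lines of arithmetic (row layout as in Table 4; ±32 pixels of vertical search reach two macro-block rows beyond a sub-slice group):

    MBS_PER_ROW, ROWS_PER_GROUP, EXTRA_ROWS = 80, 9, 2   # 2 = ceil(32 / 16)

    def neighbour_mbs(group_index):                      # group_index 1..5 as in Table 4
        first_row = (group_index - 1) * ROWS_PER_GROUP
        last_row = first_row + ROWS_PER_GROUP - 1
        above = (max(first_row - EXTRA_ROWS, 0) * MBS_PER_ROW, first_row * MBS_PER_ROW - 1)
        below = ((last_row + 1) * MBS_PER_ROW, (min(last_row + EXTRA_ROWS, 44) + 1) * MBS_PER_ROW - 1)
        return above, below

    print(neighbour_mbs(3))   # ((1280, 1439), (2160, 2319)): fetched from encoders 2 and 4
    print(neighbour_mbs(2))   # ((560, 719), (1440, 1599)):   fetched from encoders 1 and 3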
The method provided by the invention is applicable to various video formats, including high-definition and standard-definition video, and to both progressive and interlaced scanning. For interlaced video, the two corresponding macroblocks of the top field and bottom field of an interlaced frame form a macroblock pair, and a macroblock pair must be assigned to the same slice group, sub-slice group and slice for coding.
Based on the slice-group and sub-slice-group division examples above, application examples of the division into slices during the coding of a sub-slice group are described below:
Application example one
An example of dividing a sub-slice group into slices is shown in Table 5. In Table 5, sub-slice group 1 is divided into 7 slices during encoding, numbered slices 1-7 in raster-scan order; slices 1 and 2 are two macro-block rows each, and slices 3-7 are one macro-block row each;
Table 5
Slice    Macro-block rows   Macroblock numbers
1        MBR1-MBR2          0-159
2        MBR3-MBR4          160-319
3        MBR5               320-399
4        MBR6               400-479
5        MBR7               480-559
6        MBR8               560-639
7        MBR9               640-719
Application example two
A second example of slice division is shown in Table 6. In Table 6, sub-slice group 1 is divided into 5 slices during encoding, numbered slices 1-5, and the macroblocks contained in each slice are as follows:
Slice 1: macroblock numbers 0-123;
Slice 2: macroblock numbers 124-237;
Slice 5: macroblock numbers 238-319;
Slice 3: macroblock numbers 320-521;
Slice 4: macroblock numbers 522-719.
Table 6
Sub-slice-group 1 (80 macroblocks per row):
MBR1: macroblocks 0~79 (band 1)
MBR2: macroblocks 80~159 (band 1: 80~123; band 2: 124~159)
MBR3: macroblocks 160~239 (band 2: 160~237; band 3: 238~239)
MBR4: macroblocks 240~319 (band 3)
MBR5: macroblocks 320~399 (band 4)
MBR6: macroblocks 400~479 (band 4)
MBR7: macroblocks 480~559 (band 4: 480~521; band 5: 522~559)
MBR8: macroblocks 560~639 (band 5)
MBR9: macroblocks 640~719 (band 5)
The present invention also provides a parallel video coding device, whose specific implementation structure is shown in Figure 3. It comprises a primary processor and a plurality of encoders. The primary processor divides the current frame to be encoded into sub-slice-groups and delivers them to the plurality of encoders respectively; the encoders encode the sub-slice-groups in parallel and output their respective coded bit streams to the primary processor, and the primary processor generates and outputs the sequence bit stream.
(1) primary processor
The primary processor further comprises a slice group determining unit, a sub-slice-group determining unit, a sub-slice-group data transfer unit and a top layer coding unit, where each unit is as follows:
The slice group determining unit is used to receive digital video sequence data from the data transfer unit, divide the current frame of the video frame sequence into macroblocks in a fixed manner, and allocate all the macroblocks to one or more slice groups according to a predefined rule;
The sub-slice-group determining unit is used to divide each slice group of the current frame into one or more sub-slice-groups in raster scan order, where each macroblock must be, and can only be, assigned to one sub-slice-group;
The sub-slice-group data transfer unit is used to control the plurality of encoders to read in the sub-slice-group data according to the configured correspondence between sub-slice-groups and encoders; each encoder reads in one or more sub-slice-groups, and all the data to be encoded contained in a given sub-slice-group are delivered to one and the same encoder. The bus signal formats usable for the sub-slice-group data transfer include, but are not limited to: a parallel interface, a high-speed serial interface, a BT.656 or CCIR601 format digital video signal, an HD-SDI signal, the signal format corresponding to the AHB or AXI bus of the AMBA bus specification, or an Ethernet interface;
The top layer coding unit is used to collect the bit stream and the relevant parameters output by each encoder, generate the slice group and frame bit streams, and output the sequence bit stream after merging;
Specifically, in the primary processor, the entire frame may be one slice group that is divided into a plurality of sub-slice-groups, each sub-slice-group having a fixed size and position in the frame and consisting of one or more consecutive whole macroblock rows whose width equals the frame width.
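As an illustration only (the function name and the greedy heuristic below are ours, not the patent's algorithm), the following sketch shows one way such a primary processor could cut the single slice group of a 720p frame into per-encoder sub-slice-groups of whole macroblock rows while roughly balancing a predicted per-row encoding cost, for example the measured cost of the same rows in the previous frame:

def balanced_row_split(row_costs, num_encoders):
    # Greedy split of consecutive macroblock rows into num_encoders groups of
    # roughly equal total predicted cost; returns (first_row, last_row) per group.
    target = sum(row_costs) / num_encoders
    groups, start, acc = [], 0, 0.0
    for row, cost in enumerate(row_costs):
        acc += cost
        if acc >= target and len(groups) < num_encoders - 1:
            groups.append((start, row))
            start, acc = row + 1, 0.0
    groups.append((start, len(row_costs) - 1))
    return groups

# With uniform cost this reduces to an almost even split of the 45 rows of 720p:
print(balanced_row_split([1.0] * 45, 4))   # -> [(0, 11), (12, 23), (24, 35), (36, 44)]

With measured per-row costs instead of uniform ones, the same greedy pass shifts rows toward the less loaded encoders.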
In addition, the primary processor may also pre-process the received current frame before encoding, including: two-dimensional denoising, and/or scaling, and/or 4:4:4 to 4:2:2 format conversion of the digital video, and/or 4:2:2 to 4:2:0 format conversion of the digital video.
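For the 4:2:2 to 4:2:0 conversion mentioned above, a minimal sketch follows (assuming 8-bit samples, NumPy available, and simple averaging of vertically adjacent chroma lines; real converters may use better filters):

import numpy as np

def chroma_422_to_420(chroma_plane: np.ndarray) -> np.ndarray:
    # chroma_plane: (H, W/2) 8-bit chroma plane of a 4:2:2 frame; averaging each
    # pair of vertically adjacent lines halves the vertical chroma resolution.
    c = chroma_plane.astype(np.uint16)
    return ((c[0::2, :] + c[1::2, :] + 1) // 2).astype(np.uint8)

cb_422 = np.random.randint(0, 256, size=(720, 640), dtype=np.uint8)
print(chroma_422_to_420(cb_422).shape)   # -> (360, 640)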
Moreover, the primary processor of the present invention may specifically be a computer system comprising: a CPU; a memory; an interface to the data transfer unit (the interface to the data transfer unit and the input interface to the external communication network); an output interface used to output the coded bit stream; and a configuration control information interface used to accept external configuration control information.
(2) Encoder. The structure of the encoder may specifically comprise:
(1) a sub-slice-group receiving unit, used to receive the sub-slice-groups and the coding configuration parameters;
(2) a reference data input/output unit, used to exchange reference data between the encoders; specifically, when the current frame is not an I frame, each encoder obtains the required reference data from the other encoders through the reference data exchange unit provided in the data transfer unit, that is, the reconstructed reference-frame sub-slice-group data are exchanged between the encoders under control, and the motion estimation search area reference data of each sub-slice-group cached in each encoder are updated;
(3) a coding unit, used to encode all the sub-slice-groups of the current frame in parallel: each encoder encodes the sub-slice-groups allocated to it; in the encoding process each sub-slice-group is divided into one or more bands in raster scan order, and each macroblock must be, and can only be, assigned to one band; each encoder encodes all the macroblocks in its allocated sub-slice-groups, produces the reconstructed sub-slice-groups, and produces and outputs the bit stream and relevant parameters; the band is preferably one or more consecutive whole macroblock rows whose width equals the frame width;
Each of the plurality of encoders of the present invention may be:
a general-purpose processor, including a superscalar processor or a very long instruction word digital signal processor (VLIW DSP); or
a field-programmable gate array (FPGA); or
a custom very large scale integration (VLSI) chip; or
a processor with a configurable instruction set.
Each encoder is also provided with a data random access memory (RAM), i.e. a storage unit, used to buffer information such as the current and reconstructed sub-slice-group data. When the required data RAM capacity is small, large-capacity on-chip RAM can be used; when the required capacity is large, off-chip synchronous static random access memory (SSRAM) or synchronous dynamic random access memory (SDRAM) can be configured.
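As a rough, illustrative calculation (assuming 4:2:0 video with 8-bit samples; the figures below are ours, not from the patent), the buffering requirement for one reconstructed sub-slice-group plus the exchanged reference rows can be estimated as follows:

BYTES_PER_MB = 16 * 16 * 3 // 2        # 384 bytes per 4:2:0 macroblock (8-bit)

def buffer_bytes(own_mbs, exchanged_mbs):
    # Reconstructed sub-slice-group held locally plus reference rows read in
    # from the neighbouring encoders.
    return (own_mbs + exchanged_mbs) * BYTES_PER_MB

# Encoder 3 in the 720p example: 720 own macroblocks + 160 exchanged macroblocks.
print(buffer_bytes(720, 160))          # -> 337920 bytes, i.e. 330 KB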
Further, to obtain better parallel processing when encoding according to the H.264, VC-1 or AVS1.0-P2 standard, the loop filtering mode used by the encoders is as follows:
the boundary loop filter mode of all bands of the current frame is set and selected to be no filtering, each encoder independently performs its own loop filtering, and no data is exchanged between the encoders or between the bands.
Specifically, the loop filtering of a band is started after the reconstruction of the first macroblock of the band is finished, the reconstruction and loop filtering of macroblocks forming a macroblock-level pipeline; or, the loop filtering of a band is started after the reconstruction of the whole band is finished. Each encoder carries out the loop filtering of its bands independently, and no data needs to be exchanged between the encoders or between the bands.
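A minimal sketch (the helper name and the toy band mapping are ours) of the filtering rule this mode implies: an edge is deblocked only when both macroblocks on its two sides belong to the same band, so neither encoders nor bands ever need each other's data.

def filter_edge(mb_index, neighbour_index, band_of):
    # band_of maps a macroblock index to its band id; neighbour_index is None at
    # the frame border. With band-boundary filtering disabled, an edge is only
    # deblocked when both macroblocks belong to the same band.
    if neighbour_index is None:
        return False
    return band_of(mb_index) == band_of(neighbour_index)

band_of = lambda mb: 1 if mb < 160 else 2   # toy mapping: band 1 = macroblocks 0~159
print(filter_edge(165, 165 - 80, band_of))  # False: top edge of band 2 is a band boundary
print(filter_edge(245, 245 - 80, band_of))  # True: both macroblocks lie in band 2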
In the encoding process of the encoders, the corresponding reference data exchange process is as follows:
when the current frame is not an I frame, the reference-frame sub-slice-group data are exchanged between the encoders; the minimum reference data to be exchanged between the encoders is determined according to the overlap relationship, in image area and memory area, between the current-frame sub-slice-group of each encoder and the reconstructed sub-slice-groups, taking into account the maximum motion estimation search area in the reference frame for the current-frame sub-slice-group, and the motion estimation search area reference data of each sub-slice-group cached in each encoder are updated. The motion estimation may use a search window of fixed size and a single reference frame, in which case the minimum reference data to be exchanged between the encoders is determined from the search window size, and the motion estimation search area reference data of each sub-slice-group cached in each encoder are updated accordingly.
It should be noted that, in the encoding operation of the encoders, the current frame usually also needs to be pre-processed before video coding, including two-dimensional denoising, scaling and format conversion, where format conversion means converting the input digital video format into the format required by the encoder. The common encoded digital video format is 4:2:0, while the common input video format is the 4:2:2 specified by BT.656 or CCIR601. The H.264 fidelity range extensions support 4:2:2 and 4:4:4, in which case the input digital video is 4:2:2 or 4:4:4. The physical interface of HD video is often HD-SDI.
The current-frame pre-processing required by the plurality of encoders can take one of the following two forms:
1. the primary processor receives the current frame, performs the pre-processing centrally, buffers the frame and then distributes it to the plurality of encoders; this scheme requires a more powerful primary processor and a larger frame memory capacity;
2. each of the plurality of encoders separately pre-processes the sub-slice-groups allocated to it.
A control unit may also be provided in the primary processor of the device of the present invention, used to initialize, configure and control the other processing units of the primary processor and the plurality of encoders to complete the coding of the whole video sequence.
In the device of the present invention, in order to simplify the start-up and configuration circuits of each encoder and to reduce cost, the start-up or configuration programs or data of the encoders may come from the primary processor; when an encoder is an FPGA, the FPGA can be configured by the primary processor through a parallel or serial configuration interface; when an encoder is a general-purpose processor, a parallel or serial port can be selected for booting (Boot), and the BootROM program used in the start-up process comes from the primary processor.
The device of the present invention is typically an embedded computer system composed of one or more circuit boards; alternatively, it may be a system in which personal computers or servers are connected by a communication network, in which case an encoder may be a PC or a server.
The method and device of the present invention are applicable to a variety of coding standards and their variants.
The device provided by the present invention is applicable to various video formats, including HD video and SD video, either progressive or interlaced; for interlaced video, the two corresponding macroblocks of the top field and bottom field of an interlaced frame form a macroblock pair, and a macroblock pair needs to be allocated to the same slice group, sub-slice-group and band for encoding.
The above are only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any variation or replacement readily conceivable by those skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (16)

1. A parallel video coding method, characterized by comprising:
dividing a current frame in a video sequence into macroblocks, and allocating all the macroblocks to one or more slice groups according to a predefined rule;
dividing each slice group into one or more sub-slice-groups in raster scan order, wherein in the dividing process each macroblock must be, and can only be, assigned to one sub-slice-group;
mapping the sub-slice-groups to the encoders so that the corresponding processing load is balanced, each sub-slice-group corresponding to one and the same encoder and each encoder corresponding to one or more sub-slice-groups;
sending all the sub-slice-groups of the current frame and the coding configuration parameters to the respective encoders according to the correspondence between sub-slice-groups and encoders;
when the current frame is not an I frame, exchanging the reconstructed reference-frame sub-slice-group data between the encoders, and updating the motion estimation search area reference data of each sub-slice-group cached in each encoder;
encoding the sub-slice-groups of the current frame in parallel by the encoders, wherein in the encoding process each sub-slice-group is divided into one or more bands in raster scan order, all the macroblocks in the sub-slice-groups allocated to each encoder are encoded, reconstructed sub-slice-groups are produced, and the coded bit stream and parameters are output;
converging the bit streams and parameters output by the encoders, completing the coding of the slice groups, frames and sequence, and outputting the bit stream of the whole sequence.
2. The method according to claim 1, characterized in that the predefined rule comprises:
an interleaved pattern, a dispersed pattern, a foreground and background pattern, a box-out pattern, a raster scan pattern, a wipe pattern, or an explicit pattern in which the slice group of each macroblock is indicated by a number; or, the current frame is divided into a single slice group.
3. The method according to claim 1, characterized in that the sub-slice-group is one or more consecutive whole macroblock rows whose width equals the frame width.
4. The method according to any one of claims 1 to 3, characterized in that the processing of mapping the sub-slice-groups to the encoders so that the corresponding processing load is balanced comprises:
mapping the sub-slice-groups contained in the current frame to the encoders according to the processing capability of each encoder and the reference data transfer cost, so that the plurality of encoders finish encoding their allocated sub-slice-groups at the same time.
5. The method according to claim 1, characterized in that, in the coding of a group of pictures, the processing of dividing into a plurality of sub-slice-groups comprises:
for an I frame, predicting the encoding load from the number of macroblocks, and mapping the sub-slice-groups to the encoders according to the processing capability of the encoders and the predicted encoding load, so that the plurality of encoders finish encoding their allocated sub-slice-groups at the same time;
for the first non-I frame, when the slice group division is unchanged, performing the sub-slice-group division in the same way as for the I frame;
for the second non-I frame, predicting the encoding load of each sub-slice-group of the second non-I frame from the actually counted encoding workload of each sub-slice-group of the first non-I frame, and adjusting the sub-slice-group division of the second non-I frame according to the reference data loading cost, so that the processing times of the plurality of encoders are consistent when the second non-I frame is encoded in parallel;
for any non-I frame after the second non-I frame, predicting the encoding load of each of its sub-slice-groups from the encoding workload of each sub-slice-group of the previous frame or an earlier frame, and adjusting the sub-slice-group division of the non-I frame according to the reference frame data loading cost, so that the processing times of the plurality of encoders are consistent when the non-I frame is encoded in parallel.
6. The method according to claim 1, characterized in that the band is one or more consecutive whole macroblock rows whose width equals the frame width.
7. The method according to claim 1, characterized in that the encoding processing performed by each encoder on each macroblock in a sub-slice-group comprises:
motion estimation using a plurality of backward or forward reference frames, and motion compensation;
intra prediction selection, i.e. during intra prediction of the current macroblock, using the data of the left and upper macroblocks of the same band that have been reconstructed but not yet loop filtered; intra/inter mode selection and residual calculation;
rate control;
integer transform and quantization;
reordering and entropy coding, the entropy coding being context-based adaptive variable length coding or context-based adaptive binary arithmetic coding;
inverse quantization and inverse transform;
reconstruction;
loop filtering.
8. The method according to claim 7, characterized in that the encoding processing performed by each encoder on each macroblock in a sub-slice-group further comprises:
setting the boundary loop filter mode of all bands of the current frame to no filtering, each encoder independently performing the loop filtering of its own bands, and no information being exchanged between the encoders or between the bands.
9. The method according to claim 8, characterized in that the independently performed loop filtering comprises:
the loop filtering of a band being started after the reconstruction of the first macroblock of the band is finished, or the loop filtering of a band being started after the reconstruction of the whole band is finished.
10. A parallel video coding device, characterized by comprising a primary processor and a plurality of encoders, wherein the primary processor is used to divide a current frame to be encoded into sub-slice-groups and deliver them to the plurality of encoders respectively, the plurality of encoders encode the sub-slice-groups in parallel and output their respective coded bit streams to the primary processor, and the primary processor generates and outputs the sequence bit stream;
the primary processor comprises a slice group determining unit, a sub-slice-group determining unit and a top layer coding unit, wherein:
the slice group determining unit is used to divide the current frame of the video frame sequence into macroblocks and allocate all the macroblocks to one or more slice groups according to a predefined rule;
the sub-slice-group determining unit is used to divide each slice group of the current frame into one or more sub-slice-groups in raster scan order, each macroblock must be, and can only be, assigned to one sub-slice-group, each encoder corresponds to one or more sub-slice-groups, and each sub-slice-group corresponds to one encoder;
a sub-slice-group data transfer unit sends all the sub-slice-groups of the current frame and the coding configuration parameters to the respective encoders according to the correspondence between sub-slice-groups and encoders;
the top layer coding unit converges the bit streams and parameters output by the encoders, completes the coding of the slice groups, frames and sequence, and outputs the bit stream of the whole sequence;
the encoder comprises a sub-slice-group receiving unit, a reference data input/output unit and a coding unit, wherein:
the sub-slice-group receiving unit is used to receive the sub-slice-groups and the coding configuration parameters;
the reference data input/output unit is used to exchange reference data between the encoders: when the current frame is not an I frame, the reconstructed reference-frame sub-slice-group data are exchanged between the encoders under control, and the motion estimation search area reference data of each sub-slice-group cached in each encoder are updated;
the coding unit is used to encode the allocated sub-slice-groups: in the encoding process, each sub-slice-group is divided into one or more bands in raster scan order, all the macroblocks in the allocated sub-slice-groups are encoded, reconstructed sub-slice-groups are produced, and the coded bit stream and parameters are output.
11. The device according to claim 10, characterized in that the processing of the primary processor comprises:
dividing the entire frame into one slice group, which is further divided into a plurality of sub-slice-groups, all the sub-slice-groups being one or more consecutive whole macroblock rows whose width equals the frame width.
12. The device according to claim 10, characterized in that each of the plurality of encoders further comprises a data storage unit used to buffer the current and reconstructed sub-slice-group data.
13. The device according to claim 10, characterized in that the band is one or more consecutive whole macroblock rows whose width equals the frame width, and a given macroblock can only be assigned to one band.
14. The device according to claim 10, characterized in that the primary processor of the device further comprises a control unit used to initialize, configure and control the primary processor and the plurality of encoders to complete the coding of the whole video sequence.
15. The device according to any one of claims 10 to 13, characterized in that, in the encoding process, in each of the plurality of encoders:
the boundary loop filter mode of all bands of the current frame is set to no filtering, each encoder independently performs the loop filtering of its own bands, no information is exchanged between the encoders or between the bands, and the loop filtering of a band is started after the reconstruction of the first macroblock of the band is finished, or the loop filtering of a band is started after the reconstruction of the whole band is finished.
16. The device according to any one of claims 10 to 13, characterized in that the plurality of encoders only perform macroblock-level entropy coding and output macroblock bit streams and parameters, and a top layer coding module of the primary processor performs the slice-level entropy coding.
CN 200610113256 2006-09-20 2006-09-20 Parallel video coding method and device Active CN101150719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200610113256 CN101150719B (en) 2006-09-20 2006-09-20 Parallel video coding method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200610113256 CN101150719B (en) 2006-09-20 2006-09-20 Parallel video coding method and device

Publications (2)

Publication Number Publication Date
CN101150719A CN101150719A (en) 2008-03-26
CN101150719B true CN101150719B (en) 2010-08-11

Family

ID=39251016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200610113256 Active CN101150719B (en) 2006-09-20 2006-09-20 Parallel video coding method and device

Country Status (1)

Country Link
CN (1) CN101150719B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105472371A (en) * 2016-01-13 2016-04-06 腾讯科技(深圳)有限公司 Video code stream processing method and device

Families Citing this family (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009149671A1 (en) * 2008-06-13 2009-12-17 华为技术有限公司 Method, device, and system for packaging and encoding/decoding of video data
CN101686388B (en) 2008-09-24 2013-06-05 国际商业机器公司 Video streaming encoding device and method thereof
KR101118091B1 (en) * 2009-06-04 2012-03-09 주식회사 코아로직 Apparatus and Method for Processing Video Data
KR20250059556A (en) 2009-07-01 2025-05-02 인터디지털 브이씨 홀딩스 인코포레이티드 Methods and apparatus for signaling intra prediction for large blocks for video encoders and decoders
JP5359657B2 (en) * 2009-07-31 2013-12-04 ソニー株式会社 Image encoding apparatus and method, recording medium, and program
CN102340659B (en) * 2010-07-23 2013-09-04 联合信源数字音视频技术(北京)有限公司 Parallel mode decision device and method based on AVS (Audio Video Standard)
US8344917B2 (en) * 2010-09-30 2013-01-01 Sharp Laboratories Of America, Inc. Methods and systems for context initialization in video coding and decoding
CN101969560B (en) * 2010-11-01 2012-09-05 北京中科大洋科技发展股份有限公司 Slice code rate allocation method of Mpeg2 high-definition coder under multi-core platform
KR101824241B1 (en) * 2011-01-11 2018-03-14 에스케이 텔레콤주식회사 Intra Additional Information Encoding/Decoding Apparatus and Method
CN102281441B (en) * 2011-06-17 2017-05-24 中兴通讯股份有限公司 Method and device for parallel filtering
CN102231631B (en) * 2011-06-20 2018-08-07 深圳市中兴微电子技术有限公司 The coding method of RS encoders and RS encoders
CN103918258A (en) * 2011-11-16 2014-07-09 瑞典爱立信有限公司 Reducing amount of data in video encoding
CN103124345A (en) * 2011-11-18 2013-05-29 江南大学 Parallel encoding method
PL3944620T3 (en) * 2012-01-30 2024-10-21 Samsung Electronics Co., Ltd. Apparatus for hierarchical data unit-based video encoding and decoding comprising quantization parameter prediction
CN108989822B (en) * 2012-01-30 2021-06-08 三星电子株式会社 Apparatus for video decoding
KR20250010140A (en) * 2012-02-04 2025-01-20 엘지전자 주식회사 Video encoding method, video decoding method, and device using same
IL268801B (en) * 2012-04-13 2022-09-01 Ge Video Compression Llc Low-delay image coding
TWI584637B (en) 2012-06-29 2017-05-21 Ge影像壓縮有限公司 Video data stream concept technology
US20140112589A1 (en) * 2012-10-22 2014-04-24 Gurulogic Microsystems Oy Encoder, decoder and method
SG10201507030WA (en) * 2013-01-04 2015-10-29 Samsung Electronics Co Ltd Method for entropy-encoding slice segment and apparatus therefor, and method for entropy-decoding slice segment and apparatus therefor
CN103268263B (en) * 2013-05-14 2016-08-10 讯美电子科技有限公司 A kind of method and system of dynamic adjustment multi-graphics processor load
JP6226578B2 (en) 2013-06-13 2017-11-08 キヤノン株式会社 Image coding apparatus, image coding method, and program
CN103442196B (en) * 2013-08-16 2016-12-07 福建省物联网科学研究院 A kind of video recording method being used for touch panel device based on vector coding
CN103414902A (en) * 2013-08-26 2013-11-27 上海富瀚微电子有限公司 AVC parallel coding method used for low power consumption applications
CN103458244B (en) 2013-08-29 2017-08-29 华为技术有限公司 A kind of video-frequency compression method and video compressor
CN103916675B (en) * 2014-03-25 2017-06-20 北京工商大学 A kind of low latency inner frame coding method divided based on band
CN104980764B (en) * 2014-04-14 2019-06-21 深圳力维智联技术有限公司 Parallel decoding method, apparatus and system based on complex degree equalization
CN104038766A (en) * 2014-05-14 2014-09-10 三星电子(中国)研发中心 Device used for using image frames as basis to execute parallel video coding and method thereof
CN105992018B (en) * 2015-02-11 2019-03-26 阿里巴巴集团控股有限公司 Streaming media transcoding method and apparatus
CN104780377B (en) * 2015-03-18 2017-12-15 同济大学 A kind of parallel HEVC coded systems and method based on Distributed Computer System
CN104811696B (en) * 2015-04-17 2018-01-02 北京奇艺世纪科技有限公司 A kind of coding method of video data and device
CN113115043A (en) * 2015-08-07 2021-07-13 辉达公司 Video encoder, video encoding system and video encoding method
DK3381190T3 (en) * 2016-08-04 2021-08-16 Sz Dji Technology Co Ltd PARALLEL VIDEO CODING
CN106231320B (en) * 2016-08-31 2020-07-14 上海交通大学 Joint code rate control method and system supporting multi-machine parallel coding
CN106454354B (en) * 2016-09-07 2019-10-18 中山大学 A kind of AVS2 parallel code processing system and method
CN106849956B (en) * 2016-12-30 2020-07-07 华为机器有限公司 Compression method, decompression method, apparatus and data processing system
CN106603564A (en) * 2016-12-30 2017-04-26 上海寰视网络科技有限公司 Unlimited high-resolution image and video playing methods and systems
US10979728B2 (en) * 2017-04-24 2021-04-13 Intel Corporation Intelligent video frame grouping based on predicted performance
CN107819573A (en) * 2017-10-17 2018-03-20 东北大学 High dimension safety arithmetic coding method
CN107888917B (en) * 2017-11-28 2021-06-22 北京奇艺世纪科技有限公司 Image coding and decoding method and device
CN110971896B (en) * 2018-09-28 2022-02-18 瑞芯微电子股份有限公司 H.265 coding method and device
EP3664451B1 (en) * 2018-12-06 2020-10-21 Axis AB Method and device for encoding a plurality of image frames
EP3668096B1 (en) * 2018-12-11 2025-05-14 Axis AB Method and device for encoding a sequence of image frames using a first and a second encoder
CN109862357A (en) * 2019-01-09 2019-06-07 深圳威尔视觉传媒有限公司 Cloud game image encoding method, device, equipment and the storage medium of low latency
CN112698937A (en) * 2019-10-23 2021-04-23 深圳市茁壮网络股份有限公司 Efficient code storage method and device
CN111669596B (en) * 2020-06-17 2022-08-12 展讯通信(上海)有限公司 Video compression method and device, storage medium and terminal
CN113259675B (en) * 2021-05-06 2021-10-01 北京中科大洋科技发展股份有限公司 Ultrahigh-definition video image parallel processing method
CN114205595A (en) * 2021-12-20 2022-03-18 广东博华超高清创新中心有限公司 A low-latency transmission method and system based on AVS3 encoding and decoding
CN117412062A (en) * 2023-09-28 2024-01-16 协创芯片(上海)有限公司 A multimedia chip that supports H265 encoding
CN118646883B (en) * 2024-08-16 2024-11-08 浙江大华技术股份有限公司 Coding method and related device, equipment and medium


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5557332A (en) * 1993-03-05 1996-09-17 Sony Corporation Apparatus and method for reproducing a prediction-encoded video signal
CN1126408A (en) * 1994-06-14 1996-07-10 大宇电子株式会社 Apparatus for parallel decoding of digital video signals

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Full text.


Also Published As

Publication number Publication date
CN101150719A (en) 2008-03-26


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20210421

Address after: Unit 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong 518040

Patentee after: Honor Device Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.

CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: Unit 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong 518040

Patentee after: Honor Terminal Co.,Ltd.

Country or region after: China

Address before: 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong

Patentee before: Honor Device Co.,Ltd.

Country or region before: China