Background Art
A video sequence contains three main kinds of redundant information: spatial redundancy, temporal redundancy, and statistical redundancy. Exploiting the correlation inherent in the video signal removes this redundancy and thereby compresses the video. Video compression mainly relies on I-frame and P(B)-frame coding. I-frame coding is intra-frame coding: a single frame of the video is encoded on its own, which removes spatial redundancy. P(B)-frame coding is inter-frame coding: it uses the correlation between adjacent frames to remove temporal redundancy. Entropy coding then removes statistical redundancy.
H.264/AVC (hereinafter H.264), the latest video coding standard, introduces many advanced coding tools, such as multi-directional intra prediction, variable-block-size inter prediction, quarter-pixel motion estimation, and multiple reference frames. These give it better coding performance than earlier standards, and it has attracted wide attention and acceptance in the industry.
However, H.264 achieves its higher coding efficiency at the cost of very high encoder complexity. At the same signal-to-noise ratio, H.264 saves about 50% of the bit rate compared with H.263, but its encoding complexity is about 4 to 5 times that of H.263 and about 3 times that of MPEG-4, so its real-time encoding performance is poor. Reducing the computational complexity of the H.264 codec is therefore one of the key factors for H.264 to enter large-scale commercial use quickly and successfully.
Studies show that about 80% of the computational complexity of H.264 encoding comes from motion estimation and mode selection. Inter mode selection is especially costly: for each macroblock being encoded, the encoder performs a full search that evaluates every inter and intra prediction mode (see Fig. 1).
To obtain the best coding efficiency and to reduce the coding error caused by the single macroblock partition used in earlier video coding standards, H.264 inter coding uses variable block sizes. Each macroblock can be partitioned as 16×16, 16×8, 8×16, or 8×8. The 8×8 partition is called the sub-partition mode and can be further divided into 8×8, 8×4, 4×8, and 4×4 (see Fig. 2). The Skip mode is also supported, in which the coding mode of the corresponding macroblock in the previous reference frame is copied directly. Motion search in H.264 operates on sub-blocks, so each sub-block of a 16×16 macroblock has its own motion vector. The finer the partition, the fewer bits are needed to encode the residual, but the more motion vectors must be encoded, which can increase the total number of bits needed for the macroblock. The encoder therefore has to choose the macroblock coding mode by weighing the size of the residual after motion estimation against the cost of coding the motion vectors.
For intra prediction, H.264 exploits the spatial correlation of neighboring pixels and supports two kinds of intra coding, Intra4×4 and Intra16×16. For the luma component of the current macroblock, Intra4×4 has 9 prediction modes (vertical, horizontal, DC, diagonal down-left, diagonal down-right, vertical-right, horizontal-down, vertical-left, and horizontal-up), and Intra16×16 has 4 prediction modes (horizontal, vertical, DC, and plane). For the chroma components, intra prediction is similar to luma Intra16×16 and also has 4 prediction modes. Since a macroblock contains sixteen 4×4 blocks, intra prediction of one macroblock can require up to [4 + (16 × 9)] × 4 = 592 evaluations, which is a considerable amount of computation.
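The count quoted above can be reproduced with a few lines of Python. This is only a restatement of the arithmetic in the text; the constant names are illustrative and are not taken from the standard.

```python
# Candidate intra predictions per macroblock, following the counting in the text.
INTRA16_LUMA_MODES = 4   # Intra16x16 luma prediction modes
INTRA4_LUMA_MODES = 9    # Intra4x4 luma prediction modes per 4x4 block
BLOCKS_4X4_PER_MB = 16   # a 16x16 macroblock holds sixteen 4x4 blocks
CHROMA_MODES = 4         # chroma prediction modes

# Luma candidates: 4 for Intra16x16 plus 9 for each of the 16 blocks.
luma_candidates = INTRA16_LUMA_MODES + BLOCKS_4X4_PER_MB * INTRA4_LUMA_MODES

# Each luma candidate is combined with each chroma mode.
total_candidates = luma_candidates * CHROMA_MODES

print(luma_candidates, total_candidates)  # 148 592
```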
H.264 selects modes by the rate-distortion cost criterion: using Lagrangian rate-distortion optimization (RDO), it computes the rate-distortion cost (RD cost) of every prediction mode and selects the mode with the minimum RD cost as the optimal inter prediction mode (see Fig. 3). This method finds the inter prediction mode that is optimal in the rate-distortion sense, but it greatly increases the amount of computation, lowers H.264 encoding speed, and has become one of the bottlenecks limiting the use of H.264 in real-time video.
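As a minimal sketch of this exhaustive selection, the Python below computes a Lagrangian cost of the form distortion plus multiplier times rate for each candidate and keeps the smallest. The function names, the mode labels, and the numbers in the example are hypothetical and serve only to illustrate the principle; a real encoder obtains distortion and rate by actually encoding the macroblock in each mode.

```python
from typing import Dict, Tuple


def rd_cost(ssd: float, rate_bits: float, lagrange_multiplier: float) -> float:
    """Lagrangian cost J = SSD + lambda * R for one candidate mode."""
    return ssd + lagrange_multiplier * rate_bits


def select_mode_by_rd_cost(
    candidates: Dict[str, Tuple[float, float]],
    lagrange_multiplier: float,
) -> Tuple[str, float]:
    """Return (mode, cost) for the candidate with the smallest RD cost.

    `candidates` maps a mode name to (distortion as SSD, rate in bits).
    """
    if not candidates:
        raise ValueError("at least one candidate mode is required")
    costs = {
        mode: rd_cost(ssd, bits, lagrange_multiplier)
        for mode, (ssd, bits) in candidates.items()
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


# Illustrative numbers only, not measurements.
example = {"Inter16x16": (1200.0, 40.0), "Inter8x8": (900.0, 95.0)}
best_mode, best_cost = select_mode_by_rd_cost(example, lagrange_multiplier=10.0)
```

In this example the finer partition has lower distortion but costs more bits, so the coarser mode wins. Every mode in the dictionary has to be encoded to fill in its entry, and that is where the computational cost arises.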
Summary of the invention
The present invention is based mainly on the following technical idea:
Video content can be roughly divided into three classes of region: flat background texture, detailed background texture, and moving regions. Regions with flat background texture or gentle motion usually make up a large share of the content and mostly use Skip (mode 0) or macroblock-level prediction (modes 1 to 3). Sub-partition prediction (modes 4 to 7) is used only in regions with complex texture or violent motion. Intra prediction is selected only at the edges of the picture (see Table 1).
Various inter prediction encoding pattern utilance % in the dissimilar video sequences of table 1
Table 1 shows that the inter prediction modes are not uniformly distributed in video.
Sequences with flat texture or gentle motion, such as Akiyo, Miss America, and Mother & Daughter, mostly use Skip and macroblock-level prediction, whereas sequences with rich texture or violent motion, such as Coastguard, Foreman, and Mobile, make more use of sub-partition prediction. Overall, macroblock-level prediction is clearly used more often than sub-partition prediction, and the probability of selecting intra prediction is very low for every type of sequence.
If the set of prediction modes that a macroblock is likely to use can be pre-judged layer by layer from its flatness or degree of motion, and the set of modes that rarely occur can be excluded, then the complexity of exhaustive rate-distortion cost calculation can be greatly reduced and real-time coding performance improved. The prerequisite is to classify macroblocks quickly and accurately and then to pre-judge with a different prediction mode set for each class. The key to reducing H.264 encoder complexity is to do this without introducing extra computational overhead while retaining the high compression ratio of H.264. Against this background, the present invention proposes a fast encoding method based on layered pre-judgment of the spatio-temporal correlation and flatness features of macroblocks, with the aim of raising the overall H.264 encoding speed.
In the present invention, the first-layer pre-judgment compares the spatial and temporal correlation feature values of the current macroblock to decide between inter and intra prediction. If the temporal correlation of the macroblock is greater than its spatial correlation, no rate-distortion cost needs to be calculated for the many Intra16×16 and Intra4×4 prediction directions. If the first layer selects inter prediction, a second-layer pre-judgment follows, which uses the relationship between macroblock flatness and the inter prediction modes to decide the optimal inter prediction mode early and so reduce computational complexity.
The method of the invention comprises the following steps:
Step 1: extract the luma component values of the current macroblock from the video frame;
Step 2: pre-screen the macroblock prediction modes using the spatial and temporal correlation of the macroblock. The correlation is measured by the sum of squared differences between the original signal and the reconstructed signal, SSD(s, c | QP):
$$SSD(s, c \mid QP) = \sum_{x=1}^{16} \sum_{y=1}^{16} \left( s_Y[x, y \mid QP] - c_Y[x, y \mid QP] \right)^2 \qquad (1)$$
In the formula, 16 is the number of pixels of the macroblock in the horizontal and vertical directions; QP is the coding quantization step; s is the original video luma signal; c is the reconstructed video luma signal after encoding with a prediction mode; $s_Y[x, y \mid QP]$ and $c_Y[x, y \mid QP]$ are the values of the original and reconstructed luma signals when the quantization step is QP; and x, y give the position within the macroblock in the video frame.
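The sum of squared differences used as the correlation measure can be sketched in a few lines of Python. The function below is a generic illustration under the assumption that a block is given as a list of rows of integer luma samples; the name `ssd` and the type alias `Block` are not taken from any codec API.

```python
from typing import Sequence

Block = Sequence[Sequence[int]]  # rows of luma samples, e.g. 16 rows of 16 values


def ssd(original: Block, reconstructed: Block) -> int:
    """Sum of squared differences between two equally sized luma blocks."""
    if len(original) != len(reconstructed):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_s, row_c in zip(original, reconstructed):
        if len(row_s) != len(row_c):
            raise ValueError("rows must have the same length")
        for s, c in zip(row_s, row_c):
            diff = s - c
            total += diff * diff
    return total


# Two flat 16x16 blocks that differ by 2 at every pixel: 256 * 2^2 = 1024.
flat_a = [[10] * 16 for _ in range(16)]
flat_b = [[12] * 16 for _ in range(16)]
example_ssd = ssd(flat_a, flat_b)
```

A small SSD means the reference predicts the block well, so the same routine serves for both the spatial and the temporal measure once the appropriate reference samples are supplied.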
Specifically, Step 2 comprises the following sub-steps:
1) $SSD_{intra}$ represents the spatial correlation of the macroblock, and $SSD_{inter}$ represents its temporal correlation;
The formula for $SSD_{intra}$ is:
In the formula, $SSD_{intra}$ is the sum of squared differences obtained by vertical and horizontal prediction, using the pixels neighboring the coded macroblock as prediction pixels; $s_Y[x+m, y+n]$ is the luma value of a pixel of the current macroblock in the video frame; $c_Y[x+m-1, y+n]$ and $c_Y[x+m, y+n-1]$ are the luma values of the pixels of the macroblocks adjacent to the current macroblock in the vertical and horizontal directions; x, y give the position of the macroblock in the video frame; and m, n give the position of the pixel within the macroblock;
The formula for $SSD_{inter}$ is:
$$SSD_{inter} = \sum_{m=0}^{15} \sum_{n=0}^{15} \left( s_Y[x+m, y+n] - c_Y[x+m, y+n] \right)^2 \qquad (3)$$
In the formula, $SSD_{inter}$ is the sum of squared prediction differences between the coded macroblock and the macroblock of the previous frame; $s_Y[x+m, y+n]$ is the luma value of a pixel of the current macroblock; $c_Y[x+m, y+n]$ is the luma value of the pixel at the same position in the corresponding macroblock of the previous frame; x, y give the position of the macroblock in the video frame; and m, n give the position of the pixel within the macroblock;
2) Compare $SSD_{intra}$ and $SSD_{inter}$ to screen the type of prediction mode that the current macroblock should use. Adjustment factors α and β are introduced, and the discriminants are:
$$Th_1 = \alpha \cdot SSD_{intra} - SSD_{inter}$$
$$Th_2 = SSD_{intra} - \beta \cdot SSD_{inter} \qquad (4)$$
In formula (4), α and β are real numbers in [0, 1]. If $Th_1$ is greater than zero, the sum of squared differences of intra prediction exceeds that of inter prediction, which shows that the inter-frame correlation of the macroblock is greater than its intra-frame correlation; intra prediction is then discarded outright, the macroblock uses inter prediction, and motion estimation must be performed on it, so the method proceeds to Step 3. Otherwise, test whether $Th_2$ is less than zero. If it is, the sum of squared differences of inter prediction exceeds that of intra prediction, which shows that the intra-frame correlation of the macroblock is greater than its inter-frame correlation; inter prediction is then discarded outright, the macroblock uses intra prediction, and the method proceeds to Step 4. Otherwise, the spatio-temporal correlation of the current macroblock is not pronounced and neither inter nor intra prediction can be discarded; intra prediction is performed first, and the method then proceeds to Step 3;
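The first-layer screening in Step 2 can be sketched as a small decision function. The sketch takes the two SSD values as inputs rather than computing them, and it returns one of three outcomes. The names `FirstLayerDecision` and `first_layer_decision` are hypothetical, and the values of α and β used in the example are arbitrary illustrations, since the text only states that both lie in [0, 1].

```python
from enum import Enum


class FirstLayerDecision(Enum):
    INTER_ONLY = "inter_only"              # discard intra prediction
    INTRA_ONLY = "intra_only"              # discard inter prediction
    INTRA_THEN_INTER = "intra_then_inter"  # neither can be discarded


def first_layer_decision(
    ssd_intra: float, ssd_inter: float, alpha: float, beta: float
) -> FirstLayerDecision:
    """Apply the two discriminants Th1 and Th2 in the order given in the text."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")

    th1 = alpha * ssd_intra - ssd_inter
    if th1 > 0:
        # Inter-frame correlation dominates.
        return FirstLayerDecision.INTER_ONLY

    th2 = ssd_intra - beta * ssd_inter
    if th2 < 0:
        # Intra-frame correlation dominates.
        return FirstLayerDecision.INTRA_ONLY

    # Correlation features are not pronounced: keep both families of modes.
    return FirstLayerDecision.INTRA_THEN_INTER


# Illustrative values only (alpha = beta = 0.5 is an arbitrary choice).
d1 = first_layer_decision(1000.0, 300.0, alpha=0.5, beta=0.5)
d2 = first_layer_decision(100.0, 500.0, alpha=0.5, beta=0.5)
d3 = first_layer_decision(400.0, 500.0, alpha=0.5, beta=0.5)
```

Because α and β are at most 1, the two tests leave a band between them in which neither family of modes is discarded; smaller factors widen that band and make the screening more conservative.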
Step 3: determine the best inter coding mode. The Lagrangian rate-distortion optimization criterion is used as the basis for motion estimation and mode decision, and the inter coding mode that is optimal in the rate-distortion sense is selected. The rate-distortion cost (RD cost) is calculated as:
$$J_{mode}(s, c, MODE \mid \lambda_{mode}) = SSD(s, c \mid QP) + \lambda_{mode} \times R(s, c, MODE \mid QP) \qquad (5)$$
In the formula, MODE is the inter prediction mode used by the current macroblock; s is the original video luma signal; c is the reconstructed video luma signal after encoding with the prediction mode MODE; $\lambda_{mode}$ is the Lagrange multiplier; $J_{mode}(s, c, MODE \mid \lambda_{mode})$ is the rate-distortion cost (RD cost) under mode MODE; $R(s, c, MODE \mid QP)$ is the total number of bits, including the macroblock header, the motion vectors, and all DCT block information, and depends on the prediction mode and the quantization parameter; QP is the coding quantization step; and $SSD(s, c \mid QP)$ is the sum of squared differences between the original signal and the reconstructed signal. Macroblocks are classified according to their flatness, and the likely set of inter prediction coding modes is tried first, so that the inter prediction mode is decided quickly. This comprises the following sub-steps:
1) Characterize the macroblock flatness
Count the number of pixels at each gray level in the luma component of the macroblock to obtain its gray-level histogram. The shape of the histogram reflects how rich the image detail of the macroblock is and can be used to evaluate its flatness. The histogram always has a gray level with the largest count; the number of pixels at that gray level is defined as the maximum pixel count of the macroblock, denoted MaxValue. If MaxValue is relatively large, one gray level occurs with high probability and is the dominant gray component of the macroblock, the pixels in the macroblock are strongly correlated, and the macroblock is flat. Conversely, if the histogram is spread over many gray levels and MaxValue is relatively small, the macroblock is made up of many gray levels, its texture is detailed and changes sharply, and the macroblock is texture-rich.
For a flat macroblock, the macroblock-level inter prediction mode set (Skip, Inter16×16, Inter16×8, Inter8×16) can be selected directly and the search over the sub-partition inter prediction mode set (Inter8×8, Inter8×4, Inter4×8, Inter4×4) is skipped. Conversely, for a texture-rich macroblock, the sub-partition inter prediction modes are selected directly and the search over the macroblock-level inter prediction modes is skipped.
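The flatness measure described here amounts to taking the tallest bin of the gray-level histogram. A minimal Python sketch follows, assuming the macroblock is supplied as a list of rows of integer luma samples; the function name `max_value` is chosen to mirror MaxValue in the text and is otherwise hypothetical.

```python
from collections import Counter
from typing import Sequence


def max_value(luma_block: Sequence[Sequence[int]]) -> int:
    """Return the pixel count of the most frequent gray level in the block."""
    histogram = Counter(pixel for row in luma_block for pixel in row)
    if not histogram:
        raise ValueError("the block contains no pixels")
    return max(histogram.values())


# A perfectly flat 16x16 block: every pixel shares one gray level.
flat_block = [[128] * 16 for _ in range(16)]

# A block in which all 256 pixels have different gray levels.
busy_block = [[16 * row + col for col in range(16)] for row in range(16)]

flat_score = max_value(flat_block)
busy_score = max_value(busy_block)
```

For a 16×16 macroblock the measure ranges from 1 (all pixels different) to 256 (all pixels equal), and it needs only one pass over the block, which keeps the overhead of the pre-judgment small.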
2) Determine the macroblock type
To reduce misclassification of macroblocks whose features are not pronounced, dynamic dual thresholds are used to determine the macroblock type and the inter prediction mode set it may use. The procedure is as follows:
(1) compute the gray-level histogram of the macroblock and record its maximum pixel count MaxValue;
(2) set an upper threshold $Th_{high}$ and a lower threshold $Th_{low}$, both integers in [0, 255];
(3) if MaxValue > $Th_{high}$, the macroblock is considered flat; perform large-block inter prediction directly, determine the optimal inter prediction mode, and proceed to Step 4;
(4) if MaxValue < $Th_{low}$, the macroblock is considered texture-rich; perform small-block inter prediction directly, determine the optimal inter prediction mode, and proceed to Step 4;
(5) if $Th_{low}$ < MaxValue < $Th_{high}$, the flatness feature of the macroblock is considered not pronounced; evaluate all inter prediction modes;
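The dual-threshold classification can be sketched as a function that maps MaxValue to the set of inter modes still to be searched. The mode names follow the sets listed in the text; the function name and the threshold values in the example are hypothetical. The text uses strict inequalities and does not say what happens when MaxValue equals a threshold, so the sketch assumes that equality falls into the "not pronounced" case.

```python
from typing import Tuple

MACROBLOCK_LEVEL_MODES = ("Skip", "Inter16x16", "Inter16x8", "Inter8x16")
SUB_PARTITION_MODES = ("Inter8x8", "Inter8x4", "Inter4x8", "Inter4x4")


def candidate_inter_modes(
    max_value: int, th_low: int, th_high: int
) -> Tuple[str, ...]:
    """Return the inter prediction modes to evaluate for one macroblock."""
    if th_low > th_high:
        raise ValueError("th_low must not exceed th_high")
    if max_value > th_high:
        # Flat macroblock: only large partitions are searched.
        return MACROBLOCK_LEVEL_MODES
    if max_value < th_low:
        # Texture-rich macroblock: only sub-partitions are searched.
        return SUB_PARTITION_MODES
    # Flatness not pronounced (assumption: equality also lands here).
    return MACROBLOCK_LEVEL_MODES + SUB_PARTITION_MODES


# Illustrative thresholds only.
flat_modes = candidate_inter_modes(200, th_low=40, th_high=150)
rich_modes = candidate_inter_modes(12, th_low=40, th_high=150)
all_modes = candidate_inter_modes(90, th_low=40, th_high=150)
```

Only the modes in the returned tuple need a rate-distortion cost calculation, so a confident classification halves the inter search, while an uncertain one falls back to the full search.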
To let the thresholds adapt to changes in macroblock flatness, the following strategy is used:
(1) if the MaxValue of the current macroblock is greater than the current upper threshold $Th_{high}$, update the upper threshold:
This mean value is taken as the new upper threshold $Th_{high}$;
(2) if the MaxValue of the current macroblock is less than the current lower threshold $Th_{low}$, update the lower threshold:
This mean value is taken as the new lower threshold $Th_{low}$;
(3) if the MaxValue of the current macroblock lies between the upper threshold $Th_{high}$ and the lower threshold $Th_{low}$, keep both thresholds unchanged.
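The adaptive update can be sketched as follows. The text refers to a mean value without giving the formula here, so this sketch ASSUMES that the new threshold is the mean of the current threshold and the MaxValue that crossed it, rounded down so that the thresholds stay integers. That choice, and the function name `update_thresholds`, are assumptions made for illustration and may differ from the intended update rule.

```python
from typing import Tuple


def update_thresholds(
    max_value: int, th_low: int, th_high: int
) -> Tuple[int, int]:
    """Return the (th_low, th_high) pair after seeing one macroblock.

    ASSUMPTION: the updated threshold is the integer mean of the old
    threshold and max_value; the source only says a mean value is used.
    """
    if max_value > th_high:
        th_high = (th_high + max_value) // 2
    elif max_value < th_low:
        th_low = (th_low + max_value) // 2
    # Otherwise both thresholds are kept unchanged.
    return th_low, th_high


# Illustrative values only.
after_flat = update_thresholds(200, th_low=50, th_high=150)
after_rich = update_thresholds(10, th_low=50, th_high=150)
after_mid = update_thresholds(100, th_low=50, th_high=150)
```

Under this assumed rule a run of very flat macroblocks raises the upper threshold and a run of very textured ones lowers the lower threshold, so the band that triggers the full search widens as the content becomes more extreme.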
Step 4: according to the rate-distortion criterion, transform, quantize, and entropy-code the residual under the prediction coding mode with the minimum rate-distortion cost;
Step 5: output the final compressed video bitstream and save the related coding information.
The present invention has the following beneficial effects:
The method of the invention offers a new approach to fast predictive coding. Building on an in-depth analysis of the principles of H.264 inter prediction coding, it exploits the relationship between inter prediction modes and macroblock features, gives new definitions of macroblock spatio-temporal correlation and flatness, and uses stepwise layered pre-judgment to raise H.264 encoding speed effectively. It is suited to real-time applications such as video conferencing and remote monitoring. The coding method is general: it saves encoding time for video sequences with different degrees of motion and different texture flatness, while strictly limiting any increase in bit rate. The method was developed mainly for P frames; the same coding technique can be extended to B frames, so it is highly portable, and it can also be combined with other fast H.264 encoding methods to reduce encoder complexity and encoding time further.
Embodiment
The present invention is described in further detail below with reference to the drawings and an embodiment.
Because the human eye is more sensitive to luma information than to chroma information, the method of the invention encodes the luma component of the video sequence. A video sequence in YUV format is read first, its luma component is extracted, and the encoder calls the fast coding module of the invention to complete the video compression coding.
In a specific implementation, the following procedure is carried out on a computer:
Step 1. Start inter prediction coding (first step in Fig. 4): read in the YUV video sequence specified in the coding configuration file encoder.cfg and configure the encoder according to the parameters in that file, for example: the number of frames to encode, FramesToBeEncoded; the frame rate, FrameRate; the video width, SourceWidth, and height, SourceHeight; the output file name, OutputFile; the quantization step values, QPISlice and QPPSlice; the motion estimation search range, SearchRange; the number of reference frames, NumberReferenceFrames; whether the rate-distortion cost function is enabled, RDOptimization; and the entropy coding type, SymbolMode;
Step 2. Extract the luma of the macroblock (second step in Fig. 4): read the luma component of the current macroblock from the input video stream;
Step 3. Decide between inter and intra prediction (third step in Fig. 4): compute and compare the spatial correlation feature value $SSD_{intra}$ and the temporal correlation feature value $SSD_{inter}$ of the current macroblock;
Step 3.1 (Fig. 5, step 3.1): if the condition $Th_1 = \alpha \cdot SSD_{intra} - SSD_{inter} > 0$ holds, the inter-frame correlation of the current macroblock is greater than its intra-frame correlation; intra prediction can be discarded outright and the macroblock uses inter prediction; jump to Step 4. Otherwise, go to Step 3.2;
Step 3.2 (Fig. 5, step 3.2): test whether the condition $Th_2 = SSD_{intra} - \beta \cdot SSD_{inter} < 0$ holds. If it does, the intra-frame correlation of the current macroblock is greater than its inter-frame correlation; the macroblock uses intra prediction and inter prediction is discarded; jump to Step 5. Otherwise, the spatio-temporal correlation of the macroblock is not pronounced and both intra and inter prediction must be performed; go to Step 4;
Step 4: determine the inter prediction mode set (fourth step in Fig. 4): compute the gray-level histogram of the current macroblock, record its maximum pixel count MaxValue, and compare it with the adaptive upper threshold $Th_{high}$ and lower threshold $Th_{low}$ to judge the flatness of the macroblock;
Step 4.1 (Fig. 5, step 4.1): if MaxValue > $Th_{high}$, the macroblock is considered flat; perform macroblock-level inter prediction directly, determine the optimal inter prediction mode, update the upper threshold $Th_{high}$, and jump to Step 5. Otherwise, go to Step 4.2;
Step 4.2 (Fig. 5, step 4.2): if MaxValue < $Th_{low}$, the macroblock is considered texture-rich; perform sub-partition inter prediction directly, determine the optimal inter prediction mode, update the lower threshold $Th_{low}$, and jump to Step 5. Otherwise, go to Step 4.3;
Step 4.3 (Fig. 5, step 4.3): if $Th_{low}$ < MaxValue < $Th_{high}$, the flatness feature of the macroblock is considered not pronounced; evaluate both the macroblock-level and the sub-partition inter prediction modes, keep the upper and lower thresholds $Th_{high}$ and $Th_{low}$ unchanged, and determine the optimal inter prediction mode;
Step 5: determine the optimal inter prediction mode (fifth step in Fig. 4): according to the rate-distortion criterion, transform, quantize, and entropy-code the residual under the prediction coding mode with the minimum rate-distortion cost;
Step 6: output the compressed bitstream (sixth step in Fig. 4): output the final compressed video bitstream and save the related coding information.
The adaptive thresholds change in real time with the flatness of the macroblocks. With dual thresholds, the inter prediction mode of a macroblock with pronounced flatness features can be decided directly, while a macroblock whose flatness features are not pronounced is still handled by the standard method. This effectively reduces misclassification and strictly limits the increase in bit rate.
To verify the effectiveness of the proposed method, test sequences with different characteristics were selected: Coastguard and Foreman, with fairly violent motion; Akiyo, Miss America, and Mother & Daughter, with gentle motion; and Mobile, with rich texture and smooth motion. The method of the invention was compared with the standard H.264 encoding method in terms of encoding time, compressed bit rate, and peak signal-to-noise ratio (see Table 2). The H.264 reference software JM12.2 was used, with the following experimental configuration:
The host has a Pentium 4 2.8 GHz CPU and 512 MB of memory; 100 frames are encoded at a frame rate of 30 frames/s with an IPPP stream structure; the quantization parameter QP is set to 28; entropy coding is CAVLC; and 5 reference frames are used.
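For readers who want to reproduce a comparable setup, the stated settings can be collected as JM-style `Name = value` lines. The sketch below is an illustration only. It assumes that QP 28 applies to both I and P slices and that `SymbolMode = 0` selects CAVLC, as in the JM configuration files as far as known; both points should be checked against the encoder.cfg shipped with JM12.2. Parameters the text does not give values for, such as SourceWidth, SourceHeight, and SearchRange, are deliberately left out.

```python
from typing import Dict, List, Union

# Settings taken from the experimental description; see assumptions above.
experiment_settings: Dict[str, Union[int, float]] = {
    "FramesToBeEncoded": 100,    # 100 frames are encoded
    "FrameRate": 30.0,           # 30 frames/s
    "QPISlice": 28,              # ASSUMPTION: QP 28 for I slices
    "QPPSlice": 28,              # ASSUMPTION: QP 28 for P slices
    "NumberReferenceFrames": 5,  # 5 reference frames
    "SymbolMode": 0,             # ASSUMPTION: 0 selects CAVLC in JM
}


def to_cfg_lines(settings: Dict[str, Union[int, float]]) -> List[str]:
    """Render settings as 'Name = value' lines, one per parameter."""
    return [f"{name} = {value}" for name, value in settings.items()]


cfg_text = "\n".join(to_cfg_lines(experiment_settings))
```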
According to the statistics in Table 2, compared with the standard H.264 encoding method, the method of the invention lowers the peak signal-to-noise ratio by 0.046 dB on average, so video quality is almost unaffected; the average bit rate falls by 0.536%, so the high compression ratio is retained; and encoding time is reduced by 69.59% on average, so encoding speed is effectively improved. For sequences with flat texture and gentle motion in particular, such as Claire, Container, Miss America, and Akiyo, nearly 80% of the encoding time is saved. This is because most macroblocks in such sequences select macroblock-level prediction modes, so the optimal coding mode is more likely to be pre-judged early and a large amount of computation on sub-partition prediction modes is skipped; and because the temporal correlation between adjacent frames is greater than the spatial correlation between macroblocks within a frame, the computation of intra prediction modes is more often discarded, which greatly reduces encoding time.
Table 2. Performance comparison between the method of the invention and the standard H.264 encoding method
In Table 2, PSNR denotes the peak signal-to-noise ratio, which indicates the quality of the reconstructed video after predictive coding. The symbol "+" denotes an increase, and the symbol "-" denotes a decrease.
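The two kinds of quantity reported in such a comparison can be computed as sketched below. PSNR is given by its usual definition for 8-bit samples (peak value 255). The relative change is written so that a positive sign means an increase and a negative sign a decrease, on the assumption that the table entries are changes relative to the reference encoder. The function names and the numbers in the example are hypothetical and are not results from the experiments.

```python
import math


def psnr(mse: float, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse < 0:
        raise ValueError("mse must be non-negative")
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


def percent_change(proposed: float, reference: float) -> float:
    """Relative change in percent; negative means the proposed value is lower."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (proposed - reference) / reference * 100.0


# Illustrative values only.
example_psnr = psnr(650.25)                  # 255^2 / 650.25 = 100
example_saving = percent_change(30.0, 100.0)
```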