CN102186070A

CN102186070A - Method for realizing rapid video coding by adopting hierarchical structure anticipation

Info

Publication number: CN102186070A
Application number: CN2011100983697A
Authority: CN
Inventors: 刘鹏宇; 贾克斌
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2011-04-20
Filing date: 2011-04-20
Publication date: 2011-09-14
Anticipated expiration: 2031-04-20
Also published as: CN102186070B

Abstract

The invention discloses a fast video coding method based on hierarchical pre-decision, in the field of video compression coding. The method extracts the luminance information of the current macroblock from the original video data; defines, calculates and compares the temporal and spatial correlation features of the macroblock; and first pre-decides whether the macroblock should use inter-frame or intra-frame prediction. If inter-frame prediction is selected, the method defines and calculates the flatness feature of the macroblock and, according to that feature, classifies the macroblock as flat, texture-rich or feature-insignificant. It then pre-decides the set of inter-frame prediction modes to be searched, so that the optimal inter-frame prediction mode is determined early and fast inter-frame compression coding is achieved. Without loss of video quality, without increasing the bit rate, and while keeping the structure of the output bitstream, the method greatly reduces the complexity and time of inter-frame coding and retains the high compression ratio of the original standard algorithm.

Description

Fast video coding method with hierarchical pre-decision

Technical field

The present invention relates to the field of video compression coding, and designs and implements a fast video coding method based on hierarchical pre-decision.

Background technology

A video sequence mainly contains three kinds of redundant information: spatial redundancy, temporal redundancy and statistical redundancy. The correlation within the video information itself is normally exploited to remove this redundancy and so compress the video. Video compression mainly uses I-frame and P (B)-frame coding. An I frame is intra coded, that is, a single frame of the video is coded on its own, which removes spatial redundancy. A P (B) frame is inter coded and uses the correlation between neighboring frames to remove temporal redundancy. Entropy coding then removes the statistical redundancy.

H.264/AVC (hereinafter H.264), the latest video coding standard, introduces many advanced coding techniques, such as multi-directional intra prediction coding, variable-block-size inter prediction coding, motion estimation with 1/4-pixel accuracy, and multiple reference frames. These give it better coding performance than earlier standards, and it has been widely valued and welcomed in the industry.

However, H.264 obtains its higher coding performance at the cost of very high encoder complexity. At the same signal-to-noise ratio, H.264 saves about 50% of the bit rate compared with H.263, but the computational complexity of H.264 encoding is about 4 to 5 times that of H.263 and about 3 times that of MPEG-4, so its real-time performance is poor. Reducing the computational complexity of the H.264 codec is therefore one of the key factors for H.264 to enter large-scale commercial use quickly and successfully.

Studies show that about 80% of the computational complexity of H.264 encoding comes from motion estimation and mode selection. Inter mode selection in particular traverses, in full-search fashion, all inter and intra prediction modes for the current macroblock, which is computationally very expensive (see Fig. 1).

To obtain the best coding efficiency and to reduce the coding error that earlier video coding standards incurred by using a single macroblock partition mode, H.264 uses variable block sizes in inter coding. Each macroblock can be partitioned as 16×16, 16×8, 8×16 or 8×8. The 8×8 mode is also called the sub-partition mode and can be further divided into 8×8, 8×4, 4×8 and 4×4 (see Fig. 2). The Skip mode is also supported, which directly copies the coding mode of the corresponding macroblock in the previous reference frame. Motion search in H.264 is performed per sub-block, so every sub-block in a 16×16 macroblock has its own motion vector. The finer the partition of a macroblock, the fewer bits are needed to code the residual, but more motion vectors must then be coded, which can increase the total number of bits needed for the macroblock. The macroblock coding mode must therefore be chosen by weighing the size of the residual after motion estimation against the cost of coding the motion vectors.

In intra prediction, H.264 exploits the spatial correlation of neighboring pixels and supports two kinds of intra prediction coding, Intra4×4 and Intra16×16. For the luminance component of the current macroblock, Intra4×4 has 9 prediction directions (vertical, horizontal, DC, diagonal down-left, diagonal down-right, vertical-right, horizontal-down, vertical-left and horizontal-up), and Intra16×16 has 4 prediction directions (horizontal, vertical, DC and plane). For the chrominance component, intra prediction is similar to luma Intra16×16 and also has 4 prediction directions. The number of intra prediction evaluations for one macroblock can therefore reach [4 + (16 × 9)] × 4 = 592, a considerable amount of computation.
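The count of 592 comes from pairing every chroma direction with every luma option. The short sketch below only reproduces that arithmetic from the figures stated above; the constant and function names are illustrative and do not come from any codec API.

```python
# Worked count of intra mode evaluations per macroblock, using only the
# figures stated in the text. All names here are illustrative.
INTRA16_MODES = 4   # Intra16x16 luma prediction directions
INTRA4_MODES = 9    # Intra4x4 luma prediction directions
BLOCKS_4X4 = 16     # number of 4x4 luma blocks in one 16x16 macroblock
CHROMA_MODES = 4    # chroma prediction directions


def intra_evaluations() -> int:
    """Return the number of luma/chroma combinations to evaluate."""
    luma_options = INTRA16_MODES + BLOCKS_4X4 * INTRA4_MODES  # 4 + 144
    return luma_options * CHROMA_MODES


print(intra_evaluations())  # 592
```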

H.264 follows the rate-distortion cost criterion. Using the Lagrangian rate-distortion optimization (RDO) function, it calculates the rate-distortion cost (RD cost) of every prediction mode and selects the mode with the minimum RD cost as the optimal inter prediction mode (see Fig. 3). Although this method selects the inter prediction mode that is optimal in the rate-distortion sense, it greatly increases the amount of computation, slows H.264 encoding, and is one of the bottlenecks that limit its use in real-time video.
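The exhaustive selection described above can be sketched as a minimum search over per-mode costs. The sketch assumes the usual Lagrangian form J = D + λ·R with SSD as the distortion D; the mode names, distortion values, bit counts and λ values are made-up illustration data, not measurements.

```python
from typing import Dict, Tuple


def rd_cost(ssd: float, rate_bits: float, lagrange: float) -> float:
    """Lagrangian cost J = SSD + lambda * R for one candidate mode."""
    return ssd + lagrange * rate_bits


def best_mode(candidates: Dict[str, Tuple[float, float]], lagrange: float) -> str:
    """Return the mode with the smallest RD cost.

    `candidates` maps a mode name to (ssd, rate_bits). In a real encoder
    both numbers come from actually coding the macroblock in that mode,
    which is what makes the full search expensive.
    """
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lagrange))


# Hypothetical numbers: a finer partition lowers distortion but costs bits.
modes = {"Inter16x16": (1200.0, 40.0), "Inter8x8": (900.0, 120.0)}
print(best_mode(modes, lagrange=5.0))  # Inter16x16 (1400 vs 1500)
print(best_mode(modes, lagrange=1.0))  # Inter8x8   (1240 vs 1020)
```

The example shows why the choice depends on λ: a larger multiplier penalizes the extra motion-vector bits of fine partitions more heavily.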

Summary of the invention

The invention is based mainly on the following technical idea:

Video images can be roughly divided into three classes of region: regions with flat background texture, regions with fine background texture, and moving regions. Regions with flat background texture or slow motion usually occupy a large proportion of the video content and are mostly coded with Skip (mode 0) or macroblock-level prediction (modes 1 to 3). Sub-partition prediction (modes 4 to 7) is used only in regions with complex texture or violent motion. Intra prediction is selected only at the edges of the video image (see Table 1).

Table 1. Utilization (%) of the inter prediction coding modes in different types of video sequences

As Table 1 shows, the inter prediction modes are not uniformly distributed in video images.

Video sequences with flat texture or slow motion, such as Akiyo, Miss America and Mother & Daughter, mostly use Skip and macroblock-level prediction, while sequences with rich texture or violent motion, such as Coastguard, Foreman and Mobile, use sub-partition prediction more often. Overall, macroblock-level prediction is clearly used more than sub-partition prediction, and for every type of sequence the probability of selecting intra prediction is very low.

If the set of prediction modes that the current macroblock is likely to use can be pre-decided layer by layer from its flatness or degree of motion, and the sets of modes with a low probability of occurrence can be excluded, the encoder complexity caused by exhaustive rate-distortion cost calculation can be greatly reduced and real-time coding performance improved. The prerequisite is to classify the macroblock quickly and accurately and then select a different set of prediction modes for each class. Doing so without introducing extra computational overhead, while retaining the high compression ratio of H.264, is the key to reducing H.264 encoder complexity. Against this background, the invention proposes a fast coding method with hierarchical pre-decision based on the temporal/spatial correlation and flatness features of the macroblock, with the aim of raising the overall coding speed of H.264.

In the invention, the first-layer pre-decision is made from the relative size of the temporal and spatial correlation feature values of the current macroblock, and decides whether inter or intra prediction is used. If the temporal correlation of the macroblock is greater than its spatial correlation, no rate-distortion cost calculation is needed for the many prediction directions of Intra16×16 and Intra4×4. If the first layer selects inter prediction, a second-layer pre-decision follows. It exploits the relation between the flatness feature of the macroblock and the inter prediction modes to decide the optimal inter prediction mode early and so reduce computational complexity.

The method comprises the following steps:

Step 1: extract the luminance component values of the current macroblock from the video frame;

Step 2: pre-screen the macroblock prediction mode using the spatial and temporal correlation of the macroblock. The sum of squared differences between the original and reconstructed signals, SSD(s, c|QP), represents the macroblock correlation:

$$SSD(s, c \mid QP) = \sum_{m=1,\,n=1}^{16,\,16} \left( s_Y[x+m,\, y+n \mid QP] - c_Y[x+m,\, y+n \mid QP] \right)^2 \qquad (1)$$

In the formula, 16 is the number of pixels of the macroblock in the horizontal and in the vertical direction, QP is the coding quantization step size, s is the original video luminance signal, and c is the reconstructed video luminance signal after coding with a prediction mode. s_Y[x, y|QP] and c_Y[x, y|QP] are the values of the original and reconstructed video luminance signals when the quantization step size is QP, and x, y give the position of the macroblock in the video frame.
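Equation (1) is a plain sum of squared luminance differences over the 16×16 block. A minimal sketch, assuming the two blocks have already been cut out of their frames as 16×16 nested sequences indexed `[row][col]` (that layout and the names are choices made here for illustration):

```python
from typing import Sequence

MB_SIZE = 16  # macroblock width and height in pixels

Block = Sequence[Sequence[int]]  # 16x16 luma samples, indexed [row][col]


def ssd_block(orig: Block, recon: Block) -> int:
    """Sum of squared luma differences between two 16x16 blocks."""
    return sum(
        (orig[n][m] - recon[n][m]) ** 2
        for n in range(MB_SIZE)
        for m in range(MB_SIZE)
    )


flat_10 = [[10] * MB_SIZE for _ in range(MB_SIZE)]
flat_8 = [[8] * MB_SIZE for _ in range(MB_SIZE)]
print(ssd_block(flat_10, flat_8))  # 256 pixels * 2**2 = 1024
```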

Step 2 specifically comprises the following sub-steps:

1) SSD_intra represents the spatial correlation of the macroblock, and SSD_inter represents its temporal correlation;

SSD_intra is calculated as:

$$SSD_{intra} = \sum_{m=1,\,n=1}^{16,\,16} \left( s_Y[x+m,\, y+n] - c_Y[x+m-1,\, y+n] \right)^2 + \sum_{m=1,\,n=1}^{16,\,16} \left( s_Y[x+m,\, y+n] - c_Y[x+m,\, y+n-1] \right)^2 \qquad (2)$$

In the formula, SSD_intra is the sum of squared differences obtained by vertical and horizontal prediction with the pixels neighboring the coded macroblock as prediction pixels; s_Y[x+m, y+n] is the pixel luminance value of the current macroblock in the video frame; c_Y[x+m-1, y+n] and c_Y[x+m, y+n-1] are the luminance values of the pixels adjacent to the current macroblock in the vertical and horizontal directions; x, y give the position of the macroblock in the video frame; and m, n give the position of the pixel within the macroblock;

SSD_inter is calculated as:

$$SSD_{inter} = \sum_{m=1,\,n=1}^{16,\,16} \left( s_Y[x+m,\, y+n] - c_Y[x+m,\, y+n] \right)^2 \qquad (3)$$

In the formula, SSD_inter is the sum of squared prediction differences between the coded macroblock and the macroblock of the previous frame; s_Y[x+m, y+n] is the pixel luminance value of the current macroblock; c_Y[x+m, y+n] is the pixel luminance value of the co-located macroblock in the previous frame; x, y give the position of the macroblock in the video frame; and m, n give the position of the pixel within the macroblock;

2) compare SSD_intra and SSD_inter to screen the type of prediction mode that the current macroblock should use, introducing adjustment factors α and β. The decision formulas are:

Th₁ = α·SSD_intra − SSD_inter

Th₂ = SSD_intra − β·SSD_inter (4)

In the formulas above, α and β are real numbers in [0, 1]. If Th₁ is greater than zero, the sum of squared differences of intra prediction is greater than that of inter prediction, which shows that the inter-frame correlation of the macroblock is greater than its intra-frame correlation. Intra prediction is then discarded directly, the macroblock uses inter prediction, and motion estimation must be performed on it; go to step 3. Otherwise, test whether Th₂ is less than zero. If Th₂ is less than zero, the sum of squared differences of inter prediction is greater than that of intra prediction, which shows that the intra-frame correlation of the macroblock is greater than its inter-frame correlation. Inter prediction is then discarded directly and the macroblock uses intra prediction; go to step 4. Otherwise, the temporal/spatial correlation feature of the current macroblock is not significant and neither inter nor intra prediction can be discarded; intra prediction is performed first, followed by step 3;
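The first-layer decision of equations (2) to (4) can be sketched as follows. The sketch makes several assumptions that the text does not fix: frames are luma planes indexed `frame[row][col]` with x as the column, the offsets m, n run from 1 to 16 exactly as written, the caller guarantees that all indices are inside the frame, and the values α = β = 0.5 in the example are arbitrary.

```python
from typing import List

Frame = List[List[int]]  # luma plane, indexed frame[row][col] (assumption)
MB = 16


def ssd_intra(s: Frame, c: Frame, x: int, y: int) -> int:
    """Equation (2): squared differences to the neighbours at (-1, 0) and (0, -1)."""
    total = 0
    for n in range(1, MB + 1):
        for m in range(1, MB + 1):
            cur = s[y + n][x + m]
            total += (cur - c[y + n][x + m - 1]) ** 2  # c_Y[x+m-1, y+n]
            total += (cur - c[y + n - 1][x + m]) ** 2  # c_Y[x+m, y+n-1]
    return total


def ssd_inter(s: Frame, prev: Frame, x: int, y: int) -> int:
    """Equation (3): squared differences to the co-located previous-frame pixels."""
    return sum(
        (s[y + n][x + m] - prev[y + n][x + m]) ** 2
        for n in range(1, MB + 1)
        for m in range(1, MB + 1)
    )


def first_layer(ssd_i: float, ssd_p: float, alpha: float, beta: float) -> str:
    """Equation (4): return 'inter', 'intra' or 'both'."""
    if alpha * ssd_i - ssd_p > 0:   # Th1 > 0: discard intra prediction
        return "inter"
    if ssd_i - beta * ssd_p < 0:    # Th2 < 0: discard inter prediction
        return "intra"
    return "both"                   # correlation feature not significant


print(first_layer(1000, 100, alpha=0.5, beta=0.5))  # inter
print(first_layer(100, 1000, alpha=0.5, beta=0.5))  # intra
print(first_layer(100, 150, alpha=0.5, beta=0.5))   # both
```

Because α and β are at most 1, the two tests leave a band between them in which neither prediction type is discarded, which is what limits misclassification.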

Step 3: determine the best inter coding mode. The Lagrangian rate-distortion optimization criterion is used as the basis for motion estimation and mode selection, and the inter coding mode that is optimal in the rate-distortion sense is selected. The rate-distortion cost, that is, the RD cost, is calculated as:

$$J_{mode}(s, c, MODE \mid \lambda_{mode}) = SSD(s, c \mid QP) + \lambda_{mode} \cdot R(s, c, MODE \mid QP) \qquad (5)$$

In the formula, MODE is the inter prediction mode used by the current macroblock; s is the original video luminance signal; c is the reconstructed video luminance signal after coding with prediction mode MODE; λ_mode is the Lagrange multiplier; J_mode(s, c, MODE|λ_mode) is the rate-distortion cost (RD cost) under mode MODE; R(s, c, MODE|QP) is the total number of bits, dependent on the prediction mode and the quantization parameter, that covers the macroblock header, the motion vectors and all DCT block information; QP is the coding quantization step size; and SSD(s, c|QP) is the sum of squared differences between the original and reconstructed signals. The macroblock is classified according to its flatness feature, and the likely set of inter prediction coding modes is then selected preferentially, so that the inter prediction mode is decided quickly. Step 3 specifically comprises the following sub-steps:

1) characterize the macroblock flatness

Count the number of pixels at each gray level in the luminance component of the macroblock to obtain the macroblock gray-level histogram. Its shape reflects how rich the image detail of the macroblock is and can be used to evaluate the flatness of the macroblock. The histogram always contains one gray level with the largest ordinate; the total number of pixels at this gray level is defined as the maximum pixel count of the macroblock, denoted MaxValue. If the maximum pixel count of the histogram is relatively large, one gray level occurs with high probability and is the dominant gray component of the macroblock, the pixels in the macroblock are strongly correlated, and the macroblock is flat. Conversely, if the histogram is spread over many gray levels, the maximum pixel count is relatively small, the macroblock is made up of many gray levels with rich detail and strong variation, and the macroblock is texture-rich.

For a flat macroblock, the macroblock-level inter prediction mode set (Skip, Inter16×16, Inter16×8, Inter8×16) can be selected directly, and the traversal search over the sub-partition inter prediction mode set (Inter8×8, Inter8×4, Inter4×8, Inter4×4) is skipped. Conversely, for a texture-rich macroblock the sub-partition inter prediction modes are selected directly, and the traversal search over the macroblock-level inter prediction modes is skipped.
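MaxValue as defined above is simply the height of the tallest histogram bin. A minimal sketch, assuming the macroblock is given as a 16×16 nested sequence of 8-bit luma samples (the function name is illustrative):

```python
from collections import Counter
from typing import Sequence


def max_value(block: Sequence[Sequence[int]]) -> int:
    """Return the pixel count of the most frequent gray level in the block."""
    histogram = Counter(pixel for row in block for pixel in row)
    return max(histogram.values())


flat = [[128] * 16 for _ in range(16)]               # one gray level only
striped = [[row] * 16 for row in range(16)]          # 16 levels, 16 pixels each
print(max_value(flat))     # 256
print(max_value(striped))  # 16
```

A perfectly flat block reaches the largest possible value of 256, while a block spread evenly over many gray levels yields a small value.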

2) determine the macroblock type

To reduce misclassification of macroblocks whose features are not pronounced, dynamic dual thresholds are used to determine the macroblock type and the inter prediction mode set it may use. The procedure is as follows:

(1) calculate the macroblock gray-level histogram and record its maximum pixel count MaxValue;

(2) set an upper threshold Th_High and a lower threshold Th_Low, both integers in [0, 255];

(3) if MaxValue > Th_High, the macroblock is considered flat; perform large-size inter prediction directly, determine the optimal inter prediction mode, and go to step 4;

(4) if MaxValue < Th_Low, the macroblock is considered texture-rich; perform small-size inter prediction directly, determine the optimal inter prediction mode, and go to step 4;

(5) if Th_Low < MaxValue < Th_High, the flatness feature of the macroblock is considered not significant; perform all inter prediction modes;

So that the thresholds adapt to changes in macroblock flatness, the following strategy is used:

(1) if the MaxValue of the current macroblock is greater than the current upper threshold Th_High, update the upper threshold:

The resulting mean value is taken as the new upper threshold Th_High.

(2) if the MaxValue of the current macroblock is less than the current lower threshold Th_Low, update the lower threshold:

The resulting mean value is taken as the new lower threshold Th_Low.

(3) if the MaxValue of the current macroblock lies between the upper threshold Th_High and the lower threshold Th_Low, keep both thresholds unchanged.
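The dual-threshold classification and the adaptive update can be sketched together as below. The update formulas are not reproduced in the text, which only says that a mean value becomes the new threshold, so the update rule is passed in as a function. The default `mean_update`, which averages the old threshold with MaxValue, is purely an assumption for illustration, as are the class name and the initial thresholds 200 and 50.

```python
from typing import Callable, List

MACROBLOCK_LEVEL = ["Skip", "Inter16x16", "Inter16x8", "Inter8x16"]
SUB_PARTITION = ["Inter8x8", "Inter8x4", "Inter4x8", "Inter4x4"]


def mean_update(threshold: float, max_value: int) -> float:
    """ASSUMPTION: mean of old threshold and MaxValue; not given in the text."""
    return (threshold + max_value) / 2


class DualThreshold:
    """Second-layer pre-decision with adaptive upper and lower thresholds."""

    def __init__(self, high: float, low: float,
                 update: Callable[[float, int], float] = mean_update) -> None:
        self.high = high
        self.low = low
        self.update = update

    def candidate_modes(self, max_value: int) -> List[str]:
        """Return the inter modes to search and adapt the thresholds."""
        if max_value > self.high:          # flat macroblock
            self.high = self.update(self.high, max_value)
            return list(MACROBLOCK_LEVEL)
        if max_value < self.low:           # texture-rich macroblock
            self.low = self.update(self.low, max_value)
            return list(SUB_PARTITION)
        # Flatness not significant (equality is treated the same way here,
        # a case the text leaves open): search every mode, keep thresholds.
        return MACROBLOCK_LEVEL + SUB_PARTITION


dt = DualThreshold(high=200, low=50)       # hypothetical initial thresholds
print(dt.candidate_modes(240), dt.high)    # macroblock-level set, 220.0
print(dt.candidate_modes(30), dt.low)      # sub-partition set, 40.0
print(len(dt.candidate_modes(100)))        # 8
```

Keeping the update rule injectable means the sketch stays valid whichever averaging the original formulas actually use.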

Step 4: according to the rate-distortion criterion, transform, quantize and entropy code the residual of the prediction coding mode that has the minimum rate-distortion cost;

Step 5: output the final compressed video bitstream and save the related coding information.

The invention has the following beneficial effects:

The method proposes a new approach to fast prediction coding. Based on an in-depth analysis of the principle of H.264 inter prediction coding, it exploits the relation between inter prediction modes and macroblock features, gives new definitions of the temporal/spatial correlation and flatness features of a macroblock, and uses step-by-step hierarchical pre-decision to raise H.264 coding speed effectively. It is suitable for real-time applications such as video conferencing and remote monitoring. The coding method is general: it gives good savings in coding time for video sequences with different degrees of motion and different texture flatness, and it keeps bit-rate increase under strict control. The method is studied mainly for P frames; the same technique can be extended to B frames, is highly portable, and can be combined with other fast H.264 coding methods to reduce encoder complexity and coding time further.

Description of drawings

Fig. 1 is a schematic diagram of the candidate prediction coding modes in the standard inter prediction coding method.

Fig. 2 is a schematic diagram of inter macroblock partitioning.

Fig. 3 is a flow chart of the standard inter prediction coding method.

Fig. 4 is a structural block diagram of the coding method proposed by the invention.

Fig. 5 is a flow chart of the fast coding method proposed by the invention.

Embodiment

The invention is described in further detail below with reference to the drawings and an embodiment.

Because the human eye is more sensitive to luminance than to chrominance, the method operates on the luminance component of the video sequence. A video sequence in YUV format is read first and its luminance component extracted; the encoder then calls the fast coding module of the invention to complete video compression coding.

In a concrete implementation, the following procedure is carried out on a computer:

Step 1. Start inter prediction coding (first step in Fig. 4): read the video sequence in YUV format according to the coding configuration file encoder.cfg and configure the encoder with the parameters in that file, for example: number of frames to encode FramesToBeEncoded; frame rate FrameRate; video width SourceWidth and height SourceHeight; output file name OutputFile; quantization step values QPISlice and QPPSlice; motion estimation search range SearchRange; number of reference frames NumberReferenceFrames; whether to enable the rate-distortion cost function RDOptimization; and entropy coding type SymbolMode;

Step 2. Extract the luminance of the coded macroblock (second step in Fig. 4): read the luminance component of the current macroblock from the input video stream;

Step 3. Decide between inter and intra prediction (third step in Fig. 4): calculate and compare the spatial correlation feature value SSD_intra and the temporal correlation feature value SSD_inter of the current macroblock;

Step 3.1 (step 3.1 in Fig. 5): if the condition Th₁ = α·SSD_intra − SSD_inter > 0 holds, the inter-frame correlation of the current macroblock is greater than its intra-frame correlation; intra prediction can be discarded directly, the macroblock selects inter prediction, and the procedure jumps to step 4. Otherwise, go to step 3.2;

Step 3.2 (step 3.2 in Fig. 5): test whether the condition Th₂ = SSD_intra − β·SSD_inter < 0 holds. If it does, the intra-frame correlation of the current macroblock is greater than its inter-frame correlation; the macroblock selects intra prediction, inter prediction is discarded, and the procedure jumps to step 5. Otherwise, the temporal/spatial correlation feature of the macroblock is not significant, both intra and inter prediction must be performed, and the procedure goes to step 4;

Step 4: determine the inter prediction mode set (fourth step in Fig. 4): calculate the gray-level histogram of the current macroblock, record its maximum pixel count MaxValue, and compare it with the adaptive upper threshold Th_High and lower threshold Th_Low to judge the macroblock flatness;

Step 4.1 (step 4.1 in Fig. 5): if MaxValue > Th_High, the macroblock is considered flat; perform macroblock-level inter prediction directly, determine the optimal inter prediction mode, update the upper threshold Th_High, and jump to step 5. Otherwise, go to step 4.2;

Step 4.2 (step 4.2 in Fig. 5): if MaxValue < Th_Low, the macroblock is considered texture-rich; perform sub-partition inter prediction directly, determine the optimal inter prediction mode, update the lower threshold Th_Low, and jump to step 5. Otherwise, go to step 4.3;

Step 4.3 (step 4.3 in Fig. 5): if Th_Low < MaxValue < Th_High, the flatness feature of the macroblock is considered not significant; traverse both macroblock-level and sub-partition inter prediction, keep the thresholds Th_High and Th_Low unchanged, and determine the optimal inter prediction mode;

Step 5: code with the optimal prediction mode (fifth step in Fig. 4): according to the rate-distortion criterion, transform, quantize and entropy code the residual of the prediction coding mode that has the minimum rate-distortion cost;

Step 6: output the compressed bitstream (sixth step in Fig. 4): output the final compressed video bitstream and save the related coding information.

The adaptive thresholds change in real time with the flatness of the macroblocks. With dual thresholds, the inter prediction mode of a macroblock with a pronounced flatness feature can be decided directly, while a macroblock whose flatness feature is not pronounced is still handled by the original standard method. This effectively reduces misclassification and keeps bit-rate increase under strict control.

To verify the effectiveness of the proposed method, test sequences with different characteristics were selected: the Coastguard and Foreman sequences with more violent motion; the Akiyo, Miss America and Mother & Daughter sequences with slower motion; and the Mobile sequence with rich texture and smooth motion. The method was compared with the standard H.264 coding method in terms of coding time, compressed bit rate and peak signal-to-noise ratio (see Table 2). The H.264 reference model JM12.2 was used, with the following experimental configuration:

The host is a P4 2.8 GHz CPU with 512 MB of memory; 100 frames are coded at a frame rate of 30 frames/s; the bitstream structure is IPPP; the quantization parameter QP is set to 28; entropy coding is CAVLC; and 5 reference frames are used.
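For orientation, the settings stated above can be collected under the encoder.cfg parameter names listed in Step 1. The sketch below only renders them as `name = value` lines. That line layout and the numeric code 0 for CAVLC in SymbolMode are assumptions that should be checked against the JM documentation; settings the text does not give (resolution, search range, file names, the IPPP structure) are left out rather than guessed.

```python
# Experimental settings stated in the text, keyed by the parameter names
# from Step 1. SymbolMode = 0 is ASSUMED to select CAVLC.
EXPERIMENT = {
    "FramesToBeEncoded": 100,
    "FrameRate": 30,
    "QPISlice": 28,
    "QPPSlice": 28,
    "NumberReferenceFrames": 5,
    "SymbolMode": 0,
}


def render_cfg(params: dict) -> str:
    """Render parameters as 'name = value' lines (assumed cfg layout)."""
    return "\n".join(f"{key} = {value}" for key, value in params.items())


print(render_cfg(EXPERIMENT))
```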

According to the statistics in Table 2, compared with the standard H.264 coding method the proposed method lowers the peak signal-to-noise ratio by 0.046 dB on average, so video quality is almost unaffected; the average bit rate falls by 0.536%, so the high compression ratio is retained; and coding time is reduced by 69.59% on average, so coding speed is effectively improved. For video sequences with flat texture and slow motion in particular, such as Claire, Container, Miss America and Akiyo, the saving in coding time is close to 80%. One reason is that most macroblocks in such sequences select macroblock-level prediction modes, so the optimal coding mode is more likely to be pre-decided correctly and a large amount of computation on sub-partition prediction modes is skipped. Another is that the temporal correlation between neighboring frames is greater than the spatial correlation between macroblocks within a frame, so more intra prediction calculations are discarded, which saves a great deal of coding time.

Table 2. Performance comparison between the proposed method and the standard H.264 coding method

In Table 2, PSNR is the peak signal-to-noise ratio, which indicates the quality of the reconstructed video image after prediction coding. The symbol "+" indicates an increase; the symbol "-" indicates a decrease.
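For reference, PSNR for 8-bit luminance is conventionally computed as 10·log10(255² / MSE). The sketch below uses that standard definition; it is not taken from the patent, which does not state how the reference software computes the value, and the function name is illustrative.

```python
import math
from typing import Sequence

Plane = Sequence[Sequence[int]]  # luma samples, indexed [row][col]


def psnr(orig: Plane, recon: Plane, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized planes."""
    count = sum(len(row) for row in orig)
    sq_err = sum(
        (a - b) ** 2
        for row_o, row_r in zip(orig, recon)
        for a, b in zip(row_o, row_r)
    )
    if sq_err == 0:
        return math.inf  # identical planes
    return 10 * math.log10(peak ** 2 * count / sq_err)


black = [[0, 0], [0, 0]]
white = [[255, 255], [255, 255]]
print(psnr(black, white))  # 0.0, the worst case for 8-bit samples
print(psnr(black, black))  # inf
```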

Claims

1. A fast video coding method with hierarchical pre-decision, which screens out the optimal inter-frame prediction coding mode in advance according to the features of the current macroblock. Using hierarchical pre-decision, it first pre-decides, from the temporal/spatial correlation of the macroblock, whether the macroblock should use inter-frame or intra-frame prediction; if inter-frame prediction is selected, it then selects, according to the flatness feature of the macroblock, the set of inter-frame prediction modes with a higher probability of occurrence and discards the set with a lower probability of occurrence, thereby replacing the exhaustive traversal search of the original H.264/AVC standard inter-frame prediction algorithm, determining the optimal inter-frame prediction mode in advance, and achieving fast inter-frame compression coding; characterized in that it comprises the following steps:

Step 1: extract the luminance component values of the current macroblock from the video frame;

Step 2: pre-screen the macroblock prediction mode using the spatial and temporal correlation of the macroblock, with the sum of squared differences between the original and reconstructed signals, SSD(s, c|QP), representing the macroblock correlation:

$$SSD(s, c \mid QP) = \sum_{m=1,\,n=1}^{16,\,16} \left( s_Y[x+m,\, y+n \mid QP] - c_Y[x+m,\, y+n \mid QP] \right)^2 \qquad (1)$$

In the formula, 16 is the number of pixels of the macroblock in the horizontal and in the vertical direction, QP is the coding quantization step size, s is the original video luminance signal, c is the reconstructed video luminance signal after coding with a prediction mode, s_Y[x, y|QP] and c_Y[x, y|QP] are the values of the original and reconstructed video luminance signals when the quantization step size is QP, and x, y give the position of the macroblock in the video frame; step 2 specifically comprises the following sub-steps:

1) SSD_intra represents the spatial correlation of the macroblock, and SSD_inter represents its temporal correlation;

SSD_intra is calculated as:

$$SSD_{intra} = \sum_{m=1,\,n=1}^{16,\,16} \left( s_Y[x+m,\, y+n] - c_Y[x+m-1,\, y+n] \right)^2 + \sum_{m=1,\,n=1}^{16,\,16} \left( s_Y[x+m,\, y+n] - c_Y[x+m,\, y+n-1] \right)^2 \qquad (2)$$

In the formula, SSD_intra is the sum of squared differences obtained by vertical and horizontal prediction with the pixels neighboring the coded macroblock as prediction pixels; s_Y[x+m, y+n] is the pixel luminance value of the current macroblock in the video frame; c_Y[x+m-1, y+n] and c_Y[x+m, y+n-1] are the luminance values of the pixels adjacent to the current macroblock in the vertical and horizontal directions; x, y give the position of the macroblock in the video frame; and m, n give the position of the pixel within the macroblock;

SSD_inter is calculated as:

$$SSD_{inter} = \sum_{m=1,\,n=1}^{16,\,16} \left( s_Y[x+m,\, y+n] - c_Y[x+m,\, y+n] \right)^2 \qquad (3)$$

In the formula, SSD_inter is the sum of squared prediction differences between the coded macroblock and the macroblock of the previous frame; s_Y[x+m, y+n] is the pixel luminance value of the current macroblock; c_Y[x+m, y+n] is the pixel luminance value of the co-located macroblock in the previous frame; x, y give the position of the macroblock in the video frame; and m, n give the position of the pixel within the macroblock;

2) compare SSD_intra and SSD_inter to screen the type of prediction mode that the current macroblock should use, introducing adjustment factors α and β; the decision formulas are:

Th₁ = α·SSD_intra − SSD_inter

Th₂ = SSD_intra − β·SSD_inter (4)

In the formulas above, α and β are real numbers in [0, 1]. If Th₁ is greater than zero, the sum of squared differences of intra prediction is greater than that of inter prediction, which shows that the inter-frame correlation of the macroblock is greater than its intra-frame correlation; intra prediction is discarded directly, the macroblock uses inter prediction, motion estimation must be performed on it, and the method goes to step 3. Otherwise, test whether Th₂ is less than zero; if Th₂ is less than zero, the sum of squared differences of inter prediction is greater than that of intra prediction, which shows that the intra-frame correlation of the macroblock is greater than its inter-frame correlation; inter prediction is discarded directly, the macroblock uses intra prediction, and the method goes to step 4. Otherwise, the temporal/spatial correlation feature of the current macroblock is not significant and neither inter nor intra prediction can be discarded; intra prediction is performed first, followed by step 3;

Step 3: Determine the best inter-frame coding mode. The Lagrangian rate-distortion optimization criterion serves as the basis for motion estimation and mode selection, and the inter-frame coding mode that is optimal in the rate-distortion sense is selected; the rate-distortion cost (RD cost) is calculated according to the following formula:

$$
J_{mode}(s, c, MODE \mid \lambda_{mode}) = SSD(s, c \mid QP) + \lambda_{mode} \times R(s, c, MODE \mid QP) \tag{5}
$$

In the formula, MODE is the inter-frame prediction mode used by the current coded macroblock; s is the original video signal; c is the video signal reconstructed after coding with prediction mode MODE; λ_mode is the Lagrangian multiplier; J_mode(s, c, MODE|λ_mode) is the rate-distortion cost (RD cost) in mode MODE; R(s, c, MODE|QP) is the total number of bits, which depends on the prediction mode and the quantization parameter and includes the macroblock header information, the motion vectors and all DCT block information; QP is the coding quantization step size; SSD(s, c|QP) is the sum of squared differences between the original signal and the reconstructed signal. The macroblock is classified according to its flatness characteristic, and a set of likely inter-frame prediction coding modes is then selected preferentially so that the inter-frame prediction mode can be decided quickly; specifically, the following steps are included:

1) Characterize macroblock flatness

Counting the number of pixels at each gray level of the macroblock's luminance component gives the gray-level histogram of the macroblock. The shape of the histogram reflects the richness of image detail in the macroblock and can therefore be used to evaluate its flatness. The histogram always contains a gray level with the largest ordinate; the total number of pixels belonging to this gray level is defined as the maximum pixel count of the macroblock, denoted MaxValue;

2) Determine the macroblock type

To reduce misjudgment of macroblocks whose features are not significant, dynamic double thresholds are used to determine the macroblock type and the corresponding set of candidate inter-frame prediction modes. The specific process is as follows:

(1) Calculate the gray-level histogram of the macroblock and record its maximum pixel count MaxValue;

(2) Set the upper threshold Th_high and the lower threshold Th_low; Th_high and Th_low are both integers in [0, 255];

(3) If MaxValue > Th_high, the macroblock is considered flat; directly perform large-size inter-frame prediction, determine the optimal inter-frame prediction mode, and enter Step 4;

(4) If MaxValue < Th_low, the macroblock is considered texture-rich; directly perform small-size inter-frame prediction, determine the optimal inter-frame prediction mode, and enter Step 4;

(5) If Th_low < MaxValue < Th_high, the flatness feature of the macroblock is considered not significant, and all inter-frame prediction modes are executed;
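The histogram measure and the double-threshold test above can be sketched as follows. This is an illustrative sketch only: the function names and the type labels are hypothetical, the macroblock is assumed to be given as a list of rows of integer luminance values, and the handling of the equality cases (MaxValue equal to a threshold) is an assumption, since the text only states strict inequalities.

```python
from collections import Counter


def max_value(luma_block):
    """Height of the tallest bin in the macroblock's gray-level histogram."""
    histogram = Counter(pixel for row in luma_block for pixel in row)
    return max(histogram.values())


def classify_macroblock(max_val, th_low, th_high):
    """Classify a macroblock by flatness using the two thresholds."""
    if max_val > th_high:
        return "flat"            # large-size inter-frame prediction only
    if max_val < th_low:
        return "texture-rich"    # small-size inter-frame prediction only
    # Assumption: values equal to a threshold are treated as not significant.
    return "insignificant"       # all inter-frame prediction modes
```

For a 16×16 block in which every pixel has the same luminance, `max_value` returns 256; for a block in which all 256 pixels differ, it returns 1.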

So that the thresholds adapt to changes in macroblock flatness, the following strategy is adopted:

(1) If the MaxValue of the current coded macroblock is greater than the current upper threshold Th_high, update the upper threshold:

Use this average value as the new upper threshold Th_high;

(2) If the MaxValue of the current coded macroblock is less than the current lower threshold Th_low, update the lower threshold:

Use this average value as the new lower threshold Th_low;

(3) If the MaxValue of the current coded macroblock is between the upper threshold Th_high and the lower threshold Th_low, keep the upper and lower thresholds unchanged.
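The adaptive update can be sketched as below. The text states only that an average value becomes the new threshold and does not show which quantities are averaged; this sketch assumes, purely for illustration, that the average is taken over the current threshold and the MaxValue of the current macroblock, rounded down to keep the thresholds integers. The function name `update_thresholds` is hypothetical.

```python
def update_thresholds(max_val, th_low, th_high):
    """Return the (th_low, th_high) pair after coding one macroblock.

    ASSUMPTION: the "average value" is the integer mean of the current
    threshold and max_val; the source does not state the operands.
    """
    if max_val > th_high:
        th_high = (th_high + max_val) // 2
    elif max_val < th_low:
        th_low = (th_low + max_val) // 2
    # Otherwise max_val lies between the thresholds: leave both unchanged.
    return th_low, th_high
```

Under this assumption each threshold moves halfway toward the MaxValue that crossed it, so a run of very flat or very detailed macroblocks gradually tightens the corresponding test.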

Step 4: According to the rate-distortion criterion, transform, quantize, and entropy-encode the residual of the predictive coding mode with the minimum rate-distortion cost;
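Selecting the mode with the minimum rate-distortion cost of formula (5) can be sketched as follows. This is a minimal illustration: the function names, the dictionary layout and the mode labels in the usage example are hypothetical, and the SSD and bit count of each candidate mode are assumed to have been measured by actually coding the macroblock in that mode.

```python
def rd_cost(ssd, rate_bits, lagrange_multiplier):
    """J = SSD + lambda_mode * R, as in formula (5)."""
    return ssd + lagrange_multiplier * rate_bits


def best_mode(candidates, lagrange_multiplier):
    """Pick the candidate mode with the smallest RD cost.

    candidates maps a mode label to a (ssd, rate_bits) pair.
    """
    return min(
        candidates,
        key=lambda mode: rd_cost(*candidates[mode], lagrange_multiplier),
    )
```

For example, with `candidates = {"16x16": (1200, 40), "8x8": (900, 120)}`, a multiplier of 5 gives costs of 1400 and 1500, so the larger partition is chosen, while a multiplier of 1 gives 1240 and 1020 and favors the smaller partition. The fast method reduces the work by passing only the pre-selected subset of modes as candidates.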

Step 5: Output the final compressed video stream and save the encoding information.