Deblocking filtering method in video encoding and decoding
Technical field
The present invention relates to the field of video processing and, more specifically, to the field of video encoding and decoding.
Background art
H.264 is the latest international video compression coding standard, published by ITU-T/ISO in 2003. It greatly improves coding efficiency, and one important reason is the introduction of deblocking filtering into the encoding and decoding loop. Deblocking filtering also adds a large computational load to video compression coding: for high-definition video it accounts for about 38% of the computing time. On multi-core/many-core processors, parallelizing the deblocking filtering task has therefore become an important way to improve performance.
Existing parallel methods generally perform macroblock-level parallelism on multi-core processors with relatively few processing units; that is, instructions provided by the processor are used to assign unrelated macroblocks directly to different processing units. Before a macroblock is processed, it must wait until its related macroblocks have been processed, which incurs additional synchronization overhead. When this parallel scheme is used on a many-core processor with many processing units, the degree of parallelism is low and the processing units cannot be fully utilized, leaving many of them idle; at the same time, the synchronization overhead between macroblocks greatly reduces the speed of deblocking filtering.
To guarantee that the deblocking filtering process is exactly the same in encoding and in decoding, the deblocking filtering operations on a coded picture must be carried out in a fixed order. First, within a macroblock, the luma edges and the chroma edges are processed separately; in the prior art the edges are processed from left to right and from top to bottom, with the vertical edges processed first. Fig. 1a shows 16*16 luma data; the luma edges, drawn as black lines in the figure, are processed in the order: 1st, 2nd, 3rd and 4th from the left, then 1st, 2nd, 3rd and 4th from the top. Fig. 1b shows 8*8 chroma data; similarly, the chroma edges, drawn as black lines in the figure, are processed in the order: 1st and 2nd from the left, then 1st and 2nd from the top. Second, the macroblocks in each frame are processed in raster-scan order, and both the encoding process and the decoding process must follow this order.
Because of the above filtering order of the edges within a macroblock and the macroblock processing order over the whole picture, the current macroblock is related to three neighbouring macroblocks: the left macroblock, the upper macroblock and the upper-right macroblock, as shown in Fig. 2. Therefore, before the current macroblock is filtered, these three related macroblocks must be filtered.
In the macroblock-level parallel scheme for deblocking filtering, unrelated macroblocks are assigned directly to different processing units and filtered simultaneously. Fig. 3 shows the macroblock processing time order in the prior art: each square represents one macroblock in a frame, the number in it represents the timestamp of that macroblock, and macroblocks with the same number can be processed in parallel. As shown in Fig. 3, after filtering starts, the degree of parallelism increases by one every two timestamps. If the number of processing units is sufficient and the numbers of macroblocks along the frame height and width are H and W respectively, the maximum degree of parallelism is min(ceil(W/2), H), where min takes the minimum value and ceil returns the smallest integer greater than or equal to the given expression. If, for the current macroblock, the synchronization overhead caused by one of its related macroblocks is C, the synchronization overhead for the whole frame is approximately 3*W*H*C.
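The timestamps of Fig. 3 and the maximum degree of parallelism stated above can be reproduced with a short simulation. The following Python sketch is illustrative only and is not part of the described method: the function names are hypothetical, timestamps are assumed to start at 0, and a macroblock is assumed to become ready one time step after the latest of its left, upper and upper-right neighbours.

```python
from math import ceil


def prior_art_timestamps(W, H):
    """Earliest time step (0-based) of each macroblock when it must wait for
    its left, upper and upper-right neighbours (prior-art dependencies)."""
    ts = [[0] * W for _ in range(H)]
    for y in range(H):              # raster order: neighbours are already set
        for x in range(W):
            deps = []
            if x > 0:
                deps.append(ts[y][x - 1])          # left macroblock
            if y > 0:
                deps.append(ts[y - 1][x])          # upper macroblock
                if x + 1 < W:
                    deps.append(ts[y - 1][x + 1])  # upper-right macroblock
            ts[y][x] = 1 + max(deps) if deps else 0
    return ts


def max_parallelism(ts):
    """Largest number of macroblocks that share one timestamp."""
    counts = {}
    for row in ts:
        for t in row:
            counts[t] = counts.get(t, 0) + 1
    return max(counts.values())


def prior_art_dependency_count(W, H):
    """Exact number of (macroblock, related macroblock) pairs in a frame;
    multiplying it by C gives the synchronization overhead."""
    return (W - 1) * H + W * (H - 1) + (W - 1) * (H - 1)
```

For a 1280x720 frame divided into 16*16 macroblocks (W = 80, H = 45), the sketch counts 10551 dependencies, which is close to the approximation 3*W*H = 10800; the difference comes from macroblocks on the frame border, which have fewer neighbours.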
Summary of the invention
The object of the present invention is to overcome the above problems of the prior art: a low degree of parallelism, high synchronization overhead and slow filtering.
According to one aspect of the present invention, a deblocking filtering method in video encoding and decoding is provided, wherein, when a macroblock is filtered, the luma edges are processed in the order: 4th from the right, 3rd from the right, 2nd from the right, 1st from the top, 2nd from the top, 3rd from the top, 4th from the top, 1st from the right; and the chroma edges are processed in the order: 2nd from the right, 1st from the top, 2nd from the top, 1st from the right.
In the above method, the processor may obtain the macroblocks to be filtered either dynamically or statically.
According to another aspect of the present invention, a deblocking filtering method in video encoding and decoding is also provided, the method comprising the following steps:
1) storing, for each macroblock, the number of its unprocessed related macroblocks in a condition matrix;
2) putting the coordinates of the macroblocks corresponding to the number "0" in the condition matrix into a waiting queue, and at the same time changing that number to a negative value, wherein the waiting queue is used to store the coordinates of the macroblocks to be filtered;
3) an idle processing unit obtaining the coordinates of a macroblock to be filtered from the waiting queue, filtering that macroblock, and updating the condition matrix, wherein, during the filtering, the luma edges are processed in the order: 4th from the right, 3rd from the right, 2nd from the right, 1st from the top, 2nd from the top, 3rd from the top, 4th from the top, 1st from the right; and the chroma edges are processed in the order: 2nd from the right, 1st from the top, 2nd from the top, 1st from the right;
4) determining whether the number "0" is still present in the condition matrix; if so, going to step 3); otherwise, ending.
The present invention makes better use of idle processing units and significantly reduces the synchronization overhead compared with the prior art; in addition, each macroblock needs to synchronize with fewer macroblocks before it is filtered, which reduces extra operations and further increases the overall filtering speed.
Description of drawings
Fig. 1a and Fig. 1b are schematic diagrams of the processing order of the luma edges and the chroma edges, respectively, in the prior art;
Fig. 2 is a schematic diagram of the related macroblocks in the prior art;
Fig. 3 is a schematic diagram of the macroblock processing time order in the prior art;
Fig. 4 is a schematic diagram of the processing order of the luma edges and the chroma edges according to a preferred embodiment of the present invention;
Fig. 5 is a schematic diagram of the related macroblocks according to a preferred embodiment of the present invention;
Fig. 6 is a schematic diagram of the macroblock processing time order according to a preferred embodiment of the present invention;
Fig. 7 is a schematic diagram of the deblocking filtering process according to a preferred embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the deblocking filtering method in video encoding and decoding according to a preferred embodiment of the present invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are intended only to explain the present invention and not to limit it.
In the present invention, the correlation between macroblocks is changed by changing the filtering order of the edges within a macroblock; this does not change the edge filtering order over the whole frame and does not affect video quality. As shown in Fig. 4a, according to a preferred embodiment of the present invention, when the luma edges of each macroblock are processed, the processing order is: 4th from the right, 3rd from the right, 2nd from the right, 1st from the top, 2nd from the top, 3rd from the top, 4th from the top, 1st from the right. As shown in Fig. 4b, when the chroma edges of each macroblock are processed, the processing order is: 2nd from the right, 1st from the top, 2nd from the top, 1st from the right.
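One way to read the two luma orders is as lists of pixel offsets. The Python sketch below is an interpretation, not a statement of what Fig. 1a and Fig. 4a actually depict: it assumes that neighbouring edges are 4 pixels apart, that "1st from the left" and "1st from the top" are the macroblock's own left and top boundaries, and that "1st from the right" is the macroblock's right boundary (shared with its right neighbour). All names are hypothetical. Under these assumptions, both orders cover the same set of edges inside a frame; only the macroblock that filters each shared vertical boundary changes.

```python
MB = 16    # luma macroblock size in pixels (16*16 luma data)
STEP = 4   # ASSUMPTION: spacing between neighbouring edges, in pixels


def prior_art_edges(mx, my):
    """Luma edges of macroblock (mx, my) in the prior-art order.
    A vertical edge is ("V", global x, macroblock row); a horizontal
    edge is ("H", global y, macroblock column)."""
    x0, y0 = mx * MB, my * MB
    vertical = [("V", x0 + k * STEP, my) for k in range(4)]    # 1st..4th from the left
    horizontal = [("H", y0 + k * STEP, mx) for k in range(4)]  # 1st..4th from the top
    return vertical + horizontal


def proposed_edges(mx, my):
    """Luma edges of macroblock (mx, my) in the order of the embodiment."""
    x0, y0 = mx * MB, my * MB

    def from_right(k):   # k-th edge from the right; k = 1 is the right boundary
        return ("V", x0 + MB - (k - 1) * STEP, my)

    def from_top(k):     # k-th edge from the top; k = 1 is the top boundary
        return ("H", y0 + (k - 1) * STEP, mx)

    return [from_right(4), from_right(3), from_right(2),
            from_top(1), from_top(2), from_top(3), from_top(4),
            from_right(1)]


def frame_edges(edge_fn, W, H):
    """All edges filtered in a W x H macroblock frame, excluding edges that
    lie on the picture boundary (assumed not to be filtered)."""
    edges = set()
    for my in range(H):
        for mx in range(W):
            for kind, pos, idx in edge_fn(mx, my):
                on_border = (kind == "V" and pos in (0, W * MB)) or \
                            (kind == "H" and pos == 0)
                if not on_border:
                    edges.add((kind, pos, idx))
    return edges
```

Comparing `frame_edges(prior_art_edges, W, H)` with `frame_edges(proposed_edges, W, H)` gives equal sets for any frame size, which is consistent with the statement that the set of filtered edges in the frame is unchanged.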
Because the filtering order of the edges within a macroblock has been modified, the correlation between macroblocks is reduced: as shown in Fig. 5, the current macroblock is related only to the neighbouring left macroblock and upper macroblock. Before the current macroblock is filtered, these two related macroblocks must be filtered.
Fig. 6 shows the macroblock processing time order after the correlation between macroblocks has been reduced according to a preferred embodiment of the present invention. As in Fig. 3, each square represents a macroblock in a frame, the number represents its timestamp, and macroblocks with the same number can be processed in parallel. As shown in Fig. 6, after filtering starts, the degree of parallelism increases by one at every timestamp, which is faster than in the existing method. If the number of processing units is sufficient and the numbers of macroblocks along the frame height and width are H and W respectively, the maximum degree of parallelism is min(W, H), which is greater than or equal to the maximum degree of parallelism of the prior art. If, for the current macroblock, the synchronization overhead caused by one of its related macroblocks is C, the synchronization overhead for the whole frame is approximately 2*W*H*C.
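The effect of removing the upper-right dependency can be checked numerically. The following Python sketch is illustrative only: the function names and the `with_upper_right` flag are hypothetical, timestamps are assumed to start at 0, and a macroblock is assumed to become ready one time step after the latest of its related macroblocks.

```python
from collections import Counter


def timestamps(W, H, with_upper_right):
    """Earliest time step (0-based) of every macroblock. With
    with_upper_right=True the prior-art dependencies (left, upper,
    upper-right) are used; with False only left and upper are used."""
    ts = [[0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            deps = []
            if x > 0:
                deps.append(ts[y][x - 1])          # left macroblock
            if y > 0:
                deps.append(ts[y - 1][x])          # upper macroblock
                if with_upper_right and x + 1 < W:
                    deps.append(ts[y - 1][x + 1])  # upper-right macroblock
            ts[y][x] = 1 + max(deps) if deps else 0
    return ts


def max_parallelism(W, H, with_upper_right):
    """Largest number of macroblocks sharing one timestamp."""
    counts = Counter(t for row in timestamps(W, H, with_upper_right) for t in row)
    return max(counts.values())


def dependency_count(W, H, with_upper_right):
    """Exact number of (macroblock, related macroblock) pairs in a frame."""
    n = (W - 1) * H + W * (H - 1)
    if with_upper_right:
        n += (W - 1) * (H - 1)
    return n
```

For a 1280x720 frame divided into 16*16 macroblocks (W = 80, H = 45), the maximum degree of parallelism rises from min(40, 45) = 40 to min(80, 45) = 45, and the exact dependency count falls from 10551 to 7075, in line with the approximations 3*W*H*C and 2*W*H*C.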
Thus, the present invention allows more unrelated macroblocks to be filtered in parallel on the processor, makes better use of idle processing units, and significantly reduces the synchronization overhead compared with the prior art; in addition, each macroblock needs to synchronize with fewer macroblocks before it is filtered, which reduces extra operations and increases the overall filtering speed.
The deblocking filtering method in video encoding and decoding according to the preferred embodiment of the present invention is described in detail below with reference to Fig. 7. The method comprises the following steps:
1) Initialize the condition matrix. The condition matrix records the state of each macroblock: each number in the matrix indicates how many macroblocks must still be processed before the corresponding macroblock can be processed, that is, the number of unprocessed related macroblocks of the corresponding macroblock. The condition matrix may be stored in memory shared by the processing units of the multi-core/many-core processor.
2) Initialize the waiting queue as empty, where the waiting queue is used to store the coordinates of the macroblocks to be filtered. The waiting queue may be stored in memory shared by the processing units of the multi-core/many-core processor.
3) Scan the condition matrix, put the coordinates of the macroblocks corresponding to the number "0" in the condition matrix into the waiting queue, and at the same time change that number to a negative value, for example "-1".
4) An idle processing unit obtains the coordinates of a macroblock to be filtered from the waiting queue, filters the macroblock according to the above luma and chroma edge processing orders within a macroblock, and updates the condition matrix. Specifically, after the macroblock with coordinates (i, j) has been filtered, the numbers in the condition matrix corresponding to the macroblocks with coordinates (i+1, j) and (i, j+1) are each decremented by 1.
5) Determine whether the number "0" is still present in the condition matrix; if so, jump to step 3); otherwise, end.
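Steps 1) to 5) can be sketched as a single-threaded simulation. The Python code below is a minimal illustration under stated assumptions, not the implementation used in the experiments: idle processing units are modelled by draining the waiting queue sequentially, `i` is assumed to be the column index and `j` the row index, and `deblock_schedule` and `filter_macroblock` are hypothetical names (the latter standing in for the actual edge filtering of one macroblock).

```python
from collections import deque


def deblock_schedule(W, H, filter_macroblock=lambda i, j: None):
    """Simulate steps 1) to 5) for a frame of W x H macroblocks and return
    the list of waves; each wave is the list of (i, j) coordinates that
    were placed in the waiting queue by one pass of step 3)."""
    # Step 1: number of unprocessed related macroblocks (left and upper).
    cond = [[(1 if i > 0 else 0) + (1 if j > 0 else 0) for i in range(W)]
            for j in range(H)]
    # Step 2: the waiting queue starts empty.
    waiting = deque()
    waves = []
    while True:
        # Step 3: enqueue every macroblock whose number is 0, mark it -1.
        for j in range(H):
            for i in range(W):
                if cond[j][i] == 0:
                    waiting.append((i, j))
                    cond[j][i] = -1
        # Step 5: if the scan found no "0", every macroblock is done.
        if not waiting:
            break
        # Step 4: "idle processing units" take macroblocks from the queue.
        wave = []
        while waiting:
            i, j = waiting.popleft()
            filter_macroblock(i, j)
            wave.append((i, j))
            if i + 1 < W:
                cond[j][i + 1] -= 1   # macroblock (i+1, j) waits for one less
            if j + 1 < H:
                cond[j + 1][i] -= 1   # macroblock (i, j+1) waits for one less
        waves.append(wave)
    return waves
```

In this simulation each wave corresponds to one timestamp of Fig. 6: wave k contains the macroblocks with i + j = k, so a frame needs W + H - 1 waves and the largest wave holds min(W, H) macroblocks. On a real multi-core/many-core processor the condition matrix and the waiting queue would be shared between processing units, so their updates would need to be synchronized; that part is outside this sketch.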
The above preferred embodiment provides a deblocking filtering method in which the multi-core/many-core processor obtains the macroblocks to be filtered dynamically. A person of ordinary skill in the art will understand that the multi-core/many-core processor may also obtain the macroblocks to be filtered statically, for example by filtering the macroblocks row by row.
Filtering experiments were carried out on the Tile64 platform using both the above method of the present invention and the existing method.
Tile64 is a many-core processor released by Tilera. Its 64 processing units are connected by a high-speed 2D network and form an 8x8 array. Each processing unit executes three instructions per clock cycle and has a 16 KB L1 cache, comprising an 8 KB instruction cache and an 8 KB data cache; each processing unit also has its own direct memory access (DMA) system. The Tile64 platform can run the operations provided by the Tilera Multicore Components (TMC) library, such as binding threads to processing units and communication between processing units. The filtering test software is taken from the H.264 reference software JM15.1. The test videos include "blue sky", "pedestrian", "riverbed" and others, in four formats: 1280x720 (HD), 720x576 (SD), 352x288 (CIF) and 176x144 (QCIF).
The test results are shown in Table 1 and Table 2, where Table 1 gives the test results for each video and Table 2 gives the average of the test results for each video format. In Table 1 and Table 2, the "JM15.1" column shows the time required for serial execution of deblocking filtering, and the "speed-up ratio" is the time required by the existing method divided by the time required by the method of the present invention. As the tables show, the method of the present invention effectively increases the speed of deblocking filtering.
Table 1: Deblocking filtering test results for multiple videos
Table 2: Statistical deblocking filtering test results for multiple video formats
It should be noted and understood that various modifications and improvements may be made to the invention described in detail above without departing from the spirit and scope of the invention as claimed in the appended claims. Therefore, the scope of the claimed technical solution is not limited by any specific exemplary teaching given.