Secondary coding optimization method
Technical Field
The invention belongs to the technical field of coding, and particularly relates to a secondary coding optimization method.
Background
With the rapid development of electronic information technology and the proliferation of video data acquisition devices, digital video has become the main carrier of multimedia information. However, the data volume of uncompressed digital video is enormous: for example, 8-bit RGB color video with a resolution of 1920 × 1080 and a frame rate of 30 Hz amounts to as much as 4.89 TB per hour. Such a large amount of data poses a great challenge to the transmission and storage of video, so video compression has remained a hot field of research and application worldwide since the 1980s. With the development of digital video encoding and decoding technology, digital video applications now cover fields such as television broadcasting, digital cinema, distance education, telemedicine, video surveillance, video conferencing, and streaming-media transmission, and a number of well-known video application enterprises have emerged. To ensure interoperability between the encoding and decoding products of different manufacturers, corresponding video coding standards have been promulgated.
Since digital video contains many kinds of redundant data, no single encoding tool can by itself achieve efficient video compression. The first-generation video coding standard H.261, released in the 1980s, adopted a hybrid video coding framework containing compression tools such as prediction, transform, quantization, and entropy coding, which can effectively remove temporal, spatial, visual, and statistical (information-entropy) redundancy in digital video and realize efficient compression. The hybrid video coding structure has therefore been retained by every subsequent generation of video coding standards. The first edition of the High Efficiency Video Coding (HEVC) standard was published in 2013 by the Video Coding Experts Group (VCEG) of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) together with the Moving Picture Experts Group (MPEG) of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC); its compression performance roughly doubled that of the H.264/AVC standard, and HEVC coding methods have been further optimized in the years since.
In video coding, the quantization parameter (QP) and the Lagrange multiplier λ jointly determine the distortion of the coded video and the number of bits required for coding, and there is a close relationship between them. The HEVC encoder employs a hierarchical coding structure: the quantization parameter of a coded frame is determined by its position in a group of pictures (GOP) and the encoder input quantization parameter QP_0:

QP_HM = QP_0 + ΔQP.

In the HEVC low-delay coding configuration the GOP size is 4, and the corresponding ΔQP values are 3, 2, 3, and 1, respectively. The Lagrangian multiplier used in encoding is then computed from QP_HM:

λ_HM = w_L × 2^((QP_HM − 12)/3),

where w_L is a weighting factor related to the level to which the coded frame belongs; its value accounts, to a certain extent, for the differing rate-distortion importance of coded frames at different levels. In addition, the R-λ rate-control algorithm of HEVC first determines the Lagrange multiplier of the frame or coding tree unit (CTU) to be encoded, and then calculates the quantization parameter from it: QP = 4.2005 × ln(λ) + 13.7122.
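The QP/λ mappings above can be sketched in a few lines of Python. Here `w_level` stands in for the level-dependent weight w_L (0.57 is only an illustrative default, not the value HM actually derives from slice type and hierarchy level), and QP_0 = 32 is just an example encoder input.

```python
import math

def qp_to_lambda(qp, w_level=0.57):
    """HM-style Lagrange multiplier from a quantization parameter.

    w_level stands in for the level-dependent weight w_L; 0.57 is an
    illustrative placeholder, not the value HM actually computes.
    """
    return w_level * 2.0 ** ((qp - 12) / 3.0)

def lambda_to_qp(lmbda):
    """Inverse mapping used by the R-lambda rate control (formula from the text)."""
    return 4.2005 * math.log(lmbda) + 13.7122

# Frame-level QPs in the low-delay configuration (GOP size 4, offsets 3, 2, 3, 1)
QP0 = 32                                      # example encoder input QP
frame_qps = [QP0 + d for d in (3, 2, 3, 1)]   # -> [35, 34, 35, 33]
```

Note that the two formulas are not exact inverses of each other; the logarithmic fit is what the R-λ rate-control algorithm uses in practice.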
In the hybrid video coding framework, predictive coding contributes greatly to efficient compression: intra-frame prediction and inter-frame prediction effectively remove spatial and temporal redundancy in video data. However, predictive coding also creates strong rate-distortion dependencies between coded frames and between basic coding units; that is, the current coding decision limits the rate-distortion performance achievable by the subsequent coding process. Exploiting this rate-distortion dependency for adaptive bit-resource allocation can further improve the compression performance of the encoder, and the allocation can be realized by adjusting the quantization parameter and the Lagrange multiplier during encoding.
Disclosure of Invention
The invention optimizes coded-bit resource allocation by exploiting the rate-distortion dependency and provides a secondary coding optimization method.
The technical scheme of the invention is as follows:
the secondary coding optimization method comprises the following steps:
S1, set the start frame of the video sequence as an I frame at Level 0, treating the I frame as an independent GOP; optimize the start I frame by setting the I-frame quantization parameter QP_I according to the range of the encoder input quantization parameter QP_0;
S2, the encoder reads in the frames to be encoded of one GOP;
S3, first encoding: encode the current frame once using the quantization parameter QP_HM,i of the HEVC default setting, obtaining the frame-level temporal impact factor k_i of the current frame and the block-level temporal impact factor k_B,j of every 16 × 16 pixel block within the frame, where D_i and ε_i are the coding distortion and motion-compensated prediction error of the current frame, respectively, and D_B,j and ε_B,j are the coding distortion and motion-compensated prediction error of the j-th 16 × 16 pixel block in the current frame, respectively;
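The exact expressions for k_i and k_B,j do not survive in this text, so the sketch below assumes the simplest reading consistent with the definitions above, a ratio of coding distortion to motion-compensated prediction error, purely as an illustration; `temporal_impact_factor` and the ε notation are placeholders, not the patent's own.

```python
def temporal_impact_factor(distortion, pred_error):
    """Hypothetical form of the temporal impact factor.

    The patent defines k_i from the coding distortion D_i and the
    motion-compensated prediction error eps_i of the frame (and k_B,j
    likewise per 16x16 block); the exact formula is not reproduced in
    this text, so a simple distortion-to-error ratio is used here as
    a placeholder.
    """
    return distortion / pred_error if pred_error > 0 else 0.0

# Frame level: one factor per coded frame.
k_i = temporal_impact_factor(1200.0, 3000.0)    # -> 0.4

# Block level: one factor per 16x16 block in the frame (example values).
block_D = [50.0, 80.0, 0.0]
block_E = [100.0, 160.0, 0.0]
k_B = [temporal_impact_factor(d, e) for d, e in zip(block_D, block_E)]
```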
S4, after the first encoding finishes, restore the reference-list information of the encoder: the first encoding neither outputs the bitstream of the current frame nor stores the reconstructed image, and after encoding the picture linked list in the encoder is reset to its state before the current frame was encoded, including restoring the reference-frame identifiers in the picture linked list;
S5, judge whether a scene switch occurs; if so, go to step S6, otherwise go to step S7. The judgment is as follows: if p_i is sufficiently large relative to p̄ and p_i > 10, the current frame is judged to contain a scene switch, where p_i is the average motion-compensated prediction absolute error of the current frame and p̄ is the mean of the average motion-compensated prediction absolute errors of the previous 6 frames;
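A minimal sketch of the scene-switch test follows. The p_i > 10 condition is stated in the text, while the comparison of p_i against p̄ (the mean error of the previous 6 frames) is assumed here to be a ratio test with a hypothetical threshold `ratio`.

```python
from collections import deque

def is_scene_cut(p_i, history, ratio=2.0):
    """Scene-switch test sketched from the patent's description.

    Only the p_i > 10 condition is legible in the source; comparing
    p_i against the 6-frame mean via `ratio` is an assumption made
    for illustration.
    """
    if len(history) < 6:
        return False
    p_bar = sum(history) / len(history)
    return p_i > ratio * p_bar and p_i > 10

# Average MCP absolute error of the last 6 frames (example values).
history = deque([4.0, 4.2, 3.9, 4.1, 4.0, 4.3], maxlen=6)

print(is_scene_cut(12.5, history))   # large jump in prediction error -> True
print(is_scene_cut(4.4, history))    # normal inter-frame motion -> False
```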
S6, set the current-frame quantization parameter QP_i = QP_0 and encode; output the bitstream and store the reconstructed image in the encoder's default manner, then go to step S8;
S7, set the quantization parameter QP_i of the current frame, where QP_HM,i is the quantization parameter the current frame would be assigned in the original encoder and Round(·) is the rounding operator. If the current frame is a key frame, increasing QP_i would multiply the resulting loss of coding quality, because distortion in a key frame propagates directly to many subsequent frames; the bits saved would then not offset the total quality loss, and coding performance could ultimately drop. The quantization parameter of key frames is therefore not adjusted in this step.

From QP_i, compute the frame-level Lagrange multiplier λ_p; then obtain the Lagrange multiplier λ_n and quantization parameter QP_n of each CTU in the current frame from:
QP_n = 4.2005 × ln(λ_n) + 13.7122
where M is the number of 16 × 16 pixel blocks contained in the n-th CTU, N is the number of CTUs in the coded frame, and w_n and W_n are intermediate variables. Encode each CTU of the current frame with the computed Lagrange multiplier λ_n and quantization parameter QP_n, and output the bitstream and store the reconstructed image in the encoder's default manner;
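Of the CTU-level computation, only the λ_n → QP_n mapping is fully legible in this text, so the sketch below covers just that conversion; rounding and clamping to the valid HEVC QP range [0, 51] are practical additions, not stated in the source.

```python
import math

def ctu_qp_from_lambda(lambda_n, qp_min=0, qp_max=51):
    """Per-CTU QP from its Lagrange multiplier:
    QP_n = 4.2005 * ln(lambda_n) + 13.7122 (formula from the text).

    Rounding and clamping to the valid HEVC QP range are practical
    additions made here for robustness.
    """
    qp = 4.2005 * math.log(lambda_n) + 13.7122
    return max(qp_min, min(qp_max, round(qp)))
```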
s8, judging whether the coded frame is the last frame of the video sequence, if so, ending the coding; otherwise, continuously determining whether the encoded frame is the last frame in the current GOP, if so, returning to step S2, otherwise, returning to step S3.
The scheme of the invention differs from conventional methods as follows. A frame-level temporal impact factor k_i and a block-level temporal impact factor k_B,j are proposed to measure rate-distortion dependency. The initial I frame of the video sequence is optimized: the coding quality of the I frame bounds the rate-distortion performance achievable by the subsequent P/B frames, so moderately raising I-frame quality improves rate-distortion performance for the whole coded video. Because the rate-distortion dependency is weaker at high bit rates and stronger at low bit rates, the invention sets the I-frame quantization parameter QP_I according to the range of the encoder input quantization parameter QP_0. The frame-level factor k_i and block-level factor k_B,j are obtained with a simplified coding pass: to reduce computational complexity, the RDO mode-selection process of the first encoding skips most mode decisions and uses only the 64 × 64, 32 × 32, and 16 × 16 inter-prediction modes. Different optimization strategies are applied to a coded frame depending on whether a scene switch occurs.
The invention has the beneficial effects that:
the rate-distortion dependence strength between the current frame and the reference frame and the rate-distortion importance degree of each coding block in the current frame to the subsequent coding process are obtained through simplified first-time coding, and the rate-distortion dependence strength and the rate-distortion importance degree are further used for guiding the optimization of the coding resources of the second-time coding frame level and the CTU level. In HEVC encoder HM-16.7, the present invention achieves code rate savings of on average 5.1% and 5.3% in both low latency B frame (LDB) and low latency P frame (LDP) coding configurations, respectively, with an average increase in coding complexity of 23%.
Drawings
FIG. 1 is a principal flow diagram of the present invention;
fig. 2 is an exemplary diagram of reference relationships in HEVC low-latency coding;
FIG. 3 is a statistical chart of the frame-level temporal impact factor k_i in the present invention;
FIG. 4 is a schematic visualization example of the block-level temporal impact factor k_B,j in the present invention;
FIG. 5 is a graph comparing rate-distortion curves;
fig. 6 is a diagram illustrating the coding time increase and rate saving of the present invention with respect to an HEVC encoder.
Detailed Description
The invention is further explained below, and its effectiveness demonstrated, with reference to the drawings and a simulation example.
Examples
The embodiment is developed in Visual Studio 2013 and implemented on the HEVC reference software HM-16.7.
Fig. 1 is a flow chart of the main steps of a secondary coding optimization method, specifically including:
step 1: and optimizing the starting frame of the video sequence. According to input quantization parameter QP of encoder0Range setting I frame quantization parameter QPI:
Step 2: the frames to be encoded of a GOP are read in the HM-16.7 default way.
Step 3: perform the first encoding with a simplified encoding process. The current frame is first encoded using the HEVC default quantization-parameter setting and Lagrange-multiplier calculation, yielding frame-level and 16 × 16-pixel-block-level motion-compensated prediction errors and coding distortions; the frame-level temporal impact factor k_i of the current frame and the block-level temporal impact factor k_B,j of all 16 × 16 pixel blocks within the frame are then computed from these quantities.
Step 4: restore the reference list and related encoder state. The first encoding neither outputs the bitstream of the current frame nor stores the reconstructed image; after it completes, the picture linked list in the encoder is reset to its state before the current frame was encoded, including restoring the reference-frame identifiers in the picture linked list.
Step 5: judge whether a scene switch occurs; if so, go to step 6, otherwise go to step 7. The judgment is as follows: if p_i is sufficiently large relative to p̄ and p_i > 10, a scene switch is judged to occur at the i-th frame, where p_i is the average motion-compensated prediction absolute error of the current frame and p̄ is the mean of the average motion-compensated prediction absolute errors of the previous 6 frames.
Step 6: set the current-frame quantization parameter QP_i = QP_0 and encode; output the bitstream and store the reconstructed image in the encoder's default manner.
Step 7: set the current-frame quantization parameter QP_i, where QP_HM,i is the quantization parameter the current frame is assigned in the original HEVC encoder HM and Round(·) is the rounding operator. Note that this quantization-parameter adjustment is applied only to coded frames at levels 2 and 3; the quantization parameter of key frames keeps its setting in the original HEVC encoder.
From QP_i, compute the frame-level Lagrange multiplier λ_p; then obtain the Lagrange multiplier λ_n and quantization parameter QP_n of each CTU in the current frame from:

QP_n = 4.2005 × ln(λ_n) + 13.7122

where M is the number of 16 × 16 pixel blocks contained in the n-th CTU and N is the number of CTUs in the coded frame. Finally, when no scene switch occurs, each CTU of the current frame is encoded with the computed λ_n and QP_n, and the bitstream is output and the reconstructed image stored in the encoder's default manner.
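The key-frame exemption in step 7 amounts to a simple gate on the coding level. In this sketch, `delta` is a hypothetical stand-in for the QP adjustment term, whose exact form is not reproduced in this text.

```python
def frame_qp(qp_hm_i, level, delta=0.0):
    """Step-7 QP gating sketch.

    Only level-2 and level-3 frames receive the adjustment; key frames
    keep the original HM quantization parameter. `delta` is a stand-in
    for the elided adjustment term.
    """
    if level in (2, 3):
        return round(qp_hm_i + delta)   # adjusted and rounded, per step 7
    return qp_hm_i                      # key frame: unchanged
```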
Step 8: determine whether the coded frame is the last frame of the video sequence. If so, end encoding; if not, determine whether it is the last frame of the current GOP. If so, jump to step 2 to read the next GOP; if not, jump to step 3 to encode the next frame of the current GOP.
The bitstream produced by the coding of the invention conforms to the syntax of the HEVC standard and can be decoded by a standard HEVC decoder. Coding experiments were performed under the HEVC common test conditions with the two encoder configurations LDB and LDP; the reference relationships are shown in fig. 2. As can be seen from FIGS. 3 and 4, the proposed frame-level temporal impact factor k_i and block-level temporal impact factor k_B,j effectively characterize the rate-distortion dependency.
Fig. 5 compares the rate-distortion curves of the test sequence ParkScene under the LDB and LDP coding configurations; it can be seen that the rate-distortion performance of the invention is better than that of the original HEVC encoder HM-16.7 at both low and high bit rates.
Fig. 6 shows the coding-time increase and bit-rate saving of the invention relative to the original HEVC encoder HM-16.7: with an average 23% increase in coding complexity, the invention achieves average bit-rate savings of 5.1% and 5.3% under the LDB and LDP configurations, respectively, a substantial rate-distortion performance improvement.