Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the drawings and examples. It should be understood that the particular embodiments described herein are illustrative only and are not limiting of embodiments of the invention. It should be further noted that, for convenience of description, only some, but not all of the structures related to the embodiments of the present invention are shown in the drawings.
The terms "first," "second," and the like in the description and in the claims are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, where appropriate, so that embodiments of the present application may be implemented in sequences other than those illustrated or described herein. Objects identified by "first," "second," etc. are generally of one type, and the number of such objects is not limited; for example, the first object may be one or more objects. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
Fig. 1 is a flowchart of a content adaptive video coding method according to an embodiment of the present invention, which can be applied to coding video data. The method can be executed by a computing device, such as a notebook computer, a desktop computer, a smart phone, a server, or a tablet computer, and specifically includes the following steps:
Step S101, obtaining video data to be encoded, and dividing the video data into a plurality of image sets including continuous frame images.
The video data to be encoded includes recorded video data and video data generated in real time and required to be transmitted and displayed, such as live video data.
In one embodiment, when encoding a piece of video data, the video data is first divided into a plurality of image sets each containing successive frame images; that is, when video encoding is performed, separate video encoding is performed for each divided image set. Illustratively, the video data may be divided into successive GOPs (Groups of Pictures), where each GOP represents a group of successive pictures in the encoded video stream. For example, if each GOP contains 15 or 20 frames, the video data to be encoded is divided into a plurality of consecutive image sets of 15 to 20 frames each, and the encoding of the video data is performed in units of GOPs, as sketched below.
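As a minimal sketch of this division step (assuming the frames have already been decoded into an in-memory sequence; the names `split_into_gops`, `frames`, and `gop_size` are illustrative, not from the embodiment):

```python
from typing import List, Sequence


def split_into_gops(frames: Sequence, gop_size: int = 15) -> List[list]:
    """Divide a frame sequence into consecutive image sets (GOPs)."""
    return [list(frames[i:i + gop_size])
            for i in range(0, len(frames), gop_size)]


# Example: 47 frames with a 15-frame GOP yield sets of 15, 15, 15 and 2 frames.
gops = split_into_gops(list(range(47)), gop_size=15)
print([len(g) for g in gops])  # [15, 15, 15, 2]
```

Each resulting image set is then encoded as an independent unit, as described in the following steps.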
Step S102, determining coding features of the image set, and inputting the coding features and the set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters.
In one embodiment, the coding features of the image set may be derived by pre-encoding, such as encoding the image set with an encoder to obtain the corresponding coding features.
In one embodiment, the coding features of the image set are obtained by feature extraction and analysis of each frame image in the image set. Optionally, the coding features include motion vector features, distortion degree parameters, complexity parameters, and the like, for describing each frame image in the image set. The motion vector feature represents the degree of change between images: the more severe the change between frame images, the larger the motion vector; conversely, if the frame images describe a still picture, the motion vector is smaller. The distortion degree parameter represents the degree of distortion of the image: the higher the distortion, the higher the parameter value; conversely, a low distortion degree corresponds to a relatively lower parameter value. The complexity parameter represents the degree of complexity of the image; for example, if the image contains many different objects, the greater the pixel differences between the objects, the higher the complexity. Alternatively, the identification of the coding features may be implemented by an existing encoder module, an image processing algorithm, and the like.
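As one hedged illustration of such feature extraction (not the embodiment's own algorithm, which relies on an encoder module), the sketch below derives rough motion and complexity statistics from raw grayscale frames with NumPy; a real encoder would obtain these from motion estimation and rate-distortion analysis:

```python
from typing import Dict, List

import numpy as np


def rough_coding_features(frames: List[np.ndarray]) -> Dict[str, float]:
    """Crude per-set statistics standing in for motion/complexity features."""
    # Mean absolute frame difference: larger values indicate more motion.
    diffs = [np.mean(np.abs(b.astype(np.float32) - a.astype(np.float32)))
             for a, b in zip(frames, frames[1:])]
    motion = float(np.mean(diffs)) if diffs else 0.0
    # Mean per-frame pixel standard deviation as a stand-in for complexity.
    complexity = float(np.mean([f.std() for f in frames]))
    return {"motion": motion, "complexity": complexity}
```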
The video picture evaluation parameter is a comprehensive evaluation index representing image quality. Alternatively, the video picture evaluation parameter may be characterized by VMAF (Video Multimethod Assessment Fusion). VMAF is an objective evaluation index proposed by Netflix that combines human visual modeling with machine learning: it uses a large amount of subjective data as a training set and fuses algorithms of different evaluation dimensions through machine learning, and it is currently a mainstream objective evaluation index. It can generally be considered that the higher the VMAF score, the better the video quality; however, from the viewpoint of human perception, once the VMAF score increases beyond a certain threshold, the human eye can no longer perceive the image quality improvement. Therefore, different target VMAF values can be designed for different videos to save coding bit rate without changing the subjective quality of the videos.
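For concreteness, a VMAF score can be measured with FFmpeg's libvmaf filter, assuming an FFmpeg build compiled with libvmaf (the file names below are illustrative):

```python
import subprocess

# Hedged sketch: score a distorted encode against its reference with libvmaf.
# The first input is the distorted clip, the second the reference.
cmd = [
    "ffmpeg", "-i", "distorted.mp4", "-i", "reference.mp4",
    "-lavfi", "libvmaf", "-f", "null", "-",
]
subprocess.run(cmd, check=True)  # the VMAF score appears in FFmpeg's log output
```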
In one embodiment, the determined coding features of the image set and the set video picture evaluation parameter are input to a pre-trained machine learning model to output a code rate control parameter. The set video picture evaluation parameter can be customized according to different picture quality requirements, different playing devices, and the like, and the set value can also be adjusted. The machine learning model is a pre-trained neural network model that outputs the corresponding code rate control parameter based on the coding features of the image set and the set video picture evaluation parameter. Alternatively, the rate control parameter may be a CRF (Constant Rate Factor) or a CQF (Constant Quality Factor). CRF is one form of rate control: the smaller the CRF value, the higher the video quality, but the larger the file size; the larger the CRF value, the higher the compression rate, but the lower the video quality. Alternatively, different CRF values correspond to different code rates; the CRF values and their corresponding code rates may be recorded in a mapping table, or the relationship between CRF and code rate may be represented by a function curve, as sketched below.
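The mapping-table option mentioned above might look like the following sketch; the CRF values and bitrates here are invented placeholders, not measured data:

```python
# Hypothetical CRF-to-bitrate mapping table (kbps); numbers are placeholders.
CRF_TO_BITRATE_KBPS = {18: 6000, 23: 3200, 28: 1700, 33: 900}


def bitrate_for_crf(crf: float) -> int:
    """Return the bitrate recorded for the nearest tabulated CRF value."""
    nearest = min(CRF_TO_BITRATE_KBPS, key=lambda k: abs(k - crf))
    return CRF_TO_BITRATE_KBPS[nearest]


print(bitrate_for_crf(24))  # 3200, from the CRF 23 entry
```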
Step S103, encoding the image set according to the coding features and the code rate control parameter.
In one embodiment, after the rate control parameter is obtained through the machine learning model, final secondary encoding is performed on the image set based on the rate control parameter and the coding features determined in step S102, so as to output the code stream data.
Specifically, Fig. 2 is a flowchart of a method for performing secondary encoding based on a primary encoding result according to an embodiment of the present invention. As shown in Fig. 2, the method specifically includes:
Step S1031, determining frame type information and scene information according to the coding features.
The coding features record the frame type of each frame, such as the different I-frame, P-frame, and B-frame types. Different frame types require different encoding compression quality because of their different reference relationships. An I frame is a key frame, a fully retained picture: decoding can be completed from this frame's data alone, without reference to other frames. A P frame records the difference between the current frame and the preceding key frame or P frame: when decoding, the difference is superimposed on the previously buffered picture to generate the final picture. A B frame is a bidirectional difference frame, recording the differences from both the preceding and the following frames: when decoding a B frame, both the previously buffered picture and the decoded following picture are obtained, and the final picture is obtained by superimposing the preceding and following pictures with the current frame data.
The scene information may be divided, for example, into motion scenes and still scenes, and can be determined from the coding features by an integrated scene discrimination module. The coding features record image features such as the motion vectors, motion compensation, and motion displacement changes of each frame image, and the scene information of the image is determined by analyzing data such as the motion vectors and motion compensation.
Step S1032, performing prediction analysis according to the frame type information, the scene information and the code rate control parameter to obtain coding parameters.
The coding parameters are exemplified by the quantization parameter QP in HEVC (High Efficiency Video Coding). The quantization parameter QP is the index of the quantization step size Qstep. For luma coding, Qstep has 52 values, indexed by QP values of 0 to 51; for chroma coding, the QP is derived from the luma QP.
Taking the quantization parameter QP as an example, the coding parameter reflects the compression of spatial detail. The smaller the coding parameter value, the finer the quantization, the higher the image quality, and the longer the generated code stream: if the QP value is small, most of the detail in the image is retained; as the QP value increases, some detail is correspondingly lost and the code rate decreases. Taking the QP range 0 to 51 as an example, QP at the minimum value 0 indicates the finest quantization, while QP at the maximum value 51 indicates the coarsest quantization. The purpose of quantization is to reduce the coded length of the image by discarding information unnecessary for visual reconstruction, without reducing the visual effect.
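As a worked illustration of how QP indexes the step size: in H.264/HEVC the quantization step roughly doubles for every increase of 6 in QP, i.e. Qstep ≈ 2^((QP − 4)/6):

```python
def qstep(qp: int) -> float:
    """Approximate H.264/HEVC quantization step size for a QP in 0..51."""
    assert 0 <= qp <= 51
    return 2.0 ** ((qp - 4) / 6.0)


# The step doubles every 6 QP: QP 22 -> 8.0, QP 28 -> 16.0.
print(round(qstep(22), 2), round(qstep(28), 2))
```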
Specifically, the process of obtaining the coding parameters by performing prediction analysis based on the frame type information, the scene information, and the code rate control parameter may be implemented, for example, by an integrated HEVC encoder module. That is, the frame type information (I frame, B frame, P frame), the scene information (static scene, dynamic scene), and the code rate control parameter (CRF) together determine the final coding parameter (the frame-level QP). Illustratively, when the frame type is a key frame and the scene information indicates a dynamic scene, a lower frame-level QP value is determined for a given rate control parameter.
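A hedged sketch of how these three inputs might jointly determine a frame-level QP follows; the base mapping and offsets are invented for illustration, whereas a real encoder derives them from rate-distortion models:

```python
# Illustrative only: combine CRF, frame type and scene into a frame-level QP.
FRAME_TYPE_OFFSET = {"I": -3, "P": 0, "B": 2}   # key frames get finer QP
SCENE_OFFSET = {"dynamic": -1, "static": 1}


def frame_level_qp(crf: float, frame_type: str, scene: str) -> int:
    qp = crf + FRAME_TYPE_OFFSET[frame_type] + SCENE_OFFSET[scene]
    return max(0, min(51, round(qp)))


# A key frame in a dynamic scene at CRF 23 gets a lower QP of 19.
print(frame_level_qp(23.0, "I", "dynamic"))
```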
Step S1033, encoding the image set based on the encoding parameter.
In one embodiment, after the coding parameters are obtained, taking the frame-level QP parameter in HEVC as an example, HEVC encoding is performed on the image set to output the code stream.
In another embodiment, in order to improve the accuracy of the secondary encoding, the process of performing prediction analysis to obtain coding parameters and encoding the image set based on the coding parameters includes: performing prediction analysis to obtain a first coding parameter; determining a second coding parameter based on the first coding parameter, coding feedback information, buffer information, frame type information, and scene information; adjusting a quantization offset parameter according to the first coding parameter; and encoding the image set based on the second coding parameter and the adjusted quantization offset parameter to output bitstream data.

Taking HEVC coding as an example, the first coding parameter may be understood as basic QP information (base QP), from which the frame-level QP information is determined together with the coding feedback information, buffer information, frame type information, and scene information. The buffer information characterizes the state of the buffer memory during video coding: the larger the buffer occupancy, the larger the corresponding QP value, so as to reduce the computation and storage load of video coding. The coding feedback information may be information obtained during pre-encoding, or fed back after encoding the previous image set or video round, such as the distortion degree; if the distortion degree is high, the QP value should be reduced accordingly to improve the coding quality.

While the second coding parameter is determined from the first coding parameter, the quantization offset parameter is further adjusted according to the first coding parameter. Taking HEVC video coding as an example, the quantization offset parameter may be characterized by the cutree (CU-tree) strength, which adjusts the quantization offset according to the extent to which the current block is referenced. Specifically, if the current block is referenced, it is further determined how many of a certain number of subsequent blocks reference the current block; if the current block is heavily referenced by subsequent image blocks, it is characterized as belonging to a slowly changing scene, and the QP value is correspondingly lowered to improve the image quality. Finally, the image set is encoded using the determined second coding parameter and quantization offset parameter to output code stream data, ensuring an optimal balance between image quality and compression rate, as sketched below.
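The sketch below condenses this second-pass adjustment into toy arithmetic; all weights and thresholds are hypothetical, and the referenced-block offset merely mimics the cutree idea found in encoders such as x264/x265:

```python
def second_pass_qp(base_qp: float, buffer_fill: float, distortion: float,
                   reference_count: int) -> float:
    """Toy second-pass QP combining the signals named in the embodiment.

    base_qp: first coding parameter (base QP) from the prediction analysis.
    buffer_fill: buffer occupancy in [0, 1]; a fuller buffer raises the QP.
    distortion: feedback distortion in [0, 1]; higher distortion lowers the QP.
    reference_count: how many subsequent blocks reference the current block;
        heavily referenced blocks get a quality-raising (negative) offset,
        analogous to a cutree-style quantization offset.
    """
    qp = base_qp + 4.0 * buffer_fill - 3.0 * distortion
    cutree_offset = -0.5 * min(reference_count, 8)
    return max(0.0, min(51.0, qp + cutree_offset))


# A heavily referenced block in a slowly changing scene ends up with lower QP.
print(second_pass_qp(base_qp=26.0, buffer_fill=0.3, distortion=0.1,
                     reference_count=6))  # 23.9
```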
According to the above scheme, when video coding is performed, the video is first divided into image sets and coding features are obtained by primary encoding; a trained machine learning model is then used to output an accurate code rate control parameter; finally, secondary encoding is performed on the image sets based on the code rate control parameter and the coding features obtained during primary encoding to produce the final video coding result.
Fig. 3 is a flowchart of another content adaptive video coding method according to an embodiment of the present invention, and provides a method for determining coding characteristics of an image set, where as shown in fig. 3, the method specifically includes:
step S201, obtaining video data to be encoded, and dividing the video data into a plurality of image sets including continuous frame images.
Step S202, obtaining a preset number of frame images in the image set, encoding the preset number of frame images to obtain encoding features, and determining the encoding features as the encoding features of the image set.
In one embodiment, taking a GOP as an example, the preset number of frame images may be a miniGOP within one GOP; that is, for a GOP of 15 frames, the preset number of frame images may be 5 frames. The preset number of frame images may be pre-encoded by an encoder to obtain coding features, and these coding features are determined as the coding features of the whole image set, as sketched below.
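A minimal sketch of this sampling idea (the 5-of-15 split mirrors the example above; the `encode` callable is a hypothetical stand-in for a real pre-encoding pass):

```python
from typing import Callable, Dict, Sequence


def features_from_minigop(gop: Sequence,
                          encode: Callable[[Sequence], Dict],
                          minigop_len: int = 5) -> Dict:
    """Pre-encode only the first `minigop_len` frames of a GOP and reuse the
    resulting features as the coding features of the whole image set."""
    return encode(gop[:minigop_len])
```

Because only a third of the GOP is pre-encoded in this example, the cost of the first pass drops accordingly, which is what makes the approach attractive for real-time scenarios.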
Step S203, inputting the coding features and the set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters.
Step S204, encoding the image set according to the coding features and the code rate control parameters.
According to this scheme, a content adaptive coding technique combining two-pass encoding and machine learning is adopted for live video, and the coding configuration is dynamically adjusted according to the complexity of the video content. By obtaining the coding features from only a preset number of frame images in the image set and determining them as the coding features of the whole image set, the coding speed can be significantly improved and the amount of computation reduced, which is particularly valuable where real-time performance is required. Content adaptive coding is thus realized, better balancing video smoothness and definition, and the method can be applied to real-time live video scenes with good coding results.
Fig. 4 is a flowchart of another content adaptive video coding method according to an embodiment of the present invention, and provides a specific method for outputting a rate control parameter through a machine learning model, where the machine learning model includes a joint model formed by a first training model and a second training model, and as shown in fig. 4, the method specifically includes:
Step S301, obtaining video data to be encoded, and dividing the video data into a plurality of image sets including continuous frame images.
Step S302, determining coding features of the image set, and respectively inputting the coding features and the set video picture evaluation parameters into the first training model and the second training model to obtain a first code rate control parameter output by the first training model and a second code rate control parameter output by the second training model.
In one embodiment, the first training model is an XGBoost model and the second training model is a LightGBM model, both of which are decision-tree-based machine learning algorithms. Illustratively, the first rate control parameter output by the first training model is denoted CRF1, and the second rate control parameter output by the second training model is denoted CRF2.
Step S303, carrying out weighted average calculation on the first code rate control parameter and the second code rate control parameter to obtain the code rate control parameter.
The finally calculated rate control parameter is denoted CRF3 and is optionally calculated by the formula CRF3 = λ1·CRF1 + λ2·CRF2, where λ1 + λ2 = 1, λ1 ∈ [0, 1], and λ2 ∈ [0, 1].
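A hedged sketch of this fusion step, assuming already-trained XGBoost and LightGBM regressors with scikit-learn-style predict() methods (the λ1 value is an example choice, not mandated by the embodiment):

```python
import numpy as np

# xgb_model and lgbm_model are assumed to be trained regressors, e.g.
# xgboost.XGBRegressor and lightgbm.LGBMRegressor instances.


def fused_crf(xgb_model, lgbm_model, features: np.ndarray,
              lambda1: float = 0.5) -> float:
    """Weighted average of the two models' CRF predictions.

    features: 2D array of shape (1, n_features) for a single image set.
    lambda1 in [0, 1]; lambda2 = 1 - lambda1, so the weights sum to 1.
    """
    lambda2 = 1.0 - lambda1
    crf1 = float(xgb_model.predict(features)[0])
    crf2 = float(lgbm_model.predict(features)[0])
    return lambda1 * crf1 + lambda2 * crf2
```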
Step S304, encoding the image set according to the coding features and the code rate control parameter.
According to this method, when the code rate control parameter is output through the machine learning model, two different decision-tree-based models each output a candidate rate control parameter, and a weighted average is then taken to obtain the final code rate control parameter. This makes the obtained code rate control parameter more accurate and the final video coding effect better.
In one embodiment, before the coding features and the set video picture evaluation parameters are respectively input into the first training model and the second training model, the method further comprises: obtaining video sample data of different scene types and correspondingly different resolutions; dividing the video sample data into training set samples, test set samples, and verification set samples; and inputting these samples into the first training model and the second training model respectively for training. During model training, the scheme first distinguishes the scene types of the video pictures, such as dynamic scenes and static scenes, and trains separately on video pictures of different resolutions as sample data. In the training process, the video sample data is divided into training set samples, test set samples, and verification set samples to obtain a final trained model with good prediction performance, as sketched below.
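One conventional way to realize the three-way split is sketched below with scikit-learn; the 70/15/15 proportions are an assumption for illustration, not specified by the embodiment:

```python
from sklearn.model_selection import train_test_split


def three_way_split(X, y, seed: int = 42):
    """Split samples into training, test and verification sets (70/15/15)."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.30, random_state=seed)
    X_test, X_val, y_test, y_val = train_test_split(
        X_rest, y_rest, test_size=0.50, random_state=seed)
    return (X_train, y_train), (X_test, y_test), (X_val, y_val)
```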
Fig. 5 is a block diagram of a content adaptive video coding apparatus according to an embodiment of the present invention, where the apparatus is configured to execute the content adaptive video coding method according to the foregoing embodiment, and the apparatus has functional modules and beneficial effects corresponding to the execution method. As shown in fig. 5, the apparatus specifically includes an image set determining module 101, a code rate parameter determining module 102, and an encoding module 103, wherein,
An image set determining module 101, configured to obtain video data to be encoded, and divide the video data into a plurality of image sets including continuous frame images;
the code rate parameter determining module 102 is configured to determine coding features of the image set, input the coding features and the set video picture evaluation parameters to a pre-trained machine learning model, and output code rate control parameters;
an encoding module 103, configured to encode the image set according to the encoding feature and the rate control parameter.
According to the above scheme, when video coding is performed, the video is first divided into image sets and coding features are obtained by primary encoding; a trained machine learning model is then used to output an accurate code rate control parameter; finally, secondary encoding is performed on the image sets based on the code rate control parameter and the coding features obtained during primary encoding to produce the final video coding result.
In one possible embodiment, the code rate parameter determining module 102 is specifically configured to:
acquiring a preset number of frame images in the image set;
and encoding the preset number of frame images to obtain coding features, and determining the coding features as the coding features of the image set.
In one possible embodiment, the machine learning model includes a joint model composed of a first training model and a second training model, and the code rate parameter determining module 102 is specifically configured to:
inputting the coding characteristics and the set video picture evaluation parameters into the first training model and the second training model respectively to obtain a first code rate control parameter output by the first training model and a second code rate control parameter output by the second training model;
and carrying out weighted average calculation on the first code rate control parameter and the second code rate control parameter to obtain the code rate control parameter.
In one possible embodiment, the code rate parameter determining module 102 is further configured to:
Before the coding characteristics and the set video picture evaluation parameters are respectively input into the first training model and the second training model, video sample data of different scene types and corresponding different resolutions are obtained;
Dividing the video sample data into a training set sample, a test set sample and a verification set sample, and respectively inputting the training set sample, the test set sample and the verification set sample into the first training model and the second training model for training.
In one possible embodiment, the encoding module 103 is specifically configured to:
determining frame type information and scene information according to the coding features;
Performing predictive analysis according to the frame type information, the scene information and the code rate control parameter to obtain coding parameters;
The set of images is encoded based on the encoding parameters.
In one possible embodiment, the encoding module 103 is specifically configured to:
Performing predictive analysis to obtain a first coding parameter;
and determining a second coding parameter based on the first coding parameter, the coding feedback information, the buffer information, the frame type information and the scene information.
In one possible embodiment, the encoding module 103 is specifically configured to:
Adjusting a quantization offset parameter according to the first coding parameter;
and encoding the image set according to the second encoding parameter and the adjusted quantization offset parameter to output code stream data.
Fig. 6 is a schematic structural diagram of a content adaptive video coding device according to an embodiment of the present invention. As shown in Fig. 6, the device includes a processor 201, a memory 202, an input device 203, and an output device 204. The number of processors 201 in the device may be one or more; one processor 201 is taken as an example in Fig. 6. The processor 201, the memory 202, the input device 203, and the output device 204 in the device may be connected by a bus or in other manners; a bus connection is taken as an example in Fig. 6. The memory 202, as a computer-readable storage medium, is used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the content adaptive video coding method in the embodiments of the present invention. The processor 201 executes the various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the memory 202, i.e., implements the content adaptive video coding method described above. The input device 203 may be used to receive entered numeric or character information and to generate key signal inputs related to user settings and function control of the device. The output device 204 may include a display device such as a display screen.
The embodiment of the present invention also provides a storage medium containing computer executable instructions, which when executed by a computer processor, are configured to perform a content adaptive video encoding method described in the foregoing embodiment, specifically including:
acquiring video data to be encoded, and dividing the video data into a plurality of image sets containing continuous frame images;
determining coding characteristics of the image set, and inputting the coding characteristics and set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters;
And encoding the image set according to the encoding characteristics and the code rate control parameters.
It should be noted that, in the above embodiment of the content adaptive video coding apparatus, the included units and modules are divided only according to functional logic and are not limited to the above division, so long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not used to limit the protection scope of the embodiments of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the embodiments of the present invention are not limited to the particular embodiments described herein, but are capable of numerous obvious changes, rearrangements and substitutions without departing from the scope of the embodiments of the present invention. Therefore, while the embodiments of the present invention have been described in connection with the above embodiments, the embodiments of the present invention are not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the embodiments of the present invention, and the scope of the embodiments of the present invention is determined by the scope of the appended claims.