Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the drawings and examples. It should be understood that the particular embodiments described herein are illustrative only and are not limiting of embodiments of the invention. It should be further noted that, for convenience of description, only some, but not all of the structures related to the embodiments of the present invention are shown in the drawings.
The terms "first," "second," and the like in the description and in the claims are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, where appropriate, so that embodiments of the present application may be implemented in sequences other than those illustrated or described herein. Objects identified by "first," "second," etc. are generally of one type, and the number of such objects is not limited; for example, the first object may be one or more objects. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
Fig. 1 is a flowchart of a content adaptive video coding method according to an embodiment of the present invention, which can be applied to coding video data. The method can be executed by a computing device, such as a notebook computer, a desktop computer, a smart phone, a server, or a tablet computer, and specifically includes the following steps:
Step S101, obtaining video data to be encoded, and dividing the video data into a plurality of image sets including continuous frame images.
The video data to be encoded includes recorded video data and video data generated in real time and required to be transmitted and displayed, such as live video data.
In one embodiment, when encoding a piece of video data, the video data is first divided into a plurality of image sets each containing successive frame images; that is, when video encoding is performed, separate video encoding is performed for each divided image set. Illustratively, the video data may be divided into successive GOPs (Groups of Pictures), where each GOP represents a group of successive pictures in the encoded video stream. For example, if each GOP contains 15 or 20 frames, the video data to be encoded is divided into a plurality of consecutive image sets of 15 to 20 frames each, and the encoding of the video data is performed in units of GOPs, as sketched below.
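As a minimal sketch of this division step (assuming the frames have already been decoded into an in-memory sequence; the names `split_into_gops`, `frames`, and `gop_size` are illustrative, not from the embodiment):

```python
from typing import List, Sequence


def split_into_gops(frames: Sequence, gop_size: int = 15) -> List[list]:
    """Divide a frame sequence into consecutive image sets (GOPs)."""
    return [list(frames[i:i + gop_size])
            for i in range(0, len(frames), gop_size)]


# Example: 47 frames with a 15-frame GOP yield sets of 15, 15, 15 and 2 frames.
gops = split_into_gops(list(range(47)), gop_size=15)
print([len(g) for g in gops])  # [15, 15, 15, 2]
```

Each resulting image set is then encoded as an independent unit, as described in the following steps.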
Step S102, determining coding features of the image set, and inputting the coding features and the set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters.
In one embodiment, the coding features of the image set may be derived by pre-encoding, such as encoding the image set with an encoder to obtain the corresponding coding features.
In one embodiment, the coding features of the image set are obtained by feature extraction and analysis of each frame image in the image set. Optionally, the coding features include motion vector features, distortion degree parameters, complexity parameters, and the like, for describing each frame image in the image set. The motion vector feature represents the degree of change between images: the more severe the change between frame images, the larger the motion vector; conversely, if the frame images describe a still picture, the motion vector is smaller. The distortion degree parameter represents the degree of distortion of the image: the higher the distortion, the higher the parameter value; conversely, a low distortion degree corresponds to a relatively lower parameter value. The complexity parameter represents the degree of complexity of the image; for example, if the image contains many different objects, the greater the pixel differences between the objects, the higher the complexity. Alternatively, the identification of the coding features may be implemented by an existing encoder module, an image processing algorithm, and the like.
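As one hedged illustration of such feature extraction (not the embodiment's own algorithm, which relies on an encoder module), the sketch below derives rough motion and complexity statistics from raw grayscale frames with NumPy; a real encoder would obtain these from motion estimation and rate-distortion analysis:

```python
from typing import Dict, List

import numpy as np


def rough_coding_features(frames: List[np.ndarray]) -> Dict[str, float]:
    """Crude per-set statistics standing in for motion/complexity features."""
    # Mean absolute frame difference: larger values indicate more motion.
    diffs = [np.mean(np.abs(b.astype(np.float32) - a.astype(np.float32)))
             for a, b in zip(frames, frames[1:])]
    motion = float(np.mean(diffs)) if diffs else 0.0
    # Mean per-frame pixel standard deviation as a stand-in for complexity.
    complexity = float(np.mean([f.std() for f in frames]))
    return {"motion": motion, "complexity": complexity}
```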
The video picture evaluation parameter is a comprehensive evaluation index representing image quality. Alternatively, the video picture evaluation parameter may be characterized by VMAF (Video Multimethod Assessment Fusion). VMAF is an objective evaluation index proposed by Netflix that combines human visual modeling with machine learning: it uses a large amount of subjective data as a training set and fuses algorithms of different evaluation dimensions through machine learning, and it is currently a mainstream objective evaluation index. It can generally be considered that the higher the VMAF score, the better the video quality; however, from the viewpoint of human perception, once the VMAF score increases beyond a certain threshold, the human eye can no longer perceive the image quality improvement. Therefore, different target VMAF values can be designed for different videos to save coding bit rate without changing the subjective quality of the videos.
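For concreteness, a VMAF score can be measured with FFmpeg's libvmaf filter, assuming an FFmpeg build compiled with libvmaf (the file names below are illustrative):

```python
import subprocess

# Hedged sketch: score a distorted encode against its reference with libvmaf.
# The first input is the distorted clip, the second the reference.
cmd = [
    "ffmpeg", "-i", "distorted.mp4", "-i", "reference.mp4",
    "-lavfi", "libvmaf", "-f", "null", "-",
]
subprocess.run(cmd, check=True)  # the VMAF score appears in FFmpeg's log output
```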
In one embodiment, the determined coding features of the image set and the set video picture evaluation parameter are input to a pre-trained machine learning model to output a code rate control parameter. The set video picture evaluation parameter can be customized according to different picture quality requirements, different playing devices, and the like, and the set value can also be adjusted. The machine learning model is a pre-trained neural network model that outputs the corresponding code rate control parameter based on the coding features of the image set and the set video picture evaluation parameter. Alternatively, the rate control parameter may be a CRF (Constant Rate Factor) or a CQF (Constant Quality Factor). CRF is one form of rate control: the smaller the CRF value, the higher the video quality, but the larger the file size; the larger the CRF value, the higher the compression rate, but the lower the video quality. Alternatively, different CRF values correspond to different code rates; the CRF values and their corresponding code rates may be recorded in a mapping table, or the relationship between CRF and code rate may be represented by a function curve, as sketched below.
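The mapping-table option mentioned above might look like the following sketch; the CRF values and bitrates here are invented placeholders, not measured data:

```python
# Hypothetical CRF-to-bitrate mapping table (kbps); numbers are placeholders.
CRF_TO_BITRATE_KBPS = {18: 6000, 23: 3200, 28: 1700, 33: 900}


def bitrate_for_crf(crf: float) -> int:
    """Return the bitrate recorded for the nearest tabulated CRF value."""
    nearest = min(CRF_TO_BITRATE_KBPS, key=lambda k: abs(k - crf))
    return CRF_TO_BITRATE_KBPS[nearest]


print(bitrate_for_crf(24))  # 3200, from the CRF 23 entry
```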
Step S103, encoding the image set according to the coding features and the code rate control parameter.
In one embodiment, after the rate control parameter is obtained through the machine learning model, final secondary encoding is performed on the image set based on the rate control parameter and the coding features determined in step S102, so as to output the code stream data.
Specifically, Fig. 2 is a flowchart of a method for performing secondary encoding based on a primary encoding result according to an embodiment of the present invention. As shown in Fig. 2, the method specifically includes:
Step S1031, determining frame type information and scene information according to the coding features.
The coding features record the frame type of each frame, such as the different I-frame, P-frame, and B-frame types. Different frame types require different encoding compression quality because of their different reference relationships. An I frame is a key frame, a fully retained picture: decoding can be completed from this frame's data alone, without reference to other frames. A P frame records the difference between the current frame and the preceding key frame or P frame: when decoding, the difference is superimposed on the previously buffered picture to generate the final picture. A B frame is a bidirectional difference frame, recording the differences from both the preceding and the following frames: when decoding a B frame, both the previously buffered picture and the decoded following picture are obtained, and the final picture is obtained by superimposing the preceding and following pictures with the current frame data.
The scene information may be divided, for example, into motion scenes and still scenes, and can be determined from the coding features by an integrated scene discrimination module. The coding features record image features such as the motion vectors, motion compensation, and motion displacement changes of each frame image, and the scene information of the image is determined by analyzing data such as the motion vectors and motion compensation.
Step S1032, performing prediction analysis according to the frame type information, the scene information and the code rate control parameter to obtain coding parameters.
The coding parameters are exemplified by the quantization parameter QP in HEVC (High Efficiency Video Coding). The quantization parameter QP is the index of the quantization step size Qstep. For luma coding, Qstep has 52 values, indexed by QP values of 0 to 51; for chroma coding, the QP is derived from the luma QP.
Taking the quantization parameter QP as an example, the coding parameter reflects the compression of spatial detail. The smaller the coding parameter value, the finer the quantization, the higher the image quality, and the longer the generated code stream: if the QP value is small, most of the detail in the image is retained; as the QP value increases, some detail is correspondingly lost and the code rate decreases. Taking the QP range 0 to 51 as an example, QP at the minimum value 0 indicates the finest quantization, while QP at the maximum value 51 indicates the coarsest quantization. The purpose of quantization is to reduce the coded length of the image by discarding information unnecessary for visual reconstruction, without reducing the visual effect.
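As a worked illustration of how QP indexes the step size: in H.264/HEVC the quantization step roughly doubles for every increase of 6 in QP, i.e. Qstep ≈ 2^((QP − 4)/6):

```python
def qstep(qp: int) -> float:
    """Approximate H.264/HEVC quantization step size for a QP in 0..51."""
    assert 0 <= qp <= 51
    return 2.0 ** ((qp - 4) / 6.0)


# The step doubles every 6 QP: QP 22 -> 8.0, QP 28 -> 16.0.
print(round(qstep(22), 2), round(qstep(28), 2))
```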
Specifically, the process of obtaining the coding parameters by performing prediction analysis based on the frame type information, the scene information, and the code rate control parameter may be implemented, for example, by an integrated HEVC encoder module. That is, the frame type information (I frame, B frame, P frame), the scene information (static scene, dynamic scene), and the code rate control parameter (CRF) together determine the final coding parameter (the frame-level QP). Illustratively, when the frame type is a key frame and the scene information indicates a dynamic scene, a lower frame-level QP value is determined for a given rate control parameter.
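A hedged sketch of how these three inputs might jointly determine a frame-level QP follows; the base mapping and offsets are invented for illustration, whereas a real encoder derives them from rate-distortion models:

```python
# Illustrative only: combine CRF, frame type and scene into a frame-level QP.
FRAME_TYPE_OFFSET = {"I": -3, "P": 0, "B": 2}   # key frames get finer QP
SCENE_OFFSET = {"dynamic": -1, "static": 1}


def frame_level_qp(crf: float, frame_type: str, scene: str) -> int:
    qp = crf + FRAME_TYPE_OFFSET[frame_type] + SCENE_OFFSET[scene]
    return max(0, min(51, round(qp)))


# A key frame in a dynamic scene at CRF 23 gets a lower QP of 19.
print(frame_level_qp(23.0, "I", "dynamic"))
```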
Step S1033, encoding the image set based on the encoding parameter.
In one embodiment, after the coding parameters are obtained, taking the frame-level QP parameter in HEVC as an example, HEVC encoding is performed on the image set to output the code stream.
In another embodiment, in order to improve the accuracy of the secondary encoding, the process of performing prediction analysis to obtain coding parameters and encoding the image set based on the coding parameters includes: performing prediction analysis to obtain a first coding parameter; determining a second coding parameter based on the first coding parameter, coding feedback information, buffer information, frame type information, and scene information; adjusting a quantization offset parameter according to the first coding parameter; and encoding the image set based on the second coding parameter and the adjusted quantization offset parameter to output bitstream data.

Taking HEVC coding as an example, the first coding parameter may be understood as basic QP information (base QP), from which the frame-level QP information is determined together with the coding feedback information, buffer information, frame type information, and scene information. The buffer information characterizes the state of the buffer memory during video coding: the larger the buffer occupancy, the larger the corresponding QP value, so as to reduce the computation and storage load of video coding. The coding feedback information may be information obtained during pre-encoding, or fed back after encoding the previous image set or video round, such as the distortion degree; if the distortion degree is high, the QP value should be reduced accordingly to improve the coding quality.

While the second coding parameter is determined from the first coding parameter, the quantization offset parameter is further adjusted according to the first coding parameter. Taking HEVC video coding as an example, the quantization offset parameter may be characterized by the cutree (CU-tree) strength, which adjusts the quantization offset according to the extent to which the current block is referenced. Specifically, if the current block is referenced, it is further determined how many of a certain number of subsequent blocks reference the current block; if the current block is heavily referenced by subsequent image blocks, it is characterized as belonging to a slowly changing scene, and the QP value is correspondingly lowered to improve the image quality. Finally, the image set is encoded using the determined second coding parameter and quantization offset parameter to output code stream data, ensuring an optimal balance between image quality and compression rate, as sketched below.
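The sketch below condenses this second-pass adjustment into toy arithmetic; all weights and thresholds are hypothetical, and the referenced-block offset merely mimics the cutree idea found in encoders such as x264/x265:

```python
def second_pass_qp(base_qp: float, buffer_fill: float, distortion: float,
                   reference_count: int) -> float:
    """Toy second-pass QP combining the signals named in the embodiment.

    base_qp: first coding parameter (base QP) from the prediction analysis.
    buffer_fill: buffer occupancy in [0, 1]; a fuller buffer raises the QP.
    distortion: feedback distortion in [0, 1]; higher distortion lowers the QP.
    reference_count: how many subsequent blocks reference the current block;
        heavily referenced blocks get a quality-raising (negative) offset,
        analogous to a cutree-style quantization offset.
    """
    qp = base_qp + 4.0 * buffer_fill - 3.0 * distortion
    cutree_offset = -0.5 * min(reference_count, 8)
    return max(0.0, min(51.0, qp + cutree_offset))


# A heavily referenced block in a slowly changing scene ends up with lower QP.
print(second_pass_qp(base_qp=26.0, buffer_fill=0.3, distortion=0.1,
                     reference_count=6))  # 23.9
```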
According to the above scheme, when video coding is performed, the video is first divided into image sets and coding features are obtained by primary encoding; a trained machine learning model is then used to output an accurate code rate control parameter; finally, secondary encoding is performed on the image sets based on the code rate control parameter and the coding features obtained during primary encoding to produce the final video coding result.
Fig. 3 is a flowchart of another content adaptive video coding method according to an embodiment of the present invention, and provides a method for determining coding characteristics of an image set, where as shown in fig. 3, the method specifically includes:
step S201, obtaining video data to be encoded, and dividing the video data into a plurality of image sets including continuous frame images.
Step S202, obtaining a preset number of frame images in the image set, encoding the preset number of frame images to obtain encoding features, and determining the encoding features as the encoding features of the image set.
In one embodiment, taking a GOP as an example, the preset number of frame images may be a miniGOP within one GOP; that is, for a GOP of 15 frames, the preset number of frame images may be 5 frames. The preset number of frame images may be pre-encoded by an encoder to obtain coding features, and these coding features are determined as the coding features of the whole image set, as sketched below.
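A minimal sketch of this sampling idea (the 5-of-15 split mirrors the example above; the `encode` callable is a hypothetical stand-in for a real pre-encoding pass):

```python
from typing import Callable, Dict, Sequence


def features_from_minigop(gop: Sequence,
                          encode: Callable[[Sequence], Dict],
                          minigop_len: int = 5) -> Dict:
    """Pre-encode only the first `minigop_len` frames of a GOP and reuse the
    resulting features as the coding features of the whole image set."""
    return encode(gop[:minigop_len])
```

Because only a third of the GOP is pre-encoded in this example, the cost of the first pass drops accordingly, which is what makes the approach attractive for real-time scenarios.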
Step S203, inputting the coding features and the set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters.
Step S204, encoding the image set according to the coding features and the code rate control parameters.
According to this scheme, a content adaptive coding technique combining two-pass encoding and machine learning is adopted for live video, and the coding configuration is dynamically adjusted according to the complexity of the video content. By obtaining the coding features from only a preset number of frame images in the image set and determining them as the coding features of the whole image set, the coding speed can be significantly improved and the amount of computation reduced, which is particularly valuable where real-time performance is required. Content adaptive coding is thus realized, better balancing video smoothness and definition, and the method can be applied to real-time live video scenes with good coding results.
Fig. 4 is a flowchart of another content adaptive video coding method according to an embodiment of the present invention, and provides a specific method for outputting a rate control parameter through a machine learning model, where the machine learning model includes a joint model formed by a first training model and a second training model, and as shown in fig. 4, the method specifically includes:
Step S301, obtaining video data to be encoded, and dividing the video data into a plurality of image sets including continuous frame images.
Step S302, determining coding features of the image set, and respectively inputting the coding features and the set video picture evaluation parameters into the first training model and the second training model to obtain a first code rate control parameter output by the first training model and a second code rate control parameter output by the second training model.
In one embodiment, the first training model is an XGBoost model and the second training model is a LightGBM model, both of which are decision-tree-based machine learning algorithms. Illustratively, the first rate control parameter output by the first training model is denoted CRF1, and the second rate control parameter output by the second training model is denoted CRF2.
Step S303, carrying out weighted average calculation on the first code rate control parameter and the second code rate control parameter to obtain the code rate control parameter.
The finally calculated rate control parameter is denoted CRF3 and is optionally calculated by the formula CRF3 = λ1·CRF1 + λ2·CRF2, where λ1 + λ2 = 1, λ1 ∈ [0, 1], and λ2 ∈ [0, 1].
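A hedged sketch of this fusion step, assuming already-trained XGBoost and LightGBM regressors with scikit-learn-style predict() methods (the λ1 value is an example choice, not mandated by the embodiment):

```python
import numpy as np

# xgb_model and lgbm_model are assumed to be trained regressors, e.g.
# xgboost.XGBRegressor and lightgbm.LGBMRegressor instances.


def fused_crf(xgb_model, lgbm_model, features: np.ndarray,
              lambda1: float = 0.5) -> float:
    """Weighted average of the two models' CRF predictions.

    features: 2D array of shape (1, n_features) for a single image set.
    lambda1 in [0, 1]; lambda2 = 1 - lambda1, so the weights sum to 1.
    """
    lambda2 = 1.0 - lambda1
    crf1 = float(xgb_model.predict(features)[0])
    crf2 = float(lgbm_model.predict(features)[0])
    return lambda1 * crf1 + lambda2 * crf2
```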
Step S304, encoding the image set according to the coding features and the code rate control parameter.
According to this method, when the code rate control parameter is output through the machine learning model, two different decision-tree-based models each output a candidate rate control parameter, and a weighted average is then taken to obtain the final code rate control parameter. This makes the obtained code rate control parameter more accurate and the final video coding effect better.
In one embodiment, before the coding features and the set video picture evaluation parameters are respectively input into the first training model and the second training model, the method further comprises: obtaining video sample data of different scene types and correspondingly different resolutions; dividing the video sample data into training set samples, test set samples, and verification set samples; and inputting these samples into the first training model and the second training model respectively for training. During model training, the scheme first distinguishes the scene types of the video pictures, such as dynamic scenes and static scenes, and trains separately on video pictures of different resolutions as sample data. In the training process, the video sample data is divided into training set samples, test set samples, and verification set samples to obtain a final trained model with good prediction performance, as sketched below.
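One conventional way to realize the three-way split is sketched below with scikit-learn; the 70/15/15 proportions are an assumption for illustration, not specified by the embodiment:

```python
from sklearn.model_selection import train_test_split


def three_way_split(X, y, seed: int = 42):
    """Split samples into training, test and verification sets (70/15/15)."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.30, random_state=seed)
    X_test, X_val, y_test, y_val = train_test_split(
        X_rest, y_rest, test_size=0.50, random_state=seed)
    return (X_train, y_train), (X_test, y_test), (X_val, y_val)
```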
Fig. 5 is a block diagram of a content adaptive video coding apparatus according to an embodiment of the present invention, where the apparatus is configured to execute the content adaptive video coding method according to the foregoing embodiment, and the apparatus has functional modules and beneficial effects corresponding to the execution method. As shown in fig. 5, the apparatus specifically includes an image set determining module 101, a code rate parameter determining module 102, and an encoding module 103, wherein,
An image set determining module 101, configured to obtain video data to be encoded, and divide the video data into a plurality of image sets including continuous frame images;
the code rate parameter determining module 102 is configured to determine coding features of the image set, input the coding features and the set video picture evaluation parameters to a pre-trained machine learning model, and output code rate control parameters;
an encoding module 103, configured to encode the image set according to the encoding feature and the rate control parameter.
According to the above scheme, when video coding is performed, the video is first divided into image sets and coding features are obtained by primary encoding; a trained machine learning model is then used to output an accurate code rate control parameter; finally, secondary encoding is performed on the image sets based on the code rate control parameter and the coding features obtained during primary encoding to produce the final video coding result.
In one possible embodiment, the code rate parameter determining module 102 is specifically configured to:
acquiring a preset number of frame images in the image set;
and encoding the preset number of frame images to obtain coding features, and determining the coding features as the coding features of the image set.
In one possible embodiment, the machine learning model includes a joint model composed of a first training model and a second training model, and the code rate parameter determining module 102 is specifically configured to:
inputting the coding characteristics and the set video picture evaluation parameters into the first training model and the second training model respectively to obtain a first code rate control parameter output by the first training model and a second code rate control parameter output by the second training model;
and carrying out weighted average calculation on the first code rate control parameter and the second code rate control parameter to obtain the code rate control parameter.
In one possible embodiment, the code rate parameter determining module 102 is further configured to:
Before the coding characteristics and the set video picture evaluation parameters are respectively input into the first training model and the second training model, video sample data of different scene types and corresponding different resolutions are obtained;
Dividing the video sample data into a training set sample, a test set sample and a verification set sample, and respectively inputting the training set sample, the test set sample and the verification set sample into the first training model and the second training model for training.
In one possible embodiment, the encoding module 103 is specifically configured to:
determining frame type information and scene information according to the coding features;
Performing predictive analysis according to the frame type information, the scene information and the code rate control parameter to obtain coding parameters;
The set of images is encoded based on the encoding parameters.
In one possible embodiment, the encoding module 103 is specifically configured to:
Performing predictive analysis to obtain a first coding parameter;
and determining a second coding parameter based on the first coding parameter, the coding feedback information, the buffer information, the frame type information and the scene information.
In one possible embodiment, the encoding module 103 is specifically configured to:
Adjusting a quantization offset parameter according to the first coding parameter;
and encoding the image set according to the second encoding parameter and the adjusted quantization offset parameter to output code stream data.
Fig. 6 is a schematic structural diagram of a content adaptive video coding device according to an embodiment of the present invention. As shown in Fig. 6, the device includes a processor 201, a memory 202, an input device 203, and an output device 204. The number of processors 201 in the device may be one or more; one processor 201 is taken as an example in Fig. 6. The processor 201, the memory 202, the input device 203, and the output device 204 in the device may be connected by a bus or in other manners; a bus connection is taken as an example in Fig. 6. The memory 202, as a computer-readable storage medium, is used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the content adaptive video coding method in the embodiments of the present invention. The processor 201 executes the various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the memory 202, i.e., implements the content adaptive video coding method described above. The input device 203 may be used to receive entered numeric or character information and to generate key signal inputs related to user settings and function control of the device. The output device 204 may include a display device such as a display screen.
The embodiment of the present invention also provides a storage medium containing computer executable instructions, which when executed by a computer processor, are configured to perform a content adaptive video encoding method described in the foregoing embodiment, specifically including:
acquiring video data to be encoded, and dividing the video data into a plurality of image sets containing continuous frame images;
determining coding characteristics of the image set, and inputting the coding characteristics and set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters;
And encoding the image set according to the encoding characteristics and the code rate control parameters.
It should be noted that, in the above embodiment of the content adaptive video coding apparatus, the included units and modules are divided only according to functional logic and are not limited to the above division, so long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not used to limit the protection scope of the embodiments of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the embodiments of the present invention are not limited to the particular embodiments described herein, but are capable of numerous obvious changes, rearrangements and substitutions without departing from the scope of the embodiments of the present invention. Therefore, while the embodiments of the present invention have been described in connection with the above embodiments, the embodiments of the present invention are not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the embodiments of the present invention, and the scope of the embodiments of the present invention is determined by the scope of the appended claims.