
CN114554211B - Content-adaptive video encoding method, device, equipment and storage medium - Google Patents

Info

Publication number
CN114554211B
Authority
CN
China
Prior art keywords
encoding
coding
video
parameter
rate control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210043241.9A
Other languages
Chinese (zh)
Other versions
CN114554211A (en
Inventor
刘芳
袁子逸
洪旭东
崔同兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd
Priority to CN202210043241.9A
Publication of CN114554211A
Priority to PCT/CN2023/070555 (WO2023134523A1)
Application granted
Publication of CN114554211B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146: Data rate or code amount at the encoder output
    • H04N19/149: Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80: Responding to QoS
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/114: Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding, the adaptation method, adaptation tool or adaptation type being iterative or recursive

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present invention disclose a content-adaptive video encoding method, device, equipment and storage medium. The method comprises: obtaining video data to be encoded and dividing the video data into a plurality of image sets containing consecutive frame images; determining coding features of the image set, and inputting the coding features and a set video picture evaluation parameter into a pre-trained machine learning model to output a rate control parameter; and encoding the image set according to the coding features and the rate control parameter. This solution improves video encoding efficiency and is also applicable to real-time video scenarios.

Description

Content adaptive video coding method, device, equipment and storage medium
Technical Field
Embodiments of the present application relate to the technical field of video processing, and in particular to a content-adaptive video encoding method, device, equipment and storage medium.
Background
With the rapid development of mobile internet technology, video has become a mainstream medium for users, and live video, video on demand, short video and video chat have become part of people's lives. However, because the amount of video data is very large compared with text and pictures, transmitting and storing video poses a significant challenge. Video codec technology aims to achieve the highest possible compression ratio and video reconstruction quality within the available computing resources, so as to meet storage capacity and bandwidth requirements. Early video service providers typically processed almost all video content with a single predetermined encoding scheme. Such a scheme may allocate too low a bit rate to high-motion video, resulting in low encoding quality, while wasting bit rate on low-motion video. Content-adaptive encoding saves bandwidth by setting different encoding configurations for different videos according to their content, finding for each video or video segment the lowest bit rate that meets the requirements for definition and subjective perception.
In one existing approach to video encoding, training video data is pre-encoded and the resulting encoding data is extracted as features, and a machine learning model is trained on these features together with the corresponding constant rate factor values. In production, the model predicts encoding parameters from the video features, and the predicted values are then used for encoding, which balances encoding bit rate against encoding quality and improves the viewing experience for most viewers. However, this approach extracts features by encoding the whole video and uses the machine learning model to predict a single constant rate factor value for the whole video. For long videos with complex, mixed content, this can lead to poor encoding quality in the complex parts and wasted bit rate in the simple parts. In addition, the whole video must first be encoded to extract the features and predict the constant rate factor value before it is encoded again with the predicted value, which consumes a great deal of time and makes the approach unsuitable for live streaming scenarios.
Disclosure of Invention
Embodiments of the present invention provide a content-adaptive video encoding method, device, equipment and storage medium, which solve the problem that prior-art video encoding performs poorly in complex scenes, improve video encoding efficiency, and are also applicable to real-time video scenarios.
In a first aspect, an embodiment of the present invention provides a content adaptive video encoding method, including:
acquiring video data to be encoded, and dividing the video data into a plurality of image sets containing continuous frame images;
determining coding characteristics of the image set, and inputting the coding characteristics and set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters;
encoding the image set according to the coding features and the rate control parameter.
In a second aspect, an embodiment of the present invention further provides a content adaptive video encoding apparatus, including:
an image set determining module, configured to obtain video data to be encoded and divide the video data into a plurality of image sets containing consecutive frame images;
a code rate parameter determining module, configured to determine coding features of the image set, input the coding features and the set video picture evaluation parameter into a pre-trained machine learning model, and output a rate control parameter;
an encoding module, configured to encode the image set according to the coding features and the rate control parameter.
In a third aspect, an embodiment of the present invention further provides a content adaptive video encoding apparatus, including:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the content-adaptive video encoding method according to the embodiments of the present invention.
In a fourth aspect, embodiments of the present invention also provide a storage medium storing computer-executable instructions that, when executed by a computer processor, are configured to perform the content adaptive video encoding method according to the embodiments of the present invention.
In the embodiments of the present invention, video data to be encoded is obtained and divided into a plurality of image sets containing consecutive frame images; coding features of the image set are determined; the coding features and a set video picture evaluation parameter are input into a pre-trained machine learning model to output a rate control parameter; and the image set is encoded according to the coding features and the rate control parameter. This solves the problem that prior-art video encoding performs poorly in complex scenes, improves video encoding efficiency, and is also applicable to real-time video scenarios.
Drawings
Fig. 1 is a flowchart of a content-adaptive video encoding method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for performing secondary encoding based on a primary encoding result according to an embodiment of the present invention;
Fig. 3 is a flowchart of another content-adaptive video encoding method according to an embodiment of the present invention;
Fig. 4 is a flowchart of another content-adaptive video encoding method according to an embodiment of the present invention;
Fig. 5 is a block diagram of a content-adaptive video encoding apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a content-adaptive video encoding device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the drawings and examples. It should be understood that the particular embodiments described herein are illustrative only and are not limiting of embodiments of the invention. It should be further noted that, for convenience of description, only some, but not all of the structures related to the embodiments of the present invention are shown in the drawings.
The terms "first", "second" and the like in the description and in the claims are used to distinguish between similar elements and do not necessarily describe a particular sequential or chronological order. It is to be understood that terms so used are interchangeable where appropriate, so that embodiments of the present application may be implemented in sequences other than those illustrated or described herein. Objects identified by "first", "second" and so on are generally of one type, and their number is not limited; for example, the first object may be one or more objects. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
Fig. 1 is a flowchart of a content-adaptive video encoding method according to an embodiment of the present invention. The method can be applied to encoding video data and can be executed by a computing device such as a notebook computer, desktop computer, smartphone, server or tablet computer. It specifically includes the following steps:
Step S101, obtaining video data to be encoded, and dividing the video data into a plurality of image sets containing consecutive frame images.
The video data to be encoded includes recorded video data and video data generated in real time and required to be transmitted and displayed, such as live video data.
In one embodiment, when a piece of video data is encoded, it is first divided into a plurality of image sets containing consecutive frame images; that is, each image set is encoded separately. Illustratively, the video data may be divided into consecutive GOPs (Group of Pictures), each GOP representing a group of consecutive pictures in an encoded video stream. For example, if each GOP contains 15 or 20 frames, the video data to be encoded is divided into a plurality of consecutive image sets of 15 to 20 frames each, so that the video data is encoded with the GOP as the coding unit.
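The division in step S101 can be sketched as follows. This is a minimal illustration that assumes the frames are already available as an in-memory sequence and that a fixed GOP size is used; the function name `split_into_gops` is hypothetical and does not come from the patent.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_into_gops(frames: Sequence[Frame], gop_size: int = 15) -> List[List[Frame]]:
    """Split a frame sequence into consecutive image sets of at most gop_size frames."""
    if gop_size <= 0:
        raise ValueError("gop_size must be positive")
    # Consecutive, non-overlapping slices; the last set may be shorter.
    return [list(frames[i:i + gop_size]) for i in range(0, len(frames), gop_size)]


if __name__ == "__main__":
    gops = split_into_gops(list(range(40)), gop_size=15)
    print([len(g) for g in gops])  # [15, 15, 10]
```

Each returned list would then be passed through feature extraction, parameter prediction and encoding on its own.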
Step S102, determining coding features of the image set, and inputting the coding features and the set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters.
In one embodiment, the coding features of the image set may be determined by pre-encoding, for example by encoding the image set with an encoder to obtain the corresponding coding features.
In one embodiment, the coding features of the image set are obtained by feature extraction and analysis of each frame image in the image set. Optionally, the coding features include motion vector features, distortion degree parameters, complexity parameters and the like, which describe each frame image in the image set. The motion vector feature represents the degree of change between images: the more drastic the change between frames, the larger the motion vector, whereas if the frames depict a still picture, the motion vector is small. The distortion degree parameter represents the degree of distortion of the image: the higher the distortion, the higher the parameter value, and the lower the distortion, the lower the parameter value. The complexity parameter represents the complexity of the image: for example, when the image contains many different objects, the greater the pixel difference between the objects, the higher the complexity. Optionally, the coding features may be identified by an existing encoder module, an image processing algorithm, or the like.
The video picture evaluation parameter is a comprehensive index of image quality. Optionally, it may be characterized by VMAF (Video Multi-Method Assessment Fusion), an objective evaluation index proposed by Netflix that combines human visual modeling with machine learning. VMAF uses a large amount of subjective data as its training set and fuses algorithms covering different evaluation dimensions by machine learning, and it is currently a mainstream objective evaluation index. In general, the higher the VMAF score, the better the video quality. From the viewpoint of human perception, however, once the VMAF score rises above a certain threshold the human eye can no longer perceive further improvement in image quality, so different VMAF values can be set for different videos to save encoding bit rate without changing the subjective quality of the video.
In one embodiment, the determined coding features of the image set and the set video picture evaluation parameter are input into a pre-trained machine learning model to output a rate control parameter. The set video picture evaluation parameter can be customized according to different picture quality requirements, different playback devices and the like, and its value can also be adjusted. The machine learning model is a pre-trained neural network model that outputs the corresponding rate control parameter based on the coding features of the image set and the set video picture evaluation parameter. Optionally, the rate control parameter may be a CRF (Constant Rate Factor) or a CQF (Constant Quality Factor). CRF is one rate control mode: the smaller the CRF value, the higher the video quality and the larger the file size; the larger the CRF value, the higher the compression ratio but the lower the video quality. Optionally, different CRF values correspond to different bit rates; the CRF values and their corresponding bit rates may be recorded in a mapping table, or the relationship between CRF and bit rate may be represented by a function curve.
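The mapping-table option mentioned above can be illustrated with a small lookup that interpolates linearly between recorded points. This is only a sketch: the table values below are hypothetical placeholders rather than measurements, and the name `estimate_bitrate_kbps` is an assumption made for illustration.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical (CRF, bit rate in kbit/s) pairs sorted by CRF.
# Real values would have to be measured for a given encoder and content type.
CRF_TO_KBPS: List[Tuple[float, float]] = [
    (18.0, 4000.0),
    (23.0, 2000.0),
    (28.0, 1000.0),
    (33.0, 500.0),
]


def estimate_bitrate_kbps(crf: float, table: List[Tuple[float, float]] = CRF_TO_KBPS) -> float:
    """Look up the bit rate for a CRF value, interpolating between table entries."""
    crfs = [c for c, _ in table]
    if crf <= crfs[0]:
        return table[0][1]
    if crf >= crfs[-1]:
        return table[-1][1]
    hi = bisect_left(crfs, crf)          # first entry with CRF >= crf
    (c0, b0), (c1, b1) = table[hi - 1], table[hi]
    t = (crf - c0) / (c1 - c0)           # position between the two neighbours
    return b0 + t * (b1 - b0)


if __name__ == "__main__":
    print(estimate_bitrate_kbps(25.5))  # 1500.0 with the placeholder table
```

A fitted function curve would replace the table lookup with a closed-form expression, at the cost of choosing a curve family in advance.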
Step S103, encoding the image set according to the coding features and the rate control parameter.
In one embodiment, after the rate control parameter is obtained from the machine learning model, a final secondary encoding is performed on the image set based on the rate control parameter and the coding features determined in step S102, so as to output bitstream data.
Specifically, Fig. 2 is a flowchart of a method for performing secondary encoding based on a primary encoding result according to an embodiment of the present invention. As shown in Fig. 2, the method specifically includes:
Step S1031, determining frame type information and scene information according to the coding features.
The coding features record the frame type of each frame, such as I-frame, P-frame or B-frame. Different frame types require different encoding compression quality because they differ in how they reference other frames. An I-frame is a key frame, a picture that is retained in full; it can be decoded from its own data alone without reference to other pictures. A P-frame represents the difference between the current frame and the previous key frame or P-frame; during decoding, the difference defined by the current frame is superimposed on the previously buffered picture to generate the final picture. A B-frame is a bidirectional difference frame, that is, it records the differences between the current frame and both the preceding and following frames; to decode a B-frame, the previously buffered picture and the decoded following picture are both obtained, and the final picture is obtained by superimposing the current frame data on the preceding and following pictures.
Illustratively, the scene information may be divided into motion scenes and still scenes, and may be determined from the coding features by an integrated scene discrimination module. The coding features record image features of each frame related to motion displacement, such as motion vectors and motion compensation, and the scene information of the image is determined by analyzing this data.
Step S1032, performing prediction analysis according to the frame type information, the scene information and the code rate control parameter to obtain coding parameters.
Taking HEVC (High Efficiency Video Coding) as an example, the coding parameter corresponds to the quantization parameter QP. The quantization parameter QP is the index of the quantization step size Qstep. For luma coding, Qstep has 52 values, and for chroma coding, QP takes values from 0 to 51.
Taking the quantization parameter QP as an example, the coding parameter reflects how strongly spatial detail is compressed. The smaller the value, the finer the quantization, the higher the image quality, and the longer the generated bitstream. A small QP retains most of the detail in the image; as QP increases, some detail is lost and the bit rate falls. With QP values from 0 to 51, the minimum value 0 gives the finest quantization and the maximum value 51 the coarsest. The purpose of quantization is to shorten the encoded image without degrading the visual effect, by discarding information that is unnecessary for visual reconstruction.
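To make the relationship between QP and Qstep concrete, the sketch below uses the approximation commonly quoted for H.264 and HEVC, in which the step size doubles for every increase of 6 in QP and equals 1 at QP = 4. This formula is not stated in the patent text and is included only as an illustrative assumption; the function name `qstep_from_qp` is hypothetical.

```python
def qstep_from_qp(qp: int) -> float:
    """Approximate quantization step size for a QP in the range 0..51.

    Assumption (not from the patent): Qstep ~= 2 ** ((QP - 4) / 6),
    the approximation commonly quoted for H.264/HEVC.
    """
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return 2.0 ** ((qp - 4) / 6.0)


if __name__ == "__main__":
    for qp in (0, 4, 10, 22, 51):
        print(qp, round(qstep_from_qp(qp), 3))
```

Under this approximation, QP 0 gives the smallest step (finest quantization) and QP 51 the largest (coarsest), matching the description above.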
Specifically, the prediction analysis that derives the coding parameters from the frame type information, scene information and rate control parameter may be implemented, for example, by an integrated HEVC encoder module. That is, the frame type information (I-frame, B-frame, P-frame), the scene information (static scene, dynamic scene) and the rate control parameter (CRF) jointly determine the final coding parameter (frame-level QP). Illustratively, when the frame type is a key frame, the scene information is a dynamic scene, and the value of the rate control parameter is higher, the determined frame-level QP value is lower.
Step S1033, encoding the image set based on the encoding parameter.
In one embodiment, after the coding parameter is obtained, taking the frame-level QP in HEVC as an example, HEVC encoding is performed to output the bitstream.
In another embodiment, to improve the accuracy of the secondary encoding, the process of performing prediction analysis to obtain coding parameters and encoding the image set based on them includes: performing prediction analysis to obtain a first coding parameter; determining a second coding parameter based on the first coding parameter, encoding feedback information, buffer information, frame type information and scene information; adjusting a quantization offset parameter according to the first coding parameter; and encoding the image set based on the second coding parameter and the adjusted quantization offset parameter to output bitstream data. Taking HEVC encoding as an example, the first coding parameter can be understood as the base QP, and the frame-level QP is determined from the first coding parameter, the encoding feedback information, the buffer information, the frame type information and the scene information. The buffer information characterizes the state of the buffer during video encoding: the larger the buffer occupancy, the larger the corresponding QP value, so as to reduce the computation and storage required for encoding. The encoding feedback information may be obtained during pre-encoding or fed back after the previous round of encoding of an image set or video, for example the degree of distortion; if the distortion is high, the QP value needs to be reduced accordingly to improve encoding quality. While the second coding parameter is determined from the first coding parameter, the quantization offset parameter is also adjusted according to the first coding parameter. Taking HEVC video encoding as an example, the quantization offset parameter may be characterized by the Cutree strength, which adjusts the quantization offset according to the extent to which the current block is referenced.
Specifically, it is determined whether a certain number of blocks following the current block reference it. If the current block is referenced heavily by subsequent image blocks, it is regarded as belonging to a slowly changing scene, and its QP value is lowered accordingly to improve image quality. Finally, the image set is encoded using both the determined second coding parameter and the determined quantization offset parameter to output bitstream data, ensuring the best balance between image quality and compression ratio.
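The directions of adjustment described above can be sketched as a simple additive rule. Only the signs come from the text (fuller buffer raises QP, higher distortion feedback lowers QP, heavier referencing lowers QP); the linear form, the gain values, the normalization of inputs to [0, 1] and all names below are hypothetical. For brevity the sketch folds the quantization offset into a single returned QP, whereas the text treats the second coding parameter and the quantization offset as separate outputs.

```python
from dataclasses import dataclass

QP_MIN, QP_MAX = 0, 51


@dataclass
class FrameContext:
    buffer_fullness: float   # 0.0 (empty) .. 1.0 (full), hypothetical normalization
    distortion: float        # 0.0 .. 1.0, feedback from pre-encoding or previous round
    reference_ratio: float   # 0.0 .. 1.0, share of following blocks referencing this one


def adjust_qp(base_qp: float, ctx: FrameContext,
              buffer_gain: float = 4.0,
              distortion_gain: float = 3.0,
              cutree_strength: float = 2.0) -> int:
    """Derive an adjusted QP from the base QP (all gains are illustrative)."""
    qp = base_qp
    qp += buffer_gain * ctx.buffer_fullness      # fuller buffer -> coarser quantization
    qp -= distortion_gain * ctx.distortion       # more distortion -> finer quantization
    qp -= cutree_strength * ctx.reference_ratio  # heavily referenced -> finer quantization
    return int(min(QP_MAX, max(QP_MIN, round(qp))))


if __name__ == "__main__":
    print(adjust_qp(30, FrameContext(0.5, 0.2, 0.8)))
```

A production encoder would apply such offsets per block and per frame inside its rate control loop rather than through one global rule.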
According to this scheme, when video is encoded, the video is first divided into image sets and a primary encoding pass yields the coding features; a trained machine learning model then outputs an accurate rate control parameter; finally, a secondary encoding pass is performed on the image set based on the rate control parameter and the coding features obtained in the primary pass, producing the final video encoding result.
Fig. 3 is a flowchart of another content-adaptive video encoding method according to an embodiment of the present invention, which provides a method for determining the coding features of an image set. As shown in Fig. 3, the method specifically includes:
Step S201, obtaining video data to be encoded, and dividing the video data into a plurality of image sets containing consecutive frame images.
Step S202, obtaining a preset number of frame images in the image set, encoding the preset number of frame images to obtain encoding features, and determining the encoding features as the encoding features of the image set.
In one embodiment, taking a GOP as an example, the preset number of frame images may be a mini-GOP within the GOP; for example, for a GOP of 15 frames, the preset number of frame images may be 5. The preset number of frame images may be pre-encoded by an encoder to obtain coding features, and the coding features of these frames are determined as the coding features of the image set.
Step S203, inputting the coding features and the set video picture evaluation parameter into a pre-trained machine learning model to output a rate control parameter.
Step S204, encoding the image set according to the coding features and the rate control parameter.
According to this scheme, a live-video content-adaptive encoding technique combining two-pass encoding with machine learning is used during video encoding, and the encoding configuration is dynamically adjusted according to the complexity of the video content. Because the coding features are obtained from only a preset number of frame images in the image set and then taken as the coding features of the whole image set, the encoding speed is significantly improved, which is especially beneficial for video encoding with real-time requirements, while the amount of computation is reduced. Content-adaptive encoding is thus achieved with a better balance between video smoothness and definition, and the method can be applied to real-time live video scenarios with good encoding results.
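Step S202 can be sketched as pre-encoding only a subset of each image set. The sketch assumes that the first frames of the GOP form the mini-GOP and that per-frame features are averaged; the patent specifies neither choice. The callable `pre_encode_frame` is a hypothetical stand-in for an encoder's analysis pass.

```python
from statistics import mean
from typing import Callable, Dict, Sequence


def gop_features_from_subset(
    gop: Sequence[object],
    pre_encode_frame: Callable[[object], Dict[str, float]],
    preset_count: int = 5,
) -> Dict[str, float]:
    """Pre-encode only the first preset_count frames and reuse the result for the whole GOP."""
    subset = gop[:preset_count]            # assumption: the mini-GOP is the leading frames
    if not subset:
        raise ValueError("the image set is empty")
    per_frame = [pre_encode_frame(f) for f in subset]
    # Assumption: aggregate per-frame features by their mean.
    return {name: mean(f[name] for f in per_frame) for name in per_frame[0]}


if __name__ == "__main__":
    # Fake analysis pass used only to exercise the function.
    fake = lambda frame: {"motion": float(frame), "complexity": 2.0 * frame}
    print(gop_features_from_subset(list(range(15)), fake))
```

For a 15-frame GOP this pre-encodes 5 frames instead of 15, which is where the reduction in computation described for this embodiment comes from.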
Fig. 4 is a flowchart of another content adaptive video coding method according to an embodiment of the present invention, and provides a specific method for outputting a rate control parameter through a machine learning model, where the machine learning model includes a joint model formed by a first training model and a second training model, and as shown in fig. 4, the method specifically includes:
Step S301, obtaining video data to be encoded, and dividing the video data into a plurality of image sets including continuous frame images.
Step S302, determining coding features of the image set, and respectively inputting the coding features and the set video picture evaluation parameters into the first training model and the second training model to obtain a first code rate control parameter output by the first training model and a second code rate control parameter output by the second training model.
In one embodiment, the first training model is an XGBoost model and the second training model is a LightGBM model, both of which are decision-tree-based machine learning algorithms. Illustratively, the first rate control parameter output by the first training model is denoted CRF1 and the second rate control parameter output by the second training model is denoted CRF2.
Step S303, calculating a weighted average of the first rate control parameter and the second rate control parameter to obtain the rate control parameter.
The finally calculated rate control parameter is denoted CRF3 and is optionally calculated by the formula CRF3 = λ1 × CRF1 + λ2 × CRF2, where λ1 + λ2 = 1, λ1 ∈ [0, 1] and λ2 ∈ [0, 1].
Step S304, encoding the image set according to the coding features and the rate control parameter.
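The weighted average can be written directly from the formula. In this sketch the two model outputs are passed in as plain numbers, so no particular XGBoost or LightGBM API is assumed; the function name `blend_crf` and the default weight of 0.5 are illustrative choices.

```python
def blend_crf(crf1: float, crf2: float, lambda1: float = 0.5) -> float:
    """Return CRF3 = lambda1 * CRF1 + lambda2 * CRF2 with lambda1 + lambda2 = 1."""
    if not 0.0 <= lambda1 <= 1.0:
        raise ValueError("lambda1 must lie in [0, 1]")
    lambda2 = 1.0 - lambda1   # enforces the constraint lambda1 + lambda2 = 1
    return lambda1 * crf1 + lambda2 * crf2


if __name__ == "__main__":
    print(blend_crf(24.0, 28.0, lambda1=0.25))  # 27.0
```

Setting `lambda1` to 0 or 1 reduces the joint model to a single model, which makes the weight a convenient knob for comparing the two predictors on validation data.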
According to this method, when the rate control parameter is output by the machine learning model, two different decision-tree-based models each output a rate control parameter, and the weighted average of the two is taken as the final rate control parameter, so that the resulting rate control parameter is more accurate and the final video encoding is better.
In one embodiment, before the coding features and the set video picture evaluation parameter are respectively input into the first training model and the second training model, the method further includes: obtaining video sample data of different scene types and different corresponding resolutions; dividing the video sample data into training set samples, test set samples and validation set samples; and inputting the training set samples, test set samples and validation set samples into the first training model and the second training model respectively for training. During model training, the scheme first distinguishes the scene types of the video pictures, such as dynamic scenes and static scenes, and trains separately on video pictures of different resolutions as sample data. The video sample data is divided into training, test and validation set samples so that the final trained model has good prediction performance.
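The sample division can be sketched as a split performed within each (scene type, resolution) group so that every group is represented in all three sets. The 80/10/10 ratios, the grouping key and the function name `split_samples` are assumptions for illustration; the patent does not state how the split is made.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

Sample = TypeVar("Sample")


def split_samples(
    samples: Sequence[Sample],
    key: Callable[[Sample], Hashable],
    ratios: Tuple[float, float] = (0.8, 0.1),   # train, validation; the rest is test
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Split samples into train/validation/test sets within each group given by key."""
    groups = defaultdict(list)
    for sample in samples:
        groups[key(sample)].append(sample)

    rng = random.Random(seed)                   # fixed seed for a reproducible split
    train, val, test = [], [], []
    for group in groups.values():
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    data = [(scene, res, i)
            for scene in ("dynamic", "static")
            for res in ("720p", "1080p")
            for i in range(10)]
    tr, va, te = split_samples(data, key=lambda s: (s[0], s[1]))
    print(len(tr), len(va), len(te))  # 32 4 4
```

Each of the two models would then be fitted on the training set, tuned on the validation set and evaluated on the test set.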
Fig. 5 is a block diagram of a content adaptive video coding apparatus according to an embodiment of the present invention. The apparatus is configured to execute the content adaptive video coding method of the foregoing embodiments and has the functional modules and beneficial effects corresponding to executing the method. As shown in Fig. 5, the apparatus specifically includes an image set determining module 101, a code rate parameter determining module 102 and an encoding module 103, wherein:
the image set determining module 101 is configured to obtain video data to be encoded, and divide the video data into a plurality of image sets including continuous frame images;
the code rate parameter determining module 102 is configured to determine coding features of the image set, input the coding features and the set video picture evaluation parameters into a pre-trained machine learning model, and output a code rate control parameter;
the encoding module 103 is configured to encode the image set according to the coding features and the code rate control parameter.
According to this scheme, during video coding the video is first divided into image sets, and a first encoding pass is performed to obtain the coding features. A trained machine learning model then outputs an accurate code rate control parameter, and a second encoding pass is performed on the image set based on the code rate control parameter and the coding features obtained in the first pass, finally yielding the video coding result.
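The overall flow can be sketched as a small driver that wires the stages together. The sketch assumes fixed-size image sets and takes the first-pass encoder, the model and the second-pass encoder as caller-supplied functions; `first_pass`, `predict_crf`, `second_pass`, `probe_frames` and `quality_target` are hypothetical names, and the lambdas in the usage example are stand-ins rather than real encoders.

```python
from typing import Callable, List, Sequence


def split_into_image_sets(frames: Sequence, set_size: int) -> List[list]:
    """Divide frames into consecutive, non-overlapping image sets.

    Fixed-size chunking is an assumption; the text only requires that each
    set contains continuous frame images.
    """
    if set_size <= 0:
        raise ValueError("set_size must be positive")
    return [list(frames[i:i + set_size]) for i in range(0, len(frames), set_size)]


def encode_video(
    frames: Sequence,
    set_size: int,
    probe_frames: int,
    first_pass: Callable,
    predict_crf: Callable,
    second_pass: Callable,
    quality_target: float,
) -> list:
    """Run the two-pass flow on every image set and collect the results."""
    results = []
    for image_set in split_into_image_sets(frames, set_size):
        # First pass: encode a preset number of frames to get coding features.
        features = first_pass(image_set[:probe_frames])
        # Model: features plus the picture evaluation target give a CRF value.
        crf = predict_crf(features, quality_target)
        # Second pass: encode the whole set, reusing the first-pass features.
        results.append(second_pass(image_set, features, crf))
    return results


# Usage with stand-in callables (integers play the role of frames).
out = encode_video(
    frames=list(range(10)),
    set_size=4,
    probe_frames=2,
    first_pass=lambda fs: {"probe_count": len(fs)},
    predict_crf=lambda feats, target: 23.0,
    second_pass=lambda s, feats, crf: (len(s), crf),
    quality_target=90.0,
)
```

Passing the stages in as functions keeps the sketch independent of any particular encoder or model library.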
In one possible embodiment, the code rate parameter determining module 102 is specifically configured to:
acquiring a preset number of frame images from the image set;
encoding the preset number of frame images to obtain coding features, and determining those coding features as the coding features of the image set.
In one possible embodiment, the machine learning model includes a joint model composed of a first training model and a second training model, and the code rate parameter determining module 102 is specifically configured to:
inputting the coding characteristics and the set video picture evaluation parameters into the first training model and the second training model respectively to obtain a first code rate control parameter output by the first training model and a second code rate control parameter output by the second training model;
and carrying out weighted average calculation on the first code rate control parameter and the second code rate control parameter to obtain the code rate control parameter.
In one possible embodiment, the code rate parameter determining module 102 is further configured to:
before the coding features and the set video picture evaluation parameters are respectively input into the first training model and the second training model, obtaining video sample data of different scene types and corresponding different resolutions;
dividing the video sample data into training set samples, test set samples and verification set samples, and inputting these samples into the first training model and the second training model, respectively, for training.
In one possible embodiment, the encoding module 103 is specifically configured to:
determining frame type information and scene information according to the coding features;
performing predictive analysis according to the frame type information, the scene information and the code rate control parameter to obtain coding parameters;
encoding the image set based on the coding parameters.
In one possible embodiment, the encoding module 103 is specifically configured to:
obtaining a first coding parameter when performing the predictive analysis;
and determining a second coding parameter based on the first coding parameter, coding feedback information, buffer information, the frame type information and the scene information.
In one possible embodiment, the encoding module 103 is specifically configured to:
adjusting a quantization offset parameter according to the first coding parameter;
and encoding the image set according to the second coding parameter and the adjusted quantization offset parameter to output code stream data.
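The data flow between the first coding parameter, the second coding parameter and the quantization offset can be illustrated with a toy sketch. The text does not define how these values are computed, so every formula and constant below is a hypothetical placeholder chosen only to show which inputs feed which output. The clamp to 0..51 follows the quantizer range of 8-bit H.264 and is likewise an assumption about the target codec.

```python
from dataclasses import dataclass


@dataclass
class FrameInfo:
    frame_type: str     # "I", "P" or "B" (frame type information)
    scene_change: bool  # whether the frame starts a new scene (scene information)


def first_coding_parameter(crf: float, info: FrameInfo) -> float:
    """Predictive analysis: derive a per-frame value from CRF, frame type, scene."""
    offset = {"I": -2.0, "P": 0.0, "B": 2.0}[info.frame_type]  # made-up offsets
    if info.scene_change:
        offset -= 1.0  # made-up bonus for the first frame of a scene
    return crf + offset


def second_coding_parameter(first_param: float, info: FrameInfo,
                            buffer_fullness: float, overshoot: float) -> float:
    """Refine the first parameter with buffer state and encoder feedback.

    buffer_fullness is in [0, 1]; overshoot is the relative bit overshoot of
    previously encoded frames (the coding feedback). Both scalings are made up.
    """
    qp = first_param + 4.0 * (buffer_fullness - 0.5)
    if not info.scene_change:
        qp += 2.0 * overshoot  # ignore feedback from the previous scene
    max_qp = 45.0 if info.frame_type == "I" else 51.0  # made-up cap for I frames
    return min(max(qp, 0.0), max_qp)


def adjust_quant_offset(first_param: float, crf: float,
                        base_offset: float = 0.0) -> float:
    """Adjust the quantization offset from the first parameter (made-up rule)."""
    return base_offset + 0.5 * (first_param - crf)


# Example: a P frame inside a scene, with the buffer three-quarters full.
info = FrameInfo(frame_type="P", scene_change=False)
p1 = first_coding_parameter(24.0, info)
p2 = second_coding_parameter(p1, info, buffer_fullness=0.75, overshoot=0.25)
q_offset = adjust_quant_offset(p1, 24.0)
```

In this sketch the second coding parameter and the adjusted offset are the two values that would be handed to the actual encoder for the second pass.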
Fig. 6 is a schematic structural diagram of a content adaptive video coding apparatus according to an embodiment of the present invention. As shown in Fig. 6, the apparatus includes a processor 201, a memory 202, an input device 203 and an output device 204. The apparatus may include one or more processors 201; one processor 201 is taken as an example in Fig. 6. The processor 201, the memory 202, the input device 203 and the output device 204 may be connected by a bus or in other manners; a bus connection is taken as an example in Fig. 6. The memory 202, as a computer-readable storage medium, can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the content adaptive video coding method in the embodiments of the present invention. The processor 201 performs the various functional applications and data processing of the apparatus by running the software programs, instructions and modules stored in the memory 202, that is, it implements the content adaptive video encoding method described above. The input device 203 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the apparatus. The output device 204 may include a display device such as a display screen.
An embodiment of the present invention also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the content adaptive video encoding method described in the foregoing embodiments, specifically including:
acquiring video data to be encoded, and dividing the video data into a plurality of image sets containing continuous frame images;
determining coding features of the image set, and inputting the coding features and set video picture evaluation parameters into a pre-trained machine learning model to output a code rate control parameter;
encoding the image set according to the coding features and the code rate control parameter.
It should be noted that, in the above embodiment of the content adaptive video coding apparatus, the units and modules are divided only according to functional logic; the division is not limited to the one described above, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not used to limit the protection scope of the embodiments of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the embodiments of the present invention are not limited to the particular embodiments described herein, but are capable of numerous obvious changes, rearrangements and substitutions without departing from the scope of the embodiments of the present invention. Therefore, while the embodiments of the present invention have been described in connection with the above embodiments, the embodiments of the present invention are not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the embodiments of the present invention, and the scope of the embodiments of the present invention is determined by the scope of the appended claims.

Claims (8)

1. A content-adaptive video encoding method, comprising:
acquiring video data to be encoded, and dividing the video data into a plurality of image sets including continuous frame images;
determining coding features of the image set, and inputting the coding features and set video picture evaluation parameters into a pre-trained machine learning model to output a code rate control parameter;
determining frame type information and scene information according to the coding features, performing predictive analysis according to the frame type information, the scene information and the code rate control parameter to obtain coding parameters, and encoding the image set based on the coding parameters, wherein the coding parameters include a first coding parameter obtained when performing the predictive analysis, and a second coding parameter determined based on the first coding parameter, coding feedback information, buffer information, the frame type information and the scene information.

2. The content adaptive video encoding method according to claim 1, wherein determining the coding features of the image set comprises:
acquiring a preset number of frame images from the image set;
encoding the preset number of frame images to obtain coding features, and determining those coding features as the coding features of the image set.

3. The content adaptive video encoding method according to claim 1, wherein the machine learning model comprises a joint model consisting of a first training model and a second training model, and inputting the coding features and the set video picture evaluation parameters into the pre-trained machine learning model to output the code rate control parameter comprises:
inputting the coding features and the set video picture evaluation parameters into the first training model and the second training model respectively, to obtain a first code rate control parameter output by the first training model and a second code rate control parameter output by the second training model;
performing a weighted average calculation on the first code rate control parameter and the second code rate control parameter to obtain the code rate control parameter.

4. The content adaptive video encoding method according to claim 3, wherein, before the coding features and the set video picture evaluation parameters are respectively input into the first training model and the second training model, the method further comprises:
obtaining video sample data of different scene types and corresponding different resolutions;
dividing the video sample data into training set samples, test set samples and verification set samples, and inputting these samples into the first training model and the second training model, respectively, for training.

5. The content adaptive video encoding method according to claim 1, wherein encoding the image set based on the coding parameters comprises:
adjusting a quantization offset parameter according to the first coding parameter;
encoding the image set according to the second coding parameter and the adjusted quantization offset parameter to output code stream data.

6. A content-adaptive video encoding device, comprising:
an image set determining module, configured to obtain video data to be encoded, and divide the video data into a plurality of image sets including continuous frame images;
a code rate parameter determining module, configured to determine coding features of the image set, and input the coding features and set video picture evaluation parameters into a pre-trained machine learning model to output a code rate control parameter;
an encoding module, configured to determine frame type information and scene information according to the coding features, perform predictive analysis according to the frame type information, the scene information and the code rate control parameter to obtain coding parameters, and encode the image set based on the coding parameters, wherein the coding parameters include a first coding parameter obtained when performing the predictive analysis, and a second coding parameter determined based on the first coding parameter, coding feedback information, buffer information, the frame type information and the scene information.

7. A content adaptive video encoding device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the content adaptive video encoding method according to any one of claims 1 to 5.

8. A storage medium storing computer-executable instructions, wherein the computer-executable instructions, when executed by a computer processor, are used to perform the content adaptive video encoding method according to any one of claims 1 to 5.
CN202210043241.9A 2022-01-14 2022-01-14 Content-adaptive video encoding method, device, equipment and storage medium Active CN114554211B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210043241.9A CN114554211B (en) 2022-01-14 2022-01-14 Content-adaptive video encoding method, device, equipment and storage medium
PCT/CN2023/070555 WO2023134523A1 (en) 2022-01-14 2023-01-04 Content adaptive video coding method and apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210043241.9A CN114554211B (en) 2022-01-14 2022-01-14 Content-adaptive video encoding method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114554211A CN114554211A (en) 2022-05-27
CN114554211B true CN114554211B (en) 2025-01-28

Family

ID=81671210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210043241.9A Active CN114554211B (en) 2022-01-14 2022-01-14 Content-adaptive video encoding method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114554211B (en)
WO (1) WO2023134523A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114554211B (en) * 2022-01-14 2025-01-28 百果园技术(新加坡)有限公司 Content-adaptive video encoding method, device, equipment and storage medium
CN115209151A (en) * 2022-07-18 2022-10-18 北京达佳互联信息技术有限公司 Video coding method, device, server and computer readable storage medium
CN115379229B (en) * 2022-07-19 2025-03-18 百果园技术(新加坡)有限公司 Content-adaptive video encoding method and device
CN117750019B (en) * 2022-11-24 2024-09-24 行吟信息科技(武汉)有限公司 Video encoding method and device, electronic equipment and computer readable storage medium
CN117750014B (en) * 2022-11-24 2024-11-26 行吟信息科技(武汉)有限公司 Video encoding method, device and storage medium
CN117750018A (en) * 2022-12-16 2024-03-22 书行科技(北京)有限公司 Video encoding method, video encoding device, electronic equipment and storage medium
CN117729335B (en) * 2023-03-14 2024-11-19 书行科技(北京)有限公司 Video data processing method, device, computer equipment and storage medium
CN116320429B (en) * 2023-04-12 2024-02-02 瀚博半导体(上海)有限公司 Video encoding method, apparatus, computer device, and computer-readable storage medium
CN117459732B (en) * 2023-10-25 2024-08-16 书行科技(北京)有限公司 Video encoding method, apparatus, device, readable storage medium, and program product
CN117676156A (en) * 2023-11-21 2024-03-08 书行科技(北京)有限公司 Video coding data prediction method, video coding method and related equipment
CN117714700B (en) * 2023-12-20 2024-12-24 书行科技(北京)有限公司 Video coding method, device, equipment, readable storage medium and product
CN117750080A (en) * 2023-12-28 2024-03-22 广州速启科技有限责任公司 Coding parameter prediction method and server for audio and video streaming
CN117956157B (en) * 2024-02-27 2024-11-05 书行科技(北京)有限公司 Video encoding method, device, electronic device and computer storage medium
CN118524222B (en) * 2024-07-22 2024-09-27 湖南快乐阳光互动娱乐传媒有限公司 Video transcoding method and device, storage medium and electronic equipment
CN118646877B (en) * 2024-08-15 2024-11-05 浙江大华技术股份有限公司 Video coding code rate adjusting method, device and image processing system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109286825A (en) * 2018-12-14 2019-01-29 北京百度网讯科技有限公司 Method and apparatus for handling video
CN111083473A (en) * 2019-12-28 2020-04-28 杭州当虹科技股份有限公司 Content self-adaptive video coding method based on machine learning

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2851111B1 (en) * 2003-02-10 2005-07-22 Nextream France DEVICE FOR ENCODING A VIDEO DATA STREAM
US8879623B2 (en) * 2009-09-02 2014-11-04 Sony Computer Entertainment Inc. Picture-level rate control for video encoding a scene-change I picture
CN101895759B (en) * 2010-07-28 2011-10-05 南京信息工程大学 A H.264 code rate control method
KR101868270B1 (en) * 2017-02-28 2018-06-15 재단법인 다차원 스마트 아이티 융합시스템 연구단 Content-aware video encoding method, controller and system based on single-pass consistent quality control
CN110351555B (en) * 2018-04-03 2021-04-23 杭州微帧信息科技有限公司 Multi-traversal video coding rate allocation and control optimization method based on reinforcement learning
WO2020036502A1 (en) * 2018-08-14 2020-02-20 Huawei Technologies Co., Ltd Machine-learning-based adaptation of coding parameters for video encoding using motion and object detection
CN110971943B (en) * 2018-09-30 2021-10-15 北京微播视界科技有限公司 Video code rate adjusting method, device, terminal and storage medium
CN110933430B (en) * 2019-12-16 2022-03-25 电子科技大学 Secondary encoding optimization method
CN112383777B (en) * 2020-09-28 2023-09-05 北京达佳互联信息技术有限公司 Video encoding method, video encoding device, electronic equipment and storage medium
CN114554211B (en) * 2022-01-14 2025-01-28 百果园技术(新加坡)有限公司 Content-adaptive video encoding method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114554211A (en) 2022-05-27
WO2023134523A1 (en) 2023-07-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant