
CN117201796A - Video encoding method, apparatus, computing device and storage medium - Google Patents


Info

Publication number
CN117201796A
CN117201796A
Authority
CN
China
Prior art keywords: video, frame, information, coding blocks, foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311021953.1A
Other languages
Chinese (zh)
Other versions
CN117201796B (en)
Inventor
石仔良
刘天义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huawei Cloud Computing Technology Co ltd
Original Assignee
Shenzhen Huawei Cloud Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huawei Cloud Computing Technology Co ltd filed Critical Shenzhen Huawei Cloud Computing Technology Co ltd
Priority to CN202311021953.1A priority Critical patent/CN117201796B/en
Publication of CN117201796A publication Critical patent/CN117201796A/en
Application granted granted Critical
Publication of CN117201796B publication Critical patent/CN117201796B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides a video encoding method, a video encoding apparatus, a computing device, and a computer-readable storage medium. The method includes: acquiring generation information of a video; dividing a first frame of the video according to the generation information to obtain a plurality of coding blocks, where the plurality of coding blocks of the first frame include a current block, and a target block is included either among the plurality of coding blocks of the first frame or in a second frame of the video adjacent to the first frame; predicting predicted pixel information of the target block according to current pixel information of the current block; and encoding the video based on the predicted pixel information. According to the present application, the partitioning of coding blocks can be realized quickly and with little computation, thereby accelerating the video encoding process.

Description

Video encoding method, apparatus, computing device and storage medium

Technical Field

The present application relates to the field of image processing technology, and in particular to a video encoding method, a video encoding apparatus, a computing device, and a computer-readable storage medium.

Background Art

In the field of video coding and compression, both intra-frame prediction and inter-frame prediction require dividing a video frame into multiple coding blocks for better predictive coding. However, the block-partitioning process in the prior art often requires many trial partitions to find the optimal one, demanding a large amount of computation; this increases computational cost and time consumption and makes it increasingly difficult to keep up with ever-growing video data volumes and ever-higher transmission-rate requirements. There is therefore an urgent need in this field for a technique that can partition video frames into coding blocks quickly and with little computation, thereby accelerating the video encoding process.

Summary of the Invention

To this end, the present application provides a video encoding method, a video encoding apparatus, a computing device, and a computer-readable storage medium, which can partition video frames into coding blocks quickly and with little computation, thereby accelerating the video encoding process.

In one aspect, the present application provides a video encoding method, including: acquiring generation information of a video; dividing a first frame of the video according to the generation information to obtain a plurality of coding blocks, where the plurality of coding blocks of the first frame include a current block, and a target block is included either among the plurality of coding blocks of the first frame or in a second frame of the video adjacent to the first frame; predicting predicted pixel information of the target block according to current pixel information of the current block; and encoding the video according to the predicted pixel information.

According to this aspect, using the generation information of the video to determine how frames are partitioned into coding blocks allows the optimal partition to be chosen directly and quickly from the nature of the video content (material, foreground vs. background, etc.), avoiding the extra computation brought by repeated trial partitions. This greatly improves the encoding efficiency of generated video, allowing generated video to have clearer pictures and higher transmission rates.

In a particular embodiment of the present application, the generation information includes position information and material information of a foreground object in the first frame.

According to this embodiment, the position and material information of a foreground object usually cannot be obtained directly from the video picture itself, but a video generation tool can usually distinguish foreground objects from background objects and thus determine the foreground object's position. In addition, the generation tool determines the foreground's material information when rendering the object's surface colour and material. Together, these two kinds of information help determine different partitioning strategies for the coding blocks, making the subsequent prediction and encoding more accurate and efficient.

In a particular embodiment of the present application, dividing the first frame of the video according to the generation information to obtain the plurality of coding blocks includes: determining a foreground region and a background region in the first frame according to the position information of the foreground object; and partitioning the foreground region according to a first mode and the background region according to a second mode, where the coding blocks of the first mode are smaller than those of the second mode.
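As a minimal runnable sketch of this rule: given the foreground object's bounding box (as might come from the generation information), blocks overlapping it get a fine first-mode size and the rest a coarse second-mode size. The function name, the bounding-box representation, and the concrete sizes (8 and 64) are illustrative assumptions, not values fixed by the present application.

```python
FINE, COARSE = 8, 64  # assumed first-mode vs. second-mode block sizes, in pixels

def partition_frame(width, height, fg_box):
    """Yield (x, y, size) coding blocks for one frame.

    fg_box: (x0, y0, x1, y1) foreground bounding box from the generator.
    """
    x0, y0, x1, y1 = fg_box
    for by in range(0, height, COARSE):
        for bx in range(0, width, COARSE):
            overlaps_fg = bx < x1 and bx + COARSE > x0 and by < y1 and by + COARSE > y0
            if overlaps_fg:
                # foreground region: subdivide into fine first-mode blocks
                for sy in range(by, by + COARSE, FINE):
                    for sx in range(bx, bx + COARSE, FINE):
                        yield (sx, sy, FINE)
            else:
                # background region: one coarse second-mode block
                yield (bx, by, COARSE)
```

For a 128x128 frame with the foreground in the top-left 64x64 corner, this yields sixty-four 8x8 blocks for the foreground cell and three 64x64 blocks for the background, rather than trying every partition of every cell.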

According to this embodiment, partitioning the foreground and background regions differently exploits the difference in how much their displayed content changes, thereby improving prediction accuracy and compression efficiency.

In a particular embodiment of the present application, dividing the first frame of the video according to the generation information to obtain the plurality of coding blocks further includes: determining an edge region of the foreground object in the first frame according to the position information of the foreground object; and partitioning the edge region according to the first mode.

According to this embodiment, partitioning the edge region into finer coding blocks exploits the fact that edge regions are more prone to change, preserving the display details of the foreground object's edges more clearly and keeping the image content of the edge region as accurate and faithful as possible.

In a particular embodiment of the present application, dividing the first frame of the video according to the generation information to obtain the plurality of coding blocks includes: determining a foreground region and a background region in the first frame according to the position information of the foreground object; judging, based on the material information of the foreground object, whether the foreground object is rigid or flexible; if the foreground object is rigid, partitioning the foreground region according to a third mode; and if the foreground object is flexible, partitioning the foreground region according to a fourth mode, where the coding blocks of the third mode are larger than those of the fourth mode.
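The material-based rule above can be sketched as a simple lookup. The material label sets and the concrete block sizes (32 and 8) are assumptions for illustration; the application only fixes that third-mode blocks are larger than fourth-mode blocks.

```python
THIRD_MODE, FOURTH_MODE = 32, 8   # assumed block sizes, in pixels

RIGID_MATERIALS = {"metal", "stone", "plastic"}      # assumed label sets
FLEXIBLE_MATERIALS = {"cloth", "hair", "water"}

def foreground_block_size(material):
    """Pick the foreground block size from the generator's material label."""
    if material in RIGID_MATERIALS:
        return THIRD_MODE      # rigid: motion vectors are largely uniform
    if material in FLEXIBLE_MATERIALS:
        return FOURTH_MODE     # flexible: parts move differently
    raise ValueError(f"unknown material: {material}")
```

The design point is that the decision costs a dictionary lookup instead of a rate-distortion search over candidate partitions.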

According to this embodiment, partitioning rigid and flexible objects differently exploits the different ways the two kinds of object behave in the video picture, compressing the video data more efficiently while preserving the display details of flexible objects as far as possible and improving picture clarity and fidelity.

In a particular embodiment of the present application, the video includes a virtual-scene video generated by a video generation tool, and acquiring the generation information of the video includes: acquiring the generation information from the video generation tool.

According to this embodiment, virtual-scene videos are ubiquitous in today's video products: games, films, and television series contain large amounts of virtual-scene footage produced manually or with special-effects technology, so accelerating the encoding of virtual-scene video greatly promotes the dissemination and wide use of video content. In addition, acquiring the generation information from the video generation tool that produced the virtual-scene video allows information useful for video encoding to be obtained directly and quickly from the source, improving the accuracy and timeliness of the information.

In a particular embodiment of the present application, the video generation tool includes a renderer or an AI tool.

According to this embodiment, renderers and AI tools are common video generation tools that can provide a large amount of otherwise hidden information about the video content, which the encoder can consult to determine the block partition quickly.

In a particular embodiment of the present application, the coding blocks include one or more of macroblocks, sub-blocks, coding tree units (CTUs), coding units (CUs), prediction units (PUs), and transform units (TUs).

According to this embodiment, the various concrete forms of coding block used in this field can all adopt the partitioning method provided by the present application, so partitioning can be accelerated at every coding level, greatly increasing the overall rate of encoding and compression.

In another aspect, the present application provides a video encoding apparatus, including: an acquisition module, configured to acquire generation information of a video; a partitioning module, configured to divide a first frame of the video according to the generation information to obtain a plurality of coding blocks, where the plurality of coding blocks of the first frame include a current block, and a target block is included either among the plurality of coding blocks of the first frame or in a second frame of the video adjacent to the first frame; a prediction module, configured to predict predicted pixel information of the target block according to current pixel information of the current block; and an encoding module, configured to encode the video according to the predicted pixel information.

In another aspect, the present application provides a computing device, including a processor and a memory, where the processor is configured to execute a computer program stored in the memory to implement the video encoding method described above.

In another aspect, the present application provides a computer-readable storage medium storing a computer program, where the computer program is used to execute the video encoding method described above.

In another aspect, the present application provides a computer program product including program code, which, when run by a computer, causes the computer to implement the video encoding method described above.

Each of the video encoding apparatus, computing device, computer-readable storage medium, and computer program product provided above is used to execute the video encoding method provided above; for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding schemes in the corresponding method provided above, which are not repeated here.

Brief Description of the Drawings

Specific embodiments of the present application are described in detail below with reference to the accompanying drawings, in which:

Figure 1 is a schematic diagram of an application scenario of a video encoding method according to an embodiment of the present application;

Figure 2 is a schematic diagram of inter-frame prediction in the video encoding method of the embodiment of Figure 1;

Figure 3 is a schematic diagram of the coding-block partitioning of the video encoding method of the embodiment of Figure 1;

Figure 4 is a schematic flowchart of a video encoding method according to another embodiment of the present application;

Figure 5 is a schematic diagram of the partitioning step of the video encoding method of the embodiment of Figure 4;

Figure 6 is another schematic diagram of the partitioning step of the video encoding method of the embodiment of Figure 4;

Figure 7 is another schematic diagram of the partitioning step of the video encoding method of the embodiment of Figure 4;

Figure 8 is another schematic diagram of the partitioning step of the video encoding method of the embodiment of Figure 4;

Figure 9 is another schematic diagram of the partitioning step of the video encoding method of the embodiment of Figure 4;

Figure 10 is another schematic diagram of the partitioning step of the video encoding method of the embodiment of Figure 4;

Figure 11 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application;

Figure 12 is a schematic structural diagram of a computing device according to an embodiment of the present application.

Detailed Description

In order to enable those skilled in the art to understand the concepts and ideas of the present application more clearly, the present application is described in detail below with reference to specific embodiments. It should be understood that the embodiments given herein are only a portion of all possible embodiments of the present application. After reading the description of this application, those skilled in the art will be capable of improving, modifying, or replacing some or all of the following embodiments, and such improvements, modifications, or replacements also fall within the scope of protection claimed by this application.

In this document, the terms "a", "an", and similar words are not intended to indicate that only one of the described things exists, but rather that the description concerns only one of them; there may be one or more. The terms "comprise", "include", and similar words indicate a logical relationship, not a spatial-structural one. For example, "A includes B" means that B logically belongs to A, not that B is spatially located inside A. In addition, "comprise", "include", and similar words are to be read as open-ended rather than closed: "A includes B" means that B belongs to A, but B does not necessarily constitute all of A; A may also include other elements such as C, D, and E.

In this document, the terms "first", "second", and similar words do not imply any order, quantity, or importance; they merely distinguish different elements. The terms "embodiment", "this embodiment", "an embodiment", and "one embodiment" do not mean that the relevant description applies only to one particular embodiment; the description may also apply to one or more other embodiments. Those skilled in the art should understand that any description made for one embodiment may be substituted for, combined with, or otherwise joined to the related descriptions of one or more other embodiments; the new embodiments produced by such substitution or combination can readily be conceived by those skilled in the art and fall within the scope of protection of this application.

In the embodiments of the present application, video may refer to the various techniques for capturing, recording, processing, storing, transmitting, and reproducing a series of still images in the form of electrical signals. Video exploits the persistence of human vision: by playing a series of pictures (also called frames), it gives the human eye a sense of motion. In some embodiments, the video may include generated video. Unlike traditional video, which is produced by capture devices such as cameras, camera phones, and video recorders filming real-world scenes, generated video is produced not by capture devices but by computer software such as renderers or AI tools, which generate a video of consecutive frames according to user requirements; the most common examples are games and films.

In the embodiments of the present application, video encoding may refer to converting a file in an original video format into a file in another video format by means of compression. A video is a continuous sequence of images made up of consecutive frames, and because consecutive frames are highly similar, the original video must be encoded and compressed for convenient storage and transmission, removing redundancy in the spatial and temporal dimensions. Video image data is strongly correlated, i.e. it contains a large amount of redundant information, which can be divided into spatial-domain and temporal-domain redundancy. Compression removes this redundancy from the data and includes intra-frame image compression, inter-frame image compression, and entropy coding. In this document, "video compression" and "video encoding" are used interchangeably in some contexts.

In this field, the video frames of games and of most films are generated by computers through renderers, especially in non-local rendering scenarios such as cloud rendering: after the renderer generates the video, it must be compressed by a video encoder and streamed to a terminal for decoding and display. Non-local rendering generally follows this workflow: 1) the renderer generates video frames according to a set program; 2) the frames are encoded (compressed) by a video encoder, greatly reducing network transmission bandwidth; 3) the compressed video stream is transmitted over the network to the end-user device; 4) the end-user device decodes the compressed video stream with a decoder and then displays the frames generated by the renderer.

The amount of video data before compression is enormous. Taking 1080p, common in current applications, as an example: each pixel's luma and chroma together are represented by 12 bits, and the frame rate is 60 (i.e. 60 pictures are output per second), so the required network transmission bandwidth is about 1.39 Gbps. 4K video requires even more bandwidth. Such Internet bandwidth demands are hard to meet. A key technology for video streaming applications is video coding, also called video compression; the component that performs it is the video encoder. As with text compression, the goal is to remove redundant data from the video and reduce the amount of data after encoding. Video compression is sometimes lossy; although lossy compression introduces some picture distortion, the distortion is usually hard for the human eye to perceive.
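The 1.39 Gbps figure follows directly from the numbers in the text (here taking "Gbps" as 1024^3 bits per second, which matches the stated result):

```python
# Raw (uncompressed) bandwidth for the 1080p example above:
# 12 bits per pixel (luma + chroma) at 60 frames per second.
width, height = 1920, 1080
bits_per_pixel = 12
fps = 60

bits_per_second = width * height * bits_per_pixel * fps
print(bits_per_second)            # 1492992000 bits/s
print(bits_per_second / 1024**3)  # ~1.39, the "1.39 Gbps" in the text
```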

Within a single image (also called a single frame) of a video, neighbouring pixels usually have strong spatial correlation, and consecutive images have strong temporal correlation. Video compression coding mainly uses intra-frame and inter-frame prediction, using the already-encoded pixels of an image to predict spatially and temporally neighbouring pixels, thereby effectively compressing the temporal and spatial redundancy of the video pixels.

As video pictures grow larger, e.g. to 4K or 8K, video encoders usually divide the picture into many small blocks for compression, such as the 16x16-pixel macroblocks of AVC (advanced video coding). To compress the differing texture and motion details of a video scene more efficiently, HEVC (high efficiency video coding) defines 64x64-pixel coding tree units (CTUs) for the image as well as coding units (CUs) of various sizes, and the CU partitioning of the VVC (versatile video coding) standard is more flexible and varied still.

During video encoding, different CU partition depths must be tried. As pictures grow larger, encoding-latency requirements tighten, and because predictions between CUs depend on one another, the pressure on video encoders keeps increasing.

One technique in this field, when compressing in units of CTUs, performs motion estimation for each CU partition depth, i.e. for each of the four partitions 64x64, 32x32, 16x16, and 8x8, to obtain motion vectors; based on the motion vectors it then performs inter-frame prediction, discrete cosine transform (DCT), quantization, bit-count estimation for entropy coding, and distortion computation, and finally compares the candidates layer by layer using a rate-distortion optimization (RDO) algorithm. For a 16x16 pixel area there are four partitions: four 8x8 CUs, two 16x8 CUs, two 8x16 CUs, or one 16x16 CU. Motion estimation is performed for each of the four, motion vectors are obtained, prediction is made based on them, and the partition with the smallest rate-distortion cost is selected. The evaluation then proceeds upward through the remaining partition levels in the same way.
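The layer-by-layer RDO comparison can be sketched as a recursive search. This is a hypothetical simplification, not a real encoder: it uses square quadrant splits only, and `rd_cost` stands in for the whole motion-estimation / DCT / quantization / entropy-coding pipeline, which this sketch does not implement.

```python
def best_partition(block, size, rd_cost, min_size=8):
    """Return (cost, partition tree) for one square area.

    rd_cost(block, size) is an assumed callback giving the rate-distortion
    cost of encoding the area as a single size x size CU.
    """
    whole_cost = rd_cost(block, size)          # encode as one CU
    if size <= min_size:
        return whole_cost, size
    half = size // 2
    split_cost, subtrees = 0.0, []
    for quadrant in range(4):                  # try the 4-way split
        c, t = best_partition(block, half, rd_cost, min_size)
        split_cost += c
        subtrees.append(t)
    if split_cost < whole_cost:                # RDO decision, layer by layer
        return split_cost, subtrees
    return whole_cost, size
```

Note that every level is fully evaluated even though only one partition survives, which is exactly the wasted computation the text goes on to criticize.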

This technique can be used for both traditional and generated video. Its problem is that every CU partition must be tried, each followed by motion estimation, prediction, DCT, quantization, and entropy coding, with the final partition obtained by rate-distortion computation. The amount of computation is enormous, and since only one partition is ultimately selected, a great deal of it is wasted. Especially for 4K and 8K video under the latest VVC coding standard, this makes it hard to meet the needs of real-time video encoding and decoding.

In another technique in this field, used for generative video coding, the rendering engine provides the motion vector of every pixel to the video encoder. The encoder computes the similarity of the pixel motion vectors within each candidate partition, for example the variance of the 256 motion vectors in a 16x16 block, of the 128 in a 16x8 block, and of the 64 in an 8x8 block, and finally selects the block size of the block with the smallest variance as the final partition.
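A minimal sketch of this variance-based selection, under assumptions of our own (the function names, the candidate list, and averaging the two component variances are illustrative, not the cited technique's exact procedure):

```python
import statistics

def mv_variance(mvs):
    """Mean per-component variance of a list of (dx, dy) motion vectors."""
    xs = [m[0] for m in mvs]
    ys = [m[1] for m in mvs]
    return (statistics.pvariance(xs) + statistics.pvariance(ys)) / 2

def pick_block_size(mv_field, candidates=((16, 16), (16, 8), (8, 8))):
    """mv_field: dict mapping (x, y) -> (dx, dy) over a 16x16 area.

    Returns the candidate (w, h) whose top-left w x h sub-block has the
    most uniform motion vectors (smallest variance).
    """
    best = None
    for w, h in candidates:
        mvs = [mv_field[(x, y)] for y in range(h) for x in range(w)]
        v = mv_variance(mvs)
        if best is None or v < best[0]:
            best = (v, (w, h))
    return best[1]
```

This avoids motion estimation and RDO, but as the next paragraph notes, it still evaluates every candidate partition.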

This technique can be used for generated video. Its problem is that, although the motion-estimation and RDO computations are no longer needed, different CU partitions must still be tried, so the amount of computation remains huge and partly wasted.

Some embodiments of the present application address the following problem: for generated video, how to use the information provided by the renderer to reduce the number of CU (or other block) partition-size trials, and thereby reduce wasted encoder computation.

本申请一些实施例中,对于生成式视频,在尽量减小压缩导致的画质损失情况下,减少视频编码器算力的浪费,同时降低编码器的编码时延,满足高画质低时延编码需求。具体来说,当前视频编码器,都需要尝试不同CU划分方式,最后选择一种方式作为最终划分方式,不同CU划分都需要进行大量计算,由于最终视频码流只选择其中一种划分方式,因此编码器存在大量无效计算。而实际上,视频画面中占主要范围的背景区域,前后两帧运动向量通常一致,前景物体也可能运动向量一致,理论上不需要尝试所有划分方式。特别对于生成式视频,渲染器等画面生成工具知道哪些像素属于背景区域哪些属于前景区域。因此本申请实施例通过渲染器的信息辅助视频编码器进行编码,最大限度减小无效计算。In some embodiments of this application, for generative videos, while minimizing the loss of image quality caused by compression, the waste of computing power of the video encoder is reduced, and the encoding delay of the encoder is reduced to meet the requirements of high image quality and low delay. Coding requirements. Specifically, current video encoders need to try different CU division methods, and finally choose one method as the final division method. Different CU divisions require a lot of calculations. Since only one of the division methods is selected for the final video stream, The encoder has a lot of invalid calculations. In fact, in the background area that occupies the main range of the video image, the motion vectors of the two frames before and after are usually the same, and the motion vectors of the foreground objects may also be the same. In theory, there is no need to try all division methods. Especially for generative video, image generation tools such as renderers know which pixels belong to the background area and which belong to the foreground area. Therefore, the embodiment of the present application uses the information of the renderer to assist the video encoder in encoding, thereby minimizing invalid calculations.

Some embodiments of the present application apply to encoders for generative video. A video generator, such as a renderer or an AI tool, produces the video together with video-related information, including but not limited to layer information, the coordinates of foreground moving-object pixels in the picture, and the material of foreground objects, such as whether they are rigid bodies or flexible objects. Based on the layer and moving-object coordinate information, the video encoder applies different CU partitioning to the background region and to foreground moving objects, quickly performing inter-frame mode encoding, thereby saving encoder computing power and reducing encoding latency.

In some embodiments of the present application, the CU partition for inter prediction uses fixed CU sizes determined by the background region and the foreground moving objects. Compared with trying four different CU partitioning schemes and selecting a final one via the RDO algorithm, these embodiments can reduce wasted computing power by a factor of three; meanwhile the background region is usually not an important part of the video and has essentially no impact on the end-user experience. In addition, some embodiments partition foreground moving objects differently depending on whether their material is rigid or flexible: the motion vectors of a rigid object are usually consistent, whereas the motion vectors of the different parts of a flexible object usually are not. Rigid objects can therefore be partitioned into large CUs such as 64x64 or 32x32, while flexible objects are partitioned into 8x8 CUs, saving computing power as much as possible while preserving picture quality.

Figure 1 shows a schematic diagram of an application scenario of a video encoding method according to an embodiment of the present application.

This embodiment comprises a video generator and a video encoder. The video encoder rapidly compresses the generative video based on the video and related information produced by the video generator; in particular, it performs fast partitioning of intra-frame or inter-frame coding blocks. In this embodiment, the video generation tool produces a video together with associated generation information, and both are input to the video encoder. The video encoder can be set and adjusted through encoder parameters. The encoder receives the input from the video generation tool, compresses the received video according to the content or parameters of the generation information, and finally produces a compressed bitstream for transmission or further processing. In this embodiment, the video to be encoded is one produced by a video generation tool rather than, for example, a natural scene captured directly by a camera; the encoding process therefore requires not only the raw video data but also generation information describing the video content, and the information about the video content provided by the generation information plays an important role in accelerating the encoding process. The video encoder can compress the video using intra prediction and inter prediction, both of which require the video frames to be divided into coding blocks.

As an example, Figure 2 shows a schematic diagram of the inter prediction process in the video encoding method of the embodiment of Figure 1. Figure 2 shows two consecutive frames of a video; the dashed boxes are only illustrative and are not part of the picture content. Between these two consecutive frames, only the position of object 1 has changed, so the pixels of the object 1 region (the dashed square) in the previous frame can be used to predict the pixels of the object 1 region in the second frame: the residual obtained by subtracting the corresponding pixels of the two regions, together with the motion vector of object 1, suffices to predictively encode the second frame. When the later frame is inter-predicted, the earlier frame must be searched to find the matching region; this search is motion estimation (ME), and the positional change of an object or pixel between the two frames is the motion vector (MV).
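The ME/MV search described above can be sketched as follows. This is an illustrative full-search block-matching example, not the patented method: every candidate offset within a search window is scored by the sum of absolute differences (SAD), and the lowest-cost offset is taken as the motion vector. The frame contents and parameters are assumptions for the example.

```python
import random

def sad(ref, cur, ox, oy, bx, by, bs):
    """SAD between the bs x bs block at (bx, by) in `cur` and the block
    at (bx + ox, by + oy) in `ref`. Frames are 2-D lists of pixel values."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + oy + y][bx + ox + x])
    return total

def motion_estimate(ref, cur, bx, by, bs, search):
    """Full search over offsets in [-search, search]^2; returns (mv, best_sad)."""
    h, w = len(ref), len(ref[0])
    best = None
    for oy in range(-search, search + 1):
        for ox in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + ox and bx + ox + bs <= w and
                    0 <= by + oy and by + oy + bs <= h):
                continue
            cost = sad(ref, cur, ox, oy, bx, by, bs)
            if best is None or cost < best[1]:
                best = ((ox, oy), cost)
    return best

# Synthetic test: the current frame is the reference shifted right by 2 px,
# so the best-matching reference block lies 2 px to the left of the block.
random.seed(0)
W = H = 24
ref = [[random.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[y][x - 2] if x >= 2 else 0 for x in range(W)] for y in range(H)]
mv, cost = motion_estimate(ref, cur, bx=8, by=8, bs=8, search=3)
# mv == (-2, 0) with cost 0: an exact match in the reference frame
```

Real encoders replace the exhaustive search with faster patterns (diamond, hexagon), but the cost function and the meaning of the MV are the same.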

As an example, Figure 3 shows a schematic diagram of the coding block partitioning process in the video encoding method of the embodiment of Figure 1. As shown in Figure 3, in this embodiment a CTU of 64x64 pixels is defined for the image, and each image of the video is divided into several non-overlapping CTUs. Inside a CTU, a recursive quadtree hierarchy is used: the top level is a 64x64 pixel unit, which can be split one level down into four 32x32 CUs; each 32x32 unit can be further split into 16x16 coding blocks, and so on down to the minimum 8x8 coding unit.
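The quadtree recursion above can be sketched in a few lines. This is a hedged illustration: `should_split` stands in for whatever decision rule the encoder uses (e.g., an RDO comparison), and the example rule below is arbitrary.

```python
def partition_ctu(x, y, size, should_split, min_size=8):
    """Return the list of (x, y, size) leaf coding blocks for one CTU,
    recursively splitting each block into four quadrants until either
    `should_split` declines or the minimum coding-unit size is reached."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += partition_ctu(x + dx, y + dy, half,
                                        should_split, min_size)
        return blocks
    return [(x, y, size)]

# Example rule: keep splitting only the top-left quadrant at every level.
leaves = partition_ctu(0, 0, 64, lambda x, y, s: x == 0 and y == 0)
sizes = sorted(b[2] for b in leaves)
# 64 -> three 32x32 leaves + top-left split -> three 16x16 leaves
# + top-left split -> four 8x8 leaves (recursion stops at min_size)
```

The leaves always tile the full 64x64 CTU exactly, whatever rule is supplied.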

In this embodiment, both intra prediction and inter prediction can proceed CU by CU, in left-to-right, top-to-bottom order. The residual is obtained by subtracting the predicted values from the original pixel values within the CU; after the DCT, the transform result is quantized, and the quantized values together with the related syntax elements, such as the CU partition depth and the motion vectors of the pixel blocks, are entropy-coded to produce the compressed video stream. Meanwhile, the quantized result is dequantized and inverse-DCT-transformed to obtain the decoded residual, which is added to the prediction to obtain the reconstructed pixel values. The reconstructed pixels of the previous CU, its partition depth, and other information are needed as input for predicting the next CU, so in some cases not all CUs can be compressed in parallel.
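The residual coding loop above can be reduced to a minimal sketch. This example makes two simplifying assumptions: the DCT is omitted and a single flat quantization step (`QSTEP`, a hypothetical value) is applied directly to the residual, which is enough to show why the reconstruction error is bounded by half the quantization step.

```python
QSTEP = 8  # hypothetical quantization step

def encode_block(orig, pred):
    """Quantized residual for one block (flat lists of pixel values)."""
    return [round((o - p) / QSTEP) for o, p in zip(orig, pred)]

def reconstruct_block(pred, qres):
    """Dequantize and add back onto the prediction (mirrors the decoder)."""
    return [p + q * QSTEP for p, q in zip(pred, qres)]

orig = [120, 130, 141, 155]
pred = [118, 133, 150, 150]
qres = encode_block(orig, pred)      # residuals 2, -3, -9, 5 quantize to 0, 0, -1, 1
recon = reconstruct_block(pred, qres)
err = [abs(o - r) for o, r in zip(orig, recon)]  # each error <= QSTEP / 2
```

Note that the encoder must reconstruct exactly as the decoder does, since the reconstructed pixels (not the originals) feed the prediction of the next CU, as stated above.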

Figure 4 shows a schematic flowchart of a video encoding method according to an embodiment of the present application.

According to this embodiment, the video encoding method includes steps S410 to S440, each of which is described in detail below.

S410: Obtain the generation information of the video.

In this embodiment, the generation information may refer to information produced during the generation of a generative video that is relevant to the subsequent encoding operations. In some embodiments, the generation information differs from the information carried by the video images themselves (for example, pixel values, or texture and color information that the pixel values directly reflect): it is information supplied by the tool that generated the video and cannot, or can only with difficulty, be determined from the image content or pixel values alone.

As an example, the generation information includes the position information and material information of the foreground object in the first frame. For example, the generation information contains layer information and the coordinates of the foreground moving object's pixels in the picture.

In this example, the position information of a foreground object may refer to its position or coordinates in the video picture, and the material information may refer to the material that the video producer assigned to the fictitious foreground object when creating the video, such as metal, wood, or human skin.

As an example, the video includes a virtual scene video generated by a video generation tool. In this case, to obtain the generation information of the video, the generation information can be obtained from the video generation tool.

In this example, a video generation tool may refer to a tool that generates video from manual production steps such as modeling, rendering, and color grading; such a tool is distinct from shooting, filming, or recording tools aimed at objectively existing physical objects. A virtual scene video may refer to a video whose displayed objects do not really exist but are artificially created through image-editing means such as sketching, modeling, and rendering. Of course, a virtual scene video generated by a video generation tool may also contain real objects; for example, a video in AR (augmented reality) glasses that mixes generated virtual objects with real objects also falls under the virtual scene video defined in this application.

In this example, to obtain the generation information of the video, the relevant information or data can be exported directly from the video generation tool for use in subsequent encoding. For example, some video generation tools can provide the three-dimensional coordinates of objects, from which it can be determined which objects are foreground and which are background. As another example, some tools can provide the material information of an object, so that its hardness or rigidity can be judged from the material, and different coding block partitioning schemes can be used for materials of different rigidity.

As examples, video generation tools include renderers and AI (artificial intelligence) tools.

In this example, a renderer may refer to a software or hardware tool that converts a 3D (three-dimensional) model into a 2D (two-dimensional) image: it converts the geometry, textures, lighting, and materials of the 3D model into a 2D image for display on a screen or for printing. Renderers are commonly used in film, games, architectural design, and similar fields to create realistic visual effects. For example, a renderer may be software for processing and displaying three-dimensional graphics that draws a 3D scene onto a two-dimensional display device such as a monitor, mobile phone, or television; its main role is to convert a digital model into an image the device can display.

In this example, an AI tool may refer to a software tool that uses artificial intelligence to generate video automatically. Such a tool can intelligently generate video works through deep-learning techniques and algorithms, based on the user's drawing operations or on user-supplied picture material, video material, added text, voice-over, background music, and so on.

S420: Partition the first frame of the video according to the generation information to obtain multiple coding blocks, where the multiple coding blocks of the first frame include a current block, and a target block is included either among the multiple coding blocks of the first frame or in a second frame of the video adjacent to the first frame.

In this embodiment, a frame of the video may refer to one image in the continuous image sequence that constitutes the video. The second frame adjacent to the first frame may be a frame immediately next to it in the sequence, or one that is one or more frames away; the two frames have temporal or spatial correlation that can be exploited for prediction and encoding.

In this embodiment, a coding block may refer to one of the small blocks into which a video frame is divided during compression encoding for the sake of better coding; each block is an independent unit to which different coding modes can be applied. In this field, compression includes inter prediction and intra prediction, and the small blocks that both require are the coding blocks. Coding blocks can exist at different levels: a large coding block can contain many small coding blocks, and all of these, large or small, are the coding blocks referred to in this application.

As an example, the coding block includes one or more of a macroblock, a sub-block, a coding tree unit (CTU), a coding unit (CU), a prediction unit (PU), and a transform unit (TU).

In this example, macroblocks and sub-blocks are coding block names in the AVC coding standard, for instance, while CTU, CU, PU, and TU are coding block names in the HEVC and VVC standards; the partitioning, prediction, and encoding of all of these blocks can use the video encoding method provided by the embodiments of this application. Those skilled in the art will appreciate that other coding standards exist in this field; concepts in those standards that are the same as or similar to the coding blocks in this application all fall within its scope of protection, and this application is not limited to the standards listed above.

S430: Predict the predicted pixel information of the target block according to the current pixel information of the current block.

In this embodiment, the current pixel information may refer to the known pixel information of the current block, including the pixel value of every pixel in the current block. The predicted pixel information may refer to the pixel information of the target block predicted from the current block, including the pixel value of every pixel in the predicted block.

In this embodiment, predicting the target block's pixel information from the current block's pixel information can use prediction methods known in the art, for example predicting the value of every pixel of the target block from the known pixels of the current block, based on the spatial and temporal similarity and redundancy between the two blocks.

S440: Encode the video according to the predicted pixel information.

In this embodiment, encoding the video from the predicted pixel information can use techniques known in the art, such as the DCT, quantization, and entropy coding. Specifically, the residual is obtained by subtracting the predicted values (the predicted pixel information) from the original pixel values of the video within the coding block (the current pixel information); after the DCT, the transform result is quantized, and the quantized values and related syntax elements, such as the CU partition depth and the motion vectors of the pixel blocks, are entropy-coded to obtain the compressed video stream.

As an example, Figure 5 shows a schematic diagram of the partitioning step in the video encoding method of the embodiment of Figure 4. As shown in Figure 5, S420 may be implemented as: partitioning the first frame of the video into multiple coding blocks, where the coding blocks of the first frame include both the current block and the target block. In this example, the coding mode is intra prediction: the current block and the target block are in the same frame, and the pixel information of a target block at an adjacent or other position is predicted from the pixel information of the current block in that frame. Intra prediction relies mainly on the spatial correlation of picture content within a frame; adjacent coding blocks usually have spatially correlated pixel information.

As an example, Figure 6 shows a schematic diagram of the partitioning step in the video encoding method of the embodiment of Figure 4. As shown in Figure 6, S420 may be implemented as: partitioning the first frame of the video into multiple coding blocks, where the coding blocks of the first frame include the current block and the second frame includes the target block. In this example, the coding mode is inter prediction: the current block and the target block are in two adjacent or nearby frames, and the pixel information of the target block in one frame is predicted from the current block in the other. Inter prediction relies mainly on the temporal correlation of the content of two temporally adjacent frames; adjacent frames in a video sequence usually have temporally correlated pixel information.

As an example, Figure 7 shows a schematic diagram of the partitioning step in the video encoding method of the embodiment of Figure 4. As shown in Figure 7, S420 may be implemented as: determining the foreground region and the background region in the first frame according to the position information of the foreground object; and partitioning the foreground region according to a first mode and the background region according to a second mode, where the coding blocks of the first mode are smaller than those of the second mode.

In this example, the generation information includes the position information of the foreground object, from which the position and boundary of the foreground object in the current picture can be determined, thereby delimiting the foreground region and the background region. Because of the shape constraints of coding blocks, the foreground region used for partitioning may not fit the outline of the foreground object exactly but instead contains that outline within it. For example, in Figure 7 the foreground object is a circle, and the foreground region is the square area containing it, delimited by the black box. The foreground region can be partitioned into smaller coding blocks (fewer pixels each) and the background region into larger ones (more pixels each), because foreground objects usually move or change more and need finer partitioning, while the background changes little and can be partitioned more coarsely. For example, for the background region, the coding blocks (e.g., CUs) can be partitioned directly in a fixed mode, such as entirely into 64x64 CUs for prediction and encoding. In some embodiments, a user interface can be provided for the background region so that the user specifies the partitioning; for example, to further improve video quality, smaller coding blocks such as 32x32 or 16x16 can be used.
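The two-mode policy described above can be sketched as follows. This is a hedged illustration: the block sizes (8x8 for the foreground mode, 64x64 for the background mode) and the foreground rectangle are example assumptions, and the rectangle is assumed to be aligned to the large block grid.

```python
def partition_frame(width, height, fg_rect, small=8, large=64):
    """fg_rect = (x0, y0, x1, y1), assumed aligned to `large`.
    Returns a list of (x, y, size) coding blocks covering the frame:
    small fixed blocks inside the foreground rectangle, large fixed
    blocks everywhere else."""
    x0, y0, x1, y1 = fg_rect
    blocks = []
    for y in range(0, height, large):
        for x in range(0, width, large):
            if x0 <= x < x1 and y0 <= y < y1:
                # Foreground: subdivide this tile into small blocks.
                for sy in range(y, y + large, small):
                    for sx in range(x, x + large, small):
                        blocks.append((sx, sy, small))
            else:
                blocks.append((x, y, large))  # background tile
    return blocks

# A 256x128 frame with one 64x64 foreground tile: 7 background blocks
# plus 64 small blocks covering the foreground tile.
blocks = partition_frame(256, 128, fg_rect=(64, 0, 128, 64))
```

No partition candidates are tried and compared here, which is the source of the computing-power saving claimed above: the mode is fixed per region.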

As an example, Figure 8 shows a schematic diagram of the partitioning step in the video encoding method of the embodiment of Figure 4. As shown in Figure 8, S420 may also be implemented as: determining the edge region of the foreground object in the first frame according to the object's position information, and partitioning the edge region according to the first mode.

In this example, the generation information again includes the position information of the foreground object, from which the position and edge contour of the object in the current picture can be determined, and the edge region is then determined from the position and size of that contour. As shown in Figure 7, the foreground object is a circle whose edge is a circular line; the coding region along this ring can be the ring-shaped area between the large black box and the small black box in Figure 7. This ring-shaped area can be partitioned more finely, for example with smaller coding blocks (fewer pixels each), while the other areas, both outside the large black box and inside the small one, can be partitioned less finely, for example with larger coding blocks (more pixels each). For example, the edge of the foreground region, such as the edge of the area containing a person, is partitioned in a fixed mode, such as 8x8 only.
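Classifying a block as belonging to the ring-shaped edge region can be sketched with simple rectangle tests. The geometry here is an assumption for illustration: an outer rectangle around the foreground object and an inner rectangle fully inside it, with the ring between them treated as the edge region.

```python
def in_rect(x, y, size, rect):
    """True if the size x size block at (x, y) lies fully inside rect."""
    x0, y0, x1, y1 = rect
    return x0 <= x and y0 <= y and x + size <= x1 and y + size <= y1

def is_edge_block(x, y, size, outer, inner):
    """True if the block fits in `outer` but is not fully inside `inner`,
    i.e., it lies on the ring between the two rectangles."""
    return in_rect(x, y, size, outer) and not in_rect(x, y, size, inner)

outer = (64, 64, 192, 192)   # rectangle around the foreground object
inner = (96, 96, 160, 160)   # rectangle fully inside the object
on_ring = is_edge_block(64, 64, 8, outer, inner)        # on the ring
interior = is_edge_block(100, 100, 8, outer, inner)     # deep inside
background = is_edge_block(0, 0, 8, outer, inner)       # outside
```

Blocks for which `is_edge_block` is true would then receive the fixed fine (e.g., 8x8) partitioning described above.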

As an example, Figure 9 shows a schematic diagram of two partitioning schemes used in the partitioning step of the video encoding method of the embodiment of Figure 4. As shown in Figure 9, S420 may also be implemented as: determining the foreground region and the background region in the first frame from the foreground object's position information; judging from the object's material information whether it is rigid or flexible; if the foreground object is rigid, partitioning the foreground region according to a third mode; and if it is flexible, partitioning the foreground region according to a fourth mode, where the coding blocks of the third mode are larger than those of the fourth mode.

As shown on the left of Figure 9, the foreground object is rigid, so the foreground region is partitioned in the third mode; as shown on the right, the object is flexible, so the fourth mode is used. As the figure shows, both the third and fourth modes are partitioning modes for foreground objects, so both are finer than the partitioning of the background region (the area outside the black box). Between the two, the fourth mode is finer, dividing the circular foreground object with smaller coding blocks (fewer pixels each), while the third mode is coarser, using larger blocks (more pixels each). This is because the parts of a rigid object move in largely the same direction, giving stronger spatial and temporal correlation, so larger coding blocks can be used to improve compression efficiency, whereas the parts of a flexible object usually have inconsistent motion directions or motion vectors and weaker correlation, so smaller coding blocks can be used to preserve as much image detail as possible and achieve higher fidelity. For example, inside the foreground region, if the foreground is a flexible material it is partitioned in a fixed small-block mode, such as 8x8 only; if it is a rigid material it is partitioned in a fixed large-block mode, such as 64x64 only.
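The size-selection rule discussed above reduces to a small lookup. This is an illustrative sketch: the concrete sizes and the category names are assumptions for the example, not values mandated by the method.

```python
def cu_size_for(region, material=None):
    """Pick a fixed CU size from the region type and, for the foreground,
    the material information supplied by the renderer."""
    if region == "background":
        return 64                      # second mode: large fixed blocks
    if region == "foreground_edge":
        return 8                       # edges carry detail: smallest blocks
    if region == "foreground":
        # Third mode (rigid: consistent motion) vs fourth mode (flexible).
        return 64 if material == "rigid" else 8
    raise ValueError(f"unknown region: {region}")

# Background tiles stay large; a rigid object keeps large blocks too,
# while a flexible object and all object edges get the smallest blocks.
sizes = (cu_size_for("background"),
         cu_size_for("foreground", "rigid"),
         cu_size_for("foreground", "flexible"),
         cu_size_for("foreground_edge"))
```

Because the size is decided directly from the renderer-supplied information, no RDO comparison between candidate partitions is needed for these regions.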

As an example, Figure 10 shows a schematic diagram of the partitioning step in the video encoding method of the embodiment of Figure 4. As shown in Figure 10, the foreground object is a human figure with a fairly complex edge shape, and several partitioning schemes can be combined for such an object. First the foreground region, the black rectangle in the figure, is determined; the area outside the rectangle uses the coarsest background partitioning, for example 64x64 coding blocks. Within the foreground region, the image areas along the edge of the human outline can use the finest partitioning, for example 8x8 coding blocks. The parts of the foreground region that do not touch the object's edge can use an intermediate granularity, for example 32x32 coding blocks. The interior of the foreground object, given the rigidity of the human body, can likewise use an intermediate granularity, for example 32x32 coding blocks. The remaining parts can use a flexible strategy, for example 24x32, 8x32, or 16x16 blocks.

Specifically, based on the foreground moving-object information provided by the video generator, the encoder selects the smallest 64x64-aligned region that encloses the foreground moving object, such as the black rectangle in Figure 10. If there are multiple foreground moving objects, multiple rectangles are generated. The background outside the rectangles, being less important than the foreground moving objects, is partitioned entirely at the maximum coding block size, 64x64, and predicted and encoded at that size. Within a rectangle, the areas that contain no foreground moving object can be further partitioned into 32x32, 16x16, 24x32, 8x32, or other sizes allowed by the video coding protocol. For a rigid foreground moving object, whose motion is usually consistent, the largest partitions that do not include the object's edges can be used, such as 64x64 or 32x32, as the coding protocol allows. For a flexible foreground moving object, whose motion is usually inconsistent, the minimum 8x8 partition is used. The edge region of the foreground moving object carries image detail and is partitioned at the minimum 8x8.
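Selecting the smallest 64x64-aligned enclosing rectangle can be sketched as follows. The helper and its inputs are assumptions for illustration: the object is given as a set of pixel coordinates, the minimum is aligned down and the maximum up to the 64-pixel grid, and the result is clipped to the frame.

```python
ALIGN = 64

def aligned_bbox(pixels, frame_w, frame_h):
    """pixels: iterable of (x, y) foreground-object coordinates.
    Returns (x0, y0, x1, y1), the smallest rectangle on the 64-pixel
    grid that contains every pixel, clipped to the frame bounds."""
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    x0 = (min(xs) // ALIGN) * ALIGN            # floor to grid
    y0 = (min(ys) // ALIGN) * ALIGN
    x1 = min(frame_w, -(-(max(xs) + 1) // ALIGN) * ALIGN)  # ceil to grid
    y1 = min(frame_h, -(-(max(ys) + 1) // ALIGN) * ALIGN)
    return (x0, y0, x1, y1)

# An object spanning x 70..200, y 30..100 in a 1920x1080 frame snaps to
# the grid rectangle (64, 0, 256, 128).
box = aligned_bbox([(70, 30), (200, 100)], 1920, 1080)
```

With multiple foreground objects, this helper would simply be applied once per object to produce one rectangle each.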

Based on the method embodiment described above with reference to Figure 4, an embodiment of the present application further provides a video encoding apparatus 1100, whose structure is shown schematically in Figure 11. The apparatus is used to perform the steps of Figure 4 described above.

According to this embodiment, the video encoding apparatus 1100 includes an acquisition module 1110, a partitioning module 1120, a prediction module 1130, and an encoding module 1140. The acquisition module 1110 is configured to acquire generation information of the video. The partitioning module 1120 is configured to partition a first frame of the video according to the generation information to obtain a plurality of coding blocks, where the plurality of coding blocks of the first frame include a current block, and a target block is included either among the plurality of coding blocks of the first frame or in a second frame of the video adjacent to the first frame. The prediction module 1130 is configured to predict predicted pixel information of the target block according to current pixel information of the current block. The encoding module 1140 is configured to encode the video according to the predicted pixel information.
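As a hedged sketch only (the data shapes and the placeholder partitioning, prediction, and residual logic are all assumptions, not the patent's implementation), the acquire → partition → predict → encode flow of apparatus 1100 might look like:

```python
from dataclasses import dataclass

@dataclass
class EncoderInput:
    frame: list            # flat pixel samples of the first frame (illustrative)
    generation_info: dict  # e.g. {"fg_pos": ..., "material": ...} (assumed keys)

def acquire(source):
    """Acquisition module 1110: obtain the video's generation information."""
    return source.generation_info

def partition(frame, info):
    """Partitioning module 1120: split the frame into coding blocks guided by
    the generation info. Placeholder: split the samples in half; a real
    implementation would choose block sizes from the foreground info."""
    mid = len(frame) // 2
    return [frame[:mid], frame[mid:]]

def predict(current_block):
    """Prediction module 1130: predicted pixels for a target block from the
    current block. Placeholder: copy prediction."""
    return list(current_block)

def encode(blocks):
    """Encoding module 1140: encode each block against the prediction from
    the previous one. Placeholder: first block passes through, later blocks
    become residuals (block minus prediction)."""
    out, pred = [], None
    for b in blocks:
        out.append(list(b) if pred is None
                   else [x - p for x, p in zip(b, pred)])
        pred = predict(b)
    return out
```

Running the four steps in order on a toy frame mirrors the module chain: acquisition feeds partitioning, partitioning feeds prediction, and prediction feeds encoding.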

In one embodiment, the generation information includes position information and material information of a foreground object in the first frame.

In one embodiment, the partitioning module 1120 is further configured to:

determine a foreground region and a background region in the first frame according to the position information of the foreground object; and

partition the foreground region according to a first mode and partition the background region according to a second mode, where the coding blocks in the first mode are smaller than the coding blocks in the second mode.

In one embodiment, the partitioning module 1120 is further configured to:

determine an edge region of the foreground object in the first frame according to the position information of the foreground object; and

partition the edge region according to the first mode.

In one embodiment, the partitioning module 1120 is further configured to:

determine a foreground region and a background region in the first frame according to the position information of the foreground object;

determine, according to the material information of the foreground object, whether the foreground object is a rigid object or a flexible object;

if the foreground object is a rigid object, partition the foreground region according to a third mode; and

if the foreground object is a flexible object, partition the foreground region according to a fourth mode, where the coding blocks in the third mode are larger than the coding blocks in the fourth mode.

In one embodiment, the video includes a virtual scene video generated by a video generation tool, and acquiring the generation information of the video includes:

acquiring the generation information from the video generation tool.

In one embodiment, the video generation tool includes a renderer or an AI tool.

In one embodiment, the coding blocks include one or more of a macroblock, a sub-block, a coding tree unit (CTU), a coding unit (CU), a prediction unit (PU), and a transform unit (TU).

It should be noted that when the video encoding apparatus 1100 of the embodiment shown in Figure 11 performs the method, the division into the functional modules above is merely illustrative. In practical applications, the functions above may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the video encoding apparatus 1100 provided in the above embodiment and the video encoding method embodiment shown in Figure 4 belong to the same concept; for details of the implementation process, refer to the method embodiment, which is not repeated here.

Figure 12 is a schematic diagram of the hardware structure of a computing device 1200 provided by an embodiment of the present application.

Referring to Figure 12, the computing device 1200 includes a processor 1210, a memory 1220, a communication interface 1230, and a bus 1240; the processor 1210, the memory 1220, and the communication interface 1230 are connected to one another through the bus 1240. The processor 1210, the memory 1220, and the communication interface 1230 may also be connected by means other than the bus 1240.

The memory 1220 may be any of various types of storage media, such as random access memory (RAM), read-only memory (ROM), non-volatile RAM (NVRAM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), flash memory, optical memory, or a hard disk.

The processor 1210 may be a general-purpose processor, that is, a processor that performs specific steps and/or operations by reading and executing content stored in a memory (for example, the memory 1220). For example, the general-purpose processor may be a central processing unit (CPU). The processor 1210 may include at least one circuit to perform all or part of the steps of the video encoding method provided by the embodiment shown in Figure 4.

The communication interface 1230 includes interfaces such as an input/output (I/O) interface, a physical interface, and a logical interface, which are used to interconnect components within the computing device 1200 and to interconnect the computing device 1200 with other devices (for example, other computing devices or user equipment). The physical interface may be an Ethernet interface, an optical fiber interface, an ATM interface, or the like. The communication interface 1230 may be externally connected to input devices and output devices. For example, an input device may be a microphone or a microphone array for capturing voice input signals, or a communication network connector for receiving collected input signals from the cloud or other devices, and may further include a keyboard, a mouse, and the like. An output device may output various information to the outside, including determined distance information, direction information, and so on, and may include, for example, a display, a speaker, a printer, or a communication network and the remote output devices connected to it.

The bus 1240 may be a communication bus of any type used to interconnect the processor 1210, the memory 1220, and the communication interface 1230, for example a system bus.

The above components may be arranged on chips independent of one another, or at least some or all of them may be arranged on the same chip. Whether each component is placed on a separate chip or integrated onto one or more chips often depends on product design requirements. The embodiments of this application do not limit the specific implementation forms of the above components.

The computing device 1200 shown in Figure 12 is merely exemplary; in implementation, the computing device 1200 may further include other components, which are not enumerated here.

An embodiment of the present application may also be a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are run by a processor, they cause the processor to perform the steps of the video encoding method according to the various embodiments of the present application described above in this specification.

The computer-readable storage medium may be any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conductors, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

The concepts, principles, and ideas of the present application have been described above in detail with reference to specific implementations (including embodiments and examples). Those skilled in the art should understand that the implementations of the present application are not limited to the forms given above; after reading this application document, those skilled in the art may make any possible improvements, substitutions, and equivalents to the steps, methods, apparatuses, and components in the above implementations, and such improvements, substitutions, and equivalents shall be deemed to fall within the scope of the present application. The scope of protection of this application is determined solely by the claims.

Claims (11)

1. A video encoding method, characterized in that the method comprises:

acquiring generation information of the video;

partitioning a first frame of the video according to the generation information to obtain a plurality of coding blocks, wherein the plurality of coding blocks of the first frame include a current block, and a target block is included among the plurality of coding blocks of the first frame or in a second frame of the video adjacent to the first frame;

predicting predicted pixel information of the target block according to current pixel information of the current block; and

encoding the video according to the predicted pixel information.

2. The video encoding method according to claim 1, characterized in that the generation information includes position information and material information of a foreground object in the first frame.

3. The video encoding method according to claim 2, characterized in that partitioning the first frame of the video according to the generation information to obtain a plurality of coding blocks includes:

determining a foreground region and a background region in the first frame according to the position information of the foreground object; and

partitioning the foreground region according to a first mode and partitioning the background region according to a second mode, wherein the coding blocks in the first mode are smaller than the coding blocks in the second mode.

4. The video encoding method according to claim 3, characterized in that partitioning the first frame of the video according to the generation information to obtain a plurality of coding blocks further includes:

determining an edge region of the foreground object in the first frame according to the position information of the foreground object; and

partitioning the edge region according to the first mode.

5. The video encoding method according to claim 2, characterized in that partitioning the first frame of the video according to the generation information to obtain a plurality of coding blocks includes:

determining a foreground region and a background region in the first frame according to the position information of the foreground object;

determining, according to the material information of the foreground object, whether the foreground object is a rigid object or a flexible object;

if the foreground object is a rigid object, partitioning the foreground region according to a third mode; and

if the foreground object is a flexible object, partitioning the foreground region according to a fourth mode, wherein the coding blocks in the third mode are larger than the coding blocks in the fourth mode.

6. The video encoding method according to claim 1, characterized in that the video includes a virtual scene video generated by a video generation tool, and acquiring the generation information of the video includes:

acquiring the generation information from the video generation tool.

7. The video encoding method according to claim 6, characterized in that the video generation tool includes a renderer or an AI tool.

8. The video encoding method according to claim 1, characterized in that the coding blocks include one or more of a macroblock, a sub-block, a coding tree unit (CTU), a coding unit (CU), a prediction unit (PU), and a transform unit (TU).

9. A video encoding apparatus, characterized in that the apparatus includes:

an acquisition module, configured to acquire generation information of the video;

a partitioning module, configured to partition a first frame of the video according to the generation information to obtain a plurality of coding blocks, wherein the plurality of coding blocks of the first frame include a current block, and a target block is included among the plurality of coding blocks of the first frame or in a second frame of the video adjacent to the first frame;

a prediction module, configured to predict predicted pixel information of the target block according to current pixel information of the current block; and

an encoding module, configured to encode the video according to the predicted pixel information.

10. A computing device, characterized in that the computing device includes a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the video encoding method according to any one of claims 1 to 8.

11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is used to execute the video encoding method according to any one of claims 1 to 8.
CN202311021953.1A 2023-08-14 2023-08-14 Video encoding method, apparatus, computing device and storage medium Active CN117201796B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311021953.1A CN117201796B (en) 2023-08-14 2023-08-14 Video encoding method, apparatus, computing device and storage medium

Publications (2)

Publication Number Publication Date
CN117201796A true CN117201796A (en) 2023-12-08
CN117201796B CN117201796B (en) 2025-07-22

Family

ID=88986033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311021953.1A Active CN117201796B (en) 2023-08-14 2023-08-14 Video encoding method, apparatus, computing device and storage medium

Country Status (1)

Country Link
CN (1) CN117201796B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693005A (en) * 2011-02-17 2012-09-26 微软公司 Providing an interactive experience using a 3D depth camera and a 3D projector
CN107770511A (en) * 2016-08-15 2018-03-06 中国移动通信集团山东有限公司 A kind of decoding method of multi-view point video, device and relevant device
CN109618152A (en) * 2018-12-14 2019-04-12 北京数码视讯软件技术发展有限公司 Depth divides coding method, device and electronic equipment
US20200374534A1 (en) * 2019-05-26 2020-11-26 Alibaba Group Holding Limited Ai-assisted programmable hardware video codec
CN112037311A (en) * 2020-09-08 2020-12-04 腾讯科技(深圳)有限公司 Animation generation method, animation playing method and related device
US20210279948A1 (en) * 2019-04-11 2021-09-09 Tencent Technology (Shenzhen) Company Limited Shadow rendering method and apparatus, computer device, and storage medium
CN114745549A (en) * 2022-04-02 2022-07-12 北京广播电视台 Video coding method and system based on region of interest
US20230021533A1 (en) * 2020-10-26 2023-01-26 Tencent Technology (Shenzhen) Company Limited Method and apparatus for generating video with 3d effect, method and apparatus for playing video with 3d effect, and device

Also Published As

Publication number Publication date
CN117201796B (en) 2025-07-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant