CN116095353A

CN116095353A - Live broadcast method and device based on volume video, electronic equipment and storage medium

Info

Publication number: CN116095353A
Application number: CN202310080256.7A
Authority: CN
Inventors: 孙伟; 罗栋藩; 邵志兢; 张煜
Original assignee: Zhuhai Prometheus Vision Technology Co ltd
Current assignee: Zhuhai Prometheus Vision Technology Co ltd
Priority date: 2023-02-02
Filing date: 2023-02-02
Publication date: 2023-05-09

Abstract

The embodiment of the application discloses a live broadcast method, a live broadcast device, electronic equipment and a storage medium based on volume video; in the embodiment of the application, live video data of the three-dimensional character model and shooting video of the fusion object can be obtained; performing image recognition processing on the shot video of the fusion object to obtain the virtual image of the fusion object; analyzing the live video data of the three-dimensional character model to obtain background information of the live video data; carrying out fusion processing on the virtual image of the fusion object and the background information of the live video data to obtain fused live video data; acquiring viewing angle information, and determining the display content of the three-dimensional character model and the virtual image of the fusion object in the fused live video data based on the viewing angle information; based on the display content, the fused live video data is displayed, and interaction between the watched object and the three-dimensional character model in the live scene can be supported, so that interactivity and interestingness of the volume video are improved.

Description

Live streaming method, device, electronic equipment and storage medium based on volumetric video

技术领域technical field

本申请涉及图像处理技术领域，具体涉及一种基于体积视频的直播方法、装置、电子设备及存储介质。The present application relates to the technical field of image processing, and in particular to a volume video-based live broadcast method, device, electronic equipment, and storage medium.

背景技术Background technique

体积视频(Volumetric Video，又称容积视频、空间视频、体三维视频或6自由度视频等)是一种通过捕获三维空间中信息(如深度信息和色彩信息等)并生成三维模型序列的技术。相对于传统的视频，体积视频将空间的概念加入到视频中，用三维模型来更好的还原真实三维世界，而不是以二维的平面视频加上运镜来模拟真实三维世界的空间感。由于体积视频实质为三维模型序列，使得用户可以随自己喜好调整到任意视角进行观看，较二维平面视频具有更高的还原度和沉浸感。体积视频可以应用在多个不同的场景中，例如，可以将体积视频应用到直播场景中。但是，在将体积视频应用到直播场景中时，并不能支持观看对象和直播场景中三维人物模型进行互动。Volumetric Video (also known as volumetric video, spatial video, volumetric 3D video or 6-DOF video, etc.) is a technology that captures information in 3D space (such as depth information and color information, etc.) and generates a 3D model sequence. Compared with traditional videos, volumetric videos add the concept of space to videos, and use 3D models to better restore the real 3D world, instead of using 2D flat videos and moving mirrors to simulate the sense of space in the real 3D world. Since the volumetric video is essentially a sequence of 3D models, users can adjust it to any viewing angle to watch according to their preferences, which has a higher degree of restoration and immersion than 2D flat video. Volumetric video can be applied in many different scenarios, for example, volumetric video can be applied to live broadcasting scenarios. However, when the volumetric video is applied to the live broadcast scene, it cannot support the interaction between the viewing object and the 3D character model in the live broadcast scene.

发明内容Contents of the invention

本申请实施例提供一种基于体积视频的直播方法、装置、设备、存储介质及程序产品，可以支持观看对象和直播场景中三维人物模型进行互动，从而提高了体积视频的互动性和趣味性。Embodiments of the present application provide a live broadcast method, device, device, storage medium, and program product based on volumetric video, which can support interaction between viewing objects and 3D character models in a live broadcast scene, thereby improving the interactivity and interest of volumetric video.

本申请实施例提供一种基于体积视频的直播方法，包括：An embodiment of the present application provides a live broadcast method based on volumetric video, including:

获取三维人物模型的直播视频数据和融合对象的拍摄视频；Obtain the live video data of the 3D character model and the shooting video of the fused object;

对所述融合对象的拍摄视频进行形象识别处理，得到所述融合对象的虚拟形象；performing image recognition processing on the shot video of the fusion object to obtain the virtual image of the fusion object;

对所述三维人物模型的直播视频数据进行解析，得到所述直播视频数据的背景信息；Analyzing the live video data of the three-dimensional character model to obtain the background information of the live video data;

将所述融合对象的虚拟形象和所述直播视频数据的背景信息进行融合处理，得到融合后直播视频数据；performing fusion processing on the avatar of the fusion object and the background information of the live video data to obtain the fused live video data;

获取观看视角信息，基于所述观看视角信息确定所述融合后直播视频数据中三维人物模型和所述融合对象的虚拟形象的显示内容；Obtain viewing angle information, and determine the display content of the 3D character model in the fused live video data and the avatar of the fused object based on the viewing angle information;

基于所述显示内容显示所述融合后直播视频数据。The fused live video data is displayed based on the display content.

相应地，本申请实施例提供一种基于体积视频的直播装置，包括：Correspondingly, an embodiment of the present application provides a live broadcast device based on volumetric video, including:

获取单元，用于获取三维人物模型的直播视频数据和融合对象的拍摄视频；An acquisition unit, configured to acquire the live video data of the 3D character model and the shooting video of the fused object;

形象识别单元，用于对所述融合对象的拍摄视频进行形象识别处理，得到所述融合对象的虚拟形象；An image recognition unit, configured to perform image recognition processing on the captured video of the fusion object to obtain the virtual image of the fusion object;

解析单元，用于对所述三维人物模型的直播视频数据进行解析，得到所述直播视频数据的背景信息；An analysis unit, configured to analyze the live video data of the 3D character model to obtain the background information of the live video data;

融合单元，用于将所述融合对象的虚拟形象和所述直播视频数据的背景信息进行融合处理，得到融合后直播视频数据；A fusion unit, configured to fuse the avatar of the fusion object and the background information of the live video data to obtain fused live video data;

确定单元，用于获取观看视角信息，基于所述观看视角信息确定所述融合后直播视频数据中三维人物模型和所述融合对象的虚拟形象的显示内容；A determining unit, configured to obtain viewing angle information, and determine the display content of the 3D character model in the fused live video data and the avatar of the fused object based on the viewing angle information;

显示单元，用于基于所述显示内容显示所述融合后直播视频数据。A display unit, configured to display the fused live video data based on the display content.

此外，本申请实施例还提供一种电子设备，包括处理器和存储器，上述存储器存储有计算机程序，上述处理器用于运行上述存储器内的计算机程序实现本申请实施例提供的基于体积视频的直播方法。In addition, an embodiment of the present application also provides an electronic device, including a processor and a memory, the memory stores a computer program, and the processor is used to run the computer program in the memory to implement the volumetric video-based live broadcast method provided in the embodiment of the present application .

此外，本申请实施例还提供一种计算机可读存储介质，上述计算机可读存储介质存储有计算机程序，上述计算机程序适于处理器进行加载，以执行本申请实施例所提供的任一种基于体积视频的直播方法。In addition, the embodiment of the present application also provides a computer-readable storage medium, the above-mentioned computer-readable storage medium stores a computer program, and the above-mentioned computer program is suitable for being loaded by a processor, so as to execute any one of the methods provided in the embodiment of the present application based on Live broadcast method of volumetric video.

此外，本申请实施例还提供一种计算机程序产品，包括计算机程序，上述计算机程序被处理器执行时实现本申请实施例所提供的任一种基于体积视频的直播方法。In addition, an embodiment of the present application further provides a computer program product, including a computer program. When the above computer program is executed by a processor, any method for live broadcast based on volumetric video provided in the embodiment of the present application is implemented.

在本申请实施例中，可以获取三维人物模型的直播视频数据和融合对象的拍摄视频；对融合对象的拍摄视频进行形象识别处理，得到融合对象的虚拟形象；对三维人物模型的直播视频数据进行解析，得到直播视频数据的背景信息；将融合对象的虚拟形象和直播视频数据的背景信息进行融合处理，得到融合后直播视频数据；获取观看视角信息，基于观看视角信息确定融合后直播视频数据中三维人物模型和融合对象的虚拟形象的显示内容；基于显示内容显示所述融合后直播视频数据，可以支持观看对象和直播场景中三维人物模型进行互动，从而提高了体积视频的互动性和趣味性。In the embodiment of the present application, the live video data of the 3D character model and the shot video of the fused object can be obtained; the image recognition processing is performed on the shot video of the fused object to obtain the virtual image of the fused object; the live video data of the 3D character model is processed Analyze to obtain the background information of the live video data; fuse the avatar of the fusion object and the background information of the live video data to obtain the fused live video data; obtain the viewing angle information, and determine the information in the fused live video data based on the viewing angle information. The display content of the 3D character model and the avatar of the fused object; displaying the fused live video data based on the display content can support the interaction between the viewing object and the 3D character model in the live scene, thereby improving the interactivity and interest of the volumetric video .

附图说明Description of drawings

为了更清楚地说明本申请实施例中的技术方案，下面将对实施例描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本申请的一些实施例，对于本领域技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that need to be used in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application. For those skilled in the art, other drawings can also be obtained based on these drawings without any creative effort.

图1是本申请实施例提供的基于体积视频的直播方法的流程示意图；FIG. 1 is a schematic flow diagram of a live broadcast method based on volumetric video provided in an embodiment of the present application;

图2是本申请实施例提供的不同视角的示意图；Fig. 2 is a schematic diagram of different viewing angles provided by the embodiment of the present application;

图3是本申请实施例提供的基于体积视频的直播装置的结构示意图；FIG. 3 is a schematic structural diagram of a live broadcast device based on volumetric video provided by an embodiment of the present application;

图4是本申请实施例提供的电子设备的结构示意图。FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

具体实施方式Detailed ways

下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The following will clearly and completely describe the technical solutions in the embodiments of the application with reference to the drawings in the embodiments of the application. Apparently, the described embodiments are only some of the embodiments of the application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without making creative efforts belong to the scope of protection of this application.

本申请实施例提供一种基于体积视频的直播方法、装置、电子设备及存储介质。该基于体积视频的直播装置可以集成在电子设备中，该电子设备可以是服务器，也可以是终端等设备。Embodiments of the present application provide a live broadcast method, device, electronic device, and storage medium based on volumetric video. The apparatus for live broadcasting based on volumetric video can be integrated in electronic equipment, and the electronic equipment can be a server or a terminal or other equipment.

其中，服务器可以是独立的物理服务器，也可以是多个物理服务器构成的服务器集群或者分布式系统，还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、网络加速服务(Content Delivery Network，CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器。Among them, the server can be an independent physical server, or a server cluster or distributed system composed of multiple physical servers, and can also provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication , middleware services, domain name services, security services, network acceleration services (Content Delivery Network, CDN), and cloud servers for basic cloud computing services such as big data and artificial intelligence platforms.

终端可以是智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱、智能手表等，但并不局限于此。终端以及服务器可以通过有线或无线通信方式进行直接或间接地连接，本申请在此不做限制。The terminal may be a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited thereto. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.

另外，本申请实施例中的“多个”指两个或两个以上。本申请实施例中的“第一”和“第二”等用于区分描述，而不能理解为暗示相对重要性。In addition, "multiple" in the embodiments of the present application refers to two or more. "First" and "second" in the embodiments of the present application are used to distinguish descriptions, and should not be understood as implying relative importance.

以下分别进行详细说明。需要说明的是，以下实施例的描述顺序不作为对实施例优选顺序的限定。Each will be described in detail below. It should be noted that the description sequence of the following embodiments is not intended to limit the preferred sequence of the embodiments.

在本实施例中，将从基于体积视频的直播装置的角度进行描述，为了方便对本申请的基于体积视频的直播方法进行说明，以下将以基于体积视频的直播装置集成在终端中进行详细说明，即以终端作为执行主体进行详细说明。In this embodiment, a description will be made from the perspective of a volumetric video-based live broadcast device. In order to facilitate the description of the volumetric video-based live broadcast method of the present application, the following will describe in detail the integration of a volumetric video-based live broadcast device in a terminal. That is, the terminal is used as the execution subject for detailed description.

请参阅图1，图1是本申请一实施例提供的基于体积视频的直播方法的流程示意图。该基于体积视频的直播方法可以包括：Please refer to FIG. 1 . FIG. 1 is a schematic flowchart of a live broadcast method based on volumetric video provided by an embodiment of the present application. The live broadcast method based on volumetric video may include:

101、获取三维人物模型的直播视频数据和融合对象的拍摄视频。101. Acquire the live video data of the three-dimensional character model and the shooting video of the fused object.

其中，本申请实施例的直播视频可以是基于体积视频生成的视频。例如，该直播视频可以是直播演唱会，等等。Wherein, the live video in this embodiment of the present application may be a video generated based on a volumetric video. For example, the live video may be a live concert, and so on.

其中，该三维人物模型可以是基于体积视频构建的人物。例如，该三维人物模型可以是根据歌手构建的模型。Wherein, the three-dimensional character model may be a character constructed based on volume video. For example, the three-dimensional character model may be a model constructed according to a singer.

其中，体积视频(Volumetric Video，又称容积视频、空间视频、体三维视频或6自由度视频等)是一种通过捕获三维空间中信息(如深度信息和色彩信息等)并生成三维模型序列的技术。相对于传统的视频，体积视频将空间的概念加入到视频中，用三维模型来更好的还原真实三维世界，而不是以二维的平面视频加上运镜来模拟真实三维世界的空间感。由于体积视频实质为三维模型序列，使得用户可以随自己喜好调整到任意视角进行观看，较二维平面视频具有更高的还原度和沉浸感。Among them, volumetric video (also known as volumetric video, spatial video, volumetric 3D video or 6-DOF video, etc.) is a method that captures information in 3D space (such as depth information and color information, etc.) and generates a 3D model sequence. technology. Compared with traditional videos, volumetric videos add the concept of space to videos, and use 3D models to better restore the real 3D world, instead of using 2D flat videos and moving mirrors to simulate the sense of space in the real 3D world. Since the volumetric video is essentially a sequence of 3D models, users can adjust it to any viewing angle to watch according to their preferences, which has a higher degree of restoration and immersion than 2D flat video.

在一实施例中，在本申请中，用于构成体积视频的三维模型可以按照如下方式重建得到：In an embodiment, in this application, the 3D model used to form the volumetric video can be reconstructed as follows:

先获取拍摄对象的不同视角的彩色图像和深度图像，以及彩色图像对应的相机参数；然后根据获取到的彩色图像及其对应的深度图像和相机参数，训练隐式表达拍摄对象三维模型的神经网络模型，并基于训练的神经网络模型进行等值面提取，实现对拍摄对象的三维重建，得到拍摄对象的三维模型。First obtain the color images and depth images of different viewing angles of the subject, as well as the camera parameters corresponding to the color images; then, according to the acquired color images and their corresponding depth images and camera parameters, train the neural network that implicitly expresses the 3D model of the subject model, and extract the isosurface based on the trained neural network model, realize the 3D reconstruction of the shooting object, and obtain the 3D model of the shooting object.

应当说明的是，本申请实施例中对采用何种架构的神经网络模型不作具体限制，可由本领域技术人员根据实际需要选取。比如，可以选取不带归一化层的多层感知机(Multilayer Perceptron，MLP)作为模型训练的基础模型。It should be noted that there is no specific limitation on the architecture of the neural network model used in the embodiment of the present application, which can be selected by those skilled in the art according to actual needs. For example, a multilayer perceptron (MLP) without a normalization layer can be selected as the basic model for model training.

下面将对本申请提供的三维模型重建方法进行详细描述。The 3D model reconstruction method provided by this application will be described in detail below.

首先，可以同步采用多个彩色相机和深度相机对需要进行三维重建的目标物体(该目标物体即为拍摄对象)进行多视角的拍摄，得到目标物体在多个不同视角的彩色图像及对应的深度图像，即在同一拍摄时刻(实际拍摄时刻的差值小于或等于时间阈值即认为拍摄时刻相同)，各视角的彩色相机将拍摄得到目标物体在对应视角的彩色图像，相应的，各视角的深度相机将拍摄得到目标物体在对应视角的深度图像。需要说明的是，目标物体可以是任意物体，包括但不限于人物、动物以及植物等生命物体，或者机械、家具、玩偶等非生命物体。First of all, multiple color cameras and depth cameras can be used simultaneously to take multi-angle shots of the target object (the target object is the subject) that needs to be reconstructed in three dimensions, and obtain color images of the target object at multiple different angles of view and the corresponding depth. Image, that is, at the same shooting time (the difference between the actual shooting time is less than or equal to the time threshold, the shooting time is considered to be the same), the color cameras of each viewing angle will capture the color image of the target object at the corresponding viewing angle, and correspondingly, the depth of each viewing angle The camera will capture the depth image of the target object at the corresponding viewing angle. It should be noted that the target object may be any object, including but not limited to living objects such as people, animals, and plants, or non-living objects such as machinery, furniture, and dolls.

以此，目标物体在不同视角的彩色图像均具备对应的深度图像，即在拍摄时，彩色相机和深度相机可以采用相机组的配置，同一视角的彩色相机配合深度相机同步对同一目标物体进行拍摄。比如，可以搭建一摄影棚，该摄影棚中心区域为拍摄区域，环绕该拍摄区域，在水平方向和垂直方向每间隔一定角度配对设置有多组彩色相机和深度相机。当目标物体处于这些彩色相机和深度相机所环绕的拍摄区域时，即可通过这些彩色相机和深度相机拍摄得到该目标物体在不同视角的彩色图像及对应的深度图像。In this way, the color images of the target object at different viewing angles have corresponding depth images, that is, when shooting, the color camera and the depth camera can adopt the configuration of the camera group, and the color camera at the same viewing angle cooperates with the depth camera to simultaneously shoot the same target object . For example, a studio can be built. The central area of the studio is the shooting area, and around the shooting area, multiple groups of color cameras and depth cameras are paired at certain angles in the horizontal and vertical directions. When the target object is in the shooting area surrounded by these color cameras and depth cameras, color images of the target object at different viewing angles and corresponding depth images can be obtained by shooting these color cameras and depth cameras.

此外，进一步获取每一彩色图像对应的彩色相机的相机参数。其中，相机参数包括彩色相机的内外参，可以通过标定确定，相机内参为与彩色相机自身特性相关的参数，包括但不限于彩色相机的焦距、像素等数据，相机外参为彩色相机在世界坐标系中的参数，包括但不限于彩色相机的位置(坐标)和相机的旋转方向等数据。In addition, the camera parameters of the color camera corresponding to each color image are further acquired. Among them, the camera parameters include the internal and external parameters of the color camera, which can be determined through calibration. The internal parameters of the camera are parameters related to the characteristics of the color camera itself, including but not limited to the focal length, pixels and other data of the color camera. The external parameters of the camera are the coordinates of the color camera in the world The parameters in the system, including but not limited to the position (coordinates) of the color camera and the rotation direction of the camera and other data.

如上，在获取到目标物体在同一拍摄时刻的多个不同视角的彩色图像及其对应的深度图像之后，即可根据这些彩色图像及其对应深度图像对目标物体进行三维重建。区别于相关技术中将深度信息转换为点云进行三维重建的方式，本申请训练一神经网络模型用以实现对目标物体的三维模型的隐式表达，从而基于该神经网络模型实现对目标物体的三维重建。As above, after acquiring multiple color images of different viewing angles and their corresponding depth images at the same shooting moment of the target object, three-dimensional reconstruction of the target object can be performed based on these color images and their corresponding depth images. Different from the method of converting depth information into point cloud for 3D reconstruction in related technologies, this application trains a neural network model to realize the implicit expression of the 3D model of the target object, so as to realize the 3D reconstruction of the target object based on the neural network model. Three-dimensional reconstruction.

可选地，本申请选用一不包括归一化层的MLP作为基础模型，按照如下方式进行训练：Optionally, this application selects an MLP that does not include a normalization layer as the basic model, and performs training as follows:

基于对应的相机参数将每一彩色图像中的像素点转化为射线；Convert the pixels in each color image into rays based on the corresponding camera parameters;

在射线上采样多个采样点，并确定每一采样点的第一坐标信息以及每一采样点距离像素点的有向距离(Signed Distance Field，SDF)值；Sampling multiple sampling points on the ray, and determining the first coordinate information of each sampling point and the directed distance (Signed Distance Field, SDF) value of each sampling point from the pixel point;

将采样点的第一坐标信息输入基础模型，得到基础模型输出的每一采样点的预测SDF值以及预测RGB颜色值；Input the first coordinate information of the sampling point into the basic model, and obtain the predicted SDF value and the predicted RGB color value of each sampling point output by the basic model;

基于预测SDF值与SDF值之间的第一差异，以及预测RGB颜色值与像素点的RGB颜色值之间的第二差异，对基础模型的参数进行调整，直至满足预设停止条件；Based on the first difference between the predicted SDF value and the SDF value, and the second difference between the predicted RGB color value and the RGB color value of the pixel, the parameters of the basic model are adjusted until a preset stop condition is met;

将满足预设停止条件的基础模型作为隐式表达目标物体的三维模型的神经网络模型。The basic model that satisfies the preset stop condition is used as a neural network model that implicitly expresses the 3D model of the target object.

首先，基于彩色图像对应的相机参数将彩色图像中的一像素点转化为一条射线，该射线可以为经过像素点且垂直于彩色图像面的射线；然后，在该射线上采样多个采样点，采样点的采样过程可以分两步执行，可以先均匀采样部分采样点，然后再在基于像素点的深度值在关键处进一步采样多个采样点，以保证在模型表面附近可以采样到尽量多的采样点；然后，根据相机参数和像素点的深度值计算出采样得到的每一采样点在世界坐标系中的第一坐标信息以及每一采样点的SDF值，其中，SDF值可以为像素点的深度值与采样点距离相机成像面的距离之间的差值，该差值为有符号的值，当差值为正值时，表示采样点在三维模型的外部，当差值为负值时，表示采样点在三维模型的内部，当差值为零时，表示采样点在三维模型的表面；然后，在完成采样点的采样并计算得到每一采样点对应的SDF值之后，进一步将采样点在世界坐标系的第一坐标信息输入基础模型(该基础模型被配置为将输入的坐标信息映射为SDF值和RGB颜色值后输出)，将基础模型输出的SDF值记为预测SDF值，将基础模型输出的RGB颜色值记为预测RGB颜色值；然后，基于预测SDF值与采样点对应的SDF值之间的第一差异，以及预测RGB颜色值与采样点所对应像素点的RGB颜色值之间的第二差异，对基础模型的参数进行调整。First, based on the camera parameters corresponding to the color image, a pixel in the color image is converted into a ray, which can be a ray passing through the pixel and perpendicular to the color image surface; then, sampling multiple sampling points on the ray, The sampling process of the sampling points can be performed in two steps. You can first uniformly sample some sampling points, and then further sample multiple sampling points at key points based on the depth value of the pixel points, so as to ensure that as much as possible can be sampled near the surface of the model. Sampling point; then, calculate the first coordinate information of each sampling point in the world coordinate system obtained by sampling and the SDF value of each sampling point according to the camera parameters and the depth value of the pixel point, where the SDF value can be a pixel point The difference between the depth value of and the distance between the sampling point and the camera imaging surface. The difference is a signed value. When the difference is positive, it means that the sampling point is outside the 3D model. When the difference is negative When , it means that the sampling point is inside the 3D model, and when the difference is zero, it means that the sampling point is on the surface of the 3D model; then, after completing the sampling of the sampling point and calculating the SDF value corresponding to each sampling point, further divide The first coordinate information of the sampling point in the world coordinate system is input into the basic model (the basic model is configured to map the input coordinate information into SDF values and RGB color values and then output), and record the SDF value output by the basic model as the predicted SDF value , record the RGB color value output by the basic model as the predicted RGB color value; then, based on the first difference between the predicted SDF value and the SDF value corresponding to the sampling point, and the RGB color value of the pixel corresponding to the predicted RGB color value and the sampling point The second difference between the color values, making adjustments to the parameters of the base model.

此外，对于彩色图像中的其它像素点，同样按照上述方式进行采样点采样，然后将采样点在世界坐标系的坐标信息输入至基础模型以得到对应的预测SDF值和预测RGB颜色值，用于对基础模型的参数进行调整，直至满足预设停止条件，比如，可以配置预设停止条件为对基础模型的迭代次数达到预设次数，或者配置预设停止条件为基础模型收敛。在对基础模型的迭代满足预设停止条件时，即得到能够对拍摄对象的三维模型进行准确地隐式表达的神经网络模型。最后，可以采用等值面提取算法对该神经网络模型进行三维模型表面的提取，从而得到拍摄对象的三维模型。In addition, for other pixels in the color image, the sampling points are also sampled in the same manner as above, and then the coordinate information of the sampling points in the world coordinate system is input to the basic model to obtain the corresponding predicted SDF value and predicted RGB color value for Adjust the parameters of the basic model until the preset stopping condition is met. For example, the preset stopping condition can be configured as the number of iterations of the basic model reaches the preset number, or the preset stopping condition can be configured as the basic model converges. When the iteration of the basic model satisfies the preset stop condition, a neural network model capable of accurately and implicitly expressing the three-dimensional model of the shooting object is obtained. Finally, the isosurface extraction algorithm can be used to extract the surface of the three-dimensional model of the neural network model, so as to obtain the three-dimensional model of the shooting object.

可选地，在一些实施例中，根据相机参数确定彩色图像的成像面；确定经过彩色图像中像素点且垂直于成像面的射线为像素点对应的射线。Optionally, in some embodiments, the imaging plane of the color image is determined according to camera parameters; and a ray passing through a pixel in the color image and perpendicular to the imaging plane is determined as a ray corresponding to the pixel.

其中，可以根据彩色图像对应的彩色相机的相机参数，确定该彩色图像在世界坐标系中的坐标信息，即确定成像面。然后，可以确定经过彩色图像中像素点且垂直于该成像面的射线为该像素点对应的射线。Wherein, the coordinate information of the color image in the world coordinate system may be determined according to the camera parameters of the color camera corresponding to the color image, that is, the imaging plane may be determined. Then, it can be determined that the ray passing through the pixel in the color image and perpendicular to the imaging plane is the ray corresponding to the pixel.

可选地，在一些实施例中，根据相机参数确定彩色相机在世界坐标系中的第二坐标信息及旋转角度；根据第二坐标信息和旋转角度确定彩色图像的成像面。Optionally, in some embodiments, the second coordinate information and the rotation angle of the color camera in the world coordinate system are determined according to the camera parameters; and the imaging plane of the color image is determined according to the second coordinate information and the rotation angle.

可选地，在一些实施例中，在射线上等间距采样第一数量个第一采样点；根据像素点的深度值确定多个关键采样点，并根据关键采样点采样第二数量个第二采样点；将第一数量个的第一采样点与第二数量个的第二采样点确定为在射线上采样得到的多个采样点。Optionally, in some embodiments, a first number of first sampling points are sampled at equal intervals on the ray; multiple key sampling points are determined according to the depth value of the pixel point, and a second number of second sampling points are sampled according to the key sampling points. Sampling points: determining the first number of first sampling points and the second number of second sampling points as a plurality of sampling points obtained by sampling on the ray.

其中，先在射线上均匀采样n(即第一数量)个第一采样点，n为大于2的正整数；然后，再根据前述像素点的深度值，从n个第一采样点中确定出距离前述像素点最近的预设数量个关键采样点，或者从n个第一采样点中确定出距离前述像素点小于距离阈值的关键采样点；然后，根据确定出的关键采样点再采样m个第二采样点，m为大于1的正整数；最后，将采样得到的n+m个采样点确定为在射线上采样得到的多个采样点。其中，在关键采样点处再多采样m个采样点，可以使得模型的训练效果在三维模型表面处更为精确，从而提升三维模型的重建精度。Among them, first uniformly sample n (i.e. the first number) first sampling points on the ray, n is a positive integer greater than 2; then, according to the depth value of the aforementioned pixel points, determine the The preset number of key sampling points closest to the aforementioned pixel points, or determine the key sampling points from the n first sampling points that are less than the distance threshold from the aforementioned pixel points; then, re-sampling m according to the determined key sampling points For the second sampling point, m is a positive integer greater than 1; finally, the n+m sampling points obtained by sampling are determined as a plurality of sampling points obtained by sampling on the ray. Among them, sampling m more sampling points at key sampling points can make the training effect of the model more accurate on the surface of the 3D model, thereby improving the reconstruction accuracy of the 3D model.

可选地，在一些实施例中，根据彩色图像对应的深度图像确定像素点对应的深度值；基于深度值计算每一采样点距离像素点的SDF值；根据相机参数与深度值计算每一采样点的坐标信息。Optionally, in some embodiments, the depth value corresponding to the pixel is determined according to the depth image corresponding to the color image; the SDF value of each sampling point from the pixel is calculated based on the depth value; the SDF value of each sampling point is calculated according to the camera parameters and the depth value Point coordinate information.

其中，在每一像素点对应的射线上采样了多个采样点后，对于每一采样点，根据相机参数、像素点的深度值确定彩色相机的拍摄位置与目标物体上对应点之间的距离，然后基于该距离逐一计算每一采样点的SDF值以及计算出每一采样点的坐标信息。Among them, after sampling multiple sampling points on the ray corresponding to each pixel point, for each sampling point, determine the distance between the shooting position of the color camera and the corresponding point on the target object according to the camera parameters and the depth value of the pixel point , and then calculate the SDF value of each sampling point one by one based on the distance and calculate the coordinate information of each sampling point.

需要说明的是，在完成对基础模型的训练之后，对于给定的任意一个点的坐标信息，即可由完成训练的基础模型预测其对应的SDF值，该预测的SDF值即表示了该点与目标物体的三维模型的位置关系(内部、外部或者表面)，实现对目标物体的三维模型的隐式表达，得到用于隐式表达目标物体的三维模型的神经网络模型。It should be noted that after the training of the basic model is completed, for the coordinate information of any given point, the corresponding SDF value can be predicted by the basic model that has completed the training, and the predicted SDF value represents the relationship between the point and The positional relationship (internal, external or surface) of the 3D model of the target object realizes the implicit expression of the 3D model of the target object, and obtains a neural network model for implicitly expressing the 3D model of the target object.

最后，对以上神经网络模型进行等值面提取，比如可以采用等值面提取算法(Marching cubes，MC)绘制出三维模型的表面，得到三维模型表面，进而根据该三维模型表面得到目标物体的三维模型。Finally, extract the isosurface of the above neural network model. For example, you can use the isosurface extraction algorithm (Marching cubes, MC) to draw the surface of the 3D model to obtain the surface of the 3D model, and then obtain the 3D surface of the target object based on the surface of the 3D model. Model.

本申请提供的三维重建方案，通过神经网络去隐式建模目标物体的三维模型，并加入深度信息提高模型训练的速度和精度。采用本申请提供的三维重建方案，在时序上持续的对拍摄对象进行三维重建，即可得到拍摄对象在不同时刻的三维模型，这些不同时刻的三维模型按时序构成的三维模型序列即为对拍摄对象所拍摄得到的体积视频。以此，可以针对任意拍摄对象进行“体积视频拍摄”，得到特定内容呈现的体积视频。比如，可以对跳舞的拍摄对象进行体积视频拍摄，得到可以在任意角度观看拍摄对象舞蹈的体积视频，可以对教学的拍摄对象进行体积视频拍摄，得到可以在任意角度观看拍摄对象教学的体积视频，等等。The 3D reconstruction solution provided by this application uses the neural network to implicitly model the 3D model of the target object, and adds depth information to improve the speed and accuracy of model training. Using the 3D reconstruction scheme provided by this application, the 3D reconstruction of the photographed object is carried out continuously in time sequence, and the 3D model of the photographed object at different moments can be obtained. Volumetric video captured by the subject. In this way, "volume video shooting" can be performed on any subject to obtain a volume video presented with specific content. For example, it is possible to shoot a volumetric video of a dancing subject to obtain a volumetric video that can be watched from any angle. It is possible to shoot a volumetric video of a teaching subject to obtain a volumetric video that can be viewed from any angle. etc.

需要说明的是，本申请以下实施例涉及的体积视频可采用以上体积视频拍摄方式所拍摄得到。It should be noted that the volumetric videos involved in the following embodiments of the present application can be captured by the above volumetric video shooting methods.

其中，融合对象可以包括观看直播视频的用户。在一实施例中，本申请实施例可以支持将观看直播视频的用户和直播视频进行融合，从而提高直播视频的互动性。例如，可以用户可以拍摄自己的视频，然后，可以将观看用户的视频和直播视频进行融合。Wherein, the fused objects may include users watching live video. In an embodiment, the embodiments of the present application may support the integration of users who watch the live video and the live video, so as to improve the interactivity of the live video. For example, a user can shoot his own video, and then the video of the watching user can be fused with the live video.

在一实施例中，当直播视频播放之前或直播视频播放时，用户可以请求将观看用户的视频和直播视频进行融合。In an embodiment, before the live video is played or when the live video is played, the user may request to merge the video of the watching user with the live video.

例如，在直播视频播放之前，视频播放界面上可以包括融合请求控件，用户可以通过触发融合请求控件将自己观看直播视频的过程和直播视频进行融合。当用户触发了融合请求控件之后，用户的终端设备可以采集用户的拍摄视频数据，然后，终端设备可以将用户的拍摄视频数据传输给直播终端。然后，在播放直播视频时，直播终端可以获取三维人物模型的直播视频数据和融合对象的拍摄视频。For example, before the live video is played, the video playback interface may include a fusion request control, and the user may integrate the process of watching the live video with the live video by triggering the fusion request control. After the user triggers the integration request control, the user's terminal device can collect the user's shooting video data, and then, the terminal device can transmit the user's shooting video data to the live broadcast terminal. Then, when playing the live video, the live terminal can acquire the live video data of the 3D character model and the shot video of the fused object.

在一实施例中，由于观看直播视频的可能有很多，而直播视频的背景中可以融合用户的数量有限。例如，直播视频的背景中可以融合的用户数量为100位。因此，当直播视频中的预设融合位置都被预定完了之后，视频播放界面上的融合请求控件就会被禁止，从而避免融合对象的数量过多，影响直播质量。In an embodiment, since there are many possibilities to watch the live video, the number of users that can be integrated into the background of the live video is limited. For example, the number of users that can be integrated in the background of the live video is 100. Therefore, when the preset fusion positions in the live video are all reserved, the fusion request control on the video playback interface will be disabled, thereby avoiding too many fusion objects and affecting the quality of the live broadcast.

102、对融合对象的拍摄视频进行形象识别处理，得到融合对象的虚拟形象。102. Perform image recognition processing on the captured video of the fused object to obtain a virtual image of the fused object.

在一实施例中，用户还可以选择是将自己的拍摄视频直接和直播视频融合，还是以虚拟形象的形式和直播视频进行融合。In an embodiment, the user can also choose whether to directly fuse the captured video with the live video, or to fuse with the live video in the form of an avatar.

例如，在用户触发了融合请求控件之后，视频播放界面上可以显示虚拟形象生成控件和融合选择控件。For example, after the user triggers the fusion request control, the avatar generation control and the fusion selection control may be displayed on the video playback interface.

当用户选择了融合选择控件时，直播装置可以直接将用户的拍摄视频和直播视频进行融合。当用户选择了虚拟形象生成控件时，可以根据融合对象的拍摄视频生成融合对象的虚拟形象。When the user selects the fusion selection control, the live broadcast device can directly fuse the user's shot video with the live video. When the user selects the avatar generation control, the avatar of the fused object can be generated according to the captured video of the fused object.

其中，有多种方式可以生成融合对象的虚拟形象。Among them, there are many ways to generate the virtual image of the fused object.

例如，直播装置可以预先生成多个虚拟形象模板，用户可以选择自己所需的虚拟形象模板，从而生成自己的虚拟形象。For example, the live broadcast device can generate multiple avatar templates in advance, and the user can select the avatar template he needs to generate his own avatar.

又例如，直播装置可以根据用户的拍摄视频，智能化地生成融合对象的虚拟形象。例如，可以从拍摄视频中识别出融合对象的轮廓图像，然后，可以对融合对象的轮廓对象进行风格转换，得到融合对象的虚拟形象。具体的，步骤“对融合对象的拍摄视频进行形象识别处理，得到融合对象的虚拟形象”，可以包括：For another example, the live broadcast device can intelligently generate the avatar of the fusion object according to the video shot by the user. For example, the outline image of the fused object can be recognized from the captured video, and then the style conversion can be performed on the outline object of the fused object to obtain the virtual image of the fused object. Specifically, the step of "performing image recognition processing on the captured video of the fused object to obtain the virtual image of the fused object" may include:

对拍摄视频进行分帧处理，得到拍摄视频的至少一个视频帧；Perform frame division processing on the captured video to obtain at least one video frame of the captured video;

对视频帧进行轮廓识别，得到融合对象的轮廓图像；Perform contour recognition on the video frame to obtain the contour image of the fused object;

对融合对象的轮廓图像进行风格转换，得到融合对象的虚拟形象。Perform style conversion on the outline image of the fused object to obtain the virtual image of the fused object.

在一实施例中，可以对拍摄视频进行分帧处理，得到拍摄视频的至少一个视频帧。例如，可以利用开源计算机视觉库(Open Source Computer Vision Library，openCV)对拍摄视频进行分帧处理，得到拍摄视频的至少一个视频帧。其中，openCV是一个跨平台计算机视觉和机器学习软件库，可以运行在多个操作系统上，同时提供了多种编程语言的接口，实现了图像处理和计算机视觉方向的很多通用算法。In an embodiment, frame division processing may be performed on the shot video to obtain at least one video frame of the shot video. For example, an open source computer vision library (Open Source Computer Vision Library, openCV) can be used to perform frame division processing on the captured video to obtain at least one video frame of the captured video. Among them, openCV is a cross-platform computer vision and machine learning software library, which can run on multiple operating systems, and provides interfaces of multiple programming languages, and realizes many general algorithms in the direction of image processing and computer vision.

在一实施例中，在得到拍摄视频的至少一个视频帧之后，可以对视频帧进行轮廓识别，得到融合对象的轮廓图像。其中，有多种方法可以对视频帧进行轮廓识别，得到融合对象的轮廓图像。In an embodiment, after obtaining at least one video frame of the captured video, contour recognition may be performed on the video frame to obtain a contour image of the fused object. Among them, there are many methods to perform contour recognition on video frames to obtain contour images of fused objects.

例如，可以利用人工智能算法对视频帧进行轮廓识别，得到融合对象的轮廓图像。例如，可以利用卷积神经网络(Convolutional Neural Networks，CNN)、反卷积神经网络(De-Convolutional Networks，DN)或深度神经网络(Deep Neural Networks，DNN)等人工智能算法对视频帧进行轮廓识别，得到融合对象的轮廓图像。For example, artificial intelligence algorithms can be used to perform contour recognition on video frames to obtain contour images of fused objects. For example, artificial intelligence algorithms such as Convolutional Neural Networks (CNN), De-Convolutional Networks (DN) or Deep Neural Networks (DNN) can be used to perform contour recognition on video frames , to obtain the contour image of the fused object.

又例如，可以通过像素点对视频帧进行轮廓识别，得到融合对象的轮廓图像。例如，可以提取出视频帧的像素点信息。然后，可以根据像素点信息对视频帧中的融合对象进行轮廓检测，得到轮廓像素点的位置信息。然后，可以根据轮廓像素点的位置信息裁剪出融合对象的轮廓图像。For another example, the contour recognition of the video frame may be performed through pixels to obtain the contour image of the fused object. For example, pixel information of a video frame can be extracted. Then, contour detection can be performed on the fused object in the video frame according to the pixel point information to obtain the position information of the contour pixel point. Then, the contour image of the fused object can be cropped according to the position information of the contour pixels.

在一实施例中，可以对融合对象的轮廓图像进行风格转换，得到融合对象的虚拟形象。其中，对融合对象的轮廓图像进行风格转换可以指将轮廓图像中融合对象的形象转换成虚拟形象。例如，可以将轮廓图像中融合对象的形象转换成动漫风。又例如，可以将轮廓图像中融合对象的形象转换成油画风，等等。在一实施例中，可以利用人工智能算法对融合对象的轮廓图像进行风格转换，得到融合对象的虚拟形象。例如，可以利用CNN或DNN等人工智能算法对融合对象的轮廓图像进行风格转换，得到融合对象的虚拟形象。In an embodiment, style conversion may be performed on the outline image of the fused object to obtain the avatar of the fused object. Wherein, performing style conversion on the outline image of the fused object may refer to converting the image of the fused object in the outline image into an avatar. For example, you can convert the image of the fused object in the outline image into an anime style. For another example, the image of the fused object in the outline image can be converted into an oil painting style, and so on. In an embodiment, an artificial intelligence algorithm may be used to perform style conversion on the outline image of the fused object to obtain the avatar of the fused object. For example, artificial intelligence algorithms such as CNN or DNN can be used to perform style conversion on the outline image of the fused object to obtain the virtual image of the fused object.

103、对三维人物模型的直播视频数据进行解析，得到直播视频数据的背景信息。103. Analyzing the live video data of the 3D character model to obtain background information of the live video data.

其中，直播视频数据的背景信息可以包括直播视频数据中相当三维人物模型的信息。例如，在直播视频数据中，三维人物模型可以属于前景信息，而除了三维人物模型以外的信息，可以是背景信息。Wherein, the background information of the live video data may include information equivalent to a three-dimensional character model in the live video data. For example, in live video data, a 3D character model may belong to foreground information, while information other than the 3D character model may be background information.

在一实施例中，可以对直播视频数据进行解析，得到直播视频数据的背景信息。例如，可以对直播视频的视频帧进行前景分离，得到直播视频数据的背景信息。其中，对直播视频的视频帧进行前景分离可以指将直播视频中的三维人物模型和背景信息进行分离。例如，可以利用人工智能技术将直播视频中的三维人物模型和背景信息进行分离。In an embodiment, the live video data may be analyzed to obtain the background information of the live video data. For example, foreground separation may be performed on video frames of live video to obtain background information of live video data. Wherein, performing foreground separation on the video frames of the live video may refer to separating the three-dimensional character model and the background information in the live video. For example, artificial intelligence technology can be used to separate the 3D character model and background information in the live video.

104、将融合对象的虚拟形象和直播视频数据的背景信息进行融合处理，得到融合后直播视频数据。104. Perform fusion processing on the virtual image of the fusion object and the background information of the live video data to obtain the fused live video data.

在一实施例中，为了提高体积视频的互动性和趣味性，可以将融合对象的虚拟形象和直播视频数据的背景信息进行融合处理，得到融合后直播视频数据。In one embodiment, in order to improve the interactivity and interest of the volumetric video, the avatar of the fused object and the background information of the live video data may be fused to obtain fused live video data.

在一实施例中，为了保证直播视频的质量，直播视频的背景中可以融合的对象有限，因此，直播视频的背景可以包括至少一个预设融合位置。然后，可以将融合对象的虚拟形象和背景视频中的预设融合位置进行融合处理，得到融合后直播视频数据。In an embodiment, in order to ensure the quality of the live video, objects that can be fused in the background of the live video are limited, therefore, the background of the live video may include at least one preset fusion position. Then, the virtual image of the fused object and the preset fused position in the background video can be fused to obtain fused live video data.

其中，有多种方式可以将融合对象的虚拟形象和直播视频数据的背景信息进行融合处理，得到融合后直播视频数据。Among them, there are many ways to fuse the virtual image of the fusion object and the background information of the live video data to obtain the fused live video data.

例如，可以随机地将融合对象的虚拟形象和直播视频数据的背景信息进行融合，得到融合后直播视频数据。例如，可以随机地为融合对象的虚拟形象分配一个融合位置，然后，可以根据融合位置将融合对象的虚拟信息融合至背景信息中。For example, the avatar of the fused object and the background information of the live video data can be randomly fused to obtain fused live video data. For example, a fusion position may be randomly assigned to the virtual image of the fusion object, and then the virtual information of the fusion object may be fused into the background information according to the fusion position.

在一实施例中，还可以根据融合对象的等级信息将融合对象和背景信息进行融合。具体的，步骤“将融合对象的虚拟形象和直播视频数据的背景信息进行融合处理，得到融合后直播视频数据”，可以包括：In an embodiment, the fusion object and the background information may also be fused according to the level information of the fusion object. Specifically, the step of "fusion processing the virtual image of the fused object and the background information of the live video data to obtain the fused live video data" may include:

获取融合对象的等级信息；Obtain the level information of the fusion object;

基于融合对象的等级信息确定融合对象的虚拟形象在直播视频数据的背景信息中的融合位置；Determining the fusion position of the virtual image of the fusion object in the background information of the live video data based on the level information of the fusion object;

根据虚拟形象的融合位置将虚拟形象和直播视频数据的背景信息进行融合处理，得到融合后直播视频数据。According to the fused position of the avatar, the avatar and the background information of the live video data are fused to obtain the fused live video data.

其中，融合对象的等级信息可以包括融合对象的账号等级，预定融合位置所出的数额，等等。例如，融合对象预定融合位置时所出的数额越高，其等级则越高。又例如，融合对象的账号等级越高，其等级也越高，等等。Wherein, the level information of the fusion object may include the account level of the fusion object, the amount paid by the predetermined fusion position, and so on. For example, the higher the amount paid out when the fused object is scheduled to be fused, the higher its level will be. For another example, the higher the account level of the fusion object is, the higher its level is, and so on.

然后，可以根据融合对象的等级信息确定融合对象的虚拟形象在直播视频数据的背景信息中的融合位置。具体的，步骤“基于融合对象的等级信息确定融合对象的虚拟形象在直播视频数据的背景信息中的融合位置”，可以包括：Then, the fused position of the avatar of the fused object in the background information of the live video data can be determined according to the level information of the fused object. Specifically, the step of "determining the fusion position of the virtual image of the fusion object in the background information of the live video data based on the level information of the fusion object" may include:

将融合对象的等级信息和背景信息中的至少一个预设融合位置的等级信息进行匹配，得到匹配结果；Matching the level information of the fusion object with the level information of at least one preset fusion position in the background information to obtain a matching result;

基于匹配结果在至少一个预设融合位置中确定虚拟形象在直播视频数据的背景信息中的融合位置。The fusion position of the avatar in the background information of the live video data is determined in at least one preset fusion position based on the matching result.

例如，背景信息中的预设融合位置都有对应的等级信息。例如，和三维人物模型越接近的预设融合位置的等级就越高。反之，距离三维人物模型越远的预设融合位置的等级越低。For example, the preset fusion positions in the background information all have corresponding level information. For example, the closer the preset fusion position is to the three-dimensional character model, the higher the level is. Conversely, the farther away from the 3D character model the lower the level of the preset fusion position.

在一实施例中，可以将融合对象的等级信息和背景信息中的至少一个预设融合位置的等级信息进行匹配，得到匹配结果。然后，可以基于匹配结果在至少一个预设融合位置中确定虚拟形象在直播视频数据的背景信息中的融合位置。例如，可以将融合对象的等级信息和预设融合位置的等级信息进行比较。若融合对象的等级信息和预设融合位置的等级信息相同，说明融合对象和该预设融合位置相匹配，然后可以将该预设融合位置确定为融合对象的融合位置。In an embodiment, the level information of the fused object may be matched with the level information of at least one preset fused position in the background information to obtain a matching result. Then, the fusion position of the avatar in the background information of the live video data may be determined in at least one preset fusion position based on the matching result. For example, the level information of the fused object may be compared with the level information of the preset fused position. If the level information of the fused object is the same as the level information of the preset fused position, it means that the fused object matches the preset fused position, and then the preset fused position can be determined as the fused position of the fused object.

然后，可以根据虚拟形象的融合位置将虚拟形象和直播视频数据的背景信息进行融合处理，得到融合后直播视频数据。Then, the avatar and the background information of the live video data may be fused according to the fused position of the avatar to obtain the fused live video data.

105、获取观看视角信息，基于观看视角信息确定融合后直播视频数据中三维人物模型和融合对象的虚拟形象的显示内容。105. Obtain viewing angle information, and determine display content of the 3D character model and the avatar of the fused object in the fused live video data based on the viewing angle information.

在一实施例中，在生成融合后直播视频数据之后，由于基于三维人物模型的直播视频可以是360度展示的。因此，观看可以通过调整自己的观看视角，从而改变直播视频的显示内容。例如，如图2所示，001可以是观众通过第一视角查看到的直播内容，002可以是观众通过第二视角查看到的直播内容。其中，第一视角和第二视角是不同的视角，因此直播视频显示的内容也有所不同。由于用户可以调整自己观看的视角，所以，可以获取用户的观看视角信息，基于观看视角信息确定融合后直播视频数据中三维人物模型和融合对象的虚拟形象的显示内容。In an embodiment, after the fused live video data is generated, the live video based on the 3D character model can be displayed in 360 degrees. Therefore, the viewer can change the displayed content of the live video by adjusting his viewing angle. For example, as shown in FIG. 2 , 001 may be the live content viewed by the audience through the first viewing angle, and 002 may be the live content viewed by the audience through the second viewing angle. Wherein, the first viewing angle and the second viewing angle are different viewing angles, so the content displayed in the live video is also different. Since the user can adjust his own viewing angle, the user's viewing angle information can be obtained, and based on the viewing angle information, the display content of the 3D character model and the avatar of the fused object in the fused live video data can be determined.

其中，观看视角信息可以指用户原来的观看视角和现在的观看视角的变化，即第一视角和第二视角的角度变化。例如，如图2所示，图像中第一视角和第二视角变化了180度。Wherein, the viewing angle information may refer to the change between the user's original viewing angle and the current viewing angle, that is, the angle change between the first viewing angle and the second viewing angle. For example, as shown in FIG. 2 , the first viewing angle and the second viewing angle in the image change by 180 degrees.

在一实施例中，步骤“基于观看视角信息确定融合后直播视频数据中三维人物模型和融合对象的虚拟形象的显示内容”，可以包括：In one embodiment, the step of "determining the display content of the 3D character model and the avatar of the fused object in the fused live video data based on the viewing angle information" may include:

对观看视角信息进行角度计算处理，得到观看角度；Perform angle calculation processing on the viewing angle information to obtain the viewing angle;

将观看角度映射至融合后直播视频数据，得到直播视频数据对应的显示范围；Map the viewing angle to the fused live video data to obtain the display range corresponding to the live video data;

根据显示范围确定融合后直播视频数据中三维人物模型和融合对象的虚拟形象的显示内容。The display content of the 3D character model and the avatar of the fused object in the fused live video data is determined according to the display range.

在一实施例中，可以对观看视角信息进行角度计算处理，得到观看角度。例如，观看视角信息可以用于说明第一视角和第二视角的角度变化。然后，可以将第一视角对应的角度信息和该角度变化进行算术运算，得到观看角度。例如，可以将第一视角对应的角度信息和视角变化进行相加或相减，得到观看角度。In an embodiment, angle calculation processing may be performed on the viewing angle information to obtain the viewing angle. For example, the viewing angle information may be used to describe the angle change between the first angle of view and the second angle of view. Then, an arithmetic operation may be performed on the angle information corresponding to the first viewing angle and the angle change to obtain the viewing angle. For example, the viewing angle may be obtained by adding or subtracting the angle information corresponding to the first viewing angle and the viewing angle change.

在一实施例中，可以将观看角度映射至融合后直播视频数据，得到直播视频数据对应的显示范围。例如，直播视频中有一个预先设置好的三维坐标。可以将该观看角度映射到该三维坐标中，然后再通过三维坐标确定指标视频数据对应的显示范围。In an embodiment, the viewing angle may be mapped to the fused live video data to obtain a display range corresponding to the live video data. For example, there is a pre-set three-dimensional coordinate in the live video. The viewing angle may be mapped to the three-dimensional coordinates, and then the display range corresponding to the target video data is determined through the three-dimensional coordinates.

然后，可以根据显示范围确定融合后直播视频数据中三维人物模型和融合对象的虚拟形象的显示内容。Then, the display content of the 3D character model and the avatar of the fused object in the fused live video data may be determined according to the display range.

106、基于显示内容显示融合后直播视频数据。106. Display the fused live video data based on the display content.

在一实施例中，在确定了融合后直播视频数据中三维人物模型和融合对象的虚拟形象的显示内容之后，可以基于显示内容显示融合后直播视频数据。In an embodiment, after the display content of the 3D character model and the avatar of the fused object in the fused live video data is determined, the fused live video data may be displayed based on the display content.

在一实施例中，为了进一步地提高直播视频的趣味性，还可以令融合后直播视频数据中的虚拟形象跟随这三维人物模型的动作进行变化。例如，当三维人物模型的手做了上升的动作时，融合后直播视频数据中的虚拟形象会跟随着三维人物模型的手上升。又例如，当三维人物模型转圈时，融合后直播视频数据中的虚拟形象会跟随着三维人物模型进行旋转。In an embodiment, in order to further improve the interest of the live video, the avatar in the fused live video data can also be changed following the movement of the 3D character model. For example, when the hand of the 3D character model moves up, the avatar in the fused live video data will follow the hand of the 3D character model to rise. For another example, when the 3D character model turns around, the avatar in the fused live video data will follow the 3D character model to rotate.

具体的，本申请实施例还可以包括：Specifically, this embodiment of the application may also include:

对三维人物模型的动作进行检测；Detect the movement of the 3D character model;

当检测到三维人物模型的动作和预设触发动作相匹配时，提取预设触发动作对应的虚拟形象控制逻辑；When it is detected that the action of the three-dimensional character model matches the preset trigger action, extracting the avatar control logic corresponding to the preset trigger action;

根据虚拟形象控制逻辑对虚拟形象在融合后直播视频数据的位置信息进行调整处理。According to the control logic of the virtual image, the position information of the live video data after fusion of the virtual image is adjusted and processed.

在一实施例中，可以对融合后直播视频数据中三维人物模型的动作进行检测。通过对三维人物模型的动作进行检测，可以知道三维人物模型做了什么动作。In an embodiment, the motion of the 3D character model in the fused live video data may be detected. By detecting the actions of the three-dimensional character model, it is possible to know what actions the three-dimensional character model has performed.

在一实施例中，可以预先设置好三维人物模型的什么动作会导致虚拟形象的位置进行变化。因此，可以将三维人物模型的动作和预设触发动作进行匹配。In one embodiment, it may be preset which action of the three-dimensional character model will cause the position of the avatar to change. Therefore, the action of the three-dimensional character model can be matched with the preset trigger action.

其中，当检测到三维人物模型的动作和预设触发动作相匹配时，可以提取预设触发动作对应的虚拟形象控制逻辑。其中，虚拟形象控制逻辑可以用于说明虚拟形象的位置该怎么变化。例如，虚拟形象控制逻辑可以说明当三维人物模型的手上升时，虚拟形象的位置会跟随着三维人物模型的手上升。又例如，虚拟形象控制逻辑可言说明当三维人物模型的手下降时，虚拟形象的位置会跟随着三维人物模型的手下降。Wherein, when it is detected that the action of the three-dimensional character model matches the preset trigger action, the avatar control logic corresponding to the preset trigger action may be extracted. Wherein, the avatar control logic may be used to describe how the position of the avatar should be changed. For example, the avatar control logic may indicate that when the hand of the three-dimensional character model rises, the position of the avatar will follow the hand of the three-dimensional character model. For another example, the avatar control logic may indicate that when the hand of the 3D character model falls, the position of the avatar will follow the hand of the 3D character model.

然后，根据虚拟形象控制逻辑对虚拟形象在融合后直播视频数据的位置信息进行调整处理。具体的，步骤“根据虚拟形象控制逻辑对虚拟形象在融合后直播视频数据的位置信息进行调整处理”，可以包括：Then, the location information of the fused live video data of the avatar is adjusted according to the avatar control logic. Specifically, the step of "adjusting and processing the location information of the fused live video data of the avatar according to the avatar control logic" may include:

根据三维人物模型的动作，计算动态变化加速度；According to the action of the 3D character model, calculate the dynamic acceleration;

根据虚拟形象控制逻辑，将动态变化加速度转化成作用于虚拟形象的噪声信息；According to the virtual image control logic, the dynamic acceleration is transformed into noise information acting on the virtual image;

将噪声信息添加至虚拟形象，以实现对虚拟形象在融合后直播视频数据的位置信息进行调整处理。The noise information is added to the avatar so as to adjust the position information of the fused live video data of the avatar.

在一实施例中，在对三维人物模型的动作进行检测时，还可以对三维人物模型的动作变化速度进行检测。然后，可以根据三维人物模型的动作变化速度计算动态变化加速度。通过动态变化加速度，可以使得虚拟形象的位置发生改变。In an embodiment, when detecting the motion of the three-dimensional character model, the speed of change of the motion of the three-dimensional character model may also be detected. Then, the dynamic change acceleration can be calculated according to the movement change speed of the three-dimensional character model. By dynamically changing the acceleration, the position of the avatar can be changed.

在一实施例中，可以根据虚拟形象控制逻辑，将动态变化加速度转化成作用于虚拟形象的噪声信息。其中，噪声信息可以令虚拟形象的位置进行变化。例如，该噪声信息可以是柏林噪声(Per l i n noi se)。其中，可以根据虚拟形象控制逻辑，将动态变化加速度转化成作用于虚拟形象的噪声信息。例如虚拟形象控制逻辑为三维人物模型的手上升时，虚拟形象的位置会跟随着三维人物模型的手上升，则可以生成一个方向向上的噪声信息。通过将噪声信息添加至虚拟形象，可以实现对虚拟形象在融合后直播视频数据的位置信息进行调整处理。In one embodiment, the dynamically changing acceleration can be converted into noise information acting on the avatar according to the avatar control logic. Wherein, the noise information can change the position of the avatar. For example, the noise information may be Perlin noise (Per l i n noise). Wherein, according to the virtual image control logic, the dynamic acceleration can be converted into noise information acting on the virtual image. For example, when the control logic of the avatar is that the hand of the 3D character model rises, the position of the avatar will follow the rise of the hand of the 3D character model, and noise information in an upward direction can be generated. By adding the noise information to the avatar, the position information of the fused live video data of the avatar can be adjusted and processed.

在一实施例中，为了进一步地提高观看直播视频的观看和直播视频之间的互动性，本申请实施例还可以基于云渲染技术，支持观看直播视频的用户操作自己的虚拟形象在场景中自由移动，实现千人千面观看演唱会。In one embodiment, in order to further improve the interactivity between watching live video and live video, this embodiment of the present application can also support users watching live video to operate their avatar freely in the scene based on cloud rendering technology. Mobile, enabling thousands of people to watch the concert.

例如，可以根据观看直播视频的用户的等级，判断直播视频的用户是否具有资格控制自己的虚拟形象在场景中自由移动。例如，观看直播视频的用户的等级可以包括第1级至第5级。其中，第5级的用户可以控制自己的虚拟形象在场景中自己移动。而第1级至第4级的用户则不可以控制自己的虚拟形象在场景中自己移动。For example, it can be judged whether the user of the live video is qualified to control his avatar to move freely in the scene according to the level of the user watching the live video. For example, the grades of users who watch the live video may include grades 1 to 5. Among them, users at level 5 can control their avatars to move in the scene by themselves. However, users of levels 1 to 4 cannot control their avatars to move in the scene by themselves.

又例如，可以根据观看直播视频的用户的虚拟形象在直播视频中的位置，判断直播视频的用户是否可以控制自己的虚拟形象在场景中自由移动。例如，可以限定直播视频背景中某些位置的虚拟形象可以移动，则这些位置上的用户可以操作自己的虚拟形象进行移动。For another example, according to the position of the avatar of the user watching the live video in the live video, it can be judged whether the user of the live video can control his avatar to move freely in the scene. For example, it is possible to limit that the avatars at certain positions in the live video background can move, and users at these positions can operate their avatars to move.

由以上可知，在本申请实施例中，可以获取三维人物模型的直播视频数据和融合对象的拍摄视频；对融合对象的拍摄视频进行形象识别处理，得到融合对象的虚拟形象；对三维人物模型的直播视频数据进行解析，得到直播视频数据的背景信息；将融合对象的虚拟形象和直播视频数据的背景信息进行融合处理，得到融合后直播视频数据；获取观看视角信息，基于观看视角信息确定融合后直播视频数据中三维人物模型和融合对象的虚拟形象的显示内容；基于显示内容显示所述融合后直播视频数据，可以支持观看对象和直播场景中三维人物模型进行互动，从而提高了体积视频的互动性和趣味性。As can be seen from the above, in the embodiment of the present application, the live video data of the 3D character model and the shooting video of the fusion object can be obtained; image recognition processing is performed on the shooting video of the fusion object to obtain the virtual image of the fusion object; The live video data is analyzed to obtain the background information of the live video data; the avatar of the fusion object and the background information of the live video data are fused to obtain the fused live video data; the viewing angle information is obtained, and the fused image is determined based on the viewing angle information. The display content of the 3D character model and the avatar of the fused object in the live video data; displaying the fused live video data based on the display content can support the interaction between the viewing object and the 3D character model in the live scene, thereby improving the interaction of the volumetric video sex and fun.

为便于更好的实施本申请实施例提供的基于体积视频的直播方法，本申请实施例还提供一种基于上述基于体积视频的直播方法的装置。其中名词的含义与上述基于体积视频的直播方法中相同，具体实现细节可以参考方法实施例中的说明。In order to facilitate better implementation of the volumetric video-based live broadcast method provided by the embodiment of the present application, the embodiment of the present application further provides a device based on the volumetric video-based live broadcast method. The meanings of the nouns are the same as those in the volumetric video-based live broadcast method above, and for specific implementation details, please refer to the descriptions in the method embodiments.

例如，如图3所示，该基于体积视频的直播装置可以包括：For example, as shown in Figure 3, the live broadcast device based on volumetric video may include:

获取单元301，用于获取三维人物模型的直播视频数据和融合对象的拍摄视频；An acquisition unit 301, configured to acquire the live video data of the 3D character model and the captured video of the fused object;

形象识别单元302，用于对所述融合对象的拍摄视频进行形象识别处理，得到所述融合对象的虚拟形象；An image recognition unit 302, configured to perform image recognition processing on the shot video of the fusion object to obtain the virtual image of the fusion object;

解析单元303，用于对所述三维人物模型的直播视频数据进行解析，得到所述直播视频数据的背景信息；An analysis unit 303, configured to analyze the live video data of the 3D character model to obtain the background information of the live video data;

融合单元304，用于将所述融合对象的虚拟形象和所述直播视频数据的背景信息进行融合处理，得到融合后直播视频数据；A fusion unit 304, configured to perform fusion processing on the avatar of the fusion object and the background information of the live video data to obtain fused live video data;

确定单元305，用于获取观看视角信息，基于所述观看视角信息确定所述融合后直播视频数据中三维人物模型和所述融合对象的虚拟形象的显示内容；The determination unit 305 is configured to obtain viewing angle information, and determine the display content of the 3D character model in the fused live video data and the avatar of the fused object based on the viewing angle information;

显示单元306，用于基于所述显示内容显示所述融合后直播视频数据。The display unit 306 is configured to display the fused live video data based on the display content.

在一实施例中，所述形象识别单元302，可以包括：In one embodiment, the image recognition unit 302 may include:

分帧处理子单元，用于对所述拍摄视频进行分帧处理，得到所述拍摄视频的至少一个视频帧；A frame division processing subunit, configured to perform frame division processing on the captured video to obtain at least one video frame of the captured video;

轮廓识别子单元，用于对所述视频帧进行轮廓识别，得到所述融合对象的轮廓图像；A contour recognition subunit, configured to perform contour recognition on the video frame to obtain a contour image of the fused object;

风格转换子单元，用于对所述融合对象的轮廓图像进行风格转换，得到所述融合对象的虚拟形象。The style conversion subunit is configured to perform style conversion on the outline image of the fusion object to obtain the virtual image of the fusion object.

在一实施例中，所述融合单元304，可以包括：In an embodiment, the fusion unit 304 may include:

信息获取子单元，用于获取所述融合对象的等级信息；an information acquisition subunit, configured to acquire level information of the fusion object;

位置确定子单元，用于基于所述融合对象的等级信息确定所述融合对象的虚拟形象在所述直播视频数据的背景信息中的融合位置；A position determining subunit, configured to determine the fusion position of the avatar of the fusion object in the background information of the live video data based on the level information of the fusion object;

融合子单元，用于根据所述虚拟形象的融合位置将所述虚拟形象和所述直播视频数据的背景信息进行融合处理，得到融合后直播视频数据。The fusion subunit is configured to perform fusion processing on the background information of the virtual image and the live video data according to the fusion position of the virtual image to obtain fused live video data.

在一实施例中，所述位置确定子单元，可以包括：In an embodiment, the position determination subunit may include:

匹配模块，用于将所述融合对象的等级信息和所述背景信息中的至少一个预设融合位置的等级信息进行匹配，得到匹配结果；A matching module, configured to match the level information of the fusion object with the level information of at least one preset fusion position in the background information to obtain a matching result;

位置确定模块，用于基于所述匹配结果在所述至少一个预设融合位置中确定所述虚拟形象在所述直播视频数据的背景信息中的融合位置。A position determining module, configured to determine a fusion position of the avatar in the background information of the live video data in the at least one preset fusion position based on the matching result.

在一实施例中，所述确定单元305，可以包括：In an embodiment, the determining unit 305 may include:

角度计算子单元，用于对所述观看视角信息进行角度计算处理，得到观看角度；An angle calculation subunit, configured to perform angle calculation processing on the viewing angle information to obtain a viewing angle;

映射子单元，用于将所述观看角度映射至所述融合后直播视频数据，得到所述直播视频数据对应的显示范围；A mapping subunit, configured to map the viewing angle to the fused live video data to obtain a display range corresponding to the live video data;

内容确定子单元，用于根据所述显示范围确定所述融合后直播视频数据中三维人物模型和所述融合对象的虚拟形象的显示内容。The content determination subunit is configured to determine the display content of the 3D character model and the avatar of the fused object in the fused live video data according to the display range.

在一实施例中，所述直播装置，还可以包括：In an embodiment, the live broadcast device may further include:

检测单元，用于对所述三维人物模型的动作进行检测；a detection unit, configured to detect the actions of the three-dimensional character model;

提取单元，用于当检测到所述三维人物模型的动作和预设触发动作相匹配时，提取所述预设触发动作对应的虚拟形象控制逻辑；An extracting unit, configured to extract the avatar control logic corresponding to the preset trigger action when it is detected that the action of the three-dimensional character model matches the preset trigger action;

调整单元，用于根据所述虚拟形象控制逻辑对所述虚拟形象在所述融合后直播视频数据的位置信息进行调整处理。The adjusting unit is configured to adjust the location information of the avatar in the fused live video data according to the avatar control logic.

在一实施例中，所述调整单元，可以包括：In an embodiment, the adjustment unit may include:

计算子单元，用于根据所述三维人物模型的动作，计算动态变化加速度；A calculation subunit, configured to calculate the dynamic acceleration according to the action of the three-dimensional character model;

转换子单元，用于根据所述虚拟形象控制逻辑，将所述动态变化加速度转化成作用于所述虚拟形象的噪声信息；A conversion subunit, configured to convert the dynamic change acceleration into noise information acting on the avatar according to the avatar control logic;

添加子单元，用于将所述噪声信息添加至所述虚拟形象，以实现对所述虚拟形象在所述融合后直播视频数据的位置信息进行调整处理。The adding subunit is configured to add the noise information to the avatar, so as to adjust the position information of the avatar in the fused live video data.

具体实施时，以上各个模块可以作为独立的实体来实现，也可以进行任意组合，作为同一或若干个实体来实现，以上各个模块的具体实施方式以及对应的有益效果可参见前面的方法实施例，在此不再赘述。During specific implementation, each of the above modules can be implemented as an independent entity, or can be combined arbitrarily as the same or several entities. For the specific implementation of each of the above modules and the corresponding beneficial effects, please refer to the previous method embodiments. I won't repeat them here.

本申请实施例还提供一种电子设备，该电子设备可以是服务器或终端等，如图4所示，其示出了本申请实施例所涉及的电子设备的结构示意图，具体来讲：The embodiment of the present application also provides an electronic device, which may be a server or a terminal, etc., as shown in FIG. 4 , which shows a schematic structural diagram of the electronic device involved in the embodiment of the present application. Specifically:

该电子设备可以包括一个或者一个以上处理核心的处理器601、一个或一个以上计算机可读存储介质的存储器602、电源603和输入单元604等部件。本领域技术人员可以理解，图4中示出的电子设备结构并不构成对电子设备的限定，可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件布置。其中：The electronic device may include a processor 601 of one or more processing cores, a memory 602 of one or more computer-readable storage media, a power supply 603, an input unit 604 and other components. Those skilled in the art can understand that the structure of the electronic device shown in FIG. 4 does not constitute a limitation on the electronic device, and may include more or less components than shown in the figure, or combine some components, or arrange different components. in:

处理器601是该电子设备的控制中心，利用各种接口和线路连接整个电子设备的各个部分，通过运行或执行存储在存储器602内的计算机程序和/或模块，以及调用存储在存储器602内的数据，执行电子设备的各种功能和处理数据。可选的，处理器601可包括一个或多个处理核心；优选的，处理器601可集成应用处理器和调制解调处理器，其中，应用处理器主要处理操作系统、用户界面和应用程序等，调制解调处理器主要处理无线通信。可以理解的是，上述调制解调处理器也可以不集成到处理器601中。The processor 601 is the control center of the electronic device, and uses various interfaces and lines to connect various parts of the entire electronic device, by running or executing computer programs and/or modules stored in the memory 602, and calling the Data, perform various functions of electronic devices and process data. Optionally, the processor 601 may include one or more processing cores; preferably, the processor 601 may integrate an application processor and a modem processor, wherein the application processor mainly processes operating systems, user interfaces, and application programs, etc. , the modem processor mainly handles wireless communications. It can be understood that the foregoing modem processor may not be integrated into the processor 601 .

存储器602可用于存储计算机程序以及模块，处理器601通过运行存储在存储器602的计算机程序以及模块，从而执行各种功能应用以及数据处理。存储器602可主要包括存储程序区和存储数据区，其中，存储程序区可存储操作系统、至少一个功能所需的计算机程序(比如声音播放功能、图像播放功能等)等；存储数据区可存储根据电子设备的使用所创建的数据等。此外，存储器602可以包括高速随机存取存储器，还可以包括非易失性存储器，例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。相应地，存储器602还可以包括存储器控制器，以提供处理器601对存储器602的访问。The memory 602 can be used to store computer programs and modules, and the processor 601 executes various functional applications and data processing by running the computer programs and modules stored in the memory 602 . The memory 602 can mainly include a program storage area and a data storage area, wherein the program storage area can store operating systems, computer programs required by at least one function (such as sound playback function, image playback function, etc.); Data created by the use of electronic devices, etc. In addition, the memory 602 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 602 may further include a memory controller to provide the processor 601 with access to the memory 602 .

电子设备还包括给各个部件供电的电源603，优选的，电源603可以通过电源管理系统与处理器601逻辑相连，从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。电源603还可以包括一个或一个以上的直流或交流电源、再充电系统、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。The electronic device also includes a power supply 603 for supplying power to various components. Preferably, the power supply 603 can be logically connected to the processor 601 through a power management system, so that functions such as charging, discharging, and power consumption management can be implemented through the power management system. The power supply 603 may also include one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, power status indicators and other arbitrary components.

该电子设备还可包括输入单元604，该输入单元604可用于接收输入的数字或字符信息，以及产生与用户设置以及功能控制有关的键盘、鼠标、操作杆、光学或者轨迹球信号输入。The electronic device can also include an input unit 604, which can be used to receive input numbers or character information, and generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control.

尽管未示出，电子设备还可以包括显示单元等，在此不再赘述。具体在本实施例中，电子设备中的处理器601会按照如下的指令，将一个或一个以上的计算机程序的进程对应的可执行文件加载到存储器602中，并由处理器601来运行存储在存储器602中的计算机程序，从而实现各种功能，比如：Although not shown, the electronic device may also include a display unit, etc., which will not be repeated here. Specifically, in this embodiment, the processor 601 in the electronic device will load the executable file corresponding to the process of one or more computer programs into the memory 602 according to the following instructions, and the processor 601 will run the executable file stored in the The computer program in memory 602, thereby realizes various functions, such as:

以上各个操作的具体实施方式以及对应的有益效果可参见上文对基于体积视频的直播方法的详细描述，在此不作赘述。For the specific implementation manners of the above operations and the corresponding beneficial effects, refer to the detailed description of the live broadcast method based on volumetric video above, and details are not repeated here.

本领域普通技术人员可以理解，上述实施例的各种方法中的全部或部分步骤可以通过计算机程序来完成，或通过计算机程序控制相关的硬件来完成，该计算机程序可以存储于一计算机可读存储介质中，并由处理器进行加载和执行。Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above-mentioned embodiments can be completed by a computer program, or by controlling related hardware through a computer program, and the computer program can be stored in a computer-readable storage media and is loaded and executed by the processor.

为此，本申请实施例提供一种计算机可读存储介质，其中存储有计算机程序，该计算机程序能够被处理器进行加载，以执行本申请实施例所提供的任一种基于体积视频的直播方法中的步骤。例如，该计算机程序可以执行如下步骤：To this end, an embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program can be loaded by a processor to execute any volumetric video-based live broadcast method provided in the embodiment of the present application. in the steps. For example, the computer program can perform the following steps:

以上各个操作的具体实施方式以及对应的有益效果可参见前面的实施例，在此不再赘述。For the specific implementation manners of the above operations and the corresponding beneficial effects, reference may be made to the foregoing embodiments, and details are not repeated here.

其中，该计算机可读存储介质可以包括：只读存储器(ROM，Read Only Memory)、随机存取记忆体(RAM，Random Access Memory)、磁盘或光盘等。Wherein, the computer-readable storage medium may include: a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk, and the like.

由于该计算机可读存储介质中所存储的计算机程序，可以执行本申请实施例所提供的任一种基于体积视频的直播方法中的步骤，因此，可以实现本申请实施例所提供的任一种基于体积视频的直播方法所能实现的有益效果，详见前面的实施例，在此不再赘述。Because the computer program stored in the computer-readable storage medium can execute the steps in any volume video-based live broadcast method provided in the embodiments of the present application, therefore, any one of the methods provided in the embodiments of the present application can be realized. For the beneficial effects that can be achieved by the live broadcast method based on volumetric video, refer to the previous embodiments for details, and will not be repeated here.

其中，根据本申请的一个方面，提供了一种计算机程序产品或计算机程序，该计算机程序产品或计算机程序包括计算机指令，该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令，处理器执行该计算机指令，使得该计算机设备执行上述基于体积视频的直播方法。Wherein, according to one aspect of the present application, a computer program product or computer program is provided, the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the above volumetric video-based live broadcast method.

以上对本申请实施例所提供的一种基于体积视频的直播方法、装置、电子设备及存储介质进行了详细介绍，本文中应用了具体个例对本申请的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本申请的方法及其核心思想；同时，对于本领域的技术人员，依据本申请的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本申请的限制。The volumetric video-based live broadcast method, device, electronic equipment, and storage medium provided by the embodiments of the present application are described above in detail. In this paper, specific examples are used to illustrate the principles and implementation methods of the present application. The above embodiments The description is only used to help understand the method of the present application and its core idea; at the same time, for those skilled in the art, according to the idea of the present application, there will be changes in the specific implementation and application scope, in summary , the contents of this specification should not be construed as limiting the application.

Claims

1. A live broadcast method based on volume video, characterized in that, comprising:

Obtain the live video data of the 3D character model and the shooting video of the fused object;

performing image recognition processing on the shot video of the fusion object to obtain the virtual image of the fusion object;

Analyzing the live video data of the three-dimensional character model to obtain the background information of the live video data;

performing fusion processing on the avatar of the fusion object and the background information of the live video data to obtain the fused live video data;

Obtain viewing angle information, and determine the display content of the 3D character model in the fused live video data and the avatar of the fused object based on the viewing angle information;

The fused live video data is displayed based on the display content.

2. The method according to claim 1, wherein said performing image recognition processing on the shot video of said fusion object to obtain the virtual image of said fusion object comprises:

performing frame division processing on the captured video to obtain at least one video frame of the captured video;

Perform contour recognition on the video frame to obtain a contour image of the fusion object;

Performing style conversion on the outline image of the fused object to obtain a virtual image of the fused object.

3. The method according to claim 1, wherein said performing fusion processing on the avatar of the fusion object and the background information of the live video data to obtain the fused live video data comprises:

Obtaining level information of the fusion object;

determining the fusion position of the avatar of the fusion object in the background information of the live video data based on the level information of the fusion object;

The avatar and the background information of the live video data are fused according to the fused position of the avatar to obtain fused live video data.

4. The method according to claim 3, wherein the determining the fusion position of the avatar of the fusion object in the background information of the live video data based on the level information of the fusion object comprises:

matching the level information of the fusion object with the level information of at least one preset fusion position in the background information to obtain a matching result;

A fusion position of the avatar in the background information of the live video data is determined in the at least one preset fusion position based on the matching result.

5. The method according to claim 1, wherein the determination of the display content of the three-dimensional character model in the live video data after the fusion and the avatar of the fusion object based on the viewing angle information includes:

performing angle calculation processing on the viewing angle information to obtain a viewing angle;

Mapping the viewing angle to the fused live video data to obtain a display range corresponding to the live video data;

The display content of the 3D character model in the fused live video data and the avatar of the fused object is determined according to the display range.

6. The method according to claim 1 or 5, characterized in that the method further comprises:

Detecting the actions of the three-dimensional character model;

When it is detected that the action of the three-dimensional character model matches the preset trigger action, extracting the avatar control logic corresponding to the preset trigger action;

Adjusting the location information of the avatar in the fused live video data according to the avatar control logic.

7. The method according to claim 6, characterized in that, according to the avatar control logic, the adjustment processing of the position information of the avatar in the live video data after the fusion includes:

Calculating the dynamic change acceleration according to the action of the three-dimensional character model;

According to the avatar control logic, the dynamic change acceleration is converted into noise information acting on the avatar;

The noise information is added to the avatar, so as to adjust the position information of the avatar in the fused live video data.

8. A live broadcast device based on volumetric video, comprising:

An acquisition unit, configured to acquire the live video data of the 3D character model and the shooting video of the fused object;

An image recognition unit, configured to perform image recognition processing on the captured video of the fusion object to obtain the virtual image of the fusion object;

An analysis unit, configured to analyze the live video data of the 3D character model to obtain the background information of the live video data;

A fusion unit, configured to fuse the avatar of the fusion object and the background information of the live video data to obtain fused live video data;

A determining unit, configured to obtain viewing angle information, and determine the display content of the 3D character model in the fused live video data and the avatar of the fused object based on the viewing angle information;

A display unit, configured to display the fused live video data based on the display content.

9. An electronic device, characterized in that it comprises a processor and a memory, the memory stores a computer program, and the processor is used to run the computer program in the memory to execute any one of claims 1 to 7. The volumetric video-based live broadcast method described above.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is suitable for being loaded by a processor to execute the method described in any one of claims 1 to 7. Volumetric Video-Based Live Streaming Method.