CN116129526A - Method and device for controlling photographing, electronic equipment and storage medium - Google Patents
- Publication number: CN116129526A
- Application number: CN202310080244.4A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language
- G06F3/017 — Gesture based interaction, e.g. based on a set of recognized hand gestures
- G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T7/70 — Determining position or orientation of objects or cameras
- H04N5/265 — Mixing (studio circuits for special effects)
- G06T2207/10021 — Stereoscopic video; stereoscopic image sequence
Abstract
The present application provides a co-shooting control method and apparatus, an electronic device, and a computer-readable storage medium. The co-shooting control method includes: presenting a shooting frame containing a user character; performing gesture recognition on the user character in the shooting frame to obtain the user character's current gesture; if the current gesture matches a preset position-control gesture, acquiring the position point at which the current gesture points; and placing a virtual object in the shooting frame at the pointed position point, so as to control the relative position of the virtual object and the user character in the shooting frame. With this application, the user character can control, and is aware of, the position of the virtual object, and can therefore make movements and expressions that coordinate more naturally with the virtual object when shooting together with it, reducing the abruptness of the combined picture and making it look more natural.
Description
Technical Field
The present application relates to the technical field of shooting processing, and in particular to a co-shooting control method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of image-capture technology, users can use electronic devices to shoot a wide variety of videos and images, and their shooting requirements are increasingly diverse; for example, users expect to be able to shoot videos or images together with a virtual object (such as one from a volumetric video).
However, the inventors found during development that, in the co-shooting process, the user character does not know the virtual object's position in the frame relative to themselves, so the movements and expressions of the user character and the virtual object are uncoordinated, and the resulting combined picture looks abrupt.
Summary of the Invention
The present application provides a co-shooting control method and apparatus, an electronic device, and a computer-readable storage medium, which enable the user character to control and know the position of the virtual object, so that the user character can make movements and expressions that coordinate more naturally with the virtual object during co-shooting, reducing the abruptness of the combined picture and making it look more natural.
In a first aspect, the present application provides a co-shooting control method, the method comprising:
presenting a shooting frame containing a user character;
performing gesture recognition on the user character in the shooting frame to obtain the user character's current gesture;
if the current gesture matches a preset position-control gesture, acquiring the position point at which the current gesture points;
placing a virtual object in the shooting frame at the pointed position point, so as to control the relative position of the virtual object and the user character in the shooting frame.
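The claimed steps can be sketched as a minimal control loop. Everything here is illustrative: the gesture label `"point"`, the `Frame` fields, and the scene dictionary are assumptions, since the patent fixes neither a concrete gesture vocabulary nor a data model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed label for the preset position-control gesture.
POSITION_CONTROL_GESTURE = "point"

@dataclass
class Frame:
    gesture: str  # output of an upstream gesture recognizer
    pointed_position: Optional[Tuple[float, float, float]]  # where the finger ray lands, if anywhere

def place_virtual_object(frame: Frame, scene: dict) -> bool:
    """If the recognized gesture matches the preset position-control
    gesture, anchor the virtual object at the pointed position point."""
    if frame.gesture == POSITION_CONTROL_GESTURE and frame.pointed_position is not None:
        scene["virtual_object_position"] = frame.pointed_position
        return True
    return False
```

A matching gesture moves the object; any other gesture leaves the scene untouched, which is what lets ordinary posing continue undisturbed between control gestures.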
In a second aspect, the present application provides a co-shooting control apparatus, the apparatus comprising:
a display unit, configured to present a shooting frame containing a user character;
a recognition unit, configured to perform gesture recognition on the user character in the shooting frame to obtain the user character's current gesture;
an acquisition unit, configured to acquire the position point at which the current gesture points if the current gesture matches a preset position-control gesture;
a control unit, configured to place a virtual object in the shooting frame at the pointed position point, so as to control the relative position of the virtual object and the user character in the shooting frame.
In some embodiments, the acquisition unit is specifically configured to:
if the current gesture matches the preset position-control gesture, acquire the ray formed in the three-dimensional space of the shooting frame by the finger direction corresponding to the current gesture;
acquire the intersection point between the ray and the supporting surface of the virtual object as the pointed position point of the current gesture.
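Intersecting the finger ray with the supporting surface reduces to a standard ray-plane test. The following is a sketch under the usual parametric formulation (the patent does not prescribe a particular algorithm):

```python
def ray_plane_intersection(origin, direction, plane_point, plane_normal, eps=1e-9):
    """Return the point where the ray origin + t*direction (t >= 0) meets
    the plane through plane_point with normal plane_normal, or None if the
    ray is parallel to the plane or the hit lies behind the ray's start."""
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < eps:
        return None  # finger ray parallel to the supporting plane
    t = sum((p - o) * n for p, o, n in zip(plane_point, origin, plane_normal)) / denom
    if t < 0:
        return None  # intersection behind the finger; not a valid pointing target
    return tuple(o + t * d for o, d in zip(origin, direction))
```

The `None` cases matter in practice: a gesture aimed parallel to, or away from, every candidate plane yields no pointed position point, so no placement occurs.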
In some embodiments, before acquiring the intersection point between the ray and the supporting surface of the virtual object as the pointed position point of the current gesture, the acquisition unit is specifically configured to:
add the surface on which the user character stands in the shooting frame to a set of candidate planes for the shooting frame;
add to the candidate-plane set any plane in the shooting frame whose angle to the standing surface is smaller than a preset angle threshold;
from the planes in the candidate set, acquire the plane that intersects the ray and is closest to the ray's start point, to serve as the supporting surface of the virtual object.
In some embodiments, the acquisition unit is specifically configured to:
detect the gesture type of the user character's current gesture;
if the gesture type is a position-control gesture and the current gesture matches the preset position-control gesture, acquire the pointed position point of the current gesture.
In some embodiments, the control unit is specifically configured to:
if the gesture type is an orientation-control gesture, acquire the orientation associated with the current gesture;
control the relative orientation of the virtual object and the user character in the shooting frame according to the associated orientation.
In some embodiments, the control unit is specifically configured to:
in response to a change in the orientation of the user character, update the orientation of the virtual object so that the relative orientation of the user character and the virtual object remains the associated orientation.
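Keeping the relative orientation fixed as the user turns can be sketched with a yaw-only update. This is a simplification; full 3D orientation would use quaternions or rotation matrices, which the patent leaves unspecified:

```python
def updated_object_yaw(user_yaw_deg: float, associated_relative_yaw_deg: float) -> float:
    """Recompute the virtual object's yaw from the user character's new yaw
    so that their relative orientation stays at the gesture-associated value."""
    return (user_yaw_deg + associated_relative_yaw_deg) % 360.0
```

For example, with an associated relative yaw of 180 degrees (face to face), the object keeps facing the user no matter how the user rotates.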
In some embodiments, the control unit is specifically configured to:
if the gesture type is a distance-control gesture, acquire the distance associated with the current gesture;
control the relative distance between the virtual object and the user character in the shooting frame according to the associated distance.
In some embodiments, the control unit is specifically configured to:
in response to a touch operation on the shooting control, shoot the shooting frame to obtain a target co-shot video of the virtual object and the user character.
In some embodiments, the control unit is specifically configured to:
in response to a touch operation on the shooting control, shoot the shooting frame to obtain a preliminary co-shot video of the virtual object and the user character;
perform control-gesture recognition on the video frames of the preliminary co-shot video to obtain the target video frames that contain a control gesture;
filter the target video frames out of the preliminary co-shot video to obtain the target co-shot video.
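The post-filtering step is a straightforward pass over the preliminary video. In this sketch, `contains_control_gesture` stands in for the per-frame recognizer, whose implementation the patent leaves open:

```python
def remove_control_gesture_frames(frames, contains_control_gesture):
    """Drop the frames in which a control gesture was recognized, so the
    final co-shot video does not show the user issuing placement commands."""
    return [frame for frame in frames if not contains_control_gesture(frame)]
```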
In some embodiments, the control unit is specifically configured to:
when the current gesture matches a preset control gesture, detect whether the camera capturing the shooting frame is in the shooting state;
if the camera is in the shooting state, switch it from the shooting state to a paused state;
when the virtual object has been placed at the pointed position point in the shooting frame, switch the camera back from the paused state to the shooting state.
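This pause-while-repositioning behaviour is a small two-state machine. State names and method names below are assumptions:

```python
class CoShootingCamera:
    """Minimal sketch: a control gesture pauses a running recording, and
    completing the placement of the virtual object resumes it."""

    def __init__(self):
        self.state = "shooting"

    def on_control_gesture(self):
        # Only pause if we are actually recording.
        if self.state == "shooting":
            self.state = "paused"

    def on_object_placed(self):
        # Resume once the virtual object sits at the pointed position point.
        if self.state == "paused":
            self.state = "shooting"
```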
In some embodiments, the virtual object is a three-dimensional model from a volumetric video, and the control unit is specifically configured to:
place the three-dimensional model from the volumetric video in the shooting frame at the pointed position point.
In a third aspect, the present application further provides an electronic device comprising a processor and a memory. The memory stores a computer program, and when the processor invokes the computer program in the memory, it executes any of the co-shooting control methods provided by the present application.
In a fourth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored, the computer program being loaded by a processor to execute the co-shooting control method.
In the present application, gesture recognition is performed on the user character in the shooting frame to obtain the user character's current gesture; if the current gesture matches a preset position-control gesture, the position point at which it points is acquired; and the virtual object is placed in the shooting frame at that point. The user character can thus control the position of the virtual object by gesture and knows the approximate position of the virtual object being co-shot, and can therefore make movements and expressions that coordinate more naturally with the virtual object, reducing the abruptness of the combined picture and making it look more natural.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a scene of the co-shooting control system provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of the co-shooting control method provided by an embodiment of the present application;
FIG. 3 is a schematic comparison of a shooting frame and the shooting scene provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of one scene of a shooting frame provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of another scene of a shooting frame provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of another scene of a shooting frame provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an embodiment of the co-shooting control apparatus provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an embodiment of the electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort fall within the scope of protection of this application.
In the description of the embodiments of the present application, it should be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments of the present application, "a plurality of" means two or more, unless otherwise specifically defined.
The following description is provided to enable any person skilled in the art to make and use the application. In the following description, details are set forth for the purpose of explanation. It should be understood that a person of ordinary skill in the art will recognize that the application can be practiced without these specific details. In other instances, well-known processes are not described in detail, to avoid obscuring the description of the embodiments with unnecessary detail. Accordingly, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
Volumetric video (also known as spatial video, volumetric three-dimensional video, or six-degree-of-freedom video) is a technique that captures information in three-dimensional space (such as depth and colour information) and generates a sequence of three-dimensional models. Compared with conventional video, volumetric video adds the concept of space, using three-dimensional models to reproduce the real three-dimensional world, rather than simulating its sense of space with two-dimensional flat video and camera movement. Because a volumetric video is essentially a sequence of three-dimensional models, users can watch it from any viewing angle they like, giving a higher degree of fidelity and immersion than two-dimensional flat video.
Optionally, in the present application, the three-dimensional models that make up a volumetric video can be reconstructed as follows:
first, acquire colour images and depth images of the subject from different viewing angles, together with the camera parameters corresponding to the colour images; then, based on the acquired colour images, their corresponding depth images, and the camera parameters, train a neural network model that implicitly represents the subject's three-dimensional model, and perform isosurface extraction on the trained neural network model, achieving three-dimensional reconstruction of the subject and obtaining its three-dimensional model.
It should be noted that the embodiments of the present application place no specific restriction on the architecture of the neural network model, which can be chosen by those skilled in the art according to actual needs. For example, a multilayer perceptron (MLP) without a normalization layer can be used as the base model for training.
The three-dimensional model reconstruction method provided by this application is described in detail below.
First, several colour cameras and depth cameras can be used synchronously to shoot the target object to be reconstructed (the target object is the subject) from multiple viewing angles, obtaining colour images of the target object at multiple different viewing angles together with the corresponding depth images. That is, at the same shooting moment (shooting moments whose actual difference is less than or equal to a time threshold are considered the same), the colour camera at each viewing angle captures a colour image of the target object at the corresponding angle and, correspondingly, the depth camera at each viewing angle captures a depth image at that angle. It should be noted that the target object can be any object, including but not limited to living things such as people, animals, and plants, and inanimate objects such as machinery, furniture, and dolls.
In this way, the colour images of the target object at different viewing angles all have corresponding depth images; that is, when shooting, the colour and depth cameras can be arranged in camera groups, with the colour camera and depth camera at the same viewing angle shooting the same target object synchronously. For example, a studio can be built whose central area is the shooting area; around this area, multiple groups of colour and depth cameras are paired at regular angular intervals in both the horizontal and vertical directions. When the target object is in the shooting area surrounded by these cameras, colour images of the target object at different viewing angles and the corresponding depth images can be captured.
In addition, the camera parameters of the colour camera corresponding to each colour image are acquired. The camera parameters include the colour camera's intrinsic and extrinsic parameters, which can be determined by calibration. The intrinsic parameters are related to the colour camera's own characteristics, including but not limited to its focal length and pixel data; the extrinsic parameters are the colour camera's parameters in the world coordinate system, including but not limited to the camera's position (coordinates) and rotation.
As above, after obtaining the colour images of the target object from multiple different viewing angles at the same shooting moment and their corresponding depth images, the target object can be reconstructed in three dimensions from them. Unlike the related-art approach of converting depth information into a point cloud for reconstruction, this application trains a neural network model to implicitly represent the target object's three-dimensional model, and performs the reconstruction on the basis of that model.
Optionally, this application uses a multilayer perceptron (MLP) without a normalization layer as the base model, trained as follows:
convert the pixels of each colour image into rays based on the corresponding camera parameters;
sample a number of points on each ray, and determine the first coordinate information of each sample point and each sample point's SDF value relative to the pixel;
feed the first coordinate information of the sample points into the base model to obtain the predicted SDF value and predicted RGB colour value that the base model outputs for each sample point;
adjust the parameters of the base model based on the first difference, between the predicted SDF value and the SDF value, and the second difference, between the predicted RGB colour value and the pixel's RGB colour value, until a preset stop condition is met;
take the base model that meets the preset stop condition as the neural network model that implicitly represents the target object's three-dimensional model.
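The training steps above can be sketched abstractly. The L1 form of the two difference terms and the tolerance-based convergence check are assumptions; the text only says that the two differences drive the parameter updates until a preset stop condition (an iteration budget or convergence) is met:

```python
def sdf_rgb_loss(pred_sdf, true_sdf, pred_rgb, true_rgb, w_sdf=1.0, w_rgb=1.0):
    """Combine the first difference (predicted vs. computed SDF) and the
    second difference (predicted vs. observed RGB) into one objective."""
    sdf_term = sum(abs(p - t) for p, t in zip(pred_sdf, true_sdf)) / len(pred_sdf)
    rgb_term = sum(abs(p - t) for p, t in zip(pred_rgb, true_rgb)) / len(pred_rgb)
    return w_sdf * sdf_term + w_rgb * rgb_term

def train_until_stop(step, max_iters=1000, tol=1e-4):
    """Run one parameter-update `step` per iteration until the preset stop
    condition: iteration budget exhausted or loss change below `tol`.
    Returns (iterations used, final loss)."""
    prev_loss = float("inf")
    for i in range(max_iters):
        loss = step()
        if abs(prev_loss - loss) < tol:
            return i + 1, loss
        prev_loss = loss
    return max_iters, prev_loss
```

`step` encapsulates one optimizer update of the MLP; the sketch only fixes the stopping logic, not the optimizer or architecture.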
First, based on the camera parameters corresponding to the colour image, a pixel of the colour image is converted into a ray, which may be the ray passing through the pixel and perpendicular to the colour image plane. Then, several points are sampled on the ray; this can be done in two steps, first sampling some points uniformly, then sampling additional points at key locations based on the pixel's depth value, so that as many samples as possible fall near the model surface. Then, the first coordinate information of each sample point in the world coordinate system and each sample point's signed distance field (SDF) value are computed from the camera parameters and the pixel's depth value. The SDF value can be the difference between the pixel's depth value and the sample point's distance from the camera imaging plane; this difference is signed: a positive value means the sample point is outside the three-dimensional model, a negative value means it is inside, and zero means it is on the surface. Then, after sampling is complete and each sample point's SDF value has been computed, the first coordinate information of each sample point in the world coordinate system is fed into the base model (which is configured to map input coordinate information to an SDF value and an RGB colour value and output them); the SDF value output by the base model is recorded as the predicted SDF value, and the RGB colour value output by the base model is recorded as the predicted RGB colour value. Then, the parameters of the base model are adjusted based on the first difference, between the predicted SDF value and the sample point's SDF value, and the second difference, between the predicted RGB colour value and the RGB colour value of the pixel corresponding to the sample point.
In addition, the other pixels of the colour image are sampled in the same way, and the coordinate information of the resulting sample points in the world coordinate system is fed into the base model to obtain the corresponding predicted SDF and RGB colour values, which are used to adjust the base model's parameters until the preset stop condition is met. For example, the stop condition can be configured as the number of iterations of the base model reaching a preset count, or as the base model converging. When the iteration of the base model meets the preset stop condition, a neural network model that can accurately and implicitly represent the subject's three-dimensional model is obtained. Finally, an isosurface extraction algorithm can be applied to this neural network model to extract the surface of the three-dimensional model, yielding the subject's three-dimensional model.
Optionally, in some embodiments, the imaging plane of the colour image is determined from the camera parameters, and the ray that passes through a pixel of the colour image and is perpendicular to the imaging plane is taken as the ray corresponding to that pixel.
Here, the coordinate information of the colour image in the world coordinate system, i.e. the imaging plane, can be determined from the camera parameters of the colour camera corresponding to the colour image. The ray that passes through a pixel of the colour image and is perpendicular to that imaging plane can then be taken as the ray corresponding to the pixel.
Optionally, in some embodiments, the second coordinate information and rotation angle of the colour camera in the world coordinate system are determined from the camera parameters, and the imaging plane of the colour image is determined from the second coordinate information and the rotation angle.
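The ray construction just described, through the pixel and perpendicular to the imaging plane, can be sketched as below. The pixel's world-space position and the plane normal are taken as given, since deriving them from the intrinsic and extrinsic parameters depends on the calibration convention, which the text does not fix:

```python
def pixel_ray(pixel_world, imaging_plane_normal):
    """Ray corresponding to a pixel: origin at the pixel's world-space
    position, direction along the (normalized) imaging-plane normal."""
    norm = sum(c * c for c in imaging_plane_normal) ** 0.5
    direction = tuple(c / norm for c in imaging_plane_normal)
    return pixel_world, direction
```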
Optionally, in some embodiments, a first number of first sampling points are sampled at equal intervals along the ray; a plurality of key sampling points are determined according to the depth value of the pixel, and a second number of second sampling points are sampled around the key sampling points; the first number of first sampling points and the second number of second sampling points are taken together as the plurality of sampling points sampled on the ray.
Specifically, n first sampling points (the first number) are first sampled uniformly along the ray, where n is a positive integer greater than 2. Then, according to the depth value of the pixel, either a preset number of key sampling points closest to the pixel are selected from the n first sampling points, or those first sampling points whose distance from the pixel is less than a distance threshold are selected as key sampling points. Next, m second sampling points are sampled around the determined key sampling points, where m is a positive integer greater than 1. Finally, the n + m points thus sampled are taken as the plurality of sampling points sampled on the ray. Sampling m additional points near the key sampling points makes the training of the model more accurate near the surface of the three-dimensional model, thereby improving the reconstruction accuracy of the three-dimensional model.
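The coarse-plus-fine sampling described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation; the function name, the window threshold used to pick key samples, and the uniform fine-sampling strategy are assumptions made for the example.

```python
import numpy as np

def sample_ray_points(t_near, t_far, depth, n=16, m=8, window=0.15, rng=None):
    """Coarse-to-fine sampling along one camera ray: n equally spaced coarse
    samples over [t_near, t_far], plus m fine samples concentrated around the
    coarse samples nearest the surface depth read from the depth image."""
    rng = np.random.default_rng(0) if rng is None else rng
    t_coarse = np.linspace(t_near, t_far, n)           # n first sampling points
    key = t_coarse[np.abs(t_coarse - depth) < window]  # key sampling points
    if key.size == 0:                                  # fall back to the nearest
        key = t_coarse[[np.argmin(np.abs(t_coarse - depth))]]
    t_fine = rng.uniform(key.min(), key.max(), m)      # m second sampling points
    return np.sort(np.concatenate([t_coarse, t_fine]))

ts = sample_ray_points(t_near=0.0, t_far=2.0, depth=1.0)
```

The n + m samples are returned in sorted order so that downstream volume integration along the ray can consume them directly.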
Optionally, in some embodiments, the depth value corresponding to the pixel is determined according to the depth image corresponding to the color image; the SDF value of each sampling point relative to the pixel is calculated based on the depth value; and the coordinate information of each sampling point is calculated according to the camera parameters and the depth value.
Specifically, after a plurality of sampling points have been sampled on the ray corresponding to each pixel, for each sampling point the distance between the shooting position of the color camera and the corresponding point on the target object is determined according to the camera parameters and the depth value of the pixel; based on this distance, the SDF value of each sampling point and the coordinate information of each sampling point are then calculated one by one.
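One common way to realize the depth-supervised SDF values described above is to approximate the SDF of a sample at distance t along the ray as the measured surface depth minus t, optionally truncated to a band around the surface. The NumPy sketch below is an illustrative assumption, not the patent's exact formula; the truncation bound `trunc` and the function names are invented for the example.

```python
import numpy as np

def approx_sdf_along_ray(t_samples, surface_depth, trunc=0.25):
    """Approximate SDF of each ray sample as the signed distance to the
    depth-measured surface point: positive in front of the surface (outside),
    negative behind it (inside), truncated to the band [-trunc, trunc]."""
    sdf = surface_depth - np.asarray(t_samples, dtype=float)
    return np.clip(sdf, -trunc, trunc)

def sample_world_coords(origin, direction, t_samples):
    """World coordinates of the samples along the ray origin + t * direction."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    return np.asarray(origin, dtype=float) + np.outer(t_samples, d)

ts = np.array([0.5, 1.0, 1.5])
sdf = approx_sdf_along_ray(ts, surface_depth=1.0, trunc=0.25)  # [0.25, 0.0, -0.25]
pts = sample_world_coords([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], ts)
```

The (coordinate, SDF) pairs produced this way are the supervision targets for the base model during training.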
It should be noted that, after training of the base model is completed, the trained base model can predict the SDF value corresponding to the coordinate information of any given point. The predicted SDF value indicates the positional relationship between that point and the three-dimensional model of the target object (inside, outside, or on the surface), realizing an implicit representation of the three-dimensional model of the target object and yielding a neural network model that implicitly represents it.
Finally, isosurface extraction is performed on the above neural network model. For example, the Marching Cubes (MC) isosurface extraction algorithm can be used to draw the surface of the three-dimensional model; from the extracted model surface, the three-dimensional model of the target object is then obtained.
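Full Marching Cubes relies on precomputed edge/triangle lookup tables; the sketch below illustrates only its core step, locating zero crossings of the SDF along grid edges by linear interpolation, demonstrated on a one-dimensional slice of a sphere SDF. The function name and the grid are assumptions for illustration.

```python
import numpy as np

def zero_crossings_along_z(sdf, coords_z):
    """Locate zero crossings of an SDF sampled along a grid axis: for each
    grid edge whose endpoint SDF values have opposite signs, linearly
    interpolate the crossing position (the core step of Marching Cubes)."""
    a, b = sdf[..., :-1], sdf[..., 1:]
    mask = (a * b) < 0                       # strict sign change on the edge
    t = np.zeros_like(a)
    np.divide(a, a - b, out=t, where=mask)   # safe: a != b wherever mask holds
    z_cross = coords_z[:-1] + t * (coords_z[1:] - coords_z[:-1])
    return z_cross[mask]

# 1-D slice (x = y = 0) of the SDF of a unit sphere, sampled along z.
z = np.linspace(-2.0, 2.0, 80)
sdf = np.abs(z) - 1.0
crossings = zero_crossings_along_z(sdf, z)   # approximately [-1.0, 1.0]
```

In the full algorithm the same interpolated crossings on all twelve edges of each grid cell are connected into triangles via the lookup tables to produce the mesh surface.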
The three-dimensional reconstruction scheme provided by this application implicitly models the three-dimensional model of the target object with a neural network, and incorporates depth information to improve the speed and accuracy of model training. By applying this scheme to reconstruct the shooting object continuously over time, three-dimensional models of the object at different moments are obtained; the sequence of these models arranged in temporal order constitutes the volumetric video captured of the object. In this way, "volumetric video shooting" can be performed on any subject to obtain volumetric video presenting specific content. For example, a volumetric video of a dancing subject can be shot so that the dance can be watched from any angle, a volumetric video of a teaching subject can be shot so that the lesson can be watched from any angle, and so on.
It should be noted that the volumetric video involved in the following embodiments of this application can be captured by the volumetric video shooting method described above.
Embodiments of this application provide a co-shooting control method and device, an electronic device, and a computer-readable storage medium. The co-shooting control device may be integrated into an electronic device, and the electronic device may be a server, a terminal, or similar equipment.
The co-shooting control method of the embodiments of this application can be applied to the production and use of volumetric video. For example, the subject of a volumetric video (such as a performer) serves as the virtual object and is co-shot according to the co-shooting control method of the embodiments of this application. By way of example, the production and use of such volumetric video proceed roughly as follows:
Step 1: Shooting and capture
The performer enters a camera array system deployed in a matrix, in which professional-grade capture equipment such as infrared (IR) cameras and 4K ultra-high-definition industrial cameras shoots the performer and extracts data such as color information, material information, and depth information.
Step 2: Material generation
After the data has been collected, the material is uploaded to the cloud, where algorithms can be invoked to automatically generate the volumetric video (a sequence of dynamic 3D character models).
Step 3: Using the volumetric video
Through plug-ins, the volumetric video can be imported into UE4/UE5/Unity 3D, blended seamlessly with virtual scenes or CG effects, rendered in real time, or used for AR co-shooting, among other uses.
The executing entity of the co-shooting control method in the embodiments of this application may be the co-shooting control device provided in the embodiments of this application, or a different type of electronic device integrating that device, such as a server device, a physical host, or user equipment (UE). The co-shooting control device may be implemented in hardware or software, and the UE may specifically be a terminal device such as a smartphone, tablet computer, notebook computer, palmtop computer, desktop computer, or personal digital assistant (PDA). The electronic device may operate standalone or as part of a device cluster. It may integrate a camera, or establish a network connection with a camera, to capture the shooting scene and form a shooting picture; it may likewise integrate a display screen, or establish a network connection with one, to present the shooting picture during shooting.
For example, the co-shooting control method provided by the embodiments of this application can be applied to the co-shooting control system shown in FIG. 1. The co-shooting control system includes a terminal 101 and a server 102. The terminal 101 may be a device that includes both receiving and transmitting hardware, i.e., hardware capable of two-way communication over a two-way communication link. Specifically, the terminal 101 may be a terminal equipped with a camera, such as a mobile phone, tablet computer, or notebook computer, used to capture the shooting scene and obtain the shooting picture; it may also be a camera installed at the shooting site to complete the shooting of the co-shot picture. The terminal 101 and the server 102 may communicate bidirectionally over a network. The server 102 may be an independent server, or a server network or server cluster composed of servers, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud server composed of multiple servers, where a cloud server consists of a large number of computers or network servers based on cloud computing. The server 102 may also include a display screen for displaying the shooting picture. The terminal 101 and the server 102 may jointly implement the co-shooting control method. For example, the terminal 101 may send the shooting picture to the server 102, whereupon the server 102 may present the shooting picture containing the user character; perform gesture recognition on the user character in the shooting picture to obtain the current gesture of the user character; if the current gesture matches a preset position control gesture, obtain the pointed position point of the current gesture; and place the virtual object in the shooting picture according to the pointed position point, so as to control the relative position of the virtual object and the user character in the shooting picture.
Those skilled in the art can understand that the application environment shown in FIG. 1 is only one application scenario of the solution of this application and does not limit its application scenarios. Other application environments may include more or fewer computer devices than shown in FIG. 1. For example, only one server 102 is shown in FIG. 1; it can be understood that the co-shooting control system may also include one or more other servers, which is not limited here.
It should also be noted that the schematic scene diagram of the co-shooting control system shown in FIG. 1 is only an example. The co-shooting control system and scenario described in the embodiments of the present invention are intended to explain the technical solutions of the embodiments more clearly and do not limit them. Those of ordinary skill in the art will appreciate that, as the co-shooting control system evolves and new business scenarios emerge, the technical solutions provided by the embodiments of the present invention are equally applicable to similar technical problems.
Next, the co-shooting control method provided by the embodiments of this application is introduced. In the embodiments of this application, an electronic device integrating a camera and a display screen is taken as the executing entity by way of example; for simplicity and ease of description, the executing entity is omitted in the subsequent method embodiments.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the co-shooting control method provided by an embodiment of this application. It should be noted that although a logical order is shown in the flowchart of FIG. 2 and other figures, in some cases the steps shown or described may be performed in an order different from that given here. The co-shooting control method includes steps 201 to 204, as follows:
201. Present a shooting picture containing a user character.
Here, the user character refers to the user who co-shoots with the virtual object, for example, a person in the shooting scene.
The virtual object refers to an object that does not exist in the shooting scene but appears in the shooting picture and is co-shot with the user character. For example, as shown in FIG. 3, the virtual object is a puppy: the puppy does not exist in the shooting scene (the area inside the dotted frame in FIG. 3 illustrates the shooting scene), but the puppy appears in the shooting picture.
The virtual object may specifically be a three-dimensional model from a volumetric video, or it may be a two-dimensional model.
The shooting scene refers to the real scene in which the user character is located while shooting.
The shooting picture refers to the picture formed by capturing the shooting scene. Specifically, it may be the picture captured after the camera is turned on but before the shooting state formally begins, or the picture captured once shooting has formally begun (e.g., after the user presses the "start shooting" button).
By way of example, when the camera of the electronic device is turned on, the camera captures the current shooting scene to form the shooting picture, and the display screen of the electronic device presents the shooting picture containing the user character. In some embodiments, when the camera is turned on, the virtual object may also be presented at the same time; that is, in step 201 the shooting picture containing both the user character and the virtual object is presented. In other embodiments, the virtual object may instead be presented only when the camera formally starts shooting; that is, in step 201 a shooting picture containing the user character but not the virtual object may be presented.
202. Perform gesture recognition on the user character in the shooting picture to obtain the current gesture of the user character.
The current gesture refers to the gesture of the user character obtained by performing gesture recognition on the user character. For example, the current gesture of the user character may be "index finger extended, the other four fingers bent", "all five fingers together and extended", and so on.
By way of example, first, a screenshot of the shooting picture presented in step 201 may be taken to obtain a shooting picture image; then, gesture recognition is performed on this image with a gesture recognition algorithm to obtain the current gesture of the user character.
For example, first, a preset gesture recognition algorithm is trained on a training data set (comprising multiple sample images, each annotated with the user's hand region and the gesture category corresponding to that hand region), so that the trained algorithm learns the features of the various gestures. This yields a trained gesture recognition algorithm suitable for detecting the hand region in an image and determining the gesture category corresponding to it. The preset gesture recognition algorithm may be an open-source network model usable for classification tasks, such as an EfficientNet model, the YOLOv3 network, a MobileNet network, and so on. Specifically, an open-source network (usable for classification tasks) with default model parameters may be adopted as the preset gesture recognition algorithm.
The gesture categories the algorithm must learn can be set according to the gestures to be recognized in the actual business scenario. For example, if it is necessary to recognize whether the user character's gesture is "index finger extended, the other four fingers bent", two gesture categories may be set (one for "index finger extended, the other four fingers bent", the other for "other gestures") to train the preset gesture recognition algorithm. As another example, if it is necessary to recognize whether the gesture belongs to category 1 (e.g., the "index finger extended, the other four fingers bent" gesture), category 2 (e.g., the "all five fingers together and extended" gesture), or category 3 (any form other than categories 1 and 2), three gesture categories may be set accordingly ("index finger extended, the other four fingers bent", "all five fingers together and extended", and "other gestures") to train the preset gesture recognition algorithm.
Then, the shooting picture image obtained by screenshotting the picture presented in step 201 is input into the trained gesture recognition algorithm, which classifies it: it first detects the hand region of the user character in the image, then classifies that hand region to obtain the gesture category of the user character in the image, which serves as the current gesture of the user character.
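The detect-then-classify pipeline described above might be organized as follows. The detector and classifier here are trivial stubs standing in for the trained networks (e.g., a YOLOv3-style hand detector and an EfficientNet/MobileNet-style classifier); all names, the box format, and the default "other" category are hypothetical.

```python
def recognize_current_gesture(image, detect_hand, classify_gesture):
    """Two-stage recognition: first locate the hand region in the screenshot
    of the shooting picture, then classify the cropped region into one of the
    configured gesture categories."""
    box = detect_hand(image)                 # (x0, y0, x1, y1) or None
    if box is None:
        return "other"                       # no hand found: default category
    x0, y0, x1, y1 = box
    crop = [row[x0:x1] for row in image[y0:y1]]
    return classify_gesture(crop)

# Trivial stubs standing in for the trained detector and classifier.
image = [[0] * 8 for _ in range(8)]
gesture = recognize_current_gesture(
    image,
    detect_hand=lambda img: (2, 2, 6, 6),
    classify_gesture=lambda crop: "index_extended" if len(crop) == 4 else "other",
)
```

In practice the two stubs would be replaced by inference calls into the trained models, with the crop resized to the classifier's input resolution.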
203. If the current gesture matches a preset position control gesture, obtain the pointed position point of the current gesture.
Here, the preset position control gesture refers to a preset gesture used to control the placement position of the virtual object. For example, the preset position control gesture may be "index finger extended, the other four fingers bent", or "all five fingers together and extended", and so on.
The preset position control gestures given here are merely examples; in practice, the specific form of the preset position control gesture can be set according to the needs of the actual business scenario, and this embodiment places no restriction on it.
The pointed position point refers to the position the current gesture points to. Specifically, it may be the point at which the finger of the current gesture points (case (1) below), or a position point pre-associated with the current gesture (case (2) below).
After the current gesture of the user character is recognized in step 202, whether it matches the preset position control gesture is checked. If it matches, the method proceeds to step 203; otherwise, if it does not match, no further processing may be performed, or step 202 may be executed again to perform gesture recognition on the user character in the shooting picture and obtain the current gesture, until a current gesture matching the preset position control gesture is detected, at which point the method proceeds to step 203.
There are several ways to determine the pointed position point in step 203, including, by way of example:
Case (1): the pointed position is the point at which the finger of the current gesture points. In this case, step 203 may specifically include the following steps 2031A to 2032A:
2031A. If the current gesture matches the preset position control gesture, obtain the ray formed by the finger pointing of the current gesture in the three-dimensional space where the shooting picture is located.
2032A. Obtain the intersection point between the ray and the supporting surface of the virtual object as the pointed position point of the current gesture.
Here, the three-dimensional space where the shooting picture is located refers to the three-dimensional space of the shooting scene captured by the shooting picture.
The ray refers to the ray formed in the three-dimensional space of the shooting picture by the finger pointing of the current gesture, referred to herein simply as the ray. As shown in FIG. 3, the ray can be understood as a ray whose starting point is the fingertip of the finger of the current gesture and whose direction of extension is the finger's pointing direction.
In some embodiments, the standing surface of the user character (such as the ground) can be used directly as the supporting surface of the virtual object in step 2032A; in this case, the intersection point between the ray and the standing surface of the user character can be taken directly as the pointed position point of the current gesture. For example, as shown in FIG. 4, suppose the preset position control gesture is "index finger extended, the other four fingers bent". If step 202 recognizes the current gesture of the user character as "index finger extended, the other four fingers bent", the current gesture matches the preset position control gesture, and the ray formed in the three-dimensional space of the shooting picture by the finger pointing of the current gesture (i.e., the pointing of the index finger) can be identified. Then, the intersection point between the ray and the standing surface of the user character (point A in FIG. 4, where the extension line of the finger pointing intersects the standing surface at point A) is taken as the pointed position point of the current gesture.
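The intersection of the pointing ray with the standing surface can be computed with standard ray-plane intersection, sketched below in NumPy. The coordinate convention (y up, ground at y = 0) and all names are assumptions for illustration.

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Intersect the pointing ray origin + t * direction (t >= 0) with a
    plane; return the intersection point, or None if the ray misses."""
    o, d = np.asarray(origin, float), np.asarray(direction, float)
    p, n = np.asarray(plane_point, float), np.asarray(plane_normal, float)
    denom = d.dot(n)
    if abs(denom) < 1e-9:          # ray parallel to the plane
        return None
    t = (p - o).dot(n) / denom
    if t < 0:                      # plane is behind the fingertip
        return None
    return o + t * d

# Fingertip 1.2 m above the ground, pointing forward and down at 45 degrees.
point_a = ray_plane_intersection(
    origin=[0.0, 1.2, 0.0], direction=[0.0, -1.0, 1.0],
    plane_point=[0.0, 0.0, 0.0], plane_normal=[0.0, 1.0, 0.0])
# point_a == [0.0, 0.0, 1.2]
```

The returned point plays the role of point A in FIG. 4: the spot on the standing surface where the virtual object will be placed.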
In some embodiments, multiple planes exist simultaneously in the shooting scene (such as the floor plane and each step plane of a staircase), and any plane of the shooting scene may be designated as the supporting surface of the virtual object in step 2032A. In this case, the supporting surface of the virtual object may first be determined before step 2032A, and then step 2032A obtains the intersection point between the ray and the supporting surface of the virtual object as the pointed position point of the current gesture. The process of determining the supporting surface of the virtual object may specifically include: adding the standing surface of the user character in the shooting picture to a set of candidate planes of the shooting picture; adding to the candidate plane set those planes of the shooting picture whose included angle with the standing surface is less than a preset angle threshold; and, from the planes of the candidate plane set, obtaining the nearest plane that has an intersection point with the ray, to serve as the supporting surface of the virtual object.
Here, the starting point of the ray refers to the starting point of the finger pointing, for example, the fingertip.
The plane intersection point refers to the intersection point between the ray and a plane.
The plane nearest to the starting point of the ray refers to, among the planes of the candidate plane set that have an intersection point with the ray, the plane whose intersection point is at the smallest distance from the starting point of the ray.
For example, as shown in FIG. 6, the shooting picture includes a tabletop, a wall, and the ground, and the standing surface of the user character is the ground, where the included angle between the tabletop and the ground is less than the preset angle threshold (e.g., 5 degrees). The intersection point of the ray with the ground is point A, and the intersection point of the ray with the tabletop is point B. The ground and the tabletop are therefore added to the candidate plane set; then, whether the ray intersects each plane of the candidate plane set (i.e., the ground and the tabletop) is calculated, and from these planes the one that intersects the ray (in FIG. 6, the ray intersects the ground at point A and the tabletop at point B) and is nearest to the starting point of the ray (in FIG. 6, the tabletop) is selected as the supporting surface of the virtual object.
It can be seen that, to ensure the virtual object can be placed normally, the supporting surface of the virtual object is the ground or a plane parallel to the ground (such as a tabletop or each step plane of a staircase). The standing surface of the user character, together with the planes whose included angle with the standing surface is less than the preset angle threshold, is first added to the candidate plane set of the shooting picture, and the nearest plane intersecting the ray is then selected from the candidate plane set as the supporting surface of the virtual object. First, since the standing surface and the planes whose angle with it is below the threshold are added to the candidate plane set, the planes that could possibly be supporting surfaces are retained for the supporting-surface determination while some non-supporting surfaces are filtered out, reducing the computation needed to determine the pointed position point of the current gesture. Second, on the basis that the user can designate the supporting surface of the virtual object, planes that intersect the ray but cannot normally hold the virtual object (such as a wall in the shooting scene) are prevented from being chosen as the supporting surface, so the virtual object is not placed on a wrong plane. Third, because the nearest plane intersecting the ray is taken as the supporting surface, misjudgment is avoided when the ray intersects multiple planes at once (for example, as shown in FIG. 6, when the ray intersects both the tabletop and the ground, the plane nearest to the starting point of the ray, i.e., the tabletop, is taken as the supporting surface of the virtual object). Fourth, the user can designate the placement plane of the virtual object instead of a fixed plane (such as the ground) always being used, which increases the diversity of control over the virtual object.
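Selecting the nearest intersecting candidate plane, as described above, might look like the following NumPy sketch, where each candidate plane is given as a (point, normal) pair; the names and the y-up convention are assumptions for illustration.

```python
import numpy as np

def pick_support_surface(origin, direction, planes):
    """Among candidate planes, each a (point, normal) pair, return the index
    of the plane the pointing ray hits nearest its starting point, together
    with the intersection point; (None, None) if no plane is hit."""
    o, d = np.asarray(origin, float), np.asarray(direction, float)
    best_i, best_hit, best_t = None, None, np.inf
    for i, (p, n) in enumerate(planes):
        p, n = np.asarray(p, float), np.asarray(n, float)
        denom = d.dot(n)
        if abs(denom) < 1e-9:
            continue                         # ray parallel to this plane
        t = (p - o).dot(n) / denom
        if 0.0 <= t < best_t:                # hit in front, nearer than before
            best_i, best_hit, best_t = i, o + t * d, t
    return best_i, best_hit

planes = [
    ([0.0, 0.0, 0.0], [0.0, 1.0, 0.0]),     # ground (y = 0)
    ([0.0, 0.8, 0.0], [0.0, 1.0, 0.0]),     # tabletop, 0.8 m above the ground
]
idx, hit = pick_support_surface([0.0, 1.2, 0.0], [0.0, -1.0, 1.0], planes)
```

With the ray hitting both the ground and the tabletop, the tabletop wins because its intersection lies closer to the fingertip, matching the FIG. 6 example.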
Case (2): the pointed position is a position point pre-associated with the current gesture. For example, suppose the preset position control gestures include gesture A, gesture B, and gesture C, whose pre-associated position points are, respectively, 1 meter to the left of the user character, 1 meter to the right of the user character, and 1 meter directly in front of the user character. If the current gesture is recognized as gesture C, the current gesture matches the preset position control gestures, and the point 1 meter directly in front of the user character is taken as the pointed position point of the current gesture.
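A pre-associated position table like the A/B/C gesture example above could be a simple lookup of offsets in the user character's local frame; the sketch below uses hypothetical gesture names and a 2D (x = right, z = forward) convention.

```python
# Hypothetical associated position points, expressed as offsets (in metres)
# in the user character's local frame: x = right, z = forward.
GESTURE_OFFSETS = {
    "gesture_a": (-1.0, 0.0),   # 1 m to the user's left
    "gesture_b": (1.0, 0.0),    # 1 m to the user's right
    "gesture_c": (0.0, 1.0),    # 1 m directly in front
}

def pointed_position(current_gesture, user_position):
    """Return the point pre-associated with the gesture, offset from the
    user character's position, or None when the gesture is not a position
    control gesture."""
    offset = GESTURE_OFFSETS.get(current_gesture)
    if offset is None:
        return None
    ux, uz = user_position
    return (ux + offset[0], uz + offset[1])

p = pointed_position("gesture_c", user_position=(2.0, 3.0))   # (2.0, 4.0)
```

A full implementation would also rotate the offset by the user character's current heading before adding it.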
204. Place the virtual object in the shooting picture according to the pointed position point, so as to control the relative position of the virtual object and the user character in the shooting picture.
For example, as shown in FIG. 4 and FIG. 5, if the pointed position point is point A, the virtual object (such as the puppy) is placed at point A, so that the virtual object in the shooting picture is placed where the user wants it. The user character can thus control the relative position of the virtual object and the user character in the shooting picture, so that even when the user cannot see the shooting picture during shooting, the user can still roughly control and know where the virtual object is placed, and can therefore make movements and expressions that are more naturally coordinated with the virtual object. This reduces the abruptness of the co-shot video or image and makes its effect more natural.
Further, to let the co-shooter (i.e., the user character in the shooting picture) better control and know the orientation information of the virtual object, in addition to controlling the placement point of the virtual object through gestures, the orientation of the virtual object (for example, its orientation relative to the user character or to the camera) and the relative distance between the virtual object and the user character can also be controlled through gestures. In this case, step 203 detects the gesture type of the current gesture of the user character; if the gesture type is a position control gesture and the current gesture matches the preset position control gesture, the pointed position point of the current gesture is then obtained. For example, if the preset position control gesture is "all five fingers together and extended", then when the current gesture is "index finger extended, the other four fingers bent", the pointed position point of the current gesture is not obtained; when the current gesture is "all five fingers together and extended", the current gesture is confirmed as the preset position control gesture, and its pointed position point is further obtained. This ensures that the position point is obtained only when the user is actually controlling the position of the virtual object, avoiding the cases where the current gesture does not control the position, or where it instead controls orientation information such as the relative orientation or relative distance of the virtual object. Invalid detection of the position point is thereby avoided, and the user can control different aspects of the virtual object's placement (such as placement point, placement distance, and placement orientation) through multiple types of gestures.
If the gesture type is an orientation control gesture, the associated orientation of the current gesture is obtained, and the relative orientation of the virtual object and the user character in the shooting picture is controlled according to the associated orientation. For example, suppose the preset orientation control gestures are "index finger parallel to the ground, pointing straight ahead" and "index finger parallel to the ground, pointing straight behind", whose associated orientations are, respectively, "virtual object facing the user character" and "virtual object facing away from the user character". If the current gesture is "index finger parallel to the ground, pointing straight ahead", it is confirmed as a preset orientation control gesture, and the virtual object in the shooting picture is moved according to that gesture's associated orientation, so that the relative orientation of the virtual object and the user character in the shooting picture becomes the associated orientation. In this way, the user can control the different orientations of the virtual object through gestures, so that even without seeing the shooting picture the user can still roughly control and know the orientation of the virtual object, make movements and expressions that are more naturally coordinated with it, and reduce the abruptness of the co-shot video, making its effect more natural.
Further, after the user makes an orientation control gesture, the orientation of the user character may change, for example from facing the camera to standing side-on to it, while what the user essentially wants is to control the virtual object's orientation relative to the user character. Therefore, when the user adjusts orientation during shooting, the orientation of the virtual object can be controlled to change accordingly, avoiding the need to keep issuing orientation control gestures after each small adjustment and making the interaction more convenient. In this case, after the relative orientation of the virtual object and the user character in the shooting picture is controlled according to the associated orientation, when a change in the orientation of the user character is detected, the orientation of the virtual object can be updated in response to that change, so that the relative orientation between the user character and the virtual object remains the associated orientation.
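Keeping the relative orientation fixed as the user turns can be reduced to re-deriving the virtual object's yaw from the user's current yaw plus the associated relative yaw. The sketch below uses degrees and a 180-degree offset for "facing away"; these conventions are assumptions for illustration.

```python
def update_virtual_orientation(user_yaw_deg, relative_yaw_deg):
    """Keep the virtual object's orientation locked to the user character:
    whenever the user turns, re-derive the object's yaw from the user's new
    yaw plus the associated relative yaw (e.g. 180 = facing away)."""
    return (user_yaw_deg + relative_yaw_deg) % 360.0

# The user turns from facing the camera (0 deg) to side-on (90 deg); the
# object keeps the same associated relative yaw of 180 deg throughout.
yaw0 = update_virtual_orientation(0.0, 180.0)    # 180.0
yaw1 = update_virtual_orientation(90.0, 180.0)   # 270.0
```

Calling this on every detected change of the user character's orientation keeps the relative orientation at the associated value without further gestures.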
If the gesture type is a distance control gesture, the associated distance of the current gesture is acquired, and the relative distance between the virtual object and the user character in the shooting picture is controlled according to the associated distance. For example, the preset distance control gestures may be "hold up one finger" and "hold up two fingers", whose associated distances are, respectively, "virtual object one meter from the user character" and "virtual object two meters from the user character". If the current gesture is "hold up one finger", the current gesture is a preset distance control gesture, and the virtual object in the shooting picture is moved according to the associated distance of that gesture, so that the relative distance between the virtual object and the user character becomes one meter. In this way, the user can control the distance of the virtual object through gestures, so that even without seeing the shooting picture the user still roughly knows and controls the distance of the virtual object, can make actions and expressions that coordinate more naturally with the virtual object, and the co-shot video looks less abrupt and more natural.
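The distance control in the example above amounts to moving the object along the user-to-object direction until the separation equals the gesture's associated distance. The gesture labels, the 2-D ground-plane coordinates, and the degenerate-case handling below are assumptions of this sketch.

```python
# Hypothetical finger-count-to-distance table (meters), matching the
# "one finger = 1 m, two fingers = 2 m" example in the text.
DISTANCE_GESTURES = {"one_finger": 1.0, "two_fingers": 2.0}

def place_at_distance(gesture, user_pos, object_pos):
    """Move the object along the user->object direction so that its distance
    from the user character equals the gesture's associated distance.
    Positions are (x, y) ground-plane coordinates; a non-distance gesture
    leaves the object where it is."""
    d = DISTANCE_GESTURES.get(gesture)
    if d is None:
        return object_pos
    ux, uy = user_pos
    ox, oy = object_pos
    vx, vy = ox - ux, oy - uy
    norm = (vx * vx + vy * vy) ** 0.5
    if norm == 0.0:
        # Degenerate case: object coincides with the user; pick an
        # arbitrary direction.
        return (ux + d, uy)
    return (ux + vx / norm * d, uy + vy / norm * d)
```

For instance, an object 4 m straight ahead of the user is pulled in to 1 m along the same bearing by the "one finger" gesture.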
The shooter (who may be the user character or another person) can press the shooting control at any time to put the camera into the shooting state and record the shooting picture, obtaining a target co-shot video of the virtual object and the user character; that is, the electronic device records the shooting picture in response to a touch operation on the shooting control and obtains the target co-shot video of the virtual object and the user character. Further, after entering the shooting state, any control gesture the user character makes would otherwise be recorded, which inflates the data volume of the target co-shot video (this is especially noticeable when a three-dimensional model from a volumetric video serves as the virtual object) or forces the user to crop the control-gesture frames out in post-production. To avoid this, in response to the touch operation on the shooting control, the shooting picture may first be recorded to obtain a preliminary co-shot video of the virtual object and the user character; control-gesture recognition is then performed on the frames of the preliminary co-shot video to obtain the target video frames that contain control gestures (for example, frames recognized as containing a position control gesture, an orientation control gesture, or a distance control gesture); and the target video frames are filtered out of the preliminary co-shot video to obtain the target co-shot video. In this way, the frames containing control gestures are filtered out before the video is saved, which reduces, to a certain extent, the storage the target co-shot video occupies and the cropping needed afterwards.
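The three-step post-capture filtering above can be sketched as a single pass over the preliminary video's frames. The gesture-class names and the `recognize_gesture` callback are stand-ins for the recognizer described in the text, not a real API.

```python
# Preset control-gesture classes, as named in the text (position,
# orientation, distance); the labels themselves are illustrative.
CONTROL_GESTURES = {"position", "orientation", "distance"}

def filter_control_frames(frames, recognize_gesture):
    """Produce the target co-shot video: the preliminary video's frames
    minus those in which the recognizer reports a control gesture.
    `recognize_gesture(frame)` is assumed to return a gesture label."""
    return [f for f in frames if recognize_gesture(f) not in CONTROL_GESTURES]
```

A frame sequence in which only the middle frame shows a position gesture would, under this sketch, keep the first and last frames and drop the middle one.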
Further, while the camera is in the shooting state after the shooter has pressed the shooting control, the method can also automatically recognize whether the current gesture is a preset control gesture (for example, a position control gesture, an orientation control gesture, or a distance control gesture); if it is, shooting is paused automatically and resumed once the control is complete. That is, the co-shooting control method further includes: when the current gesture matches a preset control gesture, detecting whether the camera of the shooting picture is in the shooting state; if the camera is in the shooting state, switching the camera from the shooting state to a paused state; and switching the camera from the paused state back to the shooting state once the virtual object in the shooting picture has been placed at the pointed position. For example, to prevent the co-shooter's control gestures from being recorded, the pictures in which gestures appear can be skipped or filtered automatically: even after the "shoot" button has been pressed, shooting is paused automatically while the co-shooter uses gestures to control the position of the virtual object, and resumed automatically after the virtual object has been placed at the pointed position, reducing the amount of co-shot video data.
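The pause/resume behaviour described here is a small state machine. The class and state names below are illustrative assumptions; they sketch the transitions (shooting → paused on a matched control gesture, paused → shooting on placement), not the device's actual camera interface.

```python
class CoShootCamera:
    """Minimal sketch of the automatic pause/resume behaviour: a matched
    control gesture pauses capture while recording, and completing the
    object placement resumes it."""

    def __init__(self):
        self.state = "idle"

    def press_shoot(self):
        # The shooter presses the shooting control.
        self.state = "shooting"

    def on_gesture(self, is_control_gesture):
        # Pause only if we are recording and the gesture matches a
        # preset control gesture.
        if self.state == "shooting" and is_control_gesture:
            self.state = "paused"

    def on_object_placed(self):
        # Placement at the pointed position completes the control:
        # resume recording.
        if self.state == "paused":
            self.state = "shooting"
```

A non-control gesture while recording leaves the state unchanged, so ordinary performance gestures are still captured.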
Thus, in this embodiment the co-shooter (i.e., the user) can control the position, orientation, distance, and so on of the virtual object through gestures, so that the user roughly knows and controls the position of the virtual object (such as a three-dimensional model in a volumetric video), can make actions and expressions that coordinate more naturally with the virtual object, and the co-shot video looks less abrupt and more natural. It also avoids, to a certain extent, the situation in which the shooter has to adjust the position of the virtual object manually in the shooting picture and tell the subject about it, leaving the subject unable to learn the position of the virtual object accurately and quickly.
To better implement the co-shooting control method of the embodiments of this application, an embodiment of this application further provides, on the basis of that method, a co-shooting control apparatus. FIG. 7 is a schematic structural diagram of an embodiment of the co-shooting control apparatus of this application. The co-shooting control apparatus 700 includes:
a display unit 701, configured to present a shooting picture containing a user character;
a recognition unit 702, configured to perform gesture recognition on the user character in the shooting picture to obtain the current gesture of the user character;
an acquisition unit 703, configured to acquire the pointed position of the current gesture if the current gesture matches a preset position control gesture; and
a control unit 704, configured to place a virtual object in the shooting picture according to the pointed position, so as to control the relative position of the virtual object and the user character in the shooting picture.
In some embodiments, the acquisition unit 703 is specifically configured to:
if the current gesture matches the preset position control gesture, acquire the ray formed, in the three-dimensional space of the shooting picture, by the finger direction corresponding to the current gesture; and
acquire the intersection of the ray with the supporting surface of the virtual object as the pointed position of the current gesture.
In some embodiments, before acquiring the intersection of the ray with the supporting surface of the virtual object as the pointed position of the current gesture, the acquisition unit 703 is specifically configured to:
add the standing surface of the user character in the shooting picture to a set of candidate planes for the shooting picture;
add, to the set of candidate planes, every plane in the shooting picture whose angle with the standing surface is smaller than a preset angle threshold; and
from the planes in the set of candidate planes, acquire the plane that intersects the ray and is closest to the origin of the ray, as the supporting surface of the virtual object.
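The ray-intersection steps above reduce to standard ray-plane geometry: intersect the finger ray with each candidate plane and keep the hit nearest the ray's origin as the supporting surface. The (point, normal) plane representation and helper functions below are assumptions of this sketch.

```python
def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Intersect the ray origin + t*direction (t >= 0) with a plane given
    by a point on it and its normal. Return (t, hit_point), or None if the
    ray is parallel to the plane or the hit lies behind the origin."""
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-9:
        return None  # ray parallel to plane
    t = sum((p - o) * n for p, o, n in zip(plane_point, origin, plane_normal)) / denom
    if t < 0:
        return None  # plane is behind the finger
    hit = tuple(o + t * d for o, d in zip(origin, direction))
    return t, hit

def pointed_position(origin, direction, candidate_planes):
    """Among candidate (point, normal) planes, return the hit point closest
    to the ray origin; that plane plays the role of the supporting surface."""
    hits = []
    for plane_point, plane_normal in candidate_planes:
        hit = ray_plane_intersection(origin, direction, plane_point, plane_normal)
        if hit is not None:
            hits.append(hit)
    if not hits:
        return None
    return min(hits, key=lambda h: h[0])[1]
```

With a single ground plane z = 0 and a finger ray angled downward from one meter up, the sketch yields the point where the ray meets the floor, which is where the virtual object would be placed.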
In some embodiments, the acquisition unit 703 is specifically configured to:
detect the gesture type of the current gesture of the user character; and
acquire the pointed position of the current gesture if the gesture type is a position control gesture and the current gesture matches a preset position control gesture.
In some embodiments, the control unit 704 is specifically configured to:
if the gesture type is an orientation control gesture, acquire the associated orientation of the current gesture; and
control, according to the associated orientation, the relative orientation of the virtual object and the user character in the shooting picture.
In some embodiments, the control unit 704 is specifically configured to:
update the orientation of the virtual object in response to a change in the orientation of the user character, so that the relative orientation of the user character and the virtual object remains the associated orientation.
In some embodiments, the control unit 704 is specifically configured to:
if the gesture type is a distance control gesture, acquire the associated distance of the current gesture; and
control, according to the associated distance, the relative distance between the virtual object and the user character in the shooting picture.
In some embodiments, the control unit 704 is specifically configured to:
record the shooting picture in response to a touch operation on a shooting control, obtaining a target co-shot video of the virtual object and the user character.
In some embodiments, the control unit 704 is specifically configured to:
record the shooting picture in response to a touch operation on the shooting control, obtaining a preliminary co-shot video of the virtual object and the user character;
perform control-gesture recognition on the frames of the preliminary co-shot video to obtain the target video frames that contain control gestures; and
filter the target video frames out of the preliminary co-shot video to obtain the target co-shot video.
In some embodiments, the control unit 704 is specifically configured to:
when the current gesture matches a preset control gesture, detect whether the camera of the shooting picture is in the shooting state;
if the camera is in the shooting state, switch the camera from the shooting state to a paused state; and
switch the camera from the paused state back to the shooting state once the virtual object in the shooting picture has been placed at the pointed position.
In some embodiments, the virtual object is a three-dimensional model in a volumetric video, and the control unit 704 is specifically configured to:
place the three-dimensional model from the volumetric video in the shooting picture according to the pointed position.
Thus, the co-shooting control apparatus 700 provided in this embodiment of the application brings the following technical effects: the user character can control the position of the virtual object through gestures and therefore knows the approximate position of the virtual object being co-shot, so that when co-shooting with the virtual object the user character can make actions and expressions that coordinate more naturally with it, making the co-shot picture less abrupt and more natural.
In specific implementation, the above units may be implemented as independent entities, or combined arbitrarily and implemented as one or several entities; for their specific implementation, refer to the foregoing method embodiments, which are not repeated here.
Correspondingly, an embodiment of this application further provides an electronic device. The electronic device may be a terminal such as a smartphone, a tablet computer, a notebook computer, a touch-screen device, a personal computer (PC), or a personal digital assistant (PDA). As shown in FIG. 8, which is a schematic structural diagram of the electronic device provided by an embodiment of this application, the electronic device 800 includes a processor 801 with one or more processing cores, a memory 802 with one or more computer-readable storage media, and a computer program stored in the memory 802 and runnable on the processor. The processor 801 is electrically connected to the memory 802. Those skilled in the art will understand that the structure shown in the figure does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The processor 801 is the control center of the electronic device 800. It connects the parts of the entire electronic device 800 through various interfaces and lines and, by running or loading software programs and/or modules stored in the memory 802 and calling data stored in the memory 802, performs the various functions of the electronic device 800 and processes data, thereby monitoring the device as a whole.
In this embodiment of the application, the processor 801 of the electronic device 800 loads the instructions corresponding to the processes of one or more application programs into the memory 802 according to the steps of any co-shooting control method provided by this application, and runs the application programs stored in the memory 802, thereby implementing the specific procedure of the co-shooting control method described above.
Optionally, as shown in FIG. 8, the electronic device 800 further includes a touch display screen 803, a radio-frequency circuit 804, an audio circuit 805, an input unit 806, and a power supply 807, each electrically connected to the processor 801. Those skilled in the art will understand that the structure shown in FIG. 8 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The touch display screen 803 can display a graphical user interface and receive the operation instructions a user produces by acting on that interface. The touch display screen 803 may include a display panel and a touch panel. The display panel can display information entered by or provided to the user, as well as the various graphical user interfaces of the electronic device, which may consist of graphics, text, icons, video, and any combination thereof.
The radio-frequency circuit 804 can send and receive radio-frequency signals so as to establish wireless communication with network devices or other electronic devices and exchange signals with them.
The audio circuit 805 can provide an audio interface between the user and the electronic device through a speaker and a microphone. The audio circuit 805 converts received audio data into an electrical signal and transmits it to the speaker, which converts it into a sound signal for output; conversely, the microphone converts collected sound signals into electrical signals, which the audio circuit 805 receives and converts into audio data. The audio data is output to the processor 801 for processing and then sent, for example, to another electronic device via the radio-frequency circuit 804, or output to the memory 802 for further processing. The audio circuit 805 may also include an earphone jack to provide communication between a peripheral headset and the electronic device.
The input unit 806 can receive entered digits, character information, or user characteristic information (such as fingerprint, iris, or facial information), and produce keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The power supply 807 supplies power to the components of the electronic device 800. Optionally, the power supply 807 may be logically connected to the processor 801 through a power-management system, which then implements functions such as managing charging, discharging, and power consumption. The power supply 807 may also include any component such as one or more DC or AC power sources, a recharging system, a power-failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown in FIG. 8, the electronic device 800 may further include a camera, sensors, a Wi-Fi module, a Bluetooth module, and the like, which are not described here.
Those of ordinary skill in the art will understand that all or part of the steps in the various methods of the above embodiments can be completed by instructions, or by instructions controlling the relevant hardware; the instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of this application provides a computer-readable storage medium storing a plurality of computer programs that can be loaded by a processor to execute any co-shooting control method provided in the embodiments of this application.
The computer-readable storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
In the above embodiments of the co-shooting control apparatus, computer-readable storage medium, and electronic device, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes and attainable benefits of the co-shooting control apparatus, computer-readable storage medium, electronic device, and their corresponding units described above can be found in the description of the co-shooting control method in the embodiments above and are not repeated here.
The co-shooting control method, apparatus, electronic device, and computer-readable storage medium provided by the embodiments of this application have been introduced in detail above. Specific examples are used in this document to explain the principles and implementations of this application; the description of the above embodiments is intended only to help understand the method of this application and its core idea. Meanwhile, those skilled in the art will, following the idea of this application, make changes to the specific implementations and scope of application. In summary, the content of this specification should not be understood as limiting this application.
Claims (14)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310080244.4A CN116129526A (en) | 2023-02-01 | 2023-02-01 | Method and device for controlling photographing, electronic equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116129526A true CN116129526A (en) | 2023-05-16 |
Family
ID=86304376
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310080244.4A Pending CN116129526A (en) | 2023-02-01 | 2023-02-01 | Method and device for controlling photographing, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116129526A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117523679A (en) * | 2024-01-08 | 2024-02-06 | 成都运达科技股份有限公司 | Driver gesture recognition method, system and storage medium |
| CN118450268A (en) * | 2023-10-12 | 2024-08-06 | 荣耀终端有限公司 | Shooting method, electronic device and readable storage medium |
Citations (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101434533B1 (en) * | 2013-06-25 | 2014-08-27 | 엔그램테크놀로지(주) | System for filming camera using appreciate gesture of finger and method therefor |
| US20150227210A1 (en) * | 2014-02-07 | 2015-08-13 | Leap Motion, Inc. | Systems and methods of determining interaction intent in three-dimensional (3d) sensory space |
| US20170064214A1 (en) * | 2015-09-01 | 2017-03-02 | Samsung Electronics Co., Ltd. | Image capturing apparatus and operating method thereof |
| JP2018042237A (en) * | 2016-08-31 | 2018-03-15 | キヤノン株式会社 | Image processor, image processing method, and program |
| CN108762505A (en) * | 2018-05-29 | 2018-11-06 | 腾讯科技(深圳)有限公司 | Virtual object control method, device, storage medium based on gesture and equipment |
| CN110442238A (en) * | 2019-07-31 | 2019-11-12 | 腾讯科技(深圳)有限公司 | A kind of method and device of determining dynamic effect |
| CN111077987A (en) * | 2018-10-21 | 2020-04-28 | 未来市股份有限公司 | Method and related device for generating interactive virtual user interface based on gesture recognition |
| CN113413594A (en) * | 2021-06-24 | 2021-09-21 | 网易(杭州)网络有限公司 | Virtual photographing method and device for virtual character, storage medium and computer equipment |
| CN114095662A (en) * | 2022-01-20 | 2022-02-25 | 荣耀终端有限公司 | Shooting guide method and electronic equipment |
| CN115514887A (en) * | 2022-09-07 | 2022-12-23 | 影石创新科技股份有限公司 | Control method and device for video acquisition, computer equipment and storage medium |
- 2023-02-01 CN CN202310080244.4A patent/CN116129526A/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| 葛江;: "虚拟摄像机的设计与实现", 现代电影技术, no. 03, 11 March 2011 (2011-03-11) * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN108537845B (en) | Pose determination method, pose determination device and storage medium | |
| CN110276840B (en) | Control method, device, equipment and storage medium for multiple virtual characters | |
| CN109947886B (en) | Image processing method, image processing device, electronic equipment and storage medium | |
| CN110012209B (en) | Panoramic image generation method, device, storage medium and electronic device | |
| WO2020010979A1 (en) | Method and apparatus for training model for recognizing key points of hand, and method and apparatus for recognizing key points of hand | |
| CN108520552A (en) | Image processing method, image processing device, storage medium and electronic equipment | |
| JP6389022B1 (en) | Reaction-type image generation method and generation program | |
| CN108200334B (en) | Image capturing method, device, storage medium and electronic device | |
| WO2019007258A1 (en) | Method, apparatus and device for determining camera posture information, and storage medium | |
| CN111541907B (en) | Article display method, apparatus, device and storage medium | |
| CN113426117B (en) | Shooting parameter acquisition method and device for virtual camera, electronic equipment and storage medium | |
| CN112598780A (en) | Instance object model construction method and device, readable medium and electronic equipment | |
| CN108495032A (en) | Image processing method, device, storage medium and electronic equipment | |
| CN112069863B (en) | Face feature validity determination method and electronic equipment | |
| CN108776822B (en) | Target area detection method, device, terminal and storage medium | |
| US20250168467A1 (en) | Video processing method, device, and computer-readable storage medium | |
| CN116129526A (en) | Method and device for controlling photographing, electronic equipment and storage medium | |
| CN108564613A (en) | A kind of depth data acquisition methods and mobile terminal | |
| CN109769091B (en) | An image capturing method and mobile terminal | |
| JP7293362B2 (en) | Imaging method, device, electronic equipment and storage medium | |
| CN109040588A (en) | Face image photographing method and device, storage medium and terminal | |
| CN111432155A (en) | Video call method, electronic device and computer-readable storage medium | |
| CN115578542A (en) | Three-dimensional model processing method, device, equipment and computer readable storage medium | |
| CN116095353A (en) | Live broadcast method and device based on volume video, electronic equipment and storage medium | |
| WO2023160072A1 (en) | Human-computer interaction method and apparatus in augmented reality (ar) scene, and electronic device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
