
CN110276831A - Method and device for constructing three-dimensional model, equipment and computer-readable storage medium - Google Patents

Info

Publication number
CN110276831A
Authority
CN
China
Prior art keywords
visible light
subject
map
target subject
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910573384.9A
Other languages
Chinese (zh)
Other versions
CN110276831B (en)
Inventor
康健 (Kang Jian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910573384.9A
Publication of CN110276831A
Application granted
Publication of CN110276831B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to a method and a device for constructing a three-dimensional model, a terminal device, and a computer-readable storage medium. The method comprises: acquiring a visible light map and generating a center weight map corresponding to the visible light map, wherein the weight values represented by the center weight map decrease gradually from the center to the edges; inputting the visible light map and the center weight map into a subject detection model to obtain a subject area confidence map, wherein the subject detection model is trained in advance on visible light maps, center weight maps, and corresponding annotated subject mask maps of the same scene; determining the target subject in the visible light map according to the subject area confidence map; acquiring depth information corresponding to the target subject; and performing three-dimensional reconstruction of the target subject according to the target subject and its corresponding depth information, returning to the step of acquiring a visible light map to obtain visible light maps at different acquisition angles, until a three-dimensional model corresponding to the target subject is obtained, improving the accuracy of three-dimensional model construction.

Description

Method and device for constructing a three-dimensional model, equipment, and computer-readable storage medium

Technical Field

The present application relates to the field of computer technology, and in particular to a method and apparatus for constructing a three-dimensional model, a terminal device, and a computer-readable storage medium.

Background

With the development of imaging technology, people are increasingly accustomed to capturing images or videos with image acquisition devices such as the cameras on electronic devices to record all kinds of information. Three-dimensional image processing has attracted widespread attention because of its stronger sense of realism.

Traditional three-dimensional reconstruction is often affected by surrounding people or objects, resulting in low accuracy of the constructed three-dimensional model.

Summary

Embodiments of the present application provide a method, an apparatus, a terminal device, and a computer-readable storage medium for constructing a three-dimensional model. An accurate subject area confidence map is obtained from a center weight map and a subject detection model, so that the target subject in an image can be identified precisely. During three-dimensional model construction, the depth information of the target subject is used to construct the three-dimensional model corresponding to the target subject accurately, improving the accuracy of three-dimensional model construction.

A method for constructing a three-dimensional model, the method comprising:

acquiring a visible light map, and generating a center weight map corresponding to the visible light map, wherein the weight values represented by the center weight map decrease gradually from the center to the edges;

inputting the visible light map and the center weight map into a subject detection model to obtain a subject area confidence map, wherein the subject detection model is a model trained in advance on visible light maps, center weight maps, and corresponding annotated subject mask maps of the same scene;

determining the target subject in the visible light map according to the subject area confidence map;

acquiring depth information corresponding to the target subject; and

performing three-dimensional reconstruction of the target subject according to the target subject and its corresponding depth information, and returning to the step of acquiring a visible light map to obtain visible light maps at different acquisition angles, until a three-dimensional model corresponding to the target subject is obtained.

A device for constructing a three-dimensional model, the device comprising:

a processing module, configured to acquire a visible light map and generate a center weight map corresponding to the visible light map, wherein the weight values represented by the center weight map decrease gradually from the center to the edges;

a detection module, configured to input the visible light map and the center weight map into a subject detection model to obtain a subject area confidence map, wherein the subject detection model is a model trained in advance on visible light maps, center weight maps, and corresponding annotated subject mask maps of the same scene;

a target subject determination module, configured to determine the target subject in the visible light map according to the subject area confidence map; and

a three-dimensional model construction module, configured to acquire depth information corresponding to the target subject, perform three-dimensional reconstruction of the target subject according to the target subject and its corresponding depth information, and return to the step of acquiring a visible light map to obtain visible light maps at different acquisition angles, until a three-dimensional model corresponding to the target subject is obtained.

A terminal device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:

acquiring a visible light map, and generating a center weight map corresponding to the visible light map, wherein the weight values represented by the center weight map decrease gradually from the center to the edges;

inputting the visible light map and the center weight map into a subject detection model to obtain a subject area confidence map, wherein the subject detection model is a model trained in advance on visible light maps, center weight maps, and corresponding annotated subject mask maps of the same scene;

determining the target subject in the visible light map according to the subject area confidence map;

acquiring depth information corresponding to the target subject; and

performing three-dimensional reconstruction of the target subject according to the target subject and its corresponding depth information, and returning to the step of acquiring a visible light map to obtain visible light maps at different acquisition angles, until a three-dimensional model corresponding to the target subject is obtained.

A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the following steps:

acquiring a visible light map, and generating a center weight map corresponding to the visible light map, wherein the weight values represented by the center weight map decrease gradually from the center to the edges;

inputting the visible light map and the center weight map into a subject detection model to obtain a subject area confidence map, wherein the subject detection model is a model trained in advance on visible light maps, center weight maps, and corresponding annotated subject mask maps of the same scene;

determining the target subject in the visible light map according to the subject area confidence map;

acquiring depth information corresponding to the target subject; and

performing three-dimensional reconstruction of the target subject according to the target subject and its corresponding depth information, and returning to the step of acquiring a visible light map to obtain visible light maps at different acquisition angles, until a three-dimensional model corresponding to the target subject is obtained.

In the above method, device, terminal device, and computer-readable storage medium for constructing a three-dimensional model, a visible light map is acquired and a corresponding center weight map is generated, wherein the weight values represented by the center weight map decrease gradually from the center to the edges; the visible light map and the center weight map are input into a subject detection model to obtain a subject area confidence map, wherein the subject detection model is a model trained in advance on visible light maps, center weight maps, and corresponding annotated subject mask maps of the same scene; the target subject in the visible light map is determined according to the subject area confidence map; depth information corresponding to the target subject is acquired; and the target subject is reconstructed in three dimensions according to the target subject and its corresponding depth information, returning to the step of acquiring a visible light map to obtain visible light maps at different acquisition angles, until the three-dimensional model corresponding to the target subject is obtained. The center weight map makes objects in the center of the image easier to detect, and the subject detection model trained on visible light maps, center weight maps, subject mask maps, and so on can identify the target subject in the visible light map more accurately. During three-dimensional model construction, the depth information of the target subject enables accurate construction of the corresponding three-dimensional model, and the target subject can be identified precisely even in the presence of interfering objects, thereby improving the accuracy of the three-dimensional model constructed for the target subject.

Brief Description of the Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a block diagram of the internal structure of a terminal device in one embodiment;

FIG. 2 is a flowchart of a method for constructing a three-dimensional model in one embodiment;

FIG. 3 is a schematic diagram of a three-dimensional model corresponding to a target subject in one embodiment;

FIG. 4 is a flowchart of determining the target subject in the visible light map according to the subject area confidence map in one embodiment;

FIG. 5 is a schematic diagram of the network structure of a subject detection model in one embodiment;

FIG. 6 is a schematic diagram of a subject detection result in one embodiment;

FIG. 7 is a structural block diagram of a device for constructing a three-dimensional model in one embodiment;

FIG. 8 is an internal structure diagram of a terminal device in one embodiment.

Detailed Description

In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.

The method for constructing a three-dimensional model in the embodiments of the present application can be applied to a terminal device. The terminal device may be a computer device with a camera, a personal digital assistant, a tablet computer, a smartphone, a wearable device, or the like. When the camera in the terminal device captures an image, it focuses automatically to ensure that the captured image is sharp.

In one embodiment, the terminal device may include an image processing circuit, which may be implemented by hardware and/or software components and may include various processing units that define an ISP (Image Signal Processing) pipeline. FIG. 1 is a schematic diagram of an image processing circuit in one embodiment. As shown in FIG. 1, for ease of description, only the aspects of the image processing technology related to the embodiments of the present application are shown.

As shown in FIG. 1, the image processing circuit includes a first ISP processor 130, a second ISP processor 140, and control logic 150. The first camera 110 includes one or more first lenses 112 and a first image sensor 114. The first image sensor 114 may include a color filter array (such as a Bayer filter); it can acquire the light intensity and wavelength information captured by each of its imaging pixels and provide a set of image data that can be processed by the first ISP processor 130. The second camera 120 includes one or more second lenses 122 and a second image sensor 124. The second image sensor 124 may include a color filter array (such as a Bayer filter); it can acquire the light intensity and wavelength information captured by each of its imaging pixels and provide a set of image data that can be processed by the second ISP processor 140.

The first image captured by the first camera 110 is transmitted to the first ISP processor 130 for processing. After processing the first image, the first ISP processor 130 may send statistical data of the first image (such as image brightness, image contrast, image color, and the like) to the control logic 150. The control logic 150 may determine control parameters of the first camera 110 according to the statistical data, so that the first camera 110 can perform operations such as auto-focus and auto-exposure according to those parameters. After being processed by the first ISP processor 130, the first image may be stored in the image memory 160, and the first ISP processor 130 may also read images stored in the image memory 160 for processing. In addition, after being processed by the ISP processor 130, the first image may be sent directly to the display 170 for display, and the display 170 may also read images from the image memory 160 for display.

The first ISP processor 130 processes the image data pixel by pixel in multiple formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the first ISP processor 130 may perform one or more image processing operations on the image data and collect statistical information about the image data. The image processing operations may be performed with the same or different bit-depth precision.

The image memory 160 may be part of a memory device, a storage device, or a separate dedicated memory within the terminal device, and may include DMA (Direct Memory Access) features.

Upon receiving data from the interface of the first image sensor 114, the first ISP processor 130 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to the image memory 160 for additional processing before being displayed. The first ISP processor 130 receives the processed data from the image memory 160 and performs image data processing on it in the RGB and YCbCr color spaces. The image data processed by the first ISP processor 130 may be output to the display 170 for viewing by the user and/or for further processing by a graphics engine or GPU (Graphics Processing Unit). In addition, the output of the first ISP processor 130 may also be sent to the image memory 160, and the display 170 may read image data from the image memory 160. In one embodiment, the image memory 160 may be configured to implement one or more frame buffers.

The statistical data determined by the first ISP processor 130 may be sent to the control logic 150. For example, the statistical data may include first image sensor 114 statistics such as auto-exposure, auto white balance, auto-focus, flicker detection, black level compensation, and first lens 112 shading correction. The control logic 150 may include a processor and/or microcontroller executing one or more routines (such as firmware), which may determine the control parameters of the first camera 110 and the control parameters of the first ISP processor 130 according to the received statistical data. For example, the control parameters of the first camera 110 may include gain, integration time for exposure control, anti-shake parameters, flash control parameters, first lens 112 control parameters (such as focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (for example, during RGB processing), as well as first lens 112 shading correction parameters.

Similarly, the second image captured by the second camera 120 is transmitted to the second ISP processor 140 for processing. After processing the second image, the second ISP processor 140 may send statistical data of the second image (such as image brightness, image contrast, image color, and the like) to the control logic 150, which may determine control parameters of the second camera 120 according to the statistical data, so that the second camera 120 can perform operations such as auto-focus and auto-exposure. After being processed by the second ISP processor 140, the second image may be stored in the image memory 160, and the second ISP processor 140 may also read images stored in the image memory 160 for processing. In addition, after being processed by the ISP processor 140, the second image may be sent directly to the display 170 for display, and the display 170 may also read images from the image memory 160 for display. The second camera 120 and the second ISP processor 140 may also implement the processing described for the first camera 110 and the first ISP processor 130.

In one embodiment, the first camera 110 may be a color camera, and the second camera 120 may be a TOF (Time of Flight) camera or a structured light camera. A TOF camera can acquire a TOF depth map, and a structured light camera can acquire a structured light depth map. Alternatively, the first camera 110 and the second camera 120 may both be color cameras, in which case a binocular depth map is acquired by the two color cameras. The first ISP processor 130 and the second ISP processor 140 may be the same ISP processor.

The first camera 110 and the second camera 120 capture the same scene to obtain a visible light map and a depth map respectively, and send them to the ISP processor. The ISP processor may register the visible light map and the depth map according to the camera calibration parameters so that their fields of view are exactly aligned, and then generate a center weight map corresponding to the visible light map, wherein the weight values represented by the center weight map decrease gradually from the center to the edges. The visible light map and the center weight map are input into the trained subject detection model to obtain a subject area confidence map, and the target subject in the visible light map is then determined according to the subject area confidence map. Alternatively, the visible light map, the depth map, and the center weight map may be input into the trained subject detection model to obtain the subject area confidence map, from which the target subject in the visible light map is determined. The center weight map makes objects located in the center of the image easier to detect, and the depth map makes objects closer to the camera easier to detect, improving the accuracy of subject detection. During three-dimensional model construction, the depth information of the target subject enables accurate construction of the corresponding three-dimensional model; the target subject can be identified precisely even in the presence of interfering objects, thereby improving the accuracy of the three-dimensional model constructed for the target subject.

FIG. 2 is a flowchart of a method for constructing a three-dimensional model in one embodiment. As shown in FIG. 2, the method, which can be applied to the terminal device in FIG. 1, includes:

Step 202: acquire a visible light map.

Salient object detection refers to automatically processing the region of interest in a scene while selectively ignoring the regions that are not of interest; the region of interest is called the subject region. A visible light map refers to an RGB (Red, Green, Blue) image. A color image, i.e. an RGB image, can be obtained by shooting any scene with a color camera. The visible light map may be stored locally on the terminal device, stored on another device, stored on the network, or captured in real time by the terminal device, without limitation.

Specifically, the ISP processor or central processing unit of the terminal device may obtain the visible light map locally, from another device, or from the network, or obtain it by photographing a scene with a camera.

Step 204: generate a center weight map corresponding to the visible light map, wherein the weight values represented by the center weight map decrease gradually from the center to the edges.

The center weight map is a map that records the weight value of each pixel in the visible light map. The weight values recorded in the center weight map decrease gradually from the center towards the four sides, i.e. the center weight is the largest and the weights decrease gradually towards the four sides. The center weight map thus represents weight values that decrease gradually from the center pixels of the visible light map to its edge pixels.

The ISP processor or central processing unit may generate the corresponding center weight map according to the size of the visible light map. The weight values represented by the center weight map decrease gradually from the center towards the four sides. The center weight map may be generated using a Gaussian function, a first-order equation, or a second-order equation; the Gaussian function may be a two-dimensional Gaussian function.
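
For illustration, the following is a minimal sketch of one way to generate such a center weight map with a two-dimensional Gaussian function; the image size and the sigma scale are illustrative assumptions rather than values from the embodiments.

```python
import numpy as np

def center_weight_map(height, width, sigma_scale=0.5):
    """Generate a center weight map whose values decrease gradually
    from the image center to the edges, using a 2D Gaussian."""
    ys = np.arange(height) - (height - 1) / 2.0
    xs = np.arange(width) - (width - 1) / 2.0
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    w = np.exp(-(xx**2 / (2 * (sigma_scale * width) ** 2)
                 + yy**2 / (2 * (sigma_scale * height) ** 2)))
    return w / w.max()  # largest weight (1.0) at the center

weights = center_weight_map(480, 640)
```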

Step 206: input the visible light map and the center weight map into a subject detection model to obtain a subject area confidence map, wherein the subject detection model is a model trained in advance on visible light maps, center weight maps, and corresponding annotated subject mask maps of the same scene.

The subject detection model is obtained by collecting a large amount of training data in advance and inputting it into a subject detection model containing initial network weights for training. Each set of training data includes a visible light map, a center weight map, and an annotated subject mask map corresponding to the same scene. The visible light map and the center weight map serve as the inputs of the subject detection model being trained, and the annotated subject mask map serves as the ground truth that the trained model is expected to output. The subject mask map is an image filter template used to identify the subject in an image; it can block out the other parts of the image and filter out the subject. The subject detection model can be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, and backgrounds.

Specifically, the ISP processor or central processing unit may input the visible light map and the center weight map into the subject detection model, and detection yields the subject area confidence map. The subject area confidence map records the probability that the subject belongs to each recognizable category; for example, a certain pixel may have a probability of 0.8 of belonging to a person, 0.1 of belonging to a flower, and 0.1 of belonging to the background.

Step 208: determine the target subject in the visible light map according to the subject area confidence map.

A subject refers to any of various objects, such as a person, flower, cat, dog, cow, blue sky, white clouds, or background. The target subject is the subject that is needed and can be selected as required.

Specifically, the ISP processor or central processing unit may select, according to the subject area confidence map, the subject with the highest (or second-highest, and so on) confidence as the subject in the visible light map. If there is one subject, that subject is taken as the target subject; if there are multiple subjects, one or more of them may be selected as the target subject according to configuration information or depth information. In one embodiment, the distance between each subject and the capturing terminal is determined according to the depth information corresponding to each subject, and the subject with the smallest distance is taken as the target subject.
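
As an illustration of this selection rule, the sketch below assumes the detected subjects have already been gathered into a list of records with hypothetical "confidence" and "mean_depth" fields; when several subjects exist, it prefers the candidate closest to the camera, as in the embodiment above.

```python
def pick_target_subject(subjects):
    """Pick the target subject from detected candidates.
    `subjects` is a list of dicts with hypothetical keys
    "label", "confidence", and "mean_depth" (distance to camera)."""
    if not subjects:
        return None
    if len(subjects) == 1:
        return subjects[0]  # a single subject is the target subject
    # Multiple subjects: take the one with the smallest distance.
    return min(subjects, key=lambda s: s["mean_depth"])

target = pick_target_subject([
    {"label": "person", "confidence": 0.8, "mean_depth": 1.2},
    {"label": "dog", "confidence": 0.7, "mean_depth": 2.5},
])  # -> the "person" record
```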

Step 210: acquire depth information corresponding to the target subject, perform three-dimensional reconstruction of the target subject according to the target subject and its corresponding depth information, and return to step 202 to obtain visible light maps at different acquisition angles, until a three-dimensional model corresponding to the target subject is obtained.

Since the target subject has a precise contour, only the depth information corresponding to each pixel within the contour of the target subject needs to be acquired. The depth information corresponding to the target subject can be obtained from a depth map captured by a camera, and the manner of acquiring the depth map is not limited.

Specifically, three-dimensional reconstruction is performed on the target subject according to the target subject and its corresponding depth information; the manner of performing the reconstruction is not limited. The depth information represents the distance from each pixel of the target subject to the capturing device. According to the depth information, the Z-axis coordinate of each pixel of the target subject in three-dimensional space can be determined, so that the target subject can be reconstructed in three dimensions. During reconstruction, pixels with the same depth information lie in the same plane. A reference pixel can be selected and its depth value taken as the reference depth value, and the depth values of the other pixels are compared with the reference depth value to determine their positions in three-dimensional space relative to the reference pixel. For example, if a pixel on the target subject has a depth value of 10 and an adjacent pixel has a depth value of 12, then, taking the point with depth value 10 as the reference pixel, the adjacent pixel is recessed by 12 - 10 = 2. Three-dimensional reconstruction of the target subject thus assigns relative relief information to the surface of the target subject. Since the current visible light map captures only part of the target subject, visible light maps from other acquisition angles need to be obtained. If the visible light maps are captured in real time, the acquisition angle can be changed to capture each part of the target subject and obtain the corresponding depth information, so that the complete target subject can be reconstructed. If previously captured visible light maps are used, visible light maps from other shooting angles can be obtained directly, and the complete target subject is reconstructed until the three-dimensional model corresponding to the target subject is obtained. FIG. 3 is a schematic diagram of the constructed three-dimensional model corresponding to the target subject in one embodiment.
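
The sketch below mirrors the arithmetic in this paragraph (an adjacent depth of 12 against a reference depth of 10 gives a recess of 2); the function name and the NaN fill for non-subject pixels are illustrative choices, not part of the embodiments.

```python
import numpy as np

def relative_relief(depth, mask, ref_xy):
    """Compute per-pixel relief relative to a chosen reference pixel:
    positive values are recessed relative to the reference. `depth` is
    a depth map, `mask` marks the target subject, `ref_xy` is a
    (row, col) reference pixel inside the subject."""
    ref_depth = depth[ref_xy]
    return np.where(mask, depth - ref_depth, np.nan)

depth = np.array([[10.0, 12.0], [10.0, 11.0]])
mask = np.ones_like(depth, dtype=bool)
print(relative_relief(depth, mask, (0, 0)))  # adjacent pixel -> 2.0
```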

With the method for constructing a three-dimensional model in this embodiment, a visible light map is acquired and a corresponding center weight map is generated, wherein the weight values represented by the center weight map decrease gradually from the center to the edges; the visible light map and the center weight map are input into a subject detection model to obtain a subject area confidence map, wherein the subject detection model is a model trained in advance on visible light maps, center weight maps, and corresponding annotated subject mask maps of the same scene; the target subject in the visible light map is determined according to the subject area confidence map; depth information corresponding to the target subject is acquired; the target subject is reconstructed in three dimensions according to the target subject and its corresponding depth information; and the process returns to the step of acquiring a visible light map to obtain visible light maps at different acquisition angles, until the three-dimensional model corresponding to the target subject is obtained. The center weight map makes objects in the center of the image easier to detect, and the subject detection model trained on visible light maps, center weight maps, subject mask maps, and so on can identify the target subject in the visible light map more accurately. During three-dimensional model construction, the depth information of the target subject enables accurate construction of the corresponding three-dimensional model; the target subject can be identified precisely even in the presence of interfering objects, improving the accuracy of the three-dimensional model constructed for the target subject.

In one embodiment, as shown in FIG. 4, step 208 includes:

Step 208A: process the subject area confidence map to obtain a subject mask map.

Specifically, there are some scattered points with low confidence in the subject area confidence map. The ISP processor or central processing unit may filter the subject area confidence map to obtain the subject mask map. The filtering may use a configured confidence threshold to remove the pixels in the subject area confidence map whose confidence values are below the threshold. The confidence threshold may be an adaptive confidence threshold, a fixed threshold, or thresholds configured per region.

Step 208B: detect the visible light map and determine the highlight area in the visible light map.

The highlight area refers to an area whose brightness values are greater than a brightness threshold.

Specifically, the ISP processor or central processing unit performs highlight detection on the visible light map, filters out the target pixels whose brightness values are greater than the brightness threshold, and applies connected component processing to the target pixels to obtain the highlight area.
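
A minimal sketch of this highlight detection step with OpenCV, assuming an 8-bit image and an illustrative brightness threshold of 220:

```python
import cv2
import numpy as np

def detect_highlights(rgb, brightness_threshold=220):
    """Threshold the brightness channel, then group the bright target
    pixels into connected components to form the highlight area."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    _, bright = cv2.threshold(gray, brightness_threshold, 255,
                              cv2.THRESH_BINARY)
    num_labels, labels = cv2.connectedComponents(bright)
    return bright, num_labels - 1  # highlight mask, component count
```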

Step 208C: determine the target subject with highlights removed in the visible light map according to the highlight area in the visible light map and the subject mask map.

Specifically, the ISP processor or central processing unit may perform a difference calculation or a logical AND calculation between the highlight area in the visible light map and the subject mask map to obtain the target subject with highlights removed in the visible light map.

In this embodiment, filtering the subject area confidence map yields the subject mask map, which improves the reliability of the subject area confidence map. The visible light map is detected to obtain the highlight area, which is then processed together with the subject mask map to obtain the target subject with highlights removed. The highlight and high-brightness areas that affect the precision of subject recognition are handled separately with filters, improving the precision and accuracy of subject recognition.

In one embodiment, step 208A includes: performing adaptive confidence threshold filtering on the subject area confidence map to obtain the subject mask map.

The adaptive confidence threshold is a confidence threshold, and may be a locally adaptive confidence threshold. A locally adaptive confidence threshold determines the binarization confidence threshold at a pixel's position according to the distribution of pixel values in the pixel's neighborhood block. A higher binarization confidence threshold is configured for image areas with higher brightness, and a lower binarization confidence threshold is configured for image areas with lower brightness.

In one embodiment, the configuration process of the adaptive confidence threshold includes: when the brightness value of a pixel is greater than a first brightness value, configuring a first confidence threshold; when the brightness value of the pixel is less than a second brightness value, configuring a second confidence threshold; and when the brightness value of the pixel is greater than the second brightness value and less than the first brightness value, configuring a third confidence threshold, wherein the second brightness value is less than or equal to the first brightness value, the second confidence threshold is less than the third confidence threshold, and the third confidence threshold is less than the first confidence threshold.

In one embodiment, the configuration process of the adaptive confidence threshold includes: when the brightness value of a pixel is greater than a first brightness value, configuring a first confidence threshold; and when the brightness value of the pixel is less than or equal to the first brightness value, configuring a second confidence threshold, wherein the second confidence threshold is less than the first confidence threshold.

When adaptive confidence threshold filtering is performed on the subject area confidence map, the confidence value of each pixel in the subject area confidence map is compared with its corresponding confidence threshold; the pixel is retained if its confidence value is greater than or equal to the threshold, and removed if it is below the threshold.
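
A sketch of the three-tier configuration and the filtering step described above; the brightness values (b1, b2) and confidence thresholds (t1, t2, t3) are illustrative assumptions that respect the stated ordering t2 < t3 < t1:

```python
import numpy as np

def adaptive_confidence_filter(confidence, brightness,
                               b1=180, b2=60,
                               t1=0.7, t2=0.3, t3=0.5):
    """Per-pixel adaptive confidence threshold filtering: brighter
    areas get a higher threshold, darker areas a lower one; pixels
    whose confidence is below their threshold are removed."""
    thresh = np.full(confidence.shape, t3)  # mid-brightness areas
    thresh[brightness > b1] = t1            # bright areas
    thresh[brightness < b2] = t2            # dark areas
    return (confidence >= thresh).astype(np.uint8)  # 1 kept, 0 removed
```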

In one embodiment, performing adaptive confidence threshold filtering on the subject area confidence map to obtain the subject mask map includes:

performing adaptive confidence threshold filtering on the subject area confidence map to obtain a binarized mask map; and performing morphological processing and guided filtering on the binarized mask map to obtain the subject mask map.

Specifically, after filtering the subject area confidence map with the adaptive confidence threshold, the ISP processor or central processing unit represents the confidence values of the retained pixels by 1 and those of the removed pixels by 0, obtaining the binarized mask map.

Morphological processing may include erosion and dilation. An erosion operation may first be performed on the binarized mask map, followed by a dilation operation, to remove noise; guided filtering is then performed on the morphologically processed binarized mask map to carry out edge filtering, yielding a subject mask map with extracted edges.
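
A sketch of this refinement with OpenCV, assuming the contrib module (cv2.ximgproc) is available for guided filtering; the kernel size, radius, and eps are illustrative assumptions:

```python
import cv2
import numpy as np

def refine_mask(binary_mask, guide_rgb):
    """Erode then dilate the binarized mask to remove noise, then run
    guided filtering (edge filtering) with the visible light map as
    the guide to soften the mask edges."""
    kernel = np.ones((5, 5), np.uint8)
    cleaned = cv2.dilate(cv2.erode(binary_mask, kernel), kernel)
    guide = cv2.cvtColor(guide_rgb, cv2.COLOR_RGB2GRAY)
    # Requires opencv-contrib-python for cv2.ximgproc.
    return cv2.ximgproc.guidedFilter(guide, cleaned.astype(np.float32),
                                     radius=8, eps=1e-2)
```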

Morphological processing and guided filtering ensure that the resulting subject mask map has little or no noise and has softer edges.

In one embodiment, determining the target subject with highlights removed in the visible light map according to the highlight area in the visible light map and the subject mask map includes: performing differential processing on the highlight area in the visible light map and the subject mask map to obtain the target subject with highlights removed.

Specifically, the ISP processor or central processing unit performs differential processing on the highlight area in the visible light map and the subject mask map, i.e. the corresponding pixel values of the visible light map and the subject mask map are subtracted, to obtain the target subject in the visible light map. The target subject with highlights removed is thus obtained through differential processing, and the calculation is simple.
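
A minimal sketch of the differential processing, treating it at the level of binary masks (an assumption made for clarity; the text describes subtracting corresponding pixel values):

```python
import numpy as np

def remove_highlights(subject_mask, highlight_mask):
    """Subtract the highlight area from the subject mask so that
    highlight pixels are excluded from the target subject. Both
    inputs are binary (0/1) arrays of the same shape."""
    diff = subject_mask.astype(np.int16) - highlight_mask.astype(np.int16)
    return (diff > 0).astype(np.uint8)  # target subject, highlights removed
```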

In one embodiment, the subject detection model includes an input layer, intermediate layers, and an output layer connected in sequence. Inputting the visible light map and the center weight map into the subject detection model includes: applying the visible light map to the input layer of the subject detection model, and applying the center weight map to the output layer of the subject detection model.

The subject detection model may be a deep learning network model, which may include an input layer, intermediate layers, and an output layer connected in sequence. The intermediate layers may be a network structure of one layer or at least two layers. The visible light map is input at the input layer of the subject detection model, i.e. it acts on the input layer; the center weight map is input at the output layer of the subject detection model, i.e. it acts on the output layer. Applying the center weight map to the output layer reduces the influence of the model's other layers on the weight map, making objects in the center of the picture easier to detect as the subject.
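
The text does not specify the exact operation by which the center weight map acts on the output layer; one plausible reading, shown below purely as an assumption, is an elementwise multiplication of the output confidence with the center weights:

```python
import torch

def apply_center_weight(logits, center_weight):
    """Weight the model's output confidence by the center weight map
    (elementwise multiply; the actual operation in the embodiments
    is not specified). logits: (N, 1, H, W); center_weight: (H, W)."""
    return logits * center_weight.unsqueeze(0).unsqueeze(0)
```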

In one embodiment, acquiring the depth information corresponding to the target subject in step 210 includes: acquiring a depth map corresponding to the visible light map, the depth map including at least one of a TOF depth map, a binocular depth map, and a structured light depth map; registering the visible light map and the depth map to obtain a registered visible light map and depth map; and determining the depth information corresponding to the target subject from the registered depth map according to the area where the target subject is located in the visible light map.

The depth map is a map containing depth information. A corresponding depth map is obtained by capturing the same scene with a depth camera or a binocular camera. The depth camera may be a structured light camera or a TOF camera, and the depth map may be at least one of a structured light depth map, a TOF depth map, and a binocular depth map.

Specifically, the ISP processor or central processing unit may capture the same scene with the cameras to obtain the visible light map and the corresponding depth map, and then register the visible light map and the depth map using the camera calibration parameters to obtain the registered visible light map and depth map. In the registered maps, every pixel in the visible light map has a matching pixel in the depth map, so the pixels of the depth map corresponding to the area where the target subject is located can be found through this matching relationship, and the depth values corresponding to the target subject are obtained from the values of those pixels.
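
A sketch of reading the subject's depth values from the registered depth map through the subject mask; the mean is included only as one convenient summary value:

```python
import numpy as np

def subject_depth(registered_depth, subject_mask):
    """After registration every visible light pixel matches one depth
    pixel, so the subject's depth values can be read directly through
    the subject mask."""
    values = registered_depth[subject_mask.astype(bool)]
    return values, float(values.mean())

depth_map = np.array([[1.2, 1.3], [2.0, 2.1]])
mask = np.array([[1, 1], [0, 0]], dtype=np.uint8)
print(subject_depth(depth_map, mask))  # (array([1.2, 1.3]), 1.25)
```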

在一个实施例中,当无法拍摄得到深度图,可自动生成仿真深度图。仿真深度图中的各个像素点的深度值可为预设值。此外,仿真深度图中的各个像素点的深度值可对应不同的预设值。In one embodiment, when the depth map cannot be obtained by shooting, a simulated depth map can be automatically generated. The depth value of each pixel in the simulated depth map may be a preset value. In addition, the depth values of each pixel in the simulated depth map may correspond to different preset values.

在一个实施例中,包括:将该配准后的可见光图、该深度图和该中心权重图输入到主体检测模型中,得到主体区域置信度图;其中,该主体检测模型是预先根据同一场景的可见光图、深度图、中心权重图及对应的已标注的主体掩膜图进行训练得到的模型。In one embodiment, it includes: inputting the registered visible light map, the depth map and the center weight map into a subject detection model to obtain a subject area confidence map; wherein the subject detection model is based on the same scene in advance The model obtained by training the visible light map, depth map, center weight map and corresponding annotated subject mask map.

其中,主体检测模型是预先采集大量的训练数据,将训练数据输入到包含有初始网络权重的主体检测模型进行训练得到的。每组训练数据包括同一场景对应的可见光图、深度图、中心权重图及已标注的主体掩膜图。其中,可见光图和中心权重图作为训练的主体检测模型的输入,已标注的主体掩膜图作为训练的主体检测模型期望输出得到的真实值。主体掩膜图是用于识别图像中主体的图像滤镜模板,可以遮挡图像的其他部分,筛选出图像中的主体。主体检测模型可训练能够识别检测各种主体,如人、花、猫、狗、背景等。Among them, the subject detection model is obtained by collecting a large amount of training data in advance, and inputting the training data into the subject detection model including the initial network weights for training. Each set of training data includes the visible light map, depth map, center weight map and annotated subject mask map corresponding to the same scene. Among them, the visible light map and the center weight map are used as the input of the trained subject detection model, and the annotated subject mask map is used as the actual value that the trained subject detection model expects to output. The subject mask map is an image filter template used to identify the subject in the image, which can block other parts of the image and filter out the subject in the image. The subject detection model can be trained to recognize and detect various subjects, such as people, flowers, cats, dogs, backgrounds, etc.

本实施例中,将深度图和中心权重图作为主体检测模型的输入,可以利用深度图的深度信息让距离摄像头更近的对象更容易被检测,利用中心权重图中中心权重大,四边权重小的中心注意力机制,让图像中心的对象更容易被检测,引入深度图实现对主体做深度特征增强,引入中心权重图对主体做中心注意力特征增强,不仅可以准确识别简单场景下的目标主体,更大大提高了复杂场景下的主体识别准确度,引入深度图可以解决传统目标检测方法对自然图像千变万化的目标鲁棒性较差的问题。简单场景是指主体单一,背景区域对比度不高的场景。In this embodiment, the depth map and the center weight map are used as the input of the subject detection model. The depth information of the depth map can be used to make objects closer to the camera easier to be detected. The center weight map is used to have a large center weight and a small weight on the four sides. The central attention mechanism makes it easier to detect the object in the center of the image. The depth map is introduced to enhance the depth feature of the subject, and the center weight map is introduced to enhance the central attention feature of the subject, which can not only accurately identify the target subject in a simple scene , which greatly improves the accuracy of subject recognition in complex scenes, and the introduction of depth maps can solve the problem of poor robustness of traditional target detection methods to ever-changing targets in natural images. A simple scene refers to a scene with a single subject and low contrast in the background area.

在一个实施例中,该主体检测模型的训练方式,包括:获取同一场景的可见光图、深度图和已标注的主体掩膜图;生成与该可见光图对应的中心权重图,其中,该中心权重图所表示的权重值从中心到边缘逐渐减小;将该可见光图作用于包含初始网络权重的主体检测模型的输入层,将该深度图和该中心权重图作用于初始的主体检测模型的输出层,将该已标注的主体掩膜图作为该主体检测模型输出的真实值,对该包含初始网络权重的主体检测模型进行训练,得到该主体检测模型的目标网络权重。In one embodiment, the training method of the subject detection model includes: acquiring a visible light map, a depth map and an labeled subject mask map of the same scene; generating a center weight map corresponding to the visible light map, wherein the center weight The weight value represented by the figure gradually decreases from the center to the edge; the visible light map is applied to the input layer of the subject detection model containing the initial network weights, and the depth map and the center weight map are applied to the output of the initial subject detection model layer, take the marked subject mask image as the real value output by the subject detection model, train the subject detection model including the initial network weight, and obtain the target network weight of the subject detection model.

The visible light map, depth map, and corresponding annotated subject mask map of a scene can be collected, with semantic-level annotation performed on the visible light map and the depth map to mark the subject inside. A large number of visible light maps can also be obtained by compositing foreground objects from the COCO dataset onto simple backgrounds, yielding many images with solid-color or simple backgrounds to serve as training visible light maps. The COCO dataset contains a large number of foreground objects.
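The compositing step can be sketched as follows; the function name and the assumption that the COCO object mask is available as a binary array are hypothetical, since the embodiments do not specify the synthesis pipeline.

```python
import numpy as np

def composite(foreground, fg_mask, background):
    """Paste a COCO foreground object onto a simple background.

    foreground: HxWx3 uint8 image; fg_mask: HxW {0,1} object mask;
    background: HxWx3 uint8 image of the same size (e.g. a solid color).
    """
    mask3 = np.repeat(fg_mask[:, :, None], 3, axis=2).astype(np.uint8)
    out = background * (1 - mask3) + foreground * mask3
    # The object mask doubles as the annotated subject mask for training.
    return out.astype(np.uint8), fg_mask
```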

The network structure of the subject detection model adopts a mobile-Unet based architecture, with bridges added between layers in the decoder part so that high-level semantic features are transferred more fully during upsampling. The center weight map acts on the output layer of the subject detection model and introduces a center attention mechanism, making objects at the center of the frame easier to detect as the subject.

The network structure of the subject detection model includes an input layer, convolution layers (conv), pooling layers (pooling), bilinear interpolation layers (bilinear upsampling), convolutional feature concatenation layers (concat+conv), an output layer, and so on. Between the bilinear interpolation layers and the convolutional feature concatenation layers, a deconvolution+add (deconvolution feature summation) operation implements the bridging, so that high-level semantic features are transferred more fully during upsampling. The convolution layers, pooling layers, bilinear interpolation layers, and convolutional feature concatenation layers can serve as intermediate layers of the subject detection model.
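A minimal PyTorch sketch of the deconvolution+add bridge described above; the channel counts and spatial sizes are illustrative assumptions, not values specified by the embodiments.

```python
import torch
import torch.nn as nn

class DeconvAddBridge(nn.Module):
    """Bridge a low-resolution feature map to a decoder stage by deconvolution,
    then add it to the upsampled decoder features (deconvolution+add)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)

    def forward(self, low_res_feat, upsampled_feat):
        bridged = self.deconv(low_res_feat)  # 2x spatial upsampling
        return bridged + upsampled_feat      # feature summation, not concatenation

# Illustrative shapes: a 64-channel map at half resolution bridged to 32 channels.
bridge = DeconvAddBridge(64, 32)
low = torch.randn(1, 64, 56, 56)
up = torch.randn(1, 32, 112, 112)
out = bridge(low, up)  # -> (1, 32, 112, 112)
```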

The initial network weights refer to the initial per-layer weights of the initialized deep learning network model; the target network weights refer to the per-layer weights of the trained deep learning network model capable of detecting the image subject. The target network weights can be obtained by training for a preset number of iterations, or a loss function can be set for the deep learning network model: when the loss value obtained in training falls below a loss threshold, the current network weights of the subject detection model are taken as the target network weights.
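The two stopping criteria can be sketched as a training loop; the model signature, optimizer, loss function, and data loader are placeholders, and the threshold and step count are illustrative values.

```python
def train(model, loader, optimizer, loss_fn, max_steps=10000, loss_threshold=0.05):
    """Train until the loss drops below a threshold or a preset iteration count
    is reached; the weights at that point become the target network weights."""
    step = 0
    for rgb, depth, center_w, mask in loader:  # one training sample per scene
        pred = model(rgb, depth, center_w)     # subject region confidence map
        loss = loss_fn(pred, mask)             # annotated mask map = ground truth
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if loss.item() < loss_threshold or step >= max_steps:
            break
    return model.state_dict()                  # the target network weights
```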

FIG. 5 is a schematic diagram of the network structure of the subject detection model in one embodiment. As shown in FIG. 5, the network structure of the subject detection model includes convolution layer 402, pooling layer 404, convolution layer 406, pooling layer 408, convolution layer 410, pooling layer 412, convolution layer 414, pooling layer 416, convolution layer 418, convolution layer 420, bilinear interpolation layer 422, convolution layer 424, bilinear interpolation layer 426, convolution layer 428, convolutional feature concatenation layer 430, bilinear interpolation layer 432, convolution layer 434, convolutional feature concatenation layer 436, bilinear interpolation layer 438, convolution layer 440, convolutional feature concatenation layer 442, and so on. Convolution layer 402 serves as the input layer of the subject detection model, and convolutional feature concatenation layer 442 as its output layer. The network structure of the subject detection model in this embodiment is only an example and does not limit the present application. It can be understood that multiple convolution layers, pooling layers, bilinear interpolation layers, convolutional feature concatenation layers, and so on can be set in the network structure of the subject detection model as required.

The encoding part of the subject detection model includes convolution layer 402, pooling layer 404, convolution layer 406, pooling layer 408, convolution layer 410, pooling layer 412, convolution layer 414, pooling layer 416, and convolution layer 418; the decoding part includes convolution layer 420, bilinear interpolation layer 422, convolution layer 424, bilinear interpolation layer 426, convolution layer 428, convolutional feature concatenation layer 430, bilinear interpolation layer 432, convolution layer 434, convolutional feature concatenation layer 436, bilinear interpolation layer 438, convolution layer 440, and convolutional feature concatenation layer 442. Convolution layer 406 is concatenated with convolution layer 434, convolution layer 410 with convolution layer 428, and convolution layer 414 with convolution layer 424. Bilinear interpolation layer 422 is bridged to convolutional feature concatenation layer 430 by deconvolution feature summation (deconvolution+add); bilinear interpolation layer 432 is bridged to convolutional feature concatenation layer 436 in the same way, as is bilinear interpolation layer 438 to convolutional feature concatenation layer 442.

The original image 450 (such as a visible light map) is input to convolution layer 402 of the subject detection model, while the depth map 460 and the center weight map 470 act on convolutional feature concatenation layer 442 of the subject detection model, each entering the layer as a multiplicative factor. After the original image 450, the depth map 460, and the center weight map 470 are input to the subject detection model, it outputs a confidence map 480 containing the subject.
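A minimal sketch of the multiplicative action of the two maps at the output layer; resizing both maps to the feature resolution is an assumption, since the embodiments only state that each map enters the layer as a multiplication factor.

```python
import torch.nn.functional as F

def fuse_at_output(features, depth_map, center_weight_map):
    """Apply the depth map and center weight map as multiplicative factors on
    the features entering the final concat+conv (output) layer."""
    # features: NxCxHxW; both maps: Nx1xHxW, resized to the feature resolution.
    size = features.shape[-2:]
    d = F.interpolate(depth_map, size=size, mode='bilinear', align_corners=False)
    w = F.interpolate(center_weight_map, size=size, mode='bilinear', align_corners=False)
    return features * d * w  # broadcasts over the channel dimension
```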

During training of the subject detection model, a preset dropout rate is applied to the depth map; the preset value may be 50%. Introducing probabilistic dropout for the depth map during training lets the subject detection model fully mine the information in the depth map while still producing accurate results when no depth map can be obtained. Applying dropout to the depth map input makes the subject detection model more robust to the depth map, so the subject region can be segmented accurately even without one.

Moreover, since capturing and computing a depth map during normal terminal-device shooting is time-consuming and labor-intensive, making depth maps hard to obtain, designing the depth map with a 50% dropout probability during training ensures that the subject detection model can still detect normally when no depth information is available.
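A sketch of the 50% depth dropout described above; replacing a dropped depth map with a neutral all-ones factor is an assumption, as the embodiments specify only the dropout probability, not the substitute value.

```python
import torch

def maybe_drop_depth(depth_map, p_drop=0.5, training=True):
    """With probability p_drop, drop the depth input during training so the
    model learns to produce accurate results without a depth map."""
    if training and torch.rand(1).item() < p_drop:
        return torch.ones_like(depth_map)  # neutral multiplicative factor
    return depth_map
```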

Highlight detection is performed on the original image 450 by a highlight detection layer 444 to identify the highlight regions in the original image. The subject region confidence map output by the subject detection model is filtered with an adaptive threshold to obtain a binarized mask map, and morphological processing and guided filtering are applied to the binarized mask map to obtain the subject mask map. The subject mask map is then differenced with the original image containing the highlight regions, removing the highlight regions from the subject mask map to obtain the subject with highlights removed. The subject region confidence map is a confidence map with values distributed between 0 and 1; it contains considerable noise, including many low-confidence points and small clustered patches of high confidence. Filtering with a region-adaptive confidence threshold yields the binarized mask map. Morphological processing of the binarized mask map further reduces noise, and guided filtering makes the edges smoother. It can be understood that the subject region confidence map may be regarded as a subject mask map containing noise.
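The post-processing chain can be sketched with OpenCV as follows; the block size, morphology kernel, guided-filter parameters, and the brightness threshold used for highlight detection are illustrative assumptions, and the guided filter here comes from the opencv-contrib ximgproc module.

```python
import cv2
import numpy as np

def postprocess_confidence(conf, rgb, block_size=51, c=-5):
    """Adaptive-threshold the 0..1 confidence map, clean it morphologically,
    smooth edges with a guided filter, and subtract highlight regions."""
    conf_u8 = (conf * 255).astype(np.uint8)
    # Region-adaptive confidence threshold -> binarized mask map.
    binary = cv2.adaptiveThreshold(conf_u8, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, block_size, c)
    # Morphological open/close to suppress small noise blobs.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    # Guided filtering against the RGB image smooths the mask edges.
    mask = cv2.ximgproc.guidedFilter(guide=rgb, src=binary, radius=8, eps=1e-2)
    # Simple brightness-based highlight detection; the 240 threshold is illustrative.
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    highlights = (gray > 240).astype(np.uint8) * 255
    return cv2.subtract(mask, highlights)  # remove highlights from the subject mask
```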

In this embodiment, the depth map is used as a feature to enhance the network output rather than being fed directly into the network of the subject detection model. Alternatively, a dual deep-learning network structure could be designed, in which one deep learning network processes the depth map and another processes the RGB image; the outputs of the two networks are then combined by convolutional feature concatenation before the final output.

In one embodiment, the training procedure of the subject detection model includes: acquiring the visible light map and annotated subject mask map of the same scene; generating a center weight map corresponding to the visible light map, where the weight values represented by the center weight map decrease gradually from the center to the edges; applying the visible light map to the input layer of the subject detection model containing initial network weights, applying the center weight map to the output layer of the initial subject detection model, using the annotated subject mask map as the ground truth of the model's output, and training the subject detection model containing the initial network weights to obtain the target network weights of the subject detection model.

The training in this embodiment uses the visible light map and the center weight map; that is, in the network structure of the subject detection model in FIG. 5, no depth map is introduced at the output layer: the visible light map acts on convolution layer 402, and the center weight map 470 acts on convolutional feature concatenation layer 442 of the subject detection model.

FIG. 6 is a schematic diagram of the effect of target subject recognition in one embodiment. As shown in FIG. 6, there is a butterfly in the RGB image 502. The RGB image is input to the subject detection model to obtain the subject region confidence map 504; the subject region confidence map 504 is then filtered and binarized to obtain the binarized mask map 506, and morphological processing and guided filtering are applied to the binarized mask map 506 for edge enhancement, yielding the subject mask map 508.

In one embodiment, the step of returning to the step of acquiring a visible light map to acquire visible light maps at different acquisition angles includes: taking the target subject as the center, continuously varying the acquisition angle around the target subject, and acquiring visible light maps in real time under the condition that overlapping regions exist between adjacent visible light maps, obtaining visible light maps at different acquisition angles.

Specifically, the change of acquisition angle can be achieved by rotating the capture device, and the magnitude of each change can be customized; for example, when shooting a cubic object, rotating 45 degrees each time acquires visible light maps at different angles. Since the acquired visible light maps need to cover the entire target subject, overlapping regions must exist between adjacent visible light maps to ensure the completeness of information acquisition, and the overlap ratio can be customized.
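A small sketch of such a capture plan; modeling the overlap condition as the angular step being smaller than the camera field of view is a simplifying assumption.

```python
def plan_capture_angles(step_deg=45, fov_deg=60):
    """Plan acquisition angles around the target subject.

    Adjacent views overlap as long as the angular step stays below the camera
    field of view (a simplification of the overlap requirement)."""
    assert step_deg < fov_deg, "adjacent visible light maps would not overlap"
    return list(range(0, 360, step_deg))  # e.g. [0, 45, 90, ..., 315]
```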

In this embodiment, continuously varying the acquisition angle around the target subject and acquiring visible light maps in real time guarantees the completeness of information acquisition, thereby enabling real-time three-dimensional reconstruction of the target subject.

In one embodiment, step 210 includes: obtaining a first depth value corresponding to a first planar pixel point on the target subject; obtaining, according to the position of the first planar pixel point on the target subject and the first depth value, the first three-dimensional pixel point corresponding to the first planar pixel point in three-dimensional space; obtaining a second depth value corresponding to a second planar pixel point on the target subject; taking the first depth value as a reference depth value and determining, according to the second depth value, the position of a second three-dimensional pixel point relative to the first three-dimensional pixel point, the second three-dimensional pixel point being the three-dimensional pixel point corresponding to the second planar pixel point in three-dimensional space; determining the position of the second three-dimensional pixel point in three-dimensional space according to the relative position and the position of the second planar pixel point on the target subject; and connecting the three-dimensional pixel points in three-dimensional space.

Specifically, three-dimensional pixel point matching can be performed for every planar pixel point on the target subject, or only for key planar pixel points on the target subject, which improves the efficiency of three-dimensional reconstruction and saves computing resources. In one embodiment, when the target subject is a human face, the key planar pixel points are the feature keypoints obtained by face detection, such as points on the nose tip, eyes, mouth, and eyebrows. Constructing the three-dimensional model requires relative depth values: one point, such as the first planar pixel point, is selected as the base calibration point, and the depth values of the other points are computed relative to it, so the three-dimensional pixel points corresponding to the target subject in three-dimensional space can be constructed step by step. Connecting the three-dimensional pixel points in three-dimensional space yields the three-dimensional model corresponding to the target subject.
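A simplified sketch of this relative-depth lifting; it ignores camera intrinsics (a full pinhole back-projection would also scale x and y by depth), and the keypoint coordinates below are hypothetical.

```python
import numpy as np

def lift_to_3d(pixels, depths, ref_index=0):
    """Lift planar pixels (u, v) with depth values into 3D points, using one
    pixel's depth as the reference (the base calibration point)."""
    ref_depth = depths[ref_index]
    points = []
    for (u, v), d in zip(pixels, depths):
        rel = d - ref_depth          # depth relative to the reference point
        points.append((u, v, rel))   # image position fixes x/y, relative depth fixes z
    return np.array(points, dtype=np.float32)

# Hypothetical face keypoints with the nose tip as the reference point.
pts = lift_to_3d([(320, 240), (300, 220), (340, 220)], [0.45, 0.48, 0.47])
```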

In one embodiment, step 210 includes: determining the target type corresponding to the target subject and obtaining an initial three-dimensional model of the same type according to the target type; obtaining the actual depth values corresponding to the key planar pixel points on the target subject; obtaining, from the initial three-dimensional model, the three-dimensional model pixel points matching the key planar pixel points; adjusting the three-dimensional spatial positions of the matched three-dimensional model pixel points according to the ratios of the actual depth values between the key planar pixel points; obtaining the actual depth values corresponding to the non-key planar pixel points on the target subject, obtaining from the initial three-dimensional model the three-dimensional model pixel points matching the non-key planar pixel points, and adjusting the three-dimensional spatial positions of those points according to the ratios of the actual depth values between the non-key and key planar pixel points, until every planar pixel point on the target subject has a matched, adjusted three-dimensional model pixel point; the adjusted three-dimensional model pixel points form the three-dimensional model corresponding to the target subject.

Specifically, the subject detection model outputs not only the contour of the target subject in the visible light map but also the target type corresponding to the target subject, where target types include person, flower, cat, dog, and so on. An initial three-dimensional model can be established in advance for each target type, such as an initial face model, an initial human body model, an initial flower model, an initial cat model, and an initial dog model. The key planar pixel points are the key feature points that determine the three-dimensionality of the model; for a human face, for example, they are obtained by face detection, with points on the nose tip, eyes, mouth, and eyebrows serving as key planar pixel points. The three-dimensional spatial positions of the matched three-dimensional model pixel points are adjusted according to the ratios of the actual depth values between the key planar pixel points, so the initial three-dimensional model is adjusted into a three-dimensional contour matching the target subject: if, say, the nose of the initial model is low while the nose of the actual target subject is high, raising the nose region of the initial model yields a contour matching the subject. The initial three-dimensional model is then further adjusted according to the actual depth values of the other, non-key planar pixel points, obtaining a three-dimensional model that better matches the target subject.
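One possible reading of the ratio-based adjustment is sketched below; treating the adjustment as scaling the depth coordinate of each matched model point by the measured depth-value ratio is an interpretation, not a definitive implementation.

```python
import numpy as np

def adjust_template(model_points, key_indices, actual_depths):
    """Scale the depth coordinate of template model points so that their ratios
    match the actual depth-value ratios measured on the target subject.

    model_points: Nx3 array of the initial 3D model; key_indices and
    actual_depths pair each key planar pixel with its measured depth.
    """
    pts = model_points.copy()
    base = key_indices[0]  # reference keypoint (e.g. the nose tip)
    for idx, depth in zip(key_indices, actual_depths):
        ratio = depth / actual_depths[0]     # measured depth relative to the reference
        pts[idx, 2] = pts[base, 2] * ratio   # adjust z of the matched model point
    return pts
```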

In this embodiment, since the initial three-dimensional model is a regular, well-formed model, the black holes formed in the depth map due to limited precision and estimation errors have less impact on the construction of the three-dimensional model. The three-dimensional model matching the target subject is obtained by progressively correcting the initial three-dimensional model.

In one embodiment, the visible light maps are acquired in real time, and the method further includes: displaying the construction process of the three-dimensional model corresponding to the target subject in real time on a preview interface, according to the visible light maps acquired in real time.

Specifically, visible light maps containing the target subject can be acquired in real time by the terminal device; the three-dimensional model corresponding to the target subject is built up gradually during acquisition, and its construction process is displayed in real time on the preview interface, so that the user can watch the construction intuitively. The acquisition angle of the visible light maps can then be adjusted according to the displayed partially constructed model, making the construction of the three-dimensional model corresponding to the target subject more accurate and efficient.

In one embodiment, the corresponding texture and color are matched to the three-dimensional model of the target subject according to the texture and color of the visible light map. It should be understood that, although the steps in the flowcharts of FIG. 2 and FIG. 4 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 2 and FIG. 4 may include multiple sub-steps or stages that are not necessarily completed at the same moment but may be executed at different times, and the execution order of these sub-steps or stages is not necessarily sequential: they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

FIG. 7 is a structural block diagram of an apparatus for constructing a three-dimensional model in one embodiment. As shown in FIG. 7, the apparatus for constructing a three-dimensional model includes a processing module 602, a detection module 604, a target subject determination module 606, and a three-dimensional model construction module 608, wherein:

the processing module 602 is configured to acquire a visible light map and generate a center weight map corresponding to the visible light map, where the weight values represented by the center weight map decrease gradually from the center to the edges;

the detection module 604 is configured to input the visible light map and the center weight map into a subject detection model to obtain a subject region confidence map, where the subject detection model is a model trained in advance on the visible light map, center weight map, and corresponding annotated subject mask map of the same scene;

the target subject determination module 606 is configured to determine the target subject in the visible light map according to the subject region confidence map;

the three-dimensional model construction module 608 is configured to obtain the depth information corresponding to the target subject, perform three-dimensional reconstruction of the target subject according to the target subject and its corresponding depth information, and return to the processing module to acquire visible light maps at different acquisition angles until the three-dimensional model corresponding to the target subject is obtained.

With the apparatus for constructing a three-dimensional model in this embodiment, the center weight map makes objects at the center of the image easier to detect, and the subject detection model trained with visible light maps, center weight maps, and subject mask maps identifies the target subject in the visible light map more accurately. During construction of the three-dimensional model, the depth information of the target subject enables the corresponding three-dimensional model to be built precisely, and the target subject can be identified accurately even in the presence of interfering objects, thereby improving the accuracy of constructing the three-dimensional model corresponding to the target subject.

In one embodiment, the target subject determination module 606 is further configured to process the subject region confidence map to obtain a subject mask map; detect the visible light map to determine the highlight regions in the visible light map; and determine, according to the highlight regions in the visible light map and the subject mask map, the target subject in the visible light map with highlights eliminated.

In one embodiment, the target subject determination module 606 is further configured to perform adaptive confidence threshold filtering on the subject region confidence map to obtain a subject mask map.

In one embodiment, the target subject determination module 606 is further configured to perform adaptive confidence threshold filtering on the subject region confidence map to obtain a binarized mask map, and to perform morphological processing and guided filtering on the binarized mask map to obtain a subject mask map.

In one embodiment, the target subject determination module 606 is further configured to difference the highlight regions in the visible light map with the subject mask map to obtain the target subject in the visible light map.

In one embodiment, the subject detection model includes an input layer, intermediate layers, and an output layer connected in sequence.

The detection module 604 is further configured to apply the visible light map to the input layer of the subject detection model, and to apply the center weight map to the output layer of the subject detection model.

In one embodiment, the three-dimensional model construction module 608 is further configured to obtain a depth map corresponding to the visible light map, the depth map including at least one of a TOF depth map, a binocular depth map, and a structured light depth map; register the visible light map and the depth map to obtain a registered visible light map and depth map; and determine the depth information corresponding to the target subject from the registered depth map according to the region in which the target subject lies in the visible light map.

In one embodiment, the three-dimensional model construction module 608 is further configured to take the target subject as the center, continuously vary the acquisition angle around the target subject, and acquire visible light maps in real time under the condition that overlapping regions exist between adjacent visible light maps, obtaining visible light maps at different acquisition angles.

In one embodiment, the three-dimensional model construction module 608 is further configured to obtain a first depth value corresponding to a first planar pixel point on the target subject; obtain, according to the position of the first planar pixel point on the target subject and the first depth value, the first three-dimensional pixel point corresponding to the first planar pixel point in three-dimensional space; obtain a second depth value corresponding to a second planar pixel point on the target subject; take the first depth value as a reference depth value and determine, according to the second depth value, the position of the second three-dimensional pixel point relative to the first, the second three-dimensional pixel point being the three-dimensional pixel point corresponding to the second planar pixel point in three-dimensional space; determine the position of the second three-dimensional pixel point in three-dimensional space according to the relative position and the position of the second planar pixel point on the target subject; and connect the three-dimensional pixel points in three-dimensional space.

In one embodiment, the three-dimensional model construction module 608 is further configured to determine the target type corresponding to the target subject and obtain an initial three-dimensional model of the same type according to the target type; obtain the actual depth values corresponding to the key planar pixel points on the target subject; obtain, from the initial three-dimensional model, the three-dimensional model pixel points matching the key planar pixel points and adjust their three-dimensional spatial positions according to the ratios of the actual depth values between the key planar pixel points; obtain the actual depth values corresponding to the non-key planar pixel points on the target subject; obtain, from the initial three-dimensional model, the three-dimensional model pixel points matching the non-key planar pixel points; and adjust the three-dimensional spatial positions of the three-dimensional model pixel points matching the non-key planar pixel points according to the ratios of the actual depth values between the non-key and key planar pixel points, until every planar pixel point on the target subject has a matched, adjusted three-dimensional model pixel point, the adjusted three-dimensional model pixel points forming the three-dimensional model corresponding to the target subject.

In one embodiment, the apparatus further includes:

a display module, configured to display the construction process of the three-dimensional model corresponding to the target subject in real time on a preview interface, according to the visible light maps acquired in real time.

In one embodiment, the detection module 604 is further configured to input the registered visible light map, depth map, and center weight map into the subject detection model to obtain a subject region confidence map, where the subject detection model is a model trained in advance on the visible light map, depth map, center weight map, and corresponding annotated subject mask map of the same scene.

In one embodiment, the above apparatus for constructing a three-dimensional model further includes a training image acquisition module, a training weight generation module, and a training module.

The training image acquisition module is configured to acquire the visible light map, depth map, and annotated subject mask map of the same scene.

The training weight generation module is configured to generate a center weight map corresponding to the visible light map, where the weight values represented by the center weight map decrease gradually from the center to the edges.

The training module is configured to apply the visible light map to the input layer of the subject detection model containing initial network weights, apply the depth map and the center weight map to the output layer of the initial subject detection model, use the annotated subject mask map as the ground truth of the model's output, and train the subject detection model containing the initial network weights to obtain its target network weights. When the loss function of the subject detection model falls below a loss threshold, or the number of training iterations reaches a preset number, the current network weights of the subject detection model are taken as its target network weights.

FIG. 8 is a schematic diagram of the internal structure of a terminal device in one embodiment. As shown in FIG. 8, the terminal device includes a processor and a memory connected via a system bus. The processor provides computing and control capabilities to support the operation of the entire terminal device. The memory may include a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system and a computer program, and the computer program can be executed by the processor to implement the method for constructing a three-dimensional model provided by the various embodiments. The internal memory provides a cached execution environment for the operating system and the computer program in the nonvolatile storage medium. The terminal device may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.

Each module in the apparatus for constructing a three-dimensional model provided in the embodiments of the present application may be implemented in the form of a computer program. The computer program may run on a terminal or a server, and the program modules constituted by the computer program may be stored in the memory of the terminal or server. When the computer program is executed by a processor, the steps of the methods described in the embodiments of the present application are implemented.

An embodiment of the present application also provides a computer-readable storage medium: one or more nonvolatile computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the steps of the method for constructing a three-dimensional model.

A computer program product containing instructions, when run on a computer, causes the computer to perform the steps of the method for constructing a three-dimensional model.

Any reference to memory, storage, a database, or other media used in the embodiments of the present application may include nonvolatile and/or volatile memory. Suitable nonvolatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent of the present application. It should be pointed out that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the patent of the present application shall be subject to the appended claims.

Claims (10)

1. A method for constructing a three-dimensional model, characterized in that the method comprises:
acquiring a visible light map and generating a center weight map corresponding to the visible light map, wherein the weight values represented by the center weight map decrease gradually from the center to the edges;
inputting the visible light map and the center weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is a model trained in advance on the visible light map, center weight map, and corresponding annotated subject mask map of the same scene;
determining a target subject in the visible light map according to the subject region confidence map;
obtaining depth information corresponding to the target subject;
performing three-dimensional reconstruction of the target subject according to the target subject and the depth information corresponding to the target subject, and returning to the step of acquiring a visible light map to acquire visible light maps at different acquisition angles, until a three-dimensional model corresponding to the target subject is obtained.
2. The method according to claim 1, characterized in that determining the target subject in the visible light map according to the subject region confidence map comprises:
processing the subject region confidence map to obtain a subject mask map;
detecting the visible light map to determine highlight regions in the visible light map;
determining, according to the highlight regions in the visible light map and the subject mask map, the target subject in the visible light map with highlights eliminated.
3. The method according to claim 1, characterized in that obtaining the depth information corresponding to the target subject comprises:
obtaining a depth map corresponding to the visible light map, the depth map comprising at least one of a TOF depth map, a binocular depth map, and a structured light depth map;
registering the visible light map and the depth map to obtain a registered visible light map and depth map;
determining the depth information corresponding to the target subject from the registered depth map according to the region in which the target subject lies in the visible light map.
4. The method according to claim 1, characterized in that returning to the step of acquiring a visible light map to acquire visible light maps at different acquisition angles comprises:
taking the target subject as the center and continuously varying the acquisition angle around the target subject;
acquiring visible light maps in real time under the condition that overlapping regions exist between adjacent visible light maps, obtaining the visible light maps at different acquisition angles.
5. The method according to claim 1, characterized in that performing three-dimensional reconstruction of the target subject according to the target subject and the depth information corresponding to the target subject comprises:
obtaining a first depth value corresponding to a first planar pixel point on the target subject;
obtaining, according to the position of the first planar pixel point on the target subject and the first depth value, a first three-dimensional pixel point corresponding to the first planar pixel point in three-dimensional space;
obtaining a second depth value corresponding to a second planar pixel point on the target subject;
taking the first depth value as a reference depth value and determining, according to the second depth value, the position of a second three-dimensional pixel point relative to the first three-dimensional pixel point, the second three-dimensional pixel point being the three-dimensional pixel point corresponding to the second planar pixel point in three-dimensional space;
determining the position of the second three-dimensional pixel point in three-dimensional space according to the relative position and the position of the second planar pixel point on the target subject;
connecting the three-dimensional pixel points in the three-dimensional space.
6. The method according to any one of claims 1 to 5, characterized in that the step of performing three-dimensional reconstruction of the target subject according to the target subject and the depth information corresponding to the target subject, and returning to the step of acquiring a visible light map to acquire visible light maps at different acquisition angles until the three-dimensional model corresponding to the target subject is obtained, comprises:
determining a target type corresponding to the target subject, and obtaining an initial three-dimensional model of the same type according to the target type;
obtaining actual depth values corresponding to key planar pixel points on the target subject;
obtaining, from the initial three-dimensional model, three-dimensional model pixel points matching the key planar pixel points, and adjusting the three-dimensional spatial positions of the matched three-dimensional model pixel points according to the ratios of the actual depth values between the key planar pixel points;
obtaining actual depth values corresponding to non-key planar pixel points on the target subject;
obtaining, from the initial three-dimensional model, three-dimensional model pixel points matching the non-key planar pixel points;
adjusting the three-dimensional spatial positions of the three-dimensional model pixel points matching the non-key planar pixel points according to the ratios of the actual depth values between the non-key planar pixel points and the key planar pixel points, until each planar pixel point on the target subject has a matched, adjusted three-dimensional model pixel point;
the adjusted three-dimensional model pixel points forming the three-dimensional model corresponding to the target subject.
7. The method according to any one of claims 1 to 5, characterized in that the visible light maps are acquired in real time, and the method further comprises:
displaying, according to the visible light maps acquired in real time, the construction process of the three-dimensional model corresponding to the target subject in real time on a preview interface.
8. An apparatus for constructing a three-dimensional model, characterized in that the apparatus comprises:
a processing module, configured to acquire a visible light map and generate a center weight map corresponding to the visible light map, wherein the weight values represented by the center weight map decrease gradually from the center to the edges;
a detection module, configured to input the visible light map and the center weight map into a subject detection model to obtain a subject region confidence map, wherein the subject detection model is a model trained in advance on the visible light map, center weight map, and corresponding annotated subject mask map of the same scene;
a target subject determination module, configured to determine a target subject in the visible light map according to the subject region confidence map;
a three-dimensional model construction module, configured to obtain depth information corresponding to the target subject, perform three-dimensional reconstruction of the target subject according to the target subject and the depth information corresponding to the target subject, and return to the processing module to acquire visible light maps at different acquisition angles, until a three-dimensional model corresponding to the target subject is obtained.
9. A terminal device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are implemented.
CN201910573384.9A 2019-06-28 2019-06-28 Method and device for constructing three-dimensional model, equipment and computer-readable storage medium Active CN110276831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910573384.9A CN110276831B (en) 2019-06-28 2019-06-28 Method and device for constructing three-dimensional model, equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN110276831A true CN110276831A (en) 2019-09-24
CN110276831B CN110276831B (en) 2022-03-18

Family

ID=67963722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910573384.9A Active CN110276831B (en) 2019-06-28 2019-06-28 Method and device for constructing three-dimensional model, equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN110276831B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160012633A1 (en) * 2014-07-09 2016-01-14 Google Inc. High-Quality Stereo Reconstruction Featuring Depth Map Alignment and Outlier Identification
US9430850B1 (en) * 2015-04-02 2016-08-30 Politechnika Poznanska System and method for object dimension estimation using 3D models
CN105825544A (en) * 2015-11-25 2016-08-03 维沃移动通信有限公司 Image processing method and mobile terminal
CN107507272A (en) * 2017-08-09 2017-12-22 广东欧珀移动通信有限公司 Method, device and terminal equipment for building three-dimensional model of human body
CN108805018A (en) * 2018-04-27 2018-11-13 淘然视界(杭州)科技有限公司 Road signs detection recognition method, electronic equipment, storage medium and system
CN108764180A (en) * 2018-05-31 2018-11-06 Oppo广东移动通信有限公司 Face recognition method and device, electronic equipment and readable storage medium
CN109685853A (en) * 2018-11-30 2019-04-26 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and computer readable storage medium
CN109712105A (en) * 2018-12-24 2019-05-03 浙江大学 An image salient object detection method combining color and depth information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HONGCHAO LU等: "Semantic Image Segmentation Based on Attentions to Intra Scales and Inner Channels", 《2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)》 *
ZHENG Guping et al.: "Multi-scale fusion semantic segmentation of aerial images based on attention mechanism", Journal of Graphics *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021078179A1 (en) * 2019-10-22 2021-04-29 华为技术有限公司 Image display method and device
CN110874851A (en) * 2019-10-25 2020-03-10 深圳奥比中光科技有限公司 Method, device, system and readable storage medium for reconstructing three-dimensional model of human body
CN111366916A (en) * 2020-02-17 2020-07-03 北京睿思奥图智能科技有限公司 Method and device for determining distance between interaction target and robot and electronic equipment
CN111366916B (en) * 2020-02-17 2021-04-06 山东睿思奥图智能科技有限公司 Method and device for determining distance between interaction target and robot and electronic equipment
CN112465890A (en) * 2020-11-24 2021-03-09 深圳市商汤科技有限公司 Depth detection method and device, electronic equipment and computer readable storage medium
CN114494605A (en) * 2022-02-15 2022-05-13 北京百度网讯科技有限公司 Three-dimensional reconstruction method, apparatus, apparatus, medium and program product
CN116045852A (en) * 2023-03-31 2023-05-02 板石智能科技(深圳)有限公司 Three-dimensional morphology model determining method and device and three-dimensional morphology measuring equipment
CN116045852B (en) * 2023-03-31 2023-06-20 板石智能科技(深圳)有限公司 Three-dimensional morphology model determining method and device and three-dimensional morphology measuring equipment

Also Published As

Publication number Publication date
CN110276831B (en) 2022-03-18

Similar Documents

Publication Publication Date Title
CN110428366B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110473185B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN110276767B (en) Image processing method and apparatus, electronic device, computer-readable storage medium
CN111741211B (en) Image display method and device
US10896518B2 (en) Image processing method, image processing apparatus and computer readable storage medium
CN110248096B (en) Focusing method and apparatus, electronic device, computer-readable storage medium
CN110493527B (en) Body focusing method and device, electronic equipment and storage medium
CN110276831B (en) Method and device for constructing three-dimensional model, equipment and computer-readable storage medium
CN110149482A (en) Focusing method, focusing device, electronic equipment and computer readable storage medium
CN108810418A (en) Image processing method, device, mobile terminal and computer readable storage medium
CN107800965B (en) Image processing method, image processing device, computer-readable storage medium and computer equipment
CN107742274A (en) Image processing method, device, computer-readable storage medium, and electronic device
CN110650288B (en) Focus control method and apparatus, electronic device, computer-readable storage medium
CN107862658B (en) Image processing method, apparatus, computer-readable storage medium and electronic device
WO2022160857A1 (en) Image processing method and apparatus, and computer-readable storage medium and electronic device
CN110349163A (en) Image processing method and apparatus, electronic device, computer-readable storage medium
CN108717530A (en) Image processing method, device, computer readable storage medium and electronic equipment
WO2019029573A1 (en) Image blurring method, computer-readable storage medium and computer device
CN110248101A (en) Focusing method and device, electronic equipment and computer readable storage medium
CN110881103B (en) Focusing control method and device, electronic equipment and computer readable storage medium
CN107578372A (en) Image processing method, device, computer-readable storage medium, and electronic device
CN110378934A (en) Subject detection method, apparatus, electronic device, and computer-readable storage medium
CN110689007B (en) Subject recognition method and device, electronic equipment and computer-readable storage medium
CN112651911A (en) High dynamic range imaging generation method based on polarization image
CN110365897B (en) Image correction method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant