
CN115170746B - Multi-view three-dimensional reconstruction method, system and equipment based on deep learning - Google Patents


Info

Publication number: CN115170746B
Application number: CN202211087276.9A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN115170746A
Prior art keywords: point cloud, scale, scales, semantic, representing
Inventors: 任胜兵, 彭泽文, 陈旭洋
Assignee (current and original): Central South University
Application filed by Central South University; priority to CN202211087276.9A
Publication of application CN115170746A, followed by grant and publication of CN115170746B
Legal status: Active

Classifications

    • G06T 17/00 Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06V 10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V 20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform


Abstract

The invention discloses a deep-learning-based multi-view three-dimensional reconstruction method, system and device. Multiple multi-view images are acquired and multi-scale semantic features are extracted from them to obtain feature maps at multiple scales; multi-scale semantic segmentation is performed on these feature maps to obtain semantic segmentation sets at multiple scales; the multi-view images are reconstructed by a supervised 3D reconstruction method to obtain an initial depth map; depth maps at multiple scales are obtained from the multi-scale semantic segmentation sets and the initial depth map; point cloud sets at multiple scales are constructed; the point cloud sets are optimized with radius filters of different radii to obtain optimized point cloud sets; reconstruction at different scales is performed on the optimized point cloud sets to obtain 3D reconstruction results at different scales; and the reconstruction results of each scale are stitched and fused. The invention makes full use of the semantic information of each scale and improves the accuracy of three-dimensional reconstruction.

Description

A method, system and device for multi-view 3D reconstruction based on deep learning

Technical Field

The present invention relates to the technical field of computer vision, and in particular to a deep-learning-based multi-view three-dimensional reconstruction method, system and device.

Background

Deep-learning 3D reconstruction uses a computer to build a neural network, trains it on large amounts of image data and 3D model data, and learns the mapping from images to 3D models, thereby enabling 3D reconstruction of new image targets. Compared with traditional methods such as 3DMM (3D Morphable Model) and SFM (Structure from Motion), deep-learning 3D reconstruction can introduce learned global semantic information into image reconstruction, which to some extent overcomes the poor reconstruction of traditional methods in weakly lit and weakly textured regions.

Most current deep-learning 3D reconstruction methods are single-scale, that is, objects of different sizes in the image are reconstructed in the same way. Single-scale reconstruction maintains good accuracy and speed in environments with low scene complexity and few small objects, but in complex scenes containing objects of many scales, small-scale objects are often reconstructed with insufficient accuracy. Moreover, only high-level features are used, so the low-level detail information of the image is not fully exploited.

Summary of the Invention

The present invention aims to solve at least one of the technical problems in the prior art. To this end, the present invention proposes a deep-learning-based multi-view 3D reconstruction method, system and device that can make full use of the semantic information of each scale and improve the accuracy of 3D reconstruction.

In a first aspect, an embodiment of the present invention provides a deep-learning-based multi-view 3D reconstruction method, comprising:

acquiring multiple multi-view images and performing multi-scale semantic feature extraction on them to obtain feature maps at multiple scales;

performing multi-scale semantic segmentation on the multi-scale feature maps to obtain semantic segmentation sets at multiple scales;

reconstructing the multiple multi-view images by a supervised 3D reconstruction method to obtain an initial depth map;

obtaining depth maps at multiple scales based on the multi-scale semantic segmentation sets and the initial depth map;

constructing point cloud sets at multiple scales based on the multi-scale depth maps;

optimizing the multi-scale point cloud sets with radius filters of different radii, chosen according to the scale of each point cloud set, to obtain optimized point cloud sets;

performing reconstruction at different scales based on the optimized point cloud sets to obtain 3D reconstruction results at different scales;

stitching and fusing the 3D reconstruction results of each scale to obtain the final 3D reconstruction result.

Compared with the prior art, the first aspect of the present invention has the following beneficial effects:

By performing multi-scale semantic feature extraction on multiple multi-view images, the method extracts features at different scales and obtains feature maps at multiple scales; multi-scale semantic segmentation is then performed on these feature maps, aggregating and enriching the semantic information of each scale. The semantic information of each scale in the multi-scale segmentation sets is used to semantically guide the initial depth map, continuously correcting it and yielding accurate depth maps at multiple scales. The method constructs multi-scale point cloud sets from the obtained depth maps, optimizes them with radius filters whose radius depends on the scale of the point cloud set, uses the optimized point cloud sets for reconstruction at different scales, and fuses the reconstruction results into a more accurate final result. The method therefore makes full use of the semantic information of each scale and improves the accuracy of 3D reconstruction.

According to some embodiments of the present invention, performing multi-scale semantic feature extraction on the multiple multi-view images to obtain feature maps at multiple scales comprises:

performing multi-layer feature extraction on the multiple multi-view images through a ResNet network to obtain original feature maps at multiple scales;

connecting the original feature map of each scale to channel attention, so that the channel attention mechanism weights the original feature map of each scale by importance, obtaining feature maps at multiple scales.

According to some embodiments of the present invention, weighting the original feature maps of each scale by importance through the channel attention mechanism to obtain feature maps at multiple scales comprises:

compressing the original feature map of each scale through a squeeze network to obtain a one-dimensional feature map corresponding to the original feature map of each scale;

feeding the one-dimensional feature map through the excitation network into a fully connected layer for importance prediction, obtaining the importance of each channel;

applying the importance of each channel, through an activation function, to the one-dimensional feature map of the original feature map of each scale, obtaining feature maps at multiple scales.

According to some embodiments of the present invention, performing multi-scale semantic segmentation on the multi-scale feature maps to obtain semantic segmentation sets at multiple scales comprises:

clustering the multi-scale feature maps by non-negative matrix factorization to obtain semantic segmentation sets at multiple scales, where the non-negative matrix factorization is expressed as:

$$\min_{P \ge 0,\; Q \ge 0} \left\lVert V - PQ \right\rVert_F^2$$

where V denotes the matrix with HW rows and C columns obtained by concatenating and reshaping the multi-scale feature maps, P denotes a matrix with HW rows and K columns, Q denotes a matrix with K rows and C columns (in the generic NMF form V = WH, W is the basis matrix and H the coefficient matrix; here Q plays the role of the basis and P of the coefficients), K denotes the NMF factor giving the number of semantic clusters, C denotes the dimension of each pixel, and F denotes the Frobenius (non-induced) norm.

According to some embodiments of the present invention, obtaining depth maps at multiple scales based on the multi-scale semantic segmentation sets and the initial depth map comprises:

selecting any one of the multiple multi-view images as a reference image and the others as images to be matched;

selecting reference points from the reference image, obtaining the semantic category of each reference point in the segmentation set, and obtaining the depth value of each reference point on the initial depth map;

selecting the number of reference points by the following formula:

$$N_j = \frac{HW}{t} \cdot \frac{K_j}{\sum_i K_i}$$

where N_j denotes the number of reference points selected for the j-th segmentation set, H denotes the height of the multi-view image, W denotes its width, HW denotes the number of pixels in the multi-view image, t denotes a constant parameter, K_j denotes the number of semantic categories contained in the j-th semantic segmentation set, and K_i denotes the number of semantic categories contained in the i-th semantic segmentation set;

obtaining, based on each reference point, its matching point on the image to be matched by the following formula:

$$P_i' = K\, T\, \big( D(P_i)\, K^{-1} P_i \big)$$

where P_i' denotes the matching point of the i-th reference point on the image to be matched, K denotes the intrinsic parameters of the camera, T denotes the extrinsic parameters of the camera, and D(P_i) denotes the depth value corresponding to the reference point P_i of the reference image on the initial depth map;

obtaining the semantic category corresponding to each matching point, and correcting the multi-view images of each scale by minimizing a semantic loss function to obtain the depth maps at multiple scales, where the semantic loss function L_s is computed as:

$$L_s = \frac{1}{N} \sum_{i=1}^{N} M_i \cdot d\big( S(P_i),\, S(P_i') \big)$$

where d(S(P_i), S(P_i')) denotes the difference between the semantic information of the i-th reference point and that of the i-th matching point, M_i denotes a mask, and N denotes the number of reference points.

According to some embodiments of the present invention, constructing point cloud sets at multiple scales based on the multi-scale depth maps comprises:

constructing, from the depth map of each scale, the point cloud set of that scale through the following expressions:

$$x = \frac{u \cdot z}{f_x}, \qquad y = \frac{v \cdot z}{f_y}, \qquad z = D(u, v)$$

where u denotes the abscissa of the depth map, v denotes its ordinate, f_x and f_y denote the camera focal lengths obtained from the camera parameters, and x, y and z denote the coordinates of the point cloud converted from the depth map.
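
As an illustration, here is a minimal sketch of this back-projection in NumPy. It assumes a pinhole camera whose pixel coordinates are measured from the principal point (taken here as the image center); the function name, image size and focal lengths are illustrative assumptions rather than values from the patent.

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float) -> np.ndarray:
    """Back-project a depth map into a point cloud: x = u*z/fx, y = v*z/fy, z = D(u, v)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    u -= w / 2.0   # measure (u, v) from the principal point (assumed at the image center)
    v -= h / 2.0
    z = depth
    x = u * z / fx
    y = v * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]   # keep only pixels with a valid (positive) depth

# Example: a synthetic 480x640 depth map with focal lengths fx = fy = 525 pixels.
points = depth_to_points(np.random.rand(480, 640) * 5.0, fx=525.0, fy=525.0)
```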

According to some embodiments of the present invention, optimizing the multi-scale point cloud sets with different radius filters according to the scale of the point cloud set, to obtain optimized point cloud sets, comprises:

obtaining the multi-scale point cloud sets, where the point clouds in the point cloud set of each scale have a corresponding radius and a preset number of neighboring points;

calculating, according to the scale of the point cloud set, the radius corresponding to the point clouds in the set by the following formula:

$$r_j = \rho \cdot t^{\,l_j}$$

where r_j denotes the radius corresponding to the point clouds in the point cloud sets of different scales, ρ denotes a constant parameter, t denotes a constant parameter, and l_j denotes the preset scale level of each point cloud set;

optimizing the multi-scale point cloud sets according to the radius corresponding to each point cloud and the preset number of neighboring points, to obtain the optimized point cloud sets.
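
The following is a minimal sketch of such a radius filter using a k-d tree: a point is kept only if it has at least a preset number of neighbors within the scale-dependent radius. The form r = rho * t**level follows the formula above, and the default values are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def radius_filter(points: np.ndarray, level: int, rho: float = 0.01,
                  t: float = 2.0, min_neighbors: int = 5) -> np.ndarray:
    """Remove points with fewer than min_neighbors neighbors inside the scale-dependent radius."""
    r = rho * (t ** level)                        # larger scale level -> larger filter radius
    tree = cKDTree(points)
    counts = np.array([len(idx) for idx in tree.query_ball_point(points, r)])
    return points[counts - 1 >= min_neighbors]    # subtract 1: the query point counts itself

# Example: filter a random cloud treated as scale level 1.
filtered = radius_filter(np.random.rand(1000, 3), level=1)
```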

In a second aspect, an embodiment of the present invention further provides a deep-learning-based multi-view 3D reconstruction system, comprising:

a feature map acquisition unit, configured to acquire multi-view images, perform multi-scale semantic feature extraction on them, and obtain feature maps at multiple scales;

a semantic segmentation set acquisition unit, configured to perform multi-scale semantic segmentation on the multi-scale feature maps and obtain semantic segmentation sets at multiple scales;

an initial depth map acquisition unit, configured to reconstruct the multiple multi-view images through a supervised 3D reconstruction method and obtain an initial depth map;

a depth map acquisition unit, configured to obtain depth maps at multiple scales based on the multi-scale semantic segmentation sets and the initial depth map;

a point cloud set acquisition unit, configured to construct point cloud sets at multiple scales based on the multi-scale depth maps;

a radius filtering unit, configured to optimize the multi-scale point cloud sets with different radius filters according to the scale of each point cloud set, obtaining optimized point cloud sets;

a reconstruction result acquisition unit, configured to perform reconstruction at different scales based on the optimized point cloud sets and obtain 3D reconstruction results at different scales;

a reconstruction result fusion unit, configured to stitch and fuse the reconstruction results of each scale and obtain the final 3D reconstruction result.

Compared with the prior art, the second aspect of the present invention has the following beneficial effects:

The feature map acquisition unit of the system extracts deep features by performing multi-scale semantic feature extraction on multiple multi-view images and obtains feature maps at multiple scales; the semantic segmentation set acquisition unit performs multi-scale semantic segmentation on these feature maps, aggregating and enriching the semantic information of each scale. The depth map acquisition unit uses the semantic information of each scale in the multi-scale segmentation sets to semantically guide the initial depth map, continuously correcting it and obtaining accurate depth maps at multiple scales. The point cloud set acquisition unit constructs multi-scale point cloud sets from the obtained depth maps; the radius filtering unit optimizes them with radius filters whose radius depends on the scale of the point cloud set; the reconstruction result acquisition unit performs reconstruction at different scales based on the optimized point cloud sets; and the reconstruction result fusion unit fuses the 3D reconstruction results into a more accurate final result. The system therefore makes full use of the semantic information of each scale and improves the accuracy of 3D reconstruction.

In a third aspect, an embodiment of the present invention further provides a deep-learning-based multi-view 3D reconstruction device, comprising at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor, and the instructions are executed by the at least one control processor so that the at least one control processor can perform the deep-learning-based multi-view 3D reconstruction method described above.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to perform the deep-learning-based multi-view 3D reconstruction method described above.

It can be understood that the beneficial effects of the third and fourth aspects over the related art are the same as those of the first aspect over the related art; see the relevant description of the first aspect above, which is not repeated here.

Brief Description of the Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the description of the embodiments in conjunction with the following drawings, in which:

Fig. 1 is a flowchart of a deep-learning-based multi-view 3D reconstruction method according to an embodiment of the present invention;

Fig. 2 is a structural diagram of a deep residual network according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of non-negative matrix factorization according to an embodiment of the present invention;

Fig. 4 is a structural diagram of multi-scale semantic segmentation according to an embodiment of the present invention;

Fig. 5 is a structural diagram of a deep-learning-based multi-view 3D reconstruction system according to an embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, where the same or similar reference numerals denote the same or similar elements or elements with the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, intended only to explain the present invention, and should not be construed as limiting it.

In the description of the present invention, terms such as first and second are used only to distinguish technical features; they should not be understood as indicating or implying relative importance, the number of the indicated technical features, or the order of the indicated technical features.

In the description of the present invention, it should be understood that orientation descriptions such as up and down refer to orientations or positional relationships based on the drawings; they are used only for convenience and simplicity of description, do not indicate or imply that the referred device or element must have a particular orientation or be constructed and operated in a particular orientation, and should therefore not be construed as limiting the invention.

In the description of the present invention, it should be noted that, unless otherwise clearly defined, words such as arrange, install and connect should be understood broadly, and those skilled in the art can reasonably determine their specific meanings in the present invention in combination with the specific content of the technical solution.

For the convenience of those skilled in the art, the terms used in this embodiment are explained below:

Deep-learning 3D reconstruction: uses a computer to build a neural network, trains it on large amounts of image data and 3D model data, and learns the mapping from images to 3D models, thereby enabling 3D reconstruction of new image targets. Compared with traditional methods of reconstructing 3D information such as 3DMM and SFM, deep-learning 3D reconstruction can introduce learned global semantic information into image reconstruction, which to some extent overcomes the poor reconstruction of traditional methods in weakly lit and weakly textured regions. SFM is an offline algorithm for 3D reconstruction from collections of unordered pictures; 3DMM, the 3D morphable face model, is a generic 3D face model that represents a face with a fixed number of points.

Current deep-learning 3D reconstruction methods fall mainly into two categories: supervised methods (e.g., MVSNet, CVP-MVSNet and PatchmatchNet in the prior art) and self-supervised methods (e.g., JDACS-MS in the prior art). Supervised methods need ground truth for training and achieve high accuracy, but are hard to apply in scenes where ground truth is difficult to obtain. Self-supervised methods need no ground truth for training, apply more widely, and have relatively lower accuracy.

Semantic segmentation: classification at the pixel level; pixels belonging to the same class are grouped into one class, so semantic segmentation understands the image at the pixel level. For example, pixels with different semantics are marked with different colors, and pixels belonging to animals are grouped into the same class. The semantic information of the segmentation can guide image reconstruction and improve its accuracy. Here, semantic segmentation is performed by clustering, grouping pixels of the same kind into the same class.

Depth map: also called a range image, an image whose pixel values are the distances (depths) from the image collector to the points in the scene.

Point cloud: the set of point data on the surface of an object, containing the 3D coordinates, color and other information of the object; image reconstruction can be achieved from point cloud data.

Non-negative matrix factorization (NMF): a matrix factorization method under the constraint that all elements of the matrices are non-negative. Many analysis methods solve practical problems by matrix factorization, such as PCA (principal component analysis), ICA (independent component analysis), SVD (singular value decomposition) and VQ (vector quantization). In all these methods, the original large matrix V is approximately factored into a low-rank form V = WH. Their common feature is that the elements of the factors W and H may be positive or negative; even if all elements of the input matrix are positive, traditional rank-reduction algorithms cannot guarantee the non-negativity of the original data. Mathematically, negative values in the factorization are correct from a computational point of view, but negative elements are often meaningless in practical problems.

Deep-learning 3D reconstruction uses a computer to build a neural network, trains it on large amounts of image data and 3D model data, and learns the mapping from images to 3D models, thereby enabling 3D reconstruction of new image targets. Compared with traditional methods such as the 3DMM method and the SFM method, it can introduce learned global semantic information into image reconstruction, which to some extent overcomes the poor reconstruction of traditional methods in weakly lit and weakly textured regions.

Most current deep-learning 3D reconstruction methods are single-scale, that is, objects of different sizes in the image are reconstructed in the same way. Single-scale reconstruction maintains good accuracy and speed in environments with low scene complexity and few small objects, but in complex scenes containing objects of many scales, small-scale objects are often reconstructed with insufficient accuracy. Moreover, only high-level features are used, so the low-level detail information of the image is not fully exploited.

To solve the above problems, the present application performs multi-scale semantic feature extraction on multiple multi-view images, extracting features at different scales and obtaining feature maps at multiple scales, and performs multi-scale semantic segmentation on these feature maps, aggregating and enriching the semantic information of each scale. The semantic information of each scale in the multi-scale segmentation sets is used to semantically guide the initial depth map, continuously correcting it and obtaining accurate depth maps at multiple scales. The application constructs multi-scale point cloud sets from the obtained depth maps, optimizes them with radius filters whose radius depends on the scale of the point cloud set, uses the optimized point cloud sets for reconstruction at different scales, and then fuses the 3D reconstruction results into a more accurate final result. The application can therefore make full use of the semantic information of each scale and improve the accuracy of 3D reconstruction.

Referring to Fig. 1, an embodiment of the present invention provides a deep-learning-based multi-view 3D reconstruction method, which includes:

Step S100: acquire multiple multi-view images, perform multi-scale semantic feature extraction on them, and obtain feature maps at multiple scales.

Specifically, the multiple multi-view images can be acquired with an image acquisition device, such as a camera or an image scanner, by capturing the object to be recognized from all directions and multiple angles. For example, when multi-scale semantic feature extraction needs to be performed on multiple multi-view images, an image acquisition device such as a camera can be used to obtain them.

In this embodiment, multi-layer feature extraction is performed on the multiple multi-view images through a ResNet network to obtain original feature maps at multiple scales;

the original feature map of each scale is connected to channel attention, so that the channel attention mechanism weights the original feature maps of each scale by importance and feature maps at multiple scales are obtained. Specifically:

the original feature map of each scale is compressed through a squeeze network to obtain the one-dimensional feature map corresponding to the original feature map of each scale;

the one-dimensional feature map is fed through the excitation network into a fully connected layer for importance prediction, obtaining the importance of each channel;

the importance of each channel is applied, through an activation function, to the one-dimensional feature map of the original feature map of each scale, and feature maps at multiple scales are obtained.

In this embodiment, a ResNet network is used to extract image features. In theory, the deeper a deep learning network, the stronger its expressive power; but once a CNN reaches a certain depth, making it deeper does not improve classification performance. Instead the network converges more slowly and accuracy drops, and even enlarging the dataset to solve overfitting does not improve classification performance or accuracy. ResNet adopts a residual learning approach. Referring to Fig. 2, when the input is x, the learned feature is denoted H(x); the network is instead made to learn the residual F(x) = H(x) - x, so the original learned feature is H(x) = F(x) + x. The reason is that residual learning is easier than learning the original feature directly. When the residual is 0, the stacked layers perform only an identity mapping, so at the least the network performance does not degrade; in practice the residual is not 0, so the stacked layers learn new features on top of the input features and achieve better performance. This residual function is easier to optimize and allows the number of network layers to be greatly increased, so deeper semantic information can be extracted. ResNet significantly outperforms networks such as VGG in efficiency, resource consumption and deep semantic feature extraction.
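
For illustration, a minimal PyTorch sketch of such a residual block follows; the identity shortcut realizes H(x) = F(x) + x. The layer sizes are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # F(x): the residual mapping fitted by the stacked layers
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # H(x) = F(x) + x: the shortcut makes the block learn only the residual
        return self.relu(self.body(x) + x)
```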

After multi-layer feature extraction is performed on the multiple multi-view images through the ResNet network to obtain original feature maps at multiple scales, the original feature map of each scale is connected to channel attention, so that the channel attention mechanism weights the original feature maps of each scale by importance and feature maps at multiple scales are obtained. The channel attention mechanism consists mainly of a squeeze network and an excitation network; the specific process is as follows.

Let the dimensions of the original feature map be H*W*C, where H is the height, W the width and C the number of channels. The squeeze network compresses H*W*C to 1*1*C, i.e., compresses H*W into a one-dimensional feature, implemented by global average pooling. After H*W is compressed into one dimension, each of these parameters has the former H*W global field of view, a wider perceptual region. The one-dimensional features from the squeeze network are passed to the excitation network, which feeds them into a fully connected layer to predict the importance of each channel; after the importance of the different channels is obtained, a Sigmoid activation function applies it to the corresponding channels of the previous feature map. The channel attention mechanism enables the network to focus on more effective semantic features and iteratively raise their weights. The feature extraction network extracts rich semantic features, and different semantic features have different importance for semantic segmentation; introducing channel attention makes the network attend to the more effective features, suppress inefficient ones, and improve the effectiveness of feature extraction.
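
A minimal PyTorch sketch of this squeeze-and-excitation style channel attention follows, covering the squeeze (global average pooling to 1*1*C), excitation (fully connected layers) and Sigmoid reweighting steps described above; the reduction ratio of 16 is an illustrative assumption.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)      # squeeze: H*W*C -> 1*1*C by global average pooling
        self.excite = nn.Sequential(                # excitation: predict per-channel importance
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                           # importance of each channel in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)              # one value per channel
        w = self.excite(w).view(b, c, 1, 1)
        return x * w                                # apply the importance back onto the feature map

# Example: reweight a batch of 256-channel feature maps.
feat = torch.randn(2, 256, 32, 32)
attended = ChannelAttention(256)(feat)
```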

In this embodiment, because the convolutional neural networks used for feature extraction in the prior art, such as VGG, are limited by the number of extraction layers, their deep feature extraction capability is insufficient and feature effectiveness is low. As the number of convolutional layers increases, problems such as slow network convergence and reduced accuracy appear, the feature extraction capability is insufficient, and since the extracted features differ in importance for image reconstruction, it is hard to guarantee highly effective features. Therefore, by performing multi-scale semantic feature extraction on multiple multi-view images, this embodiment can extract deep features and obtain feature maps at multiple scales, and the introduction of the channel attention mechanism makes the network focus on the more effective features, suppress inefficient ones, and improve the effectiveness of feature extraction.

Step S200: perform multi-scale semantic segmentation on the multi-scale feature maps to obtain semantic segmentation sets at multiple scales.

Specifically, the multi-scale feature maps are clustered by non-negative matrix factorization to obtain semantic segmentation sets at multiple scales, where the non-negative matrix factorization is expressed as:

$$\min_{P \ge 0,\; Q \ge 0} \left\lVert V - PQ \right\rVert_F^2$$

where V denotes the matrix with HW rows and C columns obtained by concatenating and reshaping the multi-scale feature maps, P denotes a matrix with HW rows and K columns, Q denotes a matrix with K rows and C columns (in the generic NMF form V = WH, W is the basis matrix and H the coefficient matrix), K denotes the NMF factor giving the number of semantic clusters, C denotes the dimension of each pixel, and F denotes the Frobenius (non-induced) norm.

Ordinary matrix factorization decomposes a large matrix into several smaller matrices, but the elements of those matrices may be positive or negative. In the real world, negative entries in matrices formed from images, text and the like are meaningless, so factoring a matrix into entirely non-negative factors is very meaningful. NMF requires all elements of the original matrix V to be non-negative; the matrix V can then be factored into the product of two smaller non-negative matrices, and there is one and only one such factorization, i.e., it satisfies existence and uniqueness. For example:

given a non-negative matrix V, find a non-negative matrix W and a non-negative matrix H such that V = WH. The factorization can be understood as follows: each column vector of the original matrix V is a weighted sum of all the column vectors of the left matrix, with the weights given by the elements of the corresponding column vector of the right matrix; hence W is called the basis matrix and H the coefficient matrix.

Referring to Fig. 3, the N multi-scale feature maps are first concatenated and reshaped into an (HW, C) matrix V. The NMF is solved with the multiplicative update rules

$$P \leftarrow P \circ \frac{V Q^{\top}}{P Q Q^{\top}}, \qquad Q \leftarrow Q \circ \frac{P^{\top} V}{P^{\top} P Q},$$

and the NMF decomposition in the figure factors V into an (HW, K) matrix P and a (K, C) matrix Q, where K is the NMF factor giving the number of semantic clusters. Owing to the orthogonality constraint of NMF (QQ^T = I), each row of the (K, C) matrix Q can be regarded as a C-dimensional cluster center, and each row of Q corresponds to certain objects in the view. The rows of the (HW, K) matrix P correspond to the positions of all pixels from the N multi-scale feature maps. In general, the factorization forces the product of each row of P with each column of Q to better approximate the C-dimensional feature of each pixel in V. In this way, the semantic category of each position in the image is obtained from the P matrix.
Referring to Fig. 4, suppose the extracted feature maps F_i are semantically segmented by clustering (i.e., NMF), factoring each feature matrix F_i into P_i Q_i. Because high-level feature layers have large receptive fields, their features are more abstract and attend more to the global picture, while low-level feature layers have small receptive fields and attend more to details. The segmentation sets S_i obtained by multi-scale semantic segmentation therefore contain multiple levels from coarse to fine. The segmentation sets S1 to S3 in Fig. 4 contain progressively more detail. Each segmentation set S contains the semantic segmentation results of an input group of images (the reference image and the images to be matched); for example, different colors denote different semantic categories, and a segmentation set containing more detail (such as S3) contains more semantic categories.

In this embodiment, since current deep-learning 3D reconstruction methods are mostly single-scale, that is, objects of different sizes in the image are reconstructed in the same way, single-scale reconstruction maintains good accuracy and speed in environments with low scene complexity and few small objects, but in complex scenes with objects of many scales, small-scale objects tend to be reconstructed with insufficient accuracy; moreover, only high-level features are used, so the low-level detail information of the image is not fully exploited. This embodiment therefore performs multi-scale semantic segmentation on the multi-scale feature maps, aggregates and enriches the semantic information of each scale, and allows the detail information of the low-level feature layers to be fully utilized.

Step S300: reconstruct the multiple multi-view images by a supervised 3D reconstruction method to obtain an initial depth map.

Specifically, in this embodiment the multiple multi-view images are reconstructed by a supervised 3D reconstruction method to obtain an initial depth map.

Obtaining the initial depth map through a supervised 3D reconstruction method improves reconstruction accuracy. Supervised methods are accurate but require large amounts of ground-truth training data, and in certain scenes (for example, underwater) ground truth is hard to obtain, making them hard to apply. Step S400 therefore semantically guides the initial depth map of this embodiment, turning the supervised 3D reconstruction method into an unsupervised one and realizing self-supervised 3D reconstruction, thus overcoming the inherent drawbacks of supervised methods.

It should be noted that the supervised 3D reconstruction method in this embodiment may be any supervised 3D reconstruction method in the prior art, for example MVSNet (MVSNet: Depth Inference for Unstructured Multi-view Stereo), CVP-MVSNet (Cost Volume Pyramid Based Depth Inference for Multi-View Stereo) or PatchmatchNet (PatchmatchNet: Learned Multi-View Patchmatch Stereo); these are not described in detail in this embodiment.

Step S400: obtain depth maps at multiple scales based on the multi-scale semantic segmentation sets and the initial depth map.

Specifically, this embodiment uses semantic information as a supervision signal, combined with a supervised 3D reconstruction method, to guide image reconstruction and obtain the depth maps. The specific process is:

acquire multiple multi-view images with an image acquisition device, and use them as input to a supervised 3D reconstruction method to obtain an initial depth map;

select any one of the multi-view images as the reference image and the others as images to be matched;

select reference points from the reference image, obtain the semantic category corresponding to each reference point in the segmentation set, and obtain the depth value corresponding to each reference point on the initial depth map;

select the number of reference points by the following formula:

$$N_j = \frac{HW}{t} \cdot \frac{K_j}{\sum_i K_i}$$

where N_j denotes the number of reference points selected for the j-th segmentation set, H denotes the height of the multi-view image, W denotes its width, HW denotes the number of pixels in the multi-view image, t denotes a constant parameter, K_j denotes the number of semantic categories contained in the j-th semantic segmentation set, and K_i denotes the number of semantic categories contained in the i-th semantic segmentation set;

based on each reference point, obtain its matching point on the image to be matched by the following formula:

$$P_i' = K\, T\, \big( D(P_i)\, K^{-1} P_i \big)$$

where P_i' denotes the matching point of the i-th reference point on the image to be matched, K denotes the intrinsic parameters of the camera, T denotes the extrinsic parameters of the camera, and D(P_i) denotes the depth value corresponding to the reference point P_i of the reference image on the initial depth map;

获取每个匹配点对应的语义类别,通过最小化语义损失函数对每种尺度的多视角 图像进行修正,获得多种尺度的深度图,语义损失函数

Figure 645707DEST_PATH_IMAGE009
的计算公式如下: Obtain the semantic category corresponding to each matching point, correct the multi-view image of each scale by minimizing the semantic loss function, and obtain depth maps of multiple scales, semantic loss function
Figure 645707DEST_PATH_IMAGE009
The calculation formula is as follows:

$$L_{sem} = \frac{1}{N} \sum_{i=1}^{N} M_i \cdot \Delta(S_i, S_i')$$

where $\Delta(S_i, S_i')$ denotes the difference between the semantic information of the i-th reference point and that of the i-th matching point, $M_i$ denotes the mask, and N denotes the number of reference points.
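A minimal sketch of this loss follows, treating the semantic difference $\Delta$ as a 0/1 category mismatch and $M_i$ as a validity mask for matching points that fall inside the image; both readings are assumptions of the sketch, since the exact difference measure is not spelled out here.

```python
import numpy as np

def semantic_loss(ref_labels, matched_labels, mask):
    """Masked mean disagreement between the categories of the
    reference points and of their matching points.

    ref_labels, matched_labels : (N,) integer semantic categories
    mask : (N,) array with 1 where the matching point is valid
           (e.g. falls inside the image bounds) and 0 otherwise
    """
    diff = (ref_labels != matched_labels).astype(np.float64)
    return float(np.sum(mask * diff) / len(ref_labels))
```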

This embodiment is illustrated with the following example: first, multiple multi-view images of the same object from different viewing angles are acquired with an image acquisition device, and with these multi-view images as input, an initial depth map is obtained through a supervised three-dimensional reconstruction method. One of the input multi-view images is selected as the reference image and the rest serve as the images to be matched; a reference point $P_i$ is taken on the reference image, together with its corresponding semantic category $S_i$ on the segmentation set S and its corresponding depth value on the depth map.

For segmentation sets at different levels, the number of semantic categories differs; a segmentation set with more categories requires more refined guidance and therefore more reference points, whose number is selected according to the formula:

$$N_j = \frac{HW}{t} \cdot \frac{C_j}{\sum_{i=1}^{n} C_i}$$

The matching point $P_i'$ corresponding to the reference point on the image to be matched is calculated with the homography formula:

$$P_i' = K \, T \, \big( D(P_i) \cdot K^{-1} P_i \big)$$

Take the semantic category $S_i'$ of the matching point $P_i'$. If the depth map is accurate at the reference point (that is, the depth value at the corresponding position is correct), the semantic category of the computed matching point should be the same as that of the reference point; the following semantic loss function is therefore calculated and minimized:

$$L_{sem} = \frac{1}{N} \sum_{i=1}^{N} M_i \cdot \Delta(S_i, S_i')$$

By minimizing the semantic loss function, the initial depth map is corrected continuously until an accurate depth map is obtained. The semantic information can replace ground-truth values as the guidance signal, turning a supervised three-dimensional reconstruction method into an unsupervised one and realizing self-supervised three-dimensional reconstruction, thereby overcoming the inherent defects of supervised methods.
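One way to realize this correction step without ground truth is a per-point search over depth candidates around the initial value, keeping the candidate whose warped point agrees semantically. The candidate sweep below is an assumption of this sketch (it reuses warp_to_matching_view from the sketch above); a gradient-based refinement over soft semantic maps would serve the same role.

```python
import numpy as np

def refine_depth_at_point(p_ref, d_init, ref_label, src_labels,
                          K_ref, K_src, R, t, rel_range=0.1, steps=21):
    """Sweep depth candidates around d_init and keep the candidate
    whose matching point shares the reference point's semantic
    category; out-of-bounds candidates count as masked (M_i = 0)."""
    best_d, best_err = d_init, np.inf
    for d in np.linspace(d_init * (1 - rel_range),
                         d_init * (1 + rel_range), steps):
        u, v = warp_to_matching_view(p_ref, d, K_ref, K_src, R, t)
        ui, vi = int(round(u)), int(round(v))
        h, w = src_labels.shape
        if not (0 <= vi < h and 0 <= ui < w):
            continue
        err = float(src_labels[vi, ui] != ref_label)
        if err < best_err:
            best_d, best_err = d, err
    return best_d
```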

In this embodiment, the semantics of an image can be divided into three layers: the visual layer, the object layer and the concept layer. Visual-layer semantics cover colors, lines, contours and the like; object-layer semantics cover the various objects; concept-layer semantics involve the understanding of the scene. In the prior art, some three-dimensional reconstruction methods also make use of semantic guidance, but single-scale high-level abstract semantic information (the object layer), while reasonably accurate for reconstructing large-scale objects, is relatively coarse for small-scale reconstruction tasks, where the reconstruction accuracy is poor.

Therefore, this embodiment takes multiple multi-view images as input and obtains an initial depth map through a supervised three-dimensional reconstruction method, then obtains depth maps of multiple scales based on the semantic segmentation sets of multiple scales and the initial depth map. The semantic information of each scale in the multi-scale semantic segmentation sets is used to semantically guide the initial depth map, so that the initial depth map is continuously corrected and accurate depth maps of multiple scales are obtained.

Step S500: based on the depth maps of multiple scales, construct point cloud sets of multiple scales.

Specifically, the depth map of each scale is converted into a point cloud set of the same scale through the following expression:

$$z = D(u, v), \qquad x = \frac{u \cdot z}{f_x}, \qquad y = \frac{v \cdot z}{f_y}$$

where u denotes the abscissa of the depth map, v denotes the ordinate of the depth map, $f_x$ and $f_y$ denote the camera focal lengths obtained from the camera parameters, D(u, v) denotes the depth value at (u, v), and x, y and z denote the coordinates of the converted point cloud.
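A sketch of this conversion follows; the principal point (cx, cy) is included as an assumption for generality, since the expression above names only the focal lengths.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx=0.0, cy=0.0):
    """Back-project a depth map of shape (H, W) into an (H*W, 3)
    point cloud using the pinhole relations above."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```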

Step S600: according to the scale of each point cloud set, optimize the point cloud sets of multiple scales with different radius filters to obtain optimized point cloud sets.

Specifically, the point cloud sets of multiple scales are obtained; every point in the point cloud set of each scale has a corresponding radius and a preset number of neighboring points;

According to the scale of the point cloud set, the radius corresponding to the points in the point cloud set is calculated with the following formula:

$$r_l = \alpha \cdot t^{\,l}$$

where $r_l$ denotes the radius corresponding to the points in the point cloud sets of different scales, $\alpha$ denotes a constant parameter, t denotes a constant parameter, and l denotes the preset scale level of each point cloud set;

The point cloud sets of multiple scales are optimized according to the radius corresponding to each point and the preset number of neighboring points, and the optimized point cloud sets are obtained.

In this embodiment, the point cloud sets of different scales obtained from the depth maps need radius filtering to remove noisy points and optimize the point cloud data. Because the degree of point aggregation differs across scales, different radius filters are adopted for point cloud sets of different scales. Radius filtering first obtains the radius corresponding to each point and presets the number of neighboring points; only the points that have a sufficient number of neighbors within that radius are retained, and the rest are filtered out. For the multi-scale point cloud sets of this embodiment, the semantic category of each point in the segmentation set must also be considered; that is, only the points that have n neighbors of the same semantic category within the radius are retained.
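A brute-force sketch of this semantically constrained radius filter follows; the pairwise-distance loop is illustrative only (a KD-tree would replace it at realistic point counts), and the parameter names are assumptions.

```python
import numpy as np

def semantic_radius_filter(points, labels, radius, n_required):
    """Keep points that have at least n_required neighbors of the
    same semantic category within `radius`.

    points : (M, 3) point cloud coordinates
    labels : (M,) semantic categories from the segmentation set
    """
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    keep = np.zeros(len(points), dtype=bool)
    for i in range(len(points)):
        same = (labels == labels[i]) & (d2[i] <= radius ** 2)
        same[i] = False                   # do not count the point itself
        keep[i] = same.sum() >= n_required
    return points[keep], labels[keep]
```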

Step S700: perform reconstruction at different scales based on the optimized point cloud sets to obtain three-dimensional reconstruction results at different scales.

Specifically, the point cloud sets of different scales are optimized in step S600 to obtain optimized point cloud sets of different scales, and the optimized point cloud set of each scale is reconstructed to obtain three-dimensional reconstruction results at the different scales.

Step S800: splice and fuse the three-dimensional reconstruction results of each scale to obtain the final three-dimensional reconstruction result.

Specifically, the three-dimensional reconstruction results of each scale are spliced and fused to obtain the final three-dimensional reconstruction result. In this embodiment, reconstruction at different scales is performed in step S700 based on the optimized point cloud sets; since the optimized point cloud sets are more accurate, the final three-dimensional reconstruction result obtained in this embodiment is also more accurate.

In this embodiment, multiple multi-view images are acquired and multi-scale semantic feature extraction is performed on them to obtain feature maps of multiple scales; multi-scale semantic segmentation is performed on the feature maps of multiple scales to obtain semantic segmentation sets of multiple scales. By performing multi-scale semantic feature extraction on the multi-view images, deep-level features can be extracted and feature maps of multiple scales obtained; performing multi-scale semantic segmentation on these feature maps aggregates and enriches the semantic information of each scale. This embodiment takes the multiple multi-view images as input and obtains an initial depth map through a supervised three-dimensional reconstruction method, and obtains depth maps of multiple scales based on the semantic segmentation sets of multiple scales and the initial depth map: the semantic information of each scale in the multi-scale semantic segmentation sets is used to semantically guide the initial depth map, so that it is continuously corrected and accurate depth maps of multiple scales are obtained. Based on the depth maps of multiple scales, point cloud sets of multiple scales are constructed; according to the scale of each point cloud set, different radius filters are used for optimization to obtain optimized point cloud sets; reconstruction at different scales is performed on the optimized point cloud sets to obtain reconstruction results at different scales; and the reconstruction results of each scale are spliced and fused to obtain the final reconstruction result. The depth maps of multiple scales are thus used to build point cloud sets of multiple scales, the point cloud sets are optimized with radius filters chosen according to their scale, the optimized point cloud sets are used for reconstruction at different scales, and the reconstruction results are fused, yielding a more accurate final result. This embodiment can make full use of the semantic information of each scale and can improve the accuracy of three-dimensional reconstruction.

Referring to FIG. 5, an embodiment of the present invention provides a multi-view three-dimensional reconstruction system based on deep learning. The system includes a feature map acquisition unit 100, a semantic segmentation set acquisition unit 200, an initial depth map acquisition unit 300, a depth map acquisition unit 400, a point cloud set acquisition unit 500, a radius filtering unit 600, a reconstruction result acquisition unit 700 and a reconstruction result fusion unit 800, wherein:

The feature map acquisition unit 100 is configured to acquire multi-view images and perform multi-scale semantic feature extraction on them to obtain feature maps of multiple scales;

The semantic segmentation set acquisition unit 200 is configured to perform multi-scale semantic segmentation on the feature maps of multiple scales to obtain semantic segmentation sets of multiple scales;

The initial depth map acquisition unit 300 is configured to reconstruct the multiple multi-view images through a supervised three-dimensional reconstruction method to obtain an initial depth map;

The depth map acquisition unit 400 is configured to obtain depth maps of multiple scales based on the semantic segmentation sets of multiple scales and the initial depth map;

The point cloud set acquisition unit 500 is configured to construct point cloud sets of multiple scales based on the depth maps of multiple scales;

The radius filtering unit 600 is configured to optimize the point cloud sets of multiple scales with different radius filters according to the scale of each point cloud set, to obtain optimized point cloud sets;

The reconstruction result acquisition unit 700 is configured to perform reconstruction at different scales based on the optimized point cloud sets to obtain three-dimensional reconstruction results at different scales;

The reconstruction result fusion unit 800 is configured to splice and fuse the reconstruction results of each scale to obtain the final three-dimensional reconstruction result.

It should be noted that, since the deep-learning-based multi-view three-dimensional reconstruction system in this embodiment is based on the same inventive concept as the deep-learning-based multi-view three-dimensional reconstruction method described above, the corresponding content of the method embodiments also applies to this system embodiment and is not described in detail here.

An embodiment of the present invention also provides a deep-learning-based multi-view three-dimensional reconstruction device, including: at least one control processor and a memory communicatively connected to the at least one control processor.

As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some implementations, the memory optionally includes memories located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.

The non-transitory software programs and instructions required to implement the deep-learning-based multi-view three-dimensional reconstruction method of the above embodiments are stored in the memory; when executed by the processor, they perform the deep-learning-based multi-view three-dimensional reconstruction method of the above embodiments, for example, the method steps S100 to S800 in FIG. 1 described above.

The system embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

An embodiment of the present invention also provides a computer-readable storage medium storing computer-executable instructions which, when executed by one or more control processors, cause the one or more control processors to perform the deep-learning-based multi-view three-dimensional reconstruction method of the above method embodiments, for example, the functions of method steps S100 to S800 in FIG. 1 described above.

From the description of the above implementations, those skilled in the art can clearly understand that each implementation can be realized by means of software plus a general-purpose hardware platform. Those skilled in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing relevant hardware through a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the above embodiments; various changes can also be made within the scope of knowledge possessed by those of ordinary skill in the art without departing from the spirit of the present invention.

Claims (9)

1. A multi-view three-dimensional reconstruction method based on deep learning is characterized by comprising the following steps:
acquiring a plurality of multi-view images, and performing multi-scale semantic feature extraction on the plurality of multi-view images to obtain feature maps of various scales;
performing multi-scale semantic segmentation on the feature maps of multiple scales to obtain semantic segmentation sets of multiple scales;
reconstructing a plurality of multi-view images by a supervised three-dimensional reconstruction method to obtain an initial depth map;
obtaining the depth maps of multiple scales based on the semantic segmentation sets of multiple scales and the initial depth map, specifically:
selecting any one of the multiple multi-view images as a reference image, and taking the others as images to be matched;
selecting a reference point from the reference image, acquiring a semantic category corresponding to the reference point in the semantic segmentation set, and acquiring a depth value corresponding to the reference point on the initial depth image;
the number of reference points is chosen by the following formula:
$$N_j = \frac{HW}{t} \cdot \frac{C_j}{\sum_{i=1}^{n} C_i}$$
wherein $N_j$ represents the number of reference points selected by the j-th segmentation set, H represents the height of the multi-view image, W represents the width of the multi-view image, HW represents the number of pixels of the multi-view image, t represents a constant parameter, $C_j$ represents the number of semantic categories contained in the j-th said semantic segmentation set, $C_i$ represents the number of semantic categories contained in the i-th semantic segmentation set, and n represents the total number of the semantic segmentation sets;
based on each reference point, acquiring a matching point of each reference point on the graph to be matched through the following formula:
$$P_i' = K \, T \, \big( D(P_i) \cdot K^{-1} P_i \big)$$
wherein $P_i'$ represents the matching point of the i-th reference point on the graph to be matched, K represents the internal parameters of the camera, T represents the external parameters of the camera, and $D(P_i)$ represents the depth value corresponding to the reference point $P_i$ of the reference map on the initial depth map;
obtaining semantic categories corresponding to each matching point, and correcting the multi-view images of each scale by minimizing a semantic loss function to obtain the depth maps of various scales, wherein the semantic loss function $L_{sem}$ is calculated as follows:

$$L_{sem} = \frac{1}{N} \sum_{i=1}^{N} M_i \cdot \Delta(S_i, S_i')$$
wherein $\Delta(S_i, S_i')$ represents the difference between the semantic information of the i-th reference point and the semantic information of the i-th matching point, $M_i$ represents a mask, and N represents the number of said reference points;
constructing a point cloud set with multiple scales based on the depth maps with multiple scales;
according to the scale of the point cloud set, different radius filtering is adopted for the point cloud sets with various scales to carry out optimization, and the optimized point cloud set is obtained;
reconstructing at different scales based on the optimized point cloud set to obtain three-dimensional reconstruction results at different scales;
and splicing and fusing the three-dimensional reconstruction results of each scale to obtain a final three-dimensional reconstruction result.
2. The deep learning-based multi-view three-dimensional reconstruction method according to claim 1, wherein the performing multi-scale semantic feature extraction on the multiple multi-view images to obtain feature maps of multiple scales comprises:
performing multi-layer feature extraction on the multi-view images through a ResNet network to obtain original feature maps with various scales;
and respectively connecting the original feature map of each scale with channel attention so as to carry out importance weighting on the original feature map of each scale through a channel attention mechanism and obtain feature maps of various scales.
3. The deep learning-based multi-view three-dimensional reconstruction method according to claim 2, wherein the weighting of importance of the original feature map of each scale through a channel attention mechanism to obtain feature maps of multiple scales comprises:
compressing the original characteristic diagram of each scale through a compression network to obtain a one-dimensional characteristic diagram corresponding to the original characteristic diagram of each scale;
inputting the one-dimensional characteristic diagram into a full-connection layer through an excitation network to perform importance prediction, and obtaining the importance of each channel;
and exciting the importance of each channel to the one-dimensional characteristic diagram of the original characteristic diagram of each scale through an excitation function to obtain characteristic diagrams of various scales.
4. The deep learning-based multi-view three-dimensional reconstruction method according to claim 1, wherein the performing multi-scale semantic segmentation on the feature maps of multiple scales to obtain semantic segmentation sets of multiple scales includes:
clustering the characteristic graphs of multiple scales through nonnegative matrix decomposition to obtain semantic segmentation sets of multiple scales; wherein the expression of the non-negative matrix factorization is:
$$\min_{P \ge 0,\, Q \ge 0} \; \| V - PQ \|_F^2$$

wherein the feature maps of the multiple scales are mapped, concatenated and reshaped into a matrix V with HW rows and C columns, P represents the coefficient matrix with HW rows and K columns, Q represents the basis matrix with K rows and C columns, K represents the non-negative matrix factorization factor given by the number of semantic clusters, C represents the dimension of each pixel, and F denotes the Frobenius norm.
5. The method for multi-view three-dimensional reconstruction based on deep learning of claim 1, wherein the constructing the point cloud sets of multiple scales based on the depth maps of multiple scales comprises:
constructing a point cloud set of each scale by using the depth map of each scale according to the following expression:
$$z = D(u, v), \qquad x = \frac{u \cdot z}{f_x}, \qquad y = \frac{v \cdot z}{f_y}$$

wherein u represents the abscissa of the depth map, v represents the ordinate of the depth map, $f_x$ and $f_y$ represent the camera focal lengths obtained from the camera parameters, and x, y and z represent the point cloud coordinates of the point cloud transformation.
6. The deep learning-based multi-view three-dimensional reconstruction method according to claim 1, wherein the optimization of the point cloud sets of multiple scales by using different radius filters according to the scales of the point cloud sets to obtain an optimized point cloud set comprises:
acquiring the point cloud sets of multiple scales, wherein the point cloud in the point cloud set of each scale has a corresponding radius and a preset number of adjacent points;
calculating the corresponding radius of the point cloud in the point cloud set by adopting the following formula according to the scale of the point cloud set:
$$r_l = \alpha \cdot t^{\,l}$$

wherein $r_l$ represents the radius corresponding to the points in the point cloud sets of different scales, $\alpha$ represents a constant parameter, t represents a constant parameter, and l represents the preset scale level of each point cloud set;
and optimizing the point cloud sets with various scales according to the radius corresponding to each point cloud and the preset number of adjacent points to obtain an optimized point cloud set.
7. A deep learning based multi-view three-dimensional reconstruction system, comprising:
the characteristic diagram acquisition unit is used for acquiring multi-view images, and performing multi-scale semantic feature extraction on the multi-view images to acquire characteristic diagrams of multiple scales;
the semantic segmentation set acquisition unit is used for carrying out multi-scale semantic segmentation on the feature maps with various scales to obtain a semantic segmentation set with various scales;
the initial depth map acquisition unit is used for reconstructing a plurality of multi-view images by a supervised three-dimensional reconstruction method to obtain an initial depth map;
a depth map obtaining unit, configured to obtain depth maps of multiple scales based on the multiple-scale semantic segmentation sets and the initial depth map, specifically:
selecting any one of the multiple multi-view images as a reference image, and taking the other images as images to be matched;
selecting a reference point from the reference image, acquiring a semantic category corresponding to the reference point in the semantic segmentation set, and acquiring a depth value corresponding to the reference point on the initial depth image;
the number of reference points is chosen by the following formula:
$$N_j = \frac{HW}{t} \cdot \frac{C_j}{\sum_{i=1}^{n} C_i}$$
wherein $N_j$ represents the number of reference points selected by the j-th segmentation set, H represents the height of the multi-view image, W represents the width of the multi-view image, HW represents the number of pixels of the multi-view image, t represents a constant parameter, $C_j$ represents the number of semantic categories contained in the j-th said semantic segmentation set, $C_i$ represents the number of semantic categories contained in the i-th semantic segmentation set, and n represents the total number of the semantic segmentation sets;
based on each reference point, obtaining the matching point of each reference point on the graph to be matched through the following formula:
$$P_i' = K \, T \, \big( D(P_i) \cdot K^{-1} P_i \big)$$
wherein $P_i'$ represents the matching point of the i-th reference point on the graph to be matched, K represents the internal parameters of the camera, T represents the external parameters of the camera, and $D(P_i)$ represents the depth value corresponding to the reference point $P_i$ of the reference map on the initial depth map;
obtaining the semantic category corresponding to each matching point, and correcting the multi-view image of each scale by minimizing a semantic loss function to obtain the depth maps of multiple scales, wherein the semantic loss function $L_{sem}$ is calculated as follows:

$$L_{sem} = \frac{1}{N} \sum_{i=1}^{N} M_i \cdot \Delta(S_i, S_i')$$
wherein $\Delta(S_i, S_i')$ represents the difference between the semantic information of the i-th reference point and the semantic information of the i-th matching point, $M_i$ represents a mask, and N represents the number of said reference points;
the point cloud set acquisition unit is used for constructing point cloud sets with various scales based on the depth maps with various scales;
the radius filtering unit is used for optimizing the point cloud sets with various scales by adopting different radius filtering according to the scales of the point cloud sets to obtain the optimized point cloud sets;
a reconstruction result obtaining unit, configured to perform reconstruction of different scales based on the optimized point cloud set, so as to obtain three-dimensional reconstruction results of different scales;
and the reconstruction result fusion unit is used for splicing and fusing the reconstruction results of each scale to obtain a final three-dimensional reconstruction result.
8. A deep learning based multi-view three-dimensional reconstruction device comprising at least one control processor and a memory for communicative connection with the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the method of deep learning based multi-view three-dimensional reconstruction according to any one of claims 1 to 6.
9. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method for deep learning based multi-view three-dimensional reconstruction according to any one of claims 1 to 6.
CN202211087276.9A 2022-09-07 2022-09-07 Multi-view three-dimensional reconstruction method, system and equipment based on deep learning Active CN115170746B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211087276.9A CN115170746B (en) 2022-09-07 2022-09-07 Multi-view three-dimensional reconstruction method, system and equipment based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211087276.9A CN115170746B (en) 2022-09-07 2022-09-07 Multi-view three-dimensional reconstruction method, system and equipment based on deep learning

Publications (2)

Publication Number Publication Date
CN115170746A CN115170746A (en) 2022-10-11
CN115170746B (en) 2022-11-22

Family

ID=83481918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211087276.9A Active CN115170746B (en) 2022-09-07 2022-09-07 Multi-view three-dimensional reconstruction method, system and equipment based on deep learning

Country Status (1)

Country Link
CN (1) CN115170746B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457101B (en) * 2022-11-10 2023-03-24 武汉图科智能科技有限公司 Edge-preserving multi-view depth estimation and ranging method for unmanned aerial vehicle platform
CN118096995A (en) * 2022-11-21 2024-05-28 华为云计算技术有限公司 Three-dimensional twin method and device
CN117593454B * 2023-11-21 2024-07-19 重庆市祥和大宇包装有限公司 Three-dimensional reconstruction and target surface planar point cloud generation method
CN117876397B (en) * 2024-01-12 2024-06-18 浙江大学 Bridge member three-dimensional point cloud segmentation method based on multi-view data fusion
CN118644640B (en) * 2024-08-09 2024-10-29 宁波博海深衡科技有限公司 A method and system for underwater image 3D reconstruction based on deep learning

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715504A (en) * 2015-02-12 2015-06-17 四川大学 Robust large-scene dense three-dimensional reconstruction method
CN106157307B (en) * 2016-06-27 2018-09-11 浙江工商大学 A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF
US11004202B2 (en) * 2017-10-09 2021-05-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for semantic segmentation of 3D point clouds
CN108388639B (en) * 2018-02-26 2022-02-15 武汉科技大学 A cross-media retrieval method based on subspace learning and semi-supervised regularization
JP7422785B2 (en) * 2019-05-17 2024-01-26 マジック リープ, インコーポレイテッド Method and apparatus for angle detection using neural networks and angle detectors
US11645756B2 (en) * 2019-11-14 2023-05-09 Samsung Electronics Co., Ltd. Image processing apparatus and method
CN111340186B (en) * 2020-02-17 2022-10-21 之江实验室 Compressed representation learning method based on tensor decomposition
CN112734915A (en) * 2021-01-19 2021-04-30 北京工业大学 Multi-view stereoscopic vision three-dimensional scene reconstruction method based on deep learning
CN113066168B (en) * 2021-04-08 2022-08-26 云南大学 Multi-view stereo network three-dimensional reconstruction method and system
CN113673400A (en) * 2021-08-12 2021-11-19 土豆数据科技集团有限公司 Real scene three-dimensional semantic reconstruction method and device based on deep learning and storage medium
CN114881867A (en) * 2022-03-24 2022-08-09 山西三友和智慧信息技术股份有限公司 Image denoising method based on deep learning
CN114677479A (en) * 2022-04-13 2022-06-28 温州大学大数据与信息技术研究院 Natural landscape multi-view three-dimensional reconstruction method based on deep learning

Also Published As

Publication number Publication date
CN115170746A (en) 2022-10-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant