
CN110717917B - CNN-based semantic segmentation depth prediction method and device - Google Patents

Info

Publication number: CN110717917B (application CN201910944866.0A)
Authority: CN (China)
Prior art keywords: current frame, map, frame, CNN, depth
Legal status: Active (assumed status; not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN110717917A (en)
Inventor: 吴霞
Current and original assignee: Beijing Moviebook Science And Technology Co ltd
Application filed by Beijing Moviebook Science And Technology Co ltd
Application granted; publication of CN110717917A, then of CN110717917B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/11 Region-based segmentation (under G06T 7/10 Segmentation; Edge detection)
    • G06T 7/33 Image registration using feature-based methods (under G06T 7/30 Determination of transform parameters for the alignment of images)
    • G06T 7/50 Depth or shape recovery
    • G06T 2207/10028 Range image; Depth image; 3D point clouds (under G06T 2207/10 Image acquisition modality)
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/02 Neural networks
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks (under G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/045 Combinations of networks
    • G06N 3/084 Backpropagation, e.g. using gradient descent (under G06N 3/08 Learning methods)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a CNN-based semantic segmentation depth prediction method and device, relating to the field of semantic segmentation. The method comprises: acquiring a current frame from a data set and computing the ORB features and map point cloud of the current frame; according to the ORB features and map point cloud of the current frame, selecting the current frame as a keyframe when it is determined that specified conditions are satisfied; and using a CNN convolutional neural network to associate the keyframe with a depth map for depth prediction, while using the CNN to annotate the depth map with semantic labels to establish semantic associations. The device comprises a calculation module, a selection module, and a prediction module. Because not every image is fed into the model, the memory problem is solved; moreover, adjusting the depth regression through the CNN alleviates the problem of inaccurate absolute scale in 3D reconstruction.

Description

Depth prediction method and device based on CNN semantic segmentation

Technical Field

The present application relates to the field of semantic segmentation, and in particular to a CNN-based semantic segmentation depth prediction method and device.

Background

Semantic SLAM (Simultaneous Localization And Mapping) is applied in directions such as image semantic segmentation and semantic map construction, extending the scope of the traditional SLAM problem, and several studies have already integrated semantic information into SLAM. Examples include using the geometric consistency between images obtained by a SLAM system to improve image semantic segmentation, or using the results of semantic segmentation and mapping to improve SLAM localization and loop closure.

Existing semantic SLAM models such as MaskFusion are essentially RGB-D SLAM combined with Mask R-CNN semantic segmentation, i.e. a fusion of a geometric SLAM model and a deep-learning SLAM model. They mainly address understanding the environment at the object level: while accurately segmenting moving targets, they can recognize, detect, track, and reconstruct objects. For each new frame of data, the algorithm performs the following steps: 1) Tracking: independent model objects are obtained through segmentation, and the 6DoF (degrees-of-freedom) pose of each object is determined by minimizing an energy function; only the dynamic models are tracked. 2) Two methods of deciding whether an object is moving are compared: one based on motion inconsistency, the other treating objects that are being touched by a person as dynamic. 3) Segmentation is performed with Mask R-CNN together with a segmentation algorithm based on depth and surface-normal discontinuities. 4) Fusion: the geometric structure of each object is combined with its label.

However, deep-learning-based semantic SLAM models have no concept of keyframes and process images one at a time, while geometry-based monocular SLAM cannot resolve depth and is not sufficiently robust in terms of geometric consistency.

Summary of the Invention

The purpose of the present application is to overcome the above problems, or at least to partially solve or alleviate them.

According to one aspect of the present application, a CNN-based semantic segmentation depth prediction method is provided, comprising:

acquiring a current frame from a data set, and computing the ORB features and map point cloud of the current frame;

according to the ORB features and map point cloud of the current frame, selecting the current frame as a keyframe when it is determined that specified conditions are satisfied;

using a CNN (Convolutional Neural Network) to associate the keyframe with a depth map for depth prediction, and using the CNN to annotate the depth map with semantic labels to establish semantic associations.
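Purely as an editorial sketch, not part of the patent text, the three steps above can be arranged as the following Python pipeline skeleton; `frontend`, `selector`, and `cnn`, together with all of their method names, are hypothetical placeholders rather than a prescribed API:

```python
# Editorial sketch of the three-step method; not part of the patent text.
# "frontend", "selector", and "cnn" are hypothetical placeholder objects.

def run_pipeline(dataset, frontend, selector, cnn):
    keyframes = []
    for frame in dataset:
        # Step 1: compute ORB features and the map point cloud
        features, map_points = frontend.process(frame)
        # Step 2: keep the frame only if a keyframe condition holds
        if not selector.is_keyframe(frame, features, map_points):
            continue
        # Step 3: CNN depth prediction plus semantic labeling
        depth_map, uncertainty = cnn.predict_depth(frame)
        labels = cnn.predict_semantics(frame)
        keyframes.append((frame, depth_map, uncertainty, labels))
    return keyframes
```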

Optionally, according to the ORB features and map point cloud of the current frame, selecting the current frame as a keyframe when it is determined that the specified conditions are satisfied comprises:

when local map construction is idle, or more than a specified number M of frames have passed since the last keyframe, selecting the current frame as the keyframe; or,

if fewer than a specified number N of ORB feature points in the current frame are successfully matched with the map point cloud, selecting the current frame as a keyframe; or,

if the ORB feature points in the current frame that are successfully matched with the map point cloud fall below a specified proportion of the map points, selecting the current frame as a keyframe.

Optionally, using a CNN convolutional neural network to associate the keyframe with a depth map for depth prediction comprises:

using the CNN to estimate the depth of the keyframe to generate a depth map, measuring the pixel-wise confidence of each depth prediction, and constructing a corresponding uncertainty map;

fusing the generated depth map and uncertainty map of the keyframe with the depth map and uncertainty map of the nearest keyframe, and obtaining a reconstructed 3D scene after updating the point cloud.

Optionally, using the CNN to annotate the depth map with semantic labels to establish semantic associations comprises:

using a CNN that extends a residual network into a fully convolutional network to predict semantic labels, and annotating them on the depth map to establish the semantic associations.

Optionally, the CNN is trained using backpropagation and SGD (stochastic gradient descent) to minimize a cross-entropy loss computed on the output of a softmax layer.

According to another aspect of the present application, a CNN-based semantic segmentation depth prediction device is provided, comprising:

a calculation module configured to acquire a current frame from a data set and compute the ORB features and map point cloud of the current frame;

a selection module configured to select the current frame as a keyframe when it determines, according to the ORB features and map point cloud of the current frame, that the specified conditions are satisfied;

a prediction module configured to use a CNN convolutional neural network to associate the keyframe with a depth map for depth prediction, and to use the CNN to annotate the depth map with semantic labels to establish semantic associations.

Optionally, the selection module is specifically configured to:

select the current frame as the keyframe when local map construction is idle, or when more than a specified number M of frames have passed since the last keyframe; or,

select the current frame as a keyframe if fewer than a specified number N of ORB feature points in the current frame are successfully matched with the map point cloud; or,

select the current frame as a keyframe if the ORB feature points in the current frame that are successfully matched with the map point cloud fall below a specified proportion of the map points.

Optionally, the prediction module is specifically configured to:

estimate the depth of the keyframe with the CNN to generate a depth map, measure the pixel-wise confidence of each depth prediction, and construct a corresponding uncertainty map;

fuse the generated depth map and uncertainty map of the keyframe with the depth map and uncertainty map of the nearest keyframe, and obtain a reconstructed 3D scene after updating the point cloud.

Optionally, the prediction module is further configured to:

use a CNN that extends a residual network into a fully convolutional network to predict semantic labels, and annotate them on the depth map to establish the semantic associations.

Optionally, the CNN is trained using backpropagation and SGD stochastic gradient descent to minimize a cross-entropy loss computed on the output of a softmax layer.

According to yet another aspect of the present application, a computing device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor implements the method described above when executing the computer program.

According to yet another aspect of the present application, a computer-readable storage medium is provided, preferably a non-volatile readable storage medium, in which a computer program is stored; the computer program, when executed by a processor, implements the method described above.

According to yet another aspect of the present application, a computer program product is provided, comprising computer-readable code that, when executed by a computer device, causes the computer device to perform the method described above.

In the technical solution provided by the present application, a current frame is acquired from a data set; the ORB features and map point cloud of the current frame are computed; the current frame is selected as a keyframe when it is determined, according to those ORB features and the map point cloud, that the specified conditions are satisfied; a CNN convolutional neural network associates the keyframe with a depth map for depth prediction; and the CNN annotates the depth map with semantic labels to establish semantic associations. Because keyframes are selected based on ORB features, not every image is fed into the model, which solves the memory problem; moreover, adjusting the depth regression through the CNN alleviates the problem of inaccurate absolute scale in 3D reconstruction.

The above and other objects, advantages, and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments of the present application in conjunction with the accompanying drawings.

Brief Description of the Drawings

Some specific embodiments of the present application are described in detail below, by way of example and not limitation, with reference to the accompanying drawings. The same reference numerals in the drawings denote the same or similar parts. Those skilled in the art will understand that the drawings are not necessarily drawn to scale. In the drawings:

Figure 1 is a flowchart of a CNN-based semantic segmentation depth prediction method according to an embodiment of the present application;

Figure 2 is a flowchart of a CNN-based semantic segmentation depth prediction method according to another embodiment of the present application;

Figure 3 is a structural diagram of a CNN-based semantic segmentation depth prediction device according to another embodiment of the present application;

Figure 4 is a structural diagram of a computing device according to another embodiment of the present application;

Figure 5 is a structural diagram of a computer-readable storage medium according to another embodiment of the present application.

Detailed Description

The experimental data set used in this application is the KITTI data set (created jointly by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute at Chicago), currently the largest international benchmark for evaluating computer vision algorithms in autonomous driving scenarios. The KITTI acquisition platform comprises 2 grayscale cameras, 2 color cameras, a Velodyne 3D lidar, 4 optical lenses, and a GPS navigation system. The entire data set consists of 389 stereo-image and optical-flow pairs, 39.2 km of visual odometry sequences, and more than 200,000 images of 3D-annotated objects; each image contains up to 15 vehicles and 30 pedestrians, with varying degrees of occlusion.

Figure 1 is a flowchart of a CNN-based semantic segmentation depth prediction method according to an embodiment of the present application. Referring to Figure 1, the method comprises:

101: acquire a current frame from the data set, and compute the ORB features and map point cloud of the current frame;

102: according to the ORB features and map point cloud of the current frame, select the current frame as a keyframe when the specified conditions are determined to be satisfied;

103: use a CNN convolutional neural network to associate the keyframe with a depth map for depth prediction, and use the CNN to annotate the depth map with semantic labels to establish semantic associations.

In this embodiment, optionally, selecting the current frame as a keyframe when the specified conditions on the ORB features and map point cloud of the current frame are determined to be satisfied comprises:

when local map construction is idle, or more than a specified number M of frames have passed since the last keyframe, selecting the current frame as the keyframe; or,

if fewer than a specified number N of ORB feature points in the current frame are successfully matched with the map point cloud, selecting the current frame as a keyframe; or,

if the ORB feature points in the current frame that are successfully matched with the map point cloud fall below a specified proportion of the map points, selecting the current frame as a keyframe.

In this embodiment, optionally, using a CNN convolutional neural network to associate the keyframe with a depth map for depth prediction comprises:

using the CNN to estimate the depth of the keyframe to generate a depth map, measuring the pixel-wise confidence of each depth prediction, and constructing a corresponding uncertainty map;

fusing the generated depth map and uncertainty map of the keyframe with the depth map and uncertainty map of the nearest keyframe, and obtaining a reconstructed 3D scene after updating the point cloud.

In this embodiment, optionally, using the CNN to annotate the depth map with semantic labels to establish semantic associations comprises:

using a CNN that extends a residual network into a fully convolutional network to predict semantic labels, and annotating them on the depth map to establish the semantic associations.

In this embodiment, optionally, the CNN is trained using backpropagation and SGD stochastic gradient descent to minimize a cross-entropy loss computed on the output of a softmax layer.

In the above method provided by this embodiment, a current frame is acquired from a data set; the ORB features and map point cloud of the current frame are computed; the current frame is selected as a keyframe when the specified conditions on those features are determined to be satisfied; a CNN convolutional neural network associates the keyframe with a depth map for depth prediction; and the CNN annotates the depth map with semantic labels to establish semantic associations. Because keyframes are selected based on ORB features, not every image is fed into the model, which solves the memory problem; moreover, adjusting the depth regression through the CNN alleviates the problem of inaccurate absolute scale in 3D reconstruction.

Figure 2 is a flowchart of a CNN-based semantic segmentation depth prediction method according to another embodiment of the present application. Referring to Figure 2, the method comprises:

201: acquire a current frame from the data set, and compute the ORB features and map point cloud of the current frame;

In this embodiment, ORB is a form of feature representation: an ORB feature consists of a keypoint and a descriptor.
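As a minimal sketch, not part of the patent text, ORB keypoints and descriptors can be computed with OpenCV as follows; the image path and the feature budget are placeholder values:

```python
import cv2

# Illustrative ORB extraction; the patent does not prescribe a library.
# "frame.png" and nfeatures=1000 are placeholder choices.
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
keypoints, descriptors = orb.detectAndCompute(img, None)

# Each ORB feature pairs a keypoint (location, scale, orientation) with
# a 32-byte binary descriptor used for matching against map points.
print(len(keypoints), descriptors.shape)   # e.g. 1000, (1000, 32)
```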

202: when local map construction is idle, or more than the specified number M of frames have passed since the last keyframe, select the current frame as a keyframe;

In this embodiment, local map construction being idle means that some part of the local map has missing values. The specified frame count M can be set as needed, for example to 20 frames.

203: if fewer than the specified number N of ORB feature points in the current frame are successfully matched with the map point cloud, select the current frame as a keyframe;

The specified number N can be set as needed, for example to 50.

204: if the ORB feature points in the current frame that are successfully matched with the map point cloud fall below the specified proportion of the map points, select the current frame as a keyframe;

The specific value of the specified proportion can also be set as needed, for example to 90%. The three conditions of steps 202 to 204 are combined in the sketch below.
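A minimal sketch of the keyframe test, assuming the example values M = 20, N = 50, and 90% given above; `frame_id`, `last_keyframe_id`, `matched_points`, `map_size`, and `local_mapping_idle` are hypothetical inputs that a tracking front end would supply:

```python
M_FRAMES = 20        # example value from step 202
N_MATCHES = 50       # example value from step 203
MATCH_RATIO = 0.90   # example value from step 204

def is_keyframe(frame_id, last_keyframe_id, matched_points,
                map_size, local_mapping_idle):
    """Return True if the current frame should become a keyframe.

    The three tests mirror steps 202-204; satisfying any one of them
    is sufficient.
    """
    # 202: mapping is idle, or too many frames since the last keyframe
    if local_mapping_idle or frame_id - last_keyframe_id > M_FRAMES:
        return True
    # 203: too few ORB points matched against the map point cloud
    if matched_points < N_MATCHES:
        return True
    # 204: matches cover less than the specified proportion of map points
    return matched_points < MATCH_RATIO * map_size
```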

205: use a CNN to estimate the depth of the keyframe to generate a depth map, measure the pixel-wise confidence of each depth prediction, and construct a corresponding uncertainty map;

In this embodiment, when feature matching fails (i.e., tracking is lost), global relocalization needs to be started; preferably, relocalization is started again once feature matching has remained unsuccessful for more than 20 frames since the last global relocalization.

206: fuse the generated depth map and uncertainty map of the keyframe with the depth map and uncertainty map of the nearest keyframe, and obtain a reconstructed 3D scene after updating the point cloud;
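The patent does not spell out the fusion rule for step 206. One common choice in CNN-SLAM-style systems, shown here purely as an assumption, is an uncertainty-weighted (inverse-variance) average of the per-pixel depth estimates:

```python
import numpy as np

def fuse_depth(depth_new, unc_new, depth_prev, unc_prev, eps=1e-6):
    """Uncertainty-weighted fusion of two per-pixel depth maps.

    This inverse-variance average is an assumption illustrating step 206;
    the patent only states that the depth and uncertainty maps of the
    current keyframe and the nearest keyframe are fused.
    """
    w_new = 1.0 / (unc_new + eps)        # high uncertainty -> low weight
    w_prev = 1.0 / (unc_prev + eps)
    depth_fused = (w_new * depth_new + w_prev * depth_prev) / (w_new + w_prev)
    unc_fused = 1.0 / (w_new + w_prev)   # fused variance shrinks with evidence
    return depth_fused, unc_fused
```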

207: use a CNN that extends a residual network into a fully convolutional network to predict semantic labels, and annotate them on the depth map to establish semantic associations.

In this embodiment, optionally, the above CNN is trained using backpropagation and SGD stochastic gradient descent to minimize a cross-entropy loss computed on the output of a softmax layer.
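A minimal PyTorch sketch of this training setup, under the assumption of a ResNet-18 backbone extended into a fully convolutional segmentation head; `NUM_CLASSES` and the dummy tensors standing in for a data loader are placeholders:

```python
import torch
import torch.nn as nn
import torchvision

NUM_CLASSES = 19  # placeholder; depends on the label set used

class FCN(nn.Module):
    """Residual network extended into a fully convolutional network."""
    def __init__(self, num_classes):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # Keep the convolutional trunk; drop global pooling and the fc head.
        self.trunk = nn.Sequential(*list(backbone.children())[:-2])
        self.head = nn.Conv2d(512, num_classes, kernel_size=1)

    def forward(self, x):
        scores = self.head(self.trunk(x))       # coarse per-class scores
        return nn.functional.interpolate(       # upsample to input size
            scores, size=x.shape[-2:], mode="bilinear", align_corners=False)

model = FCN(NUM_CLASSES)
criterion = nn.CrossEntropyLoss()  # applies log-softmax + NLL internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One illustrative SGD step on dummy data (stands in for a real loader):
images = torch.randn(2, 3, 128, 416)
labels = torch.randint(0, NUM_CLASSES, (2, 128, 416))
optimizer.zero_grad()
loss = criterion(model(images), labels)  # cross-entropy over softmax scores
loss.backward()                          # backpropagation
optimizer.step()
```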

In this embodiment, keyframes are selected in order to reduce the number of frames to be optimized; a keyframe can represent the frames near it, which reduces the amount of computation. Therefore, after keyframe selection, the CNN-based SLAM model performs data processing mainly on the keyframes.

In the above method provided by this embodiment, a current frame is acquired from a data set; the ORB features and map point cloud of the current frame are computed; the current frame is selected as a keyframe when the specified conditions on those features are determined to be satisfied; a CNN convolutional neural network associates the keyframe with a depth map for depth prediction; and the CNN annotates the depth map with semantic labels to establish semantic associations. Because keyframes are selected based on ORB features, not every image is fed into the model, which solves the memory problem; moreover, adjusting the depth regression through the CNN alleviates the problem of inaccurate absolute scale in 3D reconstruction.

Figure 3 is a structural diagram of a CNN-based semantic segmentation depth prediction device according to another embodiment of the present application. Referring to Figure 3, the device comprises:

a calculation module 301 configured to acquire a current frame from the data set and compute the ORB features and map point cloud of the current frame;

a selection module 302 configured to select the current frame as a keyframe when it determines, according to the ORB features and map point cloud of the current frame, that the specified conditions are satisfied;

a prediction module 303 configured to use a CNN convolutional neural network to associate the keyframe with a depth map for depth prediction, and to use the CNN to annotate the depth map with semantic labels to establish semantic associations.

In this embodiment, optionally, the selection module is specifically configured to:

select the current frame as the keyframe when local map construction is idle, or when more than a specified number M of frames have passed since the last keyframe; or,

select the current frame as a keyframe if fewer than a specified number N of ORB feature points in the current frame are successfully matched with the map point cloud; or,

select the current frame as a keyframe if the ORB feature points in the current frame that are successfully matched with the map point cloud fall below a specified proportion of the map points.

In this embodiment, optionally, the prediction module is specifically configured to:

estimate the depth of the keyframe with the CNN to generate a depth map, measure the pixel-wise confidence of each depth prediction, and construct a corresponding uncertainty map;

fuse the generated depth map and uncertainty map of the keyframe with the depth map and uncertainty map of the nearest keyframe, and obtain a reconstructed 3D scene after updating the point cloud.

In this embodiment, optionally, the prediction module is further configured to:

use a CNN that extends a residual network into a fully convolutional network to predict semantic labels, and annotate them on the depth map to establish the semantic associations.

In this embodiment, optionally, the above CNN is trained using backpropagation and SGD stochastic gradient descent to minimize a cross-entropy loss computed on the output of a softmax layer.

The device provided by this embodiment can perform the method provided by any of the above method embodiments; for the detailed process, refer to the description in the method embodiments, which is not repeated here.

In the above device provided by this embodiment, a current frame is acquired from a data set; the ORB features and map point cloud of the current frame are computed; the current frame is selected as a keyframe when the specified conditions on those features are determined to be satisfied; a CNN convolutional neural network associates the keyframe with a depth map for depth prediction; and the CNN annotates the depth map with semantic labels to establish semantic associations. Because keyframes are selected based on ORB features, not every image is fed into the model, which solves the memory problem; moreover, adjusting the depth regression through the CNN alleviates the problem of inaccurate absolute scale in 3D reconstruction.


An embodiment of the present application also provides a computing device. Referring to Figure 4, the computing device comprises a memory 1120, a processor 1110, and a computer program stored in the memory 1120 and executable by the processor 1110. The computer program is stored in a space 1130 for program code within the memory 1120, and when executed by the processor 1110 it implements steps 1131 for performing any of the methods according to the present application.

An embodiment of the present application also provides a computer-readable storage medium. Referring to Figure 5, the computer-readable storage medium comprises a storage unit for program code; the storage unit is provided with a program 1131′ for performing the method steps according to the present application, and the program is executed by a processor.

An embodiment of the present application also provides a computer program product containing instructions. When the computer program product runs on a computer, it causes the computer to perform the method steps according to the present application.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives (SSDs)).

Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their function. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered to go beyond the scope of this application.

Those of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing a processor, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium such as random-access memory, read-only memory, flash memory, a hard disk, a solid-state drive, magnetic tape, a floppy disk, an optical disc, or any combination thereof.

The above are only preferred specific embodiments of the present application, but the scope of protection of the present application is not limited thereto; any change or substitution that would readily occur to those skilled in the art within the technical scope disclosed by the present application shall be covered by the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (2)

1. A CNN-based semantic segmentation depth prediction method, comprising the following steps:
acquiring a current frame from a data set, and computing ORB features and a map point cloud of the current frame, the ORB features consisting of keypoints and descriptors;
according to the ORB features and map point cloud of the current frame, selecting the current frame as a keyframe when specified conditions are satisfied;
estimating the depth of the keyframe with a CNN convolutional neural network to generate a depth map, measuring the pixel-wise confidence of each depth prediction, and constructing a corresponding uncertainty map; fusing the generated depth map and uncertainty map of the keyframe with the depth map and uncertainty map of the nearest keyframe, and obtaining a reconstructed 3D scene after updating the point cloud;
using a CNN (convolutional neural network) that extends a residual network into a fully convolutional network to predict semantic labels, and annotating the semantic labels on the depth map to establish semantic associations, wherein the CNN is trained using backpropagation and SGD stochastic gradient descent to minimize a cross-entropy loss computed on the output of a softmax layer;
wherein, according to the ORB features and map point cloud of the current frame, selecting the current frame as a keyframe when the specified conditions are determined to be satisfied comprises the following steps:
when local map construction is idle, or more than a specified number M of frames have passed since the last keyframe, selecting the current frame as the keyframe; or,
if fewer than a specified number N of ORB feature points in the current frame are successfully matched with the map point cloud, selecting the current frame as a keyframe; or,
if the ORB feature points in the current frame that are successfully matched with the map point cloud fall below a specified proportion of the map points, selecting the current frame as a keyframe.
2. A CNN-based semantic segmentation depth prediction device, comprising:
a calculation module configured to acquire a current frame from a data set and compute ORB features and a map point cloud of the current frame, wherein the ORB features consist of keypoints and descriptors;
a selection module configured to select the current frame as a keyframe when it determines, according to the ORB features and map point cloud of the current frame, that specified conditions are satisfied;
a prediction module configured to estimate the depth of the keyframe with a CNN convolutional neural network to generate a depth map, measure the pixel-wise confidence of each depth prediction, and construct a corresponding uncertainty map; to fuse the generated depth map and uncertainty map of the keyframe with the depth map and uncertainty map of the nearest keyframe, and obtain a reconstructed 3D scene after updating the point cloud;
and to use a CNN (convolutional neural network) that extends a residual network into a fully convolutional network to predict semantic labels, and annotate the semantic labels on the depth map to establish semantic associations, wherein the CNN is trained using backpropagation and SGD stochastic gradient descent to minimize a cross-entropy loss computed on the output of a softmax layer;
wherein the selection module is specifically configured to:
select the current frame as the keyframe when local map construction is idle, or when more than a specified number M of frames have passed since the last keyframe; or,
select the current frame as a keyframe if fewer than a specified number N of ORB feature points in the current frame are successfully matched with the map point cloud; or,
select the current frame as a keyframe if the ORB feature points in the current frame that are successfully matched with the map point cloud fall below a specified proportion of the map points.
Application CN201910944866.0A, filed 2019-09-30 (priority date 2019-09-30): CNN-based semantic segmentation depth prediction method and device; granted as CN110717917B (en), status Active.

Priority Applications (1)

Application number: CN201910944866.0A; priority and filing date: 2019-09-30; title: CNN-based semantic segmentation depth prediction method and device (CN110717917B).

Publications (2)

Publication Number Publication Date
CN110717917A CN110717917A (en) 2020-01-21
CN110717917B true CN110717917B (en) 2022-08-09

Family

ID=69211251

Family Applications (1)

CN201910944866.0A (granted as CN110717917B, Active); priority and filing date: 2019-09-30; title: CNN-based semantic segmentation depth prediction method and device

Country Status (1)

Country Link
CN (1) CN110717917B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113534786A (en) * 2020-04-20 2021-10-22 深圳市奇虎智能科技有限公司 SLAM method-based environment reconstruction method and system and mobile robot
CN112257659B (en) * 2020-11-11 2024-04-05 四川云从天府人工智能科技有限公司 Detection tracking method, device and medium
CN115116013A (en) * 2021-03-19 2022-09-27 上海交通大学 Online dense point cloud semantic segmentation system and method fused with time series features
CN113744398B (en) * 2021-09-03 2023-04-28 电子科技大学 Map reconstruction fusion method based on laser and microwave cooperation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107063258A (en) * 2017-03-07 2017-08-18 重庆邮电大学 A mobile robot indoor navigation method based on semantic information
CN107766794A (en) * 2017-09-22 2018-03-06 天津大学 An image semantic segmentation method with learnable feature-fusion coefficients
CN107945265A (en) * 2017-11-29 2018-04-20 华中科技大学 Real-time dense monocular SLAM method and system based on an online-learning depth prediction network
CN109636905A (en) * 2018-12-07 2019-04-16 东北大学 Environment semantic mapping method based on deep convolutional neural networks


Also Published As

Publication number Publication date
CN110717917A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
CN110717917B (en) CNN-based semantic segmentation depth prediction method and device
US11165959B2 (en) Connecting and using building data acquired from mobile devices
CN107909612B (en) Method and system for visual instant positioning and mapping based on 3D point cloud
KR20210006971A (en) System and method for geolocation prediction
KR102472767B1 (en) Method and apparatus of calculating depth map based on reliability
US11238604B1 (en) Densifying sparse depth maps
CN107784671B (en) Method and system for visual instant positioning and drawing
CN111094895B (en) System and method for robust self-repositioning in pre-constructed visual maps
KR102753370B1 (en) A global coordinate frame defined by dataset correspondences.
JP2018530825A (en) System and method for non-obstacle area detection
CN115222777A (en) Unsupervised training of optical flow estimation neural networks
WO2019117970A1 (en) Adaptive object tracking policy
CN114549612A (en) Model training and image processing method, device, equipment and storage medium
CA3069813C (en) Capturing, connecting and using building interior data from mobile devices
KR20220143957A (en) Determining traversable space from a single image
US11188787B1 (en) End-to-end room layout estimation
US12254041B2 (en) Position recognition method and system based on visual information processing
WO2022160897A1 (en) Binocular parallax estimation method, model training method and related device
US11164391B1 (en) Mixed reality object detection
CN110827341A (en) Picture depth estimation method and device and storage medium
JP6842618B2 (en) Creating a 3D map
Huai et al. Collaborative monocular SLAM with crowdsourced data
CN114663503B (en) 3D position prediction from images
US8611657B2 (en) Robust fitting of surfaces from noisy data
CN116295425A (en) Positioning information determining device, vehicle, method and program product

Legal Events

PB01: Publication

SE01: Entry into force of request for substantive examination

GR01: Patent grant

PE01: Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method and device for depth prediction based on CNN semantic segmentation

Effective date of registration: 20230713

Granted publication date: 20220809

Pledgee: Bank of Jiangsu Limited by Share Ltd. Beijing branch

Pledgor: BEIJING MOVIEBOOK SCIENCE AND TECHNOLOGY Co.,Ltd.

Registration number: Y2023110000278

PP01: Preservation of patent right

Effective date of registration: 20241008

Granted publication date: 20220809