CN116468768B - Scene depth completion method based on conditional variational autoencoder and geometric guidance - Google Patents
Scene depth completion method based on conditional variational autoencoder and geometric guidance
- Publication number
- CN116468768B (application CN202310422520.0A)
- Authority
- CN
- China
- Prior art keywords
- depth
- map
- point cloud
- depth map
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/521—Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01B—MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
- G01B11/00—Measuring arrangements characterised by the use of optical techniques
- G01B11/24—Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/86—Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/89—Lidar systems specially adapted for specific applications for mapping or imaging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10044—Radar image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
Description
Technical field
The present invention relates to the technical field of depth map completion, and specifically to a scene depth completion method based on a conditional variational autoencoder and geometric guidance.
Background art
Human perception, understanding and experience of the surrounding environment rely on three-dimensional scene information acquired through vision. Computer vision imitates this behavior, using various sensors as visual organs to obtain scene information and thereby recognize and understand the scene; depth information plays a key role in fields such as robotics, autonomous driving and augmented reality. In autonomous driving, the vehicle must sense its distance to other vehicles, pedestrians and obstacles while driving, and full Level 5 automation requires ranging accurate to the centimeter. At present, LiDAR is the main active range sensor in autonomous driving. Compared with the two-dimensional RGB image captured by a color camera, the depth map acquired by LiDAR (the depth map and the point cloud can be converted into each other through the camera intrinsics) provides precise depth and can therefore accurately perceive the 3D positions of objects in the surrounding environment. However, a single LiDAR can only emit a limited number of laser beams in the vertical direction (16, 32 or 64 lines), so the collected point cloud is extremely sparse (pixels with valid depth values cover only about 5% of the color image), which severely affects downstream tasks such as 3D object detection and 3D environment perception.
Summary of the invention
The purpose of the present invention is to provide a scene depth completion method based on a conditional variational autoencoder and geometric guidance, so as to solve the key problem of sparse and missing data produced by existing depth imaging devices such as LiDAR.
To achieve the above purpose, the present invention provides the following technical solution: a scene depth completion method based on a conditional variational autoencoder and geometric guidance, comprising the following steps:
acquiring color images, sparse depth maps and dense depth maps of autonomous driving scenes;
designing a conditional variational autoencoder with a prior network and a posterior network, feeding the color image and the sparse depth map into the prior network to extract features, and feeding the color image, the sparse depth map and the dense depth map into the posterior network to extract features;
converting the sparse depth map into a point cloud using the camera intrinsics or focal length, extracting geometric spatial features with a point cloud upsampling model, and mapping them back onto the sparse depth map;
fusing image features and point cloud features with a dynamic graph message propagation module;
generating a preliminary depth completion map with a residual-network-based U-shaped encoder-decoder;
feeding the preliminarily predicted completion depth map into a confidence uncertainty estimation module to obtain the final optimized depth completion.
Preferably, acquiring the color images and sparse depth maps of autonomous driving scenes comprises:
capturing color images and sparse depth maps of autonomous driving scenes with a color camera and a LiDAR;
converting the sparse depth map into a dense depth map with the Sparsity Invariant CNNs algorithm, which serves as the ground-truth label to assist training.
Preferably, designing the conditional variational autoencoder with a prior network and a posterior network, feeding the color image and the sparse depth map into the prior network to extract features, and feeding the color image, the sparse depth map and the dense depth map into the posterior network to extract features comprises:
based on a ResNet-style feature extraction module, designing a prior network and a posterior network with identical structures as the conditional variational autoencoder;
feeding the color image and the sparse depth map into the prior network to extract the last-layer feature map Prior, and at the same time feeding the color image, the sparse depth map and the ground-truth label into the posterior network to extract the last-layer feature map Posterior; then computing the mean and variance of the Prior and Posterior feature maps to obtain the probability distributions D1 and D2 of the respective features, and using the Kullback-Leibler divergence loss function to supervise the discrepancy between distributions D1 and D2, so that the prior network learns the ground-truth label features captured by the posterior network.
Preferably, converting the sparse depth map into a point cloud using the camera intrinsics or focal length, extracting geometric spatial features with a point cloud upsampling model, and mapping them back onto the sparse depth map comprises:
converting the sparse depth image pixels (ui, vi) from the pixel coordinate system to the camera coordinate system according to the camera intrinsics, where (cx, cy) is the optical center of the camera, fx and fy are the focal lengths of the camera along the x and y axes and di is the depth value at (ui, vi), to obtain the point cloud coordinates (xi, yi, zi) of the three-dimensional scene and form the sparse point cloud data S; the same conversion is applied to the ground-truth depth map to generate the dense label point cloud S1;
randomly sampling the point cloud S several times to obtain point cloud subsets of different sizes; for each subset, aggregating the 16 nearest points around each point with the KNN nearest-neighbor algorithm and feeding them into a geometry-aware neural network to extract the local geometric feature of that point;
adding the sparse point cloud feature extracted for each point to the original point cloud coordinates (xi, yi, zi) to obtain the point cloud encoding feature Q, feeding Q into a four-times-upsampling multi-layer perceptron network to obtain the predicted dense point cloud S2, and using the Chamfer Distance (CD) loss function to compute the loss between the ground-truth dense point cloud S1 and the predicted dense point cloud S2, thereby supervising the training of the network; the CD loss is the sum of two terms: the first term sums, over every point x in S1, the minimum distance from x to S2, and the second term sums, over every point y in S2, the minimum distance from y to S1.
Preferably, fusing the image features and point cloud features with the dynamic graph message propagation module comprises:
designing two encoders with the same structure, each consisting of five ResNet stages; feeding the color image and the sparse depth map into the RGB-branch encoder to extract feature maps at five scales L1, L2, L3, L4 and L5, and feeding the point cloud feature Q and the sparse depth map into the point-cloud-branch encoder to extract feature maps at five scales P1, P2, P3, P4 and P5;
for L1, L2, L3, L4 and L5, using dilated convolution to reach pixels in different receptive fields, then using deformable convolution to learn a coordinate offset for each pixel, so that each pixel dynamically aggregates strongly correlated surrounding feature values, yielding features T1, T2, T3, T4 and T5;
adding T1, T2, T3, T4 and T5, which are rich in dynamic graph features, to the point cloud encoding feature maps P1, P2, P3, P4 and P5 to obtain point cloud feature maps M1, M2, M3, M4 and M5 that contain both semantic and geometric information.
Preferably, generating the preliminary depth completion map with the residual-network-based U-shaped encoder-decoder comprises:
designing a multi-scale decoder matching the encoder structure, forming an encoder-decoder network with a U-Net structure;
feeding the feature maps L1, L2, L3, L4 and L5 generated by the RGB branch into the U-Net to predict the first coarse depth completion map Depth1 and the confidence map C1;
feeding the feature maps M1, M2, M3, M4 and M5 generated by the point cloud branch into the U-Net to predict the second coarse depth completion map Depth2 and the confidence map C2.
Preferably, feeding the preliminarily predicted completion depth map into the confidence uncertainty estimation module to obtain the final optimized depth completion comprises:
adding the generated confidence maps C1 and C2 to obtain the feature map C, performing uncertainty prediction on C with the Softmax function, and predicting the uncertainty ratios F1 and F2 of each confidence map pixel by pixel;
multiplying the uncertainty maps F1 and F2 by the coarse depth completions Depth1 and Depth2, respectively, to obtain the final optimized depth completion map.
Compared with the prior art, the beneficial effects of the present invention are:
A conditional variational autoencoder learns the feature distribution of the ground-truth dense depth map and thereby guides the color image and sparse depth map to generate more valuable depth features. In addition, point cloud features in three-dimensional space capture spatial structure across modalities, strengthening the geometric perception of the network and providing auxiliary information for predicting more accurate depth values. Furthermore, the dynamic graph message propagation module fuses the features of the color image and the point cloud, achieving high-precision depth completion. This compensates for the excessive sparsity of LiDAR data, allows a low-cost LiDAR with fewer beams to obtain reasonably accurate and dense depth information, and provides a cost-effective solution for industries that require accurate and dense depth data, such as autonomous driving and robotic environment perception.
Brief description of the drawings
Figure 1 is a flow chart of a scene depth completion method based on a conditional variational autoencoder and geometric guidance provided by an embodiment of the present invention;
Figure 2 is a depth completion map of a scene depth completion method based on a conditional variational autoencoder and geometric guidance provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
The method of this embodiment is executed by a terminal, which may be a mobile phone, a tablet computer, a PDA, a laptop or a desktop computer, or of course any other device with similar functions; this embodiment places no restriction on the terminal.
Referring to Figures 1 and 2, the present invention provides a scene depth completion method based on a conditional variational autoencoder and geometric guidance, applied to depth completion of autonomous driving scenes, comprising:
Step S1: acquire color images, sparse depth maps and dense depth maps of autonomous driving scenes.
Specifically, step S1 further comprises the following steps:
S101: capture color images and sparse depth maps of autonomous driving scenes with a color camera and a LiDAR;
S102: convert the sparse depth map into a dense depth map with the Sparsity Invariant CNNs algorithm, which serves as the ground-truth label to assist training.
Self-driving vehicles are mainly equipped with a color camera and a LiDAR, which collect RGB images and depth images respectively. This method additionally generates a completed depth map as the training label, with the following specific steps:
capture color images and depth images of autonomous driving scenes with a color camera and a Velodyne HDL-64E LiDAR; convert the sparse depth map into a dense depth map with the Sparsity Invariant CNNs algorithm and use it as the ground-truth label.
Step S2: design a conditional variational autoencoder with a prior network and a posterior network, feed the color image and the sparse depth map into the prior network to extract features, and feed the color image, the sparse depth map and the dense depth map into the posterior network to extract features.
Specifically, step S2 further comprises the following steps:
S201: based on a ResNet-style feature extraction module, design a prior network and a posterior network with identical structures as the conditional variational autoencoder;
S202: feed the color image and the sparse depth map into the prior network to extract the last-layer feature map Prior, and at the same time feed the color image, the sparse depth map and the ground-truth label into the posterior network to extract the last-layer feature map Posterior; then compute the mean and variance of the Prior and Posterior feature maps to obtain the probability distributions D1 and D2 of the respective features, and use the Kullback-Leibler divergence loss function to supervise the discrepancy between distributions D1 and D2, so that the prior network learns the ground-truth label features captured by the posterior network.
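A minimal PyTorch-style sketch of this distribution-matching step is given below. The diagonal-Gaussian parameterization, the 1x1 convolution heads and all names are illustrative assumptions rather than details taken from the patent.

```python
import torch
import torch.nn as nn

class DistributionHead(nn.Module):
    """Predicts a diagonal Gaussian (mean, log-variance) from a feature map."""
    def __init__(self, channels: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Conv2d(channels, latent_dim, kernel_size=1)
        self.logvar = nn.Conv2d(channels, latent_dim, kernel_size=1)

    def forward(self, feat):
        return self.mu(feat), self.logvar(feat)

def kl_divergence(mu_q, logvar_q, mu_p, logvar_p):
    """KL(D2 || D1) for diagonal Gaussians: posterior distribution D2 against
    prior distribution D1, averaged over all elements."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.mean()

# prior_feat = prior_net(rgb, sparse_depth)            # last-layer feature map "Prior"
# post_feat  = posterior_net(rgb, sparse_depth, label) # last-layer feature map "Posterior"
# mu_p, lv_p = prior_head(prior_feat)
# mu_q, lv_q = posterior_head(post_feat)
# loss_kl    = kl_divergence(mu_q, lv_q, mu_p, lv_p)
```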
Step S3: convert the sparse depth map into a point cloud using the camera intrinsics or focal length, extract geometric spatial features with a point cloud upsampling model, and map them back onto the sparse depth map.
Specifically, step S3 further comprises the following steps:
S301: convert the sparse depth image pixels (ui, vi) from the pixel coordinate system to the camera coordinate system according to the camera intrinsics/focal length to obtain the point cloud coordinates (xi, yi, zi) of the three-dimensional scene, forming the sparse point cloud data S.
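Under the standard pinhole camera model, and with the symbols defined in the next paragraph, this conversion can be written as

$$x_i = \frac{(u_i - c_x)\,d_i}{f_x}, \qquad y_i = \frac{(v_i - c_y)\,d_i}{f_y}, \qquad z_i = d_i$$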
where (cx, cy) is the optical center of the camera, fx and fy are the focal lengths of the camera along the x and y axes, and di is the depth value at (ui, vi); the same conversion is also applied to the ground-truth depth map to generate the dense label point cloud S1.
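A vectorized sketch of this back-projection, assuming the sparse depth map is stored as an H x W array in which pixels without a measurement are zero (the function and variable names are illustrative):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, zeros = no measurement) into an
    N x 3 point cloud expressed in the camera coordinate system."""
    v, u = np.nonzero(depth > 0)          # pixel coordinates with valid depth
    d = depth[v, u]
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.stack([x, y, d], axis=1)    # rows are (x_i, y_i, z_i)

# S  = depth_to_point_cloud(sparse_depth, fx, fy, cx, cy)  # sparse point cloud
# S1 = depth_to_point_cloud(label_depth, fx, fy, cx, cy)   # dense label point cloud
```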
S302: randomly sample the point cloud S several times to obtain point cloud subsets of different sizes; for each subset, aggregate the 16 nearest points around each point with the KNN nearest-neighbor algorithm and feed them into a geometry-aware neural network to extract the local geometric feature of that point.
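One possible form of the neighbourhood grouping, assuming the point set is held in a torch tensor and using a brute-force pairwise distance; the geometry-aware network itself is only referenced as a placeholder:

```python
import torch

def group_knn(points: torch.Tensor, k: int = 16) -> torch.Tensor:
    """points: (N, 3). Returns (N, k, 3) local neighbourhoods, expressed relative
    to each centre point so that the features are translation-invariant."""
    dist = torch.cdist(points, points)                      # (N, N) pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]    # drop the point itself
    neighbours = points[idx]                                # (N, k, 3)
    return neighbours - points.unsqueeze(1)                 # centre each neighbourhood

# local_feature = geometry_aware_mlp(group_knn(sampled_points, k=16))
```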
S303: add the sparse point cloud feature extracted for each point to the original point cloud coordinates (xi, yi, zi) to obtain the point cloud encoding feature Q, feed Q into a four-times-upsampling multi-layer perceptron network to obtain the predicted dense point cloud S2, and use the Chamfer Distance loss function to compute the loss between the ground-truth dense point cloud S1 and the predicted dense point cloud S2, thereby supervising the training of the network.
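With the standard Chamfer Distance definition and the point sets S1 and S2 defined above (written here with squared Euclidean distances, the usual choice for point cloud supervision), the CD loss takes the form

$$L_{CD}(S_1,S_2)=\sum_{x\in S_1}\min_{y\in S_2}\lVert x-y\rVert_2^2+\sum_{y\in S_2}\min_{x\in S_1}\lVert y-x\rVert_2^2$$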
where the first term is the sum, over every point x in S1, of the minimum distance from x to S2, and the second term is the sum, over every point y in S2, of the minimum distance from y to S1.
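A brute-force sketch of this loss, adequate for illustration (practical implementations usually rely on a batched or CUDA kernel):

```python
import torch

def chamfer_distance(s1: torch.Tensor, s2: torch.Tensor) -> torch.Tensor:
    """s1: (N, 3) ground-truth dense point cloud, s2: (M, 3) predicted point cloud."""
    dist = torch.cdist(s1, s2) ** 2          # (N, M) squared Euclidean distances
    term1 = dist.min(dim=1).values.sum()     # every x in S1 to its nearest point in S2
    term2 = dist.min(dim=0).values.sum()     # every y in S2 to its nearest point in S1
    return term1 + term2

# loss_cd = chamfer_distance(S1, S2_predicted)
```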
Step S4: fuse the image features and point cloud features with the dynamic graph message propagation module.
Specifically, step S4 further comprises the following steps:
S401: design two encoders with the same structure, each consisting of five ResNet stages; feed the color image and the sparse depth map into the RGB-branch encoder to extract feature maps at five scales L1, L2, L3, L4 and L5, and feed the point cloud feature Q and the sparse depth map into the point-cloud-branch encoder to extract feature maps at five scales P1, P2, P3, P4 and P5;
S402: for L1, L2, L3, L4 and L5, use dilated convolution to reach pixels in different receptive fields, then use deformable convolution to learn a coordinate offset for each pixel, so that each pixel dynamically aggregates strongly correlated surrounding feature values, yielding features T1, T2, T3, T4 and T5;
S403: add T1, T2, T3, T4 and T5, which are rich in dynamic graph features, to the point cloud encoding feature maps P1, P2, P3, P4 and P5 to obtain point cloud feature maps M1, M2, M3, M4 and M5 that contain both semantic and geometric information.
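A simplified sketch of one aggregation step at a single scale, assuming torchvision's deformable convolution and equal channel widths in the two branches; the actual module in the patent may predict offsets and fuse features differently.

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DynamicGraphPropagation(nn.Module):
    """Dilated convolution enlarges the receptive field; a learned offset field then
    lets each pixel aggregate strongly correlated neighbours via deformable convolution."""
    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        self.dilated = nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation)
        self.offset = nn.Conv2d(channels, 2 * 3 * 3, 3, padding=1)  # (dx, dy) per kernel tap
        self.deform = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, rgb_feat, pc_feat):
        t = self.dilated(rgb_feat)
        t = self.deform(t, self.offset(t))   # dynamic aggregation, yielding T_k
        return pc_feat + t                   # fused feature map M_k = P_k + T_k

# M_k = DynamicGraphPropagation(channels)(L_k, P_k)  # applied at each of the five scales
```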
Step S5: generate a preliminary depth completion map with the residual-network-based U-shaped encoder-decoder.
Specifically, step S5 further comprises the following steps:
S501: design a multi-scale decoder matching the encoder structure of step S4, forming an encoder-decoder network with a U-Net structure;
S502: feed the feature maps L1, L2, L3, L4 and L5 generated by the RGB branch into the U-Net to predict the first coarse depth completion map Depth1 and the confidence map C1;
S503: feed the feature maps M1, M2, M3, M4 and M5 generated by the point cloud branch into the U-Net to predict the second coarse depth completion map Depth2 and the confidence map C2.
Step S6: feed the preliminarily predicted completion depth map into the confidence uncertainty estimation module to obtain the final optimized depth completion.
Specifically, step S6 further comprises the following steps:
S601: add the confidence maps C1 and C2 generated in step S5 to obtain the feature map C, perform uncertainty prediction on C with the Softmax function, and predict the uncertainty ratios F1 and F2 of each confidence map pixel by pixel;
S602: multiply the uncertainty maps F1 and F2 by the coarse depth completions Depth1 and Depth2, respectively, to obtain the final optimized depth completion map.
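A compact sketch of this fusion. The text describes adding C1 and C2 before the Softmax; a common reading, assumed here, is a pixel-wise Softmax over the two stacked confidence maps so that F1 + F2 = 1 at every pixel.

```python
import torch

def fuse_predictions(depth1, depth2, conf1, conf2):
    """depth1/depth2 and conf1/conf2: tensors of shape (B, 1, H, W).
    Softmax over the confidence channel yields the uncertainty weights F1 and F2."""
    weights = torch.softmax(torch.cat([conf1, conf2], dim=1), dim=1)  # (B, 2, H, W)
    f1, f2 = weights[:, 0:1], weights[:, 1:2]
    return f1 * depth1 + f2 * depth2     # final optimized depth completion map
```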
In this embodiment, a conditional variational autoencoder first learns the feature distribution of the ground-truth dense depth map, guiding the color image and sparse depth map to generate more valuable depth features. Second, point cloud features in three-dimensional space capture spatial structure across modalities, strengthening the geometric perception of the network and providing auxiliary information for predicting more accurate depth values. Finally, the dynamic graph message propagation module fuses the features of the color image and the point cloud, achieving high-precision depth completion prediction.
In addition, it should be noted that the ways in which the technical features of this application may be combined are not limited to the combinations recited in the claims or described in the specific embodiments; all technical features described in this application may be freely combined in any manner, unless they contradict one another.
It should be noted that the above are only specific embodiments of the present invention. The present invention is obviously not limited to the above embodiments, and many similar variations follow from them. All variations that a person skilled in the art can derive or deduce directly from the disclosure of the present invention shall fall within the scope of protection of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310422520.0A CN116468768B (en) | 2023-04-20 | 2023-04-20 | Scene depth completion method based on conditional variational autoencoder and geometric guidance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116468768A CN116468768A (en) | 2023-07-21 |
CN116468768B true CN116468768B (en) | 2023-10-17 |
Family
ID=87183885
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310422520.0A Active CN116468768B (en) | 2023-04-20 | 2023-04-20 | Scene depth completion method based on conditional variational autoencoder and geometric guidance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116468768B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117351310B (en) * | 2023-09-28 | 2024-03-12 | Shandong University | Multimodal 3D target detection method and system based on depth completion |
CN117953029B (en) * | 2024-03-27 | 2024-06-07 | University of Science and Technology Beijing | General depth map completion method and device based on depth information propagation |
CN118411396B (en) * | 2024-04-16 | 2025-03-21 | Institute of Automation, Chinese Academy of Sciences | Depth completion method, device, electronic device and storage medium |
CN119151804B (en) * | 2024-11-19 | 2025-03-18 | Zhejiang University | A method and device for generating pseudo points based on mixed light images |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112767294A (en) * | 2021-01-14 | 2021-05-07 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Depth image enhancement method and device, electronic equipment and storage medium |
CN112861729A (en) * | 2021-02-08 | 2021-05-28 | Zhejiang University | Real-time depth completion method based on pseudo-depth map guidance |
WO2022045495A1 (en) * | 2020-08-25 | 2022-03-03 | Samsung Electronics Co., Ltd. | Methods for depth map reconstruction and electronic computing device for implementing the same |
CN114998406A (en) * | 2022-07-14 | 2022-09-02 | Wuhan Tuke Intelligent Technology Co., Ltd. | Self-supervision multi-view depth estimation method and device |
CN115423978A (en) * | 2022-08-30 | 2022-12-02 | Northwestern Polytechnical University | Deep Learning-Based Image Laser Data Fusion Method for Building Reconstruction |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11315266B2 (en) * | 2019-12-16 | 2022-04-26 | Robert Bosch Gmbh | Self-supervised depth estimation method and system |
Non-Patent Citations (5)
Title |
---|
Depth Completion Auto-Encoder; Kaiyue Lu et al.; 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW); full text *
Sparse-to-Dense Multi-Encoder Shape Completion of Unstructured Point Cloud; Yanjun Peng et al.; IEEE Access, vol. 8; full text *
Unsupervised depth estimation model for tomato plant images based on a dense autoencoder; Zhou Yuncheng, Deng Hanbing, Xu Tongyu, Miao Teng, Wu Qiong; Transactions of the Chinese Society of Agricultural Engineering, 2020, No. 11; full text *
Depth image acquisition method fusing vision and laser point clouds; Wang Dongmin, Peng Yongsheng, Li Yongle; Journal of Military Transportation University, 2017, No. 10; full text *
Research on robust and intelligent multi-source fusion SLAM; Zuo Xingxing; China Doctoral Dissertations Full-text Database (Information Science and Technology); full text *
Also Published As
Publication number | Publication date |
---|---|
CN116468768A (en) | 2023-07-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116468768B (en) | 2023-10-17 | Scene depth completion method based on conditional variational autoencoder and geometric guidance | |
CN109377530B (en) | A Binocular Depth Estimation Method Based on Deep Neural Network | |
CN110992271B (en) | Image processing method, path planning method, device, equipment and storage medium | |
CN111105432B (en) | Unsupervised end-to-end driving environment perception method based on deep learning | |
CN110246181B (en) | Anchor point-based attitude estimation model training method, attitude estimation method and system | |
CN111696148A (en) | End-to-end stereo matching method based on convolutional neural network | |
CN116612468A (en) | 3D Object Detection Method Based on Multimodal Fusion and Deep Attention Mechanism | |
CN116129233A (en) | Panoramic Segmentation Method for Autonomous Driving Scene Based on Multimodal Fusion Perception | |
WO2021249114A1 (en) | Target tracking method and target tracking device | |
CN117422884A (en) | Three-dimensional target detection method, system, electronic equipment and storage medium | |
Shi et al. | An improved lightweight deep neural network with knowledge distillation for local feature extraction and visual localization using images and LiDAR point clouds | |
CN115330935A (en) | A 3D reconstruction method and system based on deep learning | |
CN115511759A (en) | A Point Cloud Image Depth Completion Method Based on Cascade Feature Interaction | |
CN114049362A (en) | Transform-based point cloud instance segmentation method | |
CN117745944A (en) | Pre-training model determining method, device, equipment and storage medium | |
Jia et al. | Depth measurement based on a convolutional neural network and structured light | |
CN117351310B (en) | Multimodal 3D target detection method and system based on depth completion | |
CN118429764A (en) | Collaborative sensing method based on multi-mode fusion | |
CN118537834A (en) | Vehicle perception information acquisition method, device, equipment and storage medium | |
CN113012191A (en) | Laser mileage calculation method based on point cloud multi-view projection graph | |
CN117423102A (en) | Point cloud data processing method and related equipment | |
CN115965961B (en) | Local-global multi-mode fusion method, system, equipment and storage medium | |
CN117953157A (en) | Pure vision self-supervision three-dimensional prediction model based on two-dimensional video in automatic driving field | |
CN116912645A (en) | Three-dimensional target detection method and device integrating texture and geometric features | |
CN117671625A (en) | A multi-modal perception method and system for autonomous driving based on diffusion model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |