
CN116468768B - Scene depth completion method based on conditional variation self-encoder and geometric guidance - Google Patents

Scene depth completion method based on conditional variation self-encoder and geometric guidance

Info

Publication number
CN116468768B
CN116468768B (application number CN202310422520.0A)
Authority
CN
China
Prior art keywords
depth
map
point cloud
depth map
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310422520.0A
Other languages
Chinese (zh)
Other versions
CN116468768A (en)
Inventor
魏明强
吴鹏
燕雪峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202310422520.0A priority Critical patent/CN116468768B/en
Publication of CN116468768A publication Critical patent/CN116468768A/en
Application granted granted Critical
Publication of CN116468768B publication Critical patent/CN116468768B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/521 Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01B MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B 11/00 Measuring arrangements characterised by the use of optical techniques
    • G01B 11/24 Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S 17/86 Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S 17/88 Lidar systems specially adapted for specific applications
    • G01S 17/89 Lidar systems specially adapted for specific applications for mapping or imaging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10032 Satellite or aerial image; Remote sensing
    • G06T 2207/10044 Radar image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Optics & Photonics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a scene depth completion method based on a conditional variational autoencoder and geometric guidance, which comprises the following steps: acquiring a color image, a sparse depth map and a dense depth map in an autonomous driving scene; designing a conditional variational autoencoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features; and converting the sparse depth map into a point cloud using the camera intrinsics, namely the focal length and optical center coordinates, extracting geometric spatial features with a point cloud up-sampling model, and mapping them back onto the sparse depth map. The invention compensates for the excessive sparsity of the data acquired by lidar, so that a low-cost lidar with fewer scan lines can obtain more accurate and dense depth information, and provides a cost-effective solution for industries such as autonomous driving and robotic environment perception that require accurate and dense depth data.

Description

Scene depth completion method based on conditional variational autoencoders and geometric guidance

Technical Field

The present invention relates to the technical field of depth map completion, and specifically to a scene depth completion method based on a conditional variational autoencoder and geometric guidance.

Background Art

Human perception, understanding and experience of the surrounding environment rely on three-dimensional scene information acquired through vision. Computer vision imitates human behavior and uses various sensors as visual organs to acquire scene information, thereby recognizing and understanding the scene; depth information plays a key role in fields such as robotics, autonomous driving and augmented reality. In autonomous driving, the vehicle must sense the distance between itself and other vehicles, pedestrians and obstacles while driving, and full automation (Level 5) requires ranging accurate to the centimeter. Currently, LiDAR is the main active range sensor in autonomous driving. Compared with the two-dimensional RGB image captured by a color camera, the depth map acquired by lidar (a depth map and a point cloud can be converted into each other through the camera intrinsics) provides precise depth, so the 3D positions of objects in the surrounding environment can be perceived accurately. However, a lidar can only emit a limited number of laser lines (16, 32 or 64) in the vertical direction, so the collected point cloud is extremely sparse (pixels with valid depth values account for only about 5% of the color image), which severely affects downstream tasks such as 3D object detection and 3D environment perception.

Summary of the Invention

The purpose of the present invention is to provide a scene depth completion method based on a conditional variational autoencoder and geometric guidance, so as to solve the key problem of sparse and missing data produced by existing depth imaging devices such as lidar.

To achieve the above purpose, the present invention provides the following technical solution: a scene depth completion method based on a conditional variational autoencoder and geometric guidance, comprising the following steps:

acquiring a color image, a sparse depth map and a dense depth map in an autonomous driving scene;

designing a conditional variational autoencoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features;

converting the sparse depth map into a point cloud using the camera intrinsics or focal length, extracting geometric spatial features with a point cloud up-sampling model, and mapping them back onto the sparse depth map;

fusing image features and point cloud features with a dynamic graph message propagation module;

generating a preliminary depth completion map with a U-shaped encoder-decoder based on a residual network;

inputting the preliminarily predicted completion depth map into a confidence uncertainty estimation module to obtain the final refined depth completion.

Preferably, acquiring the color image and the sparse depth map in the autonomous driving scene comprises:

capturing a color image and a sparse depth map in the autonomous driving scene with a color camera and a lidar;

converting the sparse depth map into a dense depth map with the Sparsity Invariant CNNs algorithm, which serves as the ground-truth label for auxiliary training.

Preferably, designing the conditional variational autoencoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features comprises:

designing a prior network and a posterior network with identical structures, built from ResNet-based feature extraction modules, as the conditional variational autoencoder;

inputting the color image and the sparse depth map into the prior network to extract the last-layer feature map Prior, and at the same time inputting the color image, the sparse depth map and the ground-truth label into the posterior network to extract the last-layer feature map Posterior; then computing the mean and variance of the Prior and Posterior feature maps to obtain the probability distributions D1 and D2 of the respective features, and using the Kullback-Leibler divergence loss to supervise the discrepancy between D1 and D2, so that the prior network learns the ground-truth label features captured by the posterior network.

Preferably, converting the sparse depth map into a point cloud using the camera intrinsics or focal length, extracting geometric spatial features with a point cloud up-sampling model, and mapping them back onto the sparse depth map comprises:

converting the sparse depth image pixels (ui, vi) from the pixel coordinate system to the camera coordinate system according to the camera intrinsics to obtain the point cloud coordinates (xi, yi, zi) of the three-dimensional scene, forming sparse point cloud data S;

in this conversion, (cx, cy) are the optical center coordinates of the camera, fx and fy are the focal lengths of the camera along the x-axis and y-axis respectively, and di is the depth value at (ui, vi); the ground-truth depth map is converted with the same formula to generate a dense label point cloud S1;

randomly sampling the point cloud S several times to obtain point cloud sets of different sizes; for each point cloud set, aggregating the 16 nearest points around each point with the KNN nearest-neighbor algorithm and feeding them into a geometry-aware neural network to extract the local geometric features of that point;

adding the sparse point cloud features extracted for each point to the original point cloud coordinates (xi, yi, zi) to obtain the point cloud encoding feature Q, feeding Q into a four-times up-sampling multi-layer perceptron network to obtain the predicted dense point cloud S2, and computing the loss between the real dense point cloud S1 and the predicted dense point cloud S2 with the Chamfer Distance (CD) loss function to supervise the training of the network.

In the CD loss, the first term is the sum of the minimum distances from any point x in S1 to S2, and the second term is the sum of the minimum distances from any point y in S2 to S1.

Preferably, fusing image features and point cloud features with the dynamic graph message propagation module comprises:

designing two encoding networks with identical structures, each encoder consisting of 5 ResNet stages; inputting the color image and the sparse depth map into the RGB-branch encoder to extract feature maps L1, L2, L3, L4 and L5 at five different scales, and inputting the point cloud feature Q and the sparse depth map into the point-cloud-branch encoder to extract feature maps P1, P2, P3, P4 and P5 at five different scales;

for L1, L2, L3, L4 and L5, using dilated convolution to gather pixels from different receptive fields, then using deformable convolution to learn a coordinate offset for each pixel, so that each pixel dynamically aggregates strongly correlated surrounding feature values, yielding features T1, T2, T3, T4 and T5;

adding T1, T2, T3, T4 and T5, which are rich in dynamic graph features, to the point cloud encoding feature maps P1, P2, P3, P4 and P5 to obtain point cloud feature maps M1, M2, M3, M4 and M5 that contain both semantic and geometric information.

Preferably, generating the preliminary depth completion map with the U-shaped encoder-decoder based on a residual network comprises:

designing a multi-scale decoder structure matching the encoder structure to form an encoder-decoder network with a U-Net structure;

inputting the feature maps L1, L2, L3, L4 and L5 generated by the RGB branch into the U-Net to predict the first coarse depth completion map Depth1 and confidence map C1;

inputting the feature maps M1, M2, M3, M4 and M5 generated by the point cloud branch into the U-Net to predict the second coarse depth completion map Depth2 and confidence map C2.

Preferably, inputting the preliminarily predicted completion depth map into the confidence uncertainty estimation module to obtain the final refined depth completion comprises:

adding the generated confidence maps C1 and C2 to obtain the feature map C, applying the Softmax function to C for uncertainty prediction, and predicting the uncertainty proportions F1 and F2 of each confidence map pixel by pixel;

multiplying the uncertainty maps F1 and F2 with the coarse depth completions Depth1 and Depth2 respectively to obtain the final optimized depth completion map.

Compared with the prior art, the beneficial effects of the present invention are:

By using a conditional variational autoencoder to learn the feature distribution of real dense depth maps, the color image and the sparse depth map are guided to generate more valuable depth features. In addition, point cloud features in three-dimensional space are used to capture the spatial structure under a different modality, which strengthens the geometric awareness of the network and provides auxiliary information for predicting more accurate depth values. Moreover, the dynamic graph message propagation module effectively fuses the features of the color image and the point cloud, achieving high-precision depth completion. The method compensates for the excessive sparsity of the data collected by lidar, allowing a low-cost lidar with fewer scan lines to obtain relatively accurate and dense depth information, and thus provides a cost-effective solution for industries that require accurate and dense depth data, such as autonomous driving and robotic environment perception.

Description of the Drawings

Figure 1 is a flow chart of a scene depth completion method based on a conditional variational autoencoder and geometric guidance provided by an embodiment of the present invention;

Figure 2 is a depth completion map produced by a scene depth completion method based on a conditional variational autoencoder and geometric guidance provided by an embodiment of the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.

The execution subject of the method in this embodiment is a terminal. The terminal may be a mobile phone, a tablet computer, a PDA, a notebook or a desktop computer; of course, it may also be another device with similar functions, which is not limited by this embodiment.

Referring to Figures 1 and 2, the present invention provides a scene depth completion method based on a conditional variational autoencoder and geometric guidance. The method is applied to depth completion of autonomous driving scenes and comprises:

Step S1: acquire a color image, a sparse depth map and a dense depth map in an autonomous driving scene.

Specifically, step S1 further comprises the following steps:

S101: use a color camera and a lidar to capture a color image and a sparse depth map in the autonomous driving scene;

S102: use the Sparsity Invariant CNNs algorithm to convert the sparse depth map into a dense depth map, which serves as the ground-truth label for auxiliary training.

Autonomous vehicles are mainly equipped with a color camera and a lidar, which collect RGB images and depth images respectively. In this method a completed depth map must additionally be generated as the training label. The specific steps are as follows:

use a color camera and a Velodyne HDL-64E lidar to capture color images and depth images in autonomous driving scenes; use the Sparsity Invariant CNNs algorithm to convert the sparse depth maps into dense depth maps that serve as ground-truth labels.
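As a rough illustration of the idea behind Sparsity Invariant CNNs, the PyTorch sketch below implements a single normalized (mask-aware) convolution layer; the kernel size, channel widths and epsilon are assumptions, not the configuration used in this embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConv(nn.Module):
    """One sparsity-invariant (normalized) convolution: the output at each
    position is averaged over the valid lidar pixels inside the kernel window,
    and the validity mask is propagated by max pooling."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)
        self.register_buffer("ones", torch.ones(1, 1, k, k))

    def forward(self, x, mask):
        # x: (B, C, H, W) sparse depth features; mask: (B, 1, H, W), 1 where a lidar return exists
        num = self.conv(x * mask)                             # weighted sum over observed pixels only
        cnt = F.conv2d(mask, self.ones, padding=self.k // 2)  # number of observed pixels per window
        out = num / (cnt + 1e-8) + self.bias.view(1, -1, 1, 1)
        return out, self.pool(mask)                           # densified features, propagated mask
```

Several such layers stacked on the raw sparse depth and its validity mask give a densified map in the spirit of the cited algorithm.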

Step S2: design a conditional variational autoencoder with a prior network and a posterior network, input the color image and the sparse depth map into the prior network to extract features, and input the color image, the sparse depth map and the dense depth map into the posterior network to extract features.

Specifically, step S2 further comprises the following steps:

S201: based on ResNet-structured feature extraction modules, design a prior network and a posterior network with identical structures as the conditional variational autoencoder;

S202: input the color image and the sparse depth map into the prior network to extract the last-layer feature map Prior, and at the same time input the color image, the sparse depth map and the ground-truth label into the posterior network to extract the last-layer feature map Posterior; then compute the mean and variance of the Prior and Posterior feature maps to obtain the probability distributions D1 and D2 of the respective features, and use the Kullback-Leibler divergence loss to supervise the discrepancy between D1 and D2, so that the prior network learns the ground-truth label features captured by the posterior network.
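As one concrete reading of S202, the sketch below parameterizes D1 and D2 as diagonal Gaussians predicted from the two last-layer feature maps and evaluates the closed-form KL divergence between them; the 1x1 convolution heads, the channel sizes and the direction KL(D2 || D1) are assumptions, since the text does not spell them out.

```python
import torch.nn as nn

class GaussianHead(nn.Module):
    """Maps a last-layer feature map to the mean and log-variance of a
    diagonal Gaussian over a latent feature map (assumed head design)."""
    def __init__(self, in_ch, latent_ch):
        super().__init__()
        self.mu = nn.Conv2d(in_ch, latent_ch, 1)
        self.logvar = nn.Conv2d(in_ch, latent_ch, 1)

    def forward(self, feat):
        return self.mu(feat), self.logvar(feat)

def kl_divergence(mu_p, logvar_p, mu_q, logvar_q):
    """Closed-form KL(D2 || D1) between diagonal Gaussians, with D1 = N(mu_p, var_p)
    from the prior branch and D2 = N(mu_q, var_q) from the posterior branch."""
    var_p, var_q = logvar_p.exp(), logvar_q.exp()
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.mean()

# mu_p, logvar_p = GaussianHead(512, 128)(prior_feat)       # prior branch, last layer
# mu_q, logvar_q = GaussianHead(512, 128)(posterior_feat)   # posterior branch, last layer
# loss_kl = kl_divergence(mu_p, logvar_p, mu_q, logvar_q)
```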

Step S3: convert the sparse depth map into a point cloud using the camera intrinsics, extract geometric spatial features with a point cloud up-sampling model, and map them back onto the sparse depth map.

Specifically, step S3 further comprises the following steps:

S301: convert the sparse depth image pixels (ui, vi) from the pixel coordinate system to the camera coordinate system according to the camera intrinsics (focal length, optical center) to obtain the point cloud coordinates (xi, yi, zi) of the three-dimensional scene, forming sparse point cloud data S;
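Under the usual pinhole camera model, and consistent with the variable definitions in the next paragraph, this pixel-to-camera-coordinate conversion can be written as:

\[
x_i = \frac{(u_i - c_x)\, d_i}{f_x}, \qquad
y_i = \frac{(v_i - c_y)\, d_i}{f_y}, \qquad
z_i = d_i
\]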

where (cx, cy) are the optical center coordinates of the camera, fx and fy are the focal lengths of the camera along the x-axis and y-axis respectively, and di is the depth value at (ui, vi); for the ground-truth depth map, the same formula is also used to generate a dense label point cloud S1;

S302: randomly sample the point cloud S several times to obtain point cloud sets of different sizes; for each point cloud set, aggregate the 16 nearest points around each point with the KNN nearest-neighbor algorithm and feed them into a geometry-aware neural network to extract the local geometric features of that point;
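A minimal PyTorch sketch of the KNN grouping and per-point geometric feature extraction in S302; the explicit pairwise-distance computation, the 32/64-channel MLP and the max pooling are illustrative assumptions about details the text leaves open.

```python
import torch.nn as nn

def knn_group(points, k=16):
    """points: (N, 3) sparse point cloud S. For each point, return its k nearest
    neighbours expressed relative to that point, i.e. the local neighbourhood."""
    d2 = ((points.unsqueeze(1) - points.unsqueeze(0)) ** 2).sum(-1)  # (N, N) squared distances
    idx = d2.topk(k + 1, largest=False).indices[:, 1:]               # drop the point itself
    return points[idx] - points.unsqueeze(1)                         # (N, k, 3) centred neighbourhoods

class GeometryAwareMLP(nn.Module):
    """Per-point MLP over the 16-point neighbourhood followed by max pooling,
    giving one local geometric feature vector per point."""
    def __init__(self, out_ch=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, out_ch))

    def forward(self, points):
        local = knn_group(points, k=16)   # (N, 16, 3)
        feat = self.mlp(local)            # (N, 16, out_ch)
        return feat.max(dim=1).values     # (N, out_ch) local geometric features
```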

S303: add the sparse point cloud features extracted for each point to the original point cloud coordinates (xi, yi, zi) to obtain the point cloud encoding feature Q, feed Q into a four-times up-sampling multi-layer perceptron network to obtain the predicted dense point cloud S2, and compute the loss between the real dense point cloud S1 and the predicted dense point cloud S2 with the Chamfer Distance loss function to supervise the training of the network; the CD loss is computed as follows:
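A standard two-directional Chamfer Distance between S1 and S2, matching the term-by-term description in the next sentence (unsquared distances are shown, which is one common convention), is:

\[
L_{CD}(S_1, S_2) = \sum_{x \in S_1} \min_{y \in S_2} \lVert x - y \rVert_2 \;+\; \sum_{y \in S_2} \min_{x \in S_1} \lVert x - y \rVert_2
\]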

where the first term is the sum of the minimum distances from any point x in S1 to S2, and the second term is the sum of the minimum distances from any point y in S2 to S1.

Step S4: use the dynamic graph message propagation module to fuse image features and point cloud features.

Specifically, step S4 further comprises the following steps:

S401: design two encoding networks with identical structures, each encoder consisting of 5 ResNet stages; input the color image and the sparse depth map into the RGB-branch encoder to extract feature maps L1, L2, L3, L4 and L5 at five different scales, and input the point cloud feature Q and the sparse depth map into the point-cloud-branch encoder to extract feature maps P1, P2, P3, P4 and P5 at five different scales;

S402: for L1, L2, L3, L4 and L5, use dilated convolution to gather pixels from different receptive fields, then use deformable convolution to learn a coordinate offset for each pixel, so that each pixel dynamically aggregates strongly correlated surrounding feature values, yielding features T1, T2, T3, T4 and T5;
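One way to realize S402 in PyTorch is sketched below. The 3x3 kernels, the dilation rate of 2 and the offset-prediction convolution are assumptions, and torchvision's DeformConv2d is used as a stand-in for the dynamic aggregation step described in the text.

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DynamicGraphPropagation(nn.Module):
    """Dilated convolution first enlarges the receptive field; a deformable
    convolution then lets every pixel aggregate the surrounding positions
    selected by its learned offsets (the dynamically built graph of
    strongly correlated pixels)."""
    def __init__(self, ch, dilation=2):
        super().__init__()
        self.dilated = nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation)
        self.offset = nn.Conv2d(ch, 2 * 3 * 3, 3, padding=1)  # (dx, dy) per 3x3 kernel position
        self.deform = DeformConv2d(ch, ch, 3, padding=1)

    def forward(self, L):
        x = self.dilated(L)                   # pixels gathered from a wider receptive field
        T = self.deform(x, self.offset(x))    # per-pixel dynamic aggregation
        return T

# For each scale i: T_i = DynamicGraphPropagation(ch_i)(L_i), then M_i = T_i + P_i (step S403)
```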

S403: add T1, T2, T3, T4 and T5, which are rich in dynamic graph features, to the point cloud encoding feature maps P1, P2, P3, P4 and P5 to obtain point cloud feature maps M1, M2, M3, M4 and M5 that contain both semantic and geometric information.

Step S5: generate a preliminary depth completion map using the U-shaped encoder-decoder based on a residual network.

Specifically, step S5 further comprises the following steps:

S501: design a multi-scale decoder structure matching the encoder structure in step S4 to form an encoder-decoder network with a U-Net structure;

S502: input the feature maps L1, L2, L3, L4 and L5 generated by the RGB branch into the U-Net to predict the first coarse depth completion map Depth1 and confidence map C1;

S503: input the feature maps M1, M2, M3, M4 and M5 generated by the point cloud branch into the U-Net to predict the second coarse depth completion map Depth2 and confidence map C2.
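A possible form of the prediction head at the end of each U-Net branch is shown below; the text only states that each branch outputs a coarse depth map and a confidence map, so the 1x1 convolutions are an assumption.

```python
import torch.nn as nn

class DepthConfidenceHead(nn.Module):
    """Final block of each U-Net branch: one 1x1 convolution outputs the coarse
    depth map, another outputs the per-pixel confidence map."""
    def __init__(self, ch):
        super().__init__()
        self.depth = nn.Conv2d(ch, 1, 1)
        self.conf = nn.Conv2d(ch, 1, 1)

    def forward(self, decoder_feat):
        return self.depth(decoder_feat), self.conf(decoder_feat)

# Depth1, C1 = DepthConfidenceHead(ch)(rgb_decoder_feat)
# Depth2, C2 = DepthConfidenceHead(ch)(point_cloud_decoder_feat)
```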

Step S6: input the preliminarily predicted completion depth map into the confidence uncertainty estimation module to achieve the final depth completion optimization.

Specifically, step S6 further comprises the following steps:

S601: add the confidence maps C1 and C2 generated in step S5 to obtain the feature map C, apply the Softmax function to C for uncertainty prediction, and predict the uncertainty proportions F1 and F2 of each confidence map pixel by pixel;

S602: multiply the uncertainty maps F1 and F2 with the coarse depth completions Depth1 and Depth2 respectively to obtain the final optimized depth completion map.
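The sketch below shows one plausible reading of S601 and S602, treating C as the stack of the two confidence maps so that the Softmax yields a pair of per-pixel proportions F1 and F2 that sum to one; this normalization axis, and summing the two weighted depth maps, are assumptions.

```python
import torch

def fuse_depths(depth1, depth2, c1, c2):
    """depth1, depth2, c1, c2: (B, 1, H, W). Returns the final depth map as a
    per-pixel combination of the two coarse predictions weighted by F1 and F2."""
    c = torch.cat([c1, c2], dim=1)      # feature map C built from the two confidence maps
    f = torch.softmax(c, dim=1)         # per-pixel uncertainty proportions F1, F2 (sum to 1)
    f1, f2 = f[:, 0:1], f[:, 1:2]
    return f1 * depth1 + f2 * depth2    # final optimized depth completion map
```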

In this embodiment, first, a conditional variational autoencoder is used to learn the feature distribution of real dense depth maps, guiding the color image and the sparse depth map to generate more valuable depth features; second, point cloud features in three-dimensional space are used to capture the spatial structure under a different modality, strengthening the geometric awareness of the network and providing auxiliary information for predicting more accurate depth values; and the dynamic graph message propagation module effectively fuses the features of the color image and the point cloud, achieving high-precision depth completion prediction.

In addition, it should be noted that the combinations of the technical features in this application are not limited to the combinations recorded in the claims or in the specific embodiments; all technical features recorded in this application can be freely combined unless they contradict each other.

It should be noted that the above are only specific embodiments of the present invention. Obviously, the present invention is not limited to the above embodiments, and many similar variations follow from them. All modifications that a person skilled in the art can derive or conceive directly from the disclosure of the present invention shall fall within the protection scope of the present invention.

The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (6)

1. A scene depth completion method based on a conditional variational autoencoder and geometric guidance, characterized by comprising the following steps:
acquiring a color image, a sparse depth map and a dense depth map in an autonomous driving scene;
designing a conditional variational autoencoder with a prior network and a posterior network, inputting the color image and the sparse depth map into the prior network to extract features, and inputting the color image, the sparse depth map and the dense depth map into the posterior network to extract features;
converting the sparse depth map into a point cloud by using camera intrinsic parameters or focal length, extracting geometric spatial features with a point cloud up-sampling model, and mapping the geometric spatial features back onto the sparse depth map;
adopting a dynamic graph message propagation module to fuse image features and point cloud features, the specific steps of the fusion being as follows: designing two encoding networks with the same structure, wherein each encoder consists of 5 ResNet modules; inputting the color image and the sparse depth map into an RGB branch encoder to extract feature maps L1, L2, L3, L4 and L5 at five different scales, and inputting the point cloud feature Q and the sparse depth map into a point cloud branch encoder to extract feature maps P1, P2, P3, P4 and P5 at five different scales; for L1, L2, L3, L4 and L5, obtaining pixels of different receptive fields by means of dilated convolution, and exploring the coordinate offset of each pixel by means of deformable convolution, so that each pixel dynamically aggregates strongly correlated surrounding feature values to obtain features T1, T2, T3, T4 and T5; and adding T1, T2, T3, T4 and T5, which are rich in dynamic graph features, to the point cloud encoding feature maps P1, P2, P3, P4 and P5 to obtain point cloud feature maps M1, M2, M3, M4 and M5 containing semantic information and geometric information;
generating a preliminary depth completion map by using a U-shaped encoder-decoder based on a residual network;
and inputting the preliminarily predicted completion depth map into a confidence uncertainty estimation module to realize final depth completion optimization.
2. The scene depth completion method based on a conditional variational autoencoder and geometric guidance according to claim 1, wherein the acquiring of the color image and the sparse depth map in the autonomous driving scene comprises:
capturing a color image and a sparse depth map in the autonomous driving scene using a color camera and a lidar;
converting the sparse depth map into a dense depth map by using the Sparsity Invariant CNNs algorithm to serve as a ground-truth label to assist training.
3. The scene depth completion method based on a conditional variational autoencoder and geometric guidance according to claim 2, wherein the designing of the conditional variational autoencoder with a prior network and a posterior network, the inputting of the color image and the sparse depth map into the prior network to extract features, and the inputting of the color image, the sparse depth map and the dense depth map into the posterior network to extract features comprise:
designing, based on feature extraction modules of the ResNet structure, a prior network and a posterior network with the same structure as the conditional variational autoencoder;
inputting the color image and the sparse depth map into the prior network to extract the last-layer feature map Prior, inputting the color image, the sparse depth map and the ground-truth label into the posterior network to extract the last-layer feature map Posterior, then respectively calculating the mean and variance of the Prior and Posterior feature maps to obtain the probability distributions D1 and D2 of the respective features, and using the Kullback-Leibler divergence loss function to supervise the loss between the distributions D1 and D2, so that the prior network learns the ground-truth label features of the posterior network.
4. The scene depth completion method based on a conditional variational autoencoder and geometric guidance according to claim 3, wherein the converting of the sparse depth map into a point cloud using camera intrinsic parameters or focal length, the extracting of geometric spatial features with a point cloud up-sampling model, and the mapping back onto the sparse depth map comprise:
converting the sparse depth image pixels (ui, vi) from the pixel coordinate system to the camera coordinate system to obtain the point cloud coordinates (xi, yi, zi), forming sparse point cloud data S;
wherein (cx, cy) are the optical center coordinates of the camera, fx and fy are the focal lengths of the camera in the x-axis and y-axis directions respectively, and di is the depth value at (ui, vi); for the ground-truth depth map, a dense label point cloud S1 is also generated using the above formula;
randomly sampling the point cloud several times to obtain point cloud sets of different sizes, aggregating, for each point cloud set, the 16 nearest points around each point by using the KNN nearest-neighbor algorithm, and inputting them into a geometry-aware neural network to extract the local geometric features of that point;
adding the sparse point cloud features extracted for each point to the original point cloud coordinates (xi, yi, zi) to obtain a point cloud encoding feature Q, inputting Q into a four-times up-sampling multi-layer perceptron network to obtain a predicted dense point cloud S2, and calculating the loss value between the real dense point cloud S1 and the predicted dense point cloud S2 using the Chamfer Distance loss function to supervise the training of the network;
wherein, in the CD loss, the first term represents the sum of the minimum distances from any point x in S1 to S2, and the second term represents the sum of the minimum distances from any point y in S2 to S1.
5. The scene depth completion method based on a conditional variational autoencoder and geometric guidance according to claim 1, wherein the generating of the preliminary depth completion map using the residual-network-based U-shaped encoder-decoder comprises:
designing a corresponding multi-scale decoder structure according to the encoder structure to form an encoder-decoder network with a U-Net structure;
inputting the feature maps L1, L2, L3, L4 and L5 generated by the RGB branch into the U-Net to predict a first coarse depth completion map Depth1 and a confidence map C1;
inputting the feature maps M1, M2, M3, M4 and M5 generated by the point cloud branch into the U-Net to predict a second coarse depth completion map Depth2 and a confidence map C2.
6. The scene depth completion method based on a conditional variational autoencoder and geometric guidance according to claim 1, wherein the inputting of the preliminarily predicted completion depth map into the confidence uncertainty estimation module to realize final depth completion optimization comprises:
adding the generated confidence maps C1 and C2 to obtain a feature map C, performing uncertainty prediction on C using a Softmax function, and predicting the uncertainty proportions F1 and F2 of each confidence map pixel by pixel;
multiplying the uncertainty maps F1 and F2 by the coarse depth completions Depth1 and Depth2 respectively to obtain the final optimized depth completion map.
CN202310422520.0A 2023-04-20 2023-04-20 Scene depth completion method based on conditional variation self-encoder and geometric guidance Active CN116468768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310422520.0A CN116468768B (en) 2023-04-20 2023-04-20 Scene depth completion method based on conditional variation self-encoder and geometric guidance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310422520.0A CN116468768B (en) 2023-04-20 2023-04-20 Scene depth completion method based on conditional variation self-encoder and geometric guidance

Publications (2)

Publication Number Publication Date
CN116468768A CN116468768A (en) 2023-07-21
CN116468768B true CN116468768B (en) 2023-10-17

Family

ID=87183885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310422520.0A Active CN116468768B (en) 2023-04-20 2023-04-20 Scene depth completion method based on conditional variation self-encoder and geometric guidance

Country Status (1)

Country Link
CN (1) CN116468768B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351310B (en) * 2023-09-28 2024-03-12 山东大学 Multimodal 3D target detection method and system based on depth completion
CN117953029B (en) * 2024-03-27 2024-06-07 北京科技大学 General depth map completion method and device based on depth information propagation
CN118411396B (en) * 2024-04-16 2025-03-21 中国科学院自动化研究所 Depth completion method, device, electronic device and storage medium
CN119151804B (en) * 2024-11-19 2025-03-18 浙江大学 A method and device for generating pseudo points based on mixed light images

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767294A (en) * 2021-01-14 2021-05-07 Oppo广东移动通信有限公司 Depth image enhancement method and device, electronic equipment and storage medium
CN112861729A (en) * 2021-02-08 2021-05-28 浙江大学 Real-time depth completion method based on pseudo-depth map guidance
WO2022045495A1 (en) * 2020-08-25 2022-03-03 Samsung Electronics Co., Ltd. Methods for depth map reconstruction and electronic computing device for implementing the same
CN114998406A (en) * 2022-07-14 2022-09-02 武汉图科智能科技有限公司 Self-supervision multi-view depth estimation method and device
CN115423978A (en) * 2022-08-30 2022-12-02 西北工业大学 Deep Learning-Based Image Laser Data Fusion Method for Building Reconstruction

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11315266B2 (en) * 2019-12-16 2022-04-26 Robert Bosch Gmbh Self-supervised depth estimation method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022045495A1 (en) * 2020-08-25 2022-03-03 Samsung Electronics Co., Ltd. Methods for depth map reconstruction and electronic computing device for implementing the same
CN112767294A (en) * 2021-01-14 2021-05-07 Oppo广东移动通信有限公司 Depth image enhancement method and device, electronic equipment and storage medium
CN112861729A (en) * 2021-02-08 2021-05-28 浙江大学 Real-time depth completion method based on pseudo-depth map guidance
CN114998406A (en) * 2022-07-14 2022-09-02 武汉图科智能科技有限公司 Self-supervision multi-view depth estimation method and device
CN115423978A (en) * 2022-08-30 2022-12-02 西北工业大学 Deep Learning-Based Image Laser Data Fusion Method for Building Reconstruction

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Depth Completion Auto-Encoder; Kaiyue Lu et al.; 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW); full text *
Sparse-to-Dense Multi-Encoder Shape Completion of Unstructured Point Cloud; Yanjun Peng et al.; IEEE Access, Vol. 8; full text *
周云成; 邓寒冰; 许童羽; 苗腾; 吴琼. Unsupervised depth estimation model for tomato plant images based on a dense autoencoder. Transactions of the Chinese Society of Agricultural Engineering, 2020, No. 11, full text. *
王东敏; 彭永胜; 李永乐. Depth image acquisition method based on the fusion of vision and laser point clouds. Journal of Military Transportation University, 2017, No. 10, full text. *
Research on multi-source fusion SLAM technology for robustness and intelligence; 左星星; China Doctoral Dissertations Full-text Database (Information Science and Technology Series); full text *

Also Published As

Publication number Publication date
CN116468768A (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN116468768B (en) Scene depth completion method based on conditional variation self-encoder and geometric guidance
CN109377530B (en) A Binocular Depth Estimation Method Based on Deep Neural Network
CN110992271B (en) Image processing method, path planning method, device, equipment and storage medium
CN111105432B (en) Unsupervised end-to-end driving environment perception method based on deep learning
CN110246181B (en) Anchor point-based attitude estimation model training method, attitude estimation method and system
CN111696148A (en) End-to-end stereo matching method based on convolutional neural network
CN116612468A (en) 3D Object Detection Method Based on Multimodal Fusion and Deep Attention Mechanism
CN116129233A (en) Panoramic Segmentation Method for Autonomous Driving Scene Based on Multimodal Fusion Perception
WO2021249114A1 (en) Target tracking method and target tracking device
CN117422884A (en) Three-dimensional target detection method, system, electronic equipment and storage medium
Shi et al. An improved lightweight deep neural network with knowledge distillation for local feature extraction and visual localization using images and LiDAR point clouds
CN115330935A (en) A 3D reconstruction method and system based on deep learning
CN115511759A (en) A Point Cloud Image Depth Completion Method Based on Cascade Feature Interaction
CN114049362A (en) Transform-based point cloud instance segmentation method
CN117745944A (en) Pre-training model determining method, device, equipment and storage medium
Jia et al. Depth measurement based on a convolutional neural network and structured light
CN117351310B (en) Multimodal 3D target detection method and system based on depth completion
CN118429764A (en) Collaborative sensing method based on multi-mode fusion
CN118537834A (en) Vehicle perception information acquisition method, device, equipment and storage medium
CN113012191A (en) Laser mileage calculation method based on point cloud multi-view projection graph
CN117423102A (en) Point cloud data processing method and related equipment
CN115965961B (en) Local-global multi-mode fusion method, system, equipment and storage medium
CN117953157A (en) Pure vision self-supervision three-dimensional prediction model based on two-dimensional video in automatic driving field
CN116912645A (en) Three-dimensional target detection method and device integrating texture and geometric features
CN117671625A (en) A multi-modal perception method and system for autonomous driving based on diffusion model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant