CN115131569A - An Unguided Depth Completion Method for Custom Kernel Dilation - Google Patents
An Unguided Depth Completion Method for Custom Kernel Dilation
- Publication number: CN115131569A
- Application number: CN202210749638.XA
- Authority: CN (China)
- Prior art keywords: depth, kernel, self, image, guided
- Prior art date: 2022-06-29
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V 10/34: Image preprocessing; smoothing or thinning of the pattern; morphological operations; skeletonisation
- G06N 3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
- G06V 10/774: Pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V 10/82: Image or video recognition or understanding using neural networks
Abstract
An unguided depth completion method with custom kernel dilation for LiDAR depth image processing. The algorithm does not depend on any training data. The present invention adopts a greedy strategy: through depth inversion, pixel hole filling in order from small to large, kernel dilation, and related operations, the input sparse depth map is completed into a dense depth map. The method runs in real time on a 3.8 GHz CPU and requires no additional GPU hardware, so it can be deployed in embedded systems as a preprocessing step for more complex tasks such as SLAM or 3D object detection. Given the wide use of guided depth completion, the method can also serve as a preprocessing step for guided depth completion, or be applied directly where guided completion is unsuitable, such as at night or in other poorly lit environments.
Description
Technical Field
The invention belongs to the technical field of image processing, and in particular relates to an unguided depth completion method with custom kernel dilation.
Background Art
LiDAR is a measurement tool widely used in autonomous driving and robot vision. It outputs a point cloud of the surrounding environment that reflects the scene's 3D depth. However, because a LiDAR scans only a limited number of points per cycle, the depth point clouds produced by commercial LiDARs are often too sparse to meet application needs. Other methods must therefore be used to densify the LiDAR point cloud, that is, to perform depth completion.
In recent years, with the development of deep learning and artificial intelligence, deep learning has increasingly been adopted for depth completion of LiDAR data. The current mainstream approach pairs the sparse LiDAR point cloud with a calibrated color RGB image and uses a deep learning model to complete the depth.
Such approaches usually require training on large datasets and place heavy demands on the GPU used at test time. Moreover, because they rely on calibrated RGB images, they impose strict lighting requirements: RGB images captured at night, with insufficient color contrast, can severely degrade the completion quality.
Summary of the Invention
The purpose of the present invention is to provide a depth completion method that can recover a complete depth map from a LiDAR-scanned sparse depth map without relying on any training data, without using a deep learning model, and without requiring an additional GPU.
To this end, the present invention adopts the following technical solution:
The present invention proposes an unguided depth completion method with custom kernel dilation, comprising the following specific steps:
Step S0: acquire a sparse point cloud image of the target scene by LiDAR scanning.
These sparse point cloud images are taken as input and processed with classical image processing operations, applied in order: depth-encoding inversion, 5×5 diamond kernel dilation, 5×5 full-kernel closing, 13×13 full-kernel dilation, pixel extension, 27×27 full-kernel dilation, median blur, Gaussian blur, and depth-encoding restoration.
Step S1: let larger pixel values overwrite smaller ones. The depth values in the data used by the present invention range from 0 to 250 m, and positions without a valid depth are filled with 0. The valid depths are inverted as

X_inverted = 270.0 − X_input

so that a 20 m buffer separates valid depths from the null value.
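A minimal sketch of this inversion in Python with NumPy (an illustrative reconstruction, not the patent's verbatim implementation; the 0.1 m validity threshold is an assumption):

```python
import numpy as np

def invert_depth(depth_m: np.ndarray) -> np.ndarray:
    """Step S1: invert valid depths so that nearer points get larger values.

    Valid depths lie in (0, 250] m and empty pixels hold 0, so the inverted
    valid range is [20, 270) m, leaving a 20 m buffer above the null value.
    """
    inverted = depth_m.astype(np.float32)
    valid = depth_m > 0.1            # assumed threshold for "has a depth"
    inverted[valid] = 270.0 - depth_m[valid]
    return inverted
```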
Step S2: dilate with a 5×5 diamond-shaped kernel.

Step S3: close small holes with a 5×5 full kernel.
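A sketch of steps S2 and S3 with OpenCV; the exact 5×5 diamond layout below is a plausible assumption, since the invention only names the shape:

```python
import cv2
import numpy as np

# One plausible 5x5 diamond (binary) kernel for step S2.
DIAMOND_KERNEL_5 = np.array(
    [[0, 0, 1, 0, 0],
     [0, 1, 1, 1, 0],
     [1, 1, 1, 1, 1],
     [0, 1, 1, 1, 0],
     [0, 0, 1, 0, 0]], dtype=np.uint8)

FULL_KERNEL_5 = np.ones((5, 5), dtype=np.uint8)

def dilate_and_close(inv_depth: np.ndarray) -> np.ndarray:
    # S2: dilate valid (inverted) depths; larger values overwrite smaller ones.
    out = cv2.dilate(inv_depth, DIAMOND_KERNEL_5)
    # S3: morphological closing with a full 5x5 kernel seals small holes.
    return cv2.morphologyEx(out, cv2.MORPH_CLOSE, FULL_KERNEL_5)
```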
After the preceding steps, some small-to-medium holes in the depth map remain unfilled. To fill them, step S4 first computes a mask of empty pixels and then applies a 13×13 full-kernel dilation.
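Step S4 might look like the following sketch (the `< 0.1` empty-pixel test is an assumed threshold for the inverted encoding):

```python
import cv2
import numpy as np

def fill_empty_pixels(depth: np.ndarray, ksize: int = 13) -> np.ndarray:
    """Step S4: dilate with a full kernel, but write the result only into
    pixels that are still empty; valid pixels keep their values."""
    empty = depth < 0.1
    dilated = cv2.dilate(depth, np.ones((ksize, ksize), dtype=np.uint8))
    out = depth.copy()
    out[empty] = dilated[empty]
    return out
```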
Tall objects such as trees and buildings reach the top of the LiDAR point cloud. To complete the depth of these objects, step S5 extends the topmost pixel value of each column upward.
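A minimal sketch of the column-top extension of step S5 (again assuming the 0.1 validity threshold):

```python
import numpy as np

def extend_column_tops(depth: np.ndarray) -> np.ndarray:
    """Step S5: copy each column's topmost valid value up to the image top,
    completing tall objects that reach the top of the LiDAR point cloud."""
    out = depth.copy()
    for col in range(depth.shape[1]):
        valid_rows = np.flatnonzero(depth[:, col] > 0.1)
        if valid_rows.size > 0:
            top = valid_rows[0]
            out[:top, col] = depth[top, col]
    return out
```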
Step S6: fill the remaining large holes by dilating with a 27×27 full kernel, filling all depth values that are still empty while leaving the valid depth values at other positions unchanged.

Step S7: first apply a 3×3 full-kernel median blur to remove outliers at local edges; then apply a 3×3 full-kernel Gaussian blur to smooth local planes and soften sharp object edges.
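Steps S6 and S7 can be sketched as follows (applying the blurs to the whole map is a simplification; the invention specifies only the kernel sizes):

```python
import cv2
import numpy as np

def fill_large_holes_and_smooth(depth: np.ndarray) -> np.ndarray:
    # S6: fill remaining large holes; only still-empty pixels are written.
    empty = depth < 0.1
    dilated = cv2.dilate(depth, np.ones((27, 27), dtype=np.uint8))
    depth = depth.copy()
    depth[empty] = dilated[empty]
    # S7: a 3x3 median blur removes local edge outliers (float32 images are
    # supported for ksize=3), then a 3x3 Gaussian blur smooths local planes.
    depth = cv2.medianBlur(depth.astype(np.float32), 3)
    return cv2.GaussianBlur(depth, (3, 3), sigmaX=0)
```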
Step S8: first divide the original depth image into 8×8 image blocks and use the formula

x_ij = R_ij X

to obtain the mapping between the image blocks and the original image, where x_ij is the vector representation of the block at position (i, j) of the original image, X is the depth value matrix of the original image, and R_ij is the matrix operator that extracts block x_ij from image X.
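Concretely, R_ij can be read as "cut out the 8×8 block at (i, j) and flatten it". A sketch, assuming a non-overlapping tiling (the text only says the image is divided into 8×8 blocks):

```python
import numpy as np

def extract_block(X: np.ndarray, i: int, j: int, size: int = 8) -> np.ndarray:
    """x_ij = R_ij X: the 8x8 block of depth matrix X at (i, j), flattened
    into a length-64 vector for sparse coding."""
    return X[i:i + size, j:j + size].reshape(-1)

def tile_corners(shape, size=8):
    """Top-left corners of a non-overlapping 8x8 tiling of the image."""
    h, w = shape
    return [(i, j) for i in range(0, h - size + 1, size)
                   for j in range(0, w - size + 1, size)]
```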
Columns of the standard discrete cosine transform (DCT) dictionary are selected so that, at each iteration, the chosen column is maximally correlated with the current residual vector; the correlated component is subtracted from the residual, and this repeats until the number of iterations reaches the prescribed sparsity, at which point the iteration stops.
The optimal sparse coefficients then satisfy

α_k = argmin_α ‖M(b − Dα)‖₂²  subject to  ‖α‖₀ ≤ k

where α_k denotes the optimal sparse representation coefficients of the approximated signal, k is the sparsity at the final iteration, b is the vectorized depth information of the image block, D is the standard discrete cosine transform dictionary, and M is the mask matrix of the image block.
The final inpainted depth map is then obtained from

X̂ = (Σ_ij R_ijᵀ R_ij)⁻¹ (Σ_ij R_ijᵀ D̂ α̂_ij)

where X̂ denotes the depth information matrix finally obtained in step S8, D̂ is the final updated dictionary, α̂_ij are the final sparse representation coefficients, and R_ij is the matrix operator that extracts the 8×8 block x_ij from image X.
Step S9: weight the depth information obtained from the two paths above:

X_weighted = ω₁·X_main + ω₂·X_sub

where X_weighted denotes the weighted depth information of the two paths, X_main the depth information obtained by the main path (steps S1-S7), X_sub the depth information obtained by the branch path (step S8), and ω₁, ω₂ the path weights.
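A minimal sketch of this fusion; the invention does not disclose the weight values, so ω₁ = ω₂ = 0.5 below is purely a placeholder:

```python
import numpy as np

def fuse_paths(x_main: np.ndarray, x_sub: np.ndarray,
               w_main: float = 0.5, w_sub: float = 0.5) -> np.ndarray:
    """Step S9: weighted combination of the morphological (main) path and
    the sparse-coding (branch) path."""
    return w_main * x_main + w_sub * x_sub
```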
Step S10: restore the inverted depth values used in the preceding steps to the original depth encoding by applying the same inversion formula as in step S1:

X_restored = 270.0 − X_input.
Step S11: output the complete depth map.
The beneficial effects of the present invention are:
(1) The present invention uses only classical image processing; it employs no deep learning algorithm, relies on no existing deep learning model, and needs no additional training dataset. When first applying the invention, only the point cloud data to be completed is required to perform depth completion; the model training and testing that deep learning solutions typically require is unnecessary.
(2) All pixels completed by the present invention are inferred solely from the original LiDAR point cloud, without calibrated color RGB images; the method therefore adapts to depth variation across different scenes and is more robust.
(3) The present invention has a simple pipeline structure, introduces no extra test-time parameters, requires no pretraining on other datasets, and has no complex post-processing network. Compared with other, more complex depth completion methods, it offers stronger real-time performance and places minimal demands on test hardware.
In summary, the present invention has a simple pipeline, imposes few requirements on the input data, and can rapidly output a dense depth map. It offers strong real-time performance: under identical test conditions, it achieves the fastest test speed among published depth completion algorithms. Furthermore, because it needs no auxiliary color RGB images and makes low demands on test hardware, the present invention is applicable to a wider range of depth completion scenarios.
Brief Description of the Drawings
FIG. 1 is a flowchart of the unguided depth completion method with custom kernel dilation in an embodiment of the present invention.

FIG. 2 is an example model in an embodiment of the present invention.

FIG. 3 is a schematic diagram of the different kernels compared in an embodiment of the present invention.

FIG. 4 is a visualization of the qualitative results of the method on three samples of the KITTI test set in an embodiment of the present invention.

FIG. 5 is a schematic diagram of qualitative results in an embodiment of the present invention.
Detailed Description of Embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings.
As shown in the flowchart of FIG. 1, an embodiment of the complete method of the present invention proceeds as follows:
The idea and concrete implementation of unguided depth completion with custom kernel dilation are illustrated using the KITTI Depth Completion dataset as the known dataset and completing its sparse depth maps.
The sparse depth maps and ground-truth depth maps of the embodiment are all taken from the KITTI Depth Completion dataset.
Step 1: using the known KITTI Depth Completion training set, apply step 2 to the depth maps provided by the training set.
Step 2: perform depth completion on the sparse depth maps of the training set described in step 1. The main mechanism used to process the sparse data is the OpenCV morphological transformation operation, which overwrites smaller pixel values with larger ones. In the raw KITTI depth maps, nearer pixels have values close to 0 m while the farthest reach 250 m; however, empty pixels also hold 0 m, which prevents using the native OpenCV operations without modification. Applying a dilation directly to the raw depth map would let larger distances overwrite smaller ones, losing the edge information of nearer objects. To solve this, valid (non-empty) pixel depths are inverted according to X_inverted = 270.0 − X_input, which also creates a 20 m buffer between valid and empty pixel values. The inversion preserves the nearer edges when dilations are applied, and the 20 m buffer offsets the valid depths so that invalid pixels can be masked out in subsequent operations.
Step 3: fill the empty pixels closest to valid pixels first, since these are the most likely to share a depth value close to the valid depth. Considering the sparsity of the projected points and the structure of the LiDAR scan lines, a custom kernel is designed for the initial dilation of each valid depth pixel. The kernel shape is designed so that the pixels most likely to share a value are dilated to the same value. The four kernel shapes shown in FIG. 3 were implemented and evaluated; based on the experimental results, a 5×5 diamond kernel is used to dilate all valid pixels.
Step 4: after the initial dilation step, many holes remain in the depth map. Since these regions contain no depth values, the structure of objects in the environment is considered, noting that nearby dilated depth patches can be connected to form object edges. A morphological closing with a 5×5 full kernel is used to close small holes in the depth map. The operation uses a binary kernel and preserves object edges. This step connects nearby depth values and can be viewed as operating on a set of 5×5-pixel planes ordered from farthest to nearest.
Step 5: some small-to-medium holes in the depth map are not filled by the first two dilation operations. To fill them, a mask of empty pixels is computed first, followed by a 13×13 full-kernel dilation. The operation fills only empty pixels, leaving the previously computed valid pixels unchanged.
Step 6: to account for tall objects that extend above the top of the LiDAR points, such as trees, poles, and buildings, the topmost value of each column is extrapolated to the top of the image, giving a denser depth map output.
Step 7: the final filling step handles the larger holes that remain incompletely filled in the depth map. Since these regions contain no points, and no image data is used, the depth values of these pixels are inferred from nearby values. A dilation with a 27×27 full kernel fills any remaining empty pixels while keeping valid pixels unchanged.
Step 8: after the preceding steps, a dense depth map is finally obtained. In this depth map, abnormal (extreme) values are by-products of the dilation operations. To remove these outliers, a median filter with a 3×3 kernel is applied; this denoising step is important because it removes outliers while preserving local edges. Finally, a 3×3 Gaussian filter smooths local planes and refines the edges of sharp parts.
Step 9: after the depth inversion of step 2 is complete, divide the inverted original depth image into 8×8 image blocks and use the formula

x_ij = R_ij X

to obtain the mapping between the image blocks and the original image, where x_ij is the vector representation of the block at position (i, j), X is the depth value matrix of the original image, and R_ij is the matrix operator that extracts block x_ij from image X.
Columns of the standard discrete cosine transform dictionary are selected so that, at each iteration, the chosen column is maximally correlated with the current residual vector; the correlated component is subtracted from the residual, and this repeats until the number of iterations reaches the prescribed sparsity, at which point the iteration stops. The core algorithm steps are as follows:
Initialization: k = 0, α⁰ = 0, residual r⁰ = b, support set S⁰ = ∅.

Iteration: compute the error ε(j) = min_z ‖d_j z − r^(k−1)‖₂² for every dictionary column d_j; find the index j₀ with the smallest error, ε(j₀) ≤ ε(j) for all j, and update the support set S^k = S^(k−1) ∪ {j₀}.

Update the sparse representation coefficients: for the given support set S^k, solve for the optimal sparse representation coefficients of the approximated signal, α^k = argmin_α ‖b − Dα‖₂² subject to supp(α) ⊆ S^k.

Update the residual: r^k = b − Dα^k.

Iterate: if ‖r^k‖₂ ≤ ε, stop the iterative update; otherwise set k ← k + 1 and repeat.

The final sparse representation solution is α_k.
Through this iteration, a number of iterations J and an error threshold ε satisfying the stopping conditions are reached, giving the final updated dictionary D̂ and sparse representation matrix Â. Averaging the reconstructed blocks then yields the final inpainted depth map.
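A compact sketch of the orthogonal matching pursuit described above, for one flattened 8×8 patch. The 1-D DCT-II dictionary construction and the masking of unobserved pixels are assumptions; the text specifies only a "standard discrete cosine transform dictionary":

```python
import numpy as np

def dct_dictionary(n: int = 64) -> np.ndarray:
    """Column-normalized DCT-II dictionary for length-n patch vectors."""
    m, l = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    D = np.cos(np.pi * (m + 0.5) * l / n)
    return D / np.linalg.norm(D, axis=0)

def masked_omp(b: np.ndarray, mask: np.ndarray, D: np.ndarray,
               sparsity: int, tol: float = 1e-6) -> np.ndarray:
    """Approximately solve  min ||M(b - Da)||_2  s.t.  ||a||_0 <= sparsity.

    b: flattened patch (length n); mask: 1 where a depth value is observed.
    """
    A = mask[:, None] * D                 # atoms restricted to observed pixels
    r = mask * b                          # initial residual
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        j0 = int(np.argmax(np.abs(A.T @ r)))      # most correlated column
        if j0 not in support:
            support.append(j0)
        # Least-squares coefficients over the current support set.
        coef, *_ = np.linalg.lstsq(A[:, support], mask * b, rcond=None)
        r = mask * b - A[:, support] @ coef       # update the residual
        if np.linalg.norm(r) <= tol:
            break
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

# Inpainting a patch: keep observed pixels, take missing ones from D @ alpha.
```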
Step 10: weight the depth maps obtained from the two paths according to

X_weighted = ω₁·X_main + ω₂·X_sub

where X_weighted denotes the weighted depth information of the two paths, X_main the depth information obtained by the main path, and X_sub the depth information obtained by the branch path.
Finally, the inverted depth values used in the earlier steps of the algorithm are restored to the original depth encoding, computed simply as

X_restored = 270.0 − X_input.
Step 11: output the complete depth map.
The depth completion test uses a large number of LiDAR scans projected into image coordinates to form depth maps. The LiDAR points are projected onto image coordinates using the front-camera calibration matrix, yielding a sparse depth map of the same size as the RGB image. The sparsity arises because the LiDAR resolution is far lower than that of the image space it is projected into; owing to the angles of the LiDAR scan lines, only the bottom two-thirds of the depth map contains points, and even in that bottom region the density of valid pixels is only 5-7%. A corresponding RGB image is provided for each depth map but is not used by the unguided depth completion algorithm. The provided 1000-image validation set is used to evaluate all experiments, and the final results on the 1000-image test set are submitted to and evaluated by the KITTI test server. Algorithm and baseline performance are evaluated with the inverse root mean square error (iRMSE), inverse mean absolute error (iMAE), root mean square error (RMSE), and mean absolute error (MAE) metrics.
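For reference, the four benchmark metrics can be computed as in the sketch below (units follow the usual KITTI convention, RMSE/MAE in millimetres on depth and iRMSE/iMAE in 1/km on inverse depth; treating this embodiment's setup as matching that convention is an assumption):

```python
import numpy as np

def kitti_depth_metrics(pred_mm: np.ndarray, gt_mm: np.ndarray):
    """RMSE, MAE (mm) and iRMSE, iMAE (1/km) over valid ground-truth pixels."""
    valid = gt_mm > 0
    d = pred_mm[valid].astype(np.float64)
    g = gt_mm[valid].astype(np.float64)
    rmse = float(np.sqrt(np.mean((d - g) ** 2)))
    mae = float(np.mean(np.abs(d - g)))
    inv_d, inv_g = 1e6 / d, 1e6 / g   # mm -> 1/km: 1 / (depth_mm * 1e-6 km)
    irmse = float(np.sqrt(np.mean((inv_d - inv_g) ** 2)))
    imae = float(np.mean(np.abs(inv_d - inv_g)))
    return rmse, mae, irmse, imae
```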
Table 1: performance comparison on the KITTI depth completion test set between the three most commonly used depth completion methods and the present method. Results were generated by KITTI's evaluation server.
As the table shows, the present method improves on the runner-up NN+CNN algorithm on the KITTI dataset by 131.29 mm in root mean square error (RMSE) and 113.54 mm in mean absolute error (MAE). This corresponds to an average error difference of roughly 11 cm in the final point cloud, which matters for accurate 3D object localization, obstacle avoidance, and SLAM (simultaneous localization and mapping).
Table 2: influence of dilation kernel shape and size on algorithm performance.
The design of the initial dilation kernel strongly affects the algorithm's performance. To find the optimal kernel, full kernels of size 3×3, 5×5, and 7×7 were compared. The 7×7 kernel was found to spread depth values beyond their actual region of influence, while the 3×3 kernel did not dilate enough pixels for the boundaries to be connected by the subsequent hole-closing operation. Table 2 shows that the 5×5 kernel provides the lowest RMSE. Building on the kernel-size experiments, the design space of 5×5 binary kernel shapes was explored: the full kernel served as the baseline and was compared with circular, cross, and diamond shapes. The kernel shape determines each pixel's initial region of effect. Table 2 shows that the diamond kernel yields the lowest RMSE; its shape keeps the rough outline of rounded edges and is large enough for the edges to be connected by the following hole-closing operation.
Table 3: effect of the filtering choices on accuracy and runtime.
The median filter is designed to remove salt-and-pepper noise, so it effectively removes abnormal depth values; it adds 2 ms of runtime but lowers both RMSE and MAE. Bilateral filtering preserves local structure by filtering only nearby pixels with similar values; it has minimal effect on the evaluated RMSE and MAE metrics but adds 4 ms of runtime. Because the RMSE metric uses a Euclidean computation, Gaussian filtering significantly reduces RMSE by limiting the influence of outlier pixel depths; it also runs fastest, adding only 1 ms of average runtime. The final method uses median filtering followed by Gaussian filtering, since this combination was shown to provide the lowest RMSE.
In summary, the depth completion method proposed in this embodiment takes a sparse depth map as input and outputs a dense depth map. It uses only traditional image processing techniques and requires no training, making it robust to overfitting. The experiments show that this image-processing-based algorithm delivers competitive results on the KITTI depth completion benchmark, outperforming several deep-learning-based methods. The method runs in real time on a CPU at 3.8 GHz or above without any additional GPU hardware, making it a strong candidate for deployment on embedded systems as a preprocessing step for more complex tasks such as SLAM or 3D object detection. Finally, this work is not intended to diminish the power of deep learning systems, but to highlight a current research trend in which classical methods are not carefully considered for comparison, even though, properly designed, they can serve as strong baselines.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; any equivalent modifications or changes made by those of ordinary skill in the art based on the disclosure of the present invention shall fall within the protection scope recited in the claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210749638.XA CN115131569A (en) | 2022-06-29 | 2022-06-29 | An Unguided Depth Completion Method for Custom Kernel Dilation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210749638.XA CN115131569A (en) | 2022-06-29 | 2022-06-29 | An Unguided Depth Completion Method for Custom Kernel Dilation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115131569A (en) | 2022-09-30 |
Family
ID=83380019
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210749638.XA Pending CN115131569A (en) | 2022-06-29 | 2022-06-29 | An Unguided Depth Completion Method for Custom Kernel Dilation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115131569A (en) |
- 2022-06-29: CN application CN202210749638.XA filed; published as CN115131569A, status Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108171659A (en) * | 2017-12-01 | 2018-06-15 | 天津大学 | A kind of image repair method based on K-SVD dictionaries |
CN112734825A (en) * | 2020-12-31 | 2021-04-30 | 深兰人工智能(深圳)有限公司 | Depth completion method and device for 3D point cloud data |
CN112861729A (en) * | 2021-02-08 | 2021-05-28 | 浙江大学 | Real-time depth completion method based on pseudo-depth map guidance |
Non-Patent Citations (2)
Title |
---|
JASON KU ET AL.: "In Defense of Classical Image Processing: Fast Depth Completion on the CPU", ARXIV, 31 January 2018 (2018-01-31), pages 1 - 7 *
WANG XIN; ZHU XINGCHENG; NING CHEN; WANG HUIBIN: "Image inpainting algorithm based on redundant dictionary learning", Computer Engineering and Applications, vol. 54, no. 2018, 15 March 2018 (2018-03-15), pages 198 - 204 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |