CN115131569A - An Unguided Depth Completion Method for Custom Kernel Dilation - Google Patents
An Unguided Depth Completion Method for Custom Kernel Dilation
- Publication number: CN115131569A
- Application number: CN202210749638.XA
- Authority: CN (China)
- Prior art keywords: depth, kernel, self, image, guided
- Prior art date: 2022-06-29
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V 10/34: Image preprocessing; smoothing or thinning of the pattern; morphological operations; skeletonisation
- G06N 3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
- G06V 10/774: Pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V 10/82: Image or video recognition or understanding using neural networks
Abstract
An unguided depth completion method with custom kernel dilation for LiDAR depth image processing. The algorithm does not depend on any training data. The present invention adopts a greedy strategy: through depth inversion, pixel hole filling in order from small to large, kernel dilation, and related operations, the input sparse depth map is completed into a dense depth map. The method runs in real time on a 3.8 GHz CPU and requires no additional GPU hardware, so it can be deployed in embedded systems as a preprocessing step for more complex tasks such as SLAM or 3D object detection. Given the wide use of guided depth completion, the method can also serve as a preprocessing step for guided depth completion, or be applied directly where guided completion is unsuitable, such as at night or in other poorly lit environments.
Description
Technical Field
The invention belongs to the technical field of image processing, and in particular relates to an unguided depth completion method with custom kernel dilation.
Background Art
LiDAR is a measurement tool widely used in autonomous driving and robot vision. It outputs a point cloud of the surrounding environment that reflects the scene's 3D depth. However, because a LiDAR scans only a limited number of points per cycle, the depth point clouds produced by commercial LiDARs are often too sparse to meet application needs. Other methods must therefore be used to densify the LiDAR point cloud, that is, to perform depth completion.
In recent years, with the development of deep learning and artificial intelligence, deep learning has increasingly been adopted for depth completion of LiDAR data. The current mainstream approach pairs the sparse LiDAR point cloud with a calibrated color RGB image and uses a deep learning model to complete the depth.
Such approaches usually require training on large datasets and place heavy demands on the GPU used at test time. Moreover, because they rely on calibrated RGB images, they impose strict lighting requirements: RGB images captured at night, with insufficient color contrast, can severely degrade the completion quality.
Summary of the Invention
The purpose of the present invention is to provide a depth completion method that can recover a complete depth map from a LiDAR-scanned sparse depth map without relying on any training data, without using a deep learning model, and without requiring an additional GPU.
To this end, the present invention adopts the following technical solution:
The present invention proposes an unguided depth completion method with custom kernel dilation, comprising the following specific steps:
Step S0: acquire a sparse point cloud image of the target scene by LiDAR scanning.
These sparse point cloud images are taken as input and processed with classical image processing operations, applied in order: depth-encoding inversion, 5×5 diamond kernel dilation, 5×5 full-kernel closing, 13×13 full-kernel dilation, pixel extension, 27×27 full-kernel dilation, median blur, Gaussian blur, and depth-encoding restoration.
Step S1: let larger pixel values overwrite smaller ones. The depth values in the data used by the present invention range from 0 to 250 m, and positions without a valid depth are filled with 0. The valid depths are inverted as

X_inverted = 270.0 − X_input

so that a 20 m buffer separates valid depths from the null value.
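A minimal sketch of this inversion in Python with NumPy (an illustrative reconstruction, not the patent's verbatim implementation; the 0.1 m validity threshold is an assumption):

```python
import numpy as np

def invert_depth(depth_m: np.ndarray) -> np.ndarray:
    """Step S1: invert valid depths so that nearer points get larger values.

    Valid depths lie in (0, 250] m and empty pixels hold 0, so the inverted
    valid range is [20, 270) m, leaving a 20 m buffer above the null value.
    """
    inverted = depth_m.astype(np.float32)
    valid = depth_m > 0.1            # assumed threshold for "has a depth"
    inverted[valid] = 270.0 - depth_m[valid]
    return inverted
```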
Step S2: dilate with a 5×5 diamond-shaped kernel.

Step S3: close small holes with a 5×5 full kernel.
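A sketch of steps S2 and S3 with OpenCV; the exact 5×5 diamond layout below is a plausible assumption, since the invention only names the shape:

```python
import cv2
import numpy as np

# One plausible 5x5 diamond (binary) kernel for step S2.
DIAMOND_KERNEL_5 = np.array(
    [[0, 0, 1, 0, 0],
     [0, 1, 1, 1, 0],
     [1, 1, 1, 1, 1],
     [0, 1, 1, 1, 0],
     [0, 0, 1, 0, 0]], dtype=np.uint8)

FULL_KERNEL_5 = np.ones((5, 5), dtype=np.uint8)

def dilate_and_close(inv_depth: np.ndarray) -> np.ndarray:
    # S2: dilate valid (inverted) depths; larger values overwrite smaller ones.
    out = cv2.dilate(inv_depth, DIAMOND_KERNEL_5)
    # S3: morphological closing with a full 5x5 kernel seals small holes.
    return cv2.morphologyEx(out, cv2.MORPH_CLOSE, FULL_KERNEL_5)
```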
After the preceding steps, some small-to-medium holes in the depth map remain unfilled. To fill them, step S4 first computes a mask of empty pixels and then applies a 13×13 full-kernel dilation.
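Step S4 might look like the following sketch (the `< 0.1` empty-pixel test is an assumed threshold for the inverted encoding):

```python
import cv2
import numpy as np

def fill_empty_pixels(depth: np.ndarray, ksize: int = 13) -> np.ndarray:
    """Step S4: dilate with a full kernel, but write the result only into
    pixels that are still empty; valid pixels keep their values."""
    empty = depth < 0.1
    dilated = cv2.dilate(depth, np.ones((ksize, ksize), dtype=np.uint8))
    out = depth.copy()
    out[empty] = dilated[empty]
    return out
```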
Tall objects such as trees and buildings reach the top of the LiDAR point cloud. To complete the depth of these objects, step S5 extends the topmost pixel value of each column upward.
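A minimal sketch of the column-top extension of step S5 (again assuming the 0.1 validity threshold):

```python
import numpy as np

def extend_column_tops(depth: np.ndarray) -> np.ndarray:
    """Step S5: copy each column's topmost valid value up to the image top,
    completing tall objects that reach the top of the LiDAR point cloud."""
    out = depth.copy()
    for col in range(depth.shape[1]):
        valid_rows = np.flatnonzero(depth[:, col] > 0.1)
        if valid_rows.size > 0:
            top = valid_rows[0]
            out[:top, col] = depth[top, col]
    return out
```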
Step S6: fill the remaining large holes by dilating with a 27×27 full kernel, filling all depth values that are still empty while leaving the valid depth values at other positions unchanged.

Step S7: first apply a 3×3 full-kernel median blur to remove outliers at local edges; then apply a 3×3 full-kernel Gaussian blur to smooth local planes and soften sharp object edges.
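Steps S6 and S7 can be sketched as follows (applying the blurs to the whole map is a simplification; the invention specifies only the kernel sizes):

```python
import cv2
import numpy as np

def fill_large_holes_and_smooth(depth: np.ndarray) -> np.ndarray:
    # S6: fill remaining large holes; only still-empty pixels are written.
    empty = depth < 0.1
    dilated = cv2.dilate(depth, np.ones((27, 27), dtype=np.uint8))
    depth = depth.copy()
    depth[empty] = dilated[empty]
    # S7: a 3x3 median blur removes local edge outliers (float32 images are
    # supported for ksize=3), then a 3x3 Gaussian blur smooths local planes.
    depth = cv2.medianBlur(depth.astype(np.float32), 3)
    return cv2.GaussianBlur(depth, (3, 3), sigmaX=0)
```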
Step S8: first divide the original depth image into 8×8 image blocks and use the formula

x_ij = R_ij X

to obtain the mapping between the image blocks and the original image, where x_ij is the vector representation of the block at position (i, j) of the original image, X is the depth value matrix of the original image, and R_ij is the matrix operator that extracts block x_ij from image X.
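Concretely, R_ij can be read as "cut out the 8×8 block at (i, j) and flatten it". A sketch, assuming a non-overlapping tiling (the text only says the image is divided into 8×8 blocks):

```python
import numpy as np

def extract_block(X: np.ndarray, i: int, j: int, size: int = 8) -> np.ndarray:
    """x_ij = R_ij X: the 8x8 block of depth matrix X at (i, j), flattened
    into a length-64 vector for sparse coding."""
    return X[i:i + size, j:j + size].reshape(-1)

def tile_corners(shape, size=8):
    """Top-left corners of a non-overlapping 8x8 tiling of the image."""
    h, w = shape
    return [(i, j) for i in range(0, h - size + 1, size)
                   for j in range(0, w - size + 1, size)]
```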
Columns of the standard discrete cosine transform (DCT) dictionary are selected so that, at each iteration, the chosen column is maximally correlated with the current residual vector; the correlated component is subtracted from the residual, and this repeats until the number of iterations reaches the prescribed sparsity, at which point the iteration stops.
The optimal sparse coefficients then satisfy

α_k = argmin_α ‖M(b − Dα)‖₂²  subject to  ‖α‖₀ ≤ k

where α_k denotes the optimal sparse representation coefficients of the approximated signal, k is the sparsity at the final iteration, b is the vectorized depth information of the image block, D is the standard discrete cosine transform dictionary, and M is the mask matrix of the image block.
The final inpainted depth map is then obtained from

X̂ = (Σ_ij R_ijᵀ R_ij)⁻¹ (Σ_ij R_ijᵀ D̂ α̂_ij)

where X̂ denotes the depth information matrix finally obtained in step S8, D̂ is the final updated dictionary, α̂_ij are the final sparse representation coefficients, and R_ij is the matrix operator that extracts the 8×8 block x_ij from image X.
Step S9: weight the depth information obtained from the two paths above:

X_weighted = ω₁·X_main + ω₂·X_sub

where X_weighted denotes the weighted depth information of the two paths, X_main the depth information obtained by the main path (steps S1-S7), X_sub the depth information obtained by the branch path (step S8), and ω₁, ω₂ the path weights.
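A minimal sketch of this fusion; the invention does not disclose the weight values, so ω₁ = ω₂ = 0.5 below is purely a placeholder:

```python
import numpy as np

def fuse_paths(x_main: np.ndarray, x_sub: np.ndarray,
               w_main: float = 0.5, w_sub: float = 0.5) -> np.ndarray:
    """Step S9: weighted combination of the morphological (main) path and
    the sparse-coding (branch) path."""
    return w_main * x_main + w_sub * x_sub
```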
Step S10: restore the inverted depth values used in the preceding steps to the original depth encoding by applying the same inversion formula as in step S1:

X_restored = 270.0 − X_input.
Step S11: output the complete depth map.
The beneficial effects of the present invention are:
(1) The present invention uses only classical image processing; it employs no deep learning algorithm, relies on no existing deep learning model, and needs no additional training dataset. When first applying the invention, only the point cloud data to be completed is required to perform depth completion; the model training and testing that deep learning solutions typically require is unnecessary.
(2) All pixels completed by the present invention are inferred solely from the original LiDAR point cloud, without calibrated color RGB images; the method therefore adapts to depth variation across different scenes and is more robust.
(3) The present invention has a simple pipeline structure, introduces no extra test-time parameters, requires no pretraining on other datasets, and has no complex post-processing network. Compared with other, more complex depth completion methods, it offers stronger real-time performance and places minimal demands on test hardware.
In summary, the present invention has a simple pipeline, imposes few requirements on the input data, and can rapidly output a dense depth map. It offers strong real-time performance: under identical test conditions, it achieves the fastest test speed among published depth completion algorithms. Furthermore, because it needs no auxiliary color RGB images and makes low demands on test hardware, the present invention is applicable to a wider range of depth completion scenarios.
Brief Description of the Drawings
FIG. 1 is a flowchart of the unguided depth completion method with custom kernel dilation in an embodiment of the present invention.

FIG. 2 is an example model in an embodiment of the present invention.

FIG. 3 is a schematic diagram of the different kernels compared in an embodiment of the present invention.

FIG. 4 is a visualization of the qualitative results of the method on three samples of the KITTI test set in an embodiment of the present invention.

FIG. 5 is a schematic diagram of qualitative results in an embodiment of the present invention.
Detailed Description of Embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings.
As shown in the flowchart of FIG. 1, an embodiment of the complete method of the present invention proceeds as follows:
The idea and concrete implementation of unguided depth completion with custom kernel dilation are illustrated using the KITTI Depth Completion dataset as the known dataset and completing its sparse depth maps.
The sparse depth maps and ground-truth depth maps of the embodiment are all taken from the KITTI Depth Completion dataset.
Step 1: using the known KITTI Depth Completion training set, apply step 2 to the depth maps provided by the training set.
Step 2: perform depth completion on the sparse depth maps of the training set described in step 1. The main mechanism used to process the sparse data is the OpenCV morphological transformation operation, which overwrites smaller pixel values with larger ones. In the raw KITTI depth maps, nearer pixels have values close to 0 m while the farthest reach 250 m; however, empty pixels also hold 0 m, which prevents using the native OpenCV operations without modification. Applying a dilation directly to the raw depth map would let larger distances overwrite smaller ones, losing the edge information of nearer objects. To solve this, valid (non-empty) pixel depths are inverted according to X_inverted = 270.0 − X_input, which also creates a 20 m buffer between valid and empty pixel values. The inversion preserves the nearer edges when dilations are applied, and the 20 m buffer offsets the valid depths so that invalid pixels can be masked out in subsequent operations.
Step 3: fill the empty pixels closest to valid pixels first, since these are the most likely to share a depth value close to the valid depth. Considering the sparsity of the projected points and the structure of the LiDAR scan lines, a custom kernel is designed for the initial dilation of each valid depth pixel. The kernel shape is designed so that the pixels most likely to share a value are dilated to the same value. The four kernel shapes shown in FIG. 3 were implemented and evaluated; based on the experimental results, a 5×5 diamond kernel is used to dilate all valid pixels.
Step 4: after the initial dilation step, many holes remain in the depth map. Since these regions contain no depth values, the structure of objects in the environment is considered, noting that nearby dilated depth patches can be connected to form object edges. A morphological closing with a 5×5 full kernel is used to close small holes in the depth map. The operation uses a binary kernel and preserves object edges. This step connects nearby depth values and can be viewed as operating on a set of 5×5-pixel planes ordered from farthest to nearest.
Step 5: some small-to-medium holes in the depth map are not filled by the first two dilation operations. To fill them, a mask of empty pixels is computed first, followed by a 13×13 full-kernel dilation. The operation fills only empty pixels, leaving the previously computed valid pixels unchanged.
Step 6: to account for tall objects that extend above the top of the LiDAR points, such as trees, poles, and buildings, the topmost value of each column is extrapolated to the top of the image, giving a denser depth map output.
Step 7: the final filling step handles the larger holes that remain incompletely filled in the depth map. Since these regions contain no points, and no image data is used, the depth values of these pixels are inferred from nearby values. A dilation with a 27×27 full kernel fills any remaining empty pixels while keeping valid pixels unchanged.
Step 8: after the preceding steps, a dense depth map is finally obtained. In this depth map, abnormal (extreme) values are by-products of the dilation operations. To remove these outliers, a median filter with a 3×3 kernel is applied; this denoising step is important because it removes outliers while preserving local edges. Finally, a 3×3 Gaussian filter smooths local planes and refines the edges of sharp parts.
Step 9: after the depth inversion of step 2 is complete, divide the inverted original depth image into 8×8 image blocks and use the formula

x_ij = R_ij X

to obtain the mapping between the image blocks and the original image, where x_ij is the vector representation of the block at position (i, j), X is the depth value matrix of the original image, and R_ij is the matrix operator that extracts block x_ij from image X.
Columns of the standard discrete cosine transform dictionary are selected so that, at each iteration, the chosen column is maximally correlated with the current residual vector; the correlated component is subtracted from the residual, and this repeats until the number of iterations reaches the prescribed sparsity, at which point the iteration stops. The core algorithm steps are as follows:
Initialization: k = 0, α⁰ = 0, residual r⁰ = b, support set S⁰ = ∅.

Iteration: compute the error ε(j) = min_z ‖d_j z − r^(k−1)‖₂² for every dictionary column d_j; find the index j₀ with the smallest error, ε(j₀) ≤ ε(j) for all j, and update the support set S^k = S^(k−1) ∪ {j₀}.

Update the sparse representation coefficients: for the given support set S^k, solve for the optimal sparse representation coefficients of the approximated signal, α^k = argmin_α ‖b − Dα‖₂² subject to supp(α) ⊆ S^k.

Update the residual: r^k = b − Dα^k.

Iterate: if ‖r^k‖₂ ≤ ε, stop the iterative update; otherwise set k ← k + 1 and repeat.

The final sparse representation solution is α_k.
Through this iteration, a number of iterations J and an error threshold ε satisfying the stopping conditions are reached, giving the final updated dictionary D̂ and sparse representation matrix Â. Averaging the reconstructed blocks then yields the final inpainted depth map.
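A compact sketch of the orthogonal matching pursuit described above, for one flattened 8×8 patch. The 1-D DCT-II dictionary construction and the masking of unobserved pixels are assumptions; the text specifies only a "standard discrete cosine transform dictionary":

```python
import numpy as np

def dct_dictionary(n: int = 64) -> np.ndarray:
    """Column-normalized DCT-II dictionary for length-n patch vectors."""
    m, l = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    D = np.cos(np.pi * (m + 0.5) * l / n)
    return D / np.linalg.norm(D, axis=0)

def masked_omp(b: np.ndarray, mask: np.ndarray, D: np.ndarray,
               sparsity: int, tol: float = 1e-6) -> np.ndarray:
    """Approximately solve  min ||M(b - Da)||_2  s.t.  ||a||_0 <= sparsity.

    b: flattened patch (length n); mask: 1 where a depth value is observed.
    """
    A = mask[:, None] * D                 # atoms restricted to observed pixels
    r = mask * b                          # initial residual
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        j0 = int(np.argmax(np.abs(A.T @ r)))      # most correlated column
        if j0 not in support:
            support.append(j0)
        # Least-squares coefficients over the current support set.
        coef, *_ = np.linalg.lstsq(A[:, support], mask * b, rcond=None)
        r = mask * b - A[:, support] @ coef       # update the residual
        if np.linalg.norm(r) <= tol:
            break
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

# Inpainting a patch: keep observed pixels, take missing ones from D @ alpha.
```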
Step 10: weight the depth maps obtained from the two paths according to

X_weighted = ω₁·X_main + ω₂·X_sub

where X_weighted denotes the weighted depth information of the two paths, X_main the depth information obtained by the main path, and X_sub the depth information obtained by the branch path.
Finally, the inverted depth values used in the earlier steps of the algorithm are restored to the original depth encoding, computed simply as

X_restored = 270.0 − X_input.
Step 11: output the complete depth map.
The depth completion test uses a large number of LiDAR scans projected into image coordinates to form depth maps. The LiDAR points are projected onto image coordinates using the front-camera calibration matrix, yielding a sparse depth map of the same size as the RGB image. The sparsity arises because the LiDAR resolution is far lower than that of the image space it is projected into; owing to the angles of the LiDAR scan lines, only the bottom two-thirds of the depth map contains points, and even in that bottom region the density of valid pixels is only 5-7%. A corresponding RGB image is provided for each depth map but is not used by the unguided depth completion algorithm. The provided 1000-image validation set is used to evaluate all experiments, and the final results on the 1000-image test set are submitted to and evaluated by the KITTI test server. Algorithm and baseline performance are evaluated with the inverse root mean square error (iRMSE), inverse mean absolute error (iMAE), root mean square error (RMSE), and mean absolute error (MAE) metrics.
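For reference, the four benchmark metrics can be computed as in the sketch below (units follow the usual KITTI convention, RMSE/MAE in millimetres on depth and iRMSE/iMAE in 1/km on inverse depth; treating this embodiment's setup as matching that convention is an assumption):

```python
import numpy as np

def kitti_depth_metrics(pred_mm: np.ndarray, gt_mm: np.ndarray):
    """RMSE, MAE (mm) and iRMSE, iMAE (1/km) over valid ground-truth pixels."""
    valid = gt_mm > 0
    d = pred_mm[valid].astype(np.float64)
    g = gt_mm[valid].astype(np.float64)
    rmse = float(np.sqrt(np.mean((d - g) ** 2)))
    mae = float(np.mean(np.abs(d - g)))
    inv_d, inv_g = 1e6 / d, 1e6 / g   # mm -> 1/km: 1 / (depth_mm * 1e-6 km)
    irmse = float(np.sqrt(np.mean((inv_d - inv_g) ** 2)))
    imae = float(np.mean(np.abs(inv_d - inv_g)))
    return rmse, mae, irmse, imae
```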
Table 1: performance comparison on the KITTI depth completion test set between the three most commonly used depth completion methods and the present method. Results were generated by KITTI's evaluation server.
As the table shows, the present method improves on the runner-up NN+CNN algorithm on the KITTI dataset by 131.29 mm in root mean square error (RMSE) and 113.54 mm in mean absolute error (MAE). This corresponds to an average error difference of roughly 11 cm in the final point cloud, which matters for accurate 3D object localization, obstacle avoidance, and SLAM (simultaneous localization and mapping).
Table 2: influence of dilation kernel shape and size on algorithm performance.
The design of the initial dilation kernel strongly affects the algorithm's performance. To find the optimal kernel, full kernels of size 3×3, 5×5, and 7×7 were compared. The 7×7 kernel was found to spread depth values beyond their actual region of influence, while the 3×3 kernel did not dilate enough pixels for the boundaries to be connected by the subsequent hole-closing operation. Table 2 shows that the 5×5 kernel provides the lowest RMSE. Building on the kernel-size experiments, the design space of 5×5 binary kernel shapes was explored: the full kernel served as the baseline and was compared with circular, cross, and diamond shapes. The kernel shape determines each pixel's initial region of effect. Table 2 shows that the diamond kernel yields the lowest RMSE; its shape keeps the rough outline of rounded edges and is large enough for the edges to be connected by the following hole-closing operation.
Table 3: effect of the filtering choices on accuracy and runtime.
The median filter is designed to remove salt-and-pepper noise, so it effectively removes abnormal depth values; it adds 2 ms of runtime but lowers both RMSE and MAE. Bilateral filtering preserves local structure by filtering only nearby pixels with similar values; it has minimal effect on the evaluated RMSE and MAE metrics but adds 4 ms of runtime. Because the RMSE metric uses a Euclidean computation, Gaussian filtering significantly reduces RMSE by limiting the influence of outlier pixel depths; it also runs fastest, adding only 1 ms of average runtime. The final method uses median filtering followed by Gaussian filtering, since this combination was shown to provide the lowest RMSE.
In summary, the depth completion method proposed in this embodiment takes a sparse depth map as input and outputs a dense depth map. It uses only traditional image processing techniques and requires no training, making it robust to overfitting. The experiments show that this image-processing-based algorithm delivers competitive results on the KITTI depth completion benchmark, outperforming several deep-learning-based methods. The method runs in real time on a CPU at 3.8 GHz or above without any additional GPU hardware, making it a strong candidate for deployment on embedded systems as a preprocessing step for more complex tasks such as SLAM or 3D object detection. Finally, this work is not intended to diminish the power of deep learning systems, but to highlight a current research trend in which classical methods are not carefully considered for comparison, even though, properly designed, they can serve as strong baselines.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; any equivalent modifications or changes made by those of ordinary skill in the art based on the disclosure of the present invention shall fall within the protection scope recited in the claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210749638.XA CN115131569A (en) | 2022-06-29 | 2022-06-29 | An Unguided Depth Completion Method for Custom Kernel Dilation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210749638.XA CN115131569A (en) | 2022-06-29 | 2022-06-29 | An Unguided Depth Completion Method for Custom Kernel Dilation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115131569A (en) | 2022-09-30 |
Family
ID=83380019
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210749638.XA Pending CN115131569A (en) | 2022-06-29 | 2022-06-29 | An Unguided Depth Completion Method for Custom Kernel Dilation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115131569A (en) |
- 2022-06-29: CN application CN202210749638.XA filed; published as CN115131569A, status Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108171659A (en) * | 2017-12-01 | 2018-06-15 | 天津大学 | A kind of image repair method based on K-SVD dictionaries |
CN112734825A (en) * | 2020-12-31 | 2021-04-30 | 深兰人工智能(深圳)有限公司 | Depth completion method and device for 3D point cloud data |
CN112861729A (en) * | 2021-02-08 | 2021-05-28 | 浙江大学 | Real-time depth completion method based on pseudo-depth map guidance |
Non-Patent Citations (2)
Title |
---|
JASON KU ET AL.: "In Defense of Classical Image Processing: Fast Depth Completion on the CPU", ARXIV, 31 January 2018 (2018-01-31), pages 1 - 7 *
WANG XIN; ZHU XINGCHENG; NING CHEN; WANG HUIBIN: "Image inpainting algorithm based on redundant dictionary learning", Computer Engineering and Applications, vol. 54, no. 2018, 15 March 2018 (2018-03-15), pages 198 - 204 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |