
CN114677659A - Model construction method, system and intelligent terminal for road surface flatness detection - Google Patents


Info

Publication number
CN114677659A
CN114677659A
Authority
CN
China
Prior art keywords
model
rgb
data
image
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210243771.8A
Other languages
Chinese (zh)
Inventor
刘超
王超
高扬
何喜军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Publication of CN114677659A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model construction method, a system and an intelligent terminal for road surface flatness detection. The method comprises: acquiring multiple frames of original images of a target scene, extracting RGB images from the original images, and generating depth maps from the original images; generating a data set from the RGB images and the depth maps, dividing the data set into a training set, a validation set and a test set according to a preset ratio, and labeling each subset; and inputting the RGB-D data of the training set and the validation set into a pre-stored classification detection network model, and obtaining a road surface flatness detection model through model training. The method solves the technical problem of low road surface flatness detection accuracy in the prior art.

Description

Model construction method, system and intelligent terminal for road surface flatness detection

Technical Field

The invention relates to the technical field of driver assistance for automatic driving, and in particular to a model construction method, a system and an intelligent terminal for road surface flatness detection.

Background

With the development of autonomous driving technology, expectations for the safety and comfort of assisted-driving vehicles keep rising. Autonomous driving is the product of the deep integration of the automotive industry with emerging information technologies such as artificial intelligence and the Internet of Things, and it is the main direction of intelligent, connected development in the global automotive and transportation fields. Improving the driving experience and safety by detecting how flat the road surface is has therefore become one of the core problems in the field of autonomous driving. Judging from current urban road conditions, surface irregularities caused by pavement damage and by the speed bumps and manhole covers required by urban infrastructure affect both the driver's and the passengers' experience, and can even increase the risk of driving. Effectively detecting road surface flatness therefore plays a crucial role in vehicle driving.

However, traditional image processing algorithms impose strict requirements on the data, and interference factors such as road shadows and uneven illumination make it difficult for them to give accurate detection results. Classic deep learning models are mainly designed to classify and recognize large-scale, large targets, so grooves, cracks and other road defects of varying shapes are easily missed or misjudged.

Summary of the Invention

To this end, embodiments of the present invention provide a model construction method, a system and an intelligent terminal for road surface flatness detection, so as to at least partially solve the technical problem of low road surface flatness detection accuracy in the prior art.

To achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

A model construction method for road surface flatness detection, the method comprising:

acquiring multiple frames of original images of a target scene, extracting RGB images from the original images, and generating depth maps from the original images;

generating a data set from the RGB images and the depth maps, dividing the data set into a training set, a validation set and a test set according to a preset ratio, and labeling each subset;

inputting the RGB-D data of the training set and the validation set into a pre-stored classification detection network model, and obtaining a road surface flatness detection model through model training.

Further, acquiring multiple frames of original images of the target scene, extracting RGB images from the original images, and generating depth maps from the original images specifically comprises:

collecting binocular road images with a binocular camera and using the binocular road images as the original images;

extracting RGB images from the original images according to the three-channel color principle;

generating depth maps from the original images by stereo matching and three-dimensional reconstruction, the depth maps having the same size as the RGB images.

Further, generating a data set from the RGB images and the depth maps, and dividing the data set into a training set, a validation set and a test set according to a preset ratio specifically comprises:

generating the data set from the left-eye RGB images and the depth maps;

dividing the data set into a training set, a validation set and a test set in a ratio of 3:1:1.

Further, labeling each subset specifically comprises:

labeling the training set and the validation set of left-eye images, the label categories including at least one of longitudinal cracks, transverse cracks, network (alligator) cracks, potholes, speed bumps and manhole covers.

Further, inputting the RGB-D data of the training set and the validation set specifically comprises:

uniformly scaling the RGB images to the same size, adaptively padding the RGB images and the disparity maps with black borders to obtain feature maps, and feeding the feature maps of the training set and the validation set into the network for training.

Further, the output end of the classification detection network model adopts CIoU_Loss as the loss function of the bounding box:

IoU = |A∩B| / |A∪B|

CIoU_Loss = 1 − IoU + D1/D2 + αv

where A and B are the predicted bounding box and the ground-truth box respectively, |A∩B| is the area of their intersection, |A∪B| is the area of their union, and IoU is the intersection-over-union of the two boxes. D1 is the squared Euclidean distance between the center points of the two boxes, and D2 is the squared length of the diagonal of the smallest rectangle that just encloses both boxes.

α = v / ((1 − IoU) + v)

is the weight coefficient, where v is a parameter measuring the consistency of the aspect ratios:

v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²

In the above formula, w_gt and h_gt are the width and height of the ground-truth box, and w and h are the width and height of the predicted box. On top of IoU, the CIoU_Loss function adds the two penalty terms D1/D2 and αv, which account for the distance between the two box centers and for the aspect ratio.

Further, the method also comprises a post-processing step, the post-processing step comprising:

for the screening of multiple candidate boxes, using a weighted non-maximum suppression (NMS) algorithm to remove redundant bounding boxes. Compared with traditional NMS, weighted NMS does not, while culling rectangles, directly discard predicted boxes of the same category whose IoU exceeds the threshold; instead, it averages them weighted by the confidence predicted by the network to obtain a new rectangle, takes this rectangle as the final prediction, and then removes the remaining redundant boxes.

The present invention also provides a model construction system for road surface flatness detection, the system comprising:

a data acquisition unit, configured to acquire multiple frames of original images of a target scene, extract RGB images from the original images, and generate depth maps from the original images;

a data set generation unit, configured to generate a data set from the RGB images and the depth maps, divide the data set into a training set, a validation set and a test set according to a preset ratio, and label each subset;

a model output unit, configured to input the RGB-D data of the training set and the validation set into a pre-stored classification detection network model and obtain a road surface flatness detection model through model training.

The present invention also provides an intelligent terminal, the intelligent terminal comprising a data acquisition device, a processor and a memory;

the data acquisition device is configured to collect data; the memory is configured to store one or more program instructions; and the processor is configured to execute the one or more program instructions so as to perform the method described above.

The present invention also provides a computer-readable storage medium containing one or more program instructions for executing the method described above.

In the model construction method for road surface flatness detection provided by the present invention, multiple frames of original images of a target scene are acquired, RGB images are extracted from the original images, and depth maps are generated from the original images; a data set is generated from the RGB images and the depth maps, divided into a training set, a validation set and a test set according to a preset ratio, and each subset is labeled; based on a pre-stored classification detection network model, the RGB-D data of the training set and the validation set are input into the model, and a road surface flatness detection model is obtained through training. The method uses RGB-D images for target detection, works well for detecting targets such as road cracks, potholes, speed bumps and manhole covers, and thereby solves the technical problem of low road surface flatness detection accuracy in the prior art.

Brief Description of the Drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely exemplary, and those of ordinary skill in the art can derive other implementation drawings from them without creative effort.

The structures, proportions and sizes depicted in this specification are only used to accompany the content disclosed herein, for the understanding of those familiar with this technology, and are not intended to limit the conditions under which the present invention can be implemented; they therefore carry no substantive technical meaning. Any modification of structure, change of proportional relationship or adjustment of size that does not affect the effects and purposes achievable by the present invention shall still fall within the scope of the technical content disclosed by the present invention.

Fig. 1 is a flow chart of a specific embodiment of the model construction method for road surface flatness detection provided by the present invention;

Fig. 2 is a schematic diagram of combining an RGB image and a depth map into an RGB-D image;

Fig. 3 is a network architecture diagram;

Fig. 4 is a schematic diagram of feature transfer;

Fig. 5 is a schematic diagram of feature fusion;

Fig. 6 is a structural block diagram of a specific embodiment of the model construction system for road surface flatness detection provided by the present invention.

Detailed Description of the Embodiments

The embodiments of the present invention are described below by way of specific examples, and those familiar with this technology can easily understand other advantages and effects of the present invention from the content disclosed in this specification. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

To solve the problem of low road surface flatness detection accuracy, the present invention provides a deep-learning-based road surface flatness detection method that trains a network on RGB-D image data. It offers high detection accuracy and fast detection speed, and is suitable for complex urban road environments.

In a specific embodiment, as shown in Fig. 1, the model construction method for road surface flatness detection provided by the present invention comprises the following steps:

S1: Acquire multiple frames of original images of the target scene, extract RGB images from the original images, and generate depth maps from the original images.

Specifically, binocular road images are collected with a binocular camera and used as the original images; RGB images are extracted from the original images according to the three-channel color principle; and depth maps are generated from the original images by stereo matching and three-dimensional reconstruction, the depth maps having the same size as the RGB images.
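The patent does not specify which stereo matching algorithm is used, so the following is only a minimal illustrative sketch of the idea, not the claimed method: a pure-Python block matcher that recovers per-pixel disparity along one rectified scanline by sum-of-absolute-differences (SAD) matching. The names `sad` and `disparity_row` are invented for illustration.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length patches."""
    return sum(abs(p - q) for p, q in zip(a, b))

def disparity_row(left, right, max_disp=5, half_win=1):
    """Per-pixel disparity for one rectified scanline via SAD block matching.

    For each pixel x in the left image, the best match is searched in the
    right image at positions x - d, d = 0..max_disp (standard geometry:
    features shift left in the right view).
    """
    width = len(left)
    disp = [0] * width
    for x in range(half_win, width - half_win):
        patch = left[x - half_win: x + half_win + 1]
        best_d, best_cost = 0, float("inf")
        for d in range(min(max_disp, x - half_win) + 1):
            cand = right[x - d - half_win: x - d + half_win + 1]
            cost = sad(patch, cand)
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp

# A bright feature centered at x = 6 in the left view appears at x = 3
# in the right view, i.e. a disparity of 3 pixels.
left = [0] * 12
left[5:8] = [50, 80, 50]
right = [0] * 12
right[2:5] = [50, 80, 50]
print(disparity_row(left, right)[6])  # 3
```

Production systems would use a full 2D semi-global or learned matcher; this sketch only shows why rectified stereo reduces matching to a 1D search.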

S2: Generate a data set from the RGB images and the depth maps, divide the data set into a training set, a validation set and a test set according to a preset ratio, and label each subset.

A data set is generated from the left-eye RGB images and the depth maps, and divided into a training set, a validation set and a test set in a ratio of 3:1:1.
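The 3:1:1 split can be sketched as follows; the helper name `split_dataset` and the file naming are hypothetical, not taken from the patent.

```python
import random

def split_dataset(samples, ratios=(3, 1, 1), seed=0):
    """Shuffle and split samples into train/val/test subsets according to
    the given ratio (3:1:1 by default, as in the text)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    total = sum(ratios)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# 100 (left-eye RGB, depth) frame pairs -> 60 / 20 / 20
pairs = [(f"rgb_{i}.png", f"depth_{i}.png") for i in range(100)]
train, val, test = split_dataset(pairs)
print(len(train), len(val), len(test))  # 60 20 20
```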

When labeling, the training set and the validation set of left-eye images are annotated; the label categories include at least one of longitudinal cracks, transverse cracks, network (alligator) cracks, potholes, speed bumps and manhole covers.

When feeding the RGB-D data of the training set and the validation set into the network, the RGB images are first uniformly scaled to the same size, and the RGB images and disparity maps are adaptively padded with black borders to obtain feature maps, which are then used for training.

During model training, a data set first needs to be prepared: binocular road images are collected with a binocular camera, and depth maps are generated according to the principles of stereo matching and three-dimensional reconstruction; the depth maps have the same size as the binocular images. The left-eye images and depth maps are divided into a training set, a validation set and a test set in a ratio of 3:1:1, and the training set and validation set of left-eye images are manually annotated according to a unified standard. The label categories include: longitudinal cracks (D00), transverse cracks (D10), network cracks (D20), potholes (D40), speed bumps and manhole covers.

After the data set is prepared, a network channel is added for model training. The input of a typical deep learning network is a three-channel color RGB image; with such data, the scale and position of the target object are captured only vaguely and the false detection rate is high. To address this problem, the present invention adds a channel at the input end to store the depth information of the image, i.e. the input is an RGB-D image. As shown in Fig. 2, RGB-D is the combination of an RGB image and a depth map, with four channels in total: the three RGB channels plus a depth channel representing the distance between each pixel and the imaging device. The depth map and the RGB image are processed in a similar way in the network. The information in the depth map and the RGB image is complementary: the RGB image provides distinct appearance features, while the depth image conveys the geometric structure. Experiments with RGB-D data can better overcome the uncertainty produced when target objects are occluded or overlap, and are more sensitive to the contour and surface features of objects. Since the depth image and the color image are registered, the ICP (Iterative Closest Point) algorithm is used to align the 3D model with the points inside the 2D detection window. Moreover, because the RGB image and the depth map are registered, there is a one-to-one correspondence between their pixels, so the positions of the ground-truth boxes and predicted bounding boxes are consistent across the two.
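Forming the four-channel input described above amounts to stacking a registered depth map onto the RGB channels. A minimal NumPy sketch (the function name `make_rgbd` is an assumption):

```python
import numpy as np

def make_rgbd(rgb, depth):
    """Stack a registered H x W x 3 RGB image and an H x W depth map into
    a four-channel H x W x 4 RGB-D image, as described in the text."""
    if rgb.shape[:2] != depth.shape:
        raise ValueError("RGB image and depth map must be registered "
                         "(same height and width)")
    return np.concatenate([rgb, depth[..., np.newaxis]], axis=-1)

rgb = np.zeros((312, 416, 3), dtype=np.float32)
depth = np.ones((312, 416), dtype=np.float32)
rgbd = make_rgbd(rgb, depth)
print(rgbd.shape)  # (312, 416, 4)
```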

S3: Based on the pre-stored classification detection network model, input the RGB-D data of the training set and the validation set into the model, and obtain the road surface flatness detection model through training.

After the training set and validation set data are input, the training data are randomly rotated, cropped, spliced and rearranged at the input end; the original images are uniformly scaled to the same size (416*416), and the original input images and the disparity information are adaptively padded with the fewest possible black borders to reduce information redundancy, before finally being fed into the Backbone and Neck network structures for training. For example, if the original input image size is 800*600, the scaling factors of its width and height are 416/800 = 0.52 and 416/600 ≈ 0.69 respectively. With this scaling factor, the original image is reduced to 416*312, so black borders must be padded in the height direction. The height that would originally need to be filled is 416 − 312 = 104; taking the remainder of 104 divided by 32, which is 8, and dividing it by 2 gives the minimum black border, 4 pixels, to be padded at each end of the image in the height direction.
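The scaling and minimal-black-border computation of the worked example can be expressed as a small helper. `letterbox_params` is a hypothetical name; the modulo-stride padding rule simply reproduces the arithmetic of the example above.

```python
def letterbox_params(w, h, target=416, stride=32):
    """Compute the scale and minimal black-border padding for letterboxing
    an image to target x target, following the worked example in the text."""
    scale = min(target / w, target / h)
    new_w, new_h = round(w * scale), round(h * scale)
    # Pad each dimension only up to the next multiple of the network
    # stride, instead of all the way to target, to reduce redundancy.
    pad_w = (target - new_w) % stride
    pad_h = (target - new_h) % stride
    return new_w, new_h, pad_w / 2, pad_h / 2

# 800 x 600 -> scale 0.52 -> 416 x 312; height padding (416 - 312) % 32 = 8,
# i.e. 4 black pixels at each end of the height direction.
print(letterbox_params(800, 600))  # (416, 312, 0.0, 4.0)
```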

The network structure is shown in Fig. 3. First, in the SubNet1 structure of the Backbone, a slicing operation deepens the feature map to strengthen the network's ability to learn features, and a convolution operation follows. The convolution adopts the CSPNet structure, which addresses the excessive computation caused by redundant gradient information by integrating the gradient changes into the feature map from beginning to end, achieving lightweight computation while preserving learning capacity. As shown in Fig. 4, the feature map is split into two parts: one part goes through Conv convolution and BN (batch normalization) with the Leaky ReLU activation function, while the other part is directly concatenated (concat) with the output of the convolution.

The Neck part consists of a top-down feature pyramid followed by a bottom-up feature pyramid. The top-level features are fused with the shallow features through upsampling, and the strong localization information of the shallow features is passed upward through downsampling, so that the strong semantic information of the high layers and the strong localization information of the low layers are both better exploited. In addition, an ROI pooling layer pools the features of each layer into feature matrices of the same size, and these features are finally fused. This operation effectively strengthens the network's adaptation to the features while yielding better classification, regression and prediction results.

The output end adopts CIoU_Loss as the loss function of the bounding box:

IoU = |A∩B| / |A∪B|

CIoU_Loss = 1 − IoU + D1/D2 + αv

where A and B are the predicted bounding box and the ground-truth box respectively, |A∩B| is the area of their intersection, |A∪B| is the area of their union, and IoU is the intersection-over-union of the two boxes. D1 is the squared Euclidean distance between the center points of the two boxes, and D2 is the squared length of the diagonal of the smallest rectangle that just encloses both boxes.

α = v / ((1 − IoU) + v)

is the weight coefficient, where v is a parameter measuring the consistency of the aspect ratios:

v = (4/π²) · (arctan(w_gt/h_gt) − arctan(w/h))²

In the above formula, w_gt and h_gt are the width and height of the ground-truth box, and w and h are the width and height of the predicted box. On top of IoU, the CIoU_Loss function adds the two penalty terms D1/D2 and αv, which account for the distance between the two box centers and for the aspect ratio.
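The CIoU_Loss defined above can be sketched for axis-aligned boxes as follows; `ciou_loss` is an illustrative implementation of the stated formulas, not code from the patent.

```python
import math

def ciou_loss(pred, gt):
    """CIoU_Loss for axis-aligned boxes given as (x1, y1, x2, y2),
    following the formulas above: 1 - IoU + D1/D2 + alpha * v."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection and union areas.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter)
    # D1: squared distance between the two box centers.
    d1 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + \
         ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    # D2: squared diagonal of the smallest enclosing rectangle.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    d2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term v and weight alpha.
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1)) -
                              math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + d1 / d2 + alpha * v

print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0 for identical boxes
```

For identical boxes all three terms vanish; for disjoint boxes IoU is 0 and the center-distance penalty keeps the loss above 1, which is what allows the gradient to pull non-overlapping predictions toward the target.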

Further, the method also comprises a post-processing step, the post-processing step comprising:

for the screening of multiple candidate boxes, using a weighted non-maximum suppression (NMS) algorithm to remove redundant bounding boxes. Compared with traditional NMS, weighted NMS does not, while culling rectangles, directly discard predicted boxes of the same category whose IoU exceeds the threshold; instead, it averages them weighted by the confidence predicted by the network to obtain a new rectangle, takes this rectangle as the final prediction, and then removes the remaining redundant boxes.
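A minimal sketch of the weighted NMS procedure described above. The names, and the choice of keeping the top box's confidence for the merged box, are assumptions for illustration, not details from the patent.

```python
def weighted_nms(boxes, iou_thresh=0.5):
    """Weighted non-maximum suppression. Each entry is
    (x1, y1, x2, y2, confidence, class_id). Instead of discarding
    same-class boxes whose IoU with the kept box exceeds the threshold,
    their coordinates are averaged, weighted by confidence."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        top = remaining[0]
        # Same-class boxes that overlap the current top box enough to merge.
        group = [b for b in remaining
                 if b[5] == top[5] and iou(top, b) >= iou_thresh]
        total = sum(b[4] for b in group)
        merged = [sum(b[i] * b[4] for b in group) / total for i in range(4)]
        kept.append((*merged, top[4], top[5]))
        remaining = [b for b in remaining if b not in group]
    return kept

# Two overlapping detections of the same class fuse into one box whose
# coordinates are the confidence-weighted average.
dets = [(0, 0, 10, 10, 0.9, 0), (1, 1, 11, 11, 0.1, 0)]
print(weighted_nms(dets))  # one merged box near (0.1, 0.1, 10.1, 10.1)
```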

After the model is constructed, it can also be evaluated in order to verify its accuracy, specifically as follows:

Recall = TP / (TP + FN)

Precision = TP / (TP + FP)

F1-score = 2 × Precision × Recall / (Precision + Recall)

Here TP (True Positive) is the number of samples that are actually positive and predicted as positive; FP (False Positive) is the number of samples that are actually negative but predicted as positive; FN (False Negative) is the number of samples that are actually positive but predicted as negative. Recall is the model's recall rate and reflects the proportion of positive samples predicted as positive; the higher the recall, the stronger the model's ability to identify positive samples. Precision reflects the proportion of true positives among the samples predicted as positive. The F1-score, a metric for classification problems, is the harmonic mean of precision and recall, with a maximum of 1 and a minimum of 0.
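The metrics above follow directly from the TP/FP/FN counts; a small sketch with hypothetical counts (the function name is invented):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1-score from true/false positive and
    false negative counts, per the formulas above."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# e.g. 80 correct detections, 20 false alarms, 20 missed defects
print(precision_recall_f1(80, 20, 20))  # precision 0.8, recall 0.8, F1 ~ 0.8
```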

The method provided by the present invention has the following technical effects:

1. By designing a target detection network based on RGB-D images, the accuracy of road surface detection is improved.

Specifically, many earlier technologies use only monocular vision and fit the spatial point coordinate distribution with a machine learning model combined with geometric constraints. In contrast, binocular vision provides additional disparity information from which the depth of the scene can be reconstructed, so stronger spatial constraints can be obtained than with monocular vision.

针对基于RGB的路面检测模型漏检、误检概率较大的问题，本发明根据深度图可提供目标轮廓和目标表面凹陷特征，彩色图像可提供目标表面纹理特征的特点，将两者结合设计了基于RGB-D的目标检测网络。RGB为三通道彩色图像，三通道分别对应于Red,Green,Blue三原色值，每个原色的取值是0到255间的整数。RGB-D在RGB的基础上多了一个Depth通道，其以深度值Depth作为像素，反映了相机与物体的实际距离。Aiming at the problem that RGB-based pavement detection models have a high probability of missed and false detections, the present invention exploits the fact that the depth map provides target contours and surface-depression features while the color image provides surface-texture features, and combines the two to design an RGB-D based object detection network. RGB is a three-channel color image; the three channels correspond to the primary color values Red, Green and Blue, each an integer between 0 and 255. RGB-D adds a Depth channel on top of RGB, whose pixel values are depth values reflecting the actual distance between the camera and the object.
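As an illustrative sketch (not part of the original disclosure), the four-channel RGB-D representation described above can be assembled by appending the depth map to the RGB image; the array shapes and the helper name `make_rgbd` are assumptions:

```python
import numpy as np

def make_rgbd(rgb, depth):
    """Stack a per-pixel depth map onto an RGB image as a fourth channel,
    giving the (H, W, 4) RGB-D input described in the text.

    rgb   : (H, W, 3) array, primary values 0-255
    depth : (H, W) array of per-pixel camera-to-object distances
    """
    assert rgb.shape[:2] == depth.shape, "RGB and depth must share height/width"
    # depth[..., None] promotes (H, W) to (H, W, 1) so the channels concatenate
    return np.concatenate([rgb, depth[..., None]], axis=-1)
```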

Depth = (b × f) / disparity

其中，b、f为相机内参，分别表示基线距离和焦距；disparity为立体匹配所得到的双目视差。Depth和RGB图像中的信息是相互补充的，RGB图像提供了明显的外观特征，深度图像传达了几何结构。Among them, b and f are camera intrinsics, representing the baseline distance and the focal length respectively; disparity is the binocular disparity obtained by stereo matching. The information in depth and RGB images is complementary: RGB images provide distinct appearance features, while depth images convey geometric structure.
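A minimal per-pixel application of the Depth = b·f/disparity relation above; treating zero disparity as invalid (mapped to depth 0) is an illustrative assumption, as is the function name:

```python
import numpy as np

def disparity_to_depth(disparity, baseline_m, focal_px):
    """Convert a disparity map to a depth map via Depth = b * f / disparity.

    disparity  : (H, W) disparity map in pixels (from stereo matching)
    baseline_m : stereo baseline b (e.g. metres)
    focal_px   : focal length f in pixels
    Pixels with non-positive disparity are treated as invalid -> depth 0.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = baseline_m * focal_px / disparity[valid]
    return depth
```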

本发明通过将深度图像视为彩色图像的额外通道来检测2D图像平面中的目标物体，并使用ICP算法将3D模型与2D检测窗口内的点进行对齐。ICP算法的目的是将不同坐标系下的三维数据合并到同一坐标系下，该算法通过重复进行选择对应的关系点对，计算最优刚性变换，直到满足正确配准的收敛精度要求，输出此条件下的刚性变换矩阵，对原三维数据进行变换。The present invention detects target objects in the 2D image plane by treating the depth image as an extra channel of the color image, and uses the ICP algorithm to align the 3D model with the points within the 2D detection window. The purpose of the ICP algorithm is to merge three-dimensional data in different coordinate systems into the same coordinate system: the algorithm repeatedly selects corresponding point pairs and computes the optimal rigid transformation until the convergence accuracy required for correct registration is met, then outputs the resulting rigid transformation matrix and uses it to transform the original three-dimensional data.
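A minimal point-to-point ICP sketch consistent with the loop described above, using brute-force nearest-neighbour pairing and the SVD (Kabsch) solution for the optimal rigid transform; the iteration count, tolerance and function names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping paired points
    src onto dst (Kabsch/SVD solution)."""
    c_src, c_dst = src.mean(0), dst.mean(0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp(src, dst, iters=20, tol=1e-6):
    """Minimal point-to-point ICP: repeatedly pair each source point with
    its nearest target point, solve the optimal rigid transform, and stop
    once the mean pairing error converges."""
    cur = src.copy()
    prev_err = np.inf
    for _ in range(iters):
        # brute-force nearest neighbours (fine for small clouds)
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        pairs = dst[d2.argmin(1)]
        R, t = best_rigid_transform(cur, pairs)
        cur = cur @ R.T + t
        err = np.sqrt(d2.min(1)).mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    # overall rigid transform taking the original src onto dst
    return best_rigid_transform(src, cur)
```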

采用RGB-D数据进行实验，能较好地克服目标物体遮挡、重叠时所产生的不确定性，对物体的轮廓及表面特征也更加敏感，通过提取出更加充分的特征信息来有效提升路面平坦度的检测准确率。Experiments with RGB-D data can better overcome the uncertainty caused by occlusion and overlap of target objects, and are more sensitive to object contours and surface features; by extracting richer feature information, the detection accuracy of road surface flatness is effectively improved.

实验结果表明，此网络得到适当训练后能很好地推广到不同的数据集，且可以依据当前训练权重而不需要做额外的训练，检测效果也会优于单纯使用RGB图像的网络。The experimental results show that, once properly trained, this network generalizes well to different data sets and can reuse the current training weights without additional training; its detection performance is also better than that of a network using only RGB images.

2、采用数据增强丰富训练集,以保证数据准确性。2. Use data augmentation to enrich the training set to ensure data accuracy.

针对数据量少，模型泛化能力欠缺的问题，本网络在输入端对数据集进行增强，即随机旋转、裁剪、拼接、排布训练样本。具体实现方式为：每次读取6张训练样本，分别对这6张图片随机旋转任意角度，并随机将样本中的部分区域裁剪掉，填充为0像素值；然后对这六张图片进行随机地拼接、排布，每一张样本图像都有其对应的真实标注框，将6张图像拼接之后获得一张新的图像，同时也获得这张图像所对应的框，然后将新图像传入到网络当中去学习，相当于同时传入6张具有丰富背景的图像，增强了场景的多样性，也极大地丰富了训练数据集。另外还采取了随机缩放的策略，以增加小目标数目，让网络的鲁棒性更好。In view of the small amount of data and the limited generalization ability of the model, this network augments the data set at the input end by randomly rotating, cropping, stitching and arranging training samples. The specific implementation is: read 6 training samples at a time, randomly rotate each of the 6 images by an arbitrary angle, randomly crop out some regions of each sample and fill them with 0 pixel values; then randomly stitch and arrange the six images. Each sample image has its corresponding ground-truth annotation boxes; after stitching the 6 images, a new image is obtained together with the boxes corresponding to it, and the new image is then fed into the network for learning. This is equivalent to feeding in 6 images with rich backgrounds at the same time, which enhances scene diversity and greatly enriches the training data set. In addition, a random scaling strategy is adopted to increase the number of small targets and make the network more robust.
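The augmentation pipeline above can be sketched as follows. For a dependency-free illustration, arbitrary-angle rotation is replaced by 90-degree rotations, the six tiles are arranged on a fixed 2×3 grid, and the annotation-box bookkeeping is omitted — all simplifying assumptions relative to the text:

```python
import numpy as np

def cutout(img, rng, max_frac=0.3):
    """Zero out a random rectangular region (filled with 0 pixel values)."""
    h, w = img.shape[:2]
    ch = int(rng.integers(1, int(h * max_frac) + 1))
    cw = int(rng.integers(1, int(w * max_frac) + 1))
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    out = img.copy()
    out[y:y + ch, x:x + cw] = 0
    return out

def mosaic6(images, rng):
    """Rotate + cutout six equally sized square samples and tile them into
    one 2x3 mosaic image, standing in for the random stitch-and-arrange
    step described in the text."""
    assert len(images) == 6
    tiles = []
    for img in images:
        assert img.shape[0] == img.shape[1], "square tiles assumed"
        img = np.rot90(img, k=int(rng.integers(0, 4)))  # stand-in for arbitrary-angle rotation
        tiles.append(cutout(img, rng))
    rng.shuffle(tiles)                                  # random arrangement
    rows = [np.hstack(tiles[i:i + 3]) for i in (0, 3)]
    return np.vstack(rows)
```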

3、串联双金字塔,充分利用特征信息,以加强特征融合性能。3. Double pyramids are connected in series to make full use of feature information to enhance feature fusion performance.

针对路面上的小障碍物、小面积坑洼、裂缝的检测及定位准确率较低的问题，本发明根据低层特征能提供强定位信息，高层特征能提供强语义信息，使用了两个特征金字塔。如附图1中Neck部分所示，首先自顶向下的金字塔通过上采样将高层的强语义特征传递下来，接着在其后串联一个自底向上的金字塔对上一个金字塔进行补充，且为了缩短特征层之间的路径，在底层特征到顶层特征之间建立横向连接。高层特征与低层特征之间的融合方式如图5所示。Aiming at the low accuracy of detecting and localizing small obstacles, small potholes and cracks on the road surface, and based on the fact that low-level features provide strong localization information while high-level features provide strong semantic information, the present invention uses two feature pyramids. As shown in the Neck part of Fig. 1, a top-down pyramid first passes the strong high-level semantic features downward through upsampling, and a bottom-up pyramid is then connected in series after it to supplement the previous pyramid; and, to shorten the paths between feature layers, lateral connections are established from bottom-level features to top-level features. The fusion between high-level and low-level features is shown in Figure 5.

由图5可知，双金字塔通过融合一个更低层的Ni和一个更高层的Pi+1，得到下一层Ni+1。以N2到N3之间的计算为例，首先通过一个步长为2的3*3卷积对N2进行下采样，再通过单位加的方式将P3和下采样得到的特征矩阵进行融合，接着再使用一个3*3的卷积对特征进行融合，增强融合之后的特征的表征能力。两个特征金字塔的串联可以极大地保留特征的多样性和完整性，从而能更准确地预测目标边界框。It can be seen from Figure 5 that the double pyramid obtains the next layer Ni+1 by fusing a lower layer Ni with a higher layer Pi+1. Taking the computation from N2 to N3 as an example, N2 is first down-sampled by a 3*3 convolution with stride 2, then P3 and the down-sampled feature matrix are fused by element-wise ("unit") addition, and a further 3*3 convolution is applied to fuse the features and enhance the representational power of the fused result. The concatenation of the two feature pyramids largely preserves the diversity and completeness of the features, enabling more accurate prediction of target bounding boxes.
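The Ni → Ni+1 fusion step above can be sketched as a small PyTorch module; assuming, for simplicity, the same channel count at both levels, and the class name `PANFuse` is hypothetical:

```python
import torch
import torch.nn as nn

class PANFuse(nn.Module):
    """One bottom-up fusion step of the second pyramid:
    N_{i+1} = Conv3x3( Downsample(N_i) + P_{i+1} ),
    where Downsample is a stride-2 3x3 convolution."""

    def __init__(self, channels):
        super().__init__()
        # stride-2 3x3 conv halves the spatial resolution of N_i
        self.down = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
        # post-fusion 3x3 conv strengthens the fused representation
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)

    def forward(self, n_i, p_next):
        # element-wise ("unit") addition of the downsampled N_i and P_{i+1}
        return self.fuse(self.down(n_i) + p_next)
```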

4、采用CIoU损失函数,以改善目标重叠遮挡情况下的检测性能。4. The CIoU loss function is adopted to improve the detection performance in the case of overlapping targets.

针对路面上目标物体密集重叠时漏检的情况,本发明在边界框回归部分采用CIoU_Loss函数。其在DIoU_Loss函数的基础上添加了一个影响因子v,可用来衡量Anchor框和目标框之间的比例一致性。Aiming at the situation of missing detection when the target objects on the road are densely overlapped, the present invention adopts the CIoU_Loss function in the bounding box regression part. It adds an impact factor v based on the DIoU_Loss function, which can be used to measure the proportional consistency between the Anchor box and the target box.

CIoU_Loss函数在普通IoU的基础上增加了两个惩罚项，一是针对边界框重叠增加了中心点距离的惩罚项，二是预测框和真实标注框大小相对比例的惩罚项，这部分改进使得CIoU_Loss拥有更快的收敛速度以及更加鲁棒的检测效果，回归损失在目标框出现重叠、遮挡时也能更准确、收敛更快。The CIoU_Loss function adds two penalty terms on top of the ordinary IoU: one penalizes the centre-point distance when bounding boxes overlap, and the other penalizes the relative size ratio between the predicted box and the ground-truth box. These improvements give CIoU_Loss faster convergence and a more robust detection effect, and the regression loss remains accurate and converges quickly even when target boxes overlap or occlude each other.
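A scalar sketch of CIoU_Loss consistent with the penalty terms described above (the corner-format `(x1, y1, x2, y2)` boxes and the epsilon guard are illustrative assumptions):

```python
import math

def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss = 1 - IoU + D1/D2 + alpha * v for two axis-aligned boxes
    given as (x1, y1, x2, y2): D1/D2 penalizes centre distance,
    alpha * v penalizes aspect-ratio mismatch."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection over union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / (union + eps)
    # D1: squared centre distance; D2: squared diagonal of the enclosing box
    d1 = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 + ((ay1 + ay2) - (by1 + by2)) ** 2 / 4
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    d2 = cw ** 2 + ch ** 2
    # v: aspect-ratio consistency term; alpha: its weight coefficient
    v = (4 / math.pi ** 2) * (math.atan((ax2 - ax1) / (ay2 - ay1))
                              - math.atan((bx2 - bx1) / (by2 - by1))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + d1 / (d2 + eps) + alpha * v
```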

除了上述方法,本发明还提供一种用于路面平坦度检测的模型构建系统,如图6所示,所述系统包括:In addition to the above method, the present invention also provides a model building system for road surface flatness detection, as shown in FIG. 6 , the system includes:

数据获取单元100,用于获取目标场景的多帧原始图像,在所述原始图像中提取RGB图像,并将所述原始图像生成深度图;a data acquisition unit 100 for acquiring multiple frames of original images of a target scene, extracting RGB images from the original images, and generating a depth map from the original images;

数据集生成单元200，用于将所述RGB图像和所述深度图生成数据集，并将所述数据集按预设比例划分为训练集、验证集和测试集，并对各数据集进行标注；a data set generating unit 200, configured to generate a data set from the RGB image and the depth map, divide the data set into a training set, a verification set and a test set according to a preset ratio, and label each data set;

模型输出单元300,用于基于预存的分类检测网络模型,将RGB-D数据分别输入训练集和验证集,通过模型训练得到路面平坦度检测模型。The model output unit 300 is configured to input the RGB-D data into the training set and the verification set respectively based on the pre-stored classification detection network model, and obtain a road surface flatness detection model through model training.

在上述具体实施方式中，本发明所提供的用于路面平坦度检测的模型构建系统，通过获取目标场景的多帧原始图像，在所述原始图像中提取RGB图像，并将所述原始图像生成深度图；将所述RGB图像和所述深度图生成数据集，并将所述数据集按预设比例划分为训练集、验证集和测试集，并对各数据集进行标注；基于预存的分类检测网络模型，将RGB-D数据分别输入训练集和验证集，通过模型训练得到路面平坦度检测模型。该方法采用RGB-D图像进行目标检测，对检测路面裂缝、坑洞、减速带及井盖等目标具有较好的效果，解决了现有技术中路面平坦度检测准确性较低的技术问题。In the above specific embodiment, the model construction system for road surface flatness detection provided by the present invention acquires multiple frames of original images of the target scene, extracts RGB images from the original images, and generates a depth map from the original images; generates a data set from the RGB images and the depth maps, divides the data set into a training set, a validation set and a test set according to a preset ratio, and labels each data set; and, based on the pre-stored classification detection network model, inputs the RGB-D data into the training set and validation set respectively, obtaining the road surface flatness detection model through model training. The method uses RGB-D images for target detection, works well for detecting targets such as road cracks, potholes, speed bumps and manhole covers, and solves the technical problem of low road surface flatness detection accuracy in the prior art.

本发明还提供一种智能终端,所述智能终端包括:数据采集装置、处理器和存储器;The present invention also provides an intelligent terminal, the intelligent terminal includes: a data acquisition device, a processor and a memory;

所述数据采集装置用于采集数据;所述存储器用于存储一个或多个程序指令;所述处理器,用于执行一个或多个程序指令,用以执行如上所述的方法。The data collection device is used to collect data; the memory is used to store one or more program instructions; the processor is used to execute one or more program instructions, so as to execute the above method.

与上述实施例相对应的,本发明实施例还提供了一种计算机存储介质,该计算机存储介质中包含一个或多个程序指令。其中,所述一个或多个程序指令用于被一种双目相机深度标定系统执行如上所述的方法。Corresponding to the foregoing embodiments, an embodiment of the present invention further provides a computer storage medium, where the computer storage medium contains one or more program instructions. Wherein, the one or more program instructions are used for performing the above method by a binocular camera depth calibration system.

在本发明实施例中，处理器可以是一种集成电路芯片，具有信号的处理能力。处理器可以是通用处理器、数字信号处理器（Digital Signal Processor，简称DSP）、专用集成电路（Application Specific Integrated Circuit，简称ASIC）、现场可编程门阵列（Field Programmable Gate Array，简称FPGA）或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。In this embodiment of the present invention, the processor may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.

可以实现或者执行本发明实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本发明实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。处理器读取存储介质中的信息,结合其硬件完成上述方法的步骤。Various methods, steps, and logical block diagrams disclosed in the embodiments of the present invention can be implemented or executed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in conjunction with the embodiments of the present invention may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art. The processor reads the information in the storage medium, and completes the steps of the above method in combination with its hardware.

存储介质可以是存储器,例如可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。The storage medium may be memory, eg, may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.

其中，非易失性存储器可以是只读存储器（Read-Only Memory，简称ROM）、可编程只读存储器（Programmable ROM，简称PROM）、可擦除可编程只读存储器（Erasable PROM，简称EPROM）、电可擦除可编程只读存储器（Electrically EPROM，简称EEPROM）或闪存。The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM) or flash memory.

易失性存储器可以是随机存取存储器（Random Access Memory，简称RAM），其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的RAM可用，例如静态随机存取存储器（Static RAM，简称SRAM）、动态随机存取存储器（Dynamic RAM，简称DRAM）、同步动态随机存取存储器（Synchronous DRAM，简称SDRAM）、双倍数据速率同步动态随机存取存储器（Double Data Rate SDRAM，简称DDR SDRAM）、增强型同步动态随机存取存储器（Enhanced SDRAM，简称ESDRAM）、同步连接动态随机存取存储器（Synchlink DRAM，简称SLDRAM）和直接内存总线随机存取存储器（Direct Rambus RAM，简称DRRAM）。The volatile memory may be Random Access Memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM) and Direct Rambus RAM (DRRAM).

本发明实施例描述的存储介质旨在包括但不限于这些和任意其它适合类型的存储器。The storage medium described in the embodiments of the present invention is intended to include, but not be limited to, these and any other suitable types of memory.

本领域技术人员应该可以意识到,在上述一个或多个示例中,本发明所描述的功能可以用硬件与软件组合来实现。当应用软件时,可以将相应功能存储在计算机可读介质中或者作为计算机可读介质上的一个或多个指令或代码进行传输。计算机可读介质包括计算机存储介质和通信介质,其中通信介质包括便于从一个地方向另一个地方传送计算机程序的任何介质。存储介质可以是通用或专用计算机能够存取的任何可用介质。Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented by a combination of hardware and software. When the software is applied, the corresponding functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.

以上的具体实施方式,对本发明的目的、技术方案和有益效果进行了进一步详细说明,所应理解的是,以上仅为本发明的具体实施方式而已,并不用于限定本发明的保护范围,凡在本发明的技术方案的基础之上,所做的任何修改、等同替换、改进等,均应包括在本发明的保护范围之内。The above specific embodiments further describe the purpose, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention, and are not intended to limit the protection scope of the present invention. On the basis of the technical solutions of the present invention, any modifications, equivalent replacements, improvements, etc. made shall be included within the protection scope of the present invention.

Claims (10)

1.一种用于路面平坦度检测的模型构建方法，其特征在于，所述方法包括：1. A model construction method for road surface flatness detection, characterized in that the method comprises:

获取目标场景的多帧原始图像，在所述原始图像中提取RGB图像，并将所述原始图像生成深度图；obtaining multiple frames of original images of the target scene, extracting RGB images from the original images, and generating a depth map from the original images;

将所述RGB图像和所述深度图生成数据集，并将所述数据集按预设比例划分为训练集、验证集和测试集，并对各数据集进行标注；generating a data set from the RGB image and the depth map, dividing the data set into a training set, a verification set and a test set according to a preset ratio, and labeling each data set;

基于预存的分类检测网络模型，将RGB-D数据分别输入训练集和验证集，通过模型训练得到路面平坦度检测模型。based on the pre-stored classification detection network model, inputting the RGB-D data into the training set and the validation set respectively, and obtaining the road surface flatness detection model through model training.

2.根据权利要求1所述的模型构建方法，其特征在于，获取目标场景的多帧原始图像，在所述原始图像中提取RGB图像，并将所述原始图像生成深度图，具体包括：2. The model construction method according to claim 1, wherein obtaining multiple frames of original images of the target scene, extracting RGB images from the original images, and generating a depth map from the original images specifically comprises:

利用双目相机采集双目道路图像，并将双目道路图像作为原始图像；collecting binocular road images with a binocular camera, and taking the binocular road images as the original images;

根据彩色三通道原理在所述原始图像中提取RGB图像；extracting an RGB image from the original images according to the color three-channel principle;

根据立体匹配和三维重建方法将所述原始图像生成深度图，所述深度图与所述RGB图像的尺寸大小相同。generating a depth map from the original images according to stereo matching and three-dimensional reconstruction methods, the depth map having the same size as the RGB image.

3.根据权利要求1所述的模型构建方法，其特征在于，将所述RGB图像和所述深度图生成数据集，并将所述数据集按预设比例划分为训练集、验证集和测试集，具体包括：3. The model construction method according to claim 1, wherein generating a data set from the RGB image and the depth map and dividing the data set into a training set, a verification set and a test set according to a preset ratio specifically comprises:

将左目RGB图像和深度图生成数据集；generating a data set from the left-eye RGB images and depth maps;

按照3:1:1的比例将所述数据集划分为训练集、验证集和测试集。dividing the data set into a training set, a validation set and a test set in a ratio of 3:1:1.

4.根据权利要求3所述的模型构建方法，其特征在于，对各数据集进行标注，具体包括：4. The model construction method according to claim 3, wherein labeling each data set specifically comprises:

对左目图像的训练集和验证集进行标注，标注类别包括纵向裂缝、横向裂缝、网状裂缝、坑槽、减速带、井盖中的至少一者。labeling the training set and the validation set of the left-eye images, the labeled categories including at least one of longitudinal cracks, transverse cracks, network cracks, potholes, speed bumps and manhole covers.

5.根据权利要求4所述的模型构建方法，其特征在于，将RGB-D数据分别输入训练集和验证集，具体包括：5. The model construction method according to claim 4, wherein inputting the RGB-D data into the training set and the validation set respectively comprises:

将RGB图像统一缩放至同一尺寸，并自适应地为RGB图像及视差图填充黑边后得到特征图，将特征图输入到训练集和验证集中进行训练。uniformly scaling the RGB images to the same size, adaptively padding the RGB images and disparity maps with black borders to obtain feature maps, and inputting the feature maps into the training set and validation set for training.

6.根据权利要求5所述的模型构建方法，其特征在于，所述分类检测网络模型的输出端采用CIoU_Loss作为边界框的损失函数：6. The model construction method according to claim 5, wherein the output of the classification detection network model adopts CIoU_Loss as the bounding-box loss function:
IoU = |A∩B| / |A∪B|

CIoU_Loss = 1 − IoU + D1/D2 + α·v
其中，A和B分别为预测边界框和真实标注框的面积，|A∩B|表示两框交集的面积，|A∪B|表示两框并集的面积，IoU即为两框面积的交并比。D1为两框中心点之间的欧氏距离的平方，D2为刚好能包含两框的最小矩形的对角线的长度平方，α = v / ((1 − IoU) + v)为权重系数，其中v是衡量长宽比一致性的参数：Among them, A and B are the areas of the predicted bounding box and the ground-truth box respectively, |A∩B| is the area of the intersection of the two boxes, |A∪B| is the area of their union, and IoU is the intersection-over-union of the two box areas. D1 is the squared Euclidean distance between the centre points of the two boxes, D2 is the squared diagonal length of the smallest rectangle that just encloses both boxes, and α = v / ((1 − IoU) + v) is the weight coefficient, where v is a parameter measuring aspect-ratio consistency:
v = (4/π²) · (arctan(wgt/hgt) − arctan(w/h))²
上式中，wgt为真实标注框的宽，hgt为真实标注框的高；w为目标检测框的宽，h为目标检测框的高。CIoU_Loss函数在IoU的基础上，针对两框中心点距离、长宽比增加了D1/D2与α·v这两个惩罚项。In the above formula, wgt is the width of the ground-truth box and hgt is its height; w is the width of the target detection box and h is its height. On the basis of IoU, the CIoU_Loss function adds the two penalty terms D1/D2 and α·v for the centre-point distance and the aspect ratio of the two boxes.
7.根据权利要求6所述的模型构建方法，其特征在于，所述方法还包括后处理步骤，所述后处理步骤包括：7. The model construction method according to claim 6, wherein the method further comprises a post-processing step, the post-processing step comprising:

针对多目标框的筛选，采用加权非极大值抑制算法去除部分冗余的边界框。加权非极大值抑制与传统的非极大值抑制相比，在执行矩形框剔除的过程中，并未将与真实标注框的IoU大于阈值，且类别相同的预测框直接剔除，而是根据网络预测的置信度进行加权，得到新的矩形框，把该矩形框作为最终预测的矩形框，再将其他的冗余框剔除。For the screening of multiple target boxes, a weighted non-maximum suppression algorithm is used to remove some redundant bounding boxes. Compared with traditional non-maximum suppression, weighted non-maximum suppression does not, during rectangular-box culling, directly discard predicted boxes of the same category whose IoU with the ground-truth box exceeds the threshold; instead, it weights them by the confidence predicted by the network to obtain a new rectangular box, takes this box as the final predicted box, and then removes the other redundant boxes.

8.一种用于路面平坦度检测的模型构建系统，其特征在于，所述系统包括：8. A model construction system for road surface flatness detection, characterized in that the system comprises:

数据获取单元，用于获取目标场景的多帧原始图像，在所述原始图像中提取RGB图像，并将所述原始图像生成深度图；a data acquisition unit, configured to acquire multiple frames of original images of the target scene, extract RGB images from the original images, and generate a depth map from the original images;

数据集生成单元，用于将所述RGB图像和所述深度图生成数据集，并将所述数据集按预设比例划分为训练集、验证集和测试集，并对各数据集进行标注；a data set generation unit, configured to generate a data set from the RGB image and the depth map, divide the data set into a training set, a verification set and a test set according to a preset ratio, and label each data set;

模型输出单元，用于基于预存的分类检测网络模型，将RGB-D数据分别输入训练集和验证集，通过模型训练得到路面平坦度检测模型。a model output unit, configured to input the RGB-D data into the training set and validation set respectively based on the pre-stored classification detection network model, and obtain the road surface flatness detection model through model training.
9.一种智能终端，其特征在于，所述智能终端包括：数据采集装置、处理器和存储器；9. An intelligent terminal, characterized in that the intelligent terminal comprises: a data acquisition device, a processor and a memory;

所述数据采集装置用于采集数据；所述存储器用于存储一个或多个程序指令；所述处理器，用于执行一个或多个程序指令，用以执行如权利要求1-7任一项所述的方法。The data acquisition device is used to collect data; the memory is used to store one or more program instructions; the processor is used to execute the one or more program instructions to perform the method according to any one of claims 1-7.

10.一种计算机可读存储介质，其特征在于，所述计算机存储介质中包含一个或多个程序指令，所述一个或多个程序指令用于执行如权利要求1-7任一项所述的方法。10. A computer-readable storage medium, characterized in that the computer storage medium contains one or more program instructions, the one or more program instructions being used to perform the method according to any one of claims 1-7.
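The weighted non-maximum suppression of claim 7 can be sketched as follows; this is an illustration only: the corner-format `(x1, y1, x2, y2)` boxes, the IoU threshold value and the per-cluster confidence-weighted averaging are assumptions of this sketch:

```python
import numpy as np

def iou_against(boxes, box, eps=1e-9):
    """IoU of one (x1, y1, x2, y2) box against an array of boxes."""
    x1 = np.maximum(boxes[:, 0], box[0]); y1 = np.maximum(boxes[:, 1], box[1])
    x2 = np.minimum(boxes[:, 2], box[2]); y2 = np.minimum(boxes[:, 3], box[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_b = (box[2] - box[0]) * (box[3] - box[1])
    return inter / (areas + area_b - inter + eps)

def weighted_nms(boxes, scores, iou_thr=0.5):
    """Weighted NMS: instead of discarding boxes that overlap the
    top-scoring box, merge each overlapping cluster into a single
    confidence-weighted average box."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    order = scores.argsort()[::-1]          # highest confidence first
    keep_boxes, keep_scores = [], []
    while order.size:
        top = order[0]
        ious = iou_against(boxes[order], boxes[top])
        cluster = order[ious >= iou_thr]    # boxes overlapping the top one
        w = scores[cluster][:, None]
        keep_boxes.append((boxes[cluster] * w).sum(0) / w.sum())  # weighted merge
        keep_scores.append(scores[top])
        order = order[ious < iou_thr]       # drop the merged cluster
    return np.array(keep_boxes), np.array(keep_scores)
```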
CN202210243771.8A 2021-12-27 2022-03-10 Model construction method, system and intelligent terminal for road surface flatness detection Pending CN114677659A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111620199 2021-12-27
CN2021116201994 2021-12-27

Publications (1)

Publication Number Publication Date
CN114677659A true CN114677659A (en) 2022-06-28

Family

ID=82074476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210243771.8A Pending CN114677659A (en) 2021-12-27 2022-03-10 Model construction method, system and intelligent terminal for road surface flatness detection

Country Status (1)

Country Link
CN (1) CN114677659A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778355A (en) * 2023-05-16 2023-09-19 国网山东省电力公司泗水县供电公司 Method and system for detecting obstacle of power grid inspection unmanned aerial vehicle
CN116958222A (en) * 2023-06-01 2023-10-27 重庆交通大学 Road surface texture characterization method, device, storage medium and program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130155061A1 (en) * 2011-12-16 2013-06-20 University Of Southern California Autonomous pavement condition assessment
CN110008848A (en) * 2019-03-13 2019-07-12 华南理工大学 A kind of travelable area recognizing method of the road based on binocular stereo vision
CN113255678A (en) * 2021-06-17 2021-08-13 云南航天工程物探检测股份有限公司 Road crack automatic identification method based on semantic segmentation
CN113255524A (en) * 2021-05-27 2021-08-13 山东省交通规划设计院集团有限公司 Pavement information identification method and system based on YOLO v4
CN113673534A (en) * 2021-04-22 2021-11-19 江苏大学 RGB-D image fruit detection method based on fast RCNN

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130155061A1 (en) * 2011-12-16 2013-06-20 University Of Southern California Autonomous pavement condition assessment
CN110008848A (en) * 2019-03-13 2019-07-12 华南理工大学 A kind of travelable area recognizing method of the road based on binocular stereo vision
CN113673534A (en) * 2021-04-22 2021-11-19 江苏大学 RGB-D image fruit detection method based on fast RCNN
CN113255524A (en) * 2021-05-27 2021-08-13 山东省交通规划设计院集团有限公司 Pavement information identification method and system based on YOLO v4
CN113255678A (en) * 2021-06-17 2021-08-13 云南航天工程物探检测股份有限公司 Road crack automatic identification method based on semantic segmentation

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778355A (en) * 2023-05-16 2023-09-19 国网山东省电力公司泗水县供电公司 Method and system for detecting obstacle of power grid inspection unmanned aerial vehicle
CN116958222A (en) * 2023-06-01 2023-10-27 重庆交通大学 Road surface texture characterization method, device, storage medium and program product
CN116958222B (en) * 2023-06-01 2024-11-26 重庆交通大学 Road surface texture characterization method, device, storage medium and program product

Similar Documents

Publication Publication Date Title
CN111553859B (en) A method and system for complementing reflection intensity of lidar point cloud
CN111274976B (en) Lane detection method and system based on multi-level fusion of vision and lidar
CN103231708B (en) A kind of intelligent vehicle barrier-avoiding method based on binocular vision
CN115082924B (en) A 3D target detection method based on monocular vision and radar pseudo-image fusion
CN111179345B (en) Front vehicle line-crossing violation behavior automatic detection method and system based on vehicle-mounted machine vision
CN111563415B (en) A three-dimensional target detection system and method based on binocular vision
CN114332494B (en) Three-dimensional target detection and recognition method based on multi-source fusion in vehicle-road cooperative scenario
CN110285793A (en) A vehicle intelligent trajectory measurement method based on binocular stereo vision system
CN110738121A (en) A kind of front vehicle detection method and detection system
CN110322702A (en) A kind of Vehicular intelligent speed-measuring method based on Binocular Stereo Vision System
CN110334678A (en) A Pedestrian Detection Method Based on Vision Fusion
CN104700414A (en) Rapid distance-measuring method for pedestrian on road ahead on the basis of on-board binocular camera
CN108638999A (en) A kind of collision early warning system and method for looking around input based on 360 degree
CN104778721A (en) Distance measuring method of significant target in binocular image
CN113095152A (en) Lane line detection method and system based on regression
CN110427797B (en) Three-dimensional vehicle detection method based on geometric condition limitation
CN114359181A (en) Intelligent traffic target fusion detection method and system based on image and point cloud
CN114677659A (en) Model construction method, system and intelligent terminal for road surface flatness detection
CN113643345A (en) Multi-view road intelligent identification method based on double-light fusion
CN106408513A (en) Super-resolution reconstruction method of depth map
CN114255197B (en) Infrared and visible light image self-adaptive fusion alignment method and system
CN117111085A (en) A vehicle-road-cloud fusion sensing method for autonomous vehicles
CN106296825A (en) A kind of bionic three-dimensional information generating system and method
CN116778262B (en) Three-dimensional target detection method and system based on virtual point cloud
CN116229426B (en) Unmanned parking space detection method based on panoramic all-around image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination