CN110852243B - Road intersection detection method and device based on improved YOLOv3 - Google Patents
Road intersection detection method and device based on improved YOLOv3
- Publication number
- CN110852243B (application number CN201911078236.6A)
- Authority
- CN
- China
- Prior art keywords
- channel
- feature
- convolution
- convolution module
- size
- Prior art date: 2019-11-06
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/588—Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical Field
The invention belongs to the technical field of image processing, and in particular relates to a road intersection detection method and device based on an improved YOLOv3.
Background Art
As hubs of road connections, road intersections provide accurate position, direction, topological relationships and other key information for the rapid construction of road networks. During road network extraction, interference from various complex factors causes the extracted roads to become discontinuous. In that case, taking the position of a road intersection as a base point, information such as direction and topological relationships can assist and guide the construction of the road network.
However, because road intersections generally appear in remote sensing images as small areal targets, the commonly used detection algorithms rely mainly on features such as texture, shape and grayscale. These algorithms work reasonably well for intersections with simple backgrounds and clear contour features, but detecting small road intersections in complex remote sensing scenes is very difficult, requires considerable manual intervention, and offers a low degree of automation and limited detection accuracy.
Summary of the Invention
The purpose of the present invention is to provide a road intersection detection method and device based on an improved YOLOv3, so as to solve the problems that detecting small road intersections in complex remote sensing scenes is very difficult and the detection accuracy is low.
To solve the above technical problems, the technical solution of the present invention is a road intersection detection method based on an improved YOLOv3, comprising the following steps:
1) acquiring a road image;
2) performing network training to construct an improved YOLOv3 network model, wherein the improved YOLOv3 network model comprises a feature extraction end and a feature detection end, the feature detection end comprises a plurality of channels, and in each channel the corresponding convolution module is first widened horizontally to generate different feature maps, which are then aggregated vertically;
3) using the improved YOLOv3 network model to recognize the road image to be detected and outputting the result.
The beneficial effects of the present invention are as follows: by first widening the convolution module in each channel of the feature detection end of the improved YOLOv3 horizontally to generate different feature maps and then aggregating them vertically, the convolution module of each channel is given a greater network width and the expressive capacity of the network is enhanced, which reduces the difficulty of detecting small road intersections in complex remote sensing scenes and improves the detection accuracy.
Further, the activation function of the feature extraction end is:
where x_i is the feature value, μ is the momentum factor, ε is the learning rate, ∈ is a constant, and a_i is initialized to 0.25 with a value range of (0, 1).
Further, the ways of widening the convolution module of each channel include symmetric widening and asymmetric widening.
The present invention also provides a road intersection detection device based on an improved YOLOv3, comprising a processor and a memory, the processor being connected to an acquisition interface for obtaining road images; the processor executes the following method instructions stored in the memory:
1) acquiring a road image;
2) performing network training to construct an improved YOLOv3 network model, wherein the improved YOLOv3 network model comprises a feature extraction end and a feature detection end, the feature detection end comprises a plurality of channels, and in each channel the corresponding convolution module is first widened horizontally to generate different feature maps, which are then aggregated vertically;
3) using the improved YOLOv3 network model to recognize the road image to be detected and outputting the result.
Further, the activation function of the feature extraction end is:
where x_i is the feature value, μ is the momentum factor, ε is the learning rate, ∈ is a constant, and a_i is initialized to 0.25 with a value range of (0, 1).
Further, the ways of widening the convolution module of each channel include symmetric widening and asymmetric widening.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the improved YOLOv3 network model of the present invention;
Fig. 2 shows the symmetrically widened symmetric convolution module channel of the improved YOLOv3 network model of the present invention;
Fig. 3 shows the asymmetrically widened asymmetric convolution module channel of the YOLOv3 network model of the present invention;
Fig. 4 is a schematic diagram of the multi-scale feature fusion of the YOLOv3 network model of the present invention;
Fig. 5-1 shows the test curves of the existing YOLOv3 network model;
Fig. 5-2 shows the test curves of the improved YOLOv3 network model of the present invention;
Fig. 6-1 shows the road intersection detection results of the existing YOLOv3 network model in an occluded environment;
Fig. 6-2 shows the road intersection detection results of the improved YOLOv3 network model of the present invention in an occluded environment;
Fig. 7-1 shows the road intersection detection results of the existing YOLOv3 network model in an environment with similar background colors;
Fig. 7-2 shows the road intersection detection results of the improved YOLOv3 network model of the present invention in an environment with similar background colors;
Fig. 8-1 shows the detection results of the existing YOLOv3 network model at dense road intersections;
Fig. 8-2 shows the detection results of the improved YOLOv3 network model of the present invention at dense road intersections.
Detailed Description of the Embodiments
To elaborate the objectives, technical solutions and advantages of the present invention, the invention is further described in detail below with reference to specific implementation steps and the accompanying drawings.
Embodiment of the road intersection detection method
The present invention provides a road intersection detection method based on an improved YOLOv3. In brief, road images are first collected; network training is then performed to construct the improved YOLOv3 network model, which comprises a feature extraction end and a feature detection end, the feature detection end comprising a plurality of channels in each of which the corresponding convolution module is first widened horizontally to generate different feature maps and then aggregated vertically; finally, the improved YOLOv3 network model is used to recognize the road image to be detected and the result is output.
Specifically, the road intersection detection method based on the improved YOLOv3 of the present invention comprises the following steps:
(1) Road image data acquisition.
A total of 3,106 images covering seven common types of road intersection (cross, T-shaped, X-shaped, Y-shaped, staggered, roundabout and multi-way) were collected from image databases, AutoNavi Maps, Google Maps and other data platforms. The image resolution ranges from 0.5 m to 2.5 m, and each image is 416×416 pixels.
In computer vision, areal targets are usually represented as bounding boxes. To obtain accurate bounding-box labels, the targets were annotated manually one by one with the LabelImg tool, recording the center coordinates, width, height and category of each box, which are stored in an XML file corresponding to the image.
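The exact schema of these annotation files is not given in the patent; a minimal Python sketch, assuming each XML file simply lists the center coordinates, width, height and category of every box (the tag names object/category/cx/cy/w/h are hypothetical), might read:

```python
import xml.etree.ElementTree as ET

def read_annotation(xml_path):
    # Parse one hand-labelled annotation file; the tag names are assumptions, not the
    # patent's actual schema, which only specifies which fields are stored.
    boxes = []
    root = ET.parse(xml_path).getroot()
    for obj in root.iter("object"):
        boxes.append({
            "category": obj.findtext("category"),
            "cx": float(obj.findtext("cx")),
            "cy": float(obj.findtext("cy")),
            "w": float(obj.findtext("w")),
            "h": float(obj.findtext("h")),
        })
    return boxes
```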
Finally, the whole dataset was divided into a test set, a validation set and a training set at a ratio of 1:2:7.
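A minimal sketch of such a split, assuming the images sit in a single directory and that the result is written to the train.txt, val.txt and test.txt files mentioned in the training procedure below (the directory path and random seed are assumptions):

```python
import random
from pathlib import Path

random.seed(0)
images = sorted(Path("dataset/images").glob("*.jpg"))  # hypothetical image directory
random.shuffle(images)

n = len(images)
n_test, n_val = n // 10, n * 2 // 10        # 1:2:7 test/val/train split
splits = {
    "test.txt": images[:n_test],
    "val.txt": images[n_test:n_test + n_val],
    "train.txt": images[n_test + n_val:],
}
for name, subset in splits.items():
    Path(name).write_text("\n".join(str(p) for p in subset) + "\n")
```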
(2) Constructing the improved YOLOv3 network model and training it.
The YOLOv3 network is an end-to-end object detection network consisting of two parts: the feature extraction end, Darknet, and the feature detection end, YOLO. The feature extraction end Darknet is a deep network composed of 52 convolution modules and 23 residual modules; its task is to extract features from the original image layer by layer, forming semantic feature maps at different scales.
The improved YOLOv3 network model of this embodiment is a further improvement on the existing YOLOv3 network in the following two respects:
1. The PReLU function is used in the convolution module to activate the convolutional layers.
Whereas the LReLU function maps negative features to weak features with a linear function of fixed small slope, the PReLU function automatically adjusts the slope of the linear function according to the characteristics of the data and thus retains more target-related negative features.
The PReLU function is defined as follows:
As shown in formula (1), when the feature value is greater than 0 the PReLU function performs an identity mapping, and when the feature value is less than 0 it performs a non-fixed linear mapping.
During backpropagation of the neural network, the slope a_i is updated using the momentum factor and the learning rate of the network. In formula (2), μ is the momentum factor, ε is the learning rate, ∈ is a constant, and a_i is initialized to 0.25 with a value range of (0, 1).
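The images of formulas (1) and (2) are not reproduced in this text version. A reconstruction consistent with the parameters named above, following the standard PReLU formulation (how the constant ∈ enters formula (2) is not recoverable from the text and is therefore omitted), would be:

$$f(x_i)=\begin{cases}x_i, & x_i>0\\ a_i\,x_i, & x_i\le 0\end{cases}\qquad(1)$$

$$\Delta a_i \leftarrow \mu\,\Delta a_i+\varepsilon\,\frac{\partial E}{\partial a_i},\qquad a_i \leftarrow a_i+\Delta a_i\qquad(2)$$

where E denotes the training loss.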
In a complex remote sensing scene, if many similar interference factors are present inside the detection box, the PReLU function of this embodiment keeps increasing the slope of the linear function during backpropagation, raising the attention paid to negative features and correlating the road intersection features with the interference factors. When the feature detection end classifies and regresses the target, combining the contextual semantic information with these negative features can, to a certain extent, solve the problem of road intersection detection under similar interference factors.
2. The convolution module of each channel in the feature detection end is widened in two ways: symmetric widening and asymmetric widening.
Specifically, three parallel convolutional layers are set in the convolutional layers before the output (Concat) layer, as shown in Fig. 2, with kernel sizes of 3×3, 1×1 and 3×3, and 3×3 in turn; that is, three convolution channels are built horizontally with 1×1 and 3×3 kernels, and each channel generates feature maps of different sizes after its convolution operations. A BN layer and a PReLU activation function are added after each convolutional layer. The BN layer is a regularization function that keeps the input of each layer of the deep neural network in the same distribution during training, which improves the speed and accuracy of gradient descent, while the PReLU activation function linearly activates the feature data. The three convolution branches finally output feature maps of equal size with the same number of channels; to guarantee equal sizes, SAME padding (i.e., zero padding of the feature-map border) is used in the convolution operations, after which the equally sized feature maps are fed into the output (Concat) layer and merged.
The BN function and the PReLU function in the above embodiment are set after each convolution kernel and are in practice combined with the convolutional layer, so the convolutional layer can also be regarded as including the BN layer and the PReLU function.
The three parallel convolutional layers in the above embodiment form a symmetric convolution module. As another implementation, an asymmetric convolution module can be constructed by replacing one of the above 3×3 kernels with a 1×3 and a 3×1 kernel, as shown in Fig. 3.
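A minimal PyTorch-style sketch of such a widened module is given below. The exact branch layout (3×3 | 1×1 followed by 3×3 | 3×3) and the per-branch channel widths are interpretations of the description above, not a definitive implementation:

```python
import torch
import torch.nn as nn

def conv_bn_prelu(in_ch, out_ch, kernel_size):
    # One widened-branch layer: convolution with SAME padding, then BN, then PReLU.
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    padding = (kernel_size[0] // 2, kernel_size[1] // 2)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.PReLU(out_ch, init=0.25),
    )

class WidenedConvModule(nn.Module):
    # Three parallel branches whose equal-sized outputs are merged by the Concat layer.
    def __init__(self, in_ch, branch_ch, asymmetric=False):
        super().__init__()
        self.branch1 = conv_bn_prelu(in_ch, branch_ch, 3)
        self.branch2 = nn.Sequential(
            conv_bn_prelu(in_ch, branch_ch, 1),
            conv_bn_prelu(branch_ch, branch_ch, 3),
        )
        if asymmetric:
            # Asymmetric variant: one 3x3 kernel replaced by a 1x3 followed by a 3x1 kernel.
            self.branch3 = nn.Sequential(
                conv_bn_prelu(in_ch, branch_ch, (1, 3)),
                conv_bn_prelu(branch_ch, branch_ch, (3, 1)),
            )
        else:
            self.branch3 = conv_bn_prelu(in_ch, branch_ch, 3)

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
```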
It should be noted that, in this embodiment, when the feature-map size is between 12×12 and 12×20, the asymmetric convolution module extracts features better than the symmetric convolution module and reduces the number of network parameters by 1/3. Therefore, in the present invention the asymmetric convolution module is used in the first detection channel, whose feature-map size is 13×13, while the symmetric convolution module is used in the other, larger-scale channels.
Accordingly, the overall structure of the improved YOLOv3 network of the present invention is shown in Fig. 1, where Convolutional-0 denotes a symmetric convolution module, Convolutional-1 denotes an asymmetric convolution module, and Convolutional-Set denotes a combination of five convolution modules.
In addition, a multi-scale fusion method is used in this embodiment when extracting features from the road image. The multi-scale fusion method comprises a bottom-up pathway, a top-down pathway and lateral connections. The bottom-up pathway is the feed-forward computation of the Darknet network, a feature hierarchy composed of feature maps at multiple scales with a scaling step of 2; the output of the last layer of each network stage is selected as the reference feature-map set. The top-down pathway transfers and fuses features by enlarging their scale, and these features are then enhanced from the bottom-up pathway through the lateral connections; each lateral connection merges feature maps of the same spatial size from the bottom-up pathway and the top-down pathway.
Specifically, as shown in Fig. 4, the multi-scale fusion of this embodiment proceeds in four steps: first, the 13×13 feature map output by the last layer of the Darknet network is fed into the first channel of YOLO to detect large targets; second, the 13×13 feature map is passed down to the second channel, upsampled and fused with the 26×26 feature map output by Darknet, and the second-largest targets are detected; third, the feature map of the second channel is passed on to the third channel, upsampled and fused with the 52×52 feature map, and the second-smallest targets are detected; fourth, the 52×52 feature map of the third channel is upsampled one last time and fused with the 104×104 feature map to detect small targets, thereby realizing feature extraction with multi-scale fusion.
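A minimal sketch of these four fusion steps, assuming nearest-neighbour upsampling and channel concatenation as the fusion operation (the intermediate convolutions of the detection channels are omitted for brevity):

```python
import torch
import torch.nn.functional as F

def multi_scale_fusion(c13, c26, c52, c104):
    # c13..c104: Darknet feature maps of spatial size 13x13, 26x26, 52x52 and 104x104.
    p13 = c13                                                             # channel 1: large targets
    p26 = torch.cat([F.interpolate(p13, scale_factor=2), c26], dim=1)     # channel 2: second-largest
    p52 = torch.cat([F.interpolate(p26, scale_factor=2), c52], dim=1)     # channel 3: second-smallest
    p104 = torch.cat([F.interpolate(p52, scale_factor=2), c104], dim=1)   # channel 4: small targets
    return p13, p26, p52, p104
```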
On the basis of the improved YOLOv3 network model established above, the training set of the road image data is used, and the parameters of the feature extraction end are initialized with a model obtained by training the Darknet network on the MS COCO dataset.
The total number of training iterations is set to 30,000; the Adam optimization algorithm is used to update the weight gradients; the batch size is 32 and the momentum parameter is 0.9; the learning rate is 0.0001 for the first 20,000 iterations and is reduced to 0.00005 for the last 10,000 iterations.
Specifically, the training process of the improved YOLOv3 network model is as follows (a configuration sketch is given after these steps):
1) Import the training data: generate the training file train.txt, the validation file val.txt and the test file test.txt from the sample images according to the order and ratio above, and import these three files, together with the annotation data and the sample images, into the network as training data.
2) Select the pre-trained model: the model obtained by training the Darknet network on the MS COCO dataset is used as the pre-trained model to initialize the network parameters of the feature extraction end.
3) Network training: perform iterative training of the network; during training, real-time indicators such as the accuracy, the loss value and the recall are stored in the training log.
4) Finally, the trained improved YOLOv3 network model is obtained.
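The patent does not specify the training framework; a minimal PyTorch-style sketch of the schedule described above, with the model, loss function and data loader treated as placeholders, might look like this:

```python
import torch

def train(model, loss_fn, data_loader, device="cuda"):
    # 30,000 iterations, Adam, batch size 32 (set in the data loader), momentum 0.9,
    # learning rate 1e-4 dropped to 5e-5 after 20,000 iterations.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
    model.to(device).train()
    data_iter = iter(data_loader)
    for step in range(30000):
        if step == 20000:
            for group in optimizer.param_groups:
                group["lr"] = 5e-5
        try:
            images, targets = next(data_iter)
        except StopIteration:
            data_iter = iter(data_loader)
            images, targets = next(data_iter)
        loss = loss_fn(model(images.to(device)), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```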
3. The trained improved YOLOv3 network model is used to perform road intersection detection on the road images to be tested (the test set).
To verify the effectiveness of the algorithm, the networks before and after the improvement are trained and tuned on the road intersection dataset, and intersections of several types are then detected in multi-source remote sensing images.
To evaluate the algorithm of the improved YOLOv3 network accurately, the detection performance of the model is measured experimentally with the precision, the recall, the PR curve, the average precision (AP) and the mean average precision (mAP).
The metrics are computed as follows:
Here, TP is the number of correctly detected road intersections; FP is the number of road intersections whose detection result does not match reality; FN is the number of road intersections that were not detected; and N is the number of road intersection categories. Pre is the detection precision, i.e., the proportion of correctly detected intersections among all detection results; Rec is the recall, i.e., the proportion of correctly detected intersections among all intersections. The PR curve combines the evaluation criteria of precision and recall, with the recall on the horizontal axis and the precision on the vertical axis; the area enclosed by the curve and the coordinate axes is the AP value. mAP is the mean average precision and measures the detection performance of the network on the whole test set.
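The metric formula images are likewise not reproduced in this text version; the standard definitions, consistent with the quantities described above, are:

$$P_{re}=\frac{TP}{TP+FP},\qquad R_{ec}=\frac{TP}{TP+FN}$$

$$AP=\int_{0}^{1} P(R)\,dR,\qquad mAP=\frac{1}{N}\sum_{i=1}^{N}AP_i$$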
Figs. 5-1 and 5-2 show the PR curves, plotted from the precision and the recall on the test set, of the models obtained by training the YOLOv3 network before and after the improvement. The test set covers seven types of road intersection: crossing, tjunction (T-junction), malposed (staggered), xshape (X-shape), yshape (Y-shape), roundabout and multiple (multi-way).
Compared with the PR curves in Fig. 5-1, the PR curves in Fig. 5-2 are smoother; both the detection precision Pre and the recall Rec improve to a certain extent, and the mean average precision mAP increases by 3.17%. Different types of road intersection have different relative sizes in remote sensing images. The smallest is the T-junction, for which the detection improvement is most obvious, with the AP value rising by 8%. For the four larger types (crossing, staggered, X-shape and Y-shape) the AP values rise by 3.01%, 3.25%, 4.5% and 2.08%, respectively. For the largest types, the roundabout and multi-way intersections, the test results of the two network models are similar, with AP gains of less than 1%.
The experimental results show that, with the road intersection test set as the experimental data, the improved YOLOv3 network is strongly robust and its test results are relatively stable; at the same time, by enhancing the extraction of fine target features, the network's ability to represent small road intersection targets is clearly improved.
To verify the effectiveness of the algorithm for road intersection detection in complex environments, images with many interference factors were selected from the test set and tested with the YOLOv3 network models before and after the improvement.
For the cement roads of township residential areas, some intersections are occluded by surrounding trees and their contour features are incomplete; the models before and after the improvement were tested on such images, with the detection results shown in Figs. 6-1 and 6-2. For intersections whose roads are similar in color to neighboring buildings, so that the intersection features are not obvious against the background of ground objects, the models before and after the improvement were also tested, with the detection results shown in Figs. 7-1 and 7-2. In the detection results of the original YOLOv3 network on the two images (Figs. 6-1 and 7-1), a total of nine intersections affected by interference were not recognized, giving a high missed-detection rate. The improved YOLOv3 network (Figs. 6-2 and 7-2) correctly identified some intersections with inconspicuous contour features, improving the detection of road intersections in complex environments.
To verify the applicability of the algorithm, urban images were randomly captured from the Google Earth platform as test data, and transfer tests were carried out with the YOLOv3 models before and after the improvement; the detection results are shown in Figs. 8-1 and 8-2.
The spatial resolution of the images in Figs. 8-1 and 8-2 is about 1 m, corresponding to a ground area of 1760 m × 1096 m and containing 65 road intersections of four types: crossing, T-junction, staggered and X-shape. Because the image contains relatively few staggered and X-shaped intersections, all intersection categories were merged into a single class when computing the recall and the average precision, so as to avoid inflated indicators; the statistics of each evaluation metric are given in Table 1.
Table 1 Comparison of the detection performance of YOLOv3 and the improved YOLOv3
As Table 1 shows, compared with the YOLOv3 algorithm, the improved YOLOv3 algorithm has a clear advantage in detecting dense road intersections, reducing the numbers of missed and false detections and raising the average precision by 12.18%. The experimental results show that the improved YOLOv3 algorithm can effectively transfer the road intersection features of the dataset and has strong applicability to other road images. From the statistical results, the average precision of road intersection detection is high, which can provide auxiliary information for the rapid construction of road networks.
Embodiment of the road intersection detection device
The present invention also provides a road intersection detection device based on the improved YOLOv3. The device is in practice a computer or other equipment with data processing capability, and comprises a processor and a memory, the processor being connected to an acquisition interface for obtaining road images. The processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, or the like, and executes instructions implementing the road intersection detection method of the present invention; the specific method is described in the method embodiment above and is not repeated here.
Although the content of the present invention has been described in detail through the preferred embodiments above, it should be recognized that the above description should not be regarded as limiting the present invention. Various modifications and alternatives to the present invention will be apparent to those skilled in the art after reading the foregoing. Therefore, the scope of protection of the present invention shall be defined by the appended claims.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911078236.6A CN110852243B (en) | 2019-11-06 | 2019-11-06 | Road intersection detection method and device based on improved YOLOv3 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911078236.6A CN110852243B (en) | 2019-11-06 | 2019-11-06 | Road intersection detection method and device based on improved YOLOv3 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110852243A CN110852243A (en) | 2020-02-28 |
CN110852243B true CN110852243B (en) | 2022-06-28 |
Family
ID=69599911
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911078236.6A Expired - Fee Related CN110852243B (en) | 2019-11-06 | 2019-11-06 | Road intersection detection method and device based on improved YOLOv3 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110852243B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408321B (en) * | 2020-03-16 | 2023-08-22 | 中国人民解放军战略支援部队信息工程大学 | Real-time target detection method and device for lightweight image and video data |
CN111462488B (en) * | 2020-04-01 | 2021-09-10 | 北京工业大学 | Intersection safety risk assessment method based on deep convolutional neural network and intersection behavior characteristic model |
CN113761969A (en) * | 2020-06-02 | 2021-12-07 | 华为技术有限公司 | Image processing method and image processing device |
CN112487864A (en) * | 2020-11-02 | 2021-03-12 | 江阴市智行工控科技有限公司 | Method for detecting small target safety helmet and protective clothing for construction site |
CN112508030A (en) * | 2020-12-18 | 2021-03-16 | 山西省信息产业技术研究院有限公司 | Tunnel crack detection and measurement method based on double-depth learning model |
CN114881992B (en) * | 2022-05-24 | 2023-04-07 | 北京安德医智科技有限公司 | Skull fracture detection method and device and storage medium |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN201203419Y (en) * | 2007-12-26 | 2009-03-04 | 河海大学常州校区 | Single-frame image detection device for vehicle queuing length at urban intersections |
CN102708378B (en) * | 2012-04-28 | 2014-06-11 | 浙江工业大学 | Method for diagnosing fault of intelligent traffic capturing equipment based on image abnormal characteristic |
CN102842039B (en) * | 2012-07-11 | 2015-06-24 | 河海大学 | Road image detection method based on Sobel operator |
CN103077617B (en) * | 2012-12-24 | 2015-11-04 | 南京航空航天大学 | Computer vision-based intelligent traffic light supervision system and method for pedestrian crossings |
CN105868691B (en) * | 2016-03-08 | 2019-05-21 | 青岛邃智信息科技有限公司 | City vehicle method for tracing based on fast area convolutional neural networks |
CN106408015A (en) * | 2016-09-13 | 2017-02-15 | 电子科技大学成都研究院 | Road fork identification and depth estimation method based on convolutional neural network |
CN107316001A (en) * | 2017-05-31 | 2017-11-03 | 天津大学 | Small and intensive method for traffic sign detection in a kind of automatic Pilot scene |
CN108416776B (en) * | 2018-03-16 | 2021-04-30 | 京东方科技集团股份有限公司 | Image recognition method, image recognition apparatus, computer product, and readable storage medium |
CN109033939A (en) * | 2018-06-04 | 2018-12-18 | 上海理工大学 | Improved YOLOv2 object detecting method under a kind of cluttered environment |
US10452959B1 (en) * | 2018-07-20 | 2019-10-22 | Synapse Tehnology Corporation | Multi-perspective detection of objects |
CN109902609A (en) * | 2019-02-22 | 2019-06-18 | 淮阴工学院 | A traffic sign detection and recognition method based on YOLOv3 |
CN110059554B (en) * | 2019-03-13 | 2022-07-01 | 重庆邮电大学 | Multi-branch target detection method based on traffic scene |
CN110135267B (en) * | 2019-04-17 | 2020-09-25 | 电子科技大学 | A detection method for small objects in large scene SAR images |
CN110119726B (en) * | 2019-05-20 | 2023-04-25 | 四川九洲视讯科技有限责任公司 | Vehicle brand multi-angle identification method based on YOLOv3 model |
CN110175574A (en) * | 2019-05-28 | 2019-08-27 | 中国人民解放军战略支援部队信息工程大学 | A kind of Road network extraction method and device |
CN110348311B (en) * | 2019-06-13 | 2021-03-19 | 中国人民解放军战略支援部队信息工程大学 | A system and method for road intersection recognition based on deep learning |
- 2019-11-06: CN application CN201911078236.6A filed; granted as patent CN110852243B (status: not active, Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
CN110852243A (en) | 2020-02-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220628 |