
CN110674674A - A rotating target detection method based on YOLO V3 - Google Patents

A rotating target detection method based on YOLO V3

Info

Publication number
CN110674674A
CN110674674A (Application CN201910707178.2A)
Authority
CN
China
Prior art keywords
box
detection target
target
loss function
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910707178.2A
Other languages
Chinese (zh)
Inventor
陈华杰
吴栋
侯新雨
韦玉谭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Electronic Science and Technology University
Original Assignee
Hangzhou Electronic Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Electronic Science and Technology University filed Critical Hangzhou Electronic Science and Technology University
Priority to CN201910707178.2A priority Critical patent/CN110674674A/en
Publication of CN110674674A publication Critical patent/CN110674674A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a rotating target detection method based on YOLO V3. The method first extracts information from a training set of remote sensing ship images with arbitrary orientations, trains on that set with an improved YOLO V3 algorithm, and can then detect remote sensing ships at arbitrary angles in the test set. The invention redesigns the anchor box generation, the IOU calculation, and the loss function calculation of the original YOLO V3 algorithm, and adds the angle information of the detection targets in the training set images, so that targets at any angle can be detected in the test set.

Figure 201910707178

Description

A rotating target detection method based on YOLO V3

Technical Field

The invention belongs to the field of deep learning and relates to a rotating target detection method based on YOLO V3.

Background Art

At present, target detection is widely used in military and civilian fields. A deep convolutional neural network can learn the targets to be detected autonomously from a target dataset and refine its own model. YOLO V3 is a single-stage target detection algorithm: it does not use a region proposal network (RPN) to extract candidate target information, but produces target location and category information directly through the network, making it an end-to-end target detection algorithm. Single-stage target detection algorithms therefore have a faster detection speed.

The YOLO V3 model performs target detection by dividing the image into grid cells and directly regressing target coordinates and classifying targets. It mainly uses horizontal rectangular bounding boxes to define target positions and locates targets by regressing the bounding box parameters. This is accurate enough for targets such as people and vehicles, but it is unsuitable for arbitrarily oriented targets such as text and ships.

Summary of the Invention

Aiming at the deficiencies of the prior art, the present invention provides a rotating target detection method based on YOLO V3. The method extracts information from a training set of remote sensing ship images with arbitrary orientations and trains on that set with an improved YOLO V3 algorithm, so that remote sensing ships at arbitrary angles can finally be detected in the test set.

Step (1): extract the position information and angle information of the detection targets in the training set images;

The training set images are processed and each detection target is represented as a 5-dimensional vector {x, y, w, h, θ}, where x is the x-axis coordinate of the detection target, y is the y-axis coordinate of the detection target, w is the length of the detection target, h is the width of the detection target, and θ is the inclination angle of the detection target;

Step (2): improve the YOLO V3 algorithm framework

2.1 Setting of the anchor box

Use anchors described by a three-dimensional variable that includes a rotation angle, setting N different angles to control the extraction of suspected target regions. Set several groups of anchor box lengths and widths; three groups of anchor box sizes are set according to the known detection targets in the training set images. Different scales arise from the stride of the convolution kernels in the convolutional neural network and the size of the input image. Using 3 anchors at each scale to produce 3 bounding boxes yields 3*3*N anchor boxes to fit the location regions of the detection targets;

2.2 Setting of the loss function;

The delineation criteria for extracting positive and negative sample regions are: an anchor box whose IOU with the GT box is greater than 0.7 and whose angle with the GT box is less than π/12 is a positive sample; an anchor box whose IOU with the GT box is less than 0.3, or whose IOU with the GT box is greater than 0.7 but whose angle with the GT box is greater than π/12, is a negative sample; anchors classified as neither positive nor negative are not used during training. The GT box is the location region of the detection target; the IOU is the size of the overlap area between the fitted anchor box and the GT box;

定义损失函数:Define the loss function:

Loss = L_cls(p, l) + L_reg(v*, v) + L_conf

Here L_cls(p, l) is the classification loss function and l is the class label: l = 1 means a target is selected, l = 0 means background is selected, and if background is selected there is no regression loss. When l = 1, the loss between the true class and the predicted class is computed by cross entropy, with p the linear classification probability. L_reg(v*, v) is the regression loss function; v* is the detection target information (x, y, w, h, θ) from step (1), and v is the predicted detection target information (x', y', w', h', θ') corresponding to the anchor box in step (2); the loss between the predicted value v and the true value v* is computed as a mean squared error. L_conf is the confidence loss function: when the IOU is below 0.3 the region is treated as containing no target; when it is above 0.3 the predicted target must be pushed as close as possible to the ground truth, and the parts without targets must be pushed as close as possible to the background ground truth; the confidence loss is computed by cross entropy;

Step (3): train on the training set images with the improved YOLO V3 algorithm, iterating until the loss function no longer decreases, and obtain the weight file;

Step (4): test the test set images using the weight file obtained in step (3).

The beneficial effects of the present invention are as follows:

The invention redesigns the anchor box generation, the IOU calculation, and the loss function calculation of the original YOLO V3 algorithm, and adds the angle information of the detection targets in the training set images, so that targets at any angle can be detected in the test set.

Brief Description of the Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description of the Embodiments

The present invention is analyzed further below in conjunction with FIG. 1.

In this experiment, a collection of ship target images is divided into a training set and a test set. The specific steps of the YOLO V3 based rotating target detection task are as follows:

Step (1): extract the position information and angle information of the detection targets in the training set images.

Compared with the original YOLO V3 algorithm, the rotating target detection method needs one additional piece of angle information. The training set images are processed and each detection target is represented as a 5-dimensional vector {x, y, w, h, θ}, where x is the x-axis coordinate of the detection target, y is the y-axis coordinate, w is the length, h is the width, and θ is the inclination angle of the detection target.
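As a minimal illustration (the function names are ours, not the patent's), the 5-dimensional target vector and the corner points it implies can be written in Python as:

```python
import math

def encode_target(x, y, w, h, theta):
    """Pack a rotated detection target into the 5-dimensional vector
    {x, y, w, h, theta} described in step (1).

    x, y  : centre coordinates of the target
    w, h  : length and width of the target box
    theta : inclination angle in radians
    """
    return (float(x), float(y), float(w), float(h), float(theta))

def corners(box):
    """Return the four corner points of a rotated box (a hypothetical
    helper, convenient for visualisation and overlap computation)."""
    x, y, w, h, theta = box
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each half-extent offset by theta and translate to the centre.
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in half]
```

With θ = 0 this reduces to the ordinary horizontal bounding box of the original YOLO V3.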

Step (2): improve the YOLO V3 algorithm framework.

2.1 Setting of the anchor box.

Use anchors described by a three-dimensional variable that includes a rotation angle, and set several different angles to control the extraction of suspected target regions; six different angles can be used: -π/6, 0, π/6, π/3, π/2, 2π/3. Set several groups of anchor box lengths and widths; three groups of anchor box sizes can be set according to the known detection targets in the training set images: [(10, 13), (16, 30), (33, 23)], [(30, 61), (62, 45), (59, 119)], [(116, 90), (156, 198), (373, 326)]. Different scales arise from the stride of the convolution kernels in the convolutional neural network and the size of the input image: the strides of the original YOLO V3 feature maps are 32, 16 and 8, and with an input image size of 416*416 the resulting scales are 13*13, 26*26 and 52*52. Using 3 anchors at each scale to produce 3 bounding boxes yields 3*3*6 = 54 anchor boxes to fit the location regions of the detection targets.
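The 54 anchor shapes of the embodiment can be enumerated directly. The sketch below uses the six angles and three size groups quoted above; the pairing of size groups to strides follows the usual YOLO V3 convention and is our assumption, not stated in the patent:

```python
import math

# The six rotation angles of the embodiment (radians).
ANGLES = [-math.pi / 6, 0.0, math.pi / 6, math.pi / 3, math.pi / 2, 2 * math.pi / 3]

# Three groups of (width, height) anchor sizes, one group per detection scale.
ANCHOR_SIZES = [
    [(10, 13), (16, 30), (33, 23)],       # assumed stride 8,  52x52 feature map
    [(30, 61), (62, 45), (59, 119)],      # assumed stride 16, 26x26 feature map
    [(116, 90), (156, 198), (373, 326)],  # assumed stride 32, 13x13 feature map
]

def build_anchors():
    """Enumerate every (w, h, theta) anchor triple:
    3 scales x 3 sizes x 6 angles = 54 rotated anchor shapes."""
    anchors = []
    for scale in ANCHOR_SIZES:
        for (w, h) in scale:
            for theta in ANGLES:
                anchors.append((w, h, theta))
    return anchors
```

Each of the 54 shapes is then placed at every cell of the corresponding feature map, exactly as horizontal anchors are placed in the original YOLO V3.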

2.2 Setting of the loss function.

The delineation criteria for extracting positive and negative sample regions are: an anchor box whose IOU with the GT box (the location region of the detection target; the IOU is the size of the overlap area between the fitted anchor box and the GT box) is greater than 0.7 and whose angle with the GT box is less than π/12 is a positive sample; an anchor box whose IOU with the GT box is less than 0.3, or whose IOU with the GT box is greater than 0.7 but whose angle with the GT box is greater than π/12, is a negative sample; anchors classified as neither positive nor negative are not used during training.
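These criteria map directly onto a small labelling function. The rotated-box IOU and the anchor/GT angle difference are assumed to be computed elsewhere; only the thresholding logic is shown:

```python
import math

def label_anchor(iou, angle_diff):
    """Apply the positive/negative delineation criteria of section 2.2.

    iou        : overlap between the anchor box and the GT box
    angle_diff : absolute angle between the anchor box and the GT box (radians)

    Returns 1 (positive), 0 (negative) or -1 (ignored during training).
    """
    if iou > 0.7 and angle_diff < math.pi / 12:
        return 1   # high overlap and nearly aligned: positive sample
    if iou < 0.3:
        return 0   # low overlap: negative sample
    if iou > 0.7 and angle_diff > math.pi / 12:
        return 0   # high overlap but wrong orientation: negative sample
    return -1      # neither positive nor negative: not used in training
```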

定义损失函数:Define the loss function:

Loss = L_cls(p, l) + L_reg(v*, v) + L_conf

Here L_cls(p, l) is the classification loss function and l is the class label (l = 1 means a target is selected, l = 0 means background is selected; if background is selected there is no regression loss). When l = 1, the loss between the true class and the predicted class is computed by cross entropy (p is the linear classification probability). L_reg(v*, v) is the regression loss function; v* is the detection target information (x, y, w, h, θ) from step (1), and v is the predicted detection target information (x', y', w', h', θ') corresponding to the anchor box in step (2); the loss between the predicted value v and the true value v* is computed as a mean squared error. L_conf is the confidence loss function: when the IOU is below 0.3 the region is treated as containing no target; when it is above 0.3 the predicted target must be pushed as close as possible to the ground truth, and the parts without targets must be pushed as close as possible to the background ground truth; the confidence loss is computed by cross entropy.
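A simplified per-anchor sketch of Loss = L_cls + L_reg + L_conf follows. The dict layout and scalar form are our own illustration; a real implementation computes these terms over whole prediction tensors:

```python
import math

def detection_loss(pred, target):
    """Per-anchor sketch of the combined loss of section 2.2.

    pred   : {"p": class probability, "v": (x', y', w', h', theta'), "conf": confidence}
    target : {"label": 1 for object / 0 for background,
              "v_star": (x, y, w, h, theta), "conf_gt": target confidence}
    """
    eps = 1e-9
    p, label = pred["p"], target["label"]
    # L_cls: cross entropy between the true class and the predicted class.
    l_cls = -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))
    # L_reg: mean squared error over the 5 box parameters, only for positives
    # (if background is selected there is no regression loss).
    l_reg = 0.0
    if label == 1:
        l_reg = sum((a - b) ** 2 for a, b in zip(pred["v"], target["v_star"])) / 5.0
    # L_conf: cross entropy pulling object confidences toward the ground truth
    # and background confidences toward the background truth.
    c, c_gt = pred["conf"], target["conf_gt"]
    l_conf = -(c_gt * math.log(c + eps) + (1 - c_gt) * math.log(1 - c + eps))
    return l_cls + l_reg + l_conf
```

A perfect prediction drives all three terms toward zero, while a mislocated box raises only the regression term.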

Step (3): train on the training set images with the improved YOLO V3 algorithm, iterating until the loss function no longer decreases, and obtain the weight file.
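The stopping rule of step (3), iterating until the loss no longer decreases, can be sketched as a generic early-stopping loop; `step_fn`, the `patience` count and the tolerance are illustrative assumptions, not values from the patent:

```python
def train_until_converged(step_fn, patience=3, max_iters=10000):
    """Run training steps until the loss stops decreasing.

    step_fn is assumed to run one training epoch and return the epoch loss;
    training stops after `patience` consecutive epochs without improvement.
    """
    best, stale = float("inf"), 0
    for _ in range(max_iters):
        loss = step_fn()
        if loss < best - 1e-6:
            best, stale = loss, 0   # loss still decreasing: reset counter
        else:
            stale += 1              # no improvement this epoch
            if stale >= patience:
                break
    return best
```

At this point the network weights would be serialised to the weight file used in step (4).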

Step (4): test the test set images using the weight file obtained in step (3).

The above embodiments do not limit the present invention, and the present invention is not limited to the above embodiments; anything that meets the requirements of the present invention falls within the protection scope of the present invention.

Claims (1)

1. A rotating target detection method based on YOLO V3, characterized in that the method specifically comprises the following steps:

Step (1): extract the position information and angle information of the detection targets in the training set images;

The training set images are processed and each detection target is represented as a 5-dimensional vector {x, y, w, h, θ}, where x is the x-axis coordinate of the detection target, y is the y-axis coordinate, w is the length, h is the width, and θ is the inclination angle of the detection target;

Step (2): improve the YOLO V3 algorithm framework

2.1 Setting of the anchor box

Use anchors described by a three-dimensional variable that includes a rotation angle, setting N different angles to control the extraction of suspected target regions; set several groups of anchor box lengths and widths, with three groups of anchor box sizes set according to the known detection targets in the training set images; different scales arise from the stride of the convolution kernels in the convolutional neural network and the size of the input image; using 3 anchors at each scale to produce 3 bounding boxes yields 3*3*N anchor boxes to fit the location regions of the detection targets;

2.2 Setting of the loss function

The delineation criteria for extracting positive and negative sample regions are: an anchor box whose IOU with the GT box is greater than 0.7 and whose angle with the GT box is less than π/12 is a positive sample; an anchor box whose IOU with the GT box is less than 0.3, or whose IOU with the GT box is greater than 0.7 but whose angle with the GT box is greater than π/12, is a negative sample; anchors classified as neither positive nor negative are not used during training; the GT box is the location region of the detection target, and the IOU is the size of the overlap area between the fitted anchor box and the GT box;

Define the loss function:

Loss = L_cls(p, l) + L_reg(v*, v) + L_conf

where L_cls(p, l) is the classification loss function and l is the class label: l = 1 means a target is selected, l = 0 means background is selected, and if background is selected there is no regression loss; when l = 1, the loss between the true class and the predicted class is computed by cross entropy, with p the linear classification probability; L_reg(v*, v) is the regression loss function, v* is the detection target information (x, y, w, h, θ) from step (1), v is the predicted detection target information (x', y', w', h', θ') corresponding to the anchor box in step (2), and the loss between the predicted value v and the true value v* is computed as a mean squared error; L_conf is the confidence loss function: when the IOU is below 0.3 the region is treated as containing no target, when it is above 0.3 the predicted target must be as close as possible to the ground truth and the parts without targets must be as close as possible to the background ground truth, and the confidence loss is computed by cross entropy;

Step (3): train on the training set images with the improved YOLO V3 algorithm, iterating until the loss function no longer decreases, and obtain the weight file;

Step (4): test the test set images using the weight file obtained in step (3).
CN201910707178.2A 2019-08-01 2019-08-01 A rotating target detection method based on YOLO V3 Pending CN110674674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910707178.2A CN110674674A (en) 2019-08-01 2019-08-01 A rotating target detection method based on YOLO V3

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910707178.2A CN110674674A (en) 2019-08-01 2019-08-01 A rotating target detection method based on YOLO V3

Publications (1)

Publication Number Publication Date
CN110674674A true CN110674674A (en) 2020-01-10

Family

ID=69068473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910707178.2A Pending CN110674674A (en) 2019-08-01 2019-08-01 A rotating target detection method based on YOLO V3

Country Status (1)

Country Link
CN (1) CN110674674A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462235A (en) * 2020-03-31 2020-07-28 武汉卓目科技有限公司 Inclined target detection method and device based on yolo v3 algorithm and storage medium
CN112364805A (en) * 2020-11-21 2021-02-12 西安交通大学 Rotary palm image detection method
CN112488113A (en) * 2020-11-06 2021-03-12 杭州电子科技大学 Remote sensing image rotating ship target detection method based on local straight line matching
CN112580435A (en) * 2020-11-25 2021-03-30 厦门美图之家科技有限公司 Face positioning method, face model training and detecting method and device
CN112966587A (en) * 2021-03-02 2021-06-15 北京百度网讯科技有限公司 Training method of target detection model, target detection method and related equipment
CN113033303A (en) * 2021-02-09 2021-06-25 北京工业大学 Method for realizing SAR image rotating ship detection based on RCIoU loss
CN113723217A (en) * 2021-08-09 2021-11-30 南京邮电大学 Object intelligent detection method and system based on yolo improvement
CN113920436A (en) * 2021-11-22 2022-01-11 江苏科技大学 A remote sensing image marine ship identification system and method based on improved YOLOv4 algorithm

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492561A (en) * 2018-10-29 2019-03-19 北京遥感设备研究所 A kind of remote sensing image Ship Detection based on improved YOLO V2 model
CN109978036A (en) * 2019-03-11 2019-07-05 华瑞新智科技(北京)有限公司 Target detection deep learning model training method and object detection method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492561A (en) * 2018-10-29 2019-03-19 北京遥感设备研究所 A kind of remote sensing image Ship Detection based on improved YOLO V2 model
CN109978036A (en) * 2019-03-11 2019-07-05 华瑞新智科技(北京)有限公司 Target detection deep learning model training method and object detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANQI MA等: "Arbitrary-Oriented Scene Text Detection via Rotation Proposals", 《IEEE TRANSACTIONS ON MULTIMEDIA》 *
JOSEPH REDMON等: "YOLOv3: An Incremental Improvement", 《ARXIV》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462235A (en) * 2020-03-31 2020-07-28 武汉卓目科技有限公司 Inclined target detection method and device based on yolo v3 algorithm and storage medium
CN112488113A (en) * 2020-11-06 2021-03-12 杭州电子科技大学 Remote sensing image rotating ship target detection method based on local straight line matching
CN112364805A (en) * 2020-11-21 2021-02-12 西安交通大学 Rotary palm image detection method
CN112580435A (en) * 2020-11-25 2021-03-30 厦门美图之家科技有限公司 Face positioning method, face model training and detecting method and device
CN112580435B (en) * 2020-11-25 2024-05-31 厦门美图之家科技有限公司 Face positioning method, face model training and detecting method and device
CN113033303A (en) * 2021-02-09 2021-06-25 北京工业大学 Method for realizing SAR image rotating ship detection based on RCIoU loss
CN113033303B (en) * 2021-02-09 2024-03-01 北京工业大学 SAR image rotation ship detection implementation method based on RCIoU loss
CN112966587A (en) * 2021-03-02 2021-06-15 北京百度网讯科技有限公司 Training method of target detection model, target detection method and related equipment
CN112966587B (en) * 2021-03-02 2022-12-20 北京百度网讯科技有限公司 Training method of target detection model, target detection method and related equipment
CN113723217A (en) * 2021-08-09 2021-11-30 南京邮电大学 Object intelligent detection method and system based on yolo improvement
CN113920436A (en) * 2021-11-22 2022-01-11 江苏科技大学 A remote sensing image marine ship identification system and method based on improved YOLOv4 algorithm
CN113920436B (en) * 2021-11-22 2024-12-03 江苏科技大学 A remote sensing image marine ship recognition system and method based on improved YOLOv4 algorithm

Similar Documents

Publication Publication Date Title
CN110674674A (en) A rotating target detection method based on YOLO V3
CN101887586B (en) Self-adaptive angular-point detection method based on image contour sharpness
CN103390164B (en) Method for checking object based on depth image and its realize device
CN103714541B (en) A Method for Identifying and Locating Buildings Using Mountain Contour Area Constraints
CN107657279A (en) A kind of remote sensing target detection method based on a small amount of sample
CN106228125A (en) Method for detecting lane lines based on integrated study cascade classifier
CN106384079A (en) RGB-D information based real-time pedestrian tracking method
CN105426870A (en) Face key point positioning method and device
CN103697883B (en) A kind of aircraft horizontal attitude defining method based on skyline imaging
CN104899590A (en) Visual target tracking method and system for unmanned aerial vehicle
CN107463890A (en) A kind of Foregut fermenters and tracking based on monocular forward sight camera
CN107239740A (en) A kind of SAR image automatic target recognition method of multi-source Fusion Features
CN104537689B (en) Method for tracking target based on local contrast conspicuousness union feature
CN107507170A (en) A kind of airfield runway crack detection method based on multi-scale image information fusion
CN103886325A (en) Cyclic matrix video tracking method with partition
CN115115672B (en) Dynamic Visual SLAM Method Based on Object Detection and Velocity Constraints of Feature Points
CN114926747A (en) Remote sensing image directional target detection method based on multi-feature aggregation and interaction
CN102945374A (en) Method for automatically detecting civil aircraft in high-resolution remote sensing image
CN110826485B (en) Target detection method and system for remote sensing image
Mseddi et al. YOLOv5 based visual localization for autonomous vehicles
CN106503663A (en) A kind of signal lighties duration detection method based on deep learning
CN104200199A (en) TOF (Time of Flight) camera based bad driving behavior detection method
CN104268574A (en) A SAR Image Change Detection Method Based on Genetic Kernel Fuzzy Clustering
CN105335688B (en) A kind of aircraft model recognition methods of view-based access control model image
CN112084860A (en) Target object detection method and device and thermal power plant detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200110