CN113536896A - Small target detection method, device and storage medium based on improved Faster RCNN

Info

Publication number: CN113536896A
Application number: CN202110593538.8A
Authority: CN
Inventors: 李乾; 张明; 余志强; 孙晓云; 刘保安; 韩广; 郑海清; 戎士敏; 药炜
Original assignee: State Grid Corp of China SGCC; Shijiazhuang Tiedao University; Shijiazhuang Power Supply Co of State Grid Hebei Electric Power Co Ltd; Taiyuan Power Supply Co of State Grid Shanxi Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; Shijiazhuang Tiedao University; Shijiazhuang Power Supply Co of State Grid Hebei Electric Power Co Ltd; Taiyuan Power Supply Co of State Grid Shanxi Electric Power Co Ltd
Priority date: 2021-05-28
Filing date: 2021-05-28
Publication date: 2021-10-22
Anticipated expiration: 2041-05-28
Also published as: CN113536896B

Abstract

The invention relates to a small target detection method based on an improved Faster RCNN, implemented by a processor executing instructions of the improved Faster RCNN algorithm. The method includes: receiving a scene picture containing a small target and extracting a first feature map F₁; obtaining a predicted anchor box a(x, y, w, h) from the first feature map F₁; obtaining, from the first feature map F₁ and the predicted anchor box a(x, y, w, h), a second feature map F₂ of the same size as F₁; and obtaining the detection result for the scene picture from the second feature map F₂ and the predicted anchor box a(x, y, w, h). The invention modifies the Faster RCNN framework by replacing its RPN with an adaptive anchor box network, so that the generated anchor boxes better match targets of different scales. This avoids the missed detections caused by unreasonably sized anchor boxes and improves detection accuracy.

Description

Small target detection method, device and storage medium based on improved Faster RCNN

Technical Field

The invention relates to the field of target recognition, and in particular to a small target detection method based on an improved Faster RCNN. The invention also relates to a small target detection device and a storage medium based on the improved Faster RCNN.

Background Art

Target detection, also called target extraction, is a form of image segmentation based on the geometric and statistical features of targets. It combines target segmentation and recognition into one step, and its accuracy and real-time performance are key capabilities of the overall system.

Target detection is an active area of computer vision and digital image processing, widely used in robot navigation, intelligent video surveillance, industrial inspection, aerospace, and many other fields. Using computer vision to reduce the consumption of human labor is of great practical significance, so target detection has become a focus of both theoretical and applied research in recent years. The rapid development of machine learning, and deep learning in particular, has made low-cost, high-efficiency target detection possible.

Current leading deep learning models fall roughly into two categories. The first is two-stage target detection algorithms, such as R-CNN, SPP-Net, Fast-RCNN, and Faster-RCNN. These algorithms first extract target information from candidate regions (region proposals) of the image, which in Faster-RCNN are produced by a region proposal network (RPN), and then use a detection network to predict the position and identify the category of the target in each candidate box. The second is one-stage target detection algorithms, such as SSD and YOLO. These algorithms do not build an RPN; they predict targets and identify categories directly on the image.

In practical applications, however, detection results for small targets are often far worse than for medium and large targets. Small target detection faces two problems: (1) a lack of information, in that the target occupies a very small proportion of the image and the pixels in the corresponding region carry very limited information; and (2) a scarcity of data, in that few images in the dataset contain small targets, which leaves the training set imbalanced across categories and makes detection accuracy for small objects much lower than for medium and large objects. Several kinds of methods currently address the poor detection of small targets:

① Augment the image data by enlarging the image so that small targets become larger. However, this method is crude, complicated to operate, and too computationally expensive to be of much practical value.

② Use a GAN model to enlarge small targets before detection. This follows the same idea as enlarging the image data and shares the drawback of complicated operation.

③ Modify the model's training parameters, for example by setting the stride parameter to 1. The effect of this method is also mediocre.

CN 111985540 A discloses a method for improving the small target detection rate based on oversampling Faster-RCNN, in the field of target recognition. It comprises: step 1, acquiring a target picture dataset and dividing it into a training set and a test set; step 2, taking a subset of the training set from step 1 as the oversampling set; step 3, building a Faster-RCNN model; step 4, training the Faster-RCNN model with the training set and the oversampling set; step 5, testing the trained Faster-RCNN model on the test set and, if the result is below the average precision (AP) threshold, modifying the parameters and repeating step 4 and the test until the result reaches the AP threshold; step 6, inputting the picture to be detected and performing small target detection with the trained Faster-RCNN model. That invention improves the detection rate of small targets by oversampling them.

CN 111986160 A discloses a method for improving small target detection based on Faster-RCNN, in the field of target detection. It comprises: acquiring a dataset and dividing it proportionally into a training set and a test set; building a Faster-RCNN model; training the model on the training set, where, if the loss of small targets meets a preset condition in the n-th iteration, then in the (n+1)-th iteration several pictures are shrunk and stitched into one picture of the original size before training; after training, testing the model on the test set to obtain the AP value and, if it is below the set threshold, modifying the corresponding parameters and retraining until the model's small target AP reaches the threshold; and detecting small targets with the trained model. That invention makes the distribution of small targets more uniform, which trains small targets more thoroughly and thereby improves their detection accuracy.

CN 111898668 A discloses a deep-learning-based small target object detection method intended to overcome the low efficiency and low accuracy of existing small target detection methods. First, images that do not contain small target objects are extracted from the COCO dataset, resized, and stitched together; the stitched images and the COCO images that contain small target objects form a new dataset, which is divided into a training set and a test set at a ratio of 4:1. Next, the base feature extraction network of Faster-RCNN is modified to perform feature fusion. The fused features at each level are then passed through the RPN to select candidate regions. The improved network is trained on the training set to obtain a trained model. Finally, the test set is input into the trained model for target detection.

CN 111368769 A provides a multi-target ship detection method based on an improved anchor box generation model. It comprises: acquiring SAR ship images; constructing a low-complexity network architecture and feeding the images into it to generate a feature map space; generating initial anchor boxes with a clustering method based on shape similarity; and, starting from the initial anchor boxes, using a sliding window mechanism to generate new candidate boxes in the low-complexity feature space and performing regression training on the candidate boxes for multi-target ship detection. That invention addresses the low algorithm efficiency and low detection quality caused by complex networks and poor candidate boxes, and achieves good accuracy. Because detection uses a low-complexity network architecture, from a statistical point of view the larger the amount of data collected, that is, the more detections performed, the better the detection result.

The target detection methods of the above patent applications are all two-stage algorithms based on Faster-RCNN. They improve small target detection accuracy by, respectively, oversampling small targets, training small targets more thoroughly, improving the model or its training process, and using a sliding window mechanism to generate new candidate boxes in a low-complexity feature space. The main steps of Faster-RCNN are as follows. First, convolutional layers extract feature maps from the input picture. Second, the RPN generates region proposals. Third, the RoI Pooling layer extracts proposal feature maps from the feature maps and the proposals. Finally, the classification layer computes the category of each proposal from the proposal feature maps and performs bounding box regression again to obtain the final precise position of the detection box. However, if the proposals are unreasonably sized, targets are easily missed. In particular, when two targets differ greatly in scale, the small-scale target is very likely to be missed, which lowers detection accuracy.
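The size mismatch described above can be made concrete with intersection over union (IoU), the overlap measure commonly used to match candidate boxes to targets. The following is a minimal sketch, not part of the patent: the box sizes (a 20×20 target, a 128×128 fixed anchor, a 24×18 size-matched anchor) are hypothetical numbers chosen only for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def centered_box(cx, cy, w, h):
    """Convert a (center, size) box to corner form (x1, y1, x2, y2)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Hypothetical sizes: all three boxes share the same center.
small_target = centered_box(100, 100, 20, 20)
fixed_anchor = centered_box(100, 100, 128, 128)   # large preset anchor
matched_anchor = centered_box(100, 100, 24, 18)   # anchor close to the target size

iou_fixed = iou(small_target, fixed_anchor)       # 400 / 16384, about 0.024
iou_matched = iou(small_target, matched_anchor)   # 360 / 472, about 0.763
```

Even with a perfectly aligned center, the oversized anchor overlaps the small target by only about 2%, so it would be unlikely to count as a match; an anchor whose size is close to the target overlaps it by about 76%.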

Summary of the Invention

The purpose of the invention is to provide a small target detection method based on an improved Faster RCNN that generates anchor boxes better matched to target size, lowers the missed detection rate for small-scale targets, and improves detection accuracy.

The technical solution provided by the invention is a small target detection method based on an improved Faster RCNN, implemented by a processor executing instructions of the improved Faster RCNN algorithm, including: receiving a scene picture containing a small target; using the first convolution module of the Faster RCNN to extract a first feature map F₁ of the scene picture; using the second convolution module of the Faster RCNN to obtain, from the first feature map F₁, a predicted anchor box center position a(x, y) and a predicted anchor box size a(w, h), and to obtain a predicted anchor box a(x, y, w, h) from the center position a(x, y) and the size a(w, h); using the third convolution module of the Faster RCNN to obtain, from the first feature map F₁ and the predicted anchor box a(x, y, w, h), a second feature map F₂ of the same size as the first feature map F₁; and using the fourth convolution module of the Faster RCNN to obtain the detection result for the scene picture from the second feature map F₂ and the predicted anchor box a(x, y, w, h).

Further, the backbone of the first convolution module adopts the convolutional structure of ResNet.

Further, the backbone of the first convolution module adopts the convolutional structure of ResNet50.

Further, the convolutional structure of the ResNet50 includes multiple Deform ResNet50 residual blocks, in each of which the second convolutional layer is replaced by a depthwise separable convolutional layer.

Further, the first convolution module includes a channel attention module configured to obtain a feature weight S_c from the scene picture; the second convolution module is configured to obtain the predicted anchor box center position a(x, y) from the first feature map F₁ and the feature weight S_c, and to obtain the predicted anchor box a(x, y, w, h) from the center position a(x, y).

Further, the backbone of the second convolution module adopts an adaptive anchor box network structure.

Further, the adaptive anchor box network obtains a score feature map F_P from the first feature map F₁, and then obtains the predicted anchor box center position a(x, y) from the score feature map F_P; it obtains the predicted anchor box a(x, y, w, h) from the center position a(x, y) and the predicted anchor box size a(w, h).

Further, the adaptive anchor box network includes an adaptive adjustment module configured to obtain an adaptive predicted anchor box a'(x, y, w, h) from the predicted anchor box a(x, y, w, h) and the first feature map F₁; the third convolution module is configured to obtain, from the first feature map F₁ and the adaptive predicted anchor box a'(x, y, w, h), a second feature map F₂ of the same size as the first feature map F₁.

The invention also provides a small target detection device based on the improved Faster RCNN, including:

a processor; and

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the program instructions of the algorithm structure so as to implement the above small target detection method based on the improved Faster RCNN.

In addition, the invention provides a computer storage medium for the small target detection method based on the improved Faster RCNN. When the instructions in the computer storage medium are executed by the processor of the small target detection device based on the improved Faster RCNN, the device is able to perform the above small target detection method based on the improved Faster RCNN.

The invention mainly comprises four parts: the first part extracts features from the scene picture; the second part obtains the predicted anchor box shape from the predicted anchor box center point and finally generates the predicted anchor box A(x, y, w, h); the third part generates a feature map of the same size as the anchor box A(x, y, w, h); the fourth part obtains the detection result for the scene picture.

Beneficial effects of the invention:

1. The invention modifies the Faster RCNN framework by replacing its RPN with an adaptive anchor box network, so that the generated anchor boxes better match targets of different scales. This avoids the missed detections caused by unreasonably sized anchor boxes and ultimately improves detection accuracy.

2. The invention adopts an adaptive anchor box network consisting mainly of two branches, shape prediction and position prediction. The two branches generate anchor boxes by selecting the positions whose predicted probability exceeds a certain threshold and the most probable shape at each selected position. This yields reasonably sized anchor boxes, avoids missed detections, strengthens the network's ability to detect targets, and effectively improves detection performance. When applied to insulator detection, a comparison with existing insulator detection algorithms shows that the invention greatly improves detection accuracy.

3. When applied to insulator detection, a comparison with existing insulator detection algorithms shows that replacing conventional convolution with depthwise separable convolution reduces the number of network parameters and greatly increases detection speed.

4. The invention improves the residual network in the feature extraction network by adding a channel attention structure, which links the channels of the feature map to one another and strengthens the important feature information in each channel. This helps the subsequent detection of small targets in the scene picture and improves the accuracy of the detection result.

Description of the Drawings

Figure 1 is a schematic diagram of the Deform ResNet50 residual block structure in the small target detection method based on the improved Faster RCNN in Embodiment 1 of the invention;

Figure 2 is a schematic diagram of the principle of depthwise separable convolution in the small target detection method based on the improved Faster RCNN in Embodiment 1 of the invention;

Figure 3 is a schematic diagram of the adaptive anchor box network structure in the small target detection method based on the improved Faster RCNN in Embodiment 1 of the invention;

Figure 4 is the network training flowchart of the improved Faster RCNN in the small target detection method based on the improved Faster RCNN in Embodiment 1 of the invention;

Figure 5 is the network framework diagram of the improved Faster RCNN in the small target detection method based on the improved Faster RCNN in Embodiment 1 of the invention;

Figure 6 is the flowchart of insulator defect detection in Embodiment 1 of the invention;

Figure 7 shows the defect detection result for a normal insulator in Embodiment 2 of the invention;

Figure 8 shows the defect detection result for a defective insulator in Embodiment 2 of the invention.

Detailed Description

Insulators are an important component of overhead transmission lines. They support and fix busbars and live conductors, and provide sufficient distance and insulation between live conductors or between conductors and the ground. Because overhead transmission lines are exposed to the natural environment for long periods and are affected by natural and human factors, they suffer from problems such as aging and damage. Without regular inspection and maintenance, these problems can cause major safety accidents. Those skilled in the art know that insulator defects include erosion, cracking, breakage, and exposed core rods, and that insulator defects differ greatly in scale from the insulators themselves. As a result, the missed detection rate in insulator defect detection is relatively high.

The technical solutions provided by the invention and their technical effects are described in detail below with reference to the accompanying drawings, taking insulator detection as an example.

Embodiment 1

This embodiment provides a small target detection method based on the improved Faster RCNN: a method that extracts features from the scene picture, generates anchor boxes based on anchor box center positions, and finally obtains the detection result. For ease of understanding, this embodiment describes the process as steps 100 to 600 below. In actual implementation these steps do not imply a chronological order; provided the preconditions of each step are met, carrying them out in a different order is also an embodiment of the invention.

Step 100: preprocess the original transmission line image samples used for learning, which serve as the scene pictures.

In this embodiment, preprocessing resizes every sample picture to the same size and applies data augmentation. By way of example, each sample picture is resized to 900×600 using bilinear interpolation, and the dataset used for training is expanded by applying augmentation methods such as rotation, cropping, and contrast enhancement to the collected sample pictures; the specific augmentation parameters are listed in Table 1. This step yields a scene picture dataset containing small targets. Specifically, in this embodiment, once a sample picture is separated into its three RGB channel components, a scene picture is represented by a 900×600×3 tensor.

Table 1. Data augmentation methods

The resulting dataset is annotated manually with the LabelImg software: insulators and insulator defects are labeled Insulator and defect respectively, and the annotations are saved in VOC format for training the detection network. A randomly selected 80% of the scene picture dataset serves as the training set for optimizing the network model, and the remaining 20% serves as the test set for evaluating the model.
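The bilinear resizing used in preprocessing can be illustrated on a single-channel image. This is a minimal pure-Python sketch under one stated assumption: output pixels are mapped to input coordinates so that the corner pixels coincide (a corner-aligned convention). The patent does not state which sampling convention is used, and image libraries may use a different one. The function name resize_bilinear is hypothetical.

```python
def resize_bilinear(img, out_h, out_w):
    """Resize a 2D list of numbers with bilinear interpolation.

    Assumption: corner-aligned sampling, i.e. the first and last output
    pixels land exactly on the first and last input pixels.
    """
    in_h, in_w = len(img), len(img[0])
    sy = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        y = i * sy
        y0 = int(y)                      # row above the sample point
        y1 = min(y0 + 1, in_h - 1)       # row below, clamped to the image
        fy = y - y0                      # vertical interpolation weight
        row = []
        for j in range(out_w):
            x = j * sx
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# Tiny illustrative input: a 2x2 image enlarged to 3x3.
tiny = [[0.0, 10.0],
        [20.0, 30.0]]
resized = resize_bilinear(tiny, 3, 3)
# resized == [[0.0, 5.0, 10.0], [10.0, 15.0, 20.0], [20.0, 25.0, 30.0]]
```

For an RGB picture the same interpolation would be applied to each of the three channels independently to reach the 900×600×3 tensor described above.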
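The random 80%/20% split can be sketched as follows. The file names, the fixed seed, and the function name split_dataset are assumptions introduced here for illustration; the patent specifies only the proportions and that the selection is random.

```python
import random


def split_dataset(image_ids, train_fraction=0.8, seed=0):
    """Shuffle the ids reproducibly and split them into (train, test)."""
    ids = list(image_ids)
    rng = random.Random(seed)            # fixed seed so the split is repeatable
    rng.shuffle(ids)
    n_train = int(round(len(ids) * train_fraction))
    return ids[:n_train], ids[n_train:]


# Hypothetical image identifiers standing in for the annotated scene pictures.
image_ids = [f"img_{i:04d}" for i in range(100)]
train_ids, test_ids = split_dataset(image_ids)
```

Splitting by image identifier, rather than by individual annotation, keeps all boxes of one picture in the same subset.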

Step 200: construct the improved Faster RCNN, composed of a first convolution module, a second convolution module, a third convolution module, a fourth convolution module, and a channel attention module (as shown in Figure 5). The more specific structure and function of the Faster RCNN used in the invention can be understood from the following description of its working principle. The neural network of this embodiment comprises four main parts: the first is the convolutional layers, the second is the adaptive anchor box network, the third is the RoI Pooling layer, and the fourth is the classification and regression layer. The details are as follows.

Part 1: convolutional layers (conv layers), used to extract the features of the picture. The input is the tensor (224*224*3) obtained after preprocessing, and the output is the extracted features, referred to as the first feature map F₁. In one specific implementation of this part, the convolutional structure of ResNet is used to extract features from the scene picture and obtain the first feature map F₁. Preferably, the backbone of the first convolution module of the first neural network adopts the convolutional structure of ResNet50; that is, the tensor (224*224*3) is input into the feature extraction layers of ResNet50. The ResNet50 network is composed of a number of residual blocks, with the structure shown in Table 2.

Table 2. ResNet50 network structure

The input image first passes through a 7x7 convolution with 64 output channels and a stride of 2, and is then downsampled by max pooling with a 3x3 kernel and a stride of 2. After a series of residual blocks, global average pooling and a 1000-dimensional fully connected layer are applied, and the output goes to a softmax classifier for classification. The backbone structure of ResNet50 is not modified and is not described further here. Feature extraction yields the first feature map F₁.
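The effect of the two stride-2 layers on spatial size can be checked with the usual output-size formula, floor((n + 2p - k) / s) + 1. The sketch below assumes paddings of 3 for the 7x7 convolution and 1 for the 3x3 max pooling; these are the values commonly used with ResNet, but the text above does not state them.

```python
def conv_output_size(n, kernel, stride, padding):
    """Spatial size after a convolution or pooling layer (floor convention)."""
    return (n + 2 * padding - kernel) // stride + 1


side = 224  # input side length from the (224*224*3) tensor
# 7x7 convolution, stride 2 (padding 3 is an assumption)
after_conv = conv_output_size(side, kernel=7, stride=2, padding=3)
# 3x3 max pooling, stride 2 (padding 1 is an assumption)
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)
# after_conv == 112, after_pool == 56
```

Under these assumptions the stem reduces the side length by a factor of four before the first residual block, which is one reason small targets retain so few feature-map cells.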

As a preferred option of this embodiment, the residual blocks of the ResNet50 network are improved to form the Deform ResNet50 backbone feature extraction network. The convolutional structure of the ResNet50 includes multiple Deform ResNet50 residual blocks, each comprising three convolutional layers (1x1 + 3x3 + 1x1). The first 1x1 convolution reduces the channel dimension so that the middle 3x3 convolution requires less computation, and the final 1x1 convolution restores the dimension, which maintains accuracy while reducing the amount of computation.

By way of example, in one specific implementation of this embodiment, the second convolutional layer (the 3x3 convolutional layer) of the Deform ResNet50 residual block is replaced by a depthwise separable convolutional layer; the construction of the Deform ResNet50 residual block is shown in Figure 1. A depthwise separable convolution consists of a depthwise convolution and a pointwise convolution. The depthwise convolution first produces C feature maps from the C input channels, one per channel, and the pointwise convolution then recombines these feature maps into new feature maps. The pointwise convolution kernel has size 1x1xM, where M is the number of channels of the previous layer. The principle of depthwise separable convolution is shown in Figure 2. For the same input, a depthwise separable convolution has far fewer parameters, which greatly reduces computational complexity.
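The parameter saving can be verified by counting weights. The sketch below ignores bias terms, and the channel counts (64 in, 64 out) are illustrative values rather than figures from the patent.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1x1 pointwise one."""
    depthwise = k * k * c_in          # one k x k filter per input channel
    pointwise = 1 * 1 * c_in * c_out  # 1x1 kernels that mix the channels
    return depthwise + pointwise


k, c_in, c_out = 3, 64, 64            # illustrative sizes
standard = standard_conv_params(k, c_in, c_out)         # 36864
separable = depthwise_separable_params(k, c_in, c_out)  # 576 + 4096 = 4672
ratio = separable / standard          # equals 1/c_out + 1/k**2, about 0.127
```

For a 3x3 kernel the ratio approaches 1/9 as the number of output channels grows, so the replacement removes close to 90% of the weights in that layer.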

Exemplarily, in another specific implementation of this embodiment, the first convolution module of the first neural network includes a channel attention mechanism module. Specifically, in this embodiment the channel attention mechanism module is added to the branch part of the residual block (the right-hand branch in Figure 1); that is, the module is invoked when the tensor (112*112*3) is input to the feature extraction layers of ResNet50. An image of size W×H×C is input to the detection network, where W, H and C are the width, height and number of channels of the image. Global average pooling is first applied to each of the C channels to reduce dimensionality, producing a 1×1×C vector. A fully connected (FC) layer then reduces the channel dimension of this vector, a ReLU layer applies the activation, a second FC layer restores the original dimension, and finally a Sigmoid activation yields the feature weights. The input is then multiplied by these feature weights:

F_scale(u_c, s_c) = u_c × s_c

where u_c is the input feature map and s_c is the weight obtained by the excitation operation.

Exemplarily, the entire channel attention mechanism module can be expressed by the following formula:

[formula not reproduced in the source text]

where δ is the activation function, and Q₁ and Q₂ are the weights of the two fully connected layers.
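The squeeze, excitation and scaling steps described above can be sketched in plain Python as follows. The formula image is not available, so the sketch assumes the usual squeeze-and-excitation form s = Sigmoid(Q₂ · δ(Q₁ · z)), with δ taken to be ReLU and z the globally average-pooled vector. The function name, the toy weights and the nested-list tensor layout are hypothetical.

```python
import math


def channel_attention(u, q1, q2):
    """Squeeze-and-excitation style channel attention (illustrative sketch).

    u  : feature map as a list of C channels, each an H x W nested list
    q1 : weights of the first FC layer, shape (C_reduced, C)
    q2 : weights of the second FC layer, shape (C, C_reduced)
    Returns the per-channel weights s and the rescaled feature map.
    """
    # Squeeze: global average pooling turns each channel into one scalar.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in u]
    # Excitation: FC (reduce) -> ReLU -> FC (restore) -> Sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in q1]
    s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
         for row in q2]
    # Scale: F_scale(u_c, s_c) = u_c * s_c, applied channel by channel.
    scaled = [[[x * s_c for x in row] for row in ch] for ch, s_c in zip(u, s)]
    return s, scaled


# Toy input: C = 2 channels of size 2 x 2, reduced to one hidden unit.
u = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
q1 = [[1.0, 1.0]]
q2 = [[0.0], [0.0]]
s, scaled = channel_attention(u, q1, q2)
```

With all-zero weights in the second layer, every channel receives the neutral weight 0.5; trained weights would instead emphasize informative channels and suppress the others.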

Part 2: adaptive anchor box network.

According to the factorization P(x, y, w, h | F) = P(x, y | F) · P(w, h | x, y, F), the probability of a prediction box differs from one center position (x, y) to another; this is denoted P(x, y | F). Once the center position (x, y) is fixed, prediction boxes of different sizes occur with different probabilities; this is denoted P(w, h | x, y, F). A set of anchor boxes is therefore determined by both position and shape (size), and the adaptive anchor box network of this embodiment accordingly has two branches: a position prediction module and a shape prediction module.

Position prediction module: the input is the first feature map F₁ obtained by the first convolution module of the first neural network. A predicted anchor box is described by four parameters (x, y, w, h), where (x, y) are the coordinates of its center point and (w, h) are its width and height. The first feature map F₁ is passed through a 1x1 convolution followed by a Sigmoid activation to generate a score feature map F_p of the same spatial size, in which each value is the score that an object is present at the corresponding pixel. Each score is then compared with a score threshold τ, set to 0.6 on the basis of experiments. If a score in F_p is greater than τ, the corresponding pixel is taken as the center point of a target and marked as the anchor box center position a(x, y).
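A minimal sketch of the thresholding step of the position prediction module, assuming the output of the 1x1 convolution is already available as a grid of raw values (the convolution itself is omitted). The function name and the sample values are hypothetical; only the Sigmoid activation and the threshold τ = 0.6 come from the description.

```python
import math


def predict_centers(logits, tau=0.6):
    """Return the (x, y) positions whose Sigmoid score exceeds tau.

    logits: H x W nested list standing in for the output of the 1x1 convolution.
    """
    centers = []
    for y, row in enumerate(logits):
        for x, value in enumerate(row):
            score = 1.0 / (1.0 + math.exp(-value))  # one entry of F_p
            if score > tau:
                centers.append((x, y))
    return centers


# Hypothetical raw values for a 2 x 3 feature map.
logits = [[-2.0, 0.0, 3.0],
          [0.3, 1.0, -1.0]]
centers = predict_centers(logits)
```

Only the two positions whose scores exceed 0.6 are kept as anchor box centers, so anchors are generated sparsely instead of at every pixel.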

Shape prediction module: once position prediction has determined the center position a(x, y) of the anchor box a(x, y, w, h), the shape a(w, h) of the anchor box is determined using the nearest ground-truth box B' corresponding to that center position. Because it is difficult to obtain a(w, h) directly by regression, w and h are approximated by sampling; in the present invention the ratio w:h takes three values: 0.5, 1.0 and 2.0. The maximum intersection over union, vIOU, between the anchor box a and the ground-truth boxes B gives the ground-truth box B'(x', y', w', h') that has the largest IoU with a, together with the corresponding a(w, h); the output of the shape prediction branch is thus a(w, h). Because the numerical range of the width and height is large and therefore unstable, the width and height of the prediction box are reparameterized as W = μ·S·e^dW and H = μ·S·e^dH, where S is the stride and μ is an empirical factor (typically 8). This reduces the range of the parameters to be learned from [1, 1000] to [-1, 1], which simplifies network training. The anchor box a(x, y, w, h) is finally obtained from the anchor box center position a(x, y) output by the position prediction branch and the anchor box size a(w, h) output by the shape prediction branch.
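The width and height reparameterization can be illustrated with a short sketch. It assumes that dW and dH are the raw outputs of the shape branch and that S is the feature-map stride; the stride of 16 used in the example is a hypothetical value, while μ = 8 follows the text.

```python
import math


def decode_shape(dw, dh, stride, mu=8.0):
    """Map shape-branch outputs (dW, dH) to a width and height:
    W = mu * S * e^dW, H = mu * S * e^dH."""
    return mu * stride * math.exp(dw), mu * stride * math.exp(dh)


# Hypothetical stride of 16; dW = dH = 0 gives the base size mu * S.
w, h = decode_shape(0.0, 0.0, stride=16)
```

With these assumed values the base box is 128 × 128, and outputs in [-1, 1] cover widths and heights from roughly 47 to 348, which shows how a small learned range spans a wide range of box sizes.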

As a further preferred implementation of this embodiment, the adaptive anchor box network includes an adaptive adjustment module configured to obtain an adaptive anchor box a'(x, y, w, h) from the anchor box a(x, y, w, h) and the first feature map F₁. The feature adaptation is performed as follows:

f_i' = D(f_i, W_i, H_i)

where f_i is the feature value on the input first feature map F₁ to which the i-th generated anchor box is mapped, and D(·) consists of a 3x3 deformable convolution (the offset field in Figure 3). f_i' is the adjusted feature value, obtained by processing the feature at the i-th position together with the width and height. The adaptive anchor box a'(x, y, w, h) is obtained from f_i'. The structure of the adaptive anchor box network is shown in Figure 3; its input is the first feature map F₁ and its output is the predicted anchor box a(x, y, w, h). In Figure 3, W×H×1 and W×H×2 both denote sizes of the first feature map F₁, where W and H are the width and height of the first feature map F₁ and 1 and 2 are numbers of channels. As shown in Figure 3, the first feature map F₁ first passes through position prediction to obtain the anchor box center position a(x, y), then through shape prediction to obtain the predicted anchor box a(x, y, w, h), and finally through feature adaptation to obtain the adaptive anchor box a'(x, y, w, h).

Part 3: ROI Pooling. This layer takes the first feature map F₁ and the predicted anchor box a(x, y, w, h) as input and combines them to obtain a second feature map F₂ of the same size as the first feature map F₁; the second feature map F₂ is a fixed-size feature map, which is then sent to the subsequent fully connected layers. Preferably, this layer takes the first feature map F₁ and the adaptive anchor box a'(x, y, w, h) as input and combines them to obtain the second feature map F₂ of the same size as the first feature map F₁.
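One common way to turn an arbitrary box into a fixed-size feature map is max pooling over a grid of bins, sketched below for a single channel. This is an illustrative assumption about how an ROI Pooling layer works, not the embodiment's exact implementation; the function name, the (x0, y0, x1, y1) box format with exclusive end coordinates and the binning rule are hypothetical.

```python
def roi_max_pool(feature, roi, out_h=7, out_w=7):
    """Max-pool the region roi = (x0, y0, x1, y1) of a single-channel H x W map
    into a fixed out_h x out_w grid. End coordinates are exclusive and the
    region is assumed to be non-empty and inside the map."""
    x0, y0, x1, y1 = roi
    rh, rw = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin i covers rows [ys, ye); every bin holds at least one row.
        ys = y0 + (i * rh) // out_h
        ye = max(ys + 1, y0 + ((i + 1) * rh + out_h - 1) // out_h)
        row = []
        for j in range(out_w):
            xs = x0 + (j * rw) // out_w
            xe = max(xs + 1, x0 + ((j + 1) * rw + out_w - 1) // out_w)
            row.append(max(feature[y][x]
                           for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Toy 4 x 4 map pooled to 2 x 2; the default 7 x 7 grid works the same way.
feature = [[4 * y + x for x in range(4)] for y in range(4)]
pooled = roi_max_pool(feature, (0, 0, 4, 4), out_h=2, out_w=2)
```

Whatever the size of the box, the output grid has the same fixed dimensions, which is what allows the result to be fed to fully connected layers.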

Part 4: classification and regression. The input of this layer is the second feature map F₂, and the output is the bounding box of the target and the defect category confidence. Softmax Loss (for the classification probability) and Smooth L1 Loss (for bounding box regression) are used to train classification and bounding box regression jointly, yielding the bounding box of the target and the defect category confidence (as shown in Figure 6).

Step 300, configure the loss function and train the convolutional neural network.

In this embodiment, the whole neural network is trained end to end. The loss function consists of the anchor localization loss L_loc, the anchor shape prediction loss L_shape, the target classification loss L_cls and the regression loss L_reg:

L_loss = λ₁L_loc + λ₂L_shape + L_cls + L_reg

where λ₁ = 1 and λ₂ = 0.1.

First, the localization loss function L_loc. This loss controls the number of anchor boxes so that more anchor boxes are placed at target centers and fewer at non-center coordinates, which prevents an imbalance between positive and negative samples among the generated anchor boxes. The positions that are determined are the center positions of the retained anchor boxes. That is, when a center point is a negative sample, i.e. an easily classified one, the prediction score y is close to 1, so the corresponding weight in the formula becomes smaller and fewer anchor boxes are placed at that center point. The number of generated anchor boxes is very large, and anchor boxes containing negative samples make up the majority. Therefore, to balance the positive and negative samples among the anchors, Focal Loss is used to train the position branch. The Focal Loss formula is as follows:

[formula not reproduced in the source text]

where y is the score with which the center of the anchor box position is predicted to be a sample, and y' is the pre-annotated label of that position center, set to 1 for positive samples and 0 for negative samples. α controls the weighting of positive and negative anchor boxes; in general α ∈ (0, 0.5), and here it is set to 0.25. γ is the focusing parameter, with γ ≥ 0 in general; when γ = 0 the loss reduces to the conventional cross-entropy loss, and here γ is set to 2. (1-y')^γ is the modulating factor, which controls the weighting of easy and hard samples.
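Since the formula itself is missing here, the sketch below uses the commonly published α-balanced form of Focal Loss for a binary label. This is an assumption and may differ in notation from the patent's own formula. In the code, y is the predicted score and label is the ground-truth value (1 for positive, 0 for negative); the defaults α = 0.25 and γ = 2 follow the text, and the function name is hypothetical.

```python
import math


def focal_loss(y, label, alpha=0.25, gamma=2.0, eps=1e-12):
    """Alpha-balanced focal loss for a single location.

    y     : predicted score in (0, 1)
    label : 1 for a positive sample, 0 for a negative sample
    """
    if label == 1:
        # Well-classified positives (y near 1) are down-weighted by (1 - y)^gamma.
        return -alpha * (1.0 - y) ** gamma * math.log(max(y, eps))
    # Well-classified negatives (y near 0) are down-weighted by y^gamma.
    return -(1.0 - alpha) * y ** gamma * math.log(max(1.0 - y, eps))


easy_positive = focal_loss(0.9, 1)
hard_positive = focal_loss(0.1, 1)
```

In this form an easy positive contributes far less loss than a hard one, so the large number of easy samples does not dominate training of the position branch.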

For position prediction, each box is first divided into three types of region: the center region, the ignore region and the negative-sample region. The three regions are defined by the position of the ground-truth box mapped onto the corresponding feature map, namely (x₀, y₀, w₀, h₀). The corresponding center region is expressed as

[expression not reproduced in the source text]

The ignore region is expressed as

[expression not reproduced in the source text]

All other regions are defined as negative-sample regions. Here, [symbol not reproduced in the source text] is used to adjust the number of generated anchor boxes and is generally set to [value not reproduced in the source text]. The center position of the anchor box is determined in this way.

Second, the shape prediction loss function L_shape.

Once position prediction has determined the center position (x, y) of the anchor box a(x, y, w, h), the shape a(w, h) of the anchor box is determined using the nearest ground-truth box B' corresponding to that center position. Because it is difficult to obtain a(w, h) directly by regression, w and h are approximated by sampling; in the present invention the ratio w:h takes three values: 0.5, 1.0 and 2.0. The maximum intersection over union, vIOU, between the anchor box a and the ground-truth boxes B gives the ground-truth box B'(x', y', w', h') that has the largest IoU with a, together with the corresponding a(w, h). The loss function for shape prediction is:

[formula not reproduced in the source text]

where L₁ is a smoothing function:

[formula not reproduced in the source text]

Third, the classification loss function L_cls.

The classification loss is the binary cross-entropy function:

[formula not reproduced in the source text]

where p_i is the probability that the anchor box (anchor) is predicted to be a target, and [symbol not reproduced in the source text] is the ground-truth probability of the background, defined as follows:

[definition not reproduced in the source text]

Fourth, the regression loss function L_reg.

The regression loss function L_reg is given by:

[formula not reproduced in the source text]

where t_i = {t_x, t_y, t_w, t_h} denotes the four parameters of the anchor box, namely its center position coordinates, width and height, and [symbol not reproduced in the source text] denotes the four coordinate parameters of the ground truth corresponding to a positive anchor. R is the smooth L1 function:

[formula not reproduced in the source text]
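The smooth L1 function is commonly defined as 0.5·x² for |x| < 1 and |x| − 0.5 otherwise. The sketch below assumes that definition, since the formula is not reproduced here, and sums it over the four box parameters of one positive anchor. The function names and the sample values are hypothetical.

```python
def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5


def regression_loss(t, t_star):
    """Sum of smooth L1 over the four box parameters (t_x, t_y, t_w, t_h)
    of one positive anchor; t_star holds the ground-truth parameters."""
    return sum(smooth_l1(a - b) for a, b in zip(t, t_star))


# Hypothetical predicted and ground-truth parameters.
loss = regression_loss((0.0, 0.0, 0.0, 0.0), (0.5, 0.0, 0.0, 2.0))
```

The quadratic part keeps gradients small for nearly correct boxes, while the linear part limits the influence of large errors.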

In this embodiment, the whole neural network is trained end to end. The first convolution module of the first neural network produces the first feature map F₁; the first convolution module of the second neural network first produces the prediction of the anchor box center position, then the prediction of the anchor box size, and finally the predicted anchor box, so that the scene picture is converted into the bounding box of the target and the defect category confidence.

Step 400, train the model.

During training (as shown in Figure 4), gradient descent is used for optimization in the backpropagation stage. The batch size is set to 16, the momentum to 0.9 and the learning rate to 0.004; the weight decay decays exponentially. The parameter num_classes is set to 3 (insulator, insulator defect and background), and a warmup training strategy is used. The number of epochs is set to 40000; the model is saved every 3000 epochs and after the final epoch, and the model with the lowest loss is selected for detection.
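The checkpointing schedule described above can be sketched as follows, assuming the 40000 "epochs" are counted from 1 and that one scalar loss is recorded for each saved model. The function names and the example loss values are hypothetical.

```python
def checkpoint_steps(total=40000, every=3000):
    """Steps at which a model is saved: every `every` steps plus the final step."""
    steps = list(range(every, total + 1, every))
    if not steps or steps[-1] != total:
        steps.append(total)
    return steps


def select_best(losses):
    """Return the step whose recorded loss is lowest; `losses` maps step -> loss."""
    return min(losses, key=losses.get)


steps = checkpoint_steps()
# Hypothetical loss values, for illustration only.
best = select_best({3000: 0.92, 39000: 0.31, 40000: 0.35})
```

Under these assumptions 14 models are saved (13 periodic checkpoints plus the final one), and the one with the lowest recorded loss is used for detection.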

Step 500, model application.

The training process above yields multiple models, from which the best one (the one with the smallest loss value) is selected for application. At this stage no data augmentation is needed: the image only has to be resized to 900*600 and normalized before being used as the model input. All parameters of the network model are fixed; the image data is simply input and propagated forward. The first feature map F₁, the predicted anchor box a(x, y, w, h) and the second feature map F₂ are obtained in turn, and the detection result is obtained directly from the complete model. When a large number of original transmission line images need to be tested, all images can be merged into a single data file; for example, when a data table is used to store the RGB values of each image, an lmdb file can be used so that all images can be read at once.

In this embodiment, the ResNet50 with the channel attention mechanism in the improved Faster RCNN architecture serves as the feature extraction network and extracts features from the scene picture; the adaptive anchor box network in the improved Faster RCNN architecture serves as the anchor prediction network and generates the predicted target anchors; finally, the ROI Pooling layer and the classification and regression layer of the improved Faster RCNN architecture output the category confidence and the category bounding box.

Step 500, model validation.

To verify the effectiveness of this embodiment, the Aut-Faster RCNN detection method was trained on the insulator dataset alongside Fast RCNN, Faster RCNN, SSD, YOLO v3 and YOLO v4, and their running speed and mAP were compared. The results are shown in Table 3.

Table 3. Comparison of six detection methods on the insulator dataset

According to the experimental results, the mean average precision (mAP) of the Aut-Faster RCNN algorithm of this embodiment is 93.67%, which is higher than that of the current mainstream detection networks it was compared with.

Embodiments of the present invention further include a computer-readable storage medium that stores program instructions implementing the method of the present invention and/or model parameters obtained by training with the method of the present invention.

Embodiments of the present invention further include a small target detection device based on the improved Faster RCNN, comprising a memory and a processor. The memory stores the model parameters obtained by training in the method embodiment of the present invention; the processor reads the program instructions that implement the algorithm structure of the improved Faster RCNN described herein and performs the small target detection described above according to the model parameters.

Example 2

Two scene pictures containing insulators were selected as detection objects and processed with the detection method provided in Example 1 of the present invention. The detection process is as follows.

Example 2.1

(1) A scene picture containing an insulator is resized to 900*600, and the resulting 900×600×3 tensor is input to the small target detection device based on the improved Faster RCNN of Example 1.

(2) The tensor (900*600*3) passes through the convolutional layers to give the first feature map F₁ ∈ R^(37×50×512).

(3) The first feature map F₁ ∈ R^(37×50×512) passes through the adaptive anchor box network to give the adaptive anchor box a'(-212, -419, 183, 359).

(4) The first feature map F₁ ∈ R^(37×50×512) and a'(-212, -419, 183, 359) pass through the ROI Pooling layer to give the second feature map F₂ ∈ R^(7×7×512).

(5) The second feature map F₂ ∈ R^(7×7×512) and a'(-212, -419, 183, 359) pass through the classification and regression layer to give the classification category and category confidence; the result is shown in Figure 7.

Example 2.2

(1) Another scene picture, containing a defective insulator, is resized to 900*600, and the resulting 900×600×3 tensor is input to the small target detection device based on the improved Faster RCNN of Example 1.

(3) The first feature map F₁ ∈ R^(37×50×512) passes through the adaptive anchor box network to give the adaptive anchor boxes a'(-415, -425, 286, 431) and a'(-490, -652, 83, 135).

(4) The first feature map F₁ ∈ R^(37×50×512) together with a'(-415, -425, 286, 431) and a'(-490, -652, 83, 135) passes through the ROI Pooling layer to give the second feature map F₂ ∈ R^(7×7×512).

(5) The second feature map F₂ ∈ R^(7×7×512) together with a'(-415, -425, 286, 431) and a'(-490, -652, 83, 135) passes through the classification and regression layer to give the classification category and category confidence; the result is shown in Figure 8.

It should be noted that the adaptive generation process does not produce a single anchor box; rather, many anchor boxes are generated in the corresponding ROI region, and a classification operation is finally performed on each anchor box to distinguish normal insulators from damaged ones.

Figure 7 shows the defect detection result for a normal insulator, and Figure 8 shows the defect detection result for a defective insulator. Figure 7 contains only one bounding box, that of the insulator target, with the insulator category confidence shown above it. Figure 8 contains two bounding boxes: one is that of the insulator target, with the insulator category confidence shown above it; the other is that of the defective insulator target, with the defective insulator category confidence shown above it.

It should also be noted that, in order to better distinguish the insulator target from the defective insulator target in this embodiment, Figures 5, 7 and 8 were submitted in the form of other supporting documents when this patent application was filed. Figure 5 corresponds to supporting material 1, Figure 7 to supporting material 2, and Figure 8 to supporting material 3 in those documents; in each case the two versions are identical except for color.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A small target detection method based on the improved Faster RCNN, implemented by a processor executing improved Faster RCNN algorithm instructions, characterized by comprising: receiving a scene picture containing a small target; extracting a first feature map F₁ of the scene picture using a first convolution module of the Faster RCNN; using a second convolution module of the Faster RCNN to obtain a predicted anchor box center position a(x, y) and a predicted anchor box size a(w, h) from the first feature map F₁, and to obtain a predicted anchor box a(x, y, w, h) from the predicted anchor box center position a(x, y) and the predicted anchor box size a(w, h); using a third convolution module of the Faster RCNN to obtain, from the first feature map F₁ and the predicted anchor box a(x, y, w, h), a second feature map F₂ of the same size as the first feature map F₁; and using a fourth convolution module of the Faster RCNN to obtain the detection result of the scene picture from the second feature map F₂ and the predicted anchor box a(x, y, w, h).

2. The small target detection method based on the improved Faster RCNN according to claim 1, characterized in that the backbone of the first convolution module adopts the convolutional structure of ResNet.

3. The small target detection method based on the improved Faster RCNN according to claim 2, characterized in that the backbone of the first convolution module adopts the convolutional structure of ResNet50.

4. The small target detection method based on the improved Faster RCNN according to claim 3, characterized in that the convolutional structure of the ResNet50 comprises multiple Deform ResNet50 residual block structures, and the second convolutional layer of the Deform ResNet50 residual block structure is replaced by a depthwise separable convolutional layer.

5. The small target detection method based on the improved Faster RCNN according to claim 1, characterized in that the first convolution module comprises a channel attention mechanism module configured to obtain a feature weight S_c from the scene picture; and the second convolution module is configured to obtain the predicted anchor box center position a(x, y) from the first feature map F₁ and the feature weight S_c, and to obtain the predicted anchor box a(x, y, w, h) from the predicted anchor box center position a(x, y).

6. The small target detection method based on the improved Faster RCNN according to any one of claims 1 to 5, characterized in that the backbone of the second convolution module adopts an adaptive anchor box network structure.

7. The small target detection method based on the improved Faster RCNN according to claim 6, characterized in that the adaptive anchor box network obtains a score feature map F_p from the first feature map F₁, obtains the predicted anchor box center position a(x, y) from the score feature map F_p, and obtains the predicted anchor box a(x, y, w, h) from the predicted anchor box center position a(x, y) and the predicted anchor box size a(w, h).

8. The small target detection method based on the improved Faster RCNN according to claim 7, characterized in that the adaptive anchor box network comprises an adaptive adjustment module configured to obtain an adaptive predicted anchor box a'(x, y, w, h) from the predicted anchor box a(x, y, w, h) and the first feature map F₁; and the third convolution module is configured to obtain, from the first feature map F₁ and the adaptive predicted anchor box a'(x, y, w, h), a second feature map F₂ of the same size as the first feature map F₁.

9. A small target detection device based on the improved Faster RCNN, characterized by comprising:

a processor; and

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the program instructions of the algorithm structure so as to implement the small target detection method according to any one of claims 1 to 8.

10. A computer storage medium for the small target detection method based on the improved Faster RCNN, characterized in that, when the instructions in the computer storage medium are executed by the processor of the small target detection device based on the improved Faster RCNN, the small target detection device based on the improved Faster RCNN is enabled to perform the small target detection method according to any one of claims 1 to 8.