CN108537286A

CN108537286A - A kind of accurate recognition methods of complex target based on key area detection

Info

Publication number: CN108537286A
Application number: CN201810345899.9A
Authority: CN
Inventors: 王田; 李玮匡; 李嘉锟; 陶飞
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2018-04-18
Filing date: 2018-04-18
Publication date: 2018-09-14
Anticipated expiration: 2038-04-18
Also published as: CN108537286B

Abstract

The present invention relates to a method for accurately identifying complex targets based on key region detection, including: using cross-training methods to perform fusion training on the entire neural network, using convolutional neural networks to extract target features, and using detection sub-networks with anchor boxes as references Detect the key areas of complex targets, use regional standard pooling to pool the key areas into fixed-size feature maps, use the classification sub-network to classify the key areas, and fuse the classification results of each key area to achieve accurate recognition of the target. The entire network includes a key area detection sub-network and a key area classification sub-network. The detection sub-network detects key areas with a degree of discrimination for complex targets, and then the classification sub-network classifies the key areas. target identification. These two sub-networks share the features extracted by the VGG convolutional neural network, so that the recognition of complex targets can be achieved quickly and accurately.

Description

A method for accurate recognition of complex targets based on key region detection

技术领域technical field

本发明涉及图像处理技术，特别是涉及一种基于关键区域检测的复杂目标精准识别方法。The invention relates to image processing technology, in particular to a method for accurately recognizing complex targets based on key region detection.

背景技术Background technique

复杂目标的分类与识别，是计算机视觉领域一项重要而基础的任务。不同种类的复杂目标，其大部分部位往往是相同或者相似的，而其差异往往体现在局部的一些关键区域，因此复杂目标的图像存在着大量干扰和冗余信息。而现有的一些针对复杂目标的分类识别方法，因无法去除复杂目标图像中的干扰与冗余信息，存在着精确度低的问题。为了实现对复杂目标的精准分类识别，研究一种基于关键区域检测的复杂目标精准识别方法具有重要意义。The classification and recognition of complex objects is an important and basic task in the field of computer vision. Most of the parts of different types of complex targets are the same or similar, but the differences are often reflected in some key local areas. Therefore, there are a lot of interference and redundant information in the images of complex targets. However, some existing classification and recognition methods for complex targets have the problem of low accuracy because they cannot remove the interference and redundant information in complex target images. In order to achieve accurate classification and recognition of complex targets, it is of great significance to study a method for precise recognition of complex targets based on key region detection.

发明内容Contents of the invention

有鉴于此，本发明的主要目的在于提供一种识别精准度高的基关键区域检测的复杂目标精准识别方法，在大大提高检测精度的同时，保证了识别的快速性。In view of this, the main purpose of the present invention is to provide an accurate recognition method for complex targets based on key area detection with high recognition accuracy, which ensures rapid recognition while greatly improving detection accuracy.

为了达到上述目的，本发明提出的技术方案为：一种基于关键区域检测的复杂目标精准识别方法，实现步骤如下：In order to achieve the above purpose, the technical solution proposed by the present invention is: a method for accurately identifying complex targets based on key area detection, and the implementation steps are as follows:

步骤1，读取数据库中训练样本中复杂目标图片，复杂目标关键区域的坐标标签，以及复杂目标分类标签，使用交叉训练的方法对复杂目标精准识别网络进行融合训练。Step 1. Read the images of complex targets in the training samples in the database, the coordinate labels of the key areas of complex targets, and the classification labels of complex targets, and use the cross-training method to perform fusion training on the complex target accurate recognition network.

步骤2，将待识别的复杂目标图片作为步骤1训练之后的复杂目标精准识别网络的输入，通过VGG卷积神经网络提取特征，得到待识别的复杂目标图片的特征图。In step 2, the complex target picture to be recognized is used as the input of the complex target accurate recognition network trained in step 1, and features are extracted through the VGG convolutional neural network to obtain the feature map of the complex target picture to be recognized.

步骤3，将步骤2得到的特征图输入到关键区域检测子网络中，以3×3大小的子网络在特征图上进行滑动，以锚方框作为参考，检测复杂目标图片的关键区域，给出关键区域的预测方框和是否是关键区域的可能性P_is,P_not；Step 3, input the feature map obtained in step 2 into the key region detection subnetwork, slide the feature map with a 3×3 subnetwork, and use the anchor box as a reference to detect the key region of the complex target image, and give The prediction box of the key area and the possibility of whether it is a key area P _is , P _not ;

步骤4，采用非最大抑制对检测到的重叠度较高的区域进行过滤，当不同预测方框交集部分面积与并集部分面积的比例超过规定的阈值IOU_threshold时，则仅保留是关键区域可能性P_is最大的预测方框，而对其他的方框进行过滤；Step 4: Use non-maximum suppression to filter the detected areas with a high degree of overlap. When the ratio of the area of the intersection of different prediction boxes to the area of the union exceeds the specified threshold IOU_threshold, only the possibility of the key area is retained P _is the largest prediction box, while filtering the other boxes;

步骤5，设定是关键区域可能性P_is的阈值P_threshold，将是关键区域可能性P_is大于设定阈值P_threshold的区域映射到VGG网络提取的特征图上；Step 5, setting the threshold P_threshold of the possibility P _is of the key area, and mapping the area where the possibility P is of the key area _is greater than the set threshold P_threshold to the feature map extracted by the VGG network;

步骤6，将步骤5得到的映射到特征图上的区域进行区域标准池化，把检测出的不同大小的区域池化为固定大小的特征图；Step 6, perform regional standard pooling on the region mapped to the feature map obtained in step 5, and pool the detected regions of different sizes into a fixed-size feature map;

步骤7、将步骤6得到的固定大小的特征图作为分类子网络的输入，使用分类子网络对其作精准的分类，使用softmax函数对分类结果归一化，得到对关键区域分类的概率；Step 7. Use the fixed-size feature map obtained in step 6 as the input of the classification sub-network, use the classification sub-network to classify it accurately, use the softmax function to normalize the classification results, and obtain the probability of classifying the key regions;

步骤8，针对同一张图片对应的一个复杂目标，对步骤7得到的各个关键区域的分类的对应概率取均值进行融合，得到复杂目标种类的精准识别结果。In step 8, for a complex target corresponding to the same picture, the mean value of the corresponding probabilities of the classifications of each key area obtained in step 7 is fused to obtain an accurate recognition result of the complex target type.

所述步骤1中，整个网络交叉训练的过程如下：In the step 1, the whole network cross-training process is as follows:

步骤11，使用以ImageNet数据库图片为训练样本、针对分类任务训练的VGG网络的权值作为初始权值，在此基础上进行微调；Step 11, using the ImageNet database picture as the training sample and the weight of the VGG network trained for the classification task as the initial weight, and fine-tuning on this basis;

步骤12，读取复杂目标图片和复杂目标图片对应的关键区域的坐标标签，对关键区域检测子网络进行训练，训练的损失函数为loss＝L_P+L_reg，其中L_P为关键区域检测子网络输出的是否是关键区域的概率P_is,P_not与标签真实值的交叉熵，L_reg为关键区域检测子网络输出的检测区域坐标偏移量与标签中实际关键区域坐标偏移量的平方和；Step 12, read the complex target picture and the coordinate label of the key area corresponding to the complex target picture, and train the key area detection sub-network, the training loss function is loss=L _P +L _reg , where L _P is the key area detector Whether the network output is the probability of the key area P _is , P _not and the cross entropy of the true value of the label, L _reg is the square of the coordinate offset of the detection area output by the key area detection sub-network and the actual key area coordinate offset in the label and;

步骤13，读取复杂目标图片和复杂目标图片对应的分类标签，对分类子网络进行训练，训练的损失函数为网络输出分类结果与实际标签结果之间的交叉熵；Step 13, read the complex target picture and the classification label corresponding to the complex target picture, and train the classification sub-network, the training loss function is the cross entropy between the network output classification result and the actual label result;

步骤14重复步骤12和步骤13若干次，对关键区域检测子网络和分类子网络进行交叉训练，直到网络稳定。Step 14 Repeat step 12 and step 13 several times to perform cross-training on the key region detection sub-network and classification sub-network until the network is stable.

所述步骤3中，关键区域检测的的方法如下：In the step 3, the method of key region detection is as follows:

步骤31，使用一个大小为3×3的滑动窗口，在步骤2得到的特征图上进行滑动，在每个位置得到一个512维的向量；Step 31, using a sliding window with a size of 3×3, sliding on the feature map obtained in step 2, and obtaining a 512-dimensional vector at each position;

步骤32，在每一个滑动窗口的位置设定9个锚方框作为参考，锚方框长宽比按1:2、1:1、2:1设定为三种比例，面积大小设定为128²、256²、512²像素三种大小，锚方框的中心点为所在滑动窗口的中心；Step 32, set 9 anchor boxes at the position of each sliding window as a reference, the aspect ratio of the anchor box is set to three ratios according to 1:2, 1:1, and 2:1, and the area size is set to 128 ² , 256 ² , 512 ² pixels in three sizes, the center point of the anchor box is the center of the sliding window;

步骤33，将上述每个滑动窗口位置得到的512维向量通过全连接网络输出9个6维的向量；每个向量表示相对于一个参考锚方框，检测区域的中心点坐标、长和宽的偏移量d_x,d_y,d_l,d_w和是否是关键区域可能性P_is,P_not，其中：d_x＝(x-x_a)/l_a，d_y＝(y-y_a)/w_a， d_l＝log(l/l_a)，d_w＝log(w/w_a)，x,y,l,w表示检测出的区域中心点坐标、长和宽， x_a,y_a,l_a,w_a表示参考锚区域中心点坐标、长和宽，P_is,P_not使用softmax函数进行归一化处理；Step 33, output the 512-dimensional vector obtained by each sliding window position above through the fully connected network to output nine 6-dimensional vectors; each vector represents the center point coordinates, length and width of the detection area relative to a reference anchor box Whether the offset d _x , d _y , d _l , d _w is the possibility of the key region P _is , P _not , where: d _x =(xx _a )/l _a , d _y =(yy _a )/w _a , d _l =log(l/l _a ), d _w =log(w/w _a ), x,y,l,w represent the coordinates, length and width of the center point of the detected area, x _a ,y _a ,l _a , w _a represent the coordinates, length and width of the center point of the reference anchor area, P _is , P _not use the softmax function for normalization;

步骤34根据网络回归得到的偏移量d_x,d_y,d_l,d_w，与锚方框的中心点坐标、长和宽x_a,y_a,l_a,w_a，计算出检测区域实际的中心点坐标、长和宽x,y,l,w。Step 34 Calculate the detection area according to the offsets d _x , d _y , d _l , d _w obtained from the network regression, and the center point coordinates, length and width x _a , y _a , l _a , w _a of the anchor box The actual center point coordinates, length and width x, y, l, w.

所述步骤6中，区域标准池化的方法如下：In the step 6, the method of regional standard pooling is as follows:

步骤61，把待池化的区域大小表示为m×n，将待池化的区域划分成7×7个，大小约为m/7×n/7的小格子，当m/7或n/7无法取整时，则按照四舍五入近似取整；Step 61, express the size of the area to be pooled as m×n, divide the area to be pooled into 7×7 small grids with a size of about m/7×n/7, when m/7 or n/ 7 If it cannot be rounded to an integer, it will be rounded to an approximate integer;

步骤62，在步骤61划分的每一个小格子中，使用最大池化的方法，将小格子的中的特征池化为1×1维的，这样，将不同大小的特征区域池化为7×7维固定大小的特征图。Step 62, in each small grid divided in step 61, use the method of maximum pooling to pool the features in the small grid into 1×1 dimension, so that the feature regions of different sizes are pooled into 7× 7-dimensional fixed-size feature maps.

综上所述，本发明所述的一种基于关键区域检测的复杂目标精准识别方法，包括：使用交叉训练的方法对整个神经网络进行融合训练，使用卷积神经网络提取目标特征，使用检测子网络以锚方框作为参考检测复杂目标的关键区域，使用区域标准池化将关键区域池化为固大小的特征图，使用分类子网络对关键区域进行分类，融合各个关键区域的分类结果从而达到对目标的精准识别。整个网络包括了关键区域检测子网络和关键区域分类子网络，由检测子网络检测出复杂目标具有区分度的关键区域，再由分类子网络对关键区域进行分类，融合各区域的分类结果对整体目标进行识别。这两个子网络共享了VGG卷积神经网络提取的特征,从而使复杂目标的识别达了到快速与精准的效果。To sum up, the method for accurate identification of complex targets based on key area detection described in the present invention includes: using cross-training methods to perform fusion training on the entire neural network, using convolutional neural networks to extract target features, and using detectors The network uses the anchor box as a reference to detect the key areas of complex targets, uses regional standard pooling to pool the key areas into feature maps of fixed size, uses the classification sub-network to classify the key areas, and fuses the classification results of each key area to achieve Accurate identification of the target. The entire network includes a key area detection sub-network and a key area classification sub-network. The detection sub-network detects key areas with a degree of discrimination for complex targets, and then the classification sub-network classifies the key areas. target identification. These two sub-networks share the features extracted by the VGG convolutional neural network, so that the recognition of complex targets can be achieved quickly and accurately.

本发明与现有技术相比的优点在于：The advantage of the present invention compared with prior art is:

(1)精准性：很多不同的复杂目标往往在大部分地方相似，而其不同之处往往体现在局部的关键区域。传统的目标识别方法将整张图片作为分类网络的输入，而整张图片含有大量的冗余信息和干扰信息，这限制了目标识别的精度。本方法使用检测子网络先检测出关键区域，再使用分类子网络对关键区域进行识别，融合各部分关键区域识别结果，达到目标精准识别的效果。(1) Accuracy: Many different complex targets are often similar in most places, but their differences are often reflected in local key areas. The traditional object recognition method takes the whole picture as the input of the classification network, and the whole picture contains a lot of redundant information and interference information, which limits the accuracy of target recognition. In this method, the detection sub-network is used to first detect the key area, and then the classification sub-network is used to identify the key area, and the identification results of each part of the key area are fused to achieve the effect of accurate target identification.

(2)快速性：本发明采用深度神经网络来提取原始图像的特征，检测子网络和分类子网络子网络共享同一个神经网络提取的特征。在训练过程中，采用交叉训练的方法对整个网络进行训练。在测试过程中，检测子网络和分类子网络共享同一个神经网络提取的特征，从而大大减少了网络的参数量和计算量，可以达到快速的目标识别效果。(2) Rapidity: The present invention uses a deep neural network to extract the features of the original image, and the detection sub-network and the classification sub-network share the features extracted by the same neural network. During the training process, the whole network is trained by cross-training method. During the test, the detection sub-network and the classification sub-network share the features extracted by the same neural network, which greatly reduces the amount of network parameters and calculations, and can achieve fast target recognition results.

附图说明Description of drawings

图1为本发明的实现流程示意图。Fig. 1 is a schematic diagram of the implementation flow of the present invention.

具体实施方式Detailed ways

为使本发明的目的、技术方案和优点更加清楚，下面将结合附图及具体实施例对本发明作进一步地详细描述。In order to make the purpose, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

本发明所述的一种基于关键区域检测的复杂目标精准识别方法，包括：使用交叉训练的方法对整个神经网络进行融合训练，使用卷积神经网络提取目标特征，使用检测子网络以锚方框作为参考检测复杂目标的关键区域，使用区域标准池化将关键区域池化为固大小的特征图，使用分类子网络对关键区域进行分类，融合各个关键区域的分类结果从而达到对目标的精准识别。整个网络包括了关键区域检测子网络和关键区域分类子网络，由检测子网络检测出复杂目标具有区分度的关键区域，再由分类子网络对关键区域进行分类，融合各区域的分类结果对整体目标进行识别。这两个子网络共享了VGG卷积神经网络提取的特征,从而使复杂目标的识别达了到快速与精准的效果。A method for accurately identifying complex targets based on key region detection according to the present invention includes: using cross-training methods to perform fusion training on the entire neural network, using convolutional neural networks to extract target features, and using detection sub-networks to anchor boxes As a reference to detect key areas of complex targets, use regional standard pooling to pool key areas into fixed-size feature maps, use classification subnetworks to classify key areas, and fuse classification results of each key area to achieve accurate identification of targets . The entire network includes a key area detection sub-network and a key area classification sub-network. The detection sub-network detects key areas with a degree of discrimination for complex targets, and then the classification sub-network classifies the key areas. target identification. These two sub-networks share the features extracted by the VGG convolutional neural network, so that the recognition of complex targets can be achieved quickly and accurately.

如图1所示，本发明具体实现如下步骤：As shown in Figure 1, the present invention specifically realizes the following steps:

步骤1，读取数据库中训练样本中的复杂目标图片、复杂目标图片对应的关键区域的坐标标签、以及复杂目标图片对应的分类标签，使用交叉训练的方法对复杂目标精准识别网络进行融合训练；Step 1, read the complex target picture in the training sample in the database, the coordinate label of the key area corresponding to the complex target picture, and the classification label corresponding to the complex target picture, and use the cross-training method to perform fusion training on the complex target accurate recognition network;

步骤2，将待识别的复杂目标图片作为步骤1训练之后的复杂目标精准识别网络的输入，通过VGG卷积神经网络提取特征，得到待识别的复杂目标图片的特征图；Step 2, using the complex target picture to be recognized as the input of the complex target accurate recognition network after step 1 training, extracting features through the VGG convolutional neural network, and obtaining the feature map of the complex target picture to be recognized;

所述步骤6中，区域标准池化的过程如下：In the step 6, the process of regional standard pooling is as follows:

综上所述，以上仅为本发明的较佳实施例而已，并非用于限定本发明的保护范围。凡在本发明的精神和原则之内，所作的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。To sum up, the above are only preferred embodiments of the present invention, and are not intended to limit the protection scope of the present invention. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims

1. a kind of accurate recognition methods of complex target based on key area detection, which is characterized in that include the following steps：

Step 1, the seat of the complex target picture in reading database in training sample, the corresponding key area of complex target picture Label and the corresponding tag along sort of complex target picture are marked, net is precisely identified to complex target using the method for cross-training Network carries out Fusion training；

Step 2, the defeated of network is precisely identified using complex target picture to be identified as the complex target of step 1 after training Enter, extracts feature by VGG convolutional neural networks, obtain the characteristic pattern of complex target picture to be identified；

Step 3, the characteristic pattern that step 2 obtains is input in key area detection sub-network network, is existed with the sub-network of 3 × 3 sizes It is slided on characteristic pattern, as reference with anchor box, the key area of detection of complex Target Photo provides the pre- of key area Survey box and whether be key area possibility P_is,P_not；

Step 4, the higher region of the degree of overlapping detected is filtered using non-maximum suppression, when different prediction box intersections When the ratio of area and union part area is more than defined threshold value IOU_threshold, then it is key area only to retain Possibility P_isMaximum prediction box, and other boxes are filtered；

Step 5, setting is key area possibility P_isThreshold value P_threshold, will be key area possibility P_isMore than setting Determine on the characteristic pattern that the area maps of threshold value P_threshold are extracted to VGG networks；

Step 6, the region being mapped on characteristic pattern that step 5 obtains is subjected to regional standard pond, the difference detected is big Small pool area turns to the characteristic pattern of fixed size；

Step 7, the fixed size for obtaining step 6 characteristic pattern as classification sub-network input, use classification sub-network pair It is accurately classified, and is normalized using softmax function pair classification results, obtains the probability classified to key area；

Step 8, for the corresponding complex target of same pictures, the classification for each key area that step 7 is obtained Corresponding probability takes mean value to be merged, and obtains the accurate recognition result of complex target type.

2. a kind of accurate recognition methods of complex target based on key area detection according to claim 1, feature exist In：In the step 1, cross-training process is as follows：

Step 11, the weights of the VGG networks using ImageNet databases picture as training sample, for classification task training are used As initial weight, it is finely adjusted on this basis；

Step 12, the coordinate label for reading complex target picture and the corresponding key area of complex target picture, to key area Detection sub-network network is trained, and trained loss function is loss=L_P+L_reg, wherein L_PIt is exported for key area detection sub-network network Whether be key area probability P_is,P_notWith the cross entropy of label actual value, L_regIt is exported for key area detection sub-network network Detection zone coordinate shift amount and label in practical key area coordinate shift amount quadratic sum；

Step 13, complex target picture and the corresponding tag along sort of complex target picture are read, classification sub-network is trained, Cross entropy of the trained loss function between network output category result and physical tags result；

Step 14 repeats step 12 and step 13 several times, and intersection instruction is carried out to key area detection sub-network network and classification sub-network Practice, until network stabilization.

3. a kind of accurate recognition methods of complex target based on key area detection according to claim 1, feature exist In：The step 3 specifically includes：

Step 31, the sliding window for a use of size being 3 × 3, is slided on the characteristic pattern that step 2 obtains, each Position obtains the vector of one 512 dimension；

Step 32,9 anchor boxes are set as reference in the position of each sliding window, anchor box length-width ratio presses 1:2、1:1、 2:1 is set as three kinds of ratios, and size is set as 128²、256²、512²The central point of three kinds of sizes of pixel, anchor box is institute At the center of sliding window；

Step 33,512 dimensional vectors above-mentioned each sliding window position obtained by fully-connected network export 96 dimension to Amount；Each vector indicates to refer to anchor box, center point coordinate, length and the wide offset d of detection zone relative to one_x,d_y, d_l,d_wWhether it is key area possibility P_is,P_not, wherein：d_x=(x-x_a)/l_a, d_y=(y-y_a)/w_a, d_l=log (l/ l_a), d_w=log (w/w_a), x, y, l, w indicate the regional center point coordinates, length and the width that detect, x_a,y_a,l_a,w_aIndicate reference Anchoring area domain center point coordinate, length and width, P_is,P_notIt is normalized using softmax functions；

The offset d that step 34 is obtained according to net regression_x,d_y,d_l,d_w, center point coordinate, length and width x with anchor box_a,y_a, l_a,w_a, calculate the actual center point coordinate of detection zone, length and width x, y, l, w.

4. a kind of accurate recognition methods of complex target based on key area detection according to claim 1, feature exist In：In the step 6, the process in regional standard pond is as follows：

Step 61, waiting for that the area size in pond is expressed as m × n, the region division in pond will be waited for into 7 × 7, size is about m/ The sub-box of 7 × n/7, when m/7 or n/7 can not rounding when, then according to the approximate rounding that rounds up；

Step 62, in each sub-box that step 61 divides, using the method in maximum pond, by the feature in sub-box Pond turns to 1 × 1 dimension, in this way, different size of characteristic area pond to be turned to the characteristic pattern of 7 × 7 dimension fixed sizes.