
CN108537286A - Accurate recognition method for complex targets based on key region detection - Google Patents


Info

Publication number
CN108537286A
CN108537286A (application CN201810345899.9A)
Authority
CN
China
Prior art keywords
network
key area
sub
complex target
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810345899.9A
Other languages
Chinese (zh)
Other versions
CN108537286B (en)
Inventor
王田
李玮匡
李嘉锟
陶飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201810345899.9A priority Critical patent/CN108537286B/en
Publication of CN108537286A publication Critical patent/CN108537286A/en
Application granted granted Critical
Publication of CN108537286B publication Critical patent/CN108537286B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for accurately recognizing complex targets based on key-region detection, comprising: using cross-training to jointly train the whole neural network; extracting target features with a convolutional neural network; detecting the key regions of a complex target with a detection sub-network that uses anchor boxes as references; pooling the key regions into fixed-size feature maps with region standard pooling; classifying the key regions with a classification sub-network; and fusing the classification results of the individual key regions to recognize the target accurately. The whole network comprises a key-region detection sub-network and a key-region classification sub-network: the detection sub-network locates the discriminative key regions of the complex target, the classification sub-network classifies them, and the per-region classification results are fused to recognize the overall target. The two sub-networks share the features extracted by a VGG convolutional neural network, so complex targets are recognized both quickly and accurately.

Description

A method for accurate recognition of complex targets based on key-region detection

Technical Field

The present invention relates to image processing technology, and in particular to a method for accurately recognizing complex targets based on key-region detection.

Background Art

The classification and recognition of complex targets is an important and fundamental task in computer vision. Different types of complex targets are largely identical or similar in most of their parts, while their differences are often concentrated in a few local key regions; images of complex targets therefore contain a large amount of interfering and redundant information. Existing classification and recognition methods for complex targets cannot remove this interference and redundancy and consequently suffer from low accuracy. To achieve accurate classification and recognition of complex targets, it is of great significance to study a method for their precise recognition based on key-region detection.

Summary of the Invention

In view of this, the main object of the present invention is to provide a highly accurate method for recognizing complex targets based on key-region detection, one that greatly improves detection accuracy while keeping recognition fast.

To achieve the above object, the technical solution proposed by the present invention is a method for accurately recognizing complex targets based on key-region detection, implemented in the following steps:

Step 1: read the complex-target images in the training samples of the database, the coordinate labels of the key regions of the complex targets, and the complex-target classification labels, and use cross-training to jointly train the accurate complex-target recognition network.

Step 2: use the complex-target image to be recognized as the input of the accurate complex-target recognition network trained in Step 1, and extract features with the VGG convolutional neural network to obtain the feature map of the image to be recognized.

Step 3: input the feature map obtained in Step 2 into the key-region detection sub-network, slide a 3×3 sub-network over the feature map and, with anchor boxes as references, detect the key regions of the complex-target image, outputting the predicted box of each key region and the probabilities P_is and P_not that it is or is not a key region.

Step 4: filter the detected highly overlapping regions with non-maximum suppression: when the ratio of the intersection area to the union area of different predicted boxes exceeds the specified threshold IOU_threshold, keep only the predicted box with the largest key-region probability P_is and filter out the other boxes.

Step 5: set a threshold P_threshold on the key-region probability P_is, and map the regions whose key-region probability P_is exceeds P_threshold onto the feature map extracted by the VGG network.

Step 6: apply region standard pooling to the regions mapped onto the feature map in Step 5, pooling the detected regions of different sizes into fixed-size feature maps.

Step 7: use the fixed-size feature maps obtained in Step 6 as the input of the classification sub-network, classify them accurately with the classification sub-network, and normalize the classification results with the softmax function to obtain the classification probabilities of the key regions.

Step 8: for the complex target corresponding to a given image, fuse the classification probabilities of the individual key regions obtained in Step 7 by averaging them, obtaining the accurate recognition result of the complex-target class.

In Step 1, the whole network is cross-trained as follows:

Step 11: use as initial weights the weights of a VGG network trained for the classification task with ImageNet database images as training samples, and fine-tune on this basis.

Step 12: read the complex-target images and the coordinate labels of the corresponding key regions, and train the key-region detection sub-network; the training loss is loss = L_P + L_reg, where L_P is the cross-entropy between the key-region probabilities P_is, P_not output by the detection sub-network and the ground-truth labels, and L_reg is the sum of squared differences between the region coordinate offsets output by the detection sub-network and the actual key-region coordinate offsets in the labels.

Step 13: read the complex-target images and the corresponding classification labels, and train the classification sub-network; the training loss is the cross-entropy between the network's classification output and the actual label.

Step 14: repeat Steps 12 and 13 several times, cross-training the key-region detection sub-network and the classification sub-network until the network is stable.

In Step 3, key-region detection proceeds as follows:

Step 31: slide a window of size 3×3 over the feature map obtained in Step 2, obtaining a 512-dimensional vector at each position.

Step 32: at each sliding-window position, set 9 anchor boxes as references, with the three aspect ratios 1:2, 1:1, and 2:1 and the three areas 128², 256², and 512² pixels; the center of each anchor box is the center of the sliding window.

Step 33: pass the 512-dimensional vector obtained at each sliding-window position through a fully connected network to output nine 6-dimensional vectors. Relative to one reference anchor box, each vector encodes the offsets d_x, d_y, d_l, d_w of the detected region's center coordinates, length, and width, and the probabilities P_is, P_not of being a key region, where d_x = (x - x_a)/l_a, d_y = (y - y_a)/w_a, d_l = log(l/l_a), d_w = log(w/w_a); x, y, l, w denote the center coordinates, length, and width of the detected region; x_a, y_a, l_a, w_a denote the center coordinates, length, and width of the reference anchor box; and P_is, P_not are normalized with the softmax function.

Step 34: from the offsets d_x, d_y, d_l, d_w obtained by network regression and the anchor box's center coordinates, length, and width x_a, y_a, l_a, w_a, compute the actual center coordinates, length, and width x, y, l, w of the detected region.

In Step 6, region standard pooling proceeds as follows:

Step 61: denote the size of the region to be pooled as m×n and divide it into 7×7 cells of size approximately m/7 × n/7; when m/7 or n/7 is not an integer, round to the nearest integer.

Step 62: in each cell obtained in Step 61, use max pooling to pool the cell's features to 1×1; in this way, feature regions of different sizes are pooled into fixed-size 7×7 feature maps.

In summary, the method for accurately recognizing complex targets based on key-region detection according to the present invention comprises: using cross-training to jointly train the whole neural network; extracting target features with a convolutional neural network; detecting the key regions of a complex target with a detection sub-network that uses anchor boxes as references; pooling the key regions into fixed-size feature maps with region standard pooling; classifying the key regions with a classification sub-network; and fusing the classification results of the individual key regions to recognize the target accurately. The whole network comprises a key-region detection sub-network and a key-region classification sub-network: the detection sub-network locates the discriminative key regions of the complex target, the classification sub-network classifies them, and the per-region classification results are fused to recognize the overall target. The two sub-networks share the features extracted by the VGG convolutional neural network, so complex targets are recognized both quickly and accurately.

Compared with the prior art, the advantages of the present invention are:

(1) Accuracy: many different complex targets are similar in most respects, and their differences are often concentrated in local key regions. Traditional target recognition methods take the whole image as the input of the classification network, but the whole image contains a large amount of redundant and interfering information, which limits recognition accuracy. The present method first detects the key regions with the detection sub-network, then recognizes them with the classification sub-network, and fuses the recognition results of the individual key regions to recognize the target accurately.

(2) Speed: the present invention uses a deep neural network to extract features from the original image, and the detection sub-network and the classification sub-network share the features extracted by the same network. During training, the whole network is trained by cross-training. During testing, the two sub-networks share the same extracted features, which greatly reduces the network's parameter count and computation and enables fast target recognition.

Brief Description of the Drawings

Fig. 1 is a schematic flowchart of the implementation of the present invention.

Detailed Description of the Embodiments

To make the object, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

The method for accurately recognizing complex targets based on key-region detection according to the present invention comprises: using cross-training to jointly train the whole neural network; extracting target features with a convolutional neural network; detecting the key regions of a complex target with a detection sub-network that uses anchor boxes as references; pooling the key regions into fixed-size feature maps with region standard pooling; classifying the key regions with a classification sub-network; and fusing the classification results of the individual key regions to recognize the target accurately. The whole network comprises a key-region detection sub-network and a key-region classification sub-network: the detection sub-network locates the discriminative key regions of the complex target, the classification sub-network classifies them, and the per-region classification results are fused to recognize the overall target. The two sub-networks share the features extracted by the VGG convolutional neural network, so complex targets are recognized both quickly and accurately.

As shown in Fig. 1, the present invention is implemented in the following steps:

Step 1: read the complex-target images in the training samples of the database, the coordinate labels of the key regions corresponding to the complex-target images, and the classification labels corresponding to the complex-target images, and use cross-training to jointly train the accurate complex-target recognition network.

Step 2: use the complex-target image to be recognized as the input of the accurate complex-target recognition network trained in Step 1, and extract features with the VGG convolutional neural network to obtain the feature map of the image to be recognized.
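
By way of illustration, a minimal Python sketch of this shared feature extraction, assuming the 16-layer VGG variant from torchvision (the description says only "VGG", so the exact depth and the pretrained-weights API are assumptions):

```python
import torch
import torchvision

# Assumption: VGG16 pretrained on ImageNet stands in for "the VGG network".
vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1")
backbone = vgg.features               # convolutional layers only; both sub-networks share them

image = torch.randn(1, 3, 600, 800)   # a dummy complex-target image
feature_map = backbone(image)         # 512-channel feature map used by Steps 3-7
```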

Step 3: input the feature map obtained in Step 2 into the key-region detection sub-network, slide a 3×3 sub-network over the feature map and, with anchor boxes as references, detect the key regions of the complex-target image, outputting the predicted box of each key region and the probabilities P_is and P_not that it is or is not a key region.

Step 4: filter the detected highly overlapping regions with non-maximum suppression: when the ratio of the intersection area to the union area of different predicted boxes exceeds the specified threshold IOU_threshold, keep only the predicted box with the largest key-region probability P_is and filter out the other boxes.
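
A minimal sketch of this non-maximum suppression step in NumPy; the (x1, y1, x2, y2) corner layout is an assumed convention, since the description parameterizes boxes by center, length, and width:

```python
import numpy as np

def nms(boxes, p_is, iou_threshold):
    """Keep the box with the largest P_is among any group of boxes whose IoU
    exceeds the threshold. boxes: (N, 4) array of (x1, y1, x2, y2); p_is: (N,)."""
    order = np.argsort(p_is)[::-1]       # highest-probability boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]  # drop boxes that overlap too much
    return keep
```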

Step 5: set a threshold P_threshold on the key-region probability P_is, and map the regions whose key-region probability P_is exceeds P_threshold onto the feature map extracted by the VGG network.

Step 6: apply region standard pooling to the regions mapped onto the feature map in Step 5, pooling the detected regions of different sizes into fixed-size feature maps.

Step 7: use the fixed-size feature maps obtained in Step 6 as the input of the classification sub-network, classify them accurately with the classification sub-network, and normalize the classification results with the softmax function to obtain the classification probabilities of the key regions.

Step 8: for the complex target corresponding to a given image, fuse the classification probabilities of the individual key regions obtained in Step 7 by averaging them, obtaining the accurate recognition result of the complex-target class.
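
The fusion in Step 8 is a simple average of per-region class probabilities; a toy example with hypothetical numbers:

```python
import numpy as np

# region_probs: (K, C) softmax outputs for the K key regions kept after
# thresholding, one row per region, one column per target class.
region_probs = np.array([[0.7, 0.2, 0.1],
                         [0.6, 0.3, 0.1],
                         [0.5, 0.1, 0.4]])
fused = region_probs.mean(axis=0)        # average the per-region class probabilities
predicted_class = int(fused.argmax())    # final class of the complex target
```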

In Step 1, the whole network is cross-trained as follows:

Step 11: use as initial weights the weights of a VGG network trained for the classification task with ImageNet database images as training samples, and fine-tune on this basis.

Step 12: read the complex-target images and the coordinate labels of the corresponding key regions, and train the key-region detection sub-network; the training loss is loss = L_P + L_reg, where L_P is the cross-entropy between the key-region probabilities P_is, P_not output by the detection sub-network and the ground-truth labels, and L_reg is the sum of squared differences between the region coordinate offsets output by the detection sub-network and the actual key-region coordinate offsets in the labels.
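
A sketch of this Step-12 loss in PyTorch, under stated assumptions: equal weighting of the two terms, and regressing offsets only for anchors labeled as key regions (the text fixes neither):

```python
import torch
import torch.nn.functional as F

def detection_loss(p_logits, p_labels, pred_offsets, true_offsets):
    """loss = L_P + L_reg as described in Step 12.

    p_logits:     (N, 2) raw scores for (P_is, P_not) per anchor
    p_labels:     (N,)   long tensor, 1 if the anchor matches a key region, else 0
    pred_offsets: (N, 4) regressed (d_x, d_y, d_l, d_w)
    true_offsets: (N, 4) labeled key-region offsets w.r.t. each anchor
    """
    l_p = F.cross_entropy(p_logits, p_labels)        # classification cross-entropy term
    pos = p_labels == 1                              # assumption: regress positives only
    l_reg = ((pred_offsets[pos] - true_offsets[pos]) ** 2).sum()  # sum of squares
    return l_p + l_reg
```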

Step 13: read the complex-target images and the corresponding classification labels, and train the classification sub-network; the training loss is the cross-entropy between the network's classification output and the actual label.

Step 14: repeat Steps 12 and 13 several times, cross-training the key-region detection sub-network and the classification sub-network until the network is stable.
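
The alternating schedule of Steps 12 to 14 might look as follows; backbone, detection_net, classifier_net, pool_regions, the data loaders, the optimizers, and num_rounds are hypothetical names for this sketch, and detection_loss is the sketch above:

```python
import torch.nn.functional as F

for _ in range(num_rounds):                          # repeat "several times"
    # Step 12: train the key-region detection sub-network
    for images, p_labels, true_offsets in detection_loader:
        p_logits, pred_offsets = detection_net(backbone(images))
        loss = detection_loss(p_logits, p_labels, pred_offsets, true_offsets)
        det_optimizer.zero_grad(); loss.backward(); det_optimizer.step()
    # Step 13: train the classification sub-network
    for images, regions, class_labels in classification_loader:
        pooled = pool_regions(backbone(images), regions)   # region standard pooling
        logits = classifier_net(pooled)
        loss = F.cross_entropy(logits, class_labels)
        cls_optimizer.zero_grad(); loss.backward(); cls_optimizer.step()
```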

In Step 3, key-region detection proceeds as follows:

Step 31: slide a window of size 3×3 over the feature map obtained in Step 2, obtaining a 512-dimensional vector at each position.

Step 32: at each sliding-window position, set 9 anchor boxes as references, with the three aspect ratios 1:2, 1:1, and 2:1 and the three areas 128², 256², and 512² pixels; the center of each anchor box is the center of the sliding window.
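
A sketch of this anchor layout: at each window center, 3 aspect ratios times 3 areas give the 9 reference boxes:

```python
import itertools
import numpy as np

def anchors_at(cx, cy):
    """The 9 reference anchor boxes at one sliding-window position:
    3 aspect ratios (1:2, 1:1, 2:1) x 3 areas (128², 256², 512² pixels)."""
    boxes = []
    for area, (rl, rw) in itertools.product([128**2, 256**2, 512**2],
                                            [(1, 2), (1, 1), (2, 1)]):
        l = np.sqrt(area * rl / rw)   # chosen so that l * w == area and l:w == rl:rw
        w = np.sqrt(area * rw / rl)
        boxes.append((cx, cy, l, w))  # (center x, center y, length, width)
    return boxes
```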

Step 33: pass the 512-dimensional vector obtained at each sliding-window position through a fully connected network to output nine 6-dimensional vectors. Relative to one reference anchor box, each vector encodes the offsets d_x, d_y, d_l, d_w of the detected region's center coordinates, length, and width, and the probabilities P_is, P_not of being a key region, where d_x = (x - x_a)/l_a, d_y = (y - y_a)/w_a, d_l = log(l/l_a), d_w = log(w/w_a); x, y, l, w denote the center coordinates, length, and width of the detected region; x_a, y_a, l_a, w_a denote the center coordinates, length, and width of the reference anchor box; and P_is, P_not are normalized with the softmax function.

Step 34: from the offsets d_x, d_y, d_l, d_w obtained by network regression and the anchor box's center coordinates, length, and width x_a, y_a, l_a, w_a, compute the actual center coordinates, length, and width x, y, l, w of the detected region.
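
Inverting the Step-33 parameterization gives the decoding of Step 34 (assuming the natural logarithm in d_l and d_w, which the text leaves unspecified):

```python
import numpy as np

def decode(dx, dy, dl, dw, xa, ya, la, wa):
    """Recover the detected region from the regressed offsets:
    x = dx*la + xa, y = dy*wa + ya, l = la*exp(dl), w = wa*exp(dw)."""
    x = dx * la + xa
    y = dy * wa + ya
    l = la * np.exp(dl)
    w = wa * np.exp(dw)
    return x, y, l, w
```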

In Step 6, the region standard pooling process is as follows:

Step 61: denote the size of the region to be pooled as m×n and divide it into 7×7 cells of size approximately m/7 × n/7; when m/7 or n/7 is not an integer, round to the nearest integer.

Step 62: in each cell obtained in Step 61, use max pooling to pool the cell's features to 1×1; in this way, feature regions of different sizes are pooled into fixed-size 7×7 feature maps.
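
A NumPy sketch of region standard pooling (Steps 61 and 62), assuming the mapped region spans at least 7×7 positions of the feature map:

```python
import numpy as np

def region_standard_pooling(region_feats, grid=7):
    """Pool an m x n x C feature region into a fixed grid x grid x C map:
    divide it into 7x7 cells with rounded boundaries and max-pool each cell."""
    m, n, c = region_feats.shape
    # Cell boundaries, rounded to the nearest integer as in Step 61
    rows = np.rint(np.linspace(0, m, grid + 1)).astype(int)
    cols = np.rint(np.linspace(0, n, grid + 1)).astype(int)
    out = np.zeros((grid, grid, c), dtype=region_feats.dtype)
    for i in range(grid):
        for j in range(grid):
            r0, r1 = rows[i], max(rows[i + 1], rows[i] + 1)  # keep each cell non-empty
            c0, c1 = cols[j], max(cols[j + 1], cols[j] + 1)
            out[i, j] = region_feats[r0:r1, c0:c1].max(axis=(0, 1))  # Step 62: max pool
    return out
```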

In summary, the above are only preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (4)

1. A method for accurately recognizing complex targets based on key-region detection, characterized by comprising the following steps:
Step 1: reading the complex-target images in the training samples of a database, the coordinate labels of the key regions corresponding to the complex-target images, and the classification labels corresponding to the complex-target images, and using cross-training to jointly train the accurate complex-target recognition network;
Step 2: using the complex-target image to be recognized as the input of the accurate complex-target recognition network trained in Step 1, and extracting features with a VGG convolutional neural network to obtain the feature map of the complex-target image to be recognized;
Step 3: inputting the feature map obtained in Step 2 into a key-region detection sub-network, sliding a 3×3 sub-network over the feature map and, with anchor boxes as references, detecting the key regions of the complex-target image, outputting the predicted box of each key region and the probabilities P_is, P_not that it is or is not a key region;
Step 4: filtering the detected highly overlapping regions with non-maximum suppression: when the ratio of the intersection area to the union area of different predicted boxes exceeds the specified threshold IOU_threshold, keeping only the predicted box with the largest key-region probability P_is and filtering out the other boxes;
Step 5: setting a threshold P_threshold on the key-region probability P_is, and mapping the regions whose key-region probability P_is exceeds the set threshold P_threshold onto the feature map extracted by the VGG network;
Step 6: applying region standard pooling to the regions mapped onto the feature map in Step 5, pooling the detected regions of different sizes into fixed-size feature maps;
Step 7: using the fixed-size feature maps obtained in Step 6 as the input of a classification sub-network, classifying them accurately with the classification sub-network, and normalizing the classification results with the softmax function to obtain the classification probabilities of the key regions;
Step 8: for the complex target corresponding to a given image, fusing the classification probabilities of the individual key regions obtained in Step 7 by averaging them, obtaining the accurate recognition result of the complex-target class.
2. The method for accurately recognizing complex targets based on key-region detection according to claim 1, characterized in that in Step 1 the cross-training process is as follows:
Step 11: using as initial weights the weights of a VGG network trained for the classification task with ImageNet database images as training samples, and fine-tuning on this basis;
Step 12: reading the complex-target images and the coordinate labels of the corresponding key regions, and training the key-region detection sub-network, the training loss being loss = L_P + L_reg, where L_P is the cross-entropy between the key-region probabilities P_is, P_not output by the key-region detection sub-network and the ground-truth labels, and L_reg is the sum of squared differences between the detected-region coordinate offsets output by the key-region detection sub-network and the actual key-region coordinate offsets in the labels;
Step 13: reading the complex-target images and the corresponding classification labels, and training the classification sub-network, the training loss being the cross-entropy between the network's classification output and the actual label;
Step 14: repeating Steps 12 and 13 several times, cross-training the key-region detection sub-network and the classification sub-network until the network is stable.
3. The method for accurately recognizing complex targets based on key-region detection according to claim 1, characterized in that Step 3 specifically comprises:
Step 31: sliding a window of size 3×3 over the feature map obtained in Step 2, obtaining a 512-dimensional vector at each position;
Step 32: setting 9 anchor boxes as references at each sliding-window position, with the three aspect ratios 1:2, 1:1, and 2:1 and the three areas 128², 256², and 512² pixels, the center of each anchor box being the center of the sliding window;
Step 33: passing the 512-dimensional vector obtained at each sliding-window position through a fully connected network to output nine 6-dimensional vectors, each vector encoding, relative to one reference anchor box, the offsets d_x, d_y, d_l, d_w of the detected region's center coordinates, length, and width and the key-region probabilities P_is, P_not, where d_x = (x - x_a)/l_a, d_y = (y - y_a)/w_a, d_l = log(l/l_a), d_w = log(w/w_a); x, y, l, w denote the center coordinates, length, and width of the detected region; x_a, y_a, l_a, w_a denote the center coordinates, length, and width of the reference anchor box; and P_is, P_not are normalized with the softmax function;
Step 34: computing the actual center coordinates, length, and width x, y, l, w of the detected region from the offsets d_x, d_y, d_l, d_w obtained by network regression and the anchor box's center coordinates, length, and width x_a, y_a, l_a, w_a.
4. The method for accurately recognizing complex targets based on key-region detection according to claim 1, characterized in that in Step 6 the region standard pooling process is as follows:
Step 61: denoting the size of the region to be pooled as m×n and dividing it into 7×7 cells of size approximately m/7 × n/7; when m/7 or n/7 is not an integer, rounding to the nearest integer;
Step 62: in each cell divided in Step 61, max-pooling the cell's features to 1×1, so that feature regions of different sizes are pooled into fixed-size 7×7 feature maps.
CN201810345899.9A 2018-04-18 2018-04-18 An Accurate Recognition Method of Complex Targets Based on Key Area Detection Active CN108537286B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810345899.9A CN108537286B (en) 2018-04-18 2018-04-18 An Accurate Recognition Method of Complex Targets Based on Key Area Detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810345899.9A CN108537286B (en) 2018-04-18 2018-04-18 An Accurate Recognition Method of Complex Targets Based on Key Area Detection

Publications (2)

Publication Number Publication Date
CN108537286A true CN108537286A (en) 2018-09-14
CN108537286B CN108537286B (en) 2020-11-24

Family

ID=63481345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810345899.9A Active CN108537286B (en) 2018-04-18 2018-04-18 An Accurate Recognition Method of Complex Targets Based on Key Area Detection

Country Status (1)

Country Link
CN (1) CN108537286B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410601A (en) * 2018-12-04 2019-03-01 北京英泰智科技股份有限公司 Method for controlling traffic signal lights, device, electronic equipment and storage medium
CN109829398A (en) * 2019-01-16 2019-05-31 北京航空航天大学 A kind of object detection method in video based on Three dimensional convolution network
CN110852285A (en) * 2019-11-14 2020-02-28 腾讯科技(深圳)有限公司 Object detection method and device, computer equipment and storage medium
WO2020057145A1 (en) * 2018-09-21 2020-03-26 Boe Technology Group Co., Ltd. Method and device for generating painting display sequence, and computer storage medium
CN110929678A (en) * 2019-12-04 2020-03-27 山东省计算中心(国家超级计算济南中心) Method for detecting candida vulva vagina spores
CN110955380A (en) * 2018-09-21 2020-04-03 中科寒武纪科技股份有限公司 Access data generation method, storage medium, computer device and apparatus
CN111612797A (en) * 2020-03-03 2020-09-01 江苏大学 A rice image information processing system
CN111931877A (en) * 2020-10-12 2020-11-13 腾讯科技(深圳)有限公司 Target detection method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 A kind of model recognizing method based on quick R CNN deep neural network
CN106599939A (en) * 2016-12-30 2017-04-26 深圳市唯特视科技有限公司 Real-time target detection method based on region convolutional neural network
CN107169421A (en) * 2017-04-20 2017-09-15 华南理工大学 A kind of car steering scene objects detection method based on depth convolutional neural networks
CN107368845A (en) * 2017-06-15 2017-11-21 华南理工大学 A kind of Faster R CNN object detection methods based on optimization candidate region
CN107798335A (en) * 2017-08-28 2018-03-13 浙江工业大学 A kind of automobile logo identification method for merging sliding window and Faster R CNN convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 A kind of model recognizing method based on quick R CNN deep neural network
CN106599939A (en) * 2016-12-30 2017-04-26 深圳市唯特视科技有限公司 Real-time target detection method based on region convolutional neural network
CN107169421A (en) * 2017-04-20 2017-09-15 华南理工大学 A kind of car steering scene objects detection method based on depth convolutional neural networks
CN107368845A (en) * 2017-06-15 2017-11-21 华南理工大学 A kind of Faster R CNN object detection methods based on optimization candidate region
CN107798335A (en) * 2017-08-28 2018-03-13 浙江工业大学 A kind of automobile logo identification method for merging sliding window and Faster R CNN convolutional neural networks

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BO ZHAO等: ""A survey on deep learning-based fine-grained object classification and semantic segmentation"", 《INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING》 *
SHAOQING REN等: ""Faster r-cnn: Towards real-time object detection with region proposal networks"", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
XIANGTENG HE等: ""Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN"", 《MM’17 PROCEEDINGS OF THE 25TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA》 *
YINGFENG CAI等: ""Scene-Adaptive Vehicle Detection Algorithm Based on a Composite Deep Structure"", 《IEEE ACCESS》 *
WU FAN: ""Research on fine-grained vehicle model recognition based on deep learning"", 《HTTP://WWW.DOC88.COM/P-7708621280922.HTML》 *
LI XINYE et al.: ""Fine-grained bird recognition based on semantic detection with convolutional neural networks"", 《SCIENCE TECHNOLOGY AND ENGINEERING》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110955380A (en) * 2018-09-21 2020-04-03 中科寒武纪科技股份有限公司 Access data generation method, storage medium, computer device and apparatus
WO2020057145A1 (en) * 2018-09-21 2020-03-26 Boe Technology Group Co., Ltd. Method and device for generating painting display sequence, and computer storage medium
CN109410601A (en) * 2018-12-04 2019-03-01 北京英泰智科技股份有限公司 Method for controlling traffic signal lights, device, electronic equipment and storage medium
CN109829398A (en) * 2019-01-16 2019-05-31 北京航空航天大学 A kind of object detection method in video based on Three dimensional convolution network
CN109829398B (en) * 2019-01-16 2020-03-31 北京航空航天大学 A method for object detection in video based on 3D convolutional network
CN110852285A (en) * 2019-11-14 2020-02-28 腾讯科技(深圳)有限公司 Object detection method and device, computer equipment and storage medium
CN110852285B (en) * 2019-11-14 2023-04-18 腾讯科技(深圳)有限公司 Object detection method and device, computer equipment and storage medium
CN110929678A (en) * 2019-12-04 2020-03-27 山东省计算中心(国家超级计算济南中心) Method for detecting candida vulva vagina spores
CN110929678B (en) * 2019-12-04 2023-04-25 山东省计算中心(国家超级计算济南中心) Method for detecting vulvovaginal candida spores
CN111612797A (en) * 2020-03-03 2020-09-01 江苏大学 A rice image information processing system
CN111612797B (en) * 2020-03-03 2021-05-25 江苏大学 Rice image information processing system
CN111931877A (en) * 2020-10-12 2020-11-13 腾讯科技(深圳)有限公司 Target detection method, device, equipment and storage medium
CN111931877B (en) * 2020-10-12 2021-01-05 腾讯科技(深圳)有限公司 Target detection method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN108537286B (en) 2020-11-24

Similar Documents

Publication Publication Date Title
CN108537286A (en) A kind of accurate recognition methods of complex target based on key area detection
WO2017190574A1 (en) Fast pedestrian detection method based on aggregation channel features
Sirmacek et al. Urban-area and building detection using SIFT keypoints and graph theory
CN107506763B (en) An accurate positioning method of multi-scale license plate based on convolutional neural network
İlsever et al. Two-dimensional change detection methods: remote sensing applications
CN110210475B (en) A non-binarization and edge detection method for license plate character image segmentation
CN109061600B (en) A Target Recognition Method Based on Millimeter Wave Radar Data
Yan et al. Detection and classification of pole-like road objects from mobile LiDAR data in motorway environment
CN111274926B (en) Image data screening method, device, computer equipment and storage medium
CN107480620B (en) Remote sensing image automatic target identification method based on heterogeneous feature fusion
CN104182985B (en) Remote sensing image change detection method
Zhang et al. Road recognition from remote sensing imagery using incremental learning
CN108629286B (en) Remote sensing airport target detection method based on subjective perception significance model
CN108492298B (en) Multispectral image change detection method based on generation countermeasure network
CN109784392A (en) A kind of high spectrum image semisupervised classification method based on comprehensive confidence
KR102757154B1 (en) Clothes defect detection algorithm using CNN image processing
CN105976376B (en) A target detection method for high-resolution SAR images based on component model
CN111898627B (en) A PCA-based SVM Cloud Particle Optimal Classification and Recognition Method
Yuan et al. Learning to count buildings in diverse aerial scenes
CN112819753B (en) Building change detection method and device, intelligent terminal and storage medium
CN113158954B (en) Automatic detection method for zebra crossing region based on AI technology in traffic offsite
CN106056139A (en) Forest fire smoke/fog detection method based on image segmentation
KR101742115B1 (en) An inlier selection and redundant removal method for building recognition of multi-view images
CN109934216A (en) Image processing method, apparatus, and computer-readable storage medium
Indrabayu et al. Blob modification in counting vehicles using gaussian mixture models under heavy traffic

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant