
CN108564097B - Multi-scale target detection method based on deep convolutional neural network - Google Patents


Info

Publication number
CN108564097B
CN108564097B
Authority
CN
China
Prior art keywords
network
layer
model
classification
output
Prior art date
Legal status
Active
Application number
CN201711267789.7A
Other languages
Chinese (zh)
Other versions
CN108564097A (en)
Inventor
徐雪妙
肖永杰
胡枭玮
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201711267789.7A priority Critical patent/CN108564097B/en
Publication of CN108564097A publication Critical patent/CN108564097A/en
Application granted granted Critical
Publication of CN108564097B publication Critical patent/CN108564097B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-scale target detection method based on a deep convolutional neural network, comprising the following steps: 1) data acquisition; 2) data processing; 3) model construction; 4) loss function definition; 5) model training; 6) model validation. The method combines the ability of a deep convolutional neural network to extract high-level semantic information from images, the ability of a region proposal network to generate candidate regions, the information-completion ability of a content-aware region-of-interest pooling layer, and the accurate classification ability of a multi-task classification network to accomplish multi-scale target detection more accurately and efficiently.

Description

A multi-scale target detection method based on a deep convolutional neural network

Technical Field

The invention relates to the technical field of computer image processing, and in particular to a multi-scale target detection method based on a deep convolutional neural network.

Background

Object detection and recognition is one of the important topics in computer vision. As science and technology have developed, object detection has been put to ever wider use, being applied in scenarios such as battlefield surveillance, security inspection, traffic control, and video monitoring.

In recent years, with the rapid development of deep learning, deep convolutional neural networks have brought further breakthroughs in object detection and recognition. A deep convolutional neural network can extract high-level semantic features from an image, and those features can then be used to detect targets. The deeper the network, the more representative the features it expresses; the problem, however, is that small-scale objects are represented very coarsely, and some of their features may even be lost. Moreover, neural networks are very sensitive to object scale: the features extracted from objects of different sizes differ greatly, which leads to low detection accuracy for small-scale objects and greatly reduces the robustness and effectiveness of target detection.

Summary of the Invention

The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by proposing a multi-scale target detection method based on a deep convolutional neural network. The method detects targets of both large and small scales well, overcoming the limitation of previous methods, which could not reliably detect same-class targets whose scales differ greatly.

To achieve the above purpose, the technical solution provided by the present invention is a multi-scale target detection method based on a deep convolutional neural network, comprising the following steps:

1) Data acquisition

Training a deep convolutional neural network requires a large amount of training data, so large-scale natural or video image data must be used. If the obtained image data carry no labels, they are annotated manually and then divided into a training dataset and a validation dataset;

2) Data processing

The images and label data of the dataset are converted by preprocessing into the format required for training the deep convolutional neural network;

3) Model construction

According to the training target and the input/output form of the model, a deep convolutional neural network suited to the multi-scale target detection problem is constructed;

4) Loss function definition

According to the training target and the architecture of the model, the required loss function is defined;

5) Model training

The parameters of each network layer are initialized; training samples are input iteratively; the loss value of the network is computed from the loss function; the gradients of the parameters of each layer are then computed by backpropagation; and the parameters of each layer are updated by stochastic gradient descent;

6) Model validation

The trained model is validated on the validation dataset to test its generalization performance.

Step 2) comprises the following steps (a preprocessing sketch follows the list):

2.1) Scale the images of the dataset to m×n pixels; the label data are scaled to the corresponding size in the same proportion;

2.2) From the scaled image, randomly crop a region containing labels to obtain a rectangular image of a×b pixels, with a<=m and b<=n;

2.3) Randomly flip the cropped image horizontally with probability 0.5;

2.4) Convert the randomly flipped image from [0, 255] to the range [-1, 1].
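The following is a minimal sketch of steps 2.1)-2.4) in Python (NumPy/OpenCV assumed); the function and parameter names, the choice of anchoring the crop on the first label box, and the handling of edge cases are illustrative assumptions, not the patent's reference code.

```python
import random
import numpy as np
import cv2

def preprocess(image, boxes, m=768, n=1344, a=768, b=768):
    """image: HxWx3 uint8 array; boxes: list of [x, y, w, h] label boxes."""
    # 2.1) scale the image to m x n and rescale the label boxes proportionally
    h0, w0 = image.shape[:2]
    sy, sx = m / h0, n / w0
    image = cv2.resize(image, (n, m))  # cv2.resize takes (width, height)
    boxes = [[x * sx, y * sy, w * sx, h * sy] for x, y, w, h in boxes]

    # 2.2) randomly crop an a x b region that still contains a label box
    # (anchored on the first box; degenerate geometry is ignored in this sketch)
    bx, by, bw, bh = boxes[0]
    lo_x, hi_x = max(0, int(bx + bw) - b), min(int(bx), n - b)
    lo_y, hi_y = max(0, int(by + bh) - a), min(int(by), m - a)
    x0 = random.randint(lo_x, max(lo_x, hi_x))
    y0 = random.randint(lo_y, max(lo_y, hi_y))
    image = image[y0:y0 + a, x0:x0 + b]
    boxes = [[x - x0, y - y0, w, h] for x, y, w, h in boxes]

    # 2.3) horizontal flip with probability 0.5
    if random.random() < 0.5:
        image = image[:, ::-1]
        boxes = [[b - x - w, y, w, h] for x, y, w, h in boxes]

    # 2.4) map pixel values from [0, 255] to [-1, 1]
    image = image.astype(np.float32) / 127.5 - 1.0
    return image, boxes
```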

Step 3) comprises the following steps:

3.1) Construct the feature extraction network model

The feature extraction network acts as an encoder: it extracts high-level semantic information from the input image and stores it in a low-dimensional code. Its input is the image processed in step 2). Because small objects lose part of their information in the deeper codes, the network outputs feature codes at both a low and a lower dimension in order to preserve more information. To realize the transformation from the input to this series of outputs, the feature extraction network contains multiple cascaded downsampling layers, each composed of a convolutional layer, a batch normalization layer, a non-linear activation function layer, and a pooling layer in series. The convolutional layers use stride 1 and 3×3 kernels to extract the corresponding feature maps; the batch normalization layer stabilizes and accelerates training by normalizing the mean and standard deviation of the input samples within a batch; the non-linear activation function layer prevents the model from degenerating into a simple linear model and improves its descriptive power; and the pooling layer shrinks the feature map, which enlarges the receptive field of the convolution kernels;
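As a hedged illustration, one such cascaded downsampling unit could be written as follows in PyTorch (the channel counts are assumptions; the patent does not fix them here):

```python
import torch.nn as nn

def down_block(in_ch, out_ch):
    """One downsampling layer: conv -> batch norm -> non-linearity -> pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),  # 3x3 conv, stride 1
        nn.BatchNorm2d(out_ch),                 # stabilizes and accelerates training
        nn.ReLU(inplace=True),                  # prevents collapse to a linear model
        nn.MaxPool2d(kernel_size=2, stride=2),  # halves the map, enlarging the receptive field
    )
```

Stacking several such blocks maps, for example, a 3×768×768 input down to low-dimensional codes such as 512×48×48, matching the embodiment below.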

3.2) Construct the region proposal network model

The region proposal network is responsible for finding all objects in the input image and their positions. It takes a feature map as input, maps every point of that feature map back to the original image to obtain the point's coordinates, places a set of preset candidate boxes of different sizes and aspect ratios around each point, and computes the probability score that each box contains an object. The input of the region proposal network is the output of the feature extraction network of step 3.1); its outputs are the coordinates of a series of candidate boxes and the probability score that each candidate box is an object;

To realize the series of transformations from input to output, the region proposal network model comprises three functional structures connected in series, built from convolutional layers, batch normalization layers, and non-linear activation function layers. The first structure performs 3×3 feature fusion, fusing surrounding information, and its output feeds the second and third structures; the second structure outputs the coordinate information of the rectangular boxes, and the third outputs the probability score that the corresponding rectangular box is an object;
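A minimal PyTorch sketch of this three-structure head follows; k = 9 anchors per location is an assumption chosen so that the output channels match the 36-/18-channel matrices of the embodiment below (9 anchors × 4 coordinates and 9 anchors × 2 object/background scores):

```python
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, in_ch=512, k=9):
        super().__init__()
        # first functional structure: 3x3 fusion of surrounding information
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        # second structure: coordinate information of the rectangular boxes
        self.bbox = nn.Conv2d(in_ch, 4 * k, kernel_size=1)
        # third structure: probability score that each box is an object
        self.score = nn.Conv2d(in_ch, 2 * k, kernel_size=1)

    def forward(self, feat):
        fused = self.fuse(feat)
        return self.bbox(fused), self.score(fused)
```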

3.3) Construct the content-aware region-of-interest pooling layer

The content-aware region-of-interest pooling layer maps a target region of the original image onto the low-dimensional code obtained in step 3.1) and then pools it to a fixed size. Its content awareness manifests itself in the following two aspects:

3.3.1) Information completion

Information completion fills in the information that small targets lose during low-dimensional encoding, making their detection more accurate. For the feature map obtained by mapping a target region of the original image onto the low-dimensional code of step 3.1): if one of its length and width is greater than z (the value of z depends on the network requirements) and the other is smaller than z, the map is enlarged by deconvolution to a square with side length max(length, width) before pooling; if both its length and width are smaller than z, both are enlarged to twice their original size by deconvolution before pooling; if both are greater than z, the subsequent pooling is performed directly;
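The three-way rule can be sketched as follows (PyTorch assumed; bilinear upsampling stands in for the patent's learned deconvolution, and the values of z and the output size are placeholders):

```python
import torch.nn.functional as F

def complete_and_pool(feat, z=7, out_size=7):
    """feat: 1xCxHxW feature map cropped for one region of interest."""
    h, w = feat.shape[-2:]
    if h > z and w > z:
        pass  # both sides large enough: pool directly
    elif h < z and w < z:
        # both sides small: enlarge length and width to twice the original
        feat = F.interpolate(feat, scale_factor=2, mode='bilinear', align_corners=False)
    else:
        # one side above z, the other below: enlarge to a square of side max(h, w)
        s = max(h, w)
        feat = F.interpolate(feat, size=(s, s), mode='bilinear', align_corners=False)
    return F.adaptive_max_pool2d(feat, out_size)  # pool to the fixed output size
```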

3.3.2) Size division

The target regions output by step 3.2) are divided by size according to the mean area of all label boxes in the prepared training dataset: if the area of a rectangular box output by step 3.2) is smaller than this mean, it is marked as a small-target output; if greater than or equal to the mean, it is marked as a large-target output;
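As a toy illustration of this routing rule (the names are assumptions, not the patent's code), assuming `mean_area` has been computed beforehand over all label boxes of the training set:

```python
def route_by_size(box, mean_area):
    """Mark a proposal for the small- or large-target branch by its area."""
    x, y, w, h = box
    return 'small' if w * h < mean_area else 'large'
```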

3.4) Construct the multi-task classification network

The multi-task classification network recognizes large-scale and small-scale targets separately, preventing classification errors caused by the differing low-dimensional codes of large and small targets. The two size classes of rectangular boxes obtained in step 3.3) are fed into two separate classification networks. Each classification network outputs class scores for the classification task and refined box positions for the regression task. To accomplish both tasks, the network contains fully connected layers, non-linear activation function layers, and dropout layers: the fully connected layers map the learned "distributed feature representation" onto the sample label space; the non-linear activation function layers prevent the model from degenerating into a simple linear model and improve its descriptive power; and the dropout layers deactivate neurons with probability 0.5, making training converge faster and preventing overfitting;

Finally, the outputs of the large- and small-target classification networks are fused as the final output (a sketch of one branch follows);
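A minimal sketch of one such branch in PyTorch; the 512×7×7 input and the four classes follow the embodiment below, while the hidden width of 4096 is an assumption:

```python
import torch.nn as nn

class ClsBranch(nn.Module):
    def __init__(self, in_dim=512 * 7 * 7, n_classes=4):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, 4096),   # maps features onto the sample label space
            nn.ReLU(inplace=True),     # keeps the model non-linear
            nn.Dropout(p=0.5),         # deactivates neurons with probability 0.5
        )
        self.cls = nn.Linear(4096, n_classes)  # class scores (classification task)
        self.reg = nn.Linear(4096, 4)          # refined box x, y, w, h (regression task)

    def forward(self, x):
        x = self.trunk(x.flatten(1))
        return self.cls(x), self.reg(x)
```

Two such branches are instantiated, one per size class, and their detections are merged into the final output.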

Step 4) comprises the following steps:

4.1) Define the loss function of the region proposal network

The region proposal network obtains, from the low-dimensional code, the coordinates of the regions of interest in the input image and the score of whether each region is foreground, i.e., a regression task and a classification task. The loss function is defined so that the output boxes come as close as possible to the positions of the ground-truth reference boxes; hence the loss function of the regression task can be defined as the smoothed L1 (Manhattan distance) loss, SmoothL1Loss, given by:

$$L_{reg}(v, t) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L1}(v_i - t_i)$$

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where L_reg is the regression loss, v and t denote the position of the predicted box and of its corresponding ground-truth reference box respectively, x and y denote the coordinates of the top-left corner, and w and h denote the width and height of the rectangular box;

The loss function of the classification task is defined as the softmax loss, SoftmaxLoss, given by:

$$x'_i \leftarrow x'_i - \max(x'_1, \ldots, x'_n)$$

$$p_i = \frac{e^{x'_i}}{\sum_{j=1}^{n} e^{x'_j}}$$

$$L_{cls} = -\log p_i$$

where x' is the output of the network, n is the total number of classes, p is the probability of each class, and L_cls is the classification loss;
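Both losses transcribe directly into code; the following hedged PyTorch sketch assumes `v` and `t` are (N, 4) tensors of predicted and reference boxes and `logits` is an (N, n) tensor of class scores:

```python
import torch
import torch.nn.functional as F

def smooth_l1_loss(v, t):
    """SmoothL1Loss over the four box coordinates, averaged over boxes."""
    d = (v - t).abs()
    per_coord = torch.where(d < 1, 0.5 * d ** 2, d - 0.5)
    return per_coord.sum(dim=1).mean()

def softmax_loss(logits, labels):
    """SoftmaxLoss; cross_entropy applies the max-subtraction trick internally."""
    return F.cross_entropy(logits, labels)
```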

4.2) Define the loss function of the classification network

The classification network outputs class scores for the classification task and refined box positions for the regression task. The loss function is defined so that the output classes agree with the label data as closely as possible and the output box positions come as close as possible to the ground-truth reference boxes. As in step 4.1), the loss function of the regression task can be defined as SmoothL1Loss and that of the classification task as SoftmaxLoss;

4.3) Define the total loss function

The two region proposal network loss functions and the two classification network loss functions defined in steps 4.1) and 4.2) can be combined by weighting, enabling the network to accomplish the task of multi-scale target detection in images;

Step 5) comprises the following steps:

5.1) Initialize the parameters of each layer of the model

The parameters of each layer are initialized with the methods used in conventional deep convolutional neural networks: the convolutional layer parameters of the feature extraction network take as initial values the convolutional layer parameters of a VGG16 network model pretrained on ImageNet; the convolutional layers of the region proposal network and the fully connected layers of the classification network are initialized from a Gaussian distribution with mean 0 and standard deviation 0.02; and the parameters of all batch normalization layers are initialized from a Gaussian distribution with mean 1 and standard deviation 0.02;
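A sketch of this initialization scheme, assuming PyTorch, a torchvision VGG16 as the pretrained source, and `model.feature_extractor` as a stand-in attribute name whose layers line up with VGG16's convolutional stack:

```python
import torch.nn as nn
from torchvision.models import vgg16

def init_model(model):
    # random initialization first: N(0, 0.02) for conv/fc weights,
    # N(1, 0.02) for batch-norm scale parameters
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.normal_(m.weight, mean=0.0, std=0.02)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.normal_(m.weight, mean=1.0, std=0.02)
            nn.init.zeros_(m.bias)
    # then overwrite the feature extractor with ImageNet-pretrained VGG16 convolutions
    backbone = vgg16(pretrained=True).features
    model.feature_extractor.load_state_dict(backbone.state_dict(), strict=False)
```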

5.2) Train the network model

An original image processed by step 2) is input at random; the feature extraction network of step 3.1) produces the corresponding low-dimensional coded features; the region proposal network of step 3.2) generates a batch of candidate box regions, whose loss value is computed by step 4.1); these regions then pass through the content-aware region-of-interest pooling layer of step 3.3) to obtain another, fixed-size low-dimensional coded feature, after which the classification network of step 3.4) produces the target classes and refined box positions, whose loss value is computed by step 4.2). Finally, the two loss values are processed by step 4.3) to obtain the final loss value. Backpropagating this value yields the gradients of the parameters of every layer of the network model of step 3), and optimizing the layer parameters with these gradients by the stochastic gradient descent algorithm completes one round of training of the network model;

5.3) Repeat step 5.2) until the network's multi-scale target detection ability reaches the expected goal.
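One training round condenses to the usual forward/backward/update cycle; in this sketch `model`, `train_loader`, `num_epochs`, and `compute_total_loss` (an assumed wrapper that evaluates the four component losses and combines them as in step 4.3)) are assumed names standing in for the components sketched above:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
for epoch in range(num_epochs):
    for images, targets in train_loader:
        rpn_out, cls_out = model(images)                      # steps 3.1)-3.4): forward pass
        loss = compute_total_loss(rpn_out, cls_out, targets)  # steps 4.1)-4.3): final loss value
        optimizer.zero_grad()
        loss.backward()                                       # backpropagate the gradients
        optimizer.step()                                      # stochastic gradient descent update
```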

Step 6) is carried out as follows:

Some original images are taken at random from the validation dataset, processed by step 2), and input into the network model trained in step 5); the network model detects the positions of the targets in the images and predicts their classes, and its outputs are compared with the corresponding label data to judge the multi-scale target detection ability of the trained network model.

Compared with the prior art, the present invention has the following advantages and beneficial effects:

1. A new network layer is proposed: the content-aware region-of-interest pooling layer (CAROIPooling, Content-Aware ROIPooling layer). It maps a region of the original image onto the low-dimensional coded region and pools it to a fixed size, and in particular performs information completion on the low-dimensional coded feature maps of small-scale objects, yielding more accurate and more complete low-dimensional coded feature maps. The layer is equally applicable in other target detection networks.

2. A multi-branch target detection network is proposed, in which different branches handle the large-scale and small-scale detection tasks respectively, so that large-scale and small-scale objects are distinguished and detected more accurately, breaking through the limitations of existing methods.

Brief Description of the Drawings

Fig. 1 is the flow chart of the method of the present invention.

Fig. 2 is a schematic diagram of the feature extraction network.

Fig. 3 is a schematic diagram of the region proposal network.

Fig. 4 is a schematic diagram of the classification network.

Detailed Description

The present invention is further described below in conjunction with a specific embodiment.

As shown in Fig. 1, the details of the multi-scale target detection method based on a deep convolutional neural network provided by this embodiment are as follows:

Step 1: obtain a highway video dataset, extract its video frames, annotate them manually, and divide them into a training dataset and a validation dataset.

Step 2: convert the images and label data of the dataset by preprocessing into the format required for training the deep convolutional neural network, comprising the following steps:

Step 2.1: scale the images of the dataset to 768×1344 pixels; the label data are scaled to the corresponding size in the same proportion.

Step 2.2: from the scaled image, randomly crop a region containing labels to obtain a square image of 768×768 pixels.

Step 2.3: randomly flip the cropped image horizontally with probability 0.5.

Step 2.4: convert the randomly flipped image from [0, 255] to the range [-1, 1].

Step 3: construct the network model, including the feature extraction network, the region proposal network, and the multi-task classification network, comprising the following steps:

Step 3.1: construct the feature extraction network. Its input is a 3×768×768 image and its outputs are a series of low-dimensional coded feature maps (512×48×48 and 512×24×24). The network comprises multiple cascaded downsampling layers, each composed of a convolutional layer, a batch normalization layer, a non-linear activation function layer, and a pooling layer in series. A concrete example of the feature extraction network model is shown in Fig. 2.

Step 3.2: construct the region proposal network. Its inputs are the 512×48×48 / 512×24×24 feature maps and its outputs are matrices of 36×48×48 / 36×24×24 and 18×48×48 / 18×24×24. The network comprises three structures (convolutional layer, batch normalization layer, non-linear activation function layer) connected in series. A concrete example of the region proposal network model is shown in Fig. 3.

Step 3.3: construct the multi-task classification network. This example uses two classification networks, each taking a vector of length 512×7×7 as input and outputting a vector A of length 4 and a vector B of length 4, where the four values of vector A are the class scores for background, car, bus, and train, and the four values of vector B give the position of a box (the coordinates x and y of its top-left corner and its width w and height h). The network contains fully connected layers, non-linear activation function layers, and dropout layers. A concrete example of the multi-task classification network model of this embodiment is shown in Fig. 4.

Step 4: define the loss functions of the region proposal network and of the classification network, comprising the following steps:

Step 4.1: define the loss functions of the region proposal network. A loss function is defined so that the output boxes come as close as possible to the positions of the ground-truth reference boxes, using SmoothL1Loss; and a loss function is defined so that the foreground scores of the output boxes come as close as possible to the label data, using SoftmaxLoss.

Step 4.2: define the loss functions of the classification network. A loss function is defined so that the class scores of the output boxes come as close as possible to the label data, with four classes; and a loss function is defined so that the output boxes come as close as possible to the positions of the ground-truth reference boxes.

Step 4.3: define the total loss function as the weighted sum of the above four losses, expressed as:

$$Loss = \underbrace{(w_1 \times L_{cls} + w_2 \times L_{reg})}_{\text{region proposal network loss}} + \underbrace{(w_3 \times L_{cls} + w_4 \times L_{reg})}_{\text{classification network loss}}$$

where Loss is the total loss value and w_1, w_2, w_3, w_4 are the weights (in this example w_1 = w_2 = w_3 = w_4 = 1), L_cls is the classification loss value, and L_reg is the regression loss value.
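This weighted sum transcribes directly into a hedged Python helper (the four component losses are assumed to have been computed already):

```python
def total_loss(rpn_cls, rpn_reg, cls_cls, cls_reg, w1=1.0, w2=1.0, w3=1.0, w4=1.0):
    """Weighted combination of the RPN and classification-network losses."""
    return w1 * rpn_cls + w2 * rpn_reg + w3 * cls_cls + w4 * cls_reg
```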

Step 5: train the network model, comprising the following steps:

Step 5.1: initialize the parameters of each layer of the model. The convolutional layer parameters of the feature extraction network take as initial values the convolutional layer parameters of a VGG16 network model pretrained on the large ImageNet database; the convolutional layers of the region proposal network and the fully connected layers of the classification network are initialized from a Gaussian distribution with mean 0 and standard deviation 0.02; and the parameters of all batch normalization layers are initialized from a Gaussian distribution with mean 1 and standard deviation 0.02.

Step 5.2: train the network model. An original image processed by step 2 is input at random into the network model of step 3, which outputs the class information and the coordinate information of the regression boxes; the final loss value is then computed by step 4. Backpropagating this value yields the gradients of the parameters of every layer of the network model of step 3, and optimizing the layer parameters with these gradients by the stochastic gradient descent algorithm completes one round of training of the network model.

Step 5.3: iterate the training continuously, i.e., repeat step 5.2 until the network's multi-scale target detection ability reaches the expected goal.

Step 6: validate the trained model on the validation dataset to test its generalization performance.

Specifically, some original images are taken at random from the validation dataset, processed by step 2, and input into the network model trained in step 5; the network model detects the positions of the targets in the images and predicts their classes, and its outputs are compared with the corresponding label data to judge the multi-scale target detection ability of the trained network model.

The embodiment described above is only a preferred embodiment of the present invention and does not limit its scope of implementation; any change made according to the shape and principle of the present invention shall therefore fall within the protection scope of the present invention.

Claims (3)

1. A multi-scale target detection method based on a deep convolutional neural network, characterized by comprising the following steps:

1) Data acquisition: training a deep convolutional neural network requires a large amount of training data, so large-scale natural or video image data must be used; if the obtained image data carry no labels, they are annotated manually and then divided into a training dataset and a validation dataset;

2) Data processing: the images and label data of the dataset are converted by preprocessing into the format required for training the deep convolutional neural network;

3) Model construction: according to the training target and the input/output form of the model, a deep convolutional neural network suited to the multi-scale target detection problem is constructed, comprising the following steps:

3.1) Construct the feature extraction network model: the feature extraction network acts as an encoder, extracting high-level semantic information from the input image and storing it in a low-dimensional code; its input is the image processed in step 2); because small objects lose part of their information in the deeper codes, feature codes of a low and a lower dimension are output in order to preserve more information; to realize the transformation from the input to this series of outputs, the feature extraction network contains multiple cascaded downsampling layers, each composed of a convolutional layer, a batch normalization layer, a non-linear activation function layer, and a pooling layer in series, wherein the convolutional layers use stride 1 and 3×3 kernels to extract the corresponding feature maps, the batch normalization layer stabilizes and accelerates training by normalizing the mean and standard deviation of the input samples within a batch, the non-linear activation function layer prevents the model from degenerating into a simple linear model and improves its descriptive power, and the pooling layer shrinks the feature map, enlarging the receptive field of the convolution kernels;

3.2) Construct the region proposal network model: the region proposal network is responsible for finding all objects in the input image and their positions; it takes a feature map as input, maps every point of the feature map back to the original image to obtain the point's coordinates, places a set of preset candidate boxes of different sizes and aspect ratios around each point, and computes the probability score that each box contains an object; the input of the region proposal network is the output of the feature extraction network of step 3.1), and its outputs are the coordinates of a series of candidate boxes and the probability score that each candidate box is an object; to realize the series of transformations from input to output, the region proposal network model comprises three functional structures connected in series, built from convolutional layers, batch normalization layers, and non-linear activation function layers; the first structure performs 3×3 feature fusion, fusing surrounding information, and its output feeds the second and third structures; the second structure outputs the coordinate information of the rectangular boxes, and the third outputs the probability score that the corresponding rectangular box is an object;

3.3) Construct the content-aware region-of-interest pooling layer: this layer maps a target region of the original image onto the low-dimensional code obtained in step 3.1) and then pools it to a fixed size; its content awareness manifests itself in the following two aspects:

3.3.1) Information completion: information completion fills in the information that small targets lose during low-dimensional encoding, making their detection more accurate; for the feature map obtained by mapping a target region of the original image onto the low-dimensional code of step 3.1): if one of its length and width is greater than z (the value of z depending on the network requirements) and the other is smaller than z, the map is enlarged by deconvolution to a square with side length max(length, width) before pooling; if both its length and width are smaller than z, both are enlarged to twice their original size by deconvolution before pooling; if both are greater than z, the subsequent pooling is performed directly;

3.3.2) Size division: the target regions output by step 3.2) are divided by size according to the mean area of all label boxes in the prepared training dataset; if the area of a rectangular box output by step 3.2) is smaller than this mean, it is marked as a small-target output, and if greater than or equal to the mean, as a large-target output;

3.4) Construct the multi-task classification network: the multi-task classification network recognizes large-scale and small-scale targets separately, preventing classification errors caused by the differing low-dimensional codes of large and small targets; the two size classes of rectangular boxes obtained in step 3.3) are fed into two separate classification networks; each classification network outputs class scores for the classification task and refined box positions for the regression task; to accomplish both tasks, the network contains fully connected layers, non-linear activation function layers, and dropout layers, wherein the fully connected layers map the learned "distributed feature representation" onto the sample label space, the non-linear activation function layers prevent the model from degenerating into a simple linear model and improve its descriptive power, and the dropout layers deactivate neurons with probability 0.5, making training converge faster and preventing overfitting; finally, the outputs of the large- and small-target classification networks are fused as the final output;

4) Loss function definition: according to the training target and the architecture of the model, the required loss function is defined, comprising the following steps:

4.1) Define the loss function of the region proposal network: the region proposal network obtains, from the low-dimensional code, the coordinates of the regions of interest in the input image and the score of whether each region is foreground, i.e., a regression task and a classification task; the loss function is defined so that the output boxes approach the positions of the ground-truth reference boxes; hence the loss function of the regression task can be defined as the smoothed L1 (Manhattan distance) loss, SmoothL1Loss, given by:

$$L_{reg}(v, t) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L1}(v_i - t_i)$$

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where L_reg is the regression loss, v and t denote the position of the predicted box and of its corresponding ground-truth reference box respectively, x and y denote the coordinates of the top-left corner, and w and h denote the width and height of the rectangular box; the loss function of the classification task is defined as the softmax loss, SoftmaxLoss, given by:

$$x'_i \leftarrow x'_i - \max(x'_1, \ldots, x'_n)$$

$$p_i = \frac{e^{x'_i}}{\sum_{j=1}^{n} e^{x'_j}}$$

$$L_{cls} = -\log p_i$$

where x' is the output of the network, n is the total number of classes, p is the probability of each class, and L_cls is the classification loss;

4.2) Define the loss function of the classification network: the classification network outputs class scores for the classification task and refined box positions for the regression task; the loss function is defined so that the output classes agree with the label data and the output box positions agree with the ground-truth reference boxes; as in step 4.1), the loss function of the regression task can be defined as SmoothL1Loss and that of the classification task as SoftmaxLoss;

4.3) Define the total loss function: the two region proposal network loss functions and the two classification network loss functions defined in steps 4.1) and 4.2) can be combined by weighting, enabling the network to accomplish the task of multi-scale target detection in images;

5) Model training: the parameters of each network layer are initialized; training samples are input iteratively; the loss value of the network is computed from the loss function; the gradients of the parameters of each layer are computed by backpropagation; and the parameters of each layer are updated by stochastic gradient descent, comprising the following steps:

5.1) Initialize the parameters of each layer of the model: the parameters of each layer are initialized with the methods used in conventional deep convolutional neural networks; the convolutional layer parameters of the feature extraction network take as initial values the convolutional layer parameters of a VGG16 network model pretrained on ImageNet; the convolutional layers of the region proposal network and the fully connected layers of the classification network are initialized from a Gaussian distribution with mean 0 and standard deviation 0.02; and the parameters of all batch normalization layers are initialized from a Gaussian distribution with mean 1 and standard deviation 0.02;

5.2) Train the network model: an original image processed by step 2) is input at random; the feature extraction network of step 3.1) produces the corresponding low-dimensional coded features; the region proposal network of step 3.2) generates a batch of candidate box regions, whose loss value is computed by step 4.1); these regions then pass through the content-aware region-of-interest pooling layer of step 3.3) to obtain another, fixed-size low-dimensional coded feature, after which the classification network of step 3.4) produces the target classes and refined box positions, whose loss value is computed by step 4.2); finally, the two loss values are processed by step 4.3) to obtain the final loss value; backpropagating this value yields the gradients of the parameters of every layer of the network model of step 3), and optimizing the layer parameters with these gradients by the stochastic gradient descent algorithm completes one round of training of the network model;

5.3) Repeat step 5.2) until the network's multi-scale target detection ability reaches the expected goal;

6) Model validation: the trained model is validated on the validation dataset to test its generalization performance.

2. The multi-scale target detection method based on a deep convolutional neural network according to claim 1, characterized in that step 2) comprises the following steps:

2.1) Scale the images of the dataset to m×n pixels; the label data are scaled to the corresponding size in the same proportion;

2.2) From the scaled image, randomly crop a region containing labels to obtain a rectangular image of a×b pixels, with a<=m and b<=n;

2.3) Randomly flip the cropped image horizontally with probability 0.5;

2.4) Convert the randomly flipped image from [0, 255] to the range [-1, 1].

3. The multi-scale target detection method based on a deep convolutional neural network according to claim 1, characterized in that step 6) is carried out as follows: some original images are taken at random from the validation dataset, processed by step 2), and input into the network model trained in step 5); the network model detects the positions of the targets in the images and predicts their classes, and its outputs are compared with the corresponding label data to judge the multi-scale target detection ability of the trained network model.
CN201711267789.7A 2017-12-05 2017-12-05 Multi-scale target detection method based on deep convolutional neural network Active CN108564097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711267789.7A CN108564097B (en) 2017-12-05 2017-12-05 Multi-scale target detection method based on deep convolutional neural network


Publications (2)

Publication Number Publication Date
CN108564097A CN108564097A (en) 2018-09-21
CN108564097B true CN108564097B (en) 2020-09-22

Family

ID=63529242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711267789.7A Active CN108564097B (en) 2017-12-05 2017-12-05 Multi-scale target detection method based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN108564097B (en)

Families Citing this family (98)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109361617B (en) * 2018-09-26 2022-09-27 中国科学院计算机网络信息中心 A convolutional neural network traffic classification method and system based on network packet load
CN109446911B (en) * 2018-09-28 2021-08-06 北京陌上花科技有限公司 Image detection method and system
CN109492636B (en) * 2018-09-30 2021-08-03 浙江工业大学 Object detection method based on adaptive receptive field deep learning
CN109376619B (en) * 2018-09-30 2021-10-15 中国人民解放军陆军军医大学 Cell detection method
CN109525859B (en) * 2018-10-10 2021-01-15 腾讯科技(深圳)有限公司 Model training method, image sending method, image processing method and related device equipment
CN109558791B (en) * 2018-10-11 2020-12-01 浙江大学宁波理工学院 Bamboo shoot searching device and method based on image recognition
CN109344806B (en) * 2018-10-31 2019-08-23 第四范式(北京)技术有限公司 The method and system detected using multitask target detection model performance objective
CN109634820A (en) * 2018-11-01 2019-04-16 华中科技大学 A kind of fault early warning method, relevant device and the system of the collaboration of cloud mobile terminal
CN109583321A (en) * 2018-11-09 2019-04-05 同济大学 The detection method of wisp in a kind of structured road based on deep learning
CN109523015B (en) * 2018-11-09 2021-10-22 上海海事大学 A kind of image processing method in neural network
CN109583483B (en) * 2018-11-13 2020-12-11 中国科学院计算技术研究所 A target detection method and system based on convolutional neural network
CN111260536B (en) * 2018-12-03 2022-03-08 中国科学院沈阳自动化研究所 Digital image multi-scale convolution processor with variable parameters and implementation method thereof
CN111310775B (en) * 2018-12-11 2023-08-25 Tcl科技集团股份有限公司 Data training method, device, terminal equipment and computer readable storage medium
CN109753995B (en) * 2018-12-14 2021-01-01 中国科学院深圳先进技术研究院 Optimization method of 3D point cloud target classification and semantic segmentation network based on PointNet +
CN109753959B (en) * 2018-12-21 2022-05-13 西北工业大学 Pavement traffic sign detection method based on adaptive multi-scale feature fusion
CN109766790B (en) * 2018-12-24 2022-08-23 重庆邮电大学 Pedestrian detection method based on self-adaptive characteristic channel
CN109685066B (en) * 2018-12-24 2021-03-09 中国矿业大学(北京) Mine target detection and identification method based on deep convolutional neural network
CN110889425A (en) * 2018-12-29 2020-03-17 研祥智能科技股份有限公司 Target detection method based on deep learning
CN109726690B (en) * 2018-12-30 2023-04-18 陕西师范大学 Multi-region description method for learner behavior image based on DenseCap network
CN109741318B (en) * 2018-12-30 2022-03-29 北京工业大学 Real-time detection method of single-stage multi-scale specific target based on effective receptive field
CN109753927B (en) 2019-01-02 2025-03-07 腾讯科技(深圳)有限公司 A face detection method and device
CN109784476B (en) * 2019-01-12 2022-08-16 福州大学 Method for improving DSOD network
CN109829421B (en) * 2019-01-29 2020-09-08 西安邮电大学 Method and device for vehicle detection and computer readable storage medium
CN111523351A (en) * 2019-02-02 2020-08-11 北京地平线机器人技术研发有限公司 Neural network training method and device and electronic equipment
CN109977997B (en) * 2019-02-13 2021-02-02 中国科学院自动化研究所 Image target detection and segmentation method based on convolutional neural network rapid robustness
CN109919214B (en) * 2019-02-27 2023-07-21 南京地平线机器人技术有限公司 Training method and training device for neural network model
CN109949229A (en) * 2019-03-01 2019-06-28 北京航空航天大学 A multi-platform and multi-view target collaborative detection method
CN111695380B (en) * 2019-03-13 2023-09-26 杭州海康威视数字技术股份有限公司 Target detection method and device
CN110120047B (en) * 2019-04-04 2023-08-08 平安科技(深圳)有限公司 Image segmentation model training method, image segmentation method, device, equipment and medium
CN109977918B (en) * 2019-04-09 2023-05-02 华南理工大学 An Optimization Method for Object Detection and Localization Based on Unsupervised Domain Adaptation
CN110072119B (en) * 2019-04-11 2020-04-10 西安交通大学 Content-aware video self-adaptive transmission method based on deep learning network
CN110084165B (en) * 2019-04-19 2020-02-07 山东大学 Intelligent identification and early warning method for abnormal events in open scene of power field based on edge calculation
CN110070530B (en) * 2019-04-19 2020-04-10 山东大学 Transmission line icing detection method based on deep neural network
CN110135480A (en) * 2019-04-30 2019-08-16 南开大学 A network data learning method based on unsupervised object detection to eliminate bias
CN110215232A (en) * 2019-04-30 2019-09-10 南方医科大学南方医院 Ultrasonic patch analysis method in coronary artery based on algorithm of target detection
CN110929746A (en) * 2019-05-24 2020-03-27 南京大学 A deep neural network-based method for location, extraction and classification of electronic file titles
CN110288082B (en) * 2019-06-05 2022-04-05 北京字节跳动网络技术有限公司 Convolutional neural network model training method and device and computer readable storage medium
CN110298387A (en) * 2019-06-10 2019-10-01 天津大学 Incorporate the deep neural network object detection method of Pixel-level attention mechanism
CN110298266B (en) * 2019-06-10 2023-06-06 天津大学 Object detection method based on deep neural network based on multi-scale receptive field feature fusion
CN110348437B (en) * 2019-06-27 2022-03-25 电子科技大学 A Target Detection Method Based on Weakly Supervised Learning and Occlusion Awareness
CN110288586A (en) * 2019-06-28 2019-09-27 昆明能讯科技有限责任公司 A kind of multiple dimensioned transmission line of electricity defect inspection method based on visible images data
CN110472483B (en) * 2019-07-02 2022-11-15 五邑大学 SAR image-oriented small sample semantic feature enhancement method and device
CN110399884B (en) * 2019-07-10 2021-08-20 浙江理工大学 A feature fusion adaptive anchor frame model vehicle detection method
CN110349148A (en) * 2019-07-11 2019-10-18 电子科技大学 A Weakly Supervised Learning-Based Image Object Detection Method
CN111027581A (en) * 2019-08-23 2020-04-17 中国地质大学(武汉) A 3D target detection method and system based on learnable coding
CN110706205B (en) * 2019-09-07 2021-05-14 创新奇智(重庆)科技有限公司 Method for detecting cloth hole-breaking defect by using computer vision technology
CN110659724B (en) * 2019-09-12 2023-04-28 复旦大学 Construction Method of Deep Convolutional Neural Network for Target Detection Based on Target Scale
CN112712097B (en) * 2019-10-25 2024-01-05 杭州海康威视数字技术股份有限公司 Image recognition method and device based on open platform and user side
CN110909623B (en) * 2019-10-31 2022-10-04 南京邮电大学 Three-dimensional target detection method and three-dimensional target detector
CN110991247B (en) * 2019-10-31 2023-08-11 厦门思泰克智能科技股份有限公司 Electronic component identification method based on deep learning and NCA fusion
CN111008656B (en) * 2019-11-29 2022-12-13 中国电子科技集团公司第二十研究所 Target detection method based on prediction frame error multi-stage loop processing
CN111222546B (en) * 2019-12-27 2023-04-07 中国科学院计算技术研究所 Multi-scale fusion food image classification model training and image classification method
CN111178446B (en) * 2019-12-31 2023-08-04 歌尔股份有限公司 Optimization method and device of target classification model based on neural network
CN111242897A (en) * 2019-12-31 2020-06-05 北京深睿博联科技有限责任公司 Chest X-ray image analysis method and device
CN111241964A (en) * 2020-01-06 2020-06-05 北京三快在线科技有限公司 Training method and device of target detection model, electronic equipment and storage medium
CN111242037B (en) * 2020-01-15 2023-03-21 华南理工大学 Lane line detection method based on structural information
CN111275171B (en) * 2020-01-19 2023-07-04 合肥工业大学 Small target detection method based on parameter-shared multi-scale super-resolution reconstruction
CN111274981B (en) * 2020-02-03 2021-10-08 中国人民解放军国防科技大学 Target detection network construction method and device and target detection method
CN111444939B (en) * 2020-02-19 2022-06-28 山东大学 Small-scale equipment component detection method based on weakly supervised collaborative learning in open power-field scenes
CN111340123A (en) * 2020-02-29 2020-06-26 韶鼎人工智能科技有限公司 Image score label prediction method based on deep convolutional neural network
CN111445026B (en) * 2020-03-16 2023-08-22 东南大学 Acceleration method for deep neural network multi-path inference in edge intelligence applications
CN111461190B (en) * 2020-03-24 2023-03-28 华南理工大学 Deep convolutional neural network-based non-equilibrium ship classification method
CN111257341B (en) * 2020-03-30 2023-06-16 河海大学常州校区 Crack detection method for underwater structures based on multi-scale features and stacked fully convolutional networks
CN111489332B (en) * 2020-03-31 2023-03-17 成都数之联科技股份有限公司 Multi-scale IOF random cropping data augmentation method for target detection
CN111611846A (en) * 2020-03-31 2020-09-01 北京迈格威科技有限公司 Pedestrian re-identification method, device, electronic device and storage medium
CN111553397B (en) * 2020-04-21 2022-04-29 东南大学 Cross-domain target detection method based on regional full convolution network and self-adaption
CN112016542A (en) * 2020-05-08 2020-12-01 珠海欧比特宇航科技股份有限公司 Urban waterlogging intelligent detection method and system
CN111597945B (en) * 2020-05-11 2023-08-18 济南博观智能科技有限公司 Target detection method, device, equipment and medium
CN111931900B (en) * 2020-05-29 2023-09-19 西安电子科技大学 GIS discharge waveform detection method based on residual network and multi-scale feature fusion
CN111626373B (en) * 2020-06-01 2023-07-25 中国科学院自动化研究所 Multi-scale widening residual network, small target recognition and detection network and its optimization method
CN111783784A (en) * 2020-06-30 2020-10-16 创新奇智(合肥)科技有限公司 Method and device for detecting building cavity, electronic equipment and storage medium
CN111860264B (en) * 2020-07-10 2024-01-05 武汉理工大学 Multi-task instance-level road scene understanding algorithm based on gradient equalization strategy
CN111986126B (en) * 2020-07-17 2022-05-24 浙江工业大学 Multi-target detection method based on improved VGG16 network
CN112288686B (en) * 2020-07-29 2023-12-19 深圳市智影医疗科技有限公司 Model training method and device, electronic equipment and storage medium
CN112183579B (en) * 2020-09-01 2023-05-30 国网宁夏电力有限公司检修公司 Method, medium and system for detecting micro target
CN112149521B (en) * 2020-09-03 2024-05-07 浙江工业大学 Palm print ROI extraction and enhancement method based on a multi-task convolutional neural network
CN112116079A (en) * 2020-09-22 2020-12-22 视觉感知(北京)科技有限公司 Solution for data transmission between neural networks
CN112132816B (en) * 2020-09-27 2022-12-30 北京理工大学 Target detection method based on multitask and region-of-interest segmentation guidance
CN112200089B (en) * 2020-10-12 2021-09-14 西南交通大学 Dense vehicle detection method based on vehicle counting perception attention
CN112347967B (en) * 2020-11-18 2023-04-07 北京理工大学 A Pedestrian Detection Method Fused with Motion Information in Complex Scenes
CN114547785B (en) * 2020-11-25 2024-11-22 英业达科技有限公司 Manufacturing parameter adjustment and control system and method for manufacturing equipment
CN112348036B (en) * 2020-11-26 2025-01-14 北京工业大学 Adaptive object detection method based on lightweight residual learning and deconvolution cascade
CN112560627A (en) * 2020-12-09 2021-03-26 江苏集萃未来城市应用技术研究所有限公司 Real-time detection method for abnormal behaviors of construction site personnel based on neural network
CN112508016B (en) * 2020-12-15 2024-04-16 深圳万兴软件有限公司 Image processing method, device, computer equipment and storage medium
CN112712133A (en) * 2021-01-15 2021-04-27 北京华捷艾米科技有限公司 Deep learning network model training method, related device and storage medium
CN112836816B (en) * 2021-02-04 2024-02-09 南京大学 Training method suitable for crosstalk of photoelectric storage and calculation integrated processing unit
CN113269182A (en) * 2021-04-21 2021-08-17 山东师范大学 Target fruit detection method and system based on small-region sensitivity of a Transformer variant
CN113326735B (en) * 2021-04-29 2023-11-28 南京大学 YOLOv 5-based multi-mode small target detection method
CN113239775B (en) * 2021-05-09 2023-05-02 西北工业大学 Method for detecting and extracting tracks in azimuth history diagrams based on a hierarchical-attention deep convolutional neural network
CN112990444B (en) * 2021-05-13 2021-09-24 电子科技大学 Hybrid neural network training method, system, equipment and storage medium
CN113076962B (en) * 2021-05-14 2022-10-21 电子科技大学 Multi-scale target detection method based on micro neural network search technology
CN113762278B (en) * 2021-09-13 2023-11-17 中冶路桥建设有限公司 Asphalt pavement damage identification method based on target detection
CN114048536A (en) * 2021-11-18 2022-02-15 重庆邮电大学 A road structure prediction and target detection method based on multi-task neural network
CN113902980B (en) * 2021-11-24 2024-02-20 河南大学 Remote sensing target detection method based on content perception
CN114462487A (en) * 2021-12-28 2022-05-10 浙江大华技术股份有限公司 Target detection network training and detection method, device, terminal and storage medium
CN114549958B (en) * 2022-02-24 2023-08-04 四川大学 Night and camouflage target detection method based on context information perception mechanism
CN114687012A (en) * 2022-02-25 2022-07-01 武汉智目智能技术合伙企业(有限合伙) Efficient foreign fiber removing device and method for high-impurity-content raw cotton
CN115049952B (en) * 2022-04-24 2023-04-07 南京农业大学 Juvenile fish limb identification method based on multi-scale cascade perception deep learning network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150342560A1 (en) * 2013-01-25 2015-12-03 Ultrasafe Ultrasound Llc Novel Algorithms for Feature Detection and Hiding from Ultrasound Images
US10002313B2 (en) * 2015-12-15 2018-06-19 Sighthound, Inc. Deeply learned convolutional neural networks (CNNS) for object localization and classification

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320963A (en) * 2015-10-21 2016-02-10 哈尔滨工业大学 Large-scale semi-supervised feature selection method for high-resolution remote sensing images
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 Vehicle model recognition method based on the Fast R-CNN deep neural network
CN106529402A (en) * 2016-09-27 2017-03-22 中国科学院自动化研究所 Multi-task learning convolutional neural network-based face attribute analysis method
CN106845430A (en) * 2017-02-06 2017-06-13 东华大学 Pedestrian detection and tracking based on accelerated region convolutional neural networks
CN107103590A (en) * 2017-03-22 2017-08-29 华南理工大学 Image reflection removal method based on deep convolutional generative adversarial networks
CN107341517A (en) * 2017-07-07 2017-11-10 哈尔滨工业大学 Multi-scale small object detection method based on deep learning with inter-level feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection; Zhaowei Cai et al.; ECCV 2016; 2016-12-31; pp. 354-370 *

Also Published As

Publication number Publication date
CN108564097A (en) 2018-09-21

Similar Documents

Publication Publication Date Title
CN108564097B (en) Multi-scale target detection method based on deep convolutional neural network
CN109977918B (en) An Optimization Method for Object Detection and Localization Based on Unsupervised Domain Adaptation
CN110298266B (en) Deep neural network object detection method based on multi-scale receptive field feature fusion
Wang et al. An improved lightweight traffic sign recognition algorithm based on YOLOv4-tiny
CN113642390B (en) Street view image semantic segmentation method based on local attention network
CN109934200B (en) RGB color remote sensing image cloud detection method and system based on improved M-Net
CN110738207A (en) Character detection method fusing character region edge information in text images
CN110276269A (en) An Attention Mechanism Based Target Detection Method for Remote Sensing Images
CN114187450A (en) A deep learning-based semantic segmentation method for remote sensing images
CN111860171A (en) A method and system for detecting irregularly shaped targets in large-scale remote sensing images
CN117078942B (en) Context-aware refereed image segmentation method, system, device and storage medium
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN117409190B (en) Real-time infrared image target detection method, device, equipment and storage medium
CN115546569A (en) An attention mechanism-based data classification optimization method and related equipment
CN116912708A (en) Remote sensing image building extraction method based on deep learning
CN118691815A (en) A high-quality automatic instance segmentation method for remote sensing images based on fine-tuning of the SAM large model
CN110852327A (en) Image processing method, device, electronic device and storage medium
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
Zheng et al. Feature enhancement for multi-scale object detection
CN116363526A (en) MROCNet model construction and multi-source remote sensing image change detection method and system
CN116012626B (en) Material matching method, device, equipment and storage medium for building elevation image
Pang et al. PTRSegNet: A Patch-to-Region Bottom–Up Pyramid Framework for the Semantic Segmentation of Large-Format Remote Sensing Images
CN112668662B (en) Target detection method in wild mountain forest environment based on improved YOLOv3 network
CN118521791A (en) Remote sensing image semantic segmentation method based on convolutional neural network and complete attention network
CN113011506A (en) Texture image classification method based on depth re-fractal spectrum network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant