
CN113160050B - Small target identification method and system based on space-time neural network - Google Patents

Small target identification method and system based on space-time neural network

Info

Publication number
CN113160050B
CN113160050B (application CN202110319609.5A)
Authority
CN
China
Prior art keywords
gate
sequence
model
neural network
lstm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110319609.5A
Other languages
Chinese (zh)
Other versions
CN113160050A (en)
Inventor
刘绍辉
梁智博
姜峰
付森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Shenzhen
Original Assignee
Harbin Institute of Technology Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen
Priority to CN202110319609.5A
Publication of CN113160050A
Application granted
Publication of CN113160050B
Status: Active


Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/25 Fusion techniques
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
                • G06N3/048 Activation functions
                • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
              • G06N3/08 Learning methods
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T3/00 Geometric image transformations in the plane of the image
            • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
              • G06T3/4053 Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
          • G06T5/00 Image enhancement or restoration
            • G06T5/90 Dynamic range modification of images or parts thereof
          • G06T2207/00 Indexing scheme for image analysis or image enhancement
            • G06T2207/20 Special algorithmic details
              • G06T2207/20081 Training; Learning
              • G06T2207/20084 Artificial neural networks [ANN]
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V10/00 Arrangements for image or video recognition or understanding
            • G06V10/20 Image preprocessing
              • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
          • Y02T10/00 Road transport of goods or passengers
            • Y02T10/10 Internal combustion engine [ICE] based vehicles
              • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a small-target recognition method and system based on a spatio-temporal neural network. The method comprises the following steps: preprocessing the original blurred images with a super-resolution algorithm to obtain a high-quality image sequence; performing a logical subtraction between adjacent frames of the high-quality image sequence using a spatio-temporal attention mechanism, capturing and highlighting suspicious regions; extracting deep features from the suspicious regions to obtain a time series of feature maps; feeding the time series of feature maps through an LSTM state-transition subnetwork acting as a mapper to confidence outputs, obtaining transition states; and classifying the transition states with a classifier to obtain the final recognition result, namely the target category and its confidence. The method is characterized in that, as the frame sequence is continuously read in, the model corrects itself, gradually converging to the correct category while its confidence steadily rises.

Description

Small-Target Recognition Method and System Based on a Spatio-Temporal Neural Network

Technical Field

The present invention relates to the technical field of computer vision, and in particular to a small-target recognition method and system based on a spatio-temporal neural network.

Background

With the development of computer vision, target recognition has become a research hotspot and is widely applied in intelligent security, autonomous driving, computer-aided medical diagnosis and other fields. In practical applications it is often unrealistic to require that targets be clear and easy to distinguish, which is why small-target recognition has attracted growing attention in recent years. Real-world scenarios commonly present difficulties such as extremely small target size, long target distance and low image-source resolution, all of which pose a serious challenge to traditional algorithms that base recognition on a single frame.

Contemporary general-purpose target recognition algorithms based on deep networks mostly adopt a mainstream deep network model as the backbone and automatic feature extractor, with a classifier then producing the final recognition result. Because they are trained on datasets containing large numbers of images, these general-purpose algorithms usually perform well on clearly distinguishable objects. However, since their backbone networks apply convolution and similar operations to varying degrees, the feature resolution along the convolutional channels inevitably drops, which in turn causes severe performance degradation on small-target problems.

In recent years, related work has addressed small-target recognition along two lines. One line improves the recognition model itself, raising its ability to recognize small targets through operations such as fusing features of different scales, enlarging the receptive field, and introducing image context. The other line works from the angle of image-source restoration, recovering small targets into clear, distinguishable signals as far as possible through data augmentation, super-resolution processing and similar means. Although both families of methods bring some benefit, they operate only on single frames and therefore still fall far short of the real-time, high-accuracy requirements of real scenarios.

Summary of the Invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, one object of the present invention is to propose a small-target recognition method based on a spatio-temporal neural network.

Another object of the present invention is to propose a small-target recognition system based on a spatio-temporal neural network.

To achieve the above objects, an embodiment of one aspect of the present invention proposes a small-target recognition method based on a spatio-temporal neural network, comprising the following steps: step S1, acquiring the original blurred image at the current moment; step S2, preprocessing the original blurred image with a super-resolution algorithm to obtain a high-quality image sequence; step S3, performing a logical subtraction between adjacent frames of the high-quality image sequence using a spatio-temporal attention mechanism, capturing and highlighting suspicious regions; step S4, extracting deep features from the suspicious regions to obtain a time series of feature maps; step S5, feeding the time series of feature maps through an LSTM state-transition subnetwork acting as a mapper to confidence outputs, obtaining a corrected time series of feature maps; step S6, classifying the corrected time series of feature maps with a classifier to obtain the final recognition result, wherein the final recognition result is the target category and confidence rate.

The small-target recognition method based on a spatio-temporal neural network according to embodiments of the present invention solves the degradation in recognition performance caused by single-frame target recognition. After roughly locking onto the region containing the target, the visual capture device and computing resources are continuously concentrated on the suspicious target in that region, and the recognition confidence is gradually raised through continuous time-series image capture over a period of time. Meanwhile, as the model keeps running, some erroneous conclusions drawn in earlier stages are corrected, giving the model a degree of self-correction capability.

In addition, the small-target recognition method based on a spatio-temporal neural network according to the above embodiments of the present invention may also have the following additional technical features:

Further, in an embodiment of the present invention, the LSTM state-transition subnetwork uses LSTM, an important variant of the RNN recurrent neural network, as its main component, wherein a complete LSTM cell comprises an input gate, an output gate, a candidate gate and a forget gate.

Further, in an embodiment of the present invention, the complete LSTM cell is given by:

i = σ(W_i · [h_(t-1), x_t] + B_i)
f = σ(W_f · [h_(t-1), x_t] + B_f)
o = σ(W_o · [h_(t-1), x_t] + B_o)
g = φ(W_g · [h_(t-1), x_t] + B_g)

where i is the input gate, f is the forget gate, o is the output gate, g is the candidate gate, the sigmoid function σ(x) = 1/(1 + e^(-x)), φ(x) = (e^x - e^(-x))/(e^x + e^(-x)), W is a weight matrix, B is a bias vector, and x_t and h_(t-1) are the current inputs.

Further, in an embodiment of the present invention, the transition state is:

c_t = f ⊙ c_(t-1) + i ⊙ g
h_t = o ⊙ φ(c_t)

where h_t is the output state at time step t, o is the output gate, c_t is the hidden state at the current time step, f is the forget gate, c_(t-1) is the hidden state of the previous time step, i is the input gate, and g is the candidate gate.

Further, in an embodiment of the present invention, any deep convolutional model may serve as the backbone network in step S4 and step S6.

To achieve the above objects, an embodiment of another aspect of the present invention proposes a small-target recognition system based on a spatio-temporal neural network, comprising an acquisition module, a super-resolution module, a spatio-temporal attention module, a feature extraction module, an LSTM state-transition subnetwork and a classification module, wherein the acquisition module is configured to acquire the original blurred image at the current moment; the super-resolution module is configured to preprocess the original blurred image to obtain a high-quality image sequence; the spatio-temporal attention module is configured to perform a logical subtraction between adjacent frames of the high-quality image sequence and to capture and highlight suspicious regions; the feature extraction module is configured to extract deep features from the suspicious regions to obtain a time series of feature maps; the LSTM state-transition subnetwork is configured to feed the time series of feature maps through a mapper to confidence outputs, obtaining a corrected time series of feature maps; and the classification module is configured to classify the corrected time series of feature maps to obtain the final recognition result, wherein the final recognition result is the category and confidence rate.

The small-target recognition system based on a spatio-temporal neural network according to embodiments of the present invention solves the degradation in recognition performance caused by single-frame target recognition. After roughly locking onto the region containing the target, the visual capture device and computing resources are continuously concentrated on the suspicious target in that region, and the recognition confidence is gradually raised through continuous time-series image capture over a period of time. Meanwhile, as the model keeps running, some erroneous conclusions drawn in earlier stages are corrected, giving the model a degree of self-correction capability.

In addition, the small-target recognition system based on a spatio-temporal neural network according to the above embodiments of the present invention may also have the following additional technical features:

Further, in an embodiment of the present invention, the LSTM state-transition subnetwork uses LSTM, an important variant of the RNN recurrent neural network, as its main component, wherein a complete LSTM cell comprises an input gate, an output gate, a candidate gate and a forget gate.

Further, in an embodiment of the present invention, the complete LSTM cell is given by:

i = σ(W_i · [h_(t-1), x_t] + B_i)
f = σ(W_f · [h_(t-1), x_t] + B_f)
o = σ(W_o · [h_(t-1), x_t] + B_o)
g = φ(W_g · [h_(t-1), x_t] + B_g)

where i is the input gate, f is the forget gate, o is the output gate, g is the candidate gate, the sigmoid function σ(x) = 1/(1 + e^(-x)), φ(x) = (e^x - e^(-x))/(e^x + e^(-x)), W is a weight matrix, B is a bias vector, and x_t and h_(t-1) are the current inputs.

Further, in an embodiment of the present invention, the transition state is:

c_t = f ⊙ c_(t-1) + i ⊙ g
h_t = o ⊙ φ(c_t)

where h_t is the output state at time step t, o is the output gate, c_t is the hidden state at the current time step, f is the forget gate, c_(t-1) is the hidden state of the previous time step, i is the input gate, and g is the candidate gate.

Further, in an embodiment of the present invention, any deep convolutional model may serve as the backbone network in the feature extraction module and the classification module.

Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.

Brief Description of the Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of a small-target recognition method based on a spatio-temporal neural network according to an embodiment of the present invention;

Fig. 2 is an intuitive illustration of the relationship between the attention distribution for a given target category and recognition accuracy according to an embodiment of the present invention, in which (a) attention is scattered and the recognition is wrong, and (b) attention is concentrated and the recognition is correct;

Fig. 3 is a schematic diagram of the LSTM cell structure according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of sample images from ATSETC4 according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of the model's self-correction capability according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of SRGAN processing results on images of different sizes according to a specific embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a small-target recognition system based on a spatio-temporal neural network according to an embodiment of the present invention.

Detailed Description of the Embodiments

Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals throughout denote identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and shall not be construed as limiting it.

The small-target recognition method and system based on a spatio-temporal neural network according to embodiments of the present invention are described below with reference to the accompanying drawings; the method is described first.

Fig. 1 is a flowchart of a small-target recognition method based on a spatio-temporal neural network according to an embodiment of the present invention.

As shown in Fig. 1, the small-target recognition method based on a spatio-temporal neural network comprises the following steps:

In step S1, the original blurred image at the current moment is acquired.

In step S2, a super-resolution algorithm is applied to preprocess the original blurred image, yielding a high-quality image sequence.

Specifically, a super-resolution algorithm with a fully trained model is used to perform an initial enhancement of the original blurred image, producing a data source with better image quality. Any effective super-resolution method may be used here; the embodiment of the present invention adopts SRGAN, and those skilled in the art may select a different super-resolution method according to the actual situation, which is not specifically limited herein.
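The embodiment relies on a pretrained SRGAN generator for this step. As a hedged sketch of the preprocessing interface only (no trained model is assumed here), nearest-neighbour upscaling stands in for the SRGAN forward pass; the function names and the 4x scale factor are illustrative assumptions, not from the patent:

```python
import numpy as np

def super_resolve(frame: np.ndarray, scale: int = 4) -> np.ndarray:
    """Stand-in for the SRGAN preprocessing step: upscale a (H, W) or
    (H, W, C) frame by `scale`. A real pipeline would run a trained
    SRGAN generator here; nearest-neighbour replication only mimics
    the interface (blurred frame in, enlarged frame out)."""
    if frame.ndim == 2:
        return np.kron(frame, np.ones((scale, scale), dtype=frame.dtype))
    return np.kron(frame, np.ones((scale, scale, 1), dtype=frame.dtype))

def preprocess_sequence(frames):
    """Enhance frame by frame to build the high-quality image sequence."""
    return [super_resolve(f) for f in frames]
```

The per-frame loop mirrors the patent's flow: every captured frame is enhanced before the attention and recognition stages see it.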

In step S3, a spatio-temporal attention mechanism performs a logical subtraction between adjacent frames of the high-quality image sequence, capturing and highlighting suspicious regions so that subsequent computing resources can be allocated to the actual target more precisely.
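The patent does not spell out the "logical subtraction" operation. A minimal NumPy sketch under the assumption that it means a thresholded absolute difference between adjacent frames; the threshold of 25 and the brightness boost of 80 are illustrative values, not from the patent:

```python
import numpy as np

def suspicious_region_mask(prev: np.ndarray, curr: np.ndarray,
                           thresh: int = 25) -> np.ndarray:
    """Logical subtraction of adjacent frames: pixels whose intensity
    changed by more than `thresh` are flagged as suspicious."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return diff > thresh

def highlight(curr: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Highlight the suspicious region by boosting flagged pixels,
    clamped to the 8-bit range."""
    out = curr.astype(np.int16)
    out[mask] = np.minimum(out[mask] + 80, 255)
    return out.astype(np.uint8)
```

In the pipeline, the mask would then steer where deep features are extracted in step S4.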

Formally, let the model's attention score Y for a given target be the inner product of the weights w and the feature map A:

Y = relu( Σ_k w_k A^k )    (1)

where A is the feature map, w is the neural-network model weight, Y is the attention-distribution score of the model's inference process, relu is the rectified linear unit, and w_k is the model's weight gradient, specifically:

w_k = (1/Z) Σ_i Σ_j ∂Y/∂A^k_ij    (2)

In formula (2), w_k is the gradient-weighted sum over every feature element, Z being the number of spatial positions in the feature map. Merging formula (1) and formula (2) yields formula (3), the final form of the model's attention score for a fixed category:

Y = relu( Σ_k ( (1/Z) Σ_i Σ_j ∂Y/∂A^k_ij ) A^k )    (3)

In fact, the distribution of attention is closely tied to recognition accuracy. As shown in Fig. 2, when a misrecognition occurs the model's attention becomes extremely scattered; conversely, when the recognition is correct, the attention fits the target contour almost exactly.
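The attention formulas above follow the Grad-CAM pattern: per-channel weights are spatially averaged gradients, and the score map is a ReLU over the weighted channel sum. A NumPy sketch with synthetic inputs; in a real network the gradients ∂Y/∂A would come from backpropagation rather than being passed in directly:

```python
import numpy as np

def attention_score_map(feature_maps: np.ndarray,
                        grads: np.ndarray) -> np.ndarray:
    """Grad-CAM-style attention score.
    feature_maps, grads: arrays of shape (K, H, W).
    Channel weights = spatially averaged gradients (formula (2));
    score map = relu of the weighted sum of channels (formulas (1)/(3))."""
    k = feature_maps.shape[0]
    weights = grads.reshape(k, -1).mean(axis=1)        # (K,)
    cam = np.tensordot(weights, feature_maps, axes=1)  # (H, W)
    return np.maximum(cam, 0.0)                        # relu
```

High values in the returned map correspond to regions the model attends to for the chosen category, matching the visualization described around Fig. 2.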

In step S4, the deep features within the suspicious regions are extracted, yielding a time series of feature maps.

That is, the suspicious regions output by the spatio-temporal attention mechanism are taken in and their deep features are extracted, serving as the input to the LSTM state-transition subnetwork.

In step S5, the LSTM state-transition subnetwork feeds the time series of feature maps through a mapper to confidence outputs, obtaining a corrected time series of feature maps.

Furthermore, the LSTM state-transition subnetwork uses LSTM, an important variant of the RNN recurrent neural network, as its main component. Because of the vanishing-gradient problem, a conventional RNN unit is limited in how long it can retain stored content and is hard to train. As shown in Fig. 3, LSTM is a variant recurrent network designed specifically to solve this class of problems. A complete LSTM cell comprises an input gate, an output gate, a candidate gate and a forget gate; it passes the current hidden state on to the next time step to take part in the fusion computation, while avoiding the memory-duration limit that the vanishing gradient imposes on plain recurrent networks. The specific formulas are:

i = σ(W_i · [h_(t-1), x_t] + B_i)
f = σ(W_f · [h_(t-1), x_t] + B_f)
o = σ(W_o · [h_(t-1), x_t] + B_o)
g = φ(W_g · [h_(t-1), x_t] + B_g)

where i is the input gate, f is the forget gate, o is the output gate, g is the candidate gate, the sigmoid function σ(x) = 1/(1 + e^(-x)), φ(x) = (e^x - e^(-x))/(e^x + e^(-x)), W is a weight matrix, B is a bias vector, and x_t and h_(t-1) are the current inputs.

The output state and hidden state are computed as:

c_t = f ⊙ c_(t-1) + i ⊙ g
h_t = o ⊙ φ(c_t)

where h_t is the output state at time step t, o is the output gate, c_t is the hidden state at the current time step, f is the forget gate, c_(t-1) is the hidden state of the previous time step, i is the input gate, and g is the candidate gate.
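The LSTM cell described above can be exercised directly. A minimal NumPy sketch of one cell step; the weight layout (one matrix per gate over the concatenated input [h_(t-1), x_t]) and the dimensions are chosen for illustration:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, B):
    """One LSTM cell step.
    W: dict of (hidden, input+hidden) matrices keyed 'i', 'f', 'o', 'g';
    B: dict of (hidden,) bias vectors with the same keys."""
    z = np.concatenate([h_prev, x_t])   # current input [h_(t-1), x_t]
    i = sigmoid(W['i'] @ z + B['i'])    # input gate
    f = sigmoid(W['f'] @ z + B['f'])    # forget gate
    o = sigmoid(W['o'] @ z + B['o'])    # output gate
    g = np.tanh(W['g'] @ z + B['g'])    # candidate gate (phi = tanh)
    c_t = f * c_prev + i * g            # hidden (cell) state update
    h_t = o * np.tanh(c_t)              # output state
    return h_t, c_t
```

Iterating this step over the feature-map series carries the hidden state across frames, which is what lets the model revise early conclusions as more frames arrive.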

The time series of feature maps is then corrected according to this output state and hidden state.

In step S6, a classifier classifies the corrected time series of feature maps to obtain the final recognition result, which consists of the target category and confidence rate.

Specifically, the time-series output of the LSTM cells (the corrected time series of feature maps) is fed into the classifier to obtain the final classification result.
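A minimal sketch of this final stage, assuming a single fully connected layer followed by softmax (the embodiment describes an explicit fully connected layer as the classifier) over the four ATSETC4 categories; the weights here are placeholders, not trained parameters:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(h_t, W_fc, b_fc, classes):
    """Map one LSTM output state h_t to (category, confidence) through
    a fully connected layer followed by softmax."""
    probs = softmax(W_fc @ h_t + b_fc)
    k = int(np.argmax(probs))
    return classes[k], float(probs[k])
```

Applied to every time step, this yields the per-frame (category, confidence) trajectory whose gradual convergence Fig. 5 illustrates.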

It should be noted that any mainstream deep convolutional model may serve as the backbone network in steps S4 and S6; indeed, it may be replaced by any effective feature extractor and classifier. The embodiment of the present invention uses a feature extractor built on a VGG backbone and an explicit fully connected layer as the classifier, but those skilled in the art may choose different feature extractors and classifiers according to the actual situation, which is not specifically limited herein.

In addition, to enable quantitative experiments, the present invention constructs ATSETC4, a challenge dataset of image sequences of small aerial targets, as a basis for deep-network training, remedying the current lack in the field of a dedicated dataset for training sequential neural networks. ATSETC4 contains 2400 video clips drawn from real captures and web resources, covering many scenarios including wilderness, urban, virtual and complex meteorological environments. As shown in Fig. 4, ATSETC4 defines four flying-target categories: bird, hot-air balloon, fixed-wing UAV and rotor UAV. Six standard-size image subsets are also provided, at 224×224, 112×112, 56×56, 28×28, 14×14 and 7×7, to enable multi-scale comparison tests (this setup is needed to fit the parameter requirements of deep networks with fully connected layers). Specifically, each smaller-scale subset is obtained by downsampling the larger-scale one, and at the initialization stage of the large-scale subsets, original targets of different sizes are suitably blended in to make the dataset more challenging. The final ATSETC4 contains 2400 sequences of 25 frames each, 60000 frames in total. In the usual sense, a target no larger than 28×28 can be considered a small target.
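The six subset sizes form a clean halving chain (224 → 112 → … → 7), so the smaller subsets can be derived from the 224×224 source by repeated 2x downsampling. The patent does not name the downsampling kernel; 2×2 average pooling is an assumption in this sketch:

```python
import numpy as np

def downsample2x(img: np.ndarray) -> np.ndarray:
    """Halve each spatial dimension with 2x2 average pooling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float32)
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def scale_pyramid(img224: np.ndarray):
    """Build the 224/112/56/28/14/7 multi-scale subsets from one
    224x224 source image."""
    sizes = [224, 112, 56, 28, 14, 7]
    out = {224: img224}
    cur = img224
    for s in sizes[1:]:
        cur = downsample2x(cur)
        out[s] = cur
    return out
```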

Therefore, the specific working procedure of the small-target recognition method proposed by the present invention is shown in Table 1 below.

Furthermore, in actual tests the present invention recognizes consecutive target frames markedly well and exhibits a strong self-correction capability. As shown in Fig. 5, the model briefly misrecognizes in the early stage of the process owing to blurred signals, small targets, complex backgrounds and the like; as the frame sequence is continuously read in, the model corrects itself, gradually settling on the correct category while steadily raising the confidence rate.

The small-target recognition method based on a spatio-temporal neural network proposed by the present invention is further illustrated below through a specific embodiment.

First, this embodiment of the present invention applies an SRGAN super-resolution model with pre-trained weights directly to the images.
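A minimal sketch of this per-frame preprocessing step follows; `sr_model` is a placeholder for the loaded SRGAN generator (in practice a PyTorch network), and the 4× scale factor is an assumption based on the standard SRGAN configuration:

```python
import numpy as np

def preprocess_sequence(frames, sr_model, scale=4):
    """Apply a super-resolution model to every frame of a sequence.
    frames: list of (H, W, C) arrays -> list of (scale*H, scale*W, C)."""
    out = []
    for f in frames:
        hr = sr_model(f)
        # Sanity-check the upscale factor on the first dimension.
        assert hr.shape[0] == scale * f.shape[0]
        out.append(hr)
    return out
```

For testing without the real generator, any callable with the same input/output contract (e.g. block replication) can stand in for `sr_model`.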

Next, the cross-entropy loss is used as the optimization objective, with a minimum batch size of 16; specifically, one 25-frame sequence is treated as the minimum batch unit. The initial learning rate is 10^-4 and is reduced by a factor of 100 whenever the validation accuracy stops improving significantly. In particular, the fully connected layers of the model are regularized with dropout during training, with a dropout rate of 0.5 (i.e., a portion of the fully connected parameters is randomly dropped to prevent overfitting). In addition, the invention mainly uses a VGG11 deep convolutional network pre-trained on ImageNet as the feature extractor, and most parameters of the convolutional backbone are kept frozen during training. Correspondingly, ATSETC4 is split into training and test sets at a ratio of 8:2. On average, 55 to 65 epochs are needed on the subsets of different sizes, and training one subset model of a given size takes about 90 minutes. The experiments run on a single NVIDIA GTX 1080 Ti GPU, with PyTorch as the machine-learning framework. In the comparative experiments, the other models use their default parameters, and the testing stage still uses the proposed ATSETC4 as the benchmark.
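The learning-rate policy described above can be sketched as follows (the plateau patience of 5 epochs is an assumed value; the patent only states that the rate is divided by 100 when validation accuracy stops improving significantly):

```python
class PlateauLR:
    """Reduce the learning rate by `factor` when the validation
    metric fails to improve for `patience` consecutive epochs."""

    def __init__(self, lr=1e-4, factor=100.0, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.bad_epochs = -float("inf"), 0

    def step(self, val_acc):
        if val_acc > self.best:
            self.best, self.bad_epochs = val_acc, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr /= self.factor   # 100x reduction, per the text
                self.bad_epochs = 0
        return self.lr
```

This is the same behavior PyTorch's `ReduceLROnPlateau` scheduler provides; the standalone class just makes the stated 100× reduction rule explicit.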

In the experiments, this embodiment denotes the simplified spatio-temporal network (without the super-resolution module) as Simple_STNet and names the full model STNet. A performance comparison with several state-of-the-art target recognition networks is given in Table 2 below.

Table 2: Performance comparison of Simple_STNet, STNet, and various state-of-the-art recognition algorithms

As Table 2 shows, both the simplified Simple_STNet and the full STNet achieve the best performance on almost all size subsets of ATSETC4. The degradation of the full STNet at the 7×7 scale occurs because super-resolution under 32× downsampling exceeds its theoretical limit, so errors arise in the image restoration process and performance declines. Fig. 6 shows original images of different sizes and the corresponding SRGAN results; from left to right, the three columns are the 224-size high-definition image, the 7-size low-resolution image, and the 7-size SRGAN result.

Therefore, the small-target recognition method based on a spatio-temporal neural network proposed in this embodiment solves the recognition performance degradation caused by existing single-frame target recognition. After the target's region is roughly locked, the visual capture device and computing resources are continuously concentrated on the suspicious target in that region, and the recognition confidence is gradually raised through continuous time-series image capture over a period of time. Meanwhile, as the model keeps running, some erroneous conclusions reached in the early stage are corrected, giving the model a degree of error-correction capability.

Next, the small-target recognition system based on a spatio-temporal neural network proposed according to the embodiments of the present invention is described with reference to the accompanying drawings.

Fig. 7 is a schematic structural diagram of a small-target recognition system based on a spatio-temporal neural network according to one embodiment of the present invention.

As shown in Fig. 7, the system 10 includes an acquisition module 100, a super-resolution module 200, a spatio-temporal attention module 300, a feature extraction module 400, an LSTM state-transition subnetwork 500, and a classification module 600.

The acquisition module 100 is configured to acquire the original blurred image at the current moment. The super-resolution module 200 is configured to preprocess the original blurred image to obtain a high-quality image sequence. The spatio-temporal attention module 300 is configured to perform a logical subtraction between adjacent frames of the high-quality image sequence to capture and highlight suspicious regions. The feature extraction module 400 is configured to extract deep features from the suspicious regions to obtain a time series of feature maps. The LSTM state-transition subnetwork 500 is configured to feed the feature-map time series into a mapping device with confidence output to obtain the transition state. The classification module 600 is configured to classify the transition state to obtain the final recognition result, which consists of the target class and a confidence rate.
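A minimal sketch of the inter-frame "logical subtraction" performed by the spatio-temporal attention module is given below, assuming a simple absolute-difference threshold and a fixed highlight gain (`thresh` and `gain` are illustrative values not specified in the patent):

```python
import numpy as np

def highlight_suspicious(prev_frame, cur_frame, thresh=0.1, gain=2.0):
    """Subtract adjacent frames to find moving (suspicious) regions,
    then emphasize those regions in the current frame.
    Returns (highlighted_frame, binary_mask)."""
    diff = np.abs(cur_frame.astype(float) - prev_frame.astype(float))
    mask = (diff > thresh).astype(float)          # suspicious pixels
    highlighted = cur_frame * (1.0 + (gain - 1.0) * mask)
    return highlighted, mask
```

Because small aerial targets move against a largely static background, the difference mask concentrates the downstream feature extraction on a few candidate pixels.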

Further, in one embodiment of the present invention, the LSTM state-transition subnetwork uses LSTM, an important variant of the RNN recurrent neural network, as its main component, where a complete LSTM cell structure includes an input gate, an output gate, a cell gate, and a forget gate.

Further, in one embodiment of the present invention, the complete LSTM cell structure is:

i = σ(W_i·[x_t, h_{t-1}] + B_i), f = σ(W_f·[x_t, h_{t-1}] + B_f), o = σ(W_o·[x_t, h_{t-1}] + B_o), g = φ(W_g·[x_t, h_{t-1}] + B_g), where i is the input gate, f is the forget gate, o is the output gate, g is the cell gate, the sigmoid function σ(x) = 1/(1 + e^(-x)), φ(x) = (e^x - e^(-x))/(e^x + e^(-x)), W is a weight matrix, B is a bias vector, and x_t and h_{t-1} are the current inputs.

Further, in one embodiment of the present invention, the transition state is:

c_t = f ⊙ c_{t-1} + i ⊙ g, h_t = o ⊙ φ(c_t), where h_t is the output state at the current time step, t is the time step, o is the output gate, c_t is the hidden state at the current time step, f is the forget gate, c_{t-1} is the hidden state of the previous time step, i is the input gate, and g is the cell gate.
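The cell update above can be written out as a minimal NumPy sketch; the gate weight shapes and the concatenated `[x_t, h_{t-1}]` input layout are conventional assumptions consistent with the formulas:

```python
import numpy as np

def sigma(x):
    """Sigmoid: sigma(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))

def phi(x):
    """phi(x) = (e^x - e^-x) / (e^x + e^-x), i.e. tanh."""
    return np.tanh(x)

def lstm_cell(x_t, h_prev, c_prev, W, B):
    """One step of the LSTM cell described above.
    W: dict of (hidden, input+hidden) matrices keyed 'i','f','o','g';
    B: dict of bias vectors with the same keys."""
    z = np.concatenate([x_t, h_prev])
    i = sigma(W["i"] @ z + B["i"])   # input gate
    f = sigma(W["f"] @ z + B["f"])   # forget gate
    o = sigma(W["o"] @ z + B["o"])   # output gate
    g = phi(W["g"] @ z + B["g"])     # cell gate
    c_t = f * c_prev + i * g         # hidden (cell) state update
    h_t = o * phi(c_t)               # output state
    return h_t, c_t
```

Running this cell over the 25-frame feature-map sequence is exactly what the state-transition subnetwork does, with each h_t carrying forward the accumulated evidence.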

Optionally, in one embodiment of the present invention, any deep convolutional model may be used as the backbone network in the feature extraction module and the classification module.

It should be noted that the foregoing explanation of the embodiment of the small-target recognition method based on a spatio-temporal neural network also applies to this system, and is not repeated here.

The small-target recognition system based on a spatio-temporal neural network proposed according to the embodiments of the present invention solves the recognition performance degradation caused by existing single-frame target recognition. After the target's region is roughly locked, the visual capture device and computing resources are continuously concentrated on the suspicious target in that region, and the recognition confidence is gradually raised through continuous time-series image capture over a period of time. Meanwhile, as the model keeps running, some erroneous conclusions reached in the early stage are corrected, giving the model a degree of error-correction capability.

In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless otherwise specifically defined.

In the description of this specification, references to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic expressions of these terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict each other.

Although the embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (4)

1. A small-target recognition method based on a spatio-temporal neural network, characterized by comprising the following steps:
Step S1: acquiring the original blurred image at the current moment;
Step S2: preprocessing the original blurred image with a super-resolution algorithm to obtain a high-quality image sequence;
Step S3: performing, with a spatio-temporal attention mechanism, a logical subtraction between adjacent frames of the high-quality image sequence to capture and highlight suspicious regions;
letting the model's attention score Y for a given target be the inner product of the weight w and the feature map A:
Y = relu(Σ_k w_k A^k)  (1)
where A is the feature map, w is the neural-network model weight, Y is the attention distribution score of the model's inference process, relu is the rectified linear function, and ∂Y/∂A^k is the weight gradient of the model, specifically:
w_k = Σ_i Σ_j ∂Y/∂A^k_ij  (2)
in formula (2), w_k is the gradient-weighted sum over every feature element; combining formula (1) and formula (2) yields formula (3), the final form of the model's attention score for a fixed class:
Y = relu(Σ_k (Σ_i Σ_j ∂Y/∂A^k_ij) A^k)  (3)
Step S4: extracting the deep features in the suspicious regions to obtain a feature-map time series;
Step S5: feeding the feature-map time series into a mapping device with confidence output through an LSTM state-transition subnetwork to obtain a corrected feature-map time series, the LSTM state-transition subnetwork using LSTM, an important variant of the RNN recurrent neural network, as its main component, wherein a complete LSTM cell structure comprises an input gate, an output gate, a cell gate, and a forget gate, and the complete LSTM cell structure is:
i = σ(W_i·[x_t, h_{t-1}] + B_i), f = σ(W_f·[x_t, h_{t-1}] + B_f), o = σ(W_o·[x_t, h_{t-1}] + B_o), g = φ(W_g·[x_t, h_{t-1}] + B_g)
where i is the input gate, f is the forget gate, o is the output gate, g is the cell gate, the sigmoid function σ(x) = 1/(1 + e^(-x)), φ(x) = (e^x - e^(-x))/(e^x + e^(-x)), W is a weight matrix, B is a bias vector, and x_t and h_{t-1} are the current inputs;
the transition state being:
c_t = f ⊙ c_{t-1} + i ⊙ g, h_t = o ⊙ φ(c_t)
where h_t is the output state at the current time step, t is the time step, o is the output gate, c_t is the hidden state at the current time step, f is the forget gate, c_{t-1} is the hidden state of the previous time step, i is the input gate, and g is the cell gate;
Step S6: classifying the corrected feature-map time series with a classifier to obtain the final recognition result, the final recognition result being the target class and a confidence rate.
2. The small-target recognition method based on a spatio-temporal neural network according to claim 1, characterized in that any deep convolutional model is used as the backbone network in step S4 and step S6.
3. A small-target recognition system based on a spatio-temporal neural network, characterized by comprising an acquisition module, a super-resolution module, a spatio-temporal attention module, a feature extraction module, an LSTM state-transition subnetwork, and a classification module, wherein:
the acquisition module is configured to acquire the original blurred image at the current moment;
the super-resolution module is configured to preprocess the original blurred image to obtain a high-quality image sequence;
the spatio-temporal attention module is configured to perform a logical subtraction between adjacent frames of the high-quality image sequence to capture and highlight suspicious regions;
letting the model's attention score Y for a given target be the inner product of the weight w and the feature map A:
Y = relu(Σ_k w_k A^k)  (1)
where A is the feature map, w is the neural-network model weight, Y is the attention distribution score of the model's inference process, relu is the rectified linear function, and ∂Y/∂A^k is the weight gradient of the model, specifically:
w_k = Σ_i Σ_j ∂Y/∂A^k_ij  (2)
in formula (2), w_k is the gradient-weighted sum over every feature element; combining formula (1) and formula (2) yields formula (3), the final form of the model's attention score for a fixed class:
Y = relu(Σ_k (Σ_i Σ_j ∂Y/∂A^k_ij) A^k)  (3)
the feature extraction module is configured to extract the deep features in the suspicious regions to obtain a feature-map time series;
the LSTM state-transition subnetwork is configured to feed the feature-map time series into a mapping device with confidence output to obtain a corrected feature-map time series, the LSTM state-transition subnetwork using LSTM, an important variant of the RNN recurrent neural network, as its main component, wherein a complete LSTM cell structure comprises an input gate, an output gate, a cell gate, and a forget gate;
the complete LSTM cell structure is:
i = σ(W_i·[x_t, h_{t-1}] + B_i), f = σ(W_f·[x_t, h_{t-1}] + B_f), o = σ(W_o·[x_t, h_{t-1}] + B_o), g = φ(W_g·[x_t, h_{t-1}] + B_g)
where i is the input gate, f is the forget gate, o is the output gate, g is the cell gate, the sigmoid function σ(x) = 1/(1 + e^(-x)), φ(x) = (e^x - e^(-x))/(e^x + e^(-x)), W is a weight matrix, B is a bias vector, and x_t and h_{t-1} are the current inputs;
the transition state is:
c_t = f ⊙ c_{t-1} + i ⊙ g, h_t = o ⊙ φ(c_t)
where h_t is the output state at the current time step, t is the time step, o is the output gate, c_t is the hidden state at the current time step, f is the forget gate, c_{t-1} is the hidden state of the previous time step, i is the input gate, and g is the cell gate;
the classification module is configured to classify the corrected feature-map time series to obtain the final recognition result, the final recognition result being the class and a confidence rate.
4. The small-target recognition system based on a spatio-temporal neural network according to claim 3, characterized in that any deep convolutional model is used as the backbone network in the feature extraction module and the classification module.
CN202110319609.5A 2021-03-25 2021-03-25 Small target identification method and system based on space-time neural network Active CN113160050B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110319609.5A CN113160050B (en) 2021-03-25 2021-03-25 Small target identification method and system based on space-time neural network


Publications (2)

Publication Number Publication Date
CN113160050A CN113160050A (en) 2021-07-23
CN113160050B true CN113160050B (en) 2023-08-25

Family

ID=76884634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110319609.5A Active CN113160050B (en) 2021-03-25 2021-03-25 Small target identification method and system based on space-time neural network

Country Status (1)

Country Link
CN (1) CN113160050B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115758379B (en) * 2022-11-17 2025-07-11 中国科学院软件研究所 A cross-language compiler vulnerability mining method and device based on transfer learning
CN116091428B (en) * 2022-12-29 2023-09-01 国网电力空间技术有限公司 High-precision intelligent power transmission line inspection image tower dividing method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765394A (en) * 2018-05-21 2018-11-06 上海交通大学 Target identification method based on quality evaluation
WO2019039157A1 (en) * 2017-08-24 2019-02-28 日立オートモティブシステムズ株式会社 Device and method for identifying region including small object around vehicle
CN111402131A (en) * 2020-03-10 2020-07-10 北京师范大学 The acquisition method of super-resolution land cover classification map based on deep learning
CN111524135A (en) * 2020-05-11 2020-08-11 安徽继远软件有限公司 Image enhancement-based method and system for detecting defects of small hardware fittings of power transmission line
CN111832509A (en) * 2020-07-21 2020-10-27 中国人民解放军国防科技大学 A spatiotemporal attention mechanism for the detection of small and weak UAVs
CN112215119A (en) * 2020-10-08 2021-01-12 华中科技大学 A method, device and medium for small target recognition based on super-resolution reconstruction
CN112288778A (en) * 2020-10-29 2021-01-29 电子科技大学 Infrared small target detection method based on multi-frame regression depth network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9830709B2 (en) * 2016-03-11 2017-11-28 Qualcomm Incorporated Video analysis with convolutional attention recurrent neural networks
EP3622521A1 (en) * 2017-10-16 2020-03-18 Illumina, Inc. Deep convolutional neural networks for variant classification


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yuxin Peng et al., "Two-Stream Collaborative Learning With Spatial-Temporal Attention for Video Classification," IEEE Transactions on Circuits and Systems for Video Technology, 29(3), full text *

Also Published As

Publication number Publication date
CN113160050A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN111832513B (en) Real-time football target detection method based on neural network
CN109859190B (en) Target area detection method based on deep learning
EP4002161A1 (en) Image retrieval method and apparatus, storage medium, and device
Geng et al. Using deep learning in infrared images to enable human gesture recognition for autonomous vehicles
CN110309856A (en) Image classification method, the training method of neural network and device
CN112084866A (en) Target detection method based on improved YOLO v4 algorithm
EP4006777A1 (en) Image classification method and device
CN110490202A (en) Detection model training method, device, computer equipment and storage medium
CN113205002B (en) Low-definition face recognition method, device, equipment and medium for unlimited video monitoring
CN107330405A (en) Remote sensing images Aircraft Target Recognition based on convolutional neural networks
CN111754396A (en) Facial image processing method, device, computer equipment and storage medium
CN108334848A (en) A kind of small face identification method based on generation confrontation network
CN107945204A (en) A kind of Pixel-level portrait based on generation confrontation network scratches drawing method
CN108960086A (en) Based on the multi-pose human body target tracking method for generating confrontation network positive sample enhancing
CN109886128B (en) Face detection method under low resolution
CN109255375A (en) Panoramic picture method for checking object based on deep learning
CN107133955A (en) A kind of collaboration conspicuousness detection method combined at many levels
CN115082672A (en) Infrared image target detection method based on bounding box regression
CN107564022A (en) Saliency detection method based on Bayesian Fusion
CN109753897B (en) Behavior recognition method based on memory cell reinforcement-time sequence dynamic learning
CN109165698A (en) A kind of image classification recognition methods and its storage medium towards wisdom traffic
CN106407958A (en) Double-layer-cascade-based facial feature detection method
CN105740915A (en) Cooperation segmentation method fusing perception information
CN113160050B (en) Small target identification method and system based on space-time neural network
CN110826563A (en) Finger vein segmentation method and device based on neural network and probability map model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant