CN114821441A - Deep learning-based airport scene moving target identification method combined with ADS-B information - Google Patents
Deep learning-based airport scene moving target identification method combined with ADS-B information
- Publication number
- CN114821441A (application CN202210524626.7A)
- Authority
- CN
- China
- Prior art keywords
- network
- information
- ads
- loss function
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Pattern recognition: classification techniques
- G06N3/045 — Neural networks: combinations of networks
- G06N3/084 — Neural-network learning methods: backpropagation, e.g. using gradient descent
- G06T7/246 — Image analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T2207/10016 — Image acquisition modality: video; image sequence
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
- Y02T10/40 — Climate change mitigation technologies related to transportation: engine management systems
Abstract
Description
Technical Field

The present invention relates to the field of airport target recognition, and in particular to a deep learning-based method for recognizing moving targets on the airport surface in combination with ADS-B information.

Background Art

In recent years, global civil aviation passenger volume has kept growing, airport layouts have become increasingly complex, and airport surfaces have become ever more crowded. Visually monitoring moving targets on the ground has become steadily more difficult for controllers, and the safety risks of manual command and management have become increasingly apparent. To meet the demand for automated surface surveillance, monitoring systems that cover the entire airport surface with multiple cameras have been deployed at modern airports. Compared with other sensors, computer vision has a unique advantage: video conveys information in a more intuitive and faithful way, and the human eye can directly pick out the key information displayed in it. Intelligent airport applications based on computer vision have therefore gradually become the mainstream of automated surface surveillance. The video data acquired in this way is extremely rich, and intelligent applications such as intrusion alerts and collision warnings can be built on top of it. As the foundation of many video-based intelligent applications, change detection and object detection algorithms have made great progress in recent years, but they still show clear limitations when applied to airport surface surveillance.
The purpose of a change detection algorithm is to extract and mark the foreground objects whose spatial position changes in an image sequence or video. Traditional change detection algorithms mainly rely on background subtraction: an initial clean background image is estimated from a small number of input video frames, pixels are then segmented by comparing the estimated background with each input frame, and the background model is continuously updated. The ViBe method proposed three background-model update strategies: random replacement of background samples to represent both short-term and long-term history, a memoryless update policy, and a spatial diffusion policy that propagates background samples to neighboring pixels. These strategies have been widely adopted by recent state-of-the-art change detection techniques. Traditional methods, however, are unsupervised, and their performance depends on the quality of the background model; in airport scenes the background changes frequently due to illumination, shadows and similar factors, which degrades the detection results. In recent years, many supervised change detection techniques based on convolutional neural networks (CNNs) have been proposed. A CNN-based change detection algorithm encodes multi-scale features of the input image and decodes them with transposed convolutions, mapping the features to per-pixel foreground probabilities and deciding, pixel by pixel, whether each belongs to the foreground or the background. Although CNN-based change detection is far more accurate than traditional methods and achieves good accuracy on AGVS (an airport surface surveillance dataset), these methods are complex and slow, and can hardly serve real-time airport surface video surveillance.
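For context, the classical background-subtraction pipeline described above can be sketched in a few lines with OpenCV; this is an illustrative baseline only (the video path is a placeholder), not the method of the present invention:

```python
import cv2

# Classical background-subtraction baseline: maintain a background model,
# segment each frame against it, and keep updating the model.
cap = cv2.VideoCapture("airport_surface.mp4")  # placeholder input video
subtractor = cv2.createBackgroundSubtractorMOG2(history=30, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # apply() both segments the frame against the current background model
    # and updates the model, mirroring the pipeline described above.
    mask = subtractor.apply(frame)
    # MOG2 marks shadows with value 127; threshold to keep true motion only.
    _, motion = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    cv2.imshow("moving pixels", motion)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```

Exactly as noted above, the quality of such an unsupervised model degrades when illumination or shadows change the background frequently.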
As a classic task in computer vision, object detection has long been a research hotspot in the field. Its goal is to find objects of different geometries and assign an accurate label to each detected object. This is challenging because objects can appear in natural images at any scale, shape and position, and the appearance of objects within the same class can vary widely. Traditional object detection algorithms typically slide a window over the image, extract features for each window with descriptors such as SIFT or HOG, and then classify the extracted features with machine learning algorithms such as support vector machines, finally deciding whether a window contains an object of a given class. As a result, traditional detectors are computationally heavy and slow, and often produce multiple detections for the same object. Driven by the rapid development of deep learning, researchers have designed far more effective detectors, which fall into two-stage and one-stage methods. A two-stage detector first generates candidate regions that may contain objects and then further classifies and refines them to obtain the final result, whereas a one-stage detector skips the region-proposal step and produces results directly. The former is therefore slower but more accurate. Compared with pixel-level change detection, object detection only needs to locate the region of each target, which greatly reduces the computational cost. In the specific setting of airport surface surveillance, however, most aircraft are usually parked on the apron and only a few are moving. Only the moving targets are of interest, yet an object detector will report every target in the frame, which not only produces redundant information but also makes the displayed results less intuitive.
An airport surface differs from other scenes in that most civil aircraft carry an ADS-B (Automatic Dependent Surveillance-Broadcast) system. An aircraft equipped with this system broadcasts information about itself over a data link, including its position (four-dimensional coordinates: longitude, latitude, altitude and time), its speed, its call sign (identity), and other data. Some existing work already maps the four-dimensional position information in ADS-B messages, together with the camera position, into two-dimensional image coordinates. Since the ADS-B signal is information unique to moving aircraft targets, it can be exploited to propose a completely new strategy for recognizing moving targets on the airport surface that satisfies the special, real-time requirements of surface surveillance, enabling intelligent airport applications based on computer vision.
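As a rough illustration of such a mapping, the sketch below projects one decoded ADS-B position into pixel coordinates with a flat-earth ENU approximation and a pinhole camera model. All camera parameters and coordinates are hypothetical placeholders, and the ENU conversion is a simplification; this is not the calibration procedure of any cited work.

```python
import numpy as np

def geodetic_to_enu(lat, lon, alt, lat0, lon0, alt0):
    """Crude local East-North-Up approximation near the camera; adequate
    only for the short distances of an airport surface."""
    R = 6378137.0  # WGS-84 equatorial radius in metres
    east = np.radians(lon - lon0) * R * np.cos(np.radians(lat0))
    north = np.radians(lat - lat0) * R
    return np.array([east, north, alt - alt0])

def project_to_image(p_enu, K, R_cam, t_cam):
    """Pinhole projection: ENU point -> pixel coordinates."""
    p_cam = R_cam @ p_enu + t_cam  # world (ENU) -> camera frame
    uvw = K @ p_cam                # camera frame -> homogeneous pixels
    return uvw[:2] / uvw[2]        # normalise by depth

# Hypothetical intrinsics/extrinsics and coordinates, for illustration only.
K = np.array([[1200.0, 0.0, 960.0],
              [0.0, 1200.0, 540.0],
              [0.0, 0.0, 1.0]])
R_cam, t_cam = np.eye(3), np.zeros(3)
pixel = project_to_image(
    geodetic_to_enu(30.5801, 103.9470, 10.0,   # decoded aircraft lat/lon/alt
                    30.5800, 103.9460, 5.0),   # camera lat/lon/alt
    K, R_cam, t_cam)
print(pixel)
```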
Summary of the Invention

In view of the above shortcomings of the prior art, the present invention provides a deep learning-based method for recognizing moving targets on the airport surface in combination with ADS-B information.

To achieve the above object, the present invention adopts the following technical solution:

A deep learning-based method for recognizing moving targets on the airport surface in combination with ADS-B information, comprising the following steps:

S1. Decode the ADS-B information, and map the ADS-B data into the two-dimensional image using the three-dimensional data broadcast by the aircraft and the position of the camera;

S2. Annotate the surface surveillance video sequences in the airport surface surveillance dataset to obtain ground-truth bounding boxes, taking the first ten video sequences as the training set and the last ten as the test set;

S3. Construct a moving target recognition network, and use it to output an estimated background image;

S4. Feed the estimated background image obtained in step S3, together with the current frame, into the moving target recognition network to compute the category of each target candidate box and its boundary information;

S5. Compare the candidate-box categories and boundary information obtained in S4 with the ground-truth bounding boxes annotated in S2, compute a classification loss and a regression loss using the cross-entropy loss function and the smooth L1 distance, and take the sum of the two losses as the loss function of the network;

S6. Back-propagate the loss function of the moving target recognition network for training, obtain the trained network, and use it to recognize moving targets on the airport surface.
Further, the moving target recognition network constructed in S3 comprises a cascaded motion judgment module and target detection module, where the inputs of the network are the current frame, the position and identity information decoded from the current frame's ADS-B messages, and the 30 historical frames preceding the current frame.
Further, the estimated background image in S3 is computed as follows:

The most recent historical frames are fed into the motion judgment module and passed successively through multiple multi-scale receptive feature blocks, each of which captures the maximum response over receptive fields of size 1×1, 3×3 and 5×5, yielding the estimated background image.
Further, the target detection module comprises, in cascade, a backbone network, a feature pyramid network, and regression and classification networks. The backbone is a ResNet, which computes convolutional features over the whole input image; the feature pyramid network strengthens the target detection module through a top-down pathway and lateral connections; the regression and classification networks are both multi-scale convolutional neural networks.
Further, the classification loss function and the regression loss function in S5 are expressed respectively as

$$L_{cls}(p_i, p_i^*) = -\left[ p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \right]$$

$$L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$$

where $t_i = \{t_x, t_y, t_w, t_h\}$ denotes the coordinates of the anchor and, during training, the predicted offsets; $t_i^*$ is the ground-truth label of the same dimension, i.e. the ground-truth offsets during training; $p_i$ is the probability that the anchor is a target, $p_i^*$ is its ground-truth label, and $R$ is the smooth $L_1$ function; $L_{cls}$ is the classification loss function and $L_{reg}$ is the regression loss function.
The present invention has the following beneficial effects:

1. The method solves the problem that object detection algorithms cannot distinguish moving targets from stationary ones, and is therefore better suited to airport surface surveillance scenes.

2. A cascade of multi-scale receptive feature blocks is used for background modeling, which identifies moving targets more accurately and yields a precise background image.

3. The aircraft-specific ADS-B technology is exploited to improve aircraft detection accuracy, providing airports with a more effective surface surveillance solution.
Brief Description of the Drawings

Figure 1 is a flow diagram of the deep learning-based airport surface moving target recognition method combined with ADS-B information according to the present invention.

Figure 2 shows the network structure of the motion estimation module of the present invention.

Figure 3 shows the network structure of the target detection module of the present invention.

Figure 4 compares the output of the motion estimation module of the present invention with the background estimated by the temporal-histogram method.

Figure 5 compares the moving target recognition algorithm implemented by the present invention with the RCNN object detection algorithm.

Figure 6 shows the mAP results of the moving target recognition algorithm implemented by the present invention and of the RCNN object detection algorithm on part of the video sequences of the AGVS dataset.
Detailed Description of Embodiments

Specific embodiments of the present invention are described below to help those skilled in the art understand the invention. It should be clear, however, that the invention is not limited to the scope of these specific embodiments. For those of ordinary skill in the art, various changes falling within the spirit and scope of the invention as defined and determined by the appended claims will be apparent, and all inventions and creations that make use of the inventive concept are within the scope of protection.

A deep learning-based method for recognizing moving targets on the airport surface in combination with ADS-B information, as shown in Figure 1, comprises the following steps:

S1. Decode the ADS-B information, and map the ADS-B data into the two-dimensional image using the three-dimensional data broadcast by the aircraft and the position of the camera;

S2. Annotate the surface surveillance video sequences in the airport surface surveillance dataset to obtain ground-truth bounding boxes, taking the first ten video sequences as the training set and the last ten as the test set;
Datasets play a crucial role in deep learning-based algorithms. We therefore selected several airport surface surveillance video sequences from AGVS (an airport surface surveillance dataset). In this embodiment, the moving aircraft targets in every frame were annotated by hand with the labelme tool; the first ten video sequences were set as the training set and the last ten as the test set.
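Since labelme stores each frame's annotations as a JSON file, loading the hand-labeled boxes can be sketched as follows; the directory layout and sequence naming are assumptions for illustration.

```python
import json
from pathlib import Path

def load_labelme_boxes(json_path: Path):
    """Read rectangle annotations from a labelme JSON file into
    (label, x_min, y_min, x_max, y_max) tuples."""
    with open(json_path, "r", encoding="utf-8") as f:
        ann = json.load(f)
    boxes = []
    for shape in ann["shapes"]:
        if shape["shape_type"] != "rectangle":
            continue
        (x1, y1), (x2, y2) = shape["points"]
        boxes.append((shape["label"],
                      min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)))
    return boxes

# Hypothetical layout: first ten sequences train, last ten test.
train_dirs = [Path(f"AGVS/seq_{i:02d}") for i in range(1, 11)]
test_dirs = [Path(f"AGVS/seq_{i:02d}") for i in range(11, 21)]
train_annotations = {str(p): load_labelme_boxes(p)
                     for d in train_dirs for p in sorted(d.glob("*.json"))}
```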
S3. Construct a moving target recognition network, and use it to output an estimated background image;

Based on the basic algorithmic frameworks of the proposed motion judgment module and target detection module, we cascade the two to obtain the final moving target recognition network. The inputs of the network are the current frame, the position and identity information decoded from the current frame's ADS-B messages, and the 30 historical frames preceding the current frame. The 30 historical frames are fed into the motion estimation network shown in Figure 2, whose output is the estimated background image; Figure 4 visualizes this output against the temporal-histogram baseline, where both approaches use 30 historical frames to obtain the background image of the current frame.
Background estimation based on temporal histograms is widely used in change detection algorithms and works as follows. First, at each pixel position a temporal histogram is accumulated:

$$H(m, n, l) = \sum_{t=1}^{T} \delta\big(I(m, n, t) - l\big),$$

where $I(m, n, t)$ denotes the pixel value at position $(m, n)$ in frame $t$ and $\delta(\cdot)$ equals 1 when its argument is 0 and 0 otherwise. From the per-pixel temporal histogram, the background intensity at position $(m, n)$ is obtained as

$$B(m, n) = \arg\max_{l} H(m, n, l),$$

where $\arg\max(\cdot)$ selects the pixel value $l$ with the highest count in the histogram at $(m, n)$. Figure 4 shows the background estimated by the temporal-histogram method. However, because moving targets on an airport surface are slow and frequently alternate between moving and stopping, the temporal-histogram method does not produce good results. We therefore propose a motion estimation network based on convolutional neural networks, whose structure is shown in Figure 2. The network takes the most recent historical frames as input and passes them successively through multiple multi-scale receptive feature blocks, each of which captures the maximum response over receptive fields of size 1×1, 3×3 and 5×5, finally producing the estimated background feature map. This lets us gather background statistics while keeping the network adaptable to differently changing scenes. The visual comparison in Figure 4 shows that the temporal-histogram method suffers from ghosting because aircraft targets move slowly, whereas the proposed motion estimation network recovers the background more accurately.
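The temporal-histogram baseline just described reduces to a per-pixel mode over the frame history, e.g.:

```python
import numpy as np

def temporal_histogram_background(frames: np.ndarray) -> np.ndarray:
    """B(m, n) = argmax_l H(m, n, l): the most frequent intensity of
    each pixel over T historical frames.

    frames: uint8 array of shape (T, H, W), e.g. the last 30 frames.
    """
    T, H, W = frames.shape
    # H(m, n, l): per-pixel counts of every intensity l in 0..255.
    hist = np.zeros((H, W, 256), dtype=np.int32)
    rows, cols = np.indices((H, W))
    for t in range(T):
        np.add.at(hist, (rows, cols, frames[t]), 1)
    return hist.argmax(axis=-1).astype(np.uint8)

# Toy demonstration on random frames (a real run would use video frames).
history = np.random.randint(0, 256, size=(30, 120, 160), dtype=np.uint8)
background = temporal_histogram_background(history)  # (120, 160) estimate
```

A slow aircraft that occupies the same pixels in most of the 30 frames wins the per-pixel vote, which is exactly the ghosting failure mode noted above.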
The target detection module consists of a backbone network, a feature pyramid network, and regression and classification networks, as shown in Figure 3. We adopt a ResNet as the backbone, which computes the convolutional feature maps of the entire input image and supplies high-dimensional image features to the feature pyramid network. The feature pyramid network (FPN) augments a standard convolutional network with a top-down pathway and lateral connections and can detect objects at different scales. At each level of the pyramid, a 256-channel input feature map is taken and 9 anchors are set per location. After the feature maps are obtained, the ADS-B signal is first used as guidance to select a probability region (the red region in Figure 3); the search on the feature maps is restricted to this region, and the target candidate boxes are finally generated by regression and classification.
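A skeleton of this backbone-plus-FPN arrangement can be assembled from torchvision building blocks; the sketch below assumes a recent torchvision and a ResNet-50 variant, while the description itself only fixes a ResNet backbone, 256-channel pyramid maps and 9 anchors per location (the concrete anchor sizes and aspect ratios here are assumptions).

```python
import torch
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNet backbone + feature pyramid; every pyramid level has 256 channels.
backbone = resnet_fpn_backbone(backbone_name="resnet50", weights=None)

# 3 sizes x 3 aspect ratios = 9 anchors per location on each of the
# five pyramid levels.
anchor_gen = AnchorGenerator(
    sizes=((32, 64, 128),) * 5,
    aspect_ratios=((0.5, 1.0, 2.0),) * 5,
)
print(anchor_gen.num_anchors_per_location())  # [9, 9, 9, 9, 9]

frame = torch.randn(1, 3, 512, 512)  # toy current frame
features = backbone(frame)           # OrderedDict of pyramid feature maps
for name, fmap in features.items():
    print(name, tuple(fmap.shape))   # each level has 256 channels
```

In the invention the estimated background is fed in alongside the current frame and the ADS-B-derived probability region restricts where candidate boxes are searched; neither detail is shown in this minimal sketch.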
S4. Feed the estimated background image obtained in step S3, together with the current frame, into the moving target recognition network to compute the category of each target candidate box and its boundary information;

S5. Compare the candidate-box categories and boundary information obtained in S4 with the ground-truth bounding boxes annotated in S2, compute a classification loss and a regression loss using the cross-entropy loss function and the smooth L1 distance, and take the sum of the two losses as the loss function of the network;

The estimated background image and the current frame serve as the input of the backbone and feature pyramid networks, from which the detection network computes the category of each candidate box and its bounding-box information (the coordinates of the top-left corner together with the width and height). These parameters are compared with the annotated ground-truth boxes, and two losses are computed from the cross-entropy loss function and the smooth L1 distance: the classification loss, denoted $L_{cls}$, and the regression loss, denoted $L_{reg}$.
Here $t_i = \{t_x, t_y, t_w, t_h\}$ denotes the coordinates of the anchor and, during training, the predicted offsets; $t_i^*$ is a vector of the same dimension as $t_i$, representing, during training, the offsets relative to the ground truth; $R$ denotes the smooth $L_1$ function; $p_i$ is the probability that the anchor is a target, and $p_i^*$ is the ground-truth label. The loss function of the network is the sum of the classification and regression losses:

$$L = L_{cls} + L_{reg}.$$
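In code, this combined objective is simply the cross-entropy classification term plus a smooth L1 regression term over positive anchors; a minimal sketch:

```python
import torch
import torch.nn.functional as F

def detection_loss(p, p_star, t, t_star):
    """L = L_cls + L_reg as defined above.

    p:      (N,) predicted probability that each anchor is a target
    p_star: (N,) ground-truth labels (1 = target, 0 = background)
    t:      (N, 4) predicted offsets (t_x, t_y, t_w, t_h)
    t_star: (N, 4) ground-truth offsets
    """
    # Cross-entropy classification loss over all anchors.
    l_cls = F.binary_cross_entropy(p, p_star.float())
    # Smooth-L1 regression loss, counted only for positive anchors.
    pos = p_star > 0
    l_reg = F.smooth_l1_loss(t[pos], t_star[pos]) if pos.any() else p.new_zeros(())
    return l_cls + l_reg

# Toy usage with random anchors.
p = torch.sigmoid(torch.randn(12))
p_star = torch.randint(0, 2, (12,))
t, t_star = torch.randn(12, 4), torch.randn(12, 4)
print(detection_loss(p, p_star, t, t_star))
```

Restricting the regression term to positive anchors follows common detector practice; the description above does not spell out this detail, so treat it as an assumption.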
S6. Back-propagate the loss function of the moving target recognition network for training, obtain the trained network, and use it to recognize moving targets on the airport surface.

In this embodiment the back-propagation of the loss also passes through the motion estimation module. We use the Adam optimizer with an initial learning rate of $1 \times 10^{-5}$, and the prediction threshold for the final detections is 0.5. Figure 5 compares the results of our method with those of the RCNN object detection algorithm on the same frame: besides its relatively better detection accuracy, our algorithm can also correctly judge whether a target is moving or stationary.
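The end-to-end training arrangement might look like the following; the two stand-in modules and the placeholder loss exist only to make the sketch runnable, while the learning rate and detection threshold are the values stated above.

```python
import torch
import torch.nn as nn

# Stand-ins for the motion-estimation and detection networks.
motion_module = nn.Conv2d(30, 1, kernel_size=3, padding=1)
detector = nn.Conv2d(4, 2, kernel_size=3, padding=1)  # frame (3ch) + background (1ch)

params = list(motion_module.parameters()) + list(detector.parameters())
optimizer = torch.optim.Adam(params, lr=1e-5)  # initial learning rate 1e-5

for _ in range(3):  # toy batches
    history = torch.randn(1, 30, 64, 64)  # 30 historical frames
    frame = torch.randn(1, 3, 64, 64)     # current frame
    optimizer.zero_grad()
    background = motion_module(history)   # estimated background image
    logits = detector(torch.cat([frame, background], dim=1))
    loss = logits.square().mean()         # placeholder for L_cls + L_reg
    loss.backward()   # gradients flow back into the motion module too
    optimizer.step()

SCORE_THRESHOLD = 0.5  # final detection confidence threshold
```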
The number of historical frames best suited to airport scenes was chosen by analyzing the experimental results. We use mAP, the standard evaluation metric for object detection, to measure the accuracy of the detection results. Since the present invention is the first to propose the concept of moving target recognition and no directly comparable algorithm exists, the object detection algorithm RCNN is chosen as the baseline. The experimental results in Figure 6 show that the average precision is highest when 30 historical frames are used, and that RCNN, being unable to distinguish motion from stillness, is less accurate.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor create means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in that memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Specific examples have been used herein to explain the principles and implementations of the present invention; the description of the above embodiments is only intended to help readers understand the method of the invention and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementations and the scope of application; in summary, the contents of this specification should not be construed as limiting the invention.

Those of ordinary skill in the art will appreciate that the embodiments described here are intended to help readers understand the principles of the invention, and the scope of protection is not limited to these particular statements and embodiments. Based on the technical teachings disclosed herein, those skilled in the art can make various other specific modifications and combinations without departing from the essence of the invention, and such modifications and combinations still fall within the scope of protection of the invention.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210524626.7A CN114821441B (en) | 2022-05-13 | 2022-05-13 | A deep learning-based method for identifying moving targets at airports combined with ADS-B information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210524626.7A CN114821441B (en) | 2022-05-13 | 2022-05-13 | A deep learning-based method for identifying moving targets at airports combined with ADS-B information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114821441A true CN114821441A (en) | 2022-07-29 |
CN114821441B CN114821441B (en) | 2025-02-07 |
Family
ID=82514833
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210524626.7A Active CN114821441B (en) | 2022-05-13 | 2022-05-13 | A deep learning-based method for identifying moving targets at airports combined with ADS-B information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114821441B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977840A (en) * | 2019-03-20 | 2019-07-05 | 四川川大智胜软件股份有限公司 | A kind of airport scene monitoring method based on deep learning |
CN112488061A (en) * | 2020-12-18 | 2021-03-12 | 电子科技大学 | Multi-aircraft detection and tracking method combined with ADS-B information |
CN113792631A (en) * | 2021-08-31 | 2021-12-14 | 电子科技大学 | A method for aircraft detection and tracking based on multi-scale adaptation and edge domain attention |
Non-Patent Citations (1)
Title |
---|
Yan Kaiyun; Wu Honggang; Yang Fan; Wang Kai; Zhang Xiang: "A tracking-based method for fusing civil aviation ADS-B data and video data", 12th China Intelligent Transportation Systems Annual Conference, 22 November 2017 (2017-11-22) *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115546680A (en) * | 2022-09-15 | 2022-12-30 | 安克创新科技股份有限公司 | Moving target detection method and device, storage medium and electronic equipment |
CN115797822A (en) * | 2022-10-10 | 2023-03-14 | 南京航空航天大学 | A method and system for automatic identification of key time nodes in an airport scene |
CN115797822B (en) * | 2022-10-10 | 2023-11-14 | 南京航空航天大学 | Airport scene key time node automatic identification method and system |
CN118379351A (en) * | 2024-05-09 | 2024-07-23 | 中国民用航空总局第二研究所 | Aircraft visual surveillance method, system and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN114821441B (en) | 2025-02-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020173226A1 (en) | Spatial-temporal behavior detection method | |
CN108830171B (en) | Intelligent logistics warehouse guide line visual detection method based on deep learning | |
CN105930868B (en) | A low-resolution airport object detection method based on hierarchical reinforcement learning | |
CN103824070B (en) | A kind of rapid pedestrian detection method based on computer vision | |
US9652863B2 (en) | Multi-mode video event indexing | |
CN103971386B (en) | A kind of foreground detection method under dynamic background scene | |
CN111680655A (en) | Video target detection method for aerial images of unmanned aerial vehicle | |
CN114821441B (en) | A deep learning-based method for identifying moving targets at airports combined with ADS-B information | |
CN110781836A (en) | Human body recognition method and device, computer equipment and storage medium | |
CN117949942B (en) | Target tracking method and system based on fusion of radar data and video data | |
CN108460403A (en) | The object detection method and system of multi-scale feature fusion in a kind of image | |
CN110874592A (en) | Forest fire smoke image detection method based on total bounded variation | |
CN101017573A (en) | Method for detecting and identifying moving target based on video monitoring | |
CN111881749A (en) | Bidirectional pedestrian flow statistical method based on RGB-D multi-modal data | |
CN104601964A (en) | Non-overlap vision field trans-camera indoor pedestrian target tracking method and non-overlap vision field trans-camera indoor pedestrian target tracking system | |
Zhang et al. | Coarse-to-fine object detection in unmanned aerial vehicle imagery using lightweight convolutional neural network and deep motion saliency | |
CN111626090B (en) | A Moving Object Detection Method Based on Deep Frame Difference Convolutional Neural Network | |
Xing et al. | Traffic sign recognition using guided image filtering | |
Naufal et al. | Preprocessed mask RCNN for parking space detection in smart parking systems | |
CN112069997B (en) | A DenseHR-Net-based UAV autonomous landing target extraction method and device | |
CN105335701A (en) | Pedestrian detection method based on HOG and D-S evidence theory multi-information fusion | |
Jeyabharathi et al. | Vehicle tracking and speed measurement system (VTSM) based on novel feature descriptor: diagonal hexadecimal pattern (DHP) | |
Ghahremannezhad et al. | Automatic road detection in traffic videos | |
KR20230040480A (en) | Apparatus for Tracking Multiple Object Using Deep Learning | |
CN110688512A (en) | Pedestrian image search algorithm based on PTGAN region gap and depth neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||