CN118570455B

CN118570455B - An anti-interference recognition method based on collaborative interference segmentation and target detection

Info

Publication number: CN118570455B
Application number: CN202411060233.0A
Authority: CN
Inventors: 张鸿; 李波; 刘雨腾; 刘偲; 韦星星; 李磊; 姜吉祥
Original assignee: Beihang University; Beijing Electromechanical Engineering Research Institute
Current assignee: Beihang University; Beijing Electromechanical Engineering Research Institute
Priority date: 2024-08-05
Filing date: 2024-08-05
Publication date: 2024-09-27
Anticipated expiration: 2044-08-05
Also published as: CN118570455A

Abstract

The present invention discloses an anti-interference recognition method for coordinated interference segmentation and target detection, which relates to the field of computer vision technology. Interference segmentation and target detection are coupled to design an anti-interference recognition model, which includes: a multi-scale feature extraction network, a multi-directional target detection module and an interference segmentation recognition module. Multi-scale features are extracted from input images, and features are extracted using a shared backbone network. The target position and the interference position are predicted collaboratively. The anti-interference recognition model is trained, and image recognition is performed using the trained model to achieve target detection. The present invention has high detection speed and precision, a small number of model parameters, high compatibility, and can effectively improve the anti-interference ability of the model and improve detection performance in complex scenarios.

Description

An anti-interference recognition method based on collaborative interference segmentation and target detection

技术领域Technical Field

本发明涉及计算机视觉技术领域，更具体的说是涉及一种干扰分割与目标检测协同的抗干扰识别方法。The present invention relates to the field of computer vision technology, and more specifically to an anti-interference recognition method that collaborates with interference segmentation and target detection.

背景技术Background Art

目前，目标检测技术已经在遥感侦查、国防安全领域得到了广泛的应用，红外目标检测是遥感目标检测的核心内容之一，对于实现全天时全天候工作有重要作用，能够提高检测能力，减轻工作时段要求，因而受到了广泛的关注。由于真实遥感场景中可能存在多种自然干扰、人为干扰，这些干扰的几何特征与检测目标存在很大的相似性，会造成目标检测识别算法产生大量的误检漏检，因此复杂场景下准确的红外目标检测尤其困难。At present, target detection technology has been widely used in the fields of remote sensing reconnaissance and national defense security. Infrared target detection is one of the core contents of remote sensing target detection. It plays an important role in achieving all-day and all-weather work, can improve detection capabilities, and reduce working hours. Therefore, it has received widespread attention. Since there may be a variety of natural interference and human interference in real remote sensing scenes, the geometric features of these interferences are very similar to the detection targets, which will cause a large number of false detections and missed detections in the target detection and recognition algorithm. Therefore, accurate infrared target detection in complex scenes is particularly difficult.

目标检测任务中，常见的有对大范围海域进行扫描式搜寻，利用各种先进的算法完成真实目标位置的及其干扰状况进行分析。常规的一些方案在感知环境简单单一的情况下是有效且可行的。然而真实环境中会面临大量较强的自然干扰以及人为干扰，同时同一片海域仅做一次检索，对实际的效果有更高的要求。若此种环境下，仍然以检测目标为唯一任务，势必会造成大量的误检以及漏检情况发生，因为干扰的几何特征与打击目标存在较大的相似性，增加了神经网络识别的难度。In target detection tasks, it is common to conduct scanning searches over a large area of sea, and use various advanced algorithms to complete the analysis of the real target location and its interference conditions. Some conventional solutions are effective and feasible when the perception environment is simple and single. However, in the real environment, there will be a lot of strong natural interference and human interference, and at the same time, only one search is performed in the same sea area, which has higher requirements for the actual effect. If in this environment, target detection is still the only task, it is bound to cause a large number of false detections and missed detections, because the geometric characteristics of the interference are very similar to the target, which increases the difficulty of neural network recognition.

抑制干扰的图像增强预处理方法由于具有实现容易、原理性强等优点而被经常使用，该方法利用干扰特点对图像进行处理，抑制背景噪声，增强目标和背景的对比度，提高目标背景信噪比。图像增强预处理包括基于场景的非均匀矫正、基于图像灰度分布的动态范围调整以及基于空域滤波的噪声抑制。基于场景的非均匀校正方法常用时域高通滤波算法、卡尔曼滤波算法和神经网络法等，但是这些方法有一定限制条件，方法的算法计算量高，在多数的硬件系统实现困难，实时性较差，同时针对多种形态的人为、自然干扰适应性差，过分依赖现有数据涵盖范围。基于图像灰度分布的动态范围调整主要针对图像灰度进行变换，通过线性或非线性变换调整图像直方图分布，代表方法有自适应直方图均衡化图像增强算法，但该类方法存在全局处理强，局部处理弱问题，针对局部复杂干扰抑制能力差。基于空域滤波的噪声抑制通过空域滤波降低图像噪声，包括中值滤波、双边滤波、二维最小均方滤波等，此类方法思路明了，计算简单，但是针对多种干扰以及图像对比度较低情况结果较差。针对遥感图像中干扰问题，仅仅从图像处理层面抑制干扰难以解决实际场景中出现的多种多样复杂情景，实现干扰的精准感知从而提升复杂场景中检测能力。Image enhancement preprocessing methods that suppress interference are often used because of their advantages such as easy implementation and strong principle. This method uses the characteristics of interference to process images, suppress background noise, enhance the contrast between the target and the background, and improve the target-background signal-to-noise ratio. Image enhancement preprocessing includes scene-based non-uniform correction, dynamic range adjustment based on image grayscale distribution, and noise suppression based on spatial filtering. Scene-based non-uniform correction methods often use time domain high-pass filtering algorithms, Kalman filtering algorithms, and neural network methods, but these methods have certain restrictions. The algorithm calculation amount of the method is high, it is difficult to implement in most hardware systems, and the real-time performance is poor. At the same time, it has poor adaptability to various forms of artificial and natural interference and is overly dependent on the coverage of existing data. Dynamic range adjustment based on image grayscale distribution mainly transforms the image grayscale and adjusts the image histogram distribution through linear or nonlinear transformation. Representative methods include adaptive histogram equalization image enhancement algorithms, but this type of method has the problem of strong global processing and weak local processing, and poor ability to suppress local complex interference. Noise suppression based on spatial filtering reduces image noise through spatial filtering, including median filtering, bilateral filtering, two-dimensional minimum mean square filtering, etc. This method has clear ideas and simple calculations, but the results are poor for various interferences and low image contrast. For the problem of interference in remote sensing images, it is difficult to solve the various complex scenarios that appear in actual scenes by simply suppressing interference at the image processing level, and to achieve accurate perception of interference and thus improve detection capabilities in complex scenes.

现有的红外目标检测方法存在的另一个问题是往往仅在目标检测层面进行处理，对场景中复杂情况的感知不够，复杂场景的干扰理解层次较低，因此很难在复杂干扰场景中取得好的效果。在干扰场景下的目标检测方法也多着眼于解决复杂干扰下目标特征微弱的问题，缺乏有效的干扰感知机制。对人类理解而言，复杂场景中的目标检测，需要先对场景中干扰进行确认，了解目标当前被遮挡或者被干扰的状态，从而更好的建立对复杂干扰场景中目标的识别。通过对场景做进一步感知，为目标检测提供更有效的上下文信息，能够提高复杂场景下的检测能力。Another problem with existing infrared target detection methods is that they often only process at the target detection level, and lack the perception of complex situations in the scene. The level of interference understanding in complex scenes is low, so it is difficult to achieve good results in complex interference scenes. Target detection methods in interference scenes also focus on solving the problem of weak target features under complex interference, and lack an effective interference perception mechanism. For human understanding, target detection in complex scenes requires first confirming the interference in the scene and understanding the current state of the target being occluded or interfered, so as to better establish the recognition of targets in complex interference scenes. By further perceiving the scene and providing more effective contextual information for target detection, the detection capability in complex scenes can be improved.

因此，如何提高复杂场景下的目标检测准确性是本领域技术人员亟需解决的问题。Therefore, how to improve the accuracy of target detection in complex scenarios is an urgent problem that technical personnel in this field need to solve.

发明内容Summary of the invention

有鉴于此，本发明提供了一种干扰分割与目标检测协同的抗干扰识别方法，通过将干扰识分割与目标检测耦合设计抗干扰识别模型，在红外复杂环境下利用分割方法对场景干扰进行检测，并根据对环境中干扰感知以实现复杂实际场景中精准目标检测的方法。In view of this, the present invention provides an anti-interference recognition method that coordinates interference segmentation and target detection. The anti-interference recognition model is designed by coupling interference segmentation with target detection. The segmentation method is used to detect scene interference in a complex infrared environment, and the method is based on the perception of interference in the environment to achieve accurate target detection in complex actual scenes.

为了实现上述目的，本发明采用如下技术方案：In order to achieve the above object, the present invention adopts the following technical solution:

一种干扰分割与目标检测协同的抗干扰识别方法，包括以下步骤：An anti-interference recognition method for coordinated interference segmentation and target detection comprises the following steps:

步骤1：构建抗干扰识别模块；抗干扰识别模型包括依次连接的多尺度特征提取网络、目标检测模块和干扰分割模块；多尺度特征提取网络提取目标特征；干扰分割模块根据目标特征预测干扰信息；目标检测模块根据目标特征预测目标信息；Step 1: Construct an anti-interference recognition module; the anti-interference recognition model includes a multi-scale feature extraction network, a target detection module and an interference segmentation module connected in sequence; the multi-scale feature extraction network extracts target features; the interference segmentation module predicts interference information based on target features; the target detection module predicts target information based on target features;

步骤2：采集训练数据集对抗干扰识别模型进行训练，获得训练后的抗干扰识别模型；训练过程中利用干扰信息约束目标检测模块预测目标信息；Step 2: Collect training data sets to train the anti-interference recognition model to obtain the trained anti-interference recognition model; during the training process, use the interference information to constrain the target detection module to predict the target information;

步骤3：采集待识别图像，输入至训练后的抗干扰识别模型，获得识别结果。Step 3: Collect the image to be recognized and input it into the trained anti-interference recognition model to obtain the recognition result.

优选的，多尺度特征提取网络包括五层卷积层、四层下采样层、五层激活层、空间感受野合并单元和两层上采样层；前四层卷积层之后分别依次设置一层激活层和一层下采样层；第五层卷积层之后依次设置一层激活层和空间感受野合并单元；Preferably, the multi-scale feature extraction network includes five convolutional layers, four downsampling layers, five activation layers, a spatial receptive field merging unit and two upsampling layers; one activation layer and one downsampling layer are sequentially arranged after the first four convolutional layers; one activation layer and one spatial receptive field merging unit are sequentially arranged after the fifth convolutional layer;

每一层卷积层提取一层特征，提取的特征经过激活层在下采样层进行下采样；空间感受野合并单元对第五卷积层提取的经过第五激活层输出的第五层特征T5进行处理，得到第五层目标特征P5，获得不同感受野信息；Each convolutional layer extracts a layer of features, and the extracted features are downsampled in the downsampling layer after passing through the activation layer; the spatial receptive field merging unit processes the fifth-layer features T5 extracted by the fifth convolutional layer and output by the fifth activation layer to obtain the fifth-layer target features P5, and obtain different receptive field information;

对第五层目标特征P5通过第一层上采样层采用最近邻插值算法进行尺度放大，获得第五层放大特征P5’，将第五层放大特征P5’与第四层卷积层提取的依次经过第四层激活层、第四层下采样层输出的第四层特征T4采用通道处连接操作进行融合，获得第四层目标特征P4；The fifth-layer target feature P5 is scaled up by the nearest neighbor interpolation algorithm through the first upsampling layer to obtain the fifth-layer enlarged feature P5', and the fifth-layer enlarged feature P5' is fused with the fourth-layer feature T4 extracted by the fourth convolution layer and output by the fourth activation layer and the fourth downsampling layer by the channel connection operation to obtain the fourth-layer target feature P4;

对第四层目标特征P4通过第二层上采样层采用最近邻插值算法进行尺度放大，获得第四层放大特征P4’，将第四层放大特征P4’与第三层卷积层提取的依次经过第三层激活层、第三层下采样层输出的第三层特征T3采用通道处连接操作进行融合，获得第三层目标特征P3，完成特征提取。The fourth-layer target feature P4 is scaled up by the nearest neighbor interpolation algorithm through the second upsampling layer to obtain the fourth-layer enlarged feature P4'. The fourth-layer enlarged feature P4' is fused with the third-layer feature T3 extracted by the third convolution layer and output by the third activation layer and the third downsampling layer using the channel connection operation to obtain the third-layer target feature P3, thus completing the feature extraction.

优选的，多尺度特征提取网络中五层卷积层的通道数依次为32、64、128、256、512；激活层采用LeakyReLU激活函数。Preferably, the number of channels of the five convolutional layers in the multi-scale feature extraction network is 32, 64, 128, 256, and 512 respectively; and the activation layer adopts the LeakyReLU activation function.

优选的，空间感受野合并单元包括依次连接的第一卷积核、最大池化层和第二卷积核；第一卷积核和第二卷积核大小均为1*1，用于建立隐式非线性映射，最大池化层包括3种尺寸，用于获取不同感受野信息。能够有效整合不同尺度下特征中的信息，有效提取当前尺寸特征图中不同尺寸目标的信息。Preferably, the spatial receptive field merging unit includes a first convolution kernel, a maximum pooling layer, and a second convolution kernel connected in sequence; the first convolution kernel and the second convolution kernel are both 1*1 in size, which are used to establish implicit nonlinear mapping, and the maximum pooling layer includes 3 sizes, which are used to obtain different receptive field information. It can effectively integrate information in features at different scales and effectively extract information of targets of different sizes in the current size feature map.

优选的，目标检测模块包括两层卷积层和预测模块，第三层目标特征P3输入到第一层卷积层中，获得的输出特征F1与第四层目标特征P4采用通道处连接操作进行融合，获得融合特征；融合特征输入到第二层卷积层中，获得的输出特征F2与第五层目标特征P5采用通道处连接操作进行融合，获得输出特征F3；输出特征F1、输出特征F2和输出特征F3在预测模块进行预测，获得的目标信息包括目标位置、目标大小、目标方向、目标类别和目标置信度；预测模块采用两层卷积层，经过两次卷积在第二个卷积层获得目标位置、目标大小、目标方向、目标类别和目标置信度。Preferably, the target detection module includes two convolutional layers and a prediction module, the third layer target feature P3 is input into the first layer convolutional layer, the obtained output feature F1 is fused with the fourth layer target feature P4 by using the channel connection operation to obtain the fused feature; the fused feature is input into the second layer convolutional layer, the obtained output feature F2 is fused with the fifth layer target feature P5 by using the channel connection operation to obtain the output feature F3; the output features F1, F2 and F3 are predicted in the prediction module, and the obtained target information includes target position, target size, target direction, target category and target confidence; the prediction module uses two layers of convolutional layers, and the target position, target size, target direction, target category and target confidence are obtained in the second convolutional layer after two convolutions.

优选的，目标检测模块统一的对目标类别、大小、方向、位置、置信度进行预测；同时目标检测模块接收三个不同尺度的特征输入，分别是处理后的第三层特征、第四层特征和第五层特征，通道数分别为128、256和512，最后输出的通道数为当前尺寸特征下锚点个数*（类别数+方向数+5），其中5代表表述目标位置、大小、置信度所需要的数据个数。Preferably, the target detection module uniformly predicts the target category, size, direction, position, and confidence; at the same time, the target detection module receives feature inputs of three different scales, which are the processed third-layer features, fourth-layer features, and fifth-layer features, with the number of channels being 128, 256, and 512, respectively. The number of channels outputted at last is the number of anchor points under the current size feature * (number of categories + number of directions + 5), where 5 represents the number of data required to express the target position, size, and confidence.

优选的，干扰分割模块包括四层卷积层、三层上采样层和两层通道权重矫正单元，每一层上采样层分别位于两层卷积层之间，第一层通道权重矫正单元位于第一层上采样层和第二层卷积层之间，第二层通道权重矫正单元位于第二层上采样层和第三层卷积层之间；多尺度特征提取网络输出的第三层特征作为第一层卷积层的输入；上采样层根据最近邻插值算法进行尺度放大，对特征进行长宽分别为二倍形式的上采样，经过两层上采样层得到与训练数据集中图像尺寸相同的特征图；第四层卷积层输出每个像素的干扰概率作为干扰信息，从而完成图片中干扰情况的预测。Preferably, the interference segmentation module includes four convolutional layers, three upsampling layers and two channel weight correction units, each upsampling layer is located between two convolutional layers, the first channel weight correction unit is located between the first upsampling layer and the second convolutional layer, and the second channel weight correction unit is located between the second upsampling layer and the third convolutional layer; the third layer features output by the multi-scale feature extraction network are used as input to the first convolutional layer; the upsampling layer scales up the feature according to the nearest neighbor interpolation algorithm, and upsamples the feature in a form of twice the length and width, and obtains a feature map with the same image size as the training data set after two upsampling layers; the fourth convolutional layer outputs the interference probability of each pixel as interference information, thereby completing the prediction of the interference situation in the picture.

优选的，通道权重矫正单元包括依次连接的第一卷积核、全局池化层、第二卷积核和激活函数；全局池化层用于获取训练数据集中图片的一维表示；第一卷积核大小为3*1，用于获取每个通道的权重并修正每个通道特征的权重；第二卷积核大小为1*1，用于完成维度的减少；激活函数用于将输出解释为权重分布从而进行矫正。Preferably, the channel weight correction unit includes a first convolution kernel, a global pooling layer, a second convolution kernel and an activation function connected in sequence; the global pooling layer is used to obtain a one-dimensional representation of the image in the training data set; the first convolution kernel size is 3*1, which is used to obtain the weight of each channel and correct the weight of each channel feature; the second convolution kernel size is 1*1, which is used to complete the dimensionality reduction; the activation function is used to interpret the output as a weight distribution for correction.

优选的，采用训练后的抗干扰识别模型对待识别图像进行识别包括以下步骤：Preferably, using the trained anti-interference recognition model to recognize the image to be recognized includes the following steps:

步骤31：多尺度特征提取网络对训练数据集的图片进行特征提取，对第三层特征T3、第四层特征T4和第五层特征T5，采用通道处连接的方式进行特征融合，获得第三层目标特征P3、第四层目标特征P4和第五层目标特征P5；通过采用通道处连接的方式进行特征融合，来尽可能多的保留浅层网络特征的细节信息，同时包含深层网络特征的全局信息，将第三到五层卷积层的特征进行融合来获取网络对输入图片深度特征的提取；Step 31: The multi-scale feature extraction network extracts features from the images of the training data set, and fuses the third-layer features T3, the fourth-layer features T4, and the fifth-layer features T5 by connecting at the channel to obtain the third-layer target features P3, the fourth-layer target features P4, and the fifth-layer target features P5; by connecting at the channel to perform feature fusion, as much detail information of shallow network features as possible is retained, while including the global information of deep network features, and the features of the third to fifth convolutional layers are fused to obtain the network's extraction of deep features of the input image;

步骤32：目标检测模块根据第三层目标特征P3、第四层目标特征P4和第五层目标特征P5，再次通过卷积和特征融合的方式进行处理，根据融合后的输出特征预测出不同尺度目标的目标位置、目标大小、目标方向、目标类别和目标置信度；Step 32: The target detection module processes the third-layer target features P3, the fourth-layer target features P4, and the fifth-layer target features P5 again by means of convolution and feature fusion, and predicts the target position, target size, target direction, target category, and target confidence of targets of different scales according to the fused output features;

步骤33：干扰分割模块对第三层目标特征P3进行权重矫正和尺度放大，得到与训练数据集的图片尺寸一致的特征图，根据特征图预测出图片中每个像素的干扰概率。Step 33: The interference segmentation module performs weight correction and scale enlargement on the third-layer target feature P3 to obtain a feature map with the same image size as the training data set, and predicts the interference probability of each pixel in the image based on the feature map.

优选的，步骤31还包括以下内容：Preferably, step 31 also includes the following contents:

多尺度特征提取网络在第一层卷积层之前还设置有切片模块，采用特征重组的方式，将第一层大尺度特征转换为4个小尺寸的特征图，然后采用通道处连接操作进行融合获得输入；例如可切分成、、、，这样所有小尺寸融合之后还是包含原图的所有特征，该操作后的输入不用参数计算，可实现快速下采样；The multi-scale feature extraction network also has a slicing module before the first convolutional layer. It uses feature reorganization to convert the first layer of large-scale features into four small-sized feature maps, and then uses channel connection operations to fuse them to obtain input; for example Can be divided into , , , , so that all the small-size images still contain all the features of the original image after fusion. The input after this operation does not require parameter calculation, which can achieve fast downsampling;

输入依次经过第一卷积层、第一激活层、第一下采样层、第二卷积层、第二激活层、第二下采样层、第三卷积层、第三激活层、第三下采样层、第四卷积层、第四激活层、第四下采样层、第五卷积层、第五激活层和空间感受野合并单元，得到第五层目标特征P5；对第五层目标特征P5通过第一上层样层采用最近邻插值算法进行尺度放大，获得第五层放大特征P5’，将第五层放大特征P5’与第四层下采样层输出的第四层特征T4进行通道处融合，获得第四层目标特征P4；对第四层目标特征P4通过第二层上采样层采用最近邻插值算法进行尺度放大，获得第四层放大特征P4’，将第四层放大特征P4’与第三层下采样层输出的第三层特征T3进行通道处融合，获得第三层目标特征P3，完成特征提取；The input passes through the first convolution layer, the first activation layer, the first downsampling layer, the second convolution layer, the second activation layer, the second downsampling layer, the third convolution layer, the third activation layer, the third downsampling layer, the fourth convolution layer, the fourth activation layer, the fourth downsampling layer, the fifth convolution layer, the fifth activation layer and the spatial receptive field merging unit in sequence to obtain the fifth-layer target feature P5; the fifth-layer target feature P5 is scaled up by the nearest neighbor interpolation algorithm through the first upper sampling layer to obtain the fifth-layer enlarged feature P5', and the fifth-layer enlarged feature P5' is fused with the fourth-layer feature T4 output by the fourth downsampling layer at the channel to obtain the fourth-layer target feature P4; the fourth-layer target feature P4 is scaled up by the nearest neighbor interpolation algorithm through the second upper sampling layer to obtain the fourth-layer enlarged feature P4', and the fourth-layer enlarged feature P4' is fused with the third-layer feature T3 output by the third downsampling layer at the channel to obtain the third-layer target feature P3, and feature extraction is completed;

第三层目标特征、第四层目标特征和第五层目标特征的通道数分别为128、256和512。The number of channels of the third-layer target features, the fourth-layer target features, and the fifth-layer target features are 128, 256, and 512, respectively.

优选的，步骤32包括以下内容：Preferably, step 32 includes the following contents:

在目标检测模块通过第四层卷积层进行目标位置、目标大小、目标方向、目标类别和目标置信度的多任务类型回归分类，通过回归目标中心点位置、长宽大小、旋转方向完成目标的精确定位，通过预测目标类别概率获取目标类别。In the target detection module, the fourth convolutional layer is used to perform multi-task type regression classification of target position, target size, target direction, target category and target confidence. The target is accurately positioned by regressing the target center point position, length, width and rotation direction, and the target category is obtained by predicting the target category probability.

优选的，训练干扰分割模块采用二分类任务交叉熵损失函数；对抗干扰识别模型的干扰分割模块的训练过程为：Preferably, the interference segmentation module is trained using a binary classification task cross entropy loss function; the training process of the interference segmentation module of the adversarial interference recognition model is:

根据训练数据集的标注数据中的干扰概率分布，制作成与对应的训练数据集中图片尺寸一致的二值图；二值图中当前存在干扰的位置数值为1，不存在干扰的位置数值为0；According to the interference probability distribution in the labeled data of the training data set, a binary image with the same size as the corresponding image in the training data set is made; the position where there is interference in the binary image has a value of 1, and the position where there is no interference has a value of 0;

通过对干扰分割模块输出的每个像素的干扰概率和相对应的二值图进行交叉熵计算，通过梯度反向传播指导干扰分割模块的学习；通过训练学习后，将得到的干扰概率进行Softmax操作，将Softmax产生的值作为当前各像素点存在干扰的干扰概率。The interference probability of each pixel output by the interference segmentation module and the corresponding binary image are cross-entropy calculated, and the learning of the interference segmentation module is guided by gradient back propagation. After training and learning, the obtained interference probability is subjected to Softmax operation, and the value generated by Softmax is used as the interference probability of interference at the current pixel point.

优选的，训练目标检测模块采用IoU损失函数和Focal损失函数，其中IoU损失函数用于计算目标位置和目标大小的损失，Focal损失函数用于计算目标方向损失；在目标检测模块训练过程中，各个目标根据受遮挡情况按设定的权重分配策略自适应调整目标不同部位的权重分配，具体权重生成包括以下步骤：Preferably, the training target detection module adopts the IoU loss function and the Focal loss function, wherein the IoU loss function is used to calculate the loss of the target position and the target size, and the Focal loss function is used to calculate the target direction loss; during the training process of the target detection module, each target is adaptively adjusted according to the occlusion situation according to the set weight distribution strategy. The specific weight generation includes the following steps:

根据训练数据集的标注数据中标注目标的尺寸分布，统计出目标尺寸大小的平均值；According to the size distribution of the labeled objects in the labeled data of the training data set, the average value of the object size is calculated;

根据尺寸大小平均值，在上述二值图中每个位置为中心，尺寸大小平均值为范围内像素总值作为该位置新的像素值，生成新的与对应的训练数据集中图片尺寸一致的权重图；According to the average value of the size, each position in the above binary image is taken as the center, and the average value of the size is taken as the total value of the pixels in the range as the new pixel value of the position, and a new weight map consistent with the image size in the corresponding training data set is generated;

将权重图以最近邻采样的方式分别下采样至第三层目标特征P3、第四层目标特征P4和第五层目标特征P5尺寸大小后，按设定策略根据像素值得到最终对应的各权重图。After the weight map is downsampled to the size of the third layer target feature P3, the fourth layer target feature P4 and the fifth layer target feature P5 by nearest neighbor sampling, the corresponding weight maps are finally obtained according to the pixel values according to the set strategy.

经由上述的技术方案可知，与现有技术相比，本发明公开提供了一种干扰分割与目标检测协同的抗干扰识别方法，适用于干扰场景，采用共享特征与分割方式干扰识别、目标识别协同的一体化抗干扰识别模型进行目标识别，实现抗干扰的红外遥感图像的准确识别，为了强化学习过程中干扰作为目标的状态信息，协同目标属性信息共同认知，设计网络结构增强特征共享，设计损失函数根据目标遮挡状态自适应调整权重，强化对受遮挡目标的精确识别，模拟视觉感知结合遮挡状态分析目标，实现了从简单去除干扰向利用干扰状态的转变。具体有益效果如下：Through the above technical solutions, it can be known that compared with the prior art, the present invention discloses an anti-interference recognition method that cooperates with interference segmentation and target detection, which is suitable for interference scenes, and adopts an integrated anti-interference recognition model that cooperates with interference recognition and target recognition in a shared feature and segmentation manner to perform target recognition, so as to achieve accurate recognition of anti-interference infrared remote sensing images. In order to strengthen the state information of interference as a target in the learning process, the target attribute information is jointly recognized, the network structure is designed to enhance feature sharing, and the loss function is designed to adaptively adjust the weight according to the target occlusion state, so as to strengthen the accurate recognition of the occluded target, simulate visual perception and analyze the target in combination with the occlusion state, and realize the transformation from simple interference removal to utilization of interference state. The specific beneficial effects are as follows:

（1）分割任务与检测任务共用一个特征提取网络，具有较少的参数和计算量，检测速度快，对运行平台的硬件要求相对较低，兼容性强；(1) The segmentation task and the detection task share a feature extraction network, which has fewer parameters and computational complexity, fast detection speed, relatively low hardware requirements for the operating platform, and strong compatibility;

（2）针对检测识别过程中的目标多尺度问题，通过特征重组以及高低层特征融合的机制来提升网络对小尺度目标的敏感性，其中多次的特征融合操作可以有效的保留输入图像的高层语义信息以及浅层特征中的细节信息，极大程度提升了特征提取的有效性；(2) In order to solve the multi-scale problem of targets in the detection and recognition process, the network’s sensitivity to small-scale targets is improved through feature reorganization and high- and low-level feature fusion. Multiple feature fusion operations can effectively retain the high-level semantic information of the input image and the detailed information in the shallow features, greatly improving the effectiveness of feature extraction.

（3）通过分割任务和检测任务共用一个骨干网络，采用多个任务多个损失约束共同优化骨干网络，完成学习过程中的多任务协同，通过设计合理有效的约束函数提升学习过程中干扰信息和目标信息的交互，通过分割任务的协同，使模型能够有效理解干扰状态，从而基于场景精确识别目标，有效提升模型的抗干扰性能和方法的准确性。(3) The segmentation task and the detection task share a backbone network, and multiple tasks and multiple loss constraints are used to jointly optimize the backbone network to complete multi-task collaboration in the learning process. The interaction between interference information and target information in the learning process is improved by designing reasonable and effective constraint functions. Through the collaboration of segmentation tasks, the model can effectively understand the interference state, thereby accurately identifying targets based on the scene, effectively improving the model's anti-interference performance and the accuracy of the method.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据提供的附图获得其他的附图。In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for use in the embodiments or the description of the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For ordinary technicians in this field, other drawings can be obtained based on the provided drawings without paying creative work.

图1为本发明提供的抗干扰识别模型结构示意图；FIG1 is a schematic diagram of the structure of an anti-interference recognition model provided by the present invention;

图2为本发明提供的模型中网络融合示意图；FIG2 is a schematic diagram of network fusion in the model provided by the present invention;

图3为本发明提供的目标检测模块进行目标方向预测示意图。FIG3 is a schematic diagram of target direction prediction performed by a target detection module provided by the present invention.

具体实施方式DETAILED DESCRIPTION

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The following will be combined with the drawings in the embodiments of the present invention to clearly and completely describe the technical solutions in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present invention.

本发明实施例公开了一种干扰分割与目标检测协同的抗干扰识别方法，包括以下步骤：The embodiment of the present invention discloses an anti-interference identification method for coordinated interference segmentation and target detection, comprising the following steps:

S1：构建抗干扰识别模块；S1: build anti-interference recognition module;

S2：采集训练数据集对抗干扰识别模型进行训练，获得训练后的抗干扰识别模型；S2: Collect training data sets to train the anti-interference recognition model to obtain the trained anti-interference recognition model;

S3：采集待识别图像，输入至训练后的抗干扰识别模型，获得识别结果。S3: Collect the image to be recognized, input it into the trained anti-interference recognition model, and obtain the recognition result.

另一方面，在一个具体实施例中，将干扰识分割与目标检测耦合设计的抗干扰识别模型如图1所示，具体包括以下结构：On the other hand, in a specific embodiment, the anti-interference recognition model designed by coupling interference segmentation with target detection is shown in FIG1 , and specifically includes the following structure:

（1）多尺度特征提取网络(1) Multi-scale feature extraction network

为了更好地完成深度网络模型嵌入硬件环境，需要在良好的检测性能和较快的处理速度之间进行平衡，较大的输入尺寸能带来较优的性能，也会带来更慢的处理速度，同时减少网络计算量也是必要的，因此在整个网络模型中，使用了合适的输入（512*640）、合理的特征重用、多任务的特征共享。基于多任务的特征共用，整个backbone可以分为5层，每层的通道数分别为：32、64、128、256、512，其中后三层特征用于目标检测模块，第三层特征同时用于干扰分割模块。In order to better embed the deep network model into the hardware environment, it is necessary to balance good detection performance and faster processing speed. A larger input size can bring better performance, but also slower processing speed. At the same time, it is also necessary to reduce the amount of network calculation. Therefore, in the entire network model, appropriate input (512*640), reasonable feature reuse, and multi-task feature sharing are used. Based on multi-task feature sharing, the entire backbone can be divided into 5 layers, and the number of channels in each layer is: 32, 64, 128, 256, 512, of which the last three layers of features are used for the target detection module, and the third layer of features are also used for the interference segmentation module.

深度神经网络深层特征为了获得更大的感受野和高层语义信息，特征尺寸会逐渐缩小，与此同时带来的也是特征信息的大量损失。本实施例中采用两种方式保留浅层特征的细节信息和高层特征的语义信息，首先，在特征提取过程中，如图2中(a)采用残差连接的方式提升高层特征的细节信息量，将残差模块设置在每一个卷积层或卷积核之后，而在特征融合过程中，如图2中(b)采用通道连接方式（即通道处连接操作）达成全局信息和局部信息的共同传播。In order to obtain a larger receptive field and high-level semantic information, the feature size of the deep neural network deep features will gradually decrease, and at the same time, a large amount of feature information will be lost. In this embodiment, two methods are used to retain the detail information of shallow features and the semantic information of high-level features. First, in the feature extraction process, as shown in Figure 2 (a), the residual connection method is used to increase the amount of detail information of high-level features, and the residual module is set after each convolution layer or convolution kernel. In the feature fusion process, as shown in Figure 2 (b), the channel connection method (i.e., the connection operation at the channel) is used to achieve the common propagation of global information and local information.

此外，设计了一种浅层特征和深层特征融合方法，通过融合方案提升网络多尺度目标的定位和分类任务性能。由于单纯的深层特征上采样后和浅层特征融合，仅能提升浅层特征中的全局信息，而对深层特征没有明显的加强。因此，本发明设计了一种特征融合方法，首先使用空间感受野合并单元对深层的第五层特征进行处理，通过包括第一卷积核、最大池化层、第二卷积核依次连接，第一卷积核之后设置有残差模块进行特征提取，其中第一卷积核和第二卷积核大小均为1*1，用于建立隐式非线性映射，最大池化层包括3种不同尺寸，用于获取更充足的全局信息，再使用通道处连接方式完成第五层到第三层的特征融合，从而达到全方位的信息综合，全面提升提取到的第三、四、五层的特征信息量。In addition, a method for fusing shallow features and deep features is designed, and the positioning and classification task performance of multi-scale targets of the network is improved through the fusion scheme. Since the simple upsampling of deep features and fusion with shallow features can only improve the global information in shallow features, but there is no obvious enhancement of deep features. Therefore, the present invention designs a feature fusion method, firstly, the fifth layer features of the deep layer are processed using a spatial receptive field merging unit, and the first convolution kernel, the maximum pooling layer, and the second convolution kernel are connected in sequence, and a residual module is arranged after the first convolution kernel for feature extraction, wherein the size of the first convolution kernel and the second convolution kernel are both 1*1, which are used to establish implicit nonlinear mapping, and the maximum pooling layer includes 3 different sizes for obtaining more sufficient global information, and then the channel connection method is used to complete the feature fusion from the fifth layer to the third layer, thereby achieving all-round information synthesis and comprehensively improving the amount of feature information extracted from the third, fourth, and fifth layers.

当前网络是高效的、精简的，整个多尺度特征提取网络共五层，在保证推理速度的情况下，极大地保证了充足的感受野和全局以及细节信息，同时保证了各尺寸目标的特征精度。在每一层卷积层处理后，增加了LeakyReLU激活函数，激活函数的使用可以使得神经网络有合理的特征空间，从而捕捉更多有效信息。The current network is efficient and streamlined. The entire multi-scale feature extraction network has five layers. While ensuring the inference speed, it greatly guarantees sufficient receptive field and global and detailed information, while ensuring the feature accuracy of targets of various sizes. After each convolution layer is processed, the LeakyReLU activation function is added. The use of the activation function can make the neural network have a reasonable feature space, thereby capturing more effective information.

（2）基于角度分类预测的多方向目标检测模块(2) Multi-directional target detection module based on angle classification prediction

红外遥感图像中，目标往往存在较大的长宽比，导致常规的水平框对于准确描述目标位置存在缺陷，同时红外遥感侦测任务中，目标往往是多角度、多方向的，而目标方向往往是一项重要的信息。而常规的通过网络额外分支回归目标角度的方式，存在一些缺陷：In infrared remote sensing images, targets often have a large aspect ratio, which results in the conventional horizontal frame being inadequate for accurately describing the target position. In addition, in infrared remote sensing detection tasks, targets are often multi-angle and multi-directional, and the target direction is often an important piece of information. The conventional method of regressing the target angle through an additional branch of the network has some defects:

1）角度的预测往往是一个离散值，直接预测离散值会导致网络难以收敛，预测精度影响较大；1) The angle prediction is often a discrete value. Directly predicting the discrete value will make it difficult for the network to converge, which will have a significant impact on the prediction accuracy.

2）角度的回归值存在很高的敏感性，同时要解决离散值边界譬如0°和180°突变问题。2) The regression value of the angle is highly sensitive, and the problem of sudden changes at discrete value boundaries such as 0° and 180° must be solved.

因此，本实施例中采用了如下的回归策略，在准确预测目标长宽的同时，完成了目标方向的定位。将角度值的回归变成角度值的分类问题，同时为了保证角度预测的精度，将角度离散分为180类，对180类分别进行编码，来完成角度的分类预测。引入周期性来解决角度的周期性问题，避免0°到180°的巨大突变，合理衡量预测值和真值之间的角度差距。而分类问题与目标本身类别预测任务耦合程度极高，将角度分类和目标本身类别分类任务耦合，也降低网络模型的训练难度，有利于网络的快速回归以及损失函数的稳定性。Therefore, the following regression strategy is adopted in this embodiment to accurately predict the length and width of the target while completing the positioning of the target direction. The regression of the angle value is turned into a classification problem of the angle value. At the same time, in order to ensure the accuracy of the angle prediction, the angle is discretely divided into 180 categories, and the 180 categories are encoded separately to complete the classification prediction of the angle. Introduce periodicity to solve the periodicity problem of the angle, avoid the huge mutation from 0° to 180°, and reasonably measure the angle difference between the predicted value and the true value. The classification problem is highly coupled with the target category prediction task itself. Coupling the angle classification with the target category classification task itself also reduces the training difficulty of the network model, which is conducive to the rapid regression of the network and the stability of the loss function.

如图3所示，目标检测模块同时对目标的中心位置、长宽大小、方向、类别进行预测，三层不同尺度的特征分别对应不同的尺寸大小的目标，各由一层卷积层构成，各层的输出通道数为N*（180+class_num+5），尺寸大小为H*W，其中H和W为该层特征图长和宽，N为该层每个位置anchor数量，class_num表示待识别目标的种类。As shown in Figure 3, the target detection module predicts the center position, length, width, direction, and category of the target at the same time. The three layers of features of different scales correspond to targets of different sizes, each of which is composed of a convolutional layer. The number of output channels of each layer is N*(180+class_num+5), and the size is H*W, where H and W are the length and width of the feature map of this layer, N is the number of anchors at each position of this layer, and class_num represents the type of target to be identified.

（3）基于像素监督的干扰分割模块(3) Interference segmentation module based on pixel supervision

真实复杂的场景中，往往存在着大量的人为、自然干扰。而在红外遥感任务中，占最大比例的是云雾、烟雾等干扰。此类干扰由于本身不定性的特点，通常的目标框级别的标注难以对干扰进行精确地描述，而为了确认目标干扰情况，需要准确了解场景干扰态势。为了更好地对场景中干扰进行感知，基于像素监督的分割方法用于实现干扰识别。In real and complex scenes, there are often a lot of man-made and natural interferences. In infrared remote sensing tasks, the largest proportion of interferences is caused by clouds, fog, and smoke. Due to the inherent uncertainty of such interferences, it is difficult to accurately describe the interferences at the target frame level. In order to confirm the interference situation of the target, it is necessary to accurately understand the interference situation of the scene. In order to better perceive the interference in the scene, the segmentation method based on pixel supervision is used to realize interference recognition.

为了保持原有网络结构的简洁性不引入过于复杂的网络结构。基于多任务协同的机理，分割模块和检测模块共用一个骨干网络提取的特征，从而增强学习过程中二者信息的交互，强化检测过程中对干扰的认知。使用提取到的第三层特征，经过三次的上采样以及三层卷积操作，得到和输入图像尺寸一致的特征图，最后经过一层卷积层以及Softmax激活函数操作得到干扰分割预测结果。而各卷积层通道数为128、32、8、2，用于Softmax操作后得到的是各个像素是干扰的概率，通过阈值划分可以得到最后的预测结果。而为了指导学习过程，需要制作和标注图片中干扰对应的二值图，需要注意的是由于云雾和烟雾存在透明性，在标注过程中需要统一标注标准，对高于一定透明度的干扰进行标注，避免数据上的分布失衡。In order to maintain the simplicity of the original network structure, an overly complex network structure is not introduced. Based on the mechanism of multi-task collaboration, the segmentation module and the detection module share the features extracted by a backbone network, thereby enhancing the interaction of information between the two during the learning process and strengthening the recognition of interference during the detection process. Using the extracted third-layer features, after three upsampling and three-layer convolution operations, a feature map with the same size as the input image is obtained, and finally a convolution layer and a Softmax activation function operation are performed to obtain the interference segmentation prediction result. The number of channels in each convolution layer is 128, 32, 8, and 2. After the Softmax operation, the probability of each pixel being interference is obtained, and the final prediction result can be obtained by threshold division. In order to guide the learning process, it is necessary to make and annotate the binary images corresponding to the interference in the picture. It should be noted that due to the transparency of clouds and smoke, it is necessary to unify the annotation standards during the annotation process, and annotate the interference with a transparency higher than a certain level to avoid data distribution imbalance.

本实施例有利于在网络学习过程中加强对干扰态势的认知，同时通过合理设计损失函数的约束，加强干扰信息和目标信息的交互学习和协同认知，促进网络认知目标干扰状态后对目标准确识别，提升网络的抗干扰性能。This embodiment is conducive to strengthening the recognition of interference situations during the network learning process. At the same time, by reasonably designing the constraints of the loss function, it strengthens the interactive learning and collaborative cognition of interference information and target information, promotes the network to accurately identify the target after recognizing the interference state of the target, and improves the anti-interference performance of the network.

进一步的，为了实现高效的抗干扰遥感图像检测识别，整个网络框架的训练，对不同模块采用不同的损失函数设置；Furthermore, in order to achieve efficient anti-interference remote sensing image detection and recognition, the training of the entire network framework uses different loss function settings for different modules;

（1）目标置信度学习过程通过Focal损失函数则用来解决正负样本不平衡问题；假设真值y取值为±1，其中1代表前景，-1代表背景，网络模型的预测概率值p的取值范围为[0,1]；当y=1时，交叉熵的值为-log(p)，便于表示定义为：(1) The target confidence learning process uses the Focal loss function to solve the problem of imbalance between positive and negative samples. Assume that the true value y is ±1, where 1 represents the foreground and -1 represents the background. The predicted probability value p of the network model ranges from [0, 1]. When y=1, the cross entropy value is -log(p), which is convenient for expressing the definition. for:

此时，交叉熵可以表示为CE(p_t)=-log(p_t)；则分类损失可以描述为：At this time, the cross entropy can be expressed as CE(p _t )=-log(p _t ); then the classification loss It can be described as:

其中,是调控系数，一般取为大于1的数，用来增加当前loss对难分样本的注意力，从而当预测错误时，的值趋近于1，故而不受影响；而当预测正确时则会趋近于0，此时越大，则会越小，从而易分类的样本权重相对减小。被用来平衡正负样本的比例，因为网络往往更倾向于学习样本多的类别，而忽略样本相对少的类别。in, It is a control coefficient, which is generally taken as a number greater than 1, used to increase the attention of the current loss to the difficult samples, so that when the prediction is wrong, The value of is close to 1, so it is not affected; when the prediction is correct It will approach 0, then The bigger, The smaller it is, the smaller it will be, so the weight of samples that are easy to classify will be relatively reduced. It is used to balance the ratio of positive and negative samples, because the network tends to learn categories with more samples and ignore categories with relatively fewer samples.

（2）目标的定位以及方向损失则分别采用IoU损失函数和Focal损失函数，其大致可以描述为：(2) The target positioning and direction loss use the IoU loss function and Focal loss function respectively, which can be roughly described as:

其中前三项表示目标的定位损失，主要用于完成目标的中心点以及长宽预测；b代表目标中心点，表示计算二者欧氏距离，c代表预测框和真实框的最小外接矩；最后一项则表示角度分类预测，使用的交叉熵来对分类的训练，其中为各角度值预测值。The first three items represent the positioning loss of the target, which is mainly used to complete the target center point and length and width prediction; b represents the target center point, Indicates the calculation of the Euclidean distance between the two, c represents the minimum circumscribed moment of the predicted box and the true box; the last item represents the angle classification prediction, and the cross entropy is used to train the classification, where Predict values for each angle value.

同时根据训练数据集中各个目标受遮挡情况根据设定的权重分配策略自适应调整目标不同部位的权重分配，权重分配策略大致可描述为如下：At the same time, according to the occlusion of each target in the training data set, the weight distribution of different parts of the target is adaptively adjusted according to the set weight distribution strategy. The weight distribution strategy can be roughly described as follows:

其中，x为目标受遮挡部位二值图，通过计算目标不同的受遮挡程度，灵活自适应调整各目标最终分配权重，从而使得学习过程中强化对受遮挡目标的精确识别，模拟视觉感知结合遮挡状态分析目标，实现了从简单去除干扰向利用干扰状态的转变；Among them, x is the binary image of the occluded part of the target. By calculating the different occlusion degrees of the target, the final weight of each target is flexibly and adaptively adjusted, so that the accurate recognition of the occluded target is strengthened during the learning process, and the target is analyzed by simulating visual perception combined with the occlusion state, realizing the transition from simple interference removal to utilizing the interference state.

在训练过程中，各个目标根据受遮挡情况按设定的权重分配策略自适应调整目标不同部位的权重分配，具体权重生成包括以下步骤：During the training process, each target is adaptively adjusted according to the weight distribution strategy based on the occlusion situation of different parts of the target. The specific weight generation includes the following steps:

1）根据训练数据集的标注数据中标注目标的尺寸分布，统计出目标尺寸大小的平均值；1) According to the size distribution of the labeled objects in the labeled data of the training data set, the average value of the object size is calculated;

2）根据目标尺寸大小的平均值，在上述二值图中每个位置为中心，尺寸大小为范围内像素总值作为该位置新的像素值，生成新的与对应的训练数据集中图片尺寸一致的权重图；2) Based on the average value of the target size, each position in the above binary image is taken as the center, and the size is the total value of the pixels in the range as the new pixel value of the position, and a new weight map consistent with the image size in the corresponding training data set is generated;

3）将权重图以最近邻采样的方式分别下采样至第三层目标特征P3、第四层目标特征P4和第五层目标特征P5尺寸大小后，按设定策略根据像素值得到最终对应的各权重图。3) After downsampling the weight map to the size of the third-layer target feature P3, the fourth-layer target feature P4, and the fifth-layer target feature P5 by nearest neighbor sampling, the corresponding weight maps are finally obtained according to the pixel values according to the set strategy.

（3）对于干扰分割损失的设计，使用了简单的二分类任务交叉熵损失：(3) For the design of interference segmentation loss, a simple binary classification task cross entropy loss is used:

其中y表示干扰分割模块预测结果，表示在真实标注中该像素是否是干扰。Where y represents the prediction result of the interference segmentation module, Indicates whether the pixel is interference in the real annotation.

本发明所提供的抗干扰的遥感图像目标检测识别方法主要为解决1.5~7.5米可见光或者近红外的目标检测识别。但也适用于其它分辨率下遥感图像中的目标检测和其它应用场景中的目标检测识别，针对可见光其它分辨率或者其它应用场景中的数据，只需要重新训练网络模型，更新相应的权重文件即可完成该应用场景中的目标检测识别。The anti-interference remote sensing image target detection and recognition method provided by the present invention is mainly used to solve the detection and recognition of targets in 1.5-7.5 meters of visible light or near-infrared. However, it is also applicable to target detection in remote sensing images at other resolutions and target detection and recognition in other application scenarios. For data in other resolutions of visible light or other application scenarios, it is only necessary to retrain the network model and update the corresponding weight file to complete the target detection and recognition in the application scenario.

本说明书中各个实施例采用递进的方式描述，每个实施例重点说明的都是与其他实施例的不同之处，各个实施例之间相同相似部分互相参见即可。对于实施例公开的装置而言，由于其与实施例公开的方法相对应，所以描述的比较简单，相关之处参见方法部分说明即可。In this specification, each embodiment is described in a progressive manner, and each embodiment focuses on the differences from other embodiments. The same or similar parts between the embodiments can be referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant parts can be referred to the method part.

对所公开的实施例的上述说明，使本领域专业技术人员能够实现或使用本发明。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的，本文中所定义的一般原理可以在不脱离本发明的精神或范围的情况下，在其它实施例中实现。因此，本发明将不会被限制于本文所示的这些实施例，而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。The above description of the disclosed embodiments enables one skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to one skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention will not be limited to the embodiments shown herein, but rather to the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An anti-interference identification method for coordinated interference segmentation and target detection, characterized in that it comprises the following steps:

Step 1: Construct an anti-interference recognition module; the anti-interference recognition model includes a multi-scale feature extraction network, a target detection module and an interference segmentation module connected in sequence; the multi-scale feature extraction network extracts target features; the interference segmentation module predicts interference information based on target features; the target detection module predicts target information based on target features;

Step 2: Collect training data sets to train the anti-interference recognition model to obtain the trained anti-interference recognition model; during the training process, use the interference information to constrain the target detection module to predict the target information;

Step 3: Collect the image to be recognized, input it into the trained anti-interference recognition model, and obtain the recognition result;

The multi-scale feature extraction network includes five convolutional layers, four downsampling layers, five activation layers, a spatial receptive field merging unit, and two upsampling layers; the first four convolutional layers are followed by an activation layer and a downsampling layer; the fifth convolutional layer is followed by an activation layer and a spatial receptive field merging unit;

Each convolutional layer extracts a layer of features, and the extracted features are downsampled in the downsampling layer after passing through the activation layer; the spatial receptive field merging unit processes the fifth-layer features T5 extracted by the fifth convolutional layer and output by the fifth activation layer to obtain the fifth-layer target features P5;

The fifth-layer target feature P5 is scaled up by the nearest neighbor interpolation algorithm through the first upsampling layer to obtain the fifth-layer enlarged feature P5', and the fifth-layer enlarged feature P5' is fused with the fourth-layer feature T4 extracted by the fourth convolution layer and output by the fourth activation layer and the fourth downsampling layer by the channel connection operation to obtain the fourth-layer target feature P4;

The fourth-layer target feature P4 is scaled up by the nearest neighbor interpolation algorithm through the second upsampling layer to obtain the fourth-layer enlarged feature P4'. The fourth-layer enlarged feature P4' is fused with the third-layer feature T3 extracted by the third convolution layer and output by the third activation layer and the third downsampling layer using the channel connection operation to obtain the third-layer target feature P3.

2. According to the anti-interference recognition method of interference segmentation and target detection coordinated according to claim 1, it is characterized in that the spatial receptive field merging unit includes a first convolution kernel, a maximum pooling layer and a second convolution kernel connected in sequence; the sizes of the first convolution kernel and the second convolution kernel are both 1*1; the maximum pooling layer includes 3 sizes.

3. According to the anti-interference identification method of interference segmentation and target detection in collaboration according to claim 1, it is characterized in that the target detection module includes two convolutional layers and a prediction module, the third layer target feature P3 is input into the first layer convolutional layer, and the obtained output feature F1 is fused with the fourth layer target feature P4 by using the channel connection operation to obtain the fused feature; the fused feature is input into the second layer convolutional layer, and the obtained output feature F2 is fused with the fifth layer target feature P5 by using the channel connection operation to obtain the output feature F3; the output feature F1, the output feature F2 and the output feature F3 are predicted in the prediction module, and the obtained target information includes target position, target size, target direction, target category and target confidence; the prediction module uses two layers of convolutional layers, and after two convolutions, the target position, target size, target direction, target category and target confidence are obtained in the second convolutional layer.

4. According to the anti-interference recognition method of interference segmentation and target detection in collaboration according to claim 3, it is characterized in that the number of channels finally output by the target detection module is the number of anchor points under the current size feature * (number of categories + number of directions + n), where n represents the number of data required for target position, target size, and target confidence.

5. According to claim 1, an anti-interference identification method for coordinated interference segmentation and target detection is characterized in that the interference segmentation module includes four convolutional layers, three upsampling layers and two channel weight correction units, each upsampling layer is located between two convolutional layers, the first channel weight correction unit is located after the first upsampling layer, and the second channel weight correction unit is located after the second upsampling layer; the third layer of features output by the multi-scale feature extraction network is used as the input of the first convolutional layer; the upsampling layer is scaled up according to the nearest neighbor interpolation algorithm, and a feature map with the same image size as the training data set is obtained after two upsampling layers; the fourth convolutional layer outputs the interference probability of each pixel as interference information.

6. According to the anti-interference identification method of interference segmentation and target detection coordinated according to claim 5, it is characterized in that the channel weight correction unit includes a first convolution kernel, a global pooling layer, a second convolution kernel and an activation function connected in sequence.

7. According to claim 1, the interference segmentation and target detection coordinated anti-interference identification method is characterized in that the training process of the interference segmentation module of the anti-interference identification model is:

Step 211: extracting features of the images in the training data set through a multi-scale feature extraction network, and the interference segmentation module predicts the interference probability of each pixel in the corresponding image based on the extracted features;

Step 212: according to the interference probability distribution in the labeled data of the training data set, a binary image having the same size as the image in the corresponding training data set is produced;

Step 213: Perform cross entropy calculation on the interference probability of each pixel output by the interference segmentation module and the corresponding binary image, and guide learning through gradient back propagation; use the cross entropy loss function of the binary classification task to perform cross entropy calculation.

8. According to the interference segmentation and target detection collaborative anti-interference recognition method of claim 7, it is characterized in that the training target detection module adopts the IoU loss function and the Focal loss function, wherein the IoU loss function is used to calculate the loss of the target position and the target size, and the Focal loss function is used to calculate the target direction loss; in the target detection model training process, each target is adaptively adjusted according to the occlusion situation according to the set weight distribution strategy The weight distribution of different parts of the target is adjusted, and the specific weight generation includes the following steps:

Step 221: Calculate the average size of the target according to the size distribution of the labeled targets in the labeled data of the training data set;

Step 222: Taking each position in the binary image of step 22 as the center and the average value of the target size as the search range, the total value of all pixels in the search range is used as the new pixel value of the corresponding position to generate a weight map consistent with the image size in the corresponding training data set;

Step 223: After downsampling the weight map to the size of the third layer target feature P3, the fourth layer target feature P4 and the fifth layer target feature P5 respectively by nearest neighbor sampling, the corresponding target feature weight map is obtained according to the new pixel value according to the set weight allocation strategy.

9. The interference segmentation and target detection coordinated anti-interference recognition method according to claim 1 is characterized in that using the trained anti-interference recognition model to recognize the image to be recognized comprises the following steps:

Step 31: The multi-scale feature extraction network extracts features from the image to be identified, and performs feature fusion on the third-layer features T3, the fourth-layer features T4, and the fifth-layer features T5 by using the channel connection operation to obtain the third-layer target features P3, the fourth-layer target features P4, and the fifth-layer target features P5;

Step 32: The target detection module processes the third-layer target features P3, the fourth-layer target features P4, and the fifth-layer target features P5 again by means of convolution and feature fusion, and predicts the target position, target size, target direction, target category, and target confidence of targets of different scales according to the fused output features;

Step 33: The interference segmentation module performs weight correction and scale enlargement on the third-layer target feature P3 to obtain a feature map that is consistent with the size of the image to be identified, and predicts the interference probability of each pixel in the image to be identified based on the feature map.