
CN119131364A - A method for detecting small targets in drones based on unsupervised adversarial learning - Google Patents


Info

Publication number
CN119131364A
CN119131364A (application CN202411265268.8A; granted publication CN119131364B)
Authority
CN
China
Prior art keywords
image
feature
images
small target
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411265268.8A
Other languages
Chinese (zh)
Other versions
CN119131364B (en)
Inventor
涂晓光
何志
康朋新
李卓骏
张艳艳
杨明
刘建华
殷举航
王宇
周超
崔雨勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Civil Aviation Flight University of China
South West Institute of Technical Physics
Original Assignee
Civil Aviation Flight University of China
South West Institute of Technical Physics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Civil Aviation Flight University of China, South West Institute of Technical Physics filed Critical Civil Aviation Flight University of China
Priority to CN202411265268.8A priority Critical patent/CN119131364B/en
Publication of CN119131364A publication Critical patent/CN119131364A/en
Application granted granted Critical
Publication of CN119131364B publication Critical patent/CN119131364B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method for detecting small unmanned aerial vehicle (UAV) targets based on unsupervised adversarial learning, and relates to the technical field of small target detection. By introducing multi-scale image degradation and enhancement, the detection model can better learn the diverse structures and patterns in the data and better identify small targets. In this way, the model helps the target detection framework learn semantic features that are more discriminative and generalizable for small target recognition. The method introduces a generative adversarial network into the target detection task, using a feature extractor in place of the generator. The feature extractor generates the corresponding feature maps of the image background and of the synthesized image; both feature maps are fed into a discriminator at the same time, and the discriminator learns to distinguish the differences between them. In continuous adversarial learning, the discriminator concentrates on the differences between the two feature maps and is used to accurately locate and detect small target objects in the feature map of the synthesized image.

Description

Unmanned aerial vehicle small target detection method based on unsupervised adversarial learning
Technical Field
The invention relates to the technical field of small target detection, and in particular to a UAV small target detection method based on unsupervised adversarial learning.
Background
UAV target detection is an important branch of target detection in computer vision. With the continuous progress of UAV technology, the range of UAV applications keeps expanding across military, civilian, and commercial fields. The purpose of UAV target detection is to locate, detect, and classify UAVs in the images or videos under inspection. At the same time, the potential threats posed by UAVs have become increasingly prominent, such as illegal intrusion and malicious attacks. Today, machine-vision-based UAV target detection is widely applied in daily life, for example at airports, secure facilities, and in video surveillance.
Early target detection relied primarily on hand-crafted features and machine-learning classifiers; typical methods included Haar features with cascade classifiers (e.g., the Viola-Jones algorithm) and methods based on HOG (Histogram of Oriented Gradients) features, but these performed poorly under complex backgrounds, multiple targets, and scale variation. With the continued development of computer vision, a succession of detection algorithms changed this situation. R-CNN (Regions with CNN features) was one of the pioneering works to use deep learning: it locates objects in an image via region proposals, extracts features for each proposal with a convolutional neural network (CNN), and finally classifies them. Fast R-CNN improved on R-CNN by integrating the whole detection pipeline into a single network to increase speed and efficiency, using an RoI (Region of Interest) pooling layer to extract features from a shared feature map. Today, single-stage detectors such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) directly predict object classes and positions in a single forward pass, greatly improving detection speed. The YOLO-v5 network, as an efficient one-stage algorithm, is characterized by strong generalization and fast inference.
At present, many UAV target detection algorithms, such as improved variants of YOLO-v5 and EfficientDet, achieve good results. These algorithms are supervised in nature, and supervised target detection still has shortcomings: supervised algorithms generally require a large amount of labeled data for training, especially in deep learning; their adaptability to occlusion and varied target poses is limited; and because they can usually only learn the patterns present in the labeled data, their generalization to targets in new domains or different environments can be poor.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a UAV small target detection method based on unsupervised adversarial learning, which improves UAV target detection accuracy.
The aim of the invention is realized by the following technical scheme:
A UAV small target detection method based on unsupervised adversarial learning comprises the following steps:
Acquiring an original image containing a small UAV target from the training set, and extracting the rectangular region containing the detection target to obtain a detection-target image and a pure background image without the detection target;
Performing image degradation and enhancement on the detection-target image: applying nearest-neighbor, bilinear, and bicubic interpolation to the detection-target image respectively; randomly selecting one of these interpolation methods to apply a secondary degradation to the three resulting groups of images of different degradation degrees, yielding another three groups of images of different degradation degrees; and applying multi-mode enhancement to the degraded images to obtain multiple groups of degradation-enhanced images of the detection target;
Constructing a small target detection model comprising a feature extraction network and an unsupervised target discrimination network, where the feature extraction network serves as the generator and comprises a feature extractor formed by the YOLO-v5 Backbone network and a feature adapter formed by several fully connected layers and/or multi-layer perceptrons (MLP);
Training the small target detection model, i.e., performing adversarial learning: fusing a degradation-enhanced image of the detection target with the pure background image to obtain a synthesized image, and feeding the synthesized image and the pure background image into the feature extraction network to generate an abnormal feature map and a normal feature map, respectively;
Selecting an image of a target to be detected from the test set, inputting it into the trained small target detection model, and outputting the UAV small target detection result.
Further, the enhancement processing includes random scaling and translation transformations, adjustment of the hue, saturation, and brightness of the image, and fusion of two images with a certain transparency.
Further, the feature extractor comprises two Focus structures and two CSP structures. The Focus structure performs a picture-slicing operation on the image before semantic feature extraction: the width-W and height-H information is gathered into the channel space as four similar, complementary images, expanding the input channels by a factor of 4 (the spliced image has 12 channels instead of the original 3 RGB channels); a convolution is then applied to the new image, finally producing a twofold-downsampled feature map without information loss;
The CSP structure splits the feature map into two parts: one part undergoes convolution, the other part is concatenated (Concat) with the convolution result of the first, and further convolutions are then applied, so that multi-layer semantic features of the processed feature map are extracted.
Further, the binary classification discriminator is embodied as a two-layer MLP structure that acts as a normality scorer, directly estimating the normality of each position (h, w), where h is the height index and w is the width index of the position of the image patch;
the normality estimation process is expressed as: s(h, w) = ψ(F̂(h, w)), where ψ(·) is the normality scorer that directly estimates the normality of each position (h, w), and F̂(h, w) is the transformed adaptive feature.
Further, the loss function of the discriminator adopts a binary cross-entropy loss, which quantifies how accurately the discriminator identifies the mapped image features versus the synthesized image features; the lower the value, the better the performance. The discriminator loss is calculated as:
L_D = -E_x[log D(G(x))] - E_y[log(1 - D(G(y)))]
where G is the generator, D is the discriminator, x denotes the real data, y denotes the synthetic data, G(x) denotes the normal feature map, G(y) denotes the abnormal feature map, D[G(x)] denotes the discriminator's result on the normal feature map, and D[G(y)] denotes its result on the abnormal feature map.
The beneficial effects of the invention are as follows:
(1) The invention introduces multi-scale image degradation and enhancement into the target detection algorithm, so that the model can better learn the diverse structures and patterns in the data and the detection algorithm can better identify small targets. In this way, the model helps the target detection framework learn semantic features that are more discriminative and generalizable for small target recognition. In addition, multi-mode enhancement of the degraded targets brings the training data as close as possible to the real data distribution, avoids problems such as sample imbalance, forces the model to learn more robust features, and effectively improves the generalization ability of the model.
(2) The invention introduces a generative adversarial network into the target detection task, using a feature extractor in place of the generator. The feature extractor generates the corresponding feature maps of the image background and of the synthesized image; the two different feature maps are fed into the discriminator at the same time, the discriminator learns to distinguish their differences, and the feature extractor continuously learns to strengthen its extraction of image features. In continuous adversarial learning, the discriminator concentrates on the differences between the two feature maps and is used to locate and detect small target objects in the feature map of the synthesized image.
Drawings
FIG. 1 is a diagram of an overall network architecture of the present invention;
FIG. 2 is a schematic diagram of image multi-scale degradation and enhancement;
fig. 3 is a schematic diagram of adversarial learning between mapped features and image features.
Detailed Description
The technical solutions of the present invention will be described clearly and completely below with reference to the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art without inventive effort, based on the embodiments of the present invention, fall within the scope of protection of the present invention.
Referring to fig. 1-3, the present invention provides a technical solution:
A UAV small target detection method based on unsupervised adversarial learning comprises the following steps:
s1, acquiring an original image containing a small target of an unmanned aerial vehicle in a training set, and extracting a rectangular area containing a detection target in the original image to obtain a detection target image and a pure background image without the detection target;
Small target objects typically have a small size, which means they occupy relatively few pixels in the image; the target may be submerged by neighboring background pixels, and detection algorithms very easily misidentify non-target areas as small targets. The detection algorithm therefore needs more, and more discriminative, features to distinguish small target objects from the background. Multi-scale degradation and enhancement of the image aims to let the model fully learn the diverse, multi-scale data characteristics of small target objects of different scales and different modes.
S2, performing image degradation and enhancement on the detection-target image: applying nearest-neighbor, bilinear, and bicubic interpolation to the detection-target image respectively; randomly selecting one of these interpolation methods to apply a secondary degradation to the three resulting groups of images of different degradation degrees, yielding another three groups of images of different degradation degrees; and applying multi-mode enhancement to the degraded images to obtain multiple groups of degradation-enhanced images of the detection target. A schematic diagram of the multi-scale degradation and enhancement process is shown in FIG. 2.
The enhancement processing includes random scaling and translation transformations, adjustment of the hue, saturation, and brightness of the image, and fusion of two images with a certain transparency.
In this embodiment, 3 different degradation scales and 3 degradation methods are used to generate 9 groups of different degraded images in total, producing varied training samples, enriching the expanded training data set, and letting the detection model learn richer and more comprehensive target characteristics. In a real scene, small target detection is often affected by multiple factors, such as low resolution, occlusion, and illumination changes, which blur the target image. Multi-scale image degradation can simulate these situations, so that the model is exposed to more diverse data during training and is thereby better adapted to complex detection scenes and different scene changes.
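As a rough illustration of this step, the multi-scale degradation can be sketched in plain NumPy. This is a minimal sketch, not the patent's actual implementation: function names are illustrative, only two of the three interpolation methods are implemented, and bicubic interpolation is omitted for brevity.

```python
import numpy as np

def degrade_nearest(img, scale):
    """Nearest-neighbour degradation via integer index mapping."""
    h, w = img.shape[:2]
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    rows = np.clip((np.arange(nh) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(nw) / scale).astype(int), 0, w - 1)
    return img[rows][:, cols]

def degrade_bilinear(img, scale):
    """Bilinear degradation implemented directly with NumPy."""
    h, w = img.shape[:2]
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    ys, xs = np.linspace(0, h - 1, nh), np.linspace(0, w - 1, nw)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None, None], (xs - x0)[None, :, None]
    f = img.astype(float)
    top = f[y0][:, x0] * (1 - wx) + f[y0][:, x1] * wx
    bot = f[y1][:, x0] * (1 - wx) + f[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def multiscale_degrade(img, scales=(0.5, 0.25, 0.125), rng=None):
    """First-pass degradation at three scales with each method, then one
    randomly chosen method applied again as the secondary degradation."""
    rng = np.random.default_rng(0) if rng is None else rng
    methods = [degrade_nearest, degrade_bilinear]
    first = [m(img, s) for s in scales for m in methods]
    second = [methods[rng.integers(len(methods))](d, 0.5) for d in first]
    return first + second
```

With the third (bicubic) method included, 3 scales x 3 methods would yield the 9 groups described in the embodiment.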
Through data enhancement, more training data is generated from the training samples of the existing model, so that the expanded training data more closely resembles the real data distribution. The data enhancement method uses random combinations of color-channel transformation, image stitching, image scaling, and image blurring, forcing the model to learn more robust features and effectively improving its generalization ability.
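The random combination of enhancement operations can be sketched as follows. This is a minimal illustration with simple stand-ins: channel shuffling for the colour-channel transform, `np.roll` for translation, and an alpha blend for the transparency fusion; the function names are hypothetical.

```python
import numpy as np

def augment(img, rng):
    """Randomly compose colour-channel transformation, brightness jitter
    and translation (each applied with probability 0.5)."""
    out = img.astype(float)
    if rng.random() < 0.5:                      # colour-channel transform
        out = out[..., rng.permutation(out.shape[-1])]
    if rng.random() < 0.5:                      # brightness adjustment
        out = np.clip(out * rng.uniform(0.7, 1.3), 0.0, 255.0)
    if rng.random() < 0.5:                      # horizontal translation
        out = np.roll(out, int(rng.integers(-4, 5)), axis=1)
    return out

def alpha_fuse(a, b, alpha=0.5):
    """Fuse two same-sized images with transparency alpha."""
    return alpha * a.astype(float) + (1.0 - alpha) * b.astype(float)
```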
S3, constructing a small target detection model, as shown in FIG. 1. The model comprises a feature extraction network and an unsupervised target discrimination network; the feature extraction network serves as the generator and comprises a feature extractor formed by the YOLO-v5 Backbone network and a feature adapter formed by several fully connected layers and/or multi-layer perceptrons (MLP);
The feature extractor comprises two Focus structures and two CSP structures. The Focus structure performs a picture-slicing operation on the image before semantic feature extraction: the width-W and height-H information is gathered into the channel space as four similar, complementary images, expanding the input channels by a factor of 4 (the spliced image has 12 channels instead of the original 3 RGB channels); a convolution is then applied to the new image, finally producing a twofold-downsampled feature map without information loss;
The CSP structure splits the feature map into two parts: one part undergoes convolution, the other part is concatenated (Concat) with the convolution result of the first, and further convolutions are then applied, so that multi-layer semantic features of the processed feature map are extracted.
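The Focus slicing step described above can be illustrated directly; this is a sketch of the slicing only, and the convolution that follows it in the actual structure is omitted.

```python
import numpy as np

def focus_slice(x):
    """Focus slicing: take every second pixel in four phase-shifted
    patterns and stack them on the channel axis, so an (H, W, 3) image
    becomes an (H/2, W/2, 12) tensor with no information loss."""
    return np.concatenate(
        [x[::2, ::2], x[1::2, ::2], x[::2, 1::2], x[1::2, 1::2]], axis=-1)
```

Note that the output has exactly as many values as the input, which is what "no information loss" means here: the spatial resolution is halved while the channel count quadruples.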
In some specific embodiments, the feature extractor is composed of multiple convolution layers, pooling layers, activation functions, batch normalization layers, a global average pooling layer, and fully connected layers. Through the feature extractor, shallow semantic information of the image can be extracted layer by layer into deep, high-level semantic information, which serves the subsequent target detection step.
S4, training the small target detection model, i.e., performing adversarial learning: fusing a degradation-enhanced image of the detection target with the pure background image to obtain a synthesized image, and feeding the synthesized image and the pure background image into the feature extraction network to generate an abnormal feature map and a normal feature map, respectively;
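The fusion of a degradation-enhanced target with a clean background into a synthesized image can be sketched as a simple paste operation. This is an assumption for illustration; the actual fusion may blend the patch with some transparency as described in the enhancement step.

```python
import numpy as np

def synthesize(background, target, top, left):
    """Paste a degradation-enhanced target patch into a clean background
    at (top, left) to form the synthesized (abnormal) training image."""
    out = background.astype(float).copy()
    th, tw = target.shape[:2]
    out[top:top + th, left:left + tw] = target
    return out
```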
In the training stage, after local features are obtained from the pure background image by the feature extractor, the feature adapter projects the local features to adaptive features, transferring the training features to the target domain and generating the mapped image features.
The binary classification discriminator is specifically a two-layer MLP structure that acts as a normality scorer, directly estimating the normality of each position (h, w), where h is the height index and w is the width index of the position of the image patch. The normality estimation process is expressed as: s(h, w) = ψ(F̂(h, w)), where ψ(·) is the normality scorer that directly estimates the normality of each position, and F̂(h, w) is the transformed adaptive feature. Since negative samples are generated from normal features, both are input to the discriminator during training. The discriminator is expected to output positive for normal features and negative for abnormal features.
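The per-position normality scoring by a two-layer MLP can be sketched as follows. The weight shapes and the ReLU/sigmoid choices are assumptions for illustration; the patent only specifies a two-layer MLP producing a normality score per position.

```python
import numpy as np

def normality_scores(features, w1, b1, w2, b2):
    """Two-layer MLP applied independently at every position (h, w) of an
    (H, W, C) adaptive feature map; the sigmoid output is the normality
    score s(h, w). Weight shapes: w1 is (C, D), w2 is (D, 1)."""
    hidden = np.maximum(features @ w1 + b1, 0.0)   # ReLU
    logits = hidden @ w2 + b2                      # (H, W, 1)
    return 1.0 / (1.0 + np.exp(-logits[..., 0]))   # (H, W)
```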
The unsupervised target discrimination network feeds the abnormal feature map (synthesized image features) and the normal feature map (mapped image features) into the discriminator. The discriminator constantly learns how to better distinguish the two types of features, while the generator learns how to generate more realistic features to fool the discriminator. In this process, the discriminator loss function measures the discriminator's performance in distinguishing the two.
In this embodiment, the loss function of the discriminator adopts a binary cross-entropy loss, which quantifies how accurately the discriminator identifies the mapped image features versus the synthesized image features; the lower the value, the better the performance. The discriminator loss is calculated as:
L_D = -E_x[log D(G(x))] - E_y[log(1 - D(G(y)))]
where G is the generator, D is the discriminator, x denotes the real data, y denotes the synthetic data, G(x) denotes the normal feature map, G(y) denotes the abnormal feature map, D[G(x)] denotes the discriminator's result on the normal feature map, and D[G(y)] denotes its result on the abnormal feature map.
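The binary cross-entropy discriminator loss can be written down directly. This is a minimal sketch; the `eps` term guards against log(0) and is an implementation detail not stated in the patent.

```python
import math

def discriminator_loss(d_normal, d_abnormal, eps=1e-12):
    """Binary cross-entropy discriminator loss
    L_D = -mean(log D[G(x)]) - mean(log(1 - D[G(y)])),
    where d_normal are scores on normal (mapped background) features and
    d_abnormal are scores on abnormal (synthesized) features."""
    real = -sum(math.log(p + eps) for p in d_normal) / len(d_normal)
    fake = -sum(math.log(1.0 - p + eps) for p in d_abnormal) / len(d_abnormal)
    return real + fake
```

A perfect discriminator (scores near 1 on normal features, near 0 on abnormal ones) drives this loss toward 0, consistent with "the lower the value, the better the performance".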
During model training, the generator and the discriminator learn against each other: the generator attempts to generate samples similar to the real data, and the discriminator attempts to distinguish generated samples from real samples. Through this adversarial learning process, the generator continually improves the quality of the generated samples, and the discriminator continually improves its ability to identify them. The invention introduces a generative adversarial network into the target detection task, using a feature extractor in place of the generator: the feature extractor generates the corresponding feature maps of the image background and of the synthesized image, the two feature maps are fed into the discriminator at the same time, the discriminator learns to distinguish their differences, and the feature extractor continuously learns to strengthen its extraction of image features. Through this adversarial learning, the feature extractor gains stronger feature-extraction capability and the discriminator gains stronger discrimination capability. In continuous adversarial learning, the discriminator concentrates on the differences between the two feature maps and realizes the localization and detection of small target objects in the feature map of the synthesized image.
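For completeness, the objective that drives the feature extractor (the generator in this design) can be sketched as well. This particular non-saturating form is an assumption, since the patent only specifies the discriminator loss.

```python
import math

def generator_loss(d_abnormal, eps=1e-12):
    """Non-saturating generator objective, L_G = -mean(log D[G(y)]):
    the feature extractor is rewarded when the discriminator scores the
    synthesized features as 'normal', which is what drives the two
    networks' adversarial competition."""
    return -sum(math.log(p + eps) for p in d_abnormal) / len(d_abnormal)
```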
S5, selecting an image of a target to be detected from the test set, inputting it into the trained small target detection model, and outputting the UAV small target detection result.
The discriminator performs adversarial learning on the mapped image features and the synthesized image features and is used for target detection, as shown in FIG. 3. The trained model can focus on the target region according to the features of the image under test, realizing localization and detection of the small target.
Because multi-scale image degradation and enhancement and the feature extraction process are added during network training, the trained model distinguishes targets from background more clearly, adapts better to real-world images with strong generalization and high robustness, and effectively improves detection accuracy.
The foregoing is merely a preferred embodiment of the invention. It should be understood that the invention is not limited to the forms disclosed herein, and this description is not to be construed as excluding other embodiments; the invention is capable of use in various other combinations, modifications, and environments, and of changes within the scope of the inventive concept described herein, whether guided by the above teachings or by the skill or knowledge of the relevant art. All modifications and variations that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims.

Claims (5)

1. A UAV small target detection method based on unsupervised adversarial learning, characterized by comprising the following steps:
Acquiring an original image containing a small target of the unmanned aerial vehicle in a training set, and extracting a rectangular area containing a detection target in the original image to obtain a detection target image and a pure background image without the detection target;
Performing image degradation and enhancement operations on the detection target image, wherein the image degradation and enhancement operations comprise respectively performing nearest neighbor interpolation, bilinear interpolation and bicubic interpolation on the detection target image, randomly selecting one interpolation method for performing secondary degradation on the obtained three groups of images with different degradation degrees to obtain another three groups of images with different degradation degrees, and performing multi-mode enhancement processing on the obtained degradation images to obtain degradation enhancement images of multiple groups of detection targets;
The method comprises the steps of constructing a small target detection model, wherein the small target detection model comprises a feature extraction network and an unsupervised target discrimination network, the feature extraction network is used as a generator and comprises a feature extractor formed by a YOLO-v5 backhaul backbone network and a feature adapter formed by a plurality of full connection layers and/or multi-layer perceptrons MLP;
Training the small target detection model, namely performing countermeasure learning, comprising the steps of fusing a degradation enhanced image and a pure background image of a detection target to obtain a synthesized image, respectively inputting the synthesized image and the pure background image into a feature extraction network to respectively generate an abnormal feature image and a normal feature image;
and selecting an image of a target to be detected in the test set, inputting the image into a trained small target detection model, and outputting a small target detection result of the unmanned aerial vehicle.
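The two-stage degradation step of claim 1 can be sketched as follows. This is a non-authoritative illustration, not the claimed implementation: the function names and the 0.5 scale factor are assumptions, and PyTorch's `F.interpolate` is used because it exposes all three interpolation modes directly.

```python
import random
import torch
import torch.nn.functional as F

MODES = ["nearest", "bilinear", "bicubic"]

def degrade(img, mode, scale=0.5):
    """Downsample a (B, C, H, W) image tensor with one interpolation mode."""
    kwargs = {} if mode == "nearest" else {"align_corners": False}
    return F.interpolate(img, scale_factor=scale, mode=mode, **kwargs)

def two_stage_degrade(img, scale=0.5):
    """First degrade with each of the three modes, then degrade each result
    again with a randomly chosen mode, giving 3 + 3 degraded images."""
    first = [degrade(img, m, scale) for m in MODES]
    second = [degrade(d, random.choice(MODES), scale) for d in first]
    return first, second
```

The six outputs (three first-stage, three second-stage images) would then pass through the enhancement step before being fused with the background.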
2. The unmanned aerial vehicle small target detection method based on unsupervised adversarial learning of claim 1, wherein the enhancement processing comprises random scaling and translation transformations, adjustment of the hue, saturation and brightness of the images, and fusing two images together with a given transparency.
3. The unmanned aerial vehicle small target detection method based on unsupervised adversarial learning of claim 1, wherein the feature extractor comprises two Focus structures and two CSP structures; the Focus structure slices the image before semantic feature extraction, gathering the width-W and height-H information of four similar, complementary sub-images into the channel dimension, expanding the input channels fourfold, i.e. the spliced image has 12 channels instead of the original three RGB channels; a convolution is then applied to the resulting new image, finally producing a twofold-downsampled feature map with no information loss;
the CSP structure splits the feature map into two parts, applies a convolution to one part, concatenates (Concat) the result with the other part, and then applies several further convolutions, thereby extracting multi-layer semantic features from the processed feature map.
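The Focus slicing of claim 3 can be expressed in a few lines. This is an illustrative sketch of the slicing alone; in YOLO-v5 a convolution follows the concatenation.

```python
import torch

def focus_slice(x):
    """YOLO-v5 style Focus slicing: gather the four pixel-phase sub-images of
    a (B, C, H, W) tensor into the channel dimension -> (B, 4C, H/2, W/2).
    No pixel is dropped, so the 2x downsampling is information-lossless."""
    return torch.cat([x[..., ::2, ::2],     # even rows, even cols
                      x[..., 1::2, ::2],    # odd rows,  even cols
                      x[..., ::2, 1::2],    # even rows, odd cols
                      x[..., 1::2, 1::2]],  # odd rows,  odd cols
                     dim=1)
```

For a 3-channel input this yields the 12-channel spliced image described in the claim.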
4. The unmanned aerial vehicle small target detection method based on unsupervised adversarial learning of claim 1, wherein the classification discriminator is a double-layer MLP structure that directly estimates a normality score s_{h,w} for each position, where h is the height index and w the width index of the position of the image block;
the normality estimation process is expressed as: s_{h,w} = ψ(q_{h,w}), wherein ψ is the normality scorer that directly estimates the normality of each position (h, w), and q_{h,w} is the transformed adaptive feature at that position.
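A minimal sketch of such a position-wise scorer. The hidden width of 128 and the sigmoid squashing to (0, 1) are illustrative assumptions not specified by the claim.

```python
import torch
import torch.nn as nn

class NormalityScorer(nn.Module):
    """Double-layer MLP that maps the adapted feature q_{h,w} at every
    spatial position to a scalar normality score s_{h,w} in (0, 1)."""
    def __init__(self, feat_dim, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, q):                              # q: (B, H, W, C)
        return torch.sigmoid(self.mlp(q)).squeeze(-1)  # s: (B, H, W)
```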
5. The unmanned aerial vehicle small target detection method based on unsupervised adversarial learning of claim 1, wherein the loss function of the discriminator is a binary cross-entropy loss that quantifies how accurately the discriminator distinguishes the mapped (normal) image features from the synthesized image features; the lower its value, the better the performance. The discriminator loss is calculated as:
L_D = -E_x[log D(G(x))] - E_y[log(1 - D(G(y)))]
wherein G is the generator, D is the discriminator, x denotes real data, y denotes synthetic data, G(x) denotes the normal feature map, G(y) denotes the abnormal feature map, D[G(x)] denotes the discriminator's judgment of the normal feature map, and D[G(y)] denotes the discriminator's judgment of the abnormal feature map.
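Under those definitions, a standard binary cross-entropy discriminator loss of this form can be sketched as follows; the `eps` clamp is a numerical-stability assumption, and the expectations are approximated by batch means.

```python
import torch

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """Binary cross-entropy discriminator loss: d_real = D(G(x)) on normal
    feature maps, d_fake = D(G(y)) on abnormal ones; lower is better."""
    return -(torch.log(d_real + eps).mean()
             + torch.log(1.0 - d_fake + eps).mean())
```

A discriminator that scores normal features near 1 and abnormal features near 0 drives this loss toward zero, which is the sense in which "the lower the value, the better the performance".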
CN202411265268.8A 2024-09-10 2024-09-10 Unmanned aerial vehicle small target detection method based on unsupervised countermeasure learning Active CN119131364B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411265268.8A CN119131364B (en) 2024-09-10 2024-09-10 Unmanned aerial vehicle small target detection method based on unsupervised countermeasure learning

Publications (2)

Publication Number Publication Date
CN119131364A true CN119131364A (en) 2024-12-13
CN119131364B CN119131364B (en) 2025-07-22

Family

ID=93754955

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119437044A (en) * 2025-01-08 2025-02-14 中国民用航空飞行学院 A method and system for measuring large hot red parts
CN119784748A (en) * 2025-03-09 2025-04-08 中国民用航空飞行学院 Image processing and analysis system and method based on image

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807464A (en) * 2021-09-29 2021-12-17 东南大学 Target detection method of UAV aerial imagery based on improved YOLO V5
WO2022000426A1 (en) * 2020-06-30 2022-01-06 中国科学院自动化研究所 Method and system for segmenting moving target on basis of twin deep neural network
CN118429779A (en) * 2024-05-31 2024-08-02 重庆西部笔迹大数据研究院 Text image countermeasure generation system, method, equipment and medium based on multi-mode information prompt

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xiaoguang Tu, et al., "An improved YOLOv5 for object detection in visible and thermal infrared images based on contrastive learning", Front. Phys., 21 April 2023 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant