
CN119131364A - A method for detecting small targets in drones based on unsupervised adversarial learning - Google Patents


Info

Publication number
CN119131364A
CN119131364A (application CN202411265268.8A; granted publication CN119131364B)
Authority
CN
China
Prior art keywords
image
feature
images
small target
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411265268.8A
Other languages
Chinese (zh)
Other versions
CN119131364B (en)
Inventor
涂晓光
何志
康朋新
李卓骏
张艳艳
杨明
刘建华
殷举航
王宇
周超
崔雨勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Civil Aviation Flight University of China
South West Institute of Technical Physics
Original Assignee
Civil Aviation Flight University of China
South West Institute of Technical Physics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Civil Aviation Flight University of China, South West Institute of Technical Physics filed Critical Civil Aviation Flight University of China
Priority to CN202411265268.8A priority Critical patent/CN119131364B/en
Publication of CN119131364A publication Critical patent/CN119131364A/en
Application granted granted Critical
Publication of CN119131364B publication Critical patent/CN119131364B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method for detecting small unmanned aerial vehicle (UAV) targets based on unsupervised adversarial learning, and relates to the technical field of small target detection. By introducing multi-scale image degradation and enhancement, the detection model can better learn the diverse structures and patterns in the data and better identify small targets. In this way, the model helps the target detection framework learn semantic features that are more discriminative and generalizable for small target recognition. The method introduces a generative adversarial network into the target detection task, using a feature extractor in place of the generator. The feature extractor generates the corresponding feature maps of the image background and of the synthesized image; both feature maps are fed into a discriminator at the same time, and the discriminator learns to distinguish the differences between them. In continuous adversarial learning, the discriminator concentrates on the differences between the two feature maps and is used to accurately locate and detect small target objects in the feature map of the synthesized image.

Description

Unmanned aerial vehicle small target detection method based on unsupervised adversarial learning
Technical Field
The invention relates to the technical field of small target detection, and in particular to a UAV small target detection method based on unsupervised adversarial learning.
Background
UAV target detection is an important branch of target detection in computer vision. With the continuous progress of UAV technology, the range of UAV applications keeps expanding across military, civilian, and commercial fields. The purpose of UAV target detection is to locate, detect, and classify UAVs in the images or videos under inspection. At the same time, the potential threats posed by UAVs have become increasingly prominent, such as illegal intrusion and malicious attacks. Today, machine-vision-based UAV target detection is widely applied in daily life, for example at airports, secure facilities, and in video surveillance.
Early target detection relied primarily on hand-crafted features and machine-learning classifiers; typical methods included Haar features with cascade classifiers (e.g., the Viola-Jones algorithm) and methods based on HOG (Histogram of Oriented Gradients) features, but these performed poorly under complex backgrounds, multiple targets, and scale variation. With the continued development of computer vision, a succession of detection algorithms changed this situation. R-CNN (Regions with CNN features) was one of the pioneering works to use deep learning: it locates objects in an image via region proposals, extracts features for each proposal with a convolutional neural network (CNN), and finally classifies them. Fast R-CNN improved on R-CNN by integrating the whole detection pipeline into a single network to increase speed and efficiency, using an RoI (Region of Interest) pooling layer to extract features from a shared feature map. Today, single-stage detectors such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) directly predict object classes and positions in a single forward pass, greatly improving detection speed. The YOLO-v5 network, as an efficient one-stage algorithm, is characterized by strong generalization and fast inference.
At present, many UAV target detection algorithms, such as improved variants of YOLO-v5 and EfficientDet, achieve good results. These algorithms are supervised in nature, and supervised target detection still has shortcomings: supervised algorithms generally require a large amount of labeled data for training, especially in deep learning; their adaptability to occlusion and varied target poses is limited; and because they can usually only learn the patterns present in the labeled data, their generalization to targets in new domains or different environments can be poor.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a UAV small target detection method based on unsupervised adversarial learning, which improves UAV target detection accuracy.
The aim of the invention is realized by the following technical scheme:
A UAV small target detection method based on unsupervised adversarial learning comprises the following steps:
Acquiring an original image containing a small UAV target from the training set, and extracting the rectangular region containing the detection target to obtain a detection-target image and a pure background image without the detection target;
Performing image degradation and enhancement on the detection-target image: applying nearest-neighbor, bilinear, and bicubic interpolation to the detection-target image respectively; randomly selecting one of these interpolation methods to apply a secondary degradation to the three resulting groups of images of different degradation degrees, yielding another three groups of images of different degradation degrees; and applying multi-mode enhancement to the degraded images to obtain multiple groups of degradation-enhanced images of the detection target;
Constructing a small target detection model comprising a feature extraction network and an unsupervised target discrimination network, where the feature extraction network serves as the generator and comprises a feature extractor formed by the YOLO-v5 Backbone network and a feature adapter formed by several fully connected layers and/or multi-layer perceptrons (MLP);
Training the small target detection model, i.e., performing adversarial learning: fusing a degradation-enhanced image of the detection target with the pure background image to obtain a synthesized image, and feeding the synthesized image and the pure background image into the feature extraction network to generate an abnormal feature map and a normal feature map, respectively;
Selecting an image of a target to be detected from the test set, inputting it into the trained small target detection model, and outputting the UAV small target detection result.
Further, the enhancement processing includes random scaling and translation transformations, adjustment of the hue, saturation, and brightness of the image, and fusion of two images with a certain transparency.
Further, the feature extractor comprises two Focus structures and two CSP structures. The Focus structure performs a picture-slicing operation on the image before semantic feature extraction: the width-W and height-H information is gathered into the channel space as four similar, complementary images, expanding the input channels by a factor of 4 (the spliced image has 12 channels instead of the original 3 RGB channels); a convolution is then applied to the new image, finally producing a twofold-downsampled feature map without information loss;
The CSP structure splits the feature map into two parts: one part undergoes convolution, the other part is concatenated (Concat) with the convolution result of the first, and further convolutions are then applied, so that multi-layer semantic features of the processed feature map are extracted.
Further, the binary classification discriminator is embodied as a two-layer MLP structure that acts as a normality scorer, directly estimating the normality of each position (h, w), where h is the height index and w is the width index of the position of the image patch;
the normality estimation process is expressed as: s(h, w) = ψ(F̂(h, w)), where ψ(·) is the normality scorer that directly estimates the normality of each position (h, w), and F̂(h, w) is the transformed adaptive feature.
Further, the loss function of the discriminator adopts a binary cross-entropy loss, which quantifies how accurately the discriminator identifies the mapped image features versus the synthesized image features; the lower the value, the better the performance. The discriminator loss is calculated as:
L_D = -E_x[log D(G(x))] - E_y[log(1 - D(G(y)))]
where G is the generator, D is the discriminator, x denotes the real data, y denotes the synthetic data, G(x) denotes the normal feature map, G(y) denotes the abnormal feature map, D[G(x)] denotes the discriminator's result on the normal feature map, and D[G(y)] denotes its result on the abnormal feature map.
The beneficial effects of the invention are as follows:
(1) The invention introduces multi-scale image degradation and enhancement into the target detection algorithm, so that the model can better learn the diverse structures and patterns in the data and the detection algorithm can better identify small targets. In this way, the model helps the target detection framework learn semantic features that are more discriminative and generalizable for small target recognition. In addition, multi-mode enhancement of the degraded targets brings the training data as close as possible to the real data distribution, avoids problems such as sample imbalance, forces the model to learn more robust features, and effectively improves the generalization ability of the model.
(2) The invention introduces a generative adversarial network into the target detection task, using a feature extractor in place of the generator. The feature extractor generates the corresponding feature maps of the image background and of the synthesized image; the two different feature maps are fed into the discriminator at the same time, the discriminator learns to distinguish their differences, and the feature extractor continuously learns to strengthen its extraction of image features. In continuous adversarial learning, the discriminator concentrates on the differences between the two feature maps and is used to locate and detect small target objects in the feature map of the synthesized image.
Drawings
FIG. 1 is a diagram of an overall network architecture of the present invention;
FIG. 2 is a schematic diagram of image multi-scale degradation and enhancement;
fig. 3 is a schematic diagram of adversarial learning between mapped features and image features.
Detailed Description
The technical solutions of the present invention will be described clearly and completely below with reference to the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art without inventive effort, based on the embodiments of the present invention, fall within the scope of protection of the present invention.
Referring to fig. 1-3, the present invention provides a technical solution:
A UAV small target detection method based on unsupervised adversarial learning comprises the following steps:
s1, acquiring an original image containing a small target of an unmanned aerial vehicle in a training set, and extracting a rectangular area containing a detection target in the original image to obtain a detection target image and a pure background image without the detection target;
Small target objects typically have a small size, which means they occupy relatively few pixels in the image; the target may be submerged by neighboring background pixels, and detection algorithms very easily misidentify non-target areas as small targets. The detection algorithm therefore needs more, and more discriminative, features to distinguish small target objects from the background. Multi-scale degradation and enhancement of the image aims to let the model fully learn the diverse, multi-scale data characteristics of small target objects of different scales and different modes.
S2, performing image degradation and enhancement on the detection-target image: applying nearest-neighbor, bilinear, and bicubic interpolation to the detection-target image respectively; randomly selecting one of these interpolation methods to apply a secondary degradation to the three resulting groups of images of different degradation degrees, yielding another three groups of images of different degradation degrees; and applying multi-mode enhancement to the degraded images to obtain multiple groups of degradation-enhanced images of the detection target. A schematic diagram of the multi-scale degradation and enhancement process is shown in FIG. 2.
The enhancement processing includes random scaling and translation transformations, adjustment of the hue, saturation, and brightness of the image, and fusion of two images with a certain transparency.
In this embodiment, 3 different degradation scales and 3 degradation methods are used to generate 9 groups of different degraded images in total, producing varied training samples, enriching the expanded training data set, and letting the detection model learn richer and more comprehensive target characteristics. In a real scene, small target detection is often affected by multiple factors, such as low resolution, occlusion, and illumination changes, which blur the target image. Multi-scale image degradation can simulate these situations, so that the model is exposed to more diverse data during training and is thereby better adapted to complex detection scenes and different scene changes.
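As a rough illustration of this step, the multi-scale degradation can be sketched in plain NumPy. This is a minimal sketch, not the patent's actual implementation: function names are illustrative, only two of the three interpolation methods are implemented, and bicubic interpolation is omitted for brevity.

```python
import numpy as np

def degrade_nearest(img, scale):
    """Nearest-neighbour degradation via integer index mapping."""
    h, w = img.shape[:2]
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    rows = np.clip((np.arange(nh) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(nw) / scale).astype(int), 0, w - 1)
    return img[rows][:, cols]

def degrade_bilinear(img, scale):
    """Bilinear degradation implemented directly with NumPy."""
    h, w = img.shape[:2]
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    ys, xs = np.linspace(0, h - 1, nh), np.linspace(0, w - 1, nw)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None, None], (xs - x0)[None, :, None]
    f = img.astype(float)
    top = f[y0][:, x0] * (1 - wx) + f[y0][:, x1] * wx
    bot = f[y1][:, x0] * (1 - wx) + f[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def multiscale_degrade(img, scales=(0.5, 0.25, 0.125), rng=None):
    """First-pass degradation at three scales with each method, then one
    randomly chosen method applied again as the secondary degradation."""
    rng = np.random.default_rng(0) if rng is None else rng
    methods = [degrade_nearest, degrade_bilinear]
    first = [m(img, s) for s in scales for m in methods]
    second = [methods[rng.integers(len(methods))](d, 0.5) for d in first]
    return first + second
```

With the third (bicubic) method included, 3 scales x 3 methods would yield the 9 groups described in the embodiment.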
Through data enhancement, more training data is generated from the training samples of the existing model, so that the expanded training data more closely resembles the real data distribution. The data enhancement method uses random combinations of color-channel transformation, image stitching, image scaling, and image blurring, forcing the model to learn more robust features and effectively improving its generalization ability.
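The random combination of enhancement operations can be sketched as follows. This is a minimal illustration with simple stand-ins: channel shuffling for the colour-channel transform, `np.roll` for translation, and an alpha blend for the transparency fusion; the function names are hypothetical.

```python
import numpy as np

def augment(img, rng):
    """Randomly compose colour-channel transformation, brightness jitter
    and translation (each applied with probability 0.5)."""
    out = img.astype(float)
    if rng.random() < 0.5:                      # colour-channel transform
        out = out[..., rng.permutation(out.shape[-1])]
    if rng.random() < 0.5:                      # brightness adjustment
        out = np.clip(out * rng.uniform(0.7, 1.3), 0.0, 255.0)
    if rng.random() < 0.5:                      # horizontal translation
        out = np.roll(out, int(rng.integers(-4, 5)), axis=1)
    return out

def alpha_fuse(a, b, alpha=0.5):
    """Fuse two same-sized images with transparency alpha."""
    return alpha * a.astype(float) + (1.0 - alpha) * b.astype(float)
```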
S3, constructing a small target detection model, as shown in FIG. 1. The model comprises a feature extraction network and an unsupervised target discrimination network; the feature extraction network serves as the generator and comprises a feature extractor formed by the YOLO-v5 Backbone network and a feature adapter formed by several fully connected layers and/or multi-layer perceptrons (MLP);
The feature extractor comprises two Focus structures and two CSP structures. The Focus structure performs a picture-slicing operation on the image before semantic feature extraction: the width-W and height-H information is gathered into the channel space as four similar, complementary images, expanding the input channels by a factor of 4 (the spliced image has 12 channels instead of the original 3 RGB channels); a convolution is then applied to the new image, finally producing a twofold-downsampled feature map without information loss;
The CSP structure splits the feature map into two parts: one part undergoes convolution, the other part is concatenated (Concat) with the convolution result of the first, and further convolutions are then applied, so that multi-layer semantic features of the processed feature map are extracted.
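The Focus slicing step described above can be illustrated directly; this is a sketch of the slicing only, and the convolution that follows it in the actual structure is omitted.

```python
import numpy as np

def focus_slice(x):
    """Focus slicing: take every second pixel in four phase-shifted
    patterns and stack them on the channel axis, so an (H, W, 3) image
    becomes an (H/2, W/2, 12) tensor with no information loss."""
    return np.concatenate(
        [x[::2, ::2], x[1::2, ::2], x[::2, 1::2], x[1::2, 1::2]], axis=-1)
```

Note that the output has exactly as many values as the input, which is what "no information loss" means here: the spatial resolution is halved while the channel count quadruples.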
In some specific embodiments, the feature extractor is composed of multiple convolution layers, pooling layers, activation functions, batch normalization layers, a global average pooling layer, and fully connected layers. Through the feature extractor, shallow semantic information of the image can be extracted layer by layer into deep, high-level semantic information, which serves the subsequent target detection step.
S4, training the small target detection model, i.e., performing adversarial learning: fusing a degradation-enhanced image of the detection target with the pure background image to obtain a synthesized image, and feeding the synthesized image and the pure background image into the feature extraction network to generate an abnormal feature map and a normal feature map, respectively;
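The fusion of a degradation-enhanced target with a clean background into a synthesized image can be sketched as a simple paste operation. This is an assumption for illustration; the actual fusion may blend the patch with some transparency as described in the enhancement step.

```python
import numpy as np

def synthesize(background, target, top, left):
    """Paste a degradation-enhanced target patch into a clean background
    at (top, left) to form the synthesized (abnormal) training image."""
    out = background.astype(float).copy()
    th, tw = target.shape[:2]
    out[top:top + th, left:left + tw] = target
    return out
```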
In the training stage, after local features are obtained from the pure background image by the feature extractor, the feature adapter projects the local features to adaptive features, transferring the training features to the target domain and generating the mapped image features.
The binary classification discriminator is specifically a two-layer MLP structure that acts as a normality scorer, directly estimating the normality of each position (h, w), where h is the height index and w is the width index of the position of the image patch. The normality estimation process is expressed as: s(h, w) = ψ(F̂(h, w)), where ψ(·) is the normality scorer that directly estimates the normality of each position, and F̂(h, w) is the transformed adaptive feature. Since negative samples are generated from normal features, both are input to the discriminator during training. The discriminator is expected to output positive for normal features and negative for abnormal features.
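The per-position normality scoring by a two-layer MLP can be sketched as follows. The weight shapes and the ReLU/sigmoid choices are assumptions for illustration; the patent only specifies a two-layer MLP producing a normality score per position.

```python
import numpy as np

def normality_scores(features, w1, b1, w2, b2):
    """Two-layer MLP applied independently at every position (h, w) of an
    (H, W, C) adaptive feature map; the sigmoid output is the normality
    score s(h, w). Weight shapes: w1 is (C, D), w2 is (D, 1)."""
    hidden = np.maximum(features @ w1 + b1, 0.0)   # ReLU
    logits = hidden @ w2 + b2                      # (H, W, 1)
    return 1.0 / (1.0 + np.exp(-logits[..., 0]))   # (H, W)
```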
The unsupervised target discrimination network feeds the abnormal feature map (synthesized image features) and the normal feature map (mapped image features) into the discriminator. The discriminator constantly learns how to better distinguish the two types of features, while the generator learns how to generate more realistic features to fool the discriminator. In this process, the discriminator loss function measures the discriminator's performance in distinguishing the two.
In this embodiment, the loss function of the discriminator adopts a binary cross-entropy loss, which quantifies how accurately the discriminator identifies the mapped image features versus the synthesized image features; the lower the value, the better the performance. The discriminator loss is calculated as:
L_D = -E_x[log D(G(x))] - E_y[log(1 - D(G(y)))]
where G is the generator, D is the discriminator, x denotes the real data, y denotes the synthetic data, G(x) denotes the normal feature map, G(y) denotes the abnormal feature map, D[G(x)] denotes the discriminator's result on the normal feature map, and D[G(y)] denotes its result on the abnormal feature map.
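The binary cross-entropy discriminator loss can be written down directly. This is a minimal sketch; the `eps` term guards against log(0) and is an implementation detail not stated in the patent.

```python
import math

def discriminator_loss(d_normal, d_abnormal, eps=1e-12):
    """Binary cross-entropy discriminator loss
    L_D = -mean(log D[G(x)]) - mean(log(1 - D[G(y)])),
    where d_normal are scores on normal (mapped background) features and
    d_abnormal are scores on abnormal (synthesized) features."""
    real = -sum(math.log(p + eps) for p in d_normal) / len(d_normal)
    fake = -sum(math.log(1.0 - p + eps) for p in d_abnormal) / len(d_abnormal)
    return real + fake
```

A perfect discriminator (scores near 1 on normal features, near 0 on abnormal ones) drives this loss toward 0, consistent with "the lower the value, the better the performance".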
During model training, the generator and the discriminator learn against each other: the generator attempts to generate samples similar to the real data, and the discriminator attempts to distinguish generated samples from real samples. Through this adversarial learning process, the generator continually improves the quality of the generated samples, and the discriminator continually improves its ability to identify them. The invention introduces a generative adversarial network into the target detection task, using a feature extractor in place of the generator: the feature extractor generates the corresponding feature maps of the image background and of the synthesized image, the two feature maps are fed into the discriminator at the same time, the discriminator learns to distinguish their differences, and the feature extractor continuously learns to strengthen its extraction of image features. Through this adversarial learning, the feature extractor gains stronger feature-extraction capability and the discriminator gains stronger discrimination capability. In continuous adversarial learning, the discriminator concentrates on the differences between the two feature maps and realizes the localization and detection of small target objects in the feature map of the synthesized image.
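For completeness, the objective that drives the feature extractor (the generator in this design) can be sketched as well. This particular non-saturating form is an assumption, since the patent only specifies the discriminator loss.

```python
import math

def generator_loss(d_abnormal, eps=1e-12):
    """Non-saturating generator objective, L_G = -mean(log D[G(y)]):
    the feature extractor is rewarded when the discriminator scores the
    synthesized features as 'normal', which is what drives the two
    networks' adversarial competition."""
    return -sum(math.log(p + eps) for p in d_abnormal) / len(d_abnormal)
```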
S5, selecting an image of a target to be detected from the test set, inputting it into the trained small target detection model, and outputting the UAV small target detection result.
The discriminator performs adversarial learning on the mapped image features and the synthesized image features and is used for target detection, as shown in FIG. 3. The trained model can focus on the target region according to the features of the image under test, realizing localization and detection of the small target.
Because multi-scale image degradation and enhancement and the feature extraction process are added during network training, the trained model distinguishes targets from background more clearly, adapts better to real-world images with strong generalization and high robustness, and effectively improves detection accuracy.
The foregoing is merely a preferred embodiment of the invention. It should be understood that the invention is not limited to the forms disclosed herein, and this description is not to be construed as excluding other embodiments; the invention is capable of use in various other combinations, modifications, and environments, and of changes within the scope of the inventive concept described herein, whether guided by the above teachings or by the skill or knowledge of the relevant art. All modifications and variations that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims.

Claims (5)

1. A UAV small target detection method based on unsupervised adversarial learning, characterized by comprising the following steps:
Acquiring an original image containing a small target of the unmanned aerial vehicle in a training set, and extracting a rectangular area containing a detection target in the original image to obtain a detection target image and a pure background image without the detection target;
Performing image degradation and enhancement operations on the detection target image, wherein the image degradation and enhancement operations comprise respectively performing nearest neighbor interpolation, bilinear interpolation and bicubic interpolation on the detection target image, randomly selecting one interpolation method for performing secondary degradation on the obtained three groups of images with different degradation degrees to obtain another three groups of images with different degradation degrees, and performing multi-mode enhancement processing on the obtained degradation images to obtain degradation enhancement images of multiple groups of detection targets;
The method comprises the steps of constructing a small target detection model, wherein the small target detection model comprises a feature extraction network and an unsupervised target discrimination network, the feature extraction network is used as a generator and comprises a feature extractor formed by a YOLO-v5 backhaul backbone network and a feature adapter formed by a plurality of full connection layers and/or multi-layer perceptrons MLP;
Training the small target detection model, namely performing countermeasure learning, comprising the steps of fusing a degradation enhanced image and a pure background image of a detection target to obtain a synthesized image, respectively inputting the synthesized image and the pure background image into a feature extraction network to respectively generate an abnormal feature image and a normal feature image;
and selecting an image of a target to be detected in the test set, inputting the image into a trained small target detection model, and outputting a small target detection result of the unmanned aerial vehicle.
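The two-stage degradation step of claim 1 can be sketched as follows. This is a non-authoritative illustration, not the claimed implementation: the function names and the 0.5 scale factor are assumptions, and PyTorch's `F.interpolate` is used because it exposes all three interpolation modes directly.

```python
import random
import torch
import torch.nn.functional as F

MODES = ["nearest", "bilinear", "bicubic"]

def degrade(img, mode, scale=0.5):
    """Downsample a (B, C, H, W) image tensor with one interpolation mode."""
    kwargs = {} if mode == "nearest" else {"align_corners": False}
    return F.interpolate(img, scale_factor=scale, mode=mode, **kwargs)

def two_stage_degrade(img, scale=0.5):
    """First degrade with each of the three modes, then degrade each result
    again with a randomly chosen mode, giving 3 + 3 degraded images."""
    first = [degrade(img, m, scale) for m in MODES]
    second = [degrade(d, random.choice(MODES), scale) for d in first]
    return first, second
```

The six outputs (three first-stage, three second-stage images) would then pass through the enhancement step before being fused with the background.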
2. The unmanned aerial vehicle small target detection method based on unsupervised adversarial learning of claim 1, wherein the enhancement processing comprises random scaling and translation transformations, adjustment of the hue, saturation and brightness of the images, and fusing two images together with a given transparency.
3. The unmanned aerial vehicle small target detection method based on unsupervised adversarial learning of claim 1, wherein the feature extractor comprises two Focus structures and two CSP structures; the Focus structure slices the image before semantic feature extraction, gathering the width-W and height-H information of four similar, complementary sub-images into the channel dimension, expanding the input channels fourfold, i.e. the spliced image has 12 channels instead of the original three RGB channels; a convolution is then applied to the resulting new image, finally producing a twofold-downsampled feature map with no information loss;
the CSP structure splits the feature map into two parts, applies a convolution to one part, concatenates (Concat) the result with the other part, and then applies several further convolutions, thereby extracting multi-layer semantic features from the processed feature map.
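The Focus slicing of claim 3 can be expressed in a few lines. This is an illustrative sketch of the slicing alone; in YOLO-v5 a convolution follows the concatenation.

```python
import torch

def focus_slice(x):
    """YOLO-v5 style Focus slicing: gather the four pixel-phase sub-images of
    a (B, C, H, W) tensor into the channel dimension -> (B, 4C, H/2, W/2).
    No pixel is dropped, so the 2x downsampling is information-lossless."""
    return torch.cat([x[..., ::2, ::2],     # even rows, even cols
                      x[..., 1::2, ::2],    # odd rows,  even cols
                      x[..., ::2, 1::2],    # even rows, odd cols
                      x[..., 1::2, 1::2]],  # odd rows,  odd cols
                     dim=1)
```

For a 3-channel input this yields the 12-channel spliced image described in the claim.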
4. The unmanned aerial vehicle small target detection method based on unsupervised adversarial learning of claim 1, wherein the classification discriminator is a double-layer MLP structure that directly estimates a normality score s_{h,w} for each position, where h is the height index and w the width index of the position of the image block;
the normality estimation process is expressed as: s_{h,w} = ψ(q_{h,w}), wherein ψ is the normality scorer that directly estimates the normality of each position (h, w), and q_{h,w} is the transformed adaptive feature at that position.
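A minimal sketch of such a position-wise scorer. The hidden width of 128 and the sigmoid squashing to (0, 1) are illustrative assumptions not specified by the claim.

```python
import torch
import torch.nn as nn

class NormalityScorer(nn.Module):
    """Double-layer MLP that maps the adapted feature q_{h,w} at every
    spatial position to a scalar normality score s_{h,w} in (0, 1)."""
    def __init__(self, feat_dim, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, q):                              # q: (B, H, W, C)
        return torch.sigmoid(self.mlp(q)).squeeze(-1)  # s: (B, H, W)
```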
5. The unmanned aerial vehicle small target detection method based on unsupervised adversarial learning of claim 1, wherein the loss function of the discriminator is a binary cross-entropy loss that quantifies how accurately the discriminator distinguishes the mapped (normal) image features from the synthesized image features; the lower its value, the better the performance. The discriminator loss is calculated as:
L_D = -E_x[log D(G(x))] - E_y[log(1 - D(G(y)))]
wherein G is the generator, D is the discriminator, x denotes real data, y denotes synthetic data, G(x) denotes the normal feature map, G(y) denotes the abnormal feature map, D[G(x)] denotes the discriminator's judgment of the normal feature map, and D[G(y)] denotes the discriminator's judgment of the abnormal feature map.
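Under those definitions, a standard binary cross-entropy discriminator loss of this form can be sketched as follows; the `eps` clamp is a numerical-stability assumption, and the expectations are approximated by batch means.

```python
import torch

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """Binary cross-entropy discriminator loss: d_real = D(G(x)) on normal
    feature maps, d_fake = D(G(y)) on abnormal ones; lower is better."""
    return -(torch.log(d_real + eps).mean()
             + torch.log(1.0 - d_fake + eps).mean())
```

A discriminator that scores normal features near 1 and abnormal features near 0 drives this loss toward zero, which is the sense in which "the lower the value, the better the performance".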
CN202411265268.8A 2024-09-10 2024-09-10 Unmanned aerial vehicle small target detection method based on unsupervised countermeasure learning Active CN119131364B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411265268.8A CN119131364B (en) 2024-09-10 2024-09-10 Unmanned aerial vehicle small target detection method based on unsupervised countermeasure learning

Publications (2)

Publication Number Publication Date
CN119131364A true CN119131364A (en) 2024-12-13
CN119131364B CN119131364B (en) 2025-07-22

Family

ID=93754955

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119437044A (en) * 2025-01-08 2025-02-14 中国民用航空飞行学院 A method and system for measuring large hot red parts
CN119784748A (en) * 2025-03-09 2025-04-08 中国民用航空飞行学院 Image processing and analysis system and method based on image

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807464A (en) * 2021-09-29 2021-12-17 东南大学 Target detection method of UAV aerial imagery based on improved YOLO V5
WO2022000426A1 (en) * 2020-06-30 2022-01-06 中国科学院自动化研究所 Method and system for segmenting moving target on basis of twin deep neural network
CN118429779A (en) * 2024-05-31 2024-08-02 重庆西部笔迹大数据研究院 Text image countermeasure generation system, method, equipment and medium based on multi-mode information prompt

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xiaoguang Tu, et al., "An improved YOLOv5 for object detection in visible and thermal infrared images based on contrastive learning", Front. Phys., 21 April 2023 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant