
CN117274740A - Infrared target detection method and device - Google Patents

Infrared target detection method and device

Info

Publication number
CN117274740A
Authority
CN
China
Prior art keywords
detection
detected
infrared
infrared target
detection result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311237659.4A
Other languages
Chinese (zh)
Inventor
张国华 (Zhang Guohua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bright Oceans Inter Telecom Co Ltd
Original Assignee
Bright Oceans Inter Telecom Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bright Oceans Inter Telecom Co Ltd filed Critical Bright Oceans Inter Telecom Co Ltd
Priority to CN202311237659.4A
Publication of CN117274740A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an infrared target detection method and device, relating to the technical field of target detection. An infrared target detection network that includes an attention mechanism module for the target to be detected is trained on a training set to obtain an infrared target detection model. The infrared target picture to be detected is then detected with the infrared target detection model to obtain a preliminary detection result. Finally, the preliminary detection result is filtered by an algorithm to obtain the detection result of the infrared target picture to be detected. Because the infrared target detection model is trained from a network that contains an attention mechanism module for the target to be detected, the trained model can learn enough features from the training set to avoid background interference and focus on the target to be detected, thereby improving the accuracy of infrared target detection.

Description

Infrared target detection method and device
Technical Field
The present disclosure relates to the field of target detection technologies, and in particular, to an infrared target detection method and device.
Background
In nature, all objects radiate infrared rays. By measuring the infrared difference between an object and its background with a detector, infrared images formed from the different thermal radiation can be obtained. Because infrared images are based on the relative temperature information of objects, they are less affected by external factors and can be applied in many fields.
Many deep-learning-based methods have been proposed for the target detection problem, and target detection is mainly performed with a target detection algorithm. Because traditional visible-light image target detection cannot obtain good detection results at night or in the unmanned driving field, infrared imaging target detection methods can be adopted in those scenarios instead.
However, infrared images are characterized by low contrast, a low signal-to-noise ratio and a fuzzy boundary between the target and the background, so background interference arises when targets are detected with an infrared imaging method, which reduces the accuracy of infrared target detection.
Disclosure of Invention
In view of this, the embodiments of the present application provide a method and an apparatus for detecting an infrared target, which aim to improve the accuracy of infrared target detection.
In a first aspect, an embodiment of the present application provides an infrared target detection method, where the method includes:
training an infrared target detection network through a training set to obtain an infrared target detection model, wherein the infrared target detection network comprises an attention mechanism module of a target to be detected;
detecting an infrared target picture to be detected according to the infrared target detection model to obtain a preliminary detection result;
and filtering the preliminary detection result through an algorithm to obtain a detection result of the infrared target picture to be detected.
Optionally, the attention mechanism module of the object to be detected includes:
dividing an input feature map into two branches according to a first channel dimension, and respectively carrying out different convolution operations on branch features of the two branches;
splicing the two branch features subjected to the convolution operation according to the second channel dimension to obtain spliced branch features;
global average pooling is carried out on the spliced branch characteristics to obtain a scalar;
performing weight evaluation on the scalar through an activation function in the fully connected neural network to obtain the weight of the scalar;
dividing the scalar weight into two branch weights according to a third channel dimension, and multiplying the two branch weights with corresponding branch features respectively to obtain two weighted branch features;
and splicing the two weighted branch characteristics to obtain an output characteristic diagram.
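The six steps above can be sketched in PyTorch roughly as follows. This is a minimal illustrative sketch: the class name, kernel sizes and reduction ratio are assumptions and are not specified in the patent.

```python
import torch
import torch.nn as nn

class DualBranchChannelAttention(nn.Module):
    """Illustrative sketch of the dual-branch channel attention described above."""

    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        # Different kernel sizes per branch give different receptive fields (assumed 3x3 / 5x5).
        self.branch_a = nn.Conv2d(half, half, kernel_size=3, padding=1)
        self.branch_b = nn.Conv2d(half, half, kernel_size=5, padding=2)
        # Fully connected network with an activation function for weight evaluation.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1. Divide the input feature map into two branches along the channel dimension.
        xa, xb = torch.chunk(x, 2, dim=1)
        # 2. Apply a different convolution operation to each branch.
        fa, fb = self.branch_a(xa), self.branch_b(xb)
        # 3. Splice the two branch features along the channel dimension.
        fused = torch.cat([fa, fb], dim=1)
        # 4. Global average pooling reduces each channel map to a scalar.
        s = fused.mean(dim=(2, 3))
        # 5. Evaluate per-channel weights via the fully connected network.
        w = self.fc(s).unsqueeze(-1).unsqueeze(-1)
        # 6. Divide the weights into two branches, reweight each branch feature,
        #    and splice to obtain the output feature map.
        wa, wb = torch.chunk(w, 2, dim=1)
        return torch.cat([fa * wa, fb * wb], dim=1)
```

The output feature map keeps the input shape, so the module can be dropped into a backbone between existing layers.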
Optionally, the infrared object detection network includes a feature extraction network, the method further comprising:
splicing the deep feature images and the shallow feature images with different scales in the feature extraction network to obtain multi-scale feature output;
and detecting the multi-scale characteristic output through a target detection layer to obtain the detected multi-scale characteristic output.
Optionally, the training step of the infrared target detection network includes:
predicting the target to be detected based on anchor frames of different data sets according to the infrared target detection network to obtain a prediction frame, wherein the prediction frame is a predicted frame marking the position of the target to be detected;
comparing the predicted frame with a true frame marked by a target to be detected in the training set to obtain a loss value;
and carrying out gradient back propagation on the infrared target detection network according to the loss value to obtain the infrared target detection model.
Optionally, the preliminary detection result includes a confidence value of the target to be detected, and filtering the preliminary detection result through an algorithm to obtain the detection result of the infrared target picture to be detected includes:
filtering the preliminary detection result according to a preset confidence threshold value to obtain a filtered preliminary detection result;
sorting the filtered preliminary detection results by confidence value to obtain sorted detection results;
calculating the intersection-over-union (IoU) between the first detection result in the sorted detection results and each of the remaining detection results, to obtain IoU results;
filtering the IoU results according to a preset IoU threshold value to obtain filtered detection results;
and determining, among the filtered detection results, the detection results that satisfy a preset condition as the detection result of the infrared target picture to be detected.
Optionally, determining the detection results that satisfy the preset condition among the filtered detection results as the detection result of the infrared target picture to be detected includes:
taking the detection result ranked first by confidence value among the filtered detection results, and determining it as a preliminary output result;
looping over the detection results other than the first detection result according to a preset rule to obtain corresponding output results, until the number of filtered preliminary detection results satisfies the preset condition;
and obtaining the detection result of the infrared target picture to be detected from the union of the preliminary output result and the corresponding output results.
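The confidence filtering and IoU-based suppression described above amount to standard non-maximum suppression, which can be sketched as follows. The threshold values are illustrative assumptions, not values from the patent.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def filter_detections(dets, conf_thr=0.25, iou_thr=0.5):
    """dets: list of (box, confidence) pairs. Returns the kept detections."""
    # 1. Discard detections below the confidence threshold.
    dets = [d for d in dets if d[1] >= conf_thr]
    # 2. Sort the survivors by confidence, highest first.
    dets.sort(key=lambda d: d[1], reverse=True)
    kept = []
    # 3. Repeatedly keep the top-ranked detection and drop any remaining box
    #    whose IoU with it exceeds the threshold, looping until none remain.
    while dets:
        best = dets.pop(0)
        kept.append(best)
        dets = [d for d in dets if iou(best[0], d[0]) <= iou_thr]
    return kept
```

Heavily overlapping boxes of the same target are thus merged into the single highest-confidence detection, which is the filtered result the method outputs.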
In a second aspect, embodiments of the present application provide an infrared target detection apparatus, the apparatus including:
the training module is used for training the infrared target detection network through the training set to obtain an infrared target detection model, and the infrared target detection network comprises an attention mechanism module of a target to be detected;
the detection module is used for detecting the infrared target picture to be detected according to the infrared target detection model to obtain a preliminary detection result;
and the filtering module is used for filtering the preliminary detection result through an algorithm to obtain the detection result of the infrared target picture to be detected.
Optionally, the attention mechanism module of the object to be detected includes:
the first dividing unit is used for dividing the input feature map into two branches according to the first channel dimension, and respectively carrying out different convolution operations on the branch features of the two branches;
the first splicing unit is used for splicing the two branch features subjected to the convolution operation according to the second channel dimension to obtain spliced branch features;
the averaging pooling unit is used for carrying out global averaging pooling on the spliced branch characteristics to obtain a scalar;
the weight evaluation unit is used for evaluating the weight of the scalar through an activation function in the fully-connected neural network to obtain the weight of the scalar;
the second dividing unit is used for dividing the scalar weight into two branch weights according to a third channel dimension, and multiplying the two branch weights with corresponding branch features respectively to obtain two weighted branch features;
and the second splicing unit is used for splicing the two weighted branch characteristics to obtain an output characteristic diagram.
Optionally, the infrared object detection network includes a feature extraction network, and the apparatus further includes:
the third splicing unit is used for splicing the deep feature images and the shallow feature images with different scales in the feature extraction network to obtain multi-scale feature output;
the detection unit is used for detecting the multi-scale characteristic output through the target detection layer to obtain the detected multi-scale characteristic output.
Optionally, the training step of the infrared target detection network includes:
the prediction unit is used for predicting the target to be detected based on anchor frames of different data sets according to the infrared target detection network to obtain a prediction frame, wherein the prediction frame is a predicted frame for marking the position of the target to be detected;
the comparison unit is used for comparing the prediction frame with a real frame marked by a target to be detected in the training set to obtain a loss value;
and the back propagation unit is used for carrying out gradient back propagation on the infrared target detection network according to the loss value to obtain the infrared target detection model.
Optionally, the preliminary detection result includes a confidence value of the target to be detected, and the filtering module includes:
the first filtering unit, used for filtering the preliminary detection result according to a preset confidence threshold value to obtain a filtered preliminary detection result;
the sorting unit, used for sorting the filtered preliminary detection results by confidence value to obtain sorted detection results;
the calculation unit, used for calculating the intersection-over-union (IoU) between the first detection result in the sorted detection results and each of the remaining detection results, to obtain IoU results;
the second filtering unit, used for filtering the IoU results according to a preset IoU threshold value to obtain filtered detection results;
and the first determining unit, used for determining, among the filtered detection results, the detection results that satisfy a preset condition as the detection result of the infrared target picture to be detected.
Optionally, the first determining unit includes:
the second determining unit, used for taking the detection result ranked first by confidence value among the filtered detection results and determining it as a preliminary output result;
the looping unit, used for looping over the detection results other than the first detection result in the filtered detection results according to a preset rule to obtain corresponding output results, until the number of filtered preliminary detection results satisfies the preset condition;
and the obtaining unit, used for obtaining the detection result of the infrared target picture to be detected from the union of the preliminary output result and the corresponding output results.
In a third aspect, embodiments of the present application provide an infrared target detection apparatus, the apparatus including:
a memory for storing a computer program;
a processor configured to execute the computer program to cause the apparatus to perform the infrared target detection method according to the foregoing first aspect.
In a fourth aspect, embodiments of the present application provide a computer storage medium, where a computer program is stored, and when the computer program is executed, an apparatus running the computer program implements the method for detecting an infrared target according to the first aspect.
Compared with the prior art, the embodiment of the application has the following beneficial effects:
the embodiment of the application provides an infrared target detection method and device, firstly, an infrared target detection network is trained through a training set, and an infrared target detection model can be obtained, wherein the infrared target detection network comprises an attention mechanism module of a target to be detected. Then, the infrared target picture to be detected is detected according to the infrared target detection model, and a preliminary detection result can be obtained. And finally, filtering the preliminary detection result through an algorithm to obtain a detection result of the infrared target picture to be detected. Therefore, after the infrared target detection model predicts the infrared target picture to be detected, the preliminary detection result can be filtered through an algorithm, so that the detection result of the infrared target picture to be detected is obtained.
Drawings
In order to more clearly illustrate the present embodiments or the technical solutions in the prior art, the drawings that are required for the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an application scenario of an infrared target detection method provided in an embodiment of the present application;
FIG. 2 is a flowchart of an infrared target detection method according to an embodiment of the present disclosure;
fig. 3 is a schematic frame structure of an infrared target detection method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an attention mechanism module of a pedestrian object according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an infrared target detection device according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Many deep-learning-based methods have been proposed for the target detection problem, and target detection is mainly performed with a target detection algorithm. Because traditional visible-light image target detection cannot obtain good detection results at night or in the unmanned driving field, infrared imaging target detection methods can be adopted in those scenarios instead.
However, infrared images are characterized by low contrast, a low signal-to-noise ratio and a fuzzy boundary between the target and the background, so background interference arises when targets are detected with an infrared imaging method, which reduces the accuracy of infrared target detection.
Deep-learning-based target detection has commonly used the two-stage algorithm Faster R-CNN; however, its detection speed is low and real-time detection cannot be guaranteed, a problem that is especially pronounced in the unmanned driving field. The one-stage YOLO series of target detection algorithms then appeared and solved the speed problem, greatly improving real-time performance while maintaining a certain detection precision, so most deep-learning-based target detection work has since adopted YOLO-series algorithms.
The YOLOv3 algorithm mainly improves small-target detection, and YOLOv4 builds on YOLOv3 by screening techniques that improve detection precision and combining suitable innovations, striking a good balance between detection speed and detection precision. The network structure of YOLOv5 is simple and efficient, easy to deploy and use, and widely applicable. YOLOv7 improves on YOLOv5 in computational efficiency and detection accuracy, but YOLOv5 is superior to YOLOv7 in training and inference speed and occupies less memory, so YOLOv5 has advantages over the other target detection algorithms in the YOLO series.
Based on this, in order to solve the above problem, in the embodiments of the present application an attention mechanism module for the target to be detected is introduced into a YOLOv5-based infrared target detection network, and the network is then trained on a training set. The trained infrared target detection model can learn enough features from the training set to avoid background interference, separating the target to be detected from the background and focusing on it, thereby improving the accuracy of infrared target detection.
For example, one of the scenarios of the embodiments of the present application may be applied to the scenario shown in fig. 1. The scene comprises a database 101 and a server 102, wherein the database 101 comprises a training set and infrared target pictures to be detected, the server 102 adopts the implementation manner provided by the embodiment of the application, introduces an attention mechanism module of a target to be detected in an infrared target detection network based on YOLOv5, and then trains the infrared target detection network based on YOLOv5 through the training set to obtain an infrared target detection model, so that the infrared target detection model can learn enough features from the training set to avoid background interference and concentrate on the target to be detected.
First, although the actions in the above application scenario are described as being performed by the server 102, the embodiments of the present application are not limited as to the execution subject, as long as the actions disclosed in the embodiments are executed.
Second, the above scenario is only one example of a scenario provided in the embodiments of the present application, and the embodiments of the present application are not limited to this scenario.
Specific implementation manners of the infrared target detection method and the device in the embodiment of the application are described in detail below by means of embodiments with reference to the accompanying drawings.
Referring to fig. 2, the flowchart of an infrared target detection method provided in the embodiment of the present application, with reference to fig. 2, may specifically include:
s201: and training the infrared target detection network through the training set to obtain an infrared target detection model, wherein the infrared target detection network comprises an attention mechanism module of a target to be detected.
The target to be detected may be a moving object such as a pedestrian, an animal or a vehicle; the target type is not specifically limited here and does not affect the implementation of the embodiments of the present application. For ease of understanding, the description below takes a pedestrian as the target to be detected.
In one possible implementation, the open-source infrared pedestrian detection data set KAIST may be preprocessed to obtain the training set for the infrared pedestrian detection network. Network training usually requires input pictures of a fixed size, most commonly square pictures, but the pictures in the KAIST data set are generally not fixed-size squares. Preprocessing the KAIST data set may therefore specifically include adaptively adding the fewest possible black borders to the original pictures of varying length and width, and uniformly scaling them to a standard size, so that the KAIST pictures become a size suitable for network training.
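The letterbox-style preprocessing described above can be sketched as follows. The 640-pixel standard size, the use of NumPy, and nearest-neighbour scaling are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def letterbox(img: np.ndarray, size: int = 640) -> np.ndarray:
    """Pad an H x W x C image to a square with minimal black borders, then scale."""
    h, w = img.shape[:2]
    side = max(h, w)
    # Adaptively add the minimum black (zero) border to make the image square.
    canvas = np.zeros((side, side, img.shape[2]), dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = img
    # Uniformly scale to the standard network input size; nearest-neighbour
    # index selection keeps the sketch dependency-free.
    idx = np.arange(size) * side // size
    return canvas[idx][:, idx]
```

Because only the shorter dimension is padded, the added border is the smallest that yields a square, which preserves the original aspect ratio after scaling.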
As an example, the present application adopts the PyTorch training framework with two NVIDIA GTX 1080 Ti graphics processing units (GPUs). The infrared pedestrian detection network may start from a pre-trained model obtained by training a YOLOv5 network on the COCO data set, and the training set is then used to train on the basis of this pre-trained model to obtain the infrared pedestrian detection model. Training may run for 200 epochs, where one epoch means that all training samples have completed one forward propagation and one backward propagation through the neural network, i.e. every training sample has been trained once; the number of samples in each batch is 8. Of course, the equipment and settings used in training are not specifically limited and do not affect the implementation of the embodiments of the present application.
The infrared pedestrian detection network comprises a feature extraction network, a pedestrian target attention mechanism module and four detectors. The feature extraction network may be a feature pyramid structure, and the pedestrian target attention mechanism module acquires receptive fields of different scales by adopting convolution operations with different kernel sizes, giving different weights to the feature channels so that the network focuses on the pedestrian targets in the training set. The detection stage of the network includes the following three components: an Anchor component for generating prediction frames; a Classification component for classifying each prediction frame and judging whether it contains the target object; and a Regression component for regressing each prediction frame to obtain the target position and target size.
In target detection tasks, anchor frames of preset sizes and shapes are usually defined and drawn centred on points in the picture, serving as possible candidate regions. The model predicts whether each candidate region contains the target object and, if so, further predicts the object's category. Meanwhile, because the anchor frame positions are fixed, they are unlikely to overlap well with the target object's bounding box, so the model must also predict the adjustment amplitude needed to fine-tune each anchor frame, so that the prediction frame describes the boundary of the target object as accurately as possible.
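The fine-tuning amplitude described above can be illustrated with one common (R-CNN-style) box encoding. The patent does not specify the exact encoding, so this sketch is an assumption for illustration only.

```python
import math

def refine_anchor(anchor, deltas):
    """anchor: (cx, cy, w, h); deltas: (dx, dy, dw, dh) predicted by the model."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = deltas
    # Shift the centre proportionally to the anchor size, and rescale the
    # width/height exponentially, so small predictions mean small adjustments.
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)
    return (new_cx, new_cy, new_w, new_h)
```

With zero deltas the anchor is returned unchanged; the network only has to learn the small correction, which is easier than predicting absolute coordinates.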
In one possible implementation, the training process of the infrared pedestrian detection network may include: before training, corresponding anchor frames (Anchors) are computed adaptively for different data sets; prediction is performed on the basis of these anchor frames by the infrared pedestrian detection network, which predicts the fine-tuning amplitude for each anchor frame; after fine-tuning, prediction frames marking pedestrian positions as accurately as possible are obtained for the different data sets. Because the training set consists of pictures annotated with pedestrian targets, the prediction frames are compared with the real (ground-truth) frames annotated in the training set to obtain a loss value, and gradient back-propagation is then applied to the infrared pedestrian detection network according to the loss value to obtain the infrared pedestrian detection model. The localization loss of positive samples may be calculated with the CIoU target detection regression loss function, and the target classification and pedestrian confidence losses may be calculated with the binary cross-entropy (BCE) loss function.
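The loss composition described above can be sketched as follows. This is a simplified illustration: a plain IoU term stands in for CIoU (which additionally penalises centre distance and aspect-ratio mismatch), and the unweighted sum of the terms is an assumption.

```python
import torch
import torch.nn as nn

def box_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """IoU of paired (N, 4) boxes in (x1, y1, x2, y2) format."""
    lt = torch.max(a[:, :2], b[:, :2])
    rb = torch.min(a[:, 2:], b[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def detection_loss(pred_boxes, true_boxes, conf_logits, conf_targets):
    # Localization term on positive samples: 1 - IoU (CIoU would add
    # centre-distance and aspect-ratio penalties on top of this).
    loc = (1.0 - box_iou(pred_boxes, true_boxes)).mean()
    # Classification / confidence term: binary cross-entropy on raw logits.
    bce = nn.BCEWithLogitsLoss()(conf_logits, conf_targets)
    return loc + bce
```

The scalar this returns is what gradient back-propagation would minimize during training.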
Because the YOLO v5 target detection algorithm has advantages over other target detection algorithms in the YOLO series, the infrared pedestrian detection method in the embodiment of the present application is realized based on an improved YOLO v5 algorithm. Referring to fig. 3, a schematic frame structure of the infrared target detection method provided in the embodiment of the present application, fig. 3 includes three parts: feature extraction based on the backbone network (Backbone), multi-scale feature fusion (Neck), and classification and regression detection (Head). The CBS module is composed of a convolution operation (Conv), batch normalization (Batch Normalization, BN) and the activation function Swish; the FCS module is composed of a Focus module, a CBS module and a CSE module; the downsampling Focus structure splices multiple slice results and then feeds them into the CBS module; the channel squeeze excitation (Channel Squeeze Excitation, CSE) module is the pedestrian target attention mechanism module; and the spatial pyramid pooling (Spatial Pyramid Pooling, SPP) module can adopt maximum pooling with kernel sizes of 1*1, 5*5, 9*9 and 13*13 for multi-scale feature fusion. The multi-scale feature fusion part further contains CSP modules and upsampling modules for splicing feature maps of different scales.
The feature extraction part based on the backbone network is composed of multiple stacked layers of deep convolutional neural networks, so that multi-scale deep attention features of the image can be extracted and output layer by layer. The multi-scale feature fusion part then splices the image deep attention features with the image shallow attention features of different scales from the backbone network, obtaining multi-scale attention features that include both image semantic information and image texture information. Finally, the multi-scale attention features can be respectively output to the detection head to classify pedestrians and regress their positions.
Referring to fig. 4, which is a schematic structural diagram of the attention mechanism module of the pedestrian target provided in this embodiment of the present application, where W in fig. 4 is Width, H is Height, and C is Channel: an input feature map may first be divided into two branches according to a first channel dimension, and convolution operations with different kernel sizes may be performed on the features of the two branches. To reduce computational cost, asymmetric convolution may be used to obtain receptive fields of different scales. As an example, the convolution kernel sizes of the two branches may be Conv 1*3, Conv 3*1 and Conv 1*5, Conv 5*1.
Then the two branch features subjected to the asymmetric convolution operation are spliced according to a second channel dimension. Squeeze represents channel compression: a spatial global average pooling is performed on the spliced branch features, each channel yielding one scalar, for an output of shape 1*1*C. Keeping the scale unchanged, this vector is input into a two-layer fully connected neural network. Excitation represents channel excitation: weight evaluation is performed through the activation function Sigmoid to obtain a weight W_c between 0 and 1 for each channel, with weight dimension 1*1*C. The weights of the C channels are then divided into two branch weights of dimension 1*1*(C/2) according to a third channel dimension, and the two branch weights are multiplied with the corresponding branch features respectively to obtain two weighted branch features. Finally, the two weighted branch features are spliced to obtain the output feature map. Through the attention mechanism module of the pedestrian target, the infrared pedestrian detection network can avoid background interference and focus more on the pedestrian targets in the training set, thereby improving the accuracy of infrared pedestrian detection.
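The squeeze-excitation data flow described above can be sketched in NumPy. Note the heavy assumptions: the asymmetric convolutions are omitted, and the two fully connected layers use random untrained weights, so this only illustrates the shapes and the reweighting step, not the learned behavior of the CSE module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cse_reweight(feat, rng):
    """Channel squeeze-excitation over a (C, H, W) feature map:
    squeeze -> two FC layers -> sigmoid weights -> split and reweight halves."""
    c, h, w = feat.shape
    squeezed = feat.mean(axis=(1, 2))        # global average pool -> (C,)
    w1 = rng.standard_normal((c, c // 4))    # untrained FC weights (illustrative)
    w2 = rng.standard_normal((c // 4, c))
    weights = sigmoid(np.maximum(squeezed @ w1, 0) @ w2)  # excitation, in (0, 1)
    half = c // 2
    out1 = feat[:half] * weights[:half, None, None]   # first branch reweighted
    out2 = feat[half:] * weights[half:, None, None]   # second branch reweighted
    return np.concatenate([out1, out2], axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
y = cse_reweight(x, rng)
```

Because each channel weight lies strictly in (0, 1), every output activation is attenuated relative to its input, with the network learning (in the real module) to attenuate background channels more than pedestrian channels.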
In addition, since pedestrians at various distances may appear in an unmanned-driving scene, pedestrian targets that are far away are small and difficult for the detector to capture accurately, which reduces the accuracy of infrared pedestrian detection. In the embodiment of the application, after the multi-scale attention feature output is obtained by splicing the deep and shallow attention features of different scales in the backbone network through the multi-scale feature fusion part, the multi-scale attention feature output can be detected through a small-target detection layer to obtain the detected multi-scale attention feature output, enhancing the ability of the infrared pedestrian detection network to capture the positions of small targets. Specifically, as embodied in the Head part on the right of fig. 3, a P2 detection layer is added in the present application. If the size of the input picture is 640*640, the input picture may be downsampled 4 times to obtain a P2 detection layer with a size of 160*160 for detecting small targets of size 4*4 or more; downsampled 8 times to obtain a P3 feature layer with a size of 80*80 for detecting targets of size 8*8 or more; downsampled 16 times to obtain a P4 feature layer with a size of 40*40 for detecting targets of size 16*16 or more; and downsampled 32 times to obtain a P5 feature layer with a size of 20*20 for detecting targets of size 32*32 or more. Performing pedestrian detection through the small-target detection layer after multi-scale feature fusion makes it possible to capture pedestrian targets that are far away, improving the accuracy of infrared pedestrian detection from another aspect.
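The stride-to-resolution relationship described above can be checked with a few lines of Python. The input size and strides follow the text; the P-level naming convention (P-level = log2 of the stride) is an assumption consistent with the figures:

```python
def feature_map_sizes(input_size, strides):
    """Feature-map side length at each downsampling stride.
    P-level name is log2 of the stride (stride 4 -> P2, stride 32 -> P5)."""
    return {f"P{int(s).bit_length() - 1}": input_size // s for s in strides}

sizes = feature_map_sizes(640, [4, 8, 16, 32])
```

For a 640*640 input this reproduces the layer sizes in the text: P2 at 160*160, P3 at 80*80, P4 at 40*40, and P5 at 20*20.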
S202: and detecting the infrared target picture to be detected according to the infrared target detection model to obtain a preliminary detection result.
The infrared target picture to be detected can be an infrared pedestrian picture to be detected, which can be an unannotated picture in the open-source infrared pedestrian detection data set KAIST, or a picture outside the KAIST data set used for testing the robustness of the infrared pedestrian detection model. The infrared pedestrian picture to be detected is detected according to the infrared pedestrian detection model, and the obtained preliminary detection result comprises n 5-dimensional vectors, where the n-th detection result vector is R_n = [x1_n, y1_n, x2_n, y2_n, s_n], in which [x1_n, y1_n] and [x2_n, y2_n] respectively represent the upper-left and lower-right corner coordinates of the coordinate frame corresponding to the n-th detection result, and s_n represents the pedestrian confidence of the n-th detection result.
S203: and filtering the preliminary detection result through an algorithm to obtain a detection result of the infrared target picture to be detected.
In a possible implementation manner, filtering the repeated detection frames in the preliminary detection result through the post-processing algorithm NMS may specifically include: first filtering the preliminary detection results according to a preset confidence threshold (as an example, the threshold may be 0.5, and preliminary detection results with confidence lower than 0.5 are deleted); then ranking the filtered preliminary detection results in descending order of the confidence s; and then calculating the intersection-over-union of the first-ranked detection result with each of the other detection results, which may specifically be expressed by Formula I:
therein, ioU 1,k Representing the ratio of the overlapping Area of the first-ranked candidate frame and the k-ranked candidate frame in the first-ranked detection result to the merging Area, i.e., the result of the cross-correlation of the first-ranked candidate frame and the k-ranked candidate frame, area (R 1 ∩R k ) Represents the Area of the intersection region of the first-ranked candidate frame and the k-th-ranked candidate frame, area (R) 1 ∪R k ) Representing the area of the union region of the first-ranked candidate frame and the k-th-ranked candidate frame, R k ,k>1 represents a candidate box in the preliminary detection result.
The intersection-over-union results may then be filtered according to a preset IoU threshold; as an example, the threshold may be 0.5, and detection results with IoU above 0.5 are deleted to obtain filtered detection results. The first-ranked detection result by confidence value among the filtered results is then taken as a preliminary output result. The remaining detection results continue to cycle through the above steps to obtain their respective corresponding output results, until the number of ranked detection results is less than or equal to 1, and the union of the preliminary output result and the subsequent corresponding output results is finally taken as the detection result of the infrared pedestrian picture to be detected.
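The confidence filtering, ranking, and looping procedure of step S203 amounts to greedy non-maximum suppression, which can be sketched as follows. The thresholds of 0.5 match the examples in the text; the sample detections at the end are made-up values for illustration:

```python
def nms(detections, conf_thresh=0.5, iou_thresh=0.5):
    """Greedy NMS over detections [x1, y1, x2, y2, score]: drop low-confidence
    boxes, then repeatedly keep the highest-scoring box and discard boxes
    overlapping it above the IoU threshold."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    boxes = sorted((d for d in detections if d[4] >= conf_thresh),
                   key=lambda d: d[4], reverse=True)
    kept = []
    while boxes:
        best = boxes.pop(0)                               # highest remaining confidence
        kept.append(best)
        boxes = [b for b in boxes if iou(best, b) < iou_thresh]
    return kept

dets = [[0, 0, 10, 10, 0.9], [1, 1, 11, 11, 0.8],   # heavy overlap with the first
        [50, 50, 60, 60, 0.7], [0, 0, 5, 5, 0.3]]   # last box is below conf_thresh
result = nms(dets)
```

Here the 0.8 box is suppressed (its IoU with the 0.9 box is about 0.68) and the 0.3 box is dropped by the confidence filter, leaving the 0.9 and 0.7 detections as the final output.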
The above is the infrared target detection method provided in the embodiments of the present application: an infrared target detection network is trained through a training set to obtain an infrared target detection model, where the infrared target detection network includes an attention mechanism module of the target to be detected; the infrared target picture to be detected is then detected according to the infrared target detection model to obtain a preliminary detection result; and finally the preliminary detection result is filtered through an algorithm to obtain the detection result of the infrared target picture to be detected.
The embodiments of the present application provide some specific implementation manners of an infrared target detection method, and based on this, the present application further provides a corresponding device. The apparatus provided in the embodiments of the present application will be described from the viewpoint of functional modularization.
Referring to fig. 5, which is a schematic structural diagram of an infrared target detection device 500 according to an embodiment of the present application, the device 500 may include:
the training module 501 is configured to train an infrared target detection network through a training set to obtain an infrared target detection model, where the infrared target detection network includes an attention mechanism module of a target to be detected;
the detection module 502 is configured to detect an infrared target picture to be detected according to an infrared target detection model, so as to obtain a preliminary detection result;
and the filtering module 503 is configured to filter the preliminary detection result through an algorithm, and obtain a detection result of the infrared target picture to be detected.
In this embodiment of the present application, through the cooperation of the training module 501, the detection module 502 and the filtering module 503, after the infrared target detection model predicts the infrared target picture to be detected, the preliminary detection result may be filtered by an algorithm, so as to obtain the detection result of the infrared target picture to be detected.
As an embodiment, the attention mechanism module of the training module 501 includes:
the first dividing unit is used for dividing the input feature map into two branches according to the first channel dimension and respectively carrying out different convolution operations on the branch features of the two branches;
the first splicing unit is used for splicing the two branch features subjected to convolution operation according to the second channel dimension to obtain spliced branch features;
the averaging pooling unit is used for carrying out global averaging pooling on the spliced branch characteristics to obtain a scalar;
the weight evaluation unit is used for evaluating the weight of the scalar through an activation function in the fully-connected neural network to obtain the weight of the scalar;
the second dividing unit is used for dividing the scalar weight into two branch weights according to the third channel dimension, and multiplying the two branch weights with the corresponding branch characteristics respectively to obtain two weighted branch characteristics;
and the second splicing unit is used for splicing the two weighted branch characteristics to obtain an output characteristic diagram.
As one embodiment, the infrared object detection network in training module 501 includes a feature extraction network, and infrared object detection device 500 further includes:
the third splicing unit is used for splicing the deep feature images and the shallow feature images with different scales in the feature extraction network to obtain multi-scale feature output;
the detection unit is used for detecting the multi-scale characteristic output through the target detection layer to obtain the detected multi-scale characteristic output.
As one embodiment, the training step of the infrared target detection network in the training module 501 includes:
the prediction unit is used for predicting the target to be detected based on anchor frames of different data sets according to the infrared target detection network to obtain a prediction frame, wherein the prediction frame is a predicted frame marking the position of the target to be detected;
the comparison unit is used for comparing the prediction frame with a real frame marked by a target to be detected in the training set to obtain a loss value;
and the back propagation unit is used for carrying out gradient back propagation on the infrared target detection network according to the loss value to obtain an infrared target detection model.
As one embodiment, the preliminary detection result includes a confidence of the target to be detected, and the filtering module 503 includes:
the first filtering unit is used for filtering the preliminary detection result according to a preset confidence threshold value to obtain a filtered preliminary detection result;
the sorting unit is used for sorting the filtered preliminary detection results through the confidence value to obtain sorted detection results;
the calculation unit is used for calculating the cross ratio of the detection results except the first detection result in the ordered detection results and the detection result of the first detection result in the ordered detection results respectively to obtain a cross ratio result;
the second filtering unit is used for filtering the cross-correlation result according to a preset cross-correlation threshold value to obtain a filtered detection result;
the first determining unit is used for determining a detection result meeting a preset condition in the filtered detection results as a detection result of the infrared target picture to be detected.
As one embodiment, the first determination unit includes:
the second determining unit is used for sequencing the first detection results in the confidence value sequence in the filtered detection results, and determining the first detection results as preliminary output results;
the circulating unit is used for circulating the detection results except the first detection result in the filtered detection results according to a preset rule to respectively obtain corresponding output results until the number of the filtered primary detection results meets a preset condition;
and the obtaining unit is used for obtaining the detection result of the infrared target picture to be detected according to the union set of the preliminary output result and the corresponding output result.
The embodiment of the application also provides corresponding equipment and a computer storage medium, which are used for realizing the scheme provided by the embodiment of the application.
The device comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor is used for executing the computer program to enable the device to execute the infrared target detection method according to any embodiment of the application.
The computer storage medium stores a computer program, and when the computer program is executed, the device executing it implements the infrared target detection method according to any embodiment of the present application.
The "first" and "second" in the names of "first", "second" (where present) and the like in the embodiments of the present application are used for name identification only, and do not represent the first and second in sequence.
From the above description of embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above described example methods may be implemented in software plus general hardware platforms. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk, an optical disk, or the like, including several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a router) to perform the methods described in the embodiments or some portions of the embodiments of the present application.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment is mainly described in a different point from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The apparatus embodiments described above are merely illustrative, wherein elements illustrated as separate elements may or may not be physically separate, and elements illustrated as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing is merely one specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An infrared target detection method, the method comprising:
training an infrared target detection network through a training set to obtain an infrared target detection model, wherein the infrared target detection network comprises an attention mechanism module of a target to be detected;
detecting an infrared target picture to be detected according to the infrared target detection model to obtain a preliminary detection result;
and filtering the preliminary detection result through an algorithm to obtain a detection result of the infrared target picture to be detected.
2. The method of claim 1, wherein the attention mechanism module of the object to be detected comprises:
dividing an input feature map into two branches according to a first channel dimension, and respectively carrying out different convolution operations on branch features of the two branches;
splicing the two branch features subjected to the convolution operation according to the second channel dimension to obtain spliced branch features;
global average pooling is carried out on the spliced branch characteristics to obtain a scalar;
performing weight evaluation on the scalar through an activation function in the fully connected neural network to obtain the weight of the scalar;
dividing the scalar weight into two branch weights according to a third channel dimension, and multiplying the two branch weights with corresponding branch features respectively to obtain two weighted branch features;
and splicing the two weighted branch characteristics to obtain an output characteristic diagram.
3. The method of claim 1, wherein the infrared object detection network comprises a feature extraction network, the method further comprising:
splicing the deep feature images and the shallow feature images with different scales in the feature extraction network to obtain multi-scale feature output;
and detecting the multi-scale characteristic output through a target detection layer to obtain the detected multi-scale characteristic output.
4. The method of claim 1, wherein the training step of the infrared target detection network comprises:
predicting the target to be detected based on anchor frames of different data sets according to the infrared target detection network to obtain a prediction frame, wherein the prediction frame is a predicted frame marking the position of the target to be detected;
comparing the predicted frame with a true frame marked by a target to be detected in the training set to obtain a loss value;
and carrying out gradient back propagation on the infrared target detection network according to the loss value to obtain the infrared target detection model.
5. The method according to claim 1, wherein the preliminary detection result includes a confidence level of an object to be detected, filtering the preliminary detection result by an algorithm to obtain a detection result of the infrared object picture to be detected, including:
filtering the preliminary detection result according to a preset confidence threshold value to obtain a filtered preliminary detection result;
sorting the filtered preliminary detection results through the confidence values to obtain sorted detection results;
calculating the cross ratio of the detection results except the first detection result in the ordered detection results with the first detection result in the ordered detection results respectively to obtain a cross ratio result;
filtering the cross comparison result according to a preset cross comparison threshold value to obtain a filtered detection result;
and determining the detection result meeting the preset condition in the filtered detection results as the detection result of the infrared target picture to be detected.
6. The method according to claim 5, wherein determining the detection result satisfying the preset condition as the detection result of the infrared target picture to be detected includes:
sequencing the first detection result of the confidence value in the filtered detection result, and determining the first detection result as a preliminary output result;
circulating the detection results except the first detection result according to a preset rule to respectively obtain corresponding output results until the number of the filtered primary detection results meets a preset condition;
and obtaining a detection result of the infrared target picture to be detected according to the union set of the preliminary output result and the corresponding output result.
7. An infrared target detection apparatus, the apparatus comprising:
the training module is used for training the infrared target detection network through the training set to obtain an infrared target detection model, and the infrared target detection network comprises an attention mechanism module of a target to be detected;
the detection module is used for detecting the infrared target picture to be detected according to the infrared target detection model to obtain a preliminary detection result;
and the filtering module is used for filtering the preliminary detection result through an algorithm to obtain the detection result of the infrared target picture to be detected.
8. The apparatus of claim 7, wherein the attention mechanism module of the object to be detected comprises:
the first dividing unit is used for dividing the input feature map into two branches according to the first channel dimension, and respectively carrying out different convolution operations on the branch features of the two branches;
the first splicing unit is used for splicing the two branch features subjected to the convolution operation according to the second channel dimension to obtain spliced branch features;
the averaging pooling unit is used for carrying out global averaging pooling on the spliced branch characteristics to obtain a scalar;
the weight evaluation unit is used for evaluating the weight of the scalar through an activation function in the fully-connected neural network to obtain the weight of the scalar;
the second dividing unit is used for dividing the scalar weight into two branch weights according to a third channel dimension, and multiplying the two branch weights with corresponding branch features respectively to obtain two weighted branch features;
and the second splicing unit is used for splicing the two weighted branch characteristics to obtain an output characteristic diagram.
9. An infrared object detection device, the device comprising:
a memory for storing a computer program;
a processor for executing the computer program to cause the apparatus to perform the infrared target detection method according to any one of claims 1 to 6.
10. A computer storage medium, wherein a computer program is stored in the computer storage medium, which computer program, when being executed by a processor, implements the infrared target detection method according to any one of claims 1 to 6.
CN202311237659.4A 2023-09-22 2023-09-22 Infrared target detection method and device Pending CN117274740A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311237659.4A CN117274740A (en) 2023-09-22 2023-09-22 Infrared target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311237659.4A CN117274740A (en) 2023-09-22 2023-09-22 Infrared target detection method and device

Publications (1)

Publication Number Publication Date
CN117274740A true CN117274740A (en) 2023-12-22

Family

ID=89213855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311237659.4A Pending CN117274740A (en) 2023-09-22 2023-09-22 Infrared target detection method and device

Country Status (1)

Country Link
CN (1) CN117274740A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118229961A (en) * 2024-04-10 2024-06-21 湖南君领科技有限公司 Infrared target detection method, device, computer equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination