
CN110991311A - A target detection method based on densely connected deep network - Google Patents

A target detection method based on densely connected deep network

Info

Publication number
CN110991311A
CN110991311A
Authority
CN
China
Prior art keywords
network
target
dense
image
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911188895.5A
Other languages
Chinese (zh)
Other versions
CN110991311B (en)
Inventor
陈莹
潘志浩
化春键
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN201911188895.5A priority Critical patent/CN110991311B/en
Publication of CN110991311A publication Critical patent/CN110991311A/en
Application granted granted Critical
Publication of CN110991311B publication Critical patent/CN110991311B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method based on a densely connected deep network, belonging to the technical field of target detection. The method fuses the dense connection scheme into the yolov3-tiny network, adds convolutional layers to the network, and improves the feature extraction network. The improved network first normalizes an input image to a fixed size, then extracts and fuses the features of each channel with Dense Block modules, and then predicts with different prior boxes at different scales to classify and locate targets. Compared with the original algorithm, the improved algorithm raises precision by 15% while still meeting the requirements of real-time detection; the model is only 44.7 MB, satisfying the memory-footprint and real-time constraints of practical use.

Description

Target detection method based on densely connected deep network
Technical Field
The invention relates to a target detection method based on a densely connected deep network, and belongs to the technical field of target detection.
Background
There are many deep learning based target detection algorithms at present, such as Faster R-CNN (Faster Region-based Convolutional Network), SSD (Single Shot MultiBox Detector), R-FCN (Region-based Fully Convolutional Network), YOLO (You Only Look Once) and yolov3-tiny. However, these algorithms still have many shortcomings: Faster R-CNN, R-FCN, SSD and similar algorithms suffer from low detection speed and complex system configuration environments; the yolov3 algorithm detects quickly, but its model occupies a large amount of memory; and yolov3-tiny suffers from overly low detection precision.
Although the current yolov3-tiny detection network has a high detection speed, it still has various problems, such as inaccurate localization, poor detection performance, and frequent missed and false detections. The literature has fused a residual network structure into yolov3-tiny, but the resulting detection precision is only 60.92%.
The densely connected convolutional network DenseNet (Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger. Densely Connected Convolutional Networks [C]. CVPR, 2017. DOI: 10.1109/CVPR.2017.243) is an independent and complete network, but its drawback is that the arrangement of output parameters across the different convolutional layers and the presence of fully connected layers sharply increase the computation of the network and consume a large amount of GPU memory. This problem limits the use of the network in practical production.
Disclosure of Invention
In order to solve at least one of these problems, the invention provides a target detection method based on a densely connected deep network, which achieves high detection precision, high speed and a small model memory footprint by improving the network structure of the yolov3-tiny algorithm, and can meet the real-time requirements of practical use.
In the target detection method based on the densely connected deep network, the dense connection scheme is integrated into a convolutional neural network, and every extracted feature is fully exploited by concatenating the outputs of the convolutional layers. The invention not only improves the feature utilization and information flow of the detection network, but also strengthens feature propagation and improves detection performance.
The invention aims to provide a target detection method based on a densely connected deep network, which comprises the following steps:
step (1): reading image data from the Pascal VOC data set and extracting target data features;
step (2): training a network model;
step (3): performing target detection.
Optionally, the method comprises the following steps:
step (1): reading image data from the Pascal VOC data set and extracting target data features: the network reads the input image data, first normalizes its resolution to 416×416, and then extracts and fuses the features of each channel through a series of convolutional layers and Dense Block modules (a minimal code sketch of this normalization follows step (3));
step (2): training the network model: setting the network batch to 64 and repeating iterative training to obtain a detection model;
and (3): and (3) carrying out target detection: the network firstly extracts features from an input image through a feature extraction network to obtain a feature map (assumed to be k x k) with a certain size, then the input image is divided into k x k unit cells, and each unit cell predicts a fixed number (3) of boundary frames; when predicting, adopting logistic regression to predict the target score of each bounding box, namely how likely the block area is to be the target; then, non-maximum suppression (NMS) is carried out, and finally, a detection result is output.
Optionally, the step (1) includes:
(1) the dense connection scheme is introduced, so that an L-layer network has L(L+1)/2 connections; the Dense Block consists mainly of 1×1 and 3×3 convolutional layers, where the 1×1 convolution, also called the bottleneck layer, reduces the number of input feature maps, improves computational efficiency and fuses the features of each channel, and the 3×3 convolution extracts image features; the input of each layer in the Dense Block comes from the outputs of all preceding layers, which yields better performance with fewer parameters; the following formula shows that the input of layer l is the concatenation of the outputs of all preceding layers:

x_l = H_l([x_0, x_1, …, x_{l-1}])

where x_l denotes the output of layer l and [x_0, x_1, …, x_{l-1}] denotes the concatenation of the outputs of layers 0, …, l-1; H_l(·) is a composite function of three consecutive operations: Batch Normalization (BN), a rectified linear unit (ReLU) and a 3×3 convolutional layer (a code sketch of this block appears after item (3) below);
(2) the number of feature maps output by the convolutional layers in the Dense Block is reduced;
(3) to realize network downsampling, the network is divided into a plurality of Dense Blocks; the number of feature maps output within each Dense Block is set to be the same, while the number of feature maps differs across scales.
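The following PyTorch sketch illustrates the dense connection scheme of item (1): each layer receives the concatenation of all preceding outputs, passes it through a 1×1 bottleneck and a 3×3 convolution, and contributes its own feature maps. The growth rate and the 4× bottleneck width are conventional DenseNet choices, not values taken from the patent.

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_channels: int, growth: int):
        super().__init__()
        # 1x1 "bottleneck": fuse channels and cut the cost of the 3x3 conv
        self.bottleneck = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 4 * growth, kernel_size=1, bias=False),
        )
        # H_l: BN -> ReLU -> 3x3 conv, producing `growth` new feature maps
        self.conv = nn.Sequential(
            nn.BatchNorm2d(4 * growth),
            nn.ReLU(inplace=True),
            nn.Conv2d(4 * growth, growth, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(self.bottleneck(x))

class DenseBlock(nn.Module):
    def __init__(self, in_channels: int, growth: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth, growth) for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            # x_l = H_l([x_0, ..., x_{l-1}]): concatenate all earlier outputs
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

For example, DenseBlock(16, 16, 4) applied to a 16-channel input ends with 16 + 4 × 16 = 80 output channels, which a following 1×1 convolution can compress again.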
Optionally, the step (2) includes:
the learning rate of the network is set to 0.001, the momentum to 0.9 and the weight-decay regularization term to 0.0005; the maximum number of iterations is 500200, and the learning rate is decayed by a factor of 10 when the iteration count reaches 400000 and 450000; the network also uses multi-scale training: after the network reads data, the width and height of the normalized image resolution take a random value between 320 and 608, changed randomly once every 10 rounds, with all values being multiples of 32.
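A small sketch of this training schedule of step (2) follows; the surrounding training loop, and the darknet-style meaning of a "round" as one resolution re-draw, are assumptions here.

import random

def learning_rate(iteration: int, base_lr: float = 0.001) -> float:
    # Step decay: divide the learning rate by 10 at 400000 and 450000 iterations
    lr = base_lr
    for step in (400000, 450000):
        if iteration >= step:
            lr *= 0.1
    return lr

def sample_resolution() -> int:
    # Multi-scale training: a random multiple of 32 between 320 and 608,
    # redrawn once every 10 rounds
    return random.choice(range(320, 609, 32))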
Optionally, the step (3) includes:
(1) yolov3-tiny uses the K-means clustering algorithm to cluster the ground-truth boxes in the data set, setting 3 prior boxes of different sizes for each of the two scales obtained by downsampling and clustering 6 prior boxes of different sizes in total (a clustering sketch appears after item (3) below);
the 6 prior box sizes for the two different scales are shown in table 1 below:
Table 1 (the prior box sizes are provided as an image in the original document)
(2) Prediction is performed on the feature maps of the two different scales using the 6 prior boxes (Anchors); when predicting bounding boxes, the network uses logistic regression for better data modelling and to support multi-label classification; the coordinate prediction formulas of the network's bounding boxes are as follows:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^{t_w}
b_h = p_h · e^{t_h}

where t_x, t_y, t_w and t_h are the raw predictions of the model, c_x and c_y denote the coordinate offsets of the grid cell, p_w and p_h denote the width and height of the anchor box, and b_x, b_y, b_w and b_h are the center coordinates and the width and height of the final bounding box; the training of coordinates uses the sum-of-squared-error loss (a decoding sketch appears after item (3) below).
(3) The threshold for non-maximum suppression (NMS) is set to 0.45.
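The patent names K-means over the ground-truth boxes in item (1) but not the distance metric; YOLO-style anchor clustering conventionally uses d(box, centroid) = 1 - IoU on box widths and heights, which the following sketch assumes.

import numpy as np

def iou_wh(boxes: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    # IoU between (N, 2) box sizes and (K, 2) centroid sizes, ignoring position
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k: int = 6, iters: int = 100) -> np.ndarray:
    # Cluster ground-truth box sizes into k prior boxes
    boxes = np.asarray(boxes, dtype=np.float64)
    rng = np.random.default_rng(0)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Assign each box to the centroid with the highest IoU (lowest 1 - IoU)
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = boxes[assign == j].mean(axis=0)
    # Sort the priors by area so they can be split across the two scales
    return centroids[np.argsort(centroids.prod(axis=1))]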
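A direct transcription of the coordinate formulas in item (2); all quantities are in feature-map units and the variable names follow the definitions above.

import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    # sigma(t) constrains the predicted center to stay inside its grid cell
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    # Width and height scale the prior box exponentially
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh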
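A sketch of greedy non-maximum suppression with the 0.45 threshold from item (3); boxes are assumed to be (x1, y1, x2, y2, score) tuples.

def nms(boxes, iou_thresh: float = 0.45):
    # Keep the highest-scoring box, drop any remaining box that overlaps it
    # by more than the threshold, and repeat
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)

    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept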
The second purpose of the invention is to apply the target detection method based on the densely connected deep network to image target detection.
Optionally, the target detection method based on the densely connected deep network is applied to image target detection with the following specific steps: the network reads pedestrian image data from different scenes as training data, first normalizes the image resolution to 416×416, and then extracts and fuses the features of each channel through convolutional layers, pooling layers and Dense Block modules; the corresponding detection model is obtained by training the network; the trained model, the network configuration file and the image to be detected are then loaded; the network first extracts features from the input image through the feature extraction network, and since the method uses multi-scale prediction, 13×13 and 26×26 feature maps are obtained after feature extraction and prediction is performed at these two scales. The network then divides the image to be detected into 13×13 and 26×26 cells respectively, each cell predicting a fixed number (3) of bounding boxes; logistic regression is used during prediction to predict the objectness score of each bounding box, i.e. how likely the region is to belong to the pedestrian category. Non-maximum suppression (NMS) is then performed, and finally the detection result is output.
The third purpose of the invention is to apply the target detection method based on the densely connected deep network to video target detection.
Optionally, the target detection method based on the densely connected deep network is applied to video target detection with the following specific steps: the network reads video target image data from different scenes as training data, first normalizes the image resolution to 416×416, and then extracts and fuses the features of each channel through convolutional layers, pooling layers and Dense Block modules; the detection model for the corresponding detection task is obtained by training the network; the trained model, the network configuration file and the image to be detected are then loaded; the network first extracts features from the input image through the feature extraction network; since the method uses multi-scale prediction, 13×13 and 26×26 feature maps are obtained after feature extraction and prediction is performed at these two scales; the network then divides the image to be detected into 13×13 and 26×26 cells respectively, each cell predicting a fixed number (3) of bounding boxes; logistic regression is used during prediction to predict the objectness score of each bounding box, i.e. how likely the region is to belong to the target category; non-maximum suppression (NMS) is then performed; because the network has a high detection speed, it achieves real-time detection, can be applied to real-time video detection, and outputs the detection results.
The invention has the beneficial effects that:
(1) The method of the invention makes full use of every extracted feature, which not only improves the feature utilization of the network and strengthens feature propagation, but also enhances the network's learning of detail information.
(2) The method reaches a detection precision of 65.93%, compared with 49.19% for yolov3-tiny.
(3) The method reaches a detection speed of 83 fps and can be applied to various real-time target detection tasks in actual scenes.
(4) The model used by the method is only 44.7 MB, placing little demand on computer memory and saving cost.
Drawings
Fig. 1 is a diagram of a dense connection network architecture.
Fig. 2 is an overall architecture diagram of the network.
Fig. 3 shows the pedestrian detection results of the original algorithm on the Pascal VOC data set.
Fig. 4 shows the pedestrian detection results of example 2 on the Pascal VOC data set.
Fig. 5 shows the detection results of the original algorithm in the Pascal VOC detection task.
Fig. 6 shows the results of example 3 in the Pascal VOC detection task.
Detailed Description
The present invention is described in detail with reference to the embodiments shown in the drawings, but it should be understood that these embodiments are not intended to limit the present invention, and those skilled in the art should understand that functional, methodological, or structural equivalents or substitutions made by these embodiments are within the scope of the present invention.
Example 1
Existing target detection methods achieve high detection precision but cannot meet the real-time requirements of actual production, and their large model memory footprint hurts portability. To address these problems, the invention provides a target detection method based on a densely connected deep network, described in detail below with reference to the accompanying drawings:
Fig. 1 shows the structure of the dense connection network of the target detection method based on the densely connected deep network provided by the invention, and fig. 2 shows its overall network architecture. In this embodiment, the target detection method based on the densely connected deep network comprises the following steps:
A.1, reading image data from the Pascal VOC data set and extracting target data features: the network reads the input image data, first normalizes the resolution to 416×416, and then extracts and fuses the features of each channel through a series of convolutional layers and Dense Block modules.
The step A.1 comprises the following steps:
(1) the dense connection scheme is introduced, so that an L-layer network has L(L+1)/2 connections. The Dense Block consists mainly of 1×1 and 3×3 convolutional layers, where the 1×1 convolution, also called the bottleneck layer, reduces the number of input feature maps, improves computational efficiency and fuses the features of each channel, and the 3×3 convolution extracts image features. The input of each layer in the Dense Block comes from the outputs of all preceding layers, which yields better performance with fewer parameters. The following formula shows that the input of layer l is the concatenation of the outputs of all preceding layers:

x_l = H_l([x_0, x_1, …, x_{l-1}])

where x_l denotes the output of layer l and [x_0, x_1, …, x_{l-1}] denotes the concatenation of the outputs of layers 0, …, l-1. H_l(·) is a composite function of three consecutive operations: Batch Normalization (BN), a rectified linear unit (ReLU) and a 3×3 convolutional layer.
(2) The number of feature maps output by the convolutional layers in the Dense Block is reduced;
(3) to realize network downsampling, the network is divided into a plurality of Dense Blocks; the number of feature maps output within each Dense Block is set to be the same, while the number of feature maps differs across scales.
B.1, training a network model: and setting the network batch to 64, and repeating iterative training to obtain a detection model.
The step B.1 comprises the following steps:
The learning rate of the network is set to 0.001, the momentum to 0.9 and the weight-decay regularization term to 0.0005; the maximum number of iterations is 500200, and the learning rate is decayed by a factor of 10 when the iteration count reaches 400000 and 450000. The network also uses multi-scale training: the width and height of the network input size take random values between 320 and 608 and are changed once every 10 rounds.
C.1, target detection:
the network firstly extracts features from an input image through a feature extraction network to obtain a feature map (assumed to be k × k) with a certain size, then divides the input image into k × k cells, and predicts a fixed number (3) of bounding boxes for each cell. Logistic regression is used in the prediction to predict the goal score of each bounding box, i.e., how likely the block is to be the goal. Then, non-maximum suppression (NMS) is carried out, and finally, a detection result is output.
The step C.1 specifically comprises the following steps:
(1) yolov3-tiny clusters the ground-truth boxes in the data set using the K-means clustering algorithm, sets 3 prior boxes of different sizes for each downsampled scale, and clusters 6 prior boxes of different sizes in total.
The 6 prior box sizes for the two different scales are shown in table 2 below:
Table 2 (the prior box sizes are provided as an image in the original document)
(2) Prediction is performed on the feature maps of the two different scales using the 6 prior boxes (Anchors). When predicting bounding boxes, the network uses logistic regression for better data modelling and to support multi-label classification. The coordinate prediction formulas of the network's bounding boxes are as follows:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^{t_w}
b_h = p_h · e^{t_h}

where t_x, t_y, t_w and t_h are the raw predictions of the model, c_x and c_y denote the coordinate offsets of the grid cell, p_w and p_h denote the width and height of the anchor box, and b_x, b_y, b_w and b_h are the center coordinates and the width and height of the final bounding box. Coordinate training uses the sum-of-squared-error loss.
(3) The threshold for non-maximum suppression (NMS) was set to 0.45.
Example 2
This example presents the process and results of pedestrian detection on the Pascal VOC data set. The specific steps are as follows:
A.1, pedestrian image data from different scenes in the Pascal VOC data set is read as training data and pedestrian data features are extracted: the network reads the input image, first normalizes the resolution to 416×416, and then extracts and fuses the features of each channel through convolutional layers, pooling layers and Dense Block modules.
The step A.1 comprises the following steps:
(1) the dense connection scheme is introduced, so that an L-layer network has L(L+1)/2 connections. The Dense Block consists mainly of 1×1 and 3×3 convolutional layers, where the 1×1 convolution, also called the bottleneck layer, reduces the number of input feature maps, improves computational efficiency and fuses the features of each channel, and the 3×3 convolution extracts image features. The input of each layer in the Dense Block comes from the outputs of all preceding layers, which yields better performance with fewer parameters.

x_l = H_l([x_0, x_1, …, x_{l-1}])

where x_l denotes the output of layer l, [x_0, x_1, …, x_{l-1}] denotes the concatenation of the outputs of layers 0, …, l-1, and H_l(·) consists of Batch Normalization (BN), a rectified linear unit (ReLU) and a 3×3 convolutional layer.
The image data passes through the first Dense Block to obtain a 208×208×48 feature map, then through a 1×1 convolutional layer, which reduces the number of input channels and the computational complexity of the network, and then through a 2×2 pooling layer, which downsamples the feature map to obtain higher-level semantic information. The resulting output is used as the input to Dense Block 2 (a sketch of this transition appears after item (3) below).
(2) The number of feature maps output by the convolutional layers in the Dense Block is reduced. Dense Block 1 sets the number of feature maps to 16, and Dense Blocks 2, 3, 4 and 5 set it to 32, 64, 128 and 256 respectively. The purpose of increasing the number of output feature maps in the deeper blocks is to let the network learn richer high-level semantic information from the pedestrian image data and increase localization accuracy.
(3) To realize network downsampling, the network is divided into a plurality of Dense Blocks; the number of feature maps output within each Dense Block is set to be the same, while the number of feature maps differs across scales.
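The 1×1 convolution plus 2×2 pooling between Dense Blocks corresponds to a DenseNet-style transition layer; a minimal PyTorch sketch follows. The pooling type is an assumption, since the patent only says "pooling layer".

import torch.nn as nn

def transition(in_channels: int, out_channels: int) -> nn.Sequential:
    # 1x1 convolution cuts the channel count; 2x2 pooling halves the spatial
    # resolution (e.g. 208x208 -> 104x104 before Dense Block 2)
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.MaxPool2d(kernel_size=2, stride=2))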
B.1 training the network model: and setting the network batch to 64, and repeating iterative training to obtain a detection model.
The step B.1 comprises the following steps:
The learning rate of the network is set to 0.001, the momentum to 0.9 and the weight-decay regularization term to 0.0005; the maximum number of iterations is 500200, and the learning rate is decayed by a factor of 10 when the iteration count reaches 400000 and 450000. The network also uses multi-scale training: after reading the pedestrian image data, the width and height of the normalized image resolution take random values between 320 and 608, changed once every 10 rounds, with all values being multiples of 32.
C.1, target detection:
When detecting an image, the model, the network configuration file and the image data to be detected are loaded first. The network extracts features from the input image through the feature extraction network; since multi-scale prediction is used, 13×13 and 26×26 feature maps are obtained after feature extraction, and prediction is performed at these two scales. The network then divides the image to be detected into 13×13 and 26×26 cells respectively, each cell predicting a fixed number (3) of bounding boxes. Logistic regression is used during prediction to predict the objectness score of each bounding box, i.e. how likely the region is to belong to the pedestrian category. Non-maximum suppression (NMS) is then performed, and finally the detection result is output.
The step C.1 specifically comprises the following steps:
(1) yolov3-tiny clusters the ground-truth boxes in the data set using the K-means clustering algorithm, sets 3 prior boxes of different sizes for each downsampled scale, and clusters 6 prior boxes of different sizes in total. The 6 prior box sizes for the two different scales are shown in table 3 below:
Table 3 (the prior box sizes are provided as an image in the original document)
(2) Prediction is performed on the feature maps of the two different scales using the 6 prior boxes (Anchors). When predicting bounding boxes, the network uses logistic regression for better data modelling and to support multi-label classification. The coordinate prediction formulas of the network's bounding boxes are as follows:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^{t_w}
b_h = p_h · e^{t_h}

where t_x, t_y, t_w and t_h are the raw predictions of the model, c_x and c_y denote the coordinate offsets of the grid cell, p_w and p_h denote the width and height of the anchor box, and b_x, b_y, b_w and b_h are the center coordinates and the width and height of the final bounding box. Coordinate training uses the sum-of-squared-error loss.
(3) The threshold for non-maximum suppression (NMS) is set to 0.45, to filter out the overlapping boxes produced during prediction.
FIG. 3 shows the pedestrian detection result of the original algorithm on the Pascal VOC data set; the pedestrian-class detection accuracy is 65.1%. The original algorithm is taken from the literature (Redmon J, Farhadi A. YOLOv3: An incremental improvement [J]. arXiv preprint arXiv:1804.02767, 2018.).
Fig. 4 shows the pedestrian detection result of example 2 on the Pascal VOC data set; the pedestrian-class detection accuracy is 79.8%, an improvement of 14.7% over the original algorithm.
Example 3
This example presents the process and results of detecting the horse class on the Pascal VOC data set. The specific steps are as follows:
A.1, image data of horses in different scenes in the Pascal VOC data set is read as training data and horse-class data features are extracted: the network reads the input image, first normalizes the resolution to 416×416, and then extracts and fuses the features of each channel through convolutional layers, pooling layers and Dense Block modules.
The step A.1 comprises the following steps:
(1) the dense connection scheme is introduced, so that an L-layer network has L(L+1)/2 connections. The Dense Block consists mainly of 1×1 and 3×3 convolutional layers, where the 1×1 convolution, also called the bottleneck layer, reduces the number of input feature maps, improves computational efficiency and fuses the features of each channel, and the 3×3 convolution extracts image features. The input of each layer in the Dense Block comes from the outputs of all preceding layers, which yields better performance with fewer parameters.

x_l = H_l([x_0, x_1, …, x_{l-1}])

where x_l denotes the output of layer l, [x_0, x_1, …, x_{l-1}] denotes the concatenation of the outputs of layers 0, …, l-1, and H_l(·) consists of Batch Normalization (BN), a rectified linear unit (ReLU) and a 3×3 convolutional layer.
The image data passes through the first Dense Block to obtain a 208×208×48 feature map, then through a 1×1 convolutional layer, which reduces the number of input channels and the computational complexity of the network, and then through a 2×2 pooling layer, which downsamples the feature map to obtain higher-level semantic information. The resulting output is provided as the input to Dense Block 2.
(2) The number of feature maps output by the convolutional layers in the Dense Block is reduced. Dense Block 1 sets the number of feature maps to 16, and Dense Blocks 2, 3, 4 and 5 set it to 32, 64, 128 and 256 respectively. The purpose of increasing the number of output feature maps in the deeper blocks is to let the network learn richer high-level semantic information from the horse image data and increase localization accuracy.
(3) To realize network downsampling, the network is divided into a plurality of Dense Blocks; the number of feature maps output within each Dense Block is set to be the same, while the number of feature maps differs across scales.
B.1 training the network model: and setting the network batch to 64, and repeating iterative training to obtain a detection model.
The step B.1 comprises the following steps:
The learning rate of the network is set to 0.001, the momentum to 0.9 and the weight-decay regularization term to 0.0005; the maximum number of iterations is 500200, and the learning rate is decayed by a factor of 10 when the iteration count reaches 400000 and 450000. The network also uses multi-scale training: after reading the horse image data, the width and height of the normalized image resolution take random values between 320 and 608, changed once every 10 rounds, with all values being multiples of 32.
C.1, target detection:
When detecting an image, the model, the network configuration file and the image data to be detected are loaded first. The network extracts features from the input image through the feature extraction network; since multi-scale prediction is used, 13×13 and 26×26 feature maps are obtained after feature extraction, and prediction is performed at these two scales. The network then divides the image to be detected into 13×13 and 26×26 cells respectively, each cell predicting a fixed number (3) of bounding boxes. Logistic regression is used during prediction to predict the objectness score of each bounding box, i.e. how likely the region is to belong to the horse category. Non-maximum suppression (NMS) is then performed, and finally the detection result is output.
The step C.1 specifically comprises the following steps:
(1) yolov3-tiny clusters the ground-truth boxes in the data set using the K-means clustering algorithm, sets 3 prior boxes of different sizes for each downsampled scale, and clusters 6 prior boxes of different sizes in total. The 6 prior box sizes for the two different scales are shown in table 4 below:
Table 4 (the prior box sizes are provided as an image in the original document)
(2) Prediction is performed on the feature maps of the two different scales using the 6 prior boxes (Anchors). When predicting bounding boxes, the network uses logistic regression for better data modelling and to support multi-label classification. The coordinate prediction formulas of the network's bounding boxes are as follows:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^{t_w}
b_h = p_h · e^{t_h}

where t_x, t_y, t_w and t_h are the raw predictions of the model, c_x and c_y denote the coordinate offsets of the grid cell, p_w and p_h denote the width and height of the anchor box, and b_x, b_y, b_w and b_h are the center coordinates and the width and height of the final bounding box. Coordinate training uses the sum-of-squared-error loss.
(3) The threshold for non-maximum suppression (NMS) is set to 0.45, to filter out the overlapping boxes produced during prediction.
FIG. 5 shows the detection result of the original algorithm in the Pascal VOC detection task; the image shows that the original algorithm cannot detect the classes in the image well and misses detections. The horse-class detection accuracy is 63.2%. The original algorithm is taken from the literature (Redmon J, Farhadi A. YOLOv3: An incremental improvement [J]. arXiv preprint arXiv:1804.02767, 2018.).
Fig. 6 shows the detection result of example 3 in the Pascal VOC detection task, which detects and locates the classes in the image well. The horse-class detection accuracy is 79.4%, an improvement of 16.2% over the original algorithm.
The invention integrates the dense connection scheme into the yolov3-tiny network, adds convolutional layers to the network and improves the feature extraction network. The improved network first normalizes an input image to a fixed size, then extracts and fuses the features of each channel with Dense Block modules, and then predicts with different prior boxes at different scales to classify and locate targets. Compared with the original algorithm, the improved algorithm raises precision by 15% while still meeting the requirements of real-time detection; the model is only 44.7 MB, satisfying the memory-footprint and real-time constraints of practical use.
Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (9)

1. A target detection method based on a densely connected deep network, characterized by comprising the following steps:
step (1): reading image data from the Pascal VOC data set and extracting target data features;
step (2): training a network model;
step (3): performing target detection.

2. The method according to claim 1, characterized in that the specific steps are:
step (1): reading image data from the Pascal VOC data set and extracting target data features: the network reads the input image data, first normalizes its resolution to 416×416, and then extracts and fuses the features of each channel through a series of convolutional layers and Dense Block modules;
step (2): training the network model: setting the network batch to 64 and repeating iterative training to obtain a detection model;
step (3): performing target detection: the network first extracts features from the input image through a feature extraction network to obtain a k×k feature map of a certain size, then divides the input image into k×k cells, each cell predicting a fixed number of bounding boxes; logistic regression is used during prediction to predict the objectness score of each bounding box, i.e. how likely the region is to contain a target; non-maximum suppression (NMS) is then performed, and finally the detection result is output.

3. The method according to claim 1, characterized in that step (1) comprises:
(1) introducing the dense connection scheme, so that an L-layer network has L(L+1)/2 connections; the Dense Block consists mainly of 1×1 and 3×3 convolutional layers, where the 1×1 convolution is also called the bottleneck layer and the 3×3 convolution is used to extract image features; the input of each layer in the Dense Block comes from the outputs of all preceding layers; the following formula shows that the input of layer l is the concatenation of the outputs of all preceding layers:
x_l = H_l([x_0, x_1, …, x_{l-1}])
where x_l denotes the output of layer l, [x_0, x_1, …, x_{l-1}] denotes the concatenation of the outputs of layers 0, …, l-1, and H_l(·) denotes a composite function of three consecutive operations, consisting of BN, ReLU and a 3×3 convolutional layer;
(2) reducing the number of feature maps output by the convolutional layers in the Dense Block;
(3) dividing the network into a plurality of Dense Blocks, where the number of feature maps output within each Dense Block is set to be the same and the number of feature maps differs across scales.

4. The method according to claim 1, characterized in that step (2) comprises:
setting the learning rate of the network to 0.001, the momentum to 0.9 and the weight-decay regularization term to 0.0005, with a maximum of 500200 iterations; the learning rate is decayed by a factor of 10 when the iteration count reaches 400000 and 450000; the network also uses multi-scale training: after the network reads the data, the width and height of the normalized image resolution take random values between 320 and 608, changed randomly every 10 rounds, with all random values being multiples of 32.

5. The method according to claim 1, characterized in that step (3) comprises:
(1) yolov3-tiny clusters the ground-truth boxes in the data set with the K-means clustering algorithm, setting 3 prior boxes of different sizes for each of the two downsampled scales and clustering 6 prior boxes of different sizes in total;
the 6 prior box sizes for the two different scales are as follows:
(the prior box sizes are provided as an image in the original document)
(2) prediction is performed on the feature maps of the two different scales using the 6 prior boxes (Anchors); when predicting bounding boxes, the network uses logistic regression for better data modelling and to support multi-label classification; the coordinate prediction formulas of the network's bounding boxes are as follows:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^{t_w}
b_h = p_h · e^{t_h}
where t_x, t_y, t_w and t_h are the raw predictions of the model, c_x and c_y denote the coordinate offsets of the grid cell, p_w and p_h denote the width and height of the anchor box, and b_x, b_y, b_w and b_h are the center coordinates and the width and height of the final bounding box; coordinate training uses the sum-of-squared-error loss;
(3) setting the threshold for non-maximum suppression (NMS) to 0.45.

6. Application of the target detection method based on a densely connected deep network according to claim 1 in image target detection.

7. The application according to claim 6, characterized in that the specific application steps are: the network reads pedestrian image data from different scenes as training data, first normalizes the image resolution to 416×416, and then extracts and fuses the features of each channel through convolutional layers, pooling layers and Dense Block modules; the corresponding detection model is obtained by training the network; the trained model, the network configuration file and the image to be detected are loaded; the network first extracts features from the input image through the feature extraction network; 13×13 and 26×26 feature maps are obtained after feature extraction, and prediction is performed at these two scales; the network then divides the image to be detected into 13×13 and 26×26 cells respectively, each cell predicting a fixed number (3) of bounding boxes; logistic regression is used during prediction to predict the objectness score of each bounding box, i.e. how likely the region is to belong to the pedestrian category; non-maximum suppression (NMS) is then performed, and finally the detection result is output.

8. Application of the target detection method based on a densely connected deep network according to claim 1 in video target detection.

9. The application according to claim 8, characterized in that the specific application steps are: the network reads video target image data from different scenes as training data, first normalizes the image resolution to 416×416, and then extracts and fuses the features of each channel through convolutional layers, pooling layers and Dense Block modules; the detection model for the corresponding detection task is obtained by training the network; the trained model, the network configuration file and the image to be detected are loaded; the network first extracts features from the input image through the feature extraction network; 13×13 and 26×26 feature maps are obtained after feature extraction, and prediction is performed at these two scales; the network then divides the image to be detected into 13×13 and 26×26 cells respectively, each cell predicting 3 bounding boxes; logistic regression is used during prediction to predict the objectness score of each bounding box, i.e. how likely the region is to belong to the target category; non-maximum suppression (NMS) is then performed; the detection result is output.
CN201911188895.5A 2019-11-28 2019-11-28 A target detection method based on densely connected deep network Active CN110991311B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911188895.5A CN110991311B (en) 2019-11-28 2019-11-28 A target detection method based on densely connected deep network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911188895.5A CN110991311B (en) 2019-11-28 2019-11-28 A target detection method based on densely connected deep network

Publications (2)

Publication Number Publication Date
CN110991311A true CN110991311A (en) 2020-04-10
CN110991311B CN110991311B (en) 2021-09-24

Family

ID=70087704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911188895.5A Active CN110991311B (en) 2019-11-28 2019-11-28 A target detection method based on densely connected deep network

Country Status (1)

Country Link
CN (1) CN110991311B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190012551A1 (en) * 2017-03-06 2019-01-10 Honda Motor Co., Ltd. System and method for vehicle control based on object and color detection
CN109522966A (en) * 2018-11-28 2019-03-26 中山大学 A kind of object detection method based on intensive connection convolutional neural networks
CN109685008A (en) * 2018-12-25 2019-04-26 云南大学 A kind of real-time video object detection method
CN109685152A (en) * 2018-12-29 2019-04-26 北京化工大学 A kind of image object detection method based on DC-SPP-YOLO

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GAO HUANG et al.: "Densely Connected Convolutional Networks", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
JOSEPH REDMON et al.: "YOLOv3: An Incremental Improvement", arXiv:1804.02767 *
ZHOU LONG et al.: "YOLO-RD: A lightweight object detection network for range doppler radar images", IOP Conference Series *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553406A (en) * 2020-04-24 2020-08-18 上海锘科智能科技有限公司 Target detection system, method and terminal based on improved YOLO-V3
CN111553406B (en) * 2020-04-24 2023-04-28 上海锘科智能科技有限公司 Target detection system, method and terminal based on improved YOLO-V3
US20230075797A1 (en) * 2020-04-30 2023-03-09 Electronic Arts Inc. Extending knowledge data in machine vision
CN112287740A (en) * 2020-05-25 2021-01-29 国网江苏省电力有限公司常州供电分公司 Target detection method and device for power transmission line based on YOLOv3-tiny, and unmanned aerial vehicle
CN112287740B (en) * 2020-05-25 2022-08-30 国网江苏省电力有限公司常州供电分公司 Target detection method and device for power transmission line based on YOLOv3-tiny, and unmanned aerial vehicle
CN111723737B (en) * 2020-06-19 2023-11-17 河南科技大学 A target detection method based on multi-scale matching strategy deep feature learning
CN111723737A (en) * 2020-06-19 2020-09-29 河南科技大学 A target detection method based on deep feature learning of multi-scale matching strategy
CN113989753A (en) * 2020-07-09 2022-01-28 浙江大华技术股份有限公司 Multi-target detection processing method and device
CN111832489A (en) * 2020-07-15 2020-10-27 中国电子科技集团公司第三十八研究所 Subway crowd density estimation method and system based on target detection
CN111862056A (en) * 2020-07-23 2020-10-30 东莞理工学院 A segmentation method of retinal blood vessels based on deep learning
CN111860681A (en) * 2020-07-30 2020-10-30 江南大学 A method and application of deep network difficult sample generation under dual attention mechanism
CN111860681B (en) * 2020-07-30 2024-04-30 江南大学 Deep network difficulty sample generation method under double-attention mechanism and application
CN112132034A (en) * 2020-09-23 2020-12-25 平安国际智慧城市科技股份有限公司 Pedestrian image detection method and device, computer equipment and storage medium
CN112132034B (en) * 2020-09-23 2024-04-16 平安国际智慧城市科技股份有限公司 Pedestrian image detection method, device, computer equipment and storage medium
CN114663292A (en) * 2020-12-22 2022-06-24 南京大学 Ultra-lightweight picture defogging and identification network model and picture defogging and identification method
CN112861919A (en) * 2021-01-15 2021-05-28 西北工业大学 Underwater sonar image target detection method based on improved YOLOv3-tiny
CN112949389A (en) * 2021-01-28 2021-06-11 西北工业大学 Haze image target detection method based on improved target detection network
CN113449806A (en) * 2021-07-12 2021-09-28 苏州大学 Two-stage forestry pest identification and detection system and method based on hierarchical structure
CN113705359A (en) * 2021-08-03 2021-11-26 江南大学 Multi-scale clothes detection system and method based on washing machine drum image
CN113705359B (en) * 2021-08-03 2024-05-03 江南大学 Multi-scale clothes detection system and method based on drum images of washing machine
CN113705583A (en) * 2021-08-16 2021-11-26 南京莱斯电子设备有限公司 Target detection and identification method based on convolutional neural network model
CN113705583B (en) * 2021-08-16 2024-03-22 南京莱斯电子设备有限公司 Target detection and identification method based on convolutional neural network model
CN113989939A (en) * 2021-11-16 2022-01-28 河北工业大学 A Small Target Pedestrian Detection System Based on Improved YOLO Algorithm
CN113989939B (en) * 2021-11-16 2024-05-14 河北工业大学 Small target pedestrian detection system based on improved YOLO algorithm
CN114998220A (en) * 2022-05-12 2022-09-02 湖南中医药大学 Tongue image detection and positioning method based on improved Tiny-YOLO v4 natural environment
CN115410184A (en) * 2022-08-24 2022-11-29 江西山水光电科技股份有限公司 Target detection license plate recognition method based on deep neural network

Also Published As

Publication number Publication date
CN110991311B (en) 2021-09-24

Similar Documents

Publication Publication Date Title
CN110991311B (en) A target detection method based on densely connected deep network
CN111767882B (en) Multi-mode pedestrian detection method based on improved YOLO model
CN112561027B (en) Neural network architecture search method, image processing method, device and storage medium
CN107103754B (en) A method and system for predicting road traffic conditions
CN108765506B (en) Compression method based on layer-by-layer network binarization
CN110084292A (en) Object detection method based on DenseNet and multi-scale feature fusion
US11816881B2 (en) Multiple object detection method and apparatus
CN110991444B (en) License plate recognition method and device for complex scene
CN112884742B (en) A multi-target real-time detection, recognition and tracking method based on multi-algorithm fusion
CN111753682B (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN110443208A (en) YOLOv 2-based vehicle target detection method, system and equipment
CN111626128A (en) A pedestrian detection method in orchard environment based on improved YOLOv3
CN111401516A (en) Neural network channel parameter searching method and related equipment
CN108596053A (en) A kind of vehicle checking method and system based on SSD and vehicle attitude classification
CN112529005B (en) Target detection method based on semantic feature consistency supervision pyramid network
CN112488043B (en) Unmanned aerial vehicle target detection method based on edge intelligence
CN114973112B (en) A scale-adaptive dense crowd counting method based on adversarial learning network
CN117496384B (en) Unmanned aerial vehicle image object detection method
CN115187786A (en) A Rotation-Based Object Detection Method for CenterNet2
CN110008853A (en) Pedestrian detection network and model training method, detection method, medium, equipment
CN111767962A (en) One-stage target detection method, system and device based on generative adversarial network
CN116310688A (en) Target detection model based on cascade fusion, and construction method, device and application thereof
CN115565150A (en) A pedestrian and vehicle target detection method and system based on improved YOLOv3
CN112101113B (en) Lightweight unmanned aerial vehicle image small target detection method
CN116612450A (en) Point cloud scene-oriented differential knowledge distillation 3D target detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant