
CN110263712B - A Coarse and Fine Pedestrian Detection Method Based on Region Candidates - Google Patents

Info

Publication number
CN110263712B
Authority
CN
China
Prior art keywords
detection
fine
rough
pedestrian
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910535870.1A
Other languages
Chinese (zh)
Other versions
CN110263712A (en)
Inventor
宋晓宁
周少康
孙俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Uniform Entropy Technology Wuxi Co ltd
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN201910535870.1A
Publication of CN110263712A
Application granted
Publication of CN110263712B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a coarse and fine pedestrian detection method based on region candidates. The method includes a coarse detection stage comprising the following steps: performing coarse detection on the to-be-detected pictures of the coarse training samples using the local irrelevant channel feature method; screening out the label target frames that were missed on the coarse training samples; performing cluster analysis on the missed label target frames and setting the label scales and aspect ratio; training a region candidate network using these scales and aspect ratio; and inputting the picture into the trained region candidate network and fusing the detection results it outputs to obtain the coarse detection result. Beneficial effects of the invention: first, in the coarse detection stage, the ground-truth targets are analyzed by a clustering method, the region candidate network is trained in a targeted manner, and its detection results are fused with the original candidate frames, which yields a higher recall rate and significantly reduces the missed detection rate.

Figure 201910535870

Description

Coarse and fine pedestrian detection method based on region candidates
Technical Field
The invention relates to the technical field of pedestrian detection, and in particular to a pedestrian detection method that combines a coarse and fine expression strategy with a region candidate network.
Background
Pedestrian detection is a key technology for autonomous driving and intelligent monitoring and has received particular attention in computer vision research in recent years. It aims to determine whether pedestrians are present in an image or video and, if so, to mark the size of each pedestrian accurately, usually with a rectangular box; it is a classic problem in object detection. The human body is highly flexible and takes many postures and shapes, its appearance is strongly affected by clothing, posture, and viewing angle, and detection is further affected by factors such as occlusion and illumination. Stable and efficient detection is therefore very difficult to guarantee in practice, and pedestrian detection remains a classic and challenging problem in computer vision research.
Existing techniques can accurately extract the contour and some texture features of a pedestrian target, but they either have high computational complexity or do not consider missed detections. The coarse and fine expression strategy only filters out redundant false detection windows; pedestrian targets that were not detected in the first place cannot be recovered for classification and judgment, so targets missed by the original method remain missed.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned conventional problems.
Therefore, the technical problem solved by the invention is to address missed detections and differences in target scale in pedestrian detection, and thereby effectively reduce the missed detection rate of the pedestrian detection method.
To solve the above technical problem, the invention provides the following technical solution: a coarse and fine pedestrian detection method based on region candidates, comprising a coarse detection stage, wherein the coarse detection stage comprises the following steps: performing coarse detection on the to-be-detected pictures of the coarse training samples using the local irrelevant channel feature method; screening out the label target frames that were missed on the coarse training samples; performing cluster analysis on the missed label target frames and setting the label scales and aspect ratio; training a region candidate network using the scales and aspect ratio; and fusing the detection result output by the trained region candidate network for an input picture with the coarse detection result output by the classifier trained with the local irrelevant channel feature method, to obtain the coarse detection result.
As a preferred aspect of the coarse and fine pedestrian detection method based on region candidates according to the invention, generating the coarse detection result comprises the following steps: extracting different feature channels from the pedestrian images of the coarse training samples; extracting features and training the classifier using the local irrelevant channel feature method; and performing coarse detection on the image with the trained classifier to generate the coarse detection result.
As a preferred aspect of the coarse and fine pedestrian detection method based on region candidates according to the invention, the region candidate network selects candidate regions on the feature map generated by the convolution operation; it receives the picture to be detected as input and outputs a series of rectangular target candidate frames, each with a corresponding score value.
As a preferred aspect of the coarse and fine pedestrian detection method based on region candidates according to the invention, the 13 convolutional layers of a VGG-16 network are selected as the network for the convolution operation; features of the input picture are extracted by the convolutional layers, a small network obtains sliding windows of different scales and proportions on the resulting feature map, and each window is mapped to a lower-dimensional feature, here a 512-dimensional feature; and the generated windows are classified and regressed by two fully connected layers.
As a preferable aspect of the method for detecting rough and fine pedestrians based on the region candidates according to the present invention, wherein: the loss function of the regional candidate network training is defined as follows:
Figure BDA0002101159510000021
where i denotes the index of the target candidate frame; p_i is the probability that the i-th target candidate frame is a pedestrian target; p_i* is 1 when the i-th target candidate frame is identified as a target and 0 otherwise; t_i denotes the predicted coordinates; and t_i* denotes the coordinates of the real target.
As a preferred aspect of the coarse and fine pedestrian detection method based on region candidates according to the invention, during training of the region candidate network, a sample that has the highest intersection-over-union (IoU) with a label target frame, or whose IoU (degree of overlap) with a real target frame exceeds 0.7, is taken as a positive sample, and a sample whose IoU with the real target frames is below 0.3 is taken as a negative sample.
As a preferred aspect of the coarse and fine pedestrian detection method based on region candidates according to the invention, the method further comprises a fine detection stage, which comprises the following steps: extracting different feature channels from the pedestrian images of the coarse training samples; further extracting color self-similarity features and convolution channel features from the feature channels; and performing fusion training with the color self-similarity features and the convolution channel features to obtain three classifiers and output the corresponding detection results.
As a preferred aspect of the coarse and fine pedestrian detection method based on region candidates according to the invention, the fine detection stage further comprises the following steps: taking the pedestrian and background pictures generated from the feature channels extracted in the coarse detection stage as the fine training samples of the fine detection stage; training a VGG-16 network as a two-class classifier on the fine training samples; and combining the two-class classifier with the three classifiers to serve jointly as the classification detector of the fine detection stage.
As a preferred aspect of the coarse and fine pedestrian detection method based on region candidates according to the invention, in the detection stage, candidate target frames are obtained from the test sample image by the local irrelevant channel feature method and the region candidate network, and are input into the classifiers of the fine detection stage for accurate classification to obtain the pedestrian target frame detection result.
As a preferred aspect of the coarse and fine pedestrian detection method based on region candidates according to the invention, the VGG-16 network replaces its last two pooling layers with hole (dilated) convolution layers with a stride of 2 to perform the down-sampling operation, so that the feature map size is reduced and the receptive field is enlarged.
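The effect of a strided hole convolution can be illustrated in one dimension. The following is only a minimal sketch, not the patented network: the function name `dilated_conv1d`, the 3-tap kernel, and the dilation rate of 2 are assumptions made for illustration (the text gives only the stride of 2).

```python
def dilated_conv1d(signal, kernel, dilation=2, stride=2):
    """Valid 1-D cross-correlation with a dilated kernel and a stride.

    Returns the outputs and the number of input samples spanned by one
    output (its receptive field along this axis).
    """
    # A kernel with gaps of `dilation` between taps covers this many inputs.
    span = (len(kernel) - 1) * dilation + 1
    outputs = []
    for start in range(0, len(signal) - span + 1, stride):
        taps = [signal[start + k * dilation] for k in range(len(kernel))]
        outputs.append(sum(w * x for w, x in zip(kernel, taps)))
    return outputs, span


signal = list(range(16))
# Hole convolution: 3 taps, dilation 2 (assumed), stride 2 (from the text).
hole_out, hole_span = dilated_conv1d(signal, [1, 1, 1], dilation=2, stride=2)
# Ordinary convolution with the same stride, for comparison.
plain_out, plain_span = dilated_conv1d(signal, [1, 1, 1], dilation=1, stride=2)
print(len(hole_out), hole_span, len(plain_out), plain_span)
```

With the same three weights, the dilated kernel spans five input samples instead of three, while the stride of 2 roughly halves the output length, which is the down-sampling role that the pooling layer played.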
The beneficial effects of the invention are as follows. First, in the coarse detection stage, the ground-truth targets are analyzed by a clustering method, the region candidate network is trained in a targeted manner, and its detection results are fused with the original candidate frames, which yields a higher recall rate and significantly reduces the missed detection rate. Second, in the fine classification stage, the VGG-16 network is improved by replacing part of the pooling layers with hole convolution, which improves the feature extraction capability of the network; an Adaboost classifier is trained, and the detection results are judged accurately.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without inventive effort. In the drawings:
Fig. 1 is a schematic flow diagram of the detection framework of the coarse and fine pedestrian detection method based on region candidates according to the first embodiment of the invention;
Fig. 2 is a schematic structural diagram of the region candidate network of the coarse and fine pedestrian detection method based on region candidates according to the first embodiment of the invention;
Fig. 3 is a comparison of detection results on TUD-Brussels according to the third embodiment of the invention;
Fig. 4 is a comparison of detection results on Caltech according to the third embodiment of the invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
Referring to the schematic diagram of Fig. 1, the region-candidate-based pedestrian detection method of this embodiment includes a coarse detection stage and a fine detection stage. As computer vision research has deepened, many classic and effective pedestrian detection methods have been proposed. One classic method combines the histogram of oriented gradients (HOG) with a support vector machine; HOG features accurately capture the contour and some texture features of a pedestrian target, but the computational complexity is high. Another method combines the local binary pattern with HOG features, using the properties of the local binary pattern operator together with HOG features to describe pedestrian feature information accurately. The aggregated channel features method integrates multiple channels, making combined use of color features, gradient magnitude, and gradient orientation histogram features, which effectively improves detection results. The local irrelevant channel feature method extends the aggregated channel features method by locally decorrelating each channel to obtain more representative features, which further improves detection results.
The local irrelevant channel feature method (LDCF) generates many detection windows, but many of them are false detections: background regions that contain no pedestrian but whose extracted features resemble those of pedestrians are detected as pedestrians. Pedestrian detection methods based on a coarse and fine expression strategy were proposed to solve this problem. Such a method is a further improvement of LDCF: different feature channels are extracted from the pedestrian image, features are extracted and a classifier is trained using LDCF, and the image is coarsely detected by the trained classifier to generate a coarse detection result.
An improved color self-similarity feature (NCSSF) and a simplified convolution channel feature (SCCF) are then extracted from the same feature channels. The improved color self-similarity feature captures the color similarity of pedestrians and effectively distinguishes pedestrian targets from the background, while the simplified convolution channel feature mainly describes robust intrinsic features inside the target and enhances the decorrelation and discriminability of the features. Finally, the two features are fused and used to train an Adaboost classifier whose weak classifiers are decision trees.
During detection, the image to be detected is first processed by LDCF to generate detection windows, which play the role of candidate regions. The trained Adaboost classifier then examines these windows and eliminates the false detections of LDCF, making the final result more accurate. Compared with the original LDCF method, the coarse and fine expression strategy improves performance markedly: the missed detection rate is reduced by 12.9% on the Caltech pedestrian detection data set and by 2.6% on the TUD-Brussels data set, which demonstrates the effectiveness of the strategy.
However, analysis shows that although the coarse and fine expression strategy effectively suppresses the excessive false detection windows produced by LDCF, it does not consider the missed detections of LDCF. The coarse and fine expression only filters redundant false detection windows; pedestrian targets that the original LDCF method failed to detect are never passed on for classification and judgment, so the missed targets remain missed.
This embodiment therefore provides a coarse and fine pedestrian detection method based on region candidates to solve the missed detection problem. In theory, a coarse and fine expression structure should achieve as high a recall rate as possible in the coarse detection stage, but analysis shows that many pedestrian targets are still not detected in that stage. Because the subsequent fine classification stage only removes earlier false detections, it cannot solve the missed detection problem of the system.
In the method of this embodiment, the label target frames missed by LDCF are clustered to obtain suitable scales and a suitable aspect ratio, the RPN is then improved and trained on a training set aimed at the missed part, and during detection its results are added to the fine classification judgment. An improved VGG-16 network filters the false detection windows in the detection results more effectively, making the detection results more accurate. Specifically, the method comprises a coarse detection stage and a fine detection stage, wherein the coarse detection stage comprises the following steps:
performing coarse detection on the to-be-detected pictures of the coarse training samples using the local irrelevant channel feature method;
screening out the label target frames that were missed on the coarse training samples, the missed frames being obtained by comparing the sample labels with the detection results;
performing cluster analysis on the missed label target frames and setting suitable scales and a suitable aspect ratio. Suitable scales means finding 9 scales that approximately summarize all labels. The cluster analysis uses k-means: the heights and widths of the labels are input and k is set to 9, so that the label target frames are divided into 9 clusters by scale, with label scales within each cluster as similar as possible and scales across clusters as different as possible. The aspect ratio is set directly to 0.41 based on experience and pedestrian characteristics;
training the region candidate network using the scales and the aspect ratio;
fusing the detection result output by the trained region candidate network for an input picture with the coarse detection result output by the classifier trained with the local irrelevant channel feature method, to obtain the coarse detection result.
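The screening step, in which sample labels are compared with detection results, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the patented procedure: boxes are assumed to be (x1, y1, x2, y2) tuples, a label counts as detected when some detection overlaps it with IoU of at least 0.5, and both the function names and that threshold are hypothetical, since the text does not give a matching rule.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def missed_labels(label_boxes, detections, match_iou=0.5):
    """Return the label boxes that no detection overlaps by at least match_iou.

    match_iou = 0.5 is an assumed matching threshold, not a value from the text.
    """
    return [g for g in label_boxes
            if all(iou(g, d) < match_iou for d in detections)]


labels = [(10, 10, 50, 110), (200, 20, 240, 120)]
coarse_detections = [(12, 12, 52, 108)]  # e.g. windows kept by the coarse detector
missed = missed_labels(labels, coarse_detections)
print(missed)
```

The widths and heights of the boxes returned here would then be the input to the cluster analysis.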
Further, the coarse detection stage further includes a step of generating a coarse detection result:
extracting different characteristic channels from the pedestrian images of the coarse training samples;
extracting features by applying a local irrelevant channel feature method and training a classifier;
and carrying out coarse detection on the image through a trained classifier to generate a coarse detection result.
The region candidate network selects candidate regions on the feature map generated by the convolution operation, which reduces computation time without sacrificing detection accuracy. Here, the convolution operation refers to feature map extraction with an unmodified VGG-16 convolutional neural network, whose main operations are its convolutional layers. Because candidate regions are selected not on the original image but on the smaller feature map produced by the VGG-16 network, the amount of computation is reduced. And because VGG-16 extracts representative image features, the feature map is smaller but retains most of the information, so accuracy is not lost.
In this embodiment, the region candidate network receives the picture to be detected as input and outputs a series of rectangular target candidate frames, each with a corresponding score value. The score indicates how likely the image inside the candidate frame is a pedestrian: the larger the score, the higher the probability that the frame contains a pedestrian.
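The text does not specify how the scored candidate frames of the region candidate network are fused with the coarse detection results. One plausible reading, sketched below purely as an assumption, is to pool both sets and remove near-duplicates with greedy non-maximum suppression. The names `fuse_candidates` and `nms_iou`, the threshold 0.65, and the premise that scores from the two detectors are on a comparable scale are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_candidates(ldcf_dets, rpn_dets, nms_iou=0.65):
    """Pool two lists of (x1, y1, x2, y2, score) and drop near-duplicates.

    Assumes the scores of both detectors are comparable (e.g. normalized).
    """
    pool = sorted(ldcf_dets + rpn_dets, key=lambda d: d[4], reverse=True)
    kept = []
    for det in pool:
        # Keep a box only if it does not heavily overlap a higher-scored one.
        if all(iou(det[:4], k[:4]) <= nms_iou for k in kept):
            kept.append(det)
    return kept


ldcf = [(10, 10, 50, 110, 0.9)]
rpn = [(12, 12, 52, 108, 0.8), (200, 20, 240, 120, 0.7)]
fused = fuse_candidates(ldcf, rpn)
print(fused)
```

In this toy case the second pedestrian is found only by the region candidate network, so it survives fusion and raises recall, while the duplicate of the first pedestrian is suppressed.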
Further, the convolution operation in this embodiment includes the following steps:
the 13 convolutional layers of the VGG-16 network are selected as the network for the convolution operation. The region candidate network uses the feature extraction of the original, unmodified VGG-16 network, whereas the modified VGG-16 described later is used for feature extraction in the fine detection stage;
features of the input picture are extracted by the convolutional layers, a small network obtains sliding windows of different scales and proportions on the resulting feature map, and each window is mapped to a lower-dimensional feature, here a 512-dimensional feature. The small network is the rear part of the RPN, namely the 3 x 3 convolutional layer and the two fully connected layers;
finally, the generated windows are classified and regressed by the two fully connected layers. These are ordinary fully connected layers of a neural network; each can also be viewed as a convolutional layer whose kernel size equals the size of its input, that is, a weight is learned for every input pixel.
The loss function for the regional candidate network training is defined as follows:
Figure BDA0002101159510000071
where i denotes the index of the target candidate frame; p_i is the probability that the i-th target candidate frame is a pedestrian target; p_i* is 1 when the i-th target candidate frame is identified as a target and 0 otherwise; t_i denotes the predicted coordinates; and t_i* denotes the coordinates of the real target.
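The equation itself survives only as an image reference in this record. The symbol definitions match the multi-task loss commonly used to train a region proposal network in Faster R-CNN, so a plausible form, given here as an assumption rather than as the patent's exact formula, is shown below. The classification loss L_cls, the regression loss L_reg, the normalizers N_cls and N_reg, and the balancing weight lambda are not defined in the surrounding text and are taken from that standard formulation.

```latex
L(\{p_i\}, \{t_i\}) =
  \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^{*})
  + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^{*} \, L_{reg}(t_i, t_i^{*})
```

Under this form, the factor p_i* switches the regression term on only for candidate frames labelled as targets, which is consistent with p_i* being 1 for targets and 0 otherwise.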
During training of the region candidate network, a sample that has the highest intersection-over-union (IoU) with a real target frame, or whose IoU (degree of overlap) with a real target frame exceeds 0.7, is taken as a positive sample, and a sample whose IoU with the real target frames is below 0.3 is taken as a negative sample. The real target frame is the label target frame, a manually annotated rectangular frame containing a pedestrian. The values 0.7 and 0.3 are thresholds chosen because they benefit the result, giving the trained framework a better detection effect.
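The sample selection rule can be sketched as follows. This is a minimal sketch under assumptions: boxes are (x1, y1, x2, y2) tuples, the name `label_anchors` is hypothetical, and candidates that fall between the two thresholds are marked -1 and ignored, which is a common convention that the text does not state.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_iou=0.7, neg_iou=0.3):
    """Return one label per anchor: 1 positive, 0 negative, -1 ignored."""
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = []
    for row in overlaps:
        best = max(row) if row else 0.0
        labels.append(1 if best > pos_iou else 0 if best < neg_iou else -1)
    # The anchor with the highest IoU for each real target frame is positive,
    # even if that IoU does not exceed pos_iou.
    for j in range(len(gt_boxes)):
        column = [row[j] for row in overlaps]
        if column and max(column) > 0:
            labels[column.index(max(column))] = 1
    return labels


gt = [(0, 0, 40, 100)]
print(label_anchors([(0, 0, 40, 100), (10, 0, 50, 100), (100, 100, 140, 200)], gt))
print(label_anchors([(10, 0, 50, 100), (300, 0, 340, 100)], gt))
```

In the second call no anchor exceeds 0.7, so the best-overlapping anchor (IoU 0.6) is promoted to a positive sample by the highest-IoU rule.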
Referring to the schematic diagram of fig. 2, which is a structural diagram of the regional candidate network, it should be further explained in this embodiment that the regional candidate network, that is, the RPN network is a network structure that is used by GirshickR to replace the selectsearch method in the RCNN model in order to improve the detection accuracy and the detection speed of the RCNN model. The RPN is used for finding a region where a target may exist in a picture, and the work is performed by a sliding window method, the original method generates too many regions which do not contain any useful categories, so that the task parameters of classification and position regression of the detected target are numerous, the time is too long, and accurate convergence is often difficult.
The RPN gives the task to a deep network, gives labels to a network training set and a real target frame, and leads the network to acquire the approximate position of the target from the characteristic diagram through training and then to be mapped back to the original image, so that the efficiency of target classification and frame regression is higher, and the accuracy is improved. For a pedestrian detection scene, the proportion of general pedestrians is small, the area occupied by the background is large, if each window in an image is detected, the training calculation time is increased undoubtedly, the convergence difficulty of the RPN network is high, and candidate areas are selected on the feature map generated after the convolution operation, so that the calculation time is reduced, and meanwhile, the detection accuracy cannot be lost. The RPN network receives the picture as input and then outputs a series of rectangular target candidate boxes, referred to in the text as anchors, each with a corresponding score value. The main network selects 13 convolutional layers of a VGG-16 network, features of an input picture are extracted through the convolutional layers, sliding windows with different scales and proportions are obtained on a feature diagram obtained in the front through a small network, and each window is mapped to a feature with a lower dimension, and the feature extraction network is mapped to a feature with 512 dimensions because the VGG-16 network is selected. Finally, the previously generated windows are effectively classified and regressed by two fully connected layers.
Because the RPN was originally designed as part of Faster R-CNN for multi-class object detection, it uses 9 prior candidate boxes formed from 3 scales (64, 128, and 256) and 3 aspect ratios (1:1, 1:2, and 2:1). For pedestrian detection, these settings generate too many candidate regions of unsuitable scale and ratio. The preset scales and aspect ratios span a wide range so as to cover many kinds of targets, and they differ greatly from the scales and aspect ratios of pedestrians. As a result, many candidate boxes are too large or too small, which increases the computation in the subsequent fine classification stage and lowers the efficiency of the whole framework. Using the RPN directly for pedestrian detection can also lead to inaccurate pedestrian positions and large errors in the pedestrian target boxes.
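The 9 prior candidate boxes described above can be illustrated with a minimal sketch. It assumes that each ratio is interpreted as height/width and that every box keeps an area of scale squared; the function name `make_anchors` and the origin-centred box layout are illustrative choices, not taken from the patent.

```python
import numpy as np

def make_anchors(scales=(64, 128, 256), ratios=(1.0, 0.5, 2.0)):
    """Return (len(scales) * len(ratios), 4) boxes as [x1, y1, x2, y2], centred at the origin.

    Assumption: `ratios` are height/width values and each box has area scale**2.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)   # width shrinks as the box gets taller
            h = s * np.sqrt(r)   # height grows so that w * h == s * s
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

anchors = make_anchors()
widths = anchors[:, 2] - anchors[:, 0]
heights = anchors[:, 3] - anchors[:, 1]
```

Only the 2:1 (tall) boxes in this set resemble an upright pedestrian, which is the mismatch the paragraph above describes.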
To address these problems, this embodiment statistically analyzes the labeled target boxes of the training set and applies clustering (the k-means algorithm) to obtain prior candidate box parameters with more suitable scales and aspect ratios. These parameters replace those of the original RPN, so that the network generates candidate regions suited to pedestrian targets, produces predictions better matched to pedestrians, and locates pedestrian targets more accurately.
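One way to picture the clustering step is the sketch below. It runs plain k-means with Euclidean distance on (width, height) pairs and then reads a scale and an aspect ratio off each cluster centre. The distance measure, the number of clusters, the function name `kmeans_boxes`, and the synthetic boxes are all assumptions made for illustration; the patent states only that k-means is applied to the labeled boxes.

```python
import numpy as np

def kmeans_boxes(wh, k, iters=50, seed=0):
    """Plain k-means (Euclidean distance) on an (N, 2) array of box (width, height)."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every box to its nearest cluster centre.
        dists = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centre; keep the old one if a cluster is empty.
        new_centers = np.array([
            wh[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

# Hypothetical synthetic boxes standing in for training-set label boxes.
rng = np.random.default_rng(1)
box_heights = rng.uniform(40, 200, size=500)
boxes_wh = np.stack([0.41 * box_heights, box_heights], axis=1)

centers = kmeans_boxes(boxes_wh, k=3)
scales = np.sqrt(centers[:, 0] * centers[:, 1])   # anchor scale = sqrt(area)
aspect = centers[:, 1] / centers[:, 0]            # height / width
```

The resulting `scales` and `aspect` values would take the place of the fixed 3 x 3 presets when the RPN is configured.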
Example 2
Referring again to fig. 1, this example describes the fine detection stage of the coarse and fine pedestrian detection method based on region candidates, which includes the following steps:
extracting different feature channels from the pedestrian images of the coarse training samples;
further extracting color self-similarity features and convolutional channel features from the feature channels;
performing fusion training with the color self-similarity features and the convolutional channel features to obtain a third classifier, and outputting the corresponding detection result.
Further, the fine detection stage comprises the following steps:
using the pedestrian and background pictures generated when the feature channels were extracted in the coarse detection stage as the fine training samples of the fine detection stage. The pedestrian pictures are cropped directly from the feature channels of the training samples according to the labels, and the background pictures are cropped at random from regions of the sample feature channels that contain no pedestrian. These pictures are not an output of coarse-stage training; they are reused directly because the same cropping is already performed during that training. Each feature channel can be regarded as the same picture rendered in a different style, and the position of a pedestrian is the same in all 10 feature channels of a picture, so the pedestrian is cropped from the 10 feature channels using the labels of the original picture. This embodiment uses 10 feature channels in total: 1 locally normalized gradient magnitude channel, 3 color channels (LUV), and 6 gradient orientation histogram channels;
training a VGG-16 network on the fine training samples as a second classifier;
combining the second classifier with the third classifier to serve together as the classification detector of the fine detection stage.
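The 10-channel representation and the label-based cropping in the steps above can be sketched as follows. This is a simplified illustration: it assumes the input is already in LUV color space, uses hard orientation binning, and omits the local normalization of the gradient magnitude; `compute_channels` and `crop_channels` are hypothetical helper names.

```python
import numpy as np

def compute_channels(luv, n_orient=6):
    """Stack 3 colour + 1 gradient-magnitude + n_orient orientation channels.

    `luv` is an (H, W, 3) float array assumed to be already in LUV colour space.
    """
    gray = luv[..., 0]                       # use the L channel for gradients
    gy, gx = np.gradient(gray)               # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), quantised into n_orient bins.
    theta = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((theta / np.pi * n_orient).astype(int), n_orient - 1)
    hist = np.zeros(gray.shape + (n_orient,))
    for b in range(n_orient):
        hist[..., b] = mag * (bins == b)     # magnitude-weighted hard binning
    return np.concatenate([luv, mag[..., None], hist], axis=2)

def crop_channels(channels, box):
    """Crop every channel with the same [x1, y1, x2, y2] label box."""
    x1, y1, x2, y2 = box
    return channels[y1:y2, x1:x2, :]

image = np.random.default_rng(0).random((128, 64, 3))   # stand-in for a LUV image
channels = compute_channels(image)
patch = crop_channels(channels, (10, 20, 42, 84))
```

Because all channels share the pixel grid of the original picture, one label box is enough to cut the same pedestrian out of every channel.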
Finally, this embodiment includes a detection stage. In the detection stage, the test sample image is passed through the two region candidate structures to obtain candidate target boxes, and these boxes are input to the classifier of the fine detection stage for accurate classification, yielding the labeled pedestrian target box detection result. The two region candidate structures are the original LDCF detection method and the RPN method; both are described this way because each produces candidate target boxes for the subsequent fine detection.
In this embodiment, the VGG-16 network replaces its last two pooling layers with dilated convolution layers of stride 2 to perform downsampling, which reduces the size of the feature map while enlarging the receptive field and gives better feature extraction. Specifically, each of the last two pooling layers is replaced with a convolutional layer with a 3 x 3 kernel, stride 2, dilation rate 2, and padding 2. The stride of 2 reproduces the downsampling of the pooling layer it replaces. Because a pooling layer has no learnable parameters, it loses spatial hierarchy information; the convolutional layer is used in its place to make up for this defect. The dilation rate of 2 moderately enlarges the receptive field of each point and so strengthens the global information in the features. The receptive field is the size of the region of the original image that corresponds to each pixel of the convolved feature map.
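The effect of these parameters can be checked with the standard output-size formula for a convolution. The sketch below compares the replacement layer with a 2 x 2, stride-2 max-pooling layer; the input sizes 28 and 14 are an assumption that corresponds to a 224 x 224 input picture, and the helper names are illustrative.

```python
def conv_out_size(n, kernel=3, stride=2, dilation=2, padding=2):
    """Output length of a convolution along one spatial dimension."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (n + 2 * padding - effective_kernel) // stride + 1

def pool_out_size(n, kernel=2, stride=2):
    """Output length of a 2x2, stride-2 max-pooling layer (no padding)."""
    return (n - kernel) // stride + 1

# A 3x3 kernel with dilation rate 2 covers a 5x5 window of its input.
effective_kernel = 2 * (3 - 1) + 1

# (input size, pooled size, dilated-convolution size) for the last two stages.
sizes = [(n, pool_out_size(n), conv_out_size(n)) for n in (28, 14)]
```

For these inputs both layers halve the feature map (28 to 14, 14 to 7), while the dilated kernel sees a 5 x 5 window instead of the 2 x 2 window of pooling.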
In the fine detection stage of the framework, a fine-tuned VGG-16 network extracts features from the candidate windows generated by coarse detection, and a fine Adaboost classifier, that is, the second classifier of this embodiment, is then trained on these features. The classifier consists of 4096 decision trees of depth 5. In the detection stage, the trained classifier further filters out the false detections of the coarse detection stage, and a non-maximum suppression algorithm then suppresses overlapping detections to obtain the final result.
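The final non-maximum suppression step can be illustrated with the common greedy variant below. The IoU threshold of 0.5 and the example boxes are assumptions; the patent does not state which threshold or variant it uses.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]            # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the best box with every remaining box.
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]    # drop boxes that overlap too much
    return keep

# Two heavily overlapping detections of one pedestrian and one separate detection.
boxes = np.array([[10, 10, 50, 110], [12, 12, 52, 112], [200, 30, 240, 130]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.87 and is suppressed, so the first and third boxes remain.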
With the arrival of the big data era and the emergence of high-performance computing systems, convolutional networks have achieved great success in the recognition and classification of large-scale images and video, outperforming traditional feature extraction methods. The VGG-16 network consists of 13 convolutional layers and 3 fully connected layers, and the 13 convolutional layers are separated by 5 max-pooling layers; only the part of the network before the fully connected layers is used here, to extract features. Pooling is a downsampling process whose parameters are not learnable during training, so it inevitably loses internal data structure and spatial hierarchy information. To overcome this defect, dilated convolution layers of stride 2 replace the last two pooling layers for downsampling, which reduces the size of the feature map, enlarges the receptive field, and gives better feature extraction.
In the training stage, the improved pedestrian detection framework trains the LDCF classifier and the improved RPN separately on the input images. The pedestrian and background pictures generated in the coarse detection stage are used as training samples to train a VGG-16 network, in which dilated convolutions replace pooling layers, as the second classifier; this is combined with the original classifier trained on the improved self-similarity features and convolutional channel features (that is, the third classifier) to form the classification detector of the fine detection stage. In the detection stage, a picture is passed through the two region candidate structures to obtain candidate target boxes, which are input to the classifier of the fine detection stage for more accurate classification, giving the final pedestrian target boxes.
Example 3
This embodiment verifies the coarse and fine pedestrian detection method based on region candidates through experiments. The Caltech pedestrian dataset and TUD-Brussels are widely used pedestrian detection datasets on which most pedestrian detection algorithms are evaluated. The Caltech pedestrian detection dataset, published by the California Institute of Technology, is about 10 hours of video at a resolution of 640x480 and 30 frames per second, captured with a vehicle-mounted camera. About 250,000 frames are annotated, with 350,000 pedestrians marked by rectangular boxes, and occlusion is annotated as well. The dataset is divided into sets 00-10, of which sets 00-05 are used as the training set and sets 06-10 as the test set.
In the experiments of this embodiment, one picture is taken every 4 frames from sets 00-05, giving 32077 training pictures, and one picture every 30 frames from sets 06-10, giving 4024 test pictures. The TUD-Brussels dataset was captured with a pair of vehicle-mounted cameras; the image pairs provide motion information for evaluating its effect on pedestrian detection, and that motion information is not used here. Its training set has 1092 image pairs as positive samples, containing 1776 pedestrian targets in total, and 192 image pairs without pedestrians (some captured with a handheld camera) as negative samples. Its test set has 508 image pairs at a resolution of 640x480, the same as the Caltech dataset. Performance is evaluated with the evaluation algorithm proposed by Piotr Dollár in 2012, and the methods are compared by log-average miss rate; the lower the log-average miss rate, the better.
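The log-average miss rate used for comparison can be sketched as follows. In the Caltech evaluation protocol it is the geometric mean of the miss rate sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space between 0.01 and 1. The sampling rule for reference points that fall between curve points is simplified here, and the function name and the flat example curve are illustrative.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate, n_points=9):
    """Geometric mean of the miss rate at reference FPPI values in [1e-2, 1e0].

    `fppi` must be sorted in increasing order, with `miss_rate` the matching curve.
    """
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    refs = np.logspace(-2.0, 0.0, n_points)
    sampled = []
    for ref in refs:
        idx = np.where(fppi <= ref)[0]
        # If the curve starts to the right of `ref`, count it as a full miss.
        sampled.append(miss_rate[idx[-1]] if idx.size else 1.0)
    sampled = np.maximum(np.array(sampled), 1e-10)   # avoid log(0)
    return float(np.exp(np.mean(np.log(sampled))))

# Hypothetical curve: a constant 46% miss rate over the whole FPPI range.
fppi = np.logspace(-3, 1, 50)
lamr = log_average_miss_rate(fppi, np.full(50, 0.46))
```

Averaging in log space keeps a single very low or very high sample from dominating the summary value.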
Comparison of experimental results:
To demonstrate the effectiveness of the method of this embodiment, it was compared experimentally with other methods on the TUD-Brussels dataset. The results are shown in fig. 3, which plots the miss rate curves of the methods. The most classical traditional feature extraction method has a log-average miss rate of 78% on TUD-Brussels, 32 percentage points higher than the 46% of the method of this embodiment; the ConvNet framework reaches 69% and LDCF 52%. MF+Motion+2Ped, which adds motion information to improve the accuracy of the model, is 5 percentage points higher than the method of this embodiment, which uses no motion information, and 1 percentage point higher than the original method (labeled 'original' in the figures). These results show that the method effectively reduces the miss rate of pedestrian detection. In fig. 3, the curves at the right end are, from top to bottom, HOG, MF+Motion+2Ped, ConvNet, LDCF, ours, and original. In fig. 4, the curves at the left end are, from top to bottom, HOG, SA-FastRCNN, FasterRCNN+ATT, MS-CNN, original, ours, and RPN+BF (the original and ours curves are too close to distinguish at the left end, but toward the end of the curves the original curve lies above the ours curve).
It should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solutions of the present invention may be modified or equivalently substituted without departing from their spirit and scope, and all such modifications and substitutions are covered by the claims of the present invention.

Claims (9)

1. A coarse and fine pedestrian detection method based on region candidates, characterized by comprising a coarse detection stage and a fine detection stage, wherein the coarse detection stage comprises the following steps:
performing coarse detection on the pictures to be detected of the coarse training samples using the locally decorrelated channel features (LDCF) method;
screening out the labeled target boxes that are missed on the coarse training samples;
performing cluster analysis on the missed labeled target boxes, and setting the label scales and aspect ratios;
training a region candidate network using the scales and aspect ratios;
inputting pictures into the trained region candidate network, and fusing the detection result it outputs with the coarse detection result output by a first classifier trained with the LDCF method, to obtain the coarse detection result;
and the fine detection stage comprises the following steps:
extracting different feature channels from the pedestrian images of the coarse training samples;
further extracting color self-similarity features and convolutional channel features from the feature channels;
performing fusion training with the color self-similarity features and the convolutional channel features to obtain a third classifier, and outputting the corresponding detection result.
2. The coarse and fine pedestrian detection method based on region candidates according to claim 1, characterized in that generating the coarse detection result comprises the following steps:
extracting different feature channels from the pedestrian images of the coarse training samples;
applying the LDCF method to extract features and train the first classifier;
performing coarse detection on the image with the trained first classifier to generate the coarse detection result.
3. The coarse and fine pedestrian detection method based on region candidates according to claim 1 or 2, characterized in that the region candidate network selects candidate regions on the feature map generated after the convolution operation, receives the picture to be detected as input, and outputs a series of rectangular target candidate boxes with corresponding scores.
4. The coarse and fine pedestrian detection method based on region candidates according to claim 3, characterized by comprising the following steps:
the network for the convolution operation uses the 13 convolutional layers of the VGG-16 network;
features of the input picture are extracted by the convolutional layers, a small network obtains sliding windows of different scales and ratios on the resulting feature map, and each window is mapped to a lower-dimensional feature, including a 512-dimensional feature;
the previously generated windows are classified and regressed by two fully connected layers.
5. The coarse and fine pedestrian detection method based on region candidates according to claim 4, characterized in that the loss function for training the region candidate network is defined as follows:
[Equation image FDA0002835814700000021: loss function of the region candidate network, not reproduced in this text]
where i is the index of a target candidate box; p_i is the probability that the i-th target candidate box is a pedestrian target; p_i* is 1 when the i-th target candidate box is identified as a target and 0 otherwise; t_i denotes the predicted coordinates; and t_i* denotes the coordinates of the ground-truth target.
6. The coarse and fine pedestrian detection method based on region candidates according to claim 4 or 5, characterized in that, during training of the region candidate network, samples that have the highest intersection over union with a labeled target box, or whose overlap with a ground-truth target box is greater than 0.7, are taken as positive samples, and samples whose overlap with the ground-truth target boxes is less than 0.3 are taken as negative samples.
7. The coarse and fine pedestrian detection method based on region candidates according to claim 6, characterized in that the fine detection stage further comprises the following steps:
using the pedestrian and background pictures generated when the feature channels are extracted in the coarse detection stage as the fine training samples of the fine detection stage;
training a VGG-16 network on the fine training samples, and then training a fine Adaboost classifier as a second classifier;
combining the second classifier with the third classifier to serve together as the classification detector of the fine detection stage.
8. The coarse and fine pedestrian detection method based on region candidates according to claim 7, characterized by further comprising a detection stage, in which the test sample image is passed through the LDCF method and the region candidate network to obtain candidate target boxes, and the candidate target boxes are input to the classification detector of the fine detection stage for accurate classification, yielding the labeled pedestrian target box detection result.
9. The coarse and fine pedestrian detection method based on region candidates according to claim 7 or 8, characterized in that the VGG-16 network trained on the fine training samples replaces its last two pooling layers with dilated convolution layers of stride 2 for downsampling, which reduces the size of the feature map while enlarging the receptive field.
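Note on the loss function of claim 5: the equation image is not reproduced in this text. The symbols defined in that claim match the multi-task loss used to train the RPN in Faster R-CNN, so the formula may take the form below. This is an assumption, not the patent's confirmed equation; the normalizers N_cls and N_reg, the balancing weight lambda, and the classification and regression terms L_cls and L_reg follow the Faster R-CNN notation rather than anything stated in this document.

```latex
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^{*})
                    + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^{*} \, L_{reg}(t_i, t_i^{*})
```

Under this reading, the factor p_i* switches the regression term on only for candidate boxes labeled as pedestrian targets.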
CN201910535870.1A 2019-06-20 2019-06-20 A Coarse and Fine Pedestrian Detection Method Based on Region Candidates Active CN110263712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910535870.1A CN110263712B (en) 2019-06-20 2019-06-20 A Coarse and Fine Pedestrian Detection Method Based on Region Candidates

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910535870.1A CN110263712B (en) 2019-06-20 2019-06-20 A Coarse and Fine Pedestrian Detection Method Based on Region Candidates

Publications (2)

Publication Number Publication Date
CN110263712A CN110263712A (en) 2019-09-20
CN110263712B true CN110263712B (en) 2021-02-23

Family

ID=67919859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910535870.1A Active CN110263712B (en) 2019-06-20 2019-06-20 A Coarse and Fine Pedestrian Detection Method Based on Region Candidates

Country Status (1)

Country Link
CN (1) CN110263712B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401215B (en) * 2020-03-12 2023-10-31 杭州涂鸦信息技术有限公司 Multi-class target detection method and system
CN111488918A (en) * 2020-03-20 2020-08-04 天津大学 Transformer substation infrared image equipment detection method based on convolutional neural network
CN111507282B (en) * 2020-04-21 2023-06-30 创优数字科技(广东)有限公司 Target detection early warning analysis system, method, equipment and medium
CN111695430B (en) * 2020-05-18 2023-06-30 电子科技大学 Multi-scale face detection method based on feature fusion and visual receptive field network
CN111666839A (en) * 2020-05-25 2020-09-15 东华大学 Road pedestrian detection system based on improved Faster RCNN
CN111738164B (en) * 2020-06-24 2021-02-26 广西计算中心有限责任公司 Pedestrian detection method based on deep learning
CN113221956B (en) * 2021-04-15 2024-02-02 国网浙江省电力有限公司杭州供电公司 Target identification method and device based on improved multi-scale depth model
CN113128521B (en) * 2021-04-30 2023-07-18 西安微电子技术研究所 Method, system, computer equipment and storage medium for extracting characteristics of miniaturized artificial intelligent model
CN113420725B (en) * 2021-08-20 2021-12-31 天津所托瑞安汽车科技有限公司 Method, device, system and storage medium for identifying false alarm scenes of BSD (backup service discovery) product

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101017573A (en) * 2007-02-09 2007-08-15 南京大学 Method for detecting and identifying moving target based on video monitoring
JP2012123626A (en) * 2010-12-08 2012-06-28 Toyota Central R&D Labs Inc Object detector and program
CN102609720A (en) * 2012-01-31 2012-07-25 中国科学院自动化研究所 Pedestrian detection method based on position correction model
CN103778430A (en) * 2014-02-24 2014-05-07 东南大学 Rapid face detection method based on combination between skin color segmentation and AdaBoost
CN107463892A (en) * 2017-07-27 2017-12-12 北京大学深圳研究生院 Pedestrian detection method in a kind of image of combination contextual information and multi-stage characteristics
CN108460336A (en) * 2018-01-29 2018-08-28 南京邮电大学 A kind of pedestrian detection method based on deep learning
CN108510000A (en) * 2018-03-30 2018-09-07 北京工商大学 The detection and recognition methods of pedestrian's fine granularity attribute under complex scene

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100568262C (en) * 2007-12-29 2009-12-09 浙江工业大学 Face recognition detection device based on multi-camera information fusion
CN108038409B (en) * 2017-10-27 2021-12-28 江西高创保安服务技术有限公司 Pedestrian detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
一种新型粗-精表达策略行人检测方法;任汉俊等;《南京理工大学学报》;20171031;论文正文 *

Also Published As

Publication number Publication date
CN110263712A (en) 2019-09-20

Similar Documents

Publication Publication Date Title
CN110263712B (en) A Coarse and Fine Pedestrian Detection Method Based on Region Candidates
CN111931684B (en) A weak and small target detection method based on discriminative features of video satellite data
CN112418117B (en) Small target detection method based on unmanned aerial vehicle image
CN108304873B (en) Target detection method and system based on high-resolution optical satellite remote sensing image
CN107622258B (en) A Fast Pedestrian Detection Method Combining Static Underlying Features and Motion Information
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
Zhou et al. Robust vehicle detection in aerial images using bag-of-words and orientation aware scanning
CN111914664A (en) Vehicle multi-target detection and trajectory tracking method based on re-identification
CN103971386B (en) A kind of foreground detection method under dynamic background scene
CN108460403A (en) The object detection method and system of multi-scale feature fusion in a kind of image
WO2019196131A1 (en) Method and apparatus for filtering regions of interest for vehicle-mounted thermal imaging pedestrian detection
CN102932605B (en) A Combination Selection Method of Cameras in Visual Perception Network
CN113408492A (en) Pedestrian re-identification method based on global-local feature dynamic alignment
CN103714181B (en) A kind of hierarchical particular persons search method
CN108334848A (en) A kind of small face identification method based on generation confrontation network
CN101017573A (en) Method for detecting and identifying moving target based on video monitoring
CN105701448B (en) Three-dimensional face point cloud nose detection method and the data processing equipment for applying it
WO2013091370A1 (en) Human body part detection method based on parallel statistics learning of 3d depth image information
Zhang et al. Coarse-to-fine object detection in unmanned aerial vehicle imagery using lightweight convolutional neural network and deep motion saliency
CN109919223B (en) Target detection method and device based on deep neural network
CN109086659B (en) Human behavior recognition method and device based on multi-channel feature fusion
CN108564598B (en) An Improved Online Boosting Target Tracking Method
CN110008900B (en) Method for extracting candidate target from visible light remote sensing image from region to target
CN108734200B (en) Human target visual detection method and device based on BING feature
CN105760472A (en) Video retrieval method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211123

Address after: E2-103-1113, China Sensor Network International Innovation Park, 200 Linghu Avenue, Xinwu District, Wuxi City, Jiangsu Province, 214122

Patentee after: Uniform entropy technology (Wuxi) Co.,Ltd.

Address before: No. 1800 road 214122 Jiangsu Lihu Binhu District City of Wuxi Province

Patentee before: Jiangnan University