CN115565207B - Pedestrian detection method for occlusion scenes with fused feature simulation - Google Patents
Pedestrian detection method for occlusion scenes with fused feature simulation
Info
- Publication number
- CN115565207B CN115565207B CN202211510002.6A CN202211510002A CN115565207B CN 115565207 B CN115565207 B CN 115565207B CN 202211510002 A CN202211510002 A CN 202211510002A CN 115565207 B CN115565207 B CN 115565207B
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- feature
- thermodynamic diagram
- detection
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 128
- 238000004088 simulation Methods 0.000 title claims abstract description 59
- 230000004044 response Effects 0.000 claims abstract description 84
- 238000000034 method Methods 0.000 claims abstract description 29
- 230000004927 fusion Effects 0.000 claims abstract description 16
- 238000012549 training Methods 0.000 claims abstract description 14
- 230000008447 perception Effects 0.000 claims abstract description 12
- 230000001629 suppression Effects 0.000 claims abstract description 10
- 230000000903 blocking effect Effects 0.000 claims abstract description 8
- 238000010586 diagram Methods 0.000 claims description 96
- 230000006870 function Effects 0.000 claims description 23
- 230000008569 process Effects 0.000 claims description 8
- 230000003213 activating effect Effects 0.000 claims description 7
- 238000004364 calculation method Methods 0.000 claims description 6
- 230000000007 visual effect Effects 0.000 claims description 2
- 230000009977 dual effect Effects 0.000 claims 2
- 238000012805 post-processing Methods 0.000 abstract description 6
- 238000000605 extraction Methods 0.000 abstract description 2
- 238000002372 labelling Methods 0.000 abstract 1
- 238000004422 calculation algorithm Methods 0.000 description 7
- 238000004590 computer program Methods 0.000 description 7
- 238000012986 modification Methods 0.000 description 5
- 230000004048 modification Effects 0.000 description 5
- 238000012545 processing Methods 0.000 description 5
- 230000003278 mimic effect Effects 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 230000004075 alteration Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000012217 deletion Methods 0.000 description 1
- 230000037430 deletion Effects 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000011144 upstream manufacturing Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Human Computer Interaction (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a pedestrian detection method for occlusion scenes with fused feature simulation. In the training stage, pedestrian features are extracted with a feature extraction network and classified according to the annotation information. For the different classes of pedestrian features, feature simulation strategies are learned through separate branches. In the inference stage, the features extracted by the backbone network pass through two parallel feature simulation branches to obtain center-point heatmaps with different responses, and a more representative center-point response heatmap is obtained through an effective fusion strategy. An occlusion attribute of the detection box is designed to address the missed detection of pedestrians in dense areas, and an occlusion-aware non-maximum suppression method is designed that deletes redundant pedestrian detection boxes in the post-processing stage while retaining the detection boxes of occluded pedestrians. Pedestrian detection performance in occlusion scenes is thereby effectively improved.
Description
Technical Field
The invention relates to the field of pedestrian detection in image processing and machine vision, and in particular to a pedestrian detection method for occlusion scenes with fused feature simulation.
Background
Pedestrian detection in occlusion scenes is an important research topic in the field of computer vision. As an important upstream task, it provides essential cues for downstream tasks such as pedestrian tracking, pedestrian re-identification, and autonomous driving. A pedestrian detection algorithm that works in a variety of complex scenes is therefore significant for improving the performance of these downstream tasks.
Existing pedestrian detection methods include traditional machine vision methods based on texture and similar hand-crafted features, and feature extraction methods based on deep learning. Constrained by their reliance on appearance features, existing pedestrian detection algorithms perform poorly in complex occlusion scenes.
In complex scenes, the occlusion of pedestrians includes intra-class occlusion between pedestrians and inter-class occlusion between pedestrians and other surrounding objects. Occlusion weakens the appearance features of pedestrians, so that the detector cannot distinguish occluded pedestrians from the background well, causing a high miss rate.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a pedestrian detection method for occlusion scenes with fused feature simulation. Through feature simulation learning, it reduces the feature differences within the pedestrian class and increases the differences between pedestrian and background features, thereby improving the detection rate of pedestrians in occlusion scenes. Meanwhile, an occlusion attribute is designed as extra semantic information, and an occlusion-aware non-maximum suppression algorithm is designed that considers both the predicted confidence and the occlusion attribute of each pedestrian detection box, effectively retaining detection boxes whose confidence scores are lowered by occlusion while suppressing redundant ones.
According to a first aspect of the invention, a pedestrian detection method for occlusion scenes with fused feature simulation is provided, which includes: step 1, training a feature simulation learning network, wherein the input of the feature simulation learning network is the high-level features of an image obtained through a backbone network, and the output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap is the occlusion-to-non-occlusion feature simulation learning center-point response heatmap, and the second center-point response heatmap is the whole-body-to-visible feature simulation learning center-point response heatmap;
step 2, acquiring the high-level features of an image to be detected through the backbone network, and inputting the high-level features into the feature simulation learning network to obtain the first center-point response heatmap and the second center-point response heatmap;
step 3, fusing the first and second center-point response heatmaps in the feature simulation learning network by weighted fusion, and activating with a sigmoid to obtain the third center-point response heatmap;
step 4, considering both the occlusion attribute and the classification confidence of each detection box, applying occlusion-aware non-maximum suppression to the third center-point response heatmap to obtain the detection result of the image to be detected.
On the basis of the above technical scheme, the invention can be further improved as follows.
Optionally, the training process of the feature simulation learning network includes:
step 101, acquiring the high-level features of a training image and the visible-part and whole-body detection boxes of target pedestrians;
step 102, according to the annotations of the visible part and the whole body of each pedestrian, extracting whole-body pedestrian features and visible-part pedestrian features from the high-level features with RoI-Align; calculating visibility as the ratio of the areas of the visible-part and whole-body detection boxes of a pedestrian, and classifying the whole-body pedestrian features into occluded pedestrian features and non-occluded pedestrian features according to the visibility;
step 103, inputting the occluded and non-occluded pedestrian features into an occlusion-to-non-occlusion feature simulation module, where the occluded pedestrian features learn to mimic the feature representation of the non-occluded pedestrian features, yielding the first center-point response heatmap; inputting the whole-body and visible-part pedestrian features into a whole-body-to-visible feature simulation module, where the whole-body pedestrian features learn to mimic the feature representation of the visible-part pedestrian features, yielding the second center-point response heatmap;
step 104, fusing the first and second center-point response heatmaps by weighted fusion, and activating with a sigmoid to obtain the third center-point response heatmap.
Optionally, classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features in step 102 includes:
calculating the visibility of each pedestrian as
Vis_r = S_vis / S_full
where S_vis is the area of the pedestrian's visible-part box and S_full is the area of the pedestrian's whole-body box;
classifying the whole-body pedestrian features by comparing the visibility with a set threshold: the features of pedestrians whose visibility falls below the threshold form the occluded-pedestrian feature set O = {o_i}, where o_i denotes the i-th occluded pedestrian feature; the remaining features form the non-occluded-pedestrian feature set U = {u_i}, where u_i denotes the i-th non-occluded pedestrian feature.
Optionally, the process of training the occlusion-to-non-occlusion feature simulation module and the whole-body-to-visible feature simulation module in step 103 includes:
dividing the target pedestrian features in each batch into imitated features and features that need to mimic them; the pedestrian features include: the whole-body pedestrian features, the visible-part pedestrian features, the occluded pedestrian features and the non-occluded pedestrian features;
extracting each pedestrian feature to a fixed size with RoI-Align, calculating the mean of the imitated features on each channel, and using this feature mean as the object of imitation:
μ = (1/N) Σ_{j=1}^{N} t_j
where t_j denotes the j-th imitated feature, μ denotes the mean of the N imitated features, f_i denotes the i-th feature that needs to mimic, and M is the number of features that need to mimic.
Optionally, the fusion strategy of the center-point response heatmaps in step 3 and step 104 is:
M_center = α·M_occ-unocc + (1 - α)·M_full-vis
where M_occ-unocc denotes the first center-point response heatmap, M_full-vis denotes the second center-point response heatmap, M_center denotes the third center-point response heatmap, and α = 0.5.
Optionally, the loss function L_Dual of the feature simulation learning network is:
L_Dual = λ_c·(L_center1 + L_center2) + L_occ-unocc + L_full-vis
where L_center1 and L_center2 are the loss functions of the first and second center-point response heatmaps respectively, L_occ-unocc is the feature simulation learning constraint loss of the occlusion-to-non-occlusion feature simulation module, L_full-vis is the feature simulation learning constraint loss of the whole-body-to-visible feature simulation module, and λ_c is a balance coefficient;
L_occ-unocc = L_m(O, U), L_full-vis = L_m(F, V)
where L_m is the loss calculation function, F denotes the set of whole-body pedestrian features, and V denotes the set of visible-part pedestrian features.
Optionally, the occlusion-aware non-maximum suppression in step 4 examines the detection boxes in descending order of detection confidence score, and includes:
step 401, for any detection box, judging whether its intersection-over-union with an intersecting detection box is larger than a set threshold, and if so, executing step 402;
step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, retaining both intersecting detection boxes; when it does not exceed the threshold, deleting one of the two intersecting detection boxes;
the occlusion attribute is the ratio of the visible length of each border of a detection box to the length of that border.
Optionally, the occlusion attribute of a detection box is:
O = {o_i | i = 1, 2, 3, 4}
where o_1, o_2, o_3, o_4 denote the visible length ratios of the top, right, bottom and left borders respectively, and O denotes the occlusion attribute vector of one detection box.
Optionally, the step 4 includes:
step 401', initializing the detection box sequence B = {b_1, ..., b_N} and the corresponding confidence score sequence S = {s_1, ..., s_N}, where b_i denotes the i-th detection box and s_i is the confidence score of b_i;
step 402', when the m-th value in the sequence S is the maximum, letting M be the detection box b_m with the highest current confidence score, and taking M out of the detection box sequence B and putting it into a set F;
step 403', for each remaining box b_i with IoU(M, b_i) ≥ N_t, letting s_i = s_i·f(M, b_i), where IoU is the intersection-over-union calculation function, N_t is a set IoU threshold, N_o is the occlusion-attribute difference threshold, j = 1, 2, 3 or 4 indexes the top, right, bottom and left borders, and o_j^M and o_j^{b_i} are the occlusion attributes of the j-th border of detection box M and detection box b_i respectively;
step 404', executing steps 402'-403' in a loop until the sequence B is empty, and returning the final sets F and S as the final detection box sequence and the corresponding confidence score sequence respectively.
The invention provides a pedestrian detection method for occlusion scenes with fused feature simulation. First, feature simulation is proposed to shrink intra-class feature differences and enlarge the inter-class feature differences between the pedestrian and background classes. Second, a fused feature simulation learning strategy is proposed so that the two branches complement each other, improving the detection rate in occlusion scenes. Third, an occlusion attribute is constructed and occlusion-aware non-maximum suppression is proposed, effectively retaining detection boxes that would otherwise be suppressed due to occlusion. By innovatively fusing these components, a pedestrian detection method for occlusion scenes with fused feature simulation is constructed to improve pedestrian detection performance in occlusion scenes.
Drawings
Fig. 1 is a structural diagram of the pedestrian detection method for occlusion scenes with fused feature simulation provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of feature simulation learning provided by an embodiment of the present invention;
Fig. 3 is a pseudocode listing of the occlusion-aware non-maximum suppression algorithm provided by an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a structural diagram of the pedestrian detection method for occlusion scenes with fused feature simulation provided by an embodiment of the present invention. As shown in fig. 1, the method includes:
Step 1, training a feature simulation learning network, whose input is the high-level features of an image obtained through a backbone network and whose output is the third center-point response heatmap obtained by fusing the first and second center-point response heatmaps.
Step 2, acquiring the high-level features of the image to be detected through the backbone network, and inputting them into the feature simulation learning network to obtain the first center-point response heatmap and the second center-point response heatmap.
Step 3, fusing the first and second center-point response heatmaps in the feature simulation learning network by weighted fusion, and activating with a sigmoid to obtain the third center-point response heatmap for subsequent post-processing.
Step 4, considering both the occlusion attribute and the classification confidence of each detection box, applying occlusion-aware non-maximum suppression to the third center-point response heatmap, thereby post-processing the prediction results and obtaining the detection result of the image to be detected.
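Read as a data flow, steps 1 to 4 can be summarized by the following minimal inference sketch. It assumes PyTorch, and the backbone, the two feature simulation branches, the box decoder and the occlusion-aware NMS are passed in as callables; all function and parameter names here are illustrative and not taken from the patent.

```python
import torch

def detect(image, backbone, branch_occ_unocc, branch_full_vis, decode, oanms, alpha=0.5):
    """Inference sketch: backbone -> two feature simulation branches ->
    weighted fusion + sigmoid -> box decoding -> occlusion-aware NMS."""
    feats = backbone(image)                                  # step 2: high-level features
    m1 = branch_occ_unocc(feats)                             # first center-point heatmap
    m2 = branch_full_vis(feats)                              # second center-point heatmap
    m_center = torch.sigmoid(alpha * m1 + (1 - alpha) * m2)  # step 3: weighted fusion + sigmoid
    boxes, scores, occ = decode(m_center, feats)             # box decoding (not detailed in the patent)
    return oanms(boxes, scores, occ)                         # step 4: occlusion-aware NMS
```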
According to the pedestrian detection method for occlusion scenes with fused feature simulation provided by this embodiment, feature simulation learning reduces the feature differences within the pedestrian class while increasing the differences between pedestrian and background features, improving the detection rate of pedestrians in occlusion scenes. Meanwhile, the occlusion attribute is designed as additional semantic information, and an occlusion-aware non-maximum suppression algorithm is designed that considers both the predicted confidence and the occlusion attribute of each pedestrian detection box, effectively retaining detection boxes whose confidence scores are lowered by occlusion while suppressing redundant ones.
Example 1
In one possible embodiment, the training process of the feature simulation learning network includes the following steps:
Step 101, acquiring the high-level features of a training image and the visible-part and whole-body detection boxes of target pedestrians.
Step 102, according to the annotations of the visible part and the whole body of each pedestrian, extracting whole-body pedestrian features and visible-part pedestrian features from the high-level features with RoI-Align; calculating visibility as the ratio of the areas of the visible-part and whole-body detection boxes of a pedestrian, and classifying the whole-body pedestrian features into occluded pedestrian features and non-occluded pedestrian features according to the visibility.
In one possible embodiment, classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features in step 102 includes:
calculating the visibility of each pedestrian as
Vis_r = S_vis / S_full
where S_vis is the area of the pedestrian's visible-part box and S_full is the area of the pedestrian's whole-body box.
The whole-body pedestrian features are then classified by comparing the visibility with a set threshold: the features of pedestrians whose visibility falls below the threshold form the occluded-pedestrian feature set O = {o_i}, where o_i denotes the i-th occluded pedestrian feature; the remaining features form the non-occluded-pedestrian feature set U = {u_i}, where u_i denotes the i-th non-occluded pedestrian feature.
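As a concrete sketch of this classification rule under the stated area-ratio definition of visibility (the threshold vis_thr below is a placeholder, since its value is not preserved in the text):

```python
def box_area(box):
    # box: (x1, y1, x2, y2)
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def split_by_visibility(whole_feats, vis_boxes, full_boxes, vis_thr=0.5):
    """Split whole-body pedestrian features into the occluded set O and the
    non-occluded set U by visibility Vis_r = S_vis / S_full."""
    occluded, non_occluded = [], []
    for feat, vb, fb in zip(whole_feats, vis_boxes, full_boxes):
        vis_r = box_area(vb) / max(box_area(fb), 1e-6)
        if vis_r < vis_thr:
            occluded.append(feat)       # heavily occluded pedestrian feature o_i
        else:
            non_occluded.append(feat)   # non-occluded pedestrian feature u_i
    return occluded, non_occluded
```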
Step 103, inputting the occluded and non-occluded pedestrian features into the occlusion-to-non-occlusion feature simulation module, where the occluded pedestrian features learn to mimic the feature representation of the non-occluded pedestrian features, yielding the first center-point response heatmap; inputting the whole-body and visible-part pedestrian features into the whole-body-to-visible feature simulation module, where the whole-body pedestrian features learn to mimic the feature representation of the visible-part pedestrian features, yielding the second center-point response heatmap.
Fig. 2 is a schematic diagram of the feature simulation learning provided by an embodiment of the present invention. With reference to fig. 1 and fig. 2, in one possible embodiment, the process of training the occlusion-to-non-occlusion feature simulation module and the whole-body-to-visible feature simulation module in step 103 includes:
dividing the target pedestrian features in each batch into imitated features and features that need to mimic them; the pedestrian features include: the whole-body pedestrian features, the visible-part pedestrian features, the occluded pedestrian features and the non-occluded pedestrian features.
Each pedestrian feature is first extracted to a fixed size (7 × 7, 256 channels) with RoI-Align, and then the mean of the imitated features on each channel is calculated and used as the object of imitation:
μ = (1/N) Σ_{j=1}^{N} t_j
where t_j denotes the j-th imitated feature, μ denotes the mean of the N imitated features, f_i denotes the i-th feature that needs to mimic, and M is the number of features that need to mimic; the simulation constraint L_m penalizes the distance between each f_i and the mean μ.
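A minimal sketch of this simulation constraint, assuming PyTorch tensors of shape (N, 256, 7, 7) from RoI-Align; the exact distance function of L_m is not preserved in the text, so the mean-squared distance used here is an assumption:

```python
import torch
import torch.nn.functional as F

def mimic_loss(imitated, needs_mimic):
    """L_m: pull each feature that needs to mimic toward the per-channel mean
    of the imitated features.

    imitated:    (N, C, 7, 7) RoI-Aligned features serving as targets (t_j)
    needs_mimic: (M, C, 7, 7) features that should mimic them (f_i)
    """
    target = imitated.mean(dim=0, keepdim=True)  # μ: mean over the N imitated features
    target = target.detach()                     # move the mimicking features, not the targets (our assumption)
    return F.mse_loss(needs_mimic, target.expand_as(needs_mimic))
```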
On this basis, two different occlusion simulation strategies are formed: the occlusion-to-non-occlusion feature simulation learning module and the whole-body-to-visible feature simulation learning module.
Step 104, fusing the first and second center-point response heatmaps by weighted fusion, and activating with a sigmoid to obtain the third center-point response heatmap.
In one possible embodiment, the fusion strategy of the center-point response heatmaps in step 3 and step 104 is:
M_center = α·M_occ-unocc + (1 - α)·M_full-vis
where M_occ-unocc denotes the first center-point response heatmap, M_full-vis denotes the second center-point response heatmap, M_center denotes the third center-point response heatmap, and α = 0.5, obtained by experiment.
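The fusion step itself is small; a sketch with α = 0.5 as given above:

```python
import torch

def fuse_heatmaps(m_occ_unocc, m_full_vis, alpha=0.5):
    """M_center = alpha * M_occ-unocc + (1 - alpha) * M_full-vis, then sigmoid."""
    return torch.sigmoid(alpha * m_occ_unocc + (1.0 - alpha) * m_full_vis)
```

Averaging the two branch outputs before the sigmoid keeps both responses on the logit scale, so neither branch can dominate the fused center-point map.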
In one possible embodiment, the loss function L_Dual of the feature simulation learning network is:
L_Dual = λ_c·(L_center1 + L_center2) + L_occ-unocc + L_full-vis
where L_center1 and L_center2 are the loss functions of the first and second center-point response heatmaps respectively, L_occ-unocc is the feature simulation learning constraint loss of the occlusion-to-non-occlusion feature simulation module, L_full-vis is the feature simulation learning constraint loss of the whole-body-to-visible feature simulation module, and λ_c is a balance coefficient set by experiment;
L_occ-unocc = L_m(O, U), L_full-vis = L_m(F, V)
where L_m is the loss calculation function, F denotes the set of whole-body pedestrian features, and V denotes the set of visible-part pedestrian features.
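Assembling the total objective is then direct; λ_c is described only as set by experiment, so the default below is a placeholder:

```python
def dual_loss(l_center1, l_center2, l_occ_unocc, l_full_vis, lambda_c=1.0):
    """L_Dual = lambda_c * (L_center1 + L_center2) + L_occ-unocc + L_full-vis."""
    return lambda_c * (l_center1 + l_center2) + l_occ_unocc + l_full_vis
```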
Step 2, acquiring the high-level features of the image to be detected through the backbone network, and inputting them into the feature simulation learning network to obtain the first center-point response heatmap and the second center-point response heatmap.
Step 3, fusing the first and second center-point response heatmaps in the feature simulation learning network by weighted fusion, and activating with a sigmoid to obtain the third center-point response heatmap for subsequent post-processing.
Step 4, considering both the occlusion attribute and the classification confidence of each detection box, applying occlusion-aware non-maximum suppression to the third center-point response heatmap, thereby post-processing the prediction results and obtaining the detection result of the image to be detected.
In one possible embodiment, in the post-processing stage, the occlusion-aware non-maximum suppression of step 4 examines the detection boxes in descending order of detection confidence score, and includes:
Step 401, for any detection box, judging whether its intersection-over-union with an intersecting detection box is larger than a set threshold; if so, executing step 402.
Step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, the two boxes belong to different pedestrians and both are retained; when it does not exceed the threshold, one of them is a redundant detection box and is suppressed and deleted.
The occlusion attribute is the ratio of the visible length of each border of a detection box to the length of that border.
In an in-vehicle camera scene, as a target moves from a near position toward infinity, it shrinks toward the middle of the image, and the vertical coordinate of the lower border of its detection box gradually decreases with the target's depth in the image. Based on this phenomenon, for detection boxes that intersect each other, the occlusion relationship between pedestrians is determined from the vertical coordinate of the lower border of each detection box, and the occlusion attribute of the detection box is defined based on this occlusion relationship.
It can be understood that the occlusion attribute of a detection box is:
O = {o_i | i = 1, 2, 3, 4}
where o_1, o_2, o_3, o_4 denote the visible length ratios of the top, right, bottom and left borders respectively, and O denotes the occlusion attribute vector of one detection box. The occlusion attributes of the four borders together form the occlusion attribute of the whole detection box.
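A sketch of computing this occlusion attribute vector from the depth ordering described above. The patent fixes only the general rule (a larger lower-border ordinate means the pedestrian is nearer the camera, and each border's attribute is its visible-length ratio); the concrete geometry below, which marks the portion of a border covered by a nearer intersecting box as invisible, is our interpretation.

```python
import numpy as np

def occlusion_attributes(box, others):
    """Occlusion attribute O = (o_top, o_right, o_bottom, o_left) of `box`:
    for each border, the fraction of its length not covered by an
    intersecting box whose lower border sits lower in the image (i.e. a
    nearer pedestrian, per the depth-of-field argument)."""
    x1, y1, x2, y2 = box
    w, h = max(x2 - x1, 1e-6), max(y2 - y1, 1e-6)
    o = np.ones(4)                      # top, right, bottom, left; fully visible by default
    for ox1, oy1, ox2, oy2 in others:
        if oy2 <= y2:                   # other box is farther away: it cannot occlude
            continue
        sx = max(0.0, min(x2, ox2) - max(x1, ox1))   # shared x-extent
        sy = max(0.0, min(y2, oy2) - max(y1, oy1))   # shared y-extent
        if sx <= 0.0 or sy <= 0.0:
            continue                    # the boxes do not intersect
        if oy1 <= y1 <= oy2:
            o[0] = min(o[0], 1.0 - sx / w)   # top border partially covered
        if ox1 <= x2 <= ox2:
            o[1] = min(o[1], 1.0 - sy / h)   # right border partially covered
        if oy1 <= y2 <= oy2:
            o[2] = min(o[2], 1.0 - sx / w)   # bottom border partially covered
        if ox1 <= x1 <= ox2:
            o[3] = min(o[3], 1.0 - sy / h)   # left border partially covered
    return o
```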
Fig. 3 is a pseudocode listing of the occlusion-aware non-maximum suppression algorithm provided by an embodiment of the present invention. With reference to fig. 1 and fig. 3, in another possible embodiment, step 4 includes:
Step 401', initializing the detection box sequence B = {b_1, ..., b_N} and the corresponding confidence score sequence S = {s_1, ..., s_N}, where b_i denotes the i-th detection box and s_i is the confidence score of b_i.
Step 402', when the m-th value in the sequence S is the maximum, letting M be the detection box b_m with the highest current confidence score, and taking M out of the detection box sequence B and putting it into a set F.
Step 403', for each remaining box b_i with IoU(M, b_i) ≥ N_t, letting s_i = s_i·f(M, b_i), where IoU is the intersection-over-union calculation function, N_t is a set IoU threshold, N_o is the occlusion-attribute difference threshold, j = 1, 2, 3 or 4 indexes the top, right, bottom and left borders, and o_j^M and o_j^{b_i} are the occlusion attributes of the j-th border of detection box M and detection box b_i respectively.
Step 404', executing steps 402'-403' in a loop until the sequence B is empty, and returning the final sets F and S as the final detection box sequence and the corresponding confidence score sequence respectively.
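Following the pseudocode of fig. 3 as far as the extracted text allows, the sketch below implements occlusion-aware NMS. The rescaling function f(M, b_i) is not preserved in the text; consistent with steps 401 and 402 described earlier, it is modeled here as an indicator that keeps b_i when its occlusion attributes differ from M's by more than N_o and suppresses it otherwise. Both threshold defaults are placeholders.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def occlusion_aware_nms(boxes, scores, occ, n_t=0.5, n_o=0.3):
    """boxes: (N, 4); scores: (N,); occ: (N, 4) per-border visible-length ratios.
    n_t is the IoU threshold N_t and n_o the occlusion-attribute difference
    threshold N_o; both values here are placeholders."""
    scores = scores.astype(float)
    B = list(range(len(boxes)))              # sequence B of candidate boxes
    F, S = [], []                            # final boxes and their scores
    while B:
        m = max(B, key=lambda i: scores[i])  # M: highest-scoring remaining box
        B.remove(m)
        F.append(boxes[m]); S.append(scores[m])
        for i in B:
            if iou(boxes[m], boxes[i]) >= n_t:
                diff = np.max(np.abs(occ[m] - occ[i]))  # occlusion-attribute difference
                if diff <= n_o:
                    scores[i] = 0.0          # f(M, b_i) = 0: redundant box, suppress
                # otherwise f(M, b_i) = 1: likely a distinct occluded pedestrian, keep
        B = [i for i in B if scores[i] > 0.0]
    return np.array(F), np.array(S)
```

In this reading, two heavily overlapping boxes survive together only when their border-visibility patterns disagree, which is exactly the signature of one pedestrian standing in front of another.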
In view of the shortcomings described in the background art, the embodiment of the invention provides a pedestrian detection method for occlusion scenes with fused feature simulation. 1. Feature simulation is used innovatively to reduce the differences between pedestrian features, and an effective heatmap fusion strategy is proposed in combination with the model, effectively improving the pedestrian detection rate in occlusion scenes. 2. The pedestrian occlusion attribute is constructed from existing information and can serve as semantic information for other related vision tasks. 3. An occlusion-aware non-maximum suppression algorithm is designed that deletes redundant detection boxes while retaining the detection boxes of occluded pedestrians.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (8)
1. A pedestrian detection method for occlusion scenes with fused feature simulation, the method comprising:
step 1, training a feature simulation learning network, wherein the input of the feature simulation learning network is the high-level features of an image obtained through a backbone network, and the output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap is the occlusion-to-non-occlusion feature simulation learning center-point response heatmap, and the second center-point response heatmap is the whole-body-to-visible feature simulation learning center-point response heatmap;
step 2, acquiring the high-level features of an image to be detected through the backbone network, and inputting the high-level features into the feature simulation learning network to obtain the first center-point response heatmap and the second center-point response heatmap;
step 3, fusing the first and second center-point response heatmaps in the feature simulation learning network by weighted fusion, and activating with a sigmoid to obtain the third center-point response heatmap;
step 4, considering both the occlusion attribute and the classification confidence of each detection box, applying occlusion-aware non-maximum suppression to the third center-point response heatmap to obtain the detection result of the image to be detected;
wherein the training process of the feature simulation learning network comprises:
step 101, acquiring the high-level features of a training image and the visible-part and whole-body detection boxes of target pedestrians;
step 102, according to the annotations of the visible part and the whole body of each pedestrian, extracting whole-body pedestrian features and visible-part pedestrian features from the high-level features with RoI-Align; calculating visibility as the ratio of the areas of the visible-part and whole-body detection boxes of a pedestrian, and classifying the whole-body pedestrian features into occluded pedestrian features and non-occluded pedestrian features according to the visibility;
step 103, inputting the occluded and non-occluded pedestrian features into an occlusion-to-non-occlusion feature simulation module, where the occluded pedestrian features learn to mimic the feature representation of the non-occluded pedestrian features, yielding the first center-point response heatmap; inputting the whole-body and visible-part pedestrian features into a whole-body-to-visible feature simulation module, where the whole-body pedestrian features learn to mimic the feature representation of the visible-part pedestrian features, yielding the second center-point response heatmap;
step 104, fusing the first and second center-point response heatmaps by weighted fusion, and activating with a sigmoid to obtain the third center-point response heatmap.
2. The detection method according to claim 1, wherein classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features in step 102 comprises:
calculating the visibility Vis_r of each pedestrian:
Vis_r = S_vis / S_full
wherein S_vis is the area of the pedestrian's visible-part box and S_full is the area of the pedestrian's whole-body box;
classifying the whole-body pedestrian features into the occluded pedestrian features and the non-occluded pedestrian features by comparing the visibility Vis_r with a set threshold: the features of pedestrians whose visibility falls below the threshold form the occluded-pedestrian feature set O, and the remaining features form the non-occluded-pedestrian feature set U.
3. The method according to claim 1, wherein the process of training the occlusion-to-non-occlusion feature simulation module and the whole-body-to-visible feature simulation module in step 103 comprises:
dividing the target pedestrian features in each batch into imitated features and features that need to mimic them; the pedestrian features include: the whole-body pedestrian features, the visible-part pedestrian features, the occluded pedestrian features and the non-occluded pedestrian features;
extracting each pedestrian feature to a fixed size with RoI-Align, calculating the mean of the imitated features on each channel, and using this feature mean as the object of imitation.
4. The method of claim 1, wherein the fusion strategy of the center-point response heatmaps in step 3 and step 104 is:
M_center = α·M_occ-unocc + (1 - α)·M_full-vis
wherein M_occ-unocc denotes the first center-point response heatmap, M_full-vis denotes the second center-point response heatmap, M_center denotes the third center-point response heatmap, and α = 0.5.
5. The detection method according to claim 1, wherein the loss function L_Dual of the feature simulation learning network is:
L_Dual = λ_c·(L_center1 + L_center2) + L_occ-unocc + L_full-vis
wherein L_center1 and L_center2 are the loss functions of the first and second center-point response heatmaps respectively, L_occ-unocc is the feature simulation learning constraint loss of the occlusion-to-non-occlusion feature simulation module, L_full-vis is the feature simulation learning constraint loss of the whole-body-to-visible feature simulation module, and λ_c is a balance coefficient;
L_occ-unocc = L_m(O, U), L_full-vis = L_m(F, V)
wherein L_m is the loss calculation function, F denotes the set of whole-body pedestrian features, and V denotes the set of visible-part pedestrian features.
6. The detection method according to claim 1, wherein the occlusion-aware non-maximum suppression in step 4 examines the detection boxes in descending order of detection confidence score, and comprises:
step 401, for any detection box, judging whether its intersection-over-union with an intersecting detection box is larger than a set threshold, and if so, executing step 402;
step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, retaining both intersecting detection boxes; when it does not exceed the threshold, deleting one of the two intersecting detection boxes;
wherein the occlusion attribute is the ratio of the visible length of each border of a detection box to the length of that border.
7. The detection method according to claim 6, wherein the occlusion attribute of a detection box is:
O = {o_i | i = 1, 2, 3, 4}
wherein o_1, o_2, o_3, o_4 denote the visible length ratios of the top, right, bottom and left borders respectively, and O denotes the occlusion attribute vector of one detection box.
8. The detection method according to claim 1, wherein the step 4 comprises:
step 401', initializing the detection box sequence B = {b_1, ..., b_N} and the corresponding confidence score sequence S = {s_1, ..., s_N}, wherein b_i, i = 1, ..., N denotes the i-th detection box and s_i, i = 1, ..., N is the confidence score of b_i;
step 402', when the m-th value in the sequence S is the maximum, letting M be the detection box b_m with the highest current confidence score, and taking M out of the detection box sequence B and putting it into a set F;
step 403', when IoU(M, b_i) ≥ N_t, letting s_i = s_i·f(M, b_i),
wherein IoU is the intersection-over-union calculation function, N_t is a set IoU threshold, N_o is the occlusion-attribute difference threshold, j = 1, 2, 3 or 4 indexes the top, right, bottom and left borders, and o_j^M and o_j^{b_i} are the occlusion attributes of the j-th border of detection box M and detection box b_i respectively;
step 404', executing steps 402'-403' in a loop until the sequence B is empty, and returning the final sets F and S as the final detection box sequence and the corresponding confidence score sequence respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211510002.6A CN115565207B (en) | 2022-11-29 | 2022-11-29 | Pedestrian detection method for occlusion scenes with fused feature simulation
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211510002.6A CN115565207B (en) | 2022-11-29 | 2022-11-29 | Pedestrian detection method for occlusion scenes with fused feature simulation
Publications (2)
Publication Number | Publication Date |
---|---|
CN115565207A CN115565207A (en) | 2023-01-03 |
CN115565207B true CN115565207B (en) | 2023-04-07 |
Family
ID=84769737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211510002.6A Active CN115565207B (en) | 2022-11-29 | 2022-11-29 | Pedestrian detection method for occlusion scenes with fused feature simulation
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115565207B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115713731B (en) * | 2023-01-10 | 2023-04-07 | 武汉图科智能科技有限公司 | Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method |
CN115937906B (en) * | 2023-02-16 | 2023-06-06 | 武汉图科智能科技有限公司 | Occlusion scene pedestrian re-identification method based on occlusion suppression and feature reconstruction |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113191204A (en) * | 2021-04-07 | 2021-07-30 | 华中科技大学 | Multi-scale blocking pedestrian detection method and system |
CN114419671A (en) * | 2022-01-18 | 2022-04-29 | 北京工业大学 | An occluded pedestrian re-identification method based on hypergraph neural network |
CN114419568A (en) * | 2022-01-18 | 2022-04-29 | 东北大学 | A multi-view pedestrian detection method based on feature fusion |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110598601A (en) * | 2019-08-30 | 2019-12-20 | 电子科技大学 | Face 3D key point detection method and system based on distributed thermodynamic diagram |
CN111126272B (en) * | 2019-12-24 | 2020-11-10 | 腾讯科技(深圳)有限公司 | Posture acquisition method, and training method and device of key point coordinate positioning model |
CN111738091A (en) * | 2020-05-27 | 2020-10-02 | 复旦大学 | A pose estimation and human body parsing system based on multi-task deep learning |
CN112836676B (en) * | 2021-03-01 | 2022-11-01 | 创新奇智(北京)科技有限公司 | Abnormal behavior detection method and device, electronic equipment and storage medium |
CN113239885A (en) * | 2021-06-04 | 2021-08-10 | 新大陆数字技术股份有限公司 | Face detection and recognition method and system |
CN114639042B (en) * | 2022-03-17 | 2025-04-25 | 哈尔滨理工大学 | Video object detection algorithm based on improved CenterNet backbone network |
-
2022
- 2022-11-29 CN CN202211510002.6A patent/CN115565207B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113191204A (en) * | 2021-04-07 | 2021-07-30 | 华中科技大学 | Multi-scale blocking pedestrian detection method and system |
CN114419671A (en) * | 2022-01-18 | 2022-04-29 | 北京工业大学 | An occluded pedestrian re-identification method based on hypergraph neural network |
CN114419568A (en) * | 2022-01-18 | 2022-04-29 | 东北大学 | A multi-view pedestrian detection method based on feature fusion |
Also Published As
Publication number | Publication date |
---|---|
CN115565207A (en) | 2023-01-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112926410B (en) | Target tracking method, device, storage medium and intelligent video system | |
CN111489403B (en) | Method and device for generating virtual feature map by using GAN | |
CN111898406B (en) | Face detection method based on focus loss and multitask cascade | |
CN115565207B (en) | Occlusion scene downlink person detection method with feature simulation fused | |
CN106778757B (en) | Scene text detection method based on text saliency | |
CN109815997A (en) | The method and relevant apparatus of identification vehicle damage based on deep learning | |
CN110796018A (en) | A Hand Motion Recognition Method Based on Depth Image and Color Image | |
CN105303163B (en) | A kind of method and detection device of target detection | |
CN113033523B (en) | Method and system for constructing falling judgment model and falling judgment method and system | |
CN111274994A (en) | Cartoon face detection method and device, electronic equipment and computer readable medium | |
CN109871792B (en) | Pedestrian detection method and device | |
CN107633242A (en) | Network model training method, device, equipment and storage medium | |
CN114359892B (en) | Three-dimensional target detection method, three-dimensional target detection device and computer-readable storage medium | |
CN112861678A (en) | Image identification method and device | |
CN113570615A (en) | An image processing method, electronic device and storage medium based on deep learning | |
CN118229965B (en) | Small target detection method in UAV aerial photography based on background noise reduction | |
CN119478351A (en) | A method for target detection based on improved YOLOv8-DES model | |
Xie et al. | Dynamic Dual-Peak Network: A real-time human detection network in crowded scenes | |
CN116469014B (en) | Small sample satellite radar image sailboard identification and segmentation method based on optimized Mask R-CNN | |
CN112733671A (en) | Pedestrian detection method, device and readable storage medium | |
CN110956097A (en) | Method and module for extracting occluded human body and method and device for scene conversion | |
CN111160219B (en) | Object integrity evaluation method and device, electronic equipment and storage medium | |
CN115713731A (en) | Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method | |
CN111783791B (en) | Image classification method, apparatus and computer readable storage medium | |
KR101972095B1 (en) | Method and Apparatus of adding artificial object for improving performance in detecting object |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address |
Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province Patentee after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd. Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone) Patentee before: Wuhan Tuke Intelligent Technology Co.,Ltd. |
|
CP03 | Change of name, title or address |