
CN115565207B - Pedestrian detection method in occlusion scenes with fused feature simulation - Google Patents

Pedestrian detection method in occlusion scenes with fused feature simulation

Info

Publication number
CN115565207B
CN115565207B CN202211510002.6A
Authority
CN
China
Prior art keywords
pedestrian
feature
heatmap
detection
frame
Prior art date
Legal status
Active
Application number
CN202211510002.6A
Other languages
Chinese (zh)
Other versions
CN115565207A (en)
Inventor
韩守东
潘孝枫
丁绘霖
刘东海生
Current Assignee
Hangzhou Tuke Intelligent Information Technology Co ltd
Original Assignee
Wuhan Tuke Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Tuke Intelligent Technology Co ltd filed Critical Wuhan Tuke Intelligent Technology Co ltd
Priority to CN202211510002.6A
Publication of CN115565207A
Application granted
Publication of CN115565207B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a pedestrian detection method for occlusion scenes that fuses feature simulation. In the training stage, pedestrian features are extracted with a feature extraction network and classified according to the annotation information; pedestrian features of different classes learn the feature imitation strategy through separate branches. In the inference stage, the features extracted by the backbone network pass through two parallel feature imitation branches to obtain center-point maps with different responses, which an effective fusion strategy combines into a more representative center-point response map. An occlusion attribute of the detection box is designed to address missed detections of pedestrians in dense areas, and an occlusion-aware non-maximum suppression method is designed that deletes redundant pedestrian detection boxes in the post-processing stage while retaining the boxes of occluded pedestrians. Pedestrian detection performance in occlusion scenes is effectively improved.

Description

Pedestrian detection method in occlusion scenes with fused feature simulation
Technical Field
The invention relates to the field of pedestrian detection in image processing and machine vision, and in particular to a pedestrian detection method for occlusion scenes that fuses feature simulation.
Background
Pedestrian detection in occlusion scenes is an important research topic in applied computer vision. As an upstream task, it provides crucial cues for downstream tasks such as pedestrian tracking, pedestrian re-identification, and autonomous driving. A pedestrian detection algorithm that works in various complex scenes is therefore of great significance for improving the performance of these downstream tasks.
Existing pedestrian detection methods include traditional machine vision methods based on texture features and feature extraction methods based on deep learning. Constrained by the reliance of these methods on appearance features, existing pedestrian detection algorithms perform poorly in complex occlusion scenes.
In complex scenes, pedestrian occlusion includes intra-class occlusion among pedestrians and inter-class occlusion between pedestrians and other surrounding objects. Occlusion reduces the visible appearance features of pedestrians, so the detector cannot distinguish occluded pedestrians from the background well, which leads to more missed detections.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a pedestrian detection method for occlusion scenes that fuses feature simulation. Through feature imitation learning, it reduces the intra-class feature differences among pedestrians and enlarges the difference between pedestrian and background features, thereby improving the detection rate of occluded pedestrians. An occlusion attribute is also designed as extra semantic information, and an occlusion-aware non-maximum suppression algorithm is designed that considers both the predicted attributes and the occlusion attribute of a pedestrian detection box, effectively retaining detection boxes whose confidence scores are lowered by occlusion while suppressing redundant ones.
According to a first aspect of the invention, a pedestrian detection method for occlusion scenes with fused feature simulation is provided, which includes: step 1, training a feature simulation learning network, whose input is the high-level features of an image extracted by a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap comes from occlusion/non-occlusion feature imitation learning and the second center-point response heatmap from whole-body/visible-part feature imitation learning;
step 2, extracting the high-level features of the image to be detected through the backbone network and inputting them into the feature simulation learning network to obtain the first and second center-point response heatmaps;
step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature simulation learning network and activating with a sigmoid to obtain the third center-point response heatmap;
and step 4, applying occlusion-aware non-maximum suppression to the third center-point response heatmap, considering both the occlusion attribute and the classification confidence of each detection box, to obtain the detection result of the image to be detected.
On the basis of the technical scheme, the invention can be improved as follows.
Optionally, the training process of the feature simulation learning network includes:
step 101, acquiring the high-level features of a training image and the visible-part and whole-body detection boxes of target pedestrians;
step 102, according to the annotation information of the visible part and the whole body of each pedestrian, extracting whole-body pedestrian features and visible-part pedestrian features from the high-level features with RoI-Align; calculating the visibility as the ratio of the areas of the visible-part and whole-body detection boxes, and classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility;
step 103, inputting the occluded and non-occluded pedestrian features into the occlusion/non-occlusion feature simulation module, where the occluded pedestrian features learn to imitate the feature representation of the non-occluded pedestrian features, yielding the first center-point response heatmap; inputting the whole-body and visible-part pedestrian features into the whole-body/visible-part feature simulation module, where the whole-body pedestrian features learn to imitate the feature representation of the visible-part pedestrian features, yielding the second center-point response heatmap;
and step 104, fusing the first and second center-point response heatmaps by weighted fusion and activating with a sigmoid to obtain the third center-point response heatmap.
Optionally, classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features in step 102 includes:

calculating the pedestrian visibility $Vis_r$:

$$Vis_r = \frac{S_{vis}}{S_{full}}$$

where $S_{vis}$ is the area of the pedestrian's visible-part box and $S_{full}$ is the area of the whole-body box;

classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility:

$$f_i \in \begin{cases} O, & Vis_r < \tau \\ U, & Vis_r \ge \tau \end{cases}$$

where $\tau$ is a set visibility threshold, $f_i^O$ denotes the $i$-th occluded pedestrian feature, $O$ denotes the set of $N_O$ occluded pedestrian features, $f_i^U$ denotes the $i$-th non-occluded pedestrian feature, and $U$ denotes the set of $N_U$ non-occluded pedestrian features.
Optionally, the process of training the occlusion/non-occlusion feature simulation module and the whole-body/visible-part feature simulation module in step 103 includes:

dividing the target-pedestrian features in each batch into imitated features and features that need to imitate them; the pedestrian features include whole-body pedestrian features, visible-part pedestrian features, occluded pedestrian features, and non-occluded pedestrian features;

extracting each pedestrian feature to a fixed size with RoI-Align, computing the per-channel mean of the imitated features, and using this feature mean as the imitation target;

applying the imitation constraint to each feature that needs to imitate through the Smooth-L1 function $L_{SL1}$:

$$L_{mimic} = \frac{1}{M}\sum_{i=1}^{M} L_{SL1}\!\left(f_i,\; \bar{f}\right), \qquad \bar{f} = \frac{1}{N}\sum_{j=1}^{N} \hat{f}_j$$

where $\hat{f}_j$ denotes the $j$-th imitated feature, $\bar{f}$ is the mean of the $N$ imitated features, $f_i$ denotes the $i$-th feature that needs to imitate, and $M$ is the number of such features.
Optionally, the fusion strategy for the center-point response heatmaps in step 3 and step 104 is:

$$M_{center} = \alpha M_{occ\text{-}unocc} + (1-\alpha) M_{full\text{-}vis}$$

where $M_{occ\text{-}unocc}$ denotes the first center-point response heatmap, $M_{full\text{-}vis}$ denotes the second center-point response heatmap, $M_{center}$ denotes the third center-point response heatmap, and $\alpha = 0.5$.
Optionally, the loss function $L_{Dual}$ of the feature simulation learning network is:

$$L_{Dual} = \lambda_c (L_{center1} + L_{center2}) + L_{occ\text{-}unocc} + L_{full\text{-}vis}$$

where $L_{center1}$ and $L_{center2}$ are the loss functions of the first and second center-point response heatmaps respectively, $L_{occ\text{-}unocc}$ is the feature-imitation constraint loss of the occlusion/non-occlusion feature simulation module, $L_{full\text{-}vis}$ is the feature-imitation constraint loss of the whole-body/visible-part feature simulation module, and $\lambda_c$ is a balance coefficient;

$$L_{occ\text{-}unocc} = L_m(O, U), \qquad L_{full\text{-}vis} = L_m(F, V)$$

where $L_m$ is the imitation loss calculation function, $F$ denotes the set of whole-body pedestrian features, and $V$ denotes the set of visible-part pedestrian features.
Optionally, the occlusion-aware non-maximum suppression in step 4 examines the detection boxes in descending order of confidence score, and includes:
step 401, for any detection box, judging whether the IoU between it and an intersecting detection box exceeds a set threshold, and if so, executing step 402;
step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, retaining both boxes; when it does not, deleting one of the two;
where the occlusion attribute is, for each edge of a detection box, the ratio of the visible length of that edge to its full length.
Optionally, the occlusion attribute of a detection box is:

$$O = \{o_i \mid i = 1, 2, 3, 4\}$$

where $o_1, o_2, o_3, o_4$ denote the visible-length ratios of the top, right, bottom, and left edges respectively, and $O$ denotes the occlusion-attribute vector of one detection box.
Optionally, the step 4 includes:

step 401', initializing the detection-box sequence $B = \{b_1, \ldots, b_N\}$ and the corresponding confidence-score sequence $S = \{s_1, \ldots, s_N\}$, where $b_i$ denotes the $i$-th detection box and $s_i$ is the confidence score of $b_i$;

step 402', when the $m$-th value in the sequence $S$ is the current maximum, letting $M$ be the detection box $b_m$ with the highest current confidence score, taking $M$ out of the sequence $B$, and putting it into the set $F$;

step 403', for every $b_i$ with $IoU(M, b_i) \ge N_t$, letting $s_i = s_i \cdot f(M, b_i)$, where $f(M, b_i)$ is 1 when the edge-wise occlusion-attribute difference $\lvert o_j^M - o_j^{b_i} \rvert$ exceeds the threshold $N_o$ and 0 otherwise; $IoU$ is the intersection-over-union function, $N_t$ is the set IoU threshold, $N_o$ is the occlusion-attribute difference threshold, $j = 1, 2, 3, 4$ indexes the top, right, bottom, and left edges, and $o_j^M$ and $o_j^{b_i}$ are the occlusion attributes of the $j$-th edge of $M$ and of $b_i$ respectively;

and step 404', executing steps 402' through 403' in a loop until the sequence $B$ is empty, and returning the final sets $F$ and $S$ as the final detection-box sequence and the corresponding confidence-score sequence.
The invention provides a pedestrian detection method for occlusion scenes with fused feature simulation. First, feature imitation is proposed to shrink intra-class feature differences and enlarge the inter-class feature difference between the pedestrian and background classes. Second, a fused feature imitation learning strategy is proposed so that the two branches complement each other, improving the detection rate in occlusion scenes. Third, an occlusion attribute is constructed and occlusion-aware non-maximum suppression is proposed, effectively retaining detection boxes that would otherwise be suppressed because of occlusion. Combining these innovations yields a pedestrian detection method that improves detection performance in occlusion scenes.
Drawings
Fig. 1 is a structural diagram of the pedestrian detection method for occlusion scenes with fused feature simulation according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of feature imitation learning provided by an embodiment of the present invention;
Fig. 3 is a pseudocode listing of the occlusion-aware non-maximum suppression algorithm according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a structural diagram of the pedestrian detection method for occlusion scenes with fused feature simulation according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step 1, training a feature simulation learning network, whose input is the high-level features of an image extracted by a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap comes from occlusion/non-occlusion feature imitation learning and the second from whole-body/visible-part feature imitation learning.
Step 2, extracting the high-level features of the image to be detected through the backbone network and inputting them into the feature simulation learning network to obtain the first and second center-point response heatmaps.
Step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature simulation learning network and activating with a sigmoid to obtain the third center-point response heatmap for subsequent post-processing.
Step 4, applying occlusion-aware non-maximum suppression to the third center-point response heatmap, considering both the occlusion attribute and the classification confidence of each detection box, to post-process the prediction results and obtain the detection result of the image to be detected.
The pedestrian detection method for occlusion scenes with fused feature simulation provided by the embodiment of the invention reduces intra-class feature differences among pedestrians through feature imitation learning while increasing the difference between pedestrian and background features, thereby improving the detection rate of occluded pedestrians. An occlusion attribute is designed as additional semantic information, and an occlusion-aware non-maximum suppression algorithm is designed that considers both the predicted attributes and the occlusion attribute of each pedestrian detection box, effectively retaining detection boxes whose confidence scores are lowered by occlusion while suppressing redundant ones.
Embodiment 1
Embodiment 1 of the present invention is an embodiment of the pedestrian detection method for occlusion scenes with fused feature simulation. As shown in Fig. 1, the embodiment includes:
Step 1, training a feature simulation learning network, whose input is the high-level features of an image extracted by a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap comes from occlusion/non-occlusion feature imitation learning and the second from whole-body/visible-part feature imitation learning.
In one possible embodiment, the training process of the feature simulation learning network includes:
Step 101, acquiring the high-level features of a training image and the visible-part and whole-body detection boxes of target pedestrians.
Step 102, according to the annotation information of the visible part and the whole body of each pedestrian, extracting whole-body pedestrian features and visible-part pedestrian features from the high-level features with RoI-Align; calculating the visibility as the ratio of the areas of the visible-part and whole-body detection boxes, and classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility.
In one possible embodiment, classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features in step 102 includes:

calculating the pedestrian visibility $Vis_r$:

$$Vis_r = \frac{S_{vis}}{S_{full}}$$

where $S_{vis}$ is the area of the pedestrian's visible-part box and $S_{full}$ is the area of the whole-body box;

classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility:

$$f_i \in \begin{cases} O, & Vis_r < \tau \\ U, & Vis_r \ge \tau \end{cases}$$

where $\tau$ is a set visibility threshold, $f_i^O$ denotes the $i$-th occluded pedestrian feature, $O$ denotes the set of $N_O$ occluded pedestrian features, $f_i^U$ denotes the $i$-th non-occluded pedestrian feature, and $U$ denotes the set of $N_U$ non-occluded pedestrian features.
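To make this classification step concrete, the following is a minimal sketch of the visibility computation and the occluded/non-occluded split. The box format (x1, y1, x2, y2), the function names, and the threshold value 0.65 are illustrative assumptions, not values given in the patent text.

```python
def box_area(box):
    """Area of a box given as (x1, y1, x2, y2), an assumed format."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def split_by_visibility(full_boxes, visible_boxes, features, tau=0.65):
    """Split whole-body pedestrian features into occluded / non-occluded sets.

    Visibility is the area ratio S_vis / S_full from the text; the
    threshold tau = 0.65 is a placeholder, since the patent only says
    the split uses a set threshold.
    """
    occluded, non_occluded = [], []
    for fb, vb, feat in zip(full_boxes, visible_boxes, features):
        vis_r = box_area(vb) / (box_area(fb) + 1e-6)
        (occluded if vis_r < tau else non_occluded).append(feat)
    return occluded, non_occluded
```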
Step 103, inputting the occluded and non-occluded pedestrian features into the occlusion/non-occlusion feature simulation module, where the occluded pedestrian features learn to imitate the feature representation of the non-occluded pedestrian features, yielding the first center-point response heatmap; inputting the whole-body and visible-part pedestrian features into the whole-body/visible-part feature simulation module, where the whole-body pedestrian features learn to imitate the feature representation of the visible-part pedestrian features, yielding the second center-point response heatmap.
Fig. 2 is a schematic diagram of the feature imitation learning provided by the embodiment of the present invention. With reference to Figs. 1 and 2, in a possible embodiment, the process of training the occlusion/non-occlusion feature simulation module and the whole-body/visible-part feature simulation module in step 103 includes:

dividing the target-pedestrian features in each batch into imitated features and features that need to imitate them; the pedestrian features include whole-body pedestrian features, visible-part pedestrian features, occluded pedestrian features, and non-occluded pedestrian features.

The features of each pedestrian are first extracted to a fixed size (7 × 256) using RoI-Align; the per-channel mean of the imitated features is then computed and used as the imitation target.

The imitation constraint is applied to each feature that needs to imitate through the Smooth-L1 function $L_{SL1}$:

$$L_{mimic} = \frac{1}{M}\sum_{i=1}^{M} L_{SL1}\!\left(f_i,\; \bar{f}\right), \qquad \bar{f} = \frac{1}{N}\sum_{j=1}^{N} \hat{f}_j$$

where $\hat{f}_j$ denotes the $j$-th imitated feature, $\bar{f}$ is the mean of the $N$ imitated features, $f_i$ denotes the $i$-th feature that needs to imitate, and $M$ is the number of such features.
Two different occlusion imitation strategies are thus obtained: the occlusion/non-occlusion feature imitation learning module and the whole-body/visible-part feature imitation learning module.
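The following PyTorch-style sketch shows one way to realize the imitation constraint above. The target being the mean of the imitated features follows the text, but the function name, the tensor layout (N, C, H, W), and averaging the Smooth-L1 terms over the M imitating features are assumptions.

```python
import torch
import torch.nn.functional as F

def mimic_loss(imitated_feats, mimicking_feats):
    """Smooth-L1 imitation constraint L_mimic.

    imitated_feats:  (N, C, H, W) RoI-Aligned features used as targets
    mimicking_feats: (M, C, H, W) features that learn to imitate them

    The target is the mean of the N imitated features on each channel,
    as described in the patent; taking that mean over the sample axis
    and reducing with "mean" are assumptions about the exact reduction.
    """
    if len(imitated_feats) == 0 or len(mimicking_feats) == 0:
        return mimicking_feats.sum() * 0.0  # no feature pairs in this batch
    target = imitated_feats.mean(dim=0, keepdim=True)   # (1, C, H, W)
    target = target.expand_as(mimicking_feats)          # broadcast to M copies
    return F.smooth_l1_loss(mimicking_feats, target, reduction="mean")
```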
Step 104, fusing the first and second center-point response heatmaps by weighted fusion and activating with a sigmoid to obtain the third center-point response heatmap.
In one possible embodiment, the fusion strategy for the center-point response heatmaps in step 3 and step 104 is:

$$M_{center} = \alpha M_{occ\text{-}unocc} + (1-\alpha) M_{full\text{-}vis}$$

where $M_{occ\text{-}unocc}$ denotes the first center-point response heatmap, $M_{full\text{-}vis}$ denotes the second center-point response heatmap, $M_{center}$ denotes the third center-point response heatmap, and $\alpha = 0.5$ is obtained by experiment.
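A minimal sketch of the weighted fusion followed by the sigmoid activation; applying the sigmoid to the fused raw heatmap logits is how the text reads, though the exact placement of the activation relative to the branches is an assumption.

```python
import torch

def fuse_center_heatmaps(m_occ_unocc, m_full_vis, alpha=0.5):
    """Fuse the two branch heatmaps into the third center-point heatmap.

    m_occ_unocc, m_full_vis: (B, 1, H, W) raw (pre-activation) heatmaps.
    alpha: balance weight, 0.5 per the patent's experiments.
    """
    m_center = alpha * m_occ_unocc + (1.0 - alpha) * m_full_vis
    return torch.sigmoid(m_center)
```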
In one possible embodiment, the loss function $L_{Dual}$ of the feature simulation learning network is:

$$L_{Dual} = \lambda_c (L_{center1} + L_{center2}) + L_{occ\text{-}unocc} + L_{full\text{-}vis}$$

where $L_{center1}$ and $L_{center2}$ are the loss functions of the first and second center-point response heatmaps respectively, $L_{occ\text{-}unocc}$ is the feature-imitation constraint loss of the occlusion/non-occlusion feature simulation module, $L_{full\text{-}vis}$ is the feature-imitation constraint loss of the whole-body/visible-part feature simulation module, and $\lambda_c$ is a balance coefficient set by experiment;

$$L_{occ\text{-}unocc} = L_m(O, U), \qquad L_{full\text{-}vis} = L_m(F, V)$$

where $L_m$ is the imitation loss calculation function, $F$ denotes the set of $N$ whole-body pedestrian features, and $V$ denotes the set of $N$ visible-part pedestrian features.
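Putting the pieces together, a sketch of the total training loss, reusing the `mimic_loss` sketch above. The two heatmap losses are assumed to be precomputed by the detection head, and `lambda_c = 0.1` is a placeholder, since the patent only says the coefficient is set by experiment.

```python
def dual_loss(l_center1, l_center2, occ, unocc, full, vis, lambda_c=0.1):
    """L_Dual = lambda_c * (L_center1 + L_center2) + L_m(O, U) + L_m(F, V).

    l_center1, l_center2: heatmap losses of the two branches (precomputed)
    occ/unocc: occluded features imitate non-occluded ones (first module)
    full/vis:  whole-body features imitate visible-part ones (second module)
    lambda_c:  balance coefficient; 0.1 is a placeholder value.
    """
    l_occ_unocc = mimic_loss(imitated_feats=unocc, mimicking_feats=occ)
    l_full_vis = mimic_loss(imitated_feats=vis, mimicking_feats=full)
    return lambda_c * (l_center1 + l_center2) + l_occ_unocc + l_full_vis
```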
Step 2, extracting the high-level features of the image to be detected through the backbone network and inputting them into the feature simulation learning network to obtain the first and second center-point response heatmaps.
Step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature simulation learning network and activating with a sigmoid to obtain the third center-point response heatmap for subsequent post-processing.
Step 4, applying occlusion-aware non-maximum suppression to the third center-point response heatmap, considering both the occlusion attribute and the classification confidence of each detection box, to post-process the prediction results and obtain the detection result of the image to be detected.
In a possible embodiment, in the post-processing stage, the occlusion-aware non-maximum suppression in step 4 examines the detection boxes in descending order of confidence score, and includes:
Step 401, for any detection box, judging whether the IoU between it and an intersecting detection box exceeds a set threshold, and if so, executing step 402.
Step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, the boxes belong to different pedestrians and both are retained; when it does not, one of them is a redundant detection box and is suppressed.
The occlusion attribute is, for each edge of a detection box, the ratio of the visible length of that edge to its full length.
In an on-board camera scene, as a target recedes from the camera toward infinity it shrinks toward the middle of the image, and the ordinate of the lower boundary of its detection box decreases with the target's depth in the image. Based on this phenomenon, for detection boxes that intersect each other, the occlusion relationship between pedestrians is determined from the ordinate of the lower boundary of each box, and the occlusion attribute of a detection box is defined from this occlusion relationship.
It can be understood that the occlusion attribute of a detection box is:

$$O = \{o_i \mid i = 1, 2, 3, 4\}$$

where $o_1, o_2, o_3, o_4$ denote the visible-length ratios of the top, right, bottom, and left edges respectively, and $O$ denotes the occlusion-attribute vector of one detection box. The occlusion attributes of the four edges together form the occlusion attribute of the whole box.
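As an illustration, the sketch below derives the occlusion-attribute vector of a box from the boxes that overlap it, using the lower-boundary rule above to decide which box is in front. The geometric bookkeeping is an assumption: the patent defines the attribute but not its computation, and overlapping occluders may double-count here.

```python
def edge_occlusion_attributes(box, others):
    """Visible-length ratio of each edge (top, right, bottom, left).

    box, others: (x1, y1, x2, y2) with y growing downward; a box whose
    lower boundary y2 is larger is treated as nearer the camera and
    therefore as the occluder (the lower-boundary rule in the text).
    This is an illustrative sketch, not the patent's exact procedure.
    """
    x1, y1, x2, y2 = box
    w, h = max(x2 - x1, 1e-6), max(y2 - y1, 1e-6)
    occ = [0.0, 0.0, 0.0, 0.0]  # occluded length per edge: top, right, bottom, left
    for ox1, oy1, ox2, oy2 in others:
        if oy2 <= y2:            # other box is farther away: it cannot occlude
            continue
        ix1, iy1 = max(x1, ox1), max(y1, oy1)
        ix2, iy2 = min(x2, ox2), min(y2, oy2)
        if ix1 >= ix2 or iy1 >= iy2:
            continue             # no intersection
        if iy1 <= y1:            # intersection reaches the top edge
            occ[0] += ix2 - ix1
        if ix2 >= x2:            # reaches the right edge
            occ[1] += iy2 - iy1
        if iy2 >= y2:            # reaches the bottom edge
            occ[2] += ix2 - ix1
        if ix1 <= x1:            # reaches the left edge
            occ[3] += iy2 - iy1
    lengths = [w, h, w, h]
    return [max(0.0, 1.0 - o / l) for o, l in zip(occ, lengths)]
```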
Fig. 3 shows the pseudocode of the occlusion-aware non-maximum suppression algorithm according to an embodiment of the present invention. With reference to Figs. 1 and 3, in another possible embodiment, step 4 includes:

Step 401', initializing the detection-box sequence $B = \{b_1, \ldots, b_N\}$ and the corresponding confidence-score sequence $S = \{s_1, \ldots, s_N\}$, where $b_i$ denotes the $i$-th detection box and $s_i$ is the confidence score of $b_i$.

Step 402', when the $m$-th value in the sequence $S$ is the current maximum, letting $M$ be the detection box $b_m$ with the highest current confidence score, and taking $M$ out of the sequence $B$ and putting it into the set $F$.

Step 403', for every $b_i$ with $IoU(M, b_i) \ge N_t$, letting $s_i = s_i \cdot f(M, b_i)$, where $f(M, b_i)$ is 1 when the edge-wise occlusion-attribute difference $\lvert o_j^M - o_j^{b_i} \rvert$ exceeds the threshold $N_o$ and 0 otherwise; $IoU$ is the intersection-over-union function, $N_t$ is the set IoU threshold, $N_o$ is the occlusion-attribute difference threshold, $j = 1, 2, 3, 4$ indexes the top, right, bottom, and left edges, and $o_j^M$ and $o_j^{b_i}$ are the occlusion attributes of the $j$-th edge of $M$ and of $b_i$ respectively.

Step 404', executing steps 402' through 403' in a loop until the sequence $B$ is empty, and returning the final sets $F$ and $S$ as the final detection-box sequence and the corresponding confidence-score sequence. A runnable sketch follows.
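Since Fig. 3 is reproduced here only as an image, the following is a minimal re-implementation of the occlusion-aware NMS loop as described. The threshold values `n_t = 0.5` and `n_o = 0.3` are placeholders, and taking the maximum of the four per-edge differences (any edge differing by more than $N_o$ keeps the box) is an assumption about the formula the image contains.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-6)

def occlusion_aware_nms(boxes, scores, occ_attrs, n_t=0.5, n_o=0.3):
    """Occlusion-aware NMS (steps 401' through 404').

    boxes: list of (x1, y1, x2, y2); scores: confidence scores;
    occ_attrs: per-box 4-vector of edge visible-length ratios;
    n_t, n_o: IoU and occlusion-difference thresholds (placeholders,
    since the patent leaves both as "set thresholds").
    """
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep, kept_scores = [], []
    while remaining:
        m = remaining.pop(0)                  # current highest-scoring box M
        keep.append(boxes[m]); kept_scores.append(scores[m])
        survivors = []
        for i in remaining:
            if iou(boxes[m], boxes[i]) >= n_t:
                diff = max(abs(a - b)
                           for a, b in zip(occ_attrs[m], occ_attrs[i]))
                if diff <= n_o:               # same pedestrian: f(M, b_i) = 0
                    continue                  # score zeroed, box suppressed
            survivors.append(i)
        remaining = survivors                 # scores unchanged for survivors
    return keep, kept_scores
```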
To address the defects described in the background art, the embodiment of the invention provides a pedestrian detection method for occlusion scenes with fused feature simulation. 1. Feature imitation is innovatively used to reduce the differences among pedestrian features, and an effective heatmap fusion strategy is proposed in combination with the model, effectively improving the pedestrian detection rate in occlusion scenes. 2. A pedestrian occlusion attribute is constructed from existing annotation information and can serve as semantic information for other related vision tasks. 3. An occlusion-aware non-maximum suppression algorithm is designed that deletes redundant detection boxes while retaining the detection boxes of occluded pedestrians.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A pedestrian detection method for occlusion scenes with fused feature simulation, the method comprising:
step 1, training a feature simulation learning network, whose input is the high-level features of an image extracted by a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap comes from occlusion/non-occlusion feature imitation learning and the second center-point response heatmap from whole-body/visible-part feature imitation learning;
step 2, extracting the high-level features of the image to be detected through the backbone network and inputting them into the feature simulation learning network to obtain the first and second center-point response heatmaps;
step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature simulation learning network and activating with a sigmoid to obtain the third center-point response heatmap;
step 4, applying occlusion-aware non-maximum suppression to the third center-point response heatmap, considering both the occlusion attribute and the classification confidence of each detection box, to obtain the detection result of the image to be detected;
wherein the training process of the feature simulation learning network comprises:
step 101, acquiring the high-level features of a training image and the visible-part and whole-body detection boxes of target pedestrians;
step 102, according to the annotation information of the visible part and the whole body of each pedestrian, extracting whole-body pedestrian features and visible-part pedestrian features from the high-level features with RoI-Align; calculating the visibility as the ratio of the areas of the visible-part and whole-body detection boxes, and classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility;
step 103, inputting the occluded and non-occluded pedestrian features into the occlusion/non-occlusion feature simulation module, where the occluded pedestrian features learn to imitate the feature representation of the non-occluded pedestrian features, yielding the first center-point response heatmap; inputting the whole-body and visible-part pedestrian features into the whole-body/visible-part feature simulation module, where the whole-body pedestrian features learn to imitate the feature representation of the visible-part pedestrian features, yielding the second center-point response heatmap;
and step 104, fusing the first and second center-point response heatmaps by weighted fusion and activating with a sigmoid to obtain the third center-point response heatmap.
2. The detection method according to claim 1, wherein classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features in step 102 comprises:
calculating the pedestrian visibility $Vis_r$:

$$Vis_r = \frac{S_{vis}}{S_{full}}$$

where $S_{vis}$ is the area of the pedestrian's visible-part box and $S_{full}$ is the area of the whole-body box;
classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility:

$$f_i \in \begin{cases} O, & Vis_r < \tau \\ U, & Vis_r \ge \tau \end{cases}$$

where $\tau$ is a set visibility threshold, $f_i^O$ denotes the $i$-th occluded pedestrian feature, $O$ denotes the set of $N_O$ occluded pedestrian features, $f_i^U$ denotes the $i$-th non-occluded pedestrian feature, and $U$ denotes the set of $N_U$ non-occluded pedestrian features.
3. The method according to claim 1, wherein training the occlusion/non-occlusion feature simulation module and the whole-body/visible-part feature simulation module in step 103 comprises:
dividing the target-pedestrian features in each batch into imitated features and features that need to imitate them; the pedestrian features include whole-body pedestrian features, visible-part pedestrian features, occluded pedestrian features, and non-occluded pedestrian features;
extracting each pedestrian feature to a fixed size with RoI-Align, computing the per-channel mean of the imitated features, and using this feature mean as the imitation target;
applying the imitation constraint to each feature that needs to imitate through the Smooth-L1 function $L_{SL1}$:

$$L_{mimic} = \frac{1}{M}\sum_{i=1}^{M} L_{SL1}\!\left(f_i,\; \bar{f}\right), \qquad \bar{f} = \frac{1}{N}\sum_{j=1}^{N} \hat{f}_j$$

where $\hat{f}_j$ denotes the $j$-th imitated feature, $\bar{f}$ is the mean of the $N$ imitated features, $f_i$ denotes the $i$-th feature that needs to imitate, and $M$ is the number of such features.
4. The method of claim 1, wherein the fusion strategy for the center-point response heatmaps in step 3 and step 104 is:

$$M_{center} = \alpha M_{occ\text{-}unocc} + (1-\alpha) M_{full\text{-}vis}$$

where $M_{occ\text{-}unocc}$ denotes the first center-point response heatmap, $M_{full\text{-}vis}$ denotes the second center-point response heatmap, $M_{center}$ denotes the third center-point response heatmap, and $\alpha = 0.5$.
5. The detection method according to claim 1, wherein the loss function $L_{Dual}$ of the feature simulation learning network is:

$$L_{Dual} = \lambda_c (L_{center1} + L_{center2}) + L_{occ\text{-}unocc} + L_{full\text{-}vis}$$

where $L_{center1}$ and $L_{center2}$ are the loss functions of the first and second center-point response heatmaps respectively, $L_{occ\text{-}unocc}$ is the feature-imitation constraint loss of the occlusion/non-occlusion feature simulation module, $L_{full\text{-}vis}$ is the feature-imitation constraint loss of the whole-body/visible-part feature simulation module, and $\lambda_c$ is a balance coefficient;

$$L_{occ\text{-}unocc} = L_m(O, U), \qquad L_{full\text{-}vis} = L_m(F, V)$$

where $L_m$ is the imitation loss calculation function, $F$ denotes the set of whole-body pedestrian features, and $V$ denotes the set of visible-part pedestrian features.
6. The detection method according to claim 1, wherein the occlusion-aware non-maximum suppression in step 4 examines the detection boxes in descending order of confidence score, and comprises:
step 401, for any detection box, judging whether the IoU between it and an intersecting detection box exceeds a set threshold, and if so, executing step 402;
step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, retaining both boxes; when it does not, deleting one of the two;
wherein the occlusion attribute is, for each edge of a detection box, the ratio of the visible length of that edge to its full length.
7. The detection method according to claim 6, wherein the occlusion attribute of a detection box is:

$$O = \{o_i \mid i = 1, 2, 3, 4\}$$

where $o_1, o_2, o_3, o_4$ denote the visible-length ratios of the top, right, bottom, and left edges respectively, and $O$ denotes the occlusion-attribute vector of one detection box.
8. The detection method according to claim 1, wherein the step 4 comprises:
step 401', initializing the detection-box sequence $B = \{b_1, \ldots, b_N\}$ and the corresponding confidence-score sequence $S = \{s_1, \ldots, s_N\}$, where $b_i$ ($i = 1, \ldots, N$) denotes the $i$-th detection box and $s_i$ is the confidence score of $b_i$;
step 402', when the $m$-th value in the sequence $S$ is the current maximum, letting $M$ be the detection box $b_m$ with the highest current confidence score, taking $M$ out of the sequence $B$, and putting it into the set $F$;
step 403', for every $b_i$ with $IoU(M, b_i) \ge N_t$, letting $s_i = s_i \cdot f(M, b_i)$, where $f(M, b_i)$ is 1 when the edge-wise occlusion-attribute difference $\lvert o_j^M - o_j^{b_i} \rvert$ exceeds the threshold $N_o$ and 0 otherwise; $IoU$ is the intersection-over-union function, $N_t$ is the set IoU threshold, $N_o$ is the occlusion-attribute difference threshold, $j = 1, 2, 3, 4$ indexes the top, right, bottom, and left edges, and $o_j^M$ and $o_j^{b_i}$ are the occlusion attributes of the $j$-th edge of $M$ and of $b_i$ respectively;
and step 404', executing steps 402' through 403' in a loop until the sequence $B$ is empty, and returning the final sets $F$ and $S$ as the final detection-box sequence and the corresponding confidence-score sequence.
CN202211510002.6A 2022-11-29 2022-11-29 Pedestrian detection method in occlusion scenes with fused feature simulation Active CN115565207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211510002.6A CN115565207B (en) 2022-11-29 2022-11-29 Pedestrian detection method in occlusion scenes with fused feature simulation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211510002.6A CN115565207B (en) 2022-11-29 2022-11-29 Pedestrian detection method in occlusion scenes with fused feature simulation

Publications (2)

Publication Number Publication Date
CN115565207A CN115565207A (en) 2023-01-03
CN115565207B true CN115565207B (en) 2023-04-07

Family

ID=84769737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211510002.6A Active CN115565207B (en) Pedestrian detection method in occlusion scenes with fused feature simulation

Country Status (1)

Country Link
CN (1) CN115565207B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115713731B (en) * 2023-01-10 2023-04-07 武汉图科智能科技有限公司 Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method
CN115937906B (en) * 2023-02-16 2023-06-06 武汉图科智能科技有限公司 Occlusion scene pedestrian re-identification method based on occlusion suppression and feature reconstruction

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191204A (en) * 2021-04-07 2021-07-30 华中科技大学 Multi-scale blocking pedestrian detection method and system
CN114419671A (en) * 2022-01-18 2022-04-29 北京工业大学 An occluded pedestrian re-identification method based on hypergraph neural network
CN114419568A (en) * 2022-01-18 2022-04-29 东北大学 A multi-view pedestrian detection method based on feature fusion

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598601A (en) * 2019-08-30 2019-12-20 电子科技大学 Face 3D key point detection method and system based on distributed thermodynamic diagram
CN111126272B (en) * 2019-12-24 2020-11-10 腾讯科技(深圳)有限公司 Posture acquisition method, and training method and device of key point coordinate positioning model
CN111738091A (en) * 2020-05-27 2020-10-02 复旦大学 A pose estimation and human body parsing system based on multi-task deep learning
CN112836676B (en) * 2021-03-01 2022-11-01 创新奇智(北京)科技有限公司 Abnormal behavior detection method and device, electronic equipment and storage medium
CN113239885A (en) * 2021-06-04 2021-08-10 新大陆数字技术股份有限公司 Face detection and recognition method and system
CN114639042B (en) * 2022-03-17 2025-04-25 哈尔滨理工大学 Video object detection algorithm based on improved CenterNet backbone network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191204A (en) * 2021-04-07 2021-07-30 华中科技大学 Multi-scale blocking pedestrian detection method and system
CN114419671A (en) * 2022-01-18 2022-04-29 北京工业大学 An occluded pedestrian re-identification method based on hypergraph neural network
CN114419568A (en) * 2022-01-18 2022-04-29 东北大学 A multi-view pedestrian detection method based on feature fusion

Also Published As

Publication number Publication date
CN115565207A (en) 2023-01-03

Similar Documents

Publication Publication Date Title
CN112926410B (en) Target tracking method, device, storage medium and intelligent video system
CN111489403B (en) Method and device for generating virtual feature map by using GAN
CN111898406B (en) Face detection method based on focus loss and multitask cascade
CN115565207B (en) Pedestrian detection method in occlusion scenes with fused feature simulation
CN106778757B (en) Scene text detection method based on text saliency
CN109815997A (en) The method and relevant apparatus of identification vehicle damage based on deep learning
CN110796018A (en) A Hand Motion Recognition Method Based on Depth Image and Color Image
CN105303163B (en) A kind of method and detection device of target detection
CN113033523B (en) Method and system for constructing falling judgment model and falling judgment method and system
CN111274994A (en) Cartoon face detection method and device, electronic equipment and computer readable medium
CN109871792B (en) Pedestrian detection method and device
CN107633242A (en) Network model training method, device, equipment and storage medium
CN114359892B (en) Three-dimensional target detection method, three-dimensional target detection device and computer-readable storage medium
CN112861678A (en) Image identification method and device
CN113570615A (en) An image processing method, electronic device and storage medium based on deep learning
CN118229965B (en) Small target detection method in UAV aerial photography based on background noise reduction
CN119478351A (en) A method for target detection based on improved YOLOv8-DES model
Xie et al. Dynamic Dual-Peak Network: A real-time human detection network in crowded scenes
CN116469014B (en) Small sample satellite radar image sailboard identification and segmentation method based on optimized Mask R-CNN
CN112733671A (en) Pedestrian detection method, device and readable storage medium
CN110956097A (en) Method and module for extracting occluded human body and method and device for scene conversion
CN111160219B (en) Object integrity evaluation method and device, electronic equipment and storage medium
CN115713731A (en) Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method
CN111783791B (en) Image classification method, apparatus and computer readable storage medium
KR101972095B1 (en) Method and Apparatus of adding artificial object for improving performance in detecting object

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd.

Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone)

Patentee before: Wuhan Tuke Intelligent Technology Co.,Ltd.

CP03 Change of name, title or address