
CN115565207B - Pedestrian detection method in occlusion scenes with fused feature simulation - Google Patents

Pedestrian detection method in occlusion scenes with fused feature simulation

Info

Publication number
CN115565207B
CN115565207B CN202211510002.6A
Authority
CN
China
Prior art keywords
pedestrian
feature
heatmap
detection
frame
Prior art date
Legal status
Active
Application number
CN202211510002.6A
Other languages
Chinese (zh)
Other versions
CN115565207A (en)
Inventor
韩守东
潘孝枫
丁绘霖
刘东海生
Current Assignee
Hangzhou Tuke Intelligent Information Technology Co ltd
Original Assignee
Wuhan Tuke Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Tuke Intelligent Technology Co ltd filed Critical Wuhan Tuke Intelligent Technology Co ltd
Priority to CN202211510002.6A
Publication of CN115565207A
Application granted
Publication of CN115565207B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a pedestrian detection method for occlusion scenes that fuses feature simulation. In the training stage, pedestrian features are extracted with a feature extraction network and classified according to the annotation information; pedestrian features of different classes learn the feature imitation strategy through separate branches. In the inference stage, the features extracted by the backbone network pass through two parallel feature imitation branches to obtain center-point maps with different responses, which an effective fusion strategy combines into a more representative center-point response map. An occlusion attribute of the detection box is designed to address missed detections of pedestrians in dense areas, and an occlusion-aware non-maximum suppression method is designed that deletes redundant pedestrian detection boxes in the post-processing stage while retaining the boxes of occluded pedestrians. Pedestrian detection performance in occlusion scenes is effectively improved.

Description

Pedestrian detection method in occlusion scenes with fused feature simulation
Technical Field
The invention relates to the field of pedestrian detection in image processing and machine vision, and in particular to a pedestrian detection method for occlusion scenes that fuses feature simulation.
Background
Pedestrian detection in occlusion scenes is an important research topic in applied computer vision. As an upstream task, it provides crucial cues for downstream tasks such as pedestrian tracking, pedestrian re-identification, and autonomous driving. A pedestrian detection algorithm that works in various complex scenes is therefore of great significance for improving the performance of these downstream tasks.
Existing pedestrian detection methods include traditional machine vision methods based on texture features and feature extraction methods based on deep learning. Constrained by the reliance of these methods on appearance features, existing pedestrian detection algorithms perform poorly in complex occlusion scenes.
In complex scenes, pedestrian occlusion includes intra-class occlusion among pedestrians and inter-class occlusion between pedestrians and other surrounding objects. Occlusion reduces the visible appearance features of pedestrians, so the detector cannot distinguish occluded pedestrians from the background well, which leads to more missed detections.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a pedestrian detection method for occlusion scenes that fuses feature simulation. Through feature imitation learning, it reduces the intra-class feature differences among pedestrians and enlarges the difference between pedestrian and background features, thereby improving the detection rate of occluded pedestrians. An occlusion attribute is also designed as extra semantic information, and an occlusion-aware non-maximum suppression algorithm is designed that considers both the predicted attributes and the occlusion attribute of a pedestrian detection box, effectively retaining detection boxes whose confidence scores are lowered by occlusion while suppressing redundant ones.
According to a first aspect of the invention, a pedestrian detection method for occlusion scenes with fused feature simulation is provided, which includes: step 1, training a feature simulation learning network, whose input is the high-level features of an image extracted by a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap comes from occlusion/non-occlusion feature imitation learning and the second center-point response heatmap from whole-body/visible-part feature imitation learning;
step 2, extracting the high-level features of the image to be detected through the backbone network and inputting them into the feature simulation learning network to obtain the first and second center-point response heatmaps;
step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature simulation learning network and activating with a sigmoid to obtain the third center-point response heatmap;
and step 4, applying occlusion-aware non-maximum suppression to the third center-point response heatmap, considering both the occlusion attribute and the classification confidence of each detection box, to obtain the detection result of the image to be detected.
On the basis of the technical scheme, the invention can be improved as follows.
Optionally, the training process of the feature simulation learning network includes:
step 101, acquiring the high-level features of a training image and the visible-part and whole-body detection boxes of target pedestrians;
step 102, according to the annotation information of the visible part and the whole body of each pedestrian, extracting whole-body pedestrian features and visible-part pedestrian features from the high-level features with RoI-Align; calculating the visibility as the ratio of the areas of the visible-part and whole-body detection boxes, and classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility;
step 103, inputting the occluded and non-occluded pedestrian features into the occlusion/non-occlusion feature simulation module, where the occluded pedestrian features learn to imitate the feature representation of the non-occluded pedestrian features, yielding the first center-point response heatmap; inputting the whole-body and visible-part pedestrian features into the whole-body/visible-part feature simulation module, where the whole-body pedestrian features learn to imitate the feature representation of the visible-part pedestrian features, yielding the second center-point response heatmap;
and step 104, fusing the first and second center-point response heatmaps by weighted fusion and activating with a sigmoid to obtain the third center-point response heatmap.
Optionally, classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features in step 102 includes:

calculating the pedestrian visibility $Vis_r$:

$$Vis_r = \frac{S_{vis}}{S_{full}}$$

where $S_{vis}$ is the area of the pedestrian's visible-part box and $S_{full}$ is the area of the whole-body box;

classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility:

$$f_i \in \begin{cases} O, & Vis_r < \tau \\ U, & Vis_r \ge \tau \end{cases}$$

where $\tau$ is a set visibility threshold, $f_i^O$ denotes the $i$-th occluded pedestrian feature, $O$ denotes the set of $N_O$ occluded pedestrian features, $f_i^U$ denotes the $i$-th non-occluded pedestrian feature, and $U$ denotes the set of $N_U$ non-occluded pedestrian features.
Optionally, the process of training the occlusion/non-occlusion feature simulation module and the whole-body/visible-part feature simulation module in step 103 includes:

dividing the target-pedestrian features in each batch into imitated features and features that need to imitate them; the pedestrian features include whole-body pedestrian features, visible-part pedestrian features, occluded pedestrian features, and non-occluded pedestrian features;

extracting each pedestrian feature to a fixed size with RoI-Align, computing the per-channel mean of the imitated features, and using this feature mean as the imitation target;

applying the imitation constraint to each feature that needs to imitate through the Smooth-L1 function $L_{SL1}$:

$$L_{mimic} = \frac{1}{M}\sum_{i=1}^{M} L_{SL1}\!\left(f_i,\; \bar{f}\right), \qquad \bar{f} = \frac{1}{N}\sum_{j=1}^{N} \hat{f}_j$$

where $\hat{f}_j$ denotes the $j$-th imitated feature, $\bar{f}$ is the mean of the $N$ imitated features, $f_i$ denotes the $i$-th feature that needs to imitate, and $M$ is the number of such features.
Optionally, the fusion strategy for the center-point response heatmaps in step 3 and step 104 is:

$$M_{center} = \alpha M_{occ\text{-}unocc} + (1-\alpha) M_{full\text{-}vis}$$

where $M_{occ\text{-}unocc}$ denotes the first center-point response heatmap, $M_{full\text{-}vis}$ denotes the second center-point response heatmap, $M_{center}$ denotes the third center-point response heatmap, and $\alpha = 0.5$.
Optionally, the loss function $L_{Dual}$ of the feature simulation learning network is:

$$L_{Dual} = \lambda_c (L_{center1} + L_{center2}) + L_{occ\text{-}unocc} + L_{full\text{-}vis}$$

where $L_{center1}$ and $L_{center2}$ are the loss functions of the first and second center-point response heatmaps respectively, $L_{occ\text{-}unocc}$ is the feature-imitation constraint loss of the occlusion/non-occlusion feature simulation module, $L_{full\text{-}vis}$ is the feature-imitation constraint loss of the whole-body/visible-part feature simulation module, and $\lambda_c$ is a balance coefficient;

$$L_{occ\text{-}unocc} = L_m(O, U), \qquad L_{full\text{-}vis} = L_m(F, V)$$

where $L_m$ is the imitation loss calculation function, $F$ denotes the set of whole-body pedestrian features, and $V$ denotes the set of visible-part pedestrian features.
Optionally, the occlusion-aware non-maximum suppression in step 4 examines the detection boxes in descending order of confidence score, and includes:
step 401, for any detection box, judging whether the IoU between it and an intersecting detection box exceeds a set threshold, and if so, executing step 402;
step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, retaining both boxes; when it does not, deleting one of the two;
where the occlusion attribute is, for each edge of a detection box, the ratio of the visible length of that edge to its full length.
Optionally, the occlusion attribute of a detection box is:

$$O = \{o_i \mid i = 1, 2, 3, 4\}$$

where $o_1, o_2, o_3, o_4$ denote the visible-length ratios of the top, right, bottom, and left edges respectively, and $O$ denotes the occlusion-attribute vector of one detection box.
Optionally, the step 4 includes:

step 401', initializing the detection-box sequence $B = \{b_1, \ldots, b_N\}$ and the corresponding confidence-score sequence $S = \{s_1, \ldots, s_N\}$, where $b_i$ denotes the $i$-th detection box and $s_i$ is the confidence score of $b_i$;

step 402', when the $m$-th value in the sequence $S$ is the current maximum, letting $M$ be the detection box $b_m$ with the highest current confidence score, taking $M$ out of the sequence $B$, and putting it into the set $F$;

step 403', for every $b_i$ with $IoU(M, b_i) \ge N_t$, letting $s_i = s_i \cdot f(M, b_i)$, where $f(M, b_i)$ is 1 when the edge-wise occlusion-attribute difference $\lvert o_j^M - o_j^{b_i} \rvert$ exceeds the threshold $N_o$ and 0 otherwise; $IoU$ is the intersection-over-union function, $N_t$ is the set IoU threshold, $N_o$ is the occlusion-attribute difference threshold, $j = 1, 2, 3, 4$ indexes the top, right, bottom, and left edges, and $o_j^M$ and $o_j^{b_i}$ are the occlusion attributes of the $j$-th edge of $M$ and of $b_i$ respectively;

and step 404', executing steps 402' through 403' in a loop until the sequence $B$ is empty, and returning the final sets $F$ and $S$ as the final detection-box sequence and the corresponding confidence-score sequence.
The invention provides a pedestrian detection method for occlusion scenes with fused feature simulation. First, feature imitation is proposed to shrink intra-class feature differences and enlarge the inter-class feature difference between the pedestrian and background classes. Second, a fused feature imitation learning strategy is proposed so that the two branches complement each other, improving the detection rate in occlusion scenes. Third, an occlusion attribute is constructed and occlusion-aware non-maximum suppression is proposed, effectively retaining detection boxes that would otherwise be suppressed because of occlusion. Combining these innovations yields a pedestrian detection method that improves detection performance in occlusion scenes.
Drawings
Fig. 1 is a structural diagram of the pedestrian detection method for occlusion scenes with fused feature simulation according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of feature imitation learning provided by an embodiment of the present invention;
Fig. 3 is a pseudocode listing of the occlusion-aware non-maximum suppression algorithm according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a structural diagram of the pedestrian detection method for occlusion scenes with fused feature simulation according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step 1, training a feature simulation learning network, whose input is the high-level features of an image extracted by a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap comes from occlusion/non-occlusion feature imitation learning and the second from whole-body/visible-part feature imitation learning.
Step 2, extracting the high-level features of the image to be detected through the backbone network and inputting them into the feature simulation learning network to obtain the first and second center-point response heatmaps.
Step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature simulation learning network and activating with a sigmoid to obtain the third center-point response heatmap for subsequent post-processing.
Step 4, applying occlusion-aware non-maximum suppression to the third center-point response heatmap, considering both the occlusion attribute and the classification confidence of each detection box, to post-process the prediction results and obtain the detection result of the image to be detected.
The pedestrian detection method for occlusion scenes with fused feature simulation provided by the embodiment of the invention reduces intra-class feature differences among pedestrians through feature imitation learning while increasing the difference between pedestrian and background features, thereby improving the detection rate of occluded pedestrians. An occlusion attribute is designed as additional semantic information, and an occlusion-aware non-maximum suppression algorithm is designed that considers both the predicted attributes and the occlusion attribute of each pedestrian detection box, effectively retaining detection boxes whose confidence scores are lowered by occlusion while suppressing redundant ones.
Embodiment 1
Embodiment 1 of the present invention is an embodiment of the pedestrian detection method for occlusion scenes with fused feature simulation. As shown in Fig. 1, the embodiment includes:
Step 1, training a feature simulation learning network, whose input is the high-level features of an image extracted by a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap comes from occlusion/non-occlusion feature imitation learning and the second from whole-body/visible-part feature imitation learning.
In one possible embodiment, the training process of the feature simulation learning network includes:
Step 101, acquiring the high-level features of a training image and the visible-part and whole-body detection boxes of target pedestrians.
Step 102, according to the annotation information of the visible part and the whole body of each pedestrian, extracting whole-body pedestrian features and visible-part pedestrian features from the high-level features with RoI-Align; calculating the visibility as the ratio of the areas of the visible-part and whole-body detection boxes, and classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility.
In one possible embodiment, classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features in step 102 includes:

calculating the pedestrian visibility $Vis_r$:

$$Vis_r = \frac{S_{vis}}{S_{full}}$$

where $S_{vis}$ is the area of the pedestrian's visible-part box and $S_{full}$ is the area of the whole-body box;

classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility:

$$f_i \in \begin{cases} O, & Vis_r < \tau \\ U, & Vis_r \ge \tau \end{cases}$$

where $\tau$ is a set visibility threshold, $f_i^O$ denotes the $i$-th occluded pedestrian feature, $O$ denotes the set of $N_O$ occluded pedestrian features, $f_i^U$ denotes the $i$-th non-occluded pedestrian feature, and $U$ denotes the set of $N_U$ non-occluded pedestrian features.
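To make this classification step concrete, the following is a minimal sketch of the visibility computation and the occluded/non-occluded split. The box format (x1, y1, x2, y2), the function names, and the threshold value 0.65 are illustrative assumptions, not values given in the patent text.

```python
def box_area(box):
    """Area of a box given as (x1, y1, x2, y2), an assumed format."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def split_by_visibility(full_boxes, visible_boxes, features, tau=0.65):
    """Split whole-body pedestrian features into occluded / non-occluded sets.

    Visibility is the area ratio S_vis / S_full from the text; the
    threshold tau = 0.65 is a placeholder, since the patent only says
    the split uses a set threshold.
    """
    occluded, non_occluded = [], []
    for fb, vb, feat in zip(full_boxes, visible_boxes, features):
        vis_r = box_area(vb) / (box_area(fb) + 1e-6)
        (occluded if vis_r < tau else non_occluded).append(feat)
    return occluded, non_occluded
```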
Step 103, inputting the occluded and non-occluded pedestrian features into the occlusion/non-occlusion feature simulation module, where the occluded pedestrian features learn to imitate the feature representation of the non-occluded pedestrian features, yielding the first center-point response heatmap; inputting the whole-body and visible-part pedestrian features into the whole-body/visible-part feature simulation module, where the whole-body pedestrian features learn to imitate the feature representation of the visible-part pedestrian features, yielding the second center-point response heatmap.
Fig. 2 is a schematic diagram of the feature imitation learning provided by the embodiment of the present invention. With reference to Figs. 1 and 2, in a possible embodiment, the process of training the occlusion/non-occlusion feature simulation module and the whole-body/visible-part feature simulation module in step 103 includes:

dividing the target-pedestrian features in each batch into imitated features and features that need to imitate them; the pedestrian features include whole-body pedestrian features, visible-part pedestrian features, occluded pedestrian features, and non-occluded pedestrian features.

The features of each pedestrian are first extracted to a fixed size (7 × 256) using RoI-Align; the per-channel mean of the imitated features is then computed and used as the imitation target.

The imitation constraint is applied to each feature that needs to imitate through the Smooth-L1 function $L_{SL1}$:

$$L_{mimic} = \frac{1}{M}\sum_{i=1}^{M} L_{SL1}\!\left(f_i,\; \bar{f}\right), \qquad \bar{f} = \frac{1}{N}\sum_{j=1}^{N} \hat{f}_j$$

where $\hat{f}_j$ denotes the $j$-th imitated feature, $\bar{f}$ is the mean of the $N$ imitated features, $f_i$ denotes the $i$-th feature that needs to imitate, and $M$ is the number of such features.
Two different occlusion imitation strategies are thus obtained: the occlusion/non-occlusion feature imitation learning module and the whole-body/visible-part feature imitation learning module.
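The following PyTorch-style sketch shows one way to realize the imitation constraint above. The target being the mean of the imitated features follows the text, but the function name, the tensor layout (N, C, H, W), and averaging the Smooth-L1 terms over the M imitating features are assumptions.

```python
import torch
import torch.nn.functional as F

def mimic_loss(imitated_feats, mimicking_feats):
    """Smooth-L1 imitation constraint L_mimic.

    imitated_feats:  (N, C, H, W) RoI-Aligned features used as targets
    mimicking_feats: (M, C, H, W) features that learn to imitate them

    The target is the mean of the N imitated features on each channel,
    as described in the patent; taking that mean over the sample axis
    and reducing with "mean" are assumptions about the exact reduction.
    """
    if len(imitated_feats) == 0 or len(mimicking_feats) == 0:
        return mimicking_feats.sum() * 0.0  # no feature pairs in this batch
    target = imitated_feats.mean(dim=0, keepdim=True)   # (1, C, H, W)
    target = target.expand_as(mimicking_feats)          # broadcast to M copies
    return F.smooth_l1_loss(mimicking_feats, target, reduction="mean")
```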
Step 104, fusing the first and second center-point response heatmaps by weighted fusion and activating with a sigmoid to obtain the third center-point response heatmap.
In one possible embodiment, the fusion strategy for the center-point response heatmaps in step 3 and step 104 is:

$$M_{center} = \alpha M_{occ\text{-}unocc} + (1-\alpha) M_{full\text{-}vis}$$

where $M_{occ\text{-}unocc}$ denotes the first center-point response heatmap, $M_{full\text{-}vis}$ denotes the second center-point response heatmap, $M_{center}$ denotes the third center-point response heatmap, and $\alpha = 0.5$ is obtained by experiment.
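A minimal sketch of the weighted fusion followed by the sigmoid activation; applying the sigmoid to the fused raw heatmap logits is how the text reads, though the exact placement of the activation relative to the branches is an assumption.

```python
import torch

def fuse_center_heatmaps(m_occ_unocc, m_full_vis, alpha=0.5):
    """Fuse the two branch heatmaps into the third center-point heatmap.

    m_occ_unocc, m_full_vis: (B, 1, H, W) raw (pre-activation) heatmaps.
    alpha: balance weight, 0.5 per the patent's experiments.
    """
    m_center = alpha * m_occ_unocc + (1.0 - alpha) * m_full_vis
    return torch.sigmoid(m_center)
```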
In one possible embodiment, the loss function $L_{Dual}$ of the feature simulation learning network is:

$$L_{Dual} = \lambda_c (L_{center1} + L_{center2}) + L_{occ\text{-}unocc} + L_{full\text{-}vis}$$

where $L_{center1}$ and $L_{center2}$ are the loss functions of the first and second center-point response heatmaps respectively, $L_{occ\text{-}unocc}$ is the feature-imitation constraint loss of the occlusion/non-occlusion feature simulation module, $L_{full\text{-}vis}$ is the feature-imitation constraint loss of the whole-body/visible-part feature simulation module, and $\lambda_c$ is a balance coefficient set by experiment;

$$L_{occ\text{-}unocc} = L_m(O, U), \qquad L_{full\text{-}vis} = L_m(F, V)$$

where $L_m$ is the imitation loss calculation function, $F$ denotes the set of $N$ whole-body pedestrian features, and $V$ denotes the set of $N$ visible-part pedestrian features.
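Putting the pieces together, a sketch of the total training loss, reusing the `mimic_loss` sketch above. The two heatmap losses are assumed to be precomputed by the detection head, and `lambda_c = 0.1` is a placeholder, since the patent only says the coefficient is set by experiment.

```python
def dual_loss(l_center1, l_center2, occ, unocc, full, vis, lambda_c=0.1):
    """L_Dual = lambda_c * (L_center1 + L_center2) + L_m(O, U) + L_m(F, V).

    l_center1, l_center2: heatmap losses of the two branches (precomputed)
    occ/unocc: occluded features imitate non-occluded ones (first module)
    full/vis:  whole-body features imitate visible-part ones (second module)
    lambda_c:  balance coefficient; 0.1 is a placeholder value.
    """
    l_occ_unocc = mimic_loss(imitated_feats=unocc, mimicking_feats=occ)
    l_full_vis = mimic_loss(imitated_feats=vis, mimicking_feats=full)
    return lambda_c * (l_center1 + l_center2) + l_occ_unocc + l_full_vis
```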
Step 2, extracting the high-level features of the image to be detected through the backbone network and inputting them into the feature simulation learning network to obtain the first and second center-point response heatmaps.
Step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature simulation learning network and activating with a sigmoid to obtain the third center-point response heatmap for subsequent post-processing.
Step 4, applying occlusion-aware non-maximum suppression to the third center-point response heatmap, considering both the occlusion attribute and the classification confidence of each detection box, to post-process the prediction results and obtain the detection result of the image to be detected.
In a possible embodiment, in the post-processing stage, the occlusion-aware non-maximum suppression in step 4 examines the detection boxes in descending order of confidence score, and includes:
Step 401, for any detection box, judging whether the IoU between it and an intersecting detection box exceeds a set threshold, and if so, executing step 402.
Step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, the boxes belong to different pedestrians and both are retained; when it does not, one of them is a redundant detection box and is suppressed.
The occlusion attribute is, for each edge of a detection box, the ratio of the visible length of that edge to its full length.
In an on-board camera scene, as a target recedes from the camera toward infinity it shrinks toward the middle of the image, and the ordinate of the lower boundary of its detection box decreases with the target's depth in the image. Based on this phenomenon, for detection boxes that intersect each other, the occlusion relationship between pedestrians is determined from the ordinate of the lower boundary of each box, and the occlusion attribute of a detection box is defined from this occlusion relationship.
It can be understood that the occlusion attribute of a detection box is:

$$O = \{o_i \mid i = 1, 2, 3, 4\}$$

where $o_1, o_2, o_3, o_4$ denote the visible-length ratios of the top, right, bottom, and left edges respectively, and $O$ denotes the occlusion-attribute vector of one detection box. The occlusion attributes of the four edges together form the occlusion attribute of the whole box.
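As an illustration, the sketch below derives the occlusion-attribute vector of a box from the boxes that overlap it, using the lower-boundary rule above to decide which box is in front. The geometric bookkeeping is an assumption: the patent defines the attribute but not its computation, and overlapping occluders may double-count here.

```python
def edge_occlusion_attributes(box, others):
    """Visible-length ratio of each edge (top, right, bottom, left).

    box, others: (x1, y1, x2, y2) with y growing downward; a box whose
    lower boundary y2 is larger is treated as nearer the camera and
    therefore as the occluder (the lower-boundary rule in the text).
    This is an illustrative sketch, not the patent's exact procedure.
    """
    x1, y1, x2, y2 = box
    w, h = max(x2 - x1, 1e-6), max(y2 - y1, 1e-6)
    occ = [0.0, 0.0, 0.0, 0.0]  # occluded length per edge: top, right, bottom, left
    for ox1, oy1, ox2, oy2 in others:
        if oy2 <= y2:            # other box is farther away: it cannot occlude
            continue
        ix1, iy1 = max(x1, ox1), max(y1, oy1)
        ix2, iy2 = min(x2, ox2), min(y2, oy2)
        if ix1 >= ix2 or iy1 >= iy2:
            continue             # no intersection
        if iy1 <= y1:            # intersection reaches the top edge
            occ[0] += ix2 - ix1
        if ix2 >= x2:            # reaches the right edge
            occ[1] += iy2 - iy1
        if iy2 >= y2:            # reaches the bottom edge
            occ[2] += ix2 - ix1
        if ix1 <= x1:            # reaches the left edge
            occ[3] += iy2 - iy1
    lengths = [w, h, w, h]
    return [max(0.0, 1.0 - o / l) for o, l in zip(occ, lengths)]
```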
Fig. 3 shows the pseudocode of the occlusion-aware non-maximum suppression algorithm according to an embodiment of the present invention. With reference to Figs. 1 and 3, in another possible embodiment, step 4 includes:

Step 401', initializing the detection-box sequence $B = \{b_1, \ldots, b_N\}$ and the corresponding confidence-score sequence $S = \{s_1, \ldots, s_N\}$, where $b_i$ denotes the $i$-th detection box and $s_i$ is the confidence score of $b_i$.

Step 402', when the $m$-th value in the sequence $S$ is the current maximum, letting $M$ be the detection box $b_m$ with the highest current confidence score, and taking $M$ out of the sequence $B$ and putting it into the set $F$.

Step 403', for every $b_i$ with $IoU(M, b_i) \ge N_t$, letting $s_i = s_i \cdot f(M, b_i)$, where $f(M, b_i)$ is 1 when the edge-wise occlusion-attribute difference $\lvert o_j^M - o_j^{b_i} \rvert$ exceeds the threshold $N_o$ and 0 otherwise; $IoU$ is the intersection-over-union function, $N_t$ is the set IoU threshold, $N_o$ is the occlusion-attribute difference threshold, $j = 1, 2, 3, 4$ indexes the top, right, bottom, and left edges, and $o_j^M$ and $o_j^{b_i}$ are the occlusion attributes of the $j$-th edge of $M$ and of $b_i$ respectively.

Step 404', executing steps 402' through 403' in a loop until the sequence $B$ is empty, and returning the final sets $F$ and $S$ as the final detection-box sequence and the corresponding confidence-score sequence. A runnable sketch follows.
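Since Fig. 3 is reproduced here only as an image, the following is a minimal re-implementation of the occlusion-aware NMS loop as described. The threshold values `n_t = 0.5` and `n_o = 0.3` are placeholders, and taking the maximum of the four per-edge differences (any edge differing by more than $N_o$ keeps the box) is an assumption about the formula the image contains.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-6)

def occlusion_aware_nms(boxes, scores, occ_attrs, n_t=0.5, n_o=0.3):
    """Occlusion-aware NMS (steps 401' through 404').

    boxes: list of (x1, y1, x2, y2); scores: confidence scores;
    occ_attrs: per-box 4-vector of edge visible-length ratios;
    n_t, n_o: IoU and occlusion-difference thresholds (placeholders,
    since the patent leaves both as "set thresholds").
    """
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep, kept_scores = [], []
    while remaining:
        m = remaining.pop(0)                  # current highest-scoring box M
        keep.append(boxes[m]); kept_scores.append(scores[m])
        survivors = []
        for i in remaining:
            if iou(boxes[m], boxes[i]) >= n_t:
                diff = max(abs(a - b)
                           for a, b in zip(occ_attrs[m], occ_attrs[i]))
                if diff <= n_o:               # same pedestrian: f(M, b_i) = 0
                    continue                  # score zeroed, box suppressed
            survivors.append(i)
        remaining = survivors                 # scores unchanged for survivors
    return keep, kept_scores
```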
To address the defects described in the background art, the embodiment of the invention provides a pedestrian detection method for occlusion scenes with fused feature simulation. 1. Feature imitation is innovatively used to reduce the differences among pedestrian features, and an effective heatmap fusion strategy is proposed in combination with the model, effectively improving the pedestrian detection rate in occlusion scenes. 2. A pedestrian occlusion attribute is constructed from existing annotation information and can serve as semantic information for other related vision tasks. 3. An occlusion-aware non-maximum suppression algorithm is designed that deletes redundant detection boxes while retaining the detection boxes of occluded pedestrians.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A pedestrian detection method for occlusion scenes with fused feature simulation, the method comprising:
step 1, training a feature simulation learning network, whose input is the high-level features of an image extracted by a backbone network and whose output is a third center-point response heatmap obtained by fusing a first center-point response heatmap and a second center-point response heatmap; the first center-point response heatmap comes from occlusion/non-occlusion feature imitation learning and the second center-point response heatmap from whole-body/visible-part feature imitation learning;
step 2, extracting the high-level features of the image to be detected through the backbone network and inputting them into the feature simulation learning network to obtain the first and second center-point response heatmaps;
step 3, fusing the first and second center-point response heatmaps by weighted fusion in the feature simulation learning network and activating with a sigmoid to obtain the third center-point response heatmap;
step 4, applying occlusion-aware non-maximum suppression to the third center-point response heatmap, considering both the occlusion attribute and the classification confidence of each detection box, to obtain the detection result of the image to be detected;
wherein the training process of the feature simulation learning network comprises:
step 101, acquiring the high-level features of a training image and the visible-part and whole-body detection boxes of target pedestrians;
step 102, according to the annotation information of the visible part and the whole body of each pedestrian, extracting whole-body pedestrian features and visible-part pedestrian features from the high-level features with RoI-Align; calculating the visibility as the ratio of the areas of the visible-part and whole-body detection boxes, and classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility;
step 103, inputting the occluded and non-occluded pedestrian features into the occlusion/non-occlusion feature simulation module, where the occluded pedestrian features learn to imitate the feature representation of the non-occluded pedestrian features, yielding the first center-point response heatmap; inputting the whole-body and visible-part pedestrian features into the whole-body/visible-part feature simulation module, where the whole-body pedestrian features learn to imitate the feature representation of the visible-part pedestrian features, yielding the second center-point response heatmap;
and step 104, fusing the first and second center-point response heatmaps by weighted fusion and activating with a sigmoid to obtain the third center-point response heatmap.
2. The detection method according to claim 1, wherein classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features in step 102 comprises:
calculating the pedestrian visibility $Vis_r$:

$$Vis_r = \frac{S_{vis}}{S_{full}}$$

where $S_{vis}$ is the area of the pedestrian's visible-part box and $S_{full}$ is the area of the whole-body box;
classifying the whole-body pedestrian features into occluded and non-occluded pedestrian features according to the visibility:

$$f_i \in \begin{cases} O, & Vis_r < \tau \\ U, & Vis_r \ge \tau \end{cases}$$

where $\tau$ is a set visibility threshold, $f_i^O$ denotes the $i$-th occluded pedestrian feature, $O$ denotes the set of $N_O$ occluded pedestrian features, $f_i^U$ denotes the $i$-th non-occluded pedestrian feature, and $U$ denotes the set of $N_U$ non-occluded pedestrian features.
3. The method according to claim 1, wherein training the occlusion/non-occlusion feature simulation module and the whole-body/visible-part feature simulation module in step 103 comprises:
dividing the target-pedestrian features in each batch into imitated features and features that need to imitate them; the pedestrian features include whole-body pedestrian features, visible-part pedestrian features, occluded pedestrian features, and non-occluded pedestrian features;
extracting each pedestrian feature to a fixed size with RoI-Align, computing the per-channel mean of the imitated features, and using this feature mean as the imitation target;
applying the imitation constraint to each feature that needs to imitate through the Smooth-L1 function $L_{SL1}$:

$$L_{mimic} = \frac{1}{M}\sum_{i=1}^{M} L_{SL1}\!\left(f_i,\; \bar{f}\right), \qquad \bar{f} = \frac{1}{N}\sum_{j=1}^{N} \hat{f}_j$$

where $\hat{f}_j$ denotes the $j$-th imitated feature, $\bar{f}$ is the mean of the $N$ imitated features, $f_i$ denotes the $i$-th feature that needs to imitate, and $M$ is the number of such features.
4. The method of claim 1, wherein the fusion strategy for the center-point response heatmaps in step 3 and step 104 is:

$$M_{center} = \alpha M_{occ\text{-}unocc} + (1-\alpha) M_{full\text{-}vis}$$

where $M_{occ\text{-}unocc}$ denotes the first center-point response heatmap, $M_{full\text{-}vis}$ denotes the second center-point response heatmap, $M_{center}$ denotes the third center-point response heatmap, and $\alpha = 0.5$.
5. The detection method according to claim 1, wherein the loss function $L_{Dual}$ of the feature simulation learning network is:

$$L_{Dual} = \lambda_c (L_{center1} + L_{center2}) + L_{occ\text{-}unocc} + L_{full\text{-}vis}$$

where $L_{center1}$ and $L_{center2}$ are the loss functions of the first and second center-point response heatmaps respectively, $L_{occ\text{-}unocc}$ is the feature-imitation constraint loss of the occlusion/non-occlusion feature simulation module, $L_{full\text{-}vis}$ is the feature-imitation constraint loss of the whole-body/visible-part feature simulation module, and $\lambda_c$ is a balance coefficient;

$$L_{occ\text{-}unocc} = L_m(O, U), \qquad L_{full\text{-}vis} = L_m(F, V)$$

where $L_m$ is the imitation loss calculation function, $F$ denotes the set of whole-body pedestrian features, and $V$ denotes the set of visible-part pedestrian features.
6. The detection method according to claim 1, wherein the occlusion-aware non-maximum suppression in step 4 examines the detection boxes in descending order of confidence score, and comprises:
step 401, for any detection box, judging whether the IoU between it and an intersecting detection box exceeds a set threshold, and if so, executing step 402;
step 402, calculating the occlusion-attribute difference of the two intersecting detection boxes; when the difference exceeds a set threshold, retaining both boxes; when it does not, deleting one of the two;
wherein the occlusion attribute is, for each edge of a detection box, the ratio of the visible length of that edge to its full length.
7. The detection method according to claim 6, wherein the occlusion attribute of a detection box is:

$$O = \{o_i \mid i = 1, 2, 3, 4\}$$

where $o_1, o_2, o_3, o_4$ denote the visible-length ratios of the top, right, bottom, and left edges respectively, and $O$ denotes the occlusion-attribute vector of one detection box.
8. The detection method according to claim 1, wherein the step 4 comprises:
step 401', initializing the detection-box sequence $B = \{b_1, \ldots, b_N\}$ and the corresponding confidence-score sequence $S = \{s_1, \ldots, s_N\}$, where $b_i$ ($i = 1, \ldots, N$) denotes the $i$-th detection box and $s_i$ is the confidence score of $b_i$;
step 402', when the $m$-th value in the sequence $S$ is the current maximum, letting $M$ be the detection box $b_m$ with the highest current confidence score, taking $M$ out of the sequence $B$, and putting it into the set $F$;
step 403', for every $b_i$ with $IoU(M, b_i) \ge N_t$, letting $s_i = s_i \cdot f(M, b_i)$, where $f(M, b_i)$ is 1 when the edge-wise occlusion-attribute difference $\lvert o_j^M - o_j^{b_i} \rvert$ exceeds the threshold $N_o$ and 0 otherwise; $IoU$ is the intersection-over-union function, $N_t$ is the set IoU threshold, $N_o$ is the occlusion-attribute difference threshold, $j = 1, 2, 3, 4$ indexes the top, right, bottom, and left edges, and $o_j^M$ and $o_j^{b_i}$ are the occlusion attributes of the $j$-th edge of $M$ and of $b_i$ respectively;
and step 404', executing steps 402' through 403' in a loop until the sequence $B$ is empty, and returning the final sets $F$ and $S$ as the final detection-box sequence and the corresponding confidence-score sequence.
CN202211510002.6A 2022-11-29 2022-11-29 Pedestrian detection method in occlusion scenes with fused feature simulation Active CN115565207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211510002.6A CN115565207B (en) 2022-11-29 2022-11-29 Pedestrian detection method in occlusion scenes with fused feature simulation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211510002.6A CN115565207B (en) 2022-11-29 2022-11-29 Pedestrian detection method in occlusion scenes with fused feature simulation

Publications (2)

Publication Number Publication Date
CN115565207A CN115565207A (en) 2023-01-03
CN115565207B true CN115565207B (en) 2023-04-07

Family

ID=84769737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211510002.6A Active CN115565207B (en) Pedestrian detection method in occlusion scenes with fused feature simulation

Country Status (1)

Country Link
CN (1) CN115565207B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115713731B (en) * 2023-01-10 2023-04-07 武汉图科智能科技有限公司 Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method
CN115937906B (en) * 2023-02-16 2023-06-06 武汉图科智能科技有限公司 Occlusion scene pedestrian re-identification method based on occlusion suppression and feature reconstruction

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191204A (en) * 2021-04-07 2021-07-30 华中科技大学 Multi-scale blocking pedestrian detection method and system
CN114419671A (en) * 2022-01-18 2022-04-29 北京工业大学 An occluded pedestrian re-identification method based on hypergraph neural network
CN114419568A (en) * 2022-01-18 2022-04-29 东北大学 A multi-view pedestrian detection method based on feature fusion

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598601A (en) * 2019-08-30 2019-12-20 电子科技大学 Face 3D key point detection method and system based on distributed thermodynamic diagram
CN111126272B (en) * 2019-12-24 2020-11-10 腾讯科技(深圳)有限公司 Posture acquisition method, and training method and device of key point coordinate positioning model
CN111738091A (en) * 2020-05-27 2020-10-02 复旦大学 A pose estimation and human body parsing system based on multi-task deep learning
CN112836676B (en) * 2021-03-01 2022-11-01 创新奇智(北京)科技有限公司 Abnormal behavior detection method and device, electronic equipment and storage medium
CN113239885A (en) * 2021-06-04 2021-08-10 新大陆数字技术股份有限公司 Face detection and recognition method and system
CN114639042B (en) * 2022-03-17 2025-04-25 哈尔滨理工大学 Video object detection algorithm based on improved CenterNet backbone network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191204A (en) * 2021-04-07 2021-07-30 华中科技大学 Multi-scale blocking pedestrian detection method and system
CN114419671A (en) * 2022-01-18 2022-04-29 北京工业大学 An occluded pedestrian re-identification method based on hypergraph neural network
CN114419568A (en) * 2022-01-18 2022-04-29 东北大学 A multi-view pedestrian detection method based on feature fusion

Also Published As

Publication number Publication date
CN115565207A (en) 2023-01-03

Similar Documents

Publication Publication Date Title
CN112926410B (en) Target tracking method, device, storage medium and intelligent video system
CN111489403B (en) Method and device for generating virtual feature map by using GAN
CN111898406B (en) Face detection method based on focus loss and multitask cascade
CN115565207B (en) Pedestrian detection method in occlusion scenes with fused feature simulation
CN106778757B (en) Scene text detection method based on text saliency
CN109815997A (en) The method and relevant apparatus of identification vehicle damage based on deep learning
CN110796018A (en) A Hand Motion Recognition Method Based on Depth Image and Color Image
CN105303163B (en) A kind of method and detection device of target detection
CN113033523B (en) Method and system for constructing falling judgment model and falling judgment method and system
CN111274994A (en) Cartoon face detection method and device, electronic equipment and computer readable medium
CN109871792B (en) Pedestrian detection method and device
CN107633242A (en) Network model training method, device, equipment and storage medium
CN114359892B (en) Three-dimensional target detection method, three-dimensional target detection device and computer-readable storage medium
CN112861678A (en) Image identification method and device
CN113570615A (en) An image processing method, electronic device and storage medium based on deep learning
CN118229965B (en) Small target detection method in UAV aerial photography based on background noise reduction
CN119478351A (en) A method for target detection based on improved YOLOv8-DES model
Xie et al. Dynamic Dual-Peak Network: A real-time human detection network in crowded scenes
CN116469014B (en) Small sample satellite radar image sailboard identification and segmentation method based on optimized Mask R-CNN
CN112733671A (en) Pedestrian detection method, device and readable storage medium
CN110956097A (en) Method and module for extracting occluded human body and method and device for scene conversion
CN111160219B (en) Object integrity evaluation method and device, electronic equipment and storage medium
CN115713731A (en) Crowd scene pedestrian detection model construction method and crowd scene pedestrian detection method
CN111783791B (en) Image classification method, apparatus and computer readable storage medium
KR101972095B1 (en) Method and Apparatus of adding artificial object for improving performance in detecting object

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd.

Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone)

Patentee before: Wuhan Tuke Intelligent Technology Co.,Ltd.

CP03 Change of name, title or address