
CN120954052A - Passenger Flow Statistics System and Method Based on Deep Learning - Google Patents

Passenger Flow Statistics System and Method Based on Deep Learning

Info

Publication number
CN120954052A
CN120954052A (Application CN202511068910.8A)
Authority
CN
China
Prior art keywords
passenger flow
deep learning
spatiotemporal
feature
flow statistics
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202511068910.8A
Other languages
Chinese (zh)
Inventor
贾正旭
王家栋
张卿瑜
杨凯
陈林
张运飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanyang Power Supply Co of State Grid Henan Electric Power Co Ltd
Original Assignee
Nanyang Power Supply Co of State Grid Henan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanyang Power Supply Co of State Grid Henan Electric Power Co Ltd filed Critical Nanyang Power Supply Co of State Grid Henan Electric Power Co Ltd
Priority to CN202511068910.8A priority Critical patent/CN120954052A/en
Publication of CN120954052A publication Critical patent/CN120954052A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision and data analysis, and specifically relates to a deep-learning-based passenger flow statistics system and method. The method remedies the shortcomings of existing passenger flow statistics approaches in accuracy, environmental adaptability, density sensitivity, and prediction-control misalignment.

Description

Passenger flow statistical system and statistical method based on deep learning
Technical Field
The invention belongs to the field of computer vision and data analysis, and particularly relates to a passenger flow statistical system and a passenger flow statistical method based on deep learning.
Background
In today's society, accurate passenger flow statistics are important for public places of all kinds, such as power-utility business halls, shopping malls, stations, and scenic spots. Conventional approaches based on infrared sensing, pressure sensing, and the like have clear limitations. Infrared sensing is easily disturbed by the environment and tends to miscount when several people pass through the sensing area at once; pressure sensing imposes strict requirements on the installation position and floor condition and cannot distinguish individuals, so statistical accuracy is hard to guarantee.
With the development of computer vision, passenger flow statistics based on video images gradually emerged. Early video-based methods relied on hand-designed features such as HOG (histogram of oriented gradients) and LBP (local binary patterns), combined with traditional classifiers such as the SVM (support vector machine) for pedestrian detection. These methods, however, perform poorly in complex scenes: detection accuracy drops sharply under severe illumination changes or heavy pedestrian occlusion, because manually designed features cannot fully and accurately describe the complex appearance of pedestrians and the generalization ability of traditional classifiers is limited.
Deep learning has brought a new breakthrough to passenger flow statistics: it automatically learns more representative features from large amounts of data, improving both detection and counting accuracy. Current deep-learning-based passenger flow algorithms still have problems, however. On the one hand, model training requires large amounts of labeled data, and manual labeling is time-consuming, labor-intensive, and error-prone; on the other hand, existing models still need improvement in multi-target tracking and real-time performance in complex scenes. In high-density settings such as a mall promotion, for example, existing algorithms may lose targets or count them repeatedly, failing to meet the high-precision requirements of practical applications.
In summary, existing passenger flow statistics systems face three technical bottlenecks:
1. Poor environmental adaptability: the false detection rate of traditional visible-light schemes in low-illumination scenes exceeds 40%;
2. Density sensitivity: the ID switching rate of fixed-parameter algorithms in crowded scenes (ρ > 4 persons/m²) reaches 35%;
3. Prediction-control misalignment: the delay between statistical results and emergency decision responses exceeds 8 seconds.
The invention patent published as CN113591876A only realizes static-scene counting and does not address algorithm switching under dynamic density; the invention patent published as CN110675443A discloses a thermal-imaging fusion scheme but does not involve a feature-level weighting mechanism.
Therefore, the development of the passenger flow statistics method based on deep learning, which can more accurately and efficiently carry out passenger flow statistics and is suitable for complex scenes, has important practical significance.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a deep-learning-based passenger flow statistics system and method that remedy the shortcomings of existing approaches in accuracy, environmental adaptability, density sensitivity, and prediction-control misalignment.
To achieve the above object, the deep-learning-based passenger flow statistics system of the present invention comprises:
a multimodal feature extraction module, which adopts a parallel dual-channel convolutional neural network (CNN), the first channel processing the visible-light video stream and the second channel processing infrared thermal-imaging data, and outputs an enhanced pedestrian feature map through a feature-level fusion layer;
a dynamic target tracking module, which generates initial motion-trajectory values by the optical flow method, fuses the Kalman-filter predictions of the SORT algorithm, and dynamically adjusts tracking weights through an appearance feature matching function;
a spatiotemporal passenger flow analysis module, which performs spatiotemporal modeling of pedestrian trajectories with a spatiotemporal graph convolutional network (STGCN) and outputs a regional passenger-flow density heat map and a future flow prediction curve;
a feedback optimization module, which adaptively adjusts the receptive-field size of the CNN feature extraction layers and the search-window size of the optical flow method according to the tracking loss rate in occluded scenes.
Specifically, the feature-level fusion layer is computed as:
F_fusion = α·σ(Wv·F_RGB) + (1 − α)·ReLU(Wt·F_thermal);
α = 1 − e^(−β·I)
where F_RGB is the visible-light feature map, F_thermal is the thermal-imaging feature map, Wv and Wt are trainable weight matrices, σ is the sigmoid activation, α is the illumination-adaptive coefficient, I is the ambient illumination intensity, and β is a decay factor. At low illumination α approaches 0, the system relies mainly on thermal-imaging features, and the false detection rate falls by 32%.
Compared with data-level fusion (direct fusion of raw data), feature-level fusion takes place after the key features of each data source have been extracted, so it removes redundant information while retaining the core features useful for the task, improving fusion efficiency. Extracting features before fusing also avoids the high cost of directly processing massive raw data and gives subsequent models (e.g. the classifier and predictor) a more compact input, reducing the overall computation load.
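As a sketch, the fusion rule can be written out numerically. The example below treats the feature maps as flat vectors and the weight matrices as per-element scalars, which is a simplification of the formula above; the β value used is an assumed constant, not one given in the text.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def fusion_alpha(beta, illum):
    # alpha = 1 - exp(-beta * I): near 0 in darkness, near 1 in bright light,
    # so the thermal branch dominates when illumination is low
    return 1.0 - math.exp(-beta * illum)

def fuse_features(f_rgb, f_thermal, w_v, w_t, beta, illum):
    # F_fusion = alpha * sigmoid(Wv*F_RGB) + (1 - alpha) * ReLU(Wt*F_thermal)
    alpha = fusion_alpha(beta, illum)
    return [alpha * sigmoid(w_v * r) + (1.0 - alpha) * relu(w_t * t)
            for r, t in zip(f_rgb, f_thermal)]
```

In total darkness (illum = 0) the fused map reduces to the ReLU of the thermal branch, matching the claim that the system leans on thermal imaging at low illumination.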
Specifically, the appearance feature matching function is defined as:
S_match = λ·IoU(B_t, B_{t−1}) + (1 − λ)·‖φ(f_t) − φ(f_{t−1})‖₂
where λ is a dynamic adjustment coefficient tuned according to the crowd density ρ, ρ_max is the maximum crowd density, φ(f_t) and φ(f_{t−1}) are the appearance feature vectors of the current and previous frames extracted by the CNN, ‖·‖₂ is the L2 norm, and IoU(B_t, B_{t−1}) is the intersection-over-union of the target bounding boxes in consecutive frames. In dense crowds appearance features dominate the matching, and the ID switching rate falls by 41%.
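The text does not give the explicit form of λ(ρ), so the sketch below assumes a simple clipped linear decay; `density_lambda` and its `lam_min` floor are hypothetical stand-ins, while the IoU and L2 terms follow the formula directly.

```python
import math

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def density_lambda(rho, rho_max, lam_min=0.3):
    # hypothetical schedule: weight shifts from geometry (IoU) toward
    # appearance features as crowd density rises
    return max(lam_min, 1.0 - rho / rho_max)

def s_match(box_t, box_prev, feat_t, feat_prev, rho, rho_max):
    # S_match = lambda * IoU + (1 - lambda) * ||phi_t - phi_{t-1}||_2
    lam = density_lambda(rho, rho_max)
    l2 = math.sqrt(sum((x - y) ** 2 for x, y in zip(feat_t, feat_prev)))
    return lam * iou(box_t, box_prev) + (1.0 - lam) * l2
```

At zero density the score is pure IoU; as ρ approaches ρ_max the appearance term takes over, which is the behavior the text describes for crowded scenes.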
Specifically, the feedback optimization module performs the following operations:
when the tracking loss rate L_track > 15%, the convolution kernel size of the last two layers of the CNN backbone is increased by 50%, raising occlusion-scene recall by 28%;
when the average inter-frame displacement D_move < 5 px, the optical-flow search window is reduced from 32×32 to 16×16, cutting computation time by 45%.
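The two feedback rules amount to a small threshold check per cycle. In this sketch the 15% and 5 px thresholds come from the text, while the base kernel and window sizes are illustrative assumptions:

```python
def adjust_tracking_parameters(track_loss_rate, mean_disp_px,
                               kernel_size=5, search_window=32):
    """Apply the two feedback rules to the current parameter pair."""
    if track_loss_rate > 0.15:
        # enlarge the receptive field of the last layers by ~50% to
        # recover occluded targets
        kernel_size = kernel_size + kernel_size // 2
    if mean_disp_px < 5:
        # small inter-frame motion: halve the optical-flow search window
        # (e.g. 32x32 -> 16x16), cutting its area to a quarter
        search_window = max(16, search_window // 2)
    return kernel_size, search_window
```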
Specifically, the spatiotemporal passenger flow analysis module includes:
a spatiotemporal graph construction unit, which builds a spatiotemporal graph with pedestrian trajectory points as nodes and motion relationships as edges;
a gated graph convolution unit, which aggregates spatiotemporal neighborhood information through a gating mechanism and updates node states;
a spatiotemporal prediction unit, which predicts the passenger flow distribution after time T through a spatiotemporal attention mechanism based on the final-layer node state h_i^(L). The prediction MAE is as low as 6.8% (versus about 22% for traditional approaches).
A deep-learning-based passenger flow statistics method, applied to the above system, comprises the following steps:
S1. Synchronously collect the visible-light and thermal-imaging video streams and align them in time;
S2. Extract multimodal features through the dual-channel CNN and fuse them according to the feature-level fusion formula;
S3. Initialize pedestrian detection boxes from the fused features and perform cross-frame target association using the appearance feature matching function;
S4. Dynamically optimize the feature extraction and tracking parameters through the feedback optimization module;
S5. Generate the regional passenger-flow heat map and future prediction curve with the spatiotemporal passenger flow analysis module as the passenger-flow prediction result.
Specifically, in step S2, fusion is trained with a transfer learning strategy:
the visible light channel is loaded with COCO pre-training weight, and the thermal imaging channel is loaded with FLIR pre-training weight;
The fusion layer weights are updated by end-to-end joint fine tuning.
Specifically, the passenger-flow prediction result of S5 is used to trigger control instructions:
when the regional density exceeds the safe density, a diversion alarm is generated, dispersing the crowd or traffic in the region through alerts and avoiding the risks of excessive density;
when the predicted flow exceeds the maximum capacity, an entry-restriction command is generated; this is a direct intervention that limits the inflow so the flow never exceeds the load limit of the system or region.
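The two trigger conditions reduce to simple threshold comparisons; the sketch below is illustrative, and the command names and threshold values are placeholders rather than anything specified in the text:

```python
def control_actions(region_density, safe_density, predicted_flow, max_capacity):
    """Map the S5 prediction outputs to control commands."""
    actions = []
    if region_density > safe_density:
        actions.append("DIVERSION_ALARM")      # disperse the crowd by alerting
    if predicted_flow > max_capacity:
        actions.append("ENTRY_RESTRICTION")    # directly limit inflow
    return actions
```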
The deep-learning-based passenger flow statistics method has significant technical effects in several respects:
By adopting a deep-learning passenger flow algorithm, high-precision person detection and tracking can be performed on real-time video in complex scenes. The spatial features of each frame are extracted by a convolutional neural network (CNN), and the system automatically detects person targets in the video and tracks and counts them using motion information between consecutive frames. Compared with traditional image processing algorithms, the deep-learning algorithm copes effectively with occlusion, lighting changes, and other complex conditions, ensuring the accuracy and robustness of the statistics, with full-scene robustness: low-illumination (I < 20 lux) accuracy above 93% and high-density (ρ > 5 persons/m²) ID switching rate below 1.5%.
Person detection relies on the CNN, and continuous tracking combines the optical flow method with a target tracking algorithm (e.g. Kalman filtering or SORT). The system runs every frame of the video stream through the CNN model to identify and locate pedestrians, then tracks each target's motion trajectory across frames to prevent repeated or missed counts. By incorporating the time dimension, the system analyzes passenger-flow trends over different periods, providing data support for flow control and management, with a prediction-to-control-instruction delay under 1 second.
The proposed system and method use dynamic illumination-thermal fusion (the α function) to solve the environmental-adaptability defect, a density-driven matching mechanism (dynamic λ adjustment) to solve high-density tracking, loss-rate feedback optimization to self-adjust algorithm parameters, and a spatiotemporal-graph prediction engine to close the statistics-to-decision loop. Good results are achieved across scenes, effectively improving the accuracy, adaptability, and real-time performance of passenger flow statistics, with broad application prospects and practical value.
Drawings
FIG. 1 is a flow chart of a deep learning-based passenger flow statistics system of the present invention;
FIG. 2 is a flow chart of the operation of the space-time passenger flow analysis module.
Detailed Description
The technical scheme of the present invention is described in detail below with reference to the accompanying drawings and the detailed description.
Example 1
This embodiment provides a deep-learning-based passenger flow statistics system, comprising:
a multimodal feature extraction module, which adopts a parallel dual-channel convolutional neural network (CNN), the first channel processing the visible-light video stream and the second channel processing infrared thermal-imaging data, and outputs an enhanced pedestrian feature map through a feature-level fusion layer;
a dynamic target tracking module, which generates initial motion-trajectory values by the optical flow method, fuses the Kalman-filter predictions of the SORT algorithm, and dynamically adjusts tracking weights through an appearance feature matching function;
a spatiotemporal passenger flow analysis module, which performs spatiotemporal modeling of pedestrian trajectories with a spatiotemporal graph convolutional network (STGCN) and outputs a regional passenger-flow density heat map and a future flow prediction curve;
a feedback optimization module, which adaptively adjusts the receptive-field size of the CNN feature extraction layers and the search-window size of the optical flow method according to the tracking loss rate in occluded scenes.
In this embodiment, the feature-level fusion layer is computed as:
F_fusion = α·σ(Wv·F_RGB) + (1 − α)·ReLU(Wt·F_thermal);
α = 1 − e^(−β·I)
where F_RGB is the visible-light feature map, F_thermal is the thermal-imaging feature map, Wv and Wt are trainable weight matrices, σ is the sigmoid activation, α is the illumination-adaptive coefficient, I is the ambient illumination intensity, and β is a decay factor. At low illumination (I < 50 lux), α is close to 0, the system relies on thermal-imaging features, and the false detection rate falls by 32%.
Compared with data-level fusion (direct fusion of raw data), feature-level fusion takes place after the key features of each data source have been extracted, so it removes redundant information while retaining the core features useful for the task, improving fusion efficiency. Extracting features before fusing also avoids the high cost of directly processing massive raw data and gives subsequent models (e.g. the classifier and predictor) a more compact input, reducing the overall computation load.
In this embodiment, the appearance feature matching function is defined as:
S_match = λ·IoU(B_t, B_{t−1}) + (1 − λ)·‖φ(f_t) − φ(f_{t−1})‖₂
where λ is a dynamic adjustment coefficient tuned according to the crowd density ρ, ρ_max is the maximum crowd density, φ(f_t) and φ(f_{t−1}) are the appearance feature vectors of the current and previous frames extracted by the CNN, ‖·‖₂ is the L2 norm, and IoU(B_t, B_{t−1}) is the intersection-over-union of the target bounding boxes in consecutive frames. At high density (ρ = 4 persons/m²), λ = 0.4, appearance features dominate the matching, and the ID switching rate falls by 41%.
Further, the feedback optimization module performs the following operations:
When the tracking loss rate L_track > 15%, the current network's feature extraction for the target is probably insufficient (e.g. feature capture degraded by occlusion or posture changes). Increasing the convolution kernel size of the last two layers of the CNN backbone by 50% enlarges the receptive field and strengthens the capture of the target's global features and context, which can reduce the loss rate; occlusion-scene recall rises by 28%.
When the average inter-frame displacement D_move < 5 px, the target's motion amplitude is small (slow-moving or nearly stationary) and the displacement can be captured without a large search range. Reducing the optical-flow search window from 32×32 to 16×16 preserves accurate small-displacement estimation while cutting the computation (the window area drops to a quarter), reducing computation time by 45%. This adaptive strategy balances precision and speed and suits optical-flow-dependent tasks such as video target tracking and motion analysis.
In this embodiment, the spatiotemporal passenger flow analysis module includes:
a spatiotemporal graph construction unit, which builds a spatiotemporal graph with pedestrian trajectory points as nodes and motion relationships as edges;
a gated graph convolution unit, which aggregates spatiotemporal neighborhood information through a gating mechanism and updates node states;
a spatiotemporal prediction unit, which predicts the passenger flow distribution after time T through a spatiotemporal attention mechanism based on the final-layer node state h_i^(L). The prediction MAE is as low as 6.8% (versus about 22% for traditional approaches). The specific implementation steps are shown in FIG. 2.
In this embodiment, the gated graph convolution unit (Gated Graph Convolutional Unit, GGCU) combines a graph convolutional network (GCN) with a gating mechanism to aggregate spatiotemporal neighborhood information effectively and predict the passenger flow distribution at the future time T. The relevant formulas and operations are as follows:
1. Graph convolution operation
Assume a graph structure G = (V, E), where V is the set of nodes and E the set of edges. For each node i, its feature at layer l is denoted h_i^(l).
The graph convolution operation can be expressed as:
m_i^(l) = Σ_{j∈N(i)} (1/c_ij)·W^(l)·h_j^(l) + b^(l)
where N(i) is the set of neighbor nodes of node i; c_ij is a normalization constant, typically the degree d_i of node i or a more complex normalization based on the graph Laplacian (e.g. c_ij = √(d_i·d_j), common in GCNs with a symmetrically normalized Laplacian); W^(l) is the layer-l weight matrix, used to linearly transform the features of neighboring nodes; and b^(l) is the layer-l bias vector.
2. Gating mechanism
A gating mechanism controls the information flow through an update gate z_i^(l) and a reset gate r_i^(l), computed as:
update gate: z_i^(l) = σ(W_z^(l)·[m_i^(l), h_i^(t−1)] + b_z^(l))
reset gate: r_i^(l) = σ(W_r^(l)·[m_i^(l), h_i^(t−1)] + b_r^(l))
where:
σ is the sigmoid activation, which compresses its input into the interval (0, 1) to serve as a gate weight;
W_z^(l) and W_r^(l) are the weight matrices of the update gate and the reset gate, respectively;
b_z^(l) and b_r^(l) are their bias vectors;
[m_i^(l), h_i^(t−1)] denotes the concatenation of the graph-convolution output with the feature of node i at the previous time step.
3. Hidden state update
A candidate hidden state h̃_i^(l) is computed from the reset gate and the graph-convolution result, and the node's hidden state is then updated through the update gate:
candidate hidden state: h̃_i^(l) = tanh(W′^(l)·(r_i^(l) ⊙ m_i^(l)) + b′^(l))
where tanh is the hyperbolic tangent activation, mapping its input into (−1, 1); W′^(l) is the weight matrix for the candidate hidden state and b′^(l) the corresponding bias vector; ⊙ denotes element-wise multiplication, i.e. the reset gate r_i^(l) and the graph-convolution result m_i^(l) are multiplied element by element.
hidden state update: h_i^(l) = (1 − z_i^(l)) ⊙ h_i^(t−1) + z_i^(l) ⊙ h̃_i^(l)
i.e. the current hidden state is a weighted combination of the previous hidden state h_i^(t−1) and the candidate state h̃_i^(l), with the weights determined by the update gate z_i^(l).
4. Prediction of the passenger flow distribution at future time T
After the multi-layer gated graph convolution, the final hidden state h_i^(L) of each node is obtained (L is the total number of layers). A fully connected layer then maps the hidden state to the passenger-flow prediction, with prediction function f:
ŷ_i^(T) = f(h_i^(L)) = W_out·h_i^(L) + b_out
where W_out is the weight matrix and b_out the bias vector of the fully connected layer, and ŷ_i^(T) is the predicted passenger flow of node i at the future time T. The predictions can be rendered graphically for inspection.
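The four steps above can be condensed into a single update function. The sketch below uses scalar node states and scalar weights in place of the matrices W, W_z, W_r, W′, so it only illustrates the GGCU data flow under those assumptions, not a trainable implementation; all parameter values are placeholders.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ggcu_step(h_prev, neighbors, p):
    """One gated graph-convolution update over scalar node states.
    h_prev: previous state per node; neighbors: adjacency list;
    p: dict of scalar stand-ins for the weight matrices and biases."""
    h_new = []
    for i, nbrs in enumerate(neighbors):
        # 1. graph convolution: degree-normalized neighbor aggregation m_i
        m = (sum(h_prev[j] for j in nbrs) / max(len(nbrs), 1)) * p["w"] + p["b"]
        # 2. update and reset gates over the pair (m_i, h_i(t-1))
        z = sigmoid(p["wz"] * m + p["uz"] * h_prev[i] + p["bz"])
        r = sigmoid(p["wr"] * m + p["ur"] * h_prev[i] + p["br"])
        # 3. candidate state from the reset-gated convolution result,
        #    then the gated interpolation with the previous state
        cand = math.tanh(p["wc"] * (r * m) + p["bc"])
        h_new.append((1.0 - z) * h_prev[i] + z * cand)
    return h_new

def predict_flow(h_final, w_out, b_out):
    # 4. fully connected readout: y_i = W_out * h_i(L) + b_out
    return [w_out * h + b_out for h in h_final]
```

With all parameters zero the gates sit at 0.5 and the candidate at 0, so each state simply halves per step, which is a quick sanity check of the gating arithmetic.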
Example 2
This embodiment provides a deep-learning-based passenger flow statistics method, applied to the system of Embodiment 1, comprising the following steps:
S1. Synchronously collect the visible-light and thermal-imaging video streams and align them in time;
S2. Extract multimodal features through the dual-channel CNN and fuse them according to the feature-level fusion formula;
S3. Initialize pedestrian detection boxes from the fused features and perform cross-frame target association using the appearance feature matching function;
S4. Dynamically optimize the feature extraction and tracking parameters through the feedback optimization module;
S5. Generate the regional passenger-flow heat map and future prediction curve with the spatiotemporal passenger flow analysis module as the passenger-flow prediction result.
In step S2 of this embodiment, fusion is trained with a transfer learning strategy, specifically:
the visible-light channel loads COCO pre-training weights, and the thermal-imaging channel loads FLIR pre-training weights;
the fusion-layer weights are updated by end-to-end joint fine-tuning.
In this embodiment, the passenger-flow prediction result of S5 is used to trigger control instructions:
when the regional density exceeds the safe density, a diversion alarm is generated, dispersing the crowd or traffic in the region through alerts and avoiding the risks of excessive density;
when the predicted flow exceeds the maximum capacity, an entry-restriction command is generated; this is a direct intervention that limits the inflow so the flow never exceeds the load limit of the system or region.
Example 3:
This embodiment controls subway morning-peak passenger flow, with the following specific steps:
1. Data acquisition:
a visible-light camera and a thermal imager acquire synchronously at 30 fps.
2. Feature fusion:
ambient illuminance I = 30 lux → α = 0.2, thermal-imaging weight 80%;
the COCO + FLIR pre-training weights are loaded.
3. Tracking and optimization:
detected density ρ = 5.2 persons/m² → λ = 0.38;
tracking loss rate L_track = 18% → the last two ResNet convolution kernels are enlarged to 9×9.
4. Prediction and control:
the STGCN predicts the passenger flow of region A after 5 min;
gate current limiting is triggered: R = 120 × (1 − 6.8/8) = 30 persons/min.
Effect comparison:

Index                        Traditional scheme    The invention
Counting accuracy            76.3%                 98.1%
Prediction MAE               22.7%                 6.8%
Instruction response delay   8.5 s                 0.3 s
From the above example, full-scene robustness follows:
low-illumination (I < 20 lux) accuracy > 93%;
high-density (ρ > 5 persons/m²) ID switching rate < 1.5%.
The passenger flow statistics method based on deep learning can obtain good effects in different scenes, effectively improves the accuracy, adaptability and instantaneity of passenger flow statistics, and has wide application prospects and practical values.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the present invention can be modified or substituted for part of the technical features thereof without departing from the spirit and scope of the claimed technical solution.

Claims (8)

1. A deep-learning-based passenger flow statistics system, characterized by comprising:
a multimodal feature extraction module: adopting a parallel dual-channel convolutional neural network (CNN), in which the first channel processes the visible-light video stream and the second channel processes infrared thermal-imaging data, and outputting an enhanced pedestrian feature map through a feature-level fusion layer;
a dynamic target tracking module: generating initial motion trajectories based on the optical-flow method, fusing them with the Kalman-filter predictions of the SORT algorithm, and dynamically adjusting the tracking weights through an appearance feature matching function;
a spatiotemporal passenger flow analysis module: performing spatiotemporal modeling of pedestrian trajectories with a spatiotemporal graph convolutional network (STGCN), and outputting a regional passenger flow density heat map and future flow prediction curves;
a feedback optimization module: adaptively adjusting the receptive field size of the CNN feature extraction layers and the search window size of the optical-flow method according to the tracking loss rate in occluded scenes.
2. The deep-learning-based passenger flow statistics system according to claim 1, characterized in that the feature-level fusion layer is computed as:
F_fusion = α·σ(W_v·F_RGB) + (1−α)·ReLU(W_t·F_thermal);
α = 1 − e^(−β·I)
where F_RGB is the visible-light feature map, F_thermal is the thermal-imaging feature map, W_v and W_t are trainable weight matrices, σ is the sigmoid activation, α is the illumination-adaptive coefficient, I is the ambient illuminance, and β is the attenuation factor.
3. The deep-learning-based passenger flow statistics system according to claim 1, characterized in that, in the data preprocessing step, the appearance feature matching function is defined as:
S_match = λ·IoU(B_t, B_{t−1}) + (1−λ)·‖φ(f_t) − φ(f_{t−1})‖₂
where λ is a dynamic adjustment coefficient adjusted according to the crowd density ρ (with ρ_max the maximum crowd density), φ(f_t) and φ(f_{t−1}) are the appearance feature vectors of the current and previous frames extracted by the CNN, ‖·‖₂ is the L2 norm, and IoU(B_t, B_{t−1}) is the intersection-over-union of the target bounding boxes in the two consecutive frames.
4. The deep-learning-based passenger flow statistics system according to claim 1, characterized in that, in the model construction step, the feedback optimization module performs the following operations:
when the tracking loss rate L_track > 15%, the convolution kernel size of the last two layers of the CNN backbone is increased by 50%;
when the average displacement between consecutive frames D_move < 5 px, the search window of the optical-flow method is reduced from 32×32 to 16×16.
5. The deep-learning-based passenger flow statistics system according to claim 1, characterized in that the spatiotemporal passenger flow analysis module comprises:
a spatiotemporal graph construction unit: constructing a spatiotemporal graph with pedestrian trajectory points as nodes and motion relations as edges;
a gated graph convolution unit: aggregating spatiotemporal neighborhood information through a gating mechanism and updating node states;
a spatiotemporal prediction unit: predicting the passenger flow distribution after time T through a spatiotemporal attention mechanism, based on the final-layer node states h_i^(L).
6. A deep-learning-based passenger flow statistics method, applied to the system of any one of claims 1-5, characterized by comprising the steps of:
S1. synchronously acquiring and time-aligning the visible-light and thermal-imaging video streams;
S2. extracting multimodal features through the dual-channel CNN and fusing them according to the computation formula of the feature-level fusion layer;
S3. initializing pedestrian detection boxes based on the fused features and performing cross-frame target association according to the appearance feature matching function;
S4. dynamically optimizing the feature extraction and tracking parameters according to the feedback optimization module;
S5. generating the regional passenger flow heat map and future prediction curves as the passenger flow prediction result through the spatiotemporal passenger flow analysis module.
7. The deep-learning-based passenger flow statistics method according to claim 6, characterized in that a transfer-learning strategy is used for fusion in step S2:
the visible-light channel loads COCO pre-trained weights, and the thermal-imaging channel loads FLIR pre-trained weights;
the fusion-layer weights are updated by end-to-end joint fine-tuning.
8. The deep-learning-based passenger flow statistics method according to claim 6, characterized in that, after the passenger flow statistics step, the passenger flow prediction result in S5 is used to trigger control instructions:
when the regional density > the safe density, a diversion alarm signal is generated;
when the predicted flow > the maximum capacity, a flow-limiting control command is generated.
CN202511068910.8A 2025-07-31 2025-07-31 Passenger Flow Statistics System and Method Based on Deep Learning Pending CN120954052A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511068910.8A CN120954052A (en) 2025-07-31 2025-07-31 Passenger Flow Statistics System and Method Based on Deep Learning


Publications (1)

Publication Number Publication Date
CN120954052A true CN120954052A (en) 2025-11-14

Family

ID=97606774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511068910.8A Pending CN120954052A (en) 2025-07-31 2025-07-31 Passenger Flow Statistics System and Method Based on Deep Learning

Country Status (1)

Country Link
CN (1) CN120954052A (en)

Similar Documents

Publication Publication Date Title
Yin et al. Recurrent convolutional network for video-based smoke detection
Tsakanikas et al. Video surveillance systems-current status and future trends
CN111126325B (en) Intelligent personnel security identification statistical method based on video
US10735694B2 (en) System and method for activity monitoring using video data
Li et al. Statistical modeling of complex backgrounds for foreground object detection
Wang et al. Robust video-based surveillance by integrating target detection with tracking
US8457401B2 (en) Video segmentation using statistical pixel modeling
Choudhury et al. An evaluation of background subtraction for object detection vis-a-vis mitigating challenging scenarios
Tomar et al. Crowd analysis in video surveillance: A review
CN109101888A A kind of tourist's flow of the people monitoring and early warning method
Deng et al. Deep learning in crowd counting: A survey
CN111666860A (en) Vehicle track tracking method integrating license plate information and vehicle characteristics
Jeyabharathi et al. Vehicle Tracking and Speed Measurement system (VTSM) based on novel feature descriptor: Diagonal Hexadecimal Pattern (DHP)
Usha Rani et al. Real-time human detection for intelligent video surveillance: an empirical research and in-depth review of its applications
Zhang et al. Faster R-CNN based on frame difference and spatiotemporal context for vehicle detection
CN113627383A (en) Pedestrian loitering re-identification method for panoramic intelligent security
Eng et al. Robust human detection within a highly dynamic aquatic environment in real time
Lalonde et al. A system to automatically track humans and vehicles with a PTZ camera
CN110363100A (en) A video object detection method based on YOLOv3
Bou et al. Reviewing ViBe, a popular background subtraction algorithm for real-time applications
Hanif et al. Performance analysis of vehicle detection techniques: a concise survey
Kapoor A video surveillance detection of moving object using deep learning
Wang et al. An end-to-end traffic vision and counting system using computer vision and machine learning: the challenges in real-time processing
Jiang et al. A smartly simple way for joint crowd counting and localization
Wang et al. Research, applications and prospects of event-based pedestrian detection: a survey

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination