CN117150907A - Method and system for establishing station passenger flow real-time deduction model based on reinforcement learning - Google Patents
Method and system for establishing station passenger flow real-time deduction model based on reinforcement learning
- Publication number
- CN117150907A (application CN202311125286.1A)
- Authority
- CN
- China
- Prior art keywords
- passenger flow
- station
- passenger
- real
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- General Engineering & Computer Science (AREA)
- Geometry (AREA)
- Computer Hardware Design (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiment of the specification provides a method and a system for establishing a station passenger flow real-time deduction model based on reinforcement learning, wherein the method comprises the following steps: identifying and judging the system structure of the station and the flow causality relation of passengers among all areas based on system dynamics, and constructing a station passenger flow evolution simulation model according to the identification and judgment results; based on the real-time monitoring video data and with the minimum simulation error as a target, continuously learning and optimizing the passenger selection behavior parameter value by using a reinforcement learning algorithm, and constructing a passenger selection behavior real-time correction model, wherein the passenger selection behavior real-time correction model is used for dynamically adjusting the passenger selection behavior parameter in the station passenger flow evolution simulation model; and embedding the passenger selection behavior real-time correction model into a station passenger flow evolution simulation model, and establishing a station passenger flow real-time deduction model based on reinforcement learning.
Description
Technical Field
The document relates to the technical field of computers, in particular to a method and a system for establishing a station passenger flow real-time deduction model based on reinforcement learning.
Background
With the rapid development of the urban economy and the continuous growth of the urban population in China, urban ground traffic congestion has become increasingly serious; it not only hinders the sustainable development of cities but also affects the normal travel of urban residents. Rail transit is an important part of urban travel. Its largely underground operation is not easily disturbed by other urban traffic modes or by weather, and it offers larger capacity and higher efficiency, making it the first choice for relieving ground traffic pressure. At the same time, its electricity-based operation is energy-efficient and low-emission, and can play a major role in promoting the construction of strong transportation nations and smart cities and the development of green transportation. Urban rail transit can therefore fully play an important role in promoting China's economic development, bringing convenience to people's lives, and safeguarding social safety and stability.
As the space of urban rail transit stations continues to expand and their functions continue to increase, the internal structure of stations becomes more three-dimensional and complex; equipment and facilities are interlinked and work cooperatively, and passenger flow lines are intricate and interwoven, forming a complex network system. The complex station environment increases the difficulty of station operation management and places higher requirements on ensuring passenger safety and normal station operation. Passenger flow organization methods based on historical automatic fare collection (Automatic Fare Collection, AFC) data cannot accurately reveal and analyze the flow demand and evolution mechanism of passenger flow in the internal network system of a rail transit station, nor the propagation law of congestion states and the control and management mechanism. Therefore, a new passenger flow data source is urgently needed, deep research must be conducted on passenger flow characteristics and organization methods in the micro network system inside a station, and a scientific theory and method should be established from the perspectives of quantification, refinement and system optimization, so as to improve the capability of preventing and coping with operational risks of urban rail transit stations.
The spatio-temporal distribution characteristics of passenger flow provide important theoretical support for passenger group behavior analysis, adjustment of passenger flow organization modes, operation safety management and other aspects. Research on the spatio-temporal distribution of passenger flow inside a station is mostly carried out with simulation models. Existing station passenger flow simulation is based on survey data, and the movement of passengers in the station is simulated with microscopic or macroscopic simulation models. However, survey data are limited by labor cost: they usually cover only a few typical dates and time periods and capture passenger flow information of key areas of the station over a short time, which imposes certain limitations. Because passenger behavior is easily influenced by many factors such as environment, time and subjective psychology, passenger choices and behaviors differ between scenes, and relying only on historical survey data may cause the passenger flow simulation to deviate from real data in certain scenes. In order to monitor the passenger flow state of some key areas in real time, monitoring cameras are usually installed in stations to acquire and transmit passenger flow information of the monitored areas in real time, but these data are mainly used to assist staff in judging the risk level of an area, so the data utilization rate is not high.
Disclosure of Invention
The invention aims to provide a method and system for building a station passenger flow real-time deduction model based on reinforcement learning, so as to solve the problems in the prior art.
The invention provides a station passenger flow real-time deduction model building method based on reinforcement learning, which comprises the following steps:
identifying and judging the system structure of the station and the flow causality relation of passengers among all areas based on system dynamics, and constructing a station passenger flow evolution simulation model according to the identification and judgment results;
based on the real-time monitoring video data and with the minimum simulation error as a target, continuously learning and optimizing the passenger selection behavior parameter value by using a reinforcement learning algorithm, and constructing a passenger selection behavior real-time correction model, wherein the passenger selection behavior real-time correction model is used for dynamically adjusting the passenger selection behavior parameter in the station passenger flow evolution simulation model;
and embedding the passenger selection behavior real-time correction model into a station passenger flow evolution simulation model, and establishing a station passenger flow real-time deduction model based on reinforcement learning.
The invention provides a station passenger flow real-time deduction model building system based on reinforcement learning, which comprises:
the first construction module is used for identifying and judging the system structure of the station and the flow causality relation of passengers in each area based on system dynamics, and constructing a station passenger flow evolution simulation model according to the identification and judgment results;
The second construction module is used for continuously learning and optimizing the passenger selection behavior parameter value by using a reinforcement learning algorithm based on the real-time monitoring video data and with the minimum simulation error as a target, and constructing a passenger selection behavior real-time correction model, wherein the passenger selection behavior real-time correction model is used for dynamically adjusting the passenger selection behavior parameter in the station passenger flow evolution simulation model;
the building module is used for embedding the passenger selection behavior real-time correction model into the station passenger flow evolution simulation model and building the station passenger flow real-time deduction model based on reinforcement learning.
The embodiment of the invention also provides an electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the above method for establishing a station passenger flow real-time deduction model based on reinforcement learning.
The embodiment of the invention also provides a computer-readable storage medium on which a program for implementing information transmission is stored, and the program, when executed by a processor, implements the steps of the above method for establishing a station passenger flow real-time deduction model based on reinforcement learning.
The embodiment of the invention can have the following beneficial effects: the embodiment first analyzes the causal relations formed by the station system structure and the passenger flow and, on this basis, builds a station passenger flow evolution simulation model based on system dynamics; it then uses real-time monitoring video data to study the time-varying law of passenger selection behavior, builds a passenger selection behavior real-time correction model, and embeds this model into the simulation model, thereby realizing station passenger flow real-time deduction simulation driven by real-time data. This helps to obtain accurate information on station passenger flow distribution in different scenes and provides theoretical support for discovering potential passenger flow risks, identifying bottleneck nodes, and adjusting passenger flow organization strategies.
Drawings
In order to more clearly describe one or more embodiments of the present description or the solutions of the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments described in the specification, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for establishing a station passenger flow real-time deduction model based on reinforcement learning according to an embodiment of the invention;
FIG. 2 is a station topology diagram of an embodiment of the present invention;
FIG. 3 is a logical analysis diagram of platform passenger flow according to an embodiment of the present invention;
FIG. 4 is a logical analysis diagram of station hall passenger flow according to an embodiment of the present invention;
FIG. 5 is a logical analysis diagram of transfer passenger flow according to an embodiment of the present invention;
FIG. 6 is a diagram of parent-child node relationships of an embodiment of the present invention;
FIG. 7 is a flow chart of a passenger selection behavior real-time correction logic of an embodiment of the present invention;
FIG. 8 is a block diagram of a station passenger flow real-time deduction simulation flow according to an embodiment of the invention;
FIG. 9 is a graph comparing results of a simulation of the passage of passenger flow through a station entrance in a non-workday scenario according to an embodiment of the present invention;
FIG. 10 is a graph comparing simulation results of passing passenger flow in a transfer passage in a non-workday scenario according to an embodiment of the present invention;
FIG. 11 is a comparison chart of simulation results of the passing of passenger flow through stairway openings in a hall in a non-workday scene according to an embodiment of the invention;
FIG. 12 is a graph showing a comparison of simulation results of the passage of passenger flow through an entrance under a workday scenario in accordance with an embodiment of the present invention;
FIG. 13 is a graph showing comparison of simulation results of passing passenger flow in a transfer passage in a workday scenario according to an embodiment of the present invention;
FIG. 14 is a comparison of simulation results of the passing of passenger flow through the stairway opening in the hall in a workday scenario according to an embodiment of the present invention;
FIG. 15 is a graph showing a comparison of simulation results of the passage of passenger flow through a station entrance in a holiday scene according to an embodiment of the present invention;
FIG. 16 is a graph showing a comparison of simulation results of passing passenger flow in a transfer passage in a holiday scenario according to an embodiment of the present invention;
FIG. 17 is a graph showing a comparison of simulation results of the passage of passenger flow through stairway openings in a hall in a holiday scene according to an embodiment of the invention;
fig. 18 is a schematic diagram of a system for building a real-time deduction model of passenger flow of a station based on reinforcement learning according to an embodiment of the present invention.
Detailed Description
In order to enable a person skilled in the art to better understand the technical solutions in one or more embodiments of the present specification, the technical solutions in one or more embodiments of the present specification will be clearly and completely described below with reference to the drawings in one or more embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one or more embodiments of the present disclosure without inventive faculty, are intended to be within the scope of the present disclosure.
Before the technical solution of the embodiment of the invention is described in detail, some related concepts are first described.
1. System dynamics method
System dynamics is a comprehensive science that studies the dynamic behavior of systems. Its basic idea is to adopt the viewpoint of system analysis: on the premise of defining the system problem, set the system objective, analyze the structure and functions of the system, and refine the causal relations among the system elements, thereby establishing a series of differential equations with time lags to form a simulation model of the system, and dynamically correcting system behavior through the feedback mechanisms among the elements so as to make corresponding decisions. The method can model social system problems in a general sense, construct a complex system model composed of many elements by combining qualitative analysis with quantitative representation, obtain information about system behavior by means of simulation, and analyze and study that behavior, thereby providing a scientific basis for correct decisions. The core of system dynamics is the structure of the system and the feedback mechanisms among the elements that make up that structure. The structure of the system determines its function; the feedback mechanisms reflect the causal relations among the elements; the system function is the key to solving the system problem, and the feedback mechanisms are the key to information transmission and behavior correction among the elements in the system.
System dynamics absorbs the advantages of multiple disciplines such as system theory, control theory, information theory and decision theory, and therefore has the following advantages:
(1) It converts the studied social problem into a system composed of many elements, establishes a series of differential equations by analyzing the structure of the system and the causal relations among the elements, constructs a corresponding system simulation model, obtains dynamic behavior information of the system through simulation, and outputs a decision result;
(2) The system dynamics model is a closed-loop system based on information feedback; information circulates according to the causal relations and feedback mechanisms among the elements, so the requirements on the amount and precision of the input information are not high and high-precision parameters are not required;
(3) The model is constructed from a series of time-lagged differential equations and is therefore very flexible: even if the system structure changes and certain elements are added or removed, the equations can be adjusted dynamically;
(4) The system dynamics model is a structure-imitating model constructed by combining qualitative analysis with quantitative calculation; its behavior pattern and characteristics are mainly determined by the structure and feedback mechanisms of the system, so it can better reflect the real system and can be applied to complex time-varying system problems with high-order, nonlinear, multi-feedback and delay effects.
A complete system dynamics model consists of a system structure diagram, a causal relation diagram among elements, a flow diagram and structural equations. Analysis of the actual social system problem determines the structure and function of the system; the internal relations of the system form the causal relations among the elements, which are finally embodied in a quantitative set of structural equations.
(1) System structure diagram
To construct the system structure diagram, first define the constituent elements of the system and analyze the characteristics and functions of each element, their connection relations and spatial distribution; then judge the input and output information and the internal information generated during system operation, as well as the influence of this information on the system state, so as to form a topological structure of the system model that fully embodies the system functions.
(2) Causal relationship graph
The causal relation graph reflects the interactions among all elements in the system and consists mainly of causal arrows, causal chains and causal loops (feedback loops). A causal arrow is a directed line segment connecting cause and effect elements: its tail lies on the cause element and its head on the result element. According to the effect, causal relations are divided into positive (+) and negative (-): a positive causal relation means the cause and the result change in the same direction, and a negative causal relation means they change in opposite directions. Connected causal arrows form a causal chain, i.e. a recursion of causal relations. The polarity of a causal chain is determined by the signs of its arrows: if the chain contains an odd number of negative causal arrows, it is a negative causal chain; if it contains an even number of negative arrows, or only positive arrows, it is a positive causal chain. A closed loop formed by several causal chains is called a causal loop; it is closed and connected end to end, and it reflects the interaction of causal relations among elements: a cause produces a result, and the result in turn influences the conditions that produced the cause, so the cause itself changes; seen from the loop, the causal relation is therefore relative. The criterion for judging the polarity of a loop is the same as for a causal chain. A positive loop drives the elements to keep developing in the original direction and therefore has a self-reinforcing (or self-weakening) effect; a negative loop makes the elements change in the direction opposite to the original direction and therefore acts as an internal regulator (stabilizer).
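For illustration only, the polarity rule above (an odd number of negative arrows makes a chain or loop negative) can be sketched as a small function; the function name and the list representation of arrow signs are assumptions introduced here, not part of the patent text.

```python
# Illustrative sketch: determine the polarity of a causal chain or loop from the
# signs of its causal arrows (odd number of '-' arrows -> negative polarity).

def chain_polarity(arrow_signs):
    """arrow_signs: list of '+' or '-' along a causal chain or closed loop."""
    negatives = sum(1 for s in arrow_signs if s == '-')
    return '-' if negatives % 2 == 1 else '+'

# Example: a loop with two negative arrows is a positive (self-reinforcing) loop.
print(chain_polarity(['+', '-', '-']))  # '+'
print(chain_polarity(['+', '-', '+']))  # '-'
```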
(3) System flow diagram
The flow diagram is the basic form used by system dynamics to describe the direction of information transmission in the system, and is also the basis for constructing a system dynamics model. Its symbols are as follows:
1) Flow: activities and behaviors in the system, represented as directed edges in the flow diagram, including flows of real quantities (solid lines) and flows of information (dashed lines);
2) Level (stock): the quantity describing the state of a subsystem or element in the system; it is the accumulation of the physical flow and reflects how the system state changes over time, and is represented as a rectangular box in the flow diagram;
3) Rate (flow rate): the change of a flow in the system over time; it controls and influences the levels, is used to form decision functions in the system model, and is represented as a valve in the flow diagram;
4) Auxiliary variables: variables used to simplify the flow rates and make complex functions easier to understand;
5) Parameters: constants in the system whose values remain unchanged during system operation.
(4) Structural equation
The structural equations are the quantitative representation of a system dynamics model and the precondition for realizing system simulation. Only by constructing structural equations that conform to the actual state of the system and reflect the state change of each element can information be accurately transmitted in the system. They include level equations, rate equations, auxiliary equations, constant equations, initial-value equations and the like.
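Purely as an illustrative sketch of how a level equation of this kind is evaluated numerically (the function and variable names here are assumed, not taken from the patent), a level (stock) can be accumulated from its inflow and outflow rates by simple Euler integration:

```python
# Minimal stock-flow sketch: a level is accumulated from inflow and outflow rates,
# which is how level equations of a system dynamics model are evaluated in discrete time.

def simulate_level(inflow_rate, outflow_rate, level0, dt, steps):
    """inflow_rate / outflow_rate: functions of time t returning a rate (persons/s)."""
    level = level0
    history = [level]
    for k in range(steps):
        t = k * dt
        # level equation: dLevel/dt = inflow(t) - outflow(t)
        level += (inflow_rate(t) - outflow_rate(t)) * dt
        history.append(level)
    return history

# Example (assumed values): constant inflow 2 persons/s, outflow 1.5 persons/s, 60 s horizon.
print(simulate_level(lambda t: 2.0, lambda t: 1.5, level0=0.0, dt=1.0, steps=60)[-1])
```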
2. Reinforcement learning Q-learning method
Reinforcement learning is a machine learning method that learns through continuous interactive feedback with the environment. Its goal is to obtain the maximum reward: the agent keeps exploring in its interaction with the environment, obtains feedback signals about the results of its actions, uses the feedback as guidance for the next behavior, and thereby learns the optimal behavior pattern for the task environment, so as to improve and update its behavior strategy and reach the optimal state.
The basic framework of reinforcement learning consists of an agent and an environment. Given an unknown environment, the agent must gradually find the optimal behavior through continuous feedback and trial-and-error learning to achieve the optimal goal; the motivation is to improve learning capability and adaptability through continuous feedback.
Q-learning is one of the classical reinforcement learning methods. It is based on the value of actions: by evaluating the learning result of the model, it learns the optimal strategy for the agent to take actions in a given environment and optimizes the action strategy through iterative trials, so a detailed system model does not need to be constructed and the method is highly adaptable.
Method embodiment
According to an embodiment of the present invention, a method for building a station passenger flow real-time deduction model based on reinforcement learning is provided. FIG. 1 is a flowchart of the method according to an embodiment of the present invention; as shown in FIG. 1, the method specifically includes:
step S101: based on system dynamics, identifying and judging the flow causality relationship of passengers between a station system structure and each region, and constructing a station passenger flow evolution simulation model according to the identification and judgment results, wherein the method specifically comprises the following steps: dividing the station into three areas of a station platform, a station hall and a transfer channel, logically analyzing passenger flow states, facilities and passenger flow causal relations in the three areas, generating a passenger flow causal relation structure tree of a station system structure, quantifying the passenger flow causal relation based on the passenger flow causal relation structure tree, constructing a system dynamics equation, and constructing the station passenger flow evolution simulation model based on the system dynamics equation, wherein the system dynamics equation specifically comprises: a station hall traffic volume change equation, a station traffic volume change equation, a general channel traffic volume change equation, a transfer channel traffic volume change equation, a traffic volume change equation at a staircase in the station hall, a traffic volume change equation at a ticket checking place, a get-off traffic volume change equation and a get-on traffic volume change equation;
Step S102: based on real-time monitoring video data and with minimum simulation error as a target, continuously learning and optimizing the passenger selection behavior parameter value by using a reinforcement learning algorithm, and constructing a passenger selection behavior real-time correction model, wherein the passenger selection behavior real-time correction model is used for dynamically adjusting the passenger selection behavior parameter in the station passenger flow evolution simulation model and specifically comprises the following steps:
based on real-time monitoring video data, introducing real-time video passenger flow data, comparing and verifying real passenger flow and simulated passenger flow at a monitoring area, taking a comparison verification result as input of parameters of a passenger selection behavior real-time correction model, continuously learning and optimizing the passenger selection behavior parameters by using a reinforcement learning algorithm Q-learning with minimum simulation errors as a target, maintaining dynamic changes of the passenger selection behavior parameters, and constructing a passenger selection behavior real-time correction model, wherein the passenger selection behavior parameters comprise path selection behaviors and equipment facility selection behaviors;
step S103: embedding a passenger selection behavior real-time correction model into a station passenger flow evolution simulation model to establish a station passenger flow real-time deduction model based on reinforcement learning, wherein the method specifically comprises the following steps of:
Step 1: set the time t = T_s, where T_s is the simulation start time, and set the time step to Δt;
step 2: initialize the simulation environment and load data to construct the station passenger flow evolution simulation model, where the loaded data include: the inbound passenger flow, outbound passenger flow and alighting passenger flow, the length, width, height, maximum capacity, maximum safe capacity and maximum inflow/outflow capacities of each node, the maximum speed of passengers in each node, the inflow/outflow relations among the nodes, and the initial values of the passenger selection behavior parameters;
step 3: judge whether the current time t has reached the simulation end time T_q; if t < T_q, execute step 4, otherwise end the simulation;
step 4: acquiring real-time video passenger flow data, embedding a passenger selection behavior real-time correction model, and calculating an error value of the simulated passenger flow data and the real passenger flow data at a station monitoring node;
step 5: establishing a station passenger flow real-time deduction model based on reinforcement learning, and updating a Q value and an error value to reach a termination condition, wherein the termination condition is whether the error value of the simulated passenger flow data and the real passenger flow data at a monitoring node reaches the required precision or reaches the iteration number specified by a system;
step 6: updating passenger selection behavior parameters of the selection nodes;
Step 7: updating the inflow passenger flow volume and the outflow passenger flow volume of the train nodes of the station;
step 8: updating inflow passenger flow and outflow passenger flow of other nodes except the train node of the station;
step 9: updating the passenger flow stock of each node of the station;
step 10: updating the passenger flow volume allowed to flow out by each node of the station at the next time step;
step 11: updating dynamic attributes of all nodes of the station, wherein the dynamic attributes comprise passenger flow density, passenger flow speed and traveling time of passengers passing through all nodes;
step 12: updating the time step, enabling t=t+Δt, and returning to the step 3;
the method further comprises: and carrying out station passenger flow real-time deduction by using the station passenger flow real-time deduction model.
The technical solution of the embodiment of the invention is described in detail below with reference to the method for establishing a reinforcement-learning-based station passenger flow real-time deduction model according to the embodiment of the invention, as shown in FIG. 1.
1. Station system structure and passenger flow causality analysis among areas
Based on investigation and analysis of rail transit stations, the embodiment of the invention abstracts a station into a complex system composed of passenger flow states, facilities with various functions, and passenger flow causal relations. The passenger flow states and the facilities are the components of the system, and the passenger flow causal relations determine how passengers interact while flowing in the station environment. The function of the system is to simulate the evolution of passenger flow in the station over time, and the goal of the system is to reflect the travel demands of passengers by optimizing the system.
(1) Composition and connection relation of station facilities
The facilities in a station mainly include entrances/exits, walking channels, gates, the station hall, escalators, the platform and so on. According to function, they are divided into accommodation facilities, which have capacity to hold passengers, and passage facilities, through which passengers pass. Accommodation facilities include the area-type facilities and the channels in the connecting-line areas; they provide space for passengers to stay, and when large passenger flows occur, passengers generally accumulate inside these facilities. Passage facilities include the junction areas other than channels, such as stairs and escalators; their function is mainly to let passengers pass through, so their capacity is relatively small, and when large passenger flows occur, queues generally form outside these facilities.
The connection relations between the nodes are determined by the spatial layout of the station and the passenger flow organization, and are represented as the walking paths of passengers in the station; the topological structure of the connection relations is shown in FIG. 2.
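As an illustration of the node connection relation of the kind shown in FIG. 2, such a topology can be held as a simple directed adjacency structure; the node names below are assumed examples rather than the actual station layout.

```python
# Illustrative sketch: station topology as a directed graph (parent node -> child nodes).
# Node names are assumed for the example only.

station_graph = {
    "entrance_A":       ["ticket_check_1"],
    "ticket_check_1":   ["hall"],
    "hall":             ["escalator_down", "stairs_1"],
    "escalator_down":   ["platform"],
    "stairs_1":         ["platform"],
    "platform":         ["train", "transfer_channel"],
    "transfer_channel": ["platform_line2"],
}

def successors(node):
    """Child (inflow) nodes reachable from a parent (outflow) node."""
    return station_graph.get(node, [])

print(successors("hall"))  # ['escalator_down', 'stairs_1']
```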
(2) Passenger flow state
The running of the system model is the process of passengers circulating in the system, and the passenger flow state at each facility is the information generated by the system elements, so the passenger flow state is an important factor in the model. The passenger flow state in the system is mainly influenced by the passenger flow entering the system and by the position, function, attributes and passenger flow organization mode of each facility. The passenger flow state mainly includes two factors: passenger flow volume and passenger flow lines. The dynamic deduction of the system is the process of transferring passenger flow along the flow lines, which further forms a flow-line-based logical relation diagram among facilities; the logical relations of passenger flow among facilities are the basis for building the stock-flow diagram of the model, so the passenger flow structure and passenger flow relations of the facilities in the station system need to be analyzed and studied. A rail transit transfer station has a complex spatial structure and many passenger flow types, passenger flow volumes and flow lines, so a transfer station is selected as the analysis object, and "station" hereafter refers to a transfer station. Because the security check and the gates are close to each other and their passenger flow states are similar, the combination of security check and gates is treated as the ticket checking area.
The embodiment of the invention divides the passenger flow in the station according to the three areas of platform, station hall and transfer channel, analyzes the logical relations of the passenger flow contained in the three areas respectively, and draws the analysis results in the form of structure trees; the results are shown in FIGS. 3 to 5.
(3) Causal relationship of passenger flow change of station facilities
Based on the analysis of the spatial structure of the station and the logical relations of passenger flow among facilities, the circulation and change relations of passenger flow in the station system can be constructed as a whole, forming the passenger flow causal relations of the station system. The causal relations take the station hall passenger flow and the platform passenger flow as the two cores: the station hall passenger flow is mainly influenced by the inbound passenger flow, the transfer passenger flow and the boarding passenger flow, and the platform passenger flow is mainly influenced by the alighting passenger flow, the transfer passenger flow and the boarding passenger flow.
2. Building station passenger flow evolution simulation model equation
On the basis of analyzing the station system structure and the passenger flow causal relations, the flow information of the passenger flow is further quantified and the system dynamics equations of the station passenger flow evolution simulation model are constructed, so as to realize a quantitative representation of the passenger flow state at each facility, obtain real-time passenger flow distribution data, and provide support for subsequent optimization and adjustment of the passenger flow organization mode.
(1) Accommodation type facility passenger flow
1) Station hall passenger flow stock change equation
P_hall = ∫[Flow_enter(t) + Flow_channel-in(t) + Flow_esca-in(t) - Flow_out(t) - Flow_channel-out(t) - Flow_esca-out(t)]dt   (equation 1);
where P_hall is the station hall passenger flow stock, Flow_enter(t) is the passenger flow entering the station at time t, Flow_channel-in(t) is the passenger flow entering the hall from the channels at time t, Flow_esca-in(t) is the passenger flow entering the hall from the escalators at time t, Flow_out(t) is the passenger flow leaving the station at time t, Flow_channel-out(t) is the passenger flow leaving the hall via the channels at time t, and Flow_esca-out(t) is the passenger flow leaving the hall via the escalators at time t;
2) Platform passenger flow stock change equation
P_plat = ∫[Flow_train-in(t) + Flow_plat-in(t) - Flow_train-out(t) - Flow_plat-out(t)]dt   (equation 2);
where P_plat is the platform passenger flow stock, Flow_train-in(t) is the passenger flow arriving at the platform from an arriving train at time t, Flow_plat-in(t) is the passenger flow entering the platform from other areas of the station at time t, Flow_train-out(t) is the passenger flow leaving on the arriving train at time t, and Flow_plat-out(t) is the passenger flow leaving the platform for other areas of the station at time t;
3) General channel passenger flow stock change equation
P_channel = ∫[Flow_channel-in(t) - Flow_channel-out(t)]dt   (equation 3);
where P_channel is the channel passenger flow stock, Flow_channel-in(t) is the passenger flow entering the channel from other areas of the station at time t, and Flow_channel-out(t) is the passenger flow leaving the channel for other areas of the station at time t;
4) Transfer channel passenger flow stock change equation
P_transfer-channel = ∫[Flow(t)]dt
Flow(t) = Flow_channel-in(t) + Flow_transfer-in(t) - Flow_channel-out(t) - Flow_transfer-out(t)   (equation 4);
where P_transfer-channel is the transfer channel passenger flow stock, Flow_channel-in(t) is the passenger flow entering the channel from other areas of the station at time t, Flow_channel-out(t) is the passenger flow leaving the channel for other areas of the station at time t, Flow_transfer-in(t) is the transfer passenger flow entering the transfer channel at time t, and Flow_transfer-out(t) is the transfer passenger flow leaving the transfer channel at time t;
(2) Passenger flow of traffic facilities
1) Passenger flow stock change equations at the escalators in the station hall
(1) Down-direction escalator:
P_esca-down = ∫[Flow_hall-in(t) - Flow_plat-out(t)]dt   (equation 5);
where P_esca-down is the passenger flow stock on the down-direction escalator in the station hall, Flow_hall-in(t) is the passenger flow entering the escalator from the station hall at time t, and Flow_plat-out(t) is the passenger flow leaving the escalator and entering the platform at time t;
(2) Up-direction escalator:
P_esca-up = ∫[Flow_plat-in(t) - Flow_hall-out(t)]dt   (equation 6);
where P_esca-up is the passenger flow stock on the up-direction escalator in the station hall, Flow_plat-in(t) is the passenger flow entering the escalator from the platform at time t, and Flow_hall-out(t) is the passenger flow leaving the escalator and entering the station hall at time t;
2) Passenger flow change equation at ticket gate
F_zj(t) = min{n_zj · v_zj(t), P_zj}   (equation 7);
where F_zj(t) is the passing passenger flow at the ticket gates, n_zj is the number of gates, v_zj(t) is the passing speed at the ticket gates, and P_zj is the passenger flow that the ticket checking area can accommodate;
3) Alighting passenger flow change equation
Flow_train-out(t) = bool_train · k · Train_mzl · Train_dy · v_xc   (equation 8);
where Flow_train-out(t) is the alighting passenger flow, bool_train indicates whether a train has arrived at the station and takes the value 0 or 1, k is the proportion of passengers alighting, Train_mzl is the train full-load rate, Train_dy is the rated passenger capacity of the train, and v_xc is the speed at which passengers alight;
4) Boarding passenger flow change equation
Flow_train-in(t) = bool_train · min{(Train_mzl · Train_dy - Flow_train-out(t)), Flow_plat(t)}   (equation 9);
where Flow_train-in(t) is the boarding passenger flow.
3. Building passenger selection behavior real-time correction model based on reinforcement learning
(1) Q-learning method
In the Q-learning computation, each agent action-environment combination is represented by a different Q value, which reflects the expected return the agent may obtain when performing the corresponding action. The agent selects the action with the maximum Q value according to the current state and then updates the Q value, so as to improve the result of the action as much as possible. The core idea of Q-learning is to maximize the expected total reward by updating the Q value with the current best behavior in each iteration; in other words, the learning process of Q-learning is the process of training the Q values. The table holding the Q values is the Q table: each row represents a different state, and each column represents an action that can be taken in that state. The update rule of the Q value is as follows:
Q(s, a) ← Q(s, a) + α(r + γ·maxQ(s′, a′) - Q(s, a))   (equation 10);
where Q(s, a) is the Q value corresponding to selecting action a in state s, α is the learning rate, r is the reward produced by selecting action a in state s, γ is the discount factor, which represents the importance the agent attaches to future returns, and maxQ(s′, a′) is the maximum Q value over all selectable actions in the next state s′, the corresponding action being a′.
The specific flow of Q-learning is as follows:
step 1: initialize the environment and the algorithm parameters (maximum number of training episodes, discount factor γ, reward function R and evaluation matrix Q);
step 2: randomly select an initial state s; if s is already the target state, the episode ends and the initial state is reselected;
step 3: randomly select an action a from all possible actions in the current state s, each action being selected with equal probability;
step 4: after action a is taken in the current state, the state transitions to s′;
step 5: update the Q matrix using equation 10;
step 6: set the next state as the current state, s = s′; if s has not reached the target state, go to step 3;
step 7: if the algorithm has not reached the maximum number of training episodes, go to step 2 and enter the next episode; otherwise training ends and the converged Q matrix is obtained.
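A minimal tabular Q-learning sketch following equation 10 and steps 1 to 7 above is given below; the toy environment (states, actions, rewards) is assumed purely for illustration and is not the station environment of the embodiment.

```python
# Illustrative tabular Q-learning with the update rule of equation 10.
import random

def q_learning(n_states, n_actions, step, episodes=200, alpha=0.1, gamma=0.9):
    """step(s, a) -> (next_state, reward, done); returns the learned Q table."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = random.randrange(n_states)           # step 2: random initial state
        done = False
        while not done:
            a = random.randrange(n_actions)      # step 3: exploratory action choice
            s_next, r, done = step(s, a)         # step 4: transition to s'
            # step 5 / equation 10: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next                           # step 6: next state becomes current
    return Q

# Toy example: reach state 3 on a 4-state chain; action 1 moves right, action 0 stays.
def step(s, a):
    s_next = min(s + a, 3)
    return s_next, (1.0 if s_next == 3 else 0.0), s_next == 3

print(q_learning(4, 2, step)[0])
```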
(2) Passenger selection behavior analysis
The movement of a passenger in the station is a process involving multiple selection behaviors. A passenger's selection behavior refers to the selection of a node when, after flowing out of some node, several inflow nodes are available, on the basis of the passenger flow direction determined after the passenger enters the station network. The outflow node is called the parent node and the inflow node the child node; a parent node may have several child nodes and a child node may have several parent nodes, so the parent/child relation is relative. For a specific node, the passenger's selection behavior mainly includes path selection behavior and equipment/facility selection behavior. As shown in FIG. 6, a node is both the child node of its upstream node and the parent node of its downstream node.
(1) Path selection behavior
The passenger's path selection behavior refers to the passenger's selection of areas or facilities on different flow lines depending on the destination. The path selection behavior of the passenger may be divided according to the spatial location of the parent node, as shown in table 1.
Table 1 path selection behavior classification table
(2) Device facility selection behavior
The equipment/facility selection behavior of passengers refers to the selection among parallel facilities with similar functions as passengers move along a flow line, mainly including the choice between stairs and escalators, the choice of gate, and the choice of platform door, as shown in Table 2.
Table 2 device facility selection behavior
(3) Building passenger selection behavior real-time correction model
The movement of passengers in a station is a process involving many selection behaviors, including the selection of entrance gates, the selection of stairs or escalators, the selection of walking paths, and so on. Accurate calculation of passenger selection preferences is the basis for constructing the station passenger flow simulation model. Calculating passenger selection parameters from historical data can reflect passenger travel preferences to a certain extent, but passengers are easily influenced by many factors such as the external environment, time and subjective psychology, and their selection behavior differs between scenes; especially when large passenger flows occur, the characteristics of passenger selection behavior are more difficult to capture. As a result, simulation results deviate from real data, which affects the formulation of subsequent passenger flow organization and passenger flow control measures.
The basic idea for constructing the passenger selection behavior real-time correction model is as follows: first, fit the parameters in the model using historical data and experience to obtain the passenger selection behavior preferences and the corresponding initial parameter values, and construct the passenger flow evolution simulation model; on this basis, introduce real-time video passenger flow data, compare and verify the real passenger flow and the simulated passenger flow at the monitored areas, and take the comparison result as the input of the passenger selection behavior real-time correction model; with minimum simulation error as the objective, continuously learn and optimize the passenger selection behavior parameter values using the reinforcement learning algorithm. This process runs continuously, keeping the selection behavior parameters changing dynamically, so that the simulation results of the passenger flow evolution simulation model stay as close as possible to the real data. The logic flow is shown in FIG. 7.
First, the agent takes the current passenger flow distribution information simulated for the station as the state and inputs it into the Q-learning model; then, according to the Q-value update formula, it selects increasing or decreasing each selection behavior parameter value as the action. The state and action are input into the environment, which executes the passenger flow simulation based on the current action, thereby obtaining the error between the simulated passenger flow and the real data at the monitored areas. This error is input into the Q-learning model again, together with the next state, as the reward, and the process is iterated continuously.
The reinforcement-learning-based real-time parameter correction method comprises the elements environment E, state S, action A and reward R. The environment E is the passenger flow scene simulated by the station passenger flow evolution simulation model; the state S is the passenger flow state information at the monitored areas obtained from the simulation model; the action A is the change of the values of the parameters in the passenger selection behavior parameter set; and the reward R is the error between the simulated passenger flow information and the real passenger flow information, so the goal of the reinforcement learning model in the embodiment of the invention is to minimize the reward value.
(1) Passenger flow environment E
The environment refers to the simulated evolution of passenger flow in the station, i.e. the process from a passenger appearing outside the station, or on the platform, to the passenger leaving the station; it covers both passengers from entering to leaving and passengers from alighting to leaving. The flow interactions and selection behaviors of passengers in the station take place in this environment, and after the selection behavior parameters are modified, the change in passenger behavior is reflected in the environment. However, considering the randomness of passenger choices, in order to keep the values of the selection behavior parameters stable, each update period is set to T_am time steps, and the passenger selection behavior parameter values are updated once per period.
(2) Passenger flow state S
The passenger flow state refers to the passing passenger flow volume at a monitored area or the passenger flow density within the area, and is used to reflect the accuracy of the simulation results of the simulation model. The passenger flow state is defined in terms of s_n,t, which represents the passing passenger flow volume or passenger flow density of area n at time t, where n ∈ N_camera, the set of monitored area points. The passenger flow state information is chosen according to the function of the area: for passage facilities the passing passenger flow volume is used as the area passenger flow state, and for accommodation-type areas the passenger flow density is used as the area passenger flow state.
(3) Action A
The action refers to the change of a selection behavior parameter value in the simulation model, i.e. the change of the probability of a passenger selecting a certain path or facility, which may increase, remain unchanged or decrease. The selection parameter action is defined as a_t = {a_1,t, ..., a_m,t, ..., a_M,t}, where a_m,t represents the probability of a passenger selecting the m-th selection node at time t, and m ∈ N_choice, the set of selection nodes. a_m,t can take 3 action states, namely:
a_m,t = [a-, 0, a+]   (equation 11);
where a- indicates a decrease of the parameter value, 0 indicates that the parameter value remains unchanged, and a+ indicates an increase of the parameter value. The magnitude of each increase or decrease per update is 0.1. Since a_m,t is a probability value, a_m,t ∈ [0, 1]; a_m,t = 0 means the passenger will not select the node at all at time t, and a_m,t = 1 means the passenger will certainly select the node at time t.
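The action defined by equation 11 can be illustrated with the following sketch, in which each selection parameter is increased by 0.1, kept, or decreased by 0.1 and then clipped to [0, 1]; the dictionary-based representation and node names are assumptions made for the example.

```python
# Illustrative application of an action to the selection-behaviour parameters (equation 11).

ACTION_DELTAS = {"decrease": -0.1, "keep": 0.0, "increase": +0.1}

def apply_actions(params, actions):
    """params: {node_id: probability a_m,t}; actions: {node_id: 'increase'|'keep'|'decrease'}."""
    updated = {}
    for node, p in params.items():
        p_new = p + ACTION_DELTAS[actions.get(node, "keep")]
        updated[node] = min(1.0, max(0.0, p_new))   # a_m,t must remain a probability in [0, 1]
    return updated

print(apply_actions({"stairs": 0.35, "escalator": 0.65},
                    {"stairs": "increase", "escalator": "decrease"}))
```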
(4) Reward function R
The reward function is the direct representation of the reinforcement learning objective and determines the convergence speed and degree of the reinforcement learning algorithm. The aim of the real-time correction model in the embodiment of the invention is to minimize the error between the simulated passenger flow state and the real passenger flow state, so the reward function (equation 12) is set as the error between the real passenger flow state of area n at time t and the simulated passenger flow state of area n at time t over the monitored areas.
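For illustration, the reward computation can be sketched as follows; the exact error form of equation 12 is not reproduced here, so a mean absolute error over the monitored areas is assumed.

```python
# Illustrative reward: error between real and simulated passenger-flow states at the
# monitored areas (assumed here to be a mean absolute error); the model seeks to minimise it.

def reward(real_states, sim_states):
    """real_states / sim_states: {region_id: passing flow or density at time t}."""
    return sum(abs(real_states[n] - sim_states[n]) for n in real_states) / len(real_states)

print(reward({"gate_A": 120, "transfer": 80}, {"gate_A": 110, "transfer": 95}))  # 12.5
```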
The algorithm flow for constructing the passenger selection behavior real-time correction model based on reinforcement learning is shown in the table 3:
table 3 flow of method for correcting passenger selection behavior in real time based on reinforcement learning
4. Building station passenger flow real-time deduction model based on reinforcement learning
The embodiment of the invention provides a multi-scene-oriented station passenger flow real-time deduction method based on video data, which introduces monitoring video data into the simulation model, corrects the passenger selection behavior parameters in real time according to the current passenger flow state, and enhances the time-varying capability of station passenger flow simulation deduction, thereby improving the accuracy of the simulation model.
(1) Model hypothesis
In order to improve the simulation efficiency of the model, the following assumptions are made for the station passenger flow real-time deduction model provided by the embodiment of the invention:
1) Passengers arrive at the station, queue and wait, and enter the station to receive service; the possibility that passengers give up their trip is not considered;
2) The train arrival time is strictly carried out according to a schedule, and no late point exists;
3) Boarding passengers do not voluntarily stay behind and wait for a later train;
4) Platform waiting passengers all follow the principle of 'getting off before getting on';
5) The passing state of all facility equipment of the station is good.
(2) Description of variables
According to their different functions, the station is divided into several areas, each containing one or more nodes. The node set includes virtual nodes, i.e. nodes that have certain functions but occupy no physical space in the station, such as the train nodes at which passengers arrive and the nodes outside the station where passengers wait to enter. The variables in the model and their descriptions are given in Table 4.
Table 4 model variables and description
(3) Constraint condition and evaluation index
A strict mathematical model is established for the flow process inside the passenger station, including capacity constraints of the nodes, supply-demand constraints of passenger flow between nodes, train timetable constraints, and so on.
1) Node capacity constraint
The capacity of a node to accommodate passenger flow is limited. The maximum number of passengers that node n can accommodate is determined by the physical conditions of node n, and equation 13 requires that the number of passengers ac_t,n accommodated by node n at time t be less than or equal to this maximum.
2) Constraint of passenger flow supply and demand relation between nodes
The flow of passenger flow between nodes is limited simultaneously by the outflow of the previous node and the capacity of the next node: the passenger flow flowing out of node n at time t cannot exceed the total number of passengers waiting to flow out of node n at time t, and the passenger flow flowing into node n at time t cannot exceed the ability of node n to receive inflow. The supply-demand relation constraints are defined in equations 14 to 16, where t ranges over the simulation period and n ∈ N.
e_t,n ≤ o_t,n   (equation 14);
equation 14 shows that the amount of the passenger flow flowing out of node n at time t is limited by the outflow demand of node n, and the outflow passenger flow can only be less than or equal to the total passenger flow with outflow demand in node n;
as can be seen from equation 15, the number of passengers flowing into node n at time t is limited by the residual capacity of node n, and the inflow passenger flow is not larger than the difference between the maximum accommodated passenger flow and the actual accommodated passenger flow of node n, and is also limited by the inflow capacity of node n Limiting, the inflow capacity is determined by the physical condition of the node n, i.e. the maximum number of passengers that can simultaneously flow into the node n;
ac t,n =ac t-1,n -e t,n +f t,n equation 16;
equation 16 gives the relationship between the traffic accommodated by node n at time t and the incoming and outgoing traffic, in equation 16 ac t-1,n And ac t,n The number of passengers, ac, accommodated by node n at times t-1 and t, respectively t,n Is calculated as ac t-1,n Subtracting the flow of the passenger out at the moment t plus the flow of the passenger in at the moment t.
3) Train schedule constraints
The alighting and boarding passenger flows are constrained by the train timetable. In general, a train can be in one of three states according to passenger behavior: state 1, a train has entered the station and passengers alight; state 2, alighting is finished and the passengers waiting on the platform board the train; state 3, no train is in the station, passengers wait on the platform, and the numbers of alighting and boarding passengers are both 0.
Assuming that passengers in the station follow the principle of alighting first and boarding afterwards: in state 1 the alighting passenger flow is $l_t$ and the boarding passenger flow is 0; in state 2 the alighting passenger flow is 0 and the boarding passenger flow is the sum of all passenger flows from the platform node to the train node at time $t$; otherwise both the alighting and the boarding passenger flows are 0.
The train arrival times limit the number of passengers flowing out of the platform nodes, thereby affecting the passenger density at those nodes; they also determine the departure times of alighting passengers, which is an important factor in calculating the average dwell time of passengers in the station.
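The three train states and the "alight first, then board" rule reduce to a simple case distinction, sketched below. This is only an illustration: the enum names and the function signature are assumptions, while the flow values follow the text (alighting flow l_t in state 1, boarding flow equal to the sum of platform-to-train node flows in state 2, zero otherwise).

```python
from enum import Enum

class TrainState(Enum):
    ALIGHTING = 1   # state 1: a train is at the platform and passengers get off
    BOARDING = 2    # state 2: alighting has finished, waiting passengers get on
    NO_TRAIN = 3    # state 3: no train at the platform

def train_flows(state: TrainState, alighting_flow: int,
                platform_to_train_flows: list[int]) -> tuple[int, int]:
    """Return (alighting passengers, boarding passengers) for the current time step."""
    if state is TrainState.ALIGHTING:
        return alighting_flow, 0                  # l_t passengers leave the train, nobody boards
    if state is TrainState.BOARDING:
        return 0, sum(platform_to_train_flows)    # boarding = sum of platform -> train node flows
    return 0, 0                                   # no train: both flows are zero
```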
4) Relation constraint between passenger flow speed, density and flow
The relationships among passenger speed ($v$), density ($k$), and passenger flow volume ($f$) are given in Equations 18 to 21.

$k_{t,n} = ac_{t,n} / s_n$, $n \in N$ (Equation 18);

Equation 18 computes the density, i.e. the number of passengers per unit area of node $n$ at time $t$, where $s_n$ is the area of node $n$. For non-special nodes, this density is used to calculate the speed at which passengers move through the node via Equation 19.
For non-special nodes, a passenger's walking speed varies with the passenger density within the node; a function $g(k_{t,n})$ describes the relationship between $k_{t,n}$ and $v_{t,n}$:

$v_{t,n} = g(k_{t,n})$, $n \in N \setminus N_S$ (Equation 19);

according to existing studies, classical speed-density relationship formulas can be used for $g$.

Equation 20 applies to the special node set $N_S$: for escalator/elevator nodes the running speed is constant, and for AFC gate nodes only one passenger can pass through the gate at a time, so random fluctuations of walking speed at such nodes do not affect the passenger flow characteristics of the station. The speed $v_{t,n}$ of a passenger passing through a special node is therefore assumed to be a fixed value $\bar{v}_n$:

$v_{t,n} = \bar{v}_n$, $n \in N_S$ (Equation 20);

$f_{t,n} = v_{t,n} \times k_{t,n} \times w_n$, $n \in N$ (Equation 21);

Equation 21 relates speed $v_{t,n}$, density $k_{t,n}$, and passenger flow volume $f_{t,n}$: the passenger flow entering node $n$ at time $t$ is the product of the speed $v_{t,n}$, the density $k_{t,n}$, and the node width $w_n$; for special nodes in $N_S$, $k_{t,n} = 1$ is taken.
(4) Real-time deduction simulation logic for passenger flow of station
The reinforcement-learning-based passenger selection behavior real-time correction model is embedded into the system-dynamics-based station passenger flow evolution simulation model, and the model and the corresponding experimental environment are built by computer programming. The process of establishing the reinforcement-learning-based station passenger flow real-time deduction model is shown in Figure 8 and proceeds as follows (a simplified code sketch is given after the step list):
step 1: determining time t=t s Wherein T is s Setting the time step as delta t for the simulation starting time;
step 2: initializing a simulation environment and loading data to construct a station passenger flow evolution simulation model, wherein the loading data comprises the following steps: the method comprises the following steps of (1) entering passenger flow, getting-off passenger flow, length, width, height, maximum capacity, maximum safety capacity, maximum inflow and outflow capacity of nodes, maximum speed of passengers in the nodes, inflow and outflow relation among the nodes and initial values of selection behavior parameters of the passengers;
step 3: judging whether the current time T reaches the simulation ending time T q If t<T q Executing step 4, otherwise, finishing the simulation;
step 4: acquiring real-time video passenger flow data, embedding a passenger selection behavior real-time correction model, and calculating an error value of the simulated passenger flow data and the real passenger flow data at a station monitoring node;
Step 5: establishing a station passenger flow real-time deduction model based on reinforcement learning, and updating a Q value and an error value to reach a termination condition by adopting a formula 10 and a formula 12, wherein the termination condition is whether the error value of the simulated passenger flow data and the real passenger flow data at a monitoring node reaches the required precision or reaches the iteration number specified by a system;
step 6: updating passenger selection behavior parameters of the selection nodes;
step 7: updating the inflow passenger flow volume and the outflow passenger flow volume of the train nodes of the station by adopting a formula 15 and a formula 17;
step 8: updating the inflow passenger flow volume and the outflow passenger flow volume of other nodes except the train node of the station by adopting a formula 14 and a formula 15;
step 9: updating the passenger flow stock of each node of the station by adopting a formula 13 and a formula 16;
step 10: updating the passenger flow volume allowed to flow out by each node of the station at the next time step;
step 11: updating dynamic attributes of each node of the station by adopting formulas 18, 19 and 20, wherein the dynamic attributes comprise passenger flow density, passenger flow speed and travelling time of passengers passing through each node;
step 12: the time step is updated so that t=t+Δt, returning to step 3.
5. Experiment verification
(1) Scene summary and data preparation
To verify the performance of the model provided by the embodiment of the invention, the simulation results are analyzed taking Yangji Station of the Guangzhou Metro as an example. Yangji Station, located in Yuexiu District, Guangzhou, is a transfer station between Metro Lines 1 and 5. Being close to Guangzhou Avenue and Zhujiang New Town, the station carries huge commuter and holiday passenger flows; its daily passenger volume reaches 180,000 trips, making it one of the most crowded metro stations in Guangzhou.
Yangji Station is laid out as an island-platform transfer station with three underground levels. The first underground level is the concourse level, comprising the Line 1 concourse and the Line 5 concourse, which are connected by two transfer passages; the second underground level is the Line 1 platform level and the third underground level is the Line 5 platform level, both adopting an island platform design. The station has 7 entrances/exits in total, of which 4 are on Line 1 and 3 are on Line 5.
The station has 37 monitoring areas in total. According to the functional attribute of the area in which each camera is located, either the passing passenger flow of the monitoring area or the passenger flow density within the area is selected to construct the experimental data set.
In the embodiment of the invention, the simulation model is built and visualized in the Java language. The simulation results for June 2, June 4 and June 7, 2019 are selected for comparative analysis: June 2, 2019 is a Sunday (non-working day), June 4, 2019 is a working day, and June 7, 2019 is a public holiday. Because the instantaneous passing flow values are small and changes in passenger flow density take some time to develop, the passenger flow is aggregated at a 5-minute granularity so that the simulated and real passenger flow data can be compared more clearly.
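A minimal sketch of the 5-minute aggregation and of one possible error measure is given below; the mean absolute percentage error used here is an assumption, since the text does not name the metric behind the reported accuracy figures, and the function names are invented for illustration.

```python
def aggregate_5min(counts_per_step: list[float], steps_per_5min: int) -> list[float]:
    """Sum per-time-step passenger counts into 5-minute bins."""
    return [sum(counts_per_step[i:i + steps_per_5min])
            for i in range(0, len(counts_per_step), steps_per_5min)]

def mape(simulated: list[float], real: list[float]) -> float:
    """Mean absolute percentage error between simulated and real 5-minute flows."""
    pairs = [(s, r) for s, r in zip(simulated, real) if r > 0]
    if not pairs:
        return 0.0
    return sum(abs(s - r) / r for s, r in pairs) / len(pairs)
```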
(2) Model simulation
Taking several typical areas as examples, such as the station entrance, the transfer passage, the concourse and the escalator entrances, the real video data are compared against a simulation model without selection-behavior parameter correction and a simulation model with selection-behavior parameter correction to verify the effectiveness of the model proposed by the embodiment of the invention; the results are shown in Figures 9 to 17.
Figures 9-11, 12-14 and 15-17 show, for the non-working-day, working-day and holiday scenarios respectively, how the real-time-corrected passenger flow deduction simulation method provided by the embodiment of the invention and the traditional passenger flow simulation method fit the real passenger flow data at several typical station areas, such as the station entrance, the transfer passage and the concourse escalator entrance. The accuracy of the model with parameter correction is about 7.8% higher than that of the simulation model without parameter correction, which is a clear optimization effect.
The comparison of the simulation results shows that, in different station areas and under different scenarios, the real-time passenger flow deduction simulation method provided by the embodiment of the invention achieves better simulation results than the traditional passenger flow simulation method. This demonstrates that the station passenger flow real-time deduction model handles changes in the passenger flow state under different scenarios well, adapts to the passenger flow evolution laws of different areas, and has good simulation performance.
System embodiment
According to an embodiment of the present invention, a system for establishing a reinforcement-learning-based station passenger flow real-time deduction model is provided. Fig. 18 is a schematic diagram of this system; as shown in Fig. 18, the system specifically includes:
The first construction module 1802 is configured to identify and determine, based on system dynamics, a system structure of a station and a flow causal relationship of passengers in each area, and construct a station passenger flow evolution simulation model according to an identification and determination result, and is specifically configured to:
dividing the station into three areas of a station platform, a station hall and a transfer channel, logically analyzing passenger flow states, facilities and passenger flow causal relations in the three areas, generating a passenger flow causal relation structure tree of a station system structure, quantifying the passenger flow causal relation based on the passenger flow causal relation structure tree, constructing a system dynamics equation, and constructing the station passenger flow evolution simulation model based on the system dynamics equation;
wherein, the system dynamics equation specifically includes: a station hall traffic volume change equation, a station traffic volume change equation, a general channel traffic volume change equation, a transfer channel traffic volume change equation, a traffic volume change equation at a staircase in the station hall, a traffic volume change equation at a ticket checking place, a get-off traffic volume change equation and a get-on traffic volume change equation;
the second construction module 1804 is configured to continuously learn and optimize the passenger selection behavior parameter value by using a reinforcement learning algorithm based on the real-time monitoring video data and with the minimum simulation error as a target, and construct a passenger selection behavior real-time correction model, where the passenger selection behavior real-time correction model is used for dynamically adjusting the passenger selection behavior parameter in the station passenger flow evolution simulation model, and is specifically configured to:
Based on real-time monitoring video data, introducing real-time video passenger flow data, comparing and verifying real passenger flow and simulated passenger flow at a monitoring area, taking a comparison verification result as input of parameters of a passenger selection behavior real-time correction model, continuously learning and optimizing the passenger selection behavior parameters by using a reinforcement learning algorithm Q-learning with minimum simulation errors as a target, maintaining dynamic changes of the passenger selection behavior parameters, and constructing a passenger selection behavior real-time correction model, wherein the passenger selection behavior parameters comprise path selection behaviors and equipment facility selection behaviors;
the building module 1806 is configured to embed the passenger selection behavior real-time correction model into a station passenger flow evolution simulation model, and build a station passenger flow real-time deduction model based on reinforcement learning, which is specifically configured to:
a determining submodule, for setting the time $t = T_s$, where $T_s$ is the simulation start time, and setting the time step to $\Delta t$;
an initializing submodule, for initializing the simulation environment and loading data to construct the station passenger flow evolution simulation model, where the loaded data include: the inbound passenger flow, the outbound passenger flow, the alighting passenger flow, the length, width and height of each node, the maximum capacity, maximum safe capacity and maximum inflow and outflow capacity of each node, the maximum passenger speed within each node, the inflow-outflow relationships between nodes, and the initial values of the passenger selection behavior parameters;
a judging submodule, for judging whether the current time $t$ has reached the simulation end time $T_q$: if $t < T_q$, the calculation submodule is called; otherwise, the simulation ends;
a calculation submodule, for acquiring real-time video passenger flow data, embedding the passenger selection behavior real-time correction model, and calculating the error between the simulated passenger flow data and the real passenger flow data at the station monitoring nodes;
a first updating submodule, for establishing the reinforcement-learning-based station passenger flow real-time deduction model and updating the Q value and the error value until the termination condition is reached, the termination condition being that the error between the simulated and real passenger flow data at the monitoring nodes reaches the required accuracy or that the number of iterations reaches the system-specified limit;
a second updating submodule, for updating the passenger selection behavior parameters of the selection nodes;
a third updating submodule, for updating the inflow and outflow passenger flow volumes of the train nodes of the station;
a fourth updating submodule, for updating the inflow and outflow passenger flow volumes of the station nodes other than the train nodes;
a fifth updating submodule, for updating the passenger flow stock of each node of the station;
a sixth updating submodule, for updating the passenger flow volume that each node of the station is allowed to release at the next time step;
a seventh updating submodule, for updating the dynamic attributes of each node of the station, where the dynamic attributes include the passenger flow density, the passenger flow speed, and the travel time of passengers through each node;
an eighth updating submodule, for updating the time by letting $t = t + \Delta t$ and calling the judging submodule;
the system further comprises: and the deduction module is used for carrying out station passenger flow real-time deduction by utilizing the station passenger flow real-time deduction model.
The embodiment of the present invention is a system embodiment corresponding to the above method embodiment, and specific operations of each module may be understood by referring to the description of the method embodiment, which is not repeated herein.
In summary, the embodiment of the invention provides a reinforcement-learning-based station passenger flow real-time deduction model that addresses two shortcomings of existing station simulation models: the time variability of passenger selection behavior parameters is not considered, and monitoring video data are underutilized. First, the station system structure and the passenger flow causal relationships among the areas are identified based on system dynamics, and a station passenger flow evolution simulation model is built on that basis. To dynamically adjust the passenger selection behavior parameters in the simulation model, a reinforcement-learning-based passenger selection behavior real-time correction model is proposed: real-time video passenger flow data are introduced and, with the minimum simulation error as the objective, a reinforcement learning algorithm continuously learns and optimizes the passenger selection behavior parameter values so that the simulation stays as close as possible to the real data under different scenarios. This provides accurate data support for the subsequent identification of bottleneck nodes and the formulation of passenger flow control measures, corrects the station passenger flow evolution simulation in real time, improves the adaptability of the simulation model to different scenarios, and enhances the time-varying capability of passenger flow simulation deduction. It also helps the operation management department grasp the risk state of the overall and local passenger flow, and provides a scientific basis for passenger flow group behavior analysis, adjustment of passenger flow organization modes, and execution of control measures.
Device embodiment 1
An embodiment of the present invention provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, which, when executed by the processor, implements the steps described in the method embodiments.
Device example two
Embodiments of the present invention provide a computer-readable storage medium having stored thereon a program for realizing information transmission, which when executed by a processor realizes the steps as described in the method embodiments.
The computer readable storage medium of the present embodiment includes, but is not limited to: ROM, RAM, magnetic or optical disks, etc.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.
Claims (10)
1. A method for establishing a station passenger flow real-time deduction model based on reinforcement learning is characterized by comprising the following steps:
identifying and judging the system structure of the station and the flow causality relation of passengers among all areas based on system dynamics, and constructing a station passenger flow evolution simulation model according to the identification and judgment results;
based on the real-time monitoring video data and with the minimum simulation error as a target, continuously learning and optimizing the passenger selection behavior parameter value by using a reinforcement learning algorithm, and constructing a passenger selection behavior real-time correction model, wherein the passenger selection behavior real-time correction model is used for dynamically adjusting the passenger selection behavior parameter in the station passenger flow evolution simulation model;
and embedding the passenger selection behavior real-time correction model into a station passenger flow evolution simulation model, and establishing a station passenger flow real-time deduction model based on reinforcement learning.
2. The method according to claim 1, wherein the method further comprises:
and carrying out station passenger flow real-time deduction by using the station passenger flow real-time deduction model.
3. The method of claim 1, wherein the identifying and judging the flow causal relationship of passengers between the station system structure and each area based on the system dynamics, and constructing the station passenger flow evolution simulation model according to the identifying and judging result specifically comprises:
Dividing the station into three areas of a station platform, a station hall and a transfer channel, logically analyzing passenger flow states, facilities and passenger flow causal relations in the three areas, generating a passenger flow causal relation structure tree of a station system structure, quantifying the passenger flow causal relation based on the passenger flow causal relation structure tree, constructing a system dynamics equation, and constructing the station passenger flow evolution simulation model based on the system dynamics equation, wherein the system dynamics equation specifically comprises: a station hall traffic volume change equation, a station traffic volume change equation, a general channel traffic volume change equation, a transfer channel traffic volume change equation, a traffic volume change equation at a staircase in the station hall, a traffic volume change equation at a ticket checking place, a get-off traffic volume change equation and a get-on traffic volume change equation.
4. The method according to claim 1, wherein the constructing the passenger selection behavior real-time correction model based on the real-time monitoring video data and with the aim of minimizing simulation errors by continuously learning and optimizing the passenger selection behavior parameter values by using a reinforcement learning algorithm specifically comprises:
based on real-time monitoring video data, introducing real-time video passenger flow data, comparing and verifying real passenger flow and simulated passenger flow at a monitoring area, taking a comparison and verification result as input of parameters of a passenger selection behavior real-time correction model, continuously learning and optimizing the passenger selection behavior parameters by using a reinforcement learning algorithm Q-learning with minimum simulation errors as a target, maintaining dynamic changes of the passenger selection behavior parameters, and constructing a passenger selection behavior real-time correction model, wherein the passenger selection behavior parameters comprise path selection behaviors and equipment facility selection behaviors.
5. The method of claim 1, wherein the embedding the passenger selection behavior real-time correction model into the station passenger flow evolution simulation model, and the establishing the station passenger flow real-time deduction model based on reinforcement learning specifically comprises:
step 1: determining time t=t s Wherein T is s Setting the time step as delta t for the simulation starting time;
step 2: initializing a simulation environment and loading data to construct a station passenger flow evolution simulation model, wherein the loading data comprises the following steps: the system comprises an inbound passenger flow, an outbound passenger flow, a get-off passenger flow, a length, width, height, maximum capacity, maximum safety capacity, maximum inflow and outflow capacity of a node, maximum speeds of passengers in the node, inflow and outflow relations among the nodes and initial values of selection behavior parameters of the passengers;
step 3: judging whether the current time T reaches the simulation ending time T q If t<T q Executing step 4, otherwise, finishing the simulation;
step 4: acquiring real-time video passenger flow data, embedding a passenger selection behavior real-time correction model, and calculating an error value of the simulated passenger flow data and the real passenger flow data at a station monitoring node;
step 5: establishing a station passenger flow real-time deduction model based on reinforcement learning, and updating a Q value and an error value to reach a termination condition, wherein the termination condition is whether the error value of the simulated passenger flow data and the real passenger flow data at a monitoring node reaches the required precision or reaches the iteration number specified by a system;
Step 6: updating passenger selection behavior parameters of the selection nodes;
step 7: updating the inflow passenger flow volume and the outflow passenger flow volume of the train nodes of the station;
step 8: updating inflow passenger flow and outflow passenger flow of other nodes except the train node of the station;
step 9: updating the passenger flow stock of each node of the station;
step 10: updating the passenger flow volume allowed to flow out by each node of the station at the next time step;
step 11: updating dynamic attributes of all nodes of the station, wherein the dynamic attributes comprise passenger flow density, passenger flow speed and traveling time of passengers passing through all nodes;
step 12: the time step is updated so that t=t+Δt, returning to step 3.
6. The system for establishing the station passenger flow real-time deduction model based on reinforcement learning is characterized by comprising the following components:
the first construction module is used for identifying and judging the system structure of the station and the flow causality relation of passengers in each area based on system dynamics, and constructing a station passenger flow evolution simulation model according to the identification and judgment results;
the second construction module is used for continuously learning and optimizing the passenger selection behavior parameter value by using a reinforcement learning algorithm based on the real-time monitoring video data and with the minimum simulation error as a target, and constructing a passenger selection behavior real-time correction model, wherein the passenger selection behavior real-time correction model is used for dynamically adjusting the passenger selection behavior parameter in the station passenger flow evolution simulation model;
The building module is used for embedding the passenger selection behavior real-time correction model into the station passenger flow evolution simulation model and building the station passenger flow real-time deduction model based on reinforcement learning.
7. The system of claim 6, wherein the system further comprises:
and the deduction module is used for carrying out station passenger flow real-time deduction by utilizing the station passenger flow real-time deduction model.
8. The system according to claim 6, wherein,
the first construction module is specifically configured to:
dividing the station into three areas of a station platform, a station hall and a transfer channel, logically analyzing passenger flow states, facilities and passenger flow causal relations in the three areas, generating a passenger flow causal relation structure tree of a station system structure, quantifying the passenger flow causal relation based on the passenger flow causal relation structure tree, constructing a system dynamics equation, and constructing the station passenger flow evolution simulation model based on the system dynamics equation, wherein the system dynamics equation specifically comprises: a station hall traffic volume change equation, a station traffic volume change equation, a general channel traffic volume change equation, a transfer channel traffic volume change equation, a traffic volume change equation at a staircase in the station hall, a traffic volume change equation at a ticket checking place, a get-off traffic volume change equation and a get-on traffic volume change equation;
The second construction module is specifically configured to:
based on real-time monitoring video data, introducing real-time video passenger flow data, comparing and verifying real passenger flow and simulated passenger flow at a monitoring area, taking a comparison verification result as input of parameters of a passenger selection behavior real-time correction model, continuously learning and optimizing the passenger selection behavior parameters by using a reinforcement learning algorithm Q-learning with minimum simulation errors as a target, maintaining dynamic changes of the passenger selection behavior parameters, and constructing a passenger selection behavior real-time correction model, wherein the passenger selection behavior parameters comprise path selection behaviors and equipment facility selection behaviors;
the building module specifically comprises:
a determining submodule, for setting the time $t = T_s$, wherein $T_s$ is the simulation start time, and setting the time step to $\Delta t$;
an initializing submodule, for initializing a simulation environment and loading data to construct the station passenger flow evolution simulation model, wherein the loaded data comprise: an inbound passenger flow, an outbound passenger flow, an alighting passenger flow, the length, width and height of each node, the maximum capacity, maximum safe capacity and maximum inflow and outflow capacity of each node, the maximum passenger speed within each node, the inflow-outflow relationships between nodes, and initial values of the passenger selection behavior parameters;
a judging submodule, for judging whether the current time $t$ has reached the simulation end time $T_q$: if $t < T_q$, calling a calculation submodule, otherwise ending the simulation;
a calculation submodule, for acquiring real-time video passenger flow data, embedding the passenger selection behavior real-time correction model, and calculating the error between the simulated passenger flow data and the real passenger flow data at the station monitoring nodes;
a first updating submodule, for establishing the reinforcement-learning-based station passenger flow real-time deduction model and updating the Q value and the error value until a termination condition is reached, wherein the termination condition is that the error between the simulated and real passenger flow data at the monitoring nodes reaches the required accuracy or that the number of iterations reaches the system-specified limit;
a second updating submodule, for updating the passenger selection behavior parameters of the selection nodes;
a third updating submodule, for updating the inflow and outflow passenger flow volumes of the train nodes of the station;
a fourth updating submodule, for updating the inflow and outflow passenger flow volumes of the station nodes other than the train nodes;
a fifth updating submodule, for updating the passenger flow stock of each node of the station;
a sixth updating submodule, for updating the passenger flow volume that each node of the station is allowed to release at the next time step;
a seventh updating submodule, for updating the dynamic attributes of each node of the station, wherein the dynamic attributes include the passenger flow density, the passenger flow speed, and the travel time of passengers through each node;
an eighth updating submodule, for updating the time by letting $t = t + \Delta t$ and calling the judging submodule.
9. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the reinforcement learning based station passenger flow real-time deduction model building method according to any one of claims 1 to 5.
10. A computer-readable storage medium, wherein a program for realizing information transfer is stored on the computer-readable storage medium, and when the program is executed by a processor, the steps of the reinforcement learning-based station passenger flow real-time deduction model building method according to any one of claims 1 to 5 are realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311125286.1A CN117150907A (en) | 2023-09-01 | 2023-09-01 | Method and system for establishing station passenger flow real-time deduction model based on reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117150907A true CN117150907A (en) | 2023-12-01 |
Family
ID=88909561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311125286.1A Pending CN117150907A (en) | 2023-09-01 | 2023-09-01 | Method and system for establishing station passenger flow real-time deduction model based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117150907A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118015701A (en) * | 2024-02-01 | 2024-05-10 | 云南大学 | A passenger behavior recognition method for subway platforms with high-density passenger flow scenarios |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||