CN114328448B

CN114328448B - Highway vehicle following behavior reconstruction method based on imitation learning algorithm

Info

Publication number: CN114328448B
Application number: CN202111461186.7A
Authority: CN
Inventors: 吴游宇; 李正军; 王丽园; 马天奕; 杨晶; 罗丰; 熊文磊
Original assignee: CCCC Second Highway Consultants Co Ltd
Current assignee: CCCC Second Highway Consultants Co Ltd
Priority date: 2021-12-01
Filing date: 2021-12-01
Publication date: 2024-08-23
Anticipated expiration: 2041-12-01
Also published as: CN114328448A

Abstract

The invention discloses a highway vehicle following behavior reconstruction method based on an imitation learning algorithm, comprising the following steps: taking an expressway vehicle operation data set as an example, extracting and processing data features according to the operating characteristics of expressways in different regions; training an expressway following behavior model by imitation learning to obtain an imitation learning following model; and acquiring vehicle operation information from the simulation in real time as input, and controlling the vehicle following behavior according to the acceleration output by the imitation learning following model. Focusing on expressway scenes, the method provides a technical framework for accurately reconstructing vehicle following behavior based on an imitation learning algorithm and applies it in practice through simulation. It can be transferred to expressways with different characteristics, can automatically fit following models for different scenes, and reflects vehicle driving characteristics.

Description

Highway vehicle following behavior reconstruction method based on imitation learning algorithm

Technical Field

The invention relates to the technical field of traffic big data application, in particular to a highway vehicle following behavior reconstruction method based on an imitation learning algorithm.

Background

Traffic simulation centered on the digital twin is developing rapidly; it helps save testing time and cost and avoids the dangers of on-road measurement. Describing vehicle behavior (such as car following and lane changing) realistically is therefore of great significance for traffic simulation.

Taking following behavior as an example, traditional model-driven methods (such as the GM and IDM models) are limited by a lengthy parameter calibration process, and it is difficult for them to accurately capture how vehicle following differs across scenes.

As big data technology becomes increasingly embedded in the transportation field, emerging data-driven methods (such as machine learning and reinforcement learning) offer a better way to describe vehicle following behavior by processing and extracting information from massive data sets.

In much of the existing research, following behavior is characterized with reinforcement learning, in which an algorithm searches for an optimal policy that maximizes the cumulative reward. However, in some reinforcement learning tasks the reward function is hard to define and reward feedback is hard to set manually, which makes it very difficult for the agent to learn the expected behavior from that feedback.

Disclosure of Invention

The invention aims to provide a method for reconstructing the following behavior of highway vehicles based on an imitation learning algorithm, which can be transferred to highways with different characteristics, supports customized input parameters, and reflects vehicle driving characteristics.

In order to achieve the above purpose, the expressway vehicle following behavior reconstruction method based on the imitation learning algorithm comprises the following steps:

Step (1), taking a highway vehicle operation data set as an example, extracting and processing data features according to the operating characteristics of highways in different regions;

Step (2), training the expressway following behavior model by imitation learning to obtain an imitation learning following model;

Step (3), acquiring vehicle operation information in the simulation in real time as input, and controlling the vehicle following behavior according to the acceleration output by the imitation learning following model obtained in step (2).

In one implementation, the step (1) further includes the steps of:

Step (1.1), extracting data representing the vehicle state and the corresponding driver behavior from the expressway vehicle operation data set;

Step (1.2), preprocessing the raw data representing the vehicle state and the corresponding driver behavior extracted in step (1.1).

In one implementation, in step (1.2), the preprocessing includes field calculation and data filling, where the data filling uses a linear interpolation algorithm to fill data missing from certain frames.

In one implementation, in step (1.1), obtaining the acceleration field in the field calculation includes the following steps:

grouping the data by vehicle ID to obtain the trajectory data of each vehicle;

sorting the trajectory data in time order;

calculating the vehicle acceleration from the speed and time in the trajectory data.

In one implementation, in step (1.1), obtaining the front vehicle information field in the field calculation includes the following steps:

grouping the data by data frame sequence number or time stamp to obtain information on all vehicles within the acquisition range of a given data frame;

obtaining the lane number of the i-th vehicle in the data frame, and selecting all vehicles with the same lane number from the data;

sorting all vehicles with the same lane number by position from rear to front, marking the first vehicle whose position is greater than that of the i-th vehicle as the front vehicle of the i-th vehicle, and recording its ID;

if no vehicle has a position greater than that of the i-th vehicle, marking the front vehicle ID of the i-th vehicle as null.

In one implementation, the data filling uses a linear interpolation algorithm to fill missing data in some frames as follows:

Assuming that the data sampling frequency is $f$, when values are missing between a first time $t_1$ and a second time $t_2$ ($t_1 < t_2$), the number $n$ of time nodes to be filled is calculated as $n = f(t_2 - t_1) - 1$;

Linear interpolation is performed according to $d_i = d_1 + (d_2 - d_1)\,i/n$, $i = 1, 2, \ldots, n$, to complete the numerical filling;

where $d_1$ and $d_2$ are the target quantities at times $t_1$ and $t_2$.

In one implementation, the step (2) further includes the steps of:

Step (2.1), screening the vehicle driving data in a stable following state from the data obtained in step (1);

Step (2.2), extracting state-behavior pairs, following the principle of imitation learning, from the stable-following vehicle operation data screened in step (2.1);

Step (2.3), using behavior cloning to learn expressway driver behavior and obtain an imitation learning following model;

the stable following state requires that both the front and rear vehicles are small vehicles, that no state or behavior quantity is null, and that the vehicle does not change lanes within the behavior interval frames;

The state-behavior pairs refer to mappings of state or observation vectors to agent behaviors in expert demonstrations.

In one implementation, in step (2.2), extracting the state-behavior pairs includes the following steps:

Step (2.2.1), extracting and processing data based on step (1);

Step (2.2.2), setting the driver reaction time and the data sampling interval to obtain the behavior interval frame number $n_a$ and the sampling interval frame number $n_s$;

Step (2.2.3), collecting data starting from frame 1: frame 1 is a state frame and frame $1+n_a$ is its behavior frame; frame $1+n_s$ is the next state frame and frame $1+n_a+n_s$ is its behavior frame, and so on;

Step (2.2.4), obtaining the vehicle ID set of each state frame and of its corresponding behavior frame; the intersection of the two sets contains the vehicles from which state-behavior pairs may be extracted, yielding the vehicle ID intersection;

Step (2.2.5), for each vehicle in the intersection, extracting the vehicle ID, front vehicle ID, speed, acceleration and position, and obtaining the speed, acceleration and position of the front vehicle from the front vehicle ID, to obtain the vehicle state;

The vehicle state comprises the vehicle speed, the vehicle acceleration, the front vehicle speed, the front vehicle acceleration and the inter-vehicle distance, and the acceleration of the vehicle in the behavior frame is extracted as the driver behavior.

In one implementation, in step (3), vehicle operation information in the SUMO simulation is obtained in real time through the TraCI interface and used as input to the imitation learning following model trained in step (2), and the following behavior of the vehicle is controlled according to the output acceleration; the vehicle operation information comprises vehicle position, speed and acceleration.

The beneficial effects of the invention are as follows: the method focuses on expressway scenes, provides a technical framework for accurately reconstructing vehicle following behavior based on an imitation learning algorithm, and applies it in practice through simulation. Built on a standardized, highly compatible system framework, it can be transferred to highways with different characteristics, supports customized input parameters, and, after a simple data format conversion, can automatically fit following models for different scenes that reflect vehicle driving characteristics.

Drawings

FIG. 1 is a schematic flow chart of a method for reconstructing a following behavior of a highway vehicle based on an imitation learning algorithm according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of behavioral cloning;

FIG. 3 is a diagram of the behavior cloning model (multi-layer neural network);

FIG. 4 is a technical flowchart of the imitation learning following model;

FIG. 5 is a simulation run interface.

Detailed Description

The invention will now be described in further detail with reference to the drawings and to specific examples.

The invention discloses a method for reconstructing the following behavior of highway vehicles based on an imitation learning algorithm. It takes the highway as the scene, the following model as the object, imitation learning as the method and traffic simulation software as the carrier; it selects imitation learning from expert demonstrations and adopts the Behavior Cloning (BC) method to directly learn the mapping from states or observation vectors in expert demonstrations to agent behaviors (i.e., state-behavior pairs). Within this system framework, input parameters can be customized simply by converting the format of, and extracting information from, the vehicle operation data sets of different highways, generating following models with regional characteristics that can be transferred to highways in different regions. At the simulation level, the method can also replace the default following model of traffic simulation software (such as SUMO) to build a more realistic simulation environment.

Referring to FIG. 1, the expressway vehicle following behavior reconstruction method based on the imitation learning algorithm according to the embodiment of the invention includes the following steps:

Step 1, taking an expressway vehicle operation data set as an example, extracting and processing data features according to the operating characteristics of expressways in different regions, so as to provide customized simulation inputs for expressway vehicle operation data sets with different characteristics.

The method is built on open-source expressway vehicle operation data sets; it can customize input parameters according to the operating characteristics (speed, acceleration, etc.) of expressways in different regions, generate following models with regional characteristics, and transfer them to expressways in those regions.

More specifically, step 1 further comprises the steps of:

Step 1.1, data characterizing the vehicle state and corresponding driver behavior are extracted based on the highway vehicle operation dataset.

Common open-source data sets include NGSIM (USA), highD (Germany), ZenData (Japan) and UbiquitousTrafficEyes (China). Expressway vehicle operation data sets include vehicle trajectory data, radar data and other data types, from which the vehicle state and the corresponding driver behavior are extracted.

The data should cover most (ideally all) vehicles on the target road and should provide, directly or indirectly, high-precision vehicle states and driver behavior information with a degree of temporal continuity. The state and behavior information is detailed in Table 1, in which the shaded fields are mandatory; the time stamp and vehicle ID serve as identifiers and must be unique. Fields such as speed, acceleration and surrounding vehicle states are optional because they can be calculated.

Table 1 data field requirements

That is, the mandatory information comprises the time stamp, vehicle ID, position and lane. The time stamp and vehicle ID are identifiers and must be unique; the position and lane describe the vehicle state, from which speed, acceleration, surrounding vehicles and similar information can be calculated.

Step 1.2, preprocessing the raw data representing the vehicle state and the corresponding driver behavior extracted in step 1.1.

Because drivers have a reaction time, the driver behavior is represented by the vehicle acceleration after a time interval (as shown in Table 1), so the data must have a certain continuity: for example, if vehicle state information exists at time t and the reaction time is taken as 1 s, the data must also contain the corresponding driving behavior information at time t+1 s. Because of limited technical capability, incomplete fields, missing data and similar problems during data acquisition and transmission, the raw data must be preprocessed before state-behavior pairs are extracted. Preprocessing mainly consists of field calculation and data filling, as follows:

Field calculation mainly targets data that do not directly contain key state quantities (vehicle speed, acceleration, etc.). For example, the Japanese ZenData data set contains only a speed field and lacks an acceleration field; the acceleration field is obtained as follows:

Step 1.2.1, grouping the data by vehicle ID to obtain the trajectory data of each vehicle;

Step 1.2.2, sorting the trajectory data in time order;

Step 1.2.3, calculating the vehicle acceleration from the speed and time in the trajectory data.

Because the sampling frequency is high (10 Hz), the vehicle can be assumed to move with uniform acceleration between samples, and its acceleration is obtained by dividing the difference between two adjacent speed records in the trajectory data by the time interval. This method is suitable only for data with a high sampling frequency: the lower the sampling frequency, the more random the vehicle's behavior between adjacent samples, and the larger the resulting error may be.

For the last speed record, the acceleration is marked as null, because the vehicle subsequently leaves the data acquisition range and its state is unknown.
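Steps 1.2.1 to 1.2.3, together with the null rule for the final record, can be sketched in Python as follows. The record layout `(vehicle_id, timestamp_s, speed_m_per_s)` and the function name `compute_acceleration` are hypothetical; real data sets such as ZenData use their own column names and units.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (vehicle_id, timestamp_s, speed_m_per_s)
Record = Tuple[int, float, float]


def compute_acceleration(records: List[Record]) -> Dict[Tuple[int, float], Optional[float]]:
    """Map (vehicle_id, timestamp) to acceleration; the last record of each track maps to None."""
    tracks: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for vehicle_id, t, v in records:          # Step 1.2.1: group by vehicle ID
        tracks[vehicle_id].append((t, v))

    accel: Dict[Tuple[int, float], Optional[float]] = {}
    for vehicle_id, track in tracks.items():
        track.sort()                          # Step 1.2.2: sort by time stamp
        for (t0, v0), (t1, v1) in zip(track, track[1:]):
            # Step 1.2.3: uniform acceleration assumed between adjacent samples
            accel[(vehicle_id, t0)] = (v1 - v0) / (t1 - t0)
        # No later sample exists, so the final acceleration is unknown (null)
        accel[(vehicle_id, track[-1][0])] = None
    return accel
```

The forward difference assigns each acceleration to the earlier of the two samples, which is why the last sample of every track has no value.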

Surrounding vehicle information is obtained as follows. Taking following behavior learning as an example, the ID of the vehicle in front of the target vehicle is required. Some data sets (NGSIM, highD, etc.) already include this field; for data sets that do not (ZenData, UbiquitousTrafficEyes), it must be extracted. The front vehicle information is obtained as follows:

Step 1.2.4, grouping the data by data frame sequence number (or time stamp) to obtain information on all vehicles within the acquisition range of a given frame;

Step 1.2.5, for the i-th vehicle in the data frame, obtaining its lane number and selecting all vehicles with the same lane number from the data;

Step 1.2.6, sorting these vehicles by position from rear to front; the first vehicle whose position is greater than that of the i-th vehicle is the front vehicle of the i-th vehicle, and its ID is recorded;

Step 1.2.7, if no vehicle in the frame has a position in the lane greater than that of the i-th vehicle, the i-th vehicle is the head vehicle of the lane with no vehicle in front of it, and its front vehicle ID is marked as null.
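A minimal sketch of Steps 1.2.4 to 1.2.7, assuming each row is `(frame_id, vehicle_id, lane_id, position_m)` and that position increases in the direction of travel. Both the row layout and the name `find_front_vehicles` are hypothetical; `bisect_right` implements the "first position strictly greater" rule.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout: (frame_id, vehicle_id, lane_id, position_m)
Row = Tuple[int, int, int, float]


def find_front_vehicles(rows: List[Row]) -> Dict[Tuple[int, int], Optional[int]]:
    """Map (frame_id, vehicle_id) to the ID of the front vehicle, or None for a head vehicle."""
    frames: Dict[int, List[Tuple[int, int, float]]] = defaultdict(list)
    for frame_id, vehicle_id, lane_id, pos in rows:     # Step 1.2.4: group by frame
        frames[frame_id].append((vehicle_id, lane_id, pos))

    front: Dict[Tuple[int, int], Optional[int]] = {}
    for frame_id, vehicles in frames.items():
        lanes: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
        for vehicle_id, lane_id, pos in vehicles:       # Step 1.2.5: same lane number
            lanes[lane_id].append((pos, vehicle_id))
        for lane_vehicles in lanes.values():
            lane_vehicles.sort()                        # Step 1.2.6: rear to front
            positions = [p for p, _ in lane_vehicles]
            for pos, vehicle_id in lane_vehicles:
                j = bisect_right(positions, pos)        # first position strictly greater
                # Step 1.2.7: nothing ahead means a head vehicle, marked null
                front[(frame_id, vehicle_id)] = lane_vehicles[j][1] if j < len(lane_vehicles) else None
    return front
```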

When data in some frames are missing, they must be filled; because the sampling frequency of existing equipment is high, a linear interpolation algorithm is used. Assuming that the data sampling frequency is $f$, when values are missing between a first time $t_1$ and a second time $t_2$ ($t_1 < t_2$), the data filling algorithm is as follows. First, calculate the number of time nodes to be filled: $n = f\Delta t - 1$, where $\Delta t = t_2 - t_1$. Then perform linear interpolation to complete the numerical filling: $d_i = d_1 + (d_2 - d_1)\,i/n$, $i = 1, 2, \ldots, n$, where $d_1$ and $d_2$ are the target quantities at times $t_1$ and $t_2$.
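A minimal Python sketch of the gap filling, assuming evenly spaced samples at frequency $f$. One interpretive choice is made here: with $n = f\Delta t - 1$ missing nodes, the $i$-th missing node lies at $t_1 + i/f$, a fraction $i/(n+1)$ of the way from $t_1$ to $t_2$, so the sketch interpolates with $i/(n+1)$; applying $i/n$ literally would assign $d_2$ to the last missing node. The function name `fill_gap` and its tuple output are hypothetical.

```python
from typing import List, Tuple


def fill_gap(t1: float, d1: float, t2: float, d2: float, f: float) -> List[Tuple[float, float]]:
    """Return (time, value) pairs for the missing nodes strictly between t1 and t2."""
    if not t1 < t2:
        raise ValueError("t1 must be earlier than t2")
    steps = round(f * (t2 - t1))   # number of sampling intervals; round() absorbs float error
    n = steps - 1                  # n = f * (t2 - t1) - 1 missing time nodes
    # Node i sits at t1 + i/f, i.e. a fraction i / (n + 1) of the way from t1 to t2
    return [(t1 + i / f, d1 + (d2 - d1) * i / (n + 1)) for i in range(1, n + 1)]
```

For example, at 10 Hz a gap between $t_1 = 0.0$ s ($d_1 = 20.0$) and $t_2 = 0.3$ s ($d_2 = 23.0$) yields two filled nodes, at 0.1 s and 0.2 s, with values 21.0 and 22.0.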

Step 2, training the expressway following behavior model by imitation learning.

Imitation learning learns from expert demonstrations, directly learning the mapping from states or observation vectors in expert demonstrations to agent behaviors. Under the basic assumption that the policy adopted by the expert is optimal, it avoids the difficulty, faced by traditional reinforcement learning, of defining a reward function. An expert demonstration, also called an expert trajectory, is the result of the agent's sequential decisions and is denoted $\tau$; it can be split into state-action pairs $\tau_1, \tau_2, \ldots, \tau_n$ (or observation-action pairs). Vehicle following is the most basic microscopic driving behavior; it describes the interaction between two adjacent vehicles in a platoon traveling in a single lane where overtaking is restricted.

Further, the step 2 specifically includes the following steps:

Step 2.1, screening the vehicle operation data in a stable following state from the data preprocessed in step 1: both the front and rear vehicles are small vehicles; no state or behavior quantity is null; and the vehicle does not change lanes within the behavior interval frames.

Step 2.2, extracting state-behavior pairs from the stable-following vehicle operation data screened in step 2.1, according to the basic principle of imitation learning.

A state-behavior pair is the mapping from a state or observation vector to an agent behavior in an expert demonstration. For highway following model learning, the state mainly comprises the motion states of the subject vehicle and its front vehicle (speed, inter-vehicle distance, acceleration, etc.), and the behavior is the subject vehicle's acceleration after a certain reaction time. State-behavior pairs are extracted as follows:

Step 2.2.1, performing field calculation, unit unification (e.g., speed in km/h and distance in m), numerical filling and similar operations using the data preprocessing method of step 1.
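The three screening criteria of Step 2.1 can be expressed as a single predicate. The `Sample` fields, the class label `"car"` standing for small vehicles, and the use of a per-frame lane list to detect lane changes are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    """One candidate sample; field names are hypothetical."""
    vehicle_class: str
    front_vehicle_class: Optional[str]
    speed: Optional[float]
    front_speed: Optional[float]
    gap: Optional[float]
    acceleration: Optional[float]
    behavior_acceleration: Optional[float]   # acceleration in the behavior frame
    lane_ids: Sequence[int]                  # vehicle's lane in each frame of the behavior interval


def is_stable_following(s: Sample, small_class: str = "car") -> bool:
    # Criterion 1: both the front and rear vehicles are small vehicles
    both_small = s.vehicle_class == small_class and s.front_vehicle_class == small_class
    # Criterion 2: no state or behavior quantity is null
    values = (s.speed, s.front_speed, s.gap, s.acceleration, s.behavior_acceleration)
    no_nulls = all(v is not None for v in values)
    # Criterion 3: no lane change within the behavior interval frames
    no_lane_change = len(set(s.lane_ids)) == 1
    return both_small and no_nulls and no_lane_change
```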

Step 2.2.2, setting the core parameter values.

The core parameters are the driver reaction time and the data sampling interval (simulation update time).

Since the filled time stamps form continuous, equally spaced nodes, multiplying the two times by the data sampling frequency gives the behavior interval frame number $n_a$ and the sampling interval frame number $n_s$.

Step 2.2.3, collecting data starting from frame 1: frame 1 is taken as a state frame and frame $1+n_a$ as its behavior frame; frame $1+n_s$ is taken as the next state frame and frame $1+n_a+n_s$ as its behavior frame, and so on. Table 2 illustrates the state-behavior frame selection method.
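The frame arithmetic of Steps 2.2.2 and 2.2.3 is sketched below, assuming 1-based frame numbers and a fixed sampling frequency; the helper names are hypothetical. NGSIM trajectories are sampled at 10 Hz, so the 1 s reaction time and 1 s sampling interval used in Step 2.3.1 would give $n_a = n_s = 10$.

```python
from typing import List, Tuple


def frame_interval(seconds: float, f: float) -> int:
    """Convert a duration (s) to a whole number of frames at sampling frequency f (Hz)."""
    return round(seconds * f)


def state_behavior_frames(n_frames: int, n_a: int, n_s: int) -> List[Tuple[int, int]]:
    """1-based (state_frame, behavior_frame) pairs: (1, 1+n_a), (1+n_s, 1+n_s+n_a), ..."""
    if n_s < 1:
        raise ValueError("sampling interval must be at least one frame")
    pairs = []
    state = 1
    while state + n_a <= n_frames:   # stop once the behavior frame would fall outside the data
        pairs.append((state, state + n_a))
        state += n_s
    return pairs
```

With 35 frames and $n_a = n_s = 10$, `state_behavior_frames(35, 10, 10)` returns `[(1, 11), (11, 21), (21, 31)]`.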

Table 2 Schematic of the state-behavior frame selection method

Step 2.2.4, obtaining the vehicle ID sets of each state frame and of its corresponding behavior frame; the intersection of the two sets contains the vehicles from which state-behavior pairs may be extracted, yielding the vehicle ID intersection.

Step 2.2.5, for each vehicle in the vehicle ID intersection obtained in step 2.2.4, extracting the vehicle ID, front vehicle ID, speed, acceleration and position, and obtaining the speed, acceleration and position of the front vehicle from the front vehicle ID, to obtain the final vehicle state.

The vehicle state includes vehicle speed, acceleration, front vehicle speed, front vehicle acceleration, and vehicle-to-vehicle distance. In the behavior frame, the acceleration of the vehicle is extracted as the driver behavior.
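Steps 2.2.4 and 2.2.5 can be sketched as follows, assuming each preprocessed frame is a dictionary mapping a vehicle ID to `(front_id, speed, acceleration, position)`. The data layout and `extract_pairs` are hypothetical, and the inter-vehicle distance here is simply the difference in recorded positions; whether a vehicle length should be subtracted depends on how the data set defines position.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: frame number -> {vehicle_id: (front_id, speed, acceleration, position)}
Frame = Dict[int, Tuple[Optional[int], float, float, float]]
State = Tuple[float, float, float, float, float]


def extract_pairs(frames: Dict[int, Frame],
                  schedule: List[Tuple[int, int]]) -> List[Tuple[State, float]]:
    """Return (state, behavior) pairs; state = (v, a, v_front, a_front, distance)."""
    pairs: List[Tuple[State, float]] = []
    for state_f, behavior_f in schedule:
        s_frame = frames.get(state_f, {})
        b_frame = frames.get(behavior_f, {})
        # Step 2.2.4: only vehicles present in both frames can form a pair
        for vid in sorted(s_frame.keys() & b_frame.keys()):
            front_id, v, a, x = s_frame[vid]
            # Step 2.2.5: the front vehicle's state must be available in the state frame
            if front_id is None or front_id not in s_frame:
                continue
            _, v_f, a_f, x_f = s_frame[front_id]
            state = (v, a, v_f, a_f, x_f - x)
            behavior = b_frame[vid][2]   # acceleration recorded in the behavior frame
            pairs.append((state, behavior))
    return pairs
```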

Step 2.3, using behavior cloning to learn expressway driver behavior.

The Behavior Cloning (BC) model directly learns the mapping from states or observation vectors in expert demonstrations to agent behaviors. In practice, considering cost, safety and similar factors, it is difficult to manually label the outcomes of an agent (vehicle) actually operating on the road, so behavior cloning is used to learn expressway driving behavior. A schematic of behavior cloning is shown in FIG. 2; it mainly consists of the following two parts:

Step 2.3.1, extracting state-behavior pairs from the NGSIM freeway data collected on Interstate 80 (NGSIM-I80 for short).

The state is represented by the speed, the front vehicle speed, the inter-vehicle distance and the acceleration; the driver reaction time is 1 s and the sampling interval is 1 s. After data processing, 803,773 state-behavior pairs are extracted from the NGSIM-I80 data, with each physical quantity rounded to two decimal places.

Step 2.3.2, training a neural network to obtain the imitation learning following model.

The extracted state-behavior pairs are fed into a neural network with three hidden layers for training, yielding the imitation learning following model. The structure of this behavior cloning network is shown in FIG. 3. The input vector has dimension 4 (speed, front vehicle speed, inter-vehicle distance and acceleration), the output vector has dimension 1 (acceleration), and the mapping between them is obtained through training. The three hidden layers have 16, 32 and 16 neurons respectively, and the Leaky ReLU activation function is used between layers.

The Leaky ReLU function is given by equation (1), with $\alpha_i = 0.3$ in the test case:

$$f(x_i) = \begin{cases} x_i, & x_i > 0 \\ \alpha_i x_i, & x_i \le 0 \end{cases} \tag{1}$$
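The 4-16-32-16-1 network with Leaky ReLU ($\alpha = 0.3$) can be sketched in NumPy as below, trained by plain full-batch gradient descent on the mean squared error between predicted and recorded acceleration. The initialization, the linear output layer (so that negative accelerations can be produced), the optimizer and the class name `BCFollowingModel` are assumptions; the source specifies only the layer sizes and the activation.

```python
import numpy as np

ALPHA = 0.3  # Leaky ReLU slope for non-positive inputs, as in the test case


def leaky_relu(x):
    return np.where(x > 0, x, ALPHA * x)


def leaky_relu_grad(x):
    return np.where(x > 0, 1.0, ALPHA)


class BCFollowingModel:
    """Fully connected 4 -> 16 -> 32 -> 16 -> 1 behavior cloning network (sketch)."""

    def __init__(self, sizes=(4, 16, 32, 16, 1), seed=0):
        rng = np.random.default_rng(seed)
        # He-style initialization (an assumption; the source does not specify one)
        self.W = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
                  for m, n in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(n) for n in sizes[1:]]

    def forward(self, x):
        """Return the output plus cached pre-activations and layer inputs for backprop."""
        acts, pres, h = [x], [], x
        for k, (W, b) in enumerate(zip(self.W, self.b)):
            z = h @ W + b
            pres.append(z)
            # Leaky ReLU on hidden layers; linear output so acceleration can be negative
            h = leaky_relu(z) if k < len(self.W) - 1 else z
            acts.append(h)
        return h, pres, acts

    def predict(self, x):
        return self.forward(np.atleast_2d(x))[0]

    def train_step(self, x, y, lr=1e-3):
        """One gradient-descent step on mean squared error; returns the loss before the update."""
        out, pres, acts = self.forward(x)
        loss = float(np.mean((out - y) ** 2))
        grad = 2.0 * (out - y) / len(x)          # dLoss/dOutput
        for k in reversed(range(len(self.W))):
            if k < len(self.W) - 1:
                grad = grad * leaky_relu_grad(pres[k])
            grad_W = acts[k].T @ grad
            grad_b = grad.sum(axis=0)
            grad = grad @ self.W[k].T            # propagate before updating W[k]
            self.W[k] -= lr * grad_W
            self.b[k] -= lr * grad_b
        return loss
```

Here `x` is an `(N, 4)` array of states and `y` an `(N, 1)` array of behavior-frame accelerations. Because speeds and distances have very different scales, standardizing each input feature before training is usually advisable.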

Step 3, acquiring vehicle operation information in the simulation in real time as input, and controlling the vehicle following behavior according to the acceleration output by the imitation learning following model obtained in step 2.

Vehicle operation information in the SUMO simulation can be obtained in real time through the TraCI interface and used as input to the imitation learning following model obtained in step 2, and the following behavior of the vehicle is controlled according to the output acceleration. The vehicle operation information comprises vehicle position, speed and acceleration.

As shown in FIG. 4, the process specifically includes the following steps:

Step 3.1, constructing the simulation scene.

A road network is built, the initial traffic flow is set, and the imitation learning following model trained in step 2 is applied to vehicle control in SUMO.

For road network construction, an existing real road network can be imported from the open-source map OpenStreetMap using the interfaces provided by SUMO, or a road network can be drawn manually in netedit, the road network editor built into SUMO, to generate a net.xml road network file.

A rou.xml file, which defines the vehicles, traffic flows and their routes in the simulation, is generated with reference to measured flow data.

A sumocfg configuration file is generated that references the net.xml road network file and the rou.xml vehicle, flow and route file.
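As a sketch of Step 3.1, the snippet below builds minimal rou.xml and sumocfg contents with Python's standard `xml.etree.ElementTree`. The element and attribute names (`vType`, `route`, `flow` with `vehsPerHour`; `net-file`, `route-files`, `step-length`) follow SUMO's file formats, while the IDs, edge names, file names, flow rate and the 0.1 s step length are placeholder values chosen for illustration.

```python
import xml.etree.ElementTree as ET


def build_routes(edges, veh_per_hour, end_s=3600):
    """Minimal rou.xml content: one passenger-car type, one route and one flow on it."""
    routes = ET.Element("routes")
    ET.SubElement(routes, "vType", id="car", vClass="passenger")
    ET.SubElement(routes, "route", id="r0", edges=" ".join(edges))
    ET.SubElement(routes, "flow", id="f0", type="car", route="r0",
                  begin="0", end=str(end_s), vehsPerHour=str(veh_per_hour))
    return ET.tostring(routes, encoding="unicode")


def build_sumocfg(net_file, route_file, end_s=3600, step_length=0.1):
    """Minimal sumocfg content referencing the network and route files."""
    cfg = ET.Element("configuration")
    inputs = ET.SubElement(cfg, "input")
    ET.SubElement(inputs, "net-file", value=net_file)
    ET.SubElement(inputs, "route-files", value=route_file)
    time_el = ET.SubElement(cfg, "time")
    ET.SubElement(time_el, "begin", value="0")
    ET.SubElement(time_el, "end", value=str(end_s))
    ET.SubElement(time_el, "step-length", value=str(step_length))
    return ET.tostring(cfg, encoding="unicode")
```

The two strings can be written to files such as `highway.rou.xml` and `highway.sumocfg`, and the scenario started with `sumo -c highway.sumocfg` (or `sumo-gui`).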

Step 3.2, controlling the vehicle following behavior in the simulation scene.

For vehicles running in the simulation scene, their operation information is obtained in real time through the TraCI interface. The state information comprises the vehicle speed, the front vehicle speed, the acceleration and the inter-vehicle distance.

As when screening following behavior data in the training set, stable following states are screened by the same criteria during the simulation run; in all other cases, SUMO's built-in default models are used: the Krauss car-following model and the LC2013 lane-change model.

In the stable following state, the operation information extracted through TraCI is fed into the multi-layer neural network imitation learning following model trained in step 2, and the acceleration it outputs is applied to vehicle control in real time through the TraCI interface.
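A minimal sketch of the Step 3.2 control loop using calls from SUMO's TraCI Python client (`traci.start`, `simulationStep`, `simulation.getDeltaT`, `vehicle.getLeader`, `vehicle.getSpeed`, `vehicle.getAcceleration`, `vehicle.setSpeed`). Several simplifications are assumptions: the stable-following check only tests that both vehicles are of class `passenger` (lane-change screening is omitted), the predicted acceleration is applied by setting the next-step target speed, and `policy` is a hypothetical callable wrapping the trained model. Passing `-1` to `setSpeed` hands the vehicle back to SUMO's default Krauss/LC2013 behavior. The `traci` module is passed in as a parameter so the step function can be exercised without a running SUMO instance.

```python
def control_step(traci, policy, dt, controlled):
    """Apply the learned model to every stably following vehicle for one simulation step.

    traci:      the traci module (injected so the function is testable without SUMO)
    policy:     hypothetical callable (speed, front_speed, gap, acceleration) -> acceleration
    controlled: set of vehicle IDs currently driven by the learned model
    """
    for vid in traci.vehicle.getIDList():
        # (leader_id, distance); older clients return None and newer ones ("", -1) if no leader
        leader = traci.vehicle.getLeader(vid, 200.0)
        stable = (
            leader is not None and leader[0] != ""
            and traci.vehicle.getVehicleClass(vid) == "passenger"
            and traci.vehicle.getVehicleClass(leader[0]) == "passenger"
        )
        if not stable:
            if vid in controlled:
                traci.vehicle.setSpeed(vid, -1)   # return control to Krauss / LC2013
                controlled.discard(vid)
            continue
        leader_id, dist = leader
        gap = dist + traci.vehicle.getMinGap(vid)  # getLeader's distance excludes minGap
        v = traci.vehicle.getSpeed(vid)
        a = traci.vehicle.getAcceleration(vid)
        a_next = policy(v, traci.vehicle.getSpeed(leader_id), gap, a)
        # Turn the predicted acceleration into a target speed for the next step
        traci.vehicle.setSpeed(vid, max(0.0, v + a_next * dt))
        controlled.add(vid)


def run(sumocfg, policy, steps=3600):
    import traci  # requires SUMO's Python tools on the module search path
    traci.start(["sumo", "-c", sumocfg])
    dt = traci.simulation.getDeltaT()  # step length in seconds
    controlled = set()
    try:
        for _ in range(steps):
            traci.simulationStep()
            control_step(traci, policy, dt, controlled)
    finally:
        traci.close()
```

With SUMO's default speed mode, `setSpeed` is still bounded by the safe speed and the vehicle's acceleration limits; `traci.vehicle.setSpeedMode` can relax these checks if the learned model should be applied unfiltered.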

During the simulation run, the vehicles' operating conditions (speed, acceleration, etc.) can be tracked continuously, which facilitates data analysis and model evaluation. The simulation run interface is shown in FIG. 5. The basic principle and main flow of the expressway vehicle following behavior reconstruction method based on the imitation learning algorithm have been shown and described above.

Compared with the prior art, the method has the following advantages: because stable following accounts for a high proportion of expressway driving, the invention focuses on expressway scenes, provides a technical framework for accurately reconstructing vehicle following behavior based on an imitation learning algorithm, and applies it in practice through simulation. Built on a standardized, highly compatible system framework, the method can be transferred to expressways with different characteristics, supports customized input parameters, and, after a simple data format conversion, can automatically fit following models for different scenes that reflect vehicle driving characteristics.

The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims

1. The expressway vehicle following behavior reconstruction method based on the imitation learning algorithm is characterized by comprising the following steps of:

Step (3), acquiring vehicle operation information in the simulation in real time as input, and controlling the vehicle following behavior according to the acceleration output by the imitation learning following model trained in step (2);

said step (1) further comprises the steps of:

Step (1.2), preprocessing the raw data representing the vehicle state and the corresponding driver behavior extracted in the step (1.1);

The step (2) further comprises the steps of:

The state-behavior pair refers to the mapping of states or observation vectors to agent behaviors in expert demonstration;

In said step (2.2), the extraction of said state-behavior pairs comprises the steps of:

a step (2.2.1) of extracting and processing data based on the step (1);

2. The expressway vehicle following behavior reconstruction method based on an imitation learning algorithm as claimed in claim 1, wherein: in said step (1.2), the preprocessing comprises field computation and data padding, which fills in missing data in some frames using a linear interpolation algorithm.

3. The method for reconstructing the following behavior of a highway vehicle based on an imitation learning algorithm as claimed in claim 2, wherein in step (1.1), obtaining the acceleration field in the field calculation comprises the steps of:

sorting the trajectory data in time order;

4. The method for reconstructing the following behavior of a highway vehicle based on an imitation learning algorithm as claimed in claim 2, wherein in step (1.1), obtaining the front vehicle information field in the field calculation comprises the steps of:

5. The method for reconstructing the following behavior of an expressway vehicle based on an imitation learning algorithm as claimed in claim 2, wherein the data filling uses a linear interpolation algorithm to fill missing data in some frames as follows:

Assuming that the data sampling frequency is $f$, when values are missing between a first time $t_1$ and a second time $t_2$ with $t_1 < t_2$, calculating the number $n$ of time nodes to be filled according to $n = f(t_2 - t_1) - 1$; performing linear interpolation according to $d_i = d_1 + (d_2 - d_1)\,i/n$, $i = 1, 2, \ldots, n$, to complete the numerical filling;

6. The method for reconstructing the expressway vehicle following behavior based on an imitation learning algorithm according to claim 1, wherein in step (3), vehicle operation information in the SUMO simulation is obtained in real time through the TraCI interface as input to the imitation learning following model trained in step (2), and the vehicle following behavior is controlled according to the output acceleration; wherein the vehicle operation information comprises vehicle position, speed and acceleration.