CN115171388A

CN115171388A - Multi-intersection travel time collaborative optimization method for intelligent internet vehicle

Info

Publication number: CN115171388A
Application number: CN202210851099.0A
Authority: CN
Inventors: 许明; 高登磊
Original assignee: Liaoning Technical University
Current assignee: Liaoning Technical University
Priority date: 2022-07-20
Filing date: 2022-07-20
Publication date: 2022-10-11

Abstract

The invention belongs to the field of artificial intelligence and relates in particular to a multi-intersection travel time collaborative optimization method for intelligent networked vehicles (ICVs). The ICV is combined with a reinforcement learning method and a new reward function is proposed: the reward function uses the average speed of vehicles in the traffic system as the reward value and penalizes cases in which the deceleration of a vehicle in the traffic system falls below a comfort parameter. SUMO software is used to configure vehicles with the IDM car-following model to simulate human-driven vehicles, and the calculation of the comfort parameter in the IDM car-following model is improved so that it is computed from the current traffic flow, the lane length, and the desired vehicle speed. The invention shows that combining reinforcement learning with ICVs effectively improves the traffic flow rate and traffic stability, and that ICVs trained with reinforcement learning effectively reduce frequent changes in vehicle acceleration.

Description

A collaborative optimization method of multi-intersection travel time for intelligent networked vehicles

Technical Field

The invention belongs to the field of artificial intelligence, and in particular relates to a multi-intersection travel time collaborative optimization method for intelligent networked vehicles.

Background

The Internet of Vehicles (IoV) applies the concept of the Internet of Things to transportation. Using modern electronic sensing, radio communication, and control technology, the IoV connects three information subjects (people, vehicles, and the environment) through a network and, supported by big data, enables intelligent vehicle control. Intelligent networked vehicle (ICV) is the general term for all connected vehicles once IoV technology has matured. In China, the arrival of Tesla vehicles shows that ICVs and human-driven vehicles will coexist for a long time, which makes coordination among ICVs more difficult. ICVs allow a vehicle to obtain basic state information such as the position, speed, and acceleration of surrounding vehicles, and even to adjust its own state using information obtained through a centralized processor. At present, multi-agent reinforcement learning is generally used to solve the problems that arise in mixed traffic of ICVs and human-driven vehicles.

Using deep reinforcement learning to let machines learn human behavior in virtual scenes has produced artificial agents that can learn to perform well in a variety of challenging tasks, but it does not overcome the shortcomings of DQN (a deep reinforcement learning method), such as low training efficiency and long training time. In transportation systems, traffic congestion wastes a great deal of time and slows traffic, and it is one of the main challenges that traffic authorities and traffic participants must overcome. Among the many congestion problems, intersection congestion is the top priority. It is therefore necessary to use a multi-agent reinforcement learning method to solve intersection congestion in mixed traffic flow.

Early approaches to intersection congestion generally fall into two categories: centralized and distributed. The centralized approach considers a cooperative environment and directly extends a single-agent algorithm so that it learns the output of a joint action; however, it does not readily specify how each individual agent should decide, which leads to frequent changes in vehicle acceleration. In the distributed approach each agent learns its own reward function independently; to each agent the other agents are part of the environment, so the non-stationarity of the environment often has to be considered. The prior art describes a scheme in which vehicles send requests to pass the intersection to a central processor one by one under a first-in-first-out (FIFO) policy; the central processor handles the requests centrally and sends confirmations to the vehicles, and each vehicle, once it has received its confirmation, queues to pass through the intersection. A follow-up work extends the scheme to a network of interconnected intersections, exploring the best routes to guide vehicles to intersections so as to minimize their delay through the network. The reservation-based unsignalized intersection scheme was developed further by relaxing the FIFO queuing policy, which gives better performance than the earlier FIFO-based reservation scheme. A fuzzy control model was also derived from the FIFO queuing policy: when a vehicle enters an intersection it sends a pass request to the centralized processor; the processor groups vehicles according to the vehicle information at that intersection (position, vehicle size, and so on), computes the average waiting time of each group according to fuzzy rules, and then orders the groups by their scores. Vehicles are thus grouped by fuzzy rules and may pass only after their request to the centralized controller has been approved. Such a model, however, essentially imitates the way traffic signals direct traffic so that vehicles pass in an orderly manner; it does not fully realize the ICV's potential to reduce frequent changes in vehicle acceleration, and it also weakens the effect of the car-following model on the vehicles.

Summary of the Invention

To address the deficiencies of the prior art, the present invention provides a multi-intersection travel time collaborative optimization method for intelligent networked vehicles.

A multi-intersection travel time collaborative optimization method for intelligent networked vehicles specifically includes the following steps:

Step 1: In SUMO, establish a traffic intersection scene in which the main road has three lanes and the entry (merging) and exit roads each have two lanes, and set the vehicle speed limit and lane flow directions so that the scene simulates a real traffic intersection. An instability parameter is added at the exit lanes, allowing a vehicle in the rightmost lane to turn left suddenly as it approaches a left-turn junction;

Step 2: Set a traffic flow parameter that controls the number of vehicles entering the lanes per hour. While simulating the traffic intersection scene, add a car-following model and a lane-changing model to simulate human-driven vehicles in the real world, so that each vehicle decides from the state of the vehicle ahead whether it needs to accelerate, decelerate, or change lanes;

Step 2.1: Use the IDM car-following model, which takes the desired speed of the current vehicle and its gap to the leading vehicle as variables to calculate the optimal acceleration required by the current vehicle, specifically:

where v0 is the desired speed, s^* the desired distance, T the time headway, s₀ the minimum gap, δ the acceleration exponent, α the acceleration term, v the vehicle's own speed, b the comfort level, s the distance to the vehicle ahead, Δv the speed difference between the vehicle and the vehicle ahead, and a the current vehicle acceleration;
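The symbols listed above match those of the standard Intelligent Driver Model. As a reference sketch, and assuming that the standard textbook formulation is the one intended (with Δv taken as the vehicle's own speed minus the leader's speed), the acceleration and the desired distance can be written as:

```latex
a = \alpha \left[ 1 - \left( \frac{v}{v_0} \right)^{\delta}
      - \left( \frac{s^{*}(v, \Delta v)}{s} \right)^{2} \right],
\qquad
s^{*}(v, \Delta v) = s_0 + v T + \frac{v \, \Delta v}{2 \sqrt{\alpha b}}
```

In this form a smaller comfort level b enlarges the last term of s^*, so the vehicle keeps a longer desired distance when it is closing in on its leader.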

In addition, the comfort variable b is included to improve passenger comfort. Different values of b produce different desired distances s^*; once the safe following distance is exceeded, a larger desired distance reduces the efficiency of the whole traffic system. To bring the vehicle as close as possible to its optimal acceleration, the comfort level b is therefore improved, with the formula as follows:

F = Vρ

where F is the traffic flow, V is the traffic speed, ρ is the traffic density, H is the lane length, h is the vehicle body length, and b is the comfort level; the comfort level is improved according to the traffic flow and the traffic speed;
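The relation F = Vρ can be made concrete with a short numerical sketch. The helper names and the numbers below are illustrative and not taken from the patent; how H and h enter the improved comfort formula is not stated here, so the bumper-to-bumper bound is only an assumption about how the two lengths relate.

```python
def density_from_count(vehicle_count: int, lane_length_m: float) -> float:
    """Traffic density rho in vehicles per metre, for a lane of length H."""
    if lane_length_m <= 0:
        raise ValueError("lane length must be positive")
    return vehicle_count / lane_length_m


def traffic_flow(speed_mps: float, density_veh_per_m: float) -> float:
    """Traffic flow F = V * rho, in vehicles per second."""
    return speed_mps * density_veh_per_m


def jam_capacity(lane_length_m: float, body_length_m: float) -> int:
    """Assumed upper bound on vehicles in the lane: floor(H / h)."""
    if body_length_m <= 0:
        raise ValueError("body length must be positive")
    return int(lane_length_m // body_length_m)


# Illustrative numbers only: 20 vehicles on a 500 m lane moving at 10 m/s.
rho = density_from_count(vehicle_count=20, lane_length_m=500.0)
flow_per_hour = traffic_flow(speed_mps=10.0, density_veh_per_m=rho) * 3600
max_vehicles = jam_capacity(lane_length_m=500.0, body_length_m=5.0)
```

Converting the flow to vehicles per hour matches the unit of the traffic flow parameter in step 2, which is expressed as vehicles entering per hour.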

Step 2.2: Use the lane-changing model to decide whether the current vehicle changes lanes. The lane-changing model explicitly defines four lane-change motivations: strategic change, cooperative change, tactical change, and obligatory change;

Whether a lane change can take place is decided by computing the lane-change demand from the vehicle speed and the desired speed, and the lane-change urgency from the vehicle speed, the desired speed, the distance to the leading vehicle, and the lane occupancy;

A preferred candidate lane is selected according to the vehicle's lane-change demand; the safe speed in the current lane is computed and combined with the speed requirement of the candidate lane change; and the decision whether to change lanes is made from the magnitudes of the vehicle's lane-change demand and lane-change urgency;

Step 3: Combine the vehicles, as multiple agents, with the reinforcement learning method PPO; these are the intelligent networked vehicles;

The vehicle speed, position, acceleration, and desired speed form the state space, and the acceleration forms the action space. The PPO reward function is designed as follows: the sum of the average speeds of the vehicles is taken as the reward value and the comfort level is set as a condition, so that a penalty is added to the reward function if a vehicle's deceleration falls below the comfort level. The specific formula is as follows:

The reward function re consists of three parts: the average speed V_averge, the instantaneous braking deceleration A_real, and the collision penalty Z. w₁, w₂, and w₃ are the weights for the average speed, a collision between two vehicles, and excessively rapid deceleration, respectively, and A_max is the configured maximum acceleration. Under normal driving, when a vehicle decelerates the deceleration increases slowly within a certain range; but if the leading vehicle ahead stops suddenly, the vehicle is forced to brake, and in that case the acceleration is set to the instantaneous braking deceleration. As more intelligent networked vehicles join the traffic, the overall average speed increases.

Beneficial technical effects of the present invention:

As the urbanization rate rises, traffic congestion is concentrated at intersections. To improve the traffic flow rate and traffic stability, and to solve the congestion that mixed traffic of intelligent networked vehicles and non-connected vehicles produces across multiple intersections, the present invention proposes a multi-intersection travel time collaborative optimization method for intelligent networked vehicles. The ICV is combined with a reinforcement learning method and a new reward function is proposed; the reward function takes the average speed of vehicles in the traffic system as the reward value and penalizes cases in which the deceleration of a vehicle in the traffic system falls below the comfort parameter. SUMO software is used to configure vehicles with the IDM car-following model to simulate human-driven vehicles, and the calculation of the comfort parameter in the IDM car-following model is improved so that it is computed from the current traffic flow, the lane length, and the desired vehicle speed. The invention shows that combining reinforcement learning with ICVs effectively improves the traffic flow rate and traffic stability, and verifies that ICVs trained with reinforcement learning effectively reduce frequent changes in vehicle acceleration.

Description of the Drawings

FIG. 1 is a schematic diagram of the simulated intersection environment structure for the multi-intersection travel time collaborative optimization method for intelligent networked vehicles according to an embodiment of the present invention;

FIG. 2 is a flowchart of the multi-intersection travel time collaborative optimization method for intelligent networked vehicles according to an embodiment of the present invention;

FIG. 3 shows the consecutive T-intersection traffic scene according to an embodiment of the present invention;

FIG. 4 shows the experimental results for human-driven vehicles simulated by the car-following model and the lane-changing model according to an embodiment of the present invention;

FIG. 5 shows the experimental results after the vehicles, as multiple agents, are combined with the reinforcement learning method PPO according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of the relationship between traffic flow and speed according to an embodiment of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and embodiments.

The method of the invention is implemented with the SUMO software and the FLOW framework, using the Rllab reinforcement learning algorithms.

The experiment is simulated with the SUMO simulation software and the open-source FLOW framework. The experimental architecture is shown in FIG. 1. The simulated scene is the consecutive T-intersection traffic scene shown in FIG. 3, and vehicles are allowed to change their route temporarily when they are about to pass through an intersection. The simulation environment and parameter settings are introduced below, together with an analysis of the algorithm.

A multi-intersection travel time collaborative optimization method for intelligent networked vehicles, as shown in FIG. 2, specifically includes the following steps:

Step 1: In SUMO, establish a traffic intersection scene in which the main road has three lanes and the entry (merging) and exit roads each have two lanes, and set the vehicle speed limit and lane flow directions so that the scene simulates a real traffic intersection, for example "from":"edge_{}".format(i),"to":"edge_{}".format(i+1). An instability parameter is added at the exit lanes, allowing a vehicle in the rightmost lane to turn left suddenly as it approaches a left-turn junction;
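The "from"/"to" fragment in step 1 suggests that the road segments are generated in a loop, each one connecting endpoint i to endpoint i + 1. The following is a minimal sketch of that pattern in plain Python. The function name, the "main_{}" identifiers, and the default length, lane count, and speed limit are hypothetical; only the "from" and "to" expressions come from the text.

```python
def build_edges(num_segments, length_m=100.0, num_lanes=3, speed_limit=15.0):
    """Return edge dictionaries that chain endpoint i to endpoint i + 1."""
    edges = []
    for i in range(num_segments):
        edges.append({
            "id": "main_{}".format(i),       # hypothetical edge identifier
            "from": "edge_{}".format(i),     # expression quoted in step 1
            "to": "edge_{}".format(i + 1),   # expression quoted in step 1
            "length": length_m,              # illustrative value, in metres
            "numLanes": num_lanes,           # three-lane main road
            "speed": speed_limit,            # illustrative limit, in m/s
        })
    return edges


edges = build_edges(3)
```

Each dictionary describes one directed segment, so consecutive entries share an endpoint and form a continuous main road.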

Simulation environment and parameter settings:

In the experiment, the required traffic scene is generated by connecting the public interface of the FLOW framework to the TraCI interface of the SUMO simulation software. An instability parameter of 0.1 is added at the exit lanes, allowing a vehicle in the rightmost lane to turn left suddenly as it approaches a left-turn junction. Reinforcement learning is then used to generate the control policy for the ICVs: training runs for 100 iterations, each episode has 200 time slots, and each time slot lasts 0.2 seconds, with an optimizer used throughout. The relevant metrics are recorded for each iteration.
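The training settings quoted above can be collected in one place to check what they imply. The sketch below uses a plain Python dataclass, not the FLOW configuration classes; the class and field names are hypothetical, and the number of episodes per iteration is an assumption because the text does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSchedule:
    iterations: int = 100           # training iterations (from the text)
    horizon: int = 200              # time slots per episode (from the text)
    sim_step_s: float = 0.2         # seconds per time slot (from the text)
    exit_instability: float = 0.1   # instability parameter at the exit lanes

    @property
    def episode_seconds(self) -> float:
        """Simulated duration of one episode."""
        return self.horizon * self.sim_step_s

    def total_steps(self, episodes_per_iteration: int = 1) -> int:
        """Total simulation steps; episodes per iteration is an assumption."""
        return self.iterations * self.horizon * episodes_per_iteration


schedule = TrainingSchedule()
```

With these values one episode covers about 40 seconds of simulated traffic, and a run with one episode per iteration contains 20,000 simulation steps.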

The hyperparameters of the experimental setup are shown in Table 1. The number of significant figures for data of the same type must be consistent.

Table 1: Hyperparameters of the experimental setup

F = Vρ

Introducing the comfort level increases the stability of the traffic system to some extent, but an excessively large comfort level increases resource consumption for the traffic system. For the traffic system, the traffic flow and the traffic speed follow the linear relationship shown in FIG. 6: as vehicles first flow into the traffic system, the traffic density increases as the traffic speed increases; once the growing traffic flow levels off, a further increase in traffic speed reduces the traffic flow that the traffic system carries. The improved comfort calculation therefore computes, from the traffic flow and traffic speed in the traffic system, the optimal comfort level for that traffic flow.

Vehicle control is divided into longitudinal and lateral control: longitudinal control uses the IDM car-following model, and lateral control uses the LC2013 lane-changing model built into SUMO. In a complex multi-lane road network, most vehicles need to change between lanes in the same direction while driving, which not only improves the efficiency of the whole traffic system but also reduces frequent changes in vehicle acceleration. A vehicle's speed is mainly determined by the vehicle ahead in its lane; when the current vehicle wants to change lanes, it performs the lane change only when the target lane has enough physical space, so as to avoid colliding with the vehicles ahead and behind in the target lane.

During the simulation, a lane change that a vehicle must make in order to reach the next road on its route is called a strategic lane change. For example, if the road has three lanes and the vehicle is in the second lane but has to turn after a unit of time, the vehicle must wait to change lanes even if that means stopping. A lane change made because the vehicle has been informed by other vehicles of congestion ahead is called a cooperative lane change. For example, if the leading vehicle has to perform a strategic lane change and stops as a result, the vehicle develops a lane-change demand from the observed change in the leader's speed and changes lanes. The tactical lane-change motivation arises when the vehicle wants to avoid following a slow leading vehicle. The obligatory lane-change motivation arises when the vehicle changes lanes so as not to impede other, faster vehicles.
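The choice of IDM for longitudinal control and LC2013 for lateral control corresponds to a vehicle type definition in SUMO. As a minimal sketch, the snippet below builds such a `vType` element with Python's standard XML library. The type id and all numeric defaults are hypothetical, and it is assumed here that SUMO's `decel` attribute plays the role of the comfort level b for the IDM model.

```python
import xml.etree.ElementTree as ET


def human_vtype(comfort_decel, accel=1.0, tau=1.0, min_gap=2.0, delta=4.0):
    """Build a SUMO vType element for a simulated human-driven vehicle."""
    return ET.Element("vType", {
        "id": "human",                  # hypothetical type id
        "carFollowModel": "IDM",        # longitudinal control
        "laneChangeModel": "LC2013",    # lateral control
        "accel": str(accel),            # acceleration term (alpha), m/s^2
        "decel": str(comfort_decel),    # assumed to act as comfort level b
        "tau": str(tau),                # time headway T, in seconds
        "minGap": str(min_gap),         # minimum gap s0, in metres
        "delta": str(delta),            # acceleration exponent
    })


# Illustrative comfort level only; the patent computes b from traffic data.
xml_text = ET.tostring(human_vtype(comfort_decel=1.5), encoding="unicode")
```

Keeping the comfort level as a function argument makes it straightforward to regenerate the vehicle type whenever b is recomputed from the current traffic flow and speed.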

As long as the traffic volume threshold is not exceeded, lane changes occur more noticeably; to some extent the lane-changing model slows the onset of congestion and shortens the congestion period. In this simulation, every lane change is made on the premise that it does not affect the vehicle state after the lane change. The lane-changing model also makes the car-following comfort parameter computed in this study more widely applicable.

The reward function re consists of three parts: the average speed V_averge, the instantaneous braking deceleration A_real, and the collision penalty Z. w₁, w₂, and w₃ are the weights for the average speed, a collision between two vehicles, and excessively rapid deceleration, respectively, and A_max is the configured maximum acceleration. Under normal driving, when a vehicle decelerates the deceleration increases slowly within a certain range; but if the leading vehicle ahead stops suddenly, the vehicle is forced to brake, and in that case the acceleration is set to the instantaneous braking deceleration. As more intelligent networked vehicles join the traffic, the overall average speed increases.
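The exact reward formula is not reproduced in this text, so the following is only a hypothetical sketch of a reward with the three components and weights named above. It assumes that the braking penalty applies when the vehicle brakes harder than the comfort level b and that the excess is scaled by A_max; both choices are assumptions rather than the patent's formula.

```python
def reward(speeds, a_real, collided, comfort_b, a_max, w1=1.0, w2=1.0, w3=1.0):
    """Hypothetical reward: average speed minus collision and braking penalties."""
    # V_averge: mean speed over all vehicles in the system.
    v_averge = sum(speeds) / len(speeds) if speeds else 0.0

    # Z: collision penalty, 1 when two vehicles have collided.
    z = 1.0 if collided else 0.0

    # Braking magnitude; a negative a_real means the vehicle is decelerating.
    decel = max(0.0, -a_real)

    # Assumption: penalize only the part of the braking that exceeds b,
    # normalized by the configured maximum acceleration A_max.
    braking_penalty = max(0.0, decel - comfort_b) / a_max

    return w1 * v_averge - w2 * z - w3 * braking_penalty


# Illustrative call: gentle braking within the comfort level, no collision.
gentle = reward([10.0, 20.0], a_real=-1.0, collided=False,
                comfort_b=1.5, a_max=3.0)
```

Under these assumptions gentle braking leaves the reward equal to the weighted average speed, while emergency braking and collisions both reduce it.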

In the experiment, vehicles running the reinforcement learning algorithm are added to the traffic system in different proportions; these vehicles clearly reduce the problem of vehicles changing speed frequently. Different reward functions are also compared.

The experimental results are shown in FIG. 4 (vehicles without the reinforcement learning method) and FIG. 5 (vehicles combined with the reinforcement learning method). The effect of the reinforcement learning algorithm on the traffic system is examined first by comparing the two figures. The plotted data record the average speed of each vehicle over its whole trip through the traffic system, from start to finish. FIG. 5 is clearly better than FIG. 4: after the vehicles are combined with the reinforcement learning method, the overall average speed increases by 2.5 times. In FIG. 4, without the reinforcement learning method, vehicles travel slowly and congestion is severe; with the reinforcement learning method, as in FIG. 5, the vehicles pass through the traffic system more efficiently.

The experiment uses the SUMO simulation software to simulate traffic conditions in a real scene and studies the use of the deep reinforcement learning algorithm PPO to solve intersection congestion in mixed traffic flow, efficiently improving the stability of the traffic system and the traffic flow. Intelligent networked vehicles are also added to the traffic system in different proportions, demonstrating the positive effect of ICV development on smart cities.

Claims

1. A multi-intersection travel time collaborative optimization method for intelligent networked vehicles, characterized by specifically comprising the following steps:

step 1: establishing in SUMO a traffic intersection scene in which the main road has three lanes and the entry (merging) and exit roads each have two lanes, setting the vehicle speed limit and lane flow directions to simulate a real traffic intersection, and adding an instability parameter at the exit lanes to allow a vehicle in the rightmost lane to turn left suddenly when approaching a left-turn junction;

step 2: setting a traffic flow parameter for controlling the number of vehicles entering the lanes per hour and, in the process of simulating the traffic intersection scene, adding a car-following model and a lane-changing model to simulate human-driven vehicles in the real world, so that each vehicle judges from the state of the vehicle ahead whether it needs to accelerate, decelerate, or change lanes;

step 3: combining the vehicles, as multiple agents, with the reinforcement learning method PPO, namely as intelligent networked vehicles; as more intelligent networked vehicles join the traffic, the overall average speed is improved.

2. The method for collaborative optimization of the multi-intersection travel time of the intelligent networked vehicle according to claim 1, wherein the step 2 specifically comprises:

step 2.1: using the IDM car-following model, which takes the desired speed of the current vehicle and its gap to the leading vehicle as variables to calculate the optimal acceleration required by the current vehicle, specifically:

wherein v0 is the desired speed, s^* the desired distance, T the time headway, s₀ the minimum gap, δ the acceleration exponent, α the acceleration term, v the vehicle's own speed, b the comfort level, s the distance to the vehicle ahead, Δv the speed difference between the vehicle and the vehicle ahead, and a the current vehicle acceleration;

and including the comfort variable b in order to improve the comfort of passengers riding in the vehicle; different values of b produce different desired distances s^*, and once the safe following distance is exceeded, a larger desired distance reduces the efficiency of the whole traffic system; in order to bring the vehicle as close as possible to its optimal acceleration, the comfort level b is improved, with the formula as follows:

F = Vρ

wherein F is the traffic flow, V is the traffic speed, ρ is the traffic density, H is the lane length, h is the vehicle body length, and b is the comfort level; the comfort level is improved according to the traffic flow and the traffic speed;

step 2.2: judging through the lane-changing model whether the current vehicle changes lanes; the lane-changing model explicitly defines four lane-change motivations: strategic change, cooperative change, tactical change, and obligatory change;

whether a lane change can take place is determined by computing the lane-change demand from the vehicle speed and the desired speed, and the lane-change urgency from the vehicle speed, the desired speed, the distance to the leading vehicle, and the lane occupancy;

selecting a preferred candidate lane according to the lane-change demand of the vehicle; computing the safe speed in the current lane and combining it with the speed requirement of the candidate lane change; and determining whether to change lanes according to the magnitudes of the vehicle's lane-change demand and lane-change urgency.

3. The method for collaborative optimization of multi-intersection travel time of the intelligent networked vehicle as claimed in claim 1, wherein combining the vehicles, as multiple agents, with the reinforcement learning method PPO in step 3 is specifically as follows:

taking the vehicle speed, position, acceleration, and desired speed as the state space and the acceleration as the action space; designing the PPO reward function by taking the sum of the average speeds of the vehicles as the reward value, setting the comfort level as a condition, and adding a penalty to the reward function if the deceleration of a vehicle falls below the comfort level; the specific formula is as follows:

the reward function re consists of three parts: the average speed V_averge, the instantaneous braking deceleration A_real, and the collision penalty Z; w₁, w₂, and w₃ are the weights for the average speed, a collision between two vehicles, and excessively rapid deceleration, respectively, and A_max is the configured maximum acceleration; under normal driving, when a vehicle decelerates the deceleration increases slowly within a certain range, but if the leading vehicle ahead stops suddenly, the vehicle is forced to brake, and the acceleration is then set to the instantaneous braking deceleration; as more intelligent networked vehicles join the traffic, the overall average speed is improved.