
CN115257819A - Decision-making method for safe driving of large-scale commercial vehicle in urban low-speed environment - Google Patents

Decision-making method for safe driving of large-scale commercial vehicle in urban low-speed environment

Info

Publication number
CN115257819A
CN115257819A (application number CN202211070514.5A)
Authority
CN
China
Prior art keywords
sub
vehicle
driving
decision
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211070514.5A
Other languages
Chinese (zh)
Other versions
CN115257819B
Inventor
李旭
胡玮明
胡悦
胡锦超
陆红伟
徐启敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202211070514.5A
Publication of CN115257819A
Application granted
Publication of CN115257819B
Legal status: Active
Anticipated expiration

Classifications

    • B60W60/001 Planning or execution of driving tasks (drive control systems specially adapted for autonomous road vehicles)
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • G06F17/15 Correlation function computation including computation of convolution operations
    • B60W2050/0001 Details of the control system
    • B60W2050/0028 Mathematical models, e.g. for simulation
    • B60W2300/125 Heavy duty trucks
    • B60W2420/403 Image sensing, e.g. optical camera
    • B60W2420/408 Radar; Laser, e.g. lidar
    • B60W2556/10 Historical data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Transportation (AREA)
  • Computational Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Analysis (AREA)
  • Automation & Control Theory (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mechanical Engineering (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a decision-making method for the safe driving of large commercial vehicles in an urban low-speed environment. First, the safe driving behavior of human drivers in the urban traffic environment is collected to build a safe driving behavior dataset. Second, a multi-head-attention-based safe driving decision-making model for commercial vehicles is constructed. The model contains two sub-networks: a double deep Q-network and a generative adversarial imitation learning network. The double deep Q-network learns safe driving strategies in edge scenarios such as dangerous and conflict scenarios in an unsupervised manner, while the generative adversarial imitation learning sub-network imitates the safe driving behavior of human drivers under different driving conditions and operating conditions. Finally, the safe driving decision-making model is trained to obtain driving strategies for different driving conditions and operating conditions. The proposed method can imitate the safe driving behavior of human drivers and takes into account the influence of factors such as visual blind spots and sudden obstacles on driving safety.

Description

Decision-making method for safe driving of large commercial vehicles in an urban low-speed environment

Technical Field

The invention relates to a driving decision-making method for commercial vehicles, in particular to a method for making safe driving decisions for large commercial vehicles in an urban low-speed environment, and belongs to the technical field of automotive safety.

Background Art

In urban traffic environments, road traffic accidents caused by the driver's blind spots account for the highest proportion, and the vehicles at fault in these accidents are mostly large commercial vehicles such as heavy goods vehicles, large passenger cars, and truck-trailer combinations. Unlike passenger cars, large commercial vehicles are characterized by a large body, long wheelbase, and high driving position, and many static and dynamic visual blind spots exist around the vehicle body, for example in front of the cab, near the right front wheel, and below the right rear-view mirror. When a commercial vehicle turns, especially when it turns right, it can easily collide with or even run over pedestrians and non-motorized vehicles inside these blind spots, which is where the most serious safety accidents occur. In addition, compared with the relatively closed highway scenario, the urban traffic environment with mixed motorized and non-motorized traffic contains more types and larger numbers of traffic participants, and commercial vehicles frequently encounter sudden obstacles, making it more dangerous. Therefore, how to improve the driving safety of commercial vehicles in an open urban traffic environment with interference from multiple traffic targets is a key problem that urgently needs to be solved and is also the focus of ensuring urban road traffic safety.

At present, actively developing autonomous driving technology has become a widely recognized means, both domestically and internationally, of ensuring vehicle operating safety. As a key link in achieving high-quality autonomous driving, driving decision-making determines the rationality and safety of the autonomous operation of commercial vehicles. If the driver can be warned of danger 1.5 seconds before a traffic accident and provided with a reliable and effective safe driving strategy, the frequency of traffic accidents caused by factors such as visual blind spots and sudden obstacles can be greatly reduced. Therefore, studying safe driving decision-making methods for large commercial vehicles plays an important role in ensuring their driving safety.

Many patents and papers have studied collision-avoidance driving decision-making, but mainly for passenger vehicles. Compared with passenger vehicles, commercial vehicles have larger visual blind spots and longer braking distances and braking times, so collision-avoidance decision-making methods designed for passenger vehicles cannot be applied directly to commercial vehicles. On the other hand, some patents have studied safe driving decision-making for commercial vehicles, such as a highly human-like safe driving decision-making method for autonomous commercial vehicles (application number 202210158758.2) and a deep-learning-based lane-change decision-making method for large commercial vehicles (publication number CN113954837A), but these decision-making methods are all oriented to highway scenarios.

Unlike highway scenarios with few types of traffic participants, the urban traffic environment is open, contains interference from many traffic targets, and mixes motorized and non-motorized traffic. In particular, factors such as vehicle blind spots and sudden obstacles pose greater challenges to the safe driving of commercial vehicles in urban traffic. Therefore, safe driving decision-making methods for commercial vehicles designed for highway scenarios cannot be applied directly to the open, interference-rich urban traffic environment.

In general, for the open urban traffic environment with interference from multiple traffic targets, existing methods can hardly meet the requirements of commercial vehicles for safe driving decision-making. There is still a lack of safe driving decision-making methods that can provide concrete driving suggestions such as driving actions and driving paths, and in particular a lack of research on safe driving decision-making for large commercial vehicles that considers the influence of visual blind spots and sudden obstacles.

Summary of the Invention

Purpose of the invention: in order to realize safe driving decision-making for large commercial vehicles in an urban low-speed environment and ensure driving safety, the present invention proposes a safe driving decision-making method for large commercial vehicles in an urban low-speed environment, aimed at autonomous commercial vehicles such as heavy goods vehicles and heavy trucks. The method comprehensively considers the influence of factors such as visual blind spots, sudden obstacles, and different operating conditions on driving safety, can imitate the safe driving behavior of human drivers, and provides more reasonable and safer driving strategies for autonomous commercial vehicles, thereby effectively ensuring their driving safety. In addition, the method does not need to consider complex vehicle dynamics equations or body parameters, the calculation is simple and clear, the safe driving strategy of an autonomous commercial vehicle can be output in real time, and the sensors used are inexpensive, which facilitates large-scale deployment.

Technical solution: to achieve the purpose of the invention, the technical solution adopted by the present invention is a safe driving decision-making method for large commercial vehicles in an urban low-speed environment. First, the safe driving behavior of human drivers in the urban traffic environment is collected to construct a safe driving behavior dataset. Second, a multi-head-attention-based safe driving decision-making model for commercial vehicles is constructed. The model contains two sub-networks: a double deep Q-network (DDQN) and a generative adversarial imitation learning (GAIL) network. The DDQN learns safe driving strategies in edge scenarios such as dangerous and conflict scenarios in an unsupervised manner, while the GAIL sub-network imitates safe driving behavior under different driving conditions and operating conditions. Finally, the safe driving decision-making model is trained to obtain driving strategies for different driving conditions and operating conditions, realizing high-level decision output for the safe driving behavior of commercial vehicles. The method specifically includes the following steps:

Step 1: Collect the safe driving behavior of human drivers in the urban traffic environment

In order to achieve driving decisions comparable to those of human drivers, the present invention collects safe driving behavior under different driving conditions and operating conditions through real road tests and driving simulation, and then constructs a dataset characterizing the safe driving behavior of human drivers. This includes the following four sub-steps:

Sub-step 1: Build a synchronized multi-dimensional target information acquisition system using millimeter-wave radar, a 128-line lidar, vision sensors, a BeiDou positioning sensor, and an inertial sensor.

Sub-step 2: In a real urban environment, several drivers in turn drive a commercial vehicle equipped with the synchronized multi-dimensional target information acquisition system. Data related to various driving behaviors, such as lane changing, lane keeping, car following, and acceleration and deceleration, are collected and processed to obtain multi-source heterogeneous descriptions of each driving behavior, for example the distances to obstacles in different directions measured by the radar or vision sensors, the position, velocity, acceleration, and yaw rate measured by the BeiDou and inertial sensors, and the steering wheel angle measured by on-board sensors.

Sub-step 3: In order to imitate safe driving behavior in edge scenarios such as dangerous and conflict scenarios, a virtual urban scene based on hardware-in-the-loop simulation is built. The constructed urban traffic scenarios include the following three categories:

(1) While the vehicle is driving, a traffic participant approaches laterally in front of the vehicle (i.e., a sudden obstacle);

(2) While the vehicle is turning, a stationary traffic participant is present in the vehicle's visual blind spot;

(3) While the vehicle is turning, a moving traffic participant is present in the vehicle's visual blind spot.

These traffic scenarios contain multiple road network structures (straight roads, curves, and intersections) and multiple types of traffic participants (commercial vehicles, passenger cars, non-motorized vehicles, and pedestrians).

Several drivers drive the commercial vehicle in the virtual scene through real controls (steering wheel, accelerator, and brake pedal), and information such as the ego vehicle's lateral and longitudinal position, lateral and longitudinal velocity, lateral and longitudinal acceleration, and its relative distance and relative velocity to surrounding traffic participants is collected.

Sub-step 4: Based on the data collected in the real urban environment and in the driving simulation environment, a driving behavior dataset for learning safe driving decisions is constructed, which can be expressed as:

X = {(s_1, a_1), (s_2, a_2), ..., (s_n, a_n)}    (1)

where X is the set of state-action pairs, i.e., the constructed dataset characterizing the safe driving behavior of human drivers; (s_j, a_j) is the state-action pair at time j, where s_j is the state at time j and a_j is the action taken by the human driver given state s_j; and n is the number of state-action pairs in the dataset.
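For illustration, a minimal sketch of how the dataset X of equation (1) could be organized in code is given below. The container type and the field names (ego_state, relative_states, grid) are assumptions made for this sketch and are not specified by the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class State:
    """One state s_j: ego motion, relative motion of nearby participants, occupancy grid."""
    ego_state: np.ndarray        # S_1(t): [p_x, p_y, v_x, v_y, a_x, a_y, theta_s]
    relative_states: np.ndarray  # S_2(t): one row per surrounding traffic participant
    grid: np.ndarray             # S_3(t): binary "presence" occupancy grid

# The dataset X of equation (1): n expert "state-action" pairs (s_j, a_j),
# where a_j indexes one of the six discrete actions of equation (5).
DrivingDataset = List[Tuple[State, int]]

def add_sample(dataset: DrivingDataset, state: State, action: int) -> None:
    """Append one recorded human driving decision to the dataset."""
    dataset.append((state, action))
```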

Step 2: Construct a multi-head-attention-based safe driving decision-making model for commercial vehicles

In order to realize safe driving decision-making for large commercial vehicles in the urban low-speed environment, the present invention comprehensively considers the influence of visual blind spots, sudden obstacles, operating conditions, and other factors on driving safety and establishes a safe driving decision-making model for commercial vehicles. Since deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning and explores the traffic environment in an unsupervised manner, the present invention uses deep reinforcement learning to learn safe driving strategies in edge scenarios such as dangerous and conflict scenarios. In addition, since imitation learning can reproduce demonstrated behavior, the present invention uses imitation learning to imitate the safe driving behavior of human drivers under different driving conditions and operating conditions. The constructed safe driving decision-making model therefore consists of two parts, described as follows:

Sub-step 1: Define the basic parameters of the safe driving decision-making model

First, the safe driving decision-making problem in the urban low-speed environment is formulated as a finite Markov decision process. Then the basic parameters of the safe driving decision-making model are defined.

(1) Define the state space

In order to describe the motion states of the ego vehicle and nearby traffic participants, the present invention constructs the state space from time-series data and an occupancy grid map, as follows:

S_t = [S_1(t), S_2(t), S_3(t)]    (2)

where S_t is the state space at time t, S_1(t) and S_2(t) are the parts of the state space derived from time-series data at time t, and S_3(t) is the part derived from the occupancy grid map at time t.

First, the motion state of the ego vehicle is described by its continuous position, velocity, acceleration, and heading angle:

S_1(t) = [p_x, p_y, v_x, v_y, a_x, a_y, θ_s]    (3)

where p_x and p_y are the lateral and longitudinal position of the ego vehicle in meters, v_x and v_y are its lateral and longitudinal velocity in meters per second, a_x and a_y are its lateral and longitudinal acceleration in meters per second squared, and θ_s is its heading angle in degrees.

Next, the motion state of the surrounding traffic participants is described by their relative motion with respect to the ego vehicle:

S_2(t) = [Δd_i, Δv_i, a_i]    (4)

where Δd_i, Δv_i, and a_i are the relative distance, relative velocity, and acceleration between the ego vehicle and the i-th traffic participant, in meters, meters per second, and meters per second squared, respectively.

Existing state space definitions often use a fixed encoding, i.e., the number of surrounding traffic participants considered is fixed. In real urban traffic scenes, however, the number and positions of the traffic participants around the commercial vehicle change constantly, and lateral collisions caused by sudden obstacles and visual blind spots require special consideration. Although fixed encoding can provide an effective state representation, it considers only a limited number of traffic participants (using the minimum amount of information needed to represent the scene) and therefore cannot accurately and comprehensively describe the influence of all surrounding traffic participants on the driving safety of the commercial vehicle.

Finally, in order to describe the relative positions of the ego vehicle and the surrounding traffic participants more intuitively and improve the reliability and effectiveness of decision-making, the present invention rasterizes the road area into a number of a×b grid cells and abstracts the road area and vehicle targets into a grid map, namely the "presence" grid map S_3(t) that describes the relative positional relationships, where a is the length of a grid cell and b is its width.

The "presence" grid map contains four attributes: the grid coordinates, whether a vehicle is present, the category of the corresponding vehicle, and the distance to the left and right lane lines. A cell with no traffic participant is set to "0" and a cell containing a traffic participant is set to "1"; the positions of these cells relative to the cell occupied by the ego vehicle describe the relative spacing between the two vehicles.
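As a minimal sketch, the three-part state S_t = [S_1(t), S_2(t), S_3(t)] might be assembled as follows; the grid dimensions, cell size, and the mapping of relative positions to grid cells are illustrative assumptions rather than values fixed by the patent.

```python
import numpy as np

def build_state(ego, participants, grid_shape=(20, 8), cell_len=2.5, cell_wid=1.75):
    """Assemble S_t from ego motion, relative motion, and a binary 'presence' grid.

    ego:          dict with p_x, p_y, v_x, v_y, a_x, a_y, theta_s
    participants: list of dicts with rel_x, rel_y, rel_dist, rel_speed, accel
    """
    # S_1(t): continuous ego motion state, equation (3)
    s1 = np.array([ego["p_x"], ego["p_y"], ego["v_x"], ego["v_y"],
                   ego["a_x"], ego["a_y"], ego["theta_s"]], dtype=np.float32)

    # S_2(t): relative motion of each surrounding participant, equation (4);
    # the number of rows varies with the number of participants.
    s2 = np.array([[p["rel_dist"], p["rel_speed"], p["accel"]] for p in participants],
                  dtype=np.float32).reshape(-1, 3)

    # S_3(t): binary "presence" grid of a x b cells centered on the ego vehicle,
    # 1 where a traffic participant occupies a cell and 0 otherwise.
    s3 = np.zeros(grid_shape, dtype=np.float32)
    rows, cols = grid_shape
    for p in participants:
        r = int(p["rel_x"] / cell_len) + rows // 2
        c = int(p["rel_y"] / cell_wid) + cols // 2
        if 0 <= r < rows and 0 <= c < cols:
            s3[r, c] = 1.0
    return s1, s2, s3
```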

(2) Define the action space

The action space is defined by lateral and longitudinal driving actions:

A_t = [a_left, a_straight, a_right, a_accel, a_cons, a_decel]    (5)

where A_t is the action space at time t; a_left, a_straight, and a_right denote turning left, going straight, and turning right; and a_accel, a_cons, and a_decel denote accelerating, keeping a constant speed, and decelerating.
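The six discrete actions of equation (5) can be represented as a small enumeration; the integer ordering used here is an assumption of this sketch.

```python
from enum import IntEnum

class DrivingAction(IntEnum):
    """Discrete action space A_t of equation (5)."""
    TURN_LEFT = 0       # a_left
    GO_STRAIGHT = 1     # a_straight
    TURN_RIGHT = 2      # a_right
    ACCELERATE = 3      # a_accel
    CONSTANT_SPEED = 4  # a_cons
    DECELERATE = 5      # a_decel
```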

(3) Define the reward function

R_t = r_1 + r_2 + r_3    (6)

where R_t is the reward function at time t, and r_1, r_2, and r_3 are the forward, rearward, and lateral collision-avoidance reward functions given by equations (7), (8), and (9), respectively.

[Equations (7)-(9): the forward, rearward, and lateral collision-avoidance rewards r_1, r_2, and r_3 are piecewise functions of TTC and TTC_thr, RTTC and RTTC_thr, and x_lat and x_min, weighted by β_1, β_2, and β_3; the original equation images are not reproduced here.]

where TTC is the time to collision between the ego vehicle and the obstacle ahead, obtained by dividing the distance between them by their relative velocity; TTC_thr is the forward time-to-collision threshold; RTTC is the rearward time to collision and RTTC_thr its threshold, all in seconds; x_lat is the distance between the ego vehicle and traffic participants on either side and x_min is the minimum lateral safety distance, both in meters; and β_1, β_2, and β_3 are the weight coefficients of the forward, rearward, and lateral collision-avoidance reward functions.
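Because the piecewise forms of equations (7) to (9) are not reproduced above, the sketch below only illustrates one plausible reading of them: each term penalizes the vehicle when its safety margin (TTC, RTTC, or lateral clearance) drops below the corresponding threshold, scaled by β_1, β_2, and β_3. The penalty shapes and the default threshold values are assumptions.

```python
def total_reward(ttc, rttc, x_lat, ttc_thr=3.0, rttc_thr=3.0, x_min=1.5,
                 beta1=1.0, beta2=0.5, beta3=1.0):
    """Illustrative total reward R_t = r_1 + r_2 + r_3 (equation (6)).

    ttc, rttc : forward / rearward time to collision, in seconds
    x_lat     : lateral distance to traffic participants on either side, in meters
    """
    # r_1: forward collision avoidance, penalize TTC below its threshold
    r1 = -beta1 * (ttc_thr - ttc) / ttc_thr if ttc < ttc_thr else 0.0
    # r_2: rearward collision avoidance, penalize RTTC below its threshold
    r2 = -beta2 * (rttc_thr - rttc) / rttc_thr if rttc < rttc_thr else 0.0
    # r_3: lateral collision avoidance, penalize clearance below the minimum safe distance
    r3 = -beta3 * (x_min - x_lat) / x_min if x_lat < x_min else 0.0
    return r1 + r2 + r3
```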

Sub-step 2: Construct the decision sub-network based on the double deep Q-network

The double deep Q-network (DDQN) improves data efficiency by using an experience replay buffer, avoids parameter oscillation and divergence, and reduces the negative learning effects caused by overestimation in Q-learning. The present invention therefore uses a DDQN to learn safe driving strategies in edge scenarios.

Unlike processing a state space of fixed dimension, processing feature information that covers all surrounding traffic participants requires stronger feature extraction capability. Since the attention mechanism can capture richer feature information (the dependencies between the ego vehicle and each surrounding traffic participant), the present invention designs a policy network based on the multi-head attention mechanism. In addition, since the driving decision depends only on the motion states of the ego vehicle and the surrounding traffic participants and should not be affected by the order in which the traffic participants appear in the state space, the present invention uses the positional encoding method of Vaswani, Ashish, et al., "Attention is all you need," Advances in Neural Information Processing Systems, 2017, to build permutation invariance into the decision sub-network.

The attention layer can be expressed as:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O    (10)

where MultiHead(Q, K, V) is the multi-head attention value; Q is the query vector and K the key vector, both of dimension d_k; V is the value vector of dimension d_v; W^O is a parameter matrix to be learned; and head_h is the h-th attention head (h = 2 in the present invention), computed as follows:

head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)    (11)

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V    (12)

where Attention(Q, K, V) is the output attention matrix, and W_i^Q, W_i^K, and W_i^V are parameter matrices to be learned.
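For concreteness, a small NumPy implementation of equations (10) to (12) with h = 2 heads is sketched below; the token count, dimensions, and random projection matrices are placeholders chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Equation (12): Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(Q, K, V, W_q, W_k, W_v, W_o):
    """Equations (10)-(11) with h = len(W_q) heads (h = 2 in the patent)."""
    heads = [attention(Q @ W_q[i], K @ W_k[i], V @ W_v[i]) for i in range(len(W_q))]
    return np.concatenate(heads, axis=-1) @ W_o

# Toy usage: 5 input tokens of dimension 64, two heads of dimension 32.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 64))
W_q = [rng.normal(size=(64, 32)) for _ in range(2)]
W_k = [rng.normal(size=(64, 32)) for _ in range(2)]
W_v = [rng.normal(size=(64, 32)) for _ in range(2)]
W_o = rng.normal(size=(64, 64))
out = multi_head(x, x, x, W_q, W_k, W_v, W_o)  # shape (5, 64)
```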

The decision sub-network based on the double deep Q-network is constructed as follows.

First, the state space S_t is fed into encoder 1, encoder 2, and encoder 3. Encoder 1 consists of two fully connected layers and outputs the encoding of the ego vehicle's motion state. Encoder 2 has the same structure as encoder 1 and outputs the encoding of the relative motion states. Encoder 3 consists of two convolutional layers and outputs the encoding of the occupancy grid map.

Each fully connected layer has 64 neurons with the Tanh activation function; each convolutional layer uses 3×3 kernels with a stride of 2.

Next, the multi-head attention mechanism is used to analyze the dependencies between the ego vehicle and the surrounding traffic participants, so that the decision sub-network can attend to traffic participants that suddenly approach the ego vehicle or whose paths conflict with it, while handling variable input sizes and building permutation invariance into the sub-network. The outputs of encoders 1, 2, and 3 are all connected to the multi-head attention module, which outputs the attention matrix. The attention matrix is then connected to decoder 1, which consists of a single fully connected layer.

This fully connected layer has 64 neurons with the Sigmoid activation function.
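A PyTorch sketch of the described encoder-attention-decoder structure follows. The 64-unit fully connected layers with Tanh, the 3×3 stride-2 convolutions, the two-head attention module, and the Sigmoid decoder layer follow the text; the input dimensions, convolution channel counts, the pooling step, and the final linear layer that maps the decoded features to six action values are assumptions added so the sketch runs end to end.

```python
import torch
import torch.nn as nn

class AttentionPolicyNet(nn.Module):
    """Encoders 1-3 -> two-head attention -> decoder 1 -> Q-values (output head assumed)."""

    def __init__(self, ego_dim=7, rel_dim=3, grid_ch=1, n_actions=6, d=64):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Linear(ego_dim, d), nn.Tanh(),
                                  nn.Linear(d, d), nn.Tanh())        # ego motion encoding
        self.enc2 = nn.Sequential(nn.Linear(rel_dim, d), nn.Tanh(),
                                  nn.Linear(d, d), nn.Tanh())        # relative motion encoding
        self.enc3 = nn.Sequential(nn.Conv2d(grid_ch, 16, 3, stride=2), nn.Tanh(),
                                  nn.Conv2d(16, 16, 3, stride=2), nn.Tanh(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(16, d))                  # occupancy grid encoding
        self.attn = nn.MultiheadAttention(embed_dim=d, num_heads=2, batch_first=True)
        self.dec1 = nn.Sequential(nn.Linear(d, d), nn.Sigmoid())     # decoder 1
        self.q_head = nn.Linear(d, n_actions)                        # assumed output layer

    def forward(self, ego, rel, grid):
        # ego: (B, 7); rel: (B, N, 3) with a variable number N of participants; grid: (B, 1, H, W)
        tokens = torch.cat([self.enc1(ego).unsqueeze(1),
                            self.enc2(rel),
                            self.enc3(grid).unsqueeze(1)], dim=1)    # (B, N+2, d)
        ego_query = tokens[:, :1, :]                                 # ego token attends to all tokens
        ctx, _ = self.attn(ego_query, tokens, tokens)
        return self.q_head(self.dec1(ctx.squeeze(1)))                # (B, n_actions)

# Example: a batch of 2 scenes with 4 surrounding participants and a 20x8 grid.
net = AttentionPolicyNet()
q_values = net(torch.randn(2, 7), torch.randn(2, 4, 3), torch.randn(2, 1, 20, 8))
```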

Sub-step 3: Construct the decision sub-network based on generative adversarial imitation learning

In the complex urban traffic environment, which is open and subject to interference from many traffic targets, it is difficult to construct an accurate and comprehensive reward function; in particular, it is difficult to quantify the influence of uncertainties (such as sudden obstacles or traffic participants inside visual blind spots) on driving safety. In order to reduce the influence of the uncertainty of the traffic environment and operating conditions on safe driving decisions and improve the effectiveness and reliability of the decisions, the present invention uses a generative adversarial imitation learning (GAIL) sub-network to learn the driving strategies contained in the driving behavior dataset and its generalized samples, and thereby imitate safe driving behavior under different driving conditions and operating conditions. The GAIL sub-network consists of a generator and a discriminator, each built with a deep neural network, as follows:

(1) Build the generator

The generator network is constructed as follows. The input of the generator is the state space and its output is the probability of each action in the action space, f = π(·|s; θ), where θ denotes the parameters of the generator network. The state space is passed through fully connected layers FC_1 and FC_2 to obtain feature F_1, and through fully connected layers FC_3 and FC_4 to obtain feature F_2. At the same time, the state space is passed through convolutional layers C_1 and C_2 to obtain feature F_3. Features F_1, F_2, and F_3 are then passed through a merge layer, fully connected layer FC_5, and a Softmax activation to obtain the output f = π(·|s; θ).

Fully connected layers FC_1 through FC_5 each have 64 neurons; convolutional layer C_1 uses a 3×3 kernel with a stride of 2, and convolutional layer C_2 uses a 3×3 kernel with a stride of 1.

(2) Build the discriminator

The discriminator network is constructed as follows. The input of the discriminator is the state space and its output is a six-dimensional vector parameterized by the discriminator network parameters φ. The state space is passed through fully connected layers FC_6 and FC_7 to obtain feature F_4, and through fully connected layers FC_8 and FC_9 to obtain feature F_5. At the same time, the state space is passed through convolutional layers C_3 and C_4 to obtain feature F_6. Features F_4, F_5, and F_6 are then passed through a merge layer, fully connected layer FC_10, and a Sigmoid activation to obtain the output.

Fully connected layers FC_6 through FC_10 each have 64 neurons; convolutional layer C_3 uses a 3×3 kernel with a stride of 2, and convolutional layer C_4 uses a 3×3 kernel with a stride of 1.
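A PyTorch sketch of the generator and discriminator follows. The two fully connected branches, the convolutional branch (3×3 kernels with stride 2 then 1), the 64-unit layers, and the Softmax and Sigmoid outputs follow the text; the assignment of the ego, relative-motion, and grid parts of the state to the branches, the branch activations, the channel counts, and the pooling of the variable number of participants are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class _ThreeBranch(nn.Module):
    """Two fully connected branches plus one convolutional branch, followed by a merge."""

    def __init__(self, ego_dim=7, rel_dim=3, d=64):
        super().__init__()
        self.fc_a = nn.Sequential(nn.Linear(ego_dim, d), nn.ReLU(),
                                  nn.Linear(d, d), nn.ReLU())               # FC_1-FC_2 / FC_6-FC_7
        self.fc_b = nn.Sequential(nn.Linear(rel_dim, d), nn.ReLU(),
                                  nn.Linear(d, d), nn.ReLU())               # FC_3-FC_4 / FC_8-FC_9
        self.conv = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2), nn.ReLU(),  # C_1 / C_3
                                  nn.Conv2d(8, 8, 3, stride=1), nn.ReLU(),  # C_2 / C_4
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(8, d))
        self.out_dim = 3 * d

    def forward(self, ego, rel, grid):
        # rel is averaged over participants so the branch accepts a variable number of them
        return torch.cat([self.fc_a(ego), self.fc_b(rel.mean(dim=1)), self.conv(grid)], dim=-1)

class Generator(nn.Module):
    """f = pi(.|s; theta): probabilities of the six discrete actions (Softmax output)."""

    def __init__(self, n_actions=6, d=64):
        super().__init__()
        self.trunk = _ThreeBranch(d=d)
        self.head = nn.Sequential(nn.Linear(self.trunk.out_dim, d), nn.ReLU(),   # merge + FC_5
                                  nn.Linear(d, n_actions), nn.Softmax(dim=-1))

    def forward(self, ego, rel, grid):
        return self.head(self.trunk(ego, rel, grid))

class Discriminator(nn.Module):
    """Six-dimensional Sigmoid output parameterized by phi (one score per action)."""

    def __init__(self, n_actions=6, d=64):
        super().__init__()
        self.trunk = _ThreeBranch(d=d)
        self.head = nn.Sequential(nn.Linear(self.trunk.out_dim, d), nn.ReLU(),   # merge + FC_10
                                  nn.Linear(d, n_actions), nn.Sigmoid())

    def forward(self, ego, rel, grid):
        return self.head(self.trunk(ego, rel, grid))
```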

Step 3: Train the safe driving decision-making model for commercial vehicles

First, the decision sub-network based on generative adversarial imitation learning is trained. The goal of the GAIL sub-network is to learn a generator network such that the discriminator cannot distinguish the driving actions produced by the generator from the actions in the driving behavior dataset. This includes the following sub-steps:

Sub-step 1: On the driving behavior dataset X defined in equation (1), initialize the generator network parameters θ_0 and the discriminator network parameters ω_0;

Sub-step 2: Perform L iterations, each consisting of sub-steps 2.1 and 2.2, specifically:

Sub-step 2.1: Update the discriminator parameters ω_i → ω_{i+1} using the gradient given by equation (13):

E_{(s,a)∼π_{θ_i}} [∇_ω log D_ω(s, a)] + E_{(s,a)∼X} [∇_ω log(1 − D_ω(s, a))]    (13)

where ∇_ω denotes the gradient, with respect to the parameters ω, of the neural network loss function;

Sub-step 2.2: Set the reward function from the output of the updated discriminator and update the generator parameters θ_i → θ_{i+1} using the trust region policy optimization algorithm.
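A simplified training sketch for sub-steps 2.1 and 2.2 is given below, reusing the Generator and Discriminator sketched earlier. The discriminator update uses the usual binary adversarial objective, and the generator update is shown as a plain policy-gradient step with the surrogate reward -log D that is commonly used in generative adversarial imitation learning; the patent itself specifies trust region policy optimization, which is not reproduced here, and the batch layout is an assumption.

```python
import torch
import torch.nn.functional as F

def gail_step(gen, disc, gen_opt, disc_opt, expert_batch, policy_batch):
    """One iteration of sub-steps 2.1-2.2 (simplified: TRPO replaced by a policy-gradient step).

    Each batch is a tuple (ego, rel, grid, actions); expert data come from the dataset X,
    policy data from rollouts of the current generator policy.
    """
    e_ego, e_rel, e_grid, e_act = expert_batch
    p_ego, p_rel, p_grid, p_act = policy_batch

    # Sub-step 2.1: update the discriminator parameters omega.
    # D(s)[a] plays the role of D(s, a); following the GAIL objective, pairs produced by
    # the generator are pushed toward 1 and expert pairs toward 0.
    d_policy = disc(p_ego, p_rel, p_grid).gather(1, p_act.unsqueeze(1))
    d_expert = disc(e_ego, e_rel, e_grid).gather(1, e_act.unsqueeze(1))
    d_loss = F.binary_cross_entropy(d_policy, torch.ones_like(d_policy)) + \
             F.binary_cross_entropy(d_expert, torch.zeros_like(d_expert))
    disc_opt.zero_grad(); d_loss.backward(); disc_opt.step()

    # Sub-step 2.2: update the generator parameters theta with the surrogate reward
    # r(s, a) = -log D(s, a): actions the discriminator cannot tell from expert behavior
    # (small D) receive a high reward. The patent uses TRPO; a REINFORCE step stands in here.
    with torch.no_grad():
        reward = -torch.log(disc(p_ego, p_rel, p_grid).gather(1, p_act.unsqueeze(1)) + 1e-8)
    log_pi = torch.log(gen(p_ego, p_rel, p_grid).gather(1, p_act.unsqueeze(1)) + 1e-8)
    g_loss = -(reward * log_pi).mean()
    gen_opt.zero_grad(); g_loss.backward(); gen_opt.step()
    return d_loss.item(), g_loss.item()
```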

Then, building on the training results above, the DDQN-based decision sub-network is constructed and trained, which includes the following sub-steps:

Sub-step 3: Initialize the experience replay buffer D with capacity N;

Sub-step 4: Initialize the Q-value of each action to a random value;

Sub-step 5: Perform M iterations, each consisting of sub-steps 5.1 and 5.2, specifically:

Sub-step 5.1: Initialize the state s_0 and the policy parameters φ_0;

Sub-step 5.2: Perform T iterations, each consisting of sub-steps 5.21 to 5.27, specifically:

Sub-step 5.21: With probability ε, select a random driving action;

Sub-step 5.22: Otherwise, select a_t = argmax_a Q*(φ(s_t), a; θ);

where Q*(·) is the optimal action-value function and a_t is the action at time t;

Sub-step 5.23: Execute action a_t and obtain the reward r_t at time t and the state s_{t+1} at time t+1;

Sub-step 5.24: Store the sample (φ_t, a_t, r_t, φ_{t+1}) in the experience replay buffer D;

Sub-step 5.25: Randomly sample a mini-batch (φ_j, a_j, r_j, φ_{j+1}) from the experience replay buffer D;

Sub-step 5.26: Compute the iteration target using the following formula:

y_j = r_j + γ Q(φ_{j+1}, argmax_{a'} Q(φ_{j+1}, a'; θ_t); θ_t^-)    (14)

where θ_t^- denotes the weights of the target network at time t, γ is the discount factor, argmax(·) selects the argument that maximizes the objective, y_j is the iteration target, and p(s, a) denotes the action distribution;

Sub-step 5.27: Perform gradient descent on (y_j − Q(φ_j, a_j; θ))^2 using the following formula:

∇_{θ_i} L_i(θ_i) = E_{(s,a)∼p(·); s'} [ (y_j − Q(s, a; θ_i)) ∇_{θ_i} Q(s, a; θ_i) ]    (15)

where ∇_{θ_i} denotes the gradient of the neural network loss function with respect to the parameters θ_i; ε is the probability of selecting a random action under the ε-greedy exploration strategy; θ_i are the parameters at iteration i; L_i(θ_i) is the loss function at iteration i; Q(s, a; θ_i) is the action-value function of the target network; and a' ranges over all possible actions in the state s'.
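The double-Q update of sub-steps 5.21 to 5.27 can be sketched as follows, reusing the AttentionPolicyNet from the earlier sketch as the Q-network. The replay-buffer handling, batch shapes, and hyperparameter values are illustrative assumptions; the key lines are the ε-greedy action selection and the double-Q target of equation (14).

```python
import random
from collections import deque
import torch
import torch.nn.functional as F

def select_action(q_net, state, epsilon, n_actions=6):
    """Sub-steps 5.21-5.22: epsilon-greedy selection over the six driving actions."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(*state).argmax(dim=1).item())

def ddqn_update(q_net, target_net, optimizer, replay: deque, batch_size=32, gamma=0.99):
    """Sub-steps 5.25-5.27: sample a mini-batch and apply the double-Q target of equation (14)."""
    batch = random.sample(list(replay), batch_size)
    states, actions, rewards, next_states = zip(*batch)
    a = torch.tensor(actions).unsqueeze(1)
    r = torch.tensor(rewards, dtype=torch.float32)

    # Each stored state is a tuple (ego, rel, grid) with a leading batch dimension of 1;
    # this simple stacking assumes the same number of surrounding participants per sample.
    def stack(group):
        return tuple(torch.cat(parts, dim=0) for parts in zip(*group))

    s, s_next = stack(states), stack(next_states)

    q = q_net(*s).gather(1, a).squeeze(1)                        # Q(phi_j, a_j; theta)
    with torch.no_grad():
        best_next = q_net(*s_next).argmax(dim=1, keepdim=True)   # argmax_a' Q(phi_{j+1}, a'; theta)
        y = r + gamma * target_net(*s_next).gather(1, best_next).squeeze(1)  # equation (14)

    loss = F.mse_loss(q, y)                                      # gradient descent on (y_j - Q)^2
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```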

Once the safe driving decision-making model of the commercial vehicle has been trained, the state space information collected by the sensors is fed into the model, which outputs high-level driving decisions such as turning, going straight, and accelerating or decelerating in real time, effectively ensuring the safe operation of commercial vehicles in the urban low-speed environment.
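At deployment, the trained Q-network can be queried once per control cycle roughly as follows; the sensor-to-tensor conversion and the action labels reuse the earlier sketches and are assumptions.

```python
import torch

ACTION_LABELS = ["turn left", "go straight", "turn right",
                 "accelerate", "keep speed", "decelerate"]

def decide(q_net, ego, rel, grid):
    """Map the current sensed state to a high-level driving decision."""
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(ego, dtype=torch.float32).unsqueeze(0),
                         torch.as_tensor(rel, dtype=torch.float32).unsqueeze(0),
                         torch.as_tensor(grid, dtype=torch.float32).unsqueeze(0).unsqueeze(0))
    return ACTION_LABELS[int(q_values.argmax(dim=1))]
```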

Beneficial effects: compared with general driving decision-making methods, the method proposed by the present invention is more effective and reliable, specifically:

(1) The proposed method can imitate the safe driving behavior of human drivers and provide more reasonable and safer driving strategies for commercial vehicles in urban low-speed environments, realizing highly human-like safe driving decision-making for large commercial vehicles and effectively ensuring driving safety.

(2) The proposed method comprehensively considers the influence of visual blind spots, sudden obstacles, different operating conditions, and other factors on driving safety, and performs strategy learning and training in both normal driving scenarios and edge scenarios, further improving the effectiveness and reliability of driving decisions.

(3) The proposed method introduces a multi-head attention mechanism, takes into account the dynamic interaction between the ego vehicle and the surrounding traffic participants, and can handle safe driving decisions with variable inputs (a dynamically changing number of surrounding traffic participants).

(4) The proposed method does not need to consider complex vehicle dynamics equations or body parameters; the calculation is simple and clear, the safe driving decision strategy of a large commercial vehicle can be output in real time, and the sensors used are inexpensive, which facilitates large-scale deployment.

Brief Description of the Drawings

Fig. 1 is the technical roadmap of the present invention;

Fig. 2 is a schematic diagram of the policy network structure based on the multi-head attention mechanism designed by the present invention;

Fig. 3 is a schematic diagram of the generator network structure designed by the present invention;

Fig. 4 is a schematic diagram of the discriminator network structure designed by the present invention.

Detailed Description of the Embodiments

The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.

Aiming at the open urban traffic environment with interference from multiple traffic targets, the present invention proposes a highly human-like safe driving decision-making method for large commercial vehicles. First, the safe driving behavior of human drivers in the urban traffic environment is collected to construct a safe driving behavior dataset. Second, a multi-head-attention-based safe driving decision-making model for commercial vehicles is constructed. The model contains two sub-networks: a double deep Q-network and a generative adversarial imitation learning network. The DDQN learns safe driving strategies in edge scenarios such as dangerous and conflict scenarios in an unsupervised manner, while the GAIL sub-network imitates safe driving behavior under different driving conditions and operating conditions. Finally, the safe driving decision-making model is trained to obtain driving strategies for different driving conditions and operating conditions, realizing high-level decision output for the safe driving behavior of commercial vehicles. The proposed method can imitate the safe driving behavior of human drivers, considers the influence of factors such as visual blind spots and sudden obstacles on driving safety, provides more reasonable and safer driving strategies for large commercial vehicles, and realizes safe driving decision-making for commercial vehicles in the urban traffic environment. The technical roadmap of the present invention is shown in Fig. 1, and the specific steps are as follows:

Step 1: Collect the safe driving behavior of human drivers in the urban traffic environment

In order to achieve driving decisions comparable to those of human drivers, the present invention collects safe driving behavior under different driving conditions and operating conditions through real road tests and driving simulation, and then constructs a dataset characterizing the safe driving behavior of human drivers. This includes the following five sub-steps:

Sub-step 1: Build a synchronized multi-dimensional target information acquisition system using millimeter-wave radar, a 128-line lidar, vision sensors, a BeiDou positioning sensor, and an inertial sensor.

Sub-step 2: In a real urban environment, several drivers in turn drive a commercial vehicle equipped with the synchronized multi-dimensional target information acquisition system.

Sub-step 3: Collect and process data related to various driving behaviors, such as lane changing, lane keeping, car following, and acceleration and deceleration, to obtain multi-source heterogeneous descriptions of each driving behavior, for example the distances to obstacles in different directions measured by the radar or vision sensors, the position, velocity, acceleration, and yaw rate measured by the BeiDou and inertial sensors, and the steering wheel angle measured by on-board sensors.

Sub-step 4: In order to imitate safe driving behavior in edge scenarios such as dangerous and conflict scenarios, a virtual urban scene based on hardware-in-the-loop simulation is built. The constructed urban traffic scenarios include the following three categories:

(1) While the vehicle is driving, a traffic participant approaches laterally in front of the vehicle (i.e., a sudden obstacle);

(2) While the vehicle is turning, a stationary traffic participant is present in the vehicle's visual blind spot;

(3) While the vehicle is turning, a moving traffic participant is present in the vehicle's visual blind spot.

These traffic scenarios contain multiple road network structures (straight roads, curves, and intersections) and multiple types of traffic participants (commercial vehicles, passenger cars, non-motorized vehicles, and pedestrians).

Several drivers drive the commercial vehicle in the virtual scene through real controls (steering wheel, accelerator, and brake pedal), and information such as the ego vehicle's lateral and longitudinal position, lateral and longitudinal velocity, lateral and longitudinal acceleration, and its relative distance and relative velocity to surrounding traffic participants is collected.

Sub-step 5: Based on the data collected in the real urban environment and in the driving simulation environment, a driving behavior dataset for learning safe driving decisions is constructed, which can be expressed as:

X = {(s_1, a_1), (s_2, a_2), ..., (s_n, a_n)}    (1)

where X is the set of state-action pairs, i.e., the constructed dataset characterizing the safe driving behavior of human drivers; (s_j, a_j) is the state-action pair at time j, where s_j is the state at time j and a_j is the action taken by the human driver given state s_j; and n is the number of state-action pairs in the dataset.

步骤二:构建基于多头注意力的营运车辆安全驾驶决策模型Step 2: Construct a decision-making model for safe driving of commercial vehicles based on multi-head attention

为了实现城市低速环境下的大型营运车辆安全驾驶决策,本发明综合考虑视觉盲区、突遇障碍物、行驶工况等因素对行车安全的影响,建立营运车辆安全驾驶决策模型。考虑到深度强化学习将深度学习的感知能力和强化学习的决策能力相结合,通过无监督学习的方式对交通环境进行探索,本发明利用深度强化学习对危险场景、冲突场景等边缘场景下的安全驾驶策略进行学习。此外,考虑到模仿学习具有仿效榜样的能力,本发明利用模仿学习模拟人类驾驶员在不同驾驶条件和行驶工况下的安全驾驶行为。因此,构建的安全驾驶决策模型由两部分组成,具体描述如下:In order to realize the safe driving decision-making of large-scale commercial vehicles in low-speed urban environments, the present invention comprehensively considers the influence of factors such as visual blind spots, unexpected obstacles, and driving conditions on driving safety, and establishes a safe driving decision-making model for commercial vehicles. Considering that deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning, and explores the traffic environment through unsupervised learning, the present invention uses deep reinforcement learning to improve safety in edge scenarios such as dangerous scenes and conflict scenes. Learn driving strategies. In addition, considering that imitation learning has the ability to imitate models, the present invention uses imitation learning to simulate the safe driving behavior of human drivers under different driving conditions and driving conditions. Therefore, the constructed safe driving decision-making model consists of two parts, which are described in detail as follows:

子步骤1:定义安全驾驶决策模型的基本参数Sub-step 1: Define the basic parameters of the safe driving decision model

首先,将城市低速环境下的安全驾驶决策问题转化为有限马尔科夫决策过程。其次,定义安全驾驶决策模型的基本参数。First, the safe driving decision-making problem in urban low-speed environment is transformed into a finite Markov decision process. Second, the basic parameters of the safe driving decision-making model are defined.

(1)定义状态空间(1) Define the state space

为了描述自车和附近交通参与者的运动状态,本发明利用时间序列数据和占据栅格图构建状态空间。具体描述如下:In order to describe the motion state of the self-vehicle and nearby traffic participants, the present invention utilizes time series data and occupancy grid graphs to construct a state space. The specific description is as follows:

St=[S1(t),S2(t),S3(t)] (2)S t = [S 1 (t), S 2 (t), S 3 (t)] (2)

式中,St表示t时刻的状态空间,S1(t)和S2(t)表示t时刻与时间序列数据相关的状态空间,S3(t)表示t时刻与占据栅格图相关的状态空间。In the formula, S t represents the state space at time t, S 1 (t) and S 2 (t) represent the state space related to the time series data at time t, and S 3 (t) represents the state space related to the occupancy grid map at time t state space.

首先,利用连续位置、速度、加速度和航向角信息描述自车的运动状态:First, the ego vehicle's motion state is described using continuous position, velocity, acceleration, and heading angle information:

S1(t)=[px,py,vx,vy,ax,ays] (3)S 1 (t)=[p x ,p y ,v x ,v y ,a x ,a ys ] (3)

式中,px,py分别表示自车的横向位置和纵向位置,单位为米,vx,vy分别表示自车的横向速度和纵向速度,单位为米每秒,ax,ay分别表示自车的横向加速度和纵向加速度,单位为米每二次方秒,θs表示自车的航向角,单位为度。In the formula, p x , p y represent the lateral position and longitudinal position of the self-vehicle respectively in meters, v x , v y represent the lateral speed and longitudinal speed of the self-vehicle respectively in meters per second, a x , a y Respectively represent the lateral acceleration and longitudinal acceleration of the own vehicle, the unit is meter per square second, θ s represents the heading angle of the own vehicle, the unit is degree.

其次,利用自车与周围交通参与者的相对运动状态信息描述周围交通参与者的运动状态:Secondly, use the relative motion state information of the self-vehicle and the surrounding traffic participants to describe the motion state of the surrounding traffic participants:

Figure BDA0003829935270000121
Figure BDA0003829935270000121

式中,

Figure BDA0003829935270000122
分别表示自车与第i个交通参与者的相对距离、相对速度和加速度,单位分别为米、米每秒和米每二次方秒。In the formula,
Figure BDA0003829935270000122
respectively represent the relative distance, relative speed and acceleration between the self-vehicle and the i-th traffic participant, and the units are meters, meters per second and meters per square second.

Existing state-space definitions commonly use a fixed encoding, i.e. the number of surrounding traffic participants considered is fixed. In real urban traffic scenes, however, the number and positions of the traffic participants around the commercial vehicle change continuously, and lateral collisions caused by suddenly appearing obstacles and visual blind spots require particular attention. Although fixed encoding can provide an effective state representation, it considers only a limited number of traffic participants (the minimum amount of information needed to represent the scene) and therefore cannot accurately and comprehensively describe the influence of all surrounding traffic participants on the driving safety of the commercial vehicle.

Finally, to describe the relative positions of the ego vehicle and the surrounding traffic participants more intuitively and to improve the reliability and effectiveness of the decisions, the present invention rasterizes the road area into a number of a×b grid cells and abstracts the road area and vehicle targets into a grid map, namely the "existence" grid map S_3(t) used to describe relative positions, where a denotes the length and b the width of a grid cell.

The "existence" grid map contains four attributes: the grid coordinates, whether a vehicle is present, the class of that vehicle, and the distances to the left and right lane lines. A cell containing no traffic participant is set to "0" and a cell containing a traffic participant is set to "1"; the positions of such cells relative to the cell occupied by the ego vehicle describe the relative spacing between the two vehicles.
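A minimal sketch of how such an "existence" grid could be rasterized is given below. The participant position format and the single binary channel are assumptions; the vehicle class and lane-line distances described above would occupy additional channels.

```python
import numpy as np

def existence_grid(road_length_m, road_width_m, cell_len_a, cell_wid_b, participants):
    """Rasterize the road into a x b cells; 1 = cell occupied by a traffic participant, 0 = free.

    `participants` is assumed to be an iterable of (x, y) positions in road coordinates.
    """
    rows = int(np.ceil(road_length_m / cell_len_a))
    cols = int(np.ceil(road_width_m / cell_wid_b))
    grid = np.zeros((rows, cols), dtype=np.int8)
    for x, y in participants:
        r = min(int(x // cell_len_a), rows - 1)   # clamp to the grid boundary
        c = min(int(y // cell_wid_b), cols - 1)
        grid[r, c] = 1
    return grid
```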

(2) Define the action space

The action space is defined by lateral and longitudinal driving actions:

A_t = [a_left, a_straight, a_right, a_accel, a_cons, a_decel]   (5)

where A_t denotes the action space at time t, a_left, a_straight and a_right denote turning left, going straight and turning right, and a_accel, a_cons and a_decel denote accelerating, keeping a constant speed and decelerating.
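The six discrete actions map naturally onto an enumeration; the integer indices below are an assumed ordering used only for illustration.

```python
from enum import IntEnum

class DrivingAction(IntEnum):
    """Discrete action space A_t: three lateral and three longitudinal actions."""
    LEFT = 0        # a_left
    STRAIGHT = 1    # a_straight
    RIGHT = 2       # a_right
    ACCELERATE = 3  # a_accel
    CONSTANT = 4    # a_cons
    DECELERATE = 5  # a_decel
```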

(3) Define the reward function

R_t = r_1 + r_2 + r_3   (6)

where R_t denotes the reward function at time t, and r_1, r_2 and r_3 denote the forward, rearward and lateral collision-avoidance reward functions, obtained from equations (7), (8) and (9).

(Equations (7), (8) and (9), given as images in the original, define r_1, r_2 and r_3 as piecewise functions of TTC, RTTC and x_lat with respect to their thresholds, weighted by β_1, β_2 and β_3.)

where TTC denotes the time to collision between the ego vehicle and the obstacle ahead, obtained by dividing the distance between them by their relative speed; TTC_thr denotes the time-to-collision threshold; RTTC denotes the rearward time to collision and RTTC_thr its threshold, all in seconds; x_lat denotes the distance between the ego vehicle and the traffic participants on either side and x_min the minimum lateral safety distance, both in meters; and β_1, β_2 and β_3 denote the weight coefficients of the forward, rearward and lateral collision-avoidance reward functions.
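For illustration, the reward logic can be sketched as below. Since the piecewise forms of (7), (8) and (9) are given only as images in the source, the step-shaped penalties and all default thresholds and weights here are assumptions.

```python
def time_to_collision(gap_m, closing_speed_mps):
    """TTC: distance to the leading obstacle divided by the closing (relative) speed."""
    return float("inf") if closing_speed_mps <= 0 else gap_m / closing_speed_mps

def collision_avoidance_reward(ttc, rttc, x_lat,
                               ttc_thr=3.0, rttc_thr=3.0, x_min=1.0,
                               beta1=1.0, beta2=1.0, beta3=1.0):
    """R_t = r1 + r2 + r3: a penalty activates when a safety margin drops below its threshold.

    The exact piecewise shapes, thresholds and weights are illustrative assumptions.
    """
    r1 = -beta1 if ttc < ttc_thr else 0.0     # forward collision avoidance
    r2 = -beta2 if rttc < rttc_thr else 0.0   # rearward collision avoidance
    r3 = -beta3 if x_lat < x_min else 0.0     # lateral collision avoidance
    return r1 + r2 + r3
```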

Sub-step 2: Construct the DDQN-based decision sub-network

The Double Deep Q Network (DDQN) improves data efficiency through an experience replay pool, avoids parameter oscillation or divergence, and reduces the negative learning effects caused by over-estimation in Q-learning. The present invention therefore uses a deep double-Q network to learn safe driving strategies in edge scenarios.

Unlike handling a state space of fixed dimension, handling the feature information of all surrounding traffic participants requires stronger feature-extraction capability. Since the attention mechanism can capture richer feature information (the dependencies between the ego vehicle and each surrounding traffic participant), the present invention designs a policy network based on multi-head attention. Moreover, because the driving decision depends only on the motion states of the ego vehicle and the surrounding traffic participants and should not be affected by the order of the participants in the state space, the present invention uses the positional encoding method (Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017) to build permutation invariance into the decision sub-network.

The attention layer can be expressed as:

MultiHead(Q, K, V) = Concat(head_1, …, head_h)W^O   (10)

where MultiHead(Q, K, V) denotes the multi-head attention value; Q denotes the query vector and K the key vector, both of dimension d_k; V denotes the value vector of dimension d_v; W^O denotes a parameter matrix to be learned; and head_h denotes the h-th attention head (h = 2 in the present invention), computed as:

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)   (11)

Attention(Q, K, V) = softmax(QK^T / √d_k)·V   (12)

where Attention(Q, K, V) denotes the output attention matrix, and W_i^Q, W_i^K and W_i^V denote parameter matrices to be learned.

The decision sub-network based on the deep double-Q network is constructed as shown in Fig. 2 and described below.

First, the state space S_t is connected to encoder 1, encoder 2 and encoder 3. Encoder 1 consists of two fully connected layers and outputs the ego-vehicle motion-state encoding. Encoder 2 has the same structure as encoder 1 and outputs the relative motion-state encoding. Encoder 3 consists of two convolutional layers and outputs the occupancy-grid-map encoding.

Each fully connected layer has 64 neurons with a Tanh activation function; each convolutional layer uses 3×3 kernels with a stride of 2.

Second, the multi-head attention mechanism is used to analyze the dependencies between the ego vehicle and the surrounding traffic participants, so that the decision sub-network can attend to traffic participants that suddenly approach the ego vehicle or whose paths conflict with it, while supporting variable input sizes and building permutation invariance into the decision sub-network. The outputs of encoder 1, encoder 2 and encoder 3 are all connected to the multi-head attention module, which outputs the attention matrix. Third, the output attention matrix is connected to decoder 1, which consists of a single fully connected layer.

This fully connected layer has 64 neurons with a Sigmoid activation function.
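As a concrete illustration of how the three encoders, the two-head attention module and the decoder could be wired together, a minimal PyTorch sketch is given below. The feature dimensions, the pooling of the grid branch and the final linear head mapping the decoded feature to the six action values are assumptions; only the layer counts, 64-unit widths, Tanh/Sigmoid activations, 3×3 stride-2 convolutions and h = 2 heads come from the text.

```python
import torch
import torch.nn as nn

class DecisionSubnetwork(nn.Module):
    """DDQN decision sub-network: three state encoders, a 2-head attention block, a decoder."""
    def __init__(self, ego_dim=7, rel_dim=3, grid_ch=1, d_model=64, n_actions=6):
        super().__init__()
        self.enc_ego = nn.Sequential(nn.Linear(ego_dim, 64), nn.Tanh(),
                                     nn.Linear(64, d_model), nn.Tanh())
        self.enc_rel = nn.Sequential(nn.Linear(rel_dim, 64), nn.Tanh(),
                                     nn.Linear(64, d_model), nn.Tanh())
        self.enc_grid = nn.Sequential(nn.Conv2d(grid_ch, 16, 3, stride=2), nn.Tanh(),
                                      nn.Conv2d(16, 16, 3, stride=2), nn.Tanh(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(16, d_model), nn.Tanh())
        self.attn = nn.MultiheadAttention(d_model, num_heads=2, batch_first=True)
        self.decoder = nn.Sequential(nn.Linear(d_model, 64), nn.Sigmoid())
        self.q_head = nn.Linear(64, n_actions)  # assumed output head to the 6 Q-values

    def forward(self, s1, s2, grid):
        # s1: (B, ego_dim); s2: (B, N, rel_dim) with N participants; grid: (B, 1, H, W) float
        e1 = self.enc_ego(s1).unsqueeze(1)        # (B, 1, d) ego token
        e2 = self.enc_rel(s2)                     # (B, N, d) participant tokens
        e3 = self.enc_grid(grid).unsqueeze(1)     # (B, 1, d) grid token
        tokens = torch.cat([e1, e2, e3], dim=1)   # variable-length token set
        ctx, _ = self.attn(e1, tokens, tokens)    # ego token attends to all tokens
        return self.q_head(self.decoder(ctx.squeeze(1)))  # (B, n_actions)
```

Querying with the ego token over the full token set keeps the output independent of how many participants are present in a given frame, which is the property the text asks for.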

Sub-step 3: Construct the decision sub-network based on generative adversarial imitation learning

In an open, complex urban traffic environment with interference from many traffic objects, it is difficult to construct an accurate and comprehensive reward function, and in particular to quantitatively describe the influence of various uncertainties (such as suddenly appearing obstacles or traffic participants in visual blind spots) on driving safety. To reduce the influence of traffic-environment and operating-condition uncertainty on safe driving decisions and to improve their effectiveness and reliability, the present invention uses a generative adversarial imitation learning (GAIL) sub-network to learn the driving strategies contained in the driving-behavior data set and its generalized samples, and thus to imitate safe driving behavior under different driving conditions and operating conditions. The GAIL sub-network consists of a generator and a discriminator, each built with a deep neural network, as described below:

(1) Construct the generator

The generator network is constructed as shown in Fig. 3. Its input is the state space and its output is the probability f = π(·|s; θ) of each action in the action space, where θ denotes the parameters of the generator network. First, the state space is connected in turn to fully connected layers FC_1 and FC_2 to obtain feature F_1, and to fully connected layers FC_3 and FC_4 to obtain feature F_2. In parallel, the state space is connected to convolutional layers C_1 and C_2 to obtain feature F_3. Features F_1, F_2 and F_3 are then connected in turn to a merge layer, fully connected layer FC_5 and a Softmax activation to produce the output f = π(·|s; θ).

Fully connected layers FC_1 to FC_5 each have 64 neurons; convolutional layer C_1 uses a 3×3 kernel with stride 2, and convolutional layer C_2 uses a 3×3 kernel with stride 1.
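A minimal PyTorch sketch of the generator branch structure follows. How the state space is split across the two fully connected branches and the convolutional branch, the hidden activations, and the final projection from FC_5 to the six-way Softmax are assumptions, since the text leaves them implicit.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """GAIL generator (policy): two fully connected branches plus one convolutional branch."""
    def __init__(self, vec_dim=7, rel_dim=3, grid_ch=1, n_actions=6):
        super().__init__()
        self.fc12 = nn.Sequential(nn.Linear(vec_dim, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
        self.fc34 = nn.Sequential(nn.Linear(rel_dim, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
        self.conv = nn.Sequential(nn.Conv2d(grid_ch, 16, 3, stride=2), nn.ReLU(),
                                  nn.Conv2d(16, 16, 3, stride=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64))
        # merge layer -> FC_5 (64 units) -> assumed projection to the 6 actions
        self.head = nn.Sequential(nn.Linear(64 * 3, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, s1, s2_pooled, grid):
        f1 = self.fc12(s1)          # feature F1
        f2 = self.fc34(s2_pooled)   # feature F2 (pooled relative-state vector, assumed fixed-size)
        f3 = self.conv(grid)        # feature F3
        logits = self.head(torch.cat([f1, f2, f3], dim=-1))
        return torch.softmax(logits, dim=-1)  # f = pi(.|s; theta)
```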

(2) Construct the discriminator

The discriminator network is constructed as shown in Fig. 4. Its input is the state space and its output is a six-dimensional vector D_φ(·|s), where φ denotes the parameters of the discriminator network. First, the state space is connected in turn to fully connected layers FC_6 and FC_7 to obtain feature F_4, and to fully connected layers FC_8 and FC_9 to obtain feature F_5. In parallel, the state space is connected to convolutional layers C_3 and C_4 to obtain feature F_6. Features F_4, F_5 and F_6 are then connected in turn to a merge layer, fully connected layer FC_10 and a Sigmoid activation to produce the output D_φ(·|s).

Fully connected layers FC_6 to FC_10 each have 64 neurons; convolutional layer C_3 uses a 3×3 kernel with stride 2, and convolutional layer C_4 uses a 3×3 kernel with stride 1.
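A matching sketch of the discriminator is given below, under the same assumptions as the generator sketch above; the text specifies a six-dimensional Sigmoid output, so one score per discrete action is produced, and the projection after FC_10 is assumed.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """GAIL discriminator: mirrors the generator's three branches but ends in a Sigmoid."""
    def __init__(self, vec_dim=7, rel_dim=3, grid_ch=1, n_actions=6):
        super().__init__()
        self.fc67 = nn.Sequential(nn.Linear(vec_dim, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
        self.fc89 = nn.Sequential(nn.Linear(rel_dim, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
        self.conv = nn.Sequential(nn.Conv2d(grid_ch, 16, 3, stride=2), nn.ReLU(),
                                  nn.Conv2d(16, 16, 3, stride=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64))
        self.fc10 = nn.Sequential(nn.Linear(64 * 3, 64), nn.ReLU())
        self.out = nn.Linear(64, n_actions)  # assumed projection to the 6-dimensional output

    def forward(self, s1, s2_pooled, grid):
        merged = torch.cat([self.fc67(s1), self.fc89(s2_pooled), self.conv(grid)], dim=-1)
        return torch.sigmoid(self.out(self.fc10(merged)))  # per-action "expert" probability
```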

Step 3: Train the safe driving decision-making model for commercial vehicles

First, the decision sub-network based on generative adversarial imitation learning is trained. Its objective is to learn a generator network such that the discriminator cannot distinguish the driving actions produced by the generator from the actions in the driving-behavior data set. The training comprises the following sub-steps:

Sub-step 1: On the driving-behavior data set X, initialize the generator network parameters θ_0 and the discriminator network parameters ω_0;

Sub-step 2: Perform L iterations, each consisting of sub-steps 2.1 and 2.2, as follows:

Sub-step 2.1: Update the discriminator parameters ω_i → ω_{i+1} using the gradient formula of equation (13):

Ê_{π_{θ_i}}[∇_ω log(D_ω(s, a))] + Ê_{X}[∇_ω log(1 − D_ω(s, a))]   (13)

where ∇_ω denotes the gradient of the neural-network loss function with respect to the parameters ω, the first expectation is taken over state–action pairs generated by the current policy π_{θ_i}, and the second over the driving-behavior data set X;

Sub-step 2.2: Set the reward function obtained from the updated discriminator and update the generator parameters θ_i → θ_{i+1} using the trust region policy optimization algorithm.
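The adversarial update can be illustrated as follows. The binary cross-entropy form of the discriminator update and the −log(1 − D) imitation reward are standard GAIL choices used here as assumptions, since equation (13) and the reward of sub-step 2.2 appear only as images in the source; the TRPO generator update itself is not reproduced, and the batch format (state tensors plus integer action indices) is assumed.

```python
import torch
import torch.nn.functional as F

def discriminator_step(disc, opt, expert_batch, policy_batch):
    """One update of the discriminator: push expert pairs toward 1, generated pairs toward 0."""
    (s_e, a_e), (s_p, a_p) = expert_batch, policy_batch
    d_expert = disc(*s_e).gather(1, a_e.view(-1, 1))   # D(s, a) for expert pairs
    d_policy = disc(*s_p).gather(1, a_p.view(-1, 1))   # D(s, a) for generator pairs
    loss = F.binary_cross_entropy(d_expert, torch.ones_like(d_expert)) + \
           F.binary_cross_entropy(d_policy, torch.zeros_like(d_policy))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def imitation_reward(disc, state, action):
    """Reward fed to the policy update; -log(1 - D) is one common GAIL convention (assumed)."""
    with torch.no_grad():
        d = disc(*state).gather(1, action.view(-1, 1)).clamp(1e-6, 1 - 1e-6)
    return -torch.log(1.0 - d)
```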

Next, building on the training results above, the DDQN-based decision sub-network is trained, which comprises the following sub-steps:

Sub-step 3: Initialize the experience replay pool D with capacity N;

Sub-step 4: Initialize the Q value of each action to a random value;

Sub-step 5: Perform M iterations, each consisting of sub-steps 5.1 and 5.2, as follows:

Sub-step 5.1: Initialize the state s_0 and the policy parameters φ_0;

Sub-step 5.2: Perform T iterations, each consisting of sub-steps 5.21 to 5.27, as follows:

Sub-step 5.21: With probability ε, select a random driving action;

Sub-step 5.22: Otherwise select a_t = argmax_a Q*(φ(s_t), a; θ);

where Q*(·) denotes the optimal action-value function and a_t the action at time t;

Sub-step 5.23: Execute action a_t and obtain the reward r_t at time t and the state s_{t+1} at time t+1;

Sub-step 5.24: Store the transition (φ_t, a_t, r_t, φ_{t+1}) in the experience replay pool D;

Sub-step 5.25: Randomly sample a mini-batch of transitions (φ_j, a_j, r_j, φ_{j+1}) from the experience replay pool D;

Sub-step 5.26: Compute the iteration target using:

y_j = r_j + γ Q(φ_{j+1}, argmax_{a′} Q(φ_{j+1}, a′; θ); θ_t^-)   (14)

where θ_t^- denotes the weights of the target network at time t, γ denotes the discount factor, argmax(·) denotes the variable that maximizes the objective, y_j denotes the iteration target, and p(s, a) denotes the action distribution;

Sub-step 5.27: Perform gradient descent on (y_j − Q(φ_j, a_j; θ))² using:

∇_{θ_i} L_i(θ_i) = E_{(s,a)∼p(·)}[(y_j − Q(s, a; θ_i)) ∇_{θ_i} Q(s, a; θ_i)]   (15)

where ∇_{θ_i} denotes the gradient of the neural-network loss function with respect to the parameters θ_i; ε denotes the probability of selecting a random action under the ε-greedy exploration strategy; θ_i denotes the parameters at iteration i; L_i(θ_i) denotes the loss function at iteration i; Q(s, a; θ_i) denotes the action-value function of the target network; and a′ ranges over all possible actions in state s′.
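The Double-DQN target of sub-step 5.26 and the squared-error objective of sub-step 5.27 can be sketched as follows; the batch layout and the (s1, s2, grid) state format follow the earlier sketches and are assumptions.

```python
import torch

def ddqn_target(q_net, target_net, rewards, next_states, gamma=0.99, done=None):
    """Double-DQN target: the online network selects the next action, the target network evaluates it."""
    with torch.no_grad():
        a_star = q_net(*next_states).argmax(dim=1, keepdim=True)         # argmax_a Q(s', a; theta)
        next_q = target_net(*next_states).gather(1, a_star).squeeze(1)   # Q(s', a*; theta^-)
        if done is not None:
            next_q = next_q * (1.0 - done)   # no bootstrap past terminal states
        return rewards + gamma * next_q      # y_j

def ddqn_loss(q_net, target_net, batch, gamma=0.99):
    """Squared error between y_j and Q(s_j, a_j; theta), minimized by gradient descent."""
    states, actions, rewards, next_states, done = batch
    q_sa = q_net(*states).gather(1, actions.view(-1, 1)).squeeze(1)
    y = ddqn_target(q_net, target_net, rewards, next_states, gamma, done)
    return torch.mean((y - q_sa) ** 2)
```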

Once the safe driving decision-making model for commercial vehicles has been trained, the state-space information collected by the sensors is fed into the model, which outputs high-level driving decisions such as turning, going straight, accelerating and decelerating in real time, thereby effectively safeguarding the operation of commercial vehicles in low-speed urban environments.
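Once trained, the model can be queried frame by frame; a minimal usage sketch, assuming the PyTorch decision sub-network illustrated earlier, is:

```python
import torch

def decide(model, s1, s2, grid):
    """Run the trained decision model on one sensor frame and return the greedy action index."""
    model.eval()
    with torch.no_grad():
        q_values = model(s1.unsqueeze(0), s2.unsqueeze(0), grid.unsqueeze(0))
    return int(q_values.argmax(dim=1).item())  # index into [left, straight, right, accel, cons, decel]
```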

Claims (1)

1. A safe driving decision-making method for large commercial vehicles in low-speed urban environments. First, the safe driving behavior of human drivers in urban traffic environments is collected to build a safe-driving-behavior data set. Second, a multi-head-attention-based safe driving decision-making model for commercial vehicles is constructed; the model contains two sub-networks, a deep double-Q network and a generative adversarial imitation learning sub-network, where the deep double-Q network learns, in an unsupervised manner, safe driving strategies in edge scenarios such as dangerous and conflict scenarios, and the generative adversarial imitation learning sub-network imitates the safe driving behavior of human drivers under different driving conditions and operating conditions. Finally, the safe driving decision-making model is trained to obtain driving strategies under different driving conditions and operating conditions, realizing high-level decision outputs for the safe driving behavior of commercial vehicles. The method is characterized by:

Step 1: Collect the safe driving behavior of human drivers in urban traffic environments

Safe driving behavior under different driving conditions and operating conditions is collected through real road tests and driving simulation, and a data set characterizing the safe driving behavior of human drivers is constructed. This comprises the following four sub-steps:

Sub-step 1: Build a synchronous multi-dimensional target-information acquisition system using millimeter-wave radar, 128-beam lidar, vision sensors, BeiDou sensors and inertial sensors;

Sub-step 2: In a real urban environment, several drivers in turn drive a commercial vehicle equipped with the acquisition system; data on driving behaviors such as lane changing, lane keeping, car following and acceleration/deceleration are collected and processed to obtain multi-source heterogeneous descriptions of each behavior, including obstacle distances in several directions measured by radar or vision sensors, the position, velocity, acceleration and yaw rate measured by the BeiDou and inertial sensors, and the steering-wheel angle measured by on-board sensors;

Sub-step 3: To imitate safe driving behavior in edge scenarios such as dangerous and conflict scenarios, build virtual urban scenes based on hardware-in-the-loop simulation. The constructed urban traffic scenes cover three categories:

(1) while the vehicle is driving, a laterally approaching traffic participant appears ahead of it, i.e. a suddenly encountered obstacle;
(2) while the vehicle is turning, a stationary traffic participant is present in its visual blind spot;
(3) while the vehicle is turning, a moving traffic participant is present in its visual blind spot;

These scenes contain a variety of road-network structures, including straight roads, curves and intersections, and several classes of traffic participants, including commercial vehicles, passenger cars, non-motorized vehicles and pedestrians;

Several drivers drive the commercial vehicle in the virtual scenes through a real controller with a steering wheel, accelerator and brake pedals, while the ego vehicle's lateral and longitudinal position, velocity and acceleration and its relative distance and relative speed to the surrounding traffic participants are recorded;

Sub-step 4: From the data collected in the real urban environment and the driving-simulation environment, construct the driving-behavior data set used for safe-driving decision learning, expressed as:

X = {(s_1, a_1), (s_2, a_2), …, (s_n, a_n)}   (1)

where X denotes the set of state–action pairs, i.e. the constructed data set characterizing the safe driving behavior of human drivers; (s_j, a_j) denotes the "state–action" pair at time j, with s_j the state and a_j the action taken by the human driver given s_j; and n denotes the number of "state–action" pairs in the database;

Step 2: Construct the multi-head-attention-based safe driving decision-making model for commercial vehicles

Deep reinforcement learning is used to learn safe driving strategies in edge scenarios such as dangerous and conflict scenarios; in addition, because imitation learning can reproduce the behavior of an exemplar, imitation learning is used to imitate the safe driving behavior of human drivers under different driving conditions and operating conditions. The constructed safe driving decision-making model therefore consists of two parts, described as follows:

Sub-step 1: Define the basic parameters of the safe driving decision-making model

First, the safe driving decision-making problem in the low-speed urban environment is formulated as a finite Markov decision process; second, the basic parameters of the model are defined;

(1) Define the state space

The state space is constructed from time-series data and an occupancy grid map to describe the motion states of the ego vehicle and nearby traffic participants:

S_t = [S_1(t), S_2(t), S_3(t)]   (2)

where S_t denotes the state space at time t, S_1(t) the state space of the ego vehicle related to the time-series data, S_2(t) the state space of the surrounding traffic participants related to the time-series data, and S_3(t) the state space related to the occupancy grid map;

First, the motion state of the ego vehicle is described by its continuous position, velocity, acceleration and heading angle:

S_1(t) = [p_x, p_y, v_x, v_y, a_x, a_y, θ_s]   (3)

where p_x and p_y denote the lateral and longitudinal positions of the ego vehicle in meters, v_x and v_y its lateral and longitudinal velocities in meters per second, a_x and a_y its lateral and longitudinal accelerations in meters per second squared, and θ_s its heading angle in degrees;

Second, the motion states of the surrounding traffic participants are described by their relative motion with respect to the ego vehicle:

S_2(t) = [Δd_i, Δv_i, a_i]   (4)

where Δd_i, Δv_i and a_i denote the relative distance, relative speed and acceleration between the ego vehicle and the i-th traffic participant, in meters, meters per second and meters per second squared;

Finally, the road area is rasterized into a number of p×q grid cells, and the road area and vehicle targets are abstracted into a grid map, namely the "existence" grid map S_3(t) used to describe relative positions, where p denotes the length and q the width of a grid cell;

The "existence" grid map contains four attributes: the grid coordinates, whether a vehicle is present, the class of that vehicle, and the distances to the left and right lane lines. A cell containing no traffic participant is set to "0" and a cell containing a traffic participant is set to "1"; the positions of such cells relative to the cell occupied by the ego vehicle describe the relative spacing between the two vehicles;

(2) Define the action space

The action space is defined by lateral and longitudinal driving actions:

A_t = [a_left, a_straight, a_right, a_accel, a_cons, a_decel]   (5)

where A_t denotes the action space at time t, a_left, a_straight and a_right denote turning left, going straight and turning right, and a_accel, a_cons and a_decel denote accelerating, keeping a constant speed and decelerating;

(3) Define the reward function

R_t = r_1 + r_2 + r_3   (6)

where R_t denotes the reward function at time t, and r_1, r_2 and r_3 denote the forward, rearward and lateral collision-avoidance reward functions, obtained from equations (7), (8) and (9);

(Equations (7), (8) and (9), given as images in the original, define r_1, r_2 and r_3 as piecewise functions of TTC, RTTC and x_lat with respect to their thresholds, weighted by β_1, β_2 and β_3.)

where TTC denotes the time to collision between the ego vehicle and the obstacle ahead, obtained by dividing the distance between them by their relative speed; TTC_thr denotes the time-to-collision threshold; RTTC denotes the rearward time to collision and RTTC_thr its threshold, all in seconds; x_lat denotes the distance between the ego vehicle and the traffic participants on either side and x_min the minimum lateral safety distance, both in meters; and β_1, β_2 and β_3 denote the weight coefficients of the forward, rearward and lateral collision-avoidance reward functions;

Sub-step 2: Construct the decision sub-network based on the deep double-Q network

The deep double-Q network is used to learn safe driving strategies in edge scenarios. A policy network based on multi-head attention is designed, and positional encoding is used to build permutation invariance into the decision sub-network;

The attention layer can be expressed as:

MultiHead(Q, K, V) = Concat(head_1, …, head_h)W^O   (10)

where MultiHead(Q, K, V) denotes the multi-head attention value; Q denotes the query vector and K the key vector, both of dimension d_k; V denotes the value vector of dimension d_v; W^O denotes a parameter matrix to be learned; and head_h denotes the h-th attention head (h = 2 in the present invention), computed as:

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)   (11)

Attention(Q, K, V) = softmax(QK^T / √d_k)·V   (12)

where Attention(Q, K, V) denotes the output attention matrix, and W_i^Q, W_i^K and W_i^V denote parameter matrices to be learned;

The decision sub-network is constructed as follows. First, the state space S_t is connected to encoder 1, encoder 2 and encoder 3; encoder 1 consists of two fully connected layers and outputs the ego-vehicle motion-state encoding; encoder 2 has the same structure as encoder 1 and outputs the relative motion-state encoding; encoder 3 consists of two convolutional layers and outputs the occupancy-grid-map encoding; each fully connected layer has 64 neurons with a Tanh activation, and each convolutional layer uses 3×3 kernels with a stride of 2. Second, the multi-head attention mechanism is used to analyze the dependencies between the ego vehicle and the surrounding traffic participants, so that the decision sub-network can attend to participants that suddenly approach the ego vehicle or whose paths conflict with it, while supporting variable input sizes and building permutation invariance into the sub-network; the outputs of encoders 1, 2 and 3 are all connected to the multi-head attention module, which outputs the attention matrix. Third, the output attention matrix is connected to decoder 1, which consists of a single fully connected layer with 64 neurons and a Sigmoid activation;

Sub-step 3: Construct the decision sub-network based on generative adversarial imitation learning

The generative adversarial imitation learning sub-network is used to learn the driving strategies contained in the driving-behavior data set and its generalized samples, and thus to imitate the safe driving behavior of human drivers under different driving conditions and operating conditions; it consists of a generator and a discriminator, each built with a deep neural network, as follows:

(1) Construct the generator

The generator network is constructed as shown in Fig. 3. Its input is the state space and its output is the probability f = π(·|s; θ) of each action in the action space, where θ denotes the parameters of the generator network. First, the state space is connected in turn to fully connected layers FC_1 and FC_2 to obtain feature F_1, and to fully connected layers FC_3 and FC_4 to obtain feature F_2; in parallel, the state space is connected to convolutional layers C_1 and C_2 to obtain feature F_3; features F_1, F_2 and F_3 are then connected in turn to a merge layer, fully connected layer FC_5 and a Softmax activation to produce the output f = π(·|s; θ); fully connected layers FC_1 to FC_5 each have 64 neurons, convolutional layer C_1 uses a 3×3 kernel with stride 2, and convolutional layer C_2 uses a 3×3 kernel with stride 1;

(2) Construct the discriminator

The discriminator network's input is the state space and its output is a six-dimensional vector D_φ(·|s), where φ denotes the parameters of the discriminator network. First, the state space is connected in turn to fully connected layers FC_6 and FC_7 to obtain feature F_4, and to fully connected layers FC_8 and FC_9 to obtain feature F_5; in parallel, the state space is connected to convolutional layers C_3 and C_4 to obtain feature F_6; features F_4, F_5 and F_6 are then connected in turn to a merge layer, fully connected layer FC_10 and a Sigmoid activation to produce the output D_φ(·|s); fully connected layers FC_6 to FC_10 each have 64 neurons, convolutional layer C_3 uses a 3×3 kernel with stride 2, and convolutional layer C_4 uses a 3×3 kernel with stride 1;

Step 3: Train the safe driving decision-making model for commercial vehicles

First, the decision sub-network based on generative adversarial imitation learning is trained; its objective is to learn a generator network such that the discriminator cannot distinguish the driving actions produced by the generator from the actions in the driving-behavior data set. This comprises the following sub-steps:

Sub-step 1: On the driving-behavior data set X, initialize the generator network parameters θ_0 and the discriminator network parameters ω_0;

Sub-step 2: Perform L iterations, each consisting of sub-steps 2.1 and 2.2, as follows:

Sub-step 2.1: Update the discriminator parameters ω_i → ω_{i+1} using the gradient formula of equation (13):

Ê_{π_{θ_i}}[∇_ω log(D_ω(s, a))] + Ê_{X}[∇_ω log(1 − D_ω(s, a))]   (13)

where ∇_ω denotes the gradient of the neural-network loss function with respect to the parameters ω;

Sub-step 2.2: Set the reward function obtained from the updated discriminator and update the generator parameters θ_i → θ_{i+1} using the trust region policy optimization algorithm;

Next, building on the training results above, the DDQN-based decision sub-network is trained, which comprises the following sub-steps:

Sub-step 3: Initialize the experience replay pool D with capacity N;

Sub-step 4: Initialize the Q value of each action to a random value;

Sub-step 5: Perform M iterations, each consisting of sub-steps 5.1 and 5.2, as follows:

Sub-step 5.1: Initialize the state s_0 and the policy parameters φ_0;

Sub-step 5.2: Perform T iterations, each consisting of sub-steps 5.21 to 5.27, as follows:

Sub-step 5.21: With probability ε, select a random driving action;

Sub-step 5.22: Otherwise select a_t = argmax_a Q*(φ(s_t), a; θ), where Q*(·) denotes the optimal action-value function and a_t the action at time t;

Sub-step 5.23: Execute action a_t and obtain the reward r_t at time t and the state s_{t+1} at time t+1;

Sub-step 5.24: Store the transition (φ_t, a_t, r_t, φ_{t+1}) in the experience replay pool D;

Sub-step 5.25: Randomly sample a mini-batch of transitions (φ_j, a_j, r_j, φ_{j+1}) from D;

Sub-step 5.26: Compute the iteration target using:

y_j = r_j + γ Q(φ_{j+1}, argmax_{a′} Q(φ_{j+1}, a′; θ); θ_t^-)   (14)

where θ_t^- denotes the weights of the target network at time t, γ the discount factor, argmax(·) the variable that maximizes the objective, y_j the iteration target, and p(s, a) the action distribution;

Sub-step 5.27: Perform gradient descent on (y_j − Q(φ_j, a_j; θ))² using:

∇_{θ_i} L_i(θ_i) = E_{(s,a)∼p(·)}[(y_j − Q(s, a; θ_i)) ∇_{θ_i} Q(s, a; θ_i)]   (15)

where ∇_{θ_i} denotes the gradient of the neural-network loss function with respect to the parameters θ_i; ε denotes the probability of selecting a random action under the ε-greedy exploration strategy; θ_i denotes the parameters at iteration i; L_i(θ_i) denotes the loss function at iteration i; Q(s, a; θ_i) denotes the action-value function of the target network; and a′ ranges over all possible actions in state s′;

Once the safe driving decision-making model for commercial vehicles has been trained, the state-space information collected by the sensors is fed into the model, which outputs high-level driving decisions such as turning, going straight, accelerating and decelerating in real time, thereby effectively safeguarding the operation of commercial vehicles in low-speed urban environments.
CN202211070514.5A 2022-09-02 2022-09-02 Safe driving decision-making method for large commercial vehicles in urban low-speed environment Active CN115257819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211070514.5A CN115257819B (en) 2022-09-02 2022-09-02 Safe driving decision-making method for large commercial vehicles in urban low-speed environment


Publications (2)

Publication Number Publication Date
CN115257819A true CN115257819A (en) 2022-11-01
CN115257819B CN115257819B (en) 2024-12-24

Family

ID=83755148


Country Status (1)

Country Link
CN (1) CN115257819B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111026127A (en) * 2019-12-27 2020-04-17 南京大学 Automatic driving decision method and system based on partially observable transfer reinforcement learning
CN113762512A (en) * 2021-11-10 2021-12-07 北京航空航天大学杭州创新研究院 Distributed model training method, system and related device
CN114407931A (en) * 2022-02-21 2022-04-29 东南大学 A highly human-like decision-making method for safe driving of autonomous commercial vehicles

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115731690A (en) * 2022-11-18 2023-03-03 北京理工大学 A decision-making method for unmanned bus clusters based on graph neural network reinforcement learning
CN115731690B (en) * 2022-11-18 2023-11-28 北京理工大学 A decision-making method for unmanned bus clusters based on graph neural network reinforcement learning
WO2024198773A1 (en) * 2023-03-27 2024-10-03 华为技术有限公司 Neural network, self-driving method, and apparatus
CN117048365A (en) * 2023-10-12 2023-11-14 江西五十铃汽车有限公司 Automobile torque control method, system, storage medium and equipment
CN117048365B (en) * 2023-10-12 2024-01-26 江西五十铃汽车有限公司 Automobile torque control method, system, storage medium and equipment
CN117246345A (en) * 2023-11-06 2023-12-19 镁佳(武汉)科技有限公司 A generative vehicle control method, device, equipment and medium
CN118397581A (en) * 2024-04-07 2024-07-26 湘江实验室 Intelligent driving method based on brain-like perception and related equipment
CN118770181A (en) * 2024-06-14 2024-10-15 昆明理工大学 A real-time energy control method for plug-in hybrid electric vehicles based on weighted double Q learning

Also Published As

Publication number Publication date
CN115257819B (en) 2024-12-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant