CN117141520B

CN117141520B - Real-time track planning method, device and equipment

Info

Publication number: CN117141520B
Application number: CN202311426011.1A
Authority: CN
Inventors: 彭航; 于文豪; 王晨捷; 吉建民; 张昱; 张燕咏
Original assignee: Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Current assignee: Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date: 2023-10-31
Filing date: 2023-10-31
Publication date: 2024-01-12
Anticipated expiration: 2043-10-31
Also published as: CN117141520A

Abstract

The disclosure belongs to the technical field of automatic driving, and particularly relates to a real-time track planning method, device and equipment. Wherein the method comprises the following steps: basic navigation point information and pose information of a vehicle at the current moment are obtained, a target path reference line is determined based on the basic navigation point information, the pose information is converted from a Cartesian coordinate system to a Flunar coordinate system based on the target path reference line, and a transverse offset-displacement map and a rasterized map are obtained; determining a path planning initial value of a current vehicle through cost searching and reinforcement learning models, smoothing the path planning initial value to obtain an initial path plan, and performing multi-objective optimization on the initial path plan to obtain an optimal path plan; and determining an optimal speed curve corresponding to the optimal path planning, obtaining a target track by combining the optimal path planning and the optimal speed curve, and carrying out cost selection on the multi-lane target track to obtain a final control track.

Description

A real-time trajectory planning method, device and equipment

技术领域Technical field

本公开属于自动驾驶技术领域，具体涉及一种实时轨迹规划方法、装置和设备。The present disclosure belongs to the field of autonomous driving technology, and specifically relates to a real-time trajectory planning method, device and equipment.

背景技术Background technique

目前自动驾驶实时轨迹规划所应对的自动驾驶场景越来越复杂和多变，当异常场景出现时传统的解耦规划或优化求解往往因为搜索维数、代价参数等问题难以应对，从而导致实时规划失败。At present, the autonomous driving scenarios that real-time trajectory planning of autonomous driving deals with are becoming more and more complex and changeable. When abnormal scenarios occur, traditional decoupled planning or optimization solutions are often difficult to deal with due to issues such as search dimensions and cost parameters, resulting in real-time planning. fail.

因此，现有技术通过引入强化学习直接生成轨迹粗解再进行后端优化求解最终轨迹，此种方法虽然可以获得相对可靠安全的轨迹，但DQN强化学习方法往往因为网络本身的局限性难以获得良好的轨迹粗解，从而使后端优化并不稳定。Therefore, the existing technology directly generates a rough solution of the trajectory by introducing reinforcement learning and then performs back-end optimization to solve the final trajectory. Although this method can obtain a relatively reliable and safe trajectory, the DQN reinforcement learning method is often difficult to obtain good results due to the limitations of the network itself. The rough solution of the trajectory makes the back-end optimization unstable.

发明内容Contents of the invention

本公开实施例提出了一种实时轨迹规划方案，以解决现有技术对异常场景处理能力不足的问题。The embodiment of the present disclosure proposes a real-time trajectory planning solution to solve the problem of insufficient processing capabilities of abnormal scenes in the existing technology.

本公开实施例的第一方面提供了一种实时轨迹规划方法，包括：A first aspect of the embodiments of the present disclosure provides a real-time trajectory planning method, including:

获取车辆在当前时刻的基础导航点信息和位姿信息，基于所述基础导航点信息确定目标路径参考线，基于所述目标路径参考线将所述位姿信息由笛卡尔坐标系转化到弗莱纳坐标系下，获得横向偏移-位移图和栅格化地图，其中，所述位姿信息包括车辆自身状态、周围交通参与者和局部路网信息；Obtain the basic navigation point information and pose information of the vehicle at the current moment, determine the target path reference line based on the basic navigation point information, and convert the pose information from the Cartesian coordinate system to Frye based on the target path reference line Under the nano coordinate system, obtain the lateral offset-displacement map and rasterized map, where the pose information includes the vehicle's own status, surrounding traffic participants and local road network information;

通过代价搜索和强化学习模型，确定当前车辆的路径规划初始值，平滑化所述路径规划初始值以获得初始路径规划，对所述初始路径规划进行多目标优化以获得最优路径规划；Determine the initial path planning value of the current vehicle through cost search and reinforcement learning models, smooth the initial path planning value to obtain the initial path planning, and perform multi-objective optimization on the initial path planning to obtain the optimal path planning;

确定与所述最优路径规划对应的最优速度曲线，联合所述最优路径规划和所述最优速度曲线得到目标轨迹，对多车道目标轨迹进行代价选择，获取最终的控制轨迹。Determine the optimal speed curve corresponding to the optimal path planning, obtain the target trajectory by combining the optimal path planning and the optimal speed curve, perform cost selection on the multi-lane target trajectory, and obtain the final control trajectory.

在一些实施例中，所述基于所述基础导航点信息确定目标路径参考线包括：In some embodiments, determining the target path reference line based on the basic navigation point information includes:

根据所述车辆当前帧的定位信息实时获取前方第一预设距离和后方第二预设距离内的基础导航点；Obtain basic navigation points within the first preset distance ahead and the second preset distance behind in real time according to the positioning information of the current frame of the vehicle;

确定所述目标路径参考线的损失函数和边界约束条件，其中，所述损失函数包括平滑代价、紧凑代价和相似代价，所述平滑代价用于衡量所述目标路径参考线的光滑程度，所述紧凑代价用于衡量与所述基础导航点对应的目标路径参考点之间的均匀程度，所述相似代价用于衡量所述目标路径参考线与原导航路径的偏离程度；Determine the loss function and boundary constraints of the target path reference line, where the loss function includes a smoothing cost, a compact cost and a similarity cost, the smoothing cost is used to measure the smoothness of the target path reference line, the The compactness cost is used to measure the degree of uniformity between the target path reference points corresponding to the basic navigation points, and the similarity cost is used to measure the deviation of the target path reference line from the original navigation path;

基于二次型求解器得到满足所述边界约束条件的所述基础导航点，与所述基础导航点对应的路径参考点组成所述目标路径参考线。The basic navigation point that satisfies the boundary constraint condition is obtained based on the quadratic solver, and the path reference point corresponding to the basic navigation point constitutes the target path reference line.

在一些实施例中，所述通过代价搜索和强化学习模型，确定当前车辆的路径规划初始值包括：In some embodiments, determining the initial value of the path planning of the current vehicle through cost search and reinforcement learning model includes:

沿所述目标路径参考线进行采样，计算各个采样节点的预设代价的代价值，由代价值总和最小的所述采样节点组成与所述预设代价对应的第一路径规划初始值；Sampling is performed along the target path reference line, the cost value of the preset cost of each sampling node is calculated, and the first path planning initial value corresponding to the preset cost is composed of the sampling node with the smallest sum of cost values;

根据当前及过去基于所述目标路径参考线的连续三帧所述栅格化地图，利用决策神经网络模型进行所述车辆在当前时刻所要执行动作的决策，获得所述车辆在当前时刻的决策动作值，所述决策动作值组成第二路径规划初始值；According to the current and past three consecutive frames of the rasterized map based on the target path reference line, the decision-making neural network model is used to make decisions about the actions to be performed by the vehicle at the current moment, and the decision-making actions of the vehicle at the current moment are obtained. value, the decision action value constitutes the initial value of the second path planning;

对所述第二路径规划初始值计算预设代价的代价值并与所述预设代价对应的第一路径规划初始值的代价值作比较，将所述第二路径规划初始值和所述第一路径规划初始值中代价值较小者作为当前车辆的路径规划初始值。Calculate the cost value of the preset cost for the second path planning initial value and compare it with the cost value of the first path planning initial value corresponding to the preset cost, and compare the second path planning initial value and the third path planning initial value. Among the initial path planning values, the one with the smaller cost value is used as the initial path planning value of the current vehicle.

在一些实施例中，所述决策神经网络模型的训练方法包括：In some embodiments, the training method of the decision-making neural network model includes:

由以车辆为中心的基于路径的连续三帧的栅格地图构成状态空间，其中，所述车辆和周围障碍物的位置被投影到所述栅格地图上；The state space is constituted by a three-frame continuous path-based grid map centered on the vehicle, in which the positions of the vehicle and surrounding obstacles are projected onto the grid map;

构建由所述车辆前方预设矩形范围内采样的多个路径点的横向连续坐标值组成的动作空间，其中，所述路径点纵向间保持固定间隔，横向坐标值由决策神经网络模型确定；Constructing an action space consisting of transverse continuous coordinate values of multiple path points sampled within a preset rectangular range in front of the vehicle, wherein a fixed interval is maintained between the longitudinal path points, and the transverse coordinate values are determined by the decision-making neural network model;

基于到达目标的奖励、碰撞发生的惩罚、输出路径的曲率以及输出路径间的平滑迁移设计奖励函数；Design a reward function based on the reward for reaching the target, the penalty for collision, the curvature of the output path, and the smooth transition between output paths;

以所述状态空间为输入，以所述动作空间为输出构造训练集；Construct a training set with the state space as input and the action space as output;

设计包含包括静态场景和复杂场景的仿真环境，在所述仿真环境中基于所述奖励函数，采用SAC强化学习方法对所述训练集训练神经网络模型，获得所述决策神经网络模型。Design a simulation environment including static scenes and complex scenes. In the simulation environment, based on the reward function, the SAC reinforcement learning method is used to train the neural network model on the training set to obtain the decision-making neural network model.

在一些实施例中，所述平滑化所述路径规划初始值以获得初始路径规划包括：In some embodiments, smoothing the path planning initial value to obtain the initial path planning includes:

将所述目标路径参考线的损失函数和边界约束条件作为所述路径规划初始值的损失函数和边界约束条件；Use the loss function and boundary constraints of the target path reference line as the loss function and boundary constraints of the initial value of the path planning;

基于二次型求解器对所述路径规划初始值进行优化以得到所述初始路径规划。The initial path planning value is optimized based on a quadratic solver to obtain the initial path planning.

在一些实施例中，所述对所述初始路径规划进行多目标优化以获得最优路径规划包括：In some embodiments, performing multi-objective optimization on the initial path plan to obtain the optimal path plan includes:

基于所述横向偏移-位移图和所述路径规划初始值确定所述路径的边界约束条件，基于所述边界约束条件以及预设优化目标对所述初始路径规划进行多目标优化以获取目标空间内的最优路径规划，其中，所述预设优化目标包括目标路径与所述目标路径参考线的偏离程度、所述目标路径与所述路径规划初始值的偏离程度、所述目标路径的连续性中的一个或多个；Determine the boundary constraint conditions of the path based on the lateral offset-displacement map and the initial path planning value, and perform multi-objective optimization on the initial path planning based on the boundary constraint conditions and preset optimization goals to obtain the target space The optimal path planning within one or more of sex;

当目标空间内的最优路径规划求解异常时，将所述初始路径规划作为所述最优路径规划。When the optimal path planning in the target space is solved abnormally, the initial path planning is used as the optimal path planning.

在一些实施例中，所述确定与所述最优路径规划对应的最优速度曲线包括：In some embodiments, determining the optimal speed curve corresponding to the optimal path planning includes:

对所述车辆在弗莱纳坐标系下的静态障碍物和动态障碍物构建位移-时间图；Construct a displacement-time diagram for the static obstacles and dynamic obstacles of the vehicle in the Fleener coordinate system;

对所述位移-时间图进行采样，计算各个采样节点的预设代价的代价值，由代价值总和最小的所述采样节点组成与所述预设代价对应的速度曲线初始值，其中，所述预设代价包括速度代价、加速度代价、加加速代价以及碰撞代价；The displacement-time graph is sampled, the cost value of the preset cost of each sampling node is calculated, and the initial value of the velocity curve corresponding to the preset cost is composed of the sampling node with the smallest sum of cost values, wherein, The preset costs include speed cost, acceleration cost, jerk cost and collision cost;

由所述速度曲线初始值得到速度优化的凸空间，在所述凸空间中提取边界约束条件，基于所述边界约束条件以及预设优化目标对所述速度曲线进行多目标优化以获取最优速度曲线，其中，所述预设优化目标包括巡航速度、加速度和加加速之中的一个或多个。A convex space for speed optimization is obtained from the initial value of the speed curve, boundary constraints are extracted in the convex space, and multi-objective optimization is performed on the speed curve based on the boundary constraints and preset optimization goals to obtain the optimal speed. Curve, wherein the preset optimization target includes one or more of cruising speed, acceleration and jerk.

在一些实施例中，所述联合所述最优路径规划和所述最优速度曲线得到目标轨迹包括：In some embodiments, combining the optimal path planning and the optimal speed curve to obtain the target trajectory includes:

基于所述最优速度曲线，通过位移对所述最优路径规划做插值，得到每个时刻的速度和路径信息以生成目标轨迹。Based on the optimal speed curve, the optimal path planning is interpolated through displacement to obtain the speed and path information at each moment to generate a target trajectory.

本公开实施例的第二方面提供了一种实时轨迹规划装置，包括：A second aspect of the embodiment of the present disclosure provides a real-time trajectory planning device, including:

确定模块，用于获取车辆在当前时刻的基础导航点信息和位姿信息，基于所述基础导航点信息确定目标路径参考线，基于所述目标路径参考线将所述位姿信息由笛卡尔坐标系转化到弗莱纳坐标系下，获得横向偏移-位移图和栅格化地图，其中，所述位姿信息包括所述车辆自身状态、周围交通参与者和局部路网信息；Determination module, used to obtain the basic navigation point information and pose information of the vehicle at the current moment, determine the target path reference line based on the basic navigation point information, and convert the pose information into Cartesian coordinates based on the target path reference line. The system is transformed into the Fleener coordinate system to obtain the lateral offset-displacement map and the rasterized map, where the pose information includes the vehicle's own status, surrounding traffic participants and local road network information;

优化模块，用于通过代价搜索和强化学习模型，确定当前车辆的路径规划初始值，平滑化所述路径规划初始值以获得初始路径规划，对所述初始路径规划进行多目标优化以获得最优路径规划；The optimization module is used to determine the initial value of the current vehicle's path planning through cost search and reinforcement learning models, smooth the initial value of the path planning to obtain the initial path planning, and perform multi-objective optimization on the initial path planning to obtain the optimal route plan;

获取模块，用于确定与所述最优路径规划对应的最优速度曲线，联合所述最优路径规划和所述最优速度曲线得到目标轨迹，对多车道目标轨迹进行代价选择，确定最终的控制轨迹。An acquisition module is used to determine the optimal speed curve corresponding to the optimal path planning, obtain the target trajectory by combining the optimal path planning and the optimal speed curve, perform cost selection on the multi-lane target trajectory, and determine the final Control the trajectory.

本公开实施例的第三方面提供了一种实时轨迹规划设备，包括存储器和处理器：A third aspect of the embodiment of the present disclosure provides a real-time trajectory planning device, including a memory and a processor:

所述存储器，用于存储计算机程序；The memory is used to store computer programs;

所述处理器，用于当执行所述计算机程序时，实现根据本公开第一方面所述的方法。The processor is configured to implement the method according to the first aspect of the present disclosure when executing the computer program.

综上所述，不同于现有技术通过强化学习直接得出轨迹粗解然后优化出最优轨迹，本公开各实施例提供的实时轨迹规划方法、装置和设备用强化学习得出路径粗解，并与代价搜索得的粗解做融合得到目标路径粗解，然后优化得到目标路径，最后与速度优化后的结果合并成目标轨迹。因为通过对基础导航点的多层代价与平滑处理确定目标路径参考线，解决了模型或搜索空间的偏差，使得动态规划的参考线初始路径更加稳定可靠；通过代价搜索与强化学习相结合的方法，解决了细粒度空间解的选取问题，增强了路径规划初始值的准确性和安全性；通过基于准确路径规划初始值的多目标优化策略，解决了规划模块在不同驾驶场景频繁调参的问题，使得系统具有良好的异常场景处理能力。To sum up, unlike the existing technology that directly obtains a rough solution of the trajectory through reinforcement learning and then optimizes the optimal trajectory, the real-time trajectory planning method, device and equipment provided by each embodiment of the present disclosure uses reinforcement learning to obtain a rough solution for the path. And fuse it with the rough solution obtained by the cost search to obtain the rough solution of the target path, then optimize it to obtain the target path, and finally merge it with the speed optimized result to form the target trajectory. Because the target path reference line is determined through multi-layer cost and smoothing processing of basic navigation points, the deviation of the model or search space is solved, making the initial path of the reference line of dynamic programming more stable and reliable; through the method of combining cost search and reinforcement learning , which solves the problem of selecting fine-grained spatial solutions and enhances the accuracy and safety of the initial value of path planning; through the multi-objective optimization strategy based on accurate initial value of path planning, it solves the problem of frequent parameter adjustment of the planning module in different driving scenarios. , so that the system has good abnormal scene processing capabilities.

附图说明Description of the drawings

通过参考附图会更加清楚的理解本公开的特征和优点，附图是示意性的而不应理解为对本公开进行任何限制，在附图中：The features and advantages of the present disclosure will be more clearly understood by reference to the accompanying drawings, which are schematic and should not be construed as limiting the disclosure in any way, in which:

图1是本公开所适用的一种实时轨迹规划系统的示意图；Figure 1 is a schematic diagram of a real-time trajectory planning system to which the present disclosure is applicable;

图2是根据本公开的一些实施例所示的一种实时轨迹规划方法的流程图；Figure 2 is a flow chart of a real-time trajectory planning method according to some embodiments of the present disclosure;

图3是根据本公开的一些实施例所示的一种实时轨迹规划装置的示意图；Figure 3 is a schematic diagram of a real-time trajectory planning device according to some embodiments of the present disclosure;

图4是根据本公开的一些实施例所示的一种实时轨迹规划设备的示意图。Figure 4 is a schematic diagram of a real-time trajectory planning device according to some embodiments of the present disclosure.

具体实施方式Detailed ways

在下面的详细描述中，通过示例阐述了本公开的许多具体细节，以便提供对相关披露的透彻理解。然而，对于本领域的普通技术人员来讲，本公开显而易见的可以在没有这些细节的情况下实施。应当理解的是，本公开中使用“系统”、“装置”、“单元”和/或“模块”术语，是用于区分在顺序排列中不同级别的不同部件、元件、部分或组件的一种方法。然而，如果其他表达式可以实现相同的目的，这些术语可以被其他表达式替换。In the following detailed description, numerous specific details of the present disclosure are set forth by way of example in order to provide a thorough understanding of the relevant disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these details. It should be understood that the terms "system", "device", "unit" and/or "module" are used in this disclosure as a means of distinguishing between different components, elements, portions or assemblies at different levels in a sequential arrangement. method. However, these terms may be replaced by other expressions if they serve the same purpose.

应当理解的是，当设备、单元或模块被称为“在……上”、“连接到”或“耦合到”另一设备、单元或模块时，其可以直接在另一设备、单元或模块上，连接或耦合到或与其他设备、单元或模块通信，或者可以存在中间设备、单元或模块，除非上下文明确提示例外情形。例如，本公开所使用的术语“和/或”包括一个或多个相关所列条目的任何一个和所有组合。It will be understood that when a device, unit or module is referred to as being "on," "connected to" or "coupled to" another device, unit or module, it can be directly on the other device, unit or module on, connected to, coupled to, or in communication with other devices, units or modules, or intervening devices, units or modules may be present, unless the context clearly indicates an exception. For example, as used in this disclosure, the term "and/or" includes any and all combinations of one or more of the associated listed items.

本公开所用术语仅为了描述特定实施例，而非限制本公开范围。如本公开说明书和权利要求书中所示，除非上下文明确提示例外情形，“一”、“一个”、“一种”和/或“该”等词并非特指单数，也可包括复数。一般说来，术语“包括”与“包含”仅提示包括已明确标识的特征、整体、步骤、操作、元素和/或组件，而该类表述并不构成一个排它性的罗列，其他特征、整体、步骤、操作、元素和/或组件也可以包含在内。The terminology used in this disclosure is for the purpose of describing specific embodiments only and does not limit the scope of the disclosure. As shown in this disclosure and claims, words such as "a", "an", "an" and/or "the" do not specifically refer to the singular and may include the plural unless the context clearly indicates an exception. Generally speaking, the terms "include" and "include" only imply the inclusion of clearly identified features, integers, steps, operations, elements and/or components, and such expressions do not constitute an exclusive list. Other features, Integers, steps, operations, elements and/or components may also be included.

参看下面的说明以及附图，本公开的这些或其他特征和特点、操作方法、结构的相关元素的功能、部分的结合以及制造的经济性可以被更好地理解，其中说明和附图形成了说明书的一部分。然而，可以清楚地理解，附图仅用作说明和描述的目的，并不意在限定本公开的保护范围。可以理解的是，附图并非按比例绘制。These and other features and characteristics of the present disclosure, methods of operation, function of associated elements of construction, combination of parts, and economics of manufacture may be better understood by reference to the following description and accompanying drawings, which form part of the instruction manual. It will be expressly understood, however, that the drawings are included for the purpose of illustration and description only and are not intended to limit the scope of the present disclosure. It will be understood that the drawings are not drawn to scale.

本公开中使用了多种结构图用来说明根据本公开的实施例的各种变形。应当理解的是，前面或下面的结构并不是用来限定本公开。本公开的保护范围以权利要求为准。Various structural diagrams are used in the present disclosure to illustrate various modifications according to the embodiments of the present disclosure. It should be understood that the preceding or following structures are not intended to limit the present disclosure. The scope of protection of the present disclosure shall be determined by the claims.

图1是本公开适用的一种实时轨迹规划系统的示意图。如图1所示的系统，规划模块与感知模块和控制模块数据连接，规划模块基于感知模块在笛卡尔坐标系下感知到的数字化世界模型进行实时轨迹规划，并将生成的控制轨迹发往控制模块以控制车辆沿控制轨迹行驶。Figure 1 is a schematic diagram of a real-time trajectory planning system to which the present disclosure is applicable. As shown in the system shown in Figure 1, the planning module is connected with the data of the perception module and the control module. The planning module performs real-time trajectory planning based on the digital world model perceived by the perception module in the Cartesian coordinate system, and sends the generated control trajectory to the control module. The module controls the vehicle to drive along the control trajectory.

其中，所述感知模块包括用于感知车辆自身状态的各种传感器，如位姿、速度、加速度传感器等；也包括用于感知局部路网以及周围交通参与者信息的传感器，如摄像头、激光雷达等。所述感知模块还可以通过CANBUS总线获取挡位、速度、转向角等车辆底盘状态信息。Among them, the sensing module includes various sensors for sensing the vehicle's own status, such as posture, speed, acceleration sensors, etc.; it also includes sensors for sensing the local road network and surrounding traffic participant information, such as cameras and lidar. wait. The sensing module can also obtain vehicle chassis status information such as gear, speed, steering angle, etc. through the CANBUS bus.

所述规划模块可以是单机、集群或分布式服务器中的任一种。特别的，所述规划模块可以是自动驾驶车辆的车载控制器。所述规划模块可以基于数字化世界模型进行路径的动态规划得到路径粗解，并对路径粗解进行二次优化得到最优路径，同时对速度进行动态规划再进行二次优化得到最优速度，最终联合最优路径与最优速度生成控制轨迹。The planning module can be any one of a single machine, a cluster, or a distributed server. In particular, the planning module may be an on-board controller of an autonomous vehicle. The planning module can perform dynamic planning of the path based on the digital world model to obtain a rough path solution, perform secondary optimization on the rough path solution to obtain the optimal path, and simultaneously perform dynamic planning on the speed and then perform secondary optimization to obtain the optimal speed. Finally, The optimal path and optimal speed are combined to generate a control trajectory.

所述控制模块是自动驾驶车辆的机械、电子控制单元，如车身控制单元、方向盘和转向系统、制动系统、稳定系统等。The control module is the mechanical and electronic control unit of the autonomous vehicle, such as the body control unit, steering wheel and steering system, braking system, stability system, etc.

图2是根据本公开的一些实施例所示的一种实时轨迹规划方法的流程图。在一些实施例中，所述实时轨迹规划方法由图1所示系统中的规划模块执行，所述实时轨迹规划方法包括以下步骤：Figure 2 is a flow chart of a real-time trajectory planning method according to some embodiments of the present disclosure. In some embodiments, the real-time trajectory planning method is executed by the planning module in the system shown in Figure 1, and the real-time trajectory planning method includes the following steps:

S210：获取车辆在当前时刻的基础导航点信息和位姿信息，基于所述基础导航点信息确定目标路径参考线，基于所述目标路径参考线将所述位姿信息由笛卡尔坐标系转化到弗莱纳坐标系下，获得横向偏移-位移图和栅格化地图，其中，所述位姿信息包括所述车辆自身状态、周围交通参与者和局部路网信息。S210: Obtain the basic navigation point information and pose information of the vehicle at the current moment, determine the target path reference line based on the basic navigation point information, and convert the pose information from the Cartesian coordinate system to Under the Fleener coordinate system, a lateral offset-displacement map and a rasterized map are obtained, where the pose information includes the vehicle's own status, surrounding traffic participants and local road network information.

具体的，获取当前车辆在笛卡尔坐标系下感知到的数字化世界模型，包括车辆自身的状态、局部路网以及周围交通参与者的信息。Specifically, the digital world model perceived by the current vehicle in the Cartesian coordinate system is obtained, including the vehicle's own status, local road network, and information about surrounding traffic participants.

根据车辆当前帧的定位信息实时动态截取车前50m和后10m的基础导航点，对截取出来的点位进行优化处理。这里主要对基础导航点进行平滑、相似以及紧凑三个方面处理以得到目标路径参考线，对应优化求解的损失函数构建如下：Based on the positioning information of the current frame of the vehicle, the basic navigation points 50m in front and 10m behind the vehicle are dynamically intercepted in real time, and the intercepted points are optimized. Here, the basic navigation points are mainly processed in three aspects: smoothness, similarity and compactness to obtain the target path reference line. The loss function corresponding to the optimization solution is constructed as follows:

平滑代价：，Smoothing cost: ,

紧凑代价：，Compact cost: ,

相似代价：，Similar cost: ,

其中，分别为目标路径参考点任意相邻三个连续点位，/>为与目标路径参考点对应位置的原始导航点。平滑代价/>衡量路径参考线的光滑程度，紧凑代价/>衡量路径参考点之间的均匀程度，相似代价/>衡量路径参考线与原导航路径的偏离程度。in, They are any three adjacent consecutive points of the target path reference point,/> It is the original navigation point corresponding to the target path reference point. Smoothing cost/> Measures the smoothness of the path reference line, compactness cost/> Measuring the degree of uniformity and similarity cost between path reference points/> Measure the deviation of the path reference line from the original navigation path.

优化求解的边界约束如下：The boundary constraints of the optimization solution are as follows:

， ,

其中，分别为第i个路径参考点、原始导航点以及原始导航点和路径参考点之间的上下边界点。in, are respectively the i-th path reference point, the original navigation point, and the upper and lower boundary points between the original navigation point and the path reference point.

求解方式采用二次型求解器OSQP对该优化问题进行求解，该优化问题包括损失函数和边界约束，以得到满足约束条件的路径参考点，相对于粗略的原始导航点更加的平滑稳定，为后续规划模块提供准确参考。The solution method uses the quadratic solver OSQP to solve the optimization problem. The optimization problem includes a loss function and boundary constraints to obtain a path reference point that satisfies the constraint conditions. It is smoother and more stable than the rough original navigation point, and provides a good basis for subsequent Planning module provides accurate reference.

本公开通过对基础导航点的多层代价与平滑处理确定目标路径参考线，解决了模型或搜索空间的偏差，使得动态规划的参考线初始路径更加稳定可靠。This disclosure determines the target path reference line through multi-layer cost and smoothing processing of basic navigation points, solving the deviation of the model or search space, making the initial path of the dynamic planning reference line more stable and reliable.

最终在在目标参考线的基础上，将数字化世界模型中的各类位姿信息（包括边界）如自车位姿Te、静动态障碍物To以及车道线点位Tl和可行驶区域边界点位Tb从笛卡尔坐标系投影到弗莱纳坐标系下，得到横向偏移-位移（SL）图，同时对图中信息进行栅格化，进而得到当前帧的栅格化地图。Finally, based on the target reference line, various pose information (including boundaries) in the digital world model, such as self-vehicle pose Te, static and dynamic obstacles To, lane line point Tl and drivable area boundary point Tb Project from the Cartesian coordinate system to the Fleener coordinate system to obtain the lateral offset-displacement (SL) map, and rasterize the information in the map to obtain the rasterized map of the current frame.

S220：通过代价搜索和强化学习模型，确定当前车辆的路径规划初始值，平滑化所述路径规划初始值以获得初始路径规划，对所述初始路径规划进行多目标优化以获得最优路径规划。S220: Determine the initial value of the path planning of the current vehicle through cost search and reinforcement learning model, smooth the initial value of the path planning to obtain the initial path planning, and perform multi-objective optimization on the initial path planning to obtain the optimal path planning.

具体的，首先在横向偏移-位移（SL）图中，对横向L进行均匀撒点，对纵向S非均匀撒点（这里采用近密远疏的方式），采样空间内相邻采样点之间的路径生成模型如下：Specifically, first, in the lateral offset-displacement (SL) diagram, the horizontal L is uniformly scattered, and the longitudinal S is non-uniformly scattered (here, the close and far distance method is used), and the adjacent sampling points in the sampling space are The path generation model between is as follows:

， ,

其中，表示横向距离；s表示纵向距离，/>表示横向距离的五次多项式系数：。in, represents the horizontal distance; s represents the vertical distance,/> Fifth degree polynomial coefficients representing lateral distance: .

各个采样节点的代价计算构建如下：The cost calculation of each sampling node is constructed as follows:

平滑代价：，Smoothing cost: ,

偏移代价：，Offset cost: ,

碰撞代价：，，Collision cost: , ,

其中，分别为纵向距离/>下对应的横向距离及其一阶、二阶和三阶导数，/>为障碍物在SL图中的坐标，/>分别为距离障碍物的代价最小和最大距离，/>距离之间的代价采用线性关系/>来表示，/>为斜率，为截距项。in, Respectively, the vertical distance/> The corresponding lateral distance and its first, second and third order derivatives below,/> is the coordinate of the obstacle in the SL diagram,/> are the minimum and maximum cost distances from obstacles, /> The cost between distances adopts a linear relationship/> to express,/> is the slope, is the intercept term.

根据上述公式递推计算各个节点的代价值，通过回溯得到离散空间内的代价最小的路径规划初始值I。The cost value of each node is calculated recursively according to the above formula, and the initial value I of the path planning with the smallest cost in the discrete space is obtained through backtracking.

再根据当前及过去基于路径的连续三帧状态空间信息（栅格地图），利用决策神经网络模型进行自身车辆在当前时刻所要执行动作的决策，获得自身车辆在当前时刻的决策动作值（也即路径规划初始值II）。Then based on the current and past path-based three consecutive frames of state space information (grid map), the decision-making neural network model is used to make decisions about the actions that the own vehicle will perform at the current moment, and the decision-making action value of the own vehicle at the current moment is obtained (i.e. Path planning initial value II).

这里决策神经网络模型通过采用SAC强化学习方法训练神经网络获得。决策动作值包括：多个路径点的横向连续坐标值。决策神经网络模型的训练过程包括以下步骤：Here, the decision-making neural network model is obtained by training the neural network using the SAC reinforcement learning method. Decision action values include: horizontal continuous coordinate values of multiple path points. The training process of the decision neural network model includes the following steps:

1. 构建状态空间：状态空间是以车辆为中心的基于路径的连续三帧的激光栅格地图，车辆和周围障碍物的位置被投影到栅格地图上。1. Construct a state space: The state space is a three-frame continuous path-based laser raster map centered on the vehicle. The positions of the vehicle and surrounding obstacles are projected onto the raster map.

2. 构建由多个路径点的横向连续坐标值组成的动作空间。路径点筛选的范围为车辆前方一定范围的矩形范围，其中离散化路径点纵向坐标值的选取，路径点纵向间保持固定间隔，路径点横向坐标值由决策神经网络模型输出。2. Construct an action space composed of horizontal continuous coordinate values of multiple path points. The scope of waypoint screening is a certain rectangular range in front of the vehicle, in which the longitudinal coordinate values of the waypoints are discretized, a fixed interval is maintained between the longitudinal direction points, and the transverse coordinate values of the waypoints are output by the decision-making neural network model.

3. 根据生成的粗略路径点，使用贝塞尔曲线拟合出更加平滑、曲率连续的导航路径。3. Based on the generated rough path points, use Bezier curves to fit a smoother navigation path with continuous curvature.

4. 设计奖励函数：整个算法奖励函数考虑4项设计，到达目标的奖励、碰撞发生的惩罚、输出路径的曲率以及输出路径间的平滑迁移。4. Design reward function: The entire algorithm reward function considers four designs, the reward for reaching the target, the penalty for collision, the curvature of the output path, and the smooth migration between output paths.

5. 设计多样复杂的仿真环境（包括静态场景和复杂场景），基于奖励函数设计，在仿真环境中采用SAC强化学习方法训练神经网络模型，获得决策神经网络模型。5. Design various and complex simulation environments (including static scenes and complex scenes), based on reward function design, use SAC reinforcement learning method to train neural network models in the simulation environment, and obtain decision-making neural network models.

6. 决策神经网络模型的整个训练过程采用课程学习的方式，训练环境难度从简单到复杂，加速决策神经网络模型的训练，提高决策神经网络的鲁棒性。6. The entire training process of the decision-making neural network model adopts course learning, and the difficulty of the training environment ranges from simple to complex to accelerate the training of the decision-making neural network model and improve the robustness of the decision-making neural network.

在仿真环境中利用SAC强化学习方法进行神经网络模型的训练，当对仿真环境中的自身车辆施加相应的路径动作后，控制模块保证车辆准确跟随策略路径，同时计算相应的奖励，进而利用奖励调整相应状态对应的策略输出。The SAC reinforcement learning method is used to train the neural network model in the simulation environment. When the corresponding path action is applied to the own vehicle in the simulation environment, the control module ensures that the vehicle accurately follows the policy path, and at the same time calculates the corresponding reward, and then uses the reward adjustment The policy output corresponding to the corresponding status.

最终对在离散空间搜索得到的路径规划初始值I和在连续空间强化学习输出的路径规划初始值II进行相同的代价计算，代价函数设计如下：Finally, the same cost calculation is performed on the path planning initial value I obtained by searching in discrete space and the path planning initial value II output by reinforcement learning in continuous space. The cost function is designed as follows:

波动代价：，Fluctuation cost: ,

曲率代价：，Curvature cost: ,

平滑代价、偏移代价、碰撞代价：与前述计算方式一致，Smoothing cost, offset cost, collision cost: consistent with the aforementioned calculation method,

其中，波动代价考虑过去连续三帧的路径，/>分别为当前帧路径和第i帧路径对应同时刻的横向位移，波动代价衡量当前路径较历史路径的波动程度。曲率代价计算认为采样点足够小且均匀，/>、/>、分别为路径点任意相邻三个连续点位，曲率代价衡量路径点之间的弯曲程度。Among them, the fluctuation cost is considered The path of three consecutive frames in the past,/> They are respectively the lateral displacements corresponding to the current frame path and the i-th frame path at the same time, and the fluctuation cost measures the degree of fluctuation of the current path compared with the historical path. The curvature cost calculation considers that the sampling points are small enough and uniform,/> ,/> , They are any three adjacent consecutive points of the path point, and the curvature cost measures the degree of curvature between the path points.

通过上述代价信息的评估，对比两条路径规划初始值I、II的总代价，选择代价较小的路径作为最终路径规划初始值。Through the evaluation of the above cost information, the total cost of the two path planning initial values I and II is compared, and the path with a smaller cost is selected as the final path planning initial value.

然后，为进一步优化路径规划初始值，同时降低后续连续空间优化的约束依赖，这里对路径规划初始值进行初级优化处理，优化的目标问题、损失函数构建以及边界约束与S210中一致，最终得到一条光滑稳定的初始路径规划。Then, in order to further optimize the initial value of path planning and reduce the constraint dependence of subsequent continuous space optimization, primary optimization processing is performed on the initial value of path planning. The optimization target problem, loss function construction and boundary constraints are consistent with those in S210, and finally a Smooth and stable initial path planning.

最终，根据横向偏移-位移（SL）图和带有决策信息的初始路径规划，可以将该二次型优化问题进行如下构建：Finally, based on the lateral offset-displacement (SL) diagram and the initial path planning with decision information, the quadratic optimization problem can be constructed as follows:

目标路径等式约束为：The target path equality constraint is:

， ,

其中，为/>位置对应的横向偏移及其一阶和二阶导，同样/>为/>位置对应的横向偏移及其一阶和二阶导，/>为相邻纵向位移增量。该等式约束为分段加加速连续，确保/>三阶导数为常量/>，从而满足车辆舒适度要求。in, for/> The corresponding lateral offset of the position and its first and second order derivatives, also/> for/> The lateral offset corresponding to the position and its first and second derivatives,/> is the adjacent longitudinal displacement increment. The constraint of this equation is that the piecewise acceleration is continuous, ensuring that/> The third derivative is a constant/> , thereby meeting vehicle comfort requirements.

凸空间边界不等式约束为：The boundary inequality constraint of convex space is:

， ,

其中，分别为/>位置对应的凸空间上下限制。in, respectively/> The upper and lower limits of the convex space corresponding to the position.

路径边界不等式约束为：The path boundary inequality constraints are:

， ,

其中，为起始点/>对应的横向偏移及其一阶和二阶导，分别为车辆的横向偏移最小速度、最小加速度、最大速度、最大加速度。in, as the starting point/> The corresponding lateral offset and its first and second derivatives, They are the minimum speed, minimum acceleration, maximum speed, and maximum acceleration of the vehicle's lateral offset, respectively.

优化的目标分为以下三个部分：The optimization goals are divided into the following three parts:

参考线一致性目标：Reference line consistency goals:

， ,

其中，为纵向距离/>下对应的横向距离，该目标衡量目标路径与参考线的偏离程度。in, is the vertical distance/> The corresponding lateral distance below measures the deviation of the target path from the reference line.

粗解一致性目标：Rough explanation of consistency goals:

， ,

其中，为路径粗解在纵向距离/>下对应的横向距离，该目标衡量目标路径与粗解的偏离程度。in, Rough solution for path in longitudinal distance/> The corresponding lateral distance below measures the deviation of the target path from the rough solution.

光滑连续性目标：Smooth continuity target:

， ,

其中，分别为/>对应横向偏移的一阶、二阶和三阶导的权重系数。该目标衡量目标路径的连续性。in, respectively/> The weight coefficients corresponding to the first, second and third derivatives of the lateral offset. This goal measures the continuity of the goal path.

利用二次规划求解器对该优化问题进行求解，该优化问题包括上述目标路径分段加加速连续等式约束、凸空间边界不等式约束、路径边界不等式约束，以参考线一致性、粗解一致性以及光滑连续性为优化目标，求解出目标空间内的最优路径。特别的，当优化求解异常时返回初始路径规划作为目标路径。Use a quadratic programming solver to solve this optimization problem, which includes the above-mentioned target path segmented acceleration continuity equation constraints, convex space boundary inequality constraints, and path boundary inequality constraints. Based on the reference line consistency and rough solution consistency And smooth continuity is the optimization goal, and the optimal path in the target space is solved. In particular, when the optimization solution is abnormal, the initial path plan is returned as the target path.

本公开通过代价搜索与强化学习相结合的方法，解决了细粒度空间解的选取问题，增强了路径规划初始值的准确性和安全性；通过基于准确路径规划初始值的多目标优化策略，解决了规划模块在不同驾驶场景频繁调参的问题，使得系统具有良好的异常场景处理能力。This disclosure solves the problem of selecting fine-grained space solutions through a method that combines cost search and reinforcement learning, and enhances the accuracy and security of the initial value of path planning; through a multi-objective optimization strategy based on accurate initial value of path planning, it solves This solves the problem of frequent parameter adjustment of the planning module in different driving scenarios, giving the system good ability to handle abnormal scenarios.

S230：确定与所述最优路径规划对应的最优速度曲线，联合所述最优路径规划和所述最优速度曲线得到目标轨迹，对多车道目标轨迹进行代价选择，获取最终的控制轨迹。S230: Determine the optimal speed curve corresponding to the optimal path planning, obtain the target trajectory by combining the optimal path planning and the optimal speed curve, perform cost selection on the multi-lane target trajectory, and obtain the final control trajectory.

具体的，首先对弗莱纳坐标系下的静、动态障碍物构建位移-时间（ST）图，在位移-时间（ST）图中，对时间T进行均匀撒点，对纵向位移S非均匀撒点（这里采用近密远疏的方式），考虑到空间内边界和障碍物，采用节点代价（包括速度、加速度、加加速以及碰撞代价）搜索的方式获得初始速度曲线。Specifically, first, a displacement-time (ST) diagram is constructed for the static and dynamic obstacles in the Fleener coordinate system. In the displacement-time (ST) diagram, the time T is uniformly dotted, and the longitudinal displacement S is non-uniformly Sprinkle points (the method of close and far is used here), taking into account the boundaries and obstacles in the space, and use the node cost (including speed, acceleration, acceleration and collision cost) search method to obtain the initial speed curve.

与S220类似，由初始速度曲线得到速度优化的凸空间，在凸空间中提取边界约束信息，这里的约束包括目标速度曲线分段加加速连续等式约束、凸空间边界不等式约束、变量边界不等式约束；优化目标包括巡航速度、加速度和加加速，由此利用二次规划优化求解出最终的最优速度曲线。Similar to S220, the convex space of speed optimization is obtained from the initial speed curve, and the boundary constraint information is extracted in the convex space. The constraints here include the target speed curve segmented acceleration continuity equation constraint, the convex space boundary inequality constraint, and the variable boundary inequality constraint. ; The optimization goals include cruising speed, acceleration and acceleration, and the final optimal speed curve is solved using quadratic programming optimization.

然后以最优速度曲线为基础，通过位移（S）对最优路径规划做插值，得到每个时刻（T_i）的速度和路径的信息以生成目标轨迹。考虑到自车的状态限制，包括速度、加速度、曲率等限制信息，验证目标轨迹是否在状态限制的范围内。Then based on the optimal speed curve, the optimal path planning is interpolated through the displacement (S) to obtain the speed and path information at each time (T _i ) to generate the target trajectory. Taking into account the state constraints of the self-vehicle, including speed, acceleration, curvature and other limit information, verify whether the target trajectory is within the range of the state limits.

最终，通过设计代价计算，对多车道目标轨迹进行选择，获取最终的控制轨迹。其中：Finally, through design cost calculation, the multi-lane target trajectory is selected to obtain the final control trajectory. in:

单车道代价计算构建如下：The single lane cost calculation is constructed as follows:

横向偏移代价：，Lateral offset cost: ,

横向舒适度代价：，Lateral comfort cost: ,

纵向舒适度代价：，Longitudinal comfort cost: ,

其中，分别为纵向位移对时间的一阶、二阶导，/>为规划周期，/>为车辆最大纵向加加速限制。in, are the first and second derivatives of longitudinal displacement with respect to time respectively,/> For the planning cycle,/> It is the maximum longitudinal acceleration limit of the vehicle.

通过并行计算各个车道目标轨迹的代价信息，选取代价最小的目标轨迹作为当前帧的最终输出轨迹。By calculating the cost information of each lane target trajectory in parallel, the target trajectory with the smallest cost is selected as the final output trajectory of the current frame.

图3是根据本公开的一些实施例所示的一种实时轨迹规划装置的示意图。如图3所示，所述实时轨迹规划装置300包括确定模块310、优化模块320、获取模块330。所述实时轨迹规划功能可以由图1所示系统中的规划模块执行。其中：Figure 3 is a schematic diagram of a real-time trajectory planning device according to some embodiments of the present disclosure. As shown in FIG. 3 , the real-time trajectory planning device 300 includes a determination module 310 , an optimization module 320 , and an acquisition module 330 . The real-time trajectory planning function can be executed by the planning module in the system shown in Figure 1. in:

确定模块310，用于获取车辆在当前时刻的基础导航点信息和位姿信息，基于所述基础导航点信息确定目标路径参考线，基于所述目标路径参考线将所述位姿信息由笛卡尔坐标系转化到弗莱纳坐标系下，获得横向偏移-位移图和栅格化地图，其中，所述位姿信息包括所述车辆自身状态、周围交通参与者和局部路网信息；The determination module 310 is used to obtain the basic navigation point information and pose information of the vehicle at the current moment, determine a target path reference line based on the basic navigation point information, and convert the pose information into a Cartesian model based on the target path reference line. The coordinate system is converted to the Fleener coordinate system to obtain a lateral offset-displacement map and a rasterized map, where the pose information includes the vehicle's own status, surrounding traffic participants and local road network information;

优化模块320，用于通过代价搜索和强化学习模型，确定当前车辆的路径规划初始值，平滑化所述路径规划初始值以获得初始路径规划，对所述初始路径规划进行多目标优化以获得最优路径规划；The optimization module 320 is used to determine the initial value of the current vehicle's path planning through cost search and reinforcement learning models, smooth the initial value of the path planning to obtain the initial path planning, and perform multi-objective optimization on the initial path planning to obtain the optimal path planning. Optimal path planning;

获取模块330，用于确定与所述最优路径规划对应的最优速度曲线，联合所述最优路径规划和所述最优速度曲线得到目标轨迹，对多车道目标轨迹进行代价选择，确定最终的控制轨迹。The acquisition module 330 is used to determine the optimal speed curve corresponding to the optimal path planning, obtain the target trajectory by combining the optimal path planning and the optimal speed curve, perform cost selection on the multi-lane target trajectory, and determine the final control trajectory.

本公开的一个实施例提供了一种实时轨迹规划设备。如图4所示，所述实时轨迹规划设备400包括存储器420和处理器410，所述存储器420，用于存储计算机程序；所述处理器410，用于当执行所述计算机程序时，实现图2中S210-S230所述的方法。One embodiment of the present disclosure provides a real-time trajectory planning device. As shown in Figure 4, the real-time trajectory planning device 400 includes a memory 420 and a processor 410. The memory 420 is used to store a computer program; the processor 410 is used to implement the figure when executing the computer program. Methods described in S210-S230 in 2.

所属领域的技术人员可以清楚地了解到，为描述的方便和简洁，上述描述的设备和模块的具体工作过程，可以参考前述装置实施例中的对应描述，在此不再赘述。Those skilled in the art can clearly understand that for the convenience and simplicity of description, the specific working processes of the above-described equipment and modules can be referred to the corresponding descriptions in the foregoing device embodiments, and will not be described again here.

尽管此处所述的主题是在结合操作系统和应用程序在计算机系统上的执行而执行的一般上下文中提供的，但本领域技术人员可以认识到，还可结合其他类型的程序模块来执行其他实现。一般而言，程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、组件、数据结构和其他类型的结构。本领域技术人员可以理解，此处所述的本主题可以使用其他计算机系统配置来实践，包括手持式设备、多处理器系统、基于微处理器或可编程消费电子产品、小型计算机、大型计算机等，也可使用在其中任务由通过通信网络连接的远程处理设备执行的分布式计算环境中。在分布式计算环境中，程序模块可位于本地和远程存储器存储设备的两者中。Although the subject matter described herein is provided in the general context of being performed in conjunction with an operating system and application programs executing on a computer system, those skilled in the art will recognize that other types of program modules may also be performed in conjunction with other types of programs. accomplish. Generally speaking, program modules include routines, programs, components, data structures, and other types of structures that perform specific tasks or implement specific abstract data types. Those skilled in the art will appreciate that the subject matter described herein may be practiced using other computer system configurations, including handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like , may also be used in distributed computing environments where tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

本领域普通技术人员可以意识到，结合本文中所公开的实施例描述的各示例的单元及方法步骤，能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本公开的范围。Those of ordinary skill in the art will appreciate that the units and method steps of each example described in conjunction with the embodiments disclosed herein can be implemented with electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered to be beyond the scope of this disclosure.

应当理解的是，本公开的上述具体实施方式仅仅用于示例性说明或解释本公开的原理，而不构成对本公开的限制。因此，在不偏离本公开的精神和范围的情况下所做的任何修改、等同替换、改进等，均应包含在本公开的保护范围之内。此外，本公开所附权利要求旨在涵盖落入所附权利要求范围和边界、或者这种范围和边界的等同形式内的全部变化和修改例。It should be understood that the above-described specific embodiments of the present disclosure are only used to illustrate or explain the principles of the present disclosure, and do not constitute a limitation of the present disclosure. Therefore, any modifications, equivalent substitutions, improvements, etc. made without departing from the spirit and scope of the present disclosure should be included in the protection scope of the present disclosure. Furthermore, the appended claims of the present disclosure are intended to cover all changes and modifications that fall within the scope and boundaries of the appended claims, or equivalents of such scopes and boundaries.

Claims

1. A method of real-time trajectory planning, comprising:

basic navigation point information and pose information of a vehicle at the current moment are obtained, a target path reference line is determined based on the basic navigation point information, the pose information is converted from a Cartesian coordinate system to a Flunar coordinate system based on the target path reference line, and a transverse offset-displacement map and a rasterization map are obtained, wherein the pose information comprises the state of the vehicle, surrounding traffic participants and local road network information;

determining a path planning initial value of a current vehicle through cost searching and reinforcement learning models, smoothing the path planning initial value to obtain an initial path plan, and performing multi-objective optimization on the initial path plan to obtain an optimal path plan;

determining an optimal speed curve corresponding to the optimal path planning, combining the optimal path planning and the optimal speed curve to obtain a target track, and carrying out cost selection on the multi-lane target track to obtain a final control track;

wherein, the determining the path planning initial value of the current vehicle through the cost searching and reinforcement learning model comprises:

sampling along the target path reference line, calculating the cost value of the preset cost of each sampling node, and forming a first path planning initial value corresponding to the preset cost by the sampling nodes with the minimum sum of cost values;

according to the three continuous frames of the grid map based on the target path reference line in the current and past, utilizing a decision neural network model to make a decision of the action to be executed by the vehicle at the current moment, and obtaining a decision action value of the vehicle at the current moment, wherein the decision action value forms a second path planning initial value;

and calculating the cost value of the preset cost for the second path planning initial value, comparing the cost value with the cost value of the first path planning initial value corresponding to the preset cost, and taking the smaller one of the second path planning initial value and the first path planning initial value as the path planning initial value of the current vehicle.

2. The method of claim 1, wherein the determining a target path reference line based on the base navigation point information comprises:

acquiring basic navigation points in a first preset distance in front and a second preset distance in rear in real time according to the positioning information of the current frame of the vehicle;

determining a loss function and a boundary constraint condition of the target path reference line, wherein the loss function comprises a smooth cost, a compact cost and a similar cost, the smooth cost is used for measuring the smoothness of the target path reference line, the compact cost is used for measuring the uniformity degree between target path reference points corresponding to the basic navigation points, and the similar cost is used for measuring the deviation degree of the target path reference line and an original navigation path;

and obtaining the basic navigation points meeting the boundary constraint condition based on a quadratic solver, wherein the path reference points corresponding to the basic navigation points form the target path reference line.

3. The method of claim 1, wherein the training method of the decision neural network model comprises:

constructing a state space from a path-based three-frame continuous grid map centered on a vehicle, wherein the positions of the vehicle and surrounding obstacles are projected onto the grid map;

constructing an action space formed by transverse continuous coordinate values of a plurality of path points sampled in a preset rectangular range in front of the vehicle, wherein the longitudinal intervals of the path points are kept at fixed intervals, and the transverse coordinate values are determined by a decision neural network model;

designing a reward function based on rewards to reach the target, penalties for collision occurrence, curvature of the output paths, and smooth migration between the output paths;

constructing a training set by taking the state space as input and the action space as output;

designing a simulation environment comprising a static scene and a complex scene, training the neural network model for the training set by adopting a SAC reinforcement learning method based on the reward function in the simulation environment, and obtaining the decision neural network model.

4. The method of claim 1, wherein said smoothing the path plan initial value to obtain an initial path plan comprises:

taking the loss function and the boundary constraint condition of the target path reference line as the loss function and the boundary constraint condition of the path planning initial value;

and optimizing the path planning initial value based on a quadratic solver to obtain the initial path planning.

5. The method of claim 1, wherein said multi-objective optimizing said initial path plan to obtain an optimal path plan comprises:

determining a boundary constraint condition of the path based on the transverse offset-displacement diagram and the path planning initial value, and performing multi-objective optimization on the initial path planning based on the boundary constraint condition and a preset optimization objective to obtain an optimal path planning in a target space, wherein the preset optimization objective comprises one or more of a deviation degree of a target path from a target path reference line, a deviation degree of the target path from the path planning initial value and continuity of the target path;

and when the optimal path planning in the target space solves for abnormality, taking the initial path planning as the optimal path planning.

6. The method of claim 1, wherein the determining an optimal speed profile corresponding to the optimal path plan comprises:

constructing a displacement-time diagram for static obstacles and dynamic obstacles of the vehicle under a flener coordinate system;

sampling the displacement-time diagram, calculating the cost value of preset cost of each sampling node, and forming an initial speed curve corresponding to the preset cost by the sampling nodes with the minimum sum of cost values, wherein the preset cost comprises speed cost, acceleration adding cost and collision cost;

obtaining a convex space of speed optimization from the initial speed curve, extracting boundary constraint conditions in the convex space, and performing multi-objective optimization on the initial speed curve based on the boundary constraint conditions and a preset optimization target to obtain an optimal speed curve, wherein the preset optimization target comprises one or more of cruising speed, acceleration and acceleration.

7. The method of claim 1, wherein the combining the optimal path plan and the optimal speed profile to obtain a target trajectory comprises:

and interpolating the optimal path planning through displacement based on the optimal speed curve to obtain speed and path information at each moment so as to generate a target track.

8. A real-time trajectory planning device, comprising:

the determining module is used for acquiring basic navigation point information and pose information of the vehicle at the current moment, determining a target path reference line based on the basic navigation point information, converting the pose information from a Cartesian coordinate system to a Flunar coordinate system based on the target path reference line, and acquiring a transverse offset-displacement map and a rasterized map, wherein the pose information comprises the state of the vehicle, surrounding traffic participants and local road network information;

the optimizing module is used for determining a path planning initial value of the current vehicle through cost searching and reinforcement learning models, smoothing the path planning initial value to obtain an initial path plan, and carrying out multi-objective optimization on the initial path plan to obtain an optimal path plan;

the acquisition module is used for determining an optimal speed curve corresponding to the optimal path planning, combining the optimal path planning and the optimal speed curve to obtain a target track, and carrying out cost selection on the multi-lane target track to acquire a final control track;

9. A real-time trajectory planning device comprising a memory and a processor:

the memory is used for storing a computer program;

the processor being adapted to implement the method according to any of claims 1-7 when executing the computer program.