CN115309160A - Planning method and planning device for robot path

Info

Publication number: CN115309160A
Application number: CN202211008398.4A
Authority: CN
Inventors: 和望利; 杜文莉; 钱锋
Original assignee: East China University of Science and Technology
Current assignee: East China University of Science and Technology
Priority date: 2022-08-22
Filing date: 2022-08-22
Publication date: 2022-11-08
Anticipated expiration: 2042-08-22
Also published as: CN115309160B

Abstract

The invention provides a robot path planning method, a device, and a storage medium. The robot path planning method comprises the following steps: acquiring first initial information of the robot at the current moment and second initial information of a pedestrian at the current moment; predicting a first state of the robot at the next moment and a second state of the pedestrian at the next moment according to the first initial information and the second initial information; determining action values of a plurality of actions of the robot at the current moment according to the first initial information, the second initial information, the first state, and the second state; and performing real-time local path planning according to the action value of each action. By executing these steps, the path planning method enables the robot to move in a manner that is more effective and conforms to human social norms, offers high environmental adaptability and a high obstacle avoidance success rate, and can achieve local obstacle avoidance planning in complex pedestrian environments.

Description

Robot path planning method and planning device

Technical Field

The present invention relates to the field of robot navigation, and in particular to a robot path planning method, a robot path planning device, and a corresponding computer-readable storage medium.

Background Art

In recent years, with the development of mobile robots and artificial intelligence technology, mobile robots have begun to move out of the laboratory and into public spaces to provide services for people. Public service scenarios are more complex, however, and pedestrian environments in particular pose new challenges for the motion planning algorithms of mobile robots.

Traditional local path planning methods for mobile robots use mathematical or physical models to describe the interaction between the robot and pedestrians, and then combine them with traditional search algorithms, such as genetic algorithms, to complete the path planning task. Such methods require different parameters to be set for different experimental scenarios, generalize poorly to unfamiliar scenarios, and perform unsatisfactorily. Although they can keep mobile robots running stably in structured environments such as factories, they still face many theoretical and engineering difficulties in complex pedestrian environments.

With the development of machine learning, data-driven methods have become a popular research direction for robot path planning in pedestrian environments. These methods give mobile robots a "learning ability" and greatly improve their adaptability to different scenarios, but they also suffer from problems such as low learning efficiency and difficulty in converging.

To overcome the above defects of the prior art, there is an urgent need in the field for a robot path planning technique that enables the robot to move in a more effective manner that conforms to human social norms, offers high environmental adaptability and a high obstacle avoidance success rate, and can achieve local obstacle avoidance planning in complex pedestrian environments.

Summary of the Invention

A brief summary of one or more aspects is given below to provide a basic understanding of these aspects. This summary is not an exhaustive overview of all contemplated aspects, and is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description given later.

To overcome the above defects of the prior art, the present invention provides a robot path planning method, a robot path planning device, and a corresponding computer-readable storage medium, which enable the robot to move in a more effective manner that conforms to human social norms, offer high environmental adaptability and a high obstacle avoidance success rate, and can achieve local obstacle avoidance planning in complex pedestrian environments.

Specifically, the robot path planning method provided according to the first aspect of the present invention includes the following steps: acquiring first initial information of the robot at the current moment and second initial information of a pedestrian at the current moment; predicting, according to the first initial information and the second initial information, a first state of the robot at the next moment and a second state of the pedestrian at the next moment; determining, according to the first initial information, the second initial information, the first state, and the second state, action values of a plurality of actions of the robot at the current moment; and performing real-time local path planning according to the action value of each action.

Further, in some embodiments of the present invention, the step of acquiring the first initial information of the robot at the current moment and the second initial information of the pedestrian at the current moment includes: setting an environment state space, a robot action space, and an environment reward function; acquiring the pedestrian position and pedestrian velocity at the current moment; determining pose information of the robot according to odometer data of the robot; and determining the first initial information and the second initial information according to the environment state space, the robot action space, the environment reward function, the pedestrian position, the pedestrian velocity, the pose information, fixed target point information, and maximum linear velocity information.

Further, in some embodiments of the present invention, the step of setting the environment state space includes: denoting the state of the robot at time t by s_t, and denoting the observable state of the i-th pedestrian at time t by a corresponding per-pedestrian state vector; and, in two-dimensional space, modeling the robot and each pedestrian as a circle of radius r. The state information of the robot, s_t, consists of the robot's current pose, its velocity, its own radius r, the target position [g_x, g_y], and the maximum linear velocity v_pref. The state information of each pedestrian consists of the pedestrian's coordinate position, velocity, and radius r. The state information of the n pedestrians at time t is collected into W_t, and the joint state of the environment is denoted J_t.
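By way of a non-limiting illustration, the state space described above could be held in simple data containers such as the following. This is a minimal sketch: the class names, field names, and field ordering are assumptions made for illustration, and only the listed components (pose, velocity, radius, target position, maximum linear velocity) come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RobotState:
    """Robot state s_t: pose, velocity, radius, goal, and preferred speed."""
    px: float      # position x (hypothetical field name)
    py: float      # position y
    theta: float   # heading
    vx: float      # velocity x
    vy: float      # velocity y
    radius: float  # r
    gx: float      # target position g_x
    gy: float      # target position g_y
    v_pref: float  # maximum linear velocity

    def to_vector(self) -> List[float]:
        return [self.px, self.py, self.theta, self.vx, self.vy,
                self.radius, self.gx, self.gy, self.v_pref]


@dataclass
class PedestrianState:
    """Observable state of one pedestrian: position, velocity, radius."""
    px: float
    py: float
    vx: float
    vy: float
    radius: float

    def to_vector(self) -> List[float]:
        return [self.px, self.py, self.vx, self.vy, self.radius]


@dataclass
class JointState:
    """Joint environment state J_t: the robot state plus all n pedestrians."""
    robot: RobotState
    pedestrians: List[PedestrianState]

    def pedestrian_matrix(self) -> List[List[float]]:
        # One row per pedestrian, playing the role of W_t.
        return [p.to_vector() for p in self.pedestrians]
```

Keeping the robot and pedestrian vectors separate reflects the fact that they have different lengths, which is why the later steps first map both to a common dimension.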

Further, in some embodiments of the present invention, the step of setting the robot action space includes: in the action space A, setting the linear velocity to v = {0.2, 0.4, 0.6, 0.8, 1.0} and the angular velocity to ω = {-π/4, -π/6, -π/12, 0, π/12, π/6, π/4}, and expressing the action space as A = [{0, 0}, {v, ω}], where the action space A includes a plurality of discrete actions.

Further, in some embodiments of the present invention, the step of setting the environment reward function includes: defining the reward function R^t in terms of three components, namely a reward for approaching the target, a penalty for colliding with a pedestrian, and a penalty for violating social norms.

The reward for approaching the target guides the robot to reach the target position quickly, where r_g = 0.25 is the reward for reaching the target position, P^t is the position of the robot at time t, and g is the target position.

The collision penalty ensures the safety of the motion, where r_c = -0.25 is the penalty applied when the robot hits a pedestrian, r_robot is the radius of the robot, r_i is the radius of the i-th pedestrian, and the penalty is evaluated from the position P^t of the robot and the position of the i-th pedestrian at time t.

The social-norm penalty ensures that the robot's motion satisfies social requirements and keeps the robot from causing discomfort by moving too close to pedestrians; it depends on the distance between the robot and the i-th pedestrian.
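By way of a non-limiting illustration, a three-part reward of this kind might be sketched as follows. Only the constants r_g = 0.25 and r_c = -0.25 come from the text; the piecewise forms, the summation of the three terms, the goal tolerance, the discomfort distance, and the scaling factor are all assumptions chosen for illustration and are not the formulas of the invention.

```python
import math
from typing import Sequence, Tuple

R_GOAL = 0.25        # r_g, from the text
R_COLLISION = -0.25  # r_c, from the text

Point = Tuple[float, float]


def reward(robot_pos: Point,
           goal: Point,
           robot_radius: float,
           pedestrians: Sequence[Tuple[Point, float]],
           goal_tol: float = 0.1,          # assumed tolerance for "reached"
           discomfort_dist: float = 0.2,   # assumed comfort margin
           discomfort_factor: float = 0.5  # assumed penalty slope
           ) -> float:
    """Sum of goal reward, collision penalty, and social-norm penalty."""
    # Goal term: granted once the robot is within the assumed tolerance.
    goal_term = R_GOAL if math.dist(robot_pos, goal) < goal_tol else 0.0

    collision_term = 0.0
    social_term = 0.0
    for ped_pos, ped_radius in pedestrians:
        # Free space between the two circles; negative means overlap.
        gap = math.dist(robot_pos, ped_pos) - robot_radius - ped_radius
        if gap < 0.0:
            collision_term = R_COLLISION
        elif gap < discomfort_dist:
            # Penalty grows linearly as the robot intrudes on the margin;
            # the worst (most negative) intrusion is kept.
            social_term = min(social_term,
                              (gap - discomfort_dist) * discomfort_factor)
    return goal_term + collision_term + social_term
```

For example, with a robot and a pedestrian of radius 0.3 whose centers are 0.7 apart, the gap is about 0.1 and the sketch returns a small negative reward of about -0.05.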

Further, in some embodiments of the present invention, the step of acquiring the pedestrian position and pedestrian velocity at the current moment includes: detecting pedestrians' legs from multiple frames of lidar data; and matching the detected leg information to the corresponding pedestrians and tracking those pedestrians.

Further, in some embodiments of the present invention, the step of predicting, according to the first initial information and the second initial information, the first state of the robot at the next moment and the second state of the pedestrian at the next moment includes: constructing a pedestrian state prediction model; taking the robot state information s_t and the pedestrian state information W_t as input, and using two multilayer perceptron models f_r and f_h to bring s_t and W_t to the same dimension:

s′_t = f_r(s_t; W_r)

W′_t = f_h(W_t; W_h)

where W_r and W_h are trainable weight matrices; and constructing a graph attention network on the resulting feature matrix to predict the first state of the robot at the next moment and the second state of the pedestrian at the next moment.

Further, in some embodiments of the present invention, the step of constructing the graph attention network includes: calculating the attention matrix under each attention head in the first-layer graph attention network, using a trainable weight matrix for each head, where k₁ is the number of attention heads in the first-layer graph attention network; and combining the multi-head attention mechanism to extract interaction features between the robot and pedestrians and between pedestrians, so as to calculate the output of the first-layer graph attention network, where || denotes feature concatenation, σ denotes the activation function, set to ELU, and W₁ is a trainable weight matrix.

The step further includes: calculating the attention matrix under each attention head in the second-layer graph attention network, using a trainable weight matrix for each head, where k₂ is the number of attention heads in the second-layer graph attention network; and combining the multi-head attention mechanism to extract interaction features between the mobile robot and pedestrians and between pedestrians, so as to calculate the output of the second-layer graph attention network, where || denotes feature concatenation, σ denotes the activation function, set to ELU, and W₂ is a trainable weight matrix.
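By way of a non-limiting illustration, one multi-head graph attention layer with ELU activation and feature concatenation can be sketched in plain Python as follows. The sketch uses the widely known graph attention formulation (a LeakyReLU score over concatenated projected features, normalized by softmax) on a fully connected graph of the robot and the pedestrians; this formulation, the function names, and the per-head attention vector `a` are assumptions and may differ from the attention formula of the invention.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]


def elu(x: float, alpha: float = 1.0) -> float:
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def gat_head(H: Matrix, W: Matrix, a: Sequence[float]) -> Tuple[Matrix, Matrix]:
    """One attention head: returns (attention matrix, aggregated features)."""
    Z = matmul(H, W)  # project every node: (N, F')
    fp = len(Z[0])
    # a^T [z_i || z_j] splits into a source part and a destination part.
    src = [sum(z * w for z, w in zip(row, a[:fp])) for row in Z]
    dst = [sum(z * w for z, w in zip(row, a[fp:])) for row in Z]
    att = [softmax([leaky_relu(s + d) for d in dst]) for s in src]
    return att, matmul(att, Z)


def gat_layer(H: Matrix, heads: Sequence[Tuple[Matrix, Sequence[float]]]) -> Matrix:
    """Concatenate (||) the outputs of all heads, then apply sigma = ELU."""
    outputs = [gat_head(H, W, a)[1] for W, a in heads]
    return [[elu(v) for head_out in outputs for v in head_out[i]]
            for i in range(len(H))]
```

Stacking two such layers, with k₁ heads in the first and k₂ heads in the second, gives the two-layer structure described above; each row of the attention matrix sums to one, so every node's new feature is a weighted average over all agents in the scene.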

Further, in some embodiments of the present invention, the step of predicting the first state of the robot at the next moment and the second state of the pedestrian at the next moment includes: predicting the state of the pedestrians at the next moment through a multilayer perceptron model f_predict, where W is a trainable weight matrix; and, based on the state information of the mobile robot and the action space A, calculating the next-moment state information s_{t+1} corresponding to each action a_t = [v^t, ω^t] in the action space, with the heading updated as:

θ^{t+1} = θ^t + ω^t
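By way of a non-limiting illustration, the next robot state for each candidate action could be computed with a simple unicycle-style update such as the following. The heading update is read here as θ^{t+1} = θ^t + ω^t; the position update along the new heading and the default time step of 0.25 are assumptions, since the text does not reproduce those formulas.

```python
import math
from typing import List, Sequence, Tuple


def step_robot(px: float, py: float, theta: float,
               v: float, omega: float,
               dt: float = 0.25) -> Tuple[float, float, float]:
    """Return (px, py, theta) after applying action (v, omega) for one step."""
    theta_next = theta + omega  # heading update, as read from the text
    # Assumption: the robot moves along its new heading for dt seconds.
    px_next = px + v * math.cos(theta_next) * dt
    py_next = py + v * math.sin(theta_next) * dt
    return px_next, py_next, theta_next


def rollout(px: float, py: float, theta: float,
            actions: Sequence[Tuple[float, float]],
            dt: float = 0.25) -> List[Tuple[float, float, float]]:
    """Candidate next states, one per action in the action space."""
    return [step_robot(px, py, theta, v, w, dt) for v, w in actions]
```

Each candidate next state can then be paired with the predicted pedestrian states to form the next joint state that is scored by the value network.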

Further, in some embodiments of the present invention, as shown in FIG. 3, the step of determining, according to the first initial information, the second initial information (31), the first state, and the second state, the action values of the plurality of actions of the robot at the current moment includes: based on the D3QN reinforcement learning method, inputting the first initial information, the second initial information, the first state, and the second state into a value network model to calculate the action values Q(J_t, a_t; ω) of the plurality of actions in the robot's action space at the current moment.

Further, in some embodiments of the present invention, the step of calculating the action values Q(J_t, a_t; ω) of the plurality of actions in the robot's action space at the current moment includes: constructing a value network model using the D3QN reinforcement learning framework; and taking the robot state information s_t and the pedestrian state information W_t (31) as input, and using two multilayer perceptron models f′_r and f′_h (32) to bring s_t and W_t to the same dimension:

s′_t = f′_r(s_t; W′_r)

W′_t = f′_h(W_t; W′_h)

where W′_r and W′_h are trainable weight matrices.

With the resulting feature matrix as input, the attention matrix under each attention head in the first-layer graph attention network (33) is calculated using a trainable weight matrix for each head, where k₁ is the number of attention heads in the first-layer graph attention network (33). The multi-head attention mechanism is combined to extract interaction features between the robot and pedestrians and between pedestrians, so as to calculate the output of the first-layer graph attention network (33), where || denotes feature concatenation, σ denotes the activation function, set to ELU, and W′₁ is a trainable weight matrix.

The attention matrix under each attention head in the second-layer graph attention network (34) is then calculated using a trainable weight matrix for each head, where k₂ is the number of attention heads in the second-layer graph attention network (34). The multi-head attention mechanism is again combined to extract interaction features between the robot and pedestrians and between pedestrians, so as to calculate the output of the second-layer graph attention network (34), where || denotes feature concatenation, σ denotes the activation function, set to ELU, and W′₂ is a trainable weight matrix. The resulting features are denoted H.

Based on the principle of the D3QN reinforcement learning algorithm, H is input into the subsequent value network model to calculate the action value Q(J_t, a_t; ω) of each action a_t. Here V and A denote two multilayer perceptron models (35), and Noisy Net is introduced to add Gaussian noise to the fully connected layers; the two multilayer perceptron models take H as input and output the state value and the advantage function, respectively.

The robot state information s_{t+1} and the pedestrian state information W_{t+1} at the next moment are then input into the value network model to calculate the action value Q(J_{t+1}, a_{t+1}; ω) of each action a_{t+1} at the next moment.

Based on Q(J_t, a_t; ω) and Q(J_{t+1}, a_{t+1}; ω), the value Q(J_t, a_t; ω) of each action a_t in the action space is recalculated for the current environment state J_t, where γ is the discount factor and Δt is the time interval between two consecutive decisions of the robot.
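By way of a non-limiting illustration, the two ingredients usually meant by D3QN (a dueling value head and a double-DQN style target) can be sketched as follows. The dueling aggregation Q = V + A - mean(A) is the standard form; the use of a separate target network, the terminal-state handling, and the discount exponent γ^Δt are assumptions, since the text does not reproduce its own formulas.

```python
from typing import List, Sequence


def dueling_q(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine the state value V and per-action advantages A into Q-values."""
    mean_adv = sum(advantages) / len(advantages)
    # Subtracting the mean advantage keeps V and A identifiable.
    return [state_value + adv - mean_adv for adv in advantages]


def double_dqn_target(reward: float,
                      q_next_online: Sequence[float],
                      q_next_target: Sequence[float],
                      gamma: float = 0.9,
                      dt: float = 0.25,
                      terminal: bool = False) -> float:
    """Bootstrapped value for the action taken in the current state J_t."""
    if terminal:
        return reward
    # Double DQN: the online network picks the next action,
    # the (assumed) target network evaluates it.
    best = max(range(len(q_next_online)), key=lambda i: q_next_online[i])
    # Assumption: discount scaled by the decision interval dt.
    return reward + (gamma ** dt) * q_next_target[best]
```

Decoupling action selection from action evaluation in this way is the usual remedy for the overestimation bias of plain Q-learning, which is one reason a dueling double architecture is a common choice for discrete action spaces like the one used here.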

Further, in some embodiments of the present invention, the step of performing real-time local path planning according to the action value of each action includes: according to each calculated action value Q(J_t, a_t; ω), selecting the action with the largest action value in the current state J_t as the optimal policy output, so as to achieve real-time local path planning for the robot.

Further, in some embodiments of the present invention, the step of performing real-time local path planning according to the action value of each action further includes: acquiring the coordinate information of the robot at the current moment to calculate the distance d_g between the robot and the target point; determining whether the distance d_g is less than a preset threshold; in response to a determination that d_g is greater than or equal to the preset threshold, further determining the action with the largest action value in the next state J_{t+1} as the optimal policy output; and in response to a determination that d_g is less than the preset threshold, stopping the planning.
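By way of a non-limiting illustration, the greedy selection and the stopping test can be sketched as a single planning step. The function names and the default threshold of 0.1 are assumptions for illustration; the text only specifies that planning stops once d_g falls below a preset threshold.

```python
import math
from typing import Optional, Sequence, Tuple

Action = Tuple[float, float]  # (linear velocity v, angular velocity omega)


def select_action(q_values: Sequence[float],
                  actions: Sequence[Action]) -> Action:
    """Pick the action with the largest action value."""
    best = max(range(len(actions)), key=lambda i: q_values[i])
    return actions[best]


def plan_step(robot_xy: Tuple[float, float],
              goal_xy: Tuple[float, float],
              q_values: Sequence[float],
              actions: Sequence[Action],
              goal_threshold: float = 0.1) -> Optional[Action]:
    """Return the next action, or None when the goal has been reached."""
    d_g = math.dist(robot_xy, goal_xy)  # distance to the target point
    if d_g < goal_threshold:
        return None  # stop planning
    return select_action(q_values, actions)
```

A caller would invoke `plan_step` once per decision interval with freshly computed action values and stop issuing commands when it returns `None`.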

In addition, the robot path planning device provided according to the second aspect of the present invention includes a memory and a processor. The processor is connected to the memory and is configured to implement the robot path planning method provided by the first aspect of the present invention.

In addition, the computer-readable storage medium provided according to the third aspect of the present invention has computer instructions stored thereon. When executed by a processor, the computer instructions implement the robot path planning method provided by the first aspect of the present invention.

Brief Description of the Drawings

The above features and advantages of the present invention can be better understood after reading the detailed description of the embodiments of the present disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components with similar related properties or features may have the same or similar reference numerals.

FIG. 1 shows a flowchart of a robot path planning method provided according to some embodiments of the present invention.

FIG. 2 shows a diagram of the pedestrian state prediction model in the robot path planning method provided according to some embodiments of the present invention.

FIG. 3 shows a diagram of the value network model in the robot path planning method provided according to some embodiments of the present invention.

FIG. 4 shows a schematic diagram of experimental results for the robot path planning method provided according to some embodiments of the present invention.

Detailed Description

The implementation of the present invention is described below by way of specific embodiments, and those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. Although the invention is described in conjunction with preferred embodiments, this does not mean that the features of the invention are limited to those embodiments. On the contrary, the purpose of describing the invention in conjunction with embodiments is to cover other alternatives or modifications that may be derived from the claims of the present invention. The following description contains numerous specific details in order to provide a thorough understanding of the present invention; the invention may also be practiced without these details. In addition, some specific details are omitted from the description to avoid confusing or obscuring the gist of the present invention.

In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "mounted", "coupled", and "connected" should be understood in a broad sense. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediary, or an internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific situation.

In addition, the terms "up", "down", "left", "right", "top", "bottom", "horizontal", and "vertical" used in the following description should be understood as referring to the orientations shown in the relevant paragraph and the associated drawings. These relative terms are used only for convenience of description and do not mean that the device described must be manufactured or operated in a specific orientation; they should therefore not be construed as limiting the present invention.

It will be understood that, although the terms "first", "second", "third", and so on may be used herein to describe various components, regions, layers, and/or sections, these components, regions, layers, and/or sections should not be limited by these terms, which are used only to distinguish one component, region, layer, and/or section from another. Thus, a first component, region, layer, and/or section discussed below could be termed a second component, region, layer, and/or section without departing from some embodiments of the present invention.

As described above, in recent years, with the development of robotics and artificial intelligence technology, robots have begun to move out of the laboratory and into public spaces to provide services for people. Public service scenarios are more complex, however, and pedestrian environments in particular pose new challenges for the motion planning algorithms of mobile robots. Traditional local path planning methods for mobile robots use mathematical or physical models to describe the interaction between the robot and pedestrians, and then combine them with traditional search algorithms, such as genetic algorithms, to complete the path planning task. Such methods require different parameters to be set for different experimental scenarios, generalize poorly to unfamiliar scenarios, and perform unsatisfactorily. Although they can keep mobile robots running stably in structured environments such as factories, they still face many theoretical and engineering difficulties in complex pedestrian environments. With the development of machine learning, data-driven methods have become a popular research direction for robot path planning in pedestrian environments. These methods give mobile robots a "learning ability" and greatly improve their adaptability to different scenarios, but they also suffer from problems such as low learning efficiency and difficulty in converging.

To overcome the above defects of the prior art, the present invention provides a robot path planning method, a robot path planning device, and a corresponding computer-readable storage medium, which enable the robot to move in a more effective manner that conforms to human social norms, offer high environmental adaptability and a high obstacle avoidance success rate, and can achieve local obstacle avoidance planning in complex pedestrian environments.

In some non-limiting embodiments, the robot path planning method provided by the first aspect of the present invention may be implemented by the robot path planning device provided by the second aspect of the present invention. Specifically, the planning device is provided with a memory and a processor. The memory includes, but is not limited to, the computer-readable storage medium provided by the third aspect of the present invention, on which computer instructions are stored. The processor is connected to the memory and is configured to execute the computer instructions stored in the memory so as to implement the robot path planning method provided by the first aspect of the present invention.

The working principle of the path planning device is described below in conjunction with some embodiments of the path planning method. Those skilled in the art will understand that these embodiments of the path planning method are merely non-limiting implementations provided by the present invention; they are intended to illustrate the main concept of the invention clearly and to provide some specific solutions that the public can readily implement, not to limit all the functions or all the modes of operation of the path planning device. Likewise, the path planning device is merely one non-limiting implementation provided by the present invention and does not limit the entity that executes each step of these path planning methods.

Please refer to FIG. 1, which shows a schematic flowchart of a robot path planning method provided according to some embodiments of the present invention.

As shown in step S1 of FIG. 1, in the process of planning the robot's path, the planning device may first set the environment state space, the action space of the mobile robot, and the environment reward function.

Specifically, in some embodiments of the present invention, the environment state space may be defined as follows. The state of the mobile robot at time t is denoted by s_t, and each pedestrian i has an observable state at time t. In two-dimensional space (the X-Y plane), the robot and each pedestrian are modeled as a circle of radius r. The state information of the mobile robot consists of its current pose, its velocity, its own radius r, the target position [g_x, g_y], and the maximum linear velocity v_pref. The state information of each pedestrian consists of the pedestrian's coordinate position, velocity, and radius. The state information of the n pedestrians at time t is denoted W_t, and the joint state of the environment is denoted J_t.

Optionally, in other embodiments, the environment state space may instead be defined as follows. The state of the mobile robot at time t is denoted by s_t, and each pedestrian i has an observable state at time t. In two-dimensional space (the X-Y plane), the robot and each pedestrian are modeled as a rectangle of length a and width b. The state information of the mobile robot consists of its current pose, its velocity, its own length a and width b, the target position [g_x, g_y], and the maximum linear velocity v_pref. The state information of each pedestrian and the joint state of the environment are defined correspondingly.

In addition, in some embodiments of the present invention, the action space A of the mobile robot can be defined as follows. The action space A is composed of the linear velocity v and the angular velocity ω of the robot. Considering the dynamic constraints of the robot and the safety requirements of a human-robot coexistence environment, the linear velocity is set to v = {0.2, 0.4, 0.6, 0.8, 1.0} and the angular velocity to ω = {-π/4, -π/6, -π/12, 0, π/12, π/6, π/4}. The action space A is expressed as A = [{0, 0}, {v, ω}] and consists of 36 discrete actions, namely the 5 × 7 = 35 combinations of v and ω plus the stop action {0, 0}.

Optionally, in other embodiments, an action space B of the mobile robot can be defined as follows. The action space B is composed of the linear velocity v and the angular velocity ω of the robot. Considering the dynamic constraints of the robot and the safety requirements of a human-robot coexistence environment, the linear velocity is set to v = {0.3, 0.6, 0.9, 1.2, 1.5} and the angular velocity to ω = {-π/6, -π/12, 0, π/12, π/6}. The action space B is expressed as B = [{0, 0}, {v, ω}] and consists of 26 discrete actions, namely the 5 × 5 = 25 combinations of v and ω plus the stop action {0, 0}.
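By way of non-limiting illustration, the following Python sketch enumerates the two discrete action spaces described above and confirms the stated action counts. The function name `build_action_space` and the tuple representation (v, ω) are assumptions made for this example only.

```python
import math
from itertools import product


def build_action_space(linear_speeds, angular_speeds):
    """Return the stop action (0, 0) followed by every (v, omega) pair."""
    actions = [(0.0, 0.0)]
    actions.extend(product(linear_speeds, angular_speeds))
    return actions


# Velocity sets quoted from the two embodiments.
ACTION_SPACE_A = build_action_space(
    [0.2, 0.4, 0.6, 0.8, 1.0],
    [-math.pi / 4, -math.pi / 6, -math.pi / 12, 0.0,
     math.pi / 12, math.pi / 6, math.pi / 4],
)
ACTION_SPACE_B = build_action_space(
    [0.3, 0.6, 0.9, 1.2, 1.5],
    [-math.pi / 6, -math.pi / 12, 0.0, math.pi / 12, math.pi / 6],
)

print(len(ACTION_SPACE_A), len(ACTION_SPACE_B))  # prints "36 26"
```

Keeping the stop action at index 0 is a design choice of this sketch; any fixed ordering works as long as the value network outputs follow the same ordering.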

In addition, in some embodiments of the present invention, the environment reward function of the robot can be defined as follows. The reward function R^t consists of three parts: a reward for approaching the target, a penalty for colliding with a pedestrian, and a penalty for violating social norms.

The reward for approaching the target is used to guide the robot so that it quickly and eventually reaches the target position.

Here, r_g is the reward for reaching the target position, P^t is the position of the robot at time t, and g is the target position. Depending on the target environment and the convergence speed of the algorithm, the present invention can adjust the value of r_g to achieve either fast convergence or precise attainment of the target value. For example, the present invention can set r_g = 0.4 to achieve fast convergence so that the robot quickly reaches the final target position; as another example, it can set r_g = 0.1 to achieve precise obstacle avoidance; as a further example, it can set r_g = 0.25 so that the robot both avoids obstacles relatively precisely and reaches the target area relatively quickly.

To ensure safe motion, the robot receives a penalty when it collides with a pedestrian.

Here, r_c is the penalty applied when the robot collides with a pedestrian. Likewise, the present invention can adjust the value of r_c to achieve either fast convergence or precise attainment of the target value.

Optionally, the present invention can set r_g = 0.4 to achieve precise obstacle avoidance by the robot.

Optionally, the present invention can set r_g = 0.1 to achieve fast convergence so that the robot can quickly reach the target position.

Preferably, the present invention can set r_c = 0.25 so that the robot both avoids obstacles relatively precisely and reaches the target area relatively quickly, where r_robot is the radius of the robot, r_i is the radius of the i-th pedestrian, P^t is the position of the robot at time t, and the pedestrian position term is the position of the i-th pedestrian at time t.

Finally, to satisfy the social requirements on the robot's motion, the present invention can add a social-distance penalty based on the social dilemma, so that the robot does not come so close to pedestrians during its motion that it causes discomfort. For example, the penalty function can be defined in terms of the distance between the mobile robot and the i-th pedestrian. The present invention can assume that when the mobile robot is within 0.5 m of a pedestrian, it changes the way that pedestrian moves through the crowd and thereby affects other pedestrians, and that when the mobile robot is within 0.2 m of a pedestrian, it directly causes the pedestrian discomfort.
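By way of non-limiting illustration, the three reward components can be sketched in Python as follows. The formulas of the original figures are not reproduced in this text, so the shape of each term is an assumption of this sketch: the goal tolerance `goal_tol`, the social penalty magnitude `r_s`, the linear ramp between 0.2 m and 0.5 m, and the use of surface-to-surface distance are all hypothetical. Only r_g, r_c, the radii, and the 0.2 m and 0.5 m distances come from the description.

```python
import math


def reward(robot_pos, goal, pedestrians, r_robot=0.3, r_g=0.25, r_c=0.25,
           goal_tol=0.1, d_comfort=0.2, d_social=0.5, r_s=0.1):
    """Sum of goal reward, collision penalty and social-distance penalty.

    pedestrians is an iterable of ((x, y), radius) pairs.
    """
    total = 0.0
    # Goal term: reward r_g once the robot is within goal_tol of the goal.
    if math.dist(robot_pos, goal) < goal_tol:
        total += r_g
    for ped_pos, r_i in pedestrians:
        centre_dist = math.dist(robot_pos, ped_pos)
        # Collision term: the two circles overlap.
        if centre_dist < r_robot + r_i:
            total -= r_c
            continue
        # Social term: assumed to act on the gap between the two circles.
        gap = centre_dist - r_robot - r_i
        if gap < d_comfort:
            total -= r_s
        elif gap < d_social:
            total -= r_s * (d_social - gap) / (d_social - d_comfort)
    return total


print(reward((0.0, 0.0), (5.0, 0.0), [((0.4, 0.0), 0.3)]))  # prints -0.25
```

The ramp makes the social penalty grow continuously as the robot moves from the 0.5 m boundary toward the 0.2 m boundary, which is one simple way to encode the two distance bands.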

Optionally, the present invention may set the penalty function as an equality-constrained quadratic exterior-point penalty function or as an interior-point penalty function.

Further, in some embodiments of the present invention, as shown in step S2 of FIG. 1, after the environment state space, the action space of the robot, and the environment reward function have been set, the present invention can obtain the pedestrian state information and the robot state information at the current moment through a lidar and an odometer.

Specifically, the present invention can use a leg detection (leg detector) algorithm to detect pedestrians' legs from multiple frames of lidar data and match the detected leg information to the corresponding pedestrians, and can then use a pedestrian tracking (people_tracker) algorithm to track the pedestrians and obtain the velocity and coordinates of each pedestrian. Afterwards, the present invention can set the pedestrian radius r to 0.3 m to obtain the state information of each pedestrian at time t, and finally obtain the state information W_t of the n pedestrians at time t.

Further, in some embodiments of the present invention, the present invention can obtain the pose information of the mobile robot at the current moment based on the adaptive Monte Carlo algorithm, and obtain the velocity of the mobile robot at the current moment. Here, the present invention can set the radius r of the mobile robot to 0.3 m and the maximum linear velocity v_pref to 1 m/s, with the target position [g_x, g_y] given initially, to obtain the state information s_t of the mobile robot at time t.
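By way of non-limiting illustration, the robot state and pedestrian states described above can be collected in simple Python containers. The field order, the representation of velocity as (vx, vy), and the names `RobotState`, `PedestrianState`, and `joint_state` are assumptions of this sketch; the default radius of 0.3 m and the maximum linear velocity of 1 m/s are the values quoted in the description.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class RobotState:
    px: float
    py: float
    theta: float
    vx: float
    vy: float
    gx: float
    gy: float
    radius: float = 0.3   # m, value quoted in the description
    v_pref: float = 1.0   # m/s, value quoted in the description


@dataclass(frozen=True)
class PedestrianState:
    px: float
    py: float
    vx: float
    vy: float
    radius: float = 0.3   # m, value quoted in the description


def joint_state(robot, pedestrians):
    """Return (s_t, W_t) as a tuple and a list of tuples of floats."""
    return astuple(robot), [astuple(p) for p in pedestrians]


s_t, W_t = joint_state(
    RobotState(px=0.0, py=0.0, theta=0.0, vx=0.0, vy=0.0, gx=4.0, gy=0.0),
    [PedestrianState(1.0, 1.0, -0.5, 0.0), PedestrianState(2.0, -1.0, 0.0, 0.4)],
)
print(len(s_t), len(W_t))  # prints "9 2"
```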

Please refer further to FIG. 1 and FIG. 2. FIG. 2 shows a diagram of the pedestrian state prediction model of the present invention. As shown in step S3 of FIG. 1 and in FIG. 2, the present invention can predict the first state of the robot at the next moment and the second state of the pedestrian at the next moment according to the first initial information and the second initial information.

Specifically, in the process of predicting the first state of the robot at the next moment and the second state of the pedestrian at the next moment, the present invention may first construct a pedestrian state prediction model. Afterwards, as shown in module 21 of FIG. 2, the present invention can take the robot state information s_t and the pedestrian state information W_t as input, and make the dimensions of s_t and W_t consistent by building a model.

Preferably, the present invention can make the dimensions of the robot state information s_t and the pedestrian state information W_t consistent through two multi-layer perceptron models 22, f_r and f_h:

s′_t = f_r(s_t; W_r)

W′_t = f_h(W_t; W_h)

where W_r and W_h are trainable weight matrices.
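By way of non-limiting illustration, the following NumPy sketch shows two small multi-layer perceptrons that map a robot state and a set of pedestrian states of different lengths into a common feature dimension. The input sizes (9 and 5), the hidden size (64), the common output size (32), the ReLU activation, and the random initial weights are all assumptions of this example and are not taken from the description.

```python
import numpy as np


def init_mlp(sizes, rng):
    """Create (weight, bias) pairs for consecutive layer sizes."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(x, params):
    """Apply the layers in order, with ReLU on every layer except the last."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


rng = np.random.default_rng(0)
f_r = init_mlp([9, 64, 32], rng)    # robot branch, plays the role of W_r
f_h = init_mlp([5, 64, 32], rng)    # pedestrian branch, plays the role of W_h

s_t = rng.standard_normal((1, 9))   # one robot state
W_t = rng.standard_normal((3, 5))   # three pedestrian states
s_emb = mlp(s_t, f_r)               # shape (1, 32)
W_emb = mlp(W_t, f_h)               # shape (3, 32)
features = np.vstack([s_emb, W_emb])  # shape (4, 32): one row per agent
```

Stacking the embedded robot row on top of the embedded pedestrian rows gives one feature row per agent, which is a convenient input layout for a graph attention layer.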

Optionally, the present invention can make the dimensions of the robot state information s_t and the pedestrian state information W_t consistent through a custom multi-layer neural network.

Optionally, the present invention can make the dimensions of the robot state information s_t and the pedestrian state information W_t consistent through a support vector machine model.

Preferably, the present invention can also take the dimension-matched features s′_t and W′_t as the feature matrix and construct a graph attention network to predict the first state of the robot at the next moment and the second state of the pedestrian at the next moment.

Optionally, the present invention can also regularize the feature matrix, use the result as a new feature matrix, and construct a graph attention network to predict the first state of the robot at the next moment and the second state of the pedestrian at the next moment.

Optionally, the present invention can also normalize the feature matrix, use the result as a new feature matrix, and construct a graph attention network to predict the first state of the robot at the next moment and the second state of the pedestrian at the next moment.

Further, in some embodiments of the present invention, the step of constructing the graph attention neural network includes computing, in the first-layer graph attention network 23, the attention matrix under each attention head, using the activation function LeakyReLU and a trainable weight matrix. Optionally, the present invention can replace the activation function LeakyReLU with the activation function sigmoid or with the activation function tanh. Here, k₁ is the number of attention heads in the first-layer graph attention network. The multi-head attention mechanism is then used to extract the interaction feature information between the robot and pedestrians and between pedestrians, so as to compute the output of the first-layer graph attention network 23.

In that output, || denotes feature concatenation, σ denotes the activation function, which is set to ELU, and W₁ is a trainable weight matrix.
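By way of non-limiting illustration, one multi-head graph attention layer of the kind described above can be sketched in NumPy as follows. The exact formulas of the original figures are not reproduced in this text, so the sketch follows the commonly used graph attention formulation (LeakyReLU attention scores, row-wise softmax, ELU output, concatenation over heads) on a fully connected graph. The feature sizes, the number of heads, and the names `attention_matrix` and `gat_layer` are assumptions of this example.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def elu(x):
    # exp is only evaluated on the non-positive part to avoid overflow
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)


def attention_matrix(H, a):
    """Row-normalised attention over a fully connected graph.

    H: (N, F) projected node features; a: (2F,) attention vector.
    """
    f = H.shape[1]
    # scores[i, j] = LeakyReLU(a^T [H_i || H_j])
    scores = leaky_relu((H @ a[:f])[:, None] + (H @ a[f:])[None, :])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def gat_layer(X, heads):
    """Concatenate ELU(alpha_k @ X @ W_k) over all heads k."""
    outputs = []
    for W, a in heads:
        H = X @ W
        outputs.append(elu(attention_matrix(H, a) @ H))
    return np.concatenate(outputs, axis=1)


rng = np.random.default_rng(0)
X = rng.standard_normal((4, 32))               # 1 robot + 3 pedestrians
heads = [(rng.standard_normal((32, 16)) * 0.1,
          rng.standard_normal(32) * 0.1) for _ in range(2)]
out = gat_layer(X, heads)                      # shape (4, 32)
```

Each row of the attention matrix sums to one, so every agent's new feature is a weighted average of the projected features of all agents, which is how interactions between the robot and pedestrians and between pedestrians enter the representation.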

In addition, the present invention can compute, in the second-layer graph attention network 24, the attention matrix under each attention head using a trainable weight matrix, where k₂ is the number of attention heads in the second-layer graph attention network 24. The multi-head attention mechanism is then used to extract the interaction feature information between the mobile robot and pedestrians and between pedestrians, so as to compute the output of the second-layer graph attention network.

Further, in some embodiments of the present invention, the step of predicting the first state of the robot at the next moment and the second state of the pedestrian at the next moment proceeds from the features produced by the graph attention network.

Preferably, the present invention can predict the state of the pedestrians at the next moment through a multi-layer perceptron model f_predict 25, where W is a trainable weight matrix.

Optionally, the present invention can predict the state of the pedestrians at the next moment through a custom multi-layer neural network model f_predict1, where W is a trainable weight matrix.

In addition, based on the state information of the mobile robot and the action space A, the present invention can also calculate the next-moment state information s_{t+1} corresponding to each action a_t = [v^t, ω^t] in the action space. In particular, the heading is updated as:

θ^{t+1} = θ^t + ω^t
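By way of non-limiting illustration, the propagation of the robot pose under one action can be sketched in Python as follows. The heading update is read here as adding ω^t to the current heading θ^t. The position update, the decision interval `dt`, and the wrapping of the heading into [-π, π) are assumptions of this sketch, because the corresponding formulas are not reproduced in this text.

```python
import math


def propagate(px, py, theta, action, dt=0.25):
    """Return the next pose (px, py, theta) for an action (v, omega).

    The heading changes by omega per decision step; the robot then moves
    along the new heading at linear speed v for dt seconds (assumed model).
    """
    v, omega = action
    theta_next = (theta + omega + math.pi) % (2 * math.pi) - math.pi
    px_next = px + v * math.cos(theta_next) * dt
    py_next = py + v * math.sin(theta_next) * dt
    return px_next, py_next, theta_next


print(propagate(0.0, 0.0, 0.0, (1.0, 0.0), dt=0.5))  # prints (0.5, 0.0, 0.0)
```

Applying `propagate` to every action in the action space yields one candidate next state per action, which is the set of states the value network is evaluated on.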

Further, in some embodiments of the present invention, as shown in step S4 of FIG. 1, after the state information at the next moment (that is, the first state information and the second state information) is obtained, the action value function of each action in the action space is calculated based on D3QN. The step of calculating the action values Q(J_t, a_t; ω) of the plurality of actions of the robot in its action space at the current moment includes using the D3QN reinforcement learning framework to construct a value network model. Afterwards, the present invention can take the robot state information s_t and the pedestrian state information W_t as input, and make their dimensions consistent through two multi-layer perceptron models f′_r and f′_h:

s′_t = f′_r(s_t; W′_r)

W′_t = f′_h(W_t; W′_h)

where W′_r and W′_h are trainable weight matrices.

Afterwards, the present invention can take the dimension-matched features s′_t and W′_t as the feature matrix and compute, in the first-layer graph attention network 33, the attention matrix under each attention head using a trainable weight matrix, where k₁ is the number of attention heads in the first-layer graph attention network 33. The multi-head attention mechanism is then used to extract the interaction feature information between the robot and pedestrians and between pedestrians, so as to compute the output of the first-layer graph attention network 33.

In that output, || denotes feature concatenation, σ denotes the activation function, which is set to ELU, and W′₁ is a trainable weight matrix.

Afterwards, the present invention can compute the attention matrix under each attention head in the second-layer graph attention network, and from it the output of the second-layer graph attention network, where || denotes feature concatenation, σ denotes the activation function, which is set to ELU, and W′₂ is a trainable weight matrix.

Afterwards, the present invention can let H denote the resulting graph attention features and, based on the principle of the D3QN reinforcement learning algorithm, input H into the subsequent value network model to calculate the action value Q(J_t, a_t; ω) of each action a_t.

Here, V and A denote two multi-layer perceptron models 35, and Noisy Net is introduced to add Gaussian noise to the fully connected layers. The two multi-layer perceptron models 35 take H as input and output the state value and the advantage function, respectively.
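By way of non-limiting illustration, the way a state value and an advantage function are usually combined into action values in a dueling network can be sketched in Python as follows. D3QN is understood here as a dueling double deep Q-network, and the mean-subtracted aggregation below is the commonly used form; whether the value network of this document uses exactly this aggregation is an assumption, since the formula is not reproduced in this text. The Noisy Net layers are omitted from the sketch.

```python
def dueling_q(state_value, advantages):
    """Combine V(H) and A(H, a) into Q values, one per action.

    Q(a) = V + A(a) - mean(A), so the advantages are centred on zero.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


print(dueling_q(1.0, [0.0, 1.0, 2.0]))  # prints [0.0, 1.0, 2.0]
```

Subtracting the mean advantage keeps the split between state value and advantage identifiable, because adding a constant to all advantages no longer changes the resulting action values.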

Afterwards, the present invention can input the robot state information s_{t+1} and the pedestrian state information W_{t+1} at the next moment into the value network model to calculate the action value Q(J_{t+1}, a_{t+1}; ω) of each action a_{t+1} at the next moment.

Afterwards, based on Q(J_t, a_t; ω) and Q(J_{t+1}, a_{t+1}; ω), the present invention can recalculate the value Q(J_t, a_t; ω) of each action a_t in the action space under the current environment state J_t, where γ is the discount factor and Δt is the time interval between two consecutive decisions of the robot.
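By way of non-limiting illustration, one possible way of refining the current action values with the values of the predicted next state is sketched below in Python. The recalculation formula is not reproduced in this text, so everything in this sketch is an assumption: the use of a per-action reward, the discount γ raised to the power Δt, and the blending weight `mix` between the direct estimate and the one-step lookahead estimate.

```python
def lookahead_values(q_now, rewards, q_next, gamma=0.9, dt=0.25, mix=0.5):
    """Blend Q(J_t, a) with a discounted one-step lookahead estimate.

    q_now[i]   : direct estimate Q(J_t, a_i)
    rewards[i] : reward predicted for taking a_i in J_t (assumed input)
    q_next[i]  : list of Q(J_{t+1}, a') values after taking a_i
    """
    discount = gamma ** dt
    return [
        (1.0 - mix) * q + mix * (r + discount * max(nxt))
        for q, r, nxt in zip(q_now, rewards, q_next)
    ]


print(lookahead_values([1.0], [0.0], [[2.0, 4.0]], gamma=1.0, dt=1.0))  # prints [2.5]
```

With `mix=0.0` the function returns the direct estimates unchanged, and with `mix=1.0` it returns the pure one-step lookahead, so the parameter makes the assumed trade-off explicit.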

Optionally, the present invention can freely adjust the value of the discount factor according to the obstacle avoidance performance of the actual robot, so as to achieve a better obstacle avoidance effect.

Further, in some embodiments of the present invention, as shown in step S5 of FIG. 1, the present invention can select the optimal action strategy according to the action value function and issue it to the robot for execution.

Specifically, the present invention can first obtain the coordinate information of the robot at the current moment to calculate the distance d_g between that position and the target point, and then determine whether the distance d_g is smaller than a preset threshold. In response to a determination that the distance d_g is greater than or equal to the preset threshold, the present invention can further determine the action with the highest action value in the next state J_{t+1} to formulate the optimal policy output. Conversely, in response to a determination that the distance d_g is smaller than the preset threshold, the present invention can determine that the robot has reached the target point and stop planning.
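By way of non-limiting illustration, the goal check and the selection of the highest-valued action can be sketched in Python as follows. The function name `select_action`, the default threshold of 0.2 m, and the convention of returning `None` to signal that planning stops are assumptions of this example.

```python
import math


def select_action(robot_xy, goal_xy, actions, action_values, threshold=0.2):
    """Return None when the goal is reached, else the highest-valued action."""
    d_g = math.dist(robot_xy, goal_xy)
    if d_g < threshold:
        return None  # the robot is considered to have reached the target point
    best = max(range(len(actions)), key=lambda i: action_values[i])
    return actions[best]


actions = [(0.0, 0.0), (1.0, 0.0)]
print(select_action((0.0, 0.0), (5.0, 0.0), actions, [0.1, 0.9]))  # prints (1.0, 0.0)
```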

Optionally, as shown in FIG. 4, the present invention can also compare the obstacle avoidance performance of the robot against the schematic diagram of experimental results of the robot path planning method, and adjust the preset threshold until the best obstacle avoidance effect is achieved.

In summary, compared with current robot navigation technology in the field, the robot path planning method of the present invention enables the robot to move in a manner that is more effective and conforms to human social norms. It therefore has high environmental adaptability and a high obstacle avoidance success rate, and can realize local obstacle avoidance planning in complex pedestrian environments.

Although the methods described above are illustrated and described as a series of acts for simplicity of explanation, it is to be understood and appreciated that the methods are not limited by the order of the acts, because according to one or more embodiments some acts may occur in a different order and/or concurrently with other acts that are illustrated and described herein, or that are not illustrated and described herein but can be understood by those skilled in the art.

Those skilled in the art will understand that information, signals, and data may be represented using any of a variety of different technologies and techniques. For example, the data, instructions, commands, information, signals, bits, symbols, and chips referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The path planning device described in the above embodiments can be implemented by a combination of software and hardware. It can be understood, however, that the path planning device can also be implemented in software alone or in hardware alone. For a hardware implementation, the path planning device can be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, other electronic devices for performing the functions described above, or a selected combination of these devices. For a software implementation, the path planning device can be implemented by independent software modules, such as procedures and functions, running on a general-purpose chip, where each module performs one or more of the functions and operations described herein.

The various illustrative logic modules and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in cooperation with a DSP core, or any other such configuration.

The previous description of the present disclosure is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to the present disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the present disclosure. Thus, the present disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A robot path planning method, characterized in that it comprises the following steps:

obtaining first initial information of a robot at the current moment and second initial information of a pedestrian at the current moment;

predicting a first state of the robot at the next moment and a second state of the pedestrian at the next moment according to the first initial information and the second initial information;

determining action values of a plurality of actions of the robot at the current moment according to the first initial information, the second initial information, the first state, and the second state; and

performing real-time local path planning according to the action value of each of the actions.

2. The robot path planning method according to claim 1, wherein the step of obtaining the first initial information of the robot at the current moment and the second initial information of the pedestrian at the current moment comprises:

setting an environment state space, a robot action space, and an environment reward function;

obtaining the pedestrian position and pedestrian speed at the current moment;

determining pose information of the robot according to the odometry data of the robot; and

determining the first initial information and the second initial information.

3. The robot path planning method according to claim 2, wherein the step of setting the environment state space comprises:

letting s_t represent the state of the robot at time t, with a corresponding per-pedestrian state representing the observable state of the i-th pedestrian at time t; and

in two-dimensional space, assuming the robot and each pedestrian to be a circle with a radius of r, expressing the state information of the robot in terms of the current pose and speed of the robot, expressing the state information of each pedestrian in terms of the pedestrian's coordinate position and speed, and expressing the joint environment state from the state information of the robot and of the pedestrians.

4. The robot path planning method according to claim 2, wherein the step of setting the robot action space comprises:

in the action space A, setting the linear velocity as v = {0.2, 0.4, 0.6, 0.8, 1.0} and the angular velocity as ω = {-π/4, -π/6, -π/12, 0, π/12, π/6, π/4}, and expressing the action space A as A = [{0, 0}, {v, ω}], wherein the action space A includes a plurality of discrete actions.

5. The robot path planning method according to claim 2, wherein the step of setting the environment reward function comprises:

defining the reward function R^t from three terms, namely a reward for approaching the target, a penalty for colliding with a pedestrian, and a penalty for violating social norms, wherein

the reward for approaching the target is used to guide the robot so that it quickly and eventually reaches the target position, where r_g = 0.25 is the reward for reaching the target position, P^t is the position of the robot at time t, and g is the target position,

the penalty for colliding with a pedestrian is used to ensure the safety of the motion and is defined using the position of the i-th pedestrian at time t, and

the penalty for violating social norms is defined using the distance between the robot and the i-th pedestrian.

6. The robot path planning method according to claim 2, wherein the step of obtaining the pedestrian position and pedestrian speed at the current moment comprises:

detecting the pedestrian's legs from multiple frames of lidar data; and

matching the corresponding pedestrian according to the detected leg information, and tracking the pedestrian.

7. The robot path planning method according to claim 3, wherein the step of predicting the first state of the robot at the next moment and the second state of the pedestrian at the next moment according to the first initial information and the second initial information comprises:

building a pedestrian state prediction model;

taking the robot state information s_t and the pedestrian state information W_t as input, and making the dimensions of the robot state information s_t and the pedestrian state information W_t consistent through two multi-layer perceptron models f_r and f_h:

s′_t = f_r(s_t; W_r)

W′_t = f_h(W_t; W_h)

where W_r and W_h are trainable weight matrices; and

taking the resulting features as the feature matrix, and constructing a graph attention neural network to predict the first state of the robot at the next moment and the second state of the pedestrian at the next moment.

8. The robot path planning method according to claim 7, characterized in that the step of constructing the graph attention neural network comprises:

calculating the attention matrix under each attention head in the first-layer graph attention network, and calculating the output of the first-layer graph attention network, where || denotes feature concatenation, σ denotes the activation function, which is set to ELU, and W₁ is a trainable weight matrix; and

calculating the attention matrix under each attention head in the second-layer graph attention network, and calculating the output of the second-layer graph attention network, where || denotes feature concatenation, σ denotes the activation function, which is set to ELU, and W₂ is a trainable weight matrix.

9. The robot path planning method according to claim 8, wherein the step of predicting the first state of the robot at the next moment and the second state of the pedestrian at the next moment comprises:

predicting the state of the pedestrian at the next moment from the output of the graph attention network, where W is a trainable weight matrix; and

based on the state information of the mobile robot and the action space A, calculating the next-moment state information s_{t+1} corresponding to each action a_t = [v^t, ω^t] in the action space, including:

θ^{t+1} = θ^t + ω^t

10. The robot path planning method according to claim 1, characterized in that the step of determining the action values of the plurality of actions of the robot at the current moment according to the first initial information, the second initial information, the first state, and the second state comprises:

based on the D3QN reinforcement learning method, inputting the first initial information, the second initial information, the first state, and the second state respectively into the value network model to calculate the action values Q(J_t, a_t; ω) of the plurality of actions of the robot in its action space at the current moment.

11. The robot path planning method according to claim 10, characterized in that, the step of calculating the action values Q(J _t _, at ; ω) of multiple actions of the robot in its action space at the current moment include:

Use the D3QN reinforcement learning framework to build a value network model;

_Taking the robot state information _st and the pedestrian state information W _t as input _, the dimensions of the robot state information _st and the pedestrian state information W _t are transformed into In unison:

s′ _t = f′ _r (s _t ; W′ _r )

W _t ′= f _h ′(W _t ; W′ _h )

Among them, W′ _r and W′ _h are trainable weight matrices;

by

in

Among them, || represents feature splicing, σ represents a function, set to ELU, W′ ₁ is a trainable weight matrix; calculate the attention matrix under each attention head in the second layer graph attention network

in

is a trainable weight matrix, k ₂ is the number of attention heads in the second-layer graph attention network, and combines the multi-head attention mechanism to extract the interaction feature information between robots and pedestrians, and between pedestrians and pedestrians, and calculate the second-layer The output of the graph attention network:

Among them, || represents feature splicing, σ represents a function, which is set to ELU, and W′ ₂ is a trainable weight matrix;

make

Among them, V and A represent two multi-layer perceptron models, and Noisy Net is introduced to add Gaussian noise to the fully connected layer. The two multi-layer perceptron models take H as input and output state value and advantage function respectively;

Input the robot state information s_{t+1} and the pedestrian state information W_{t+1} at the next moment into the value network model to calculate the action value Q(J_{t+1}, a_{t+1}; ω):

and

Based on Q(J_t, a_t; ω) and Q(J_{t+1}, a_{t+1}; ω), recalculate the action value Q(J_t, a_t; ω):

where γ is the discount factor and Δt is the time interval between two consecutive decisions of the robot.
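For readers unfamiliar with this kind of update, the following is a minimal sketch of a Double-DQN style target that uses the next-moment action values together with γ and Δt. Several parts are assumptions and do not come from the claims: the reward argument, the use of a separate target-network estimate to evaluate the selected action, the terminal flag, and the discount form γ^Δt. The claim text names only γ and Δt, so the exact update in the patent may differ.

```python
from typing import List


def double_dqn_target(
    reward: float,
    next_q_online: List[float],
    next_q_target: List[float],
    gamma: float,
    delta_t: float,
    done: bool = False,
) -> float:
    """Return an assumed Double-DQN target for Q(J_t, a_t).

    The online estimates pick the next action; the target estimates score it.
    Discounting by gamma ** delta_t is an assumption, not the patent's formula.
    """
    if len(next_q_online) != len(next_q_target) or not next_q_online:
        raise ValueError("both estimates must cover the same non-empty action space")
    if done:
        return reward  # no bootstrapping from a terminal state
    best_next = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + (gamma ** delta_t) * next_q_target[best_next]


# Hypothetical numbers for a 3-action space.
target = double_dqn_target(
    reward=0.1,
    next_q_online=[0.2, 0.9, 0.4],
    next_q_target=[0.3, 0.5, 0.8],
    gamma=0.9,
    delta_t=0.25,
)
```

Selecting the action with one set of estimates and scoring it with another is the usual way Double DQN limits overestimation of action values.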

12. The robot path planning method according to claim 1, wherein the step of performing real-time local path planning according to the action value of each of the actions comprises:

According to the calculated action value Q(J_t, a_t; ω) of each action, select the action with the highest action value in the current state J_t to formulate the optimal policy output, so as to realize the real-time local path planning of the robot.
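The selection step can be pictured as a plain argmax over the action values. The sketch below is illustrative only; representing an action as a (linear speed, heading change) pair, the name `select_greedy_action` and the example values are hypothetical and not specified by the claims.

```python
from typing import Dict, Tuple

# Hypothetical action encoding: (linear speed, heading change).
Action = Tuple[float, float]


def select_greedy_action(action_values: Dict[Action, float]) -> Action:
    """Return the action with the highest action value in the current state."""
    if not action_values:
        raise ValueError("the action space must contain at least one action")
    return max(action_values, key=action_values.get)


# Hypothetical action values for the current joint state J_t.
action_values = {(0.0, 0.0): 0.12, (0.5, 0.0): 0.47, (0.5, 0.3): 0.31}
best_action = select_greedy_action(action_values)
```

If several actions tie for the highest value, `max` returns the first one encountered in the dictionary's insertion order.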

13. The robot path planning method according to claim 12, wherein the step of performing real-time local path planning further comprises:

Obtain the coordinate information of the robot at the current moment

to calculate the distance d_g between the coordinate information and the target point;

judging whether the distance d_g is less than a preset threshold;

in response to the judgment result that the distance d_g is greater than or equal to the preset threshold, further determining the action with the greatest action value in the next state J_{t+1} to formulate an optimal policy output; and

in response to the judgment result that the distance d_g is smaller than the preset threshold, stopping planning.
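The stopping rule above can be sketched as a loop that keeps planning while d_g is at or above the threshold. This is a hedged illustration: the Euclidean distance, the `step` callback standing in for "choose and execute the best action", and the `max_steps` safety cap are all assumptions introduced here, not elements of the claims.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def distance_to_goal(position: Point, goal: Point) -> float:
    """Euclidean distance d_g between the robot and the target point (assumed metric)."""
    return math.hypot(goal[0] - position[0], goal[1] - position[1])


def plan_until_goal(
    start: Point,
    goal: Point,
    threshold: float,
    step: Callable[[Point], Point],
    max_steps: int = 1000,
) -> Tuple[Point, int]:
    """Keep taking planning steps while d_g >= threshold; stop once d_g < threshold.

    `step` is a hypothetical stand-in for selecting and executing the action
    with the greatest action value. `max_steps` is an added safety cap.
    """
    position = start
    steps_taken = 0
    while distance_to_goal(position, goal) >= threshold and steps_taken < max_steps:
        position = step(position)
        steps_taken += 1
    return position, steps_taken


# Toy policy: always advance 0.5 along the x axis.
final_position, steps = plan_until_goal(
    start=(0.0, 0.0),
    goal=(2.0, 0.0),
    threshold=0.3,
    step=lambda p: (p[0] + 0.5, p[1]),
)
```

In this toy run the robot starts 2.0 from the goal and the loop ends as soon as the distance drops below 0.3.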

14. A robot path planning device, characterized in that it comprises:

a memory; and

a processor, connected to the memory and configured to implement the robot path planning method according to any one of claims 1-13.

15. A computer-readable storage medium, on which computer instructions are stored, wherein when the computer instructions are executed by a processor, the robot path planning method according to any one of claims 1-13 is implemented.