CN117444978B

CN117444978B - Position control method, system and equipment for pneumatic soft robot

Info

Publication number: CN117444978B
Application number: CN202311626768.5A
Authority: CN
Inventors: 高席丰; 晏祯卓; 赵鹏越; 刘欢
Original assignee: Harbin Institute of Technology Shenzhen
Current assignee: Harbin Institute of Technology Shenzhen
Priority date: 2023-11-30
Filing date: 2023-11-30
Publication date: 2024-05-14
Anticipated expiration: 2043-11-30
Also published as: CN117444978A

Abstract

The present invention discloses a position control method for a pneumatic soft robot and its system and equipment. The method obtains an initial data set including several states and corresponding action data groups through a random strategy, and then uses the above data set as a training input, uses a differential variable representing the difference between the Gaussian process function mapping and the actual state as a training target, and uses the posterior predictive distribution of the differential variable obtained by Bayesian inference to represent the transfer dynamics function in the dynamics model. Finally, the learned dynamics model is used to calculate the predicted state distribution, and the strategy parameters are optimized with the goal of minimizing the expected long-term loss to achieve an update strategy. The present invention uses a probability model to replace the actual dynamics model, thereby reducing the model deviation and solving the problem of the existing method over-relying on the precise dynamics model.

Description

A method and system for controlling the position of a pneumatic soft robot

技术领域Technical Field

本发明属于智能机器人技术领域，具体涉及一种气动软体机器人的位置控制方法及其系统、设备。The present invention belongs to the technical field of intelligent robots, and in particular relates to a position control method of a pneumatic soft robot and a system and equipment thereof.

背景技术Background technique

气动软体机器人能够完成现有机器人无法完成的动作，尤其是在一些危险的密闭空间。由于气动软体机器人的结构复杂，对其进行建模和位置控制的问题仍未解决，有待深入探讨和开发。Pneumatic soft robots can perform actions that existing robots cannot, especially in some dangerous confined spaces. Due to the complex structure of pneumatic soft robots, the problems of modeling and position control have not yet been solved and need to be further explored and developed.

位置控制方案基本上分为两类，即静态控制器和动态控制器。静态控制器总是建立在这些机器人的运动学模型之上，而运动学模型是基于稳态假设的，这就导致了它们的速度、效率和机器人的可到达性受到限制。动态控制器是为了利用和发挥这些系统的复杂动态特性。然而，这并不简单，因为建立一个精确的动态模型要比建立运动模型复杂得多。Position control schemes are basically divided into two categories, static controllers and dynamic controllers. Static controllers are always built on the kinematic models of these robots, which are based on steady-state assumptions, which leads to limitations in their speed, efficiency, and reachability. Dynamic controllers are designed to exploit and take advantage of the complex dynamic characteristics of these systems. However, this is not simple, as building an accurate dynamic model is much more complicated than building a kinematic model.

发明内容Summary of the invention

本发明的目的是提供一种气动软体机器人的位置控制方法及其系统、设备，利用概率模型代替实际动力学模型，进而减轻模型偏差，解决了现有方法过度依赖精确动力学模型的问题。The purpose of the present invention is to provide a position control method and system and equipment for a pneumatic soft robot, which utilizes a probabilistic model to replace the actual dynamic model, thereby reducing model deviation and solving the problem of over-reliance on precise dynamic models in existing methods.

本发明的目的是通过以下技术方案实现的：The objective of the present invention is achieved through the following technical solutions:

一种气动软体机器人的位置控制方法，包括如下步骤：A method for controlling the position of a pneumatic soft robot comprises the following steps:

步骤S1、获得初始经验池：Step S1: Obtaining the initial experience pool:

为控制器施加随机策略，利用系统中的传感器获得包含若干机器人状态以及对应控制器动作数据组的初始数据集；Applying a random strategy to the controller, using sensors in the system to obtain an initial data set containing several robot states and corresponding controller action data sets;

步骤S2、动力学模型学习：Step S2: Dynamic model learning:

将步骤S1获得的初始数据集作为训练输入，并使用代表高斯过程功能映射与实际状态之间差异的差分变量作为训练目标；对于已知的策略，利用贝叶斯推断所获得的差分变量的后验预测分布表示动力学模型中的传递动力学函数；The initial data set obtained in step S1 is used as training input, and the difference variable representing the difference between the Gaussian process function map and the actual state is used as the training target; for a known strategy, the posterior predictive distribution of the difference variable obtained by Bayesian inference is used to represent the transfer dynamics function in the dynamics model;

步骤S3、策略更新：Step S3: Policy update:

利用步骤S2所学习的动力学模型计算预测状态分布，评估长期损失，并以预期长期损失最小为目标优化策略参数，实现更新策略。The dynamic model learned in step S2 is used to calculate the predicted state distribution, evaluate the long-term loss, and optimize the strategy parameters with the goal of minimizing the expected long-term loss to achieve the updated strategy.

一种气动软体机器人的位置控制系统，包括被控气动软体机器人单元、输入输出信号采集单元以及数据处理及传输单元，其中：A position control system for a pneumatic soft robot comprises a controlled pneumatic soft robot unit, an input and output signal acquisition unit, and a data processing and transmission unit, wherein:

所述气动软体机器人单元包括至少一种控制对象的一个关节，用于驱动机器人运动；The pneumatic soft robot unit includes at least one joint of a control object for driving the robot to move;

所述输入输出信号采集单元至少可以用于收集上述气动软体机器人的位置控制方法的数据集；The input and output signal acquisition unit can at least be used to collect a data set of the position control method of the pneumatic soft robot;

所述数据处理以及传输单元至少可以用于运行及储存上述气动软体机器人的位置控制方法及信号。The data processing and transmission unit can at least be used to run and store the position control method and signal of the pneumatic soft robot.

一种气动软体机器人的位置控制设备，包括至少一个处理器以及与所述至少一个处理器通信连接的存储器，其中：A position control device for a pneumatic soft robot comprises at least one processor and a memory in communication with the at least one processor, wherein:

所述存储器存储有可被所述至少一个处理器执行的指令；The memory stores instructions executable by the at least one processor;

所述指令被所述至少一个处理器执行，以使所述至少一个处理器能够执行上述气动软体机器人的位置控制方法。The instructions are executed by the at least one processor so that the at least one processor can execute the position control method of the pneumatic soft robot.

本发明中，上述气动软体机器人的位置控制设备还包括计算机可读存储介质，其中：所述计算机可读存储介质存储有计算机可执行指令；所述计算机可执行指令用于使计算机执行上述气动软体机器人的位置控制方法。In the present invention, the position control device of the above-mentioned pneumatic soft robot also includes a computer-readable storage medium, wherein: the computer-readable storage medium stores computer-executable instructions; the computer-executable instructions are used to enable a computer to execute the position control method of the above-mentioned pneumatic soft robot.

相比于现有技术，本发明具有如下优点：Compared with the prior art, the present invention has the following advantages:

1、本发明提供的气动软体机器人的位置控制方法利用概率模型代替实际动力学模型，进而减轻模型偏差。同时基于概率推理所获得的模型可表达且表示动力学模型的不确定性，且将模型的不确定性纳入到长期规划和决策中。1. The position control method of the pneumatic soft robot provided by the present invention uses a probabilistic model to replace the actual dynamic model, thereby reducing the model deviation. At the same time, the model obtained based on probabilistic reasoning can express and represent the uncertainty of the dynamic model, and incorporate the uncertainty of the model into long-term planning and decision-making.

2、本发明提供的气动软体机器人的位置控制方法利用测量输入的预测结果严重依赖于其附近的训练样本的特点，选取测试输入附近区域的训练样本完成局部GP回归，不但保证预测精度，而且可以降低模型的训练时间。2. The position control method of the pneumatic soft robot provided by the present invention utilizes the characteristics that the prediction result of the measured input is seriously dependent on the training samples near it, and selects training samples in the area near the test input to complete local GP regression, which not only ensures the prediction accuracy but also reduces the training time of the model.

3、本发明提供的气动软体机器人的位置控制方法将实物交互和仿真学习相结合。使用交互数据建立状态转移模型，利用此模型进行仿真策略优化，减小实物交互的风险与成本。3. The position control method of the pneumatic soft robot provided by the present invention combines physical interaction with simulation learning. The state transition model is established using the interaction data, and the simulation strategy is optimized using this model to reduce the risk and cost of physical interaction.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1是气动软体机器人的位置控制流程图。FIG1 is a flow chart of position control of a pneumatic soft robot.

图2是气动软体机器人的位置控制系统的结构示意图。FIG. 2 is a schematic diagram of the structure of the position control system of the pneumatic soft robot.

具体实施方式Detailed ways

下面结合附图对本发明的技术方案作进一步的说明，但并不局限于此，凡是对本发明技术方案进行修改或者等同替换，而不脱离本发明技术方案的精神和范围，均应涵盖在本发明的保护范围中。The technical solution of the present invention is further described below in conjunction with the accompanying drawings, but is not limited thereto. Any modification or equivalent replacement of the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention should be included in the protection scope of the present invention.

本发明提供了一种气动软体机器人的位置控制方法，通过随机策略得到包含若干状态以及对应动作数据组的初始数据集，再将上述数据集作为训练输入，使用代表高斯过程功能映射与实际状态之间差异的差分变量作为训练目标，利用贝叶斯推断所获得的差分变量的后验预测分布表示动力学模型中的传递动力学函数，最后利用所学习的动力学模型计算预测状态分布，并以预期长期损失最小为目标优化策略参数，实现更新策略。如图1所示，所述方法包括如下步骤：The present invention provides a position control method for a pneumatic soft robot. An initial data set containing a number of states and corresponding action data groups is obtained through a random strategy. The data set is then used as a training input. A differential variable representing the difference between a Gaussian process function map and an actual state is used as a training target. The posterior predictive distribution of the differential variable obtained by Bayesian inference is used to represent the transfer dynamics function in the dynamics model. Finally, the learned dynamics model is used to calculate the predicted state distribution, and the strategy parameters are optimized with the goal of minimizing the expected long-term loss to achieve an update strategy. As shown in FIG1 , the method comprises the following steps:

为控制器施加随机策略，利用系统中的传感器获得包含若干机器人状态以及对应控制器动作数据组的初始数据集。具体步骤如下：Apply a random strategy to the controller and use the sensors in the system to obtain an initial data set containing several robot states and corresponding controller action data sets. The specific steps are as follows:

向机器人系统中的控制对象施加随机策略，将同一时刻下控制器的动作以及对应的机器人状态/>记录为一组数据，重复施加随机策略得到包含若干状态动作数据组的数据集/>其中n代表施加随机策略的次数。Apply a random strategy to the control object in the robot system and change the controller's actions at the same time. And the corresponding robot state/> Record as a set of data, repeatedly apply random strategies to obtain a data set containing several state-action data sets/> Where n represents the number of times the random strategy is applied.

步骤S2、动力学模型学习：Step S2: Dynamic model learning:

将步骤S1获得的初始数据集作为训练输入，并使用代表高斯过程功能映射与实际状态之间差异的差分变量作为训练目标；对于已知的策略，利用贝叶斯推断所获得的差分变量的后验预测分布表示动力学模型中的传递动力学函数。The initial data set obtained in step S1 is used as training input, and the differential variable representing the difference between the Gaussian process function map and the actual state is used as the training target; for known strategies, the posterior predictive distribution of the differential variable obtained by Bayesian inference is used to represent the transfer dynamics function in the dynamic model.

具体步骤如下：Specific steps are as follows:

步骤S21、将步骤S1获得的数据(x_i,u_i)，i＝1,...,n，作为基于高斯过程所建立的动力学模型的训练输入。Step S21: Use the data ( _xi , _ui ), i=1, ..., n, obtained in step S1 as training input for a dynamics model established based on a Gaussian process.

为了降低动力学模型的训练时间，需要对训练输入进行预处理，具体步骤如下：In order to reduce the training time of the dynamics model, the training input needs to be preprocessed. The specific steps are as follows:

(1)获取目标点与训练数据集之间的距离；(1) Obtain the distance between the target point and the training data set;

(2)计算距离并对此排序；(2) Calculate the distance and sort it;

(3)选择距离训练数据集最近的K个点，其中：K的取值没有具体要求，如果K值较小，则近似误差较低，但估计误差较高；相反，如果K值较大，则近似误差较高，但估计误差较低，因此，在实际使用中，需要根据控制效果进行不断的调整以确定最终的适用值；(3) Select K points closest to the training data set, where: there is no specific requirement for the value of K. If the K value is small, the approximation error is low, but the estimation error is high; on the contrary, if the K value is large, the approximation error is high, but the estimation error is low. Therefore, in actual use, it is necessary to make continuous adjustments according to the control effect to determine the final applicable value;

(4)将选取的K个点作为新训练数据集并进行局部高斯过程回归。(4) The selected K points are used as the new training data set and local Gaussian process regression is performed.

步骤S22、对于M维的状态空间，将第m个高斯过程的函数映射f_m(x_i,u_i)与真实状态x_im之间的差分变量作为训练目标：Step S22: For the M-dimensional state space, the difference variable between the function mapping f _m ( _xi , _ui ) of the m-th Gaussian process and the true state _xim is used as the training target:

Δx_im＝f_m(x_i,u_i)-x_im,m＝1,...,MΔx _im = f _m ( x _i , u _i ) - x _im , m = 1, ..., M

其中，Δx_im又称为高斯过程的训练目标。Among them, Δx _im is also called the training target of the Gaussian process.

步骤S23、对于(x_*,u_*)给定的确定性输入，高斯后续状态分布p(f_m(x_*,u_*))的均值与方差分别是：Step S23: for a given deterministic input (x _* , u _* ), the mean and variance of the Gaussian subsequent state distribution p( _fm (x _* , u _* )) are:

其中，x_*与u_*的定义与S1中的定义相同，分别表示机器人的状态以及控制器的动作，对于本发明的气动软体机器人的位置控制方法，这里的控制器动作可以是各个气动驱动元件产生的驱动信号，如比例阀门的压力，这里的机器人状态即机器人末端执行器的(x，y，z)位置信息；与var分别为均值与方差的符号；Wherein, the definitions of x _* and u _* are the same as those in S1, and represent the state of the robot and the action of the controller, respectively. For the position control method of the pneumatic soft robot of the present invention, the controller action here can be the driving signal generated by each pneumatic driving element, such as the pressure of a proportional valve, and the robot state here is the (x, y, z) position information of the robot end effector; and var are the signs of the mean and variance respectively;

由上述两式组成的预测分布则描述隐函数的不确定关系。The prediction distribution composed of the above two equations describes the uncertainty relationship of the implicit function.

步骤S24、将M个高斯过程整合，利用贝叶斯推断所获得的以差分变量的后验预测分布所表示动力学模型中的传递动力学函数表示为：Step S24: Integrate the M Gaussian processes and use Bayesian inference to obtain the transfer kinetics function in the kinetic model represented by the posterior predictive distribution of the differential variable as follows:

其中，为高斯分布的通用表达方法，其中μ为分布的均值，σ²为分布的方差。在该式中用S23中的/>与var公式代替。in, is a general expression of Gaussian distribution, where μ is the mean of the distribution and σ ² is the variance of the distribution. In this formula, the value of S23 is used to represent the distribution. Replace with var formula.

步骤S3、策略更新：Step S3: Policy update:

利用步骤S2所学习的动力学模型计算预测状态分布，评估长期损失，并以预期长期损失最小为目标优化策略参数，实现更新策略。具体步骤如下：The dynamic model learned in step S2 is used to calculate the predicted state distribution, evaluate the long-term loss, and optimize the strategy parameters with the goal of minimizing the expected long-term loss to achieve the updated strategy. The specific steps are as follows:

步骤S31、在实际系统中，控制信号(压力或流量)存在一定的约束，即u∈[-u_max,u_max]。为此，采用正弦函数对初始策略加以约束，进而限制策略π幅值，最终策略π可描述为/>其中，u_max代表实际控制器的最大控制信号，如比例阀可达到的最大压力。Step S31: In the actual system, the control signal (pressure or flow) has certain constraints, that is, u∈[-u _max ,u _max ]. Therefore, a sine function is used to adjust the initial strategy. Constrained, and then limit the amplitude of strategy π, the final strategy π can be described as/> Among them, u _max represents the maximum control signal of the actual controller, such as the maximum pressure that a proportional valve can reach.

步骤S32、根据步骤S2得到的动力学模型，计算预测状态分布p(x₁),...,p(x_T)以评估长期损失利用确定性共轭梯度最小化方法完成策略参数更新，并利用如下饱和代价函数作为惩罚项：Step S32: Calculate the predicted state distribution p(x ₁ ), ..., p(x _T ) based on the dynamic model obtained in step S2 to assess the long-term loss. The deterministic conjugate gradient minimization method is used to update the policy parameters, and the following saturation cost function is used as the penalty term:

其中，下角标target代表目标位置；Among them, the subscript target represents the target position;

步骤S33、利用梯度的策略搜索方法搜寻最优策略π^*：Step S33: Use the gradient strategy search method to search for the optimal strategy π ^* :

图2展示了气动软体机器人的位置控制系统的一种实施例，所述气动软体机器人的位置控制系统2包括至少一个关节，如机器人第一关节21、机器人第二关节22，机器人末端执行器23，机器人末端位置采集器24，数据处理以及传输单元25，其中：FIG2 shows an embodiment of a position control system of a pneumatic soft robot, wherein the position control system 2 of the pneumatic soft robot includes at least one joint, such as a first joint 21 of the robot, a second joint 22 of the robot, a robot end effector 23, a robot end position collector 24, and a data processing and transmission unit 25, wherein:

所述机器人第一关节21、第二关节22中含有至少一个控制对象，如机器人第一关节气动元件(211a、211b、211c)、机器人第二关节气动元件(221a、221b、221c)，以及与控制对象所匹配的驱动元件，如机器人第一关节气动驱动元件(212a、212b、212c)、机器人第二关节气动驱动元件(222a、222b、222c)；The first joint 21 and the second joint 22 of the robot contain at least one control object, such as the first joint pneumatic element (211a, 211b, 211c) of the robot, the second joint pneumatic element (221a, 221b, 221c) of the robot, and a driving element matched with the control object, such as the first joint pneumatic driving element (212a, 212b, 212c) of the robot, and the second joint pneumatic driving element (222a, 222b, 222c) of the robot;

所述机器人末端位置采集器24用以实时采集机器人末端执行器23输出的位置信息并上传至数据处理以及传输单元25；The robot end position collector 24 is used to collect the position information output by the robot end effector 23 in real time and upload it to the data processing and transmission unit 25;

所述机器人第一关节气动驱动元件(212a、212b、212c)、机器人第二关节气动驱动元件(222a、222b、222c)应包括信号采集模块用以检测输入至机器人系统的控制信号并上传至数据处理以及传输单元25；The robot first joint pneumatic drive element (212a, 212b, 212c) and the robot second joint pneumatic drive element (222a, 222b, 222c) should include a signal acquisition module for detecting the control signal input to the robot system and uploading it to the data processing and transmission unit 25;

所述数据处理以及传输单元25至少可以用于运行及储存如上述步骤S1-步骤S3中的控制方法及信号。The data processing and transmission unit 25 can at least be used to run and store the control method and signals in the above steps S1 to S3.

同时，上述数据处理以及传输单元25可作为气动软体机器人的位置控制设备，其至少包含一个处理器以及与所述至少一个处理器通信连接的存储器；其中，所述存储器存储有可被所述至少一个处理器执行的指令，所述指令被所述至少一个处理器执行，以使所述至少一个处理器能够执行如上述步骤S1-步骤S3中所述的气动软体机器人的位置控制方法。At the same time, the above-mentioned data processing and transmission unit 25 can be used as a position control device of a pneumatic soft robot, which includes at least one processor and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the position control method of the pneumatic soft robot as described in the above-mentioned steps S1 to S3.

Claims

1. A method for controlling the position of a pneumatic soft robot, characterized in that the method comprises the following steps:

Step S1: Obtaining the initial experience pool:

Apply a random strategy to the controller and use the sensors in the system to obtain an initial data set containing several robot states and corresponding controller action data sets. The specific steps are as follows:

Apply a random strategy to the control object in the robot system, record the controller's action u and the corresponding robot state x at the same time as a set of data, and repeatedly apply the random strategy to obtain a data set containing several state-action data sets.

Step S2: Dynamic model learning:

The initial data set obtained in step S1 is used as the training input, and the differential variable representing the difference between the Gaussian process function map and the actual state is used as the training target; for a known strategy, the posterior predictive distribution of the differential variable obtained by Bayesian inference is used to represent the transfer kinetic function in the kinetic model; the specific steps are as follows:

Step S21: Use the data ( _xi , _ui ) obtained in step S1 as the training input of the dynamic model established based on the Gaussian process, i = 1, ..., n, u is the action of the controller, x is the robot state, and n represents the number of times the random strategy is applied; in order to reduce the training time of the dynamic model, the training input needs to be preprocessed, and the specific steps are as follows:

(1) Obtain the distance between the target point and the training data set;

(2) Calculate the distance and sort it;

(3) Select the K points closest to the training data set;

(4) The selected K points are used as new training data sets and local Gaussian process regression is performed;

Step S22: For the M-dimensional state space, the difference variable between the function mapping f _m ( _xi , _ui ) of the m-th Gaussian process and the true state _xim is used as the training target:

Δx _im = f _m ( x _i , u _i ) - x _im , m = 1, ..., M

Among them, Δx _im is also called the training target of the Gaussian process;

Step S23: for a given deterministic input (x _* , u _* ), the mean and variance of the Gaussian subsequent state distribution p( _fm (x _* , u _* )) are:

in, and var are the signs of the mean and variance respectively;

Step S24: Integrate the M Gaussian processes and use Bayesian inference to obtain the transfer kinetics function in the kinetic model represented by the posterior predictive distribution of the differential variable as follows:

Step S3: Policy update:

The dynamic model learned in step S2 is used to calculate the predicted state distribution, evaluate the long-term loss, and optimize the strategy parameters with the goal of minimizing the expected long-term loss to implement the update strategy. The specific steps are as follows:

Step S31: Use a sine function to calculate the initial strategy To constrain the strategy π, the strategy π is finally described as Among them, u _max represents the maximum control signal of the actual controller;

Step S32: Calculate the predicted state distribution p(x ₁ ), ..., p(x _T ) based on the dynamic model obtained in step S2 to assess the long-term loss. The deterministic conjugate gradient minimization method is used to update the policy parameters, and the following saturation cost function is used as the penalty term:

Among them, the subscript target represents the target position;

Step S33: Search for the optimal strategy π ^* using a gradient strategy search method:

2. A position control system for a pneumatic soft robot, characterized in that the system comprises a controlled pneumatic soft robot unit, an input and output signal acquisition unit, and a data processing and transmission unit, wherein:

The pneumatic soft robot unit includes at least one joint of a control object for driving the robot to move;

The input and output signal acquisition unit is at least used to collect a data set of the position control method of the pneumatic soft robot;

The data processing and transmission unit is at least used to run and store the position control method and signal of the pneumatic soft robot according to claim 1.

3. A position control device for a pneumatic soft robot, characterized in that the device comprises at least one processor and a memory in communication with the at least one processor, wherein:

The memory stores instructions executable by the at least one processor;

The instructions are executed by the at least one processor so that the at least one processor can execute the position control method of the pneumatic soft robot according to claim 1.

4. The position control device of a pneumatic soft robot according to claim 3 is characterized in that the device also includes a computer-readable storage medium, wherein: the computer-readable storage medium stores computer-executable instructions; the computer-executable instructions are used to enable a computer to execute the position control method of the pneumatic soft robot described in 1.