CN114460841B

CN114460841B - Foot robot multi-step controller generation method and computer readable storage medium

Info

Publication number: CN114460841B
Application number: CN202111534719.XA
Authority: CN
Inventors: 王宏涛; 邵烨程; 金永斌
Original assignee: ZJU Hangzhou Global Scientific and Technological Innovation Center
Current assignee: ZJU Hangzhou Global Scientific and Technological Innovation Center
Priority date: 2021-12-15
Filing date: 2021-12-15
Publication date: 2024-10-15
Anticipated expiration: 2041-12-15
Also published as: CN114460841A

Abstract

The application relates to the technical field of robot control and discloses a method for generating a multi-step controller of a foot-type robot and a computer readable storage medium. The method comprises the following steps: S1: establishing a simulation environment by using any physics engine; S2: inputting initial parameters and generating a reference motion by using a function; and S3: performing reinforcement learning training in the simulation environment by using an optimization algorithm, with a reward function added so that the robot imitates the motion generated in step S2, wherein the reward function comprises a weighted deviation between the actual state of the robot and the reference state of step S2, and then outputting the controller. Because reinforcement learning is highly random, it easily yields a control algorithm whose motion is very unnatural. Generating a reference motion in step S2 and taking the difference between the actual motion of the robot and the reference motion as one of the objectives in step S3 effectively avoids this problem.

Description

Foot robot multi-step controller generation method and computer readable storage medium

Technical Field

The invention relates to the technical field related to robot control, in particular to a method for generating a multi-step controller of a foot-type robot and a computer readable storage medium.

Background

Gait is the rhythm with which a legged animal or robot lifts and lands its different legs during movement. Animals in nature adopt different optimal gaits in different environments and at different movement speeds. In quadrupedal locomotion, the large number of legs makes the possible gaits numerous and complex, so it is difficult to design a controller that drives a quadruped robot through several different gaits. Most robots at home and abroad can be controlled in only one gait and can hardly adjust their gait automatically according to the environment, which leaves them insufficiently adaptable to different environments. The present application is provided to solve these problems.

Disclosure of Invention

The present invention aims to provide a method for generating a multi-step controller of a foot robot and a computer readable storage medium that solve the above problems.

The invention is realized by the following technical scheme.

The invention relates to a method for generating a multi-step controller of a foot robot, which comprises the following steps:

S1: establishing a simulation environment by using any physics engine;

S2: inputting initial parameters and generating a reference motion by using a function;

S3: performing reinforcement learning training in the simulation environment by using an optimization algorithm, with a reward function added so that the robot imitates the motion generated in step S2, taking the difference between the actual motion of the robot and the reference motion of step S2 as one of the objectives, and then outputting the controller.

Further, in step S2, a plurality of phases are used, each phase corresponds to the periodic motion of one leg, and the phases are generated in the same manner or in different manners.
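The per-leg phases can be pictured with a small sketch. It assumes that all legs share one period and that a gait is selected by fixed phase offsets between legs. The offset table, the leg order, and the names `GAIT_OFFSETS` and `leg_phases` are illustrative assumptions, not taken from the patent.

```python
import math

TWO_PI = 2.0 * math.pi

# Hypothetical per-leg phase offsets, as fractions of one cycle.
# Assumed leg order: front-left, front-right, rear-left, rear-right.
GAIT_OFFSETS = {
    "trot": (0.0, 0.5, 0.5, 0.0),   # diagonal legs move together
    "pace": (0.0, 0.5, 0.0, 0.5),   # legs on the same side move together
    "bound": (0.0, 0.0, 0.5, 0.5),  # front pair and rear pair alternate
}


def leg_phases(t, period, gait):
    """Return one phase per leg, each wrapped into [0, 2*pi)."""
    base = TWO_PI * (t / period)
    return [(base + TWO_PI * off) % TWO_PI for off in GAIT_OFFSETS[gait]]


# A quarter of the way through a 0.4 s cycle in a trot.
phases = leg_phases(t=0.1, period=0.4, gait="trot")
```

Under these assumptions, switching gait amounts to switching the offset row while the phases keep cycling.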

Further, the function for generating the reference motion in step S2 is:

[formula not reproduced in this text]

where p_i is the position of the foot, with its three-dimensional coordinates, and the remaining quantities are the phase, the step length, the period T, the ground-contact duty ratio β, the movement speed v_x, v_y, and the leg-lift height h.
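Since the formula itself is not available in this text, the following is only a sketch of one plausible reference trajectory built from the listed quantities (phase, period T, duty ratio β, speed v_x, v_y, lift height h). The piecewise form, with a linear stance and a half-sine swing, and the function name `reference_foot_position` are assumptions, not the patent's function.

```python
import math


def reference_foot_position(phase, T, beta, v_x, v_y, h):
    """Assumed reference position of one foot relative to its neutral point.

    phase: leg phase in radians; T: period; beta: ground-contact duty ratio
    (0 < beta < 1); v_x, v_y: movement speed; h: leg-lift height.
    """
    s = (phase % (2.0 * math.pi)) / (2.0 * math.pi)  # cycle fraction in [0, 1)
    step_x = v_x * beta * T  # distance the body travels during stance
    step_y = v_y * beta * T
    if s < beta:
        # Stance: foot on the ground, moving backward relative to the body.
        u = s / beta
        return (step_x * (0.5 - u), step_y * (0.5 - u), 0.0)
    # Swing: foot returns forward while lifted along a half sine.
    u = (s - beta) / (1.0 - beta)
    return (step_x * (u - 0.5), step_y * (u - 0.5), h * math.sin(math.pi * u))


touchdown = reference_foot_position(0.0, T=0.4, beta=0.6, v_x=0.5, v_y=0.0, h=0.05)
mid_swing = reference_foot_position(2.0 * math.pi * 0.8, T=0.4, beta=0.6,
                                    v_x=0.5, v_y=0.0, h=0.05)
```

In this sketch the foot height peaks at h in the middle of the swing, and the trajectory is continuous where stance and swing meet.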

Further, the policy used for reinforcement learning training in step S3 is a long short-term memory (LSTM) neural network.

Further, the optimization algorithm in step S3 is one of a policy gradient algorithm, Q-Learning, or DQN.

Further, the policy gradient algorithm includes the PPO or TRPO algorithm.

Further, the reward function in step S3 is the sum of the products of a plurality of robot motion state values and their weight coefficients.

Further, the reward function is specifically r = w₁r_τ + w₂r_v + w₃r_j + w₄r_b + w₅r_h + w₆r_c, where w₁ to w₆ are weights, r_j is a deviation parameter of the actual state of the robot from the reference state generated in step S2, r_τ is a parameter encouraging the robot to save motor torque, r_v is a parameter encouraging the actual speed of the robot to follow the commanded speed, r_b is a parameter encouraging the body posture of the robot to be stable, r_h is a parameter encouraging the body height of the robot to stay at a certain value, and r_c is a parameter encouraging the contact state of each foot of the robot with the ground to match the contact state of the reference motion.

Further, in step S3, the size and mass of each component of the robot, the contact friction, and the time delay are randomized.

A computer readable storage medium having stored therein a computer program or code set which when executed by a processor implements the foot robot multi-step controller generation method described above.

The invention has the beneficial effects that:

Because reinforcement learning is highly random, it easily yields a control algorithm whose motion is very unnatural. Generating a reference motion in step S2 and taking the difference between the actual motion of the robot and the reference motion as one of the objectives in step S3 effectively avoids this problem.

The method for generating the reference motion in step S2 is very simple. The prior art holds that a reference motion based on simple geometry gives poor control performance, and that a simplified dynamics model of the robot must be established and a model-based control algorithm used to obtain a high-quality controller; such a controller, however, is difficult to build.

The application uses a long short-term memory (LSTM) network. This network structure can fully mine the implicit information in time-series data; for this problem, the LSTM network can automatically identify the system characteristics of the robot from the time-series information produced while the robot moves, so that the controller can be migrated directly from the simulation environment to the physical robot without additional debugging.

Drawings

In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.

The invention will be further described with reference to the drawings and examples.

Fig. 1 is a schematic flow chart of a method for generating a multi-step controller of a foot robot according to the present invention.

Detailed Description

The present invention will be described in detail with reference to fig. 1.

S1: establishing a simulation environment by using any physics engine. For example, a fast simulation environment for multi-legged robots is built with the open-source Bullet physics engine. The environment contains the physical models of several groups of robots and the physical attributes of the surrounding environment, and the sensor information of the robots is added to the simulation environment as plug-ins and displayed visually.

S2: inputting initial parameters and generating a reference motion by using a function. The function for generating the reference motion is:

[formula not reproduced in this text]

where p_i is the position of the foot, with its three-dimensional coordinates, and the remaining quantities are the phase, the step length, the period T, the ground-contact duty ratio β, the movement speed v_x, v_y, and the leg-lift height h.

The initial parameters input in step S2 comprise sensor data of the robot, control instructions, and phase information. The sensor data comprise the motor positions and speeds, the robot attitude, and the velocity of the robot's center of mass. The control instructions comprise the target linear velocity and angular velocity. As for the phase information, taking a quadruped robot as an example, 4 phases are used, each corresponding to the periodic motion of one leg. During operation each phase cycles continuously from 0 to 2π, and the sin and cos functions of each phase are computed to obtain the final input quantities.
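A minimal sketch of how such an input vector might be assembled is given here. The patent states only that the sin and cos of each phase are computed. The ordering of the vector, the placeholder sensor length of 30, and the function names `encode_phases` and `build_observation` are assumptions for illustration.

```python
import math


def encode_phases(phases):
    """Map each phase to (sin, cos) so that 0 and 2*pi give the same input."""
    encoded = []
    for p in phases:
        encoded.extend((math.sin(p), math.cos(p)))
    return encoded


def build_observation(sensor_data, command, phases):
    """Concatenate sensor data, control command, and encoded phases."""
    return list(sensor_data) + list(command) + encode_phases(phases)


obs = build_observation(
    sensor_data=[0.0] * 30,      # placeholder for motor, attitude, velocity data
    command=[0.5, 0.0, 0.0],     # assumed layout: linear x, linear y, angular
    phases=[0.0, math.pi, math.pi, 0.0],
)
```

Encoding a phase as a (sin, cos) pair removes the jump that the raw value makes when it wraps from 2π back to 0.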

S3: performing reinforcement learning training in the simulation environment by using an optimization algorithm. A complete reinforcement learning setup comprises the simulation environment, a policy (also called the controller or control algorithm, i.e. the mathematical form mapping input to output), a state space and an action space (i.e. the input and output of the policy), a reward function (or objective function), and the optimization algorithm. The reinforcement learning method repeats trial and error in the simulation environment, and the optimization algorithm determines the parameters automatically.

The optimization algorithm is specifically a policy gradient algorithm such as PPO or TRPO; algorithms such as Q-Learning or DQN can also be used. A reward function is added, which is the sum of the products of a plurality of robot motion state values and their weight coefficients. Preferably, the reward function is r = w₁r_τ + w₂r_v + w₃r_j + w₄r_b + w₅r_h + w₆r_c, where w₁ to w₆ are weights, r_j is a deviation parameter of the actual state of the robot from the reference state generated in step S2, r_τ is a parameter encouraging the robot to save motor torque, r_v is a parameter encouraging the actual speed of the robot to follow the commanded speed, r_b is a parameter encouraging the body posture of the robot to be stable, r_h is a parameter encouraging the body height of the robot to stay at a certain value, and r_c is a parameter encouraging the contact state of each foot of the robot with the ground to match the contact state of the reference motion. In this way the difference between the actual motion of the robot and the reference motion of step S2 is taken as one of the objectives, and the controller is then output.
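The weighted sum can be sketched as follows. The patent gives neither numerical weights nor the exact form of the individual terms, so the weight values, the exponential shaping in `squared_error_term`, and the convention that every term is scaled so that larger is better are all assumptions.

```python
import math

# Hypothetical weights w1..w6, keyed by the subscript of the matching term.
WEIGHTS = {"tau": 0.05, "v": 1.0, "j": 0.5, "b": 0.3, "h": 0.2, "c": 0.3}


def squared_error_term(actual, reference):
    """Assumed shaping: exp(-sum of squared errors), equal to 1 at zero error."""
    err = sum((a - r) ** 2 for a, r in zip(actual, reference))
    return math.exp(-err)


def total_reward(terms, weights=WEIGHTS):
    """r = w1*r_tau + w2*r_v + w3*r_j + w4*r_b + w5*r_h + w6*r_c."""
    return sum(weights[k] * terms[k] for k in weights)


# r_j compares actual joint positions with the reference from step S2.
terms = {
    "tau": 1.0,
    "v": 1.0,
    "j": squared_error_term([0.1, -0.2], [0.1, -0.2]),
    "b": 1.0,
    "h": 1.0,
    "c": 1.0,
}
r = total_reward(terms)
```

With this shaping, any deviation from the reference joint positions lowers r_j and therefore the total reward, which is what steers training toward the reference motion.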

The policy used for reinforcement learning training in step S3 is a long short-term memory (LSTM) neural network. This network structure can fully mine the implicit information in time-series data; for this problem, the LSTM network can automatically identify the system characteristics of the robot from the time-series information produced while the robot moves, so that the controller can be migrated directly from the simulation environment to the physical robot without additional debugging.
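To illustrate how an LSTM carries information across time steps, here is a sketch of a single standard LSTM cell in plain Python. It uses random, untrained weights and toy dimensions; the sizes, the helper names, and the use of plain Python rather than a deep learning framework are assumptions for illustration and do not reflect the patent's network.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, b):
    """One LSTM step; W[g] is a hidden x (input + hidden) matrix for gate g."""
    z = x + h  # concatenate the input with the previous hidden state

    def affine(gate):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W[gate], b[gate])]

    i = [sigmoid(a) for a in affine("i")]    # input gate
    f = [sigmoid(a) for a in affine("f")]    # forget gate
    o = [sigmoid(a) for a in affine("o")]    # output gate
    g = [math.tanh(a) for a in affine("g")]  # candidate cell update
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def init_params(n_in, n_hidden, rng):
    """Random weights and zero biases for the four gates."""
    W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hidden)]
             for _ in range(n_hidden)] for g in "ifog"}
    b = {g: [0.0] * n_hidden for g in "ifog"}
    return W, b


rng = random.Random(0)
W, b = init_params(n_in=3, n_hidden=4, rng=rng)
h, c = [0.0] * 4, [0.0] * 4
for obs in ([0.1, 0.0, -0.1], [0.2, 0.1, 0.0], [0.1, 0.0, -0.1]):
    h, c = lstm_step(obs, h, c, W, b)
```

The first and third observations are identical, yet the hidden state after the third step differs from the one after the first, because the cell state accumulates the history in between. This dependence on history is the property the text relies on for identifying the robot's system characteristics.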

S4: transferring the controller obtained in step S3 to the physical robot. The controller outputs the target position of each joint of the robot, which realizes control of the robot: the robot follows the speed command in the gait specified by the phases and can switch among multiple gaits.

In step S3, parameters such as the size and mass of each component of the robot, the contact friction, and the delay are randomized to simulate real conditions such as hardware manufacturing errors, ground conditions, and system delay, which enhances robustness and improves the success rate of migration from simulation to the physical robot.
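The randomization described above might be organized as in the following sketch, where a fresh parameter set is drawn for each training episode. The `SimParams` fields mirror the quantities named in the text; the numeric ranges, the uniform distributions, and the per-episode resampling are illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    size_scale: float  # multiplier on nominal component size
    mass_scale: float  # multiplier on nominal component mass
    friction: float    # foot-ground contact friction coefficient
    delay_s: float     # system delay in seconds


def sample_sim_params(rng):
    """Draw one randomized parameter set; all ranges are illustrative guesses."""
    return SimParams(
        size_scale=rng.uniform(0.95, 1.05),
        mass_scale=rng.uniform(0.8, 1.2),
        friction=rng.uniform(0.4, 1.2),
        delay_s=rng.uniform(0.0, 0.02),
    )


rng = random.Random(0)
episode_params = [sample_sim_params(rng) for _ in range(3)]
```

Each sampled set would then be applied to the simulator before the episode starts, so that the policy never trains against a single fixed robot model.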

The above embodiments are only for illustrating the technical concept and features of the present invention, and are intended to enable those skilled in the art to understand the present invention and implement it without limiting the scope of the present invention. All equivalent changes or modifications made in accordance with the spirit of the present invention should be construed to be included in the scope of the present invention.

Claims

1. A method for generating a multi-step controller of a foot robot, characterized by comprising the following steps:

S1: establishing a simulation environment by using any physics engine;

S2: inputting initial parameters and generating a reference motion by using a function;

S3: performing reinforcement learning training in the simulation environment by using an optimization algorithm, with a reward function added so that the robot imitates the motion generated in step S2, wherein the reward function comprises a weighted deviation between the actual state of the robot and the reference state of step S2, and then outputting the controller;

in step S2, a plurality of phases are used, each phase corresponds to the periodic motion of one leg, and the phases are generated in the same manner or in different manners;

the function for generating the reference motion in step S2 is as follows:

[formula not reproduced in this text]

2. The method for generating a multi-step controller of a foot robot according to claim 1, characterized in that: the policy used for reinforcement learning training in step S3 is a long short-term memory (LSTM) neural network.

3. The method for generating a multi-step controller of a foot robot according to claim 2, characterized in that: the optimization algorithm in step S3 is one of a policy gradient algorithm, Q-Learning, or DQN.

4. The method for generating a multi-step controller of a foot robot according to claim 3, characterized in that: the policy gradient algorithm comprises the PPO or TRPO algorithm.

5. The method for generating a multi-step controller of a foot robot according to claim 4, characterized in that: the reward function in step S3 is the sum of the products of a plurality of robot motion state values and their weight coefficients.

6. The method for generating a multi-step controller of a foot robot according to claim 5, characterized in that: the reward function is specifically r = w₁r_τ + w₂r_v + w₃r_j + w₄r_b + w₅r_h + w₆r_c, where w₁ to w₆ are weights, r_j is a deviation parameter of the actual state of the robot from the reference state generated in step S2, r_τ is a parameter encouraging the robot to save motor torque, r_v is a parameter encouraging the actual speed of the robot to follow the commanded speed, r_b is a parameter encouraging the body posture of the robot to be stable, r_h is a parameter encouraging the body height of the robot to stay at a certain value, and r_c is a parameter encouraging the contact state of each foot of the robot with the ground to match the contact state of the reference motion.

7. The method for generating a multi-step controller of a foot robot according to any one of claims 3 to 6, characterized in that: in step S3, the size and mass of each component of the robot, the contact friction, and the time delay are randomized.

8. A computer-readable storage medium, characterized by: the computer readable storage medium has stored therein a computer program or a set of codes, which when executed by a processor, implements the method for generating a multi-step controller of a foot robot according to any one of claims 1-7.