Energy-Efficient Robot Trajectory Optimization

Actor-Critic Methods

 The actor-critic method is a reinforcement learning approach that combines two components: the actor, which decides the actions to take (the policy), and the critic, which evaluates the actions taken by estimating the value function. The actor updates the policy based on feedback from the critic, creating a balance between exploration and optimization, as in the sketch below.
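A minimal advantage actor-critic sketch in PyTorch, assuming a discrete-action task; the network sizes, one-step TD target, and loss weighting are illustrative defaults, not details from the slides:

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.actor = nn.Linear(64, n_actions)   # policy head: action logits
        self.critic = nn.Linear(64, 1)          # value head: V(s)

    def forward(self, obs):
        h = self.shared(obs)
        return self.actor(h), self.critic(h).squeeze(-1)

def actor_critic_loss(model, obs, action, reward, next_obs, done, gamma=0.99):
    logits, value = model(obs)
    with torch.no_grad():
        _, next_value = model(next_obs)
    # Critic feedback: one-step TD target and the resulting advantage estimate.
    td_target = reward + gamma * next_value * (1.0 - done)
    advantage = td_target - value
    # Actor is updated in the direction the critic's advantage suggests.
    log_prob = torch.distributions.Categorical(logits=logits).log_prob(action)
    actor_loss = -(log_prob * advantage.detach()).mean()
    critic_loss = advantage.pow(2).mean()       # critic fits the value function
    return actor_loss + 0.5 * critic_loss
```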

Deep Deterministic Policy Gradient (DDPG)

Deep Deterministic Policy Gradient (DDPG) is a reinforcement learning algorithm that combines ideas from Q-learning (estimating the value of actions) and policy gradients (directly optimizing actions) to learn both the best actions and their value simultaneously, as sketched below.
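A sketch of one DDPG update step, assuming hypothetical actor(obs) and critic(obs, action) networks with target copies and a sampled replay batch; the structure is standard DDPG, but these names and hyperparameters are illustrative, not from the slides:

```python
import torch
import torch.nn as nn

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99):
    obs, action, reward, next_obs, done = batch
    # Q-learning side: the critic learns the value of (state, action) pairs.
    with torch.no_grad():
        next_action = target_actor(next_obs)   # deterministic policy output
        target_q = reward + gamma * (1 - done) * target_critic(next_obs, next_action)
    critic_loss = nn.functional.mse_loss(critic(obs, action), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Policy-gradient side: the actor is pushed toward actions the critic rates highly.
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```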



Energy-Efficient Trajectory Planning

 Energy-efficient trajectory planning is possible using the Deep Deterministic Policy Gradient (DDPG) algorithm for training. It involves parallel training by dividing the robot's dynamic model into submodules, which speeds up training while retaining high accuracy.
 This achieves significant energy savings (a 23.21% reduction compared to default trajectories) while avoiding the heavy computations involved in traditional nonlinear methods; a possible reward formulation is sketched after this list.
 The main advantage of this method is that it achieves real-time trajectory generation, in contrast to slower traditional optimization techniques such as genetic algorithms or dynamic programming.
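The slides do not give the reward used for training, so the following is only a plausible energy-aware reward sketch: it penalizes deviation from the reference trajectory together with a torque-times-velocity proxy for energy consumption, with assumed weights w_track and w_energy:

```python
import numpy as np

def energy_aware_reward(joint_torques, joint_velocities, tracking_error,
                        w_track=1.0, w_energy=0.1):
    # Mechanical power |tau * qdot| summed over joints as an energy proxy.
    power = np.abs(joint_torques * joint_velocities).sum()
    # Reward is higher when the arm tracks well and spends little energy.
    return -(w_track * tracking_error + w_energy * power)
```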



Proximal Policy Optimization (PPO)
 Proximal Policy Optimization (PPO) is a reinforcement learning algorithm
designed to optimize policies efficiently and reliably by improving upon Trust
Region Policy Optimization (TRPO).
 PPO uses a clipped objective function to limit the size of policy updates, ensuring they stay within a safe range without requiring the computational complexity of TRPO's trust region constraints (see the sketch below).
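A minimal sketch of PPO's clipped surrogate loss; the clipping range eps=0.2 is a common default and an assumption here, not a value from the slides:

```python
import torch

def ppo_clip_loss(new_log_prob, old_log_prob, advantage, eps=0.2):
    ratio = torch.exp(new_log_prob - old_log_prob)   # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The elementwise minimum keeps each update inside the "safe" clipped range.
    return -torch.min(unclipped, clipped).mean()
```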

Robotic arm trajectory tracking method based on improved proximal policy optimization

• The trajectory tracking method for robotic arms is studied because traditional tracking methods have low accuracy and cannot handle complex tracking tasks.
• Compared with traditional methods, deep reinforcement learning is an effective scheme, with the advantages of robustness and the ability to solve complex problems (a hypothetical tracking reward is sketched below).
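To make the tracking setup concrete, here is a hypothetical joint-space tracking reward; the paper's actual reward shaping is not given in the slides, so the form is an assumption:

```python
import numpy as np

def tracking_reward(actual_joint_pos, desired_joint_pos):
    # Negative Euclidean distance between the actual and expected trajectory
    # points, so the agent is rewarded for staying close to the reference.
    return -float(np.linalg.norm(actual_joint_pos - desired_joint_pos))
```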



Contd.
• If the step size is too large, the result is jittery and does not converge. The PPO algorithm uses the ratio of the new and old policies, which solves the problem that the learning rate is difficult to determine in the policy gradient (PG) algorithm. To improve the robustness of the tracking algorithm, the PPO algorithm is improved based on a stable policy gradient.

[Figure: (a) expected trajectory; (b) actual trajectory of the robotic arm]
• The solid blue line in Fig (a) is the expected trajectory of the robotic arm. The red
solid line in Fig (b) shows the actual trajectory of the robotic arm.
• The simulation results show that the Improved-PPO algorithm outperforms the A3C
and PPO algorithms for robotic arm trajectory tracking.



Trust Region Policy Optimization (TRPO)

 Trust Region Policy Optimization (TRPO) is a reinforcement learning algorithm that optimizes policies by constraining the step size during updates to ensure stable and reliable learning. It uses a trust region constraint to prevent the policy from changing too drastically, maintaining a balance between exploration and exploitation (a sketch of the constraint check follows this list).
 Using TRPO, industrial robots can improve their decision-making in complex scenarios, such as assembly or material handling, while ensuring performance consistency and energy efficiency.
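A sketch of the trust-region idea for diagonal Gaussian policies: a candidate policy is accepted only if its average KL divergence from the old policy stays below a threshold delta. The value delta=0.01 and the simple accept/reject check are assumptions for illustration; real TRPO computes a natural-gradient step with conjugate gradients and a backtracking line search:

```python
import torch

def gaussian_kl(mu_old, std_old, mu_new, std_new):
    # KL(pi_old || pi_new) for diagonal Gaussian policies, averaged over states.
    kl = (torch.log(std_new / std_old)
          + (std_old ** 2 + (mu_old - mu_new) ** 2) / (2.0 * std_new ** 2) - 0.5)
    return kl.sum(dim=-1).mean()

def within_trust_region(mu_old, std_old, mu_new, std_new, delta=0.01):
    # Reject candidate policies that step outside the trust region.
    return gaussian_kl(mu_old, std_old, mu_new, std_new) <= delta
```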

Complex Robot Manipulation Tasks Based On Hindsight Trust Region Policy Optimization
• In this experiment, the manipulator is placed in four challenging sparse-reward environments comprising two types of tasks: a reaching task with obstacles and three dynamic-object tasks. Both types of tasks are goal-conditioned, meaning the robot receives a goal observation at every time step.



• The results show that HTRPO (Hindsight Trust Region Policy Optimization), compared with HPG and TRPO, achieves a higher success rate and better stability on most of the tasks.



Thank You

