This is a small and simple collection of reinforcement learning algorithms. The core idea of this repo is to keep the structure minimal, so that each algorithm is easy to understand and to modify. For this reason, each algorithm has its own folder, independent from the others. Only approximators (neural networks, linear functions, ...), policy classes, and auxiliary functions (for plotting or for collecting data with gym-like environments) are shared.
Note that an algorithm can have several versions. For example, SPG can learn the critic either with Monte Carlo estimates or by temporal difference learning.
The repository has a modular structure and no installation is needed. To run an algorithm, execute from the root folder

```
python3 -m <ALG>.<RUN_SCRIPT> <ENV_NAME> <SEED>
```

(the seed is optional, default is 1). At each iteration, the most important statistics (average return, value function loss, entropy, ...) are saved in `data-trial/<ALG_NAME>/<ENV_NAME>/<DATE_TIME>.dat`.
For example, running `python3 -m ddpg.ddpg Pendulum-v0 0` will generate `data-trial/ddpg/Pendulum-v0/180921_155842.dat`.
You can also save/load the learned model and visualize the graph. For more info, check `demo.py`. The demo also shows how to use the LQR environment and how to plot value functions.
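As a rough sketch of how saving/restoring works in TensorFlow 1.x (this is not the code of `demo.py`; the graph and the checkpoint path below are placeholders):

```python
import tensorflow as tf

# Placeholder graph: a tiny value network on 3-dimensional states.
states = tf.placeholder(tf.float32, [None, 3], name="states")
values = tf.layers.dense(states, 1, name="value")

saver = tf.train.Saver()  # by default saves all variables of the graph

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    saver.save(session, "./demo_model.ckpt")     # save the learned model
    saver.restore(session, "./demo_model.ckpt")  # restore it later (same graph)
```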
Finally, use any of the run scripts in the root folder to run several trials of the same algorithm in parallel (see the scripts for instructions).
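The run scripts are the recommended way, but the same effect can be obtained with a few lines of Python; the sketch below simply starts one process per seed (the algorithm, script, and environment names are only examples):

```python
import subprocess

alg, run_script, env = "ddpg", "ddpg", "Pendulum-v0"  # example values
seeds = [1, 2, 3, 4, 5]

# Launch one independent trial per seed and wait for all of them to finish.
processes = [subprocess.Popen(["python3", "-m", alg + "." + run_script, env, str(seed)])
             for seed in seeds]
for process in processes:
    process.wait()
```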
With the data generated from the runs, you can plot the average results with a 95% confidence interval using `plot_shaded.py`, or plot all learning curves together with `plot_all.py` (see the scripts for instructions).
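To give an idea of what the shaded plot shows (this is not `plot_shaded.py`; it assumes the `.dat` files store one row per iteration with the statistic of interest in the first column, and that all trials have the same number of iterations):

```python
import glob
import numpy as np
import matplotlib.pyplot as plt

# Collect the first statistic (e.g., average return) of every trial.
files = glob.glob("data-trial/ddpg/Pendulum-v0/*.dat")
returns = np.stack([np.loadtxt(f, ndmin=2)[:, 0] for f in files])  # (trials, iterations)

mean = returns.mean(axis=0)
ci = 1.96 * returns.std(axis=0) / np.sqrt(returns.shape[0])  # 95% CI (normal approximation)

iterations = np.arange(len(mean))
plt.plot(iterations, mean)
plt.fill_between(iterations, mean - ci, mean + ci, alpha=0.3)
plt.xlabel("iteration")
plt.ylabel("average return")
plt.show()
```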
Note that all scripts use flexible memory, i.e.,

```python
config_tf = tf.ConfigProto()
config_tf.gpu_options.allow_growth = True
session = tf.Session(config=config_tf)
```
Requirements:
- python 3.5+
- tensorflow 1.12.0
- tensorflow-probability 0.5
- gym 0.12.5+
- numpy 1.16+
- scipy 1.2+
- matplotlib
- seaborn
Later versions of tensorflow may raise warnings.
You can also use other physics simulators, such as Roboschool, PyBullet and MuJoCo.
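For example, PyBullet environments become regular gym environments once `pybullet_envs` is imported (a sketch, assuming PyBullet is installed; environment ids may differ across versions):

```python
import gym
import pybullet_envs  # importing this module registers the Bullet environments

env = gym.make("HalfCheetahBulletEnv-v0")
state = env.reset()
state, reward, done, info = env.step(env.action_space.sample())
```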
- `approximators.py`: neural network, random Fourier features, polynomial features
- `average_env.py`: introduces state resets to consider average-return MDPs
- `cross_validation.py`: function to minimize a loss function with cross-validation
- `data_collection.py`: functions for sampling MDP transitions and getting mini-batches
- `filter_env.py`: modifies a gym environment to have states and actions normalized in [-1, 1]
- `logger.py`: creates folders for saving data
- `noise.py`: noise functions
- `plotting.py`: to plot value functions
- `policy.py`: implementation of common policies
- `rl_utils.py`: RL functions, such as generalized advantage estimation and retrace (a sketch follows this list)
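For illustration, here is a minimal sketch of generalized advantage estimation; it is not the code of `rl_utils.py`, just the standard GAE recursion under assumed array layouts:

```python
import numpy as np

def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one batch of transitions.

    rewards, values, dones are arrays of length T (one entry per step);
    last_value is the bootstrap value of the state after the last step.
    """
    advantages = np.zeros(len(rewards))
    next_value, next_advantage = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        next_advantage = delta + gamma * lam * next_advantage * not_done
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages  # advantages + values gives the value function targets
```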
Each algorithm folder contains:
- `solver.py`: (optional) defines optimization routines required by the algorithm
- `hyperparameters.py`: defines the hyperparameters, e.g., number of transitions per iteration, network sizes, and learning rates (a hypothetical example follows this list)
- `<NAME>.py`: script to run the algorithm (e.g., `ppo.py` or `ddpg.py`)
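As an illustration only, a hypothetical `hyperparameters.py` could look like the following (the names and values are made up, not the ones actually used in the repo):

```python
# Hypothetical example, not the actual hyperparameters of any algorithm in the repo.
n_transitions_per_iteration = 3000  # samples collected at each iteration
hidden_layer_sizes = (64, 64)       # actor and critic network sizes
actor_learning_rate = 3e-4
critic_learning_rate = 1e-3
discount_factor = 0.99
```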
- Stochastic policy gradient (SPG). The folder includes REINFORCE and two actor-critic versions.
- Deep deterministic policy gradient (DDPG).
- Twin delayed DDPG (TD3).
- Trust region policy optimization (TRPO).
- Proximal policy optimization (PPO).
- Asynchronous advantage actor-critic (A3C).
- Soft Actor-Critic (SAC), first and second version.
- Relative entropy policy search (REPS).
- Actor-critic REPS (AC-REPS).
- TD-regularized actor-critic methods (TD-REG and GAE-REG) are implemented for PPO, TRPO, and DDPG.
- Curiosity-driven exploration by self-supervised prediction (ICM) is implemented for PPO.
- Prioritized experience replay (PER) is implemented for DDPG.
- Projections for approximate policy iteration algorithms (HPROJ) are implemented for PPO.
All implementations are very basic: there is no reward/gradient clipping, hyperparameter tuning, decaying KL/entropy coefficient, batch normalization, standardization with running mean and std, ...