Scalable Bayesian Inverse Reinforcement Learning


Alex J. Chan and Mihaela van der Schaar

International Conference on Learning Representations (ICLR) 2021

License: MIT | Code style: black

Last Updated: 2 March 2021

Code Author: Alex J. Chan (ajc340@cam.ac.uk)

This repo contains a JAX-based implementation of the Approximate Variational Reward Imitation Learning (AVRIL) algorithm. The code is ready to run on the control environments in OpenAI Gym, with pre-run expert trajectories stored in the volume folder.

Given demonstrations, AVRIL learns an approximate posterior distribution over the agent's reward function, as well as a policy that is optimal with respect to that reward.
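For intuition, here is a minimal sketch of the variational idea (the notation is mine, not necessarily the paper's exact objective): Bayesian IRL targets the reward posterior p(R | D) ∝ p(D | R) p(R), which is intractable at scale, so AVRIL instead fits an approximate posterior q_φ(R) by maximizing an evidence lower bound of the standard form,

\log p(\mathcal{D}) \;\geq\; \mathbb{E}_{q_\phi(R)}\!\left[\log p(\mathcal{D} \mid R)\right] \;-\; \mathrm{KL}\!\left(q_\phi(R) \,\|\, p(R)\right),

while jointly training a policy to be approximately optimal under rewards drawn from q_φ.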

This repo is pip-installable - clone it, optionally create a virtual environment, and install it (this will automatically install its dependencies):

git clone https://github.com/XanderJC/scalable-birl.git

cd scalable-birl

pip install -e .

Example usage:

from sbirl import avril, load_data

# First set up the data - I have provided a helper function for dealing
# with the OpenAI Gym control environments

inputs, targets, a_dim, s_dim = load_data('CartPole-v1', num_trajs=15)

# However, AVRIL can handle any appropriately formatted data, i.e. inputs
# that are (state, next_state) pairs and targets that are (action, next_action)
# pairs (a toy formatting sketch follows this example):
# inputs  = [num_pairs x 2 x state_dimension]
# targets = [num_pairs x 2 x 1]

# You can define the reward to be state-only or state-action, depending on your use case

agent = avril(inputs, targets, s_dim, a_dim, state_only=True)

# Train for a set number of iterations with the desired batch size

agent.train(iters=5000, batch_size=64)

# Now test by rolling out in the live Gym environment

agent.gym_test('CartPole-v1')
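
As referenced above, here is a toy sketch of formatting your own demonstrations into the expected layout (the array shapes and dummy data are illustrative only, not part of the library):

import numpy as np
from sbirl import avril

# Hypothetical trajectory: 101 steps of 4-dimensional states and binary actions
states  = np.random.randn(101, 4)
actions = np.random.randint(0, 2, size=101)

# Stack consecutive timesteps into (state, next_state) / (action, next_action) pairs
inputs  = np.stack([states[:-1],  states[1:]],  axis=1)             # [num_pairs x 2 x state_dimension]
targets = np.stack([actions[:-1], actions[1:]], axis=1)[..., None]  # [num_pairs x 2 x 1]

agent = avril(inputs, targets, 4, 2, state_only=True)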

We can see the trained agent can now balance the pole (a rollout GIF is included in the repository).

This example can also be run directly from the shell:

python sbirl/models.py

Citing

If you use this software please cite as follows:

@inproceedings{chan2021scalable,
    title={Scalable {B}ayesian Inverse Reinforcement Learning},
    author={Alex James Chan and Mihaela van der Schaar},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=4qR3coiNaIv}
}
