Last Updated: 2 March 2021
Code Author: Alex J. Chan (ajc340@cam.ac.uk)
This repo contains a JAX based implementation of the Approximate Variational Reward Imitation Learning (AVRIL) algorithm. The code is ready to run on the control environments in the OpenAI Gym, with pre-run expert trajectories stored in the volume folder.
Given demonstrations, AVRIL learns an approximate posterior distributon over the agents reward function as well as an optimal policy with respect to said reward.This repo is pip installable - clone it, optionally create a virtual env, and install it (this will automatically install dependencies):
git clone https://github.com/XanderJC/scalable-birl.git
cd scalable-birl
pip install -e .
Example usage:
from sbirl import avril, load_data
# First setup the data, I have provided a helper function for dealing
# with the OpenAI gym control environemnts
inputs,targets,a_dim,s_dim = load_data('CartPole-v1',num_trajs=15)
# However, AVRIL can handle any data appropriately formatted, that is inputs
# that are (state,next_state) pairs and targets that are (action, next_action)
# pairs:
# inputs = [num_pairs x 2 x state_dimension]
# targets = [num_pairs x 2 x 1]
# You can define the reward to be state-only or state-action depending on use
agent = avril(inputs,targets,s_dim,a_dim,state_only=True)
# Train for set number of iterations with desired batch-size
agent.train(iters=5000,batch_size=64)
# Now test by rolling out in the live Gym environment
agent.gym_test('CartPole-v1')
We can see the trained agent can now balance the pole:
This example can be run simply from the shell using:
python sbirl/models.py
If you use this software please cite as follows:
@inproceedings{chan2021scalable,
title={Scalable {B}ayesian Inverse Reinforcement Learning},
author={Alex James Chan and Mihaela van der Schaar},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=4qR3coiNaIv}
}