Reinforcement learning environments with musculoskeletal models

crowdAI/osim-rl

NIPS2017: Learning to run

This repository contains software required for participation in the NIPS 2017 Challenge: Learning to Run. See more details about the challenge here. Please read about the latest changes and the logistics of the second round here (last update November 6th).

In this competition, you are tasked with developing a controller to enable a physiologically-based human model to navigate a complex obstacle course as quickly as possible. You are provided with a human musculoskeletal model and a physics-based simulation environment where you can synthesize physically and physiologically accurate motion. Potential obstacles include external obstacles like steps, or a slippery floor, along with internal obstacles like muscle weakness or motor noise. You are scored based on the distance you travel through the obstacle course in a set amount of time.

HUMAN environment

To model physics and biomechanics we use OpenSim - a biomechanical physics environment for musculoskeletal simulations.

Getting started

Anaconda is required to run our simulations. Anaconda creates a virtual environment with all the necessary libraries, to avoid conflicts with libraries in your operating system. You can get Anaconda from https://www.continuum.io/downloads. In the following instructions, we assume that Anaconda is successfully installed.

We support Windows, Linux, and Mac OSX (all in 64-bit). To install our simulator, you first need to create a conda environment with the OpenSim package.

On Windows, open a command prompt and type:

conda create -n opensim-rl -c kidzik opensim git python=2.7
activate opensim-rl

On Linux/OSX, run:

conda create -n opensim-rl -c kidzik opensim git python=2.7
source activate opensim-rl

These commands will create a virtual environment on your computer with the necessary simulation libraries installed. Next, you need to install our python reinforcement learning environment. Type (on all platforms):

conda install -c conda-forge lapack git
pip install git+https://github.com/stanfordnmbl/osim-rl.git

If the command python -c "import opensim" runs smoothly, you are done! Otherwise, please refer to our FAQ section.

Note that source activate opensim-rl (activate opensim-rl on Windows) activates the Anaconda virtual environment. You need to run it every time you open a new terminal.

Basic usage

To execute 200 iterations of the simulation, start the Python interpreter and run the following:

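from osim.env import RunEnv

env = RunEnv(visualize=True)
observation = env.reset(difficulty=0)
for i in range(200):
    # apply random muscle excitations for one iteration
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:  # the trial can end early, e.g. when the model falls
        break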

Random walk

The function env.action_space.sample() returns a random vector of muscle excitations, so in this example muscles are activated randomly (red indicates an active muscle and blue an inactive one). Clearly, this technique won't get us very far.

Your goal is to construct a controller, i.e., a function from the state space (current positions, velocities and accelerations of joints) to the action space (muscle excitations) that enables the model to travel as far as possible in a fixed amount of time. Suppose you trained a neural network mapping observations (the current state of the model) to actions (muscle excitations), i.e., you have a function action = my_controller(observation). Then:

# ... (env and observation are created as in the previous example)
total_reward = 0.0
for i in range(200):
    # make a step given by the controller and record the state and the reward
    observation, reward, done, info = env.step(my_controller(observation))
    total_reward += reward
    if done:
        break

# Your reward is
print("Total reward %f" % total_reward)
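Before training anything, it can help to have a concrete stand-in for my_controller that respects the interface used in the loop above. The sketch below is a hypothetical, randomly initialized linear policy (not the DDPG approach described next): it maps a 41-dimensional observation to 18 excitations and squashes each one into [0, 1] with a sigmoid. The class name, seed, and weight scale are assumptions made for illustration.

```python
import math
import random

OBSERVATION_SIZE = 41  # length of the observation vector
ACTION_SIZE = 18       # one excitation per muscle


def sigmoid(x):
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class LinearController(object):
    """Hypothetical linear policy: excitation_j = sigmoid(b_j + sum_i W[j][i] * obs_i)."""

    def __init__(self, seed=0, scale=0.01):
        rng = random.Random(seed)
        # Small random weights; a training algorithm would replace these.
        self.weights = [[rng.gauss(0.0, scale) for _ in range(OBSERVATION_SIZE)]
                        for _ in range(ACTION_SIZE)]
        self.bias = [0.0] * ACTION_SIZE

    def __call__(self, observation):
        if len(observation) != OBSERVATION_SIZE:
            raise ValueError("expected %d values, got %d"
                             % (OBSERVATION_SIZE, len(observation)))
        return [sigmoid(b + sum(w * o for w, o in zip(row, observation)))
                for row, b in zip(self.weights, self.bias)]


my_controller = LinearController(seed=42)
```

Because the sigmoid output always lies in (0, 1), every action produced this way is a valid excitation vector, whatever the weights are.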

There are many ways to construct the function my_controller(observation). We will show how to do it with a DDPG (Deep Deterministic Policy Gradients) algorithm, using keras-rl. If you already have experience with training reinforcement learning models, you can skip the next section and go to evaluation.

Training your first model

Below we present how to train a basic controller using keras-rl. First you need to install extra packages:

conda install keras -c conda-forge
pip install git+https://github.com/matthiasplappert/keras-rl.git
git clone http://github.com/stanfordnmbl/osim-rl.git

keras-rl is an excellent package compatible with OpenAI Gym, which allows you to quickly build your first models!

Go to the scripts subdirectory of this repository:

cd osim-rl/scripts

There are two scripts:

  • example.py for training (and testing) an agent using the DDPG algorithm.
  • submit.py for submitting the result to crowdAI.org

Training

python example.py --visualize --train --model sample

Test

To test the trained model on the gait task (walk as far as possible), run:

python example.py --visualize --test --model sample

Moving forward

Note that it will take a while to train this model. You can find many tutorials, frameworks and lessons online. We particularly recommend:

Tutorials & Courses on Reinforcement Learning:

Frameworks and implementations of algorithms:

OpenSim and Biomechanics:

This list is by no means exhaustive. If you find resources particularly well suited for this tutorial, please let us know!

Evaluation

Your task is to build a function f which takes the current state observation (a 41-dimensional vector) and returns the muscle excitations action (an 18-dimensional vector) in a way that maximizes the reward.

The trial ends either if the pelvis of the model goes below 0.65 meters or if you reach 1000 iterations (corresponding to 10 seconds in the virtual environment). Your total reward is the position of the pelvis on the x axis after the last iteration minus a penalty for using ligament forces. Ligaments are tissues which prevent your joints from bending too much - overusing these tissues leads to injuries, so we want to avoid it. The penalty in the total reward is equal to the sum of forces generated by ligaments over the trial, divided by 10,000,000.

After each iteration, you get a reward equal to the change of the pelvis position along the x axis during this iteration minus the ligament penalty for that iteration (the magnitude of the ligament forces used, divided by 10,000,000).

You can test your model on your local machine. For submission, you will need to interact with the remote environment: crowdAI sends you the current observation and you need to send back the action you take in the given state. You will be evaluated at three different levels of difficulty. For details, please refer to the "Details of the environment" section.
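To make the scoring rules concrete, here is a minimal sketch of the termination check and the per-iteration and total rewards as described above. The function names are hypothetical helpers, not part of osim-rl, and the sketch assumes that each iteration's ligament load arrives as one non-negative number penalized with the same 1/10,000,000 scaling as the total penalty.

```python
# Constants taken from the description above.
PELVIS_MIN_HEIGHT = 0.65       # meters; the trial ends below this height
MAX_ITERATIONS = 1000          # 1000 iterations of 0.01 s = 10 s of simulated time
LIGAMENT_PENALTY_SCALE = 1e-7  # ligament forces are divided by 10,000,000


def is_done(pelvis_y, iterations_completed):
    """True if the trial should end after the given number of iterations."""
    return pelvis_y < PELVIS_MIN_HEIGHT or iterations_completed >= MAX_ITERATIONS


def step_reward(pelvis_x_before, pelvis_x_after, ligament_force):
    """Forward pelvis displacement minus the scaled ligament penalty.

    How `ligament_force` is aggregated over individual ligaments is an
    assumption of this sketch: it is treated as one number per iteration.
    """
    return (pelvis_x_after - pelvis_x_before) - LIGAMENT_PENALTY_SCALE * ligament_force


def total_reward(pelvis_xs, ligament_forces):
    """Sum of per-iteration rewards.

    pelvis_xs holds the starting pelvis x followed by the pelvis x after each
    iteration, so it has one more entry than ligament_forces.
    """
    return sum(step_reward(x0, x1, f)
               for x0, x1, f in zip(pelvis_xs, pelvis_xs[1:], ligament_forces))
```

Because the displacement terms telescope, the sum equals pelvis_xs[-1] - pelvis_xs[0] - 1e-7 * sum(ligament_forces), which is the "final pelvis x minus ligament penalty" total when x is measured from the starting position.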

Submission

Assuming your controller is trained and is represented as a function my_controller(observation) returning an action, you can submit it to crowdAI by interacting with the remote environment:

import opensim as osim
from osim.http.client import Client
from osim.env import RunEnv

# Settings
remote_base = "http://grader.crowdai.org:1729"
crowdai_token = "[YOUR_CROWD_AI_TOKEN_HERE]"

client = Client(remote_base)

# Create environment
observation = client.env_create(crowdai_token)

# IMPLEMENTATION OF YOUR CONTROLLER
# my_controller = ... (for example the one trained in keras_rl)

while True:
    [observation, reward, done, info] = client.env_step(my_controller(observation), True)
    print(observation)
    if done:
        observation = client.env_reset()
        if not observation:
            break

client.submit()

Replace [YOUR_CROWD_AI_TOKEN_HERE] with the token from your profile page on the crowdai.org website.

Note that during the submission, the environment will be restarted. Since the environment is stochastic, you will need to submit three trials; this way we make sure that your model is robust.
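The submission loop relies on client.env_reset() returning a falsy value once all trials are finished. To exercise that control flow offline, without a crowdAI token, one option is a stand-in object with the same four methods. FakeClient and run_submission below are hypothetical (they are not part of osim-rl, and FakeClient does not simulate anything); every fake trial simply ends after a fixed number of steps.

```python
class FakeClient(object):
    """Hypothetical offline stand-in for the crowdAI client (not part of osim-rl).

    Every trial ends after `steps_per_trial` steps, and env_reset() returns
    None once `trials` trials have been played.
    """

    def __init__(self, trials=3, steps_per_trial=5, observation_size=41):
        self.trials = trials
        self.steps_per_trial = steps_per_trial
        self.observation_size = observation_size
        self.trial = 0
        self.t = 0
        self.steps_taken = 0
        self.submitted = False

    def _observation(self):
        return [0.0] * self.observation_size

    def env_create(self, token):
        self.trial, self.t = 1, 0
        return self._observation()

    def env_step(self, action, render=False):
        self.steps_taken += 1
        self.t += 1
        done = self.t >= self.steps_per_trial
        return [self._observation(), 0.0, done, {}]

    def env_reset(self):
        if self.trial >= self.trials:
            return None  # no more trials: signals the end of the submission
        self.trial += 1
        self.t = 0
        return self._observation()

    def submit(self):
        self.submitted = True


def run_submission(client, controller, token):
    """Same control flow as the submission loop above."""
    observation = client.env_create(token)
    while True:
        observation, reward, done, info = client.env_step(controller(observation), True)
        if done:
            observation = client.env_reset()
            if not observation:
                break
    client.submit()
```

Running run_submission(FakeClient(), my_controller, "dummy-token") should step through three short trials and then call submit(), which is a cheap way to catch crashes in the controller before contacting the grader.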

Rules

In order to avoid overfitting to the training environment, the top participants (those who obtained 15.0 points or more) will be asked to resubmit their solutions in the second round of the challenge. Environments in the second round will have the same structure, but with 10 obstacles and different seeds. Each submission will consist of 10 simulations, and each participant will have a limit of 3 submissions. The final ranking will be based on the results from the second round.

Additional rules:

  • You are not allowed to use external datasets (e.g., kinematics of people walking)
  • Organizers reserve the right to modify challenge rules as required.

Details of the environment

In order to create an environment, use:

    from osim.env import RunEnv

    env = RunEnv(visualize = True)

Parameters:

  • visualize - turn the visualizer on and off

Methods of RunEnv

reset(difficulty = 2, seed = None)

Restart the environment with a given difficulty level and a seed.

  • difficulty - 0: no obstacles; 1: 3 randomly positioned obstacles (balls fixed in the ground); 2: same as 1, but the strength of the psoas muscles (the muscles that help bend the hip joint in the model) also varies. The muscle strength is set to z * 100%, where z is a normal random variable with mean 1 and standard deviation 0.1.
  • seed - starting seed for the random number generator. If the seed is None, generation from the previous seed is continued.

Your solution will be graded in the environment with difficulty = 2, yet it might be easier to train your model with difficulty = 0 first and then retrain it with a higher difficulty.
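The psoas strength rule can be illustrated with a short sketch. This is not the environment's implementation: sample_psoas_strengths is hypothetical, and it assumes the left and right strengths are independent draws. It also shows, by analogy with the seed argument of reset(), why a fixed seed makes the random conditions reproducible.

```python
import random


def sample_psoas_strengths(difficulty, rng):
    """Return (left, right) psoas strength multipliers z.

    difficulty < 2: both strengths are 1 (100%).
    difficulty == 2: z ~ Normal(mean=1, sd=0.1); independent left/right
    draws are an assumption of this sketch.
    """
    if difficulty < 2:
        return 1.0, 1.0
    return rng.gauss(1.0, 0.1), rng.gauss(1.0, 0.1)


# Re-seeding the generator reproduces the same strengths.
first = sample_psoas_strengths(2, random.Random(2017))
second = sample_psoas_strengths(2, random.Random(2017))
print("psoas strength: left %.3f, right %.3f" % first)
```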

step(action)

Make one iteration of the simulation.

  • action - a list of length 18 of continuous values in [0,1] corresponding to excitation of muscles.

The function returns:

  • observation - a list of length 41 of real values corresponding to the current state of the model. Variables are explained in the section "Physics and biomechanics of the model".

  • reward - reward gained in the last iteration. The reward is computed as the change in position of the pelvis along the x axis minus the penalty for the use of ligaments. See the "Evaluation" section for details.

  • done - indicates whether this was the last step of the episode. This happens if either 1000 iterations were reached or the pelvis height is below 0.65 meters.

  • info - for compatibility with OpenAI, currently not used.
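Since step() expects exactly 18 values in [0, 1], it can be worth normalizing a controller's output before every call. The helper below is a hypothetical sketch (to_action is not part of osim-rl): it converts the values to plain floats, checks the length, and clips each excitation into the valid range.

```python
ACTION_DIM = 18  # one excitation per muscle


def to_action(excitations):
    """Validate and clip controller output before passing it to env.step().

    Returns a plain list of 18 floats, each clipped to [0, 1].
    """
    values = [float(x) for x in excitations]
    if len(values) != ACTION_DIM:
        raise ValueError("expected %d excitations, got %d" % (ACTION_DIM, len(values)))
    return [min(1.0, max(0.0, v)) for v in values]


# Hypothetical usage with a RunEnv instance `env` and a controller `my_controller`:
#     observation, reward, done, info = env.step(to_action(my_controller(observation)))
```

Clipping silently hides out-of-range outputs; raising an error instead is a reasonable alternative while debugging a new controller.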

Physics and biomechanics of the model

The model is implemented in OpenSim [1], which relies on the Simbody physics engine. Note that, given recent successes in model-free reinforcement learning, expertise in biomechanics is not required to successfully compete in this challenge.

To summarize briefly, the agent is a musculoskeletal model that includes body segments for each leg, a pelvis segment, and a single segment to represent the upper half of the body (trunk, head, arms). The segments are connected with joints (e.g., knee and hip), and the motion of these joints is controlled by the excitation of muscles. The muscles in the model have complex paths (e.g., muscles can cross more than one joint, and there are redundant muscles). The muscle actuators themselves are also highly nonlinear. For example, a first-order differential equation relates the electrical signal the nervous system sends to a muscle (the excitation) to the activation of the muscle (which describes how much force the muscle will actually generate given its current force-generating capacity). Given the musculoskeletal structure of bones, joints, and muscles, at each step of the simulation (corresponding to 0.01 seconds), the engine:

  • computes activations of muscles from the excitations vector provided to the step() function,
  • actuates muscles according to these activations,
  • computes torques generated due to muscle activations,
  • computes forces caused by contacting the ground,
  • computes velocities and positions of joints and bodies,
  • generates a new state based on forces, velocities, and positions of joints.
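The first step in the list above, turning excitations into activations, is governed by the first-order differential equation mentioned earlier. The sketch below integrates a generic first-order activation model, da/dt = (u - a) / tau, with forward Euler at the 0.01 s step size. It only illustrates the idea: the function is hypothetical, OpenSim's integrator and the exact muscle model [2] differ, and the two time constants (faster while activation rises, slower while it falls) are illustrative assumptions.

```python
def activation_step(activation, excitation, dt=0.01, tau_act=0.015, tau_deact=0.05):
    """Advance muscle activation by one forward-Euler step of da/dt = (u - a) / tau.

    u is the excitation. tau_act is used while activation rises and tau_deact
    while it falls; both values are illustrative assumptions. Because dt is
    smaller than both time constants, the result stays between a and u.
    """
    tau = tau_act if excitation > activation else tau_deact
    return activation + dt * (excitation - activation) / tau


# Fully excite a resting muscle for five steps, then switch the excitation off.
a = 0.0
for _ in range(5):
    a = activation_step(a, 1.0)
rising = a
for _ in range(5):
    a = activation_step(a, 0.0)
falling = a
```

The lag between excitation and activation is one reason a controller cannot change muscle forces instantaneously from one step to the next.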

In each action, the following 18 muscles are actuated (9 per leg):

  • hamstrings,
  • biceps femoris,
  • gluteus maximus,
  • iliopsoas,
  • rectus femoris,
  • vastus,
  • gastrocnemius,
  • soleus,
  • tibialis anterior.

The action vector corresponds to these muscles in the same order (9 muscles of the right leg first, then 9 muscles of the left leg).
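The ordering rule above can be turned into a small lookup table, which is handy for debugging (for example, exciting one muscle at a time and watching the visualizer). The names MUSCLES, ACTION_LABELS, action_index and single_muscle_action are hypothetical; the only fact used is the stated order, right leg first, then left leg.

```python
# Muscle names in the order listed above.
MUSCLES = [
    "hamstrings", "biceps femoris", "gluteus maximus", "iliopsoas",
    "rectus femoris", "vastus", "gastrocnemius", "soleus", "tibialis anterior",
]

# Action layout: 9 right-leg muscles first, then the same 9 for the left leg.
ACTION_LABELS = [(leg, muscle) for leg in ("right", "left") for muscle in MUSCLES]


def action_index(leg, muscle):
    """Position of a (leg, muscle) pair in the 18-dimensional action vector."""
    return ACTION_LABELS.index((leg, muscle))


def single_muscle_action(leg, muscle, excitation=1.0):
    """Action that excites one muscle and leaves all others at 0."""
    action = [0.0] * len(ACTION_LABELS)
    action[action_index(leg, muscle)] = excitation
    return action
```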

The observation contains 41 values:

  • position of the pelvis (rotation, x, y)
  • velocity of the pelvis (rotation, x, y)
  • rotation of each ankle, knee and hip (6 values)
  • angular velocity of each ankle, knee and hip (6 values)
  • position of the center of mass (2 values)
  • velocity of the center of mass (2 values)
  • positions (x, y) of head, pelvis, torso, left and right toes, left and right talus (14 values)
  • strength of left and right psoas: 1 for difficulty < 2, otherwise a random normal variable with mean 1 and standard deviation 0.1 fixed for the entire simulation
  • next obstacle: x distance from the pelvis, y position of the center relative to the ground, radius.
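The group sizes listed above add up to exactly 41, so the flat observation can be split into named parts. The sketch below assumes that each group is stored contiguously in the listed order; that assumption, and all names in the code, are illustrative rather than part of the osim-rl API. The order of values inside a group (for example, which leg comes first) is not specified here, and the sketch does not rely on it.

```python
# Group names and sizes in the order listed above (assumed contiguous).
OBSERVATION_LAYOUT = [
    ("pelvis_position", 3),           # rotation, x, y
    ("pelvis_velocity", 3),           # rotation, x, y
    ("joint_rotations", 6),           # ankle, knee, hip of both legs
    ("joint_angular_velocities", 6),  # ankle, knee, hip of both legs
    ("center_of_mass_position", 2),
    ("center_of_mass_velocity", 2),
    ("body_positions", 14),           # (x, y) of head, pelvis, torso, toes, talus
    ("psoas_strength", 2),            # left, right
    ("next_obstacle", 3),             # x distance, y of center, radius
]


def build_slices(layout):
    """Map each group name to a slice of the flat observation list."""
    slices, start = {}, 0
    for name, size in layout:
        slices[name] = slice(start, start + size)
        start += size
    return slices, start


SLICES, OBSERVATION_SIZE = build_slices(OBSERVATION_LAYOUT)


def split_observation(observation):
    """Split a flat observation into named groups."""
    if len(observation) != OBSERVATION_SIZE:
        raise ValueError("expected %d values, got %d"
                         % (OBSERVATION_SIZE, len(observation)))
    return {name: list(observation[s]) for name, s in SLICES.items()}
```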

For more details on the simulation framework, please refer to [1]. For more specific information about the muscles model we use, please refer to [2] or to OpenSim documentation.

[1] Delp, Scott L., et al. "OpenSim: open-source software to create and analyze dynamic simulations of movement." IEEE Transactions on Biomedical Engineering 54.11 (2007): 1940-1950.

[2] Thelen, D.G. "Adjustment of muscle mechanics model parameters to simulate dynamic contractions in older adults." ASME Journal of Biomechanical Engineering 125 (2003): 70–77.

Frequently Asked Questions

I'm getting 'version GLIBCXX_3.4.21 not defined in file libstdc++.so.6 with link time reference' error

If you are getting this error:

ImportError: /opensim-rl/lib/python2.7/site-packages/opensim/libSimTKcommon.so.3.6:
  symbol _ZTVNSt7__cxx1119basic_istringstreamIcSt11char_traitsIcESaIcEEE, version
  GLIBCXX_3.4.21 not defined in file libstdc++.so.6 with link time reference

Try conda install libgcc.

Can I use languages other than python?

Yes. You just need to set up your own Python grader (https://github.com/kidzik/osim-rl-grader) and interact with it. For more details, see the OpenAI http client.

Do you have a docker container?

Yes, you can use https://hub.docker.com/r/stanfordnmbl/opensim-rl/. Note that connecting a display to a Docker container can be tricky and is system dependent. Nevertheless, a display is not necessary for training your models, and the Docker container can be handy for using multiple machines.

Some libraries are missing. What is required to run the environment?

Most of the libraries by default exist in major distributions of operating systems or are automatically downloaded by the conda environment. Yet, sometimes things are still missing. The minimal set of dependencies under Linux can be installed with

sudo apt install libquadmath0 libglu1-mesa libglu1-mesa-dev libsm6 libxi-dev libxmu-dev liblapack-dev

Please try to find equivalent libraries for your OS and let us know; we will list them here.

Why are there no energy constraints?

Please refer to the issue stanfordnmbl#34.

I have some memory leaks, what can I do?

Please refer to stanfordnmbl#10 and to stanfordnmbl#58

I see only a Python 3 environment for Linux. How do I install the Windows environment?

Please refer to stanfordnmbl#29

How to visualize observations when running simulations on the server?

Please refer to stanfordnmbl#59

I still have more questions, how can I contact you?

For questions related to the challenge please use the challenge forum. For issues and problems related to installation process or to the implementation of the simulation environment feel free to create an issue on GitHub.

Credits

This challenge would not be possible without:
