This repo contains a variety of tutorials for using the PiPPy pipeline parallelism library with accelerate. You will find examples covering:
- How to trace the model using `accelerate.prepare_pippy`
- How to specify inputs based on what the model expects (when to use `kwargs`, `args`, and such)
- How to gather the results at the end
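The steps above can be sketched end to end as follows. This is a minimal, illustrative sketch, not one of the repo's tutorials: the `gpt2` checkpoint is a placeholder, two GPUs are assumed, and the script must be started with a distributed launcher (e.g. `torchrun --nproc-per-node 2 script.py`), which is why everything lives inside `run()` and only executes when a launcher sets the `RANK` environment variable.

```python
# Sketch of the accelerate + PiPPy inference workflow (placeholder model,
# assumes 2+ GPUs and a distributed launcher such as torchrun).
def run():
    import torch
    from accelerate import PartialState
    from accelerate.inference import prepare_pippy
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # Trace the model with example inputs so PiPPy knows how to split it
    # into pipeline stages (one per GPU). The example used kwargs here,
    # so the wrapped model must later be called with kwargs as well.
    inputs = tokenizer("Pipeline parallelism is", return_tensors="pt")
    model = prepare_pippy(model, split_points="auto", example_kwargs=inputs)

    with torch.no_grad():
        output = model(**inputs)

    # Only the last pipeline stage holds the result unless
    # gather_output=True was passed to prepare_pippy.
    if output is not None and PartialState().is_last_process:
        print(output.logits.shape)


import os

if os.environ.get("RANK") is not None:  # only run under a launcher
    run()
```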
This requires the main branch of accelerate (or a release of at least v0.27.0) and a pippy version of 0.2.0 or greater. Install with `pip install .` to pull from the `setup.py` in this repo, or run manually:

```bash
pip install 'accelerate>=0.27.0' 'torchpippy>=0.2.0'
```

One can expect PiPPy to outperform native model parallelism by a multiplicative factor, since all GPUs are processing micro-batches at all times, rather than each input passing through one GPU at a time while the other GPUs sit idle waiting for the prior stage to finish.
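The intuition behind that speedup can be illustrated with a toy cost model (illustrative numbers only, unrelated to the benchmarks below): with `S` pipeline stages and `B` micro-batches, each taking `t` per stage, sequential model parallelism costs `B * S * t`, while a filled pipeline finishes one micro-batch per step and costs `(S + B - 1) * t`.

```python
# Toy cost model comparing sequential model parallelism with pipelining.
def sequential_time(stages, batches, t):
    # Sequential: one batch traverses every stage before the next starts,
    # so only one GPU is busy at any moment.
    return batches * stages * t

def pipelined_time(stages, batches, t):
    # Pipelined: after (stages - 1) warm-up steps the pipeline is full
    # and one micro-batch completes every step.
    return (stages + batches - 1) * t

S, B, t_ms = 2, 5, 10  # 2 GPUs, 5 micro-batches, 10 ms per stage step
print(sequential_time(S, B, t_ms))  # 100 ms total
print(pipelined_time(S, B, t_ms))   # 60 ms total
```

Note how this also mirrors the benchmark pattern below: the first batch pays the pipeline warm-up cost, while steady-state throughput improves.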
Below are some benchmarks we have found when using the accelerate-pippy integration for a few models when running on 2x4090's:

| | Accelerate/Sequential | PiPPy + Accelerate |
|---|---|---|
| First batch | 0.2137s | 0.3119s |
| Average of 5 batches | 0.0099s | 0.0062s |

| | Accelerate/Sequential | PiPPy + Accelerate |
|---|---|---|
| First batch | 0.1959s | 0.4189s |
| Average of 5 batches | 0.0205s | 0.0126s |

| | Accelerate/Sequential | PiPPy + Accelerate |
|---|---|---|
| First batch | 0.2789s | 0.3809s |
| Average of 5 batches | 0.0198s | 0.0166s |