Authors: Max Zimmer, Christoph Spiegel, Sebastian Pokutta
This repository contains the code to reproduce the experiments from the ICLR24 paper "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging". The code is based on PyTorch 1.9 and the experiment-tracking platform Weights & Biases. See the blog post or the Twitter thread for a TL;DR.
Experiments are started from the following file:

- `main.py`: Starts experiments using the dictionary format of Weights & Biases.
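As a rough illustration, a run can be configured through a plain Python dictionary passed to Weights & Biases. The keys below mirror the parameters explained in this README, but their exact names and accepted values are assumptions for this sketch; consult `main.py` for the actual schema.

```python
# Hypothetical sketch of configuring a single run via a W&B config dictionary.
# The key names (strategy, n_phases, phase, ensemble_by, split_val,
# n_splits_total) follow the parameters described below; check main.py
# for the authoritative schema.
import wandb

config = {
    "strategy": "IMP",              # Dense, IMP, or Ensemble
    "n_phases": 3,                  # number of prune-retrain cycles
    "phase": 1,                     # which phase this run belongs to
    "ensemble_by": "weight_decay",  # parameter varied across retrained models
    "split_val": 1e-4,              # concrete value for this retraining run
    "n_splits_total": 3,            # how many retrained models the soup expects
}

run = wandb.init(project="sparse-model-soups", config=config)
```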
The rest of the project is structured as follows:

- `strategies`: Contains the strategies used for training, pruning, and model averaging.
- `runners`: Contains classes to control the training and collection of metrics.
- `metrics`: Contains all metrics as well as FLOP computation methods.
- `models`: Contains all model architectures used.
- `utilities`: Contains useful auxiliary functions and classes.
An entire experiment is subdivided into multiple steps, each consisting of multiple (potentially many) different runs and wandb experiments. First of all, a model has to be pretrained using the `Dense` strategy. This step is completely agnostic to any pruning specifications. Then, for each phase or prune-retrain cycle (specified by the `n_phases` parameter and controlled by the `phase` parameter), the following steps are executed:
- Strategy `IMP`: Prune the model using the IMP strategy. Here, it is important to specify the `ensemble_by`, `split_val`, and `n_splits_total` parameters (see the sketch after this list):
  - `ensemble_by`: The parameter which is varied when retraining multiple models. E.g., setting this to `weight_decay` will train multiple models with different weight decay values.
  - `split_val`: The value by which the `ensemble_by` parameter is split. E.g., setting this to 0.0001 while using `weight_decay` as `ensemble_by` will retrain a model with weight decay 0.0001, all else being equal.
  - `n_splits_total`: The total number of splits for the `ensemble_by` parameter. If set to three, the souping operation in the next step will expect three models to be present, given the `ensemble_by` configuration.
- Strategy `Ensemble`: Soup the models. This step averages the weights of the models specified by the `ensemble_by` parameter. The `ensemble_by` parameter has to be the same as in the previous step, as does `n_splits_total`. `split_val` is not used in this step and has to be set to None. The `ensemble_method` parameter controls how the models are averaged.
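To make the per-phase flow concrete, here is a minimal, self-contained sketch of one prune-retrain-average cycle. It assumes global magnitude pruning and plain uniform averaging; all function and parameter names (`magnitude_prune_`, `run_phase`, `retrain_fn`, `split_vals`) are illustrative, not the repository's actual API, which lives in the `strategies` package.

```python
# Illustrative sketch of one prune-retrain-average phase, assuming global
# magnitude pruning and uniform weight averaging. Not the repository's API.
import copy

import torch
import torch.nn as nn


def magnitude_prune_(model: nn.Module, sparsity: float) -> None:
    """Zero out the globally smallest-magnitude weights in-place (IMP-style)."""
    all_weights = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = int(sparsity * all_weights.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(all_weights, k).values
    for p in model.parameters():
        p.data.mul_((p.detach().abs() > threshold).float())


def average_state_dicts(state_dicts):
    """Uniformly average state dicts; sparsity is preserved because all
    retrained models start from the same pruned model and share its mask."""
    return {key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
            for key in state_dicts[0]}


def run_phase(model: nn.Module, retrain_fn, split_vals, sparsity: float = 0.5):
    """One phase: prune once, retrain one copy per split value, then soup.

    retrain_fn(model, split_val) is a user-supplied routine that must keep
    pruned weights at zero, i.e. the sparsity mask stays fixed while retraining.
    """
    magnitude_prune_(model, sparsity)
    candidates = []
    for val in split_vals:  # e.g. different weight_decay values (ensemble_by)
        candidate = copy.deepcopy(model)
        retrain_fn(candidate, val)
        candidates.append(candidate.state_dict())
    model.load_state_dict(average_state_dicts(candidates))
    return model
```

In the actual repository, each retraining run and the final souping are separate wandb runs (the `IMP` and `Ensemble` strategies above); the loop here only compresses that into a single process for illustration.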
In case you find the paper or the implementation useful for your own research, please consider citing:
@inproceedings{zimmer2024sparse,
  title={Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging},
  author={Max Zimmer and Christoph Spiegel and Sebastian Pokutta},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=xx0ITyHp3u}
}