Authors: Max Zimmer, Christoph Spiegel, Sebastian Pokutta
This repository contains the code to reproduce the experiments from the ICLR24 paper "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging". The code is based on PyTorch 1.9 and the experiment-tracking platform Weights & Biases. See the blog post or the Twitter thread for a TL;DR.
Experiments are started from the following file:

- `main.py`: Starts experiments using the dictionary format of Weights & Biases.
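As a rough illustration, a run can be configured through a plain Python dictionary passed to Weights & Biases. The keys below mirror the parameters explained in this README, but their exact names and accepted values are assumptions for this sketch; consult `main.py` for the actual schema.

```python
# Hypothetical sketch of configuring a single run via a W&B config dictionary.
# The key names (strategy, n_phases, phase, ensemble_by, split_val,
# n_splits_total) follow the parameters described below; check main.py
# for the authoritative schema.
import wandb

config = {
    "strategy": "IMP",              # Dense, IMP, or Ensemble
    "n_phases": 3,                  # number of prune-retrain cycles
    "phase": 1,                     # which phase this run belongs to
    "ensemble_by": "weight_decay",  # parameter varied across retrained models
    "split_val": 1e-4,              # concrete value for this retraining run
    "n_splits_total": 3,            # how many retrained models the soup expects
}

run = wandb.init(project="sparse-model-soups", config=config)
```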
The rest of the project is structured as follows:

- `strategies`: Contains the strategies used for training, pruning, and model averaging.
- `runners`: Contains classes to control the training and collection of metrics.
- `metrics`: Contains all metrics as well as FLOP computation methods.
- `models`: Contains all model architectures used.
- `utilities`: Contains useful auxiliary functions and classes.
An entire experiment is subdivided into multiple steps, each consisting of multiple (potentially many) different runs and wandb experiments. First of all, a model has to be pretrained using the `Dense` strategy. This step is completely agnostic to any pruning specifications. Then, for each phase or prune-retrain cycle (specified by the `n_phases` parameter and controlled by the `phase` parameter), the following steps are executed:
- Strategy `IMP`: Prune the model using the IMP strategy. Here, it is important to specify the `ensemble_by`, `split_val`, and `n_splits_total` parameters (see the sketch after this list):
  - `ensemble_by`: The parameter which is varied when retraining multiple models. E.g., setting this to `weight_decay` will train multiple models with different weight decay values.
  - `split_val`: The value by which the `ensemble_by` parameter is split. E.g., setting this to 0.0001 while using `weight_decay` as `ensemble_by` will retrain a model with weight decay 0.0001, all else being equal.
  - `n_splits_total`: The total number of splits for the `ensemble_by` parameter. If set to three, the souping operation in the next step will expect three models to be present, given the `ensemble_by` configuration.
- Strategy `Ensemble`: Soup the models. This step averages the weights of the models specified by the `ensemble_by` parameter. The `ensemble_by` parameter has to be the same as in the previous step, as does `n_splits_total`. `split_val` is not used in this step and has to be set to None. The `ensemble_method` parameter controls how the models are averaged.
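To make the per-phase flow concrete, here is a minimal, self-contained sketch of one prune-retrain-average cycle. It assumes global magnitude pruning and plain uniform averaging; all function and parameter names (`magnitude_prune_`, `run_phase`, `retrain_fn`, `split_vals`) are illustrative, not the repository's actual API, which lives in the `strategies` package.

```python
# Illustrative sketch of one prune-retrain-average phase, assuming global
# magnitude pruning and uniform weight averaging. Not the repository's API.
import copy

import torch
import torch.nn as nn


def magnitude_prune_(model: nn.Module, sparsity: float) -> None:
    """Zero out the globally smallest-magnitude weights in-place (IMP-style)."""
    all_weights = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = int(sparsity * all_weights.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(all_weights, k).values
    for p in model.parameters():
        p.data.mul_((p.detach().abs() > threshold).float())


def average_state_dicts(state_dicts):
    """Uniformly average state dicts; sparsity is preserved because all
    retrained models start from the same pruned model and share its mask."""
    return {key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
            for key in state_dicts[0]}


def run_phase(model: nn.Module, retrain_fn, split_vals, sparsity: float = 0.5):
    """One phase: prune once, retrain one copy per split value, then soup.

    retrain_fn(model, split_val) is a user-supplied routine that must keep
    pruned weights at zero, i.e. the sparsity mask stays fixed while retraining.
    """
    magnitude_prune_(model, sparsity)
    candidates = []
    for val in split_vals:  # e.g. different weight_decay values (ensemble_by)
        candidate = copy.deepcopy(model)
        retrain_fn(candidate, val)
        candidates.append(candidate.state_dict())
    model.load_state_dict(average_state_dicts(candidates))
    return model
```

In the actual repository, each retraining run and the final souping are separate wandb runs (the `IMP` and `Ensemble` strategies above); the loop here only compresses that into a single process for illustration.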
In case you find the paper or the implementation useful for your own research, please consider citing:
@inproceedings{zimmer2024sparse,
  title={Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging},
  author={Max Zimmer and Christoph Spiegel and Sebastian Pokutta},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=xx0ITyHp3u}
}