SLSA for ML Supply Chain Integrity
Signing with Sigstore provides users with confidence in the models that they are using, but it cannot answer every question they have about the model. SLSA goes a step further to provide more meaning behind those signatures.
SLSA (Supply-chain Levels for Software Artifacts) is a specification for describing how a software artifact was built. SLSA-enabled build platforms implement controls to prevent tampering and output signed provenance describing how the software artifact was produced, including all build inputs. This way, SLSA provides trustworthy metadata about what went into a software artifact.
Applying SLSA to ML could provide similar information about an ML model’s supply chain and address attack vectors not covered by model signing, such as compromised source control, compromised training process, and vulnerability injection. Our vision is to include specific ML information in a SLSA provenance file, which would help users spot an undertrained model or one trained on bad data. Upon detecting a vulnerability in an ML framework, users can quickly identify which models need to be retrained, thus reducing costs.
We don’t need special ML extensions for SLSA. Since an ML training process is a build (shown in the earlier diagram), we can apply the existing SLSA guidelines to ML training. The ML training process should be hardened against tampering and output provenance just like a conventional build process. More work on SLSA is needed to make it fully useful and applicable to ML, particularly around describing dependencies such as datasets and pretrained models. Most of these efforts will also benefit conventional software.
For models training on pipelines that do not require GPUs/TPUs, using an existing, SLSA-enabled build platform is a simple solution. For example, Google Cloud Build, GitHub Actions, or GitLab CI are all generally available SLSA-enabled build platforms. It is possible to run an ML training step on one of these platforms to make all of the built-in supply chain security features available to conventional software.
How to do model signing and SLSA for ML today
By incorporating supply chain security into the ML development lifecycle now, while the problem space is still unfolding, we can jumpstart work with the open source community to establish industry standards to solve pressing problems. This effort is already underway and available for testing.
Our repository of tooling for model signing and experimental SLSA provenance support for smaller ML models is available now. Our future ML framework and model hub integrations will be released in this repository as well.
We welcome collaboration with the ML community and are looking forward to reaching consensus on how to best integrate supply chain protection standards into existing tooling (such as Model Cards). If you have feedback or ideas, please feel free to open an issue and let us know.