This is an open-source version of the representation engineering framework for stopping harmful outputs or hallucinations at the level of activations. 100% free, self-hosted, and open-source.
Updated Mar 13, 2026 - Python
[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
Agentic Safety Framework
An exploration of issues in international social development policy and its operationalization
A FastAPI application for clinical safeguards using BERT-like models, providing endpoints for text processing and analysis.
Safe and Fearless lossy compression using safeguards