safeguards

Here are 6 public repositories matching this topic...

wisent-ai / wisent

This is an open-source version of the representation engineering framework for stopping harmful outputs or hallucinations at the activation level. 100% free, self-hosted, and open-source.

security ai safeguards explainability llms

Updated Mar 13, 2026
Python

rishub-tamirisa / tamper-resistance

[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"

meta-learning safeguards tamper-resistance llm open-weight

Updated Jun 9, 2025
Python

cirbuk / safeguards

Agentic Safety Framework

python agents safeguards budgets guardrails llm

Updated Apr 23, 2025
Python

aaronkyle / social-development

An exploration of issues of international social development policy and its operationalization

policy international-social-policy social-development social-policy safeguards

Updated Feb 28, 2023
HTML

JMasr / clinical-safeguards

A FastAPI application for clinical safeguards using BERT-like models, providing endpoints for text processing and analysis.

security-tools safeguards llm-safety agent-security

Updated Feb 17, 2026
Python

juntyr / compression-safeguards

Safe and fearless lossy compression using safeguards

compression safe lossy-compression safeguards fearless

Updated Mar 13, 2026
Python