Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
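A minimal sketch of what an evasion attack looks like with ART, assuming a toy PyTorch classifier; the wrapper and attack classes are real ART APIs, but the model, input shape, and epsilon are illustrative.

```python
# Wrap a PyTorch model in ART and generate FGSM adversarial examples.
import torch
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

x = torch.rand(8, 1, 28, 28).numpy()                # a batch of clean inputs
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x)                        # perturbed inputs, same shape
```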
🐢 Open-Source Evaluation & Testing library for LLM Agents
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
MarkLLM: An Open-Source Toolkit for LLM Watermarking. (EMNLP 2024 System Demonstration)
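For a concrete sense of the underlying idea, here is a hedged sketch of green-list watermark detection in the spirit of Kirchenbauer et al., one of the scheme families MarkLLM covers; the hash-based partition, vocabulary size, and gamma below are illustrative assumptions, not MarkLLM's API.

```python
# Sketch of green-list watermark detection (Kirchenbauer-style), NOT MarkLLM's
# actual API. Each position's "green list" is pseudorandomly derived from the
# previous token; watermarked text shows an unusually high green-token rate.
import hashlib

def green_fraction(token_ids, vocab_size=50_000, gamma=0.25):
    """Fraction of tokens landing in the green list seeded by their predecessor."""
    hits = 0
    for prev, cur in zip(token_ids, token_ids[1:]):
        seed = int(hashlib.sha256(str(prev).encode()).hexdigest(), 16)
        # a token counts as "green" if its seeded hash falls in the bottom gamma slice
        hits += (cur ^ seed) % vocab_size < gamma * vocab_size
    return hits / max(len(token_ids) - 1, 1)

# unwatermarked text should score near gamma (0.25); watermarked text scores higher
print(green_fraction([101, 7592, 2088, 102, 999, 42]))
```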
The open-source Python toolbox for backdoor attacks and defenses.
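For context, a minimal sketch of the BadNets-style poisoning step that backdoor toolboxes like this one implement; the trigger patch, poisoning rate, and array layout are illustrative assumptions, not this toolbox's API.

```python
# Sketch of BadNets-style data poisoning: stamp a small trigger patch on a
# fraction of training images and relabel them to the attacker's target class.
import numpy as np

def poison(images, labels, target_class, rate=0.1, rng=None):
    """images: (N, C, H, W) floats in [0, 1]; labels: (N,) ints."""
    if rng is None:
        rng = np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, :, -3:, -3:] = 1.0  # 3x3 white corner patch as the trigger
    labels[idx] = target_class      # flipped labels create the backdoor mapping
    return images, labels
```

A model trained on the poisoned set behaves normally on clean inputs but predicts the target class whenever the trigger patch is present.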
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
Deliver safe & effective language models
Proof of Thought: LLM-based reasoning using Z3 theorem proving, with support for multiple backends (SMT2 and JSON DSL)
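To illustrate the solver-backed idea with the plain z3 Python bindings (not Proof of Thought's SMT2 or JSON DSL backends), a candidate claim is verified by checking that its negation is unsatisfiable; the claim below is a made-up example.

```python
# Verify a logical claim by asking Z3 whether its negation is satisfiable.
from z3 import Ints, Solver, Implies, Not, sat

x, y = Ints("x y")
claim = Implies(x > 2, x + y > y + 2)   # a claim an LLM might emit

s = Solver()
s.add(Not(claim))                       # claim is valid iff its negation is UNSAT
if s.check() == sat:
    print("counterexample:", s.model())
else:
    print("claim verified")
```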
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
🚀 A fast safe reinforcement learning library in PyTorch
[NeurIPS-2023] Annual Conference on Neural Information Processing Systems
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
[NeurIPS'24] "Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration"
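For orientation, a sketch of the simplest membership inference baseline, loss thresholding; the paper's self-prompt calibration method is considerably more refined, and the threshold and toy losses below are illustrative assumptions.

```python
# Baseline membership inference: examples the model fits unusually well
# (low loss) are guessed to have been in the fine-tuning set.
import numpy as np

def loss_threshold_mia(losses: np.ndarray, threshold: float) -> np.ndarray:
    """Predict membership (1 = member) from per-example losses."""
    return (losses < threshold).astype(int)

member_losses = np.array([0.4, 0.6, 0.5])       # members: lower loss on average
nonmember_losses = np.array([1.2, 0.9, 1.5])
preds = loss_threshold_mia(np.concatenate([member_losses, nonmember_losses]), 0.8)
print(preds)  # -> [1 1 1 0 0 0]
```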
Open-source testing platform & SDK for LLM and agentic applications. Define what your app should and shouldn't do in plain language, and Rhesis generates hundreds of test scenarios, runs them, and shows you where it breaks before production. Built for cross-functional team collaboration.
A comprehensive toolbox for model inversion attacks and defenses that is easy to get started with.
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
MarkDiffusion: An Open-Source Toolkit for Generative Watermarking of Latent Diffusion Models
Code for the paper "A Recipe for Watermarking Diffusion Models".
A package of distributionally robust optimization (DRO) methods, implemented with cvxpy and PyTorch.
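As a self-contained taste of what a DRO formulation looks like in cvxpy, here is a generic CVaR-DRO regression, i.e. minimizing the worst-case expected loss over all reweightings of the empirical distribution with likelihood ratio at most 1/alpha; the data, loss, and alpha are made-up assumptions, not necessarily this package's API.

```python
# CVaR-DRO regression via the Rockafellar-Uryasev formulation:
#   min_theta CVaR_alpha(loss) = min_{theta, t}  t + E[(loss - t)_+] / alpha
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

alpha = 0.2                       # smaller alpha = larger ambiguity set, more robust
theta = cp.Variable(5)
t = cp.Variable()
losses = cp.abs(X @ theta - y)    # per-sample absolute loss

objective = t + cp.sum(cp.pos(losses - t)) / (alpha * len(y))
cp.Problem(cp.Minimize(objective)).solve()
print("robust coefficients:", theta.value)
```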