Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
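A minimal sketch of what an evasion attack looks like with ART, assuming a toy PyTorch classifier; the wrapper and attack classes are real ART APIs, but the model, input shape, and epsilon are illustrative.

```python
# Wrap a PyTorch model in ART and generate FGSM adversarial examples.
import torch
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

x = torch.rand(8, 1, 28, 28).numpy()                # a batch of clean inputs
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x)                        # perturbed inputs, same shape
```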
🐢 Open-Source Evaluation & Testing library for LLM Agents
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
MarkLLM: An Open-Source Toolkit for LLM Watermarking. (EMNLP 2024 System Demonstration)
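For a concrete sense of the underlying idea, here is a hedged sketch of green-list watermark detection in the spirit of Kirchenbauer et al., one of the scheme families MarkLLM covers; the hash-based partition, vocabulary size, and gamma below are illustrative assumptions, not MarkLLM's API.

```python
# Sketch of green-list watermark detection (Kirchenbauer-style), NOT MarkLLM's
# actual API. Each position's "green list" is pseudorandomly derived from the
# previous token; watermarked text shows an unusually high green-token rate.
import hashlib

def green_fraction(token_ids, vocab_size=50_000, gamma=0.25):
    """Fraction of tokens landing in the green list seeded by their predecessor."""
    hits = 0
    for prev, cur in zip(token_ids, token_ids[1:]):
        seed = int(hashlib.sha256(str(prev).encode()).hexdigest(), 16)
        # a token counts as "green" if its seeded hash falls in the bottom gamma slice
        hits += (cur ^ seed) % vocab_size < gamma * vocab_size
    return hits / max(len(token_ids) - 1, 1)

# unwatermarked text should score near gamma (0.25); watermarked text scores higher
print(green_fraction([101, 7592, 2088, 102, 999, 42]))
```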
The open-source Python toolbox for backdoor attacks and defenses.
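For context, a minimal sketch of the BadNets-style poisoning step that backdoor toolboxes like this one implement; the trigger patch, poisoning rate, and array layout are illustrative assumptions, not this toolbox's API.

```python
# Sketch of BadNets-style data poisoning: stamp a small trigger patch on a
# fraction of training images and relabel them to the attacker's target class.
import numpy as np

def poison(images, labels, target_class, rate=0.1, rng=None):
    """images: (N, C, H, W) floats in [0, 1]; labels: (N,) ints."""
    if rng is None:
        rng = np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, :, -3:, -3:] = 1.0  # 3x3 white corner patch as the trigger
    labels[idx] = target_class      # flipped labels create the backdoor mapping
    return images, labels
```

A model trained on the poisoned set behaves normally on clean inputs but predicts the target class whenever the trigger patch is present.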
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
Deliver safe & effective language models
Proof of Thought: LLM-based reasoning using Z3 theorem proving, with support for multiple backends (SMT2 and JSON DSL)
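To illustrate the solver-backed idea with the plain z3 Python bindings (not Proof of Thought's SMT2 or JSON DSL backends), a candidate claim is verified by checking that its negation is unsatisfiable; the claim below is a made-up example.

```python
# Verify a logical claim by asking Z3 whether its negation is satisfiable.
from z3 import Ints, Solver, Implies, Not, sat

x, y = Ints("x y")
claim = Implies(x > 2, x + y > y + 2)   # a claim an LLM might emit

s = Solver()
s.add(Not(claim))                       # claim is valid iff its negation is UNSAT
if s.check() == sat:
    print("counterexample:", s.model())
else:
    print("claim verified")
```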
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
🚀 A fast safe reinforcement learning library in PyTorch
[NeurIPS-2023] Annual Conference on Neural Information Processing Systems
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
[NeurIPS'24] "Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration"
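For orientation, a sketch of the simplest membership inference baseline, loss thresholding; the paper's self-prompt calibration method is considerably more refined, and the threshold and toy losses below are illustrative assumptions.

```python
# Baseline membership inference: examples the model fits unusually well
# (low loss) are guessed to have been in the fine-tuning set.
import numpy as np

def loss_threshold_mia(losses: np.ndarray, threshold: float) -> np.ndarray:
    """Predict membership (1 = member) from per-example losses."""
    return (losses < threshold).astype(int)

member_losses = np.array([0.4, 0.6, 0.5])       # members: lower loss on average
nonmember_losses = np.array([1.2, 0.9, 1.5])
preds = loss_threshold_mia(np.concatenate([member_losses, nonmember_losses]), 0.8)
print(preds)  # -> [1 1 1 0 0 0]
```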
Open-source testing platform & SDK for LLM and agentic applications. Define what your app should and shouldn't do in plain language, and Rhesis generates hundreds of test scenarios, runs them, and shows you where it breaks before production. Built for cross-functional team collaboration.
A comprehensive toolbox for model inversion attacks and defenses that is easy to get started with.
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
MarkDiffusion: An Open-Source Toolkit for Generative Watermarking of Latent Diffusion Models
Code for the paper "A Recipe for Watermarking Diffusion Models".
A package of distributionally robust optimization (DRO) methods, implemented with cvxpy and PyTorch.
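As a self-contained taste of what a DRO formulation looks like in cvxpy, here is a generic CVaR-DRO regression, i.e. minimizing the worst-case expected loss over all reweightings of the empirical distribution with likelihood ratio at most 1/alpha; the data, loss, and alpha are made-up assumptions, not necessarily this package's API.

```python
# CVaR-DRO regression via the Rockafellar-Uryasev formulation:
#   min_theta CVaR_alpha(loss) = min_{theta, t}  t + E[(loss - t)_+] / alpha
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

alpha = 0.2                       # smaller alpha = larger ambiguity set, more robust
theta = cp.Variable(5)
t = cp.Variable()
losses = cp.abs(X @ theta - y)    # per-sample absolute loss

objective = t + cp.sum(cp.pos(losses - t)) / (alpha * len(y))
cp.Problem(cp.Minimize(objective)).solve()
print("robust coefficients:", theta.value)
```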