The LLM Evaluation Framework (updated Mar 10, 2026, Python)
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
A framework for few-shot evaluation of language models.
Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends.
This is the repository of our article published in RecSys 2019 "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and of several follow-up studies.
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.
Data-Driven Evaluation for LLM-Powered Applications
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
Python SDK for running evaluations on LLM generated responses
The official evaluation suite and dynamic data release for MixEval.
Build and benchmark deep research.
A research library for automating experiments on Deep Graph Networks
AI Data Management & Evaluation Platform
MedEvalKit: A Unified Medical Evaluation Framework
DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
Expressive is a cross-platform expression parsing and evaluation framework. The cross-platform nature is achieved through compiling for .NET Standard so it will run on practically any platform.
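Most of the LLM evaluation frameworks listed above share one underlying pattern: a dataset of input/reference pairs, a model under test, and one or more metrics aggregated into a score. The sketch below illustrates that pattern only; every name in it (`EvalCase`, `run_eval`, `exact_match`) is hypothetical and does not belong to any specific library's API.

```python
# Minimal sketch of the eval-harness pattern shared by the frameworks
# above. All names are illustrative, not any real library's API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    reference: str

def exact_match(output: str, reference: str) -> float:
    """Score 1.0 if the normalized output equals the reference."""
    return float(output.strip().lower() == reference.strip().lower())

def run_eval(model: Callable[[str], str],
             cases: list[EvalCase],
             metric: Callable[[str, str], float]) -> float:
    """Run every case through the model and average the metric."""
    scores = [metric(model(c.prompt), c.reference) for c in cases]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    cases = [
        EvalCase("What is 2 + 2?", "4"),
        EvalCase("Capital of France?", "Paris"),
    ]
    # Stub standing in for a real LLM call.
    canned = {"What is 2 + 2?": "4", "Capital of France?": "paris"}
    score = run_eval(lambda p: canned[p], cases, exact_match)
    print(f"exact_match: {score:.2f}")
```

Real frameworks layer declarative configs, model-graded metrics, and CI/CD reporting on top of this loop, but the core dataset-model-metric shape is the same.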