LLM as Judge is a framework for implementing a pattern where Large Language Models (LLMs) act as judges to evaluate content, responses, or code based on defined criteria. This project provides a complete solution with API endpoints, Azure deployment support, and configurable judge assemblies.
- Judge Orchestration: Create and manage multiple judges with different evaluation criteria
- Assembly System: Group judges into assemblies for comprehensive evaluations
- FastAPI Backend: RESTful API for managing judges and performing evaluations
- Azure Integration: Built-in support for Azure Cognitive Services, Cosmos DB, and Container Apps
- Statistical Analysis: Includes plugins for statistical analysis and data operations
- Infrastructure as Code: Terraform support for easy deployment to Azure
The system consists of:
- Judges: Individual LLM instances configured with specific evaluation criteria
- Assemblies: Collections of judges that work together to provide comprehensive evaluations
- API Layer: FastAPI application for managing judges and assemblies, and processing evaluation requests
- Storage Layer: Azure Cosmos DB for storing judge configurations and evaluation results
- Plugins: Extensions for statistical analysis, data processing, and other functions
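To make the component relationships concrete, here is a minimal sketch of how a judge and an assembly might be modeled. The field names (`criteria`, `judge_ids`, and so on) are illustrative assumptions, not the actual models defined under `src/app/schemas/`.

```python
# Illustrative sketch only: these field names are assumptions, not the
# actual models defined under src/app/schemas/.
from dataclasses import dataclass, field

@dataclass
class Judge:
    """A single LLM instance configured with its own evaluation criteria."""
    judge_id: str
    model: str              # e.g. the name of an Azure OpenAI deployment
    criteria: str           # what this judge evaluates for
    system_prompt: str = ""

@dataclass
class Assembly:
    """A named group of judges whose verdicts are combined."""
    assembly_id: str
    judge_ids: list[str] = field(default_factory=list)
```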
The LLM as Judge framework offers two distinct architectural approaches:
- Judge Assembly: Groups multiple specialized judges into assemblies that handle different evaluation criteria. This promotes modularity, allowing assemblies to be tailored to various tasks.
- Super Judge: Centralizes the evaluation process within a single, advanced judge, streamlining management while integrating multiple evaluation strategies.
Each approach has unique advantages: the Judge Assembly enables flexible scaling and specialization, whereas the Super Judge offers simplicity and unified decision-making.
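As a rough contrast between the two, an assembly fans the prompt out to several judges and aggregates their verdicts, while a super judge applies every criterion in a single pass. The sketch below is hypothetical, not the project's code; the `evaluate` method and the mean aggregation are assumptions.

```python
# Hypothetical contrast between the two approaches; not the project's code.
from typing import Protocol

class Judge(Protocol):
    def evaluate(self, prompt: str) -> float: ...

def assembly_evaluate(judges: list[Judge], prompt: str) -> float:
    """Fan the prompt out to several specialized judges and aggregate."""
    scores = [judge.evaluate(prompt) for judge in judges]
    return sum(scores) / len(scores)  # a simple mean; one aggregation choice

def super_judge_evaluate(judge: Judge, prompt: str) -> float:
    """A single advanced judge applies every criterion in one pass."""
    return judge.evaluate(prompt)
```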
- Python 3.12 or higher
- Poetry for dependency management
- Azure account with necessary permissions
- Azure CLI installed
- Terraform installed
- Clone the repository:
git clone https://github.com/your-username/llm-as-judge.git
cd llm-as-judge
- Configure the environment:
./configuration/conf-env.ps1
- Run linters to verify code quality:
./configuration/linters.ps1
- Export requirements (if needed):
./configuration/export-requirements.ps1
Log in to your Azure account:
az login
- Create a `terraform.tfvars` file based on the sample:
cp terraform.tfvars.sample terraform.tfvars
- Edit the `terraform.tfvars` file with your subscription and tenant IDs:
subscription_id = "your-subscription-id"
tenant_id = "your-tenant-id"
- Initialize Terraform:
terraform init
- Create an execution plan:
terraform plan -out=tfplan
- Apply the configuration:
terraform apply -var-file="terraform.tfvars"
To import existing resources into Terraform state:
./configuration/update-tf.ps1
The API provides several endpoints for managing judges and performing evaluations:
- `GET /list-judges` - List all judges
- `POST /create-judge` - Create a new judge
- `PUT /update-judge/{judge_id}` - Update an existing judge
- `DELETE /delete-judge/{judge_id}` - Delete a judge
- `GET /list-assemblies` - List all judge assemblies
- `POST /create-assembly` - Create a new assembly
- `PUT /update-assembly/{assembly_id}` - Update an existing assembly
- `DELETE /delete-assembly/{assembly_id}` - Delete an assembly
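For example, a judge and an assembly could be created over HTTP roughly as follows. The payload fields and the response shape here are assumptions about the request schema, and the base URL is a placeholder; consult `src/app/schemas/` for the real models.

```python
# Hypothetical client calls; the payload fields and response shape are
# assumptions about the request schema, not the project's actual API contract.
import requests

BASE_URL = "http://localhost:8000"  # placeholder for your deployment URL

judge = requests.post(f"{BASE_URL}/create-judge", json={
    "name": "accuracy-judge",
    "criteria": "Is the response factually accurate?",
}).json()

assembly = requests.post(f"{BASE_URL}/create-assembly", json={
    "name": "quality-assembly",
    "judge_ids": [judge["id"]],  # assumes the create response returns an id
}).json()
```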
- `POST /evaluate` - Execute a judging evaluation with a request body such as:
{
"id": "assembly_id",
"prompt": "The content to evaluate",
"method": "assembly"
}
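A call to this endpoint might look like the following sketch; the base URL and the response handling are assumptions.

```python
import requests

BASE_URL = "http://localhost:8000"  # placeholder for your deployment URL

response = requests.post(f"{BASE_URL}/evaluate", json={
    "id": "assembly_id",
    "prompt": "The content to evaluate",
    "method": "assembly",
})
response.raise_for_status()
print(response.json())  # the evaluation produced by the assembly
```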
- `src/app/` - Main application code
  - `judges.py` - Judge implementation and orchestration
  - `main.py` - FastAPI application
  - `plugins/` - Additional functionality plugins
  - `schemas/` - Data models and validation
- `src/tests/` - Test cases
- `configuration/` - Setup and configuration scripts
- `terraform.tf` - Infrastructure as code
cd src
poetry run pytest
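As a hypothetical example of what a test under `src/tests/` might look like, using FastAPI's test client; the import path assumes `main.py` exposes `app` and that pytest runs from `src`, as in the commands above.

```python
# Hypothetical test; the import path assumes main.py exposes `app` and
# that pytest runs from src/, as in the commands above.
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_list_judges_returns_ok():
    response = client.get("/list-judges")
    assert response.status_code == 200
```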
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.