microsoft/llm-as-judge: A repository that uses LLMs to evaluate other LLMs in a structured way

LLM as Judge

LLM as Judge is a framework for the pattern in which Large Language Models (LLMs) act as judges that evaluate content, responses, or code against defined criteria. The project provides a complete solution: API endpoints, Azure deployment support, and configurable judge assemblies.

Features

  • Judge Orchestration: Create and manage multiple judges with different evaluation criteria
  • Assembly System: Group judges into assemblies for comprehensive evaluations
  • FastAPI Backend: RESTful API for managing judges and performing evaluations
  • Azure Integration: Built-in support for Azure Cognitive Services, Cosmos DB, and Container Apps
  • Statistical Analysis: Includes plugins for statistical analysis and data operations
  • Infrastructure as Code: Terraform support for easy deployment to Azure

Architecture

The system consists of:

  1. Judges: Individual LLM instances configured with specific evaluation criteria
  2. Assemblies: Collections of judges that work together to provide comprehensive evaluations
  3. API Layer: FastAPI application for managing judges and assemblies, and processing evaluation requests
  4. Storage Layer: Azure Cosmos DB for storing judge configurations and evaluation results
  5. Plugins: Extensions for statistical analysis, data processing, and other functions

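The relationship between judges and assemblies can be sketched with minimal data models. The names and fields below are illustrative, not the project's actual schema (which lives in `src/app/schemas/`):

```python
from dataclasses import dataclass, field


@dataclass
class Judge:
    """A single LLM evaluator configured with one set of criteria."""
    judge_id: str
    criteria: str
    model: str = "gpt-4o"  # hypothetical default model deployment name


@dataclass
class Assembly:
    """A named collection of judges that are evaluated together."""
    assembly_id: str
    judges: list[Judge] = field(default_factory=list)


clarity = Judge("clarity", "Is the response clear and well structured?")
accuracy = Judge("accuracy", "Is the response factually correct?")
assembly = Assembly("qa-review", [clarity, accuracy])
print(len(assembly.judges))  # 2
```

In this sketch, each judge carries exactly one criterion, so an assembly's coverage is just the union of its judges' criteria.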
Judge Visualization

The LLM as Judge framework offers two distinct visual approaches:

Judge Assembly

This approach groups multiple specialized judges into assemblies that handle different evaluation criteria. It promotes modularity, allowing tailored assemblies for various tasks.

Judge Assembly Diagram

Super Judge

The Super Judge model centralizes the evaluation process within a single, advanced judge. It streamlines management while integrating multiple evaluation strategies.

Super Judge Diagram

Each approach presents unique advantages. The Judge Assembly enables flexible scalability and specialization, whereas the Super Judge focuses on simplicity and unified decision-making.
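The two orchestration styles can be contrasted in a short sketch. `call_llm` is a stub standing in for whatever model client the deployment actually uses; the prompts and aggregation are assumptions, not the project's implementation:

```python
def call_llm(prompt: str) -> float:
    """Placeholder for a real LLM call; returns a score in [0, 1]."""
    return 0.5  # stub value so the sketch runs


def evaluate_with_assembly(content: str, criteria: list[str]) -> float:
    # Judge Assembly: one call per specialized judge, then aggregate.
    scores = [call_llm(f"Score against '{c}': {content}") for c in criteria]
    return sum(scores) / len(scores)


def evaluate_with_super_judge(content: str, criteria: list[str]) -> float:
    # Super Judge: a single call that folds every criterion into one prompt.
    merged = "; ".join(criteria)
    return call_llm(f"Score against all of [{merged}]: {content}")
```

The assembly variant costs one model call per criterion but lets each judge be tuned independently; the super-judge variant makes a single call and relies on one prompt to cover everything.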

Getting Started

Prerequisites

  • Python 3.12 or higher
  • Poetry for dependency management
  • Azure account with necessary permissions
  • Azure CLI installed
  • Terraform installed

Local Setup

  1. Clone the repository:

     ```shell
     git clone https://github.com/your-username/llm-as-judge.git
     cd llm-as-judge
     ```

  2. Configure the environment:

     ```powershell
     ./configuration/conf-env.ps1
     ```

  3. Run linters to verify code quality:

     ```powershell
     ./configuration/linters.ps1
     ```

  4. Export requirements (if needed):

     ```powershell
     ./configuration/export-requirements.ps1
     ```

Provisioning Infrastructure with Terraform

Authentication

Log in to your Azure account:

```shell
az login
```

Configuration

  1. Create a `terraform.tfvars` file based on the sample:

     ```shell
     cp terraform.tfvars.sample terraform.tfvars
     ```

  2. Edit the `terraform.tfvars` file with your subscription and tenant IDs:

     ```hcl
     subscription_id = "your-subscription-id"
     tenant_id       = "your-tenant-id"
     ```

Deployment

  1. Initialize Terraform:

     ```shell
     terraform init
     ```

  2. Create an execution plan:

     ```shell
     terraform plan -out=tfplan
     ```

  3. Apply the configuration:

     ```shell
     terraform apply -var-file="terraform.tfvars"
     ```

Updating Existing Resources

To import existing resources into Terraform state:

```powershell
./configuration/update-tf.ps1
```

Using the API

The API provides several endpoints for managing judges and performing evaluations:

Judge Configuration

  • GET /list-judges - List all judges
  • POST /create-judge - Create a new judge
  • PUT /update-judge/{judge_id} - Update an existing judge
  • DELETE /delete-judge/{judge_id} - Delete a judge

Assembly Configuration

  • GET /list-assemblies - List all judge assemblies
  • POST /create-assembly - Create a new assembly
  • PUT /update-assembly/{assembly_id} - Update an existing assembly
  • DELETE /delete-assembly/{assembly_id} - Delete an assembly

Judge Execution

  • POST /evaluate - Execute a judging evaluation

Example Evaluation Request

```json
{
  "id": "assembly_id",
  "prompt": "The content to evaluate",
  "method": "assembly"
}
```
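A request like the one above could be sent with Python's standard library once the API is running locally. The host and port are assumptions (8000 is the default when serving FastAPI with uvicorn):

```python
import json
import urllib.request

payload = {
    "id": "assembly_id",
    "prompt": "The content to evaluate",
    "method": "assembly",
}

# Build a POST request against the assumed local address of the API.
req = urllib.request.Request(
    "http://localhost:8000/evaluate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

The response shape depends on the assembly's judges; check the schemas in `src/app/schemas/` for the exact fields.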

Development

Project Structure

  • src/app/ - Main application code
    • judges.py - Judge implementation and orchestration
    • main.py - FastAPI application
    • plugins/ - Additional functionality plugins
    • schemas/ - Data models and validation
  • src/tests/ - Test cases
  • configuration/ - Setup and configuration scripts
  • terraform.tf - Infrastructure as code

Running Tests

```shell
cd src
poetry run pytest
```

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
