evals

Here are 16 public repositories matching this topic...

METR / vivaria

Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

ai elicitation ai-evaluation evals

Updated Nov 6, 2024
TypeScript

lmnr-ai / lmnr

Laminar - open-source all-in-one platform for engineering AI products. Traces, Evals, Datasets, Labels. YC S24.

open-source ai monitoring analytics evaluation self-hosted rust-lang developer-tools agents observability pipeline-builder aiops rag ai-observability llmops evals llm-evaluation llm-observability llm-workflow

Updated Nov 6, 2024
TypeScript

AgentOps-AI / agentops

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen

agent ai openai evaluation-metrics mistral cost-estimation autogen groq agentops llm langchain anthropic evals ollama crewai

Updated Nov 6, 2024
Python

modelmetry / modelmetry-sdk-js

The Modelmetry JS/TS SDK allows developers to easily integrate Modelmetry’s advanced guardrails and monitoring capabilities into their LLM-powered applications.

monitoring tracing observability guardrails ai-observability large-language-models llm evals llm-observability

Updated Nov 1, 2024
TypeScript

superlinear-ai / raglite

🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite

Updated Nov 6, 2024
Python

NirantK / rag-to-riches

search rag evals

Updated Oct 19, 2024
Jupyter Notebook

dustalov / evalica

Evalica, your favourite evaluation toolkit

Updated Oct 11, 2024
Python

openlayer-ai / templates

Our curated collection of templates. Use these patterns to set up your AI projects for evaluation with Openlayer.

ai examples evals

Updated Sep 25, 2024
Python

modelmetry / modelmetry-sdk-python

The Modelmetry Python SDK allows developers to easily integrate Modelmetry’s advanced guardrails and monitoring capabilities into their LLM-powered applications.

monitoring openai observability guardrails ai-observability large-language-models llm llmops evals llm-evaluation

Updated Aug 24, 2024
Python

AIAnytime / rag-evaluator

A library for evaluating Retrieval-Augmented Generation (RAG) systems (the traditional ways).

eval rag evals

Updated Aug 10, 2024
Python

noah-art3mis / crucible

Develop better LLM apps by testing different models and prompts in bulk.

ai llm prompt-engineering evals

Updated Jul 29, 2024
Python

lennart-finke / picturebooks

Which objects are visible through the holes in a picture book? This visual task is easy for adults, doable for primary schoolers, but hard for vision transformers.

inspect vision-transformer evals

Updated Jul 26, 2024
Jupyter Notebook

camronh / ContextLength-Experiment

Gemini 1.5 Million Token Context Experiment

llm evals gemini-flash

Updated Jun 23, 2024
Jupyter Notebook

gokayfem / dspy-ollama-colab

DSPy with Ollama and llama.cpp on Google Colab

evaluation agents vlm colab-notebook dspy llm llamacpp evals ollama

Updated May 23, 2024
Jupyter Notebook

zeus-fyi / mockingbird

Mockingbird Front End Code | Zeus + SciFi = Power of the gods (cloud + ai | Zeus) Meets the power of SciFi (human ingenuity | SfYi) At the intersection of intelligent design (systems engineering excellence) For your intelligence (ZeusFYI).

react redux control ai evals

Updated Apr 10, 2024
TypeScript

nstankov-bg / oaievals-collector

Star

The OAIEvals Collector: A robust, Go-based metric collector for EVALS data. Supports Kafka, Elastic, Loki, InfluxDB, TimescaleDB integrations, and containerized deployment with Docker. Streamlines OAI-Evals data management efficiently with a low barrier of entry!

go docker devops openai chatgpt evals

Updated Oct 26, 2023
Go
