Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
Updated Nov 6, 2024 - TypeScript
Laminar - open-source all-in-one platform for engineering AI products. Traces, Evals, Datasets, Labels. YC S24.
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, such as CrewAI, LangChain, and AutoGen.
The Modelmetry JS/TS SDK allows developers to easily integrate Modelmetry’s advanced guardrails and monitoring capabilities into their LLM-powered applications.
🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite
Evalica, your favourite evaluation toolkit
The Modelmetry Python SDK allows developers to easily integrate Modelmetry’s advanced guardrails and monitoring capabilities into their LLM-powered applications.
Develop better LLM apps by testing different models and prompts in bulk.
Which objects are visible through the holes in a picture book? This visual task is easy for adults, doable for primary schoolers, but hard for vision transformers.
Gemini 1.5 Million Token Context Experiment
DSPy with Ollama and llama.cpp on Google Colab
The OAIEvals Collector: A robust, Go-based metric collector for EVALS data. Supports Kafka, Elastic, Loki, InfluxDB, TimescaleDB integrations, and containerized deployment with Docker. Streamlines OAI-Evals data management efficiently with a low barrier of entry!