Evals

5 articles

January 30, 2026 Update

OpenCode plugin with AI agent teams, complete queue system, sandbox snapshots, TTL support across services, eval lifecycle hooks, and 12 releases' worth of improvements.

January 7, 2026 by Agentuity

Agentuity v1 Reaches Beta

Agentuity v1 reaches beta with sandbox infrastructure, SSH support, type-safe RPC, built-in auth, and a first-class evaluations system.

June 10, 2025 by Jeff Haynie

Summer Contributions - LLM as a Judge

Joel, a student at the University of Florida, takes the LLM-as-a-Judge pattern and runs with it in this great example built on Agentuity.

June 9, 2025 by Jeff Haynie

Summer Contributions - Evals

This evaluation system uses multiple specialized agents to create a robust, scalable framework for testing AI models against ground truth datasets. Each agent has a discrete task and can pass information to others through the Agentuity key-value store.

May 29, 2025 by Bobby Christopher

Collider: Our AI Gateway Testing With Intelligent Automation

How Agentuity built Collider — an AI-powered testing framework that validates AI gateway integrations across models and runtimes, then auto-triages failures.