Evals
5 articles

January 30, 2026 Update
OpenCode plugin with AI agent teams, complete queue system, sandbox snapshots, TTL support across services, eval lifecycle hooks, and 12 releases' worth of improvements.

January 7, 2026 by Agentuity
Agentuity v1 Reaches Beta
Agentuity v1 reaches beta with sandbox infrastructure, SSH support, type-safe RPC, built-in auth, and a first-class evaluations system.

June 10, 2025 by Jeff Haynie
Summer Contributions - LLM as a Judge
Joel, a student at the University of Florida, takes LLM as a Judge and runs with it in this great pattern example built on Agentuity.

June 9, 2025 by Jeff Haynie
Summer Contributions - Evals
This evaluation system uses multiple specialized agents to create a robust, scalable framework for testing AI models against ground truth datasets. Each agent has a discrete task and can pass information to others through the Agentuity key-value store.

May 29, 2025 by Bobby Christopher
Collider: Our AI Gateway Testing With Intelligent Automation
How Agentuity built Collider — an AI-powered testing framework that validates AI gateway integrations across models and runtimes, then auto-triages failures.