Summer Contributions - Evals

June 9, 2025 by Jeff Haynie

Community Contributions

Here's another great summer contribution from Jason Walkow.

What It Does

Jason's evaluation system uses multiple specialized agents to create a robust, scalable framework for testing AI models against ground truth datasets. Each agent has a discrete task and can pass information to others through the Agentuity key-value store.
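To make that handoff concrete, here is a minimal sketch of three single-purpose agents passing work through a shared key-value store. It is an illustration only, not code from Jason's repo: `InMemoryKV` is a hypothetical in-memory stand-in rather than the Agentuity key-value API, and the agent names, key layout, and `toy_model` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryKV:
    """Hypothetical stand-in for a key-value store; NOT the Agentuity API."""
    data: dict = field(default_factory=dict)

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data[key]


def dataset_agent(kv, run_id, rows):
    # Task 1: publish the ground-truth dataset for this run.
    kv.set(f"{run_id}:dataset", rows)


def inference_agent(kv, run_id, model):
    # Task 2: read the dataset, call the model, store its predictions.
    rows = kv.get(f"{run_id}:dataset")
    kv.set(f"{run_id}:predictions", [model(row["input"]) for row in rows])


def scoring_agent(kv, run_id):
    # Task 3: compare predictions with expected answers and store a summary.
    rows = kv.get(f"{run_id}:dataset")
    preds = kv.get(f"{run_id}:predictions")
    correct = sum(p == r["expected"] for p, r in zip(preds, rows))
    summary = {
        "total": len(rows),
        "correct": correct,
        "accuracy": correct / len(rows) if rows else 0.0,
    }
    kv.set(f"{run_id}:summary", summary)
    return summary


def toy_model(text):
    # Placeholder "model" (an assumption for this sketch): upper-cases its input.
    return text.upper()


kv = InMemoryKV()
rows = [
    {"input": "a", "expected": "A"},
    {"input": "b", "expected": "B"},
    {"input": "c", "expected": "x"},  # deliberately mismatched ground truth
    {"input": "d", "expected": "D"},
]
dataset_agent(kv, "run-1", rows)
inference_agent(kv, "run-1", toy_model)
summary = scoring_agent(kv, "run-1")
print(summary)
```

Each function only knows the keys it reads and writes, so in a sketch like this any one agent could be swapped out or rerun without touching the others.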

The project also includes a modern React-based web interface that provides:

  • Results Dashboard: View evaluation summaries, detailed results, and real-time progress
  • Dataset Management: Upload and manage evaluation datasets
  • Evaluation Configuration: Set up new evaluations with custom parameters
  • Settings Management: Configure API endpoints and authentication

Get it here: https://github.com/jsw324/evals

How to Use It

Check the README for details on how to make it your own on Agentuity. It's as simple as:

  • Clone the repo
  • Run agentuity project import
  • Follow the README to run both the agents and the front end

Community Spotlight

Jason Walkow's eval system is a robust, modular framework for testing AI models. It combines specialized agents, a modern React UI, and Agentuity's infrastructure to make large-scale evaluation simple and transparent.

Want to contribute to our summer series? Share your Agentuity projects with us on Discord or tag us on social media.