
Summer Contributions - LLM as a Judge

June 10, 2025 by Jeff Haynie

Community Contributions

First, a little about Joel:

Joel (@joel13samuel) is a student at the University of Florida who is working with us this summer.

He is a rising junior at the University of Florida studying Computer Science. Joel is currently involved with Florida Community Innovation (FCI), a small nonprofit that connects financially struggling individuals with local resources. At FCI, Joel contributed to front-end development. Outside of programming, Joel loves to play volleyball and is on the men's indoor volleyball club team at school. He also loves spending time going on hikes or chilling with friends at the springs.

Also, he's been with us at a few conferences and let me tell you: he's got the Agentuity groove going on pretty well!


He just committed an example of a common and important pattern in the agentic world: LLM as a Judge.

Check it out here


In Joel's own words, here are the details:

What It Does

This project uses Agentuity to create a multi-agent system where one AI agent (ContentWriter) creates blog posts, and another AI agent (Jury) evaluates them on multiple criteria using different AI models.

How It Works

  • The ContentWriter agent receives a topic and uses OpenAI to generate a high-quality blog post
  • The Jury agent receives the blog post and evaluates it using multiple specialized "judge" LLMs
  • Each judge evaluates the blog post on specific criteria and provides scores out of 10
  • The Jury agent combines all evaluations and returns a comprehensive assessment
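The flow above can be sketched in a few lines of TypeScript. This is only an illustration of the pattern, not the code from Joel's repo: `writePost`, `stubJudge`, `runJury`, and `runWorkflow` are hypothetical names, the judges return fixed scores instead of calling a model, and nothing here uses the Agentuity SDK or Mastra.

```typescript
// Hypothetical sketch of the ContentWriter -> Jury flow.
// All names below are illustrative assumptions, not the repo's actual API.
type Criterion = "clarity" | "structure" | "engagement" | "technicalAccuracy";
type Scores = Record<Criterion, number>;

interface Verdict {
  judge: string;
  scores: Scores; // each score is out of 10
}

// A judge receives a blog post and resolves to its verdict.
type Judge = (post: string) => Promise<Verdict>;

// Stand-in for an LLM-backed judge: returns a fixed score so the sketch runs offline.
function stubJudge(judge: string, score: number): Judge {
  return async (_post: string) => ({
    judge,
    scores: { clarity: score, structure: score, engagement: score, technicalAccuracy: score },
  });
}

// Stand-in for the ContentWriter agent (a real one would call a model).
async function writePost(topic: string): Promise<string> {
  return `A post about ${topic}\n\nIntroduction, body, conclusion.`;
}

// Jury: send the post to every judge in parallel and collect the verdicts in order.
async function runJury(post: string, judges: Judge[]): Promise<Verdict[]> {
  return Promise.all(judges.map((judge) => judge(post)));
}

// Full workflow: generate a post, then have the jury evaluate it.
async function runWorkflow(topic: string, judges: Judge[]) {
  const post = await writePost(topic);
  const verdicts = await runJury(post, judges);
  return { post, verdicts };
}
```

Running the judges with `Promise.all` keeps them independent of one another, so no judge sees another's scores before giving its own.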

Agent details

ContentWriter

Uses the Mastra framework with OpenAI's gpt-4o-mini model to generate blog posts with:

  • Engaging titles
  • Clear introductions
  • Well-organized body paragraphs with subheadings
  • Strong conclusions

Jury

A multi-model evaluation system that provides balanced assessment using:

Default Models:

  • GPT-4o Mini: Precise and thorough evaluator
  • GPT-4o: Critical and detailed evaluator focused on technical merits
  • Claude: Pretty cool model I can't lie

Evaluation Criteria:

  • Clarity
  • Structure
  • Engagement
  • Technical accuracy
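How the Jury combines the judges' scores isn't spelled out in this post. One simple option, shown here purely as an assumption, is to average each criterion across judges and then average those results into an overall score. The `CRITERIA` list and `averageScores` function are hypothetical names for illustration.

```typescript
// Hypothetical aggregation sketch: averaging is an assumption, not the repo's confirmed method.
const CRITERIA = ["clarity", "structure", "engagement", "technicalAccuracy"] as const;
type Criterion = (typeof CRITERIA)[number];
type Scores = Record<Criterion, number>; // each score is out of 10

function averageScores(all: Scores[]): { perCriterion: Scores; overall: number } {
  if (all.length === 0) {
    throw new Error("need scores from at least one judge");
  }
  const perCriterion = {} as Scores;
  for (const criterion of CRITERIA) {
    // Mean of this criterion across all judges.
    const sum = all.reduce((acc, scores) => acc + scores[criterion], 0);
    perCriterion[criterion] = sum / all.length;
  }
  // Overall score: unweighted mean of the per-criterion averages.
  const overall =
    CRITERIA.reduce((acc, criterion) => acc + perCriterion[criterion], 0) / CRITERIA.length;
  return { perCriterion, overall };
}
```

An unweighted mean treats every judge and every criterion equally; weighting a stricter judge or a more important criterion would be a small change to the two `reduce` calls.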

How to Use It

Check the README for more details and to make it your own on Agentuity. It's as simple as:

  • Clone the repo
  • agentuity project import
  • agentuity deploy

Using the agent

  1. Open the DevMode URL provided when you start agentuity dev
  2. Generate Content: Select ContentWriter agent → Enter a topic → Get blog post
  3. Evaluate Content: Select Jury agent → Paste blog post → Get detailed evaluation

Via the test CLI client

# Generate a blog post on a topic
bun run index.ts ContentWriter "artificial intelligence"

# Evaluate a blog post
bun run index.ts Jury "Your blog post content here..."

# Run the full workflow (ContentWriter -> Jury)
bun run index.ts workflow "technology trends"

Community Spotlight


Want to contribute to our summer series? Share your Agentuity projects with us on Discord or tag us on social media.