
We're excited to announce that Agentuity v1 has reached beta. Since launching the public preview in early December, the team has been heads-down building, and this release marks a major step toward our official launch. It includes:
- Sandbox infrastructure for isolated code execution in any language
- SSH access to running agents for live debugging
- Framework-agnostic frontend with type-safe RPC
- Built-in authentication powered by BetterAuth
- First-class evaluations to test agent quality
- And much more!
Here's a look at some of the highlights.
Sandbox Infrastructure
Sandboxes give your agents isolated Linux containers with their own filesystem, network, and configurable resources. Your agents are written in TypeScript, but sandboxes can run any language: Python, Node.js, shell scripts, or anything available via apt install.
Use sandboxes for code execution agents, AI coding assistants, automated testing, or validating generated code before returning it to users.
On the SDK side, sandboxes integrate naturally with your agent code. Each sandbox gets dedicated memory, CPU, and disk, all visible from the CLI.
Type-Safe Frontend and RPC
We introduced the @agentuity/frontend package and type-safe API calling in our December 19 update. Since then, we've added type-safe path and query parameters with automatic extraction from route definitions, ergonomic positional arguments for path params, and fixed optional params handling in the useAPI hook.
For the full API, check out the RPC client docs.
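To make the idea of extracting path parameters from a route definition concrete, here is a minimal sketch using TypeScript template literal types. It is an illustration of the general technique only: PathParamNames, PathParams, and buildPath are hypothetical names, not part of @agentuity/frontend, and the real client may work differently.

```typescript
// Hypothetical illustration: none of these names come from @agentuity/frontend.

// Collect every ":name" segment of a route pattern into a union of names.
type PathParamNames<Route extends string> =
  Route extends `${string}:${infer Param}/${infer Rest}`
    ? Param | PathParamNames<`/${Rest}`>
    : Route extends `${string}:${infer Param}`
      ? Param
      : never;

// The params object must supply a string for each extracted name.
type PathParams<Route extends string> = Record<PathParamNames<Route>, string>;

// Substitute each ":name" segment with its URL-encoded value.
function buildPath<Route extends string>(route: Route, params: PathParams<Route>): string {
  // Widen to a plain string-keyed record for the runtime lookup.
  const values = params as unknown as Record<string, string>;
  return route.replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, (_match, name: string) =>
    encodeURIComponent(values[name]),
  );
}

const path = buildPath("/users/:userId/posts/:postId", {
  userId: "42",
  postId: "hello world",
});
console.log(path); // /users/42/posts/hello%20world
```

With types like these, leaving out postId or misspelling userId is a compile-time error rather than a broken request at runtime.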
Built-in Authentication
The @agentuity/auth package provides first-class authentication powered by BetterAuth. It includes API key support, organization management, JWT tokens, and CLI commands for setup.
On the server side, protect your routes with middleware.
Workbench
We covered Workbench setup in our December 19 update. Since then, we've added execution metadata (token counts and duration) in thread messages, URL deep linking so you can share workbench URLs with pre-selected agents, and minification-safe schema detection for production builds. You can now also run Workbench against live deployed agents directly from the web app. Run agentuity dev and open /workbench to try it locally.
Built-in Evals Suite
New to the beta: a first-class evaluations system for testing your agents. Evals are automated tests that run after your agent completes, validating output quality and monitoring performance without blocking responses.
The @agentuity/evals package ships with 12 preset evaluations including politeness, safety, PII detection, conciseness, adversarial prompt detection, and more. You can also write custom evals for your specific use cases.
Evals come in two types:
- Binary (pass/fail) for yes/no criteria like safety checks
- Score (0-1) for quality gradients like relevance or helpfulness
Results appear in the web app, so you can track agent quality over time. No external tooling required.
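The difference between the two eval types can be sketched in plain TypeScript. The shapes and function names below (BinaryResult, ScoreResult, noEmailAddresses, conciseness) are assumptions made for illustration and are not the @agentuity/evals API; the checks themselves are deliberately crude.

```typescript
// Hypothetical result shapes for illustration; not the actual @agentuity/evals types.
type BinaryResult = { kind: "binary"; passed: boolean; reason?: string };
type ScoreResult = { kind: "score"; score: number }; // score is in the range [0, 1]
type EvalResult = BinaryResult | ScoreResult;

// Binary eval: fails if the output appears to contain an email address (a crude PII check).
function noEmailAddresses(output: string): BinaryResult {
  const found = /[^\s@]+@[^\s@]+\.[^\s@]+/.test(output);
  return found
    ? { kind: "binary", passed: false, reason: "email address detected" }
    : { kind: "binary", passed: true };
}

// Score eval: 1 at or under the word budget, falling linearly to 0 at twice the budget.
function conciseness(output: string, wordBudget = 50): ScoreResult {
  const words = output.trim().split(/\s+/).filter(Boolean).length;
  const over = Math.max(0, words - wordBudget);
  return { kind: "score", score: Math.max(0, 1 - over / wordBudget) };
}

const agentOutput = "Contact me at jane@example.com for details.";
const results: EvalResult[] = [noEmailAddresses(agentOutput), conciseness(agentOutput)];

for (const result of results) {
  console.log(
    result.kind === "binary" ? `passed=${result.passed}` : `score=${result.score.toFixed(2)}`,
  );
}
```

A binary eval suits hard requirements where any failure matters, while a score eval lets you watch a quality metric drift over time.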
SSH into Running Agents
Need to debug a live agent? You can now SSH directly into running cloud agents and sandboxes.
This makes debugging production issues much easier. No more guessing what's happening inside your agent. For setup instructions, see the debugging docs.
Explore the SDK (and More)
The new SDK Explorer is an interactive playground for learning the Agentuity platform. Browse live examples covering services (KV, Vector, Object Storage, AI Gateway), I/O patterns (streaming, SSE, durable streams), and more. Each example includes plain-language explanations of the concepts involved, plus reference code you can copy and drop into your own projects.
Also Shipping
Beyond the highlights above, here's what else we've been building:
VS Code Extension
- Agent, Data, and Deployment explorers in the sidebar
- AI chat participant for workspace-aware assistance
- Dev server lifecycle management from the IDE
- 8 new language model tools for managing agents and deployments
GitHub Integration
- agentuity git link and git unlink commands
- Connect repos for automatic deployments
- agentuity integration github connect/disconnect
Configuration System
- Next.js-style agentuity.config.ts with Vite HMR and plugin support
- Custom Bun plugins, Tailwind CSS, and compile-time constants
- All build phases respect custom configuration
New Default Template
- The agentuity create command now scaffolds a translation agent demo
- Includes AI Gateway integration, React frontend with Tailwind, thread state, and evals
- A hands-on starting point for exploring the platform
Developer Experience
- Rust-style TypeScript error formatting
- Async lazy-loaded thread state (eliminates 100-150ms latency)
- Ergonomic positional arguments for RPC path params
- Improved DNS developer experience for custom domains
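The lazy-loading idea behind the thread state item above can be sketched as follows. This is one common pattern (memoizing the load promise and only starting it on first access), shown under the assumption that it resembles what the SDK does; LazyState, threadState, and handle are hypothetical names, not SDK APIs.

```typescript
// Hypothetical sketch; not the SDK's thread state implementation.
class LazyState<T> {
  private pending: Promise<T> | undefined;
  private readonly load: () => Promise<T>;

  constructor(load: () => Promise<T>) {
    this.load = load;
  }

  // The loader runs at most once, on first access; later calls reuse the same promise.
  get(): Promise<T> {
    if (this.pending === undefined) {
      this.pending = this.load();
    }
    return this.pending;
  }
}

let loads = 0;
const threadState = new LazyState(async () => {
  loads += 1; // stands in for a storage round trip
  return new Map<string, unknown>([["locale", "en"]]);
});

async function handle(needsState: boolean): Promise<string> {
  if (!needsState) {
    return "no state read"; // the load is never started on this path
  }
  const state = await threadState.get();
  return String(state.get("locale"));
}

// A request that never touches state pays no loading cost.
void handle(false).then((reply) => console.log(reply));
```

Requests that never read thread state skip the fetch entirely, which is where a saving of the kind quoted above would come from.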
Stability
- 52 bug fixes including dev server stability, hot reload, type inference, and S3 compatibility
- Better process cleanup and stale process detection
- Terminal cursor restoration and CI-friendly output
All changes are backward compatible.
Upgrade Now
Ready to try the beta? Run:
Or if you're starting fresh:
Once you're on the beta, try out a few of the new features:
- Launch Workbench with agentuity dev and open /workbench
- List your sandboxes with agentuity cloud sandbox list
- Add evals to your agents using the @agentuity/evals package
We'd love your feedback as we work toward 1.0. Happy building!


