Introduction to agrepl

agrepl (Agent Replay) is a CLI tool and library for debugging, testing, and monitoring AI agents. It acts as a transparent proxy between your AI agent and LLM providers (such as OpenAI, Anthropic, or local models), recording all interactions for later analysis and replay.

Key Features

  • Transparent Proxying: Intercepts HTTP/LLM calls with zero configuration for most setups.
  • Recording & Replay: Capture agent sessions and replay them exactly as they occurred.
  • Deterministic Testing: Use recorded responses to test agent behavior without hitting live APIs.
  • Visual Diffing: Compare runs to see how changes in your agent’s code or prompts affect LLM interactions.
  • Remote Storage: Push and pull recorded runs to a central server for team collaboration.
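
To make the record-and-replay idea concrete, here is a minimal conceptual sketch in Python. It is not agrepl's API: the ReplayCache class, its call method, and the fake_provider function are hypothetical names invented for illustration, and agrepl's actual proxy may work quite differently. The sketch only shows the general technique of keying each request by a hash and serving the stored response on replay, so a test never reaches a live provider.

```python
import hashlib
import json
from typing import Callable, Dict


class ReplayCache:
    """Hypothetical record/replay store (illustration only, not agrepl's API)."""

    def __init__(self, mode: str = "record") -> None:
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.mode = mode
        self.store: Dict[str, dict] = {}

    @staticmethod
    def _key(request: dict) -> str:
        # Serialize with sorted keys so equal requests always hash the same way.
        canonical = json.dumps(request, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def call(self, request: dict, live_call: Callable[[dict], dict]) -> dict:
        key = self._key(request)
        if self.mode == "replay":
            # Replay never touches the live provider; unknown requests fail loudly.
            if key not in self.store:
                raise KeyError("no recorded response for this request")
            return self.store[key]
        # Record mode: forward the request and remember the response.
        response = live_call(request)
        self.store[key] = response
        return response


def fake_provider(request: dict) -> dict:
    # Stand-in for a real LLM provider call (assumption for this sketch).
    return {"text": "echo: " + request["prompt"]}


def must_not_be_called(request: dict) -> dict:
    raise RuntimeError("live provider called during replay")


cache = ReplayCache(mode="record")
recorded = cache.call({"model": "demo", "prompt": "hi"}, fake_provider)

cache.mode = "replay"
replayed = cache.call({"prompt": "hi", "model": "demo"}, must_not_be_called)
```

In this sketch the replayed response is identical to the recorded one even though the live function would raise if invoked, which is the property that makes tests against recorded sessions deterministic.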

Why use agrepl?

Testing AI agents is notoriously difficult because LLM output is stochastic: the same prompt can produce different responses from one run to the next. agrepl addresses this by giving you a reliable way to capture “what happened” in a run and “what changed” between runs. Whether you’re debugging a complex multi-step reasoning loop or regression-testing a new prompt, agrepl provides the visibility you need.