Strategic Comparison
agrepl vs. The Orbit
Observability is for watching. agrepl is for reproducing.
| Feature | Observability (LangSmith) | Mocking (VCR.py) | agrepl |
|---|---|---|---|
| Primary Goal | Tracing & Evaluation | HTTP Mocking | Deterministic Replay |
| Instrumentation | Requires SDK / Code changes | Library-specific | Zero-instrumentation (CLI) |
| Network Logic | Passes through (Real calls) | Stubs response | Frozen local state |
| Team Sync | Cloud Dashboard | Manual file sharing | share → pull → replay |
| Determinism | Partial (logs only) | High (for HTTP) | Strict (System-wide) |
vs. Observability (LangSmith, Helicone)
Observability tools are dashboards for what happened. They are great for analytics, but they don't let you relive the execution. When an agent drifts, seeing a trace is not enough: you need the exact raw response that caused the logic to fail, and a way to run the agent against it again.
The Pivot: agrepl is a debugger, not a dashboard.
vs. API Mocking (VCR.py, Polly.js)
API mocking was built for unit tests. Agents are more complex: they are multi-step, stateful, and non-deterministic. agrepl treats the entire execution as a single "run" that can be shared and replayed across machines.
The Pivot: agrepl captures the agent's journey, not just the API calls.
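To make the idea of replaying a whole run concrete, here is a minimal Python sketch of the general record-then-replay technique. It is not agrepl's implementation or API: `RunRecorder`, `RunReplayer`, `agent`, and `flaky_model` are hypothetical names invented for illustration, and the "model" is a random stub standing in for a non-deterministic LLM call.

```python
import json
import random
from typing import Callable, Dict, List


class RunRecorder:
    """Hypothetical helper: records every step of one run, in order."""

    def __init__(self) -> None:
        self.steps: List[Dict[str, str]] = []

    def record(self, call: Callable[[str], str], request: str) -> str:
        # Make the real call and keep the raw request/response pair.
        response = call(request)
        self.steps.append({"request": request, "response": response})
        return response

    def dump(self) -> str:
        # Serialize the whole run so it can be handed to another machine.
        return json.dumps(self.steps)


class RunReplayer:
    """Hypothetical helper: serves recorded responses back in order."""

    def __init__(self, serialized: str) -> None:
        self.steps = json.loads(serialized)
        self.cursor = 0

    def replay(self, request: str) -> str:
        step = self.steps[self.cursor]
        # Fail loudly if the agent diverges from the recorded run.
        if step["request"] != request:
            raise RuntimeError(f"divergence at step {self.cursor}: {request!r}")
        self.cursor += 1
        return step["response"]


def flaky_model(prompt: str) -> str:
    # Stand-in for a non-deterministic model call (assumption for the demo).
    return f"{prompt} -> {random.randint(0, 10**6)}"


def agent(call: Callable[[str], str]) -> str:
    # A two-step, stateful agent: step 2 depends on the output of step 1.
    plan = call("plan: book a flight")
    return call(f"execute: {plan}")


# Record a live run, then replay it from the serialized form.
recorder = RunRecorder()
live_result = agent(lambda prompt: recorder.record(flaky_model, prompt))

replayer = RunReplayer(recorder.dump())
replayed_result = agent(replayer.replay)
```

Because the second step's request is built from the first step's response, the replay only succeeds if the steps are served back in the recorded order. That ordering is the difference between capturing a run and stubbing individual HTTP responses.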