Retrace: A debugging tool for AI agents that replays and forks executions to identify regressions
By Yashwanth
Summary
Retrace is a debugging tool for AI agents that lets users record, replay, fork, and share agent executions. It captures every LLM call, tool invocation, and error. The central problem is that a replayed or forked run often produces different results from the original because of provider non-determinism, which makes real regressions hard to tell apart from random variation. Retrace currently shows a first-divergence diff with a verdict (improved, regressed, or unchanged), and the author is asking the community how to handle this better.
Key quotes
When you replay or fork a run in Retrace, the steps before the fork come from the recording, but everything after runs live against the model.
So two runs of the same input rarely match exactly, even when nothing actually broke.
when a replay diverges, is it a real regression from your change, or just provider non-determinism?
Retrace currently shows a first-divergence diff and a verdict of improved, regressed, or unchanged
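
The first-divergence idea from the quotes can be sketched roughly as follows. This is a minimal illustration, not Retrace's actual implementation: the `Step` record, its field names, and the `first_divergence` function are all hypothetical, and the comparison is a plain exact match.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Step:
    """One recorded event in a run. The fields are hypothetical, not Retrace's schema."""
    kind: str    # e.g. "llm_call", "tool_call", "error"
    name: str    # model or tool name
    output: str  # text the step produced


def first_divergence(baseline: Sequence[Step], replay: Sequence[Step]) -> Optional[int]:
    """Return the index of the first step where the two runs differ, or None if they match."""
    for i, (a, b) in enumerate(zip(baseline, replay)):
        if a != b:
            return i
    # One run is a strict prefix of the other: they diverge where the shorter one ends.
    if len(baseline) != len(replay):
        return min(len(baseline), len(replay))
    return None


baseline = [
    Step("llm_call", "planner", "search the docs"),
    Step("tool_call", "search", "3 results"),
    Step("llm_call", "planner", "answer: 42"),
]
replay = [
    Step("llm_call", "planner", "search the docs"),
    Step("tool_call", "search", "3 results"),
    Step("llm_call", "planner", "answer: forty-two"),
]

idx = first_divergence(baseline, replay)
print(f"first divergence at step {idx}")
```

Exact matching flags the last step here even though the two answers mean the same thing. That is the problem the quotes describe: a diff alone cannot say whether a divergence is a regression or just a different sampling of the model.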