
Codex iteratively optimized AGENTS.md 8 times against real PRs, but the best version regressed on a clean holdout

Ben Redmond

4d ago · 11 min read · en · Insight

Bagelometer score: 95/100 (Golden Brown) · Type: analysis · Sentiment: neutral

Summary

The author describes using Codex (OpenAI's AI coding agent) to iteratively optimize their AGENTS.md file (the instructions file that guides an AI coding agent's behavior in a repository) against a benchmark built from real pull requests in their Stet repository. Codex measured each change against the benchmark over 8 iterations. The best-performing candidate improved results on the training slice but regressed on a clean holdout set, so it was not safe to ship. The article explores the tension between vibe-coded agent instructions and data-driven optimization, the risk of overfitting those instructions to the tasks used to tune them, and the importance of rigorous evaluation for agent behavior files.
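The selection rule described above can be sketched as a small gate: pick the best candidate on the training slice, then ship it only if a clean holdout agrees. This is a hypothetical illustration, not the author's harness. The `Candidate` type, the `select_and_gate` function, the pass-rate scores, and the `max_regression` tolerance are all assumptions, and the sketch presumes each AGENTS.md revision has already been scored on both task sets.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical AGENTS.md revision with its benchmark pass rates."""
    name: str
    train_score: float    # pass rate on tasks the optimizer was allowed to see
    holdout_score: float  # pass rate on tasks never used during optimization


def select_and_gate(
    baseline: Candidate,
    candidates: Sequence[Candidate],
    max_regression: float = 0.0,
) -> tuple[Candidate, bool]:
    """Hypothetical gate: choose by training score, then check the holdout.

    Returns the selected candidate and whether it is safe to ship. A candidate
    is shippable only if it beats the baseline on the training slice AND its
    holdout score is no more than `max_regression` below the baseline's.
    """
    if not candidates:
        # Nothing to evaluate: keep the current file, ship nothing new.
        return baseline, False

    # Selection looks only at training scores, so the holdout stays clean.
    best = max(candidates, key=lambda c: c.train_score)

    improved_on_train = best.train_score > baseline.train_score
    holdout_ok = best.holdout_score >= baseline.holdout_score - max_regression
    return best, improved_on_train and holdout_ok
```

The important design choice is that the holdout is read exactly once, after selection. If holdout scores were used to choose among candidates, the holdout would stop being clean, and a regression like the one in the article could go unnoticed.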

Key quotes


I vibe-coded my AGENTS.md, and I'm pretty sure it's slop.

Codex used a benchmark on my repo to measure each change, and optimized AGENTS.md against the data, instead of on pure vibes.

Someone adds a rule that sounds smart, senior, and reasonable, commits it, and hopes the agent behaves better.

The best candidate improved the training slice, then regressed enough on a clean holdout that it was not safe to ship.

Snippet from the RSS feed

Codex optimized its own AGENTS.md against real Stet repo tasks. The best candidate improved the training slice, then regressed enough on a clean holdout that it was not safe to ship.

You might also wanna read

AGENTS.md: Standardized Documentation Format for AI Agents Adopted by Major Platforms

The article introduces AGENTS.md, a standardized format for AI agents that serves as a structured alternative to human-readable README files

Product Hunt·9mo ago

How I Used Coding Agents to Automate My AI Research Work in Copilot Applied Science

An AI researcher shares their experience using coding agents to automate intellectual work, specifically building agents that automate parts

GitHub·2mo ago

OpenAI's Codex 3.0 becomes an autonomous cross-app coding agent with GPT-5.5

OpenAI's Codex 3.0, powered by GPT-5.5, has evolved into a cross-app coding agent that can autonomously navigate browsers, interact with web

Product Hunt·1mo ago

AGENTS.md: An Open Format for Guiding AI Coding Agents in Open-Source Projects

AGENTS.md is a simple, open format for guiding AI coding agents, functioning as a README specifically designed for agents rather than humans

agents.md·2d ago

OpenAI Updates Agents SDK with Codex-Style Harness and Enhanced Sandboxing

OpenAI's Build Hour session, led by engineer Steve Corley, introduced key updates to the Agents SDK, including a new "Codex-style harness" t

startuphub.ai·3d ago

Scorecard CEO warns of AI agent dangers in high-stakes domains, offers evaluation platform

Darius, CEO of Scorecard, shares a cautionary tale about building AI agents in high-stakes domains. He describes how his EMR agent for docto

Product Hunt·7mo ago