Prompt Rewrite Boosts GPT-5-mini Performance by 22% on Tau² Benchmark
By blndrt
Summary
Researchers found that a simple prompt rewrite raised GPT-5-mini's success rate by 22% on the Tau² benchmark, which tests LLM agent capabilities. The article describes how they identified and fixed a performance bottleneck through subtle changes to agent policies, and how their benchmarks revealed a common reliability trap in small models despite their speed advantages.
Key quotes
a simple prompt rewrite boosted a small model's success rate by over 20%
we found and fixed this performance bottleneck by making subtle changes to agent policies
our benchmarks revealed a common reliability trap
Tau² benchmark, which simulates real-world agent interactions