Red Queen Gödel Machine: An Evolutionary Framework for Self-Improving AI with Dynamic Evaluation
[Submitted on 24 Jun 2026]
Summary
This paper introduces the Red Queen Gödel Machine (RQGM), an evolutionary framework for recursive self-improvement of AI agents under non-stationary evaluation criteria. Prior self-improving agents assume fixed benchmarks or verifiers. RQGM instead lets the evaluation utility evolve alongside the agent: the search is organized into epochs, the criteria stay fixed within an epoch, and the objectives are updated at epoch boundaries. The framework is tested in three domains: (1) verifiable coding tasks, where it improves test pass rate over the prior state of the art while using 1.35x-1.72x fewer tokens; (2) scientific paper writing and reviewing, where co-evolved writers achieve 1.78x-1.86x higher acceptance rates and co-evolved graders reach 9% higher ground-truth accuracy; and (3) Olympiad-level proof writing and grading. Notably, the strongest baseline reviewer over-accepts AI-generated papers; RQGM corrects this bias with an adversarial objective that discovers reviewers equally stringent on AI and human work.
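The epoch structure described in the summary can be illustrated with a toy loop. This is only a minimal sketch under strong assumptions, not the paper's algorithm: the "agent" is a single number, the utility is a hypothetical distance-to-target function, and the utility changes follow a fixed schedule rather than co-evolving with the agent. The names `make_utility` and `run_epochs` are invented for illustration.

```python
from typing import Callable, List

Utility = Callable[[float], float]


def make_utility(target: float) -> Utility:
    """Hypothetical utility: higher is better as the agent nears `target`."""
    return lambda x: -abs(x - target)


def run_epochs(agent: float, targets: List[float],
               steps_per_epoch: int, step: float = 0.5) -> List[float]:
    """Return the agent's state at the end of each epoch."""
    history = []
    for target in targets:
        # Epoch boundary: the evaluation utility is replaced.
        utility = make_utility(target)
        for _ in range(steps_per_epoch):
            # Within the epoch the criteria are fixed; propose simple
            # variants of the agent and keep whichever scores best.
            candidates = [agent - step, agent, agent + step]
            agent = max(candidates, key=utility)
        history.append(agent)
    return history


if __name__ == "__main__":
    # Two epochs with different (assumed) objectives.
    print(run_epochs(0.0, [2.0, -1.0], steps_per_epoch=6))
```

The point of the sketch is the control flow: selection inside an epoch always uses one fixed utility, and an agent that was optimal under the previous utility has to adapt again once the boundary is crossed.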
Key quotes
We aim to bring the same principle to recursive self-improvement, making evaluation part of the improvement loop and opening search to evolving evaluators, adversarial objectives, and dynamic utilities that may surpass static benchmarks.
The RQGM improves test pass rate over the prior SOTA by adding a complementary agent-as-a-judge code-review signal. This signal is cheaper and the RQGM uses 1.35x-1.72x fewer tokens.
Co-evolved writers reach 1.78x-1.86x higher acceptance rates under a diverse agent-as-a-judge panel, while co-evolved graders reach 9% higher ground-truth accuracy.
The strongest baseline reviewer over-accepts AI-generated papers at up to 1.91x the human rate. The RQGM corrects this by introducing an adversarial objective that discovers reviewers equally stringent on AI and human work.
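To make the over-acceptance figure concrete, the following sketch shows one way such a ratio could be computed from a reviewer's accept/reject decisions. It is an assumption-laden illustration: the function names are hypothetical, the decision lists are made-up data rather than results from the paper, and the `stringency_gap` penalty is just one plausible form an "equal stringency" objective might take; the quotes do not specify the actual objective.

```python
from typing import Sequence


def acceptance_rate(decisions: Sequence[bool]) -> float:
    """Fraction of papers accepted (True = accept)."""
    return sum(decisions) / len(decisions)


def over_acceptance_ratio(ai: Sequence[bool], human: Sequence[bool]) -> float:
    """AI acceptance rate divided by human acceptance rate."""
    return acceptance_rate(ai) / acceptance_rate(human)


def stringency_gap(ai: Sequence[bool], human: Sequence[bool]) -> float:
    """Assumed penalty: 0.0 when a reviewer treats both groups alike."""
    return abs(acceptance_rate(ai) - acceptance_rate(human))


if __name__ == "__main__":
    # Made-up decisions for illustration only.
    ai_decisions = [True, True, True, False]       # 3 of 4 accepted
    human_decisions = [True, True, False, False]   # 2 of 4 accepted
    print(over_acceptance_ratio(ai_decisions, human_decisions))
    print(stringency_gap(ai_decisions, human_decisions))
```

Under this reading, a ratio of 1.0 (equivalently, a gap of 0.0) would mean the reviewer is equally stringent on both groups, and the 1.91x figure quoted above would correspond to a ratio of 1.91.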