Bidirectional Evolutionary Search: A New Framework for Self-Improving Language Models
[Submitted on 27 May 2026]
Summary
This paper introduces Bidirectional Evolutionary Search (BES), a novel search framework for self-improving language models that addresses limitations of existing methods like best-of-N sampling and tree search. BES combines forward candidate evolution (using evolution operators to recombine partial trajectories) with backward goal decomposition (recursively breaking tasks into checkable subgoals for dense feedback). The authors provide theoretical motivation showing evolutionary operators can escape the narrow entropy shell of expansion-only search, and backward search can exponentially reduce required samples. Experiments demonstrate BES enables consistent gains on challenging post-training tasks where mainstream algorithms fail, and outperforms existing open-source frameworks on problem-solving benchmarks.
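The claim about an exponential reduction in samples can be made concrete with a toy calculation. This is only an illustrative model under stated assumptions, not the authors' theorem: the symbols k (number of sequential subgoals) and p (chance that one sampled attempt solves a given subgoal, independently of the others) are hypothetical and do not come from the paper.

```latex
% Toy model (an assumption for illustration, not the paper's result):
% a task consists of k sequential subgoals, and one sampled attempt
% solves any given subgoal independently with probability p.
\begin{align*}
\mathbb{E}[N_{\text{final-only}}] &= p^{-k}
  && \text{(a full rollout must get all $k$ steps right at once)} \\
\mathbb{E}[N_{\text{subgoal-checked}}] &= \frac{k}{p}
  && \text{(each subgoal is retried until its own check passes)} \\
p = 0.1,\ k = 5: \quad & 10^{5}\ \text{full rollouts versus}\ 50\ \text{subgoal attempts}
\end{align*}
```

Under these assumptions the cost with only a final check grows exponentially in k, while the cost with checkable subgoals grows linearly, which is the kind of gap that dense intermediate feedback is meant to exploit.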
Key quotes
Search has been proposed as an effective method for self-improving language models and agentic systems, both for post-training sample generation and for inference.
Bidirectional Evolutionary Search (BES) ... couples forward candidate evolution with backward goal decomposition.
In the forward search, BES augments standard expansion with evolution operators that recombine partial trajectories to generate candidates that are difficult to obtain from a single model rollout.
In the backward search, BES recursively decomposes the original task into checkable subgoals, producing dense intermediate feedback that guides forward search.
Experiments show that on challenging post-training tasks where mainstream post-training algorithms fail to improve, BES enables consistent gains.
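The interplay of the two directions quoted above can be sketched on a toy problem. This is a minimal illustration under loud assumptions, not the authors' implementation: the task (matching a hidden integer sequence), the function names `decompose`, `dense_score`, `expand`, `recombine`, and `bes_sketch`, and all parameters are hypothetical. A random single-position resample stands in for a model rollout, and one-point crossover stands in for the evolution operators.

```python
import random
from typing import List, Tuple

Trajectory = List[int]
Subgoal = Tuple[int, int]  # half-open index range [lo, hi) that can be checked on its own


def decompose(lo: int, hi: int, leaf_size: int = 2) -> List[Subgoal]:
    """Backward step (toy): recursively split the goal into small checkable subgoals."""
    if hi - lo <= leaf_size:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return decompose(lo, mid, leaf_size) + decompose(mid, hi, leaf_size)


def dense_score(cand: Trajectory, target: Trajectory, subgoals: List[Subgoal]) -> int:
    """Dense feedback: how many subgoals the candidate already satisfies."""
    return sum(cand[lo:hi] == target[lo:hi] for lo, hi in subgoals)


def expand(cand: Trajectory, rng: random.Random, vocab: int) -> Trajectory:
    """Stand-in for a model rollout: resample one position at random."""
    child = list(cand)
    child[rng.randrange(len(child))] = rng.randrange(vocab)
    return child


def recombine(a: Trajectory, b: Trajectory, cut: int) -> Trajectory:
    """Evolution operator (toy): splice the prefix of one trajectory onto the suffix of another."""
    return a[:cut] + b[cut:]


def bes_sketch(target: Trajectory, vocab: int = 4, pop_size: int = 8,
               steps: int = 200, seed: int = 0) -> Tuple[Trajectory, int, int]:
    """Run the toy search; returns (best candidate, its score, number of subgoals)."""
    rng = random.Random(seed)
    n = len(target)  # assumed to be at least 2 so a crossover point exists
    subgoals = decompose(0, n)

    def score(c: Trajectory) -> int:
        return dense_score(c, target, subgoals)

    pop = [[rng.randrange(vocab) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(steps):
        a, b = rng.sample(pop, 2)
        children = [expand(a, rng, vocab), recombine(a, b, rng.randrange(1, n))]
        # Elitist selection on the dense score, so the best score never decreases.
        pop = sorted(pop + children, key=score, reverse=True)[:pop_size]
        if score(pop[0]) == len(subgoals):
            break
    return pop[0], score(pop[0]), len(subgoals)


if __name__ == "__main__":
    best, solved, total = bes_sketch([1, 2, 3, 0, 1, 2, 3, 0])
    print(f"best={best} subgoals solved={solved}/{total}")
```

The design point the sketch tries to show is that recombination can join two candidates that each satisfy different subgoals, producing a trajectory that a single mutation step would be unlikely to reach, while the subgoal count gives the selection step something to rank on before any candidate solves the full task.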
