SPIRAL: A Reinforcement Learning Framework for Multi-Primitive Language Model Reasoning
[Submitted on 22 Jun 2026]
Summary
This paper introduces SPIRAL (Sequential-Parallel-Aggregative Reinforcement Learning), a framework that trains language models to use three inference-time reasoning primitives: sequential reasoning within a single trace, parallel sampling of independent traces, and aggregation of multiple traces into a final response. Standard post-training optimizes only sequential reasoning. SPIRAL instead uses set reinforcement learning to teach models to produce traces that are collectively useful to an aggregator, and standard RL to teach the aggregation step. In experiments on reasoning tasks, SPIRAL outperforms GRPO (Group Relative Policy Optimization) by up to 11× in scaling efficiency and 15% in performance when all three compute primitives are scaled.
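The three primitives can be pictured as one inference pipeline. The sketch below is only an illustration under stated assumptions, not the paper's implementation: `run_pipeline`, `toy_sample_trace`, and `majority_vote_aggregate` are hypothetical names, the sampler is a toy stand-in for a language model, and majority voting stands in for the aggregator, which the summary describes as a trained model.

```python
import random
from collections import Counter
from typing import Callable, List

# Hypothetical type aliases, introduced for this sketch only.
Sampler = Callable[[str, random.Random], str]
Aggregator = Callable[[str, List[str]], str]


def toy_sample_trace(question: str, rng: random.Random) -> str:
    """Toy stand-in for one sequential reasoning trace.

    A real system would run a language model here; this stub just
    returns the answer "4" most of the time and "5" otherwise.
    """
    return "4" if rng.random() < 0.6 else "5"


def majority_vote_aggregate(question: str, traces: List[str]) -> str:
    """Stand-in aggregator: return the most common trace output.

    Assumption: used here only so the sketch runs without a model.
    """
    return Counter(traces).most_common(1)[0][0]


def run_pipeline(
    question: str,
    sample_trace: Sampler,
    aggregate: Aggregator,
    num_traces: int,
    seed: int = 0,
) -> str:
    """Sample `num_traces` independent traces, then aggregate them."""
    rng = random.Random(seed)
    # Parallel primitive: each trace is sampled independently.
    traces = [sample_trace(question, rng) for _ in range(num_traces)]
    # Aggregative primitive: combine the set into one final response.
    return aggregate(question, traces)


if __name__ == "__main__":
    print(run_pipeline("What is 2 + 2?", toy_sample_trace,
                       majority_vote_aggregate, num_traces=8))
```

In this picture, scaling the sequential primitive would mean longer individual traces, scaling the parallel primitive would mean a larger `num_traces`, and scaling the aggregative primitive would mean spending more compute inside `aggregate`.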
Key quotes
We introduce Sequential-Parallel-Aggregative Reinforcement Learning (SPIRAL), a framework in which a language model is trained to use all three primitives, as part of a unified inference compute pipeline.
To train this system, SPIRAL uses set reinforcement learning to teach models to produce a set of traces that are collectively useful for an aggregator and standard reinforcement learning to teach models to aggregate the set into improved final responses.
Our experiments on reasoning tasks show that SPIRAL effectively scales with inference compute, outperforming GRPO by up to 11× scaling efficiency and 15% higher performance when all three compute primitives are scaled.
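The quotes above contrast rewarding traces individually with rewarding a set of traces for being collectively useful. The summary does not say how the set-level reward is computed, so the sketch below is an assumption made for illustration: it scores a whole set by whether the aggregated answer is correct, and centers set rewards against their mean as a simple baseline. All function names are hypothetical.

```python
from statistics import mean
from typing import List


def per_trace_rewards(trace_answers: List[str], gold: str) -> List[float]:
    """Per-trace view: each trace is scored on its own answer."""
    return [1.0 if answer == gold else 0.0 for answer in trace_answers]


def set_reward(aggregated_answer: str, gold: str) -> float:
    """Set-level view (assumed): one score for the whole set of traces,
    based on the final answer the aggregator produced from them."""
    return 1.0 if aggregated_answer == gold else 0.0


def set_advantages(set_rewards: List[float]) -> List[float]:
    """Center each set's reward on the mean over all sampled sets.

    Assumption: a plain mean baseline, chosen for simplicity.
    """
    baseline = mean(set_rewards)
    return [reward - baseline for reward in set_rewards]
```

Under this assumed scheme, a set in which individual traces disagree can still earn full reward if the aggregated answer is correct, which is the sense in which traces are judged collectively and not one at a time.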
