SPIRAL: A Reinforcement Learning Framework for Multi-Primitive Language Model Reasoning

[Submitted on 22 Jun 2026]



Summary

This paper introduces SPIRAL (Sequential-Parallel-Aggregative Reinforcement Learning), a framework that trains language models to use three inference-time reasoning primitives: sequential reasoning within a trace, independently sampled parallel traces, and aggregation of multiple traces into a final response. Unlike standard post-training, which optimizes only sequential reasoning, SPIRAL uses set reinforcement learning to teach models to produce traces that are collectively useful to an aggregator, and standard RL to teach aggregation. In experiments on reasoning tasks, SPIRAL outperforms GRPO with up to 11× better scaling efficiency and 15% higher performance when all three compute primitives are scaled.
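To make the three primitives concrete, here is a minimal Python sketch of an inference pipeline that combines them. It is an illustration only, not the paper's implementation: `sample_trace`, `aggregate`, and `run_pipeline` are hypothetical names, the "model" is a random toy, and majority voting stands in for the trained aggregator that the summary describes.

```python
import random
from collections import Counter
from typing import List


def sample_trace(prompt: str, max_steps: int, rng: random.Random) -> str:
    """Hypothetical stand-in for one sequential reasoning trace.

    A real system would decode from a language model up to some budget.
    Toy assumption: a larger sequential budget makes the answer "4" likelier.
    """
    p_correct = min(0.9, 0.3 + 0.1 * max_steps)
    return "4" if rng.random() < p_correct else rng.choice(["3", "5"])


def aggregate(traces: List[str]) -> str:
    """Hypothetical aggregator: a plain majority vote over trace answers.

    This is a placeholder; the summary says SPIRAL trains the aggregation step.
    """
    return Counter(traces).most_common(1)[0][0]


def run_pipeline(prompt: str, num_traces: int, max_steps: int, seed: int = 0) -> str:
    """Scale all three primitives: steps per trace, number of traces, aggregation."""
    rng = random.Random(seed)
    # Parallel primitive: traces are sampled independently of one another.
    traces = [sample_trace(prompt, max_steps, rng) for _ in range(num_traces)]
    # Aggregative primitive: combine the set into one final response.
    return aggregate(traces)


if __name__ == "__main__":
    print(run_pipeline("What is 2 + 2?", num_traces=8, max_steps=4))
```

In this sketch, `max_steps` scales the sequential primitive and `num_traces` scales the parallel one; the aggregation step is the only place where information from different traces is combined.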

Source

Twitter / X

SPIRAL: A Reinforcement Learning Framework for Multi-Primitive Language Model Reasoning (arxiv.org)

Key quotes


We introduce Sequential-Parallel-Aggregative Reinforcement Learning (SPIRAL), a framework in which a language model is trained to use all three primitives, as part of a unified inference compute pipeline.

To train this system, SPIRAL uses set reinforcement learning to teach models to produce a set of traces that are collectively useful for an aggregator and standard reinforcement learning to teach models to aggregate the set into improved final responses.

Our experiments on reasoning tasks show that SPIRAL effectively scales with inference compute, outperforming GRPO by up to 11× scaling efficiency and 15% higher performance when all three compute primitives are scaled.
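The quotes do not spell out how set reinforcement learning assigns credit, so the following Python sketch shows one plausible reading rather than the paper's method. It assumes each sampled set of traces receives a single reward (for example, 1.0 if the aggregator's final answer from that set is correct), that a mean baseline is taken across the sets drawn for the same prompt, and that every trace in a set shares that set's advantage. The function names are hypothetical.

```python
from statistics import mean
from typing import List


def set_level_advantages(set_rewards: List[float]) -> List[float]:
    """Advantage of each trace set relative to the mean over all sets.

    Assumption: one scalar reward per set, with a group-mean baseline.
    """
    baseline = mean(set_rewards)
    return [reward - baseline for reward in set_rewards]


def per_trace_advantages(set_rewards: List[float], traces_per_set: int) -> List[List[float]]:
    """Broadcast each set's advantage to every trace in that set.

    Assumption: traces are credited jointly, so a trace is reinforced when
    the set it belongs to was useful to the aggregator as a whole.
    """
    return [[adv] * traces_per_set for adv in set_level_advantages(set_rewards)]


if __name__ == "__main__":
    # Four sets sampled for one prompt; two of them led to a correct final answer.
    print(per_trace_advantages([1.0, 0.0, 1.0, 0.0], traces_per_set=2))
```

The design choice illustrated here is that reward attaches to the set rather than to individual traces, which is what would let a trace be rewarded for complementing the others instead of for being correct on its own.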

Snippet from the RSS feed

Language model reasoning can be substantially improved at test time via scaffolds that scale inference compute across different primitives -- sequential reasoning within a trace, independently sampled parallel traces, and aggregation of multiple reasoning
