Ouro: Looped Language Models That Build Reasoning into Pre-Training Through Latent Space Iteration
By remexre
Summary
Researchers introduce Ouro, a family of pre-trained Looped Language Models (LoopLM) that build reasoning into the pre-training phase through iterative computation in latent space, rather than deferring it to post-training techniques such as chain-of-thought. The models use an entropy-regularized objective for learned depth allocation and were trained on 7.7 trillion tokens. The 1.4B and 2.6B parameter Ouro models match the results of state-of-the-art LLMs of up to 12B parameters across a wide range of benchmarks. Controlled experiments attribute this advantage to superior knowledge manipulation rather than increased knowledge capacity. The research also shows that LoopLM produces reasoning traces more aligned with final outputs than explicit chain-of-thought.
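To make the two mechanisms named in the summary concrete, here is a minimal, standard-library Python sketch. It is an illustration under stated assumptions, not the authors' implementation: `looped_forward`, `entropy_regularized_loss`, `toy_step`, and the weight `beta` are hypothetical names, and the loss form (expected per-loop loss under a softmax exit distribution, minus an entropy bonus) is only one plausible reading of "entropy-regularized objective for learned depth allocation".

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def looped_forward(hidden, shared_step, num_loops):
    """Apply the SAME step function repeatedly (weight sharing across loops),
    keeping each intermediate latent state."""
    states = []
    for _ in range(num_loops):
        hidden = shared_step(hidden)
        states.append(hidden)
    return states

def entropy_regularized_loss(per_loop_losses, exit_logits, beta):
    """Assumed objective: expected loss under the exit distribution,
    minus beta times that distribution's entropy, so the model is
    discouraged from collapsing onto a single depth too early."""
    exit_probs = softmax(exit_logits)
    expected_loss = sum(p * l for p, l in zip(exit_probs, per_loop_losses))
    return expected_loss - beta * entropy(exit_probs), exit_probs

# Toy stand-in for a transformer block: each loop moves the latent
# state halfway toward a fixed target (purely illustrative).
TARGET = 1.0

def toy_step(hidden):
    return [0.5 * h + 0.5 * TARGET for h in hidden]

states = looped_forward([0.0], toy_step, num_loops=4)
per_loop_losses = [(TARGET - s[0]) ** 2 for s in states]
loss, exit_probs = entropy_regularized_loss(per_loop_losses, [0.0] * 4, beta=0.1)
```

In this toy setup the per-loop loss shrinks with every extra iteration, so a trained exit distribution would shift mass toward later loops for hard inputs, while the entropy term keeps it from committing to one depth outright.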
Key quotes
Modern LLMs are trained to 'think' primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data.
We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens.
Ouro 1.4B and 2.6B models enjoy superior performance that match the results of up to 12B SOTA LLMs across a wide range of benchmarks.
Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities.
We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT.
