Ouro: Looped Language Models That Build Reasoning into Pre-Training Through Latent Space Iteration

remexre

4mo ago · 2 min read · en · Insight

Summary

Researchers introduce Ouro, a family of pre-trained Looped Language Models (LoopLM) that build reasoning into pre-training through iterative computation in latent space, rather than deferring it to post-training techniques such as chain-of-thought. The models use an entropy-regularized objective for learned depth allocation and were trained on 7.7 trillion tokens. The 1.4B and 2.6B parameter Ouro models match the results of state-of-the-art LLMs of up to 12B parameters across a wide range of benchmarks. Controlled experiments attribute this advantage to superior knowledge manipulation rather than increased knowledge capacity. The authors also report that LoopLM yields reasoning traces more aligned with final outputs than explicit chain-of-thought.
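To make "iterative computation in latent space" and "learned depth allocation" a little more concrete, here is a minimal toy sketch in plain Python. It is not Ouro's architecture or training objective, neither of which is specified in this summary. It assumes a single weight-shared block applied repeatedly to a hidden state, a sigmoid halting gate that turns each loop into an exit probability, and a loss of the form "expected per-step loss minus beta times the entropy of the exit distribution". Every name here (`make_block`, `looped_forward`, `regularized_loss`, `beta`) is hypothetical.

```python
import math
import random


def make_block(dim, seed=0):
    """Build one toy block: a dim x dim weight matrix plus a halting-gate vector.
    (Hypothetical stand-in for a transformer layer stack.)"""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]
    gate = [rng.gauss(0.0, scale) for _ in range(dim)]
    return weights, gate


def apply_block(weights, h):
    """Residual update h + tanh(W h); the same weights are reused on every loop."""
    return [
        h_i + math.tanh(sum(w * x for w, x in zip(row, h)))
        for h_i, row in zip(h, weights)
    ]


def looped_forward(h, block, max_loops):
    """Apply the shared block max_loops times, recording each latent state and
    the probability of exiting at that loop (assumed gating scheme)."""
    weights, gate = block
    states, exit_probs = [], []
    remaining = 1.0  # probability mass that has not exited yet
    for step in range(max_loops):
        h = apply_block(weights, h)
        states.append(h)
        if step == max_loops - 1:
            p_exit = remaining  # the last loop absorbs all leftover mass
        else:
            logit = sum(g * x for g, x in zip(gate, h))
            p_exit = remaining / (1.0 + math.exp(-logit))
        exit_probs.append(p_exit)
        remaining -= p_exit
    return states, exit_probs


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(step_losses, exit_probs, beta):
    """Expected loss over exit steps, minus beta * entropy of the exit
    distribution, so the model is discouraged from collapsing onto one depth."""
    expected = sum(p * l for p, l in zip(exit_probs, step_losses))
    return expected - beta * entropy(exit_probs)
```

A call such as `looped_forward([0.1, -0.2, 0.3, 0.0], make_block(4), max_loops=4)` returns four latent states and an exit distribution that sums to one. In a real model the per-step losses would come from decoding each latent state, and the sign and weighting of the entropy term are design choices that this sketch only guesses at.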

Key quotes

Modern LLMs are trained to 'think' primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data.

We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens.

Ouro 1.4B and 2.6B models enjoy superior performance that match the results of up to 12B SOTA LLMs across a wide range of benchmarks.

Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities.

We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT.
