Ouro: Looped Language Models That Build Reasoning into Pre-Training Through Latent Space Iteration

By remexre · 4mo ago · 2 min read

Summary

Researchers introduce Ouro, a family of pre-trained Looped Language Models (LoopLM) that build reasoning into the pre-training phase through iterative computation in latent space, rather than deferring it to post-training through explicit text generation such as chain-of-thought (CoT). The models are trained with an entropy-regularized objective for learned depth allocation, and pre-training scales to 7.7 trillion tokens. The 1.4B and 2.6B parameter Ouro models match the results of state-of-the-art LLMs of up to 12B parameters across a wide range of benchmarks. Controlled experiments attribute this advantage to superior knowledge manipulation rather than increased knowledge capacity. The authors also report that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT.
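
To make "iterative computation in latent space" concrete, here is a toy sketch of the general looping idea: one block with one set of weights is applied repeatedly to a hidden state, so extra computation comes from more iterations and not from more parameters. This is not Ouro's architecture. The names `shared_block` and `looped_forward`, the single scalar weight `w`, and the tanh mixing rule are all assumptions made up for illustration.

```python
import math

def shared_block(h, w):
    """Hypothetical toy 'layer': mix each unit with its neighbour, then squash with tanh."""
    n = len(h)
    return [math.tanh(w * h[i] + (1.0 - w) * h[(i + 1) % n]) for i in range(n)]

def looped_forward(h0, w, num_loops):
    """Apply the same block (same weight w) num_loops times.

    Returns the initial state followed by every intermediate latent state,
    so the list has num_loops + 1 entries.
    """
    states = [h0]
    h = h0
    for _ in range(num_loops):
        h = shared_block(h, w)  # weights are reused on every iteration
        states.append(h)
    return states

states = looped_forward([0.5, -1.0, 2.0], w=0.7, num_loops=4)
```

Each entry in `states` is an intermediate latent state. In a real looped model, a state after any iteration could be decoded into a prediction, which is what makes a learned choice of depth possible.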

Key quotes

Modern LLMs are trained to 'think' primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data.
We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens.
Ouro 1.4B and 2.6B models enjoy superior performance that match the results of up to 12B SOTA LLMs across a wide range of benchmarks.
Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities.
We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT.
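
The quotes mention an entropy-regularized objective for learned depth allocation without giving its form. The sketch below shows one common way such an objective can be written, assumed here for illustration: a distribution over exit steps weights the per-step losses, and an entropy bonus scaled by a hypothetical coefficient `beta` discourages the model from collapsing onto a single depth. The function names, the halting-probability parameterization, and the numbers are all assumptions, not details from the source.

```python
import math

def exit_distribution(halt_probs):
    """Turn per-step halting probabilities into a distribution over exit steps.

    The last step absorbs any remaining probability mass, so the result sums to 1.
    """
    dist, remaining = [], 1.0
    for lam in halt_probs[:-1]:
        dist.append(remaining * lam)
        remaining *= (1.0 - lam)
    dist.append(remaining)
    return dist

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def regularized_loss(step_losses, halt_probs, beta):
    """Expected per-step loss under the exit distribution, minus beta times its entropy."""
    p = exit_distribution(halt_probs)
    expected = sum(pi * li for pi, li in zip(p, step_losses))
    return expected - beta * entropy(p)

# Hypothetical numbers: three loop steps whose losses fall as depth grows.
loss = regularized_loss([2.0, 1.0, 0.5], [0.5, 0.5, 0.5], beta=0.1)
```

With `beta` set to 0 this reduces to the plain expected loss. A larger `beta` rewards spreading probability across exit steps, which keeps deeper iterations in play during training.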