Tiny Recursive Model Outperforms Large Language Models on Complex Reasoning Tasks
By guybedo
Crisped on the outside, thoughtful enough on the inside.
Summary
Researchers propose the Tiny Recursive Model (TRM), a simplified recursive reasoning approach that outperforms both the existing Hierarchical Reasoning Model (HRM) and large language models (LLMs) on hard puzzle tasks such as Sudoku, Maze, and ARC-AGI. TRM generalizes better using only a single tiny network with 2 layers and 7M parameters. It reaches 45% test accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs, while using less than 0.01% of their parameters.
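To make "recursive reasoning with a single tiny network" more concrete, here is a minimal Python/NumPy sketch of the control flow only, with no training. It assumes an update order in which one shared two-layer network first refines a latent reasoning state `z` several times from the puzzle `x`, the current answer `y`, and `z`, and then updates `y` from `y` and `z`. This scheme, the summed-input design, the dimensions, the step counts, the random weights, and the names `tiny_net`, `recursive_reason`, `n_latent`, and `n_cycles` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # hypothetical embedding width

# One shared two-layer network; random weights, purely illustrative.
W1 = rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)
W2 = rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)


def tiny_net(*inputs: np.ndarray) -> np.ndarray:
    """Sum the input embeddings, then apply two layers with a ReLU in between."""
    h = np.sum(inputs, axis=0)
    return np.maximum(h @ W1, 0.0) @ W2


def recursive_reason(x, y, z, n_latent=6, n_cycles=3):
    """Refine latent z n_latent times, then update answer y; repeat n_cycles times."""
    for _ in range(n_cycles):
        for _ in range(n_latent):
            z = tiny_net(x, y, z)  # "think": update the latent reasoning state
        y = tiny_net(y, z)         # "act": update the answer from that state
    return y, z


x = rng.standard_normal(DIM)  # hypothetical embedded puzzle
y0 = np.zeros(DIM)            # initial answer embedding
z0 = np.zeros(DIM)            # initial latent state
y, z = recursive_reason(x, y0, z0)
print(y.shape, z.shape)  # (16,) (16,)
```

Because every step reuses the same weights, the parameter count stays fixed while the effective depth grows with the number of recursion steps: with the hypothetical settings above, 3 × (6 + 1) = 21 network calls share one set of weights.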
Key quotes
HRM holds great promise for solving hard problems with small networks, but it is not yet well understood and may be suboptimal.
We propose Tiny Recursive Model (TRM), a much simpler recursive reasoning approach that achieves significantly higher generalization than HRM.
With only 7M parameters, TRM obtains 45% test-accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs.
TRM achieves these results with less than 0.01% of the parameters used by large language models.
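The "less than 0.01%" claim can be sanity-checked with simple arithmetic. Since 0.01% is 1/10,000, a 7M-parameter model falls below that threshold only when compared with models of more than 7M × 10,000 = 70B parameters. The sketch below computes this. The LLM sizes in the loop are hypothetical round numbers chosen for illustration, not the specific models compared in the paper, and `param_fraction` is a hypothetical helper.

```python
TRM_PARAMS = 7_000_000  # TRM size quoted in the summary


def param_fraction(small: int, large: int) -> float:
    """Return small / large as a percentage."""
    return 100.0 * small / large


# Smallest model size for which 7M parameters is exactly 0.01% (= 1/10,000).
break_even = TRM_PARAMS * 10_000
print(f"{break_even:,}")  # 70,000,000,000

# Hypothetical LLM sizes, for illustration only.
for llm_params in (8_000_000_000, 70_000_000_000, 671_000_000_000):
    pct = param_fraction(TRM_PARAMS, llm_params)
    below = llm_params > break_even  # integer comparison avoids float edge cases
    print(f"{llm_params / 1e9:.0f}B -> {pct:.4f}% (below 0.01%: {below})")
```

The threshold test uses integer parameter counts rather than the floating-point percentage, so the exact 70B boundary case is not subject to rounding.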
