HSIR: New Method Improves Self-Improvement Training for Large Reasoning Models
By
@ai-firehose.column.social
A good honest bake. Not flashy, but you'll finish the whole bagel.
Summary
This research paper identifies two key problems in self-improvement training for Large Reasoning Models (LRMs): data imbalance (too many simple samples, too few challenging ones) and overthinking (redundant reasoning steps). The authors propose HSIR (Harnessing Self-Improvement in large Reasoning models), which uses a verify-then-exit sampling strategy to address data imbalance and an Intrinsic Diversity score to filter out overthinking. They also introduce H-GRPO, an enhanced reinforcement learning algorithm. Results show up to +10.9% performance gains and 42.4% reduction in inference overhead.
Key quotes
· 3 pulledSelf-improvement training enables the large reasoning models (LRMs) to improve themselves by self-generating reasoning trajectories as training data without external supervision.
We reveal two problems: (1) data imbalance, where most training samples are simple, but the challenging yet crucial samples are scarce; (2) overthinking, where many undesired samples with redundant reasoning steps are used for self-training.
HSIR not only effectively enhances the reasoning performance, i.e., bringing up to +10.9% average performance gains, but also significantly improves the reasoning efficiency by reducing up to 42.4% relative inference overhead.
You might also wanna read
Study Reveals Large Reasoning Models Fail at Complex Problem-Solving Despite Strong Benchmark Performance
This research article examines the limitations of large reasoning models (LRMs) - fine-tuned LLMs designed for step-by-step reasoning. While
Introducing the Hierarchical Reasoning Model: A Breakthrough in AI Reasoning
The article introduces the Hierarchical Reasoning Model (HRM) as a novel recurrent architecture inspired by the human brain's hierarchical a
Understanding Large Reasoning Models: Strengths and Limitations
Recent frontier language models have introduced Large Reasoning Models (LRMs) that enhance reasoning processes. However, understanding their
Tiny Recursive Model Outperforms Large Language Models on Complex Reasoning Tasks
Researchers propose Tiny Recursive Model (TRM), a simplified recursive reasoning approach that outperforms both the existing Hierarchical Re
Comprehensive Survey of Reasoning Failures in Large Language Models
This article presents a comprehensive survey of reasoning failures in Large Language Models (LLMs), introducing a novel categorization frame
Investigating Monitoring and Control of Thinking Processes in Large Reasoning Models
The article explores how large reasoning models monitor and control their thinking processes, focusing on models that segment computations u
