All Topics

Technology

Art

HSIR: New Method Improves Self-Improvement Training for Large Reasoning Models

@ai-firehose.column.social

5d ago· 2 min readenInsight

75/100

Toasty

Bagelometer↗

A good honest bake. Not flashy, but you'll finish the whole bagel.

Score75TypeanalysisSentimentpositive

Summary

This research paper identifies two key problems in self-improvement training for Large Reasoning Models (LRMs): data imbalance (too many simple samples, too few challenging ones) and overthinking (redundant reasoning steps). The authors propose HSIR (Harnessing Self-Improvement in large Reasoning models), which uses a verify-then-exit sampling strategy to address data imbalance and an Intrinsic Diversity score to filter out overthinking. They also introduce H-GRPO, an enhanced reinforcement learning algorithm. Results show up to +10.9% performance gains and 42.4% reduction in inference overhead.

Key quotes

· 3 pulled

Self-improvement training enables the large reasoning models (LRMs) to improve themselves by self-generating reasoning trajectories as training data without external supervision.

We reveal two problems: (1) data imbalance, where most training samples are simple, but the challenging yet crucial samples are scarce; (2) overthinking, where many undesired samples with redundant reasoning steps are used for self-training.

HSIR not only effectively enhances the reasoning performance, i.e., bringing up to +10.9% average performance gains, but also significantly improves the reasoning efficiency by reducing up to 42.4% relative inference overhead.

Snippet from the RSS feed

Self-improvement training enables the large reasoning models (LRMs) to improve themselves by self-generating reasoning trajectories as training data without external supervision. However, we find that this method often falls short in complex reasoning tas

You might also wanna read

Study Reveals Large Reasoning Models Fail at Complex Problem-Solving Despite Strong Benchmark Performance

This research article examines the limitations of large reasoning models (LRMs) - fine-tuned LLMs designed for step-by-step reasoning. While

arxiv.org·7mo ago

Introducing the Hierarchical Reasoning Model: A Breakthrough in AI Reasoning

The article introduces the Hierarchical Reasoning Model (HRM) as a novel recurrent architecture inspired by the human brain's hierarchical a

arxiv.org·10mo ago

Understanding Large Reasoning Models: Strengths and Limitations

Recent frontier language models have introduced Large Reasoning Models (LRMs) that enhance reasoning processes. However, understanding their

machinelearning.apple.com·11mo ago

Tiny Recursive Model Outperforms Large Language Models on Complex Reasoning Tasks

Researchers propose Tiny Recursive Model (TRM), a simplified recursive reasoning approach that outperforms both the existing Hierarchical Re

arxiv.org·7mo ago

Comprehensive Survey of Reasoning Failures in Large Language Models

This article presents a comprehensive survey of reasoning failures in Large Language Models (LLMs), introducing a novel categorization frame

arxiv.org·3mo ago

Investigating Monitoring and Control of Thinking Processes in Large Reasoning Models

The article explores how large reasoning models monitor and control their thinking processes, focusing on models that segment computations u

royeisen.github.io·10mo ago