Researchers Develop Method to Predict Real-Time Progress in Reasoning Language Models

[Submitted on 29 Jun 2025 (v1), last revised 26 May 2026 (this version, v4)]

Summary

This paper asks whether real-time progress prediction is feasible for reasoning language models that use long latent chains of thought. To test whether hidden states encode progress information, the authors discretize reasoning trajectories and train a linear probe to classify reasoning states. They also fine-tune models to emit progress estimates from 0% to 100% during chain-of-thought reasoning; the strongest checkpoint reaches an MAE of 0.161 on mathematical reasoning traces and outperforms position baselines in that setting. Finally, the study quantifies the intrinsic ambiguity of progress labels: ambiguity is lowest for Qwen3-4B, whose continuations show the smallest rollout dispersion, which suggests that larger models can make progress labels more stable by reducing variation in remaining solution length.
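The probing setup described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes progress is defined as relative token position, uses a least-squares classifier as the linear probe, and substitutes synthetic vectors for real hidden states. The names `progress_bins`, `fit_linear_probe`, and `probe_predict` are hypothetical.

```python
import numpy as np

def progress_bins(num_tokens: int, num_bins: int = 10) -> np.ndarray:
    """Map each token index to a discrete progress class in [0, num_bins - 1].

    Assumption: progress is the token's relative position in the trace.
    Integer arithmetic avoids floating-point edge cases at bin boundaries.
    """
    return (np.arange(num_tokens) * num_bins) // num_tokens

def _with_bias(hidden: np.ndarray) -> np.ndarray:
    # Append a constant column so the probe can learn an intercept.
    return np.hstack([hidden, np.ones((hidden.shape[0], 1))])

def fit_linear_probe(hidden: np.ndarray, labels: np.ndarray, num_bins: int) -> np.ndarray:
    """Fit a linear map from hidden states to one-hot progress classes (least squares)."""
    targets = np.eye(num_bins)[labels]
    weights, *_ = np.linalg.lstsq(_with_bias(hidden), targets, rcond=None)
    return weights

def probe_predict(weights: np.ndarray, hidden: np.ndarray) -> np.ndarray:
    """Predict the progress class with the highest linear score."""
    return (_with_bias(hidden) @ weights).argmax(axis=1)

# Synthetic stand-in for hidden states: each progress class gets its own
# random direction plus a little noise. Real activations would replace this.
rng = np.random.default_rng(0)
labels = progress_bins(num_tokens=200, num_bins=10)
hidden = np.eye(10)[labels] @ rng.normal(size=(10, 32))
hidden += 0.1 * rng.normal(size=hidden.shape)

weights = fit_linear_probe(hidden, labels, num_bins=10)
accuracy = float((probe_predict(weights, hidden) == labels).mean())
```

If a probe this simple separates the classes on held-out traces, that is evidence the progress signal is linearly readable from the hidden states. A logistic-regression probe would be the more common choice in practice; least squares is used here only to keep the sketch dependency-free.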

Key quotes

Recent reasoning language models, particularly those that employ long latent chains of thought, achieve strong performance on complex agentic tasks.

As these models operate over increasingly long time horizons, their internal progress becomes opaque to users, making expectation management and real-time oversight difficult.

Our strongest progress-reporting checkpoint reaches 0.161 MAE on mathematical reasoning traces and outperforms position baselines in this setting.

This ambiguity is lowest for Qwen3-4B, whose continuations produce the smallest rollout dispersion, suggesting that larger models can make progress labels more stable by reducing variation in remaining solution length.
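The two quantities in the last two quotes, MAE against a position baseline and rollout dispersion, can be made concrete with a short sketch. The definitions here are assumptions, not the paper's: progress is treated as a fraction in [0, 1], the position baseline guesses progress from token index and a fixed expected trace length, and dispersion is the standard deviation of the progress labels implied by several sampled continuations of the same prefix. All function names are hypothetical.

```python
import numpy as np

def mean_absolute_error(predicted, target) -> float:
    """Mean absolute error between predicted and reference progress fractions."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(predicted - target)))

def position_baseline(positions, expected_length: float) -> np.ndarray:
    """Guess progress from token index alone, assuming a fixed expected trace length."""
    positions = np.asarray(positions, dtype=float)
    return np.clip(positions / expected_length, 0.0, 1.0)

def implied_progress(prefix_len: int, remaining_lens) -> np.ndarray:
    """Progress label each rollout implies for the same prefix.

    A rollout that needs `r` more tokens after a prefix of `p` tokens
    implies the prefix was at progress p / (p + r).
    """
    remaining = np.asarray(remaining_lens, dtype=float)
    return prefix_len / (prefix_len + remaining)

def rollout_dispersion(prefix_len: int, remaining_lens) -> float:
    """Spread of implied progress labels across rollouts (population std)."""
    return float(np.std(implied_progress(prefix_len, remaining_lens)))

# Same prefix, two hypothetical models: one whose continuations agree on the
# remaining length, one whose continuations vary widely.
tight = rollout_dispersion(prefix_len=100, remaining_lens=[95, 100, 105])
wide = rollout_dispersion(prefix_len=100, remaining_lens=[20, 100, 400])
```

Under these assumed definitions, a prefix whose continuations vary widely in length has no single correct progress label, so dispersion places a floor on the error any progress predictor can achieve at that point.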
