Feedback Distillation: A New Training Method for Improving LLM Reasoning in Theorem Proving

Post-training for reasoning models typically combines supervised fine-tuning with reinforcement learning from verifiable rewards, most commonly with GRPO. However, this algorithm suffers from sparse…


[Submitted on 29 May 2026]

technology, science, machine learning, formal verification

