Latent Agents: Distilling Multi-Agent Debate into Single LLMs via Post-Training Internalization

[Submitted on 27 Apr 2026]

Summary

This paper introduces "Latent Agents," a post-training framework that distills multi-agent debate into a single LLM through a two-stage fine-tuning pipeline. The approach combines debate structure learning with internalization via dynamic reward scheduling and length clipping. The internalized models match or exceed explicit multi-agent debate performance while using up to 93% fewer tokens. The authors investigate the mechanistic basis through activation steering and find that internalization creates agent-specific subspaces: interpretable directions in activation space that correspond to different agent perspectives. As a practical application, they instill malicious agents through internalized debate and then suppress them with negative steering, which makes harmful behaviors easier to localize and control, with smaller reductions in general performance than steering base models.
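The summary names dynamic reward scheduling and length clipping but does not spell out how they work. The following is a minimal sketch of one way such a reward could be shaped, not the authors' implementation. It assumes a linear schedule that shifts weight from a debate-structure score to answer correctness while the token budget shrinks, and it assumes that an answer past the budget earns no correctness credit. All function names, parameters, and default budgets are hypothetical.

```python
def schedule_weight(step: int, total_steps: int) -> float:
    """Assumed linear ramp from 0.0 (start of training) to 1.0 (end)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


def token_budget(step: int, total_steps: int,
                 start_budget: int = 2000, end_budget: int = 200) -> int:
    """Assumed length clip: the allowed output length shrinks over training."""
    w = schedule_weight(step, total_steps)
    return round(start_budget + w * (end_budget - start_budget))


def combined_reward(structure_score: float, answer_correct: bool,
                    num_tokens: int, step: int, total_steps: int) -> float:
    """Blend a debate-structure score with a correctness score.

    Early steps reward debate-like structure; later steps reward a correct
    answer that fits inside the shrinking token budget.
    """
    w = schedule_weight(step, total_steps)
    within_budget = num_tokens <= token_budget(step, total_steps)
    correctness = 1.0 if (answer_correct and within_budget) else 0.0
    return (1.0 - w) * structure_score + w * correctness


if __name__ == "__main__":
    # Same 500-token correct answer, scored at three points in training.
    for step in (0, 5, 10):
        print(step, token_budget(step, 10),
              combined_reward(0.8, True, 500, step, 10))
```

Under these assumptions, a long but well-structured transcript is rewarded early in training and penalized late, which is one plausible way to push the debate out of the output and into the model.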

Key quotes

Multi-agent debate has been shown to improve reasoning in large language models (LLMs). However, it is compute-intensive, requiring generation of long transcripts before answering questions.
Our internalized models match or exceed explicit multi-agent debate performance using up to 93% fewer tokens.
We investigate the mechanistic basis of this capability through activation steering, finding that internalization creates agent-specific subspaces: interpretable directions in activation space corresponding to different agent perspectives.
By instilling malicious agents into the LLM through internalized debate, then applying negative steering to suppress them, we show that distillation makes harmful behaviors easier to localize and control with smaller reductions in general performance compared to steering base models.
Our findings offer a new perspective for understanding multi-agent capabilities in distilled models and provide practical guidelines for controlling internalized reasoning behaviors.
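The quotes above describe steering along agent-specific directions in activation space, including negative steering to suppress an instilled agent. The sketch below illustrates the general idea with a difference-of-means direction, a common choice in the steering literature that the paper may or may not use. The toy two-dimensional activations and all function names are hypothetical; a real setup would read and modify hidden states inside a model.

```python
from math import sqrt

Vector = list[float]


def mean_vector(rows: list[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_direction(agent_acts: list[Vector],
                       baseline_acts: list[Vector]) -> Vector:
    """Unit vector from the baseline mean toward the agent mean."""
    diff = [a - b for a, b in
            zip(mean_vector(agent_acts), mean_vector(baseline_acts))]
    norm = sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("agent and baseline means coincide")
    return [x / norm for x in diff]


def projection(hidden: Vector, direction: Vector) -> float:
    """How strongly a hidden state points along the direction."""
    return sum(h * d for h, d in zip(hidden, direction))


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the direction; alpha < 0 suppresses it."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    # Toy activations: the "agent" differs from baseline on the first axis.
    agent = [[2.0, 0.0], [4.0, 0.0]]
    baseline = [[0.0, 0.0], [0.0, 0.0]]
    direction = steering_direction(agent, baseline)

    hidden = [5.0, 1.0]
    suppressed = steer(hidden, direction, alpha=-2.0)
    print(projection(hidden, direction), projection(suppressed, direction))
```

Negative steering here lowers the component along the agent direction and leaves the orthogonal component untouched, which matches the intuition behind localizing a behavior to a subspace before suppressing it.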