Arbor: A Multi-Agent Framework Using Tree Search for Autonomous LLM Inference Optimization

[Submitted on 10 Jun 2026]

Summary

Arbor is a multi-agent framework that uses structured tree search as a cognition layer for autonomous agents operating in large, stateful action spaces. Where prior autonomous optimization systems work on isolated targets with stateless evaluation, Arbor maintains an explicit search tree of scored hypotheses as working memory shared across agents. The tree evolves with every measurement, and failures are treated as diagnostic signals that reshape subsequent exploration. Validated on full-stack LLM inference optimization, Arbor pairs an Orchestrator agent, which delegates to Domain Specialists across the inference stack, with a Critic agent that safeguards stability, in a checks-and-balances architecture where neither agent can unilaterally drive the system. It achieves up to 193% inference throughput-latency Pareto improvement over vendor-optimized baselines, while a single agent without the harness plateaus at +33% throughput improvement and crashes irrecoverably within hours. The authors report that Arbor generalizes across multiple hardware generations with run-to-run variance within 2 percentage points, and that it supports fully autonomous multi-day optimization campaigns.
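To make the idea of a search tree of scored hypotheses more concrete, here is a minimal Python sketch. It is not Arbor's implementation: the `Hypothesis` class, its fields, and the helper functions are hypothetical names chosen for illustration, and the scoring is a plain float rather than the paper's throughput-latency metric. The one point it tries to show is that failed trials stay in the tree with a diagnostic note instead of being discarded.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Hypothesis:
    """One tree node: a proposed change plus its measured outcome (hypothetical schema)."""
    description: str
    score: Optional[float] = None        # None until the hypothesis is measured
    failure_note: Optional[str] = None   # diagnostic recorded when a trial fails
    children: List["Hypothesis"] = field(default_factory=list)

    def add_child(self, description: str) -> "Hypothesis":
        """Attach a refinement of this hypothesis and return it."""
        child = Hypothesis(description)
        self.children.append(child)
        return child

    def record_success(self, score: float) -> None:
        self.score = score

    def record_failure(self, note: str) -> None:
        # The failed node is kept so later exploration can see why it failed.
        self.failure_note = note


def iter_nodes(root: Hypothesis) -> Iterator[Hypothesis]:
    """Depth-first walk over every node in the tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def best_node(root: Hypothesis) -> Optional[Hypothesis]:
    """Highest-scoring successful hypothesis, or None if nothing has been measured."""
    scored = [n for n in iter_nodes(root)
              if n.score is not None and n.failure_note is None]
    return max(scored, key=lambda n: n.score, default=None)


def failure_notes(root: Hypothesis) -> List[str]:
    """All diagnostics collected so far, available to whoever plans the next step."""
    return [n.failure_note for n in iter_nodes(root) if n.failure_note is not None]
```

In this sketch any agent holding a reference to the root can read the same scores and failure notes, which is one simple way to read "shared working memory"; how Arbor actually stores and shares the tree is not described in the summary.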

Key quotes

Arbor instead maintains an explicit search tree of scored hypotheses that serves as the shared working memory across agents, evolving with every measurement, treating failures as diagnostic signal that reshapes subsequent exploration.

Arbor pairs an Orchestrator agent, which drives optimization by delegating to Domain Specialists across the inference stack, with a Critic agent that safeguards stability through root-cause analysis, introspection, and measurement validation -- a checks-and-balances architecture where neither agent can unilaterally drive the system.

Arbor achieves up to 193% inference throughput-latency Pareto improvement over vendor-optimized baselines, while a single agent without the harness plateaus at +33% throughput improvement and crashes irrecoverably within hours.

Arbor generalizes to multiple generations of hardware platform, and run-to-run variance is within 2 percentage points demonstrating that the method is hardware-agnostic and reproducible.

Agent capabilities are decomposed into hard skills (domain expertise) and soft skills (coordination protocols that determine how contributions compose), enabling fully autonomous multi-day campaigns.
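The checks-and-balances pairing of Orchestrator and Critic quoted above can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the function names `critic_validates` and `step`, the repeated-measurement check, and the 5% spread threshold are not taken from the paper. The sketch only shows the general pattern in which a proposed change is adopted when an independent validation step agrees.

```python
from statistics import mean
from typing import Callable, List, Optional, Tuple


def critic_validates(samples: List[float], baseline: float,
                     max_spread: float = 0.05) -> bool:
    """Hypothetical critic check: repeated measurements must agree and beat the baseline.

    max_spread is an arbitrary illustrative threshold on (max - min) / mean.
    """
    if len(samples) < 2:
        return False  # a single measurement cannot be cross-checked
    avg = mean(samples)
    if avg <= 0:
        return False
    spread = (max(samples) - min(samples)) / avg
    return spread <= max_spread and avg > baseline


def step(propose: Callable[[], str],
         measure: Callable[[str], float],
         baseline: float,
         repeats: int = 3) -> Tuple[Optional[str], float]:
    """One round: the orchestrator side proposes, the critic side decides whether to accept.

    Returns (accepted_candidate, new_baseline); the candidate is None when rejected.
    """
    candidate = propose()
    samples = [measure(candidate) for _ in range(repeats)]
    if critic_validates(samples, baseline):
        return candidate, mean(samples)
    return None, baseline  # rejected: the baseline stays unchanged
```

The design choice worth noting is that `propose` never updates the baseline itself; only a measurement that passes validation does, so a noisy or spurious result cannot move the system forward on its own.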
