Arbor: A Multi-Agent Framework Using Tree Search for Autonomous LLM Inference Optimization
[Submitted on 10 Jun 2026]
Summary
Arbor is a multi-agent framework that uses structured tree search as a cognition layer for autonomous agents operating in large, stateful action spaces. Unlike prior systems that work on isolated targets with stateless evaluation, Arbor maintains an explicit search tree of scored hypotheses as working memory shared across agents. The tree evolves with every measurement, and failures are treated as diagnostic signals that reshape subsequent exploration. Validated on full-stack LLM inference optimization, Arbor pairs an Orchestrator agent, which delegates to Domain Specialists across the inference stack, with a Critic agent that safeguards stability, in a checks-and-balances architecture where neither agent can unilaterally drive the system. It achieves up to 193% inference throughput-latency Pareto improvement over vendor-optimized baselines, while a single agent without the harness plateaus at +33% throughput improvement and crashes irrecoverably within hours. The authors report that Arbor generalizes across multiple generations of hardware platform with run-to-run variance within 2 percentage points, and that it enables fully autonomous multi-day optimization campaigns.
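To make the idea of a search tree of scored hypotheses more concrete, here is a minimal Python sketch. It is not Arbor's implementation: the `Hypothesis` class, its fields, the greedy `best_expandable` selection rule, and the example hypotheses and scores are all assumptions made for illustration. The summary does not say how nodes are represented or how the next node to expand is chosen.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Hypothesis:
    """One tree node: a proposed change plus its measured outcome (hypothetical layout)."""
    description: str
    score: Optional[float] = None          # None until a measurement is recorded
    failed: bool = False
    diagnosis: str = ""                    # why it failed, kept for later exploration
    parent: Optional[Hypothesis] = None
    children: list[Hypothesis] = field(default_factory=list)

    def propose(self, description: str) -> Hypothesis:
        """Attach a new, not yet measured child hypothesis."""
        child = Hypothesis(description, parent=self)
        self.children.append(child)
        return child

    def record(self, score: float) -> None:
        """Store a successful measurement."""
        self.score = score

    def record_failure(self, diagnosis: str) -> None:
        """Keep the failed node in the tree together with its diagnosis."""
        self.failed = True
        self.diagnosis = diagnosis


def walk(node: Hypothesis) -> Iterator[Hypothesis]:
    """Depth-first traversal of the whole tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def best_expandable(root: Hypothesis) -> Optional[Hypothesis]:
    """Assumed selection rule: expand the best-scoring measured, non-failed node."""
    candidates = [n for n in walk(root) if n.score is not None and not n.failed]
    return max(candidates, key=lambda n: n.score, default=None)


def failure_notes(root: Hypothesis) -> list[tuple[str, str]]:
    """Collect (description, diagnosis) pairs so failures can inform new proposals."""
    return [(n.description, n.diagnosis) for n in walk(root) if n.failed]


# Made-up example data, not results from the paper.
root = Hypothesis("baseline configuration")
root.record(1.0)
a = root.propose("larger batch size")
a.record(1.2)
b = root.propose("aggressive kernel fusion")
b.record_failure("out of memory at startup")
c = a.propose("tune KV cache block size")
c.record(1.5)

print(best_expandable(root).description)
print(failure_notes(root))
```

The point of the sketch is that failed nodes are never deleted: they stay in the shared tree with a diagnosis, so any agent reading the tree can see both what worked and why something did not.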
Key quotes
Arbor instead maintains an explicit search tree of scored hypotheses that serves as the shared working memory across agents, evolving with every measurement, treating failures as diagnostic signal that reshapes subsequent exploration.
Arbor pairs an Orchestrator agent, which drives optimization by delegating to Domain Specialists across the inference stack, with a Critic agent that safeguards stability through root-cause analysis, introspection, and measurement validation -- a checks-and-balances architecture where neither agent can unilaterally drive the system.
Arbor achieves up to 193% inference throughput-latency Pareto improvement over vendor-optimized baselines, while a single agent without the harness plateaus at +33% throughput improvement and crashes irrecoverably within hours.
Arbor generalizes to multiple generations of hardware platform, and run-to-run variance is within 2 percentage points demonstrating that the method is hardware-agnostic and reproducible.
Agent capabilities are decomposed into hard skills (domain expertise) and soft skills (coordination protocols that determine how contributions compose), enabling fully autonomous multi-day campaigns.
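The headline result is stated in terms of the throughput-latency Pareto frontier, so a short sketch of Pareto dominance may help readers unfamiliar with the term. The quotes above do not say how the 193% figure is computed, and this sketch does not attempt to reproduce it. It only shows, under the assumption that higher throughput and lower latency are both better, which configurations sit on the frontier. The `Measurement` type, the configuration names, and all numbers are made up for illustration.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    name: str
    throughput: float  # higher is better (e.g. tokens per second)
    latency: float     # lower is better (e.g. milliseconds)


def dominates(a: Measurement, b: Measurement) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on one."""
    no_worse = a.throughput >= b.throughput and a.latency <= b.latency
    strictly_better = a.throughput > b.throughput or a.latency < b.latency
    return no_worse and strictly_better


def pareto_front(points: list[Measurement]) -> list[Measurement]:
    """Keep only the measurements that no other measurement dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative numbers only, not data from the paper.
runs = [
    Measurement("baseline", 1000.0, 50.0),
    Measurement("cfg_a", 1300.0, 55.0),
    Measurement("cfg_b", 1200.0, 45.0),
    Measurement("cfg_c", 900.0, 60.0),
]

front = pareto_front(runs)
print([m.name for m in front])
```

In this made-up data, `cfg_b` beats the baseline on both axes, while `cfg_a` trades latency for throughput, so both remain on the frontier and neither can be called strictly better than the other.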
