Avengers-Pro: Performance-Efficiency Optimized Routing Framework for Large Language Models
By omarsar
Summary
Researchers present Avengers-Pro, a test-time routing framework that dynamically assigns each query to one of several LLMs according to a performance-efficiency trade-off. The system clusters incoming queries and routes each one to the most suitable model, whether an efficient one or a high-capacity one, to balance accuracy against cost. Across 6 benchmarks and 8 leading models, including GPT-5-medium and Gemini-2.5-pro, Avengers-Pro achieves state-of-the-art results: 7% higher average accuracy than the strongest single model (GPT-5-medium), 27% lower cost at the same average accuracy, and 63% lower cost at roughly 90% of that model's performance.
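The routing idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two-dimensional embeddings, the cluster names, the model names, the per-cluster accuracy and cost figures, and the linear scoring rule weighted by `alpha` are all assumptions made up for illustration. The summary only states that queries are clustered and that a trade-off parameter controls the balance between performance and efficiency.

```python
from math import dist

# Hypothetical cluster centroids in a toy 2-D embedding space.
CENTROIDS = {
    "math": (1.0, 0.0),
    "chat": (0.0, 1.0),
}

# Hypothetical per-cluster profiles: model -> (accuracy, cost per query).
# All numbers are invented for illustration only.
PROFILES = {
    "math": {"small-model": (0.55, 1.0), "large-model": (0.90, 10.0)},
    "chat": {"small-model": (0.85, 1.0), "large-model": (0.88, 10.0)},
}


def nearest_cluster(embedding):
    """Return the name of the centroid closest to the query embedding."""
    return min(CENTROIDS, key=lambda name: dist(embedding, CENTROIDS[name]))


def route(embedding, alpha):
    """Pick a model for the query.

    alpha in [0, 1] is the assumed trade-off weight:
    alpha = 1 favors accuracy only, alpha = 0 favors low cost only.
    """
    models = PROFILES[nearest_cluster(embedding)]
    max_cost = max(cost for _, cost in models.values())

    def score(name):
        accuracy, cost = models[name]
        efficiency = 1.0 - cost / max_cost  # 0 for the most expensive model
        return alpha * accuracy + (1 - alpha) * efficiency

    return max(models, key=score)


if __name__ == "__main__":
    print(route((0.9, 0.1), alpha=0.9))  # math-like query, accuracy-weighted
    print(route((0.9, 0.1), alpha=0.2))  # same query, cost-weighted
    print(route((0.1, 0.9), alpha=0.9))  # chat-like query, accuracy-weighted
```

Under these made-up numbers, a math-like query goes to the larger model when accuracy is weighted heavily, while a chat-like query stays on the smaller model because the accuracy gap there is too small to justify the cost.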
Key quotes
Avengers-Pro achieves state-of-the-art results: by varying a performance-efficiency trade-off parameter, it can surpass the strongest single model (GPT-5-medium) by +7% in average accuracy
It can match the average accuracy of the strongest single model at 27% lower cost, and reach ~90% of that performance at 63% lower cost
It achieves a Pareto frontier, consistently yielding the highest accuracy for any given cost, and the lowest cost for any given accuracy, among all single models
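To make the Pareto-frontier claim concrete, here is a small Python sketch that finds the non-dominated points in a set of (cost, accuracy) pairs. The system names and numbers are hypothetical and are not results from the paper. A point is on the frontier if no other point is both at most as expensive and at least as accurate, with at least one of the two strictly better.

```python
def pareto_frontier(points):
    """Return the names of non-dominated systems.

    points maps a name to a (cost, accuracy) pair. After sorting by cost
    (ties broken by higher accuracy first), a system is kept only if it is
    strictly more accurate than every cheaper-or-equal system seen so far.
    """
    frontier = []
    best_accuracy = float("-inf")
    ordered = sorted(points.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    for name, (cost, accuracy) in ordered:
        if accuracy > best_accuracy:
            frontier.append(name)
            best_accuracy = accuracy
    return frontier


if __name__ == "__main__":
    # Hypothetical (cost, accuracy) pairs, invented for illustration.
    systems = {
        "a": (1.0, 0.60),
        "b": (2.0, 0.55),  # costs more than "a" but is less accurate
        "c": (3.0, 0.70),
        "d": (3.0, 0.65),  # same cost as "c" but less accurate
    }
    print(pareto_frontier(systems))
```

In this toy data, "b" and "d" are dominated, so only "a" and "c" remain on the frontier.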