LLMs vs. Classical HPO Algorithms: Hybrid Approach Outperforms Both in Hyperparameter Optimization
[Submitted on 25 Mar 2026 (v1), last revised 17 Apr 2026 (this version, v5)]
Summary
This research paper compares classical hyperparameter optimization (HPO) algorithms (CMA-ES, TPE) against LLM-based methods for tuning a small language model under a fixed compute budget. Within a fixed search space, classical methods consistently outperform pure LLM agents, because avoiding out-of-memory failures matters more than search diversity. Allowing LLMs to edit source code directly narrows the gap but does not close it, even with frontier models such as Claude Opus 4.6 and Gemini 3.1 Pro Preview. The authors introduce Centaur, a hybrid approach that shares CMA-ES's interpretable internal state (mean vector, step-size, and covariance matrix) with an LLM, and it achieves the best results. With only a 0.8B-parameter LLM, Centaur already outperforms all classical and pure LLM methods. The findings suggest that LLMs are most effective as complements to classical optimizers, not as replacements.
Key quotes
When defining a fixed search space over autoresearch, classical methods such as CMA-ES and TPE consistently outperform LLM-based agents, where avoiding out-of-memory failures matters more than search diversity.
We observe that LLMs struggle to track optimization state across trials. In contrast, classical methods lack the domain knowledge of LLMs.
To combine the strengths of both, we introduce Centaur, a hybrid that shares CMA-ES's interpretable internal state, including mean vector, step-size, and covariance matrix, with an LLM.
Centaur achieves the best result in our experiments, and a 0.8B LLM already suffices to outperform all classical and pure LLM methods.
All in all, our results suggest that LLMs are most effective as a complement to classical optimizers, not as a replacement.
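The quotes describe Centaur as exposing CMA-ES's mean vector, step-size, and covariance matrix to an LLM. The paper's actual prompt format is not given here, so what follows is only a minimal pure-Python sketch under stated assumptions. The `CMAState` container, the hyperparameter names (`log10_lr`, `log2_batch_size`), and the prompt wording are all hypothetical. The sketch samples candidates x ~ N(m, σ²C) the way CMA-ES does, using a Cholesky factor of C, and renders the state as text an LLM could read. It omits the CMA-ES update step (covariance adaptation and step-size control) entirely.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class CMAState:
    """Hypothetical snapshot of CMA-ES internal state to share with an LLM."""
    names: list[str]        # hyperparameter names (illustrative only)
    mean: list[float]       # m: mean of the search distribution
    sigma: float            # global step-size
    cov: list[list[float]]  # C: symmetric positive-definite covariance matrix


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Return lower-triangular L with L @ L^T == a (a must be SPD)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_candidates(state: CMAState, k: int, rng: random.Random) -> list[list[float]]:
    """Draw k points x = m + sigma * L z with z ~ N(0, I), i.e. x ~ N(m, sigma^2 C)."""
    L = cholesky(state.cov)
    n = len(state.mean)
    out = []
    for _ in range(k):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = [state.mean[i] + state.sigma * sum(L[i][j] * z[j] for j in range(i + 1))
             for i in range(n)]
        out.append(x)
    return out


def build_prompt(state: CMAState, candidates: list[list[float]]) -> str:
    """Render the CMA-ES state and sampled candidates as plain text for an LLM."""
    lines = ["You are helping tune hyperparameters with CMA-ES."]
    lines.append(f"Step-size sigma: {state.sigma:.4g}")
    lines.append("Mean vector:")
    for name, m in zip(state.names, state.mean):
        lines.append(f"  {name} = {m:.4g}")
    lines.append("Covariance matrix (rows/cols in the order above):")
    for row in state.cov:
        lines.append("  " + " ".join(f"{v:.3g}" for v in row))
    lines.append("CMA-ES sampled these candidates:")
    for i, x in enumerate(candidates):
        pairs = ", ".join(f"{n}={v:.4g}" for n, v in zip(state.names, x))
        lines.append(f"  [{i}] {pairs}")
    lines.append("Reply with the index of the candidate to evaluate, or a modified candidate.")
    return "\n".join(lines)


if __name__ == "__main__":
    state = CMAState(
        names=["log10_lr", "log2_batch_size"],  # hypothetical search dimensions
        mean=[-3.0, 5.0],
        sigma=0.5,
        cov=[[1.0, 0.3], [0.3, 1.0]],
    )
    cands = sample_candidates(state, 4, random.Random(0))
    print(build_prompt(state, cands))
```

Running the module prints a prompt listing four sampled candidates. A real loop would also need to parse the LLM's reply, clip it to the search bounds, and feed evaluated scores (including out-of-memory failures) back through the CMA-ES update, all of which this sketch leaves out.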
