LLMs vs. Classical HPO Algorithms: Hybrid Approach Outperforms Both in Hyperparameter Optimization

[Submitted on 25 Mar 2026 (v1), last revised 17 Apr 2026 (this version, v5)]

Summary

This research paper compares classical hyperparameter optimization (HPO) algorithms (CMA-ES, TPE) against LLM-based methods for tuning a small language model under a fixed compute budget. Within a fixed search space, the classical methods consistently outperform pure LLM agents; in this setting, avoiding out-of-memory failures matters more than search diversity. Letting LLMs edit the training source code directly narrows the gap but does not close it, even with frontier models such as Claude Opus 4.6 and Gemini 3.1 Pro Preview. The authors attribute this to complementary weaknesses: LLMs struggle to track optimization state across trials, while classical methods lack domain knowledge. They introduce Centaur, a hybrid that shares CMA-ES's interpretable internal state (mean vector, step-size, and covariance matrix) with an LLM, and it achieves the best result in their experiments. Within Centaur, a 0.8B-parameter LLM is already enough to outperform all classical and pure LLM methods. The findings suggest LLMs are most effective as complements to classical optimizers, not replacements.

Key quotes

When defining a fixed search space over autoresearch, classical methods such as CMA-ES and TPE consistently outperform LLM-based agents, where avoiding out-of-memory failures matters more than search diversity.

We observe that LLMs struggle to track optimization state across trials. In contrast, classical methods lack the domain knowledge of LLMs.

To combine the strengths of both, we introduce Centaur, a hybrid that shares CMA-ES's interpretable internal state, including mean vector, step-size, and covariance matrix, with an LLM.

Centaur achieves the best result in our experiments, and a 0.8B LLM already suffices to outperform all classical and pure LLM methods.

All in all, our results suggest that LLMs are most effective as a complement to classical optimizers, not as a replacement.
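The quotes above say that Centaur shares CMA-ES's mean vector, step-size, and covariance matrix with an LLM, but this summary does not describe the actual interface. The following is only a minimal sketch of one way such a hand-off could look, not the authors' implementation. The `CmaState` container, the prompt wording, the `propose` helper, the hyperparameter names, and the rule of falling back to the mean on an unparseable reply are all assumptions made for illustration. The LLM is passed in as a plain callable so that no particular API is implied.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CmaState:
    """Snapshot of the CMA-ES quantities named in the summary (hypothetical container)."""
    names: List[str]        # hyperparameter names, illustrative only
    mean: List[float]       # mean vector of the search distribution
    sigma: float            # global step-size
    cov: List[List[float]]  # covariance matrix


def state_to_prompt(state: CmaState, history: List[Dict]) -> str:
    """Serialize the optimizer state and the last few trials into a text prompt."""
    payload = {
        "hyperparameters": state.names,
        "mean": state.mean,
        "step_size": state.sigma,
        "covariance": state.cov,
        "recent_trials": history[-5:],
    }
    return (
        "You are assisting a CMA-ES run. Given the optimizer state below, "
        "propose one configuration as a JSON object keyed by hyperparameter name.\n"
        + json.dumps(payload, indent=2)
    )


def propose(
    state: CmaState,
    history: List[Dict],
    llm: Callable[[str], str],
    bounds: Dict[str, Tuple[float, float]],
) -> Dict[str, float]:
    """Ask the LLM for a candidate, then clamp it to the fixed search space."""
    reply = llm(state_to_prompt(state, history))
    try:
        raw = json.loads(reply)
        candidate = {name: float(raw[name]) for name in state.names}
    except (ValueError, KeyError, TypeError):
        # Assumption: if the reply cannot be parsed, fall back to the CMA-ES mean.
        candidate = dict(zip(state.names, state.mean))
    return {
        name: min(max(value, bounds[name][0]), bounds[name][1])
        for name, value in candidate.items()
    }


if __name__ == "__main__":
    state = CmaState(
        names=["log10_lr", "log2_batch"],
        mean=[-3.0, 6.0],
        sigma=0.5,
        cov=[[1.0, 0.0], [0.0, 1.0]],
    )
    bounds = {"log10_lr": (-5.0, -1.0), "log2_batch": (3.0, 8.0)}
    # Stand-in for a real model call; it returns a fixed JSON string.
    stub_llm = lambda prompt: json.dumps({"log10_lr": -2.5, "log2_batch": 9.0})
    print(propose(state, [], stub_llm, bounds))
```

In this sketch the classical optimizer remains the owner of the search state and the LLM only sees a serialized copy of it, which is one plausible reading of "shares ... with an LLM". Clamping the reply to the declared bounds keeps every proposal inside the fixed search space regardless of what the model returns.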

Snippet from the RSS feed

The autoresearch repository enables an LLM agent to optimize hyperparameters by editing training code directly. We use it as a testbed to compare classical HPO algorithms against LLM-based methods on tuning the hyperparameters of a small language model un
