OmniPilot: An LLM Inference Advisor for Optimizing GPU Cluster Configuration Selection
OmniPilot is an uncertainty-aware LLM inference advisor for heterogeneous GPU clusters. It helps users and operators choose the best GPU type, tensor-parallel degree, and precision by predicting serving costs with a conformally calibrated quantile cost model.
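The description does not say how OmniPilot's cost model is built or how its calibration is applied, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a one-sided split-conformal correction on an upper-quantile cost prediction (only cost overruns matter), and a hypothetical selection rule: pick the configuration with the lowest predicted median cost whose calibrated upper bound stays within a user budget. The names `Config`, `conformal_offset`, `select_config`, and `budget`, and all numbers, are illustrative and not part of OmniPilot.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One candidate serving configuration (hypothetical structure)."""
    gpu_type: str   # e.g. "A100" or "H100"
    tp_degree: int  # tensor-parallel degree
    precision: str  # e.g. "fp16" or "fp8"


def conformal_offset(pred_upper, observed, alpha):
    """One-sided split-conformal correction for an upper-quantile cost model.

    Nonconformity score: s_i = y_i - q_hi(x_i) on held-out calibration data.
    Adding the ceil((n + 1) * (1 - alpha))-th smallest score to q_hi(x) gives
    an upper bound with at least 1 - alpha marginal coverage, assuming the
    calibration and test points are exchangeable.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if len(pred_upper) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty calibration data")
    scores = sorted(y - q for q, y in zip(pred_upper, observed))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def select_config(predictions, offset, budget):
    """Return the config with the lowest predicted median cost among those
    whose calibrated upper bound (q_hi + offset) fits within budget.

    predictions maps Config -> (q_median, q_hi). Returns None if none fit.
    """
    feasible = {
        cfg: median
        for cfg, (median, upper) in predictions.items()
        if upper + offset <= budget
    }
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


# Illustrative calibration data (made-up cost units): predicted upper
# quantiles vs. observed costs for past deployments.
pred_hi = [10, 20, 15, 30, 25, 12, 22, 18, 28, 16]
observed = [11, 18, 17, 29, 26, 10, 25, 17, 31, 16]
offset = conformal_offset(pred_hi, observed, alpha=0.2)  # offset == 3

# Hypothetical model outputs per candidate: (median cost, upper-quantile cost).
predictions = {
    Config("A100", 1, "fp16"): (10.0, 14.0),
    Config("A100", 2, "fp8"): (8.0, 13.0),
    Config("H100", 1, "fp8"): (7.0, 16.0),
}
choice = select_config(predictions, offset, budget=17.0)
print(choice)
```

With these made-up numbers, an uncalibrated rule (offset 0) would pick the H100 fp8 configuration, since its raw upper quantile of 16 fits the budget of 17; after calibration its bound rises to 19, and the sketch instead selects the A100 configuration with tensor-parallel degree 2 and fp8. This is the kind of decision change an uncertainty-aware advisor is meant to capture, although OmniPilot's actual objective may differ.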