Inference-Time Distillation: Cost-Efficient Agents Without Fine-Tuning or Manual Prompt Engineering
arXiv:2512.02543v3 Announce Type: replace
Abstract: Deploying LLM agents at scale typically requires choosing between quality and cost. Existing cost-reduction approaches fail to preserve agility: the ability to iterate rapidly without human-time bottlenecks…