cs.AI, cs.CL, cs.LG

Budgeted LoRA: Distillation as Structured Compute Allocation for Efficient Inference

arXiv:2605.04341v1 Announce Type: new
Abstract: We study distillation for large language models under explicit compute constraints, with the goal of producing student models that are not only cheaper to train, but structurally efficient at inference time…
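
Since the abstract is truncated and does not describe the paper's actual allocation procedure, the following is only a minimal illustrative sketch of standard LoRA low-rank adaptation (Hu et al., 2021) combined with a toy per-layer rank budget. The helper `allocate_ranks` and the uniform split it performs are hypothetical placeholders, not the method proposed in this paper.

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained weight stays frozen
        self.scale = alpha / rank
        # Low-rank factors: A projects down to rank r, B projects back up.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)


def allocate_ranks(num_layers: int, total_rank_budget: int) -> list[int]:
    """Toy uniform allocation of a total adapter-rank budget across layers.

    A real budgeted scheme would presumably allocate non-uniformly
    (e.g. by layer sensitivity); this uniform split is a placeholder only.
    """
    per_layer = max(1, total_rank_budget // num_layers)
    return [per_layer] * num_layers

Under this sketch, a student would attach a LoRALinear adapter of the allocated rank to each target layer and train only the low-rank factors against the teacher's outputs; the total adapter-parameter count is then directly controlled by the rank budget.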