Large language model (LLM) development has typically been divided into two distinct phases: the massive, capital-intensive undertaking of training, and the operational utility of inference. For years, the industry's focus, and its investments, were dominated by the race to train larger models on larger datasets.