cs.LG

Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference

arXiv:2605.06046v1 Announce Type: new
Abstract: Auto-regressive token generation in large language models is memory-bound because it requires “attending to” key and value tensors (KV cache) of all previous tokens. Prior work aims to improve the effici…
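As a rough illustration of the memory-bound step the abstract describes, the sketch below shows single-head attention for one decode step: producing each new token requires streaming the key/value (KV) cache of all previous tokens from memory. This is an assumed, minimal NumPy illustration, not the paper's implementation; all names and shapes are hypothetical.

```python
# Minimal sketch (assumed, not from the paper) of one auto-regressive decode
# step with a KV cache: the entire cache is read to emit a single token,
# which is why decoding is memory-bound rather than compute-bound.
import numpy as np

def decode_step(q, k_cache, v_cache, k_new, v_new):
    """Single-head attention for one newly generated token.

    q       : (d,)    query for the token being generated
    k_cache : (t, d)  keys of all previous tokens (the KV cache)
    v_cache : (t, d)  values of all previous tokens
    k_new   : (d,)    key of the current token
    v_new   : (d,)    value of the current token
    """
    # The cache grows by one entry every step.
    k_cache = np.vstack([k_cache, k_new[None, :]])
    v_cache = np.vstack([v_cache, v_new[None, :]])

    # O(t * d) bytes must be read from memory to produce O(d) bytes of output.
    scores = k_cache @ q / np.sqrt(q.shape[0])   # (t+1,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over all past tokens
    out = weights @ v_cache                      # (d,)
    return out, k_cache, v_cache

if __name__ == "__main__":
    d, t = 64, 128   # head dimension, tokens generated so far (illustrative)
    rng = np.random.default_rng(0)
    k_cache = rng.standard_normal((t, d)).astype(np.float32)
    v_cache = rng.standard_normal((t, d)).astype(np.float32)
    q, k_new, v_new = (rng.standard_normal(d).astype(np.float32) for _ in range(3))
    out, k_cache, v_cache = decode_step(q, k_cache, v_cache, k_new, v_new)
    print(out.shape, k_cache.shape)              # (64,) (129, 64)
```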