cs.CL

UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification

arXiv:2605.06221v1 Announce Type: new
Abstract: As large language models (LLMs) continue to advance rapidly, they are becoming increasingly capable while simultaneously demanding ever-longer context lengths. To improve the inference efficiency of long…