cs.AI, cs.CL

Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation

arXiv:2605.15913v1 Announce Type: new
Abstract: Block attention, which processes the input as separate blocks that cannot attend to one another, offers significant potential to improve KV cache reuse in long-context scenarios such as Retrieval-Augment…
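
The block structure described in the abstract can be illustrated with an attention mask. The following is a minimal sketch, not the paper's implementation: the function names `block_attention_mask` and `render`, the fixed block lengths, and the use of causal attention within each block are all assumptions made for illustration. The abstract's automatic segmentation and block distillation are not modeled here.

```python
from typing import List


def block_attention_mask(block_lengths: List[int]) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if token i may attend to token j.

    Assumption (not from the source): attention is causal inside a block.
    Tokens in different blocks never attend to one another.
    """
    # Map each token position to the index of the block that contains it.
    block_of: List[int] = []
    for block_id, length in enumerate(block_lengths):
        if length <= 0:
            raise ValueError("block lengths must be positive")
        block_of.extend([block_id] * length)

    n = len(block_of)
    # Allow attention only within the same block and only to earlier-or-equal positions.
    return [
        [block_of[i] == block_of[j] and j <= i for j in range(n)]
        for i in range(n)
    ]


def render(mask: List[List[bool]]) -> str:
    """Render a mask as text: 'x' for allowed attention, '.' for blocked."""
    return "\n".join("".join("x" if cell else "." for cell in row) for row in mask)


if __name__ == "__main__":
    # Two hypothetical blocks, e.g. two retrieved passages of 2 and 3 tokens.
    print(render(block_attention_mask([2, 3])))
```

Under this mask, the keys and values computed for a block depend only on the tokens inside that block, which is what makes a per-block KV cache a candidate for reuse when the same block appears in another input (setting aside how positions are encoded).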