Towards Generalization of Block Attention via Automatic Segmentation and Block Distillation
arXiv:2605.15913v1 Announce Type: new
Abstract: Block attention, which processes the input as separate blocks that cannot attend to one another, offers significant potential to improve KV cache reuse in long-context scenarios such as Retrieval-Augment…
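To make the "separate blocks that cannot attend to one another" idea concrete, here is a minimal sketch of the attention mask it implies. It is an illustration only, not the paper's implementation: the helper names `block_attention_mask` and `render` are hypothetical, and causal masking inside each block is an assumption that the abstract does not state.

```python
from typing import List


def block_attention_mask(block_lengths: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    Assumption (not from the abstract): attention is causal inside a
    block. Tokens in different blocks never attend to one another.
    """
    # Map every token position to the index of the block that contains it.
    block_of = [b for b, n in enumerate(block_lengths) for _ in range(n)]
    total = len(block_of)
    return [
        [block_of[i] == block_of[j] and j <= i for j in range(total)]
        for i in range(total)
    ]


def render(mask: List[List[bool]]) -> str:
    """Draw the mask with 'x' for allowed and '.' for blocked pairs."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    # Two blocks of 2 and 3 tokens give a block-diagonal, lower-triangular mask.
    print(render(block_attention_mask([2, 3])))
```

Under these assumptions, the rows for a block depend only on that block's own tokens, so its keys and values could in principle be computed once and reused wherever the same block appears, which is the KV cache reuse the abstract refers to.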