cs.CL, cs.LG

ProxyAttn: Guided Sparse Attention via Representative Heads

arXiv:2509.24745v2 Announce Type: replace
Abstract: The quadratic complexity of attention mechanisms limits the efficiency of Large Language Models (LLMs) on long-text tasks. Recently, methods that dynamically estimate block importance have enabled ef…
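The abstract points to dynamic block-importance estimation as the lever for sparse attention. As a rough illustration of that general idea (not ProxyAttn's actual method, which the truncated abstract does not spell out), the sketch below scores key blocks using pooled query/key states from a single head; the block size, mean pooling, function name, and the notion of reusing one "representative" head's scores are all illustrative assumptions.

```python
# Minimal sketch of block-level importance estimation for sparse attention.
# Everything here (pooling, block size, single representative head) is an
# assumption for illustration, not the paper's implementation.
import torch

def estimate_block_importance(q, k, block_size=64, top_k_blocks=8):
    """Score key blocks from one (representative) head's query/key states.

    q, k: [seq_len, head_dim] tensors for a single attention head.
    Returns indices of the top-k key blocks per query block.
    """
    seq_len, head_dim = q.shape
    n_blocks = (seq_len + block_size - 1) // block_size
    pad = n_blocks * block_size - seq_len

    # Mean-pool queries and keys within each block to get coarse summaries.
    q_p = torch.nn.functional.pad(q, (0, 0, 0, pad)).view(n_blocks, block_size, head_dim)
    k_p = torch.nn.functional.pad(k, (0, 0, 0, pad)).view(n_blocks, block_size, head_dim)
    q_blocks = q_p.mean(dim=1)                       # [n_blocks, head_dim]
    k_blocks = k_p.mean(dim=1)                       # [n_blocks, head_dim]

    # Coarse block-to-block scores: O(n_blocks^2) instead of O(seq_len^2).
    scores = q_blocks @ k_blocks.T / head_dim ** 0.5

    # Keep only the highest-scoring key blocks per query block; full attention
    # would then be computed only inside the selected blocks.
    top_k_blocks = min(top_k_blocks, n_blocks)
    return scores.topk(top_k_blocks, dim=-1).indices

# Example: a 1024-token sequence with 128-dim heads and 64-token blocks
# yields a 16x16 block score map, from which 8 key blocks are kept per row.
idx = estimate_block_importance(torch.randn(1024, 128), torch.randn(1024, 128))
print(idx.shape)  # torch.Size([16, 8])
```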