CSAttention: Centroid-Scoring Attention for Accelerating LLM Inference
arXiv:2604.08584v1 Announce Type: new
Abstract: Long-context LLMs increasingly rely on extended, reusable prefill prompts for agents and domain Q&A, making attention computation and KV-cache access the dominant decode-time bottlenecks. While sparse attention …
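The abstract is truncated, so the method details are not given here; the sketch below only illustrates the general pattern the title suggests, assuming CSAttention groups cached keys into clusters, scores the decode query against each cluster centroid, and runs exact attention over just the top-scoring clusters. The clustering choice (plain k-means) and every function name are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of centroid-scored sparse decoding; not the paper's algorithm.
import numpy as np

def build_centroids(keys: np.ndarray, n_clusters: int, iters: int = 10):
    """Cluster cached keys (n, d) with plain k-means; return centroids and labels."""
    n, _ = keys.shape
    rng = np.random.default_rng(0)
    centroids = keys[rng.choice(n, n_clusters, replace=False)]
    labels = np.zeros(n, dtype=np.int64)
    for _ in range(iters):
        # Assign each cached key to its nearest centroid (squared Euclidean distance).
        dists = ((keys[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = keys[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return centroids, labels

def centroid_scored_attention(q, keys, values, centroids, labels, top_c=4):
    """One decode step: score centroids, keep top_c clusters, attend exactly there."""
    d = q.shape[-1]
    # Cheap pass: one dot product per centroid instead of per cached key.
    cluster_scores = centroids @ q
    keep = np.argsort(cluster_scores)[-top_c:]
    mask = np.isin(labels, keep)
    k_sel, v_sel = keys[mask], values[mask]
    # Exact softmax attention, restricted to the selected subset of the KV cache.
    logits = (k_sel @ q) / np.sqrt(d)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ v_sel

# Usage: a 4096-entry KV cache reduced to roughly 4/64 of its keys per decode step.
rng = np.random.default_rng(1)
K = rng.standard_normal((4096, 64)).astype(np.float32)
V = rng.standard_normal((4096, 64)).astype(np.float32)
cents, labs = build_centroids(K, n_clusters=64)
q = rng.standard_normal(64).astype(np.float32)
out = centroid_scored_attention(q, K, V, cents, labs, top_c=4)
print(out.shape)  # (64,)
```

The appeal of this pattern for reusable prefill prompts is that clustering is paid once per cached prefix, while each decode step pays only one score per centroid plus exact attention over the few clusters it keeps.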