cs.CL, cs.CV

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

arXiv:2604.04921v1 Announce Type: cross
Abstract: Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE q…
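The abstract describes the baseline approach that such methods build on: scoring each cached key/value pair by the attention it receives from recent queries, then evicting low-scoring entries. The sketch below illustrates that generic attention-score importance estimation only; it is not TriAttention's trigonometric method (whose details are not given here), and all function and parameter names are illustrative assumptions.

```python
import numpy as np

def kv_importance(K: np.ndarray, recent_Q: np.ndarray) -> np.ndarray:
    """Per-key importance: mean attention weight from a window of recent queries.

    K:        (seq_len, d) cached (post-RoPE) keys
    recent_Q: (w, d) queries from the last w decoding steps
    """
    d = K.shape[1]
    scores = recent_Q @ K.T / np.sqrt(d)                    # (w, seq_len) scaled dot products
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)                 # softmax over keys, per query
    return attn.mean(axis=0)                                # average over the recent window

def compress_kv(K: np.ndarray, V: np.ndarray, recent_Q: np.ndarray, keep: int):
    """Keep only the `keep` most important KV positions, preserving order."""
    imp = kv_importance(K, recent_Q)
    idx = np.sort(np.argsort(imp)[-keep:])                  # top-`keep` indices, sorted by position
    return K[idx], V[idx]

rng = np.random.default_rng(0)
K = rng.standard_normal((128, 64))
V = rng.standard_normal((128, 64))
Q = rng.standard_normal((8, 64))
Kc, Vc = compress_kv(K, V, Q, keep=32)
print(Kc.shape, Vc.shape)  # (32, 64) (32, 64)
```

Because each query's attention row is a softmax, the importance scores across all keys sum to one per query; averaging over the window keeps that normalization, so `keep` directly trades cache memory against retained attention mass.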