TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
arXiv:2604.04921v1 Announce Type: cross
Abstract: Extended reasoning in large language models (LLMs) creates severe KV cache memory bottlenecks. Leading KV cache compression methods estimate KV importance using attention scores from recent post-RoPE q…
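The truncated abstract describes the baseline family this paper targets: compression methods that score cached KV entries by the attention they receive from a window of recent post-RoPE queries, then evict low-scoring entries. Below is a minimal sketch of that general scoring scheme only, not the paper's TriAttention method; the function name `estimate_kv_importance`, the window size, and the keep ratio are illustrative assumptions, and causal masking is omitted for brevity.

```python
import torch

def estimate_kv_importance(queries: torch.Tensor,
                           keys: torch.Tensor,
                           num_recent: int = 32,
                           keep_ratio: float = 0.25) -> torch.Tensor:
    """Score each cached KV position by the attention it receives from
    the most recent (post-RoPE) queries; return indices of positions to keep.

    queries: (seq_len, d) query vectors after rotary position embedding
    keys:    (seq_len, d) key vectors after rotary position embedding
    """
    d = queries.shape[-1]
    recent_q = queries[-num_recent:]                     # observation window of recent queries
    scores = recent_q @ keys.T / d ** 0.5                # (num_recent, seq_len) scaled dot products
    attn = torch.softmax(scores, dim=-1)                 # attention weights per recent query
    importance = attn.sum(dim=0)                         # aggregate attention mass per KV position
    k = max(1, int(keep_ratio * keys.shape[0]))          # budget of KV entries to retain
    keep_idx = importance.topk(k).indices.sort().values  # most-attended positions, in order
    return keep_idx

# Hypothetical usage: compress a 1024-token cache down to ~25% of its entries.
q = torch.randn(1024, 64)
k_cache = torch.randn(1024, 64)
idx = estimate_kv_importance(q, k_cache)
compressed_keys = k_cache[idx]
```

The design choice this illustrates is the one the abstract flags: importance is estimated from post-RoPE query-key products over a short recent window, so eviction decisions depend on rotary-rotated vectors and a limited observation horizon.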