DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning
arXiv:2510.09883v2 Announce Type: replace
Abstract: Large reasoning models (LRMs) achieve state-of-the-art performance on challenging benchmarks by generating long chains of intermediate steps, but their inference cost is dominated by decoding, where …