LongFlow: Efficient KV Cache Compression for Reasoning Models
arXiv:2603.11504v2 Announce Type: replace-cross
Abstract: Recent reasoning models such as OpenAI-o1 and DeepSeek-R1 have shown strong performance on complex tasks including mathematical reasoning and code generation. However, this performance gain com…