cs.AI, cs.CL, cs.CV

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

arXiv:2601.14724v3 Announce Type: replace
Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have significantly improved offline video understanding. However, extending these capabilities to streaming video inpu…