CoRDS: Coreset-based Representative and Diverse Selection for Streaming Video Understanding
arXiv:2605.14310v1 Announce Type: new
Abstract: Streaming video understanding with large vision-language models (VLMs) requires a compact memory that can support future reasoning over an ever-growing visual history. A common solution is to compress th…