STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding
arXiv:2603.27593v1 Announce Type: new
Abstract: Recent progress in video large language models (Video-LLMs) has enabled strong offline reasoning over long and complex videos. However, real-world deployments increasingly require streaming perception an…