Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding
arXiv:2605.07575v2 Announce Type: replace-cross
Abstract: Proactive streaming video understanding requires Video-LLMs to decide when to respond as a video unfolds, a task where existing methods often fall short due to their implicit, query-agnostic mo…