cs.CV

CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models

arXiv:2605.08735v1 Announce Type: new
Abstract: Recent “Thinking with Video” approaches use Video Generation Models (VGMs) for visual reasoning by producing temporally coherent Chain-of-Frames as reasoning artifacts. Even strong VGMs, however, exhibit…