cs.CL, cs.CV

VIGiA: Instructional Video Guidance via Dialogue Reasoning and Retrieval

arXiv:2602.19146v2 Announce Type: replace-cross
Abstract: We introduce VIGiA, a novel multimodal dialogue model designed to understand and reason over complex, multi-step instructional video action plans. Unlike prior work, which focuses mainly on text…