cs.AI, cs.CV

UpstreamQA: A Modular Framework for Explicit Reasoning on Video Question Answering Tasks

arXiv:2604.23145v1 Announce Type: cross
Abstract: Video Question Answering (VideoQA) demands models that jointly reason over spatial, temporal, and linguistic cues. However, the task’s inherent complexity often requires multi-step reasoning that curre…