POVQA: Preference-Optimized Video Question Answering with Rationales for Data Efficiency
arXiv:2510.01009v3
Abstract: Long-video multimodal question answering requires structured reasoning over visual evidence and dialogue, but Large Vision-Language Models (LVLMs) are constrained by context-window and compute limits…