cs.AI, cs.CV

DocVAL: Validated Chain-of-Thought Distillation for Grounded Document VQA

arXiv:2511.22521v2 Announce Type: replace
Abstract: Document visual question answering requires models not only to answer questions correctly, but also to precisely localize answers within complex document layouts. While large vision-language models (…