DocVAL: Validated Chain-of-Thought Distillation for Grounded Document VQA
arXiv:2511.22521v2
Abstract: Document visual question answering requires models not only to answer questions correctly, but also to precisely localize answers within complex document layouts. While large vision-language models (…