cs.CV, cs.IR

ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence

arXiv:2605.13034v1 Announce Type: new
Abstract: Recent deep research systems have improved the ability of large language models to produce long, grounded reports through iterative retrieval and reasoning. However, most text-centered systems rely mainl…