cs.CL, cs.CV, cs.IR

Attention Grounded Enhancement for Visual Document Retrieval

arXiv:2511.13415v2 Announce Type: replace-cross
Abstract: Visual document retrieval requires understanding heterogeneous and multi-modal content to satisfy implicit information needs. Recent advances use screenshot-based document encoding with fine-gr…