Attention Grounded Enhancement for Visual Document Retrieval
arXiv:2511.13415v2 Announce Type: replace-cross
Abstract: Visual document retrieval requires understanding heterogeneous and multi-modal content to satisfy implicit information needs. Recent advances use screenshot-based document encoding with fine-gr…