LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval
arXiv:2603.26683v1 Announce Type: cross
Abstract: Retrieving relevant evidence from visually rich documents such as textbooks, technical reports, and manuals is challenging due to long context, complex layouts, and weak lexical overlap between user qu…