cs.CL, cs.CV, cs.IR

Indexing Multimodal Language Models for Large-scale Image Retrieval

arXiv:2604.13268v1 Announce Type: cross
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated strong cross-modal reasoning capabilities, yet their potential for vision-only tasks remains underexplored. We investigate MLLMs as training-f…