Indexing Multimodal Language Models for Large-scale Image Retrieval
arXiv:2604.13268v1 Announce Type: cross
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated strong cross-modal reasoning capabilities, yet their potential for vision-only tasks remains underexplored. We investigate MLLMs as training-f…