cs.CL, cs.CV, cs.IR, cs.LG, cs.MM

SIMMER: Cross-Modal Food Image–Recipe Retrieval via MLLM-Based Embedding

arXiv:2604.15628v1 Announce Type: cross
Abstract: Cross-modal retrieval between food images and recipe texts is an important task with applications in nutritional management, dietary logging, and cooking assistance. Existing methods predominantly rely…