SIMMER: Cross-Modal Food Image–Recipe Retrieval via MLLM-Based Embedding
arXiv:2604.15628v1 Announce Type: cross
Abstract: Cross-modal retrieval between food images and recipe texts is an important task with applications in nutritional management, dietary logging, and cooking assistance. Existing methods predominantly rely…