UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings
arXiv:2505.11815v2
Abstract: Current vision-language models have been explored for multi-modal embedding tasks such as information retrieval. However, they face significant challenges in real-world queries and targets involving div…