Multimodal Representation Learning Conditioned on Semantic Relations
arXiv:2508.17497v2 Announce Type: replace-cross
Abstract: Multimodal representation learning has been largely driven by contrastive models such as CLIP, which learn a shared embedding space by aligning paired image-text samples. While effective for ge…
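The abstract's description of CLIP — learning a shared embedding space by aligning paired image-text samples — refers to a symmetric contrastive (InfoNCE-style) objective, where each image is pulled toward its paired caption and pushed away from the other captions in the batch, and vice versa. A minimal NumPy sketch of that objective, under the usual assumptions (L2-normalized embeddings, matched pairs on the diagonal, an illustrative temperature value):

```python
import numpy as np

def log_softmax(x):
    # Numerically stable row-wise log-softmax.
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Row i of img_emb is assumed to be paired with row i of txt_emb.
    """
    # Normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # pairwise similarity matrix
    n = logits.shape[0]
    idx = np.arange(n)
    # Cross-entropy in both directions; matched pairs sit on the diagonal.
    i2t = -log_softmax(logits)[idx, idx]    # image -> text
    t2i = -log_softmax(logits.T)[idx, idx]  # text -> image
    return (i2t.mean() + t2i.mean()) / 2
```

Perfectly aligned pairs drive the loss toward zero, while mismatched pairs are penalized; function names and the temperature are illustrative, not the paper's code.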