cs.CV

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning

arXiv:2503.11892v3
Abstract: Multimodal representation learning aims to capture both shared and complementary semantic information across multiple modalities. However, the intrinsic heterogeneity of diverse modalities presents s…