DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
arXiv:2503.11892v3
Abstract: Multimodal representation learning aims to capture both shared and complementary semantic information across multiple modalities. However, the intrinsic heterogeneity of diverse modalities presents s…