Donghuo Zeng, Hao Niu, Masato Taya

Hierarchical Semantic Correlation-Aware Masked Autoencoder for Unsupervised Audio-Visual Representation Learning

Donghuo Zeng, Hao Niu, Masato Taya / April 7, 2026

arXiv:2604.04229v1 Announce Type: cross
Abstract: Learning aligned multimodal embeddings from weakly paired, label-free corpora is challenging: pipelines often provide only pre-extracted features, clips contain multiple events, and spurious co-occurre…
