OmniEncoder: See, Hear, and Feel Continuous Motion Like Humans With One Encoder
arXiv:2605.01506v1 Announce Type: new
Abstract: Recent advances in omni-modal large language models have enabled remarkable progress in joint vision-audio understanding. However, prevailing architectures rely on modality-specific encoders with a …