MoCHA: Denoising Caption Supervision for Motion-Text Retrieval
arXiv:2603.23684v2
Abstract: Text-motion retrieval systems learn shared embedding spaces from motion-caption pairs via contrastive objectives. However, each caption is not a deterministic label but a sample from a distribution o…
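
As an illustration of the kind of contrastive objective the abstract refers to, the following is a minimal sketch of a symmetric InfoNCE loss over a batch of paired motion and caption embeddings. The abstract does not specify the loss, so the symmetric form, the temperature value of 0.07, and the function names (`l2_normalize`, `row_nll`, `symmetric_info_nce`) are all assumptions made for illustration, not MoCHA's actual formulation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def row_nll(sim):
    """Mean negative log-probability of the diagonal under a row-wise softmax."""
    total = 0.0
    for i, row in enumerate(sim):
        m = max(row)  # subtract the row max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]
    return total / len(sim)


def symmetric_info_nce(motion_emb, text_emb, temperature=0.07):
    """Hypothetical symmetric InfoNCE: pair i of motion_emb matches pair i of text_emb.

    The temperature default is an assumed value, not taken from the source.
    """
    m = [l2_normalize(v) for v in motion_emb]
    t = [l2_normalize(v) for v in text_emb]
    # sim[i][j] = cosine similarity of motion i and caption j, scaled by temperature
    sim = [[sum(a * b for a, b in zip(mi, tj)) / temperature for tj in t] for mi in m]
    sim_t = [list(col) for col in zip(*sim)]  # caption-to-motion direction
    return 0.5 * (row_nll(sim) + row_nll(sim_t))


if __name__ == "__main__":
    motions = [[1.0, 0.0], [0.0, 1.0]]
    captions = [[1.0, 0.0], [0.0, 1.0]]
    print(symmetric_info_nce(motions, captions))
```

Note that this objective treats each caption as the single correct target for its motion and every other caption in the batch as a negative, which is the deterministic-label assumption the abstract calls into question.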