cs.CV, cs.LG, q-bio.NC

Behavioral Geometric Supervision Aligns Video Foundation Models with Human Social Perception

arXiv:2510.01502v2 Announce Type: replace-cross
Abstract: Current video foundation models, including the strongest self-supervised models such as V-JEPA2, fail to capture how humans organize social information in dynamic scenes. For example, across a …