Behavioral Geometric Supervision Aligns Video Foundation Models with Human Social Perception
arXiv:2510.01502v2
Abstract: Current video foundation models, including the strongest self-supervised models such as V-JEPA2, fail to capture how humans organize social information in dynamic scenes. For example, across a …