Balaji Darur, Amanmeet Garg, Makarand Tapaswi

One Identity, Many Roles: Multimodal Entity Coreference for Enhanced Video Situation Recognition

April 28, 2026

arXiv:2604.23173v1
Abstract: Video Situation Recognition (VidSitu) addresses the challenging problem of “who did what to whom, with what, how, and where” in a video. It tests thorough video understanding by requiring identification …
