Ego-Grounding for Personalized Question-Answering in Egocentric Videos
arXiv:2604.01966v1 Announce Type: new
Abstract: We present the first systematic analysis of multimodal large language models (MLLMs) on personalized question-answering that requires ego-grounding, the ability to understand the camera-wearer in egocentric…