Daniele Materia, Francesco Ragusa, Giovanni Maria Farinella

Leveraging Gaze and Set-of-Mark in VLLMs for Human-Object Interaction Anticipation from Egocentric Videos

April 7, 2026

arXiv:2604.03667v1
Abstract: The ability to anticipate human-object interactions is highly desirable in an intelligent assistive system in order to guide users during daily life activities and understand their short and long-term go…
