Mining Instance-Centric Vision-Language Contexts for Human-Object Interaction Detection
arXiv:2604.02071v1 Announce Type: new
Abstract: Human-Object Interaction (HOI) detection aims to localize human-object pairs and classify their interactions from a single image, a task that demands strong visual understanding and nuanced contextual re…