cs.CV, cs.MM

Interpretable Zero-shot Referring Expression Comprehension with Query-driven Scene Graphs

arXiv:2603.25004v1 Announce Type: new
Abstract: Zero-shot referring expression comprehension (REC) aims to locate target objects in images given natural language queries without relying on task-specific training data, demanding strong visual understan…